Building ComiKola: AI Comic & Webtoon Platform

ComiKola generates comics end-to-end — AI scripting, character design, panel images — on a Go + React + Temporal stack. Here's the full architecture.

ComiKola generates comics and webtoons end-to-end from a story prompt. The user describes a story; ComiKola writes the script, designs the characters, generates panel images, and assembles the final webtoon strip. Here's how I built it.

The pipeline

User prompt: "A detective cat solves a mystery in a rainy city"
  → Script generator (Claude Sonnet)
      → Panel descriptions (text)
      → Character descriptions (canonical)
      → Dialogue (speech bubbles)
  → Character sheet generator (Flux/SDXL)
      → Consistent character images per character
  → Panel image generator (Flux + ControlNet)
      → Each panel with correct characters
  → Panel assembler (Go, image/draw)
      → Add speech bubbles, borders, layout
  → Final webtoon strip (WebP)

The hardest problem: character consistency across panels. Standard diffusion models generate a new "cat detective" in every panel — different face, different coat color, different style. ComiKola fixes this with a character sheet approach.

Character consistency: the character sheet approach

Instead of generating characters per-panel, I generate a canonical character sheet first:

CHARACTER SHEET — Detective Mochi
Appearance: Silver tabby cat, 5'2", wearing a dark trench coat and fedora
Style: Noir, slightly cartoonish, expressive eyes
Lighting: Consistent rim lighting from upper left
Color palette: Desaturated blues and grays with amber accent (coat buttons)

The character sheet image gets used as a reference image (img2img strength: 0.65) for every panel containing that character. This anchors the visual identity across panels without fine-tuning.
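As a rough illustration of how the sheet might travel with each panel request, here is a minimal sketch. The struct, its JSON field names, and the validation rules are assumptions made for this example; they do not correspond to any particular provider's API, and only the 0.65 strength comes from the text above.

```go
package main

import (
	"encoding/json"
	"errors"
)

// DefaultStrength is the img2img strength quoted in the article.
const DefaultStrength = 0.65

// PanelImageRequest is a hypothetical request body for an img2img call.
// The field names are illustrative only, not a real provider schema.
type PanelImageRequest struct {
	Prompt         string  `json:"prompt"`
	ReferenceImage string  `json:"reference_image"` // path or URL of the character sheet
	Strength       float64 `json:"strength"`        // img2img strength in (0, 1]
}

// NewPanelImageRequest builds a request anchored on a character sheet and
// rejects requests that would silently drop the reference image.
func NewPanelImageRequest(prompt, sheetPath string, strength float64) (PanelImageRequest, error) {
	if sheetPath == "" {
		return PanelImageRequest{}, errors.New("character sheet reference is required")
	}
	if strength <= 0 || strength > 1 {
		return PanelImageRequest{}, errors.New("strength must be in (0, 1]")
	}
	return PanelImageRequest{Prompt: prompt, ReferenceImage: sheetPath, Strength: strength}, nil
}

// Encode serializes the request as JSON, ready to be used as an HTTP body.
func (r PanelImageRequest) Encode() ([]byte, error) {
	return json.Marshal(r)
}
```

Failing fast on a missing sheet path is a deliberate choice in this sketch: a panel generated without its reference would look fine in isolation and only show up later as a consistency break.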

For characters that appear in more than 10 panels, I fine-tune a LoRA adapter (~500 training steps, 15 minutes on an A10G) that encodes the character identity. LoRA is overkill for short strips but essential for long-form webtoons.
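The threshold rule can be expressed as a small helper that counts appearances per character. This is a sketch under assumptions: the `PanelCast` type and the function name are invented for illustration, and only the "more than 10 panels" cutoff comes from the text.

```go
package main

import "sort"

// loraPanelThreshold mirrors the article's rule: fine-tune only for
// characters that appear in more than 10 panels.
const loraPanelThreshold = 10

// PanelCast is a hypothetical minimal view of a panel: just who is in it.
type PanelCast struct {
	Characters []string
}

// CharactersNeedingLoRA returns, in sorted order, the names of characters
// that appear in more than threshold panels. A character listed twice in
// the same panel is counted once for that panel.
func CharactersNeedingLoRA(panels []PanelCast, threshold int) []string {
	counts := map[string]int{}
	for _, p := range panels {
		seen := map[string]bool{}
		for _, name := range p.Characters {
			if !seen[name] {
				seen[name] = true
				counts[name]++
			}
		}
	}
	var out []string
	for name, n := range counts {
		if n > threshold {
			out = append(out, name)
		}
	}
	sort.Strings(out) // map iteration order is random, so sort for stable output
	return out
}
```

For a 12-panel strip where Mochi is in every panel and the villain in three, only Mochi would be selected; everyone else stays on the cheaper reference-image path.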

Script generation for comics

Comic scripts are structured differently from prose. Each panel needs:

  • Setting description (for image generation)
  • Character positions (left/center/right)
  • Dialogue (< 30 words per bubble — readability rule)
  • Panel mood/lighting

{
  "panel": 3,
  "setting": "Rain-soaked rooftop, night, city lights below",
  "characters": [
    { "name": "Mochi", "position": "left", "pose": "pointing accusingly" },
    { "name": "Villain", "position": "right", "pose": "backing away" }
  ],
  "dialogue": [
    { "character": "Mochi", "text": "It was YOU all along, Mr. Cheese!" }
  ],
  "mood": "dramatic confrontation",
  "lighting": "lightning flash from behind Mochi"
}

The LLM produces this JSON directly. Schema enforcement via JSON mode prevents free-form output that breaks the pipeline.
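Even with JSON mode, it can help to validate the decoded panel before handing it to image generation. The sketch below mirrors the fields of the example panel; the Go type names, the set of checks, and the rule that every speaker must be in the panel's cast are assumptions for illustration rather than ComiKola's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// maxBubbleWords reflects the "< 30 words per bubble" readability rule.
const maxBubbleWords = 30

type PanelCharacter struct {
	Name     string `json:"name"`
	Position string `json:"position"` // "left", "center", or "right"
	Pose     string `json:"pose"`
}

type DialogueLine struct {
	Character string `json:"character"`
	Text      string `json:"text"`
}

type ScriptPanel struct {
	Panel      int              `json:"panel"`
	Setting    string           `json:"setting"`
	Characters []PanelCharacter `json:"characters"`
	Dialogue   []DialogueLine   `json:"dialogue"`
	Mood       string           `json:"mood"`
	Lighting   string           `json:"lighting"`
}

// Validate checks the constraints later pipeline stages rely on.
func (p ScriptPanel) Validate() error {
	if strings.TrimSpace(p.Setting) == "" {
		return fmt.Errorf("panel %d: setting is required", p.Panel)
	}
	cast := map[string]bool{}
	for _, c := range p.Characters {
		switch c.Position {
		case "left", "center", "right":
		default:
			return fmt.Errorf("panel %d: %s has invalid position %q", p.Panel, c.Name, c.Position)
		}
		cast[c.Name] = true
	}
	for _, d := range p.Dialogue {
		// Assumed rule: a speaker must be present so the bubble tail has an anchor.
		if !cast[d.Character] {
			return fmt.Errorf("panel %d: speaker %q is not in the panel", p.Panel, d.Character)
		}
		if n := len(strings.Fields(d.Text)); n >= maxBubbleWords {
			return fmt.Errorf("panel %d: bubble has %d words, want fewer than %d", p.Panel, n, maxBubbleWords)
		}
	}
	return nil
}

// ParsePanel decodes one panel from JSON and validates it.
func ParsePanel(raw []byte) (ScriptPanel, error) {
	var p ScriptPanel
	if err := json.Unmarshal(raw, &p); err != nil {
		return ScriptPanel{}, fmt.Errorf("decode panel: %w", err)
	}
	if err := p.Validate(); err != nil {
		return ScriptPanel{}, err
	}
	return p, nil
}
```

A validation error at this stage is cheap to retry with the LLM, whereas the same mistake discovered after image generation costs a full panel render.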

Panel assembly in Go

Panel assembly (speech bubbles, borders, and layout) runs in Go. Each generated panel is handled as an image.Image and composited with the standard image/draw package:

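import (
    "image"
    "image/color"
    "image/draw"

    "github.com/golang/freetype/truetype" // assumed source of truetype.Font
)

// PanelAssembler overlays speech bubbles and a border on a generated panel.
type PanelAssembler struct {
    font    *truetype.Font // typeface used for bubble text
    bubbles BubbleRenderer // lays out bubble shape, tail, and text
}

// Assemble copies img onto a fresh RGBA canvas, draws one bubble per
// dialogue line, then frames the panel with a 4px black border.
func (a *PanelAssembler) Assemble(panel Panel, img image.Image) (image.Image, error) {
    canvas := image.NewRGBA(img.Bounds())
    // Start from the source's own origin; its bounds need not begin at (0, 0).
    draw.Draw(canvas, canvas.Bounds(), img, img.Bounds().Min, draw.Src)

    for _, d := range panel.Dialogue {
        bubble := a.bubbles.Render(d.Text, d.Character, panel.Positions)
        a.drawBubble(canvas, bubble)
    }

    a.drawBorder(canvas, 4, color.Black)
    return canvas, nil
}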

Speech bubble placement uses the character's position (left/center/right) to anchor the bubble tail. As a readability check, if the bubble would overlap the character's face, the bubble is moved up.

Temporal workflow

The full comic generation pipeline runs as a Temporal workflow for durability:

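import (
    "time"

    "go.temporal.io/sdk/workflow"
)

func ComicWorkflow(ctx workflow.Context, req ComicRequest) (ComicResult, error) {
    // Activities need a timeout; 10 minutes is a placeholder to tune per activity.
    ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
        StartToCloseTimeout: 10 * time.Minute,
    })

    var script Script
    if err := workflow.ExecuteActivity(ctx, GenerateScript, req.Prompt).Get(ctx, &script); err != nil {
        return ComicResult{}, err
    }

    // Generate character sheets in parallel: start every activity, then wait.
    // Futures live in a slice so the workflow never ranges over a map,
    // whose iteration order is random in Go.
    futures := make([]workflow.Future, len(script.Characters))
    for i, char := range script.Characters {
        futures[i] = workflow.ExecuteActivity(ctx, GenerateCharacterSheet, char)
    }
    characterSheets := map[string]string{}
    for i, char := range script.Characters {
        var path string
        if err := futures[i].Get(ctx, &path); err != nil {
            return ComicResult{}, err
        }
        characterSheets[char.Name] = path
    }

    // Generate panels sequentially (order matters for story flow)
    var panels []string
    for _, panel := range script.Panels {
        var imgPath string
        if err := workflow.ExecuteActivity(ctx, GeneratePanel, panel, characterSheets).Get(ctx, &imgPath); err != nil {
            return ComicResult{}, err
        }
        var assembled string
        if err := workflow.ExecuteActivity(ctx, AssemblePanel, panel, imgPath).Get(ctx, &assembled); err != nil {
            return ComicResult{}, err
        }
        panels = append(panels, assembled)
    }

    var finalURL string
    if err := workflow.ExecuteActivity(ctx, LayoutAndUpload, panels).Get(ctx, &finalURL); err != nil {
        return ComicResult{}, err
    }
    return ComicResult{URL: finalURL}, nil
}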

Character sheet generation runs in parallel (independent). Panel generation runs sequentially because each panel's image influences the next (style anchoring).

FAQ

What is ComiKola? ComiKola is an AI platform that generates comics and webtoons end-to-end from a story prompt — handling script writing, character design, panel image generation, and final assembly.

How does ComiKola maintain character consistency? Using a character sheet approach: canonical character images generated first and used as reference images (img2img) for every panel. For long-form content, LoRA fine-tuning encodes character identity.

What image generation model does ComiKola use? Flux and SDXL via API, with ControlNet for pose and composition control. Character sheets use Flux for maximum quality; panels use SDXL for speed/cost.

How long does a 10-panel comic take to generate? Approximately 110–170 seconds end-to-end: ~10 seconds for script, ~30 seconds for character sheets (parallel), ~60–120 seconds for 10 panel images, ~10 seconds for assembly.


Written by Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. See also: Building QuantumSketch · Temporal.io for Long-Running GenAI Workflows.