Open Source AI Canvas: My Vibecoding Journey

Author: Xiao Huaihua · Live Site: xhh-drawing-board.vibevibe.cn

How It Started

The spark came from a Datawhale article about AI image generation. It showed how AI could create images from text prompts — and I immediately thought: what if there was a dedicated canvas where you could combine drawing with AI generation in real time?

I explored several existing workflow tools — ComfyUI, various Gradio-based apps, even some commercial platforms — but none of them felt like a true "canvas" experience. They were either too complex for casual users or too rigid for creative exploration. I wanted something that felt like opening a sketchpad, except AI was your co-artist.

So I decided to build one myself, with Gemini as the AI backbone.

The First Attempt: One-Shot Chaos

My initial approach was ambitious: describe the entire application to Gemini in one massive prompt and let it generate everything at once.

The result? Chaos.

The generated code was a tangled mess — drawing logic mixed with API calls, UI state scattered everywhere, event handlers conflicting with each other. When something broke, fixing one thing would break two others.

I was on a deadline for a community showcase, and found myself debugging at 4 AM, desperately trying to untangle spaghetti code that neither I nor the AI could fully reason about anymore.

Lesson learned: one-shot complex implementation is a trap. The AI can generate a lot of code, but without structure, you get a lot of bad code.

The Pivot: Docs First, Small Iterations

After that painful night, I completely changed my strategy:

1. Generate Detailed Dev Docs First

Before writing a single line of code, I had Gemini help me create comprehensive development documentation:

Feature specifications for each module
Data flow diagrams
API interface definitions
Component responsibility boundaries

2. Break Into Versioned Components

Instead of one monolithic app, I split the project into clearly versioned components:

v0.1: Basic canvas with freehand drawing
v0.2: Shape tools (rectangles, circles, lines)
v0.3: AI image generation integration
v0.4: Layer management
v0.5: Export and sharing

3. Single Focus Per Iteration

Each version addressed exactly one concern. No "while we're at it" scope creep. If I was working on shape tools, I wasn't touching the AI integration — even if I had ideas for it.

4. Version Control on Docs

This was the unexpected game-changer. I version-controlled the development docs alongside the code. When the AI needed context for a new feature, I could feed it the relevant doc version and get coherent output that fit the existing architecture.

Wrestling with Gemini's API

One of the trickiest parts was the API integration. Gemini's multimodal API has a non-standard format compared to what most tutorials cover:

Image inputs need specific encoding and MIME type handling
The response format for image generation tasks differs from text completion
Rate limiting and error handling required careful retry logic
Token counting for mixed text-and-image prompts isn't straightforward

I spent considerable time reading API documentation and experimenting with different request formats. The AI helped me write the integration code, but I had to guide it with precise API specs — it couldn't guess the correct format from general knowledge alone.

Design Philosophy: Accessibility First

Two key design choices shaped the final product:

Pure Frontend HTML

I deliberately built the canvas as a pure frontend application — no backend server required. Just HTML, CSS, and JavaScript. This means:

Anyone can open the link and start drawing immediately
No account creation, no installation, no dependencies
Works on any device with a modern browser
Easy to fork and self-host

API-Friendly Architecture

The AI integration is cleanly abstracted behind an API layer. Users bring their own API key, which stays in their browser's local storage. The architecture makes it trivial to swap Gemini for another provider in the future.

What I'd Do Differently

If I started over today, I'd follow the docs-first approach from day one. The time "lost" writing specifications is repaid tenfold in coherent AI output and maintainable code.

I'd also set up automated tests earlier. The canvas has a lot of interactive state, and manual testing became a bottleneck as features accumulated.

Try It Out

The project is open source. Visit xhh-drawing-board.vibevibe.cn to try the live version, or check out the source code to see how it all fits together.

Building this taught me that Vibe Coding isn't about letting AI do everything — it's about creating the right structure so that AI can do its best work within clear boundaries.

Open Source AI Canvas: My Vibecoding Journey ​

How It Started ​

The First Attempt: One-Shot Chaos ​

The Pivot: Docs First, Small Iterations ​

1. Generate Detailed Dev Docs First ​

2. Break Into Versioned Components ​

3. Single Focus Per Iteration ​

4. Version Control on Docs ​

Wrestling with Gemini's API ​

Design Philosophy: Accessibility First ​

Pure Frontend HTML ​

API-Friendly Architecture ​

What I'd Do Differently ​

Try It Out ​