Draw in the Air: How to Create a Gesture-Controlled App with AI

Ready to create?

Ever since Tom Cruise first swiped through data on a holographic screen in Minority Report, we’ve dreamed of a future where our digital interactions are seamless, intuitive, and gesture-based. That future is closer than you think. In fact, you can build a piece of it right now, in your browser.

Today, we’re going to create a web application that lets you draw and color on a live video feed using nothing but your hand gestures. Pinch your fingers to draw, use a peace sign to pause—it’s magic. But we’re not going to write hundreds of lines of complex code from scratch. Instead, we’re going to act as a director, providing a series of creative and technical prompts to an AI assistant, guiding it to build the app for us step-by-step.

This is a new paradigm for creation, where the power lies not just in knowing how to code, but in knowing how to ask the right questions. Let’s begin.

The Pre-flight Check: What You’ll Need

Before we start directing our AI, let’s make sure our stage is set. The requirements are surprisingly simple:

  1. A Modern Web Browser: Google Chrome, Microsoft Edge, or any browser that supports modern JavaScript and has access to your camera.
  2. A Webcam: The app needs to see your hand to track it.
  3. A Text Editor: Something simple like Notepad will work, but I highly recommend a free editor like Visual Studio Code for a better experience.
  4. A Secure Context (HTTPS or localhost): For security reasons, browsers will only let a webpage access your camera from a secure context, which means an HTTPS connection or a page served from localhost. Don’t worry, I’ll show you the easiest way to set this up locally.

The Creative Spark: Prompting Our AI, Step-by-Step

Our process will be iterative. We’ll start with a core idea and then add layers of functionality, just like a real development cycle. Here are the prompts I used to generate the final code.

Prompt 1: The Foundation

First, we need the basic structure. We’ll ask the AI to create an HTML file that uses the MediaPipe Hand Landmarker model to track a user’s hand via webcam and draw a simple line when the thumb and index finger are pinched.

Prompt: “Create a single HTML file for a web application called ‘Hand Drawing & Coloring App’. Use the MediaPipe Hand Landmarker from a CDN to perform real-time hand tracking from the user’s webcam. The app should have two main components: a live video feed on the left and a drawing canvas on the right. When the user pinches their thumb and index finger, it should draw a line on the canvas following the movement of their index finger.”

The AI will generate the initial code, which sets up the camera, loads the MediaPipe model, and implements the core drawing logic based on the distance between your fingertips.
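If you want to sanity-check what comes back, the setup usually has roughly this shape. Treat the following as a minimal sketch rather than the AI’s exact output: the CDN version, model URL, and the handleHand() helper are my assumptions, and the script needs to run as a <script type="module"> for the import and top-level await to work.

```javascript
import { FilesetResolver, HandLandmarker } from
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.3";

// Load the WASM runtime and the hand landmark model from the CDN.
const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.3/wasm");
const landmarker = await HandLandmarker.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: "https://storage.googleapis.com/mediapipe-models/" +
      "hand_landmarker/hand_landmarker/float16/1/hand_landmarker.task"
  },
  runningMode: "VIDEO",
  numHands: 1
});

// Start the webcam (the <video> element should have autoplay, muted,
// and playsinline attributes) and run detection once per animation frame.
const video = document.getElementById("webcam");
video.srcObject = await navigator.mediaDevices.getUserMedia({ video: true });
video.addEventListener("loadeddata", function loop() {
  const result = landmarker.detectForVideo(video, performance.now());
  if (result.landmarks.length > 0) {
    handleHand(result.landmarks[0]); // hypothetical helper: 21 normalized points
  }
  requestAnimationFrame(loop);
});
```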

Prompt 2: Adding the Artist’s Toolkit

A single red line is fun, but artists need more. Let’s ask for a sidebar with colors, an eraser, and brush size controls.

Prompt: “Update the previous code. Move the drawing canvas to cover the entire video feed area, and make the video feed a semi-transparent background. Create a sidebar on the left. This sidebar should contain:

  1. A grid of selectable color swatches.
  2. ‘Pen’ and ‘Eraser’ tool buttons.
  3. A slider to control the brush size.
  4. A ‘Clear All’ button.”

With this, our app becomes a functional tool. The AI will add the HTML elements for the sidebar and the JavaScript logic to handle tool selection, color changes, and brush size adjustments.
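The element IDs and class names below are illustrative, not necessarily what the AI will emit, but the pattern is the same: a small shared state object that the gesture loop consults before drawing each stroke segment (here drawCanvas and drawCtx are assumed to be the drawing canvas and its 2D context).

```javascript
// Shared drawing state, read by the gesture loop on every frame.
const state = { tool: "pen", color: "#e11d48", size: 6 };

document.querySelectorAll(".swatch").forEach(swatch =>
  swatch.addEventListener("click", () => {
    state.color = swatch.dataset.color;
    state.tool = "pen"; // picking a color switches back to the pen
  }));
document.getElementById("eraser").addEventListener("click",
  () => state.tool = "eraser");
document.getElementById("brush-size").addEventListener("input",
  e => state.size = Number(e.target.value));
document.getElementById("clear-all").addEventListener("click",
  () => drawCtx.clearRect(0, 0, drawCanvas.width, drawCanvas.height));

// Called just before each stroke segment is drawn.
function applyBrush(ctx) {
  ctx.lineWidth = state.size;
  ctx.lineCap = "round";
  ctx.strokeStyle = state.color;
  // The eraser is just a stroke that punches transparent holes in the layer.
  ctx.globalCompositeOperation =
    state.tool === "eraser" ? "destination-out" : "source-over";
}
```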

Prompt 3: Adding Coloring Pages and Saving

To make the app more engaging, let’s add pre-made templates to color in. And crucially, we need a way to save our masterpieces.

Prompt: “Now, add a new section to the sidebar called ‘Coloring Pages’ with buttons for ‘Apple’, ‘Car’, and ‘House’. When a button is clicked, it should draw the outline of that object on the canvas. Also, add a ‘Save Picture’ button to the tools section that downloads the content of the drawing canvas as a PNG file.”
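Behind those buttons there’s no special machinery: a coloring page is simply an outline stroked onto the canvas, and saving leans on the built-in canvas.toDataURL(). A rough sketch, with made-up coordinates and element IDs:

```javascript
// A coloring page is just an outline drawn onto the same canvas.
function drawHousePage(ctx) {
  ctx.strokeStyle = "#333";
  ctx.lineWidth = 3;
  ctx.strokeRect(150, 250, 300, 200);   // walls
  ctx.beginPath();                      // roof
  ctx.moveTo(130, 250);
  ctx.lineTo(300, 130);
  ctx.lineTo(470, 250);
  ctx.stroke();
}

// "Save Picture": serialize the drawing canvas to a PNG and download it.
document.getElementById("save").addEventListener("click", () => {
  const link = document.createElement("a");
  link.download = "my-drawing.png";
  link.href = drawCanvas.toDataURL("image/png");
  link.click();
});
```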

Prompt 4: The Final Polish

The last feature is the coolest. Let’s combine our drawing with the real world by including the camera feed in the final saved image.

Prompt: “Finally, update the ‘Save Picture’ function. When the user saves the image, it should include the camera’s video feed as the background, with the drawing layered on top. The final downloaded image should look exactly like what the user sees on the screen.”

This final prompt results in the complete code provided previously. The AI handles the complexity of compositing two canvases—one for the video and one for the drawing—into a single, downloadable image.
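Conceptually, the compositing step paints the current video frame onto an offscreen canvas, then paints the drawing layer over it. A sketch of the idea (if your on-screen feed is mirrored, you’d also flip the video frame here):

```javascript
// Updated save: camera frame first, strokes layered on top.
function savePicture() {
  const out = document.createElement("canvas");
  out.width = drawCanvas.width;
  out.height = drawCanvas.height;
  const ctx = out.getContext("2d");
  ctx.drawImage(video, 0, 0, out.width, out.height); // background: live frame
  ctx.drawImage(drawCanvas, 0, 0);                   // foreground: the drawing
  const link = document.createElement("a");
  link.download = "hand-drawing.png";
  link.href = out.toDataURL("image/png");
  link.click();
}
```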

Bringing It to Life: How to Run the App

  1. Create the File: Open your text editor, create a new file, and save it as hand_gesture_track_draw.html.
  2. Copy & Paste: Copy the final, complete code and paste it into this file.
  3. Serve it Securely (The Easy Way):
    • If you’re using Visual Studio Code, install the “Live Server” extension.
    • Once installed, right-click your HTML file inside VS Code and select “Open with Live Server.”
    • This will automatically open the file in your browser from a local server on localhost, which browsers treat as a secure context, so camera access is allowed.
  4. Grant Permissions: Your browser will ask for permission to use your webcam. Click “Allow.”

The Grand Unveiling: A Preview of Your Creation

Once running, you’ll be greeted by a sleek interface. The left sidebar is your control panel, glowing with colors and tools. The main area is your canvas, a window into your world with a semi-transparent view from your webcam.

Raise your hand. You’ll see it instantly outlined with a series of lines and dots: the 21 landmarks the AI model is tracking in real time. It’s mesmerizing.

Now, pinch your thumb and index finger together. A colored line appears, as if by magic, right at your fingertip. Move your hand, and the line follows. To stop drawing, simply un-pinch your fingers. Make a peace sign (✌️), and the app will recognize it as a “pause” gesture, stopping the drawing input entirely. It feels incredibly intuitive because it is.
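How might the peace sign be recognized? The app’s exact rule isn’t reproduced here, but a typical heuristic over the same 21 landmarks checks that the index and middle fingers are extended (tip above the middle joint) while the ring and pinky are curled. A guess at the shape of it:

```javascript
// Tip/PIP landmark pairs per finger; with an upright hand,
// a smaller y coordinate means higher on screen.
function isPeaceSign(lm) {
  const extended = (tip, pip) => lm[tip].y < lm[pip].y;
  return extended(8, 6) &&    // index extended
         extended(12, 10) &&  // middle extended
         !extended(16, 14) && // ring curled
         !extended(20, 18);   // pinky curled
}
```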

[Demo image: a user pinching their fingers to draw a bird on the canvas, with the live video feed in the background]

How the Magic Works: A Peek Under the Hood

While we directed the AI, it made some smart technical choices. The core of this application is MediaPipe Hand Landmarker, a powerful model from Google. It works by:

  1. Detecting a Palm: First, it quickly scans the video to find the general area of a hand.
  2. Mapping the Landmarks: Once it finds a hand, it applies a more detailed model to pinpoint 21 specific key points—your knuckles, joints, and fingertips.
  3. Calculating Gestures: The “pinch” gesture isn’t a built-in feature. The code simply calculates the distance between the landmarks for your thumb tip (landmark #4) and your index fingertip (landmark #8). If that distance falls below a certain threshold, the app knows you’re pinching. It’s clever, simple geometry.
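In code, that check is just a few lines over the normalized coordinates MediaPipe returns. The threshold below is my guess; in practice you tune it by hand:

```javascript
const THUMB_TIP = 4, INDEX_TIP = 8;
const PINCH_THRESHOLD = 0.06; // normalized units; adjust for your camera

function isPinching(landmarks) {
  const dx = landmarks[THUMB_TIP].x - landmarks[INDEX_TIP].x;
  const dy = landmarks[THUMB_TIP].y - landmarks[INDEX_TIP].y;
  return Math.hypot(dx, dy) < PINCH_THRESHOLD;
}
```

When isPinching() flips from false to true, the app starts a new stroke; when it flips back, the stroke ends.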

The New Frontier of Creation

In less than an hour, without writing a single line of code from memory, we’ve built an interactive, AI-powered application that feels like science fiction. This process demonstrates a monumental shift. Your role as a creator is evolving from a meticulous coder into a visionary director. You provide the intent, the features, and the creative goals, and the AI handles the complex syntax and boilerplate.

The next masterpiece or groundbreaking application might not be born from a keyboard, but from a conversation.

You can try out the live hand gesture drawing demo right here!

So, what will you build next?
