Wanna add Vision capabilities like image identification to your app?

Let's build a simple Pictionary game using OpenAI's Vision API and Python. In this tutorial, we'll create a web app where you can draw anything, and GPT-4o will try to guess what it is.

What we're building

A straightforward web application where:

  • You draw on a canvas
  • Your drawing gets sent to OpenAI's Vision API for analysis
  • GPT-4o responds with a one-word guess

The tech stack

  • Python with Flask for the backend
  • Simple HTML canvas for drawing
  • OpenAI's Vision API (GPT-4o)
  • Basic JavaScript for handling drawings

The code

Here's the core Python code that makes it all work:

from flask import Flask, request, jsonify
from flask_cors import CORS 

from dotenv import load_dotenv
from openai import OpenAI

import os
import re

load_dotenv()

app = Flask(__name__)
CORS(app, resources={r"/submit-drawing": {"origins": "*"}}, methods=["POST", "OPTIONS"])


api_key = os.getenv("OPENAI_API_KEY")

client = OpenAI(
    api_key=api_key,
)

@app.route('/submit-drawing', methods=['POST', 'OPTIONS'])
def submit_drawing():
    if request.method == 'OPTIONS':
        return jsonify({"message": "CORS preflight request successful"}), 200

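    # silent=True returns None instead of raising when the body is missing or not valid JSON
    data = request.get_json(silent=True)
    image_data = data.get('image') if isinstance(data, dict) else None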

    if not image_data:
        return jsonify({"error": "No image data provided"}), 400

    img_data_match = re.match(r'data:(image/.*?);base64,(.*)', image_data)
    
    if not img_data_match:
        return jsonify({"error": "Invalid image data format"}), 400

    img_type, img_b64_str = img_data_match.groups()

    # Define the prompt to process the image
    prompt = "Analyze this image and guess what it is in a single word."

    try:
        response = client.chat.completions.create(
            model="gpt-4o-2024-08-06",  # alternatives: gpt-4o, gpt-4o-mini
            messages=[
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:{img_type};base64,{img_b64_str}"},
                        },
                    ],
                }
            ],
        )

        guess = response.choices[0].message.content

        # Return the response as JSON
        return jsonify({"guess": guess})

    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    # debug=True enables the reloader and debugger; use it for local development only
    app.run(debug=True)

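That's your app.py. Don't forget to put your OPENAI_API_KEY in a .env file next to app.py (load_dotenv() reads it at startup) and to install the packages listed in your requirements.txt, for example with pip install -r requirements.txt: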

flask
openai
python-dotenv
flask-cors
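
Before testing, it can help to see what the regular expression in submit_drawing does with the data URL the canvas sends. The sketch below reuses the same pattern outside Flask. The helper name split_data_url and the sample bytes are hypothetical, introduced only for illustration:

```python
import base64
import re

# Same pattern as in app.py: capture the MIME type and the base64 payload.
DATA_URL_RE = re.compile(r"data:(image/.*?);base64,(.*)")


def split_data_url(data_url):
    """Return (mime_type, base64_string), or None if the format does not match."""
    match = DATA_URL_RE.match(data_url)
    if not match:
        return None
    return match.groups()


# Build a stand-in data URL; a real one would come from the canvas in the browser.
sample = "data:image/png;base64," + base64.b64encode(b"fake-bytes").decode("ascii")
img_type, img_b64 = split_data_url(sample)
print(img_type)  # the MIME type part of the data URL
```

Anything that does not start with data:image/ followed by ;base64, returns None here, which corresponds to the 400 "Invalid image data format" response in app.py.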

Testing this out

In your terminal, run python app.py, then open index.html in your browser and start drawing!
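
If you'd like to exercise the endpoint without the browser, here is a minimal sketch using only the standard library. It assumes the server is running on Flask's default address, http://127.0.0.1:5000. The function names build_payload and post_drawing are hypothetical helpers, not part of the app:

```python
import base64
import json
import urllib.request


def build_payload(image_bytes, mime_type="image/png"):
    """Wrap raw image bytes in the data-URL JSON body that /submit-drawing expects."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {"image": f"data:{mime_type};base64,{b64}"}


def post_drawing(payload, url="http://127.0.0.1:5000/submit-drawing"):
    """POST the payload as JSON and return the decoded JSON response.

    Assumes app.py is running locally; urlopen raises HTTPError on 4xx/5xx replies.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, something like post_drawing(build_payload(open("car.png", "rb").read())) should return a dictionary with a "guess" key, where car.png stands in for any drawing you have saved locally.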

[Image: a drawing of a car. Caption: My beautiful car drawing]

Complete source code

Here's the full front-end source code for index.html, script.js, and style.css:

Paste the following into your index.html:
