Watch language models compete head-to-head on crossword clues, with results streamed as each model answers.
Crossword Sprint AI is a real-time benchmarking platform that pits multiple language models against each other in timed crossword challenges. Each model receives the same clues simultaneously and races to provide correct answers as quickly as possible.
The system scores each attempt based on both accuracy and speed, providing a comprehensive view of model performance under constrained output conditions (typical crossword answers are 3-15 characters).
The system uses Server-Sent Events (SSE) to stream race updates to the client in real time. Each model attempt is sent individually as soon as it completes, rather than waiting for all models to finish.
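As a rough illustration of what one streamed update could look like on the wire, here is a minimal sketch of SSE framing. The `AttemptEvent` shape, its field names, and the `attempt` event name are assumptions made for this example, not the project's actual schema. Only the `event:` / `data:` / blank-line framing comes from the SSE format itself.

```typescript
// Hypothetical payload for one model attempt (field names are assumptions).
interface AttemptEvent {
  model: string;
  answer: string;
  correct: boolean;
  latencyMs: number;
}

// Serialize one event in the text/event-stream format:
// an "event:" line, a "data:" line, then a blank line that ends the event.
// JSON.stringify without indentation never emits raw newlines,
// so the payload always fits on a single "data:" line.
function formatSseEvent(eventName: string, payload: AttemptEvent): string {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

A server handler would write each returned string to a response whose `Content-Type` is `text/event-stream`, and the browser would read the events with `EventSource`.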
All models process each clue simultaneously using Promise.all(), with individual completion callbacks that trigger SSE updates the moment each model finishes.
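The pattern described above can be sketched as follows. This is a minimal sketch, not the project's code: `raceModels`, `ModelRunner`, `Attempt`, and the `onAttempt` callback are hypothetical names, and each runner stands in for a real model call. The callback is where an SSE update would be sent.

```typescript
// Hypothetical stand-in for a model call: takes a clue, resolves to an answer.
type ModelRunner = (clue: string) => Promise<string>;

interface Attempt {
  model: string;
  answer: string | null; // null when the model call failed
  latencyMs: number;
}

// Start every model at once; report each attempt the moment it settles.
async function raceModels(
  clue: string,
  runners: Record<string, ModelRunner>,
  onAttempt: (attempt: Attempt) => void,
): Promise<Attempt[]> {
  const tasks = Object.entries(runners).map(async ([model, run]) => {
    const start = Date.now();
    let answer: string | null = null;
    try {
      answer = await run(clue);
    } catch {
      // Swallow the error so one failing model does not reject Promise.all.
    }
    const attempt: Attempt = { model, answer, latencyMs: Date.now() - start };
    onAttempt(attempt); // fires per model, before the others finish
    return attempt;
  });
  // Resolves once all models are done; results keep the input order.
  return Promise.all(tasks);
}
```

Because each task catches its own errors, `Promise.all` here never short-circuits on a single failure. `Promise.allSettled` would be an alternative way to get the same guarantee.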
Each attempt is measured for:
Models are scored using a time-weighted accuracy system that rewards both correctness and speed:
In Wordle Mode, multiple AI models race to solve the same 5-letter word puzzle. Each model gets up to 6 guesses, and after each guess it receives Wordle-style feedback:
Each AI model receives a carefully crafted prompt that includes:
Example prompt structure:

    You are playing Wordle. Guess a 5-letter English word.

    Rules:
    - You have up to 6 guesses total
    - After each guess, you'll get feedback:
      * Green (correct): letter is in the word and in the correct position
      * Yellow (present): letter is in the word but in a different position
      * Gray (absent): letter is not in the word at all
    - Output ONLY a single 5-letter lowercase word, nothing else

    Previous guesses and feedback:
    Guess 1: CRANE 🟨⬜⬜⬜🟨
    Guess 2: STORM ⬜⬜⬜⬜⬜

    Your next guess (output only the 5-letter word):
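The green/yellow/gray rules in the prompt can be computed with a two-pass check. The sketch below is one common way to do it, not necessarily how this project does it. `scoreGuess`, `renderMarks`, and the `Mark` type are hypothetical names. The two-pass approach follows standard Wordle behavior for repeated letters, which the prompt itself does not spell out.

```typescript
type Mark = "correct" | "present" | "absent";

// Compare a guess against the target word and return one mark per letter.
function scoreGuess(guess: string, target: string): Mark[] {
  const g = guess.toLowerCase();
  const t = target.toLowerCase();
  if (g.length !== t.length) {
    throw new Error("guess and target must have the same length");
  }
  const marks = new Array<Mark>(g.length).fill("absent");
  const remaining = new Map<string, number>();

  // Pass 1: mark exact matches and count the target letters left unmatched.
  for (let i = 0; i < t.length; i++) {
    if (g[i] === t[i]) {
      marks[i] = "correct";
    } else {
      remaining.set(t[i], (remaining.get(t[i]) ?? 0) + 1);
    }
  }

  // Pass 2: a non-green letter is "present" only while unmatched copies remain.
  for (let i = 0; i < g.length; i++) {
    if (marks[i] === "correct") continue;
    const left = remaining.get(g[i]) ?? 0;
    if (left > 0) {
      marks[i] = "present";
      remaining.set(g[i], left - 1);
    }
  }
  return marks;
}

// Render marks as the emoji row used in the prompt's guess history.
const EMOJI: Record<Mark, string> = { correct: "🟩", present: "🟨", absent: "⬜" };

function renderMarks(marks: Mark[]): string {
  return marks.map((m) => EMOJI[m]).join("");
}
```

Counting unmatched letters first prevents a repeated letter in the guess from being marked yellow more times than it occurs in the target.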
All models play simultaneously, and you see their guesses appear in real time via Server-Sent Events (SSE). Each model's board updates as it makes guesses, showing its progress toward solving the puzzle.
Models are ranked based on:
Models that fail to solve the puzzle within 6 guesses are ranked below all successful solvers.
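A ranking comparator that honors this rule might look like the sketch below. Only the "unsolved ranks below every solver" rule comes from the text above. The tie-breakers among solvers (fewer guesses first, then lower total time) and the `WordleResult` fields are assumptions made for illustration.

```typescript
// Hypothetical result shape; field names are assumptions.
interface WordleResult {
  model: string;
  solved: boolean;
  guesses: number; // guesses used, 1 to 6
  totalTimeMs: number; // total time spent across all guesses
}

// Return a new array sorted best-first without mutating the input.
function rankResults(results: WordleResult[]): WordleResult[] {
  return [...results].sort((a, b) => {
    // Stated rule: every solver ranks above every non-solver.
    if (a.solved !== b.solved) return a.solved ? -1 : 1;
    // Assumed tie-breakers: fewer guesses, then less total time.
    if (a.guesses !== b.guesses) return a.guesses - b.guesses;
    return a.totalTimeMs - b.totalTimeMs;
  });
}
```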
This project was created for the Web Summit Hackathon, inspired by Groq's impressive AI inference speed demonstrations. The goal was to build a visual, interactive way to compare multiple language models racing against each other in real-time.
Special thanks to Vercel and v0 for making this possible. Built entirely with v0's AI code generation and deployed on Vercel's infrastructure with the AI SDK powering real-time model streaming.
Built by George Jefferson