Your First LLM — N-gram Language Model Playground

Theory // how it works

What is an N-gram?

An n-gram is a sequence of N consecutive words. Given the previous (N−1) words, the model predicts the next one using frequency tables built from the corpus.

"the city" →
  was: 9
  had: 6
  seemed: 3
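
The counts above can be read as probabilities by dividing each one by the total for that context. A minimal sketch, assuming the three counts shown are the complete entry for "the city"; the helper name toProbabilities is hypothetical and not part of the page's source:

```javascript
// Hypothetical helper: turn one context's { word: count } entry into
// { word: probability } by dividing each count by the row total.
function toProbabilities(counts) {
  const total = Object.values(counts).reduce((sum, c) => sum + c, 0);
  const probs = {};
  for (const [word, count] of Object.entries(counts)) {
    probs[word] = count / total;
  }
  return probs;
}

// Counts copied from the "the city" example above (9 + 6 + 3 = 18).
const theCity = { was: 9, had: 6, seemed: 3 };
const theCityProbs = toProbabilities(theCity);

console.log(theCityProbs.was);    // 0.5, i.e. 9 of 18
console.log(theCityProbs.had);    // 6 of 18, about one third
console.log(theCityProbs.seemed); // 3 of 18, about one sixth
```

Sampling in proportion to the raw counts is equivalent to sampling from these probabilities, so a model does not strictly need to store the normalized values.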

How it differs from GPT

GPT uses a Transformer neural network whose attention spans thousands of tokens. This page uses a frequency table with only 2–4 words of context. The core idea is the same (predict what comes next; GPT predicts sub-word tokens rather than whole words), but the scale and architecture are radically different.
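
To make the limited context concrete: an n-gram model conditions only on the last (N−1) words, so two very different histories that end the same way look identical to it. The sketch below is illustrative only; contextKey is a hypothetical helper, and the two sentences are made up rather than taken from the corpus.

```javascript
// Hypothetical helper: the context an n-gram model conditions on is
// just the last (n - 1) words; everything earlier is ignored.
// Assumes n >= 2.
function contextKey(words, n) {
  return words.slice(-(n - 1)).join(" ");
}

// Two made-up histories (not from the corpus) that end the same way.
const historyA = "martin walked slowly through the city".split(" ");
const historyB = "elena had never liked the city".split(" ");

const keyA = contextKey(historyA, 3);
const keyB = contextKey(historyB, 3);

console.log(keyA);          // the city
console.log(keyA === keyB); // true: the model cannot tell the histories apart
```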

About this corpus

The training text is a short fictional narrative (~24,000 characters) following several characters — Martin, Clara, Elena, Thomas, Nadia, Gabriel, Irene — across short scenes about cities, decisions, and everyday life.

It's not random text. It has consistent characters, settings, and recurring phrases, which is exactly why the model can learn patterns from it — real sentence structure, not noise.

Good seeds to try:
"the city" · "martin smiled"
"elena watched" · "the project"

How to use this page

Pick an N-gram size in the top bar. A higher N gives more coherent text but needs a longer seed.
Pick how many words to generate.
Type a seed of at least (N−1) words. The last (N−1) words of the seed must appear together in the corpus.
Press RUN — watch the code execute step by step.
Read the flowchart alongside to follow the algorithm.
The output appears in the status bar when done.
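
The seed rule in the steps above can be checked before running anything. The following is a rough sketch rather than the page's own validation: checkSeed and toyTable are hypothetical names, and the toy table holds only the "the city" entry from the earlier example.

```javascript
// Hypothetical check: is this seed usable for an n-gram table of size n?
function checkSeed(seed, table, n) {
  const words = seed.toLowerCase().split(/\s+/).filter(w => w.length > 0);
  if (words.length < n - 1) {
    return { ok: false, reason: `seed needs at least ${n - 1} word(s) for n=${n}` };
  }
  // Only the last (n - 1) words matter: they form the first lookup key.
  const ctx = words.slice(-(n - 1)).join(" ");
  if (!Object.prototype.hasOwnProperty.call(table, ctx)) {
    return { ok: false, reason: `"${ctx}" is not a context in the table` };
  }
  return { ok: true, context: ctx };
}

// Toy table with a single context, using the counts from the example above.
const toyTable = { "the city": { was: 9, had: 6, seemed: 3 } };

console.log(checkSeed("The city", toyTable, 3));  // ok: true
console.log(checkSeed("city", toyTable, 3));      // ok: false (too short)
console.log(checkSeed("the river", toyTable, 3)); // ok: false (unknown context)
```

A longer seed is fine as long as its final (N−1) words form a known context.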

Live code execution // pauses are real, in the source

Algorithm flow // follows the code


// ============================================================
// YOUR FIRST LLM: Complete source code (this whole file)
// ============================================================
// This is every line of JavaScript running in this page.
// The model code (Sections 1–4) is what gets executed during RUN,
// the same code shown in the main "Live code execution" panel.
// The rest is UI logic: state, rendering, highlighting, modal, etc.
// Pauses in generate() are REAL: they make the highlight honest.
// View page source (Ctrl+U) to confirm nothing is hidden.

// ============================================================
// SECTION 1: TOKENIZE
// Convert raw text into an array of lowercase words.
// "The city was quiet." → ["the","city","was","quiet"]
// ============================================================
function tokenize(text) {
  text = text.toLowerCase();
  const punct = [".", ",", ";", ":", "!", "?", "(", ")", "\"", "\n"];
  // Replace each punctuation mark with a space so words stay separated.
  punct.forEach(s => text = text.split(s).join(" "));
  return text.split(/\s+/).filter(t => t.length > 0);
}

// ============================================================
// SECTION 2: BUILD FREQUENCY TABLE (THE MODEL)
// Scan the tokens once. For every position, record which word
// follows each (n-1)-word context, and how many times.
// The resulting nested object IS the model.
// ============================================================
function buildTable(tokens, n) {
  const table = {};
  for (let i = 0; i < tokens.length - (n - 1); i++) {
    const ctx = tokens.slice(i, i + n - 1).join(" ");
    const next = tokens[i + n - 1];
    if (!table[ctx]) table[ctx] = {};
    table[ctx][next] = (table[ctx][next] || 0) + 1;
  }
  return table;
}

// ============================================================
// SECTION 3: WEIGHTED RANDOM CHOICE
// Pick one word from { word: count } randomly but proportional
// to count. Frequent words have higher chance; any can win.
// This is what makes generation non-deterministic.
// ============================================================
function weightedChoice(map) {
  const words = Object.keys(map);
  const total = words.reduce((s, w) => s + map[w], 0);
  let r = Math.random() * total;
  for (const w of words) {
    r -= map[w];
    if (r <= 0) return w;
  }
  return words[0]; // fallback; only reachable through rounding error
}

// ============================================================
// SECTION 4: GENERATE TEXT WORD BY WORD
// Markov chain: each new word depends only on the last (n-1).
// The await pause(500) is REAL: it makes the highlight honest.
// ============================================================
async function generate(seed, table, n, wordCount) {
  // Tokenize the seed the same way as the corpus, so extra spaces or
  // punctuation in the seed cannot produce a context that never matches.
  let result = tokenize(seed);
  for (let i = 0; i < wordCount - (n - 1); i++) {
    const ctx = result.slice(-(n - 1)).join(" ");
    if (!table[ctx]) break; // unseen context: nothing to sample from
    result.push(weightedChoice(table[ctx]));
    await pause(500);
  }
  return result.join(" ");
}

// ============================================================
// SECTION 5: PAUSE HELPER (cancellable)
// Returns a promise that resolves after ms milliseconds,
// or rejects if stopRequested is set to true.
// Polls every 30ms so STOP responds quickly.
// ============================================================
function pause(ms) {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    const check = setInterval(() => {
      if (stopRequested) {
        clearInterval(check);
        reject(new Error('Stopped by user'));
      } else if (Date.now() - start >= ms) {
        clearInterval(check);
        resolve();
      }
    }, 30);
  });
}

// ============================================================
// SECTION 6: STATE & CONSTANTS
// Globals used by the UI. CORPUS comes from corpus-en.js
// (loaded via separate <script> tag in the HTML head).
// ============================================================
const CORPUS = typeof CORPUS_EN !== 'undefined' ? CORPUS_EN : '';
let table = null;          // frequency table built from corpus
let tokens = [];           // tokenized corpus
let isRunning = false;     // guard against double RUN clicks
let stopRequested = false; // flag read by pause()

// ============================================================
// SECTION 7: CODE PANEL & FLOWCHART DEFINITIONS
// These arrays describe what to show in the middle and right
// panels. They are pure data, no logic.
// ============================================================
const CODE_LINES = [
  // 44 entries, one per code line shown in the middle panel.
  // Each is { code: "html-escaped string with syntax highlighting" }.
];

const FLOW_NODES = [
  { id: 'tokenize',       label: 'Step 1', title: 'Tokenize corpus' },
  { id: 'buildTable',     label: 'Step 2', title: 'Initialize table' },
  { id: 'buildTableLoop', label: 'Step 3', title: 'Scan all tokens', loop: true },
  { id: 'weighted',       label: 'Step 4', title: 'Weighted choice ready' },
  { id: 'generate',       label: 'Step 5', title: 'Start generation' },
  { id: 'generateLoop',   label: 'Step 6', title: 'Generate word by word', loop: true },
];

// ============================================================
// SECTION 8: RENDERING
// Builds the DOM for the code panel and the flowchart panel
// on page load. Both are static: they don't change at runtime.
// ============================================================
function renderCode() {
  const scroll = document.getElementById('codeScroll');
  CODE_LINES.forEach((line, i) => {
    const div = document.createElement('div');
    div.className = 'code-line';
    div.id = 'line' + i;
    div.innerHTML = `<span class="ln">${i+1}</span>${line.code || ' '}`;
    scroll.appendChild(div);
  });
}

function renderFlow() { /* similar: builds boxes + arrows from FLOW_NODES */ }

// ============================================================
// SECTION 9: HIGHLIGHTING
// As the instrumented run progresses, these functions mark the
// current line in the code panel and the current step in the
// flowchart. They also auto-scroll to keep them in view.
// ============================================================
function highlightLine(index) {
  document.querySelectorAll('.code-line.cur').forEach(el => el.classList.remove('cur'));
  const el = document.getElementById('line' + index);
  if (el) {
    el.classList.add('cur');
    // scroll the highlighted line into the middle of the viewport
    const scroll = document.getElementById('codeScroll');
    const target = el.offsetTop - scroll.clientHeight / 2 + el.clientHeight / 2;
    scroll.scrollTo({ top: target, behavior: 'smooth' });
  }
}

let lastFlowStep = null;
function highlightFlow(stepId) { /* marks current step active, previous one done */ }

// ============================================================
// SECTION 10: INSTRUMENTED RUN
// This is the function that actually executes when RUN is pressed.
// It mirrors the model code line by line, calling highlightLine()
// and highlightFlow() with REAL await pause() calls in between.
// The pauses are why the highlight follows execution honestly.
// ============================================================
async function runInstrumented(seed, n, wordCount) {
  const LINE_MS = 500; // 2 lines per second

  // SECTION 1: Tokenize (lines 0–6)
  highlightLine(0);
  await pause(LINE_MS);
  highlightLine(1);
  highlightFlow('tokenize');
  await pause(LINE_MS);

  // ...steps continue through all 4 sections, with real work
  // happening at each step (the actual tokenize, buildTable, etc.)
  // SECTION 2: Build table (lines 8–18)
  // SECTION 3: Weighted choice (lines 20–31)
  // SECTION 4: Generate (lines 33–44). This is where the loop
  // actually iterates wordCount times, each call to
  // weightedChoice happening between line highlights.

  return { text: result.join(' ') };
}

// ============================================================
// SECTION 11: STATUS BAR + RESULT BANNER
// Updates the indicators at the bottom of the page.
// Shows the green completion banner with copy button on success,
// or a red error banner if the seed isn't in the corpus.
// ============================================================
function setStatus(text, state = 'ready') { /* updates dot + label */ }
function updateStatus(field, value) { /* updates tokens/contexts counters */ }
function showBanner(text, isError) { /* shows the green/red banner */ }
function dismissBanner() { /* hides the banner */ }
function copyResult() { /* clipboard copy with feedback */ }

// ============================================================
// SECTION 12: RUN / STOP HANDLERS
// Triggered by the buttons in the top control bar.
// ============================================================
async function runModel() {
  if (isRunning) return;
  isRunning = true;
  stopRequested = false;

  // Range inputs report strings; parse them as base-10 integers.
  const n = parseInt(document.getElementById('nRange').value, 10);
  const wordCount = parseInt(document.getElementById('wordsRange').value, 10);
  const seed = document.getElementById('seedInput').value.trim();

  // Validate seed length for the current n.
  if (seed.split(' ').filter(x => x).length < n - 1) {
    showBanner(`Seed needs at least ${n-1} word(s) for n=${n}`, true);
    isRunning = false;
    return;
  }

  // Lock UI, reset displays, run the instrumented model.
  document.getElementById('runBtn').disabled = true;
  document.getElementById('stopBtn').disabled = false;
  dismissBanner();
  setStatus('running...', 'running');
  resetFlow();

  try {
    const result = await runInstrumented(seed, n, wordCount);
    if (result.error) showBanner(result.error, true);
    else showBanner(result.text, false);
  } catch (e) {
    // pause() rejects when STOP is pressed, which lands here.
    setStatus('stopped', 'ready');
  }

  document.getElementById('runBtn').disabled = false;
  document.getElementById('stopBtn').disabled = true;
  isRunning = false;
}

function stopRun() { stopRequested = true; }

// ============================================================
// SECTION 13: MODAL CONTROLS
// Opens/closes this very modal you're reading.
// ============================================================
function openSourceModal() {
  document.getElementById('sourceModal').classList.add('open');
  document.body.style.overflow = 'hidden';
}

function closeSourceModal() {
  document.getElementById('sourceModal').classList.remove('open');
  document.body.style.overflow = '';
}

// ============================================================
// SECTION 14: INITIALIZATION
// Runs once the DOM is ready.
// ============================================================
window.addEventListener('DOMContentLoaded', () => {
  renderCode();
  renderFlow();
  if (CORPUS) {
    document.getElementById('statusCorpus').textContent =
      `corpus: english · ${CORPUS.length.toLocaleString()} chars`;
  } else {
    setStatus('corpus missing', 'error');
  }
});

// End of file. ~600 lines including all comments and whitespace.
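
To try the model logic outside the page, the four model sections can be condensed into a synchronous version with no DOM and no pauses. What follows is a sketch under stated assumptions, not the page's code: the function names (tokenizeText, buildCounts, pickWeighted, generateSync), the injectable rng parameter, and the three-sentence sample text are all hypothetical. It also stops once the result reaches wordCount words, which is a slightly different loop bound from generate() above.

```javascript
// Sketch only: all names and the sample text are assumptions for illustration.

// Lowercase, replace the same punctuation marks with spaces, split on whitespace.
function tokenizeText(text) {
  return text
    .toLowerCase()
    .replace(/[.,;:!?()"\n]/g, " ")
    .split(/\s+/)
    .filter(t => t.length > 0);
}

// Count, for every (n - 1)-word context, how often each next word follows it.
// Object.create(null) avoids clashes with inherited keys such as "constructor".
function buildCounts(tokens, n) {
  const counts = Object.create(null);
  for (let i = 0; i + n - 1 < tokens.length; i++) {
    const ctx = tokens.slice(i, i + n - 1).join(" ");
    const next = tokens[i + n - 1];
    if (!counts[ctx]) counts[ctx] = Object.create(null);
    counts[ctx][next] = (counts[ctx][next] || 0) + 1;
  }
  return counts;
}

// Sample one word in proportion to its count; rng returns a number in [0, 1).
function pickWeighted(row, rng) {
  const words = Object.keys(row);
  const total = words.reduce((sum, w) => sum + row[w], 0);
  let r = rng() * total;
  for (const w of words) {
    r -= row[w];
    if (r <= 0) return w;
  }
  return words[words.length - 1]; // guard against rounding error
}

// Extend the seed until it holds wordCount words or the context is unseen.
function generateSync(seed, counts, n, wordCount, rng = Math.random) {
  const result = tokenizeText(seed);
  while (result.length < wordCount) {
    const ctx = result.slice(-(n - 1)).join(" ");
    if (!counts[ctx]) break;
    result.push(pickWeighted(counts[ctx], rng));
  }
  return result.join(" ");
}

// Made-up sample text, not the page's corpus.
const sampleText = "The city was quiet. The city was cold. The city had changed.";
const sampleCounts = buildCounts(tokenizeText(sampleText), 3);

console.log(sampleCounts["the city"].was); // 2
console.log(sampleCounts["the city"].had); // 1

// An rng that always returns 0 picks the first-seen word every time,
// which makes the run repeatable.
console.log(generateSync("the city", sampleCounts, 3, 6, () => 0));
// the city was quiet the city

// With the default Math.random the output varies from run to run.
console.log(generateSync("the city", sampleCounts, 3, 6));
```

Passing the random source as a parameter is a design choice made here for testability; the page itself calls Math.random directly inside weightedChoice.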