yourfirstllm.dev/level/1-ngrams
Model theory
What is an n-gram?
Given the last N−1 words, predict the next one using frequency tables built from the corpus.
"the city"
  was: 9
  had: 6
  seemed: 3
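
The counts above become probabilities once each is divided by the total for that context. A minimal sketch, assuming a hypothetical helper named `toProbabilities` that is not part of the page's source:

```javascript
// Hypothetical helper (not from the page's source): turns the raw follow-up
// counts for one context into relative frequencies.
function toProbabilities(counts) {
  const total = Object.values(counts).reduce((sum, c) => sum + c, 0);
  const probs = {};
  for (const [word, count] of Object.entries(counts)) {
    probs[word] = count / total;
  }
  return probs;
}

// Counts for the context "the city", copied from the example above.
const theCity = { was: 9, had: 6, seemed: 3 };
const probs = toProbabilities(theCity);
console.log(probs.was.toFixed(2), probs.had.toFixed(2), probs.seemed.toFixed(2));
// prints: 0.50 0.33 0.17
```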

How it trains
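Reads the corpus once, counts every sequence of N consecutive words, and stores the counts in a dictionary keyed by the preceding N−1 words. Probabilities are derived from those counts when the next word is sampled.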
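
The single counting pass can be tried on a toy corpus. This is a sketch under assumptions: `tokenize` and `countNgrams` are hypothetical names, the page does not show how it tokenizes the corpus, and a `Map` is used here instead of a plain object.

```javascript
// Hypothetical tokenizer: lowercases and splits on whitespace.
// The page's real tokenization is not shown, so this is an assumption.
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Sketch of the counting pass: one sweep over the tokens, one counter
// increment per window of n consecutive words.
function countNgrams(tokens, n) {
  const table = new Map(); // context string -> Map(next word -> count)
  for (let i = 0; i + n <= tokens.length; i++) {
    const ctx = tokens.slice(i, i + n - 1).join(' ');
    const next = tokens[i + n - 1];
    if (!table.has(ctx)) table.set(ctx, new Map());
    const followers = table.get(ctx);
    followers.set(next, (followers.get(next) || 0) + 1);
  }
  return table;
}

const tokens = tokenize('the city was old and the city was quiet and the city had walls');
const table = countNgrams(tokens, 3);
console.log(Object.fromEntries(table.get('the city')));
// prints: { was: 2, had: 1 }
```

Each window contributes exactly one increment, so training cost grows linearly with corpus length.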

Expected output
Locally coherent text. It tends to lose the global thread after ~15 words, because the model sees only the last N−1 words and has no long-range memory.
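
Why the thread gets lost can be seen in a few lines: the lookup key is built from the last N−1 words only, so two very different histories can map to the same table entry. A small sketch, where `contextKey` is a hypothetical name for the slicing step used during generation:

```javascript
// Only the last n-1 words of the history form the lookup key.
// Everything earlier is invisible to the model.
function contextKey(words, n) {
  return n > 1 ? words.slice(-(n - 1)).join(' ') : '';
}

const a = 'the king left the palace and the city'.split(' ');
const b = 'a plague swept through the city'.split(' ');

// Both histories end in "the city", so with n = 3 the model treats them alike.
console.log(contextKey(a, 3) === contextKey(b, 3)); // prints: true
```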

How it differs from GPT
GPT uses transformers with attention over thousands of tokens. We use a frequency table and two words of context (N−1 words, with the default N = 3).
statistical · no libraries · pure JS
Actual code running in your browser
// buildTable: learns from the corpus
// builds a frequency table mapping each (n-1)-word context to follow-up counts

function buildTable(tokens, n) {
  // Object.create(null) avoids clashes with inherited keys such as "constructor"
  const table = Object.create(null);
  for (let i = 0; i < tokens.length - (n-1); i++) {
    const ctx = tokens.slice(i, i+n-1).join(' ');
    const next = tokens[i+n-1];
    if (!table[ctx]) table[ctx] = Object.create(null);
    table[ctx][next] = (table[ctx][next] || 0) + 1;
  }
  return table;
}

// weightedChoice: picks the next word in proportion to its count
// more frequent words have a higher chance of being chosen

function weightedChoice(map) {
  const words = Object.keys(map);
  const total = words.reduce((s,w) => s+map[w], 0);
  let r = Math.random() * total;
  for (const w of words) {
    r -= map[w];
    if (r <= 0) return w;
  }
  return words[0];
}

// generate: produces text word by word
// each new word becomes part of the context for the next choice

function generate(seed, table, n, words) {
  let result = seed.trim().split(/\s+/);
  for (let i = 0; i < words - (n-1); i++) {
    // slice(-0) would return the whole array, so n = 1 gets an empty context
    const ctx = n > 1 ? result.slice(-(n-1)).join(' ') : '';
    if (!table[ctx]) break; // unseen context: stop early
    result.push(weightedChoice(table[ctx]));
  }
  return result.join(' ');
}
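
Because weightedChoice draws from Math.random(), its result changes on every run. To follow the subtraction loop by hand, the sketch below swaps the random draw for an explicit argument. `pickAt` and its parameter `u` are hypothetical and exist only for illustration.

```javascript
// Illustrative variant of weightedChoice: `u` (a number in [0, 1)) stands in
// for Math.random() so the walk through the counts is reproducible.
function pickAt(map, u) {
  const words = Object.keys(map);
  const total = words.reduce((s, w) => s + map[w], 0);
  let r = u * total;
  for (const w of words) {
    r -= map[w];            // consume this word's share of the range
    if (r <= 0) return w;   // the draw landed inside this word's share
  }
  return words[words.length - 1];
}

// Counts for "the city": was owns [0, 9], had owns (9, 15], seemed owns (15, 18).
const counts = { was: 9, had: 6, seemed: 3 };
console.log(pickAt(counts, 0.25)); // 4.5  -> was
console.log(pickAt(counts, 0.6));  // 10.8 -> had
console.log(pickAt(counts, 0.9));  // 16.2 -> seemed
```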
Parameters (session only, reset on reload)
N-gram size (n): 3
Words to generate: 30
Seed: 2 words from the corpus
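
The seed needs n−1 words that occur together in the corpus, because generate stops immediately when the context is missing from the table. A possible pre-flight check, sketched with a hypothetical `checkSeed` helper and a hand-written demo table (neither is part of the page's code):

```javascript
// Hypothetical pre-flight check: returns null when the seed is usable,
// otherwise a short message explaining the problem.
function checkSeed(seed, table, n) {
  const words = seed.trim().split(/\s+/).filter(Boolean);
  if (words.length < n - 1) {
    return `seed needs at least ${n - 1} word(s)`;
  }
  const ctx = n > 1 ? words.slice(-(n - 1)).join(' ') : '';
  if (!Object.prototype.hasOwnProperty.call(table, ctx)) {
    return `context "${ctx}" never appears in the corpus`;
  }
  return null;
}

// Demo table with a single context, written by hand for this example.
const demoTable = { 'the city': { was: 9, had: 6, seemed: 3 } };

console.log(checkSeed('the city', demoTable, 3)); // null
console.log(checkSeed('city', demoTable, 3));     // seed needs at least 2 word(s)
console.log(checkSeed('the town', demoTable, 3)); // context "the town" never appears in the corpus
```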