The audit recommends starting with the articles that anchor reader understanding of large language models. This backlog turns the first 20 candidates into a concrete editorial queue.
First batch: 20 pages
| Page | Issue | First edits | Source targets |
|---|---|---|---|
| Transformer architecture | Stub, unsourced | Definition, self-attention, multi-head attention, diagram, limitations. | Original Transformer paper, LLM surveys. |
| Self-attention mechanism | Thin concept page | Query/key/value explanation, intuition, multi-head benefit, examples. | Transformer paper, attention explainers. |
| Tokenization | Stub | BPE, WordPiece, SentencePiece, vocabulary tradeoffs, examples. | Tokenizer docs and model papers. |
| Pre-training | Incomplete | Language modeling objective, corpus scale, contrast with supervised training. | GPT, BERT, and survey sources. |
| Fine-tuning | Stub | Full fine-tuning, adapters, prompt tuning, task examples. | BERT, PEFT, provider docs. |
| RLHF | Outdated | Preference data, reward model, PPO or alternatives, limitations. | OpenAI alignment sources, surveys. |
| GPT-4 | Outdated | Release context, public capabilities, limitations, official claims only. | OpenAI technical report and docs. |
| LLaMA | Stub | Model family, release context, open-weight posture, variants. | Meta papers and official release notes. |
| BERT | Brief | Encoder-only architecture, masked language modeling, historical impact. | BERT paper and NLP references. |
| Chatbots | Thin overview | LLM chat architecture, turn-taking, memory limits, safety controls. | Conversational AI surveys and official docs. |
| Summarization | Stub | Extractive vs abstractive, prompt-based workflows, evaluation caveats. | Summarization papers and benchmark docs. |
| Code generation | Stub | Assistant behavior, examples, risk, evaluation, licensing concerns. | Codex/Copilot research and docs. |
| Hugging Face Transformers | Thin | Library purpose, hub, tokenizers, pipeline example, limitations. | Official Hugging Face docs. |
| LangChain and RAG | Thin | Chains, retrieval, agents, vector stores, when to avoid complexity. | LangChain docs and RAG papers. |
| Hallucination | Stub | Definition, causes, mitigations, evaluation, examples. | Reliability papers and safety references. |
| Bias in LLMs | Stub | Bias sources, measurement, mitigation, limitations. | Bias benchmarks and research papers. |
| AI alignment | Stub | Alignment definition, RLHF relationship, goals, limits. | Alignment literature and official policies. |
| Privacy and data | Stub | Training data, memorization, PII risks, privacy attacks, compliance. | Memorization and privacy research. |
| Evaluation metrics | Technical stub | Perplexity, BLEU, ROUGE, human eval, benchmark caveats. | Standard NLP evaluation papers. |
| Prompt engineering | Stub | Definition, examples, few-shot prompting, failure modes. | GPT-3 paper and provider guides. |
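For the self-attention row, a toy scaled dot-product attention sketch could anchor the "query/key/value" explanation. This is a minimal single-head illustration with made-up shapes, not the full multi-head variant from the Transformer paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each query's distribution over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))  # 3 query tokens, d_k = 4 (toy sizes)
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = attention(Q, K, V)
print(out.shape)         # (3, 4): one weighted value vector per query
print(w.sum(axis=-1))    # each row of attention weights sums to 1
```

A diagram of the same computation would pair well with this snippet in the expanded article.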
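For the tokenization row, one BPE merge step can be shown concretely. This is a simplified sketch on a hypothetical word-frequency corpus, omitting the byte-level details real tokenizers use:

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across a corpus of (symbols -> frequency)."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    # Ties are broken by first occurrence (Counter preserves insertion order).
    return pairs.most_common(1)[0]

def merge_pair(words, pair):
    """Apply one BPE merge: replace each occurrence of `pair` with its concatenation."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: character-split words with counts.
corpus = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2, ("n", "e", "w"): 3}
pair, count = most_frequent_pair(corpus)
print(pair, count)            # ('l', 'o') with frequency 7
corpus = merge_pair(corpus, pair)
```

Iterating these two steps until a vocabulary budget is reached is the core of the BPE training loop; the vocabulary-tradeoff discussion in the article can build on that.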
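For the evaluation-metrics row, perplexity is the one metric simple enough to demonstrate inline. A minimal sketch, assuming per-token probabilities are already available:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model assigning uniform probability 1/4 to every token has perplexity ~4,
# i.e. it is "as uncertain as" a fair 4-way choice at each step.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # ≈ 4
```

The benchmark-caveats bullet still matters: perplexity is only comparable across models that share a tokenizer, which the expanded article should state explicitly.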
Execution order
- Week 1: classify all 20 pages and attach source leads before drafting.
- Weeks 2-4: expand the nine high-impact foundation, training, and model pages (Transformer architecture through BERT).
- Weeks 5-7: expand application, tooling, safety, and evaluation pages.
- Weeks 8-9: peer review, copy edit, add internal links, and update quality classes.