

Schema Engineering for Durable LLM Wikis

Schema engineering for an LLM Wiki is more than asking a model to emit valid JSON. A useful schema is an operating contract: it tells an agent what it may read, what it may write, how claims become durable, and how future sessions can recover the same project state without private chat memory.

Core idea

Use schemas in layers. The root instruction file governs behavior, the wiki page schema governs durable knowledge, and task-local structured outputs govern extraction or tool calls. Keep each layer small, typed, and testable.

Three contract layers

  • Repository contract (AGENTS.md or CLAUDE.md): controls where the agent looks, what it may change, which loops it must follow, and what it reports back. Typical rules: read-only paths, writable paths, ingest/query/lint rules, final verification summary.
  • Knowledge contract (wiki frontmatter and page templates): controls how compiled pages preserve source status, review state, type, and graph links. Typical fields: title, type, status, source_status, related, last_reviewed.
  • Runtime exchange contract (JSON Schema or provider tool schema): controls how extraction, classification, validation, or tool calls produce machine-readable results. Typical rules: required fields, enums, descriptions, local validation, error handling.
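As a sketch of the knowledge contract, a compiled page's frontmatter might carry the fields named above. Every value below is illustrative, not prescribed:

```yaml
---
title: Ingest workflow overview
type: concept
status: reviewed
source_status: source-linked
related:
  - structured-outputs
  - review-loop
last_reviewed: 2025-01-15
---
```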

Prompt engineering is not the same layer

A prompt tells the model what to attempt. A schema tells the surrounding system what counts as a valid artifact or action. In a durable LLM Wiki, both matter: the prompt frames the current job, while the schema preserves boundaries after the chat window is gone.

Design principles

  • Start with the consumer. Decide whether the next reader is a human reviewer, another agent, a wiki page, a validator, or an application function.
  • Separate immutable sources from compiled synthesis. Raw source folders should be read-only; wiki pages can change under logged review rules.
  • Treat names as instructions. Field names, page types, and route labels should say exactly what the agent should preserve.
  • Make optional fields rare. Required fields and small enums are easier to lint, compare, and migrate.
  • Validate twice. Let an API or parser enforce structure, then validate locally against the application rule that actually matters.
  • Measure failures by class. Separate schema compile errors, parse failures, semantic mistakes, stale-source problems, and workflow violations.
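The "validate twice" principle can be sketched in a few lines. Stage one is a structural parse; stage two is the local application rule. The rule used here (a "source-linked" claim must carry a non-empty evidence locator) and the function name are illustrative assumptions:

```python
import json

def validate_extraction(raw: str) -> list[str]:
    """Two-stage validation: a parser enforces structure, then a local
    check enforces the application rule that actually matters."""
    try:
        data = json.loads(raw)  # stage 1: structural parse
    except json.JSONDecodeError as exc:
        return [f"parse failure: {exc}"]
    errors = []
    for i, claim in enumerate(data.get("claims", [])):
        # stage 2: hypothetical application rule on each claim
        if claim.get("support") == "source-linked" and not claim.get("evidence_locator"):
            errors.append(f"claims[{i}]: source-linked but no evidence_locator")
    return errors
```

A response that parses but breaks the rule fails the second stage, which keeps "valid JSON" from being mistaken for "safe to persist."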

Task-local extraction shape

A wiki ingest task can use structured output without turning the whole wiki into an API response. The schema below is intentionally narrow: it extracts claims, evidence, contradictions, and update targets from one source so the agent can propose a reviewed wiki update.

{
  "type": "object",
  "properties": {
    "summary": {
      "type": "string",
      "description": "A concise source summary in 2 to 4 sentences."
    },
    "claims": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "claim": { "type": "string" },
          "support": {
            "type": "string",
            "enum": ["source-linked", "source-needed", "derived"]
          },
          "evidence_locator": { "type": "string" }
        },
        "required": ["claim", "support", "evidence_locator"],
        "additionalProperties": false
      }
    },
    "contradictions": {
      "type": "array",
      "items": { "type": "string" }
    },
    "update_targets": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["summary", "claims", "contradictions", "update_targets"],
  "additionalProperties": false
}
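Before trusting a parsed response, a consumer can mirror the schema's required fields and support enum locally rather than relying on the provider to enforce them. A minimal stdlib sketch, with illustrative error strings:

```python
import json

# Mirrors the required fields and enum of the extraction schema above.
REQUIRED_TOP = {"summary", "claims", "contradictions", "update_targets"}
REQUIRED_CLAIM = {"claim", "support", "evidence_locator"}
SUPPORT_ENUM = {"source-linked", "source-needed", "derived"}

def check_structure(raw: str) -> list[str]:
    """Local structural check: parse, then verify required keys and enums."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"parse failure: {exc}"]
    errors = [f"missing field: {k}" for k in sorted(REQUIRED_TOP - data.keys())]
    for i, claim in enumerate(data.get("claims", [])):
        for k in sorted(REQUIRED_CLAIM - claim.keys()):
            errors.append(f"claims[{i}] missing field: {k}")
        support = claim.get("support")
        if support is not None and support not in SUPPORT_ENUM:
            errors.append(f"claims[{i}] invalid support: {support!r}")
    return errors
```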

Provider compatibility warning

Do not assume every model provider accepts every JSON Schema keyword or enforces every constraint the same way. Keep provider-facing schemas simple, compile-test them against the target API, and validate the parsed result locally before writing durable wiki pages.

Durability checklist

  • Root instructions name read-only and writable directories.
  • Every compiled page has type, status, source status, related links, and last-reviewed metadata.
  • Every ingest writes or updates index.md and appends to log.md.
  • Contradictions are preserved until a human resolves them.
  • Structured extraction schemas are versioned when downstream consumers depend on them.
  • Provider-specific schema limitations are documented beside the workflow that uses them.
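Parts of this checklist can be linted mechanically. A sketch of a per-page frontmatter lint, assuming the frontmatter has already been parsed into a dict; the 90-day staleness threshold is an illustrative assumption:

```python
from datetime import date

# Metadata fields the durability checklist expects on every compiled page.
REQUIRED_META = {"title", "type", "status", "source_status", "related", "last_reviewed"}

def lint_page(meta: dict, today: date, max_age_days: int = 90) -> list[str]:
    """Report missing metadata and stale review dates for one page."""
    problems = [f"missing metadata: {k}" for k in sorted(REQUIRED_META - meta.keys())]
    reviewed = meta.get("last_reviewed")
    if isinstance(reviewed, date) and (today - reviewed).days > max_age_days:
        problems.append(f"stale: last reviewed {reviewed.isoformat()}")
    return problems
```

Running this across every compiled page turns the checklist into a repeatable gate instead of a one-time review.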

Measurement stack

Measure schemas as operating contracts, not only as parsers. A durable wiki should know whether the contract compiles, whether outputs parse, whether claims are semantically supported, and whether the graph stays healthy after repeated edits.

  • Compile success: provider or parser acceptance rate for the schema. Catches unsupported keywords and provider-subset mismatches before live use.
  • Parse success: percentage of responses that parse into the expected object. Separates structural reliability from answer quality.
  • Semantic accuracy: claims checked against source traces or a reviewed gold set. Valid JSON can still carry unsupported or wrong claims.
  • Failure class: compile-time, parse-time, semantic, refusal, truncation, or workflow failure. Shows whether to simplify the schema, improve descriptions, change prompts, or adjust the workflow.
  • Graph hygiene: broken links, orphan pages, stale dates, missing provenance, and unresolved contradictions. Keeps schema quality connected to the wiki’s long-term usefulness.
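The failure-class metric can be as simple as a counter over labeled run outcomes. A small sketch; the class labels follow the list above, and the function name is illustrative:

```python
from collections import Counter

# Failure classes from the measurement stack; anything else is ignored.
CLASSES = ("compile", "parse", "semantic", "refusal", "truncation", "workflow")

def failure_report(outcomes: list[str]) -> dict[str, int]:
    """Count failures per class so fixes target the right layer, e.g.
    many compile failures suggest simplifying the schema, while many
    semantic failures suggest better descriptions or review."""
    counts = Counter(o for o in outcomes if o in CLASSES)
    return {c: counts[c] for c in CLASSES}
```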

Common failure modes

  • Bloated root schema: AGENTS.md becomes a long essay and the agent starts dropping constraints. Fix: move durable explanations into wiki pages and keep the root file focused on behavior.
  • Weak source boundary: compiled claims cannot be traced back to raw sources. Fix: require evidence locators and source_status before a page can be marked reviewed.
  • Schema-valid but wrong output: JSON parses cleanly but contains unsupported claims. Fix: run semantic validation and human review before persistence.
  • Provider lock-in: a schema works in one API but fails elsewhere. Fix: use a small common core and keep provider-specific adapters separate.
  • Silent drift: old pages remain plausible after sources or project rules change. Fix: lint stale dates, unresolved contradictions, missing links, and unsupported claims.

Practical rule

Use schema engineering when the same knowledge will be revisited, updated, handed to another AI, or trusted by a human later. The schema is not decoration; it is how the wiki remembers what counts as safe work.