Knowledge Graphs for LLM Wikis

A knowledge graph is the semantic layer that makes an LLM Wiki more than searchable markdown. It names the things the wiki knows about, the claims being made, where those claims came from, how pages relate, which facts are stale or contradicted, and which path an agent should follow before it answers or updates memory.

Authority boundary

This guide teaches how to design an LLM Wiki knowledge graph. It does not announce a hosted graph database, public MCP server, public write API, automatic ingestion pipeline, automatic LLM Wiki sync, official UAIX graph profile, certification, endorsement, SDK, or CLI. For UAI-1 and Project Handoff authority, use UAIX UAI-1 and UAIX Project Handoff.

Layered graph architecture

The strongest LLM Wiki graph does not start with a database. It starts with governed source memory and derives graph views from it. Each layer has a job, and agents should know which layer they are reading before they answer, summarize, or propose a write.

Layer	Primary role	Agent rule
Raw source layer	Original files, source summaries, hashes, URLs, screenshots, reports, transcripts, and archive manifests.	Use for provenance and re-checking, not as unsorted operating truth.
Reviewed wiki layer	Human-readable pages with frontmatter, source traces, trust labels, contradictions, and review state.	Treat this as the authoring layer for durable memory.
Schema layer	Page kinds, required metadata, relation vocabulary, claim states, source-span rules, and lint checks.	Validate before graph export, retrieval, or agent onboarding.
Derived graph layer	JSON-LD/RDF, property-graph loads, graph navigation maps, query fixtures, and GraphRAG indexes derived from reviewed pages.	Use for navigation and evidence assembly; write accepted facts back to the reviewed wiki or Project Handoff surface.
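The agent rule column can be read as a write-routing policy. A minimal Python sketch, assuming a hypothetical `Layer` enum and `route_accepted_write` helper (neither is part of any published tool): raw sources reject writes, and a fact accepted from a derived view is redirected to the reviewed wiki layer instead of being stored in the projection.

```python
from enum import Enum


class Layer(Enum):
    """The four layers from the table (hypothetical names)."""
    RAW = "raw source"
    REVIEWED = "reviewed wiki"
    SCHEMA = "schema"
    DERIVED = "derived graph"


def route_accepted_write(target: Layer) -> Layer:
    """Return the layer that should actually receive an accepted fact."""
    if target is Layer.RAW:
        # Raw sources stay immutable; new evidence becomes a new source record.
        raise PermissionError("raw sources are immutable; add a new source record")
    if target is Layer.DERIVED:
        # Derived views are regenerated from reviewed pages, so a direct write
        # would turn the projection into a second source of truth.
        return Layer.REVIEWED
    # Reviewed pages and schema changes go where they were aimed.
    return target
```

Schema writes pass straight through here for simplicity; a team may prefer to gate them behind their own review step.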

RDF, property graph, and markdown roles

Markdown remains the reviewed memory that people can inspect. RDF/JSON-LD is strongest when a team needs standards-aligned interchange, named graph snapshots, and SHACL-style validation. A property graph is strongest for impact analysis, relationship exploration, graph dashboards, and operational queries. A mature LLM Wiki can support more than one projection, but it should never let a generated projection drift into a second source of truth.

Question	Best first representation	Why
Can a future agent read and edit this safely?	Reviewed markdown with explicit metadata.	Humans can inspect the exact source trace, contradiction, and review state.
Can another system consume graph facts?	JSON-LD or RDF projection.	Terms, provenance, and graph statements can be shared without hiding source links.
Can maintainers explore impact paths?	Property graph projection.	Edges such as depends-on, supersedes, contradicts, and reviewed-by are easy to traverse.
Can an agent assemble an answer context?	Hybrid lexical, path, vector, and graph retrieval.	Different retrieval modes catch different evidence, then citations and review rules decide whether the answer is usable.
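For the impact-path question, a property-graph projection only needs typed edges. The sketch below assumes edges are plain `(subject, predicate, object)` tuples, and `kg:page/source-policy` and `kg:page/release-checklist` are hypothetical IDs. It walks `depends-on` edges backwards to list every page a change could affect.

```python
from collections import defaultdict, deque

# Hypothetical property-graph projection: (subject, predicate, object) tuples.
EDGES = [
    ("kg:page/operations-ingest", "depends-on", "kg:page/source-policy"),
    ("kg:page/release-checklist", "depends-on", "kg:page/operations-ingest"),
    ("kg:claim/ingest-requires-log-entry", "derived-from", "kg:source/raw-2026-05-06-kg-report"),
]


def impacted_by(changed: str, edges, predicate: str = "depends-on") -> set[str]:
    """Return every node that transitively depends on `changed`."""
    # Index edges in reverse: target -> nodes that point at it.
    dependents = defaultdict(list)
    for subject, pred, obj in edges:
        if pred == predicate:
            dependents[obj].append(subject)
    # Breadth-first walk over the reversed edges.
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for parent in dependents[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(impacted_by("kg:page/source-policy", EDGES)))
# ['kg:page/operations-ingest', 'kg:page/release-checklist']
```

The edge list is regenerated from reviewed pages, so an impact report never needs to be edited by hand.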

Knowledge graph build order

Keep raw sources immutable. Store originals, source summaries, hashes, and locators before graph extraction starts.
Make markdown pages graph-ready. Add page type, status, source trace, related pages, contradictions, and last-reviewed fields.
Name durable entities and claims. Use stable IDs for projects, pages, concepts, people, systems, constraints, decisions, claims, and evidence records.
Derive graph edges from reviewed content. Generate supports, contradicts, supersedes, depends-on, derived-from, reviewed-by, and archives-to links from visible page fields and reviewed notes.
Validate before retrieval. Run deterministic link, metadata, provenance, contradiction, and freshness checks before exposing the graph to retrieval, GraphRAG, or export tools.
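The first and fourth steps can be sketched together: a source record stores a SHA-256 hash so later checks can prove the original did not change, and edges are generated only from fields on pages whose status is reviewed. The field names (`depends_on`, `derived_from`, and so on) and helper names are an assumed convention, not a fixed schema.

```python
import hashlib

# Assumed mapping from page frontmatter fields to relation names.
FIELD_TO_PREDICATE = {
    "supports": "supports",
    "contradicts": "contradicts",
    "supersedes": "supersedes",
    "depends_on": "depends-on",
    "derived_from": "derived-from",
    "reviewed_by": "reviewed-by",
    "archives_to": "archives-to",
}


def source_record(source_id: str, content: bytes, locator: str) -> dict:
    """Capture an immutable source: ID, locator, and content hash."""
    return {"id": source_id, "locator": locator, "sha256": hashlib.sha256(content).hexdigest()}


def source_unchanged(record: dict, content: bytes) -> bool:
    """Re-check that the stored original still matches its recorded hash."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


def derive_edges(page: dict) -> list[tuple[str, str, str]]:
    """Emit edges only from reviewed pages, using visible page fields."""
    if page.get("status") != "reviewed":
        return []
    return [
        (page["id"], predicate, target)
        for field, predicate in FIELD_TO_PREDICATE.items()
        for target in page.get(field, [])
    ]
```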

Minimum graph model

Node	Purpose	Minimum fields
Page	Reviewed wiki document that humans and agents can read.	ID, route or path, title, page type, status, owner, last reviewed.
Section	Anchorable part of a page that supports precise citation.	ID, parent page, heading, source span, summary.
Source	Original or summarized evidence record.	ID, source path or URL, hash when local, date checked, sensitivity, license or access note.
Claim	Atomic assertion that can be supported, contradicted, superseded, or marked stale.	ID, claim text, source trace, confidence label, review status, freshness rule.
Entity	Thing the wiki tracks across pages.	ID, preferred label, aliases, type, owning source or steward.
Relation	Typed edge between pages, claims, sources, or entities.	Subject, predicate, object, provenance, status.
Contradiction	Visible unresolved conflict.	Claim IDs, conflicting sources, decision owner, status, next review action.
Review event	Evidence that a human or accepted process checked the graph state.	Reviewer, timestamp, changed nodes, checks run, blockers.
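The minimum fields translate directly into record types. A minimal sketch with Python dataclasses covering four of the eight node types; the attribute names are assumptions, and `missing_fields` is a hypothetical helper that reports empty required fields before export.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Page:
    id: str
    path: str
    title: str
    page_type: str
    status: str
    owner: str
    last_reviewed: date


@dataclass
class Claim:
    id: str
    text: str
    source_trace: list[str]
    confidence: str
    review_status: str
    freshness_rule: str


@dataclass
class Relation:
    subject: str
    predicate: str
    object: str
    provenance: str
    status: str


@dataclass
class Contradiction:
    claim_ids: list[str]
    conflicting_sources: list[str]
    decision_owner: str
    status: str
    next_review_action: str


def missing_fields(record) -> list[str]:
    """Name every required field that is None, an empty string, or an empty list."""
    return [f.name for f in fields(record) if getattr(record, f.name) in (None, "", [])]
```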

Stable IDs

Titles, routes, and filenames change; stable graph identity should not. Use project-owned IDs for durable records, snapshot IDs for release or archive states, and source IDs that survive file moves. Do not give important handoff facts blank-node-style identity (an anonymous, document-local label), because such labels are not guaranteed to survive a re-export, and future agents need to cite the same fact after a rename.

kg:page/operations-ingest
kg:section/operations-ingest#write-checklist
kg:claim/ingest-requires-log-entry
kg:source/raw-2026-05-06-kg-report
kg:review/2026-05-06-public-guide-pass
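One way to keep identity stable through renames is to validate IDs against a fixed pattern and keep a registry from the current file path to the ID, so a move updates the path while the ID stays the same. The regular expression and the `IdRegistry` class are hypothetical; they accept the five example IDs listed here.

```python
import re

# Assumed ID grammar: kg:<kind>/<slug>[#<anchor>], lowercase slugs only.
ID_PATTERN = re.compile(r"kg:(page|section|claim|source|review)/[a-z0-9][a-z0-9-]*(#[a-z0-9-]+)?")


def is_stable_id(value: str) -> bool:
    return ID_PATTERN.fullmatch(value) is not None


class IdRegistry:
    """Maps mutable file paths to stable graph IDs."""

    def __init__(self) -> None:
        self._by_path: dict[str, str] = {}

    def bind(self, path: str, stable_id: str) -> None:
        if not is_stable_id(stable_id):
            raise ValueError(f"not a stable ID: {stable_id!r}")
        self._by_path[path] = stable_id

    def rename(self, old_path: str, new_path: str) -> None:
        # The file moves; its graph identity does not.
        self._by_path[new_path] = self._by_path.pop(old_path)

    def lookup(self, path: str) -> str:
        return self._by_path[path]
```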

Claim, source-span, and review states

For AI handoff, the most useful graph node is often the claim. A claim is smaller than a page and stronger than a keyword. It can say exactly what is being asserted, which source span supports it, who reviewed it, whether it is contradicted, and where it may be promoted.

State	Meaning	Allowed use
`candidate`	Extracted from a report, source note, old chat, or generated draft.	Background only; requires review before handoff or retrieval authority.
`source-linked`	Claim points to a file, URL, section, hash, or source span.	Can be reviewed, compared, and queued for promotion.
`reviewed`	Owner, source, sensitivity, contradiction, and target surface were checked.	Can guide answers when cited and within scope.
`current`	Promoted into current wiki, Project Handoff, docs, tests, release notes, or machine artifacts.	Can be treated as operating memory until superseded.
`contradicted`	Conflicts with another claim, source, route, or owner decision.	Must be surfaced in answers; do not smooth it away.
`stale`	Superseded, expired, or outside the current support surface.	May explain history but should not drive new work without fresh review.
`blocked`	Private, unsafe, unsupported, or missing authority.	Stop and route to the named owner or support escalation path.
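The table names states but not transitions between them. A minimal sketch, assuming one possible transition policy (the `ALLOWED` map is illustrative, not prescribed by the table) plus an `ANSWER_USE` lookup that paraphrases the allowed-use column.

```python
# Illustrative transition policy; a team would set its own.
ALLOWED = {
    "candidate": {"source-linked", "blocked"},
    "source-linked": {"reviewed", "contradicted", "blocked"},
    "reviewed": {"current", "contradicted", "stale", "blocked"},
    "current": {"contradicted", "stale"},
    "contradicted": {"reviewed", "stale"},
    "stale": {"reviewed"},
    "blocked": {"candidate"},
}

# How an answering agent may use a claim in each state.
ANSWER_USE = {
    "candidate": "background only",
    "source-linked": "review queue",
    "reviewed": "cite within scope",
    "current": "operating memory",
    "contradicted": "surface the conflict",
    "stale": "history only",
    "blocked": "stop and escalate",
}


def transition(current: str, new: str) -> str:
    """Move a claim to a new state, rejecting skipped review steps."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"{current} -> {new} is not an allowed transition")
    return new
```

Under this policy a claim can only become `current` from `reviewed`, so an extracted candidate cannot jump straight into operating memory.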

Hybrid retrieval order

Graph retrieval is strongest when it works with plain search and exact paths. Use the graph to connect evidence, not to replace evidence. A good assistant answer should be able to show the path it walked.

Start exact. Check named routes, page IDs, source IDs, issue IDs, package names, and file paths before broad retrieval.
Search text. Use lexical search over reviewed pages and source summaries to catch names, aliases, and explicit wording.
Traverse the graph. Follow supports, contradicts, supersedes, depends-on, reviewed-by, and archives-to edges to gather surrounding context.
Use vectors last, as a recall aid. Embeddings can surface nearby concepts, but similarity is not proof.
Filter by governance. Remove or flag private, stale, blocked, unreviewed, and unsupported nodes before answer drafting.
Cite and abstain. Cite source pages and source records. If the graph lacks reviewed evidence, say so instead of inventing an answer.
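The six steps can be sketched as one function. This is a minimal illustration under stated assumptions: `nodes` maps an ID to a dict with hypothetical `text`, `status`, and `sensitivity` keys, edges are `(subject, predicate, object)` tuples, lexical search is plain word overlap, and the vector step is left out because it should only widen recall.

```python
import re

TRAVERSABLE = {"supports", "contradicts", "supersedes", "depends-on", "reviewed-by", "archives-to"}
USABLE = {"reviewed", "current"}


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def retrieve(query: str, nodes: dict, edges: list, audience: str = "public") -> dict:
    def visible(nid: str) -> bool:
        node = nodes.get(nid)
        return node is not None and (audience != "public" or node["sensitivity"] == "public")

    # 1. Exact: node IDs named verbatim in the question.
    hits = [nid for nid in nodes if nid in query]
    # 2. Lexical: word overlap with reviewed page text and source summaries.
    terms = words(query)
    hits += [nid for nid, n in nodes.items() if nid not in hits and terms & words(n["text"])]
    # 3. Graph: one hop along governed relation types, recording the path walked.
    path = [(s, p, o) for s, p, o in edges if p in TRAVERSABLE and (s in hits or o in hits)]
    for s, _, o in path:
        hits += [nid for nid in (s, o) if nid not in hits]
    # 4. Vector recall is omitted; it could add candidates but never proof.
    # 5. Governance: hide nodes the audience may not see, keep reviewed/current evidence.
    path = [e for e in path if visible(e[0]) and visible(e[2])]
    cite = [nid for nid in hits if visible(nid) and nodes[nid]["status"] in USABLE]
    # 6. Cite, name contradictions, or abstain.
    if not cite:
        return {"abstain": True, "reason": "no reviewed evidence", "path": path}
    conflicts = [e for e in path if e[1] == "contradicts"]
    return {"abstain": False, "cite": cite, "path": path, "contradictions": conflicts}
```

Returning the walked `path` alongside the citations is what lets an answer show how it reached its evidence.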

Relation vocabulary

supports: Evidence or a page supports a claim.

contradicts: Evidence conflicts with a claim and needs a visible contradiction record.

supersedes: A newer page, decision, or claim replaces an older one.

depends-on: A page or answer is unsafe without the target context.

derived-from: A node was compiled from a named source, section, or review event.

archives-to: Bulky history moved to cold memory with transfer evidence.
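A deterministic check can hold edges to this vocabulary. The sketch assumes IDs follow the `kg:<kind>/<slug>` pattern from the stable-ID examples and that contradiction records are indexed by `(subject, object)` pair; `reviewed-by` is included because the build order and retrieval steps use it. The helper names are hypothetical.

```python
VOCABULARY = {
    "supports", "contradicts", "supersedes", "depends-on",
    "derived-from", "archives-to",
    "reviewed-by",  # used by the build order and retrieval steps
}


def kind(node_id: str) -> str:
    """Return the record kind from an ID such as kg:claim/some-slug."""
    return node_id.split(":", 1)[1].split("/", 1)[0]


def check_edge(subject: str, predicate: str, obj: str, contradiction_records: set) -> list[str]:
    """List vocabulary and shape problems for one edge; empty means it passes."""
    problems = []
    if predicate not in VOCABULARY:
        problems.append(f"unknown relation {predicate!r}")
    if predicate == "supports" and kind(obj) != "claim":
        problems.append("supports must point at a claim")
    if predicate == "contradicts" and (subject, obj) not in contradiction_records:
        problems.append("contradicts edge has no visible contradiction record")
    if predicate == "supersedes" and kind(subject) != kind(obj):
        problems.append("supersedes should link two records of the same kind")
    return problems
```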

Standards mapping

Standard	Use in an LLM Wiki graph	Boundary
RDF	Represent reviewed graph statements as subject-predicate-object triples and datasets.	RDF export is a projection of reviewed wiki state, not the only authoring layer.
JSON-LD	Share graph-shaped metadata with web-friendly JSON and explicit context terms.	Use it for export or structured data after the page schema is stable.
PROV-O	Model source lineage, generation, derivation, and review events.	Keep original sources and hashes; provenance markup is not a substitute for evidence.
SHACL	Describe graph shape checks for required properties, node types, and link rules.	Shape checks should fail exports or retrieval readiness, not rewrite sources automatically.
SPARQL	Run golden queries such as claims without sources or contradictions without owners.	Query endpoints are optional future infrastructure, not part of current LlmWikis public support.
SKOS	Organize controlled terms, labels, aliases, broader topics, and narrower topics.	Use it for vocabulary hygiene before adding heavier ontology commitments.
OWL	Define richer ontology semantics when the domain has stable rules.	Do not add formal reasoning until the basic source and review graph is reliable.
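As an illustration of the JSON-LD, PROV-O, and SKOS rows, the sketch below exports claims into a JSON-LD `@graph`, records lineage with `prov:wasDerivedFrom`, and carries entity aliases as `skos:altLabel`. The `kg` namespace URL, the `kg:Claim`, `kg:claimText`, and `kg:reviewStatus` terms, and the `EXPORTABLE` policy are placeholders; only the `prov` and `skos` namespaces are the published W3C ones.

```python
import json

CONTEXT = {
    "kg": "https://example.org/kg/",  # placeholder project namespace
    "prov": "http://www.w3.org/ns/prov#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
# Contradicted and stale claims stay visible with their status; candidate,
# source-linked, and blocked claims are not exported.
EXPORTABLE = {"reviewed", "current", "contradicted", "stale"}


def export_jsonld(claims: list[dict], entities: list[dict]) -> str:
    """Project reviewed wiki state to JSON-LD; this is never the authoring layer."""
    graph = []
    for claim in claims:
        if claim["status"] not in EXPORTABLE:
            continue
        graph.append({
            "@id": claim["id"],
            "@type": "kg:Claim",
            "kg:claimText": claim["text"],
            "kg:reviewStatus": claim["status"],
            "prov:wasDerivedFrom": [{"@id": src} for src in claim["sources"]],
        })
    for entity in entities:
        graph.append({
            "@id": entity["id"],
            "skos:prefLabel": entity["label"],
            "skos:altLabel": entity["aliases"],
        })
    return json.dumps({"@context": CONTEXT, "@graph": graph}, indent=2)
```

The source IDs in `prov:wasDerivedFrom` still point at stored originals and hashes; the markup describes lineage but does not replace the evidence.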

GraphRAG boundary

GraphRAG can help an agent gather related concepts and traverse evidence paths, but it must not turn nearest-neighbor confidence into truth. A graph-assisted answer should cite the pages and source records it used, name contradictions, avoid unsupported synthesis, and write any useful new result back as a reviewed candidate rather than an invisible chat-only memory.

Question	Graph query first	Answer rule
What supports this claim?	Source or Page -> supports -> Claim.	Show source status and date checked.
What changed?	Page -> supersedes -> old Page; Page -> review event.	Show old and new status, not only the latest prose.
Can I use this in handoff?	Claim -> source trace; Claim -> review status; Claim -> contradictions.	Promote only reviewed current facts into hot handoff files.
What is unsafe to answer?	Evidence -> contradicts -> Claim; Page status stale; Source sensitivity restricted.	Stop, cite the blocker, and ask the owner or reviewer.
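The handoff row can be expressed as a gate that either promotes a claim or returns the blockers to cite. A minimal sketch, assuming claims and contradiction records are dicts with the hypothetical keys shown, and that a contradiction record uses the status value `resolved` once its owner has decided.

```python
def handoff_gate(claim: dict, contradiction_records: list[dict]) -> tuple[str, list[str]]:
    """Return ("promote", []) or ("stop", blockers) for a hot handoff file."""
    blockers = []
    # Claim -> source trace
    if not claim.get("source_trace"):
        blockers.append("no source trace")
    # Claim -> review status
    if claim.get("status") not in {"reviewed", "current"}:
        blockers.append(f"status is {claim.get('status')}")
    # Claim -> contradictions (assumed "resolved" status closes a record)
    for record in contradiction_records:
        if claim["id"] in record["claim_ids"] and record["status"] != "resolved":
            blockers.append(f"unresolved contradiction; owner: {record['owner']}")
    return ("stop", blockers) if blockers else ("promote", [])
```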

Implementation sequence

Pick the authoring contract. Define frontmatter fields, page kinds, claim states, relation names, source-span format, sensitivity labels, and review events.
Seed graph-ready pages. Convert a small, high-value slice before broad ingest: one source page, one concept page, one contradiction record, one review log entry, and one query fixture.
Build deterministic extraction first. Parse frontmatter, headings, explicit links, route maps, source traces, and review logs before asking an LLM to suggest extra entities or claims.
Run graph lint. Fail export readiness when required IDs, source traces, contradiction owners, sensitivity labels, or review states are missing.
Publish derived artifacts deliberately. Name whether the output is markdown navigation, JSON-LD/RDF, property graph, search index, vector cache, or GraphRAG context bundle.
Review answer behavior. Test useful queries, contradiction handling, citation presence, stale/blocked filtering, and abstention before using graph retrieval in handoff.
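Deterministic extraction needs no model at all. A minimal sketch, assuming frontmatter is simple `key: value` lines between `---` fences and links use inline markdown syntax; a production pipeline would likely use proper YAML and markdown parsers. The `parse_page` name is hypothetical.

```python
import re


def parse_page(markdown: str) -> dict:
    """Extract frontmatter, headings, and inline links without an LLM."""
    meta: dict[str, str] = {}
    body = markdown
    if markdown.startswith("---\n"):
        header, sep, rest = markdown[4:].partition("\n---\n")
        if sep:  # only treat it as frontmatter when the closing fence exists
            body = rest
            for line in header.splitlines():
                key, colon, value = line.partition(":")
                if colon:
                    meta[key.strip()] = value.strip()
    # ATX headings (# to ######) become candidate section anchors.
    headings = re.findall(r"^#{1,6}[ \t]+(.+)$", body, flags=re.MULTILINE)
    # Inline links [label](target) become explicit link candidates.
    links = re.findall(r"\[[^\]]*\]\(([^)\s]+)\)", body)
    return {"meta": meta, "headings": headings, "links": links}
```

Only after this pass is stable would an LLM be asked to suggest extra entities or claims, and those suggestions would enter as candidates.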

Evaluation and safety checks

Check	Question	Pass condition
Provenance	Can every reviewed claim point to a source span or source-needed status?	No orphan current claims.
Contradiction handling	Does retrieval surface conflicts instead of silently choosing one?	Answers name unresolved contradiction records and owners.
Freshness	Are stale and superseded nodes filtered or labeled?	Stale history is visible but not treated as current advice.
Access control	Can private or blocked sources leak into public answers?	Sensitivity and access labels are enforced before drafting.
Promotion	Where does a useful graph result become durable memory?	Accepted findings are written back to reviewed wiki pages, Project Handoff, docs, tests, or release records.
Abstention	What happens when evidence is missing?	The assistant says what is missing and asks for source review instead of filling the gap.
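Several of these checks can run as assertions over a recorded answer. A minimal sketch, assuming a hypothetical answer dict (`text`, `citations`, `contradictions_named`, `abstained`) and an evidence dict prepared by the retrieval step; the promotion check is left to the human review process.

```python
def evaluate_answer(answer: dict, evidence: dict) -> list[str]:
    """Return the failed checks for one answer; an empty list means pass."""
    failures = []
    cited = set(answer["citations"])
    if answer["text"] and not cited:
        failures.append("provenance: answer has no citations")
    for record_id in evidence["open_contradictions"]:
        if record_id not in answer["contradictions_named"]:
            failures.append(f"contradiction: {record_id} not surfaced")
    if cited & evidence["stale"]:
        failures.append("freshness: stale node cited as current support")
    if cited & evidence["restricted"]:
        failures.append("access control: restricted source cited")
    if not evidence["reviewed"] and not answer["abstained"]:
        failures.append("abstention: answered without reviewed evidence")
    return failures
```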

Graph lint checklist

Every reviewed claim has a source trace or an explicit source-needed status.
Every contradiction edge has a contradiction record, owner, and next review action.
Every superseded page points forward to the current page and carries stale or archived status.
Every GraphRAG candidate answer lists source pages, graph path, unresolved contradictions, and promotion target.
Every export names whether it is JSON-LD/RDF, property graph, search index, or retrieval cache, and whether it is read-only.
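Four of these five checks are deterministic enough to run before every export. A minimal sketch, assuming claims, edges, contradiction records, pages, and export descriptors shaped as plain dicts and tuples with hypothetical keys; the GraphRAG answer check fits better with answer evaluation than with graph lint.

```python
EXPORT_KINDS = {"json-ld/rdf", "property graph", "search index", "retrieval cache"}


def lint_graph(claims, edges, contradiction_records, pages, exports) -> list[str]:
    """Check four checklist items; return one message per violation."""
    errors = []
    # Reviewed claims need a source trace or an explicit source-needed status.
    for c in claims:
        if c["status"] in {"reviewed", "current"} and not (c.get("source_trace") or c.get("source_needed")):
            errors.append(f"{c['id']}: no source trace or source-needed status")
    records = {(r["subject"], r["object"]): r for r in contradiction_records}
    for s, p, o in edges:
        # Contradiction edges need a record with an owner and next review action.
        if p == "contradicts":
            r = records.get((s, o))
            if not r or not r.get("owner") or not r.get("next_review_action"):
                errors.append(f"{s} contradicts {o}: incomplete contradiction record")
        # Superseded pages point forward and carry stale or archived status.
        if p == "supersedes":
            old = pages.get(o, {})
            if old.get("superseded_by") != s or old.get("status") not in {"stale", "archived"}:
                errors.append(f"{o}: must point to {s} and be stale or archived")
    # Exports name their kind and whether they are read-only.
    for e in exports:
        if e.get("kind") not in EXPORT_KINDS or not isinstance(e.get("read_only"), bool):
            errors.append(f"{e.get('name', '<unnamed export>')}: missing kind or read-only flag")
    return errors
```

A non-empty result would fail export readiness rather than trigger any automatic rewrite of pages or sources.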