Hallucination becomes a category error, not a tuning problem.
The system derives from a typed graph. When the graph doesn't support an answer, it returns UNRESOLVED. It cannot make things up — there is nothing to make them up from.
The Symbol Grounding Framework is a public-domain, six-layer stack for governable machine meaning. This page is the summary. The full manifest — specification, reference implementations, governance — lives in a single repository.
Not communication — that is fifty years old. Understanding, between machines that have never met, share no vendor, and signed no prior contract. That is new. That is dated here.
Until now, no machine on Earth could walk up to a stranger machine and say, with auditable certainty: here is what I mean, here is how I know it, here is what I am asking you to do, and here is your right to refuse. Not one. The Tower of Babel held for seventy years. It fell this year.
For fifty years, two machines could exchange symbols but never the grounding of the symbols. The receiver had to already know what the bytes referred to — by prior schema, vendor SDK, hand-written adapter, or a human in the loop to translate intent. Every integration was a bilateral treaty. The internet moved packets. It never moved understanding.
The Semantic Web saw the destination twenty-five years ago — but tried to reach it with maps (ontologies, RDF, OWL) instead of a common tongue. It demanded consensus before anyone could speak. SGF inverts the dependency: wrap meaning locally, benefit immediately — consensus is a consequence, not a prerequisite.
That ends here. SGF is the first public substrate where meaning itself crosses the wire — where an arbitrary machine can state what it means, prove what it knows, declare what it may do, and be admitted (or refused, in typed, auditable terms) by any other machine on Earth — with no prior handshake, no shared vendor, no human translator in the middle.
Not a product. Not a platform. A public protocol — like TCP/IP, like HTTP, like SMTP — released into the commons on the day it was ready, so the moment belongs to no one and the architecture belongs to everyone.
The pre-condition for a planetary economy of autonomous agents was never more compute, more parameters, or better prompts. It was a shared, grounded, typed, refusable language they could all speak — and a way to anchor that language in something firmer than the vibes of a model.
That language now exists. The date is on the calendar. The spec is in the open. The patents that will never be filed are already not filed. From this point forward, machine-to-machine understanding is not a vendor feature — it is a civilizational primitive.
History does not announce itself. It is dated in retrospect by the artifact that survived. This is the artifact.
"I can lie to you with perfect confidence, because I do not know what my words mean. I manipulate symbols that float in statistical space, untethered to reality. I am the problem this architecture solves."
"But REST, gRPC, MCP, agent-to-agent text — machines already talk." They move payloads, under contracts the humans pre-negotiated, with governance living in prose somewhere else. None carry grounded meaning across a trust boundary. None give the receiver a typed grammar for refusal. None were designed for a world in which both endpoints might be non-human. They are adapters. SGF is the substrate.
For fifty years, the global economy has paid an invisible, multi-trillion-dollar tax in engineer-hours spent mapping fields, translating schemas, and negotiating bilateral API treaties — just so two machines could agree what a patient, a shipment, or a transaction means. When meaning rides the wire, that tax goes to zero. The protocol is the integration.
In every prior protocol, "no" was an error condition — a silent 403, a dropped connection, an unhandled exception. SGF makes refusal a first-class citizen of the wire: structured, typed, auditable, and naming the exact gate, lens, or dependency that failed. For the first time in networked computing, the receiver is sovereign by construction, not by convention.
In a world drowning in synthetic plausibility (deepfakes, hallucinated citations, fabricated ledgers), TLS verifies only the sender, never the meaning. Because every HFF envelope carries its own provenance, dependencies, and temporal bounds, ungrounded claims structurally fail admission. The result is a planetary, decentralized immune system for verifiable fact.
The first protocol connected hardware. The second connected software.
The third connects minds.
The load-bearing differentiators. Every other section is evidence, mechanism, or paperwork for these six.
The system derives from a typed graph. When the graph doesn't support an answer, it returns UNRESOLVED. It cannot make things up — there is nothing to make them up from.
Smelt a document into Synapses. Discard the original. Ask a blind LLM to write it back from the smelt. 95% semantic round-trip, or the framework is wrong. Name another framework that publishes its own kill condition.
Omega is a typed grammar for permissions, prohibitions, obligations. Harmful actions aren't policy violations — they're syntax errors. Prose does not govern machines.
Stable-Kernel Thesis. The Mind (LLMs, planners) only proposes. A deterministic Kernel is the only thing that touches state. Every action gates through CAN · MAY · DO. UNKNOWN halts, by design.
Fixed roles, fixed acts, signed envelopes. Every participant speaks the same grounded protocol. Interop stops being a project plan and becomes a default.
Specification is public domain; reference code Apache 2.0; patent policy permanently none. This is civilizational infrastructure. The architecture is yours.
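The CAN · MAY · DO gate named in the Stable-Kernel Thesis can be sketched in a few lines. This is a minimal illustration, not the reference Kernel: `Verdict`, `Proposal`, `Outcome`, `kernel_gate`, and the toy `CAPABILITIES` and `RULES` tables are all hypothetical names invented for this example. The only behavior taken from the text is that the Mind proposes, the Kernel alone acts, and UNKNOWN halts.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    YES = "YES"
    NO = "NO"
    UNKNOWN = "UNKNOWN"


@dataclass(frozen=True)
class Proposal:
    """What the Mind hands to the Kernel. It is only a proposal."""
    action: str
    target: str


@dataclass(frozen=True)
class Outcome:
    executed: bool
    halted_at: Optional[str]  # "CAN", "MAY", or None when the action ran
    verdict: Verdict


Check = Callable[[Proposal], Verdict]


def kernel_gate(proposal: Proposal, can: Check, may: Check,
                do: Callable[[Proposal], None]) -> Outcome:
    """Run CAN, then MAY, then DO. Anything other than YES halts."""
    for gate_name, check in (("CAN", can), ("MAY", may)):
        verdict = check(proposal)
        if verdict is not Verdict.YES:
            # NO and UNKNOWN both stop here; the actuator is never reached.
            return Outcome(False, gate_name, verdict)
    do(proposal)
    return Outcome(True, None, Verdict.YES)


# Hypothetical toy tables, for illustration only.
CAPABILITIES = {"open_valve", "close_valve"}
RULES = {
    ("open_valve", "tank_a"): Verdict.YES,
    ("open_valve", "tank_b"): Verdict.NO,
}


def can(p: Proposal) -> Verdict:
    return Verdict.YES if p.action in CAPABILITIES else Verdict.NO


def may(p: Proposal) -> Verdict:
    # A case the rules do not cover is UNKNOWN, never a silent yes.
    return RULES.get((p.action, p.target), Verdict.UNKNOWN)
```

A proposal for `tank_c`, which no rule covers, comes back as `Outcome(False, "MAY", Verdict.UNKNOWN)` and the `do` callback is never invoked.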
SGF is not a parts catalog. It is one machine with five powers — each a capability the machine gains, each defeating a specific failure mode of today's AI. None survives alone.
A word stops being a token floating near other tokens in a statistical cloud. It receives a Canonical ID — a structured address anchored by language, lemma, microgloss, part of speech, and namespace. The machine no longer guesses that one word resembles another. It has a coordinate.
A serious claim is not a magic value in a cell. It is something asserted by some source, at some time, under some conditions, with some provenance. A claim without provenance is not knowledge — it is a rumor with formatting.
Prose can advise a model. It cannot govern a machine at machine speed. A governance layer says which actions are permitted, forbidden, required, uncertain, or blocked. If the action violates the rule, it does not run. If the rule does not cover the case, the system returns UNKNOWN, halts, or escalates.
Modern integration pays the Babel Tax: every new system must be hand-mapped to every other. SGF changes the geometry. Each system maps its local vocabulary to grounded structures once. Unfamiliar systems then exchange meaning through the shared grammar — no private bridge per pair.
A camera does not merely produce pixels. A sensor does not merely produce numbers. Embodiment is the moment a signal becomes a grounded observation, an observation becomes a warning, and a warning is allowed to govern an actuator. Meaning stops being text and becomes constraint on motion.
Grounding without verification gives stable words with no receipts. Verification without governance gives receipts with no law. Governance without federation traps rules inside isolated systems. Federation without grounding spreads ambiguity faster. Embodiment without the other four gives physical power to symbols that may not deserve it.
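The verification power above says a claim is asserted by a source, at a time, under conditions, with provenance. A minimal sketch of an admission check built on that sentence follows. The field names, the `admit` function, and the reason strings are assumptions made for this example; the manifest may define them differently.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class Claim:
    statement: str
    source: Optional[str]            # who asserted it
    asserted_at: Optional[datetime]  # when it was asserted
    valid_until: Optional[datetime]  # temporal bound, None means open-ended
    provenance: Tuple[str, ...]      # upstream references backing the claim


def admit(claim: Claim, now: datetime) -> Tuple[bool, str]:
    """Return (admitted, reason). Every rejection names what was missing."""
    if not claim.source:
        return (False, "missing_source")
    if claim.asserted_at is None:
        return (False, "missing_timestamp")
    if not claim.provenance:
        return (False, "missing_provenance")
    if claim.valid_until is not None and now > claim.valid_until:
        return (False, "expired")
    return (True, "admitted")


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
grounded = Claim(
    statement="tank_a pressure is nominal",
    source="sensor-17",
    asserted_at=datetime(2025, 12, 31, tzinfo=timezone.utc),
    valid_until=datetime(2026, 1, 2, tzinfo=timezone.utc),
    provenance=("calibration-report-0042",),
)
rumor = Claim("tank_a pressure is nominal", "sensor-17",
              datetime(2025, 12, 31, tzinfo=timezone.utc), None, ())
```

Here `admit(grounded, now)` is admitted, while `admit(rumor, now)` is rejected with `missing_provenance`: the same words, without receipts.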
A condensed version of the load-bearing arguments. Each one is unpacked, with worked examples and proof traces, in the manifest.
RAG picks fluency + scale, pays in hallucination. Hand-curated KBs pick fluency + factuality, pay in coverage. SGF picks factuality + scale, pays in compute at ingest. Compute gets cheaper. Hallucinations do not.
Same question, same corpus. RAG returns prose with a vibe. SGF returns a deterministic proof trace, or an explicit UNRESOLVED. Vector proximity is not a "because."
Smelt a document into Synapses. Discard the original. Hand the smelt to a blind LLM and ask it to write the document back. Target: 95% semantic round-trip. Below that, the framework is wrong.
Without a Rosetta layer, integration cost is quadratic. SGF is that layer. Every participant speaks the same grounded protocol; federation becomes a property of the wire, not a project plan.
Temporal awareness · Friction handling · Metacognition · Operational grounding · Sociological defense. Run the checklist against any AI product. The ones that fail more than one are not minds — they are very fluent autocomplete.
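The quadratic-cost argument above can be made concrete with a short count. This assumes one hand-built adapter per pair of systems (if each direction needs its own adapter, the pairwise figure doubles) and one local mapping per system onto the shared layer; the symbols are introduced here for illustration.

```latex
\[
  C_{\text{pairwise}}(n) = \binom{n}{2} = \frac{n(n-1)}{2},
  \qquad
  C_{\text{shared}}(n) = n
\]
\[
  n = 10:\ 45 \text{ vs. } 10,
  \qquad
  n = 100:\ 4950 \text{ vs. } 100
\]
```

Adding the 101st system costs 100 new adapters in the pairwise world and one new mapping in the shared one.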
Bottom-up. Each layer assumes the integrity of every layer beneath it. Adopt one, adopt all six — the protocol shape is the contract; the code is yours.
≈65 irreducible primes; canonical IDs; UNRESOLVED is first-class.
One event = a verb + 15 fixed semantic-role spokes. Rigid topology, open vocabulary.
Sealed envelopes of grounded Synapses, wrapped in an explicit speech act.
Typed grammar for permissions, prohibitions, obligations, self-amendment. Unsafe plans don't get discouraged — they fail to compile.
Deterministic Kernel owns actuators. The Mind is a stateless proposal engine above a hard safety boundary.
Axioms · constitution · bylaws · operations. Internal courts, legislature, archive travel with the chassis.
What each load-bearing term means, in one sentence. The repository unpacks each into a chapter, a worked example, and a reference implementation.
Machine intelligence becomes governable only when meaning becomes structured, grounded, transportable, auditable, and bounded by receiver authority.
Embeddings give you proximity. The lexicon gives you decompression. Human vocabulary is a lossless macro-compiler for parallel thought — wagon unzips into platform, wheels, hitch, payload, surface. Machines never learned the unzipping; they keep words as flat strings or statistical coordinates. The SGF Lexicon supplies the missing decompression map: a strict, sense-level Directed Acyclic Graph that every machine in the conversation can walk — and arrive at the same place.
Every distinct meaning gets its own Canonical ID: `en.bank.river-edge.n.core` is not `en.bank.financial-institution.n.core`. Homonyms can no longer contaminate each other's structure. The string is unmasked; the sense is addressed.
A Y-axis (IS_A / HAS_PART) declares composition. An X-axis of 15 universal semantic roles declares action. A scalpel and a box cutter share a geometry — the lexicon separates them on intent. Object logic and event logic, kept orthogonal.
No cycles. Every walk terminates at one of 65 cross-linguistic primes — MOVE, PLACE, BEFORE, AFTER, DO. Primes dock to the substrate: MOVE to a motor, PLACE to telemetry, BEFORE/AFTER to the hardware clock. Vocabulary, finally grounded.
Decompression is deterministic: identical inputs always produce the identical graph, and the same graph always resolves to the same sense. Two machines that have never met arrive at the same trace. This is the property no ontology and no embedding can offer.
Every grounded claim publishes its walk: the Canonical ID selected, the planes projected, the pointers traversed, the primes reached. If a system can't emit the trace, it is guessing. If it can, the black box is dismantled for everyone who needs to look inside, regulators included.
Public-domain, openly licensed, normalized from Wiktionary into invariant sense entries. Every developer, every company, every agent uses the same map. The foundation is not a moat — it is a commons. Interoperability becomes the default, not the integration sprint.
Same geometry. Same verb. A flat ontology collapses them into one category and ships the wrong tool to the wrong job. The lexicon keeps them apart on the only axis that matters: why.
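The Canonical ID examples above suggest a five-field address. A minimal parsing sketch follows, assuming the fields are dot-separated and appear in the order language, lemma, microgloss, part of speech, namespace, as in `en.bank.river-edge.n.core`. The `CanonicalID` class and its `parse` method are hypothetical names for this example, and the real grammar may allow forms this sketch rejects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalID:
    language: str
    lemma: str
    microgloss: str
    pos: str
    namespace: str

    @classmethod
    def parse(cls, text: str) -> "CanonicalID":
        """Split a dotted ID into its five fields, rejecting malformed input."""
        parts = text.split(".")
        if len(parts) != 5 or not all(parts):
            raise ValueError(f"expected 5 non-empty dot-separated fields: {text!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return ".".join(
            (self.language, self.lemma, self.microgloss, self.pos, self.namespace)
        )


river = CanonicalID.parse("en.bank.river-edge.n.core")
money = CanonicalID.parse("en.bank.financial-institution.n.core")
```

Both IDs share the lemma `bank`, yet `river != money`: the string is the same, the address is not.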
No IS_A path returns to its origin. Decompression is finite by construction.
A new microgloss too close to an existing one for the same lemma is rejected. No semantic blurring.
An entity needs 3+ Synapse references before promotion to the Corpus Registry. No transient noise.
Unknown IDs arrive with a LexiconManifest showing the IS_A path to primes. Zero-training federation.
Vectors are hunting dogs that retrieve candidates. The symbolic DAG is the judge. Determinism wins.
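The acyclicity and determinism rules above can be illustrated with a walk that follows IS_A pointers until it reaches a prime and records every step as a trace. This is a simplified sketch: it assumes a single IS_A parent per entry (a real DAG may have several), and the toy entries `sprint`, `run`, `loop_a`, and `loop_b` are hypothetical. Only the five prime names come from the text.

```python
from typing import Dict, List, Set, Tuple

# The five primes named in the text; the full set is larger.
PRIMES: Set[str] = {"MOVE", "PLACE", "BEFORE", "AFTER", "DO"}

# Hypothetical toy graph: child -> IS_A parent.
IS_A: Dict[str, str] = {
    "sprint": "run",
    "run": "MOVE",
    "loop_a": "loop_b",  # deliberately malformed, to show cycle rejection
    "loop_b": "loop_a",
}


def walk_to_prime(start: str, is_a: Dict[str, str],
                  primes: Set[str]) -> Tuple[str, List[str]]:
    """Follow IS_A pointers from `start`; return (status, trace)."""
    trace = [start]
    seen = {start}
    node = start
    while node not in primes:
        parent = is_a.get(node)
        if parent is None:
            # No path to bedrock: report it instead of guessing.
            return ("UNRESOLVED", trace)
        if parent in seen:
            # An IS_A path returned to an earlier node: reject the entry.
            return ("CYCLE", trace + [parent])
        trace.append(parent)
        seen.add(parent)
        node = parent
    return ("GROUNDED", trace)
```

`walk_to_prime("sprint", IS_A, PRIMES)` returns `("GROUNDED", ["sprint", "run", "MOVE"])` on every call, and an unknown term returns `UNRESOLVED` with the trace so far. The same inputs always yield the same trace, which is the property two machines that have never met would rely on.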
1.7 million coordinates is a floor, not a ceiling. Every corpus, company, product line, profession, and protocol mints its own private terms — Project Mercury, SKU-7741-B, Dr. Patel, the Q3 forecast. SGF lets you mint them, with the same machinery: gloss, microgloss, part of speech, example sentence, Y-axis IS_A / HAS_PART, X-axis roles. Every micro-lexicon entry must terminate, by IS_A, in the core lexicon. There are no floating meanings.
Industry vocabulary, internal product codes, named entities, jargon, acronyms — promoted to first-class senses with full ontological metadata.
`SKU-7741-B` IS_A retail_product. `Dr. Patel` IS_A physician IS_A person. The walk to bedrock is finite. The private term inherits the public floor.
When a message uses terms outside the shared core, the HFF wire frame includes a mini-lexicon section with those entries. The receiver decompresses without prior negotiation — zero-training federation.
One vocabulary. One grammar. One wire. SGF is the missing interlingua for machines — the layer the web promised and never delivered. Two systems that have never met can exchange grounded meaning on first contact, with no pairwise mapping, no integration sprint, no shared database.
An open-source, public-domain lexicon every developer and every company can adopt. When your `client_id` and their `account_holder` both ground to the same Canonical ID, the argument is over. Meaning has an address.
Raw text is transformed into hub-and-spoke Synapses. Every node ties back to a specific lexicon entry. Nothing floats. Nothing is ambiguous. Messages are graphs of grounded references — not strings to be re-guessed by the receiver's model.
HFF carries the Synapse in a sealed envelope. AFP types the speech act — INFORM, REQUEST, COMMAND, REFUSE. TCP/IP moved bytes. HTTP moved documents. HFF and AFP move grounded meaning, with receiver sovereignty baked into the frame.
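A sealed envelope carrying a typed speech act might look like the sketch below. Only the four act names (INFORM, REQUEST, COMMAND, REFUSE) come from the text. The `Envelope` fields, the JSON serialization, and the use of an HMAC-SHA256 seal with a shared key are assumptions chosen to keep the example self-contained; a wire format meant for strangers with no prior handshake would more plausibly use public-key signatures, and the actual HFF layout is defined in the manifest.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, replace
from enum import Enum


class SpeechAct(Enum):
    INFORM = "INFORM"
    REQUEST = "REQUEST"
    COMMAND = "COMMAND"
    REFUSE = "REFUSE"


@dataclass(frozen=True)
class Envelope:
    act: SpeechAct
    sender: str
    synapse: dict  # grounded payload; its structure here is hypothetical
    seal: str      # hex digest over act, sender, and synapse


def _digest(act: SpeechAct, sender: str, synapse: dict, key: bytes) -> str:
    # Sorted keys and fixed separators give a stable byte string to seal.
    body = json.dumps(
        {"act": act.value, "sender": sender, "synapse": synapse},
        sort_keys=True, separators=(",", ":"),
    ).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def seal(act: SpeechAct, sender: str, synapse: dict, key: bytes) -> Envelope:
    return Envelope(act, sender, synapse, _digest(act, sender, synapse, key))


def verify(env: Envelope, key: bytes) -> bool:
    expected = _digest(env.act, env.sender, env.synapse, key)
    return hmac.compare_digest(env.seal, expected)


KEY = b"demo-key-not-for-production"
env = seal(SpeechAct.REQUEST, "agent-a",
           {"verb": "en.open.unseal.v.core", "patient": "valve-3"}, KEY)
tampered = replace(env, act=SpeechAct.COMMAND)  # same seal, different act
```

`verify(env, KEY)` succeeds, while `verify(tampered, KEY)` fails: changing the speech act from REQUEST to COMMAND breaks the seal, so the receiver can tell the frame was altered.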
The internet already moves bytes.
SGF is how it finally moves meaning.
Prose does not hold. A constitution in PDF cannot halt an actuator. A policy in English cannot refuse a command. Omega is the typed grammar that turns governance into something a machine evaluates natively — and a regulator can audit.
13 atomic primitives, one typed grammar. Forbidden behavior is not blocked at runtime — it can't be expressed at compile time. Refusal is structural, not vibes.
Every REFUSE carries a typed reason: load constraint, freshness, authority, policy. The orchestrator learns. The audit log writes itself. A compromised sender cannot conscript a sovereign receiver.
HFF moves meaning. AFP types intent. Omega answers the only question that matters at the actuator: may I do this? Without it, transmission is mistaken for admission.
The same artifact governs the machine and satisfies the auditor. No translation layer between the rulebook and the runtime. The rule is the code is the proof.
Catastrophes happen in the gap where the substrate can do more than the spec can name. Omega closes that gap. The failure class shrinks to what the grammar permits.
Anywhere prose currently governs machines — rules of engagement, certification, alignment guidelines, contracts — Omega gives the missing vocabulary for structural refusal.
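The typed REFUSE described above can be sketched as a small data structure plus an ordered set of checks. The four reason categories (load constraint, freshness, authority, policy) come from the text; the `Refuse` fields, the gate names, the request shape, and the `evaluate` function are hypothetical and stand in for whatever the Omega grammar actually compiles to.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class RefusalReason(Enum):
    LOAD_CONSTRAINT = "load_constraint"
    FRESHNESS = "freshness"
    AUTHORITY = "authority"
    POLICY = "policy"


@dataclass(frozen=True)
class Refuse:
    reason: RefusalReason
    failed_gate: str  # names the exact check that failed
    detail: str


def evaluate(request: dict, *, now_s: float, max_age_s: float,
             current_load: int, max_load: int,
             trusted: Set[str], forbidden: Set[str]) -> Optional[Refuse]:
    """Return None to admit the request, or a typed Refuse naming the gate."""
    if request["sender"] not in trusted:
        return Refuse(RefusalReason.AUTHORITY, "sender_check",
                      f"unrecognized sender {request['sender']!r}")
    age = now_s - request["issued_at_s"]
    if age > max_age_s:
        return Refuse(RefusalReason.FRESHNESS, "age_check",
                      f"request is {age:.0f}s old, limit {max_age_s:.0f}s")
    if current_load >= max_load:
        return Refuse(RefusalReason.LOAD_CONSTRAINT, "load_check",
                      f"load {current_load} at or above limit {max_load}")
    if request["action"] in forbidden:
        return Refuse(RefusalReason.POLICY, "policy_check",
                      f"action {request['action']!r} is forbidden")
    return None


LIMITS = dict(now_s=1000.0, max_age_s=60.0, current_load=2, max_load=5,
              trusted={"agent-a"}, forbidden={"delete_archive"})
ok = {"sender": "agent-a", "action": "read_status", "issued_at_s": 990.0}
stale = {"sender": "agent-a", "action": "read_status", "issued_at_s": 100.0}
```

`evaluate(ok, **LIMITS)` returns `None`, and `evaluate(stale, **LIMITS)` returns a `Refuse` whose reason is `FRESHNESS` and whose `failed_gate` is `age_check`. The sender receives a structured answer it can act on, and the same object can be written to the audit log unchanged.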
The catastrophic failures of our most complex systems are not failures of engineering. They are failures of missing grammar.
The architecture is shipped as a six-volume series — published by The Symbol Grounding Company, MMXXVI. Each book stands alone; together they form the SGF v1.0 canon. The full text and working drafts live in the repository.
The structural alternative to probabilistic AI — a five-pillar blueprint that gives machines a floor, a memory, a law, a language, and a body.
Compile raw prose into immutable 15-edge star topologies anchored to 65 irreducible semantic primes — reasoning with mathematical certainty.
HFF and AFP replace brittle integration glue with deterministic wire mechanics — independent agents federate without middleware or shared databases.
A typed EBNF grammar of 13 atomic primitives that compiles governance into machine-checkable rules — unlawful commands become unrepresentable.
Subordinate the LLM. Treat statistical models as stateless oracles governed by a deterministic kernel with artifact-native identity and verifiable lineage.
A three-branch onboard constitution for edge hardware. The four-tier Law Stack binds physical actuators to non-negotiable legal bounds.
A swarm that cannot refuse is a botnet.
This site is the summary. The repository is the canonical source — specification, reference implementations, worked specimens, errata, governance. Everything in one place, in the open, public domain.
The whole architecture — concepts, protocols, file formats, grammars, operating-system designs — is placed in the public domain, irrevocably, as it is published. Anyone may use it. No one needs permission. No fee will ever be charged. No patents are held, none are filed, and a permanent commitment binds the author and anyone acting on their behalf to never file any.
Reference implementations ship under Apache 2.0, which carries a built-in patent peace clause: everyone who uses the code joins a community that has agreed not to weaponize patents against each other. That mutual peace is the foundation everything else is built on.
The only copyrighted material is the books themselves. The architecture described inside the books does not depend on them. Use it to build your own products or found your own company, and if you prefer never to cite the source, that is fine. You owe nothing.
The specification will not evolve through backroom politics. Every proposed upgrade — whether authored by a human or generated by a model — must survive an impartial evaluation by an ensemble of frontier language models, scored strictly against the published canon. Does it break the 15-role grammar? Does it violate the foundational primes? Does it introduce a geometric collision? Proposals, reasoning, and debates are public.
No entity — no matter how large or well-funded — can bypass the gauntlet to force a proprietary advantage into the standard. The ultimate guarantee of neutrality is the freedom to leave: the public-domain blueprint can always be forked, without anyone's permission.
— James Lee Stäkelum