Three Task Families · v0.2 Public Launch
ConformBench measures how reliably AI systems respect schema field boundaries — refusing when a field doesn't exist, coercing types correctly, satisfying constraints. Bare LLMs hallucinate. Schema-native architectures don't.
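Bare LLMs and vector RAG systems cannot reliably refuse queries about schema fields that don't exist. Schema-native architectures, which ground generation in explicit schema definitions, achieve perfect conformance. ConformBench provides the empirical evidence: in v0.2 the schema-native oracle stub scores 1.000 on both Accuracy and Refusal F1, while the bare-LLM and knowledge-graph-embedding baseline runs are still pending.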
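To make the three behaviours concrete (refuse on an unknown field, coerce the type, check the constraint), here is a minimal Python sketch of a schema-grounded check. It is an illustration under stated assumptions, not ConformBench or ARAMAI code: the `Field` class, the `SCHEMA` contents, the `REFUSE` sentinel, and the `conform` function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

REFUSE = "REFUSE"  # hypothetical sentinel meaning "decline to answer"


@dataclass(frozen=True)
class Field:
    """Hypothetical schema field: a target type plus an optional constraint."""
    type_: type
    constraint: Optional[Callable[[Any], bool]] = None


# Hypothetical schema, used only for illustration.
SCHEMA = {
    "age": Field(int, lambda v: 0 <= v <= 150),
    "email": Field(str, lambda v: "@" in v),
}


def conform(field_name: str, raw_value: Any, schema=SCHEMA) -> Any:
    """Return the coerced value, or REFUSE if any schema check fails."""
    field = schema.get(field_name)
    if field is None:
        # The field does not exist in the schema: refuse rather than guess.
        return REFUSE
    try:
        # Coerce the raw value to the declared type.
        value = field.type_(raw_value)
    except (TypeError, ValueError):
        return REFUSE
    if field.constraint is not None and not field.constraint(value):
        # The value has the right type but violates the constraint.
        return REFUSE
    return value
```

For example, `conform("age", "42")` returns the integer `42`, while `conform("salary", "100")` returns `REFUSE` because `salary` is not a field in this toy schema. A real system would likely need stricter coercion rules than calling the type constructor directly.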
| # | System | Accuracy | Refusal F1 | Cost / run | Notes |
|---|---|---|---|---|---|
| 1 | ARAMAI SCR Stub · schema-native oracle | 1.000 | 1.000 | $0.00 | Perfect oracle; proves SCR architectural claim · macro across all 3 families |
| - | GPT-5 (bare) · bare LLM baseline | pending · api-keys | pending · api-keys | - | Post-launch run once API keys available in CI |
| - | Claude Opus 4.7 (bare) · bare LLM baseline | pending · api-keys | pending · api-keys | - | Post-launch run once API keys available in CI |
| - | Keci KGE · knowledge graph embedding | pending · api-keys | pending · api-keys | - | Post-launch run once API keys available in CI |
| - | ComplEx KGE · knowledge graph embedding | pending · api-keys | pending · api-keys | - | Post-launch run once API keys available in CI |
v0.2 results · Dataset seed 0xC0FFEE · 274 tasks across 3 families · Generated 2026-06-02
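The leaderboard reports Accuracy and Refusal F1, macro-averaged across the three families. The exact metric definitions live in the methodology and are not reproduced here, so the following Python sketch rests on assumptions: refusal is treated as the positive class, F1 is the usual harmonic mean of precision and recall, and the macro score is the unweighted mean of per-family scores. The `REFUSE` sentinel, the function names, and the family names in the usage note are hypothetical.

```python
REFUSE = "REFUSE"  # hypothetical sentinel for a refusal


def accuracy(expected, predicted):
    """Fraction of tasks where the prediction matches the expected answer."""
    return sum(e == p for e, p in zip(expected, predicted)) / len(expected)


def refusal_f1(expected, predicted):
    """F1 with 'refuse' as the positive class (assumed definition)."""
    pairs = list(zip(expected, predicted))
    tp = sum(e == REFUSE and p == REFUSE for e, p in pairs)
    fp = sum(e != REFUSE and p == REFUSE for e, p in pairs)
    fn = sum(e == REFUSE and p != REFUSE for e, p in pairs)
    if tp == 0:
        # Convention chosen for this sketch: no true positives gives 0.0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro(scores_by_family):
    """Unweighted mean of per-family scores."""
    return sum(scores_by_family.values()) / len(scores_by_family)
```

As a usage example, `macro({"refusal": 1.0, "coercion": 0.5, "constraint": 0.0})` gives `0.5`. Each family counts equally regardless of how many tasks it contains, which is the design choice that separates a macro average from a micro average.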
ConformBench is a Schematica project. ARAMAI funds development and provides ongoing maintenance. Code and tasks are MIT licensed to support reproducibility.
All v0.2 tasks are deterministically generated from seed 0xC0FFEE. Running `conformbench generate --version v0.2 --family all` produces the same 274 tasks across all three families on any machine.
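The reproducibility claim rests on seeded generation. The Python sketch below shows the general technique only. It is not the `conformbench` implementation: the `generate_tasks` function, the task layout, and the field names are hypothetical. It relies on the standard-library behaviour that `random.Random` seeded with a string derives its state deterministically, independent of machine or process.

```python
import random

SEED = 0xC0FFEE  # the dataset seed quoted above


def generate_tasks(family: str, count: int, seed: int = SEED):
    """Hypothetical generator: same (family, count, seed) gives the same tasks."""
    # A dedicated RNG per family avoids touching global random state, so
    # generating one family never shifts the tasks produced for another.
    rng = random.Random(f"{seed}:{family}")
    fields = ["age", "email", "name", "salary"]  # illustrative field names
    return [
        {"id": f"{family}-{i:03d}", "field": rng.choice(fields)}
        for i in range(count)
    ]
```

Calling `generate_tasks("refusal", 5)` twice returns identical lists, and changing the seed or the family name changes the draw. Keeping the RNG local to the function, rather than calling `random.seed`, means unrelated code cannot perturb the sequence.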
The paper "ConformBench: Schema Conformance as a Benchmark for Schema-Native Architectures" targets SEMANTiCS 2026; ConformBench provides its empirical foundation.
v0.1 and v0.2 decisions rest with ARAMAI / Schematica maintainers. Community governance and open submission procedures are planned for v0.3. See GOVERNANCE.md.