Abstract
We know how to document an AI system. We know how to test it, log what it did, and report when something goes wrong. What current governance practice does not clearly tell us is how to verify which model is actually computing. This is not a hypothetical gap. When an organization says "this is the model we evaluated," that claim is typically supported by a model card, a registry entry, or a hash of a weight file — evidence about a file, not about the system that is running.
A neural network is not a static document. A weight file stores the network; the model is what appears when that file is loaded and begins transforming inputs into outputs. The file and the running model are related, but they are not the same thing — and current governance practice rarely distinguishes between them. This paper proposes a framework for doing so. It identifies three kinds of evidence that can support model identity claims, each answering a different question.
Structural evidence — drawn from the model's internal computations during live operation — can verify which specific model is running, and is the most resistant to tampering. Thermodynamic evidence — drawn from the model's output statistics — can verify that the system is a genuine neural network rather than a substitute, but cannot distinguish one model from another. Functional evidence — drawn from patterns in the model's outputs over an API — can detect whether a model was copied from another, but this signal fades quickly: routine model updates can erase it within days to weeks of continued training. The paper shows that inspecting the model's files alone is insufficient for verifying which specific model is running. The identity-bearing signal cannot be recovered from the tested static properties of those files; it is most reliably established by observing the model while it operates. The paper formally proves that these three kinds of evidence cannot substitute for one another.
Verifying that a system is genuine does not tell you which specific model it is. Detecting that a model was copied does not tell you the identity of the copy. The practical consequence is a standard for identity claims: any claim should declare which kind of evidence supports it, because borrowing evidence from the wrong category produces unreliable conclusions. The framework maps directly to compliance questions raised by current AI governance obligations, including those under the EU AI Act. It provides the missing evidentiary specification for model identity claims: which kind of evidence is admissible for which identity question.
1. The Identity Problem in Deployed AI
Most current AI governance practice can tell you what was documented, what was evaluated, and what was logged. It can tell you which model was specified in the contract, which version was listed in the registry, and which file was deployed to the server. What it cannot tell you — and rarely asks — is which model is actually running right now. Every organization that deploys or consumes an AI system operates under an implicit assumption: the model that was documented is the model that is computing. This assumption has no enforcement mechanism in widely deployed systems.
Every organization that deploys or consumes an AI system makes identity claims, whether it realizes it or not. "This is the model we evaluated." "This is the model specified in the contract." "This model has not been modified since our safety assessment." These claims are currently supported by documentary evidence: model cards, registry entries, version identifiers, and cryptographic hashes of weight files. This evidence identifies an artifact — a file, a record, an entry in a database. It does not identify the model. A model card records what was supposed to be deployed; it does not establish which model is actually computing.
That distinction matters. A neural network is not a static file that someone can right-click and rename. When it runs, its behavior depends on its specific internal structure — structure that was shaped by a particular training process and that no other model shares. Two networks can have different model cards, different version numbers, different file hashes, and yet carry the same teacher's knowledge inside them.
Conversely, two files with identical documentation can contain fundamentally different models. The file is the container. The model is what runs. In February 2026, Anthropic disclosed that multiple organizations had conducted industrial-scale distillation campaigns against Claude, using approximately 24,000 fraudulent accounts to generate over 16 million exchanges for the purpose of training competing models on Claude's outputs. The resulting systems were unauthorized copies carrying the teacher's behavioral fingerprint, yet they could satisfy many artifact-level documentary checks: they had their own weight files, their own model cards, their own version hashes. Every artifact-level record was consistent. The identity-level evidence was absent. No document in the governance stack records which model's knowledge is inside those weights, because no document in the governance stack asks that question.
This gap is not unique to the Anthropic case. It is structural. Recent standards work confirms the pattern: NIST's agent-identity initiative, published in February 2026, addresses how to authenticate and authorize the actions of AI agents — but takes for granted that the model driving the agent is already known. An individual Internet-Draft circulated through the IETF Datatracker in March 2026 proposes cryptographic audit trails for AI decisions — but takes for granted that the model producing those decisions has been identified. Both are building useful infrastructure. Neither addresses the identity layer that their infrastructure depends on.
This paper addresses that layer. When someone claims that a deployed system is a particular model, what kind of evidence supports that claim? The answer depends on which aspect of the model's identity is being examined — because different aspects respond to different evidence, hold up under different conditions, and degrade on different timescales.
2. Three Evidence Classes and What Each Can Certify
If model identity cannot be established by documentation alone, what can establish it? The answer depends on which question you are asking. If you want to know which specific model is running — not which model was documented, but which model is computing right now — you need one kind of evidence. If you want to know whether the system is genuinely a neural network and not an obvious substitute or corrupted deployment, you need a different kind. If you want to know whether the model was copied from another — whether its training data included unauthorized material from someone else's model — you need a third kind. These are three different questions, and they require three different kinds of evidence. This paper calls them structural, thermodynamic, and functional evidence. Each is gathered at a different point in the system. Each holds up under different conditions. Each degrades on a different timescale. And critically, evidence of one kind cannot substitute for evidence of another — a point developed formally in §4. Throughout this paper, the validated regime refers to the populations and conditions already tested in the cited prior work: 22 models from 16 independent training lineages, subjected to distillation, fine-tuning, adversarial erasure, and direct targeting interventions.
Structural evidence answers the question: which specific model is this? It examines the model's internal activity while it is running. Specifically, it measures statistical patterns in the model's intermediate representations — the internal signals the model produces as it processes input, before it generates its final output. This is the most powerful form of identity evidence: it can distinguish a specific model from every other model in a tested population, much like a fingerprint distinguishes one person from another. Structural evidence is also the most durable. In the validated regime, it remained unchanged — within the noise expected from repeated measurement of the same model — under knowledge distillation (training a smaller model to mimic a larger one), supervised fine-tuning, and the tested adversarial erasure attempts. When researchers attempted to directly alter this fingerprint within a model's own training lineage, the fingerprint barely moved. When they attempted to force it across lineage boundaries, the model's performance collapsed before the fingerprint shifted substantially. The cost of this durability is access: structural evidence requires running the model and observing its internal computations, not just its outputs. Privileged runtime access is necessary; inspecting the model's files without running it is not sufficient to predict the fingerprint.
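To make the access requirement concrete, the following sketch illustrates the kind of measurement structural evidence involves. The specific observable used here (a normalized singular-value spectrum of intermediate activations on a fixed probe set) is an illustrative stand-in, not the protocol from the cited prior work; the point it demonstrates is only that the fingerprint is computed from live internal states, not from the weight file.

```python
import numpy as np

def structural_fingerprint(hidden_states: np.ndarray) -> np.ndarray:
    """Illustrative structural observable: the normalized singular-value
    spectrum of one layer's intermediate representations on a fixed probe
    set. hidden_states has shape (n_probes, d_model)."""
    centered = hidden_states - hidden_states.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return s / s.sum()  # scale-free spectral signature

def same_model(fp_a: np.ndarray, fp_b: np.ndarray,
               noise_floor: float = 0.01) -> bool:
    """Identity check: the distance between fingerprints must sit within
    the noise expected from re-measuring the same model."""
    return float(np.abs(fp_a - fp_b).sum()) < noise_floor

rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 64))  # stand-in for real probe activations
fp_run1 = structural_fingerprint(acts)
# Re-measuring the same model: tiny numerical jitter, same fingerprint.
fp_run2 = structural_fingerprint(acts + rng.normal(scale=1e-4, size=acts.shape))
assert same_model(fp_run1, fp_run2)
```

The key point the sketch encodes is that `hidden_states` only exists while the model is executing: there is no way to fill that argument from a weight file alone.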
Thermodynamic evidence answers a broader question: is this system actually a neural network, or is it something else? It examines the shape of the model's raw output scores — the numerical values the model produces before they are converted into the probabilities a user sees. One measurable feature of these raw scores stays remarkably stable across all tested neural language models, regardless of who trained them, how large they are, or what data they were trained on. Across 22 models from 16 independent training lineages, this property varied by less than 4% — a remarkably tight band. Thermodynamic evidence is useful as a rapid sanity check: it can confirm that the system under examination behaves like a real neural network rather than an obvious substitute, a lookup table, or a corrupted deployment. What it cannot do is tell you which model it is. Two completely different models, trained by different organizations on different data, will produce nearly identical thermodynamic measurements.
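The mechanics of such a class check can be sketched as follows. The summary statistic used here (a top-score margin normalized by score dispersion) is a hypothetical stand-in for the actual observable defined in the cited prior work, and the band center would in practice come from the validated population, not from the system under test.

```python
import numpy as np

def thermo_observable(logits: np.ndarray) -> float:
    """Illustrative class-level observable: a shape statistic of the raw
    (pre-softmax) score vectors, averaged over many inference steps."""
    z = np.sort(logits, axis=-1)
    gap = z[..., -1] - z[..., -2]   # top-1 / top-2 margin
    spread = z.std(axis=-1)         # overall score dispersion
    return float((gap / spread).mean())

def passes_class_check(value: float, band_center: float,
                       tol: float = 0.04) -> bool:
    """Does the measurement sit inside the tight band (<4% relative
    variation across the validated population)?"""
    return abs(value - band_center) / band_center < tol

rng = np.random.default_rng(0)
nn_logits = rng.normal(size=(1024, 1024))   # plausible raw score vectors
fake_logits = np.zeros((1024, 1024))
fake_logits[:, 0] = 100.0                   # lookup-table-like output
v_real = thermo_observable(nn_logits)
v_fake = thermo_observable(fake_logits)

assert passes_class_check(v_real * 1.02, band_center=v_real)   # inside the band
assert not passes_class_check(v_fake, band_center=v_real)      # degenerate output fails
```

Note what the check cannot do: any two systems whose scores share this distributional shape pass it equally well, which is exactly why thermodynamic evidence certifies class membership and nothing finer.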
Functional evidence answers a different question: where did this model's training come from? It examines stable patterns in the model's outputs — patterns that survive after removing the most predictable statistical effects, and that carry signatures of the model's training history. If a model was trained by copying the behavior of another model (a process called knowledge distillation), the copy carries detectable traces of the original. In the validated regime, distilled models retained 31–52% of the original model's functional signature. Functional evidence is the most accessible form of identity evidence — it requires only the model's outputs, the kind of access available through a standard API, without needing to see the model's internal workings. This makes functional evidence the keyhole of the framework: it can support meaningful identity-related claims even when the operator exposes no weights and no internal computations, only structured output access. But it is also the most fragile. Routine continued training — the ordinary process of updating a model on new instructions or domain-specific data, with no adversarial intent — erases these traces within one to two training cycles. In practical terms, that means days to weeks. An organization that waits months to check for unauthorized copying may find the evidence has already been overwritten by ordinary model updates.
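A minimal sketch of a distillation check under these assumptions: each model's output rows have a population-level reference profile subtracted (removing the "most predictable statistical effects"), and the candidate's residual signature is projected onto the teacher's. All names and the signature construction here are hypothetical illustrations, not the cited protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
reference = rng.normal(size=64)  # hypothetical population-level profile

def functional_signature(logprob_rows: np.ndarray) -> np.ndarray:
    """Residual output pattern after removing the population reference."""
    return (logprob_rows - reference).mean(axis=0)

def retained_fraction(teacher_sig: np.ndarray,
                      candidate_sig: np.ndarray) -> float:
    """Regression of the candidate's signature onto the teacher's:
    roughly, how much of the teacher's fingerprint survives."""
    return float(candidate_sig @ teacher_sig / (teacher_sig @ teacher_sig))

teacher_pattern = rng.normal(size=64)
teacher_rows = reference + teacher_pattern + rng.normal(scale=0.05, size=(200, 64))
# A distillation-like candidate: a damped copy of the teacher's pattern
# (the validated regime reports 31-52% retention) plus its own noise.
distilled_rows = reference + 0.4 * teacher_pattern + rng.normal(scale=0.05, size=(200, 64))

t_sig = functional_signature(teacher_rows)
c_sig = functional_signature(distilled_rows)
frac = retained_fraction(t_sig, c_sig)  # close to 0.4 for this synthetic setup
```

Because only output rows are consumed, a check of this shape is feasible over a standard API; the fragility enters because continued training rewrites `candidate_sig` while leaving the model's structural layer untouched.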
These three evidence classes are not interchangeable. This is not a design preference — it follows from how each kind of evidence actually behaves. Each evidence class has its own deformation law: an observed regularity in how that kind of evidence changes, or resists change, under real-world conditions like training, fine-tuning, and tampering. The structural layer is stable where the functional layer is fragile. The thermodynamic layer is universal where the structural layer is individual. The functional layer reveals training lineage where the structural layer reveals identity. Using evidence from one class to support a claim that belongs to another is like using a blood-type match to establish a fingerprint identification — the evidence is real, but it answers the wrong question. This paper calls that error inferential debt: borrowing support from one evidentiary regime and spending it in another. This cross-class inadmissibility is not merely a methodological caution. It has both empirical and formal backing, developed in §4. Three specific directions of this independence have been confirmed by measurement across multiple model families and training interventions: structural evidence cannot certify functional claims, functional evidence cannot certify structural claims, and structural evidence cannot certify thermodynamic claims.
3. The Opacity Barrier
A natural hope for compliance is that model identity could be verified from static artifacts. If a weight file could be inspected, its statistics computed, and the result compared against a registry, identity verification would reduce to a documentation problem — checkable with the same tools used for software integrity. The model card would not merely be supporting documentation; it would itself become identity evidence.
This hope is not supported by the available measurements. Two systematic tests have been conducted. The first tested whether publicly available architecture specifications — hidden dimension, layer count, vocabulary size, head configuration, parameter count, and related descriptors — predict the structural fingerprint. Across 22 Transformer models from 16 training lineages, a regression using eight architecture features achieved an in-sample fit of 0.25 but a leave-one-out predictive score of −3.93 (a cross-validation measure where zero means predictions merely match the population mean, and negative values mean predictions are worse than guessing that mean). Coarse public architecture descriptors simply do not carry the identity signal: models with nearly identical specifications — the same depth, the same width, the same head configuration — can have structural fingerprints that differ by an order of magnitude.
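A minimal numerical sketch shows how a leave-one-out score can fall below zero while the in-sample fit looks respectable. The data here are synthetic (22 observations, 8 signal-free features, mirroring the sample sizes above), not the paper's measurements.

```python
import numpy as np

def loo_r2(X: np.ndarray, y: np.ndarray) -> float:
    """Leave-one-out predictive score: refit the regression n times, each
    time predicting the held-out point. Zero means 'no better than always
    predicting the population mean'; negative means worse than that."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        Xtr = np.column_stack([np.ones(n - 1), X[mask]])
        beta, *_ = np.linalg.lstsq(Xtr, y[mask], rcond=None)
        preds[i] = np.concatenate([[1.0], X[i]]) @ beta
    return 1.0 - ((y - preds) ** 2).sum() / ((y - y.mean()) ** 2).sum()

# 22 'models' and 8 architecture features carrying no identity signal:
rng = np.random.default_rng(1)
X = rng.normal(size=(22, 8))
y = rng.normal(size=22)  # fingerprint values unrelated to the features

Xf = np.column_stack([np.ones(22), X])
beta, *_ = np.linalg.lstsq(Xf, y, rcond=None)
insample = 1.0 - ((y - Xf @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()
loo = loo_r2(X, y)
# insample is modestly positive by chance alone; loo is strictly lower,
# and with signal-free features at this sample size typically negative.
```

The gap between `insample` and `loo` is the overfitting penalty: with 8 features and 22 points, a regression can fit noise in-sample while its held-out predictions are worse than the mean baseline.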
Architecture sets the constraints; it does not determine the identity. The second test asked whether training-shaped properties of the endpoint weights — the geometry of the output projection matrix, the local density structure of the embedding space — carry predictive signal that architecture features do not. Across a validated 10-model overlap spanning six independent training lineages, endpoint weight statistics achieved a leave-one-out predictive score of −0.89.
The architecture-only baseline on the same overlap achieved −27.16. Both are worse than the population mean. The endpoint weight geometry does not predict the structural fingerprint, even though those weights were shaped by the same training process that produced the fingerprint. The pattern across both tests is the same: the structural fingerprint is shaped by training — it is neither an architecture constant nor a measurement artifact — but the signal was not recoverable from the tested static endpoint properties of the trained weights. The fingerprint is manifested in how the model transforms inputs through its internal geometry during inference, rather than being recoverable from the tested endpoint statistics alone.
This has a direct regulatory consequence. Verification regimes that rely on static artifact inspection — weight-file checksums, parameter audits, registry comparisons, or statistical profiles of the output projection — are checking documentation, not identity. They can confirm that an artifact exists and that it matches a record. By themselves, they are insufficient to confirm that the artifact, when executed, produces the structural fingerprint that was measured during evaluation. An organization that needs to verify which model is running — not which model is documented, but which model is computing — must observe the model in operation. Within the validated regime, structural identity is not statically legible from artifacts alone and is most reliably established through direct measurement of the model in computation. This does not mean that static artifact documentation is valueless. Model cards, weight hashes, and registry entries are necessary for traceability, version control, and accountability. They serve the artifact layer of the evidence hierarchy. What they cannot do is substitute for the structural layer. They answer "which file is deployed?" — a question about artifacts. They do not answer "which model is running?" — a question about identity.
4. The Admissibility Condition
The evidence taxonomy in §2 established that different identity layers respond to different evidence and degrade on different timescales. §3 showed that the structural layer, in particular, is not accessible through static artifact inspection alone. This section addresses the logical relationship between evidence classes: when one class of evidence is used to support a claim that belongs to another class, what goes wrong? The answer has been formally established.
The core result is a verification-theoretic impossibility theorem, standard in form but applied here to the specific problem of model identity evidence. The theorem states: if two systems produce the same observation under a given evidence class but differ on whether a system-level identity claim holds, then no decision procedure restricted to that evidence class can be both sound and complete for that claim. Soundness means no false certifications; completeness means no missed certifications.
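Stated compactly, with notation introduced here for exposition (a set of systems $\mathcal{S}$, an observation map $O_c : \mathcal{S} \to \Omega_c$ for evidence class $c$, and a claim predicate $\varphi$; this notation is ours and may differ from the Coq development's), the theorem reads:

```latex
\textbf{Theorem (observation-limited verification).}
Suppose $S_1, S_2 \in \mathcal{S}$ satisfy $O_c(S_1) = O_c(S_2)$
while $\varphi(S_1)$ holds and $\varphi(S_2)$ does not.
Then no decision procedure $D : \Omega_c \to \{\mathrm{accept}, \mathrm{reject}\}$
over evidence class $c$ is both
sound ($D(O_c(S)) = \mathrm{accept} \implies \varphi(S)$) and
complete ($\varphi(S) \implies D(O_c(S)) = \mathrm{accept}$).
\emph{Proof sketch.} Completeness on $S_1$ forces
$D(O_c(S_1)) = \mathrm{accept}$; since $O_c(S_2) = O_c(S_1)$,
$D$ also accepts $S_2$, violating soundness. \qed
```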
The theorem shows that when an evidence class cannot distinguish two systems that differ on a claim, no amount of analytic sophistication applied to that evidence class alone can close the gap. The limitation is not in the analysis — it is in the evidence. This general result has a direct corollary for layered identity. When two identity layers are operationally independent — meaning systems exist that produce the same observation under one evidence class while differing on a claim at another layer — then evidence restricted to one layer is formally insufficient to certify claims about the other. The corollary has been proved for three specific directions, each grounded in validated experimental data. First, structural evidence cannot certify functional claims. Distillation experiments across multiple model families show that a model's structural fingerprint remains within the measurement noise floor even as its functional output templates converge substantially toward the teacher.
Two systems can be structurally indistinguishable yet functionally distinct. A verifier examining only structural evidence would have no basis for determining whether distillation had occurred. Second, functional evidence cannot certify structural claims. Routine continued training — standard fine-tuning on instruction-following or domain text, with no adversarial intent — erases the functional fingerprint within one to two epochs while the structural fingerprint remains unchanged. Two systems can be functionally indistinguishable yet structurally distinct. A verifier examining only functional evidence would have no basis for distinguishing the original model from a fine-tuned derivative. Third, structural evidence cannot certify thermodynamic claims. Across a validated population of 22 Transformer models, the structural and thermodynamic observables show no strong cross-sectional association in the validated sample. Models with nearly identical structural fingerprints can occupy measurably different positions in the thermodynamic observable space. A verifier examining only structural evidence would have no basis, in the validated sample, for certifying the corresponding thermodynamic claim. Each of these three directions has been independently confirmed by measurement and independently proved as a formal corollary of the parent theorem. The proofs were formally verified in the Coq proof assistant, which mechanically checks each logical step and flags any unresolved gap as an unfinished obligation; the proof files contain no such gaps. The result is a formal impossibility statement, conditional on the validated evidence patterns it addresses. This yields a practical admissibility standard. Any claim about a model's identity should declare which evidence class supports it, because evidence from one class is formally insufficient for claims that belong to another. 
A structural identity claim requires structural evidence — live measurement of the model's internal computation geometry. A provenance claim requires functional evidence — observation of the model's output behavior — and comes with an expiration date set by the training-erasure timescale. A broad class-membership claim in the validated regime requires thermodynamic evidence. And a claim that mixes evidence classes without acknowledging the crossing incurs inferential debt: it borrows support from one evidentiary regime and spends it in another, without the formal license to do so.
5. Regulatory Mapping
The European Union's AI Act entered into force in August 2024, with obligations for providers of general-purpose AI models applying from August 2025. Providers must draw up technical documentation covering training and testing processes, publish summaries of training content, and supply information enabling downstream providers to understand model capabilities and limitations. Providers of models assessed as presenting systemic risk face additional obligations: adversarial testing, systemic risk assessment and mitigation, cybersecurity protection, and incident reporting to the AI Office and national authorities. The General-Purpose AI Code of Practice, published by the European AI Office in July 2025, serves as a voluntary compliance tool — adherence provides legal certainty but is not mandatory. These obligations create substantial accountability demands. They require documentation of what was built, testing of how it performs, monitoring of how it behaves after deployment, and reporting when something goes wrong. What they do not specify is how to establish which model is the subject of all that documentation, testing, monitoring, and reporting. Current official materials do not specify a runtime procedure for model identity verification, lineage tracing beyond documentation-level summaries, or detection of unauthorized substitution or distillation. The accountability infrastructure assumes the identity question has already been answered.
The evidence taxonomy developed in this paper maps directly to the compliance questions that these obligations raise. In the table below, computation integrity refers to evidence that the deployed system is faithfully executing the documented artifact, rather than merely possessing the documented files.
| Compliance Question | Evidence Class | Access Required | Timescale |
|---|---|---|---|
| Is this the model we evaluated? | Structural | Privileged runtime access | One-time verification |
| Has this model been modified since deployment? | Structural | Privileged runtime access | Periodic re-verification |
| Was this model distilled from an unauthorized source? | Functional | API-level output | Time-sensitive: degrades with continued training |
| Does this system exhibit the thermodynamic signature of a deployed neural language model? | Thermodynamic | Inference output | One-time class check |
| Is the deployed system executing the documented artifact? | Structural + computation integrity | Verified execution environment | Per-inference or periodic |
Each row names a question that current obligations motivate but do not specify how to answer. The evidence taxonomy provides the missing specification: which evidence class is admissible for which question, what access it requires, and what timescale governs its reliability. Recent standards work has begun addressing adjacent problems. NIST's agent-identity initiative, published in February 2026, addresses how to authenticate and authorize the actions of AI agents. An individual Internet-Draft circulated through the IETF Datatracker in March 2026 proposes cryptographic audit trails for AI decisions. Both address adjacent authorization and audit-trail questions. Neither specifies how model identity itself should be established, what evidence class supports that determination, or how different evidence classes map to different identity layers. This paper provides an evidentiary framework for the identity layer that these adjacent efforts presuppose but do not build.
6. Operational Recommendations
The evidence taxonomy and admissibility framework developed in this paper yield several operational consequences for organizations that deploy, procure, or regulate AI systems.
Identity verification precedes behavioral testing. Testing a model's accuracy, fairness, safety, or compliance presupposes knowing which model is being tested. A substituted model — whether through unauthorized distillation, silent version updates, or deployment-pipeline errors — can still satisfy many behavioral benchmarks while being a fundamentally different system. Testing without prior identity verification therefore means the organization may be evaluating a system it has not conclusively identified. For evaluations intended to support model-specific claims, identity verification should precede behavioral testing.
Evidence class must match claim type. A structural identity claim — "this is the specific model we evaluated" — requires structural evidence: observation of the model's internal computation geometry during live inference. A provenance claim — "this model was not distilled from an unauthorized source" — requires functional evidence: analysis of the model's output behavior patterns. A broad class-membership claim in the validated regime — "this system is a deployed neural language model" — is supported by thermodynamic evidence. Documentary evidence — model cards, registry entries, weight-file hashes — supports artifact traceability but is insufficient as the sole basis for any of these identity claims. Each claim type has an admissible evidence class, and mixing classes without declaration incurs the inferential debt described in §2 and formalized in §4.
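The claim-to-evidence mapping above can be made operational as a simple admissibility gate. The table below is distilled from this paper's taxonomy, but the claim and evidence labels are hypothetical names chosen for illustration, not a standardized schema.

```python
# Illustrative admissibility table (hypothetical labels, not a standard schema).
ADMISSIBLE = {
    "structural_identity": "structural",     # "this is the specific model we evaluated"
    "provenance":          "functional",     # "not distilled from an unauthorized source"
    "class_membership":    "thermodynamic",  # "this is a deployed neural language model"
}

def check_claim(claim_type: str, evidence_class: str) -> None:
    """Reject cross-class support: evidence from the wrong class incurs
    inferential debt rather than certifying the claim."""
    required = ADMISSIBLE[claim_type]
    if evidence_class != required:
        raise ValueError(
            f"inadmissible: {claim_type!r} requires {required} evidence, "
            f"got {evidence_class}"
        )

check_claim("provenance", "functional")  # admissible pairing, no error
```

A gate of this shape forces the declaration the admissibility standard calls for: a claim cannot enter the compliance record without naming the evidence class that supports it.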
Functional evidence is time-sensitive. The detection window for training provenance is measured in training epochs, not calendar time. In the validated regime, routine continued training — fine-tuning on instruction-following data or domain text, with no adversarial intent — erases functional provenance evidence within one to two epochs. An organization that needs to determine whether a model was distilled from an unauthorized source should conduct that assessment as close to the model's release as possible. Waiting months for a scheduled audit may find the evidence already overwritten by ordinary model updates.
Structural evidence is the most durable foundation. Among the three evidence classes, structural evidence is the most resistant to modification in the tested non-destructive regime. It anchors the strongest identity claims — individual model authentication — and, in the validated non-destructive regime, has remained within the measurement noise floor across distillation, fine-tuning, the tested erasure interventions, and same-family direct targeting. Cross-family targeting was destructive before substantial movement occurred. For organizations that require persistent model identity assurance, structural evidence provides the most reliable foundation, though it requires the deepest access. Where privileged runtime access is not available — as with many commercial API deployments — functional and thermodynamic evidence remain applicable, though they address provenance and class membership rather than individual structural identity. The framework does not require every operator to expose internal computations. For many deployment contexts, structured output access is enough to support provenance and class-membership claims, even when individual structural identity remains out of reach.
Static documentation is necessary but not sufficient. Model cards, version identifiers, weight-file checksums, and registry entries serve essential functions: traceability, version control, accountability, and compliance documentation. This paper does not argue that static documentation is valueless — it argues that static documentation occupies the artifact layer of the evidence hierarchy and cannot, by itself, answer the identity question. An organization that relies solely on documentary evidence for identity claims is answering "which file is deployed?" when the operative question is "which model is computing?" This framework stops at admissible evidence. But it is worth noting that identity evidence need not serve only retrospective accountability. In environments where authorization depends on which model is running — access to sensitive data, permission to take consequential actions, eligibility for regulated deployments — identity evidence becomes an input to access-control decisions, not just an audit record. How organizations act on that evidence — permit, restrict, escalate, or refuse — is a governance decision that should be made explicit wherever model identity is security-relevant.
Acknowledgments
Portions of this research were developed in collaboration with AI systems that served as co-architects for experimental design, adversarial review, and manuscript preparation. All scientific claims, experimental designs, measurements, and editorial decisions remain the sole responsibility of the author.
Patent Disclosure
The measurement protocols and evidence framework described in this work operate within the scope of U.S. Provisional Patent Applications 63/982,893, 63/990,487, 63/996,680, and 64/003,244. All four provisional patents are assigned to Fall Risk AI, LLC.
Supplementary Material
This paper is accompanied by EvidenceSufficiency.v, a Coq proof file that formally verifies the observation-limited verification impossibility theorem and its three cross-layer inadmissibility corollaries described in §4. The file proves 5 theorems from 3 empirical axioms, contains no unresolved proof obligations (no Admitted statements), and compiles cleanly under the Rocq Prover 9.1.1. It is available as a supplementary file alongside this paper on Zenodo.
References
[1] Coslett, A. R. (2026). The δ-Gene: Inference-Time Physical Unclonable Functions from Architecture-Invariant Output Geometry. Zenodo. DOI: 10.5281/zenodo.18704275
[2] Coslett, A. R. (2026). Template-Based Endpoint Verification via Logprob Order-Statistic Geometry. Zenodo. DOI: 10.5281/zenodo.18776711
[3] Coslett, A. R. (2026). The Geometry of Model Theft: Distillation Forensics, Adversarial Erasure, and the Illusion of Spoofing. Zenodo. DOI: 10.5281/zenodo.18818608
[4] Coslett, A. R. (2026). Provenance Generalization and Verification Scaling for Neural Network Forensics. Zenodo. DOI: 10.5281/zenodo.18872071
[5] Coslett, A. R. (2026). Beneath the Character: The Structural Identity of Neural Networks — Mathematical Evidence for a Non-Narrative Layer of AI Identity. Zenodo. DOI: 10.5281/zenodo.18907292
[6] Coslett, A. R. (2026). Which Model Is Running? Structural Identity as a Prerequisite for Trustworthy Zero-Knowledge Machine Learning. Zenodo. DOI: 10.5281/zenodo.19008116
[7] Coslett, A. R. (2026). The Deformation Laws of Neural Identity. Manuscript in preparation.
[8] Anthropic. (2026, February 23). Detecting and preventing distillation attacks. https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks
[9] European Parliament and Council. (2024). Regulation (EU) 2024/1689 (Artificial Intelligence Act). Articles 50, 53, 55.
[10] European AI Office. (2025). General-Purpose AI Code of Practice.
[11] NIST National Cybersecurity Center of Excellence. (2026, February). Accelerating the Adoption of Software and Artificial Intelligence Agent Identity and Authorization (Concept Paper). https://www.nccoe.nist.gov/projects/software-and-ai-agent-identity-and-authorization
[12] AILEX LLC. (2026, March). Verifiable AI Provenance (VAP) Framework and Legal AI Profile (LAP). Internet-Draft draft-ailex-vap-legal-ai-provenance-03. https://datatracker.ietf.org/doc/draft-ailex-vap-legal-ai-provenance/