Paper XI · March 2026

Post-Hoc Disclosure Is Not Runtime Proof: Model Identity at Frontier Scale

Abstract

Current AI deployment stacks authenticate agents, workloads, and credentials but do not verify which neural network is computing at inference time. Recent incidents — including the undisclosed use of an open-weight foundation model inside a commercial product, industrial-scale distillation campaigns, and emerging agent identity standards that authenticate software without authenticating models — show that this gap has practical consequences. Post-hoc disclosure resolved these incidents; runtime proof would have made the model identity question answerable at inference time. This paper asks whether runtime model identity is technically feasible at frontier scale. We present three results. First, we enrolled and verified five open-weight transformer models spanning 8 billion to 72.7 billion parameters across three families, with zero false acceptances in all pairwise comparisons and self-verification within the acceptance threshold for all models. A thermodynamic observable predicted by extreme value theory remained within two percent of its predicted value across the full range, with no statistically significant scale-dependent correction detected across more than two orders of magnitude in parameter count. Second, we tested structural separability on three declared-lineage distillation pairs spanning 8 billion to 70 billion parameters — each derivative sharing identical architecture with its base — and measured separations ranging from 2,858 to 4,583 times the acceptance threshold, increasing monotonically with model scale across two base-model families. All derivatives self-verified within the acceptance threshold. Third, we demonstrate a frontier-scale software attestation path — including signed JWT issuance and downstream policy consumption — and situate it within a previously formalized attestation architecture that composes with enterprise identity infrastructure, complementing rather than replacing current agent identity frameworks. 
These results demonstrate that runtime model identity is measurable and separable across the tested range of open-weight instruct-tuned transformers from 8B to 72.7B, with a frontier-validated software attestation path and an inherited route to stronger hardware-backed and proof-backed assurance.

1. The Runtime Identity Gap

Modern AI deployment infrastructure authenticates three things: artifacts, credentials, and agents. Model registries track which artifacts were shipped. API keys and OAuth tokens authenticate which credentials are presented. Workload and agent identity frameworks — SPIFFE, mTLS, workload attestation — authenticate which software is authorized to act. These are well-developed problems with mature tooling.

None of them identify the neural network actually computing.

This is not a gap in implementation. It is a gap in the category of evidence. Artifact identity records what was deployed at a point in time. Agent identity confirms which service is making a request. Model identity would confirm that the computational process currently producing outputs corresponds to a specific, enrolled model — not a substitution, not a derivative, not a silently updated replacement. To our knowledge, no widely deployed system publicly describes this assurance for model identity at runtime.

The distinction matters because the relationship between an artifact and a running model is not guaranteed to be stable. Open-weight models can be fine-tuned, distilled, quantized, composed into routing pipelines, and redeployed under new names without altering any credential, certificate, or workload identity. The agent identity layer continues to authenticate correctly throughout. The model underneath can change without any identity-layer signal.

Three recent classes of incident illustrate the consequence.

In March 2026, a commercially deployed AI coding tool launched a new flagship model described as the product of continued pretraining and reinforcement learning. Soon after launch, a developer examining the tool's API responses discovered an internal model identifier referencing a different organization's open-weight model. Subsequent reporting confirmed the base model's origin, and the deployer acknowledged the foundation. The detection mechanism was an accidental metadata leak in an API response — not a runtime identity check, not an audit trail, and not any property of the model itself. Public clarification and reporting followed after launch [1][2].

In February 2026, Anthropic disclosed that multiple AI laboratories had conducted industrial-scale distillation campaigns against its Claude models, involving over sixteen million interactions across approximately twenty-four thousand accounts created specifically for capability extraction. A successful distillation campaign yields a derivative artifact rather than the original model itself. The disclosed detection path was usage-pattern analysis on the provider side, not a runtime model-identity check at inference [3].

Concurrently, a broader infrastructure response is underway. The NIST National Cybersecurity Center of Excellence issued a concept paper in early 2026 inviting public comment on how software and AI agents should be identified, authenticated, authorized, and audited, with responses informing a potential demonstration project [4]. Okta announced an AI agent identity suite designed to extend identity governance to autonomous AI systems [5]. The open-source community has also produced agent frameworks intended for broad deployment. Each of these efforts addresses the agent authentication layer — the question of which software is authorized to act. None, as publicly described, provides a runtime measurement of which model is producing the computation that the agent relies upon.

The pattern across these incidents and responses is consistent. When model identity was eventually established, it was established through post-hoc disclosure: leaked metadata, provider-side detection, community reverse-engineering, or updated documentation. In each case, the public record shows model identity being established post hoc rather than through a runtime identity check at inference.

Post-hoc disclosure is not runtime proof. The former is a detective control that operates after the fact, depends on accident or effort, and provides no guarantee of timeliness. The latter would be a preventive control that operates at inference time, depends on measurement rather than trust, and composes with existing authorization infrastructure.

The distinction between these two is not semantic. It is the distinction between discovering a model substitution after deployment controversy and enabling enforcement at inference time.

This paper asks whether a runtime model identity layer is technically feasible at the model sizes where these incidents occurred.

We present evidence that it is. Using a structural measurement that observes activation geometry during a standard forward pass — without inspecting weights, modifying the model, or requiring provider cooperation — we enrolled and verified five open-weight transformer models spanning 8 billion to 72.7 billion parameters across three families. All five self-verified within the acceptance threshold. All ten pairwise comparisons produced correct rejections. A thermodynamic observable predicted by extreme value theory remained within two percent of its predicted value across the full parameter range. A declared same-family distillation pair — a 70-billion-parameter derivative and its stated base model, sharing identical architecture and parameter count — produced a structural separation of 4,583 times the acceptance threshold. Two additional declared-lineage pairs at 8 billion and 14 billion parameters showed the same pattern, with separations of 2,858 and 3,616 times the threshold respectively, increasing monotonically with scale.

The measurement layer used in these experiments is operational rather than merely conceptual. Prior work in this series established the formal properties of the structural observable [6], its behavior under adversarial training perturbations [7,8], the formal admissibility conditions for identity evidence across independent layers [9], its composition with standard enterprise identity protocols [10], and its deployment through zero-knowledge and hardware-attested proof pipelines [11]. This paper contributes the frontier-scale empirical evidence that the measurement layer remains viable — and the market context that explains why it is needed now.

The remainder of this paper is organized as follows. Section 2 presents the frontier-scale structural and thermodynamic validation. Section 3 presents the declared-lineage structural separability result. Section 4 examines what current deployment stacks can and cannot see in terms of model identity evidence. Section 5 describes the attestation architecture that connects the measurement to existing enterprise infrastructure. Section 6 discusses limitations and open questions.

2. Structural Identity at Frontier Scale

The structural fingerprint used throughout this work is the IT-PUF (Inference-Time Physically Unclonable Function), introduced in prior work on the δ-gene structural observable [6] and further analyzed in distillation forensics [7], provenance generalization [8], and the three-layer deformation doctrine [12]. The measurement observes the geometry of hidden-state activations during a standard forward pass. At two fixed sites within the model's layer stack, activation vectors are captured and decomposed into radial and tangential components relative to the token-level centroid. This decomposition yields a scalar τ ∈ [0,1] — the structural fingerprint — measuring the tangential fraction of variance at each prompt and measurement site. Across multiple prompts and seeds, these scalars form a 64-dimensional response vector that characterizes the model's structural identity. No weight inspection or model modification is required; the measurement observes what the model computes, not what it stores. The full derivation, including the gauge-transport decomposition, appears in [6], with the three-layer deformation doctrine — the finding that neural identity comprises three independently varying observable layers, each governed by distinct deformation laws — established in [12].
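The decomposition described above can be sketched in a few lines. This is a minimal reading of one plausible form of the statistic, not the exact pipeline: the dual-site capture, prompt bank, seeds, and 64-dimensional aggregation are specified in [6].

```python
import numpy as np

def tau(H: np.ndarray) -> float:
    """Tangential fraction of variance for one prompt at one site (sketch).

    H: (n_tokens, d) matrix of hidden-state activations. Each activation
    is split into a radial component (its projection onto the direction
    of the token-level centroid) and a tangential remainder; tau is the
    tangential share of total variance, so it lies in [0, 1].
    """
    mu = H.mean(axis=0)                       # token-level centroid
    u = mu / (np.linalg.norm(mu) + 1e-12)     # radial direction
    radial = np.outer(H @ u, u)               # component along u
    tangential = H - radial                   # component orthogonal to u
    total_var = np.sum((H - H.mean(axis=0)) ** 2)
    tang_var = np.sum((tangential - tangential.mean(axis=0)) ** 2)
    return float(tang_var / (total_var + 1e-12))

rng = np.random.default_rng(42)
H = rng.normal(loc=1.0, scale=0.1, size=(128, 64))  # synthetic activations
t = tau(H)
```

Because the centered radial and tangential components are orthogonal, the two variance terms sum exactly to the total, which is what confines the statistic to [0, 1].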

Why the observable is stable. The intuition for why this observable is stable across architectures and scales rests on a property of the normalization layers (RMSNorm or LayerNorm) present in all tested transformer architectures. Normalization layers discard radial information before it reaches downstream computation — they divide each activation vector by its magnitude, preserving only directional content. The structural measurement exploits this by observing the pre-normalization geometry, where the radial-tangential decomposition captures architectural structure that is imposed by the normalization mechanism itself rather than learned during training. The tangential fraction τ measures the portion of variance that survives normalization and drives computation. Because this geometric structure is architecturally imposed, it provides a measurement site that is stable across model families while remaining sensitive to training-induced differences in the geometry of what each model computes. The acceptance threshold ε is calibrated from the variance observed when the same model is measured repeatedly under controlled conditions; it represents the noise floor of the measurement itself rather than a tunable parameter.
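A small numerical check of the normalization property described above, using RMSNorm in its unweighted form (gain parameters omitted): rescaling an activation vector, which changes only its radial content, leaves the normalized output essentially unchanged, while a directional change does not.

```python
import numpy as np

def rmsnorm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Divide by the root-mean-square magnitude: only the direction
    # (tangential content) of x survives into downstream computation.
    return x / np.sqrt(np.mean(x ** 2) + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=4096)

# Radial rescaling is discarded by the normalization...
same_direction = np.allclose(rmsnorm(x), rmsnorm(3.7 * x), atol=1e-4)

# ...while a directional (tangential) change is not.
y = x.copy()
y[:2048] *= -1.0  # same magnitude, different direction
different_direction = np.allclose(rmsnorm(x), rmsnorm(y), atol=1e-4)
```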

Prior validation of this measurement covered 23 open-weight models spanning 410 million to 7.6 billion parameters across 16 families and 3 architecture types, with zero false acceptances in 1,012 pairwise comparisons [6]. The question left open was whether the same measurement pipeline — without modification — would produce valid structural fingerprints at the model sizes deployed in production by the organizations described in §1.

Frontier enrollment. We enrolled five open-weight instruct-tuned transformer models spanning 8 billion to 72.7 billion parameters across three families. Each model was loaded in bfloat16 precision with automatic multi-GPU sharding across three NVIDIA A100-SXM4-80GB GPUs (238 GB total VRAM). The measurement pipeline was identical to the sub-7B validation: same prompt bank, same seeds (42, 123, 456, 789), same dual-site 64-dimensional protocol, same acceptance threshold ε. No changes to the measurement logic were required for frontier-scale models. The only operational adaptation was subprocess isolation between sequential model loads, to prevent CUDA memory fragmentation from biasing the device-map planner.
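The subprocess-isolation pattern is simple to state in code. In this minimal sketch the child payload is a stand-in for the real per-model enrollment, which would load the model and compute the fingerprint before exiting so that each load starts from fresh CUDA state:

```python
import subprocess
import sys

SEEDS = [42, 123, 456, 789]

def enroll_isolated(model_id: str) -> str:
    # One interpreter per model: when the child exits, all of its GPU
    # allocations are released, so the next model's device-map planner
    # sees unfragmented memory. The child body below is a placeholder.
    child = "import sys; print('enrolled ' + sys.argv[1] + ' seeds=' + sys.argv[2])"
    proc = subprocess.run(
        [sys.executable, "-c", child, model_id, ",".join(map(str, SEEDS))],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout.strip()

result = enroll_isolated("Llama-3.1-8B-Instruct")
```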

Model | Family | Parameters | Layers | Enrollment Time | Self-Verification
Llama-3.1-8B-Instruct | Llama | 8.0B | 32 | 8.0 s | 0.0 × ε
Qwen2.5-14B-Instruct | Qwen | 14.8B | 48 | 11.7 s | 0.0 × ε
Mistral-Small-24B-Instruct | Mistral | 23.6B | 40 | 13.2 s | 0.0 × ε
Llama-3.1-70B-Instruct | Llama | 70.6B | 80 | 29.5 s | 0.0 × ε
Qwen2.5-72B-Instruct | Qwen | 72.7B | 80 | 29.8 s | 0.0 × ε

All five models enrolled on first attempt with zero validation failures across all four seeds. Self-verification — remeasuring the enrolled model and comparing against its own anchor — produced 0.0 × ε in every case, reflecting the deterministic inference path (bfloat16, greedy decoding, fixed prompt bank). In operational deployments where hardware, software versions, or serving configurations may differ, self-verification distances may be nonzero but are expected to remain well within the acceptance threshold ε. The acceptance criterion is distance ≤ ε, not literal zero; the observed zero is a property of this specific measurement environment, not a normative requirement.

Pairwise discrimination. All ten pairwise comparisons between the five enrolled models produced correct rejections:

Pair | Distance | Relationship
Llama-3.1-8B ↔ Llama-3.1-70B | 67 × ε | Same family, 9× scale gap
Mistral-24B ↔ Qwen-72B | 951 × ε | Cross-family
Qwen-14B ↔ Qwen-72B | 1,628 × ε | Same family, 5× scale gap
Mistral-24B ↔ Llama-70B | 1,889 × ε | Cross-family
Llama-8B ↔ Mistral-24B | 1,903 × ε | Cross-family
Qwen-14B ↔ Mistral-24B | 2,113 × ε | Cross-family
Llama-8B ↔ Qwen-14B | 2,178 × ε | Cross-family
Qwen-14B ↔ Llama-70B | 2,198 × ε | Cross-family
Llama-70B ↔ Qwen-72B | 2,502 × ε | Cross-family
Llama-8B ↔ Qwen-72B | 2,513 × ε | Cross-family
Figure 1 · Pairwise Structural Distances (×ε). Heatmap of the 5×5 distance matrix over {Llama-8B, Qwen-14B, Mistral-24B, Llama-70B, Qwen-72B}. Tightest pair (Llama-8B ↔ Llama-70B, 67 × ε) highlighted green; all others exceed 900 × ε.

All ten comparisons produced decisive rejections. The tightest frontier pair is Llama-3.1-8B ↔ Llama-3.1-70B at 67 × ε: same family label, same tokenizer, identical architecture class, but a 9× parameter gap. The measurement resolves this as two distinct models with 67 times the margin of the acceptance threshold. The remaining same-family pair (Qwen-14B ↔ Qwen-72B at 1,628 × ε) falls within the cross-family range rather than below it, indicating that family membership alone does not impose a strict ordering on structural distances. What is preserved at frontier is decisive separability: every pair, regardless of family relationship, is far above threshold.
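The accept/reject rule applied in these comparisons reduces to a thresholded L2 distance over the 64-dimensional response vectors. A sketch, with ε normalized to 1 for illustration and synthetic vectors standing in for real fingerprints (the actual ε is calibrated from repeated-measurement variance, as described above):

```python
import numpy as np

EPS = 1.0  # acceptance threshold (normalized to 1 for illustration)

def verify(anchor: np.ndarray, measurement: np.ndarray, eps: float = EPS):
    # Accept iff the fresh measurement falls within eps of the enrolled
    # anchor; also report the distance in multiples of eps, the unit
    # used throughout the tables in this paper.
    d = float(np.linalg.norm(anchor - measurement))
    return ("ACCEPT" if d <= eps else "REJECT", d / eps)

rng = np.random.default_rng(7)
anchor = rng.normal(size=64)
remeasured = anchor + rng.normal(scale=1e-3, size=64)  # same model, noise
other = rng.normal(size=64)                            # different model

verdict_same, _ = verify(anchor, remeasured)
verdict_other, margin = verify(anchor, other)
```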

Thermodynamic invariant. Alongside the structural fingerprint τ, a second observable — the normalized third logit gap δ_norm — was measured at three frontier scales via autoregressive generation. Extreme value theory predicts δ_norm ≈ 0.318 for softmax distributions over large vocabularies [6]. Prior cross-sectional validation across 22 transformers at 0.4B–7.6B [12] yielded a mean of 0.308 with CV 3.48%. The frontier measurements extend this range:

Model | Parameters | δ_norm | Tokens | Deviation from EVT
Llama-3.1-8B-Instruct | 8.0B | 0.3173 | 5,120 | 0.2%
Llama-3.1-70B-Instruct | 70.6B | 0.3153 | 5,120 | 0.8%
Qwen2.5-72B-Instruct | 72.7B | 0.3129 | 5,048 | 1.6%
Figure 2 · δ_norm Across 25 Transformer Models (0.4B–72.7B). Prior measurements (22 models, ○) and frontier measurements from this paper (●) plotted against parameter count on a log scale, with the EVT prediction of 0.318 as a reference line; Yi-1.5-6B-Chat is the visible low-side outlier. No statistically significant scale correction detected (OLS p = 0.69). EVT universality confirmed across a 180× parameter range.

All three values fall within 2% of the EVT prediction. A formal trend analysis combining the 22 prior models with these three frontier measurements (25 total data points, 0.41B–72.7B) found no statistically significant scale-dependent correction: OLS slope p = 0.69 on the full dataset, p = 0.34 on the subset excluding the known low-side outlier (Yi-1.5-6B-Chat, δ_norm = 0.271, z = −3.47σ from the cross-sectional mean), with Spearman ρ not significant in any tested configuration. The absence of a significant scale correction across the tested range — which spans over two orders of magnitude in parameter count — is consistent with the theoretical derivation, which depends on the softmax normalization structure rather than model size [6].
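The trend analysis is a standard regression of δ_norm against log parameter count. A sketch of that test using SciPy, with placeholder values standing in for the 25 published measurements (these numbers show the shape of the computation, not the data itself):

```python
import numpy as np
from scipy import stats

# Placeholder (parameters in billions, delta_norm) pairs; illustrative
# stand-ins for the published measurements, not real data.
params_b = np.array([0.41, 0.5, 1.1, 1.5, 3.0, 7.6, 8.0, 70.6, 72.7])
delta = np.array([0.309, 0.306, 0.311, 0.304, 0.310, 0.305, 0.317, 0.315, 0.313])

x = np.log10(params_b)

# OLS slope against log scale: a statistically significant slope would
# indicate a scale-dependent correction to the EVT prediction of 0.318.
ols = stats.linregress(x, delta)

# Rank correlation as a distribution-free check on the same question.
rho, p_spearman = stats.spearmanr(x, delta)
```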

This result has a specific implication for the three-layer deformation doctrine established in [12]. That work documented three observable layers — structural (τ, the activation-geometry fingerprint), thermodynamic (δ_norm, the normalized logit gap), and functional (the PPP residual, a template derived from logprob distributions that captures what the model has learned to predict rather than how it computes) — each obeying distinct deformation laws under training perturbation. The frontier data confirms that the first two layers retain their documented properties at 72B: all five measured structural fingerprints were distinct, and δ_norm remained approximately universal across the tested range. The third layer is not measured in this paper; its absence defines one of the paper's explicit limits and is discussed in §6.

Operational scaling. Enrollment time increased modestly with model size across the tested range: 8 seconds at 8B, 30 seconds at 72B, consistent with forward-pass cost dominating the measurement. No obvious superlinear overhead from the measurement hooks or fingerprint computation was apparent in this frontier sample. At the cloud GPU rates used for these experiments (A100 spot instances), the compute cost per frontier enrollment was approximately $0.05.

3. Structural Separability Under Declared Lineage

Section 2 established that the structural measurement pipeline produces valid, discriminative fingerprints at frontier scale. This section asks a harder question: when a frontier model is explicitly derived from another through distillation — same architecture, same parameter count, same layer count, declared lineage — does the structural fingerprint still resolve the two as distinct?

The test case is a real production artifact, not a laboratory construction. DeepSeek released DeepSeek-R1-Distill-Llama-70B as a publicly available model, described as a distilled derivative of Meta's Llama-3.3-70B-Instruct [14]. The base and derivative share identical architecture (LlamaForCausalLM), identical parameter count (70.6 billion), identical layer count (80), and identical hidden dimension (8,192). What differs is the training trajectory: the derivative underwent a distillation process that reshaped its internal representations without altering its architectural specification.

This is the hardest structural separability test in this program to date. Prior distillation experiments at sub-7B scales [7] tested teacher-student pairs that differed in parameter count by 7–15×, providing a natural structural gap. Here, the base and derivative are architecturally identical at frontier scale. Any measured separation must arise from training-history-induced changes to activation geometry rather than from differences in architecture, tokenizer, or model size.

Experimental design. This experiment was conducted in March 2026, after the frontier enrollment results in §2. Llama-3.3-70B-Instruct was obtained directly from Meta and converted to HuggingFace safetensors format. DeepSeek-R1-Distill-Llama-70B was loaded from its public HuggingFace repository. Both were enrolled using the same pipeline described in §2: bfloat16 precision, three-GPU sharding, four seeds, dual-site 64-dimensional protocol. Each enrollment ran in a subprocess-isolated process with fresh CUDA state. Both models enrolled cleanly: all four seeds passed validation, and both self-verified at 0.0 × ε. Cross-verification was performed in both directions:

Direction | Seed 42 | Seed 123 | Seed 456 | Seed 789 | Maximum
Derivative measured against base anchor | 2,721 × ε | 4,321 × ε | 1,815 × ε | 4,583 × ε | 4,583 × ε
Base measured against derivative anchor | 2,721 × ε | 4,321 × ε | 1,815 × ε | 4,583 × ε | 4,583 × ε
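The exact bidirectional symmetry in this table is what the L2 metric guarantees: the distance from the derivative's measurement to the base's anchor equals the distance in the reverse orientation, seed by seed. A minimal sketch with hypothetical per-seed fingerprints:

```python
import numpy as np

SEEDS = (42, 123, 456, 789)
rng = np.random.default_rng(3)
base = {s: rng.normal(size=64) for s in SEEDS}   # base-model anchors
deriv = {s: rng.normal(size=64) for s in SEEDS}  # derivative anchors

def cross_verify(measured: dict, anchor: dict):
    # Per-seed L2 distances plus the maximum over seeds, which is the
    # summary figure reported per direction.
    per_seed = {s: float(np.linalg.norm(measured[s] - anchor[s])) for s in SEEDS}
    return per_seed, max(per_seed.values())

fwd, fwd_max = cross_verify(deriv, base)  # derivative vs. base anchor
rev, rev_max = cross_verify(base, deriv)  # base vs. derivative anchor
```

Since ||a − b|| = ||b − a|| exactly in floating point, the two directions agree to the bit, matching the identical rows above.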

The symmetry is exact at reported precision: both directions produce identical per-seed distances and an identical maximum of 4,583 × ε. This is expected from the L2 distance metric used in the protocol and confirms that neither anchor orientation introduces asymmetry. All four seeds produce distances far above the acceptance threshold. The weakest seed (456, at 1,815 × ε) is still over eighteen hundred times the threshold. The conclusion does not depend on any single seed.

Comparison to prior results. The magnitude of this separation can be contextualized against three reference classes from this paper and the broader program:

Comparison Type | Scale | Distance | Source
Sub-7B distillation (different-scale teacher ↔ student) | 0.5B–7B | 726–1,212 × ε | Prior distillation forensics [7]
Frontier same-family (Llama-3.1-8B ↔ Llama-3.1-70B) | 8B–70B | 67 × ε | §2 of this paper
Frontier cross-family (Llama-3.1-70B ↔ Qwen2.5-72B) | 70B–72B | 2,502 × ε | §2 of this paper
Frontier distillation (Llama-3.1-8B ↔ DeepSeek-R1-Distill-8B) | 8B | 2,858 × ε | This paper
Frontier distillation (Qwen-2.5-14B ↔ DeepSeek-R1-Distill-14B) | 14B | 3,616 × ε | This paper
Frontier distillation (Llama-3.3-70B ↔ DeepSeek-R1-Distill-70B) | 70B | 4,583 × ε | This paper
Figure 3 · Structural Distances Across Comparison Regimes (×ε, log scale). Upper: reference distances (sub-7B distillation 726–1,212; frontier same-family 67; frontier cross-family 2,502). Lower: the declared-lineage distillation pairs from this paper (8B: 2,858; 14B: 3,616; 70B: 4,583). All values in multiples of ε.

For this 70B pair, the structural separation produced by distillation exceeded both the sub-7B distillation reference and the frontier cross-family reference measured in §2. Two additional declared-lineage distillation pairs — at 8B and 14B, both from the same distillation family (DeepSeek-R1) — showed the same pattern: structural separability far above threshold, with distances of 2,858 × ε and 3,616 × ε respectively. All three derivatives self-verified at 0.0 × ε against their own anchors. On all three measurements, the derivative is structurally farther from its own base than models of the same family that were not distilled. This breaks the sub-7B hierarchy, where same-family distances were consistently smaller than cross-family distances across 1,012 comparisons [6]. The break is informative: the base and derivative share architecture but have maximally different formative trajectories — one pretrained by its original organization, the other reshaped through DeepSeek's reasoning distillation process. Architecture is a necessary condition for structural proximity; it is not sufficient. The hierarchy holds for models that share training lineage (Llama-8B ↔ Llama-70B = 67×ε), not for architecture-matched models with divergent formative paths.

Interpretation. This result is consistent with the identity formation thesis — that structural identity is determined by training trajectory and locked during the formative phase of training — developed in prior work on training-stage dynamics [15]. More specifically, it instantiates at production scale a result formalized in that same work: architecture and training specification, absent trajectory-specific information, do not uniquely determine structural identity [15]. That theorem was established at 410 million parameters in a controlled seed experiment; the present result confirms it at 70 billion parameters with a production artifact.

The result also exhibits a striking asymmetry with prior adversarial findings. Earlier work on dynamic inertness [12] showed that direct, unconstrained gradient targeting of the structural fingerprint — the strongest possible targeted attack — produced only tens of multiples of the acceptance threshold in movement before plateauing. Here, distillation that was not targeting the structural observable at all produced over four thousand multiples of departure. Targeted perturbation along the fingerprint's nullspace is resisted; formative training that reshapes what the model computes produces large departures. The structural carrier is sensitive to the latter and inert to the former. A model that has undergone distillation from a different teacher system has traversed a fundamentally different formative path, even when the starting architecture is shared. The structural fingerprint reflects that divergence.

One possible explanation for the magnitude of the separations is that frontier models possess substantially more structural degrees of freedom than sub-7B models. A 70-billion-parameter model has more activation dimensions, more layers, and a richer geometry available for the distillation process to reshape.
The three declared-lineage pairs in this study — all from the same distillation family but spanning a 9× parameter range across two base-model families (Llama and Qwen) — show a monotonically increasing sequence: 2,858 × ε at 8B, 3,616 × ε at 14B, and 4,583 × ε at 70B. The Qwen-based 14B pair sits between the two Llama-based pairs, which argues against base family as the primary driver. This trend is consistent with the hypothesis that structural separability under distillation increases with model scale, but three points across two families constitute an exploratory observation, not a confirmed scaling relationship. The interpretation is consistent with the prior finding that structural identity is training-determined, not architecture-readable [12].

Operational implication. The structural measurement distinguishes the derivative from its base with a margin of over four thousand times the acceptance threshold, using the same pipeline that enrolled five models in §2 and required approximately thirty seconds per enrollment. The measurement requires no knowledge of the distillation method, no access to the teacher model, and no cooperation from either deployer. It requires only a forward pass through the model under test. For deployment contexts where the question is not "which family does this model belong to?" but rather "is this the specific model that was authorized?" — the measurement resolves the question decisively, even when the authorized and unauthorized models share identical architecture, identical parameter count, and declared lineage.

4. What Current Stacks Cannot See

The results in §§2–3 demonstrate that structural model identity is measurable and separable at frontier scale, including under declared same-family distillation. This section examines why that capability does not yet exist in deployed infrastructure — and what class of assurance it would provide if it did.

A deployment stack that governs AI-powered services must, in principle, answer four questions:

1. What artifact was shipped? Model registries, version-controlled repositories, and bills of materials record which files were deployed at a given time. This is artifact identity — a static record.
2. What agent or workload is authenticated? OAuth tokens, SPIFFE identities, mTLS certificates, and API keys authenticate which software is authorized to make requests. This is workload identity — a runtime credential.
3. What model is currently computing? At the moment of inference, is the neural network producing outputs the same one that was enrolled, authorized, and attested? This is runtime model identity — a measurement.
4. Is training lineage detectable? If the model was derived from another through distillation, fine-tuning, or continued pretraining, is the lineage detectable? This is a forensic question.

Figure 4 · The Identity Evidence Gap
Q1 What artifact was shipped? (registries, BOMs, version control): ✓ Addressed
Q2 What workload is authenticated? (OAuth, SPIFFE, mTLS, API keys): ✓ Addressed
Q3 What model is computing? (runtime structural measurement): ✗ Not addressed
Q4 Is training lineage detectable? (forensic provenance analysis): ✗ Post-hoc only
Q1–Q2 are addressed by current standards; Q3–Q4 constitute the evidence gap this paper demonstrates is technically closable.

Current enterprise stacks address questions 1 and 2 with mature tooling. Question 3 is not addressed by any widely deployed system. Question 4 is typically addressed post-hoc, when lineage is inferred through external evidence. Questions 3 and 4 correspond respectively to the structural and functional layers of the three-layer deformation doctrine [12], with questions 1 and 2 addressing the artifact and credential layers that sit above the measurement surface.

The gap between questions 2 and 3 is not a missing feature. It is a missing evidence class.

Prior work in this series formalized this distinction. The admissibility doctrine, established in prior work on evidence classification for identity claims [9], holds that evidence from one identity layer cannot certify claims about another when the layers are operationally independent. Authenticating the workload (question 2) provides no information about which model the workload is running (question 3), because the credential binds to the software process, not to the neural network's computational identity. A workload can present valid credentials while serving a different model than the one that was authorized — through substitution, silent update, supply-chain compromise, or the ordinary practice of swapping open-weight models behind a stable API endpoint. This is not a theoretical concern. Each incident class described in §1 is an instance of a valid workload identity coexisting with an unverified or misidentified model identity. The coding tool's API credentials were valid; the model foundation was not the one initially disclosed. The distillation campaigns used authenticated accounts to extract capability into derivative models rather than access the original model directly. The agent identity frameworks being developed by standards bodies authenticate the agent correctly — they are designed to. The question they do not ask is whether the model behind the agent is the one it claims to be.

For models accessible only through commercial APIs — the regime that applies to the first incident class — prior work established a separate verification methodology based on logprob order-statistic geometry [13], validated at commercial API scale across three providers [8]. That regime operates within the API interface itself, without weight access, and addresses question 3 for API-served models.

The formal structure of the gap can be stated precisely. Let W denote a workload identity credential and M denote a runtime model identity claim. Current stacks verify W. The assumption — implicit in every deployment that does not perform runtime model verification — is that valid W implies correct M. But W and M are operationally independent: W is issued by an identity provider based on software configuration, while M is a property of the neural network's learned parameters as manifested in computation. No runtime binding between the two is enforced by the standard identity stack. The evidence presented in §§2–3 shows that M is obtainable. The structural fingerprint τ yields a runtime model identity measurement that is obtainable in 30 seconds at frontier scale, deterministic under controlled inference conditions, and separable across all tested model pairs — including pairs that share identical architecture, parameter count, and declared lineage. What remains is the question of how this measurement integrates with the infrastructure that already handles W. That is the subject of §5.

5. Toward Runtime Model Identity

The gap identified in §4 — the absence of a runtime binding between workload identity W and model identity M — is partially closed at frontier scale by the measurement and software attestation path validated in this work. This section describes what was demonstrated, what is inherited from prior work, and what remains.

What was demonstrated at frontier scale. The experiments described in §§2–3 validated more than the structural measurement alone. They also exercised the full software attestation path: the measurement engine enrolled each model, compared it against its anchor, issued a software-signed JSON Web Token (JWT) carrying the identity verdict, and a standard Open Policy Agent (OPA) policy consumed the resulting claim and produced correct ALLOW and DENY decisions. For a 70-billion-parameter model, this end-to-end path — from forward pass through signed claim to policy decision — completed in approximately thirty seconds. The attestation claim was expressed as a project-specific profile built on the Entity Attestation Token (EAT) framework defined in RFC 9711 [16], carrying fields that include the identity verdict, the trust mode, and a binding digest linking the claim to a specific enrollment anchor. This is the same format consumed by the enterprise integration surface described in prior work on composable model identity [10]: Envoy proxies, Cedar policies, SPIFFE-issued identity documents, and SIEM event pipelines. Prior work [10] proved that this composition preserves four formal composition properties — issuer authenticity, reference integrity, non-separability, and temporal binding — under the standard Coq verification stack used throughout this series. What this paper adds is the empirical demonstration that the software attestation path operates unchanged at the frontier model sizes studied here.
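The shape of the issue-then-consume path can be illustrated with a stdlib-only sketch. The claim field names below are illustrative, not the actual EAT profile; HMAC (HS256) stands in for the measurement engine's signing key, where a real deployment would use an asymmetric signature; and the inline verdict check stands in for an OPA policy evaluated against the decoded claims.

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_attestation(verdict: str, trust_mode: str, anchor_digest: str, key: bytes) -> str:
    """Assemble and sign a minimal EAT-style attestation JWT (HS256 sketch)."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iat": int(time.time()),
        "eat_profile": "example-model-identity-profile",  # hypothetical profile URI
        "model_verdict": verdict,        # identity verdict, e.g. "MATCH" / "MISMATCH"
        "trust_mode": trust_mode,        # e.g. "software" in the mode validated here
        "anchor_digest": anchor_digest,  # binds the claim to one enrollment anchor
    }
    signing_input = f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(claims).encode())}"
    sig = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"

def policy_allow(token: str, key: bytes) -> bool:
    """Downstream policy check: verify the signature, then require a MATCH verdict."""
    head, body, sig = token.split(".")
    expected = hmac.new(key, f"{head}.{body}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected), sig):
        return False  # DENY: claim not authentically issued
    claims = json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))
    return claims.get("model_verdict") == "MATCH"

key = b"demo-signing-key"
tok = sign_attestation("MATCH", "software", "sha256:deadbeef", key)
print(policy_allow(tok, key))           # True  (ALLOW)
print(policy_allow(tok, b"wrong-key"))  # False (DENY: signature mismatch)
```

The design point carried over from the paper's path is that the policy engine never touches the model: it trusts the verdict only because the signature binds the claim to the measurement engine and the anchor digest binds it to a specific enrollment.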

What is inherited but not newly frontier-validated. The attestation architecture supports two trust configurations, documented in prior work on identity-conditioned inference verification [11]. In the standard mode — the mode validated at frontier in this paper — the measurement engine runs as a software process alongside the model, and the attestation is signed by a software key. In the TEE-backed mode, validated in prior work on NVIDIA H100 with Intel TDX at up to 7B [11], the measurement runs inside a hardware-attested confidential computing enclave, and the attestation is co-signed by the hardware attestation service. A zero-knowledge proof-of-computation pipeline, also validated at sub-7B [11], provides a third option: cryptographic assurance that the measurement was performed correctly, without disclosing the model's internal representations. Extending the TEE-backed and ZK proof layers to 70B+ models involves multi-GPU sharding within a confidential computing boundary — feasible but not yet demonstrated. The distinction matters: the measurement and its signed claim are frontier-validated; the cryptographic assurance that the measurement was performed honestly remains validated at sub-7B only.
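Because the three trust configurations form progressively stronger anchors, a downstream policy can condition what an attested model is allowed to do on which anchor signed the claim. The mapping below is a hypothetical illustration of that idea, not a policy from this work; the tier names follow the paper's three modes, and the per-action requirements are invented for the example.

```python
# Trust anchors in ascending strength, following the three attestation modes:
# software-signed, hardware-attested (TEE), and proof-backed (ZK).
TRUST_RANK = {"software": 1, "tee": 2, "zk": 3}

# Hypothetical per-action policy: stronger actions demand stronger anchors.
REQUIRED_RANK = {
    "inference": 1,            # software-signed claim suffices
    "pii_processing": 2,       # require a hardware-attested measurement
    "regulated_decision": 3,   # require a proof-of-computation backing
}

def authorize(action: str, trust_mode: str) -> bool:
    """Allow an action only if the claim's trust anchor meets the required tier."""
    return TRUST_RANK.get(trust_mode, 0) >= REQUIRED_RANK[action]

print(authorize("inference", "software"))           # True
print(authorize("regulated_decision", "software"))  # False
```

Under this framing, extending TEE and ZK support to 70B+ models is what would unlock the higher tiers at frontier scale; the software tier is the one this paper validates there.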

Standards alignment. The NIST NCCoE concept paper referenced in §1 asks how AI agents should be identified, authenticated, authorized, and audited [4]. The attestation architecture described here provides a concrete candidate answer to the authentication component for the model layer: the structural measurement identifies the model, the EAT-profiled JWT authenticates the claim, and standard policy engines authorize or deny based on the claim's content. The April 2, 2026 comment deadline for the NIST concept paper falls within the publication window of this work.

6. Limitations and Open Questions

The measurement layer does not replace workload identity. It complements it. A complete deployment stack would verify both W (the agent is authorized) and M (the model is the one that was enrolled and attested). The infrastructure for W exists. The evidence presented in this paper demonstrates that both the measurement and the software attestation path for M are technically viable at frontier scale. The remaining engineering task is extending the stronger trust anchors — hardware attestation and proof-of-computation — to the same scale.

This paper closes the runtime model identity gap structurally at frontier scale but does not yet measure frontier-scale functional teacher trace. The structural and functional layers are formally independent evidence classes under the admissibility doctrine (§4; formally established in [9]); confirming structural separability does not certify functional provenance, and the converse also holds. Prior work validated functional provenance at sub-7B scales, demonstrating 31–52% convergence toward the teacher in distillation scenarios [7], with subsequent work establishing the directional alignment diagnostic, calibrating the measurability threshold, and confirming generalization across multiple teacher-student families [8]. Whether that functional signal persists, compresses, or amplifies at 70B remains an open empirical question and the natural next experiment in this program.

The structural separability results in §3 are drawn from three declared-lineage distillation pairs spanning 8B to 70B. The observed monotonic increase in structural separation with scale (2,858 × ε → 3,616 × ε → 4,583 × ε) is consistent with the hypothesis that separability amplifies at larger scales, but three points across two base-model families remain an exploratory trend, not a confirmed scaling relationship. Family is partially confounded with scale (8B and 70B are Llama-based; 14B is Qwen-based), and the Qwen pair's placement between the two Llama pairs — while suggestive — does not rule out family effects. Establishing whether the trend generalizes requires testing additional distillation families across a wider parameter range.

The software attestation path (measurement, JWT issuance, and policy consumption) is frontier-validated in this paper. The stronger trust layers — hardware-attested measurement within a confidential computing enclave and zero-knowledge proof that the measurement was performed correctly — remain validated at sub-7B only [11]. Extending them to 70B+ models involves multi-GPU sharding within a confidential computing boundary, which is feasible but not yet demonstrated. The three layers of the attestation stack (software-signed, hardware-attested, and proof-backed) provide progressively stronger trust anchors; this paper demonstrates the first at frontier scale, with a clear path to the others.

Acknowledgments

Portions of this research were developed in collaboration with AI systems that served as co-architects for experimental design, adversarial review, formal verification sketching, and manuscript preparation. All scientific claims, experimental designs, measurements, and editorial decisions remain the sole responsibility of the author. Frontier validation experiments were conducted on RunPod cloud infrastructure using NVIDIA A100-SXM4-80GB GPUs.

Author's Disclosure

The author operates Fall Risk AI, LLC, a research and infrastructure entity developing the measurement and attestation capabilities described in this work. In connection with the standards discussion in §1 and §5, the author submitted formal public comments to the NIST National Cybersecurity Center of Excellence (NCCoE) regarding the Software and AI Agent Identity and Authorization concept paper prior to the April 2, 2026 comment deadline.

Patent Disclosure

The structural measurement protocol applied in this work operates within the scope of U.S. Provisional Patent Applications 63/982,893 (weights-based identity verification, filed February 13, 2026) and 63/990,487 (API-based endpoint verification, filed February 25, 2026). The broader identity verification and attestation framework of which this measurement is a component is additionally covered by U.S. Provisional Patent Applications 63/996,680 (privacy-preserving model identity verification, filed March 4, 2026) and 64/003,244 (identity-conditioned inference verification, filed March 12, 2026). All four provisional patents are assigned to Fall Risk AI, LLC.

References


[1] Cursor, "Introducing Composer 2," Blog post, March 19, 2026. https://cursor.com/blog/composer-2

[2] TechCrunch, "Cursor admits its new coding model was built on top of Moonshot AI's Kimi," March 22, 2026. https://techcrunch.com/2026/03/22/cursor-admits-its-new-coding-model-was-built-on-top-of-moonshot-ais-kimi/

[3] Anthropic, "Detecting and preventing distillation attacks," Blog post, February 2026. https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks

[4] NIST National Cybersecurity Center of Excellence, "Accelerating the Adoption of Software and AI Agent Identity and Authorization," Concept Paper, 2026.

[5] Okta, "Every AI Agent Needs an Identity," Blog post, March 2026. https://www.okta.com/blog/ai/okta-ai-agents-early-access-announcement/

[6] A. R. Coslett, "The δ-Gene: A Structural Observable for Neural Network Identity," Fall Risk AI, 2026. DOI: 10.5281/zenodo.18704275

[7] A. R. Coslett, "The Geometry of Model Theft: Distillation Forensics via Structural Fingerprinting," Fall Risk AI, 2026. DOI: 10.5281/zenodo.18818608

[8] A. R. Coslett, "Provenance and Neural Forensics: From Distillation Detection to Zero-Knowledge Attestation," Fall Risk AI, 2026. DOI: 10.5281/zenodo.18872071

[9] A. R. Coslett, "What Counts as Proof? Admissible Evidence for Neural Network Identity Claims," Fall Risk AI, 2026. DOI: 10.5281/zenodo.19058540

[10] A. R. Coslett, "Composable Model Identity: Enterprise Integration of Neural Network Identity Claims," Fall Risk AI, 2026. DOI: 10.5281/zenodo.19099911

[11] A. R. Coslett, "Which Model Is Running? Identity-Conditioned Inference Verification," Fall Risk AI, 2026. DOI: 10.5281/zenodo.19008116

[12] A. R. Coslett, "The Deformation Laws of Neural Identity," Fall Risk AI, 2026. DOI: 10.5281/zenodo.19055966

[13] A. R. Coslett, "Template-Based Endpoint Verification via Logprob Order-Statistic Geometry," Fall Risk AI, 2026. DOI: 10.5281/zenodo.18776711

[14] DeepSeek AI, "DeepSeek-R1-Distill-Llama-70B," HuggingFace model card, 2026. https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B

[15] A. R. Coslett, "Where Identity Comes From: Formation Dynamics of Structural Neural Network Identity," Fall Risk AI, 2026. DOI: 10.5281/zenodo.19118807

[16] IETF, "Entity Attestation Token (EAT)," RFC 9711, 2025.

Cite this paper

A. R. Coslett, "Post-Hoc Disclosure Is Not Runtime Proof: Model Identity at Frontier Scale," Paper XI, Fall Risk AI, LLC, March 2026. DOI: 10.5281/zenodo.19216634