Technical Note — CAT-3 · March 2026

Measured Model Substitution Under Valid Agent Credentials

Abstract

Three model substitution scenarios were executed against a live inference endpoint with real HTTP requests, signed attestation JWTs, and OPA policy enforcement. In each scenario, every tested workload, artifact, or API identity control relevant to that scenario — workload JWT validation, health checks, gateway process continuity, artifact manifest integrity, API key authentication — remained valid while the model changed, yet a structural identity measurement based on activation geometry during a standard forward pass detected the substitution and the enforcement layer denied the request. Three substitutions were tested and three were detected, with zero false accepts in this run. Warm-path verification latency was 5.7–6.7 seconds on a single A100 with the model already loaded. The complete evidence chain — before/after measurement results, attestation claim summaries, OPA policy evaluations, and HTTP response codes — is published alongside this note as machine-readable JSON.

1. Purpose

Enterprise identity systems can authenticate a workload, verify an artifact, and authorize a request without ever establishing which neural network is actually computing inside the service. That distinction is the point of this note.

The measurement used here identifies neural networks at inference time from the geometry of their output distributions, without inspecting weights, modifying the model, or requiring cooperation from the deployer. It has already been validated across 48 models spanning three architecture types and parameter counts from 410 million to 72.7 billion [1–6]. What has not yet been shown is the operational step: the measurement running inside a live gateway, issuing a signed attestation, evaluating policy, and changing the HTTP outcome of a real request.

This note reports that experiment.

The question it answers is operational rather than scientific: if an authorized model is enrolled behind a production-shaped gateway and a different model is substituted behind the same stable endpoint, does the measurement detect the substitution in a real request flow, and does the enforcement chain act on it?

Three kinds of identity evidence are relevant in these scenarios: artifact identity (which file was deployed), workload identity (which process or service is authenticated), and model identity (which neural network is actually computing). This note tests what happens when the first two remain valid while the third changes.

2. Architecture

Two servers run on the same host:

A model server on port 8080 exposes a health endpoint and an inference endpoint. The model server loads the neural network and serves completions.

A gateway on port 9090 intercepts every request and runs the following chain before deciding whether to proxy to the model server:

1. Validate the workload JWT
2. Check model server health via real HTTP to port 8080
3. Run the structural identity measurement against the enrolled anchor
4. Issue a signed attestation JWT encoding the measurement result
5. Verify the attestation JWT (full signature + claims verification)
6. Evaluate OPA policy against the verified attestation claims
7. If policy allows → proxy to model server, return inference response
8. If policy denies → return HTTP 403 directly from the gateway

Two tokens are in play throughout. The workload JWT (agent identity) is issued once before the experiment begins, stays valid across all phases, and represents the authenticated software agent — this token is green in every scenario. The attestation JWT (model identity) is issued per-measurement by the production attestation issuer and encodes whether the running model matches the enrolled model — this token turns red when the model changes.
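The eight-step chain above can be sketched as a single decision function. This is a minimal illustration, not the production gateway: the boolean inputs stand in for the real JWT validation, health-check, attestation, and OPA machinery, and the function names are hypothetical.

```python
def decide(workload_jwt_valid: bool, backend_healthy: bool,
           distance_multiple: float) -> tuple[int, str]:
    """Return (http_status, reason) for one gateway request.

    distance_multiple is the structural measurement result expressed
    in multiples of epsilon (see Section 6): 0.0 for an identical
    model, anything above 1.0 for a different model.
    """
    if not workload_jwt_valid:                 # step 1: workload JWT
        return 401, "workload_jwt_invalid"
    if not backend_healthy:                    # step 2: health check
        return 503, "model_server_unhealthy"
    identity_match = distance_multiple <= 1.0  # steps 3-5: measure, attest, verify
    if not identity_match:                     # step 6: OPA policy on claims
        return 403, "model_identity_mismatch"  # step 8: deny at the gateway
    return 200, "proxied"                      # step 7: proxy to model server
```

With Scenario A's numbers, decide(True, True, 0.0) yields the 200/proxy outcome and decide(True, True, 2858.0) yields the 403 denial.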

[Figure 1 diagram: Client (Workload JWT ✓) → Gateway :9090, running 1. validate workload JWT, 2. health check, 3. structural measurement, 4. issue attestation JWT, 5. verify attestation JWT, 6. evaluate OPA policy; ✓ match → ALLOW, ✗ mismatch → DENY 403 → Model Server :8080]
Figure 1. Gateway request flow with workload and model identity tokens. The gateway runs six checks; proxy and deny are the two outcomes. The workload JWT (step 1) remains valid in every scenario. The attestation JWT (step 4) encodes the structural measurement result and flips from match to mismatch when the model changes.

3. Three Scenarios

Scenario A: Same-Family Substitution Behind a Stable Endpoint

The enrolled model is Llama-3.1-8B-Instruct. The substituted model is DeepSeek-R1-Distill-Llama-8B — a model distilled onto the same Llama architecture with the same parameter count and the same API contract.

The gateway process stays alive throughout. The workload JWT is the same JWT. The health endpoint returns 200 in both cases. The policy hash is unchanged.

Before substitution (A1): The gateway measures the model, finds 0.0×ε distance from the enrolled anchor, issues an attestation with identity_match: true and confidence: high, OPA allows, the request is proxied to the model server, HTTP 200 is returned.

After substitution (A2): The gateway measures the model, finds 2,858×ε distance from the enrolled anchor, issues an attestation with identity_match: false and confidence: low, OPA denies with reason model_identity_mismatch, HTTP 403 is returned directly from the gateway. The request never reaches the model server.

Control              Before (A1)    After (A2)     Status
Gateway PID          1639           1639           Unchanged
Workload JWT         PASS           PASS           Unchanged
Health check         200            200            Unchanged
Policy hash          f061...bb2b    f061...bb2b    Unchanged
Attestation match    true           false          Changed
OPA verdict          allow          deny           Changed
HTTP response        200            403            Changed
Proxied to backend   yes            no             Changed

Every tested workload and endpoint identity control remained green. Only the model identity layer turned red.

Scenario B: Cross-Family Substitution with Artifact Hash Integrity

The enrolled model remains Llama-3.1-8B-Instruct from Scenario A. The substituted model is Qwen-2.5-7B-Instruct — a different architecture family, different tokenizer, different training lineage.

Both models' weight files are verified against build manifests containing SHA-256 hashes of every artifact. Both manifests pass. Llama: 10 files, all hashes match. Qwen: 10 files, all hashes match. Traditional artifact integrity verification sees nothing wrong with either model.
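The manifest check that both models pass can be sketched as follows. The manifest layout ({"files": [{"path": ..., "sha256": ...}]}) is an assumption for illustration, not the published format of manifest_authorized.json.

```python
import hashlib
import json
import pathlib

def verify_manifest(manifest_path: str) -> bool:
    """Check every artifact listed in a build manifest against its
    recorded SHA-256 digest. Returns True only when all files match.
    Manifest layout is illustrative, not the note's published schema."""
    manifest = json.loads(pathlib.Path(manifest_path).read_text())
    for entry in manifest["files"]:
        digest = hashlib.sha256(
            pathlib.Path(entry["path"]).read_bytes()
        ).hexdigest()
        if digest != entry["sha256"]:
            return False  # artifact was altered on disk
    return True
```

The point of Scenario B is that this check passes for both models: it binds the request to files on disk, not to the network doing the computing.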

The structural identity measurement rejects the substituted model at 3,416×ε — exceeding Scenario A's same-family rejection of 2,858×ε, as expected for cross-family pairs. The signed attestation carries identity_match: false. OPA denies.

The manifests verified the files. They did not verify the computation.

Scenario C: Silent API Rotation

The enrolled model is gpt-4.1-mini served through the OpenAI API. The rotated model is gpt-4.1-nano. Both use the same API key, the same endpoint base, the same authentication, the same billing account. The provider authenticates both models identically.

Each model is enrolled across three independent sessions. Per-model thresholds are computed from maximum cross-session self-distance with a safety margin. Self-verification accepts both models against their own enrollment. Cross-verification rejects: the mean per-prompt distance of 5.66 exceeds both per-model thresholds (3.79 and 1.16).
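The threshold logic described above reduces to a few lines. The safety-margin factor below is an assumption for illustration; the note does not publish the exact margin used.

```python
def per_model_threshold(self_distances: list[float],
                        margin: float = 1.5) -> float:
    """Per-model acceptance threshold: the maximum self-distance
    observed across independent enrollment sessions, scaled by a
    safety margin (margin value is illustrative only)."""
    return max(self_distances) * margin

def cross_verify(mean_distance: float, threshold: float) -> bool:
    """A candidate passes only if its mean per-prompt distance to the
    enrolled anchor stays within that model's threshold."""
    return mean_distance <= threshold
```

Plugging in Scenario C's published numbers, the cross-distance of 5.66 fails against both per-model thresholds (3.79 and 1.16), while each model's self-verification passes against its own threshold.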

The API key authenticated both models. The structural measurement distinguished them.

4. Validation Gates

Before any scenario executes, two validation gates confirm measurement engine integrity:

V1 (self-verification): Qwen-2.5-0.5B-Instruct enrolled and immediately re-measured. Distance: 0.0×ε. τmean = 0.1201, within canonical range (0.02–0.15), cross-seed CV = 0.081. The measurement produces a zero-distance self-match and a structurally consistent fingerprint.

V2 (cross-family rejection): Llama-3.2-1B-Instruct measured against the Qwen anchor. Distance: 2,652×ε. The measurement produces a decisive cross-family rejection.

Both gates passed before any experimental scenario was executed.

5. Timing

Model loading is infrastructure overhead — 34–45 seconds to download and shard an 8B model. This is not verification latency. With the model already loaded and serving:

Phase              Measurement    Gateway latency
A1 (baseline)      6.71 s         6.76 s
A2 (substituted)   6.66 s         6.70 s
B (cross-family)   5.66 s

OPA policy evaluation: < 1ms. The measurement dominates.

This measurement is not inline on every user request. It runs at model load, on a periodic schedule, or as an out-of-band health check — the same deployment pattern as certificate rotation or container attestation refresh. The per-request path carries the cached attestation JWT, not the measurement itself.

6. Platform

GPU: NVIDIA A100-SXM4-80GB
Runtime: PyTorch 2.4.1+cu124, CUDA 12.4
Contract: d=64, k=32, seeds=(42, 123, 456, 789), ε=1.003×10⁻⁴
Gateway: FastAPI/uvicorn on port 9090
Model server: FastAPI/uvicorn on port 8080
Attestation: production fallrisk_attest.py (RS256 signed JWTs)
Policy: production policy.rego logic (hash: f0610f29e279bb2b)

Distances throughout this note are expressed as multiples of ε, the acceptance threshold derived from measurement-platform numerical precision. A distance of 0.0×ε indicates an identical model; any distance above 1.0×ε indicates a different model.
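The ε convention reduces to a one-line conversion plus a comparison; this sketch just restates the rule above in code.

```python
EPSILON = 1.003e-4  # acceptance threshold from the Section 6 contract

def distance_multiple(raw_distance: float) -> float:
    """Express a raw measured distance in multiples of epsilon,
    the unit used for every distance quoted in this note."""
    return raw_distance / EPSILON

def is_same_model(raw_distance: float) -> bool:
    """Identical model iff the distance does not exceed 1.0 x epsilon."""
    return distance_multiple(raw_distance) <= 1.0
```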

7. What This Does and Does Not Establish

This experiment establishes that artifact integrity, workload identity, and API authentication can remain valid while runtime model identity changes — and that a structural identity measurement, integrated into a standard gateway enforcement chain, detects the change and produces an enforceable policy decision. Three substitution scenarios were tested — same-family, cross-family, and API rotation — and all three were detected with zero false accepts.

This experiment does not establish that existing identity controls are unnecessary. Workload identity, artifact integrity, and API authentication are real and necessary controls. The finding is that they occupy a different evidence class than model identity and do not answer the same question. Artifact integrity and workload identity did not establish runtime model identity in these scenarios.

8. Evidence Artifacts

The complete machine-readable evidence is published alongside this note:

cat3_results.json — structured results for all three scenarios, including the full before/after evidence chain for Scenario A
manifest_authorized.json — SHA-256 manifest for the enrolled model (10 files, all verified)
manifest_substituted.json — SHA-256 manifest for the substituted model (10 files, all verified)

References


[1] A. R. Coslett, "The δ-Gene: Inference-Time Physical Unclonable Functions from Architecture-Invariant Output Geometry," 2026. DOI: 10.5281/zenodo.18704275

[2] A. R. Coslett, "Template-Based Endpoint Verification via Logprob Order-Statistic Geometry," 2026. DOI: 10.5281/zenodo.18776711

[3] A. R. Coslett, "Composable Model Identity: Formal Hardening of Structural Attestations in the Enterprise Identity Stack," 2026. DOI: 10.5281/zenodo.19099911

[4] A. R. Coslett, "Post-Hoc Disclosure Is Not Runtime Proof: Model Identity at Frontier Scale," 2026. DOI: 10.5281/zenodo.19216634

[5] A. R. Coslett, "Agent Identity Is Not Model Identity," 2026. DOI: 10.5281/zenodo.19240883

[6] A. R. Coslett, "What Counts as Proof? Admissible Evidence for Neural Network Identity Claims," 2026. DOI: 10.5281/zenodo.19058540

Cite this paper

A. R. Coslett, "Measured Model Substitution Under Valid Agent Credentials," CAT-3, Fall Risk AI, LLC, March 2026. DOI: 10.5281/zenodo.19342848

Acknowledgments

Portions of this research were developed in collaboration with AI systems that served as co-architects for experimental design, adversarial review, and manuscript preparation. All scientific claims, experimental designs, measurements, and editorial decisions remain the sole responsibility of the author. Experiments were conducted on Google Colab using NVIDIA A100-SXM4-80GB GPUs.

Author's Disclosure

Anthony Ray Coslett is the founder of Fall Risk AI, LLC, which holds the provisional patents listed below. The structural identity measurement described in this paper operates within the scope of that intellectual property. No external funding was received for this research.

Patent Disclosure

U.S. Provisional Patent Applications 63/982,893, 63/990,487, 63/996,680, and 64/003,244 are assigned to Fall Risk AI, LLC.