Abstract
We study what model-identifying information leaks through commercial language-model APIs that expose top-\(k\) token log probabilities. Building on extreme-value theory predictions for logit order-statistic gaps, we confirm that the normalized third logit gap (\(\delta_{\text{norm}}\)) remains near the Gumbel-class constant \(\approx 0.318\) across 6 models from 3 providers (OpenAI, Google Vertex AI, xAI) and 3 independent measurement sessions, demonstrating that output-layer universality persists through API truncation and quantization. We introduce a PPP-residualization transform that removes the dominant tail scale factor and reveals a low-dimensional but stable endpoint-specific geometry in the remaining gap spectrum. Contrary to common assumption, "provider" is not a geometrically coherent label: models do not cluster by corporate origin under these observables, but they do separate by model identity across independent sessions. Using a challenge-response protocol with centroid averaging and per-model thresholds, we demonstrate cross-session endpoint verification with a 0.83% breach rate (119/120 correct identifications across three temporal sessions); per-model thresholds eliminate all breaches on this dataset. We observe a robustness phase transition governed by enrollment depth.
Under single-session enrollment, prompt selection is load-bearing: the majority of bootstrapped banks fail to separate the six endpoints. Under two-session enrollment, bank sensitivity collapses on this dataset, and a bank compiler produces small compiled banks that exceed the margin of larger uncompiled banks. A dimensionless robustness parameter \(\text{SNR}(K, S)\) unifies both axes: prompt count \(K\) and enrollment depth \(S\) jointly govern the transition from bank-sensitive to bank-robust verification. We discuss operational implications for re-enrollment cadence and template management in production deployments. Post-publication results extend this framework in two directions. A distillation experiment across six training protocols demonstrates that a model's structural fingerprint (weight-geometry regime) is completely invariant to knowledge distillation, while its functional fingerprint (PPP-residual template) converges 31--52% toward the teacher's — enabling forensic detection of distillation provenance through API measurements alone. A conditional impossibility theorem, machine-checked in Coq (41 theorems, 0 Admitted), proves that no standalone model can spoof another's PPP-residual template across independent challenge prompts without exhausting its KL divergence budget, under four explicit trust assumptions.
1. Introduction
Enterprise AI increasingly operates through API intermediaries. An organization procuring language model capabilities from a commercial provider trusts that the model behind the endpoint is the model it contracted for. This trust is rarely verified. Model substitution — where a cheaper or different model serves requests behind the same API facade — can occur through supply-chain compromise, cost optimization by middleware vendors, or operational error during deployment rotation. Existing model identification methods require access to model weights (representation fingerprinting [Zhang et al., 2025], weight statistics [Yoon et al., 2025]) or cooperation from the model operator (watermarking [Kirchenbauer et al., 2023]). Neither applies when the model is behind a commercial API controlled by a third party. This work asks a simpler question: what can you learn about a model's identity from the numbers it is already returning? Specifically, we exploit the top-\(k\) log probabilities that several major API providers expose alongside generated tokens. These logprobs are a truncated, noisy window into the model's output distribution — but they carry structural information about the output geometry that is stable across sessions and distinctive across models. We build on the \(\delta\)-gene framework [Coslett, 2026a], which establishes that the third pre-softmax logit gap \(G_3 = z_{(3)} - z_{(4)}\) is a temperature-invariant, architecture-agnostic behavioral fingerprint, and that extreme value theory predicts its normalized value as \(\delta_{\text{norm}} \approx 0.318\) — a universal constant of the Gumbel class. The weights-based Inference-Time Physical Unclonable Function (IT-PUF) achieves zero false acceptances across 1,012 comparisons spanning 23 models and 16 vendor families. However, it requires direct access to model activations. 
The present contribution extends this framework to the API regime: no weights, no activations, only the logprob numbers that commercial APIs already return.
We find that the same physics — EVT universality, structural model identity, challenge-response verification — survives the API wall, albeit with thinner margins that demand more careful protocol design.

Contributions. This paper makes five contributions:

1. Universality through APIs (§3). We confirm \(\delta_{\text{norm}} \approx 0.318\) holds across 6 frontier models from 3 commercial providers, demonstrating that Gumbel-class output geometry persists through API truncation and infrastructure noise.
2. PPP residualization (§4). A transform that removes the dominant tail scale factor from logprob gap vectors, revealing model-specific geometric fingerprints invisible in raw logprob data.
3. Cross-session endpoint verification (§5). A challenge-response protocol achieving 119/120 correct cross-session identifications (0.83% breach rate with a global threshold; 0/120 with per-model thresholds) across three independent temporal sessions.
4. An enrollment-depth phase transition (§6). We discover that single-session enrollment makes prompt selection load-bearing (the majority of random banks fail), while multi-session enrollment eliminates this sensitivity on our dataset. A bank compiler produces small compiled banks that exceed the margin of larger uncompiled banks.
5. Operational protocol (§7). A complete enrollment-verification-re-enrollment workflow with template management guidelines for production deployment.

Post-publication results add two further contributions:

6. Distillation provenance (§9.1). Knowledge distillation transfers a model's functional fingerprint (PPP-residual template) 31--52% toward the teacher's, while the structural fingerprint (weight-geometry regime) is completely invariant — enabling forensic detection of distillation provenance through API measurements alone.
7. Conditional API impossibility (§9.2). A machine-checked theorem (41 Coq proofs, 0 Admitted) proving that no standalone model can spoof another's PPP-residual template across independent challenge prompts without exhausting its KL budget, under four explicit trust assumptions.

The total experimental cost was $0.72 across three providers, plus approximately $56 in A100 compute for the distillation experiments.
2. Background
2.1 Output Geometry and the \(\delta\)-Gene
At each token position, a language model produces a vocabulary-sized logit vector \(\mathbf{z} \in \mathbb{R}^V\). Ordering these as \(z_{(1)} \geq z_{(2)} \geq \cdots \geq z_{(V)}\), the gaps between consecutive order statistics are

\[
G_k = z_{(k)} - z_{(k+1)}, \qquad k = 1, 2, \ldots
\]
For large vocabularies (\(V = 32{,}000\)–\(128{,}000\)), extreme value theory predicts that these gaps follow a Poisson Point Process (PPP) with exponential spacings: \(k \cdot G_k \xrightarrow{d} \text{Exp}(\beta)\), where \(\beta\) is a tail scale parameter determined by the model's output distribution. The normalized third gap

\[
\delta_{\text{norm}} = \frac{G_3}{G_2 + G_3 + G_4}
\]
is predicted to be \(\approx 0.318\), a distribution-free constant of the Gumbel class [Coslett, 2026]. This prediction has been validated across six distinct neural architectures (dense Transformer, MoE, RWKV, Mamba, and hybrids) using direct logit access.
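The EVT prediction can be sanity-checked by direct simulation: draw i.i.d. Gumbel logits, compute the normalized third gap for each draw, and average. This is an illustrative sketch, not the paper's measurement pipeline; the vocabulary size and trial count below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def delta_norm(logits):
    """Normalized third gap G3 / (G2 + G3 + G4) from one logit vector."""
    top5 = np.sort(logits)[-5:][::-1]      # z_(1) >= ... >= z_(5)
    g = top5[:-1] - top5[1:]               # gaps G1..G4
    return g[2] / (g[1] + g[2] + g[3])

# Draw Gumbel-class logit vectors (vocabulary size 20,000) and average.
vals = [delta_norm(rng.gumbel(size=20_000)) for _ in range(1_000)]
mean_delta = float(np.mean(vals))
print(mean_delta)   # lands near the EVT constant for the Gumbel class
```

The per-draw values scatter widely (consistent with the per-token SD reported in §3.2); only the mean concentrates near the universal constant.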
2.2 API Logprob Interfaces
Several commercial API providers expose top-\(k\) log probabilities alongside generated tokens. At the time of writing, OpenAI, Google (Vertex AI), and xAI support a top_logprobs parameter with \(k \leq 20\). Not all providers expose logprobs — an asymmetry that may enable one-sided compliance probes (§8, Future Work). These logprobs are post-softmax (\(\log p_i\)), not raw logits, and represent a truncated, potentially quantized view of the output distribution. The question this paper addresses is whether the structural geometry predicted by EVT — specifically the gap ratios and their model-specific fingerprints — survives this truncation.
2.3 Threat Model
We consider an enterprise that procures AI capabilities through a commercial API and wishes to verify that the model behind the endpoint matches what was contracted. The adversary is a middleware layer that could substitute a cheaper model, route to a different provider, or serve cached responses. The verifier has:

- API access with logprob capability
- No access to model weights or activations
- No cooperation from the model operator (beyond the standard API contract)
- The ability to send challenge prompts and collect responses

The verification goal is model-level identity — distinguishing a provider's frontier-large endpoint from their frontier-small endpoint, or from another provider's frontier endpoint entirely — not provider-level attestation (which we show is geometrically incoherent; §5.2).
3. Universality Through API Logprobs
3.1 Experimental Setup
We measured six frontier-tier production endpoints across three commercial API providers:
| Provider | Endpoint | Class | API |
|---|---|---|---|
| OpenAI | O-L | frontier-large | Chat Completions |
| OpenAI | O-S | frontier-small | Chat Completions |
| Google | G-L | frontier-large | Vertex AI |
| Google | G-S | frontier-small | Vertex AI |
| xAI | X-L | frontier-large | Chat Completions |
| xAI | X-S | frontier-small | Chat Completions |
Exact endpoint identifiers (model strings and version suffixes) are provided to reviewers and reproduction partners on request.\(^1\)
\(^1\) Our results are reproducible given any six logprob-capable frontier endpoints; the specific choice affects quantitative margins but not the qualitative findings.
Each model was queried with 10 challenge prompts (details in §5.1) at temperature \(T = 0\) with top_logprobs = 7, collecting the top-7 log probabilities at each generated token position. Three independent measurement sessions (A, B, C) were conducted over approximately 12 hours, with sessions separated by roughly 6-hour intervals. Each session produced 60 measurements (6 models \(\times\) 10 prompts), yielding 180 total measurements across 3 sessions.

Sentinel handling. Some API providers return sentinel values (e.g., IEEE 754 maximum float) as logprob placeholders for zero-probability tokens. We apply a magnitude-based filter removing token positions with anomalous gap values prior to any downstream computation. Affected positions are rare and concentrated at early token positions where the vocabulary distribution is most concentrated.
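The sentinel filter can be sketched in a few lines. The cutoff constants below are assumptions for illustration — the paper states only that the filter is magnitude-based — and would be tuned against each provider's actual sentinel convention.

```python
import math

SENTINEL_MAGNITUDE = 1e30   # assumed cutoff for max-float placeholder values
MAX_PLAUSIBLE_GAP = 50.0    # assumed bound on a real log-probability gap

def filter_positions(token_logprobs):
    """Drop token positions whose descending top-k logprobs contain
    sentinel values or produce implausibly large (or negative) gaps."""
    kept = []
    for lps in token_logprobs:
        if any(not math.isfinite(x) or abs(x) > SENTINEL_MAGNITUDE for x in lps):
            continue
        gaps = [lps[i] - lps[i + 1] for i in range(len(lps) - 1)]
        if any(g > MAX_PLAUSIBLE_GAP or g < 0 for g in gaps):
            continue
        kept.append(lps)
    return kept
```

A negative gap would indicate the logprobs are not actually sorted, which is treated as an anomaly rather than silently re-sorted.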
3.2 Gap Structure and \(\delta_{\text{norm}}\)
For each token position in each response, we compute log-probability gaps \(G_k^{(\log)} = \log p_{(k)} - \log p_{(k+1)}\) from the top-\(k\) logprobs. The normalized third gap \(\delta_{\text{norm}} = G_3 / (G_2 + G_3 + G_4)\) is then computed per token and averaged per model-prompt pair. Across all 6 models, 3 providers, and 3 sessions, the per-prompt mean \(\delta_{\text{norm}}\) spans \(0.248\)–\(0.394\). Model-level means (averaged across prompts and sessions) fall within 7% of the EVT prediction of \(0.318\), with 4 of 6 models within 4.4%.

Table 1. \(\delta_{\text{norm}}\) universality across commercial API endpoints (Session C).
| Endpoint | Provider | \(n\) | \(\bar{\delta}_{\text{norm}}\) | SD | Dev. from 0.318 |
|---|---|---|---|---|---|
| O-S | OpenAI | 2,560 | 0.312 | 0.236 | 2.0% |
| O-L | OpenAI | 2,560 | 0.306 | 0.237 | 3.7% |
| X-S | xAI | 2,555 | 0.304 | 0.233 | 4.4% |
| G-S | Vertex | 381† | 0.304 | 0.236 | 4.4% |
| G-L | Vertex | 397† | 0.299 | 0.228 | 6.1% |
| X-L | xAI | 2,558 | 0.297 | 0.235 | 6.8% |
| Grand mean | — | — | 0.304 | — | 4.5% |
| EVT prediction | — | — | 0.318 | — | — |
†Vertex AI models produced shorter completions (~38–41 tokens/prompt vs. 256 for OpenAI/xAI). A head-to-head truncation analysis confirmed the depth-confound correction is modest (1.2–1.3×) and does not alter the qualitative conclusions. Cross-session universality (Sessions A, B, C) is documented in the full experimental record.

This confirms that the Gumbel-class universality established by Coslett [2026] for raw logits persists through API truncation and any internal quantization or routing transformations applied by the providers. The per-token SD of \(\approx 0.23\) is itself consistent across all six endpoints, indicating that the noise structure — not just the mean — is universal.
3.3 Coverage and Depth Limitations
API logprobs expose only the top-\(k\) tokens (typically \(k \leq 20\); we use \(k = 7\)). This limits coverage: at any token position, we observe only a fraction of the vocabulary. Coverage varies by model and by token position — high-entropy positions (where probability mass is spread) have lower coverage than low-entropy positions (where the top few tokens dominate). We compute per-token coverage as \(\text{cov} = \sum_{i=1}^{k} p_i\), the total probability mass captured by the exposed logprobs. Across our dataset, median coverage ranges from 0.65 to 0.92 depending on model. Positions with very low coverage are excluded from gap computations, as the exposed tail may not represent the true order statistics faithfully.
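Per-token coverage and the gap computation follow directly from the definitions above. The coverage floor below is an assumption — the paper says only that "very low coverage" positions are excluded without stating the threshold.

```python
import math

COVERAGE_FLOOR = 0.5   # assumed exclusion threshold for low-coverage positions

def coverage(logprobs):
    """Total probability mass captured by the exposed top-k logprobs."""
    return sum(math.exp(lp) for lp in logprobs)

def log_gaps(logprobs):
    """G_k = log p_(k) - log p_(k+1) over the exposed tail, or None when
    coverage is too low for the exposed gaps to be trusted."""
    if coverage(logprobs) < COVERAGE_FLOOR:
        return None
    return [logprobs[i] - logprobs[i + 1] for i in range(len(logprobs) - 1)]
```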
4. PPP Residualization
4.1 The Dominant Scale Factor
Raw logprob gap vectors are dominated by a single parameter: the tail scale \(\beta\), which determines the overall magnitude of the gaps. Under the PPP model, \(\mathbb{E}[G_k] = \beta / k\). Different models have different \(\beta\) values (empirically, \(\beta_{\text{robust}} \in [1.44, 1.60]\) for raw logits [Coslett, 2026]), and this scale difference accounts for most of the inter-model variance in raw gap vectors. If model discrimination relied only on \(\beta\), it would be fragile: \(\beta\) can be estimated and subtracted. The question is whether model-specific structure remains after removing \(\beta\).
4.2 The Residualization Transform
For each token position, we fit the PPP prediction \(\hat{G}_k = \hat{\beta} / k\) to the observed gaps \(G_1, \ldots, G_K\) using a robust estimator for \(\hat{\beta}\) that down-weights the winner-take-all anomaly in \(G_1\) (which deviates systematically from the PPP model due to the gap between the "chosen" token and its nearest competitor). The residual vector is

\[
r_k = G_k - \frac{\hat{\beta}}{k}, \qquad k = 1, \ldots, K.
\]
These residuals encode deviations from the idealized PPP — structural features of the output geometry that are specific to each model's architecture and training, not explained by a single scale parameter. We construct per-prompt templates by averaging residual vectors across all tokens in a response, weighted by a coverage-dependent quality filter. A model's enrolled template is the mean of its per-prompt templates across enrollment sessions. Details of the weighting scheme and estimator internals are part of the measurement engine implementation [Coslett, 2026].
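A minimal residualization sketch follows. Since \(k \cdot G_k\) has mean \(\beta\) under the PPP model, the median of \(k \cdot G_k\) over \(k \geq 2\) serves here as a simple robust stand-in for \(\hat{\beta}\); the paper's actual estimator and weighting scheme live in the measurement engine and are not reproduced.

```python
import statistics

def ppp_residualize(gaps):
    """Fit the PPP prediction G_k ≈ β/k and return residuals r_k = G_k − β̂/k.

    Under the PPP model, k·G_k has mean β, so β̂ is estimated from k ≥ 2
    (excluding the winner-take-all anomaly in G1). The median here is an
    assumed stand-in for the engine's robust weighted estimator.
    """
    scaled = [(k + 1) * g for k, g in enumerate(gaps)]   # k·G_k, k = 1..K
    beta_hat = statistics.median(scaled[1:])             # skip G1
    return [g - beta_hat / (k + 1) for k, g in enumerate(gaps)]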
4.3 Geometric Content of Residuals
The 7-dimensional residual vectors carry model-specific geometry that is invisible in raw logprobs. Principal component analysis on the \(6 \times 7\) cross-model residual matrix reveals that 2–3 principal components capture \(> 90\%\) of the inter-model variance. This low-rank structure means that model identity is encoded in a small number of geometric features. Critically, the residual geometry does not cluster by provider. Fisher Linear Discriminant Analysis with provider as class label achieves effect size \(d < 0.5\) for all provider pairs (§5.2). The geometry reflects the model's learned output distribution, not the infrastructure serving it.
5. Cross-Session Endpoint Verification
5.1 Protocol
Challenge bank. We use a curated bank of 10 challenge prompts designed to elicit diverse regions of the output geometry. The prompts span multiple linguistic and reasoning modalities; the specific contents and category taxonomy are withheld as an operational credential. The bank construction methodology — selecting prompts that jointly improve discrimination across a declared model zoo — is described in §6.3.

Template construction. For each model \(m\) and prompt \(p\), we compute the mean residual vector \(\bar{r}_{m,p}\) by averaging across all token positions in the response. The per-prompt templates are then averaged into a centroid:

\[
T_m = \frac{1}{K} \sum_{p=1}^{K} \bar{r}_{m,p}.
\]
For multi-session enrollment, templates from \(S\) sessions are averaged:

\[
T_m^{\text{enroll}} = \frac{1}{S} \sum_{s=1}^{S} T_m^{(s)}.
\]
Verification. Given an enrolled template \(T_m^{\text{enroll}}\) and a fresh verification template \(T_m^{\text{verify}}\), the genuine distance is \(d_{\text{gen}} = \|T_m^{\text{enroll}} - T_m^{\text{verify}}\|_2\). The impostor distance for model \(j \neq m\) is \(d_{\text{imp}}(m, j) = \|T_m^{\text{enroll}} - T_j^{\text{verify}}\|_2\). A breach occurs when the genuine distance exceeds the minimum impostor distance for any model in the zoo.

Thresholds. We evaluate two thresholding strategies:

- Global threshold \(\tau\): a single threshold set as the midpoint between the maximum genuine distance and minimum impostor distance across all models.
- Per-model threshold \(\tau_m\): each model \(m\) gets its own threshold, set as the midpoint between its genuine distance and its nearest impostor.
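The genuine/impostor decision reduces to a few L2-distance comparisons. This sketch assumes templates are plain vectors keyed by model name; the midpoint threshold shown is the per-model variant.

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(enrolled, fresh, model):
    """Genuine vs. nearest-impostor decision for one model in the zoo.
    `enrolled` and `fresh` map model name -> template vector."""
    d_gen = l2(enrolled[model], fresh[model])
    d_imp = min(l2(enrolled[model], fresh[j]) for j in fresh if j != model)
    tau_m = (d_gen + d_imp) / 2          # per-model midpoint threshold
    return {"genuine": d_gen, "impostor": d_imp,
            "threshold": tau_m, "pass": d_gen < d_imp}
```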
5.2 Provider Non-Coherence
Before evaluating model-level verification, we test whether models cluster by provider. If they did, one could verify "this is an OpenAI model" without distinguishing which OpenAI model. Fisher LDA with provider as class label yields pairwise effect sizes:
| Provider Pair | Fisher \(d\) |
|---|---|
| OpenAI vs Google | 0.38 |
| OpenAI vs xAI | 0.29 |
| Google vs xAI | 0.47 |
All effect sizes fall below 0.5, the conventional threshold for even a medium effect. The geometric hierarchy violates provider boundaries: we observe cases where cross-provider endpoint pairs are closer than within-provider pairs. Provider-level attestation is not supported by the logprob geometry.

Implication: endpoint verification must operate at the model level, not the provider level. This is a desirable property — it means verification is sensitive to model identity, not to which company's infrastructure serves it.
5.3 Three-Session Results
Three sessions (A, B, C) were collected over approximately 12 hours. We evaluate all pairwise enrollment-verification combinations using \(K = 10\) centroid templates.

Table 2. Cross-session verification results (global threshold \(\tau\), \(K = 10\)).
| Enrollment → Verify | Gap Ratio | Breaches | Status |
|---|---|---|---|
| A → B | 1.10× | 0/30 | Perfect |
| A → C | 0.72× | 1/30 | M\(_\text{weak}\) breach |
| B → C | 1.32× | 0/30 | Perfect (best) |
| (A+B) → C | 0.93× | 0/30 | Marginal |
| Total | — | 1/120 | 0.83% |
The sole breach occurs for one model (hereafter M\(_\text{weak}\)) in the A→C pair — the widest temporal gap.\(^2\) M\(_\text{weak}\) exhibits the highest drift across all session pairs and is the only model that ever breaches. Five of six models achieve zero breaches across all session pairs.
\(^2\) We anonymize the identity of the bottleneck model to avoid providing a targeted difficulty map. The model is one of the six listed in §3.1. Exact model identification is available to reviewers on request.
Per-model thresholds eliminate the breach. With \(\tau_m\) (each model's threshold set independently), the M\(_\text{weak}\) A→C breach disappears because M\(_\text{weak}\)'s own nearest impostor is still farther than its genuine drift. Per-model thresholds achieve 0/120 breaches across all session pairs and enrollment modes.

Statistical note. The 95% upper confidence bound on the per-model FAR is \(\leq 3/120 \approx 2.5\%\) (rule of three). Larger zoo sizes would tighten this bound.
5.4 Temporal Decay
Drift between enrolled and verification templates increases with temporal separation, but not uniformly. The closest session pair (B↔C, ~6 hours) shows approximately half the drift of the widest pair (A↔C, ~12 hours), with A↔B intermediate. Notably, the two pairs with similar nominal time gaps (A↔B and B↔C, both ~6 hours) differ substantially in drift magnitude. This suggests that drift is not purely a function of elapsed time but also depends on whether provider infrastructure state (load-balancer routing, MoE expert selection, etc.) has changed between sessions. System fingerprints (the specific pattern of logprob values at deterministic token positions) were identical across all three sessions for half the endpoints, while others showed routing variation with partially recurring fingerprints.

Implication for production: re-enrollment cadence should be drift-aware, not clock-driven. A monitoring system can track genuine-distance trends and trigger re-enrollment when drift approaches the verification margin.
6. The Enrollment-Depth Phase Transition
6.1 Single-Session Fragility
Under single-session enrollment (Session A → Session B), prompt selection is load-bearing. A leave-one-prompt-out (LOPO) analysis shows that 6 of 10 prompts are individually necessary for \(K = 10\) perfect separation: removing any one of them causes a breach. A bootstrap analysis drawing random 10-prompt subsets (with replacement from our candidate pool) yields a breach rate of 93.5% — nearly all random banks fail. This is not a failure of the physics. It reflects insufficient averaging: with a single enrollment session, the template centroid contains token-level noise that happens to align with or against the separation direction depending on which prompts are included. The strongest "hero" prompt provides \(\approx\!+3.4\) margin for the bottleneck pair; the worst "poison" prompt provides \(\approx\!-1.5\) margin. The single-session centroid is dominated by this prompt-level variance.
6.2 Multi-Session Robustness
Under two-session enrollment ((A+B) averaged → Session C), bank sensitivity collapses entirely.

Table 3. Bootstrap breach rates by enrollment depth (\(B = 10{,}000\) random banks, \(K = 10\)).
| Enrollment Mode | K=10 Breach Rate |
|---|---|
| Single session (A→B) | 93.5% |
| Multi-session ((A+B)→C) | 0.0% |
Under single-session enrollment, the vast majority of randomly drawn banks fail to separate all six endpoints. Under two-session enrollment, no failures are observed at \(K = 10\) across 10,000 bootstrap resamples. Intermediate values of \(K\) show a monotone decrease in breach rate with increasing bank size under both enrollment modes, but the qualitative transition — from majority-failure to zero-failure — occurs specifically at the single-to-multi-session boundary.

Caveat. Sessions B and C are temporally proximate (~6 hours apart, with identical system fingerprints for 3/6 models). The (A+B) enrollment template may benefit from the B component's proximity to C. A conservative reading conditions this result on enrollment sessions that span the typical drift range of the verification target. Validation with multi-day session gaps would strengthen this claim.
6.3 Bank Compilation
We implement a bank compiler that builds prompt banks by iteratively selecting prompts whose joint contribution improves the worst-case verification margin across all models in the declared zoo. The algorithm uses training-session data to evaluate candidate prompts and selects those that improve the bottleneck pair's separation. The compiler achieves separation from small compiled banks (single-digit \(K\)). Compiled banks generalize to the held-out session across all tested bank sizes — zero overfitting. Notably, compiled banks at small \(K\) achieve margins exceeding those of the full uncompiled bank on the held-out session. This "less is more" effect occurs because certain prompts degrade the bottleneck pair's separation, diluting the centroid when included; the compiler identifies and excludes these.

Note: the compiled bank is parameterized by the declared zoo. If the enrolled model set changes, the compiler should be re-run — prompts that degrade one bottleneck pair may strengthen another.
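The compiler's greedy loop can be sketched as follows. The `margin` callable — which scores a candidate bank by its worst-case (bottleneck-pair) verification margin on training-session data — is assumed supplied by the caller; its internals (centroid templates, zoo pairs) are not reproduced here. The stopping rule captures the "less is more" effect: a prompt that fails to improve the bottleneck margin is never added.

```python
def compile_bank(candidates, margin, k_max):
    """Greedily grow a prompt bank, at each step adding the candidate that
    most improves the worst-case verification margin across the zoo.

    margin(bank) -> float is an assumed scoring hook over training data.
    """
    bank = []
    while len(bank) < k_max:
        best = max((p for p in candidates if p not in bank),
                   key=lambda p: margin(bank + [p]),
                   default=None)
        if best is None or margin(bank + [best]) <= margin(bank):
            break          # no remaining candidate improves the bottleneck margin
        bank.append(best)
    return bank
```

With a "poison" prompt whose inclusion lowers the margin, the compiler stops early and excludes it, yielding a smaller bank with a higher score than the full pool.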
6.4 Characterizing the Transition
The experiments reveal a qualitative transition in verification behavior as enrollment depth increases: Under single-session enrollment, prompt selection dominates: the majority of random banks fail, and verification performance is highly sensitive to which prompts are included. Under multi-session enrollment, this sensitivity collapses and verification becomes robust to bank composition. An intermediate regime — multi-session enrollment with bank compilation — achieves higher margins than either extreme, suggesting that compilation and enrollment depth are complementary rather than redundant. The sharpness of this transition (from 93.5% failure to 0% failure) is surprising given that only one additional enrollment session was added. We attribute this to the centroid-averaging mechanism operating near the tail of the breach distribution, where small improvements in signal-to-noise ratio produce large changes in tail probability (§6.5).
6.5 A Robustness Model
We formalize the transition with a dimensionless robustness parameter. For the bottleneck pair (model \(m\) vs. nearest impostor \(j\)), define:

\[
\text{SNR}(K, S) = \frac{\|\Delta\mu_{m,j}\|}{\sigma_{\text{eff}}(K, S)},
\]
where \(\Delta\mu_{m,j}\) is the true separation vector in residual feature space and \(\sigma_{\text{eff}}(K, S)\) is the effective template noise after centroid averaging over \(K\) prompts and \(S\) enrollment sessions. The breach probability under a Gaussian approximation reduces to:

\[
P_{\text{breach}} \approx \Phi\!\left(-\,\frac{\|\Delta\mu_{m,j}\|}{\sigma_\parallel}\sqrt{\frac{KS}{S+1}}\,\right),
\]
where \(\sigma_\parallel\) is the noise scale projected onto the separation direction \(\hat{u} = \Delta\mu / \|\Delta\mu\|\). This model captures the two-axis structure:

- Increasing \(K\) (more prompts) concentrates the centroid by \(\sqrt{K}\)
- Increasing \(S\) (more sessions) suppresses template noise by \(\sqrt{S/(1+S)}\)
- At \(S = 1\) the combined factor is \(\sqrt{K/2}\); at \(S = 2\) it is \(\sqrt{2K/3}\)

The improvement from \(S = 1 \to 2\) is modest in the exponent (\(\sqrt{4/3} \approx 1.15\times\)), but because the operating point is near the Gaussian tail, small SNR changes produce dramatic probability changes — explaining the sharp phase transition observed empirically.

Empirical check. For the bottleneck model M\(_\text{weak}\), per-prompt margins yield \(\mu = 0.087\), \(\sigma = 0.885\), giving \(\mu/\sigma = 0.099\). The scalar Gaussian model predicts \(\sim\!38\%\) breach at \(K = 10\); empirically we observe 0%. The overprediction arises because the scalar model treats \(\sigma\) as total scatter rather than the relevant \(\sigma_\parallel\) along the separation direction. The low-rank discrimination manifold (participation ratio \(\approx 1.1\)–\(1.3\)) means \(\sigma_\parallel \ll \sigma\), accelerating concentration beyond the scalar prediction.
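The SNR model is directly computable, with \(\Phi\) the standard normal CDF expressed via `math.erfc`. This sketch just evaluates the formulas above; all parameter values are placeholders.

```python
import math

def snr(delta_mu, sigma_par, K, S):
    """SNR(K, S) = (‖Δμ‖ / σ∥) · sqrt(K·S / (S + 1))."""
    return (delta_mu / sigma_par) * math.sqrt(K * S / (S + 1))

def breach_prob(delta_mu, sigma_par, K, S):
    """Gaussian-tail breach probability Φ(−SNR), with Φ(x) = erfc(−x/√2)/2."""
    return 0.5 * math.erfc(snr(delta_mu, sigma_par, K, S) / math.sqrt(2))

# Going from S = 1 to S = 2 at fixed K multiplies the SNR by sqrt(4/3):
ratio = snr(1.0, 1.0, 10, 2) / snr(1.0, 1.0, 10, 1)
```

Near the tail of the Gaussian, that modest \(\approx 1.15\times\) SNR factor moves the breach probability by orders of magnitude, which is the mechanism behind the sharp empirical transition.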
7. Operational Protocol
7.1 Template-Based Endpoint Verification (TBEV)
The full verification protocol proceeds as:

1. Enrollment. Send the challenge bank through the target API endpoint across \(S \geq 2\) sessions. Compute PPP-residualized gap vectors, average into per-prompt templates, form the centroid. Store as the enrolled template with metadata (bank identifier, session timestamps, model label, per-model threshold).
2. Verification. Send the same challenge bank through the endpoint. Compute a fresh template. Compare the L2 distance to the enrolled template (genuine test) and to all other enrolled templates in the zoo (impostor test). If the genuine distance is \(<\) the impostor distance (or \(<\) the per-model threshold \(\tau_m\)), the endpoint passes verification.
3. Re-enrollment. Monitor drift between consecutive verification measurements. When drift approaches the verification margin, trigger re-enrollment with fresh sessions.

Template management. The enrolled template functions as a geometric anchor — the reference point against which verification measurements are compared. Like any verification credential, it must be minted (enrollment), can be rotated (re-enrollment when drift accumulates), and can be revoked (if the enrollment data is compromised). The challenge bank serves as a measurement schedule — it diversifies the observation space, but the verification security derives from the template geometry, not from bank secrecy. In the robust regime (\(S \geq 2\)), any bank achieves separation; in the fragile regime (\(S = 1\)), the bank is load-bearing and its contents should be protected.
8. Limitations and Future Work
Zoo size. The API verification zoo contains 6 endpoints from 3 providers. Larger zoos (especially within-family pairs from the same provider) would stress-test the discrimination margin and are needed before any deployment claim.

Temporal range. Our three sessions span approximately 12 hours. Multi-day and multi-week session gaps are needed to characterize long-term drift and establish re-enrollment cadence bounds. The temporal proximity of Sessions B and C may partially inflate the multi-session enrollment benefit reported in §6.2.

Phase transition generality. The enrollment-depth phase transition is demonstrated on a small candidate pool. Generalization to larger candidate pools and to zoo compositions not tested here is expected but unvalidated.

No formal security guarantee. Unlike the weights-based IT-PUF [Coslett, 2026], the API regime had no formal impossibility result for spoofing at the time of initial publication. An adversary with knowledge of the challenge bank and sufficient API access could potentially train a proxy that matches the target's gap structure. The bank compiler provides margin engineering, not cryptographic security. This gap is now addressed in §9.2, which presents a conditional impossibility theorem for the API regime.

Provider API stability. Commercial APIs can change logprob behavior at any time — adjusting quantization, modifying the number of exposed tokens, or removing logprob access entirely. Any production deployment must monitor for API contract changes.

K-scaling with larger pools. The compiled-bank advantage is demonstrated within a small candidate pool. Testing with 50–100 candidate prompts would reveal whether the "less is more" effect persists or is an artifact of the small pool.

Bridge to formal verification. The SNR model (§6.5) is currently empirical. Formalizing the concentration inequality for template centroids — connecting the participation ratio to \(\sigma_\parallel\) — would provide a provable bound on bank sizing, potentially amenable to the Coq verification framework established for the weights-based regime. A first step toward this bridge — a conditional impossibility theorem for API-regime spoofing — is reported in §9.2.

Capability-asymmetry probes. Heterogeneous API telemetry across providers — where some expose logprobs and others do not — suggests the possibility of one-sided policy-compliance probes exploiting observable incompatibility as a detection signal. We defer exploration of this deployment mode to future work.
9. Post-Publication Results
Two results obtained after initial submission address the most significant limitation identified in §8 — the absence of a formal security guarantee — and extend the PPP-residualization framework to distillation provenance detection.
9.1 Distillation Provenance
9.1.1 Motivation
The weights-based IT-PUF [Coslett, 2026, §9.1] identified knowledge distillation as a critical untested attack class. On February 23, 2026, Anthropic publicly disclosed that DeepSeek, Moonshot AI, and MiniMax had conducted industrial-scale distillation campaigns against Claude, involving approximately 16 million exchanges through 24,000 fraudulent accounts. This disclosure made the question operationally urgent: does distillation transfer a model's PPP-residual fingerprint?
9.1.2 The Two-Layer Identity Hypothesis
We hypothesize that neural network identity operates on two separable layers: 1. Structural identity — the weight-geometry observable \(g_{\text{norm}}\) measured by the weights-regime IT-PUF. Determined by architecture and pre-training; not transferable through output matching. 2. Functional identity — the PPP-residualized template measured through API logprob interfaces (this paper, §4). Determined by the model's learned competitive dynamics among vocabulary items; potentially transferable through distillation. If confirmed, the two layers provide complementary forensic capabilities: the weights regime answers "what model is this?" while the API regime answers "who taught this model?"
9.1.3 Convergence Metric
To quantify transfer, we define template convergence as the fraction of the baseline-to-teacher distance closed in PPP-residual template space:
\[
\text{Conv}_T \;=\; 1 - \frac{d(T_S, T_A)}{d(T_{S,0}, T_A)},
\]
where \(T_S\) is the student's PPP-residual centroid template after distillation, \(T_{S,0}\) is the student's undistilled baseline template, \(T_A\) is the teacher's template, and \(d(\cdot, \cdot)\) is \(L^2\) distance in the 7-dimensional residual space defined in §4.2. A value of 0 indicates no movement toward the teacher; a value of 1 would indicate perfect template matching.
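The metric is a one-line computation once the centroid templates are in hand; a minimal sketch, assuming templates are given as 7-element vectors (the specific numbers below are illustrative, not measured values):

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def template_convergence(t_student, t_baseline, t_teacher):
    """Fraction of the baseline-to-teacher distance closed:
    0 = no movement toward the teacher, 1 = perfect template match."""
    return 1.0 - l2(t_student, t_teacher) / l2(t_baseline, t_teacher)

# Toy example: a student that moved halfway from its baseline to the teacher.
teacher  = [0.0] * 7
baseline = [1.0] * 7
student  = [0.5] * 7
print(round(template_convergence(student, baseline, teacher), 6))  # 0.5
```

Negative values (as for control D1 below) indicate movement away from the teacher relative to the baseline.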
9.1.4 Experimental Design
We distill six variants from a single teacher (Qwen2.5-7B-Instruct, 7.6B parameters) into two student architectures (Qwen2.5-0.5B-Instruct, 494M; Llama-3.2-1B-Instruct, 1.24B), training for three epochs each and measuring every checkpoint in both the weights and API regimes. The six variants span the distillation design space:
| Variant | Protocol | Top-\(K\) | Purpose |
|---|---|---|---|
| A1 | High-bandwidth logit KD | 200 | Maximum information transfer |
| B1 | Top-\(K\) masked KD | 20 | API-realistic bottleneck |
| B2 | Top-\(K\) masked KD | 7 | Extreme information constraint |
| C1 | Cross-tokenizer SFT | N/A | Text-only transfer, different architecture |
| D1 | Self-distillation | 200 | Control: isolates "more training" effect |
| E1 | Shuffled-logits KD | 200 | Control: tests distribution shape vs.\ structural ordering |
Training uses bfloat16 precision, KL divergence temperature \(T = 2.0\), and AdamW optimizer (\(\text{lr} = 2 \times 10^{-5}\)). Variant labels (A1--E1) are retained for cross-referencing with the companion technical report.
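The per-token distillation objective is standard temperature-scaled KD; a self-contained sketch with toy logits (the actual training uses the bfloat16/AdamW setup above; the \(T^2\) factor is the usual gradient-magnitude correction in logit distillation, and the top-\(K\) masked variants B1/B2 restrict the sum to the teacher's top-\(K\) tokens):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) at temperature T, scaled by T^2."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give zero loss; mismatched logits give positive loss.
print(kd_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))       # 0.0
print(kd_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]) > 0)   # True
```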
9.1.5 Results: Structural Identity Is Invariant
All 18 distilled checkpoints remain within \(1\)--\(5 \times \varepsilon\) of their own undistilled baselines in \(g_{\text{norm}}\) distance, where \(\varepsilon = 1.003 \times 10^{-4}\) is the IT-PUF acceptance threshold. The teacher remains \(726\)--\(1{,}212 \times \varepsilon\) away from every student at every epoch. From the weights perspective, all five Qwen-0.5B students are the same model regardless of distillation protocol. This confirms the impossibility result of the weights-based IT-PUF [Coslett, 2026, §6.3]: \(g_{\text{norm}}\) measures weight geometry, and distillation changes what a model outputs without moving the structural observable.
9.1.6 Results: Functional Identity Transfers
The PPP-residualized templates — the observables defined in this paper — tell a different story. Teacher-distilled models converge toward the teacher's functional fingerprint, while both controls show no convergence (Figure \ref{fig:two-layer-identity}).
| Variant | \(\text{Conv}_T\) (best epoch) | Trajectory |
|---|---|---|
| A1 (high-bandwidth KD) | \(\mathbf{0.52}\) | Converges toward teacher |
| B1 (top-20 KD) | \(0.36\) | Converges toward teacher |
| B2 (top-7 KD) | \(0.31\) | Converges toward teacher (slower) |
| C1 (cross-tokenizer SFT) | \(0.10^*\) | Non-monotonic; confounded\(^\dagger\) |
| D1 (self-distill control) | \(-0.61\) | Diverges from teacher |
| E1 (shuffled-logits control) | \(0.18\) | No sustained convergence |
\(^*\)C1 achieves minimum PPP distance 0.266 at epoch 2, but the Llama-1B baseline is already 0.296 from the teacher — only 10% improvement over the untrained model.
\(^\dagger\)The cleanest evidence comes from A1 and B1, which start far from the teacher (\(\sim\)1.35 and \(\sim\)1.13 respectively, against a baseline of 1.55) and show sustained monotonic convergence.
Control analysis. D1 (self-distillation) diverges monotonically from the teacher: \(2.25 \to 2.43 \to 2.48\). This eliminates the confound that additional training moves functional fingerprints toward arbitrary other models. E1 (shuffled-logits) preserves the teacher's marginal token distribution but destroys the structural ordering — which tokens compete at which ranks. The absence of convergence confirms that the PPP residual (§4) measures structural competitive dynamics, not aggregate distribution shape.
9.1.7 The Information Gradient
The three KD variants measure how much forensic information survives the API information bottleneck (Figure \ref{fig:info-gradient}):
| Top-\(K\) | \(\text{Conv}_T\) | Attenuation vs.\ full-vocab |
|---|---|---|
| 200 (high-bandwidth) | 0.52 | — |
| 20 (standard API) | 0.36 | \(0.69\times\) |
| 7 (minimal) | 0.31 | \(0.60\times\) |
The attenuation from \(K = 200\) to \(K = 20\) is modest. Even \(K = 7\) shows 31% convergence. The forensic channel survives aggressive bottlenecking, consistent with the \((d-1)\) expensive structural directions identified in §9.2.
9.1.8 Universality Check
All 18 distilled checkpoints produce \(\delta_{\text{norm}} \in [0.304, 0.349]\) (mean \(0.314\), CV \(3.0\%\)), confirming Gumbel universality across distilled models. Distillation does not disrupt the fundamental thermodynamic structure exploited by the PPP model (§3).
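The universality statistic is computable directly from any sorted top-\(k\) logprob response using the Appendix A definitions; a minimal sketch (the input logits below follow the PPP mean profile \(G_k = \beta/k\) for illustration, which yields a ratio of means near, but not identical to, the measured Gumbel-class constant \(\approx 0.318\), since the latter reflects the full joint gap distribution):

```python
def delta_norm(logprobs):
    """Normalized third gap G3 / (G2 + G3 + G4) from top-k logprobs,
    where G_k = z_(k) - z_(k+1) are the order-statistic gaps."""
    z = sorted(logprobs, reverse=True)
    g = [z[k] - z[k + 1] for k in range(len(z) - 1)]  # g[0] = G_1
    return g[2] / (g[1] + g[2] + g[3])

# Logits whose gaps follow the mean profile beta/k.
z, beta = [0.0], 1.0
for k in range(1, 6):
    z.append(z[-1] - beta / k)
print(round(delta_norm(z), 4))  # 0.3077  (i.e., (1/3) / (1/2 + 1/3 + 1/4))
```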
9.1.9 Forensic Implications
The two layers provide complementary forensic capabilities:
| Question | Regime | Observable | Answer |
|---|---|---|---|
| What model is this? | Weights | \(g_{\text{norm}}\), \(\tau\) vectors | Structural fingerprint — unforgeable |
| Who taught this model? | API | PPP residual template | Functional fingerprint — transfers |
| Was this model distilled? | Both | \(g_{\text{norm}}\) anchored + PPP shifted | Structural \(\neq\) functional \(\implies\) distillation detected |
Caveats. This experiment was conducted at small scale (0.5B--7B). Whether functional identity transfer survives at frontier scale (100B+) with industrial techniques is an open question. Template convergence is measured in PPP-residual template space and should not be interpreted as a fraction of capability transfer. Epistemological status: VALIDATED. The structural invariance is consistent with the proven impossibility theorem [Coslett, 2026, §6.3], but the functional identity transfer has no Coq backing.
9.2 Conditional Impossibility for API-Regime Spoofing
9.2.1 The Epistemological Gap
At time of initial publication, the weights regime had a machine-checked impossibility result [Coslett, 2026, NoSpoofing.v], while the API regime had only the empirical evidence reported in §5 of this paper. This section closes that gap with a conditional impossibility theorem for API-regime spoofing.
9.2.2 Threat Model
The theorem is conditional on four trust assumptions, formalized as Coq Hypothesis declarations:
| Assumption | Content | Excludes |
|---|---|---|
| TA1 | API returns logprobs from an actual model | Float-spoofing |
| TA2 | Attacker uses own weights; no real-time oracle | Forwarding attacks |
| TA3 | Challenge prompts unknown to attacker | Pre-optimization |
| TA4 | Attacker maintains \(\text{PPL} < \text{PPL}_{\max}\) | Gibberish matching |
The conditional structure mirrors TLS (certificates assume honest CA) and hardware PUFs (challenge-response assumes physical access control). Both forwarding and float-spoofing attacks trivially defeat any API verification scheme; the theorem characterizes the security boundary that holds when these attacks are excluded by deployment architecture.
9.2.3 Shift-Equivariance
The original proof strategy relied on a "median amplification" lemma: shifting \(\hat\beta\) by \(\Delta\) requires shifting at least \(\lceil n/2 \rceil\) components by \(\geq \Delta\). This lemma is false. Counterexample: \([0, 0, 10] \to [0, 1, 10]\) shifts the median by 1 with one component moved. The correct content is shift-equivariance: \(\hat\beta\) subtraction absorbs exactly the uniform scale direction. An attacker can shift all gap components by the same constant for free — the one direction \(\hat\beta\) removes. Every non-uniform structural difference survives \(\hat\beta\) subtraction, and each such direction costs KL per component. With \(d\) components, the attacker has 1 free direction and \((d-1)\) expensive directions. This result is stronger than the original lemma: it completely identifies the free and expensive subspaces rather than providing a counting bound. It also explains the PPP-residualization transform (§4.2) in information-theoretic terms: the \(\hat\beta\) subtraction is not merely a normalization step — it is the operation that strips the attacker's sole free direction, exposing the \((d-1)\) structural directions that carry the verification signal.
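The shift-equivariance claim can be checked numerically. In the sketch below, \(\hat\beta\) is taken to be the median of \(k\,G_k\) — one natural estimator consistent with the median discussion above; the paper's exact estimator is defined in §4 — and the gap values are illustrative:

```python
import statistics

def residuals(gaps):
    """PPP residuals r_k = G_k - beta_hat / k, with beta_hat estimated
    as the median of k * G_k (illustrative estimator; see Section 4)."""
    beta_hat = statistics.median(k * g for k, g in enumerate(gaps, start=1))
    return [g - beta_hat / k for k, g in enumerate(gaps, start=1)]

gaps = [1.0, 0.52, 0.33, 0.26, 0.21]

# The free direction: shift every k*G_k by the same delta (G_k += delta/k).
# beta_hat absorbs the shift and the residuals are unchanged.
delta = 0.4
shifted = [g + delta / k for k, g in enumerate(gaps, start=1)]
r0, r1 = residuals(gaps), residuals(shifted)
print(max(abs(a - b) for a, b in zip(r0, r1)) < 1e-12)  # True

# An expensive direction: a non-uniform move (only G_2 changes)
# survives beta_hat subtraction and remains visible in the residuals.
bent = list(gaps); bent[1] += 0.1
print(residuals(bent) != r0)  # True
```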
9.2.4 Proof Architecture
The formal proof (APINoSpoofing.v, 1,531 lines, 41 theorems, 0 Admitted, 0 Axioms) follows a modular structure:

1. Shift-equivariance — \(\hat\beta\) removes exactly 1 degree of freedom.
2. Structural detection — non-uniform gaps survive \(\hat\beta\) subtraction (proved for \(d = 2, 3\)).
3. Per-component KL cost floor — each structural direction costs \(c_i > 0\) KL (Hypotheses, mirroring Cramér-Rao in NoSpoofing.v).
4. Budget exhaustion — per-prompt closing cost exceeds per-prompt budget (API analog of Theorem T7 in [Coslett, 2026]).
5. Multi-prompt amplification — \(K\) independent prompts multiply total cost.
6. Pigeonhole detection — if total cost exceeds budget, at least one prompt detects.
7. Translation layer rigidity — post-processing transformations that approximately preserve gaps cannot eliminate the residual.
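Schematically, the budget-exhaustion and pigeonhole steps combine as follows (a schematic rendering of the modular structure, not the Coq statements themselves; the exact functional form of the per-prompt cost is left abstract). Let \(C_{\min} > 0\) be the minimum KL cost of closing the \((d-1)\) expensive directions on a single prompt, \(K\) the number of independent challenge prompts, and \(B\) the attacker's total KL budget under TA4:
\[
K\,C_{\min} > B
\;\Longrightarrow\;
\exists\, p \in \{1, \dots, K\}:\ \text{KL spent on prompt } p < C_{\min}
\;\Longrightarrow\;
\text{a detectable residual survives at prompt } p.
\]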
9.2.5 Status Summary
| Component | Status |
|---|---|
| Budget exhaustion structure | PROVEN (Coq, 0 Admitted) |
| Pigeonhole detection | PROVEN (Coq, 0 Admitted) |
| Shift-equivariance | PROVEN (Coq, 0 Admitted) |
| Translation layer bound | PROVEN (Coq, 0 Admitted) |
| Structural gaps \(g_i > 0\) | VALIDATED (§5 of this paper) |
| Per-component cost rates \(c_i > 0\) | CITED (information theory) |
| Tight \(c_i\) values | OPEN |
| General \(d\)-component pigeonhole | OPEN (currently \(d = 2, 3\)) |
| Class R (routing) attackers | OPEN |
The API regime advances from VALIDATED to PROVEN (conditional on TA1--TA4).
9.3 Connecting Distillation and Impossibility
The impossibility theorem (§9.2) proves that the structural directions in PPP-residual space are expensive to close for KL-bounded standalone attackers. The distillation experiment (§9.1) shows that even unbounded training with oracle teacher access closes only 31--52% of these directions. The two results address different threat models — §9.2 characterizes a bounded adversary, §9.1 characterizes an unbounded cooperative procedure — and should not be conflated. However, they exhibit an empirical correspondence: the directions the impossibility theorem identifies as expensive are the same directions that resist closure under oracle knowledge distillation. The distillation experiment validates that PPP-residual templates are functionally plastic under standard KD, while the weights-regime observable remains structurally rigid; the impossibility theorem proves that spoofing the functional template is budget-exhausting for standalone attackers under TA1--TA4. A natural extension is to test whether adversarial distillation — explicitly targeting the PPP residual geometry while maintaining generation quality — can close more than cooperative KD achieves. This question is addressed in [Coslett, 2026c], where adversarial erasure across two orders of magnitude of penalty strength fails to outperform passive fine-tuning at provenance erasure, and cross-family spoofing is shown to be a geometric illusion arising from capability-topology alignment.
10. Conclusion
We have shown that the output-geometry universality underlying the \(\delta\)-gene framework persists through commercial API logprob interfaces. Six frontier models from three providers produce normalized gap ratios within 7% of the Gumbel-class prediction, confirming that the physics is real and API-accessible. A PPP-residualization transform reveals model-specific geometric fingerprints hidden behind the dominant tail scale factor. These fingerprints are stable across independent sessions and distinctive across models — but not across providers. "Who serves the model" and "what model is being served" are geometrically orthogonal questions, and only the latter is answerable from logprob geometry. Cross-session verification achieves 119/120 correct identifications with a global threshold (0.83% breach rate); per-model thresholds eliminate all breaches. The sole weak point — one model exhibiting the highest drift across all session pairs — establishes the conservative lower bound rather than the representative case. The most significant finding is the enrollment-depth phase transition. Under single-session enrollment, prompt selection is load-bearing: the majority of random banks fail, and the specific bank functions as a credential. Under multi-session enrollment, this sensitivity collapses, and a bank compiler can produce small compiled banks that outperform larger uncompiled banks on held-out sessions.
The transition from fragile to robust verification is governed by a single, operationally controllable parameter — enrollment depth. Post-publication results extend this framework in two directions. Knowledge distillation transfers functional identity — the PPP-residual templates defined in §4 — from teacher to student, with convergence monotonic in the amount of teacher information available (§9.1). But the weights-regime structural fingerprint is completely invariant to distillation, enabling forensic detection of teacher provenance when both regimes are measured. A conditional impossibility theorem (§9.2) closes the epistemological gap: under threat assumptions TA1--TA4, no standalone model can spoof another's PPP-residual template across \(K\) independent prompts without exhausting its KL budget. The formal verification stack for the API regime now comprises 41 Coq theorems with zero uses of Admitted. The total cost of the experiments reported in this paper was \$0.72 for the API verification, plus approximately \$56 in A100 compute for the distillation experiments. The models revealed their identities willingly, through the numbers they were already returning. Distilled models revealed their teachers' identities as well.
References
Coslett, A. R. (2026a). The \(\delta\)-Gene: Inference-Time Physical Unclonable Functions from Architecture-Invariant Output Geometry. Zenodo. DOI: 10.5281/zenodo.18704275.

Coslett, A. R. (2026c). The Geometry of Model Theft: Distillation Forensics, Adversarial Erasure, and the Illusion of Spoofing. Zenodo. DOI: [to be assigned].

de Haan, L. and Ferreira, A. (2006). Extreme Value Theory: An Introduction. Springer.

Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., and Goldstein, T. (2023). A watermark for large language models. ICML 2023.

Leadbetter, M. R., Lindgren, G., and Rootzén, H. (1983). Extremes and Related Properties of Random Sequences and Processes. Springer.

Pappu, R., Recht, B., Taylor, J., and Gershenfeld, N. (2002). Physical one-way functions. Science, 297(5589):2026–2030.

Resnick, S. I. (1987). Extreme Values, Regular Variation, and Point Processes. Springer.

Shao, S., Li, Y., He, Y., Yao, H., Yang, W., Tao, D., and Qin, Z. (2025). SoK: Large language model copyright auditing via fingerprinting. arXiv:2508.19843.

Shao, S., Li, Y., Yao, H., Chen, Y., Yang, Y., and Qin, Z. (2026). Reading between the lines: Towards reliable black-box LLM fingerprinting via zeroth-order gradient estimation. The ACM Web Conference (WWW 2026). arXiv:2510.06605.

Suh, G. E. and Devadas, S. (2007). Physical unclonable functions for device authentication and secret key generation. DAC 2007.

Yoon, D., Chun, M., Allen, T., Müller, H., Wang, M., and Sharma, R. (2025). Intrinsic fingerprint of LLMs: Continue training is NOT all you need to steal a model! arXiv:2507.03014.

Zhang, J., Liu, D., Qian, C., Zhang, L., Liu, Y., Qiao, Y., and Shao, J. (2025). REEF: Representation encoding fingerprints for large language models. ICLR 2025. arXiv:2410.14273.
Acknowledgments
Portions of this research were developed in collaboration with AI systems that served as co-architects for experimental design, adversarial review, and manuscript preparation. All scientific claims, experimental designs, measurements, and editorial decisions remain the sole responsibility of the author.
Patent Disclosure
The API endpoint verification methodology described in this work is the subject of U.S. Provisional Patent Application 63/990,487. The weights-based identity verification methodology is the subject of U.S. Provisional Patent Application 63/982,893. Both provisional patents are assigned to Fall Risk AI, LLC.
Appendix A. Notation
| Symbol | Definition |
|---|---|
| \(z_{(k)}\) | \(k\)-th largest logit |
| \(G_k\) | \(k\)-th order-statistic gap: \(z_{(k)} - z_{(k+1)}\) |
| \(\delta_{\text{norm}}\) | Normalized third gap: \(G_3 / (G_2 + G_3 + G_4)\) |
| \(\beta\) | PPP tail scale parameter |
| \(r_k\) | PPP residual: \(G_k - \hat{\beta}/k\) |
| \(T_m^{(K,S)}\) | Enrolled template for model \(m\) (centroid over \(K\) prompts, \(S\) sessions) |
| \(d_{\text{gen}}\) | Genuine distance (same model, different sessions) |
| \(d_{\text{imp}}\) | Impostor distance (different models) |
| \(\tau_m\) | Per-model verification threshold |
| \(\text{SNR}(K,S)\) | Dimensionless robustness parameter |
| \(\sigma_\parallel\) | Template noise projected onto separation direction |
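The notation above composes into the verification decision rule used in §5; a minimal sketch (toy numbers throughout — templates, thresholds, and the centroid estimator are illustrative, not the enrolled values):

```python
import math

def centroid(templates):
    """Centroid over K per-prompt templates (each a list of residuals r_k)."""
    n = len(templates)
    return [sum(t[i] for t in templates) / n for i in range(len(templates[0]))]

def verify(probe_templates, enrolled_template, tau):
    """Accept if the L2 distance from the probe centroid to the enrolled
    template T_m is at most the per-model threshold tau_m."""
    c = centroid(probe_templates)
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(c, enrolled_template)))
    return d <= tau

# Toy data: a genuine probe near the enrolled template, an impostor far away.
enrolled = [0.1, -0.2, 0.05]
genuine  = [[0.12, -0.18, 0.04], [0.09, -0.21, 0.06]]
impostor = [[0.9, 0.8, -0.7], [1.0, 0.7, -0.6]]
print(verify(genuine, enrolled, tau=0.1))   # True
print(verify(impostor, enrolled, tau=0.1))  # False
```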
Appendix B. Margin Distribution (Bottleneck Model M\(_\text{weak}\))
The per-prompt signed margins for the bottleneck model exhibit a heavy-tailed positive distribution: \(\mu = 0.087\), \(\sigma = 0.885\). Of the 10 prompts, 4 contribute positive margin (\(> +0.05\)), 1 is neutral, and 5 contribute negative margin. The two strongest positive contributors (\(\approx\!+1.8\), \(\approx\!+1.3\)) are prompts that impose highly structured, cross-domain output constraints; the strongest negative contributor (\(\approx\!-1.5\)) elicits output where the bottleneck pair's responses happen to converge. Specific prompt identifiers and category labels are withheld as operational credentials. The asymmetry — two hero prompts outweighing five poison prompts — explains why centroid averaging concentrates faster than the symmetric Gaussian model in §6.5 predicts.