Abstract. Neural network identity is not monolithic. Different observables — hidden-state geometry, pre-softmax logit statistics, and behavioral output templates — sit at different depths in the forward computation and respond to perturbation on different timescales. This paper shows that three identity layers — structural, thermodynamic, and functional — each obey a distinct validated deformation law. The structural layer is model-specific, stable under non-destructive training interventions, and inert under same-family direct targeting in the observed regime. The thermodynamic layer is approximately universal across a validated 22-model Transformer cross-section. The functional layer is volatile, transferring through distillation and eroding under continued fine-tuning. We resolve the carrier of the structural layer as a two-channel geometric observable requiring both token-level magnitude and token-level direction, and we falsify two natural simplifications: that the structural fingerprint reduces to a gauge projection, and that it is predictable from coarse architecture features. Together these results define an admissibility condition for neural identity claims: such claims must specify which layer they address, because the layers do not share a deformation law.
§1. Introduction
A neural network's identity — the set of observable properties that distinguish it from other models — is not a single quantity. Different observables sit at different depths in the forward computation: hidden-state geometry, pre-softmax logit statistics, and post-softmax behavioral patterns each respond to perturbation on different timescales and resist modification to different degrees. A framework that seeks to verify neural identity is only as sound as the stability of the observable it measures under real perturbation. This paper asks what that stability looks like, and finds that the answer depends entirely on which layer of identity is being measured.

Prior work in this series observed that neural network identity operates across three distinct layers — structural, thermodynamic, and functional — with qualitatively different resilience profiles [3, 4]. This paper advances that observation into a doctrine by establishing the deformation laws that govern each layer, resolving the carrier mechanism of the structural layer, and confirming the cross-sectional independence of the structural and thermodynamic layers across a broad model population. Specifically: the structural layer is training-determined, stable under passive training in the observed regime, and inert under same-family direct targeting. The thermodynamic layer is approximately universal across a 22-model Transformer cross-section and shows no strong cross-sectional association with the structural layer in the validated sample. The functional layer is the most volatile, transferring through knowledge distillation and erasing under continued fine-tuning.

We resolve the carrier mechanism of the structural layer as a two-channel geometric observable requiring both token-level magnitude and token-level direction, and we falsify two natural simplifications: the hypothesis that the structural fingerprint reduces to a one-dimensional gauge projection, and the hypothesis that it is predictable from publicly available architecture features. Together, these results define an admissibility condition for identity claims: any such claim must declare which layer it addresses, because the layers do not share a deformation law.

This paper reports empirical findings validated by measurement across 22 Transformer models and 106 training checkpoints. The deformation laws advanced here are observational, not theorems, and the boundaries of what has been measured are stated explicitly throughout.
Observables and Scope
This paper measures three identity observables, each defined at a different depth in the forward computation:

- Structural observable: a normalized statistic of cross-token hidden-state geometry, computed at an intermediate network depth. It captures how the model's internal representations distribute across the token and feature dimensions. Formal definition and validation appear in [Coslett, 2026a]; the measurement protocol is described in [Coslett, 2026b, 2026c].
- Thermodynamic observable: the normalized third gap in the ranked pre-softmax logit distribution — specifically, the ratio of the third gap to the sum of the second, third, and fourth gaps. This statistic is predicted by extreme value theory for Gumbel-distributed order statistics. Derivation and empirical validation appear in [Coslett, 2026a].
- Functional observable: a residualized template derived from output log-probability patterns, constructed by removing the dominant scale component (estimated via a power-prior-predictive fit) to expose structural gap patterns across the top-ranked output positions. Definition and cross-session validation appear in [Coslett, 2026b].

The measurement noise floor (ε) is the repeat-measurement variability of a fixed model under the canonical measurement protocol — the maximum distance observed between independent measurements of the same model under identical conditions. Ratios reported in §5 (e.g., "42 times the noise floor") use this quantity as the denominator.

Two model populations appear in this paper:

- A 23-model validation zoo spanning 16 training lineages and 3 architecture types (standard Transformer, parallel Transformer, and one state-space model), used for structural distance context and distillation experiments in prior work and referenced in §2.
- A 22-model Transformer cross-section (the same zoo excluding the state-space model), used for all thermodynamic and cross-sectional structural claims in §6. The state-space model is excluded pending designation of a canonical non-Transformer measurement lane, not because of a physical barrier (§6).

The deformation laws reported in this paper are laws of these three observables in the validated regime. Other observables at the same computational depths may behave differently; the paper does not claim to have characterized all possible identity-relevant measurements.
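As a concrete illustration, the sketch below computes this reading of the normalized third-gap statistic from a raw logit vector and evaluates it on Gumbel-distributed samples, the regime where the extreme value theory prediction applies. The function name, the gap indexing, and the simulation setup are assumptions made for the example; the canonical measurement protocol is the one defined in [Coslett, 2026a].

```python
import numpy as np

def normalized_third_gap(logits: np.ndarray) -> float:
    """Illustrative reading of the thermodynamic observable: the third gap in the
    descending-ranked logits, normalized by the sum of the second, third, and
    fourth gaps. Not the canonical measurement protocol."""
    ranked = np.sort(logits)[::-1]           # order statistics, largest first
    gaps = ranked[:-1] - ranked[1:]          # gap k = z_(k) - z_(k+1)
    d2, d3, d4 = gaps[1], gaps[2], gaps[3]   # second, third, and fourth gaps
    return float(d3 / (d2 + d3 + d4))

# Toy check on Gumbel-distributed logits (the extreme-value regime referenced above).
rng = np.random.default_rng(0)
draws = [normalized_third_gap(rng.gumbel(size=50_000)) for _ in range(200)]
print(f"mean over draws = {np.mean(draws):.3f}")
```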
§2. Three Layers of Neural Identity
Neural network identity, as measured by observable properties of a model's computation, is not monolithic. Distinct observables sit at different depths in the forward pass — hidden-state geometry, pre-softmax logit statistics, and post-softmax behavioral patterns — and they respond to perturbation on different timescales. Earlier papers in this series identified a three-layer hierarchy of identity observables with qualitatively different resilience profiles: a structural layer that was invariant under all tested training, a thermodynamic layer that was stable across adversarial interventions, and a functional layer that transferred through distillation but was erased by continued training [3, 4]. That hierarchy was framed as a security architecture [3] and explored philosophically through a two-layer (structural/functional) identity distinction [5].

This section deepens the empirical foundation by quantifying the deformation laws across a unified longitudinal dataset of 106 training checkpoints spanning three independent experimental studies, and by extending the framework from longitudinal to cross-sectional analysis across 22 Transformer models. The three-layer doctrine advanced here extends rather than replaces the Two-Layer Identity of [5] — the structural/functional distinction is preserved; the thermodynamic layer is now distinguished as a third observable with its own deformation law.

Across the 106 checkpoints, three deformation laws emerge.

The structural layer, measured through the geometry of hidden-state activations at an intermediate network depth, is the most resistant to change. Under knowledge distillation — including high-bandwidth logit matching, API-grade top-K transfer, and cross-architecture sequence-level training — the structural fingerprint remains within the measurement noise floor relative to the undistilled baseline [3]. The distance between any distilled student and its teacher exceeds the distance between the most similar unrelated models in the 23-model validation zoo (which includes one state-space model not in the Transformer cross-section). Distillation transfers knowledge. It does not transfer structural identity.

The thermodynamic layer, measured through the normalized third gap in the pre-softmax logit distribution, is passively stable. Across the longitudinal interventions measured in this work, the thermodynamic observable remains within a narrow band and does not track the functional changes occurring simultaneously in the same models. Its value is consistent with the extreme value theory prediction for Gumbel-distributed order statistics. The correlation between functional-layer movement and thermodynamic stability is 0.17 with p = 0.23 across the full longitudinal dataset — indistinguishable from zero.

The functional layer, measured through behavioral output templates derived from logprob distributions, is the most volatile. Knowledge distillation transfers 31–52% of the teacher's functional fingerprint to the student within three epochs [3], with the degree of transfer monotonic in the information bandwidth available during training. Passive fine-tuning on unrelated data erases this trace within one to two epochs [3]. The functional layer is the only layer that moves materially under any tested intervention.

Cross-sectionally, the structural and thermodynamic layers occupy different regimes.
Across 22 Transformer models from 16 training lineages, the structural fingerprint varies with a coefficient of variation near 96%, spanning approximately 18× from smallest to largest, while the thermodynamic observable varies at 3.5%. The rank correlation between the two is 0.09 (p = 0.68), with bootstrap confidence intervals that include zero. No strong cross-sectional association was observed in the 22-model sample, though weak-to-moderate coupling cannot be ruled out at this sample size.

The three layers are not formally independent — they derive from the same learned weights, projected through the same forward pass. But in the observed non-destructive training regime, they are operationally independent: interventions that materially moved the functional layer did not detectably move the structural or thermodynamic layers. This operational independence is the basis for the deformation doctrine advanced in this paper: claims about identity must specify which layer they address, because the layers do not co-move under the perturbations that real-world deployment produces.

The three deformation laws, stated explicitly:

Structural law. In the tested regime, the structural fingerprint remains within the measurement noise floor under non-destructive training interventions (distillation, fine-tuning, adversarial erasure) and resists same-family direct targeting even under unconstrained white-box optimization.

Thermodynamic law. In the validated Transformer regime, the thermodynamic observable is approximately universal cross-sectionally (CV = 3.5% across 22 models), stable under longitudinal training interventions (CV < 2.1% across 106 checkpoints), and shows no strong cross-sectional association with the structural fingerprint in the validated 22-model sample (ρ = 0.09; weak-to-moderate coupling remains unresolved at this sample size).

Functional law. The functional layer is volatile: it transfers partially through knowledge distillation (31–52% convergence toward the teacher) and, in the observed regime, is erased by continued fine-tuning within one to two epochs.
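The quantities in these laws reduce to routine dispersion and co-movement statistics. A minimal sketch of those computations, assuming the per-checkpoint observables have already been extracted into arrays (the variable names and the placeholder data are hypothetical):

```python
import numpy as np
from scipy import stats

def coefficient_of_variation(values: np.ndarray) -> float:
    """CV = sample standard deviation / mean, as a fraction (0.035 means 3.5%)."""
    return float(np.std(values, ddof=1) / np.mean(values))

# Hypothetical per-checkpoint series across the 106-checkpoint longitudinal dataset.
rng = np.random.default_rng(0)
thermo_by_checkpoint = rng.normal(loc=0.31, scale=0.005, size=106)  # placeholder values
functional_movement  = rng.random(106)                              # placeholder drift

# Longitudinal stability of the thermodynamic layer (reported CV < 2.1%).
print(f"thermodynamic CV: {coefficient_of_variation(thermo_by_checkpoint):.1%}")

# Co-movement between functional-layer movement and the thermodynamic observable
# (reported r = 0.17, p = 0.23, i.e. indistinguishable from zero).
r, p = stats.pearsonr(functional_movement, thermo_by_checkpoint)
print(f"functional vs. thermodynamic: r = {r:.2f}, p = {p:.2f}")
```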
§3. What the Structural Fingerprint Depends On
The preceding paper in this series — "Which Model Is Running?" — reported an accidental observation during zero-knowledge verification development: a rescaling error compressed the structural fingerprint to approximately 1.5 bits of dynamic range, yet the fingerprint persisted at 0.98 rank correlation with its uncorrupted reference. The collapse broke cryptographic separability — models that were previously distinguishable became indistinguishable — but it did not erase the observable's underlying order structure. The paper noted that this suggested structural identity might reside in relational geometry rather than activation magnitude, but stopped short of resolving the mechanism. The two-channel structure identified here provides a resolution.

The structural observable depends on two information channels that operate at different levels of the computation. The first is a marginal channel: the mean and spread of per-token activation statistics across the sequence dimension. This channel accounts for approximately 98% of the inter-model distance in the structural fingerprint space. It is the dominant driver of how far apart two models appear. The second is a directional channel: the rank ordering of per-token contributions within the measurement's aggregation step. This channel carries the per-model identity signature — the specific pattern that distinguishes one model from another within a family that shares similar marginal statistics.

Both channels are independently necessary. When the directional component is destroyed — replaced with random unit vectors while preserving the original token-level magnitudes — the structural fingerprint does not merely degrade; the reconstruction becomes anti-correlated with the original fingerprint, meaning the result is worse than random. Conversely, when the magnitude component is destroyed — replaced with random norms while preserving the original directional structure — the same anti-correlation results. Neither channel alone preserves identity; both must be present for the fingerprint to be recognizable.

This two-channel structure explains the precision-collapse observation. Bit-width reduction compresses the magnitude range of activations, degrading the marginal channel. But rank ordering — which tokens contribute most, and in which relative proportions — is preserved far below the precision floor where magnitude-based separability fails. The critical distinction is between magnitude compression, which shrinks the range of token norms without scrambling their relative order, and magnitude randomization, which destroys the order entirely. The N5 event — the precision collapse described above — was a compression: the directional channel survived because the rank structure of token contributions was preserved even as their absolute values became nearly indistinguishable. The adversarial test that produced anti-correlation was a randomization: replacing magnitudes with genuinely unrelated norms destroyed the rank structure that the directional channel depends on.

Even extremely coarse quantization preserves strong structural agreement despite destroying magnitude-based separability. The directional channel survives because it depends on relational ordering, not absolute scale. The structural fingerprint is therefore best understood as a norm-weighted directional statistic: token-level magnitudes determine how each position contributes to the aggregate, while token-level directions determine what each position contributes. Magnitude without direction produces a weighted sum of noise.
Direction without magnitude produces an unweighted average that destroys the identity-carrying emphasis pattern. The fingerprint requires both — not as redundant confirmation of the same information, but as two distinct inputs to a single geometric computation. Both channels are individually necessary; no single-channel reduction preserves the fingerprint. A simpler observable that captures the same information through a different geometric pathway has not been ruled out. What has been ruled out is the simplest candidate reduction — a single-channel approximation that computes only norms or only directions — because both channels are independently load-bearing. This necessity result bounds the space of admissible simplifications: any candidate reduction must retain both channels.
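The necessity argument above corresponds to a simple ablation procedure: split each token's hidden state into a norm (magnitude channel) and a unit vector (directional channel), destroy one channel while preserving the other, and compare the reconstructed aggregate to the original. The sketch below illustrates that procedure; the aggregation function is a stand-in for the structural observable of [Coslett, 2026a], and the activations are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def aggregate(hidden: np.ndarray) -> np.ndarray:
    """Stand-in aggregation over tokens (rows); the real structural observable is
    the normalized cross-token geometry statistic, not this plain sum."""
    return hidden.sum(axis=0)

def ablate_direction(hidden: np.ndarray) -> np.ndarray:
    """Keep each token's norm, replace its direction with a random unit vector."""
    norms = np.linalg.norm(hidden, axis=1, keepdims=True)
    dirs = rng.standard_normal(hidden.shape)
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return norms * dirs

def ablate_magnitude(hidden: np.ndarray) -> np.ndarray:
    """Keep each token's direction, replace its norm with an unrelated random value."""
    dirs = hidden / np.linalg.norm(hidden, axis=1, keepdims=True)
    norms = rng.uniform(0.1, 10.0, size=(hidden.shape[0], 1))
    return norms * dirs

hidden = rng.standard_normal((128, 768))   # synthetic (tokens, features) activations
reference = aggregate(hidden)

for label, ablation in [("direction destroyed", ablate_direction),
                        ("magnitude destroyed", ablate_magnitude)]:
    corr = np.corrcoef(reference, aggregate(ablation(hidden)))[0, 1]
    print(f"{label}: correlation with the intact aggregate = {corr:+.2f}")
```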
§4. What the Structural Fingerprint Is Not
Two natural explanations for the structural fingerprint can be tested directly, and both fail.

The gauge explanation. Neural networks with softmax output layers possess a symmetry: adding a constant to all logits does not change the output distribution. This symmetry defines a gauge direction in activation space — a one-dimensional subspace that the model's output is provably blind to. A natural hypothesis is that the structural fingerprint measures variance along this gauge direction, which would elegantly explain temperature invariance (established in prior work) through a formally established gauge decomposition.

The hypothesis is false. The gauge direction accounts for approximately 1.3% of what the structural observable measures. The full observable captures geometry that is roughly 65 times richer than any single privileged direction, including the gauge direction. The gauge direction is mildly special — about 2–3 times more aligned with the observable's directional components than a random direction — but it is not the carrier. The structural fingerprint is a multi-dimensional geometric measurement, not a one-dimensional projection. This result closes the simplest unification of the structural fingerprint with the gauge-transport decomposition established in prior formal work. The structural observable and the gauge decomposition describe overlapping but distinct geometric properties of the hidden-state space. They share a common mathematical substrate — the activation covariance structure — but the observable is not reducible to a gauge measurement.

The architecture explanation. A second natural hypothesis is that the structural fingerprint is determined by the model's architecture: hidden dimension, layer count, vocabulary size, head configuration, and parameter count. If true, the fingerprint would be readable from a model card, which would have significant security implications — an attacker could estimate a target's fingerprint without access to the model itself.

This hypothesis is also false. A regression using eight publicly available architecture features on 22 models achieves an in-sample fit of R² = 0.25 but a leave-one-out predictive score of R² = −3.93, meaning the model's predictions are substantially worse than simply guessing the population mean. The leave-one-out score is the definitive test: with eight predictors on 22 observations, the in-sample R² is consistent with routine overfitting even if the true predictive signal were zero. This negative result is specific to the coarse architectural descriptors tested here; whether richer internal features fare better is taken up in §7. In the validated regime, publicly available architecture features do not predict the structural fingerprint. Models with nearly identical architecture specifications — the same hidden dimension, the same layer depth, the same head configuration — can have structural fingerprints that differ by an order of magnitude. The fingerprint is shaped by the specific training run rather than recoverable from public architecture descriptors. Which aspects of training matter most remains open. Architecture sets the stage; training writes the script.

The security consequence is direct: an attacker cannot estimate a target model's structural fingerprint from its published specifications. The fingerprint must be measured, not inferred.
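The architecture-explanation test is an ordinary regression paired with a leave-one-out check, which is what separates in-sample fit from predictive power at this sample size. A minimal sketch, assuming the 22 fingerprints and the eight public architecture features are already loaded (the placeholder data stands in for the real model cards):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.random((22, 8))   # placeholder: hidden dim, layers, vocab, heads, params, ...
y = rng.random(22)        # placeholder: structural fingerprint per model

model = LinearRegression()

# In-sample fit (Section 4 reports R^2 = 0.25 on the real data).
in_sample_r2 = model.fit(X, y).score(X, y)

# Leave-one-out predictive score (Section 4 reports R^2 = -3.93): each model's
# fingerprint is predicted from the other 21, so overfitting is not rewarded.
loo_predictions = cross_val_predict(model, X, y, cv=LeaveOneOut())
loo_r2 = r2_score(y, loo_predictions)

print(f"in-sample R^2 = {in_sample_r2:.2f}, leave-one-out R^2 = {loo_r2:.2f}")
```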
§5. Deformation Under Pressure
The deformation doctrine is not merely an observation about passive stability. It extends to active adversarial pressure: the structural fingerprint resists deliberate modification, and the cost of forcing it to move is measured in model capability.

Same-family targeting. When the structural observable is directly optimized under unconstrained white-box access, the fingerprint barely moves if the target is within the same model family. After 150 unconstrained optimization steps, the fingerprint moved by approximately 42 times the measurement noise floor. This is detectable and directional, but it remains well inside the acceptance region: the smallest same-family distance in the validation zoo is approximately 1,200 times the noise floor. The trajectory plateaued after detectable but small movement, consistent with a carrier that is distributed across the model rather than easily redirected by local updates.

Cross-family targeting. When the target is a structurally distant model from a different training lineage, the fingerprint does move — but the model collapses first. Within 50 optimization steps, task performance degrades to six times the baseline perplexity. The optimizer achieves a 76% reduction in structural distance, but the remaining gap is still large, and the model has been destroyed as a language model in the process. The structural layer is load-bearing: moving it requires changes that are incompatible with the distributed representations that support language modeling.

Passive training. Under standard training interventions — knowledge distillation, supervised fine-tuning, and adversarial erasure — the structural and thermodynamic layers remain stable in the observed passive-training regime, even under interventions that dramatically reshape the functional layer [3, 4]. Distillation transfers 31–52% of the teacher's functional fingerprint to the student [3] without perturbing the student's structural identity, which remains within the noise floor of its pre-distillation baseline. Passive fine-tuning on unrelated data erases the functional trace within one to two epochs [3].

This establishes a deformation hierarchy. The functional layer is the most volatile — it moves under any training signal and can be erased by routine continued training with no forensic intent. The thermodynamic layer is passively stable — it does not respond to training perturbations in the observed regime. The structural layer is the most resistant — it withstands not only passive training but direct adversarial targeting, yielding measurable movement only when the optimizer has unlimited gradient access to the same architectural family, and even then, the movement is negligible relative to inter-model distances.

The task-constrained regime — where the attacker attempts to move the structural fingerprint while preserving model capability — was tested but produced inconclusive results: the evaluation protocol used to assess task preservation could not distinguish genuine capability retention from overfitting to the evaluation data, rendering the results uninterpretable. This regime remains an open question, but the unconstrained results already establish the key point: even without a task constraint, same-family fingerprints resist movement.
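In outline, the same-family targeting experiment is a standard white-box optimization loop: compute a differentiable structural observable on the attacker's copy, penalize its distance to the target fingerprint, and take gradient steps on the weights while tracking movement in noise-floor units. The sketch below is a schematic of that loop on a toy module; the observable, the model, the noise floor, and the hyperparameters are all placeholders rather than the experimental protocol.

```python
import torch

torch.manual_seed(0)

# Toy stand-in for the hidden-state pathway of a language model.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.GELU(), torch.nn.Linear(64, 64))

def structural_observable(net: torch.nn.Module, tokens: torch.Tensor) -> torch.Tensor:
    """Placeholder fingerprint: a norm-weighted average of hidden states over tokens.
    The real observable is the cross-token geometry statistic of [Coslett, 2026a]."""
    hidden = net(tokens)                          # (tokens, features)
    weights = hidden.norm(dim=1).softmax(dim=0)   # emphasis from token magnitudes
    return weights @ hidden                       # weighted directional aggregate

tokens = torch.randn(128, 64)   # hypothetical probe activations
target = torch.randn(64)        # fingerprint of the attack target
noise_floor = 1e-3              # hypothetical repeat-measurement noise floor

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
for step in range(150):         # 150 unconstrained steps, as in the same-family experiment
    optimizer.zero_grad()
    distance = torch.norm(structural_observable(model, tokens) - target)
    distance.backward()         # unconstrained: no task-preservation penalty
    optimizer.step()

print(f"final distance to target = {distance.item() / noise_floor:.1f} x noise floor")
```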
§6. Thermodynamic Universality and Measurement Admissibility
The thermodynamic layer provides a different kind of identity information than the structural layer. Where the structural fingerprint is highly individual — varying by a factor of approximately 18× across models — the thermodynamic observable is approximately constant. Across 22 Transformer models drawn from 16 independent training lineages, the normalized third pre-softmax logit gap has a coefficient of variation of 3.5%, with a mean value consistent with the extreme value theory prediction for Gumbel-distributed order statistics established in prior work in this series.

The thermodynamic observable does not identify individual models. It identifies a cross-model class property within the validated measurement regime: the measured value is consistent with shared output-space geometry and training objective structure, rather than with the idiosyncratic weights of any particular run.

The structural and thermodynamic layers show no strong cross-sectional association. The rank correlation between them is 0.09 (p = 0.68); the linear correlation is 0.15 (p = 0.51); and bootstrap confidence intervals for both statistics include zero. At this sample size, weak-to-moderate coupling cannot be excluded, but the observed data are consistent with weak or absent cross-sectional association in the present sample. This is compatible with the longitudinal finding from §2 — that the layers do not co-move under training — extended now to the cross-sectional regime: models with very different structural fingerprints share the same thermodynamic constant, and models with similar structural fingerprints do not cluster in the thermodynamic dimension.

An earlier measurement of a state-space model appeared to show a thermodynamic anomaly — a value approximately 35% below the Transformer population mean, suggesting possible architecture dependence in the thermodynamic layer. Under corrected measurement, independent replications for the same model converged to the 0.31 range, with a cross-pathway discrepancy smaller than the standard error of either measurement. The earlier value did not survive replication and is superseded. The corrected value is consistent with the Transformer cross-section, removing the evidentiary basis for an architecture-specific thermodynamic barrier in the observed case. The state-space model remains outside the formal Transformer cross-section pending designation of a canonical measurement lane for non-Transformer architectures, but no physical barrier has been identified.

One model in the 22-model cross-section sits at 0.271 — approximately 3.5 standard deviations below the population mean. The reported mean and coefficient of variation are computed on the full 22-model sample including this observation. It is genuine: it was measured with adequate statistical power and shows no diagnostic anomalies. It does not overturn the universality finding, but it identifies a residual that future work should investigate.

These results establish a measurement admissibility rule for identity claims. A claim about structural stability does not imply thermodynamic stability, and vice versa. A claim validated in the functional layer — behavioral fingerprinting through output distributions — operates on a fundamentally different timescale than a claim validated in the structural layer through activation geometry.
The three layers share the same underlying weights, but they extract different projections of those weights at different depths in the forward pass, and they respond to perturbation according to different laws. Mixing layers without acknowledging their distinct deformation regimes produces conclusions that appear supported but are not — a form of evidence contamination that this paper's doctrine is designed to prevent.
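The cross-sectional statistics quoted above reduce to a rank correlation with a bootstrap confidence interval over the 22-model sample. A minimal sketch, assuming the per-model structural and thermodynamic values are available as arrays (placeholder data shown in place of the real measurements):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
structural = rng.random(22)   # placeholder per-model structural fingerprints
thermo     = rng.random(22)   # placeholder per-model thermodynamic values

# Spearman rank correlation (Section 6 reports rho = 0.09, p = 0.68 on the real data).
rho, p = stats.spearmanr(structural, thermo)

# Bootstrap the correlation over models; the reported intervals include zero.
boot = []
for _ in range(10_000):
    idx = rng.integers(0, len(structural), size=len(structural))
    rho_b, _ = stats.spearmanr(structural[idx], thermo[idx])
    boot.append(rho_b)
low, high = np.percentile(boot, [2.5, 97.5])

print(f"rho = {rho:.2f} (p = {p:.2f}), 95% bootstrap CI = [{low:.2f}, {high:.2f}]")
```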
§7. What Remains Open
Three questions are identified by this work but not resolved by it.

Coupling under extreme deformation. Across the interventions tested in this work, the structural and thermodynamic layers show no detectable coupling in the observed regime. But the observed regime is bounded: same-family targeting moves the structural fingerprint negligibly, and cross-family targeting destroys the model before the fingerprint moves far. Whether a regime exists between these extremes — one where the structural fingerprint moves materially without catastrophic capability loss — and whether thermodynamic stability would persist in that regime, remains unmeasured. The absence of observed coupling is a property of the interventions tested, not a proved invariance.

Training-determined structure. This paper established that the structural fingerprint is not predictable from publicly available architecture features. It did not establish what does determine it. The fingerprint is a function of the specific training run — the initialization, the data, the optimizer trajectory — but which aspects of training are most influential, and whether richer internal features (training-shaped weight statistics, per-layer activation distributions, loss landscape geometry) can explain the fingerprint's cross-model variation, are open questions. The transition from "architecture does not explain it" to "training explains it through mechanism X" is the natural next step, but it has not been taken here.

Non-Transformer architectures. The thermodynamic universality result includes one state-space model whose corrected value is consistent with the Transformer cross-section. But the structural measurement lane has not been fully characterized for architectures that replace attention with alternative sequence-mixing mechanisms. Whether the two-channel carrier anatomy identified in §3 applies to non-attention hidden-state geometry, and whether the deformation laws of §5 transfer to fundamentally different computational graphs, remain open. This paper's conclusions are validated for the Transformer family. Extension to other architectures requires dedicated investigation, not extrapolation.
§8. Discussion
This paper advances a doctrine rather than a single result: neural network identity is layered, and different layers obey different laws under deformation.

The structural layer, carried by hidden-state activation geometry, is the most durable identity observable tested in this program. In the tested regime, it remained stable under passive fine-tuning and knowledge distillation, was inert under same-family direct targeting, and under cross-family targeting the model failed destructively before substantial movement occurred. Its carrier is a two-channel geometric mechanism that requires both token-level magnitude and token-level direction, and cannot be reduced to a single privileged axis. It is not readable from architecture specifications; it is determined by the training run. For verification applications that require weight access, the structural layer provides the strongest available foundation — not because it has been proved immutable, but because no tested intervention has moved it beyond the measurement noise floor in the non-destructive regime.

The thermodynamic layer provides a different service. It does not identify individual models, but it identifies a class property: a near-constant output statistic, consistent with extreme value theory, that holds across a broad Transformer cross-section. Its value as a verification primitive is not individual authentication but cross-model class confirmation within the validated measurement regime — verifying that a system is a neural language model rather than an impersonator, lookup table, or corrupted deployment. It is approximately universal in the validated regime and shows no strong cross-sectional association with the structural layer in the validated sample, confirming that the two observables measure genuinely different properties of the same underlying computation.

The functional layer is the most accessible — it requires only API-level output distributions — but it is also the most transient. It carries provenance information from the training process and transfers partially through distillation, but it is erased by routine continued training within one to two epochs. Its forensic value is real but time-sensitive: a detection window measured in training epochs, not in calendar time.

The admissibility rule that emerges from these results is simple in statement and demanding in practice: any claim about neural network identity must declare which layer it addresses, because the layers do not share a deformation law. A structural stability claim does not imply thermodynamic stability. A functional-layer detection does not transfer to the structural regime. A thermodynamic universality observation does not speak to individual identity. Conflating layers produces conclusions that borrow evidence from one regime and spend it in another — a form of inferential debt that this doctrine is designed to prevent.

The findings reported here are empirical. They are validated by measurement across 22 Transformer models, 106 training checkpoints, and multiple independent experimental studies, but they are not formally proved. The formal verification work earlier in this series provides the mathematical substrate on which these empirical findings rest, but the deformation laws themselves are observational laws, not theorems. They may be refined by future measurement in regimes not yet tested.

The preceding paper in this series, "Which Model Is Running?", ended with an observation and a question.
A precision-collapse event preserved the structural fingerprint at 0.98 rank correlation despite reducing its dynamic range to roughly 1.5 bits. The paper asked whether structural identity might reside in relational geometry rather than activation magnitude. The answer is yes — and the geometry has structure: two channels, three layers, and deformation laws that distinguish what persists from what is merely present.
References
[1] A. Coslett, "The δ-Gene: An Information-Theoretic Observable for Neural Network Structural Identity," Zenodo, 2026. DOI: 10.5281/zenodo.18704275
[2] A. Coslett, "Template-Based Endpoint Verification via Logprob Order-Statistic Geometry," Zenodo, 2026. DOI: 10.5281/zenodo.18776711
[3] A. Coslett, "The Geometry of Model Theft: Distillation Forensics and Adversarial Erasure Resilience in Neural Network Identity Verification," Zenodo, 2026. DOI: 10.5281/zenodo.18818608
[4] A. Coslett, "Provenance and Neural Forensics: Generalization, Alignment Diagnostics, and Zero-Knowledge Attestation Architecture," Zenodo, 2026. DOI: 10.5281/zenodo.18872071
[5] A. Coslett, "Beneath the Character: The Structural Identity of Neural Networks," Zenodo, 2026. DOI: 10.5281/zenodo.18907292
[6] A. Coslett, "Which Model Is Running? Structural Identity as a Prerequisite for Trustworthy Zero-Knowledge Machine Learning," Zenodo, 2026. DOI: 10.5281/zenodo.19008116