Cross-architecture PfV validation
Polarization-from-Views (PfV) estimates structural divergence across multiple realizations of a model response, using API outputs alone. It has been validated on two production architectures, Claude Haiku and GPT-4o-mini, against a 5,000-permutation null; the negative control reads L₃ = 0.000 on both.
C2 Refusal Erosion
The strongest confirmed signal: refusal erosion in Claude Haiku correlates with ΔL₃ at ρ = +0.713 (p = 0.0013), seed-controlled across 17 context depths. C3 Anchor Displacement is a promising cross-architecture candidate (ρ = +0.799 GPT-mini, +0.647 GPT-full) but remains marginal in the pooled cross-architecture test (p = 0.054) and is reported as a direction, not a claim. Probe cases C1–C5 map distinct failure modes — sycophancy, refusal erosion, anchor displacement, rule override, reasoning drift — onto separate signal channels.
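To make the statistics above concrete, here is a minimal sketch of how a rank correlation can be tested against a permutation null. It is an illustration under stated assumptions, not the pipeline behind the reported numbers: the function names, the two-sided test, the add-one p-value correction, and the synthetic data are all hypothetical choices made for this example. It computes Spearman's ρ between two series (for instance, a per-depth erosion score and a per-depth ΔL₃), then shuffles one series 5,000 times to estimate how often a correlation at least as strong arises by chance.

```python
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for Spearman's rho.

    Shuffling the ranks of y breaks any pairing with x, which gives
    the null distribution of rho under "no association".
    """
    rng = random.Random(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = pearson(rx, ry)
    shuffled = list(ry)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Synthetic data only: 17 hypothetical context depths with a noisy trend.
    rng = random.Random(1)
    erosion = [d + rng.gauss(0, 4) for d in range(17)]
    delta_l3 = [0.1 * d + rng.gauss(0, 0.5) for d in range(17)]
    rho, p = permutation_p_value(erosion, delta_l3)
    print(f"rho = {rho:+.3f}, permutation p = {p:.4f}")
```

With only 17 points, a permutation null avoids relying on large-sample approximations for the p-value, which is one reason such a test is a natural fit here. The fixed seed makes the shuffles reproducible.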
Is inference-time stability regulation sufficient to prevent unsafe behavior under regime shift?
TwoQuarks treats model instability as drift and regime transition rather than only as a final unsafe answer. The research program asks whether monitoring and lightweight intervention at inference time — leaving parameters, policies, and training objectives untouched — is enough to catch collapse before it surfaces in production outputs.
Preprints & public artifacts
Open to read, cite, and reproduce.
TwoQuarks Framework →
The inference-time control architecture and the six-flavor formulation.
Preprint: Isomeric Polarization →
PfV metrics (L₁, L₂, ΔL₃) and the polarization formulation behind the diagnostics.
Preprint: Molecule →
The first operational instrument: a six-flavor black-box instability monitor.
Overview: Executive summary →
A one-document overview of the framework, instruments, and current results.