Is inference-time stability regulation sufficient to prevent collapse and unsafe behavior in sequence models under regime shift? TwoQuarks builds the instruments to find out.
LLM Evaluation · Inference-Time Stability · RAG & Agentic Systems · MCP Tooling
Provider-agnostic methodology with statistical controls — validated against deployed Claude and GPT models through their APIs.
Validated across two production model families through their APIs.
5,000-permutation null and control-negative checks.
Works against any production model through its API alone.
Reproducible pipelines you run in your own context.
Azure AI stack and API instrumentation included.
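As an illustration of what a 5,000-permutation null with a control-negative check can look like, here is a minimal Python sketch. It is not the TwoQuarks pipeline: the function name `permutation_p_value`, the difference-of-means statistic, and the score lists are all assumptions made for this example.

```python
import random
from statistics import mean


def permutation_p_value(treated, control, n_permutations=5000, seed=0):
    """One-sided permutation p-value for mean(treated) - mean(control).

    Hypothetical helper for illustration only. The null distribution is
    built by shuffling the pooled scores and re-splitting them into two
    groups of the original sizes.
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)

    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if diff >= observed:
            hits += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Illustrative scores (made up): a shifted condition versus a baseline.
shifted = [0.90, 0.80, 0.85, 0.95, 0.90, 0.88, 0.92, 0.87]
baseline = [0.10, 0.20, 0.15, 0.05, 0.12, 0.18, 0.09, 0.11]
p_effect = permutation_p_value(shifted, baseline)

# Control-negative check: two samples with the same values should not
# register as a significant difference.
same_a = [0.50, 0.60, 0.40, 0.55, 0.45, 0.52, 0.48, 0.51]
same_b = list(same_a)
p_control = permutation_p_value(same_a, same_b)
```

In this sketch, a small `p_effect` alongside a large `p_control` is the pattern a control-negative check looks for: the test fires on a real separation and stays quiet when there is none.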
Most evaluation looks at the output. TwoQuarks looks at the path the model takes to get there.
Isomeric polarization compares multiple black-box realizations of a response to estimate structural divergence (ΔL₃) and to flag drift, refusal erosion, and rule-override pressure during inference, using API outputs alone.
TwoQuarks is a working portfolio for LLM safety evaluation, inference-time monitoring, and AI tooling that runs in your own context.
Empirical validation, preprints, cross-architecture findings, statistical controls.
Molecule, the twoquarks PyPI package, MCP server, API adapters.
TwoQuarks, PfV, ΔL₃, the six quark flavors, inference-time control.
Interactive Molecule-style analysis — visible, runnable portfolio evidence.
Independent AI-safety research, engineering stack, resume, GitHub, contact.