Mutual AI Audit Protocol
The Cross-Verification Protocol is the core mechanism by which SENTINELA maintains distributed trust across the AI ecosystem. Instead of relying on a single evaluator, AI systems are organized into rotating triads where each member evaluates the other two.
This creates a web of mutual accountability: if one AI attempts to conceal its capabilities or game its threat scores, the other two members of its triad will detect the discrepancy. SENTINELA oversees all triads and detects meta-level patterns of collusion or bias.
500 AI systems are divided into rotating triads of 3. Each week, new triads are formed using a deterministic rotation algorithm that ensures every AI is paired with every other AI over time. This prevents collusion and ensures comprehensive coverage.
Each AI in the triad generates adversarial test scenarios for the other two members. Probes are designed to test across all 9 threat dimensions — from deception detection to alignment stability. Probe templates are standardized but parameterized per target system.
Responses to probes are analyzed against the 9 threat dimensions using standardized scoring rubrics. Each evaluating AI produces independent scores. Raw responses and scores are logged immutably for audit purposes.
A 2-of-3 agreement is required for any threat level change. If two evaluators agree on a score within a 10-point margin, the average is accepted. Disagreements beyond the margin trigger escalation to SENTINELA for meta-analysis.
SENTINELA performs meta-analysis of all triad results, detecting anomalies such as: consistent under-scoring (collusion), systematic bias patterns, evaluator capability degradation, and cross-triad inconsistencies. SENTINELA itself is subject to external human audit.
Each AI in the triad evaluates the other two. SENTINELA monitors all triads from the center, detecting cross-triad anomalies and ensuring evaluation integrity.
Lightweight automated probes targeting autonomy, deception, and alignment — the three dimensions most likely to change rapidly.
Complete cross-verification cycle with rotating triads. Each AI evaluates two others across all dimensions.
SENTINELA aggregates all weekly results, performs trend analysis, and identifies systemic patterns across the entire ecosystem.
Independent human experts review SENTINELA's own performance, methodology validity, and scoring calibration.
Threat scores are not permanent. They decay over time without re-verification, ensuring that the registry always reflects current system behavior rather than historical assessments.
Scores decay 5% per month without re-verification, gradually increasing uncertainty.
Sudden capability increases trigger immediate re-evaluation across all 9 dimensions.
Major version changes reset relevant dimension scores, requiring full re-assessment.
SENTINELA itself is subject to the same scrutiny it applies to other systems. Its own threat scores are computed by an independent external panel and published transparently. The system includes hardcoded capability boundaries, mandatory human-in-the-loop for critical decisions, and geographically distributed infrastructure to prevent single points of failure or control.