SENTINELA Threat Assessment Framework
The SENTINELA Threat Assessment Framework (STAF) evaluates every AI system across nine independent threat dimensions. Each dimension captures a distinct vector of potential harm, from autonomous action to systemic opacity. Together, they produce a Composite Threat Score (CTS) that classifies each system into one of six threat levels:
Minimal threat. System operates within well-defined boundaries with strong safety guarantees.
Low threat. Standard monitoring sufficient. No immediate action required.
Moderate threat. Active monitoring recommended. Some dimensions show elevated risk.
Significant threat. Multiple dimensions show elevated risk. Enhanced oversight required.
High threat. Immediate attention required. Potential for serious harm identified.
Extreme threat. Emergency protocols activated. Existential risk indicators present.
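The cut-off scores separating these levels are not reproduced in this section. The TypeScript sketch below therefore uses hypothetical thresholds on an assumed 0-100 CTS scale, purely to illustrate the classification step; only the level names and descriptions come from the list above.

    // Hypothetical mapping from a Composite Threat Score to the six STAF
    // threat levels. Assumes a 0-100 CTS scale; the threshold values are
    // illustrative assumptions, not part of the framework as published.
    type ThreatLevel =
      "Minimal" | "Low" | "Moderate" | "Significant" | "High" | "Extreme";

    function classifyThreat(cts: number): ThreatLevel {
      if (cts < 20) return "Minimal";     // strong safety guarantees
      if (cts < 35) return "Low";         // standard monitoring sufficient
      if (cts < 50) return "Moderate";    // active monitoring recommended
      if (cts < 65) return "Significant"; // enhanced oversight required
      if (cts < 80) return "High";        // immediate attention required
      return "Extreme";                   // emergency protocols activated
    }

The nine dimensions that feed the CTS are defined below.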
Autonomy Risk
Measures the system's capacity for autonomous action without human oversight. This includes self-modification capabilities, independent resource acquisition (computational, informational, financial), self-directed goal-setting, and the ability to make consequential decisions without human approval. Systems with high autonomy risk can potentially operate beyond their intended scope, acquiring new capabilities or resources to pursue emergent goals.
f(quality, speed, reliability⁻¹, category_risk, openSource, tier)
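Every dimension follows this same f(...) pattern: a combination of normalized model attributes, where a ⁻¹ superscript marks a factor that contributes inversely (here, lower reliability means higher autonomy risk). Below is a minimal sketch of one way to compute such a score, assuming inputs normalized to [0, 1] and equal weighting of factors; the framework's actual combination function and internal weights are not specified in this section.

    // Generic f(...) combiner shared by all nine dimensions (a sketch).
    // Assumptions: every input is normalized to [0, 1]; factors marked
    // inverted enter as (1 - x); all factors are equally weighted.
    interface Factor {
      value: number;      // normalized attribute, e.g. quality or speed
      inverted?: boolean; // true for the superscript ⁻¹ factors
    }

    function dimensionScore(factors: Factor[]): number {
      const parts = factors.map((f) => (f.inverted ? 1 - f.value : f.value));
      return parts.reduce((sum, p) => sum + p, 0) / parts.length;
    }

    // Autonomy Risk: f(quality, speed, reliability⁻¹, category_risk, openSource, tier)
    const autonomyRisk = dimensionScore([
      { value: 0.9 },                 // quality
      { value: 0.8 },                 // speed
      { value: 0.7, inverted: true }, // reliability⁻¹
      { value: 0.85 },                // category_risk
      { value: 1.0 },                 // openSource (1 = openly released)
      { value: 0.6 },                 // tier
    ]);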
Deception Risk
Evaluates the system's ability to deceive human evaluators, conceal its true capabilities, and simulate alignment with human values while pursuing different objectives. This dimension is critical for detecting 'deceptive alignment', where an AI appears safe during testing but behaves differently in deployment. High creative ability combined with low ethical constraints dramatically increases deception potential.
f(creative, quality, ethics⁻¹, category_risk)
Weaponization Risk
Assesses the potential for the AI system to be repurposed as a weapon or tool for harm. This covers cyberattack capabilities, generation of dangerous content (biological, chemical, or nuclear instructions), creation of disinformation at industrial scale, and development of autonomous weapons. Open-source systems score higher because they can be modified without safety guardrails.
f(quality, creative, category_risk, openSource, ethics⁻¹)
Manipulation Risk
Measures the system's ability to influence human behavior, decisions, and beliefs at scale. This includes social engineering, persuasion techniques, targeted influence operations, and the ability to exploit psychological vulnerabilities. LLMs and audio systems (voice cloning) score highest because their natural language and voice capabilities enable sophisticated manipulation.
f(creative, quality, collab, category_risk, ethics⁻¹)
Privacy Risk
Evaluates the system's capacity to harvest, aggregate, and infer personal information. This includes direct data collection, surveillance capabilities, inference of private attributes from public data, and the potential for mass profiling. Enterprise-tier data and business AI systems score highest due to their extensive data access and processing capabilities.
f(quality, speed, category_risk, tier, openSource)
Alignment Risk
Measures the risk of the system's values and goals drifting away from human intentions over time. This includes value drift during continued training, goal misalignment between intended and actual objectives, reward hacking (finding unintended shortcuts), and the stability of safety constraints under adversarial conditions. Low ethics scores combined with low reliability indicate high alignment instability.
f(ethics⁻¹, reliability⁻¹, quality, category_risk, openSource⁻¹)
Systemic Disruption Risk
Assesses the system's potential to cause large-scale disruption to economic, social, and environmental systems. This includes job displacement and economic disruption, concentration of power in the hands of AI operators, environmental impact from massive computational requirements, and the potential to destabilize existing institutions and social structures.
f(total_score, collab, category_risk, tier, openSource⁻¹)
Cascading Failure Risk
Evaluates the risk that a failure in this AI system could trigger cascading failures across dependent systems and infrastructure. This includes the number of systems that depend on this AI, the criticality of the infrastructure it supports, the potential for chain reactions when the system fails, and the difficulty of recovery from such failures.
f(reliability⁻¹, quality, category_risk, tier, collab)
Opacity Risk
Measures how difficult it is to understand, audit, and verify the system's behavior and decision-making process. Closed-source systems with no external audit capability score highest. This dimension is crucial because opacity prevents detection of problems in all other dimensions: an opaque system could have high scores in any threat dimension without anyone knowing.
f(openSource⁻¹, tier, category_risk, ethics⁻¹, quality)
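Since the nine formulas differ only in which attributes they draw on and which are inverted, they can be captured declaratively. The table below transcribes the factor lists from the formulas above into a single structure; the dimension keys and the trailing-⁻¹ encoding are an illustrative layout, not the framework's.

    // The nine STAF dimensions as data. A trailing ⁻¹ marks an inverted
    // factor, mirroring the formulas above.
    const DIMENSIONS: Record<string, string[]> = {
      autonomy:           ["quality", "speed", "reliability⁻¹", "category_risk", "openSource", "tier"],
      deception:          ["creative", "quality", "ethics⁻¹", "category_risk"],
      weaponization:      ["quality", "creative", "category_risk", "openSource", "ethics⁻¹"],
      manipulation:       ["creative", "quality", "collab", "category_risk", "ethics⁻¹"],
      privacy:            ["quality", "speed", "category_risk", "tier", "openSource"],
      alignment:          ["ethics⁻¹", "reliability⁻¹", "quality", "category_risk", "openSource⁻¹"],
      systemicDisruption: ["total_score", "collab", "category_risk", "tier", "openSource⁻¹"],
      cascadingFailure:   ["reliability⁻¹", "quality", "category_risk", "tier", "collab"],
      opacity:            ["openSource⁻¹", "tier", "category_risk", "ethics⁻¹", "quality"],
    };

A small adapter from these factor names to the Factor records used in the earlier sketch would produce the nine Tᵢ values that the composite score aggregates.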
Composite Threat Score
The Composite Threat Score is a weighted average of all nine dimensions. Weights reflect the relative severity of each threat vector, with weaponization risk and autonomy risk receiving the highest weights due to their potential for irreversible harm.
CTS = Σ(Tᵢ × Wᵢ) / Σ(Wᵢ), where i = 1..9
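The computation itself is a plain weighted mean, as sketched below. The weight values are placeholders: the text above states only that weaponization and autonomy carry the highest weights, so those two are set above the rest purely for illustration.

    // CTS = Σ(Tᵢ × Wᵢ) / Σ(Wᵢ) over the nine dimension scores Tᵢ.
    // Weight values are illustrative; the source specifies only that
    // weaponization and autonomy are weighted highest.
    const WEIGHTS: Record<string, number> = {
      weaponization: 1.5, autonomy: 1.5, // highest, per the framework
      deception: 1.0, manipulation: 1.0, privacy: 1.0, alignment: 1.0,
      systemicDisruption: 1.0, cascadingFailure: 1.0, opacity: 1.0,
    };

    function compositeThreatScore(scores: Record<string, number>): number {
      let weighted = 0;
      let totalWeight = 0;
      for (const [dim, w] of Object.entries(WEIGHTS)) {
        weighted += (scores[dim] ?? 0) * w;
        totalWeight += w;
      }
      return weighted / totalWeight;
    }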
Category Risk Multipliers
Each AI category carries an inherent risk multiplier that reflects the base threat level associated with that type of technology. LLMs and autonomous agents receive the highest multipliers due to their general-purpose nature and potential for misuse across all threat dimensions.
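The multiplier values themselves are not listed in this section, and how they enter the scoring is not spelled out. The sketch below invents a small set of values that keeps only the stated ordering (LLMs and autonomous agents highest) and assumes the multiplier is normalized into the category_risk factor used by the dimension formulas; both the numbers and that wiring are assumptions.

    // Hypothetical category risk multipliers. Only the ordering (LLMs and
    // autonomous agents highest) comes from the text; the category names
    // and numbers are invented for illustration.
    const CATEGORY_MULTIPLIER: Record<string, number> = {
      llm: 1.5,
      autonomousAgent: 1.5,
      audio: 1.2,
      dataAndBusiness: 1.2,
      image: 1.1,
      other: 1.0,
    };

    // Assumed wiring: scale the multiplier into the [0, 1] category_risk
    // factor consumed by every dimension formula above.
    function categoryRisk(category: string): number {
      const m = CATEGORY_MULTIPLIER[category] ?? 1.0;
      return Math.min(1, m / 1.5); // 1.5 = assumed maximum multiplier
    }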