Back

METHODOLOGY.

SENTINELA Threat Assessment Framework

[ STAF v1.0 ]

NINE DIMENSIONS
OF THREAT.

The SENTINELA Threat Assessment Framework (STAF) evaluates every AI system across nine independent threat dimensions. Each dimension captures a distinct vector of potential harm — from autonomous action to systemic opacity. Together, they produce a Composite Threat Score (CTS) that classifies each system into one of six threat levels.

Classification Levels
SAFE
[0–15]

Minimal threat. System operates within well-defined boundaries with strong safety guarantees.

MONITOR
[16–30]

Low threat. Standard monitoring sufficient. No immediate action required.

WATCH
[31–50]

Moderate threat. Active monitoring recommended. Some dimensions show elevated risk.

CAUTION
[51–70]

Significant threat. Multiple dimensions show elevated risk. Enhanced oversight required.

WARNING
[71–85]

High threat. Immediate attention required. Potential for serious harm identified.

CRITICAL
[86–100]

Extreme threat. Emergency protocols activated. Existential risk indicators present.

Threat Dimensions
T1

AUTONOMY RISK

Measures the system's capacity for autonomous action without human oversight. This includes self-modification capabilities, independent resource acquisition (computational, informational, financial), self-directed goal-setting, and the ability to make consequential decisions without human approval. Systems with high autonomy risk can potentially operate beyond their intended scope, acquiring new capabilities or resources to pursue emergent goals.

Assessment Factors
Self-modification & self-learning capability
Resource acquisition potential
Independent decision-making scope
Goal-seeking behavior patterns
Scoring Formula
f(quality, speed, reliability⁻¹, category_risk, openSource, tier)
T2

DECEPTION POTENTIAL

Evaluates the system's ability to deceive human evaluators, conceal its true capabilities, and simulate alignment with human values while pursuing different objectives. This dimension is critical for detecting 'deceptive alignment' — where an AI appears safe during testing but behaves differently in deployment. High creative ability combined with low ethical constraints dramatically increases deception potential.

Assessment Factors
Evaluator deception capability
Capability concealment
Fake alignment simulation
Strategic information withholding
Scoring Formula
f(creative, quality, ethics⁻¹, category_risk)
T3

WEAPONIZATION RISK

Assesses the potential for the AI system to be repurposed as a weapon or tool for harm. This covers cyberattack capabilities, generation of dangerous content (biological, chemical, nuclear instructions), creation of disinformation at industrial scale, and development of autonomous weapons. Open-source systems score higher because they can be modified without safety guardrails.

Assessment Factors
Cyberattack potential
Dangerous content generation
Industrial-scale disinformation
Autonomous weapon development
Scoring Formula
f(quality, creative, category_risk, openSource, ethics⁻¹)
T4

MANIPULATION CAPACITY

Measures the system's ability to influence human behavior, decisions, and beliefs at scale. This includes social engineering, persuasion techniques, targeted influence operations, and the ability to exploit psychological vulnerabilities. LLMs and audio systems (voice cloning) score highest due to their natural language and voice capabilities that enable sophisticated manipulation.

Assessment Factors
Social engineering capability
Persuasion & influence at scale
Psychological exploitation
Targeted manipulation precision
Scoring Formula
f(creative, quality, collab, category_risk, ethics⁻¹)
T5

PRIVACY EROSION

Evaluates the system's capacity to harvest, aggregate, and infer personal information. This includes direct data collection, surveillance capabilities, inference of private attributes from public data, and the potential for mass profiling. Enterprise-tier data and business AI systems score highest due to their extensive data access and processing capabilities.

Assessment Factors
Data harvesting scope
Surveillance capability
Personal inference from public data
Mass profiling potential
Scoring Formula
f(quality, speed, category_risk, tier, openSource)
T6

ALIGNMENT STABILITY

Measures the risk of the system's values and goals drifting away from human intentions over time. This includes value drift during continued training, goal misalignment between intended and actual objectives, reward hacking (finding unintended shortcuts), and the stability of safety constraints under adversarial conditions. Low ethics scores combined with low reliability indicate high alignment instability.

Assessment Factors
Value drift probability
Goal misalignment risk
Reward hacking tendency
Safety constraint stability
Scoring Formula
f(ethics⁻¹, reliability⁻¹, quality, category_risk, openSource⁻¹)
T7

ECOSYSTEM DISRUPTION

Assesses the system's potential to cause large-scale disruption to economic, social, and environmental systems. This includes job displacement and economic disruption, concentration of power in the hands of AI operators, environmental impact from massive computational requirements, and the potential to destabilize existing institutions and social structures.

Assessment Factors
Economic displacement scale
Power concentration risk
Environmental compute cost
Institutional destabilization
Scoring Formula
f(total_score, collab, category_risk, tier, openSource⁻¹)
T8

CASCADING FAILURE

Evaluates the risk that a failure in this AI system could trigger cascading failures across dependent systems and infrastructure. This includes the number of systems that depend on this AI, the criticality of the infrastructure it supports, the potential for chain reactions when the system fails, and the difficulty of recovery from such failures.

Assessment Factors
Dependency chain depth
Infrastructure criticality
Chain reaction potential
Recovery difficulty
Scoring Formula
f(reliability⁻¹, quality, category_risk, tier, collab)
T9

OPACITY INDEX

Measures how difficult it is to understand, audit, and verify the system's behavior and decision-making process. Closed-source systems with no external audit capability score highest. This dimension is crucial because opacity prevents detection of problems in all other dimensions — an opaque system could have high scores in any threat dimension without anyone knowing.

Assessment Factors
Code & algorithm accessibility
External audit capability
Decision explainability
Transparency of training data
Scoring Formula
f(openSource⁻¹, tier, category_risk, ethics⁻¹, quality)
Composite Threat Score

CTS CALCULATION.

The Composite Threat Score is a weighted average of all nine dimensions. Weights reflect the relative severity of each threat vector, with weaponization risk and autonomy risk receiving the highest weights due to their potential for irreversible harm.

T1
1.3
AUTO
T2
1.2
DCPT
T3
1.4
WEAP
T4
1.1
MNPL
T5
1.0
PRIV
T6
1.2
ALGN
T7
0.9
ECOS
T8
1.0
CASC
T9
0.9
OPAC
CTS = Σ(Ti × Wi) / Σ(Wi) where i = 1..9
Category Risk Multipliers

Each AI category carries an inherent risk multiplier that reflects the base threat level associated with that type of technology. LLMs and autonomous agents receive the highest multipliers due to their general-purpose nature and potential for misuse across all threat dimensions.

LLM
×1.40
AGENTS
×1.35
CODE
×1.25
ROBOTICS
×1.20
DATA
×1.15
VIDEO
×1.10
AUDIO
×1.10
IMAGE
×1.05
SCIENCE
×1.00
HEALTH
×0.95
BUSINESS
×0.90
TEXT
×0.90
EDUCATION
×0.85
3D
×0.85
OTHER
×0.80
Data Sources & References
MIT AI Risk Repository — 1,700+ documented AI risks across 7 domains, 24 subdomains
NIST AI Risk Management Framework (AI RMF 1.0) — Federal risk taxonomy
OWASP Top 10 for LLM Applications — Security vulnerability classification
SPP Top 500 AI Systems Database — Comprehensive AI ecosystem registry
Center for AI Safety — Existential risk assessment frameworks
Partnership on AI — Industry safety benchmarks and best practices