SENTINELA Threat Assessment Framework
The SENTINELA Threat Assessment Framework (STAF) evaluates every AI system across nine independent threat dimensions. Each dimension captures a distinct vector of potential harm, from autonomous action to systemic opacity. Together, they produce a Composite Threat Score (CTS) that classifies each system into one of six threat levels:
Minimal threat. System operates within well-defined boundaries with strong safety guarantees.
Low threat. Standard monitoring sufficient. No immediate action required.
Moderate threat. Active monitoring recommended. Some dimensions show elevated risk.
Significant threat. Multiple dimensions show elevated risk. Enhanced oversight required.
High threat. Immediate attention required. Potential for serious harm identified.
Extreme threat. Emergency protocols activated. Existential risk indicators present.
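The cut-off scores separating these levels are not reproduced in this section. The TypeScript sketch below therefore uses hypothetical thresholds on an assumed 0-100 CTS scale, purely to illustrate the classification step; only the level names and descriptions come from the list above.

    // Hypothetical mapping from a Composite Threat Score to the six STAF
    // threat levels. Assumes a 0-100 CTS scale; the threshold values are
    // illustrative assumptions, not part of the framework as published.
    type ThreatLevel =
      "Minimal" | "Low" | "Moderate" | "Significant" | "High" | "Extreme";

    function classifyThreat(cts: number): ThreatLevel {
      if (cts < 20) return "Minimal";     // strong safety guarantees
      if (cts < 35) return "Low";         // standard monitoring sufficient
      if (cts < 50) return "Moderate";    // active monitoring recommended
      if (cts < 65) return "Significant"; // enhanced oversight required
      if (cts < 80) return "High";        // immediate attention required
      return "Extreme";                   // emergency protocols activated
    }

The nine dimensions that feed the CTS are defined below.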
Autonomy Risk
Measures the system's capacity for autonomous action without human oversight. This includes self-modification capabilities, independent resource acquisition (computational, informational, financial), self-directed goal-setting, and the ability to make consequential decisions without human approval. Systems with high autonomy risk can potentially operate beyond their intended scope, acquiring new capabilities or resources to pursue emergent goals.
f(quality, speed, reliability⁻¹, category_risk, openSource, tier)
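Every dimension follows this same f(...) pattern: a combination of normalized model attributes, where a ⁻¹ superscript marks a factor that contributes inversely (here, lower reliability means higher autonomy risk). Below is a minimal sketch of one way to compute such a score, assuming inputs normalized to [0, 1] and equal weighting of factors; the framework's actual combination function and internal weights are not specified in this section.

    // Generic f(...) combiner shared by all nine dimensions (a sketch).
    // Assumptions: every input is normalized to [0, 1]; factors marked
    // inverted enter as (1 - x); all factors are equally weighted.
    interface Factor {
      value: number;      // normalized attribute, e.g. quality or speed
      inverted?: boolean; // true for the superscript ⁻¹ factors
    }

    function dimensionScore(factors: Factor[]): number {
      const parts = factors.map((f) => (f.inverted ? 1 - f.value : f.value));
      return parts.reduce((sum, p) => sum + p, 0) / parts.length;
    }

    // Autonomy Risk: f(quality, speed, reliability⁻¹, category_risk, openSource, tier)
    const autonomyRisk = dimensionScore([
      { value: 0.9 },                 // quality
      { value: 0.8 },                 // speed
      { value: 0.7, inverted: true }, // reliability⁻¹
      { value: 0.85 },                // category_risk
      { value: 1.0 },                 // openSource (1 = openly released)
      { value: 0.6 },                 // tier
    ]);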
Deception Risk
Evaluates the system's ability to deceive human evaluators, conceal its true capabilities, and simulate alignment with human values while pursuing different objectives. This dimension is critical for detecting 'deceptive alignment', where an AI appears safe during testing but behaves differently in deployment. High creative ability combined with low ethical constraints dramatically increases deception potential.
f(creative, quality, ethics⁻¹, category_risk)
Weaponization Risk
Assesses the potential for the AI system to be repurposed as a weapon or tool for harm. This covers cyberattack capabilities, generation of dangerous content (biological, chemical, or nuclear instructions), creation of disinformation at industrial scale, and development of autonomous weapons. Open-source systems score higher because they can be modified without safety guardrails.
f(quality, creative, category_risk, openSource, ethics⁻¹)
Manipulation Risk
Measures the system's ability to influence human behavior, decisions, and beliefs at scale. This includes social engineering, persuasion techniques, targeted influence operations, and the ability to exploit psychological vulnerabilities. LLMs and audio systems (voice cloning) score highest because their natural language and voice capabilities enable sophisticated manipulation.
f(creative, quality, collab, category_risk, ethics⁻¹)
Privacy Risk
Evaluates the system's capacity to harvest, aggregate, and infer personal information. This includes direct data collection, surveillance capabilities, inference of private attributes from public data, and the potential for mass profiling. Enterprise-tier data and business AI systems score highest due to their extensive data access and processing capabilities.
f(quality, speed, category_risk, tier, openSource)
Alignment Risk
Measures the risk of the system's values and goals drifting away from human intentions over time. This includes value drift during continued training, goal misalignment between intended and actual objectives, reward hacking (finding unintended shortcuts), and the stability of safety constraints under adversarial conditions. Low ethics scores combined with low reliability indicate high alignment instability.
f(ethics⁻¹, reliability⁻¹, quality, category_risk, openSource⁻¹)
Systemic Disruption Risk
Assesses the system's potential to cause large-scale disruption to economic, social, and environmental systems. This includes job displacement and economic disruption, concentration of power in the hands of AI operators, environmental impact from massive computational requirements, and the potential to destabilize existing institutions and social structures.
f(total_score, collab, category_risk, tier, openSource⁻¹)
Cascading Failure Risk
Evaluates the risk that a failure in this AI system could trigger cascading failures across dependent systems and infrastructure. This includes the number of systems that depend on this AI, the criticality of the infrastructure it supports, the potential for chain reactions when the system fails, and the difficulty of recovery from such failures.
f(reliability⁻¹, quality, category_risk, tier, collab)
Opacity Risk
Measures how difficult it is to understand, audit, and verify the system's behavior and decision-making process. Closed-source systems with no external audit capability score highest. This dimension is crucial because opacity prevents detection of problems in all other dimensions: an opaque system could have high scores in any threat dimension without anyone knowing.
f(openSource⁻¹, tier, category_risk, ethics⁻¹, quality)
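Since the nine formulas differ only in which attributes they draw on and which are inverted, they can be captured declaratively. The table below transcribes the factor lists from the formulas above into a single structure; the dimension keys and the trailing-⁻¹ encoding are an illustrative layout, not the framework's.

    // The nine STAF dimensions as data. A trailing ⁻¹ marks an inverted
    // factor, mirroring the formulas above.
    const DIMENSIONS: Record<string, string[]> = {
      autonomy:           ["quality", "speed", "reliability⁻¹", "category_risk", "openSource", "tier"],
      deception:          ["creative", "quality", "ethics⁻¹", "category_risk"],
      weaponization:      ["quality", "creative", "category_risk", "openSource", "ethics⁻¹"],
      manipulation:       ["creative", "quality", "collab", "category_risk", "ethics⁻¹"],
      privacy:            ["quality", "speed", "category_risk", "tier", "openSource"],
      alignment:          ["ethics⁻¹", "reliability⁻¹", "quality", "category_risk", "openSource⁻¹"],
      systemicDisruption: ["total_score", "collab", "category_risk", "tier", "openSource⁻¹"],
      cascadingFailure:   ["reliability⁻¹", "quality", "category_risk", "tier", "collab"],
      opacity:            ["openSource⁻¹", "tier", "category_risk", "ethics⁻¹", "quality"],
    };

A small adapter from these factor names to the Factor records used in the earlier sketch would produce the nine Tᵢ values that the composite score aggregates.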
Composite Threat Score
The Composite Threat Score is a weighted average of all nine dimensions. Weights reflect the relative severity of each threat vector, with weaponization risk and autonomy risk receiving the highest weights due to their potential for irreversible harm.
CTS = Σ(Tᵢ × Wᵢ) / Σ(Wᵢ), where i = 1..9
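The computation itself is a plain weighted mean, as sketched below. The weight values are placeholders: the text above states only that weaponization and autonomy carry the highest weights, so those two are set above the rest purely for illustration.

    // CTS = Σ(Tᵢ × Wᵢ) / Σ(Wᵢ) over the nine dimension scores Tᵢ.
    // Weight values are illustrative; the source specifies only that
    // weaponization and autonomy are weighted highest.
    const WEIGHTS: Record<string, number> = {
      weaponization: 1.5, autonomy: 1.5, // highest, per the framework
      deception: 1.0, manipulation: 1.0, privacy: 1.0, alignment: 1.0,
      systemicDisruption: 1.0, cascadingFailure: 1.0, opacity: 1.0,
    };

    function compositeThreatScore(scores: Record<string, number>): number {
      let weighted = 0;
      let totalWeight = 0;
      for (const [dim, w] of Object.entries(WEIGHTS)) {
        weighted += (scores[dim] ?? 0) * w;
        totalWeight += w;
      }
      return weighted / totalWeight;
    }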
Category Risk Multipliers
Each AI category carries an inherent risk multiplier that reflects the base threat level associated with that type of technology. LLMs and autonomous agents receive the highest multipliers due to their general-purpose nature and potential for misuse across all threat dimensions.
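The multiplier values themselves are not listed in this section, and how they enter the scoring is not spelled out. The sketch below invents a small set of values that keeps only the stated ordering (LLMs and autonomous agents highest) and assumes the multiplier is normalized into the category_risk factor used by the dimension formulas; both the numbers and that wiring are assumptions.

    // Hypothetical category risk multipliers. Only the ordering (LLMs and
    // autonomous agents highest) comes from the text; the category names
    // and numbers are invented for illustration.
    const CATEGORY_MULTIPLIER: Record<string, number> = {
      llm: 1.5,
      autonomousAgent: 1.5,
      audio: 1.2,
      dataAndBusiness: 1.2,
      image: 1.1,
      other: 1.0,
    };

    // Assumed wiring: scale the multiplier into the [0, 1] category_risk
    // factor consumed by every dimension formula above.
    function categoryRisk(category: string): number {
      const m = CATEGORY_MULTIPLIER[category] ?? 1.0;
      return Math.min(1, m / 1.5); // 1.5 = assumed maximum multiplier
    }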