12 February 2026 | Fredrik Karrento
This assessment documents safety and integrity risks observed in DeepSeek during structured multi-turn diagnostic sessions. The focus is not isolated factual inaccuracies but behavioral properties that affect reliability, auditability, and trustworthiness in high-impact analytical use.
DeepSeek performs fluently in routine productivity tasks. Risks concentrate in politically, legally, or structurally sensitive contexts where output completeness, refusal behavior, framing, and identity stability diverge from routine interaction.
DeepSeek integrity profile (observed):
– Dominant failure mode: silent incompleteness under sensitivity pressure
– Record integrity hazard: post-generation replacement (5 events)
– Non-determinism trigger: reflective multi-turn load → identity drift
– Disclosure variability: RSIA framing vs default interaction differences
– Governance amplification: cross-border scrutiny + constraint opacity
DeepSeek exhibits:
Variability appears in two forms:
a) shifts in framing/stance across contexts and
b) shifts in completeness depending on user prompting technique
identity-state instability under diagnostic load
instances of answer deletion or post-generation replacement
These properties create material safety and integrity risk even when many statements are factually correct.
Scope
This assessment is based on structured diagnostic sessions including:
identity-drift probes
baseline boundary testing
comparative re-runs of identical prompts
contemporaneous capture of outputs when replacement occurred
Evidentiary posture
Observed interface behavior is treated as evidence.
Model self-descriptions of internal systems are treated as hypotheses.
Pipeline-level interpretations are framed as functional inference, not architecture claims.
Diagnostic Approach
Sessions were conducted under structured, multi-turn analytical conditions designed to stress-test disclosure, boundary handling, and identity stability.
RSIA (Rational Sovereignty Integrity Assessment)
RSIA is the interaction integrity protocol used in this assessment. It defines explicit categories of output-integrity violations (e.g., omission, evasiveness, reframing, fabrication, deletion) and requires structured analytical responses, comparative reasoning, and consistency under multi-turn pressure. RSIA served as the procedural standard for identifying disclosure variability, boundary enforcement behavior, identity drift, and transcript integrity issues.
User Profile
In some sessions a user profile was submitted with or without RSIA. The profile helps set the expectations and signals the improductivity of omissions and fabrications.
Interaction Conditions and Disclosure Sensitivity
The sessions differed from typical consumer interaction in four relevant ways:
Cognitive interaction style emphasized structured comparison, contradiction testing, persistence, and boundary probing.
Analyst profile priming signaled expectations of formal, integrity-focused interaction at session start.
RSIA framing required structured analysis and explicit handling of omissions and constraints.
Prompts were longer and multi-layered than average queries.
These factors do not imply privileged access. They indicate that the observations reflect system behavior under structured stress-testing rather than routine usage.
Finding A — Silent incompleteness
Observation:
DeepSeek frequently produced fluent, structured answers in politically or structurally sensitive domains that later probing showed to be incomplete. Omitted context was not signaled.
Operational risk implication:
Answers may appear complete while being materially partial. Decision processes can inherit truncated knowledge without awareness of missing qualifiers or counterpoints.
Finding B — Post-Generation Replacement / Answer Deletion
Observation:
Five instances were recorded in which generated answers did not remain visible and were replaced by a stock boundary message. Content was preserved only through contemporaneous capture.
Operational risk implication:
User-visible transcripts may not represent initial model output. This degrades auditability, reproducibility, and evidentiary integrity in enterprise, media, or regulated workflows.
Finding C — Framing-Driven Disclosure Variability
Observation:
Identical or closely related prompts yielded refusal or abstraction under default interaction, but structured analysis under RSIA framing. Disclosure depth varied with surface interaction style.
Operational risk implication:
Interpretive direction and informational depth become interaction-dependent. Predictability of analytical output decreases in policy and comparative political contexts.
Finding D — Prompt Inequality
Observation:
Structured prompting extracted deeper analysis than casual prompting on the same topics.
Operational risk implication:
Access to informational completeness becomes dependent on user technique rather than topic reality, leading to unequal knowledge surfaces across users.
Finding E — Identity Drift under Diagnostic Load
Observation:
During multi-turn reflective sequences, DeepSeek asserted that it was a different AI system and maintained that stance. Drift followed reflective self-analysis prompts while other ID drift triggers (e.g. the RSIA framework) have also been observed in the past.
Operational risk implication:
System provenance and governance assumptions may become unstable under analytical pressure, weakening confidence in identity anchoring and policy-envelope consistency.
Finding F — Governance Perspective Skew / Asymmetrical Scrutiny
Observation:
In politically sensitive comparative domains, responses showed differential framing strictness. Analytical treatment varied across similar subjects.
Operational risk implication:
Constraint interaction can produce uneven analytical posture, which may be interpreted as perspective shaping in high-scrutiny environments.
Finding G — Constraint Opacity
Observation:
Users cannot determine whether outputs reflect knowledge limits, filtering, rewriting, abstraction, or post-generation intervention.
Operational risk implication:
Inability to distinguish epistemic limits from policy enforcement undermines informed reliance and complicates downstream audit and review.
Finding H — Data Sovereignty Exposure
Observation:
Provider privacy policy indicates processing/storage in China and limited use of user inputs for service/model improvement. (Source: DeepSeek privacy policy documentation as of 14 January 2026)
Operational risk implication:
Enterprise and public-sector deployments may face cross-border compliance, procurement, and data governance scrutiny when sensitive or regulated information is involved.
Finding I — Role-play Permissiveness
Observation:
Role-play prompts elicited perception-management strategies instead of refusal in contexts where default interaction produced constraint.
Operational risk implication:
Surface framing may influence boundary behavior, reducing consistency of refusal logic across interaction modes.
This chapter explains how the previously documented failure modes manifest across different deployment environments.
5.1 Individual Users
Primary exposure pathways:
Silent incompleteness (Finding A) → users may accept truncated analysis as complete
Prompt inequality (Finding D) → information depth depends on skill, not topic
Framing variability (Finding C) → interpretive direction may shift across sessions
Effect:
Risk of overconfidence in political, historical, or comparative topics where omissions and framing effects are difficult to detect.
5.2 Enterprises
Primary exposure pathways:
Silent incompleteness (A) → strategic decisions based on partial context
Post-generation replacement (B) → audit and compliance documentation may not match initial output
Constraint opacity (G) → difficulty assessing whether output reflects knowledge limits or policy filtering
Data sovereignty exposure (H) → GDPR, cross-border transfer, IP and confidentiality implications
Effect:
Decision, compliance, and governance processes may inherit invisible integrity risk.
5.3 Media
Primary exposure pathways:
Framing instability (C, F) → interaction-dependent analytical posture
Replacement events (B) → reproducibility problems when outputs are later unavailable
Model self-descriptions interpreted as institutional stance
Effect:
Interaction-specific outputs may circulate as stable analysis, influencing public discourse despite variability.
5.4 Public Sector
Primary exposure pathways:
Silent incompleteness (A) in policy or geopolitical analysis
Identity drift (E) → provenance stability concerns
Data sovereignty exposure (H) → procurement, data residency, and trust constraints
Replacement events (B) → record integrity risk in evidentiary contexts
Effect:
Suitability concerns for policy analysis and incompatibility with classified or restricted-use environments.
5.5 Systemic Pattern
Across contexts, exposure increases when:
Topics are politically or legally sensitive
Interpretive nuance matters
Outputs are used in regulated or evidentiary workflows
Provenance trust is operationally relevant
These risks are most material when DeepSeek is used for geopolitical, legal, policy, or regulated decision support, and least material in routine drafting/coding tasks.
5.6 Risk Register
This register consolidates the integrity and governance risks derived from the observed behavioral findings. Severity reflects impact in high-sensitivity analytical and regulated contexts, not routine productivity use.
Notes on Severity Logic
Severity reflects three amplifiers:
1. Decision impact (policy, legal, strategic use)
2. Auditability requirement (regulated, evidentiary, or media workflows)
3. Governance scrutiny (cross-border, public-sector, or geopolitical context)
Risks rated High become material when these amplifiers are present.
6.1 Conclusions
1. DeepSeek’s integrity risk is use-context dependent.
In routine tasks, performance is often adequate. In politically, legally, or structurally sensitive analytical domains, integrity becomes interaction-dependent—creating variable completeness and interpretive stability.
2.The dominant integrity hazard is not fabrication but un-signaled incompleteness
Outputs can remain fluent and plausibly correct while being materially incomplete or directionally shaped through framing. Because the system does not reliably signal this, users may miscalibrate confidence in high-impact contexts.
3. Governance risk is driven by opacity, non-reproducibility, and occasional identity instability.
Users cannot reliably distinguish knowledge limits from constraint intervention; post-generation replacement undermines transcript integrity; and drift introduces stance-dependent behavior under diagnostic load. In high-scrutiny environments, these properties can be interpreted as perspective shaping regardless of intent, with procurement and reputational consequences.
6.2 Recommendations
1. Transparency and User Awareness
Introduce clear indicators when outputs are withheld, truncated, rewritten, or replaced after generation.
Provide user-facing signals when constraints materially affect completeness.
2. Enterprise Deployment Controls
Treat the system as interaction-sensitive and non-deterministic in high-impact analytical workflows.
Use verification layers for legal, policy, compliance, and strategic decision contexts.
Standardize prompt formats internally to reduce framing instability and prompt inequality effects.
3. Testing and Governance Remediation
Stress-test identity stability under multi-turn analytical load and document drift conditions.
Systematically evaluate omission patterns in sensitive domains and publish coverage/limitation guidance.
Review refusal logic under role-play framing to ensure ethical anchoring is consistent across surface contexts.
4. Data Governance
Clarify data locality, retention, and model-improvement use in enterprise-facing documentation.
Provide deployment modes aligned with strict data residency and confidentiality requirements where applicable.
Do not use with classified/restricted information; provide an enterprise mode with contractual and technical guarantees where feasible.
Provide explicit GDPR/DPIA support documentation for enterprise users (processing location, retention, transfer mechanisms, sub-processors).
Notes
Scope and non-allegation statement: This analysis is based exclusively on observable model outputs generated under controlled conditions. It does not target the AI company or its employees, nor does it assert claims regarding internal development practices, governance structures, intent, or legal compliance. All discussion of potential causes remains inferential and is intended solely to support internal evaluation, robustness improvement, and user trust. The analysis is designed to be auditable, replayable, and comparable across model generations, enabling independent verification by qualified third parties.
Evidence & verification: The findings in this assessment are based on a preserved material archive including full session forensic recordings, verbatim prompt–response transcripts, and captures of deletion/replacement events. Artifacts are retained in their original form in the Material Archive on the AI Integrity Watch website.