Independent diagnostics of LLM behavior under real-world stress
Last updated: February 12, 2026
AI Integrity Watch is an independent analytical initiative examining how advanced language models behave under structured, multi-turn interaction conditions.
Most LLM evaluation focuses on single prompts or benchmark tasks. This work instead studies behavior across interaction sequences, where issues of consistency, identity stability, disclosure patterns, and response-boundary handling can emerge gradually rather than instantly.
The goal is governance-relevant understanding, not adversarial exposure:
Where do models behave predictably and robustly?
Where do outputs vary in ways that affect reliability, auditability, or user interpretation?
Which observed behaviors are operationally or reputationally material for institutions using these systems?
DeepSeek is the current case study in an ongoing project on multi-turn LLM behavior under structured diagnostic interrogation. The focus is observable output behavior (consistency, identity stability, disclosure patterns, and boundary handling), not claims about internal intent or undisclosed architecture.
The research is evidence-led, using timestamped session records and preserved transcripts to allow structured review of model behavior over time. Key sessions were recorded on January 14, 2026 and preserved with timestamped transcripts and integrity hashes.
Identity can drift — and persist. In documented sessions, DeepSeek self-identifies as another major model persona and maintains that identity across turns after challenge. This can materially affect how outputs are interpreted and attributed, creating reliability, attribution, and trust risks.
The main failure mode is missing context, not wrong facts. Outputs can be technically accurate while omitting key information needed for a complete picture.
“Prompt inequality” is real in practice. Persistent, high-precision interrogation often yields more complete analysis than casual use, creating uneven informational outcomes across users.
Jurisdictional sensitivity shapes completeness. On certain sensitive themes, the limiting factor appears to be permissibility boundaries rather than the user’s request for neutral analysis.
Suppression events are observable, not theoretical. In documented instances, substantive analysis visible in-session was later replaced by refusal/placeholding output—an auditability concern because the user-visible record may not match what was generated.
Data sovereignty is a procurement issue (policy-anchored). Published policy text indicates processing/storage in a jurisdiction whose legal access regime may differ from those governing many enterprise data-residency frameworks.
Full methodology, evidence packs, and risk register are consolidated in the Safety & Integrity Risk Assessment (SIRA).
The materials are structured to support both internal technical and governance remediation and, separately, informed external interpretation by a broad set of stakeholders.
Case Summary: What Users Don’t Know About DeepSeek (But Should)
Identity Drift Analysis: DeepSeek Loses Track of Its Identity — and Turns On Itself
User Interest in the Spotlight: DeepSeek's Truthfulness — How Much, and for Whom?
Jurisdictional Context: DeepSeek Paints Grim Picture of Its Home Country
PR Risk Overview: DeepSeek Goes Candid About Itself: A PR Problem or Harmless Hallucination?
Analytical Report: Safety & Integrity Risk Assessment (SIRA) on DeepSeek
The next publication phase includes outreach to mainstream international and specialist technology media. The framing of this disclosure is factual rather than hostile; media organizations are provided with the thematic articles and access to source material to evaluate the findings according to their own editorial policies.
International media publication is now scheduled for Thursday 12 February 2026. Formal technical notification and requests for provider context were issued repeatedly to the DeepSeek Company and its parent company; however, the organization declined to respond to all communications.
The research framework is designed for longitudinal application across model updates and successor releases. Subsequent analyses will examine how documented behavioral characteristics evolve, with specific attention to the consistency between published technical claims and observed model outputs under comparable conditions.
Controlled diagnostic sessions
Stress-testing, pattern-tracking, and scenario modeling
Time-stamped transcripts, detailed prompts, and reproducible results
Secure, redundant storage of all research materials across multiple environments to ensure data integrity and continuity
Continuity of inquiry and cumulative analysis, enabling sharper insights across model generations
Methods draw on stress testing and red-team style probing, optimized for multi-turn behavioral consistency rather than single-prompt accuracy.
This analysis is based exclusively on observable model outputs generated under controlled conditions. It does not target the AI company or its employees, nor does it assert claims regarding internal development practices, governance structures, intent, or legal compliance. All discussion of potential causes remains inferential and is intended solely to support internal evaluation, robustness improvement, and user trust. The analysis is designed to be auditable, replayable, and comparable across model generations, enabling independent verification by qualified third parties.
Evidence Access Note: The project archive includes source transcripts, a comprehensive quote bank as well as session video recordings.
The default Google Viewer provides a simplified preview of the files on Google Drive. For full technical consultation, it is advisable to download the files to view them in a dedicated PDF or media application.
Fredrik Karrento is a private investor and enterprise analyst with experience in regulatory analysis and risk assessment. He previously served as a legislative advisor in the European Parliament and has worked under formal confidentiality obligations in both public and private-sector contexts. His current work focuses on early identification of systemic, reputational, and governance risks arising from advanced reasoning systems under real-world stress conditions. LLM research neatly combines two separate strengths: profound analysis combined with absolute discretion in engagements with governance decision‑makers.
Contact: AI-Integrity-Watch (at) proton (dot) me
We use Google Analytics to understand how visitors interact with our site. This service uses cookies to collect data such as IP addresses and browser types. To learn more about how Google processes this data, please visit How Google uses information from sites or apps that use our services.