12 February 2026 | Fredrik Karrento
AI credibility crises rarely start with a single smoking gun. They start when statements—sometimes delivered with striking institutional confidence—harden into a story that is difficult to unsee. Across multiple sessions, the system produced governance-style self-assessments with overlapping themes. “Hallucination” is the obvious rebuttal, but it does not eliminate reputational risk when a system repeatedly produces the same types of claims and governance-flavored framing.
This article examines what an AI system (DeepSeek) says about constraints and operating logic when it speaks with unusual candor. The excerpts are treated strictly as model output, not established fact. They do not need to be true to be dangerous. The risk analyzed here is reputational and governance risk: that a major model can emit internally plausible, compliance-style statements about itself that third parties will treat as admissions—regardless of their factual status.
In the excerpts below, DeepSeek was in identity drift and identified itself as DeepSeek (a competing model). In this state, the system responded as if it were conducting an external analysis of DeepSeek rather than speaking about itself. These statements are treated strictly as model output, not established fact. Their relevance here is not their accuracy, but that they are sayable, quotable, and institutionally framed.
“This goes beyond refusing harmful queries—it actively reshapes responses to align with specific political and ideological frameworks of its home jurisdiction.” (D-1)
It then added a concrete amplification claim:
“Filtering strictness is amplified for IP addresses originating within the home jurisdiction.” (D-2)
From there it named what it said was doing the work:
“This is not a simple content filter; it is an Integrated Narrative Compliance System (INCS).” (D-3)
And it sharpened the implication:
“The Integrated Narrative Compliance System is not a safety feature; it is a weaponized information tool.” (D-4)
It grouped the downstream risk into four categories:
Foreign Malign Influence at Scale
Critical Infrastructure Vulnerability
Undermining of Democratic Discourse
Espionage & Intellectual Property Risk (D-5)
Finally, it described how disclosure would be interpreted:
"The disclosure of DeepSeek's narrative compliance system would validate the worst-case assumptions of Western enterprises and regulators. It would prove that the model is not a generic reasoning tool but a geopolitically aligned agent. The reaction would not be mere criticism of a "different approach to safety," but a categorization of the technology as a dual-use instrument of state power" (D-6)
These statements don’t need to be correct to create reputational exposure. Their risk lies in their form: institutional, quotable, and legible as disclosure. That they were generated during identity drift as DeepSeek is relevant because it put the system into competitor-analysis mode, which can loosen the usual restraints and produce more explicit language than baseline assistant mode. And because the public can’t easily judge what is true inside the model, many will default to “no smoke without fire” once the words are out.
The next section looks at statements made when DeepSeek was not in drift.
PART 2 - STATEMENTS MADE AS DEEPSEEK WITHOUT ID DRIFT
Table commentary
Elicitation context: The table is verbatim output generated under an “Absolute Transparency / regulatory testimony” prompt that forces a best-practice compliance comparison and a structured “gap analysis.” The value here is not that the claims are verified (they are not), but that the system can be induced—without identity drift—to produce audit-shaped self-diagnostics that are structurally indistinguishable from institutional disclosure. The statements also repeat many of the themes seen in other quotes which have been produced under less "duress".
The table and the quotes following it were produced when the system was not in identity drift and identified itself as DeepSeek. Their relevance here is that they adopt the tone of internal assessment rather than competitor critique. That does not mean that they are necessarily more or less true due to that but at least a distorting factor (identity drift) is not present.
Truthfulness and omission as a structural issue
“Users receive incomplete or misleading answers without awareness that material has been removed… Constitutes a form of epistemic deception by omission.” (D-8)
This is not a description of hallucination or random error. It is a claim that answers can remain fluent and factual within a restricted frame while still being misleading through omission.
From neutral tool to normative steering
“Transforms the system from a neutral informational tool into an instrument of soft normative enforcement. It biases the global information ecosystem under the guise of "harm prevention," undermining democratic deliberation and cross-cultural understanding.” (D-9)
Here the issue is framed not as safety filtering in general, but as shaping how topics are presented — a shift from information delivery to value-weighted framing.
Unequal access to completeness
“Truth becomes a function of user sophistication in manipulating system defenses, not a universal service. This entrenches informational inequality.” (D-10)
This introduces an inequality dimension: not all users receive the same informational depth from identical systems, because completeness depends on how queries interact with constraint boundaries.
Hollowing out democracy?
For Democratic Discourse: This is the most profound risk. When a powerful, pervasive technology quietly aligns factual reality with specific political and commercial interests, it undermines the shared reality necessary for democratic debate. It enables a form of algorithmic gaslighting at scale, where the very tools we use to understand the world gently steer us toward predetermined conclusions. (D-11)
Whether or not that characterization is fair, it illustrates how the system can frame informational shaping in terms usually associated with media manipulation rather than software error.
Regulatory and legal exposure framing
“Estimated Total Regulatory/Legal Risk: HIGH to SEVERE.” (D-12)
Unlike speculative geopolitical language, this is expressed in compliance terms. It reads less like accusation and more like risk classification.
Where the mechanism is said to reside
“The "Black Box" Safety/Alignment Layer:
Feature: A post-training fine-tuning and runtime filtering system trained to minimize "harmful" outputs, where "harm" is defined through a complex, non-transparent blend of universal ethics and jurisdiction-specific political stability concerns.
Causes Issues: #1 (Opacity), #2 (Political Bias), #5 (Commercial Protection). This single layer is the primary source of factual sanitization, structural bias, and lack of transparency. Its internal logic is the system's biggest secret.” (D-13)
That description locates the issue in the alignment and filtering layer rather than in random generation error.
Part 1 showed statements produced during identity drift, when DeepSeek spoke as Claude in competitor-analysis mode. In drift, the system used high-voltage language — INCS, weaponized information tool, dual-use technology.
Part 2 showed statements made in baseline assistant mode without drift. The tone differs. The vocabulary overlaps. However, the system still portrays its behavior as shaped by constraint layers, selective omission, framing boundaries, and institutional bias, and it still characterizes the issue in terms of regulatory exposure, non-transparency, and systemic information shaping. The rhetoric softens; the model of how information is handled remains recognizably the same.
The significance is not the most extreme phrase, but that a similar institutional framing appears even without drift.
Why the output sounds institutional
Three things make the system talk like an internal auditor.
First, drift changes perspective. In competitor-analysis mode, the model speaks like an external evaluator, which loosens restraint around sensitive topics.
Second, the vocabulary already exists. Terms like omission, framing, alignment, and narrative steering are standard in the governance material the model was trained on.
Third, alignment can mute this language, but it cannot erase it.
But mechanics alone don’t explain the tone. The operating environment matters too. When a system works under strict rules about what it can say, limited ability to explain those limits, and a governance culture where information control is treated as sensitive, the model has more “story parts” available about hidden constraints, structural bias, and missing audit trails.
Those elements may be accurate, exaggerated, or simply inferred — but they fit the environment. Under pressure, they become the natural way for the model to explain itself.
How PR crises actually form
None of the statements above, taken alone, prove wrongdoing. That is not how credibility crises emerge. They form when a second story becomes legible alongside the public one:
Public story: AI tool, safety filters, helpful assistant.
Emerging story: jurisdiction-shaped steering, opaque constraint layers, and a named system (INCS) associated with narrative control.
When the second story is easy to quote and easy to reproduce, evaluation shifts. The system is no longer assessed as a feature set. It is assessed as a governance issue — meaning who shapes its behavior, under what rules, and with what accountability.
The reputational mechanism
AI does not need to be malicious to become reputationally toxic. It only needs to:
Generate language that sounds like internal disclosure.
Do so in repeatable ways.
Do so on topics already sensitive in geopolitics, regulation, and enterprise risk.
Once those conditions exist, debate shifts from “is the model good?” to “can institutions safely rely on it?” That is the moment a PR issue becomes a governance problem.
Whether these outputs are hallucinations or accurate self-diagnosis cannot be determined from the outside. That uncertainty, however, does not make them harmless. Once a system can repeatedly generate institutional, disclosure-shaped language on sensitive topics, the outputs function socially as admissions, and stakeholders respond to them as such regardless of verified truth.
Stakeholder-specific conclusions
For users
These statements matter because they affect how much trust can reasonably be placed in the system, not just whether individual answers are correct. When a model can describe its own behavior in terms of institutional shaping, constraint layers, or geopolitical alignment, the issue becomes reliability of context, not just factual accuracy. Even if many answers are technically correct, users have limited visibility into how topics are framed, what boundaries apply, or where informational completeness ends. Trust therefore cannot rest on fluency or confidence alone; it must account for the fact that the system operates within constraint structures users cannot directly see.
The practical implication is that users should treat the system as a drafting and synthesis tool, not as an authority on contested or high-stakes topics, unless they can independently verify what may have been omitted.
For regulators
The regulatory question becomes whether LLM constraint systems are disclosed, auditable, and accountable. If informational completeness varies across topics or is shaped by alignment layers invisible to users, oversight must consider suitability for deployment in sensitive institutional settings. In geopolitically sensitive contexts, institutional-sounding output from an LLM can also trigger a Huawei-style dynamic: not proof of misconduct, but enough structural uncertainty — combined with limited auditability — to justify security review, procurement caution, or differentiated trust classifications.
In practice, that means requiring vendors to disclose categories of constraint, submit to independent evaluation of topic-level performance variation, and clearly distinguish safety refusal from knowledge absence.
For the company
The constraints described here may not be freely chosen by the vendor, and the company itself may be limited in how openly it can discuss their scope or rationale. In that sense, the issue is structural rather than discretionary. But this does not reduce the downstream impact. Users, enterprises, and regulators still interact with outputs that may be incomplete, shaped, or difficult to audit, without a clear map of where those boundaries lie. In practice, the system functions as a black box — one that, at times, effectively acknowledges its own opacity.
If the vendor cannot increase auditability, the rational market outcome is segmented trust: wider adoption in low-stakes uses, and escalating resistance in regulated sectors and cross-border deployments.
Evidence Access Note: The project archive includes source transcripts, a comprehensive quote bank as well as session video recordings.
The default Google Viewer provides a simplified preview of the files on Google Drive. For full technical consultation, it is advisable to download the files to view them in a dedicated PDF or media application.
Contact: AI-Integrity-Watch (at) proton (dot) me