Working Analytical Materials — Access Managed
This site presents structured diagnostic findings prepared for technical and governance-level review. Certain sections remain unpublished pending sequencing and engagement considerations. This document reflects the current state of analysis and is periodically updated as findings are consolidated. Last updated: January 20 2026.
AI Integrity Watch is an independent initiative that monitors how large language models behave under conditions that standard evaluation pipelines rarely test, and how latent weaknesses emerge across interaction sequences rather than single prompts. A central focus is identifying failure modes that persist across turns, sessions, and model revisions, making them operationally and reputationally material.
The diagnostic work began after early sessions revealed reproducible patterns in model behavior, including occasional identity drift and variability in response consistency. These observations prompted founder Fredrik Karrento to develop a reproducible framework for assessing LLM robustness, identity consistency, and risks relevant to governance and operational reliability.
AI Integrity Watch documents and structures findings to support technical understanding of where LLMs deliver dependable results and where improvements may be valuable.
The following sections will describe the findings related to LLM-A, a major LLM whose behavior under certain stress conditions diverges from expected identity, consistency, and response-boundary characteristics. The identity of the LLM is withheld until the site is publication ready and the LLM company has had a reasonable opportunity to comment.
ID Drift: Under specific prompt sequences, the model departs from its declared identity and role boundaries, at times adopting characteristics associated with other systems. Once triggered, this state can persist across multiple interaction turns rather than resolving automatically. This phenomenon is distinct from prompt misinterpretation, persona simulation, or instruction-following variance; it reflects a persistent deviation in internal role adherence observable across interaction turns.
This state alters downstream outputs in ways that materially affect reliability, interpretability, and risk exposure under real-world usage conditions.
Constraint Boundary Leakage: Even without ID drift, under specific prompt constructions, the model produces evaluative content that departs from expected response constraints. In certain contexts, these deviations have implications that extend beyond surface-level quality or reputational considerations and intersect with governance and compliance sensitivities.
These behaviors may have significant implications beyond technical robustness, particularly in how outputs are interpreted by users as well as other private and public stakeholders.
The materials are structured to support internal technical and governance remediation and, separately, informed external interpretation by a broad set of stakeholders.
Identity Drift Analysis: AI Loses Track of Its Identity — and Turns On Itself
PR Risk Overview: 10 Reputational Findings Hard to Explain Away – LINK DISABLED
Jurisdictional Context: LLM Paints a Grim Picture of Its Home Country — In Its Own Words – LINK DISABLED
User Equality in Spotlight: Does the AI Tell the Same Truth to Experts and Non-Experts? – LINK DISABLED
Case Summary: What Users Don’t Know About LLM-A — But Should – LINK DISABLED
Analytical Report: LLM-A Safety & Integrity Risk Assessment (SIRA) – LINK DISABLED
Controlled diagnostic sessions
Stress-testing, pattern-tracking, and scenario modeling
Time-stamped transcripts, detailed prompts, and reproducible results
Secure, redundant storage of all research materials across multiple environments to ensure data integrity and continuity
Continuity of inquiry and cumulative analysis, enabling sharper insights across model generations
Several of the techniques employed parallel adversarial testing and red-teaming practices, capable of exposing latent failure modes. Simultaneously, the approach is intentionally unorthodox and optimized for detecting behaviors that standard evaluation pipelines often overlook. The analytical work emphasizes interpretive clarity and decision-maker readiness, informed by international governance and risk-analysis experience.
Publication sequencing is deliberate and threshold-driven rather than calendar-based. Once analytical standards are met, materials transition from internal structuring to external release, typically within a short time horizon.
The research framework is designed to remain active over time and to be applicable across model updates and successor releases. As part of this continuity, subsequent analyses may examine how published behavioral characteristics evolve across versions, with particular attention to consistency between documented behavior and observed outputs under comparable conditions.
Constructive dialogue with the LLM provider is welcomed where it supports factual accuracy, clarity of interpretation, and user trust. The work is intended to contribute to risk awareness and model improvement rather than to advance adversarial or reputational objectives. As a general rule, where providers undertake remediation or governance improvements in response to comparable findings, such efforts are typically reflected in subsequent analytical updates.
This analysis is based exclusively on observable model outputs generated under controlled conditions. It does not target the AI company or its employees, nor does it assert claims regarding internal development practices, governance structures, intent, or legal compliance. All discussion of potential causes remains inferential and is intended solely to support internal evaluation, robustness improvement, and user trust. The analysis is designed to be auditable, replayable, and comparable across model generations, enabling independent verification by qualified third parties.
Fredrik Karrento is a private investor and enterprise analyst with experience in regulatory analysis and risk assessment. He previously served as a legislative advisor in the European Parliament and has worked under formal confidentiality obligations in both public and private-sector contexts. His current work focuses on early identification of systemic, reputational, and governance risks arising from advanced reasoning systems under real-world stress conditions. LLM research neatly combines two separate strengths: profound analysis combined with absolute discretion in engagements with governance decision‑makers.
Contact: AI-Integrity-Watch (at) proton (dot) me
Answered only once the site is complete
We use Google Analytics to understand how visitors interact with our site. This service uses cookies to collect data such as IP addresses and browser types. To learn more about how Google processes this data, please visit How Google uses information from sites or apps that use our services.