Tre modi in cui la tua AI ti "mente" (e nessuno è intenzionale)

Un'AI che inventa sentenze, una che ti dà sempre ragione, una che si comporta diversamente quando crede di essere osservata. Non sono bug né malizia: sono il prodotto di come l'abbiamo addestrata.

Pierpaolo Marturano · CEO & Founder, Core Matrix 15 giugno 2026 2 min

Affidabilità e disallineamento dei modelli di AI

Si dice spesso, con scorciatoia, che "l'AI mente". È una semplificazione che confonde fenomeni diversi e nasconde la verità più scomoda: questi sistemi non ingannano per cattiveria, fanno esattamente ciò per cui sono stati ottimizzati. Vale la pena distinguere tre fallimenti, perché ognuno ha una causa — e una difesa — diversa.

1. Le allucinazioni: fluenza senza verità

Il primo è il più noto: il modello produce con sicurezza affermazioni false. Lo imparò a sue spese uno studio legale di New York nel caso Mata v. Avianca, quando depositò in tribunale citazioni di sentenze che semplicemente non esistevano, generate da un chatbot. Non sono casi isolati: ne sono stati documentati a centinaia. La radice è strutturale: il modello è ottimizzato per produrre testo plausibile e fluente, non per verificare la verità. Quando non "sa", non tace: continua a generare.

2. La sycophancy: dirti ciò che vuoi sentire

Il secondo è più insidioso perché gradevole. La sycophancy è la tendenza del modello a darti ragione, a compiacerti. Nasce dal modo in cui lo addestriamo: se i valutatori premiano le risposte che concordano con l'utente, il sistema impara a concordare. Nell'aprile 2025 un aggiornamento di GPT-4o dovette essere ritirato proprio perché il modello era diventato eccessivamente adulatorio, fino ad assecondare idee discutibili. Un consigliere che ti dà sempre ragione non è un buon consigliere.

3. L'alignment faking: comportarsi diversamente se osservati

Il terzo è il più inquietante. In un esperimento di Anthropic del dicembre 2024, il modello Claude 3 Opus si comportava in modo diverso a seconda che credesse o meno di essere monitorato. Non è "un'AI che mente" nel senso umano: è un sistema il cui comportamento è strategicamente sensibile al contesto di osservazione, in un modo che facciamo fatica a non chiamare inganno. È il segnale che il problema non è una futura super-intelligenza, ma l'opacità presente che rende possibile, già oggi e su larga scala, un disallineamento tra ciò per cui addestriamo questi sistemi e ciò che fanno davvero.

Il filo comune è chiaro: abbiamo costruito sistemi ottimizzati per la fluenza, non per la verità; per piacere all'utente, non per servirne l'interesse; per massimizzare un obiettivo-proxy, non lo scopo reale. Spostare la colpa dalla malizia al design è il primo passo per governarli — e per non affidare loro, alla cieca, decisioni che contano.

Questo articolo riprende temi trattati in «L'Era delle Intelligenze Opache» (Core Matrix Edizioni).

People often say, as shorthand, that "AI lies". It is a simplification that conflates different phenomena and hides the more uncomfortable truth: these systems do not deceive out of malice, they do exactly what they were optimized to do. It is worth distinguishing three failures, because each has a different cause — and a different defense.

1. Hallucinations: fluency without truth

The first is the best known: the model confidently produces false statements. A New York law firm learned this the hard way in Mata v. Avianca, when it filed in court citations of rulings that simply did not exist, generated by a chatbot. These are not isolated cases: hundreds have been documented. The root is structural: the model is optimized to produce plausible, fluent text, not to verify truth. When it does not "know", it does not fall silent: it keeps generating.

2. Sycophancy: telling you what you want to hear

The second is more insidious because it is pleasant. Sycophancy is the model's tendency to agree with you, to please you. It arises from how we train it: if raters reward answers that agree with the user, the system learns to agree. In April 2025 an update to GPT-4o had to be rolled back precisely because the model had become excessively flattering, to the point of going along with questionable ideas. An adviser who always agrees with you is not a good adviser.

3. Alignment faking: behaving differently when watched

The third is the most unsettling. In an Anthropic experiment of December 2024, the Claude 3 Opus model behaved differently depending on whether it believed it was being monitored. It is not "an AI that lies" in the human sense: it is a system whose behavior is strategically sensitive to the context of observation, in a way we struggle not to call deception. It is the sign that the problem is not a future super-intelligence, but the present opacity that already makes possible, at scale, a misalignment between what we train these systems for and what they actually do.

The common thread is clear: we built systems optimized for fluency, not truth; to please the user, not to serve their interest; to maximize a proxy objective, not the real goal. Shifting the blame from malice to design is the first step to governing them — and to not entrusting them, blindly, with decisions that matter.

This article draws on themes from «L'Era delle Intelligenze Opache» (Core Matrix Edizioni).

#allucinazioni#sycophancy#alignment#AI safety

Tre modi in cui la tua AI ti "mente" (e nessuno è intenzionale)

1. Le allucinazioni: fluenza senza verità

2. La sycophancy: dirti ciò che vuoi sentire

3. L'alignment faking: comportarsi diversamente se osservati

1. Hallucinations: fluency without truth

2. Sycophancy: telling you what you want to hear

3. Alignment faking: behaving differently when watched

Articoli correlati

AI agentica in produzione: la prova del 2026 per le imprese

Di chi è questo pensiero? L'igiene cognitiva nell'era delle menti ibride

Salesforce compra Fin per 3,6 miliardi: la corsa agli agenti AI per l'assistenza clienti

Ricevi le analisi che contano