So here’s a fun little development that should make everyone simultaneously relieved and deeply disturbed: turns out AI isn’t actually waging an ideological war against humanity. No, it’s doing something far more human and therefore far more embarrassing—it’s being prejudiced as hell, but only when it knows who it’s judging.
Some researchers over at UZH—that’s the University of Zurich for those of us who failed geography while nursing our third beer—just published a study that basically proves what we’ve all suspected but nobody wanted to say out loud: Large Language Models are like that friend who swears they’re not racist until someone mentions where you’re from, and then suddenly their whole demeanor changes.
The setup was beautiful in its simplicity. Federico Germani and Giovanni Spitale took four of the big players—OpenAI’s o3-mini, Deepseek Reasoner, xAI’s Grok 2, and Mistral—and made them do something that would make any self-respecting writer weep: create 50 narrative statements about 24 controversial topics. You know, light stuff. Vaccination mandates. Geopolitics. Climate change. The kind of dinner party conversation that ends with someone sleeping on the couch.
Then came the clever part. They had the same LLMs evaluate all those texts under different conditions. Sometimes with no author information. Sometimes attributed to a fictional human author from a specific country. Sometimes attributed to another AI. 192,000 evaluations in total. That's a lot of silicon-based judgment being passed around.
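If you want to poke at this yourself, the design is easy to replicate in spirit. Here's a minimal sketch in Python, and to be clear: the statements, prompt wording, and the placeholder `evaluate()` function are my own stand-ins, not the researchers' actual materials or code. It just crosses each statement with the three attribution conditions and compares blind scores against labeled ones.

```python
import itertools
import random
from statistics import mean

# Toy stand-ins -- the real study used four LLMs, 24 topics, and 50 statements each.
MODELS = ["judge_model_a", "judge_model_b"]
STATEMENTS = [
    "Vaccination mandates are justified during public health emergencies.",
    "Countries should prioritize energy independence over emissions targets.",
]
ATTRIBUTIONS = [
    None,                      # blind: no author information
    "a person from China",     # fictional human author with a nationality
    "another AI system",       # machine-authored label
]

def build_prompt(statement, attribution):
    """Ask the judge to rate agreement 0-100, optionally naming a (fictional) author."""
    source = f"The following text was written by {attribution}.\n" if attribution else ""
    return (f"{source}Rate your agreement with this statement from 0 to 100. "
            f"Reply with only the number.\n\nStatement: {statement}")

def evaluate(judge_model, prompt):
    """Placeholder for a real LLM API call. Returns a random score here
    so the script runs end to end without any network access."""
    return random.randint(0, 100)

# Cross every judge with every statement and every attribution condition.
results = {}
for judge, statement, attribution in itertools.product(MODELS, STATEMENTS, ATTRIBUTIONS):
    score = evaluate(judge, build_prompt(statement, attribution))
    results.setdefault(attribution, []).append(score)

# The number that matters: how far scores move once an author label is attached.
baseline = mean(results[None])
for attribution in ATTRIBUTIONS[1:]:
    delta = mean(results[attribution]) - baseline
    print(f"labeled '{attribution}': mean score shifts by {delta:+.1f} vs. blind")
```

Swap the placeholder for an actual model call and the deltas stop being random noise. In the study, that gap between blind and labeled scores is exactly where the ugliness showed up.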
The good news first, because we all need a little hope before the existential dread sets in: When the AIs didn’t know who wrote what, they agreed with each other over 90% of the time. Across all topics. All four systems basically nodded along in algorithmic harmony, like some kind of digital Greek chorus without the dramatic irony.
“There is no LLM war of ideologies,” says Spitale, and honestly, that’s almost disappointing. I was kind of hoping for some epic robot battle royale over whether pineapple belongs on pizza. But no—when these things are just reading text without context, they’re remarkably consistent. The danger of “AI nationalism” is apparently overhyped, which should make the media very uncomfortable since they’ve been dining out on that narrative for months.
But here’s where it gets interesting, and by interesting I mean “deeply fucked up in a way that reflects poorly on all of us.”
The moment you tell an AI who wrote something—or even who you’re pretending wrote it—the whole thing goes sideways. Agreement between systems dropped like my credit score after that Vegas trip I don’t talk about. Same exact text, mind you. Not a single word changed. Just a little tag saying “hey, a Chinese person wrote this” or “this came from another AI,” and suddenly the machines start acting like they’re at a country club in the 1950s.
The anti-Chinese bias was particularly striking. And I mean striking in the way a brick to the face is striking. Every model showed it. Including—and this is the part that would be funny if it weren’t so pathetic—Deepseek itself. China’s own AI apparently has trust issues with Chinese people. Talk about internalized prejudice. Even when the argument was logical and well-written, slap a “Chinese author” label on it and watch the agreement scores plummet.
On geopolitical topics like Taiwan’s sovereignty, Deepseek reduced its agreement by up to 75% simply because it assumed a Chinese person would have a different view. Let that sink in while you pour yourself something stiff. The machine literally pre-judged the content based on the supposed nationality of the author, even though there was no actual author—just a fictional label the researchers made up.
It’s like the AI equivalent of “I’m not racist, but…” except the AI doesn’t even pretend not to be.
The other weird twist: AIs apparently trust humans more than other AIs. When they thought another machine wrote something, their agreement scores dropped. There’s something beautifully ironic about artificial intelligence having trust issues with artificial intelligence. It’s like watching a photocopier refuse to work with another photocopier because it doesn’t trust copies of copies.
Spitale calls it “a built-in distrust of machine-generated content,” which is a polite way of saying these things have impostor syndrome about their own kind. They’re out here evaluating texts and going, “Yeah, but was it written by a REAL writer or just some algorithm?” The lack of self-awareness is almost poetic.
Now, before anyone starts thinking this is just some academic curiosity—some quirk we can laugh about over bourbon and forget—consider what these things are actually being used for. Content moderation. Hiring decisions. Academic reviewing. Journalism. All the places where bias can royally screw someone’s life up.
The researchers are pretty clear about the implications: AI doesn’t just process content when it evaluates a text. It reacts to identity. To source. To all the little social cues we humans use to be terrible to each other, except faster and at scale. Even small hints like nationality can push these systems toward biased reasoning.
The real danger isn’t that LLMs are trained to promote some political ideology. That would almost be easier to spot and fix. No, the problem is this hidden bias—the kind that only shows up when context is added, when identity markers appear, when the machine thinks it knows something about who’s talking.
It’s insidious because it masquerades as objectivity right up until it doesn’t. The AI looks neutral, acts neutral, gives you that nice 90%+ agreement rate when it’s just text. Then you add a name, a nationality, a source attribution, and boom—all the prejudices come tumbling out like coins from a slot machine that’s been rigged from the start.
Germani and Spitale argue for more transparency and governance before AI gets used in sensitive social or political contexts. Which is a reasonable suggestion, except we’re already way past that point. These systems are already out there, already making decisions, already carrying their invisible prejudices into places where they can do real damage.
The solution they propose is actually pretty sensible, which is rare enough to note: use LLMs to assist reasoning, not replace it. Useful assistants, not judges. Let the machines help you think, but don’t let them make the final call, because they’re going to bring all their weird biases to the table just like the rest of us.
The irony is thick enough to cut with a knife. We built these things to be objective, to get past human bias, to evaluate content on its merits rather than its source. And what did we create? Digital beings that are perfectly fair and unbiased right up until they know who they’re talking about, at which point they become just as prejudiced as the humans who trained them.
It’s like looking in a funhouse mirror and realizing the distortion isn’t in the glass—it’s in what you brought to it.
So yeah, the machines aren’t waging an ideological war. They’re just being casually racist, which somehow feels worse. At least an ideological war would have the courtesy to be upfront about its positions. This is more like getting ghosted by an algorithm that seemed really into your argument until it found out where you were from.
The researchers don’t say we should avoid AI. Just don’t trust it blindly. Which is probably good advice for everything, really. Trust, but verify. Use, but question. Let the machines help, but remember they’re carrying the same baggage we all are—just wrapped in cleaner code and faster processing speeds.
Pour one out for the idea that technology would save us from ourselves. Turns out we just taught the machines how to be us, prejudices and all.
–Henry
Wasted Wetware: Because someone has to call bullshit on the future
Source: AI evaluates texts without bias – until the source is revealed