So here we are. University of Chicago economists—because apparently economists have nothing better to do than study how teenagers cheat—just published research showing that students are using “AI humanizer” apps to make their ChatGPT essays undetectable. And surprise, surprise, it’s working like a charm.
The whole thing reads like a fever dream from a cyberpunk novel nobody wanted to write. We’ve got AI writing essays. Then we’ve got AI detecting those essays. Then we’ve got AI disguising those essays so they look human again. And then we’ve got other AI trying to detect the disguised AI. It’s turtles all the way down, except the turtles are all robots and they’re all lying to each other.
The researchers tested all the big-name detection tools—the ones schools are dropping serious cash on to catch plagiarists—and found that most of them crater from 90% accuracy to below 50% when students run their ChatGPT slop through something called “humanization software.” Only one tool, Pangram, maintained decent accuracy at 96.7%. Which means even the best detector is still wrong about three times out of a hundred. Great odds when you’re deciding whether to fail a kid.
Here’s where it gets really beautiful: these detection tools also flag about one in every hundred genuine human essays as AI-generated. Do the math on a class of thirty students and you’ve got at least one innocent kid getting hauled in front of the honor board every few assignments. Vanderbilt University already pulled the plug on Turnitin’s AI detector after discovering it was disproportionately accusing non-native English speakers and students with learning disabilities of cheating. Because nothing says “cutting-edge technology” like algorithmic discrimination.
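If you actually want to do that math, here's the back-of-the-envelope version as a quick Python sketch. It assumes a flat 1% false positive rate and statistically independent essays, which is being generous to the detectors:

```python
# Back-of-the-envelope math on a 1% false positive rate.
# Assumes the rate is flat and essays are independent; real
# detectors are messier, but the order of magnitude holds.

fpr = 0.01        # detector flags 1 in 100 genuine human essays
class_size = 30

# Expected false accusations per assignment: 0.3
expected_flags = fpr * class_size

# Probability that at least one innocent student gets flagged
# on any single assignment: 1 - 0.99^30, about 26%
p_at_least_one = 1 - (1 - fpr) ** class_size

print(f"Expected false flags per assignment: {expected_flags:.2f}")
print(f"Chance of at least one innocent flag: {p_at_least_one:.0%}")
```

That's roughly a one-in-four chance per assignment that somebody gets flagged for work they actually wrote. Run three or four assignments and, on average, one innocent kid is sitting in that honor board meeting.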
The humanization software has names that sound like bad spy movie codenames: StealthGPT, Undetectable AI, WriteHuman. These things take the robotic perfection of ChatGPT’s output and deliberately mess it up to make it sound more human. They add inconsistencies, stylistic variations, the kind of subtle imperfections that make writing feel like it came from an actual person instead of a very confident autocorrect function.
Think about that for a second. We’re using AI to make AI writing look less like AI writing so other AI can’t tell it’s AI writing. We’ve created a technological ouroboros that’s eating its own ass and somehow making money doing it. The kids are paying for apps to fool the systems their schools are paying for to catch them. It’s beautiful in its own horrible way.
The researchers are calling it an “escalating technological arms race,” which is academic speak for “this is completely out of control and we have no idea what to do about it.” They’ve introduced this thing called a “policy cap framework” that lets institutions decide how many false accusations they’re willing to tolerate versus how much AI use they’re willing to miss. It’s like asking someone whether they’d rather occasionally execute an innocent person or let some murderers walk free. No pressure.
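The paper doesn't lay out the mechanics in plain English, but the basic move presumably looks something like the sketch below: take a detector that outputs a suspicion score, then pick the strictest cutoff whose false accusation rate stays under whatever cap the institution chose. (This is a hypothetical illustration, not the authors' actual code.)

```python
# Hypothetical sketch of a "policy cap" style decision, NOT the
# paper's actual method: choose the lowest detector threshold whose
# false positive rate on known-human essays stays under the cap.

def pick_threshold(human_scores, ai_scores, fp_cap):
    """Return (threshold, false_positive_rate, catch_rate)."""
    for t in sorted(set(human_scores) | set(ai_scores)):
        # Scores at or above t get flagged as AI-written.
        fpr = sum(s >= t for s in human_scores) / len(human_scores)
        if fpr <= fp_cap:
            catch = sum(s >= t for s in ai_scores) / len(ai_scores)
            return t, fpr, catch  # lowest passing t catches the most
    return None

# Toy suspicion scores: 0 = "definitely human", 1 = "definitely AI".
# Note how much the humanized AI scores overlap the human ones.
human = [0.05, 0.12, 0.30, 0.41, 0.55, 0.62]
humanized_ai = [0.35, 0.48, 0.58, 0.70, 0.81, 0.93]

print(pick_threshold(human, humanized_ai, fp_cap=0.01))
# -> (0.7, 0.0, 0.5): tolerate near-zero false accusations and you
#    only catch half the AI essays. That's the tradeoff, priced.
```

Cap the false accusations at 1% and watch the catch rate sink. The framework doesn't fix the tradeoff; it just forces you to pick your poison on purpose.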
What really cracks me up is the part about “legitimate AI assistance” versus “problematic assistance.” Where’s the line between using AI for grammar correction and using it to write your entire paper? Nobody knows. Is it okay to have ChatGPT help you brainstorm? What about reorganizing your ideas? What if it just writes one paragraph? Two? The whole damn thing?
It’s all context-dependent, apparently. Which means it’s completely subjective. Which means we’re back to the same place we’ve always been with plagiarism: hoping teachers can tell the difference between a student’s voice and something that sounds off. Except now the fake stuff sounds exactly like the real stuff, so good luck with that.
Some schools are giving up entirely on automated detection. Others are trying to split the difference with policy frameworks that accept some AI use will slip through. The smart ones are rethinking the whole assessment game—more in-person exams, oral presentations, project-based learning that requires actual human interaction. You know, the way we used to do things before we decided everything needed to be scalable and remote and automated.
But here’s what nobody wants to say out loud: maybe the kids are onto something. Not in a “cheating is good” way, but in a “this whole system is absurd” way. We’ve built an education system that values the ability to produce certain types of written output on demand. Then we invented a machine that can produce that exact output. Then we got mad when students used the machine. Then we built another machine to catch them. Then they built a third machine to fool the second machine. And now we’re all standing around wondering how we got here.
The detection companies are doubling down, naturally. Pangram Labs is developing “more sophisticated approaches using active learning algorithms and hard negative mining techniques.” Which is impressive-sounding gibberish that means they’re teaching their AI to spot AI that’s been taught to not look like AI. It’s an arms race where both sides are running in circles and the finish line keeps moving.
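For what it's worth, "hard negative mining" is a real training trick underneath the marketing. The generic recipe (this is a sketch of the technique in general, not Pangram's actual pipeline) goes: run your current detector over humanized AI text, harvest the essays it wrongly calls human, and feed those back into training as priority examples.

```python
# Generic hard-negative mining loop -- the technique in general,
# not Pangram's pipeline. A "hard negative" here is a humanized
# AI essay that the current detector mistakes for human writing.

def mine_hard_negatives(detector, humanized_texts, threshold=0.5):
    """Return the AI-written texts the detector scores as human."""
    return [t for t in humanized_texts if detector(t) < threshold]

def training_round(detector, train_set, humanized_texts, retrain):
    # 1. Collect the disguised essays the current model misses.
    hard = mine_hard_negatives(detector, humanized_texts)
    # 2. Add them to the training data, correctly labeled as AI.
    train_set.extend((text, "ai") for text in hard)
    # 3. Retrain. Next round, the humanizers adapt and you mine
    #    again -- which is the "running in circles" part.
    return retrain(train_set)
```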
The researchers conclude with this gem: “the era of easily distinguishing human from AI writing could be coming to an end.” Could be? Brother, we’re already there. We passed that exit three miles back while arguing about whether to stop for coffee.
What we’re really watching is the death of a certain kind of assessment. The five-paragraph essay. The research paper. The take-home exam. All those things that test whether you can produce a specific format of written content. Those are done. Finished. You can either accept that and redesign your evaluation methods, or you can keep playing whack-a-mole with detection software while the kids run circles around you with their humanizer apps.
The paradox is delicious. The whole point of writing assignments is supposedly to teach critical thinking, research skills, argumentation. But if a machine can do it convincingly enough that experts can’t tell the difference, what are we really measuring? The ability to follow a format? The capacity to regurgitate information? Those were always terrible proxies for actual learning; we just pretended they weren’t because they were easy to grade.
So here we are. Students using AI to write essays, then using more AI to make those essays look human, to fool AI detectors that schools bought to catch AI writing. It’s a perfect closed loop of technological absurdity, and the only people making any sense are the kids who looked at the whole setup and thought “yeah, I’m not playing this stupid game.”
The University of Chicago economists wrapped up their study by suggesting we need “more nuanced policies that account for the reality of AI assistance in modern writing.” What they mean is: we need to admit that the old way of doing things is dead and figure out what the hell comes next.
Good luck with that. In the meantime, I’ll be over here watching the machines teach each other to lie while the humans try to figure out what truth even means anymore.
Source: Students Use “AI Humanizer” Apps To Make ChatGPT Essays Undetectable