On Detecting AI Writing
Why detecting AI text matters
In 2023, GPT-4 was released. By 2024, every major LLM provider had models that could produce fluent, structured, contextually appropriate prose on nearly any topic. By 2025, the RAID benchmark — the largest independent evaluation of AI text detectors — showed that LLM-generated text had been able to fool humans at rates above 77% for several years running.1
This creates a real problem across multiple domains. In education, institutions need to evaluate whether student work reflects genuine understanding. In publishing, editors need to know whether submitted articles are original. In regulation and law, the provenance of a document can matter enormously. In information ecosystems more broadly, the ability to distinguish human expression from machine generation has become a question about the integrity of public discourse.
The challenge is this: the same qualities that make AI writing useful — fluency, coherence, grammatical correctness — are exactly the qualities that make it hard to detect. And the stakes of getting detection wrong cut in both directions. A false negative lets AI-generated content pass as human. A false positive accuses a human writer of cheating.
How detection actually works under the hood
There is no single method for detecting AI-generated text. Instead, the field has developed several distinct families of approaches, each with different assumptions, strengths, and failure modes. Understanding these families is essential before evaluating any commercial tool.
Statistical metrics (no training needed): Measure text properties like perplexity and burstiness against known baselines for human vs. AI writing.
Trained classifiers (supervised): Neural networks (often fine-tuned transformers) trained on labelled datasets of human and AI text.
Zero-shot detection (model-based): Use the generating model itself to assess whether text occupies a "likely" region of its output space.
Watermarking (proactive): Embed imperceptible statistical signals during generation that can later be detected with a secret key.
In practice, most commercial detection tools combine multiple approaches. But each one has a distinct logic, and the limitations of one are not always offset by the strengths of another.
Perplexity: the surprise metric
Perplexity is the most fundamental concept in AI text detection. It measures how predictable a piece of text is to a language model. Formally, perplexity is the exponentiated average negative log-likelihood of the tokens in a sequence:
$$\mathrm{PPL}(x) = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log P(x_i \mid x_1, \ldots, x_{i-1}) \right)$$
In simpler terms: for each token in a passage (roughly, a word or word fragment), a language model assigns a probability to that token given everything that came before it. If the model finds the token highly probable, its log-probability is close to zero. If the token is surprising, its log-probability is very negative. Perplexity aggregates these surprises into a single number: low for predictable text, high for surprising text.2
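To make the formula concrete, here is a minimal sketch that computes perplexity directly from a list of per-token probabilities. The probability values are hypothetical and chosen only for illustration; in a real detector they would come from a reference language model.

```python
import math

def perplexity(token_probs):
    """Exponentiated average negative log-likelihood of the tokens."""
    n = len(token_probs)
    avg_neg_log_likelihood = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_neg_log_likelihood)

# Hypothetical per-token probabilities (not produced by any real model).
predictable = [0.5, 0.5, 0.5, 0.5]    # the model is never very surprised
surprising = [0.5, 0.5, 0.5, 0.01]    # one very unlikely token

print(perplexity(predictable))  # every token has p = 0.5, so PPL is 2 (up to rounding)
print(perplexity(surprising))   # a single surprising token pulls the score up
```

Note how one unlikely token is enough to raise the score of a four-token passage noticeably. On short passages, perplexity is therefore a noisy signal.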
This is the core detection signal. When you run a passage through a reference language model (typically GPT-2, since it’s openly available), AI-generated text clusters toward the low-perplexity end, while human text spreads more broadly across higher perplexity values.
But here is the critical problem: the distributions overlap. Human text that happens to be clean, formal, and predictable — academic prose, technical writing, formulaic business emails — can score just as low on perplexity as AI-generated text. And AI text generated with high temperature or creative sampling can look surprisingly “human” in its perplexity profile.3
Burstiness: the rhythm metric
If perplexity measures how surprising each word is, burstiness measures how much that surprise varies across a passage. Human writing naturally “bursts” — a long complex sentence followed by a punchy short one, a technical paragraph followed by a casual aside. This creates high variance in sentence-level metrics.
AI-generated text tends to be more metronomic. Sentence lengths cluster around a narrower band. Structural patterns repeat. The rhythm is even, almost suspiciously so.2,3
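There is no single agreed formula for burstiness. As one simple proxy, the sketch below uses the coefficient of variation of sentence lengths (standard deviation divided by mean). This is an assumption made for illustration; commercial tools may instead measure variation in per-sentence perplexity. The two sample passages are invented.

```python
import re
import statistics

def sentence_lengths(text):
    """Split on ., ! and ? and return the word count of each sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Coefficient of variation (stdev / mean) of sentence lengths."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)

# Invented sample passages.
even = ("The model writes a sentence. The model writes another one. "
        "The model writes a third.")
varied = ("Stop. I spent three weeks rewriting that chapter before my "
          "editor finally told me to cut it. Fine.")

print(sentence_lengths(even), burstiness(even))
print(sentence_lengths(varied), burstiness(varied))
```

The naive sentence splitter will miscount abbreviations and decimals, so treat the scores as rough indicators only.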
Trained classifiers
The second major family of detection methods uses supervised machine learning. You take a large dataset of human-written text and AI-generated text, label them, and train a classifier, typically a fine-tuned transformer model like RoBERTa, to distinguish between the two classes.1,4
This is the approach behind many commercial detectors. GPTZero, for instance, combines statistical features (perplexity, burstiness) with a trained transformer backbone. Turnitin integrates a classifier into its existing plagiarism infrastructure. Originality.ai trains on specific model outputs and retrains as new LLMs emerge.4
The strength of this approach is that classifiers can learn subtle stylistic patterns that simple metrics miss — token-level distributions, syntactic preferences, discourse-level structures. The weakness is that they can overfit to the models they were trained on. A classifier trained heavily on GPT-3.5 outputs may struggle with Claude or Gemini. And when a new model generation arrives, the classifier’s accuracy can drop until it is retrained.
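The supervised idea can be sketched without any machine learning library. Production detectors fine-tune transformers on raw text; the example below swaps in plain logistic regression over two hand-made features so that it runs with no dependencies. The feature values and labels are invented for illustration and do not come from any real dataset.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on log loss."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(features) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(features)
    return weights, bias

def predict_ai_probability(x, weights, bias):
    """Probability that feature vector x belongs to the AI class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Made-up features: (burstiness, scaled perplexity). Label 1 = AI, 0 = human.
train_x = [(0.10, 0.20), (0.15, 0.25), (0.20, 0.15),
           (0.70, 0.80), (0.60, 0.90), (0.80, 0.70)]
train_y = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(train_x, train_y)
print(predict_ai_probability((0.12, 0.22), weights, bias))  # near the AI cluster
print(predict_ai_probability((0.75, 0.85), weights, bias))  # near the human cluster
```

The overfitting risk described above shows up here too: the model only knows the region of feature space covered by its training examples, so text from a generator with a different profile may land on the wrong side of the boundary.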
Zero-shot methods: DetectGPT and its descendants
In 2023, Mitchell and colleagues at Stanford introduced DetectGPT, a fundamentally different approach. The core insight was elegant: if a passage was generated by a particular language model, then small perturbations of that passage should tend to decrease the model’s log-probability. For human-written text, perturbations could go either way.5
DetectGPT works by generating many slight rewrites of the suspect text (using a separate model like T5 to fill in masked words), then comparing the log-probabilities of the original against these perturbations. If the original consistently scores higher, it is likely sitting at a local maximum of the model’s probability surface — a telltale sign of machine generation.
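The comparison step can be sketched as a normalized perturbation discrepancy: the original log-probability minus the mean log-probability of the perturbed copies, divided by their standard deviation. Everything model-related below is a stand-in. `toy_log_prob` is a hypothetical unigram scorer and `toy_perturb` swaps one random word, whereas DetectGPT itself scores text with the suspected source model and perturbs it with T5 mask filling.

```python
import math
import random
import statistics

# Hypothetical unigram "language model": a stand-in for the real scoring model.
TOY_WORD_PROBS = {"the": 0.30, "cat": 0.20, "sat": 0.20, "on": 0.15,
                  "mat": 0.10, "quantum": 0.03, "zebra": 0.02}

def toy_log_prob(words):
    """Sum of word log-probabilities under the toy unigram model."""
    return sum(math.log(TOY_WORD_PROBS[w]) for w in words)

def toy_perturb(words, rng):
    """Replace one random word with a random vocabulary word
    (stand-in for T5 mask filling)."""
    perturbed = list(words)
    position = rng.randrange(len(perturbed))
    perturbed[position] = rng.choice(list(TOY_WORD_PROBS))
    return perturbed

def perturbation_discrepancy(words, log_prob, perturb,
                             n_perturbations=100, seed=0):
    """Original log-prob minus mean perturbed log-prob, scaled by their stdev."""
    rng = random.Random(seed)
    perturbed_scores = [log_prob(perturb(words, rng))
                        for _ in range(n_perturbations)]
    spread = statistics.stdev(perturbed_scores) or 1.0
    return (log_prob(words) - statistics.mean(perturbed_scores)) / spread

# Toy inputs: one near the toy model's probability peak, one far from it.
likely = ["the", "the", "the", "cat", "the"]
unlikely = ["zebra", "quantum", "zebra", "mat", "quantum"]

print(perturbation_discrepancy(likely, toy_log_prob, toy_perturb))
print(perturbation_discrepancy(unlikely, toy_log_prob, toy_perturb))
```

A clearly positive score means the perturbations mostly made the text less probable, which is the local-maximum pattern associated with machine generation. The default of 100 perturbations mirrors the cost that makes the original method expensive.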
This approach achieved strong results (0.95 AUROC for detecting news articles generated by GPT-NeoX) but was computationally expensive, requiring around 100 model calls per passage. Fast-DetectGPT (Bao et al., 2024) replaced the perturbation step with a single conditional probability curvature computation, achieving comparable accuracy while running 340× faster.6
The fundamental limitation of zero-shot methods is their vulnerability to adversarial modification. Simple paraphrasing can reduce DetectGPT’s detection rate from over 70% to near chance levels.5
Watermarking: detecting by design
All the methods discussed so far are post-hoc: they try to detect AI text after it has been generated. Watermarking takes a fundamentally different approach: it embeds a detectable signal during generation, so that detection later becomes a matter of checking for the signal rather than guessing about authorship.7
The green-list / red-list method
The most influential watermarking scheme was proposed by Kirchenbauer et al. in 2023. Before generating each token, the method uses a hash of the preceding token to divide the vocabulary into a “green list” and a “red list.” It then adds a small bias to the logits of green-list tokens, making them slightly more likely to be selected. Over many tokens, this creates a statistical excess of green-list tokens that is invisible to human readers but detectable with the right key.7
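The scheme can be illustrated with a toy generator. In this sketch the "model" has uniform logits over a hypothetical 1,000-token vocabulary, the key and bias values are made up, and detection uses a one-proportion z-score on the count of green tokens. It is meant to show the mechanism, not to reproduce the authors' implementation.

```python
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(1000)]  # hypothetical toy vocabulary
GAMMA = 0.5                               # fraction of vocabulary on the green list
SECRET_KEY = "demo-key"                   # hypothetical key

def green_list(previous_token):
    """Seed a PRNG with a keyed hash of the previous token, then pick the green set."""
    digest = hashlib.sha256((SECRET_KEY + previous_token).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(VOCAB, int(GAMMA * len(VOCAB))))

def generate(n_tokens, green_bias, seed=0):
    """Toy model with uniform logits; green tokens get green_bias added to their logit."""
    rng = random.Random(seed)
    tokens = ["tok0"]
    for _ in range(n_tokens):
        greens = green_list(tokens[-1])
        weights = [math.exp(green_bias) if t in greens else 1.0 for t in VOCAB]
        tokens.append(rng.choices(VOCAB, weights=weights)[0])
    return tokens

def z_score(tokens):
    """How far the green-token count sits above what chance alone would give."""
    scored = len(tokens) - 1
    hits = sum(tokens[i] in green_list(tokens[i - 1])
               for i in range(1, len(tokens)))
    return (hits - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))

watermarked = generate(200, green_bias=2.0)
plain = generate(200, green_bias=0.0)
print(round(z_score(watermarked), 1), round(z_score(plain), 1))
```

Detection needs only the key and the text, not the model. A real model's logits are far from uniform, so the achievable bias (and therefore the z-score) depends on how many plausible alternatives each position offers.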
Google DeepMind’s SynthID
SynthID, developed by Google DeepMind, extends watermarking to production scale. Deployed in Google’s Gemini models since 2024 and open-sourced in late 2024, SynthID uses tournament sampling with pseudorandom functions to guide token generation in a way that creates a detectable statistical signature without retraining the model.8
SynthID is notable for being the first watermark deployed at production scale in a commercial LLM product. However, independent assessment has shown that it remains vulnerable to meaning-preserving attacks like paraphrasing and back-translation, and is “easier to scrub than other state-of-the-art schemes even for naive adversaries.”9
Code demo: computing perplexity
To make these concepts concrete, here is a working Python example that computes the per-token log-probability and overall perplexity of a text passage using GPT-2. This is essentially what the statistical layer of many detection tools does under the hood.
import torch
import math
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# GPT-2 serves as the reference model that scores the text.
model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()


def compute_perplexity(text: str) -> dict:
    """Return overall perplexity and per-token log-probabilities for text."""
    tokens = tokenizer.encode(text, return_tensors="pt")

    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss.
        outputs = model(tokens, labels=tokens)
        loss = outputs.loss.item()
        logits = outputs.logits

    # Logits at position i predict token i + 1, so shift to align them.
    shift_logits = logits[:, :-1, :]
    shift_labels = tokens[:, 1:]
    log_probs = torch.log_softmax(shift_logits, dim=-1)
    token_log_probs = log_probs.gather(
        2, shift_labels.unsqueeze(-1)
    ).squeeze(-1)[0]

    # The first token has no preceding context, so it receives no score.
    token_strings = [
        tokenizer.decode([t]) for t in tokens[0][1:]
    ]

    # Perplexity is the exponential of the mean negative log-likelihood.
    perplexity = math.exp(loss)
    return {
        "perplexity": round(perplexity, 2),
        "avg_log_prob": round(-loss, 4),
        "tokens": [
            {"token": tok, "log_prob": round(lp.item(), 4)}
            for tok, lp in zip(token_strings, token_log_probs)
        ],
    }

ai_text = (
"The impact of climate change on global agriculture "
"is a significant concern for policymakers and "
"researchers worldwide. Rising temperatures and "
"changing precipitation patterns are expected to "
"affect crop yields in many regions."
)
human_text = (
"Look, I know everyone keeps saying the farms are "
"in trouble. And yeah, they probably are. But my "
"uncle's been growing corn since '84 and he says "
"the real problem isn't the heat — it's the banks."
)
result_ai = compute_perplexity(ai_text)
result_human = compute_perplexity(human_text)
print(f"AI-like perplexity: {result_ai['perplexity']}")
print(f"Human-like perplexity: {result_human['perplexity']}")
print(f"Ratio: {result_human['perplexity'] / result_ai['perplexity']:.1f}x")
The platforms: do commercial detectors work?
Multiple companies now offer AI text detection as a service. The three most widely used are GPTZero, Turnitin, and Originality.ai, each built for different contexts and each making different trade-offs between sensitivity and false-positive rates.4
| Platform | Primary audience | Reported detection rate | Reported false positive rate | RAID standing |
|---|---|---|---|---|
| GPTZero | Educators, individuals | ~88–95% | ~0.2–2% | Top 3 |
| Turnitin AI | Institutions (bundled) | ~77–98% | Variable | Strong |
| Originality.ai | Content marketing, SEO | ~85–94% | ~2–3% | #1 overall |
| Winston AI | General, publishing | ~96% | ~3–4% | Good |
| Copyleaks | Enterprise, multilingual | ~99% claimed | ~0.03% | Moderate |
| ZeroGPT | Free, casual use | ~80–85% | ~15–20% | Lower |
What the RAID benchmark tells us
The RAID benchmark (Dugan et al., 2024) is the most rigorous independent evaluation of AI text detectors. It contains over 10 million documents spanning 11 generative models, 8 writing domains, 12 adversarial attacks, and 4 decoding strategies. Its key findings are sobering.1
First, many detectors that claim 99%+ accuracy on their own test sets perform far worse on RAID. At a 5% false positive rate, the best detector achieved about 85% accuracy overall. Second, simple adversarial modifications — changing the sampling strategy, adding a repetition penalty, or running text through a paraphraser — caused substantial drops in detection accuracy across all tools tested. Third, detectors showed particular difficulty generalizing to unseen models and domains they were not trained on.
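Reporting results "at a 5% false positive rate" means the decision threshold is chosen on human-written text first, and detection is measured only afterwards. The sketch below shows that procedure on invented detector scores between 0 and 1. The scores, the function names, and the resulting rates are all hypothetical and are not RAID data.

```python
def threshold_at_fpr(human_scores, target_fpr):
    """Pick a cutoff so that at most target_fpr of human texts score above it."""
    ranked = sorted(human_scores, reverse=True)
    # Number of human texts we may flag (the epsilon guards float rounding).
    allowed = min(int(target_fpr * len(ranked) + 1e-9), len(ranked) - 1)
    return ranked[allowed]

def rate_flagged(scores, threshold):
    """Fraction of texts whose score is strictly above the threshold."""
    return sum(score > threshold for score in scores) / len(scores)

# Invented detector scores (higher means "more likely AI").
human_scores = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25,
                0.28, 0.30, 0.33, 0.35, 0.40, 0.45, 0.50, 0.55, 0.70, 0.92]
ai_scores = [0.30, 0.45, 0.60, 0.65, 0.72, 0.80, 0.85, 0.88, 0.93, 0.97]

threshold = threshold_at_fpr(human_scores, target_fpr=0.05)
print("threshold:", threshold)
print("false positive rate:", rate_flagged(human_scores, threshold))
print("detection rate:", rate_flagged(ai_scores, threshold))
```

Fixing the false positive rate first is what makes detectors comparable: a tool can always raise its detection rate by lowering the threshold, at the cost of flagging more human writers.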
Curtin University drops Turnitin
In 2026, Curtin University in Australia became one of the most prominent institutions to stop using Turnitin’s AI detection module, citing ongoing concerns about reliability. This decision reflects a growing pattern: as the stakes of false accusations become clearer, institutions are pulling back from automated detection as a disciplinary tool.4
The false positive problem
If there is a single issue that undermines trust in AI text detection, it is false positives. A false positive means flagging genuinely human-written text as AI-generated. And the evidence for systematic bias in this direction is concerning.
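A small calculation shows why even a low false positive rate matters at scale. The numbers below are purely hypothetical (1,000 essays, 10% of them AI-written, a 90% detection rate, and a 2% false positive rate) and are not taken from any study cited here.

```python
def flag_precision(n_essays, ai_share, detection_rate, false_positive_rate):
    """Return (correct flags, wrongful flags, share of flags that are correct)."""
    n_ai = n_essays * ai_share
    n_human = n_essays - n_ai
    true_flags = detection_rate * n_ai
    false_flags = false_positive_rate * n_human
    return true_flags, false_flags, true_flags / (true_flags + false_flags)

# Hypothetical scenario, chosen only to illustrate the arithmetic.
true_flags, false_flags, precision = flag_precision(1000, 0.10, 0.90, 0.02)
print(true_flags, false_flags, round(precision, 3))
```

Under these made-up numbers, 90 flags are correct and 18 are not, so roughly one flag in six lands on a human writer. The share of wrongful flags grows as AI use becomes rarer in the population being screened.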
Non-native English speakers
A 2023 study affiliated with Stanford found that seven AI detectors flagged 61% of essays by non-native English speakers as AI-generated. On roughly 20% of those essays, all seven detectors agreed the text was AI — and they were all wrong.10
The mechanism is straightforward. Non-native English writers tend to use simpler vocabulary, more uniform sentence structures, and fewer idiosyncratic turns of phrase, which is the exact statistical profile that detectors associate with AI output. In terms of perplexity and burstiness, writing by language learners looks like machine-generated text because both prioritize safe, high-probability word choices.3,10
Technical and academic writing
The same problem affects native English speakers who write in formal, structured styles. Technical documentation, legal prose, scientific writing, and formulaic academic essays all score low on burstiness and can score low on perplexity — because the genre demands uniform, precise language. Detectors have flagged portions of the U.S. Constitution and the King James Bible as AI-generated.3
High-achieving students
Perhaps the cruelest irony: studies have found an association between higher grades and higher false detection rates. Students who follow academic writing conventions diligently — removing fluff, standardizing structure, using precise vocabulary — are inadvertently optimizing their text to look like AI output.3
The arms race: evasion and humanizers
The detection problem is not static. As detection tools improve, evasion tools emerge in response. A growing industry of “AI humanizer” tools specifically advertises the ability to rewrite AI-generated text to bypass detection. And the evidence suggests they are often effective.
Independent benchmarks show that after a humanizer pass, detection rates drop by 15–35% across all major tools. For heavily paraphrased content, detection rates can fall to 50–65%, with considerable variation between tools. At that level of modification, no tool maintains reliable accuracy.4,11
Why evasion is easy
The fundamental reason evasion works is that the statistical signals detectors rely on — low perplexity, low burstiness, specific token distribution patterns — are fragile. They can be disrupted by:
Paraphrasing: Rewriting at the sentence level changes the token sequence while preserving meaning, destroying both perplexity-based and watermark-based signals.
Back-translation: Translating to another language and back introduces natural variation that mimics human burstiness.
Style injection: Prompting the AI to write “like a specific person” or “with deliberate errors” can shift perplexity upward.
Mixing: Combining human and AI text in a single document creates a chimera that most detectors struggle with.
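The fragility of a watermark signal under paraphrasing or mixing can be illustrated with a simplified dilution model. It assumes a green-list style watermark in which surviving tokens are green at a hypothetical rate of 0.88, while rewritten or human-written tokens are green only at the chance rate. It ignores the knock-on effect that changing one token also changes the green list of the next, so real degradation may be worse than this sketch suggests.

```python
import math

def expected_z(n_tokens, surviving_fraction, green_rate=0.88, gamma=0.5):
    """Expected detection z-score when only part of the text keeps the watermark.

    green_rate is a hypothetical green-token rate for untouched watermarked
    tokens; gamma is the chance rate that applies to every other token.
    """
    expected_green = n_tokens * (surviving_fraction * green_rate
                                 + (1 - surviving_fraction) * gamma)
    return (expected_green - gamma * n_tokens) / math.sqrt(
        n_tokens * gamma * (1 - gamma))

for fraction in (1.0, 0.5, 0.2, 0.0):
    print(fraction, round(expected_z(200, fraction), 2))
```

In this model the expected score shrinks in direct proportion to the share of tokens left untouched, which is why a rewrite does not need to be thorough to push a passage below a detection threshold.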
Turnitin retrained specifically on humanizer outputs in August 2025, but detection of humanized text still falls significantly compared to unmodified AI output. The fundamental asymmetry persists: it is easier to perturb text than to detect perturbation.4
What comes next
The field is evolving in several directions simultaneously, driven by both technical innovation and regulatory pressure.
Regulatory landscape
Technical frontiers
Content Credentials (C2PA): The Coalition for Content Provenance and Authenticity, backed by Adobe and others, embeds provenance metadata directly into files at creation time. Unlike watermarking, C2PA tracks the full editing chain, not just the original generation. This approach is gaining traction for images and video, though text adoption remains limited.
Diffusion-based LLMs: New architectures like LLaDA generate text through diffusion processes rather than autoregressive token prediction. Early research shows these models produce text with perplexity and burstiness profiles that more closely mimic human writing, leading to high false-negative rates in existing detectors. This suggests that detection methods designed for autoregressive models may need fundamental rethinking.12
Hybrid detection: The most promising approaches combine multiple signals — statistical features, trained classifiers, stylometric analysis, citation verification, and factual consistency checks — rather than relying on any single method. The PAN 2026 shared task continues to push detector robustness against adversarial modifications.13
Summary
Here is what we know, stated plainly.
Detection works moderately well on unmodified AI text. The best tools achieve 85–95% accuracy on raw, unedited outputs from current LLMs. That is useful, but not definitive.
Detection degrades significantly under adversarial conditions. Paraphrasing, humanizer tools, back-translation, and mixed human-AI content all reduce accuracy to levels that many experts consider unreliable for high-stakes decisions.
False positives are a real and systematic problem. Non-native English speakers, formal writers, technical authors, and diligent students are disproportionately affected. Any deployment of detection tools must account for this.
Watermarking is the most promising technical direction, but it only works if providers deploy it and it remains fragile to paraphrasing attacks.
No tool should be used as the sole basis for consequential decisions. Detection scores are indicators, not proof. They belong in a workflow that includes human judgment, contextual assessment, and due process.
That may change. Provenance standards, universal watermarking, and better hybrid detectors may shift the balance. But for now, the most responsible approach is to use detection tools as one input among many, never as a verdict, and to invest as much in fair process as in better algorithms.
1. Dugan L, Hwang A, Trhlik F, et al. RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors. Proceedings of the 62nd Annual Meeting of the ACL. 2024.
2. Hastewire. Understanding Perplexity and Burstiness in AI Text Detection. January 2026.
3. Pangram Labs. Why Perplexity and Burstiness Fail to Detect AI. March 2025.
4. EyeSift. AI Detector Accuracy Benchmarks 2026. May 2026.
5. Mitchell E, Lee Y, Khazatsky A, Manning CD, Finn C. DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature. Proceedings of ICML. 2023.
6. Bao G, Zhao Y, Zhong S, et al. Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature. ICLR. 2024.
7. Kirchenbauer J, Geiping J, Wen Y, et al. A Watermark for Large Language Models. Proceedings of ICML. 2023.
8. Google DeepMind. Watermarking AI-generated text and video with SynthID. 2024.
9. SRI Lab, ETH Zurich. Probing Google DeepMind’s SynthID-Text Watermark. December 2024.
10. Liang W, Yuksekgonul M, Mao Y, Wu E, Zou J. GPT detectors are biased against non-native English writers. Patterns. 2023;4(7):100779.
11. Originality.ai. AI Detection Accuracy Studies: Meta-Analysis of 14 Studies. April 2026.
12. CurveMark: Detecting AI-Generated Text via Probabilistic Curvature and Dynamic Semantic Watermarking. Applied Sciences. 2025.
13. Overview of PAN 2026: Voight-Kampff Generative AI Detection, Text Watermarking, Multi-Author Writing Style Analysis. 2026.