May 24, 2026

On Detecting AI Writing

Why detecting AI text matters

In 2023, GPT-4 was released. By 2024, every major LLM provider had models that could produce fluent, structured, contextually appropriate prose on nearly any topic. By 2025, the RAID benchmark — the largest independent evaluation of AI text detectors — showed that LLM-generated text had been able to fool humans at rates above 77% for several years running.¹

This creates a real problem across multiple domains. In education, institutions need to evaluate whether student work reflects genuine understanding. In publishing, editors need to know whether submitted articles are original. In regulation and law, the provenance of a document can matter enormously. In information ecosystems more broadly, the ability to distinguish human expression from machine generation has become a question about the integrity of public discourse.

77%

Peak human accuracy at identifying AI text

10M+

Documents in the RAID benchmark

85%

Best detector accuracy on RAID at 5% FPR

The challenge is this: the same qualities that make AI writing useful — fluency, coherence, grammatical correctness — are exactly the qualities that make it hard to detect. And the stakes of getting detection wrong cut in both directions. A false negative lets AI-generated content pass as human. A false positive accuses a human writer of cheating.

How detection actually works under the hood

There is no single method for detecting AI-generated text. Instead, the field has developed several distinct families of approaches, each with different assumptions, strengths, and failure modes. Understanding these families is essential before evaluating any commercial tool.

Statistical metrics

Measure text properties like perplexity and burstiness against known baselines for human vs. AI writing.

No training needed

Trained classifiers

Neural networks (often fine-tuned transformers) trained on labelled datasets of human and AI text.

Supervised

Zero-shot detection

Use the generating model itself to assess whether text occupies a "likely" region of its output space.

Model-based

Watermarking

Embed imperceptible statistical signals during generation that can later be detected with a secret key.

Proactive

In practice, most commercial detection tools combine multiple approaches. But each one has a distinct logic, and the limitations of one are not always offset by the strengths of another.

Perplexity: the surprise metric

Perplexity is the most fundamental concept in AI text detection. It measures how predictable a piece of text is to a language model. Formally, perplexity is the exponentiated average negative log-likelihood of the tokens in a sequence:

PPL(x) = exp( -(1/N) · Σ_{i=1..N} log P(x_i | x_1, ..., x_{i-1}) )

In simpler terms: for each word in a passage, a language model assigns a probability to that word given everything that came before it. If the model finds the word highly probable, the log-probability is close to zero. If the word is surprising, the log-probability is very negative. Perplexity aggregates these surprises into a single number.²
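To make the arithmetic concrete, here is a minimal sketch that applies the formula to hand-picked probabilities. The two probability lists are invented for illustration and do not come from any real model.

```python
import math

def perplexity(token_probs):
    """Exponentiated average negative log-likelihood of a token sequence."""
    avg_neg_log_likelihood = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_likelihood)

# Hypothetical per-token probabilities, invented for illustration.
predictable = [0.9, 0.8, 0.85, 0.9]   # the model finds every token likely
surprising = [0.9, 0.05, 0.6, 0.01]   # two tokens the model did not expect

print(f"predictable: {perplexity(predictable):.2f}")
print(f"surprising:  {perplexity(surprising):.2f}")
```

A useful sanity check: if a model assigns every token a probability of 0.25, the perplexity is 4. In that sense perplexity reads as "the model was, on average, choosing among this many equally likely options."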

Key insight: AI-generated text tends to have low perplexity because language models generate text by choosing high-probability tokens. Human writing, with its idiosyncratic word choices, tangents, and stylistic quirks, tends to have higher perplexity.

This is the core detection signal. When you run a passage through a reference language model (typically GPT-2, since it’s openly available), AI-generated text clusters toward the low-perplexity end, while human text spreads more broadly across higher perplexity values.

Figure 1. Schematic perplexity distributions. AI text clusters low; human text is more spread. The overlap zone is where detection errors concentrate.

But here is the critical problem: the distributions overlap. Human text that happens to be clean, formal, and predictable — academic prose, technical writing, formulaic business emails — can score just as low on perplexity as AI-generated text. And AI text generated with high temperature or creative sampling can look surprisingly “human” in its perplexity profile.³

Burstiness: the rhythm metric

If perplexity measures how surprising each word is, burstiness measures how much that surprise varies across a passage. Human writing naturally “bursts” — a long complex sentence followed by a punchy short one, a technical paragraph followed by a casual aside. This creates high variance in sentence-level metrics.

AI-generated text tends to be more metronomic. Sentence lengths cluster around a narrower band. Structural patterns repeat. The rhythm is even, almost suspiciously so.²,³

How detectors combine them: Most detection tools compute a composite score from perplexity and burstiness together. A passage that is both low in perplexity and low in burstiness is much more likely to be flagged than one that shows either signal alone. This dual approach reduces false positives, but does not eliminate them.
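As a rough sketch of the idea, burstiness can be computed as the spread of per-sentence scores and combined with perplexity in a simple two-condition rule. The sentence splitter, the cutoff values, and the use of sentence length as a stand-in for sentence-level perplexity are all assumptions made for this example. Commercial tools do not publish their exact formulas.

```python
import re
import statistics

def split_sentences(text):
    """Naive splitter on ., ! or ? followed by whitespace (an assumption)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def burstiness(sentence_scores):
    """Population standard deviation of per-sentence scores (0.0 if fewer than two)."""
    if len(sentence_scores) < 2:
        return 0.0
    return statistics.pstdev(sentence_scores)

def looks_machine_like(perplexity, burst, ppl_cutoff=60.0, burst_cutoff=5.0):
    """Flag only when BOTH signals are low. The cutoffs are illustrative, not calibrated."""
    return perplexity < ppl_cutoff and burst < burst_cutoff

even = ("The report covers three topics. The data shows a clear trend. "
        "The team plans more work.")
uneven = ("No. I spent the whole of last winter arguing with the council "
          "about that bridge, and nothing came of it. Typical.")

# Sentence length in words stands in for sentence-level perplexity here.
even_lengths = [len(s.split()) for s in split_sentences(even)]
uneven_lengths = [len(s.split()) for s in split_sentences(uneven)]

print(f"even:   {burstiness(even_lengths):.2f}")
print(f"uneven: {burstiness(uneven_lengths):.2f}")
```

To score real text, replace the word counts with per-sentence perplexities from a language model. The rule in looks_machine_like stays the same.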

Trained classifiers

The second major family of detection methods uses supervised machine learning. You take a large dataset of human-written text and AI-generated text, label them, and train a classifier (typically a fine-tuned transformer model like RoBERTa) to distinguish between the two classes.¹,⁴

This is the approach behind many commercial detectors. GPTZero, for instance, combines statistical features (perplexity, burstiness) with a trained transformer backbone. Turnitin integrates a classifier into its existing plagiarism infrastructure. Originality.ai trains on specific model outputs and retrains as new LLMs emerge.⁴

The strength of this approach is that classifiers can learn subtle stylistic patterns that simple metrics miss — token-level distributions, syntactic preferences, discourse-level structures. The weakness is that they can overfit to the models they were trained on. A classifier trained heavily on GPT-3.5 outputs may struggle with Claude or Gemini. And when a new model generation arrives, the classifier’s accuracy can drop until it is retrained.

A common misconception: When a detector reports "94% likely AI-generated," most people read that as "there is a 94% chance this is AI." That is not what it means. The score is the model's internal confidence, not the posterior probability of AI authorship. Interpreting it as a probability requires knowing the base rate of AI text in the population being tested, something most tools do not account for.
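The base-rate point can be worked through with Bayes' rule. The detector in this sketch is hypothetical: the 90% detection rate and 5% false-positive rate are chosen only to show how much the answer moves with the base rate.

```python
def posterior_ai_probability(base_rate, true_positive_rate, false_positive_rate):
    """P(AI | flagged) by Bayes' rule."""
    flagged_ai = base_rate * true_positive_rate
    flagged_human = (1.0 - base_rate) * false_positive_rate
    return flagged_ai / (flagged_ai + flagged_human)

# Hypothetical detector: catches 90% of AI text, wrongly flags 5% of human text.
for base_rate in (0.05, 0.20, 0.50):
    p = posterior_ai_probability(base_rate, 0.90, 0.05)
    print(f"base rate {base_rate:.0%}: P(AI | flagged) = {p:.0%}")
```

With these assumed rates, a flag means roughly a coin flip when only 1 in 20 submissions is AI-written, and about 95% only when half of them are. The same detector output supports very different conclusions depending on who is being tested.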

Zero-shot methods: DetectGPT and its descendants

In 2023, Mitchell and colleagues at Stanford introduced DetectGPT, a fundamentally different approach. The core insight was elegant: if a passage was generated by a particular language model, then small perturbations of that passage should tend to decrease the model’s log-probability. For human-written text, perturbations could go either way.⁵

DetectGPT works by generating many slight rewrites of the suspect text (using a separate model like T5 to fill in masked words), then comparing the log-probabilities of the original against these perturbations. If the original consistently scores higher, it is likely sitting at a local maximum of the model’s probability surface — a telltale sign of machine generation.

Figure 2. DetectGPT intuition. AI text sits at a local probability peak; perturbations fall downhill. Human text does not show this pattern.
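The comparison at the heart of this method can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the scorer and the perturbation function below are toy stand-ins invented for the example, where the real method uses a language model's log-probability and a mask-filling model such as T5.

```python
import random
import statistics

def perturbation_discrepancy(text, log_prob_fn, perturb_fn, n_perturbations=20, seed=0):
    """Original log-probability minus the mean log-probability of perturbed copies.

    A clearly positive value suggests the text sits near a local maximum
    of the scoring model's probability surface.
    """
    rng = random.Random(seed)
    original = log_prob_fn(text)
    perturbed = [log_prob_fn(perturb_fn(text, rng)) for _ in range(n_perturbations)]
    return original - statistics.mean(perturbed)

# --- Toy stand-ins, invented for illustration only ---
LIKELY_WORDS = {"the", "cat", "sat", "on", "mat", "a", "dog", "ran"}
UNLIKELY_WORDS = ["zephyr", "quixotic", "gallimaufry"]

def toy_log_prob(text):
    """Pretend scorer: words the toy model 'likes' cost -1, all others cost -5."""
    return sum(-1.0 if w in LIKELY_WORDS else -5.0 for w in text.lower().split())

def toy_perturb(text, rng):
    """Replace one randomly chosen word with a random word from a small pool."""
    words = text.split()
    pool = sorted(LIKELY_WORDS) + UNLIKELY_WORDS
    words[rng.randrange(len(words))] = rng.choice(pool)
    return " ".join(words)

machine_like = "the cat sat on the mat"        # already at the toy model's peak
human_like = "my uncle grows stubborn corn"    # nowhere near the peak

print(perturbation_discrepancy(machine_like, toy_log_prob, toy_perturb))
print(perturbation_discrepancy(human_like, toy_log_prob, toy_perturb))
```

In this toy setup the first passage can only lose probability when perturbed, so its discrepancy cannot be negative. The second can only gain, so its discrepancy cannot be positive. Real text is far less clean, which is why the method averages over many perturbations.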

This approach achieved strong results (0.95 AUROC for detecting GPT-NeoX-generated news articles) but was computationally expensive, requiring around 100 model calls per passage. Fast-DetectGPT (Bao et al., 2024) replaced the perturbation step with a single conditional probability curvature computation, achieving comparable accuracy 340× faster.⁶

The fundamental limitation of zero-shot methods is their vulnerability to adversarial modification. Simple paraphrasing can reduce DetectGPT’s detection rate from over 70% to near chance levels.⁵

Watermarking: detecting by design

All the methods discussed so far are post-hoc: they try to detect AI text after it has been generated. Watermarking takes a fundamentally different approach: it embeds a detectable signal during generation, so that detection later becomes a matter of checking for the signal rather than guessing about authorship.⁷

The green-list / red-list method

The most influential watermarking scheme was proposed by Kirchenbauer et al. in 2023. Before generating each token, the method uses a hash of the preceding token to divide the vocabulary into a “green list” and a “red list.” It then adds a small bias to the logits of green-list tokens, making them slightly more likely to be selected. Over many tokens, this creates a statistical excess of green-list tokens that is invisible to human readers but detectable with the right key.⁷

Figure 3. Kirchenbauer et al. watermarking. Each token's predecessor determines a green/red vocabulary split via hashing. Green tokens get a logit boost. The statistical excess is invisible to readers but detectable with the hash key.
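The scheme is easier to follow as a toy simulation. In this sketch the vocabulary is a range of integer ids and the "model" has uniform logits. Both are assumptions that keep the example self-contained, as are the key, the green fraction, and the bias value. Detection uses a one-proportion z-test on the count of green tokens.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000         # toy vocabulary of integer token ids (assumption)
GREEN_FRACTION = 0.5      # share of the vocabulary marked green at each step
SECRET_KEY = "demo-key"   # hypothetical key shared by generator and detector

def green_list(prev_token):
    """Derive the green set for the next position from the previous token."""
    digest = hashlib.sha256(f"{SECRET_KEY}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(VOCAB_SIZE * GREEN_FRACTION)))

def generate(length, bias, seed=0):
    """Toy generator: uniform logits, with `bias` added to green-token logits."""
    rng = random.Random(seed)
    tokens = [rng.randrange(VOCAB_SIZE)]
    for _ in range(length):
        green = green_list(tokens[-1])
        weights = [math.exp(bias) if t in green else 1.0 for t in range(VOCAB_SIZE)]
        tokens.append(rng.choices(range(VOCAB_SIZE), weights=weights)[0])
    return tokens

def z_score(tokens):
    """How far the green-token count sits above what chance would produce."""
    n = len(tokens) - 1  # the first token has no predecessor to hash
    hits = sum(tokens[i] in green_list(tokens[i - 1]) for i in range(1, len(tokens)))
    expected = GREEN_FRACTION * n
    return (hits - expected) / math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))

print(f"watermarked z = {z_score(generate(200, bias=2.0)):.1f}")
print(f"unmarked    z = {z_score(generate(200, bias=0.0)):.1f}")
```

The detector never needs the model itself, only the key and the token sequence. Because each green list depends on the preceding token, edits that change tokens also change which lists apply downstream, so heavy rewriting pulls the z-score back toward zero.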

Google DeepMind’s SynthID

SynthID, developed by Google DeepMind, extends watermarking to production scale. Deployed in Google’s Gemini models since 2024 and open-sourced in late 2024, SynthID uses tournament sampling with pseudorandom functions to guide token generation in a way that creates a detectable statistical signature without retraining the model.⁸

SynthID is notable for being the first watermark deployed at production scale in a commercial LLM product. However, independent assessment has shown that it remains vulnerable to meaning-preserving attacks like paraphrasing and back-translation, and is “easier to scrub than other state-of-the-art schemes even for naive adversaries.”⁹

The fundamental watermark limitation: Watermarks only work for text that you generated. They require control over the generation process. You cannot retroactively watermark text from a model you don't control. And any sufficiently thorough paraphrase or rewrite can remove the signal.

Code demo: computing perplexity

To make these concepts concrete, here is a working Python example that computes the per-token log-probability and overall perplexity of a text passage using GPT-2. This is essentially what the statistical layer of many detection tools does under the hood.

import torch
import math
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

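def compute_perplexity(text: str) -> dict:
    """Return GPT-2 perplexity and per-token log-probabilities for text."""
    # GPT-2 has a 1,024-token context window, so longer inputs are truncated.
    tokens = tokenizer.encode(
        text, return_tensors="pt", truncation=True, max_length=1024
    )
    if tokens.shape[1] < 2:
        raise ValueError("Need at least two tokens to score a passage.")

    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss,
        # i.e. the average negative log-likelihood per predicted token.
        outputs = model(tokens, labels=tokens)
        loss = outputs.loss.item()
        logits = outputs.logits

    # Logits at position i predict token i + 1, so drop the last logit and
    # the first label to line predictions up with their targets.
    shift_logits = logits[:, :-1, :]
    shift_labels = tokens[:, 1:]
    log_probs = torch.log_softmax(shift_logits, dim=-1)
    token_log_probs = log_probs.gather(
        2, shift_labels.unsqueeze(-1)
    ).squeeze(-1)[0]

    # The first token has no preceding context, so it is not scored.
    token_strings = [
        tokenizer.decode([t]) for t in tokens[0, 1:].tolist()
    ]

    # Perplexity is the exponentiated average negative log-likelihood.
    perplexity = math.exp(loss)

    return {
        "perplexity": round(perplexity, 2),
        "avg_log_prob": round(-loss, 4),
        "tokens": [
            {"token": tok, "log_prob": round(lp.item(), 4)}
            for tok, lp in zip(token_strings, token_log_probs)
        ],
    }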

ai_text = (
    "The impact of climate change on global agriculture "
    "is a significant concern for policymakers and "
    "researchers worldwide. Rising temperatures and "
    "changing precipitation patterns are expected to "
    "affect crop yields in many regions."
)

human_text = (
    "Look, I know everyone keeps saying the farms are "
    "in trouble. And yeah, they probably are. But my "
    "uncle's been growing corn since '84 and he says "
    "the real problem isn't the heat — it's the banks."
)

result_ai = compute_perplexity(ai_text)
result_human = compute_perplexity(human_text)
print(f"AI-like perplexity:    {result_ai['perplexity']}")
print(f"Human-like perplexity: {result_human['perplexity']}")
print(f"Ratio: {result_human['perplexity'] / result_ai['perplexity']:.1f}x")

What to expect: The "AI-like" text (formal, predictable, generic) will typically score a perplexity of ~30–60. The "human-like" text (idiosyncratic, conversational, contains proper nouns and contractions) will typically score ~80–200 or higher. The ratio between them illustrates the core detection signal. You can extend this demo by adding a burstiness calculation (computing the variance of sentence-level perplexity scores) and combining the two into a simple classifier. The full code is available on GitHub and Google Colab.

The platforms: do commercial detectors work?

Multiple companies now offer AI text detection as a service. The three most widely used are GPTZero, Turnitin, and Originality.ai, each built for different contexts and each making different trade-offs between sensitivity and false-positive rates.⁴

Platform	Primary audience	Detection rate	False positive rate	RAID rank
GPTZero	Educators, individuals	~88–95%	~0.2–2%	Top 3
Turnitin AI	Institutions (bundled)	~77–98%	Variable	Strong
Originality.ai	Content marketing, SEO	~85–94%	~2–3%	#1 overall
Winston AI	General, publishing	~96%	~3–4%	Good
Copyleaks	Enterprise, multilingual	~99% claimed	~0.03%	Moderate
ZeroGPT	Free, casual use	~80–85%	~15–20%	Lower

"No detection tool is a substitute for editorial process, writing guidelines, and writer accountability."

What the RAID benchmark tells us

The RAID benchmark (Dugan et al., 2024) is the most rigorous independent evaluation of AI text detectors. It contains over 10 million documents spanning 11 generative models, 8 writing domains, 12 adversarial attacks, and 4 decoding strategies. Its key findings are sobering.¹

First, many detectors that claim 99%+ accuracy on their own test sets perform far worse on RAID. At a 5% false positive rate, the best detector achieved about 85% accuracy overall. Second, simple adversarial modifications — changing the sampling strategy, adding a repetition penalty, or running text through a paraphraser — caused substantial drops in detection accuracy across all tools tested. Third, detectors showed particular difficulty generalizing to unseen models and domains they were not trained on.

The RAID lesson: If a detector hasn't been evaluated on RAID, be skeptical of its accuracy claims. Any number quoted without a specified false-positive rate is essentially meaningless for comparison purposes.
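To see why the false-positive rate has to be pinned down first, here is a small sketch of the usual procedure: choose the threshold from scores on known human text, then measure how much AI text clears it. All scores below are made up for illustration, and the function names are hypothetical.

```python
def threshold_at_fpr(human_scores, target_fpr):
    """Pick a threshold so that at most target_fpr of human texts score above it.

    Scores are "AI-likeness" scores: a text is flagged when score > threshold.
    """
    ranked = sorted(human_scores, reverse=True)
    # Number of human texts we may flag (epsilon guards against float rounding).
    allowed = min(int(target_fpr * len(ranked) + 1e-9), len(ranked) - 1)
    return ranked[allowed]

def detection_rate(ai_scores, threshold):
    """Share of AI texts scoring strictly above the threshold."""
    return sum(s > threshold for s in ai_scores) / len(ai_scores)

# Made-up detector scores for 20 human texts and 10 AI texts.
human = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.22, 0.25, 0.30,
         0.33, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.70, 0.80, 0.97]
ai = [0.30, 0.55, 0.75, 0.82, 0.85, 0.88, 0.90, 0.93, 0.95, 0.99]

for fpr in (0.05, 0.20):
    t = threshold_at_fpr(human, fpr)
    print(f"FPR <= {fpr:.0%}: threshold {t:.2f}, detection rate {detection_rate(ai, t):.0%}")
```

On this toy data the same detector catches 70% of AI text at a 5% false-positive rate and 80% at 20%. A headline accuracy figure that omits the operating point could be describing either setting.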

Curtin University drops Turnitin

In 2026, Curtin University in Australia became one of the most prominent institutions to stop using Turnitin’s AI detection module, citing ongoing concerns about reliability. This decision reflects a growing pattern: as the stakes of false accusations become clearer, institutions are pulling back from automated detection as a disciplinary tool.⁴

The false positive problem

If there is a single issue that undermines trust in AI text detection, it is false positives. A false positive means flagging genuinely human-written text as AI-generated. And the evidence for systematic bias in this direction is concerning.

Non-native English speakers

A 2023 study by Stanford researchers (Liang et al.) found that seven AI detectors flagged 61% of essays by non-native English speakers as AI-generated. On roughly 20% of those essays, all seven detectors agreed the text was AI, and they were all wrong.¹⁰

The mechanism is straightforward. Non-native English writers tend to use simpler vocabulary, more uniform sentence structures, and fewer idiosyncratic turns of phrase, which is the exact statistical profile that detectors associate with AI output. In terms of perplexity and burstiness, writing by language learners looks like machine-generated text because both prioritize safe, high-probability word choices.³,¹⁰

61%

ESL essays falsely flagged as AI

5–15%

Overall false positive rate on human essays

Technical and academic writing

The same problem affects native English speakers who write in formal, structured styles. Technical documentation, legal prose, scientific writing, and formulaic academic essays all score low on burstiness and can score low on perplexity — because the genre demands uniform, precise language. Detectors have flagged portions of the U.S. Constitution and the King James Bible as AI-generated.³

High-achieving students

Perhaps the cruelest irony: studies have found an association between higher grades and higher false detection rates. Students who follow academic writing conventions diligently — removing fluff, standardizing structure, using precise vocabulary — are inadvertently optimizing their text to look like AI output.³

The equity concern: Detection tools that systematically disadvantage non-native speakers, disabled writers, and diligent students are not just inaccurate. They raise fundamental questions about fairness, especially when used as evidence for academic misconduct proceedings. Multiple institutions, including Michigan State, Yale, and Penn State, have explicitly stated that AI detection results should never be the sole basis for disciplinary action.

The arms race: evasion and humanizers

The detection problem is not static. As detection tools improve, evasion tools emerge in response. A growing industry of “AI humanizer” tools specifically advertises the ability to rewrite AI-generated text to bypass detection. And the evidence suggests they are often effective.

Independent benchmarks show that after a humanizer pass, detection rates drop by 15–35% across all major tools. For heavily paraphrased content, detection rates can fall to 50–65%, with considerable variation between tools. At that level of modification, no tool maintains reliable accuracy.⁴,¹¹

Why evasion is easy

The fundamental reason evasion works is that the statistical signals detectors rely on — low perplexity, low burstiness, specific token distribution patterns — are fragile. They can be disrupted by:

Paraphrasing: Rewriting at the sentence level changes the token sequence while preserving meaning, destroying both perplexity-based and watermark-based signals.

Back-translation: Translating to another language and back introduces natural variation that mimics human burstiness.

Style injection: Prompting the AI to write “like a specific person” or “with deliberate errors” can shift perplexity upward.

Mixing: Combining human and AI text in a single document creates a chimera that most detectors struggle with.

Turnitin retrained specifically on humanizer outputs in August 2025, but detection of humanized text still falls significantly compared to unmodified AI output. The fundamental asymmetry persists: it is easier to perturb text than to detect perturbation.⁴

What comes next

The field is evolving in several directions simultaneously, driven by both technical innovation and regulatory pressure.

Regulatory landscape

March 2024

U.S. bipartisan bill introduced requiring clear labeling of AI-generated video, audio, and images with digital watermarks or metadata.

March 2025

Spain's government approved a bill that would create one of Europe's strictest AI labeling regimes, with fines up to €35 million or 7% of global revenue for failure to label AI content.

May 2025

YouTube began requiring creators to disclose when generative AI significantly alters or simulates realistic content.

January 2026

California's AI Transparency Act (SB 942) took effect, introducing "latent disclosure" — invisible digital markers embedded in AI-generated images.

Technical frontiers

Content Credentials (C2PA): The Coalition for Content Provenance and Authenticity, backed by Adobe and others, embeds provenance metadata directly into files at creation time. Unlike watermarking, C2PA tracks the full editing chain, not just the original generation. This approach is gaining traction for images and video, though text adoption remains limited.

Diffusion-based LLMs: New architectures like LLaDA generate text through diffusion processes rather than autoregressive token prediction. Early research shows these models produce text with perplexity and burstiness profiles that more closely mimic human writing, leading to high false-negative rates in existing detectors. This suggests that detection methods designed for autoregressive models may need fundamental rethinking.¹²

Hybrid detection: The most promising approaches combine multiple signals — statistical features, trained classifiers, stylometric analysis, citation verification, and factual consistency checks — rather than relying on any single method. The PAN 2026 shared task continues to push detector robustness against adversarial modifications.¹³

The provenance shift: The long-term trend may be away from detection (guessing after the fact) and toward provenance (proving origin at creation). If every major LLM provider embeds watermarks, and every major platform checks for them, the detection problem partially dissolves into an infrastructure problem. But this requires industry-wide coordination that does not yet exist.

Summary

Here is what we know, stated plainly.

Detection works moderately well on unmodified AI text. The best tools achieve 85–95% accuracy on raw, unedited outputs from current LLMs. That is useful, but not definitive.

Detection degrades significantly under adversarial conditions. Paraphrasing, humanizer tools, back-translation, and mixed human-AI content all reduce accuracy to levels that many experts consider unreliable for high-stakes decisions.

False positives are a real and systematic problem. Non-native English speakers, formal writers, technical authors, and diligent students are disproportionately affected. Any deployment of detection tools must account for this.

Watermarking is the most promising technical direction, but it only works if providers deploy it, and it remains fragile to paraphrasing attacks.

No tool should be used as the sole basis for consequential decisions. Detection scores are indicators, not proof. They belong in a workflow that includes human judgment, contextual assessment, and due process.

The real question is not "can we detect AI writing?" It is "can we detect it reliably enough, across diverse populations and adversarial conditions, to base real consequences on the result?" The honest answer, as of mid-2026, is: not yet.

That may change. Provenance standards, universal watermarking, and better hybrid detectors may shift the balance. But for now, the most responsible approach is to use detection tools as one input among many, never as a verdict, and to invest as much in fair process as in better algorithms.

1. Dugan L, Hwang A, Trhlik F, et al. RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors. Proceedings of the 62nd Annual Meeting of the ACL. 2024.
2. Hastewire. Understanding Perplexity and Burstiness in AI Text Detection. January 2026.
3. Pangram Labs. Why Perplexity and Burstiness Fail to Detect AI. March 2025.
4. EyeSift. AI Detector Accuracy Benchmarks 2026. May 2026.
5. Mitchell E, Lee Y, Khazatsky A, Manning CD, Finn C. DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature. Proceedings of ICML. 2023.
6. Bao G, Zhao Y, Zhong S, et al. Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature. ICLR. 2024.
7. Kirchenbauer J, Geiping J, Wen Y, et al. A Watermark for Large Language Models. Proceedings of ICML. 2023.
8. Google DeepMind. Watermarking AI-generated text and video with SynthID. 2024.
9. SRI Lab, ETH Zurich. Probing Google DeepMind’s SynthID-Text Watermark. December 2024.
10. Liang W, Yuksekgonul M, Mao Y, Wu E, Zou J. GPT detectors are biased against non-native English writers. Patterns. 2023;4(7):100779.
11. Originality.ai. AI Detection Accuracy Studies: Meta-Analysis of 14 Studies. April 2026.
12. CurveMark: Detecting AI-Generated Text via Probabilistic Curvature and Dynamic Semantic Watermarking. Applied Sciences. 2025.
13. Overview of PAN 2026: Voight-Kampff Generative AI Detection, Text Watermarking, Multi-Author Writing Style Analysis. 2026.
