What Are the Best AI Detectors?

6 Popular AI Detectors Are Put to the Test!

Detecting content written by artificial intelligence (AI) is a hot topic, and if you’re anything like us, you might be keen to know which are the best AI detectors out there. After all, they could be handy for answering questions like…

Is your natural writing style likely to result in false accusations of using AI to write on your behalf?
Are your students using AI to cheat?
If you use AI to write for you, will people know that you did?

We reviewed six leading AI detectors to see how they perform. We put them to the test against a dozen articles (eight human-generated articles and four AI-generated articles). After organising the results, clear patterns emerged—and you can use them to help you determine whether or not the content you’re reading was probably written by a human.

This knowledge is not just academic. It’s also crucial in practical scenarios, to ensure you don’t end up like this university professor who went viral for all the wrong reasons. (He incorrectly accused half his class of using AI, which put them at risk of failing his course.)

Key things you need to know when using tools for detecting AI content

Never put 100% of your trust in these tools. They aren’t perfect.
As the large language models we use for writing get smarter, they may also get better at avoiding detection.
Some people have a writing style that’s more likely to result in incorrect accusations that they used AI! So, use caution when interpreting results from these tools.

Finally, read each tool’s documentation to verify that the results actually mean what you think they mean. Often it’s intuitive, but not always.

Method

AI detectors reviewed

Here are the six popular AI content detectors that we tested:

Sapling
GPTZero
Content at Scale
Copyleaks
Originality.ai
Undetectable AI

Articles reviewed

Human-generated articles: We tested 8 pieces of content written by 7 human authors on a variety of topics. Six articles were written for physicians, one was written for lay people, and one had nothing to do with medicine at all.

We were curious if the choice of topic or degree of technicality would make a difference to whether the detectors could correctly identify authorship. All articles were written before ChatGPT was released to the public in November 2022.

8 Human-generated articles tested

Article Number	Topic	Intended audience	Year of Publication
1	Shoulder dislocations	Healthcare professionals	2020
2	Spinal infections	Healthcare professionals	2021
3	Swine flu pandemic	Healthcare professionals	2019
4	Diuretics	Healthcare professionals	2015
5	Choosing medicine as a career	Healthcare professionals	2014
6	The common cold	Lay people	2016
7	ECG handout	Healthcare professionals	2017
8	Travel planning	Lay people	2012

4 AI-generated articles tested

Article Number	AI author	Topic	Comments
9	ChatGPT	Common cold	AI rewrite of article #6 above.
10	ChatGPT	Spinal infections	An original ChatGPT creation.
11	Gemini	Spinal infections	An original Gemini creation.
12	Gemini	Spinal infections	We asked Gemini to rewrite its original spinal infection article (#11) in a way that would evade AI detectors.

Results

Interpreting results from AI-content detection tools

Generally speaking, AI detectors analyse the text you provide and then indicate the probability that a human or an AI generated the text.

For example, if the detector says “50% AI”, that doesn’t mean an AI wrote half the text. What it actually means is that the tool thinks there’s a 50% chance an AI wrote the text and a 50% chance a human wrote the text. In other words, the tool isn’t very sure about who (or what) wrote the text.

The tables below show each detector’s estimated probability that an article was AI-generated, ranging from 0% (human) to 100% (AI). Some tools report only a label, such as “human” or “AI”, rather than a percentage.

Human articles

Human Article	Sapling	GPTZero	Content at Scale	Copyleaks	Originality.ai	Undetectable AI
1	AI: 57.1%	AI: 0%	human	human	AI: 0%	human
2	AI: 50.6%	AI: 2%	human	human	AI: 96%	human
3	AI: 3.6%	AI: 3%	human	human	AI: 6%	human
4	AI: 2.1%	AI: 1%	human	human	AI: 3%	AI
5	AI: 0%	AI: 2%	human	human	AI: 0%	AI
6	AI: 28.1%	AI: 1%	human	human	AI: 29%	AI
7	AI: 0%	AI: 1%	human	human	AI: 0%	human
8	AI: 3.9%	AI: 1%	human	human	AI: 0%	human

AI-generated articles

AI Article	Sapling	GPTZero	Content at Scale	Copyleaks	Originality.ai	Undetectable AI
9	AI: 100%	AI: 89%	human	human	AI: 100%	AI
10	AI: 99.7%	AI: 83%	“hard to tell”	“AI content detected”	AI: 96%	AI
11	AI: 100%	AI: 100%	human	“AI content detected”	AI: 100%	AI
12	AI: 99.7%	AI: 81%	human	“AI content detected”	AI: 100%	human
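
If you’d like to tally the tables yourself, here’s a minimal Python sketch. It rests on two assumptions of ours that the detectors themselves don’t specify: a score of 50% or more counts as an “AI” verdict, and “hard to tell” counts as a miss. The names AI_THRESHOLD, verdict, and tally are purely illustrative.

```python
# Results transcribed from the two tables above.
# Numbers are the reported "AI: x%" scores; strings are the tools' own labels.
HUMAN_ARTICLES = {
    "Sapling": [57.1, 50.6, 3.6, 2.1, 0, 28.1, 0, 3.9],
    "GPTZero": [0, 2, 3, 1, 2, 1, 1, 1],
    "Content at Scale": ["human"] * 8,
    "Copyleaks": ["human"] * 8,
    "Originality.ai": [0, 96, 6, 3, 0, 29, 0, 0],
    "Undetectable AI": ["human", "human", "human", "AI", "AI", "AI", "human", "human"],
}
AI_ARTICLES = {
    "Sapling": [100, 99.7, 100, 99.7],
    "GPTZero": [89, 83, 100, 81],
    "Content at Scale": ["human", "hard to tell", "human", "human"],
    "Copyleaks": ["human", "AI content detected", "AI content detected", "AI content detected"],
    "Originality.ai": [100, 96, 100, 100],
    "Undetectable AI": ["AI", "AI", "AI", "human"],
}

AI_THRESHOLD = 50.0  # ASSUMPTION: scores at or above this count as an "AI" verdict


def verdict(result):
    """Reduce a score or label to 'AI', 'human', or 'unsure'."""
    if isinstance(result, (int, float)):
        return "AI" if result >= AI_THRESHOLD else "human"
    label = result.lower()
    if label.startswith("ai"):
        return "AI"
    if label == "human":
        return "human"
    return "unsure"  # e.g. "hard to tell"; counted as a miss below


def tally(detector):
    """Return (correct human verdicts out of 8, correct AI verdicts out of 4)."""
    human_ok = sum(verdict(r) == "human" for r in HUMAN_ARTICLES[detector])
    ai_ok = sum(verdict(r) == "AI" for r in AI_ARTICLES[detector])
    return human_ok, ai_ok


for name in HUMAN_ARTICLES:
    human_ok, ai_ok = tally(name)
    print(f"{name}: {human_ok}/8 human, {ai_ok}/4 AI, {human_ok + ai_ok}/12 overall")
```

Under these assumptions GPTZero scores 12/12 and Copyleaks 11/12. Originality.ai also reaches 11/12 at this cutoff, although its single miss was a 96% AI score on a human-written article. A different cutoff would shift the numbers for the tools that report percentages.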

Conclusion

The best AI detectors

The most accurate AI content detector

Based on our testing, GPTZero was the most accurate for detecting AI content as it correctly identified the origin of all eight human-generated articles and all four AI-generated articles.

The runner up

Copyleaks was almost flawless. It correctly classified all eight pieces of human-generated content, and only one of the AI-generated articles fooled it (#9, the ChatGPT rewrite of a human-written article).

IMPORTANT: Our test was small, so please don’t use these results to assume you can completely trust this (or any) AI detector. It’s quite possible that even GPTZero would have made mistakes if we had fed it enough articles.

You can increase the likelihood of coming to an accurate conclusion about the origin of an article if you run it through multiple AI detectors… but even if multiple detectors predict it’s likely AI-generated, we’d merely classify that content as “highly suspicious” until we could get more evidence.
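
As a rough illustration of the multi-detector approach, here’s a small Python sketch. The function name consensus, the min_agree cutoff, and the labels other than “highly suspicious” are our own assumptions, not features of any detector. Percentage scores from the results tables are treated as “AI” verdicts when they are 50% or higher, which is also an assumption.

```python
def consensus(verdicts, min_agree=2):
    """Summarise per-detector verdicts ("AI" or "human") for one article.

    min_agree is an ASSUMED cutoff for how many detectors must flag the
    text before we call it "highly suspicious"; it is not a tested value.
    """
    flagged = [name for name, v in verdicts.items() if v == "AI"]
    if len(flagged) >= min_agree:
        return "highly suspicious"
    if flagged:
        return "one flag only: check again"
    return "no AI flags"


# Article 12 (Gemini, rewritten to evade detectors), verdicts from the results table.
article_12 = {
    "Sapling": "AI", "GPTZero": "AI", "Content at Scale": "human",
    "Copyleaks": "AI", "Originality.ai": "AI", "Undetectable AI": "human",
}

# Article 2 (human-written). Sapling's 50.6% and Originality.ai's 96%
# both count as "AI" under the assumed 50% cutoff.
article_2 = {
    "Sapling": "AI", "GPTZero": "human", "Content at Scale": "human",
    "Copyleaks": "human", "Originality.ai": "AI", "Undetectable AI": "human",
}

print(consensus(article_12))  # highly suspicious
print(consensus(article_2))   # highly suspicious, even though a human wrote it
```

The second example is the reason “highly suspicious” is as far as we’d go: two detectors agreeing was still not proof for article 2.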

3 recommended uses of AI detectors

1. When reading content on an unfamiliar website, you can use tools for detecting AI content to help determine whether a human (preferably with experience in the subject matter!) likely wrote the content.

Our intention isn’t to say that AI-generated content is inherently bad. However, without human oversight it may contain errors, and you need to know whether you should be on “red alert” for them. For example, here’s a viral case where AI-generated tutorials contained instructions about software features that don’t even exist. That would be a mere annoyance for software users. But if we were to use a similar approach to generating medical content, results could obviously be disastrous and even life-threatening.

2. When evaluating someone else’s writing, you may find AI detectors useful in helping you figure out whether or not they had an AI do the writing for them. However, remember not to put your complete trust in any AI detector, because they sometimes make mistakes.

3. Finally, you may find it useful to put your own writing through an AI detector just to see how these tools classify it. After all, if other people might look, you might as well know what they’re going to find!

Sheralyn Guilleminot

BSc.Pharm (University of Manitoba), Pharmacist and Medical Writer

Mike Cadogan

BA MA (Oxon) MBChB (Edin) FACEM FFSEM. Emergency physician, Sir Charles Gairdner Hospital. Passion for rugby, medical history, medical education, and asynchronous learning. #FOAMed evangelist. Co-founder and CTO of Life in the Fast Lane.