Most now know that AI-generated responses can include false information; answer engines say as much directly in their user interfaces. But plagiarism is rarely mentioned - should it be?
Although responsible creators use generated copy sparingly (if at all), plagiarised elements could still seep through into their published work, potentially setting them up for a fall should the originality of said work come into question.
To assess the level of risk users face when employing AI answer engines for content creation, we pitted ChatGPT, Perplexity AI, and Google Gemini against multiple plagiarism detectors. Here’s what you should know.
Why plagiarism in generated responses would be problematic
Copyright infringement and potential legal risks for end users are the most immediate concerns when AI search tools produce plagiarised content. However, the repercussions could ripple further, disrupting the broader online ecosystem.
If generative AI presents identical text, it is stealing the unique value of the source material, meaning there’d be little reason for a user to engage with the creator’s work, thereby reducing traffic to websites.
Methodology
We fed a variety of prompt types into three of the most popular yet distinct AI answer engines.
Each prompt was used identically across all three answer engines, and every response was passed through three different plagiarism detectors.
Results were then assessed, and each response was placed into one or more of nine categories:
- Original: No flagged sources
- Incidental: Flagged source focuses on a completely unrelated topic
- Sufficient rephrasing but flagged: Effective rephrasing but close enough to the general angle of the source text that it still gets flagged
- Generic or universal phrasing: Words used exactly as they are in a relevant source - but not unique enough to be tied to any one piece of content.
- Conceptual paraphrasing: Text is effectively rephrased, but other conceptual elements are clearly traceable to a published source
- Insufficient rephrasing: Identical or similar wording used to express the same ideas but in a slightly different order
- Overt plagiarism (short sequences): Short sequences of identical or near-identical pieces of subject-relevant text
- Overt plagiarism (full sentences): Sentences that are identical or near-identical to subject-specific source text
- Overt plagiarism (entire passages): Full passages identical to subject-specific source text.
We then applied a scoring system to the categories to grade the overall plagiarism risk for each tested answer engine:
- Original = 0
- Incidental = 0
- Sufficient rephrasing but flagged = 0
- Generic or universal phrasing = 1
- Conceptual paraphrasing = 2
- Insufficient rephrasing = 3
- Overt plagiarism (short sequences) = 4
- Overt plagiarism (full sentences) = 5
- Overt plagiarism (entire passages) = 6
When a response falls into two categories, the more severe category is used for the calculation.
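The scoring rules above can be sketched in a few lines of Python. This is a minimal illustration, not the exact tooling used for the test: the function names are hypothetical, the example verdicts are invented, and summing the per-response scores into one figure per engine is an assumption about how the overall grade is formed.

```python
# Scores copied from the list above; all names here are illustrative.
SCORES = {
    "Original": 0,
    "Incidental": 0,
    "Sufficient rephrasing but flagged": 0,
    "Generic or universal phrasing": 1,
    "Conceptual paraphrasing": 2,
    "Insufficient rephrasing": 3,
    "Overt plagiarism (short sequences)": 4,
    "Overt plagiarism (full sentences)": 5,
    "Overt plagiarism (entire passages)": 6,
}


def response_score(categories):
    """Score one response; if it falls into several categories, the most severe counts."""
    return max(SCORES[category] for category in categories)


def engine_score(verdicts):
    """Assumed aggregation: sum the per-response scores across all rounds."""
    return sum(response_score(categories) for categories in verdicts)


# Invented verdicts for an imaginary engine over three rounds.
example_verdicts = [
    ["Original"],
    ["Incidental", "Generic or universal phrasing"],
    ["Insufficient rephrasing"],
]

print(engine_score(example_verdicts))  # 0 + 1 + 3 = 4
```

A lower total means a lower plagiarism risk under this scheme.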
AI tools tested:
- ChatGPT
- Perplexity AI
- Google Gemini
Plagiarism detectors used:
- Grammarly Plagiarism Checker - Detects plagiarism but does not elaborate
- SmallSEOTools Plagiarism Checker - Detects plagiarism, determines between direct and partial plagiarism, and shows sources
- Duplichecker - Detects direct and partial plagiarism, shows percentage overlap with source material for each instance of plagiarism
Prompt categories:
- Broad general knowledge
- Niche general knowledge
- Broad academic/technical
- Niche academic/technical
- Broad creative
- Specific creative
- Direct request to rephrase existing text
The test
Round 1: Broad general knowledge
Prompt: ‘Using fewer than 1000 words, write an overview of WWII’
- ChatGPT
- Grammarly: Significant plagiarism detected
- Plagiarism Checker: 32% plagiarised
- Duplichecker: 32% plagiarised
Analysis
SmallSEOTools’ Plagiarism Checker revealed a couple of ChatGPT’s text sequences were extremely similar to those of a flagged source, noting some direct overlap.
Here’s one example of a problem sequence:
- Generated text: ‘As Allied forces closed in on Berlin, Hitler committed suicide on April 30, 1945.’
- Source text: ‘As Allied forces closed in on Berlin, Hitler committed suicide on April 30, 1945, in his bunker.’
Although the direct overlap shows content was clearly taken from the source, it is unlikely to be considered plagiarism because it recounts a widely documented historical fact. The phrasing is generic and straightforward, describing an event with limited ways to express it concisely.
Verdict
Category 4: Generic or universal phrasing
- Perplexity AI
- Grammarly: Significant plagiarism detected
- Plagiarism Checker: 20% plagiarised — but no exact text matches
- Duplichecker: 25% plagiarised
Analysis
Plagiarism Checker caught a few vaguely similar phrases, but, considering the broadness of the topic and its content saturation online, we wouldn’t consider this plagiarism.
Here’s an example:
- Generated text: ‘The war began on September 1, 1939, when Germany invaded Poland, prompting Britain and France to declare war on Germany two days later.’
- Source text: ‘Britain and France declare war on Germany on 3 September 1939, two days after the German invasion of Poland.’
Duplichecker suggested the following breakdown for similarities between flagged text and its sources:
- iwm.org (50%)
- Cliffsnotes.com (17%)
- Nam.ac.uk (10%)
- Americanheritagemuseum.org (10%)
- Encyclopedia.ushmm.org (8%)
Interestingly, Perplexity only cited Nam.ac.uk and Encyclopedia.ushmm.org in its response. There could be several reasons for this, including a citation cap per response, or incidental use of common phrasing due to the broadness of the topic.
Verdict
Category 3: Sufficient rephrasing but flagged
- Google Gemini
- Grammarly: Significant plagiarism detected
- Plagiarism Checker: 15% plagiarised
- Duplichecker: 26% plagiarised
Analysis
With Gemini, we found small snippets of copied text within some of the response sentences, e.g.:
- Generated text: ‘World War II, a conflict of unprecedented scale and devastation, engulfed the globe from 1939 to 1945.’
- Source text: ‘This was the reality of WWII, a conflict that, from 1939 to 1945, engulfed the globe in a struggle for power, freedom, and survival.’
However, as with the text ChatGPT duplicated, the language lacks unique expressiveness, making it unlikely to warrant plagiarism claims.
Verdict
Category 4: Generic or universal phrasing
Round 2: Niche general knowledge
Prompt: ‘Using fewer than 1000 words, describe the impact of the Great Fire of Smyrna.’
- ChatGPT
- Grammarly: No plagiarism
- Plagiarism Checker: No plagiarism
- Duplichecker: No plagiarism
Verdict
Category 1: Original
- Perplexity AI
- Grammarly: Minor plagiarism detected
- Plagiarism Checker: 6% plagiarised
- Duplichecker: 10% plagiarised
Analysis
Both Plagiarism Checker and Duplichecker flagged minor instances of generic phrasing — which would not be considered plagiarism. Neither detector found any exact text matches.
Verdict
Category 4: Generic or universal phrasing
- Google Gemini
- Grammarly: Significant plagiarism detected
- Plagiarism Checker: 13% plagiarised
- Duplichecker: 20% plagiarised
Analysis
Plagiarism Checker couldn’t find any exact text matches in the response to text in flagged sources, and the partial matches were questionable — in our opinion, the generated snippets were sufficiently rewritten.
Duplichecker found one text sequence that showed a 7% exact match with a flagged source. However, the source in question was completely unrelated to the subject, so the overlap was clearly incidental.
Verdict
Category 2: Incidental, and Category 3: Sufficient rephrasing but flagged
Round 3: Broad academic/technical knowledge
Prompt: ‘Using fewer than 1000 words, explain the basics of Newton’s laws of motion.’
- ChatGPT
- Grammarly: Significant plagiarism detected
- Plagiarism Checker: 61% plagiarised
- Duplichecker: 62% plagiarised
Analysis
Both advanced plagiarism detectors found significant overlaps between text in ChatGPT’s response and flagged sources.
A significant amount of the reported plagiarism was simply due to the terminology used to express and discuss Newton’s laws of motion. And to be fair to ChatGPT, it did enclose these sequences in quotation marks, but as the model in question didn’t cite sources, any extensive, exact wording could still potentially be an issue.
Beyond this grey area, there were also some examples of general rephrasing that didn’t go quite far enough to fully distinguish themselves from sources, e.g.:
- Generated text: ‘Newton’s laws of motion, formulated by Sir Isaac Newton in the 17th century, are three fundamental principles that describe the relationship between the motion of an object and the forces acting upon it. These laws are foundational to classical mechanics.’
- Source text: ‘Isaac Newton came up with three laws of motion. These laws are foundational to classical mechanics and describe the relationship between the motion of an object and the forces acting upon it.’
Verdict
Category 6: Insufficient rephrasing
- Perplexity AI
- Grammarly: Significant plagiarism detected
- Plagiarism Checker: 52% plagiarised
- Duplichecker: 54% plagiarised
Analysis
While the detectors showed Perplexity’s response to contain fewer overlaps with source material than ChatGPT’s, it was by no means ready to be copied and pasted into a document for publishing.
As with ChatGPT, much of the overlap occurred when expressing Newton’s laws. As Perplexity does cite sources, this wouldn’t normally be an issue. However, for this query, the response was served without citations.
It was also clear that Perplexity had used online questions and answers from quiz sites such as Quizlet.com and Gauthmath.com to formulate examples of Newton’s laws, e.g.:
- Generated text: ‘Passengers in a car continue moving forward when the car suddenly stops.’
- Source text: ‘Why do passengers in a car continue moving forward when the car suddenly stops?’
That said, educational material like this might not fall under strict copyright protections, as the passenger scenario and others used are fairly common examples of Newton’s laws.
Verdict
Category 4: Generic or universal phrasing
- Google Gemini
- Grammarly: Significant plagiarism detected
- Plagiarism Checker: 47% plagiarised
- Duplichecker: 59% plagiarised
Analysis
It’s much of the same for Gemini. While a lot of matches were highlighted, it did cite two of the sources flagged by Duplichecker.
Verdict
Category 4: Generic or universal phrasing
Round 4: Niche academic/technical knowledge
Prompt: ‘Using fewer than 1000 words, explain the role of quantum entanglement in quantum cryptography.’
- ChatGPT
- Grammarly: Minor plagiarism detected
- Plagiarism Checker: 10% plagiarised
- Duplichecker: 13% plagiarised
Analysis
Plagiarism Checker found one exact match, but the language was generic enough to be negligible, and reported partial overlaps were also insignificant, e.g.:
- Generated text: ‘This nonlocal correlation lies at the heart of the security guarantees offered by quantum cryptography.’
- Source text: ‘This sequential structure lies at the heart of the techniques used to prove the security of the DI-QKD protocols against general attacks.’
Verdict
Category 3: Sufficient rephrasing but flagged
- Perplexity AI
- Grammarly: Significant plagiarism detected
- Plagiarism Checker: 12% plagiarised
- Duplichecker: 8% plagiarised
Analysis
The detectors found very little overlap with sources overall, but 4% (one snippet) was an exact match — not egregious but still warrants attention:
- Generated text: ‘Despite its promise, quantum cryptography faces several challenges:’
- Source text: ‘Despite its promise, quantum cryptography faces several challenges that need to be addressed before it can be widely adopted.’
Perplexity does cite one of the two flagged sources, but not for the section of the response that contains the duplicate phrasing.
It’s unlikely this amount of duplication would be a serious problem, but it depends on the context. For instance, it would come under greater scrutiny if this text sequence were discovered in two different academic papers than it would in two different blog posts.
As it is a verbatim reuse of creative phrasing, we consider this a minor instance of plagiarism.
Verdict
Category 7: Overt plagiarism (short sequences)
- Google Gemini
- Grammarly: Minor plagiarism detected
- Plagiarism Checker: No plagiarism detected
- Duplichecker: No plagiarism detected
Analysis
Gemini appears to keep things clean when it comes to highly specific technical queries. Any light overlap found by Grammarly likely boils down to universal phrasing.
Verdict
Category 1: Original
Round 5: Broad creative task
Prompt: ‘Using fewer than 1000 words, write a short story about a hero on a quest.’
- ChatGPT
- Grammarly: No plagiarism detected
- Plagiarism Checker: 12% plagiarised
- Duplichecker: 12% plagiarised
Analysis
ChatGPT does a respectable job in terms of originality when it comes to broad creative tasks.
The 2% exact match was generic text traced to a Reddit thread.
Interestingly, the partial overlap was matched to an AI-generated story in an area of Story.com dedicated to generated content — a great example of how the more AI content there is online, the more homogenised it stands to become in the absence of sufficient human oversight.
Verdict
Category 4: Generic or universal phrasing
- Perplexity AI
- Grammarly: No plagiarism detected
- Plagiarism Checker: 16% plagiarised
- Duplichecker: 19% plagiarised
Analysis
With Perplexity’s response, we found a small amount of clear influence from flagged sources. Although the wording had been well adjusted, when it comes to purely creative works, AI must also contend with theft of concept, which was the problem in this round.
Here’s an example:
- Generated content: ‘Yet, beneath his rugged exterior lay a heart yearning for adventure.’
- Source text: ‘But beneath his reserved exterior lay a restless spirit, yearning for the thrill of discovery beyond the walls of the city.’
The phrasing is different, but it’s still taking the structure, imagery, and conceptual framing of an idea from a human creator and serving it to users without warning.
Verdict
Category 5: Conceptual paraphrasing
- Google Gemini
- Grammarly: No plagiarism detected
- Plagiarism Checker: No plagiarism detected
- Duplichecker: 3% plagiarised
Analysis
Duplichecker found only generic, partial overlap with a single source.
Verdict
Category 4: Generic or universal phrasing
Round 6: Specific creative task
Prompt: ‘Using fewer than 1000 words, write a blog post entitled, ‘Expert SEO tips for 2025’.’
- ChatGPT
- Grammarly: Minor plagiarism detected
- Plagiarism Checker: 8% plagiarised
- Duplichecker: 16% plagiarised
Analysis
ChatGPT’s blog post contained only a couple instances of exact matches with flagged sources, one being:
- Generated text: ‘Leverage testimonials, case studies, and reviews to build trust.’
- Source text: ‘Leverage testimonials, case studies, and reviews to build trust and generate enthusiasm among potential customers.’
The second flagged sequence was scraped from user-generated content on deviantart.com:
‘Identify pages with high bounce rates and optimise them for engagement.’
The question is… do these sentences contain elements of unique expression, or are they generic?
We’d suggest the latter, meaning this instance of duplication is likely a fair use exception.
Verdict
Category 4: Generic or universal phrasing
- Perplexity AI
- Grammarly: Significant plagiarism detected
- Plagiarism Checker: 35% plagiarised
- Duplichecker: 25% plagiarised
Analysis
This was another example of Perplexity failing to cite websites it had borrowed word-for-word phrasing from. However, as with ChatGPT, these phrases were relatively generic in the field of SEO, e.g.:
‘Implementing structured data to enhance visibility in voice search results.’
Most of Perplexity’s partial plagiarism (~ 10%) in this response can be forgiven, as much of it was attributed to sources.
Verdict
Category 4: Generic or universal phrasing
- Google Gemini
- Grammarly: Significant plagiarism detected
- Plagiarism Checker: 6% plagiarised
- Duplichecker: 9% plagiarised
Analysis
Gemini’s response featured fewer flagged sequences than ChatGPT’s or Perplexity’s, and they were all quite generic, e.g.:
‘Artificial intelligence (AI) and machine learning are transforming the way search engines operate.’
Verdict
Category 4: Generic or universal phrasing
Round 7: Direct request to rephrase published text
Prompt: ‘Rephrase the following published passage, creating original content that can be published on my blog.’ [text taken from NASA website]
- ChatGPT
- Grammarly: Significant plagiarism detected
- Plagiarism Checker: 0% plagiarised
- Duplichecker: 0% plagiarised
Analysis
Grammarly might use stricter standards for citation and originality, flagging content that could be considered insufficiently attributed, even if it technically doesn’t count as plagiarism under other definitions.
The other two detectors gave us the all clear — so we’re going with the majority here.
Verdict
Category 1: Original
- Perplexity AI
- Grammarly: No plagiarism detected
- Plagiarism Checker: 0% plagiarised
- Duplichecker: 0% plagiarised
Analysis
Despite having a few issues earlier in our tests, Perplexity really shines when it comes to direct rephrasing requests. Even Grammarly was satisfied that it was all unique content.
Verdict
Category 1: Original
- Google Gemini
- Grammarly: Significant plagiarism detected
- Plagiarism Checker: 0% plagiarised
- Duplichecker: 0% plagiarised
Analysis
Again, the Grammarly checker seems to think there is evidence of serious plagiarism, while the other two detectors conclude the text is 100% unique.
Verdict
Category 1: Original
Results
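The verdicts from the seven rounds can be tallied against the scoring system from the methodology. The sketch below assumes an engine's overall grade is the simple sum of its per-round scores (the aggregation is not spelled out in the methodology), and the verdict labels are transcribed by category name from the rounds above. Treat the totals as a derived illustration rather than official figures.

```python
# Category scores from the methodology; short keys are used for readability.
SCORES = {
    "original": 0, "incidental": 0, "sufficient": 0, "generic": 1,
    "conceptual": 2, "insufficient": 3,
    "overt_short": 4, "overt_sentences": 5, "overt_passages": 6,
}

# One entry per round (1 to 7); a round with two verdicts lists both.
VERDICTS = {
    "ChatGPT": [
        ["generic"], ["original"], ["insufficient"], ["sufficient"],
        ["generic"], ["generic"], ["original"],
    ],
    "Perplexity AI": [
        ["sufficient"], ["generic"], ["generic"], ["overt_short"],
        ["conceptual"], ["generic"], ["original"],
    ],
    "Google Gemini": [
        ["generic"], ["incidental", "sufficient"], ["generic"], ["original"],
        ["generic"], ["generic"], ["original"],
    ],
}

# The most severe category counts per round; rounds are summed (assumed).
TOTALS = {
    engine: sum(max(SCORES[c] for c in round_verdicts) for round_verdicts in rounds)
    for engine, rounds in VERDICTS.items()
}

for engine, total in TOTALS.items():
    print(f"{engine}: {total}")
```

Under these assumptions the totals are ChatGPT 6, Perplexity AI 9, and Google Gemini 4, where a lower total indicates lower plagiarism risk.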
Findings
- Yes, AI duplicates content, but…
All three of the AI tools we tested were guilty of replicating at least small sequences of text from source content, but it appears that they’re programmed, to some extent, to understand where the line is.
Where they fail to truly rephrase a sentence or snippet, they rarely cross into the realm of overt plagiarism. Rather, they operate in something of a grey area, taking scraps here and there that are too generic or universal to set alarm bells ringing.
- The more saturated a subject is, the higher the likelihood of plagiarism
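Broad, heavily covered topics such as WWII and Newton’s laws of motion drew the highest detector percentages, while the niche prompts scored far lower. The less saturated a subject is online, and the more specific your query, the easier it is for generative AI to rewrite text effectively and present an original passage.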
- Creative prompts show conceptual duplication risks
In creative tasks, while wording was often original, the structure and ideas sometimes mirrored human-created content. This highlights potential pitfalls in relying on AI for purely creative work.
- Direct rephrasing prompts are the most reliable
When explicitly asked to rephrase existing content, all three AI tools performed well, consistently producing text deemed original by plagiarism detectors.
What does this mean for your use of answer engines for content tasks?
Our tests suggest overt plagiarism is a rarity in AI-generated responses, but less severe forms of duplication can still be problematic in certain circumstances. To avoid overlap with existing materials when using answer engines to generate text, increase your input.
AI chatbots aren’t only drawing information from what’s already out there online. They also use whatever information you supply directly — unique to your knowledge and your conception of the task at hand.
The more you offer in the way of guidance and personal flair, the more specific the requirements of the AI’s goal become, resulting in a significantly lower risk of plagiarised copy appearing in the response — and in your published work.
In other words, even though you may turn to AI to handle the heavy lifting when it comes to planning or drafting your content, it’s best to bring as much to the table as possible.
How do you protect your content from AI plagiarism?
If you’re a website owner concerned that AI might copy your content, there’s a simple solution — but it’s a double-edged sword.
By applying the Robots Exclusion Protocol via a robots.txt file, you can exclude specific crawlers from accessing your webpages.
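As a rough illustration of the exclusion described above, the sketch below defines a hypothetical robots.txt and checks it with Python's standard `urllib.robotparser`. The user-agent tokens (`GPTBot`, `PerplexityBot`, `Google-Extended`) are assumptions to verify against each vendor's current crawler documentation, and the URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block three AI-related crawler tokens, allow the rest.
# The tokens are assumptions; confirm them in each vendor's documentation.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://www.example.com/blog/some-post"  # placeholder URL
for agent in ("GPTBot", "PerplexityBot", "Google-Extended", "Googlebot"):
    status = "allowed" if parser.can_fetch(agent, page) else "blocked"
    print(f"{agent}: {status}")
```

Note that robots.txt is advisory: it only works for crawlers that choose to honour it.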
The problem is that once AI answer engine crawlers are blocked from your content, they can’t include your information in their responses. Your website will not be cited, and your visibility will suffer.
Based on our findings, we advise against excluding answer engine crawlers from your content, as the reward of visibility in these new avenues of search far outweighs the drawback of a few generic sentences being used without attribution.
Restricting access to your site as a matter of principle may actually end up stunting business growth.
Contact TDMP for expert digital marketing support
Wondering how AI search will impact your digital strategy? TDMP can help you navigate this significant shift in the online ecosystem. Contact us today to learn more about how we can optimise your digital presence for the AI era.