The article reports on extensive testing of AI-text detectors, in which the author evaluated around ten tools to see how reliably they could distinguish human-written from AI-generated text. In earlier rounds of testing, the best accuracy was only 66%. In the latest round, three detectors achieved perfect scores on the tests used, correctly identifying AI-generated text every time in that sample set.
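The article does not publish its scoring harness, but the arithmetic behind an accuracy figure like 66% or 100% is simple to picture. The sketch below is a hypothetical scoring loop with invented sample IDs, labels and verdicts, not the author's actual methodology:

```python
# Hypothetical scoring harness for a detector round-up; the article does not
# describe its real methodology, so names and results here are illustrative.

SAMPLES = [
    # (text_id, true_label) records how each test passage was actually produced
    ("essay_01", "human"),
    ("essay_02", "ai"),
    ("essay_03", "ai"),
    ("essay_04", "human"),
    ("essay_05", "ai"),
]

# Verdicts one detector returned for each sample (made-up results).
VERDICTS = {
    "essay_01": "human",
    "essay_02": "ai",
    "essay_03": "ai",
    "essay_04": "ai",     # a false positive: human text flagged as AI
    "essay_05": "ai",
}

def accuracy(samples, verdicts):
    """Fraction of samples where the detector's verdict matches the true label."""
    correct = sum(1 for text_id, label in samples if verdicts.get(text_id) == label)
    return correct / len(samples)

if __name__ == "__main__":
    print(f"accuracy: {accuracy(SAMPLES, VERDICTS):.0%}")  # 80% on this toy set
```

A "perfect score" in the article's sense simply means this fraction reached 100% on the particular sample set used, which is why the next paragraph's caution about small, controlled tests matters.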
The article goes on to caution, however, that perfect scores in a controlled test do not make these detectors infallible under real-world conditions. Even the detectors that scored well produced inconsistent results across different writing styles, generations and contexts. The article also notes the problem of false positives, where human-written text is flagged as AI-generated, especially when the writing is formal or follows patterns typical of AI output.
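To make the false-positive concern concrete, a detector's errors can be split by direction, since an overall accuracy number hides whether the mistakes fall on human writers or on AI text that slips through. The counts and labels below are illustrative only, not figures from the article:

```python
# Illustrative only: splitting detector errors into false positives (human text
# flagged as AI) and false negatives (AI text passed as human). Counts are made up.

def confusion_counts(results):
    """results: list of (true_label, predicted_label) pairs, labels 'human'/'ai'."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, pred in results:
        if truth == "ai" and pred == "ai":
            counts["tp"] += 1
        elif truth == "human" and pred == "human":
            counts["tn"] += 1
        elif truth == "human" and pred == "ai":
            counts["fp"] += 1   # the costly case for writers: human work misflagged
        else:
            counts["fn"] += 1   # AI text the detector missed
    return counts

results = [("ai", "ai"), ("ai", "ai"), ("human", "human"),
           ("human", "ai"), ("ai", "human"), ("human", "human")]

c = confusion_counts(results)
false_positive_rate = c["fp"] / (c["fp"] + c["tn"])   # share of human texts misflagged
print(c, f"false positive rate: {false_positive_rate:.0%}")
```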
Finally, the author emphasises that while detectors are improving, reliance on them alone is risky. They suggest a layered approach: instead of trusting detection tools blindly, organisations should combine technical checks with human review, author provenance and workflow controls. The article encourages caution in adopting policies based solely on detector scores.
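The article stays at the policy level, but one way to picture the layered approach it recommends is as a routing rule in which a detector score alone never triggers a penalty, only escalation to human review alongside provenance signals. The routine below is a hypothetical sketch of such a rule; the thresholds, field names and signals are invented for illustration:

```python
# Hypothetical sketch of a layered review policy: detector output alone never
# decides the outcome, it only routes the submission. All thresholds and field
# names are invented for illustration.

from dataclasses import dataclass

@dataclass
class Submission:
    detector_score: float        # 0.0 (likely human) to 1.0 (likely AI)
    has_revision_history: bool   # provenance signal, e.g. draft history on file
    author_confirmed: bool       # author attested to how the text was produced

REVIEW_THRESHOLD = 0.7           # invented cut-off for escalation

def route(sub: Submission) -> str:
    """Return the next workflow step for a submission; never an automatic penalty."""
    if sub.detector_score < REVIEW_THRESHOLD:
        return "accept"
    if sub.has_revision_history and sub.author_confirmed:
        # Provenance and attestation outweigh a high detector score.
        return "accept"
    # High score with weak provenance: escalate to a human reviewer.
    return "human_review"

print(route(Submission(detector_score=0.85, has_revision_history=True, author_confirmed=True)))
print(route(Submission(detector_score=0.85, has_revision_history=False, author_confirmed=False)))
```

The design point mirrors the article's advice: the detector score is one input among several, and the worst outcome it can produce on its own is a request for human judgement, not a verdict.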