The creator of the ChatGPT large language model (LLM) has quietly taken down its tool for identifying AI-generated text because its accuracy was too low.
OpenAI’s AI classifier tool, which was released in January, was touted to be able to tell whether text has been generated by ChatGPT or not. At launch, OpenAI hyped the tool as “significantly more reliable on text” compared to its previous attempts at building similar detection systems, but it admitted it would not be perfect.
“Our classifier is not fully reliable. In our evaluations on a ‘challenge set’ of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as ‘likely AI-written,’ while incorrectly labeling human-written text as AI-written 9% of the time (false positives),” OpenAI said as it announced the tool’s arrival in January.
The company said, however, that the classifier’s reliability improved as the length of the input text increased. But in an update to its January launch statement, the company confirmed that the AI classifier tool is no longer available as of July 20, adding: “We are working to incorporate feedback and are currently researching more effective provenance techniques for text.”
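To put those two rates in perspective, here is a rough sketch of how often a “likely AI-written” flag would actually be correct. The 26% and 9% figures are the ones OpenAI reported for its challenge set; the share of AI-written text in the pool being checked is not given anywhere in OpenAI’s statement, so the values used below (10% and 50%) are purely illustrative assumptions, as is the helper name `precision`.

```python
def precision(tpr: float, fpr: float, ai_share: float) -> float:
    """Fraction of flagged texts that really are AI-written (Bayes' rule).

    tpr      -- true positive rate (AI text correctly flagged)
    fpr      -- false positive rate (human text wrongly flagged)
    ai_share -- ASSUMED fraction of AI-written texts in the pool
    """
    flagged_ai = tpr * ai_share
    flagged_human = fpr * (1.0 - ai_share)
    return flagged_ai / (flagged_ai + flagged_human)


# Rates quoted by OpenAI for its "challenge set".
TPR = 0.26
FPR = 0.09

# Hypothetical mixes of AI-written text; not taken from OpenAI's statement.
for ai_share in (0.10, 0.50):
    share_correct = precision(TPR, FPR, ai_share)
    print(f"AI share {ai_share:.0%}: {share_correct:.1%} of flags are correct")
```

Under these assumptions, if only one text in ten were AI-written, roughly a quarter of the classifier’s flags would be correct; with an even split, roughly three quarters would be. In both cases the tool would still miss 74% of AI-written text.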
The release of the tool came after teachers and other professionals cited concerns about ChatGPT-fueled cheating and plagiarism, as well as inaccuracies in the content generated by the chatbot. In May, a US attorney faced possible court sanctions for using the chatbot to produce citations that turned out to be nonexistent in a case that he was handling.
This latest development compounds ChatGPT’s already jittery situation. A Stanford University study released on July 19 on the accuracy of ChatGPT suggests that the chatbot’s performance has declined with the release of the latest versions of its underlying models, GPT-3.5 and GPT-4. Its accuracy at solving maths problems fell from 97.6% to 2.4%, and the report warns that this accuracy could get worse. “This highlights the need to continuously evaluate and assess the behavior of LLMs in production applications,” the report said.
After a meteoric rise in popularity since its launch late last November, ChatGPT’s web traffic recorded its first drop, a fall of 9.7% from May to June. OpenAI’s move to pull the AI detector tool might drive the numbers down further, taking steam out of the much-heralded AI boom.