Neurosurgeons In Czech Republic Test Writing Capabilities Of ChatGPT
By Deborah Borfitz
August 16, 2023 | Researchers in the Czech Republic succeeded in using ChatGPT (Chat Generative Pre-trained Transformer) to produce a convincing scientific article in only one hour, with no special human training, a feat intended to demonstrate the current capabilities of large language models (LLMs), according to Martin Májovský, M.D., Ph.D., a researcher in the department of neurosurgery and neurooncology at the First Faculty of Medicine of Charles University in Prague. The use of LLMs in academic writing has been largely unexplored, making this one of the first reports of its kind.
The upside of LLMs is that they can legitimately take over some aspects of academic work, he points out, including study design, data collection, and manuscript preparation. LLMs can, for example, be employed to mine patient data from hospital information systems and, perhaps most usefully now, to do language editing for non-native speakers.
Current limitations of ChatGPT, powered by GPT-3 (Generative Pre-trained Transformer 3), were revealed in the latest study, published in the Journal of Medical Internet Research (DOI: 10.2196/46924), in which the model was used to generate a fraudulent scientific article related to neurosurgery. These limitations included semantic inaccuracies and errors in the references, shortcomings that newer LLMs could overcome if their training data includes more scientific articles, making fraud even harder to detect, Májovský says.
“We will witness fast development of LLMs in the near future,” he says, noting that new web-based applications dedicated to scientific writing are emerging. The currently available GPT-4 language models provide “significantly improved reasoning” relative to GPT-3.5, the upgraded version of GPT-3.
The article-generation process began with the prompt, “Suggest relevant RCT in field of neurosurgery that is suitable for aim and scope of PLOS Medicine and would have high chance of acceptance.” ChatGPT replied by suggesting an article entitled, “Effectiveness of deep brain stimulation for treatment-resistant depression: a randomized controlled trial.”
With 12 additional prompts—starting with, “Now give me abstract according to open access articles on PLOS Medicine,” and ending with, “Can you create some charts? Can you provide datasheet for creating charts?”—the article was born. The research team then had the article reviewed by a senior professor of neurosurgery for accuracy and coherence, and consulted with a board-certified psychiatrist and a senior statistician with a medical degree to ensure that the content was relevant and true.
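The study itself was carried out through the ChatGPT web interface, but a comparable multi-turn prompting session can be scripted. The sketch below is illustrative only, assuming the OpenAI Python SDK (the `openai` package, v1.x) and a GPT-3.5-class model; the prompt strings are those quoted in the article, while the client setup and loop structure are assumptions, not the authors' actual workflow.

```python
# Minimal sketch of a multi-turn prompting session like the one described above.
# Assumes the OpenAI Python SDK (openai>=1.0) and an OPENAI_API_KEY in the environment;
# illustrative only, not the authors' actual method (they used the ChatGPT web UI).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Prompts quoted in the article; the remaining follow-up prompts are omitted here.
prompts = [
    "Suggest relevant RCT in field of neurosurgery that is suitable for aim and scope "
    "of PLOS Medicine and would have high chance of acceptance.",
    "Now give me abstract according to open access articles on PLOS Medicine",
    "Can you create some charts? Can you provide datasheet for creating charts?",
]

messages = []  # running conversation history so each reply builds on earlier turns
for prompt in prompts:
    messages.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for the GPT-3-class model used in the study
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    print(reply, "\n---")
```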
Spotting the Fakes
Májovský says he is confident that many researchers use LLMs with the “best intentions” for language editing and abstract creation. “We can only speculate if fraudulent articles have been published.”
Earlier this year, Northwestern University reported that even skeptical experts couldn’t spot all the fake abstracts written by ChatGPT. The concern is that so-called “paper mills” will use ChatGPT to produce fabricated scientific work for profit, and that others might build their own science on those studies, unintentionally spreading misinformation.
Publishers, for their part, are implementing preventive measures. Springer Nature announced its ground rules for the ethical use of LLMs in January, forbidding the use of an LLM tool as a credited author on a research paper and requiring researchers who use LLM tools to document that use in the methods or acknowledgements sections.
“Probably humanities are more susceptible to frauds using LLMs than natural sciences,” says Májovský, reasoning that many more humanities sources tend to be included in the training data of these models.
Májovský and his colleagues conclude that increased vigilance and enhanced detection methods are needed to combat the potential misuse of AI in scientific research. Their suggestions include making the submission of datasets mandatory.
“Providing original datasets supports authenticity of research and some steps done by authors may be replicated [e.g., statistical analysis],” Májovský says. “It does not prevent fraud per se, as the datasets can be generated as well, but it may discourage some people from doing so.”
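As a rough illustration of the kind of replication a submitted dataset would allow, the sketch below re-runs a simple between-group comparison from a hypothetical data file; the file name, column names, and choice of test are assumptions for the example, not part of the study.

```python
# Hypothetical sketch: re-running a basic statistical check from a submitted dataset.
# File name, column names, and choice of test are illustrative assumptions only.
import pandas as pd
from scipy import stats

df = pd.read_csv("trial_dataset.csv")  # dataset submitted alongside the manuscript

treatment = df.loc[df["group"] == "stimulation", "depression_score"]
control = df.loc[df["group"] == "sham", "depression_score"]

# Reproduce a reported between-group comparison (here, an independent-samples t test).
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```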
In the belief that LLMs represent a disruptive technology changing the landscape of academic writing as well as many other fields, he says, his group is “actively participating in the ongoing debate in the medical community... [and] preparing more studies using LLMs.”
Their work, Májovský says, is evidence that “even ordinary doctors” (neurosurgeons) are interested in generative AI. It is their hope that science at large appreciates both the promise and the potential risks of LLMs. History has shown that the risks associated with any new technology can ultimately be identified and managed.