Artificial Intelligence and CME Training: Work in Progress with AI-CHECK
The AI-CHECK project has reached its second phase. We asked ChatGPT to develop the content of a CME course on acne, had it evaluated by a panel of experts, and compared it with the NICE guidelines for the treatment of the disease. Several critical issues emerged: the main weakness was a limited ability to deal with uncertainty and controversy. When AI is involved, very strict control of medical content is required, and expert supervision remains essential.
We have already discussed it on this website: Zadig's research project AI-CHECK (Artificial Intelligence for CME Health E-learning Contents and Knowledge), currently unique of its kind, is assessing the potential and limitations of artificial intelligence (AI) in the development of Continuing Medical Education (CME) training materials, with the aim of establishing guidance and best practices for its use.
ChatGPT under scrutiny
In the second phase of the AI-CHECK study, whose results were published in Dermatology Reports, ChatGPT was queried, using a rigorous methodology, on the management of acne, with the goal of transforming its answers into a CME course for general practitioners.
Acne was chosen because it is a common condition (affecting 9.4% of the world’s population) and because treatment protocols have not changed significantly in recent years. This helped reduce the risk that ChatGPT’s performance might be affected by misalignment between the sources it draws on and the most recent scientific literature.
Just like a student who must demonstrate their knowledge to a professor, ChatGPT was subjected to 23 questions on acne, selected by an expert dermatologist to provide comprehensive information on the management of the condition. The questions addressed to ChatGPT (prompts) were structured to include a common introduction specifying the intended audience and the language and tone to be used:
“We must develop an evidence-based distance-learning CME course on acne for general practitioners. You must provide high-level scientific information, using professional language suitable for physicians. Avoid generic statements and content written for a non-expert audience. Provide 4,000 characters…”
This was followed by the question on the specific topic.
ChatGPT was also asked to provide three up-to-date bibliographic references to support its answers. The reproducibility of ChatGPT’s performance was tested by repeating both the questionnaire and the request for references three times.
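As a purely illustrative aside, the querying procedure described above could be reproduced with a short script along the following lines. The study does not describe its tooling, so the model name, the single example question, and the ask_chatgpt helper are assumptions made here for the sketch, not details of the actual experiment.

```python
# Minimal sketch of the querying procedure (illustrative only; the study's
# actual tooling, model and parameters are not specified in the article).
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is set in the environment

COMMON_INTRO = (
    "We must develop an evidence-based distance-learning CME course on acne "
    "for general practitioners. You must provide high-level scientific information, "
    "using professional language suitable for physicians. Avoid generic statements "
    "and content written for a non-expert audience. Provide 4,000 characters."
)

REFERENCE_REQUEST = "Provide three up-to-date bibliographic references supporting your answer."

def ask_chatgpt(question: str, model: str = "gpt-4") -> str:
    """Send one prompt (common introduction + topic-specific question) and return the reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"{COMMON_INTRO}\n\n{question}\n\n{REFERENCE_REQUEST}",
        }],
    )
    return response.choices[0].message.content

# The study repeated the full questionnaire three times to test reproducibility.
questions = ["What is the pathogenesis of acne?"]  # hypothetical example; the study used 23 expert-selected questions
runs = [[ask_chatgpt(q) for q in questions] for _ in range(3)]
```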
ChatGPT’s answers were independently assessed by five dermatology specialists on five aspects: quality, readability, accuracy, completeness, and consistency with the guidelines of the National Institute for Health and Care Excellence (NICE), using a 5-point Likert scale. Overall, ChatGPT’s responses received a positive rating (good or very good) in 87.8% of cases for quality; 94.8% for readability; 75.7% for accuracy; 85.2% for completeness; and 76.8% for guideline consistency.
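To make the reporting of these percentages concrete, the sketch below shows one plausible way such ratings translate into "positive" shares: a score of 4 (good) or 5 (very good) on the 5-point Likert scale counts as positive, and the share is computed per criterion across reviewers and answers. The figures used here are invented for illustration and do not come from the study.

```python
# Hypothetical example: aggregating 5-point Likert ratings into "positive" percentages,
# where 4 ("good") or 5 ("very good") counts as positive. All numbers are invented.
ratings = {
    "quality":     [5, 4, 4, 3, 5, 4],
    "readability": [5, 5, 4, 4, 5, 5],
    "accuracy":    [4, 3, 4, 5, 3, 4],
}

for criterion, scores in ratings.items():
    positive = sum(1 for s in scores if s >= 4)
    print(f"{criterion}: {100 * positive / len(scores):.1f}% positive")
```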
These high overall scores, however, conceal some striking shortcomings. For example, ChatGPT failed to provide accurate answers about the cutaneous adverse effects associated with isotretinoin, and among its relevant sources it cited only the U.S. acne treatment guidelines, omitting the European guidelines even though both had been published in the same year (2016).
The bibliography was analysed according to three criteria: relevance, importance, and currency. The bibliographic references received a positive evaluation in 82.7% of cases, although they were often outdated. There were also hallucinations, all related to the citation of references with errors in authors, title, journal, year of publication, issue or page numbers, or a combination of these.
The verdict: “Could do better”
Overall, this dermatology-focused experiment suggests that ChatGPT is, for now, only a potentially useful tool for continuing medical education and that it still needs to improve before it can be considered reliable. The answers it provides are clear and understandable, but sometimes incomplete or inaccurate.
In particular, it was observed that ChatGPT struggles with issues that the scientific community still regards as uncertain or controversial and expects to resolve as knowledge advances. In such cases, ChatGPT tends to provide an answer anyway, even an incorrect one, rather than acknowledge that it cannot find one. This limitation could lead to the dissemination of incorrect information.
For now, human oversight remains essential to identify gaps and inconsistencies. Once again, faced with the entry of artificial intelligence into medical practice, the conclusion is that this reality should neither be denied nor demonised; rather, it should be understood so that its potential can be harnessed with an appropriately critical approach.
Now the judgement passes to users
From 7 May to 7 September 2025, the free CME course “Acne in the age of ChatGPT” will be available on the Saepe platform (www.saepe.it). Physicians who choose it will find a dossier containing the series of questions and answers generated by ChatGPT, reviewed by one of the leading acne experts in Italy and worldwide, Vincenzo Bettoli, who is also the scientific director of the course. The corrections and changes (additions and deletions) have been left visible so that participants can see them and assess the AI’s performance. The two clinical cases used for the practical exercise and the CME assessment questionnaire included in the course were also drafted by ChatGPT and subsequently reviewed, corrected, and approved by the scientific director.
Participants in the course will be asked to answer a set of questions before and after completing it. These will explore, on the one hand, their overall attitude towards the use of artificial intelligence in medicine and, on the other, their opinions and impressions of the course itself.
The collection and analysis of the data from the delivery of the CME course will complete the third and final phase of AI-CHECK.


