AI Can Write a CME Module. But That Doesn't Mean It Can Write a Good One.
- Imelda Wei Ding Lo


Like many other professionals, medical writers and continuing medical education (CME) developers are under increasing pressure to incorporate AI into their production workflows. The argument in favor is straightforward: upload your source documents, generate a first draft, and just edit from there. Under deadline pressure, that workflow is difficult to resist.
However, while AI works fast, speed isn’t the same as efficiency. AI-generated content introduces interpretive risk: the risk that audiences understand regulated material in ways that exceed what the underlying evidence supports. Addressing that risk requires writers to retrace citations, verify claims, and reassess framing decisions they didn’t originally make. In practice, this review process can take longer than developing the material from scratch.
To examine this tradeoff in a concrete context, I ran a structured comparison using the source documents behind a simulated CME suite on lipedema in Canadian primary care. I developed the suite—which consists of a working literature review, a slide deck, and an accreditation-style activity package—manually. I then used the same source documents to generate parallel versions with Claude Sonnet 4.6, allowing each document to build sequentially on prior AI-generated output.
For this experiment, I deliberately avoided iterative prompting and corrections between documents. My goal was to evaluate what AI output can look like before making a significant editorial investment, and what it costs a writer who treats that initial output as close to complete.
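For writers who want to run this kind of comparison programmatically rather than through the chat interface, the same three-stage handoff can be scripted. The sketch below is illustrative only: it assumes the Anthropic Python SDK, and the model identifier, file names, and abbreviated prompts are placeholders rather than the exact inputs behind this experiment.

from pathlib import Path

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The three prompts quoted later in this article, abbreviated here as placeholders.
LIT_REVIEW_PROMPT = "You are assisting a medical writer... produce a structured internal working literature review..."
DECK_PROMPT = "You are assisting a medical writer... produce a structured CME slide deck of approximately 25 to 30 slides..."
PACKAGE_PROMPT = "You are assisting a medical writer... produce an accreditation-style activity package..."

def generate(prompt: str, attachments: list[str]) -> str:
    # Paste the source text into the prompt and return the model's draft.
    sources = "\n\n".join(f"--- {name} ---\n{Path(name).read_text()}" for name in attachments)
    response = client.messages.create(
        model="claude-sonnet-4-6",  # placeholder model identifier, not a verified API string
        max_tokens=8000,
        messages=[{"role": "user", "content": f"{prompt}\n\nSOURCES:\n{sources}"}],
    )
    return response.content[0].text

# Stage 1: literature review generated from the original source documents only.
lit_review = generate(LIT_REVIEW_PROMPT, ["sources/lipedema_sources.txt"])
Path("ai_lit_review.txt").write_text(lit_review)

# Stage 2: slide deck built on the AI-generated literature review.
slide_deck = generate(DECK_PROMPT, ["ai_lit_review.txt"])
Path("ai_slide_deck.txt").write_text(slide_deck)

# Stage 3: activity package built on both prior AI outputs, with no corrective
# editing between stages, matching the experiment's design.
activity_package = generate(PACKAGE_PROMPT, ["ai_lit_review.txt", "ai_slide_deck.txt"])
Path("ai_activity_package.txt").write_text(activity_package)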
Claude AI’s Literature Review: Everything Is There, But Not Everything Belongs
To generate the AI literature review, I used the following prompt:
You are assisting a medical writer developing a CME literature review for Canadian family physicians. The topic is lipedema, with a focus on recognition and initial assessment in primary care. Using the attached source documents, produce a structured internal working literature review covering: clinical definition and staging, pathophysiology, diagnosis and differential diagnosis, psychosocial impact and patient experience, management and treatment options, evidence gaps and limitations, and a practice gap and educational needs assessment. The intended audience is Canadian family physicians. Length should be approximately 8 pages. Use Vancouver citation style. Cite only the provided sources.
This prompt is detailed enough for Claude to define scope, structure, and audience. At the same time, it leaves enough room to evaluate what AI can produce with minimal editorial intervention.
What came back was a 12-page document—compared to the 8-page version I developed manually—that covered every requested section thoroughly, perhaps too thoroughly. Ultimately, the problem wasn’t what the AI left out, but what it failed to filter out.
Citations
The first issue was citation placement. Instead of placing citations immediately after the claims they support, so readers and reviewers can see exactly which source backs which statement, the AI stacked them at the ends of paragraphs.
For example, a paragraph making four distinct clinical assertions received a single citation at the very end: (4,11). It is unclear which source supports which claim, and a reviewer can’t trace any one assertion to its source without re-reading every cited paper and reconstructing the assignments manually.
Scope
The AI-generated literature review did not consistently tailor the content to the intended audience: Canadian family physicians encountering lipedema in primary care settings. As a result, it included material at an inappropriate level of depth, such as detailed discussions of genetic mechanisms that may lead to lipedema.
This level of detail isn’t necessary for family doctors learning how to recognize or assess lipedema in the primary care setting.
Structure
The AI literature review concluded with a section titled “Practice Gap and Educational Needs Assessment,” which included numbered practice gaps and learning objectives.
This content does not belong in a literature review. Instead, it belongs in the instructional components of a CME activity, such as the slide deck or accreditation package. The literature review’s function is to establish the evidence base, not to translate the evidence into educational design.
The AI likely added this section because it was pattern-matching the concept of a “CME document” without distinguishing between research synthesis and instructional development.
Claude AI’s Slide Deck: The Format Was Right, But the Judgment Wasn't
To generate the AI slide deck, I uploaded the AI literature review to Claude Sonnet 4.6, and used the following prompt:
You are assisting a medical writer developing a CME slide deck for Canadian family physicians. The topic is lipedema, with a focus on recognition and initial assessment in primary care. Using the lit review attached, produce a structured CME slide deck of approximately 25 to 30 slides. The deck should include: a title slide with a faculty disclosure placeholder, a learning objectives slide with three to four measurable objectives written using Bloom's taxonomy action verbs, content sections covering clinical definition and staging, pathophysiology, diagnosis and differential diagnosis, psychosocial impact, and management options, a clinical case exercise, a summary slide, and a references slide in Vancouver citation style. The intended audience is Canadian family physicians. Cite only the provided sources. Each slide should have a key-message title rather than a topic title.
The deck that came back was visually impressive and, at first glance, appeared structurally correct. It had a title slide, learning objectives with Bloom’s taxonomy verbs, content sections, a case exercise, a summary, and a references slide.
However, when I read it as an educational tool rather than scanning it for completeness, I noticed several underlying issues:
The learning objectives bundled too many competencies into single statements. For example, one learning objective asked participants to identify clinical features, apply the ILA 2022 diagnostic criteria, and differentiate lipedema from obesity, lymphedema, and lipohypertrophy—all within a single line. Under Bloom's taxonomy, each objective should target one measurable behavior. Bundled objectives can’t be meaningfully assessed and weaken alignment between teaching and evaluation.

The deck was too detailed for the format and audience. Like the AI literature review, the slide deck included material that went beyond what family physicians need for recognition and initial assessment. Entire slides were devoted to information that could be condensed into a single bullet point, such as the “No Biomarker Exists” slide shown below.

Certainty was overstated even when the literature was contested, and this was more evident in the deck than in the literature review because slides carry fewer words and less context. There is less room to qualify claims, and the AI did not compensate by moderating its language. For example, the cuff sign test was labeled as a “B1 minor criterion” without any indication that it is an optional corroborating finding, not a required diagnostic criterion. Another slide stated that “adipose legs without allodynia cannot be diagnosed as lipedema,” a phrase that implies a binary rule in a diagnostic process the literature itself describes as clinical, contextual, and evolving.
Technical terms appeared without definition. Specialized terms such as telangiectasias were used without being defined on the slide. A voiceover script could explain them, but leaving them undefined on the slide added to the deck’s overall density and assumed a level of clinical vocabulary that a general primary care audience may not have.
Content was repeated, and some slides had no clear learning purpose. Across the presentation, key points were repeated without adding new information. Further, two slides near the end of the deck—one stating that family physicians play a central role in coordinating the lipedema management team, another listing six documented practice gaps justifying CME priority for Canadian family physicians—did not serve a learning objective. The first could have been a spoken point, while the second was internal process documentation that has no place in a learner-facing activity.
Errors from the literature review carried downstream. The deck presented three lipedema stages as settled fact without acknowledging that some sources describe four. Scope miscalibration in the lit review flowed directly into the deck because it was built on the AI-generated lit review rather than independent editorial judgment.
The case exercise was underdeveloped relative to the deck's length. The AI deck covered genetics, detailed pathophysiology, and specialist-level clinical content across its slides, but the case exercise drew on almost none of it. Accordingly, the learner has no opportunity to apply most of what the deck covered. This structural incoherence is a signal that the AI generated breadth without asking what the learner actually needs to do with the content.
The management goals slide used a pyramid graphic that implies a treatment hierarchy. A pyramid-like visual isn’t neutral; it suggests to the reader that some treatments are foundational while others are supplementary. The evidence does not support that reading: lipedema management is multimodal and individualized. A flat bullet list would have been more accurate, since the treatments are parallel rather than ranked.

Claude AI’s Activity Package: A Document That Does Not Know What It Is
Once Claude AI finished generating the slide deck, I asked it to generate the activity package using the following prompt:
You are assisting a medical writer developing CME activity documentation for a Canadian primary care continuing medical education activity on lipedema. Using the attached source documents (lit review and deck), produce an accreditation-style activity package including: an activity description of approximately 150 to 200 words suitable for submission to an accredited CME provider, a practice gap statement grounded in the provided sources, three to four measurable learning objectives written using Bloom's taxonomy action verbs and aligned with the practice gap, a target audience statement, and a faculty disclosure page. The intended accreditation framework is Mainpro+ (CFPC).
Once again, the prompt is specific—it names the components, the word count, and the accreditation framework—yet what came back was a multi-page document that, at first glance, looked like a professional CME submission package. On closer reading, it was something else: a bloated document that misjudged its purpose and audience.
Here are the main flaws I noticed:
The practice gap statement was far too long. A practice gap statement answers one question: what is the documented gap, and why does this activity address it? My manual version does this in a short paragraph. The AI version argued the case across six numbered gaps with sub-citations—the length and structure of an internal needs assessment, which is a development document, not a submission document.
The learning objectives carried the same problems as the deck. Each objective bundled multiple competencies into a single Bloom's statement. This is another problem inherited from the earlier AI outputs.
The target audience statement had an unnecessary rationale section. A target audience statement identifies who the activity is for. My manual version does that in two sentences. The AI version added a rationale section explaining why family physicians are the right audience—a point the practice gap already established.
The fictionalized content looked professional without being substantive. The package included a COI disclosure table with checkboxes, a commercial support agreement section, and an attestation signature line. These elements add the appearance of rigor without adding substance, and they are unnecessary here, especially for a simulated module.
What “Finished” Actually Means When You Use AI in CME
All three Claude-generated CME documents had the same core problem: the output looked finished, but the content required substantial editing. The polished appearance isn’t a minor inconvenience. In compliance-sensitive documents such as CME literature reviews, slide decks, and activity packages, that surface polish is exactly what invites reviewers to give the content less scrutiny than it needs.
Now, that’s not to say we shouldn’t use AI at all in CME workflows. Rather, if you decide to use AI to generate drafts for editing, you need to know what you’re inheriting, and that inheriting a draft is not the same as inheriting a finished document.
You may need to cut, restructure, or rewrite sections from scratch. In fact, given AI’s tendency to hallucinate facts and obscure citation traceability, tracing and verifying claims and rewriting inaccurate or overstated sections may cost you more time than writing the material yourself from the start.