Tag: EHR

  • AI, Medical Training, and the New DoubleThink

    Trainee: “I cut and pasted the clinical note from the EHR into an AI tool to get a second opinion on management.”

    Me: “Were you worried about HIPAA, or about how the AI company might use the data?”

    Trainee: “Maybe. But I saw the senior resident in the ED do the same.”

    Our institutional reluctance to decide, openly and practically, how AI should be used in patient care is already eroding something important in medical training. We are forcing students, residents, and young physicians into a new kind of doublethink: publicly honoring one set of rules while privately relying on another set of practices to get through the day and care for patients well.

    Medicine has long contained smaller forms of this tension. I think of tests I learned to order not because they were likely to change management, but because the risk of not ordering them felt legally unsafe. Faced with a patient, I might describe that as “covering the bases,” even when I knew the test was unlikely to be useful and might even lead to false positives, extra cost, and unnecessary follow-up. Medicine has never been free of these mismatches between official rationale and lived practice. AI is making them larger, more frequent, and harder to ignore.

    I first saw this clearly in early 2023. ChatGPT had only recently entered public awareness, and physicians almost immediately found clever ways to use it for tedious administrative work. One example was feeding it the contents of a patient note and asking it to draft an appeal to an insurer for a referral or procedure. What had taken many minutes could suddenly be done in seconds.

    The ingenuity was impressive. The compliance problem was obvious.

    In many settings, pasting identifiable patient information into publicly available AI systems could run afoul of privacy rules and institutional policies, especially when no business associate agreement or equivalent contractual protection was in place. Yet that did not stop people. The tools were simply too useful. Today, some health systems do have contractual arrangements with leading AI vendors that provide stronger protections and limit how submitted data may be retained or used. But those protections still do not apply to many of the models that clinicians can easily access on their own.

    Over the past year, in conversations with medical students and trainees around the country, I have heard the same pattern again and again. The AI tools available inside approved hospital environments are often weaker, harder to use, or less helpful than the best systems available to the general public. So trainees improvise. They compare models. They exchange tips. They gravitate toward whatever seems most capable of helping them think through a diagnostic problem, frame a management plan, or communicate more effectively.

    They are not doing this because they are naïve. Quite the opposite. Most are well aware that AI systems can hallucinate, omit, and mislead. But they also know that these tools can jog memory, widen a differential, reframe a problem, and help them express a plan more clearly. When I ask how they justify the regulatory risk, the answer is usually some version of one of two things: they learned the behavior from those slightly ahead of them, or they believe that, in the moment, the benefit to the patient outweighs the institutional rule they are bending.

    That is not a healthy equilibrium. It is ethically unstable, legally exposed, and educationally corrosive. A recent NEJM AI editorial captured this tension well: clinicians are making pragmatic tradeoffs in the face of real need, but they are doing so in a vacuum of institutional clarity.

    So what should healthcare institutions do?

    One response is restrictive. Hospitals can limit AI use to tools vetted by the institution or bundled by the EHR vendor, and treat outside use as a serious compliance violation even when clinicians access those tools through personal accounts. That approach has the appeal of clarity. But it is unlikely to work for long if the permitted tools are materially worse than what is available elsewhere. Trainees will not stop comparing quality simply because leadership wishes they would.

    The better response is forward-looking. Institutions should acknowledge three realities at once: these tools are already clinically influential; their capabilities will change rapidly; and no single company is likely to remain best indefinitely. On that basis, hospitals and medical schools should make safe AI use part of formal clinical apprenticeship. They should teach where AI helps, where it fails, what kinds of patient data can and cannot be used in which settings, how outputs should be checked, and how responsibility remains with the clinician. At the same time, healthcare leaders should negotiate flexible privacy-preserving agreements with multiple vendors so that clinicians can use high-performing tools lawfully, compare them directly, and develop informed judgment about their strengths and weaknesses.

    If enough healthcare institutions demand that kind of access, more AI vendors will create the contractual and technical mechanisms needed to support it.

    The restrictive path will not just be frustrating. It will be demoralizing. Years ago, I wrote about how clunky and antiquated much of our EHR infrastructure felt compared with the tools available to ordinary teenagers outside medicine. That gap was not trivial; it contributed to burnout. We now risk repeating the same mistake with AI, but on a larger scale.

    If we force clinicians to choose between following outdated institutional constraints and using the best available tools to help patients, many will choose the latter, quietly. That silence is the real danger. Healthcare institutions should not train the next generation to hide their use of AI. They should train them to use it well, lawfully, critically, and in the open.

  • The Medical Alignment Problem—A Primer for AI Practitioners.

    Version 0.6 (Revision history at the bottom) November 30, 2023

    Much has been written about harmonizing AI with our ethical standards, a topic of great significance that still demands further exploration. Yet, an even more urgent matter looms: realigning our healthcare systems to better serve patients and society as a whole. We must confront a hard truth: the alignment of these systems with our needs has always been imperfect, and the situation is deteriorating.

    My purpose is not to sway healthcare policy but to shed light on this issue for a specific audience: my peers in computer science, along with students in both medicine and computer science. They frequently pose questions to me, prompting this examination. These inquiries aren’t just academic or mercantile; they reflect a deep concern about how our healthcare systems are failing to meet their most fundamental objectives and an intense desire to bring their own expertise, energy and optimism to address these failures.

    A sampling of these questions:

    • Which applications in clinical medicine are ripe for improvement or disruption by AI?
    • What do I have to demonstrate to get my AI program adopted?
    • Who decides which programs are approved or paid for?
    • This program we’ve developed helps patients. So why are doctors, nurses and other healthcare personnel so reluctant to use our program?
    • Why can’t I just market this program directly to patients?

    To avoid disappointing any reader: be warned, I am not going to answer those questions here, although I have done so in the past and will continue to do so. Here I will focus only on the misalignment between organized/establishment healthcare and its mission to improve the health of members of our society. Understanding that misalignment is a necessary preamble to answering questions of the sort listed above.

    Basic Facts of Misalignment of Healthcare

    Let’s proceed to some of the basic facts about the healthcare system and the growing misalignments. Again, many of these pertain to several developed countries but they are most applicable to the US.

    Primary care is where you go for preventive care (e.g., yearly checkups) and where you go first when you have a medical problem. In the US, primary care doctors are among the lowest paid. They also have a constantly increasing administrative burden. As a result, despite the growing need for primary care with the graying of our citizens, the gap between the number of primary care doctors and the need for such doctors may exceed 40,000 within the next 10 years in the US alone.

    In response to the growing gap between the demand for primary care and the availability of primary care doctors, the U.S. healthcare system has seen a notable increase in the employment of nurse practitioners (NPs) and physician assistants (PAs). These professionals now constitute an estimated 25% of the primary care workforce in the United States, a figure that is expected to rise in the coming years.

    You might think that U.S. doctors, who earn roughly double the income of doctors in Europe, would at least have a stable workload. Despite this higher pay, they face relentless pressure, often exerted by department heads or hospital administrators, to see more patients each day.

    The thorough processes that were once the hallmark of medical training—careful patient history taking, physical examinations, crafting thoughtful diagnostic or management plans, and consulting with colleagues—are now often condensed into forms that barely resemble their original intent. This transformation of medical practice into a high-pressure, high-volume environment contributes to several profound issues: clinician burnout, patient dissatisfaction, and an increased likelihood of clinical errors. These issues highlight a growing disconnect between the healthcare system’s operational demands and the foundational principles of medical practice. This misalignment not only affects healthcare professionals but also has significant implications for patient care and safety.


    The acute workforce shortage in healthcare extends well beyond the realm of primary care, touching various subspecialties that are often less lucrative and, perhaps as a result, perceived as less prestigious. Fields such as Developmental Medicine, where children are assessed for conditions like ADHD and autism, pediatric infectious disease, pediatric endocrinology, and geriatrics, consistently face the challenge of unfilled positions year after year.

    This shortage is compounded by a growing trend among medical professionals seeking careers outside of clinical practice. Recent surveys indicate that about one-quarter of U.S. doctors are exploring non-clinical career paths in areas such as industry, writing, or education. Similarly, in the UK, half of the junior doctors are considering alternatives to clinical work. This shift away from patient-facing roles points to deeper issues within the healthcare system, including job dissatisfaction, the allure of less stressful or more financially rewarding careers, and perhaps a disillusionment with the current state of medical practice. This trend not only reflects the personal choices of healthcare professionals but also underscores a systemic issue that could further exacerbate the existing shortages in crucial medical specialties, ultimately impacting patient care and the overall effectiveness of the healthcare system.

    Doctors have been burned by information technology: electronic health records (EHRs). Initially introduced as a tool to enhance healthcare delivery, EHRs have increasingly been used primarily for documenting care for reimbursement purposes. This shift in focus has led to a significant disconnect between the potential of these systems and their actual use in clinical settings. Most of the implementations in wide use over the last 15 years have rococo user interfaces that would offend the sensibilities of most “less is more” advocates. Many technologists are unaware of the details of clinicians’ experience with these systems because EHR companies have contractually imposed gag orders that prevent doctors from publishing screenshots. Yet these same EHR systems are widely understood to be major contributors to doctor burnout and general disaffection with clinical care. These same EHRs cost millions of dollars (hundreds of millions for a large hospital) and have made many overtaxed hospital information technology leaders wary of adopting new technologies.

    At least 25% of US healthcare costs are administrative. This administrative overhead, heaped atop the provision of healthcare services, includes the tug of war between healthcare providers and healthcare payors over how much to bill and how much to reimburse. It also includes authorization for procedures and referrals, the multiple emails and calls to coordinate care among the members of the care team writ large (pharmacist, visiting nurse, rehabilitation hospital, social worker), and the multiple pieces of documentation entailed by each patient encounter (e.g., the post-visit note to the patient, to the billing department, to a referring doctor). These non-clinical tasks do not carry the same liability as patient care, and the infrastructure to execute them is more mature. As noted by David Cutler and colleagues, this makes it very likely that administrative processes will present the greatest initial opportunity for a broad foothold of AI in the processes of healthcare.

    Even in centralized, nationalized healthcare systems there is a natural pressure to do something when faced with a patient who is suffering or worried. Watchful waiting, when medically prudent, requires ensuring that the patient understands that not doing anything might be the best course of action. This requires the doctor to establish trust during the first visit and in future visits, so the patient can be confident that their doctor will be vigilant and ready to change course when needed. This requires a lot more time and communication than many simple treatments or procedures. The pressure to treat is even more acute when reimbursement is under a fee-for-service system, as is the case for at least 1/3 of US healthcare. That is, doctors get paid for delivering treatments rather than for better outcomes. One implication is that advice (from humans or AI) not to deliver a treatment might be in financial conflict with the interests of the clinician.

    The substrate for medical decision-making is high-quality data about the patients in our care. Those data are often obtained at considerable effort, cost, and risk to the patient (e.g., when a diagnostic procedure is involved). Sharing those data across healthcare, wherever it is provided, has been an obvious and long-sought goal. Yet in many countries, patient data remain locked in proprietary systems or accessible to only a few designees. Systematic and continual movement of patient data to follow patients wherever they receive care is relatively rare and incomplete. EHR companies with large market share therefore have outsized leverage in influencing the process of healthcare and in guiding medical leaders toward marketing patient data (e.g., for market research or for training AI models). They are often also aligned with healthcare systems that would rather not share clinical data with their competitors. Fortunately, the 21st Century Cures Act passed by the US Congress has explicitly provided for the support of APIs such as SMART-on-FHIR to allow patients to transport their data to other systems. The infrastructure to support this transport is still in its infancy but has been accelerated by companies such as Apple, which have given customers access to their own healthcare records across hundreds of hospitals.
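    As a minimal, illustrative sketch of what consuming SMART-on-FHIR data looks like in practice, the function below pulls (code, value, unit) triples out of a FHIR R4 Bundle of Observation resources. The sample bundle is hand-made for illustration, not output from any real EHR, though the LOINC codes shown are real ones for serum glucose and blood hemoglobin.

```python
# Sketch: extracting lab results from a FHIR R4 Bundle of Observation
# resources, the kind of payload a SMART-on-FHIR API returns.
# The sample bundle is hand-made for illustration, not real patient data.

def extract_labs(bundle):
    """Return (LOINC code, value, unit) for each Observation in the bundle."""
    labs = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Observation":
            continue
        coding = resource.get("code", {}).get("coding", [{}])[0]
        quantity = resource.get("valueQuantity", {})
        if "value" in quantity:
            labs.append((coding.get("code", "?"),
                         quantity["value"],
                         quantity.get("unit", "")))
    return labs

sample_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {
            "resourceType": "Observation",
            "code": {"coding": [{"system": "http://loinc.org", "code": "2345-7",
                                 "display": "Glucose [Mass/volume] in Serum or Plasma"}]},
            "valueQuantity": {"value": 95.0, "unit": "mg/dL"},
        }},
        {"resource": {
            "resourceType": "Observation",
            "code": {"coding": [{"system": "http://loinc.org", "code": "718-7",
                                 "display": "Hemoglobin [Mass/volume] in Blood"}]},
            "valueQuantity": {"value": 13.2, "unit": "g/dL"},
        }},
    ],
}

print(extract_labs(sample_bundle))
# [('2345-7', 95.0, 'mg/dL'), ('718-7', 13.2, 'g/dL')]
```

    In a real deployment the bundle would arrive from an authorized API call rather than a literal, but the parsing step is essentially this.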

    Finally, at the time of this writing (2023) hospitals and healthcare systems are under enormous pressure to deliver care in a more timely and safer fashion and simultaneously are financially fragile. This double jeopardy was accentuated by the consequences of the 2020 pandemic. It may also be that the pandemic merely accelerated the ongoing misalignment between medical capabilities, professional rewards, societal healthcare needs and an increasingly anachronistic and inefficient medical education and training process. The stresses caused by the misalignment may create cracks into which new models of healthcare may find a growing niche but it might also bolster powerful reactionary forces to preserve the status quo.

    Did I miss an important gap relevant to AI/CS scientists, developers or entrepreneurs? Let me know by posting in this post’s comments section (which I moderate) or just reply to my X/Twitter post @zakkohane.

    Version History

    0.1  Initially covered many more woes of medicine
    0.2  Refocused on bits most relevant to AI developers/computer scientists
    0.3  Removed many details that detracted from the message
    0.4  Inserted the kinds of questions that I have answered in the past but need to first provide this bulletized version of the misalignments of the healthcare system as a necessary preamble
    0.5  Added more content on EHRs and corrected cut-and-paste errors! (Sorry!)
    0.6  Added positions unfilled as per https://twitter.com/jbcarmody/status/1729933555810132429/photo/1

  • Standing on the shoulders of clinicians.

    The recent publication “Health system-scale language models as all-purpose prediction engines” by Jiang et al. in Nature (June 7th, 2023) piqued my interest. The authors executed an impressive feat by developing a Large Language Model (LLM) that was fine-tuned using data from multiple hospitals within their healthcare system. The LLM’s predictive accuracy was noteworthy, yet it also highlighted the critical limitations of machine learning approaches for prediction tasks using electronic health records (EHRs).

    Take a look at the diagram from our 2021 publication Machine learning for patient risk stratification: standing on, or looking over, the shoulders of clinicians?. It makes the point that the EHR is not merely a repository of objective measurements but also a record (whether explicit or not) of physician beliefs about the patient’s physiological state and prognosis for every clinical decision recorded. To draw a comparison, using clinicians’ decisions to diagnose and predict outcomes resembles a diligent, well-read medical student who has yet to master reliable diagnosis. Just as such a student would glean insight from the actions of their supervising physician (ordering a CT scan or ECG, for instance), these models also learn from clinicians’ decisions. Nonetheless, left to their own devices, they would be at sea without the cues of the expert decision-maker. In our study, we showed that relying solely on physician decisions—as represented by charge details—to construct a predictive model resulted in performance remarkably similar to that of models using comprehensive EHR data.

    The LLMs from Jiang et al.’s study resemble the aforementioned diligent but inexperienced medical student. For instance, the authors used the discharge summary to predict readmission within 30 days in a prospective study. These summaries outline the patient’s clinical course, treatments undertaken, and, occasionally, risk assessments from the discharging physician. The high accuracy of the LLMs—particularly when contrasted with baselines like APACHE II, which rely primarily on physiological measurements—reveals that effective use of clinicians’ medical judgments is the key to their performance.

    This finding raises the question: what are the implications for EHR-tuned LLMs beyond the proposed study? It suggests that quality assessment and improvement teams, as well as administrators, should consider employing LLMs as a tool for gauging their healthcare systems’ performance. However, if new clinicians—whose documented decisions might not be as high-quality—are introduced, or if the LLM is transferred to a different healthcare system with other clinicians, the predictive accuracy may suffer. That is because clinician performance is highly variable over time and location. This variability (aka data set shift) might explain the fluctuations in predictive accuracy the authors observed during different months of the year.
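    The dataset-shift point lends itself to a simple operational check: track the model’s discrimination month by month and flag months that deviate. The sketch below is illustrative only; the records are synthetic, and the AUC is computed with the hand-rolled Mann–Whitney rank form rather than any particular library.

```python
# Sketch: monitoring a model's discrimination month by month to surface
# dataset shift. Records are synthetic (month, predicted risk, outcome).
from collections import defaultdict

def auc(scores_pos, scores_neg):
    """Probability a random positive outranks a random negative (ties count 0.5).
    Assumes both groups are non-empty."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def monthly_auc(records):
    """records: iterable of (month, predicted_risk, outcome in {0, 1}).
    Returns {month: AUC}, assuming each month has both outcomes."""
    by_month = defaultdict(lambda: ([], []))
    for month, score, outcome in records:
        by_month[month][outcome].append(score)  # index 0 = negatives, 1 = positives
    return {m: auc(pos, neg) for m, (neg, pos) in sorted(by_month.items())}

records = [
    ("2023-01", 0.9, 1), ("2023-01", 0.8, 1), ("2023-01", 0.2, 0), ("2023-01", 0.3, 0),
    ("2023-02", 0.4, 1), ("2023-02", 0.7, 1), ("2023-02", 0.6, 0), ("2023-02", 0.5, 0),
]
print(monthly_auc(records))
# {'2023-01': 1.0, '2023-02': 0.5}
```

    A sharp drop in a month’s AUC, as in the synthetic February above, is the signal that the documented decisions feeding the model have changed, whether because of new clinicians, new practice patterns, or a new site.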

    Jiang et al.’s study illustrates that LLMs can leverage clinician behavior and patient findings—as documented in EHRs—to predict a defined set of near-term future patient trajectories. This observation paradoxically implies that in the near future, one of the most critical factors for improving AI in clinical settings is ensuring our clinicians are well-trained and thoroughly understand the patients under their care. Additionally, they must be consistent in communicating their decisions and insights. Only under these conditions will LLMs obtain the per-patient clinical context necessary to replicate the promising results of this study more broadly.

  • How we obtained and analyzed COVID19 data across 96 hospitals and 5 countries in 4 weeks.

    At first, I waited for others in government, industry, and academia to put together the data and the analyses that would allow clinicians to practice medicine the way it has worked best: knowing what to expect when treating a patient. With the first COVID19 patients, ignorance was to be expected, but with hundreds of patients seen early on in Europe, we could expect solid data about the clinical course of these patients. Knowing what to expect would allow doctors and nurses to be on the lookout for different turns in the trajectory of their patients and thereby act rapidly, without having to wait for the full morbid declaration of yet another manifestation of viral infection pathology. Eventually we could learn what works and what does not, but first, just knowing what to expect would be very helpful. I’m a “cup half-full” optimist, but when, in March, I saw that there were dozens of efforts that would yield important results in months rather than weeks [if there’s interest, I can post an explanation of why I came to that conclusion], I decided to see if I could do something useful with my colleagues in biomedical informatics. Here I’ll focus on what I have found amazing: that groups can work together on highly technical tasks to complete multi-institutional analyses in less than a handful of weeks if they have shared tools, whether open source or proprietary, but most importantly, if they have a detailed understanding of the data from their specific home institution.

    I first reached out to my i2b2 colleagues with a quick email. What are “i2b2 colleagues”? Over 15 years ago, I helped start an NIH-funded national center for biocomputing predicated on the assumption that, by instrumenting the healthcare enterprise, we could reuse for research the data acquired during the course of healthcare (at considerable financial cost and grueling effort by the healthcare workforce, but that’s another story). One of our software products was a free and open source system called i2b2 (named after our NIH-funded national center for biomedical computing: Informatics for Integrating Biology and the Bedside) that enables data to be extracted by authorized users from various proprietary electronic health record systems (EHRs). i2b2 was adopted by hundreds of academic health centers, and an international community of informaticians formed to share knowledge (e.g., how to analyze EHR data). The group meets twice a year, once in the US and once in Europe, and has a non-profit foundation to keep it organized. This is the “i2b2” group I emailed. I wrote that there was an opportunity to rapidly contribute to our understanding of the clinical course. We were going to have to focus on data that was available now and useful in the aggregate (obtaining permission to share individual patient data across institutions, let alone countries, is a challenging and lengthy process). As most of us were using the same software to extract data from the health record, we had a head start, but we all knew a lot of thought and work would be required to succeed. Among the many tasks we had to address:

    • Make sure that the labs reported by each hospital were the same ones. A glucose result can be recorded under dozens of different names in an EHR. Which one(s) should be picked? Which standard vocabulary should be used to name that glucose and other labs? (This is the terrifyingly innocuous-sounding yet soul-deadening process of “data harmonization.”)
    • Determine what constitutes a COVID19 patient. Some hospitals received patients said to be COVID19 positive without knowing which test was positive. Others used two specific tests. If the first is negative and the second is positive, what is the time of diagnosis: the admission, the first test, or the second test?
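    The first of these tasks, mapping each site’s local lab names onto one shared vocabulary, can be sketched as a lookup table plus a review queue for unmapped names. The local names below are invented examples, not drawn from any participating hospital; the LOINC codes are real (2345-7 for serum/plasma glucose, 2160-0 for serum/plasma creatinine).

```python
# Sketch of the "data harmonization" step: mapping site-local lab names
# onto shared LOINC codes. Local names are invented examples.
from typing import Optional

LOCAL_TO_LOINC = {
    "glucose": "2345-7",          # Glucose [Mass/volume] in Serum or Plasma
    "glucose, serum": "2345-7",
    "glu": "2345-7",
    "glucose level": "2345-7",
    "creatinine": "2160-0",       # Creatinine [Mass/volume] in Serum or Plasma
    "creat": "2160-0",
}

def harmonize(local_name: str) -> Optional[str]:
    """Return the shared LOINC code for a site-local lab name, or None
    if the name is unmapped and needs human review."""
    return LOCAL_TO_LOINC.get(local_name.strip().lower())

print(harmonize("GLUCOSE, SERUM"))  # 2345-7
print(harmonize("Troponin T"))      # None -> flag for manual mapping
```

    The hard part, of course, is not the lookup but deciding, name by name and site by site, what belongs in the table; that is what the Zoom and Slack conversations were for.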

    Assigning these tasks across more than 100 collaborators in 5 countries during the COVID19 mandatory confinement, and then coordinating them without a program manager, was going to be challenging under any conditions. Doing so with the goal of showing results within weeks, even more so. In addition to human resourcefulness and passion, we were fortunate to have a few tools that made this complex international process tractable: Slack, Zoom, Google documents, Jupyter notebooks, GitHub, and a shared workspace on the Amazon cloud (where the aggregate data were stored; the individual patient data remained at the hospital sites). We divided the work into subtasks (e.g., common data format, visualization, results website, manuscript writing, data analysis) and created a Slack channel for each. Then those willing to work on a subtask self-organized on its channel. Three of the tools I’ve listed above were not available just 10 years ago.

    We were able to publish the first results and a graphically sophisticated website within 4 weeks. See covidclinical.net for the result, along with the pre-print. All of this with consumer-level tools and, of course, a large prior investment in open source software designed for the analysis of electronic health records. We now have a polished version of the pre-print published in npj Digital Medicine and a nice editorial.

    Nonetheless, the most important takeaway from this rapid and successful sprint to characterize the clinical course of COVID19 is the familiarity the informaticians at each clinical site had with their institutional data. This certainly helped them respond rapidly to data requests, but it mattered less than their precise understanding of the semantics of their own data. Even sites with the same electronic health record vendor had practice styles that meant a laboratory name (e.g., Troponin) in one clinic was not the same test as in the ICU laboratory. Sorting that out required dozens of Zoom and Slack conversations. Yet many of the commercial aggregation efforts are of necessity blind to these distinctions because their business model precludes this detailed back and forth with each source of clinical data. Academic aggregation efforts tend to be more fastidious about aligning semantics across sites, but the committee-driven processes that result are understandably ponderous and, with hundreds of hospitals, take months at least. Among the techniques we used to maintain our agility were a ruthless focus on a subset of the data for a defined set of questions and a steadfast refusal to expand our scope until we completed the first, narrowly defined tasks, as encapsulated by our first pre-print. Our experience with international colleagues using i2b2 since 2006 also created a lingua franca and patience with reciprocal analytic “favor-asking.” 4CE has continued to hold multiple meetings per week, and I hope to add to this narrative in the near future.