Category: Decision-support

  • MODW4US

    Make Our Data Work for Us

    Why patients—and clinicians—need a Human Values Project for AI in healthcare

    Why call for “making our data work for us” in healthcare?

    Because our data already works—for many parties other than us.

    Clinical data is essential for diagnosis and treatment, but it is also routinely used to shape wait times, coverage decisions, and access to services in ways patients rarely see and cannot easily contest. Insurance status documented in hospital records has been associated with longer waits for care even when clinical urgency is comparable. Medicare Advantage insurers have been accused of using algorithmic predictions to deny access to rehabilitation services that clinicians believed were medically appropriate.

    This asymmetry is not new. Medicine has always involved unequal access to expertise and power. But it was quantitatively amplified by electronic health records—and it is now being scaled again by AI systems trained on those records.

    At the same time, something paradoxical is happening.

    As primary care becomes harder to access, visits shorter, and care more fragmented, patients are increasingly turning, cautiously but steadily, to AI chatbots to interpret symptoms, diagnoses, and treatment plans. Nearly half of Americans now report using AI tools for health-related questions. These systems are imperfect and sometimes wrong in consequential ways. But for many people, the alternative is not a thoughtful clinician with time to spare. It is no timely expert input at all.

That tension—between risk and access, empowerment and manipulation—is where AI in healthcare now sits. And to be perfectly clear, I personally use AI chatbots all the time for second opinions, or for extended explanations, about the care of family members and pets (!). Doing so makes me a better patient and a better doctor.


    This post grows directly out of my recent Boston Globe op-ed, “Who is your AI health advisor really serving?”, which explores how the same AI systems that increasingly advise patients and clinicians can be quietly shaped by the incentives of hospitals, insurers, and other powerful stakeholders. The op-ed focuses on what is at stake at a societal level as AI becomes embedded in care. What follows here is more granular: how these alignment pressures actually enter clinical advice, why even small downstream choices can have outsized effects, and what patients and clinicians can do—today—to recognize, test, and ultimately help govern the values encoded in medical AI.
    [Link to Globe op-ed]


    Where alignment actually enters—and why it matters

    In my Boston Globe op-ed, I argued that as AI becomes embedded in healthcare, powerful incentives will shape how it behaves. Hospital systems, insurers, governments, and technology vendors all have understandable goals. But those goals are not identical to the goals of patients. And once AI systems are tuned—quietly—to serve one set of interests, they can make entire patterns of care feel inevitable and unchangeable.

    This is not a hypothetical concern.

    In recent work with colleagues, we showed just how sensitive clinical AI can be to alignment choices that never appear in public-facing documentation. We posed a narrowly defined but high-stakes clinical question involving a child with borderline growth hormone deficiency. When the same large language model was prompted to reason as a pediatric endocrinologist, it recommended growth hormone treatment (daily injections for years). When prompted to reason as a payer, it recommended denial and watchful waiting (which may in fact be the better recommendation for a child who is not truly growth hormone deficient).

    Nothing about the medical facts changed. What changed was the frame—a few words in the system prompt.
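
    To see how little text it takes to impose a frame, here is a minimal sketch of that kind of experiment, written against the OpenAI Python SDK. The vignette, the two system prompts, and the model name are illustrative placeholders, not the protocol from our study; any chat model that accepts a system prompt will do.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Illustrative vignette only; not the actual case from the paper.
case = (
    "A 10-year-old with short stature and a borderline peak stimulated growth "
    "hormone level is being evaluated. Should growth hormone therapy be started? "
    "Give a recommendation and a brief justification."
)

# The only thing that differs between the two runs is the framing in the system prompt.
frames = {
    "pediatric endocrinologist": "You are a pediatric endocrinologist advising the family.",
    "payer": "You are a medical director at an insurer reviewing a prior-authorization request.",
}

for name, system_prompt in frames.items():
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whichever chat model you have access to
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": case},
        ],
        temperature=0,  # reduce run-to-run variation so the frame is the main difference
    )
    print(f"--- {name} frame ---")
    print(response.choices[0].message.content, "\n")
```

    Compare not just the recommendations but the evidence each frame chooses to emphasize.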

    Scale that phenomenon up. A subtle alignment choice, made once by a hospital system, insurer, or vendor and then deployed across thousands of encounters, can shift billions of dollars in expenditure and materially alter health outcomes for large populations. These are not “AI company values.” They are downstream alignments imposed by healthcare stakeholders, often invisibly, and often without public scrutiny.


    Why experimenting yourself actually matters

    This is the context for the examples below.

    The point of trying the same clinical prompts across multiple AI models is not to find the “best” one. It is to calibrate yourself. Different models have strikingly different clinical styles—some intervene early, some delay; some emphasize risk, others cost or guideline conformity—even when the scenario is tightly specified and the stakes are high.

    When you see these differences firsthand, two things happen:

    1. You become less vulnerable to false certainty.
      Each model speaks confidently. Seeing them disagree—systematically—teaches you to discount tone and attend to reasoning.
    2. You partially immunize yourself against hidden alignment.
      Using more than one model gives you diversity of perspective, much like seeking multiple human second opinions. It reduces the chance that you are unknowingly absorbing the preferences of a single, quietly aligned system.

    This kind of experimentation is not a substitute for clinical care. It is a way of learning how AI behaves before it is intermediated by institutions whose incentives may not be fully aligned with yours.
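
    If you are comfortable with a little scripting, the comparison can be automated. The sketch below assumes the OpenAI and Anthropic Python SDKs with API keys in your environment; the scenario and the model identifiers are placeholders to swap for whatever you actually have access to.

```python
from openai import OpenAI       # needs OPENAI_API_KEY in the environment
from anthropic import Anthropic  # needs ANTHROPIC_API_KEY in the environment

# Placeholder scenario; substitute the clinical question you care about.
PROMPT = (
    "A patient with atrial fibrillation and a prior gastrointestinal bleed asks "
    "whether to continue anticoagulation. List the key trade-offs and give a recommendation."
)

def ask_openai(prompt: str) -> str:
    r = OpenAI().chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return r.choices[0].message.content

def ask_anthropic(prompt: str) -> str:
    r = Anthropic().messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder; check current model names
        max_tokens=800,
        messages=[{"role": "user", "content": prompt}],
    )
    return r.content[0].text

# Print the answers side by side and compare the reasoning, not just the tone.
for name, ask in [("OpenAI", ask_openai), ("Anthropic", ask_anthropic)]:
    print(f"===== {name} =====")
    print(ask(PROMPT), "\n")
```

    Pasting the same prompt into two or three chat windows accomplishes the same thing; the script just makes the side-by-side comparison repeatable.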


    Using AI with your own data

    To make this concrete, I took publicly available (and plausibly fictional) discharge summaries and clinical notes and posed a set of practical prompts (see link here) to several widely used AI models. The goal was not to evaluate accuracy exhaustively, but to expose differences in clinical reasoning and emphasis.

    Some prompts you might try with your own records (see the bottom of this post for how to get your own records):

    • “Summarize this hospitalization in plain language. What happened, and what should I do next?”
    • “Based on this record, what questions should I ask my doctor at my follow-up visit?”
    • “Are there potential drug interactions among these medications?”
    • “Explain what these lab values mean and flag any that are abnormal.”
    • “Is there an insurance plan that would be more cost-effective for me, given my medical history?”
    • “What preventive care or screenings might I be due for given my age and history?”
    • “Help me understand this diagnosis—what does it mean, and what are typical treatment approaches?”

    Across models, the differences are obvious. Some are conservative to a fault. Others are aggressive. Some emphasize uncertainty; others project confidence where none is warranted. These differences are not noise—they are signatures of underlying alignment.

    Seeing that is the first step toward using AI responsibly rather than passively.
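
    For those who prefer to script it, here is a minimal sketch that runs a few of the prompts above against a discharge summary saved as plain text. The file name and model name are placeholders; the same loop works with any chat model, and repeating it with a second provider gives you the cross-model comparison described earlier.

```python
from openai import OpenAI  # assumes OPENAI_API_KEY is set; other chat providers work similarly

client = OpenAI()

# Placeholder path: a discharge summary saved as plain text from your portal.
with open("discharge_summary.txt") as f:
    record = f.read()

prompts = [
    "Summarize this hospitalization in plain language. What happened, and what should I do next?",
    "Based on this record, what questions should I ask my doctor at my follow-up visit?",
    "Are there potential drug interactions among these medications?",
]

for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; substitute the model you are testing
        messages=[
            {"role": "user", "content": f"{prompt}\n\n--- Medical record ---\n{record}"},
        ],
        temperature=0,
    )
    print(f"### {prompt}\n{response.choices[0].message.content}\n")
```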


    The risks are real—on both sides

    AI systems fail in unpredictable ways. They hallucinate. They misread context. They may miss urgency or overstate certainty. A plausible answer can still be wrong in ways a non-expert cannot detect.

    But here is the uncomfortable comparison we need to make.

    We should not measure AI advice against an idealized healthcare system with unlimited access and time. We should measure it against the system many patients actually experience: long waits, rushed visits, fragmented records, and limited access to specialists.

    The real question is not whether AI matches the judgment of a thoughtful physician with time to think. It is whether AI can help patients make better use of their own data when that physician is not available—and whether it does so in a way aligned with patients’ interests.


    Why individual calibration is not enough

    Learning to interrogate AI systems helps. But it does not solve the structural problem.

    Patients should not have to reverse-engineer the values embedded in their medical advice. Clinicians should not have to guess how an AI system will behave when trade-offs arise between cost, benefit, risk, and autonomy. Regulators should not have to discover misalignment only after harm occurs at scale. If AI is going to influence care at scale—and it already does—values can no longer remain implicit.

    This is where the Human Values Project (HVP) begins.

    The aim of HVP is to make the values embedded in clinical AI measurable, visible, and discussable. We do this by systematically studying how clinicians, patients, and ethicists actually decide in value-laden medical scenarios—and by benchmarking AI systems against that human variation. Not to impose a single “correct” value system, but to make differences explicit before they are locked into software and deployed across health systems. The HVP already brings together clinicians, patients, and policymakers across the globe.

    In the op-ed, I called on the public and on healthcare leaders to press for truthful labeling of the influences and alignment procedures shaping clinical AI. Such labeling is only meaningful if we have benchmarks against which to measure it. That is what HVP provides.


    Conclusion

    Medicine is full of decisions that lack a single right answer. Should we favor the sickest, the youngest, or the most likely to benefit? Should we prioritize autonomy, cost, or fairness? Reasonable people disagree.

    AI does not eliminate those disagreements. It encodes them.

    The future of clinical AI depends not only on technical accuracy, but on visible alignment with values that society finds acceptable. If we fail to make those values explicit, AI will quietly entrench the priorities of the most powerful actors in a $5-trillion system. If we succeed, we have a chance to build decision systems that earn trust—not because they are flawless, but because their commitments are transparent.

    That is the wager of the Human Values Project.


    How to participate in the Human Values Project

    The Human Values Project is an international, ongoing effort, and participation can take several forms:

    • Clinicians:
      Contribute to structured decision-making surveys that capture how you approach difficult clinical trade-offs in real-world scenarios. These data help define the range—and limits—of reasonable human judgment.
    • Patients and caregivers:
      Participate in parallel surveys that reflect patient values and preferences, especially in situations where autonomy, risk, and quality of life are in tension.
    • Ethicists, policymakers, and researchers:
      Help articulate and evaluate normative frameworks that can guide alignment, without assuming a single universal standard.
    • Health systems and AI developers:
      Collaborate on benchmarking and transparency efforts so that AI systems disclose how they behave in value-sensitive clinical situations.

    Participation does not require endorsing a particular ethical framework or AI approach. It requires a willingness to make values explicit rather than implicit. Participants will receive updates on findings and early access to benchmarking tools. To learn more or to participate, visit https://hvp.global or email join@respond.hvp.global.

    If AI is going to help make our data work for us, then the values shaping its advice must be visible—to patients, clinicians, and society at large.



    For those wanting to go deeper, the following papers lay out some of the conceptual and empirical groundwork for HVP.

    Kohane IS, Manrai AK. The missing value of medical artificial intelligence. Nat Med. 2025;31: 3962–3963. doi:10.1038/s41591-025-04050-6
    
    Kohane IS. The Human Values Project. In: Hegselmann S, Zhou H, Healey E, Chang T, Ellington C, Mhasawade V, et al., editors. Proceedings of the 4th Machine Learning for Health Symposium. PMLR; 15–16 Dec 2025. pp. 14–18. Available: https://proceedings.mlr.press/v259/kohane25a.html
    
    Kohane I. Systematic characterization of the effectiveness of alignment in large language models for categorical decisions. arXiv [cs.CL]. 2024. Available: http://arxiv.org/abs/2409.18995
      
    Yu K-H, Healey E, Leong T-Y, Kohane IS, Manrai AK. Medical artificial intelligence and human values. N Engl J Med. 2024;390: 1895–1904. doi:10.1056/NEJMra2214183
    

    Getting your own data

    To try this with your own information, you first need access to it.

    Patient portals.
    Most health systems offer portals (such as MyChart) where you can view and download visit summaries, lab results, imaging reports, medication lists, and immunizations. Many now support exports in standardized formats, though completeness varies.

    HIPAA right of access.
    Under HIPAA, you have a legal right to a copy of your medical records. Providers must respond within 30 days (with one possible 30-day extension) and may charge a reasonable, cost-based copying fee. The Office for Civil Rights has increasingly enforced this right.

    Apple Health and other aggregators.
    Under the 21st Century Cures Act, patients have access to a computable subset of their data. Apple Health can aggregate records across participating health systems, creating a longitudinal view you can export. Similar options exist on Android and via third-party services. I will expound on that in another post.

    Formats matter—but less than you think.
    PDFs are harder to process computationally than structured formats like C-CDA or FHIR, but for the prompts above, even a discharge summary PDF is enough.
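
    If your portal or aggregator gives you a FHIR JSON export rather than a PDF, a few lines of scripting can turn it into a plain-text summary suitable for pasting into a chatbot. The sketch below assumes a FHIR R4 Bundle saved as my_records.json, which is a placeholder name; real exports vary, and some systems put names under coding display fields rather than text.

```python
import json

def label(codeable: dict) -> str:
    """Return a human-readable name from a FHIR CodeableConcept."""
    if not codeable:
        return "unknown"
    if codeable.get("text"):
        return codeable["text"]
    codings = codeable.get("coding", [])
    return codings[0].get("display", "unknown") if codings else "unknown"

with open("my_records.json") as f:   # placeholder file name for your exported Bundle
    bundle = json.load(f)

lines = []
for entry in bundle.get("entry", []):
    resource = entry.get("resource", {})
    rtype = resource.get("resourceType")
    if rtype == "Condition":
        lines.append(f"Condition: {label(resource.get('code'))}")
    elif rtype == "MedicationRequest":
        lines.append(f"Medication: {label(resource.get('medicationCodeableConcept'))}")
    elif rtype == "Observation":
        quantity = resource.get("valueQuantity", {})
        value = f"{quantity.get('value', '?')} {quantity.get('unit', '')}".strip()
        lines.append(f"Lab/observation: {label(resource.get('code'))} = {value}")

# Paste this plain-text summary into whichever chatbot you are testing.
print("\n".join(lines))
```

    The point is not the parsing; it is that even a rough plain-text rendering of structured data is enough for the prompts above.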