ByteMed #2: AI Roadmap, GPT-4 in a Cancer Center, and Machine Learning Risk Assessment for Pre-eclampsia
The AI Maturity Roadmap: A Framework for Effective and Sustainable AI in Health Care
BACKGROUND
Though many health systems are starting to deploy artificial intelligence (AI) and machine learning for clinical applications, there is limited guidance available to benchmark progress and assess maturity on a sector-wide scale.
METHODS
The Health Management Academy, Microsoft, and Nuance convened the AI Collaborative in 2022. The group comprises more than 50 AI decision-makers from leading health systems across the United States, who worked together in three workstreams to create the first iteration of the AI Maturity Roadmap. The workstreams covered assessing the solution landscape, establishing clinical AI use cases, and creating the initial roadmap.
RESULTS
The AI Maturity Roadmap defines six key focus areas for health systems: Culture, Governance, Business Implementation, Value, Maintenance and Operations, and Information Architecture. Within these areas, there are several granular themes. To benchmark progress against these themes, there are five levels of maturity, ranging from “awareness” at level one to “transformational” at level five.
In the first instance, 24 health systems in the AI Collaborative have been benchmarked against the model. There is a range of maturity that currently skews toward the lower levels, with most self-reporting that their efforts are at level two, “active,” or level three, “operational.”
CONCLUSIONS
Even among industry leaders, maturity levels vary widely, as health systems approach AI from different angles and baselines. This roadmap will be a valuable tool for guiding best practices, investments, and conversations, and for aligning progress so that health systems can benefit from sector-wide advances.
The AI Collaborative continues to meet, discuss, and refine the AI Maturity Roadmap, with plans to expand the Business Implementation, Value, Maintenance and Operations, and Information Architecture sections. For more information about the AI Maturity Roadmap, please visit The Health Management Academy.
GPT-4 in a Cancer Center — Institute-Wide Deployment Challenges and Lessons Learned
The enormous potential for generative pretrained transformers (GPTs) and other artificial intelligence (AI) large language models (LLMs) to improve health care has become increasingly clear. Software tools based on LLMs have been shown to perform as well as or better than humans on many health care–related tasks, including generation of clinical documentation, extraction of structured data from medical records, performance on a growing number of medical board examination benchmarks, and writing accurate and empathetic responses to patients’ medical questions. However, health care and cancer care settings pose unique ethical, legal, regulatory, and technical challenges for large-scale deployment and adoption of LLMs. Such challenges include the essentiality of patient data privacy and security, the direct negative consequences of errors and biases, the need for model interpretability and supporting evidence, the necessity of safeguarding intellectual property and proprietary data, and the difficulty of modifying clinical and operational workflows. Consequently, few LLMs are in use in hospitals outside of controlled research studies or small pilot programs, and none to our knowledge is yet broadly deployed in a dedicated cancer center.

In this case study, we report the challenges and lessons learned in the evaluation and deployment of LLMs at the Dana-Farber Cancer Institute for use in all business areas, including basic research, clinical research, and operations, but not in direct clinical care. In early discussions about whether and how to proceed, we realized that although some risks could be mitigated by clear policy guardrails and a secure technical environment, others would remain, including those regarding compliance with rapidly evolving regulations. We also recognized that substantial, ongoing work would be required to ensure appropriate ethical consideration of each use case and to ensure patient- and human-centric decision-making.
After engaging in discussions over many months and employing a process framework for ethical implementation of AI in our cancer center, we believed it would be better to tackle these challenges as a community, rather than prohibit the use of LLMs altogether. Here, we detail aspects of sponsorship, governance, technical implementation, program launch, socialization, user feedback, and ongoing support and user training in preparation to make generative AI LLMs broadly available to our 12,500-member workforce in a compliant, auditable, and secure manner. We hope other institutions can benefit from our experience as they consider the deployment of these software tools to further their medical and research missions.
Heterogeneity and predictors of the effects of AI assistance on radiologists
The integration of artificial intelligence (AI) in medical image interpretation requires effective collaboration between clinicians and AI algorithms. Although previous studies demonstrated the potential of AI assistance in improving overall clinician performance, the individual impact on clinicians remains unclear. This large-scale study examined the heterogeneous effects of AI assistance on 140 radiologists across 15 chest X-ray diagnostic tasks and identified predictors of these effects. Surprisingly, conventional experience-based factors, such as years of experience, subspecialty and familiarity with AI tools, fail to reliably predict the impact of AI assistance. Additionally, lower-performing radiologists do not consistently benefit more from AI assistance, challenging prevailing assumptions. Instead, we found that the accuracy of the AI predictions strongly shapes the effect of assistance, with inaccurate AI predictions adversely affecting radiologist performance on the aggregate of all pathologies and on half of the individual pathologies investigated. Our findings highlight the importance of personalized approaches to clinician–AI collaboration and the importance of accurate AI models. By understanding the factors that shape the effectiveness of AI assistance, this study provides valuable insights for targeted implementation of AI, enabling maximum benefits for individual clinicians in clinical practice.
Background
Affecting 2–4% of pregnancies, pre-eclampsia is a leading cause of maternal death and morbidity worldwide. Using routinely available data, we aimed to develop and validate a novel machine learning-based and clinical setting-responsive time-of-disease model to rule out and rule in adverse maternal outcomes in women presenting with pre-eclampsia.
Methods
We used health system, demographic, and clinical data from the day of first assessment with pre-eclampsia to predict a Delphi-derived composite outcome of maternal mortality or severe morbidity within 2 days. Machine learning methods, multiple imputation, and ten-fold cross-validation were used to fit models on a development dataset (75% of combined published data of 8843 patients from 11 low-income, middle-income, and high-income countries). Validation was undertaken on the unseen 25%, and an additional external validation was performed in 2901 inpatient women admitted with pre-eclampsia to two hospitals in south-east England. Predictive accuracy was determined by the area under the receiver operating characteristic curve (AUROC), and risk categories were data-driven and defined by negative (–LR) and positive (+LR) likelihood ratios.
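The modelling pipeline described above can be sketched in scikit-learn. This is an illustrative reconstruction, not the authors' code: the data are synthetic, simple mean imputation stands in for the paper's multiple imputation, and all hyperparameters are assumptions.

```python
# Hedged sketch of the described methodology: a random-forest classifier
# evaluated with ten-fold cross-validation and AUROC. Synthetic data and
# mean imputation are stand-ins; nothing here is from the actual study.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
n, p = 1000, 18                              # 18 predictors, as in PIERS-ML
X = rng.normal(size=(n, p))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 1.5).astype(int)
X[rng.random(X.shape) < 0.05] = np.nan       # inject missingness to impute

model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),   # placeholder for multiple imputation
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Out-of-fold predicted probabilities give an honest cross-validated AUROC.
prob = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
print(f"cross-validated AUROC: {roc_auc_score(y, prob):.2f}")
```

Scoring on out-of-fold predictions (rather than refitting on the full data) is what keeps the AUROC estimate comparable to the paper's held-out validation figures.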
Findings
Of 8843 participants, 590 (6·7%) developed the composite adverse maternal outcome within 2 days, 813 (9·2%) within 7 days, and 1083 (12·2%) at any time. An 18-variable random forest-based prediction model, PIERS-ML, was accurate (AUROC 0·80 [95% CI 0·76–0·84] vs the currently used logistic regression model, fullPIERS: AUROC 0·68 [0·63–0·74]) and categorised women into very low risk (–LR <0·1; eight [0·7%] of 1103 women), low risk (–LR 0·1 to 0·2; 321 [29·1%] women), moderate risk (–LR >0·2 and +LR <5·0; 676 [61·3%] women), high risk (+LR 5·0 to 10·0, 87 [7·9%] women), and very high risk (+LR >10·0; 11 [1·0%] women). Adverse maternal event rates were 0% for very low risk, 2% for low risk, 5% for moderate risk, 26% for high risk, and 91% for very high risk within 48 h. The 2901 women in the external validation dataset were accurately classified as being at very low risk (0% with outcomes), low risk (1%), moderate risk (4%), high risk (33%), or very high risk (67%).
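The likelihood-ratio thresholds above translate into risk bands via Bayes' rule (post-test odds = pre-test odds × LR). The snippet below illustrates that mechanics using the thresholds quoted in the abstract; the banding function is our own reading of those thresholds, not the authors' implementation.

```python
# Illustrative mapping of a model likelihood ratio to the abstract's five
# risk categories, plus the Bayesian post-test probability calculation.
# Threshold values come from the abstract; boundary handling is assumed.
def risk_band(lr: float) -> str:
    """Classify a likelihood ratio into the abstract's risk categories."""
    if lr < 0.1:
        return "very low"    # -LR < 0.1
    if lr <= 0.2:
        return "low"         # -LR 0.1 to 0.2
    if lr < 5.0:
        return "moderate"    # -LR > 0.2 and +LR < 5.0
    if lr <= 10.0:
        return "high"        # +LR 5.0 to 10.0
    return "very high"       # +LR > 10.0

def post_test_probability(pretest: float, lr: float) -> float:
    """Post-test odds = pre-test odds * LR, converted back to a probability."""
    odds = pretest / (1 - pretest)
    post_odds = odds * lr
    return post_odds / (1 + post_odds)

# With the 2-day baseline event rate of 6.7% from the abstract, an LR of 10
# roughly sextuples the probability of the composite adverse outcome.
p = post_test_probability(0.067, 10.0)
print(risk_band(10.0), f"post-test probability ≈ {p:.0%}")
```

Note that observed event rates in a band can exceed this single-threshold calculation (the abstract reports 91% for the very-high-risk group), because LRs within a band can be far above its lower boundary.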
Interpretation
The PIERS-ML model improves identification of women with pre-eclampsia who are at lowest and greatest risk of severe adverse maternal outcomes within 2 days of assessment, and can support provision of accurate guidance to women, their families, and their maternity care providers.