
AI-Driven ICU Prediction: Clinical Decision Support for Future Pandemic Response

Problem Statement/Vision:
The COVID-19 pandemic revealed how the inability to anticipate a patient’s illness trajectory can lead to worsening illness severity, escalating care intensity, and operational resource strain (1). Hospitals faced an influx of severely ill patients whose deterioration they could often neither predict nor prevent, pushing ICU capacities to critical limits. Integrating machine learning into hospital workflows presents a vital opportunity to address these challenges by improving decision-making in future pandemics.


This project designs a machine learning solution intended to transform pandemic response by predicting ICU admissions for hospitalized patients. For clinical teams, early alerts enable closer monitoring and targeted interventions, such as antiviral therapies, to prevent illness deterioration. For patients, this can help prevent functional decline and support faster recovery and discharge. For hospital leaders, earlier identification and intervention enable proactive resource planning, reduced ICU overcrowding, and decreased costs per case, helping maintain hospital operations during surges and enhancing healthcare system resilience (2).

Success Metrics:

An AI-powered solution transforms pandemic care by:

• Decreasing the percentage of hospitalized patients admitted to the ICU

• Decreasing the frequency of ICUs reaching >90% capacity


Solution Overview:

This solution is based on a COVID-19 use case, though it has potential broader applicability to future infectious disease outbreaks. COVID-19 literature identifies multiple risk factors associated with severe illness, including male sex, older age, obesity, and comorbidities such as hypertension, diabetes, and cardiovascular disease (3). Predictive modeling studies also highlight various EHR data as indicators of ICU admission risk, such as oxygen desaturation and laboratory abnormalities like elevated C-reactive protein and decreased lymphocyte count (1, 3, 4). Our model predicts ICU-admission probability using features derived from this research and is trained on the GEMINI dataset of hospitalized patients in Ontario (see Appendix) (5). Most features are structured data, such as numeric values and discrete text, with the exception of chest CT radiology reports, which are unstructured narrative text. A language model (described in Technical Architecture) screens these reports for keywords associated with COVID-19 ICU risk in the literature, such as “ground-glass opacities” or “bronchus distortion” (6). Using labels that distinguish COVID-19 patients who required ICU admission from those who did not, the model is trained to output an ICU-admission probability, or “ICU Risk Score,” expressed as a percentage from 0 to 100. Upon a patient’s initial hospitalization, admission data provides a baseline prediction, which is refined over time as additional data are collected throughout the course of illness.
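As a minimal sketch of the report-screening idea, the snippet below flags literature-derived chest CT terms in a report as binary features. The term list, regular expressions, and function name are illustrative assumptions; the full design would rely on the transformer-based text pipeline described under Technical Architecture.

```python
import re

# Illustrative chest CT findings associated with ICU risk in the COVID-19
# imaging literature (6); this list is an assumption, not an exhaustive lexicon.
RISK_TERMS = {
    "ground_glass_opacities": r"ground[- ]glass opacit(?:y|ies)",
    "bronchus_distortion": r"bronchus distortion",
    "consolidation": r"consolidation",
}

def screen_report(report_text: str) -> dict:
    """Return a binary flag for each risk term found in a chest CT report."""
    text = report_text.lower()
    return {name: int(bool(re.search(pattern, text)))
            for name, pattern in RISK_TERMS.items()}

print(screen_report("Bilateral ground-glass opacities with patchy consolidation."))
# {'ground_glass_opacities': 1, 'bronchus_distortion': 0, 'consolidation': 1}
```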


Our solution consists of two components. The first is a clinical decision support system (CDSS) integrated with the hospital EHR that alerts clinical teams when a patient’s ICU Risk Score reaches a threshold of 70% (pending ongoing threshold refinement). Nurses see the alert in the EHR as an icon beside the patient’s name (Figure 1), while attending physicians receive an EHR notification (e.g., an Epic InBasket message). This system enables earlier, data-driven identification of patients at high risk of ICU admission and timelier interventions.
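A minimal sketch of the alerting rule is shown below, assuming the model’s output arrives as a probability and that EHR-side delivery (worklist icon, physician notification) is handled by the integration layer; the class, field, and threshold names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

ALERT_THRESHOLD = 0.70  # pending ongoing threshold refinement

@dataclass
class IcuRiskAlert:
    patient_id: str
    risk_score: float        # model output, 0.0-1.0
    flag_on_worklist: bool   # icon beside the patient's name (nurses)
    notify_physician: bool   # EHR notification to the attending physician

def evaluate_alert(patient_id: str, risk_score: float) -> Optional[IcuRiskAlert]:
    """Raise a CDSS alert only when the ICU Risk Score crosses the threshold."""
    if risk_score < ALERT_THRESHOLD:
        return None
    return IcuRiskAlert(patient_id, risk_score,
                        flag_on_worklist=True, notify_physician=True)
```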



Figure 1: EHR alert indicating high ICU risk


Figure 2: COVID-19 Operations Dashboard

The second component of the solution is an EHR-integrated “COVID-19 Operations Dashboard” that provides real-time data on the number of COVID-19 patients at high risk of ICU admission and on current ICU bed capacity (Figure 2). This dashboard is designed to give leadership a high-level overview of anticipated ICU demand and supply throughout the hospital, helping them plan for potential ICU admissions, proactively allocate resources, and decrease the likelihood of ICUs reaching maximum capacity.
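A minimal sketch of the dashboard’s aggregation logic, assuming per-patient risk scores and current ICU bed counts are available from the EHR; the function and field names are illustrative.

```python
from typing import Iterable

def dashboard_summary(risk_scores: Iterable[float],
                      icu_beds_total: int,
                      icu_beds_occupied: int,
                      threshold: float = 0.70) -> dict:
    """Aggregate the figures surfaced on the COVID-19 Operations Dashboard."""
    high_risk = sum(score >= threshold for score in risk_scores)
    occupancy = icu_beds_occupied / icu_beds_total
    return {
        "patients_at_high_icu_risk": high_risk,
        "icu_occupancy_pct": round(100 * occupancy, 1),
        "near_capacity": occupancy > 0.90,  # mirrors the >90% success metric
    }

print(dashboard_summary([0.82, 0.41, 0.75, 0.30],
                        icu_beds_total=20, icu_beds_occupied=17))
# {'patients_at_high_icu_risk': 2, 'icu_occupancy_pct': 85.0, 'near_capacity': False}
```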

Technical Architecture:


Module 1: Data Preprocessing


Prior to model training, the following preprocessing steps are applied (a brief sketch follows the list):

  1. Standardization: Differences between a patient’s actual test results and demographic-specific expected values are calculated and normalized. This prevents skewed weighting and mitigates demographic-related bias.

  2. Multicollinearity Assessment: Highly collinear feature pairs are identified, and redundant features are removed.

  3. Interaction Terms and Feature Engineering: Meaningful interaction terms and secondary features are generated from relevant sub-categories.

  4. Outlier Detection: Z-score analysis is applied to numerical columns, and observations with an absolute z-score ≥ 4 are removed.

  5. Class Imbalance Handling: To address the imbalance between ICU and non-ICU cases in COVID-19 data, a Random Under-Sampling (RUS) strategy is implemented. This balances the dataset by randomly reducing the size of the majority class.
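The condensed sketch below illustrates steps 1, 4, and 5, assuming a pandas DataFrame with a binary icu_admission label and using the imbalanced-learn under-sampler cited in the references (10); the column names and helper function are assumptions made for illustration.

```python
import pandas as pd
from imblearn.under_sampling import RandomUnderSampler

def preprocess(df: pd.DataFrame, numeric_cols: list, label_col: str = "icu_admission"):
    """Standardize numeric features, drop extreme outliers, and rebalance classes."""
    out = df.copy()

    # Step 1 (simplified): population-level z-scores; the full design compares
    # results against demographic-specific expected values instead.
    z = (out[numeric_cols] - out[numeric_cols].mean()) / out[numeric_cols].std()
    out[numeric_cols] = z

    # Step 4: remove rows where any numeric feature has |z| >= 4.
    out = out[(z.abs() < 4).all(axis=1)]

    # Step 5: Random Under-Sampling of the majority (non-ICU) class.
    X, y = out.drop(columns=[label_col]), out[label_col]
    return RandomUnderSampler(random_state=42).fit_resample(X, y)
```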


Module 2: Model Architecture


  1. Feature Extraction: Features are extracted from standardized inputs, including demographics, vital signs, and laboratory test results. Unstructured radiology reports are converted into lower-dimensional representations using transformer language models such as BERT or RoBERTa.

  2. Ensemble Architecture: A supervised ensemble model is proposed to handle data complexity. The model incorporates Logistic Regression, KNN, SVM, Decision Trees, Random Forest, XGBoost, and Deep Neural Networks.

  3. Evaluation and Optimization: Model performance is evaluated using accuracy, precision, recall, and F1-score, with emphasis on recall. Recursive Feature Elimination with Cross-Validation (RFECV) is used to optimize feature selection. Hyperparameters are tuned using 5-fold cross-validation.

  4. Validation Metrics: Confusion matrices and ROC curve analysis are conducted on both training and test sets.
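A minimal sketch of items 2-4 above, assuming tabular features X and binary ICU labels y. Only a subset of the listed learners is shown (the SVM, decision tree, and deep neural network members are omitted for brevity), and the hyperparameters are placeholders rather than tuned values.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

def build_and_evaluate(X, y):
    """Soft-voting ensemble with RFECV feature selection and 5-fold evaluation."""
    # Recursive Feature Elimination with Cross-Validation, scored on recall.
    selector = RFECV(LogisticRegression(max_iter=1000), cv=5, scoring="recall")
    X_sel = selector.fit_transform(X, y)

    ensemble = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("knn", KNeighborsClassifier()),
            ("rf", RandomForestClassifier(n_estimators=300)),
            ("xgb", XGBClassifier(eval_metric="logloss")),
        ],
        voting="soft",  # averaged probabilities become the ICU Risk Score
    )

    # 5-fold cross-validation with emphasis on recall.
    print("5-fold recall:", cross_val_score(ensemble, X_sel, y, cv=5, scoring="recall").mean())

    # Held-out evaluation: per-class precision, recall, F1, and accuracy.
    X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, stratify=y, random_state=42)
    ensemble.fit(X_tr, y_tr)
    print(classification_report(y_te, ensemble.predict(X_te)))
    return ensemble
```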


Module 3: Sensitivity and Feature Importance Analysis


  1. Interpretability: SHAP (SHapley Additive exPlanations) values are used to interpret model predictions and understand feature importance.

  2. Sensitivity Analysis: Feature permutation is applied by randomly shuffling individual features while maintaining their statistical distribution. The resulting change in performance metrics indicates the feature's importance.

  3. Cross-Model Feature Importance: An importance metric is defined to identify features that are consistently important across multiple trained models.
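A minimal sketch of items 1 and 2, assuming a fitted classifier and a held-out test set; the choice of SHAP explainer and the recall-based permutation scoring are illustrative, and the cross-model importance metric of item 3 is not shown.

```python
import shap
from sklearn.inspection import permutation_importance

def explain(model, X_test, y_test, feature_names):
    """Per-prediction SHAP attributions plus permutation-based sensitivity analysis."""
    # 1. SHAP values on a background sample (a model-agnostic kernel explainer
    #    is shown; tree or deep explainers would be used where applicable).
    background = shap.sample(X_test, 100)
    shap_values = shap.KernelExplainer(model.predict_proba, background).shap_values(background)

    # 2. Permutation importance: shuffle one feature at a time and measure the
    #    resulting drop in recall (a larger drop implies a more important feature).
    perm = permutation_importance(model, X_test, y_test,
                                  scoring="recall", n_repeats=10, random_state=42)
    for name, drop in sorted(zip(feature_names, perm.importances_mean),
                             key=lambda pair: -pair[1]):
        print(f"{name}: mean recall drop {drop:.3f}")
    return shap_values, perm
```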


Figure 3: Proposed ML architecture

Development Plan and Prototype:

Real-world data commonly undergoes "Data Shift," a phenomenon where data distributions change over time, leading to performance degradation in previously trained models. To mitigate this, the proposed solution incorporates incremental model retraining using the most recent data batches. The GEMINI dataset, which captures detailed clinical and administrative data incrementally, necessitates continuous training (CT) capabilities within the ML pipeline to ensure consistently accurate and up-to-date predictions.


An MLOps Level 1 implementation is recommended within the infrastructure layer to orchestrate continuous learning and delivery. In this proposed pipeline, once a significant volume of new data has accumulated—or when model performance falls below a defined threshold—the designated ML models will undergo incremental learning using only the newly acquired data. This approach optimizes computational efficiency by avoiding full dataset retraining.
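A minimal sketch of this retraining trigger, assuming a model that supports incremental updates (e.g., scikit-learn’s partial_fit interface); the thresholds are placeholders, and ensemble members without incremental-learning support would instead need windowed or warm-start retraining.

```python
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import recall_score

MIN_NEW_ROWS = 500            # illustrative data-volume trigger
MIN_ACCEPTABLE_RECALL = 0.80  # illustrative performance trigger

def maybe_update(model: SGDClassifier, X_new, y_new) -> bool:
    """Incrementally retrain on the new batch only when a trigger fires."""
    degraded = recall_score(y_new, model.predict(X_new)) < MIN_ACCEPTABLE_RECALL
    if len(X_new) >= MIN_NEW_ROWS or degraded:
        model.partial_fit(X_new, y_new)  # new data only, no full-dataset retraining
        return True
    return False
```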


Deployment of updated models will utilize the Canary release pattern. This deployment strategy introduces new models gradually into the production environment, progressively increasing their exposure to real-world data while concurrently phasing out older models. This minimizes risk and ensures stable performance during transitions.
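A toy sketch of the canary idea, assuming both models expose predict_proba and that the exposure fraction is ramped up in stages while the candidate is monitored; in practice the split would usually live in the serving infrastructure rather than application code.

```python
import random

def canary_predict(features, current_model, candidate_model, canary_fraction: float) -> float:
    """Route a configurable share of scoring requests to the candidate model."""
    model = candidate_model if random.random() < canary_fraction else current_model
    return model.predict_proba([features])[0][1]  # probability of ICU admission

# Example ramp-up while monitoring the candidate's recall:
# canary_fraction = 0.05 -> 0.25 -> 1.00, after which the older model is retired.
```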


Predicted probabilities of ICU admission generated by the model will integrate with a clinical decision support system (CDSS) within the hospital's electronic health record (EHR). Additional interface specifications are outlined in the Solution Overview section.


Figure 4: Proposed MLOps deployment pipeline

Implementation Plan:

The solution implementation follows a phased approach consisting of data integration, model deployment, and iterative expansion. Emphasis is placed on effective knowledge translation to integrate the machine learning (ML) model into real-world practice, aiming to support at-risk populations in future pandemics.


Phase 1: Initial Data Integration
Initial integration focuses on incorporating core datasets from the GEMINI network, prioritizing key predictors such as demographics, comorbidities, vital signs, laboratory values, and imaging data, as supported by recent literature (1, 3, 4, 12–15). This phase includes data harmonization, validation, and adherence to privacy legislation such as Ontario’s Personal Health Information Protection Act (PHIPA) (16).


Phase 2: Model Deployment and Pilot Testing
The ML model is deployed on secure hospital servers in collaboration with EHR vendors. Pilot testing is conducted in selected Ontario hospitals, such as UHN using Epic, to assess model validation, usability, and interpretability. Stakeholder feedback is gathered to guide iterative refinements.


Phase 3: Expansion and Model Optimization
Following the pilot, additional data sources—such as Ontario Laboratories Information System (OLIS) records—are integrated to improve model accuracy. Deployment expands to additional hospitals and vendors. To address potential "Data Shift" (17), the model undergoes periodic retraining and re-validation using updated data batches.


Throughout all phases, the implementation strategy complies with established privacy and security frameworks (18), with continued engagement of clinical and operational stakeholders to ensure alignment with workflow and public health priorities. The solution is designed to support policy and decision-making across the healthcare system, offering a scalable and interoperable approach to reducing disease burden in Canada.

References


  1. Cheng FY, Joshi H, Tandon P, et al. Using Machine Learning to Predict ICU Transfer in Hospitalized COVID-19 Patients. J Clin Med. 2020 Jun 1;9(6):1668.

  2. Syeda HB, Syed M, Sexton KW, et al. Role of machine learning techniques to tackle the COVID-19 crisis: systematic review. JMIR Med Inform. 2021;9(1):e23811.

  3. Hu J, Wang Y. The Clinical Characteristics and Risk Factors of Severe COVID-19. Gerontology. 2021;67(3):255-266.

  4. Burian E, Jungmann F, Kaissis GA, et al. Intensive Care Risk Estimation in COVID-19 Pneumonia Based on Clinical and Imaging Parameters: Experiences from the Munich Cohort. J Clin Med. 2020 May 18;9(5):1514.

  5. GEMINI. GEMINI Data Repository Data Dictionary, version 3.0.2. 2024 Mar 1. https://geminimedicine.ca/wp-content/uploads/2023/12/GEMINI-Data-Repository-Data-Dictionary-v3.0.2.html

  6. Osman AM, Abdrabou AM, Hashim RM, Khosa F, Yasin A. COVID-19 pandemic: CT chest in COVID-19 infection and prediction of patient’s ICU needs. Egypt J Radiol Nucl Med. 2021;52(1):135.

  7. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowl Data Eng. 2009;21(9):1263-1284.

  8. Liu XY, Wu J, Zhou ZH. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern B Cybern. 2009;39(2):539-550.

  9. Batista GE, Prati RC, Monard MC. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor Newsl. 2004;6(1):20-29.

  10. Lemaître G, Nogueira F, Aridas CK. Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res. 2017;18(17):1-5.

  11. Fisher A, Rudin C, Dominici F. All models are wrong, but many are useful: learning a variable’s importance by studying an entire class of prediction models simultaneously. J Mach Learn Res. 2019;20(177):1-81.

  12. Martono, Fatmawati F, Mulyanti S. Risk Factors Associated with the Severity of COVID-19. Malays J Med Sci. 2023 Jun;30(3):84-92.

  13. Karimi Z, Malak JS, Aghakhani A, et al. Machine learning approaches to predict the need for intensive care unit admission among Iranian COVID-19 patients based on ICD-10: A cross-sectional study. Health Sci Rep. 2024 Sep 2;7(9):e70041.

  14. Famiglini L, Campagner A, Carobene A, Cabitza F. A robust and parsimonious machine learning method to predict ICU admission of COVID-19 patients. Med Biol Eng Comput. 2022 Mar 30:1–13.

  15. Chieregato M, Frangiamore F, Morassi M, et al. A hybrid machine learning/deep learning COVID-19 severity predictive model from CT images and clinical data. Sci Rep. 2022 Mar 14;12(1):4329.

  16. Beardwood JP, Kerr JA. Coming soon to a health sector near you: An advance look at the new Ontario Personal Health Information Protection Act (PHIPA). Healthc Q. 2004;7(4):62-7.

  17. Finlayson SG, Subbaswamy A, Singh K, et al. The Clinician and Dataset Shift in Artificial Intelligence. N Engl J Med. 2021 Jul 15;385(3):283-286.

  18. Sarabdeen J, Chikhaoui E, Mohamed Ishak MM. Creating standards for Canadian health data protection during health emergency - An analysis of privacy regulations and laws. Heliyon. 2022 May 21;8(5):e09458.

Appendix

(Appendix figure: GEMINI dataset variables used as model features)
