Predicting ICU Mortality Risk Using Machine Learning
Overview:
This project applied a neural network to predict in-hospital mortality from data collected during the first 24 hours of a patient's Intensive Care Unit (ICU) admission. Leveraging demographic, comorbidity, and vital-sign data from a large international ICU dataset, we explored the potential of machine learning to support clinical forecasting in a high-stakes setting.
​
Tools & Technologies:
Python, Google Colab, pandas, NumPy, Matplotlib, scikit-learn, PyTorch, neural networks
​
Background:
In the ICU, early prediction of patient mortality is crucial but remains a significant challenge. Current tools like APACHE or SOFA scores offer only generalized risk assessments and often lack the precision needed for individualized care. Without accurate early predictions, clinicians may struggle to allocate resources effectively, tailor treatment plans, or guide end-of-life discussions with families.
This uncertainty can lead to delays in critical interventions, emotional strain for families, and inefficiencies in hospital operations. Machine learning offers a promising solution by leveraging high-dimensional data to provide timely, data-driven mortality predictions. This project explores whether a neural network model can use data from the first 24 hours of ICU admission to support more accurate and actionable risk assessments.
​
Research Questions:
​
1. How accurately can we predict ICU patient mortality using hospital data collected from the first 24 hours of admission?​​
​
2. What are the strengths and limitations of a neural network to address the primary research question?
​​
Rationale:
In the high-pressure environment of the ICU, the ability to anticipate patient outcomes early in the admission process is invaluable. Accurately predicting mortality risk in the first 24 hours can help:
- Prioritize patients for life-saving interventions
- Allocate ICU resources more efficiently
- Inform decisions between aggressive vs. palliative care
- Support timely, compassionate communication with families
​
This project focused on applying machine learning techniques to ICU data from the first 24 hours to predict mortality, with the aim of enhancing clinical decision support systems in critical care.
​
Dataset:​​​
Source: GOSSIS dataset, certified by the Harvard Privacy Lab and compiled over a one-year period with contributions from 200+ institutions in the United States, Argentina, Australia, New Zealand, Brazil, and Sri Lanka
Scope: 91,000+ ICU records, 186 variables
Features Used: 16 features, including demographics (age, gender, ethnicity), vitals, and comorbidities
Target Variable: hospital_death (binary: 0 = survived, 1 = died)
​
Data Pre-Processing & ML Modeling:
We began by conducting exploratory data analysis to understand the structure, data types, and distributions of features relevant to predicting ICU mortality. Using the data dictionary as a reference, we identified an initial set of 19 clinically meaningful variables spanning demographics, vitals, and comorbidities that were sufficiently complete for analysis. The target variable was hospital_death.
Missing Data Handling: Dropped features with high missingness (diastolic blood pressure and first-hour min/max temperature).
For binary comorbidity features (e.g., AIDS, cirrhosis), we assumed the values were Missing Not at Random (MNAR) and imputed with the mode, given the minimal missingness.
​
For continuous variables (see the pandas sketch after this list):
- Age (MCAR): imputed with the mean, given its low skewness.
- BMI (MCAR): imputed with the median to mitigate the influence of outliers.
- BMI outliers were removed by filtering out values outside the range 17–60.
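A minimal pandas sketch of the missing-data handling above (dropping high-missingness columns, mode-imputing binary comorbidities, mean/median-imputing age and BMI, and filtering BMI to 17–60). The column names follow the public GOSSIS data dictionary and are assumptions; the exact names in our notebook may differ.

    import pandas as pd

    # df: GOSSIS dataframe restricted to the selected candidate features.
    def preprocess_missing(df: pd.DataFrame) -> pd.DataFrame:
        df = df.copy()

        # Drop high-missingness features (diastolic BP, first-hour temperature).
        df = df.drop(columns=["d1_diasbp_min", "d1_diasbp_max",
                              "h1_temp_min", "h1_temp_max"], errors="ignore")

        # Binary comorbidities: mode imputation (assumed MNAR, minimal missingness).
        for col in ["aids", "cirrhosis", "diabetes_mellitus"]:
            if col in df:
                df[col] = df[col].fillna(df[col].mode()[0])

        # Continuous variables: mean for age (low skew), median for BMI (outlier-robust).
        df["age"] = df["age"].fillna(df["age"].mean())
        df["bmi"] = df["bmi"].fillna(df["bmi"].median())

        # Filter implausible BMI values outside the 17-60 range.
        df = df[df["bmi"].between(17, 60)]
        return df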
​
Categorical Encoding:
Applied one-hot encoding to ethnicity and gender for compatibility with the neural network architecture.
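A sketch of the encoding step using pandas; dropping the first level of each category (drop_first=True) is an assumption to avoid redundant columns, not necessarily what the original notebook did.

    import pandas as pd

    # One-hot encode categorical features so the network receives numeric inputs.
    # drop_first avoids a redundant reference column per category.
    df_encoded = pd.get_dummies(df, columns=["ethnicity", "gender"],
                                drop_first=True, dtype=float)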
​
Multicollinearity Check:
Examined all remaining features for multicollinearity; no features were removed based on this assessment.
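One way to run this check is a pairwise correlation scan, sketched below; whether the original analysis used correlations or variance inflation factors is not specified, so treat this as an assumption.

    import numpy as np

    # Pairwise Pearson correlations among numeric features.
    corr = df_encoded.corr(numeric_only=True)

    # Flag pairs above a common |r| = 0.8 multicollinearity threshold.
    mask = (corr.abs() > 0.8) & ~np.eye(len(corr), dtype=bool)
    for i, j in zip(*np.where(mask.values)):
        if i < j:
            print(corr.index[i], corr.columns[j], round(corr.iloc[i, j], 2))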
​
Scaling:
All numerical features were scaled prior to model training to improve optimization performance.
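A scaling sketch with scikit-learn's StandardScaler; the specific scaler and the column list shown are assumptions for illustration. In practice the scaler should be fit on the training split only and reused on the test split to avoid leakage.

    from sklearn.preprocessing import StandardScaler

    # Standardize numeric features to zero mean and unit variance.
    numeric_cols = ["age", "bmi", "d1_heartrate_max", "d1_resprate_max"]  # illustrative subset
    scaler = StandardScaler()
    df_encoded[numeric_cols] = scaler.fit_transform(df_encoded[numeric_cols])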
​
Class Imbalance:
The target label was heavily imbalanced, with far fewer deaths than survivals. This influenced our model choice and evaluation metrics.
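A quick way to quantify the imbalance and derive the pos_weight used later in the loss; the exact ratio depends on the final cohort, so the value computed here is not a reported figure.

    import torch

    # Class balance of the target: hospital_death (0 = survived, 1 = died).
    counts = df_encoded["hospital_death"].value_counts()
    print(counts / counts.sum())

    # Up-weight the rare positive (death) class in proportion to its rarity.
    pos_weight = torch.tensor([counts[0] / counts[1]], dtype=torch.float32)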
​
Modeling Approach: Given the dataset's size and dimensionality, we selected a neural network for its capacity to model complex, non-linear relationships and its potential to generalize to unseen data.
​
The final model used 16 features, including:
- Demographics: age, gender, ethnicity
- Vitals: max heart rate, respiration rate
- Comorbidities: AIDS, cirrhosis, diabetes, etc.
Training Configuration: 70:30 train-test split
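A sketch of the 70:30 split with scikit-learn; stratifying on the label to preserve the death rate in both splits is our assumption rather than a documented choice.

    from sklearn.model_selection import train_test_split

    X = df_encoded.drop(columns=["hospital_death"]).values.astype("float32")
    y = df_encoded["hospital_death"].values.astype("float32")

    # 70:30 train-test split, stratified on the outcome to preserve the class ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42)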
​
Neural Network Architecture (a PyTorch sketch follows this list):
- Input features: 16
- Hidden units: 5
- Epochs: 500
- Batch size: 32
- Learning rate: 0.01
- Loss function: binary cross-entropy with class weighting (pos_weight) to address imbalance
- Optimizer: Adam
- Regularization: L2 penalty to prevent overfitting
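A minimal PyTorch sketch matching the configuration above: 16 inputs, one hidden layer of 5 units, binary cross-entropy on logits with pos_weight, Adam at lr=0.01, and an L2 penalty via weight_decay. The ReLU activation and the weight_decay value are assumptions where the writeup does not specify them.

    import torch
    import torch.nn as nn

    # 16 input features -> 5 hidden units -> 1 output logit.
    model = nn.Sequential(
        nn.Linear(16, 5),
        nn.ReLU(),
        nn.Linear(5, 1),
    )

    # Weighted binary cross-entropy on logits to counter the class imbalance.
    criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    # Adam with the configured learning rate; weight_decay applies the L2 penalty.
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-4)

    train_ds = torch.utils.data.TensorDataset(
        torch.from_numpy(X_train), torch.from_numpy(y_train))
    loader = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

    for epoch in range(500):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = criterion(model(xb).squeeze(1), yb)
            loss.backward()
            optimizer.step()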
​
Results:
Overall Accuracy: 0.78​
AUC Score: 0.62
​
The model performed well at identifying survivors but struggled to correctly identify patients who died. Despite the class weighting, false positives (57% of class 1 predictions) and false negatives (18% of class 0 predictions) remained substantial, and the F1-score for the mortality class was below a random-guess baseline, underscoring the model's limited sensitivity.
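The headline metrics above can be reproduced from the held-out test split along the lines of the sketch below; the 0.5 probability threshold is an assumption.

    import torch
    from sklearn.metrics import (accuracy_score, roc_auc_score,
                                 f1_score, confusion_matrix)

    # Score the held-out test split.
    model.eval()
    with torch.no_grad():
        probs = torch.sigmoid(model(torch.from_numpy(X_test)).squeeze(1)).numpy()
    preds = (probs >= 0.5).astype(int)

    print("Accuracy:", accuracy_score(y_test, preds))
    print("AUC:", roc_auc_score(y_test, probs))
    print("F1 (death class):", f1_score(y_test, preds))
    print("Confusion matrix:\n", confusion_matrix(y_test, preds))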
​
Conclusion:
This neural network model provides a starting point for using machine learning to support ICU mortality prediction. While the model demonstrated reasonable accuracy and strong performance on the majority class (survivors), its clinical utility is limited by high false positive and false negative rates when predicting mortality. Furthermore, the black-box nature of neural networks restricts interpretability, which can present a barrier to deployment in a high-stakes medical setting.
​
Future improvements should focus on:
- Addressing class imbalance using techniques like SMOTE or ensemble methods
- Exploring interpretable models or adding explainability layers (e.g., SHAP values)
- Comparing against other algorithms (e.g., logistic regression, decision trees, XGBoost)
- Expanding feature engineering and hyperparameter tuning
- Validating on external datasets and evaluating generalizability
​
In a critical care setting, it must be acknowledged that accuracy alone is insufficient for evaluating a model's performance. Minimizing false negatives and understanding model reasoning are essential for safe deployment.
​
Keywords:
​ICU Mortality, Neural Network, Health Informatics, Class Imbalance, Machine Learning, Critical Care, Deep Learning, Binary Classification, Explainability, Risk Prediction

