Heart Failure Patient Prognosticator

Published: June 8, 2022

Last Updated: 2 years ago.

In 2019, Vancouver Coastal Health (VCH) and Providence Health Care (PHC) collaborated with Decision Support at VCH and the Medical Quality Leadership & Practice teams to create a data repository of patients with a primary diagnosis of heart failure. Using data from this repository, VCH and PHC cardiologists sought to demonstrate the feasibility of developing a machine-learning (ML) model that predicts the future risk profile of individual heart failure patients based on their medical history.

Together with VCH and PHC, the UBC Cloud Innovation Centre (CIC) developed ML models that enable practitioners to predict four primary patient outcomes. The solution aims to better inform decisions about post-discharge treatment plans, yield more targeted interventions, reduce hospitalizations, and improve overall patient survival.

Approach

The repository of data, otherwise known as the Heart Failure Patient (HFP) audit, is a detailed examination of unplanned hospitalizations across three hospitals in Vancouver over a two-year period, and includes external data from Cardiac Services BC. Once this large volume of data had been collected, analysis was required to determine which factors might inform predictions about death or rehospitalization. Practitioners at VCH and PHC approached the UBC CIC to see whether machine learning could assist with the data analysis, confirm the utility of the data collected, and identify the types of data most important for predicting patient outcomes.

The models are trained in Jupyter notebooks on Amazon SageMaker, an interactive environment for running Python code. The results obtained from this solution aim to serve as a baseline for practitioners to better inform decisions about post-discharge treatment plans, yield more targeted interventions, reduce hospitalizations, and improve overall patient survival.

Technical Details

The solution trains machine learning models to predict hospitalization, death, or risk within N days of a patient's initial hospital discharge. It loads patient data stored in an S3 bucket (the AWS object storage service) and then trains the models on that data using Amazon SageMaker, the AWS service for machine learning. The solution also offers practitioners a new way to interact with the HFP audit dataset, using Jupyter notebooks to analyze, interpret and visualize the data.

Four types of prediction tasks were identified for this solution:

1. A binary classifier for Death within N days from first discharge.

2. A binary classifier for Rehospitalization within N days from first discharge.

3. A binary classifier for Risk (either death or rehospitalization) within N days.

4. A regression model to predict the number of hospitalizations.
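As a rough illustration of how the three classification targets relate to one another, the sketch below derives them for a chosen horizon N. The inputs `days_to_death` and `days_to_rehosp` (days from first discharge to each event, with NaN meaning the event was not observed) are hypothetical and are not taken from the HFP audit schema.

```python
import numpy as np

def make_labels(days_to_death, days_to_rehosp, n_days):
    """Return (death, rehosp, risk) 0/1 arrays for a horizon of n_days."""
    death_days = np.asarray(days_to_death, dtype=float)
    rehosp_days = np.asarray(days_to_rehosp, dtype=float)
    # NaN marks "event never observed"; any comparison with NaN is False.
    death = death_days <= n_days
    rehosp = rehosp_days <= n_days
    # Risk is the union of the two events (task 3 in the list above).
    risk = death | rehosp
    return death.astype(int), rehosp.astype(int), risk.astype(int)

# Four made-up patients, for illustration only.
days_to_death = [20, np.nan, 400, np.nan]
days_to_rehosp = [np.nan, 45, 10, np.nan]
death, rehosp, risk = make_labels(days_to_death, days_to_rehosp, n_days=60)
print(death, rehosp, risk)
```

Changing `n_days` re-labels the same patients for a different horizon, which is one plausible way to compare models over time.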

For the classification tasks, the solution uses the following classifiers: a Random Forest classifier, a Gaussian Naive Bayes classifier, a K-Nearest Neighbours classifier, and a Dummy baseline classifier, which makes random predictions that follow the class distribution of the training data (the stratified strategy).
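The four classifiers named above all exist in scikit-learn, so a side-by-side comparison can be sketched as follows. The data here is synthetic, and the hyperparameters, split size and dictionary keys are illustrative assumptions rather than the project's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, imbalanced stand-in for the audit data (not real patient records).
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

classifiers = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gaussian_nb": GaussianNB(),
    "knn": KNeighborsClassifier(n_neighbors=5),
    # Baseline: samples labels at random according to the training class frequencies.
    "dummy_stratified": DummyClassifier(strategy="stratified", random_state=0),
}

test_accuracy = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    test_accuracy[name] = clf.score(X_test, y_test)  # mean accuracy on held-out data

for name, acc in test_accuracy.items():
    print(f"{name}: {acc:.3f}")
```

A model is only useful if it clearly beats the dummy baseline, which is why the baseline is trained and scored in exactly the same way as the others.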

The regression task uses three models: a Linear Regression model, a Ridge Regression model, and an Extra Trees Regression model.
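A minimal sketch of fitting the three regressors with scikit-learn is shown below. The count-valued target is simulated, and the choice of mean absolute error as the comparison metric is an assumption made for this example, not something stated for the project.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Simulated features and a count-like target standing in for "number of hospitalizations".
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = rng.poisson(lam=np.exp(0.5 * X[:, 0] + 0.3 * X[:, 1]))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

regressors = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "extra_trees": ExtraTreesRegressor(n_estimators=100, random_state=0),
}

mae = {}
for name, reg in regressors.items():
    predictions = reg.fit(X_train, y_train).predict(X_test)
    mae[name] = mean_absolute_error(y_test, predictions)

for name, err in mae.items():
    print(f"{name}: MAE = {err:.3f}")
```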

The models were evaluated using cross-validation, a statistical method for estimating how well a machine learning model performs on data it was not trained on. Three accuracy metrics were used to assess the models: overall accuracy and the accuracy on each of the two classes. The solution also uses a confusion matrix, a table that summarizes the performance of a classification algorithm by counting correct and incorrect predictions per class, to analyse the models further. Lastly, Sensitivity (the proportion of actual positives that are correctly identified) and Specificity (the proportion of actual negatives that are correctly identified) were also computed.

A picture of the confusion matrix for the predictive models for death. There are four squares each in different colours with the x axis being "True Label" and the y axis being "Predicted Label".

Predictive Models for Death – Confusion Matrix for a model
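The evaluation steps described above can be sketched with scikit-learn as follows, on synthetic data. The fold count and the choice of a Random Forest are assumptions for the example. Note that for a binary problem the accuracy on the positive class equals sensitivity and the accuracy on the negative class equals specificity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic, imbalanced stand-in for the audit data (not real patient records).
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Each sample is predicted by a model that never saw it during training.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_pred = cross_val_predict(RandomForestClassifier(random_state=0), X, y, cv=cv)

# scikit-learn puts true labels on rows and predicted labels on columns.
cm = confusion_matrix(y, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

overall_accuracy = (tp + tn) / cm.sum()
sensitivity = tp / (tp + fn)  # accuracy on the positive class
specificity = tn / (tn + fp)  # accuracy on the negative class

print(cm)
print(f"accuracy={overall_accuracy:.3f} "
      f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

On imbalanced outcomes, overall accuracy can look high even when sensitivity is poor, which is a common reason to report the per-class figures alongside it.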

The solution measures feature importance through permutation importance to better understand which features matter most for patient outcome predictions. Permutation importance involves permuting, or 'shuffling', the values in one column of the dataset and then observing the trained model's performance on a test set. The size of the resulting drop in performance indicates how much the model relies on that feature for its predictions.
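scikit-learn provides this technique as `sklearn.inspection.permutation_importance`. The sketch below applies it to a Random Forest on synthetic data; the model choice, the number of repeats and the data are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: 2 informative features plus 4 pure-noise features.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=2, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each column n_repeats times and record the drop in test-set score.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Rank features from largest to smallest mean drop in score.
ranking = result.importances_mean.argsort()[::-1]
for idx in ranking:
    print(f"feature {idx}: "
          f"{result.importances_mean[idx]:.3f} +/- {result.importances_std[idx]:.3f}")
```

Computing the importances on held-out data, as here, measures what the model needs in order to generalize rather than what it memorized during training.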

Since the current dataset is highly varied, the solution uses imputation techniques to fill in missing values so that the models are trained on as much of the available data as possible. For this project, the choice of imputation method for each column was informed by a medical expert. Each column uses one of three techniques: Median Imputation (filling in the median value of the column), KNN Imputation (using a K-Nearest Neighbours model to predict the missing value), or Fixed Value Imputation (filling in a fixed value).
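One way to apply a different imputation technique to each column is scikit-learn's `ColumnTransformer`, sketched below. The column assignments, the fixed fill value of 0 and the neighbour count are invented for the example; in the project these choices were made with a medical expert. `KNNImputer` fills a gap with the average of the nearest rows, which may differ in detail from the KNN approach used in the solution.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer, SimpleImputer

nan = np.nan
# Made-up numeric table; columns are referred to by position.
X = np.array([
    [60.0, 35.0, 120.0, 1.0],
    [nan,  40.0, 130.0, nan],
    [70.0, nan,  125.0, 0.0],
    [80.0, 30.0, nan,   1.0],
    [65.0, 45.0, 140.0, nan],
])

imputer = ColumnTransformer([
    # Column 0: replace gaps with the column median.
    ("median", SimpleImputer(strategy="median"), [0]),
    # Columns 1-2: fill gaps from the 2 most similar rows (mean of neighbours).
    ("knn", KNNImputer(n_neighbors=2), [1, 2]),
    # Column 3: replace gaps with a fixed value.
    ("fixed", SimpleImputer(strategy="constant", fill_value=0.0), [3]),
])

# Output columns follow the transformer order above: 0, then 1-2, then 3.
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```

In a full pipeline the imputer would be fitted on the training folds only, so that statistics such as the median do not leak information from the test data.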

The solution uses scikit-learn (sklearn), an open-source machine learning library for the Python programming language, to implement the classifiers and regression models.

Supporting Artifacts

A chart showing the predictive regression models for hospitalisation within N days.
A bar chart titled "Predicting Hospitalizations within N days: GNB".

Predictive Regression Models for Hospitalization within N days – Feature Importances

A screenshot of a chart that predicts hospitalizations 60 days from discharge.

Predictive Models for Risk – Plot Comparisons between models and comparisons over time

Link to solution on GitHub: www.github.com/UBC-CIC/heart-failure-prognosticator

Acknowledgements

Vancouver Coastal Health

Providence Health Care

Photo by Towfiqu barbhuiya on Unsplash

Photo by Nubelson Fernandes on Unsplash

About the University of British Columbia Cloud Innovation Centre (UBC CIC)

The UBC CIC is a public-private collaboration between UBC and Amazon. A CIC identifies digital transformation challenges, the problems or opportunities that matter to the community, and provides subject matter expertise and CIC leadership.

Using Amazon’s innovation methodology, dedicated UBC and Amazon CIC staff work with students, staff and faculty, as well as community, government or not-for-profit organizations to define challenges, to engage with subject matter experts, to identify a solution, and to build a Proof of Concept (PoC). Through co-op and work-integrated learning, students also have an opportunity to learn new skills which they will later be able to apply in the workforce.