With the advent of high-content screening methodologies (e.g., cellular imaging and transcriptomics), it has become more challenging to tease apart and visualize batch effects. The problem is compounded when building machine learning models, which can easily use these confounding variables instead of real biological signal to generate predictions, leading to poor real-world relevance.
Sepsis is the leading cause of death in the Intensive Care Unit, and it’s responsible for 1 in 3 hospital deaths. Each hour without treatment increases a patient’s risk of death by 4-8%. Thus, early detection of sepsis is crucial for improving survival.
By combining advanced data preprocessing with machine learning, our research has improved the prediction of which patients will develop sepsis during their hospital stay.
In our study, we sought to develop a robust sepsis prediction model using physiological data (vital signs and lab results) from the 2019 PhysioNet Challenge. In the first phase of our analysis, we trained a long short-term memory (LSTM) recurrent neural network. While the LSTM parameters themselves can be optimized in well-understood ways to produce a more accurate classifier, the impact of pre-processing parameters on sepsis prediction performance remains largely unknown.
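To make "pre-processing parameters" concrete, here is a minimal sketch of how two upstream choices change the data a sequence model would see. Everything in it is an assumption made for illustration: the heart-rate values, the parameter names `ffill_limit` and `window`, and the grid itself are hypothetical, and this is not the pipeline used in our study.

```python
from itertools import product


def forward_fill(values, limit):
    """Carry the last observed value forward for at most `limit` consecutive gaps."""
    filled, last, gap = [], None, 0
    for v in values:
        if v is not None:
            last, gap = v, 0
            filled.append(v)
        else:
            gap += 1
            filled.append(last if last is not None and gap <= limit else None)
    return filled


def make_windows(values, window):
    """Slice an hourly series into overlapping windows of `window` steps."""
    return [values[i:i + window] for i in range(len(values) - window + 1)]


# Hypothetical hourly heart-rate readings; None marks a missing measurement.
heart_rate = [88, None, None, 95, None, 102, 110, None]

# Hypothetical grid of upstream choices; each combination yields a different training set.
grid = {"ffill_limit": [1, 2], "window": [3, 4]}

results = {}
for limit, window in product(grid["ffill_limit"], grid["window"]):
    filled = forward_fill(heart_rate, limit)
    # Keep only windows with no remaining gaps, since a model needs complete inputs.
    complete = [w for w in make_windows(filled, window) if None not in w]
    results[(limit, window)] = complete
    print(f"ffill_limit={limit} window={window} -> {len(complete)} usable windows")
```

Even on this toy series, the number of usable windows differs for every combination in the grid, which is why such choices are worth examining systematically rather than fixing by convention.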
Now entering our second phase, we are applying Augusta™ to a data set of ~40,000 patients, with the aim of considering more systematically how upstream decisions made in processing the data affect the model trained on it.
Initial results look promising.