As with any assay, L1000 data is noisy. Experimental replicates (the same compound tested on the same cell line under the same conditions) often yield different measured expression levels. De-noising the L1000 data makes it easier to see the true assay response and to pick a representative concentration for each compound.
Sepsis is the leading cause of death in the Intensive Care Unit, and it’s responsible for 1 in 3 hospital deaths. Each hour without treatment increases a patient’s risk of death by 4-8%. Thus, early detection of sepsis is crucial for improving survival.
By combining advanced data preprocessing with machine learning, we have been able to better predict which patients will develop sepsis during their hospital stay.
In our study, we sought to develop a robust sepsis prediction model using physiological data (vital signs and lab results) from the 2019 PhysioNet Challenge. In the first phase of our analysis, we trained a long short-term memory (LSTM) recurrent neural network. While the LSTM parameters themselves can be optimized in well-understood ways to produce a more accurate classifier, the impact of pre-processing parameters on sepsis prediction performance remains largely unknown.
Now entering our second phase, we are applying Augusta™ to a data set of ~40,000 patients, with the intent of considering more systematically how upstream decisions made in processing the data, before any model is trained, affect the result.
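To make "upstream decisions" concrete, here is a minimal sketch of how two pre-processing choices could be treated as parameters and enumerated. It is not the Augusta™ workflow or our actual pipeline: the names `PreprocessConfig`, `impute`, `preprocess`, and `config_grid`, the two imputation strategies, the window lengths, and the toy heart-rate series are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class PreprocessConfig:
    """Hypothetical upstream choices; field names are illustrative only."""
    imputation: str    # "forward_fill" or "mean"
    window_hours: int  # number of most recent hourly readings to keep


def impute(series: Sequence[Optional[float]], strategy: str) -> List[float]:
    """Fill missing readings (None) with the chosen strategy."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    # Whole-series mean: simple, but it uses future readings, so a real
    # prediction pipeline would need a statistic that avoids that.
    mean = sum(observed) / len(observed)
    if strategy == "mean":
        return [mean if v is None else v for v in series]
    if strategy == "forward_fill":
        filled: List[float] = []
        last: Optional[float] = None
        for v in series:
            if v is not None:
                last = v
            # Before the first observation, fall back to the series mean.
            filled.append(mean if last is None else last)
        return filled
    raise ValueError(f"unknown imputation strategy: {strategy}")


def preprocess(series: Sequence[Optional[float]],
               config: PreprocessConfig) -> List[float]:
    """Impute, then keep only the most recent `window_hours` readings."""
    if config.window_hours < 1:
        raise ValueError("window_hours must be at least 1")
    return impute(series, config.imputation)[-config.window_hours:]


def config_grid(imputations: Sequence[str],
                windows: Sequence[int]) -> List[PreprocessConfig]:
    """Enumerate every combination of the upstream choices."""
    return [PreprocessConfig(i, w) for i, w in product(imputations, windows)]


# Toy hourly heart-rate readings with gaps (made-up values).
heart_rate = [None, 80.0, None, 90.0, None, 100.0]

for cfg in config_grid(["forward_fill", "mean"], [3, 6]):
    print(cfg, preprocess(heart_rate, cfg))
```

Even on this toy series, the two imputation strategies hand the model different inputs once the window is long enough to include the earlier gaps. Each configuration would then be used to train and evaluate a model, so that the effect of the pre-processing choice itself can be compared.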
Initial results look promising.