We are already working with clinicians, leveraging international collaborations to put prediction platforms in place, and re-investigating our drug lead pipeline. BioSymetrics will pursue any collaboration that can have a positive impact against COVID and support the heroes working tirelessly in hospitals and clinics around the world.
With the advent of high-content screening methodologies (e.g. cellular imaging and transcriptomics), it has become more challenging to tease apart and visualize batch effects. The problem is compounded when building machine learning models, which can easily use these confounding variables instead of real biological signal to generate predictions, leading to poor real-world relevance.
As with any assay, L1000 data is noisy. Experimental replicates (the same compound tested on the same cell line under the same conditions) often yield different measured expression levels. De-noising the L1000 data makes it easier to see the true assay response and to pick a representative concentration for each compound.
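As a rough illustration of the idea, and not necessarily the de-noising method used here, one simple way to reduce replicate noise is to collapse replicates to a robust summary such as the median. The sketch below assumes a hypothetical record layout of (compound, cell line, concentration, gene, expression); the function name, compound, and values are made up for the example.

```python
from collections import defaultdict
from statistics import median


def collapse_replicates(measurements):
    """Collapse replicate measurements to one median value per condition.

    `measurements` is an iterable of tuples in the assumed layout:
    (compound, cell_line, concentration, gene, expression).
    """
    grouped = defaultdict(list)
    for compound, cell_line, concentration, gene, expression in measurements:
        # Replicates share the same compound, cell line, concentration, and gene.
        grouped[(compound, cell_line, concentration, gene)].append(expression)
    # The median is less sensitive to a single outlying replicate than the mean.
    return {key: median(values) for key, values in grouped.items()}


# Hypothetical replicate readings for one condition.
replicates = [
    ("cmpd_A", "MCF7", 10.0, "GENE1", 1.8),
    ("cmpd_A", "MCF7", 10.0, "GENE1", 2.1),
    ("cmpd_A", "MCF7", 10.0, "GENE1", 5.9),  # outlying replicate
]
consensus = collapse_replicates(replicates)
```

With three replicates, the median keeps the consensus value close to the two agreeing readings instead of being pulled toward the outlier, as a mean would be.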
When are two compounds the same? We look at the effect of the Simplified Molecular Input Line Entry System (SMILES) format on chemical database overlap, including best practices for canonicalization and harmonization, to understand how these effects impact a particular dataset and specific application.
CASE STUDY: Machine learning for activity prediction, as part of lead compound generation
The Challenge: Quickly iterating over multiple large feature sets, with the flexibility to test models at scale, is a challenge for any data scientist.