In this webinar, Mikalai Malinouski, Ph.D., demonstrated how BioSymetrics uses AI to predict mechanism of action (MOA) and provided an example of how machine learning workflows can benefit both novel drug discovery and drug repositioning.
Some of our high-level learnings were recently presented at the MIT AI for Drug Discovery conference in February 2020.
The BioSymetrics Mechanism of Action Prediction Platform will be put to work both in prioritizing chemical libraries and in predicting mechanisms for identified compounds. Under the three COVID-19 research initiatives announced by the SCN in the post below, we will be working with William Stanford, Amy Wong, Molly Shoichet, Stephen Juvet, Samira Mubareka, Scott Gray-Own, and Mitchel Sabloff as a component of their research. We will also have the opportunity to screen one of our
A batch effect occurs whenever non-biological factors influence your experimental readout. Often, these effects are so large that they become a barrier to understanding the underlying biology. With simple readouts, batch effects are easy to visualize and understand. However, with the advent of high-content screening methodologies (e.g. cellular imaging and transcriptomics), it becomes more challenging to tease apart and visualize these effects. This is further compounded when
The Connectivity Map (CMap) is a conceptual, comprehensive linking of cellular signatures to genomic (i.e. mutation) and pharmacological (i.e. drug-mediated) effects. The CMap dataset is based on the L1000 assay (developed by the Broad Institute), which measures the mRNA abundance of 978 landmark genes plus 80 control genes from human cells. As with any assay, L1000 data is noisy. Experimental replicates (the same compound tested on the same cell line under the same conditions) often
The effect of SMILES format on chemical database overlap

A common format for representing compounds is the Simplified Molecular Input Line Entry System (SMILES), which encodes a chemical structure as a short string. But despite being a standard format, it is possible to represent the same structure in multiple ways. For example, caffeine can be represented as “CN1C=NC2=C1C(=O)N(C(=O)N2C)C” or equally validly as “Cn1c(=O)c2c(ncn2C)n(C)c1=O”, depending on the starting atom.
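To make the overlap problem concrete, here is a minimal, toolkit-free sketch. It is not the method used in the post: it parses a small SMILES subset (unbracketed organic atoms, branches, single-digit ring closures) into an atom graph and compares an order-independent fingerprint of that graph. The names `parse_smiles` and `graph_fingerprint` and the two toy "databases" are assumptions made up for illustration. The fingerprint ignores bond orders, charges, hydrogens and stereochemistry, so equal fingerprints are only a necessary condition for two strings to describe the same structure; in practice a cheminformatics toolkit's canonical SMILES would be the more reliable choice.

```python
import hashlib

TWO_LETTER = ("Cl", "Br")
ORGANIC = set("BCNOPSFI") | set("bcnops")  # aromatic atoms are lowercase


def parse_smiles(smiles):
    """Parse a small SMILES subset into (elements, adjacency sets)."""
    elements, adjacency = [], []
    prev = None        # atom the next atom will bond to
    branches = []      # stack of branch points for "(" ... ")"
    ring_open = {}     # ring-closure digit -> atom that opened it
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if smiles[i:i + 2] in TWO_LETTER:
            symbol, i = smiles[i:i + 2], i + 2
        elif ch in ORGANIC:
            symbol, i = ch.upper(), i + 1
        else:
            symbol, i = None, i + 1
        if symbol is not None:
            elements.append(symbol)
            adjacency.append(set())
            idx = len(elements) - 1
            if prev is not None:
                adjacency[prev].add(idx)
                adjacency[idx].add(prev)
            prev = idx
        elif ch == "(":
            branches.append(prev)
        elif ch == ")":
            prev = branches.pop()
        elif ch.isdigit():
            if ch in ring_open:
                other = ring_open.pop(ch)
                adjacency[prev].add(other)
                adjacency[other].add(prev)
            else:
                ring_open[ch] = prev
        elif ch in "-=#:":
            pass  # bond orders are deliberately ignored in this sketch
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return elements, adjacency


def graph_fingerprint(smiles):
    """Order-independent fingerprint via iterative neighbor-label refinement."""
    elements, adjacency = parse_smiles(smiles)
    labels = list(elements)
    for _ in range(len(elements)):
        labels = [
            hashlib.sha1(
                repr((labels[a], sorted(labels[b] for b in adjacency[a]))).encode()
            ).hexdigest()
            for a in range(len(elements))
        ]
    return tuple(sorted(labels))


# The two caffeine strings quoted above, plus ethanol written two ways.
CAFFEINE_A = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
CAFFEINE_B = "Cn1c(=O)c2c(ncn2C)n(C)c1=O"
db_a = [CAFFEINE_A, "CCO"]   # hypothetical database A
db_b = [CAFFEINE_B, "OCC"]   # hypothetical database B

naive_overlap = set(db_a) & set(db_b)
graph_overlap = {graph_fingerprint(s) for s in db_a} & {graph_fingerprint(s) for s in db_b}
print(len(naive_overlap), len(graph_overlap))
```

Comparing raw strings finds no shared compounds between the two toy databases, while comparing graph fingerprints matches both, which is the kind of discrepancy that distorts database overlap estimates.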
Electronic Medical Records (EMRs) contain a large number of missing values, which poses difficulties for data scientists who want to model this data. In a previous post, we discussed the different feature engineering methods available for the diagnosis codes, medication data and clinical notes of EMRs. In this post, we highlight the challenges of missing values when modelling with time-series data from EMRs and discuss some techniques to address them. EMR data, especially for laboratory
PURPOSE: Identify mechanism of action (MoA) from animal phenotype models
OVERVIEW: BioSymetrics leverages a proprietary machine learning platform (Augusta™) to generate structure-based activity predictions. This in combination with a vertebrate, in vivo phenotypic profiling framework has allowed us to make phenotype-mechanism association predictions across a range of potential clinical applications.
INPUT: Chemical structures, experimental datasets (public and private)
OUTPUT: Implicated pathways/processes
USE CASE: Phenotype MoA Prediction
1. INPUT: Phenotype Assays
2. Activity prediction model is fit and validated
3. The
PURPOSE: Quantify and correct bias from high-content screening (HCS) data
INPUT: Chemical structures, morphological properties (or original images)
OUTPUT: Dynamic workflow that integrates bias removal and mechanism prediction
USE CASE: Batch effects are a common issue when dealing with high-throughput assays, often resulting in patterns within the data unrelated to assay response. Machine Learning (ML) models latch on to any source of regularity. Without Augusta™ pre-processing and Contingent-AI (patent pending), ML models will learn
CASE STUDY: Machine learning for activity prediction, as part of lead compound generation
The Challenge: Quickly iterating over multiple large feature sets, with the flexibility to test models at scale, is a challenge for any data scientist.