Feature Selection Using Contingent AI: Going Beyond Mutual Information
Feature selection yields an information-rich vector of understandable features, ultimately leading to higher-performing and more explainable models. Rather than scoring features one at a time, the better approach is to consider features in combination, maximizing the information available to the model.
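A toy illustration of why combinations matter (the dataset and helper below are hypothetical, not from the post): with an XOR-style label, each feature alone carries zero mutual information with the target, yet the pair determines it completely — exactly the case that per-feature scoring misses.

```python
from collections import Counter
from math import log2

# Hypothetical toy dataset: the label y is the XOR of two binary features.
# Individually x1 and x2 share no information with y; together they fix it.
data = [(x1, x2, x1 ^ x2) for x1 in (0, 1) for x2 in (0, 1)] * 25  # 100 rows

def mutual_information(pairs):
    """Empirical mutual information (in bits) between two discrete variables."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    return sum(c / n * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in joint.items())

mi_x1 = mutual_information([(x1, y) for x1, _, y in data])            # 0.0 bits
mi_x2 = mutual_information([(x2, y) for _, x2, y in data])            # 0.0 bits
mi_pair = mutual_information([((x1, x2), y) for x1, x2, y in data])   # 1.0 bit
print(mi_x1, mi_x2, mi_pair)
```

Any selector that ranks features marginally would discard both features here; evaluating them jointly recovers the full bit of information.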
Mitigating Batch Effects in Cell Painting Data
With the advent of high-content screening methodologies (e.g. cellular imaging, transcriptomics), it becomes more challenging to tease apart and visualize batch effects. This is further compounded when building machine learning models, which can easily exploit these confounding variables instead of real biological signal, yielding predictions with poor real-world relevance.
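One simple mitigation for additive batch effects is per-batch mean-centering. The sketch below uses hypothetical per-plate feature values (the plate names and numbers are made up); it is a minimal illustration, not the full correction pipeline a real Cell Painting analysis would use.

```python
from statistics import mean

# Hypothetical feature readouts keyed by plate; the constant offset between
# plates stands in for a plate- or day-level batch effect.
batches = {
    "plate_1": [1.0, 1.2, 0.9, 1.1],
    "plate_2": [3.0, 3.2, 2.9, 3.1],  # same biology, shifted by the batch
}

# Per-batch mean-centering removes additive batch offsets before modelling
# or visualization, so a model cannot lean on the plate identity.
corrected = {b: [v - mean(vals) for v in vals] for b, vals in batches.items()}
print(corrected["plate_1"])
print(corrected["plate_2"])  # after centering, both plates line up
```

More sophisticated corrections (e.g. standardizing against per-plate controls) follow the same pattern: estimate the batch component, then subtract it out.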
De-noising CMap L1000 Data
As with any assay, L1000 data is noisy: experimental replicates (the same compound tested on the same cell line under the same conditions) often yield different measured expression levels. De-noising the L1000 data makes it easier to see the true assay response and to pick a representative concentration for each compound.
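One basic de-noising step is collapsing replicates to their median, which damps outlier wells while preserving the central response. The replicate values, compound names, and the "strongest absolute response" rule for picking a concentration below are all illustrative assumptions, not the post's exact procedure.

```python
from statistics import median

# Hypothetical replicate-level expression z-scores for one landmark gene,
# keyed by (compound, concentration in uM).
replicates = {
    ("cmpd_A", 10.0): [2.1, 1.8, 2.4],
    ("cmpd_A", 1.0): [0.3, -0.1, 0.2],
}

# Median across replicates as a simple consensus signature.
consensus = {key: median(vals) for key, vals in replicates.items()}

# Pick a representative concentration: here, the one with the strongest
# absolute consensus response (an assumed rule for illustration).
best = max(consensus, key=lambda k: abs(consensus[k]))
print(consensus)  # {('cmpd_A', 10.0): 2.1, ('cmpd_A', 1.0): 0.2}
print(best)       # ('cmpd_A', 10.0)
```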
When are Two Compounds the Same?
When are two compounds the same? We examine how the Simplified Molecular Input Line Entry System (SMILES) format affects chemical database overlap, cover best practices for canonicalization and harmonization, and assess the impact of these effects on a particular dataset and application.
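The core pitfall can be shown without a cheminformatics toolkit: the same molecule admits many valid SMILES strings, so naive string comparison under-counts database overlap. In practice the canonical form comes from a toolkit such as RDKit (`Chem.MolToSmiles(Chem.MolFromSmiles(s))`); the lookup table below is a hand-written stand-in for that call, used only for illustration.

```python
# Two databases listing ethanol with different (both valid) SMILES strings.
db_a = {"CCO"}  # ethanol, written carbon-first
db_b = {"OCC"}  # ethanol, written oxygen-first

print(len(db_a & db_b))  # 0 -- naive string overlap misses the match

# Hand-written canonical forms standing in for a toolkit call such as
# RDKit's Chem.MolToSmiles(Chem.MolFromSmiles(s)); real pipelines should
# use the toolkit, not a lookup table.
CANONICAL = {"CCO": "CCO", "OCC": "CCO"}

canon_a = {CANONICAL[s] for s in db_a}
canon_b = {CANONICAL[s] for s in db_b}
print(len(canon_a & canon_b))  # 1 -- overlap recovered after canonicalization
```

The lesson generalizes: canonicalize (and harmonize salts, charges, and stereochemistry flags) before computing any cross-database statistic.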
Dealing with Missing Values in Healthcare Data
In this post, we highlight the challenges posed by missing values when modelling time-series data from electronic medical records (EMRs) and discuss some techniques to address them.
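One common technique for clinical time series is last-observation-carried-forward imputation paired with a missingness indicator, since in EMR data the fact that a measurement was never ordered is itself informative. The vitals below and the helper function are hypothetical; with pandas the same idea is `Series.ffill()` plus `Series.isna()`.

```python
# Hypothetical hourly heart-rate readings for one patient; None marks a
# missing reading (often meaning no one ordered the measurement).
heart_rate = [72, None, None, 80, None, 78]

def ffill_with_indicator(values):
    """Carry the last observation forward and record where imputation happened."""
    filled, was_missing, last = [], [], None
    for v in values:
        was_missing.append(v is None)
        if v is not None:
            last = v
        filled.append(last)
    return filled, was_missing

filled, mask = ffill_with_indicator(heart_rate)
print(filled)  # [72, 72, 72, 80, 80, 78]
print(mask)    # [False, True, True, False, True, False]
```

Feeding the indicator mask to the model alongside the imputed values lets it learn from the missingness pattern instead of silently absorbing imputed numbers as real observations.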
Feature Engineering of Electronic Medical Records
A comprehensive overview of data cleaning and feature engineering techniques for clinical data.
Dishing Dirt About Clean Data
A daughter's desire to please her parents shows how a well-intentioned data scientist can cause far more harm and expense in the long run by selecting and creating the wrong features during data pre-processing.