High Content Screening for COVID-19

In response to the global COVID-19 pandemic, there has been an extensive effort to find effective therapeutics to treat this disease. To accelerate the regulatory process, researchers are investigating drugs previously approved for other indications as potential treatments for COVID-19. As part of this effort, Recursion Pharmaceuticals has developed an assay that can effectively screen thousands of drugs against SARS-CoV-2. Using this assay, they have screened over 1600 clinically relevant small molecules and released all of this information online [1]. This blog post aims to explore this data set and provide further insights.

Built with a high content screening (HCS) strategy, RxRx-19 comprises 305,520 images across 1672 drugs, 2 cell lines, multiple disease states and varying experimental conditions. Each image was processed with Recursion’s proprietary deep learning pipeline to generate a set of 1024 features. To visualize this data, UMAP [2] was used to project these features onto two dimensions, as seen below in Figure 1.

The following graph is interactive: hover over points with your mouse for more detail, scroll to zoom in, or drag to pan across the axes.

Figure 1: Dimension reduction of RxRx-19 data using UMAP, colored according to (a) experiment type, (b) disease condition and (c) plate. To allow for smoother interactivity, a random subset of 1200 points is shown, sampled equally across experiment types.
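The equal sampling mentioned in the caption can be sketched in a few lines. This is a minimal illustration, not the code used for the figure: the function name `sample_equally`, the metadata layout, the VERO experiment names and the group sizes are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict

def sample_equally(rows, key, total, seed=0):
    """Draw about `total` rows, split evenly across the groups defined by `key`."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    per_group = total // len(groups)
    subset = []
    for name in sorted(groups):
        members = groups[name]
        # Sample without replacement; never ask for more rows than the group holds.
        subset.extend(rng.sample(members, min(per_group, len(members))))
    return subset

# Hypothetical metadata: HRCE-1 and HRCE-2 are named in the post,
# the VERO labels and all group sizes are made up for illustration.
rows = (
    [{"experiment": "HRCE-1", "site": i} for i in range(5000)]
    + [{"experiment": "HRCE-2", "site": i} for i in range(3000)]
    + [{"experiment": "VERO-1", "site": i} for i in range(900)]
    + [{"experiment": "VERO-2", "site": i} for i in range(900)]
)
subset = sample_equally(rows, key="experiment", total=1200)
per_experiment = Counter(row["experiment"] for row in subset)
```

Sampling the same number of points per experiment keeps the larger experiments from visually dominating the plot.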

Figure 1 shows that the data separate largely according to experiment type. The assay was run in both VERO (kidney epithelial cells from the African green monkey) and HRCE (human renal cortical epithelial cells). To simplify these findings and to focus on human biology, this blog post will focus on the results from the HRCE assay. However, the HRCE-1 and HRCE-2 samples differ substantially from one another, and both must be included to capture the full range of compounds tested (see Figure 2 below).

Figure 2: Venn diagram depicting the number of compounds tested across HRCE experiment types.

Using BioSymetrics’ denoising engine, we can quantify the degree to which any confounding variable may influence downstream predictions and biological insights. In short, this is done by comparing the performance of machine learning models when predicting true biological signal (disease state without treatment) with their performance when predicting confounding variables (experiment type, plate, batch, etc.). Negative scores here indicate that the confounding variable has a stronger signal than the biological variable.

Table 1: Degree of confounding across non-biological variables within the HRCE assay.

Potential Confounder               Adjusted Score (lower is more bias)
Experiment type                    -0.184
Plate ID                           0.384
Imaging site (4 sites per well)    0.696
Well location                      0.786
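
The idea behind these scores, comparing how well a model predicts biology with how well it predicts a nuisance variable, can be sketched as follows. This is only an illustration under stated assumptions: the denoising engine's actual models and scoring formula are not described in this post, so the nearest-centroid classifier, the chance adjustment, the simple difference in `confounder_score`, and the synthetic data are all hypothetical stand-ins.

```python
import random
from collections import Counter

def holdout_accuracy(X, y, seed=0, test_fraction=0.3):
    """Hold-out accuracy of a nearest-centroid classifier (a stand-in model)."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    n_test = max(1, int(len(order) * test_fraction))
    test, train = order[:n_test], order[n_test:]

    # Mean feature vector per label, estimated on the training split only.
    sums, counts = {}, Counter()
    for i in train:
        total = sums.setdefault(y[i], [0.0] * len(X[i]))
        for j, value in enumerate(X[i]):
            total[j] += value
        counts[y[i]] += 1
    centroids = {label: [s / counts[label] for s in sums[label]] for label in sums}

    def predict(x):
        return min(
            centroids,
            key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
        )

    return sum(predict(X[i]) == y[i] for i in test) / len(test)

def chance_adjusted(accuracy, y):
    """Rescale accuracy so 0 is the majority-class baseline and 1 is perfect."""
    baseline = max(Counter(y).values()) / len(y)
    return (accuracy - baseline) / (1.0 - baseline)

def confounder_score(X, biology, confounder):
    """Hypothetical score: biological signal minus confounder signal."""
    bio = chance_adjusted(holdout_accuracy(X, biology), biology)
    conf = chance_adjusted(holdout_accuracy(X, confounder), confounder)
    return bio - conf

# Synthetic data: a large plate shift on feature 0, a smaller
# disease shift on feature 1, and a well label unrelated to the features.
rng = random.Random(42)
X, biology, plate, well = [], [], [], []
for i in range(400):
    infected = i % 2
    on_plate_b = (i // 2) % 2
    biology.append("infected" if infected else "mock")
    plate.append("plate B" if on_plate_b else "plate A")
    well.append(rng.choice(["inner", "edge"]))
    X.append([rng.gauss(0, 1) + 5.0 * on_plate_b, rng.gauss(0, 1) + 1.5 * infected])

plate_score = confounder_score(X, biology, plate)
well_score = confounder_score(X, biology, well)
print(f"plate: {plate_score:+.3f}  well: {well_score:+.3f}")
```

In this toy setup the plate label is easier to predict than disease state, so its score comes out negative, while the uninformative well label scores higher. That mirrors the way the table is read: the lower the score, the stronger the confounder relative to biology.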

This information makes it clear that the major confounding variable here is experiment type. With the same score, we can pit multiple batch normalization procedures against one another to identify the parameters best suited to this data set. Through this process (implemented within our denoising platform), we achieve a marginally positive confounder score (0.082), indicating that we have not only decreased the amount of noise but also allowed biological signal to be predicted more strongly than non-biological variation. While these procedures were applied to the full feature set, we can apply UMAP again to the batch-corrected data to visualize the resulting effect (Figure 3 below).
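
As one concrete example of a batch normalization procedure that could enter such a comparison, the sketch below standardizes every feature within each batch. It is an assumption for illustration only: the post does not say which procedures were compared or which one was selected, and `standardize_per_batch` and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardize_per_batch(X, batches):
    """Center and scale every feature separately within each batch."""
    rows_by_batch = defaultdict(list)
    for i, batch in enumerate(batches):
        rows_by_batch[batch].append(i)

    corrected = [None] * len(X)
    for rows in rows_by_batch.values():
        n_features = len(X[rows[0]])
        columns = [[X[i][j] for i in rows] for j in range(n_features)]
        centers = [mean(col) for col in columns]
        # Fall back to 1.0 so constant features do not divide by zero.
        scales = [pstdev(col) or 1.0 for col in columns]
        for i in rows:
            corrected[i] = [
                (X[i][j] - centers[j]) / scales[j] for j in range(n_features)
            ]
    return corrected

# Toy features: the second batch is shifted by roughly 100 on feature 0.
X = [[1.0, 10.0], [3.0, 14.0], [101.0, 20.0], [103.0, 28.0]]
batches = ["HRCE-1", "HRCE-1", "HRCE-2", "HRCE-2"]
corrected = standardize_per_batch(X, batches)
```

After correction both batches share the same center and spread on every feature. A common variant estimates the centers and scales from control wells only, so that genuine treatment effects are not averaged away; re-scoring the corrected features for confounding is one way to choose between such options.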

Figure 3: Visualizing batch-corrected data across HRCE experiment types according to (a) experiment type and (b) disease condition. To allow for smoother interactivity, a random subset of 800 points is shown, sampled equally across experiment types.

Using this batch-corrected data set, we can now begin to explore the actual results of the assay. To do this, we built a machine learning classifier trained on untreated samples to classify them as healthy (mock or UV-irradiated virus) or diseased. Using this trained model, we predicted the degree to which diseased but treated samples recapitulated the healthy phenotype. Since all compounds were assayed in replicate across a range of doses, we can evaluate each compound according to its dose response curve. The drug response navigator below sorts each drug according to its overall response at high concentrations as well as its tendency to show a consistent increase with dose (Kendall’s Tau).
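
To make the two ranking quantities concrete, the sketch below computes Kendall's tau (the tau-b variant, which accounts for ties) between dose and predicted response, together with a mean response at the highest doses. The dose series and response values are invented for illustration, and averaging the top two doses is an assumption, not necessarily how the navigator defines high-concentration response.

```python
from itertools import combinations
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b: rank agreement between x and y, corrected for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x2 - x1, y2 - y1
        if dx == 0 and dy == 0:
            continue  # tied in both variables: not counted in any term
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denominator = sqrt(
        (concordant + discordant + tied_x_only)
        * (concordant + discordant + tied_y_only)
    )
    return (concordant - discordant) / denominator if denominator else 0.0

# Hypothetical mean predicted "healthy" probability at each dose (µM).
doses = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0]
responder = [0.05, 0.04, 0.20, 0.55, 0.80, 0.90]
flat = [0.10, 0.12, 0.08, 0.11, 0.09, 0.10]

tau_responder = kendall_tau_b(doses, responder)
tau_flat = kendall_tau_b(doses, flat)
# Assumed summary of "response at high concentrations": mean of the top two doses.
high_dose_response = sum(responder[-2:]) / 2
```

A compound whose response rises steadily with dose has a tau close to 1, while a compound with no dose dependence sits near 0. Combining tau with the high-dose response separates consistent responders from compounds that only look active at a single concentration.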

The interactive figure below allows us to easily visualize and compare drugs according to their dose response curves. Click a point on the drug response navigator (top left panel) to view the average predicted response at each dose (top right panel). The bottom card will show the structure of the selected compound.

The results here clearly demonstrate that, within this study, remdesivir shows the strongest potential for COVID-19 treatment. From concentrations as low as 0.3 µM, this drug is able to recover the healthy phenotype across both HRCE experimental conditions. The related compound GS-441524 also performs well, but requires a slightly higher dose (3 µM) to show a consistent effect. Since remdesivir is a prodrug of GS-441524, we can expect that both exert their effect through a common mechanism of inhibiting viral RNA replication [3]. The three remaining compounds that show strong effects are CX-4945, almitrine and BYL719. CX-4945 (silmitasertib) is an inhibitor of protein kinase CK2 [4]. While initially developed to treat brain cancers, this drug is now being investigated to treat COVID-19. BYL719 (alpelisib) is a specific PI3K inhibitor initially developed to treat breast cancer [5]. Interestingly, while almitrine was intended for the treatment of COPD, recent computational analysis has revealed that it may act as an inhibitor of the SARS-CoV-2 protease (MPro) [6]. Although they show weaker effects, and perhaps require higher concentrations, several other drugs (e.g. nilotinib, panobinostat and halofuginone) seem to reduce the viral phenotype through a diversity of mechanisms.

Overall, this phenotypic screen published by Recursion Pharmaceuticals demonstrates that there are several avenues for developing treatments for COVID-19. Through HCS phenotypic screening, we are able to identify a plethora of underlying mechanisms beyond those explored by target-based approaches alone. At BioSymetrics, we specialize in handling complex biological data and building machine learning models to reveal insightful biology.

References

1. Heiser, Katie, et al. "Identification of potential treatments for COVID-19 through artificial intelligence-enabled phenomic analysis of human cells infected with SARS-CoV-2." bioRxiv (2020).
2. McInnes, Leland, John Healy, and James Melville. "UMAP: Uniform manifold approximation and projection for dimension reduction." arXiv preprint arXiv:1802.03426 (2018).
3. Gordon, Calvin J., et al. "The antiviral compound remdesivir potently inhibits RNA-dependent RNA polymerase from Middle East respiratory syndrome coronavirus." Journal of Biological Chemistry 295.15 (2020): 4773-4779.
4. Siddiqui-Jain, Adam, et al. "CX-4945, an orally bioavailable selective inhibitor of protein kinase CK2, inhibits prosurvival and angiogenic signaling and exhibits antitumor efficacy." Cancer Research 70.24 (2010): 10288-10298.
5. Furet, Pascal, et al. "Discovery of NVP-BYL719 a potent and selective phosphatidylinositol-3 kinase alpha inhibitor selected for clinical evaluation." Bioorganic & Medicinal Chemistry Letters 23.13 (2013): 3741-3748.
6. Hu, Fan, Jiaxin Jiang, and Peng Yin. "Prediction of potential commercially inhibitors against SARS-CoV-2 by multi-task deep model." arXiv preprint arXiv:2003.00728 (2020).
