CASE STUDY: Machine learning for activity prediction as part of lead compound generation. The Challenge: Iterating quickly over multiple large feature sets while keeping the flexibility to test models at scale is a challenge for any data scientist.
USE CASE: Value-Based Care. CLIENT: Major UK-based healthcare network in partnership with Intacare. OVERVIEW: The annual cost of radiotherapy is escalating year on year, with little visibility into root causes and little control. Maintaining cost-efficient healthcare for patients required an investigation of current code/claim and cost data. GOAL: Identify and quantify the potential cost savings from revising existing reimbursement mechanisms. “Processing the data manually would have required many months of man hours.” Matt Hickey, CEO, Intacare
A Comprehensive Overview of Data Cleaning and Feature Engineering Techniques for Clinical Data Housed in Electronic Medical Records. The electronic medical record (EMR) is a digital version of a patient’s chart that collects data related to a patient’s visits, such as past medical history, lab results, prescriptions, diagnoses, and patient-reported outcomes. EMR data are notorious for being messy, incomplete, and inconsistent. Part of the “messiness” is due to the diverse nature of clinical data.
Challenge: Combine Disparate Data Sets in Preprocessing for ML. Summary: Compelling results show that combining data sources generally gave better diagnostic performance than any single data set alone (Figures 1 and 2).
Report Title: Distributed Processing Frameworks for Machine Learning of Combined Biomedical Data Types. This white paper discusses the computing requirements of the combined data types that the Augusta™ platform was built to handle. It is a must-read for understanding the compute-power complexities of preprocessing various data types and for identifying ideal scenarios for using and pricing Augusta™. Please complete the form below to download our free white paper.