Augusta is a biomedical AI (Artificial Intelligence) and ML (Machine Learning) framework designed to shift time from data pre-processing and integration to model building and interrogation, using familiar toolsets within Python. Augusta begins with diverse, raw medical data types (e.g. images, chemical structures, genomic data, tabular data) and operates across three modules:

  1. Augusta Pre-Processing
  2. Augusta ML (Machine Learning)
  3. Augusta Architect

Common Use Cases:
  • Drug discovery and development, incl. small molecule activity prediction
  • Diagnostics & precision medicine
  • Patient outcomes prediction and stratification
  • Preprocessing
  • Feature reduction & selection
  • Data Integration (e.g. combining genomics with clinical data)
  • Model creation
  • Model tuning
  • Model training
  • Model interrogation
  • Visualization
  • Faster, effective data pre-processing, directly integrated with model building
  • Seamless distributed computing
  • Flexible architecting of processing pipelines, changing as data type and volume requires
  • Adaptable to changing needs/preferences over time


  • Quickly standardize or normalize data
  • Permute over pre-processing options
  • Create workflows where critical biases can be reduced
  • Save, modify, and re-run workflows
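
As a rough illustration of what "standardize or normalize" and "permute over pre-processing options" can mean in practice, here is a minimal sketch in plain Python. It is not Augusta's API: every function and option name below (`standardize`, `min_max`, `impute_mean`, `impute_zero`, `permute_preprocessing`) is a hypothetical stand-in, and the input values are toy numbers.

```python
from itertools import product
from statistics import mean, pstdev


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_zero(values):
    """Replace missing entries (None) with 0.0."""
    return [0.0 if v is None else v for v in values]


# Hypothetical option sets; a real workflow would have many more.
IMPUTERS = {"mean": impute_mean, "zero": impute_zero}
SCALERS = {"standardize": standardize, "min_max": min_max}


def permute_preprocessing(values):
    """Apply every (imputer, scaler) combination and keep each result."""
    results = {}
    for (imp_name, imputer), (sc_name, scaler) in product(
        IMPUTERS.items(), SCALERS.items()
    ):
        results[(imp_name, sc_name)] = scaler(imputer(values))
    return results


raw = [2.0, None, 4.0, 6.0]  # toy measurements with one missing value
variants = permute_preprocessing(raw)
```

Each key in `variants` records which decisions produced that version of the data, so the same record can later be used to compare model performance across pre-processing choices.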

Use with:

  • Any Data Source: BYOD (Bring Your Own Data) local, databases, cloud
  • Any Combination of Sources: Modular and customizable pipelines for processing raw data in any combination

Data Integration (sample pipelines)

  • MRI/fMRI and other imaging modalities
  • EEG
  • Genomics
  • Proteomics
  • Chemistry
  • EHR/EMR data
  • Streaming/wearables data
  • Tabular data
  • Custom data options available

Feature Optimization

Integrating data of various types (e.g. combining genomics with clinical data) enables the engineering of unique features that can yield greater machine learning insight. Features can be easily grouped, subgrouped, and archived, which makes them readily accessible to models, expands the available tuning parameters, and enhances interrogation capabilities.
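
To make the idea of combining genomics with clinical data concrete, the following is a minimal sketch of joining two per-patient tables and keeping features grouped by source. It does not use Augusta; `integrate`, `select_group`, the field names, and the patient records are all hypothetical toy examples.

```python
def integrate(genomic, clinical):
    """Inner-join two per-patient tables on patient ID.

    Each feature name is prefixed with its source so that features
    can later be selected as a group.
    """
    shared_ids = sorted(genomic.keys() & clinical.keys())
    merged = {}
    for pid in shared_ids:
        row = {f"genomic.{k}": v for k, v in genomic[pid].items()}
        row.update({f"clinical.{k}": v for k, v in clinical[pid].items()})
        merged[pid] = row
    return merged


def select_group(merged, prefix):
    """Return only the features that belong to one source group."""
    return {
        pid: {k: v for k, v in row.items() if k.startswith(prefix + ".")}
        for pid, row in merged.items()
    }


# Toy records, invented purely for illustration.
genomic = {
    "p1": {"apoe4_copies": 1},
    "p2": {"apoe4_copies": 0},
    "p3": {"apoe4_copies": 2},
}
clinical = {
    "p1": {"age": 71},
    "p2": {"age": 64},
    "p4": {"age": 58},
}

merged = integrate(genomic, clinical)
clinical_only = select_group(merged, "clinical")
```

Only patients present in both sources survive the join here; whether to drop or impute the others is itself a pre-processing decision.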


Model Creation

Use machine learning models from TensorFlow and SciPy with a unified syntax and output.

Model Tuning and Interrogation

Iterate over model-specific parameters and investigate how combinations of pre-processing decisions and model hyperparameters affect model performance.

  • Evaluate model performance via cross-validation, using metrics such as accuracy, precision/recall and AUC
  • Implement multiple feature reduction methods and evaluate impact on model performance
  • Visualization incorporating Seaborn and Matplotlib packages
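
For readers less familiar with the metrics named above, here is a minimal plain-Python sketch of accuracy, precision/recall, AUC, and a k-fold split. It is a generic illustration, not Augusta's implementation; the function names are hypothetical and the labels and scores are toy values.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision_recall(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


# Toy labels and model scores, invented for illustration.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
prec, rec = precision_recall(y_true, y_pred)
area = auc(y_true, scores)
folds = list(k_fold_indices(len(y_true), 3))
```

In a cross-validated evaluation, a model would be fit on each `train_idx` and the metrics computed on the matching `test_idx`, then averaged across folds.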

Contingent-AI™ (Patent pending) allows data scientists to permute options in the model generation process based on decisions made in pre-processing.
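
One way to picture model options that depend on pre-processing decisions is a search grid in which the set of candidate models is a function of the pre-processing choice. The sketch below is only a generic illustration of that idea under assumed rules; it is not the Contingent-AI method, and the option names, the `model_options` rule, and `contingent_grid` are hypothetical.

```python
from itertools import product

# Hypothetical pre-processing decisions.
PREPROCESSING_OPTIONS = {
    "imputation": ["mean", "median"],
    "scaling": ["none", "standardize"],
}


def model_options(pre):
    """Return the model configurations to try for one pre-processing choice.

    Assumed rule for this sketch: tree models are always tried, while a
    distance-based model (k-NN) is only tried on standardized data.
    """
    candidates = [{"model": "decision_tree", "max_depth": d} for d in (3, 5)]
    if pre["scaling"] == "standardize":
        candidates += [{"model": "knn", "k": k} for k in (3, 7)]
    return candidates


def contingent_grid():
    """Yield (pre-processing, model) pairs, with models chosen per choice."""
    names = list(PREPROCESSING_OPTIONS)
    for combo in product(*(PREPROCESSING_OPTIONS[n] for n in names)):
        pre = dict(zip(names, combo))
        for model in model_options(pre):
            yield pre, model


grid = list(contingent_grid())
```

A plain Cartesian grid over the same options would contain 16 combinations; making the model choices contingent on scaling leaves 12, none of which pairs k-NN with unscaled data.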


Augusta Architect is a simple, Python-based syntax for processing and integrating multiple, diverse data types and for running and comparing multiple machine learning algorithms.

Augusta Architect Code Blocks

Augusta Architect uses “code blocks” to construct the data flows and schematic framework for pre-processing and machine learning, reducing the time required to program the flow from data collection through your machine learning engine. A single toolset from data ingestion to result output eliminates the need to port data from system to system, which helps Augusta maintain data integrity and removes error-prone steps. The result:

  • Increased speed to market
  • Easy iteration and edits
  • Stronger confidence
  • Greater precision
  • Lower cost of R&D efforts
Request a Demo
Augusta™ Video
Watch time: 2 min
Case Studies
June 20, 2019 in Case Study

Case Study: Lead Compound Generation using Augusta™

Case study: Machine learning for activity prediction, as part of lead compound generation. The Challenge: The ability to quickly iterate multiple large feature sets with the flexibility to test models…

May 28, 2019 in Case Study

Case Study: Combining Data Improves Accuracy in the Diagnosis of Alzheimer’s Disease

Challenge: Combine disparate data sets in pre-processing for ML. Summary: Compelling results show that combining data sources generally allowed better diagnostic performance than any data set alone (Figures 1 & 2).

Blog Posts
September 6, 2019 in Blog Post

Identifying and Addressing the Challenges in the Diagnosis of Sepsis

Sepsis is the leading cause of death in the Intensive Care Unit, and it’s responsible for 1 in 3 hospital deaths. Each hour without treatment increases a patient’s risk of...
June 20, 2019 in Blog Post

Dishing Dirt About Clean Data

A daughter’s desire to please her parents demonstrates how a data scientist with good intentions can cause far more harm and expense in the long run, through the selection and creation of the wrong features during data pre-processing.

March 20, 2019 in Blog Post

‘Contingent AI’, What is it?

What is Contingent AI?  In any data science pipeline there are a number of options that are selected for data processing (e.g. contrast settings for medical images, data imputation approaches,…

Webinar (Archive)
Watch time: 20 min
Machine Learning with ContingentAI