Bringing data science and AI/ML tools to infectious disease research
H3D Foundation and Ersilia present


Event Sponsors



Session 2: virtual screening cascade
Breakout session


I have access to a library of compounds for which only the structure (SMILES) is available. I want to identify a few hits for an anti-infective drug, but I only have the capacity to test 10 molecules in the first round. What can I do?


Virtual Screening Cascade
Retrosynthetic accessibility
Other bioactivities
Aqueous solubility
hERG blockade
Cytotoxicity
Antimalarial activity


Virtual Screening Cascade
Good drug candidate


Virtual Screening Cascade
Bad drug candidate
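
One way to read the cascade is as a sequence of filters applied to per-molecule predictions, followed by a ranking on the primary activity. The sketch below is a minimal illustration in plain Python. The molecule identifiers, property values, and cut-offs are all hypothetical and are not taken from any Ersilia model.

```python
# Hypothetical predictions for four molecules; every value is invented for illustration.
predictions = [
    {"id": "mol_A", "antimalarial": 0.92, "cytotoxic": 0.10, "herg": 0.15, "soluble": 0.80},
    {"id": "mol_B", "antimalarial": 0.88, "cytotoxic": 0.75, "herg": 0.20, "soluble": 0.70},
    {"id": "mol_C", "antimalarial": 0.35, "cytotoxic": 0.05, "herg": 0.10, "soluble": 0.90},
    {"id": "mol_D", "antimalarial": 0.81, "cytotoxic": 0.12, "herg": 0.65, "soluble": 0.60},
]

# Each cascade step is (property name, keep-if test). The 0.5 cut-offs are assumptions.
cascade = [
    ("antimalarial", lambda v: v >= 0.5),  # keep predicted actives
    ("cytotoxic", lambda v: v < 0.5),      # drop predicted cytotoxic molecules
    ("herg", lambda v: v < 0.5),           # drop predicted hERG blockers
    ("soluble", lambda v: v >= 0.5),       # keep predicted soluble molecules
]


def run_cascade(molecules, steps, top_n=10):
    """Apply each filter in turn, then rank survivors by antimalarial score."""
    survivors = list(molecules)
    for prop, keep in steps:
        survivors = [m for m in survivors if keep(m[prop])]
    survivors.sort(key=lambda m: m["antimalarial"], reverse=True)
    return survivors[:top_n]


selected = run_cascade(predictions, cascade)
selected_ids = [m["id"] for m in selected]
print(selected_ids)
```

In this toy data only one molecule passes every step; in practice the cut-offs would be tuned so that roughly the number of molecules you can test survives.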


Ersilia Model Hub
Repository of pre-trained, ready-to-use AI/ML models for drug discovery
- Available models can be browsed at https://ersilia.io/model-hub
- The code is available at https://github.com/ersilia-os/ersilia (GPLv3 License)
- Accessibility:
  - Command Line Interface
  - Google Colab implementation
Turon, G., & Duran-Frigola, M. (2021). Ersilia Model Hub (Version 1.0.0) [Computer software]


Getting Started

https://github.com/ersilia-os/event-fund-ai-drug-discovery


Ersilia Model Hub in Colab
#@title The Ersilia Model Hub
#@markdown Click on the play button to install Ersilia in this Colab notebook.
%%capture
%env MINICONDA_INSTALLER_SCRIPT=Miniconda3-py37_4.12.0-Linux-x86_64.sh
%env MINICONDA_PREFIX=/usr/local
%env PYTHONPATH={PYTHONPATH}:/usr/local/lib/python3.7/site-packages
%env CONDA_PREFIX=/usr/local
%env CONDA_PREFIX_1=/usr/local
%env CONDA_DIR=/usr/local
%env CONDA_DEFAULT_ENV=base
!wget https://repo.anaconda.com/miniconda/$MINICONDA_INSTALLER_SCRIPT
!chmod +x $MINICONDA_INSTALLER_SCRIPT
!./$MINICONDA_INSTALLER_SCRIPT -b -f -p $MINICONDA_PREFIX
!python -m pip install git+https://github.com/ersilia-os/ersilia.git
!python -m pip install requests --upgrade
import sys
sys.path.append("/usr/local/lib/python3.7/site-packages")



Ersilia Model Hub
Each model is identified by a code (eosxxxx) and a slug (1-2 word reference).
There are five basic commands in Ersilia:
- Fetch the model from its online storage
- Serve the model on your system
- Run the desired API (predict)
- Close the model (stop serving)
- Delete the model from the system


Breakout Session Exercise
We will use the 400 compounds from the MMV Malaria Box to run a series of predictions with models available in the Ersilia Model Hub, and select the molecules of greatest interest for experimental testing.
Steps:
- Install the Ersilia Model Hub in Colab (together)
- Run a model for antimalarial activity prediction and analyse the results (together)
- In small groups, run and analyse the outcomes of the different suggested models


Antimalarial Activity

Bosc et al., Journal of Cheminformatics, 2021


Mounting Google Drive
We will first mount Google Drive in Colab to store the model predictions, and import some basic Python packages
#mount your own GDrive in Colab
from google.colab import drive
drive.mount('/content/drive')
import matplotlib.pyplot as plt
import pandas as pd


MMV Malaria Box
The dataset is prepared and stored as a CSV file in the /data folder of h3d_ersilia_ai_workshop in your personal Drive
#we can open it as a pandas dataframe; note that "smiles" holds the path to the CSV file, not the SMILES strings themselves
smiles = "drive/MyDrive/h3d_ersilia_ai_workshop/data/session2/mmv_malariabox.csv"
df = pd.read_csv(smiles)
df.head()



Antimalarial Activity: eos2gth
#Step 1: retrieve the model from the internet
!ersilia fetch eos2gth
We use the ! notation to access the CLI commands for Ersilia from the notebook
Data is already prepared in the /data folder in your Google Drive, and predictions will also be stored there.



Antimalarial Activity: eos2gth
#we must import the ErsiliaModel class from the Ersilia Python package
from ersilia import ErsiliaModel
#Step 2: load the model in the notebook
model = ErsiliaModel("eos2gth")
#Step 3: serve the model (start it up)
model.serve()
#Step 4: run predictions for the input smiles
output = model.predict(input=smiles, output="pandas")
#Step 5: close the model
model.close()
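
If the prediction step raises an error, the model stays served unless close() is still called. A minimal sketch of one way to guard against that with try/finally follows. The StandInModel class is hypothetical: it only mimics the serve, predict, and close calls shown above so that the example runs without Ersilia installed. In the notebook you would pass the real ErsiliaModel("eos2gth") object instead.

```python
class StandInModel:
    """Hypothetical stand-in that mimics serve/predict/close; not part of Ersilia."""

    def __init__(self, model_id):
        self.model_id = model_id
        self.is_served = False

    def serve(self):
        self.is_served = True

    def predict(self, input, output=None):
        if not self.is_served:
            raise RuntimeError("model must be served before predicting")
        # One dummy score per input; a real model would return actual predictions.
        return [{"input": smi, "score": 0.0} for smi in input]

    def close(self):
        self.is_served = False


def predict_and_close(model, inputs):
    """Serve the model, run predictions, and always close it afterwards."""
    model.serve()
    try:
        return model.predict(input=inputs)
    finally:
        # Runs whether predict() succeeded or raised.
        model.close()


model = StandInModel("eos2gth")
results = predict_and_close(model, ["CCO", "c1ccccc1"])
print(len(results), model.is_served)
```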


We have obtained the predictions as "output" in pandas format, which we can save directly to Drive as a .csv file
output.to_csv("drive/MyDrive/h3d_ersilia_ai_workshop/data/session2/eos2gth.csv", index=False)


Analysing predictions
To analyse the predictions, we can read them back from Drive
df = pd.read_csv("drive/MyDrive/h3d_ersilia_ai_workshop/data/session2/eos2gth.csv")
df.head()



Output columns: InChIKey, SMILES, Prediction


Analysing predictions
We can sort the molecules from highest to lowest score
output.sort_values("score", ascending=False).head()



Analysing predictions
Or plot the distribution of all scores in a histogram
plt.hist(output["score"], bins=50, color="#50285a")
plt.xlabel("MAIP Score")
plt.ylabel("Number of molecules")
plt.show()



Analysing predictions
- Is this a regression or a classification model?
- How can we interpret the model "score" as antimalarial potential?

https://chembl.gitbook.io/malaria-project/output-file


Plotting the results
To better understand the model outputs, we need to go back to the original model publication, where the training data is explained

Figure: model testing on experimental dataset (Active vs. Inactive compounds)
https://chembl.gitbook.io/malaria-project/output-file


Given this test set, what could we choose as a good activity threshold for our dataset?
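
One hedged way to approach the threshold question is to scan candidate cut-offs over a labelled test set and look at how sensitivity and specificity trade off. The sketch below uses a handful of invented (score, label) pairs, not the published test data, and picks the cut-off that maximises Youden's J (sensitivity + specificity - 1). That criterion is an assumption made for illustration; with a budget of only 10 molecules, a stricter cut-off that favours specificity may be preferable.

```python
# Invented (score, is_active) pairs standing in for a labelled test set.
test_set = [
    (5.0, False), (12.0, False), (18.0, False), (28.0, False),
    (22.0, True), (30.0, True), (41.0, True), (55.0, True), (60.0, True),
]


def sensitivity_specificity(data, threshold):
    """Treat score >= threshold as predicted active and compare with the labels."""
    tp = sum(1 for s, active in data if active and s >= threshold)
    fn = sum(1 for s, active in data if active and s < threshold)
    tn = sum(1 for s, active in data if not active and s < threshold)
    fp = sum(1 for s, active in data if not active and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(data):
    """Return the observed score that maximises Youden's J."""
    candidates = sorted({s for s, _ in data})

    def youden(t):
        sens, spec = sensitivity_specificity(data, t)
        return sens + spec - 1

    return max(candidates, key=youden)


threshold = best_threshold(test_set)
sens, spec = sensitivity_specificity(test_set, threshold)
print(threshold, sens, spec)
```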


Next steps
In groups, run additional predictions for the MMV Dataset and select the molecules that you would move on to experimental testing.
We provide a list of suggested models; you do not need to run predictions for all of them:
- eos46ev (antituberculosis activity)
- eos4e41 (antibiotic activity)
- eos2ta5 (hERG blockade)
- eos2r5a (retrosynthetic accessibility)
- eos6oli (aqueous solubility)
- eos9yui (natural product likeness)


Next steps
Some pointers:
- Run predictions for relevant models, storing the results in Google Drive
- Analyse the results on the Google Colab Notebook or directly using Excel / Google Sheets
- Discuss the results and rank/select best molecules
- Prepare a short (10 min) presentation of your findings
There is no right or wrong answer; this is just a practice exercise
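
To rank molecules across several models, one option is to merge the per-model outputs on a shared identifier and then filter and sort the combined table. The sketch below is illustrative only: the identifiers, column names, scores, and the 0.5 hERG cut-off are all hypothetical, and real Ersilia outputs may use different column names and scales.

```python
import pandas as pd

# Hypothetical outputs from two models, keyed by the same molecule identifier.
antimalarial = pd.DataFrame({
    "key": ["mol_A", "mol_B", "mol_C", "mol_D"],
    "antimalarial_score": [0.92, 0.88, 0.35, 0.81],
})
herg = pd.DataFrame({
    "key": ["mol_A", "mol_B", "mol_C", "mol_D"],
    "herg_probability": [0.15, 0.20, 0.10, 0.65],
})

# An inner join keeps only molecules that were scored by both models.
combined = antimalarial.merge(herg, on="key", how="inner")

# Drop likely hERG blockers (assumed cut-off), then rank by predicted activity.
safe = combined[combined["herg_probability"] < 0.5]
shortlist = safe.sort_values("antimalarial_score", ascending=False).head(10)

print(shortlist["key"].tolist())
```

The same pattern extends to more models by chaining further merges on the shared key before filtering.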


EventFund_Session2_Breakout
By Gemma Turon