Bringing data science and AI/ML tools to infectious disease research

H3D Foundation and Ersilia present

Event Sponsors

Session 4: Generative models

Breakout Session 4

I have a list of selected hits, but I would like to improve the molecules with weaker activity and diversify the collection.

Similarity search

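Similarity search can be treated as a type of generative approach: instead of designing new molecules from scratch, it searches an already generated virtual library for molecules similar to a query.

In this case, the generative step has already been done, so we only need to retrieve the library molecules closest to our query. Compared with generating molecules from scratch, this is:

  • Much faster
  • Potentially less diverse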
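The idea of ranking a pre-built library against a query can be sketched in a few lines. This is a toy illustration only: it assumes a fingerprint made of character n-grams of the SMILES string, whereas real similarity models use proper chemical fingerprints. The function names and the mini-library are hypothetical and are not part of the Ersilia models used in this session.

```python
def ngram_fingerprint(smiles: str, n: int = 3) -> set[str]:
    """Toy fingerprint (assumption): the set of character n-grams in a SMILES string."""
    if len(smiles) < n:
        return {smiles}
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def tanimoto(a: set[str], b: set[str]) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(query: str, library: list[str], k: int = 100) -> list[tuple[str, float]]:
    """Return the k library molecules most similar to the query, best first."""
    query_fp = ngram_fingerprint(query)
    scored = [(smi, tanimoto(query_fp, ngram_fingerprint(smi))) for smi in library]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical mini-library, for illustration only
library = [
    "CCO",
    "c1ccccc1",
    "Cc1ccc(Nc2ncnc3ccccc23)cc1",
    "Cc1ccc(Nc2nc(NCCN)c3ccccc3n2)cc1C",
]
query = "Cc1ccc(Nc2nc(NCCO)c3ccccc3n2)cc1C"
top_hits = similarity_search(query, library, k=2)
for smiles, score in top_hits:
    print(f"{score:.2f}  {smiles}")
```

No new molecules are generated here: the search only scores and sorts what is already in the library, which is why it is fast and why the diversity of the results is bounded by the library itself.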

Breakout Exercise

We will look for alternatives to one of the molecules from the MMV Malaria Box we used in session 2.


Molecule Selection

Select a molecule of interest from the MMV Malaria Box Dataset (session2/data)


Predicted activity against malaria:

Score = 4

Similarity Models in the EMH

We will work with two similarity models:

  • eos4b8j: gdbchembl-similarity

    • 166.4 billion possible molecules of up to 17 atoms

    • Interactive browsing at:

  • eos7jlv: gdbmedchem-similarity

    • 10 million possible molecules curated from GDBChEMBL

    • Download and browse:

Both models return the 100 molecules most similar to the input molecule.

Model Fetch and Predict

# Running Ersilia as a Python package (both models must be fetched beforehand)
from ersilia import ErsiliaModel

smiles = "Cc1ccc(Nc2nc(NCCO)c3ccccc3n2)cc1C"  # query molecule

# GDBChEMBL similarity search
model = ErsiliaModel("eos4b8j")
model.serve()
output_eos4b8j = model.predict(input=smiles, output="pandas")
model.close()

# GDBMedChem similarity search
model = ErsiliaModel("eos7jlv")
model.serve()
output_eos7jlv = model.predict(input=smiles, output="pandas")
model.close()


Find the 100 compounds from each database that are most similar to your selected molecule:

  • Are the hits obtained from each database different?
  • Are hits from GDBMedChem more synthetically accessible than hits from GDBChEMBL?
  • Do any of the molecules have a higher predicted antimalarial activity than the original hit?
  • Which molecules would you select for further screening?
  • Are there any ADMET considerations you are taking into account for the selection, after what we reviewed in session 3?
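For the first question, one simple way to compare the two result sets is to look at their overlap. The sketch below assumes each model's hits have already been extracted into a plain list of SMILES strings; the helper name `compare_hits` and the example lists are hypothetical. It also compares SMILES as raw text, so the same molecule written in two different ways would be counted as different. Canonicalising the SMILES first (for example with a cheminformatics toolkit) would avoid this.

```python
def compare_hits(hits_a: list[str], hits_b: list[str]) -> dict[str, list[str]]:
    """Split two hit lists into shared molecules and molecules unique to each list."""
    set_a, set_b = set(hits_a), set(hits_b)
    return {
        "shared": sorted(set_a & set_b),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
    }


# Hypothetical hit lists, for illustration only
hits_gdbchembl = ["CCO", "CCN", "c1ccccc1"]
hits_gdbmedchem = ["CCN", "CCC"]

comparison = compare_hits(hits_gdbchembl, hits_gdbmedchem)
print(f"Shared hits: {len(comparison['shared'])}")
print(f"Only in GDBChEMBL: {len(comparison['only_a'])}")
print(f"Only in GDBMedChem: {len(comparison['only_b'])}")
```

A small overlap suggests the two databases cover different regions of chemical space around the query, which is useful when the goal is to diversify the collection.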


By Gemma Turon

