Bringing data science and AI/ML tools to infectious disease research
H3D Foundation and Ersilia present
Session 4: Using open source ML assets
Gemma Turon, @TuronGemma, email@example.com
Miquel Duran-Frigola, @mduranfrigola, firstname.lastname@example.org
Ersilia Open Source Initiative, @ersiliaio, https://ersilia.io
30th September 2022
What is Git and GitHub?
Git is software that tracks changes to files. Changes include creating new files, deleting files or folders, and editing the content of a file.
When can Git be useful?
GitHub is an internet hosting service for software development and version control using Git.
When can GitHub be useful?
Many organisations and laboratories that write code have a GitHub Organisation profile
1 Repository = 1 Project
Ersilia's backend is on GitHub
Examples of repositories we have been using:
- Ersilia: https://github.com/ersilia-os/ersilia
- Retrosynthetic Accessibility: https://github.com/reymond-group/RAscore
- Antibiotic activity: https://github.com/chemprop/chemprop
Important files in a GitHub Repo
- README File: basic information about the repository, shows up on the repository landing page
- LICENSE: it is essential to check the license of the code before using it. Common Open Source Licenses are:
- MIT, GPLv3, Apache, Mozilla ...
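The license check can also be done from code. The sketch below is one possible way, assuming the repository is public on GitHub and that GitHub was able to detect its license. It uses the GitHub REST API endpoint for repository licenses; the function names `license_api_url` and `fetch_license_id` are made up for this illustration.

```python
import json
import urllib.request
from urllib.parse import urlparse


def license_api_url(repo_url: str) -> str:
    """Turn a GitHub repository URL into its REST API license endpoint."""
    parts = urlparse(repo_url).path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"Not a repository URL: {repo_url}")
    owner, repo = parts[0], parts[1]
    # Clone URLs sometimes end in ".git"; the API expects the bare name.
    if repo.endswith(".git"):
        repo = repo[:-4]
    return f"https://api.github.com/repos/{owner}/{repo}/license"


def fetch_license_id(repo_url: str) -> str:
    """Ask GitHub which license it detected (needs internet access)."""
    request = urllib.request.Request(
        license_api_url(repo_url),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    # "spdx_id" is a short identifier for the detected license.
    return payload["license"]["spdx_id"]
```

Calling `fetch_license_id("https://github.com/ersilia-os/ersilia")` returns the short license identifier reported by GitHub. This does not replace reading the LICENSE file itself, since automatic detection can fail or be incomplete.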
Important sections in a Repo
- Contributors: lets you see who is contributing to this code --> great for contacting people
- Issues: a place where everyone can drop questions or problems they have encountered when using the code in the repository, and hopefully get help from the community or the developers.
How to use the Ersilia Model Hub with my own data
Download a dataset from ChEMBL
- ChEMBL assay ID: 3882128
- Select all molecules and start download
- Unpack ZIP file and extract .csv
- Save .csv to a folder in Drive
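Once the .csv is saved, the molecules can be pulled out of it with a few lines of Python. This is a minimal sketch, not the notebook's own code. It assumes the export is semicolon-separated and has a column named "Smiles"; both are assumptions, so check the header of your own file and adjust `smiles_column` and `delimiter` if needed. The helper names `read_smiles` and `write_smiles` are hypothetical.

```python
import csv
from pathlib import Path


def read_smiles(csv_path, smiles_column="Smiles", delimiter=";"):
    """Return the non-empty SMILES strings found in one column of a CSV file."""
    smiles = []
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        if smiles_column not in (reader.fieldnames or []):
            raise KeyError(
                f"Column {smiles_column!r} not found in {reader.fieldnames}"
            )
        for row in reader:
            # Some rows may have no structure; skip those.
            value = (row[smiles_column] or "").strip()
            if value:
                smiles.append(value)
    return smiles


def write_smiles(smiles, out_path, header="smiles"):
    """Write one SMILES per line under a single header row."""
    with Path(out_path).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([header])
        writer.writerows([s] for s in smiles)
```

A single-column file of SMILES is a convenient input format because it keeps the molecules separate from the assay metadata in the original download.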
Let's jump on the Notebook for Session 4 Skills Development
- The Notebook contains an easy-to-use interactive tool to run models from the Ersilia Model Hub.
- Run all the cells in order! If the notebook disconnects for some reason, run everything again
- Input the full paths to the files: drive/MyDrive/foldername/filename.csv
- Check that the cells that require input (✍) are properly filled
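A wrong file path is the most common reason for an input cell to fail. The sketch below shows one way to build the full path and check it before running a model. It assumes Drive is already mounted so that `drive/MyDrive` is reachable from the notebook's working directory; the helper name `drive_file` and its `root` parameter are hypothetical and not part of the notebook.

```python
from pathlib import Path


def drive_file(folder, filename, root="drive/MyDrive"):
    """Build the full path to a file in Drive and check that it exists."""
    path = Path(root) / folder / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Check the folder and file names, "
            "and that Drive is mounted."
        )
    return path
```

For example, `drive_file("foldername", "filename.csv")` returns the path `drive/MyDrive/foldername/filename.csv` if the file is there, and raises an error with the path it tried otherwise.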
Use the models in the Hub
Public datasets of interest
Models in the literature
Newly generated data
By Gemma Turon
Presentation for the Session 4 Skills Development session