AI/ML projects | BBOP @ LBNL

BBOP is at the cutting edge of developing and applying new artificial intelligence (AI) and machine learning (ML) techniques in bioinformatics and biomedical ontologies. Approaches we are exploring include Knowledge Graphs (KGs) and Large Language Models (LLMs) such as GPT and Llama.

Below are some examples of AI/ML-related projects we are currently engaged in. Note that this work is evolving quickly, so this page may not be up to date!

OntoGPT: Using ontologies and AI to extract knowledge from text

Description: OntoGPT is a package for the generation of ontologies and knowledge bases using large language models (LLMs).
Link: https://monarch-initiative.github.io/ontogpt
Publications:

Caufield JH, Hegde H, Emonet V, Harris NL, Joachimiak MP, Matentzoglu N, et al. Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning. Bioinformatics, Volume 40, Issue 3, March 2024, btae104, https://doi.org/10.1093/bioinformatics/btae104.
Caufield H, Kroll C, O'Neil ST, Reese JT, Joachimiak MP, Hegde H, Harris NL, Krishnamurthy M, McLaughlin JA, Smedley D, Haendel MA, Robinson PN, Mungall CJ. CurateGPT: A flexible language-model assisted biocuration tool. arXiv [cs.CL]. 2024. http://arxiv.org/abs/2411.00046
O'Neil ST, Schaper K, Elsarboukh G, Reese JT, Moxon SAT, Harris NL, Munoz-Torres MC, Robinson PN, Haendel MA, Mungall CJ. Phenomics Assistant: An Interface for LLM-based Biomedical Knowledge Graph Exploration. bioRxiv. 2024. p. 2024.01.31.578275. https://www.biorxiv.org/content/biorxiv/early/2024/02/02/2024.01.31.578275
Toro S, Anagnostopoulos AV, Bello SM, Blumberg K, Cameron R, Carmody L, Diehl AD, Dooley DM, Duncan WD, Fey P, Gaudet P, Harris NL, Joachimiak MP, Kiani L, Lubiana T, Munoz-Torres MC, O'Neil S, Osumi-Sutherland D, Puig-Barbe A, Reese JT, Reiser L, Robb SM, Ruemping T, Seager J, Sid E, Stefancsik R, Weber M, Wood V, Haendel MA, Mungall CJ. Dynamic Retrieval Augmented Generation of ontologies using artificial intelligence (DRAGON-AI). J Biomed Semantics. 2024 Oct 17;15(1):19. http://dx.doi.org/10.1186/s13326-024-00320-3 PMID: 39415214
Groza T, Caufield H, Gration D, Baynam G, Haendel MA, Robinson PN, Mungall CJ, Reese JT. An evaluation of GPT models for phenotype concept recognition. BMC Med Inform Decis Mak. 2024 Jan 31;24(1):30. http://dx.doi.org/10.1186/s12911-024-02439-w PMCID: PMC10829255
Reese JT, Chimirri L, Bridges Y, Danis D, Caufield JH, Wissink K, McMurry JA, Graefe AS, Casiraghi E, Valentini G, Jacobsen JO, Haendel M, Smedley D, Mungall CJ, Robinson PN. Systematic benchmarking demonstrates large language models have not reached the diagnostic accuracy of traditional rare-disease decision support tools. medRxiv. 2024 Nov 7; https://www.medrxiv.org/content/10.1101/2024.07.22.24310816v2 PMCID: PMC11302616
Matentzoglu N, Caufield JH, Hegde HB, Reese JT, Moxon S, Kim H, Harris NL, Haendel MA, Mungall CJ. MapperGPT: Large Language Models for Linking and Mapping Entities. arXiv [cs.CL]. 2023. http://arxiv.org/abs/2310.03666

TALISMAN: Using generative AI to summarize and interpret complex genomic data

Description: Uses GPT models to perform gene set function summarization as a complement to standard enrichment analysis.
Link: https://monarch-initiative.github.io/ontogpt/#gene-enrichment-using-spindoctor
Publication: Joachimiak MP, Caufield JH, Harris NL, Kim H, Mungall CJ. Gene Set Summarization using Large Language Models. arXiv [q-bio.GN]. 2023. http://arxiv.org/abs/2305.13338

GRAPE: Scalable ML over knowledge graphs for drug repurposing and pandemic response

Description: GRAPE (Graph Representation Learning, Prediction and Evaluation) is a software resource for graph processing and embedding that is substantially faster and more memory-efficient than comparable existing tools.
Link: https://github.com/AnacletoLAB/grape
Publications:

Cappelletti L, Fontana T, Casiraghi E, Ravanmehr V, Callahan TJ, Cano C, Joachimiak MP, Mungall CJ, Robinson PN, Reese J, Valentini G. GRAPE for fast and scalable graph processing and random-walk-based embedding. Nature Computational Science. Nature Publishing Group; 2023 Jun 26;3(6):552–568. https://www.nature.com/articles/s43588-023-00465-8
Caufield JH, Putman T, Schaper K, Unni DR, Hegde H, Callahan TJ, Cappelletti L, Moxon SAT, Ravanmehr V, Carbon S, Chan LE, Cortes K, Shefchek KA, Elsarboukh G, Balhoff J, Fontana T, Matentzoglu N, Bruskiewich RM, Thessen AE, Harris NL, Munoz-Torres MC, Haendel MA, Robinson PN, Joachimiak MP, Mungall CJ, Reese JT. KG-Hub—building and exchanging biological knowledge graphs. Bioinformatics. Oxford Academic; 2023 Jun 30;39(7):btad418. https://academic.oup.com/bioinformatics/article-abstract/39/7/btad418/7211646
Reese JT, Unni D, Callahan TJ, Cappelletti L, Ravanmehr V, Carbon S, Shefchek KA, Good BM, Balhoff JP, Fontana T, Blau H, Matentzoglu N, Harris NL, Munoz-Torres MC, Haendel MA, Robinson PN, Joachimiak MP, Mungall CJ. KG-COVID-19: A Framework to Produce Customized Knowledge Graphs for COVID-19 Response. Patterns (N Y). 2021 Jan 8;2(1):100155. http://dx.doi.org/10.1016/j.patter.2020.100155 PMCID: PMC7649624
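
As a rough illustration of the random-walk idea named in the GRAPE publication title, the sketch below samples uniform random walks on a tiny toy graph. It is not GRAPE's API or implementation: the graph, the node names, and the function sample_walks are hypothetical and written in plain Python for readability.

```python
import random

# Toy undirected graph as an adjacency list.
# Node names are invented for illustration only.
TOY_GRAPH = {
    "drug_A": ["protein_X"],
    "protein_X": ["drug_A", "disease_Y", "protein_Z"],
    "protein_Z": ["protein_X", "disease_Y"],
    "disease_Y": ["protein_X", "protein_Z"],
}


def sample_walks(graph, walk_length, walks_per_node, seed=0):
    """Sample uniform random walks that start from every node in the graph."""
    rng = random.Random(seed)
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


walks = sample_walks(TOY_GRAPH, walk_length=5, walks_per_node=2)
```

Walks like these are typically treated as "sentences" and passed to a skip-gram style model to learn one vector per node; a library built for large graphs does this far more efficiently than the loop above.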

Exomiser: Evaluating LLMs for predicting causative variants and differential diagnosis of genetic disease

Description: We built software to evaluate the performance of LLMs on differential diagnosis of rare diseases against existing state-of-the-art software (e.g., Exomiser).
Link: https://github.com/monarch-initiative/malco
Publications:

Reese JT, Danis D, Caufield JH, Groza T, Casiraghi E, Valentini G, Mungall CJ, Robinson PN. On the limitations of large language models in clinical diagnosis. medRxiv. 2024 Feb 26; http://dx.doi.org/10.1101/2023.07.13.23292613 PMCID: PMC10370243
Reese JT, Chimirri L, Danis D, Caufield JH, Wissink K, Casiraghi E, Valentini G, Haendel MA, Mungall CJ, Robinson PN. Evaluation of the Diagnostic Accuracy of GPT-4 in Five Thousand Rare Disease Cases. medRxiv. 2024 Jul 22; http://dx.doi.org/10.1101/2024.07.22.24310816

Artificial Intelligence Ontology: enumerating the concepts in AI

Description: The Artificial Intelligence Ontology (AIO) is a systematization of AI concepts, methodologies, and their interrelations that we developed via manual curation, with the assistance of LLMs to help with concept recognition.
Link: https://github.com/berkeleybop/artificial-intelligence-ontology
Publication: Joachimiak MP, Miller MA, Harry Caufield J, Ly R, Harris NL, Tritt A, Mungall CJ, Bouchard KE. The Artificial Intelligence Ontology: LLM-assisted construction of AI concept hierarchies. arXiv [cs.LG]. 2024. http://arxiv.org/abs/2404.03044

CultureBot: predicting growth conditions for microbes using Knowledge Graphs and AI

Description: CultureBot is a computational framework that supports automated high-throughput microbial culturing and growth assays using novel knowledge-based AI methods.

ML over large clinical datasets

Description: We are interested in applying modern ML techniques to extract actionable knowledge from biomedical data. For example, we used semantic similarity and ontology-based ML to identify subclusters of long COVID patients.
Link: https://github.com/National-COVID-Cohort-Collaborative/semanticsimilarity/tree/master
Publication: Reese JT, Blau H, Casiraghi E, Bergquist T, Loomba JJ, Callahan TJ, Laraway B, Antonescu C, Coleman B, Gargano M, Wilkins KJ, Cappelletti L, Fontana T, Ammar N, Antony B, Murali TM, Caufield JH, Karlebach G, McMurry JA, Williams A, Moffitt R, Banerjee J, Solomonides AE, Davis H, Kostka K, Valentini G, Sahner D, Chute CG, Madlock-Brown C, Haendel MA, Robinson PN, N3C Consortium, RECOVER Consortium. Generalisable long COVID subtypes: findings from the NIH N3C and RECOVER programmes. EBioMedicine. 2023 Jan;87:104413. http://dx.doi.org/10.1016/j.ebiom.2022.104413 PMCID: PMC9769411
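
To make "semantic similarity" concrete, here is a minimal sketch that compares patient profiles by the Jaccard overlap of their ontology terms plus all ancestor terms. This is one simple similarity measure chosen for illustration, not necessarily the one used in the publication above; the toy hierarchy, the term labels, and the patient profiles are all invented.

```python
# Toy is-a hierarchy: child term -> list of parent terms.
# Labels are made up for illustration and are not real ontology identifiers.
PARENTS = {
    "fatigue": ["constitutional_symptom"],
    "fever": ["constitutional_symptom"],
    "cough": ["respiratory_symptom"],
    "dyspnea": ["respiratory_symptom"],
    "constitutional_symptom": ["phenotype"],
    "respiratory_symptom": ["phenotype"],
    "phenotype": [],
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the hierarchy."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def profile_closure(terms):
    """Expand a list of terms to include every ancestor of every term."""
    closure = set()
    for term in terms:
        closure |= ancestors(term)
    return closure


def jaccard_similarity(profile_a, profile_b):
    """Jaccard index of two profiles after ancestor expansion."""
    a, b = profile_closure(profile_a), profile_closure(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


patient_1 = ["fatigue", "cough"]
patient_2 = ["fever", "cough"]
patient_3 = ["dyspnea"]

sim_12 = jaccard_similarity(patient_1, patient_2)
sim_13 = jaccard_similarity(patient_1, patient_3)
```

Because shared ancestors count toward the overlap, patient_1 and patient_2 score higher than patient_1 and patient_3 even though fatigue and fever are different terms. A matrix of such pairwise scores could then be passed to a clustering algorithm to look for patient subgroups.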

DRAGON-AI: Dynamic Retrieval Augmented Generation of Ontologies using Artificial Intelligence

Description: An ontology generation method employing Large Language Models (LLMs) and Retrieval Augmented Generation (RAG).
Link: https://github.com/monarch-initiative/curategpt (part of CurateGPT)
Publication: Toro S, Anagnostopoulos AV, Bello SM, Blumberg K, Cameron R, Carmody L, Diehl AD, Dooley DM, Duncan WD, Fey P, Gaudet P, Harris NL, Joachimiak MP, Kiani L, Lubiana T, Munoz-Torres MC, O'Neil S, Osumi-Sutherland D, Puig-Barbe A, Reese JT, Reiser L, Robb SM, Ruemping T, Seager J, Sid E, Stefancsik R, Weber M, Wood V, Haendel MA, Mungall CJ. Dynamic Retrieval Augmented Generation of ontologies using artificial intelligence (DRAGON-AI). J Biomed Semantics. 2024 Oct 17;15(1):19. http://dx.doi.org/10.1186/s13326-024-00320-3 PMID: 39415214
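
The retrieval step of a RAG workflow can be sketched as follows: find the existing terms most similar to a new term, then place them in the prompt as examples. This is a simplified illustration, not DRAGON-AI's or CurateGPT's implementation. The existing terms and definitions are invented, the helper names are hypothetical, similarity is plain word-count cosine rather than learned embeddings, and no LLM is actually called.

```python
import math
from collections import Counter

# Hypothetical existing ontology terms (label -> definition), invented for illustration.
EXISTING_TERMS = {
    "cardiac muscle cell": "A muscle cell that is part of the heart.",
    "smooth muscle cell": "A muscle cell that lacks striations.",
    "hepatocyte": "An epithelial cell of the liver.",
}


def bag_of_words(text):
    """Lowercase the text, drop periods, and count whitespace-separated tokens."""
    return Counter(text.lower().replace(".", " ").split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=2):
    """Return the k existing (label, definition) pairs most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(
        EXISTING_TERMS.items(),
        key=lambda item: cosine(q, bag_of_words(item[0] + " " + item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(new_label, k=2):
    """Assemble a prompt that shows retrieved terms as examples for the new term."""
    parts = ["Write a definition for a new ontology term, following these examples."]
    for label, definition in retrieve(new_label, k):
        parts.append(f"term: {label}\ndefinition: {definition}")
    parts.append(f"term: {new_label}\ndefinition:")
    return "\n\n".join(parts)


prompt = build_prompt("skeletal muscle cell")
```

The resulting prompt would be sent to a language model, and the completion treated as a draft for a curator to review. Retrieving related terms first gives the model in-context examples of the ontology's own style.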

Knowledge Graph Hub

Description: KG-Hub is a platform that enables standardized construction, exchange, and reuse of knowledge graphs (KGs).

KG-Hub includes:

Design patterns and standards for building interoperable graphs based on the Biolink Model (https://biolink.github.io/biolink-model/).
Example KG projects (see https://kghub.io/).
A set of OBO Foundry ontologies, ready to use in your own KG (see https://kghub.org/kg_obo/).
The KG-Registry, a catalog of KG projects, their components, and their related tools (see https://kghub.org/kg-registry/).

Link: https://kghub.org/

Publication: J Harry Caufield, Tim Putman, Kevin Schaper, Deepak R Unni, Harshad Hegde, Tiffany J Callahan, Luca Cappelletti, Sierra A T Moxon, Vida Ravanmehr, Seth Carbon, Lauren E Chan, Katherina Cortes, Kent A Shefchek, Glass Elsarboukh, Jim Balhoff, Tommaso Fontana, Nicolas Matentzoglu, Richard M Bruskiewich, Anne E Thessen, Nomi L Harris, Monica C Munoz-Torres, Melissa A Haendel, Peter N Robinson, Marcin P Joachimiak, Christopher J Mungall, Justin T Reese, KG-Hub—building and exchanging biological knowledge graphs, Bioinformatics, Volume 39, Issue 7, July 2023, btad418, https://doi.org/10.1093/bioinformatics/btad418

Other work

OBO Academy article on Leveraging ChatGPT for ontology curation

More info

Article about knowledge-backed AI with Monarch
