Building a Comprehensive Platform for Interpretable Multi-Omics Integration and Biomedical Discovery
Modern biomedical research struggles to integrate diverse omics data types and to translate biological knowledge into actionable insights. Biomedical resources are fragmented, and there is no standardized format for combining textual biological priors with quantitative features; together, these gaps have limited the development of interpretable AI models. To address these challenges, we developed a biomedical AI ecosystem that harmonizes nomenclature across resources and enables the creation of Graph-Language Foundation Models (GLFMs). The platform integrates multi-omics data (epigenomic, genomic, transcriptomic, proteomic) through dedicated preprocessing, including DNA methylation analysis based on CpG sites, and automatically generates Text-Numeric Graphs (TNGs), a novel data format that bridges textual biological knowledge with quantitative measurements to support interpretable AI model development for precision medicine. The code is available in the BioMedGraphica and mosGraphGen repositories, the platform is accessible at app.biomedgraphica.org, and our standardized datasets include OmniCellTOSG for single-cell data and MOTASG for bulk omics data.
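To make the idea of a Text-Numeric Graph more concrete, the following is a minimal sketch of how a graph that pairs textual priors with numeric measurements might be represented. It is not the schema used by BioMedGraphica or mosGraphGen: the class names, field names, identifier style, and feature keys are all hypothetical, and the numeric values are placeholders rather than real measurements.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A biological entity carrying both text and numbers (hypothetical layout)."""
    node_id: str                 # harmonized identifier; naming scheme is assumed
    entity_type: str             # e.g. "gene" or "protein"
    text: str                    # textual biological prior
    features: Dict[str, float] = field(default_factory=dict)  # omics measurements


@dataclass
class Edge:
    """A directed relation between two entities, described in text."""
    source: str
    target: str
    relation: str


@dataclass
class TextNumericGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, edge: Edge) -> None:
        # Reject edges that point at entities missing from the graph.
        for endpoint in (edge.source, edge.target):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node: {endpoint}")
        self.edges.append(edge)

    def neighbors(self, node_id: str) -> List[str]:
        # Targets reachable from node_id along one outgoing edge.
        return [e.target for e in self.edges if e.source == node_id]


def build_toy_graph() -> TextNumericGraph:
    """Build a two-node example; all numeric values are placeholders."""
    g = TextNumericGraph()
    g.add_node(Node(
        node_id="gene:TP53",
        entity_type="gene",
        text="TP53 is a tumor suppressor gene.",
        features={"methylation_beta": 0.5, "copy_number": 2.0},
    ))
    g.add_node(Node(
        node_id="protein:TP53",
        entity_type="protein",
        text="Protein product of the TP53 gene.",
        features={"abundance": 1.0},
    ))
    g.add_edge(Edge("gene:TP53", "protein:TP53", relation="encodes"))
    return g
```

In this sketch, each node keeps its description and its measurements side by side, so a language encoder could consume `text` while a graph model consumes `features` and `edges`. How the actual TNG format lays out these components may differ.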