Erik Altman, Jovan Blanusa, et al.
NeurIPS 2023
Recent advances in large language models (LLMs) have led to impressive performance in medical question answering and clinical decision support. In parallel, biomedical foundation models (BioFMs) trained on multi-omics and molecular data have demonstrated strong potential on domain-specific tasks. However, LLMs and BioFMs typically operate in separate embedding spaces, limiting their ability to reason across modalities. While recent efforts such as CellWhisperer and TxGemma have advanced the integration of biomedical data with language models, they either lack the modularity to support diverse modalities or tightly couple biological inputs with language models, hindering extensibility and the reuse of bio-specific representations. We introduce BioVERSE (Biomedical Vector Embedding Realignment for Semantic Engagement), a modular architecture that integrates multi-omics and molecular data with LLMs via pretrained BioFMs as encoders. By projecting embeddings of bio-entities from open-source encoders into the language model’s latent space using contrastive learning, followed by instruction tuning on multimodal training data, BioVERSE unifies biological and textual representations within a streamlined architecture. This enables zero-shot annotation, multimodal reasoning, and explainable interaction through natural language. We demonstrate that compact BioVERSE models (e.g., MAMMAL + Granite-8B) often outperform larger LLMs and domain-specific BioFMs on tasks requiring joint understanding of biomedical entities and text. The compact design supports on-premises, privacy-preserving deployment and enables clinicians and researchers to query and reason about complex biological inputs via interactive, multi-turn dialogue.
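The alignment step described above, projecting BioFM embeddings into the LLM's latent space with a contrastive objective, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the linear projector `W` and the symmetric InfoNCE loss are hypothetical stand-ins for whatever projection module and contrastive objective BioVERSE actually trains.

```python
import numpy as np

def log_softmax(x):
    """Row-wise log-softmax, numerically stable."""
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def project(bio_emb, W):
    """Hypothetical linear projector mapping BioFM embeddings (d_bio)
    into the LLM latent space (d_llm)."""
    return bio_emb @ W

def info_nce_loss(bio_proj, text_emb, temperature=0.07):
    """Symmetric InfoNCE contrastive loss: row i of bio_proj is the
    positive pair of row i of text_emb; all other rows are negatives."""
    b = bio_proj / np.linalg.norm(bio_proj, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = (b @ t.T) / temperature  # scaled cosine similarities
    idx = np.arange(logits.shape[0])
    loss_bio_to_text = -log_softmax(logits)[idx, idx].mean()
    loss_text_to_bio = -log_softmax(logits.T)[idx, idx].mean()
    return (loss_bio_to_text + loss_text_to_bio) / 2
```

In training, `W` (and possibly the temperature) would be optimized by gradient descent over paired (bio-entity, text) examples, pulling each bio-embedding toward its paired text embedding and away from the other pairs in the batch; instruction tuning on multimodal data then follows.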