A UI for an Embedding-based Protein Search Engine

June 23rd, 2025

Advancing Genomic Discovery

I'm excited to share that our work on Gaia has been published in Science Advances. Gaia is an AI-powered, context-aware protein sequence search tool built on embeddings from gLM2, a genomic language model that captures amino acid sequence, local gene neighborhood, and structural information. Gaia indexes over 85 million protein clusters from 131,744 microbial genomes, enabling real-time retrieval using approximate nearest-neighbor search.
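To illustrate what embedding-based retrieval computes, the sketch below does exact (brute-force) top-k search by cosine similarity. It is a simplified stand-in for the approximate nearest-neighbor index Gaia actually uses, not Gaia's backend. The names `ProteinCluster` and `searchTopK`, the choice of cosine similarity, and the toy 3-dimensional vectors are all assumptions made for illustration.

```typescript
// Hypothetical record: one indexed protein cluster and its embedding vector.
interface ProteinCluster {
  id: string;
  embedding: number[];
}

interface Hit {
  id: string;
  score: number; // cosine similarity in [-1, 1]
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact top-k search: scores every cluster, sorts by score, keeps the best k.
// An ANN index avoids this full scan at the cost of occasionally missing a true neighbor.
function searchTopK(query: number[], index: ProteinCluster[], k: number): Hit[] {
  return index
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Usage with toy 3-dimensional embeddings.
const index: ProteinCluster[] = [
  { id: "clusterA", embedding: [1, 0, 0] },
  { id: "clusterB", embedding: [0.9, 0.1, 0] },
  { id: "clusterC", embedding: [0, 0, 1] },
];
const hits = searchTopK([1, 0, 0], index, 2);
console.log(hits.map((h) => h.id)); // clusterA, clusterB
```

Brute force is exact but its cost grows linearly with the index. ANN methods trade a small amount of recall for sub-linear query time, which is what makes real-time search over tens of millions of clusters practical.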

As the frontend developer and designer for Gaia, I had the privilege of developing a clear, accessible interface that helps researchers more effectively investigate previously enigmatic "hypothetical proteins."

A Simple Gateway to a Powerful Model

Yunha Hwang and the Tatta Bio team created gLM2, a sophisticated genomic language model. Our goal was to make this powerful model accessible and easy to use, even for researchers unfamiliar with computational tools.

Our challenge: Provide microbiologists and researchers with a straightforward and intuitive way to harness genomic AI.

Gaia's main interface showing UMAP visualization, protein structure prediction, and AI-powered annotations

Principles Guiding Gaia's Interface

We designed Gaia's frontend with three main principles in mind:

1. Simplicity

Gaia's primary interface prioritizes a simple, IKEA-like design that fades into the background. Researchers paste protein sequences in and quickly retrieve context-rich results.

We pushed explanatory text into Info Popovers and generally avoided "application chrome" and visual flair.

Info popover explaining pLDDT confidence scores in an accessible way

This approach allowed us to lean into the next principle:

2. Density

The application leans on Gaia Agent Annotations (LLM-powered explanations of search results) so that the core results page can stay information-dense by default. A user who wants a high-level explanation of their query results can invoke the agent with a button to get help interpreting them.

Gaia Agent providing contextual explanations to help researchers interpret complex genomic data

Another key feature is the genomic context viewer, an interactive visualization component that helps researchers:

  • Clearly see gene arrangements and orientations
  • Discover functional relationships between genes
  • Identify meaningful genomic patterns
  • Explore genomic data interactively and efficiently
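As a concrete example of one job such a viewer has to do, here is a minimal sketch that maps genomic coordinates onto a fixed-width track and picks each gene arrow's direction from its strand. The `Gene` and `GeneGlyph` types and the `layoutGenes` function are hypothetical, not taken from Gaia's codebase; a React component could then draw each glyph as an SVG arrow.

```typescript
// Hypothetical gene record: 1-based, inclusive genomic coordinates.
interface Gene {
  id: string;
  start: number;
  end: number;
  strand: "+" | "-";
}

// Hypothetical screen-space description of one gene arrow.
interface GeneGlyph {
  id: string;
  x: number; // left edge, in pixels
  width: number; // length, in pixels
  pointsRight: boolean; // arrow direction follows the strand
}

// Linearly scales the whole neighborhood onto a track `trackWidth` pixels wide.
function layoutGenes(genes: Gene[], trackWidth: number): GeneGlyph[] {
  if (genes.length === 0) return [];
  const regionStart = Math.min(...genes.map((g) => g.start));
  const regionEnd = Math.max(...genes.map((g) => g.end));
  const pxPerBase = trackWidth / (regionEnd - regionStart + 1);
  return genes.map((g) => ({
    id: g.id,
    x: (g.start - regionStart) * pxPerBase,
    width: (g.end - g.start + 1) * pxPerBase,
    pointsRight: g.strand === "+",
  }));
}

// Usage: a forward-strand gene followed by a reverse-strand gene on a 500 px track.
const glyphs = layoutGenes(
  [
    { id: "geneA", start: 1, end: 300, strand: "+" },
    { id: "geneB", start: 401, end: 1000, strand: "-" },
  ],
  500,
);
console.log(glyphs);
// geneA: x 0, width 150, pointing right; geneB: x 200, width 300, pointing left
```

Keeping layout as a pure function separate from rendering makes it easy to test and to reuse for zooming or panning, where only `trackWidth` or the visible region changes.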

Links to InterPro and JGI's IMG/M tool are also provided, so users can quickly check and validate Gaia's results.

3. Snappiness

Gaia's interface balances fast initial responses with slower, computationally intensive tasks. Initial search results load in about 350 ms, giving researchers immediate context. Heavier work, such as running BLAST for sequence alignment, predicting structures with ESMFold, and computing annotations against the PFAM database, takes longer.

To manage this, Gaia first renders a skeleton of the results page. PFAM annotations then appear on a sequence viewer, followed by the protein structure predicted by ESMFold, and finally the PFAM annotations are overlaid on the structure itself. This staged approach keeps delays to a minimum and lets researchers work with the essential results right away while the remaining computations complete in the background.

Performance waterfall showing how Gaia loads results progressively, with fast initial responses followed by more intensive computations
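The staged loading described above can be sketched with promises: render the skeleton immediately, start the slow requests together, and reveal each stage once its inputs arrive. This is a hedged sketch rather than Gaia's implementation; the `Backend` and `Renderer` interfaces and their method names are hypothetical stand-ins for the real PFAM and ESMFold calls and UI updates, and error handling is omitted.

```typescript
// Hypothetical payloads for each stage; real responses would carry more detail.
interface PfamDomain {
  name: string;
  start: number; // 1-based residue position
  end: number;
}

interface Structure {
  pdb: string; // predicted structure, e.g. as PDB text
}

// Hypothetical backend calls, injected so the sketch stays self-contained.
interface Backend {
  fetchPfam(sequence: string): Promise<PfamDomain[]>;
  fetchStructure(sequence: string): Promise<Structure>;
}

// Hypothetical UI hooks, one per stage of the waterfall.
interface Renderer {
  skeleton(): void;
  pfamOnSequence(domains: PfamDomain[]): void;
  structure(structure: Structure): void;
  pfamOnStructure(structure: Structure, domains: PfamDomain[]): void;
}

async function loadProgressively(
  sequence: string,
  backend: Backend,
  ui: Renderer,
): Promise<void> {
  // Stage 0: render the skeleton synchronously, before any request returns.
  ui.skeleton();

  // Start both slow requests at once so they run in parallel.
  const pfamPromise = backend.fetchPfam(sequence);
  const structurePromise = backend.fetchStructure(sequence);

  // Stage 1: PFAM domains on the sequence viewer.
  const domains = await pfamPromise;
  ui.pfamOnSequence(domains);

  // Stage 2: the predicted 3D structure.
  const structure = await structurePromise;
  ui.structure(structure);

  // Stage 3: overlay domains on the structure, which needs both results.
  ui.pfamOnStructure(structure, domains);
}

// Usage with a fake backend whose structure call is slower than its PFAM call.
const delay = <T>(ms: number, value: T) =>
  new Promise<T>((resolve) => setTimeout(() => resolve(value), ms));

loadProgressively(
  "MKTAYIAKQRQ",
  {
    fetchPfam: () => delay(50, [{ name: "DomainA", start: 2, end: 9 }]),
    fetchStructure: () => delay(200, { pdb: "" }),
  },
  {
    skeleton: () => console.log("skeleton"),
    pfamOnSequence: (d) => console.log(`PFAM on sequence: ${d.length} domain(s)`),
    structure: () => console.log("structure"),
    pfamOnStructure: () => console.log("PFAM on structure"),
  },
);
```

Starting both requests before awaiting either means the structure prediction is already running while the PFAM annotations are displayed. Awaiting in display order keeps the on-screen sequence of stages fixed even if the structure happens to return first.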

Technical Highlights

Building Gaia's interface involved several interesting technical challenges:

  • Custom Visualization Components: Developed tailored React components for clear and interactive genomic data visualization.
  • Responsive Search: Created a search system that accepts the varied input formats researchers paste in.
  • Cross-device Compatibility: Ensured usability across devices, from desktop environments to tablets used in laboratories.
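The post doesn't list the input formats Gaia accepts, so the following is only an illustrative sketch of the kind of normalization a flexible search box might perform. It assumes three common cases (a raw sequence, a single FASTA record, and a sequence broken up by spaces or position numbers) and an alphabet of the 20 standard amino acids plus B, Z, X, U and O; `normalizeQuery` is a hypothetical helper, not Gaia's code.

```typescript
// Accepted residue letters: 20 standard amino acids plus B, Z, X, U, O (an assumption).
const INVALID_RESIDUE = /[^ACDEFGHIKLMNPQRSTVWYBZXUO]/;

interface ParsedQuery {
  header: string | null; // FASTA header without the leading ">", if present
  sequence: string; // cleaned, upper-case residues
}

// Normalizes pasted text into one protein sequence (hypothetical helper).
function normalizeQuery(input: string): ParsedQuery {
  const lines = input.trim().split(/\r?\n/);
  let header: string | null = null;
  if (lines[0].startsWith(">")) {
    header = lines[0].slice(1).trim();
    lines.shift();
  }
  // This sketch handles a single record only.
  if (lines.some((line) => line.startsWith(">"))) {
    throw new Error("Expected a single sequence");
  }
  const sequence = lines
    .join("")
    .replace(/[\s\d]/g, "") // drop whitespace and position numbers
    .replace(/\*$/, "") // drop a trailing stop symbol
    .toUpperCase();
  if (sequence.length === 0) {
    throw new Error("Empty sequence");
  }
  const bad = sequence.match(INVALID_RESIDUE);
  if (bad) {
    throw new Error(`Unexpected character '${bad[0]}' in sequence`);
  }
  return { header, sequence };
}

// Usage: the same protein pasted three different ways; each prints MKTAYIAKQRQ.
console.log(normalizeQuery("MKTAYIAKQRQ").sequence);
console.log(normalizeQuery(">query1\nMKTAYI\nAKQRQ*").sequence);
console.log(normalizeQuery("1 mktayiakqr q").sequence);
```

Rejecting unexpected characters with a specific message, rather than silently dropping them, lets the interface tell the researcher exactly what went wrong with a paste.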

Looking Ahead

What motivates me about Gaia is how it facilitates scientific discovery, turning previously challenging tasks into manageable, efficient experiences. As Yunha noted in her post, processes that once took hours can now be completed in seconds, allowing researchers to spend more time on insights rather than manual analysis.

At Nitro Bio, we're committed to building intuitive, effective software that supports researchers in groundbreaking scientific work.

Explore Gaia

Experience Gaia's capabilities directly at gaia.tatta.bio. Gaia is designed to streamline your genomic research and enhance discovery, whether you're investigating hypothetical proteins or exploring genomic contexts.


Interested in discussing how Nitro Bio can support your scientific projects through intuitive software design? Reach out to us.