A UI for an Embedding-based Protein Search Engine
June 23rd, 2025

Advancing Genomic Discovery
I'm excited to share that our work on Gaia has been published in Science Advances. Gaia is an AI-powered, context-aware protein sequence search tool built on embeddings from gLM2, a genomic language model that captures amino acid sequence, local gene neighborhood, and structural information. Gaia indexes over 85 million protein clusters from 131,744 microbial genomes, enabling real-time retrieval using approximate nearest-neighbor search.
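To make the retrieval idea concrete, here is a minimal sketch of embedding search done the exact, brute-force way: score every indexed vector against the query and keep the best few. It is not Gaia's implementation. The `IndexedProtein` type, the `cosine` and `topK` names, and the choice of cosine similarity are all assumptions made for illustration; an approximate nearest-neighbor index aims to return roughly the same ranking without scanning every entry.

```typescript
// Hypothetical index entry: an identifier plus its embedding vector.
interface IndexedProtein {
  id: string;
  vector: number[];
}

// Cosine similarity between two vectors of equal length.
// Assumes neither vector is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact top-k search: score every entry, sort by descending score, keep k.
function topK(
  query: number[],
  index: IndexedProtein[],
  k: number
): { id: string; score: number }[] {
  return index
    .map((entry) => ({ id: entry.id, score: cosine(query, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The exact scan grows linearly with the size of the index, which is why an approximate method becomes attractive at the scale of tens of millions of entries.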
As the frontend developer and designer for Gaia, I had the privilege of developing a clear, accessible interface that helps researchers more effectively investigate previously enigmatic "hypothetical proteins."
A Simple Gateway to a Powerful Model
Yunha Hwang and the Tatta Bio team created gLM2—a sophisticated genomic language model. Our goal was to ensure this powerful tool was accessible and easy to use, even for researchers unfamiliar with complex computational tools.
Our challenge: Provide microbiologists and researchers with a straightforward and intuitive way to harness genomic AI.
Principles Guiding Gaia's Interface
We designed Gaia's frontend with three main principles in mind:
1. Simplicity
Gaia's primary interface prioritizes a simple, IKEA-like design that fades into the background. Researchers paste protein sequences in and quickly retrieve context-rich results.
We pushed explanatory text into Info Popovers, and generally avoided "application chrome" and visual flair.
This approach allowed us to lean into the next principle:
2. Density
The application leans on Gaia Agent Annotations, LLM-powered explanations of search results, so that the core results page can stay information-dense by default. A user who wants a high-level explanation of their query results can invoke the agent via a button to get help interpreting them.
Another key feature is the genomic context viewer, an interactive visualization component that helps researchers:
- Clearly see gene arrangements and orientations
- Discover functional relationships between genes
- Identify meaningful genomic patterns
- Explore genomic data interactively and efficiently
Links to InterPro and JGI's IMG/M tool are also provided, so users can quickly check and validate Gaia's results.
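As a rough illustration of what drawing gene arrangements and orientations involves in a genomic context viewer, the sketch below turns a gene's coordinates and strand into the points of an SVG arrow. It assumes a simple linear scale and a hypothetical `Gene` shape. The function name, parameters, and pixel defaults are invented for this example and are not taken from Gaia's code.

```typescript
// Hypothetical gene record: coordinates in base pairs plus strand.
interface Gene {
  id: string;
  start: number;
  end: number;
  strand: "+" | "-";
}

// Build the `points` attribute for an SVG <polygon> shaped like an arrow.
// Forward-strand genes point right, reverse-strand genes point left.
function geneArrowPoints(
  gene: Gene,
  regionStart: number,
  regionEnd: number,
  widthPx: number,
  heightPx: number = 20,
  headPx: number = 8
): string {
  const scale = widthPx / (regionEnd - regionStart); // pixels per base pair
  const x0 = (gene.start - regionStart) * scale;
  const x1 = (gene.end - regionStart) * scale;
  const head = Math.min(headPx, x1 - x0); // short genes get a shorter arrowhead
  const mid = heightPx / 2;
  const points: number[][] =
    gene.strand === "+"
      ? [[x0, 0], [x1 - head, 0], [x1, mid], [x1 - head, heightPx], [x0, heightPx]]
      : [[x1, 0], [x0 + head, 0], [x0, mid], [x0 + head, heightPx], [x1, heightPx]];
  return points.map(([x, y]) => `${x},${y}`).join(" ");
}
```

In a React component, the returned string could be passed directly as the `points` attribute of a `<polygon>` element, one per gene in the neighborhood.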
3. Snappiness
Gaia's interface balances quick initial responsiveness with complex, computationally intensive tasks. Initial search results load rapidly (~350 ms), giving researchers immediate context. Heavier computations, such as running BLAST for sequence alignment, predicting structures with ESMFold, and computing annotations against the Pfam database, require more processing time.
To manage this, Gaia first renders a results skeleton. Pfam annotations then appear on a sequence viewer, followed by the protein structure from ESMFold. Finally, the Pfam annotations are overlaid on the structure itself. This staged approach keeps delays minimal and lets researchers interact with essential results promptly while the slower computations complete in the background.
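The staged rendering described above can be sketched as a small state machine: each finished computation fills in one part of the page, and the view is re-rendered after every step. This is an illustrative sketch rather than Gaia's code. The `ResultView` fields, the `Fetchers` interface, and the stage names are assumptions, and real requests would go to whatever backend endpoints the application exposes.

```typescript
// Hypothetical view state; null means "still loading".
interface ResultView {
  hits: string[] | null;     // fast embedding-search hits
  domains: string[] | null;  // domain annotations (slower)
  structure: string | null;  // predicted structure text (slowest)
}

type StageEvent =
  | { kind: "hits"; hits: string[] }
  | { kind: "domains"; domains: string[] }
  | { kind: "structure"; structure: string };

const emptyView: ResultView = { hits: null, domains: null, structure: null };

// Pure state transition: each event fills in one part of the page.
function applyStage(view: ResultView, event: StageEvent): ResultView {
  switch (event.kind) {
    case "hits":
      return { ...view, hits: event.hits };
    case "domains":
      return { ...view, domains: event.domains };
    case "structure":
      return { ...view, structure: event.structure };
  }
}

// What the UI can draw for a given state, from the skeleton upward.
function visibleStages(view: ResultView): string[] {
  const stages = ["skeleton"];
  if (view.hits !== null) stages.push("results");
  if (view.domains !== null) stages.push("sequence-annotations");
  if (view.structure !== null) stages.push("structure");
  if (view.domains !== null && view.structure !== null) {
    stages.push("structure-annotations");
  }
  return stages;
}

// Hypothetical async data sources, one per computation.
interface Fetchers {
  search: (seq: string) => Promise<string[]>;
  annotate: (seq: string) => Promise<string[]>;
  fold: (seq: string) => Promise<string>;
}

// Start every request at once and re-render as each one resolves.
async function loadStaged(
  seq: string,
  fetchers: Fetchers,
  render: (view: ResultView) => void
): Promise<ResultView> {
  let view = emptyView;
  const dispatch = (event: StageEvent): void => {
    view = applyStage(view, event);
    render(view);
  };
  render(view); // skeleton first
  await Promise.all([
    fetchers.search(seq).then((hits) => dispatch({ kind: "hits", hits })),
    fetchers.annotate(seq).then((domains) => dispatch({ kind: "domains", domains })),
    fetchers.fold(seq).then((structure) => dispatch({ kind: "structure", structure })),
  ]);
  return view;
}
```

Keeping the transition in a pure function such as `applyStage` means the same logic could sit behind React's `useReducer`, and the overlay of annotations on the structure falls out of the state rather than needing its own request.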
Technical Highlights
Building Gaia's interface involved addressing interesting technical challenges:
- Custom Visualization Components: Developed tailored React components for clear and interactive genomic data visualization.
- Responsive Search: Created a dynamic search system accommodating diverse researcher input formats.
- Cross-device Compatibility: Ensured usability across devices, from desktop environments to tablets used in laboratories.
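As one example of accommodating different input formats, the sketch below accepts either a bare amino acid sequence or a single FASTA record and normalizes both to the same shape. The formats Gaia actually accepts are not listed here, so this is an assumption, as are the `parseProteinQuery` name, the `ParsedQuery` type, and the accepted alphabet (the 20 standard residues plus X, with an optional trailing stop `*`).

```typescript
// Alphabet accepted by this sketch: 20 standard residues plus X,
// with an optional trailing stop symbol.
const RESIDUES = /^[ACDEFGHIKLMNPQRSTVWYX]+\*?$/;

interface ParsedQuery {
  header: string | null; // FASTA header without ">", or null for bare input
  sequence: string;      // uppercase, whitespace removed
}

// Normalize a pasted query: bare sequence or a single FASTA record.
function parseProteinQuery(input: string): ParsedQuery {
  const lines = input
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  let header: string | null = null;
  const parts: string[] = [];
  for (const line of lines) {
    if (line.charAt(0) === ">") {
      // A second header, or a header after sequence lines, means more than one record.
      if (header !== null || parts.length > 0) {
        throw new Error("Expected a single sequence");
      }
      header = line.slice(1).trim();
    } else {
      parts.push(line.replace(/\s+/g, "").toUpperCase());
    }
  }

  const sequence = parts.join("");
  if (sequence.length === 0) throw new Error("No sequence found");
  if (!RESIDUES.test(sequence)) throw new Error("Unexpected characters in sequence");
  return { header, sequence };
}
```

Validating on the client like this lets the interface report a malformed paste immediately instead of waiting for a round trip to the server.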
Looking Ahead
What motivates me about Gaia is how it facilitates scientific discovery, turning previously challenging tasks into manageable, efficient experiences. As Yunha noted in her post, processes that once took hours can now be completed in seconds, allowing researchers to spend more time on insights rather than manual analysis.
At Nitro Bio, our commitment is to build intuitive, effective software tools to support researchers in groundbreaking scientific endeavors.
Explore Gaia
Experience Gaia's capabilities directly at gaia.tatta.bio. Gaia is designed to streamline your genomic research and enhance discovery, whether you're investigating hypothetical proteins or exploring genomic contexts.
Interested in discussing how Nitro Bio can support your scientific projects through intuitive software design? Reach out to us.