Building Agents for Life Sciences
January 5th, 2026

In 2026, I am repositioning Nitro Bio to focus on building agents for the life sciences. In some ways this is not a big change. Many of the same software techniques, principles, and ethos carry over, and Nitro Bio's mission is still to build the elegant, performant software that scientists deserve. But what makes agents different from and better than regular web software? And why would you want to work with Nitro Bio when you're building them?
A Quick Primer on Agents
An agent is composed of two parts: an LLM (e.g. gpt-5.2 or claude-opus-4.5) and a harness (e.g. OpenAI Codex or Claude Code). By now the LLM component should be familiar to most users—it is the thing generating responses in the ChatGPT app. But in contrast to a standard chat interface, this LLM <-> Harness system doesn't just respond to users with text. Rather, it explores, plans, and acts upon other systems by issuing Tool Calls. A Tool Call is a request from the LLM to its harness to run a specific function. For a life science agent, this may be query_protein_database(), run_sequence_alignment(), or generate_structure_prediction(). By composing these tool calls, an agent can accomplish complex multi-step analyses autonomously - retrieving sequences, running comparisons, and validating predictions without requiring separate UI flows for each step.
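To make that loop concrete, here's a minimal Python sketch of an LLM <-> Harness loop. The tool names come from the examples above, their bodies are stand-ins, and llm_step is a placeholder for whatever model API the harness actually calls.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict

def query_protein_database(accession: str) -> dict:
    # Stand-in for a real database lookup.
    return {"accession": accession, "sequence": "MKTAYIAKQR"}

def run_sequence_alignment(query: str, subject: str) -> dict:
    # Stand-in for a real alignment; returns a fake identity score.
    return {"identity": 0.87, "aligned_length": min(len(query), len(subject))}

TOOLS = {
    "query_protein_database": query_protein_database,
    "run_sequence_alignment": run_sequence_alignment,
}

def run_agent(user_request: str, llm_step) -> str:
    """Drive the loop: ask the LLM what to do next, execute tool calls, repeat."""
    messages = [{"role": "user", "content": user_request}]
    while True:
        step = llm_step(messages)                 # returns a ToolCall or final text
        if isinstance(step, str):                 # final answer; the agent is done
            return step
        result = TOOLS[step.name](**step.args)    # the harness executes the tool
        messages.append({"role": "tool", "name": step.name, "content": str(result)})
```

The important part is the division of labor: the model only ever proposes the next tool call, and the harness is the thing that executes it and reports the result back.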
This LLM <-> Harness architecture is fundamentally more powerful than traditional web software. First, agents can collaborate with users in natural language, accepting corrections and clarifications mid-task rather than forcing users through rigid forms and predetermined workflows. Second, instead of building separate UI flows for every possible user intent, you can provide agents with a small set of powerful, composable tools. These tools can be chained together with arbitrary complexity to accomplish tasks developers never explicitly programmed for. A traditional webapp might have distinct "Import Sequences," "Run BLAST," and "Visualize Phylogeny" features, each requiring custom UI and backend logic. An agent with tools like parse_fasta(), query_ncbi(), and render_tree() can accomplish all three tasks simply by reasoning about which sequence of tool calls will achieve the user's goal.
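Here's a hedged sketch of that composability claim. The three function names come from the paragraph above, their bodies are stand-ins, and variants_to_phylogeny shows the kind of chain an agent could assemble on its own from a single natural-language request.

```python
# Three small, composable tools standing in for three bespoke UI flows.
def parse_fasta(path: str) -> list[dict]:
    """Return records like {"id": ..., "sequence": ...} from a FASTA file."""
    return []  # stand-in for a real parser

def query_ncbi(sequence: str, max_hits: int = 5) -> list[dict]:
    """Return the closest public homologs for a sequence."""
    return []  # stand-in for a BLAST-style search

def render_tree(records: list[dict]) -> str:
    """Align records, build a phylogeny, and return a path to the figure."""
    return "phylogeny.png"  # stand-in for alignment + tree drawing

# "Show me how my variants relate to known homologs" becomes, in effect:
def variants_to_phylogeny(fasta_path: str) -> str:
    records = parse_fasta(fasta_path)
    homologs = [hit for rec in records for hit in query_ncbi(rec["sequence"])]
    return render_tree(records + homologs)
```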
Agentic Design Questions
Building agentic software means designing for three distinct interaction modes. First, the traditional Human → System interaction still matters. Users will continue to click buttons, fill forms, and directly manipulate interfaces — especially for routine tasks where invoking an agent would be overkill, or when the agent fails and the user needs to step in.
Second, Agent → System interactions require you to expose programmatic interfaces that agents can reason about and invoke. In extremely simple cases, this may just mean sharing the API documentation and an auth token with the agent. But for applications of any complexity, it means designing tool signatures, return types, and error messages that an LLM can understand in context.
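As an illustration of what designing for the agent can look like, here's a sketch of a tool with a typed result and error messages written for the model rather than for a stack trace. All names here are illustrative, not an existing API.

```python
from dataclasses import dataclass

@dataclass
class ToolResult:
    ok: bool
    data: dict | None = None
    error: str | None = None   # written for the model, not just for humans

def run_sequence_alignment(query: str, subject: str, mode: str = "global") -> ToolResult:
    """Align two sequences. mode is 'global' or 'local'."""
    if mode not in {"global", "local"}:
        return ToolResult(
            ok=False,
            error=f"Unknown mode '{mode}'. Call again with mode='global' or mode='local'.",
        )
    if not query or not subject:
        return ToolResult(
            ok=False,
            error="Both sequences must be non-empty. Did an upstream parse step return no records?",
        )
    # A real alignment would happen here; the score below is a stand-in.
    return ToolResult(ok=True, data={"identity": 0.91, "mode": mode})
```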
Third, Human → Agent → System flows introduce a new kind of interaction entirely: oversight and approval mechanisms where the user guides, corrects, or gates the agent's actions before they touch the underlying system.
The tricky part is that these modes aren't cleanly separable. A single feature might need to support all three. Take removing an outlier sample: a human might click to exclude it from analysis, an agent might call a filter_sample() function, or a human might need to approve an agent's plan to exclude several low-quality samples from a differential expression analysis. Each mode has different error handling needs, different confirmation thresholds, and different failure modes. Building software that elegantly supports all three is a new design challenge.
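Here's a rough sketch of one capability exposed through all three modes, using the outlier example above. The names (filter_sample, on_exclude_clicked, apply_agent_plan) are illustrative.

```python
def filter_sample(dataset: dict, sample_id: str, reason: str) -> dict:
    """Shared core: exclude one sample and record why."""
    dataset = dict(dataset)
    dataset["excluded"] = dataset.get("excluded", []) + [{"id": sample_id, "reason": reason}]
    return dataset

# Mode 1: Human -> System. A click handler calls the core directly.
def on_exclude_clicked(dataset: dict, sample_id: str) -> dict:
    return filter_sample(dataset, sample_id, reason="manually excluded by user")

# Mode 2: Agent -> System. The same core is registered as a tool.
AGENT_TOOLS = {"filter_sample": filter_sample}

# Mode 3: Human -> Agent -> System. The agent proposes; the human gates.
def apply_agent_plan(dataset: dict, proposed_exclusions: list[dict], approve) -> dict:
    """approve() is whatever UI affordance shows the plan and returns True/False."""
    if not approve(proposed_exclusions):
        return dataset   # plan rejected; nothing touches the data
    for item in proposed_exclusions:
        dataset = filter_sample(dataset, item["id"], item["reason"])
    return dataset
```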
How Can Nitro Bio Help?
There are a couple of failure modes Nitro Bio can help you avoid. The first is a simple lack of engineering sophistication and vision - being stuck in the CRUD app mindset. The second is treating the interface as an afterthought - a chat box bolted onto existing software - which is how almost everyone building an agent right now approaches it.
In 2025, I learned that the hardest part of agent design isn't the LLM calls; it's the human-in-the-loop interactions. When should the agent ask for approval? How do you show a plan that's inspectable but not overwhelming? What happens when the agent fails halfway through a 10-step task? These frontend and UX design questions separate agents that scientists tolerate from agents they actually want to use. Nitro Bio sits at the intersection of all three: building the harness, understanding the science, and obsessing over the interaction design.
My scientific figure-building tool was my testing ground for developing intuitions about building agents. I started off by stealing a couple of ideas from Claude Code — most obviously the two-phase plan/edit approach.
In the planning phase, the agent can only inspect the current figure and propose a plan. This restriction forces the model to think the problem through before acting. Before executing the plan, the draw agent presents it to the user for approval. I should note that this plan/approval phase isn't appropriate in every agentic workflow. In this case, since the user can reasonably be expected to evaluate both the plan and the agent's output, it seems like an appropriate UX.
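A minimal sketch of that two-phase gate, with illustrative names rather than the figure tool's actual code: the planning agent only gets read-only tools, and nothing can touch the canvas until the user approves.

```python
PLAN_TOOLS = {"inspect_figure"}                                         # read-only
EDIT_TOOLS = {"inspect_figure", "add_panel", "move_element", "set_style"}

def plan_then_edit(request: str, plan_agent, edit_agent, ask_user_approval) -> None:
    plan = plan_agent(request, allowed_tools=PLAN_TOOLS)   # can look, can't touch
    if not ask_user_approval(plan):                        # the user gates execution
        return
    edit_agent(plan, allowed_tools=EDIT_TOOLS)             # now edits are allowed
```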
This human-in-the-loop approach extends into the editing phase, where the agent is given free rein to edit the canvas. The draw agent doesn't execute its entire plan in one shot. It performs a batch of actions, then inspects the canvas to see what actually happened before continuing. This is primarily a hallucination mitigation: the agent has to observe the real canvas state before deciding what to do next.
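Sketched roughly, and again with illustrative names, the edit loop looks something like this: execute a small batch, re-read the real canvas, and only then decide on the next batch.

```python
def run_edit_phase(plan: list[dict], propose_batch, apply_action, read_canvas,
                   max_batches: int = 20) -> None:
    canvas_state = read_canvas()
    for _ in range(max_batches):
        batch = propose_batch(plan, canvas_state)   # model picks the next few actions
        if not batch:
            return                                  # plan complete
        for action in batch:
            apply_action(action)
        canvas_state = read_canvas()                # observe what actually happened
```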
I also let the agent request structured input mid-task. Some actions are expensive in time and dollars — image generation, for example — and so I want the user to be informed and not encounter a surprising bill or unresponsive application. Rather than firing off generations autonomously, the agent suggests a prompt and waits for approval. I reuse this pattern for interactions that need more than a yes/no, like background removal where the user draws a bounding box to select the subject.
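Here's a sketch of how a structured input request could be modeled, with illustrative names: the agent returns a request object instead of calling the expensive tool directly, and the harness collects the user's answer before anything runs.

```python
from dataclasses import dataclass

@dataclass
class InputRequest:
    kind: str        # e.g. "approve_prompt" or "draw_bounding_box"
    payload: dict    # whatever the UI needs to render the request

def handle_input_request(req: InputRequest, ui) -> dict | None:
    if req.kind == "approve_prompt":
        # Show the suggested image-generation prompt; let the user edit or reject it.
        return ui.confirm_or_edit_text(req.payload["suggested_prompt"])
    if req.kind == "draw_bounding_box":
        # Background removal: the user selects the subject before anything is cut out.
        return ui.collect_bounding_box(req.payload["image_id"])
    return None   # unknown request kinds fall back to doing nothing
```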
These are the kinds of questions that make harness design both impactful and non-trivial: the right answer depends on the domain, the cost of mistakes, and how much users can evaluate before execution. As these factors change and models improve, the harness will need to evolve to match.
If you're building agents for scientific workflows and facing these design questions, let's talk. Nitro Bio specializes in the harness layer—the 80% of agent development that isn't the LLM call. Grab a time here.