case study · agents · healthcare

Agentic platform for healthcare workflows — Rust core, Vertex AI, human-in-the-loop by default.

01 · problem

Problem

Healthcare workflows that benefit from agents (patient education, discharge planning, research synthesis) cannot tolerate the usual failure modes of open-loop LLM systems. PHI handling, auditability, and human review are not afterthoughts. Off-the-shelf agent frameworks assume a trust model that does not hold in this domain.

02 · shape

Shape

A Rust service wraps a Plan-Act-Verify state machine. Each turn: the planner decomposes the request, the actor issues tool calls, a verifier LLM scores the result, and the coordinator decides to continue, escalate to a human reviewer, or halt. Firestore holds the run log and the memory tier; Vertex AI Gemini is the model behind every agent role.
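A minimal sketch of that turn loop, using only the standard library. The type and field names (`Coordinator`, `Verdict`, `Decision`, `review_threshold`, `max_turns`) are hypothetical, the planner, actor, and verifier are stand-in closures rather than real model calls, and the turn budget is an assumption added so the loop always terminates. The 0.85 threshold in the usage note comes from the service topology figure.

```rust
/// Outcome the coordinator picks after each verified turn.
#[derive(Debug, Clone, PartialEq)]
pub enum Decision {
    Continue,
    Escalate { reason: String },
    Halt,
}

/// What the verifier reports for one turn (hypothetical shape).
#[derive(Debug, Clone)]
pub struct Verdict {
    /// Verifier score in [0.0, 1.0].
    pub confidence: f64,
    /// Whether the verifier considers the request fully answered.
    pub goal_met: bool,
}

pub struct Coordinator {
    /// Turns scoring below this go to a human reviewer.
    pub review_threshold: f64,
    /// Assumed turn budget; exhausting it also escalates.
    pub max_turns: u32,
}

impl Coordinator {
    /// Decide what happens after the verifier has scored turn `turn` (0-based).
    pub fn decide(&self, turn: u32, verdict: &Verdict) -> Decision {
        if verdict.confidence < self.review_threshold {
            return Decision::Escalate {
                reason: format!(
                    "confidence {:.2} below threshold {:.2}",
                    verdict.confidence, self.review_threshold
                ),
            };
        }
        if verdict.goal_met {
            return Decision::Halt;
        }
        if turn + 1 >= self.max_turns {
            return Decision::Escalate {
                reason: "turn budget exhausted".to_string(),
            };
        }
        Decision::Continue
    }

    /// Drive plan -> act -> verify until a terminal decision.
    /// Returns the terminal decision and the number of turns executed.
    pub fn run<P, A, V>(&self, request: &str, mut plan: P, mut act: A, mut verify: V) -> (Decision, u32)
    where
        P: FnMut(&str, u32) -> String,
        A: FnMut(&str) -> String,
        V: FnMut(&str) -> Verdict,
    {
        let mut turn: u32 = 0;
        loop {
            let step = plan(request, turn);
            let result = act(&step);
            let verdict = verify(&result);
            match self.decide(turn, &verdict) {
                Decision::Continue => turn += 1,
                terminal => return (terminal, turn + 1),
            }
        }
    }
}
```

With `Coordinator { review_threshold: 0.85, max_turns: 3 }`, a verdict of confidence 0.5 escalates even when `goal_met` is true. The sketch checks confidence first so that a low-confidence answer cannot halt the run as if it had succeeded.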

03 · build

Build

Memory is three-tier: raw transcripts, Gemini-summarized mid-term context, distilled long-term facts. A guardrail pipeline runs on every tool call — confidence threshold, critic agent, cross-check against a second model, self-consistency vote, JSON schema enforcement — and any failed gate routes the turn to a reviewer queue. OpenTelemetry spans every stage so a failed run is reconstructible from logs alone. Speech in and out use Cloud STT/TTS behind the same interface as text.
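The guardrail gates can be sketched as an ordered list of checks where the first failure sends the turn to the reviewer queue. This is an illustration under stated assumptions, not the project's code: `ToolCallCheck`, `Gate`, `Route`, and `run_guardrails` are hypothetical names, the critic, cross-check, and schema results are assumed to be precomputed booleans, and self-consistency is approximated as the share of sampled answers that match the most common one.

```rust
use std::collections::HashMap;

/// Signals collected for one tool call (hypothetical shape).
#[derive(Debug, Clone)]
pub struct ToolCallCheck {
    pub confidence: f64,
    pub critic_approved: bool,
    pub second_model_agrees: bool,
    /// Answers from repeated sampling, used for the self-consistency vote.
    pub samples: Vec<String>,
    pub schema_valid: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Gate {
    Confidence,
    Critic,
    CrossCheck,
    SelfConsistency,
    Schema,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    Proceed,
    /// Carries the first gate that failed, for the reviewer's context.
    ReviewerQueue(Gate),
}

/// Fraction of samples that equal the most common sample.
fn majority_share(samples: &[String]) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for s in samples {
        *counts.entry(s.as_str()).or_insert(0) += 1;
    }
    let top = counts.values().copied().max().unwrap_or(0);
    top as f64 / samples.len() as f64
}

/// Evaluate the gates in a fixed order; any failure routes to review.
pub fn run_guardrails(c: &ToolCallCheck, min_conf: f64, min_agreement: f64) -> Route {
    let gates = [
        (Gate::Confidence, c.confidence >= min_conf),
        (Gate::Critic, c.critic_approved),
        (Gate::CrossCheck, c.second_model_agrees),
        (Gate::SelfConsistency, majority_share(&c.samples) >= min_agreement),
        (Gate::Schema, c.schema_valid),
    ];
    for (gate, passed) in gates.iter() {
        if !*passed {
            return Route::ReviewerQueue(*gate);
        }
    }
    Route::Proceed
}
```

The sketch evaluates every gate up front because the inputs are plain values. In a service where each gate is a model call, running them lazily in the same order would avoid paying for later gates once an earlier one has failed.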

O.R.C.A.: Plan-Act-Verify orchestration (Rust · agent platform · Mayo Clinic). A chat request enters the Axum API behind auth, rate-limit, and tenant middleware. The pure-Rust coordinator drives a Plan-Act-Verify loop with an LLM planner and an LLM verifier; confidence below 0.85 routes to human review. A guardrail pipeline (confidence, critic, cross-check) gates the final response. RAG (embed, vector, re-rank) and 3-tier episodic memory (raw, summarized, distilled) persist to Firestore.
figure · service topology

04 · result

Result

The target is sub-second median latency per turn. The confidence gate routes the bottom quartile of responses to human review before they reach a patient-facing surface. The system is designed so that every decision a model made can be reconstructed from telemetry alone, so a reviewer never has to decode the raw run log.

stack

Rust · Axum · rig · Vertex AI Gemini · Firestore · GKE · OpenTelemetry