Post · May 27, 2026

Running an AI Operations Agent for a Mobile Vet Startup

At Nala Vet, I run a single fly.io machine as an always-on AI operations agent for my mobile veterinary clinic. It reads my CRM, reschedules appointments, sends SMS, opens PRs on GitHub, and posts a daily briefing to Telegram, all from one container. Here's exactly how I built it and what I learned.


The Problem

I run a small team operating a mobile vet CRM in Miami. Appointments change constantly: vets have routes to optimize, leads come in overnight, customers need follow-ups. A lot of the ops load is just: look at something, decide, take an action. Classic stuff, but it was all human-gated. Someone had to log into a dashboard, find the right record, make the change, send a message.

The ask was simple: could one agent handle the boring operations loop (morning briefings, rescheduling, lead triage, follow-up SMS) without a human touching a UI for every single thing?

The answer turned out to be yes, with one important constraint: the agent doesn't act unilaterally. Every write operation goes through a confirmation step. I stay in the loop, just without the dashboard.


The Stack

[Figure: Nala Vet Infrastructure Architecture]

Here's what's running:

Harness (ghcr.io/capotej/harness) is the base container image. It ships Hermes Agent pre-installed with a Python environment, standard CLI tools (git, gh, curl, jq, ripgrep), and no install ceremony. You pull it, mount your config, and you're running.

Hermes Agent (by Nous Research) is the AI agent runtime inside Harness. It's a persistent process, not a serverless function or a webhook handler. It has built-in tools: terminal, file read/write, web search, a cron scheduler, and persistent memory across sessions. Critically, it also accepts custom bash scripts as tools, which is how I plug it into my stack.

fly.io is where the container lives. Single machine, restart = always. No cold-start latency, no queue to poll. The agent is just there. When I text it, the message hits within a second or two.

Custom wrapper scripts are injected into the container at deploy time via fly.io's [[files]] config. Each one is a thin, pre-authenticated HTTP client against my internal APIs:

  • crm: talks to api.nala.vet, authenticated with a bot JWT scoped through Pundit
  • tg: pushes a message to my Telegram chat
  • brief: calls GET /crm/bot/briefing, formats the KPI digest, sends it via tg
  • pr: wraps gh pr create with hard rails: no direct push to main or production, no force-push, feature branches only
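To make "thin, pre-authenticated HTTP client" concrete, here is a minimal Python sketch of what a wrapper like crm does conceptually. The real wrappers are bash scripts; the https scheme, the bearer-style Authorization header, and the function names build_crm_request and call_crm are assumptions for illustration, not the actual implementation.

```python
import json
import urllib.request

# Host taken from the post; the https scheme is an assumption.
API_BASE = "https://api.nala.vet"


def build_crm_request(method, path, token, payload=None):
    """Build (but do not send) an authenticated request to the CRM API."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(API_BASE + path, data=data, method=method)
    # Assumption: the bot JWT is sent as a standard bearer token.
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", "application/json")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def call_crm(req, timeout=10):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting request construction from sending keeps the authenticated part easy to inspect without touching the network, which is roughly the property that makes a thin wrapper safe to hand to an agent.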

One quirk worth knowing: fly.io [[files]] injects files at 0644, with no execute bit. So every invocation is bash /etc/nala-claw/bin/crm, not just crm. Minor annoyance, zero impact in practice.
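The permission quirk is easy to see in code. This Python sketch checks a mode for execute bits and builds the bash-prefixed invocation; the helper names is_executable and wrapper_command are hypothetical, and only the directory path comes from the post.

```python
import stat

# Directory from the post.
WRAPPER_DIR = "/etc/nala-claw/bin"


def is_executable(mode):
    """True if any execute bit (user, group, or other) is set in the mode."""
    return bool(stat.S_IMODE(mode) & 0o111)


def wrapper_command(name, *args):
    """Return an argv that runs a wrapper through bash, so no execute bit is needed."""
    return ["bash", f"{WRAPPER_DIR}/{name}", *args]
```

A file at 0644 has no execute bits, so running it directly fails, while passing it to bash as an argument only requires read permission.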

SOUL.md is the agent's identity file; more on this in its own section below.

The agent knows the full monorepo context: hv-api (Rails), hv-web (React/TypeScript/Vite/Tailwind), hv-crm, hv-driver. It clones repos on demand and makes changes via feature branches.


What It Actually Does

Here are the real operations, with how they work:

Daily briefing. A Hermes cron job fires at 8 AM ET (0 8 * * *). It runs brief, which calls GET /crm/bot/briefing and gets back a structured JSON blob: appointments today, appointments this week, leads by stage, open tasks. The agent formats it and pushes it to Telegram. Zero human involvement: I wake up to the digest already on my phone.
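The formatting step might look like the following Python sketch. The JSON field names (appointments_today, appointments_this_week, leads_by_stage, open_tasks) are assumptions inferred from the description above; the real payload shape may differ.

```python
def format_briefing(briefing):
    """Render the briefing payload as a short plain-text digest."""
    lines = [
        "Daily briefing",
        f"Appointments today: {briefing['appointments_today']}",
        f"Appointments this week: {briefing['appointments_this_week']}",
        "Leads by stage:",
    ]
    # Sort stages so the digest is stable from day to day.
    for stage, count in sorted(briefing["leads_by_stage"].items()):
        lines.append(f"  {stage}: {count}")
    lines.append(f"Open tasks: {briefing['open_tasks']}")
    return "\n".join(lines)
```

Keeping the formatter a pure function of the payload means the cron job stays a simple pipeline: fetch, format, send.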

Appointment reschedules. I text: "reschedule Maria's appointment to Thursday 3pm." The agent calls GET /crm/appointments with a search, surfaces what it found (customer name, pet, current slot, proposed new slot, assigned vet) and asks for YES. On confirmation, it calls PATCH /crm/appointments/:id. If anything is ambiguous (multiple Marias, no Thursday availability), it asks before proceeding. The rule is strict: every CRM write requires an explicit YES.
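The confirm-before-write rule can be sketched as a small gate in Python. This is an illustration of the pattern, not the agent's actual mechanism: PendingWrite, is_confirmed, and resolve are hypothetical names, and treating only the exact string YES (case-sensitive, whitespace-trimmed) as confirmation is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PendingWrite:
    """A proposed CRM write that has been shown to the operator but not executed."""
    method: str
    path: str
    payload: dict
    summary: str


def is_confirmed(reply):
    """Only an explicit YES counts; any other reply leaves the write unexecuted."""
    return reply.strip() == "YES"


def resolve(pending, reply, execute):
    """Call execute(method, path, payload) only after confirmation."""
    if not is_confirmed(reply):
        return "cancelled"
    execute(pending.method, pending.path, pending.payload)
    return "executed"
```

The point of the design is that the write is fully described before it runs, so what the operator approves is the exact request that gets sent.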

Lead triage. "Show me new leads" → GET /crm/leads?stage=new → formatted list in Telegram. Read operations need no confirmation.

Outbound SMS. "Send Maria a follow-up about her appointment." The agent drafts the message, shows the exact text and destination number, waits for YES, then POST /crm/sms. Pundit enforces this server-side too: the bot JWT doesn't have permission to send SMS without the right scope.

Code changes. "Fix the typo in the booking modal." The agent clones hv-web, finds the component, makes the edit, shows a diff summary, asks for YES, PUSH. On confirmation: git commit, push a feature branch, gh pr create. A PR lands in the HomeVet GitHub org with a descriptive title and body. The agent never touches main directly.
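The push rails can be expressed as a pre-flight check. The Python sketch below is a hedged illustration rather than the pr wrapper itself: check_push is a hypothetical name, the protected branch names come from the post, and the flag list covers git's standard force options but not forced refspecs written with a leading +.

```python
# Protected branches as named in the post.
PROTECTED_BRANCHES = {"main", "production"}

# Standard git push options that overwrite remote history.
FORCE_FLAGS = {"-f", "--force", "--force-with-lease"}


def check_push(branch, git_args=()):
    """Raise ValueError if the push would violate the rails; return None otherwise."""
    if branch in PROTECTED_BRANCHES:
        raise ValueError(f"refusing to push directly to {branch}")
    for arg in git_args:
        if arg in FORCE_FLAGS or arg.startswith("--force-with-lease="):
            raise ValueError("refusing to force-push")
```

Running a check like this before every push means the rail holds even if the agent's instructions are ignored or misread.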

What it refuses. Invoice charges, voids, and refunds are human-only: they are not in the bot's Pundit scope. PII edits (customer name, phone, address) are locked. No force-pushing to shared branches. These aren't just soft guidelines in the SOUL.md; they're enforced at the API layer regardless of what the agent attempts.
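Server-side enforcement of this kind usually comes down to a deny-by-default check. The real policies are written with Pundit in Rails; the Python sketch below only mirrors the idea, and the action names and the bot_may function are hypothetical.

```python
# Hypothetical action names standing in for the human-only operations listed above.
HUMAN_ONLY_ACTIONS = frozenset({
    "invoice:charge",
    "invoice:void",
    "invoice:refund",
    "customer:edit_pii",
})


def bot_may(action, granted_scopes):
    """Deny by default: human-only actions never pass, others need an explicit scope."""
    if action in HUMAN_ONLY_ACTIONS:
        return False
    return action in granted_scopes
```

Because the check lives at the API layer, a misconfigured token that somehow lists a human-only action still cannot perform it.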


The SOUL.md Pattern

This is the most transferable idea from the whole setup, so it deserves its own section.

Hermes Agent loads $HERMES_HOME/SOUL.md as the first slot of the system prompt, completely replacing the default "You are Hermes Agent..." identity. The file is plain Markdown. It's version-controlled, editable without touching application code, and survives container restarts.
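As a rough mental model of that behavior, here is a Python sketch of identity loading. It is not Hermes Agent's actual code: load_identity and build_system_prompt are hypothetical names, and the fallback to the default identity when the file is missing or empty is an assumption.

```python
import os
from pathlib import Path

# Default identity, truncated exactly as quoted in the post.
DEFAULT_IDENTITY = "You are Hermes Agent..."


def load_identity(hermes_home=None):
    """Return SOUL.md's text if present and non-empty, else the default identity."""
    home = Path(hermes_home or os.environ.get("HERMES_HOME", "."))
    soul = home / "SOUL.md"
    if soul.is_file():
        text = soul.read_text(encoding="utf-8").strip()
        if text:
            return text
    return DEFAULT_IDENTITY


def build_system_prompt(identity, other_sections):
    """Place the identity first, followed by whatever other sections the runtime adds."""
    return "\n\n".join([identity, *other_sections])
```

The identity replaces the default rather than being appended to it, which is why the file alone is enough to change who the agent is.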

Mine defines:

  • Who the agent is: Nala Claw, operations agent for Nala Vet, single trusted operator (me, via Telegram)
  • What tools it has: each wrapper script listed with its exact invocation (bash /etc/nala-claw/bin/crm) and what it does
  • Authority scope: which CRM endpoints it can write to, which ones are off-limits, what confirmation pattern to enforce
  • Hard rails for code: never push main, never force-push, always open a PR, always confirm before write
  • Tone: brief, imperative, no apologies, like a senior engineer texting their boss

The key insight is that SOUL.md turns a general-purpose agent into a domain-specific operator. I'm not writing prompt hacks or prepending instructions to every message; I'm defining a stable identity that the agent embodies across every session.

Putting it in a file (not a chat prefix) matters for three reasons: it survives restarts, it's diff-able, and it puts my trust model front and center as code rather than convention.


Tradeoffs and Honest Observations

A few things that are actually rough edges:

Single machine, single operator. This setup is designed for me. The identity model is one trusted Telegram user. Adding a second operator means extending the allowlist and thinking carefully about which operations each one can authorize, which is non-trivial and intentionally deferred.
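For a sense of what a multi-operator allowlist would involve, here is a minimal Python sketch. It is purely hypothetical, since this work is deferred: the user IDs are placeholders, and the operation names and may_authorize are invented for illustration.

```python
# Placeholder Telegram user IDs mapped to the operations each may confirm.
OPERATORS = {
    1001: {"crm:write", "sms:send", "code:push"},
    1002: {"crm:write"},
}


def may_authorize(operators, user_id, operation):
    """True only if the user is on the allowlist and holds that operation."""
    return operation in operators.get(user_id, set())
```

Even this toy version shows the extra surface: every confirmation now has to be checked against who sent it, not just whether it said YES.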

No browser in the container. The Harness base image doesn't have the X11/glib libraries Playwright needs. Screenshot-based QA isn't available. Workaround: the agent generates static HTML mockups for visual review. Works well enough for my team.

Wrapper scripts can't be made executable via [[files]]. fly.io injects them at 0644. Hence bash /etc/nala-claw/bin/crm everywhere. One line in SOUL.md explains this, and after that the agent never gets confused about it.

Submodule SSH URLs. My monorepo uses SSH submodule URLs; the container has GH_TOKEN for HTTPS auth. So I clone sub-repos independently with gh repo clone instead of initializing submodules. Took one session to sort out, works reliably since.
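The workaround amounts to turning each SSH submodule URL into an owner/repo slug that gh repo clone accepts. A Python sketch, assuming GitHub-style git@github.com:owner/repo.git URLs; ssh_to_slug and clone_command are hypothetical helper names.

```python
import re

# Matches git@github.com:owner/repo with an optional .git suffix.
_SSH_URL = re.compile(
    r"^git@github\.com:(?P<owner>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?$"
)


def ssh_to_slug(url):
    """Turn an SSH GitHub URL into owner/repo; return None if it doesn't match."""
    m = _SSH_URL.match(url.strip())
    return f"{m['owner']}/{m['repo']}" if m else None


def clone_command(url):
    """Build the gh argv for cloning the repo behind an SSH submodule URL."""
    slug = ssh_to_slug(url)
    if slug is None:
        raise ValueError(f"not a GitHub SSH URL: {url}")
    return ["gh", "repo", "clone", slug]
```

Since gh authenticates with GH_TOKEN over HTTPS, this sidesteps the need for SSH keys inside the container.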

What makes this different from a cron + webhook setup. A fixed cron can send a briefing. Only an agent can handle "reschedule whatever appointment Maria has this week to the next available Thursday slot." That requires understanding the request, querying live data, resolving ambiguity, and deciding what to do with the answer. That's the gap the agent fills.


What's Next

I'm expanding beyond Miami: more vets, more routes, more appointment volume. The dispatch automation will need to get tighter: the agent currently handles reschedules on request, but I'm moving toward having it proactively flag conflicts and propose solutions before I have to ask.

The memory system helps here. Hermes Agent accumulates facts across sessions (conventions about how I name things, preferences, environment quirks), so the agent gets less wrong over time without being retrained.

The broader pattern (Harness + SOUL.md + wrapper scripts + a fly.io always-on machine) is not Nala-specific. Any ops-heavy small team that runs over Telegram or Slack, with an internal API that can issue bot credentials, could wire this up in a day. The hard part isn't the infrastructure. It's writing a SOUL.md that accurately scopes what the agent should and shouldn't do, and building the confirmation rails that make the human comfortable letting it act.