How Runwell Actually Builds AI Automations

A technical deep-dive for the engineers, ops leads, and CTO-types who want to know what's under the hood before signing off on a Runwell engagement.

If you're the owner of a professional services firm, you can skip this post. Nothing here will help you decide whether Runwell is right for your business — your Blueprint will do that. This post is for the technical person inside the firm. The fractional CTO, the senior dev on retainer, the ops lead who's been burned by an automation vendor before. You're the one who'll get handed our proposal and asked, "Is this legit?" Here's what we actually do, in the level of detail you need to answer that question.

TL;DR

What you'll know after reading this

  • What "agent-native" means in practice — and why most AI automation agencies aren't.
  • Our four-layer architecture: trigger, context assembly, reasoning loop, action and logging.
  • How we monitor production automations without ever holding credentials or standing access to your systems.
  • The seven questions a technical reviewer should ask in any Runwell discovery call.

The short version

We build agent-native AI systems inside Anthropic's developer environment using Claude Code and Claude Cowork. We do not wrap ChatGPT. We do not move data between SaaS apps with a no-code flowchart and call it AI. We build agents that read documents, reason across tools, follow multi-step workflows, and hand off completed work — the same way a trained junior associate would.

We never take standing access to your systems. We never share credentials. Every automation we build reports its own health back to a Runwell dashboard, which is what we monitor — not your accounts.

If any of those claims need unpacking, read on.

Why most "AI automation" you've seen is garbage

In the last 18 months, every consultant, agency, and Fiverr seller has rebranded as an "AI automation expert." Most of what they ship falls into one of three buckets:

  1. ChatGPT wrappers. A web form, a system prompt, an OpenAI API call, and an output panel. Useful for FAQ chat. Useless for operational work that needs to read a contract, check a CRM record, and decide whether to escalate.
  2. No-code flowcharts dressed up as AI. Zapier or Make scenarios with one "AI step" stuffed in the middle that calls GPT to summarize an email. The structural logic is still 100% deterministic if-this-then-that. The AI is decoration.
  3. Custom GPTs. A glorified search engine over the firm's documents. Helpful as an internal lookup tool. Not automation. Nothing executes.

None of these are wrong as products. They're wrong as solutions to operational waste — because operational waste lives in the multi-step judgment work between the apps. That's the work we build for.

What "agent-native" actually means

An agent-native build is a system where the AI model is making real decisions inside a structured loop, with access to tools that let it act on the world. Concretely, that means:

  • The model receives a goal, not a script.
  • It has access to a defined set of tools — read this document, query this CRM, draft this email, post to this Slack channel, write to this row in this sheet.
  • It chooses which tools to use, in what order, based on what it learns at each step.
  • It can ask for human approval at defined gates before taking irreversible actions.
  • It logs every decision, every tool call, every input, every output.
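
The loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not Runwell's production code: `stub_policy` is a hypothetical stand-in for the model, and the tool names, `Tool`, `AgentRun`, and `run_agent` are invented for the example. It shows the three properties that matter: the policy chooses the next tool from what it has observed, irreversible tools pass through an approval gate, and every decision and tool call is logged.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tool:
    name: str
    run: Callable[[dict], str]
    irreversible: bool = False  # irreversible tools require human approval


@dataclass
class AgentRun:
    goal: str
    log: list = field(default_factory=list)  # every decision and tool call


def stub_policy(goal: str, observations: list) -> dict:
    """Hypothetical stand-in for the model: picks the next tool from what it has seen."""
    if not observations:
        return {"tool": "read_document", "args": {"doc_id": "intake-001"}}
    if len(observations) == 1:
        return {"tool": "send_email", "args": {"to": "client@example.com"}}
    return {"tool": None, "args": {}}  # nothing left to do, stop


def run_agent(goal, tools, policy, approve, max_steps=10):
    run = AgentRun(goal)
    observations = []
    for step in range(max_steps):
        decision = policy(goal, observations)
        run.log.append({"step": step, "type": "decision", "decision": decision})
        if decision["tool"] is None:
            break
        tool = tools[decision["tool"]]
        # Approval gate: stop before an irreversible action unless a human says yes.
        if tool.irreversible and not approve(tool.name, decision["args"]):
            run.log.append({"step": step, "type": "blocked", "tool": tool.name})
            break
        output = tool.run(decision["args"])
        run.log.append({"step": step, "type": "tool_call", "tool": tool.name,
                        "input": decision["args"], "output": output})
        observations.append(output)
    return run


tools = {
    "read_document": Tool("read_document", lambda a: f"contents of {a['doc_id']}"),
    "send_email": Tool("send_email", lambda a: f"sent to {a['to']}", irreversible=True),
}

approved = run_agent("Process the new intake form", tools, stub_policy,
                     approve=lambda name, args: True)
denied = run_agent("Process the new intake form", tools, stub_policy,
                   approve=lambda name, args: False)
print([e["type"] for e in approved.log])
print([e["type"] for e in denied.log])
```

In a real build the policy would be a model call with structured tool definitions, and `approve` would route to a person rather than a lambda.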

This is not novel architecture — it's standard for anyone building seriously with Claude or GPT-4-class models in 2026. What's notable is how few "AI automation agencies" actually build this way. Most are still gluing prompts to Zapier triggers.

We build inside Claude Code (Anthropic's developer-facing agent environment) and Claude Cowork (Anthropic's collaboration-layer agent runtime). These are the same tools Anthropic uses internally. They give us version control, structured tool definitions, observability into every agent decision, and a deployment path that doesn't depend on a third-party orchestration platform staying up.

The architecture, concretely

A typical Runwell automation has four layers:

  • Layer 01: Trigger. A form is submitted. An email lands. A row is added. A calendar event ends. The trigger fires a webhook into our environment.
  • Layer 02: Context assembly. Before the agent reasons about anything, we gather the relevant context: the document attached to the form, the relevant CRM records, the firm's policies, the historical pattern from similar prior cases. Garbage context produces garbage decisions, so this layer is heavily engineered.
  • Layer 03: Agent reasoning loop. Claude reasons over the assembled context, calls tools as needed, and produces an output: a drafted document, a routing decision, a structured data update, a Slack notification, a flag for human review.
  • Layer 04: Action and logging. The output executes against your systems via the API integrations from Layer 02. Every step is logged: input, intermediate reasoning, tool calls, output, timing, cost.

When something breaks, we can replay the exact reasoning trace. When you want to know why an agent made a specific call, we can show you. This is not a black box.
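
To make the four layers and the replayable trace concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the function names, the event and CRM shapes, and the `reason` function, which is a deterministic stand-in for the model call. With a real model, a replay would show the logged reasoning rather than re-deriving an identical answer, since model output is not guaranteed to be deterministic.

```python
import json
import time


def assemble_context(event, crm):
    """Layer 02: gather only what the agent needs for this event."""
    return {"form": event["form"], "client": crm.get(event["client_id"], {})}


def reason(context):
    """Layer 03: hypothetical stand-in for the model call; returns a routing decision."""
    needs_review = context["client"].get("open_conflicts", 0) > 0
    return {"action": "flag_for_review" if needs_review else "draft_engagement_letter"}


def handle_trigger(event, crm, trace):
    """Layer 01 entry point: called when a webhook delivers an event."""
    def record(layer, payload):
        trace.append({"layer": layer, "at": time.time(), "payload": payload})

    record("trigger", event)
    context = assemble_context(event, crm)
    record("context", context)
    decision = reason(context)
    record("reasoning", decision)
    # Layer 04: a real build would call the destination API here, then log the result.
    record("action", {"executed": decision["action"]})
    return decision


def replay(trace):
    """Re-run the reasoning step from the logged context and compare with the log."""
    context = next(e["payload"] for e in trace if e["layer"] == "context")
    logged = next(e["payload"] for e in trace if e["layer"] == "reasoning")
    return reason(context) == logged


crm = {"c-17": {"name": "Example LLC", "open_conflicts": 1}}
trace = []
decision = handle_trigger({"client_id": "c-17", "form": "new matter"}, crm, trace)
print(json.dumps([e["layer"] for e in trace]))
print(decision, replay(trace))
```

The design point is that the trace stores the assembled context alongside the decision, so a reviewer can see exactly what the agent was looking at when it made a call.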

Zero credential sharing. Zero standing access.

This is the part most clients ask about, so let's be specific.

What other vendors do vs. what Runwell does

  • Credential sharing.
    What other vendors do: Ask for your admin login. Store it. Use it whenever they need to.
    What Runwell does: Build every integration via the destination system's official API, with a service account or OAuth scope you provision, scoped to exactly what the automation needs.
  • Standing access.
    What other vendors do: Run a persistent session into your systems 24/7 with broad permissions. Their breach is your breach.
    What Runwell does: Automations run in our environment, call your API with a narrowly scoped credential, do the work, disconnect. No persistent middleman session.
  • Trust-us monitoring.
    What other vendors do: Their dashboard reads from your accounts.
    What Runwell does: Every automation emits health telemetry: uptime, error rate, latency, cost per run, success rate. We watch the telemetry dashboard. We don't watch your accounts.

Security model

There is no Runwell employee with a password to your CRM. There is no persistent session sitting in a Zapier-style middleman. If something breaks, the dashboard tells us — we don't need to log into your systems to find out. If you've ever had a vendor lose a password and create a security incident, you know why this matters.
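
As a rough illustration of telemetry-based monitoring, the sketch below models a health event that carries only run metadata (no client data) and rolls a batch of events up into the metrics named above. The `RunEvent` fields and the `summarize` function are hypothetical names chosen for this example, and the sample values are made up; the actual telemetry schema is not specified in this article.

```python
from dataclasses import dataclass


@dataclass
class RunEvent:
    """One automation run, described by metadata only (hypothetical schema)."""
    automation: str
    ok: bool
    latency_ms: float
    cost_usd: float


def summarize(events):
    """Aggregate run events into dashboard-style health metrics."""
    total = len(events)
    if total == 0:
        return {"runs": 0}
    successes = sum(1 for e in events if e.ok)
    return {
        "runs": total,
        "success_rate": successes / total,
        "error_rate": (total - successes) / total,
        "avg_latency_ms": sum(e.latency_ms for e in events) / total,
        "cost_per_run_usd": sum(e.cost_usd for e in events) / total,
    }


# Illustrative values only.
events = [
    RunEvent("intake-triage", True, 1200.0, 0.04),
    RunEvent("intake-triage", True, 800.0, 0.02),
    RunEvent("intake-triage", False, 2500.0, 0.06),
    RunEvent("intake-triage", True, 1500.0, 0.04),
]
summary = summarize(events)
print(summary)
```

Because the event carries no document contents or credentials, a dashboard built on it can report that something broke without anyone logging into the client's systems.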

For technical reviewers

Want to see a real reasoning trace from a production agent?

Book a 30-minute technical deep-dive. Redacted for client confidentiality, full architecture walkthrough, your questions. No sales. Just engineering.

Book the technical deep-dive →

Why we build in Claude, not GPT or open-source

This question comes up. Honest answer:

Claude's reasoning is better for multi-step operational work

That's not a marketing line — it's been our experience across builds. For document-heavy reasoning (contracts, intake forms, engagement letters, financial documents), Claude makes fewer "lost in the middle" errors and follows multi-turn instructions more reliably. We've A/B tested.

Claude Code and Claude Cowork are first-class agent environments

OpenAI's Assistants API is fine. LangChain and LlamaIndex are fine. But for production agent work with audit trails and structured tool use, the Anthropic stack is what we build for. It's also what Anthropic uses internally, which means the path is well-trodden.

Open-source models are catching up but aren't there yet for this use case

Llama, Mistral, Qwen are improving quickly. When self-hosted reasoning quality matches Claude for multi-step ops work — and clients have a real need for on-prem — we'll build there too. Today, most clients are better served by Claude.

We are not a Claude reseller. We are not financially incentivized by Anthropic. We use Claude because it's the best model for this work today. If that changes, we change.

What we don't do

A short list, because it tells you more than the long version of what we do.

  • We don't sell SaaS. There is no Runwell platform you log into. The automations live in your stack.
  • We don't lock you in. Every automation we build, you own. Every integration, every prompt, every piece of context engineering — documented and handed over.
  • We don't do retainers without earning them. The Engine includes 90 days of guarantee work. After that, the Rails retainer is opt-in and only for clients who've already gone through the Engine.
  • We don't take work we can't build. If your stack is too custom or too legacy, we'll tell you in the Blueprint.
  • We don't build chatbots. We will not build you "an AI to answer customer questions on your website." That's a different product category and a different vendor.

"We are a builder, not a platform. We ship code into your environment, integrated with your systems, owned by you, monitored by us through telemetry rather than access."

The Runwell technical thesis, in one sentence

What to ask us in the discovery call

If you're the technical person sitting in on a Runwell discovery call, here are the questions that will tell you whether we know what we're doing.

  1. Show me a reasoning trace from a real production agent. We can. Redacted for client confidentiality.
  2. What's your error rate on production builds, and how do you measure it? Tracked per-automation in the Runwell dashboard. We'll show you the format.
  3. What happens when Claude has an outage? Graceful degradation — automations queue, retry, and alert. We'll walk you through the failure modes.
  4. What does your handoff documentation look like if we want to maintain this internally later? Full prompt library, integration docs, monitoring schema, playbook for adding new automations. Sample available on request.
  5. How do you handle PII? Data minimization at context assembly, no PII in prompt logs by default, optional self-hosted deployment for HIPAA-style needs (priced separately).
  6. What happens at the end of the engagement — who owns what? You own everything that runs in your environment. We hand over the prompt library, the integration code, and the monitoring schema. You can fire us tomorrow and the automations keep running.
  7. Can we run the same automation against a staging environment first? Yes. Every Engine engagement includes a 1–2 week shadow phase against a staging dataset before any production write goes live.
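
For question 3, "queue, retry, and alert" can be pictured roughly as follows. This is a sketch under assumptions, not a description of Runwell's actual failure handling: `ModelUnavailable`, `process_queue`, and the retry limits are hypothetical, and `flaky_model` simulates a provider that is down for the first two calls.

```python
import time
from collections import deque


class ModelUnavailable(Exception):
    """Raised by the (hypothetical) model client when the provider is down."""


def process_queue(jobs, call_model, alert, max_attempts=3, base_delay=1.0,
                  sleep=time.sleep):
    """Work through queued jobs, retrying with exponential backoff.

    Jobs that exhaust their attempts are parked (not dropped) and an alert fires.
    """
    queue = deque(jobs)
    done, parked = [], []
    while queue:
        job = queue.popleft()
        for attempt in range(1, max_attempts + 1):
            try:
                done.append(call_model(job))
                break
            except ModelUnavailable:
                if attempt == max_attempts:
                    parked.append(job)
                    alert(f"{job} parked after {attempt} attempts")
                else:
                    sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    return done, parked


calls = {"n": 0}


def flaky_model(job):
    """Simulated outage: the first two calls fail, later calls succeed."""
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ModelUnavailable()
    return f"{job}: ok"


def down_model(job):
    """Simulated hard outage: every call fails."""
    raise ModelUnavailable()


alerts = []
no_wait = lambda seconds: None  # skip real sleeping in this demo
done, parked = process_queue(["job-1", "job-2"], flaky_model, alerts.append, sleep=no_wait)
done2, parked2 = process_queue(["job-3"], down_model, alerts.append, sleep=no_wait)
print(done, parked, done2, parked2, alerts)
```

The questions worth asking a vendor map directly onto the parameters: how many attempts, how long the backoff, where parked jobs live, and who receives the alert.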

If we can't answer any of these to your satisfaction, don't sign the engagement.
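
On question 5, "data minimization at context assembly" and "no PII in prompt logs" are two separate controls, and a small sketch shows the difference. The field names, the allow-list, and the two regular expressions below are assumptions for illustration only; pattern-based redaction catches obvious identifiers and is not a complete PII solution.

```python
import re

# Hypothetical allow-list: only these fields ever reach the agent's context.
ALLOWED_FIELDS = {"matter_type", "intake_notes", "preferred_contact_time"}

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def minimize(record):
    """Control 1: drop every field that is not on the allow-list."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}


def redact(text):
    """Control 2: mask obvious identifiers before text is written to a log."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)


record = {
    "name": "Pat Example",
    "ssn": "123-45-6789",
    "matter_type": "estate planning",
    "intake_notes": "Reach me at pat@example.com, SSN 123-45-6789.",
}
context = minimize(record)
log_line = redact(context["intake_notes"])
print(sorted(context))
print(log_line)
```

Minimization limits what the model sees at all; redaction limits what survives in logs. A reviewer should ask to see both.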

The bottom line for technical reviewers

Runwell is a builder, not a platform. We ship code into your environment, integrated with your systems, owned by you, monitored by us through telemetry rather than access. The AI layer is real Claude reasoning inside a structured agent loop — not a chatbot pretending to do operational work. The architecture is standard for serious 2026 agent development. It's notable mostly because so few people building under the "AI automation" banner are actually doing it this way.

If you have specific technical questions we didn't cover here, book a 30-minute technical deep-dive. No sales. Just engineering.

Want to see your firm's number?

10 questions. 3 minutes. A score out of 100 plus your top 3 operational gaps, with a dollar figure attached. Free. No email required to see the result.

Disclaimer. This article is for educational purposes only and does not constitute legal, accounting, tax, medical, or financial advice. References to specific compliance frameworks (ABA Model Rules, AICPA SSTS, HIPAA, SEC/FINRA, state bar rules) reflect the authors' and reviewers' interpretation as of the publish date and may not apply to your jurisdiction or specific facts. Consult your own licensed advisor before acting on anything written here. Statistics and case details have been anonymized; dollar figures reflect actual client outcomes as of the engagement date. Runwell is not a law firm, CPA firm, or registered investment advisor.

[ARTICLE TITLE — 50–70 chars, primary keyword in first 5 words, buyer-language, specific industry]

[SUBTITLE / DEK — 140–200 chars. Promise the outcome. What the reader will know, do, or decide after reading. Example: "Line-by-line teardown of a Clio-powered firm doing $4.2M, the automations that recovered $312K in 21 days, and the compliance guardrails we didn't skip."]

Reviewed for compliance by [Expert Name], [Credential] · This article cites ABA Model Rule 1.18. See full disclaimer.

[LEDE — 2–3 sentences. Quantified hook. State the number, the firm, and the outcome in the first 150 words so Google's AI Overviews and SGE can lift it. Avoid hedging. Example: "A 9-attorney firm doing $4.2M was losing $312,000 a year to manual intake, handoff gaps, and timekeeper drift. Over 21 days, we instrumented the leak, rebuilt three automations, and recovered the money without adding headcount. Here's the line-by-line teardown."]

TL;DR — Key takeaways

What this article covers

  • The exact dollarization method: how to attach a defensible number to operational waste in a professional firm.
  • The three waste categories that hide in 90% of law firms in the $1M–$5M band.
  • A compliance-safe build path that respects [ABA Model Rule 1.18, your state bar's conflict rules, and your malpractice carrier's requirements].
  • The 21-day rollout, including the two things we intentionally did NOT automate (and why).

The problem, dollarized

[Opening narrative. Set the scene with the specific firm: size, practice area, stack (Clio, MyCase, Smith.ai, QuickBooks, LawPay). Introduce the owner's language — "I'm the bottleneck," "things are falling through the cracks," etc. — pulled from the Customer Intelligence Report. Make this section concrete and specific.]

[Paragraph two. The precipitating event — the partner departure, the lost matter, the 2am billing reconciliation. This is where you earn trust by describing the exact thing the reader has lived through.]

$312,000

Annual operational waste identified in a 9-attorney firm

doing $4.2M in revenue

Source: Runwell Blueprint diagnostic, Q1 2026 · Anonymized client data

Where the waste actually hides

[Transition. Most owners think the leak is in one place; it's usually distributed across three. Preview the three subsections below.]

1. Intake leakage, the $127K that never got a consult

[Body. Show the math. Lead volume × response-time decay × conversion rate × average matter value. Cite the Clio Legal Trends Report where relevant.]

Compliance note

Automated intake for law firms must respect ABA Model Rule 1.18 (duties to prospective clients) and your state bar's conflict-of-interest rules. Never use a chatbot to give legal advice, never form an attorney-client relationship without human review, and never route PHI through a non-BAA vendor. See our compliance-safe intake playbook.

2. Broken handoffs, the $84K that died between paralegal and partner

[Body. The handoff audit. How to instrument it. Where the data actually lives in Clio / MyCase.]

Expert perspective

"I've defended malpractice cases for 18 years, and I can tell you the broken handoff is the #1 root cause of missed statutes of limitations. An instrumented handoff is a malpractice-insurance argument that carriers now credit."

Jane Doe, Esq.

ABA-certified ethics attorney · 18 years in legal malpractice defense.

3. Billing evaporation, the $101K that never made it to the invoice

[Body. Timekeeper drift. The 6-minute rounding problem. How calendar + email reconciliation recovers 8–14% of billable hours without a new hire.]

Waste category        Annual cost   % of revenue   Fix complexity
Intake leakage        $127,000      3.0%           Medium
Broken handoffs       $84,000       2.0%           High
Billing evaporation   $101,000      2.4%           Low
Total                 $312,000      7.4%

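
The percentages in the table follow directly from the three category figures and the $4.2M revenue of the example firm. The short script below reproduces that arithmetic using only numbers from this article; the variable names are just for illustration.

```python
REVENUE = 4_200_000  # annual revenue of the example firm

waste = {
    "Intake leakage": 127_000,
    "Broken handoffs": 84_000,
    "Billing evaporation": 101_000,
}

total = sum(waste.values())
shares = {category: cost / REVENUE for category, cost in waste.items()}

for category, cost in waste.items():
    print(f"{category:<20} ${cost:>8,}  {shares[category]:.1%}")
print(f"{'Total':<20} ${total:>8,}  {total / REVENUE:.1%}")
```

Swapping in your own revenue and category estimates gives the same breakdown for your firm.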
See your number

Want to run this math on your own firm?

The Runwell Automation Scorecard is 10 questions. Takes 3 minutes. Returns a score out of 100 plus your top 3 operational gaps. Free. No email required to see the result.

Take the free scorecard →

The compliance-safe build (21 days, four automations)

[Body. The rollout plan. Day 1–7: instrument. Day 8–14: build. Day 15–21: adoption.]

"The firm didn't have a tech problem. It had a handoff problem pretending to be a tech problem. We didn't replace the stack. We replaced the handoffs."

Observation from the diagnostic

What we did NOT automate (and why)

[This is the E-E-A-T money section. Show judgment. Two things you didn't automate because they shouldn't be automated: conflict checks final approval, partner-level trust conversations. Respecting the craft is what separates Runwell from "automation agencies."]

Do NOT automate this

Do not automate the final conflict-check approval. Automate the data gathering and the cross-referencing, but the judgment call about whether something is or is not a conflict must remain with a licensed attorney. Any vendor telling you otherwise is one malpractice suit away from a state bar referral.

21-day results

[Body. The numbers. The adoption rate. What the team said.]

Pitfalls to avoid

  • Pitfall 1. [Describe pitfall + why it happens + how to avoid.]
  • Pitfall 2. [Describe pitfall + why it happens + how to avoid.]
  • Pitfall 3. [Describe pitfall + why it happens + how to avoid.]

Next steps

[Close. Restate the stakes. Offer the soft-conversion wedge (Scorecard). The hard-conversion wedge (Blueprint) comes from the FAQ + final CTA below, not here.]

Felicia Cristofaro

Founder, Runwell · Formerly operator of a 7-figure land investment business

Felicia built Runwell by dogfooding her own method: after firing herself from intake on a $1M+ land-flipping operation, she productized the method for professional-firm owners in the $500K–$10M band. Runwell has diagnosed ops waste in law firms, CPA firms, RIAs, dental practices, and real estate teams.

Questions we get after this article

What does this actually cost?

Four entry points:

  • Runwell OS, $97/mo. Playbooks, templates, and monthly founder Q&A. Self-serve support.
  • The Blueprint, $2,497 one-time. 44-question diagnostic, 25 to 40 page report, plus a personalized video walkthrough. Delivered in 48 to 72 hours.
  • The Architect, $7,500. The Blueprint plus implementation specs, strategy session, and vendor introductions.
  • The Engine. Done-for-you build priced at 10 to 20% of the recoverable annual value your Blueprint surfaces. Guaranteed to pay for itself within 90 days or we refund and rebuild.

How long does a full build take?

The Blueprint diagnostic is 48–72 hours. A typical build is 4–8 weeks end-to-end, depending on the number of automations and the complexity of your current stack. We work in 2-week sprints with weekly demos so you see progress the entire way. No "we'll show you in month 3" surprises.

What if my team won't adopt the new system?

Change management is inside scope, not an add-on. We interview the people who will actually use the system before designing it, run a 2-week shadow phase where both the old and new workflow coexist, and train on camera with the team's actual cases. Adoption failure is the single biggest reason automation projects die, so we built the whole methodology around it. If adoption falls below 80% in the first 30 days, it's on us to fix.

How do I know if my firm is big enough (or too big) for this?

Runwell is built for firms doing $500K to $10M in revenue. Below $500K, the ops leak usually isn't large enough to justify a done-for-you build, and we'd point you to Runwell OS at $97/mo instead. Above $10M, you likely need a full-time ops leader plus a systems integrator, and we can make an introduction. The sweet spot sits between $1M and $5M, where owner-as-bottleneck is most acute.

What if my firm is in a regulated industry (law, CPA, healthcare, RIA)?

Regulated firms are our core market. Every build path we publish is pre-reviewed against the relevant rules (ABA Model Rules, AICPA SSTS, HIPAA at 45 CFR Part 164, SEC/FINRA), and we refuse builds that create compliance risk. When the build touches regulated data, we run it past a licensed expert on your side (your ethics counsel, your compliance officer) before anything goes live. Non-regulated automation vendors can cost you a bar complaint or an OCR audit. We'd rather lose the deal.

Do I have to learn Zapier, Make, or n8n to use what you build?

No. You receive a working system and a one-page "what to do if something breaks" runbook. Your team uses the tools they already use (Clio, Karbon, Follow Up Boss, whatever). The automation layer is invisible to them. You own the automation workspace, and we document everything so any competent operator can maintain it. You can also hire us on Runwell Ops for light monthly support if you prefer.

How is this different from just hiring an operations consultant or fractional COO?

Three differences: (1) We dollarize the waste before quoting a fix, so the build price anchors against a defensible number, not an hourly rate. (2) We ship working automations, not decks and SOPs. (3) We guarantee the outcome. If the recovered waste doesn't pay for the build within 90 days, we refund or rebuild. Most fractional COOs advise. We implement. Most consultants document. We deploy.

This article was reviewed for compliance by [Expert Name], [Credential] on [Review Date].