Workflows are multi-step, durable processes that coordinate agents and tools. Unlike single tool calls (which are ephemeral), workflows can be paused, resumed, and survive failures.

Problem Solved

Single tool calls are stateless:
Tool call: code_run("process_data")
  ├─ Network fails mid-execution
  └─ Lost state, lost progress
     Have to restart from scratch
Workflows are durable:
Workflow: DataPipeline
  ├─ Step 1: Read data (DONE)
  ├─ Step 2: Transform (FAILED, retrying)
  ├─ Step 3: Analyze (PENDING)
  └─ Step 4: Report (PENDING)

  If step 2 fails:
    └─ Automatic retry
       If still fails:
         └─ Alert operator
            Operator fixes and resumes

  State is persistent and recoverable
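To make the durability idea concrete, here is a minimal Python sketch. It is not the platform's implementation. It persists each step's status to a JSON checkpoint file, so a rerun skips steps already marked DONE and resumes at the one that failed. The step names, the checkpoint file, and the `run_pipeline` helper are all hypothetical.

```python
import json
import os
import tempfile

# Hypothetical step names modeled on the DataPipeline diagram above.
STEPS = ["read_data", "transform", "analyze", "report"]


def load_checkpoint(path):
    """Return the saved {step: status} map, or an empty map on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # the rename is atomic on POSIX when it succeeds


def run_pipeline(path, handlers):
    """Run every step not yet marked DONE, persisting progress after each one."""
    state = load_checkpoint(path)
    for step in STEPS:
        if state.get(step) == "DONE":
            continue  # completed in an earlier run; do not redo it
        handlers[step]()  # may raise; steps already recorded as DONE stay DONE
        state[step] = "DONE"
        save_checkpoint(path, state)
    return state
```

Suppose `transform` raises. The checkpoint still records `read_data` as DONE, so the next call to `run_pipeline` starts at `transform` instead of from scratch.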

Workflow Architecture

Workflow Nodes

Agent Invocation Node

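Call an agent to perform a task. In the JSON definition below, an agent node names the agent (agent_id), describes the work (task), and names the node to run afterward (next).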

Tool Call Node

Directly call a capability, such as code_run, without going through an agent.

Approval Node

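Wait for human approval. The node pauses until someone from its approvers list approves or rejects. It then follows approved_branch or rejected_branch. timeout_hours limits how long it waits.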

Conditional Branch Node

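Route based on data: evaluate the condition expression and continue to true_branch if it holds, otherwise to false_branch.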

Wait Node

Delay execution for a period before continuing to the next node.
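One way to model these node types is as typed records whose fields mirror the keys in the JSON definition later on this page. This is a hedged Python sketch. The class names and `parse_node` are hypothetical. The wait node's `seconds` field and its `"wait"` type string are assumptions, because this page shows no wait-node example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class AgentNode:
    id: str
    agent_id: str
    task: str
    next: Optional[str] = None


@dataclass
class ToolNode:
    id: str
    tool: str
    language: Optional[str] = None
    code: Optional[str] = None
    next: Optional[str] = None


@dataclass
class ApprovalNode:
    id: str
    message: str
    approved_branch: str
    rejected_branch: str
    timeout_hours: Optional[int] = None
    approvers: List[str] = field(default_factory=list)


@dataclass
class ConditionalNode:
    id: str
    condition: str
    true_branch: str
    false_branch: str


@dataclass
class WaitNode:
    id: str
    seconds: int  # hypothetical field: no wait-node schema is documented here
    next: Optional[str] = None


Node = Union[AgentNode, ToolNode, ApprovalNode, ConditionalNode, WaitNode]

NODE_TYPES = {
    "agent": AgentNode,
    "tool": ToolNode,
    "approval": ApprovalNode,
    "conditional": ConditionalNode,
    "wait": WaitNode,  # assumed type string
}


def parse_node(raw: dict) -> Node:
    """Build a typed node from one entry of a definition's "nodes" list."""
    fields = dict(raw)
    cls = NODE_TYPES[fields.pop("type")]
    return cls(**fields)  # an unknown key raises TypeError, catching typos early
```

Parsing into typed records catches a misspelled key when the definition loads, before the workflow runs.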

State Machine

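Each workflow execution has a global state (for example, RUNNING or COMPLETED), and each node within it has its own status (PENDING, RUNNING, SUCCESS, or FAILED).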
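The per-node statuses can be treated as a small state machine. The transition table below is an assumption inferred from the statuses shown on this page: a failed step moving back to RUNNING models a retry. Approval-specific states such as APPROVED are left out to keep the sketch short.

```python
# Statuses appear on this page; the allowed transitions are an assumption.
TRANSITIONS = {
    "PENDING": {"RUNNING"},
    "RUNNING": {"SUCCESS", "FAILED"},
    "FAILED": {"RUNNING"},  # a retry moves a failed node back to RUNNING
    "SUCCESS": set(),       # terminal: nothing follows success
}


class InvalidTransition(Exception):
    pass


def transition(state: dict, node: str, new_status: str) -> dict:
    """Return a copy of state with node moved to new_status, if allowed."""
    current = state[node]
    if new_status not in TRANSITIONS[current]:
        raise InvalidTransition(f"{node}: {current} -> {new_status}")
    updated = dict(state)  # leave the caller's state untouched
    updated[node] = new_status
    return updated
```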

Example: Loan Approval Workflow

Durability & Resilience

Workflows survive failures. Failed steps are retried automatically according to a retry policy:
Retry Policy:
  ├─ Exponential backoff: 1s, 2s, 4s, 8s, ...
  ├─ Max retries: 3
  ├─ Max duration: 24 hours
  └─ Alert on failure: true
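A minimal Python sketch of this policy follows. It assumes "Max retries: 3" means three retries after the first attempt, for four attempts in total. `run_with_retries` and its parameters are hypothetical. `sleep` and `clock` can be injected so the behavior is testable without real waiting, and `alert` stands in for whatever channel notifies an operator.

```python
import time


def backoff_delays(max_retries=3, base=1.0):
    """Delay before each retry, doubling each time: 1s, 2s, 4s, ..."""
    return [base * 2 ** attempt for attempt in range(max_retries)]


def run_with_retries(step, max_retries=3, base=1.0, max_duration=24 * 3600,
                     alert=print, sleep=time.sleep, clock=time.monotonic):
    """Call step() and retry on any exception with exponential backoff.

    Stops after max_retries retries, or once max_duration seconds have
    elapsed since the first attempt; then calls alert() and re-raises.
    """
    start = clock()
    delays = backoff_delays(max_retries, base)
    for attempt in range(max_retries + 1):
        try:
            return step()
        except Exception as exc:
            elapsed = clock() - start
            if attempt == max_retries or elapsed >= max_duration:
                alert(f"giving up after {attempt + 1} attempt(s): {exc}")
                raise
            sleep(delays[attempt])
```

With the defaults, a step that always fails waits 1s, 2s, and 4s between attempts, then alerts once and re-raises the last error.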

Workflow Definition

Workflows are defined in JSON or built with the visual designer:
{
  "name": "Loan Approval",
  "description": "Evaluate and approve loan applications",
  "nodes": [
    {
      "id": "collect",
      "type": "agent",
      "agent_id": "collector-agent",
      "task": "Collect application details",
      "next": "assess"
    },
    {
      "id": "assess",
      "type": "tool",
      "tool": "code_run",
      "language": "python",
      "code": "assess_risk(input['details'])",
      "next": "route"
    },
    {
      "id": "route",
      "type": "conditional",
      "condition": "input['risk_level'] == 'high'",
      "true_branch": "deny",
      "false_branch": "approve_or_review"
    },
    {
      "id": "approve_or_review",
      "type": "conditional",
      "condition": "input['risk_level'] == 'low'",
      "true_branch": "auto_approve",
      "false_branch": "human_approval"
    },
    {
      "id": "human_approval",
      "type": "approval",
      "message": "Review and approve loan?",
      "timeout_hours": 72,
      "approvers": ["loan-officer"],
      "approved_branch": "notify_approval",
      "rejected_branch": "notify_denial"
    },
    {
      "id": "auto_approve",
      "type": "agent",
      "agent_id": "notifier",
      "task": "Send approval notification",
      "next": "end"
    },
    {
      "id": "deny",
      "type": "agent",
      "agent_id": "notifier",
      "task": "Send denial notification",
      "next": "end"
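    },
    {
      "id": "notify_approval",
      "type": "agent",
      "agent_id": "notifier",
      "task": "Send approval notification",
      "next": "end"
    },
    {
      "id": "notify_denial",
      "type": "agent",
      "agent_id": "notifier",
      "task": "Send denial notification",
      "next": "end"
    }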
  ]
}
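To show how a definition like this drives execution, here is a minimal, non-durable Python interpreter for the agent, tool, conditional, and approval node types. It is a sketch under several assumptions, not the platform's engine:

- `execute`, `run_agent`, `run_tool`, and `ask_approval` are hypothetical names.
- Each node's outputs are merged into a shared context, which conditions see as `input`.
- Execution starts at the first node in the list.
- `"end"` is a terminal marker, and every branch target must name a node in the list or `end`.

```python
def evaluate(condition, context):
    # Conditions are Python expressions over `input`. Emptying __builtins__
    # is NOT a real sandbox: only evaluate trusted workflow definitions.
    return bool(eval(condition, {"__builtins__": {}}, {"input": context}))


def execute(definition, context, run_agent, run_tool, ask_approval):
    """Walk the node graph until "end" and return the list of node ids visited.

    run_agent and run_tool return a dict of outputs merged into context;
    ask_approval returns True (approved) or False (rejected).
    """
    nodes = {n["id"]: n for n in definition["nodes"]}
    current = definition["nodes"][0]["id"]
    path = []
    while current != "end":
        node = nodes[current]
        path.append(current)
        kind = node["type"]
        if kind == "agent":
            context.update(run_agent(node, context))
            current = node["next"]
        elif kind == "tool":
            context.update(run_tool(node, context))
            current = node["next"]
        elif kind == "conditional":
            taken = evaluate(node["condition"], context)
            current = node["true_branch"] if taken else node["false_branch"]
        elif kind == "approval":
            approved = ask_approval(node, context)
            current = node["approved_branch"] if approved else node["rejected_branch"]
        else:
            raise ValueError(f"unsupported node type: {kind}")
    return path
```

A "medium" risk level routes through both conditionals to the human approval node, matching the audit trail later on this page. A production engine would also persist state after each node and apply the retry policy; both are omitted here to keep the control flow visible.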

Workflow Execution & Monitoring

Starting a Workflow

POST /api/workflows/{workflow_id}/execute
{
  "context": {
    "applicant_id": "123",
    "amount": 50000
  }
}

Response:
{
  "execution_id": "exec-456",
  "status": "RUNNING",
  "current_node": "collect"
}
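A minimal Python sketch of starting an execution using only the standard library. The host in `BASE_URL` and the workflow ID are hypothetical. Authentication is omitted because this page does not describe it.

```python
import json
import urllib.request

BASE_URL = "https://workflows.example.com"  # hypothetical host


def build_execute_request(workflow_id, context):
    """Build (but do not send) POST /api/workflows/{workflow_id}/execute."""
    body = json.dumps({"context": context}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/workflows/{workflow_id}/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_workflow(workflow_id, context):
    """Send the request and return the decoded JSON response."""
    req = build_execute_request(workflow_id, context)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Against a real server, `start_workflow("loan-approval", {"applicant_id": "123", "amount": 50000})` would return a dict shaped like the response above, including its execution_id.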

Monitoring Progress

GET /api/workflows/exec-456

Response:
{
  "execution_id": "exec-456",
  "status": "RUNNING",
  "progress": {
    "collect": "SUCCESS",
    "assess": "RUNNING",
    "route": "PENDING",
    "human_approval": "PENDING"
  },
  "current_node": "assess",
  "started_at": "2024-03-22T10:00:00Z",
  "elapsed_seconds": 45
}
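A polling sketch in Python follows. `fetch_status` is any callable that returns the parsed monitoring response, for example by sending the GET request above. The set of terminal statuses is an assumption. Only RUNNING and COMPLETED appear on this page; FAILED and CANCELLED are inferred from the retry and cancel features.

```python
import time
from collections import Counter

# Assumed terminal statuses (only COMPLETED is shown on this page).
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_completion(fetch_status, interval=5.0, timeout=3600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the execution reaches a terminal status."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"still {status['status']} at node {status.get('current_node')}")
        sleep(interval)


def summarize(progress):
    """Count nodes per status, e.g. {'SUCCESS': 1, 'RUNNING': 1, 'PENDING': 2}."""
    return dict(Counter(progress.values()))
```

A fixed interval keeps the sketch simple. Because an execution can sit at an approval node for hours, a longer interval or a push notification may suit real use better.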

Manual Intervention

# Pause workflow
POST /api/workflows/exec-456/pause

# Resume workflow
POST /api/workflows/exec-456/resume

# Cancel workflow
POST /api/workflows/exec-456/cancel

# Approve pending approval node
POST /api/workflows/exec-456/nodes/human_approval/approve
{
  "decision": "approved",
  "reason": "Meets criteria"
}
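The same intervention calls can be built from Python with the standard library. This sketch assumes a hypothetical `BASE_URL`. Only the "approved" decision value appears on this page, so the helper passes the decision through without assuming other values.

```python
import json
import urllib.request

BASE_URL = "https://workflows.example.com"  # hypothetical host
ACTIONS = {"pause", "resume", "cancel"}


def control_request(execution_id, action):
    """Build POST /api/workflows/{execution_id}/{pause|resume|cancel}."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return urllib.request.Request(
        f"{BASE_URL}/api/workflows/{execution_id}/{action}", method="POST")


def approval_request(execution_id, node_id, decision, reason):
    """Build POST /api/workflows/{execution_id}/nodes/{node_id}/approve."""
    body = json.dumps({"decision": decision, "reason": reason}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/workflows/{execution_id}/nodes/{node_id}/approve",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Each request can then be sent with `urllib.request.urlopen`.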

Audit Trail

Every step is logged:
Execution Audit Trail:
└─ exec-456 (Loan Approval)

   ├─ 10:00:00 [collect] STARTED
   ├─ 10:00:23 [collect] SUCCESS
   │  └─ Output: {applicant details}

   ├─ 10:00:24 [assess] STARTED
   ├─ 10:00:45 [assess] SUCCESS
   │  └─ Output: {risk_level: "medium"}

   ├─ 10:00:46 [route] STARTED
   ├─ 10:00:46 [route] SUCCESS
   │  └─ Branch: approve_or_review

   ├─ 10:00:47 [approve_or_review] STARTED
   ├─ 10:00:47 [approve_or_review] SUCCESS
   │  └─ Branch: human_approval

   ├─ 10:00:48 [human_approval] STARTED
   │  └─ Waiting for approval from: loan-officer

   ├─ 13:45:00 [human_approval] APPROVED
   │  └─ Approved by: john@loanteam.com
   │  └─ Reason: "Meets criteria"

   ├─ 13:45:01 [notify_approval] STARTED
   ├─ 13:45:15 [notify_approval] SUCCESS

   └─ 13:45:16 [END] COMPLETED
      └─ Total duration: 3h 45m 16s
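A trail like this can be modeled as an append-only list of timestamped events. The Python sketch below is hypothetical: `AuditTrail` and `AuditEvent` are not part of any documented API. It shows how the total duration falls out of the first and last timestamps, since 13:45:16 minus 10:00:00 is 3h 45m 16s.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class AuditEvent:
    at: datetime
    node: str
    event: str  # e.g. STARTED, SUCCESS, APPROVED, COMPLETED
    detail: Optional[str] = None


@dataclass
class AuditTrail:
    execution_id: str
    events: List[AuditEvent] = field(default_factory=list)

    def record(self, at, node, event, detail=None):
        # record() only appends; nothing in this class edits or deletes past events.
        self.events.append(AuditEvent(at, node, event, detail))

    def duration(self) -> timedelta:
        """Time from the first recorded event to the last."""
        return self.events[-1].at - self.events[0].at

    def render(self) -> str:
        return "\n".join(f"{e.at:%H:%M:%S} [{e.node}] {e.event}" for e in self.events)
```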

Use Cases

Loan/Credit Approval

Multi-step verification with human checkpoints. Durable against failures.

Data Processing Pipelines

Extract → Transform → Load with automatic retries and recovery.

Content Moderation

Automated initial review + human escalation for edge cases.

Customer Onboarding

Collect info → Verify identity → Create account → Send welcome email.

Incident Response

Detect issue → Alert team → Collect diagnostic info → Auto-remediate → Follow up.

Best Practices

Clear Approval Gates

Use approval nodes for critical decisions. Set appropriate timeouts.

Error Paths

Define what happens on failure: automatic retry or manual intervention.

Audit Trail

All workflows generate complete audit logs for compliance.

Timeout Handling

Set timeouts on approval nodes. Escalate if there is no response within the timeout period.
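Here is a hedged Python sketch of a timeout check for a pending approval node. The 72-hour timeout matches the human_approval example. The early-warning threshold (`warn_fraction`) and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(ts):
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit +00:00 offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def approval_deadline(started_at, timeout_hours):
    """When a pending approval node times out."""
    return parse_utc(started_at) + timedelta(hours=timeout_hours)


def approval_timeout_status(started_at, timeout_hours, now=None, warn_fraction=0.75):
    """Return "ok", "warn" (reminder threshold passed) or "escalate" (timed out)."""
    now = now or datetime.now(timezone.utc)
    elapsed = now - parse_utc(started_at)
    limit = timedelta(hours=timeout_hours)
    if elapsed >= limit:
        return "escalate"
    if elapsed >= limit * warn_fraction:
        return "warn"
    return "ok"
```

A scheduler could run this check periodically for each waiting approval node. It would send a reminder on "warn" and escalate on "escalate".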
