Computer is an agent-only capability. Because of its sensitive nature, it cannot be attached to MCP gateways.
The Computer capability gives agents full programmatic control of a desktop environment, which makes it useful for complex tasks that require visual feedback and interactive control.

Key Features

  • Screenshots - Capture current screen state
  • Mouse Control - Move, click, drag operations
  • Keyboard Input - Type text, press keys
  • Screen Navigation - Scroll, zoom, multi-window
  • Real-time Feedback - Visual loop with agent decisions
  • State Tracking - Remember screen positions

How to Enable

For Agents Only

  1. Go to Agents > select an agent > Settings > Capabilities
  2. Search for Computer
  3. Click Attach
  4. Click Save
Computer cannot be attached to MCP gateways. It’s agent-exclusive for security reasons.

Usage Examples

Take Screenshot

"Take a screenshot of the current desktop"
Returns a PNG image of the desktop, along with its dimensions and visible elements.

Click on Element

"Click the blue button in the top-right corner"
The agent:
  1. Analyzes the screenshot
  2. Identifies the button's coordinates
  3. Executes the click
  4. Captures a new screenshot to confirm the result
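The click-then-verify sequence (steps 3 and 4) can also be driven directly through the REST endpoint documented under API Access. This is a minimal sketch: `AGENT_ID` and `API_KEY` are placeholders you must set, `DRY_RUN` is a hypothetical switch added here to print request bodies instead of sending them, and because the response format is not documented on this page, responses are simply echoed.

```bash
#!/usr/bin/env bash
# Placeholders: set these to your real agent ID and API key.
AGENT_ID="${AGENT_ID:-your-agent-id}"
API_KEY="${API_KEY:-ak-your-key}"

# Send one action body to the Computer endpoint.
# DRY_RUN is a hypothetical convenience for this sketch: when it is set,
# the JSON body is printed instead of being sent.
post_action() {
  local body="$1"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$body"
    return 0
  fi
  curl -sS -X POST "https://api.noorle.com/v1/agents/${AGENT_ID}/computer" \
    -H "X-API-Key: ${API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$body"
}

# Steps 3 and 4: click at (x, y), then capture a fresh screenshot to confirm.
click_and_verify() {
  local x="$1" y="$2"
  post_action "{\"action\":\"click\",\"x\":${x},\"y\":${y}}"
  post_action '{"action":"screenshot"}'
}
```

For example, `DRY_RUN=1 click_and_verify 1850 20` prints the two request bodies without contacting the API.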

Fill Form

"Fill the login form with username 'alice' and password 'secret'"
The agent:
  1. Takes a screenshot of the form
  2. Identifies the input fields
  3. Clicks the username field
  4. Types the username
  5. Clicks the password field
  6. Types the password
  7. Clicks the submit button
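The steps above map onto a sequence of action requests. This sketch only prints the request bodies, one per line. The field coordinates are hypothetical stand-ins for what the agent reads from the screenshot in steps 1 and 2, and the `type` body's `text` field is an assumption, since this page documents only the `screenshot` and `click` bodies.

```bash
#!/usr/bin/env bash
# Escape backslashes and double quotes so text can sit inside a JSON string.
# (Minimal: control characters such as newlines are not handled.)
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print the form-fill steps as one JSON action body per line.
# Coordinates are hypothetical; the "text" field name is assumed.
fill_login_form() {
  local user="$1" pass="$2"
  printf '%s\n' '{"action":"click","x":960,"y":400}'   # step 3: username field
  printf '{"action":"type","text":"%s"}\n' "$(json_escape "$user")"
  printf '%s\n' '{"action":"click","x":960,"y":460}'   # step 5: password field
  printf '{"action":"type","text":"%s"}\n' "$(json_escape "$pass")"
  printf '%s\n' '{"action":"click","x":960,"y":530}'   # step 7: submit button
}
```

Escaping matters as soon as a password contains a quote or backslash: without it, the request body would no longer be valid JSON.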

Multi-step Task

"Open the application, go to settings, and change the theme to dark"
The agent works through each step, checking its progress with screenshots.

Screen Coordinates

The agent receives screen coordinates for all visible elements, for example:
Screenshot resolution: 1920x1080
Button position: x=1850, y=20
With these coordinates, the agent can:
  • Click at coordinates
  • Drag between points
  • Identify text positions
  • Calculate relative positions
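Calculating relative positions is plain arithmetic on these pixel values. The sketch below uses hypothetical helper names (they are not part of the Computer API): `box_center` picks a click point at the center of an element's bounding box, and `percent_to_pixels` converts a percentage position into pixels for a screen of a given size.

```bash
#!/usr/bin/env bash
# Center of an element's bounding box (left, top, right, bottom), in pixels.
# Integer division rounds down, which is fine for choosing a click point.
box_center() {
  local left="$1" top="$2" right="$3" bottom="$4"
  echo "$(( (left + right) / 2 )) $(( (top + bottom) / 2 ))"
}

# Convert a relative position (percent of width and height) to pixels.
percent_to_pixels() {
  local px="$1" py="$2" width="$3" height="$4"
  echo "$(( px * width / 100 )) $(( py * height / 100 ))"
}
```

For example, `box_center 1800 0 1900 40` prints `1850 20`, the button position shown above, and `percent_to_pixels 50 50 1920 1080` prints `960 540`, the center of the screen.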

Interaction Types

Mouse Actions

  • click(x, y) - Single click
  • double_click(x, y) - Double click
  • right_click(x, y) - Right/context click
  • drag(x1, y1, x2, y2) - Drag from point to point
  • move(x, y) - Move cursor without clicking
  • scroll(direction, amount) - Scroll up/down/left/right
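Because every mouse action takes absolute pixel coordinates, a cheap safeguard is to reject points that fall outside the screen before sending anything. This sketch assumes the 1920x1080 resolution from the example above; the `drag` body's field names mirror `drag(x1, y1, x2, y2)` but are an assumption, as this page does not document the REST body for drag.

```bash
#!/usr/bin/env bash
# Screen size from the example screenshot; adjust to your VM's resolution.
SCREEN_W=1920
SCREEN_H=1080

# Succeeds if (x, y) lies on screen (0-based pixel coordinates).
on_screen() {
  local x="$1" y="$2"
  (( x >= 0 && x < SCREEN_W && y >= 0 && y < SCREEN_H ))
}

# Build a drag request body after checking both endpoints.
# Field names x1, y1, x2, y2 are assumed, not documented.
drag_body() {
  local x1="$1" y1="$2" x2="$3" y2="$4"
  if ! on_screen "$x1" "$y1" || ! on_screen "$x2" "$y2"; then
    echo "drag endpoint is off screen" >&2
    return 1
  fi
  printf '{"action":"drag","x1":%d,"y1":%d,"x2":%d,"y2":%d}\n' "$x1" "$y1" "$x2" "$y2"
}
```

Failing early on an off-screen point gives a clearer error than a click or drag that silently lands on nothing.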

Keyboard Actions

  • type(text) - Type a text string
  • key(name) - Press a single key (Enter, Tab, Escape, etc.)
  • hotkey(mod, key) - Press a keyboard shortcut (Ctrl+C, Cmd+V, etc.)

Other Actions

  • screenshot() - Capture the current screen
  • wait(seconds) - Wait, for example for a page to load
  • maximize() - Maximize the window
  • minimize() - Minimize the window
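To make the `hotkey(mod, key)` signature concrete, this sketch splits a shortcut string such as `ctrl+shift+t` into its modifier part and its final key. The JSON field names `mod` and `key` are assumptions taken from the signature, not a documented request format.

```bash
#!/usr/bin/env bash
# Split "modifier(s)+key" at the last "+" and print a hotkey request body.
# The "mod" and "key" field names are assumed, not documented.
hotkey_body() {
  local combo="$1"
  local key="${combo##*+}"   # text after the last "+"
  local mod="${combo%+*}"    # text before the last "+"
  if [ "$mod" = "$combo" ]; then
    echo "expected modifier+key, got: $combo" >&2
    return 1
  fi
  printf '{"action":"hotkey","mod":"%s","key":"%s"}\n' "$mod" "$key"
}
```

A bare key such as `enter` is rejected here, because single keys belong to `key(name)` rather than `hotkey`.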

Size Tiers

Each Computer instance is a dedicated virtual machine. Choose a size based on your workload:
| Size | vCPUs | RAM | Disk |
|---|---|---|---|
| x2 | 2 | 2 GB | 40 GB |
| x4 (default) | 3 | 4 GB | 80 GB |
| x8 | 4 | 8 GB | 160 GB |
| x16 | 8 | 16 GB | 240 GB |
| x32 | 16 | 32 GB | 360 GB |

Supported Operating Systems

  • Ubuntu 24.04, Ubuntu 22.04
  • Debian 12, Debian 11

Configuration

Optional settings in the agent specification:
{
  "computer": {
    "shell_enabled": true,
    "browser_enabled": false,
    "browser_max_tabs": 5,
    "browser_max_download_mb": 50,
    "browser_allowed_domains": []
  }
}
| Setting | Default | Effect |
|---|---|---|
| shell_enabled | true | Enables shell command execution |
| browser_enabled | false | Enables the browser subsystem |
| browser_max_tabs | 5 | Maximum concurrent browser tabs |
| browser_max_download_mb | 50 | Maximum download size (MB) |
| browser_allowed_domains | [] | Domain allowlist (empty = all domains allowed) |

Browser Subsystem

Computer includes an optional stateful browser that persists sessions, cookies, and navigation state across tool calls. It is disabled by default; set browser_enabled: true to activate it.

Browser Tools

When the browser subsystem is enabled, the agent gains these tools:
| Tool | Purpose |
|---|---|
| browser_navigate | Navigate to a URL |
| browser_snapshot | Get the current page's accessibility tree |
| browser_act | Interact with page elements (click, type, select) |
| browser_screenshot | Capture a screenshot of the current page |
| browser_pdf | Generate a PDF of the current page |
| browser_tabs | List open browser tabs |
| browser_close | Close a browser tab |

Stateful vs Stateless Browser

Key difference: The Computer browser subsystem maintains state (cookies, login sessions, tabs) across calls. The standalone Browser capability is stateless — each call starts fresh.
| Feature | Computer Browser (stateful) | Browser Capability (stateless) |
|---|---|---|
| Login persistence | Stays logged in across calls | Each call is a fresh session |
| Multi-step workflows | Navigate across pages, fill multi-step forms | Single-page operations only |
| Tabs | Multiple tabs, switch between them | No tab management |
| Domain control | Allowlist specific domains | No domain restrictions |
| Availability | Agent-only | Agents and MCP gateways |

Domain Allowlist

Use browser_allowed_domains to restrict which sites the browser can visit. An empty list (default) allows all domains. When set, navigation to domains not in the list is blocked.
{
  "computer": {
    "browser_enabled": true,
    "browser_allowed_domains": ["example.com", "app.internal.com"]
  }
}
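The documented rule is simple: an empty list allows everything, otherwise only listed domains are reachable. This sketch approximates it client-side, for example to check URLs before handing them to the agent. It assumes exact, case-insensitive host matching; this page does not say whether an entry such as `example.com` also covers subdomains like `www.example.com`. The `${var,,}` lowercasing requires bash 4 or later.

```bash
#!/usr/bin/env bash
# Usage: url_allowed URL [DOMAIN...]
# Succeeds if URL's host is in the allowlist, or if the allowlist is empty.
# Exact host matching is an assumption; subdomain handling is undocumented.
url_allowed() {
  local url="$1"; shift
  local allowed=("$@")
  if (( ${#allowed[@]} == 0 )); then
    return 0                 # empty list: all domains allowed
  fi

  local host="${url#*://}"   # drop the scheme
  host="${host%%/*}"         # drop the path
  host="${host%%:*}"         # drop the port
  host="${host,,}"           # hostnames are case-insensitive

  local domain
  for domain in "${allowed[@]}"; do
    if [ "$host" = "${domain,,}" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, `url_allowed "https://example.com/login" example.com app.internal.com && echo allowed` prints `allowed`.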

Resource Limits

| Limit | Value | Notes |
|---|---|---|
| Default SSH Timeout | 30 seconds | Per command execution |
| Browser Snapshot | 500 elements max | Truncated if exceeded |
| Browser Max Tabs | 5 | Per session |
| Browser Max Download | 50 MB | Per file |

Cost

For current pricing, see Pricing. Monitor usage in the Account > Usage dashboard.

Common Use Cases

Web Application Testing

Screenshot the app, verify buttons, and test workflows

Automation

Automate repetitive UI tasks programmatically

Data Entry

Fill forms and navigate multi-step processes

Visual Inspection

Verify that the visual appearance matches requirements

Agent Loop Pattern

Typical agent workflow:
  1. Screenshot - See current state
  2. Analyze - LLM processes image
  3. Decide - LLM decides next action
  4. Execute - Perform mouse/keyboard action
  5. Repeat - Loop until task complete
Each iteration feeds the latest screenshot back into the LLM's context, so the agent sees the result of each action before choosing the next one.
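The loop can be outlined in shell. This is only a skeleton: `observe`, `decide`, and `act` are hypothetical hooks standing in for the screenshot call, the LLM request, and the mouse or keyboard action, and the `max_steps` cap is an added safeguard so a confused agent cannot loop forever.

```bash
#!/usr/bin/env bash
# Hypothetical hooks: replace these stubs with real screenshot, LLM, and action calls.
observe() { echo "screenshot"; }   # 1. Screenshot: capture current state
decide()  { echo "done"; }         # 2-3. Analyze and decide: print next action, or "done"
act()     { :; }                   # 4. Execute the chosen action

# 5. Repeat until decide reports "done" or max_steps iterations have run.
agent_loop() {
  local max_steps="$1" step state action
  for (( step = 1; step <= max_steps; step++ )); do
    state="$(observe)"
    action="$(decide "$state")"
    if [ "$action" = "done" ]; then
      echo "finished after ${step} step(s)"
      return 0
    fi
    act "$action"
  done
  echo "gave up after ${max_steps} step(s)" >&2
  return 1
}
```

With the stub `decide` above, `agent_loop 5` prints `finished after 1 step(s)`; a `decide` that never returns `done` makes the loop stop with a non-zero status after `max_steps` iterations.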

Best Practices

Start with Screenshot

Always capture initial state before taking actions.

Be Explicit

Give the agent clear, specific instructions:
✓ "Click the 'Save' button (blue, bottom-right)"
✗ "Save the file"

Handle Errors

If an action doesn’t work as expected, ask the agent to check:
"Screenshot again to verify action completed"

Use Coordinates When Possible

Provide coordinates directly when known:
"Click at coordinates (1850, 50)"

Wait for State Changes

Allow time for UI updates:
"Wait 2 seconds for dialog to load"
"Take screenshot to verify"
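When scripting rather than prompting, a fixed wait can be replaced by polling with a deadline, which returns as soon as the UI is ready and still gives up eventually. A minimal sketch: `wait_until` is a hypothetical helper, and the condition command you pass it (for example, one that inspects a fresh screenshot) is up to you.

```bash
#!/usr/bin/env bash
# Usage: wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until it succeeds (returns 0)
# or TIMEOUT seconds have elapsed (returns 1).
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}
```

For example, `wait_until 10 1 test -e /tmp/export.csv` waits up to 10 seconds for a file to appear.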

Limitations

  • Desktop/Web only - Works with rendered interfaces
  • Not for APIs - Use HTTP Client for APIs
  • Visual interpretation - Relies on screenshot analysis
  • Speed - Slower than direct API calls
  • Flakiness - UI changes can break workflows

When NOT to Use

| Task | Use Instead |
|---|---|
| API access | HTTP Client |
| Quick calculations | Code Runner |
| File operations | Files |
| Database access | HTTP Client (via REST) |

Troubleshooting

Screenshot is blank

  • Wait for page to load
  • Check window is focused
  • Verify viewport size is correct

Click doesn’t work

  • Coordinates may be off
  • Element may not be clickable
  • Try right-clicking instead
  • Screenshot again to verify state

Text not entered

  • Field may not be focused
  • Type more slowly
  • Use keyboard navigation (Tab)
  • Copy-paste if typing fails

Agent stuck in loop

  • Break task into smaller steps
  • Increase wait times
  • Provide more explicit instructions
  • Use timeout to stop execution

Privacy & Security

Computer capability has broad system access. Only use with trusted tasks.
  • Screenshots may contain sensitive data
  • Keyboard input is not masked; every typed character, including passwords, passes through
  • No automatic filtering of credentials
  • Use with caution in production
Best practices:
  • Use dedicated user accounts
  • Limit to non-sensitive applications
  • Monitor screen capture content
  • Disable in production where possible

API Access

# Take a screenshot
curl -X POST https://api.noorle.com/v1/agents/{agent_id}/computer \
  -H "X-API-Key: ak-{your_key}" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "screenshot"
  }'

# Click action
curl -X POST https://api.noorle.com/v1/agents/{agent_id}/computer \
  -H "X-API-Key: ak-{your_key}" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "click",
    "x": 100,
    "y": 100
  }'

Next Steps