Computer is an agent-only capability. Because of its sensitive nature, it cannot be attached to MCP gateways.
The Computer capability gives agents full programmatic control of a desktop environment. It is useful for complex tasks that require visual feedback and interactive control.
Key Features
- Screenshots - Capture current screen state
- Mouse Control - Move, click, drag operations
- Keyboard Input - Type text, press keys
- Screen Navigation - Scroll, zoom, multi-window
- Real-time Feedback - Visual loop with agent decisions
- State Tracking - Remember screen positions
How to Enable
For Agents Only
- Agents > Select Agent > Settings > Capabilities
- Search for Computer
- Click Attach
- Save
Computer cannot be attached to MCP gateways. It’s agent-exclusive for security reasons.
Usage Examples
Take Screenshot
"Take a screenshot of the current desktop"
Returns a PNG image of the desktop, along with its dimensions and visible elements.
Click on Element
"Click the blue button in the top-right corner"
Agent:
- Analyzes screenshot
- Identifies button coordinates
- Executes click
- Captures new screenshot
Fill Form
"Fill the login form with username 'alice' and password 'secret'"
Agent:
- Screenshots form
- Identifies input fields
- Clicks on username field
- Types username
- Clicks on password field
- Types password
- Clicks submit button
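The form-filling steps above can be sketched as an ordered list of action payloads. This is a minimal sketch, not the platform's client: the `screenshot` and `click` payload shapes follow the API Access examples in this document, while the `text` field on the `type` action, the helper name `plan_login`, the field coordinates, and the closing verification screenshot are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def plan_login(username: str, password: str,
               user_field: Point, pass_field: Point, submit: Point) -> List[Dict]:
    """Return the ordered actions for filling a two-field login form."""

    def click(p: Point) -> Dict:
        return {"action": "click", "x": p[0], "y": p[1]}

    def type_text(text: str) -> Dict:
        # ASSUMPTION: "text" as the field name is not a documented schema.
        return {"action": "type", "text": text}

    return [
        {"action": "screenshot"},              # see the form first
        click(user_field), type_text(username),
        click(pass_field), type_text(password),
        click(submit),
        {"action": "screenshot"},              # assumed extra step: verify the result
    ]
```

In practice the agent derives the three coordinates from the first screenshot; they are passed in here so the sequence itself stays easy to read.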
Navigate Multi-step Process
"Open the application, go to settings, and change the theme to dark"
The agent works through the steps one at a time, using a fresh screenshot after each action as visual feedback.
Screen Coordinates
Agent receives screen coordinates for all elements:
Screenshot resolution: 1920x1080
Button position: x=1850, y=20
Agent can:
- Click at coordinates
- Drag between points
- Identify text positions
- Calculate relative positions
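Calculating relative positions usually comes down to simple pixel arithmetic. The sketch below is illustrative only: `center` and `scale_point` are hypothetical helper names, and it assumes coordinates are measured in pixels from the top-left corner of the screenshot.

```python
from typing import Tuple


def center(left: int, top: int, width: int, height: int) -> Tuple[int, int]:
    """Center point of an element's bounding box, in pixels."""
    return (left + width // 2, top + height // 2)


def scale_point(x: int, y: int,
                src: Tuple[int, int], dst: Tuple[int, int]) -> Tuple[int, int]:
    """Map a point from a src-resolution screenshot to a dst-resolution screen."""
    return (round(x * dst[0] / src[0]), round(y * dst[1] / src[1]))


# A 100x40 button whose top-left corner is at (1800, 0) is centered at (1850, 20).
button = center(1800, 0, 100, 40)

# The same point on a 1280x720 rendering of the 1920x1080 screenshot.
scaled = scale_point(*button, src=(1920, 1080), dst=(1280, 720))
```

Here `button` is `(1850, 20)`, matching the example position above, and `scaled` is `(1233, 13)`.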
Interaction Types
Mouse Actions
- click(x, y) - Single click
- double_click(x, y) - Double click
- right_click(x, y) - Right/context click
- drag(x1, y1, x2, y2) - Drag from point to point
- move(x, y) - Move cursor without clicking
- scroll(direction, amount) - Scroll up/down/left/right
Keyboard Actions
- type(text) - Type text string
- key(name) - Press single key (Enter, Tab, Escape, etc.)
- hotkey(mod, key) - Keyboard shortcut (Ctrl+C, Cmd+V, etc.)
Navigation
- screenshot() - Capture current screen
- wait(seconds) - Wait for page to load
- maximize() - Maximize window
- minimize() - Minimize window
Size Tiers
Each Computer instance is a dedicated virtual machine. Choose a size based on your workload:
| Size | vCPUs | RAM | Disk |
|---|---|---|---|
| x2 | 2 | 2 GB | 40 GB |
| x4 (default) | 3 | 4 GB | 80 GB |
| x8 | 4 | 8 GB | 160 GB |
| x16 | 8 | 16 GB | 240 GB |
| x32 | 16 | 32 GB | 360 GB |
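Choosing a size can be reduced to picking the smallest tier that meets a workload's requirements. The following is a minimal sketch that copies the values from the table above; the `smallest_size` helper is hypothetical and not part of the platform.

```python
from typing import Optional

# (name, vCPUs, RAM in GB, disk in GB), smallest first, copied from the table.
SIZES = [
    ("x2", 2, 2, 40),
    ("x4", 3, 4, 80),
    ("x8", 4, 8, 160),
    ("x16", 8, 16, 240),
    ("x32", 16, 32, 360),
]


def smallest_size(vcpus: int = 1, ram_gb: int = 1, disk_gb: int = 1) -> Optional[str]:
    """Return the smallest tier meeting all three minimums, or None if none does."""
    for name, cpu, ram, disk in SIZES:
        if cpu >= vcpus and ram >= ram_gb and disk >= disk_gb:
            return name
    return None
```

For example, a workload that needs 6 GB of RAM resolves to `x8`, while one that needs 500 GB of disk has no matching tier and returns `None`.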
Supported Operating Systems
- Ubuntu 24.04, Ubuntu 22.04
- Debian 12, Debian 11
Configuration
Optional settings in the agent specification:
{
  "computer": {
    "shell_enabled": true,
    "browser_enabled": false,
    "browser_max_tabs": 5,
    "browser_max_download_mb": 50,
    "browser_allowed_domains": []
  }
}
| Setting | Default | Effect |
|---|---|---|
| shell_enabled | true | Enable shell command execution |
| browser_enabled | false | Enable browser subsystem |
| browser_max_tabs | 5 | Maximum concurrent browser tabs |
| browser_max_download_mb | 50 | Maximum download size (MB) |
| browser_allowed_domains | [] | Domain allowlist (empty = all allowed) |
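Because every setting is optional, a specification only needs to list the values that differ from the defaults. The sketch below shows one way to merge overrides onto the documented defaults before submitting a specification. The `resolve_config` helper is hypothetical, and rejecting unknown keys or mismatched types is an assumption about sensible client-side validation, not documented platform behavior.

```python
from typing import Any, Dict

# Defaults copied from the settings table.
DEFAULTS: Dict[str, Any] = {
    "shell_enabled": True,
    "browser_enabled": False,
    "browser_max_tabs": 5,
    "browser_max_download_mb": 50,
    "browser_allowed_domains": [],
}


def resolve_config(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge overrides onto the defaults and wrap them in a "computer" block."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown computer settings: {sorted(unknown)}")
    for key, value in overrides.items():
        # Strict type comparison so that True is not accepted where 5 is expected.
        if type(value) is not type(DEFAULTS[key]):
            raise TypeError(f"{key} must be {type(DEFAULTS[key]).__name__}")
    merged = {**DEFAULTS, **overrides}
    # Copy the list so callers never share the module-level default.
    merged["browser_allowed_domains"] = list(merged["browser_allowed_domains"])
    return {"computer": merged}
```

Calling `resolve_config({"browser_enabled": True})` turns on the browser subsystem and leaves the other four settings at their defaults.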
Browser Subsystem
Computer includes an optional stateful browser that persists sessions, cookies, and navigation state across tool calls. This is disabled by default — set browser_enabled: true to activate it.
When the browser subsystem is enabled, the agent gains these tools:
| Tool | Purpose |
|---|---|
| browser_navigate | Navigate to a URL |
| browser_snapshot | Get the current page’s accessibility tree |
| browser_act | Interact with page elements (click, type, select) |
| browser_screenshot | Capture a screenshot of the current page |
| browser_pdf | Generate a PDF of the current page |
| browser_tabs | List open browser tabs |
| browser_close | Close a browser tab |
Stateful vs Stateless Browser
Key difference: The Computer browser subsystem maintains state (cookies, login sessions, tabs) across calls. The standalone Browser capability is stateless — each call starts fresh.
| Feature | Computer Browser (stateful) | Browser Capability (stateless) |
|---|---|---|
| Login persistence | Stays logged in across calls | Each call is a fresh session |
| Multi-step workflows | Navigate across pages, fill multi-step forms | Single-page operations only |
| Tabs | Multiple tabs, switch between them | No tab management |
| Domain control | Allowlist specific domains | No domain restrictions |
| Availability | Agent-only | Agents and MCP gateways |
Domain Allowlist
Use browser_allowed_domains to restrict which sites the browser can visit. An empty list (default) allows all domains. When set, navigation to domains not in the list is blocked.
{
  "browser_allowed_domains": ["example.com", "app.internal.com"]
}
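The allowlist rule can be sketched as a simple hostname check, which is handy for testing a list before deploying it. This is an illustration, not the platform's implementation: the `is_allowed` helper is hypothetical, and treating subdomains of a listed domain as allowed is an assumption, since the matching rules are not specified here.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_allowed(url: str, allowed_domains: Iterable[str]) -> bool:
    """Return True if navigation to url would pass the allowlist."""
    allowed = [d.lower().rstrip(".") for d in allowed_domains]
    if not allowed:
        # An empty list means no restriction.
        return True
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    # ASSUMPTION: subdomains of a listed domain are allowed as well.
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Comparing whole hostname labels rather than raw string suffixes keeps a lookalike such as `evil-example.com` from matching `example.com`.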
Resource Limits
| Limit | Value | Notes |
|---|---|---|
| Default SSH Timeout | 30 seconds | Per command execution |
| Browser Snapshot | 500 elements max | Truncated if exceeded |
| Browser Max Tabs | 5 | Per session |
| Browser Max Download | 50 MB | Per file |
Cost
For current pricing details, see Pricing.
Monitor usage in the Account > Usage dashboard.
Common Use Cases
Web Application Testing
Screenshot app, verify buttons, test workflows
Automation
Automate repetitive UI tasks programmatically
Data Entry
Fill forms and navigate multi-step processes
Visual Inspection
Verify visual appearance matches requirements
Agent Loop Pattern
Typical agent workflow:
- Screenshot - See current state
- Analyze - LLM processes image
- Decide - LLM decides next action
- Execute - Perform mouse/keyboard action
- Repeat - Loop until task complete
Each iteration adds the latest screenshot analysis to the LLM context, so the agent sees the result of its previous action before choosing the next one.
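The loop above can be sketched in a few lines. This is a minimal sketch under stated assumptions: `take_screenshot`, `decide`, and `execute` are hypothetical callables standing in for the capture step, the LLM call, and the mouse/keyboard action, and the `max_steps` cap is an added safeguard rather than a documented setting.

```python
from typing import Callable, Dict, Optional


def run_loop(take_screenshot: Callable[[], bytes],
             decide: Callable[[bytes], Optional[Dict]],
             execute: Callable[[Dict], None],
             max_steps: int = 20) -> int:
    """Run screenshot -> decide -> execute until decide() returns None.

    Returns the number of actions executed.
    """
    for step in range(max_steps):
        image = take_screenshot()      # Screenshot: see current state
        action = decide(image)         # Analyze + Decide: LLM picks the next action
        if action is None:             # None signals that the task is complete
            return step
        execute(action)                # Execute: perform the mouse/keyboard action
    raise RuntimeError(f"task not complete after {max_steps} steps")
```

Capping the number of iterations keeps a confused agent from looping indefinitely, at the cost of failing tasks that legitimately need more steps.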
Best Practices
Start with Screenshot
Always capture initial state before taking actions.
Be Explicit
Give the agent clear, specific instructions:
✓ "Click the 'Save' button (blue, bottom-right)"
✗ "Save the file"
Handle Errors
If an action doesn’t work as expected, ask the agent to check the result:
"Screenshot again to verify action completed"
Use Coordinates When Possible
Provide coordinates directly when known:
"Click at coordinates (1850, 50)"
Wait for State Changes
Allow time for UI updates:
"Wait 2 seconds for dialog to load"
"Take screenshot to verify"
Limitations
- Desktop/Web only - Works with rendered interfaces
- Not for APIs - Use HTTP Client for APIs
- Visual interpretation - Relies on screenshot analysis
- Speed - Slower than direct API calls
- Flakiness - UI changes can break workflows
When NOT to Use
| Task | Use Instead |
|---|---|
| API access | HTTP Client |
| Quick calculations | Code Runner |
| File operations | Files |
| Database access | HTTP Client (via REST) |
Troubleshooting
Screenshot is blank
- Wait for page to load
- Check window is focused
- Verify viewport size is correct
Click doesn’t work
- Coordinates may be off
- Element may not be clickable
- Try right-clicking instead
- Screenshot again to verify state
Text not entered
- Field may not be focused
- Type more slowly
- Use keyboard navigation (Tab)
- Copy-paste if typing fails
Agent stuck in loop
- Break task into smaller steps
- Increase wait times
- Provide more explicit instructions
- Use timeout to stop execution
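Rather than raising fixed wait times, one option is to poll for a visible change with a hard timeout. The sketch below assumes a hypothetical `take_screenshot` callable that returns the raw image bytes. Comparing bytes is deliberately crude: a blinking cursor or an on-screen clock will also register as a change.

```python
import time
from typing import Callable


def wait_for_change(take_screenshot: Callable[[], bytes], before: bytes,
                    timeout: float = 5.0, interval: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the screenshot differs from `before` or the timeout elapses.

    Returns True if a change was seen, False on timeout.
    """
    waited = 0.0
    while True:
        if take_screenshot() != before:
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

The `sleep` parameter exists so the function can be exercised in tests without real delays.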
Privacy & Security
The Computer capability has broad system access. Use it only for trusted tasks.
- Screenshots may contain sensitive data
- Keyboard input is sent exactly as typed, including passwords and other secrets
- No automatic filtering of credentials
- Use with caution in production
Best practices:
- Use dedicated user accounts
- Limit to non-sensitive applications
- Monitor screen capture content
- Disable in production where possible
API Access
# Execute computer action
curl -X POST https://api.noorle.com/v1/agents/{agent_id}/computer \
  -H "X-API-Key: ak-{your_key}" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "screenshot"
  }'
# Click action
curl -X POST https://api.noorle.com/v1/agents/{agent_id}/computer \
  -H "X-API-Key: ak-{your_key}" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "click",
    "x": 100,
    "y": 100
  }'
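The same requests can be issued from Python using only the standard library. This is a sketch under stated assumptions: the endpoint, headers, and payloads mirror the curl examples above, while the helper names `build_request` and `send`, the 30-second client timeout, and returning the raw response bytes are illustrative choices, since the response format is not described here.

```python
import json
import urllib.request

API_BASE = "https://api.noorle.com/v1"


def build_request(agent_id: str, api_key: str, action: dict) -> urllib.request.Request:
    """Build a POST request for a single computer action (nothing is sent yet)."""
    return urllib.request.Request(
        url=f"{API_BASE}/agents/{agent_id}/computer",
        data=json.dumps(action).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Placeholder agent ID and key; substitute real values before calling send().
click = build_request("agent-123", "ak-your-key", {"action": "click", "x": 100, "y": 100})
```

Separating request construction from sending makes the payload easy to inspect or log before anything touches the desktop.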
Next Steps