Computer use is currently in beta and requires a beta header:
"computer-use-2025-01-24"
(Claude 4 models and Claude Sonnet 3.7)"computer-use-2024-10-22"
(Claude Sonnet 3.5 (deprecated))
Overview
Computer use is a beta feature that enables Claude to interact with desktop environments. This tool provides:- Screenshot capture: See what’s currently displayed on screen
- Mouse control: Click, drag, and move the cursor
- Keyboard input: Type text and use keyboard shortcuts
- Desktop automation: Interact with any application or interface
Model compatibility
Computer use is available for the following Claude models:Model | Tool Version | Beta Flag |
---|---|---|
Claude 4 models | computer_20250124 | computer-use-2025-01-24 |
Claude Sonnet 3.7 | computer_20250124 | computer-use-2025-01-24 |
Claude Sonnet 3.5 v2 (deprecated) | computer_20241022 | computer-use-2024-10-22 |
Claude 4 models use updated tool versions optimized for the new architecture. Claude Sonnet 3.7 introduces additional capabilities including the thinking feature for more insight into the model’s reasoning process.
Security considerations
Computer use is a beta feature with unique risks distinct from standard API features. These risks are heightened when interacting with the internet. To minimize risks, consider taking precautions such as:
- Use a dedicated virtual machine or container with minimal privileges to prevent direct system attacks or accidents.
- Avoid giving the model access to sensitive data, such as account login information, to prevent information theft.
- Limit internet access to an allowlist of domains to reduce exposure to malicious content.
- Ask a human to confirm decisions that may result in meaningful real-world consequences as well as any tasks requiring affirmative consent, such as accepting cookies, executing financial transactions, or agreeing to terms of service.
Computer use reference implementation
Get started quickly with our computer use reference implementation that includes a web interface, Docker container, example tool implementations, and an agent loop.Note: The implementation has been updated to include new tools for both Claude 4 models and Claude Sonnet 3.7. Be sure to pull the latest version of the repo to access these new features.
Please use this form to provide
feedback on the quality of the model responses, the API itself, or the quality
of the documentation - we cannot wait to hear from you!
Quick start
Here’s how to get started with computer use:Beta header requirements:
- Claude 4 models and Claude Sonnet 3.7: Beta header only required for the computer use tool
- Claude Sonnet 3.5 (deprecated): Beta header required for computer, bash, and text editor tools
How computer use works
1. Provide Claude with the computer use tool and a user prompt
- Add the computer use tool (and optionally other tools) to your API request.
- Include a user prompt that requires desktop interaction, e.g., “Save a picture of a cat to my desktop.”
2. Claude decides to use the computer use tool
- Claude assesses if the computer use tool can help with the user’s query.
- If yes, Claude constructs a properly formatted tool use request.
- The API response has a
stop_reason
oftool_use
, signaling Claude’s intent.
3. Extract tool input, evaluate the tool on a computer, and return results
- On your end, extract the tool name and input from Claude’s request.
- Use the tool on a container or Virtual Machine.
- Continue the conversation with a new
user
message containing atool_result
content block.
4. Claude continues calling computer use tools until it's completed the task
- Claude analyzes the tool results to determine if more tool use is needed or the task has been completed.
- If Claude decides it needs another tool, it responds with another
tool_use
stop_reason
and you should return to step 3. - Otherwise, it crafts a text response to the user.
The computing environment
Computer use requires a sandboxed computing environment where Claude can safely interact with applications and the web. This environment includes:- Virtual display: A virtual X11 display server (using Xvfb) that renders the desktop interface Claude will see through screenshots and control with mouse/keyboard actions.
- Desktop environment: A lightweight UI with window manager (Mutter) and panel (Tint2) running on Linux, which provides a consistent graphical interface for Claude to interact with.
- Applications: Pre-installed Linux applications like Firefox, LibreOffice, text editors, and file managers that Claude can use to complete tasks.
- Tool implementations: Integration code that translates Claude’s abstract tool requests (like “move mouse” or “take screenshot”) into actual operations in the virtual environment.
- Agent loop: A program that handles communication between Claude and the environment, sending Claude’s actions to the environment and returning the results (screenshots, command outputs) back to Claude.
- Receives Claude’s tool use requests
- Translates them into actions in your computing environment
- Captures the results (screenshots, command outputs, etc.)
- Returns these results to Claude
How to implement computer use
Start with our reference implementation
We have built a reference implementation that includes everything you need to get started quickly with computer use:- A containerized environment suitable for computer use with Claude
- Implementations of the computer use tools
- An agent loop that interacts with the Claude API and executes the computer use tools
- A web interface to interact with the container, agent loop, and tools.
Understanding the multi-agent loop
The core of computer use is the “agent loop” - a cycle where Claude requests tool actions, your application executes them, and returns results to Claude. Here’s a simplified example:When using the computer use tool, you must include the appropriate beta flag for your model version:
Note: For Claude 4 models and Claude Sonnet 3.7, the beta flag is only required for the computer use tool. For Claude Sonnet 3.5 (deprecated), the beta flag is required for computer, bash, and text editor tools.
Claude 4 models
Claude 4 models
When using
computer_20250124
, include this beta flag:Claude Sonnet 3.7
Claude Sonnet 3.7
When using
computer_20250124
, include this beta flag:Claude Sonnet 3.5 v2 (deprecated)
Claude Sonnet 3.5 v2 (deprecated)
When using
computer_20241022
, include this beta flag:Optimize model performance with prompting
Here are some tips on how to get the best quality outputs:- Specify simple, well-defined tasks and provide explicit instructions for each step.
- Claude sometimes assumes outcomes of its actions without explicitly checking their results. To prevent this you can prompt Claude with
After each step, take a screenshot and carefully evaluate if you have achieved the right outcome. Explicitly show your thinking: "I have evaluated step X..." If not correct, try again. Only when you confirm a step was executed correctly should you move on to the next one.
- Some UI elements (like dropdowns and scrollbars) might be tricky for Claude to manipulate using mouse movements. If you experience this, try prompting the model to use keyboard shortcuts.
- For repeatable tasks or UI interactions, include example screenshots and tool calls of successful outcomes in your prompt.
- If you need the model to log in, provide it with the username and password in your prompt inside xml tags like
<robot_credentials>
. Using computer use within applications that require login increases the risk of bad outcomes as a result of prompt injection. Please review our guide on mitigating prompt injections before providing the model with login credentials.
If you repeatedly encounter a clear set of issues or know in advance the tasks
Claude will need to complete, use the system prompt to provide Claude with
explicit tips or instructions on how to do the tasks successfully.
System prompts
When one of the Anthropic-defined tools is requested via the Claude API, a computer use-specific system prompt is generated. It’s similar to the tool use system prompt but starts with:You have access to a set of functions you can use to answer the user’s question. This includes access to a sandboxed computing environment. You do NOT currently have the ability to inspect files or interact with external resources, except by invoking the below functions.As with regular tool use, the user-provided
system_prompt
field is still respected and used in the construction of the combined system prompt.
Available actions
The computer use tool supports these actions: Basic actions (all versions)- screenshot - Capture the current display
- left_click - Click at coordinates
[x, y]
- type - Type text string
- key - Press key or key combination (e.g., “ctrl+s”)
- mouse_move - Move cursor to coordinates
computer_20250124
)
Available in Claude 4 models and Claude Sonnet 3.7:
- scroll - Scroll in any direction with amount control
- left_click_drag - Click and drag between coordinates
- right_click, middle_click - Additional mouse buttons
- double_click, triple_click - Multiple clicks
- left_mouse_down, left_mouse_up - Fine-grained click control
- hold_key - Hold a key while performing other actions
- wait - Pause between actions
Example actions
Example actions
Tool parameters
Parameter | Required | Description |
---|---|---|
type | Yes | Tool version (computer_20250124 or computer_20241022 ) |
name | Yes | Must be “computer” |
display_width_px | Yes | Display width in pixels |
display_height_px | Yes | Display height in pixels |
display_number | No | Display number for X11 environments |
Keep display resolution at or below 1280x800 (WXGA) for best performance. Higher resolutions may cause accuracy issues due to image resizing.
Important: The computer use tool must be explicitly executed by your application - Claude cannot execute it directly. You are responsible for implementing the screenshot capture, mouse movements, keyboard inputs, and other actions based on Claude’s requests.
Enable thinking capability in Claude 4 models and Claude Sonnet 3.7
Claude Sonnet 3.7 introduced a new “thinking” capability that allows you to see the model’s reasoning process as it works through complex tasks. This feature helps you understand how Claude is approaching a problem and can be particularly valuable for debugging or educational purposes. To enable thinking, add athinking
parameter to your API request:
budget_tokens
parameter specifies how many tokens Claude can use for thinking. This is subtracted from your overall max_tokens
budget.
When thinking is enabled, Claude will return its reasoning process as part of the response, which can help you:
- Understand the model’s decision-making process
- Identify potential issues or misconceptions
- Learn from Claude’s approach to problem-solving
- Get more visibility into complex multi-step operations
Augmenting computer use with other tools
The computer use tool can be combined with other tools to create more powerful automation workflows. This is particularly useful when you need to:- Execute system commands (bash tool)
- Edit configuration files or scripts (text editor tool)
- Integrate with custom APIs or services (custom tools)
Build a custom computer use environment
The reference implementation is meant to help you get started with computer use. It includes all of the components needed have Claude use a computer. However, you can build your own environment for computer use to suit your needs. You’ll need:- A virtualized or containerized environment suitable for computer use with Claude
- An implementation of at least one of the Anthropic-defined computer use tools
- An agent loop that interacts with the Claude API and executes the
tool_use
results using your tool implementations - An API or UI that allows user input to start the agent loop
Implement the computer use tool
The computer use tool is implemented as a schema-less tool. When using this tool, you don’t need to provide an input schema as with other tools; the schema is built into Claude’s model and can’t be modified.1
Set up your computing environment
Create a virtual display or connect to an existing display that Claude will interact with. This typically involves setting up Xvfb (X Virtual Framebuffer) or similar technology.
2
Implement action handlers
Create functions to handle each action type that Claude might request:
3
Process Claude's tool calls
Extract and execute tool calls from Claude’s responses:
4
Implement the agent loop
Create a loop that continues until Claude completes the task:
Handle errors
When implementing the computer use tool, various errors may occur. Here’s how to handle them:Screenshot capture failure
Screenshot capture failure
If screenshot capture fails, return an appropriate error message:
Invalid coordinates
Invalid coordinates
If Claude provides coordinates outside the display bounds:
Action execution failure
Action execution failure
If an action fails to execute:
Follow implementation best practices
Use appropriate display resolution
Use appropriate display resolution
Set display dimensions that match your use case while staying within recommended limits:
- For general desktop tasks: 1024x768 or 1280x720
- For web applications: 1280x800 or 1366x768
- Avoid resolutions above 1920x1080 to prevent performance issues
Implement proper screenshot handling
Implement proper screenshot handling
When returning screenshots to Claude:
- Encode screenshots as base64 PNG or JPEG
- Consider compressing large screenshots to improve performance
- Include relevant metadata like timestamp or display state
Add action delays
Add action delays
Some applications need time to respond to actions:
Validate actions before execution
Validate actions before execution
Check that requested actions are safe and valid:
Log actions for debugging
Log actions for debugging
Keep a log of all actions for troubleshooting:
Understand computer use limitations
The computer use functionality is in beta. While Claude’s capabilities are cutting edge, developers should be aware of its limitations:- Latency: the current computer use latency for human-AI interactions may be too slow compared to regular human-directed computer actions. We recommend focusing on use cases where speed isn’t critical (e.g., background information gathering, automated software testing) in trusted environments.
- Computer vision accuracy and reliability: Claude may make mistakes or hallucinate when outputting specific coordinates while generating actions. Claude Sonnet 3.7 introduces the thinking capability that can help you understand the model’s reasoning and identify potential issues.
- Tool selection accuracy and reliability: Claude may make mistakes or hallucinate when selecting tools while generating actions or take unexpected actions to solve problems. Additionally, reliability may be lower when interacting with niche applications or multiple applications at once. We recommend that users prompt the model carefully when requesting complex tasks.
- Scrolling reliability: While Claude Sonnet 3.5 v2 (deprecated) had limitations with scrolling, Claude Sonnet 3.7 introduces dedicated scroll actions with direction control that improves reliability. The model can now explicitly scroll in any direction (up/down/left/right) by a specified amount.
- Spreadsheet interaction: Mouse clicks for spreadsheet interaction have improved in Claude Sonnet 3.7 with the addition of more precise mouse control actions like
left_mouse_down
,left_mouse_up
, and new modifier key support. Cell selection can be more reliable by using these fine-grained controls and combining modifier keys with clicks. - Account creation and content generation on social and communications platforms: While Claude will visit websites, we are limiting its ability to create accounts or generate and share content or otherwise engage in human impersonation across social media websites and platforms. We may update this capability in the future.
- Vulnerabilities: Vulnerabilities like jailbreaking or prompt injection may persist across frontier AI systems, including the beta computer use API. In some circumstances, Claude will follow commands found in content, sometimes even in conflict with the user’s instructions. For example, Claude instructions on webpages or contained in images may override instructions or cause Claude to make mistakes. We recommend: a. Limiting computer use to trusted environments such as virtual machines or containers with minimal privileges b. Avoiding giving computer use access to sensitive accounts or data without strict oversight c. Informing end users of relevant risks and obtaining their consent before enabling or requesting permissions necessary for computer use features in your applications
- Inappropriate or illegal actions: Per Anthropic’s terms of service, you must not employ computer use to violate any laws or our Acceptable Use Policy.
Pricing
Computer use follows the standard tool use pricing. When using the computer use tool: System prompt overhead: The computer use beta adds 466-499 tokens to the system prompt Computer use tool token usage:Model | Input tokens per tool definition |
---|---|
Claude 4 / Sonnet 3.7 | 735 tokens |
Claude Sonnet 3.5 (deprecated) | 683 tokens |
- Screenshot images (see Vision pricing)
- Tool execution results returned to Claude
If you’re also using bash or text editor tools alongside computer use, those tools have their own token costs as documented in their respective pages.