Fine-grained tool streaming
Tool use now supports fine-grained streaming for parameter values. This allows developers to stream tool use parameters without buffering / JSON validation, reducing the latency to begin receiving large parameters.
Fine-grained tool streaming is a beta feature. Please make sure to evaluate your responses before using it in production.
Please use this form to provide feedback on the quality of the model responses, the API itself, or the quality of the documentation—we cannot wait to hear from you!
When using fine-grained tool streaming, you may potentially receive invalid or partial JSON inputs. Please make sure to account for these edge cases in your code.
How to use fine-grained tool streaming
To use this beta feature, simply add the beta header fine-grained-tool-streaming-2025-05-14
to a tool use request and turn on streaming.
Here’s an example of how to use fine-grained tool streaming with the API:
In this example, fine-grained tool streaming enables Claude to stream the lines of a long poem into the tool call make_file
without buffering to validate if the lines_of_text
parameter is valid JSON. This means you can see the parameter stream as it arrives, without having to wait for the entire parameter to buffer and validate.
With fine-grained tool streaming, tool use chunks start streaming faster, and are often longer and contain fewer word breaks. This is due to differences in chunking behavior.
Example:
Without fine-grained streaming (15s delay):
With fine-grained streaming (3s delay):
Because fine-grained streaming sends parameters without buffering or JSON validation, there is no guarantee that the resulting stream will complete in a valid JSON string.
Particularly, if the stop reason max_tokens
is reached, the stream may end midway through a parameter and may be incomplete. You will generally have to write specific support to handle when max_tokens
is reached.