Latency
An important priority for us is latency, especially in voice-based conversations. When a user is left waiting in silence, the exchange feels unnatural and awkward.
The key areas that cause latency in a Voiceflow AI Agent are asynchronous steps, where we wait for a long-running action to finish before moving on to the next part of the conversation.
While some things are outside of Voiceflow’s control—like how long it takes your API to respond in a function or API step—we can work on improving the overhead around them.
JavaScript step improvements
Executing “untrusted” code is tricky. Bad actors can write malicious code and potentially access sensitive data.
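To make the problem concrete, here is a minimal sketch of what isolating user code can look like, using Node's built-in vm module. This is purely illustrative and not a description of our actual runtime; notably, the vm module on its own is not a complete security boundary, which is part of what makes this hard.

const vm = require('node:vm');

// User-authored step code only sees a context containing the agent's
// variables, not the host process or its secrets.
const context = vm.createContext({ sessions: 10 });
const userCode = new vm.Script('sessions += 5');

// A timeout guards against runaway code such as infinite loops.
userCode.runInContext(context, { timeout: 50 });

console.log(context.sessions); // 15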
Luckily, we’ve explored a new solution for how we run JavaScript (JS) code behind the scenes that allows us to be both more secure and more performant. Nothing needs to change on the user end to access these improvements. Here are some benchmarks [1] (n=5):
simple_arithmetic
represents the typical use case of JS steps, where we assign a new value to a variable. While a difference of ~30ms is largely imperceptible during a conversation, it can compound into a more noticeable difference when running multiple JavaScript steps in a turn.
fibonacci
represents more complex calculations and logic, along with larger assignments. There is a notable improvement here from 868ms to 51ms.
JavaScript steps created after July 30th, 2024 will automatically use the new system. We’ll slowly convert existing JavaScript steps to the new system as well. No action is needed from Voiceflow users.
For more information: https://docs.voiceflow.com/changelog/changes-to-javascript-step-behavior
The old JavaScript step was also subject to cold starts, which could occasionally add an extra 300ms to your calls. Cold starts no longer apply with the new step.
Function step improvements
Functions are a critical part of Voiceflow’s ecosystem that ensure reusability across agents.
Function speed
We’ve noted that loading the code to run a function can often take a significant amount of time. This varies depending on the size of the code:
- for a small function of ~15 lines: 40-100ms
- for a function that contains 5 paragraphs of “lorem-ipsum” as a string: 100-150ms
If you have several large functions in sequence, this load time can drastically slow a user down.
We’ve introduced new caching methods that reduce the load-in time of functions, with the times mentioned above now forming the high end of the range. In most cases, code now loads in 0ms, so we see an improvement of 50-150ms.
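The exact caching implementation isn't the focus of this post, but as a rough sketch (the names and structure below are illustrative, not our production code), the idea is to memoize the compiled code so that repeat executions skip the load-and-compile step entirely:

const vm = require('node:vm');

// Hypothetical in-memory cache keyed by a function ID.
const compiledCache = new Map();

function getCompiledScript(functionId, sourceCode) {
  // Compile once; later calls reuse the compiled script, so the
  // 40-150ms load-and-compile cost is only paid on the first run.
  let script = compiledCache.get(functionId);
  if (!script) {
    script = new vm.Script(sourceCode);
    compiledCache.set(functionId, script);
  }
  return script;
}

function runFunctionStep(functionId, sourceCode, variables) {
  const context = vm.createContext({ ...variables });
  getCompiledScript(functionId, sourceCode).runInContext(context, { timeout: 1000 });
  return context; // the variables after the step has run
}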
Function cold starts
Voiceflow employs a runtime monitoring system in which test agents with very simple designs are called every minute, allowing us to ensure their output responses are consistent with what we expect.
One test agent exclusively runs a very simple function:
Here is what the performance of the entire DM API call looks like across one week:
*this is the time of the entire DM API call, not just the function in isolation
On average (P50), the entire Dialog Manager API call (not just the function in isolation) takes just ~150ms. But very occasionally, 0.01% of the time (the dark blue P99.99 spikes), it climbs to 3-4 seconds. If your agent is handling 10,000 requests, roughly 1 request may take far longer than expected, and this scales up with the number of people using your agent. These spikes were mostly due to our use of AWS Lambda to run our functions, which carries a high cold-start penalty.
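To make the percentile figures concrete, here is a simplified sketch (not our actual monitoring pipeline) of how P50 and P99.99 are read off a set of latency samples:

// Nearest-rank percentile over a list of request latencies in milliseconds.
function percentile(latencies, p) {
  const sorted = [...latencies].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// 10,000 fast requests around 150ms, plus a few 3-4s cold-start spikes.
const latencies = Array.from({ length: 10000 }, () => 140 + Math.random() * 20);
for (let i = 0; i < 3; i++) latencies.push(3000 + Math.random() * 1000);

console.log(percentile(latencies, 50));    // ~150ms: the typical request
console.log(percentile(latencies, 99.99)); // ~3-4s: the rare cold-start tail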
If someone is left waiting on a phone call or in a chat for 3 seconds when you’re expecting 150 milliseconds, it’s not ideal. Our priority is to make extreme quality the default and put in the extra effort to limit that 0.01%.
We’re reducing the cold-start time by optimizing our Lambdas [2]:
- old function cold start average (n=7) — 2866.12ms
- new function cold start average — 522.18ms
While 1 in 10,000 calls to a function might still take ~0.5s longer, that’s a far better outcome than 3 seconds.
Our continuing goal of reducing these spikes includes additional provisioning measures so that the frequency drops even further, with a target of 0.5s for 1 in 100,000 calls.
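As a hedged example of what such provisioning can look like on AWS Lambda (the function name, alias, and count below are hypothetical, not our exact configuration), provisioned concurrency keeps a pool of execution environments initialized so that requests routed to them never pay the cold-start penalty:

const { LambdaClient, PutProvisionedConcurrencyConfigCommand } = require('@aws-sdk/client-lambda');

async function keepFunctionRuntimeWarm() {
  const client = new LambdaClient({ region: 'us-east-1' });
  // Keep 10 execution environments initialized for this alias so requests
  // routed to them skip the cold start entirely.
  await client.send(new PutProvisionedConcurrencyConfigCommand({
    FunctionName: 'function-step-runtime', // hypothetical function name
    Qualifier: 'live',                     // the alias/version to keep warm
    ProvisionedConcurrentExecutions: 10,
  }));
}

keepFunctionRuntimeWarm().catch(console.error);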
Here is what the performance of the DM API call across one week looks like after these changes:
*this is the time of the entire DM API call, not just the function in isolation
We can see that P50 and P99.99 times are both drastically lower. P99.9 time is under 1 second.
API step improvements
API step cold starts
Our API step uses the same system as Functions, so the same cold start logic applies to API steps as well. We’ll be reducing latency from 3 seconds at P99.99 to 0.5 seconds.
Consistency and next steps
We’ve also increased our level of logging to better understand low-level networking issues that can occasionally cause requests to be dropped. Fixes have already been implemented, and further updates are planned.
Improving performance remains a top priority for Voiceflow. Our goal of providing exceptional automated customer experiences depends on stellar performance, and we can’t wait to share the ongoing and future measures we have planned to elevate the impact of your AI Agents.
References
[1] JavaScript step benchmarks
fibonacci:
function fibonacci(n) {
  if (n <= 1) {
    return n;
  }
  return fibonacci(n - 1) + fibonacci(n - 2);
}
sessions = fibonacci(30)
simple_arithmetic:
sessions += 5
New JS step:
Old JS step:
[2] Function step cold start benchmarks
Old function cold start:
New function cold start: