Harness the power of OpenAI’s Whisper model for ASR with Voiceflow

Voice assistants are becoming increasingly popular because they give users an efficient and intuitive way to interact with applications. With the advent of large language models (LLMs) like OpenAI’s GPT series, they have also become better at understanding longer, more complex user inputs and generating responses to them.

This quick project, the Voiceflow ASR Demo, uses OpenAI’s Whisper model for automatic speech recognition (ASR) without the need for an external API. Because the ASR service runs in a Docker container, you can host it locally or on your own server, which makes for a more versatile and customizable solution.

What’s the idea?

As users interact with LLM-powered voice assistants, they tend to provide longer and more complex utterances. This is beneficial because it gives the assistant more context to generate better answers. The idea then is to use the Whisper model for ASR without relying on an external API, offering you more control and customization options while keeping your data in-house.

What is the Voiceflow ASR Demo?

The Voiceflow ASR Demo is a test page that demonstrates ASR capabilities using OpenAI’s Whisper model. The project consists of a simple webpage that captures audio from the user’s microphone, sends it to your custom endpoint, and displays the transcribed text and the time it took to render the transcription.
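The flow described above (send audio to an endpoint, then show the text and the elapsed time) can be sketched in a few lines. This is a minimal sketch, not the demo's actual code: it assumes the ASR webservice accepts a multipart upload in an `audio_file` field on `POST /asr`, that the `task` and `output` query parameters are supported, and that the JSON response carries a `text` field. The names `buildAsrUrl` and `transcribe` are hypothetical. It relies on the global `fetch` and `FormData` available in browsers and in Node.js 18 or later.

```javascript
// Assumption: the ASR webservice is reachable at this address.
const ASR_BASE_URL = "http://localhost:9000";

// Build the request URL. The /asr path and the `task` and `output`
// query parameters are assumptions about the webservice's API.
function buildAsrUrl(baseUrl, { task = "transcribe", output = "json" } = {}) {
  const url = new URL("/asr", baseUrl);
  url.searchParams.set("task", task);
  url.searchParams.set("output", output);
  return url.toString();
}

// Upload one recording and measure how long the round trip takes.
// `audioBlob` is a Blob, for example the output of a MediaRecorder.
async function transcribe(audioBlob, baseUrl = ASR_BASE_URL) {
  const form = new FormData();
  // Assumption: the service reads the upload from the `audio_file` field.
  form.append("audio_file", audioBlob, "recording.webm");

  const startedAt = performance.now();
  const response = await fetch(buildAsrUrl(baseUrl), {
    method: "POST",
    body: form,
  });
  if (!response.ok) {
    throw new Error(`ASR request failed with status ${response.status}`);
  }
  const result = await response.json();
  const elapsedMs = Math.round(performance.now() - startedAt);

  // Assumption: the JSON response exposes the transcript as `text`.
  return { text: result.text, elapsedMs };
}
```

Calling `transcribe(blob)` resolves to an object holding the transcript and the elapsed milliseconds, which is all the page needs to display. In the browser you would likely point `baseUrl` at the proxy rather than at the container directly.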

Setting up the Voiceflow ASR Demo

To get started, you'll need Node.js and Docker installed on your machine. Follow these steps to set up the demo:

Clone the repository: git clone https://github.com/voiceflow-gallagan/whisper-asr-demo.git

Change to the project directory: cd whisper-asr-demo

Install the required dependencies: npm install

Pull and run the Docker container for the ASR webservice: docker run -d -p 9000:9000 -e ASR_MODEL=base.en onerahmet/openai-whisper-asr-webservice:latest

Start the proxy server: npm start

Now, the proxy server should be running at http://localhost:3000. Open the index.html file in your browser to test the ASR demo.

Using the Voiceflow ASR Demo

Click the “Start Recording” button to start capturing audio from your microphone.

Speak into your microphone.

Recording stops automatically after a set period of silence (2 seconds by default), or you can stop it manually by clicking the “Stop Recording” button.

The transcribed text and the time it took to render the transcription will be displayed on the page.
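The automatic stop on silence can be approximated by tracking the signal level frame by frame. The sketch below is one possible approach, not necessarily how the demo implements it: it computes the root-mean-square (RMS) level of each audio frame and reports a stop once the level has stayed under a threshold for the configured duration. The names `rms` and `createSilenceDetector` are hypothetical, the `0.01` threshold is an illustrative guess, and the 2000 ms default mirrors the 2-second default mentioned above.

```javascript
// Root-mean-square level of one audio frame (samples in the range -1..1).
function rms(samples) {
  if (samples.length === 0) return 0;
  let sumOfSquares = 0;
  for (const sample of samples) sumOfSquares += sample * sample;
  return Math.sqrt(sumOfSquares / samples.length);
}

// Returns a function to call once per audio frame. It returns true once
// the level has stayed below `threshold` for at least `silenceMs`.
function createSilenceDetector({ threshold = 0.01, silenceMs = 2000 } = {}) {
  let silentSince = null; // timestamp of the first quiet frame in a row

  return function shouldStop(samples, nowMs) {
    if (rms(samples) >= threshold) {
      silentSince = null; // speech detected, restart the silence timer
      return false;
    }
    if (silentSince === null) silentSince = nowMs;
    return nowMs - silentSince >= silenceMs;
  };
}
```

In a browser, the frames could come from a Web Audio `AnalyserNode` via `getFloatTimeDomainData`, with `performance.now()` as the timestamp. The threshold usually needs tuning for the microphone and the room.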

Do more with OpenAI's Whisper model

This demo should be a good starting point for using OpenAI’s Whisper model for ASR in your Voiceflow voice assistants in an efficient, customizable way. By running the model in a local or server-hosted Docker container, you avoid relying on external APIs and keep greater control over your data.

Thanks to Ahmet Oner for sharing the whisper-asr-webservice we're using in this demo. Check out that project for more information, including details on how to use a different model.

Key features

Start and stop recording with a button
Auto-end recording after a specified duration of silence
Runs the ASR webservice locally in a Docker container
Uses a proxy to avoid CORS issues
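To illustrate why a proxy helps with CORS: a page opened from a file or another origin cannot call the ASR container directly unless the response carries the right headers, so a small server in between can add them and forward the request. The following is a rough sketch using only Node.js built-ins (Node.js 18 or later for the global `fetch`), not the code shipped in the repository. The ports match the setup described in this post, while `corsHeaders`, `upstreamUrl`, and `startProxy` are hypothetical names.

```javascript
// Assumption: the ASR container listens here; the proxy itself uses port 3000.
const ASR_TARGET = "http://localhost:9000";

// Headers that let a page served from another origin call the proxy.
function corsHeaders(origin = "*") {
  return {
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
  };
}

// Map an incoming proxy path (with query string) to the upstream URL.
function upstreamUrl(requestPath, target = ASR_TARGET) {
  return new URL(requestPath, target).toString();
}

async function startProxy(port = 3000) {
  const http = await import("node:http");
  const server = http.createServer(async (req, res) => {
    // Answer the browser's preflight request without contacting the service.
    if (req.method === "OPTIONS") {
      res.writeHead(204, corsHeaders());
      res.end();
      return;
    }
    if (req.method !== "POST") {
      res.writeHead(405, corsHeaders());
      res.end("Only POST is forwarded");
      return;
    }
    try {
      // Buffer the upload, then replay it against the ASR service.
      const chunks = [];
      for await (const chunk of req) chunks.push(chunk);
      const upstream = await fetch(upstreamUrl(req.url), {
        method: "POST",
        headers: {
          "Content-Type": req.headers["content-type"] ?? "application/octet-stream",
        },
        body: Buffer.concat(chunks),
      });
      const body = Buffer.from(await upstream.arrayBuffer());
      res.writeHead(upstream.status, {
        ...corsHeaders(),
        "Content-Type": upstream.headers.get("content-type") ?? "text/plain",
      });
      res.end(body);
    } catch (error) {
      res.writeHead(502, corsHeaders());
      res.end(`Upstream request failed: ${error.message}`);
    }
  });
  server.listen(port);
  return server;
}
```

Calling `startProxy()` would start the sketch on port 3000. Buffering the whole upload keeps the example short, which is acceptable for brief recordings; a production proxy would more likely stream the body and restrict the allowed origin.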
