How to Set Up and Run DeepSeek R1 Locally with Ollama

As the demand for high-performing language models continues to grow, more developers and organizations are seeking ways to run large language models (LLMs) locally. Whether it’s for privacy, customization, or avoiding the high costs of cloud-based APIs, local deployment puts full control in your hands.
DeepSeek R1—a powerful open-source LLM optimized for reasoning and problem-solving—offers an excellent balance between performance and accessibility. And with Ollama, setting it up locally is remarkably straightforward. In this guide, you’ll learn exactly how to install Ollama, run DeepSeek R1 on your own machine, and begin leveraging its capabilities—without relying on external services.
Why Run DeepSeek R1 Locally?
Deploying DeepSeek R1 on your local system offers a range of practical benefits:
Privacy & Security
Keep sensitive data on your own infrastructure, minimizing the risk of exposure, logging, or data retention associated with cloud APIs.
Performance & Speed
Avoid the latency of API requests traveling over the internet. With local inference on capable hardware, responses are faster and more consistent.
Cost Savings
Say goodbye to pay-per-token charges and subscription fees. Once downloaded, the model runs on your own hardware at no additional cost.
Customization
Running locally means you can experiment, fine-tune prompts, or integrate the model into bespoke workflows without platform limitations.
Offline Availability
Once installed, DeepSeek R1 can run entirely offline—ideal for secure environments or when working without internet access.
Setting Up DeepSeek R1 with Ollama
Ollama is a lightweight tool that makes running LLMs locally simple and efficient. It handles downloading, installing, and running models behind the scenes, letting you focus on using the model—not configuring it.
Ollama supports a variety of models, including DeepSeek R1 and its smaller distilled versions. Whether you're working on macOS, Linux, or Windows, the setup process is smooth and consistent.
Understanding the Terminal (Command Line)
Before we dive into the installation steps, let’s clarify what the terminal (or “command line”) is, why it’s used, and how to access it on each operating system.
- What Is the Terminal?
The terminal (also known as the command-line interface, CLI, or shell) is a text-based way of interacting with your operating system. Instead of clicking through graphical menus, you type commands and receive text responses. This is incredibly powerful for tasks like installing tools, running servers, and automating workflows.
- Why Use the Terminal for This Setup?
- Precision: Command-line tools like Ollama provide exact control over installation and execution.
- Simplicity: Many developer tools are built to run in a shell environment; instructions are often given as shell commands to copy and paste.
- Automation & Scripting: You can script repeated tasks, making it easier to set up and manage multiple models.
- How to Access the Terminal on Each OS
- macOS
- Click the magnifying glass in the top-right corner (Spotlight) and search for Terminal, or
- Open Finder > Applications > Utilities > Terminal.
- Linux
- The default terminal can be found in your application menu (e.g., Activities > Terminal in Ubuntu). You can also press Ctrl + Alt + T on many Linux distributions.
- Windows
- Search for Command Prompt or PowerShell in the Start menu.
- Alternatively, you can use Windows Terminal if installed.
- If you prefer a Linux-like environment on Windows, you can install WSL (Windows Subsystem for Linux) and open a Bash shell.
With that in mind, let’s proceed to the actual setup steps.
Step-by-Step Instructions
Step 1: Install Ollama
For macOS and Windows:
- Visit ollama.com.
- Download the installer for your system.
- Follow the setup instructions to complete the installation.
For Linux:
Open your terminal and run:
curl -fsSL https://ollama.com/install.sh | sh
After installation, confirm that Ollama is working by checking the version:
ollama --version
Step 2: Download and Run DeepSeek R1
With Ollama installed, you can download and launch DeepSeek R1 using a single command:
ollama run deepseek-r1
This command will:
- Automatically download the base DeepSeek R1 model.
- Launch an interactive session in your terminal where you can input prompts and receive responses.
Step 3: Run a Smaller Model (Optional)
If your system can't support the full-scale model (671B parameters), you can run one of the smaller distilled versions instead. Ollama offers several sizes, such as 1.5B, 7B, and 14B.
Use this format:
ollama run deepseek-r1:Xb
Replace X with the model size in billions of parameters. For example:
ollama run deepseek-r1:7b
This flexibility ensures you can run DeepSeek R1 even on modest hardware.
Step 4: Serve DeepSeek R1 as an API (Optional)
To make the model available for use in applications or scripts, you can run it as a background API server:
ollama serve
How to Send API Requests to the Ollama Endpoint
Once the Ollama server is running, it exposes a local API (by default at http://localhost:11434) that lets you interact with the model programmatically. You can use tools like curl, Postman, or Python to send requests.
Example Using curl
Here's a simple example that sends a prompt to DeepSeek R1 and receives a response:
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "messages": [
    { "role": "user", "content": "What is the capital of France?" }
  ],
  "stream": false
}'
What This Does:
- model: Specifies which model to use (deepseek-r1, or a smaller variant such as deepseek-r1:7b)
- messages: A list of messages for the chat; the "user" role represents your input
- stream: false: Tells Ollama to return the full response in one go (you can set it to true for streaming responses)
The server will return a JSON response containing the model's reply.
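If you prefer Python to curl, here is a minimal equivalent sketch using the requests library (pip install requests), again assuming the server is running on the default port:

import requests

# Same request as the curl example above, sent from Python.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "deepseek-r1",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "stream": False,
    },
)

# With "stream": false the reply arrives as one JSON object; the text sits under message.content.
print(response.json()["message"]["content"])

If you set "stream" to true instead, Ollama returns the reply incrementally as newline-delimited JSON objects, which you would read from the response line by line.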
Next Steps
Now that DeepSeek R1 is up and running locally on your machine, you can explore a range of possibilities and tools to get the most out of your setup. Here are some ideas on where to go next:
- Try Different Interfaces and GUIs
- If you prefer a more visual approach, look into community-built graphical interfaces (GUIs) for local language models. Tools like Text Generation Web UI let you interact with LLMs through a browser-based interface, making it easier to experiment with prompts and gather quick feedback without constantly working in the terminal.
- Experiment with Prompt Engineering
- Now that you have DeepSeek R1 running locally, it's a perfect environment to refine your prompts. Adjust formatting, context, or structure to get the best possible outputs. You can use more advanced prompt-engineering techniques, such as providing role-based instructions or including examples, to further improve response quality (see the role-based prompting sketch after this list).
- Integrate Retrieval-Augmented Generation
- If you want the model to answer questions about specific documents or internal knowledge bases, investigate "retrieval-augmented generation" (RAG). Libraries like LangChain or LlamaIndex help you combine vector databases or external data sources with your locally running LLM, enabling more accurate and context-rich responses. A hand-rolled sketch of the core idea appears after this list.
- Fine-Tune or Customize the Model
- For a deeper level of customization, you can explore fine-tuning (or parameter-efficient fine-tuning methods like LoRA or QLoRA) on domain-specific data. This is especially useful if you’re building an internal tool for a specialized industry or if you want the model to adopt a certain style or knowledge domain.
- Scale and Automate
- If you find yourself running large batch jobs or building enterprise-grade applications, consider containerizing your setup with Docker or orchestration tools like Kubernetes. This ensures repeatable deployments and easier scaling across multiple machines when your usage grows.
- Explore Hardware Optimizations
- Running large models can be resource-intensive. Tools such as bitsandbytes or model quantization techniques can help reduce memory usage. Experiment with GPU acceleration, or—for CPU-only systems—explore 4-bit or 8-bit quantized models that fit better on limited hardware.
- Build End-to-End Applications
- With an API endpoint from Ollama, you can plug DeepSeek R1 into chatbots, web applications, or internal developer tools. Whether it's Slack integrations, customer support bots, or research assistants, your local LLM can now handle tasks without incurring ongoing API fees or risking data privacy. A minimal chat-loop sketch follows this list.
- Stay Updated
- The LLM landscape is evolving quickly. Keep an eye on repositories, community forums, and Ollama’s announcements for updates on model improvements, new features, and performance optimizations.
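The sketches below make a few of these ideas concrete. They are minimal illustrations rather than production code, and they all assume the Ollama server is running locally, the deepseek-r1 model has been pulled, and the Python requests library is installed.

Role-based prompting. A system message sets ground rules for every reply, and a worked example (few-shot prompting) shows the model the output format you expect. The reviewer persona here is purely illustrative:

import requests

payload = {
    "model": "deepseek-r1",
    "messages": [
        # The system message defines the assistant's role and style.
        {"role": "system", "content": "You are a concise code reviewer. Reply in bullet points."},
        # One worked example (few-shot) demonstrates the expected format.
        {"role": "user", "content": "Review: def add(a, b): return a + b"},
        {"role": "assistant", "content": "- Add type hints\n- Add a short docstring"},
        # The actual request.
        {"role": "user", "content": "Review: def divide(a, b): return a / b"},
    ],
    "stream": False,
}

reply = requests.post("http://localhost:11434/api/chat", json=payload).json()
print(reply["message"]["content"])

Retrieval-augmented generation, hand-rolled. LangChain and LlamaIndex handle this plumbing for you; the toy version below only shows the core idea of retrieving relevant text and placing it in the prompt. The keyword-overlap retriever and the two sample documents are stand-ins for a real vector store:

import requests

# Toy document store; a real setup would use embeddings and a vector database.
documents = [
    "Ollama serves models over a local HTTP API on port 11434.",
    "DeepSeek R1 is available in several smaller distilled sizes.",
]

def retrieve(question, k=1):
    # Naive keyword-overlap scoring, purely for illustration.
    words = set(question.lower().split())
    ranked = sorted(documents, key=lambda d: -len(words & set(d.lower().split())))
    return ranked[:k]

def answer(question):
    context = "\n".join(retrieve(question))
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "deepseek-r1",
            "messages": [
                {"role": "system", "content": "Answer using only this context:\n" + context},
                {"role": "user", "content": question},
            ],
            "stream": False,
        },
    )
    return response.json()["message"]["content"]

print(answer("Which port does Ollama listen on?"))

A minimal chat application. The chat API is stateless, so the script resends the full message history on every turn; that is how the model keeps track of the conversation:

import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local endpoint

def main():
    history = []  # resent in full each turn so the model retains context
    print("Local DeepSeek R1 chat. Type 'exit' to quit.")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == "exit":
            break
        history.append({"role": "user", "content": user_input})
        response = requests.post(
            OLLAMA_URL,
            json={"model": "deepseek-r1", "messages": history, "stream": False},
        )
        reply = response.json()["message"]["content"]
        history.append({"role": "assistant", "content": reply})
        print("Assistant:", reply)

if __name__ == "__main__":
    main()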
Running DeepSeek R1 locally is just the beginning. With a bit of experimentation and the right tooling, you can transform your setup into a powerful, custom AI platform—free from cloud constraints and tailored to your unique needs.
Incorporate DeepSeek R1 into an AI Agent Project
If you’re looking to build a fully interactive voice or chat experience, Voiceflow provides a no-code conversation design platform that can connect to your locally running LLM:
- Set Up a Custom API Endpoint
- Since you have Ollama serving DeepSeek R1 at a local endpoint (e.g., http://localhost:11434), you can treat it like any other API.
- Within Voiceflow, create a custom API step or integration that points to your local Ollama endpoint.
- Design Conversation Flows
- In Voiceflow, you can design how users will engage with your model. For example:
- Welcome Prompt: Greet the user and explain what your AI assistant can do.
- Main Flow: Capture user questions and feed them into the API call for DeepSeek R1.
- Response Handling: Present DeepSeek R1’s output back to the user via text or spoken responses.
- Handle Context & Memory
- Voiceflow’s platform lets you store variables or context from one prompt to the next.
- You can send this context along with each API request, giving DeepSeek R1 additional information to maintain conversational continuity (see the sketch at the end of this section).
- Deploy on Multiple Channels
- Once your Voiceflow project is set up, you can deploy it to various channels (web chat, voice assistants, phone IVR, etc.), all while your DeepSeek R1 model continues to run privately on your local machine.
- Add Advanced Features
- Conditional Logic: In Voiceflow, you can build conditional steps that direct the conversation flow based on user inputs or AI responses.
- Analytics & Logging: Track conversation paths, user inputs, and model responses to refine your experience over time.
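To make the context-passing idea concrete, here is a rough Python sketch of the kind of request a Voiceflow API step could send, with stored variables folded into a system message. The variable names and wording are purely illustrative:

import requests

# Values a front end like Voiceflow would substitute from its stored variables (illustrative names).
user_name = "Alex"
conversation_summary = "The user asked about pricing tiers earlier."

payload = {
    "model": "deepseek-r1",
    "messages": [
        # Stored context travels as a system message so the model keeps continuity.
        {"role": "system", "content": f"The user's name is {user_name}. Conversation so far: {conversation_summary}"},
        {"role": "user", "content": "Can you remind me what we discussed?"},
    ],
    "stream": False,
}

reply = requests.post("http://localhost:11434/api/chat", json=payload).json()
print(reply["message"]["content"])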
By combining Voiceflow’s conversation-building tools with your locally running DeepSeek R1 instance, you can quickly prototype and deploy custom AI-driven assistants—without relying on external cloud APIs. This approach offers the best of both worlds: a user-friendly, no-code conversation flow builder on the front end, and a fully private, customizable large language model on the back end.
Conclusion
Running DeepSeek R1 locally with Ollama is a fast and reliable way to access cutting-edge AI capabilities without sacrificing privacy, speed, or flexibility. Whether you're building a prototype, developing an internal tool, or just experimenting with open-source AI, this setup gives you full autonomy—no cloud services required.
With just a few commands, you’ll have a high-performance language model running securely on your own machine. And as you grow more comfortable with the setup, you can extend it into full applications, integrate it with APIs, or even build retrieval-augmented generation pipelines.
Ready to take the next step? You now have everything you need to bring DeepSeek R1 into your local development environment.

Start building AI Agents
Want to explore how Voiceflow can be a valuable resource for you? Let's talk.
