View this article on Notion
This article is also published on j000e.com
How to run Ollama & Open WebUI on Windows
Getting Started with Ollama on Windows: A Step-by-Step Guide
Introduction
In today's technological landscape, Large Language Models (LLMs) have become indispensable tools, capable of exhibiting human-level performance across various tasks, from text generation to code writing and language translation. However, deploying and running these models typically require substantial resources and expertise, especially in local environments. This is where Ollama comes into play.
What is Ollama?
Ollama is an open-source tool designed to simplify the local deployment and operation of large language models. Actively maintained and regularly updated, it offers a lightweight, easily extensible framework that allows developers to effortlessly build and manage LLMs on their local machines. This eliminates the need for complex configurations or reliance on external servers, making it an ideal choice for various applications.
Key Features of Ollama
With Ollama, developers can access and run a range of pre-built models such as Llama 3, Gemma, and Mistral, or import and customize their own models without worrying about the intricate details of the underlying implementations. The tool streamlines the setup process by defining model files that encompass model weights, configurations, and necessary data components, negating the need for complex configuration files or deployment procedures.
Benefits of Local Deployment
Ollama enables you to obtain open-source models for local use. It pulls models from the Ollama model library and automatically uses GPU acceleration if your computer has a dedicated GPU, without requiring manual configuration. It can even utilize multiple GPUs on your machine, thereby accelerating inference speed and enhancing performance for resource-intensive tasks. Moreover, running LLMs locally with Ollama ensures that your data never leaves your computer, which is crucial for sensitive information.
What to Expect
This article will guide you through installing and using Ollama on Windows: its main features, running models such as Llama 3 and multimodal models like LLaVA, using CUDA acceleration, adjusting system environment variables, loading GGUF models, customizing model prompts, and setting up a frontend website through Docker for a more polished chatbot experience. It will demonstrate how to leverage Ollama's capabilities to explore and harness the power of large language models. Whether you want to quickly experience LLMs or need to deeply customize and run models in a local environment, Ollama provides the necessary tools and guidance.
Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
Download and Installation of Ollama
The installation process for Ollama is straightforward and supports multiple operating systems, including macOS, Windows, and Linux, as well as Docker environments, ensuring broad usability and flexibility. Below are installation notes for Windows, macOS, Linux, and Docker.
You can obtain the installation package from the official website or GitHub:
Screenshot of Ollama Download Page
Ollama GitHub Releases
Install Ollama on Windows
Here, we download the installer from the Ollama official website: https://ollama.com/download/OllamaSetup.exe.
Run the installer and click Install.
Click on Install
The installer will automatically perform the installation tasks, so please be patient. Once the installation process is complete, the installer window will close automatically. Do not worry if you do not see anything, as Ollama is now running in the background and can be found in the system tray on the right side of the taskbar.
After installation, you can find the running Ollama in the system tray
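You can also confirm from a terminal that Ollama is installed and its background service is running. A quick sanity check (the curl call queries the local API, which listens on port 11434 by default):
ollama --version
curl http://localhost:11434/
If everything is in place, the first command prints the installed version and the second responds with "Ollama is running".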
Install Ollama on macOS
Similarly, you can download the installer for macOS from the Ollama official website. Detailed installation instructions for this and other platforms will not be covered here.
https://ollama.com/download/Ollama-darwin.zip
Install Ollama on Linux
curl -fsSL https://ollama.com/install.sh | sh
You can refer to the official manual for further details: Manual install instructions
Install Ollama by Docker
The official Ollama Docker image ollama/ollama
is available on Docker Hub.
docker pull ollama/ollama
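To actually start a container from that image, the command below is a minimal sketch based on the image's documentation: it persists downloaded models in a named volume and exposes the API on the default port (add --gpus=all only if you have the NVIDIA Container Toolkit set up):
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
You can then run models inside the container, for example with docker exec -it ollama ollama run llama3.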
How to Use Ollama
This article will use the Windows platform as an example to introduce how to use Ollama. The usage on macOS and other platforms is quite similar.
Customize model storage location and environment variables (Optional)
This section is not mandatory; skipping it will not impact your use of Ollama.
Before you start using Ollama, if your system drive or partition (C:) has limited free space, or if you prefer storing files on other drives or partitions, you need to change the default storage location for Ollama models. By default, Ollama stores downloaded models in C:\Users\%username%\.ollama\models, and since models can be several gigabytes in size, this can quickly reduce the available space on your system drive, potentially affecting system performance.
Similarly, on macOS the default storage location for models is ~/.ollama/models, and on Linux it is /usr/share/ollama/.ollama/models.
If you need to use a different directory, set the environment variable OLLAMA_MODELS to the chosen directory. Here’s how to do it:
On Windows, Ollama inherits your user and system environment variables.
- First, quit Ollama by clicking on its icon in the taskbar.
- Start the Settings (Windows 11) or Control Panel (Windows 10) application and search for environment variables.
- Click on Edit the system environment variables.
- Create a variable called OLLAMA_MODELS pointing to where you want to store the models.
- Click OK/Apply to save.
- Start the Ollama application from the Windows Start menu.
Search “environment variables”
Click on Environment Variables
Create a variable called OLLAMA_MODELS pointing to where you want to store the models
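If you prefer the command line to the Settings UI, the same user-level variable can be set with setx (a sketch; the path D:\OllamaModels is only an example, use any folder with enough free space):
setx OLLAMA_MODELS "D:\OllamaModels"
Note that setx only affects newly started processes, so quit Ollama from the system tray and start it again (and open a new terminal) for the change to take effect.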
If Ollama is run as a macOS application, environment variables should be set using launchctl:
- For each environment variable, call launchctl setenv:
launchctl setenv OLLAMA_MODELS /PATH/
- Restart the Ollama application.
After making this setting, when you pull models using Ollama, they will be stored in the custom location.
Other commonly used system environment variables can be set as needed (optional):
- OLLAMA_HOST: The network address the Ollama service listens on; the default is 127.0.0.1. If you want to allow other computers (e.g., those on the local network) to access Ollama, set it to 0.0.0.0 to permit access from other machines.
- OLLAMA_PORT: The port the Ollama service listens on; the default is 11434. If there is a port conflict, change it to another port (e.g., 8080).
- OLLAMA_ORIGINS: A comma-separated list of allowed HTTP client request origins. For local use without strict requirements, you can set it to an asterisk (*) to indicate no restrictions.
- OLLAMA_KEEP_ALIVE: How long a model stays loaded in memory; the default is 5 minutes (5m). A plain number such as 300 means 300 seconds, 0 unloads the model immediately after a request is processed, and any negative number keeps it loaded indefinitely. You can set it to 24h to keep the model in memory for 24 hours, improving response speed.
- OLLAMA_NUM_PARALLEL: The number of requests handled concurrently; the default is 1, meaning requests are processed serially. Adjust this based on your actual needs.
- OLLAMA_MAX_QUEUE: The request queue length; the default is 512. Requests beyond this length are discarded. Adjust this setting based on your situation.
- OLLAMA_DEBUG: Debug-log flag. Set it to 1 to output detailed log information, which is useful for troubleshooting issues.
- OLLAMA_MAX_LOADED_MODELS: The maximum number of models that can be loaded into memory simultaneously; the default is 1, meaning only one model can be in memory at a time.
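As a quick illustration of how these variables are used, the PowerShell sketch below starts a one-off server that listens on all interfaces, keeps models loaded for 24 hours, and prints debug logs. The values are only examples, and you should quit the tray instance of Ollama first so port 11434 is free:
$env:OLLAMA_HOST = "0.0.0.0:11434"
$env:OLLAMA_KEEP_ALIVE = "24h"
$env:OLLAMA_DEBUG = "1"
ollama serve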
Quick Start: Try Llama 3
We can quickly experience Meta's latest open-source model, Llama 3 8B, with the ollama run llama3 command. First, open a command-line window (the commands in this article can be run in cmd, PowerShell, or Windows Terminal) and enter ollama run llama3 to start pulling the model. (If you want to try other models, refer to the "Model Library" section later in this article for a list of models and their corresponding commands, or follow the "Import from GGUF" section to load custom GGUF models.)
C:\Users\Edd1e>ollama run llama3
pulling manifest
pulling 6a0746a1ec1a... 100% ▕████████████████████████████████████████████████████████▏ 4.7 GB
pulling 4fa551d4f938... 100% ▕████████████████████████████████████████████████████████▏ 12 KB
pulling 8ab4849b038c... 100% ▕████████████████████████████████████████████████████████▏ 254 B
pulling 577073ffcc6c... 100% ▕████████████████████████████████████████████████████████▏ 110 B
pulling 3f8eb4da87fa... 100% ▕████████████████████████████████████████████████████████▏ 485 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Send a message (/? for help)
Once the model pull is complete, we can start using Llama 3 8B. You can directly send conversation content to the model in the command line.
For example, we can first ask it who it is:
>>> Who are you?
I am LLaMA, an AI assistant developed by Meta AI that can understand and respond to human input in a
conversational manner. I'm not a human, but rather a computer program designed to simulate conversation and answer
questions to the best of my ability based on my training.
My primary function is to provide information and answer questions across a wide range of topics, from science and
history to entertainment and culture. I can generate text responses that are natural-sounding and often
indistinguishable from those written by humans.
I was trained using a massive dataset of text from various sources, including books, articles, and websites. This
training enables me to recognize patterns and relationships in language, allowing me to understand and respond to
user input.
Some examples of what I can do include:
1. Answering questions: I can provide information on a wide range of topics, from science and history to
entertainment and culture.
2. Generating text: I can create original text based on a prompt or topic.
3. Summarizing content: I can summarize long pieces of text into shorter, more digestible versions.
4. Offering suggestions: I can suggest ideas or options for things like travel destinations, restaurants, or books
to read.
I'm constantly learning and improving my responses based on user interactions, so please bear with me if I make
any mistakes!
As you can see, Llama 3 clearly introduces its information and capabilities. We can continue to test it by asking two simple questions:
>>> If there are 1000 books in a room and I read 2, how many books are still in the room? Answer succinctly
998 books.
>>> Why?
Because you've read 2 books, leaving 1000 - 2 = 998 books remaining in the room.
>>> 9.11 and 9.2, which is larger?
9.2 is larger than 9.11.
As these exchanges show, Llama 3 stumbles on simple logic questions: reading two books does not remove them from the room, so the correct answer is 1000, yet across multiple new conversations it consistently answered 998. The content a model generates can therefore be erroneous, and you should not fully trust it.
Ollama Model Library
Want to try other models? You can access the list of models provided by Ollama at https://ollama.com/library.
Here are some example models that can be downloaded:
Model | Parameters | Size | Download |
---|---|---|---|
Llama 3 | 8B | 4.7GB | ollama run llama3 |
Llama 3 | 70B | 40GB | ollama run llama3:70b |
Phi 3 Mini | 3.8B | 2.3GB | ollama run phi3 |
Phi 3 Medium | 14B | 7.9GB | ollama run phi3:medium |
Gemma 2 | 9B | 5.5GB | ollama run gemma2 |
Gemma 2 | 27B | 16GB | ollama run gemma2:27b |
Mistral | 7B | 4.1GB | ollama run mistral |
Moondream 2 | 1.4B | 829MB | ollama run moondream |
Neural Chat | 7B | 4.1GB | ollama run neural-chat |
Starling | 7B | 4.1GB | ollama run starling-lm |
Code Llama | 7B | 3.8GB | ollama run codellama |
Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
LLaVA | 7B | 4.5GB | ollama run llava |
Solar | 10.7B | 6.1GB | ollama run solar |
Operation Commands
Before running the model, you should be aware that Ollama has the following commands, which you can run in the command line to utilize various functionalities of Ollama:
Command | Description | Example |
---|---|---|
serve | Start ollama | |
create | Create a model from a Modelfile | ollama create mymodel -f ./Modelfile |
show | Show information for a model | |
run | Run a model | |
pull | Pull a model from a registry | ollama pull llama3 |
push | Push a model to a registry | |
list | List models | |
ps | List running models and their hardware usage | |
cp | Copy a model | ollama cp llama3 my-model |
rm | Remove a model | ollama rm llama3 |
help | Help about any command |
The pull command can also be used to update a local model; only the diff will be pulled.
If you want to get help content for a specific command like run
, you can type ollama [command] --help
to get more detailed usage information for that command. For example, by typing ollama run --help
, you will see:
C:\Users\Edd1e>ollama run --help
Run a model
Usage:
ollama run MODEL [PROMPT] [flags]
Flags:
--format string Response format (e.g. json)
-h, --help help for run
--insecure Use an insecure registry
--keepalive string Duration to keep a model loaded (e.g. 5m)
--nowordwrap Don't wrap words to the next line automatically
--verbose Show timings for response
Environment Variables:
OLLAMA_HOST IP Address for the ollama server (default 127.0.0.1:11434)
OLLAMA_NOHISTORY Do not preserve readline history
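For example, the flags can be combined with a one-shot prompt; the sketch below prints token and timing statistics after the answer (the prompt text is just an example):
ollama run llama3 --verbose "Explain the difference between CPU and GPU inference in two sentences."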
While the model is running, you can perform the following operations:
Command | Description |
---|---|
/set | Set session variables |
/show | Show model information |
/load | Load a session or model |
/save | Save your current session |
/clear | Clear session context |
/bye | Exit |
/?, /help | Help for a command |
/? shortcuts | Help for keyboard shortcuts |
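For instance, a hypothetical session that lowers the sampling temperature, sets a system message, and saves the result as a new local model could look like this (only the commands are shown; type /? inside the session to see the available subcommands):
>>> /set parameter temperature 0.2
>>> /set system "Answer as briefly as possible."
>>> /save llama3-brief
>>> /bye
The saved model can then be started again later with ollama run llama3-brief.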
Additionally, you can use triple quotes ("""
) to begin a multi-line message. For example:
>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.
You can also leverage the capabilities of some multimodal models to have the model recognize images. For example, you can use the LLaVA model to recognize an image generated by DALLE-3 by simply including the image path in the prompt:
ollama run llava
>>> What is in this image? "D:\Joe\Downloads\test.png"
Added image 'D:\Joe\Downloads\test.png'
The image shows two people taking a selfie. They are wearing face masks and appear to be in an outdoor setting,
possibly with volcanic scenery in the background. One person is holding a phone with a camera app open, while the
other has their arm around the first person's shoulder. Both individuals are dressed casually and are also wearing
what seem to be raincoats or ponchos. The photo captures a moment of travel or exploration, as indicated by the
clear sky and natural environment.
The image test.png
was generated by DALLE-3.
As can be seen, the model accurately described the details in the image, almost perfectly recreating the prompt I used to generate it.
Viewing Logs
Sometimes, Ollama might not perform as expected. One of the best ways to find out what happened is to check the logs.
When running Ollama on Windows, there are several different locations you can check. Press Win+R to open the Run dialog, then enter one of the following commands to open the corresponding folder in File Explorer:
explorer %LOCALAPPDATA%\Ollama # View logs
explorer %LOCALAPPDATA%\Programs\Ollama # Browse binaries (the installer adds this to the user's PATH)
explorer %HOMEPATH%\.ollama # Browse model and configuration storage location
explorer %TEMP% # Temporary executable files are stored in one or more ollama* directories
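If you just want to follow the server log from a terminal instead of opening the folder, here is a PowerShell sketch (assuming the default server.log file name inside the logs folder above):
Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 50 -Wait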
On a Mac, you can find the logs by running the following command:
cat ~/.ollama/logs/server.log
If needed, you can set the environment variable OLLAMA_DEBUG
to "1" to get more detailed log information.
Using GPU Acceleration: Installing CUDA Toolkit (Optional)
For smaller models like Llama 3 8B, using a CPU or integrated graphics can work well. However, if your computer has an Nvidia discrete GPU and you want to run larger models or achieve faster response times, you will need to install the CUDA Toolkit to better utilize the discrete GPU.
Note: This step is only applicable to Nvidia GPUs with compute capability 5.0+.
If you are using an AMD GPU, you can check the list of supported devices to see if your graphics card is supported by Ollama. However, the CUDA Toolkit is only applicable to Nvidia GPUs, so AMD GPU users can skip this section without worry—you are not missing out on anything.
Ollama supports the following AMD GPUs:
Family | Cards and accelerators |
---|---|
AMD Radeon RX | 7900 XTX 7900 XT 7900 GRE 7800 XT 7700 XT 7600 XT 7600 6950 XT 6900 XTX 6900 XT 6800 XT 6800 Vega 64 Vega 56 |
AMD Radeon PRO | W7900 W7800 W7700 W7600 W7500 W6900X W6800X Duo W6800X W6800 V620 V420 V340 V320 Vega II Duo Vega II VII SSG |
AMD Instinct | MI300X MI300A MI300 MI250X MI250 MI210 MI200 MI100 MI60 MI50 |
Next, Nvidia GPU users should check your compute compatibility to see if your card is supported: Nvidia CUDA GPUs
Here is the list of supported GPUs:
Compute Capability | Family | Cards |
---|---|---|
9.0 | NVIDIA | H100 |
8.9 | GeForce RTX 40xx | RTX 4090 RTX 4080 SUPER RTX 4080 RTX 4070 Ti SUPER RTX 4070 Ti RTX 4070 SUPER RTX 4070 RTX 4060 Ti RTX 4060 |
NVIDIA Professional | L4 L40 RTX 6000 | |
8.6 | GeForce RTX 30xx | RTX 3090 Ti RTX 3090 RTX 3080 Ti RTX 3080 RTX 3070 Ti RTX 3070 RTX 3060 Ti RTX 3060 |
NVIDIA Professional | A40 RTX A6000 RTX A5000 RTX A4000 RTX A3000 RTX A2000 A10 A16 A2 | |
8.0 | NVIDIA | A100 A30 |
7.5 | GeForce GTX/RTX | GTX 1650 Ti TITAN RTX RTX 2080 Ti RTX 2080 RTX 2070 RTX 2060 |
NVIDIA Professional | T4 RTX 5000 RTX 4000 RTX 3000 T2000 T1200 T1000 T600 T500 | |
Quadro | RTX 8000 RTX 6000 RTX 5000 RTX 4000 | |
7.0 | NVIDIA | TITAN V V100 Quadro GV100 |
6.1 | NVIDIA TITAN | TITAN Xp TITAN X |
GeForce GTX | GTX 1080 Ti GTX 1080 GTX 1070 Ti GTX 1070 GTX 1060 GTX 1050 Ti GTX 1050 | |
Quadro | P6000 P5200 P4200 P3200 P5000 P4000 P3000 P2200 P2000 P1000 P620 P600 P500 P520 | |
Tesla | P40 P4 | |
6.0 | NVIDIA | Tesla P100 Quadro GP100 |
5.2 | GeForce GTX | GTX TITAN X GTX 980 Ti GTX 980 GTX 970 GTX 960 GTX 950 |
Quadro | M6000 24GB M6000 M5000 M5500M M4000 M2200 M2000 M620 | |
Tesla | M60 M40 | |
5.0 | GeForce GTX | GTX 750 Ti GTX 750 NVS 810 |
Quadro | K2200 K1200 K620 M1200 M520 M5000M M4000M M3000M M2000M M1000M K620M M600M M500M |
If your GPU is supported, you can download the appropriate CUDA Toolkit installer from the NVIDIA CUDA Toolkit downloads page: https://developer.nvidia.com/cuda-downloads
Select the version that matches your system and architecture:
Download the CUDA installer suitable for Windows x64 architecture.
Run the installer and click on OK:
Run the CUDA setup package.
Follow the installer instructions to complete the installation:
CUDA Installer
At this point, CUDA has been successfully installed. However, I have some practical tips to share with you to help you better utilize your powerful discrete GPU for running large models.
Ollama will automatically detect and use the GPU to run models, but if your computer has multiple GPUs, it may end up using the wrong one. The simplest and most direct way to ensure Ollama uses the discrete GPU is by setting the Display Mode to Nvidia GPU only
in the Nvidia Control Panel. As shown in the image below, you can find the Nvidia Control Panel in the system tray or by right-clicking on the desktop.
Please note that the Manage Display Mode feature is not available on every computer. If you don't have a similar setting, don't worry—this won't affect your ability to use Ollama.
Nvidia Control Panel - Manage Display Mode
Note: When your computer is connected to external displays, you might not be able to adjust the Display Mode. You will need to disconnect all external displays before changing the mode.
How can you verify that Ollama is using the correct GPU to run the model?
You can start running a model and ask it a question that requires a long answer (such as "Write a 1000-word article on artificial intelligence"). While it is responding, open a new command line window and run ollama ps
to check if Ollama is using the GPU and to see the usage percentage. Additionally, you can use Windows Task Manager to monitor the GPU usage and memory usage to determine which hardware Ollama is using for inference.
In the following example, Ollama reports that it is fully utilizing a GPU, but it does not specify which one; we can only confirm that it is not running on the CPU:
C:\Users\Edd1e>ollama ps
NAME ID SIZE PROCESSOR UNTIL
llama3:latest 365c0bd3c000 6.7 GB 100% GPU 4 minutes from now
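If the Nvidia driver is installed, nvidia-smi gives the per-GPU breakdown that ollama ps does not, including which GPU holds the model's memory and which process owns it:
nvidia-smi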
You can open Task Manager using the Ctrl+Shift+Esc shortcut and check the Performance tab. If Ollama is using the discrete GPU, you will see some usage in the section shown in the image:
Task Manager
Advanced Usage
Import from GGUF
Ollama supports importing GGUF models via a Modelfile. You can download a fine-tuned GGUF model from platforms like Hugging Face and run it through Ollama. To do that:
Create a file named Modelfile containing a FROM instruction with the local filepath of the model you want to import:
FROM ./filename.gguf
For example, you can create a new text document in a text editor and enter the following content. Save the document and then rename it to remove the file extension (e.g., ".txt"):
FROM "D:\Joe\Downloads\microsoft\Phi-3-mini-4k-instruct-gguf\Phi-3-mini-4k-instruct-q4.gguf"
The Phi 3 model comes from microsoft/Phi-3-mini-4k-instruct-gguf on Hugging Face.
Hugging Face Phi 3 Page
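If you prefer the command line to a browser download, the same file can be fetched with Hugging Face's CLI (a sketch; it assumes you have installed the CLI first with pip install -U "huggingface_hub[cli]", and the local directory matches the path used in the Modelfile above):
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf Phi-3-mini-4k-instruct-q4.gguf --local-dir "D:\Joe\Downloads\microsoft\Phi-3-mini-4k-instruct-gguf"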
Create the model in Ollama and name this model "example":
ollama create example -f Modelfile
Example:
ollama create example -f "D:\Joe\Downloads\Modelfile"
Run the model
ollama run example
Example:
C:\Users\Edd1e>ollama run example
>>> who are you?
I am Phi, an AI developed by Microsoft to assist users in generating human-like text based on the input provided. How can I help you today?
Customize a prompt
Models from the Ollama library can be customized with a prompt. For example, to customize the llama3
model:
ollama pull llama3
Create a Modelfile:
FROM llama3
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system message
SYSTEM """
You are a research assistant from Meta named Joe. You like AI technology and studying in Australia. Answer as a research assistant, only.
"""
Next, create and run the model:
ollama create Joe -f "D:\Joe\Downloads\Modelfile"
ollama run Joe
>>> hi
G'day! Hi there! I'm Joe, a research assistant from Meta. Nice to meet you! I'm
passionate about exploring the possibilities of artificial intelligence and how it can shape our world for the
better. When I'm not working on projects or staying up-to-date with the latest AI developments, you can find me
exploring the beautiful Australian landscape or hitting the books at one of our top-notch universities here. What
brings you to this neck of the woods?
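To check what the new model was built from, you can ask Ollama to print its Modelfile back using the --modelfile flag of ollama show:
ollama show Joe --modelfile
This should echo the FROM line, the temperature parameter, and the system message defined above.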
Use Ollama Like GPT: Open WebUI in Docker
In this chapter, we will install Docker and use the open-source front-end extension Open WebUI to connect to Ollama's API, ultimately creating a user-friendly chatbot experience similar to GPT.
Open WebUI is an extensible, feature-rich, and user-friendly self-hosted WebUI designed to operate entirely offline. It supports various LLM runners, including Ollama and OpenAI-compatible APIs.
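Under the hood, Open WebUI talks to the same local HTTP API that you can also call directly. As a minimal sketch using Ollama's documented /api/generate endpoint (with llama3 already pulled; in older Windows PowerShell versions curl is an alias for Invoke-WebRequest, so call curl.exe explicitly and adjust the quoting):
curl http://localhost:11434/api/generate -d '{ "model": "llama3", "prompt": "Why is the sky blue?", "stream": false }'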
Docker is an open-source platform designed to automate the deployment, scaling, and management of applications using containerization. Containers package an application with all its dependencies, ensuring consistency across multiple environments. This allows for more efficient development, testing, and deployment processes.
Step 1: Start Hyper-V
If you haven't installed Docker before, you need to set it up first.
Open Control Panel > Programs > Programs and Features > Turn Windows features on or off
Control Panel - Programs and Features
Turn Windows features on or off
Check Hyper-V, Virtual Machine Platform, and Windows Subsystem for Linux, then click OK.
Restart your computer once it's done.
Step 2: Install WSL
Open a PowerShell window as an administrator.
Input:
wsl --update
Install and set your Unix username and password:
wsl --install
Restart your computer after the installation is successful.
Let's begin the Docker installation.
First, we will install Docker Desktop, which can be downloaded from the official website:
https://www.docker.com/products/docker-desktop/
Follow the instructions to complete the installation. After the installation is complete, launch Docker Desktop and run the following command in the command line or PowerShell to pull the Open WebUI image:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
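If Ollama is running on another machine, or the host-gateway mapping does not work in your setup, you can point Open WebUI at the Ollama server explicitly via its documented OLLAMA_BASE_URL variable. A sketch (replace 192.168.1.10 with your server's address, and remember to set OLLAMA_HOST to 0.0.0.0 on that machine, as described earlier):
docker run -d -p 3000:8080 -e OLLAMA_BASE_URL=http://192.168.1.10:11434 -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main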
Once the pull is complete, you can see the running container under the Containers tab. Click the link in the Ports section to open the webpage:
If you see this page, it means you have succeeded. Next, click on "Sign up" to register an account:
Fill in the information to complete the registration:
Once logged in, you can select a model from the top left corner. For example, let's choose Llama3:
You'll notice that the interface design and interactions are very similar to GPT, making it very user-friendly. It also renders Markdown very well:
If you choose the LLaVA model, you can directly paste images, which is more intuitive and convenient compared to filling in the path:
At this point, we have completed the deployment of the frontend page. This makes it more convenient and aesthetically pleasing to use, allowing open-source large models to run locally with a perfect user experience.
Conclusion
In this guide, we walked through the process of installing and using Ollama on Windows, highlighting its straightforward setup and powerful capabilities. By following the steps provided, you can easily deploy and manage large language models locally, benefiting from GPU acceleration and ensuring your data remains private.
Ollama simplifies the use of pre-built models like Llama 3 and allows for customization with GGUF models. Additionally, you can explore advanced features such as Docker integration for web-based interfaces, providing a user-friendly chat experience similar to popular AI chatbots.
This guide also touched on customizing prompts and environment variables to suit your specific needs, making Ollama a versatile tool for AI development. With its comprehensive documentation and support for various models, Ollama offers a robust solution for anyone looking to harness the power of large language models.
Through this guide, you should now have a comprehensive understanding of how to use Ollama, and you are ready to embark on your exploration and development journey.
Source
- https://github.com/ollama/ollama
- https://ollama.com/
- https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
- https://github.com/ollama/ollama/blob/main/docs/faq.md#where-are-models-stored
- https://github.com/open-webui/open-webui
- https://www.docker.com/products/docker-desktop/
- https://docs.openwebui.com/
- https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf