---
# How to build llama-cpp-python on Windows: Step-by-Step Guide
First, you need to set up a proper C++ development environment.
# Step 1: Install the C++ Compiler
Go to the Visual Studio downloads page (https://visualstudio.microsoft.com/downloads/) and scroll down past the main Visual Studio products to "Tools for Visual Studio". Download the "Build Tools for Visual Studio". This is a standalone installer that gives you the C++ compiler and libraries without installing the full Visual Studio IDE.
Run the installer. In the "Workloads" tab, check the box for "Desktop development with C++". In the "Installation details" pane on the right, make sure the following optional components are selected:
MSVC v143
C++ ATL
C++ Profiling tools
C++ CMake tools for Windows
C++ MFC
C++ Modules
Windows 10 SDK (10.0.20348.0)
Proceed with the installation.
Note: use the 'x64 Native Tools Command Prompt for VS 2022' (run as administrator) when running the installation commands below.
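To sanity-check the compiler install, open that command prompt and run the Microsoft compiler with no arguments; it should print its version banner:
cl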
# Step 2: Install CMake
Go to the CMake download page: https://cmake.org/download
Download the latest Windows installer (e.g., cmake-x.xx.x-windows-x86_64.msi).
Run the installer. Crucially, when prompted, select the option to "Add CMake to the system PATH for all users" or "for the current user." This allows you to run cmake from any command prompt.
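To confirm CMake is on your PATH, open a new command prompt and run:
cmake --version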
# Step 3: (FOR CPU INFERENCE ONLY) Download and Place OpenBLAS
This is often the trickiest part.
Go to the OpenBLAS releases page on GitHub: https://github.com/OpenMathLib/OpenBLAS/releases
Find a recent release and download the pre-compiled version for Windows. It will typically be a file named something like OpenBLAS-0.3.21-x64.zip (the version number will change). Make sure you get the 64-bit (x64) version if you are using 64-bit Python.
Create a folder somewhere easily accessible, for example, C:\libs\.
Extract the contents of the OpenBLAS zip file into that folder. Your final directory structure should look something like this:
C:\libs\OpenBLAS\
β”œβ”€β”€ bin\
β”œβ”€β”€ include\
└── lib\
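To double-check the layout (assuming you extracted to C:\libs as above), the following should list libopenblas.lib and, in recent releases, a pkgconfig subfolder:
dir C:\libs\OpenBLAS\lib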
## 3.b. Install Chocolatey
https://chocolatey.org/install
First, install Chocolatey (skip this step if you already have it).
Open PowerShell as an Administrator. (Right-click the Start Menu -> "Windows PowerShell (Admin)" or "Terminal (Admin)").
Run the following command to install Chocolatey. It's a single, long line:
Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
Once it's done, close the Administrator PowerShell window.
Next, install pkg-config-lite using Chocolatey.
IMPORTANT: Open a NEW command prompt or PowerShell window (running as a regular user is fine). A fresh window is necessary so that the shell recognises the new choco command.
Run the following command to install a lightweight version of pkg-config:
choco install pkgconfiglite
Approve the installation by typing Y or A if prompted.
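To verify both tools are available, run in the same window:
choco --version
pkg-config --version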
# Step 4: Run the Installation Command
Now you have all the pieces. The final step is to run the command in a terminal that is aware of your new build environment.
Open the "Developer Command Prompt for VS" from your Start Menu. This is important! This special command prompt automatically configures all the necessary paths for the C++ compiler.
## For CPU
set PKG_CONFIG_PATH=C:\<path-to-openblas>\OpenBLAS\lib\pkgconfig
(You can also set PKG_CONFIG_PATH permanently under your Windows environment variables.)
pip install llama-cpp-python==0.3.16 --force-reinstall --verbose --no-cache-dir -Ccmake.args="-DGGML_BLAS=ON;-DGGML_BLAS_VENDOR=OpenBLAS;-DBLAS_INCLUDE_DIRS=C:/<path-to-openblas>/OpenBLAS/include;-DBLAS_LIBRARIES=C:/<path-to-openblas>/OpenBLAS/lib/libopenblas.lib"
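Once the build finishes, a quick import check confirms the package installed correctly:
python -c "import llama_cpp; print(llama_cpp.__version__)"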
Alternatively, to build a wheel for future installs:
pip wheel llama-cpp-python==0.3.16 --wheel-dir dist --verbose --no-cache-dir -Ccmake.args="-DGGML_BLAS=ON;-DGGML_BLAS_VENDOR=OpenBLAS;-DBLAS_INCLUDE_DIRS=C:/<path-to-openblas>/OpenBLAS/include;-DBLAS_LIBRARIES=C:/<path-to-openblas>/OpenBLAS/lib/libopenblas.lib"
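The built wheel lands in the dist folder and can be installed later without rebuilding; the exact filename depends on your Python version (the cp311 tag below is just an example):
pip install dist\llama_cpp_python-0.3.16-cp311-cp311-win_amd64.whl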
## With Cuda (NVIDIA GPUs only)
Make sure that you have the CUDA 12.4 toolkit for Windows installed: https://developer.nvidia.com/cuda-12-4-0-download-archive
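You can confirm the toolkit is installed and on your PATH with:
nvcc --version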
### Make sure you are using the x64 version of the developer command tools for the steps below, e.g. 'x64 Native Tools Command Prompt for VS 2022' ###
Use NVIDIA GPU (cuBLAS): If you have an NVIDIA GPU, using cuBLAS is often easier because the CUDA Toolkit installer handles most of the setup.
Install the NVIDIA CUDA Toolkit.
Run the install command specifying cuBLAS (for faster inference):
pip install llama-cpp-python==0.3.16 --force-reinstall --verbose -C cmake.args="-DGGML_CUDA=on"
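To check that the CUDA backend was actually compiled in, recent versions of llama-cpp-python expose llama.cpp's llama_supports_gpu_offload(), which should print True for a CUDA build:
python -c "from llama_cpp import llama_supports_gpu_offload; print(llama_supports_gpu_offload())"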
If you want to create a new wheel to help with future installs, you can run:
cd first to a folder that you have write access to, then run:
pip wheel llama-cpp-python==0.3.16 --wheel-dir dist --verbose -C cmake.args="-DGGML_CUDA=on"