From 80830a41a7b38daf2d34db238dd6b53d2bb47b77 Mon Sep 17 00:00:00 2001
From: Javier Martinez
Date: Wed, 7 Aug 2024 11:19:52 +0200
Subject: [PATCH] docs: add numpy issue to troubleshooting

---
 fern/docs/pages/installation/installation.mdx    | 10 ++++++----
 .../docs/pages/installation/troubleshooting.mdx  | 17 ++++++++++++++++-
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/fern/docs/pages/installation/installation.mdx b/fern/docs/pages/installation/installation.mdx
index f7457b34b5..ce8decab02 100644
--- a/fern/docs/pages/installation/installation.mdx
+++ b/fern/docs/pages/installation/installation.mdx
@@ -307,11 +307,12 @@ If you have all required dependencies properly configured running the following
 powershell command should succeed.
 
 ```powershell
-$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
+$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python numpy==1.26.0
 ```
 
 If your installation was correct, you should see a message similar to the following next
-time you start the server `BLAS = 1`.
+time you start the server `BLAS = 1`. If you encounter any issue, please refer to the
+[troubleshooting](#/installation/getting-started/troubleshooting#guide-for-building-llama-cpp-with-cuda-support) section.
 
 ```console
 llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
@@ -339,11 +340,12 @@ Some tips:
 After that running the following command in the repository will install llama.cpp with GPU support:
 
 ```bash
-CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
+CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python numpy==1.26.0
 ```
 
 If your installation was correct, you should see a message similar to the following next
-time you start the server `BLAS = 1`.
+time you start the server `BLAS = 1`. If you encounter any issue, please refer to the
+[troubleshooting](#/installation/getting-started/troubleshooting#guide-for-building-llama-cpp-with-cuda-support) section.
 
 ```
 llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
diff --git a/fern/docs/pages/installation/troubleshooting.mdx b/fern/docs/pages/installation/troubleshooting.mdx
index dc99d6cb5b..0b72526d2a 100644
--- a/fern/docs/pages/installation/troubleshooting.mdx
+++ b/fern/docs/pages/installation/troubleshooting.mdx
@@ -46,4 +46,19 @@ huggingface:
   embedding:
     embed_dim: 384
 ```
-
\ No newline at end of file
+
+
+# Building Llama-cpp with NVIDIA GPU support
+
+## Out-of-memory error
+
+If you encounter an out-of-memory error while running `llama-cpp` with CUDA, you can try the following steps to resolve the issue:
+1. **Set the following environment variable:**
+   ```bash
+   TOKENIZERS_PARALLELISM=true
+   ```
+2. **Run PrivateGPT:**
+   ```bash
+   poetry run python -m private_gpt
+   ```
+Thanks to [MarioRossiGithub](https://github.com/MarioRossiGithub) for providing this solution.
\ No newline at end of file
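Note: as a sanity check after applying the updated install command, the snippet below verifies that the numpy pin took effect and that the rebuilt `llama-cpp-python` loads with GPU offload enabled. It is an illustrative sketch, not part of the patch; `llama_supports_gpu_offload` is assumed to be exported by the installed `llama_cpp` version.

```bash
# Confirm the numpy pin from the updated install command took effect (expected: 1.26.0)
poetry run python -c "import numpy; print(numpy.__version__)"

# Confirm the rebuilt llama-cpp-python imports and reports GPU offload support
# (llama_supports_gpu_offload is assumed to exist in the installed llama_cpp build)
poetry run python -c "from llama_cpp import llama_supports_gpu_offload; print(llama_supports_gpu_offload())"
```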
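The out-of-memory workaround added to troubleshooting.mdx can also be exercised as a one-liner. This is a sketch assuming the standard `private_gpt` module entry point, with the environment variable scoped to a single run:

```bash
# Scope TOKENIZERS_PARALLELISM to this invocation only, then start PrivateGPT
TOKENIZERS_PARALLELISM=true poetry run python -m private_gpt
```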