uol-llama2

This repository provides a simple way to deploy llama-2 models with Docker:

First, make sure Docker is installed correctly. You can check by running:

docker run hello-world

If this fails, please refer to the official Docker installation tutorial.
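Since running the models needs GPU access (see the Hints below), it can also help to confirm that the NVIDIA driver and GPUs are visible on the host, for example:

nvidia-smi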

Clone the repository and play:

git clone --recursive git@github.com:YD-19/uol-llama2.git
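Then change into the repository root (the directory containing the Makefile, assuming it sits at the top level of the clone):

cd uol-llama2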

Simply run the following two commands:

make build

make start

Then, start chatting.
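For orientation, these two targets presumably wrap ordinary Docker commands along the following lines. This is only a sketch: the image name and flags are assumptions for illustration, not taken from the repository's Makefile.

# Hypothetical equivalents of the two targets (image name and flags assumed)
docker build -t uol-llama2 .
docker run --rm -it --gpus all uol-llama2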

Hints:

  1. You must have at least 1 GPU on your machine for 7b models, 2 GPUs for 13b models, and 8 GPUs for 30b models.
  2. If it is your first time running the model, download the model weights and checklist from Meta with the following command:

make build Download=true

  3. If you want to change the model, simply rebuild with the following command; for example, to use the 13b model:

make build model=13b-chat

  4. If CUDA runs out of memory, you can reduce the maximum sequence length with the following command:

make build seq_len=128

If using an RTX 4090, you can set seq_len=4096 for the 7b-chat model. These options can also be combined, as shown below.
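For example (a hypothetical invocation, assuming the Makefile accepts several variables in one call):

make build Download=true model=13b-chat seq_len=128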
