Deploy LLM on any device

Running LLaMA and LLaMA-based models on your workstation, your laptop, or even your Synology NAS

Posted by Jingbiao on April 10, 2023, Reading time: 4 minutes.

Ubuntu Docker setup for Synology NAS

  1. Install Docker on Synology NAS as an add-on package
    • Tutorial here
    • Open Package Center
    • Search Docker
    • Install Docker
  2. Download the Ubuntu 20.04 docker image

  3. Create a new Docker container with Ubuntu 20.04

  4. Install Anaconda on the docker container

  5. Install compiler tools
```
apt-get update
apt-get install make
apt-get install gcc
apt-get install build-essential
```

Install LLaMA.cpp to run native LLaMA model

Clone the LLaMA.cpp repo

```
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```

Compile LLaMA.cpp

```
make
```

Prepare data and run inference

```
# obtain the original LLaMA model weights and place them in ./models
# the model weights can be found from the original LLaMA repo
# at https://github.com/facebookresearch/llama
ls ./models
65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model

# install Python dependencies
python3 -m pip install -r requirements.txt

# convert the 7B model to ggml FP16 format
python3 convert.py models/7B/

# quantize the model to 4-bits (using q4_0 method)
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0

# run the inference
./main -m ./models/7B/ggml-model-q4_0.bin -n 128
```

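The `q4_0` step above is what makes a 7B model fit on small devices: each weight is stored in roughly 4 bits plus a shared per-block scale. Below is a minimal pure-Python sketch of the idea — this is illustrative only, not the exact ggml `q4_0` layout (which packs two 4-bit values per byte and stores an fp16 scale per 32-weight block):

```python
def quantize_block(block):
    """Quantize a block of floats to signed 4-bit ints plus one scale (simplified sketch)."""
    m = max(abs(x) for x in block)
    scale = m / 7 if m else 1.0
    # clamp to the signed 4-bit range [-8, 7]
    q = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, q

def dequantize_block(scale, q):
    """Recover approximate floats from the 4-bit codes."""
    return [scale * v for v in q]

block = [0.05 * i for i in range(-8, 8)]  # 16 example weights
scale, q = quantize_block(block)
restored = dequantize_block(scale, q)
max_err = max(abs(a - b) for a, b in zip(block, restored))
# rounding error is bounded by half the scale step
print(max_err <= scale / 2 + 1e-9)  # prints True
```

At roughly 4.5 bits per weight (4-bit codes plus the shared scales), a 7B-parameter model shrinks to about 7e9 × 4.5 / 8 ≈ 3.9 GB on disk, which is why it can run on a laptop or NAS.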
Using interactive mode

```
./main -m ./models/7B/ggml-model-q4_0.bin -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
```

Using Alpaca


Alpaca is an instruction-following model fine-tuned from LLaMA

Recover official weights:

  • The weight diff between Alpaca-7B and LLaMA-7B is located here. To recover the original Alpaca-7B weights, follow these steps:
  1. Convert Meta's released weights into Hugging Face format. Follow this guide: https://huggingface.co/docs/transformers/main/model_doc/llama
  2. Make sure you have cloned the released weight diff to your local machine. The weight diff is located at: https://huggingface.co/tatsu-lab/alpaca-7b/tree/main
  3. Run the recovery function with the correct paths, e.g.:

```
python weight_diff.py recover --path_raw <path_to_step_1_dir> --path_diff <path_to_step_2_dir> --path_tuned <path_to_store_recovered_weights>
```

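Conceptually, the `recover` step just adds the released diff back onto the raw LLaMA tensors (tuned = raw + diff), since the published diff is the element-wise difference between the fine-tuned Alpaca weights and the original LLaMA weights. A toy sketch of that idea, with short lists standing in for real tensors:

```python
# toy values standing in for real model weight tensors
raw_llama = [0.10, -0.20, 0.30]        # original LLaMA weights
released_diff = [0.01, 0.05, -0.02]    # published diff = alpaca - llama

# recovery adds the diff back element-wise
recovered_alpaca = [r + d for r, d in zip(raw_llama, released_diff)]
```

The real `weight_diff.py` does this over the whole model state dict (with an integrity check), but the arithmetic is the same.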
For step 1 in detail: you need to download the conversion script from the transformers library and install protobuf from here: https://github.com/protocolbuffers/protobuf/tree/main/python#installation

```
pip install protobuf==3.20.0
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
```

```
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path
```

The weights obtained this way should be similar to the ones in this repo: https://huggingface.co/chavinlo/alpaca-native

Note that that repo uses a slightly modified training procedure

Unofficial weights for LLaMA based models and Alpaca

  1. alpaca-native-7B by chavinlo
    • Mostly similar to the official weights, but trained with a slightly different procedure (FSDP)
  2. 4-bit quantised version of alpaca-native-7B by chavinlo
  3. alpaca-native-13B by chavinlo
    • Similar to 1., but based on the 13B-parameter LLaMA model
  4. GPT4-x-alpaca by chavinlo
  5. LoRA by tloen
    • Alpaca reproduced with LoRA (low-rank adaptation), which trains small low-rank update matrices instead of the full weights, greatly reducing the cost of fine-tuning
  6. Vicuna-13b by eachadea

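The LoRA entry above (item 5) is worth a quick illustration. Instead of learning a new d×k weight matrix during fine-tuning, LoRA learns two small factors B (d×r) and A (r×k), with rank r much smaller than d and k, and applies W + BA at inference time. A minimal sketch of the parameter saving, using a plausible 4096×4096 attention projection and rank 8 as example numbers:

```python
def lora_param_count(d, k, r):
    """Trainable parameters: full fine-tuning vs a rank-r LoRA update of a d x k matrix."""
    full = d * k        # full fine-tuning touches every weight
    lora = r * (d + k)  # B is d x r, A is r x k
    return full, lora

# e.g. one 4096 x 4096 projection with LoRA rank 8
full, lora = lora_param_count(4096, 4096, 8)
print(full // lora)  # prints 256 -- the low-rank update is 256x smaller
```

This is why LoRA variants of Alpaca can be trained (and shared as tiny adapter files) on consumer hardware.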
GPT4-x-alpaca

gpt4-x-alpaca’s HuggingFace page states that it is based on the Alpaca 13B model, fine-tuned with GPT4 responses for 3 epochs.

GPTeacher

A collection of modular datasets generated by GPT-4: General-Instruct, Roleplay-Instruct, Code-Instruct, and Toolformer.

Vicuna-13b

Vicuna-13B is an open-source chatbot trained by fine-tuning LLaMA on 70K user-shared conversations collected from ShareGPT. GPT-4 was then used as a judge to evaluate its performance. The results show that Vicuna-13B can achieve more than 90%* of the quality of OpenAI ChatGPT and Google Bard while outperforming other models like LLaMA and Stanford Alpaca in more than 90%* of cases. The code and weights, along with an online demo, are publicly available for non-commercial use.

Relative Response Quality Assessed by GPT-4

More models to try out

  • Koala
  • WizardLM

Reference

  1. https://agi-sphere.com/install-llama-mac/
  2. https://github.com/ggerganov/llama.cpp
  3. https://agi-sphere.com/llama-models/
  4. LLaMA paper
  5. Introduction to Vicuna
  6. ShareGPT
  7. Koala blog page
  8. WizardLM paper
  9. WizardLM Github page