Running LLMs Locally
You need a good machine and a good network connection!
Open Source
Hugging Face
Transformers
Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs and carbon footprint, and save you the time and resources required to train a model from scratch.
https://huggingface.co/docs/transformers/en/index
Models
Datasets
https://huggingface.co/datasets
Spaces
https://huggingface.co/spaces
https://huggingface.co/pricing#spaces
Load a model from Hugging Face
workshop/03-LLMs-Local/Qwen2-1.5B-Instruct-huggingface.py
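The referenced script is not reproduced here, but it presumably follows the standard transformers chat pattern; a minimal sketch (the prompt text is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-1.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat prompt using the model's chat template
messages = [{"role": "user", "content": "Give me a short introduction to LLMs."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate, then strip the prompt tokens before decoding
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

The first run downloads the weights from the Hugging Face Hub; later runs load them from the local cache.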
ModelScope (魔搭社区)
https://www.modelscope.cn/home
https://community.modelscope.cn/
modelscope library
https://www.modelscope.cn/docs/Quick%20Start
Setup
- check CUDA version: nvidia-smi
- install PyTorch with CUDA: https://pytorch.org/get-started/locally/ (verify with the snippet below)
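A quick sanity check that the CUDA-enabled PyTorch build actually sees the GPU:

```python
import torch

print(torch.__version__)                   # e.g. "2.3.0+cu121" for a CUDA build
print(torch.cuda.is_available())           # True if PyTorch can use a GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # should match the GPU shown by nvidia-smi
```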
Load a model from ModelScope
workshop/03-LLMs-Local/Qwen2-1.5B-Instruct.py
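A minimal sketch of the ModelScope route, using snapshot_download to fetch the weights and then loading them with transformers (the repo id qwen/Qwen2-1.5B-Instruct is an assumption):

```python
from modelscope import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the weights from ModelScope instead of the Hugging Face Hub
model_dir = snapshot_download("qwen/Qwen2-1.5B-Instruct")

# The downloaded directory mirrors a Hugging Face repo, so transformers can load it
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_dir)
```

This is useful when the Hugging Face Hub is slow or unreachable from your network.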
vLLM
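vLLM is a high-throughput inference engine with an offline batch API; a minimal sketch of running the same model with it (model id and sampling settings are illustrative):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-1.5B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain quantization in one sentence."], params)
print(outputs[0].outputs[0].text)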
Ollama
Load a model from Ollama
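A minimal sketch using the ollama Python client; it assumes the Ollama server is running locally and that the model tag (qwen2:1.5b here, an assumption) has already been pulled:

```python
import ollama

# Requires: ollama serve running, and the model pulled beforehand,
# e.g. with: ollama pull qwen2:1.5b
response = ollama.chat(
    model="qwen2:1.5b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```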
Quantized LLMs
Quantization is the process of reducing the precision of a model's parameters (weights) from high-precision formats (such as 32-bit floating point) to lower-precision formats (such as 8-bit integers).
- FP32 (32-bit floating point): standard precision used in most training processes.
- FP16 (16-bit floating point): common in mixed-precision training, offering a good balance between speed and accuracy.
- INT8 (8-bit integer): often used at inference time for maximum efficiency.
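To make the idea concrete, a toy sketch of symmetric INT8 quantization (illustrative only; real quantizers typically work per-channel or per-group and handle outliers):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    # Symmetric per-tensor quantization: map [-max|w|, +max|w|] onto [-127, 127]
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(5).astype(np.float32)
q, scale = quantize_int8(w)
print(w)
print(dequantize(q, scale))  # close to w, with rounding error up to ~scale/2 per weight
```

Each weight now takes 1 byte instead of 4, at the cost of a small rounding error.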
llama.cpp
GGUF was developed by @ggerganov, who is also the author of llama.cpp, a popular C/C++ LLM inference framework.
https://huggingface.co/docs/hub/en/gguf
Llama-3-8B-Instruct-GGUF
https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF-v2/tree/main
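One way to run such a GGUF file from Python is through the llama-cpp-python bindings to llama.cpp (the filename below is an assumption about what the repo contains):

```python
from llama_cpp import Llama

# Path assumes a Q4_K_M quantization downloaded from the repo linked above
llm = Llama(model_path="./Meta-Llama-3-8B-Instruct-v2.Q4_K_M.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF is."}]
)
print(out["choices"][0]["message"]["content"])
```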
Load a GGUF model from disk
https://www.markhneedham.com/blog/2023/10/18/ollama-hugging-face-gguf-models/
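Following the approach in that post, a sketch that registers a local GGUF file with Ollama via a Modelfile (the model name llama3-local and the GGUF filename are illustrative):

```python
import subprocess
from pathlib import Path

# Hypothetical filename for a GGUF downloaded from the repo linked above
gguf = "Meta-Llama-3-8B-Instruct-v2.Q4_K_M.gguf"

# A minimal Modelfile pointing Ollama at the local GGUF weights
Path("Modelfile").write_text(f"FROM ./{gguf}\n")

# Register the model with Ollama, then chat with it
subprocess.run(["ollama", "create", "llama3-local", "-f", "Modelfile"], check=True)
subprocess.run(["ollama", "run", "llama3-local", "Hello!"], check=True)
```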