Pretraining LLMs

Why Pre-Training

(figure: the cost of pretraining)

Data Preparation


Pre-training is like reading: the model learns general language from raw text. Fine-tuning teaches a specific task.

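Data preparation mostly means filtering and cleaning the raw corpus before tokenization. A minimal sketch of a quality filter with the `datasets` library; the corpus and the thresholds are illustrative, not from the course:

```python
from datasets import load_dataset

# Any raw-text corpus with a "text" column works; wikitext is just small.
ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def keep(example):
    text = example["text"]
    if len(text) < 200:                       # drop very short fragments
        return False
    alpha = sum(ch.isalpha() for ch in text)
    return alpha / len(text) > 0.6            # drop mostly non-alphabetic noise

clean = ds.filter(keep)
print(f"{len(ds)} -> {len(clean)} examples")
```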

Packaging Data for Pretraining

Tokenization
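Tokenization converts raw text into the integer ids the model trains on. A minimal sketch with a Hugging Face tokenizer (the checkpoint name is just an example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

ids = tokenizer("Pre-training is like reading.")["input_ids"]
print(ids)                                   # list of token ids
print(tokenizer.convert_ids_to_tokens(ids))  # the corresponding subword tokens
```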

Data Packing

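Packing concatenates all tokenized documents into one long stream, separated by an EOS token, then slices it into fixed-length training sequences so no compute is wasted on padding. A minimal sketch; the sequence length is a typical choice, not prescribed by the course:

```python
import numpy as np

def pack(tokenized_docs, eos_id, seq_len=1024):
    """Concatenate token-id lists, separated by EOS, then cut the stream
    into fixed-length blocks; the leftover tail is dropped."""
    stream = []
    for ids in tokenized_docs:
        stream.extend(ids)
        stream.append(eos_id)                 # mark the document boundary
    n_blocks = len(stream) // seq_len
    stream = stream[: n_blocks * seq_len]
    return np.array(stream, dtype=np.int64).reshape(n_blocks, seq_len)

# Usage: docs = [tokenizer(t)["input_ids"] for t in texts]
#        packed = pack(docs, tokenizer.eos_token_id)
```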

Model Initialization

Decoder-only architecture: the model is trained to predict the next token.
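A quick sketch of a single next-token prediction step, using gpt2 purely as a small example checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits   # shape: (batch, seq_len, vocab_size)
next_id = logits[0, -1].argmax()      # most likely token at the last position
print(tok.decode(next_id))
```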

Weights

Random weights: train from scratch.

Existing model weights: continue training an existing model.

Downscaling an existing model: start from a pretrained model and keep fewer layers.

Upscaling an existing model: start from a pretrained model and duplicate layers to go deeper (see the sketch after this list).
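Hedged sketches of all four options with Hugging Face transformers. Downscaling and upscaling are shown as simple layer slicing and duplication on GPT-2; this is one plausible recipe, not the course's exact one:

```python
import copy
import torch.nn as nn
from transformers import AutoConfig, AutoModelForCausalLM

# 1) Random weights: build the architecture from a config, no pretrained weights.
config = AutoConfig.from_pretrained("gpt2")
scratch = AutoModelForCausalLM.from_config(config)

# 2) Existing weights: load a pretrained checkpoint and keep training it.
base = AutoModelForCausalLM.from_pretrained("gpt2")   # 12 transformer layers

# 3) Downscaling: reuse pretrained weights but keep only the first 6 layers.
small_cfg = AutoConfig.from_pretrained("gpt2", n_layer=6)
small = AutoModelForCausalLM.from_config(small_cfg)
small.transformer.h = copy.deepcopy(base.transformer.h[:6])
small.transformer.wte = copy.deepcopy(base.transformer.wte)
# (the remaining submodules stay randomly initialized in this sketch)

# 4) Upscaling (depth upscaling): duplicate layers to build a deeper model,
#    here the first 9 + last 9 of the 12 pretrained layers, so the middle repeats.
big_cfg = AutoConfig.from_pretrained("gpt2", n_layer=18)
big = AutoModelForCausalLM.from_config(big_cfg)
layers = base.transformer.h
big.transformer.h = nn.ModuleList(
    [copy.deepcopy(m) for m in list(layers[:9]) + list(layers[3:])]
)
```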

Training in Action

Training Cycle

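One pass of the cycle is: load a batch, forward pass, compute the next-token loss, backpropagate, update the weights, repeat. A minimal manual loop, assuming `model` is a causal LM (e.g. from the initialization sketch) and `packed` is the (n_blocks, seq_len) array from the packing sketch:

```python
import torch
from torch.utils.data import DataLoader

loader = DataLoader(torch.tensor(packed), batch_size=8, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

model.train()
for step, batch in enumerate(loader):
    # Passing labels=input_ids makes HF causal LMs compute the shifted
    # next-token cross-entropy loss internally.
    out = model(input_ids=batch, labels=batch)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if step % 100 == 0:
        print(step, out.loss.item())
```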

Cost

https://huggingface.co/training-cluster

Evaluation

Loss: watch the trend; it should decrease steadily over training.

If the loss looks wrong, check the dataset and the training arguments.
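If training goes through the Hugging Face Trainer, the logged losses can be pulled from `trainer.state.log_history` and plotted to check the trend (this assumes a completed Trainer run named `trainer`):

```python
import matplotlib.pyplot as plt

history = [e for e in trainer.state.log_history if "loss" in e]
steps = [e["step"] for e in history]
losses = [e["loss"] for e in history]

plt.plot(steps, losses)
plt.xlabel("step")
plt.ylabel("training loss")
plt.show()
```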

See for Yourself

Human inspection matters: read the model's actual outputs yourself.

Compare with Others

This also needs human judgment.

Benchmark

https://huggingface.co/open-llm-leaderboard

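A common way to run standard benchmarks locally is EleutherAI's lm-evaluation-harness. A hedged sketch of its Python API; the model and task names are illustrative, and the API surface can vary by version:

```python
import lm_eval

# Evaluate a Hugging Face checkpoint on one benchmark task, zero-shot.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=gpt2",
    tasks=["hellaswag"],
    num_fewshot=0,
)
print(results["results"])
```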