Data & Training Deep Dive

LoRA (Low-Rank Adaptation)

Definition
A parameter-efficient fine-tuning technique that freezes a pretrained model's weights and injects small trainable low-rank matrices alongside them. LoRA lets companies customize large models at a fraction of the cost of full fine-tuning.
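The core idea fits in a few lines of matrix algebra: instead of updating a frozen weight matrix W, LoRA learns a low-rank update written as the product of two small matrices A and B. A minimal NumPy sketch, with hypothetical layer sizes and the zero-init for B used so the adapted model starts out identical to the base model:

```python
import numpy as np

d_in, d_out, rank = 4096, 4096, 8  # hypothetical layer dimensions; rank is LoRA's "r"
alpha = 16                         # scaling hyperparameter

rng = np.random.default_rng(0)
W = rng.standard_normal((d_in, d_out)) * 0.01  # frozen pretrained weight
A = rng.standard_normal((d_in, rank)) * 0.01   # trainable down-projection
B = np.zeros((rank, d_out))                    # trainable up-projection, zero-initialized

def lora_forward(x):
    # Base output plus the low-rank update: y = xW + (alpha / r) * xAB
    return x @ W + (alpha / rank) * (x @ A @ B)

x = rng.standard_normal((2, d_in))
y = lora_forward(x)

trainable_fraction = (A.size + B.size) / W.size
print(f"trainable fraction: {trainable_fraction:.4%}")  # ~0.39% at rank 8
```

Only A and B receive gradients during training; at rank 8 on a 4096x4096 layer they hold about 0.4% of the layer's parameters, which is where the "0.1-1%" figure below comes from.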
Why it matters
LoRA democratized model customization. Before LoRA, fine-tuning a 70B-parameter model required tens of GPUs and weeks of training. LoRA reduces this to a single GPU and hours by only training 0.1-1% of the model's parameters. The business impact is enormous: companies can now customize frontier models for specific domains, output formats, and brand voices without massive infrastructure investment. LoRA also enables serving multiple custom model versions efficiently: the base model stays in memory and different LoRA adapters are hot-swapped per request. This is how companies can offer personalized AI to thousands of enterprise customers from a single GPU cluster.
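The multi-tenant serving pattern described above can be sketched in a few lines: the base weights load once, and each request picks its own small adapter pair. A simplified NumPy illustration with hypothetical customer IDs and tiny dimensions (real servers batch requests and apply adapters inside attention and MLP layers):

```python
import numpy as np

d, rank = 64, 4
rng = np.random.default_rng(1)
W_base = rng.standard_normal((d, d)) * 0.01  # shared frozen base weight, loaded once

# Hypothetical adapter library: one small (A, B) pair per customer
adapters = {
    "customer_a": (rng.standard_normal((d, rank)) * 0.01,
                   rng.standard_normal((rank, d)) * 0.01),
    "customer_b": (rng.standard_normal((d, rank)) * 0.01,
                   rng.standard_normal((rank, d)) * 0.01),
}

def serve(x, adapter_id):
    # The base weight stays resident; only the tiny A, B pair changes per request
    A, B = adapters[adapter_id]
    return x @ W_base + x @ A @ B

x = rng.standard_normal((1, d))
out_a = serve(x, "customer_a")
out_b = serve(x, "customer_b")
```

Because each adapter is orders of magnitude smaller than the base model, hundreds can sit in GPU memory at once, which is what makes millisecond hot-swapping practical.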
In practice
The LoRA paper from Microsoft Research (Hu et al., 2021) showed that low-rank adaptation matched full fine-tuning quality on many tasks while training 10,000x fewer parameters. QLoRA, introduced by Dettmers et al. in 2023, combined LoRA with quantization, enabling fine-tuning of a 65B model on a single 48GB GPU. Together AI and Anyscale offer managed LoRA fine-tuning services. In production, companies maintain libraries of LoRA adapters for different clients or use cases, all sharing the same base model. A single H100 can serve the base model with hundreds of LoRA adapters, switching between them in milliseconds.
