LoRA (Low-Rank Adaptation)
- Definition
- A parameter-efficient fine-tuning technique that injects small trainable matrices into a frozen model. LoRA lets companies customize large models at a fraction of the cost of full fine-tuning.
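The core idea can be shown in a minimal numpy sketch (illustrative only; `LoRALinear` and the rank/alpha values here are hypothetical, not any library's API). The frozen weight W is augmented with a trainable low-rank product B·A, so only A and B are updated during fine-tuning:

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (rank r)."""
    def __init__(self, W, r=8, alpha=16):
        d_out, d_in = W.shape
        self.W = W                                  # frozen pretrained weight
        self.A = np.random.randn(r, d_in) * 0.01    # trainable, small random init
        self.B = np.zeros((d_out, r))               # trainable, zero init: delta starts at 0
        self.scale = alpha / r                      # standard LoRA scaling factor

    def forward(self, x):
        # y = W x + scale * B (A x); only A and B would receive gradients
        return self.W @ x + self.scale * (self.B @ (self.A @ x))
```

Because B starts at zero, the adapted layer initially behaves exactly like the frozen base layer, and training moves it away from that baseline only through the small A and B matrices.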
- Why it matters
- LoRA democratized model customization. Before LoRA, fine-tuning a 70B-parameter model required tens of GPUs and weeks of training. LoRA reduces this to a single GPU and hours by only training 0.1-1% of the model's parameters. The business impact is enormous: companies can now customize frontier models for specific domains, output formats, and brand voices without massive infrastructure investment. LoRA also enables serving multiple custom model versions efficiently: the base model stays in memory and different LoRA adapters are hot-swapped per request. This is how companies can offer personalized AI to thousands of enterprise customers from a single GPU cluster.
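The multi-adapter serving pattern described above can be sketched in a few lines (a hypothetical illustration, not a real serving stack): one base weight stays resident while each client's tiny (B, A) pair is selected per request.

```python
import numpy as np

# One frozen base weight, shared by every client (stays in memory).
BASE_W = np.random.randn(16, 32)

# Per-client adapters: each is just a small (B, A) pair, rank 4 here.
adapters = {
    "client_a": (np.random.randn(16, 4) * 0.01, np.random.randn(4, 32) * 0.01),
    "client_b": (np.random.randn(16, 4) * 0.01, np.random.randn(4, 32) * 0.01),
}

def serve(x, client):
    """Apply the shared base plus the requested client's low-rank delta."""
    B, A = adapters[client]           # "hot-swapping" is just a lookup
    return BASE_W @ x + B @ (A @ x)
```

The design point is that the expensive object (the base model) is loaded once, while switching customers touches only kilobytes-to-megabytes of adapter weights.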
- In practice
- The LoRA paper from Microsoft Research (Hu et al., 2021) showed that low-rank adaptation matched full fine-tuning quality on many tasks with up to 10,000x fewer trainable parameters. QLoRA, introduced by Dettmers et al. in 2023, combined LoRA with 4-bit quantization, enabling fine-tuning of a 65B model on a single 48GB GPU. Together AI and Anyscale offer managed LoRA fine-tuning services. In production, companies maintain libraries of LoRA adapters for different clients or use cases, all sharing the same base model. A single H100 can serve the base model with hundreds of LoRA adapters, switching between them in milliseconds.
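The parameter savings are easy to verify with back-of-the-envelope arithmetic (the layer dimensions below are illustrative; the paper's 10,000x figure applies at full GPT-3 scale, where only a subset of layers gets adapters):

```python
# One 4096x4096 attention projection, adapted at rank 8.
d_in, d_out, r = 4096, 4096, 8

full_params = d_in * d_out          # 16,777,216 weights if fully fine-tuned
lora_params = r * (d_in + d_out)    # 65,536 weights in the A and B matrices

print(full_params // lora_params)   # 256x fewer trainable params for this layer
```

The ratio grows with layer width and shrinks with rank, which is why small ranks (4-64) on large models yield such dramatic savings.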
Related terms
Fine-tuning
The process of continuing to train a pre-trained model on a smaller, task-specific dataset. Fine-tuning customizes model behavior for specific domains or formats and is a key part of most enterprise AI deployments.
Quantization
Reducing the numerical precision of a model's weights (e.g., from 32-bit to 4-bit) to shrink its memory footprint and speed up inference. Quantization makes it possible to run large models on consumer hardware.
Parameter
A learnable value inside a neural network that gets adjusted during training. Model size is measured in parameters (e.g., 70B, 405B), which roughly correlates with capability and cost.
Distillation
The process of training a smaller, cheaper model to mimic the behavior of a larger, more capable one. Distillation is how companies ship AI to edge devices and reduce inference costs without sacrificing too much quality.