What it is
An educational demo showing how to fine-tune LLMs locally on M-series Macs using QLoRA for hashtag generation.
Why it matters
- Makes LLM fine-tuning accessible without GPU clusters.
- Runs in 8 GB of unified memory with under 10 GB of storage.
- A practical introduction to parameter-efficient fine-tuning techniques.
How it works
- QLoRA: 4-bit quantization + low-rank adapters for memory efficiency.
- Task: Hashtag generation as constrained text output.
- Hardware: Optimized for Apple Silicon’s unified memory.
- Pipeline: Data prep → training → inference on consumer hardware.
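The memory win behind the low-rank-adapter half of QLoRA can be shown with a quick parameter count: instead of updating a full d × d weight matrix, training touches only two small matrices A (r × d) and B (d × r), while the 4-bit-quantized base weight stays frozen. The sizes below are typical but illustrative assumptions, not this project's exact configuration:

```python
# Toy parameter count for one attention projection matrix.
# QLoRA trains only the low-rank pair (A, B) and adds (alpha / r) * B @ A
# to the frozen, quantized base weight W.

d = 4096          # hidden size of the projection (illustrative assumption)
r = 8             # LoRA rank (illustrative assumption)

full = d * d      # trainable params if W were fine-tuned directly
lora = 2 * d * r  # trainable params in the A (r x d) and B (d x r) adapters

print(full, lora, f"{lora / full:.2%}")  # rank-8 adapters train <1% of W
```

Combined with 4-bit storage of the frozen base weights, this is what lets fine-tuning fit in consumer-grade unified memory.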
Tech
- Language: Python
- Framework: PyTorch, PEFT
- Models: loaded via Hugging Face Transformers
- Technique: QLoRA (4-bit quantization)
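A hedged sketch of how this stack wires together: Transformers loads the quantized base model and PEFT attaches the trainable adapters. The model id, rank, and target modules below are placeholder assumptions (the project's actual choices are not listed here), and 4-bit backend support on Apple Silicon varies by bitsandbytes version, so treat this as a configuration sketch rather than a verified M-series recipe:

```python
# Sketch only: placeholder model id and hyperparameters, not this project's code.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # QLoRA: 4-bit base weights
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # compute in half precision
)
model = AutoModelForCausalLM.from_pretrained(
    "some/small-causal-lm",                # placeholder model id (assumption)
    quantization_config=bnb_config,
)
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # common attention targets (assumption)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # only the adapters are trainable
```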
My role & links
- Implemented QLoRA pipeline for local training.
- Designed hashtag generation task and dataset.
- Code: GitHub
Overview
A demonstration project exploring parameter-efficient fine-tuning techniques, specifically QLoRA, to make LLM customization accessible on consumer hardware.
Technical Details
- QLoRA Implementation: 4-bit quantization with low-rank adapters for memory efficiency
- Task Design: Hashtag generation as a constrained text generation problem
- Hardware Optimization: Tailored for Apple Silicon’s unified memory architecture
- Training Pipeline: End-to-end workflow from data prep to inference
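The data-prep stage of the pipeline boils down to rendering each post/hashtags pair into a single prompt-completion string for causal-LM fine-tuning. A minimal sketch, assuming a simple "Post/Hashtags" template (the project's exact format is not specified here):

```python
# Minimal data-prep sketch: frame hashtag generation as constrained text
# generation by formatting each training pair into one supervised string.
# The template below is an illustrative assumption.

def format_example(post: str, hashtags: list[str]) -> str:
    """Render one prompt-completion training example."""
    # Normalize tags so each appears exactly once with a leading '#'.
    tags = " ".join(f"#{t.lstrip('#')}" for t in hashtags)
    return f"Post: {post}\nHashtags: {tags}"

example = format_example("Just ran my first marathon!", ["running", "#fitness"])
print(example)
# Post: Just ran my first marathon!
# Hashtags: #running #fitness
```

At inference time the same template is used up through "Hashtags:", and the fine-tuned model completes the tag list.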
Learning Outcomes
This project serves as a practical introduction to:
- Parameter-efficient fine-tuning methods
- Memory optimization techniques for LLMs
- Trade-offs between model size and performance
- Local development workflows for ML experiments