DPO LLM Training Pipeline
AI Writing Style Fine-tuning
Developed a custom Direct Preference Optimization (DPO) training pipeline that fine-tunes LLMs to match a target author's voice and formatting.
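A minimal sketch of the standard DPO loss that a pipeline like this optimizes. It assumes that a summed sequence log-probability has already been computed for each chosen and rejected response, once under the policy being trained and once under a frozen reference model. The function names and the default `beta=0.1` are illustrative assumptions, not taken from this project's code.

```python
import math
from typing import Sequence


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)) = -softplus(-z).
    return -(max(-z, 0.0) + math.log1p(math.exp(-abs(z))))


def dpo_loss(
    policy_chosen_logps: Sequence[float],
    policy_rejected_logps: Sequence[float],
    ref_chosen_logps: Sequence[float],
    ref_rejected_logps: Sequence[float],
    beta: float = 0.1,  # illustrative default; controls deviation from the reference
) -> float:
    """Mean DPO loss over a batch of preference pairs (one log-prob per pair)."""
    n = len(policy_chosen_logps)
    if n == 0 or not (
        n == len(policy_rejected_logps) == len(ref_chosen_logps) == len(ref_rejected_logps)
    ):
        raise ValueError("all inputs must be non-empty and the same length")

    total = 0.0
    for pc, pr, rc, rr in zip(
        policy_chosen_logps, policy_rejected_logps, ref_chosen_logps, ref_rejected_logps
    ):
        # Implicit reward margin: how much more the policy prefers the chosen
        # response over the rejected one, relative to the reference model.
        margin = (pc - rc) - (pr - rr)
        total += -log_sigmoid(beta * margin)
    return total / n
```

When the policy and reference scores are identical the margin is zero and the loss is log 2. The loss falls as the policy shifts probability toward the chosen (target-voice) response more than the reference model does.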
Overview
Created chosen/rejected dataset-curation tooling and a style-transfer evaluation harness.
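One hypothetical way to build chosen/rejected pairs for voice matching is to treat a passage written by the target author as "chosen" and a baseline model's output for the same prompt as "rejected". The sketch below assumes that pairing scheme and an input row shape of `prompt`, `author_text`, and `model_text`, both of which are illustrative and not the project's actual schema. It drops pairs that carry no preference signal and emits the common prompt/chosen/rejected JSONL layout.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    chosen: str    # assumed: passage in the target author's voice
    rejected: str  # assumed: baseline model output for the same prompt


def build_pairs(rows: Iterable[dict], min_chars: int = 20) -> List[PreferencePair]:
    """Turn hypothetical {prompt, author_text, model_text} rows into DPO pairs."""
    seen = set()
    pairs: List[PreferencePair] = []
    for row in rows:
        prompt = row.get("prompt", "").strip()
        chosen = row.get("author_text", "").strip()
        rejected = row.get("model_text", "").strip()
        # Skip rows with a missing prompt or responses too short to be useful.
        if not prompt or len(chosen) < min_chars or len(rejected) < min_chars:
            continue
        # Identical responses give the loss no preference signal.
        if chosen == rejected:
            continue
        # Drop exact duplicate pairs.
        key = (prompt, chosen, rejected)
        if key in seen:
            continue
        seen.add(key)
        pairs.append(PreferencePair(prompt, chosen, rejected))
    return pairs


def to_jsonl(pairs: Iterable[PreferencePair]) -> str:
    # One JSON object per line with "prompt", "chosen", "rejected" keys.
    return "\n".join(json.dumps(asdict(p), ensure_ascii=False) for p in pairs)
```

The filters are deliberately simple. A real curation tool would likely add near-duplicate detection and length balancing between chosen and rejected texts so the model does not learn length as a proxy for style.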
- 01 DPO training
- 02 Voice matching
- 03 Dataset curation
- 04 Style transfer
Python · LLM Fine-tuning · DPO Training · Dataset Curation