Documentation
Cross-vendor local finetuning workstation — SFT, DPO, GRPO, RAFT, RM with verifier-grounded rewards on ROCm, CUDA, Apple MLX, Apple MPS.
What halo-forge is
A workstation tool that takes a base model and turns it into a finetuned, evaluated, served artifact — without leaving the local machine.
The single thing that makes it different from every adjacent project (axolotl, llama-factory, unsloth, mlx-lm-lora, torchtune): it runs natively on every modern accelerator, not just CUDA.
Pick a goal. Choose a catalog model. Pick an algorithm and verifier. Train, evaluate, serve, and save the run into a bundle when it is worth comparing.
Start By Intent
| I want to… | Start here |
|---|---|
| Train my first local model | Quick Start |
| Control a training workstation remotely | Public Frontend: remote workstation |
| Pick the right base model | Choose a Model |
| See runnable examples | Usage Scenarios |
| Run on Apple Silicon | Hardware Notes and Apple Silicon MLX Quickstart |
| Serve or export a trained artifact | Serve / convert / merge |
Capabilities
Trainers
- SFT — supervised finetuning with QLoRA / LoRA / DoRA / rsLoRA / PiSSA. PyTorch on every torch backend; MLX-native on Apple Silicon.
- DPO — preference optimization (sigmoid / IPO / hinge / KTO-pair / RPO / cDPO). PyTorch via TRL; MLX-native DPO supports sigmoid, IPO, hinge, and KTO-pair in reference-free and reference-model modes.
- GRPO — verifier-grounded policy gradient (DeepSeek-R1 / Tülu 3 family). PyTorch via TRL; MLX-native reference-free and reference-model GRPO.
- RAFT — rejection-sampling RLVR with curriculum + reward shaping. PyTorch + native MLX.
- Reward Model — Bradley-Terry RM from preference pairs. Becomes a learned verifier for any other modality.
Verifiers
Pluggable registry — drop a .py in ~/.halo-forge/verifiers/ or use @register_verifier. Out of the box:
- Execution & compile:
gcc,clang,mingw,execution,pytest,humaneval,mbpp,rust,cargo,go,custom,subprocess - Schema & format:
json_structure,json_schema,regex_format - Reference metrics:
bleu,rouge,chrf - LLM-as-judge:
llm_judge— rubric-graded with any local or hosted judge model
Data pipeline
- Synthesize — generate completions from seed prompts via a teacher model + verifier filter.
- Dedup — exact (SHA-256) + fuzzy (MinHash + LSH).
- Score — heuristic quality scoring + threshold / top-K filter.
- Compose —
synthesize → dedup → score → filteris the four-command pre-finetune sequence.
Inference + serving
- OpenAI-compatible serving —
halo-forge serve --model Xexposes/v1/chat/completions,/v1/completions,/v1/models. - Unified convert —
halo-forge convert --format mlx|gguf|hf --quant q4|q8|fp16|bf16|fp32 - Round-trip verify —
halo-forge convert --verifycatches silently-broken exports. - vLLM rollout — continuous-batched generation on CUDA/ROCm.
- MLX rollout — Apple Silicon equivalent via
mlx_lm.generate.
Evaluation
- lm-evaluation-harness —
halo-forge eval --tasks coreruns MMLU / GSM8K / HumanEval / IFEval / ARC etc. - Mid-training probe —
halo-forge proberuns a small held-out benchmark and diffs against a baseline; catches catastrophic forgetting in single-digit minutes.
Reproducibility
- Replay manifests —
halo-forge replay <run_dir>regenerates the exact launch command. - Sweep infrastructure — Optuna-style hyperparameter search with random / TPE / grid samplers.
Run management
- SQLite run database — search / filter / sort / paginate runs.
- Multi-run comparison — pin runs, overlay loss + reward curves, side-by-side config diff.
- Cohort eval dashboard — runs × tasks grid; best-per-task highlighted.
- Cost rollup — per-run kWh + $ from wall-clock × backend nominal power.
- Live telemetry strip — SSE-streamed GPU util / VRAM / power / throughput.
- Remote workstation — non-loopback access uses bearer tokens and controls one Halo Forge host.
Adapter merging
- Bake — single LoRA into base, output is a standard HF checkpoint.
- Combine — N adapters via
linear/ties/dare_linear/dare_ties/magnitude_prune.
Auth + multi-user
- API tokens — bearer-token auth, automatic when bound to non-loopback. Local-first stays zero-config.
Quick navigation
Getting started
- Quick Start — Install + first run
- Choose a Model — Model catalog, Liquid AI caveats, and first picks
- Usage Scenarios — Code, preference, reasoning, VLM, audio, agentic, serve/export
- Hardware Notes — Per-backend recommendations + feature matrix
- Remote Workstation — Token-authenticated browser access to one training host
Trainers
- Overview — Choosing between SFT / DPO / GRPO / RAFT / RM
Verifiers
- Plugin registry + ecosystem
- Execution + compile
- Schema + format
- Reference metrics
- LLM-as-judge
- Multi-language
- Custom verifiers
Data pipeline
Evaluation
Inference + serving
Reproducibility
Reference
Background
- Theory & Research — RLVR foundations
- Graduated Rewards — Partial credit
- Learning Rate Strategies — LR per algorithm
Meta
- Changelog — Version history
- Contributing — How to contribute
Choose a Training Method
Pick the right Halo Forge trainer from the dashboard or CLI
Command Index
Complete index of all halo-forge commands and flags
Configuration
Complete configuration reference
Full Pipeline
Complete guide to training a code generation model
Quick Start
Three practical paths from install to first useful Halo Forge run
Theory & Research
RLVR paradigm and research foundations
Choose a Model
How to pick a base model for SFT, RAFT, DPO, GRPO, VLM, audio, and serving
Data Generation
Preparing training data for SFT and RAFT
SFT Training
Supervised fine-tuning to establish baseline capability
Toolbox Setup
Build and configure the halo forge container environment
Troubleshooting
Common issues and solutions
Graduated Rewards
Why partial credit matters for RLVR training
Hardware Notes
Configuration for AMD Strix Halo
RAFT Training
Reward-Ranked Fine-Tuning with compiler verification
Usage Scenarios
Runnable Halo Forge workflows by goal
Learning Rate Strategies
Experimental learning rate recommendations for RAFT training
Windows Build Server
Configure a Windows machine for MSVC verification
Benchmarking
Evaluate model performance with pass@k metrics
Web UI
Dashboard for training, benchmarking, and monitoring
Model Catalog Reference
Catalog schema, model family status, and compatibility guidance
Production Training Runs
Step-by-step commands for training all model sizes on the Windows Systems Programming dataset
Public Frontend
User-facing local and remote workstation surface for training, monitoring, results, and docs
Trainers
Halo-forge ships four post-training algorithms. They share a common config / dispatch / output shape so the public API and frontend treat every run the same way regardless of which algorithm produced it.
Preference Tuning
DPO and ORPO training from chosen/rejected examples
Code Datasets
Reward Models
Train a scorer from chosen/rejected examples
GRPO
Verifier-grounded RL with group-relative advantages
Vision-Language Training
VLM training for image and text tasks
Audio Training
Audio and speech training paths
Reasoning Training
Math and multi-step reasoning training
Data pipeline
Three operations close the gap between "I have prompts" and "I have a training-ready dataset":
Tool-Use And Agentic Training
Function-calling and structured tool-use training
Dataset Formats
Data shapes expected by each training method
Dashboard Training
Use the Halo Forge dashboard as the primary operator surface
Training Artifacts
Files written by Halo Forge training runs
Apple Silicon MLX Quickstart
Evaluation
`halo-forge eval` wraps EleutherAI's [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) so a halo-forge-trained model can be benchmarked against the published academic suites with one command and one consistent result shape.
Inference + serving
Halo-forge ships three commands that close the train → ship loop without leaving the local machine:
Replay manifests
`halo-forge replay <run_dir>` regenerates the exact launch command for a captured run, optionally relaunching it. Every shipped trainer writes a `replay.json` manifest next to the `training_summary.json`, capturing every input that influenced the run.
Hyperparameter sweeps
Optuna-style search over learning rate, batch, LoRA rank, etc., with random / TPE / grid samplers and ASHA-style sweep-level early stop.
Auth + tokens
Bearer-token API auth that turns on automatically when bound to non-loopback. Local-first stays zero-config.
Modalities & experimental features
Multi-modality training paths (VLM, Audio, Reasoning, Agentic) + features under active development.
Changelog
All notable changes to halo forge
Contributing
How to contribute to halo forge
How to Train
Complete guide to training code generation models with halo forge
Verifiers
Pluggable verification system for RLVR training. Plugin registry, programmatic + schema + reference-metric + LLM-as-judge verifiers.