Documentation

Cross-vendor local finetuning workstation — SFT, DPO, GRPO, RAFT, RM with verifier-grounded rewards on ROCm, CUDA, Apple MLX, Apple MPS.

What halo-forge is

A workstation tool that takes a base model and turns it into a finetuned, evaluated, served artifact — without leaving the local machine.

The single thing that makes it different from every adjacent project (axolotl, llama-factory, unsloth, mlx-lm-lora, torchtune): it runs natively on every modern accelerator, not just CUDA.

Pick a goal. Choose a catalog model. Pick an algorithm and verifier. Train, evaluate, serve, and save the run into a bundle when it is worth comparing.

Start By Intent

I want to…Start here
Train my first local modelQuick Start
Control a training workstation remotelyPublic Frontend: remote workstation
Pick the right base modelChoose a Model
See runnable examplesUsage Scenarios
Run on Apple SiliconHardware Notes and Apple Silicon MLX Quickstart
Serve or export a trained artifactServe / convert / merge

Capabilities

Trainers

  • SFT — supervised finetuning with QLoRA / LoRA / DoRA / rsLoRA / PiSSA. PyTorch on every torch backend; MLX-native on Apple Silicon.
  • DPO — preference optimization (sigmoid / IPO / hinge / KTO-pair / RPO / cDPO). PyTorch via TRL; MLX-native DPO supports sigmoid, IPO, hinge, and KTO-pair in reference-free and reference-model modes.
  • GRPO — verifier-grounded policy gradient (DeepSeek-R1 / Tülu 3 family). PyTorch via TRL; MLX-native reference-free and reference-model GRPO.
  • RAFT — rejection-sampling RLVR with curriculum + reward shaping. PyTorch + native MLX.
  • Reward Model — Bradley-Terry RM from preference pairs. Becomes a learned verifier for any other modality.

Verifiers

Pluggable registry — drop a .py in ~/.halo-forge/verifiers/ or use @register_verifier. Out of the box:

  • Execution & compile: gcc, clang, mingw, execution, pytest, humaneval, mbpp, rust, cargo, go, custom, subprocess
  • Schema & format: json_structure, json_schema, regex_format
  • Reference metrics: bleu, rouge, chrf
  • LLM-as-judge: llm_judge — rubric-graded with any local or hosted judge model

Data pipeline

  • Synthesize — generate completions from seed prompts via a teacher model + verifier filter.
  • Dedup — exact (SHA-256) + fuzzy (MinHash + LSH).
  • Score — heuristic quality scoring + threshold / top-K filter.
  • Composesynthesize → dedup → score → filter is the four-command pre-finetune sequence.

Inference + serving

  • OpenAI-compatible servinghalo-forge serve --model X exposes /v1/chat/completions, /v1/completions, /v1/models.
  • Unified converthalo-forge convert --format mlx|gguf|hf --quant q4|q8|fp16|bf16|fp32
  • Round-trip verifyhalo-forge convert --verify catches silently-broken exports.
  • vLLM rollout — continuous-batched generation on CUDA/ROCm.
  • MLX rollout — Apple Silicon equivalent via mlx_lm.generate.

Evaluation

  • lm-evaluation-harnesshalo-forge eval --tasks core runs MMLU / GSM8K / HumanEval / IFEval / ARC etc.
  • Mid-training probehalo-forge probe runs a small held-out benchmark and diffs against a baseline; catches catastrophic forgetting in single-digit minutes.

Reproducibility

  • Replay manifestshalo-forge replay <run_dir> regenerates the exact launch command.
  • Sweep infrastructure — Optuna-style hyperparameter search with random / TPE / grid samplers.

Run management

  • SQLite run database — search / filter / sort / paginate runs.
  • Multi-run comparison — pin runs, overlay loss + reward curves, side-by-side config diff.
  • Cohort eval dashboard — runs × tasks grid; best-per-task highlighted.
  • Cost rollup — per-run kWh + $ from wall-clock × backend nominal power.
  • Live telemetry strip — SSE-streamed GPU util / VRAM / power / throughput.
  • Remote workstation — non-loopback access uses bearer tokens and controls one Halo Forge host.

Adapter merging

  • Bake — single LoRA into base, output is a standard HF checkpoint.
  • Combine — N adapters via linear / ties / dare_linear / dare_ties / magnitude_prune.

Auth + multi-user

  • API tokens — bearer-token auth, automatic when bound to non-loopback. Local-first stays zero-config.

Quick navigation

Getting started

Trainers

  • Overview — Choosing between SFT / DPO / GRPO / RAFT / RM

Verifiers

Data pipeline

Evaluation

Inference + serving

Reproducibility

Reference

Background

Meta

Choose a Training Method

Pick the right Halo Forge trainer from the dashboard or CLI

Command Index

Complete index of all halo-forge commands and flags

Configuration

Complete configuration reference

Full Pipeline

Complete guide to training a code generation model

Quick Start

Three practical paths from install to first useful Halo Forge run

Theory & Research

RLVR paradigm and research foundations

Choose a Model

How to pick a base model for SFT, RAFT, DPO, GRPO, VLM, audio, and serving

Data Generation

Preparing training data for SFT and RAFT

SFT Training

Supervised fine-tuning to establish baseline capability

Toolbox Setup

Build and configure the halo forge container environment

Troubleshooting

Common issues and solutions

Graduated Rewards

Why partial credit matters for RLVR training

Hardware Notes

Configuration for AMD Strix Halo

RAFT Training

Reward-Ranked Fine-Tuning with compiler verification

Usage Scenarios

Runnable Halo Forge workflows by goal

Learning Rate Strategies

Experimental learning rate recommendations for RAFT training

Windows Build Server

Configure a Windows machine for MSVC verification

Benchmarking

Evaluate model performance with pass@k metrics

Web UI

Dashboard for training, benchmarking, and monitoring

Model Catalog Reference

Catalog schema, model family status, and compatibility guidance

Production Training Runs

Step-by-step commands for training all model sizes on the Windows Systems Programming dataset

Public Frontend

User-facing local and remote workstation surface for training, monitoring, results, and docs

Trainers

Halo-forge ships four post-training algorithms. They share a common config / dispatch / output shape so the public API and frontend treat every run the same way regardless of which algorithm produced it.

Preference Tuning

DPO and ORPO training from chosen/rejected examples

Code Datasets

Reward Models

Train a scorer from chosen/rejected examples

GRPO

Verifier-grounded RL with group-relative advantages

Vision-Language Training

VLM training for image and text tasks

Audio Training

Audio and speech training paths

Reasoning Training

Math and multi-step reasoning training

Data pipeline

Three operations close the gap between "I have prompts" and "I have a training-ready dataset":

Tool-Use And Agentic Training

Function-calling and structured tool-use training

Dataset Formats

Data shapes expected by each training method

Dashboard Training

Use the Halo Forge dashboard as the primary operator surface

Training Artifacts

Files written by Halo Forge training runs

Apple Silicon MLX Quickstart

Evaluation

`halo-forge eval` wraps EleutherAI's [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) so a halo-forge-trained model can be benchmarked against the published academic suites with one command and one consistent result shape.

Inference + serving

Halo-forge ships three commands that close the train → ship loop without leaving the local machine:

Replay manifests

`halo-forge replay <run_dir>` regenerates the exact launch command for a captured run, optionally relaunching it. Every shipped trainer writes a `replay.json` manifest next to the `training_summary.json`, capturing every input that influenced the run.

Hyperparameter sweeps

Optuna-style search over learning rate, batch, LoRA rank, etc., with random / TPE / grid samplers and ASHA-style sweep-level early stop.

Auth + tokens

Bearer-token API auth that turns on automatically when bound to non-loopback. Local-first stays zero-config.

Modalities & experimental features

Multi-modality training paths (VLM, Audio, Reasoning, Agentic) + features under active development.

Changelog

All notable changes to halo forge

Contributing

How to contribute to halo forge

How to Train

Complete guide to training code generation models with halo forge

Verifiers

Pluggable verification system for RLVR training. Plugin registry, programmatic + schema + reference-metric + LLM-as-judge verifiers.