Choose a Training Method

Pick the right Halo Forge trainer from the dashboard or CLI

Halo Forge now exposes the same training surface in the dashboard and the CLI. Use Start for a safe first SFT run, then use Train once you know the goal and need a specific method.

Goal	Start with	Move to	Data shape
Domain/style adaptation	SFT	DPO or ORPO	prompt/completion
Code with executable checks	SFT	RAFT or GRPO	prompts plus verifier
Preference alignment	DPO or ORPO	RM, then GRPO	prompt/chosen/rejected
Verifier-grounded reasoning	SFT or reasoning	GRPO	prompts plus verifier
Vision-language	VLM	VLM with verifier gating	image + prompt + answer
Audio	Audio	Audio with task verifier	audio + transcript/label
Tool use	Agentic	GRPO with schema/tool verifier	messages/tool calls
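
To make the Data shape column concrete, here is a minimal sketch of a per-method record check. The field names (`prompt`, `completion`, `chosen`, `rejected`) are taken from the table, but the `REQUIRED_FIELDS` mapping and the `missing_fields` helper are hypothetical illustrations, not part of Halo Forge; the actual schema may use different keys.

```python
# Hypothetical mapping from method to required record fields.
# Field names follow the "Data shape" column; Halo Forge's real schema may differ.
REQUIRED_FIELDS = {
    "sft": {"prompt", "completion"},
    "dpo": {"prompt", "chosen", "rejected"},
    "orpo": {"prompt", "chosen", "rejected"},
    "rm": {"prompt", "chosen", "rejected"},
}


def missing_fields(method: str, record: dict) -> set:
    """Return the required fields that are absent from a single record."""
    return REQUIRED_FIELDS[method] - set(record)


# Example records (contents are placeholders).
sft_record = {"prompt": "Summarize the ticket.", "completion": "User cannot log in."}
pair_record = {"prompt": "Explain DNS.", "chosen": "A clear answer.", "rejected": "An off-topic answer."}

print(missing_fields("sft", sft_record))    # empty set: record is usable for SFT
print(missing_fields("dpo", sft_record))    # fields an SFT record lacks for DPO
```

A check like this can catch a mismatch between the chosen method and the dataset before a run is launched, for example when an SFT dataset is pointed at a preference method.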

Dashboard Flow

1. Open Train.
2. Choose a goal: Code, Reasoning, Tool use, Vision, Audio, or Preferences.
3. Choose the method Halo Forge should run.
4. Review the generated launch, preflight, output path, and backend notes.
5. Launch, then monitor the run from Runs and inspect artifacts in Results.

The dashboard intentionally hides unusual flags until the advanced drawer is opened. The CLI remains available for exact reproducibility and scripting.

Method Guide

SFT learns from labeled examples. Use it first unless you already have preference data.
RAFT generates multiple answers per prompt, verifies them, keeps the useful ones, then trains on the kept answers.
DPO uses chosen/rejected pairs to improve behavior without an explicit reward model.
ORPO uses the same pair data as DPO but skips the reference model.
RM trains a reward scorer from chosen/rejected pairs.
GRPO uses a verifier as reward and performs group-relative policy updates.
VLM, audio, reasoning, and agentic are domain-specific training surfaces with capability gates where needed.
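
The RAFT and GRPO descriptions above can be sketched in a few lines. This is an illustration of the general techniques under stated assumptions, not Halo Forge's implementation: `raft_keep`, `group_relative_advantages`, `toy_verifier`, and the `threshold` parameter are hypothetical names, and the choice of population standard deviation for normalization is an assumption (some implementations use the sample standard deviation).

```python
from statistics import mean, pstdev
from typing import Callable, List


def raft_keep(candidates: List[str],
              verifier: Callable[[str], float],
              threshold: float = 0.5) -> List[str]:
    """RAFT-style filter: keep only candidates whose verifier score meets the threshold.

    The kept answers would then be used as supervised training targets.
    """
    return [c for c in candidates if verifier(c) >= threshold]


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: normalize each reward against its own group.

    Each answer sampled for the same prompt is scored relative to the group
    mean, scaled by the group standard deviation (population std, an assumption).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def toy_verifier(answer: str) -> float:
    """Hypothetical verifier for the prompt 'What is 2 + 2?'."""
    return 1.0 if answer.strip() == "4" else 0.0


samples = ["4", "5", " 4 ", "four"]
kept = raft_keep(samples, toy_verifier)            # answers that pass the verifier
rewards = [toy_verifier(s) for s in samples]       # verifier scores used as rewards
advantages = group_relative_advantages(rewards)    # positive for passing answers
print(kept)
print(advantages)
```

The contrast is the point of the sketch: RAFT discards failing answers and trains only on what remains, while GRPO keeps every sampled answer and lets the verifier score, relative to the group, decide the direction and size of the update. When every answer in a group gets the same reward, the advantages are all zero and that group contributes no learning signal.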

