Choose a Training Method

Pick the right Halo Forge trainer from the dashboard or CLI

Halo Forge exposes the same training surface in both the dashboard and the CLI. Use Start for a safe first SFT run, then switch to Train once you know your goal and need a specific method.

| Goal | Start with | Move to | Data shape |
| --- | --- | --- | --- |
| Domain/style adaptation | SFT | DPO or ORPO | prompt/completion |
| Code with executable checks | SFT | RAFT or GRPO | prompts plus verifier |
| Preference alignment | DPO or ORPO | RM, then GRPO | prompt/chosen/rejected |
| Verifier-grounded reasoning | SFT or reasoning | GRPO | prompts plus verifier |
| Vision-language | VLM | VLM with verifier gating | image + prompt + answer |
| Audio | Audio | Audio with task verifier | audio + transcript/label |
| Tool use | Agentic | GRPO with schema/tool verifier | messages/tool calls |
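The text-only rows of the table come down to three record layouts. The sketch below assumes JSON Lines records with the hypothetical field names `prompt`, `completion`, `chosen`, and `rejected` (check the Halo Forge dataset reference for the exact schema). It classifies one record and lists the methods from the table that accept it. Prompt-only records are enough for RAFT and GRPO because the verifier supplies the training signal. Vision, audio, and tool-call shapes are not covered here.

```python
import json

# Hypothetical field names; Halo Forge's actual schema may differ.
# Ordered from most to least specific (dicts keep insertion order).
SHAPES = {
    "preference": ({"prompt", "chosen", "rejected"}, ["DPO", "ORPO", "RM"]),
    "supervised": ({"prompt", "completion"}, ["SFT"]),
    "prompt_only": ({"prompt"}, ["RAFT", "GRPO"]),  # reward comes from a verifier
}


def classify(record: dict) -> tuple[str, list[str]]:
    """Return (shape name, methods that accept it) for one record."""
    keys = set(record)
    # Check the most specific shape first so a preference pair is not
    # mistaken for a plain prompt.
    for name, (required, methods) in SHAPES.items():
        if required <= keys:
            return name, methods
    raise ValueError(f"unrecognized record shape: {sorted(keys)}")


lines = [
    '{"prompt": "Translate to French: cat", "completion": "chat"}',
    '{"prompt": "Explain recursion.", "chosen": "A function that calls itself.", "rejected": "idk"}',
    '{"prompt": "Write is_prime(n) in Python."}',
]
for line in lines:
    shape, methods = classify(json.loads(line))
    print(shape, methods)
```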

Dashboard Flow

  1. Open Train.
  2. Choose a goal: Code, Reasoning, Tool use, Vision, Audio, or Preferences.
  3. Choose the method Halo Forge should run.
  4. Review the generated launch, preflight, output path, and backend notes.
  5. Launch, then monitor the run from Runs and inspect artifacts in Results.

The dashboard hides less common flags until you open the advanced drawer. The CLI remains available when you need exact reproducibility or scripting.

Method Guide

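  • SFT (supervised fine-tuning) learns from labeled prompt/completion examples. Start here unless you already have preference data.
  • RAFT samples several answers per prompt, verifies them, keeps the ones that pass, then fine-tunes on the kept answers.
  • DPO uses chosen/rejected pairs and a frozen reference model to improve behavior without training an explicit reward model.
  • ORPO uses the same pair data as DPO but does not need a reference model.
  • RM trains a reward model that scores responses, learned from chosen/rejected pairs.
  • GRPO uses a verifier as the reward and performs group-relative policy updates: each completion is scored against the other completions sampled for the same prompt.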
  • VLM, Audio, Reasoning, and Agentic are domain-specific training surfaces, each with capability gates where needed.
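The RAFT entry above (sample, verify, keep, train) can be sketched in plain Python. This is a minimal illustration, not Halo Forge's implementation: `generate` is a hypothetical stand-in for sampling from the model, and `verify` is a toy verifier that runs a candidate function against two test cases. With a pass/fail verifier, "keep the useful ones" reduces to keeping every distinct candidate that passes, and the kept pairs form an ordinary prompt/completion dataset for the SFT step.

```python
from typing import Callable

# Hypothetical candidate drafts standing in for model samples.
DRAFTS = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",  # wrong logic
    "def add(a, b)\n    return a + b",   # syntax error
]


def generate(prompt: str, n: int) -> list[str]:
    """Hypothetical sampler: a real run would sample n completions from the
    model; here we cycle through fixed drafts so the example is deterministic."""
    return [DRAFTS[i % len(DRAFTS)] for i in range(n)]


def verify(code: str) -> bool:
    """Toy verifier: the candidate must define add() and pass two tests."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # acceptable only for trusted toy inputs
        add = namespace["add"]
        return add(2, 3) == 5 and add(-1, 1) == 0
    except Exception:  # includes SyntaxError from malformed candidates
        return False


def raft_round(prompts: list[str], n: int,
               verifier: Callable[[str], bool]) -> list[dict]:
    """One RAFT round: sample n answers per prompt, keep the verified ones,
    and return prompt/completion pairs for the next SFT step."""
    kept = []
    for prompt in prompts:
        # dict.fromkeys drops duplicate samples while preserving order.
        for completion in dict.fromkeys(generate(prompt, n)):
            if verifier(completion):
                kept.append({"prompt": prompt, "completion": completion})
    return kept


dataset = raft_round(["Write add(a, b) in Python."], n=8, verifier=verify)
print(len(dataset), "verified example(s)")
```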
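The difference between DPO, ORPO, and RM is easiest to see in their per-pair losses. The sketch below follows the standard published formulations, not necessarily Halo Forge's internals. The inputs are log-probabilities or scores you would obtain from the model, and the argument names and the `beta`/`lam` defaults are illustrative assumptions. DPO compares the policy against a frozen reference model, ORPO adds an odds-ratio penalty to the SFT loss using only the policy, and a reward model is trained with a pairwise Bradley-Terry loss on scalar scores.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair. Inputs are summed sequence log-probs under the
    policy (pi_*) and a frozen reference model (ref_*); no reward model."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float,
              lam: float = 0.1) -> float:
    """ORPO loss for one pair. Inputs are mean per-token log-probs under the
    policy only (must be < 0); no reference model is involved."""
    def log_odds(avg_logp: float) -> float:
        # log(p / (1 - p)) where p = exp(avg_logp)
        return avg_logp - math.log1p(-math.exp(avg_logp))

    nll = -avg_logp_chosen  # SFT term on the chosen response
    ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    return nll - lam * log_sigmoid(ratio)


def reward_model_loss(score_chosen: float, score_rejected: float) -> float:
    """Pairwise (Bradley-Terry) loss on a reward model's scalar scores."""
    return -log_sigmoid(score_chosen - score_rejected)


print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
print(orpo_loss(-0.5, -2.0))
print(reward_model_loss(1.5, -0.5))
```

All three losses shrink as the chosen response pulls ahead of the rejected one; they differ in what "ahead" is measured against.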
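GRPO's group-relative update rests on one computation: for each prompt, sample a group of completions, score each with the verifier, and normalize every reward against its group's mean and standard deviation. The sketch below computes only those advantages, assuming a hypothetical exact-match verifier on the final line of each completion; the clipped policy-gradient step and KL penalty that consume the advantages are omitted. It uses the population standard deviation, a detail on which implementations differ.

```python
import statistics


def verifier_reward(completion: str, expected: str) -> float:
    """Hypothetical verifier: 1.0 if the last line matches the expected answer."""
    lines = completion.strip().splitlines()
    return 1.0 if lines and lines[-1].strip() == expected else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one prompt's group: (r - mean) / (std + eps).
    Completions that beat their siblings get positive advantages."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One group of sampled completions for the prompt "What is 17 + 25?"
group = [
    "17 + 25 = 42\n42",
    "17 + 25 = 32\n32",
    "Adding them gives\n42",
    "I am not sure.",
]
rewards = [verifier_reward(c, "42") for c in group]
advantages = group_relative_advantages(rewards)
for r, a in zip(rewards, advantages):
    print(f"reward={r:.0f} advantage={a:+.3f}")
```

A group in which every completion earns the same reward gets all-zero advantages, so prompts the model always solves (or never solves) contribute no learning signal.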