Usage Scenarios

Runnable Halo Forge workflows by goal

These scenarios are intentionally small. Scale samples, cycles, and model size after the workflow is producing trustworthy artifacts.

Code: SFT To RAFT

halo-forge sft train \
  --dataset codealpaca \
  --model Qwen/Qwen2.5-Coder-1.5B \
  --output models/code-sft \
  --epochs 1 \
  --max-samples 500

halo-forge raft train \
  --checkpoint models/code-sft/final_model \
  --prompts data/rlvr/humaneval_prompts.jsonl \
  --verifier execution \
  --cycles 3 \
  --samples-per-prompt 8 \
  --output models/code-raft

Use this when you want the verifier to filter generated code before the next training pass.

Preference Tuning: DPO

halo-forge dpo train \
  --dataset ultrafeedback \
  --model Qwen/Qwen2.5-3B-Instruct \
  --output models/chat-dpo \
  --epochs 1 \
  --loss-type sigmoid

Use DPO when you want the standard preference baseline. On MLX, Halo Forge supports sigmoid, IPO, hinge, and KTO-pair paths.

Reasoning: GRPO

halo-forge grpo train \
  --dataset gsm8k \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --verifier execution \
  --num-generations 4 \
  --reward-threshold 0.5 \
  --output models/reasoning-grpo

Use this when reward can be checked mechanically or with a strict verifier.

VLM: Document Extraction

halo-forge vlm train \
  --dataset textvqa \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --cycles 2 \
  --limit 24 \
  --output models/vlm-docs \
  --allow-prototype-train

For custom forms or invoices, prepare JSONL rows with image paths, prompts, and expected fields before scaling.

Audio: ASR Adaptation

halo-forge audio train \
  --dataset librispeech \
  --model openai/whisper-small \
  --task asr \
  --cycles 2 \
  --output models/audio-asr \
  --allow-prototype-train

Use this for speech-to-text adaptation. Liquid audio models are interesting, but the safest current Halo Forge path is Whisper-compatible.

Agentic: Tool Calling

halo-forge agentic train \
  --dataset xlam \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --cycles 2 \
  --limit 64 \
  --output models/agentic-tools \
  --allow-prototype-train

Use this when outputs must follow function-call or tool-call structure.

Evaluate, Serve, Export

halo-forge eval --model models/code-sft/final_model --tasks core
halo-forge serve --model models/code-sft/final_model
halo-forge convert --source models/code-sft/final_model --format gguf --quant q4 --output models/code-sft.gguf

Evaluation tells you whether training helped. Serving lets you test the artifact behind an OpenAI-compatible API. Export prepares deployment artifacts.

Apple Silicon MLX

halo-forge --accelerator mlx models list --backend mlx
halo-forge --accelerator mlx serve \
  --model mlx-community/Qwen2.5-3B-Instruct-bf16 \
  --backend mlx

Use MLX-format models on Apple Silicon. For PyTorch training on Apple Silicon, use the MPS backend unless a trainer explicitly supports MLX.