
Local or Remote Training Workstation
Train, evaluate, serve, and export models from one Halo Forge machine across ROCm, CUDA, Apple MPS, and Apple MLX.
Start By Intent
Pick the path that matches your first hour.
First Local Run
Use the beginner quickstart to install, check hardware, train a tiny model, evaluate, and serve.
Choose a Model
Browse the curated model catalog: Qwen, Llama, Mistral, Gemma, DeepSeek, Whisper, MLX, and Liquid AI.
Run a Scenario
Follow goal-based recipes for code, preference tuning, reasoning, VLM, audio, agentic, and export.
Local or Remote Workstation
Halo Forge runs on the machine with the accelerator. Use it locally at the desk, or open the same product surface from another device on a trusted network.
# On the training workstation
halo-forge token create dashboard
halo-forge serve --host 0.0.0.0 --port 8000
# From another device
open http://workstation.local:8000
# Paste the token in the Connection screen.
The RAFT Approach
halo-forge implements RAFT (Reward-Ranked Fine-Tuning) — essentially iterated rejection sampling:
for cycle in range(num_cycles):
    # 1. Generate samples
    samples = model.generate(prompts, n=8)
    # 2. Verify with compiler
    results = verifier.verify_batch(samples)
    # 3. Filter by reward threshold
    filtered = [s for s, r in zip(samples, results)
                if r.reward >= 0.5]
    # 4. Fine-tune on verified samples
    model.train(filtered)
This is simpler than PPO/GRPO (1x model memory vs 2-4x), stable to train, and produces comparable results.
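The loop above depends on a real model and verifier. To see the control flow on its own, here is a minimal runnable sketch with toy stand-ins. `ToyModel`, its `skill` parameter, and this `verify_batch` are hypothetical and are not part of halo-forge; they only mimic the generate, verify, filter, train shape of one RAFT cycle.

```python
import random


class ToyModel:
    """Hypothetical stand-in for a model: `skill` is the chance a sample is good."""

    def __init__(self, skill=0.2, seed=0):
        self.skill = skill
        self.rng = random.Random(seed)

    def generate(self, prompts, n=8):
        # Produce n (prompt, is_good) pairs per prompt; is_good stands in for code quality.
        return [(p, self.rng.random() < self.skill) for p in prompts for _ in range(n)]

    def train(self, samples):
        # Pretend fine-tuning: each kept sample nudges skill upward, capped at 0.95.
        self.skill = min(0.95, self.skill + 0.01 * len(samples))


def verify_batch(samples):
    # Hypothetical verifier: reward 1.0 for good samples, 0.0 otherwise.
    return [1.0 if is_good else 0.0 for _, is_good in samples]


def raft(model, prompts, num_cycles=3, n=8, threshold=0.5):
    """Run generate -> verify -> filter -> train and record per-cycle statistics."""
    history = []
    for _ in range(num_cycles):
        samples = model.generate(prompts, n=n)
        rewards = verify_batch(samples)
        kept = [s for s, r in zip(samples, rewards) if r >= threshold]
        if kept:  # skip the update when nothing passed verification
            model.train(kept)
        history.append((len(samples), len(kept), model.skill))
    return history
```

Calling `raft(ToyModel(), ["prompt-a", "prompt-b"])` returns one `(generated, kept, skill)` tuple per cycle. Only a single model is held in memory throughout, which is the property the comparison with PPO/GRPO refers to.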
How It Works
RAFT iteratively improves code generation through verified feedback:
| Cycle | What Happens | Expected Result |
|---|---|---|
| 1-2 | Model learns basic patterns | Largest gains |
| 3-4 | Refinement continues | Moderate gains |
| 5-6 | Gains approach a plateau | Monitor to decide when to stop |
Results vary by model, dataset, and domain. Run your own benchmarks to measure improvement.
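One way to act on "monitor for stopping" is a simple plateau check over per-cycle pass rates. The sketch below is an assumption about how such a check could look, not a halo-forge feature: `should_stop`, `min_gain`, and `patience` are hypothetical names, and the pass rates in the usage note are invented for illustration.

```python
def should_stop(pass_rates, min_gain=0.01, patience=2):
    """Return True when each of the last `patience` cycles gained less than `min_gain`.

    pass_rates: one verified pass rate per completed cycle, oldest first.
    """
    if len(pass_rates) <= patience:
        return False  # not enough history to judge a plateau
    recent = pass_rates[-(patience + 1):]
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(gain < min_gain for gain in gains)
```

For example, `should_stop([0.20, 0.35, 0.45])` is false because the model is still improving quickly, while `should_stop([0.20, 0.35, 0.45, 0.455, 0.457])` is true because the last two cycles each gained less than one percentage point.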
Quick Start
# Clone and install
git clone https://github.com/professor-moody/halo-forge.git
cd halo-forge
python -m venv .venv
source .venv/bin/activate
pip install -e .
# Validate installation
halo-forge test --level smoke # Quick check, no GPU
halo-forge info # Backend and hardware
# Browse model choices
halo-forge models list --mode sft
# First useful training run
halo-forge sft train --dataset codealpaca --model Qwen/Qwen2.5-Coder-1.5B --epochs 1
Graduated Rewards
Binary (pass/fail) rewards create sparse gradients because a near miss scores the same as a complete failure. halo-forge uses graduated rewards instead:
| Outcome | Reward | Signal |
|---|---|---|
| Syntax error | 0.0 | Completely wrong |
| Compiles with warnings | 0.3 | Close but imperfect |
| Compiles clean | 0.5 | Correct syntax |
| Runs without crash | 0.7 | Executable |
| Correct output | 1.0 | Fully correct |
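The table can be read as "score the furthest stage a sample reaches". Here is a minimal sketch of that mapping. The function name and its boolean parameters are hypothetical, and the rule that a later stage overrides compiler warnings is an assumption, since the table does not say how warnings combine with a successful run.

```python
def graduated_reward(compiled, has_warnings=False, ran=False, correct=False):
    """Map a verification outcome to the reward values in the table above.

    Assumption: the furthest stage reached decides the reward, so a sample
    that runs is scored 0.7 or 1.0 even if it compiled with warnings.
    """
    if not compiled:
        return 0.0  # syntax error
    if ran and correct:
        return 1.0  # correct output
    if ran:
        return 0.7  # runs without crash
    return 0.3 if has_warnings else 0.5  # compiled only


# With the 0.5 filter threshold used in the RAFT loop, a clean compile is
# kept for fine-tuning while a compile with warnings is dropped.
THRESHOLD = 0.5
keeps_clean = graduated_reward(True) >= THRESHOLD
keeps_warned = graduated_reward(True, has_warnings=True) >= THRESHOLD
```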
Documentation
Quick Start
Beginner, evaluator, and power-user paths
Choose a Model
Catalog guidance and Liquid AI caveats
Usage Scenarios
Runnable recipes by goal
Full Pipeline
Complete training workflow
Theory
RAFT research foundations
Verifiers
GCC, MinGW, pytest, custom
Configuration
Full config reference
Hardware Notes
Strix Halo configuration
Related Projects
malagent — Applies RLVR to security research (EDR evasion with Elastic Security as verifier)