Local or Remote Training Workstation

Train, evaluate, serve, and export models from one Halo Forge machine across ROCm, CUDA, Apple MPS, and Apple MLX.

Version: 1.4.0 · Hardware: ROCm, CUDA, Apple · License: Apache 2.0

Start By Intent

Pick the path that matches your first hour.

First Local Run

Use the beginner quickstart to install, check hardware, train a tiny model, evaluate, and serve.

Choose a Model

Browse the curated model catalog: Qwen, Llama, Mistral, Gemma, DeepSeek, Whisper, MLX, and Liquid AI.

Run a Scenario

Follow goal-based recipes for code, preference tuning, reasoning, VLM, audio, agentic, and export.

The workflow: choose a catalog model, train with an appropriate algorithm, verify/evaluate the result, then serve or export the artifact.

Local or Remote Workstation

Halo Forge runs on the machine that has the accelerator. Use it locally at your desk, or open the same interface in a browser on another device on a trusted network.

- Local: Loopback is zero-config (127.0.0.1)
- Token: Non-loopback requires bearer auth (hfk_...)
- Monitor: Watch runs, logs, telemetry (remote browser)
- One: One workstation, not a worker fleet (Remote v1)
```bash
# On the training workstation
halo-forge token create dashboard
halo-forge serve --host 0.0.0.0 --port 8000

# From another device
open http://workstation.local:8000
# Paste the token in the Connection screen.
```

The RAFT Approach

halo-forge implements RAFT (Reward-Ranked Fine-Tuning), which is essentially iterated rejection sampling:

```python
for cycle in range(num_cycles):
    # 1. Generate samples
    samples = model.generate(prompts, n=8)

    # 2. Verify with compiler
    results = verifier.verify_batch(samples)

    # 3. Filter by reward threshold
    filtered = [s for s, r in zip(samples, results)
                if r.reward >= 0.5]

    # 4. Fine-tune on verified samples
    model.train(filtered)
```

This is simpler than PPO/GRPO: it keeps one copy of the model in memory (1x versus 2-4x), is stable to train, and produces comparable results.
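
To make the loop concrete, here is a self-contained toy version that runs without a GPU. It is a sketch under loud assumptions, not halo-forge's implementation: the "model" is just a list of sampling weights over four fixed snippets, the verifier is Python's built-in `compile()`, and "training" means adding weight to every kept sample. The names `CANDIDATES`, `verify`, and `toy_raft` are hypothetical.

```python
import random

# Fixed pool of candidate "generations": two invalid, two valid Python snippets.
CANDIDATES = ["x = 1 +", "def f(: pass", "x = 1 + 2", "def f(): return 1"]
VALID = {2, 3}  # indices of the snippets that compile


def verify(sample: str) -> float:
    """Toy verifier: reward 0.5 if the snippet compiles, else 0.0."""
    try:
        compile(sample, "<sample>", "exec")
    except SyntaxError:
        return 0.0
    return 0.5


def toy_raft(num_cycles: int = 3, n: int = 8, threshold: float = 0.5, seed: int = 0):
    rng = random.Random(seed)
    weights = [1.0] * len(CANDIDATES)  # toy "model": uniform at the start
    history = []
    for _ in range(num_cycles):
        # 1. Generate n samples from the current weights
        samples = rng.choices(range(len(CANDIDATES)), weights=weights, k=n)
        # 2-3. Verify each sample and keep those at or above the threshold
        kept = [i for i in samples if verify(CANDIDATES[i]) >= threshold]
        # 4. "Train": shift probability mass toward the kept samples
        for i in kept:
            weights[i] += 1.0
        history.append(sum(weights[i] for i in VALID) / sum(weights))
    return history


# Share of probability mass on valid snippets after each cycle
print(toy_raft())
```

Because only verified samples ever gain weight, the share of probability mass on valid snippets cannot decrease from one cycle to the next, which is the rejection-sampling behavior the loop relies on.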

How It Works

RAFT iteratively improves code generation through verified feedback:

1. Generate: multiple solutions per prompt
2. Verify: compile and test for real feedback
3. Filter: keep the best samples by reward
4. Train: on verified code, then repeat cycles
| Cycle | What Happens | Expected |
|-------|--------------|----------|
| 1-2 | Model learns basic patterns | Largest gains |
| 3-4 | Refinement continues | Moderate gains |
| 5-6 | Approach plateau | Monitor for stopping |

Results vary by model, dataset, and domain. Run your own benchmarks to measure improvement.
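
One way to act on "monitor for stopping" is a simple plateau check over per-cycle benchmark scores. The sketch below is an assumption about how you might do this yourself, not a halo-forge feature: `should_stop`, `min_gain`, and `patience` are hypothetical names, and the pass rates in the usage lines are made-up numbers for illustration, not measured results.

```python
def should_stop(pass_rates: list[float], min_gain: float = 0.01, patience: int = 2) -> bool:
    """Return True once each of the last `patience` cycles improved by less than `min_gain`."""
    if len(pass_rates) <= patience:
        return False  # not enough cycles to judge a plateau
    gains = [later - earlier for earlier, later in zip(pass_rates, pass_rates[1:])]
    return all(gain < min_gain for gain in gains[-patience:])


# Made-up pass rates, one per cycle, purely for illustration
print(should_stop([0.20, 0.35, 0.45]))                       # still improving
print(should_stop([0.20, 0.35, 0.45, 0.50, 0.505, 0.507]))   # gains have flattened
```

Requiring several consecutive small gains, rather than one, avoids stopping on a single noisy cycle.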

Quick Start

```bash
# Clone and install
git clone https://github.com/professor-moody/halo-forge.git
cd halo-forge
python -m venv .venv
source .venv/bin/activate
pip install -e .

# Validate installation
halo-forge test --level smoke      # Quick check, no GPU
halo-forge info                    # Backend and hardware

# Browse model choices
halo-forge models list --mode sft

# First useful training run
halo-forge sft train --dataset codealpaca --model Qwen/Qwen2.5-Coder-1.5B --epochs 1
```

Graduated Rewards

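Binary pass/fail rewards give a sparse training signal: a sample that almost works scores the same as one that fails outright. halo-forge uses graduated rewards that give partial credit for each stage a sample passes: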

| Outcome | Reward | Signal |
|---------|--------|--------|
| Syntax error | 0.0 | Completely wrong |
| Compiles with warnings | 0.3 | Close but imperfect |
| Compiles clean | 0.5 | Correct syntax |
| Runs without crash | 0.7 | Executable |
| Correct output | 1.0 | Fully correct |
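
The table can be read as a small scoring function. The sketch below is one hedged interpretation: `Outcome` and `graduated_reward` are hypothetical names, and the rule that a sample is scored by the furthest stage it reaches (so a program that runs gets 0.7 even if it compiled with warnings) is an assumption the table does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical record of what happened to one generated sample."""
    compiled: bool
    warnings: bool = False
    ran: bool = False
    correct: bool = False


def graduated_reward(outcome: Outcome) -> float:
    """Map an outcome to the reward values listed in the table above."""
    if not outcome.compiled:
        return 0.0  # syntax error
    if outcome.correct:
        return 1.0  # correct output
    if outcome.ran:
        return 0.7  # runs without crash
    return 0.3 if outcome.warnings else 0.5  # compiled only


outcomes = [
    Outcome(compiled=False),
    Outcome(compiled=True, warnings=True),
    Outcome(compiled=True),
    Outcome(compiled=True, ran=True),
    Outcome(compiled=True, ran=True, correct=True),
]
rewards = [graduated_reward(o) for o in outcomes]
print(rewards)

# With a 0.5 threshold, samples that compile clean or better are kept.
kept = [r for r in rewards if r >= 0.5]
print(len(kept))
```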

Related Projects

malagent: Applies RLVR (reinforcement learning with verifiable rewards) to security research, using Elastic Security as the verifier for EDR evasion.