Local or Remote Training Workstation

Train, evaluate, serve, and export models from one Halo Forge machine across ROCm, CUDA, Apple MPS, and Apple MLX.

Version: 1.4.0 | Hardware: ROCm · CUDA · Apple | License: Apache 2.0

Start By Intent

Pick the path that matches your first hour.

First Local Run

Use the beginner quickstart to install, check hardware, train a tiny model, evaluate, and serve.

Choose a Model

Browse the curated model catalog: Qwen, Llama, Mistral, Gemma, DeepSeek, Whisper, MLX, and Liquid AI.

Run a Scenario

Follow goal-based recipes for code, preference tuning, reasoning, VLM, audio, agentic, and export.

The workflow: choose a catalog model, train with an appropriate algorithm, verify/evaluate the result, then serve or export the artifact.

Local or Remote Workstation

Halo Forge runs on the machine that has the accelerator. Use it locally at your desk, or open the same product surface from another device on a trusted network.

Topic	Summary	Detail
Local	Loopback is zero-config	127.0.0.1
Token	Non-loopback requires bearer auth	hfk_...
Monitor	Watch runs, logs, telemetry	Remote browser
One	One workstation, not a worker fleet	Remote v1

bash

# On the training workstation
halo-forge token create dashboard
halo-forge serve --host 0.0.0.0 --port 8000

# From another device
open http://workstation.local:8000
# Paste the token in the Connection screen.
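
The loopback rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not halo-forge's client code: the helper names `is_loopback` and `build_request` are hypothetical, the token is assumed to travel in a standard `Authorization: Bearer` header, and `/` is a placeholder path rather than a documented endpoint.

```python
import ipaddress
from typing import Optional
from urllib.request import Request


def is_loopback(host: str) -> bool:
    """Return True for 'localhost' or a loopback IP literal such as 127.0.0.1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example workstation.local): treat it as remote.
        return False


def build_request(host: str, port: int, path: str = "/",
                  token: Optional[str] = None) -> Request:
    """Build an HTTP request, insisting on a token for non-loopback hosts.

    Assumption: the server accepts the token as 'Authorization: Bearer <token>'.
    The request is only constructed here, never sent.
    """
    if not is_loopback(host) and not token:
        raise ValueError(f"{host} is not loopback, so a bearer token is required")
    request = Request(f"http://{host}:{port}{path}")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request
```

Note that `0.0.0.0` is not a loopback address, so a server bound with `--host 0.0.0.0` falls under the token requirement in this sketch.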

Remote Setup Guide →

The RAFT Approach

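halo-forge implements RAFT (Reward-Ranked Fine-Tuning), which is essentially iterated rejection sampling: generate several candidates per prompt, keep the ones that pass verification, fine-tune on them, and repeat.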

python

for cycle in range(num_cycles):
    # 1. Generate samples
    samples = model.generate(prompts, n=8)
    
    # 2. Verify with compiler
    results = verifier.verify_batch(samples)
    
    # 3. Filter by reward threshold
    filtered = [s for s, r in zip(samples, results) 
                if r.reward >= 0.5]
    
    # 4. Fine-tune on verified samples
    model.train(filtered)

Compared with PPO or GRPO, RAFT is simpler: it keeps one copy of the model in memory (versus 2-4x), is stable to train, and can produce comparable results.
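
To make the loop above concrete, here is a self-contained toy version that runs with only the standard library. Everything in it is a stand-in chosen for illustration: `ToyModel`, `ToyVerifier`, `VerifyResult`, and `raft` are hypothetical names, the "model" just samples numbers around a mean, and "training" nudges that mean toward the kept samples. It shows the shape of the generate, verify, filter, train cycle, not how halo-forge implements it.

```python
import random
from dataclasses import dataclass


@dataclass
class VerifyResult:
    reward: float


class ToyModel:
    """Stand-in model: each 'sample' is a number drawn around self.mean."""

    def __init__(self, mean: float = 0.0, seed: int = 0):
        self.mean = mean
        self.rng = random.Random(seed)

    def generate(self, prompts, n: int):
        # n candidates per prompt, returned as (prompt, value) pairs.
        return [(p, self.rng.gauss(self.mean, 1.0)) for p in prompts for _ in range(n)]

    def train(self, samples):
        # "Fine-tune": move the mean halfway toward the average kept value.
        if samples:
            target = sum(value for _, value in samples) / len(samples)
            self.mean += 0.5 * (target - self.mean)


class ToyVerifier:
    """Stand-in verifier: a sample 'passes' when its value is at least 1.0."""

    def verify_batch(self, samples):
        return [VerifyResult(1.0 if value >= 1.0 else 0.0) for _, value in samples]


def raft(model, verifier, prompts, num_cycles: int = 3, n: int = 8,
         threshold: float = 0.5):
    """Run the generate/verify/filter/train cycle; return the kept fraction per cycle."""
    history = []
    for _ in range(num_cycles):
        samples = model.generate(prompts, n=n)
        results = verifier.verify_batch(samples)
        filtered = [s for s, r in zip(samples, results) if r.reward >= threshold]
        model.train(filtered)
        history.append(len(filtered) / len(samples))
    return history


if __name__ == "__main__":
    toy = ToyModel()
    print(raft(toy, ToyVerifier(), prompts=list(range(16))), toy.mean)
```

Only one model object exists throughout the loop, which is the point of the memory comparison above: there is no separate reference or critic model to hold alongside it.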

How It Works

RAFT iteratively improves code generation through verified feedback:

Step	Action	Detail
Generate	Multiple solutions	Per prompt
Verify	Compile & test	Real feedback
Filter	Keep best samples	By reward
Train	On verified code	Repeat cycles

Cycle	What Happens	Expected Result
1-2	Model learns basic patterns	Largest gains
3-4	Refinement continues	Moderate gains
5-6	Gains approach a plateau	Watch for a stopping point

Results vary by model, dataset, and domain. Run your own benchmarks to measure improvement.
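
One way to act on the plateau row in the table is a simple patience rule over your own per-cycle benchmark scores. The sketch below is an assumption-laden example, not a halo-forge feature: `should_stop`, `min_delta`, and `patience` are hypothetical names, and the default values are arbitrary starting points.

```python
def should_stop(scores, min_delta: float = 0.01, patience: int = 2) -> bool:
    """Return True when the last `patience` cycle-over-cycle gains are all below min_delta.

    `scores` holds one benchmark score per completed cycle, oldest first.
    """
    if len(scores) <= patience:
        # Not enough cycles yet to observe `patience` consecutive gains.
        return False
    recent = scores[-(patience + 1):]
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(gain < min_delta for gain in gains)
```

Calling it after each cycle with the scores collected so far gives a stopping signal once improvement has stalled for `patience` cycles in a row.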

Quick Start

bash

# Clone and install
git clone https://github.com/professor-moody/halo-forge.git
cd halo-forge
python -m venv .venv
source .venv/bin/activate
pip install -e .

# Validate installation
halo-forge test --level smoke      # Quick check, no GPU
halo-forge info                    # Backend and hardware

# Browse model choices
halo-forge models list --mode sft

# First useful training run
halo-forge sft train --dataset codealpaca --model Qwen/Qwen2.5-Coder-1.5B --epochs 1

Full Quick Start Guide →

Graduated Rewards

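Binary pass/fail rewards create sparse gradients: a sample that almost compiles scores the same as one that is completely wrong. halo-forge uses graduated rewards that give partial credit instead: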

Outcome	Reward	Signal
Syntax error	0.0	Completely wrong
Compiles with warnings	0.3	Close but imperfect
Compiles clean	0.5	Correct syntax
Runs without crash	0.7	Executable
Correct output	1.0	Fully correct
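
The table can be read as a function from verification outcome to reward. The sketch below encodes it under one assumption that the table does not spell out: the reward reflects the furthest stage a sample reached, so warnings only lower the score when the program did not go on to run. The function name `graduated_reward` and its boolean parameters are hypothetical.

```python
def graduated_reward(compiled: bool, warnings: bool = False,
                     ran: bool = False, correct: bool = False) -> float:
    """Map a verification outcome to the reward values from the table above."""
    if not compiled:
        return 0.0  # Syntax error
    if ran and correct:
        return 1.0  # Correct output
    if ran:
        return 0.7  # Runs without crash
    if warnings:
        return 0.3  # Compiles with warnings
    return 0.5      # Compiles clean
```

With the 0.5 threshold used in the RAFT loop earlier on this page, samples that compile cleanly or better are kept for training, while syntax errors and warning-laden builds are filtered out.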

Documentation

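Quick Start · Choose a Model · Usage Scenarios · Full Pipeline · Theory · Verifiers · Configuration · Hardware Notes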

Related Projects

malagent: applies RLVR (reinforcement learning with verifiable rewards) to security research (EDR evasion, with Elastic Security as the verifier)
