Reasoning Training

Math and multi-step reasoning training

Reasoning training focuses on math, answer extraction, and multi-step problem solving.

Dashboard

Open Train, choose Reasoning, then choose SFT, Reasoning, or GRPO. Use SFT to teach the output format and reasoning traces; use GRPO when a verifier can automatically score the final answer.
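To make "a verifier can score the final answer" concrete, here is a minimal sketch of answer extraction plus a binary reward for GSM8K-style problems. It is not halo-forge's built-in verifier: the function names `extract_final_answer` and `verify` are hypothetical, and the sketch assumes reference answers end with the GSM8K `#### <number>` marker and that completions either use the same marker or state the answer as the last number in the text.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional

# Integers or decimals, with optional sign and thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_answer(text: str) -> Optional[str]:
    """Return the final numeric answer in `text` with commas removed, or None.

    Hypothetical helper for illustration only.
    """
    marker = text.rfind("####")
    if marker != -1:
        # GSM8K-style marker found: take the first number after the last marker.
        matches = _NUMBER.findall(text[marker + 4:])
        chosen = matches[0] if matches else None
    else:
        # No marker: fall back to the last number anywhere in the text.
        matches = _NUMBER.findall(text)
        chosen = matches[-1] if matches else None
    return chosen.replace(",", "") if chosen is not None else None


def verify(completion: str, reference: str) -> float:
    """Return 1.0 if the completion's final answer equals the reference's, else 0.0."""
    predicted = extract_final_answer(completion)
    expected = extract_final_answer(reference)
    if predicted is None or expected is None:
        return 0.0
    try:
        # Compare numerically so that "72" and "72.0" count as equal.
        return 1.0 if Decimal(predicted) == Decimal(expected) else 0.0
    except InvalidOperation:
        return 0.0


if __name__ == "__main__":
    reference = "Natalia sold 48 + 24 = 72 clips.\n#### 72"
    print(verify("48 + 24 = 72, so the answer is 72.0", reference))
    print(verify("I think it is 70.\n#### 70", reference))
```

A binary reward like this only works when the final answer can be checked mechanically. If it cannot, SFT on formatted traces is likely the safer starting point.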

CLI

halo-forge reasoning train --dataset gsm8k --model Qwen/Qwen2.5-1.5B-Instruct --output ~/.halo-forge/runs/reasoning-gsm8k

Run a probe or eval after training to catch regressions on general tasks.