Choose a Model

How to pick a base model for SFT, RAFT, DPO, GRPO, VLM, audio, and serving

Halo Forge now has a curated model catalog. In the CLI:

halo-forge models list
halo-forge models list --mode raft --backend mlx
halo-forge models show Qwen/Qwen2.5-Coder-3B
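
If you script catalog lookups, the commands above can be assembled programmatically. The following is a minimal sketch, not part of Halo Forge: `build_list_command` and `build_show_command` are hypothetical helper names, and they rely only on the `--mode` and `--backend` flags shown above. The helpers build the argument list without running it, so the sketch works even where `halo-forge` is not installed.

```python
from typing import List, Optional


def build_list_command(mode: Optional[str] = None,
                       backend: Optional[str] = None) -> List[str]:
    """Build argv for `halo-forge models list` with optional filters.

    Hypothetical helper: only the --mode and --backend flags shown in
    this guide are used; no other flags are assumed to exist.
    """
    cmd = ["halo-forge", "models", "list"]
    if mode:
        cmd += ["--mode", mode]
    if backend:
        cmd += ["--backend", backend]
    return cmd


def build_show_command(model_id: str) -> List[str]:
    """Build argv for `halo-forge models show <model_id>`."""
    return ["halo-forge", "models", "show", model_id]


if __name__ == "__main__":
    # Print the same commands listed in this guide.
    print(" ".join(build_list_command()))
    print(" ".join(build_list_command(mode="raft", backend="mlx")))
    print(" ".join(build_show_command("Qwen/Qwen2.5-Coder-3B")))
```

To execute a command, pass the returned list to `subprocess.run(cmd, check=True)` on a machine where the CLI is installed.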

In the dashboard, open Models. By default, the page shows only models that fit the detected workstation backend. This is separate from the older Run Bundles page, which saves groups of your trained runs for comparison.

The dashboard catalog is designed as a decision surface, not just a list. Use the intent filters at the top when you already know the job:

First run: small recommended models for smoke tests.
Code RAFT: code models that fit verifier-ranked training.
Apple Silicon: MLX-native or Apple-friendly models.
VLM and Audio: modality-specific starting points.
Liquid AI: experimental LFM entries with caveats surfaced before training.

Safe SFT entries also expose a “Use in Start” action, which opens the goal-based first-run flow. Other rows offer “Use in Advanced”, which prefills the advanced training configurator.

The Start goals map to conservative datasets:

Start goal	Dataset	Best use
Code	`codealpaca`	Prove code SFT, model download, and output paths
Reasoning	`gsm8k_sft`	Check small math/reasoning data formatting
Tool use	`xlam_sft`	Check function-calling examples without agentic knobs
Apple Silicon	`codealpaca` + MLX when ready	Prove MLX-friendly local training basics

First Picks

Goal	Start with	Why
Code training	`Qwen/Qwen2.5-Coder-0.5B` for Start, `Qwen/Qwen2.5-Coder-3B` for advanced	Start small, then scale once the path is proven
Fast code smoke test	`Qwen/Qwen2.5-Coder-0.5B`	Small enough for quick validation
Preference tuning	`Qwen/Qwen2.5-3B-Instruct`	Good default for DPO and reward-model (RM) training
Reasoning	`Qwen/Qwen2.5-1.5B-Instruct` or `Qwen/Qwen2.5-Math-1.5B`	Small reasoning-friendly baselines
VLM	`Qwen/Qwen2-VL-2B-Instruct`	Safest current dashboard VLM adapter path
Audio	`openai/whisper-small`	Current Halo Forge audio path is Whisper-oriented
Apple MLX	`mlx-community/Qwen2.5-0.5B-Instruct-bf16` for Start, `mlx-community/Qwen2.5-3B-Instruct-bf16` when memory allows	MLX-format models avoid HF conversion friction
Liquid AI experiment	`LiquidAI/LFM2.5-350M`	Tiny structured-output/tool-use candidate
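
For scripted setups, the table above can be kept as a small lookup. This is a sketch under stated assumptions: the goal keys (`"code"`, `"apple-mlx"`, and so on) and the `first_pick` function are hypothetical names chosen for illustration, while the model IDs are copied from the table. Each list is ordered as in the table, so the first entry is the one to start with.

```python
from typing import Dict, List

# Model IDs copied from the First Picks table. The dictionary keys are
# hypothetical labels for this sketch, not Halo Forge identifiers.
FIRST_PICKS: Dict[str, List[str]] = {
    "code": ["Qwen/Qwen2.5-Coder-0.5B", "Qwen/Qwen2.5-Coder-3B"],
    "code-smoke": ["Qwen/Qwen2.5-Coder-0.5B"],
    "preference": ["Qwen/Qwen2.5-3B-Instruct"],
    "reasoning": ["Qwen/Qwen2.5-1.5B-Instruct", "Qwen/Qwen2.5-Math-1.5B"],
    "vlm": ["Qwen/Qwen2-VL-2B-Instruct"],
    "audio": ["openai/whisper-small"],
    "apple-mlx": [
        "mlx-community/Qwen2.5-0.5B-Instruct-bf16",
        "mlx-community/Qwen2.5-3B-Instruct-bf16",
    ],
    "liquid": ["LiquidAI/LFM2.5-350M"],
}


def first_pick(goal: str) -> str:
    """Return the first model listed for a goal, or raise ValueError."""
    try:
        return FIRST_PICKS[goal][0]
    except KeyError:
        known = ", ".join(sorted(FIRST_PICKS))
        raise ValueError(f"unknown goal {goal!r}; expected one of: {known}") from None


if __name__ == "__main__":
    for goal in FIRST_PICKS:
        print(f"{goal}: {first_pick(goal)}")
```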

Memory Tiers

Tier	Typical model size (parameters)	Use
Tiny	39M to 700M	Smoke tests, edge demos, CI
Small	1B to 4B	First real local training runs
Medium	7B to 15B	Quality-oriented workstation runs
Large	24B+	Advanced runs with large unified memory or multi-GPU
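
The tiers above can be turned into a quick sizing check. The sketch below is illustrative only: `memory_tier` and `weight_gib` are hypothetical helpers, and sizes that fall in the gaps between tiers (for example 5B) are rounded up to the next tier, which is a conservative assumption rather than a documented Halo Forge rule. The weight estimate uses the fact that bf16 or fp16 weights take 2 bytes per parameter. It covers weights only, so training needs more memory for gradients, optimizer state, and activations.

```python
def memory_tier(params_billions: float) -> str:
    """Map a parameter count (in billions) to a tier from the table.

    Assumption: sizes between two tiers are rounded up to the larger one.
    """
    if params_billions <= 0:
        raise ValueError("parameter count must be positive")
    if params_billions <= 0.7:
        return "Tiny"
    if params_billions <= 4:
        return "Small"
    if params_billions <= 15:
        return "Medium"
    return "Large"


def weight_gib(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Estimate memory for the weights alone, in GiB.

    The default of 2 bytes per parameter matches bf16/fp16 storage.
    """
    return params_billions * 1e9 * bytes_per_param / 2**30


if __name__ == "__main__":
    for size in (0.5, 3, 7, 24):
        print(f"{size}B -> {memory_tier(size)}, "
              f"about {weight_gib(size):.1f} GiB of bf16 weights")
```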

Family Notes

Qwen / Qwen Coder: default recommendation for most users. Qwen Coder is the safest code SFT/RAFT path.
Llama, Mistral, Gemma: good general baselines when licensing, tokenizer behavior, or ecosystem fit matters.
DeepSeek, StarCoder2, CodeLlama: useful code alternatives. Verify dependency and tokenizer behavior before long runs.
Whisper: current default for audio ASR training.
Qwen-VL: safest current VLM training path in Halo Forge.
MLX community models: use these on Apple Silicon when you want MLX-native inference or trainer paths.
Liquid AI LFM: promising small and efficient models, but treat them as experimental in Halo Forge until each adapter path has been tested.

Liquid AI Caveats

Liquid AI’s current public docs describe LFM2.5 text models from 350M to 1.2B, including instruction, thinking, and base variants. Their docs also list LFM2.5 vision and audio models. These are interesting for Halo Forge because they are small, edge-oriented, and include MLX/GGUF/vLLM-friendly distribution paths.

Use Liquid text models first for structured output, tool use, extraction, and reasoning experiments. The `LiquidAI/LFM2.5-350M` model card says it is not recommended for knowledge-intensive tasks or programming, so do not use it as your first code model.

Use Liquid VL/audio models as experimental entries only. Halo Forge’s VLM path is adapter-specific and the audio path is currently Whisper-oriented, so Liquid multimodal models need adapter validation before they should be trusted for training results.

Rules Of Thumb

Start smaller than you think. Prove the data and verifier first.
Use Qwen Coder for code unless you have a reason not to.
Use instruct models for DPO, GRPO, agentic, and chat refinement.
Use MLX-format repos for Apple MLX. Do not expect bitsandbytes-style runtime quantization there.
Treat experimental catalog entries as “interesting, not guaranteed.”
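
Two of these rules can be checked mechanically before a run. The sketch below is a rough, name-based heuristic and not a Halo Forge feature: `check_choice` is a hypothetical function, the mode and backend strings are illustrative labels, and matching on "instruct" or the `mlx-community/` prefix is an assumption that can misfire on repos with unusual names.

```python
from typing import List

# Hypothetical mode labels for this sketch; adjust to the modes you use.
INSTRUCT_MODES = {"dpo", "grpo"}


def check_choice(model_id: str, mode: str, backend: str) -> List[str]:
    """Return warnings for a model/mode/backend combination.

    Name-based heuristic only: it inspects the repo ID and never
    downloads or loads the model.
    """
    warnings: List[str] = []
    name = model_id.lower()

    # Rule: use instruct models for DPO and GRPO.
    if mode.lower() in INSTRUCT_MODES and "instruct" not in name:
        warnings.append(
            f"{model_id} does not look like an instruct model; "
            f"consider an instruct variant for {mode}."
        )

    # Rule: use MLX-format repos for Apple MLX.
    if backend.lower() == "mlx" and not name.startswith("mlx-community/"):
        warnings.append(
            f"{model_id} is not under mlx-community/; "
            "confirm it is an MLX-format repo before training."
        )

    return warnings


if __name__ == "__main__":
    for warning in check_choice("Qwen/Qwen2.5-Coder-3B", "dpo", "mlx"):
        print("warning:", warning)
```

An empty list means neither heuristic fired, not that the combination has been validated.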