Vision-Language Training
Vision-language model (VLM) training for image and text tasks
Vision-language training adapts models for visual question answering, document extraction, screenshot and chart understanding, and image-grounded reasoning.
Dashboard
Open Train, choose Vision, then choose Vision-language. Some VLM families are capability-gated until local dependencies and deterministic qualification are confirmed.
CLI
halo-forge vlm train --dataset textvqa --model Qwen/Qwen2-VL-2B-Instruct --output ~/.halo-forge/runs/vlm-textvqa
Use a custom JSONL dataset instead of a named dataset such as textvqa when your task needs local image paths or structured expected answers.
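A minimal Python sketch of preparing such a file follows. It assumes one JSON object per line with hypothetical "image", "question", and "answer" fields. These names are illustrative, not a documented halo-forge schema, so check which fields the trainer actually expects. The second helper flags image paths that do not resolve under a local root, since datasets that reference files on disk often fail on missing or mistyped paths.

```python
import json
from pathlib import Path

# Hypothetical record layout: "image", "question", and "answer" are
# illustrative field names, not a documented halo-forge schema.
records = [
    {
        # Image path relative to a local dataset root.
        "image": "images/invoice_001.png",
        "question": "What is the invoice total?",
        # A structured expected answer, stored as a nested JSON object.
        "answer": {"total": "1,250.00", "currency": "USD"},
    },
    {
        "image": "images/chart_017.png",
        "question": "Which quarter had the highest revenue?",
        "answer": "Q3",
    },
]


def to_jsonl(rows: list[dict]) -> str:
    """Serialize rows as JSONL: one JSON object per line."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


def missing_images(jsonl_text: str, root: Path) -> list[str]:
    """Return image paths referenced in the JSONL text that are not files under root."""
    missing = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        if not (root / row["image"]).is_file():
            missing.append(row["image"])
    return missing


if __name__ == "__main__":
    text = to_jsonl(records)
    print(text, end="")
    print("missing:", missing_images(text, Path(".")))
```

To save the dataset, write the serialized text to disk, for example Path("data/custom.jsonl").write_text(to_jsonl(records), encoding="utf-8"). Then point training at that file in whatever way halo-forge documents for local datasets. Running missing_images before a long training run catches broken image paths early.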