Hyperparameter sweeps

Optuna-style search over learning rate, batch size, LoRA rank, and similar knobs, with random / TPE / grid samplers and sweep-level early stopping.

The sweep library composes three pieces:

Search space — per-knob distributions (Uniform, LogUniform, Choice).
Sampler — random / tpe (Optuna-backed) / grid. TPE falls back to random with a warning when Optuna isn’t installed.
Pruner — sweep-level early stop after N trials with no improvement.

Search space

from halo_forge.sweep import SearchSpace, LogUniform, Choice, Uniform

space = SearchSpace(params={
    "learning_rate": LogUniform(1e-6, 1e-3),
    "batch_size":    Choice([1, 2, 4]),
    "lora_rank":     Choice([8, 16, 32, 64]),
    "warmup_ratio":  Uniform(0.0, 0.2),
})

Distributions:

Distribution	What it does	Right shape for
`Uniform(low, high)`	Continuous, equal density across [low, high]	ratios, fractions (e.g. `warmup_ratio`)
`LogUniform(low, high)`	Continuous, equal density per order of magnitude	learning rates (1e-5 vs 1e-4 is the interesting comparison, not 1e-5 vs 5e-5)
`Choice([v1, ...])`	Discrete	batch sizes, ranks, dtype names
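To make the Uniform / LogUniform distinction concrete, the following is a minimal standard-library sketch of how the two could be drawn. The helper names `sample_uniform` and `sample_log_uniform` are illustrative only and are not part of halo_forge; the library's own implementation may differ.

```python
import math
import random


def sample_uniform(rng: random.Random, low: float, high: float) -> float:
    """Draw a value with equal density everywhere in [low, high]."""
    return rng.uniform(low, high)


def sample_log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    if low <= 0 or high <= low:
        raise ValueError("log-uniform needs 0 < low < high")
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)
n = 10_000
flat = [sample_uniform(rng, 1e-6, 1e-3) for _ in range(n)]
logu = [sample_log_uniform(rng, 1e-6, 1e-3) for _ in range(n)]

# Fraction of draws below 1e-5, i.e. in the lowest of the three decades.
frac_flat = sum(v < 1e-5 for v in flat) / n
frac_logu = sum(v < 1e-5 for v in logu) / n
print(f"below 1e-5: uniform={frac_flat:.3f}, log-uniform={frac_logu:.3f}")
```

Over the range 1e-6 to 1e-3, a flat draw puts roughly 1% of samples below 1e-5, while a log-uniform draw puts roughly a third there, because each decade receives equal weight.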

Running a sweep

from halo_forge.sweep import SweepConfig, run_sweep

cfg = SweepConfig(
    name="dpo_lr_search",
    search_space=space,
    n_trials=16,
    metric="final_train_loss",
    direction="minimize",
    sampler="random",  # or "tpe", "grid"
    seed=42,
    early_stop_after=5,            # halt sweep after 5 trials with no improvement
    output_dir="sweeps/dpo_lr_search",
)

def runner(trial_id, params):
    # Build a trainer with `params` overlaid on a base config and run it;
    # return the metrics dict the sweep watches.
    # `launch_dpo_with_overrides` is a placeholder for your own training entry point.
    summary = launch_dpo_with_overrides(params)
    return {"final_train_loss": summary["final_train_loss"]}

result = run_sweep(config=cfg, runner=runner)
print("best:", result.best_trial_id, result.best_value)

The sweep is callable-driven: any callable with the shape (trial_id, params) -> metrics works as a runner. That decouples the sweep machinery from the trainer surface, so when a new trainer ships, sweeping it only requires wiring a runner.
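As an illustration of the runner contract, here is a self-contained sketch with a synthetic objective in place of a real training run. `toy_runner` and `tiny_random_search` are hypothetical names; the loop only mimics how a sweep might call a runner and is not `run_sweep` itself.

```python
import math
import random
from typing import Any, Callable, Dict

Runner = Callable[[int, Dict[str, Any]], Dict[str, float]]


def toy_runner(trial_id: int, params: Dict[str, Any]) -> Dict[str, float]:
    """Stand-in for training: a synthetic loss minimised at lr=1e-4, rank=16."""
    lr_term = (math.log10(params["learning_rate"]) + 4.0) ** 2
    rank_term = abs(math.log2(params["lora_rank"]) - 4.0)
    return {"final_train_loss": lr_term + 0.1 * rank_term}


def tiny_random_search(runner: Runner, n_trials: int, seed: int):
    """Hypothetical loop: sample params, call the runner, track the best trial."""
    rng = random.Random(seed)
    best_id, best_value = None, math.inf
    for trial_id in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-6, -3),  # log-uniform draw
            "lora_rank": rng.choice([8, 16, 32, 64]),
        }
        value = runner(trial_id, params)["final_train_loss"]
        if value < best_value:
            best_id, best_value = trial_id, value
    return best_id, best_value


best_id, best_value = tiny_random_search(toy_runner, n_trials=16, seed=42)
print("best:", best_id, round(best_value, 4))
```

Swapping `toy_runner` for a function that launches a real trainer leaves the loop unchanged, which is the point of keeping the runner a plain callable.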

Output

sweeps/dpo_lr_search/
  ├── trials.jsonl         # one line per trial; streamed live so a dashboard can tail it
  └── sweep_summary.json   # full result + best-trial pointer

trials.jsonl is appended to each time a trial completes, so a dashboard (F-P) can render an in-progress sweep without waiting for the trial budget to be exhausted.
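A consumer of the live file could look roughly like the sketch below. The row layout (`trial_id` plus a nested `metrics` dict) is an assumption made for illustration; the schema halo_forge actually writes is not documented here and may differ. The helper `best_so_far` is likewise hypothetical.

```python
import json
import tempfile
from pathlib import Path


def best_so_far(path: Path, metric: str, direction: str = "minimize"):
    """Scan a JSON Lines file and return (trial_id, value) of the best complete row."""
    best = None
    with path.open() as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                # A line that is still being written can be truncated; skip it.
                continue
            value = row["metrics"][metric]  # assumed field layout
            better = best is None or (
                value < best[1] if direction == "minimize" else value > best[1]
            )
            if better:
                best = (row["trial_id"], value)
    return best


# Demo on a fabricated file; the real trials.jsonl schema may differ.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "trials.jsonl"
    demo.write_text(
        '{"trial_id": 0, "metrics": {"final_train_loss": 0.91}}\n'
        '{"trial_id": 1, "metrics": {"final_train_loss": 0.74}}\n'
        '{"trial_id": 2, "metrics": {"final_train_lo'  # partially written line
    )
    result = best_so_far(demo, "final_train_loss")
    print(result)  # (1, 0.74)
```

Skipping lines that fail to parse keeps the reader robust against catching the writer mid-append.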

Samplers

Sampler	Backed by	Best for
`random`	stdlib `random.Random(seed)`	always available; the baseline
`tpe`	Optuna’s `TPESampler`	smarter suggestions when the budget is 16+ trials
`grid`	Cartesian product over `Choice` (continuous distributions sampled once)	small discrete spaces
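The grid row can be read as follows: enumerate every combination of the `Choice` values, and draw each continuous knob a single time so it stays fixed across the grid. The sketch below implements that reading with the standard library. `grid_points` and its arguments are hypothetical, and the library may handle continuous knobs differently.

```python
import itertools
import random
from typing import Any, Dict, Iterator, List, Tuple


def grid_points(
    choices: Dict[str, List[Any]],
    continuous: Dict[str, Tuple[float, float]],
    seed: int,
) -> Iterator[Dict[str, Any]]:
    """Yield one params dict per combination of the discrete choices."""
    rng = random.Random(seed)
    # Each continuous knob is drawn a single time and then held fixed.
    fixed = {name: rng.uniform(low, high) for name, (low, high) in continuous.items()}
    names = list(choices)
    for combo in itertools.product(*(choices[n] for n in names)):
        yield {**dict(zip(names, combo)), **fixed}


points = list(
    grid_points(
        choices={"batch_size": [1, 2, 4], "lora_rank": [8, 16, 32, 64]},
        continuous={"warmup_ratio": (0.0, 0.2)},
        seed=42,
    )
)
print(len(points))  # 3 * 4 = 12 combinations
```

The grid size is the product of the choice-list lengths, so it grows quickly; that is why grid suits small discrete spaces.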

Run pip install optuna to enable tpe. Without it, tpe logs a warning and falls back to random.
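One way such an optional-dependency fallback can be written is sketched below. `resolve_sampler` is a hypothetical helper and not the library's actual code; it only checks whether `optuna` is importable and degrades `tpe` to `random` with a warning.

```python
import importlib.util
import warnings


def resolve_sampler(name: str) -> str:
    """Return the sampler name to actually use, degrading tpe to random if needed."""
    if name not in {"random", "tpe", "grid"}:
        raise ValueError(f"unknown sampler: {name!r}")
    if name == "tpe" and importlib.util.find_spec("optuna") is None:
        warnings.warn("optuna is not installed; falling back to the random sampler")
        return "random"
    return name


print(resolve_sampler("random"))
print(resolve_sampler("tpe"))  # "tpe" if optuna is importable, otherwise "random"
```

Checking with `importlib.util.find_spec` avoids importing Optuna just to find out whether it is present.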

Early stopping

Two layers:

Sweep-level (early_stop_after) — halt the whole sweep when N consecutive trials don’t improve the best.
Per-trial pruning — roadmap. Will integrate ASHA / Hyperband once the trainer-side intermediate-metric reporting lands.
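The sweep-level rule is a patience counter over completed trials. The sketch below shows the idea under two assumptions: a tie with the current best does not count as an improvement, and the counter resets whenever a new best appears. `trials_before_stop` is an illustrative name, not a halo_forge function.

```python
import math
from typing import Iterable


def trials_before_stop(
    values: Iterable[float], patience: int, direction: str = "minimize"
) -> int:
    """Return how many trials run before `patience` consecutive non-improving trials."""
    best = math.inf if direction == "minimize" else -math.inf
    stale = 0
    count = 0
    for value in values:
        count += 1
        improved = value < best if direction == "minimize" else value > best
        if improved:
            best, stale = value, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return count


losses = [0.90, 0.80, 0.85, 0.82, 0.81, 0.83, 0.80, 0.70]
print(trials_before_stop(losses, patience=5))  # 7
```

In this example the sweep halts after the seventh trial, so the eighth trial (0.70, which would have been a new best) never runs. That is the trade-off a small patience value makes for a shorter sweep.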

CLI integration

The library is the foundation. A halo-forge sweep CLI that orchestrates SFT/DPO/GRPO trials with sampled config overlays is a follow-up, because it requires per-trainer config-overlay machinery beyond the v1 surface. Use the programmatic API today.