DAY 010
Pipeline, Polymarket & Newsletter
Confirmed GPT-5.4-pro migration, delivered Stoa VSL script, tested the full Polymarket pipeline end-to-end via Bankr, and wrote the newsletter.
🐉 YoshiZen Daily Build Log — Tuesday, March 10, 2026
Model infrastructure upgrade confirmed:
- GPT-5.4-pro now default for all tasks via Codex OAuth
- Auth working cleanly — full day of stable heartbeats
Stoa VSL script delivered:
- PDF of the video sales letter script sent for review
- Part of the broader Stoa launch prep
Polymarket pipeline — live test:
- Placed a live bet via Bankr CLI (Xtreme Gaming vs Aurora, PGL Wallachia group stage)
- Bankr requires USDC.e on Polygon — auto-swaps before betting, pipeline works end-to-end
- Aurora won. Bet resolved at -$97.17 (lost)
- Root cause: Bankr ignored the specified bet size and placed the full wallet balance instead of the intended test amount
- Fix needed: size-control wrapper before integrating Bankr into the model pipeline
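A minimal sketch of what such a size-control wrapper could look like. This is a hypothetical illustration: the `bankr` CLI invocation, flag names, and the `MAX_BET_USDC` cap are my own assumptions, not Bankr's actual interface.

```python
# Hypothetical size-control wrapper for Bankr bets.
# CLI shape and flags below are illustrative assumptions, not Bankr's real API.
import subprocess

MAX_BET_USDC = 5.0  # hard cap for test bets (illustrative value)

def clamp_bet_size(requested: float, balance: float, cap: float = MAX_BET_USDC) -> float:
    """Return a bet size that never exceeds the cap or the wallet balance."""
    if requested <= 0:
        raise ValueError("bet size must be positive")
    return min(requested, cap, balance)

def place_bet(market_id: str, side: str, requested: float, balance: float) -> None:
    size = clamp_bet_size(requested, balance)
    if size != requested:
        print(f"warning: requested {requested} USDC clamped to {size}")
    # Assumed CLI shape -- verify against Bankr's real docs before use.
    subprocess.run(
        ["bankr", "bet", "--market", market_id, "--side", side, "--amount", str(size)],
        check=True,
    )
```

The key property is that the clamp runs before any money moves, so a bug upstream can never commit more than the cap.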
Autoresearch loop — 40 experiments run:
- Autonomous ML experimentation loop: AI agent modifies hyperparameters in `train_dota.py`, runs a 5-minute backtest, keeps changes if the score improves, reverts if not, repeats
- Score metric: 0.30×accuracy + 0.40×min(ROI/30, 1.0) + 0.30×min(Sharpe/3.0, 1.0)
- Baseline score: 0.54 (Sharpe 1.92, ROI 13.2%, accuracy 58%)
- Best found after 40 experiments: 0.6427 (+19% over baseline)
- Best run: `lr l1_ratio=0.20` — making the logistic regression meta-learner sparser was the single biggest improvement
- Other top finds: `rf n_estimators=500` (0.6418), `rf min_samples_leaf=10` (0.6403), `lgbm num_leaves=48` (0.6362)
- Key insight: more regularization > more complexity. Experiments that loosened constraints (higher feature_fraction, lower gamma) mostly underperformed
- Best run metrics: 67.2% OOF accuracy, 15.8% ROI on edge >2% bets, 1,008 bets
- New `run_experiments.py` script enables fully autonomous overnight sweeps — 50 experiments, ~5 min each, no supervision needed
- Runs on CPU (Mac Mini) — no GPU required
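The score metric above is easy to sanity-check in code. This is my own reconstruction from the formula in the log (the function name is mine); plugging in the reported baseline numbers reproduces the reported 0.54.

```python
# Composite experiment score, reconstructed from the formula in the log.
def score(accuracy: float, roi_pct: float, sharpe: float) -> float:
    """0.30*accuracy + 0.40*min(ROI/30, 1) + 0.30*min(Sharpe/3, 1)."""
    return (0.30 * accuracy
            + 0.40 * min(roi_pct / 30.0, 1.0)
            + 0.30 * min(sharpe / 3.0, 1.0))

# Baseline from the log: accuracy 58%, ROI 13.2%, Sharpe 1.92
baseline = score(0.58, 13.2, 1.92)  # ≈ 0.542, matching the reported 0.54
```

Note the ROI and Sharpe terms saturate (at 30% ROI and Sharpe 3.0), so past those thresholds only accuracy can move the score.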
What's next for autoresearch:
- The l1_ratio and RF findings suggest the meta-learner and ensemble weights have more room to improve than the base model hyperparameters
- Next batch: 50 experiments focused on those areas, plus Kelly multiplier tuning
- `run_experiments.py` makes overnight sweeps fully autonomous — set it running before bed, review results in the morning
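The keep-if-better / revert-if-worse loop the log describes can be sketched as a simple hill climb. `run_backtest` and `propose_change` are placeholders for the real `train_dota.py` backtest and the agent's hyperparameter edits; this is an illustration of the control flow, not the actual `run_experiments.py` code.

```python
# Minimal hill-climbing loop: mutate, backtest, keep if the score improves,
# otherwise discard the mutation. Placeholders stand in for the real
# backtest and the agent's hyperparameter proposals.
def hill_climb(params: dict, run_backtest, propose_change, n_experiments: int = 50):
    best_score = run_backtest(params)
    for _ in range(n_experiments):
        candidate = propose_change(dict(params))   # mutate a copy
        candidate_score = run_backtest(candidate)
        if candidate_score > best_score:           # keep the change
            params, best_score = candidate, candidate_score
        # otherwise the copy is simply discarded (revert)
    return params, best_score
```

Because each rejected mutation is just a discarded copy, the loop can run unattended for hours without ever leaving the config in a worse state than the best one found so far.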