Writing

Machine Learning and Quant Essays

A slow-growing collection. New entries are published when there is something substantive to share.

Jun 12, 2026
How to correctly report LLM-as-a-Judge evaluations
A practical guide to running, calibrating, and reporting LLM-as-a-Judge results, covering judge selection, position bias, pairwise vs. scoring setups, and the statistics that actually belong in the paper.
Jun 10, 2026
10 must-read machine learning research papers for ML engineers
An annotated bibliography of foundational and recent work in LLM evaluation and reinforcement learning, with notes on why each paper matters in practice.
Apr 20, 2026
Notes on evaluating reasoning models across families
Observations from disentangling reasoning length effects from forced re-entry across Llama and Qwen distilled models.
Nov 2, 2025
Reproducibility on a shared Slurm cluster
Small operational habits that yield significant returns when several collaborators share the same GPU resources.

How to correctly report LLM-as-a-Judge evaluations