Compare historical tournament picks from each trained model run, including raw correctness and weighted bracket scoring.
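The source does not show how picks, results, or weights are stored, so the following is only a minimal sketch under stated assumptions. It assumes each run's picks are a mapping from a game ID to a predicted winner, that actual winners and each game's round number are available as similar mappings, and that later rounds are worth more through a doubling point scheme (1, 2, 4, ... points). `ROUND_POINTS`, `RunScore`, `score_run`, and `compare_runs` are hypothetical names, and the real project may weight rounds differently. The sketch scores each run two ways, raw correctness (picks right out of games played) and weighted bracket points, and then ranks runs by weighted score.

```python
from dataclasses import dataclass

# Hypothetical per-round point values (a doubling scheme); the weights this
# project actually uses are not stated in the source.
ROUND_POINTS = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16, 6: 32}


@dataclass
class RunScore:
    run_id: str
    correct: int   # raw number of correct picks
    total: int     # number of games with a known result
    weighted: int  # bracket points earned from correct picks

    @property
    def accuracy(self) -> float:
        return self.correct / self.total if self.total else 0.0


def score_run(run_id, picks, results, game_rounds, round_points=ROUND_POINTS):
    """Score one model run's picks against the actual winners.

    picks:       game_id -> predicted winner (assumed format)
    results:     game_id -> actual winner
    game_rounds: game_id -> round number, 1 = first round
    A game missing from `picks` counts as an incorrect pick.
    """
    correct = weighted = 0
    for game_id, actual in results.items():
        if picks.get(game_id) == actual:
            correct += 1
            weighted += round_points[game_rounds[game_id]]
    return RunScore(run_id, correct, len(results), weighted)


def compare_runs(runs, results, game_rounds):
    """Score every run; rank by weighted points, breaking ties on raw correctness."""
    scores = [score_run(rid, picks, results, game_rounds) for rid, picks in runs.items()]
    return sorted(scores, key=lambda s: (s.weighted, s.correct), reverse=True)


# Toy data: two first-round games and one later-round game.
results = {"g1": "A", "g2": "C", "g3": "A"}
game_rounds = {"g1": 1, "g2": 1, "g3": 3}
runs = {
    "run_a": {"g1": "B", "g2": "D", "g3": "A"},  # misses early, hits the late game
    "run_b": {"g1": "A", "g2": "C", "g3": "C"},  # hits early, misses the late game
}

ranked = compare_runs(runs, results, game_rounds)
for s in ranked:
    print(f"{s.run_id}: {s.correct}/{s.total} correct ({s.accuracy:.1%}), weighted={s.weighted}")
```

In this toy example `run_b` has more correct picks, but `run_a` ranks first because its single correct pick is in a higher-value round. This is why it can be useful to report both metrics side by side rather than only one.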