model,elo,wins,losses,ties,games_played,confidence_interval
Llama-3.2-1b-Instruct,1516.0,1,0,0,1,784.0
Qwen2.5-1.5b-Instruct,1500.0,0,0,0,0,inf
Qwen2.5-3b-Instruct,1500.0,0,0,0,1,784.0
Llama-3.2-3b-Instruct,1500.0,0,0,0,1,784.0
Gemma-3-1b-it,1500.0,0,0,0,0,inf
Gemma-2-2b-it,1500.0,0,0,0,1,784.0
IBM Granite-3.3-2b-instruct,1500.0,0,0,0,0,inf
Phi-4-mini-instruct,1484.0,0,1,0,2,554.4