WebDev Arena Leaderboard
WebDev Arena is an open-source benchmark developed by LMArena that evaluates AI capabilities in web development.
Leaderboard

| Rank | Organization | Arena Score | 95% CI | Votes | License |
|-----:|--------------|------------:|--------|------:|---------|
| 1 | Anthropic | 1252.24 | +4.91 / -6.09 | 12,356 | Proprietary |
| 2 | Anthropic | 1138.14 | +6.33 / -6.88 | 6,288 | Proprietary |
| 3 | OpenAI | 1068.48 | +6.10 / -5.58 | 9,224 | Proprietary |
| 4 | OpenAI | 1060.55 | +7.32 / -9.25 | 4,463 | Proprietary |
| 5 |  | 1029.45 | +6.21 / -5.47 | 8,010 | Proprietary |
| 6 |  | 1025.87 | +5.42 / -4.43 | 11,388 | Proprietary |
| 7 |  | 978.07 | +4.95 / -4.99 | 11,046 | Proprietary |
| 8 | DeepSeek | 967.88 | +8.46 / -9.30 | 5,345 | DeepSeek |
| 9 | OpenAI | 964.00 | +4.16 / -4.89 | 11,977 | Proprietary |
| 10 | Alibaba | 902.78 | +6.16 / -5.16 | 10,788 | Apache 2.0 |
| 11 |  | 894.46 | +5.42 / -4.77 | 10,488 | Proprietary |
| 12 |  | 811.76 | +13.01 / -16.72 | 1,117 | Llama 3.1 |
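
Arena scores are Elo-style ratings, so the gap between two rows maps to an expected head-to-head win rate. Below is a minimal sketch, assuming the logistic (Bradley-Terry) model LMArena describes in its methodology, with the conventional base-10, 400-point Elo scale; the exact scale is an assumption, not stated on this page.

```python
# Hypothetical illustration: convert an arena-score gap into an expected
# head-to-head win probability. The base-10 / 400-point logistic scale is
# an assumption consistent with Elo conventions, not taken from this page.

def win_probability(score_a: float, score_b: float) -> float:
    """P(A beats B) under a logistic (Bradley-Terry / Elo-style) model."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400.0))

# Top two entries in the table above: 1252.24 vs. 1138.14.
print(f"{win_probability(1252.24, 1138.14):.1%}")  # ~65.9%
```

Under these assumptions, the roughly 114-point gap between the top two entries corresponds to about a two-in-three expected win rate for the leader.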
More Statistics for WebDev Arena (Overall)

- Figure 1: Confidence Interval for Model Strength
- Figure 2: Average Win Rate Against All Other Models (Assuming Uniform Sampling and No Ties)
- Figure 3: Fraction of Model A Wins for All Non-tied A vs. B Battles
- Figure 4: Battle Count for Each Combination of Models (without Ties)
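
The quantity in Figure 2 can be approximated from the scores alone. A sketch, under the same assumed logistic scale as above, of each model's average win rate against all others with uniform opponent sampling and no ties:

```python
# Sketch of Figure 2's quantity: average win rate against every other model,
# assuming uniform opponent sampling, no ties, and a base-10 / 400-point
# logistic scale (an assumption). Scores are copied from the table above;
# entries are kept anonymous because the page lists organizations only.

scores = [1252.24, 1138.14, 1068.48, 1060.55, 1029.45, 1025.87,
          978.07, 967.88, 964.00, 902.78, 894.46, 811.76]

def win_probability(a: float, b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((b - a) / 400.0))

for i, s in enumerate(scores):
    # Average the model's win probability over all other entries.
    avg = sum(win_probability(s, t)
              for j, t in enumerate(scores) if j != i) / (len(scores) - 1)
    print(f"score {s:8.2f}: avg win rate {avg:.1%}")
```

Note this reconstruction uses point estimates only; the leaderboard's own figures are computed from the underlying battle data, which is why it also reports 95% confidence intervals and per-pair battle counts (Figures 1 and 4).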