Leaderboards
UC Berkeley
Chatbot Arena LLM Leaderboard: Community-driven Evaluation for Best LLM and AI chatbots
Chatbot Arena (formerly LMSYS): Free AI Chat to Compare & Test Best AI Chatbots
Berkeley Function-Calling Leaderboard
Berkeley Function Calling Leaderboard V3 (aka Berkeley Tool Calling Leaderboard V3)
The Berkeley Function Calling Leaderboard V3 (also called Berkeley Tool Calling Leaderboard V3) evaluates an LLM's ability to call functions (aka tools) accurately. This leaderboard consists of real-world data and will be updated periodically. For more information on the evaluation dataset and methodology, please refer to our blog posts: BFCL-v1, which introduces AST as an evaluation metric; BFCL-v2, which introduces enterprise and OSS-contributed functions; and BFCL-v3, which introduces multi-turn interactions. Check out the code and data.
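To make the idea of an AST-based metric concrete, here is a minimal Python sketch of how a model-generated function call could be checked structurally rather than by string comparison. It is not the leaderboard's actual evaluation code: the helper names `parse_call` and `call_matches`, the `get_weather` function, and the restriction to keyword arguments with literal values are all assumptions made for illustration.

```python
import ast


def parse_call(call_text):
    """Parse text such as 'f(a=1, b="x")' into (function_name, keyword_args).

    Hypothetical helper for illustration; only keyword arguments with
    literal values are supported in this sketch.
    """
    node = ast.parse(call_text.strip(), mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("expected a single function call")
    if node.args:
        raise ValueError("this sketch only supports keyword arguments")
    name = ast.unparse(node.func)  # also handles dotted names like 'math.sqrt'
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs


def call_matches(call_text, expected_name, expected_args):
    """Return True if the call names the expected function with the expected arguments."""
    try:
        name, kwargs = parse_call(call_text)
    except (SyntaxError, ValueError, TypeError):
        # Unparseable or unsupported output counts as a failed call.
        return False
    return name == expected_name and kwargs == expected_args


# Hypothetical example: argument order does not matter, values do.
expected = {"city": "Berkeley", "unit": "celsius"}
print(call_matches('get_weather(unit="celsius", city="Berkeley")', "get_weather", expected))
print(call_matches('get_weather(city="Oakland", unit="celsius")', "get_weather", expected))
```

Comparing the parsed structure instead of the raw text means differences in whitespace, quoting style, or argument order do not count as errors, while a wrong function name or argument value does.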
Coding Evaluation
EvalPlus Leaderboard: https://evalplus.github.io/leaderboard.html