Leaderboards

UC Berkeley

Chatbot Arena LLM Leaderboard: Community-driven Evaluation for Best LLM and AI chatbots

Chatbot Arena (formerly LMSYS): Free AI Chat to Compare & Test Best AI Chatbots

Maintained by researchers at UC Berkeley SkyLab and LMArena

Berkeley Function-Calling Leaderboard

Berkeley Function Calling Leaderboard V3 (aka Berkeley Tool Calling Leaderboard V3)

The Berkeley Function Calling Leaderboard V3 (also called the Berkeley Tool Calling Leaderboard V3) evaluates an LLM's ability to call functions (aka tools) accurately. The leaderboard is built from real-world data and is updated periodically. For more information on the evaluation dataset and methodology, please refer to our blogs: BFCL-v1, which introduces AST as an evaluation metric; BFCL-v2, which introduces enterprise and OSS-contributed functions; and BFCL-v3, which introduces multi-turn interactions. Check out the code and data.
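To give a feel for what an AST-based check can look like, here is a minimal sketch. It is not the BFCL checker itself. It assumes the model emits a single Python-style call with keyword arguments only, and that each parameter has a list of acceptable values. The names `parse_call` and `ast_match` are hypothetical and exist only for this illustration.

```python
import ast


def parse_call(source: str):
    """Parse one Python-style call into (function_name, keyword_args).

    Hypothetical helper: accepts keyword arguments with literal values only.
    """
    node = ast.parse(source.strip(), mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("expected a single function call")
    if node.args:
        raise ValueError("positional arguments are not handled in this sketch")
    kwargs = {}
    for kw in node.keywords:
        if kw.arg is None:  # a **mapping argument has no keyword name
            raise ValueError("**kwargs unpacking is not handled in this sketch")
        # literal_eval only accepts literals, so no model output is executed
        kwargs[kw.arg] = ast.literal_eval(kw.value)
    # ast.unparse (Python 3.9+) also renders dotted names such as "math.sqrt"
    return ast.unparse(node.func), kwargs


def ast_match(model_output: str, expected_name: str, allowed: dict) -> bool:
    """Return True if the call names the expected function and every
    parameter holds one of its acceptable values (assumed format)."""
    try:
        name, kwargs = parse_call(model_output)
    except (SyntaxError, ValueError, TypeError):
        return False  # unparsable or unsupported output counts as a miss
    if name != expected_name:
        return False
    if set(kwargs) != set(allowed):
        return False  # missing or unexpected parameters
    return all(kwargs[key] in allowed[key] for key in allowed)


if __name__ == "__main__":
    allowed = {"city": ["Berkeley", "Berkeley, CA"], "unit": ["celsius"]}
    print(ast_match('get_weather(city="Berkeley", unit="celsius")',
                    "get_weather", allowed))
```

Comparing the parsed structure rather than the raw string means argument order and whitespace do not affect the result, and the call is never executed.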

Coding Evaluation

  1. https://evalplus.github.io/leaderboard.html
