AutoArena

AutoArena is an open-source tool for stack-ranking LLM outputs using automated judge evaluation.
Test any GenAI system by comparing its responses with those of another system. Generate a leaderboard to identify the LLM, RAG setup, or prompt that produces the highest-quality responses.
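
To make the idea of turning pairwise comparisons into a leaderboard concrete, here is a minimal sketch that assumes Elo-style rating updates. It is an illustration only and not necessarily how AutoArena computes its rankings; the function `elo_leaderboard`, the verdict format, and the system names are all hypothetical.

```python
from collections import defaultdict


def elo_leaderboard(verdicts, k=32.0, initial=1000.0):
    """Rank systems from (system_a, system_b, winner) judge verdicts.

    `winner` is "a", "b", or "tie". This is an assumed, simplified
    Elo scheme, not AutoArena's actual implementation.
    """
    ratings = defaultdict(lambda: initial)
    for a, b, winner in verdicts:
        # Expected score of A given the current rating gap.
        expected_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        # Zero-sum update: whatever A gains, B loses.
        delta = k * (score_a - expected_a)
        ratings[a] += delta
        ratings[b] -= delta
    # Highest rating first.
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical judge verdicts for three systems.
verdicts = [
    ("system-a", "system-b", "a"),
    ("system-a", "system-c", "a"),
    ("system-b", "system-c", "a"),
    ("system-c", "system-b", "tie"),
]

leaderboard = elo_leaderboard(verdicts)
for name, rating in leaderboard:
    print(f"{name}: {rating:.1f}")
```

In this sketch each verdict would come from an automated judge comparing two responses to the same prompt; the more comparisons are collected, the more stable the resulting ordering becomes.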
See AutoArena in Action