Test any GenAI system by comparing its responses head-to-head with responses from another system. Generate a leaderboard to identify the LLM, RAG setup, or prompt that produces the highest-quality responses.
AutoArena
AutoArena is an open-source tool for stack-ranking LLM outputs using automated judge evaluation.
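To illustrate how pairwise judge verdicts can be turned into a stack ranking, here is a minimal sketch. It assumes Elo-style scoring, which is a common choice for this kind of leaderboard but is not confirmed by the text above as the method AutoArena uses. The function name `elo_leaderboard`, the model names, and the verdicts are all hypothetical and are not part of AutoArena's API.

```python
from collections import defaultdict


def elo_leaderboard(results, k=32.0, base=1000.0):
    """Rank systems from pairwise verdicts using Elo updates.

    results: iterable of (system_a, system_b, winner), where winner is
    "a", "b", or "tie". This tuple shape is an assumption made for the
    sketch, not a format defined by AutoArena.
    """
    ratings = defaultdict(lambda: base)
    for a, b, winner in results:
        ra, rb = ratings[a], ratings[b]
        # Probability that system a wins, given the current ratings.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        # Zero-sum update: whatever a gains, b loses.
        delta = k * (score_a - expected_a)
        ratings[a] = ra + delta
        ratings[b] = rb - delta
    # Highest rating first.
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical verdicts, as an automated judge might return them.
verdicts = [
    ("model-x", "model-y", "a"),
    ("model-y", "model-z", "a"),
    ("model-x", "model-z", "a"),
]

board = elo_leaderboard(verdicts)
for name, rating in board:
    print(f"{name}: {rating:.1f}")
```

Because each update is zero-sum, the ratings always total the number of systems times `base`. Elo results depend on the order in which the verdicts are processed, so a real leaderboard would likely need many more comparisons than this toy input before the ranking stabilizes.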