AI Evaluation
Define test cases, run them, and score the answers with the keyword grader or an LLM judge. Cases stay in your browser; completed runs are saved to the run history below.
Test cases
Each case is routed by your routing rules, then answered by the chosen model and graded.
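The route, answer, grade flow described above can be sketched roughly as follows. This is an illustrative sketch only, not the tool's actual implementation: `EvalCase`, `RoutingRule`, `route`, `keyword_grade`, and `run_case` are hypothetical names. It also assumes that rules match on a substring of the prompt, and that the keyword grader scores the fraction of expected keywords found in the answer. Models are stand-in callables, and the LLM judge is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected_keywords: list[str]


@dataclass
class RoutingRule:
    trigger: str  # assumed: rule fires when this substring appears in the prompt
    model: str    # name of the model to use when the rule fires


def route(prompt: str, rules: list[RoutingRule], default_model: str) -> str:
    """Return the model of the first matching rule, else the default."""
    lowered = prompt.lower()
    for rule in rules:
        if rule.trigger.lower() in lowered:
            return rule.model
    return default_model


def keyword_grade(answer: str, keywords: list[str]) -> float:
    """Assumed scoring: fraction of expected keywords present (case-insensitive)."""
    if not keywords:
        return 0.0
    lowered = answer.lower()
    hits = sum(1 for keyword in keywords if keyword.lower() in lowered)
    return hits / len(keywords)


def run_case(
    case: EvalCase,
    rules: list[RoutingRule],
    models: dict[str, Callable[[str], str]],
    default_model: str,
) -> dict:
    """Route the case, get an answer from the chosen model, then grade it."""
    model_name = route(case.prompt, rules, default_model)
    answer = models[model_name](case.prompt)
    score = keyword_grade(answer, case.expected_keywords)
    return {"model": model_name, "answer": answer, "score": score}
```

In this sketch, a case whose prompt matches no rule falls through to the default model, and a case with no expected keywords scores 0.0 rather than raising an error.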