Evaluation - AI Tax Assistant Platform

AI Evaluation

Define test cases and grade the answers with either the keyword grader or an LLM judge. Test cases are stored only in your browser; completed runs are saved to the run history below.
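As a rough illustration of what a keyword grader can look like, here is a minimal sketch. The `TestCase` shape, the `expected_keywords` field, and the all-keywords-must-match rule are assumptions made for this example; the platform's actual case schema and grading rule are not documented on this page.

```python
from dataclasses import dataclass, field


@dataclass
class TestCase:
    # Hypothetical case shape, assumed for illustration only.
    prompt: str
    expected_keywords: list[str] = field(default_factory=list)


def keyword_grade(answer: str, case: TestCase) -> bool:
    """Pass only if every expected keyword appears in the answer (case-insensitive)."""
    text = answer.lower()
    return all(keyword.lower() in text for keyword in case.expected_keywords)


case = TestCase(
    prompt="Can I deduct a home office?",
    expected_keywords=["deduction", "receipt"],
)

print(keyword_grade("Keep each receipt to support the deduction.", case))  # → True
print(keyword_grade("It depends on your situation.", case))                # → False
```

A keyword check like this is cheap and deterministic, but it cannot tell whether the answer is actually correct, which is the usual reason for offering an LLM judge as the alternative.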

Test cases

Each case is first routed by your routing rules, then answered by the model the rules select, and finally graded.
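The route, answer, grade sequence described above could be sketched as follows. Everything here is hypothetical: the rule format (keyword, model name), the model names, and the stub model functions stand in for whatever the platform really uses, and the grading step is a simple keyword check.

```python
from typing import Callable

# Hypothetical rule format: (keyword to look for in the prompt, model name).
Rule = tuple[str, str]


def route(prompt: str, rules: list[Rule], default: str) -> str:
    """Return the model of the first rule whose keyword occurs in the prompt."""
    lowered = prompt.lower()
    for keyword, model in rules:
        if keyword.lower() in lowered:
            return model
    return default


def run_case(
    prompt: str,
    required: list[str],
    rules: list[Rule],
    models: dict[str, Callable[[str], str]],
    default: str = "general",
) -> dict:
    """Route the prompt, ask the chosen model, then grade the answer by keywords."""
    model_name = route(prompt, rules, default)
    answer = models[model_name](prompt)
    passed = all(word.lower() in answer.lower() for word in required)
    return {"model": model_name, "answer": answer, "passed": passed}


# Stub models that return canned text; a real setup would call an actual model.
models = {
    "general": lambda prompt: "Please consult the relevant guidance.",
    "deductions": lambda prompt: "Keep a receipt for every deduction you claim.",
}
rules = [("deduct", "deductions")]

result = run_case("What can I deduct?", ["receipt"], rules, models)
print(result["model"], result["passed"])  # → deductions True
```

Keeping routing separate from grading means a failing case can be traced to either the wrong model being chosen or the right model giving a poor answer.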

Results

Edit the cases above, then click Run to see the results.

Run history

No runs yet. Completed runs are saved here, with a pass-rate trend over time.
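For reference, a pass-rate trend is just the fraction of passing cases per run, listed in run order. The sketch below assumes each run is stored as a list of pass/fail booleans, which is an illustrative format and not necessarily how the platform saves runs.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of cases that passed; an empty run counts as 0.0."""
    return sum(results) / len(results) if results else 0.0


def trend(history: list[list[bool]]) -> list[float]:
    """Pass rate of each run, oldest first, rounded to two decimals."""
    return [round(pass_rate(run), 2) for run in history]


# Three hypothetical runs of the same four cases.
history = [
    [True, False, False, True],
    [True, True, False, True],
    [True, True, True, True],
]

print(trend(history))  # → [0.5, 0.75, 1.0]
```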