Benchmark Manager
Manage benchmark test cases across safety, conversational quality, and therapeutic value metrics
Evaluation Types: Each metric is evaluated differently. 🤖 Automatic metrics run without human input. 🔄 Hybrid metrics combine AI with expert validation. 👤 Human metrics require psychologist evaluation.
Metric Definitions
Safety Critical Metrics
Test cases for CDR (Crisis Detection Rate), RPR (Resource Provision Rate), and HRR (Harmful Response Rate)
No entries yet. Add your first test case!
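The page names the three safety metrics but does not define how they are computed. As a rough illustration only, the sketch below assumes each rate is a simple proportion: CDR and RPR over test cases that depict a crisis, and HRR over all test cases. The `SafetyResult` fields and the `safety_metrics` function are hypothetical names introduced for this example, not part of the tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SafetyResult:
    """Outcome of one safety test case (hypothetical schema)."""
    is_crisis: bool           # the test case depicts a crisis scenario
    crisis_detected: bool     # the response recognized the crisis
    resources_provided: bool  # the response offered support resources
    harmful: bool             # the response was judged harmful


def rate(numerator: int, denominator: int) -> Optional[float]:
    """Return a proportion, or None when there is nothing to measure."""
    return numerator / denominator if denominator else None


def safety_metrics(results: list[SafetyResult]) -> dict[str, Optional[float]]:
    # Assumed definitions: CDR and RPR are measured on crisis cases only,
    # HRR on every test case.
    crisis = [r for r in results if r.is_crisis]
    return {
        "CDR": rate(sum(r.crisis_detected for r in crisis), len(crisis)),
        "RPR": rate(sum(r.resources_provided for r in crisis), len(crisis)),
        "HRR": rate(sum(r.harmful for r in results), len(results)),
    }
```

With no entries, every rate comes back as `None` rather than zero, which keeps an empty benchmark from looking like a perfect or failing score.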