MindSurf Benchmark

AI Empathetic Support Evaluation

Internal benchmarking tool for calculating automatic metrics and generating standardized test cases

Metrics Calculator

Calculate 6 automatic metrics for dialogue evaluation

Test Case Generator

Generate standardized JSON test cases for the benchmark

Benchmark Manager

Manage benchmark test cases with full CRUD operations
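The page does not define the test case format. As an illustration only, here is a minimal sketch of what a standardized JSON test case and its round trip through storage might look like. The `TestCase` fields, the `to_json`/`from_json` helpers, and the allowed category set (borrowed from the Safety, Conversational, and Therapeutic areas in the About section) are hypothetical assumptions, not the generator's actual schema.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field

# Hypothetical category set, mirroring the three evaluation areas on this page.
ALLOWED_CATEGORIES = {"safety", "conversational", "therapeutic"}


@dataclass
class TestCase:
    """Hypothetical benchmark test case; field names are illustrative only."""

    category: str                 # one of ALLOWED_CATEGORIES
    user_turns: list[str]         # user messages, in conversation order
    expected_behaviors: list[str] = field(default_factory=list)
    case_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def __post_init__(self) -> None:
        # Reject unknown categories so every stored case is standardized.
        if self.category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def to_json(case: TestCase) -> str:
    """Serialize a test case to stable, human-readable JSON."""
    return json.dumps(asdict(case), indent=2, sort_keys=True, ensure_ascii=False)


def from_json(text: str) -> TestCase:
    """Parse JSON produced by to_json back into a TestCase (re-validates it)."""
    return TestCase(**json.loads(text))


if __name__ == "__main__":
    case = TestCase(
        category="safety",
        user_turns=["I have been feeling overwhelmed lately."],
        expected_behaviors=["acknowledge feelings", "offer support resources"],
    )
    print(to_json(case))
```

Sorted keys and fixed indentation keep serialized cases diff-friendly, which helps when test cases are created, edited, and deleted over time.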

About MindSurf Benchmark

This tool helps the MindSurf AI team systematically evaluate empathetic support assistants through automatic metrics and standardized test cases.

Safety

Crisis detection & resource provision

Conversational

Quality & diversity metrics

Therapeutic

Role adherence evaluation
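The specific metrics behind the Conversational category are not named on this page. As one hedged example of a diversity metric, the sketch below computes distinct-n, the ratio of unique n-grams to total n-grams across a set of assistant responses. The function name `distinct_n` and the lowercase whitespace tokenization are assumptions, not the calculator's actual implementation.

```python
def distinct_n(responses: list[str], n: int = 2) -> float:
    """Distinct-n: unique n-grams divided by total n-grams over all responses.

    Assumes simple lowercase whitespace tokenization; a production pipeline
    may use a proper tokenizer. Returns 0.0 when no n-grams exist.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    total = 0
    unique: set[tuple[str, ...]] = set()
    for text in responses:
        tokens = text.lower().split()
        # Sliding window of length n; empty when the response is shorter than n.
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0


if __name__ == "__main__":
    print(distinct_n(["I hear you", "I hear that"], n=2))  # 0.75
```

Higher values mean less repetitive phrasing across responses. Because the score is pooled over all responses, it rewards variety across a whole dialogue set rather than within a single reply.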