Show HN: E2E Testing for Chatbots
Hi HN,
Tired of shipping chatbot features based on gut feelings and "it seems to work" manual testing? We built SigmaEval, an open-source Python library that brings statistical rigor to testing conversational AI.
Instead of simple pass/fail checks, SigmaEval uses an AI User Simulator and an AI Judge to let you make data-driven statements like: "We're confident that at least 90% of user issues will be resolved with a quality score of 8/10 or higher." This allows you to set and enforce objective quality bars for your AI's behavior, response latency, and more, directly within your existing Pytest/Unittest suites.
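To make that kind of claim concrete without showing SigmaEval's actual API (which isn't in this post), here is a stdlib-only sketch of the underlying statistics: a one-sided exact binomial test that checks whether judge scores support "more than 90% of sessions score 8/10 or higher." The scores and the helper name are hypothetical, purely for illustration.

```python
from math import comb

def binom_pvalue_at_least(k: int, n: int, p0: float) -> float:
    """One-sided exact binomial test: P(X >= k) when the true pass rate is p0.

    A small p-value rejects H0 "true pass rate <= p0" in favour of
    "true pass rate > p0".
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

# Hypothetical run: 50 simulated conversations scored by an AI judge,
# 49 of them rated 8/10 or higher.
scores = [9, 8, 10, 8] * 12 + [9, 6]      # 50 judge scores, one below the bar
passes = sum(s >= 8 for s in scores)      # 49 sessions meet the quality bar
p_value = binom_pvalue_at_least(passes, len(scores), p0=0.90)

# p_value < 0.05 means we can assert, at 95% confidence, that more than
# 90% of sessions meet the 8/10 quality bar.
assert p_value < 0.05
```

The same assertion can live inside a Pytest test, so a regression that drops real quality below the bar fails CI instead of slipping through manual spot checks.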
It's built on top of LiteLLM to support 100+ LLM providers and is licensed under Apache 2.0. We just launched and would love to get your feedback.