Key Accountabilities
- Definition and execution of testing and quality assurance strategies for AI‑enabled workflows
- Continuous evaluation and monitoring of system behavior in production environments
- Contribution to auditability, risk management, and continuous quality improvement
Principal Responsibilities
- Define quality criteria and testing strategies for agent workflows, covering accuracy, latency, safety, compliance, and operational risk
- Build automated evaluation harnesses to assess agent performance, including hallucination rates, tool misuse, policy violations, and task success rates
- Implement continuous production monitoring to detect anomalies, quality degradation, and emerging safety concerns
- Develop and maintain automated test suites using Playwright for UI testing and custom scripts for API and workflow validation
- Apply LLM evaluation frameworks to assess output quality and to detect regressions and system drift over time
- Produce and maintain dashboards and reports that communicate quality metrics and trends to engineering teams and other stakeholders
- Develop and maintain runbooks for common failure modes and contribute to incident response activities
- Collaborate closely with developers to improve prompts, tool definitions, and workflow designs based on test results
- Ensure testing, logging, and monitoring practices align with data privacy, audit, and regulatory requirements