Overview
Bulk testing allows you to evaluate your workflow's accuracy at scale by testing it against multiple cases simultaneously. This feature is essential for validating workflow performance before moving from "Agent review" to fully automated operation, ensuring your workflows trigger correctly and handle cases as expected.
Why Use Bulk Testing?
Build Confidence in Automation
- Validate workflow accuracy before going fully automated
- Test against dozens of cases in a single run
- Identify potential issues before they impact customers
Measure Performance
- Get clear metrics on true positives, false positives, and false negatives
- Understand how accurately your workflow triggers
- Track improvements after making changes
Regression Testing
- Ensure changes don't break existing functionality
- Validate workflow updates before publishing
Getting Started
Step 1: Build Your Test Case Library
Before running bulk tests, you need to collect test cases. There are two ways to do this:
Method 1: Add Cases During QA Review
- Navigate to Quality > QA Review
- When reviewing tickets, look for cases that should or shouldn't have triggered your workflow
- Click "Add test case" next to relevant tickets
- Specify which workflow the case should run on (or indicate it shouldn't run on any workflow)
- The system saves these ticket IDs to your workflow's test case library
Method 2: Add Cases from Monitor Page
- Go to the Monitor page
- Review recent workflow executions
- Identify cases that were correctly or incorrectly handled
- Add them as test cases using the same "Add test case" process described in Method 1
Step 2: Access Bulk Testing
- Navigate to Workflows in your main menu
- Select the workflow you want to test
- Click on the "Test" tab
- You'll see the bulk evaluation interface
Improving Your Workflow with AI Suggestions
Before diving into bulk testing, consider using the "Get AI suggestions" feature to improve your workflow's trigger conditions:
How AI Suggestions Work
- Access the Feature: In your workflow's trigger prompt section, click "Get AI suggestions"
- Provide Context: You can optionally provide details about cases where your workflow is over-triggering or under-triggering
- Get Recommendations: The system analyzes your current trigger prompt and provides specific improvements
- Review Changes: You'll see a detailed analysis with recommended additions and deletions to your trigger conditions
When to Use AI Suggestions
- Over-triggering: Your workflow runs on cases it shouldn't handle
- Under-triggering: Your workflow misses cases it should handle
- New workflows: You want help writing the initial trigger prompt
- After bulk test results: You have failed cases to use as examples for improving triggers
Example Usage
If your bulk test shows false positives, you can:
- Click "Get AI suggestions"
- In the context box, write: "This workflow is incorrectly triggering on cases about shipping delays, but it should only handle order cancellations"
- Apply the recommended improvements
- Run another bulk test to validate the changes
Configuring Your Bulk Test
True Positives (Cases Your Workflow Should Handle)
These are cases where your workflow should trigger and handle the customer inquiry.
Random Sample Option:
- Select "Random sample from assigned test cases"
- Set your sample size (recommended: 10-30 cases for initial testing)
- The system will randomly select from cases you've marked as appropriate for this workflow
Specific Cases Option:
- Choose "Specific case IDs"
- Enter the exact ticket IDs you want to test
- Useful for testing specific edge cases or scenarios
True Negatives (Cases Your Workflow Should NOT Handle)
These are cases where your workflow should not trigger.
Random Sample Options:
- "Random sample from test cases not assigned to this workflow" - Tests against a broad set of cases
- "Random sample of test cases assigned to other workflows" - Tests against cases meant for other workflows
- Set your sample size (recommended: 10-30 cases initially)
Specific Cases Option:
- Enter specific ticket IDs that should not trigger your workflow
- Helpful for testing known edge cases that should be excluded
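The two random sample options for true negatives draw from different pools. The sketch below illustrates that difference under stated assumptions: the library structure, the workflow names, and the helpers `negative_pool` and `draw_sample` are hypothetical and are not part of the product. It also assumes that "not assigned to this workflow" includes cases marked as not belonging to any workflow.

```python
import random

# Hypothetical test case library: ticket ID -> assigned workflow,
# or None when the case was marked as "should not run on any workflow".
library = {
    "T-201": "order-cancellation",
    "T-202": "order-cancellation",
    "T-203": "shipping-delay",
    "T-204": "refund-status",
    "T-205": None,
    "T-206": None,
}

def negative_pool(library, workflow, other_workflows_only=False):
    """Return ticket IDs that should NOT trigger `workflow`."""
    pool = []
    for ticket_id, assigned in library.items():
        if assigned == workflow:
            continue  # a true-positive case, never a negative
        if other_workflows_only and assigned is None:
            continue  # keep only cases that belong to another workflow
        pool.append(ticket_id)
    return pool

def draw_sample(pool, size, seed=None):
    """Randomly pick up to `size` cases; a seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(pool, min(size, len(pool)))

broad = negative_pool(library, "order-cancellation")
narrow = negative_pool(library, "order-cancellation", other_workflows_only=True)
sample = draw_sample(broad, size=3, seed=1)
```

In this sketch the broad pool holds four cases and the narrow pool holds two, so the second option is a stricter check for conflicts between workflows.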
Test Parameters
Other Workflows to Test Against:
- Select which other workflows should be included in the test
- This helps ensure your workflow doesn't conflict with others
Interaction Type:
- Choose whether to test from the initial customer interaction
- Most tests should use the initial interaction
Draft Mode:
- Enable this to test draft versions before publishing
- Useful for validating changes before they go live
Running the Test
- Configure your test settings as described above
- Start small: begin with 10-20 test cases per category for your first tests
- Click "Test" to start the bulk evaluation
- Wait for results: the process typically takes about 10 minutes to complete
- Review the results once the test finishes
Understanding Your Results
Key Metrics
True Positives (TP):
- Cases where your workflow correctly triggered when it should have
- The closer this count is to the number of should-trigger cases in your test, the better the performance
True Negatives (TN):
- Cases where your workflow correctly did NOT trigger when it shouldn't have
- The closer this count is to the number of should-not-trigger cases in your test, the better the selectivity
False Positives (FP):
- Cases where your workflow incorrectly triggered when it shouldn't have
- Lower numbers are better - these represent over-triggering
False Negatives (FN):
- Cases where your workflow failed to trigger when it should have
- Lower numbers are better - these represent missed opportunities
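The four categories above come from comparing what each case should have done with what the workflow actually did. A minimal sketch of that comparison, assuming you have noted both values per case yourself; the `results` list and ticket IDs are made up for illustration and do not reflect any export format:

```python
from collections import Counter

def classify(should_trigger: bool, did_trigger: bool) -> str:
    """Label one test case by comparing expectation with actual behavior."""
    if should_trigger and did_trigger:
        return "TP"  # correctly triggered
    if not should_trigger and not did_trigger:
        return "TN"  # correctly stayed silent
    if not should_trigger and did_trigger:
        return "FP"  # over-triggered
    return "FN"      # missed a case it should have handled

# Hypothetical results: (ticket_id, should_trigger, did_trigger)
results = [
    ("T-101", True, True),
    ("T-102", True, False),
    ("T-103", False, False),
    ("T-104", False, True),
    ("T-105", True, True),
]

counts = Counter(classify(should, did) for _, should, did in results)
```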
Calculating Accuracy
While the system shows raw counts, you can calculate key performance metrics:
Overall Accuracy = (TP + TN) / (TP + TN + FP + FN)
- Target accuracy varies by workflow type and business requirements
- Consult with your Customer Success Manager to determine appropriate thresholds
Precision = TP / (TP + FP)
- Measures how often your workflow is correct when it triggers
Recall = TP / (TP + FN)
- Measures how well your workflow catches relevant cases
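If you want to compute these three metrics from the raw counts, a small helper like the one below is enough. This is a sketch, not a product feature: the function name `bulk_test_metrics` and the example counts are assumptions chosen for illustration.

```python
def bulk_test_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, and recall from raw bulk test counts."""
    def ratio(numerator, denominator):
        # Return None instead of dividing by zero when a category is empty.
        return numerator / denominator if denominator else None

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }

# Example counts from a hypothetical run of 40 cases.
metrics = bulk_test_metrics(tp=18, tn=17, fp=3, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these example counts, accuracy is 35/40 = 87.5%, precision is 18/21 (about 85.7%), and recall is 18/20 = 90.0%.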
Best Practices
Building Effective Test Cases
Cover Multiple Scenarios:
- Include normal cases, edge cases, and error scenarios
- Add regional or country-specific variations if applicable
- Test different customer language styles and complexity levels
Start Small, Build Gradually:
- Begin with 5-10 test cases per category
- Add more cases as you encounter them during regular QA review
- Build your test library organically over time
Document Expected Outcomes:
- Keep notes on why certain cases should or shouldn't trigger your workflow
- This helps when reviewing results and making improvements
When to Run Bulk Tests
Before Publishing Changes:
- Always test workflow modifications before they go live
- Target accuracy varies by use case - discuss appropriate thresholds with your team
Periodic Performance Checks:
- Consider monthly tests for critical workflows
- Check after significant changes to your knowledge base or business processes
Before Going Fully Automated:
- More comprehensive testing is crucial when moving from agent review to automation
- Test with larger sample sizes (30-50 cases) for high-confidence validation
Interpreting and Acting on Results
High False Positive Rate:
- Your workflow may be triggering too broadly
- Use the "Get AI suggestions" feature with examples of incorrectly triggered cases
- Consider adding more specific exclusion criteria
High False Negative Rate:
- Your workflow may be too restrictive
- Use "Get AI suggestions" with examples of missed cases
- Review and potentially broaden trigger conditions
Low Overall Accuracy:
- May indicate fundamental issues with workflow design
- Consider redesigning trigger logic or splitting into multiple workflows
- Review your training data and knowledge sources
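The guidance above can be turned into a quick check on your counts. The sketch below uses the standard definitions of false positive rate, FP / (FP + TN), and false negative rate, FN / (FN + TP). The `triage` helper is hypothetical, and the thresholds are deliberately left as parameters because appropriate targets vary by workflow; the 20% values in the example are placeholders, not recommendations.

```python
def triage(tp, tn, fp, fn, max_fp_rate, max_fn_rate):
    """Flag over- or under-triggering based on thresholds you choose."""
    negatives = fp + tn  # cases that should NOT have triggered
    positives = tp + fn  # cases that should have triggered
    fp_rate = fp / negatives if negatives else 0.0
    fn_rate = fn / positives if positives else 0.0

    findings = []
    if fp_rate > max_fp_rate:
        findings.append("over-triggering: add more specific exclusion criteria")
    if fn_rate > max_fn_rate:
        findings.append("under-triggering: review and broaden trigger conditions")
    return fp_rate, fn_rate, findings

# Placeholder thresholds; agree on real targets with your team.
fp_rate, fn_rate, findings = triage(
    tp=18, tn=14, fp=6, fn=2, max_fp_rate=0.2, max_fn_rate=0.2
)
```

Here 6 of 20 should-not-trigger cases fired (30%), which exceeds the placeholder threshold, while only 2 of 20 should-trigger cases were missed (10%), so only over-triggering is flagged.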
Troubleshooting Common Issues
"No Test Cases Available"
- You need to add test cases first through the QA Review or Monitor page
- Start by identifying 5-10 cases that should trigger your workflow
"Test Taking Too Long"
- Large sample sizes can take 15-20 minutes
- Consider testing with smaller samples initially (20-30 cases)
"Unexpected Results"
- Review individual cases that performed poorly
- Use the "Get AI suggestions" feature to improve trigger conditions
- Verify test cases are correctly categorized
Advanced Testing Strategies
Progressive Testing
- Start with a small, high-confidence test set (10-15 cases)
- Achieve good accuracy on this set (consult your team for targets)
- Gradually expand to larger, more diverse test sets
- Maintain consistent performance as you scale
Using AI Suggestions Iteratively
- Run an initial bulk test
- Identify patterns in false positives/negatives
- Use "Get AI suggestions" with specific examples
- Apply recommended changes
- Re-test with the same cases to validate improvements
A/B Testing Workflow Changes
- Create two versions of your workflow
- Test both against the same set of cases
- Compare results to determine which performs better
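When comparing two versions on the same cases, it helps to look at which cases each version gets right, not only at the totals. The sketch below assumes you have recorded, per ticket, whether each version triggered; the dictionaries, ticket IDs, and the `correct_cases` helper are hypothetical. The same comparison works for a before and after re-test of a single workflow.

```python
def correct_cases(expected, triggered):
    """Return the ticket IDs where the workflow behaved as expected."""
    return {tid for tid, should in expected.items() if triggered[tid] == should}

# Hypothetical data: True means "should trigger" / "did trigger".
expected  = {"T-301": True, "T-302": True,  "T-303": False, "T-304": False}
version_a = {"T-301": True, "T-302": False, "T-303": False, "T-304": True}
version_b = {"T-301": True, "T-302": True,  "T-303": False, "T-304": True}

a_ok = correct_cases(expected, version_a)
b_ok = correct_cases(expected, version_b)

fixed_by_b = b_ok - a_ok    # cases only version B handles correctly
broken_by_b = a_ok - b_ok   # cases only version A handles correctly
still_failing = set(expected) - a_ok - b_ok  # wrong in both versions
```

In this example version B fixes T-302 without breaking anything, and T-304 still over-triggers in both versions, which makes it a good example to pass to "Get AI suggestions".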
Getting Help
If you encounter issues with bulk testing or need assistance interpreting results:
- Contact your Customer Success Manager
- Review individual failed cases to understand patterns
- Use the "Get AI suggestions" feature with specific problem examples
Remember: Bulk testing is your safety net for automation. Start small, build your test library gradually, and use AI suggestions to continuously improve your workflow's accuracy.