Workflow bulk testing guide – Assembled

Overview

Bulk testing allows you to evaluate your workflow's accuracy at scale by testing it against multiple cases simultaneously. This feature is essential for validating workflow performance before moving from "Agent review" to fully automated operation, ensuring your workflows trigger correctly and handle cases as expected.

Why Use Bulk Testing?

Build Confidence in Automation

Validate workflow accuracy before going fully automated
Test against dozens of cases in a single run
Identify potential issues before they impact customers

Measure Performance

Get clear metrics on true positives, false positives, and false negatives
Understand how accurately your workflow triggers
Track improvements after making changes

Regression Testing

Ensure changes don't break existing functionality
Validate workflow updates before publishing

Getting Started

Step 1: Build Your Test Case Library

Before running bulk tests, you need to collect test cases. There are two ways to do this:

Method 1: Add Cases During QA Review

Navigate to Quality > QA Review or the Monitor page
When reviewing tickets, look for cases that should or shouldn't have triggered your workflow
Click "Add test case" next to relevant tickets
Specify which workflow the case should run on (or indicate it shouldn't run on any workflow)
The system saves these ticket IDs to your workflow's test case library

Method 2: Add Cases from Monitor Page

Go to the Monitor page
Review recent workflow executions
Identify cases that were correctly or incorrectly handled
Add them as test cases using the same "Add test case" process described in Method 1

Step 2: Access Bulk Testing

Navigate to Workflows in your main menu
Select the workflow you want to test
Click on the "Test" tab
You'll see the bulk evaluation interface

Improving Your Workflow with AI Suggestions

Before diving into bulk testing, consider using the "Get AI suggestions" feature to improve your workflow's trigger conditions.

How AI Suggestions Work

Access the Feature: In your workflow's trigger prompt section, click "Get AI suggestions"
Provide Context: You can optionally provide details about cases where your workflow is over-triggering or under-triggering
Get Recommendations: The system analyzes your current trigger prompt and provides specific improvements
Review Changes: You'll see a detailed analysis with recommended additions and deletions to your trigger conditions

When to Use AI Suggestions

Over-triggering: Your workflow runs on cases it shouldn't handle
Under-triggering: Your workflow misses cases it should handle
New workflows: You want help writing the initial trigger prompt
After bulk test results: You want to use failed cases as examples to improve triggers

Example Usage

If your bulk test shows false positives, you can:

Click "Get AI suggestions"
In the context box, write: "This workflow is incorrectly triggering on cases about shipping delays, but it should only handle order cancellations"
Apply the recommended improvements
Run another bulk test to validate the changes

Configuring Your Bulk Test

True Positives (Cases Your Workflow Should Handle)

These are cases where your workflow should trigger and handle the customer inquiry.

Random Sample Option:

Select "Random sample from assigned test cases"
Set your sample size (recommended: 10-30 cases for initial testing)
The system will randomly select from cases you've marked as appropriate for this workflow

Specific Cases Option:

Choose "Specific case IDs"
Enter the exact ticket IDs you want to test
Useful for testing specific edge cases or scenarios

True Negatives (Cases Your Workflow Should NOT Handle)

These are cases where your workflow should not trigger.

Random Sample Options:

"Random sample from test cases not assigned to this workflow" - Tests against a broad set of cases
"Random sample of test cases assigned to other workflows" - Tests against cases meant for other workflows
Set your sample size (recommended: 10-30 cases initially)

Specific Cases Option:

Enter specific ticket IDs that should not trigger your workflow
Helpful for testing known edge cases that should be excluded
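
If you keep your own export of the test case library, you can preview what the random-sample options might draw. The sketch below is illustrative only: `LibraryCase`, `sample_positives`, and `sample_negatives` are hypothetical names, the (ticket ID, assigned workflow) layout is an assumption, and it does not describe how the product itself selects cases.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LibraryCase:
    """One saved test case (hypothetical export format)."""
    ticket_id: str
    # Workflow the case should run on; None means "no workflow".
    assigned_workflow: Optional[str]


def sample_positives(cases: List[LibraryCase], workflow: str, size: int,
                     seed: Optional[int] = None) -> List[LibraryCase]:
    """Randomly pick up to `size` cases assigned to `workflow`."""
    pool = [c for c in cases if c.assigned_workflow == workflow]
    return random.Random(seed).sample(pool, min(size, len(pool)))


def sample_negatives(cases: List[LibraryCase], workflow: str, size: int,
                     other_workflows_only: bool = False,
                     seed: Optional[int] = None) -> List[LibraryCase]:
    """Randomly pick up to `size` cases NOT assigned to `workflow`.

    With other_workflows_only=True, keep only cases that are assigned
    to some other workflow (dropping cases assigned to no workflow).
    """
    pool = [c for c in cases if c.assigned_workflow != workflow]
    if other_workflows_only:
        pool = [c for c in pool if c.assigned_workflow is not None]
    return random.Random(seed).sample(pool, min(size, len(pool)))
```

Passing a fixed `seed` makes the draw repeatable, which is convenient when you want to re-test the same sample after changing a trigger prompt.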

Test Parameters

Other Workflows to Test Against:

Select which other workflows should be included in the test
This helps ensure your workflow doesn't conflict with others

Interaction Type:

Choose whether the test starts from the initial customer interaction
Most tests should use the initial interaction

Draft Mode:

Enable this to test draft versions before publishing
Useful for validating changes before they go live

Running the Test

Configure your test settings as described above
Start small: begin with 10-20 test cases per category for your first tests
Click "Test" to start the bulk evaluation
Wait for results: the process typically takes about 10 minutes to complete
Review the results once the test finishes

Understanding Your Results

Key Metrics

True Positives (TP):

Cases where your workflow correctly triggered when it should have
Higher numbers are better: the workflow is catching more of the cases it should handle

True Negatives (TN):

Cases where your workflow correctly did NOT trigger when it shouldn't have
Higher numbers are better: the workflow is leaving alone more of the cases it should not handle

False Positives (FP):

Cases where your workflow incorrectly triggered when it shouldn't have
Lower numbers are better - these represent over-triggering

False Negatives (FN):

Cases where your workflow failed to trigger when it should have
Lower numbers are better - these represent missed opportunities
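
The four outcomes above come from comparing what was expected with what actually happened. A minimal sketch, assuming each result is a hypothetical (should_trigger, did_trigger) pair that you have recorded yourself; the function names are illustrative and not a product API.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def classify(should_trigger: bool, did_trigger: bool) -> str:
    """Label one test case as TP, TN, FP, or FN."""
    if should_trigger and did_trigger:
        return "TP"  # triggered, and should have
    if not should_trigger and not did_trigger:
        return "TN"  # did not trigger, and should not have
    if did_trigger:
        return "FP"  # triggered, but should not have (over-triggering)
    return "FN"      # did not trigger, but should have (missed case)


def tally(results: Iterable[Tuple[bool, bool]]) -> Dict[str, int]:
    """Count outcomes over (should_trigger, did_trigger) pairs."""
    counts = Counter(classify(s, d) for s, d in results)
    # Always report all four keys, even when a count is zero.
    return {label: counts.get(label, 0) for label in ("TP", "TN", "FP", "FN")}
```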

Calculating Accuracy

The system shows raw counts. From those counts you can calculate the following performance metrics:

Overall Accuracy = (TP + TN) / (TP + TN + FP + FN)

Target accuracy varies by workflow type and business requirements
Consult with your Customer Success Manager to determine appropriate thresholds

Precision = TP / (TP + FP)

Measures how often your workflow is correct when it triggers

Recall = TP / (TP + FN)

Measures how well your workflow catches relevant cases
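
The three formulas above can be applied directly to the raw counts from a test run. A minimal sketch; the function name and the example counts are made up for illustration.

```python
from typing import Dict, Optional


def bulk_test_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, Optional[float]]:
    """Compute accuracy, precision, and recall from raw counts.

    A metric is None when its denominator is zero (for example,
    precision when the workflow never triggered).
    """
    def ratio(numerator: int, denominator: int) -> Optional[float]:
        return numerator / denominator if denominator else None

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


# Example with made-up counts from a 40-case run.
metrics = bulk_test_metrics(tp=18, tn=17, fp=3, fn=2)
```

With 18 TP, 17 TN, 3 FP, and 2 FN, this gives an accuracy of 0.875, a precision of about 0.857, and a recall of 0.9.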

Best Practices

Building Effective Test Cases

Cover Multiple Scenarios:

Include normal cases, edge cases, and error scenarios
Add regional or country-specific variations if applicable
Test different customer language styles and complexity levels

Start Small, Build Gradually:

Begin with 5-10 test cases per category
Add more cases as you encounter them during regular QA review
Build your test library organically over time

Document Expected Outcomes:

Keep notes on why certain cases should or shouldn't trigger your workflow
This helps when reviewing results and making improvements

When to Run Bulk Tests

Before Publishing Changes:

Always test workflow modifications before they go live
Target accuracy varies by use case - discuss appropriate thresholds with your team

Periodic Performance Checks:

Consider monthly tests for critical workflows
Check after significant changes to your knowledge base or business processes

Before Going Fully Automated:

More comprehensive testing is crucial when moving from "Agent review" to fully automated operation
Test with larger sample sizes (30-50 cases) for high-confidence validation
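
To see why larger samples give more confidence, it can help to put an interval around the measured accuracy. The sketch below uses the Wilson score interval, a standard statistical method that is not a feature of the product; the function name and the example counts are assumptions for illustration.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 is about 95% confidence)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Same 90% measured accuracy, two different sample sizes.
low_30, high_30 = wilson_interval(27, 30)
low_50, high_50 = wilson_interval(45, 50)
```

For 27 correct out of 30, the 95% interval is roughly 0.74 to 0.97. For 45 out of 50 (the same 90% accuracy) it narrows to roughly 0.79 to 0.96, which is why the larger sample supports a more confident decision.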

Interpreting and Acting on Results

High False Positive Rate:

Your workflow may be triggering too broadly
Use the "Get AI suggestions" feature with examples of incorrectly triggered cases
Consider adding more specific exclusion criteria

High False Negative Rate:

Your workflow may be too restrictive
Use "Get AI suggestions" with examples of missed cases
Review and potentially broaden trigger conditions

Low Overall Accuracy:

May indicate fundamental issues with workflow design
Consider redesigning trigger logic or splitting into multiple workflows
Review your training data and knowledge sources

Troubleshooting Common Issues

"No Test Cases Available"

You need to add test cases first through the QA Review or Monitor page
Start by identifying 5-10 cases that should trigger your workflow

"Test Taking Too Long"

Large sample sizes can take 15-20 minutes
Consider testing with smaller samples initially (20-30 cases)

"Unexpected Results"

Review the individual cases that came back as false positives or false negatives
Use the "Get AI suggestions" feature to improve trigger conditions
Verify test cases are correctly categorized

Advanced Testing Strategies

Progressive Testing

Start with a small, high-confidence test set (10-15 cases)
Achieve good accuracy on this set (consult your team for targets)
Gradually expand to larger, more diverse test sets
Maintain consistent performance as you scale

Using AI Suggestions Iteratively

Run initial bulk test
Identify patterns in false positives/negatives
Use "Get AI suggestions" with specific examples
Apply recommended changes
Re-test with the same cases to validate improvements

A/B Testing Workflow Changes

Create two versions of your workflow
Test both against the same set of cases
Compare results to determine which performs better
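
Whether you are comparing two workflow versions or re-testing after a change, the comparison comes down to a per-case diff over the same ticket IDs. A minimal sketch, assuming you have recorded for each run a hypothetical mapping from ticket ID to whether the case behaved as expected; `compare_runs` is an illustrative name, not a product feature.

```python
from typing import Dict, List


def compare_runs(before: Dict[str, bool], after: Dict[str, bool]) -> Dict[str, List[str]]:
    """Diff two runs keyed by ticket ID.

    Each value is True when the case behaved as expected in that run
    (a true positive or true negative) and False otherwise.
    """
    shared = before.keys() & after.keys()
    return {
        # Failed in the first run, passed in the second.
        "fixed": sorted(t for t in shared if not before[t] and after[t]),
        # Passed in the first run, failed in the second.
        "regressed": sorted(t for t in shared if before[t] and not after[t]),
        # Present in only one run, so not comparable.
        "only_in_one_run": sorted(before.keys() ^ after.keys()),
    }
```

Any ticket listed under "regressed" is a case the change broke, so it is worth reviewing before publishing even when the overall accuracy went up.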

Getting Help

If you encounter issues with bulk testing or need assistance interpreting results:

Contact your Customer Success Manager
Review individual failed cases to understand patterns
Use the "Get AI suggestions" feature with specific problem examples

Remember: Bulk testing is your safety net for automation. Start small, build your test library gradually, and use AI suggestions to continuously improve your workflow's accuracy.