enable perf_evaluation notifier #4090
base: main
Conversation
Force-pushed 2b92dec to 3b2a7d7
Force-pushed 7b7748b to c7497b8
Force-pushed c7497b8 to 07864a5
Pull request overview
This pull request introduces a new performance evaluation notifier for the LISA testing framework. The notifier validates performance test results against predefined criteria and can optionally fail tests when performance targets are not met.
Key Changes:
- Implements a comprehensive performance evaluation system with support for both JSON and YAML criteria files
- Adds flexible VM size pattern matching with fuzzy matching capabilities for Azure VM naming conventions
- Introduces a hierarchical criteria format with group-based condition matching for test cases and VM sizes (see the sketch below)
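To make the group-based matching concrete, one criteria group can be pictured roughly as the structure below, expressed as a Python literal. This is a sketch, not the authoritative schema: only the keys name, conditions, and metrics appear in the diffs in this PR, the metric fields mirror the YAML sample further down, and the condition keys and all values are illustrative assumptions.

# Sketch only: assumed shape of one parsed criteria group. The keys "name",
# "conditions", and "metrics" match fields referenced by
# _get_hierarchical_criteria below; condition keys, metric fields, and values
# are illustrative assumptions. At load time the implementation wraps each
# metric's thresholds in a MetricCriteria object.
example_group = {
    "name": "nvme_on_lsv3",            # hypothetical group name
    "conditions": {
        "test_case": "perf_nvme*",     # assumed condition key
        "vm_size": "Standard_L*s_v3",  # assumed condition key
    },
    "metrics": {
        "iops": {
            "min_value": 100000.0,
            "target_value": 200000.0,
            "error_threshold": 0.5,
        },
    },
}

During evaluation, the notifier walks these groups in order, applies the VM size and test case matching to the conditions, and looks up the requested metric in the first matching group (see _get_hierarchical_criteria below).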
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 17 comments.
Summary per file:
| File | Description |
|---|---|
| lisa/notifiers/perfevaluation/perfevaluation.py | Main implementation of the PerfEvaluation notifier with metric evaluation logic, criteria loading, and test result modification capabilities |
| lisa/notifiers/perfevaluation/perf_tcp_ntttcp_sriov_criteria.json | Performance criteria definitions for TCP NTTTCP SRIOV tests across various VM sizes with detailed metrics for throughput, latency, and network performance |
| lisa/notifiers/perfevaluation/perf_nvme_criteria.yml | YAML-based hierarchical criteria for NVMe storage performance tests with group-based conditions for specific VM sizes |
| lisa/notifiers/perfevaluation/__init__.py | Package initialization exposing public classes (PerfEvaluation, PerfEvaluationSchema, MetricCriteria) |
| lisa/mixin_modules.py | Registration of the new perfevaluation notifier module for plugin system integration |
| docs/run_test/runbook.rst | Comprehensive documentation for the perfevaluation notifier including usage examples, parameters, and configuration formats |
Test Suggestions (per LISA Testing Guidelines):
Key Test Cases:
Since this PR adds a new notifier component without modifying existing test execution logic, comprehensive unit tests should be added rather than integration tests. The notifier functionality should be validated through:
- Unit tests for the PerfEvaluation class (suggested in selftests/test_perfevaluation.py; a sketch follows this list)
- Pattern matching validation tests
- Criteria loading and parsing tests
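As a concrete starting point, such a selftest might look roughly like the sketch below. Only the _fuzzy_match_vm_size signature and the package export of PerfEvaluation come from this PR; the instantiation shortcut (bypassing __init__ and attaching a mock logger), the VM size strings, and the expected match results are assumptions based on the matching logic visible in this diff.

# Minimal sketch for selftests/test_perfevaluation.py. Names and setup are
# assumptions; only _fuzzy_match_vm_size and the package exports appear in
# this PR.
from unittest import TestCase
from unittest.mock import MagicMock

from lisa.notifiers.perfevaluation import PerfEvaluation


class PerfEvaluationPatternTestCase(TestCase):
    def _make_notifier(self) -> PerfEvaluation:
        # Bypass __init__ so the sketch does not depend on the runbook
        # schema; attach only the logger the matching code uses.
        notifier = PerfEvaluation.__new__(PerfEvaluation)
        notifier._log = MagicMock()
        return notifier

    def test_fuzzy_vm_size_match(self) -> None:
        notifier = self._make_notifier()
        # The wildcard is expected to match within one underscore-separated
        # token of the Azure VM size name.
        self.assertTrue(
            notifier._fuzzy_match_vm_size("Standard_D4s_v5", "Standard_D*s_v5")
        )

    def test_fuzzy_vm_size_mismatch(self) -> None:
        notifier = self._make_notifier()
        # Assumes no broader fallback matching beyond the logic visible in
        # this diff.
        self.assertFalse(
            notifier._fuzzy_match_vm_size("Standard_E8as_v4", "Standard_D*s_v5")
        )

Similar cases could cover criteria loading and parsing once the loader's public entry point is settled.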
Impacted LISA Features:
This change does not directly impact existing LISA features but adds a new capability for performance evaluation and reporting.
Tested Azure Marketplace Images:
No specific Azure marketplace image testing is required for this PR, as it introduces infrastructure code (a notifier) rather than platform-specific test cases. The notifier will be exercised indirectly when performance tests run on any image that generates UnifiedPerfMessage results.
def _get_hierarchical_criteria(
    self, test_case_name: str, vm_size: str, metric_name: str
) -> Optional[MetricCriteria]:
    """Get criteria from hierarchical data using condition matching"""
    if not hasattr(self, "_hierarchical_data"):
        self._log.debug(
            f"[DEBUG] No hierarchical data found for {test_case_name}.{metric_name}"
        )
        return None

    self._log.debug(
        f"[DEBUG] Searching criteria for {test_case_name}.{metric_name}, "
        f"VM: {vm_size}, Total groups: {len(self._hierarchical_data)}"
    )

    for idx, group in enumerate(self._hierarchical_data):
        self._log.debug(
            f"[DEBUG] Checking group {idx}: {group.get('name', 'Unnamed')}"
        )
        self._log.debug(f"[DEBUG] Group conditions: {group['conditions']}")

        if self._matches_group_conditions(
            group["conditions"], test_case_name, vm_size
        ):
            self._log.debug(
                f"[DEBUG] Group {idx} conditions matched! "
                f"Metrics count: {len(group['metrics'])}"
            )
            self._log.debug(
                f"[DEBUG] Available metrics in group: "
                f"{list(group['metrics'].keys())}"
            )

            if metric_name in group["metrics"]:
                self._log.debug(
                    f"[DEBUG] Metric {metric_name} found in group {idx}!"
                )
                metric_data = group["metrics"][metric_name]
                # criteria is already a MetricCriteria object, return it
                criteria = metric_data["criteria"]
                assert isinstance(criteria, MetricCriteria)
                return criteria
            else:
                self._log.debug(
                    f"[DEBUG] Metric {metric_name} NOT found in group {idx}"
                )
        else:
            self._log.debug(f"[DEBUG] Group {idx} conditions did NOT match")

    self._log.debug(
        f"[DEBUG] No matching group found for "
        f"{test_case_name}.{metric_name} with VM {vm_size}"
    )
    return None
Copilot AI (Dec 10, 2025):
[nitpick] Potential maintainability issue: Excessive use of debug logging. Lines 401-449 contain numerous debug log statements (15+ debug calls) that clutter the code and make it harder to read. While debugging information is valuable, this level of verbosity should be reduced for production code.
Consider:
- Removing or consolidating redundant debug statements
- Using a single comprehensive debug log at the end of the method summarizing the search result
- Making detailed debug logging conditional on a debug flag
Example refactor:
def _get_hierarchical_criteria(
    self, test_case_name: str, vm_size: str, metric_name: str
) -> Optional[MetricCriteria]:
    """Get criteria from hierarchical data using condition matching"""
    if not hasattr(self, "_hierarchical_data"):
        return None
    for idx, group in enumerate(self._hierarchical_data):
        if self._matches_group_conditions(group["conditions"], test_case_name, vm_size):
            if metric_name in group["metrics"]:
                self._log.debug(
                    f"Found criteria for {test_case_name}.{metric_name} "
                    f"(VM: {vm_size}) in group '{group.get('name', 'Unnamed')}'"
                )
                return group["metrics"][metric_name]["criteria"]
    self._log.debug(
        f"No criteria found for {test_case_name}.{metric_name} (VM: {vm_size})"
    )
    return None

# Single file
criteria_path = Path(criteria_pattern)
if not criteria_path.is_absolute():
    criteria_path = package_dir / criteria_pattern

if criteria_path.exists():
    return [criteria_path]
else:
    self._log.debug(f"Criteria file not found: {criteria_path}")
    return []
Copilot AI (Dec 10, 2025):
Missing input validation for file path. The method directly uses Path(criteria_pattern) without validating that it doesn't contain path traversal sequences (e.g., ../../../etc/passwd). While this is mitigated by the fact that non-absolute paths are resolved relative to the package directory (line 198), it's better to explicitly validate or sanitize the input.
Consider adding validation:
# Single file
criteria_path = Path(criteria_pattern)
# Validate no path traversal attempts
if ".." in criteria_path.parts:
    self._log.warning(f"Invalid criteria path (path traversal detected): {criteria_pattern}")
    return []
if not criteria_path.is_absolute():
    criteria_path = package_dir / criteria_pattern

criteria_file
'''''''''''''
type: str, optional, default: "*_criteria.yml"
Path or glob pattern to YAML files containing performance criteria.

criteria
''''''''
type: dict, optional, default: None

Direct criteria definition in runbook. Takes priority over criteria_file.

output_file
'''''''''''
type: str, optional, default: None

Output path for detailed evaluation results in JSON format.

fail_test_on_performance_failure
''''''''''''''''''''''''''''''''
type: bool, optional, default: False
Copilot AI (Dec 10, 2025):
Inconsistent documentation formatting. The "Parameters:" section header uses standard formatting, but sub-parameters use different header styles. The parameter criteria_file uses 13 apostrophes (line 980), criteria uses 8 (line 986), and output_file uses 11 (line 992).
For consistency with reStructuredText formatting, all parameter subsections should use the same underline length and style. Use a consistent pattern like:
criteria_file
"""""""""""""

This ensures proper rendering in Sphinx documentation.
class PerfEvaluation(notifier.Notifier):
    """
    Performance evaluation notifier that validates UnifiedPerfMessage results
    against predefined criteria loaded from a JSON configuration file.
Copilot AI (Dec 10, 2025):
Outdated documentation comment. The docstring states "validates UnifiedPerfMessage results against predefined criteria loaded from a JSON configuration file" (line 133), but the implementation actually supports both JSON and YAML files (as evidenced by lines 217-220 and the default criteria_file: Optional[str] = "*_criteria.yml" on line 25).
Update the docstring to reflect the actual functionality:
"""
Performance evaluation notifier that validates UnifiedPerfMessage results
against predefined criteria loaded from JSON or YAML configuration files.
"""| against predefined criteria loaded from a JSON configuration file. | |
| against predefined criteria loaded from JSON or YAML configuration files. |
| "tolerance_percent": 10.0 | ||
| }, | ||
| "throughput_in_gbps_conn_4": { | ||
| "unit": "Gbps", |
Copilot AI (Dec 10, 2025):
[nitpick] The spacing here is inconsistent - there's a trailing space after "Gbps" on line 334, but not on lines 10, 16, etc. For consistency, remove the trailing space.
| "unit": "Gbps", | |
| "unit": "Gbps", |
  min_value: 0
  target_value: 100000
  error_threshold: 0.50

- name: "receiver_cycles_per_byte_conn_1"
  max_value: 100.0
  target_value: 20.0
  error_threshold: 0.50

- name: "retrans_segments_conn_1"
  max_value: 1000
  target_value: 0
  error_threshold: 2.0

- name: "rx_packets_conn_1"
  min_value: 100000
  target_value: 1000000
  error_threshold: 0.50

- name: "sender_cycles_per_byte_conn_1"
  max_value: 100.0
  target_value: 20.0
  error_threshold: 0.50

- name: "server_mtu_conn_1"
  min_value: 1400
  target_value: 1500
  error_threshold: 0.05

- name: "throughput_in_gbps_conn_1"
  min_value: 0.5
  target_value: 2.0
  error_threshold: 0.30

- name: "tx_packets_conn_1"
  min_value: 100000
  target_value: 1000000
  error_threshold: 0.50
Copilot AI (Dec 10, 2025):
[nitpick] Inconsistent use of integer vs float for min_value and target_value. Lines 143, 158, 168, 178 use integers while other metrics use floats. For consistency with the rest of the file, use float literals (e.g., 0.0, 100000.0, 1400.0, 1500.0).
        return False

    # Check variant match (s, ads, etc.) - more flexible
    if pat_variant not in ["*", ""] and pat_variant != "*":
Copilot AI (Dec 10, 2025):
Redundant condition: the expression on line 726, pat_variant not in ["*", ""] and pat_variant != "*", contains a duplicate check, since pat_variant != "*" is already covered by pat_variant not in ["*", ""].
Simplify to: if pat_variant not in ["*", ""]:
| if pat_variant not in ["*", ""] and pat_variant != "*": | |
| if pat_variant not in ["*", ""]: |
def get_statistics_config(
    self, test_case_name: str, vm_size: str
) -> Dict[str, Any]:
    """Get statistics configuration for a test case and VM size"""
Copilot AI (Dec 10, 2025):
The docstring for this public method is minimal. The method implements a plugin hook and should have a docstring explaining its purpose, parameters, and return value. For example:
def get_statistics_config(
    self, test_case_name: str, vm_size: str
) -> Dict[str, Any]:
    """
    Get statistics configuration for a test case and VM size.

    Searches hierarchical data for matching group conditions and returns
    the statistics configuration (times, type, threshold).

    Args:
        test_case_name: Name of the test case
        vm_size: VM size identifier

    Returns:
        Dictionary containing statistics_times, statistics_type, and error_threshold
    """

Suggested change:
-    """Get statistics configuration for a test case and VM size"""
+    """
+    Get statistics configuration for a test case and VM size.
+
+    Searches hierarchical group data for matching conditions based on the
+    provided test case name and VM size. If a matching group is found,
+    returns its statistics configuration (statistics_times, statistics_type,
+    error_threshold). If no group matches, falls back to global runbook
+    overrides or default values.
+
+    Args:
+        test_case_name: Name of the test case to match against group conditions.
+        vm_size: VM size identifier to match against group conditions.
+
+    Returns:
+        Dictionary containing:
+            - statistics_times (int): Number of times to run statistics.
+            - statistics_type (str): Type of statistics to use (e.g., "average").
+            - error_threshold (float): Acceptable error threshold.
+    """
for flex_pattern in flexible_patterns:
    try:
        if re.match(f"^{flex_pattern}$", vm_size, re.IGNORECASE):
            self._log.debug(
                f"Fuzzy match: '{vm_size}' matched '{pattern}' "
                f"using pattern '{flex_pattern}'"
            )
            return True
        if re.match(f"^{flex_pattern}$", vm_size_lower, re.IGNORECASE):
            self._log.debug(
                f"Fuzzy match: '{vm_size}' matched '{pattern}' "
                f"using lowercase pattern '{flex_pattern}'"
            )
            return True
    except re.error:
        # If regex fails, continue to next pattern
        continue
Copilot AI (Dec 10, 2025):
Potential performance issue: Multiple regex compilation within a loop. In the _fuzzy_match_vm_size method, re.match is called inside a loop for multiple patterns (lines 677, 683). Each call to re.match compiles the regex pattern on-the-fly.
For better performance, consider pre-compiling the regex patterns before the loop, especially if this method is called frequently during performance evaluation:
def _fuzzy_match_vm_size(self, vm_size: str, pattern: str) -> bool:
    """Enhanced fuzzy matching for VM size patterns"""
    regex_pattern = pattern.replace("*", "[^_]*")
    flexible_patterns = [
        regex_pattern,
        regex_pattern.lower(),
        regex_pattern.replace("s_", "_").replace("_s", "_"),
    ]
    vm_size_lower = vm_size.lower()
    # Pre-compile patterns for better performance
    compiled_patterns = []
    for flex_pattern in flexible_patterns:
        try:
            compiled_patterns.append(re.compile(f"^{flex_pattern}$", re.IGNORECASE))
        except re.error:
            continue
    # Use compiled patterns
    for compiled_pattern in compiled_patterns:
        if compiled_pattern.match(vm_size) or compiled_pattern.match(vm_size_lower):
            self._log.debug(f"Fuzzy match: '{vm_size}' matched '{pattern}'")
            return True
    # ... rest of method

Suggested change:
-    for flex_pattern in flexible_patterns:
-        try:
-            if re.match(f"^{flex_pattern}$", vm_size, re.IGNORECASE):
-                self._log.debug(
-                    f"Fuzzy match: '{vm_size}' matched '{pattern}' "
-                    f"using pattern '{flex_pattern}'"
-                )
-                return True
-            if re.match(f"^{flex_pattern}$", vm_size_lower, re.IGNORECASE):
-                self._log.debug(
-                    f"Fuzzy match: '{vm_size}' matched '{pattern}' "
-                    f"using lowercase pattern '{flex_pattern}'"
-                )
-                return True
-        except re.error:
-            # If regex fails, continue to next pattern
-            continue
+    # Pre-compile patterns for better performance
+    compiled_patterns = []
+    for flex_pattern in flexible_patterns:
+        try:
+            compiled_patterns.append(re.compile(f"^{flex_pattern}$", re.IGNORECASE))
+        except re.error:
+            # If regex fails, continue to next pattern
+            continue
+    for compiled_pattern in compiled_patterns:
+        if compiled_pattern.match(vm_size):
+            self._log.debug(
+                f"Fuzzy match: '{vm_size}' matched '{pattern}' "
+                f"using pattern '{compiled_pattern.pattern}'"
+            )
+            return True
+        if compiled_pattern.match(vm_size_lower):
+            self._log.debug(
+                f"Fuzzy match: '{vm_size}' matched '{pattern}' "
+                f"using lowercase pattern '{compiled_pattern.pattern}'"
+            )
+            return True