Conversation

LiliDeng commented Nov 3, 2025

No description provided.

LiliDeng requested a review from squirrelsc as a code owner on November 3, 2025 16:09
LiliDeng force-pushed the lildeng/perf_evaluation branch 4 times, most recently from 2b92dec to 3b2a7d7 on November 19, 2025 07:44
LiliDeng force-pushed the lildeng/perf_evaluation branch 5 times, most recently from 7b7748b to c7497b8 on December 5, 2025 00:19
LiliDeng requested a review from Copilot on December 10, 2025 07:02
LiliDeng force-pushed the lildeng/perf_evaluation branch from c7497b8 to 07864a5 on December 10, 2025 07:03
Copilot AI left a comment
Pull request overview

This pull request introduces a new performance evaluation notifier for the LISA testing framework. The notifier validates performance test results against predefined criteria and can optionally fail tests when performance targets are not met.

Key Changes:

  • Implements a comprehensive performance evaluation system with support for both JSON and YAML criteria files
  • Adds flexible VM size pattern matching with fuzzy matching capabilities for Azure VM naming conventions
  • Introduces a hierarchical criteria format with group-based condition matching for test cases and VM sizes (sketched below)
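
As far as it can be inferred from the loader code quoted later in this review (each group exposes name, conditions, and metrics keys), a criteria group looks roughly like the following sketch; the condition field names and values are assumptions for illustration:

- name: "NVMe perf on D-series v5"    # optional group label
  conditions:                          # matched against the test case and VM size
    test_case_name: "perf_nvme*"       # assumed field name
    vm_size: "Standard_D*s_v5"         # assumed field name
  metrics:
    - name: "throughput_in_gbps_conn_1"
      min_value: 0.5
      target_value: 2.0
      error_threshold: 0.30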

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 17 comments.

File descriptions:

  • lisa/notifiers/perfevaluation/perfevaluation.py — Main implementation of the PerfEvaluation notifier with metric evaluation logic, criteria loading, and test result modification capabilities
  • lisa/notifiers/perfevaluation/perf_tcp_ntttcp_sriov_criteria.json — Performance criteria definitions for TCP NTTTCP SRIOV tests across various VM sizes, with detailed metrics for throughput, latency, and network performance
  • lisa/notifiers/perfevaluation/perf_nvme_criteria.yml — YAML-based hierarchical criteria for NVMe storage performance tests with group-based conditions for specific VM sizes
  • lisa/notifiers/perfevaluation/__init__.py — Package initialization exposing the public classes (PerfEvaluation, PerfEvaluationSchema, MetricCriteria)
  • lisa/mixin_modules.py — Registration of the new perfevaluation notifier module for plugin system integration
  • docs/run_test/runbook.rst — Documentation for the perfevaluation notifier, including usage examples, parameters, and configuration formats (a sample runbook entry follows)
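
To make the configuration surface concrete, a runbook entry enabling the notifier might look like the sketch below. The parameter names come from the schema documented in runbook.rst (quoted later in this review); the surrounding notifier layout follows LISA's usual runbook convention, and the file names are placeholders:

notifier:
  - type: perfevaluation
    # Glob or path to criteria files; the default is "*_criteria.yml"
    criteria_file: "perf_nvme_criteria.yml"
    # Optional path for detailed JSON evaluation results
    output_file: "perf_evaluation_results.json"
    # Keep false to report violations without failing the run
    fail_test_on_performance_failure: false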

Test Suggestions (per LISA Testing Guidelines):

Key Test Cases:
Since this PR adds a new notifier component without modifying existing test execution logic, comprehensive unit tests should be added rather than integration tests (a minimal sketch follows this list). The notifier functionality should be validated through:

  • Unit tests for the PerfEvaluation class (suggested in selftests/test_perfevaluation.py)
  • Pattern matching validation tests
  • Criteria loading and parsing tests
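
As a starting point, here is a minimal sketch of such a selftest. It assumes the notifier can be instantiated without a full runbook and that _fuzzy_match_vm_size has the signature quoted in the review comments below; both are assumptions to adjust to the actual selftest harness:

from unittest import TestCase
from unittest.mock import MagicMock

from lisa.notifiers.perfevaluation import PerfEvaluation


class TestPerfEvaluationMatching(TestCase):
    def setUp(self) -> None:
        # Bypass __init__ so no runbook is needed; the matching helpers
        # only use self._log, which is mocked here (an assumption).
        self.notifier = PerfEvaluation.__new__(PerfEvaluation)
        self.notifier._log = MagicMock()

    def test_fuzzy_match_vm_size_wildcard(self) -> None:
        # A wildcard pattern should match a concrete Azure VM size name.
        self.assertTrue(
            self.notifier._fuzzy_match_vm_size(
                "Standard_D4s_v5", "Standard_D*s_v5"
            )
        )
        # Negative cases should be added once the fallback matching
        # rules (variant/family handling) are pinned down.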

Impacted LISA Features:
This change does not directly impact existing LISA features but adds a new capability for performance evaluation and reporting.

Tested Azure Marketplace Images:
No specific Azure marketplace image testing is required for this PR, as it introduces infrastructure code (a notifier) rather than platform-specific test cases. The notifier will be exercised indirectly when performance tests run on any image that generates UnifiedPerfMessage results.

Comment on lines +396 to +449
def _get_hierarchical_criteria(
    self, test_case_name: str, vm_size: str, metric_name: str
) -> Optional[MetricCriteria]:
    """Get criteria from hierarchical data using condition matching"""
    if not hasattr(self, "_hierarchical_data"):
        self._log.debug(
            f"[DEBUG] No hierarchical data found for {test_case_name}.{metric_name}"
        )
        return None

    self._log.debug(
        f"[DEBUG] Searching criteria for {test_case_name}.{metric_name}, "
        f"VM: {vm_size}, Total groups: {len(self._hierarchical_data)}"
    )

    for idx, group in enumerate(self._hierarchical_data):
        self._log.debug(
            f"[DEBUG] Checking group {idx}: {group.get('name', 'Unnamed')}"
        )
        self._log.debug(f"[DEBUG] Group conditions: {group['conditions']}")

        if self._matches_group_conditions(
            group["conditions"], test_case_name, vm_size
        ):
            self._log.debug(
                f"[DEBUG] Group {idx} conditions matched! "
                f"Metrics count: {len(group['metrics'])}"
            )
            self._log.debug(
                f"[DEBUG] Available metrics in group: "
                f"{list(group['metrics'].keys())}"
            )

            if metric_name in group["metrics"]:
                self._log.debug(
                    f"[DEBUG] Metric {metric_name} found in group {idx}!"
                )
                metric_data = group["metrics"][metric_name]
                # criteria is already a MetricCriteria object, return it
                criteria = metric_data["criteria"]
                assert isinstance(criteria, MetricCriteria)
                return criteria
            else:
                self._log.debug(
                    f"[DEBUG] Metric {metric_name} NOT found in group {idx}"
                )
        else:
            self._log.debug(f"[DEBUG] Group {idx} conditions did NOT match")

    self._log.debug(
        f"[DEBUG] No matching group found for "
        f"{test_case_name}.{metric_name} with VM {vm_size}"
    )
    return None
Copilot AI Dec 10, 2025

[nitpick] Potential maintainability issue: Excessive use of debug logging. Lines 401-449 contain numerous debug log statements (15+ debug calls) that clutter the code and make it harder to read. While debugging information is valuable, this level of verbosity should be reduced for production code.

Consider:

  1. Removing or consolidating redundant debug statements
  2. Using a single comprehensive debug log at the end of the method summarizing the search result
  3. Making detailed debug logging conditional on a debug flag

Example refactor:

def _get_hierarchical_criteria(
    self, test_case_name: str, vm_size: str, metric_name: str
) -> Optional[MetricCriteria]:
    """Get criteria from hierarchical data using condition matching"""
    if not hasattr(self, "_hierarchical_data"):
        return None
    
    for idx, group in enumerate(self._hierarchical_data):
        if self._matches_group_conditions(group["conditions"], test_case_name, vm_size):
            if metric_name in group["metrics"]:
                self._log.debug(
                    f"Found criteria for {test_case_name}.{metric_name} "
                    f"(VM: {vm_size}) in group '{group.get('name', 'Unnamed')}'"
                )
                return group["metrics"][metric_name]["criteria"]
    
    self._log.debug(
        f"No criteria found for {test_case_name}.{metric_name} (VM: {vm_size})"
    )
    return None

Comment on lines +195 to +204
# Single file
criteria_path = Path(criteria_pattern)
if not criteria_path.is_absolute():
    criteria_path = package_dir / criteria_pattern

if criteria_path.exists():
    return [criteria_path]
else:
    self._log.debug(f"Criteria file not found: {criteria_path}")
    return []
Copilot AI Dec 10, 2025

Missing input validation for file path. The method uses Path(criteria_pattern) directly without checking for path traversal sequences (e.g., ../../../etc/passwd). Resolving non-absolute paths relative to the package directory (line 198) does not prevent traversal on its own, since .. components can still escape that directory, so it's better to explicitly validate or sanitize the input.

Consider adding validation:

# Single file
criteria_path = Path(criteria_pattern)
# Validate no path traversal attempts
if ".." in criteria_path.parts:
    self._log.warning(f"Invalid criteria path (path traversal detected): {criteria_pattern}")
    return []
    
if not criteria_path.is_absolute():
    criteria_path = package_dir / criteria_pattern

Comment on lines +979 to +999
criteria_file
'''''''''''''
type: str, optional, default: "*_criteria.yml"
Path or glob pattern to YAML files containing performance criteria.

criteria
''''''''
type: dict, optional, default: None

Direct criteria definition in runbook. Takes priority over criteria_file.

output_file
'''''''''''
type: str, optional, default: None

Output path for detailed evaluation results in JSON format.

fail_test_on_performance_failure
''''''''''''''''''''''''''''''''
type: bool, optional, default: False
Copilot AI Dec 10, 2025

Inconsistent documentation formatting. The "Parameters:" section header uses standard formatting, but sub-parameters use different header styles. The parameter criteria_file uses 13 apostrophes (line 980), criteria uses 8 (line 986), and output_file uses 11 (line 992).

For consistency with reStructuredText formatting, all parameter subsections should use the same underline length and style. Use a consistent pattern like:

criteria_file
"""""""""""""

This ensures proper rendering in Sphinx documentation.

class PerfEvaluation(notifier.Notifier):
    """
    Performance evaluation notifier that validates UnifiedPerfMessage results
    against predefined criteria loaded from a JSON configuration file.
Copilot AI Dec 10, 2025

Outdated documentation comment. The docstring states "validates UnifiedPerfMessage results against predefined criteria loaded from a JSON configuration file" (line 133), but the implementation actually supports both JSON and YAML files (as evidenced by lines 217-220 and the default criteria_file: Optional[str] = "*_criteria.yml" on line 25).

Update the docstring to reflect the actual functionality:

"""
Performance evaluation notifier that validates UnifiedPerfMessage results
against predefined criteria loaded from JSON or YAML configuration files.
"""
Suggested change
- against predefined criteria loaded from a JSON configuration file.
+ against predefined criteria loaded from JSON or YAML configuration files.

"tolerance_percent": 10.0
},
"throughput_in_gbps_conn_4": {
"unit": "Gbps",
Copilot AI Dec 10, 2025

[nitpick] The spacing here is inconsistent - there's a trailing space after "Gbps" on line 334, but not on lines 10, 16, etc. For consistency, remove the trailing space.

Suggested change
- "unit": "Gbps",  (original line ends with a trailing space)
+ "unit": "Gbps",

Comment on lines +143 to +180
  min_value: 0
  target_value: 100000
  error_threshold: 0.50

- name: "receiver_cycles_per_byte_conn_1"
  max_value: 100.0
  target_value: 20.0
  error_threshold: 0.50

- name: "retrans_segments_conn_1"
  max_value: 1000
  target_value: 0
  error_threshold: 2.0

- name: "rx_packets_conn_1"
  min_value: 100000
  target_value: 1000000
  error_threshold: 0.50

- name: "sender_cycles_per_byte_conn_1"
  max_value: 100.0
  target_value: 20.0
  error_threshold: 0.50

- name: "server_mtu_conn_1"
  min_value: 1400
  target_value: 1500
  error_threshold: 0.05

- name: "throughput_in_gbps_conn_1"
  min_value: 0.5
  target_value: 2.0
  error_threshold: 0.30

- name: "tx_packets_conn_1"
  min_value: 100000
  target_value: 1000000
  error_threshold: 0.50
Copilot AI Dec 10, 2025

[nitpick] Inconsistent use of integer vs float for min_value and target_value. Lines 143, 158, 168, 178 use integers while other metrics use floats. For consistency with the rest of the file, use float literals (e.g., 0.0, 100000.0, 1400.0, 1500.0).

    return False

# Check variant match (s, ads, etc.) - more flexible
if pat_variant not in ["*", ""] and pat_variant != "*":
Copilot AI Dec 10, 2025

Redundant condition: the expression on line 726, pat_variant not in ["*", ""] and pat_variant != "*", double-checks the same thing, since pat_variant != "*" is already implied by pat_variant not in ["*", ""].

Simplify to: if pat_variant not in ["*", ""]:

Suggested change
- if pat_variant not in ["*", ""] and pat_variant != "*":
+ if pat_variant not in ["*", ""]:

def get_statistics_config(
    self, test_case_name: str, vm_size: str
) -> Dict[str, Any]:
    """Get statistics configuration for a test case and VM size"""
Copilot AI Dec 10, 2025

Insufficient docstring for this public method. The method implements a plugin hook, so its one-line docstring should be expanded to explain the purpose, parameters, and return value. For example:

def get_statistics_config(
    self, test_case_name: str, vm_size: str
) -> Dict[str, Any]:
    """
    Get statistics configuration for a test case and VM size.
    
    Searches hierarchical data for matching group conditions and returns
    the statistics configuration (times, type, threshold).
    
    Args:
        test_case_name: Name of the test case
        vm_size: VM size identifier
        
    Returns:
        Dictionary containing statistics_times, statistics_type, and error_threshold
    """
Suggested change
- """Get statistics configuration for a test case and VM size"""
+ """
+ Get statistics configuration for a test case and VM size.
+
+ Searches hierarchical group data for matching conditions based on the
+ provided test case name and VM size. If a matching group is found,
+ returns its statistics configuration (statistics_times, statistics_type,
+ error_threshold). If no group matches, falls back to global runbook
+ overrides or default values.
+
+ Args:
+     test_case_name: Name of the test case to match against group conditions.
+     vm_size: VM size identifier to match against group conditions.
+
+ Returns:
+     Dictionary containing:
+         - statistics_times (int): Number of times to run statistics.
+         - statistics_type (str): Type of statistics to use (e.g., "average").
+         - error_threshold (float): Acceptable error threshold.
+ """

Comment on lines +675 to +692
for flex_pattern in flexible_patterns:
    try:
        if re.match(f"^{flex_pattern}$", vm_size, re.IGNORECASE):
            self._log.debug(
                f"Fuzzy match: '{vm_size}' matched '{pattern}' "
                f"using pattern '{flex_pattern}'"
            )
            return True
        if re.match(f"^{flex_pattern}$", vm_size_lower, re.IGNORECASE):
            self._log.debug(
                f"Fuzzy match: '{vm_size}' matched '{pattern}' "
                f"using lowercase pattern '{flex_pattern}'"
            )
            return True
    except re.error:
        # If regex fails, continue to next pattern
        continue

Copilot AI Dec 10, 2025

Potential performance issue: regex patterns are resolved inside a loop. In the _fuzzy_match_vm_size method, re.match is called inside a loop for multiple patterns (lines 677, 683). Python's re module caches compiled patterns, but pre-compiling them once avoids the repeated lookups and makes the invalid-pattern handling explicit.

For better performance, consider pre-compiling the regex patterns before the loop, especially if this method is called frequently during performance evaluation:

def _fuzzy_match_vm_size(self, vm_size: str, pattern: str) -> bool:
    """Enhanced fuzzy matching for VM size patterns"""
    regex_pattern = pattern.replace("*", "[^_]*")
    
    flexible_patterns = [
        regex_pattern,
        regex_pattern.lower(),
        regex_pattern.replace("s_", "_").replace("_s", "_"),
    ]
    
    vm_size_lower = vm_size.lower()
    
    # Pre-compile patterns for better performance
    compiled_patterns = []
    for flex_pattern in flexible_patterns:
        try:
            compiled_patterns.append(re.compile(f"^{flex_pattern}$", re.IGNORECASE))
        except re.error:
            continue
    
    # Use compiled patterns
    for compiled_pattern in compiled_patterns:
        if compiled_pattern.match(vm_size) or compiled_pattern.match(vm_size_lower):
            self._log.debug(f"Fuzzy match: '{vm_size}' matched '{pattern}'")
            return True
    
    # ... rest of method
Suggested change
- for flex_pattern in flexible_patterns:
-     try:
-         if re.match(f"^{flex_pattern}$", vm_size, re.IGNORECASE):
-             self._log.debug(
-                 f"Fuzzy match: '{vm_size}' matched '{pattern}' "
-                 f"using pattern '{flex_pattern}'"
-             )
-             return True
-         if re.match(f"^{flex_pattern}$", vm_size_lower, re.IGNORECASE):
-             self._log.debug(
-                 f"Fuzzy match: '{vm_size}' matched '{pattern}' "
-                 f"using lowercase pattern '{flex_pattern}'"
-             )
-             return True
-     except re.error:
-         # If regex fails, continue to next pattern
-         continue
+ # Pre-compile patterns for better performance
+ compiled_patterns = []
+ for flex_pattern in flexible_patterns:
+     try:
+         compiled_patterns.append(re.compile(f"^{flex_pattern}$", re.IGNORECASE))
+     except re.error:
+         # If regex fails, continue to next pattern
+         continue
+
+ for compiled_pattern in compiled_patterns:
+     if compiled_pattern.match(vm_size):
+         self._log.debug(
+             f"Fuzzy match: '{vm_size}' matched '{pattern}' "
+             f"using pattern '{compiled_pattern.pattern}'"
+         )
+         return True
+     if compiled_pattern.match(vm_size_lower):
+         self._log.debug(
+             f"Fuzzy match: '{vm_size}' matched '{pattern}' "
+             f"using lowercase pattern '{compiled_pattern.pattern}'"
+         )
+         return True
