Conversation

LiliDeng commented Nov 3, 2025

No description provided.

LiliDeng requested a review from squirrelsc as a code owner on November 3, 2025 16:09
LiliDeng force-pushed the lildeng/perf_evaluation branch 4 times, most recently from 2b92dec to 3b2a7d7 on November 19, 2025 07:44
LiliDeng force-pushed the lildeng/perf_evaluation branch 5 times, most recently from 7b7748b to c7497b8 on December 5, 2025 00:19
LiliDeng requested a review from Copilot on December 10, 2025 07:02
LiliDeng force-pushed the lildeng/perf_evaluation branch from c7497b8 to 07864a5 on December 10, 2025 07:03
Copilot AI left a comment
Pull request overview

This pull request introduces a new performance evaluation notifier for the LISA testing framework. The notifier validates performance test results against predefined criteria and can optionally fail tests when performance targets are not met.

Key Changes:

  • Implements a comprehensive performance evaluation system with support for both JSON and YAML criteria files
  • Adds flexible VM size pattern matching with fuzzy matching capabilities for Azure VM naming conventions
  • Introduces a hierarchical criteria format with group-based condition matching for test cases and VM sizes (sketched below)
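
As far as it can be inferred from the loader code quoted later in this review (each group exposes name, conditions, and metrics keys), a criteria group looks roughly like the following sketch; the condition field names and values are assumptions for illustration:

- name: "NVMe perf on D-series v5"    # optional group label
  conditions:                          # matched against the test case and VM size
    test_case_name: "perf_nvme*"       # assumed field name
    vm_size: "Standard_D*s_v5"         # assumed field name
  metrics:
    - name: "throughput_in_gbps_conn_1"
      min_value: 0.5
      target_value: 2.0
      error_threshold: 0.30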

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 17 comments.

File descriptions:

  • lisa/notifiers/perfevaluation/perfevaluation.py — Main implementation of the PerfEvaluation notifier with metric evaluation logic, criteria loading, and test result modification capabilities
  • lisa/notifiers/perfevaluation/perf_tcp_ntttcp_sriov_criteria.json — Performance criteria definitions for TCP NTTTCP SRIOV tests across various VM sizes, with detailed metrics for throughput, latency, and network performance
  • lisa/notifiers/perfevaluation/perf_nvme_criteria.yml — YAML-based hierarchical criteria for NVMe storage performance tests with group-based conditions for specific VM sizes
  • lisa/notifiers/perfevaluation/__init__.py — Package initialization exposing the public classes (PerfEvaluation, PerfEvaluationSchema, MetricCriteria)
  • lisa/mixin_modules.py — Registration of the new perfevaluation notifier module for plugin system integration
  • docs/run_test/runbook.rst — Documentation for the perfevaluation notifier, including usage examples, parameters, and configuration formats (a sample runbook entry follows)
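
To make the configuration surface concrete, a runbook entry enabling the notifier might look like the sketch below. The parameter names come from the schema documented in runbook.rst (quoted later in this review); the surrounding notifier layout follows LISA's usual runbook convention, and the file names are placeholders:

notifier:
  - type: perfevaluation
    # Glob or path to criteria files; the default is "*_criteria.yml"
    criteria_file: "perf_nvme_criteria.yml"
    # Optional path for detailed JSON evaluation results
    output_file: "perf_evaluation_results.json"
    # Keep false to report violations without failing the run
    fail_test_on_performance_failure: false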

Test Suggestions (per LISA Testing Guidelines):

Key Test Cases:
Since this PR adds a new notifier component without modifying existing test execution logic, comprehensive unit tests should be added rather than integration tests (a minimal sketch follows this list). The notifier functionality should be validated through:

  • Unit tests for the PerfEvaluation class (suggested in selftests/test_perfevaluation.py)
  • Pattern matching validation tests
  • Criteria loading and parsing tests
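
As a starting point, here is a minimal sketch of such a selftest. It assumes the notifier can be instantiated without a full runbook and that _fuzzy_match_vm_size has the signature quoted in the review comments below; both are assumptions to adjust to the actual selftest harness:

from unittest import TestCase
from unittest.mock import MagicMock

from lisa.notifiers.perfevaluation import PerfEvaluation


class TestPerfEvaluationMatching(TestCase):
    def setUp(self) -> None:
        # Bypass __init__ so no runbook is needed; the matching helpers
        # only use self._log, which is mocked here (an assumption).
        self.notifier = PerfEvaluation.__new__(PerfEvaluation)
        self.notifier._log = MagicMock()

    def test_fuzzy_match_vm_size_wildcard(self) -> None:
        # A wildcard pattern should match a concrete Azure VM size name.
        self.assertTrue(
            self.notifier._fuzzy_match_vm_size(
                "Standard_D4s_v5", "Standard_D*s_v5"
            )
        )
        # Negative cases should be added once the fallback matching
        # rules (variant/family handling) are pinned down.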

Impacted LISA Features:
This change does not directly impact existing LISA features but adds a new capability for performance evaluation and reporting.

Tested Azure Marketplace Images:
No specific Azure marketplace image testing is required for this PR, as it introduces infrastructure code (a notifier) rather than platform-specific test cases. The notifier will be exercised indirectly when performance tests run on any image that generates UnifiedPerfMessage results.

Comment on lines +396 to +449
def _get_hierarchical_criteria(
    self, test_case_name: str, vm_size: str, metric_name: str
) -> Optional[MetricCriteria]:
    """Get criteria from hierarchical data using condition matching"""
    if not hasattr(self, "_hierarchical_data"):
        self._log.debug(
            f"[DEBUG] No hierarchical data found for {test_case_name}.{metric_name}"
        )
        return None

    self._log.debug(
        f"[DEBUG] Searching criteria for {test_case_name}.{metric_name}, "
        f"VM: {vm_size}, Total groups: {len(self._hierarchical_data)}"
    )

    for idx, group in enumerate(self._hierarchical_data):
        self._log.debug(
            f"[DEBUG] Checking group {idx}: {group.get('name', 'Unnamed')}"
        )
        self._log.debug(f"[DEBUG] Group conditions: {group['conditions']}")

        if self._matches_group_conditions(
            group["conditions"], test_case_name, vm_size
        ):
            self._log.debug(
                f"[DEBUG] Group {idx} conditions matched! "
                f"Metrics count: {len(group['metrics'])}"
            )
            self._log.debug(
                f"[DEBUG] Available metrics in group: "
                f"{list(group['metrics'].keys())}"
            )

            if metric_name in group["metrics"]:
                self._log.debug(
                    f"[DEBUG] Metric {metric_name} found in group {idx}!"
                )
                metric_data = group["metrics"][metric_name]
                # criteria is already a MetricCriteria object, return it
                criteria = metric_data["criteria"]
                assert isinstance(criteria, MetricCriteria)
                return criteria
            else:
                self._log.debug(
                    f"[DEBUG] Metric {metric_name} NOT found in group {idx}"
                )
        else:
            self._log.debug(f"[DEBUG] Group {idx} conditions did NOT match")

    self._log.debug(
        f"[DEBUG] No matching group found for "
        f"{test_case_name}.{metric_name} with VM {vm_size}"
    )
    return None
Copilot AI Dec 10, 2025

[nitpick] Potential maintainability issue: Excessive use of debug logging. Lines 401-449 contain numerous debug log statements (15+ debug calls) that clutter the code and make it harder to read. While debugging information is valuable, this level of verbosity should be reduced for production code.

Consider:

  1. Removing or consolidating redundant debug statements
  2. Using a single comprehensive debug log at the end of the method summarizing the search result
  3. Making detailed debug logging conditional on a debug flag

Example refactor:

def _get_hierarchical_criteria(
    self, test_case_name: str, vm_size: str, metric_name: str
) -> Optional[MetricCriteria]:
    """Get criteria from hierarchical data using condition matching"""
    if not hasattr(self, "_hierarchical_data"):
        return None
    
    for idx, group in enumerate(self._hierarchical_data):
        if self._matches_group_conditions(group["conditions"], test_case_name, vm_size):
            if metric_name in group["metrics"]:
                self._log.debug(
                    f"Found criteria for {test_case_name}.{metric_name} "
                    f"(VM: {vm_size}) in group '{group.get('name', 'Unnamed')}'"
                )
                return group["metrics"][metric_name]["criteria"]
    
    self._log.debug(
        f"No criteria found for {test_case_name}.{metric_name} (VM: {vm_size})"
    )
    return None

Comment on lines +195 to +204
# Single file
criteria_path = Path(criteria_pattern)
if not criteria_path.is_absolute():
    criteria_path = package_dir / criteria_pattern

if criteria_path.exists():
    return [criteria_path]
else:
    self._log.debug(f"Criteria file not found: {criteria_path}")
    return []
Copilot AI Dec 10, 2025

Missing input validation for file path. The method uses Path(criteria_pattern) directly without checking for path traversal sequences (e.g., ../../../etc/passwd). Resolving non-absolute paths relative to the package directory (line 198) does not prevent traversal on its own, since .. components can still escape that directory, so it's better to explicitly validate or sanitize the input.

Consider adding validation:

# Single file
criteria_path = Path(criteria_pattern)
# Validate no path traversal attempts
if ".." in criteria_path.parts:
    self._log.warning(f"Invalid criteria path (path traversal detected): {criteria_pattern}")
    return []
    
if not criteria_path.is_absolute():
    criteria_path = package_dir / criteria_pattern

Comment on lines +979 to +999
criteria_file
'''''''''''''
type: str, optional, default: "*_criteria.yml"
Path or glob pattern to YAML files containing performance criteria.

criteria
''''''''
type: dict, optional, default: None

Direct criteria definition in runbook. Takes priority over criteria_file.

output_file
'''''''''''
type: str, optional, default: None

Output path for detailed evaluation results in JSON format.

fail_test_on_performance_failure
''''''''''''''''''''''''''''''''
type: bool, optional, default: False
Copilot AI Dec 10, 2025

Inconsistent documentation formatting. The "Parameters:" section header uses standard formatting, but sub-parameters use different header styles. The parameter criteria_file uses 13 apostrophes (line 980), criteria uses 8 (line 986), and output_file uses 11 (line 992).

For consistency with reStructuredText formatting, all parameter subsections should use the same underline length and style. Use a consistent pattern like:

criteria_file
"""""""""""""

This ensures proper rendering in Sphinx documentation.

class PerfEvaluation(notifier.Notifier):
    """
    Performance evaluation notifier that validates UnifiedPerfMessage results
    against predefined criteria loaded from a JSON configuration file.
Copilot AI Dec 10, 2025

Outdated documentation comment. The docstring states "validates UnifiedPerfMessage results against predefined criteria loaded from a JSON configuration file" (line 133), but the implementation actually supports both JSON and YAML files (as evidenced by lines 217-220 and the default criteria_file: Optional[str] = "*_criteria.yml" on line 25).

Update the docstring to reflect the actual functionality:

"""
Performance evaluation notifier that validates UnifiedPerfMessage results
against predefined criteria loaded from JSON or YAML configuration files.
"""
Suggested change
- against predefined criteria loaded from a JSON configuration file.
+ against predefined criteria loaded from JSON or YAML configuration files.

"tolerance_percent": 10.0
},
"throughput_in_gbps_conn_4": {
"unit": "Gbps",
Copilot AI Dec 10, 2025

[nitpick] The spacing here is inconsistent - there's a trailing space after "Gbps" on line 334, but not on lines 10, 16, etc. For consistency, remove the trailing space.

Suggested change
- "unit": "Gbps",  (original line ends with a trailing space)
+ "unit": "Gbps",

Comment on lines +143 to +180
  min_value: 0
  target_value: 100000
  error_threshold: 0.50

- name: "receiver_cycles_per_byte_conn_1"
  max_value: 100.0
  target_value: 20.0
  error_threshold: 0.50

- name: "retrans_segments_conn_1"
  max_value: 1000
  target_value: 0
  error_threshold: 2.0

- name: "rx_packets_conn_1"
  min_value: 100000
  target_value: 1000000
  error_threshold: 0.50

- name: "sender_cycles_per_byte_conn_1"
  max_value: 100.0
  target_value: 20.0
  error_threshold: 0.50

- name: "server_mtu_conn_1"
  min_value: 1400
  target_value: 1500
  error_threshold: 0.05

- name: "throughput_in_gbps_conn_1"
  min_value: 0.5
  target_value: 2.0
  error_threshold: 0.30

- name: "tx_packets_conn_1"
  min_value: 100000
  target_value: 1000000
  error_threshold: 0.50
Copilot AI Dec 10, 2025

[nitpick] Inconsistent use of integer vs float for min_value and target_value. Lines 143, 158, 168, 178 use integers while other metrics use floats. For consistency with the rest of the file, use float literals (e.g., 0.0, 100000.0, 1400.0, 1500.0).

    return False

# Check variant match (s, ads, etc.) - more flexible
if pat_variant not in ["*", ""] and pat_variant != "*":
Copilot AI Dec 10, 2025

Redundant condition: the expression on line 726, pat_variant not in ["*", ""] and pat_variant != "*", double-checks the same thing, since pat_variant != "*" is already implied by pat_variant not in ["*", ""].

Simplify to: if pat_variant not in ["*", ""]:

Suggested change
- if pat_variant not in ["*", ""] and pat_variant != "*":
+ if pat_variant not in ["*", ""]:

def get_statistics_config(
    self, test_case_name: str, vm_size: str
) -> Dict[str, Any]:
    """Get statistics configuration for a test case and VM size"""
Copilot AI Dec 10, 2025

Insufficient docstring for this public method. The method implements a plugin hook, so its one-line docstring should be expanded to explain the purpose, parameters, and return value. For example:

def get_statistics_config(
    self, test_case_name: str, vm_size: str
) -> Dict[str, Any]:
    """
    Get statistics configuration for a test case and VM size.
    
    Searches hierarchical data for matching group conditions and returns
    the statistics configuration (times, type, threshold).
    
    Args:
        test_case_name: Name of the test case
        vm_size: VM size identifier
        
    Returns:
        Dictionary containing statistics_times, statistics_type, and error_threshold
    """
Suggested change
- """Get statistics configuration for a test case and VM size"""
+ """
+ Get statistics configuration for a test case and VM size.
+
+ Searches hierarchical group data for matching conditions based on the
+ provided test case name and VM size. If a matching group is found,
+ returns its statistics configuration (statistics_times, statistics_type,
+ error_threshold). If no group matches, falls back to global runbook
+ overrides or default values.
+
+ Args:
+     test_case_name: Name of the test case to match against group conditions.
+     vm_size: VM size identifier to match against group conditions.
+
+ Returns:
+     Dictionary containing:
+         - statistics_times (int): Number of times to run statistics.
+         - statistics_type (str): Type of statistics to use (e.g., "average").
+         - error_threshold (float): Acceptable error threshold.
+ """

Comment on lines +675 to +692
for flex_pattern in flexible_patterns:
    try:
        if re.match(f"^{flex_pattern}$", vm_size, re.IGNORECASE):
            self._log.debug(
                f"Fuzzy match: '{vm_size}' matched '{pattern}' "
                f"using pattern '{flex_pattern}'"
            )
            return True
        if re.match(f"^{flex_pattern}$", vm_size_lower, re.IGNORECASE):
            self._log.debug(
                f"Fuzzy match: '{vm_size}' matched '{pattern}' "
                f"using lowercase pattern '{flex_pattern}'"
            )
            return True
    except re.error:
        # If regex fails, continue to next pattern
        continue

Copilot AI Dec 10, 2025

Potential performance issue: regex patterns are resolved inside a loop. In the _fuzzy_match_vm_size method, re.match is called inside a loop for multiple patterns (lines 677, 683). Python's re module caches compiled patterns, but pre-compiling them once avoids the repeated lookups and makes the invalid-pattern handling explicit.

For better performance, consider pre-compiling the regex patterns before the loop, especially if this method is called frequently during performance evaluation:

def _fuzzy_match_vm_size(self, vm_size: str, pattern: str) -> bool:
    """Enhanced fuzzy matching for VM size patterns"""
    regex_pattern = pattern.replace("*", "[^_]*")
    
    flexible_patterns = [
        regex_pattern,
        regex_pattern.lower(),
        regex_pattern.replace("s_", "_").replace("_s", "_"),
    ]
    
    vm_size_lower = vm_size.lower()
    
    # Pre-compile patterns for better performance
    compiled_patterns = []
    for flex_pattern in flexible_patterns:
        try:
            compiled_patterns.append(re.compile(f"^{flex_pattern}$", re.IGNORECASE))
        except re.error:
            continue
    
    # Use compiled patterns
    for compiled_pattern in compiled_patterns:
        if compiled_pattern.match(vm_size) or compiled_pattern.match(vm_size_lower):
            self._log.debug(f"Fuzzy match: '{vm_size}' matched '{pattern}'")
            return True
    
    # ... rest of method
Suggested change
- for flex_pattern in flexible_patterns:
-     try:
-         if re.match(f"^{flex_pattern}$", vm_size, re.IGNORECASE):
-             self._log.debug(
-                 f"Fuzzy match: '{vm_size}' matched '{pattern}' "
-                 f"using pattern '{flex_pattern}'"
-             )
-             return True
-         if re.match(f"^{flex_pattern}$", vm_size_lower, re.IGNORECASE):
-             self._log.debug(
-                 f"Fuzzy match: '{vm_size}' matched '{pattern}' "
-                 f"using lowercase pattern '{flex_pattern}'"
-             )
-             return True
-     except re.error:
-         # If regex fails, continue to next pattern
-         continue
+ # Pre-compile patterns for better performance
+ compiled_patterns = []
+ for flex_pattern in flexible_patterns:
+     try:
+         compiled_patterns.append(re.compile(f"^{flex_pattern}$", re.IGNORECASE))
+     except re.error:
+         # If regex fails, continue to next pattern
+         continue
+
+ for compiled_pattern in compiled_patterns:
+     if compiled_pattern.match(vm_size):
+         self._log.debug(
+             f"Fuzzy match: '{vm_size}' matched '{pattern}' "
+             f"using pattern '{compiled_pattern.pattern}'"
+         )
+         return True
+     if compiled_pattern.match(vm_size_lower):
+         self._log.debug(
+             f"Fuzzy match: '{vm_size}' matched '{pattern}' "
+             f"using lowercase pattern '{compiled_pattern.pattern}'"
+         )
+         return True
