Skip to content

Conversation

@kamalca
Copy link
Collaborator

@kamalca kamalca commented Nov 18, 2025

Requirements that are CountSpaces / IntRanges can now specify to select the max value instead of the min value.

For example, test cases like verify_synthetic_provision_with_max_nics can specify that they want the max nics for a given capability instead of hard coding 8 as it currently does.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces the ability for requirements to select maximum values instead of minimum values. It renames generate_min_capability to the more generic choose_value throughout the codebase and adds a new choose_max_value field to the IntRange class. This allows test cases like verify_synthetic_provision_with_max_nics to dynamically request the maximum available capabilities instead of hardcoding values.

Key Changes:

  • Renamed generate_min_capability method to choose_value across all requirement classes and helper functions
  • Added choose_max_value boolean field to IntRange class to enable maximum value selection
  • Updated _choose_value implementation to support both minimum and maximum value selection based on the flag

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
lisa/search_space.py Core implementation: added choose_max_value field to IntRange, renamed methods, updated value selection logic
lisa/schema.py Updated method calls from generate_min_capability to choose_value in NodeSpace and related classes
lisa/environment.py Updated environment space and node creation to use choose_value method
lisa/features/nvme.py Updated NvmeSettings feature to use choose_value_countspace
lisa/features/gpu.py Renamed _generate_min_capability to _choose_value in GPU feature
lisa/sut_orchestrator/azure/platform_.py Updated Azure platform to call choose_value method
lisa/sut_orchestrator/azure/features.py Updated disk feature requirement method enum reference
lisa/sut_orchestrator/aws/platform_.py Updated AWS platform to call choose_value method
lisa/sut_orchestrator/aws/features.py Updated disk feature requirement method enum reference
lisa/sut_orchestrator/hyperv/platform_.py Updated Hyper-V platform to call choose_value method
lisa/sut_orchestrator/baremetal/platform_.py Updated baremetal platform to call choose_value method
lisa/sut_orchestrator/libvirt/platform.py Updated libvirt platform to use choose_value_countspace
lisa/runners/lisa_runner.py Updated runner to use choose_value_countspace for node count
selftests/test_search_space.py Updated test imports and method calls to use new naming
selftests/test_env_requirement.py Updated test method calls to use choose_value
selftests/test_platform.py Updated mock platform test to use choose_value
selftests/azure/test_disk_feature.py Updated disk feature tests to call choose_value

Test Suggestions:

Key Test Cases:
test_int_range_validation|test_choose_value_not_supported|test_supported_simple_requirement

Impacted LISA Features:
This change affects the core search space and requirement system, which is used by all features. No specific feature classes are directly modified in their capability checking logic.

Tested Azure Marketplace Images:
Since this is a core infrastructure change affecting how requirements are processed rather than OS-specific functionality, comprehensive testing across diverse OS families and architectures is recommended:

  • canonical 0001-com-ubuntu-server-jammy 22_04-lts-gen2 latest
  • debian debian-12 12-gen2 latest
  • redhat rhel 95_gen2 latest
  • suse sles-15-sp6 gen2 latest
  • microsoftcblmariner azure-linux-3 azure-linux-3-gen2 latest
Comments suppressed due to low confidence (6)

lisa/sut_orchestrator/azure/platform_.py:2285

  • Misleading variable name: The variable min_cap is misleading since this method now calls choose_value() which can return either the minimum or maximum value depending on the choose_max_value flag.

Consider renaming to a more neutral name like chosen_cap or selected_cap:

chosen_cap: schema.NodeSpace = requirement.choose_value(
    azure_capability.capability
)
# ... rest of the code using chosen_cap instead of min_cap
return chosen_cap
        min_cap: schema.NodeSpace = requirement.choose_value(
            azure_capability.capability
        )
        # Apply azure specified values. They will pass into arm template
        azure_node_runbook = min_cap.get_extended_runbook(AzureNodeSchema, AZURE)
        if azure_node_runbook.location:
            assert location in azure_node_runbook.location, (
                f"predefined location [{azure_node_runbook.location}] "
                f"must be same as "
                f"cap location [{location}]"
            )
        # the location may not be set
        azure_node_runbook.location = location
        azure_node_runbook.vm_size = azure_capability.vm_size

        return min_cap

selftests/test_search_space.py:233

  • Misleading variable name: The variable name actual_min is misleading since the choose_value_countspace() function can now return either the minimum or maximum value depending on the choose_max_value flag.

Consider renaming to a more neutral name like actual_value or chosen_value:

actual_value = choose_value_countspace(
    requirement, capability  # type:ignore
)
if expected_min[r_index][c_index] != actual_value:
    # ...
                        actual_min = choose_value_countspace(
                            requirement, capability  # type:ignore
                        )
                        if expected_min[r_index][c_index] != actual_min:
                            self._log.info(extra_msg)
                        self.assertEqual(
                            expected_min[r_index][c_index], actual_min, extra_msg

selftests/test_search_space.py:247

  • Misleading variable name: The variable name actual_min is misleading since the choose_value() function can now return either the minimum or maximum value depending on the choose_max_value flag.

Consider renaming to a more neutral name like actual_value or chosen_value:

actual_value = choose_value(
    requirement, capability  # type:ignore
)
self.assertEqual(
    expected_min[r_index][c_index], actual_value, extra_msg
)
                        actual_min = choose_value(
                            requirement, capability  # type:ignore
                        )
                        self.assertEqual(
                            expected_min[r_index][c_index], actual_min, extra_msg

lisa/features/nvme.py:264

  • Misleading variable name: The variable name min_value is misleading since the choose_value_countspace() function can now return either the minimum or maximum value depending on the choose_max_value flag.

Consider renaming to a more neutral name like chosen_value or result_value:

result_value = NvmeSettings()

if self.disk_count or capability.disk_count:
    result_value.disk_count = search_space.choose_value_countspace(
        self.disk_count, capability.disk_count
    )

return result_value
        min_value = NvmeSettings()

        if self.disk_count or capability.disk_count:
            min_value.disk_count = search_space.choose_value_countspace(
                self.disk_count, capability.disk_count
            )

        return min_value

lisa/environment.py:341

  • Misleading variable name: The variable name min_requirement is misleading since the choose_value() method can now return either the minimum or maximum value depending on the choose_max_value flag.

Consider renaming to a more neutral name like chosen_requirement or selected_requirement:

chosen_requirement = cast(
    schema.Capability,
    node_requirement.choose_value(node_requirement),
)
assert isinstance(chosen_requirement.node_count, int), (
    f"must be int after choose_value, " f"actual: {chosen_requirement.node_count}"
)
# node count should be expanded in platform already
assert chosen_requirement.node_count == 1, f"actual: {chosen_requirement.node_count}"
mock_runbook = schema.RemoteNode(
    type=constants.ENVIRONMENTS_NODES_REMOTE,
    capability=chosen_requirement,
    is_default=node_requirement.is_default,
)
        min_requirement = cast(
            schema.Capability,
            node_requirement.choose_value(node_requirement),
        )
        assert isinstance(min_requirement.node_count, int), (
            f"must be int after choose_value, " f"actual: {min_requirement.node_count}"
        )
        # node count should be expanded in platform already
        assert min_requirement.node_count == 1, f"actual: {min_requirement.node_count}"
        mock_runbook = schema.RemoteNode(
            type=constants.ENVIRONMENTS_NODES_REMOTE,
            capability=min_requirement,
            is_default=node_requirement.is_default,
        )

selftests/test_search_space.py:213

  • Misleading variable name: The variable name actual_min is misleading since the choose_value() method can now return either the minimum or maximum value depending on the choose_max_value flag.

Consider renaming to a more neutral name like actual_value or chosen_value throughout this method:

actual_value = requirement.choose_value(capability)
if expected_min[r_index][c_index] != actual_value:
    # ...
    self._log.info(f"actual_value: {actual_value}")
    # ...
    expected_min[r_index][c_index], actual_value, extra_msg
                        actual_min = requirement.choose_value(capability)
                        if expected_min[r_index][c_index] != actual_min:
                            self._log.info(extra_msg)
                            self._log.info(
                                f"expected_min: {expected_min[r_index][c_index]}"
                            )
                            self._log.info(f"actual_min: {actual_min}")
                        self.assertEqual(
                            expected_min[r_index][c_index], actual_min, extra_msg

@kamalca kamalca force-pushed the kameroncarr/count-space branch 3 times, most recently from 1299927 to c6965f2 Compare November 21, 2025 17:51
@kamalca kamalca requested a review from Copilot November 21, 2025 17:53
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.

@kamalca kamalca force-pushed the kameroncarr/count-space branch 2 times, most recently from fe55fbf to 9d85510 Compare November 25, 2025 19:52
@kamalca kamalca requested a review from umfranci November 26, 2025 20:52
@kamalca kamalca marked this pull request as ready for review November 26, 2025 20:53
@kamalca kamalca requested a review from LiliDeng as a code owner November 26, 2025 20:53
@kamalca kamalca marked this pull request as draft November 26, 2025 22:20
Give an option to have IntRange select the max value when you call generate_min_capability.
Rename RequirementsMixin.generate_min_capability() to choose_value() since it might not select the minimum value.
@kamalca kamalca force-pushed the kameroncarr/count-space branch from 9d85510 to d3f3ce3 Compare November 26, 2025 22:43
@kamalca kamalca marked this pull request as ready for review December 1, 2025 17:24
)
def verify_max_gpu_provision(self, node: Node, log: Logger) -> None:
_gpu_provision_check(8, node, log)
assert isinstance(node.capability.gpu_count, int), (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kamalca
We want to ensure that the maximum GPU count offered by Azure is correctly validated.
How can we achieve this in our situation?

Copy link
Collaborator Author

@kamalca kamalca Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, it looks like the number of GPUs is a SKU quality, not a deployment parameter. So maybe GPU test cases are not a good candidate for this type of requirement.

I will have to ask Umang and Johnson what the motivation was for targeting the GPU test cases.
I will get back to you on this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LiliDeng I removed the changes to max_gpu test case

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same case for the cases with name containing max

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of the other test requirements is different since they are deployment parameters. The goal is to test the max resource capabilities of a given SKU in many cases.

The old test requirements have a similar problem; for example, we hard coded 8 NICs but some SKUs now support 15 or more. This PR addresses this by testing the max resource for the SKU selected.

Can you clarify further your concerns?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that we need to ensure the maximum count is properly validated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so are you saying that you are not okay with lowering the minimum on the test case requirements?

Do I need to duplicate all of these test cases – one for the maximum on the given SKU, one for maximum allowed in Azure?

We need to be able to test the max_disk, max_nic capabilities of VM SKUs even if they are not capable of the max in Azure. In other words, we need to be able to test the max capabilities of smaller SKUs.

Can you help me out by describing exactly what part of my approach you have a problem with? What specific testing scenario are you worried about: chaos testing, image testing?
Do you have any ideas for how we can enable the scenario I am trying to achieve in a way that you are comfortable with?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking whether we can use a switch to support this scenario.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to avoid a switch. Every single validation flow would need to be addressed to understand whether they are testing the overall Azure Platform or if they are testing a specific SKU.
I worry that most people, especially people onboarding to LISA, do not know how test requirements work well enough to set a flag like this. There is no easy default behavior since both scenarios are common.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, these test cases are not required to be running on Azure Platform – so for other platforms, isn't the requirement a bit arbitrary? For example, the minimum might be set too high for AWS or another platform.

I think it makes sense to either

  1. have duplicate testcases specific to Azure if we are going to tie the requirements to Azure specific capabilities
  2. set the min nic / disk requirement in the runbook for testing where we want 8 NICs or 12 NVMe disks, etc.

Change test case requirements for max_nic and max_disk test cases.
@kamalca kamalca force-pushed the kameroncarr/count-space branch from d3f3ce3 to 240070d Compare December 2, 2025 06:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants