Skip to content

Conversation

@aciddelgado
Copy link
Contributor

This PR should allow us to replace a typical attention operator like GQA with Paged Attention, to be used with GenAI Engine API.

@@ -0,0 +1,107 @@
from onnxruntime import InferenceSession, OrtValue, SessionOptions, get_available_providers

Check notice

Code scanning / CodeQL

Unused import Note test

Import of 'get_available_providers' is not used.

Copilot Autofix

AI 3 months ago

To fix the unused import error for get_available_providers, simply remove it from the import statement on line 1 of test/python/test_paged_model.py. Alter the import so that only the used symbols (InferenceSession, OrtValue, SessionOptions) are listed. No changes to functionality are required, and no additional dependencies or definitions are needed.

Suggested changeset 1
test/python/test_paged_model.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/test/python/test_paged_model.py b/test/python/test_paged_model.py
--- a/test/python/test_paged_model.py
+++ b/test/python/test_paged_model.py
@@ -1,4 +1,4 @@
-from onnxruntime import InferenceSession, OrtValue, SessionOptions, get_available_providers
+from onnxruntime import InferenceSession, OrtValue, SessionOptions
 import numpy as np
 import torch
 
EOF
@@ -1,4 +1,4 @@
from onnxruntime import InferenceSession, OrtValue, SessionOptions, get_available_providers
from onnxruntime import InferenceSession, OrtValue, SessionOptions
import numpy as np
import torch

Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
total_sequence_length = 276
sequence_length = 1
num_tokens = 1
max_num_blocks = 2

Check notice

Code scanning / CodeQL

Unused local variable Note test

Variable max_num_blocks is not used.

Copilot Autofix

AI 3 months ago

To fix the problem, the assignment of max_num_blocks on line 17 should be removed. As its right-hand side is a simple constant (2) with no side effects or function calls, it is safe to delete the whole line. No references exist to this variable elsewhere in the function, so the deletion does not affect functionality. The affected region is within test_paged_model() in the file test/python/test_paged_model.py. No imports, definitions, or method changes are needed—simply delete the line.

Suggested changeset 1
test/python/test_paged_model.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/test/python/test_paged_model.py b/test/python/test_paged_model.py
--- a/test/python/test_paged_model.py
+++ b/test/python/test_paged_model.py
@@ -14,7 +14,6 @@
   total_sequence_length = 276
   sequence_length = 1
   num_tokens = 1
-  max_num_blocks = 2
   num_blocks = 2
   block_size = 256
   num_heads = 32
EOF
@@ -14,7 +14,6 @@
total_sequence_length = 276
sequence_length = 1
num_tokens = 1
max_num_blocks = 2
num_blocks = 2
block_size = 256
num_heads = 32
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
io_binding.bind_ortvalue_output(f"present.{i}.value", values[i])

# Run inference
outputs = session.run_with_iobinding(io_binding)

Check warning

Code scanning / CodeQL

Variable defined multiple times Warning test

This assignment to 'outputs' is unnecessary as it is
redefined
before this value is used.

Copilot Autofix

AI 3 months ago

To fix the problem, remove the first assignment to outputs on line 81, but retain the session.run_with_iobinding(io_binding) call to ensure any needed side effects happen. Only the assignment to outputs is redundant, so the method call should still be executed. This means replacing line 81 with just a call to session.run_with_iobinding(io_binding), removing "outputs =" from that line. No further changes need to be made, as the subsequent usage of outputs (after line 82) remains unchanged. No imports or additional definitions are needed.


Suggested changeset 1
test/python/test_paged_model.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/test/python/test_paged_model.py b/test/python/test_paged_model.py
--- a/test/python/test_paged_model.py
+++ b/test/python/test_paged_model.py
@@ -78,7 +78,7 @@
     io_binding.bind_ortvalue_output(f"present.{i}.value", values[i])
 
   # Run inference
-  outputs = session.run_with_iobinding(io_binding)
+  session.run_with_iobinding(io_binding)
   outputs = io_binding.copy_outputs_to_cpu()
 
   # Check output shape
EOF
@@ -78,7 +78,7 @@
io_binding.bind_ortvalue_output(f"present.{i}.value", values[i])

# Run inference
outputs = session.run_with_iobinding(io_binding)
session.run_with_iobinding(io_binding)
outputs = io_binding.copy_outputs_to_cpu()

# Check output shape
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
io_binding.bind_input(f"past_key_values.{i}.value", "cuda", 0, np.float16, values_gqa[i].shape(), values_gqa[i].data_ptr())
io_binding.bind_ortvalue_output(f"present.{i}.key", keys_gqa[i])
io_binding.bind_ortvalue_output(f"present.{i}.value", values_gqa[i])
outputs = session.run_with_iobinding(io_binding)

Check warning

Code scanning / CodeQL

Variable defined multiple times Warning test

This assignment to 'outputs' is unnecessary as it is
redefined
before this value is used.

Copilot Autofix

AI 3 months ago

To fix the problem, simply remove the unnecessary assignment to outputs on line 109. Call session.run_with_iobinding(io_binding) as a standalone statement (unless its return value is genuinely intended to be used somewhere). This ensures that any side effects of the method call (if needed) still occur, but avoids having a redundant variable assignment. Only code within test/python/test_paged_model.py needs to be edited; specifically, line 109.


Suggested changeset 1
test/python/test_paged_model.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/test/python/test_paged_model.py b/test/python/test_paged_model.py
--- a/test/python/test_paged_model.py
+++ b/test/python/test_paged_model.py
@@ -106,7 +106,7 @@
     io_binding.bind_input(f"past_key_values.{i}.value", "cuda", 0, np.float16, values_gqa[i].shape(), values_gqa[i].data_ptr())
     io_binding.bind_ortvalue_output(f"present.{i}.key", keys_gqa[i])
     io_binding.bind_ortvalue_output(f"present.{i}.value", values_gqa[i])
-  outputs = session.run_with_iobinding(io_binding)
+  session.run_with_iobinding(io_binding)
   outputs = io_binding.copy_outputs_to_cpu()
 
   logits_gqa = outputs[0]
EOF
@@ -106,7 +106,7 @@
io_binding.bind_input(f"past_key_values.{i}.value", "cuda", 0, np.float16, values_gqa[i].shape(), values_gqa[i].data_ptr())
io_binding.bind_ortvalue_output(f"present.{i}.key", keys_gqa[i])
io_binding.bind_ortvalue_output(f"present.{i}.value", values_gqa[i])
outputs = session.run_with_iobinding(io_binding)
session.run_with_iobinding(io_binding)
outputs = io_binding.copy_outputs_to_cpu()

logits_gqa = outputs[0]
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
assert np.allclose(final_norm_output_page, final_norm_output_gqa[0], atol=1e-3), "Final norm output from paged model and gqa model do not match."

# Compare first present key between paged model and gqa model
present_key_gqa = outputs[1]

Check notice

Code scanning / CodeQL

Unused local variable Note test

Variable present_key_gqa is not used.

Copilot Autofix

AI 3 months ago

To fix the issue, simply remove the assignment to the variable present_key_gqa on line 117, as it is not used anywhere in the code. Take care not to remove any code on the right hand side of the assignment that has side effects, but in this case, outputs[1] is just an indexing operation and does not produce side effects. Removing the line will not affect the existing functionality or output of the test. No additional imports, methods, or definitions are required.

Suggested changeset 1
test/python/test_paged_model.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/test/python/test_paged_model.py b/test/python/test_paged_model.py
--- a/test/python/test_paged_model.py
+++ b/test/python/test_paged_model.py
@@ -114,7 +114,6 @@
   print("GQA model passed successfully.")
 
   # Compare first present key between paged model and gqa model
-  present_key_gqa = outputs[1]
   # print(f"Present key paged shape: {present_key_paged.shape}, Present key gqa shape: {present_key_gqa.shape}")
   # print(f"Difference between paged and gqa present key: {present_key_paged[1, total_sequence_length-block_size-1, 4, :50] - present_key_gqa[0, 4, total_sequence_length-1, :50]}")
 
EOF
@@ -114,7 +114,6 @@
print("GQA model passed successfully.")

# Compare first present key between paged model and gqa model
present_key_gqa = outputs[1]
# print(f"Present key paged shape: {present_key_paged.shape}, Present key gqa shape: {present_key_gqa.shape}")
# print(f"Difference between paged and gqa present key: {present_key_paged[1, total_sequence_length-block_size-1, 4, :50] - present_key_gqa[0, 4, total_sequence_length-1, :50]}")

Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants