Bug Description and Reproduction
Describe the Bug
After long-running inference with OnnxRuntimeGenAI.QNN, the model may start emitting nonsensical output or repeated punctuation.
EnableCaching is disabled during initialization:
var options = new OnnxRuntimeGenAIChatClientOptions
{
    StopSequences = Array.Empty<string>(),
    PromptFormatter = TestPromptFormatter,
    EnableCaching = false
};
To Reproduce
Steps to reproduce the issue:
- Initialize the model
- Repeat the inference cycle (start inference → wait for it to finish → start the next inference) until the output degrades
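The loop above can be sketched roughly as follows. This is a minimal repro sketch, not the exact code from the report: `modelPath`, the prompt text, the iteration count, and the `OnnxRuntimeGenAIChatClient` constructor shape are assumptions; `TestPromptFormatter` is the formatter referenced in the options snippet above.

```csharp
// Hypothetical repro loop for the degradation after many sequential inferences.
// Assumes Microsoft.Extensions.AI and Microsoft.ML.OnnxRuntimeGenAI.QNN are referenced;
// modelPath, the prompt, and the constructor overload are placeholders/assumptions.
using Microsoft.Extensions.AI;
using Microsoft.ML.OnnxRuntimeGenAI;

var options = new OnnxRuntimeGenAIChatClientOptions
{
    StopSequences = Array.Empty<string>(),
    PromptFormatter = TestPromptFormatter,
    EnableCaching = false   // caching already disabled, yet output still degrades
};

using var model = new Model(modelPath);
using IChatClient client = new OnnxRuntimeGenAIChatClient(model, options);

for (int i = 0; i < 1000; i++)  // iteration count is arbitrary
{
    // Start inference, wait for it to complete, then start the next one.
    var response = await client.GetResponseAsync("Describe the weather today.");
    Console.WriteLine($"[{i}] {response}");
    // After enough iterations, responses become nonsensical or repeat punctuation.
}
```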
Expected behavior
Inference continues to produce coherent output, regardless of how many times it has been run.
Desktop (please complete the following information)
- OS: Windows 11 Home 25H2 26200.6588
- OnnxRuntimeGenAI.QNN: 0.10.0
- NPU Driver: 30.0.140.1000/30.0.145.1000