
Conversation

Contributor

@xiaofeihan1 xiaofeihan1 commented Nov 25, 2025

Description

Add a start-profiling API to ORT. With this, we can profile a specific time span. A companion GenAI PR builds on it to support start/end profiling in GenAI.

P.S. When the session option enable_profiling is true and start_profiling is also called, we always profile from the beginning of the session.

| enable_profiling | start_profiling | end_profiling | Result |
| --- | --- | --- | --- |
| 0 | 0 | 0 | no trace |
| 0 | 0 | 1 | no trace |
| 0 | 1 | 0 | trace from the start_profiling call to the end of the session |
| 0 | 1 | 1 | trace from the start_profiling call to the end_profiling call |
| 1 | 0 | 0 | trace the entire session |
| 1 | 0 | 1 | trace from the beginning of the session to the end_profiling call |
| 1 | 1 | 0 | trace the entire session |
| 1 | 1 | 1 | trace from the beginning of the session to the end_profiling call |
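The eight flag combinations above follow a simple decision rule. A minimal Python sketch (a hypothetical helper for illustration, not part of the ORT API) that reproduces the table:

```python
def profiled_span(enable_profiling, start_called, end_called):
    """Return (trace_start, trace_end) labels, or None when no trace is produced.

    Models the semantics described in the PR: the session option wins over
    start_profiling, and end_profiling only truncates an active trace.
    """
    if enable_profiling:
        # Session option is set: profiling always runs from session start.
        end = "end_profiling call" if end_called else "session end"
        return ("session start", end)
    if start_called:
        # Profiling starts when start_profiling is called at runtime.
        end = "end_profiling call" if end_called else "session end"
        return ("start_profiling call", end)
    return None  # neither the option nor the runtime API enabled profiling
```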

Motivation and Context

Previously, profiling could only be enabled from the beginning of a session; there was no way to start profiling in the middle of one.

With this PR, we can profile any time span.

@xiaofeihan1 xiaofeihan1 force-pushed the xiaofeihan/add_start_profiling branch from 64fd2b3 to 4ae0d10 Compare November 26, 2025 03:10
@xiaofeihan1 xiaofeihan1 marked this pull request as ready for review November 26, 2025 03:12
@qjia7 qjia7 requested a review from yuslepukhin November 26, 2025 03:15
@xiaofeihan1 xiaofeihan1 marked this pull request as draft November 26, 2025 08:36
@xiaofeihan1 xiaofeihan1 removed the request for review from yuslepukhin November 26, 2025 08:36
@xiaofeihan1
Contributor Author

Marking this as a draft because we might need further changes to support start/end profiling in GenAI.

@xiaofeihan1 xiaofeihan1 force-pushed the xiaofeihan/add_start_profiling branch from 4ae0d10 to 4b20c55 Compare November 27, 2025 07:59
@xiaofeihan1 xiaofeihan1 marked this pull request as ready for review November 27, 2025 08:11
@xiaofeihan1 xiaofeihan1 marked this pull request as draft November 27, 2025 09:28
@xiaofeihan1 xiaofeihan1 removed the request for review from yuslepukhin November 27, 2025 09:28
@xiaofeihan1 xiaofeihan1 marked this pull request as ready for review November 28, 2025 06:29
@xiaofeihan1 xiaofeihan1 closed this Dec 1, 2025
@xiaofeihan1 xiaofeihan1 reopened this Dec 1, 2025
Contributor

qjia7 commented Dec 2, 2025

@yuslepukhin @guschmue @fs-eire
We’re eager to have this feature because we’ve observed that when the context is long, profiling events exceed the maximum supported limit if profiling is enabled from the start of the session. As a result, no profiling data is generated, which makes debugging difficult. Without profiling data, it’s hard to investigate why generation becomes very slow when the total sequence length exceeds a certain threshold.
We’d like to enhance the current profiling interfaces in ORT and ORT-GenAI. Please take a look at this PR and guide us in the right direction. Thanks!

ORT_API_STATUS_IMPL(OrtApis::SessionStartProfiling, _In_ OrtSession* sess) {
  API_IMPL_BEGIN
  auto session = reinterpret_cast<::onnxruntime::InferenceSession*>(sess);
  session->StartProfiling("onnxruntime_profile_");
  return nullptr;
  API_IMPL_END
}
Contributor

@feich-ms feich-ms Dec 4, 2025
Is "onnxruntime_profile_" the default value? Might users need to specify the value themselves? Hard-coding it doesn't look necessary.

Member

This would be a good feature as the disk may be polluted by many similar files.

Member

@yuslepukhin yuslepukhin Dec 5, 2025
Are multi-threading issues considered? Typically, a single thread invokes Run(), and no calls other than another Run() are made.

Contributor Author

Thanks for the suggestion! I have changed it to the following shape. The default value is used only when the developer calls start_profiling() without a prefix.

  def start_profiling(self, file_prefix="onnxruntime_profile_"):

Contributor Author

@xiaofeihan1 xiaofeihan1 Dec 5, 2025

Hi @yuslepukhin, I didn’t consider the multi-threaded scenario. We previously exposed an end_profiling API, so I added start_profiling to keep the interface consistent.
I think you are right: calling session.run in one thread while calling session.start_profiling/end_profiling in another may lead to unexpected issues. For example, thread 1 calls session.start_profiling and has just set enabled_ to true but has not yet initialized the profiler, while thread 2 calls session.run, which tries to record data because session_profiler_.IsEnabled() is true.
Do you think the following pattern is a good way to address this? It might add some perf overhead when profiling is turned on.

Status InferenceSession::Run(...) {
  TimePoint tp = std::chrono::high_resolution_clock::now();
  
  // Double-checked locking to minimize perf overhead
  if (session_profiler_.IsEnabled()) { // first check
    std::lock_guard<std::mutex> lock(profiler_mutex_);
    if (session_profiler_.IsEnabled()) { // second check
      tp = session_profiler_.Start();
    }
  }
  ...
}

template <typename T>
void InferenceSession::StartProfiling(const std::basic_string<T>& file_prefix) {
  // Add
  std::lock_guard<std::mutex> lock(profiler_mutex_);

  if (session_profiler_.IsEnabled()) {
    LOGS(*session_logger_, WARNING) << "Profiler is already running.";
    return;
  }
  ...
}
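The double-checked locking idea sketched above can be modeled in a self-contained way. The following Python sketch (assumed names such as ProfilerState and record_run; this is an illustration of the pattern, not ORT code) shows the two key properties: the hot path takes the lock only when the enabled flag already reads true, and the flag is published only after initialization completes under the lock:

```python
import threading

class ProfilerState:
    """Illustrative model of the proposed pattern (not ORT code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._enabled = False
        self.events = []

    def start_profiling(self):
        with self._lock:
            if self._enabled:
                return False  # already running; mirrors the WARNING branch
            # ... initialize profiler state here, *before* flipping the flag ...
            self._enabled = True
            return True

    def end_profiling(self):
        with self._lock:
            self._enabled = False

    def record_run(self, name):
        # Double-checked locking: cheap unlocked check first, then re-check
        # under the lock so we never record against a profiler that has been
        # torn down (or not yet initialized) by a concurrent control call.
        if self._enabled:            # first check (no lock)
            with self._lock:
                if self._enabled:    # second check (locked)
                    self.events.append(name)
```

When profiling is off, record_run never touches the lock, which is the zero-overhead property claimed for the hot Run() path.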

Member

@yuslepukhin yuslepukhin Dec 5, 2025
> Hi @yuslepukhin I didn’t consider the multi-threaded scenario. We previously exposed [end_profiling API]

We certainly do not want to do it in Run(). Besides, isn't that what already takes place when profiling is enabled?

Contributor Author

Our intention is to allow developers to start profiling at any point during the session. Could we document in the API doc that this API is not thread-safe?

(Note: Enabling profiling via session options starts profiling at the beginning of the session. This may fail for large contexts if the number of profiling events exceeds the maximum supported limit.)

Member

> Our intention is to allow developers to start profiling at any point during the session. Could we document that this API is not thread-safe in API doc?

How would anyone use it w/o being thread-safe?

Contributor Author

Hi @yuslepukhin I might be misunderstanding something, so let me clarify a few points to make sure we’re on the same page:

  1. InferenceSession is designed to be thread-safe.
    This means developers can safely invoke its member functions from multiple threads. For example:

    1.1 Multiple threads calling session.Run() concurrently.

    1.2 Some threads calling session.Run() while others call session.GetOutputs().

    All of these are supported.

  2. However, StartProfiling and EndProfiling are not thread-safe in the current implementation.
    This can lead to two categories of issues:

    2.1 Multiple threads calling StartProfiling / EndProfiling simultaneously.
    This is straightforward to fix by protecting these APIs with a dedicated mutex_.

    2.2 Race conditions between Run() and profiling control APIs (StartProfiling / EndProfiling).
    Run() checks session_profiler_.IsEnabled(), while the profiling-control APIs modify this state.
    This can cause several problems:

    a. session_profiler_.Start() may be invoked before the profiler is fully initialized.

    b. Or session_profiler_.Start() might run even though profiling has already been disabled by EndProfiling.

To address both issues listed in point 2, my proposal is to introduce a dedicated profiler_mutex_.

This ensures safe coordination between Run() and profiling state changes.
Importantly, this approach adds zero overhead to the hot Run() path when developers do not call profiling APIs at runtime. Do you have any suggestions on this? 🤔

// Current code
Status InferenceSession::Run(...) {
  if (session_profiler_.IsEnabled()) {
    tp = session_profiler_.Start();
  }
  ....
}

template <typename T>
void InferenceSession::StartProfiling(const std::basic_string<T>& file_prefix) {
  session_profiler_.StartProfiling(ss.str());
}

// My proposal
Status InferenceSession::Run(...) {
   // Double-checked locking to minimize perf overhead
    if (session_profiler_.IsEnabled()) { // first check
      std::lock_guard<std::mutex> lock(profiler_mutex_);
      if (session_profiler_.IsEnabled()) { // second check
        tp = session_profiler_.Start();
      }
    }
  ....
}

template <typename T>
void InferenceSession::StartProfiling(const std::basic_string<T>& file_prefix) {
  std::lock_guard<std::mutex> lock(profiler_mutex_);
  session_profiler_.StartProfiling(ss.str());
}
