
Conversation

Contributor

@xiaofeihan1 xiaofeihan1 commented Nov 25, 2025

Description

Add a start-profiling API to ORT. With this, we can profile a specific time span. A companion GenAI PR builds on it to support start/end profiling in GenAI.

P.S. When the session option enable_profiling is true and start_profiling is also called, we always profile from the beginning of the session.

| enable_profiling | start_profiling | end_profiling | Result |
| --- | --- | --- | --- |
| 0 | 0 | 0 | no trace |
| 0 | 0 | 1 | no trace |
| 0 | 1 | 0 | trace from the start_profiling call to the end of the session |
| 0 | 1 | 1 | trace from the start_profiling call to the end_profiling call |
| 1 | 0 | 0 | trace the entire session |
| 1 | 0 | 1 | trace from the beginning of the session to the end_profiling call |
| 1 | 1 | 0 | trace the entire session |
| 1 | 1 | 1 | trace from the beginning of the session to the end_profiling call |
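The eight flag combinations above follow a simple decision rule. A minimal Python sketch (a hypothetical helper for illustration, not part of the ORT API) that reproduces the table:

```python
def profiled_span(enable_profiling, start_called, end_called):
    """Return (trace_start, trace_end) labels, or None when no trace is produced.

    Models the semantics described in the PR: the session option wins over
    start_profiling, and end_profiling only truncates an active trace.
    """
    if enable_profiling:
        # Session option is set: profiling always runs from session start.
        end = "end_profiling call" if end_called else "session end"
        return ("session start", end)
    if start_called:
        # Profiling starts when start_profiling is called at runtime.
        end = "end_profiling call" if end_called else "session end"
        return ("start_profiling call", end)
    return None  # neither the option nor the runtime API enabled profiling
```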

Motivation and Context

Previously, profiling could only be enabled from the beginning of a session; there was no way to start profiling in the middle of one.

With this PR, we can profile any time span.

@xiaofeihan1 xiaofeihan1 force-pushed the xiaofeihan/add_start_profiling branch from 64fd2b3 to 4ae0d10 Compare November 26, 2025 03:10
@xiaofeihan1 xiaofeihan1 marked this pull request as ready for review November 26, 2025 03:12
@qjia7 qjia7 requested a review from yuslepukhin November 26, 2025 03:15
@xiaofeihan1 xiaofeihan1 marked this pull request as draft November 26, 2025 08:36
@xiaofeihan1 xiaofeihan1 removed the request for review from yuslepukhin November 26, 2025 08:36
@xiaofeihan1
Contributor Author

Marking this as a draft because we might need further changes to support start/end profiling in GenAI.

@xiaofeihan1 xiaofeihan1 force-pushed the xiaofeihan/add_start_profiling branch from 4ae0d10 to 4b20c55 Compare November 27, 2025 07:59
@xiaofeihan1 xiaofeihan1 marked this pull request as ready for review November 27, 2025 08:11
@xiaofeihan1 xiaofeihan1 marked this pull request as draft November 27, 2025 09:28
@xiaofeihan1 xiaofeihan1 removed the request for review from yuslepukhin November 27, 2025 09:28
@xiaofeihan1 xiaofeihan1 marked this pull request as ready for review November 28, 2025 06:29
@xiaofeihan1 xiaofeihan1 closed this Dec 1, 2025
@xiaofeihan1 xiaofeihan1 reopened this Dec 1, 2025
Contributor

qjia7 commented Dec 2, 2025

@yuslepukhin @guschmue @fs-eire
We’re eager to have this feature because we’ve observed that when the context is long, profiling events exceed the maximum supported limit if profiling is enabled from the start of the session. As a result, no profiling data is generated, which makes debugging difficult. Without profiling data, it’s hard to investigate why generation becomes very slow when the total sequence length exceeds a certain threshold.
We’d like to enhance the current profiling interfaces in ORT and ORT-GenAI. Please take a look at this PR and guide us in the right direction. Thanks!

ORT_API_STATUS_IMPL(OrtApis::SessionStartProfiling, _In_ OrtSession* sess) {
  API_IMPL_BEGIN
  auto session = reinterpret_cast<::onnxruntime::InferenceSession*>(sess);
  session->StartProfiling("onnxruntime_profile_");
  return nullptr;
  API_IMPL_END
}
Contributor

@feich-ms feich-ms Dec 4, 2025
Is "onnxruntime_profile_" the default value? Might users need to specify the value themselves? Hard-coding it doesn't look necessary.

Member

This would be a good feature as the disk may be polluted by many similar files.

Member

@yuslepukhin yuslepukhin Dec 5, 2025
Are multi-threading issues considered? Typically, a single thread invokes Run(), and no calls other than another Run() are made.

Contributor Author

Thanks for the suggestion! I have changed it to the following shape. The default value is used only when the developer calls start_profiling() without a prefix.

  def start_profiling(self, file_prefix="onnxruntime_profile_"):

Contributor Author

@xiaofeihan1 xiaofeihan1 Dec 5, 2025

Hi @yuslepukhin, I didn’t consider the multi-threaded scenario. We previously exposed an end_profiling API, so I added start_profiling to keep the interface consistent.
I think you are right: calling session.run in one thread while calling session.start_profiling/end_profiling in another may lead to unexpected issues. For example, thread 1 calls session.start_profiling and has just set enabled_ to true but has not yet initialized the profiler, while thread 2 calls session.run, which tries to record data because session_profiler_.IsEnabled() is true.
Do you think the following pattern is a good way to address this? It might add some perf overhead when profiling is turned on.

Status InferenceSession::Run(...) {
  TimePoint tp = std::chrono::high_resolution_clock::now();
  
  // Double-checked locking to minimize perf overhead
  if (session_profiler_.IsEnabled()) { // first check
    std::lock_guard<std::mutex> lock(profiler_mutex_);
    if (session_profiler_.IsEnabled()) { // second check
      tp = session_profiler_.Start();
    }
  }
  ...
}

template <typename T>
void InferenceSession::StartProfiling(const std::basic_string<T>& file_prefix) {
  // Add
  std::lock_guard<std::mutex> lock(profiler_mutex_);

  if (session_profiler_.IsEnabled()) {
    LOGS(*session_logger_, WARNING) << "Profiler is already running.";
    return;
  }
  ...
}
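The double-checked locking idea sketched above can be modeled in a self-contained way. The following Python sketch (assumed names such as ProfilerState and record_run; this is an illustration of the pattern, not ORT code) shows the two key properties: the hot path takes the lock only when the enabled flag already reads true, and the flag is published only after initialization completes under the lock:

```python
import threading

class ProfilerState:
    """Illustrative model of the proposed pattern (not ORT code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._enabled = False
        self.events = []

    def start_profiling(self):
        with self._lock:
            if self._enabled:
                return False  # already running; mirrors the WARNING branch
            # ... initialize profiler state here, *before* flipping the flag ...
            self._enabled = True
            return True

    def end_profiling(self):
        with self._lock:
            self._enabled = False

    def record_run(self, name):
        # Double-checked locking: cheap unlocked check first, then re-check
        # under the lock so we never record against a profiler that has been
        # torn down (or not yet initialized) by a concurrent control call.
        if self._enabled:            # first check (no lock)
            with self._lock:
                if self._enabled:    # second check (locked)
                    self.events.append(name)
```

When profiling is off, record_run never touches the lock, which is the zero-overhead property claimed for the hot Run() path.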

Member

@yuslepukhin yuslepukhin Dec 5, 2025
> Hi @yuslepukhin I didn’t consider the multi-threaded scenario. We previously exposed [end_profiling API]

We certainly do not want to do it in Run(). Besides, isn't that what already takes place when profiling is enabled?

Contributor Author

Our intention is to allow developers to start profiling at any point during the session. Could we document in the API doc that this API is not thread-safe?

(Note: Enabling profiling via session options starts profiling at the beginning of the session. This may fail for large contexts if the number of profiling events exceeds the maximum supported limit.)

Member

> Our intention is to allow developers to start profiling at any point during the session. Could we document that this API is not thread-safe in API doc?

How would anyone use it w/o being thread-safe?

Contributor Author

Hi @yuslepukhin I might be misunderstanding something, so let me clarify a few points to make sure we’re on the same page:

  1. InferenceSession is designed to be thread-safe.
    This means developers can safely invoke its member functions from multiple threads. For example:

    1.1 Multiple threads calling session.Run() concurrently.

    1.2 Some threads calling session.Run() while others call session.GetOutputs().

    All of these are supported.

  2. However, StartProfiling and EndProfiling are not thread-safe in the current implementation.
    This can lead to two categories of issues:

    2.1 Multiple threads calling StartProfiling / EndProfiling simultaneously.
    This is straightforward to fix by protecting these APIs with a dedicated mutex_.

    2.2 Race conditions between Run() and profiling control APIs (StartProfiling / EndProfiling).
    Run() checks session_profiler_.IsEnabled(), while the profiling-control APIs modify this state.
    This can cause several problems:

    a. session_profiler_.Start() may be invoked before the profiler is fully initialized.

    b. Or session_profiler_.Start() might run even though profiling has already been disabled by EndProfiling.

To address both issues listed in point 2, my proposal is to introduce a dedicated profiler_mutex_.

This ensures safe coordination between Run() and profiling state changes.
Importantly, this approach adds zero overhead to the hot Run() path when developers do not call profiling APIs at runtime. Do you have any suggestions on this? 🤔

// Current code
Status InferenceSession::Run(...) {
  if (session_profiler_.IsEnabled()) {
    tp = session_profiler_.Start();
  }
  ....
}

template <typename T>
void InferenceSession::StartProfiling(const std::basic_string<T>& file_prefix) {
  session_profiler_.StartProfiling(ss.str());
}

// My proposal
Status InferenceSession::Run(...) {
   // Double-checked locking to minimize perf overhead
    if (session_profiler_.IsEnabled()) { // first check
      std::lock_guard<std::mutex> lock(profiler_mutex_);
      if (session_profiler_.IsEnabled()) { // second check
        tp = session_profiler_.Start();
      }
    }
  ....
}

template <typename T>
void InferenceSession::StartProfiling(const std::basic_string<T>& file_prefix) {
  std::lock_guard<std::mutex> lock(profiler_mutex_);
  session_profiler_.StartProfiling(ss.str());
}
