Expose start profiling API #26652
Conversation
Force-pushed 64fd2b3 to 4ae0d10
Marking this as a draft because we might need other changes in GenAI to support start/end profiling.
Force-pushed 4ae0d10 to 4b20c55
@yuslepukhin @guschmue @fs-eire
ORT_API_STATUS_IMPL(OrtApis::SessionStartProfiling, _In_ OrtSession* sess) {
  API_IMPL_BEGIN
  auto session = reinterpret_cast<::onnxruntime::InferenceSession*>(sess);
  session->StartProfiling("onnxruntime_profile_");
Is "onnxruntime_profile_" the default value? Is it possible that users would need to specify the value themselves? It looks unnecessary.
This would be a good feature as the disk may be polluted by many similar files.
Are multi-threading issues considered? Typically there is a single thread that invokes Run(), and no calls other than another Run() are made concurrently.
Thanks for the suggestion! I have changed it to the following shape. The default value is used only when the developer calls start_profiling() without a file prefix:
def start_profiling(self, file_prefix="onnxruntime_profile_"):
Hi @yuslepukhin, I didn’t consider the multi-threaded scenario. We previously exposed the end_profiling API, so I added start_profiling to keep things consistent.
I think you are right: calling session.run in one thread while calling session.start_profiling/end_profiling in another thread may lead to unexpected issues. For example, if thread 1 calls session.start_profiling and has just marked enabled_ as true but has not yet initialized the profiler, thread 2 calling session.run will try to record data because session_profiler_.IsEnabled() is true, which can cause unexpected behavior.
Do you think the following pattern is a good way to address this? It might add some perf overhead when profiling is turned on.
Status InferenceSession::Run(...) {
TimePoint tp = std::chrono::high_resolution_clock::now();
// Add double-checked locking to minimize perf overhead
if (session_profiler_.IsEnabled()) { // first check
std::lock_guard<std::mutex> lock(profiler_mutex_);
if (session_profiler_.IsEnabled()) { // second check
tp = session_profiler_.Start();
}
}
...
}
template <typename T>
void InferenceSession::StartProfiling(const std::basic_string<T>& file_prefix) {
// Add a lock to serialize profiling state changes
std::lock_guard<std::mutex> lock(profiler_mutex_);
if (session_profiler_.IsEnabled()) {
LOGS(*session_logger_, WARNING) << "Profiler is already running.";
return;
}
...
}
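To make the double-checked locking idea above concrete, here is a minimal, self-contained sketch. MockProfiler, g_profiler, and g_profiler_mutex are invented stand-ins for illustration; they are not the real ORT Profiler or InferenceSession members:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Invented mock of the profiler state machine, for illustration only.
class MockProfiler {
 public:
  bool IsEnabled() const { return enabled_.load(std::memory_order_acquire); }
  void Start() { enabled_.store(true, std::memory_order_release); }
  void Stop() { enabled_.store(false, std::memory_order_release); }
  void Record() { ++events_; }  // called only under the lock in Run()
  int events() const { return events_.load(); }

 private:
  std::atomic<bool> enabled_{false};
  std::atomic<int> events_{0};
};

MockProfiler g_profiler;
std::mutex g_profiler_mutex;

// Double-checked locking: the cheap first check keeps the hot path
// lock-free while profiling is off; the second check under the lock
// closes the window where another thread disables the profiler
// between the check and the use.
void Run() {
  if (g_profiler.IsEnabled()) {  // first check, no lock
    std::lock_guard<std::mutex> lock(g_profiler_mutex);
    if (g_profiler.IsEnabled()) {  // second check, under lock
      g_profiler.Record();
    }
  }
}

void StartProfiling() {
  std::lock_guard<std::mutex> lock(g_profiler_mutex);
  g_profiler.Start();
}

void EndProfiling() {
  std::lock_guard<std::mutex> lock(g_profiler_mutex);
  g_profiler.Stop();
}
```

With profiling off, Run() records nothing and takes no lock; after StartProfiling(), each Run() records exactly one event until EndProfiling() is called.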
> Hi @yuslepukhin I didn’t consider the multi-threaded scenario. We previously exposed [end_profiling API]
We certainly do not want to do it in Run(). Besides, isn’t that what already takes place when profiling is enabled?
Our intention is to allow developers to start profiling at any point during the session. Could we document that this API is not thread-safe in the API doc?
(Note: Enabling profiling via session options starts profiling at the beginning of the session. This may fail for large contexts if the number of profiling events exceeds the maximum supported limit.)
> Our intention is to allow developers to start profiling at any point during the session. Could we document that this API is not thread-safe in the API doc?
How would anyone use it w/o being thread-safe?
Hi @yuslepukhin, I might be misunderstanding something, so let me clarify a few points to make sure we’re on the same page:
1. InferenceSession is designed to be thread-safe. This means developers can safely invoke its member functions from multiple threads. For example:
1.1 Multiple threads calling session.Run() concurrently.
1.2 Some threads calling session.Run() while others call session.GetOutputs().
All of these are supported.
2. However, StartProfiling and EndProfiling are not thread-safe in the current implementation. This can lead to two categories of issues:
2.1 Multiple threads calling StartProfiling / EndProfiling simultaneously. This is straightforward to fix by protecting these APIs with a dedicated mutex.
2.2 Race conditions between Run() and the profiling control APIs (StartProfiling / EndProfiling). Run() checks session_profiler_.IsEnabled(), while the profiling control APIs modify this state. This can cause several problems:
a. session_profiler_.Start() may be invoked before the profiler is fully initialized.
b. session_profiler_.Start() might run even though profiling has already been disabled by EndProfiling.
To address both of the issues listed in 2, my proposal is to introduce a dedicated profiler_mutex_. This ensures safe coordination between Run() and profiling state changes. Importantly, this approach adds zero overhead to the hot Run() path when developers do not call the profiling APIs at runtime. Do you have any suggestions on this? 🤔
// Current code
Status InferenceSession::Run(...) {
  if (session_profiler_.IsEnabled()) {
    tp = session_profiler_.Start();
  }
  ...
}
template <typename T>
void InferenceSession::StartProfiling(const std::basic_string<T>& file_prefix) {
  session_profiler_.StartProfiling(ss.str());
}
// My proposal
Status InferenceSession::Run(...) {
  // Add double-checked locking to minimize perf overhead
  if (session_profiler_.IsEnabled()) {  // first check
    std::lock_guard<std::mutex> lock(profiler_mutex_);
    if (session_profiler_.IsEnabled()) {  // second check
      tp = session_profiler_.Start();
    }
  }
  ...
}
template <typename T>
void InferenceSession::StartProfiling(const std::basic_string<T>& file_prefix) {
  std::lock_guard<std::mutex> lock(profiler_mutex_);
  session_profiler_.StartProfiling(ss.str());
}
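As a rough sanity check of the claim that the hot path stays safe while another thread toggles profiling, here is a hedged, self-contained stress sketch. The names profiling_enabled, recorded_events, and StressOnce are invented for illustration and are not ORT code:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Invented stand-ins for the profiler state; illustration only.
std::atomic<bool> profiling_enabled{false};
std::atomic<int> recorded_events{0};
std::mutex profiler_mutex;

// Hot path: lock-free when profiling is off, locked only when it is on.
void Run() {
  if (profiling_enabled.load()) {  // first check
    std::lock_guard<std::mutex> lock(profiler_mutex);
    if (profiling_enabled.load()) {  // second check
      ++recorded_events;  // stands in for Start() plus recording
    }
  }
}

void StartProfiling() {
  std::lock_guard<std::mutex> lock(profiler_mutex);
  profiling_enabled = true;
}

void EndProfiling() {
  std::lock_guard<std::mutex> lock(profiler_mutex);
  profiling_enabled = false;
}

// Stress: worker threads hammer Run() while a controller thread
// repeatedly starts and ends profiling. Returns the event count.
int StressOnce() {
  recorded_events = 0;
  std::vector<std::thread> workers;
  for (int i = 0; i < 4; ++i) {
    workers.emplace_back([] {
      for (int j = 0; j < 1000; ++j) Run();
    });
  }
  std::thread controller([] {
    for (int k = 0; k < 100; ++k) {
      StartProfiling();
      EndProfiling();
    }
  });
  for (auto& t : workers) t.join();
  controller.join();
  return recorded_events.load();
}
```

The event count varies from run to run by design; the point is that the program is race-free (each check-then-record is protected by the mutex), ends with profiling disabled, and pays no locking cost in Run() once profiling is off.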
Description
Add a start profiling API in ORT. With this, we can profile a specific time span. Based on this, we have another GenAI PR to support start/end profiling in GenAI.
PS: When enable_profiling in the session options is true and start_profiling is called, we always profile from the beginning of the session.
Motivation and Context
Previously, we could only enable profiling from the beginning of a session; there was no way to start profiling in the middle of a session. With this PR, we can profile any time span.