-
Notifications
You must be signed in to change notification settings - Fork 874
Fix http concurrency for multi part download. #4215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: feature/transfermanager
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a critical HTTP concurrency bug in S3 multipart downloads where the HTTP semaphore was being released too early (after network download but before disk write), causing the ConcurrentServiceRequests limit to not properly control the entire I/O operation. The fix ensures the semaphore is held through both network download AND ProcessPartAsync completion for all download strategies (PART and RANGE).
Key Changes
- Modified semaphore lifecycle to hold HTTP slots through complete I/O operation (network + disk)
- Added comprehensive test coverage for the concurrency fix with deterministic tests
- Enhanced documentation with detailed XML comments explaining semaphore lifecycle
- Removed obsolete tests that verified the old (incorrect) behavior
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sdk/src/Services/S3/Custom/Transfer/Internal/MultipartDownloadManager.cs | Core fix: moved semaphore release from discovery methods to after Part 1 processing in StartDownloadsAsync, and moved release in CreateDownloadTaskAsync to after ProcessPartAsync; added extensive XML documentation explaining the semaphore lifecycle pattern |
| sdk/test/Services/S3/UnitTests/Custom/MultipartDownloadManagerTests.cs | Removed 2 obsolete tests that verified old behavior; added 2 new deterministic tests that verify semaphore is held through ProcessPartAsync; updated existing test to verify full download lifecycle |
| sdk/src/Services/S3/Custom/Transfer/Internal/FilePartDataHandler.cs | Added partNumber parameter to WritePartToFileAsync for improved logging (bug in log format string on line 236) |
sdk/src/Services/S3/Custom/Transfer/Internal/FilePartDataHandler.cs
Outdated
Show resolved
Hide resolved
Add tests
0e5ce2e to
df61e36
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
sdk/src/Services/S3/Custom/Transfer/Internal/FilePartDataHandler.cs
Outdated
Show resolved
Hide resolved
sdk/src/Services/S3/Custom/Transfer/Internal/MultipartDownloadManager.cs
Show resolved
Hide resolved
| /// <remarks> | ||
| /// <para><strong>IMPORTANT - HTTP Semaphore Lifecycle:</strong></para> | ||
| /// <para> | ||
| /// This method acquires an HTTP concurrency slot from the configured semaphore and downloads Part 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added docs here about it holding http concurrency and the need to call StartDownloadAsync afterword to release it
| } | ||
|
|
||
| [TestMethod] | ||
| public async Task DiscoverUsingPartStrategy_AcquiresAndReleasesHttpSlot() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these tests are not applicable because they were asserting the old behavior of releasing the http slot only after we get the initial response, and not after processing
| // Act - Start both discoveries concurrently | ||
| var task1 = coordinator1.DiscoverDownloadStrategyAsync(CancellationToken.None); | ||
| var task2 = coordinator2.DiscoverDownloadStrategyAsync(CancellationToken.None); | ||
| var discovery1 = await coordinator1.DiscoverDownloadStrategyAsync(CancellationToken.None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test had to be updated because it requires callers to call startdownloadasync after discoverdownloadstrategyasync in order to release the http slot
| #region Concurrency Control Tests | ||
|
|
||
| [TestMethod] | ||
| public async Task HttpSemaphore_HeldThroughProcessPartAsync() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mentioned in pr description, without my bug fix these tests fail. with my bug fix, these tests pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
sdk/test/Services/S3/UnitTests/Custom/MultipartDownloadManagerTests.cs
Outdated
Show resolved
Hide resolved
sdk/test/Services/S3/UnitTests/Custom/MultipartDownloadManagerTests.cs
Outdated
Show resolved
Hide resolved
sdk/test/Services/S3/UnitTests/Custom/MultipartDownloadManagerTests.cs
Outdated
Show resolved
Hide resolved
| _logger.DebugFormat("MultipartDownloadManager: [Part {0}] Processing completed successfully", partNumber); | ||
| } | ||
| finally | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to after processpart
| firstPartResponse = await _s3Client.GetObjectAsync(firstPartRequest, cancellationToken).ConfigureAwait(false); | ||
| } | ||
| finally | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to after processpart
| /// <exception cref="InvalidOperationException">Thrown if discovery has already been performed.</exception> | ||
| /// <exception cref="OperationCanceledException">Thrown if the operation is cancelled.</exception> | ||
| /// <inheritdoc/> | ||
| public async Task<DownloadDiscoveryResult> DiscoverDownloadStrategyAsync(CancellationToken cancellationToken) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably in the future we can refactor this logic such that we dont need a separate DiscoverDownloadStrategyAsyncand we can put everything into StartDownload. then at least the acquiring of the http slot and releasing will be in the same function
| Assert.AreEqual(7, result.TotalParts); // 52428800 / 8388608 = 6.25 -> 7 parts | ||
|
|
||
| // Verify the mock was called with correct setup | ||
| mockDataHandler.Verify(x => x.WaitForCapacityAsync(It.IsAny<CancellationToken>()), Times.Once); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i couldnt figure out how to get a good integration test for concurrency so i did unit tests instead. i would like to have an integration test though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually i found some way, but the issue is that it wont always occur because of throughput issues of where the test is being run
Description
Fix HTTP concurrency for multi part download. Previously we were releasing the http semaphore slot after we got the http response but before we read the response body (wrote it to a file/stream). This meant that it was possible for their to be more active
ConcurrentServiceRequeststhan the user actually wanted.The fix was to move the releasing of the http semaphore slot after processing the part. This ensured that its only released once its written to the file/stream and ensures only
ConcurrentServiceRequestsare active at a time.Motivation and Context
#3806
Testing
HttpSemaphore_HeldThroughProcessPartAsyncwhich verify that the http semaphore is only released after process part is executed. In order to ensure this test actually checks the issue, i ran it before my fix and the test failed (as expected). After my fix, the test passes.Screenshots (if appropriate)
Before (we can see concurrency is over the set 10)

After, we can see concurrency stays at or below 10

Types of changes
Checklist
License