Skip to content

Conversation

@GarrettBeatty
Copy link
Contributor

@GarrettBeatty GarrettBeatty commented Dec 9, 2025

Description

Fix HTTP concurrency for multi part download. Previously we were releasing the http semaphore slot after we got the http response but before we read the response body (wrote it to a file/stream). This meant that it was possible for their to be more active ConcurrentServiceRequests than the user actually wanted.

The fix was to move the releasing of the http semaphore slot after processing the part. This ensured that its only released once its written to the file/stream and ensures only ConcurrentServiceRequests are active at a time.

Motivation and Context

#3806

Testing

  1. I added unit tests such as HttpSemaphore_HeldThroughProcessPartAsync which verify that the http semaphore is only released after process part is executed. In order to ensure this test actually checks the issue, i ran it before my fix and the test failed (as expected). After my fix, the test passes.
  2. dry run in progress 7e8b2922-9aab-4b1c-8cf2-ee8c6ad6b8d1

Screenshots (if appropriate)

Before (we can see concurrency is over the set 10)
image

After, we can see concurrency stays at or below 10
image

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have read the README document
  • I have added tests to cover my changes
  • All new and existing tests passed

License

  • I confirm that this pull request can be released under the Apache 2 license

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical HTTP concurrency bug in S3 multipart downloads where the HTTP semaphore was being released too early (after network download but before disk write), causing the ConcurrentServiceRequests limit to not properly control the entire I/O operation. The fix ensures the semaphore is held through both network download AND ProcessPartAsync completion for all download strategies (PART and RANGE).

Key Changes

  • Modified semaphore lifecycle to hold HTTP slots through complete I/O operation (network + disk)
  • Added comprehensive test coverage for the concurrency fix with deterministic tests
  • Enhanced documentation with detailed XML comments explaining semaphore lifecycle
  • Removed obsolete tests that verified the old (incorrect) behavior

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
sdk/src/Services/S3/Custom/Transfer/Internal/MultipartDownloadManager.cs Core fix: moved semaphore release from discovery methods to after Part 1 processing in StartDownloadsAsync, and moved release in CreateDownloadTaskAsync to after ProcessPartAsync; added extensive XML documentation explaining the semaphore lifecycle pattern
sdk/test/Services/S3/UnitTests/Custom/MultipartDownloadManagerTests.cs Removed 2 obsolete tests that verified old behavior; added 2 new deterministic tests that verify semaphore is held through ProcessPartAsync; updated existing test to verify full download lifecycle
sdk/src/Services/S3/Custom/Transfer/Internal/FilePartDataHandler.cs Added partNumber parameter to WritePartToFileAsync for improved logging (bug in log format string on line 236)

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

@GarrettBeatty GarrettBeatty changed the title fix http concurrency Fix http concurrency for multi part download. Dec 9, 2025
/// <remarks>
/// <para><strong>IMPORTANT - HTTP Semaphore Lifecycle:</strong></para>
/// <para>
/// This method acquires an HTTP concurrency slot from the configured semaphore and downloads Part 1.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added docs here about it holding http concurrency and the need to call StartDownloadAsync afterword to release it

}

[TestMethod]
public async Task DiscoverUsingPartStrategy_AcquiresAndReleasesHttpSlot()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests are not applicable because they were asserting the old behavior of releasing the http slot only after we get the initial response, and not after processing

// Act - Start both discoveries concurrently
var task1 = coordinator1.DiscoverDownloadStrategyAsync(CancellationToken.None);
var task2 = coordinator2.DiscoverDownloadStrategyAsync(CancellationToken.None);
var discovery1 = await coordinator1.DiscoverDownloadStrategyAsync(CancellationToken.None);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test had to be updated because it requires callers to call startdownloadasync after discoverdownloadstrategyasync in order to release the http slot

#region Concurrency Control Tests

[TestMethod]
public async Task HttpSemaphore_HeldThroughProcessPartAsync()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned in pr description, without my bug fix these tests fail. with my bug fix, these tests pass

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

_logger.DebugFormat("MultipartDownloadManager: [Part {0}] Processing completed successfully", partNumber);
}
finally
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to after processpart

firstPartResponse = await _s3Client.GetObjectAsync(firstPartRequest, cancellationToken).ConfigureAwait(false);
}
finally
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to after processpart

/// <exception cref="InvalidOperationException">Thrown if discovery has already been performed.</exception>
/// <exception cref="OperationCanceledException">Thrown if the operation is cancelled.</exception>
/// <inheritdoc/>
public async Task<DownloadDiscoveryResult> DiscoverDownloadStrategyAsync(CancellationToken cancellationToken)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably in the future we can refactor this logic such that we dont need a separate DiscoverDownloadStrategyAsyncand we can put everything into StartDownload. then at least the acquiring of the http slot and releasing will be in the same function

Assert.AreEqual(7, result.TotalParts); // 52428800 / 8388608 = 6.25 -> 7 parts

// Verify the mock was called with correct setup
mockDataHandler.Verify(x => x.WaitForCapacityAsync(It.IsAny<CancellationToken>()), Times.Once);
Copy link
Contributor Author

@GarrettBeatty GarrettBeatty Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i couldnt figure out how to get a good integration test for concurrency so i did unit tests instead. i would like to have an integration test though

Copy link
Contributor Author

@GarrettBeatty GarrettBeatty Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually i found some way, but the issue is that it wont always occur because of throughput issues of where the test is being run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant