Fix stale lockfile cleanup for function extension builds #6696

skypher · 2025-12-08T03:54:47Z

Summary

When a function build process crashes or is killed (e.g., by SIGINT/Ctrl+C, SIGTERM, or SIGHUP), the .build-lock directory can be left behind. Previously, subsequent builds then stalled through 20 retries with exponential backoff before failing, which made development frustrating. The stall was especially confusing because the build appeared to hang indefinitely and gave no clear error message.

This PR implements a two-pronged approach to handle stale locks:

1. Signal Handlers for Clean Shutdown

Registers signal handlers (SIGINT, SIGTERM, SIGHUP) after acquiring the lock to ensure the lock is released even when the process is interrupted:

- Handlers release the lock before re-emitting the signal for normal termination
- Handlers are cleaned up after normal build completion to avoid memory leaks
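
The handler lifecycle described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `registerLockCleanup` and the `SignalSource` interface are hypothetical names, and `SignalSource` is a minimal stand-in for the parts of Node's `process` object the sketch needs (`pid`, `on`, `removeListener`, `kill`).

```typescript
// Minimal shape of the parts of Node's `process` object used here (assumption).
interface SignalSource {
  pid: number
  on(signal: string, handler: () => void): unknown
  removeListener(signal: string, handler: () => void): unknown
  kill(pid: number, signal: string): unknown
}

const SIGNALS = ['SIGINT', 'SIGTERM', 'SIGHUP']

// Hypothetical helper: call it right after the lock is acquired.
// Returns a function that unregisters the handlers after a normal build.
function registerLockCleanup(
  proc: SignalSource,
  releaseLock: () => Promise<void>,
): () => void {
  const handlers = new Map<string, () => void>()

  const unregister = (): void => {
    handlers.forEach((handler, signal) => proc.removeListener(signal, handler))
    handlers.clear()
  }

  SIGNALS.forEach((signal) => {
    const handler = (): void => {
      // Drop our handlers first so the re-emitted signal gets default handling.
      unregister()
      const reemit = (): void => {
        proc.kill(proc.pid, signal)
      }
      // Release is best effort: re-emit the signal whether or not it succeeds.
      releaseLock().then(reemit, reemit)
    }
    handlers.set(signal, handler)
    proc.on(signal, handler)
  })

  return unregister
}
```

In a real build this would presumably be called as `registerLockCleanup(process, releaseLock)`, with the returned function invoked once the build finishes normally.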

2. Proactive Stale Lock Detection on Startup

Uses lockfile.check() to verify if a lock is actively held before attempting acquisition:

- Removes orphaned lock directories from crashed builds automatically
- Reduces retries from 20 to 3 with shorter timeouts (100ms-1000ms) for faster failure when lock is truly contested
- Sets explicit 10-second stale threshold for lock files
- Improves error message to show the actual lockfile path so users can manually clean up if needed
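
As a rough sketch of the retry settings and error message listed above (not the PR's actual code): the `lock` function is injected and assumed to accept options shaped like those of a `proper-lockfile`-style API (`stale`, `retries.retries`, `retries.minTimeout`, `retries.maxTimeout`). `acquireBuildLock`, `LOCK_OPTIONS`, and the message wording are hypothetical.

```typescript
// Option shape assumed to resemble a proper-lockfile style API (assumption).
interface LockOptions {
  stale: number
  retries: {retries: number; minTimeout: number; maxTimeout: number}
}
type Release = () => Promise<void>
type LockFn = (path: string, options: LockOptions) => Promise<Release>

// Values taken from the PR description: 3 retries, 100ms-1000ms, 10s stale threshold.
const LOCK_OPTIONS: LockOptions = {
  stale: 10000,
  retries: {retries: 3, minTimeout: 100, maxTimeout: 1000},
}

// Hypothetical wrapper: fails fast and names the lockfile path in the error.
async function acquireBuildLock(lock: LockFn, lockfilePath: string): Promise<Release> {
  try {
    return await lock(lockfilePath, LOCK_OPTIONS)
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error)
    throw new Error(
      `Failed to acquire build lock at ${lockfilePath}: ${reason}. ` +
        'If no other build is running, delete this path and try again.',
    )
  }
}
```

Passing the lock function in as a parameter keeps the sketch independent of any particular locking library and easy to exercise with a fake.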

The fix handles three startup scenarios:

1. Stale lock exists (no active process) - auto-removes and proceeds normally
2. Active lock held by another process - waits and retries as before
3. Corrupted lock (check fails with ENOENT or similar) - attempts cleanup and proceeds
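
The three scenarios can be read as a small decision procedure that runs before lock acquisition. The sketch below is one hedged interpretation, not the PR's implementation: `LockProbe`, `clearStaleLock`, and the `LockState` labels are hypothetical, and `check` is assumed to resolve to `true` only while another process actively holds the lock.

```typescript
type LockState = 'free' | 'stale' | 'active' | 'corrupted'

// Hypothetical abstraction over the filesystem and lock library calls.
interface LockProbe {
  exists(path: string): Promise<boolean>
  check(path: string): Promise<boolean> // true when the lock is actively held
  remove(path: string): Promise<void>
}

// Classifies the lock directory and removes it when it is safe to do so.
async function clearStaleLock(probe: LockProbe, lockPath: string): Promise<LockState> {
  if (!(await probe.exists(lockPath))) return 'free'

  let held: boolean
  try {
    held = await probe.check(lockPath)
  } catch {
    // Scenario 3: the check itself failed, so attempt cleanup and proceed.
    await probe.remove(lockPath).catch(() => undefined)
    return 'corrupted'
  }

  // Scenario 2: another process holds the lock; leave it for the retry logic.
  if (held) return 'active'

  // Scenario 1: orphaned lock from a crashed build; remove it and proceed.
  await probe.remove(lockPath)
  return 'stale'
}
```

Only the `active` result leaves the lock directory in place, so normal acquisition with retries still applies when a build is genuinely running elsewhere.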

Test plan

- Added 5 new unit tests covering all stale lock scenarios and signal handler registration
- All 15 extension build tests pass
- Manually verified fix by reproducing the original issue (crashed build leaving orphan lock) and confirming auto-cleanup works

When a function build process crashes or is killed, the `.build-lock` directory can be left behind. Previously, this caused subsequent builds to hang for an extended period (with 20 retries using exponential backoff) before failing, making development frustrating.

This change:
- Adds proactive stale lock detection before attempting to acquire a lock
- Uses `lockfile.check()` to verify if a lock is actively held
- Removes orphaned lock directories from crashed builds automatically
- Reduces retries from 20 to 3 with shorter timeouts (faster failure)
- Sets explicit 10-second stale threshold for lock files
- Improves error message to show the actual lockfile path

The fix handles three scenarios:
1. Stale lock exists (no active process) - auto-removes and proceeds
2. Active lock held by another process - waits normally
3. Corrupted lock (check fails) - attempts cleanup and proceeds

This prevents stale lockfiles from being left behind when the build process is interrupted by a signal (Ctrl+C, kill, terminal close).

The signal handlers:
- Register after the lock is acquired
- Release the lock before re-emitting the signal to allow normal termination
- Are cleaned up after normal build completion to avoid memory leaks

Also adds a test to verify signal handlers are properly registered and removed during the build lifecycle.

skypher · 2025-12-08T04:27:00Z

CLA signed.

skypher requested a review from a team as a code owner December 8, 2025 03:54

github-actions bot added the cla-needed label Dec 8, 2025

skypher added 3 commits December 8, 2025 12:09

Trigger CI

8b744d1

Fix rmdir options: use force instead of recursive

cafaee1

The @shopify/cli-kit rmdir function uses the 'del' library which handles recursion automatically. The RmDirOptions interface only supports 'force', not 'recursive'.

github-actions bot removed the cla-needed label Dec 8, 2025

skypher added 2 commits December 8, 2025 12:28

Fix lint errors: generic constructor and no-catch-all

a4a56be

Fix lint: move inline comments above code

3463436

skypher requested a review from a team as a code owner December 8, 2025 04:48

Add changeset for stale lockfile fix

c80cbaf
