@jakebailey
Member

Often when I see people's profiles, the top allocator is WriteByte, all from the checker building runtime map keys.

We stringify data to use as map keys for union types, intersection types, etc. In JS, this is cheap thanks to the "rope" optimization where the strings are never actually produced unless required (which does not include Map lookups!).

But in Go, this doesn't work so well because we do actually have to back strings with real fully-written memory, and it's not easy to do a lookup otherwise. There's no magic string type that can expand itself when read. A Go stdlib map that allows custom hashing/equality is seemingly not coming soon.

An alternative is to just hash the data and use that hash as a key. A while ago I tried using sha256 to do this (like gopls does), but did not see much success. Introducing crypto into the binary is also a can of worms I don't want to open, so I gave up on that idea.

We have since then added xxhash to our dependencies for incremental mode, the LSP, etc.

A 128-bit xxhash could be argued to have enough collision resistance; UUIDs are of course also 128-bit. What are the chances of a UUID-level collision happening within a single compile? 😅
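
For a rough sense of scale, the birthday bound puts the chance of any collision among n uniformly distributed b-bit keys at about n² / 2^(b+1). The sketch below evaluates it in Go. The key count of 1e8 is a made-up, deliberately generous figure rather than a measurement from any compile, and the bound assumes uniformly distributed hash output, which a non-cryptographic hash only approximates.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the birthday bound p ≈ n² / 2^(bits+1)
// for n uniformly distributed keys of the given bit width.
// The approximation is only meaningful while the result is far below 1.
func collisionProbability(n float64, bits int) float64 {
	return n * n / math.Pow(2, float64(bits+1))
}

func main() {
	// Hypothetical key count, chosen to be generous; not taken from the PR.
	const keys = 1e8
	fmt.Printf("128-bit: %.3g\n", collisionProbability(keys, 128))
	fmt.Printf(" 64-bit: %.3g\n", collisionProbability(keys, 64))
}
```

Under those assumptions the 128-bit figure comes out many orders of magnitude below the 64-bit one, which is the gap that makes the wider hash attractive as a map key.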

This PR tries that out, swapping our key builder out for one that hashes the data as it comes in instead.
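
The general shape of the idea can be sketched as follows. This is not the PR's code: `hashKey`, `keyBuilder`, and `unionKey` are hypothetical names, and the standard library's 128-bit FNV-1a stands in for xxhash so the example has no external dependencies. The point is only that a fixed-size array is comparable in Go, so it can serve as a map key without ever materializing a string. The stand-in hasher allocates on its own, so this sketch says nothing about the allocation savings reported here.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash"
	"hash/fnv"
)

// hashKey is a fixed-size array, which Go allows as a map key.
type hashKey [16]byte

// keyBuilder feeds data into a running hash instead of a string buffer.
// Assumption: FNV-1a (stdlib) is used here purely as a stand-in hash.
type keyBuilder struct{ h hash.Hash }

func newKeyBuilder() *keyBuilder { return &keyBuilder{h: fnv.New128a()} }

// writeID hashes an id at fixed width, so no separator is needed between ids.
func (b *keyBuilder) writeID(id uint32) {
	var buf [4]byte
	binary.LittleEndian.PutUint32(buf[:], id)
	b.h.Write(buf[:]) // hash.Hash.Write never returns an error
}

// key copies the 16-byte digest into a comparable array.
func (b *keyBuilder) key() hashKey {
	var k hashKey
	b.h.Sum(k[:0]) // appends into k's backing storage
	return k
}

// unionKey builds a key from a list of (hypothetical) type ids.
func unionKey(typeIDs []uint32) hashKey {
	b := newKeyBuilder()
	for _, id := range typeIDs {
		b.writeID(id)
	}
	return b.key()
}

func main() {
	cache := map[hashKey]string{}
	cache[unionKey([]uint32{1, 2, 3})] = "A | B | C"

	v, ok := cache[unionKey([]uint32{1, 2, 3})]
	fmt.Println(v, ok)

	_, ok = cache[unionKey([]uint32{1, 2, 4})]
	fmt.Println(ok)
}
```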

The effect seems pretty good. VS Code before:

Memory used:    3565146K
Memory allocs:  34630016
Config time:      0.130s
Parse time:       0.520s
Bind time:        0.097s
Check time:       6.276s
Emit time:        1.394s
Total time:       8.425s

And after:

Memory used:    3533519K
Memory allocs:  25587150
Config time:      0.132s
Parse time:       0.549s
Bind time:        0.085s
Check time:       6.136s
Emit time:        1.378s
Total time:       8.289s

The time in general seems unchanged, but with about 26% fewer allocations (34,630,016 down to 25,587,150; put the other way, the old build made about 35% more).

Downside is of course that the map keys become unreadable and irreversible. Right now, they're only "unreadable" thanks to us making numbers into shorter strings. We could have a mode which preserves the data behind a build tag. Not sure about that.

Thankfully, these keys are never actually exposed through any API, so should be opaque enough.

Copilot AI review requested due to automatic review settings December 9, 2025 00:38
Contributor

Copilot AI left a comment

Pull request overview

This PR replaces string-based map keys with xxhash-based (128-bit) keys to reduce memory allocations. The change affects various type caches throughout the checker, converting from human-readable string keys to opaque hash values. The PR reports roughly 26% fewer memory allocations (34,630,016 down to 25,587,150) with minimal performance impact.

Key changes:

  • Replaced KeyBuilder (string builder) with keyBuilder (hash builder) using xxh3
  • Updated all internal type cache maps to use xxh3.Uint128 as keys instead of strings
  • Modified getRelationKey to return both a hash and a boolean indicating whether type parameters are constrained

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| internal/checker/types.go | Updated map type signatures to use xxh3.Uint128 keys for type instantiations, conditional types, and index symbol caches |
| internal/checker/relater.go | Changed relation cache to use hash keys and updated getRelationKey to return constrained flag separately |
| internal/checker/flow.go | Replaced flow reference key from string to hash, added refKeyValid boolean flag to track validity instead of checking for "?" sentinel value |
| internal/checker/checker.go | Rewrote key builder implementation to hash data instead of building strings; updated all type cache maps and special signature keys to use hashes |
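
The move from a "?" sentinel to a separate validity flag in flow.go can be illustrated with a small sketch. The names `flowRef` and `lookup` are hypothetical and only mirror the description above. The reasoning is that every 16-byte value is a possible hash output, so no key value can safely be reserved to mean "no key"; a boolean carried alongside the key avoids that.

```go
package main

import "fmt"

// hashKey stands in for a 128-bit hash value used as a map key.
type hashKey [16]byte

// flowRef pairs a hashed reference key with an explicit validity flag,
// replacing the old convention of a "?" string meaning "no usable key".
type flowRef struct {
	key   hashKey
	valid bool
}

// lookup consults the cache only when the reference has a valid key.
func lookup(cache map[hashKey]string, ref flowRef) (string, bool) {
	if !ref.valid {
		return "", false
	}
	v, ok := cache[ref.key]
	return v, ok
}

func main() {
	cache := map[hashKey]string{hashKey{1}: "narrowed type"}

	v, ok := lookup(cache, flowRef{key: hashKey{1}, valid: true})
	fmt.Println(v, ok)

	// Same key bytes, but flagged invalid: the cache is never consulted.
	_, ok = lookup(cache, flowRef{key: hashKey{1}})
	fmt.Println(ok)
}
```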

}

// Return the flow cache key for a "dotted name" (i.e. a sequence of identifiers
// separated by dots). The key consists of the id of the symbol referenced by the

Copilot AI Dec 9, 2025

The comment still describes the old string-based key format. It should be updated to reflect that the function now returns a hash of the reference information, not a structured string. Consider updating to something like: "Return the flow cache key for a 'dotted name' (i.e. a sequence of identifiers separated by dots). The key is a hash computed from the id of the symbol referenced by the leftmost identifier followed by zero or more property names separated by dots."

Suggested change
- // separated by dots). The key consists of the id of the symbol referenced by the
+ // separated by dots). The key is a hash computed from the id of the symbol referenced by the

@jakebailey changed the title from "Use xxhash for keys instead of strings" to "Use xxhash for composite checker keys instead of strings" on Dec 10, 2025