-
Notifications
You must be signed in to change notification settings - Fork 686
[#9048] feat(iceberg-rest-catalog): Add cache for scan planning. #8980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
147b989 to
3b9a636
Compare
|
Waiting me fix it. |
3b9a636 to
e7fcf78
Compare
Done, Upgraded Iceberg to version 1.10.0 and upgraded the dependent Hadoop to Hadoop 3. |
...t-server/src/main/java/org/apache/gravitino/iceberg/service/rest/IcebergTableOperations.java
Show resolved
Hide resolved
...rg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/CatalogWrapperForREST.java
Show resolved
Hide resolved
405d548 to
b38bb69
Compare
|
Thanks @sunxiaojian , it's an amazing feature, the overall architecture looks good to me, would you like to split this PR into three part to make each PR more simple to review?
|
ok, I'll handle it. |
fb73cff to
9fea78b
Compare
Because the PRs are interdependent, they need to be merged in order. After each PR is merged, the remaining PRs need to merge in the main branch. |
0437361 to
1bd2b9c
Compare
|
@FANNG1 I have already rebased, PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a caching mechanism for Iceberg REST scan planning to improve query performance by caching scan plan results. The implementation adds a new ScanPlanCache class using Caffeine cache with configurable capacity and expiration, integrates it into the catalog wrapper, and adds extensive tests for the scan planning endpoint.
Key changes:
- New scan plan cache using Caffeine with automatic expiration and cleanup
- Configuration options for cache capacity and expiration time
- Integration with the Iceberg REST catalog wrapper for scan planning operations
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 18 comments.
Show a summary per file
| File | Description |
|---|---|
iceberg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/ScanPlanCache.java |
New cache implementation with Caffeine backend, custom cache key based on table identifier and scan parameters |
iceberg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/CatalogWrapperForREST.java |
Integration of scan plan cache into planTableScan method (but with critical bug - cache checked after expensive operation) |
iceberg/iceberg-common/src/main/java/org/apache/gravitino/iceberg/common/IcebergConfig.java |
Added configuration entries for cache capacity and expiration |
catalogs/catalog-common/src/main/java/org/apache/gravitino/catalog/lakehouse/iceberg/IcebergConstants.java |
Added constant definitions for cache configuration keys |
iceberg/iceberg-rest-server/src/test/java/org/apache/gravitino/iceberg/service/rest/TestPlanTableScan.java |
New test file with tests for scan planning endpoint (missing actual cache tests) |
docs/iceberg-rest-service.md |
Documentation for scan plan cache configuration and usage |
...rg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/ScanPlanCache.java
Outdated
Show resolved
Hide resolved
iceberg/iceberg-common/src/main/java/org/apache/gravitino/iceberg/common/IcebergConfig.java
Outdated
Show resolved
Hide resolved
...rg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/ScanPlanCache.java
Outdated
Show resolved
Hide resolved
...g-rest-server/src/test/java/org/apache/gravitino/iceberg/service/rest/TestPlanTableScan.java
Outdated
Show resolved
Hide resolved
...rg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/CatalogWrapperForREST.java
Outdated
Show resolved
Hide resolved
...rg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/ScanPlanCache.java
Outdated
Show resolved
Hide resolved
...g-rest-server/src/test/java/org/apache/gravitino/iceberg/service/rest/TestPlanTableScan.java
Outdated
Show resolved
Hide resolved
...g-rest-server/src/test/java/org/apache/gravitino/iceberg/service/rest/TestPlanTableScan.java
Outdated
Show resolved
Hide resolved
...g-rest-server/src/test/java/org/apache/gravitino/iceberg/service/rest/TestPlanTableScan.java
Outdated
Show resolved
Hide resolved
c33f90e to
a782498
Compare
|
@FANNG1 fixed. |
a782498 to
90435eb
Compare
90435eb to
3d9bcf6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 13 comments.
...g-rest-server/src/test/java/org/apache/gravitino/iceberg/service/rest/TestScanPlanCache.java
Show resolved
Hide resolved
...rg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/ScanPlanCache.java
Outdated
Show resolved
Hide resolved
...rg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/ScanPlanCache.java
Outdated
Show resolved
Hide resolved
...g-rest-server/src/test/java/org/apache/gravitino/iceberg/service/rest/TestScanPlanCache.java
Show resolved
Hide resolved
...rg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/ScanPlanCache.java
Outdated
Show resolved
Hide resolved
...rg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/ScanPlanCache.java
Outdated
Show resolved
Hide resolved
...rg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/ScanPlanCache.java
Outdated
Show resolved
Hide resolved
...rg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/ScanPlanCache.java
Outdated
Show resolved
Hide resolved
...rg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/ScanPlanCache.java
Outdated
Show resolved
Hide resolved
...rg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/ScanPlanCache.java
Outdated
Show resolved
Hide resolved
...rg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/ScanPlanCache.java
Outdated
Show resolved
Hide resolved
4c14a1f to
0c5bbc5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.
...g-rest-server/src/test/java/org/apache/gravitino/iceberg/service/rest/TestScanPlanCache.java
Show resolved
Hide resolved
...g-rest-server/src/main/java/org/apache/gravitino/iceberg/service/cache/ScanPlanCacheKey.java
Outdated
Show resolved
Hide resolved
...g-rest-server/src/test/java/org/apache/gravitino/iceberg/service/rest/TestScanPlanCache.java
Show resolved
Hide resolved
...g-rest-server/src/main/java/org/apache/gravitino/iceberg/service/cache/ScanPlanCacheKey.java
Outdated
Show resolved
Hide resolved
...rg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/CatalogWrapperForREST.java
Outdated
Show resolved
Hide resolved
...g-rest-server/src/test/java/org/apache/gravitino/iceberg/service/rest/TestScanPlanCache.java
Show resolved
Hide resolved
...rest-server/src/main/java/org/apache/gravitino/iceberg/service/cache/LocalScanPlanCache.java
Show resolved
Hide resolved
...g-rest-server/src/test/java/org/apache/gravitino/iceberg/service/rest/TestScanPlanCache.java
Show resolved
Hide resolved
...rest-server/src/main/java/org/apache/gravitino/iceberg/service/cache/LocalScanPlanCache.java
Outdated
Show resolved
Hide resolved
...rest-server/src/main/java/org/apache/gravitino/iceberg/service/cache/LocalScanPlanCache.java
Show resolved
Hide resolved
dbca147 to
c6cbd9d
Compare
5222e24 to
4f3f795
Compare
| + "from caching but use more memory. Each cached entry stores the complete scan plan response, " | ||
| + "which can be large for tables with many files. A typical scan plan might be several KB to MB " | ||
| + "depending on table size.") | ||
| .version(ConfigConstants.VERSION_1_1_0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using .checkValue(value -> value > 0, ConfigConstants.POSITIVE_NUMBER_ERROR_MSG) to simplify the validation check in the following code?
| new ConfigBuilder(IcebergConstants.SCAN_PLAN_CACHE_EXPIRE_MINUTES) | ||
| .doc( | ||
| "Time in minutes after which cached scan plans expire if not accessed. Cached entries are automatically removed after this period of inactivity.") | ||
| .version(ConfigConstants.VERSION_1_1_0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same to above comment
|
|
||
| | Configuration item | Description | Default value | Required | Since Version | | ||
| |------------------------------------------------------------|----------------------------------------------------------|---------------|----------|---------------| | ||
| | `gravitino.iceberg-rest.scan-plan-cache-impl` | The implementation of the scan plan cache. | (none) | No | 1.1.0 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we disable cache by default? mainly for memory and data correctness concerns
| String impl = config.get(IcebergConfig.SCAN_PLAN_CACHE_IMPL); | ||
| if (StringUtils.isBlank(impl)) { | ||
| LOG.info("Scan plan cache is not configured, using default LocalScanPlanCache"); | ||
| return new LocalScanPlanCache( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems we should pass cache capacity and expire time to other impls too.
|
LGTM except minor comments |
What changes were proposed in this pull request?
Support scan planning endpoint for Iceberg REST server
Why are the changes needed?
Fix: #(9048)
Does this PR introduce any user-facing change?
N/A
How was this patch tested?
org.apache.gravitino.iceberg.service.rest.TestScanPlanCache