@sunxiaojian
Collaborator

@sunxiaojian sunxiaojian commented Oct 31, 2025

What changes were proposed in this pull request?

Support scan planning endpoint for Iceberg REST server

Why are the changes needed?

Fix: #9048

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

org.apache.gravitino.iceberg.service.rest.TestScanPlanCache

@sunxiaojian sunxiaojian force-pushed the issue-8311 branch 2 times, most recently from 147b989 to 3b9a636 Compare October 31, 2025 08:26
@sunxiaojian
Collaborator Author

Please wait for me to fix it.

@sunxiaojian
Collaborator Author

sunxiaojian commented Nov 2, 2025

Please wait for me to fix it.

Done. Upgraded Iceberg to version 1.10.0 and upgraded the Hadoop dependency to Hadoop 3.

@sunxiaojian sunxiaojian force-pushed the issue-8311 branch 5 times, most recently from 405d548 to b38bb69 Compare November 5, 2025 10:22
@FANNG1
Contributor

FANNG1 commented Nov 7, 2025

Thanks @sunxiaojian, it's an amazing feature and the overall architecture looks good to me. Would you like to split this PR into three parts to make each one simpler to review?

  1. upgrade Iceberg version
  2. support scan planning interface
  3. scan planning cache support

@sunxiaojian
Collaborator Author

Thanks @sunxiaojian, it's an amazing feature and the overall architecture looks good to me. Would you like to split this PR into three parts to make each one simpler to review?

  1. upgrade Iceberg version
  2. support scan planning interface
  3. scan planning cache support

ok, I'll handle it.

@sunxiaojian
Collaborator Author

sunxiaojian commented Nov 9, 2025

Thanks @sunxiaojian, it's an amazing feature and the overall architecture looks good to me. Would you like to split this PR into three parts to make each one simpler to review?

  1. upgrade Iceberg version
  2. support scan planning interface
  3. scan planning cache support

ok, I'll handle it.

  1. upgrade Iceberg version #9049
  2. support scan planning interface #9050
  3. Current PR.

Because the PRs are interdependent, they need to be merged in order. After each PR is merged, the remaining PRs need to pull in the updated main branch.

@sunxiaojian sunxiaojian changed the title from "[#8311] feat(iceberg-rest-catalog):Support scan planning endpoint for Iceberg REST server" to "[#9048] feat(iceberg-rest-catalog): Add cache for scan planning." on Nov 27, 2025
@sunxiaojian sunxiaojian force-pushed the issue-8311 branch 3 times, most recently from 0437361 to 1bd2b9c Compare November 27, 2025 10:35
@sunxiaojian
Collaborator Author

@FANNG1 I have already rebased, PTAL.

Contributor

Copilot AI left a comment

Pull request overview

This PR implements a caching mechanism for Iceberg REST scan planning to improve query performance by caching scan plan results. The implementation adds a new ScanPlanCache class using Caffeine cache with configurable capacity and expiration, integrates it into the catalog wrapper, and adds extensive tests for the scan planning endpoint.

Key changes:

  • New scan plan cache using Caffeine with automatic expiration and cleanup
  • Configuration options for cache capacity and expiration time
  • Integration with the Iceberg REST catalog wrapper for scan planning operations
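
For readers unfamiliar with the pattern, the capacity and expiration behaviour listed above can be sketched with the JDK alone. This is not the PR's ScanPlanCache and it does not use Caffeine; the class name `ExpiringLruCacheSketch`, the injected clock, and the unit-less expiry value are assumptions made purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical JDK-only stand-in for a bounded, expire-after-access cache.
// It is NOT the PR's ScanPlanCache and does not use Caffeine.
final class ExpiringLruCacheSketch<K, V> {
  // Holds a cached value together with the time it was last read or written.
  private static final class Slot<T> {
    final T value;
    long lastAccess;

    Slot(T value, long now) {
      this.value = value;
      this.lastAccess = now;
    }
  }

  private final long expireAfterAccess;
  private final LongSupplier clock;
  private final Map<K, Slot<V>> slots;

  ExpiringLruCacheSketch(final int capacity, long expireAfterAccess, LongSupplier clock) {
    if (capacity <= 0 || expireAfterAccess <= 0) {
      throw new IllegalArgumentException("capacity and expiry must be positive");
    }
    this.expireAfterAccess = expireAfterAccess;
    this.clock = clock;
    // accessOrder=true keeps the least recently used entry first, so it is evicted first.
    this.slots =
        new LinkedHashMap<K, Slot<V>>(16, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<K, Slot<V>> eldest) {
            return size() > capacity;
          }
        };
  }

  // Returns the cached value, or runs the loader on a miss or after expiry.
  // Expired entries are replaced lazily on access; there is no background cleanup.
  synchronized V get(K key, Supplier<V> loader) {
    long now = clock.getAsLong();
    Slot<V> slot = slots.get(key);
    if (slot != null && now - slot.lastAccess < expireAfterAccess) {
      slot.lastAccess = now;
      return slot.value;
    }
    V value = loader.get();
    slots.put(key, new Slot<>(value, now));
    return value;
  }

  synchronized int entryCount() {
    return slots.size();
  }
}
```

Passing the clock as a `LongSupplier` is only a convenience so that expiry can be exercised without sleeping; a production cache would more likely rely on a library such as Caffeine, as the PR does.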

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 18 comments.

| File | Description |
|------|-------------|
| iceberg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/ScanPlanCache.java | New cache implementation with Caffeine backend, custom cache key based on table identifier and scan parameters |
| iceberg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/CatalogWrapperForREST.java | Integration of scan plan cache into planTableScan method (but with critical bug: cache checked after expensive operation) |
| iceberg/iceberg-common/src/main/java/org/apache/gravitino/iceberg/common/IcebergConfig.java | Added configuration entries for cache capacity and expiration |
| catalogs/catalog-common/src/main/java/org/apache/gravitino/catalog/lakehouse/iceberg/IcebergConstants.java | Added constant definitions for cache configuration keys |
| iceberg/iceberg-rest-server/src/test/java/org/apache/gravitino/iceberg/service/rest/TestPlanTableScan.java | New test file with tests for scan planning endpoint (missing actual cache tests) |
| docs/iceberg-rest-service.md | Documentation for scan plan cache configuration and usage |
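
The note on CatalogWrapperForREST.java is about ordering: a cache only saves work if it is consulted before the expensive planning step. A minimal sketch of that ordering follows, under stated assumptions. `PlanLookupOrderSketch`, the string-valued plan, and the key made of table name, snapshot ID, and filter are all hypothetical; the review only says the real key is based on the table identifier and scan parameters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of "look up first, plan only on a miss".
// The names and the key layout are assumptions, not the PR's code.
final class PlanLookupOrderSketch {
  private final Map<List<Object>, String> cache = new ConcurrentHashMap<>();

  // Counts how often the expensive step actually runs.
  final AtomicInteger plannerCalls = new AtomicInteger();

  // Stand-in for the expensive step (for example, reading manifests to build scan tasks).
  private String planExpensively(String table, long snapshotId, String filter) {
    plannerCalls.incrementAndGet();
    return "plan(" + table + "@" + snapshotId + ", " + filter + ")";
  }

  String planTableScan(String table, long snapshotId, String filter) {
    // The list gives value-based equals/hashCode, so equal requests share one key.
    List<Object> key = Arrays.asList(table, snapshotId, filter);
    // computeIfAbsent checks the cache first and invokes the planner only on a miss.
    return cache.computeIfAbsent(key, k -> planExpensively(table, snapshotId, filter));
  }
}
```

With this shape, two identical requests trigger the planner once, while a request for a different snapshot produces a separate entry.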

@sunxiaojian sunxiaojian force-pushed the issue-8311 branch 4 times, most recently from c33f90e to a782498 Compare November 28, 2025 05:23
@sunxiaojian
Collaborator Author

@FANNG1 fixed.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 13 comments.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

+ "from caching but use more memory. Each cached entry stores the complete scan plan response, "
+ "which can be large for tables with many files. A typical scan plan might be several KB to MB "
+ "depending on table size.")
.version(ConfigConstants.VERSION_1_1_0)
Contributor

How about using .checkValue(value -> value > 0, ConfigConstants.POSITIVE_NUMBER_ERROR_MSG) to simplify the validation check in the following code?
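
To illustrate the idea behind the `checkValue` suggestion, here is a small self-contained sketch of attaching a predicate and an error message to a config entry so that validation happens in one place when the value is read. `CheckedConfigSketch`, `IntEntry`, the key name, and the text of the error message are all hypothetical; this is not Gravitino's `ConfigBuilder`, whose actual API should be checked in the codebase.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of a validated config entry; not Gravitino's ConfigBuilder.
final class CheckedConfigSketch {
  // Placeholder text; the real ConfigConstants.POSITIVE_NUMBER_ERROR_MSG may differ.
  static final String POSITIVE_NUMBER_ERROR_MSG = "The value must be a positive number";

  static final class IntEntry {
    private final String key;
    private final int defaultValue;
    private Predicate<Integer> validator = v -> true;
    private String errorMsg = "";

    IntEntry(String key, int defaultValue) {
      this.key = key;
      this.defaultValue = defaultValue;
    }

    // Registers the check once, instead of repeating if-statements at every call site.
    IntEntry checkValue(Predicate<Integer> validator, String errorMsg) {
      this.validator = validator;
      this.errorMsg = errorMsg;
      return this;
    }

    // Parses the property (or falls back to the default) and applies the check.
    int read(Map<String, String> props) {
      String raw = props.get(key);
      int value = raw == null ? defaultValue : Integer.parseInt(raw.trim());
      if (!validator.test(value)) {
        throw new IllegalArgumentException(key + ": " + errorMsg);
      }
      return value;
    }
  }
}
```

An entry built as `new CheckedConfigSketch.IntEntry("example.cache-expire-minutes", 60).checkValue(v -> v > 0, CheckedConfigSketch.POSITIVE_NUMBER_ERROR_MSG)` would then reject zero or negative values when read, with no extra validation code in the caller.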

new ConfigBuilder(IcebergConstants.SCAN_PLAN_CACHE_EXPIRE_MINUTES)
.doc(
"Time in minutes after which cached scan plans expire if not accessed. Cached entries are automatically removed after this period of inactivity.")
.version(ConfigConstants.VERSION_1_1_0)
Contributor

Same as the comment above.


| Configuration item | Description | Default value | Required | Since Version |
|------------------------------------------------------------|----------------------------------------------------------|---------------|----------|---------------|
| `gravitino.iceberg-rest.scan-plan-cache-impl` | The implementation of the scan plan cache. | (none) | No | 1.1.0 |
Contributor

Could we disable the cache by default? Mainly because of memory and data correctness concerns.

String impl = config.get(IcebergConfig.SCAN_PLAN_CACHE_IMPL);
if (StringUtils.isBlank(impl)) {
LOG.info("Scan plan cache is not configured, using default LocalScanPlanCache");
return new LocalScanPlanCache(
Contributor

It seems we should pass the cache capacity and expiration time to other implementations too.
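
The two review points on the factory (off by default, and the same settings for every implementation) could be combined roughly as follows. This is only a sketch under assumptions: `ScanPlanCacheFactorySketch`, `PlanCache`, `NoopCache`, `LocalCache`, and the registry keyed by a short name are all hypothetical, and the real code may well load implementations by class name instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical factory sketch; none of these types are Gravitino classes.
final class ScanPlanCacheFactorySketch {
  interface PlanCache {
    String describe();
  }

  // Returned when no implementation is configured, so caching is off by default.
  static final class NoopCache implements PlanCache {
    @Override
    public String describe() {
      return "disabled";
    }
  }

  static final class LocalCache implements PlanCache {
    private final int capacity;
    private final int expireMinutes;

    LocalCache(int capacity, int expireMinutes) {
      this.capacity = capacity;
      this.expireMinutes = expireMinutes;
    }

    @Override
    public String describe() {
      return "local(capacity=" + capacity + ", expireMinutes=" + expireMinutes + ")";
    }
  }

  // Every registered implementation is constructed from the same two settings.
  private static final Map<String, BiFunction<Integer, Integer, PlanCache>> IMPLS =
      new HashMap<>();

  static {
    IMPLS.put("local", LocalCache::new);
  }

  static PlanCache create(String impl, int capacity, int expireMinutes) {
    if (impl == null || impl.trim().isEmpty()) {
      return new NoopCache();
    }
    BiFunction<Integer, Integer, PlanCache> constructor = IMPLS.get(impl.trim());
    if (constructor == null) {
      throw new IllegalArgumentException("Unknown scan plan cache impl: " + impl);
    }
    return constructor.apply(capacity, expireMinutes);
  }
}
```

Routing every implementation through one constructor signature keeps capacity and expiration from being silently ignored by non-default implementations.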

@FANNG1
Contributor

FANNG1 commented Dec 9, 2025

LGTM except minor comments

@FANNG1 FANNG1 added the branch-1.1 Automatically cherry-pick commit to branch-1.1 label Dec 9, 2025