feat: Add Operator Cache for databend #18196

Open: wants to merge 5 commits into main

Conversation

Xuanwo
Member

@Xuanwo Xuanwo commented Jun 19, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Add Operator Cache for databend.

Databend uses OpenDAL to access object storage. When access goes through an IAM role, OpenDAL internally uses reqsign to call AWS's API and obtain a temporary token. However, this API has a rate limit. If users create external tables frequently, hitting that limit can cause queries to fail.

This PR addresses this by introducing an operator cache, which reuses the same operator when the inputs are the same.
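The idea can be illustrated with a minimal sketch that uses only the standard library. It is not the code in this PR: `Params`, `FakeOperator`, `OperatorCache`, and `get_or_create` are hypothetical names, and a plain `HashMap` stands in for whatever cache structure the PR actually uses. The `builds` counter stands in for the expensive credential request that the cache is meant to avoid.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the storage configuration; not Databend's type.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Params {
    pub bucket: String,
    pub role_arn: String,
}

// Hypothetical stand-in for an operator; `id` records which build produced it.
#[derive(Debug)]
pub struct FakeOperator {
    pub id: usize,
}

#[derive(Default)]
struct Inner {
    operators: HashMap<Params, Arc<FakeOperator>>,
    builds: usize,
}

// A cache that hands out the same operator for equal parameters.
#[derive(Default)]
pub struct OperatorCache {
    inner: Mutex<Inner>,
}

impl OperatorCache {
    pub fn get_or_create(&self, params: &Params) -> Arc<FakeOperator> {
        let mut inner = self.inner.lock().unwrap();

        // Cache hit: reuse the existing operator.
        if let Some(op) = inner.operators.get(params) {
            return Arc::clone(op);
        }

        // Cache miss: this is where the expensive setup would happen.
        inner.builds += 1;
        let op = Arc::new(FakeOperator { id: inner.builds });
        inner.operators.insert(params.clone(), Arc::clone(&op));
        op
    }

    // Number of operators actually built, i.e. the number of cache misses.
    pub fn builds(&self) -> usize {
        self.inner.lock().unwrap().builds
    }
}
```

For simplicity this sketch holds the lock while the operator is built, so two concurrent callers with equal parameters never build twice. The trade-off is that one slow build blocks every other lookup.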

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Jun 19, 2025
@BohuTANG BohuTANG requested a review from dantengsky June 20, 2025 08:26
Comment on lines +70 to +91
let cache_key = self.compute_cache_key(params);

// Check if we have a cached operator
{
    let mut cache = self.cache.lock().unwrap();
    if let Some(operator) = cache.get(&cache_key) {
        debug!("Operator cache hit for key: {}", cache_key);
        CACHE_HIT_COUNT.inc();
        return Ok(operator.clone());
    }
}

// Cache miss, create new operator
debug!("Operator cache miss for key: {}", cache_key);
CACHE_MISS_COUNT.inc();

let operator = init_operator_uncached(params)?;

// Insert into cache
{
    let mut cache = self.cache.lock().unwrap();
    cache.put(cache_key, operator.clone());
Member

How about using StorageParams as the cache key? It already implements Hash and Eq (in this PR), so we wouldn't need to worry about the collisions that could occur with the compute_cache_key: StorageParams -> u64 function.

Member Author

Makes sense, let me give it a try.
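
The collision concern raised in this thread can be shown with a small, contrived sketch. Everything here is hypothetical: `Params` is a stand-in type, and `SumHasher` is a deliberately weak hasher chosen so that a collision is easy to produce. It is not the PR's `compute_cache_key`, and a real 64-bit hash would collide far less often. The point is only that a map keyed by a digest cannot tell two colliding inputs apart, while a map keyed by the value itself falls back on `Eq`.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in for the storage parameters.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Params {
    pub bucket: String,
}

// Deliberately weak hasher: it sums the bytes it is fed, so inputs that are
// anagrams of each other produce the same digest.
#[derive(Default)]
pub struct SumHasher(u64);

impl Hasher for SumHasher {
    fn write(&mut self, bytes: &[u8]) {
        for b in bytes {
            self.0 = self.0.wrapping_add(*b as u64);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

// Reduce the parameters to a u64 digest using the weak hasher.
pub fn weak_key(p: &Params) -> u64 {
    let mut h = SumHasher::default();
    p.hash(&mut h);
    h.finish()
}

// Cache keyed by digest: store an entry for `first`, then look up `second`.
pub fn cached_by_digest(first: &Params, second: &Params) -> String {
    let mut cache: HashMap<u64, String> = HashMap::new();
    cache.insert(weak_key(first), format!("operator for {}", first.bucket));
    cache
        .get(&weak_key(second))
        .cloned()
        .unwrap_or_else(|| "miss".to_string())
}

// Cache keyed by the parameters themselves: same store-then-lookup sequence.
pub fn cached_by_params(first: &Params, second: &Params) -> String {
    let mut cache: HashMap<Params, String> = HashMap::new();
    cache.insert(first.clone(), format!("operator for {}", first.bucket));
    cache
        .get(second)
        .cloned()
        .unwrap_or_else(|| "miss".to_string())
}
```

With buckets `"ab"` and `"ba"`, the digest-keyed lookup returns the entry stored for the other bucket, whereas the lookup keyed by `Params` reports a miss. `HashMap` still hashes the key internally, but it confirms every candidate with `Eq` before returning it.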
