Added BigQuery Metastore Catalog #2068

Open: wants to merge 4 commits into base: main

@rambleraptor (Contributor) commented Jun 5, 2025

Rationale for this change

This PR brings BigQuery Metastore support to Python after it was merged into the Java implementation.

This allows Iceberg catalog functionality to be backed by BigQuery. It supports creating/deleting/listing namespaces (datasets in BigQuery terminology), creating/deleting/listing tables, and registering tables.

This is my first PR of size to iceberg-python, so any advice would be appreciated!

Are these changes tested?

Integration and unit tests included.

Are there any user-facing changes?

Introduces a new Catalog type.
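For readers following along, here is a minimal sketch of how the new catalog type might be configured from Python. The property keys are the ones quoted from this PR's diff; the `"bigquery"` value for `type`, and the helper names `bigquery_catalog_properties` and `open_catalog`, are assumptions made for illustration and may not match the merged code.

```python
from typing import Dict, Optional

# Property keys as quoted from this PR's diff. Reviewers note that upstream
# Java uses different names, so these may change before merge.
GCP_PROJECT_ID = "gcp.project-id"
GCP_LOCATION = "gcp.location"
GCP_CREDENTIALS_LOCATION = "gcp.credentials-location"


def bigquery_catalog_properties(
    project_id: str,
    location: Optional[str] = None,
    credentials_location: Optional[str] = None,
) -> Dict[str, str]:
    """Build a property dict for the catalog (hypothetical helper)."""
    # Assumption: the catalog is registered under the type name "bigquery".
    props = {"type": "bigquery", GCP_PROJECT_ID: project_id}
    if location:
        props[GCP_LOCATION] = location
    if credentials_location:
        props[GCP_CREDENTIALS_LOCATION] = credentials_location
    return props


def open_catalog(name: str, properties: Dict[str, str]):
    """Load the catalog; needs a PyIceberg build that includes this PR."""
    # Imported lazily so the property helper works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **properties)
```

With such a build installed, `open_catalog("bq", bigquery_catalog_properties("my-project", location="US"))` would return a catalog on which the namespace and table operations listed above can be called.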

@rambleraptor rambleraptor marked this pull request as draft June 5, 2025 23:02
@rambleraptor rambleraptor marked this pull request as ready for review June 5, 2025 23:03
@djouallah

@rambleraptor User here. I recently tried BigQuery Metastore after it was announced that it had added a REST API interface. I am just wondering: why do you need to add dedicated support to PyIceberg if it is already reachable through the REST API?

@jayceslesar (Contributor) commented Jun 28, 2025

It might be wise to also include the changes from apache/iceberg#13052, which I found via apache/iceberg#12808 (comment).

Edit: I see now that this is already supported via gcp.credentials-location. Maybe unify on the name upstream has planned, gcp.bigquery.credential-file?

Comment on lines +46 to +49
GCP_PROJECT_ID = "gcp.project-id"
GCP_LOCATION = "gcp.location"
GCP_CREDENTIALS_LOCATION = "gcp.credentials-location"
GCP_CREDENTIALS_INFO = "gcp.credentials-info"

Upstream defines these as:

public static final String CREDENTIAL_FILE = "gcp.bigquery.credential-file";
public static final String PROJECT_ID = "gcp.bigquery.project-id";
public static final String GCP_LOCATION = "gcp.bigquery.location";
public static final String LIST_ALL_TABLES = "gcp.bigquery.list-all-tables";
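One way to reconcile the two naming schemes would be to accept the upstream keys as aliases. The sketch below is only an illustration of that idea, not something the PR implements: `UPSTREAM_TO_PR_KEYS` and `normalize_properties` are hypothetical names, and the pairing of `gcp.bigquery.credential-file` with `gcp.credentials-location` is an assumption based on the earlier comment in this thread.

```python
from typing import Dict, Mapping

# Upstream (Java) key -> key used in this PR's diff.
# Assumption: only keys with an obvious counterpart are mapped;
# gcp.bigquery.list-all-tables has no equivalent in the quoted snippet.
UPSTREAM_TO_PR_KEYS = {
    "gcp.bigquery.project-id": "gcp.project-id",
    "gcp.bigquery.location": "gcp.location",
    "gcp.bigquery.credential-file": "gcp.credentials-location",
}


def normalize_properties(properties: Mapping[str, str]) -> Dict[str, str]:
    """Rewrite upstream-style keys to the PR's keys, rejecting conflicts."""
    normalized = dict(properties)
    for upstream_key, pr_key in UPSTREAM_TO_PR_KEYS.items():
        if upstream_key not in normalized:
            continue
        value = normalized.pop(upstream_key)
        # If both spellings are present they must agree.
        if pr_key in normalized and normalized[pr_key] != value:
            raise ValueError(f"Conflicting values for {upstream_key} and {pr_key}")
        normalized[pr_key] = value
    return normalized
```

Unrelated keys pass through unchanged, so the function can be applied to the full property dict before the catalog reads it.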

Comment on lines +77 to +81
gcp_credentials = None
if credentials_location:
    gcp_credentials = service_account.Credentials.from_service_account_file(credentials_location)
elif credentials_info_str:
    try:

credentials_location takes precedence if the user specifies both. Do we want to allow a user to specify both? I'm not sure.
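If the answer is no, the choice could be made explicit before any credentials are built. This is a rough sketch of that alternative rather than the PR's behavior: `select_credential_source` is a hypothetical helper, and parsing gcp.credentials-info as a JSON string is an assumption, since the quoted diff is cut off at the `try:`.

```python
import json
from typing import Any, Mapping, Optional, Tuple


def select_credential_source(properties: Mapping[str, str]) -> Optional[Tuple[str, Any]]:
    """Return ("file", path), ("info", dict), or None; reject ambiguous input."""
    location = properties.get("gcp.credentials-location")
    info_str = properties.get("gcp.credentials-info")

    # Fail loudly instead of silently preferring one property over the other.
    if location and info_str:
        raise ValueError(
            "Set only one of gcp.credentials-location and gcp.credentials-info"
        )
    if location:
        return ("file", location)
    if info_str:
        try:
            # Assumption: the info property carries service-account JSON.
            return ("info", json.loads(info_str))
        except json.JSONDecodeError as exc:
            raise ValueError("gcp.credentials-info is not valid JSON") from exc
    # Neither property set: the caller can fall back to default credentials.
    return None
```

The caller would then pass the path or the parsed dict to the matching `service_account.Credentials` constructor.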

Are special credentials needed to run this in CI?
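If they are, one common pattern is to skip the integration tests when no credentials are configured, so CI stays green without secrets. The sketch below assumes the standard `GOOGLE_APPLICATION_CREDENTIALS` environment variable is how credentials would be supplied; the helper and test class names are hypothetical and do not come from this PR.

```python
import os
import unittest
from typing import Mapping


def has_gcp_credentials(env: Mapping[str, str]) -> bool:
    """True when a service-account key path is configured in the environment."""
    return bool(env.get("GOOGLE_APPLICATION_CREDENTIALS"))


@unittest.skipUnless(
    has_gcp_credentials(os.environ),
    "GOOGLE_APPLICATION_CREDENTIALS is not set; skipping BigQuery integration tests",
)
class BigQueryCatalogIntegrationTest(unittest.TestCase):
    def test_credentials_present(self) -> None:
        # Placeholder body: a real test would create and drop a namespace here.
        self.assertTrue(has_gcp_credentials(os.environ))
```

Unit tests that mock the BigQuery client would stay outside this class and run unconditionally.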

3 participants