-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Hook up ReleaseFile
s to the cleanup/deletion script
#95870
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9206027
to
6cab655
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As cursor points out, the --organization
flag is only used for per-org model deletes but not for other models that would technically support filtering by organization. I think this can also be added in a follow-up to keep the change small, but we should be clear in the CLI doc about it. For example "Limit truncation to only entries from organization, applies only to release files".
Codecov ReportAttention: Patch coverage is ✅ All tests successful. No failed tests found.
Additional details and impacted files@@ Coverage Diff @@
## master #95870 +/- ##
==========================================
+ Coverage 80.83% 80.84% +0.01%
==========================================
Files 10592 10592
Lines 610104 610109 +5
Branches 23993 23993
==========================================
+ Hits 493167 493233 +66
+ Misses 116227 116166 -61
Partials 710 710 |
As the `ReleaseFile` is cross-project, and has no index on the newly added date columns, I have duplicated the per-project deletion code to rather run per-organization.
6cab655
to
880a963
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Bulk Deletion Ignores Organization Filter
When the cleanup
command is executed with the --organization
flag, the run_bulk_query_deletes
function does not filter deletions by the specified organization. This occurs because run_bulk_query_deletes
does not accept or pass the organization_id
to the BulkDeleteQuery
constructor. As a result, models handled by this function (e.g., UserReport
, GroupEmailThread
) are deleted from all organizations instead of being limited to the target organization.
src/sentry/runner/commands/cleanup.py#L234-L235
sentry/src/sentry/runner/commands/cleanup.py
Lines 234 to 235 in 880a963
run_bulk_query_deletes(bulk_query_deletes, is_filtered, days, project, project_id) |
src/sentry/runner/commands/cleanup.py#L246-L253
sentry/src/sentry/runner/commands/cleanup.py
Lines 246 to 253 in 880a963
q = BulkDeleteQuery( | |
model=model_tp, | |
dtfield=dtfield, | |
days=days, | |
project_id=project_id, | |
order_by=order_by, | |
) | |
src/sentry/runner/commands/cleanup.py#L478-L520
sentry/src/sentry/runner/commands/cleanup.py
Lines 478 to 520 in 880a963
(GroupEmailThread, "date", None), | |
] + additional_bulk_query_deletes | |
return BULK_QUERY_DELETES | |
def run_bulk_query_deletes( | |
bulk_query_deletes: list[tuple[type[Model], str, str | None]], | |
is_filtered: Callable[[type[Model]], bool], | |
days: int, | |
project: str | None, | |
project_id: int | None, | |
) -> None: | |
from sentry.db.deletion import BulkDeleteQuery | |
debug_output("Running bulk query deletes in bulk_query_deletes") | |
for model_tp, dtfield, order_by in bulk_query_deletes: | |
chunk_size = 10000 | |
debug_output(f"Removing {model_tp.__name__} for days={days} project={project or '*'}") | |
if is_filtered(model_tp): | |
debug_output(">> Skipping %s" % model_tp.__name__) | |
else: | |
BulkDeleteQuery( | |
model=model_tp, | |
dtfield=dtfield, | |
days=days, | |
project_id=project_id, | |
order_by=order_by, | |
).execute(chunk_size=chunk_size) | |
def prepare_deletes_by_project( | |
project: str | None, project_id: int | None, is_filtered: Callable[[type[Model]], bool] | |
) -> tuple[QuerySet[Any] | None, list[tuple[Any, str, str]]]: | |
from sentry.constants import ObjectStatus | |
from sentry.models.debugfile import ProjectDebugFile | |
from sentry.models.group import Group | |
from sentry.models.project import Project | |
# Deletions that we run per project. In some cases we can't use an index on just the date | |
# column, so as an alternative we use `(project_id, <date_col>)` instead | |
DELETES_BY_PROJECT = [ |
Was this report helpful? Give feedback by reacting with 👍 or 👎
As the
ReleaseFile
is cross-project, and has no index on the newly added date columns,I have duplicated the per-project deletion code to rather run per-organization.
I’m not entirely confident in this TBH, and I wonder if this has any tests?