
Loading pipelines containing many nested sorting extractors are non-performant #1819

@DradeAW

Description

Hi,

In my pipeline (Lussac), I have very complex sorting objects, built from tens of UnitsSelectionSorting, tens of UnitsAggregationSorting, and several other curation functions.
Handling those objects is fine, fast, and memory-efficient.
Saving such objects to pickle or JSON format is also fast and memory-efficient (although the JSON file is tens of thousands of lines long and can weigh more than 100 MB ^^).

The problem comes when loading.
If the file was saved with recursive=False, then the loading is fast and efficient.
But if the file was saved with recursive=True, then the loading can take minutes and more than 10 GB of RAM, whereas creating the object takes less than a second and almost no RAM.

In my case, I need to use recursive=True because I am using relative_to when saving, which can only propagate if loaded recursively.
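
To illustrate how the gap between creating and reloading such an object could arise, here is a minimal sketch in plain Python. It is only a guess at the mechanism, not a description of what the library actually does: `Node`, `build`, and `count_nodes` are hypothetical stand-ins, and the assumption is that a recursive dump embeds every parent in full, so a parent shared by several children is written out (and later rebuilt) once per reference.

```python
class Node:
    """Hypothetical stand-in for a sorting object that wraps parent objects."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = tuple(parents)

    def to_dict(self):
        # Naive recursive serialization (assumed behaviour): each parent is
        # embedded in full, so a shared parent is duplicated per reference.
        return {"name": self.name, "parents": [p.to_dict() for p in self.parents]}

    @classmethod
    def from_dict(cls, d):
        # Naive recursive loading: one new object per serialized node.
        return cls(d["name"], [cls.from_dict(p) for p in d["parents"]])


def build(depth):
    """Stack `depth` levels of two selections merged by one aggregation."""
    node = Node("base")
    n_live = 1
    for level in range(depth):
        a = Node(f"select_a_{level}", [node])  # both selections share `node`
        b = Node(f"select_b_{level}", [node])
        node = Node(f"aggregate_{level}", [a, b])
        n_live += 3
    return node, n_live


def count_nodes(d):
    """Count the nodes present in a nested dict produced by to_dict()."""
    return 1 + sum(count_nodes(p) for p in d["parents"])


top, n_live = build(10)
n_serialized = count_nodes(top.to_dict())
print(n_live, n_serialized)
```

With 10 levels the live graph holds 31 objects while the nested dict holds 4093 nodes, and the count roughly doubles with every extra level. If something similar happens here, reusing already-built parents during loading (rather than rebuilding each copy) would be one way to avoid the blow-up.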


Labels: core (Changes to core module), curation (Related to curation module), extractors (Related to extractors module)
