Mem usage and Increment equal for many lines #234

@PiotrZiolo

I was searching for a memory leak in my code and, for debugging purposes, I unrolled a for loop into separate lines. Each of those lines creates an object and stores it in a dict. Here's the report from memory_profiler:

Line #    Mem usage    Increment   Line Contents
================================================
    44 1125.984 MiB 1125.984 MiB       @profile
    45                                 def __init__(self, priors=None, base_classes=None, seed=9):
    46                                     """
    47                                     :param pd.DataFrame priors: DataFrame with three columns: date, hour_of_week,
    48                                         prior. The last column defines priors (in the form of dictionaries)
    49                                         for given dates and hours of week.
    50                                     :param pd.DataFrame base_classes: DataFrame with three columns: date, hour_of_week,
    51                                         base_class. The last column defines priors (in the form of dictionaries)
    52                                         for given dates and hours of week.
    53                                     :param int seed: Seed for the random number generator.
    54                                     """
    55                             
    56 1125.984 MiB    0.000 MiB           self.models = {}  # type: Dict[str, CompetitiveClickProbabilityModule]
    57                             
    58 1125.984 MiB    0.000 MiB           self.seed = seed
    59 1125.984 MiB    0.000 MiB           self.rng = np.random.RandomState(seed)
    60 1125.984 MiB    0.000 MiB           self.priors = priors
    61 1125.984 MiB    0.000 MiB           self.base_classes = base_classes
    62                             
    63 1125.984 MiB    0.000 MiB           seed_min = 100000
    64 1125.984 MiB    0.000 MiB           seed_max = 999999
    65 1125.984 MiB    0.000 MiB           seeds = self.rng.randint(low=seed_min, high=seed_max, size=len(self.priors))
    66                             
    67 1126.316 MiB    0.332 MiB           base_df = priors.copy()
    68 1126.387 MiB    0.070 MiB           base_df.loc[:, "base_class"] = base_classes["base_class"]
    69 1127.254 MiB    0.867 MiB           base_df.loc[:, "seed"] = seeds
    70                             
    71 1127.254 MiB    0.000 MiB           self.models = {}  # type: Dict[str, SimulatorModule]
    72                             
    73 1127.383 MiB 1127.383 MiB           self.models["{}.{}".format(base_df["date"][0], base_df["hour_of_week"][0])] = base_df["base_class"][0](base_df["prior"][0], base_df["seed"][0])
    74 1127.508 MiB 1127.508 MiB           self.models["{}.{}".format(base_df["date"][1], base_df["hour_of_week"][1])] = base_df["base_class"][1](base_df["prior"][1], base_df["seed"][1])
    75 1127.898 MiB 1127.898 MiB           self.models["{}.{}".format(base_df["date"][2], base_df["hour_of_week"][2])] = base_df["base_class"][2](base_df["prior"][2], base_df["seed"][2])
    76 1128.383 MiB 1128.383 MiB           self.models["{}.{}".format(base_df["date"][3], base_df["hour_of_week"][3])] = base_df["base_class"][3](base_df["prior"][3], base_df["seed"][3])
    77 1128.758 MiB 1128.758 MiB           self.models["{}.{}".format(base_df["date"][4], base_df["hour_of_week"][4])] = base_df["base_class"][4](base_df["prior"][4], base_df["seed"][4])
    78 1129.176 MiB 1129.176 MiB           self.models["{}.{}".format(base_df["date"][5], base_df["hour_of_week"][5])] = base_df["base_class"][5](base_df["prior"][5], base_df["seed"][5])
    79 1129.633 MiB 1129.633 MiB           self.models["{}.{}".format(base_df["date"][6], base_df["hour_of_week"][6])] = base_df["base_class"][6](base_df["prior"][6], base_df["seed"][6])
    80 1130.070 MiB 1130.070 MiB           self.models["{}.{}".format(base_df["date"][7], base_df["hour_of_week"][7])] = base_df["base_class"][7](base_df["prior"][7], base_df["seed"][7])
    81 1130.508 MiB 1130.508 MiB           self.models["{}.{}".format(base_df["date"][8], base_df["hour_of_week"][8])] = base_df["base_class"][8](base_df["prior"][8], base_df["seed"][8])
    82 1130.883 MiB 1130.883 MiB           self.models["{}.{}".format(base_df["date"][9], base_df["hour_of_week"][9])] = base_df["base_class"][9](base_df["prior"][9], base_df["seed"][9])
    83 1131.320 MiB 1131.320 MiB           self.models["{}.{}".format(base_df["date"][10], base_df["hour_of_week"][10])] = base_df["base_class"][10](base_df["prior"][10], base_df["seed"][10])
    84 1131.758 MiB 1131.758 MiB           self.models["{}.{}".format(base_df["date"][11], base_df["hour_of_week"][11])] = base_df["base_class"][11](base_df["prior"][11], base_df["seed"][11])
    85 1132.156 MiB 1132.156 MiB           self.models["{}.{}".format(base_df["date"][12], base_df["hour_of_week"][12])] = base_df["base_class"][12](base_df["prior"][12], base_df["seed"][12])
    86 1132.633 MiB 1132.633 MiB           self.models["{}.{}".format(base_df["date"][13], base_df["hour_of_week"][13])] = base_df["base_class"][13](base_df["prior"][13], base_df["seed"][13])
    87 1133.008 MiB 1133.008 MiB           self.models["{}.{}".format(base_df["date"][14], base_df["hour_of_week"][14])] = base_df["base_class"][14](base_df["prior"][14], base_df["seed"][14])
    88 1133.434 MiB 1133.434 MiB           self.models["{}.{}".format(base_df["date"][15], base_df["hour_of_week"][15])] = base_df["base_class"][15](base_df["prior"][15], base_df["seed"][15])
    89 1133.883 MiB 1133.883 MiB           self.models["{}.{}".format(base_df["date"][16], base_df["hour_of_week"][16])] = base_df["base_class"][16](base_df["prior"][16], base_df["seed"][16])
    90 1134.320 MiB 1134.320 MiB           self.models["{}.{}".format(base_df["date"][17], base_df["hour_of_week"][17])] = base_df["base_class"][17](base_df["prior"][17], base_df["seed"][17])
    91 1134.758 MiB 1134.758 MiB           self.models["{}.{}".format(base_df["date"][18], base_df["hour_of_week"][18])] = base_df["base_class"][18](base_df["prior"][18], base_df["seed"][18])
    92 1135.141 MiB 1135.141 MiB           self.models["{}.{}".format(base_df["date"][19], base_df["hour_of_week"][19])] = base_df["base_class"][19](base_df["prior"][19], base_df["seed"][19])
    93 1135.570 MiB 1135.570 MiB           self.models["{}.{}".format(base_df["date"][20], base_df["hour_of_week"][20])] = base_df["base_class"][20](base_df["prior"][20], base_df["seed"][20])
    94 1136.008 MiB 1136.008 MiB           self.models["{}.{}".format(base_df["date"][21], base_df["hour_of_week"][21])] = base_df["base_class"][21](base_df["prior"][21], base_df["seed"][21])
    95 1136.418 MiB 1136.418 MiB           self.models["{}.{}".format(base_df["date"][22], base_df["hour_of_week"][22])] = base_df["base_class"][22](base_df["prior"][22], base_df["seed"][22])
    96 1136.883 MiB 1136.883 MiB           self.models["{}.{}".format(base_df["date"][23], base_df["hour_of_week"][23])] = base_df["base_class"][23](base_df["prior"][23], base_df["seed"][23])

As you can see, starting from line 73 the Increment column seems to be broken: it is always equal to Mem usage. Judging by the Mem usage column, each of these increments should be below 1 MiB. Is this a bug, or am I misunderstanding how it is supposed to work?
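
To make the expectation concrete, here is a minimal sketch of the increments implied by the Mem usage column. It assumes that Increment is meant to be the difference between a line's Mem usage and that of the previously profiled line. The values are copied from lines 71 and 73-77 of the report above; the helper name `expected_increments` is hypothetical and not part of memory_profiler.

```python
# Mem usage readings in MiB, copied from the report:
# line 71 first, then lines 73 through 77.
mem_usage = [1127.254, 1127.383, 1127.508, 1127.898, 1128.383, 1128.758]


def expected_increments(usage):
    """Return the difference between each reading and the one before it."""
    # Rounded to three decimals to match the precision of the report.
    return [round(curr - prev, 3) for prev, curr in zip(usage, usage[1:])]


increments = expected_increments(mem_usage)
for line_no, inc in zip(range(73, 78), increments):
    print("line {}: {:.3f} MiB".format(line_no, inc))
```

Under that assumption every increment for these lines is a fraction of a MiB, nowhere near the ~1127 MiB shown in the Increment column.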
