Commit af08e21

multi-pack-index: prepare 'repack' subcommand
In an environment where the multi-pack-index is useful, it is due to many pack-files and an inability to repack the object store into a single pack-file. However, it is likely that many of these pack-files are rather small, and could be repacked into a slightly larger pack-file without too much effort. It may also be important to ensure the object store is highly available and the repack operation does not interrupt concurrent git commands.

Introduce a 'repack' subcommand to 'git multi-pack-index' that takes a '--batch-size' option. The verb will inspect the multi-pack-index for referenced pack-files whose size is smaller than the batch size, until collecting a list of pack-files whose sizes sum to larger than the batch size. Then, a new pack-file will be created containing the objects from those pack-files that are referenced by the multi-pack-index. The resulting pack is likely to actually be smaller than the batch size due to compression and the fact that there may be objects in the pack-files that have duplicate copies in other pack-files.

The current change introduces the command-line arguments, and we add a test that ensures we parse these options properly. Since we specify a small batch size, we will guarantee that future implementations do not change the list of pack-files.

Signed-off-by: Derrick Stolee <[email protected]>
1 parent 1c4af93 commit af08e21

5 files changed: +37 -1 lines changed

Documentation/git-multi-pack-index.txt

Lines changed: 11 additions & 0 deletions

@@ -36,6 +36,17 @@ expire::
 	have no objects referenced by the MIDX. Rewrite the MIDX file
 	afterward to remove all references to these pack-files.
 
+repack::
+	Collect a batch of pack-files whose size are all at most the
+	size given by --batch-size, but whose sizes sum to larger
+	than --batch-size. The batch is selected by greedily adding
+	small pack-files starting with the oldest pack-files that fit
+	the size. Create a new pack-file containing the objects the
+	multi-pack-index indexes into those pack-files, and rewrite
+	the multi-pack-index to contain that pack-file. A later run
+	of 'git multi-pack-index expire' will delete the pack-files
+	that were part of this batch.
+
 
 EXAMPLES
 --------
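
The greedy batch selection described in the `repack` documentation can be sketched as follows. This is a minimal illustration, not the implementation (at this commit `midx_repack()` is still a stub). The `struct pack_info` type and the `select_repack_batch()` function are hypothetical names. The sketch assumes the packs are already ordered oldest first. It also assumes the commit message's reading that a candidate must be strictly smaller than the batch size, where the documentation text says "at most".

```c
#include <stddef.h>

/* Hypothetical record for illustration; not a type from midx.c. */
struct pack_info {
	const char *name;
	size_t size;	/* on-disk size in bytes */
};

/*
 * Walk packs[] (assumed sorted oldest first) and mark batch members in
 * selected[]. Candidates must be strictly smaller than batch_size (an
 * assumption taken from the commit message). Selection stops once the
 * running total exceeds batch_size.
 *
 * Returns the number of packs selected, or 0 (with selected[] cleared)
 * when the candidates never sum to more than batch_size.
 */
size_t select_repack_batch(const struct pack_info *packs, size_t nr,
			   size_t batch_size, int *selected)
{
	size_t i, total = 0, count = 0;

	for (i = 0; i < nr; i++)
		selected[i] = 0;

	for (i = 0; i < nr && total <= batch_size; i++) {
		if (packs[i].size >= batch_size)
			continue;	/* too large to be a candidate */
		selected[i] = 1;
		total += packs[i].size;
		count++;
	}

	if (total <= batch_size) {
		/* Not enough small packs to justify a repack. */
		for (i = 0; i < nr; i++)
			selected[i] = 0;
		return 0;
	}
	return count;
}
```

Under these assumptions, passing the size of the smallest pack as the batch size selects nothing, since no pack is strictly smaller than the minimum.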

builtin/multi-pack-index.c

Lines changed: 9 additions & 1 deletion

@@ -5,12 +5,13 @@
 #include "midx.h"
 
 static char const * const builtin_multi_pack_index_usage[] = {
-	N_("git multi-pack-index [--object-dir=<dir>] (write|verify|expire)"),
+	N_("git multi-pack-index [--object-dir=<dir>] (write|verify|expire|repack --batch-size=<size>)"),
 	NULL
 };
 
 static struct opts_multi_pack_index {
 	const char *object_dir;
+	unsigned long batch_size;
 } opts;
 
 int cmd_multi_pack_index(int argc, const char **argv,
@@ -19,6 +20,8 @@ int cmd_multi_pack_index(int argc, const char **argv,
 	static struct option builtin_multi_pack_index_options[] = {
 		OPT_FILENAME(0, "object-dir", &opts.object_dir,
 		  N_("object directory containing set of packfile and pack-index pairs")),
+		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
+		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
 		OPT_END(),
 	};
 
@@ -40,6 +43,11 @@ int cmd_multi_pack_index(int argc, const char **argv,
 		return 1;
 	}
 
+	if (!strcmp(argv[0], "repack"))
+		return midx_repack(opts.object_dir, (size_t)opts.batch_size);
+	if (opts.batch_size)
+		die(_("--batch-size option is only for 'repack' verb"));
+
 	if (!strcmp(argv[0], "write"))
 		return write_midx_file(opts.object_dir);
 	if (!strcmp(argv[0], "verify"))
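
The `--batch-size` option above is declared with `OPT_MAGNITUDE`, which accepts a non-negative integer with an optional `k`, `m`, or `g` suffix. As a rough standalone illustration of that kind of parsing, and not Git's own code, a helper might look like the sketch below. The name `parse_magnitude()` is hypothetical, and the overflow handling is an assumption about reasonable behavior.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "123", "4k", "2m" or "1g" into *out,
 * scaling by powers of 1024. Returns 0 on success and -1 on malformed
 * input or overflow. This is a sketch, not parse-options internals.
 */
int parse_magnitude(const char *s, unsigned long *out)
{
	char *end;
	unsigned long val, factor = 1;

	/* Require a leading digit: rejects "", "-5", and leading spaces. */
	if (!s || !isdigit((unsigned char)*s))
		return -1;

	errno = 0;
	val = strtoul(s, &end, 10);
	if (errno == ERANGE)
		return -1;

	switch (tolower((unsigned char)*end)) {
	case 'k':
		factor = 1024UL;
		end++;
		break;
	case 'm':
		factor = 1024UL * 1024;
		end++;
		break;
	case 'g':
		factor = 1024UL * 1024 * 1024;
		end++;
		break;
	case '\0':
		break;
	default:
		return -1;	/* unknown suffix */
	}

	if (*end != '\0')
		return -1;	/* trailing characters after the suffix */
	if (val > ULONG_MAX / factor)
		return -1;	/* scaled value would overflow */

	*out = val * factor;
	return 0;
}
```

With a helper of this shape, `--batch-size=2k` would yield 2048, which the dispatch code then casts to `size_t` before calling `midx_repack()`.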

midx.c

Lines changed: 5 additions & 0 deletions

@@ -1110,3 +1110,8 @@ int expire_midx_packs(const char *object_dir)
 	string_list_clear(&packs_to_drop, 0);
 	return result;
 }
+
+int midx_repack(const char *object_dir, size_t batch_size)
+{
+	return 0;
+}

midx.h

Lines changed: 1 addition & 0 deletions

@@ -50,6 +50,7 @@ int write_midx_file(const char *object_dir);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(const char *object_dir);
 int expire_midx_packs(const char *object_dir);
+int midx_repack(const char *object_dir, size_t batch_size);
 
 void close_midx(struct multi_pack_index *m);
 

t/t5319-multi-pack-index.sh

Lines changed: 11 additions & 0 deletions

@@ -410,4 +410,15 @@ test_expect_success 'expire removes unreferenced packs' '
 	)
 '
 
+test_expect_success 'repack with minimum size does not alter existing packs' '
+	(
+		cd dup &&
+		ls .git/objects/pack >expect &&
+		MINSIZE=$(ls -l .git/objects/pack/*pack | awk "{print \$5;}" | sort -n | head -n 1) &&
+		git multi-pack-index repack --batch-size=$MINSIZE &&
+		ls .git/objects/pack >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
