How much space in Git does a git-annex file use?
- When I store a file in git annex, how much space does it use in the Git repository?
- TL;DR about 400 packfile bytes/file at scale, plus another 100 per copy or drop until `git annex forget` (a rough worked example follows this list).
- What is a sensible value for `annex.largefiles`?
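As a rough worked example from just those TL;DR figures (the workload of 100,000 files recorded in two repositories is an assumption, not something measured below):

```
# 100000 annexed files, each with location log entries for two repositories,
# at ~400 packfile bytes per file plus ~100 per recorded copy:
echo $(( 100000 * (400 + 2 * 100) ))   # 60000000 bytes, i.e. roughly 60 MB of packfile
```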
The experiment
- Make an empty git annex repository
- Gather disk usage and other repository size information after each change (a sketch of the measurement loop follows this list)
- Add small files in batches of 1000
- Copy to a special remote
- Draw graphs
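The loop looked roughly like this. It is a minimal sketch rather than the actual scripts (those are in the bundle at the end); the directory special remote, paths and batch counts are stand-ins.

```
set -e
git init sizetest && cd sizetest
git annex init "size test"
git annex initremote usb type=directory directory=/tmp/usb-remote encryption=none

n=0
for batch in $(seq 1 50); do
    # 1000 tiny files per batch, each containing "$n\n"
    for i in $(seq 1 1000); do
        n=$((n + 1))
        printf '%s\n' "$n" > "$n"
    done
    git annex add .
    git commit -q -m "batch $batch"
    # normalise to a single packfile before measuring
    git gc --aggressive --prune=now --quiet
    du -cb .git/objects/pack/*.pack | tail -n 1 >> ../packsize.log
done

# then add and remove copies, re-measuring after each step
git annex copy --to=usb .
git annex drop --from=usb .
```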
What does it look like?
- While adding the tiny files in batches of 1000, it looks so linear it’s dull
- Dividing the size of the packfile by the number of files gives a cost per file. As the number of files increases, the packfile cost per file decreases asymptotically (see the snippet after this list)
- With 50k files present, adding or removing a copy of all files also looks fairly linear
The return to three copies would not look out of place if it were mirrored over to five copies.
- Files also take fewer bytes per copy when there are more copies
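Assuming the pack size log from the sketch above, the per-file figure at the 50k point is just a division, for example:

```
pack_bytes=$(du -cb .git/objects/pack/*.pack | tail -n 1 | cut -f 1)
echo "$(( pack_bytes / 50000 )) packfile bytes per file with 50000 files present"
```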
There is probably room for more experiments here, but it is enough for me.
Configuration to store files efficiently?
- In the case of changing sensibly sized text files, plain Git does a great job.
- For write-once files which gzip to over 400 bytes:
  - it might be worth putting them in the annex, so you can choose to not have them available
  - you still have to pay for the symlink checkout (probably rounds up to 4 KiB)
  - file checkouts probably also round up to 4 KiB
  - files present in the annex also have three levels of directory above them (probably 4 KiB each), two of which will be shared when many files are present (the du comparison after this list is one way to check the block overhead)
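One crude way to see that per-file filesystem overhead, assuming GNU du and a filesystem with 4 KiB blocks (this looks at the working tree and annexed objects, not the git history):

```
du -sh --apparent-size .   # logical size of the checkout and annex
du -sh .                   # blocks actually used, including per-file rounding
```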
I haven’t made or tested settings for `annex.largefiles` yet, or considered what sort of experiment to run.
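For what it’s worth, the sort of setting I would expect to test first looks like this; the 100 KiB threshold is an arbitrary assumption, not a recommendation backed by the numbers above.

```
# Untested starting point: annex anything over 100 KiB, keep smaller files in plain git.
git config annex.largefiles 'largerthan=100kb'
# The same expression can instead be versioned in .gitattributes:
#   * annex.largefiles=largerthan=100kb
```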
Are the simplifications realistic?
- Using very small files like `"$n\n"` on `backend=SHA256E`
  - These keys contain the length, which here is a single digit (at most 6); longer files would mean more digits and more variability (see the illustration after this list)
- With no chunking or URLs?
  - `git annex` can store other things with the log, which are not tested here
- Running the add/copy/drop operations close together in time
  - This allows the integer part of the timestamps to compress better; you may pay some extra bytes for operations spread across years.
  - The nanoseconds are probably still just noise in any case.
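For illustration, here is roughly what gets stored per tiny file; the hash and uuid are placeholders, not values from the experiment.

```
git annex lookupkey 7
# SHA256E-s2--<64 hex digits>   (backend, size in bytes, then the hash)

# Each add/copy/drop of the key appends a line like this to its location log
# on the git-annex branch: fractional-second timestamp, present flag, repository uuid.
# 1700000000.123456789s 1 00000000-0000-0000-0000-000000000000
```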
Other caveats:
- Sizes are for one aggressively repacked packfile. When there are loose objects or multiple packfiles, total size will be larger.
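`git count-objects -v` reports loose and packed storage separately, which is the easy way to check whether this caveat applies to a repository before comparing numbers:

```
git count-objects -v
# size:      loose objects, in KiB
# size-pack: packfiles, in KiB
# packs:     how many packfiles there are
git gc --aggressive --prune=now   # collapse everything into one packfile first
```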
How did you do it?
Or “Can I repeat the experiment?”
I did it with some grubby shell scripts and Perl, and filled in the gaps with paste-from-the-documentation one-liners.
Here is the git bundle of it.
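Cloning the bundle should be enough to poke around in it; the filename here is made up, so substitute whatever the bundle is actually called.

```
git clone annex-size-experiment.bundle annex-size-experiment
cd annex-size-experiment
ls
```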