Hg-repos 9 is still downloading.
Downloaded 763 GB so far (all three repo sets combined).
I will return to my workstation on June 30th. Hopefully pack 9 is done by then.
Good news: "hg clone" appears to be relatively thread-safe, i.e. the following command worked correctly for me (one thread completed the task, and the other 3 threads failed early, as the destination directory was already created):
hg clone https://bitbucket.org/hudson/magic-lantern & hg clone https://bitbucket.org/hudson/magic-lantern & hg clone https://bitbucket.org/hudson/magic-lantern & hg clone https://bitbucket.org/hudson/magic-lantern
I cannot guarantee it's 100% safe (given enough test runs, you might end up with a messed-up destination directory), but it's likely good enough for parallelizing this stuff. So, I've added a "shuf" in the download script to randomize the list of repos, like this:
for f in $(cat $1 | cut -d ' ' -f 2 | shuf); do
and launched 6 download threads on repo set #09. Fully downloaded in less than 12 hours:

hg-repos-09: 92.33% downloaded, 7.67% errors, 0.00% todo, 17.97 MiB average
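For anyone else doing the same, here's a rough sketch of what such a download loop could look like (my reconstruction, not the actual script; the script name, the clone URL scheme and the .commits step are guesses):

#!/bin/bash
# download.sh (hypothetical name): $1 is a list file whose second
# space-separated field is "owner/repo".
for f in $(cat "$1" | cut -d ' ' -f 2 | shuf); do
    [ -d "$f/.hg" ] && continue                    # another instance already cloned this one
    hg clone "https://bitbucket.org/$f" "$f" \
        && hg log -R "$f" > "$f.commits"           # guess at how the .commits marker gets created
done

Running something like "for i in 1 2 3 4 5 6; do ./download.sh hg-repos-09 & done; wait" starts six instances, and the shuffled lists keep them mostly out of each other's way.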
Also completed 14, 15 and 16, with the same trickery:
hg-repos-14: 95.62% downloaded, 4.38% errors, 0.00% todo, 23.86 MiB average
hg-repos-15: 96.23% downloaded, 3.77% errors, 0.00% todo, 25.55 MiB average
hg-repos-16: 95.51% downloaded, 4.49% errors, 0.00% todo, 29.93 MiB average
I've made effectively no progress since my last post. I got a 7-Zip command line working to archive everything and move it to the synced storage drive, but it is also very slow.
At least you've completed 07 and 13, which nobody else downloaded, so let's focus on those.
First, move the two sets into another directory. You could try this script (run from the working directory); it's currently hardcoded for #09, so customize as needed:
SRC=hg-repos-09               # must be in the working directory
DST=../bitbucket-all-repos-09 # must be outside the working directory, sans trailing slash
for f in $(cat "$SRC" | cut -d ' ' -f 2); do
    if [ -f "$f.commits" ]; then
        echo "$f"
        mkdir -p -- "$DST/$f"
        mv -- "$f/.hg" "$DST/$f/"
        mv -- "$f.commits" "$DST/$f.commits"
    fi
done
On my machine, this took about 2.5 minutes.
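Optional sanity check (my addition, not part of the original steps; still using the $DST variable from the script above): the number of moved .hg directories should match the number of .commits files.

find "$DST" -mindepth 3 -maxdepth 3 -type d -name .hg | wc -l
find "$DST" -maxdepth 2 -type f -name '*.commits' | wc -l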
Then, check how much space this folder uses:
du -b -d 0 -h ../bitbucket-all-repos-09
This took a bit longer (15 minutes) and reported 176G.
Then, one of the fastest ways to turn all of these tiny files (the ones that are locking up your file browser) into a single large file is probably without compression:
cd ../bitbucket-all-repos-09
find -type f -path '*/*/.hg/*' | tar -cvf - -T - > repos-09.tar
On the directory with ML forks, it took 16 minutes for 17 GiB of input and 18 GiB of output, so... expecting about 3 hours for one set (09 in particular).
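Before deleting the originals, you may want a quick sanity check on the archive (my addition, not part of the original steps):

tar -tf repos-09.tar | wc -l   # count the archived files (reads the whole tar)
tar -tf repos-09.tar | head    # or just peek at the first few entries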
Or maybe with some weak compression (benchmark); the rev | sort | rev below orders the file list by reversed path, so files with similar names and extensions end up next to each other in the archive, which tends to help the compressor:
cd ../bitbucket-all-repos-09
find -type f -path '*/*/.hg/*' | rev | sort | rev | tar -cvf - -T - | gzip -1 - > repos-09.tar.gz
10 minutes, 12 GiB output. Expecting about 2 hours on repo set 09, but not actually tested. Tempted to try some more.
Edit: gzip -9, 20 minutes, 11.9 GiB output. No, thanks.

In any case, it should be much faster than Windows Explorer or GUI-based archivers.
With xz, you will get better compression (~157 MiB in previous tests, with -9e --lzma2=dict=1536Mi, on the same highly redundant input), but it's also much slower (40 minutes for that particular test).
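For reference, the xz variant would look roughly like this (same pipeline as above, reusing the flags mentioned in the previous paragraph; I have not re-run this on set 09, and the large dictionary needs a lot of RAM on the compressing side):

cd ../bitbucket-all-repos-09
find -type f -path '*/*/.hg/*' | rev | sort | rev | tar -cvf - -T - | xz -9e --lzma2=dict=1536Mi > repos-09.tar.xz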
The above commands can be adapted by anyone else who'd like to share what they downloaded.