Another error that shows up in red every now and then is something like "stream ended early, expected xxxx bytes, but got xxx bytes".
And some "404: not found" errors.
The 404s are repos that were deleted by their owners; we can ignore them.
The "stream ended early, expected xxxx bytes, but got xxx bytes" ones are probably network glitches, and those will very likely work on a second attempt.
If you run each script 3-4 times, it should "fix" most of these errors. The first run takes a long time (about a day, give or take), but subsequent runs should be much faster (roughly an hour, though that varies a lot depending on how many errors there were in that set).
For example, after fully downloading hg-repos-03, an extra run (which doesn't download anything new, only retries the 404s) takes under 10 minutes, while for hg-repos-01 an extra run takes about 40 minutes.
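
If re-running the whole script by hand gets tedious, the retry can also live inside it. Here's a minimal sketch, assuming the per-repo step is an hg clone; the function name, the --noupdate flag, and the back-off timing are illustrative, not what the actual download script does:

import shutil
import subprocess
import time

def clone_with_retries(url, dest, attempts=4):
    # "stream ended early" failures are transient, so a few attempts
    # with a short pause in between usually get the repo through.
    for attempt in range(attempts):
        # --noupdate: fetch only the .hg directory, no working copy
        if subprocess.call(["hg", "clone", "--noupdate", url, dest]) == 0:
            return True
        # clear out any partial clone left behind; hg won't clone
        # into a non-empty directory
        shutil.rmtree(dest, ignore_errors=True)
        time.sleep(10 * (attempt + 1))
    return False

A 404 will of course fail all the attempts, which is fine: those repos are gone for good.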
No sources downloaded? Is this to be expected? Are we downloading commits and .hg content only? A bit confused...
Right. The .hg directory is all you need to recover the sources at any version:
hg update
hg update <changeset_id>
hg update <branch_name>
...
The working directory is therefore redundant and takes up additional disk space, so we don't keep it.
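
For reference, this is what a working-copy-less clone and a later recovery look like (the URL and directory name are placeholders):

hg clone --noupdate https://example.com/some-repo some-repo
cd some-repo
hg update

--noupdate (or -U) tells hg to skip the checkout after cloning, so only .hg is written; the first hg update afterwards materializes the sources again.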

Here's a Python script that computes the percentages (btw, 0.01% from one set is exactly 1 repo):
from __future__ import print_function
import os

# Repos that failed to download so far, one name per line.
try:
    error_repos = open("hg-clone-errors.txt").readlines()
except IOError:
    error_repos = []
error_repos = set(x.strip() for x in error_repos)

for i in range(30):
    fn = "hg-repos-%02d" % i
    try:
        repos = open(fn).readlines()
    except IOError:
        continue
    repos = [r.strip() for r in repos]
    if not repos:
        continue

    downloaded = 0
    errors = 0
    total = len(repos)
    for line in repos:
        r = line.split(" ")[1]  # the repo name is the second field
        if os.path.isfile(r + ".commits"):
            downloaded += 1
        elif r in error_repos:
            errors += 1

    print("%s: %.2f%% downloaded, %.2f%% errors, %.2f%% todo"
          % (fn, downloaded * 100.0 / total,
             errors * 100.0 / total,
             (total - errors - downloaded) * 100.0 / total))
There is a catch: this script can't tell which errors might be recoverable (with a second run) and which are not (the 404s and a few others). Even if it prints 0.00% todo for every set, running the download script once more might bring in a few more repos that didn't work on the first try.
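
If you'd rather not walk the full set on every re-run, a variant of the same script can collect just the leftovers into a smaller list. The .retry file name is made up here, and this assumes the download script accepts any file in the hg-repos-NN line format:

from __future__ import print_function
import os

for i in range(30):
    fn = "hg-repos-%02d" % i
    try:
        lines = [l.strip() for l in open(fn)]
    except IOError:
        continue
    # Anything without a .commits file is worth retrying; that includes
    # the error repos, since some of those errors are only transient.
    todo = [l for l in lines
            if not os.path.isfile(l.split(" ")[1] + ".commits")]
    if todo:
        with open(fn + ".retry", "w") as out:
            out.write("\n".join(todo) + "\n")
        print("%s: %d repos left to retry" % (fn, len(todo)))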