Bitbucket set to remove Mercurial support

Started by names_are_hard, August 20, 2019, 04:48:31 PM



Quote from: names_are_hard on June 09, 2020, 03:14:56 AM
hg-fast-export can't handle unnamed heads

Author of bitbucket-hg-exporter recommends hg-export-tool in the FAQ.

Currently trying a conversion with hg-git, as they say it should be lossless:

The Hg-Git plugin can convert commits/changesets losslessly from one system to another, so you can push via a Mercurial repository and another Mercurial client can pull it. In theory, the changeset IDs should not change, although this may not hold true for complex histories.

hg clone
mkdir magic-lantern-git
cd magic-lantern-git
git init
cd ..
cd magic-lantern
hg up unified
hg bookmarks hg
hg push ../magic-lantern-git

Result: only the "unified" branch got exported, as "hg". Fixme: gitk crashes on that repo. Cloning back to Mercurial appears to work, but the changeset IDs were not preserved.

Unified tip on Bitbucket: 7a3b5fa Ghost image: further cleanups and fixes
Unified tip after conversion to git and back to hg: 12839:68c3b9b53f35 Ghost image: further cleanups and fixes

Last commit with the same changeset ID:

77:4845885c4d68 (default) Added tag before-dryos-split for changeset b3ac1159ee7c

Not good. Conversion to git is definitely not lossless, at least not with this tool.
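A quick way to pin down where the roundtrip diverges is to compare the two `hg log --template "{rev}:{node|short}\n"` outputs mechanically. A minimal sketch; the revision lists below are made-up stand-ins, not the real ML history:

```python
# Find the last changeset that survived the hg -> git -> hg roundtrip.
# Each list holds "rev:node" lines, e.g. produced by:
#   hg log --template "{rev}:{node|short}\n"
def last_common_changeset(original, roundtripped):
    """Return the last entry identical in both logs (walking from rev 0 up),
    or None if they differ from the very first changeset."""
    last = None
    for a, b in zip(original, roundtripped):
        if a != b:
            break
        last = a
    return last

# Made-up example data (NOT the real ML history):
orig = ["0:aaaa1111", "1:bbbb2222", "2:cccc3333", "3:dddd4444"]
conv = ["0:aaaa1111", "1:bbbb2222", "2:xxxx9999", "3:yyyy8888"]

print(last_common_changeset(orig, conv))  # -> 1:bbbb2222
```

Running this against the real logs of the Bitbucket repo and the converted one would report the same kind of divergence point as found above by hand (77:4845885c4d68).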

The others:
- git-hg and git-hg-again: both failing with:

Traceback (most recent call last):
  File "/usr/local/lib/git-hg/fast-export/", line 405, in <module>
  File "/usr/local/lib/git-hg/fast-export/", line 318, in hg2git
    if not verify_heads(ui,repo,heads_cache,force):
  File "/usr/local/lib/git-hg/fast-export/", line 299, in verify_heads
  File "/usr/local/lib/git-hg/fast-export/", line 71, in get_changeset
  File "/usr/lib64/python2.7/site-packages/mercurial/", line 1663, in lookup
    node = scmutil.revsymbol(self, key).node()
  File "/usr/lib64/python2.7/site-packages/mercurial/", line 618, in revsymbol
    raise error.RepoLookupError(_("unknown revision '%s'") % symbol)
mercurial.error.RepoLookupError: unknown revision '��{b~3v���0<�a�a��'

Epic fail.

hg-export-tool: it appears to handle branches one by one. Stopped at "/raw_recc-added-121-and-11751-aspect-rati-1382109783386". TODO: try names_are_hard's mappings.

Sourcehut's invertbucket:
For the repository, it worked nearly out of the box. Had to do a minor edit, as my Bitbucket username is a1ex, but the repository is stored on hudson.

printf "\n%s\n" "Fetching your Bitbucket repos..."

So, if all else fails, at least we have a working Mercurial mirror:

Having some hiccup when cloning it back, from two different networks, but found a workaround:

hg clone -r 1000
cd magic-lantern/
hg pull -r 5000
hg pull -r 10000
hg pull -r 15000
hg pull
hg update unified
hg id # expecting 7a3b5fa3f4c6

Checked the changeset IDs from the tip of unified, crop_rec_4k and digic6-dumper, and they look OK to me.

Migrating the issue tracker, on the other hand, didn't work:

Migrate issues for hudson/magic-lantern? [y/n] y
Importing comments...
parse error: Invalid numeric literal at line 1, column 10
Failed to import comment:
parse error: Invalid numeric literal at line 1, column 10


Doing some testing with the "Switch" automator app; having some success converting to git when using the --force option to ignore the unnamed heads issue.
Trying to convert a Magic Lantern fork was not so successful. Python issues all over. Sigh.
Sorry for the sparse error report here, but I feel I am opening up a can of worms and I don't really have the time to clean up any mess. Still, one question remains: the day Bitbucket removes all Mercurial repos, even if I keep my local repos backed up, will I still be able to include commits, history etc. after Bitbucket is gone, or do I risk losing all history if I don't succeed while it's still in the cloud?


Quote from: a1ex on June 07, 2020, 01:28:11 PM
...Having copies of all the data before it's deleted, is essential in my opinion. Deciding what to merge, what needs further cleanup, running tests and so on, can be done afterwards, without time pressure.

5D3.113 | 5D3.123 | EOSM.203 | 7D.203 | 70D.112 | 100D.101 | EOSM2.* | 50D.109


Quote from: Danne on June 07, 2020, 09:40:56 PM
Nice script :).
+1 for both mercurial and git.
+10 for keeping mercurial.
I created some curl "repo_ripper" downloading all forks from a bitbucket user based upon knowing the name of the main repository but the script mentioned seems taking care of it all.
Again, relieved that back up download is being done and checking into the possibility of keeping mercurial alive.
Wish my vacation would start already...

+100 and thanks for sharing "repo_ripper". No don't! :)

Agreed that it would definitely be ideal to keep mercurial alive!


Quote from: a1ex on June 08, 2020, 06:41:39 PM

for f in $(cat all-repos.txt | grep '^hg ' | cut -d ' ' -f 2); do

A1ex, how did you generate the all-repos.txt file?

PS. I think I finally clarified ... with the changes made to bitbucket_hg_exporter?
Canon 1300D, 500D, EOS M, EOS M2


With the script from reply #40 (yes, by modifying bitbucket-hg-exporter).

I have yet to get a complete list, as the script kept stopping somewhere in the middle (edited it a few times to address this, but...)

Will upload mine once it's complete (guess: 1-2 days).

Edit: this time it worked better, currently at about 190000 hg and 600000 git repos. They appear to be listed by creation date (currently at summer 2016).

Edit2: crashed again somewhere at 2017 (about 207000 hg + 825000 git).


Some progress regarding Heptapod: I have applied for hosting there and they have approved the request. They already imported the main repository, issue list and "internal" open pull requests (i.e. those that did not come from forks).

A bit of surprise: our case is handled by... the author of this article:
Quote from: a1ex on June 07, 2020, 01:28:11 PM

I had no idea he's actually a team member there :)

Now, we should all review their workflow, which is described here:

Existing contributors are already able to login with their Bitbucket and/or Github credentials, but I'm not sure if the repository is visible to them yet. Let me know what happens if you try to login with this link:

Also made some progress with the git workflow, using the git-remote-hg extension:

hg clone
git clone hg::magic-lantern/ magic-lantern-git
cd magic-lantern-git/
git checkout branches/unified
# edit something
# commit
git push

Et voilà, the new commit appears on the right branch in the (local) Mercurial repository.

So, there is hope for lowering the barrier for contributors who prefer Git :)


Quote from: a1ex on June 13, 2020, 12:42:49 PM
Let me know what happens if you try to login with this link:
I got:
Quote: Page Not Found
Make sure the address is correct and the page hasn't moved


Had the same message as critic.
After logging in with bitbucket account and given access to the data you're directed to:
That's one magic-lantern too many; I deleted the last magic-lantern in the address bar and then it worked.

Looks like an empty project to me, is that right?
I can't see any source code or anything  ???


Got it, so it's only accessible to me and Heptapod admins for now.

Will keep you posted.


There are only 2 members:

Bitbucket Importer @bitbucket_importer
Pierre Augier @paugier

Maybe if the rest of the members are added, we can see.


I want to be sure that I have all ML source I want to keep, before 1 July, on my computer.

So let's say I want some specific build back on my computer. For instance, the one mentioned on this screenshot:

It says: Mercurial changeset: 6c6f37e9adfc (crop_rec4k_mlv_snd_isogain... and the rest is not readable (I could guess, but is this readable in any other way ??? )

Now what instructions do I have to do in terminal on Mac to get the source of the specific build above back on my computer, just the exact way it was then ?


You look up the changeset ID (6c6f37e9adfc). That identifies the exact commit you were working with (the exact version, if you want).

You also need to identify the repository containing this changeset (in this particular case, Danne's), clone it, "hg update 6c6f37e9adfc -C" and compile.

If there is a + on the changeset ID, that means there were local changes that were not yet committed. These changes are also included in the binaries, both for autoexec.bin and for modules (see this PR, but link will be broken after July 1st).

Besides the main repo, I'm also downloading all ML forks I could identify*), and planning to provide an archive. I can also identify unique commits from each fork, to review them later.

*) I'm still trying to download all Mercurial repositories from Bitbucket, to make sure I don't miss anything, but only managed to get ~ 40.000 repos out of 250.000 in one week, and there are only 2 weeks left. There is also, but... in their archive, the changeset IDs do not match (example: their unified vs bitbucket).

Maybe some parallelization is needed on my side :)

Anyone with good network connectivity, some hundred(s) of GiB of free space, able to run Bash scripts, recent Python (3.7+) and willing to leave their computer(s) running for the next two weeks? I'll provide a list of repos*) (divided among all participants, if that makes things easier) and download scripts. You will also need to keep an eye on the scripts, as Mercurial sometimes crashes or asks for usernames or whatever, but I'll try to sort it out.

*) Still unable to get the list of repos created after ~ 2017.


Quote from: a1ex on June 13, 2020, 09:07:17 PM
You look up the changeset ID (6c6f37e9adfc). That identifies the exact commit you were working with (the exact version, if you want).

Probably a stupid question, but how exactly do I look up the changeset ID, using the Bitbucket web interface?

Quote from: a1ex on June 13, 2020, 09:07:17 PM
Anyone with good network connectivity, some hundred(s) of GiB of free space, able to run Bash scripts, recent Python (3.7+) and willing to leave their computer(s) running for the next two weeks? I'll provide a list of repos*) (divided among all participants, if that makes things easier) and download scripts. You will also need to keep an eye on the scripts, as Mercurial sometimes crashes or asks for usernames or whatever, but I'll try to sort it out.

Not sure how many participants you'll get, but I might be able to help, if the script can be paused and resumed, or run in small chunks.
So if I can let it run for a few hours and stop it when needed, I could help.
My connection is not superfast but reasonable, at about 20Mbit/s. And I might have about 750GB of HDD space available for this.


@Alex - I can volunteer some time & space.  I've got some TB spare on a Linux server.  Send me your hacky scripts :)

You can find changeset IDs by searching commits.  But you have to be on the right repo. Yours is from Danne's:


I have a good connection and plenty of space (I also have significant hosting space available), but I lack the mindset to tackle (extended) debugging for this task.

If you create a relatively simple workflow I'll jump on it.

Possible to split the task up by month? 12 concurrent scripts running to download a year's worth of data.



Will prepare scripts and file lists for tonight, then. Repo list reached 2018, btw :)


I can help too.
I have a linux server and some space.


Alright folks, tonight we start backing up "all of Bitbucket" (well, all of the Mercurial repositories stored there).


Bitbucket (Atlassian) are going to discontinue Mercurial support (whatever, it's their business; after all, it was a free service, so we can't complain). The ugly part is - they don't seem to bother providing an archive. No, they are going to DELETE more than 10 years of work from TENS OF THOUSANDS of users.

Other source code hosts did provide an archive when shutting down, see e.g. CodePlex or Google Code. Not Bitbucket.

For this reason, I'd stay away from their current and future offerings, no matter how tempting they might be. Time to move on.

OK, they were nice enough to give us one year to migrate (they could have deleted everything right away, and they would have probably been covered by their ToS - which I didn't read). For active projects, that's probably fine (more or less). However, many of these projects are no longer maintained; their last update was several years ago.

If you believe these unmaintained projects are no longer of interest, please stop reading here.

Anyway. Migration wasn't straightforward either. To date, I'm not aware of any way to losslessly convert a Mercurial repository to Git (and I bet the Bitbucket folks are not aware either). The hg-git extension promises lossless conversion, but fails to preserve the changeset IDs. The best results I've got were with the git-remote-hg extension, which provides a way to contribute from git, but... after trying several tools, I couldn't find a way to recover the original hg repo from its git copy.

OK, so what's the plan?

I've got a list of nearly ALL Bitbucket repositories, retrieved through their API, and attempted to download them. The process was not fast - only about 50.000 repos in one week (OK, my download setup was not very optimized either). There are about 250.000 Mercurial repos (as estimated by Octobus); downloading all of these would take about 5 weeks at this rate. There are only two weeks left, so... let's parallelize!

Estimated average repo size is about 10-15 MiB (revised from an initial estimate of 5 MiB), so the entire *raw* archive should fit on 3-5 TB of disk space (initially estimated at 1-2 TB).

How much can this be compressed? First observation is that Mercurial raw data is already compressed, so attempting to e.g. "tar bz2" or "tar xz" every single repo, like I did in my previous attempt, is not going to help much.

However, many of these repositories are forks, which means plenty of duplicate data. Therefore, forks are expected to compress very well if we group them together. Here's an example from our project (hudson/magic-lantern and its forks, which I've already downloaded from Bitbucket):

- Raw archive: 16.4 GiB (471 repos out of 540 reported by the API, 35.6 MiB average, downloaded in 2.5 hours)
- Individually compressed repos (tar.xz, default settings): 14.3 GiB (compression took 2 hours, one core on i7 7700HQ)
- Archive of tarballs: 14.3 GiB (food for thought)
- All ML forks archived together (tar.xz, -9e --lzma2=dict=1536Mi): only 273 MiB (!), compression time ~ 1 hour :)

Archiving everything in a single file, after downloading, might also work reasonably well (todo: test on a set of 10.000 repos).

Hence, the plan:

- Stage 1 (next two weeks): download all Mercurial repos from Bitbucket and store them uncompressed (raw .hg directories)
- Stage 2: decide the best strategy for compressing all of this stuff (possibly by grouping forks together - can be automated)
- Stage 3: publish an archive, for the entire world to use (what if other open source projects missed some important bit during their migration?)
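For Stage 2, the per-repo .commits files produced by the download script are enough to cluster forks automatically: two repos belong in the same compression group if their commit sets overlap. A rough union-find sketch of that idea; the repo names and hashes below are made up for illustration:

```python
# Group repos into fork families: two repos belong together if their
# commit sets share at least one changeset hash (union-find over commits).
def group_forks(commit_sets):
    parent = {r: r for r in commit_sets}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    seen = {}  # commit hash -> first repo observed containing it
    for repo, commits in commit_sets.items():
        for c in commits:
            if c in seen:
                union(repo, seen[c])
            else:
                seen[c] = repo

    groups = {}
    for repo in commit_sets:
        groups.setdefault(find(repo), set()).add(repo)
    return sorted(sorted(g) for g in groups.values())

# Made-up example: two ML forks share root commits; "other/project" is unrelated.
# In practice, each set would be the contents of a <user>/<repo>.commits file.
commit_sets = {
    "hudson/magic-lantern": {"c1", "c2", "c3"},
    "danne/magic-lantern":  {"c1", "c2", "c4"},
    "other/project":        {"z9"},
}
print(group_forks(commit_sets))
```

Each resulting group could then be fed to a single `tar | xz` invocation, as in the ML-forks experiment above.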

List of repos:
all-repos (huge file; all repos until June 16, 2020)
hg-repos (huge file; list of Mercurial repos, which I'll divide in smaller chunks)
[ fields: hg/git, user/repo, creation date, last updated, optional url ]

Let's divide these into manageable chunks:

split -l 10000 --numeric-suffixes hg-repos hg-repos-

hg-repos-00 (a1ex: 91.23% downloaded, 8.77% errors, 2.82 MiB average) (Levas: started)
hg-repos-01 (a1ex: 92.38% downloaded, 7.62% errors, 4.16 MiB average)
hg-repos-02 (a1ex: 92.08% downloaded, 7.92% errors, 6.81 MiB average)
hg-repos-03 (a1ex: 92.76% downloaded, 6.43% errors, 0.81% todo)
hg-repos-04 (a1ex: 31.84% downloaded, 2.16% errors, 66.00% todo)
hg-repos-05 (critix: 23.59% downloaded, 1.78% errors)
hg-repos-06 (critix: 27.30% downloaded, 2.02% errors)
hg-repos-07 (Audionut: 23.08% downloaded, 1.96% errors) (Danne: started)
hg-repos-08 (Danne: started)
hg-repos-09 (Danne: started)
hg-repos-10 (Danne: started)
hg-repos-11 (names_are_hard: started)
hg-repos-12 (names_are_hard: started)
hg-repos-13 (names_are_hard: started) (Audionut: 25.35% downloaded, 1.18% errors)
hg-repos-14 (a1ex: 68.22% downloaded, 0.37% errors, 31.41% todo)
hg-repos-15 (a1ex: 91.68% downloaded, 0.94% errors, 7.38% todo)
hg-repos-16 (a1ex: 12.01% downloaded, 0.43% errors, 87.56% todo)
hg-repos-17 (Audionut: 24.25% downloaded, 3.39% errors) (kitor: 14.37% downloaded, 0.43% errors)
hg-repos-18 (Audionut: 22.46% downloaded, 0.51% errors) (kitor: 14.57% downloaded, 0.21% errors)
hg-repos-19 (Audionut: 15.49% downloaded, 0.27% errors) (kitor: 10.17% downloaded, 0.14% errors)
hg-repos-20 (Audionut: 16.87% downloaded, 0.21% errors)
hg-repos-21 (Audionut: 20.56% downloaded, 0.26% errors)
hg-repos-24 (Edit and fix link //Audionut)

Only ML forks, as identified earlier in this thread (caveat: different file format):
ml-forks (a1ex: 471/540 downloaded; the others had errors)

The hacky download script:

# usage: [bash] ./ hg-repos-00   # or 01, 02 etc

for f in $(cat $1 | cut -d ' ' -f 2); do
  echo "Processing $f ..."

  # skip already-downloaded repos (for which we have a valid .commits file)
  if [ ! -f $f.commits ]; then
    # hg clone, don't prompt for user/password/whatever,
    # and don't update the working directory (we only need the .hg folder)
    # this may fail (404 on some repos, auth needed on others, etc)
    if hg clone --config ui.interactive=false -U -- https://bitbucket.org/$f $f; then   # HTTPS version, slower, but works out of the box
    #if hg clone --config ui.interactive=false -U -- ssh://hg@bitbucket.org/$f $f; then   # SSH version, faster (thanks kitor), but requires additional setup ("You need to add your ssh public key to bitbucket. And run HG once by hand (without disabling interactive shell) to accept remote ssh pubkey.")
      # for each successfully-cloned repo, we build a list of commits (hashes only)
      # this lets us identify the contribution of every single fork
      # this may be used to decide the best compression strategy, after downloading all of the stuff
      (cd -- $f && hg log --template "{node}\n" > ../../$f.commits)
    else
      # "hg clone" failed for some reason
      # todo: report status? (404 or whatever)
      # these repos will be retried if you run the script twice
      echo "$f" >> hg-clone-errors.txt
    fi
  fi
done

I'm keeping the script very simple, to avoid potential trouble. To parallelize, you should be able to start as many instances as you want (each instance with its own repo list, of course). These instances can probably work in the same directory (not thoroughly tested, but as long as each instance processes a different list of repos, it should be fine, I think). You can stop and restart each instance as needed, by closing the terminal (CTRL-C will only stop the active process, very likely a "hg clone", resulting in a false error report).

You will need:
- a working directory with enough free space available (assume 20 MiB / repo on average, revised from an earlier 10, although it's likely less)
- one or more lists of repos (download from above).

That's it, now you are ready to run the script.
2-3 threads are probably best, depending on how powerful your machine is (initial guess was 2-4, not 100% sure). Watch out for HDD/SSD thrashing!
If your machine starts to become unresponsive, you may have too many threads running. Stop or pause some of them!
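One simple way to run a fixed number of instances without babysitting each terminal is `xargs -P` (GNU findutils). This self-contained sketch substitutes a harmless `echo` for the actual download script, since the script's filename is not shown here; in real use you would replace `echo processing` with an invocation of the clone script:

```shell
# Demo in a scratch directory: one worker per chunk file, at most 3 at once.
mkdir -p /tmp/hg-par-demo && cd /tmp/hg-par-demo
printf '%s\n' hg-repos-00 hg-repos-01 hg-repos-02 > chunks.txt
# -P 3: up to 3 concurrent jobs; -n 1: one chunk name per job
xargs -P 3 -n 1 echo processing < chunks.txt > log.txt
sort log.txt   # output order is nondeterministic, so sort for display
```

Stopping the whole batch is then a single CTRL-C on the xargs process, rather than hunting down individual terminals.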

Caveat: never edit Bash scripts while they are running!

Script outputs:
- For each repo (e.g. hudson/magic-lantern):
  - user directory (hudson/)
  - project directory (hudson/magic-lantern/)
  - .hg folder (hudson/magic-lantern/.hg/ - possibly hidden by default, depending on your file browser)
  - a list of commits (hudson/magic-lantern.commits)
- For all repos:
  - hg-clone-errors.txt (hopefully obvious)

Once you start running the script, let me know what subset(s) of repos you are downloading; we all should know what everybody is downloading, what has been downloaded, and so on. Ideally, each repo (or set of repos) should be downloaded by at least two participants, just in case. I'll edit this post to keep it up to date.

Afterwards, you can create an archive of what you downloaded, with:

tar -cJvf commits.tar.xz */*.commits                                 # commit lists only
tar -cf - */*/.hg | xz -9e --lzma2=dict=1536Mi -c - > repos.tar.xz   # hg repos only (slow, RAM-intensive!)

Let's hope the resulting files will be small enough to exchange them.

Aren't others already taking care of this?
(or, aren't you supposed to port ML on the 13D Mark 7, instead of this bull***?)

Unfortunately, the well-known -- who, to their credit, saved the day countless times -- apparently didn't do a great job on this one (see earlier report from aprofiti).

These guys did better (see but...

1) they are one single point of failure (so, a second backup should never hurt)
2) we have noticed a little issue in their archive: the Mercurial changeset IDs were not kept, at least in our repo (example a few posts above)
3) the deadline is coming!

OK, OK. Why didn't you start earlier?

As you may or may not know, this is a hobby project for us. Translation: if there is time to spare, the project advances. If there is not, the project stalls (or... disappears). And, as surprising as it may sound, we also have to eat from time to time (along with our families). The "default" way to put food on the table is to get a job, which can have the side effect of not leaving much time for hobbies (especially during these difficult times).

As mentioned earlier, I actually took a 2-week holiday in order to perform this migration and to catch up with other hobby projects (no travel plans or anything like that). That's when I started to research this issue and noticed the need for a good backup of all those repos (not only ours).

The good side: after one week of messing around, I think I've got a plan that has at least some chances to work :)

So, let's try!


I've started:

hg-repos-11: finished 1st pass download
hg-repos-12: finished 1st pass download
hg-repos-13: finished 1st pass download

Will update this post as things complete, I start others, etc.

Using iftop, I see that with 3 processes, it will sometimes saturate my 100Mb downstream for sustained periods.  Other times it won't, more like 20Mb.  So I'd guess it's a mix of CPU- and network-bound, depending on what it's doing at the time.

PS I can't believe the 13D Mark 7 has native 8k 60fps but no clean HDMI.  Not even worth supporting.


Quote from: a1ex on June 14, 2020, 11:34:27 PM
Guess: 2...4 threads might help (not 100% sure).
An alternative would be to use Aria2. It has options to optimize threads to reach maximum speeds. Example ~/.aria2/aria2.conf:


I don't know if hg clone already does this, but would be nice to have SHA256 for each split...
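Generating and verifying checksums for the splits is straightforward with GNU coreutils. A self-contained demo using a dummy file in a scratch directory in place of the real hg-repos-* splits:

```shell
# On the real data you'd simply run, in the directory holding the splits:
#   sha256sum hg-repos-* > hg-repos.sha256
mkdir -p /tmp/hg-sha-demo && cd /tmp/hg-sha-demo
printf 'hg hudson/magic-lantern 2010-01-01 2020-06-01\n' > hg-repos-00
sha256sum hg-repos-* > hg-repos.sha256   # record checksums
sha256sum -c hg-repos.sha256             # verify; prints "hg-repos-00: OK"
```

The resulting hg-repos.sha256 file can be published alongside the split lists so each participant can verify their copy before starting.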


It frustrates me trying to work out all of the dependencies to get things like this running. Anyway, I've got the following running...


edit: See this post by kitor before proceeding with the below:

For those following along with a windows 10 x64 box who want to help.

You need to install the windows subsystem for linux and get ubuntu:
*remember username and password

Install mercurial from within ubuntu:
sudo add-apt-repository ppa:mercurial-ppa/releases
sudo apt-get update
sudo apt-get install mercurial

Copy a1ex's hacky download script from above and save it as
This may help:

Download the hg-repos-xx from above.

The default home folder in ubuntu is located at:

Replace (xxxx) with your Windows 10 user. Replace (username) with whatever username you used when installing Ubuntu.

I would suggest cutting the username folder and pasting it wherever you want to store these downloaded repos, then dropping a symbolic link back into the original Ubuntu home location.
I use a link shell extension to make that task easier.

Copy the script you saved and the hg-repos-xx to the username folder.
In Ubuntu, make the script executable:
chmod +x ~/

Then run it:
sudo ~/ hg-repos-xx

I initially had 5 scripts running which was keeping my 100Mbps reasonably saturated and things were singing along nicely, but after some time my storage drive couldn't keep up with all the random writes and crashed to a halt. #beware


Thanks for documenting the Windows way.  At least you have WSL these days to make it easier!

I guessed lucky: 3 processes is doing about 30GB per hour, which leaves enough bandwidth out of my 100Mb to not make normal stuff suck.


I had to implement a good QOS to stop my eldest son complaining about his ping times. Has the advantage of doing things like this while keeping the connection perfectly usable for normal tasks.
Went down a rabbit hole involving bufferbloat.

