ML status update, technical edition: repo status, ML internals etc.

Started by names_are_hard, March 06, 2023, 12:02:20 AM

names_are_hard

I am currently the most active dev for modern Digic cams (i.e., badly supported new cams).  Kitor also seems to be back now, which is greatly appreciated!

I've been quite quiet lately, so I thought a big summary post to catch up would be useful.  Also, I'd like opinions from other active devs about future direction re code management.

If you don't want to read about technical stuff around ML code, you may want to skip this one :)
The notes below refer to my repo, here: https://github.com/reticulatedpines/magiclantern_simplified
Note that this is "mine" in the sense of I made it, not that I own the code, and not that I'm the only dev: anyone is welcome to contribute!  Kitor and coon42 are the biggest non-me contributors I think but we've had useful work from quite a few more.

A lot of the work I've been doing recently is on non-visible stuff (oh, and I did a 7D2 port): Qemu, automating ML testing, improving code quality.  This is boring for users but valuable for devs.  The main reason I am cautious here is that, while the original intention of this repo was to add support for modern cams, it has turned into the most active, modern ML repo.  The original regression testing system isn't available to me, so I'm writing another one (I can re-use parts of the old one).  When that is workable, we will have higher confidence that changes for new cams don't break old ones.  At that point I'd like to get user testing for old cams, to ensure feature parity with official repo(s).


Repo summary:

Quite a while back, I ported ML code so it expects to use Git, not hg.  I've done similar modernisation work in other areas.  Builds with modern toolchain.  Greatly reduced compiler warnings (some of these fixed years old bugs).  Some bugfixes that affect all cams, new and old.  Reliable fix for ISOless err on a few models.

This repo supports Digic 6, 7, 8 and X cams, 18 models so far.  It builds and *should* work on all old cams, too.  These are not well tested, more on this later.  Much work remains but it's easy to get started if your cam is on this list; you have working ML GUI, just very few features.  Pick a feature and solve it, one at a time!
See kitor's post about overall ML project status: https://www.magiclantern.fm/forum/index.php?topic=26852

A lot of work has been done to allow adding support for a new cam.  When I have a new cam I can add support getting to ML GUI in a few days in most cases.  This will take longer for someone unfamiliar with ML, but it's not that hard.

Merged multiple branches (lua_fix, unified, qemu, digic6-dumper) - no need to frequently switch branches.

The big missing piece here is that we don't have crop_rec_4k integrated.  I would like to merge crop_rec code but I require it to not cause problems on any supported cams.  I don't want to maintain multiple long-lived branches.  This might mean fixing Digic 4 bugs, or disabling crop_rec features on Digic 4 cams so no bugs can be triggered.  Bilal may be working on this problem (I don't have the experience with crop_rec features, or cams to test it on).

Added MMU based memory patching support for cams which have MMU (D7 and up).  Based on srsa's code (thanks!) but extended.  Unpatching not yet supported.
This allows patching very large amounts of ROM code, much more than is possible on old cams.  In theory this means we can do a lot of very cool stuff.  There is effectively no limit on how many patches we can apply on modern cams.

As part of the above, simplified code for patch manager.  In some ways this makes the UI worse, but it removed a few hundred lines of code, and makes it easier to work with MMU related patching for modern cams.  Future work will introduce patchsets, which will allow unpatching on MMU and return non-MMU patching UI to the old look.

Wifi improvements: 200D can use wifi to send whatever we want, wherever we want, and get data back.  This code should be easy to port to other cams.  You can tether to your phone, and we could make it upload photos as you took them.  Many other possibilities.

Module system improvements:
- build process improved; much faster.
- crash bug removed from module build / load process (modules could be loaded at addresses too far from ML code to be called, this would crash).
- fixed old system including modules in zip that a cam couldn't run (bad dependency checking).
- module compatibility extended to Digic 6, 7, 8 and X.
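
For illustration: one plausible cause of a "too far to be called" crash is the limited reach of a direct ARM branch-with-link (BL) instruction. The sketch below assumes that is the mechanism (the post does not say so). The addresses are made up and bl_reachable is a hypothetical helper, not an ML function.

```python
# Sketch only: assumes the crash came from the limited range of a direct
# ARM BL instruction. All addresses used with it are hypothetical.

ARM_BL_RANGE = 1 << 25     # A32 BL: signed 24-bit word offset, roughly +/-32 MB
THUMB2_BL_RANGE = 1 << 24  # Thumb-2 BL: roughly +/-16 MB


def bl_reachable(caller: int, target: int, thumb: bool = False) -> bool:
    """Return True if a direct BL located at `caller` can reach `target`."""
    # The branch offset is relative to the PC, which reads as the
    # instruction address + 8 in ARM state and + 4 in Thumb state.
    pc = caller + (4 if thumb else 8)
    limit = THUMB2_BL_RANGE if thumb else ARM_BL_RANGE
    offset = target - pc
    return -limit <= offset < limit


# Hypothetical layout: core code at 1 MB, one module nearby, one far away.
ML_CODE = 0x00100000
NEAR_MODULE = 0x00200000
FAR_MODULE = 0x04000000
```

A loader could run a check like this before accepting a load address, and either pick a closer address or fall back to an indirect call when the check fails.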

Qemu-eos moved to separate repo and updated from 2.5.0 to 4.2.1:
- much easier to build.
- much easier to modify (no awkward patch file workflow, it's just a normal repo!).
- improved ARM emulation.
- better / faster SD emulation.
- some features of qemu-eos broke in the update, most have been fixed (90% complete?).

In theory, works as before on old cams.  In practice, I don't know, and this will want testing.  I think this should wait until after I have a working Qemu regression testing system, but that should be soon (ish).  Also when that is available, I'd like to update to Qemu 6 - 4 is no longer supported.  Internally, 4 is much closer to 6 than 2 was to 4: this update should be much less painful.
https://github.com/reticulatedpines/qemu-eos/tree/qemu-eos-v4.2.1


Future repo discussion questions:

Should we try to have a single official repo again?  We currently have lots of forks that different people maintain, for different purposes.  This is confusing for new users, and annoying for devs trying to support them.  Plus, the longer this continues, the harder it becomes to get all the good features in one place.

I feel the old official repo has significant problems.  One major one is that nobody is maintaining it.  We could try to get access again, but it's Mercurial and very few people understand that system - I wouldn't be able to maintain it even if I had access.  I also think that the approach of using long-lived branches and cross-merging code between them is confusing, expensive in terms of effort, and liable to introduce problems due to complex merges.

I think we should move to Git.  All devs know git, it's the de facto standard for source control.  If you don't know any VCS, git is the easiest to find tutorials for.  We have a hard time getting devs, using an obscure system is off-putting.

Merging together the different repos will of course involve work and need co-operation.  I think it's worth it, but it's up to anyone that has their own fork as to whether they want to do this.  If you have ideas about how I can make this easier or more attractive, I'd like to hear them (hopefully when I have a testing system, that's a good thing you can take advantage of!).

Any official repo should be controlled by multiple trusted parties.  Git and github support this, so this is a question about defining and documenting how we want management of the repo to work.  To get things started, I'd suggest we want at least three active users with repo control, and potentially a larger group of people with commit rights / PR review rights.

names_are_hard


petabyte

I think having an 'official' github organization containing the active forks would be useful. Could add your repo and maybe the 1300d repository too. I think that would make the most sense to new devs approaching the project.

Also need to post more updates, especially to the site. I remember thinking the project was dead back when I first saw it.

names_are_hard

I haven't used github orgs before.  A quick read suggests it's definitely more admin, we'd need additional org level roles assigned.  What benefits would we get?  Putting our active repos in one place so they're easy to find makes sense, but we could do this with a page on magiclantern.fm, and that would work with repos on github, heptapod and bitbucket, etc.

Does it make sense to use an org if the end goal is a single repo that does everything?  Is that the end goal?

I was thinking a monthly status report might be a good thing.  How would that look given we currently have multiple efforts by different people across different repos?  Where should it be posted?

petabyte

I've used it for several projects, it's very useful for keeping development organized. Admins can create and modify existing repos, and you can allow certain people to modify certain repos. Not sure what you know and don't know about this stuff. Honestly don't see how it would mean more responsibility over a single repo.

I think having an org is more about making a project look like less of a personal fork and more of an official fork. Gives people a nice overview of the project info, people who participate, and the repos in development. But that's just how I see it.

70MM13

please post the updates on the forum!
i'm sure i'm not the only one who enjoyed reading your first update...

for years i've been considering buying a 5d4 just to start developing it, but the ridiculous amount of hassles involved just to get set up (sorry guys, it's true) have been a complete showstopper for me.  knowing that someone is trying to make it less hostile is very encouraging!

i think it could be a fun project, but when all the fun is removed before step 1, no thanks...

names_are_hard

Quote from: petabyte on April 04, 2023, 06:06:54 PM
Honestly don't see how it would mean more responsibility over a single repo.
It looks like you need an org admin as a minimum, and that admin can delete repos.  So how would that work if we have multiple repos, previously owned and controlled by one person?  There's an additional layer of responsibility and we'd need to have a plan.  There could easily be benefits that are worth that, but it seems clear there's also extra work.  And, can we bring in bitbucket and heptapod repos?  I would assume not, so, what do we do with those?  Do we gain much if we bring together some repos, but not others which are still quite active?

Quote
I think having an org is more about making a project look like less of a personal fork and more of an official fork. Gives people a nice overview of the project info, people who participate, and the repos in development. But that's just how I see it.
This we can do easily by listing the blessed repos on magiclantern.fm, perhaps somewhere on Downloads (which definitely needs updating, we get a lot of questions about "what should I download?"  "why are all the builds old?"), or maybe a new Repos section.

More generally, I think we should decide what the project goals are before committing to a particular solution.  I certainly agree that letting people know how to get ML is important!  And if there are different options on how to get it, what those options mean.

names_are_hard

Quote from: 70MM13 on April 04, 2023, 06:14:24 PM
please post the updates on the forum!
i'm sure i'm not the only one who enjoyed reading your first update...
Great, thanks!  Not sure which one you mean by "first update" but I'm glad for anything that helps others :)

Quote
for years i've been considering buying a 5d4 just to start developing it, but the ridiculous amount of hassles involved just to get set up (sorry guys, it's true) have been a complete showstopper for me.  knowing that someone is trying to make it less hostile is very encouraging!

It's a lot easier than it used to be, at least with my repo, which has been modernised.  Easier to build, cleaner and faster compiles.  It does help if you're familiar with Linux (or WSL).  Discord is also a much faster and easier way for asking for help than either forums or IRC.  We've talked lots of people through setting up a dev env, I'd say it takes most people less than three hours (unfortunately most people then seem to quit after about two weeks, with no explanation of why).

petabyte

Quote from: names_are_hard on April 04, 2023, 06:59:44 PM
It looks like you need an org admin as a minimum, and that admin can delete repos.  So how would that work if we have multiple repos, previously owned and controlled by one person?  There's an additional layer of responsibility and we'd need to have a plan.

GitHub permissions are customizable. I suggest creating a test organization to try it out if you haven't already. Other than that I'm not sure what you're asking.

names_are_hard

Quote from: petabyte on April 04, 2023, 07:18:48 PM
I suggest creating a test organization to try it out if you haven't already.

That makes sense, I should give it a try.  I'm not trying to criticise the idea of having an org - I'm trying to find out how it would help, because I don't know anything about using them.

I suspect it makes a decent amount of sense for the qemu-eos repo and some magiclantern repo to be in the same org, that feels an obvious fit for that layer of organisation.  I'm less sure it makes sense to have multiple ML repos in one org.  Why not merge them so there's a single place for ML code?  If the end goal is "have a single source of truth for ML code", do we need an org, when that would end up being a single repo?  If that isn't the goal, what is?  Long term, maintaining multiple ML repos that have different features, support different cameras, and work in different ways seems fundamentally bad for users; it's confusing and harder for us to give support, answer questions etc.  At least, that's my opinion - does it make sense to you?

Do GH orgs work for the repos that aren't already in github?  I'm assuming it doesn't, is that right?  If the goal is putting things in one place because that's easier for users to understand, then it doesn't currently work well (we could try to get people to move to GH, but obviously that's effort too).  Whereas we can easily dedicate a page on magiclantern.fm to list the useful repos, which would be an easier (though less well integrated) way of achieving that goal.

petabyte

Quote from: names_are_hard on April 04, 2023, 07:45:15 PM
does it make sense to you?
Yes, that makes sense.

Quote
Do GH orgs work for the repos that aren't already in github?  I'm assuming it doesn't, is that right?
No reason why it wouldn't. Are you referring to mercurial repositories?

names_are_hard

Quote from: petabyte on April 04, 2023, 08:19:01 PM
No reason why it wouldn't. Are you referring to mercurial repositories?

You can add a repo that isn't git, and isn't in github, into a github org?  Unexpected!

Yes, mercurial for official, of which I assume there are some forks.  Danne is using git, but on bitbucket (the commit history seems...  insane, I think he's using hg locally and then using git to push those changes to bitbucket?)

kitor

Quote from: petabyte on April 04, 2023, 03:50:37 PM
I think having an 'official' github organization containing the active forks would be useful.

I already created a placeholder some time ago: https://github.com/autoexec-bin

Quote from: names_are_hard on April 04, 2023, 08:37:34 PM
You can add a repo that isn't git, and isn't in github, into a github org?  Unexpected!

Yes, mercurial for official, of which I assume there are some forks.  Danne is using git, but on bitbucket (the commit history seems...  insane, I think he's using hg locally and then using git to push those changes to bitbucket?)

How? I guess this is a miscommunication.

For the existing GH repos you need to transfer it into org so it can be managed -> https://docs.github.com/en/repositories/creating-and-managing-repositories/transferring-a-repository

As for general discussion, I also do prefer to have a single repo with not too many branches. But let's be honest - with the amount of changes (including a ton of code formatting changes) it won't be a trivial task to merge anything into magiclantern_simplified.
We will need each fork maintainer to verify existing status over this repo and then merge their work by hand.
Too many Canon cameras.
If you have a dead R, RP, 250D mainboard (e.g. after camera repair) and want to donate for experiments, I'll cover shipping costs.

names_are_hard

Quote from: kitor on April 05, 2023, 09:24:08 PM
For the existing GH repos you need to transfer it into org so it can be managed -> https://docs.github.com/en/repositories/creating-and-managing-repositories/transferring-a-repository

It does have to be GH to use an org then, that makes a lot more sense to me, it's what I assumed.  So if we do want to go that route, we'd need all the active forks to move to GH.

Quote
We will need each fork maintainer to verify existing status over this repo and then merge their work by hand.

This is not required.  If people want to, that may save time, but I'd be willing to do a lot of the merge work if people agree.  I know the repo best and already had to do large painful merges multiple times to get it to this point.  Plus it feels mostly my responsibility since I made it this way.  I can't do all the testing of the results since I'm lacking required cams.

Of course this is dependent on the current fork owners wanting to unify their work.  Bilal mentioned recently that he does want to at some point (maybe soon?  It's mentioned in the recent live preview thread).

names_are_hard

Updates for 2023 Mar - Jun period

Initial support added for 7D2, 5D4, 80D, 77D.  Merged in 70D support from 70D_fw_112 branch.

All the above cams now boot to ML GUI, though stability is variable and sometimes bad (task related?  Investigation is ongoing).

Kitor cleaned up boot code around locating ML in early memory, making this simpler and more unified across models.  This is a very nice change from a dev perspective, much cleaner and easier to understand.

Cleaned up code around task structs, which was making untrue assumptions about dependencies on Digic versions.  This fixes long standing bugs in the module system that can lead to crashes.  For all other repos, if modules are copied to a different cam, crashes can occur when accessing task information.

Further improvements to module system, fixing old bugs, and improving support for ARMv7-R models which have different dependencies around division.

Clean, reliable fix for "isoless err" on 650D.  The pattern used for this fix (and earlier for 550D, 200D) can be applied to other cams.

YOLO object detection and server processing offloading code merged.  Anyone with a Wifi enabled cam can now experiment with sending data from cam to a server of your choice and getting data back.  E.g., automatic uploading of images as you take them, via your phone as a wifi hotspot.  Or replace all your friends' faces with clown heads, it's up to you.

Example of build system improvements (not all from this time period)

platform/700D.115$ make clean && /usr/bin/time make -j16 zip

Old build system:
0:19.38 elapsed

New build system:
0:02.28 elapsed

The new system builds 8x faster.  Consider using my repo if you like things that are 8x faster.
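
The ratio can be checked from the two timings quoted above. This is a small sketch; elapsed_seconds is a hypothetical helper that parses the [hours:]minutes:seconds form printed by /usr/bin/time.

```python
def elapsed_seconds(stamp: str) -> float:
    """Convert an 'elapsed' value such as '0:19.38' into seconds."""
    seconds = 0.0
    # Works for both m:ss.xx and h:mm:ss.xx by folding left to right.
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


old_build = elapsed_seconds("0:19.38")  # old build system
new_build = elapsed_seconds("0:02.28")  # new build system
speedup = old_build / new_build         # roughly 8.5
```

So "8x" is, if anything, slightly conservative for this particular run.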

Probable upcoming work

Improve support for 5D4, 7D2, these are quite unstable currently.

Ensure all merged branches from official repo are fully updated (my merge of lua_fix is known to be based on a slightly older version).

Port crop_rec_4k_mlv_snd code.  Due to the highly complex way branching and merging was handled on official repo, I don't understand how to merge this and keep history.  I've attempted it but it's too hard for my ability.  I will likely throw away the history (which will remain in the archive repo, magiclantern_hg_02), since this makes the merge work much simpler.

At this point it will be possible to easily integrate code from the popular forks with strong support for old cams.

names_are_hard

Updates for 2023 Jun - Aug

Most of this period was spent integrating popular branches from official repo into mine.  The goal here is to have a modern repo with all of the popular branches merged into one; this allows anyone who was working with e.g. crop_rec_4k_mlv_snd to easily port their work into my repo, if they want.  All work is merged down into one branch.  All fixes and improvements go in the same branch; everyone benefits, fixes don't end up living in some minor branch for years.

I don't understand the branching strategy used by official, but it clearly allowed branches to remain alive for very long periods, often years.  This made integrating the work very difficult.  This was something like 900 commits, many requiring complex merge conflict resolutions.  I didn't enjoy this work, but the divergence was repeatedly causing great pain, and the more we made changes to the base repo to improve support on new cams, the harder it became to merge with official.  There are still some minor branches that may want to be merged in (e.g. iso_research), but the merge work looks easier for those that remain.

Lua_fix and crop_rec_4k_mlv_snd are now fully merged, initial testing effort (thanks Walter and Bilal!) suggests there are some minor problems but no major bugs have been found.

Some long standing bugs were fixed after these were merged, see my additions here (for lua_fix): https://github.com/reticulatedpines/magiclantern_simplified/commits/b7b45b3512a89b278f53d401bca3b2304393a141
And here (for crop_rec): https://github.com/reticulatedpines/magiclantern_simplified/commits/b4a40cf13e53a064049b09fd8639665e285c15c1

Build speed has been improved again.  Module builds in official had been artificially limited to build sequentially due to a known bug.  I fixed that bug months ago, so I've re-enabled parallel builds.  Also during this period I found more bugs in the build system ::) But they're old bugs, so, I guess we can tolerate them a little longer...

Current reticulated_pines repo status
Ready to integrate third-party repos if desired.

If people are using your builds, they're valuable and I want to collect all the good parts into one place, while keeping builds useful for everyone.  Improve raw video but break features for stills shooters?  We need to find a compromise.  In exchange you get a base repo that builds easily on modern systems, builds quickly, supports modern cams, has bug fixes, and is actively maintained.

If you have a repo based on lua_fix or crop_rec_4k_mlv_snd (or many other branches), your changes should apply over the top of my repo.  This won't be completely trivial because we've changed things on that repo to support new cams, this includes the boot process for all cams, and e.g. memory subsystem changes.  If you'd like to try this, fork reticulated_pines and work in a branch.  If you'd like to integrate code but don't want to do it yourself, there's a good chance I'll do it if you ask.

Upcoming work
Community build testing!  Private testing on Discord suggests this merged code is good enough for wider testing.

Automated regression test system.  With more cams expected to work, getting regression testing useful again becomes more important.


names_are_hard

Bonus update week, late Aug / early Sept
I decided to clean up the code before pushing builds for testing.  I've been annoyed forever that the build has a large number of compiler warnings.  I've spent a lot of time previously reducing this number (targeting new cams, we saw many warnings related to the new cams disabling most features, these build configurations hadn't been tested before).

Took about a week, but I've now removed all warnings from builds.  Some of them were definitely bugs.  Some were quite hard to fix.  Some have been in the code for over 10 years.  On the plus side I got to use an anonymous union for the first time.

In the background I updated my dev machine, so now I'm building with arm-none-eabi-gcc 12.2.1.  This is a much more recent gcc than official builds use, and makes builds smaller.  I've also enabled slightly more aggressive warning flags in the build.

I've added a make flag that enables -Werror, so you can do e.g. this:
make FATAL_WARNINGS=y zip
There is now no excuse for adding code that introduces new warnings.  I need to get feedback from other devs using different environments / compilers, but at some point I expect to make -Werror the default.  Any devs reading this - if you can test to see if you get any build warnings, that would be useful!
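
For devs who want to report what their compiler produces, a captured build log can be summarised with a few lines of Python. This is a sketch, not part of the repo: count_warnings is a hypothetical helper, and it assumes the usual GCC diagnostic layout of path:line:col: warning: message [-Wname].

```python
import re
from collections import Counter

# Matches GCC-style warning lines; the trailing [-Wname] tag is optional.
WARNING_RE = re.compile(
    r"^[^:\s]+:\d+:(?:\d+:)? warning: .*?(?:\[(-W[^\]]+)\])?$"
)


def count_warnings(log_text: str) -> Counter:
    """Count warnings in a build log, grouped by the -W flag that raised them."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_RE.match(line.strip())
        if match:
            counts[match.group(1) or "(no flag)"] += 1
    return counts


# Hypothetical log excerpt; file names and messages are invented.
SAMPLE_LOG = """\
  CC  foo.o
src/foo.c:10:5: warning: unused variable 'x' [-Wunused-variable]
src/bar.c:3:1: warning: control reaches end of non-void function [-Wreturn-type]
src/foo.c:22:9: warning: unused variable 'y' [-Wunused-variable]
"""
```

Saving the output of a normal build to a file and passing its contents to count_warnings gives a per-flag summary that is easy to paste into a reply.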

Code with all the fixes is here:
https://github.com/reticulatedpines/magiclantern_simplified/commit/8934ac40ce1e004032a3187e02178f0852081875

Oh yeah, I made modules autoload, too.  If a module depends on an export from another module, when enabling the first module, the second will also be enabled (still requires reboot for the modules to load).  Sadly, this is less useful than expected: the few modules that do depend on others have the required symbols provided by *multiple other modules*, with no sensible way to choose between them.  So I made the code only auto-enable if it's unambiguous...  which is currently never the case.
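
The "only if unambiguous" rule can be sketched as follows. This is an illustration in Python, not the actual C code, and the module and symbol names are invented.

```python
from typing import Dict, Set


def modules_to_auto_enable(
    wanted: str,
    requires: Dict[str, Set[str]],
    exports: Dict[str, Set[str]],
) -> Set[str]:
    """Return the extra modules that can safely be enabled along with `wanted`.

    A dependency is auto-enabled only when exactly one other module exports
    the needed symbol; with zero or several providers nothing is chosen.
    """
    chosen: Set[str] = set()
    for symbol in requires.get(wanted, set()):
        providers = [
            name for name, symbols in exports.items()
            if symbol in symbols and name != wanted
        ]
        if len(providers) == 1:
            chosen.add(providers[0])
    return chosen


# Hypothetical modules: two recorders export the same hook, one helper is unique.
EXPORTS = {
    "recorder_a": {"rec_hook"},
    "recorder_b": {"rec_hook"},
    "helper": {"helper_init"},
}
REQUIRES = {
    "sound": {"rec_hook"},      # ambiguous: two possible providers
    "viewer": {"helper_init"},  # unambiguous: one provider
}
```

With this data, enabling "viewer" pulls in "helper", while enabling "sound" pulls in nothing because two modules offer rec_hook.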

I don't think we want multiple modules to provide the same symbol.  This is confusing and in the current form has led to code duplication.  Probably we want to split out the symbol into a separate binary and have everything that needs it, load that.

names_are_hard

Updates for 2023 Sep - Oct
Last commit considered: https://github.com/reticulatedpines/magiclantern_simplified/commit/af2c1123451636fcfd88c0929311b7b4276d386c

There are four pieces of work in this update.  Individually none are large, but all are significant.

First:
I added a developer guide.  This is only the beginnings, but I think it's a good start.  This is a set of Markdown files, and a script to turn them into PDF and HTML forms.  Consequently, you can preview them on Github: https://github.com/reticulatedpines/magiclantern_simplified/blob/dev/developer_guide/03_00_magiclantern.md
The PDF version looks nicest.  See ROADMAP.txt for rather rough plans for the future.  Anyone is welcome to contribute, but it does rather require good knowledge of ML code.

If anyone is interested in working on a User Guide in the same kind of format, I'd be very interested in getting something into the repo.

There are significant benefits to including the guides in the repo.  It makes it easier to keep them current with the code.  It keeps things together, while the PDF and HTML generation still allows easily using the info in different formats / locations.

Second:
I found direct RPC functionality on dual-digic cams.  This means either core can trigger the other to run arbitrary code.  An interrupt is used to signal the other core.  The latency is very low, and it bypasses the normal scheduling system; as soon as the target core receives the interrupt, it runs the code you specified.  task_create_ex() is likely preferable in most circumstances, but it's useful to have options - and this RPC functionality can be used earlier, before the task system is initialised.  I'll do a longer technical write up as a separate post.

Third:
Improvements to MMU code.  The prior experimental code allowed you to use MMU to patch ROM contents, but required you to find large amounts of unused memory first, which was unreliable and time consuming.  And if you got it wrong, the cam would immediately lock up, requiring a battery pull.  Kitor and I have had several discussions about this, with a few good ideas for possible future work.  I went the route of reserving space within magiclantern.bin for a minimal set of structs to allow MMU remapping one 64kB page.  By using the MMU to intercept init1_task (the root task on cpu1), we can also get cpu1 to take the MMU patches before any tasks are started, in a cross-platform manner (this had been difficult previously due to technical differences in D7 and D8X boot process).

This greatly simplifies using MMU for testing, and entirely removes the possibility of crashes due to memory conflicts.  You can still easily patch things in a dumb way and break things :)

Because we can't steal much memory in the user mem region occupied by magiclantern.bin, if you want to make several patches in distant regions, this approach won't work.
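
To see whether a set of patches fits this single-page scheme, it is enough to check that they all fall in the same 64kB-aligned page. This is a sketch with made-up ROM addresses; page_base and fits_single_page are hypothetical helpers, and each patch is assumed not to straddle a page boundary.

```python
from typing import Iterable, List

PAGE_SIZE = 0x10000  # 64 kB, the remap granularity described above


def page_base(addr: int) -> int:
    """Round an address down to the start of its 64 kB page."""
    return addr & ~(PAGE_SIZE - 1)


def pages_needed(patch_addrs: Iterable[int]) -> List[int]:
    """Return the distinct page bases touched by the given patch addresses."""
    return sorted({page_base(addr) for addr in patch_addrs})


def fits_single_page(patch_addrs: Iterable[int]) -> bool:
    """True if one remapped page is enough to cover every patch."""
    return len(pages_needed(patch_addrs)) <= 1


# Hypothetical patch locations: the first two share a page, the third does not.
CLOSE_PATCHES = [0xE0041234, 0xE004FFF0]
DISTANT_PATCHES = CLOSE_PATCHES + [0xE0251000]
```

Here CLOSE_PATCHES needs one page (0xE0040000), while DISTANT_PATCHES needs two and so exceeds what a one-page reservation can cover.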

See 200D for examples on how to configure a cam for MMU, especially CONFIG_MMU_REMAP.

Fourth:
How do we escape the confines of user mem pool?  On D45 cams, we relocate to the Allocate Memory pool.  But this uses cache hack patching, and D678X cams don't have that.  However, D78X cams do have MMU.  (Digic 6 will have to wait: I can see two obvious approaches, both untested so far.)

Combining the above, along with some code to automatically assign increasing numbers of the various MMU structs when given more space, I provide a way to locate ML itself, and MMU remap metadata, into Allocate Memory pool.  A few constants need to be chosen / found, but these are easy and portable across cams.  See 200D, PTR_ALLOC_MEM_START, ALLOC_MEM_STOLEN.

This removes prior constraints on ML binary size.  Although I have noticed that the AllocMem pool is quite small on some cams, so it might be better to split ML and MMU between user mem and AllocMem.  No problems observed so far, just worth noting.
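
As a rough feel for how many remapped pages a given amount of stolen memory could hold, here is a back-of-envelope sketch. It is not the repo's allocation code. It assumes the standard ARMv7 short-descriptor table sizes (16kB first-level table, 1kB second-level tables), ignores alignment padding, and takes the worst case where every remapped 64kB page needs its own second-level table. max_remap_pages is a hypothetical name.

```python
L1_TABLE_SIZE = 0x4000  # 16 kB first-level table (assumed short-descriptor format)
L2_TABLE_SIZE = 0x400   # 1 kB second-level table, covers a 1 MB region
PAGE_SIZE = 0x10000     # 64 kB of RAM backing one remapped ROM page


def max_remap_pages(budget_bytes: int) -> int:
    """Estimate how many 64 kB pages can be remapped within `budget_bytes`.

    Worst case: each page sits in a different 1 MB region, so each one
    costs a page of backing RAM plus its own second-level table.
    Alignment padding is ignored, so real numbers would be a bit lower.
    """
    remaining = budget_bytes - L1_TABLE_SIZE
    if remaining < 0:
        return 0
    return remaining // (PAGE_SIZE + L2_TABLE_SIZE)
```

Under these assumptions a 512kB budget allows about 7 remapped pages, which shows why a small AllocMem pool could become the limiting factor.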

Upcoming work
Working with a.sintes to integrate his recent work: 5D3 improved preview, improved card free space display, and a focus sequencing module

names_are_hard

Minor but satisfying update: merged changes from Ilia and updated a little - now ML builds with only python3.  No python2 required.

https://github.com/reticulatedpines/magiclantern_simplified/commit/85d0af41615d8c3fdfd5bbfd5cc7cb0f5e2b0260 and some earlier commits.