Quote from: g3gg0 on May 23, 2013, 12:16:09 AM
yeah..
Well, I implemented a memcpy using LDMIA/STMIA for LV buffer copying, and this was a dead end.
So I tried to get EDMAC working.
I don't want to keep the idea of a 12-bit shifter alive, but I'm just interested in how many MB/s you got with LDMIA/STMIA.
The mysterious "d" seemed to reach about 30-40 MB/s on a smaller model.
And as far as I can see from the debug screenshots, a memcpy can reach over 70 MB/s. Is that correct? Was that LDMIA/STMIA?
Some years ago I did heavy x86/MMX/SSE2 optimizations, and a few days ago I had a look at ARM optimization.
As far as I can see (and from what "d" did), you can squeeze out a lot of bubbles and stalls by hand-optimizing.
Shifts are also nearly free on ARM: almost every data-processing instruction can run its second operand through the barrel shifter at no extra cost, at least for constant shift amounts. (I'm sure you are aware of this.)
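To illustrate the barrel shifter point, here is a minimal sketch in C. The function names are hypothetical, and the instruction named in each comment is only what an ARM compiler would typically emit, not something I have verified on the camera toolchain.

```c
#include <stdint.h>

/* Hypothetical helper: merge two 16-bit samples into one 32-bit word.
 * An ARM compiler can typically fold the shift into the OR:
 *     ORR r0, r0, r1, LSL #16
 * so the shift does not need an instruction of its own. */
static inline uint32_t pack_pair(uint32_t lo, uint32_t hi)
{
    return lo | (hi << 16);
}

/* Hypothetical helper: add a value scaled down by 4.
 * Typically a single instruction as well:
 *     ADD r0, r0, r1, LSR #2 */
static inline uint32_t add_scaled(uint32_t acc, uint32_t x)
{
    return acc + (x >> 2);
}
```

For example, `pack_pair(0x1234, 0xABCD)` gives `0xABCD1234`. Whether the shift really ends up folded depends on the compiler and the optimization level, so it is worth checking the disassembly.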
And you have a lot of cache/flush/prefill options on ARM.
But again, I don't want to reanimate the idea; I'm just curious how fast an optimized LDMIA/STMIA memcpy can be on a 5DMK3.
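For comparing numbers like these, a rough sketch of how a MB/s figure could be measured is shown below. It times the standard C `memcpy` with `clock()` on whatever machine it runs on; it is not the LDMIA/STMIA routine and not camera code. The function name `measure_copy_mb_s` is hypothetical, and on the camera a hardware timer would presumably replace `clock()`.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy `bytes` bytes `reps` times and return the
 * throughput in MB/s (1 MB = 1024 * 1024 bytes), or -1.0 on failure
 * or if the elapsed time was too short to measure. */
static double measure_copy_mb_s(size_t bytes, int reps)
{
    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    volatile unsigned char sink = 0; /* keeps the copies from being optimized away */
    double seconds, rate = -1.0;
    clock_t start, end;
    int i;

    if (!src || !dst || bytes == 0 || reps <= 0) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, bytes);

    start = clock();
    for (i = 0; i < reps; i++) {
        src[0] = (unsigned char)i;  /* change the source on every pass */
        memcpy(dst, src, bytes);
        sink ^= dst[bytes - 1];     /* read back so the copy is observable */
    }
    end = clock();

    seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds > 0.0)
        rate = ((double)bytes * reps) / (1024.0 * 1024.0) / seconds;

    free(src);
    free(dst);
    return rate;
}
```

Calling `measure_copy_mb_s((size_t)4 << 20, 32)` copies 128 MB in total. The buffer size matters a lot: a buffer that fits in the cache will show much higher numbers than one that has to go through main memory, so figures are only comparable when the sizes match.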