I believe your "...1/3 Aps-C sensor..." assessment is correct. This would explain the "soft look", etc. I will experiment with 3x a bit more.
The more I think about this question, the more I think I get what's going on. I think in normal mode, the processor is basically skipping or selectively using lines. Why, you ask, does it skip any lines? I think that with the higher receptor density on the sensor, the processor skips lines in order to minimize noise effects (hence the true source of moire/aliasing is this skipping).
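To make the line-skipping idea concrete, here is a rough sketch in Python. It is only a toy model under my own assumptions, not how the camera firmware actually works: the sensor is a single column of rows, the subject is a fine horizontal stripe pattern repeating every 4 rows, normal mode keeps every 3rd row with no filtering, and 3X zoom mode reads every row of the central third. All function names and numbers here are made up for illustration.

```python
import math

def stripe_rows(num_rows, period):
    """Brightness of each sensor row for horizontal stripes repeating every `period` rows."""
    return [0.5 + 0.5 * math.cos(2 * math.pi * r / period) for r in range(num_rows)]

def skip_lines(rows, step):
    """Hypothetical 'normal mode': keep every `step`-th row, discard the rest, no filtering."""
    return rows[::step]

def centre_crop(rows, factor):
    """Hypothetical '3X zoom mode': read every row, but only from the central 1/factor."""
    n = len(rows) // factor
    start = (len(rows) - n) // 2
    return rows[start:start + n]

def dominant_period(samples):
    """Period (in samples) of the strongest non-DC component, found with a plain DFT."""
    n = len(samples)
    mean = sum(samples) / n
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        re = sum((s - mean) * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum((s - mean) * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k

SENSOR_ROWS = 360   # assumed toy sensor height
STRIPE_PERIOD = 4   # assumed fine detail: one stripe cycle every 4 sensor rows
STEP = 3            # assumed skip factor, matching the 3X figure in this thread

sensor = stripe_rows(SENSOR_ROWS, STRIPE_PERIOD)

# Normal mode: the stripe pattern as it appears after skipping, expressed in sensor rows.
skipped_period_rows = dominant_period(skip_lines(sensor, STEP)) * STEP

# 3X zoom mode: every row is read, so the real stripe spacing survives.
cropped_period_rows = dominant_period(centre_crop(sensor, STEP))

print("skipped read-out sees a pattern every", skipped_period_rows, "sensor rows")
print("cropped read-out sees a pattern every", cropped_period_rows, "sensor rows")
```

In this toy model the skipped read-out reports a pattern repeating every 12 sensor rows, a coarse false pattern that is not in the scene, while the cropped read-out still reports the real 4-row spacing. That is the moire/aliasing effect in miniature. Note the sketch has no noise in it at all, so it says nothing either way about the noise part of my theory.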
This explains why the smaller sensor portion handles low light worse (it's not just a crop of the same lines, it's using adjacent lines with more noise). More noise also explains the weaker colour/contrast.
The processor in normal mode probably can't process all the lines as input anyway (hmmm ... it always seemed that RAW video is most hampered by the weak processor), so aliasing and moire may also be there due to processor limits.
Skipping lines probably also reduces noise on the luminance/colour receptors, so better contrast/colour.
That's the trade-off: less moire/aliasing with a softer picture (more noise), or better contrast/colour (less noise) with aliasing/moire. Neither is a perfect solution, knowledge is power ... It seems a low moire/aliasing shot should be done at ISO 100. Add whatever lighting / ND filtering is needed to stay at ISO 100 to get the least noise and best picture. That's my humble suggestion. Gaross, could you try a 3X zoom clip at ISO 100 with proper exposure on a decent prime lens and see if it removes the aliasing/moire and the softness?
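For working out how much lighting or ND is needed to hold ISO 100, the arithmetic is just stops, where each stop is a doubling or halving of light. A small Python sketch of that arithmetic follows; the function names and example ISO values are mine, picked for illustration, and it assumes shutter and aperture stay fixed.

```python
import math

def stops_between(iso_from, iso_to):
    """Exposure stops gained (positive) or lost (negative) when changing ISO."""
    return math.log2(iso_to / iso_from)

def light_multiplier(stops):
    """How many times more light is needed to make up for `stops` of lost sensitivity."""
    return 2 ** stops

def nd_factor(stops_over):
    """ND filter factor that cuts `stops_over` stops of light (3 stops -> factor 8)."""
    return 2 ** stops_over

# Example: a shot that was metered at ISO 800, moved down to ISO 100.
lost = -stops_between(800, 100)        # stops of sensitivity given up
extra_light = light_multiplier(lost)   # how much more light the scene needs

# Example: a bright scene that is 3 stops overexposed at ISO 100.
filter_needed = nd_factor(3)

print("stops lost going from ISO 800 to 100:", lost)
print("light needed to compensate: x", extra_light)
print("ND factor for 3 stops over:", filter_needed)
```

So a shot that only exposed properly at ISO 800 needs 8 times the light to work at ISO 100, and a scene that is 3 stops too bright at ISO 100 needs a filter with a factor of 8.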
BTW it seems that if the camera processor were 3X more powerful, regular mode could use every line, and then it should be the same as 3X zoom mode. And to improve noise even more you would need a 3X bigger sensor, so the adjacent lines are further apart.