Friday, May 22, 2015

Moto G and Moto E: Snapdragon 400 vs 200, Round 2

A while ago we compared the S200 and the S400 using an old CPU benchmark tool called "Linpack for Android", an app that had not been updated in ages. That was convenient for one reason: no update = no artificial MFLOPS increase due to app optimization... but with CM11, 12 and 12.1 I noticed huge performance drops in Linpack that I couldn't feel in actual use. It was time to try something else.


Linpack for Android
A dead Benchmark app...

Linpack vs RgbenchMM on Moto E and Moto G (scores in MFLOPS):

RgbenchMM, 1190 MHz
Threads     Moto E (S200)    Moto G (S400)
1           270              290
2           611              641
4           587              1280

Linpack, 1190 MHz
Run         Moto E (S200)    Moto G (S400)
Single      79               55
Multi       145              150
Per core    72.5             50


Why is there so much disparity? Each app makes the CPU work in its own way and sets up its workload differently, so the results differ too.
Two things stand out:

  • In Linpack, the single-core results for the Moto E (S200, stock rooted KK) and the Moto G (S400, CM12.1) are not similar, although they should be given the identical core frequency and the similar architecture (the results are similar with RgbenchMM).
  • The multi-thread results are strange on Linpack: how could a quad-core CPU not do noticeably better than a dual-core one?
    • Linpack cannot handle more than 3 threads.
    • The Moto G with CM12.1 starts with about 20% fewer MFLOPS on one core, and that is still the case in the multi-threaded run... Not expected.
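The "Per core" row of the Linpack table can be reproduced with a quick calculation. This is only a sketch: the thread counts are an assumption (2 for the dual-core S200, 3 for the S400 because of the 3-thread limit mentioned above), and the names `linpack`, `threads_used`, `per_core` and `scaling` are made up for the example.

```python
# Linpack scores in MFLOPS, copied from the table above.
linpack = {
    "Moto E (S200)": {"single": 79, "multi": 145},
    "Moto G (S400)": {"single": 55, "multi": 150},
}

# Assumption: number of threads Linpack really used in the multi-thread run.
# 2 for the dual-core S200, 3 for the S400 (Linpack's 3-thread limit).
threads_used = {"Moto E (S200)": 2, "Moto G (S400)": 3}


def per_core(device):
    """Multi-thread score divided by the assumed number of threads."""
    return linpack[device]["multi"] / threads_used[device]


def scaling(device):
    """Multi-thread score divided by the single-thread score."""
    return linpack[device]["multi"] / linpack[device]["single"]


for device in linpack:
    print(f"{device}: {per_core(device):.1f} MFLOPS/core, "
          f"x{scaling(device):.2f} vs single thread")
```

With these assumptions the per-core figures come out at 72.5 and 50, matching the table, which suggests the Moto G multi-thread score was indeed divided by 3 and not by 4.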

Let's see how RgbenchMM performs in the same situation!


RgbenchMM
A Great CPU Benchmark app!

What do we see here?

  • RgbenchMM results are similar on the Moto E and the Moto G for the single- and dual-thread benchmarks: this is expected since the S200 and the S400 are clocked at the same frequency.
  • On the quad-core S400, the 4-thread run scores more than 4 times the single-thread run.
  • On the S400, results scale almost linearly with the number of cores used (r²=0.97, which is pretty good for benchmark results).
  • The 4-thread benchmark is only suitable for the Moto G (the S400 is a quad-core) and gives inconsistent results on the S200 (only a dual-core, so 4 threads handled by 2 cores cannot give good results).
  • The slightly lower results on the Moto E could be explained by the S200 having to share its 2 cores between the benchmark and the system, whereas the Moto G (S400) still has 3 and 2 cores left to handle system processes during the single- and dual-thread runs (although the 4-thread score is almost exactly twice the 2-thread score... so even with all its cores busy, the S400 gets better MFLOPS/core results than the S200?)
  • On both devices, MFLOPS/core results are slightly better in the dual-thread run than in the single-thread one.
The RgbenchMM dev explains the following:
It does about 6 loads and 8 multiply-accumulates (or 16 flops) inside the loop. The load instructions (FLDD) are also VFP instructions as are the FMACD instructions. Thus, the benchmark is testing the VFP performance almost exclusively. One other detail about the code is that the threads are set up so that ideally they are reading the same columns of one of the input matrices. This will be beneficial on architectures with at least 1 level of shared cache and thus you may see more than 2x speedup on a dual-core processor.
  • The much lower MFLOPS/core result for the 4-thread run on the dual-core S200 shows how it would cope with heavy multitasking.
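The per-thread figures and speedups behind the points above can be checked with a few lines of Python. This is just a sketch using the RgbenchMM scores from the table; the names `rgbench`, `per_thread` and `speedup` are made up for the example, and "per thread" only equals "per core" when there are at least as many cores as threads (so not for the 4-thread run on the S200).

```python
# RgbenchMM scores in MFLOPS, copied from the table: {threads: score}.
rgbench = {
    "Moto E (S200)": {1: 270, 2: 611, 4: 587},
    "Moto G (S400)": {1: 290, 2: 641, 4: 1280},
}


def per_thread(device):
    """Score divided by the number of benchmark threads."""
    return {n: score / n for n, score in rgbench[device].items()}


def speedup(device):
    """Score relative to the single-thread run on the same device."""
    base = rgbench[device][1]
    return {n: score / base for n, score in rgbench[device].items()}


for device in rgbench:
    for n in sorted(rgbench[device]):
        print(f"{device}, {n} thread(s): "
              f"{per_thread(device)[n]:.1f} MFLOPS/thread, "
              f"x{speedup(device)[n]:.2f} vs single thread")
```

On the S400 the per-thread score stays around 320 MFLOPS at both 2 and 4 threads, while on the S200 it falls from about 305 with 2 threads to under 150 with 4 threads.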



Does the conclusion change?
Compared to the previous S200 vs S400 article

  • Does changing the benchmark app modify the conclusion of the S200 vs S400 comparison? No, it doesn't.
  • Does it change the conclusion of the "Pure CPU benchmarks: the limits" article? Not exactly. In fact the results can't be compared, since my Moto G was running CM11 at that time, but it could explain the large gap between CM11 and stock KK: it was more a Linpack bias than a ROM issue. I'll be able to test the stock ROM on an S400 soon, and the final conclusion will come then.