qemu-patch-raspberry4/target-arm
Emilio G. Cota 1dd089d0ee target-arm: emulate aarch64's LL/SC using cmpxchg helpers
Emulating LL/SC with cmpxchg is not strictly correct, since
cmpxchg compares only values and can therefore suffer from the
ABA problem. Portable parallel code, however, is written assuming
that only cmpxchg (and not LL/SC) is available, so it must already
cope with ABA. This means that in practice emulating LL/SC with
cmpxchg is a viable alternative.
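
As an illustration of the idea only (not the code in helper-a64.c), here is a minimal C11 sketch of LL/SC emulated with a compare-and-swap. The `ExclusiveState` struct and the functions `load_exclusive`, `store_exclusive`, `aba_demo` and `conflict_demo` are hypothetical names introduced for this example. The load-exclusive records the address and the observed value. The store-exclusive succeeds only if a cmpxchg against that recorded value succeeds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU record of the last load-exclusive.
 * This is an assumption for illustration, not QEMU's actual CPU state. */
typedef struct {
    _Atomic uint64_t *addr; /* address marked by the load-exclusive */
    uint64_t val;           /* value observed at that time */
    bool valid;             /* true while an exclusive access is pending */
} ExclusiveState;

/* LL: remember where we loaded from and what we saw. */
static uint64_t load_exclusive(ExclusiveState *s, _Atomic uint64_t *addr)
{
    s->addr = addr;
    s->val = atomic_load(addr);
    s->valid = true;
    return s->val;
}

/* SC: returns 0 on success and 1 on failure, mirroring the status
 * register convention of an AArch64 store-exclusive. */
static int store_exclusive(ExclusiveState *s, _Atomic uint64_t *addr,
                           uint64_t newval)
{
    int status = 1;

    if (s->valid && s->addr == addr) {
        uint64_t expected = s->val;
        /* Succeeds iff memory still holds the value seen by the LL. */
        if (atomic_compare_exchange_strong(addr, &expected, newval)) {
            status = 0;
        }
    }
    s->valid = false; /* the pending exclusive access is consumed */
    return status;
}

/* ABA case: the cell goes 5 -> 7 -> 5 between LL and SC. The two plain
 * stores stand in for writes by another CPU. Real LL/SC would fail the
 * SC; the cmpxchg emulation cannot tell and reports success. */
static int aba_demo(void)
{
    _Atomic uint64_t cell = 5;
    ExclusiveState s = { .valid = false };
    uint64_t seen = load_exclusive(&s, &cell);

    atomic_store(&cell, 7);
    atomic_store(&cell, 5);
    return store_exclusive(&s, &cell, seen + 1);
}

/* Ordinary conflict: the value differs at SC time, so the SC fails. */
static int conflict_demo(void)
{
    _Atomic uint64_t cell = 5;
    ExclusiveState s = { .valid = false };
    uint64_t seen = load_exclusive(&s, &cell);

    atomic_store(&cell, 7);
    return store_exclusive(&s, &cell, seen + 1);
}
```

In this sketch `aba_demo()` returns 0 (success) even though the cell was written twice in between. That is the ABA weakness described above. `conflict_demo()` returns 1, as real LL/SC would.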

The appended patch emulates LL/SC pairs in aarch64 with cmpxchg
helpers. This works in both user and system mode. In user mode, it
avoids pausing all other CPUs to perform the LL/SC pair. The
resulting performance and scalability improvement is significant,
as the plots below show. They plot the throughput of
atomic_add-bench compiled for ARM and executed on a 64-core x86
machine.
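
For readers unfamiliar with the workload, the following is a rough sketch of what an atomic-add benchmark of this kind does. It is not the source of atomic_add-bench. The names `run_bench`, `worker`, `WorkerArgs`, `counters` and `MAX_RANGE` are hypothetical, and reading a range of [0,N] as counter indices 0 through N inclusive is an assumption. Each thread performs a fixed number of atomic increments on randomly chosen counters, so a smaller range means more contention per counter. When built for AArch64 without LSE atomics, a compiler typically lowers the atomic add to a load-exclusive/store-exclusive loop, which is the path this patch speeds up.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_RANGE 1024 /* hypothetical upper bound on the range */

static _Atomic uint64_t counters[MAX_RANGE + 1];

typedef struct {
    uint64_t n_ops; /* increments performed by this thread */
    unsigned range; /* counters 0..range are eligible targets */
    uint32_t seed;  /* per-thread PRNG state, must be non-zero */
} WorkerArgs;

/* Small xorshift PRNG so threads do not contend on a shared rand(). */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

static void *worker(void *opaque)
{
    WorkerArgs *a = opaque;

    for (uint64_t i = 0; i < a->n_ops; i++) {
        unsigned idx = xorshift32(&a->seed) % (a->range + 1);
        atomic_fetch_add(&counters[idx], 1);
    }
    return NULL;
}

/* Runs n_threads workers and returns the sum of all counters, which
 * should equal n_threads * n_ops if no increment was lost. */
static uint64_t run_bench(unsigned n_threads, uint64_t n_ops, unsigned range)
{
    pthread_t *threads = calloc(n_threads, sizeof(*threads));
    WorkerArgs *args = calloc(n_threads, sizeof(*args));
    uint64_t total = 0;

    if (!threads || !args || range > MAX_RANGE) {
        abort();
    }
    for (unsigned i = 0; i <= range; i++) {
        atomic_store(&counters[i], 0);
    }
    for (unsigned i = 0; i < n_threads; i++) {
        args[i] = (WorkerArgs){ .n_ops = n_ops, .range = range,
                                .seed = i + 1 };
        if (pthread_create(&threads[i], NULL, worker, &args[i]) != 0) {
            abort();
        }
    }
    for (unsigned i = 0; i < n_threads; i++) {
        pthread_join(threads[i], NULL);
    }
    for (unsigned i = 0; i <= range; i++) {
        total += atomic_load(&counters[i]);
    }
    free(args);
    free(threads);
    return total;
}
```

A throughput figure would come from timing `run_bench` and dividing the returned total by the elapsed time. The sketch leaves timing out to stay short.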

Hi-res plots: http://imgur.com/a/JVc8Y

                atomic_add-bench: 1000000 ops/thread, [0,1] range

  18 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  16 ++master +-H--+                                                      ++
     ||                                                                    |
  14 ++                                                                   ++
     | |                                                                   |
  12 ++|                                                                  ++
     | |                                                                   |
  10 ++++                                                                 ++
   8 ++E                                                                  ++
     |+++                                                                  |
   6 ++ |                                                                 ++
     |  |                                                                  |
   4 ++ |                                                                 ++
     |   |                                                                 |
   2 +H++E+---                                                            ++
     + |     +E++----+E+---+--+E+----++E+------+E+------+E++----+E+---+--+E|
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 1000000 ops/thread, [0,2] range

  18 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  16 ++master +-H--+                                                      ++
     | |                                                                   |
  14 ++E                                                                  ++
     | |                                                                   |
  12 ++|                                                                  ++
     |+++                                                                  |
  10 ++ |                                                                 ++
   8 ++ |                                                                 ++
     |  |                                                                  |
   6 ++ |                                                                 ++
     |   |                                                                 |
   4 ++  |                                                                ++
     |  +E+---                                                             |
   2 +H+     +E+-----+++              +++      +++   ---+E+-----+E+------+++
     +++        +    +E+---+--+E+----++E+------+E+---   ++++    +++   +  +E|
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 1000000 ops/thread, [0,128] range

  70 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  60 ++master +-H--+                  +++            ---+E+-----+E+------+E+
     |                        +E+------E-------+E+---                      |
     |                     ---        +++                                  |
  50 ++              +++---                                               ++
     |              -+E+                                                   |
  40 ++      +++----                                                      ++
     |        E-                                                           |
     |      --|                                                            |
  30 ++   -- +++                                                          ++
     |  +E+                                                                |
  20 ++E+                                                                 ++
     |E+                                                                   |
     |                                                                     |
  10 ++                                                                   ++
     +          +          +         +          +          +          +    |
   0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

              atomic_add-bench: 1000000 ops/thread, [0,1024] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      +cmpxchg +-E--+      +          +         +          +          +    |
  140 ++master +-H--+                                           +++      +++
      |                                                -+E+-----+E+-------E|
  120 ++                                       +++ ----                  +++
      |                                +++  ----E--                        |
  100 ++                              --E---   +++                        ++
      |                       +++ ---- +++                                 |
   80 ++                     --E--                                        ++
      |                  ---- +++                                          |
      |              -+E+                                                  |
   60 ++         ---- +++                                                 ++
      |      +E+-                                                          |
   40 ++   --                                                             ++
      |  +E+                                                               |
   20 +EE+                                                                ++
      +++        +         +          +         +          +          +    |
    0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

[rth: Rearrange 128-bit cmpxchg helper.  Enforce alignment on LL.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-28-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:02 -07:00
arch_dump.c arm: Clean up includes 2016-01-29 15:07:23 +00:00
arm-powerctl.c Use #include "..." for our own headers, <...> for others 2016-07-12 16:19:16 +02:00
arm-powerctl.h ARM: Factor out ARM on/off PSCI control functions 2016-05-12 13:22:28 +01:00
arm-semi.c target-arm/arm-semi.c: In SYS_HEAPINFO use correct type for 'limit' 2016-07-07 13:47:00 +01:00
arm_ldst.h cpu: move exec-all.h inclusion out of cpu.h 2016-05-19 16:42:29 +02:00
cpu-qom.h exec: move cpu_exec_init() calls to realize functions 2016-10-24 17:29:16 -02:00
cpu.c x86 and CPU queue, 2016-10-24 2016-10-25 10:25:27 +01:00
cpu.h target-arm: Implement new HLT trap for semihosting 2016-10-24 16:26:56 +01:00
cpu64.c include/qemu/osdep.h: Don't include qapi/error.h 2016-03-22 22:20:15 +01:00
crypto_helper.c target-arm: Clean up includes 2016-01-18 16:33:32 +00:00
gdbstub.c qemu-common: push cpu.h inclusion out of qemu-common.h 2016-05-19 16:42:29 +02:00
gdbstub64.c qemu-common: push cpu.h inclusion out of qemu-common.h 2016-05-19 16:42:29 +02:00
helper-a64.c target-arm: emulate aarch64's LL/SC using cmpxchg helpers 2016-10-26 08:29:02 -07:00
helper-a64.h target-arm: emulate aarch64's LL/SC using cmpxchg helpers 2016-10-26 08:29:02 -07:00
helper.c target-arm: Implement new HLT trap for semihosting 2016-10-24 16:26:56 +01:00
helper.h target-arm: Implement MRS (banked) and MSR (banked) instructions 2016-03-16 17:05:58 +00:00
internals.h Fix confusing argument names in some common functions 2016-07-12 13:06:08 +01:00
iwmmxt_helper.c target-arm: Clean up includes 2016-01-18 16:33:32 +00:00
kvm-consts.h all: Clean up includes 2016-02-23 12:43:05 +00:00
kvm-stub.c qemu-common: push cpu.h inclusion out of qemu-common.h 2016-05-19 16:42:29 +02:00
kvm.c target-arm: kvm: use AddressSpace-specific listener 2016-10-17 19:22:16 +01:00
kvm32.c os-posix: include sys/mman.h 2016-06-16 18:39:03 +02:00
kvm64.c os-posix: include sys/mman.h 2016-06-16 18:39:03 +02:00
kvm_arm.h target-arm: move gicv3_class_name from machine to kvm_arm.h 2016-10-04 13:28:08 +01:00
machine.c target-arm: move gicv3_class_name from machine to kvm_arm.h 2016-10-04 13:28:08 +01:00
Makefile.objs ARM: Factor out ARM on/off PSCI control functions 2016-05-12 13:22:28 +01:00
monitor.c target-arm/monitor.c: Advertise emulated GICv3 in capabilities 2016-06-17 15:23:51 +01:00
neon_helper.c target-arm: Fix warn about implicit conversion 2016-08-12 11:12:24 +01:00
op_addsub.h Correct spelling of licensed 2011-07-23 11:26:12 -05:00
op_helper.c Fix masking of PC lower bits when doing exception returns 2016-10-17 19:29:03 +01:00
psci.c Use #include "..." for our own headers, <...> for others 2016-07-12 16:19:16 +02:00
trace-events target-arm: Add trace events for the generic timers 2016-10-17 19:32:44 +01:00
translate-a64.c target-arm: emulate aarch64's LL/SC using cmpxchg helpers 2016-10-26 08:29:02 -07:00
translate.c target-arm: emulate SWP with atomic_xchg helper 2016-10-26 08:29:02 -07:00
translate.h target-arm: Infrastucture changes to enable handling of tagged address loading into PC 2016-10-17 19:22:18 +01:00