Commit graph

49274 commits

Author SHA1 Message Date
Laurent Vivier 92c62548f6 target-m68k: immediate ops manage word and byte operands
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-28 10:38:48 +02:00
Laurent Vivier ff99b952c8 target-m68k: cmp manages word and bytes operands
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-28 10:38:48 +02:00
Laurent Vivier 8a370c6cb7 target-m68k: add/sub manage word and byte operands
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-28 10:38:48 +02:00
Laurent Vivier 227de713e0 target-m68k: add addressing modes to neg
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-28 10:38:48 +02:00
Laurent Vivier db3d7945ae target-m68k: introduce byte and word cc_ops
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-28 10:38:48 +02:00
Laurent Vivier 3c980d2ef6 target-m68k: some bit ops cleanup
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-28 10:38:48 +02:00
Laurent Vivier 415f4b62eb target-m68k: suba/adda can manage word operand
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-28 10:38:48 +02:00
Laurent Vivier 52dc23c595 target-m68k: and can manage word and byte operands
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-28 10:38:48 +02:00
Laurent Vivier 020a465920 target-m68k: or can manage word and byte operands
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-28 10:38:48 +02:00
Laurent Vivier eec37aec85 target-m68k: eor can manage word and byte operands
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-28 10:38:48 +02:00
Laurent Vivier ea4f2a8441 target-m68k: add addressing modes to not
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-28 10:38:48 +02:00
Richard Henderson a665a820e5 target-m68k: Inline addx, subx, negx
Signed-off-by: Richard Henderson <rth@twiddle.net>

And add opcodes for 680x0

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2016-10-28 10:38:48 +02:00
Laurent Vivier beff27ab3a target-m68k: add dbcc
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-28 10:38:48 +02:00
Laurent Vivier d5a3cf33f2 target-m68k: add addressing modes to scc
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-28 10:38:48 +02:00
Laurent Vivier 29cf437da4 target-m68k: add exg ops
Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-28 10:38:48 +02:00
Laurent Vivier c630e436c0 target-m68k: add linkl
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-28 10:38:48 +02:00
Laurent Vivier 71600eda7c target-m68k: add bkpt instruction
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2016-10-28 10:38:48 +02:00
Peter Maydell 835f3d24b4 audio: intel-hda: check stream entry count during transfer
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJYEg+bAAoJEEy22O7T6HE42V8QALHC42lwtj9Kx4yHS7Tpn4Jy
 ry62EjYvXb/BCd1GGkzCZhPPJSdpiwFRubmm00hwHPzQdYjj32CYfQvAFaLpcRlY
 u1Xp2G1YIlIrhhTwjEeYglBQkuLkjqh2g90kWarvw/Ry6iS9WEtrC8GwpbVnHa6/
 fAkAJV5KKUmXwKFVdhDZvhpOVf055U88EAoSz7H6P1opKcv/vruCs/wId3bl9LH0
 pmdhXnneJmriNWqoqmfEDAHGi37QS1GL2Zhfqs3H/dOfe5WTabYwFNd5fsz+PeyE
 SojgzdcTPpeBk25JwFjerx/aesu4uNU8GnUBqvDyVLERpHK4MVvAWToJJN5ruUGQ
 m+LYcCcbTIDUjVmvLCASjlJKoztv+iG4CCiFerCHg1tVBiPNMpZtdbkXnj61Vc77
 2r9P1sMkn+0KQ6bqoFw1A2Iz/DbL9faw935OQsGRpcLEHWq7laImSFM8qeUEARD2
 mpqi8vexIFdb40bW8kQ1IUuTcqrOhbABf7cw/aLGIQGhjH1MSTgAUtRz16erlwz3
 zmp3lJne06NWnqti0gepYo6QjVgYsAFcSySvlVgh7fo4lcp+aKaD1QasOnIIJHYY
 9hYXjgM5xd3E0k4O0NoJF1HpkuMDI/V+GXbhbog5ZUVlN7KDTSCLPXn45nsWNOva
 ttaXt0Fpc2Btwta/Wtkw
 =QEV8
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kraxel/tags/pull-audio-20161027-1' into staging

audio: intel-hda: check stream entry count during transfer

# gpg: Signature made Thu 27 Oct 2016 15:30:51 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/pull-audio-20161027-1:
  audio: intel-hda: check stream entry count during transfer

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-27 17:24:29 +01:00
Peter Maydell 5929d7e8a0 cmpxchg emulation of atomics, v8
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJYEMv7AAoJEK0ScMxN0CebtzsIAJD3n9AlCnJoC0xVJDcacqlY
 nkUqJqgmV5FkXq+x8KA6t7G5jfxZTBxk6QY42nXBidfuquogXnk0TQ0LJxLqp316
 mbjdomF8NHZH/79wMg5cYP/Thu4FAtw4uqJbb7kUKPjvQiPJhkISAl/4Jg9y07WR
 n6KptKI09QpIs6cM8q9DQ+RXoYG/xg/hP3Wih5TL4N1hZSiJG78Mpr5nF2HgbFrs
 gKRCmUzNkrG/hBweOqWDRo3H/fHvxFUDMNOzceH7gl1OvANaXaIO2lMkEI3MleFF
 Jq5pEtJ4TefIeoqCw8Nd5WZyzqZPUucqlXWwz3TLP0x3AgzkTH4F5lWlumqb+WM=
 =jEq6
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/rth/tags/pull-atomic-20161026' into staging

cmpxchg emulation of atomics, v8

# gpg: Signature made Wed 26 Oct 2016 16:30:03 BST
# gpg:                using RSA key 0xAD1270CC4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"
# Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B

* remotes/rth/tags/pull-atomic-20161026: (37 commits)
  target-alpha: Emulate LL/SC using cmpxchg helpers
  target-alpha: Introduce MMU_PHYS_IDX
  target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}
  linux-user: remove handling of aarch64's EXCP_STREX
  linux-user: remove handling of ARM's EXCP_STREX
  target-arm: emulate aarch64's LL/SC using cmpxchg helpers
  target-arm: emulate SWP with atomic_xchg helper
  target-arm: emulate LL/SC using cmpxchg helpers
  target-arm: Rearrange aa32 load and store functions
  tests: add atomic_add-bench
  target-i386: remove helper_lock()
  target-i386: emulate XCHG using atomic helper
  target-i386: emulate LOCK'ed BTX ops using atomic helpers
  target-i386: emulate LOCK'ed XADD using atomic helper
  target-i386: emulate LOCK'ed NEG using cmpxchg helper
  target-i386: emulate LOCK'ed NOT using atomic helper
  target-i386: emulate LOCK'ed INC using atomic helper
  target-i386: emulate LOCK'ed OP instructions using atomic helpers
  target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
  tcg: Emit barriers with parallel_cpus
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-27 14:06:34 +01:00
Peter Maydell 8f9d84df97 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQEcBAABAgAGBQJYEBKaAAoJEO8Ells5jWIR4jQH/3HgiWHs9+iQrUjo8DXrbF1b
 Dkdg8B66yYRirwR4KeCVJqOMnscPotISJc47MveoU+CxAwRcmhVtPuH+gZ7MLggp
 IrFT9XNo4WhSBlOc1tr/qGyGGgzzkWbcKKBfD3dK049XDcXPm7A3hNshqitf6YJI
 ILnlVk0ttKP7PKd6pvwaH+8yNDqcCr4+Rk6uSgOAB4N416+N/zk2AwQGWbMgLSzZ
 zBRu95K/7UvRRoyyqR4kxTRGhfNdEqWeOXXISRmTBfBM+iK6W3uaeWSy5ka9QTdo
 yXwcwxVe9iBxMuR3sZqNAbi5EbQIBtQSI2echG4bCQwvwjEAw9LUOhnJ44XkTfE=
 =GoQg
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging

# gpg: Signature made Wed 26 Oct 2016 03:19:06 BST
# gpg:                using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  colo-proxy: fix memory leak
  net: rtl8139: limit processing of ring descriptors
  net: vmxnet: initialise local tx descriptor
  e1000e: Don't zero out buffer address in rx descriptor
  net: rocker: set limit to DMA buffer size
  net: eepro100: fix memory leak in device uninit
  tap-bsd: OpenBSD uses tap(4) now
  net: pcnet: fix source formatting and indentation
  net: pcnet: check rx/tx descriptor ring length

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-27 12:45:45 +01:00
Peter Maydell 991a97ac74 -----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYD6tmAAoJEPMMOL0/L748ebQQAKxA2zQn2TPWoy/hzAC+QQIt
 VVKm3so/4WwX1JVrfLm9jAVHBsOYvHgABSxHtdwRQK2UD6QP3zh8dZNFIAJgNtGH
 SNMRm2HRyE7f16hnbz5Scqp7YGvYDp8XVolNvS/o5bBh3dS9j4V4W4DiC3bu7OZY
 FNytzlFCQvhOXNRejhlKsusYvrRECEy5Zaa3LTbYRVX7K/sHtDCD01URQKYJWZFJ
 m13juuus1rXNVuYxbs1YLwAJcN9yM4pjnZnO6meBH669+/JSbByjjXuhARwng4Z5
 o/f8+ZpyCMlNXMTt7DFu6QPxrFKCHpQ2Rwyy55uVx7lEmtZ1s6n6mF+P/Wp+kXUZ
 QzvbBSKCnNsLHGUf+0Us/U1v61WFhd4MZJF7dzWecVpLT8tCbXADFLbfgzFkz5MN
 zVd7L2uO4F3CtkwW8GFxmiVqmnHyOl/+2kz8UkejbnJQxwmr8oijVQVexaXDKHCA
 KytAz+PEW5qX6uGlfnEV2DnBpNOAzh8RspZF/mzDA5H0VM08bmeK7ySUe+BYRv2p
 GZXqdVchQrf2fYmFB1Dn4hGvc/gxtLjxkFst3koItgGeUauY35AAkDZ2B5EJ03UZ
 kpfA0grwhWTX92h+MYwXRHYaMoscCQPcjfYNbDGcOSvvwkElOf+fFMTaA9t0MGF2
 gREdnWwhl8gr7K/Mh+3t
 =uyQH
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/vivier/tags/m68k-part1-pull-request' into staging

# gpg: Signature made Tue 25 Oct 2016 19:58:46 BST
# gpg:                using RSA key 0xF30C38BD3F2FBE3C
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>"
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>"
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>"
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier/tags/m68k-part1-pull-request: (23 commits)
  target-m68k: Optimize gen_flush_flags
  target-m68k: Optimize some comparisons
  target-m68k: Use setcond for scc
  target-m68k: Introduce DisasCompare
  target-m68k: Reorg flags handling
  target-m68k: Remove incorrect clearing of cc_x
  target-m68k: Some fixes to SR and flags management
  target-m68k: Print flags properly
  target-m68k: update CPU flags management
  target-m68k: don't update cc_dest in helpers
  target-m68k: update move to/from ccr/sr
  target-m68k: remove m68k_cpu_exec_enter() and m68k_cpu_exec_exit()
  target-m68k: Replace helper_xflag_lt with setcond
  target-m68k: allow to update flags with operation on words and bytes
  target-m68k: REG() macro cleanup
  target-m68k: set PAGE_BITS to 12 for m68k
  target-m68k: define operand sizes
  target-m68k: set disassembler mode to 680x0 or coldfire
  target-m68k: introduce read_imXX() functions
  target-m68k: manage scaled index
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-27 11:58:43 +01:00
Richard Henderson ed2839166c target-alpha: Emulate LL/SC using cmpxchg helpers
Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem.  However, portable parallel
code is written assuming only cmpxchg which means that in
practice this is a viable alternative.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:02 -07:00
Richard Henderson 6a73ecf5cf target-alpha: Introduce MMU_PHYS_IDX
Rather than using helpers for physical accesses, use a mmu index.
The primary cleanup is with store-conditional on physical addresses.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:02 -07:00
Emilio G. Cota 05188cc72f target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}
The exception is not emitted anymore; remove it and the associated
TCG variables.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1467054136-10430-31-git-send-email-cota@braap.org>
2016-10-26 08:29:02 -07:00
Emilio G. Cota f4e6eb7ffe linux-user: remove handling of aarch64's EXCP_STREX
The exception is not emitted anymore.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1467054136-10430-30-git-send-email-cota@braap.org>
2016-10-26 08:29:02 -07:00
Emilio G. Cota b50b82fc48 linux-user: remove handling of ARM's EXCP_STREX
The exception is not emitted anymore.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twidle.net>
Message-Id: <1467054136-10430-29-git-send-email-cota@braap.org>
2016-10-26 08:29:02 -07:00
Emilio G. Cota 1dd089d0ee target-arm: emulate aarch64's LL/SC using cmpxchg helpers
Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.

The appended emulates LL/SC pairs in aarch64 with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/JVc8Y

                atomic_add-bench: 1000000 ops/thread, [0,1] range

  18 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  16 ++master +-H--+                                                      ++
     ||                                                                    |
  14 ++                                                                   ++
     | |                                                                   |
  12 ++|                                                                  ++
     | |                                                                   |
  10 ++++                                                                 ++
   8 ++E                                                                  ++
     |+++                                                                  |
   6 ++ |                                                                 ++
     |  |                                                                  |
   4 ++ |                                                                 ++
     |   |                                                                 |
   2 +H++E+---                                                            ++
     + |     +E++----+E+---+--+E+----++E+------+E+------+E++----+E+---+--+E|
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 1000000 ops/thread, [0,2] range

  18 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  16 ++master +-H--+                                                      ++
     | |                                                                   |
  14 ++E                                                                  ++
     | |                                                                   |
  12 ++|                                                                  ++
     |+++                                                                  |
  10 ++ |                                                                 ++
   8 ++ |                                                                 ++
     |  |                                                                  |
   6 ++ |                                                                 ++
     |   |                                                                 |
   4 ++  |                                                                ++
     |  +E+---                                                             |
   2 +H+     +E+-----+++              +++      +++   ---+E+-----+E+------+++
     +++        +    +E+---+--+E+----++E+------+E+---   ++++    +++   +  +E|
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 1000000 ops/thread, [0,128] range

  70 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  60 ++master +-H--+                  +++            ---+E+-----+E+------+E+
     |                        +E+------E-------+E+---                      |
     |                     ---        +++                                  |
  50 ++              +++---                                               ++
     |              -+E+                                                   |
  40 ++      +++----                                                      ++
     |        E-                                                           |
     |      --|                                                            |
  30 ++   -- +++                                                          ++
     |  +E+                                                                |
  20 ++E+                                                                 ++
     |E+                                                                   |
     |                                                                     |
  10 ++                                                                   ++
     +          +          +         +          +          +          +    |
   0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

              atomic_add-bench: 1000000 ops/thread, [0,1024] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      +cmpxchg +-E--+      +          +         +          +          +    |
  140 ++master +-H--+                                           +++      +++
      |                                                -+E+-----+E+-------E|
  120 ++                                       +++ ----                  +++
      |                                +++  ----E--                        |
  100 ++                              --E---   +++                        ++
      |                       +++ ---- +++                                 |
   80 ++                     --E--                                        ++
      |                  ---- +++                                          |
      |              -+E+                                                  |
   60 ++         ---- +++                                                 ++
      |      +E+-                                                          |
   40 ++   --                                                             ++
      |  +E+                                                               |
   20 +EE+                                                                ++
      +++        +         +          +         +          +          +    |
    0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

[rth: Rearrange 128-bit cmpxchg helper.  Enforce alignment on LL.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-28-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:02 -07:00
Emilio G. Cota cf12bce088 target-arm: emulate SWP with atomic_xchg helper
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-25-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:02 -07:00
Emilio G. Cota 354161b37c target-arm: emulate LL/SC using cmpxchg helpers
Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.

The appended emulates LL/SC pairs in ARM with cmpxchg helpers.
This works in both user and system mode. In usermode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/aNQpB

               atomic_add-bench: 1000000 ops/thread, [0,1] range

  9 ++---------+----------+----------+----------+----------+----------+---++
    +cmpxchg +-E--+       +          +          +          +          +    |
  8 +Emaster +-H--+                                                       ++
    | |                                                                    |
  7 ++E                                                                   ++
    | |                                                                    |
  6 ++++                                                                  ++
    |  |                                                                   |
  5 ++ |                                                                  ++
  4 ++ |                                                                  ++
    |  |                                                                   |
  3 ++ |                                                                  ++
    |   |                                                                  |
  2 ++  |                                                                 ++
    |H++E+---                                  +++  ---+E+------+E+------+E|
  1 +++     +E+-----+E+------+E+------+E+------+E+--   +++      +++       ++
    ++H+       +    +++   +  +++     ++++       +          +          +    |
  0 ++--H----H-+-----H----+----------+----------+----------+----------+---++
    0          10         20         30         40         50         60
                               Number of threads

                atomic_add-bench: 1000000 ops/thread, [0,2] range

  16 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  14 ++master +-H--+                                                      ++
     | |                                                                   |
  12 ++|                                                                  ++
     | E                                                                   |
  10 ++|                                                                  ++
     | |                                                                   |
   8 ++++                                                                 ++
     |E+|                                                                  |
     |  |                                                                  |
   6 ++ |                                                                 ++
     |   |                                                                 |
   4 ++  |                                                                ++
     |  +E+---       +++      +++              +++           ---+E+------+E|
   2 +H+     +E+------E-------+E+-----+E+------+E+------+E+--            +++
     + |        +    +++   +         ++++       +          +          +    |
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 1000000 ops/thread, [0,128] range

  70 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +       ++++          +    |
  60 ++master +-H--+                                 ----E------+E+-------++
     |                                        -+E+---   +++     +++      +E|
     |                                +++ ---- +++                       ++|
  50 ++                       +++  ---+E+-                                ++
     |                        -E---                                        |
  40 ++                    ---+++                                         ++
     |               +++---                                                |
     |              -+E+                                                   |
  30 ++      +++----                                                      ++
     |       +E+                                                           |
  20 ++ +++--                                                             ++
     |  +E+                                                                |
     |+E+                                                                  |
  10 +E+                                                                  ++
     +          +          +         +          +          +          +    |
   0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

              atomic_add-bench: 1000000 ops/thread, [0,1024] range

  120 ++---------+---------+----------+---------+----------+----------+---++
      +cmpxchg +-E--+      +          +         +          +          +    |
      | master +-H--+                                                    ++|
  100 ++                                                              ----E+
      |                                                 +++  ---+E+---   ++|
      |                                                --E---   +++        |
   80 ++                                           ---- +++               ++
      |                                     ---+E+-                        |
   60 ++                              -+E+--                              ++
      |                       +++ ---- +++                                 |
      |                      -+E+-                                         |
   40 ++              +++----                                             ++
      |      +++   ---+E+                                                  |
      |     -+E+---                                                        |
   20 ++ +E+                                                              ++
      |+E+++                                                               |
      +E+        +         +          +         +          +          +    |
    0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

[rth: Enforce alignment for ldrexd.]

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-23-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:02 -07:00
Richard Henderson 7f5616f538 target-arm: Rearrange aa32 load and store functions
Stop specializing on TARGET_LONG_BITS == 32; unconditionally allocate
a temp and expand with tcg_gen_extu_i32_tl.  Split out gen_aa32_addr,
gen_aa32_frob64, gen_aa32_ld_i32 and gen_aa32_st_i32 as separate interfaces.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:02 -07:00
Emilio G. Cota 070e3edcea tests: add atomic_add-bench
With this microbenchmark we can measure the overhead of emulating atomic
instructions with a configurable degree of contention.

The benchmark spawns $n threads, each performing $o atomic ops (additions)
in a loop. Each atomic operation is performed on a different cache line
(assuming lines are 64b long) that is randomly selected from a range [0, $r).

[ Note: each $foo corresponds to a -foo flag ]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1467054136-10430-20-git-send-email-cota@braap.org>
2016-10-26 08:29:01 -07:00
Emilio G. Cota 37b995f6e7 target-i386: remove helper_lock()
It's been superseded by the atomic helpers.

The use of the atomic helpers provides a significant performance and scalability
improvement. Below is the result of running the atomic_add-test microbenchmark with:
 $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
, where $n is the number of threads and $r is the allowed range for the additions.

The scenarios measured are:
- atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset

Results sorted in ascending range, i.e. descending degree of contention.
Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64
Opteron 6376 cores.

                atomic_add-bench: 5000000 ops/thread, [0,1] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     + atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 +Emaster +-N--+                                                      ++
     ||                                                                    |
     |++                                                                   |
     ||                                                                    |
  15 +++                                                                  ++
     |N|                                                                   |
     |+|                                                                   |
  10 ++|                                                                  ++
     |+|+                                                                  |
     | |    -+E+------        +++  ---+E+------+E+------+E+-----+E+------+E|
     |+E+E+- +++     +E+------+E+--                                        |
   5 ++|+                                                                 ++
     |+N+H+---                                 +++                         |
     ++++N+--+H++----+++   +  +++  --++H+------+H+------+H++----+H+---+--- |
   0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,2] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 ++master +-N--+                                                      ++
     |E|                                                                   |
     |++                                                                   |
     ||E                                                                   |
  15 ++|                                                                  ++
     |N||                                                                  |
     |+||                                   ---+E+------+E+-----+E+------+E|
  10 ++| |        ---+E+------+E+-----+E+---                    +++      +++
     ||H+E+--+E+--                                                         |
     |+++++                                                                |
     | ||                                                                  |
   5 ++|+H+--                                  +++                        ++
     |+N+    -                              ---+H+------+H+------          |
     +  +N+--+H++----+H+---+--+H+----++H+---    +          +    +H+---+--+H|
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,8] range

  40 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
  35 +cmpxchg +-H--+                                                      ++
     | master +-N--+               ---+E+------+E+------+E+-----+E+------+E|
  30 ++|                   ---+E+--   +++                                 ++
     | |            -+E+---                                                |
  25 ++E        ---- +++                                                  ++
     |+++++ -+E+                                                           |
  20 +E+ E-- +++                                                          ++
     |H|+++                                                                |
     |+|                                       +H+-------                  |
  15 ++H+                                   ---+++      +H+------         ++
     |N++H+--                         +++---                    +H+------++|
  10 ++ +++  -       +++           ---+H+                       +++      +H+
     | |     +H+-----+H+------+H+--                                        |
   5 ++|                      +++                                         ++
     ++N+N+--+N++          +         +          +          +          +    |
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 5000000 ops/thread, [0,128] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  140 +cmpxchg +-H--+                          +++      +++               ++
      | master +-N--+                           E--------E------+E+------++|
  120 ++                                      --|        |      +++       E+
      |                                     -- +++      +++              ++|
  100 ++                                   -                              ++
      |                                +++-                     +++      ++|
   80 ++                              -+E+    -+H+------+H+------H--------++
      |                           ----    ----                  +++       H|
      |            ---+E+-----+E+-  ---+H+                               ++|
   60 ++     +E+---   +++  ---+H+---                                      ++
      |    --+++   ---+H+--                                                |
   40 ++ +E+-+H+---                                                       ++
      |  +H+                                                               |
   20 +EE+                                                                ++
      +N+        +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

              atomic_add-bench: 5000000 ops/thread, [0,1024] range

  350 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  300 +cmpxchg +-H--+                                                    +++
      | master +-N--+                                           +++       ||
      |                                                 +++      |    ----E|
  250 ++                                                 |   ----E----    ++
      |                                              ----E---    |    ---+H|
  200 ++                                      -+E+---   +++  ---+H+---    ++
      |                                   ----         -+H+--              |
      |                                +E+     +++ ---- +++                |
  150 ++                            ---+++  ---+H+-                       ++
      |                          ---  -+H+--                               |
  100 ++                   ---+E+ ---- +++                                ++
      |      +++   ---+E+-----+H+-                                         |
      |     -+E+------+H+--                                                |
   50 ++ +E+                                                              ++
      +EE+       +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

  hi-res: http://imgur.com/a/fMRmq

For master I stopped measuring master after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:01 -07:00
Emilio G. Cota ea97ebe89f target-i386: emulate XCHG using atomic helper
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-19-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:01 -07:00
Emilio G. Cota cfe819d309 target-i386: emulate LOCK'ed BTX ops using atomic helpers
[rth: Avoid redundant qemu_ld in locked case.  Fix previously unnoticed
incorrect zero-extension of address in register-offset case.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-18-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:01 -07:00
Emilio G. Cota f53b01817f target-i386: emulate LOCK'ed XADD using atomic helper
[rth: Move load of reg value to common location.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-17-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:01 -07:00
Emilio G. Cota 8eb8c73856 target-i386: emulate LOCK'ed NEG using cmpxchg helper
[rth: Move redundant qemu_load out of cmpxchg loop.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-16-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:01 -07:00
Emilio G. Cota 2a5fe8ae14 target-i386: emulate LOCK'ed NOT using atomic helper
[rth: Avoid qemu_load that's redundant with the atomic op.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-15-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:01 -07:00
Emilio G. Cota 60e573462f target-i386: emulate LOCK'ed INC using atomic helper
[rth: Merge gen_inc_locked back into gen_inc to share cc update.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-14-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:01 -07:00
Emilio G. Cota a7cee522f3 target-i386: emulate LOCK'ed OP instructions using atomic helpers
[rth: Eliminate some unnecessary temporaries.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-13-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:01 -07:00
Emilio G. Cota ae03f8de45 target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
The diff here is uglier than necessary. All this does is to turn

FOO

into:

if (s->prefix & PREFIX_LOCK) {
  BAR
} else {
  FOO
}

where FOO is the original implementation of an unlocked cmpxchg.

[rth: Adjust unlocked cmpxchg to use movcond instead of branches.
Adjust helpers to use atomic helpers.]

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-6-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:01 -07:00
Richard Henderson 91682118aa tcg: Emit barriers with parallel_cpus
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:01 -07:00
Richard Henderson df79b996a7 tcg: Add CONFIG_ATOMIC64
Allow qemu to build on 32-bit hosts without 64-bit atomic ops.

Even if we only allow 32-bit hosts to multi-thread emulate 32-bit
guests, we still need some way to handle the 32-bit guest using a
64-bit atomic operation.  Do so by dropping back to single-step.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:01 -07:00
Richard Henderson 7ebee43ee3 tcg: Add atomic128 helpers
Force the use of cmpxchg16b on x86_64.

Wikipedia suggests that only very old AMD64 (circa 2004) did not have
this instruction.  Further, it's required by Windows 8 so no new cpus
will ever omit it.

If we truely care about these, then we could check this at startup time
and then avoid executing paths that use it.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:01 -07:00
Richard Henderson c482cb117c tcg: Add atomic helpers
Add all of cmpxchg, op_fetch, fetch_op, and xchg.
Handle both endian-ness, and sizes up to 8.
Handle expanding non-atomically, when emulating in serial.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:01 -07:00
Richard Henderson c86c6e4c80 cputlb: Tidy some macros
TGT_LE and TGT_BE are not size dependent and do not need to be
redefined.  The others are no longer used at all.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:00 -07:00
Richard Henderson 82a45b96a2 cputlb: Move most of iotlb code out of line
Saves 2k code size off of a cold path.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:00 -07:00
Richard Henderson 4097842885 cputlb: Remove includes from softmmu_template.h
We already include exec/address-spaces.h and exec/memory.h in
cputlb.c; the include of qemu/timer.h appears to be a fossil.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:00 -07:00
Richard Henderson 3b08f0a925 cputlb: Move probe_write out of softmmu_template.h
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:00 -07:00
Richard Henderson dea2198201 cputlb: Replace SHIFT with DATA_SIZE
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:00 -07:00
Alex Bennée b67cb68ba5 linux-user: enable parallel code generation on clone
The variable parallel_cpus controls the generation of thread aware
atomic code.  We only need to set it once we clone our first thread.
At this point any existing translations need to be thrown away.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:00 -07:00