qemu-patch-raspberry4/include
Emilio G. Cota f6bb84d531 tcg: consolidate TB lookups in tb_lookup__cpu_state
This avoids duplicating code. cpu_exec_step will also use the
new common function once we integrate parallel_cpus into tb->cflags.

Note that in this commit we also fix a race, described by Richard Henderson
during review. Think of this scenario with threads A and B:

   (A) Lookup succeeds for TB in hash without tb_lock
        (B) Sets the TB's tb->invalid flag
        (B) Removes the TB from tb_htable
        (B) Clears all CPU's tb_jmp_cache
   (A) Store TB into local tb_jmp_cache

Given that order of events, (A) will keep executing that invalid TB until
another flush of its tb_jmp_cache happens, which in theory might never happen.
We can fix this by checking the tb->invalid flag every time we look up a TB
from tb_jmp_cache, so that in the above scenario, next time we try to find
that TB in tb_jmp_cache, we won't, and will therefore be forced to look it
up in tb_htable.

Performance-wise, I measured a small improvement when booting debian-arm.
Note that inlining pays off:

 Performance counter stats for 'taskset -c 0 qemu-system-arm \
	-machine type=virt -nographic -smp 1 -m 4096 \
	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
	-device virtio-net-device,netdev=unet \
	-drive file=jessie.qcow2,id=myblock,index=0,if=none \
	-device virtio-blk-device,drive=myblock \
	-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
	-name arm,debug-threads=on -smp 1' (10 runs):

Before:
      18714.917392 task-clock                #    0.952 CPUs utilized            ( +-  0.95% )
            23,142 context-switches          #    0.001 M/sec                    ( +-  0.50% )
                 1 CPU-migrations            #    0.000 M/sec
            10,558 page-faults               #    0.001 M/sec                    ( +-  0.95% )
    53,957,727,252 cycles                    #    2.883 GHz                      ( +-  0.91% ) [83.33%]
    24,440,599,852 stalled-cycles-frontend   #   45.30% frontend cycles idle     ( +-  1.20% ) [83.33%]
    16,495,714,424 stalled-cycles-backend    #   30.57% backend  cycles idle     ( +-  0.95% ) [66.66%]
    76,267,572,582 instructions              #    1.41  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  0.87% ) [83.34%]
    12,692,186,323 branches                  #  678.186 M/sec                    ( +-  0.92% ) [83.35%]
       263,486,879 branch-misses             #    2.08% of all branches          ( +-  0.73% ) [83.34%]

      19.648474449 seconds time elapsed                                          ( +-  0.82% )

After, w/ inline (this patch):
      18471.376627 task-clock                #    0.955 CPUs utilized            ( +-  0.96% )
            23,048 context-switches          #    0.001 M/sec                    ( +-  0.48% )
                 1 CPU-migrations            #    0.000 M/sec
            10,708 page-faults               #    0.001 M/sec                    ( +-  0.81% )
    53,208,990,796 cycles                    #    2.881 GHz                      ( +-  0.98% ) [83.34%]
    23,941,071,673 stalled-cycles-frontend   #   44.99% frontend cycles idle     ( +-  0.95% ) [83.34%]
    16,161,773,848 stalled-cycles-backend    #   30.37% backend  cycles idle     ( +-  0.76% ) [66.67%]
    75,786,269,766 instructions              #    1.42  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  1.24% ) [83.34%]
    12,573,617,143 branches                  #  680.708 M/sec                    ( +-  1.34% ) [83.33%]
       260,235,550 branch-misses             #    2.07% of all branches          ( +-  0.66% ) [83.33%]

      19.340502161 seconds time elapsed                                          ( +-  0.56% )

After, w/o inline:
      18791.253967 task-clock                #    0.954 CPUs utilized            ( +-  0.78% )
            23,230 context-switches          #    0.001 M/sec                    ( +-  0.42% )
                 1 CPU-migrations            #    0.000 M/sec
            10,563 page-faults               #    0.001 M/sec                    ( +-  1.27% )
    54,168,674,622 cycles                    #    2.883 GHz                      ( +-  0.80% ) [83.34%]
    24,244,712,629 stalled-cycles-frontend   #   44.76% frontend cycles idle     ( +-  1.37% ) [83.33%]
    16,288,648,572 stalled-cycles-backend    #   30.07% backend  cycles idle     ( +-  0.95% ) [66.66%]
    77,659,755,503 instructions              #    1.43  insns per cycle
                                             #    0.31  stalled cycles per insn  ( +-  0.97% ) [83.34%]
    12,922,780,045 branches                  #  687.702 M/sec                    ( +-  1.06% ) [83.34%]
       261,962,386 branch-misses             #    2.03% of all branches          ( +-  0.71% ) [83.35%]

      19.700174670 seconds time elapsed                                          ( +-  0.56% )

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10 07:37:10 -07:00
..
block commit: Remove overlay_bs 2017-10-06 16:28:58 +02:00
chardev chardev: remove context in chr_update_read_handler 2017-09-22 21:07:27 +02:00
crypto block: convert qcrypto_block_encrypt|decrypt to take bytes offset 2017-10-06 16:30:47 +02:00
disas Fix Thumb-1 BE32 execution and disassembly. 2017-02-07 18:29:59 +00:00
exec tcg: consolidate TB lookups in tb_lookup__cpu_state 2017-10-10 07:37:10 -07:00
fpu configure: Drop ancient Solaris 9 and earlier support 2017-07-21 15:04:05 +01:00
hw machine: Add a valid_cpu_types property 2017-10-09 23:21:52 -03:00
io io: Reply to ping frames 2017-10-04 13:21:53 +01:00
libdecnumber Clean up ill-advised or unusual header guards 2016-07-12 16:20:46 +02:00
migration migration: check pre_save return in vmstate_save_state 2017-09-27 11:36:31 +01:00
monitor block: rip out all traces of password prompting 2017-07-11 17:44:56 +02:00
net net/net.c: Add vnet_hdr support in SocketReadState 2017-07-17 20:02:11 +08:00
qapi qapi: Change data type of the FOO_lookup generated for enum FOO 2017-09-04 13:09:13 +02:00
qemu hbitmap: Rename serialization_granularity to serialization_align 2017-10-06 16:28:58 +02:00
qom qom: update doc comment for type_register[_static]() 2017-10-09 23:21:52 -03:00
scsi scsi: add multipath support to qemu-pr-helper 2017-09-22 21:07:27 +02:00
standard-headers linux-headers: sync against v4.14-rc1 2017-09-29 10:58:31 +02:00
sysemu vl: exit if maxcpus is negative 2017-10-09 23:21:52 -03:00
ui egl: misc framebuffer helper improvements. 2017-09-29 10:36:33 +02:00
elf.h tcg/s390: Use constant pool for movi 2017-09-07 11:57:35 -07:00
glib-compat.h migration: Teach it about G_SOURCE_REMOVE 2017-09-22 14:11:18 +02:00
qemu-common.h maint: Include bug-reporting info in --help output 2017-08-08 17:28:53 +02:00
qemu-io.h hmp: Request permissions in qemu-io 2017-02-28 20:47:50 +01:00
trace-tcg.h trace: get rid of generated-events.h/generated-events.c 2016-10-12 09:54:52 +02:00