qemu-patch-raspberry4/include
Emilio G. Cota 2ac01d6daf translate-all: use a binary search tree to track TBs in TBContext
This is a prerequisite for supporting multiple TCG contexts, since
we will have threads generating code in separate regions of
code_gen_buffer.

For this we need a new field (.size) in struct tb_tc to keep
track of the size of the translated code. This field uses a size_t
to avoid adding a hole to the struct, although really an unsigned
int would have been enough.

The comparison function we use is optimized for the common case:
insertions. Profiling shows that upon booting debian-arm, 98%
of comparisons are between existing tb's (i.e. a->size and b->size
are both !0), which happens during insertions (and removals, but
those are rare). The remaining cases are lookups. From reading the glib
sources we see that the first key is always the lookup key. The code
does not assume this to always be the case, because this behaviour is
not guaranteed in the glib docs; we do, however, embed this knowledge
in the code as a branch hint for the compiler.
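
The comparison logic described above can be sketched roughly as follows.
This is a minimal, self-contained illustration and not the code from the
patch: struct tc_range, ptr_cmp_range and tc_range_cmp are hypothetical
names standing in for struct tb_tc and its comparator, the glib types are
replaced by plain C ones, and the likely() macro is assumed to map to the
GCC/Clang __builtin_expect builtin. A key with size == 0 is treated as a
lookup key, as the message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch hint; assumes a GCC- or Clang-compatible compiler. */
#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical stand-in for struct tb_tc: start and size of translated code. */
struct tc_range {
    const uint8_t *ptr;
    size_t size;            /* 0 marks a lookup key, not an existing TB */
};

/* Compare a lookup pointer against the range [r->ptr, r->ptr + r->size). */
static int ptr_cmp_range(const uint8_t *p, const struct tc_range *r)
{
    if (p >= r->ptr + r->size) {
        return 1;
    }
    if (p < r->ptr) {
        return -1;
    }
    return 0;               /* p falls inside the range */
}

static int tc_range_cmp(const struct tc_range *a, const struct tc_range *b)
{
    /* Common case: both keys are existing TBs (insertions and removals). */
    if (likely(a->size && b->size)) {
        if (a->ptr > b->ptr) {
            return 1;
        }
        if (a->ptr < b->ptr) {
            return -1;
        }
        /* Same start address: must be the same TB. */
        assert(a->size == b->size);
        return 0;
    }
    /*
     * Lookup: exactly one key has size == 0. The first key is expected
     * to be the lookup key, but the other order is still handled.
     */
    if (likely(a->size == 0)) {
        return ptr_cmp_range(a->ptr, b);
    }
    return -ptr_cmp_range(b->ptr, a);
}
```

With a comparator of this shape, a lookup key pointing anywhere inside
a TB's translated code compares equal to that TB, which is what a
pc-to-TB search needs.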

Note that tb_free does not free space in the code_gen_buffer anymore,
since we cannot easily know whether the tb is the last one inserted
in code_gen_buffer. The next patch in this series renames tb_free
to tb_remove to reflect this.

Performance-wise, lookups in tb_find_pc are the same as before:
O(log n). However, insertions are O(log n) instead of O(1), which
results in a small slowdown (roughly 2% in elapsed time) when booting
debian-arm:

Performance counter stats for 'build/arm-softmmu/qemu-system-arm \
	-machine type=virt -nographic -smp 1 -m 4096 \
	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
	-device virtio-net-device,netdev=unet \
	-drive file=img/arm/jessie-arm32.qcow2,id=myblock,index=0,if=none \
	-device virtio-blk-device,drive=myblock \
	-kernel img/arm/aarch32-current-linux-kernel-only.img \
	-append console=ttyAMA0 root=/dev/vda1 \
	-name arm,debug-threads=on -smp 1' (10 runs):

- Before:

       8048.598422      task-clock (msec)         #    0.931 CPUs utilized            ( +-  0.28% )
            16,974      context-switches          #    0.002 M/sec                    ( +-  0.12% )
                 0      cpu-migrations            #    0.000 K/sec
            10,125      page-faults               #    0.001 M/sec                    ( +-  1.23% )
    35,144,901,879      cycles                    #    4.367 GHz                      ( +-  0.14% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    65,758,252,643      instructions              #    1.87  insns per cycle          ( +-  0.33% )
    10,871,298,668      branches                  # 1350.707 M/sec                    ( +-  0.41% )
       192,322,212      branch-misses             #    1.77% of all branches          ( +-  0.32% )

       8.640869419 seconds time elapsed                                          ( +-  0.57% )

- After:

       8146.242027      task-clock (msec)         #    0.923 CPUs utilized            ( +-  1.23% )
            17,016      context-switches          #    0.002 M/sec                    ( +-  0.40% )
                 0      cpu-migrations            #    0.000 K/sec
            18,769      page-faults               #    0.002 M/sec                    ( +-  0.45% )
    35,660,956,120      cycles                    #    4.378 GHz                      ( +-  1.22% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    65,095,366,607      instructions              #    1.83  insns per cycle          ( +-  1.73% )
    10,803,480,261      branches                  # 1326.192 M/sec                    ( +-  1.95% )
       195,601,289      branch-misses             #    1.81% of all branches          ( +-  0.39% )

       8.828660235 seconds time elapsed                                          ( +-  0.38% )

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 13:53:42 -07:00
block nbd patches for 2017-10-14 2017-10-16 15:54:42 +01:00
chardev chardev: remove context in chr_update_read_handler 2017-09-22 21:07:27 +02:00
crypto block: convert qcrypto_block_encrypt|decrypt to take bytes offset 2017-10-06 16:30:47 +02:00
disas disas: Always initialize read_memory_inner_func properly 2017-10-12 12:10:38 +02:00
exec translate-all: use a binary search tree to track TBs in TBContext 2017-10-24 13:53:42 -07:00
fpu configure: Drop ancient Solaris 9 and earlier support 2017-07-21 15:04:05 +01:00
hw s390x: refactor error handling for MSCH handler 2017-10-20 13:32:10 +02:00
io io: get rid of bounce buffering in websock write path 2017-10-16 16:57:08 +01:00
libdecnumber Clean up ill-advised or unusual header guards 2016-07-12 16:20:46 +02:00
migration migration: check pre_save return in vmstate_save_state 2017-09-27 11:36:31 +01:00
monitor block: rip out all traces of password prompting 2017-07-11 17:44:56 +02:00
net net/net.c: Add vnet_hdr support in SocketReadState 2017-07-17 20:02:11 +08:00
qapi qapi: Change data type of the FOO_lookup generated for enum FOO 2017-09-04 13:09:13 +02:00
qemu ui: opengl updates for dma-buf support. 2017-10-19 12:09:53 +01:00
qom tcg: Add CPUState cflags_next_tb 2017-10-24 13:53:41 -07:00
scsi scsi: add multipath support to qemu-pr-helper 2017-09-22 21:07:27 +02:00
standard-headers linux-headers: sync against v4.14-rc1 2017-09-29 10:58:31 +02:00
sysemu tpm: move recv_data_callback to TPM interface 2017-10-19 11:42:33 -04:00
ui ui: opengl updates for dma-buf support. 2017-10-19 12:09:53 +01:00
elf.h tcg/s390: Use constant pool for movi 2017-09-07 11:57:35 -07:00
glib-compat.h glib-compat: move G_SOURCE_CONTINUE/REMOVE there 2017-10-10 16:33:55 +02:00
qemu-common.h maint: Include bug-reporting info in --help output 2017-08-08 17:28:53 +02:00
qemu-io.h hmp: Request permissions in qemu-io 2017-02-28 20:47:50 +01:00
trace-tcg.h trace: get rid of generated-events.h/generated-events.c 2016-10-12 09:54:52 +02:00