Commit graph

89 commits

Author SHA1 Message Date
Dr. David Alan Gilbert 2a1bc8bde7 migration/rdma: rdma_accept_incoming_migration fix error handling
rdma_accept_incoming_migration is called from an fd handler and
can't return an Error * anywhere.
Currently it's leaking Error's in errp/local_err - there's
no point putting them in there unless we can report them.

Turn most into fprintf's, and the last into an error_reportf_err
where it's coming up from another function.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2020-02-13 10:55:55 +01:00
Juan Quintela b673eab4e2 multifd: Make multifd_load_setup() get an Error parameter
We need to change the full chain to pass the Error parameter.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-29 11:28:59 +01:00
Dr. David Alan Gilbert 987ab2a549 migration: Use automatic rcu_read unlock in rdma.c
Use the automatic read unlocker in migration/rdma.c.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20191007143642.301445-5-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-10-11 14:20:01 +01:00
Dr. David Alan Gilbert d46a4847ca migration/rdma.c: Swap synchronize_rcu for call_rcu
This fixes a deadlock that can occur on the migration source after
a failed RDMA migration;  as the source tries to cleanup it
clears a pair of pointers and uses synchronize_rcu to wait; this
is happening on the main thread.  With the CPUs running
a CPU thread can be an rcu reader and attempt to grab the main lock
(kvm_handle_io->address_space_write->flatview_write->flatview_write_continue->
prepare_mmio_access->qemu_mutex_lock_iothread_impl)

Replace the synchronize_rcu with a call_rcu to postpone the freeing.

Fixes: 74637e6f08 ("migration: implement bi-directional RDMA QIOChannel")

( https://bugzilla.redhat.com/show_bug.cgi?id=1746787 )

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190913163507.1403-3-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-09-25 15:51:19 +01:00
Dr. David Alan Gilbert de8434a35a migration/rdma: Don't moan about disconnects at the end
If we've already finished the migration or something has
already gone wrong, don't moan about the migration stream disconnecting.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190913163507.1403-2-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-09-25 15:51:19 +01:00
Peter Maydell 95a9457fd4 Header cleanup patches for 2019-08-13
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl1WleASHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZTBBYQALQLzIYb2Zux95bAxoJdhqNuEOGLfxeu
 gx0i0roPe6SBleHozUK+gf7kVYyw7he58n2dZURGqrpqktgZOFcea2a6Dq1rnVw6
 JMJ2Oy7V326bHwJT0Np9rW4n+FHsMQZoAUEHjl9EeGCZfO/zy2aSWPsD8mbcbm0g
 hUW5Jr4+cpm28BCL8I+2HhWFazB6G2IPAF9oEXmNsOM6J1Ho8WGrTAjASe0Il5Yi
 m2B4QWG+4uz77WYnkttnssm41K1S95HYyaKluIVyNwTnsPTN303V/sUj+wdRaooL
 k1O6WqaavGhal7QeRqy+vCpF8m6qLq7NaYCzSCOrrkkuC8TAnpVn7Xmi9qI+vb6O
 kGBpDWhq5wOnphsEhnFvhPZgD+WZo3mwTgW4h0d3UhB6orOTPTMvWKEwFJ1j/O6/
 gntV61o542c9gpZjS133221HRmNjteHF/5/TFzmX/G50sgivJn+WOP87naM2aBAz
 8MW5HatTox+qQqYD4VMUIVnVkguxHDVhFRBunYu0HvZZ1Rud+Lc6Xzi6H4jDlZ81
 vtOmAlMU3dbp97gNvJrAVqV4JIL3puOWbu0MMaQWoG53Kcdfu46LIr57TTg3dw61
 R9e7HSOQjYILChoodwELlyeAsVeZo3IzX9vPX8aw7MoHvneyTUNqtha/rHsLEwsb
 97G19dydGEC6
 =eSUz
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-include-2019-08-13-v2' into staging

Header cleanup patches for 2019-08-13

# gpg: Signature made Fri 16 Aug 2019 12:39:12 BST
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-include-2019-08-13-v2: (29 commits)
  sysemu: Split sysemu/runstate.h off sysemu/sysemu.h
  sysemu: Move the VMChangeStateEntry typedef to qemu/typedefs.h
  Include sysemu/sysemu.h a lot less
  Clean up inclusion of sysemu/sysemu.h
  numa: Move remaining NUMA declarations from sysemu.h to numa.h
  Include sysemu/hostmem.h less
  numa: Don't include hw/boards.h into sysemu/numa.h
  Include hw/boards.h a bit less
  Include hw/qdev-properties.h less
  Include qemu/main-loop.h less
  Include qemu/queue.h slightly less
  Include hw/hw.h exactly where needed
  Include qom/object.h slightly less
  Include exec/memory.h slightly less
  Include migration/vmstate.h less
  migration: Move the VMStateDescription typedef to typedefs.h
  Clean up inclusion of exec/cpu-common.h
  Include hw/irq.h a lot less
  typedefs: Separate incomplete types and function types
  ide: Include hw/ide/internal a bit less outside hw/ide/
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-08-16 14:53:43 +01:00
Markus Armbruster d484205210 Include exec/memory.h slightly less
Drop unnecessary inclusions from headers.  Downgrade a few more to
exec/hwaddr.h.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-17-armbru@redhat.com>
2019-08-16 13:31:52 +02:00
Wei Yang 6a88eb2b08 migration: use migration_in_postcopy() to check POSTCOPY_ACTIVE
Use common helper function to check the state.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190719071129.11880-1-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-08-14 17:33:14 +01:00
Alex Bennée 1f4abd81f7 migration: move port_attr inside CONFIG_LINUX
Otherwise the FreeBSD compiler complains about an unused variable.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2019-07-04 19:23:07 +01:00
Markus Armbruster 0b8fa32f55 Include qemu/module.h where needed, drop it from qemu-common.h
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-4-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c
hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c;
ui/cocoa.m fixed up]
2019-06-12 13:18:33 +02:00
Dr. David Alan Gilbert 281496bb8a migration/rdma: Check qemu_rdma_init_one_block
Actually it can't fail at the moment, but Coverity moans that
it's the only place it's not checked, and it's an easy check.

Reported-by: Coverity (CID 1399413)
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-03-25 18:45:10 +01:00
Yury Kotov fbd162e629 migration: Add an ability to ignore shared RAM blocks
If ignore-shared capability is set then skip shared RAMBlocks during the
RAM migration.
Also, move qemu_ram_foreach_migratable_block (and rename) to the
migration code, because it requires access to the migration capabilities.

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190215174548.2630-4-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:17 +00:00
Yury Kotov 754cb9c0eb exec: Change RAMBlockIterFunc definition
Currently, qemu_ram_foreach_* calls RAMBlockIterFunc with many
block-specific arguments. But often iter func needs RAMBlock*.
This refactoring is needed for fast access to RAMBlock flags from
qemu_ram_foreach_block's callback. The only way to achieve this now
is to call qemu_ram_block_from_host (which also enumerates blocks).

So, this patch reduces complexity of
qemu_ram_foreach_block() -> cb() -> qemu_ram_block_from_host()
from O(n^2) to O(n).

Fix RAMBlockIterFunc definition and add some functions to read
RAMBlock* fields witch were passed.

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190215174548.2630-2-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:17 +00:00
Marcel Apfelbaum 9589e76301 migration/rdma: clang compilation fix
Configuring QEMU with:
        ../configure --cc=clang --enable-rdma

Leads to compilation error:

  CC      migration/rdma.o
  CC      migration/block.o
  qemu/migration/rdma.c:3615:58: error: taking address of packed member 'rkey' of class or structure
      'RDMARegisterResult' may result in an unaligned pointer value [-Werror,-Waddress-of-packed-member]
                            (uintptr_t)host_addr, NULL, &reg_result->rkey,
                                                         ^~~~~~~~~~~~~~~~
Fix it by using a temp local variable.

Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Message-Id: <20190304184923.24215-1-marcel.apfelbaum@gmail.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-03-06 10:49:17 +00:00
Dr. David Alan Gilbert cf75e26849 migration/rdma: Fix qemu_rdma_cleanup null check
If the migration fails before the channel is open (e.g. a bad
address) we end up in the cleanup with rdma->channel==NULL.

Spotted by Coverity: CID 1398634
Fixes: fbbaacab27
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190214185351.5927-1-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-03-06 10:49:17 +00:00
Dr. David Alan Gilbert fbbaacab27 migration/rdma: unregister fd handler
Unregister the fd handler before we destroy the channel,
otherwise we've got a race where we might land in the
fd handler just as we're closing the device.

(The race is quite data dependent, you just have to have
the right set of devices for it to trigger).

Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190122173111.29821-1-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-01-23 15:51:32 +00:00
Dr. David Alan Gilbert 449f91b2c8 migration/rdma: Fix uninitialised rdma_return_path
Clang correctly errors out moaning that rdma_return_path
is used uninitialised in the earlier error paths.
Make it NULL so that the error path ignores it.

Fixes: 55cc1b5937
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reported-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20180830173657.22939-1-dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-09-26 12:21:33 +01:00
Lidong Chen 923709896b migration: poll the cm event for destination qemu
The destination qemu only poll the comp_channel->fd in
qemu_rdma_wait_comp_channel. But when source qemu disconnnect
the rdma connection, the destination qemu should be notified.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-08-22 12:17:43 +02:00
Lidong Chen 54db882f07 migration: implement the shutdown for RDMA QIOChannel
Because RDMA QIOChannel not implement shutdown function,
If the to_dst_file was set error, the return path thread
will wait forever. and the migration thread will wait
return path thread exit.

the backtrace of return path thread is:

(gdb) bt
    #0  0x00007f372a76bb0f in ppoll () from /lib64/libc.so.6
    #1  0x000000000071dc24 in qemu_poll_ns (fds=0x7ef7091d0580, nfds=2, timeout=100000000)
        at qemu-timer.c:325
    #2  0x00000000006b2fba in qemu_rdma_wait_comp_channel (rdma=0xd424000)
        at migration/rdma.c:1501
    #3  0x00000000006b3191 in qemu_rdma_block_for_wrid (rdma=0xd424000, wrid_requested=4000,
        byte_len=0x7ef7091d0640) at migration/rdma.c:1580
    #4  0x00000000006b3638 in qemu_rdma_exchange_get_response (rdma=0xd424000,
        head=0x7ef7091d0720, expecting=3, idx=0) at migration/rdma.c:1726
    #5  0x00000000006b3ad6 in qemu_rdma_exchange_recv (rdma=0xd424000, head=0x7ef7091d0720,
        expecting=3) at migration/rdma.c:1903
    #6  0x00000000006b5d03 in qemu_rdma_get_buffer (opaque=0x6a57dc0, buf=0x5c80030 "", pos=8,
        size=32768) at migration/rdma.c:2714
    #7  0x00000000006a9635 in qemu_fill_buffer (f=0x5c80000) at migration/qemu-file.c:232
    #8  0x00000000006a9ecd in qemu_peek_byte (f=0x5c80000, offset=0)
        at migration/qemu-file.c:502
    #9  0x00000000006a9f1f in qemu_get_byte (f=0x5c80000) at migration/qemu-file.c:515
    #10 0x00000000006aa162 in qemu_get_be16 (f=0x5c80000) at migration/qemu-file.c:591
    #11 0x00000000006a46d3 in source_return_path_thread (
        opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1331
    #12 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
    #13 0x00007f372a77635d in clone () from /lib64/libc.so.6

the backtrace of migration thread is:

(gdb) bt
    #0  0x00007f372aa4af57 in pthread_join () from /lib64/libpthread.so.0
    #1  0x00000000007d5711 in qemu_thread_join (thread=0xd826f8 <current_migration.37100+88>)
        at util/qemu-thread-posix.c:504
    #2  0x00000000006a4bc5 in await_return_path_close_on_source (
        ms=0xd826a0 <current_migration.37100>) at migration/migration.c:1460
    #3  0x00000000006a53e4 in migration_completion (s=0xd826a0 <current_migration.37100>,
        current_active_state=4, old_vm_running=0x7ef7089cf976, start_time=0x7ef7089cf980)
        at migration/migration.c:1695
    #4  0x00000000006a5c54 in migration_thread (opaque=0xd826a0 <current_migration.37100>)
        at migration/migration.c:1837
    #5  0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
    #6  0x00007f372a77635d in clone () from /lib64/libc.so.6

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-08-22 12:14:45 +02:00
Lidong Chen d5882995a1 migration: poll the cm event while wait RDMA work request completion
If the peer qemu is crashed, the qemu_rdma_wait_comp_channel function
maybe loop forever. so we should also poll the cm event fd, and when
receive RDMA_CM_EVENT_DISCONNECTED and RDMA_CM_EVENT_DEVICE_REMOVAL,
we consider some error happened.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Signed-off-by: Gal Shachaf <galsha@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-08-22 12:14:19 +02:00
Lidong Chen 4d9f675bcb migration: implement io_set_aio_fd_handler function for RDMA QIOChannel
if qio_channel_rdma_readv return QIO_CHANNEL_ERR_BLOCK, the destination qemu
crash.

The backtrace is:
(gdb) bt
    #0  0x0000000000000000 in ?? ()
    #1  0x00000000008db50e in qio_channel_set_aio_fd_handler (ioc=0x38111e0, ctx=0x3726080,
        io_read=0x8db841 <qio_channel_restart_read>, io_write=0x0, opaque=0x38111e0) at io/channel.c:
    #2  0x00000000008db952 in qio_channel_set_aio_fd_handlers (ioc=0x38111e0) at io/channel.c:438
    #3  0x00000000008dbab4 in qio_channel_yield (ioc=0x38111e0, condition=G_IO_IN) at io/channel.c:47
    #4  0x00000000007a870b in channel_get_buffer (opaque=0x38111e0, buf=0x440c038 "", pos=0, size=327
        at migration/qemu-file-channel.c:83
    #5  0x00000000007a70f6 in qemu_fill_buffer (f=0x440c000) at migration/qemu-file.c:299
    #6  0x00000000007a79d0 in qemu_peek_byte (f=0x440c000, offset=0) at migration/qemu-file.c:562
    #7  0x00000000007a7a22 in qemu_get_byte (f=0x440c000) at migration/qemu-file.c:575
    #8  0x00000000007a7c78 in qemu_get_be32 (f=0x440c000) at migration/qemu-file.c:655
    #9  0x00000000007a0508 in qemu_loadvm_state (f=0x440c000) at migration/savevm.c:2126
    #10 0x0000000000794141 in process_incoming_migration_co (opaque=0x0) at migration/migration.c:366
    #11 0x000000000095c598 in coroutine_trampoline (i0=84033984, i1=0) at util/coroutine-ucontext.c:1
    #12 0x00007f9c0db56d40 in ?? () from /lib64/libc.so.6
    #13 0x00007f96fe858760 in ?? ()
    #14 0x0000000000000000 in ?? ()

RDMA QIOChannel not implement io_set_aio_fd_handler. so
qio_channel_set_aio_fd_handler will access NULL pointer.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-08-22 12:13:11 +02:00
Lidong Chen f5627c2af9 migration: Stop rdma yielding during incoming postcopy
During incoming postcopy, the destination qemu will invoke
qemu_rdma_wait_comp_channel in a seprate thread. So does not use rdma
yield, and poll the completion channel fd instead.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-08-22 12:13:02 +02:00
Lidong Chen 74637e6f08 migration: implement bi-directional RDMA QIOChannel
This patch implements bi-directional RDMA QIOChannel. Because different
threads may access RDMAQIOChannel currently, this patch use RCU to protect it.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-08-22 12:12:26 +02:00
Lidong Chen 55cc1b5937 migration: create a dedicated connection for rdma return path
If start a RDMA migration with postcopy enabled, the source qemu
establish a dedicated connection for return path.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-08-22 12:12:16 +02:00
Lidong Chen ccb7e1b5a6 migration: disable RDMA WRITE after postcopy started
RDMA WRITE operations are performed with no notification to the destination
qemu, then the destination qemu can not wakeup. This patch disable RDMA WRITE
after postcopy started.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-08-22 12:12:07 +02:00
Dr. David Alan Gilbert ff0769a4ad migration: Fixes for non-migratable RAMBlocks
There are still a few cases where migration code is using the macros
and functions that do all RAMBlocks rather than just the migratable
blocks; fix those up.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180605162545.80778-2-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15 14:40:56 +01:00
Lidong Chen c5e76115cc migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect
When cancel migration during RDMA precopy, the source qemu main thread hangs sometime.

The backtrace is:
    (gdb) bt
    #0  0x00007f249eabd43d in write () from /lib64/libpthread.so.0
    #1  0x00007f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, event=0x7ffe2f643dd0) at src/cma.c:2189
    #2  0x00000000007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at migration/rdma.c:2296
    #3  0x00000000007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, errp=0x0) at migration/rdma.c:2999
    #4  0x00000000008db60e in qio_channel_close (ioc=0x3bfcc30, errp=0x0) at io/channel.c:273
    #5  0x00000000007a8765 in channel_close (opaque=0x3bfcc30) at migration/qemu-file-channel.c:98
    #6  0x00000000007a71f9 in qemu_fclose (f=0x527c000) at migration/qemu-file.c:334
    #7  0x0000000000795b96 in migrate_fd_cleanup (opaque=0x3b46280) at migration/migration.c:1162
    #8  0x000000000093a71b in aio_bh_call (bh=0x3db7a20) at util/async.c:90
    #9  0x000000000093a7b2 in aio_bh_poll (ctx=0x3b121c0) at util/async.c:118
    #10 0x000000000093f2ad in aio_dispatch (ctx=0x3b121c0) at util/aio-posix.c:436
    #11 0x000000000093ab41 in aio_ctx_dispatch (source=0x3b121c0, callback=0x0, user_data=0x0)
        at util/async.c:261
    #12 0x00007f249f73c7aa in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
    #13 0x000000000093dc5e in glib_pollfds_poll () at util/main-loop.c:215
    #14 0x000000000093dd4e in os_host_main_loop_wait (timeout=28000000) at util/main-loop.c:263
    #15 0x000000000093de05 in main_loop_wait (nonblocking=0) at util/main-loop.c:522
    #16 0x00000000005bc6a5 in main_loop () at vl.c:1944
    #17 0x00000000005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, envp=0x3ad0030) at vl.c:4752

It does not get the RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect sometime.

According to IB Spec once active side send DREQ message, it should wait for DREP message
and only once it arrived it should trigger a DISCONNECT event. DREP message can be dropped
due to network issues.
For that case the spec defines a DREP_timeout state in the CM state machine, if the DREP is
dropped we should get a timeout and a TIMEWAIT_EXIT event will be trigger.
Unfortunately the current kernel CM implementation doesn't include the DREP_timeout state
and in above scenario we will not get DISCONNECT or TIMEWAIT_EXIT events.

So it should not invoke rdma_get_cm_event which may hang forever, and the event channel
is also destroyed in qemu_rdma_cleanup.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-06-04 05:46:15 +02:00
Lidong Chen f38f6d4155 migration: remove unnecessary variables len in QIOChannelRDMA
Because qio_channel_rdma_writev and qio_channel_rdma_readv maybe invoked
by different threads concurrently, this patch removes unnecessary variables
len in QIOChannelRDMA and use local variable instead.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

Signed-off-by: Lidong Chen <jemmy858585@gmail.com>
2018-06-04 05:46:15 +02:00
Lidong Chen 71cd73061c migration: update index field when delete or qsort RDMALocalBlock
rdma_delete_block function deletes RDMALocalBlock base on index field,
but not update the index field. So when next time invoke rdma_delete_block,
it will not work correctly.

If start and cancel migration repeatedly, some RDMALocalBlock not invoke
ibv_dereg_mr to decrease kernel mm_struct vmpin. When vmpin is large than
max locked memory limitation, ibv_reg_mr will failed, and migration can not
start successfully again.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1525618499-1560-1-git-send-email-lidongchen@tencent.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

Signed-off-by: Lidong Chen <jemmy858585@gmail.com>
2018-05-15 22:13:08 +02:00
Dr. David Alan Gilbert cce8040bb0 migration: Allow migrate_fd_connect to take an Error *
Allow whatever is performing the connection to pass migrate_fd_connect
an error to indicate there was a problem during connection, an allow
us to clean up.

The caller must free the error.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-02-06 10:55:12 +00:00
Eric Blake 2562755ee7 maint: Fix macros with broken 'do/while(0); ' usage
The point of writing a macro embedded in a 'do { ... } while (0)'
loop (particularly if the macro has multiple statements or would
otherwise end with an 'if' statement) is so that the macro can be
used as a drop-in statement with the caller supplying the
trailing ';'.  Although our coding style frowns on brace-less 'if':
  if (cond)
    statement;
  else
    something else;
that is the classic case where failure to use do/while(0) wrapping
would cause the 'else' to pair with any embedded 'if' in the macro
rather than the intended outer 'if'.  But conversely, if the macro
includes an embedded ';', then the same brace-less coding style
would now have two statements, making the 'else' a syntax error
rather than pairing with the outer 'if'.  Thus, even though our
coding style with required braces is not impacted, ending a macro
with ';' makes our code harder to port to projects that use
brace-less styles.

The change should have no semantic impact.  I was not able to
fully compile-test all of the changes (as some of them are
examples of the ugly bit-rotting debug print statements that are
completely elided by default, and I didn't want to recompile
with the necessary -D witnesses - cleaning those up is left as a
bite-sized task for another day); I did, however, audit that for
all files touched, all callers of the changed macros DID supply
a trailing ';' at the callsite, and did not appear to be used
as part of a brace-less conditional.

Found mechanically via: $ git grep -B1 'while (0);' | grep -A1 \\\\

Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20171201232433.25193-7-eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-01-16 14:54:52 +01:00
Dr. David Alan Gilbert 32bce19634 migration/rdma: Send error during cancelling
When we issue a cancel and clean up the RDMA channel
send a CONTROL_ERROR to get the destination to quit.

The rdma_cleanup code waits for the event to come back
from the rdma_disconnect; but that wont happen until the
destination quits and there's currently nothing to force
it.

Note this makes the case of a cancel work while the destination
is alive, and it already works if the destination is
truly dead.  Note it doesn't fix the case where the destination
is hung (we get stuck waiting for the rdma_disconnect event).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20170717110936.23314-7-dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-07-18 17:36:18 +02:00
Dr. David Alan Gilbert 482a33c53c migration/rdma: Safely convert control types
control_desc[] is an array of strings that correspond to a
series of message types; they're used only for error messages, but if
the message type is seriously broken then we could go off the end of
the array.

Convert the array to a function control_desc() that bound checks.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20170717110936.23314-6-dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-07-18 17:36:17 +02:00
Dr. David Alan Gilbert 9c98cfbe72 migration/rdma: Allow cancelling while waiting for wrid
When waiting for a WRID, if the other side dies we end up waiting
for ever with no way to cancel the migration.
Cure this by poll()ing the fd first with a timeout and checking
error flags and migration state.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20170717110936.23314-5-dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-07-18 17:36:17 +02:00
Dr. David Alan Gilbert 0b3c15f097 migration/rdma: fix qemu_rdma_block_for_wrid error paths
The two places that 'goto err_block_for_wrid' weren't setting ret
and so would end up returning 0 even though we've failed.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20170717110936.23314-4-dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-07-18 17:36:16 +02:00
Dr. David Alan Gilbert 9cf2bab2ed migration/rdma: Fix race on source
Fix a race where the destination might try and send the source a
WRID_READY before the source has done a post-recv for it.

rdma_post_recv has to happen after the qp exists, and we're
OK since we've already called qemu_rdma_source_init that calls
qemu_alloc_qp.

This corresponds to:
https://bugzilla.redhat.com/show_bug.cgi?id=1285044

The race can be triggered by adding a few ms wait before this
post_recv_control (which was originally due to me turning on loads of
debug).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20170717110936.23314-2-dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-07-18 17:36:14 +02:00
Juan Quintela 6666c96aac migration: Move migration.h to migration/
Nothing uses it outside of migration.h

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2017-06-13 11:00:45 +02:00
Juan Quintela 7b1e1a2202 migration: Export ram.c functions in its own file
All functions are internal except for ram_mig_init().  Create
migration/misc.h for this kind of functions.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-06-01 18:49:23 +02:00
Juan Quintela e1a3ecee3b migration: Export rdma.c functions in its own file
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-06-01 18:49:23 +02:00
Juan Quintela 08a0aee15c migration: Split qemu-file.h
Split the file into public and internal interfaces.  I have to rename
the external one because we can't have two include files with the same
name in the same directory.  Build system gets confused.  The only
exported functions are the ones that handle basic types.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-06-01 18:49:22 +02:00
Juan Quintela 40014d81f2 migration: Export qemu-file-channel.c functions in its own file
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-05-18 19:20:50 +02:00
Markus Armbruster 0785bd7a7c sockets: Prepare inet_parse() for flattened SocketAddress
I'm going to flatten SocketAddress: rename SocketAddress to
SocketAddressLegacy, SocketAddressFlat to SocketAddress, eliminate
SocketAddressLegacy except in external interfaces.

inet_parse() returns a newly allocated InetSocketAddress.  Lift the
allocation from inet_parse() into its caller socket_parse() to prepare
for flattening SocketAddress.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1493192202-3184-3-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[Straightforward rebase]
2017-05-09 09:14:40 +02:00
Fam Zheng bbfb89e38c migration: Make errp the last parameter of local functions
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170421122710.15373-13-famz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2017-04-24 09:13:44 +02:00
Dr. David Alan Gilbert cd5ea07064 migration/rdma: Don't flag an error when we've been told about one
If the other side tells us there's been an error and we fail
the migration, we don't need to signal that failure to the other
side because it already knew.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael R. Hines <michael@hinespot.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2016-10-13 17:22:38 +02:00
Dr. David Alan Gilbert 12c67ffb1f migration/rdma: Pass qemu_file errors across link
If we fail for some reason (e.g. a mismatched RAMBlock)
and it's set the qemu_file error flag, pass that error back to the
peer so it can clean up rather than waiting for some higher level
progress.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael R. Hines <michael@hinespot.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2016-10-13 17:22:38 +02:00
Richard Henderson a1febc4950 cutils: Export only buffer_is_zero
Since the two users don't make use of the returned offset,
beyond ensuring that the entire buffer is zero, consider the
can_use_buffer_find_nonzero_offset and buffer_find_nonzero_offset
functions internal.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1472496380-19706-4-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13 19:09:45 +02:00
Daniel P. Berrange 22724f4921 migration: rename functions to starting migrations
Apply the following renames for starting incoming migration:

 process_incoming_migration -> migration_fd_process_incoming
 migration_set_incoming_channel -> migration_channel_process_incoming
 migration_tls_set_incoming_channel -> migration_tls_channel_process_incoming

and for starting outgoing migration:

 migration_set_outgoing_channel -> migration_channel_connect
 migration_tls_set_outgoing_channel -> migration_tls_channel_connect

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1464776234-9910-3-git-send-email-berrange@redhat.com
Message-Id: <1464776234-9910-3-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-06-16 09:51:37 +05:30
Daniel P. Berrange 6ddd2d76ca migration: convert RDMA to use QIOChannel interface
This converts the RDMA code to provide a subclass of QIOChannel
that uses RDMA for the data transport.

This implementation of RDMA does not correctly handle non-blocking
mode. Reads might block if there was not already some pending data
and writes will block until all data is sent. This flawed behaviour
was already present in the existing impl, so appears to not be a
critical problem at this time. It should be on the list of things
to fix in the future though.

The RDMA code would be much better off it it could be split up in
a generic RDMA layer, a QIOChannel impl based on RMDA, and then
the RMDA migration glue. This is left as a future exercise for
the brave.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-18-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:50 +05:30
Daniel P. Berrange d59ce6f344 migration: add reporting of errors for outgoing migration
Currently if an application initiates an outgoing migration,
it may or may not, get an error reported back on failure. If
the error occurs synchronously to the 'migrate' command
execution, the client app will see the error message. This
is the case for DNS lookup failures. If the error occurs
asynchronously to the monitor command though, the error
will be thrown away and the client left guessing about
what went wrong. This is the case for failure to connect
to the TCP server (eg due to wrong port, or firewall
rules, or other similar errors).

In the future we'll be adding more scope for errors to
happen asynchronously with the TLS protocol handshake.
TLS errors are hard to diagnose even when they are well
reported, so discarding errors entirely will make it
impossible to debug TLS connection problems.

Management apps which do migration are already using
'query-migrate' / 'info migrate' to check up on progress
of background migration operations and to see their end
status. This is a fine place to also include the error
message when things go wrong.

This patch thus adds an 'error-desc' field to the
MigrationInfo struct, which will be populated when
the 'status' is set to 'failed':

(qemu) migrate -d tcp:localhost:9001
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off
Migration status: failed (Error connecting to socket: Connection refused)
total time: 0 milliseconds

In the HMP, when doing non-detached migration, it is
also possible to display this error message directly
to the app.

(qemu) migrate tcp:localhost:9001
Error connecting to socket: Connection refused

Or with QMP

  {
    "execute": "query-migrate",
    "arguments": {}
  }
  {
    "return": {
      "status": "failed",
      "error-desc": "address resolution failed for myhost:9000: No address associated with hostname"
    }
  }

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1461751518-12128-11-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:30 +05:30
Daniel P. Berrange 0436e09f96 migration: split migration hooks out of QEMUFileOps
The QEMUFileOps struct contains the I/O subsystem callbacks
and the migration stage hooks. Split the hooks out into a
separate QEMUFileHooks struct to make it easier to refactor
the I/O side of QEMUFile without affecting the hooks.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1461751518-12128-6-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:16 +05:30