qemu-patch-raspberry4

Author	SHA1	Message	Date
Eric Blake	e54ee1b385	block: Add support to warn on backing file change without format For now, this is a mechanical addition; all callers pass false. But the next patch will use it to improve 'qemu-img rebase -u' when selecting a backing file with no format. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Message-Id: <20200706203954.341758-10-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Kevin Wolf	ffa244c84a	file-posix: Mitigate file fragmentation with extent size hints Especially when O_DIRECT is used with image files so that the page cache indirection can't cause a merge of allocating requests, the file will fragment on the file system layer, with a potentially very small fragment size (this depends on the requests the guest sent). On Linux, fragmentation can be reduced by setting an extent size hint when creating the file (at least on XFS, it can't be set any more after the first extent has been allocated), basically giving raw files a "cluster size" for allocation. This adds a create option to set the extent size hint, and changes the default from not setting a hint to setting it to 1 MB. The main reason why qcow2 defaults to smaller cluster sizes is that COW becomes more expensive, which is not an issue with raw files, so we can choose a larger size. The tradeoff here is only potentially wasted disk space. For qcow2 (or other image formats) over file-posix, the advantage should even be greater because they grow sequentially without leaving holes, so there won't be wasted space. Setting even larger extent size hints for such images may make sense. This can be done with the new option, but let's keep the default conservative for now. The effect is very visible with a test that intentionally creates a badly fragmented file with qemu-img bench (the time difference while creating the file is already remarkable) and then looks at the number of extents and the time a simple "qemu-img map" takes. Without an extent size hint: $ ./qemu-img create -f raw -o extent_size_hint=0 ~/tmp/test.raw 10G Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=0 $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0 Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192) Run completed in 25.848 seconds. $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096 Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192) Run completed in 19.616 seconds. $ filefrag ~/tmp/test.raw /home/kwolf/tmp/test.raw: 2000000 extents found $ time ./qemu-img map ~/tmp/test.raw Offset Length Mapped to File 0 0x1e8480000 0 /home/kwolf/tmp/test.raw real 0m1,279s user 0m0,043s sys 0m1,226s With the new default extent size hint of 1 MB: $ ./qemu-img create -f raw -o extent_size_hint=1M ~/tmp/test.raw 10G Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=1048576 $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0 Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192) Run completed in 11.833 seconds. $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096 Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192) Run completed in 10.155 seconds. $ filefrag ~/tmp/test.raw /home/kwolf/tmp/test.raw: 178 extents found $ time ./qemu-img map ~/tmp/test.raw Offset Length Mapped to File 0 0x1e8480000 0 /home/kwolf/tmp/test.raw real 0m0,061s user 0m0,040s sys 0m0,014s Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200707142329.48303-1-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Vladimir Sementsov-Ogievskiy	795d946d07	nbd: Use ERRP_GUARD() If we want to check error after errp-function call, we need to introduce local_err and then propagate it to errp. Instead, use the ERRP_GUARD() macro, benefits are: 1. No need of explicit error_propagate call 2. No need of explicit local_err variable: use errp directly 3. ERRP_GUARD() leaves errp as is if it's not NULL or &error_fatal, this means that we don't break error_abort (we'll abort on error_set, not on error_propagate) If we want to add some info to errp (by error_prepend() or error_append_hint()), we must use the ERRP_GUARD() macro. Otherwise, this info will not be added when errp == &error_fatal (the program will exit prior to the error_append_hint() or error_prepend() call). Fix several such cases, e.g. in nbd_read(). This commit is generated by command sed -n '/^Network Block Device (NBD)$/,/^$/{s/^F: //p}' \ MAINTAINERS \| \ xargs git ls-files \| grep '\.[hc]$' \| \ xargs spatch \ --sp-file scripts/coccinelle/errp-guard.cocci \ --macro-file scripts/cocci-macro-file.h \ --in-place --no-show-diff --max-width 80 Reported-by: Kevin Wolf <kwolf@redhat.com> Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Commit message tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200707165037.1026246-8-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and auto-propagated-errp.cocci to errp-guard.cocci. Commit message tweaked again.]	2020-07-10 15:18:09 +02:00
Vladimir Sementsov-Ogievskiy	a2adbbf603	block: drop unallocated_blocks_are_zero Currently this field only set by qed and qcow2. But in fact, all backing-supporting formats (parallels, qcow, qcow2, qed, vmdk) share these semantics: on unallocated blocks, if there is no backing file they just memset the buffer with zeroes. So, document this behavior for .supports_backing and drop .unallocated_blocks_are_zero Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-10-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy	7b1efe996c	block: inline bdrv_unallocated_blocks_are_zero() The function has only one user: bdrv_co_block_status(). Inline it to simplify reviewing of the following patches, which will finally drop unallocated_blocks_are_zero field too. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	ced914d0ab	block/core: add generic infrastructure for x-blockdev-amend qmp command blockdev-amend will be used similiar to blockdev-create to allow on the fly changes of the structure of the format based block devices. Current plan is to first support encryption keyslot management for luks based formats (raw and embedded in qcow2) Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20200608094030.670121-12-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	df373fb0a3	block/amend: separate amend and create options for qemu-img Some options are only useful for creation (or hard to be amended, like cluster size for qcow2), while some other options are only useful for amend, like upcoming keyslot management options for luks Since currently only qcow2 supports amend, move all its options to a common macro and then include it in each action option list. In future it might be useful to remove some options which are not supported anyway from amend list, which currently cause an error message if amended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608094030.670121-5-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	a3579bfa0a	block/amend: add 'force' option 'force' option will be used for some unsafe amend operations. This includes things like erasing last keyslot in luks based formats (which destroys the data, unless the master key is backed up by external means), but that _might_ be desired result. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608094030.670121-4-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Eric Blake	f9919116b8	osdep: Make MIN/MAX evaluate arguments only once I'm not aware of any immediate bugs in qemu where a second runtime evaluation of the arguments to MIN() or MAX() causes a problem, but proactively preventing such abuse is easier than falling prey to an unintended case down the road. At any rate, here's the conversation that sparked the current patch: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg05718.html Update the MIN/MAX macros to only evaluate their argument once at runtime; this uses typeof(1 ? (a) : (b)) to ensure that we are promoting the temporaries to the same type as the final comparison (we have to trigger type promotion, as typeof(bitfield) won't compile; and we can't use typeof((a) + (b)) or even typeof((a) + 0), as some of our uses of MAX are on void* pointers where such addition is undefined). However, we are unable to work around gcc refusing to compile ({}) in a constant context (such as the array length of a static variable), even when only used in the dead branch of a __builtin_choose_expr(), so we have to provide a second macro pair MIN_CONST and MAX_CONST for use when both arguments are known to be compile-time constants and where the result must also be usable as a constant; this second form evaluates arguments multiple times but that doesn't matter for constants. By using a void expression as the expansion if a non-constant is presented to this second form, we can enlist the compiler to ensure the double evaluation is not attempted on non-constants. Alas, as both macros now rely on compiler intrinsics, they are no longer usable in preprocessor #if conditions; those will just have to be open-coded or the logic rewritten into #define or runtime 'if' conditions (but where the compiler dead-code-elimination will probably still apply). I tested that both gcc 10.1.1 and clang 10.0.0 produce errors for all forms of macro mis-use. As the errors can sometimes be cryptic, I'm demonstrating the gcc output: Use of MIN when MIN_CONST is needed: In file included from /home/eblake/qemu/qemu-img.c:25: /home/eblake/qemu/include/qemu/osdep.h:249:5: error: braced-group within expression allowed only inside a function 249 \| ({ \ \| ^ /home/eblake/qemu/qemu-img.c:92:12: note: in expansion of macro ‘MIN’ 92 \| char array[MIN(1, 2)] = ""; \| ^~~ Use of MIN_CONST when MIN is needed: /home/eblake/qemu/qemu-img.c: In function ‘is_allocated_sectors’: /home/eblake/qemu/qemu-img.c:1225:15: error: void value not ignored as it ought to be 1225 \| i = MIN_CONST(i, n); \| ^ Use of MIN in the preprocessor: In file included from /home/eblake/qemu/accel/tcg/translate-all.c:20: /home/eblake/qemu/accel/tcg/translate-all.c: In function ‘page_check_range’: /home/eblake/qemu/include/qemu/osdep.h:249:6: error: token "{" is not valid in preprocessor expressions 249 \| ({ \ \| ^ Fix the resulting callsites that used #if or computed a compile-time constant min or max to use the new macros. cpu-defs.h is interesting, as CPU_TLB_DYN_MAX_BITS is sometimes used as a constant and sometimes dynamic. It may be worth improving glib's MIN/MAX definitions to be saner, but that is a task for another day. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200625162602.700741-1-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-26 09:39:39 -04:00
Klaus Jensen	3e829fd438	hw/block/nvme: use constants in identify Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Message-Id: <20200609190333.59390-6-its@irrelevant.dk> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-06-17 14:53:40 +02:00
Vladimir Sementsov-Ogievskiy	7ae89a0de9	block/dirty-bitmap: add bdrv_has_named_bitmaps helper To be used for bitmap migration in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200521220648.3255-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-05-28 13:15:22 -05:00
Eric Blake	c6996cf9a6	blockdev: Promote several bitmap functions to non-static The next patch will split blockdev.c, which will require accessing some previously-static functions from more than one .c file. But part of promoting a function to public is picking a naming scheme that does not reek of exposing too many internals (two of the three functions were named starting with 'do_'). To make future code motion easier, perform the function rename and non-static promotion into its own patch. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513011648.166876-5-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2020-05-19 10:32:14 -05:00
Eric Blake	ef893b5c84	block: Make it easier to learn which BDS support bitmaps Upcoming patches will enhance bitmap support in qemu-img, but in doing so, it turns out to be nice to suppress output when persistent bitmaps make no sense (such as on a qcow2 v2 image). Add a hook to make this easier to query. This patch adds a new callback .bdrv_supports_persistent_dirty_bitmap, rather than trying to shoehorn the answer in via existing callbacks. In particular, while it might have been possible to overload .bdrv_co_can_store_new_dirty_bitmap to special-case a NULL input to answer whether any persistent bitmaps are supported, that is at odds with whether a particular bitmap can be stored (for example, even on an image that supports persistent bitmaps but has currently filled up the maximum number of bitmaps, attempts to store another one should fail); and the new functionality doesn't require coroutine safety. Similarly, we could have added one more piece of information to .bdrv_get_info, but then again, most callers to that function tend to already discard extraneous information, and making it a catch-all rather than a series of dedicated scalar queries hasn't really simplified life. In the future, when we improve the ability to look up bitmaps through a filter, we will probably also want to teach the block layer to automatically let filters pass this request on through. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513011648.166876-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2020-05-19 10:32:14 -05:00
Peter Maydell	f2465433b4	Pull request -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAl7DkiAACgkQnKSrs4Gr c8h3lAf/encwoP2tfsFIMRmVfP2lu7lmbfcagv2CeuFU8DzFqrv1lJEoJTyWPq+X 3mrlyhxBiUTt8+CiLLeOOiNHzq9P/7ApvREeBYZ+LV2vV12lZwas23pdmXV4YvAk OammorLzwoy6xFRoWUkElitMrcCrpwNNMe0DWGwvUPLAx4HUuXZ1Ilo1DKpB8YkR pI/x7h3/UG6jgxUIHao4P078jG+wANWxJBt8gPzm8vS9eC+nOLkkzvi6zfN8IqzD 7RG9ieENJSz9lRQP/GGglW9aB83rmKsENGD0xrPWUNM6t9gvthJevAJUVBaoHUXr XKTsEUTYyR8nLasSNeAaT5nzHUa01w== =8IIS -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging Pull request # gpg: Signature made Tue 19 May 2020 09:00:32 BST # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: aio-posix: disable fdmon-io_uring when GSource is used aio-posix: don't duplicate fd handler deletion in fdmon_io_uring_destroy() tests/fuzz: Extract ioport_fuzz_qtest() method tests/fuzz: Extract pciconfig_fuzz_qos() method tests/fuzz: Remove unuseful/unused typedefs tests/fuzz: Add missing space in test description Makefile: List fuzz targets in 'make help' tests/fuzz/Makefile: Do not link code using unavailable devices Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-05-19 13:42:58 +01:00
Stefan Hajnoczi	ba607ca8bf	aio-posix: disable fdmon-io_uring when GSource is used The glib event loop does not call fdmon_io_uring_wait() so fd handlers waiting to be submitted build up in the list. There is no benefit is using io_uring when the glib GSource is being used, so disable it instead of implementing a more complex fix. This fixes a memory leak where AioHandlers would build up and increasing amounts of CPU time were spent iterating them in aio_pending(). The symptom is that guests become slow when QEMU is built with io_uring support. Buglink: https://bugs.launchpad.net/qemu/+bug/1877716 Fixes: `73fd282e7b` ("aio-posix: add io_uring fd monitoring implementation") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Oleksandr Natalenko <oleksandr@redhat.com> Message-id: 20200511183630.279750-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-05-18 18:16:00 +01:00
Max Reitz	e5d8a40685	block: Drop @child_class from bdrv_child_perm() Implementations should decide the necessary permissions based on @role. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-35-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	f6de853fa3	block: Drop child_file Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-33-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	9aab945e9c	block: Drop bdrv_format_default_perms() Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-32-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	87278af1d9	block: Make bdrv_filter_default_perms() static Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-31-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	ff3541c4e2	block: Drop child_backing Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-25-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	f34ade1148	block: Drop child_format Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-23-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	2519f54919	block: Add bdrv_default_perms() This callback can be used by BDSs that use child_of_bds with the appropriate BdrvChildRole for their children. Also, make bdrv_format_default_perms() use it for child_of_bds children (just a temporary solution until we can drop bdrv_format_default_perms() altogether). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-20-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	4348355032	block: Add child_of_bds Any current user of child_file, child_format, and child_backing can and should use this generic BdrvChildClass instead, as it can handle all of these cases. However, to be able to do so, the users must pass the appropriate BdrvChildRole when the child is created/attached. (The following commits will take care of that.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-15-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	3cdc69d31b	block: Pass parent_is_format to .inherit_options() We plan to unify the generic .inherit_options() functions. The resulting common function will need to decide whether to force-enable format probing, force-disable it, or leave it as-is. To make this decision, it will need to know whether the parent node is a format node or not (because we never want format probing if the parent is a format node already (except for the backing chain)). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-9-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	272c02eaef	block: Pass BdrvChildRole to .inherit_options() For now, all callers (effectively) pass 0 and no callee evaluates thie value. Later patches will change both. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-8-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	bf8e925eb5	block: Pass BdrvChildRole to bdrv_child_perm() For now, all callers pass 0 and no callee evaluates this value. Later patches will change both. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-7-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	258b776515	block: Add BdrvChildRole to BdrvChild For now, it is always set to 0. Later patches in this series will ensure that all callers pass an appropriate combination of flags. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	3284bcf430	block: Add BdrvChildRole and BdrvChildRoleBits This mask will supplement BdrvChildClass when it comes to what role (or combination of roles) a child takes for its parent. It consists of BdrvChildRoleBits values (which is an enum). Because empty enums are not allowed, let us just start with it filled. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-5-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	bd86fb990c	block: Rename BdrvChildRole to BdrvChildClass This structure nearly only contains parent callbacks for child state changes. It cannot really reflect a child's role, because different roles may overlap (as we will see when real roles are introduced), and because parents can have custom callbacks even when the child fulfills a standard role. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200513110544.176672-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	d67066d8bc	block: Add BlockDriver.is_format We want to unify child_format and child_file at some point. One of the important things that set format drivers apart from other drivers is that they do not expect other format nodes under them (except in the backing chain), i.e. we must not probe formats inside of formats. That means we need something on which to distinguish format drivers from others, and hence this flag. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200513110544.176672-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	6f7a3b535f	block: Add bdrv_make_empty() Right now, all users of bdrv_make_empty() call the BlockDriver method directly. That is not only bad style, it is also wrong, unless the caller has a BdrvChild with a WRITE or WRITE_UNCHANGED permission. (WRITE_UNCHANGED suffices, because callers generally use this function to clear a node with a backing file after a commit operation.) Introduce bdrv_make_empty() that verifies that it does. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429141126.85159-2-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Denis Plotnikov	572ad9783f	qcow2: introduce compression type feature The patch adds some preparation parts for incompatible compression type feature to qcow2 allowing the use different compression methods for image clusters (de)compressing. It is implied that the compression type is set on the image creation and can be changed only later by image conversion, thus compression type defines the only compression algorithm used for the image, and thus, for all image clusters. The goal of the feature is to add support of other compression methods to qcow2. For example, ZSTD which is more effective on compression than ZLIB. The default compression is ZLIB. Images created with ZLIB compression type are backward compatible with older qemu versions. Adding of the compression type breaks a number of tests because now the compression type is reported on image creation and there are some changes in the qcow2 header in size and offsets. The tests are fixed in the following ways: * filter out compression_type for many tests * fix header size, feature table size and backing file offset affected tests: 031, 036, 061, 080 header_size +=8: 1 byte compression type 7 bytes padding feature_table += 48: incompatible feature compression type backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change) * add "compression type" for test output matching when it isn't filtered affected tests: 049, 060, 061, 065, 082, 085, 144, 182, 185, 198, 206, 242, 255, 274, 280 Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> QAPI part: Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200507082521.29210-2-dplotnikov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-13 14:20:31 +02:00
Eric Blake	47e0b38a13	block: Drop unused .bdrv_has_zero_init_truncate Now that there are no clients of bdrv_has_zero_init_truncate, none of the drivers need to worry about providing it. What's more, this eliminates a source of some confusion: a literal reading of the documentation as written in `ceaca56f` and implemented in commit `1dcaf527` claims that a driver which returns 0 for bdrv_has_zero_init_truncate() must not return 1 for bdrv_has_zero_init(); this condition was violated for parallels, qcow, and sometimes for vdi, although in practice it did not matter since those drivers also lacked .bdrv_co_truncate. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-10-eblake@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Andrzej Jakowski	6cf9413229	nvme: introduce PMR support from NVMe 1.4 spec This patch introduces support for PMR that has been defined as part of NVMe 1.4 spec. User can now specify a pmrdev option that should point to HostMemoryBackend. pmrdev memory region will subsequently be exposed as PCI BAR 2 in emulated NVMe device. Guest OS can perform mmio read and writes to the PMR region that will stay persistent across system reboot. Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200330164656.9348-1-andrzej.jakowski@linux.intel.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	7b8e485742	block: Add flags to bdrv(_co)_truncate() Now that block drivers can support flags for .bdrv_co_truncate, expose the parameter in the node level interfaces bdrv_co_truncate() and bdrv_truncate(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-3-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	92b92799dc	block: Add flags to BlockDriver.bdrv_co_truncate() This adds a new BdrvRequestFlags parameter to the .bdrv_co_truncate() driver callbacks, and a supported_truncate_flags field in BlockDriverState that allows drivers to advertise support for request flags in the context of truncate. For now, we always pass 0 and no drivers declare support for any flag. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-2-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Paolo Bonzini	3c18a92dc4	aio-wait: delegate polling of main AioContext if BQL not held Any thread that is not a iothread returns NULL for qemu_get_current_aio_context(). As a result, it would also return true for in_aio_context_home_thread(qemu_get_aio_context()), causing AIO_WAIT_WHILE to invoke aio_poll() directly. This is incorrect if the BQL is not held, because aio_poll() does not expect to run concurrently from multiple threads, and it can actually happen when savevm writes to the vmstate file from the migration thread. Therefore, restrict in_aio_context_home_thread to return true for the main AioContext only if the BQL is held. The function is moved to aio-wait.h because it is mostly used there and to avoid a circular reference between main-loop.h and block/aio.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200407140746.8041-5-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-04-09 16:16:28 +01:00
Maxim Levitsky	5a5e7f8cd8	block: trickle down the fallback image creation function use to the block drivers Instead of checking the .bdrv_co_create_opts to see if we need the fallback, just implement the .bdrv_co_create_opts in the drivers that need it. This way we don't break various places that need to know if the underlying protocol/format really supports image creation, and this way we still allow some drivers to not support image creation. Fixes: `fd17146cd9` Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1816007 Note that technically this driver reverts the image creation fallback for the vxhs driver since I don't have a means to test it, and IMHO it is better to leave it not supported as it was prior to generic image creation patches. Also drop iscsi_create_opts which was left accidentally. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200326011218.29230-3-mlevitsk@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> [mreitz: Fixed alignment, and moved bdrv_co_create_opts_simple() and bdrv_create_opts_simple from block.h into block_int.h] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:33 +01:00
Maxim Levitsky	b92902dfea	block: pass BlockDriver reference to the .bdrv_co_create This will allow the reuse of a single generic .bdrv_co_create implementation for several drivers. No functional changes. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200326011218.29230-2-mlevitsk@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:33 +01:00
Vladimir Sementsov-Ogievskiy	299ea9ff01	block/dirty-bitmap: improve _next_dirty_area API Firstly, _next_dirty_area is for scenarios when we may contiguously search for next dirty area inside some limited region, so it is more comfortable to specify "end" which should not be recalculated on each iteration. Secondly, let's add a possibility to limit resulting area size, not limiting searching area. This will be used in NBD code in further commit. (Note that now bdrv_dirty_bitmap_next_dirty_area is unused) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-8-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy	9399c54b75	block/dirty-bitmap: add _next_dirty API We have bdrv_dirty_bitmap_next_zero, let's add corresponding bdrv_dirty_bitmap_next_dirty, which is more comfortable to use than bitmap iterators in some cases. For test modify test_hbitmap_next_zero_check_range to check both next_zero and next_dirty and add some new checks. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-7-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy	642700fda0	block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_t We are going to introduce bdrv_dirty_bitmap_next_dirty so that same variable may be used to store its return value and to be its parameter, so it would int64_t. Similarly, we are going to refactor hbitmap_next_dirty_area to use hbitmap_next_dirty together with hbitmap_next_zero, therefore we want hbitmap_next_zero parameter type to be int64_t too. So, for convenience update all parameters of _next_zero and _next_dirty_area to be int64_t. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-6-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Peter Maydell	49780a582d	Block layer patches: - Relax restrictions for blockdev-snapshot (allows libvirt to do live storage migration with blockdev-mirror) - luks: Delete created files when block_crypto_co_create_opts_luks fails - Fix memleaks in qmp_object_add -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJeaQYTAAoJEH8JsnLIjy/WQwYP/3pzAjqVecL3dGmnPWAkBqCV CFpxT2nIMe+xCBvWQBoeekHsFJ7GQf4E1WVNRZgoAh9VQkvkajZsVNn8Auo2Veq2 c7w/R4xf/Wet2hKGVRS0JXwbg69U5BbpcF7E2DRNfp+CvaDCafHSDNeGTb3hFUjT x1jwhK6VqfY7+LHU0B0QmX2KA66nDx1p8l8HJQYd1MlCKAbj8kv/swEbqBJn32hA 32CIYfC4VCqkW5va1eOjd3Kyi/ugkFCHTI8+mOa45/BFBzIiKfCsDaFHh/DI59QB qcDKkUcO3+W788vCKgJQGnG070TwKPx2OnjhxFKiEGaoX3Sz+AY4wUvf3mfFl8GM zYqTdOy4Xh0ckvA6JCS0jtAKmANkeEGqnECAgub22Z+kyOzqC05B1FkYwqYDcFXY atWKm5Vr47jgD6Oq6O0OpZaZrAUWOfoBmq4ErnrBEHuW5329NEInmjYwxednK+43 CwU/lSdX7ujRSsjS8Xi1dHS4pxHK/mg51dInL44zGFUayegiLPgA8cuESE0mHOfZ 67X14rxu6D4Y5r0L+w7rsSGjByR29VynE1McL9fZ1Wp29JHaQ5fjdG6GMXEwYxmV R0YNXe85FAlNgqj0Bme+fR2YPxZ48NqHIMOvFFStNHyfD0qQN0TtT1iarfEcBR0u jm8MnSoIDLkRXEgBW9bW =ociU -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - Relax restrictions for blockdev-snapshot (allows libvirt to do live storage migration with blockdev-mirror) - luks: Delete created files when block_crypto_co_create_opts_luks fails - Fix memleaks in qmp_object_add # gpg: Signature made Wed 11 Mar 2020 15:38:59 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: qemu-iotests: adding LUKS cleanup for non-UTF8 secret error crypto.c: cleanup created file when block_crypto_co_create_opts_luks fails block.c: adding bdrv_co_delete_file block: introducing 'bdrv_co_delete_file' interface tests/qemu-iotests: Fix socket_scm_helper build path qapi: Add '@allow-write-only-overlay' feature for 'blockdev-snapshot' iotests: Add iothread cases to 155 block: Fix cross-AioContext blockdev-snapshot iotests: Test mirror with temporarily disabled target backing file iotests: Fix run_job() with use_log=False block: Relax restrictions for blockdev-snapshot block: Make bdrv_get_cumulative_perm() public qom-qmp-cmds: fix two memleaks in qmp_object_add Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-03-12 16:51:26 +00:00
Peter Maydell	5931ed5641	Block patches for the 5.0 softfreeze: - qemu-img measure for LUKS - Improve block-copy's performance by reducing inter-request dependencies - Make curl's detection of accept-ranges more robust - Memleak fixes - iotest fix -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl5o5UUSHG1yZWl0ekBy ZWRoYXQuY29tAAoJEPQH2wBh1c9AnpMIALI1JGJ+kywe+BNTphOKtFGuIuHku/K4 2GujC9WjX4to1xyozockXcP/AlFCk1yicbxW8hxPMxgtNmLh6E6Y2vRNT4/Fnpjc KeyFw062rtD/O6H8jchfX1wIaMBDBASM2GGUtonbfs9mdYmFnEPg/9jo0eX/b9Qp kEDFyVonJsAGFT7gWYB7p2RxToqC/JQBdenacKuYeKkLQOMvUHChERktnRhSfRt6 bJIBkQ1e8xFuuFfnEUQR2GkvzzVw8k7kkY5gSN5m1kBCi3D1i/Pee5CFkYnQFiAD 66GGKH5OIFt83t7msqu1zhb7kobZ1OH671nMSTdqTRncRnQfLWloCkk= =D7LF -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2020-03-11' into staging Block patches for the 5.0 softfreeze: - qemu-img measure for LUKS - Improve block-copy's performance by reducing inter-request dependencies - Make curl's detection of accept-ranges more robust - Memleak fixes - iotest fix # gpg: Signature made Wed 11 Mar 2020 13:19:01 GMT # gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40 # gpg: issuer "mreitz@redhat.com" # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full] # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * remotes/maxreitz/tags/pull-block-2020-03-11: block/block-copy: hide structure definitions block/block-copy: reduce intersecting request lock block/block-copy: rename start to offset in interfaces block/block-copy: refactor interfaces to use bytes instead of end block/block-copy: factor out find_conflicting_inflight_req block/block-copy: use block_status block/block-copy: specialcase first copy_range request block/block-copy: fix progress calculation job: refactor progress to separate object block/qcow2-threads: fix qcow2_decompress qemu-img: free memory before re-assign block/qcow2: do free crypto_opts in qcow2_close() iotests: Fix nonportable use of od --endian block/curl: HTTP header field names are case insensitive block/curl: HTTP header fields allow whitespace around values iotests: add 288 luks qemu-img measure test qemu-img: allow qemu-img measure --object without a filename luks: implement .bdrv_measure() luks: extract qcrypto_block_calculate_payload_offset() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-03-11 17:06:40 +00:00
Daniel Henrique Barboza	e1d7f8bb1e	block.c: adding bdrv_co_delete_file Using the new 'bdrv_co_delete_file' interface, a pure co_routine function 'bdrv_co_delete_file' inside block.c can can be used in a way similar of the existing bdrv_create_file to to clean up a created file. We're creating a pure co_routine because the only caller of 'bdrv_co_delete_file' will be already in co_routine context, thus there is no need to add all the machinery to check for qemu_in_coroutine() and create a separated co_routine to do the job. Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200130213907.2830642-3-danielhb413@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-11 15:54:38 +01:00
Daniel Henrique Barboza	9bffae14df	block: introducing 'bdrv_co_delete_file' interface Adding to Block Drivers the capability of being able to clean up its created files can be useful in certain situations. For the LUKS driver, for instance, a failure in one of its authentication steps can leave files in the host that weren't there before. This patch adds the 'bdrv_co_delete_file' interface to block drivers and add it to the 'file' driver in file-posix.c. The implementation is given by 'raw_co_delete_file'. Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200130213907.2830642-2-danielhb413@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-11 15:54:38 +01:00
Kevin Wolf	c7a0f2be8f	block: Make bdrv_get_cumulative_perm() public Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200310113831.27293-2-kwolf@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-11 15:54:38 +01:00
Peter Maydell	6e8a73e911	Pull request -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAl5o3EQACgkQnKSrs4Gr c8jbYwgAupuS62MCITszybRE5Ote5CL80QDbMXNsZnZh6YBIGhPCqW2zdI2+m0zu +qoN6x6dxxNUWCvNlhbOcA45fiZVmYzS69TiEo21kCDijoK+h8+W+YbXuxGR2xJi cZMm8Q1DiK6Lj3vyfiwkFf4ns3VNz9DhI9hXu6CcpSkNcp79elQu87JJbzEWWWWy uEc7uEyBr0uCAKLEJvaLzzzWE2D2i6qKlmj3G17UbDNgCJ/Q/5HX13RUfMrgNgiP wmpcJ5MsB3Prz3K4XMMytUKXX/M8zpRLahp3p31t9qHelTWC3Lk1U4xzLPTJZPlm if/lrGRCRml+DKb9keBjWeTF4U31vg== =LftU -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging Pull request # gpg: Signature made Wed 11 Mar 2020 12:40:36 GMT # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: aio-posix: remove idle poll handlers to improve scalability aio-posix: support userspace polling of fd monitoring aio-posix: add io_uring fd monitoring implementation aio-posix: simplify FDMonOps->update() prototype aio-posix: extract ppoll(2) and epoll(7) fd monitoring aio-posix: move RCU_READ_LOCK() into run_poll_handlers() aio-posix: completely stop polling when disabled aio-posix: remove confusing QLIST_SAFE_REMOVE() qemu/queue.h: clear linked list pointers on remove Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-03-11 14:41:27 +00:00
Vladimir Sementsov-Ogievskiy	397f4e9d83	block/block-copy: hide structure definitions Hide structure definitions and add explicit API instead, to keep an eye on the scope of the shared fields. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-10-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	8719091f9d	block/block-copy: rename start to offset in interfaces offset/bytes pair is more usual naming in block layer, let's use it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-8-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	dafaf13593	block/block-copy: refactor interfaces to use bytes instead of end We have a lot of "chunk_end - start" invocations, let's switch to bytes/cur_bytes scheme instead. While being here, improve check on block_copy_do_copy parameters to not overflow when calculating nbytes and use int64_t for bytes in block_copy for consistency. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-7-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	d0ebeca14a	block/block-copy: fix progress calculation Assume we have two regions, A and B, and region B is in-flight now, region A is not yet touched, but it is unallocated and should be skipped. Correspondingly, as progress we have total = A + B current = 0 If we reset unallocated region A and call progress_reset_callback, it will calculate 0 bytes dirty in the bitmap and call job_progress_set_remaining, which will set total = current + 0 = 0 + 0 = 0 So, B bytes are actually removed from total accounting. When job finishes we'll have total = 0 current = B , which doesn't sound good. This is because we didn't considered in-flight bytes, actually when calculating remaining, we should have set (in_flight + dirty_bytes) as remaining, not only dirty_bytes. To fix it, let's refactor progress calculation, moving it to block-copy itself instead of fixing callback. And, of course, track in_flight bytes count. We still have to keep one callback, to maintain backup job bytes_read calculation, but it will go on soon, when we turn the whole backup process into one block_copy call. Cc: qemu-stable@nongnu.org Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Message-Id: <20200311103004.7649-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Maxim Levitsky	89802d5ae7	monitor/hmp: Move hmp_drive_add_node to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-12-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:20:22 +00:00
Maxim Levitsky	2bcad73c4b	monitor/hmp: move hmp_info_block* to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-11-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:20:21 +00:00
Maxim Levitsky	1061f8dd80	monitor/hmp: move remaining hmp_block* functions to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-10-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:20:13 +00:00
Maxim Levitsky	e263120ecc	monitor/hmp: move hmp_nbd_server* to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-9-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:17:58 +00:00
Maxim Levitsky	fce2b91fdf	monitor/hmp: move hmp_snapshot_* to block-hmp-cmds.c hmp_snapshot_blkdev is from GPLv2 version of the hmp-cmds.c thus have to change the licence to GPLv2 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-8-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:07:50 +00:00
Maxim Levitsky	6b7fbf61fb	monitor/hmp: move hmp_block_job* to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-7-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:07:48 +00:00
Maxim Levitsky	0932e3f23d	monitor/hmp: move hmp_drive_mirror and hmp_drive_backup to block-hmp-cmds.c Moved code was added after 2012-01-13, thus under GPLv2+ Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-6-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Fixed commit message	2020-03-09 18:07:35 +00:00
Maxim Levitsky	a1edae276a	monitor/hmp: move hmp_drive_del and hmp_commit to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-5-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:05:33 +00:00
Maxim Levitsky	a2dde2f221	monitor/hmp: rename device-hotplug.c to block/monitor/block-hmp-cmds.c These days device-hotplug.c only contains the hmp_drive_add In the next patch, rest of hmp_drive* functions will be moved there. Also add block-hmp-cmds.h to contain prototypes of these functions License for block-hmp-cmds.h since it contains the code moved from sysemu.h which lacks license and thus according to LICENSE is under GPLv2+ Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200308092440.23564-4-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:05:31 +00:00
Stefan Hajnoczi	d37d0e365a	aio-posix: remove idle poll handlers to improve scalability When there are many poll handlers it's likely that some of them are idle most of the time. Remove handlers that haven't had activity recently so that the polling loop scales better for guests with a large number of devices. This feature only takes effect for the Linux io_uring fd monitoring implementation because it is capable of combining fd monitoring with userspace polling. The other implementations can't do that and risk starving fds in favor of poll handlers, so don't try this optimization when they are in use. IOPS improves from 10k to 105k when the guest has 100 virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1 device for rw=randread,iodepth=1,bs=4k,ioengine=libaio on NVMe. [Clarified aio_poll_handlers locking discipline explanation in comment after discussion with Paolo Bonzini <pbonzini@redhat.com>. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-8-stefanha@redhat.com Message-Id: <20200305170806.1313245-8-stefanha@redhat.com>	2020-03-09 16:45:16 +00:00
Stefan Hajnoczi	aa38e19f05	aio-posix: support userspace polling of fd monitoring Unlike ppoll(2) and epoll(7), Linux io_uring completions can be polled from userspace. Previously userspace polling was only allowed when all AioHandler's had an ->io_poll() callback. This prevented starvation of fds by userspace pollable handlers. Add the FDMonOps->need_wait() callback that enables userspace polling even when some AioHandlers lack ->io_poll(). For example, it's now possible to do userspace polling when a TCP/IP socket is monitored thanks to Linux io_uring. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-7-stefanha@redhat.com Message-Id: <20200305170806.1313245-7-stefanha@redhat.com>	2020-03-09 16:41:31 +00:00
Stefan Hajnoczi	73fd282e7b	aio-posix: add io_uring fd monitoring implementation The recent Linux io_uring API has several advantages over ppoll(2) and epoll(2). Details are given in the source code. Add an io_uring implementation and make it the default on Linux. Performance is the same as with epoll(7) but later patches add optimizations that take advantage of io_uring. It is necessary to change how aio_set_fd_handler() deals with deleting AioHandlers since removing monitored file descriptors is asynchronous in io_uring. fdmon_io_uring_remove() marks the AioHandler deleted and aio_set_fd_handler() will let it handle deletion in that case. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-6-stefanha@redhat.com Message-Id: <20200305170806.1313245-6-stefanha@redhat.com>	2020-03-09 16:41:31 +00:00
Stefan Hajnoczi	b321051cf4	aio-posix: simplify FDMonOps->update() prototype The AioHandler node, bool is_new arguments are more complicated to think about than simply being given AioHandler old_node, AioHandler *new_node. Furthermore, the new Linux io_uring file descriptor monitoring mechanism added by the new patch requires access to both the old and the new nodes. Make this change now in preparation. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-5-stefanha@redhat.com Message-Id: <20200305170806.1313245-5-stefanha@redhat.com>	2020-03-09 16:41:31 +00:00
Stefan Hajnoczi	1f050a4690	aio-posix: extract ppoll(2) and epoll(7) fd monitoring The ppoll(2) and epoll(7) file descriptor monitoring implementations are mixed with the core util/aio-posix.c code. Before adding another implementation for Linux io_uring, extract out the existing ones so there is a clear interface and the core code is simpler. The new interface is AioContext->fdmon_ops, a pointer to a FDMonOps struct. See the patch for details. Semantic changes: 1. ppoll(2) now reflects events from pollfds[] back into AioHandlers while we're still on the clock for adaptive polling. This was already happening for epoll(7), so if it's really an issue then we'll need to fix both in the future. 2. epoll(7)'s fallback to ppoll(2) while external events are disabled was broken when the number of fds exceeded the epoll(7) upgrade threshold. I guess this code path simply wasn't tested and no one noticed the bug. I didn't go out of my way to fix it but the correct code is simpler than preserving the bug. I also took some liberties in removing the unnecessary AioContext->epoll_available (just check AioContext->epollfd != -1 instead) and AioContext->epoll_enabled (it's implicit if our AioContext->fdmon_ops callbacks are being invoked) fields. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-4-stefanha@redhat.com Message-Id: <20200305170806.1313245-4-stefanha@redhat.com>	2020-03-09 16:41:31 +00:00
Kevin Wolf	eed8b69178	qemu-storage-daemon: Add --nbd-server option Add a --nbd-server option to qemu-storage-daemon to start the built-in NBD server right away. It maps the arguments for nbd-server-start to the command line, with the exception that it uses SocketAddress instead of SocketAddressLegacy: New interfaces shouldn't use legacy types, and the additional nesting would be nasty on the command line. Example (only with required options): --nbd-server addr.type=inet,addr.host=localhost,addr.port=10809 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200224143008.13362-10-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-06 17:21:28 +01:00
Peter Krempa	17e1e2be5f	block: Introduce 'bdrv_reopen_commit_post' step Add another step in the reopen process where driver can execute code after permission changes are comitted. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <adc02cf591c3cb34e98e33518eb1c540a0f27db1.1582893284.git.pkrempa@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-06 17:15:37 +01:00
Stefan Hajnoczi	4749079ce0	aio-posix: make AioHandler deletion O(1) It is not necessary to scan all AioHandlers for deletion. Keep a list of deleted handlers instead of scanning the full list of all handlers. The AioHandler->deleted field can be dropped. Let's check if the handler has been inserted into the deleted list instead. Add a new QLIST_IS_INSERTED() API for this check. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200214171712.541358-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-22 08:26:47 +00:00
Stefan Hajnoczi	8c6b0356b5	util/async: make bh_aio_poll() O(1) The ctx->first_bh list contains all created BHs, including those that are not scheduled. The list is iterated by the event loop and therefore has O(n) time complexity with respected to the number of created BHs. Rewrite BHs so that only scheduled or deleted BHs are enqueued. Only BHs that actually require action will be iterated. One semantic change is required: qemu_bh_delete() enqueues the BH and therefore invokes aio_notify(). The tests/test-aio.c:test_source_bh_delete_from_cb() test case assumed that g_main_context_iteration(NULL, false) returns false after qemu_bh_delete() but it now returns true for one iteration. Fix up the test case. This patch makes aio_compute_timeout() and aio_bh_poll() drop from a CPU profile reported by perf-top(1). Previously they combined to 9% CPU utilization when AioContext polling is commented out and the guest has 2 virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32 devices. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20200221093951.1414693-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-22 08:26:47 +00:00
Peter Krempa	facda5443f	qapi: Allow getting flat output from 'query-named-block-nodes' When a management application manages node names there's no reason to recurse into backing images in the output of query-named-block-nodes. Add a parameter to the command which will return just the top level structs. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <4470f8c779abc404dcf65e375db195cd91a80651.1579509782.git.pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [mreitz: Fixed coding style] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-20 16:43:42 +01:00
Max Reitz	6b4907cf42	block: Remove bdrv_recurse_is_first_non_filter() It no longer has any users. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-11-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:40 +01:00
Max Reitz	5d69b5ab85	block: Add bdrv_recurse_can_replace() After a couple of follow-up patches, this function will replace bdrv_recurse_is_first_non_filter() in check_to_replace_node(). bdrv_recurse_is_first_non_filter() is both not sufficiently specific for check_to_replace_node() (it allows cases that should not be allowed, like replacing child nodes of quorum with dissenting data that have more parents than just quorum), and it is too restrictive (it is perfectly fine to replace filters). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-7-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:39 +01:00
Max Reitz	a851ad4cac	block: Drop bdrv_is_first_non_filter() It is unused now. (And it was ugly because it needed to explore all BDS chains from the top.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:39 +01:00
Aarushi Mehta	f80f267373	blockdev: adds bdrv_parse_aio to use io_uring Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-8-stefanha@redhat.com Message-Id: <20200120141858.587874-8-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:41 +00:00
Aarushi Mehta	6663a0a337	block/io_uring: implements interfaces for io_uring Aborts when sqe fails to be set as sqes cannot be returned to the ring. Adds slow path for short reads for older kernels Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-5-stefanha@redhat.com Message-Id: <20200120141858.587874-5-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:41 +00:00
Aarushi Mehta	4c34ee366d	block/block: add BDRV flag for io_uring Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Reviewed-by: Maxim Levitsky <maximlevitsky@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-4-stefanha@redhat.com Message-Id: <20200120141858.587874-4-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:41 +00:00
Paolo Bonzini	18fbd0dec7	block/io: wait for serialising requests when a request becomes serialising Marking without waiting would not result in actual serialising behavior. Thus, make a call bdrv_mark_request_serialising sufficient for serialisation to happen. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1578495356-46219-3-git-send-email-pbonzini@redhat.com Message-Id: <1578495356-46219-3-git-send-email-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:41 +00:00
Paolo Bonzini	c53cb42769	block: eliminate BDRV_REQ_NO_SERIALISING It is unused since commit `00e30f0` ("block/backup: use backup-top instead of write notifiers", 2019-10-01), drop it to simplify the code. While at it, drop redundant assertions on flags. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1578495356-46219-2-git-send-email-pbonzini@redhat.com Message-Id: <1578495356-46219-2-git-send-email-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:41 +00:00
Max Reitz	7b1d9c4df0	block: Add bdrv_qapi_perm_to_blk_perm() We need some way to correlate QAPI BlockPermission values with BLK_PERM_* flags. We could: (1) have the same order in the QAPI definition as the the BLK_PERM_* flags are in LSb-first order. However, then there is no guarantee that they actually match (e.g. when someone modifies the QAPI schema without thinking of the BLK_PERM_* definitions). We could add static assertions, but these would break what’s good about this solution, namely its simplicity. (2) define the BLK_PERM_* flags based on the BlockPermission values. But this way whenever someone were to modify the QAPI order (perfectly sensible in theory), the BLK_PERM_* values would change. Because these values are used for file locking, this might break file locking between different qemu versions. Therefore, go the slightly more cumbersome way: Add a function to translate from the QAPI constants to the BLK_PERM_* flags. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191108123455.39445-2-mreitz@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-01-06 13:43:06 +01:00
Vladimir Sementsov-Ogievskiy	e53a578a8b	block/snapshot: rename Error ** parameter to more common errp Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@Redhat.com> Message-Id: <20191205174635.18758-11-vsementsov@virtuozzo.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2019-12-18 08:43:19 +01:00
Eric Blake	93676c88d7	nbd: Don't send oversize strings Qemu as server currently won't accept export names larger than 256 bytes, nor create dirty bitmap names longer than 1023 bytes, so most uses of qemu as client or server have no reason to get anywhere near the NBD spec maximum of a 4k limit per string. However, we weren't actually enforcing things, ignoring when the remote side violates the protocol on input, and also having several code paths where we send oversize strings on output (for example, qemu-nbd --description could easily send more than 4k). Tighten things up as follows: client: - Perform bounds check on export name and dirty bitmap request prior to handing it to server - Validate that copied server replies are not too long (ignoring NBD_INFO_* replies that are not copied is not too bad) server: - Perform bounds check on export name and description prior to advertising it to client - Reject client name or metadata query that is too long - Adjust things to allow full 4k name limit rather than previous 256 byte limit Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20191114024635.11363-4-eblake@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-11-18 16:01:34 -06:00
Eric Blake	cf7c49cf6a	bitmap: Enforce maximum bitmap name length We document that for qcow2 persistent bitmaps, the name cannot exceed 1023 bytes. It is inconsistent if transient bitmaps do not have to abide by the same limit, and it is unlikely that any existing client even cares about using bitmap names this long. It's time to codify that ALL bitmaps managed by qemu (whether persistent in qcow2 or not) have a documented maximum length. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20191114024635.11363-3-eblake@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-11-18 16:01:34 -06:00
Eric Blake	9d7ab222da	nbd/server: Prefer heap over stack for parsing client names As long as we limit NBD names to 256 bytes (the bare minimum permitted by the standard), stack-allocation works for parsing a name received from the client. But as mentioned in a comment, we eventually want to permit up to the 4k maximum of the NBD standard, which is too large for stack allocation; so switch everything in the server to use heap allocation. For now, there is no change in actually supported name length. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20191114024635.11363-2-eblake@redhat.com> [eblake: fix uninit variable compile failure] Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-11-18 16:01:34 -06:00
Max Reitz	c28107e9e5	block: Add bdrv_co_get_self_request() Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191101152510.11719-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-04 09:32:51 +01:00
Max Reitz	304d9d7f03	block: Make wait/mark serialising requests public Make both bdrv_mark_request_serialising() and bdrv_wait_serialising_requests() public so they can be used from block drivers. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191101152510.11719-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-04 09:29:15 +01:00
Klaus Jensen	407d22eb5b	nvme: fix NSSRS offset in CAP register Fix the offset of the NSSRS field the CAP register. From NVME 1.4, section 3 ("Controller Registers"), subsection 3.1.1 ("Offset 0h: CAP – Controller Capabilities") CAP_NSSRS_SHIFT is bit 36, not 33. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reported-by: Javier Gonzalez <javier.gonz@samsung.com> Message-id: 20191023073315.446534-1-its@irrelevant.dk Reviewed-by: John Snow <jsnow@redhat.com> [mreitz: Added John's note on the location in the specification where this information can be found] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-04 09:21:45 +01:00
Max Reitz	c80d8b06cf	block: Add @exact parameter to bdrv_co_truncate() We have two drivers (iscsi and file-posix) that (in some cases) return success from their .bdrv_co_truncate() implementation if the block device is larger than the requested offset, but cannot be shrunk. Some callers do not want that behavior, so this patch adds a new parameter that they can use to turn off that behavior. This patch just adds the parameter and lets the block/io.c and block/block-backend.c functions pass it around. All other callers always pass false and none of the implementations evaluate it, so that this patch does not change existing behavior. Future patches take care of that. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 12:00:07 +01:00
Maxim Levitsky	e0dd95e373	block/nvme: add support for write zeros Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190913133627.28450-2-mlevitsk@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:34:30 +01:00
Vladimir Sementsov-Ogievskiy	0e2402452f	block/block-copy: increase buffered copy request No reason to limit buffered copy to one cluster. Let's allow up to 1 MiB. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-7-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	7f739d0e53	block/block-copy: add memory limit Currently total allocation for parallel requests to block-copy instance is unlimited. Let's limit it to 128 MiB. For now block-copy is used only in backup, so actually we limit total allocation for backup job. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	4dd09f6223	qcow2-bitmap: move bitmap reopen-rw code to qcow2_reopen_commit The only reason I can imagine for this strange code at the very-end of bdrv_reopen_commit is the fact that bs->read_only updated after calling drv->bdrv_reopen_commit in bdrv_reopen_commit. And in the same time, prior to previous commit, qcow2_reopen_bitmaps_rw did a wrong check for being writable, when actually it only need writable file child not self. So, as it's fixed, let's move things to correct place. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Acked-by: Max Reitz <mreitz@redhat.com> Message-id: 20190927122355.7344-10-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:53:28 -04:00
Vladimir Sementsov-Ogievskiy	f88676c149	block/qcow2-bitmap: get rid of bdrv_has_changed_persistent_bitmaps Firstly, no reason to optimize failure path. Then, function name is ambiguous: it checks for readonly and similar things, but someone may think that it will ignore normal bitmaps which was just unchanged, and this is in bad relation with the fact that we should drop IN_USE flag for unchanged bitmaps in the image. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190927122355.7344-5-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	859443b0fb	block: switch reopen queue from QSIMPLEQ to QTAILQ We'll need reverse-foreach in the following commit, QTAILQ support it, so move to QTAILQ. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190927122355.7344-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	ef9041a7b8	block/dirty-bitmap: refactor bdrv_dirty_bitmap_next bdrv_dirty_bitmap_next is always used in same pattern. So, split it into _next and _first, instead of combining two functions into one and add FOR_EACH_DIRTY_BITMAP macro. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-5-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	5deb6cbd1f	block/dirty-bitmap: add bs link Add bs field to BdrvDirtyBitmap structure. Drop BlockDriverState parameter from bitmap APIs where possible. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-3-vsementsov@virtuozzo.com [Rebased on top of block-copy. --js] Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	767db3aad8	block/dirty-bitmap: drop meta Drop meta bitmaps, as they are unused. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	d2c3080e41	block/qcow2: proper locking on bitmap add/remove paths qmp_block_dirty_bitmap_add and do_block_dirty_bitmap_remove do acquire aio context since `0a6c86d024`. But this is not enough: we also must lock qcow2 mutex when access in-image metadata. Especially it concerns freeing qcow2 clusters. To achieve this, move qcow2_can_store_new_dirty_bitmap and qcow2_remove_persistent_dirty_bitmap to coroutine context. Since we work in coroutines in correct aio context, we don't need context acquiring in blockdev.c anymore, drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190920082543.23444-4-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	b56a1e3175	block/dirty-bitmap: return int from bdrv_remove_persistent_dirty_bitmap It's more comfortable to not deal with local_err. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190920082543.23444-3-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	00e30f05de	block/backup: use backup-top instead of write notifiers Drop write notifiers and use filter node instead. = Changes = 1. Add filter-node-name argument for backup qmp api. We have to do it in this commit, as 257 needs to be fixed. 2. There are no more write notifiers here, so is_write_notifier parameter is dropped from block-copy paths. 3. To sync with in-flight requests at job finish we now have drained removing of the filter, we don't need rw-lock. 4. Block-copy is now using BdrvChildren instead of BlockBackends 5. As backup-top owns these children, we also move block-copy state into backup-top's ownership. = Iotest changes = 56: op-blocker doesn't shoot now, as we set it on source, but then check on filter, when trying to start second backup. To keep the test we instead can catch another collision: both jobs will get 'drive0' job-id, as job-id parameter is unspecified. To prevent interleaving with file-posix locks (as they are dependent on config) let's use another target for second backup. Also, it's obvious now that we'd like to drop this op-blocker at all and add a test-case for two backups from one node (to different destinations) actually works. But not in these series. 141: Output changed: prepatch, "Node is in use" comes from bdrv_has_blk check inside qmp_blockdev_del. But we've dropped block-copy blk objects, so no more blk objects on source bs (job blk is on backup-top filter bs). New message is from op-blocker, which is the next check in qmp_blockdev_add. 257: The test wants to emulate guest write during backup. They should go to filter node, not to original source node, of course. Therefore we need to specify filter node name and use it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-6-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	0f4b02b73e	block/block-copy: split block_copy_set_callbacks function Split block_copy_set_callbacks out of block_copy_state_new. It's needed for further commit: block-copy will use BdrvChildren of backup-top filter, so it will be created from backup-top filter creation function. But callbacks will still belong to backup job and will be set in separate. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-4-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	a6ffe1998c	block/backup: move in-flight requests handling from backup to block-copy Move synchronization mechanism to block-copy, to be able to use one block-copy instance from backup job and backup-top filter in parallel. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-2-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Anton Nefedov	d924559953	qapi: query-blockstat: add driver specific file-posix stats A block driver can provide a callback to report driver-specific statistics. file-posix driver now reports discard statistics Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20190923121737.83281-10-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Anton Nefedov	f344446654	block: add empty account cookie type Each block_acct_done/failed call is designed to correspond to a previous block_acct_start call, which initializes the stats cookie. However sometimes it is not the case, e.g. some error paths might report the same cookie twice because it is hard to accurately track if the cookie was reported yet or not. This patch cleans the cookie after report. (Note: block_acct_failed/done without a previous block_acct_start at all should be avoided. Uninitialized cookie might hold a garbage value and there is still "< BLOCK_MAX_IOTYPE" assertion for that) It will be particularly useful in ide code where it's hard to keep track whether the request done its accounting or not: in the following patch of the series, trim requests will do the accounting separately. Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190923121737.83281-4-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Anton Nefedov	159f85ddc8	qapi: add unmap to BlockDeviceStats Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20190923121737.83281-3-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	beb5f5450d	block: move block_copy from block/backup.c to separate file Split block_copy to separate file, to be cleanly shared with backup-top filter driver in further commits. It's a clean movement, the only change is drop "static" from interface functions. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190920142056.12778-8-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	6e9b225f73	block: introduce aio task pool Common interface for aio task loops. To be used for improving performance of synchronous io loops in qcow2, block-stream, copy-on-read, and may be other places. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190916175324.18478-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Eric Blake	61bc846d8c	nbd: Grab aio context lock in more places When iothreads are in use, the failure to grab the aio context results in an assertion failure when trying to unlock things during blk_unref, when trying to unlock a mutex that was not locked. In short, all calls to nbd_export_put need to done while within the correct aio context. But since nbd_export_put can recursively reach itself via nbd_export_close, and recursively grabbing the context would deadlock, we can't do the context grab directly in those functions, but must do so in their callers. Hoist the use of the correct aio_context from nbd_export_new() to its caller qmp_nbd_server_add(). Then tweak qmp_nbd_server_remove(), nbd_eject_notifier(), and nbd_esport_close_all() to grab the right context, so that all callers during qemu now own the context before nbd_export_put() can call blk_unref(). Remaining uses in qemu-nbd don't matter (since that use case does not support iothreads). Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190917023917.32226-1-eblake@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com>	2019-09-24 07:30:19 -05:00
Nir Soffer	8972571509	block: Remove unused masks Replace confusing usage: ~BDRV_SECTOR_MASK With more clear: (BDRV_SECTOR_SIZE - 1) Remove BDRV_SECTOR_MASK and the unused BDRV_BLOCK_OFFSET_MASK which was it's last user. Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-id: 20190827185913.27427-3-nsoffer@redhat.com Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 14:48:30 +02:00
Vladimir Sementsov-Ogievskiy	bb0c940993	job: drop job_drain In job_finish_sync job_enter should be enough for a job to make some progress and draining is a wrong tool for it. So use job_enter directly here and drop job_drain with all related staff not used more. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: John Snow <jsnow@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-10 08:58:43 +02:00
Eric Blake	0a4795455c	nbd: Prepare for NBD_CMD_FLAG_FAST_ZERO Commit `fe0480d6` and friends added BDRV_REQ_NO_FALLBACK as a way to avoid wasting time on a preliminary write-zero request that will later be rewritten by actual data, if it is known that the write-zero request will use a slow fallback; but in doing so, could not optimize for NBD. The NBD specification is now considering an extension that will allow passing on those semantics; this patch updates the new protocol bits and 'qemu-nbd --list' output to recognize the bit, as well as the new errno value possible when using the new flag; while upcoming patches will improve the client to use the feature when present, and the server to advertise support for it. The NBD spec recommends (but not requires) that ENOTSUP be avoided for all but failures of a fast zero (the only time it is mandatory to avoid an ENOTSUP failure is when fast zero is supported but not requested during write zeroes; the questionable use is for ENOTSUP to other actions like a normal write request). However, clients that get an unexpected ENOTSUP will either already be treating it the same as EINVAL, or may appreciate the extra bit of information. We were equally loose for returning EOVERFLOW in more situations than recommended by the spec, so if it turns out to be a problem in practice, a later patch can tighten handling for both error codes. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190823143726.27062-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: tweak commit message, also handle EOPNOTSUPP]	2019-09-05 16:03:13 -05:00
Eric Blake	dbb38caac5	nbd: Improve per-export flag handling in server When creating a read-only image, we are still advertising support for TRIM and WRITE_ZEROES to the client, even though the client should not be issuing those commands. But seeing this requires looking across multiple functions: All callers to nbd_export_new() passed a single flag based solely on whether the export allows writes. Later, we then pass a constant set of flags to nbd_negotiate_options() (namely, the set of flags which we always support, at least for writable images), which is then further dynamically modified with NBD_FLAG_SEND_DF based on client requests for structured options. Finally, when processing NBD_OPT_EXPORT_NAME or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the runtime set of flags we've built up over several functions. Let's refactor things to instead compute a baseline of flags as soon as possible which gets shared between multiple clients, in nbd_export_new(), and changing the signature for the callers to pass in a simpler bool rather than having to figure out flags. We can then get rid of the 'myflags' parameter to various functions, and instead refer to client for everything we need (we still have to perform a bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and NBD_OPT_EXPORT_GO, but it's easier to see what is being computed). This lets us quit advertising senseless flags for read-only images, as well as making the next patch for exposing FAST_ZERO support easier to write. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190823143726.27062-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: improve commit message, update iotest 223]	2019-09-05 16:02:54 -05:00
Eric Blake	61cc872456	nbd: Advertise multi-conn for shared read-only connections The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be advertised when the server promises cache consistency between simultaneous clients (basically, rules that determine what FUA and flush from one client are able to guarantee for reads from another client). When we don't permit simultaneous clients (such as qemu-nbd without -e), the bit makes no sense; and for writable images, we probably have a lot more work before we can declare that actions from one client are cache-consistent with actions from another. But for read-only images, where flush isn't changing any data, we might as well advertise multi-conn support. What's more, advertisement of the bit makes it easier for clients to determine if 'qemu-nbd -e' was in use, where a second connection will succeed rather than hang until the first client goes away. This patch affects qemu as server in advertising the bit. We may want to consider patches to qemu as client to attempt parallel connections for higher throughput by spreading the load over those connections when a server advertises multi-conn, but for now sticking to one connection per nbd:// BDS is okay. See also: https://bugzilla.redhat.com/1708300 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190815185024.7010-1-eblake@redhat.com> [eblake: tweak blockdev-nbd.c to not request shared when writable, fix iotest 233] Reviewed-by: John Snow <jsnow@redhat.com>	2019-09-05 15:51:55 -05:00
Vladimir Sementsov-Ogievskiy	1acc3466a2	block/io: introduce bdrv_co_p{read, write}v_part Introduce extended variants of bdrv_co_preadv and bdrv_co_pwritev with qiov_offset parameter. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-10-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-10-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	ac850bf099	block: define .*_part io handlers in BlockDriver Add handlers supporting qiov_offset parameter: bdrv_co_preadv_part bdrv_co_pwritev_part bdrv_co_pwritev_compressed_part This is used to reduce need of defining local_qiovs and hd_qiovs in all corners of block layer code. The following patches will increase usage of this new API part by part. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-5-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-5-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Max Reitz	ceaca56fee	block: Add bdrv_has_zero_init_truncate() No .bdrv_has_zero_init() implementation returns 1 if growing the file would add non-zero areas (at least with PREALLOC_MODE_OFF), so using it in lieu of this new function was always safe. But on the other hand, it is possible that growing an image that is not zero-initialized would still add a zero-initialized area, like when using nonpreallocating truncation on a preallocated image. For callers that care only about truncation, not about creation with potential preallocation, this new function is useful. Alternatively, we could have added a PreallocMode parameter to bdrv_has_zero_init(). But the only user would have been qemu-img convert, which does not have a plain PreallocMode value right now -- it would have to parse the creation option to obtain it. Therefore, the simpler solution is to let bdrv_has_zero_init() inquire the preallocation status and add the new bdrv_has_zero_init_truncate() that presupposes PREALLOC_MODE_OFF. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190724171239.8764-4-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	cdf3bc934a	mirror: Fix bdrv_has_zero_init() use bdrv_has_zero_init() only has meaning for newly created images or image areas. If the mirror job itself did not create the image, it cannot rely on bdrv_has_zero_init()'s result to carry any meaning. This is the case for drive-mirror with mode=existing and always for blockdev-mirror. Note that we only have to zero-initialize the target with sync=full, because other modes actually do not promise that the target will contain the same data as the source after the job -- sync=top only promises to copy anything allocated in the top layer, and sync=none will only copy new I/O. (Which is how mirror has always handled it.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190724171239.8764-3-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
John Snow	c4e4b0fa59	qapi: implement block-dirty-bitmap-remove transaction action It is used to do transactional movement of the bitmap (which is possible in conjunction with merge command). Transactional bitmap movement is needed in scenarios with external snapshot, when we don't want to leave copy of the bitmap in the base image. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190708220502.12977-3-jsnow@redhat.com [Edited "since" version to 4.2 --js] Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:03 -04:00
John Snow	28636b8211	block/dirty-bitmap: add bdrv_dirty_bitmap_get Add a public interface for get. While we're at it, rename "bdrv_get_dirty_bitmap_locked" to "bdrv_dirty_bitmap_get_locked". (There are more functions to rename to the bdrv_dirty_bitmap_VERB form, but they will wait until the conclusion of this series.) Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-11-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
John Snow	b7661ca5d8	block/dirty-bitmap: add bdrv_dirty_bitmap_merge_internal I'm surprised it didn't come up sooner, but sometimes we have a +busy bitmap as a source. This is dangerous from the QMP API, but if we are the owner that marked the bitmap busy, it's safe to merge it using it as a read only source. It is not safe in the general case to allow users to read from in-use bitmaps, so create an internal variant that foregoes the safety checking. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-10-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
John Snow	c8b5650178	block/backup: Add mirror sync mode 'bitmap' We don't need or want a new sync mode for simple differences in semantics. Create a new mode simply named "BITMAP" that is designed to make use of the new Bitmap Sync Mode field. Because the only bitmap sync mode is 'on-success', this adds no new functionality to the backup job (yet). The old incremental backup mode is maintained as a syntactic sugar for sync=bitmap, mode=on-success. Add all of the plumbing necessary to support this new instruction. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-6-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
Peter Maydell	c6a2225a5a	nbd patches for 2019-08-15 - Addition of InetSocketAddress keep-alive - Addition of BDRV_REQ_PREFETCH for more efficient copy-on-read - Initial refactoring in preparation of NBD reconnect -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJdVaRZAAoJEKeha0olJ0NqrGoIAJSvVLMDeWZIkHr3CQ5AbMHy 6IHUntBwv4PEHw0FyyDU7lLgEWubTwe/7RfvyJ69kQYSJLjvHa3KEic0aa7SOETK hGUlSoIFHEugi+XDcYyy9EG+ItUR7jnunkwomxvFRm4XzjEHFO9ck8fOS+uq/23e LGDHwdoZI6vawUPftbBuRAlB3egCEcBtTWXYMk8lm3MXHOHL7O18DRkfWvwcHfl6 mNIKgTVMtl1gYoJznCUmC5VLHL4jQy+kSNXnyHBQOEEvTcORu0EztJS81H+BODni sxa9seem7JL9NLUTmkJsbGfSM6RKdfypX34oik9yakqUnXRrlxkxI+IX26XfdQ4= =2MAO -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2019-08-15' into staging nbd patches for 2019-08-15 - Addition of InetSocketAddress keep-alive - Addition of BDRV_REQ_PREFETCH for more efficient copy-on-read - Initial refactoring in preparation of NBD reconnect # gpg: Signature made Thu 15 Aug 2019 19:28:41 BST # gpg: using RSA key A7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2019-08-15: block/nbd: refactor nbd connection parameters block/nbd: add cmdline and qapi parameter reconnect-delay block/nbd: move from quit to state block/nbd: use non-blocking io channel for nbd negotiation block/nbd: split connection_co start out of nbd_client_connect nbd: improve CMD_CACHE: use BDRV_REQ_PREFETCH block/stream: use BDRV_REQ_PREFETCH block: implement BDRV_REQ_PREFETCH qapi: Add InetSocketAddress member keep-alive Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-08-16 15:53:37 +01:00
Markus Armbruster	db72581598	Include qemu/main-loop.h less In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Markus Armbruster	2ae16a6aa4	Include generated QAPI headers less Some of the generated qapi-types-MODULE.h are included all over the place. Changing a QAPI type can trigger massive recompiling. Top scorers recompile more than 1000 out of some 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h): 6300 qapi/qapi-builtin-types.h 5700 qapi/qapi-types-run-state.h 3900 qapi/qapi-types-common.h 3300 qapi/qapi-types-sockets.h 3000 qapi/qapi-types-misc.h 3000 qapi/qapi-types-crypto.h 3000 qapi/qapi-types-job.h 3000 qapi/qapi-types-block-core.h 2800 qapi/qapi-types-block.h 1300 qapi/qapi-types-net.h Clean up headers to include generated QAPI headers only where needed. Impact is negligible except for hw/qdev-properties.h. This header includes qapi/qapi-types-block.h and qapi/qapi-types-misc.h. They are used only in expansions of property definition macros such as DEFINE_PROP_BLOCKDEV_ON_ERROR() and DEFINE_PROP_OFF_AUTO(). Moving their inclusion from hw/qdev-properties.h to the users of these macros avoids pointless recompiles. This is how other property definition macros, such as DEFINE_PROP_NETDEV(), already work. Improves things for some of the top scorers: 3600 qapi/qapi-types-common.h 2800 qapi/qapi-types-sockets.h 900 qapi/qapi-types-misc.h 2200 qapi/qapi-types-crypto.h 2100 qapi/qapi-types-job.h 2100 qapi/qapi-types-block-core.h 270 qapi/qapi-types-block.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190812052359.30071-3-armbru@redhat.com>	2019-08-16 13:31:51 +02:00
Markus Armbruster	ec150c7e09	include: Make headers more self-contained Back in 2016, we discussed[1] rules for headers, and these were generally liked: 1. Have a carefully curated header that's included everywhere first. We got that already thanks to Peter: osdep.h. 2. Headers should normally include everything they need beyond osdep.h. If exceptions are needed for some reason, they must be documented in the header. If all that's needed from a header is typedefs, put those into qemu/typedefs.h instead of including the header. 3. Cyclic inclusion is forbidden. This patch gets include/ closer to obeying 2. It's actually extracted from my "[RFC] Baby steps towards saner headers" series[2], which demonstrates a possible path towards checking 2 automatically. It passes the RFC test there. [1] Message-ID: <87h9g8j57d.fsf@blackfin.pond.sub.org> https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg03345.html [2] Message-Id: <20190711122827.18970-1-armbru@redhat.com> https://lists.nongnu.org/archive/html/qemu-devel/2019-07/msg02715.html Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190812052359.30071-2-armbru@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:51 +02:00
Vladimir Sementsov-Ogievskiy	a8e2bb6a76	block/nbd: use non-blocking io channel for nbd negotiation No reason to use blocking channel for negotiation and we'll benefit in further reconnect feature, as qio_channel reads and writes will do qemu_coroutine_yield while waiting for io completion. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190618114328.55249-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-08-15 13:22:14 -05:00
Vladimir Sementsov-Ogievskiy	3299e5ecf7	block: implement BDRV_REQ_PREFETCH Do effective copy-on-read request when we don't need data actually. It will be used for block-stream and NBD_CMD_CACHE. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190725100550.33801-2-vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [eblake: comment grammar fix] Signed-off-by: Eric Blake <eblake@redhat.com>	2019-08-15 13:22:13 -05:00
Max Reitz	43eaaaef0e	block: Only the main loop can change AioContexts bdrv_set_aio_context_ignore() can only work in the main loop: bdrv_drained_begin() only works in the main loop and the node's (old) AioContext; and bdrv_drained_end() really only works in the main loop and the node's (new) AioContext (contrary to its current comment, which is just wrong). Consequentially, bdrv_set_aio_context_ignore() must be called from the main loop. Luckily, assuming that we can make block graph changes only from the main loop as well, all its callers do that already. Note that changing a node's context in a sense is an operation that changes the block graph, so it actually makes sense to require this function to be called from the main loop. Also, fix bdrv_drained_end()'s description. You can only use it from the main loop or the node's AioContext, and in the latter case, the whole subtree must be in the same context. Fixes: `e037c09c78` Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190722133054.21781-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-22 18:41:43 +02:00
Max Reitz	e037c09c78	block: Do not poll in bdrv_do_drained_end() We should never poll anywhere in bdrv_do_drained_end() (including its recursive callees like bdrv_drain_invoke()), because it does not cope well with graph changes. In fact, it has been written based on the postulation that no graph changes will happen in it. Instead, the callers that want to poll must poll, i.e. all currently globally available wrappers: bdrv_drained_end(), bdrv_subtree_drained_end(), bdrv_unapply_subtree_drain(), and bdrv_drain_all_end(). Graph changes there do not matter. They can poll simply by passing a pointer to a drained_end_counter and wait until it reaches 0. This patch also adds a non-polling global wrapper for bdrv_do_drained_end() that takes a drained_end_counter pointer. We need such a variant because now no function called anywhere from bdrv_do_drained_end() must poll. This includes BdrvChildRole.drained_end(), which already must not poll according to its interface documentation, but bdrv_child_cb_drained_end() just violates that by invoking bdrv_drained_end() (which does poll). Therefore, BdrvChildRole.drained_end() must take a drained_end_counter parameter, which bdrv_child_cb_drained_end() can pass on to the new bdrv_drained_end_no_poll() function. Note that we now have a pattern of all drained_end-related functions either polling or receiving a drained_end_counter to let the caller poll based on that. A problem with a single poll loop is that when the drained section in bdrv_set_aio_context_ignore() ends, some nodes in the subgraph may be in the old contexts, while others are in the new context already. To let the collective poll in bdrv_drained_end() work correctly, we must not hold a lock to the old context, so that the old context can make progress in case it is different from the current context. (In the process, remove the comment saying that the current context is always the old context, because it is wrong.) In all other places, all nodes in a subtree must be in the same context, so we can just poll that. The exception of course is bdrv_drain_all_end(), but that always runs in the main context, so we can just poll NULL (like bdrv_drain_all_begin() does). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-07-19 13:19:16 +02:00
Max Reitz	f4c8a43be0	block: Make bdrv_parent_drained_[^_]*() static These functions are not used outside of block/io.c, there is no reason why they should be globally available. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-07-19 13:19:16 +02:00
Max Reitz	804db8ea00	block: Introduce BdrvChild.parent_quiesce_counter Commit `5cb2737e92` laid out why bdrv_do_drained_end() must decrement the quiesce_counter after bdrv_drain_invoke(). It did not give a very good reason why it has to happen after bdrv_parent_drained_end(), instead only claiming symmetry to bdrv_do_drained_begin(). It turns out that delaying it for so long is wrong. Situation: We have an active commit job (i.e. a mirror job) from top to base for the following graph: filter \| [file] \| v top --[backing]--> base Now the VM is closed, which results in the job being cancelled and a bdrv_drain_all() happening pretty much simultaneously. Beginning the drain means the job is paused once whenever one of its nodes is quiesced. This is reversed when the drain ends. With how the code currently is, after base's drain ends (which means that it will have unpaused the job once), its quiesce_counter remains at 1 while it goes to undrain its parents (bdrv_parent_drained_end()). For some reason or another, undraining filter causes the job to be kicked and enter mirror_exit_common(), where it proceeds to invoke block_job_remove_all_bdrv(). Now base will be detached from the job. Because its quiesce_counter is still 1, it will unpause the job once more. So in total, undraining base will unpause the job twice. Eventually, this will lead to the job's pause_count going negative -- well, it would, were there not an assertion against this, which crashes qemu. The general problem is that if in bdrv_parent_drained_end() we undrain parent A, and then undrain parent B, which then leads to A detaching the child, bdrv_replace_child_noperm() will undrain A as if we had not done so yet; that is, one time too many. It follows that we cannot decrement the quiesce_counter after invoking bdrv_parent_drained_end(). Unfortunately, decrementing it before bdrv_parent_drained_end() would be wrong, too. Imagine the above situation in reverse: Undraining A leads to B detaching the child. If we had already decremented the quiesce_counter by that point, bdrv_replace_child_noperm() would undrain B one time too little; because it expects bdrv_parent_drained_end() to issue this undrain. But bdrv_parent_drained_end() won't do that, because B is no longer a parent. Therefore, we have to do something else. This patch opts for introducing a second quiesce_counter that counts how many times a child's parent has been quiesced (though c->role->drained_*). With that, bdrv_replace_child_noperm() just has to undrain the parent exactly that many times when removing a child, and it will always be right. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-07-19 13:19:16 +02:00
Max Reitz	e5182c1c57	block: Add BDS.never_freeze The commit and the mirror block job must be able to drop their filter node at any point. However, this will not be possible if any of the BdrvChild links to them is frozen. Therefore, we need to prevent them from ever becoming frozen. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190703172813.6868-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-15 15:48:40 +02:00
Andrey Shinkevich	170d3bd341	block: include base when checking image chain for block allocation This patch is used in the 'block/stream: introduce a bottom node' that is following. Instead of the base node, the caller may pass the node that has the base as its backing image to the function bdrv_is_allocated_above() with a new parameter include_base = true and get rid of the dependency on the base that may change during commit/stream parallel jobs. Now, if the specified base is not found in the backing image chain, the QEMU will abort. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1559152576-281803-2-git-send-email-andrey.shinkevich@virtuozzo.com [mreitz: Squashed in the following as a rebase on conflicting patches:] Message-id: e3cf99ae-62e9-8b6e-5a06-d3c8b9363b85@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-02 03:53:04 +02:00
Max Reitz	c1087f1206	block: Add bdrv_child_refresh_perms() If a block node uses bdrv_child_try_set_perm() to change the permission it takes on its child, the result may be very short-lived. If anything makes the block layer recalculate the permissions internally, it will invoke the node driver's .bdrv_child_perm() implementation. The permission/shared permissions masks that returns will then override the values previously passed to bdrv_child_try_set_perm(). If drivers want a child edge to have specific values for the permissions/shared permissions mask, it must return them in .bdrv_child_perm(). Consequentially, there is no need for them to pass the same values to bdrv_child_try_set_perm() then: It is better to have a function that invokes .bdrv_child_perm() and calls bdrv_child_try_set_perm() with the result. This patch adds such a function under the name of bdrv_child_refresh_perms(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-18 16:41:10 +02:00
Vladimir Sementsov-Ogievskiy	b23c580c94	block: drop bs->job Drop remaining users of bs->job: 1. assertions actually duplicated by assert(!bs->refcnt) 2. trace-point seems not enough reason to change stream_start to return BlockJob pointer 3. Restricting creation of two jobs based on same bs is bad idea, as 3.1 Some jobs creates filters to be their main node, so, this check don't actually prevent creating second job on same real node (which will create another filter node) (but I hope it is restricted by other mechanisms) 3.2 Even without bs->job we have two systems of permissions: op-blockers and BLK_PERM 3.3 We may want to run several jobs on one node one day And finally, drop bs->job pointer itself. Hurrah! Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-18 16:41:10 +02:00
Vladimir Sementsov-Ogievskiy	8164102ffe	blockdev: blockdev_mark_auto_del: drop usage of bs->job We are going to remove bs->job pointer. Drop it's usage in blockdev_mark_auto_del: instead of looking at bs->job let's check all jobs for references to bs. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-18 16:41:10 +02:00
Vladimir Sementsov-Ogievskiy	cc19f1773d	block/replication: drop usage of bs->job We are going to remove bs->job pointer. Drop it's usage in replication code. Additionally we have to return job pointer from some mirror APIs. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-18 16:41:09 +02:00
Markus Armbruster	a8d2532645	Include qemu-common.h exactly where needed No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]	2019-06-12 13:20:20 +02:00
Vladimir Sementsov-Ogievskiy	d93e572688	block/io: bdrv_pdiscard: support int64_t bytes parameter This fixes at least one overflow in qcow2_process_discards, which passes 64bit region length to bdrv_pdiscard where bytes (or sectors in the past) parameter is int since its introduction in `0b919fae`. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 16:55:58 +02:00
Kevin Wolf	42a65f02f9	block: Remove bdrv_set_aio_context() All callers of bdrv_set_aio_context() are eliminated now, they have moved to bdrv_try_set_aio_context() and related safe functions. Remove bdrv_set_aio_context(). With this, we can now know that the .set_aio_ctx callback must be present in bdrv_set_aio_context_ignore() because bdrv_can_set_aio_context() would have returned false previously, so instead of checking the condition, we can assert it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 16:55:58 +02:00
Kevin Wolf	132ada80c4	block: Adjust AioContexts when attaching nodes So far, we only made sure that updating the AioContext of a node affected the whole subtree. However, if a node is newly attached to a new parent, we also need to make sure that both the subtree of the node and the parent are in the same AioContext. This tries to move the new child node to the parent AioContext and returns an error if this isn't possible. BlockBackends now actually apply their AioContext to their root node. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
Kenneth Heitke	3036a626e9	nvme: add Get/Set Feature Timestamp support Signed-off-by: Kenneth Heitke <kenneth.heitke@intel.com> Reviewed-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:09 +02:00
Julia Suvorova	2b02fd81de	block/linux-aio: Drop unused BlockAIOCB submission method Callback-based laio_submit() and laio_cancel() were left after rewriting Linux AIO backend to coroutines in hope that they would be used in other code that could bypass coroutines. They can be safely removed because they have not been used since that time. Signed-off-by: Julia Suvorova <jusual@mail.ru> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:20:41 +02:00
Vladimir Sementsov-Ogievskiy	69f47505ee	block: avoid recursive block_status call if possible drv_co_block_status digs bs->file for additional, more accurate search for hole inside region, reported as DATA by bs since `5daa74a6eb`. This accuracy is not free: assume we have qcow2 disk. Actually, qcow2 knows, where are holes and where is data. But every block_status request calls lseek additionally. Assume a big disk, full of data, in any iterative copying block job (or img convert) we'll call lseek(HOLE) on every iteration, and each of these lseeks will have to iterate through all metadata up to the end of file. It's obviously ineffective behavior. And for many scenarios we don't need this lseek at all. However, lseek is needed when we have metadata-preallocated image. So, let's detect metadata-preallocation case and don't dig qcow2's protocol file in other cases. The idea is to compare allocation size in POV of filesystem with allocations size in POV of Qcow2 (by refcounts). If allocation in fs is significantly lower, consider it as metadata-preallocation case. 102 iotest changed, as our detector can't detect shrinked file as metadata-preallocation, which don't seem to be wrong, as with metadata preallocation we always have valid file length. Two other iotests have a slight change in their QMP output sequence: Active 'block-commit' returns earlier because the job coroutine yields earlier on a blocking operation. This operation is loading the refcount blocks in qcow2_detect_metadata_preallocation(). Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:20:41 +02:00
Kevin Wolf	53a7d04185	block: Propagate AioContext change to parents All block nodes and users in any connected component of the block graph must be in the same AioContext, so changing the AioContext of one node must not only change all of its children, but all of its parents (and in turn their children etc.) as well. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-20 17:08:56 +02:00
Kevin Wolf	a3a683c33d	block: Make bdrv_attach/detach_aio_context() static Since commit `b97511c7bc`, there is no reason for block drivers any more to call these functions (see the function comment in block_int.h). They are now just internal helper functions for bdrv_set_aio_context() and can be made static. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-20 17:08:56 +02:00
Kevin Wolf	5d2318499f	block: Add bdrv_try_set_aio_context() Eventually, we want to make sure that all parents and all children of a node are in the same AioContext as the node itself. This means that changing the AioContext may fail because one of the other involved parties (e.g. a guest device that was configured with an iothread) cannot allow switching to a different AioContext. Introduce a set of functions that allow to first check whether all involved nodes can switch to a new context and only then do the actual switch. The check recursively covers children and parents. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-20 17:08:56 +02:00
Peter Maydell	01807c8b0e	Miscellaneous patches for 2019-05-13 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJc2RbiAAoJEDhwtADrkYZTfpAP/itXg+X9wRfMeerni6SAkgtz knaLtJNC7YvwH7W6TIkSf2QgMrg/nYzIXxxj6V+Y3Vdn4CD93C7ldItWkm33amlA SEhREHpAn4F3wg/LsOGYYgpuqF/wrUcZsmzudnslfMd3mM6Q9Q6J3q6mu8n1oRcl RRKlk++ElqyRdvmxFhddhPxk797Vuunh76vd3ARUFmzKs2n7CGkeBu+qbk41VqI9 YtjmWHO6BDY5b01PvjuWPir6n1yJsYcpfo3ZElZvPf5jQHMmO6fGB3SZc/PIWegq gAVeoXtwhNm+nywMpIv1wHQMkvRDZW0wrurIQBc4VGpH1Pa90dR9FNVZ8r0OZqPB aErPCdC7ED73uzJwzXKTnLxY0XDgdhsAsW7lFggANs6YyewZNcbDaVhZWsopTTK/ 3jBbddIw2RsfHNQgXlFVVzjZJGHBNHxFjAFASCKcapUWQwDKU42kQrS1GqxG56NI Lgi8Ce+Q0GsVF4wme96Oa/8EMRfmNvsHMfWQvmqGqA1OACSOf2PSGCeD618A5gq6 kV6wF4v5HdGFkc0x9Vr5ur7kv3eOhpzFzBM6XJXe3CyqnYrkNuBldkyGZBbrNY7G aW5sR26Is4m9i+7159cNB5LmnfQqtsscibkSC0UQiXcuWgevd6cdiF+0r1YuNp7C Faa2yPOHs4mHCjUwade9 =yKfZ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-misc-2019-05-13' into staging Miscellaneous patches for 2019-05-13 # gpg: Signature made Mon 13 May 2019 08:04:02 BST # gpg: using RSA key 3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-misc-2019-05-13: Clean up decorations and whitespace around header guards Normalize header guard symbol definition. Clean up ill-advised or unusual header guards Clean up header guards that don't match their file name target/xtensa: Clean up core-isa.h header guards linux-user/nios2 linux-user/riscv: Clean up header guards authz: Normalize #include "authz/trace.h" to "trace.h" Use #include "..." for our own headers, <...> for others Clean up includes Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-05-13 13:55:13 +01:00
Markus Armbruster	6834c3f410	Clean up decorations and whitespace around header guards Cleaned up with scripts/clean-header-guards.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190315145123.28030-9-armbru@redhat.com>	2019-05-13 08:58:55 +02:00
Alberto Garcia	2e11d7562a	block: Remove bdrv_read() and bdrv_write() No one is using these functions anymore, all callers have switched to the byte-based bdrv_pread() and bdrv_pwrite() Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-05-10 16:45:40 +02:00
Max Reitz	c4e2aff81b	qemu-nbd: Look up flag names in array The existing code to convert flag bits into strings looks a bit strange now, and if we ever add more flags, it will look even stranger. Prevent that from happening by making it look up the flag names in an array. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20190405191635.25740-1-mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-05-07 09:43:42 -05:00
Vladimir Sementsov-Ogievskiy	371420e217	block: introduce byte-based io helpers Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Markus Armbruster	e1ce7d747b	block/qapi: Clean up how we print to monitor or stdout bdrv_snapshot_dump(), bdrv_image_info_specific_dump(), bdrv_image_info_dump() and their helpers take an fprintf()-like callback and a FILE * to pass to it. hmp.c passes monitor_printf() cast to fprintf_function and the current monitor cast to FILE *. qemu-img.c and qemu-io-cmds.c pass fprintf and stdout. The type-punning is technically undefined behaviour, but works in practice. Clean up: drop the callback, and call qemu_printf() instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190417191805.28198-8-armbru@redhat.com>	2019-04-18 22:18:59 +02:00
Markus Armbruster	ac7ff4cf5f	qsp: Simplify how qsp_report() prints qsp_report() takes an fprintf()-like callback and a FILE * to pass to it. Its only caller hmp_sync_profile() passes monitor_fprintf() and the current monitor cast to FILE *. monitor_fprintf() casts it right back, and is otherwise identical to monitor_printf(). The type-punning is ugly. Drop the callback, and call qemu_printf() instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190417191805.28198-7-armbru@redhat.com>	2019-04-18 22:18:59 +02:00
Kevin Wolf	738301e117	file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes We know that the kernel implements a slow fallback code path for BLKZEROOUT, so if BDRV_REQ_NO_FALLBACK is given, we shouldn't call it. The other operations we call in the context of .bdrv_co_pwrite_zeroes should usually be quick, so no modification should be needed for them. If we ever notice that there are additional problematic cases, we can still make these conditional as well. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Eric Blake <eblake@redhat.com>	2019-03-26 11:37:51 +01:00
Kevin Wolf	fe0480d629	block: Add BDRV_REQ_NO_FALLBACK For qemu-img convert, we want an operation that zeroes out the whole image if this can be done efficiently, but that returns an error otherwise so we don't write explicit zeroes and immediately overwrite them with the real data, potentially doubling the amount of data to be written. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Eric Blake <eblake@redhat.com>	2019-03-26 11:37:51 +01:00
Peter Maydell	523a2a42c3	Pull request -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+ber27ys35W+dsvQfe+BBqr8OQ4FAlyIFSwACgkQfe+BBqr8 OQ7wghAAm16eCEr57oTO7QXR3y8uVFsKqXBn9cNH6nbrFp2PUQSglwMDKBls1Z5m olF23X/JaqSlSmkL9BBuzDZ6Up+kkHKuxPq4/5RKXfiDI0pr3R0eqts0COAlaN9q Bew3ipj99m8gzMi2093AW4+Ob0N3658fuDTGLe1M1Uoy7CEg1QJ7rVOBBEui7vIl RbZ8l/Zmb4ldNpB3lnE4Nh9ue8fy0RAj3Nai161nCnNeXNF/VzD3Ye8bojSBbnux PIMX6/RWmykX4feIf9QP8apDpxX4HkyuPq5EdwT9PD8PwdyXPAXZtsYUNCuNtQuk n5VKFVgFYgqUclBeVHmrMYPU4K4iCFQp4/Fua7wzPEC0iG05NiiDv91oVkEJCp3L ManHeuGfNLCcXaIntKZhuJl1cK8yMM3yDww6/pPTehrPjcyvKa0NOqhQBExektcD R6q7maJRzFaxSxdcs+Zzuog9zESvH1mlJxQCKzeYhAP0kkxInyTELE/Vbx37xuqR RFfZYyVQ6x87Q/sxHx4EMiV97WUM8elZOQdSEC/okt5WUUNpgIu0WF9nSQ1VKZ8C CZmv5xh9ogfwvB/kOm6IVwNkLvVagJQcLwddORI5LLXLbSIUcuwVSuyMp/7iDtQ/ hnHkGs2mIJ2JUYbSSNsSJNs6oTurn8eTFCeGoYKJgd9l4QxaThU= =ekU+ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging Pull request # gpg: Signature made Tue 12 Mar 2019 20:23:08 GMT # gpg: using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" [full] # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * remotes/jnsnow/tags/bitmaps-pull-request: (22 commits) tests/qemu-iotests: add bitmap resize test 246 block/qcow2-bitmap: Allow resizes with persistent bitmaps block/qcow2-bitmap: Don't check size for IN_USE bitmap docs/interop/qcow2: Improve bitmap flag in_use specification bitmaps: Fix typo in function name block/dirty-bitmaps: implement inconsistent bit block/dirty-bitmaps: disallow busy bitmaps as merge source block/dirty-bitmaps: prohibit removing readonly bitmaps block/dirty-bitmaps: prohibit readonly bitmaps for backups block/dirty-bitmaps: add block_dirty_bitmap_check function block/dirty-bitmap: add inconsistent status block/dirty-bitmaps: add inconsistent bit iotests: add busy/recording bit test to 124 blockdev: remove unused paio parameter documentation block/dirty-bitmaps: move comment block block/dirty-bitmaps: unify qmp_locked and user_locked calls block/dirty-bitmap: explicitly lock bitmaps with successors nbd: change error checking order for bitmaps block/dirty-bitmap: change semantics of enabled predicate block/dirty-bitmap: remove set/reset assertions against enabled bit ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # tests/qemu-iotests/group	2019-03-13 17:30:34 +00:00
Alberto Garcia	5019aece2a	block: Remove the AioContext parameter from bdrv_reopen_multiple() This parameter has been unused since `1a63a90750` Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Alberto Garcia	8a2ce0bc1e	block: Add a 'mutable_opts' field to BlockDriver If we reopen a BlockDriverState and there is an option that is present in bs->options but missing from the new set of options then we have to return an error unless the driver is able to reset it to its default value. This patch adds a new 'mutable_opts' field to BlockDriver. This is a list of runtime options that can be modified during reopen. If an option in this list is unspecified on reopen then it must be reset (or return an error). Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Alberto Garcia	cb828c31de	block: Allow changing the backing file on reopen This patch allows the user to change the backing file of an image that is being reopened. Here's what it does: - In bdrv_reopen_prepare(): check that the value of 'backing' points to an existing node or is null. If it points to an existing node it also needs to make sure that replacing the backing file will not create a cycle in the node graph (i.e. you cannot reach the parent from the new backing file). - In bdrv_reopen_commit(): perform the actual node replacement by calling bdrv_set_backing_hd(). There may be temporary implicit nodes between a BDS and its backing file (e.g. a commit filter node). In these cases bdrv_reopen_prepare() looks for the real (non-implicit) backing file and requires that the 'backing' option points to it. Replacing or detaching a backing file is forbidden if there are implicit nodes in the middle. Although x-blockdev-reopen is meant to be used like blockdev-add, there's an important thing that must be taken into account: the only way to set a new backing file is by using a reference to an existing node (previously added with e.g. blockdev-add). If 'backing' contains a dictionary with a new set of options ({"driver": "qcow2", "file": { ... }}) then it is interpreted that the _existing_ backing file must be reopened with those options. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Alberto Garcia	8546632e61	block: Handle child references in bdrv_reopen_queue() Children in QMP are specified with BlockdevRef / BlockdevRefOrNull, which can contain a set of child options, a child reference, or NULL. In optional attributes like "backing" it can also be missing. Only the first case (set of child options) is being handled properly by bdrv_reopen_queue(). This patch deals with all the others. Here's how these cases should be handled when bdrv_reopen_queue() is deciding what to do with each child of a BlockDriverState: 1) Set of child options: if the child was implicitly created (i.e inherits_from points to the parent) then the options are removed from the parent's options QDict and are passed to the child with a recursive bdrv_reopen_queue() call. This case was already working fine. 2) Child reference: there's two possibilites here. 2a) Reference to the current child: if the child was implicitly created then it is put in the reopen queue, keeping its current set of options (since this was a child reference there was no way to specify a different set of options). If the child is not implicit then it keeps its current set of options but it is not reopened (and therefore does not inherit any new option from the parent). 2b) Reference to a different BDS: the current child is not put in the reopen queue at all. Passing a reference to a different BDS can be used to replace a child, although at the moment no driver implements this, so it results in an error. In any case, the current child is not going to be reopened (and might in fact disappear if it's replaced) 3) NULL: This is similar to (2b). Although no driver allows this yet it can be used to detach the current child so it should not be put in the reopen queue. 4) Missing option: at the moment "backing" is the only case where this can happen. With "blockdev-add", leaving "backing" out means that the default backing file is opened. We don't want to open a new image during reopen, so we require that "backing" is always present. We'll relax this requirement a bit in the next patch. If keep_old_opts is true and "backing" is missing then this behaves like 2a (the current child is reopened). Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Alberto Garcia	077e8e2018	block: Add 'keep_old_opts' parameter to bdrv_reopen_queue() The bdrv_reopen_queue() function is used to create a queue with the BDSs that are going to be reopened and their new options. Once the queue is ready bdrv_reopen_multiple() is called to perform the operation. The original options from each one of the BDSs are kept, with the new options passed to bdrv_reopen_queue() applied on top of them. For "x-blockdev-reopen" we want a function that behaves much like "blockdev-add". We want to ignore the previous set of options so that only the ones actually specified by the user are applied, with the rest having their default values. One of the things that we need is a way to tell bdrv_reopen_queue() whether we want to keep the old set of options or not, and that's what this patch does. All current callers are setting this new parameter to true and x-blockdev-reopen will set it to false. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Alberto Garcia	2cad1ebe70	block: Allow freezing BdrvChild links Our permission system is useful to define what operations are allowed on a certain block node and includes things like BLK_PERM_WRITE or BLK_PERM_RESIZE among others. One of the permissions is BLK_PERM_GRAPH_MOD which allows "changing the node that this BdrvChild points to". The exact meaning of this has never been very clear, but it can be understood as "change any of the links connected to the node". This can be used to prevent changing a backing link, but it's too coarse. This patch adds a new 'frozen' attribute to BdrvChild, which forbids detaching the link from the node it points to, and new API to freeze and unfreeze a backing chain. After this change a few functions can fail, so they need additional checks. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Eric Blake	796a3798ab	bitmaps: Fix typo in function name Commit `a88b179f` introduced the ability to set and query bitmap persistence, but with an atypical spelling. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20190308205845.25734-1-eblake@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	3ae96d6684	block/dirty-bitmaps: add block_dirty_bitmap_check function Instead of checking against busy, inconsistent, or read only directly, use a check function with permissions bits that let us streamline the checks without reproducing them in many places. Included in this patch are permissions changes that simply add the inconsistent check to existing permissions call spots, without addressing existing bugs. In general, this means that busy+readonly checks become BDRV_BITMAP_DEFAULT, which checks against all three conditions. busy-only checks become BDRV_BITMAP_ALLOW_RO. Notably, remove allows inconsistent bitmaps, so it doesn't follow the pattern. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190301191545.8728-4-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	b0f455599d	block/dirty-bitmaps: add inconsistent bit Add an inconsistent bit to dirty-bitmaps that allows us to report a bitmap as persistent but potentially inconsistent, i.e. if we find bitmaps on a qcow2 that have been marked as "in use". Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190301191545.8728-2-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	27a1b301a4	block/dirty-bitmaps: unify qmp_locked and user_locked calls These mean the same thing now. Unify them and rename the merged call bdrv_dirty_bitmap_busy to indicate semantically what we are describing, as well as help disambiguate from the various _locked and _unlocked versions of bitmap helpers that refer to mutex locks. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-8-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
John Snow	50a47257f8	block/dirty-bitmaps: rename frozen predicate helper "Frozen" was a good description a long time ago, but it isn't adequate now. Rename the frozen predicate to has_successor to make the semantics of the predicate more clear to outside callers. In the process, remove some calls to frozen() that no longer semantically make sense. For bdrv_enable_dirty_bitmap_locked and bdrv_disable_dirty_bitmap_locked, it doesn't make sense to prohibit QEMU internals from performing this action when we only wished to prohibit QMP users from issuing these commands. All of the QMP API commands for bitmap manipulation already check against user_locked() to prohibit these actions. Several other assertions really want to check that the bitmap isn't in-use by another operation -- use the bitmap_user_locked function for this instead, which presently also checks for has_successor. This leaves some redundant checks of has_successor through different helpers that are addressed in forthcoming patches. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-3-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
Peter Maydell	e2a18635a4	nbd patches for 2019-03-08 - support TLS client authorization in NBD servers - iotest 223 race fix -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJcgqh3AAoJEKeha0olJ0Nq/bgH/1TXo49gC9SMNcBBHd5vqc6/ J+eXYQihmGLy7pNkfCBTB0QZz9d7V4tN/N1PAuIfzsHxcQJeyUBwcY7jin2SiTTM lfR9NWDY43OE+8tcPSXODyo3mge8g3d1X3vw8/QMX95TDrKQ8SMwAllegCFBKPZs T0+Jyfd8oA0NcQz4EPPUL5f2ptLo2slye2ZjbMBn/1WFrYkL+joUYJgyakYcZnY/ mcvmXF2JLG2fPzFoU1yvF+oZn6J2fx5pw92P+SZ7lA+qRzlWfvrVyK9sNqCS+K5m qdfMeeL/SyUPsUvlcbDH7iSjxWkR/h7MtXRq83FHzupasMeXiQ9ieb3MFAtHnGM= =5pyZ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2019-03-08' into staging nbd patches for 2019-03-08 - support TLS client authorization in NBD servers - iotest 223 race fix # gpg: Signature made Fri 08 Mar 2019 17:37:59 GMT # gpg: using RSA key A7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2019-03-08: iotests: Wait for qemu to end in 223 nbd: fix outdated qapi docs syntax for tls-creds nbd: allow authorization with nbd-server-start QMP command qemu-nbd: add support for authorization of TLS clients Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-03-09 20:55:44 +00:00
Kevin Wolf	6c3944dc62	qcow2: Implement data-file-raw create option Provide an option to force QEMU to always keep the external data file consistent as a standalone read-only raw image. At the moment, this means making sure that write_zeroes requests are forwarded to the data file instead of just updating the metadata, and checking that no backing file is used. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	dcc98687f8	qcow2: Creating images with external data file This adds a .bdrv_create option to use an external data file. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Andrey Shinkevich	9ac404c523	block: iterate_format with account of whitelisting bdrv_iterate_format (which is currently only used for printing out the formats supported by the block layer) doesn't take format whitelisting into account. This creates a problem for tests: they enumerate supported formats to decide which tests to enable, but then discover that QEMU doesn't let them actually use some of those formats. To avoid that, exclude formats that are not whitelisted from enumeration, if whitelisting is in use. Since we have separate whitelists for r/w and r/o, take this a parameter to bdrv_iterate_format, and print two lists of supported formats (r/w and r/o) in main qemu. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Daniel P. Berrange	000194556b	nbd: allow authorization with nbd-server-start QMP command As with the previous patch to qemu-nbd, the nbd-server-start QMP command also needs to be able to specify authorization when enabling TLS encryption. First the client must create a QAuthZ object instance using the 'object-add' command: { 'execute': 'object-add', 'arguments': { 'qom-type': 'authz-list', 'id': 'authz0', 'parameters': { 'policy': 'deny', 'rules': [ { 'match': '*CN=fred', 'policy': 'allow' } ] } } } They can then reference this in the new 'tls-authz' parameter when executing the 'nbd-server-start' command: { 'execute': 'nbd-server-start', 'arguments': { 'addr': { 'type': 'inet', 'host': '127.0.0.1', 'port': '9000' }, 'tls-creds': 'tls0', 'tls-authz': 'authz0' } } Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <20190227162035.18543-3-berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-03-06 11:05:27 -06:00
Daniel P. Berrange	b25e12daff	qemu-nbd: add support for authorization of TLS clients Currently any client which can complete the TLS handshake is able to use the NBD server. The server admin can turn on the 'verify-peer' option for the x509 creds to require the client to provide a x509 certificate. This means the client will have to acquire a certificate from the CA before they are permitted to use the NBD server. This is still a fairly low bar to cross. This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which takes the ID of a previously added 'QAuthZ' object instance. This will be used to validate the client's x509 distinguished name. Clients failing the authorization check will not be permitted to use the NBD server. For example to setup authorization that only allows connection from a client whose x509 certificate distinguished name is CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB escape the commas in the name and use: qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\ endpoint=server,verify-peer=yes \ --object 'authz-simple,id=auth0,identity=CN=laptop.example.com,,\ O=Example Org,,L=London,,ST=London,,C=GB' \ --tls-creds tls0 \ --tls-authz authz0 \ ....other qemu-nbd args... NB: a real shell command line would not have leading whitespace after the line continuation, it is just included here for clarity. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <20190227162035.18543-2-berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: split long line in --help text, tweak 233 to show that whitespace after ,, in identity= portion is actually okay] Signed-off-by: Eric Blake <eblake@redhat.com>	2019-03-06 11:05:27 -06:00
Max Reitz	998b3a1e5a	block: Purify .bdrv_refresh_filename() Currently, BlockDriver.bdrv_refresh_filename() is supposed to both refresh the filename (BDS.exact_filename) and set BDS.full_open_options. Now that we have generic code in the central bdrv_refresh_filename() for creating BDS.full_open_options, we can drop the latter part from all BlockDriver.bdrv_refresh_filename() implementations. This also means that we can drop all of the existing default code for this from the global bdrv_refresh_filename() itself. Furthermore, we now have to call BlockDriver.bdrv_refresh_filename() after having set BDS.full_open_options, because the block driver's implementation should now be allowed to depend on BDS.full_open_options being set correctly. Finally, with this patch we can drop the @options parameter from BlockDriver.bdrv_refresh_filename(); also, add a comment on this function's purpose in block/block_int.h while touching its interface. This completely obsoletes blklogwrite's implementation of .bdrv_refresh_filename(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-25-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	abc521a9aa	block: Add BlockDriver.bdrv_gather_child_options Some follow-up patches will rework the way bs->full_open_options is refreshed in bdrv_refresh_filename(). The new implementation will remove the need for the block drivers' bdrv_refresh_filename() implementations to set bs->full_open_options; instead, it will be generic and use static information from each block driver. However, by implementing bdrv_gather_child_options(), block drivers will still be able to override the way the full_open_options of their children are incorporated into their own. We need to implement this function for VMDK because we have to prevent the generic implementation from gathering the options of all children: It is not possible to specify options for the extents through the runtime options. For quorum, the child names that would be used by the generic implementation and the ones that we actually (currently) want to use differ. See quorum_gather_child_options() for more information. Note that both of these are cases which are not ideal: In case of VMDK it would probably be nice to be able to specify options for all extents. In case of quorum, the current runtime option structure is simply broken and needs to be fixed (but that is left for another patch). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-23-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	2654267cc1	block: Add strong_runtime_opts to BlockDriver This new field can be set by block drivers to list the runtime options they accept that may influence the contents of the respective BDS. As of a follow-up patch, this list will be used by the common bdrv_refresh_filename() implementation to decide which options to put into BDS.full_open_options (and consequently whether a JSON filename has to be created), thus freeing the drivers of having to implement that logic themselves. Additionally, this patch adds the field to all of the block drivers that need it and sets it accordingly. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-22-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	1e89d0f9be	block: Add bdrv_dirname() This function may be implemented by block drivers to derive a directory name from a BDS. Concatenating this g_free()-able string with a relative filename must result in a valid (not necessarily existing) filename, so this is a function that should generally be not implemented by format drivers, because this is protocol-specific. If a BDS's driver does not implement this function, bdrv_dirname() will fall through to the BDS's file if it exists. If it does not, the exact_filename field will be used to generate a directory name. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-15-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:26 +01:00
Max Reitz	6b6833c1b4	block: bdrv_get_full_backing_filename's ret. val. Make bdrv_get_full_backing_filename() return an allocated string instead of placing the result in a caller-provided buffer. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-12-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:26 +01:00
Max Reitz	645ae7d88e	block: bdrv_get_full_backing_filename_from_...'s ret. val. Make bdrv_get_full_backing_filename_from_filename() return an allocated string instead of placing the result in a caller-provided buffer. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-11-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:26 +01:00
Max Reitz	009b03aaa2	block: Make path_combine() return the path Besides being safe for arbitrary path lengths, after some follow-up patches all callers will want a freshly allocated buffer anyway. In the meantime, path_combine_deprecated() is added which has the same interface as path_combine() had before this patch. All callers to that function will be converted in follow-up patches. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20190201192935.18394-10-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:26 +01:00
Max Reitz	998c201923	block: Add BDS.auto_backing_file If the backing file is overridden, this most probably does change the guest-visible data of a BDS. Therefore, we will need to consider this in bdrv_refresh_filename(). To see whether it has been overridden, we might want to compare bs->backing_file and bs->backing->bs->filename. However, bs->backing_file is changed by bdrv_set_backing_hd() (which is just used to change the backing child at runtime, without modifying the image header), so bs->backing_file most of the time simply contains a copy of bs->backing->bs->filename anyway, so it is useless for such a comparison. This patch adds an auto_backing_file BDS field which contains the backing file path as indicated by the image header, which is not changed by bdrv_set_backing_hd(). Because of bdrv_refresh_filename() magic, however, a BDS's filename may differ from what has been specified during bdrv_open(). Then, the comparison between bs->auto_backing_file and bs->backing->bs->filename may fail even though bs->backing was opened from bs->auto_backing_file. To mitigate this, we can copy the real BDS's filename (after the whole bdrv_open() and bdrv_refresh_filename() process) into bs->auto_backing_file, if we know the former has been opened based on the latter. This is only possible if no options modifying the backing file's behavior have been specified, though. To simplify things, this patch only copies the filename from the backing file if no options have been specified for it at all. Furthermore, there are cases where an overlay is created by qemu which already contains a BDS's filename (e.g. in blockdev-snapshot-sync). We do not need to worry about updating the overlay's bs->auto_backing_file there, because we actually wrote a post-bdrv_refresh_filename() filename into the image header. So all in all, there will be false negatives where (as of a future patch) bdrv_refresh_filename() will assume that the backing file differs from what was specified in the image header, even though it really does not. However, these cases should be limited to where (1) the user actually did override something in the backing chain (e.g. by specifying options for the backing file), or (2) the user executed a QMP command to change some node's backing file (e.g. change-backing-file or block-commit with @backing-file given) where the given filename does not happen to coincide with qemu's idea of the backing BDS's filename. Then again, (1) really is limited to -drive. With -blockdev or blockdev-add, you have to adhere to the schema, so a user cannot give partial "unimportant" options (e.g. by just setting backing.node-name and leaving the rest to the image header). Therefore, trying to fix this would mean trying to fix something for -drive only. To improve on (2), we would need a full infrastructure to "canonicalize" an arbitrary filename (+ options), so it can be compared against another. That seems a bit over the top, considering that filenames nowadays are there mostly for the user's entertainment. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-5-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:25 +01:00
Max Reitz	f30c66ba6e	block: Use bdrv_refresh_filename() to pull Before this patch, bdrv_refresh_filename() is used in a pushing manner: Whenever the BDS graph is modified, the parents of the modified edges are supposed to be updated (recursively upwards). However, that is nonviable, considering that we want child changes not to concern parents. Also, in the long run we want a pull model anyway: Here, we would have a bdrv_filename() function which returns a BDS's filename, freshly constructed. This patch is an intermediate step. It adds bdrv_refresh_filename() calls before every place a BDS.filename value is used. The only exceptions are protocol drivers that use their own filename, which clearly would not profit from refreshing that filename before. Also, bdrv_get_encrypted_filename() is removed along the way (as a user of BDS.filename), since it is completely unused. In turn, all of the calls to bdrv_refresh_filename() before this patch are removed, because we no longer have to call this function on graph changes. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-2-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:25 +01:00
Vladimir Sementsov-Ogievskiy	f962e96150	block: fix bdrv_check_perm for non-tree subgraph bdrv_check_perm in it's recursion checks each node in context of new permissions for one parent, because of nature of DFS. It works well, while children subgraph of top-most updated node is a tree, i.e. it doesn't have any kind of loops. But if we have a loop (not oriented, of course), i.e. we have two different ways from top-node to some child-node, then bdrv_check_perm will do wrong thing: top \| \ \| \| v v A B \| \| v v node It will once check new permissions of node in context of new A permissions and old B permissions and once visa-versa. It's a wrong way and may lead to corruption of permission system. We may start with no-permissions and all-shared for both A->node and B->node relations and finish up with non shared write permission for both ways. The following commit will add a test, which shows this bug. To fix this situation, let's really set BdrvChild permissions during bdrv_check_perm procedure. And we are happy here, as check-perm is already written in transaction manner, so we just need to restore backed-up permissions in _abort. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-25 15:03:19 +01:00
Kevin Wolf	d3bd5b9089	nbd: Use low-level QIOChannel API in nbd_read_eof() Instead of using the convenience wrapper qio_channel_read_all_eof(), use the lower level QIOChannel API. This means duplicating some code, but we'll need this because this coroutine yield is special: We want it to be interruptible so that nbd_client_attach_aio_context() can correctly reenter the coroutine. This moves the bdrv_dec/inc_in_flight() pair into nbd_read_eof(), so that connection_co will always sit in this exact qio_channel_yield() call when bdrv_drain() returns. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-02-25 15:03:19 +01:00
Kevin Wolf	a7b78fc944	nbd: Move nbd_read_eof() to nbd/client.c The only caller of nbd_read_eof() is nbd_receive_reply(), so it doesn't have to live in the header file, but can move next to its caller. Also add the missing coroutine_fn to the function and its caller. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-02-25 15:03:19 +01:00
Daniel Henrique Barboza	8c04093c8c	block/snapshot: remove bdrv_snapshot_delete_by_id_or_name After the previous patch, the only instance of this function left is inside qemu-img.c. qemu-img is using it inside the 'img_snapshot' function to delete snapshots in the SNAPSHOT_DELETE case, based on a "snapshot_name" string that refers to the tag, not ID, of the QEMUSnapshotInfo struct. This can be verified by checking the SNAPSHOT_CREATE case that comes shortly before SNAPSHOT_DELETE. In that case, the same "snapshot_name" variable is being strcpy to the 'name' field of the QEMUSnapshotInfo struct sn: pstrcpy(sn.name, sizeof(sn.name), snapshot_name); Based on that, it is unlikely that "snapshot_name" might contain an "id" in SNAPSHOT_DELETE. This patch changes SNAPSHOT_DELETE to use snapshot_find() and snapshot_delete() instead of bdrv_snapshot_delete_by_id_or_name. After that, there is no instances left of bdrv_snapshot_delete_by_id_or_name in the code, so it is safe to remove it entirely. Suggested-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-25 15:03:18 +01:00
Andrey Shinkevich	1bf6e9ca92	bdrv_query_image_info Error parameter added Inform a user in case qcow2_get_specific_info fails to obtain QCOW2 image specific information. This patch is preliminary to the one "qcow2: Add list of bitmaps to ImageInfoSpecificQCow2". Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <1549638368-530182-2-git-send-email-andrey.shinkevich@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-02-11 14:35:43 -06:00
Vladimir Sementsov-Ogievskiy	e6798f06a6	nbd: generalize usage of nbd_read We generally do very similar things around nbd_read: error_prepend specifying what we have tried to read, and be_to_cpu conversion of integers. So, it seems reasonable to move common things to helper functions, which: 1. simplify code a bit 2. generalize nbd_read error descriptions, all starting with "Failed to read" 3. make it more difficult to forget to convert things from BE Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190128165830.165170-1-vsementsov@virtuozzo.com> [eblake: rename macro to DEF_NBD_READ_N and formatting tweaks; checkpatch has false positive complaint] Signed-off-by: Eric Blake <eblake@redhat.com>	2019-02-04 15:11:27 -06:00
Vladimir Sementsov-Ogievskiy	5d3b4e9946	qapi: add x-debug-query-block-graph Add a new command, returning block nodes (and their users) graph. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20181221170909.25584-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-01-31 00:38:19 +01:00
Stefan Hajnoczi	bc19a0a6e4	throttle-groups: fix restart coroutine iothread race The following QMP command leads to a crash when iothreads are used: { 'execute': 'device_del', 'arguments': {'id': 'data'} } The backtrace involves the queue restart coroutine where tgm->throttle_state is a NULL pointer because throttle_group_unregister_tgm() has already been called: (gdb) bt full #0 0x00005585a7a3b378 in qemu_mutex_lock_impl (mutex=0xffffffffffffffd0, file=0x5585a7bb3d54 "block/throttle-groups.c", line=412) at util/qemu-thread-posix.c:64 err = <optimized out> __PRETTY_FUNCTION__ = "qemu_mutex_lock_impl" __func__ = "qemu_mutex_lock_impl" #1 0x00005585a79be074 in throttle_group_restart_queue_entry (opaque=0x5585a9de4eb0) at block/throttle-groups.c:412 _f = <optimized out> data = 0x5585a9de4eb0 tgm = 0x5585a9079440 ts = 0x0 tg = 0xffffffffffffff98 is_write = false empty_queue = 255 This coroutine should not execute in the iothread after the throttle group member has been unregistered! The root cause is that the device_del code path schedules the restart coroutine in the iothread while holding the AioContext lock. Therefore the iothread cannot execute the coroutine until after device_del releases the lock - by this time it's too late. This patch adds a reference count to ThrottleGroupMember so we can synchronously wait for restart coroutines to complete. Once they are done it is safe to unregister the ThrottleGroupMember. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190114133257.30299-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-01-24 10:02:28 +00:00
Eric Blake	0b576b6bfb	nbd/client: Add meta contexts to nbd_receive_export_list() We want to be able to detect whether a given qemu NBD server is exposing the right export(s) and dirty bitmaps, at least for regression testing. We could use 'nbd-client -l' from the upstream NBD project to list exports, but it's annoying to rely on out-of-tree binaries; furthermore, nbd-client doesn't necessarily know about all of the qemu NBD extensions. Thus, we plan on adding a new mode to qemu-nbd that merely sniffs all possible information from the server during handshake phase, then disconnects and dumps the information. This patch continues the work of the previous patch, by adding the ability to track the list of available meta contexts into NBDExportInfo. It benefits from the recent refactoring patches with a new nbd_list_meta_contexts() that reuses much of the same framework as setting a meta context. Note: a malicious server could exhaust memory of a client by feeding an unending loop of contexts; perhaps we could place a limit on how many we are willing to receive. But this is no different from our earlier analysis on a server sending an unending list of exports, and the death of a client due to memory exhaustion when the client was going to exit soon anyways is not really a denial of service attack. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190117193658.16413-19-eblake@redhat.com>	2019-01-21 15:49:52 -06:00
Eric Blake	d21a2d3451	nbd/client: Add nbd_receive_export_list() We want to be able to detect whether a given qemu NBD server is exposing the right export(s) and dirty bitmaps, at least for regression testing. We could use 'nbd-client -l' from the upstream NBD project to list exports, but it's annoying to rely on out-of-tree binaries; furthermore, nbd-client doesn't necessarily know about all of the qemu NBD extensions. Thus, we plan on adding a new mode to qemu-nbd that merely sniffs all possible information from the server during handshake phase, then disconnects and dumps the information. This patch adds the low-level client code for grabbing the list of exports. It benefits from the recent refactoring patches, in order to share as much code as possible when it comes to doing validation of server replies. The resulting information is stored in an array of NBDExportInfo which has been expanded to any description string, along with a convenience function for freeing the list. Note: a malicious server could exhaust memory of a client by feeding an unending loop of exports; perhaps we should place a limit on how many we are willing to receive. But note that a server could reasonably be serving an export for every file in a large directory, where an arbitrary limit in the client means we can't list anything from such a server; the same happens if we just run until the client fails to malloc() and thus dies by an abort(), where the limit is no longer arbitrary but determined by available memory. Since the client is already planning on being short-lived, it's hard to call this a denial of service attack that would starve off other uses, so it does not appear to be a security issue. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Message-Id: <20190117193658.16413-18-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-01-21 15:49:52 -06:00
Eric Blake	2df94eb52b	nbd/client: Change signature of nbd_negotiate_simple_meta_context() Pass 'info' instead of three separate parameters related to info, when requesting the server to set the meta context. Update the NBDExportInfo struct to rename the received id field to match the fact that we are currently overloading the field to match whatever context the user supplied through the x-dirty-bitmap hack, as well as adding a TODO comment to remind future patches about a desire to request two contexts at once. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190117193658.16413-11-eblake@redhat.com>	2019-01-21 15:49:52 -06:00
Eric Blake	6dc1667d68	nbd/client: Move export name into NBDExportInfo Refactor the 'name' parameter of nbd_receive_negotiate() from being a separate parameter into being part of the in-out 'info'. This also spills over to a simplification of nbd_opt_go(). The main driver for this refactoring is that an upcoming patch would like to add support to qemu-nbd to list information about all exports available on a server, where the name(s) will be provided by the server instead of the client. But another benefit is that we can now allow the client to explicitly specify the empty export name "" even when connecting to an oldstyle server (even if qemu is no longer such a server after commit `7f7dfe2a`). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190117193658.16413-10-eblake@redhat.com>	2019-01-21 15:49:52 -06:00
Eric Blake	9d26dfcbab	nbd/server: Favor [u]int64_t over off_t Although our compile-time environment is set up so that we always support long files with 64-bit off_t, we have no guarantee whether off_t is the same type as int64_t. This requires casts when printing values, and prevents us from directly using qemu_strtoi64() (which will be done in the next patch). Let's just flip to uint64_t where possible, and stick to int64_t for detecting failure of blk_getlength(); we also keep the assertions added in the previous patch that the resulting values fit in 63 bits. The overflow check in nbd_co_receive_request() was already sane (request->from is validated to fit in 63 bits, and request->len is 32 bits, so the addition can't overflow 64 bits), but rewrite it in a form easier to recognize as a typical overflow check. Rename the variable 'description' to keep line lengths reasonable. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190117193658.16413-7-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-01-21 15:49:51 -06:00
Vladimir Sementsov-Ogievskiy	166cd55125	Revert "block/dirty-bitmap: Add bdrv_dirty_iter_next_area" This reverts commit `72d10a9421`. The function is unused now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com>	2019-01-15 18:26:50 -05:00
Vladimir Sementsov-Ogievskiy	a78a1a48cd	dirty-bitmap: add bdrv_dirty_bitmap_next_dirty_area The function alters bdrv_dirty_iter_next_area(), which is wrong and less efficient (see further commit "block/mirror: fix and improve do_sync_target_write" for description). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-01-15 18:26:50 -05:00
Vladimir Sementsov-Ogievskiy	76d570dc49	dirty-bitmap: improve bdrv_dirty_bitmap_next_zero Add bytes parameter to the function, to limit searched range. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-01-15 18:26:49 -05:00
Eric Blake	678ba275c7	nbd: Merge nbd_export_bitmap into nbd_export_new We only have one caller that wants to export a bitmap name, which it does right after creation of the export. But there is still a brief window of time where an NBD client could see the export but not the dirty bitmap, which a robust client would have to interpret as meaning the entire image should be treated as dirty. Better is to eliminate the window entirely, by inlining nbd_export_bitmap() into nbd_export_new(), and refusing to create the bitmap in the first place if the requested bitmap can't be located. We also no longer need logic for setting a different bitmap name compared to the bitmap being exported. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190111194720.15671-8-eblake@redhat.com>	2019-01-14 10:09:46 -06:00

... 2 3 4 5 6 ...

1228 commits