qemu-patch-raspberry4

Author	SHA1	Message	Date
Eric Blake	75d34eb98c	nbd/client: Trace server noncompliance on structured reads Just as we recently added a trace for a server sending block status that doesn't match the server's advertised minimum block alignment, let's do the same for read chunks. But since qemu 3.1 is such a server (because it advertised 512-byte alignment, but when serving a file that ends in data but is not sector-aligned, NBD_CMD_READ would detect a mid-sector change between data and hole at EOF and the resulting read chunks are unaligned), we don't want to change our behavior of otherwise tolerating unaligned reads. Note that even though we fixed the server for 4.0 to advertise an actual block alignment (which gets rid of the unaligned reads at EOF for posix files), we can still trigger it via other means: $ qemu-nbd --image-opts driver=blkdebug,align=512,image.driver=file,image.filename=/path/to/non-aligned-file Arguably, that is a bug in the blkdebug block status function, for leaking a block status that is not aligned. It may also be possible to observe issues with a backing layer with smaller alignment than the active layer, although so far I have been unable to write a reliable iotest for that scenario. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190330165349.32256-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-04-01 08:58:04 -05:00
Eric Blake	4841211e0d	block: Add bdrv_get_request_alignment() The next patch needs access to a device's minimum permitted alignment, since NBD wants to advertise this to clients. Add an accessor function, borrowing from blk_get_max_transfer() for accessing a backend's block limits. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190329042750.14704-6-eblake@redhat.com>	2019-04-01 08:46:52 -05:00
Eric Blake	9cf638508c	nbd/client: Support qemu-img convert from unaligned size If an NBD server advertises a size that is not a multiple of a sector, the block layer rounds up that size, even though we set info.size to the exact byte value sent by the server. The block layer then proceeds to let us read or query block status on the hole that it added past EOF, which the NBD server is unlikely to be happy with. Fortunately, qemu as a server never advertizes an unaligned size, so we generally don't run into this problem; but the nbdkit server makes it easy to test: $ printf %1000d 1 > f1 $ ~/nbdkit/nbdkit -fv file f1 & pid=$! $ qemu-img convert -f raw nbd://localhost:10809 f2 $ kill $pid $ qemu-img compare f1 f2 Pre-patch, the server attempts a 1024-byte read, which nbdkit rightfully rejects as going beyond its advertised 1000 byte size; the conversion fails and the output files differ (not even the first sector is copied, because qemu-img does not follow ddrescue's habit of trying smaller reads to get as much information as possible in spite of errors). Post-patch, the client's attempts to read (and query block status, for new enough nbdkit) are properly truncated to the server's length, with sane handling of the hole the block layer forced on us. Although f2 ends up as a larger file (1024 bytes instead of 1000), qemu-img compare shows the two images to have identical contents for display to the guest. I didn't add iotests coverage since I didn't want to add a dependency on nbdkit in iotests. I also did NOT patch write, trim, or write zeroes - these commands continue to fail (usually with ENOSPC, but whatever the server chose), because we really can't write to the end of the file, and because 'qemu-img convert' is the most common case where we care about being tolerant (which is read-only). Perhaps we could truncate the request if the client is writing zeros to the tail, but that seems like more work, especially if the block layer is fixed in 4.1 to track byte-accurate sizing (in which case this patch would be reverted as unnecessary). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190329042750.14704-5-eblake@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com>	2019-04-01 08:32:44 -05:00
Eric Blake	a62a85ef5c	nbd/client: Report offsets in bdrv_block_status It is desirable for 'qemu-img map' to have the same output for a file whether it is served over file or nbd protocols. However, ever since we implemented block status for NBD (2.12), the NBD protocol forgot to inform the block layer that as the final layer in the chain, the offset is valid; without an offset, the human-readable form of qemu-img map gives up with the unhelpful: $ nbdkit -U - data data="1" size=512 --run 'qemu-img map $nbd' Offset Length Mapped to File qemu-img: File contains external, encrypted or compressed clusters. The --output=json form always works, because it is reporting the lower-level bdrv_block_status results directly rather than trying to filter out sparse ranges for human consumption - but now it also shows the offset member. With this patch, the human output changes to: Offset Length Mapped to File 0 0x200 0 nbd+unix://?socket=/tmp/nbdkitOxeoLa/socket This change is observable to several iotests. Fixes: `78a33ab5` Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190329042750.14704-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-03-30 20:52:29 -05:00
Eric Blake	7da537f70d	nbd/client: Lower min_block for block-status, unaligned size We have a latent bug in our NBD client code, tickled by the brand new nbdkit 1.11.10 block status support: $ nbdkit --filter=log --filter=truncate -U - \ data data="1" size=511 truncate=64K logfile=/dev/stdout \ --run 'qemu-img convert $nbd /var/tmp/out' ... qemu-img: block/io.c:2122: bdrv_co_block_status: Assertion `pnum && QEMU_IS_ALIGNED(pnum, align) && align > offset - aligned_offset' failed. The culprit? Our implementation of .bdrv_co_block_status can return unaligned block status for any server that operates with a lower actual alignment than what we tell the block layer in request_alignment, in violation of the block layer's constraints. To date, we've been unable to trip the bug, because qemu as NBD server always advertises block sizing (at which point it is a server bug if the server sends unaligned status - although qemu 3.1 is such a server and I've sent separate patches for 4.0 both to get the server to obey the spec, and to let the client to tolerate server oddities at EOF). But nbdkit does not (yet) advertise block sizing, and therefore is not in violation of the spec for returning block status at whatever boundaries it wants, and those unaligned results can occur anywhere rather than just at EOF. While we are still wise to avoid sending sub-sector read/write requests to a server of unknown origin, we MUST consider that a server telling us block status without an advertised block size is correct. So, we either have to munge unaligned answers from the server into aligned ones that we hand back to the block layer, or we have to tell the block layer about a smaller alignment. Similarly, if the server advertises an image size that is not sector-aligned, we might as well assume that the server intends to let us access those tail bytes, and therefore supports a minimum block size of 1, regardless of whether the server supports block status (although we still need more patches to fix the problem that with an unaligned image, we can send read or block status requests that exceed EOF to the server). Again, qemu as server cannot trip this problem (because it rounds images to sector alignment), but nbdkit advertised unaligned size even before it gained block status support. Solve both alignment problems at once by using better heuristics on what alignment to report to the block layer when the server did not give us something to work with. Note that very few NBD servers implement block status (to date, only qemu and nbdkit are known to do so); and as the NBD spec mentioned block sizing constraints prior to documenting block status, it can be assumed that any future implementations of block status are aware that they must advertise block size if they want a minimum size other than 1. We've had a long history of struggles with picking the right alignment to use in the block layer, as evidenced by the commit message of `fd8d372d` (v2.12) that introduced the current choice of forced 512-byte alignment. There is no iotest coverage for this fix, because qemu can't provoke it, and I didn't want to make test 241 dependent on nbdkit. Fixes: `fd8d372d` Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190329042750.14704-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Richard W.M. Jones <rjones@redhat.com>	2019-03-30 20:52:19 -05:00
Eric Blake	737d3f5244	nbd-client: Work around server BLOCK_STATUS misalignment at EOF The NBD spec is clear that a server that advertises a minimum block size should reply to NBD_CMD_BLOCK_STATUS with extents aligned accordingly. However, we know that the qemu NBD server implementation has had a corner-case bug where it is not compliant with the spec, present since the introduction of NBD_CMD_BLOCK_STATUS in qemu 2.12 (and unlikely to be patched in time for 4.0). Namely, when qemu is serving a file that is not a multiple of 512 bytes, it rounds the size advertised over NBD up to the next sector boundary (someday, I'd like to fix that to be byte-accurate, but it's a much bigger audit not appropriate for this release); yet if the final sector contains data prior to EOF, lseek(SEEK_HOLE) will point to the implicit hole mid-sector which qemu then reported over NBD. We are well within our rights to hang up on a server that can't follow the spec, but it is more useful to try and keep the connection alive in spite of the problem. Do so by tracing a message about the problem, and then either truncating the request back to an aligned boundary (if it covered more than the final sector) or widening it out to the full boundary with a forced status of data (since truncating would result in 0 bytes, but we have to make progress, and valid since data is a default-safe answer). And in practice, since the problem only happens on a sector that starts with data and ends with a hole, we are going to want to read that full sector anyway (where qemu as the server fills in the tail beyond EOF with appropriate NUL bytes). Easy reproduction: $ printf %1000d 1 > file $ qemu-nbd -f raw -t file & pid=$! $ qemu-img map --output=json -f raw nbd://localhost:10809 qemu-img: Could not read file metadata: Invalid argument $ kill $pid where the patched version instead succeeds with: [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true}] Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190326171317.4036-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-03-30 10:06:08 -05:00
Eric Blake	ebd82cd872	nbd: Permit simple error to NBD_CMD_BLOCK_STATUS The NBD spec is clear that when structured replies are active, a simple error reply is acceptable to any command except for NBD_CMD_READ. However, we were mistakenly requiring structured errors for NBD_CMD_BLOCK_STATUS, and hanging up on a server that gave a simple error (since qemu does not behave as such a server, we didn't notice the problem until now). Broken since its introduction in commit `78a33ab5` (v2.12). Noticed while debugging a separate failure reported by nbdkit while working out its initial implementation of BLOCK_STATUS, although it turns out that nbdkit also chose to send structured error replies for BLOCK_STATUS, so I had to manually provoke the situation by hacking qemu's server to send a simple error reply: \| diff --git i/nbd/server.c w/nbd/server.c \| index fd013a2817a..833288d7c45 100644 \| 00--- i/nbd/server.c \| +++ w/nbd/server.c \| @@ -2269,6 +2269,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, \| "discard failed", errp); \| \| case NBD_CMD_BLOCK_STATUS: \| + return nbd_co_send_simple_reply(client, request->handle, ENOMEM, \| + NULL, 0, errp); \| if (!request->len) { \| return nbd_send_generic_reply(client, request->handle, -EINVAL, \| "need non-zero length", errp); \| Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com> Message-Id: <20190325190104.30213-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-03-30 10:06:08 -05:00
Eric Blake	b29f3a3d2a	nbd: Don't lose server's error to NBD_CMD_BLOCK_STATUS When the server replies with a (structured []) error to NBD_CMD_BLOCK_STATUS, without any extent information sent first, the client code was blindly throwing away the server's error code and instead telling the caller that EIO occurred. This has been broken since its introduction in `78a33ab5` (v2.12, where we should have called: error_setg(&local_err, "Server did not reply with any status extents"); nbd_iter_error(&iter, false, -EIO, &local_err); to declare the situation as a non-fatal error if no earlier error had already been flagged, rather than just blindly slamming iter.err and iter.ret), although it is more noticeable since commit `7f86068d`, which actually tries hard to preserve the server's code thanks to a separate iter.request_ret. [] The spec is clear that the server is also permitted to reply with a simple error, but that's a separate fix. I was able to provoke this scenario with a hack to the server, then seeing whether ENOMEM makes it back to the caller: \| diff --git a/nbd/server.c b/nbd/server.c \| index fd013a2817a..29c7995de02 100644 \| --- a/nbd/server.c \| +++ b/nbd/server.c \| @@ -2269,6 +2269,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, \| "discard failed", errp); \| \| case NBD_CMD_BLOCK_STATUS: \| + return nbd_send_generic_reply(client, request->handle, -ENOMEM, \| + "no status for you today", errp); \| if (!request->len) { \| return nbd_send_generic_reply(client, request->handle, -EINVAL, \| "need non-zero length", errp); \| -- Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190325190104.30213-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-03-30 10:06:08 -05:00
Eric Blake	a39286dd61	nbd: Tolerate some server non-compliance in NBD_CMD_BLOCK_STATUS The NBD spec states that NBD_CMD_FLAG_REQ_ONE (which we currently always use) should not reply with an extent larger than our request, and that the server's response should be exactly one extent. Right now, that means that if a server sends more than one extent, we treat the server as broken, fail the block status request, and disconnect, which prevents all further use of the block device. But while good software should be strict in what it sends, it should be tolerant in what it receives. While trying to implement NBD_CMD_BLOCK_STATUS in nbdkit, we temporarily had a non-compliant server sending too many extents in spite of REQ_ONE. Oddly enough, 'qemu-img convert' with qemu 3.1 failed with a somewhat useful message: qemu-img: Protocol error: invalid payload for NBD_REPLY_TYPE_BLOCK_STATUS which then disappeared with commit `d8b4bad8`, on the grounds that an error message flagged only at the time of coroutine teardown is pointless, and instead we should rely on the actual failed API to report an error - in other words, the 3.1 behavior was masking the fact that qemu-img was not reporting an error. That has since been fixed in the previous patch, where qemu-img convert now fails with: qemu-img: error while reading block status of sector 0: Invalid argument But even that is harsh. Since we already partially relaxed things in commit `acfd8f7a` to tolerate a server that exceeds the cap (although that change was made prior to the NBD spec actually putting a cap on the extent length during REQ_ONE - in fact, the NBD spec change was BECAUSE of the qemu behavior prior to that commit), it's not that much harder to argue that we should also tolerate a server that sends too many extents. But at the same time, it's nice to trace when we are being tolerant of server non-compliance, in order to help server writers fix their implementations to be more portable (if they refer to our traces, rather than just stderr). Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190323212639.579-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-03-30 10:06:08 -05:00
Kevin Wolf	738301e117	file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes We know that the kernel implements a slow fallback code path for BLKZEROOUT, so if BDRV_REQ_NO_FALLBACK is given, we shouldn't call it. The other operations we call in the context of .bdrv_co_pwrite_zeroes should usually be quick, so no modification should be needed for them. If we ever notice that there are additional problematic cases, we can still make these conditional as well. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Eric Blake <eblake@redhat.com>	2019-03-26 11:37:51 +01:00
Kevin Wolf	80f5c33ff3	block: Advertise BDRV_REQ_NO_FALLBACK in filter drivers Filter drivers that support .bdrv_co_pwrite_zeroes can safely advertise BDRV_REQ_NO_FALLBACK because they just forward the request flags to their child node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Eric Blake <eblake@redhat.com>	2019-03-26 11:37:51 +01:00
Kevin Wolf	fe0480d629	block: Add BDRV_REQ_NO_FALLBACK For qemu-img convert, we want an operation that zeroes out the whole image if this can be done efficiently, but that returns an error otherwise so we don't write explicit zeroes and immediately overwrite them with the real data, potentially doubling the amount of data to be written. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Eric Blake <eblake@redhat.com>	2019-03-26 11:37:51 +01:00
Kevin Wolf	48ce986096	block: Remove error messages in bdrv_make_zero() There is only a single caller of bdrv_make_zero(), which is qemu-img convert. If the function fails, we just fall back to a different method of zeroing out blocks on the target image. There is no good reason to print error messages on stderr when the higher level operation will actually succeed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Eric Blake <eblake@redhat.com>	2019-03-26 11:37:51 +01:00
Markus Armbruster	a9779a3ab0	trace-events: Delete unused trace points Tracked down with cleanup-trace-events.pl. Funnies requiring manual post-processing: * block.c and blockdev.c trace points are in block/trace-events. * hw/block/nvme.c uses the preprocessor to hide its trace point use from cleanup-trace-events.pl. * include/hw/xen/xen_common.h trace points are in hw/xen/trace-events. * net/colo-compare and net/filter-rewriter.c use pseudo trace points colo_compare_udp_miscompare and colo_filter_rewriter_debug to guard debug code. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 20190314180929.27722-5-armbru@redhat.com Message-Id: <20190314180929.27722-5-armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-03-22 16:18:07 +00:00
Markus Armbruster	500016e5db	trace-events: Shorten file names in comments We spell out sub/dir/ in sub/dir/trace-events' comments pointing to source files. That's because when trace-events got split up, the comments were moved verbatim. Delete the sub/dir/ part from these comments. Gets rid of several misspellings. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20190314180929.27722-3-armbru@redhat.com Message-Id: <20190314180929.27722-3-armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-03-22 16:18:07 +00:00
Alberto Garcia	782b9d06bf	block: Make bdrv_{copy_on_read,crypto_luks,replication} static Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-19 15:49:29 +01:00
Sam Eiderman	b69864e5a8	vmdk: Support version=3 in VMDK descriptor files Commit `509d39aa22` added support for read only VMDKs of version 3. This commit fixes the probe function to correctly handle descriptors of version 3. This commit has two effects: 1. We no longer need to supply '-f vmdk' when pointing to descriptor files of version 3 in qemu/qemu-img command line arguments. 2. This fixes the scenario where a VMDK points to a parent version 3 descriptor file which is being probed as "raw" instead of "vmdk". Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Shmuel Eiderman <shmuel.eiderman@oracle.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-19 15:49:29 +01:00
Kevin Wolf	a0cf83639c	qcow2: Fix data file error condition in qcow2_co_create() We were trying to check whether bdrv_open_blockdev_ref() returned success, but accidentally checked the wrong variable. Spotted by Coverity (CID 1399703). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>	2019-03-19 15:49:29 +01:00
Sergio Lopez	5e771752a1	mirror: Confirm we're quiesced only if the job is paused or cancelled While child_job_drained_begin() calls to job_pause(), the job doesn't actually transition between states until it runs again and reaches a pause point. This means bdrv_drained_begin() may return with some jobs using the node still having 'busy == true'. As a consequence, block_job_detach_aio_context() may get into a deadlock, waiting for the job to be actually paused, while the coroutine servicing the job is yielding and doesn't get the opportunity to get scheduled again. This situation can be reproduced by issuing a 'block-commit' immediately followed by a 'device_del'. To ensure bdrv_drained_begin() only returns when the jobs have been paused, we change mirror_drained_poll() to only confirm it's quiesced when job->paused == true and there aren't any in-flight requests, except if we reached that point by a drained section initiated by the mirror/commit job itself. The other block jobs shouldn't need any changes, as the default drained_poll() behavior is to only confirm it's quiesced if the job is not busy or completed. Signed-off-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-19 15:49:29 +01:00
Peter Maydell	dbbc277510	Pull request * Add 'drop-cache=on\|off' option to file-posix.c. The default is on. Disabling the option fixes a QEMU 3.0.0 performance regression when live migrating on the same host with cache.direct=off. -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJciOSEAAoJEJykq7OBq3PIVSUIAI6r2Mgoi+no4nle8Jf2nZ+W EnQXnNEFyJA0lKRtqQ2UILD9udVdKd/L1PZu5k/Il/Ralto9Yf3+62brekI7rsss c3Qusu4LUK6jom2RslRjRIaJ9GilQi/jWezKV/O0VlcsMVemgVHX008EIR+ea1U4 H0/u2kfu04PciKQ5MR2+6aacu9bfmyH1yM2no+aMN5dDu/38PV6JEsf0Zl2agowg opGepJ7YiDQsxH9IBXrbfm38mBrrY0K2vFzAb9BzTHfBPotGMNIZNJNM2FChRfoM sTjOIpZz3NDwPEUPQPZxp+7YKRFFYfse1oHtpyh4n1rMQksB019SCGlP9TBhrF0= =CH5G -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging Pull request * Add 'drop-cache=on\|off' option to file-posix.c. The default is on. Disabling the option fixes a QEMU 3.0.0 performance regression when live migrating on the same host with cache.direct=off. # gpg: Signature made Wed 13 Mar 2019 11:07:48 GMT # gpg: using RSA key 9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: file-posix: add drop-cache=on\|off option Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-03-14 09:34:51 +00:00
Peter Maydell	523a2a42c3	Pull request -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+ber27ys35W+dsvQfe+BBqr8OQ4FAlyIFSwACgkQfe+BBqr8 OQ7wghAAm16eCEr57oTO7QXR3y8uVFsKqXBn9cNH6nbrFp2PUQSglwMDKBls1Z5m olF23X/JaqSlSmkL9BBuzDZ6Up+kkHKuxPq4/5RKXfiDI0pr3R0eqts0COAlaN9q Bew3ipj99m8gzMi2093AW4+Ob0N3658fuDTGLe1M1Uoy7CEg1QJ7rVOBBEui7vIl RbZ8l/Zmb4ldNpB3lnE4Nh9ue8fy0RAj3Nai161nCnNeXNF/VzD3Ye8bojSBbnux PIMX6/RWmykX4feIf9QP8apDpxX4HkyuPq5EdwT9PD8PwdyXPAXZtsYUNCuNtQuk n5VKFVgFYgqUclBeVHmrMYPU4K4iCFQp4/Fua7wzPEC0iG05NiiDv91oVkEJCp3L ManHeuGfNLCcXaIntKZhuJl1cK8yMM3yDww6/pPTehrPjcyvKa0NOqhQBExektcD R6q7maJRzFaxSxdcs+Zzuog9zESvH1mlJxQCKzeYhAP0kkxInyTELE/Vbx37xuqR RFfZYyVQ6x87Q/sxHx4EMiV97WUM8elZOQdSEC/okt5WUUNpgIu0WF9nSQ1VKZ8C CZmv5xh9ogfwvB/kOm6IVwNkLvVagJQcLwddORI5LLXLbSIUcuwVSuyMp/7iDtQ/ hnHkGs2mIJ2JUYbSSNsSJNs6oTurn8eTFCeGoYKJgd9l4QxaThU= =ekU+ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging Pull request # gpg: Signature made Tue 12 Mar 2019 20:23:08 GMT # gpg: using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" [full] # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * remotes/jnsnow/tags/bitmaps-pull-request: (22 commits) tests/qemu-iotests: add bitmap resize test 246 block/qcow2-bitmap: Allow resizes with persistent bitmaps block/qcow2-bitmap: Don't check size for IN_USE bitmap docs/interop/qcow2: Improve bitmap flag in_use specification bitmaps: Fix typo in function name block/dirty-bitmaps: implement inconsistent bit block/dirty-bitmaps: disallow busy bitmaps as merge source block/dirty-bitmaps: prohibit removing readonly bitmaps block/dirty-bitmaps: prohibit readonly bitmaps for backups block/dirty-bitmaps: add block_dirty_bitmap_check function block/dirty-bitmap: add inconsistent status block/dirty-bitmaps: add inconsistent bit iotests: add busy/recording bit test to 124 blockdev: remove unused paio parameter documentation block/dirty-bitmaps: move comment block block/dirty-bitmaps: unify qmp_locked and user_locked calls block/dirty-bitmap: explicitly lock bitmaps with successors nbd: change error checking order for bitmaps block/dirty-bitmap: change semantics of enabled predicate block/dirty-bitmap: remove set/reset assertions against enabled bit ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # tests/qemu-iotests/group	2019-03-13 17:30:34 +00:00
Stefan Hajnoczi	f357fcd890	file-posix: add drop-cache=on\|off option Commit `dd577a26ff` ("block/file-posix: implement bdrv_co_invalidate_cache() on Linux") introduced page cache invalidation so that cache.direct=off live migration is safe on Linux. The invalidation takes a significant amount of time when the file is large and present in the page cache. Normally this is not the case for cross-host live migration but it can happen when migrating between QEMU processes on the same host. On same-host migration we don't need to invalidate pages for correctness anyway, so an option to skip page cache invalidation is useful. I investigated optimizing invalidation and detecting same-host migration, but both are hard to achieve so a user-visible option will suffice. As a bonus this option means that the cache invalidation feature will now be detectable by libvirt via QMP schema introspection. Suggested-by: Neil Skrypuch <neil@tembosocial.com> Tested-by: Neil Skrypuch <neil@tembosocial.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190307164941.3322-1-stefanha@redhat.com Message-Id: <20190307164941.3322-1-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-03-13 10:54:55 +00:00
Alberto Garcia	5019aece2a	block: Remove the AioContext parameter from bdrv_reopen_multiple() This parameter has been unused since `1a63a90750` Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Alberto Garcia	8a2ce0bc1e	block: Add a 'mutable_opts' field to BlockDriver If we reopen a BlockDriverState and there is an option that is present in bs->options but missing from the new set of options then we have to return an error unless the driver is able to reset it to its default value. This patch adds a new 'mutable_opts' field to BlockDriver. This is a list of runtime options that can be modified during reopen. If an option in this list is unspecified on reopen then it must be reset (or return an error). Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Alberto Garcia	077e8e2018	block: Add 'keep_old_opts' parameter to bdrv_reopen_queue() The bdrv_reopen_queue() function is used to create a queue with the BDSs that are going to be reopened and their new options. Once the queue is ready bdrv_reopen_multiple() is called to perform the operation. The original options from each one of the BDSs are kept, with the new options passed to bdrv_reopen_queue() applied on top of them. For "x-blockdev-reopen" we want a function that behaves much like "blockdev-add". We want to ignore the previous set of options so that only the ones actually specified by the user are applied, with the rest having their default values. One of the things that we need is a way to tell bdrv_reopen_queue() whether we want to keep the old set of options or not, and that's what this patch does. All current callers are setting this new parameter to true and x-blockdev-reopen will set it to false. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Alberto Garcia	6585493369	block: Freeze the backing chain for the duration of the stream job Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Alberto Garcia	ef53dc09ed	block: Freeze the backing chain for the duration of the mirror job Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Alberto Garcia	df827336ab	block: Freeze the backing chain for the duration of the commit job Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Kevin Wolf	23dece19da	file-posix: Make auto-read-only dynamic Until now, with auto-read-only=on we tried to open the file read-write first and if that failed, read-only was tried. This is actually not good enough for libvirt, which gives QEMU SELinux permissions for read-write only as soon as it actually intends to write to the image. So we need to be able to switch between read-only and read-write at runtime. This patch makes auto-read-only dynamic, i.e. the file is opened read-only as long as no user of the node has requested write permissions, but it is automatically reopened read-write as soon as the first writer is attached. Conversely, if the last writer goes away, the file is reopened read-only again. bs->read_only is no longer set for auto-read-only=on files even if the file descriptor is opened read-only because it will be transparently upgraded as soon as a writer is attached. This changes the output of qemu-iotests 232. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Kevin Wolf	6ceabe6f77	file-posix: Prepare permission code for fd switching In order to be able to dynamically reopen the file read-only or read-write, depending on the users that are attached, we need to be able to switch to a different file descriptor during the permission change. This interacts with reopen, which also creates a new file descriptor and performs permission changes internally. In this case, the permission change code must reuse the reopen file descriptor instead of creating a third one. In turn, reopen can drop its code to copy file locks to the new file descriptor because that is now done when applying the new permissions. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Kevin Wolf	a6aeca0ca5	file-posix: Lock new fd in raw_reopen_prepare() There is no reason why we can take locks on the new file descriptor only in raw_reopen_commit() where error handling isn't possible any more. Instead, we can already do this in raw_reopen_prepare(). Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Kevin Wolf	e0c9cf3a48	file-posix: Store BDRVRawState.reopen_state during reopen We'll want to access the file descriptor in the reopen_state while processing permission changes in the context of the repoen. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Kevin Wolf	5cec287025	file-posix: Factor out raw_reconfigure_getfd() Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
Vladimir Sementsov-Ogievskiy	cb8aac3783	qapi: drop x- from x-block-latency-histogram-set Drop x- and x_ prefixes for latency histograms and update version to 4.0 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:08 +01:00
John Snow	d19c6b36ff	block/qcow2-bitmap: Allow resizes with persistent bitmaps Since we now load all bitmaps into memory anyway, we can just truncate them in-memory and then flush them back to disk. Just in case, we will still check and enforce that this shortcut is valid -- i.e. that any bitmap described on-disk is indeed in-memory and can be modified. If there are any inconsistent bitmaps, we refuse to allow the truncate as we do not actually load these bitmaps into memory, and it isn't safe or reasonable to attempt to truncate corrupted data. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190311185147.52309-4-vsementsov@virtuozzo.com [vsementsov: drop bitmap flushing, fix block comments style] Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 14:57:38 -04:00
Vladimir Sementsov-Ogievskiy	bf5f0cf5d8	block/qcow2-bitmap: Don't check size for IN_USE bitmap We are going to allow image resize when there are persistent bitmaps. It may lead to appearing of inconsistent bitmaps (IN_USE=1) with inconsistent size. But we still want to load them as inconsistent. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190311185147.52309-3-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 14:50:28 -04:00
Eric Blake	796a3798ab	bitmaps: Fix typo in function name Commit `a88b179f` introduced the ability to set and query bitmap persistence, but with an atypical spelling. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20190308205845.25734-1-eblake@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	74da6b9435	block/dirty-bitmaps: implement inconsistent bit Set the inconsistent bit on load instead of rejecting such bitmaps. There is no way to un-set it; the only option is to delete the bitmap. Obvervations: - bitmap loading does not need to update the header for in_use bitmaps. - inconsistent bitmaps don't need to have their data loaded; they're glorified corruption sentinels. - bitmap saving does not need to save inconsistent bitmaps back to disk. - bitmap reopening DOES need to drop the readonly flag from inconsistent bitmaps to allow reopening of qcow2 files with non-qemu-owned bitmaps being eventually flushed back to disk. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20190301191545.8728-8-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	cb8e58e3de	block/dirty-bitmaps: disallow busy bitmaps as merge source We didn't do any state checking on source bitmaps at all, so this adds inconsistent and busy checks. readonly is allowed, so you can still copy a readonly bitmap to a new destination to use it for operations like drive-backup. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190301191545.8728-7-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	3ae96d6684	block/dirty-bitmaps: add block_dirty_bitmap_check function Instead of checking against busy, inconsistent, or read only directly, use a check function with permissions bits that let us streamline the checks without reproducing them in many places. Included in this patch are permissions changes that simply add the inconsistent check to existing permissions call spots, without addressing existing bugs. In general, this means that busy+readonly checks become BDRV_BITMAP_DEFAULT, which checks against all three conditions. busy-only checks become BDRV_BITMAP_ALLOW_RO. Notably, remove allows inconsistent bitmaps, so it doesn't follow the pattern. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190301191545.8728-4-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	0064cfefa4	block/dirty-bitmap: add inconsistent status Even though the status field is deprecated, we still have to support it for a few more releases. Since this is a very new kind of bitmap state, it makes sense for it to have its own status field. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190301191545.8728-3-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	b0f455599d	block/dirty-bitmaps: add inconsistent bit Add an inconsistent bit to dirty-bitmaps that allows us to report a bitmap as persistent but potentially inconsistent, i.e. if we find bitmaps on a qcow2 that have been marked as "in use". Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190301191545.8728-2-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	1e6fddcd6f	block/dirty-bitmaps: move comment block Simply move the big status enum comment block to above the status function, and document it as being deprecated. The whole confusing block can get deleted in three releases time. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-9-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
John Snow	27a1b301a4	block/dirty-bitmaps: unify qmp_locked and user_locked calls These mean the same thing now. Unify them and rename the merged call bdrv_dirty_bitmap_busy to indicate semantically what we are describing, as well as help disambiguate from the various _locked and _unlocked versions of bitmap helpers that refer to mutex locks. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-8-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
John Snow	21d2376f26	block/dirty-bitmap: explicitly lock bitmaps with successors Instead of implying a user_locked/busy status, make it explicit. Now, bitmaps in use by migration, NBD or backup operations are all treated the same way with the same code paths. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-7-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
John Snow	8b2e20f64f	block/dirty-bitmap: change semantics of enabled predicate Currently, the enabled predicate means something like: "the QAPI status of the bitmap is ACTIVE." After this patch, it should mean exclusively: "This bitmap is recording guest writes, and is allowed to do so." In many places, this is how this predicate was already used. Internal usages of the bitmap QPI can call user_locked to find out if the bitmap is in use by an operation. To accommodate this, modify the create_successor routine to now explicitly disable the parent bitmap at creation time. Justifications: 1. bdrv_dirty_bitmap_status suffers no change from the lack of 1:1 parity with the new predicates because of the order in which the predicates are checked. This is now only for compatibility. 2. bdrv_set_dirty() is unchanged: pre-patch, it was skipping bitmaps that were disabled or had a successor, while post-patch it is only skipping bitmaps that are disabled. To accommodate this, create_successor now ensures that any bitmap with a successor is explicitly disabled. 3. qcow2_store_persistent_dirty_bitmaps: No functional change. This function cares only about the literal enabled bit, and makes no effort to check if the bitmap is in-use or not. After this patch there are still no ways to produce an enabled bitmap with a successor. 4. block_dirty_bitmap_enable_prepare block_dirty_bitmap_disable_prepare init_dirty_bitmap_migration nbd_export_new These functions care about the literal enabled bit, and already check user_locked separately. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-5-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
John Snow	c28ddbb07e	block/dirty-bitmap: remove set/reset assertions against enabled bit bdrv_set_dirty_bitmap and bdrv_reset_dirty_bitmap are only used as an internal API by the mirror and migration areas of our code. These calls modify the bitmap, but do so at the behest of QEMU and not the guest. Presently, these bitmaps are always "enabled" anyway, but there's no reason they have to be. Modify these internal APIs to drop this assertion. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-4-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
John Snow	50a47257f8	block/dirty-bitmaps: rename frozen predicate helper "Frozen" was a good description a long time ago, but it isn't adequate now. Rename the frozen predicate to has_successor to make the semantics of the predicate more clear to outside callers. In the process, remove some calls to frozen() that no longer semantically make sense. For bdrv_enable_dirty_bitmap_locked and bdrv_disable_dirty_bitmap_locked, it doesn't make sense to prohibit QEMU internals from performing this action when we only wished to prohibit QMP users from issuing these commands. All of the QMP API commands for bitmap manipulation already check against user_locked() to prohibit these actions. Several other assertions really want to check that the bitmap isn't in-use by another operation -- use the bitmap_user_locked function for this instead, which presently also checks for has_successor. This leaves some redundant checks of has_successor through different helpers that are addressed in forthcoming patches. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-3-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
John Snow	4db6ceb0b5	block/dirty-bitmap: add recording and busy properties The current API allows us to report a single status, which we've defined as: Frozen: has a successor, treated as qmp_locked, may or may not be enabled. Locked: no successor, qmp_locked. may or may not be enabled. Disabled: Not frozen or locked, disabled. Active: Not frozen, locked, or disabled. The problem is that both "Frozen" and "Locked" mean nearly the same thing, and that both of them do not intuit whether they are recording guest writes or not. This patch deprecates that status field and introduces two orthogonal properties instead to replace it. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-2-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
Niels de Vos	0e3b891fef	gluster: the glfs_io_cbk callback function pointer adds pre/post stat args The glfs_*_async() functions do a callback once finished. This callback has changed its arguments, pre- and post-stat structures have been added. This makes it possible to improve caching, which is useful for Samba and NFS-Ganesha, but not so much for QEMU. Gluster 6 is the first release that includes these new arguments. With an additional detection in ./configure, the new arguments can conditionally get included in the glfs_io_cbk handler. Signed-off-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 14:26:49 +01:00

1 2 3 4 5 ...

4236 commits