qemu-patch-raspberry4

Author	SHA1	Message	Date
Max Reitz	24552feb6a	qcow2: Fix QCOW2_COMPRESSED_SECTOR_MASK Masks for L2 table entries should have 64 bit. Fixes: `b6c246942b` Buglink: https://bugs.launchpad.net/qemu/+bug/1850000 Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191028161841.1198-2-mreitz@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-07 14:37:46 +01:00
Tuguoyi	570542ecb1	qcow2-bitmap: Fix uint64_t left-shift overflow There are two issues in In check_constraints_on_bitmap(), 1) The sanity check on the granularity will cause uint64_t integer left-shift overflow when cluster_size is 2M and the granularity is BIGGER than 32K. 2) The way to calculate image size that the maximum bitmap supported can map to is a bit incorrect. This patch fix it by add a helper function to calculate the number of bytes needed by a normal bitmap in image and compare it to the maximum bitmap bytes supported by qemu. Fixes: `5f72826e7f` Signed-off-by: Guoyi Tu <tu.guoyi@h3c.com> Message-id: 4ba40cd1e7ee4a708b40899952e49f22@h3c.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-07 14:37:33 +01:00
Max Reitz	292d06b925	block/file-posix: Let post-EOF fallocate serialize The XFS kernel driver has a bug that may cause data corruption for qcow2 images as of qemu commit `c8bb23cbdb`. We can work around it by treating post-EOF fallocates as serializing up until infinity (INT64_MAX in practice). Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191101152510.11719-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-04 09:33:51 +01:00
Max Reitz	c28107e9e5	block: Add bdrv_co_get_self_request() Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191101152510.11719-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-04 09:32:51 +01:00
Max Reitz	304d9d7f03	block: Make wait/mark serialising requests public Make both bdrv_mark_request_serialising() and bdrv_wait_serialising_requests() public so they can be used from block drivers. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191101152510.11719-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-04 09:29:15 +01:00
Vladimir Sementsov-Ogievskiy	dcfbece684	block/block-copy: fix s->copy_size for compressed cluster `0e2402452f` allowed writes larger than cluster, but that's unsupported for compressed write. Fix it. Fixes: `0e2402452f` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191029150934.26416-1-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-04 09:21:45 +01:00
Peter Maydell	aaffb85335	Block patches for softfreeze: - iotest patches - Improve performance of the mirror block job in write-blocking mode - Limit memory usage for the backup block job - Add discard and write-zeroes support to the NVMe host block driver - Fix a bug in the mirror job - Prevent the qcow2 driver from creating technically non-compliant qcow2 v3 images (where there is not enough extra data for snapshot table entries) - Allow callers of bdrv_truncate() (etc.) to determine whether the file must be resized to the exact given size or whether it is OK for block devices not to shrink -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl2224ESHG1yZWl0ekBy ZWRoYXQuY29tAAoJEPQH2wBh1c9AeXMH/RXKEX4BZYMRKCe41P18tJC9Bl2x0T20 YeOsZVvpARlr7o/36BF2kGFF4MnL0OQ+9ELuyROX865rk/VL2rWqnHDE5oQM889a dFwMs+0zvNbig3iLNcw0H5OkE2mrdM+a1EUdn/lBe/39Z8dPqPxRGqIYHq38Ugdu emwSy1nWen7o0f71HRJfyVtI3KcrzXx71FrA/FY2yL/eHz+zRYGZj2SpAdFPkXP/ lgaz+m0tWhnSW1QzEOXB0Gh69ULt/DczCinYmv5qUY1noW5TPPtiDNCQTts5O4ba oJsR3AJv5/l9m65JTmiyQSqnQfPcstrQ5FqOcSnP637cfqUFyWsvdks= =L7v1 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-10-28' into staging Block patches for softfreeze: - iotest patches - Improve performance of the mirror block job in write-blocking mode - Limit memory usage for the backup block job - Add discard and write-zeroes support to the NVMe host block driver - Fix a bug in the mirror job - Prevent the qcow2 driver from creating technically non-compliant qcow2 v3 images (where there is not enough extra data for snapshot table entries) - Allow callers of bdrv_truncate() (etc.) to determine whether the file must be resized to the exact given size or whether it is OK for block devices not to shrink # gpg: Signature made Mon 28 Oct 2019 12:13:53 GMT # gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40 # gpg: issuer "mreitz@redhat.com" # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full] # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * remotes/maxreitz/tags/pull-block-2019-10-28: (69 commits) qemu-iotests: restrict 264 to qcow2 only Revert "qemu-img: Check post-truncation size" block: Pass truncate exact=true where reasonable block: Let format drivers pass @exact block: Evaluate @exact in protocol drivers block: Add @exact parameter to bdrv_co_truncate() block: Do not truncate file node when formatting block/cor: Drop cor_co_truncate() block: Handle filter truncation like native impl. iotests: Test qcow2's snapshot table handling iotests: Add peek_file* functions qcow2: Fix v3 snapshot table entry compliancy qcow2: Repair snapshot table with too many entries qcow2: Fix overly long snapshot tables qcow2: Keep track of the snapshot table length qcow2: Fix broken snapshot table entries qcow2: Add qcow2_check_fix_snapshot_table() qcow2: Separate qcow2_check_read_snapshot_table() qcow2: Write v3-compliant snapshot list on upgrade qcow2: Put qcow2_upgrade() into its own function ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-10-28 14:40:01 +00:00
Max Reitz	e8d04f9237	block: Pass truncate exact=true where reasonable This is a change in behavior, so all instances need a good justification. The comments added here should explain my reasoning. qed already had a comment that suggests it always expected bdrv_truncate()/blk_truncate() to behave as if exact=true were passed (`c743849bee` came eight months before `55b949c847`), so it was simply broken until now. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-8-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> [mreitz: Changed comment in qed.c to explain why a new QED file must be empty, as requested and suggested by Maxim] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 12:08:45 +01:00
Max Reitz	e61a28a9b6	block: Let format drivers pass @exact When truncating a format node, the @exact parameter is generally handled simply by virtue of the format storing the new size in the image metadata. Such formats do not need to pass on the parameter to their file nodes. There are exceptions, though: - raw and crypto cannot store the image size, and thus must pass on @exact. - When using qcow2 with an external data file, it just makes sense to keep its size in sync with the qcow2 virtual disk (because the external data file is the virtual disk). Therefore, we should pass @exact when truncating it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-7-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 12:05:30 +01:00
Max Reitz	82325ae5f2	block: Evaluate @exact in protocol drivers We have two protocol drivers that return success when trying to shrink a block device even though they cannot shrink it. This behavior is now only allowed with exact=false, so they should return an error with exact=true. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-6-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 12:05:24 +01:00
Max Reitz	c80d8b06cf	block: Add @exact parameter to bdrv_co_truncate() We have two drivers (iscsi and file-posix) that (in some cases) return success from their .bdrv_co_truncate() implementation if the block device is larger than the requested offset, but cannot be shrunk. Some callers do not want that behavior, so this patch adds a new parameter that they can use to turn off that behavior. This patch just adds the parameter and lets the block/io.c and block/block-backend.c functions pass it around. All other callers always pass false and none of the implementations evaluate it, so that this patch does not change existing behavior. Future patches take care of that. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 12:00:07 +01:00
Max Reitz	26536c7fc2	block: Do not truncate file node when formatting There is no reason why the format drivers need to truncate the protocol node when formatting it. When using the old .bdrv_co_create_ops() interface, the file will be created with no size option anyway, which generally gives it a size of 0. (Exceptions are block devices, which cannot be truncated anyway.) When using blockdev-create, the user must have given the file node some size anyway, so there is no reason why we should override that. qed is an exception, it needs the file to start completely empty (as explained by `c743849bee`). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-4-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:59:57 +01:00
Max Reitz	bb8160eb78	block/cor: Drop cor_co_truncate() No other filter driver has a .bdrv_co_truncate() implementation, and there is no need to because the general block layer code can handle it just as well. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-3-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:59:51 +01:00
Max Reitz	6b7e8f8b1c	block: Handle filter truncation like native impl. Make the filter truncation (passing it through to bs->file) a first-class citizen and handle it exactly as if it was the filter driver's native implementation of .bdrv_co_truncate(). I do not see a reason not to, it makes the code a bit shorter, and may be even more correct because this gets us to finish the write_req that we prepared before (may be important to e.g. bring dirty bitmaps to the correct size). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-2-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:59:45 +01:00
Max Reitz	e40e6e88f6	qcow2: Fix v3 snapshot table entry compliancy qcow2 v3 images require every snapshot table entry to have at least 16 bytes of extra data. If they do not, let qemu-img check -r all fix it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-15-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:09 +01:00
Max Reitz	d2b1d1ec73	qcow2: Repair snapshot table with too many entries The user cannot choose which snapshots are removed. This is fine because we have chosen the maximum snapshot table size to be so large (65536 entries) that it cannot be reasonably reached. If the snapshot table exceeds this size, the image has probably been corrupted in some way; in this case, it is most important to just make the image usable such that the user can copy off at least the active layer. (Also note that the snapshots will be removed only with "-r all", so a plain "check" or "check -r leaks" will not delete any data.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-14-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:08 +01:00
Max Reitz	099febf3ac	qcow2: Fix overly long snapshot tables We currently refuse to open qcow2 images with overly long snapshot tables. This patch makes qemu-img check -r all drop all offending entries past what we deem acceptable. The user cannot choose which snapshots are removed. This is fine because we have chosen the maximum snapshot table size to be so large (64 MB) that it cannot be reasonably reached. If the snapshot table exceeds this size, the image has probably been corrupted in some way; in this case, it is most important to just make the image usable such that the user can copy off at least the active layer. (Also note that the snapshots will be removed only with "-r all", so a plain "check" or "check -r leaks" will not delete any data.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-13-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:04 +01:00
Max Reitz	624143355c	qcow2: Keep track of the snapshot table length When repairing the snapshot table, we truncate entries that have too much extra data. This frees up space that we do not have to count towards the snapshot table size. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-12-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:02 +01:00
Max Reitz	f91f1f159b	qcow2: Fix broken snapshot table entries The only case where we currently reject snapshot table entries is when they have too much extra data. Fix them with qemu-img check -r all by counting it as a corruption, reducing their extra_data_size, and then letting qcow2_check_fix_snapshot_table() do the rest. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-11-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:02 +01:00
Max Reitz	fe446b5da2	qcow2: Add qcow2_check_fix_snapshot_table() qcow2_check_read_snapshot_table() can perform consistency checks, but it cannot fix everything. Specifically, it cannot allocate new clusters, because that should wait until the refcount structures are known to be consistent (i.e., after qcow2_check_refcounts()). Thus, it cannot call qcow2_write_snapshots(). Do that in qcow2_check_fix_snapshot_table(), which is called after qcow2_check_refcounts(). Currently, there is nothing that would set result->corruptions, so this is a no-op. A follow-up patch will change that. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-10-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:01 +01:00
Max Reitz	8bc584fe03	qcow2: Separate qcow2_check_read_snapshot_table() Reading the snapshot table can fail. That is a problem when we want to repair the image. Therefore, stop reading the snapshot table in qcow2_do_open() in check mode. Instead, add a new function qcow2_check_read_snapshot_table() that reads the snapshot table at a later point. In the future, we want to handle errors here and fix them. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-9-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:00 +01:00
Max Reitz	0a85af351d	qcow2: Write v3-compliant snapshot list on upgrade qcow2 v3 requires every snapshot table entry to have two extra data fields: The 64-bit VM state size, and the virtual disk size. Both are optional for v2 images, so they may not be present. qcow2_upgrade() therefore should update the snapshot table to ensure all entries have these extra data fields. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1727347 Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-8-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:53:52 +01:00
Max Reitz	722efb0c7c	qcow2: Put qcow2_upgrade() into its own function This does not make sense right now, but it will make sense once we need to do more than to just update s->qcow_version. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-7-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:53:20 +01:00
Max Reitz	e0314b56b2	qcow2: Make qcow2_write_snapshots() public Updating the snapshot list will be useful when upgrading a v2 image to v3, so we will need to call this function in qcow2.c. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-6-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:53:17 +01:00
Max Reitz	fcf9a6b728	qcow2: Keep unknown extra snapshot data The qcow2 specification says to ignore unknown extra data fields in snapshot table entries. Currently, we discard it whenever we update the image, which is a bit different from "ignore". This patch makes the qcow2 driver keep all unknown extra data fields when updating an image's snapshot table. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-5-mreitz@redhat.com [mreitz: Adjusted comments as proposed by Eric] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:52:57 +01:00
Max Reitz	ecf6c7c0c1	qcow2: Add Error ** to qcow2_read_snapshots() Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:51:09 +01:00
Max Reitz	d8fa8442ad	qcow2: Use endof() Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:51:08 +01:00
Max Reitz	f93c3add3a	mirror: Do not dereference invalid pointers mirror_exit_common() may be called twice (if it is called from mirror_prepare() and fails, it will be called from mirror_abort() again). In such a case, many of the pointers in the MirrorBlockJob object will already be freed. This can be seen most reliably for s->target, which is set to NULL (and then dereferenced by blk_bs()). Cc: qemu-stable@nongnu.org Fixes: `737efc1eda` Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191014153931.20699-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:49:37 +01:00
Maxim Levitsky	e87a09d625	block/nvme: add support for discard Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190913133627.28450-3-mlevitsk@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:34:35 +01:00
Maxim Levitsky	e0dd95e373	block/nvme: add support for write zeros Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190913133627.28450-2-mlevitsk@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:34:30 +01:00
Vladimir Sementsov-Ogievskiy	0e2402452f	block/block-copy: increase buffered copy request No reason to limit buffered copy to one cluster. Let's allow up to 1 MiB. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-7-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	7f739d0e53	block/block-copy: add memory limit Currently total allocation for parallel requests to block-copy instance is unlimited. Let's limit it to 128 MiB. For now block-copy is used only in backup, so actually we limit total allocation for backup job. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	e332a726da	block/block-copy: refactor copying Merge copying code into one function block_copy_do_copy, which only calls bdrv_ io functions and don't do any synchronization (like dirty bitmap set/reset). Refactor block_copy() function so that it takes full decision about size of chunk to be copied and does all the synchronization (checking intersecting requests, set/reset dirty bitmaps). It will help: - introduce parallel processing of block_copy iterations: we need to calculate chunk size, start async chunk copying and go to the next iteration - simplify synchronization improvement (like memory limiting in further commit and reducing critical section (now we lock the whole requested range, when actually we need to lock only dirty region which we handle at the moment)) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	b3b7036afb	block/block-copy: limit copy_range_size to 16 MiB Large copy range may imply memory allocation and large io effort, so using 2G copy range request may be bad idea. Let's limit it to 16 MiB. It also helps the following patch to refactor copy-with-offload fallback to copy-with-bounce-buffer. Note, that total memory usage of backup is still not limited, it will be fixed in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	3816edd2cb	block/block-copy: allocate buffer in block_copy_with_bounce_buffer Move bounce_buffer allocation block_copy_with_bounce_buffer. This commit simplifies further work on implementing copying by larger chunks (of different size) and further asynchronous handling of block_copy iterations (with help of block/aio_task API). Allocation works fast, a lot faster than disk io, so it's not a problem that we now allocate/free bounce_buffer more times. And we anyway will have to allocate several bounce_buffers for parallel execution of loop iterations in future. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	994b44ab20	Revert "mirror: Only mirror granularity-aligned chunks" This reverts commit `9adc1cb49a`. "mirror: Only mirror granularity-aligned chunks" Since previous commit unaligned chunks are supported by do_sync_target_write. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191011090711.19940-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:30 +01:00
Vladimir Sementsov-Ogievskiy	dbdf699cad	block/mirror: support unaligned write in active mirror Prior `9adc1cb49a` do_sync_target_write had a bug: it reset aligned-up region in the dirty bitmap, which means that we may not copy some bytes and assume them copied, which actually leads to producing corrupted target. So `9adc1cb49a` forced dirty bitmap granularity to be request_alignment for mirror-top filter, so we are not working with unaligned requests. However forcing large alignment obviously decreases performance of unaligned requests. This commit provides another solution for the problem: if unaligned padding is already dirty, we can safely ignore it, as 1. It's dirty, it will be copied by mirror_iteration anyway 2. It's dirty, so skipping it now we don't increase dirtiness of the bitmap and therefore don't damage "synchronicity" of the write-blocking mirror. If unaligned padding is not dirty, we just write it, no reason to touch dirty bitmap if we succeed (on failure we'll set the whole region ofcourse, but we loss "synchronicity" on failure anyway). Note: we need to disable dirty_bitmap, otherwise we will not be able to see in do_sync_target_write bitmap state before current operation. We may of course check dirty bitmap before the operation in bdrv_mirror_top_do_write and remember it, but we don't need active dirty bitmap for write-blocking mirror anyway. New code-path is unused until the following commit reverts `9adc1cb49a`. Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191011090711.19940-5-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:30 +01:00
Vladimir Sementsov-Ogievskiy	b30168647f	block/block-backend: add blk_co_pwritev_part Add blk write function with qiov_offset parameter. It's needed for the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191011090711.19940-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:30 +01:00
Vladimir Sementsov-Ogievskiy	5c511ac375	block/mirror: simplify do_sync_target_write do_sync_target_write is called from bdrv_mirror_top_do_write after write/discard operation, all inside active_write/active_write_settle protecting us from mirror iteration. So the whole area is dirty for sure, no reason to examine dirty bitmap. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191011090711.19940-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:30 +01:00
Wei Yang	038adc2f58	core: replace getpagesize() with qemu_real_host_page_size There are three page size in qemu: real host page size host page size target page size All of them have dedicate variable to represent. For the last two, we use the same form in the whole qemu project, while for the first one we use two forms: qemu_real_host_page_size and getpagesize(). qemu_real_host_page_size is defined to be a replacement of getpagesize(), so let it serve the role. [Note] Not fully tested for some arch or device. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-26 15:38:06 +02:00
Kevin Wolf	5e97855052	qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation() qcow2_detect_metadata_preallocation() calls qcow2_get_refcount() which requires s->lock to be taken to protect its accesses to the refcount table and refcount blocks. However, nothing in this code path actually took the lock. This could cause the same cache entry to be used by two requests at the same time, for different tables at different offsets, resulting in image corruption. As it would be preferable to base the detection on consistent data (even though it's just heuristics), let's take the lock not only around the qcow2_get_refcount() calls, but around the whole function. This patch takes the lock in qcow2_co_block_status() earlier and asserts in qcow2_detect_metadata_preallocation() that we hold the lock. Fixes: `69f47505ee` Cc: qemu-stable@nongnu.org Reported-by: Michael Weiser <michael.weiser@gmx.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Michael Weiser <michael.weiser@gmx.de> Reviewed-by: Michael Weiser <michael.weiser@gmx.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2019-10-25 15:18:55 +02:00
Vladimir Sementsov-Ogievskiy	8ccf458af5	block/backup: drop dead code from backup_job_create After commit `00e30f05de`, there is no more "goto error" points after job creation, so after "error:" @job is always NULL and we don't need roll-back job creation. Reported-by: Coverity (CID 1406402) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-25 15:15:01 +02:00
Vladimir Sementsov-Ogievskiy	f7651539d8	block/nbd: nbd reconnect Implement reconnect. To achieve this: 1. add new modes: connecting-wait: means, that reconnecting is in progress, and there were small number of reconnect attempts, so all requests are waiting for the connection. connecting-nowait: reconnecting is in progress, there were a lot of attempts of reconnect, all requests will return errors. two old modes are used too: connected: normal state quit: exiting after fatal error or on close Possible transitions are: * -> quit connecting-* -> connected connecting-wait -> connecting-nowait (transition is done after reconnect-delay seconds in connecting-wait mode) connected -> connecting-wait 2. Implement reconnect in connection_co. So, in connecting-* mode, connection_co, tries to reconnect unlimited times. 3. Retry nbd queries on channel error, if we are in connecting-wait state. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20191009084158.15614-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-10-22 09:22:07 -05:00
Vladimir Sementsov-Ogievskiy	4dd09f6223	qcow2-bitmap: move bitmap reopen-rw code to qcow2_reopen_commit The only reason I can imagine for this strange code at the very-end of bdrv_reopen_commit is the fact that bs->read_only updated after calling drv->bdrv_reopen_commit in bdrv_reopen_commit. And in the same time, prior to previous commit, qcow2_reopen_bitmaps_rw did a wrong check for being writable, when actually it only need writable file child not self. So, as it's fixed, let's move things to correct place. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Acked-by: Max Reitz <mreitz@redhat.com> Message-id: 20190927122355.7344-10-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:53:28 -04:00
Vladimir Sementsov-Ogievskiy	f6333cbf8b	block/qcow2-bitmap: fix and improve qcow2_reopen_bitmaps_rw - Correct check for write access to file child, and in correct place (only if we want to write). - Support reopen rw -> rw (which will be used in following commit), for example, !bdrv_dirty_bitmap_readonly() is not a corruption if bitmap is marked IN_USE in the image. - Consider unexpected bitmap as a corruption and check other combinations of in-image and in-RAM bitmaps. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190927122355.7344-9-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:53:28 -04:00
Vladimir Sementsov-Ogievskiy	644ddbb754	block/qcow2-bitmap: do not remove bitmaps on reopen-ro qcow2_reopen_bitmaps_ro wants to store bitmaps and then mark them all readonly. But the latter don't work, as qcow2_store_persistent_dirty_bitmaps removes bitmaps after storing. It's OK for inactivation but bad idea for reopen-ro. And this leads to the following bug: Assume we have persistent bitmap 'bitmap0'. Create external snapshot bitmap0 is stored and therefore removed Commit snapshot now we have no bitmaps Do some writes from guest () they are not marked in bitmap Shutdown Start bitmap0 is loaded as valid, but it is actually broken! It misses writes () Incremental backup it will be inconsistent So, let's stop removing bitmaps on reopen-ro. But don't rejoice: reopening bitmaps to rw is broken too, so the whole scenario will not work after this patch and we can't enable corresponding test cases in 260 iotests still. Reopening bitmaps rw will be fixed in the following patches. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190927122355.7344-7-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	bd429a884c	block/qcow2-bitmap: drop qcow2_reopen_bitmaps_rw_hint() The function is unused, drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190927122355.7344-6-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	f88676c149	block/qcow2-bitmap: get rid of bdrv_has_changed_persistent_bitmaps Firstly, no reason to optimize failure path. Then, function name is ambiguous: it checks for readonly and similar things, but someone may think that it will ignore normal bitmaps which was just unchanged, and this is in bad relation with the fact that we should drop IN_USE flag for unchanged bitmaps in the image. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190927122355.7344-5-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	ef9041a7b8	block/dirty-bitmap: refactor bdrv_dirty_bitmap_next bdrv_dirty_bitmap_next is always used in same pattern. So, split it into _next and _first, instead of combining two functions into one and add FOR_EACH_DIRTY_BITMAP macro. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-5-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	1e63830160	block/dirty-bitmap: drop BdrvDirtyBitmap.mutex mutex field is just a pointer to bs->dirty_bitmap_mutex, so no needs to store it in BdrvDirtyBitmap when we have bs pointer in it (since previous patch). Drop mutex field. Constantly use bdrv_dirty_bitmaps_lock/unlock in block/dirty-bitmap.c to make it more obvious that it's not per-bitmap lock. Still, for simplicity, leave bdrv_dirty_bitmap_lock/unlock functions as an external API. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-4-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	5deb6cbd1f	block/dirty-bitmap: add bs link Add bs field to BdrvDirtyBitmap structure. Drop BlockDriverState parameter from bitmap APIs where possible. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-3-vsementsov@virtuozzo.com [Rebased on top of block-copy. --js] Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	767db3aad8	block/dirty-bitmap: drop meta Drop meta bitmaps, as they are unused. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	d2c3080e41	block/qcow2: proper locking on bitmap add/remove paths qmp_block_dirty_bitmap_add and do_block_dirty_bitmap_remove do acquire aio context since `0a6c86d024`. But this is not enough: we also must lock qcow2 mutex when access in-image metadata. Especially it concerns freeing qcow2 clusters. To achieve this, move qcow2_can_store_new_dirty_bitmap and qcow2_remove_persistent_dirty_bitmap to coroutine context. Since we work in coroutines in correct aio context, we don't need context acquiring in blockdev.c anymore, drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190920082543.23444-4-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	b56a1e3175	block/dirty-bitmap: return int from bdrv_remove_persistent_dirty_bitmap It's more comfortable to not deal with local_err. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190920082543.23444-3-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	85cc8a4f6b	block: move bdrv_can_store_new_dirty_bitmap to block/dirty-bitmap.c block/dirty-bitmap.c seems to be more appropriate for it and bdrv_remove_persistent_dirty_bitmap already in it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190920082543.23444-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Max Reitz	d1b9d19f99	qcow2: Limit total allocation range to INT_MAX When the COW areas are included, the size of an allocation can exceed INT_MAX. This is kind of limited by handle_alloc() in that it already caps avail_bytes at INT_MAX, but the number of clusters still reflects the original length. This can have all sorts of effects, ranging from the storage layer write call failing to image corruption. (If there were no image corruption, then I suppose there would be data loss because the .cow_end area is forced to be empty, even though there might be something we need to COW.) Fix all of it by limiting nb_clusters so the equivalent number of bytes will not exceed INT_MAX. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Alberto Garcia	f2208fdc5b	block: Reject misaligned write requests with BDRV_REQ_NO_FALLBACK The BDRV_REQ_NO_FALLBACK flag means that an operation should only be performed if it can be offloaded or otherwise performed efficiently. However a misaligned write request requires a RMW so we should return an error and let the caller decide how to proceed. This hits an assertion since commit `c8bb23cbdb` if the required alignment is larger than the cluster size: qemu-img create -f qcow2 -o cluster_size=2k img.qcow2 4G qemu-io -c "open -o driver=qcow2,file.align=4k blkdebug::img.qcow2" \ -c 'write 0 512' qemu-io: block/io.c:1127: bdrv_driver_pwritev: Assertion `!(flags & BDRV_REQ_NO_FALLBACK)' failed. Aborted The reason is that when writing to an unallocated cluster we try to skip the copy-on-write part and zeroize it using BDRV_REQ_NO_FALLBACK instead, resulting in a write request that is too small (2KB cluster size vs 4KB required alignment). Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Pavel Dovgalyuk	e4ec5ad464	replay: add BH oneshot event for block layer Replay is capable of recording normal BH events, but sometimes there are single use callbacks scheduled with aio_bh_schedule_oneshot function. This patch enables recording and replaying such callbacks. Block layer uses these events for calling the completion function. Replaying these calls makes the execution deterministic. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Pavel Dovgalyuk	c8aa7895eb	replay: don't drain/flush bdrv queue while RR is working In record/replay mode bdrv queue is controlled by replay mechanism. It does not allow saving or loading the snapshots when bdrv queue is not empty. Stopping the VM is not blocked by nonempty queue, but flushing the queue is still impossible there, because it may cause deadlocks in replay mode. This patch disables bdrv_drain_all and bdrv_flush_all in record/replay mode. Stopping the machine when the IO requests are not finished is needed for the debugging. E.g., breakpoint may be set at the specified step, and forcing the IO requests to finish may break the determinism of the execution. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Pavel Dovgalyuk	3c6c4348f2	block: implement bdrv_snapshot_goto for blkreplay This patch enables making snapshots with blkreplay used in block devices. This function is required to make bdrv_snapshot_goto without calling .bdrv_open which is not implemented. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Peter Lieven	6caaad46de	block/vhdx: add check for truncated image files qemu is currently not able to detect truncated vhdx image files. Add a basic check if all allocated blocks are reachable at open and report all errors during bdrv_co_check. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Maxim Levitsky	e99754b42e	nbd: add empty .bdrv_reopen_prepare Fixes commit job / qemu-img commit, when commiting qcow2 file which is based on nbd export. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1718727 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190930213820.29777-2-mlevitsk@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	00e30f05de	block/backup: use backup-top instead of write notifiers Drop write notifiers and use filter node instead. = Changes = 1. Add filter-node-name argument for backup qmp api. We have to do it in this commit, as 257 needs to be fixed. 2. There are no more write notifiers here, so is_write_notifier parameter is dropped from block-copy paths. 3. To sync with in-flight requests at job finish we now have drained removing of the filter, we don't need rw-lock. 4. Block-copy is now using BdrvChildren instead of BlockBackends 5. As backup-top owns these children, we also move block-copy state into backup-top's ownership. = Iotest changes = 56: op-blocker doesn't shoot now, as we set it on source, but then check on filter, when trying to start second backup. To keep the test we instead can catch another collision: both jobs will get 'drive0' job-id, as job-id parameter is unspecified. To prevent interleaving with file-posix locks (as they are dependent on config) let's use another target for second backup. Also, it's obvious now that we'd like to drop this op-blocker at all and add a test-case for two backups from one node (to different destinations) actually works. But not in these series. 141: Output changed: prepatch, "Node is in use" comes from bdrv_has_blk check inside qmp_blockdev_del. But we've dropped block-copy blk objects, so no more blk objects on source bs (job blk is on backup-top filter bs). New message is from op-blocker, which is the next check in qmp_blockdev_add. 257: The test wants to emulate guest write during backup. They should go to filter node, not to original source node, of course. Therefore we need to specify filter node name and use it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-6-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	7df7868b96	block: introduce backup-top filter driver Backup-top filter caches write operations and does copy-before-write operations. The driver will be used in backup instead of write-notifiers. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-5-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	0f4b02b73e	block/block-copy: split block_copy_set_callbacks function Split block_copy_set_callbacks out of block_copy_state_new. It's needed for further commit: block-copy will use BdrvChildren of backup-top filter, so it will be created from backup-top filter creation function. But callbacks will still belong to backup job and will be set in separate. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-4-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	843670f30f	block/backup: move write_flags calculation inside backup_job_create This is logic-less refactoring, which simplifies further patch, as we'll need write_flags for backup-top filter creation and backup-top should be created before block job creation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-3-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	a6ffe1998c	block/backup: move in-flight requests handling from backup to block-copy Move synchronization mechanism to block-copy, to be able to use one block-copy instance from backup job and backup-top filter in parallel. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-2-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Anton Nefedov	d924559953	qapi: query-blockstat: add driver specific file-posix stats A block driver can provide a callback to report driver-specific statistics. file-posix driver now reports discard statistics Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20190923121737.83281-10-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Anton Nefedov	1c45036636	file-posix: account discard operations This will help to identify how many of the user-issued discard operations (accounted on a device level) have actually suceeded down on the host file (even though the numbers will not be exactly the same if non-raw format driver is used (e.g. qcow2 sending metadata discards)). Note that these numbers will not include discards triggered by write-zeroes + MAY_UNMAP calls. Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190923121737.83281-9-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Anton Nefedov	f344446654	block: add empty account cookie type Each block_acct_done/failed call is designed to correspond to a previous block_acct_start call, which initializes the stats cookie. However sometimes it is not the case, e.g. some error paths might report the same cookie twice because it is hard to accurately track if the cookie was reported yet or not. This patch cleans the cookie after report. (Note: block_acct_failed/done without a previous block_acct_start at all should be avoided. Uninitialized cookie might hold a garbage value and there is still "< BLOCK_MAX_IOTYPE" assertion for that) It will be particularly useful in ide code where it's hard to keep track whether the request done its accounting or not: in the following patch of the series, trim requests will do the accounting separately. Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190923121737.83281-4-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Anton Nefedov	159f85ddc8	qapi: add unmap to BlockDeviceStats Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20190923121737.83281-3-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	beb5f5450d	block: move block_copy from block/backup.c to separate file Split block_copy to separate file, to be cleanly shared with backup-top filter driver in further commits. It's a clean movement, the only change is drop "static" from interface functions. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190920142056.12778-8-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	0e23e382b7	block/backup: fix block-comment style We need to fix comment style around block-copy functions before further moving them to separate file to satisfy checkpatch. But do more: fix all comments style. Also, seems like doubled first asterisk is not forbidden, but drop it too for consistency. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190920142056.12778-7-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	2c8074c453	block/backup: introduce BlockCopyState Split copying code part from backup to "block-copy", including separate state structure and function renaming. This is needed to share it with backup-top filter driver in further commits. Notes: 1. As BlockCopyState keeps own BlockBackend objects, remaining job->common.blk users only use it to get bs by blk_bs() call, so clear job->commen.blk permissions set in block_job_create and add job->source_bs to be used instead of blk_bs(job->common.blk), to keep it more clear which bs we use when introduce backup-top filter in further commit. 2. Rename s/initializing_bitmap/skip_unallocated/ to sound a bit better as interface to BlockCopyState 3. Split is not very clean: there left some duplicated fields, backup code uses some BlockCopyState fields directly, let's postpone it for further improvements and keep this comment simpler for review. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190920142056.12778-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	372c67ea61	block/backup: improve comment about image fleecing Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190920142056.12778-5-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	0bd0c44372	block/backup: split shareable copying part from backup_do_cow Split copying logic which will be shared with backup-top filter. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190920142056.12778-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	1048ddf0a3	block/backup: fix backup_cow_with_offload for last cluster We shouldn't try to copy bytes beyond EOF. Fix it. Fixes: `9ded4a0114` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190920142056.12778-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	981fb5810a	block/backup: fix max_transfer handling for copy_range Of course, QEMU_ALIGN_UP is a typo, it should be QEMU_ALIGN_DOWN, as we are trying to find aligned size which satisfy both source and target. Also, don't ignore too small max_transfer. In this case seems safer to disable copy_range. Fixes: `9ded4a0114` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190920142056.12778-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	d710cf575a	block/qcow2: introduce parallel subrequest handling in read and write It improves performance for fragmented qcow2 images. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190916175324.18478-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	6aa7a2631b	block/qcow2: refactor qcow2_co_pwritev_part Similarly to previous commit, prepare for parallelizing write-loop iterations. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190916175324.18478-5-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	88f468e546	block/qcow2: refactor qcow2_co_preadv_part Further patch will run partial requests of iterations of qcow2_co_preadv in parallel for performance reasons. To prepare for this, separate part which may be parallelized into separate function (qcow2_co_preadv_task). While being here, also separate encrypted clusters reading to own function, like it is done for compressed reading. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190916175324.18478-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	6e9b225f73	block: introduce aio task pool Common interface for aio task loops. To be used for improving performance of synchronous io loops in qcow2, block-stream, copy-on-read, and may be other places. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190916175324.18478-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Max Reitz	8644476e51	block: Skip COR for inactive nodes We must not write data to inactive nodes, and a COR is certainly something we can simply not do without upsetting anyone. So skip COR operations on inactive nodes. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191001174827.11081-2-mreitz@redhat.com Message-Id: <20191001174827.11081-2-mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-10-08 14:28:25 +01:00
Kevin Wolf	05f4aced65	block/snapshot: Restrict set of snapshot nodes Nodes involved in internal snapshots were those that were returned by bdrv_next(), inserted and not read-only. bdrv_next() in turn returns all nodes that are either the root node of a BlockBackend or monitor-owned nodes. With the typical -drive use, this worked well enough. However, in the typical -blockdev case, the user defines one node per option, making all nodes monitor-owned nodes. This includes protocol nodes etc. which often are not snapshottable, so "savevm" only returns an error. Change the conditions so that internal snapshot still include all nodes that have a BlockBackend attached (we definitely want to snapshot anything attached to a guest device and probably also the built-in NBD server; snapshotting block job BlockBackends is more of an accident, but a preexisting one), but other monitor-owned nodes are only included if they have no parents. This makes internal snapshots usable again with typical -blockdev configurations. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Peter Krempa <pkrempa@redhat.com>	2019-10-04 11:52:40 +02:00
Philippe Mathieu-Daudé	31e404151b	cutils: Move size_to_str() from "qemu-common.h" to "qemu/cutils.h" "qemu/cutils.h" contains various qemu_strtosz_*() functions useful to convert strings to size. It seems natural to have the opposite usage (from size to string) there too. The function definition is already in util/cutils.c. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20190903120555.7551-1-philmd@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2019-09-19 11:57:34 +02:00
Maxim Levitsky	603fbd076c	block/qcow2: refactor encryption code * Change the qcow2_co_{encrypt\|decrypt} to just receive full host and guest offsets and use this function directly instead of calling do_perform_cow_encrypt (which is removed by that patch). * Adjust qcow2_co_encdec to take full host and guest offsets as well. * Document the qcow2_co_{encrypt\|decrypt} arguments to prevent the bug fixed in former commit from hopefully happening again. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190915203655.21638-3-mlevitsk@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [mreitz: Let perform_cow() return the error value returned by qcow2_co_encrypt(), as proposed by Vladimir] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:36:22 +02:00
Maxim Levitsky	38e7d54bdc	block/qcow2: Fix corruption introduced by commit `8ac0f15f33` This fixes subtle corruption introduced by luks threaded encryption in commit `8ac0f15f33` Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1745922 The corruption happens when we do a write that * writes to two or more unallocated clusters at once * doesn't fully cover the first sector * doesn't fully cover the last sector * uses luks encryption In this case, when allocating the new clusters we COW both areas prior to the write and after the write, and we encrypt them. The above mentioned commit accidentally made it so we encrypt the second COW area using the physical cluster offset of the first area. The problem is that offset_in_cluster in do_perform_cow_encrypt can be larger that the cluster size, thus cluster_offset will no longer point to the start of the cluster at which encrypted area starts. Next patch in this series will refactor the code to avoid all these assumptions. In the bugreport that was triggered by rebasing a luks image to new, zero filled base, which lot of such writes, and causes some files with zero areas to contain garbage there instead. But as described above it can happen elsewhere as well Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190915203655.21638-2-mlevitsk@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:35:02 +02:00
Max Reitz	c34dc07f9f	curl: Check curl_multi_add_handle()'s return code If we had done that all along, debugging would have been much simpler. (Also, I/O errors are better than hangs.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-8-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:12 +02:00
Max Reitz	bfb23b480a	curl: Handle success in multi_check_completion Background: As of cURL 7.59.0, it verifies that several functions are not called from within a callback. Among these functions is curl_multi_add_handle(). curl_read_cb() is a callback from cURL and not a coroutine. Waking up acb->co will lead to entering it then and there, which means the current request will settle and the caller (if it runs in the same coroutine) may then issue the next request. In such a case, we will enter curl_setup_preadv() effectively from within curl_read_cb(). Calling curl_multi_add_handle() will then fail and the new request will not be processed. Fix this by not letting curl_read_cb() wake up acb->co. Instead, leave the whole business of settling the AIOCB objects to curl_multi_check_completion() (which is called from our timer callback and our FD handler, so not from any cURL callbacks). Reported-by: Natalie Gavrielov <ngavrilo@redhat.com> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1740193 Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-7-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Max Reitz	9abaf9fc47	curl: Report only ready sockets Instead of reporting all sockets to cURL, only report the one that has caused curl_multi_do_locked() to be called. This lets us get rid of the QLIST_FOREACH_SAFE() list, which was actually wrong: SAFE foreaches are only safe when the current element is removed in each iteration. If it possible for the list to be concurrently modified, we cannot guarantee that only the current element will be removed. Therefore, we must not use QLIST_FOREACH_SAFE() here. Fixes: `ff5ca1664a` Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-6-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Max Reitz	9dbad87d25	curl: Pass CURLSocket to curl_multi_do() curl_multi_do_locked() currently marks all sockets as ready. That is not only inefficient, but in fact unsafe (the loop is). A follow-up patch will change that, but to do so, curl_multi_do_locked() needs to know exactly which socket is ready; and that is accomplished by this patch here. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Max Reitz	948403bcb1	curl: Check completion in curl_multi_do() While it is more likely that transfers complete after some file descriptor has data ready to read, we probably should not rely on it. Better be safe than sorry and call curl_multi_check_completion() in curl_multi_do(), too, just like it is done in curl_multi_read(). With this change, curl_multi_do() and curl_multi_read() are actually the same, so drop curl_multi_read() and use curl_multi_do() as the sole FD handler. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-4-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Max Reitz	007f339b10	curl: Keep *socket until the end of curl_sock_cb() This does not really change anything, but it makes the code a bit easier to follow once we use @socket as the opaque pointer for aio_set_fd_handler(). Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190910124136.10565-3-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Max Reitz	0487861685	curl: Keep pointer to the CURLState in CURLSocket A follow-up patch will make curl_multi_do() and curl_multi_read() take a CURLSocket instead of the CURLState. They still need the latter, though, so add a pointer to it to the former. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190910124136.10565-2-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 15:31:11 +02:00
Nir Soffer	1bbbf32d5f	block: Use QEMU_IS_ALIGNED Replace instances of: (n & (BDRV_SECTOR_SIZE - 1)) == 0 And: (n & ~BDRV_SECTOR_MASK) == 0 With: QEMU_IS_ALIGNED(n, BDRV_SECTOR_SIZE) Which reveals the intent of the code better, and makes it easier to locate the code checking alignment. Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-id: 20190827185913.27427-2-nsoffer@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 14:48:30 +02:00
Alberto Garcia	bf3d78ae55	qcow2: Stop overwriting compressed clusters one by one handle_alloc() tries to find as many contiguous clusters that need copy-on-write as possible in order to allocate all of them at the same time. However, compressed clusters are only overwritten one by one, so let's say that we have an image with 1024 consecutive compressed clusters: qemu-img create -f qcow2 hd.qcow2 64M for f in `seq 0 64 65472`; do qemu-io -c "write -c ${f}k 64k" hd.qcow2 done In this case trying to overwrite the whole image with one large write request results in 1024 separate allocations: qemu-io -c "write 0 64M" hd.qcow2 This restriction comes from commit `095a9c58ce` from 2008. Nowadays QEMU can overwrite multiple compressed clusters just fine, and in fact it already does: as long as the first cluster that handle_alloc() finds is not compressed, all other compressed clusters in the same batch will be overwritten in one go: qemu-img create -f qcow2 hd.qcow2 64M qemu-io -c "write -z 0 64k" hd.qcow2 for f in `seq 64 64 65472`; do qemu-io -c "write -c ${f}k 64k" hd.qcow2 done Compared to the previous one, overwriting this image on my computer goes from 8.35s down to 230ms. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: John Snow <jsnow@redhat.com Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-13 12:18:37 +02:00
Philippe Mathieu-Daudé	d90d5cae2b	block/create: Do not abort if a block driver is not available The 'blockdev-create' QMP command was introduced as experimental feature in commit `b0292b851b`, using the assert() debug call. It got promoted to 'stable' command in `3fb588a0f2`, but the assert call was not removed. Some block drivers are optional, and bdrv_find_format() might return a NULL value, triggering the assertion. Stable code is not expected to abort, so return an error instead. This is easily reproducible when libnfs is not installed: ./configure [...] module support no Block whitelist (rw) Block whitelist (ro) libiscsi support yes libnfs support no [...] Start QEMU: $ qemu-system-x86_64 -S -qmp unix:/tmp/qemu.qmp,server,nowait Send the 'blockdev-create' with the 'nfs' driver: $ ( cat << 'EOF' {'execute': 'qmp_capabilities'} {'execute': 'blockdev-create', 'arguments': {'job-id': 'x', 'options': {'size': 0, 'driver': 'nfs', 'location': {'path': '/', 'server': {'host': '::1', 'type': 'inet'}}}}, 'id': 'x'} EOF ) \| socat STDIO UNIX:/tmp/qemu.qmp {"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 4}, "package": "v4.1.0-733-g89ea03a7dc"}, "capabilities": ["oob"]}} {"return": {}} QEMU crashes: $ gdb qemu-system-x86_64 core Program received signal SIGSEGV, Segmentation fault. (gdb) bt #0 0x00007ffff510957f in raise () at /lib64/libc.so.6 #1 0x00007ffff50f3895 in abort () at /lib64/libc.so.6 #2 0x00007ffff50f3769 in _nl_load_domain.cold.0 () at /lib64/libc.so.6 #3 0x00007ffff5101a26 in .annobin_assert.c_end () at /lib64/libc.so.6 #4 0x0000555555d7e1f1 in qmp_blockdev_create (job_id=0x555556baee40 "x", options=0x555557666610, errp=0x7fffffffc770) at block/create.c:69 #5 0x0000555555c96b52 in qmp_marshal_blockdev_create (args=0x7fffdc003830, ret=0x7fffffffc7f8, errp=0x7fffffffc7f0) at qapi/qapi-commands-block-core.c:1314 #6 0x0000555555deb0a0 in do_qmp_dispatch (cmds=0x55555645de70 <qmp_commands>, request=0x7fffdc005c70, allow_oob=false, errp=0x7fffffffc898) at qapi/qmp-dispatch.c:131 #7 0x0000555555deb2a1 in qmp_dispatch (cmds=0x55555645de70 <qmp_commands>, request=0x7fffdc005c70, allow_oob=false) at qapi/qmp-dispatch.c:174 With this patch applied, QEMU returns a QMP error: {'execute': 'blockdev-create', 'arguments': {'job-id': 'x', 'options': {'size': 0, 'driver': 'nfs', 'location': {'path': '/', 'server': {'host': '::1', 'type': 'inet'}}}}, 'id': 'x'} {"id": "x", "error": {"class": "GenericError", "desc": "Block driver 'nfs' not found or not supported"}} Cc: qemu-stable@nongnu.org Reported-by: Xu Tian <xutian@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-13 12:18:37 +02:00
Peter Lieven	d2c6becbe0	block/nfs: add support for nfs_umount libnfs recently added support for unmounting. Add support in Qemu too. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-13 12:18:15 +02:00
Peter Lieven	601dc65597	block/nfs: tear down aio before nfs_close nfs_close is a sync call from libnfs and has its own event handler polling on the nfs FD. Avoid that both QEMU and libnfs are intefering here. CC: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-13 12:18:14 +02:00
Max Reitz	1a37e31244	vpc: Return 0 from vpc_co_create() on success blockdev_create_run() directly uses .bdrv_co_create()'s return value as the job's return value. Jobs must return 0 on success, not just any nonnegative value. Therefore, using blockdev-create for VPC images may currently fail as the vpc driver may return a positive integer. Because there is no point in returning a positive integer anywhere in the block layer (all non-negative integers are generally treated as complete success), we probably do not want to add more such cases. Therefore, fix this problem by making the vpc driver always return 0 in case of success. Suggested-by: Kevin Wolf <kwolf@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-10 08:58:43 +02:00
Kevin Wolf	effecce6bc	file-posix: Fix has_write_zeroes after NO_FALLBACK If QEMU_AIO_NO_FALLBACK is given, we always return failure and don't even try to use the BLKZEROOUT ioctl. In this failure case, we shouldn't disable has_write_zeroes because we didn't learn anything about the ioctl. The next request might not set QEMU_AIO_NO_FALLBACK and we can still use the ioctl then. Fixes: `738301e117` Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-09-10 08:58:43 +02:00
Max Reitz	b2c6f23f4a	block/file-posix: Reduce xfsctl() use This patch removes xfs_write_zeroes() and xfs_discard(). Both functions have been added just before the same feature was present through fallocate(): - fallocate() has supported PUNCH_HOLE for XFS since Linux 2.6.38 (March 2011); xfs_discard() was added in December 2010. - fallocate() has supported ZERO_RANGE for XFS since Linux 3.15 (June 2014); xfs_write_zeroes() was added in November 2013. Nowadays, all systems that qemu runs on should support both fallocate() features (RHEL 7's kernel does). xfsctl() is still useful for getting the request alignment for O_DIRECT, so this patch does not remove our dependency on it completely. Note that xfs_write_zeroes() had a bug: It calls ftruncate() when the file is shorter than the specified range (because ZERO_RANGE does not increase the file length). ftruncate() may yield and then discard data that parallel write requests have written past the EOF in the meantime. Dropping the function altogether fixes the bug. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: `50ba5b2d99` Reported-by: Lukáš Doktor <ldoktor@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Tested-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-10 08:58:43 +02:00
Vladimir Sementsov-Ogievskiy	bb0c940993	job: drop job_drain In job_finish_sync job_enter should be enough for a job to make some progress and draining is a wrong tool for it. So use job_enter directly here and drop job_drain with all related staff not used more. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: John Snow <jsnow@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-10 08:58:43 +02:00
Alberto Garcia	b70d08205b	qcow2: Fix the calculation of the maximum L2 cache size The size of the qcow2 L2 cache defaults to 32 MB, which can be easily larger than the maximum amount of L2 metadata that the image can have. For example: with 64 KB clusters the user would need a qcow2 image with a virtual size of 256 GB in order to have 32 MB of L2 metadata. Because of that, since commit `b749562d98` we forbid the L2 cache to become larger than the maximum amount of L2 metadata for the image, calculated using this formula: uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8); The problem with this formula is that the result should be rounded up to the cluster size because an L2 table on disk always takes one full cluster. For example, a 1280 MB qcow2 image with 64 KB clusters needs exactly 160 KB of L2 metadata, but we need 192 KB on disk (3 clusters) even if the last 32 KB of those are not going to be used. However QEMU rounds the numbers down and only creates 2 cache tables (128 KB), which is not enough for the image. A quick test doing 4KB random writes on a 1280 MB image gives me around 500 IOPS, while with the correct cache size I get 16K IOPS. Cc: qemu-stable@nongnu.org Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-09-10 08:58:43 +02:00
Eric Blake	f061656cc3	nbd: Implement client use of NBD FAST_ZERO The client side is fairly straightforward: if the server advertised fast zero support, then we can map that to BDRV_REQ_NO_FALLBACK support. A server that advertises FAST_ZERO but not WRITE_ZEROES is technically broken, but we can ignore that situation as it does not change our behavior. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190823143726.27062-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-09-05 16:03:26 -05:00
Andrey Shinkevich	294682cc3a	block: workaround for unaligned byte range in fallocate() Revert the commit `118f99442d` 'block/io.c: fix for the allocation failure' and use better error handling for file systems that do not support fallocate() for an unaligned byte range. Allow falling back to pwrite in case fallocate() returns EINVAL. Suggested-by: Kevin Wolf <kwolf@redhat.com> Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-Id: <1566913973-15490-1-git-send-email-andrey.shinkevich@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-09-05 16:01:31 -05:00
Eric Blake	df18c04edf	nbd: Use g_autofree in a few places Thanks to our recent move to use glib's g_autofree, I can join the bandwagon. Getting rid of gotos is fun ;) There are probably more places where we could register cleanup functions and get rid of more gotos; this patch just focuses on the labels that existed merely to call g_free. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190824172813.29720-2-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-09-05 15:52:45 -05:00
Stefan Hajnoczi	236094c738	file-posix: fix request_alignment typo Fixes: `a6b257a08e` ("file-posix: Handle undetectable alignment") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190827101328.4062-1-stefanha@redhat.com Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-03 14:55:35 +02:00
Max Reitz	bedb8bb419	vmdk: Reject invalid compressed writes Compressed writes generally have to write full clusters, not just in theory but also in practice when it comes to vmdk's streamOptimized subformat. It currently is just silently broken for writes with non-zero in-cluster offsets: $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M $ qemu-io -c 'write 4k 4k' -c 'read 4k 4k' foo.vmdk wrote 4096/4096 bytes at offset 4096 4 KiB, 1 ops; 00.01 sec (443.724 KiB/sec and 110.9309 ops/sec) read failed: Invalid argument (The technical reason is that vmdk_write_extent() just writes the incomplete compressed data actually to offset 4k. When reading the data, vmdk_read_extent() looks at offset 0 and finds the compressed data size to be 0, because that is what it reads from there. This yields an error.) For incomplete writes with zero in-cluster offsets, the error path when reading the rest of the cluster is a bit different, but the result is the same: $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M $ qemu-io -c 'write 0k 4k' -c 'read 4k 4k' foo.vmdk wrote 4096/4096 bytes at offset 0 4 KiB, 1 ops; 00.01 sec (362.641 KiB/sec and 90.6603 ops/sec) read failed: Invalid argument (Here, vmdk_read_extent() finds the data and then sees that the uncompressed data is short.) It is better to reject invalid writes than to make the user believe they might have succeeded and then fail when trying to read it back. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190815153638.4600-5-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-03 14:55:35 +02:00
Max Reitz	cdc0dd2586	vmdk: Use bdrv_dirname() for relative extent paths This makes iotest 033 pass with e.g. subformat=monolithicFlat. It also turns a former error in 059 into success. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190815153638.4600-3-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-03 14:55:35 +02:00
Nir Soffer	3a20013fbb	block: posix: Always allocate the first block When creating an image with preallocation "off" or "falloc", the first block of the image is typically not allocated. When using Gluster storage backed by XFS filesystem, reading this block using direct I/O succeeds regardless of request length, fooling alignment detection. In this case we fallback to a safe value (4096) instead of the optimal value (512), which may lead to unneeded data copying when aligning requests. Allocating the first block avoids the fallback. Since we allocate the first block even with preallocation=off, we no longer create images with zero disk size: $ ./qemu-img create -f raw test.raw 1g Formatting 'test.raw', fmt=raw size=1073741824 $ ls -lhs test.raw 4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw And converting the image requires additional cluster: $ ./qemu-img measure -f raw -O qcow2 test.raw required size: 458752 fully allocated size: 1074135040 When using format like vmdk with multiple files per image, we allocate one block per file: $ ./qemu-img create -f vmdk -o subformat=twoGbMaxExtentFlat test.vmdk 4g Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined subformat=twoGbMaxExtentFlat $ ls -lhs test*.vmdk 4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f001.vmdk 4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f002.vmdk 4.0K -rw-r--r--. 1 nsoffer nsoffer 353 Aug 27 03:23 test.vmdk I did quick performance test for copying disks with qemu-img convert to new raw target image to Gluster storage with sector size of 512 bytes: for i in $(seq 10); do rm -f dst.raw sleep 10 time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw done Here is a table comparing the total time spent: Type Before(s) After(s) Diff(%) --------------------------------------- real 530.028 469.123 -11.4 user 17.204 10.768 -37.4 sys 17.881 7.011 -60.7 We can see very clear improvement in CPU usage. Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-id: 20190827010528.8818-2-nsoffer@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-03 14:55:35 +02:00
Vladimir Sementsov-Ogievskiy	5396234b96	block/qcow2: implement .bdrv_co_pwritev(_compressed)_part Implement and use new interface to get rid of hd_qiov. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-13-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-13-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	df893d25ce	block/qcow2: implement .bdrv_co_preadv_part Implement and use new interface to get rid of hd_qiov. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-12-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-12-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	00721a3529	block/qcow2: refactor qcow2_co_preadv to use buffer-based io Use buffer based io in encrypted case. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-11-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-11-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	1acc3466a2	block/io: introduce bdrv_co_p{read, write}v_part Introduce extended variants of bdrv_co_preadv and bdrv_co_pwritev with qiov_offset parameter. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-10-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-10-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	28c4da2869	block/io: bdrv_aligned_pwritev: use and support qiov_offset Use and support new API in bdrv_aligned_pwritev. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-9-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-9-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	65cd4424b9	block/io: bdrv_aligned_preadv: use and support qiov_offset Use and support new API in bdrv_co_do_copy_on_readv. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-8-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-8-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	2275cc90a1	block/io: bdrv_co_do_copy_on_readv: lazy allocation Allocate bounce_buffer only if it is really needed. Also, sub-optimize allocation size (why not?). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-7-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-7-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	1143ec5ebf	block/io: bdrv_co_do_copy_on_readv: use and support qiov_offset Use and support new API in bdrv_co_do_copy_on_readv. Note that in case of allocated-in-top we need to shrink read size to MIN(..) by hand, as pre-patch this was actually done implicitly by qemu_iovec_concat (and we used local_qiov.size). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-6-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-6-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	ac850bf099	block: define .*_part io handlers in BlockDriver Add handlers supporting qiov_offset parameter: bdrv_co_preadv_part bdrv_co_pwritev_part bdrv_co_pwritev_compressed_part This is used to reduce need of defining local_qiovs and hd_qiovs in all corners of block layer code. The following patches will increase usage of this new API part by part. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-5-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-5-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy	7a3f542fbd	block/io: refactor padding We have similar padding code in bdrv_co_pwritev, bdrv_co_do_pwrite_zeroes and bdrv_co_preadv. Let's combine and unify it. [Squashed in Vladimir's qemu-iotests 077 fix --Stefan] Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-4-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-4-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:58:12 +01:00
Vladimir Sementsov-Ogievskiy	f76889e7b9	util/iov: improve qemu_iovec_is_zero We'll need to check a part of qiov soon, so implement it now. Optimization with align down to 4 * sizeof(long) is dropped due to: 1. It is strange: it aligns length of the buffer, but where is a guarantee that buffer pointer is aligned itself? 2. buffer_is_zero() is a better place for optimizations and it has them. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190604161514.262241-3-vsementsov@virtuozzo.com Message-Id: <20190604161514.262241-3-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-08-27 14:52:45 +01:00
Max Reitz	fbc8e1b7e4	vpc: Do not return RAW from block_status vpc is not really a passthrough driver, even when using the fixed subformat (where host and guest offsets are equal). It should handle preallocation like all other drivers do, namely by returning DATA \| RECURSE instead of RAW. There is no tangible difference but the fact that bdrv_is_allocated() no longer falls through to the protocol layer. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190725155512.9827-4-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	4dd84ac9a7	vmdk: Make block_status recurse for flat extents Fixes: `69f47505ee` Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190725155512.9827-3-mreitz@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	ad6434dc62	vdi: Make block_status recurse for fixed images Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Fixes: `69f47505ee` Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190725155512.9827-2-mreitz@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	9956688a8f	vhdx: Fix .bdrv_has_zero_init() Fixed VHDX images cannot guarantee to be zero-initialized. If the image has the "fixed" subformat, forward the call to the underlying storage node. Reported-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190724171239.8764-9-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	0a28bf2826	vdi: Fix .bdrv_has_zero_init() Static VDI images cannot guarantee to be zero-initialized. If the image has been statically allocated, forward the call to the underlying storage node. Reported-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20190724171239.8764-8-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	38841dcd27	qcow2: Fix .bdrv_has_zero_init() If a qcow2 file is preallocated, it can no longer guarantee that it initially appears as filled with zeroes. So implement .bdrv_has_zero_init() by checking whether the file is preallocated; if so, forward the call to the underlying storage node, except for when it is encrypted: Encrypted preallocated images always return effectively random data, so .bdrv_has_zero_init() must always return 0 for them. .bdrv_has_zero_init_truncate() can remain bdrv_has_zero_init_1(), because it presupposes PREALLOC_MODE_OFF. Reported-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190724171239.8764-7-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	b647d69adc	block: Use bdrv_has_zero_init_truncate() vhdx and parallels call bdrv_has_zero_init() when they do not really care about an image's post-create state but only about what happens when you grow an image. That is a bit ugly, and also overly safe when growing preallocated images without preallocating the new areas. Let them use bdrv_has_zero_init_truncate() instead. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190724171239.8764-6-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> [mreitz: Added commit message] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	1dcaf52760	block: Implement .bdrv_has_zero_init_truncate() We need to implement .bdrv_has_zero_init_truncate() for every block driver that supports truncation and has a .bdrv_has_zero_init() implementation. Implement it the same way each driver implements .bdrv_has_zero_init(). This is at least not any more unsafe than what we had before. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190724171239.8764-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Max Reitz	cdf3bc934a	mirror: Fix bdrv_has_zero_init() use bdrv_has_zero_init() only has meaning for newly created images or image areas. If the mirror job itself did not create the image, it cannot rely on bdrv_has_zero_init()'s result to carry any meaning. This is the case for drive-mirror with mode=existing and always for blockdev-mirror. Note that we only have to zero-initialize the target with sync=full, because other modes actually do not promise that the target will contain the same data as the source after the job -- sync=top only promises to copy anything allocated in the top layer, and sync=none will only copy new I/O. (Which is how mirror has always handled it.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190724171239.8764-3-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Maxim Levitsky	672de729a1	LUKS: support preallocation preallocation=off and preallocation=metadata both allocate luks header only, and preallocation=falloc/full is passed to underlying file. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190716161901.1430-1-mlevitsk@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Peter Maydell	3fbd3405d2	- Run the iotest during "make check" -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAl1XvtURHHRodXRoQHJl ZGhhdC5jb20ACgkQLtnXdP5wLbUqqRAAqfjPRPtXSZxaxl63O5EADugLY72s+04i Zc5MC4Ivwj0x1WA6JIG77kz6xmjIOizRIkF1jkyEkG+AgIrjw3rdYhCD4Iav0n/v nqltkOaf1FzdQYCHUTn0WUYn7Df2bSkjSTPhnqbCaGq5WjXzgzi9jhCFlZpo374J 9yWk74nt3QlOUjLw6+cm0HxEf9IlRQdPwJNYJwsrYHgspJwcwAYJ7xaL34huoxkJ 10fA9q6QK1bh67nZpAJOte3wQ8r35cUT4ZaIiyO0MFMrEiLp4/1gKYpkZwWq0+iV 25rWVjogzRjp+LejMAltY9MmUekCl5ZzVVuhdt2jGPbNanzdHxHydYFUELP6WmrM zAyYQkDvG7JiY2F0M9rKJZOnpO2pYF2hxc/nqD04qF3HD0zG1eoIg056UlKcc5b/ kIgR2srlj0aHWhGJ2/DV3w5ZowjJGKBAYYxQdEJmiuLpGdOimSoWXbjacDEd4mUf DevXv6k9cAvexZxU4cOUzpip4U3MGC0rJ1BNIgTs6eIeKq3geROTpuHjJFSBHZiP H/MqtoT4xTXJYomc9MiZG90fG9KyywlEF6e0GjVIcadJEmFIbJ+DfrAknVRjuaij ThMSIuvMEpXFhyghlApURePNi8W3FIIYHISw0JE/u7+4/7L/iYDToYeM/o563+8O zbj0n9fSewI= =oVZx -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2019-08-17' into staging - Run the iotest during "make check" # gpg: Signature made Sat 17 Aug 2019 09:46:13 BST # gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5 # gpg: issuer "thuth@redhat.com" # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full] # gpg: aka "Thomas Huth <thuth@redhat.com>" [full] # gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full] # gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown] # Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5 * remotes/huth-gitlab/tags/pull-request-2019-08-17: gitlab-ci: Remove qcow2 tests that are handled by "make check" already tests: Run the iotests during "make check" again block: fix NetBSD qemu-iotests failure Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-08-19 14:14:09 +01:00
Paolo Bonzini	f6fc1e30cf	block: fix NetBSD qemu-iotests failure Opening a block device on NetBSD has an additional step compared to other OSes, corresponding to raw_normalize_devicepath. The error message in that function is slightly different from that in raw_open_common and this was causing spurious failures in qemu-iotests. However, in general it is not important to know what exact step was failing, for example in the qemu-iotests case the error message contains the fairly unequivocal "No such file or directory" text from strerror. We can thus fix the failures by standardizing on a single error message for both raw_open_common and raw_normalize_devicepath; in fact, we can even use error_setg_file_open to make sure the error message is the same as in the rest of QEMU. Message-Id: <20190725095920.28419-1-pbonzini@redhat.com> Tested-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2019-08-17 09:02:59 +02:00
Vladimir Sementsov-Ogievskiy	a1ed82b443	block/backup: refactor write_flags write flags are constant, let's store it in BackupBlockJob instead of recalculating. It also makes two boolean fields to be unused, so, drop them. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190730163251.755248-4-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 18:29:43 -04:00
Vladimir Sementsov-Ogievskiy	319bd5edb9	block/backup: deal with zero detection We have detect_zeroes option, so at least for blockdev-backup user should define it if zero-detection is needed. For drive-backup leave detection enabled by default but do it through existing option instead of open-coding. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190730163251.755248-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 18:29:43 -04:00
Vladimir Sementsov-Ogievskiy	590a63d598	qapi: add dirty-bitmaps to query-named-block-nodes result Let's add a possibility to query dirty-bitmaps not only on root nodes. It is useful when dealing both with snapshots and incremental backups. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 20190717173937.18747-1-jsnow@redhat.com [Added deprecation information. --js] Signed-off-by: John Snow <jsnow@redhat.com> [Fixed spelling --js]	2019-08-16 18:29:43 -04:00
John Snow	1a2b8b406b	block/backup: support bitmap sync modes for non-bitmap backups Accept bitmaps and sync policies for the other backup modes. This allows us to do things like create a bitmap synced to a full backup without a transaction, or start a resumable backup process. Some combinations don't make sense, though: - NEVER policy combined with any non-BITMAP mode doesn't do anything, because the bitmap isn't used for input or output. It's harmless, but is almost certainly never what the user wanted. - sync=NONE is more questionable. It can't use on-success because this job never completes with success anyway, and the resulting artifact of 'always' is suspect: because we start with a full bitmap and only copy out segments that get written to, the final output bitmap will always be ... a fully set bitmap. Maybe there's contexts in which bitmaps make sense for sync=none, but not without more severe changes to the current job, and omitting it here doesn't prevent us from adding it later. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190716000117.25219-11-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 18:29:43 -04:00
John Snow	7e30dd618e	block/backup: teach TOP to never copy unallocated regions Presently, If sync=TOP is selected, we mark the entire bitmap as dirty. In the write notifier handler, we dutifully copy out such regions. Fix this in three parts: 1. Mark the bitmap as being initialized before the first yield. 2. After the first yield but before the backup loop, interrogate the allocation status asynchronously and initialize the bitmap. 3. Teach the write notifier to interrogate allocation status if it is invoked during bitmap initialization. As an effect of this patch, the job progress for TOP backups now behaves like this: - total progress starts at bdrv_length. - As allocation status is interrogated, total progress decreases. - As blocks are copied, current progress increases. Taken together, the floor and ceiling move to meet each other. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 20190716000117.25219-10-jsnow@redhat.com [Remove ret = -ECANCELED change. --js] [Squash in conflict resolution based on Max's patch --js] Message-id: c8b0ab36-79c8-0b4b-3193-4e12ed8c848b@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:03 -04:00
John Snow	dba8700f16	block/backup: add backup_is_cluster_allocated Modify bdrv_is_unallocated_range to utilize the pnum return from bdrv_is_allocated, and in the process change the semantics from "is unallocated" to "is allocated." Optionally returns a full number of clusters that share the same allocation status. This will be used to carefully toggle bits in the bitmap for sync=top initialization in the following commits. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190716000117.25219-9-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:03 -04:00
John Snow	141cdcdf84	block/backup: centralize copy_bitmap initialization Just a few housekeeping changes that keeps the following commit easier to read; perform the initial copy_bitmap initialization in one place. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190716000117.25219-8-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:03 -04:00
John Snow	0fff1f1371	block/backup: improve sync=bitmap work estimates When making backups based on bitmaps, the work estimate can be more accurate. Update iotests to reflect the new strategy. TOP work estimates are broken, but do not get worse with this commit. That issue is addressed in the following commits instead. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190716000117.25219-7-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:03 -04:00
John Snow	a6c9365ad4	block/backup: hoist bitmap check into QMP interface This is nicer to do in the unified QMP interface that we have now, because it lets us use the right terminology back at the user. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190716000117.25219-5-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:03 -04:00
John Snow	c4e4b0fa59	qapi: implement block-dirty-bitmap-remove transaction action It is used to do transactional movement of the bitmap (which is possible in conjunction with merge command). Transactional bitmap movement is needed in scenarios with external snapshot, when we don't want to leave copy of the bitmap in the base image. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190708220502.12977-3-jsnow@redhat.com [Edited "since" version to 4.2 --js] Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:03 -04:00
John Snow	b30ffbef53	block/backup: loosen restriction on readonly bitmaps With the "never" sync policy, we actually can utilize readonly bitmaps now. Loosen the check at the QMP level, and tighten it based on provided arguments down at the job creation level instead. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-19-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
John Snow	c23909e530	block/backup: add 'always' bitmap sync policy This adds an "always" policy for bitmap synchronization. Regardless of if the job succeeds or fails, the bitmap is always synchronized. This means that for backups that fail part-way through, the bitmap retains a record of which sectors need to be copied out to accomplish a new backup using the old, partial result. In effect, this allows us to "resume" a failed backup; however the new backup will be from the new point in time, so it isn't a "resume" as much as it is an "incremental retry." This can be useful in the case of extremely large backups that fail considerably through the operation and we'd like to not waste the work that was already performed. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-13-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
John Snow	62aa1fbeac	block/backup: upgrade copy_bitmap to BdrvDirtyBitmap This simplifies some interface matters; namely the initialization and (later) merging the manifest back into the sync_bitmap if it was provided. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-12-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
John Snow	28636b8211	block/dirty-bitmap: add bdrv_dirty_bitmap_get Add a public interface for get. While we're at it, rename "bdrv_get_dirty_bitmap_locked" to "bdrv_dirty_bitmap_get_locked". (There are more functions to rename to the bdrv_dirty_bitmap_VERB form, but they will wait until the conclusion of this series.) Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-11-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
John Snow	b7661ca5d8	block/dirty-bitmap: add bdrv_dirty_bitmap_merge_internal I'm surprised it didn't come up sooner, but sometimes we have a +busy bitmap as a source. This is dangerous from the QMP API, but if we are the owner that marked the bitmap busy, it's safe to merge it using it as a read only source. It is not safe in the general case to allow users to read from in-use bitmaps, so create an internal variant that foregoes the safety checking. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-10-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
John Snow	cf0cd293c6	block/backup: add 'never' policy to bitmap sync mode This adds a "never" policy for bitmap synchronization. Regardless of if the job succeeds or fails, we never update the bitmap. This can be used to perform differential backups, or simply to avoid the job modifying a bitmap. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-7-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00

1 2 3 4 5 ...

4632 commits