Commit graph

631 commits

Author SHA1 Message Date
Kevin Wolf f01643fb8b qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
undo any previous preallocation, but just adds the zero flag to all
relevant L2 entries. If an external data file is in use, a write_zeroes
request to the data file is made instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200424125448.63318-5-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-30 17:51:07 +02:00
Kevin Wolf 8c6242b6f3 block-backend: Add flags to blk_truncate()
Now that node level interface bdrv_truncate() supports passing request
flags to the block driver, expose this on the BlockBackend level, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200424125448.63318-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-30 17:51:07 +02:00
Kevin Wolf 7b8e485742 block: Add flags to bdrv(_co)_truncate()
Now that block drivers can support flags for .bdrv_co_truncate, expose
the parameter in the node level interfaces bdrv_co_truncate() and
bdrv_truncate().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200424125448.63318-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-30 17:51:07 +02:00
Kevin Wolf 92b92799dc block: Add flags to BlockDriver.bdrv_co_truncate()
This adds a new BdrvRequestFlags parameter to the .bdrv_co_truncate()
driver callbacks, and a supported_truncate_flags field in
BlockDriverState that allows drivers to advertise support for request
flags in the context of truncate.

For now, we always pass 0 and no drivers declare support for any flag.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200424125448.63318-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-30 17:51:07 +02:00
Alberto Garcia fb43d2d46e qcow2: Check request size in qcow2_co_pwritev_compressed_part()
When issuing a compressed write request the number of bytes must be a
multiple of the cluster size or reach the end of the last cluster.

With the current code such requests are allowed and we hit an
assertion:

   $ qemu-img create -f qcow2 img.qcow2 1M
   $ qemu-io -c 'write -c 0 32k' img.qcow2

   qemu-io: block/qcow2.c:4257: qcow2_co_pwritev_compressed_task:
   Assertion `bytes == s->cluster_size || (bytes < s->cluster_size &&
              (offset + bytes == bs->total_sectors << BDRV_SECTOR_BITS))' failed.
   Aborted

This patch fixes a regression introduced in 0d483dce38

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200406143401.26854-1-berto@igalia.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-04-07 13:51:09 +02:00
Alberto Garcia 80f5c01183 qcow2: Forbid discard in qcow2 v2 images with backing files
A discard request deallocates the selected clusters so they read back
as zeroes. This is done by clearing the cluster offset field and
setting QCOW_OFLAG_ZERO in the L2 entry.

This flag is however only supported when qcow_version >= 3. In older
images the cluster is simply deallocated, exposing any possible stale
data from the backing file.

Since discard is an advisory operation it's safer to simply forbid it
in this scenario.

Note that we are adding this check to qcow2_co_pdiscard() and not to
qcow2_cluster_discard() or discard_in_l2_slice() because the last
two are also used by qcow2_snapshot_create() to discard the clusters
used by the VM state. In this case there's no risk of exposing stale
data to the guest and we really want that the clusters are always
discarded.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200331114345.29993-1-berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-04-07 13:51:09 +02:00
Eric Blake e7be13ad3f qcow2: Avoid feature name extension on small cluster size
As the feature name table can be quite large (over 9k if all 64 bits
of all three feature fields have names; a mere 8 features leaves only
8 bytes for a backing file name in a 512-byte cluster), it is unwise
to emit this optional header in images with small cluster sizes.

Update iotest 036 to skip running on small cluster sizes; meanwhile,
note that iotest 061 never passed on alternative cluster sizes
(however, I limited this patch to tests with output affected by adding
feature names, rather than auditing for other tests that are not
robust to alternative cluster sizes).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200324174233.1622067-4-eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-26 14:44:33 +01:00
Eric Blake bb40ebce2c qcow2: List autoclear bit names in header
The feature table is supposed to advertise the name of all feature
bits that we support; however, we forgot to update the table for
autoclear bits.  While at it, move the table to read-only memory in
code, and tweak the qcow2 spec to name the second autoclear bit.
Update iotests that are affected by the longer header length.

Fixes: 88ddffae
Fixes: 93c24936
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200324174233.1622067-3-eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-26 14:44:33 +01:00
Eric Blake a951a631b9 qcow2: Comment typo fixes
Various trivial typos noticed while working on this file.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200324174233.1622067-2-eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-26 14:44:33 +01:00
Maxim Levitsky b92902dfea block: pass BlockDriver reference to the .bdrv_co_create
This will allow the reuse of a single generic .bdrv_co_create
implementation for several drivers.
No functional changes.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200326011218.29230-2-mlevitsk@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-26 14:44:33 +01:00
Vladimir Sementsov-Ogievskiy 808cf3cb6a block/qcow2: zero data_file child after free
data_file being NULL doesn't seem to be a correct state, but it's
better than dead pointer and simpler to debug.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200316060631.30052-3-vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-24 11:41:46 +01:00
Eric Blake 71eaec2e8c block: Avoid memleak on qcow2 image info failure
If we fail to get bitmap info, we must not leak the encryption info.

Fixes: b8968c875f
Fixes: Coverity CID 1421894
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200320183620.1112123-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Tested-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-24 11:41:46 +01:00
Pan Nengyuan 4aebf0f0da block/qcow2: do free crypto_opts in qcow2_close()
'crypto_opts' forgot to free in qcow2_close(), this patch fix the bellow leak stack:

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7f0edd81f970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
    #1 0x7f0edc6d149d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
    #2 0x55d7eaede63d in qobject_input_start_struct /mnt/sdb/qemu-new/qemu_test/qemu/qapi/qobject-input-visitor.c:295
    #3 0x55d7eaed78b8 in visit_start_struct /mnt/sdb/qemu-new/qemu_test/qemu/qapi/qapi-visit-core.c:49
    #4 0x55d7eaf5140b in visit_type_QCryptoBlockOpenOptions qapi/qapi-visit-crypto.c:290
    #5 0x55d7eae43af3 in block_crypto_open_opts_init /mnt/sdb/qemu-new/qemu_test/qemu/block/crypto.c:163
    #6 0x55d7eacd2924 in qcow2_update_options_prepare /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1148
    #7 0x55d7eacd33f7 in qcow2_update_options /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1232
    #8 0x55d7eacd9680 in qcow2_do_open /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1512
    #9 0x55d7eacdc55e in qcow2_open_entry /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1792
    #10 0x55d7eacdc8fe in qcow2_open /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1819
    #11 0x55d7eac3742d in bdrv_open_driver /mnt/sdb/qemu-new/qemu_test/qemu/block.c:1317
    #12 0x55d7eac3e990 in bdrv_open_common /mnt/sdb/qemu-new/qemu_test/qemu/block.c:1575
    #13 0x55d7eac4442c in bdrv_open_inherit /mnt/sdb/qemu-new/qemu_test/qemu/block.c:3126
    #14 0x55d7eac45c3f in bdrv_open /mnt/sdb/qemu-new/qemu_test/qemu/block.c:3219
    #15 0x55d7ead8e8a4 in blk_new_open /mnt/sdb/qemu-new/qemu_test/qemu/block/block-backend.c:397
    #16 0x55d7eacde74c in qcow2_co_create /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:3534
    #17 0x55d7eacdfa6d in qcow2_co_create_opts /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:3668
    #18 0x55d7eac1c678 in bdrv_create_co_entry /mnt/sdb/qemu-new/qemu_test/qemu/block.c:485
    #19 0x55d7eb0024d2 in coroutine_trampoline /mnt/sdb/qemu-new/qemu_test/qemu/util/coroutine-ucontext.c:115

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200227012950.12256-2-pannengyuan@huawei.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-11 12:42:30 +01:00
Stefan Hajnoczi 6d49d3a859 luks: extract qcrypto_block_calculate_payload_offset()
The qcow2 .bdrv_measure() code calculates the crypto payload offset.
This logic really belongs in crypto/block.c where it can be reused by
other image formats.

The "luks" block driver will need this same logic in order to implement
.bdrv_measure(), so extract the qcrypto_block_calculate_payload_offset()
function now.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200221112522.1497712-2-stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-11 12:42:29 +01:00
Peter Krempa 65eb7c85a3 block/qcow2: Move bitmap reopen into bdrv_reopen_commit_post
The bitmap code requires writing the 'file' child when the qcow2 driver
is reopened in read-write mode.

If the 'file' child is being reopened due to a permissions change, the
modification is commited yet when qcow2_reopen_commit is called. This
means that any attempt to write the 'file' child will end with EBADFD
as the original fd was already closed.

Moving bitmap reopening to the new callback which is called after
permission modifications are commited fixes this as the file descriptor
will be replaced with the correct one.

The above problem manifests itself when reopening 'qcow2' format layer
which uses a 'file-posix' file child which was opened with the
'auto-read-only' property set.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Message-Id: <db118dbafe1955afbc0a18d3dd220931074ce349.1582893284.git.pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-06 17:15:37 +01:00
Daniel P. Berrangé 087ab8e775 block: always fill entire LUKS header space with zeros
When initializing the LUKS header the size with default encryption
parameters will currently be 2068480 bytes. This is rounded up to
a multiple of the cluster size, 2081792, with 64k sectors. If the
end of the header is not the same as the end of the cluster we fill
the extra space with zeros. This was forgetting that not even the
space allocated for the header will be fully initialized, as we
only write key material for the first key slot. The space left
for the other 7 slots is never written to.

An optimization to the ref count checking code:

  commit a5fff8d4b4 (refs/bisect/bad)
  Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  Date:   Wed Feb 27 16:14:30 2019 +0300

    qcow2-refcount: avoid eating RAM

made the assumption that every cluster which was allocated would
have at least some data written to it. This was violated by way
the LUKS header is only partially written, with much space simply
reserved for future use.

Depending on the cluster size this problem was masked by the
logic which wrote zeros between the end of the LUKS header and
the end of the cluster.

$ qemu-img create --object secret,id=cluster_encrypt0,data=123456 \
   -f qcow2 -o cluster_size=2k,encrypt.iter-time=1,\
               encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 \
               cluster_size_check.qcow2 100M
  Formatting 'cluster_size_check.qcow2', fmt=qcow2 size=104857600
    encrypt.format=luks encrypt.key-secret=cluster_encrypt0
    encrypt.iter-time=1 cluster_size=2048 lazy_refcounts=off refcount_bits=16

$ qemu-img check --object secret,id=cluster_encrypt0,data=redhat \
    'json:{"driver": "qcow2", "encrypt.format": "luks", \
           "encrypt.key-secret": "cluster_encrypt0", \
           "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}'
ERROR: counting reference for region exceeding the end of the file by one cluster or more: offset 0x2000 size 0x1f9000
Leaked cluster 4 refcount=1 reference=0
...snip...
Leaked cluster 130 refcount=1 reference=0

1 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

127 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
Image end offset: 268288

The problem only exists when the disk image is entirely empty. Writing
data to the disk image payload will solve the problem by causing the
end of the file to be extended further.

The change fixes it by ensuring that the entire allocated LUKS header
region is fully initialized with zeros. The qemu-img check will still
fail for any pre-existing disk images created prior to this change,
unless at least 1 byte of the payload is written to.

Fully writing zeros to the entire LUKS header is a good idea regardless
as it ensures that space has been allocated on the host filesystem (or
whatever block storage backend is used).

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200207135520.2669430-1-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-20 16:43:42 +01:00
Alberto Garcia 2d4b5256cf qcow2: Fix alignment checks in encrypted images
I/O requests to encrypted media should be aligned to the sector size
used by the underlying encryption method, not to BDRV_SECTOR_SIZE.
Fortunately this doesn't break anything at the moment because
both existing QCRYPTO_BLOCK_*_SECTOR_SIZE have the same value as
BDRV_SECTOR_SIZE.

The checks in qcow2_co_preadv_encrypted() are also unnecessary because
they are repeated immediately afterwards in qcow2_co_encdec().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200213171646.15876-1-berto@igalia.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 10:53:56 +01:00
Alberto Garcia 3afea40243 qcow2: Use BDRV_SECTOR_SIZE instead of the hardcoded value
This replaces all remaining instances in the qcow2 code.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: b5f74b606c2d9873b12d29acdb7fd498029c4025.1579374329.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06 13:47:45 +01:00
Alberto Garcia 25ae71db55 qcow2: Don't require aligned offsets in qcow2_co_copy_range_from()
qemu-img's convert_co_copy_range() operates at the sector level and
block_copy() operates at the cluster level so this condition is always
true, but it is not necessary to restrict this here, so let's leave it
to the driver implementation return an error if there is any.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: a4264aaee656910c84161a2965f7a501437379ca.1579374329.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06 13:47:45 +01:00
Alberto Garcia 344ffea951 qcow2: Tighten cluster_offset alignment assertions
qcow2_alloc_cluster_offset() and qcow2_get_cluster_offset() always
return offsets that are cluster-aligned so don't just check that they
are sector-aligned.

The check in qcow2_co_preadv_task() is also replaced by an assertion
for the same reason.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 558ba339965f858bede4c73ce3f50f0c0493597d.1579374329.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06 13:47:45 +01:00
Alberto Garcia ef97d608c7 qcow2: Don't round the L1 table allocation up to the sector size
The L1 table is read from disk using the byte-based bdrv_pread() and
is never accessed beyond its last element, so there's no need to
allocate more memory than that.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: b2e27214ec7b03a585931bcf383ee1ac3a641a10.1579374329.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06 13:47:45 +01:00
Alberto Garcia 7cdca2e233 qcow2: Use a GString in report_unsupported_feature()
This is a bit more efficient than having to allocate and free memory
for each item.

The default size (60) is enough for all the existing incompatible
features or the "Unknown incompatible feature" message.

Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20200115135626.19442-1-berto@igalia.com
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06 13:47:45 +01:00
Andrey Shinkevich 0d483dce38 qcow2: Allow writing compressed data of multiple clusters
QEMU currently supports writing compressed data of the size equal to
one cluster. This patch allows writing QCOW2 compressed data that
exceed one cluster. Now, we split buffered data into separate clusters
and write them compressed using the block/aio_task API.

Suggested-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1575288906-551879-3-git-send-email-andrey.shinkevich@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-01-06 13:43:07 +01:00
Tuguoyi 66be5c3e78 qcow2: Move error check of local_err near its assignment
The local_err check outside of the if block was necessary
when it was introduced in commit d1258dd0c8 because it needed to be
executed even if qcow2_load_autoloading_dirty_bitmaps() returned false.

After some modifications that all required the error check to remain
where it is, commit 9c98f145df finally moved the
qcow2_load_dirty_bitmaps() call into the if block, so now the error
check should be there, too.

Signed-off-by: Guoyi Tu <tu.guoyi@h3c.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-12-19 14:31:52 +01:00
Alberto Garcia 74e60fb56a qcow2: Use offset_into_cluster()
There's a couple of places left in the qcow2 code that still do the
calculation manually, so let's replace them.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-12-18 11:21:16 +01:00
Kevin Wolf 3b65081638 qcow2: Declare BDRV_REQ_NO_FALLBACK supported
In the common case, qcow2_co_pwrite_zeroes() already only modifies
metadata case, so we're fine with or without BDRV_REQ_NO_FALLBACK set.

The only exception is when using an external data file, where the
request is passed down to the block driver of the external data file. We
are forwarding the BDRV_REQ_NO_FALLBACK flag there, though, so this is
fine, too.

Declare the flag supported therefore.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2019-12-18 11:21:16 +01:00
Max Reitz e8d04f9237 block: Pass truncate exact=true where reasonable
This is a change in behavior, so all instances need a good
justification.  The comments added here should explain my reasoning.

qed already had a comment that suggests it always expected
bdrv_truncate()/blk_truncate() to behave as if exact=true were passed
(c743849bee came eight months before 55b949c847), so it was simply
broken until now.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190918095144.955-8-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
[mreitz: Changed comment in qed.c to explain why a new QED file must be
         empty, as requested and suggested by Maxim]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 12:08:45 +01:00
Max Reitz e61a28a9b6 block: Let format drivers pass @exact
When truncating a format node, the @exact parameter is generally handled
simply by virtue of the format storing the new size in the image
metadata.  Such formats do not need to pass on the parameter to their
file nodes.

There are exceptions, though:
- raw and crypto cannot store the image size, and thus must pass on
  @exact.

- When using qcow2 with an external data file, it just makes sense to
  keep its size in sync with the qcow2 virtual disk (because the
  external data file is the virtual disk).  Therefore, we should pass
  @exact when truncating it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190918095144.955-7-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 12:05:30 +01:00
Max Reitz c80d8b06cf block: Add @exact parameter to bdrv_co_truncate()
We have two drivers (iscsi and file-posix) that (in some cases) return
success from their .bdrv_co_truncate() implementation if the block
device is larger than the requested offset, but cannot be shrunk.  Some
callers do not want that behavior, so this patch adds a new parameter
that they can use to turn off that behavior.

This patch just adds the parameter and lets the block/io.c and
block/block-backend.c functions pass it around.  All other callers
always pass false and none of the implementations evaluate it, so that
this patch does not change existing behavior.  Future patches take care
of that.

Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190918095144.955-5-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 12:00:07 +01:00
Max Reitz 26536c7fc2 block: Do not truncate file node when formatting
There is no reason why the format drivers need to truncate the protocol
node when formatting it.  When using the old .bdrv_co_create_ops()
interface, the file will be created with no size option anyway, which
generally gives it a size of 0.  (Exceptions are block devices, which
cannot be truncated anyway.)

When using blockdev-create, the user must have given the file node some
size anyway, so there is no reason why we should override that.

qed is an exception, it needs the file to start completely empty (as
explained by c743849bee).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190918095144.955-4-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:59:57 +01:00
Max Reitz fe446b5da2 qcow2: Add qcow2_check_fix_snapshot_table()
qcow2_check_read_snapshot_table() can perform consistency checks, but it
cannot fix everything.  Specifically, it cannot allocate new clusters,
because that should wait until the refcount structures are known to be
consistent (i.e., after qcow2_check_refcounts()).  Thus, it cannot call
qcow2_write_snapshots().

Do that in qcow2_check_fix_snapshot_table(), which is called after
qcow2_check_refcounts().

Currently, there is nothing that would set result->corruptions, so this
is a no-op.  A follow-up patch will change that.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-10-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:54:01 +01:00
Max Reitz 8bc584fe03 qcow2: Separate qcow2_check_read_snapshot_table()
Reading the snapshot table can fail.  That is a problem when we want to
repair the image.

Therefore, stop reading the snapshot table in qcow2_do_open() in check
mode.  Instead, add a new function qcow2_check_read_snapshot_table()
that reads the snapshot table at a later point.  In the future, we want
to handle errors here and fix them.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-9-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:54:00 +01:00
Max Reitz 0a85af351d qcow2: Write v3-compliant snapshot list on upgrade
qcow2 v3 requires every snapshot table entry to have two extra data
fields: The 64-bit VM state size, and the virtual disk size.  Both are
optional for v2 images, so they may not be present.

qcow2_upgrade() therefore should update the snapshot table to ensure all
entries have these extra data fields.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1727347
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-8-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:53:52 +01:00
Max Reitz 722efb0c7c qcow2: Put qcow2_upgrade() into its own function
This does not make sense right now, but it will make sense once we need
to do more than to just update s->qcow_version.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-7-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:53:20 +01:00
Max Reitz ecf6c7c0c1 qcow2: Add Error ** to qcow2_read_snapshots()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 11:51:09 +01:00
Kevin Wolf 5e97855052 qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()
qcow2_detect_metadata_preallocation() calls qcow2_get_refcount() which
requires s->lock to be taken to protect its accesses to the refcount
table and refcount blocks. However, nothing in this code path actually
took the lock. This could cause the same cache entry to be used by two
requests at the same time, for different tables at different offsets,
resulting in image corruption.

As it would be preferable to base the detection on consistent data (even
though it's just heuristics), let's take the lock not only around the
qcow2_get_refcount() calls, but around the whole function.

This patch takes the lock in qcow2_co_block_status() earlier and asserts
in qcow2_detect_metadata_preallocation() that we hold the lock.

Fixes: 69f47505ee
Cc: qemu-stable@nongnu.org
Reported-by: Michael Weiser <michael.weiser@gmx.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Michael Weiser <michael.weiser@gmx.de>
Reviewed-by: Michael Weiser <michael.weiser@gmx.de>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2019-10-25 15:18:55 +02:00
Vladimir Sementsov-Ogievskiy 4dd09f6223 qcow2-bitmap: move bitmap reopen-rw code to qcow2_reopen_commit
The only reason I can imagine for this strange code at the very-end of
bdrv_reopen_commit is the fact that bs->read_only updated after
calling drv->bdrv_reopen_commit in bdrv_reopen_commit. And in the same
time, prior to previous commit, qcow2_reopen_bitmaps_rw did a wrong
check for being writable, when actually it only need writable file
child not self.

So, as it's fixed, let's move things to correct place.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190927122355.7344-10-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:53:28 -04:00
Vladimir Sementsov-Ogievskiy 644ddbb754 block/qcow2-bitmap: do not remove bitmaps on reopen-ro
qcow2_reopen_bitmaps_ro wants to store bitmaps and then mark them all
readonly. But the latter don't work, as
qcow2_store_persistent_dirty_bitmaps removes bitmaps after storing.
It's OK for inactivation but bad idea for reopen-ro. And this leads to
the following bug:

Assume we have persistent bitmap 'bitmap0'.
Create external snapshot
  bitmap0 is stored and therefore removed
Commit snapshot
  now we have no bitmaps
Do some writes from guest (*)
  they are not marked in bitmap
Shutdown
Start
  bitmap0 is loaded as valid, but it is actually broken! It misses
  writes (*)
Incremental backup
  it will be inconsistent

So, let's stop removing bitmaps on reopen-ro. But don't rejoice:
reopening bitmaps to rw is broken too, so the whole scenario will not
work after this patch and we can't enable corresponding test cases in
260 iotests still. Reopening bitmaps rw will be fixed in the following
patches.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190927122355.7344-7-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy d2c3080e41 block/qcow2: proper locking on bitmap add/remove paths
qmp_block_dirty_bitmap_add and do_block_dirty_bitmap_remove do acquire
aio context since 0a6c86d024. But this is not enough: we also must
lock qcow2 mutex when access in-image metadata. Especially it concerns
freeing qcow2 clusters.

To achieve this, move qcow2_can_store_new_dirty_bitmap and
qcow2_remove_persistent_dirty_bitmap to coroutine context.

Since we work in coroutines in correct aio context, we don't need
context acquiring in blockdev.c anymore, drop it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190920082543.23444-4-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy d710cf575a block/qcow2: introduce parallel subrequest handling in read and write
It improves performance for fragmented qcow2 images.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190916175324.18478-6-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy 6aa7a2631b block/qcow2: refactor qcow2_co_pwritev_part
Similarly to previous commit, prepare for parallelizing write-loop
iterations.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190916175324.18478-5-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy 88f468e546 block/qcow2: refactor qcow2_co_preadv_part
Further patch will run partial requests of iterations of
qcow2_co_preadv in parallel for performance reasons. To prepare for
this, separate part which may be parallelized into separate function
(qcow2_co_preadv_task).

While being here, also separate encrypted clusters reading to own
function, like it is done for compressed reading.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190916175324.18478-4-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10 10:56:17 +02:00
Maxim Levitsky 603fbd076c block/qcow2: refactor encryption code
* Change the qcow2_co_{encrypt|decrypt} to just receive full host and
  guest offsets and use this function directly instead of calling
  do_perform_cow_encrypt (which is removed by that patch).

* Adjust qcow2_co_encdec to take full host and guest offsets as well.

* Document the qcow2_co_{encrypt|decrypt} arguments
  to prevent the bug fixed in former commit from hopefully
  happening again.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-id: 20190915203655.21638-3-mlevitsk@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[mreitz: Let perform_cow() return the error value returned by
         qcow2_co_encrypt(), as proposed by Vladimir]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-16 15:36:22 +02:00
Nir Soffer 1bbbf32d5f block: Use QEMU_IS_ALIGNED
Replace instances of:

    (n & (BDRV_SECTOR_SIZE - 1)) == 0

And:

   (n & ~BDRV_SECTOR_MASK) == 0

With:

    QEMU_IS_ALIGNED(n, BDRV_SECTOR_SIZE)

Which reveals the intent of the code better, and makes it easier to
locate the code checking alignment.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Message-id: 20190827185913.27427-2-nsoffer@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-16 14:48:30 +02:00
Alberto Garcia b70d08205b qcow2: Fix the calculation of the maximum L2 cache size
The size of the qcow2 L2 cache defaults to 32 MB, which can be easily
larger than the maximum amount of L2 metadata that the image can have.
For example: with 64 KB clusters the user would need a qcow2 image
with a virtual size of 256 GB in order to have 32 MB of L2 metadata.

Because of that, since commit b749562d98
we forbid the L2 cache to become larger than the maximum amount of L2
metadata for the image, calculated using this formula:

    uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);

The problem with this formula is that the result should be rounded up
to the cluster size because an L2 table on disk always takes one full
cluster.

For example, a 1280 MB qcow2 image with 64 KB clusters needs exactly
160 KB of L2 metadata, but we need 192 KB on disk (3 clusters) even if
the last 32 KB of those are not going to be used.

However QEMU rounds the numbers down and only creates 2 cache tables
(128 KB), which is not enough for the image.

A quick test doing 4KB random writes on a 1280 MB image gives me
around 500 IOPS, while with the correct cache size I get 16K IOPS.

Cc: qemu-stable@nongnu.org
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-09-10 08:58:43 +02:00
Vladimir Sementsov-Ogievskiy 5396234b96 block/qcow2: implement .bdrv_co_pwritev(_compressed)_part
Implement and use new interface to get rid of hd_qiov.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-13-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-13-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy df893d25ce block/qcow2: implement .bdrv_co_preadv_part
Implement and use new interface to get rid of hd_qiov.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-12-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-12-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-08-27 14:58:42 +01:00
Vladimir Sementsov-Ogievskiy 00721a3529 block/qcow2: refactor qcow2_co_preadv to use buffer-based io
Use buffer based io in encrypted case.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-11-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-11-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-08-27 14:58:42 +01:00
Max Reitz 38841dcd27 qcow2: Fix .bdrv_has_zero_init()
If a qcow2 file is preallocated, it can no longer guarantee that it
initially appears as filled with zeroes.

So implement .bdrv_has_zero_init() by checking whether the file is
preallocated; if so, forward the call to the underlying storage node,
except for when it is encrypted: Encrypted preallocated images always
return effectively random data, so .bdrv_has_zero_init() must always
return 0 for them.

.bdrv_has_zero_init_truncate() can remain bdrv_has_zero_init_1(),
because it presupposes PREALLOC_MODE_OFF.

Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190724171239.8764-7-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-19 17:13:26 +02:00
Max Reitz 1dcaf52760 block: Implement .bdrv_has_zero_init_truncate()
We need to implement .bdrv_has_zero_init_truncate() for every block
driver that supports truncation and has a .bdrv_has_zero_init()
implementation.

Implement it the same way each driver implements .bdrv_has_zero_init().
This is at least not any more unsafe than what we had before.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190724171239.8764-5-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-19 17:13:26 +02:00