Commit graph

3845 commits

Author SHA1 Message Date
Kevin Wolf 0109e7e6f8 block: Defer .bdrv_drain_begin callback to polling phase
We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
done with propagating the drain through the graph and are doing the
single final BDRV_POLL_WHILE().

Just schedule the coroutine with the callback and increase bs->in_flight
to make sure that the polling phase will wait for it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf dcf94a23b1 block: Don't poll in parent drain callbacks
bdrv_do_drained_begin() is only safe if we have a single
BDRV_POLL_WHILE() after quiescing all affected nodes. We cannot allow
that parent callbacks introduce a nested polling loop that could cause
graph changes while we're traversing the graph.

Split off bdrv_do_drained_begin_quiesce(), which only quiesces a single
node without waiting for its requests to complete. These requests will
be waited for in the BDRV_POLL_WHILE() call down the call chain.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf fe4f0614ef block: Drain recursively with a single BDRV_POLL_WHILE()
Anything can happen inside BDRV_POLL_WHILE(), including graph
changes that may interfere with its callers (e.g. child list iteration
in recursive callers of bdrv_do_drained_begin).

Switch to a single BDRV_POLL_WHILE() call for the whole subtree at the
end of bdrv_do_drained_begin() to avoid such effects. The recursion
happens now inside the loop condition. As the graph can only change
between bdrv_drain_poll() calls, but not inside of it, doing the
recursion here is safe.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf d30b8e64b7 block: Remove bdrv_drain_recurse()
For bdrv_drain(), recursively waiting for child node requests is
pointless because we didn't quiesce their parents, so new requests could
come in anyway. Letting the function work only on a single node makes it
more consistent.

For subtree drains and drain_all, we already have the recursion in
bdrv_do_drained_begin(), so the extra recursion doesn't add anything
either.

Remove the useless code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf 89bd030533 block: Really pause block jobs on drain
We already requested that block jobs be paused in .bdrv_drained_begin,
but no guarantee was made that the job was actually inactive at the
point where bdrv_drained_begin() returned.

This introduces a new callback BdrvChildRole.bdrv_drained_poll() and
uses it to make bdrv_drain_poll() consider block jobs using the node to
be drained.

For the test case to work as expected, we have to switch from
block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even
considered active and must be waited for when draining the node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf 1cc8e54ada block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()
Commit 91af091f92 added an additional aio_poll() to BDRV_POLL_WHILE()
in order to make sure that all pending BHs are executed on drain. This
was the wrong place to make the fix, as it is useless overhead for all
other users of the macro and unnecessarily complicates the mechanism.

This patch effectively reverts said commit (the context has changed a
bit and the code has moved to AIO_WAIT_WHILE()) and instead polls in the
loop condition for drain.

The effect is probably hard to measure in any real-world use case
because actual I/O will dominate, but if I run only the initialisation
part of 'qemu-img convert' where it calls bdrv_block_status() for the
whole image to find out how much data there is copy, this phase actually
needs only roughly half the time after this patch.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf c13ad59f01 block: Don't manually poll in bdrv_drain_all()
All involved nodes are already idle, we called bdrv_do_drain_begin() on
them.

The comment in the code suggested that this was not correct because the
completion of a request on one node could spawn a new request on a
different node (which might have been drained before, so we wouldn't
drain the new request). In reality, new requests to different nodes
aren't spawned out of nothing, but only in the context of a parent
request, and they aren't submitted to random nodes, but only to child
nodes. As long as we still poll for the completion of the parent request
(which we do), draining each root node separately is good enough.

Remove the additional polling code from bdrv_drain_all_begin() and
replace it with an assertion that all nodes are already idle after we
drained them separately.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf 7d40d9ef9d block: Remove 'recursive' parameter from bdrv_drain_invoke()
All callers pass false for the 'recursive' parameter now. Remove it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf 79ab8b21dc block: Use bdrv_do_drain_begin/end in bdrv_drain_all()
bdrv_do_drain_begin/end() implement already everything that
bdrv_drain_all_begin/end() need and currently still do manually: Disable
external events, call parent drain callbacks, call block driver
callbacks.

It also does two more things:

The first is incrementing bs->quiesce_counter. bdrv_drain_all() already
stood out in the test case by behaving different from the other drain
variants. Adding this is not only safe, but in fact a bug fix.

The second is calling bdrv_drain_recurse(). We already do that later in
the same function in a loop, so basically doing an early first iteration
doesn't hurt.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf bb67568954 test-bdrv-drain: bdrv_drain() works with cross-AioContext events
As long as nobody keeps the other I/O thread from working, there is no
reason why bdrv_drain() wouldn't work with cross-AioContext events. The
key is that the root request we're waiting for is in the AioContext
we're polling (which it always is for bdrv_drain()) so that aio_poll()
is woken up in the end.

Add a test case that shows that it works. Remove the comment in
bdrv_drain() that claims otherwise.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18 15:03:25 +02:00
Kevin Wolf b008326744 block: Remove deprecated -drive option serial
The -drive option serial was deprecated in QEMU 2.10. It's time to
remove it.

Tests need to be updated to set the serial number with -global instead
of using the -drive option.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster d083f954a9 rbd: New parameter key-secret
Legacy -drive supports "password-secret" parameter that isn't
available with -blockdev / blockdev-add.  That's because we backed out
our first try to provide it there due to interface design doubts, in
commit 577d8c9a81, v2.9.0.

This is the second try.  It brings back the parameter, except it's
named "key-secret" now.

Let's review our reasons for backing out the first try, as stated in
the commit message:

    * BlockdevOptionsRbd member @password-secret isn't actually a
      password, it's a key generated by Ceph.

Addressed by the rename.

    * We're not sure where member @password-secret belongs (see the
      previous commit).

See previous commit.

    * How @password-secret interacts with settings from a configuration
      file specified with @conf is undocumented.

Not actually true, the documentation for @conf says "Values in the
configuration file will be overridden by options specified via QAPI",
and we've tested this.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster a3699de4dd rbd: New parameter auth-client-required
Parameter auth-client-required lets you configure authentication
methods.  We tried to provide that in v2.9.0, but backed out due to
interface design doubts (commit 464444fcc1).

This commit is similar to what we backed out, but simpler: we use a
list of enumeration values instead of a list of objects with a member
of enumeration type.

Let's review our reasons for backing out the first try, as stated in
the commit message:

    * The implementation uses deprecated rados_conf_set() key
      "auth_supported".  No biggie.

Fixed: we use "auth-client-required".

    * The implementation makes -drive silently ignore invalid parameters
      "auth" and "auth-supported.*.X" where X isn't "auth".  Fixable (in
      fact I'm going to fix similar bugs around parameter server), so
      again no biggie.

That fix is commit 2836284db6.  This commit doesn't bring the bugs
back.

    * BlockdevOptionsRbd member @password-secret applies only to
      authentication method cephx.  Should it be a variant member of
      RbdAuthMethod?

We've had time to ponder, and we decided to stick to the way Ceph
configuration works: the key configured separately, and silently
ignored if the authentication method doesn't use it.

    * BlockdevOptionsRbd member @user could apply to both methods cephx
      and none, but I'm not sure it's actually used with none.  If it
      isn't, should it be a variant member of RbdAuthMethod?

Likewise.

    * The client offers a *set* of authentication methods, not a list.
      Should the methods be optional members of BlockdevOptionsRbd instead
      of members of list @auth-supported?  The latter begs the question
      what multiple entries for the same method mean.  Trivial question
      now that RbdAuthMethod contains nothing but @type, but less so when
      RbdAuthMethod acquires other members, such the ones discussed above.

Again, we decided to stick to the way Ceph configuration works, except
we make auth-client-required a list of enumeration values instead of a
string containing keywords separated by delimiters.

    * How BlockdevOptionsRbd member @auth-supported interacts with
      settings from a configuration file specified with @conf is
      undocumented.  I suspect it's untested, too.

Not actually true, the documentation for @conf says "Values in the
configuration file will be overridden by options specified via QAPI",
and we've tested this.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster f853465aac block: Make remaining uses of qobject input visitor more robust
Remaining uses of qobject_input_visitor_new_keyval() in the block
subsystem:

* block_crypto_open_opts_init()
  Currently doesn't visit any non-string scalars, thus safe.  It's
  called from
  - block_crypto_open_luks()
    Creates the QDict with qemu_opts_to_qdict_filtered(), which
    creates only string scalars, but has a TODO asking for other types.
  - qcow_open()
  - qcow2_open(), qcow2_co_invalidate_cache(), qcow2_reopen_prepare()

* block_crypto_create_opts_init(), called from
  - block_crypto_co_create_opts_luks()
    Also creates the QDict with qemu_opts_to_qdict_filtered().

* vdi_co_create_opts()
  Also creates the QDict with qemu_opts_to_qdict_filtered().

Replace these uses by qobject_input_visitor_new_flat_confused() for
robustness.  This adds crumpling.  Right now, that's a no-op, but if
we ever extend these things in non-flat ways, crumpling will be
needed.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster af91062ee1 block: Factor out qobject_input_visitor_new_flat_confused()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster 92adf9dbcd block: Clean up a misuse of qobject_to() in .bdrv_co_create_opts()
The following pattern occurs in the .bdrv_co_create_opts() methods of
parallels, qcow, qcow2, qed, vhdx and vpc:

    qobj = qdict_crumple_for_keyval_qiv(qdict, errp);
    qobject_unref(qdict);
    qdict = qobject_to(QDict, qobj);
    if (qdict == NULL) {
         ret = -EINVAL;
         goto done;
    }

    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
    [...]
    ret = 0;
done:
    qobject_unref(qdict);
    [...]
    return ret;

If qobject_to() fails, we return failure without setting errp.  That's
wrong.  As far as I can tell, it cannot fail here.  Clean it up
anyway, by removing the useless conversion.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster 374c52467a block: Fix -drive for certain non-string scalars
The previous commit fixed -blockdev breakage due to misuse of the
qobject input visitor's keyval flavor in bdrv_file_open().  The commit
message explain why using the plain flavor would be just as wrong; it
would break -drive.  Turns out we break it in three places:
nbd_open(), sd_open() and ssh_file_open().  They are even marked
FIXME.  Example breakage:

    $ qemu-system-x86 -drive node-name=n1,driver=nbd,server.type=inet,server.host=localhost,server.port=1234,server.numeric=off
    qemu-system-x86: -drive node-name=n1,driver=nbd,server.type=inet,server.host=localhost,server.port=1234,server.numeric=off: Invalid parameter type for 'numeric', expected: boolean

Fix it the same way: replace qdict_crumple() by
qdict_crumple_for_keyval_qiv(), and switch from plain to the keyval
flavor.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster e5af0da1dc block: Fix -blockdev for certain non-string scalars
Configuration flows through the block subsystem in a rather peculiar
way.  Configuration made with -drive enters it as QemuOpts.
Configuration made with -blockdev / blockdev-add enters it as QAPI
type BlockdevOptions.  The block subsystem uses QDict, QemuOpts and
QAPI types internally.  The precise flow is next to impossible to
explain (I tried for this commit message, but gave up after wasting
several hours).  What I can explain is a flaw in the BlockDriver
interface that leads to this bug:

    $ qemu-system-x86_64 -blockdev node-name=n1,driver=nfs,server.type=inet,server.host=localhost,path=/foo/bar,user=1234
    qemu-system-x86_64: -blockdev node-name=n1,driver=nfs,server.type=inet,server.host=localhost,path=/foo/bar,user=1234: Internal error: parameter user invalid

QMP blockdev-add is broken the same way.

Here's what happens.  The block layer passes configuration represented
as flat QDict (with dotted keys) to BlockDriver methods
.bdrv_file_open().  The QDict's members are typed according to the
QAPI schema.

nfs_file_open() converts it to QAPI type BlockdevOptionsNfs, with
qdict_crumple() and a qobject input visitor.

This visitor comes in two flavors.  The plain flavor requires scalars
to be typed according to the QAPI schema.  That's the case here.  The
keyval flavor requires string scalars.  That's not the case here.
nfs_file_open() uses the latter, and promptly falls apart for members
@user, @group, @tcp-syn-count, @readahead-size, @page-cache-size,
@debug.

Switching to the plain flavor would fix -blockdev, but break -drive,
because there the scalars arrive in nfs_file_open() as strings.

The proper fix would be to replace the QDict by QAPI type
BlockdevOptions in the BlockDriver interface.  Sadly, that's beyond my
reach right now.

Next best would be to fix the block layer to always pass correctly
typed QDicts to the BlockDriver methods.  Also beyond my reach.

What I can do is throw another hack onto the pile: have
nfs_file_open() convert all members to string, so use of the keyval
flavor actually works, by replacing qdict_crumple() by new function
qdict_crumple_for_keyval_qiv().

The pattern "pass result of qdict_crumple() to
qobject_input_visitor_new_keyval()" occurs several times more:

* qemu_rbd_open()

  Same issue as nfs_file_open(), but since BlockdevOptionsRbd has only
  string members, its only a latent bug.  Fix it anyway.

* parallels_co_create_opts(), qcow_co_create_opts(),
  qcow2_co_create_opts(), bdrv_qed_co_create_opts(),
  sd_co_create_opts(), vhdx_co_create_opts(), vpc_co_create_opts()

  These work, because they create the QDict with
  qemu_opts_to_qdict_filtered(), which creates only string scalars.
  The function sports a TODO comment asking for better typing; that's
  going to be fun.  Use qdict_crumple_for_keyval_qiv() to be safe.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Max Reitz 609f45ea95 block: Add block-specific QDict header
There are numerous QDict functions that have been introduced for and are
used only by the block layer.  Move their declarations into an own
header file to reflect that.

While qdict_extract_subqdict() is in fact used outside of the block
layer (in util/qemu-config.c), it is still a function related very
closely to how the block layer works with nested QDicts, namely by
sometimes flattening them.  Therefore, its declaration is put into this
header as well and util/qemu-config.c includes it with a comment stating
exactly which function it needs.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20180509165530.29561-7-mreitz@redhat.com>
[Copyright note tweaked, superfluous includes dropped]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster deadbb8ebb iscsi: Drop deprecated -drive parameter "filename"
Parameter "filename" is deprecated since commit 5c3ad1a6a8, v2.10.0.
Time to get rid of it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster bb9f762ff3 rbd: Drop deprecated -drive parameter "filename"
Parameter "filename" is deprecated since commit 91589d9e5c, v2.10.0.
Time to get rid of it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Vladimir Sementsov-Ogievskiy b598e531f1 qapi: add x-block-dirty-bitmap-merge
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20180606182449.1607-5-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-06-11 14:53:32 -04:00
Vladimir Sementsov-Ogievskiy 8b1402ce80 block/dirty-bitmap: add lock to bdrv_enable/disable_dirty_bitmap
Add locks and remove comments about BQL accordingly to
dirty_bitmap_mutex definition in block_int.h.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20180606182449.1607-2-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-06-11 14:53:31 -04:00
Paolo Bonzini b133c27f5d block: simplify code around releasing bitmaps
QLIST_REMOVE does not require walking the list, and once the "bitmap"
argument is removed from bdrv_do_release_matching_dirty_bitmap_locked
the code simplifies a lot and it is worth inlining everything in the
callers of bdrv_do_release_matching_dirty_bitmap.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180326104037.6894-1-pbonzini@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-06-11 14:53:31 -04:00
Paolo Bonzini ab41fc4853 block: remove bdrv_dirty_bitmap_make_anon
All this function is doing will be repeated by
bdrv_do_release_matching_dirty_bitmap_locked, except
resetting bm->persistent.  But even that does not matter
because the bitmap will be freed.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180323164254.26487-1-pbonzini@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2018-06-11 14:53:31 -04:00
Max Reitz ddf3b47ef4 qcow2: Do not mark inactive images corrupt
When signaling a corruption on a read-only image, qcow2 already makes
fatal events non-fatal (i.e., they will not result in the image being
closed, and the image header's corrupt flag will not be set).  This is
necessary because we cannot set the corrupt flag on read-only images,
and it is possible because further corruption of read-only images is
impossible.

Inactive images are effectively read-only, too, so we should do the same
for them.  bdrv_is_writable() can tell us whether an image can actually
be written to, so use its result instead of !bs->read_only.

(Otherwise, the assert(!(bs->open_flags & BDRV_O_INACTIVE)) in
bdrv_co_pwritev() will fail, crashing qemu.)

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180606193702.7113-3-mreitz@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Alberto Garcia bc33c047d1 throttle: Fix crash on reopen
The throttle block filter can be reopened, and with this it is
possible to change the throttle group that the filter belongs to.

The way the code does that is the following:

  - On throttle_reopen_prepare(): create a new ThrottleGroupMember
    and attach it to the new throttle group.

  - On throttle_reopen_commit(): detach the old ThrottleGroupMember,
    delete it and replace it with the new one.

The problem with this is that by replacing the ThrottleGroupMember the
previous value of io_limits_disabled is lost, causing an assertion
failure in throttle_co_drain_end().

This problem can be reproduced by reopening a throttle node:

   $QEMU -monitor stdio
   -object throttle-group,id=tg0,x-iops-total=1000 \
   -blockdev node-name=hd0,driver=qcow2,file.driver=file,file.filename=hd.qcow2 \
   -blockdev node-name=root,driver=throttle,throttle-group=tg0,file=hd0,read-only=on

   (qemu) block_stream root
   block/throttle.c:214: throttle_co_drain_end: Assertion `tgm->io_limits_disabled' failed.

Since we only want to change the throttle group on reopen there's no
need to create a ThrottleGroupMember and discard the old one. It's
easier if we simply detach it from its current group and attach it to
the new one.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180608151536.7378-1-berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Vladimir Sementsov-Ogievskiy 7eb24009db block/qcow2-bitmap: fix free_bitmap_clusters
This assert may fail, because bitmap_table is not initialized. Just
drop it, as it's obvious, that bitmap_table_load sets bitmap_table
parameter only when returning zero.

Reported-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20180608101225.2575-1-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Max Reitz 3cce51c919 qcow2: Repair OFLAG_COPIED when fixing leaks
Repairing OFLAG_COPIED is usually safe because it is done after the
refcounts have been repaired.  Therefore, it we did not find anyone else
referencing a data or L2 cluster, it makes no sense to not set
OFLAG_COPIED -- and the other direction (clearing OFLAG_COPIED) is
always safe, anyway, it may just induce leaks.

Furthermore, if OFLAG_COPIED is actually consistent with a wrong (leaky)
refcount, we will decrement the refcount with -r leaks, but OFLAG_COPIED
will then be wrong.  qemu-img check should not produce images that are
more corrupted afterwards then they were before.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1527085
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20180509200059.31125-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Max Reitz d1402b5026 block: Add Error parameter to bdrv_amend_options
Looking at the qcow2 code that is riddled with error_report() calls,
this is really how it should have been from the start.

Along the way, turn the target_version/current_version comparisons at
the beginning of qcow2_downgrade() into assertions (the caller has to
make sure these conditions are met), and rephrase the error message on
using compat=1.1 to get refcount widths other than 16 bits.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180509210023.20283-3-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Max Reitz b8cf1913a9 block/file-posix: File locking during creation
When creating a file, we should take the WRITE and RESIZE permissions.
We do not need either for the creation itself, but we do need them for
clearing and resizing it.  So we can take the proper permissions by
replacing O_TRUNC with an explicit truncation to 0, and by taking the
appropriate file locks between those two steps.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180509215336.31304-3-mreitz@redhat.com
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Max Reitz d0a96155de block/file-posix: Pass FD to locking helpers
raw_apply_lock_bytes() and raw_check_lock_bytes() currently take a
BDRVRawState *, but they only use the lock_fd field.  During image
creation, we do not have a BDRVRawState, but we do have an FD; so if we
want to reuse the functions there, we should modify them to receive only
the FD.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180509215336.31304-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Paolo Bonzini 68acc99f14 sheepdog: remove huge BSS object
block/sheepdog.o has a 4M static variable that is 90% of QEMU's whole .bss
section.  Replace it with a heap-allocated block, and make it smaller too
since only the inode header is actually being used.

bss size goes down from 4464280 to 269976.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180523160721.14018-3-pbonzini@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-06-05 10:15:12 -04:00
Paolo Bonzini 03b036cc0c sheepdog: cleanup repeated expression
The expression "SD_INODE_SIZE - sizeof(inode.data_vdi_id)" already has a macro
defined for the same value (though with a nicer definition using offsetof).
Replace it.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180523160721.14018-2-pbonzini@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-06-05 10:15:12 -04:00
Peter Maydell 0d514fa234 Pull request
* Copy offloading for qemu-img convert (iSCSI, raw, and qcow2)
 
    If the underlying storage supports copy offloading, qemu-img convert will
    use it instead of performing reads and writes.  This avoids data transfers
    and thus frees up storage bandwidth for other purposes.  SCSI EXTENDED COPY
    and Linux copy_file_range(2) are used to implement this optimization.
 
  * Drop spurious "WARNING: I\/O thread spun for 1000 iterations" warning
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbFSBoAAoJEJykq7OBq3PISpEIAIcMao4/rzinAWXzS+ncK9LO
 6FtRVpgutpHaWX2ayySaz5n2CdR3cNMrpCI7sjY2Kw0lrdkqxPgl5n0SWD+VCl4W
 7+JLz/uF0iUV8X+99e7WGAjZbm9LSlxgn5AQKfrrwyPf0ZfzoYQ5nBMcQ6xjEeQP
 48j2WqJqN9/u8RBD07o11yn0+CE5g56/f12xVjR5ASVodzsAmcZ2OQRMQbM01isU
 1mBekJQkDxJkt5l13Rql8+t+vWz8/9BEW2c/eIDKvoayMqYJpdfKv4DqLloIuHnc
 3RkquA0zUuKtl7xEnEkH/We7fi4QPGW/vyBN7ychS/zKzZFQrXmwqrAuFSw3dKU=
 =vZp+
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

Pull request

 * Copy offloading for qemu-img convert (iSCSI, raw, and qcow2)

   If the underlying storage supports copy offloading, qemu-img convert will
   use it instead of performing reads and writes.  This avoids data transfers
   and thus frees up storage bandwidth for other purposes.  SCSI EXTENDED COPY
   and Linux copy_file_range(2) are used to implement this optimization.

 * Drop spurious "WARNING: I\/O thread spun for 1000 iterations" warning

# gpg: Signature made Mon 04 Jun 2018 12:20:08 BST
# gpg:                using RSA key 9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  main-loop: drop spin_counter
  qemu-img: Convert with copy offloading
  block-backend: Add blk_co_copy_range
  iscsi: Implement copy offloading
  iscsi: Create and use iscsi_co_wait_for_task
  iscsi: Query and save device designator when opening
  file-posix: Implement bdrv_co_copy_range
  qcow2: Implement copy offloading
  raw: Implement copy offloading
  raw: Check byte range uniformly
  block: Introduce API for copy offloading

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-04 18:34:04 +01:00
Peter Maydell f67c9b693a acpi, vhost, misc: fixes, features
vDPA support, fix to vhost blk RO bit handling, some include path
 cleanups, NFIT ACPI table.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbEXNvAAoJECgfDbjSjVRpc8gH/R8xrcFrV+k9wwbgYcOcGb6Y
 LWjseE31pqJcxRV80vLOdzYEuLStZQKQQY7xBDMlA5vdyvZxIA6FLO2IsiJSbFAk
 EK8pclwhpwQAahr8BfzenabohBv2UO7zu5+dqSvuJCiMWF3jGtPAIMxInfjXaOZY
 odc1zY2D2EgsC7wZZ1hfraRbISBOiRaez9BoGDKPOyBY9G1ASEgxJgleFgoBLfsK
 a1XU+fDM6hAVdxftfkTm0nibyf7PWPDyzqghLqjR9WXLvZP3Cqud4p8N29mY51pR
 KSTjA4FYk6Z9EVMltyBHfdJs6RQzglKjxcNGdlrvacDfyFi79fGdiosVllrjfJM=
 =3+V0
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

acpi, vhost, misc: fixes, features

vDPA support, fix to vhost blk RO bit handling, some include path
cleanups, NFIT ACPI table.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Fri 01 Jun 2018 17:25:19 BST
# gpg:                using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (31 commits)
  vhost-blk: turn on pre-defined RO feature bit
  ACPI testing: test NFIT platform capabilities
  nvdimm, acpi: support NFIT platform capabilities
  tests/.gitignore: add entry for generated file
  arch_init: sort architectures
  ui: use local path for local headers
  qga: use local path for local headers
  colo: use local path for local headers
  migration: use local path for local headers
  usb: use local path for local headers
  sd: fix up include
  vhost-scsi: drop an unused include
  ppc: use local path for local headers
  rocker: drop an unused include
  e1000e: use local path for local headers
  ioapic: fix up includes
  ide: use local path for local headers
  display: use local path for local headers
  trace: use local path for local headers
  migration: drop an unused include
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-04 10:15:16 +01:00
Fam Zheng b5679fa49c block-backend: Add blk_co_copy_range
It's a BlockBackend wrapper of the BDS interface.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180601092648.24614-10-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:48 +01:00
Fam Zheng 604dfaaa32 iscsi: Implement copy offloading
Issue EXTENDED COPY (LID1) command to implement the copy_range API.

The parameter data construction code is modified from libiscsi's
iscsi-dd.c.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180601092648.24614-9-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:48 +01:00
Fam Zheng 66e75c03b2 iscsi: Create and use iscsi_co_wait_for_task
This loop is repeated a growing number times. Make a helper.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20180601092648.24614-8-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:48 +01:00
Fam Zheng cc9743c236 iscsi: Query and save device designator when opening
The device designator data returned in INQUIRY command will be useful to
fill in source/target fields during copy offloading. Do this when
connecting to the target and save the data for later use.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180601092648.24614-7-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:48 +01:00
Fam Zheng 1efad060d7 file-posix: Implement bdrv_co_copy_range
With copy_file_range(2), we can implement the bdrv_co_copy_range
semantics.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20180601092648.24614-6-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:48 +01:00
Fam Zheng fd9fcd37a8 qcow2: Implement copy offloading
The two callbacks are implemented quite similarly to the read/write
functions: bdrv_co_copy_range_from maps for read and calls into bs->file
or bs->backing depending on the allocation status; bdrv_co_copy_range_to
maps for write and calls into bs->file.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20180601092648.24614-5-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:48 +01:00
Fam Zheng 72d219e2f9 raw: Implement copy offloading
Just pass down to ->file.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180601092648.24614-4-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:47 +01:00
Fam Zheng 3844553852 raw: Check byte range uniformly
We don't verify the request range against s->size in the I/O callbacks
except for raw_co_pwritev. This is inconsistent (especially for
raw_co_pwrite_zeroes and raw_co_pdiscard), so fix them, in the meanwhile
make the helper reusable by the coming new callbacks.

Note that in most cases the block layer already verifies the request
byte range against our reported image length, before invoking the driver
callbacks.  The exception is during image creating, after
blk_set_allow_write_beyond_eof(blk, true) is called. But in that case,
the requests are not directly from the user or guest. So there is no
visible behavior change in adding the check code.

The int64_t -> uint64_t inconsistency, as shown by the type casting, is
pre-existing due to the interface.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20180601092648.24614-3-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:47 +01:00
Fam Zheng fcc6767836 block: Introduce API for copy offloading
Introduce the bdrv_co_copy_range() API for copy offloading.  Block
drivers implementing this API support efficient copy operations that
avoid reading each block from the source device and writing it to the
destination devices.  Examples of copy offload primitives are SCSI
EXTENDED COPY and Linux copy_file_range(2).

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180601092648.24614-2-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01 14:41:47 +01:00
Michael S. Tsirkin 0d8c41dae5 block: use local path for local headers
When pulling in headers that are in the same directory as the C file (as
opposed to one in include/), we should use its relative path, without a
directory.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2018-05-31 04:16:06 +03:00
Kevin Wolf 3fb588a0f2 block/create: Mark blockdev-create stable
We're ready to declare the blockdev-create job stable. This renames the
corresponding QMP command from x-blockdev-create to blockdev-create.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-05-30 13:31:18 +02:00
Kevin Wolf e5ab4347f9 block/create: Make x-blockdev-create a job
This changes the x-blockdev-create QMP command so that it doesn't block
the monitor and the main loop any more, but starts a background job that
performs the image creation.

The basic job as implemented here is all that is necessary to make image
creation asynchronous and to provide a QMP interface that can be marked
stable, but it still lacks a few features that jobs usually provide: The
job will ignore pause commands and it doesn't publish more than very
basic progress yet (total-progress is 1 and current-progress advances
from 0 to 1 when the driver callbacks returns). These features can be
added later without breaking compatibility.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-05-30 13:31:07 +02:00
Kevin Wolf 1266c9b9f5 job: Add error message for failing jobs
So far we relied on job->ret and strerror() to produce an error message
for failed jobs. Not surprisingly, this tends to result in completely
useless messages.

This adds a Job.error field that can contain an error string for a
failing job, and a parameter to job_completed() that sets the field. As
a default, if NULL is passed, we continue to use strerror(job->ret).

All existing callers are changed to pass NULL. They can be improved in
separate patches.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-05-30 13:31:01 +02:00
Kevin Wolf 4a5f2779ba vhdx: Fix vhdx_co_create() return value
.bdrv_co_create() is supposed to return 0 on success, but vhdx could
return a positive value instead. Fix this.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2018-05-29 22:19:57 +02:00