Commit graph

2477 commits

Author SHA1 Message Date
Kevin Wolf 3fb06697ae block: Introduce .bdrv_co_preadv/pwritev BlockDriver function
Many parts of the block layer are already byte granularity. The block
driver interface, however, was still missing an interface that allows
making use of this. This patch introduces a new BlockDriver interface,
which is based on coroutines, vectored, has flags and uses a byte
granularity. This is now the preferred interface for new drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf cab3a3563c block: Rename bdrv_co_do_preadv/writev to bdrv_co_preadv/writev
It used to be an internal helper function just for implementing
bdrv_co_do_readv/writev(), but now that it's a public interface, it
deserves a name without "do" in it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf 0884447382 block: Support AIO drivers in bdrv_driver_preadv/pwritev()
Instead of registering emulation functions as .bdrv_co_writev, just
directly check whether the function is there or not, and use the AIO
interface if it isn't. This makes the read/write functions more
consistent with how things are done in other places (flush, discard,
etc.)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:07 +02:00
Kevin Wolf 78a07294d5 block: Introduce bdrv_driver_pwritev()
This is a function that simply calls into the block driver for doing a
write, providing the byte granularity interface we want to eventually
have everywhere, and using whatever interface that driver supports.

This one is a bit more interesting than the version for reads: It adds
support for .bdrv_co_writev_flags() everywhere, so that drivers
implementing this function can drop .bdrv_co_writev() now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:07 +02:00
Kevin Wolf 166fe96051 block: Introduce bdrv_driver_preadv()
This is a function that simply calls into the block driver for doing a
read, providing the byte granularity interface we want to eventually
have everywhere, and using whatever interface that driver supports.

For now, this is just a wrapper for calling bs->drv->bdrv_co_readv().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini dd7f7ed104 linux-aio: make it more type safe
Replace void* with an opaque LinuxAioState type.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini 6b98bd6495 block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
Extract the handling of io_plug "depth" from linux-aio.c and let the
main bdrv_drain loop do nothing but wait on I/O.

Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug
now operate on all children.  The visit order is now symmetrical between
plug and unplug, making it possible for formats to implement plug/unplug.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini ce0f141259 block: introduce bdrv_no_throttling_begin/end
Extract the handling of throttling from bdrv_flush_io_queue.  These
new functions will soon become BdrvChildRole callbacks, as they can
be generalized to "beginning of drain" and "end of drain".

Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini b6e84c97ed block: extract bdrv_drain_poll/bdrv_co_yield_to_drain from bdrv_drain/bdrv_co_drain
Do not call bdrv_drain_recurse twice in bdrv_co_drain.  A small
tweak to the logic in Fam's patch, which is harmless since no
one implements bdrv_drain anyway.  But better get it right.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini a72f641407 block: move restarting of throttled reqs to block/throttle-groups.c
We want to remove throttled_reqs from block/io.c.  This is the easy
part---hide the handling of throttled_reqs during disable/enable of
throttling within throttle-groups.c.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini 733bbc8cea block: make bdrv_start_throttled_reqs return void
The return value is unused and I am not sure why it would be useful.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Kevin Wolf 90c78624f1 block: Don't disable I/O throttling on sync requests
We had to disable I/O throttling with synchronous requests because we
didn't use to run timers in nested event loops when the code was
introduced. This isn't true any more, and throttling works just fine
even when using the synchronous API.

The removed code is in fact dead code since commit a8823a3b ('block: Use
blk_co_pwritev() for blk_write()') because I/O throttling can only be
set on the top layer, but BlockBackend always uses the coroutine
interface now instead of using the sync API emulation in block.c.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1458660792-3035-2-git-send-email-kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Eric Blake 15c2f669e3 qapi: Split visit_end_struct() into pieces
As mentioned in previous patches, we want to call visit_end_struct()
functions unconditionally, so that visitors can release resources
tied up since the matching visit_start_struct() without also having
to worry about error priority if more than one error occurs.

Even though error_propagate() can be safely used to ignore a second
error during cleanup caused by a first error, it is simpler if the
cleanup cannot set an error.  So, split out the error checking
portion (basically, input visitors checking for unvisited keys) into
a new function visit_check_struct(), which can be safely skipped if
any earlier errors are encountered, and leave the cleanup portion
(which never fails, but must be called unconditionally if
visit_start_struct() succeeded) in visit_end_struct().

Generated code in qapi-visit.c has diffs resembling:

|@@ -59,10 +59,12 @@ void visit_type_ACPIOSTInfo(Visitor *v,
|         goto out_obj;
|     }
|     visit_type_ACPIOSTInfo_members(v, obj, &err);
|-    error_propagate(errp, err);
|-    err = NULL;
|+    if (err) {
|+        goto out_obj;
|+    }
|+    visit_check_struct(v, &err);
| out_obj:
|-    visit_end_struct(v, &err);
|+    visit_end_struct(v);
| out:

and in qapi-event.c:

@@ -47,7 +47,10 @@ void qapi_event_send_acpi_device_ost(ACP
|         goto out;
|     }
|     visit_type_q_obj_ACPI_DEVICE_OST_arg_members(v, &param, &err);
|-    visit_end_struct(v, err ? NULL : &err);
|+    if (!err) {
|+        visit_check_struct(v, &err);
|+    }
|+    visit_end_struct(v);
|     if (err) {
|         goto out;

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-20-git-send-email-eblake@redhat.com>
[Conflict with a doc fixup resolved]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:55 +02:00
Kevin Wolf d208c50d9d vvfat: Fix default volume label
Commit d5941dd documented that it leaves the default volume name as it
was ("QEMU VVFAT"), but it doesn't actually implement this. You get an
empty name (eleven space characters) instead.

This fixes the implementation to apply the advertised default.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-04-29 11:14:13 +02:00
Kevin Wolf ebb72c9f06 vvfat: Fix volume name assertion
Commit d5941dd made the volume name configurable, but it didn't consider
that the rw code compares the volume name string to assert that the
first directory entry is the volume name. This made vvfat crash in rw
mode.

This fixes the assertion to compare with the configured volume name
instead of a literal string.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-04-29 11:14:08 +02:00
Fam Zheng ab27c3b5e7 mirror: Workaround for unexpected iohandler events during completion
Commit 5a7e7a0ba moved mirror_exit to a BH handler but didn't add any
protection against new requests that could sneak in just before the
BH is dispatched. For example (assuming a code base at that commit):

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch
                aio_dispatch
                  ...
                    mirror_run
                      bdrv_drain
    (a)               block_job_defer_to_main_loop
          qemu_iohandler_poll
            virtio_queue_host_notifier_read
              ...
                virtio_submit_multiwrite
    (b)           blk_aio_multiwrite

        main_loop_wait # 2
          <snip>
                aio_dispatch
                  aio_bh_poll
    (c)             mirror_exit

At (a) we know the BDS has no pending request. However, the same
main_loop_wait call is going to dispatch iohandlers (EventNotifier
events), which may lead to a new I/O from guest. So the invariant is
already broken at (c). Data loss.

Commit f3926945c8 made iohandler to use aio API.  The order of
virtio_queue_host_notifier_read and block_job_defer_to_main_loop within
a main_loop_wait becomes unpredictable, and even worse, if the host
notifier event arrives at the next main_loop_wait call, the
unpredictable order between mirror_exit and
virtio_queue_host_notifier_read is also a trouble. As shown below, this
commit made the bug easier to trigger:

    - Bug case 1:

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch (qemu_aio_context)
                ...
                  mirror_run
                    bdrv_drain
    (a)             block_job_defer_to_main_loop
              aio_ctx_dispatch (iohandler_ctx)
                virtio_queue_host_notifier_read
                  ...
                    virtio_submit_multiwrite
    (b)               blk_aio_multiwrite

        main_loop_wait # 2
          ...
                aio_dispatch
                  aio_bh_poll
    (c)             mirror_exit

    - Bug case 2:

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch (qemu_aio_context)
                ...
                  mirror_run
                    bdrv_drain
    (a)             block_job_defer_to_main_loop

        main_loop_wait # 2
          ...
            aio_ctx_dispatch (iohandler_ctx)
              virtio_queue_host_notifier_read
                ...
                  virtio_submit_multiwrite
    (b)             blk_aio_multiwrite
              aio_dispatch
                aio_bh_poll
    (c)           mirror_exit

In both cases, (b) breaks the invariant wanted by (a) and (c).

Until then, the request loss has been silent. Later, 3f09bfbc7b added
asserts at (c) to check the invariant (in
bdrv_replace_in_backing_chain), and Max reported an assertion failure
first visible there, by doing active committing while the guest is
running bonnie++.

2.5 added bdrv_drained_begin at (a) to protect the dataplane case from
similar problems, but we never realize the main loop bug until now.

As a bandage, this patch disables iohandler's external events
temporarily together with bs->ctx.

Launchpad Bug: 1570134

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-22 16:44:09 +02:00
Fam Zheng 4150ae60eb mirror: Don't extend the last sub-chunk
The last sub-chunk is rounded up to the copy granularity in the target
image, resulting in a larger size than the source.

Add a function to clip the copied sectors to the end.

This undoes the "wrong" changes to tests/qemu-iotests/109.out in
e5b43573e2. The remaining two offset changes are okay.

[ kwolf: Use DIV_ROUND_UP to calculate nb_chunks now ]

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2016-04-20 16:52:55 +02:00
Max Reitz f27a274259 block/mirror: Refresh stale bitmap iterator cache
If the drive's dirty bitmap is dirtied while the mirror operation is
running, the cache of the iterator used by the mirror code may become
stale and not contain all dirty bits.

This only becomes an issue if we are looking for contiguously dirty
chunks on the drive. In that case, we can easily detect the discrepancy
and just refresh the iterator if one occurs.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-20 16:52:55 +02:00
Max Reitz 9c83625bdd block/mirror: Revive dead yielding code
mirror_iteration() is supposed to wait if the current chunk is subject
to a still in-flight mirroring operation. However, it mixed checking
this conflict situation with checking the dirty status of a chunk. A
simplification for the latter condition (the first chunk encountered is
always dirty) led to neglecting the former: We just skip the first chunk
and thus never test whether it conflicts with an in-flight operation.

To fix this, pull out the code which waits for in-flight operations on
the first chunk of the range to be mirrored to settle.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-20 16:52:55 +02:00
Jeff Cody d85fa9eb87 block/gluster: prevent data loss after i/o error
Upon receiving an I/O error after an fsync, by default gluster will
dump its cache.  However, QEMU will retry the fsync, which is especially
useful when encountering errors such as ENOSPC when using the werror=stop
option.  When using caching with gluster, however, the last written data
will be lost upon encountering ENOSPC.  Using the write-behind-cache
xlator option of 'resync-failed-syncs-after-fsync' should cause gluster
to retain the cached data after a failed fsync, so that ENOSPC and other
transient errors are recoverable.

Unfortunately, we have no way of knowing if the
'resync-failed-syncs-after-fsync' xlator option is supported, so for now
close the fd and set the BDS driver to NULL upon fsync error.

Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-04-19 12:24:59 -04:00
Jeff Cody 5d4343e6c2 block/gluster: code movement of qemu_gluster_close()
Move qemu_gluster_close() further up in the file, in preparation
for the next patch, to avoid a forward declaration.

Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-04-19 12:24:59 -04:00
Jeff Cody a882745356 block/gluster: return correct error value
Upon error, gluster will call the aio callback function with a
ret value of -1, with errno set to the proper error value.  If
we set the acb->ret value to the return value in the callback,
that results in every error being EPERM (i.e. 1).  Instead, set
it to the proper error result.

Reviewed-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-04-19 12:24:59 -04:00
Kevin Wolf 16aaf975ee block: Don't ignore flags in blk_{,co,aio}_write_zeroes()
Commit 57d6a428 neglected to pass the given flags to blk_aio_prwv(),
which broke discard by WRITE SAME for scsi-disk (the UNMAP bit would be
ignored).

Commit fc1453cd introduced the same bug for blk_write_zeroes(). This is
used for 'qemu-img convert' without has_zero_init (e.g. on a block
device) and for preallocation=falloc in parallels.

Commit 8896e088 is the version for blk_co_write_zeroes(). This function
is only used in qemu-io.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody 9c057d0b68 block/vpc: update comments to be compliant w/coding guidelines
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody 32f6439cf7 block/vpc: set errp in vpc_open
Add more useful error information to failure paths in vpc_open

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody 66176fc6a7 block/vpc: make checks on max table size a bit more lax
The check on the max_table_size field not being larger than required is
valid, and in accordance with the VHD spec.  However, there have been
VHD images encountered in the wild that have an out-of-spec max table
size that is technically too large.

There is no issue in allowing this larger table size, as we also
later verify that the computed size (used for the pagetable) is
large enough to fit all sectors.  In addition, max_table_entries
is bounds checked against SIZE_MAX and INT_MAX.

Remove the strict check, so that we can accomodate these sorts of
images that are benignly out of spec.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Reported-by: Grant Wu <grantwwu@gmail.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody c23fb11bbb block/vpc: Use the correct max sector count for VHD images
The old VHD_MAX_SECTORS value is incorrect, and is a throwback
to the CHS calculations.  The VHD specification allows images up to 2040
GiB, which (using 512 byte sectors) corresponds to a maximum number of
sectors of 0xff000000, rather than the old value of 0xfe0001ff.

Update VHD_MAX_SECTORS to reflect the correct value.

Also, update comment references to the actual size limit, and correct
one compare so that we can have sizes up to the limit.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody bab246db1d block/vpc: use current_size field for XenConverter VHD images
XenConverter VHD images are another VHD image where current_size is
different from the CHS values in the the format header.  Use
current_size as the default, by looking at the creator_app signature
field.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Stefan Hajnoczi 9bdfb9e8ac vpc: use current_size field for XenServer VHD images
The vpc driver has two methods of determining virtual disk size.  The
correct one to use depends on the software that generated the image
file.  Add the XenServer creator_app signature so that image size is
correctly detected for those images.

Reported-by: Grant Wu <grantwwu@gmail.com>
Reported-by: Spencer Baugh <sbaugh@catern.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody 0211b9becc block/vpc: set errp in vpc_create
Add more useful error information to failure paths in vpc_create().

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:11 +02:00
Kevin Wolf 7fa84cd8d4 block: Fix blk_aio_write_zeroes()
Commit 57d6a428 broke blk_aio_write_zeroes() because in some write
functions in the call path don't have an explicit length argument but
reuse qiov->size instead. Which is great, except that write_zeroes
doesn't have a qiov, which this commit interprets as 0 bytes.
Consequently, blk_aio_write_zeroes() didn't effectively do anything.

This patch introduces an explicit acb->bytes in BlkAioEmAIOCB and uses
that instead of acb->rwco.size.

The synchronous version of the function is okay because it does pass a
qiov (with the right size and a NULL pointer as its base).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-04-15 17:22:11 +02:00
Max Reitz 4e876bcf2b qcow2: Prevent backing file names longer than 1023
We reject backing file names with a length of more than 1023 characters
when opening a qcow2 file, so we should not produce such files
ourselves.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-12 18:06:51 +02:00
Paolo Bonzini 40a99aace3 vpc: fix return value check for blk_pwrite
bdrv_pwrite_sync used to return zero or negative error, while blk_pwrite returns
the number of written bytes when successful.  This caused VPC image creation
to fail spectacularly: it wrote the first 512 bytes, and then exited immediately
because of the non-zero answer from blk_pwrite.  But the truly spectacular part
is that it returns a positive value (the 512 that blk_pwrite returned) causing
everyone to believe that it succeeded.

This fixes qemu-iotests with vpc format.

Fixes: b8f45cdf78
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-12 18:06:51 +02:00
Fam Zheng 39bf92dd70 mirror: Replace bdrv_drain(bs) with bdrv_co_drain(bs)
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1459855253-5378-3-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-04-11 16:59:09 +01:00
Fam Zheng a77fd4bb29 block: Fix bdrv_drain in coroutine
Using the nested aio_poll() in coroutine is a bad idea. This patch
replaces the aio_poll loop in bdrv_drain with a BH, if called in
coroutine.

For example, the bdrv_drain() in mirror.c can hang when a guest issued
request is pending on it in qemu_co_mutex_lock().

Mirror coroutine in this case has just finished a request, and the block
job is about to complete. It calls bdrv_drain() which waits for the
other coroutine to complete. The other coroutine is a scsi-disk request.
The deadlock happens when the latter is in turn pending on the former to
yield/terminate, in qemu_co_mutex_lock(). The state flow is as below
(assuming a qcow2 image):

  mirror coroutine               scsi-disk coroutine
  -------------------------------------------------------------
  do last write

    qcow2:qemu_co_mutex_lock()
    ...
                                 scsi disk read

                                   tracked request begin

                                   qcow2:qemu_co_mutex_lock.enter

    qcow2:qemu_co_mutex_unlock()

  bdrv_drain
    while (has tracked request)
      aio_poll()

In the scsi-disk coroutine, the qemu_co_mutex_lock() will never return
because the mirror coroutine is blocked in the aio_poll(blocking=true).

With this patch, the added qemu_coroutine_yield() allows the scsi-disk
coroutine to make progress as expected:

  mirror coroutine               scsi-disk coroutine
  -------------------------------------------------------------
  do last write

    qcow2:qemu_co_mutex_lock()
    ...
                                 scsi disk read

                                   tracked request begin

                                   qcow2:qemu_co_mutex_lock.enter

    qcow2:qemu_co_mutex_unlock()

  bdrv_drain.enter
>   schedule BH
>   qemu_coroutine_yield()
>                                  qcow2:qemu_co_mutex_lock.return
>                                  ...
                                   tracked request end
    ...
    (resumed from BH callback)
  bdrv_drain.return
  ...

Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1459855253-5378-2-git-send-email-famz@redhat.com
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-04-11 16:59:09 +01:00
Peter Maydell 31370dbe5d Block layer patches for 2.6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJXA9qJAAoJEH8JsnLIjy/W+W8P/2xTXH8+h0qurUvv5Rz6HUbD
 HHNlGnqa3M5yMLMqmtlb0J/9dj0pTlGNkp7d/9blh3MlZH/ZpeUMtq8Lro23YhUF
 0J94sRKGhK3T5GqYSA/BFbVvXQJ3yX7cKcYaQjmh7rK6Ua+65Mv/dulci+jbfGuu
 BkiVgumGAalSeaFqXZR685g61ZHbz+mQJnd3VFcvletnPBu0j1GMkuU0THAcy09q
 CTUjwWlL9CHu1lYkAa0KxgFtj6mZ+gEu5ws5Lvk8yFtSB+af/mJtzoHuq/7+Ske2
 7SiVXotFW8kR7ic1TnWiEku8+31FSBVJp6xUcRVTDOHVG7oSQxBDg5bGwPn8TxXy
 bvLvTJDIFodGhkiDTuGLuttvX+U2xCl4GmBS01OiFF53UGWgjjY+pkDZiaNC4nFW
 vwItj7/KGKL2Nq6cVfGCDOYYjFtHAPGI3yyJ2babXecv+9nKr0WeJpk6cfKVnP17
 rZs28Y3Ub/P2M4oOt4YdhRSanQZbe5eIQOsdfWX4q12hujL0zbsCtV6dpjeTsY74
 J4CBLzBYCj6y9Jc8R+D6XLYJBtJQGaSj99Oqe9WdUuHEqTGQt4HMuYHAj77wImVG
 ccURYiBpmB+FChLhG+yIlo1PHS0kpgeD+ZkZDHC0gYdiFqdKnFeQ7lj3Jj2tSWXY
 7Y7qbaOhtXp+20M6oO+G
 =ax/b
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches for 2.6

# gpg: Signature made Tue 05 Apr 2016 16:32:25 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream:
  crypto: Avoid memory leak on failure
  qemu-iotests: 149: Use "/usr/bin/env python"
  block: Forbid I/O throttling on nodes with multiple parents for 2.6
  block: forbid x-blockdev-del from acting on DriveInfo

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-05 17:03:32 +01:00
Eric Blake 95c3df5a24 crypto: Avoid memory leak on failure
Commit 7836857 introduced a memory leak due to invalid use of
Error vs. visit_type_end().  If visiting the intermediate
members fails, we clear the error and unconditionally use
visit_end_struct() on the same error object; but if that
cleanup succeeds, we then skip the qapi_free call.

Until a later patch adds visit_check_struct(), the only safe
approach is to use two separate error objects.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1459526222-30052-1-git-send-email-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-05 17:23:21 +02:00
Eric Blake a89ef0c357 nbd: don't request FUA on FLUSH
The NBD protocol does not clearly document what will happen
if a client sends NBD_CMD_FLAG_FUA on NBD_CMD_FLUSH.
Historically, both the qemu and upstream NBD servers silently
ignored that flag, but that feels a bit risky.  Meanwhile, the
qemu NBD client unconditionally sends the flag (without even
bothering to check whether the caller cares; at least with
NBD_CMD_WRITE the client only sends FUA if requested by a
higher layer).

There is ongoing discussion on the NBD list to fix the
protocol documentation to require that the server MUST ignore
the flag (unless the kernel folks can better explain what FUA
means for a flush), but until those doc improvements land, the
current nbd.git master was recently changed to reject the flag
with EINVAL (see nbd commit ab22e082), which now makes it
impossible for a qemu client to use FLUSH with an upstream NBD
server.

We should not send FUA with flush unless the upstream protocol
documents what it will do, and even then, it should be something
that the caller can opt into, rather than being unconditional.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459526902-32561-1-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Stefan Hajnoczi 0d94b74655 block/nfs: add missing #include "qemu/cutils.h"
parse_uint_full() used to be included from qemu-common.h but was moved
to qemu/cutils.h in commit f348b6d1a5
("util: move declarations out of qemu-common.h").

Cc: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1459341994-20567-3-git-send-email-stefanha@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-03-30 16:50:39 -04:00
Stefan Hajnoczi d165b8cb8b block/nfs: add missing #include "qapi/error.h"
error_setg() used to be included indirectly through qemu/osdep.h.  Since
commit da34e65cb4 ("include/qemu/osdep.h:
Don't include qapi/error.h") it requires an explicit include.

Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1459341994-20567-2-git-send-email-stefanha@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-03-30 16:50:39 -04:00
Max Reitz a90639270d block/null-{co,aio}: Implement get_block_status()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:16:04 +02:00
Max Reitz cd219eb1e5 block/null-{co,aio}: Allow reading zeroes
This is optional so that it does not impede the null block driver's
performance unless this behavior is desired.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:16:03 +02:00
Kevin Wolf 09cf9db1bc block: Remove bdrv_(set_)enable_write_cache()
The only remaining users were block jobs (mirror and backup) which
unconditionally enabled WCE on the BlockBackend of the target image. As
these block jobs don't go through BlockBackend for their I/O requests,
they aren't affected by this setting anyway but always get a writeback
mode, so that call can be removed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:03 +02:00
Kevin Wolf 61de4c6808 block: Remove BDRV_O_CACHE_WB
The previous patches have successively made blk->enable_write_cache the
true source for the information whether a writethrough mode must be
implemented. The corresponding BDRV_O_CACHE_WB is only useless baggage
we're carrying around, so now's the time to remove it.

At the same time, we remove the 'cache.writeback' option parsing on the
BDS level as the only effect was setting the BDRV_O_CACHE_WB flag.

This change requires test cases that explicitly enabled the option to
drop it. Other than that and the change of the error message when
writethrough is enabled on the BDS level (from "Can't set writethrough
mode" to "doesn't support the option"), there should be no change in
behaviour.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:03 +02:00
Kevin Wolf 5481531154 raw: Support BDRV_REQ_FUA
Pass through the FUA flag to the lower layer so that the separate flush
can be saved in practically relevant cases where a (raw) format driver
sits on top of the protocol driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf 2b556518c3 nbd: Support BDRV_REQ_FUA
The NBD server already used to send a FUA flag when the writethrough
mode was set. This code was a remnant from the times where protocol
drivers actually had to implement writethrough modes. Since nowadays the
block layer sends flushes in writethrough mode and non-root nodes are
always writeback, this was mostly dead code - only mostly because if NBD
was configured to be used without a format, we sent _both_ FUA and an
explicit flush afterwards, which makes the code not technically dead,
but useless overhead.

This patch changes the code so that the block layer's FUA flag is
recognised and translated into a NBD FUA flag. The additional flush is
avoided now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf 9f0eb9e129 iscsi: Support BDRV_REQ_FUA
This replaces the existing hack in the iscsi driver that sent the FUA
bit in writethrough mode and ignored the following flush in order to
optimise the number of roundtrips (see commit 73b5394e).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf 93f5e6d88a block: Introduce bdrv_co_writev_flags()
This function will allow drivers to implement BDRV_REQ_FUA natively
instead of sending a separate flush after the write.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf c83f9fba2a block/qapi: Use blk_enable_write_cache()
Now that WCE is handled on the BlockBackend level, the flag is
meaningless for BDSes. As the schema requires us to fill the field,
we return an enabled write cache for them.

Note that this means that querying the BlockBackend name may return
writethrough as the cache information, whereas querying the node-name of
the root of that same BlockBackend will return writeback.

This may appear odd at first, but it actually makes sense because it
correctly repesents the layer that implements the WCE handling. This
becomes more apparent when you consider nodes that are the root node of
multiple BlockBackends, where each BB can have its own WCE setting.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf bfd18d1e0b block: Move enable_write_cache to BB level
Whether a write cache is used or not is a decision that concerns the
user (e.g. the guest device) rather than the backend. It was already
logically part of the BB level as bdrv_move_feature_fields() always kept
it on top of the BDS tree; with this patch, the core of it (the actual
flag and the additional flushes) is also implemented there.

Direct callers of bdrv_open() must pass BDRV_O_CACHE_WB now if bs
doesn't have a BlockBackend attached.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf 855a6a93a1 block: Handle flush error in bdrv_pwrite_sync()
We don't want to silently ignore a flush error.

Also, there is little point in avoiding the flush for writethrough modes
and once WCE is moved to the BB layer, we definitely need the flush here
because bdrv_pwrite() won't involve one any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:01 +02:00
Kevin Wolf 72e775c7d9 block: Always set writeback mode in blk_new_open()
All callers of blk_new_open() either don't rely on the WCE bit set after
blk_new_open() because they explicitly set it anyway, or they pass
BDRV_O_CACHE_WB unconditionally.

This patch changes blk_new_open() so that it always enables writeback
mode and asserts that BDRV_O_CACHE_WB is clear. For those callers that
used to pass BDRV_O_CACHE_WB unconditionally, the flag is removed now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:01 +02:00
Pavel Dovgalyuk 63785678f3 replay: introduce block devices record/replay
This patch introduces block driver that implement recording
and replaying of block devices' operations.
All block completion operations are added to the queue.
Queue is flushed at checkpoints and information about processed requests
is recorded to the log. In replay phase the queue is matched with
events read from the log. Therefore block devices requests are processed
deterministically.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
[ kwolf: Rebased onto modified and already applied part of the series ]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:15:57 +02:00
Pavel Dovgalyuk c32b82afaf block: add flush callback
This patch adds callback for flush request. This callback is responsible
for flushing whole block devices stack. bdrv_flush function does not
proceed to underlying devices. It should be performed by this callback
function, if needed.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:12:15 +02:00
Daniel P. Berrange e6ff69bf5e block: move encryption deprecation warning into qcow code
For a couple of releases we have been warning

  Encrypted images are deprecated
  Support for them will be removed in a future release.
  You can use 'qemu-img convert' to convert your image to an unencrypted one.

This warning was issued by system emulators, qemu-img, qemu-nbd
and qemu-io. Such a broad warning was issued because the original
intention was to rip out all the code for dealing with encryption
inside the QEMU block layer APIs.

The new block encryption framework used for the LUKS driver does
not rely on the unloved block layer API for encryption keys,
instead using the QOM 'secret' object type. It is thus no longer
appropriate to warn about encryption unconditionally.

When the qcow/qcow2 drivers are converted to use the new encryption
framework too, it will be practical to keep AES-CBC support present
for use in qemu-img, qemu-io & qemu-nbd to allow for interoperability
with older QEMU versions and liberation of data from existing encrypted
qcow2 files.

This change moves the warning out of the generic block code and
into the qcow/qcow2 drivers. Further, the warning is set to only
appear when running the system emulators, since qemu-img, qemu-io,
qemu-nbd are expected to support qcow2 encryption long term now that
the maint burden has been eliminated.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:12:15 +02:00
Daniel P. Berrange 78368575a6 block: add generic full disk encryption driver
Add a block driver that is capable of supporting any full disk
encryption format. This utilizes the previously added block
encryption code, and at this time supports the LUKS format.

The driver code is capable of supporting any format supported
by the QCryptoBlock module, so it registers one block driver
for each format. This patch only registers the "luks" driver
since the "qcow" driver is there only for back-compatibility
with existing qcow built-in encryption.

New LUKS compatible volumes can be formatted using qemu-img
with defaults for all settings.

$ qemu-img create --object secret,data=123456,id=sec0 \
      -f luks -o key-secret=sec0 demo.luks 10G

Alternatively the cryptographic settings can be explicitly
set

$ qemu-img create --object secret,data=123456,id=sec0 \
      -f luks -o key-secret=sec0,cipher-alg=aes-256,\
                 cipher-mode=cbc,ivgen-alg=plain64,hash-alg=sha256 \
      demo.luks 10G

And query its size

$ qemu-img info demo.img
image: demo.img
file format: luks
virtual size: 10G (10737418240 bytes)
disk size: 132K
encrypted: yes

Note that it was not necessary to provide the password
when querying info for the volume. The password is only
required when performing I/O on the volume

All volumes created by this new 'luks' driver should be
capable of being opened by the kernel dm-crypt driver.

The only algorithms listed in the LUKS spec that are
not currently supported by this impl are sha512 and
ripemd160 hashes and cast6 cipher.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
[ kwolf - Added #include to resolve conflict with da34e65c ]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:11:26 +02:00
Daniel P. Berrange abb06c5ac1 block: add flag to indicate that no I/O will be performed
When opening an image it is useful to know whether the caller
intends to perform I/O on the image or not. In the case of
encrypted images this will allow the block driver to avoid
having to prompt for decryption keys when we merely want to
query header metadata about the image. eg qemu-img info

This flag is enforced at the top level only, since even if
we don't want todo I/O on the 'qcow2' file payload, the
underlying 'file' driver will still need todo I/O to read
the qcow2 header, for example.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Max Reitz 5430215699 block/qapi: Pass bdrv_query_blk_stats() s->stats
bdrv_query_blk_stats() does not need access to all of BlockStats,
BlockDeviceStats is enough and is what this function is actually
supposed to fill.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Max Reitz 0e8f44bee9 block/qapi: Set s->device in bdrv_query_stats()
This is the only instance of bdrv_query_blk_stats() accessing anything
in the BlockStats structure other than s->stats, so let us move it to
its caller (where it makes just as much sense) allowing us to make
bdrv_query_blk_stats() take a pointer to the BlockDeviceStats instead of
BlockStats.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Peter Xu 5eda622768 block/qapi: fix unbounded stack for dump_qdict
Using heap instead of stack for better safety.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Peter Xu 853ccfed8f block/qapi: make two printf() formats literal
Fix two places to use literal printf format when possible.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Kevin Wolf 72f41b6fbd block: Remove blk_set_bs()
The function is unused since commit f21d96d0 ('block: Use BdrvChild in
BlockBackend').

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-30 11:59:32 +02:00
Programmingkid d0855f1235 block/raw-posix.c: Make physical devices usable in QEMU under Mac OS X host
Mac OS X can be picky when it comes to allowing the user
to use physical devices in QEMU. Most mounted volumes
appear to be off limits to QEMU. If an issue is detected,
a message is displayed showing the user how to unmount a
volume. Now QEMU uses both CD and DVD media.

Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Peter Maydell 553934db66 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQIcBAABAgAGBQJW+dDJAAoJEL2+eyfA3jBXk6oP/R/zX4foUVFMTvDbxHwWc41t
 gXGk1BpIjFnteab/tzUBDIdgs/DPxzM6bClhe45gNInVBgnOyeVmpUwRGGNYQKbn
 FdkrAcC6Vy6BJv+xRTMMS+h4i6ebJ6HqqQPwkz0VulxsAknDPQsBebe0tM8uO7k9
 G+ccMYOyUUiGTIRC3pBkRCu8APEialPSv3MpUTMtp71R3US+pEwmo1AgyOFq/lDu
 B/8LUBoR48XCEGfOA6ZixzoMwF1lTWpezx5/KF+fQ26sgnNzjpwYWnJk+LG7Gtvj
 8PHYsHDoXSISlIgxzLpS0AA6s54+mutgIeNJG5FBXGrSSNlAB1+cKZsnZw42YjfI
 BVIHQkmcGT+h9UEDdekiOfQorypSYRm51ueTGO/lUbxNifvJ5LQA97F0G/filoCj
 ovGIfOwgpWaEBPCb//U1TRGhhTg+dNyCeC4GoxDEFyWmLPYp8p7Xtz+vsZOIdH4O
 Wl9i6BzzeNEgJyutKqn2qpNLl6Pfd548MOJJqAUkGxDGrCJMkmn2lJSpSSji6cdm
 y4Az/tPY0/xpxwjSRakaIMOlhDoGXmrQG+I6JG1TZLSH7x1+Ajhr2ryx4CBONceV
 1quibAqoG1GwxCyYn7dv4aeJrDlg3XzEWQW6nJhuE91d9ZH+jF5u2+i+IZcQCDBe
 Cd6d0SZlcOnq3M5LiOrA
 =1ekF
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging

# gpg: Signature made Tue 29 Mar 2016 01:48:09 BST using RSA key ID C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"

* remotes/cody/tags/block-pull-request:
  qemu-iotests: add no-op streaming test
  qemu-iotests: fix test_stream_partial()
  block: never cancel a streaming job without running stream_complete()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-29 19:54:49 +01:00
Alberto Garcia 6578629e08 block: never cancel a streaming job without running stream_complete()
We need to call stream_complete() in order to do all the necessary
clean-ups, even if there's an early failure. At the moment it's only
useful to make sure that s->backing_file_str is not leaked, but it
will become more important if we introduce support for streaming to
any intermediate node.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 2abedf2debc65c250560237f31a8e6756883c8fc.1458566441.git.berto@igalia.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-03-28 13:56:44 -04:00
Veronia Bahaa f348b6d1a5 util: move declarations out of qemu-common.h
Move declarations out of qemu-common.h for functions declared in
utils/ files: e.g. include/qemu/path.h for utils/path.c.
Move inline functions out of qemu-common.h and into new files (e.g.
include/qemu/bcd.h)

Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Rutuja Shah 73bcb24d93 Replaced get_tick_per_sec() by NANOSECONDS_PER_SECOND
This patch replaces get_ticks_per_sec() calls with the macro
NANOSECONDS_PER_SECOND. Also, as there are no callers, get_ticks_per_sec()
is then removed.  This replacement improves the readability and
understandability of code.

For example,

    timer_mod(fdctrl->result_timer,
	      qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + (get_ticks_per_sec() / 50));

NANOSECONDS_PER_SECOND makes it obvious that qemu_clock_get_ns
matches the unit of the expression on the right side of the plus.

Signed-off-by: Rutuja Shah <rutu.shah.26@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Markus Armbruster daf015ef5a include/qemu/iov.h: Don't include qemu-common.h
qemu-common.h should only be included by .c files.  Its file comment
explains why: "No header file should depend on qemu-common.h, as this
would easily lead to circular header dependencies."

qemu/iov.h includes qemu-common.h for QEMUIOVector stuff.  Move all
that to qemu/iov.h and drop the ill-advised include.  Include
qemu/iov.h where the QEMUIOVector stuff is now missing.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:16 +01:00
Markus Armbruster da34e65cb4 include/qemu/osdep.h: Don't include qapi/error.h
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef.  Since then, we've moved to include qemu/osdep.h
everywhere.  Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h.  That's in excess of
100KiB of crap most .c files don't actually need.

Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h.  Include qapi/error.h in .c files that need it and don't
get it now.  Include qapi-types.h in qom/object.h for uint16List.

Update scripts/clean-includes accordingly.  Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h.  Update the list of includes in the qemu/osdep.h
comment quoted above similarly.

This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third.  Unfortunately, the number depending on
qapi-types.h shrinks only a little.  More work is needed for that one.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:15 +01:00
Eric Blake 32bafa8fdd qapi: Don't special-case simple union wrappers
Simple unions were carrying a special case that hid their 'data'
QMP member from the resulting C struct, via the hack method
QAPISchemaObjectTypeVariant.simple_union_type().  But by using
the work we started by unboxing flat union and alternate
branches, coupled with the ability to visit the members of an
implicit type, we can now expose the simple union's implicit
type in qapi-types.h:

| struct q_obj_ImageInfoSpecificQCow2_wrapper {
|     ImageInfoSpecificQCow2 *data;
| };
|
| struct q_obj_ImageInfoSpecificVmdk_wrapper {
|     ImageInfoSpecificVmdk *data;
| };
...
| struct ImageInfoSpecific {
|     ImageInfoSpecificKind type;
|     union { /* union tag is @type */
|         void *data;
|-        ImageInfoSpecificQCow2 *qcow2;
|-        ImageInfoSpecificVmdk *vmdk;
|+        q_obj_ImageInfoSpecificQCow2_wrapper qcow2;
|+        q_obj_ImageInfoSpecificVmdk_wrapper vmdk;
|     } u;
| };

Doing this removes asymmetry between QAPI's QMP side and its
C side (both sides now expose 'data'), and means that the
treatment of a simple union as sugar for a flat union is now
equivalent in both languages (previously the two approaches used
a different layer of dereferencing, where the simple union could
be converted to a flat union with equivalent C layout but
different {} on the wire, or to an equivalent QMP wire form
but with different C representation).  Using the implicit type
also lets us get rid of the simple_union_type() hack.

Of course, now all clients of simple unions have to adjust from
using su->u.member to using su->u.member.data; while this touches
a number of files in the tree, some earlier cleanup patches
helped minimize the change to the initialization of a temporary
variable rather than every single member access.  The generated
qapi-visit.c code is also affected by the layout change:

|@@ -7393,10 +7393,10 @@ void visit_type_ImageInfoSpecific_member
|     }
|     switch (obj->type) {
|     case IMAGE_INFO_SPECIFIC_KIND_QCOW2:
|-        visit_type_ImageInfoSpecificQCow2(v, "data", &obj->u.qcow2, &err);
|+        visit_type_q_obj_ImageInfoSpecificQCow2_wrapper_members(v, &obj->u.qcow2, &err);
|         break;
|     case IMAGE_INFO_SPECIFIC_KIND_VMDK:
|-        visit_type_ImageInfoSpecificVmdk(v, "data", &obj->u.vmdk, &err);
|+        visit_type_q_obj_ImageInfoSpecificVmdk_wrapper_members(v, &obj->u.vmdk, &err);
|         break;
|     default:
|         abort();

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-13-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:26 +01:00
Alberto Garcia 6049490df4 quorum: Emit QUORUM_REPORT_BAD for reads in fifo mode
If there's an I/O error in one of Quorum children then QEMU
should emit QUORUM_REPORT_BAD. However this is not working with
read-pattern=fifo. This patch fixes this problem.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: d57e39e8d3e8564003a1e2aadbd29c97286eb2d2.1458034554.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-17 16:43:30 +01:00
Kevin Wolf 8896e08814 block: Use blk_co_pwritev() in blk_co_write_zeroes()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 16:30:00 +01:00
Kevin Wolf 57d6a42883 block: Use blk_aio_prwv() for aio_read/write/write_zeroes
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 16:30:00 +01:00
Kevin Wolf a55d3fba99 block: Use blk_prw() in blk_pread()/blk_pwrite()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf fc1453cdfc block: Use blk_co_pwritev() in blk_write_zeroes()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf 5bd5119667 block: Pull up blk_read_unthrottled() implementation
Use blk_read(), so that it goes through blk_co_preadv() like all read
requests from the BB to the BDS.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf a8823a3bfd block: Use blk_co_pwritev() for blk_write()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf 1bf1cbc91f block: Use blk_co_preadv() for blk_read()
This patch introduces blk_co_preadv() as a central function on the
BlockBackend level that is supposed to handle all read requests from the
BB to its root BDS eventually.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf f21d96d04b block: Use BdrvChild in BlockBackend
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Max Reitz 981f4f578e block: Add blk_next_root_bs()
This function iterates over all BDSs attached to a BB. We are going to
need it when rewriting bdrv_next() so it no longer uses bdrv_states.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz fe1a9cbc33 block: Move some bdrv_*_all() functions to BB
Move bdrv_commit_all() and bdrv_flush_all() to the BlockBackend level.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz 7c735873d9 blockdev: Remove blk_hide_on_behalf_of_hmp_drive_del()
We can basically inline it in hmp_drive_del(); monitor_remove_blk() is
called already, so we just need to call bdrv_make_anon(), too.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz efaa7c4eeb blockdev: Split monitor reference from BB creation
Before this patch, blk_new() automatically assigned a name to the new
BlockBackend and considered it referenced by the monitor. This patch
removes the implicit monitor_add_blk() call from blk_new() (and
consequently the monitor_remove_blk() call from blk_delete(), too) and
thus blk_new() (and related functions) no longer take a BB name
argument.

In fact, there is only a single point where blk_new()/blk_new_open() is
called and the new BB is monitor-owned, and that is in blockdev_init().
Besides thus relieving us from having to invent names for all of the BBs
we use in qemu-img, this fixes a bug where qemu cannot create a new
image if there already is a monitor-owned BB named "image".

If a BB and its BDS tree are created in a single operation, as of this
patch the BDS tree will be created before the BB is given a name
(whereas it was the other way around before). This results in minor
change to the output of iotest 087, whose reference output is amended
accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz e5e785500b blockdev: Separate BB name management
Introduce separate functions (monitor_add_blk() and
monitor_remove_blk()) which set or unset a BB name. Since the name is
equivalent to the monitor's reference to a BB, adding a name the same as
declaring the BB to be monitor-owned and removing it revokes this
status, hence the function names.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz 2cf22d6a1a blockdev: Add list of all BlockBackends
While monitor_block_backends contains nearly all BBs, we sometimes
really need all BBs. To this end, this patch adds the block_backend
list.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz 9492b0b928 blockdev: Rename blk_backends
The blk_backends list does not contain all BlockBackends but only the
ones which are referenced by the monitor, and that is not necessarily
true for every BlockBackend. Rename the list to monitor_block_backends
to make that fact clear.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz a55448b368 qapi: Drop QERR_UNKNOWN_BLOCK_FORMAT_FEATURE
Just specifying a custom string is simpler in basically all places that
used it, and in addition, specifying the BB or node name is something we
generally do not do in other error messages when opening a BDS, so we
should not do it here.

This changes the output for iotest 036 (to the better, in my opinion),
so the reference output needs to be changed accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz 1393f21270 block: Add blk_commit_all()
Later, we will remove bdrv_commit_all() and move its contents here, and
in order to replace bdrv_commit_all() calls by calls to blk_commit_all()
before doing so, we need to add it as an alias now.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz 74d1b8fc27 block: Use blk_next() in block-backend.c
Instead of iterating directly through blk_backends, we can use
blk_next() instead. This gives us some abstraction from the list itself
which we can use to rename it, for example.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Stefan Hajnoczi 1f3ddfcb25 Revert "qed: Implement .bdrv_drain"
This reverts commit df9a681dc9.

Note that commit df9a681dc9 included some
unrelated hunks, possibly due to a merge failure or an overlooked
squash.  This only reverts the qed .bdrv_drain() implementation.

The qed .bdrv_drain() implementation is unsafe and can lead to a double
request completion.

Paolo Bonzini reports:
"The problem is that bdrv_qed_drain calls qed_plug_allocating_write_reqs
unconditionally, but this is not correct if an allocating write is
queued.  In this case, qed_unplug_allocating_write_reqs will restart the
allocating write and possibly cause it to complete.  The aiocb however
is still in use for the L2/L1 table writes, and will then be completed
again as soon as the table writes are stable."

For QEMU 2.6 we can simply revert this commit.  A full solution for the
qed need check timer may be added if the bdrv_drain() implementation is
extended.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1457431876-8475-1-git-send-email-stefanha@redhat.com
2016-03-17 09:50:14 +00:00
Jeff Cody 03c698f0a2 block/sheepdog: fix argument passed to qemu_strtoul()
The function qemu_strtoul() reads 'unsigned long' sized data,
which is larger than uint32_t on 64-bit machines.

Even though the snap_id field in the header is 32-bits, we must
accommodate the full size in qemu_strtoul().

This patch also adds more meaningful error handling to the
qemu_strtoul() call, and subsequent results.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Message-id: e56fc50abedd9a112e0683342c8eafda063cd2f9.1456935548.git.jcody@redhat.com
2016-03-16 13:25:29 -04:00
Alberto Garcia b9c600d207 quorum: Fix crash in quorum_aio_cb()
quorum_aio_cb() emits the QUORUM_REPORT_BAD event if there's
an I/O error in a Quorum child. However sacb->aiocb must be
correctly initialized for this to happen. read_quorum_children() and
read_fifo_child() are not doing this, which results in a QEMU crash.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 8138570d071ba7e25db3736979234a1fd71dbd05.1457610443.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14 17:35:06 +01:00
Fam Zheng ebab225910 block: Move block dirty bitmap code to separate files
The only code change is making bdrv_dirty_bitmap_truncate public. It is
used in block.c.

Also two long lines (bdrv_get_dirty) are wrapped.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-5-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14 17:35:05 +01:00
Fam Zheng b2f56462d5 backup: Use Bitmap to replace "s->bitmap"
"s->bitmap" tracks done sectors, we only check bit states without using any
iterator which HBitmap is good for. Switch to "Bitmap" which is simpler and
more memory efficient.

Meanwhile, rename it to done_bitmap, to reflect the intention.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-2-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14 17:35:05 +01:00
Kevin Wolf b8f45cdf78 vpc: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:44 +01:00
Kevin Wolf c4bea1690e vmdk: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf 10bf03af12 vhdx: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf a08f0c3b5f vdi: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf fba98d455a sheepdog: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf 8a56fdadaf qed: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf 23588797b6 qcow2: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf 6af4016020 qcow: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf 8942764f54 parallels: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf c10c9d9615 block: Introduce blk_set_allow_write_beyond_eof()
We check that the guest can't write beyond the end of its disk, but for
other internal users it can make sense to allow growing a file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf 6340472c54 block: Use writeback in .bdrv_create() implementations
There's no reason to use a writethrough cache mode while creating an
image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Fam Zheng 71968dbfd8 vmdk: Switch to heap arrays for vmdk_parent_open
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Fam Zheng 5997c210b9 vmdk: Switch to heap arrays for vmdk_read_cid
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Fam Zheng 965415eb20 vmdk: Switch to heap arrays for vmdk_write_cid
It is only called once for each opened image, so we can do it the easy
way.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Changlong Xie 924e8a2bbc quorum: modify vote rules for flush operation
Keep flush interface the same logic as quorum read/write, Otherwise in
following scenario, we'll encounter unexpected errors.

Quorum has two children(A, B). A do flush sucessfully, but B flush failed.
This cause the filesystem of guest become read-only with following errors:

end_request: I/O error, dev vda, sector 11159960
Aborting journal on device vda3-8
EXT4-fs error (device vda3): ext4_journal_start_sb:327: Detected abort journal
EXT4-fs (vda3): Remounting filesystem read-only

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Changlong Xie 0ae053b7e1 qmp event: Refactor QUORUM_REPORT_BAD
Introduce QuorumOpType, and make QUORUM_REPORT_BAD compatible
with it.

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Jeff Cody fb9245c261 block/vpc: give option to force the current_size field in .bdrv_create
When QEMU creates a VHD image, it goes by the original spec,
calculating the current_size based on the nearest CHS geometry (with an
exception for disks > 127GB).

Apparently, Azure will only allow images that are sized to the nearest
MB, and the current_size as calculated from CHS cannot guarantee that.

Allow QEMU to create images similar to how Hyper-V creates images, by
setting current_size to the specified virtual disk size.  This
introduces an option, force_size, to be passed to the vpc format during
image creation, e.g.:

    qemu-img convert -f raw -o force_size -O vpc test.img test.vhd

When using the "force_size" option, the creator app field used by
QEMU will be "qem2" instead of "qemu", to indicate the difference.
In light of this, we also add parsing of the "qem2" field during
vpc_open.

Bug reference: https://bugs.launchpad.net/qemu/+bug/1490611

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:42 +01:00
Jeff Cody c540d53ac8 block/vpc: choose size calculation method based on creator_app field
The VHD file format is used by both Virtual PC, and Hyper-V.  However,
how the virtual disk size is calculated varies between the two.

Virtual PC uses the CHS drive parameters to determine the drive size.
Hyper-V, on the other hand, uses the current_size field in the footer
when determining image size.

This is problematic for a few reasons:

* VHD images from Hyper-V, using CHS calculations, will likely be
  trunctated.

* If we just rely always on current_size, then QEMU may have data
  compatibility issues with Virtual PC (we may write too much data
  into a VHD file to be used by Virtual PC, for instance).

* Existing VHD images created by QEMU have used the CHS calculations,
  except for images exceeding the 127GB limit.  We want to remain
  compatible with our own generated images.

Luckily, the VHD specification defines a 'Creator App' field, that is
used to indicate what software created the VHD file.

This patch does two things:

    1. Uses the 'Creator App' field to help determine how to calculate
       size, and

    2. Adds a VPC format option 'force_size_calc', so that the user can
       override the 'Creator App' auto-detection, in case there exist
       VHD images with unknown or contradictory 'Creator App' entries.

N.B.: We currently use the maximum CHS value as an indication to use the
current_size field.  This patch does not change that, even with the
'force_size_calc' option.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:42 +01:00
Kevin Wolf c21cc6ca98 block/qapi: Include empty drives in query-blockstats
Since commit 5ec18f8c, query-blockstats didn't return the statistics of
drives without media any more because such drives have only a BB now,
but not a BDS any more.

This patch fixes the regression so that query-blockstats iterates over
BBs by default and empty drives are displayed again.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-14 16:46:42 +01:00
Kevin Wolf b07363a1a3 block/qapi: Factor out bdrv_query_bds_stats()
The new functions handles the data that is taken from the
BlockDriverState.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-14 16:46:42 +01:00
Kevin Wolf 2b77e60ab8 block/qapi: Factor out bdrv_query_blk_stats()
The new functions handles the data that is taken from the BlockBackend.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-14 16:46:42 +01:00
Daniel P. Berrange b16a44e13e osdep: remove use of socket_error() from all code
Now that QEMU wraps the Win32 sockets methods to automatically
set errno upon failure, there is no reason for callers to use
the socket_error() method. They can rely on accessing errno
even on Win32. Remove all use of socket_error() from general
code, leaving it as a static method in oslib-win32.c only.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:19:34 +00:00
Eric Blake 0399293e5b util: Shorten references into SocketAddress
An upcoming patch will alter how simple unions, like SocketAddress,
are laid out, which will impact all lines of the form 'addr->u.XXX'
(expanding it to the longer 'addr->u.XXX.data').  For better
legibility in that patch, and less need for line wrapping, it's better
to use a temporary variable to reduce the effect of a layout change to
just the variable initializations, rather than every reference within
a SocketAddress.  Also, take advantage of some C99 initialization where
it makes sense (simplifying g_new0() to g_new()).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1457021813-10704-7-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-05 10:41:52 +01:00
John Snow 4c9bca7e39 block/backup: avoid copying less than full target clusters
During incremental backups, if the target has a cluster size that is
larger than the backup cluster size and we are backing up to a target
that cannot (for whichever reason) pull clusters up from a backing image,
we may inadvertantly create unusable incremental backup images.

For example:

If the bitmap tracks changes at a 64KB granularity and we transmit 64KB
of data at a time but the target uses a 128KB cluster size, it is
possible that only half of a target cluster will be recognized as dirty
by the backup block job. When the cluster is allocated on the target
image but only half populated with data, we lose the ability to
distinguish between zero padding and uninitialized data.

This does not happen if the target image has a backing file that points
to the last known good backup.

Even if we have a backing file, though, it's likely going to be faster
to just buffer the redundant data ourselves from the live image than
fetching it from the backing file, so let's just always round up to the
target granularity.

The same logic applies to backup modes top, none, and full. Copying
fractional clusters without the guarantee of COW is dangerous, but even
if we can rely on COW, it's likely better to just re-copy the data.

Reported-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1456433911-24718-3-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:55:14 -05:00
John Snow 16096a4d47 block/backup: make backup cluster size configurable
64K might not always be appropriate, make this a runtime value.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1456433911-24718-2-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:55:14 -05:00
Fam Zheng 21cd917ff5 mirror: Add mirror_wait_for_io
The three lines are duplicated a number of times now, refactor a
function.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1454637630-10585-3-git-send-email-famz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Fam Zheng e5b43573e2 mirror: Rewrite mirror_iteration
The "pnum < nb_sectors" condition in deciding whether to actually copy
data is unnecessarily strict, and the qiov initialization is
unnecessarily for bdrv_aio_write_zeroes and bdrv_aio_discard.

Rewrite mirror_iteration to fix both flaws.

The output of iotests 109 is updated because we now report the offset
and len slightly differently in mirroring progress.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1454637630-10585-2-git-send-email-famz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Max Reitz 04a3615860 vhdx: Simplify vhdx_set_shift_bits()
For values which are powers of two (and we do assume all of these to
be), sizeof(x) * 8 - 1 - clz(x) == ctz(x). Therefore, use ctz().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 1450451066-13335-3-git-send-email-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Max Reitz 939901dcd2 vhdx: DIV_ROUND_UP() in vhdx_calc_bat_entries()
We have DIV_ROUND_UP(), so we can use it to produce more easily readable
code. It may be slower than the bit shifting currently performed
(because it actually performs a division), but since
vhdx_calc_bat_entries() is never used in a hot path, this is completely
fine.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 1450451066-13335-2-git-send-email-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Daniel P. Berrange b189346eb1 iscsi: add support for getting CHAP password via QCryptoSecret API
The iSCSI driver currently accepts the CHAP password in plain text
as a block driver property. This change adds a new "password-secret"
property that accepts the ID of a QCryptoSecret instance.

  $QEMU \
     -object secret,id=sec0,filename=/home/berrange/example.pw \
     -drive driver=iscsi,url=iscsi://example.com/target-foo/lun1,\
            user=dan,password-secret=sec0

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1453385961-10718-4-git-send-email-berrange@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Daniel P. Berrange 1bff960642 curl: add support for HTTP authentication parameters
If connecting to a web server which has authentication
turned on, QEMU gets a 401 as curl has not been configured
with any authentication credentials.

This adds 4 new parameters to the curl block driver
options 'username', 'password-secret', 'proxy-username'
and 'proxy-password-secret'. Passwords are provided using
the recently added 'secret' object type

 $QEMU \
     -object secret,id=sec0,filename=/home/berrange/example.pw \
     -object secret,id=sec1,filename=/home/berrange/proxy.pw \
     -drive driver=http,url=http://example.com/some.img,\
            username=dan,password-secret=sec0,\
            proxy-username=dan,proxy-password-secret=sec1

Of course it is possible to use the same secret for both the
proxy & server passwords if desired, or omit the proxy auth
details, or the server auth details as required.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1453385961-10718-3-git-send-email-berrange@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Daniel P. Berrange 60390a2192 rbd: add support for getting password from QCryptoSecret object
Currently RBD passwords must be provided on the command line
via

  $QEMU -drive file=rbd:pool/image:id=myname:\
               key=QVFDVm41aE82SHpGQWhBQXEwTkN2OGp0SmNJY0UrSE9CbE1RMUE=:\
               auth_supported=cephx

This is insecure because the key is visible in the OS process
listing.

This adds support for an 'password-secret' parameter in the RBD
parameters that can be used with the QCryptoSecret object to
provide the password via a file:

  echo "QVFDVm41aE82SHpGQWhBQXEwTkN2OGp0SmNJY0UrSE9CbE1RMUE=" > poolkey.b64
  $QEMU -object secret,id=secret0,file=poolkey.b64,format=base64 \
        -drive driver=rbd,filename=rbd:pool/image:id=myname:\
               auth_supported=cephx,password-secret=secret0

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1453385961-10718-2-git-send-email-berrange@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:30 -05:00
Vasiliy Tolstov eab8eb8db3 sheepdog: allow to delete snapshot
This patch implements a blockdriver function bdrv_snapshot_delete() in
the sheepdog driver. With the new function, snapshots of sheepdog can
be deleted from libvirt.

Cc: Jeff Cody <jcody@redhat.com>
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Vasiliy Tolstov <v.tolstov@selfip.ru>
Message-id: 1450873346-22334-1-git-send-email-mitake.hitoshi@lab.ntt.co.jp
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:30 -05:00
Peter Lieven 7725b8bf12 block/nfs: add support for setting debug level
recent libnfs versions support logging debug messages. Add
support for it in qemu through an URL parameter.

Example:
 qemu -cdrom nfs://127.0.0.1/iso/my.iso?debug=2

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1447052973-14513-1-git-send-email-pl@kamp.de
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:30 -05:00
Alberto Garcia 398befdf50 qapi: Add burst length fields to BlockDeviceInfo
This patch adds the new bps_*_max_length and iops_*_max_length
parameters to the BlockDeviceInfo struct.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-22 14:08:06 +01:00
Changlong Xie f38738e212 quorum: fix segfault when read fails in fifo mode
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-22 09:49:46 +01:00
Daniel P. Berrange 75822a12c0 nbd: enable use of TLS with NBD block driver
This modifies the NBD driver so that it is possible to request
use of TLS. This is done by providing the 'tls-creds' parameter
with the ID of a previously created QCryptoTLSCreds object.

For example

  $QEMU -object tls-creds-x509,id=tls0,endpoint=client,\
                dir=/home/berrange/security/qemutls \
        -drive driver=nbd,host=localhost,port=9000,tls-creds=tls0

The client will drop the connection if the NBD server does not
provide TLS.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-15-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-16 17:16:33 +01:00
Daniel P. Berrange f95910fe6b nbd: implement TLS support in the protocol negotiation
This extends the NBD protocol handling code so that it is capable
of negotiating TLS support during the connection setup. This involves
requesting the STARTTLS protocol option before any other NBD options.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-14-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-16 17:16:28 +01:00
Daniel P. Berrange 1c778ef729 nbd: convert to using I/O channels for actual socket I/O
Now that all callers are converted to use I/O channels for
initial connection setup, it is possible to switch the core
NBD protocol handling core over to use QIOChannel APIs for
actual sockets I/O.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-7-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-16 17:13:57 +01:00
Daniel P. Berrange 064097d919 nbd: convert block client to use I/O channels for connection setup
This converts the NBD block driver client to use the QIOChannelSocket
class for initial connection setup. The NbdClientSession struct has
two pointers, one to the master QIOChannelSocket providing the raw
data channel, and one to a QIOChannel which is the current channel
used for I/O. Initially the two point to the same object, but when
TLS support is added, they will point to different objects.

The qemu-img & qemu-io tools now need to use MODULE_INIT_QOM to
ensure the QIOChannel object classes are registered. The qemu-nbd
tool already did this.

In this initial conversion though, all I/O is still actually done
using the raw POSIX sockets APIs.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1455129674-17255-4-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-16 17:13:22 +01:00
Peter Maydell f075c89f0a -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQEcBAABAgAGBQJWugGdAAoJEJykq7OBq3PIE5cH/RY0iIsvDU4dD05Xs/xlDPNL
 0lPwqmYIhGPCYdEQIhLsnf8bfYkyXGaS/bgWFzKk6mb4XI6HhcDdEoOXOSa3Rreq
 bm58ngE9xwC5ZiDuBFxzMXuoU/+iAU/YHiTxL84JaMpBjUQsUCvHlLMVXJXGphzG
 pXE5neYuy+MftiZ5f0nlVaIGCdd9rHWkKMgpViR0Xanx1cZe0TSkh12qlCRLXNJG
 SHDq2wNrFp7lV4IK2evcU59GvJ5tfXyaDEh7yCMrvMUFM96jBjSgtgKbuE+t4VWi
 6Awli6z9OvygSu0s/GuJ7yW41RSiTv/Qu1TagltkNjuSB5Gzw3lTgmorgVlLS+Y=
 =PZTe
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

# gpg: Signature made Tue 09 Feb 2016 15:11:25 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request:
  block: add missing call to bdrv_drain_recurse
  blockjob: Fix hang in block_job_finish_sync
  iov: avoid memcpy for "simple" iov_from_buf/iov_to_buf

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-02-09 17:56:46 +00:00
Paolo Bonzini 9dcf8ecd9e block: add missing call to bdrv_drain_recurse
This is also needed in bdrv_drain_all, not just in bdrv_drain.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1450867706-19860-3-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-02-09 13:52:26 +00:00
Eric Blake 51e72bc1dd qapi: Swap visit_* arguments for consistent 'name' placement
JSON uses "name":value, but many of our visitor interfaces were
called with visit_type_FOO(v, &value, name, errp).  This can be
a bit confusing to have to mentally swap the parameter order to
match JSON order.  It's particularly bad for visit_start_struct(),
where the 'name' parameter is smack in the middle of the
otherwise-related group of 'obj, kind, size' parameters! It's
time to do a global swap of the parameter ordering, so that the
'name' parameter is always immediately after the Visitor argument.

Additional reason in favor of the swap: the existing include/qjson.h
prefers listing 'name' first in json_prop_*(), and I have plans to
unify that file with the qapi visitors; listing 'name' first in
qapi will minimize churn to the (admittedly few) qjson.h clients.

Later patches will then fix docs, object.h, visitor-impl.h, and
those clients to match.

Done by first patching scripts/qapi*.py by hand to make generated
files do what I want, then by running the following Coccinelle
script to affect the rest of the code base:
 $ spatch --sp-file script `git grep -l '\bvisit_' -- '**/*.[ch]'`
I then had to apply some touchups (Coccinelle insisted on TAB
indentation in visitor.h, and botched the signature of
visit_type_enum() by rewriting 'const char *const strings[]' to
the syntactically invalid 'const char*const[] strings').  The
movement of parameters is sufficient to provoke compiler errors
if any callers were missed.

    // Part 1: Swap declaration order
    @@
    type TV, TErr, TObj, T1, T2;
    identifier OBJ, ARG1, ARG2;
    @@
     void visit_start_struct
    -(TV v, TObj OBJ, T1 ARG1, const char *name, T2 ARG2, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp)
     { ... }

    @@
    type bool, TV, T1;
    identifier ARG1;
    @@
     bool visit_optional
    -(TV v, T1 ARG1, const char *name)
    +(TV v, const char *name, T1 ARG1)
     { ... }

    @@
    type TV, TErr, TObj, T1;
    identifier OBJ, ARG1;
    @@
     void visit_get_next_type
    -(TV v, TObj OBJ, T1 ARG1, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, TErr errp)
     { ... }

    @@
    type TV, TErr, TObj, T1, T2;
    identifier OBJ, ARG1, ARG2;
    @@
     void visit_type_enum
    -(TV v, TObj OBJ, T1 ARG1, T2 ARG2, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp)
     { ... }

    @@
    type TV, TErr, TObj;
    identifier OBJ;
    identifier VISIT_TYPE =~ "^visit_type_";
    @@
     void VISIT_TYPE
    -(TV v, TObj OBJ, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, TErr errp)
     { ... }

    // Part 2: swap caller order
    @@
    expression V, NAME, OBJ, ARG1, ARG2, ERR;
    identifier VISIT_TYPE =~ "^visit_type_";
    @@
    (
    -visit_start_struct(V, OBJ, ARG1, NAME, ARG2, ERR)
    +visit_start_struct(V, NAME, OBJ, ARG1, ARG2, ERR)
    |
    -visit_optional(V, ARG1, NAME)
    +visit_optional(V, NAME, ARG1)
    |
    -visit_get_next_type(V, OBJ, ARG1, NAME, ERR)
    +visit_get_next_type(V, NAME, OBJ, ARG1, ERR)
    |
    -visit_type_enum(V, OBJ, ARG1, ARG2, NAME, ERR)
    +visit_type_enum(V, NAME, OBJ, ARG1, ARG2, ERR)
    |
    -VISIT_TYPE(V, OBJ, NAME, ERR)
    +VISIT_TYPE(V, NAME, OBJ, ERR)
    )

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1454075341-13658-19-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-02-08 17:29:56 +01:00
Fam Zheng ac987b30d0 block: Use returned *file in bdrv_co_get_block_status
Now that all drivers return the right "file" pointer, we can use it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1453780743-16806-14-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng e0f100f57c vmdk: Return extent's file in bdrv_get_block_status
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-13-git-send-email-famz@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng d0a18f1025 vmdk: Fix calculation of block status's offset
"offset" is the offset of cluster and sector_num doesn't necessarily
refer to the start of it, it should add index_in_cluster.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-12-git-send-email-famz@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 7429e20788 vpc: Assign bs->file->bs to file in vpc_co_get_block_status
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-11-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 8bfb137152 vdi: Assign bs->file->bs to file in vdi_co_get_block_status
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-10-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng d234c92931 sheepdog: Assign bs to file in sd_co_get_block_status
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-9-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 53f1dfd1ff qed: Assign bs->file->bs to file in bdrv_qed_co_get_block_status
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-8-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng ddf4987d76 parallels: Assign bs->file->bs to file in parallels_co_get_block_status
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-7-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 3399833f14 iscsi: Assign bs to file in iscsi_co_get_block_status
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-6-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 02650acbc6 raw: Assign bs to file in raw_co_get_block_status
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-5-git-send-email-famz@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 178b4db7e5 qcow2: Assign bs->file->bs to file in qcow2_co_get_block_status
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-4-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 3064bf6fff qcow: Assign bs->file->bs to file in qcow_co_get_block_status
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-3-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00
Fam Zheng 67a0fd2a9b block: Add "file" output parameter to block status query functions
The added parameter can be used to return the BDS pointer which the
valid offset is referring to. Its value should be ignored unless
BDRV_BLOCK_OFFSET_VALID in ret is set.

Until block drivers fill in the right value, let's clear it explicitly
right before calling .bdrv_get_block_status.

The "bs->file" condition in bdrv_co_get_block_status is kept now to keep iotest
case 102 passing, and will be fixed once all drivers return the right file
pointer.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1453780743-16806-2-git-send-email-famz@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02 17:50:47 +01:00