Commit graph

614 commits

Author SHA1 Message Date
Stefan Hajnoczi fd7f8c6537 block: use Error mechanism instead of -errno for block_job_create()
The block job API uses -errno return values internally and we convert
these to Error in the QMP functions.  This is ugly because the Error
should be created at the point where we still have all the relevant
information.  More importantly, it is hard to add new error cases to
this case since we quickly run out of -errno values without losing
information.

Go ahead and use Error directly and don't convert later.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27 11:44:50 -03:00
Kevin Wolf b3adf53a3a nbd: Fix uninitialised use of s->sock
s->sock is assigned only afterwards, so we're really registering an
aio_fd_handler for file descriptor 0 here. Not exactly what we intended.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-26 17:54:22 +02:00
Anthony Liguori 1f8bcac09a Merge remote-tracking branch 'kwolf/for-anthony' into staging
* kwolf/for-anthony: (38 commits)
  qemu-iotests: Fix test 031 for qcow2 v3 support
  qemu-iotests: Add -o and make v3 the default for qcow2
  qcow2: Zero write support
  qemu-iotests: Test backing file COW with zero clusters
  qemu-iotests: add a simple test for write_zeroes
  qcow2: Support for feature table header extension
  qcow2: Support reading zero clusters
  qcow2: Version 3 images
  qcow2: Ignore reserved bits in check_refcounts
  qcow2: Ignore reserved bits in refcount table entries
  qcow2: Simplify count_cow_clusters
  qcow2: Refactor qcow2_free_any_clusters
  qcow2: Ignore reserved bits in L1/L2 entries
  qcow2: Fail write_compressed when overwriting data
  qcow2: Ignore reserved bits in count_contiguous_clusters()
  qcow2: Ignore reserved bits in get_cluster_offset
  qcow2: Save disk size in snapshot header
  Specification for qcow2 version 3
  qcow2: Fix refcount block allocation during qcow2_alloc_cluster_at()
  iotests: Resolve test failures caused by hostname
  ...
2012-04-23 14:27:04 -05:00
Kevin Wolf 621f058940 qcow2: Zero write support
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:30 +02:00
Kevin Wolf cfcc4c62ff qcow2: Support for feature table header extension
Instead of printing an ugly bitmask, qemu can now print a more helpful
string even for yet unknown features.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:29 +02:00
Kevin Wolf 6377af48b0 qcow2: Support reading zero clusters
This adds support for reading zero clusters in version 3 images.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:29 +02:00
Kevin Wolf 6744cbab8c qcow2: Version 3 images
This adds the basic infrastructure to qcow2 to handle version 3 images.
It includes code to create v3 images, allow header updates for v3 images
and checks feature bits.

It still misses support for zero clusters, so this is not a fully
compliant implementation of v3 yet.

The default for creating new images stays at v2 for now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:29 +02:00
Kevin Wolf afdf0abe77 qcow2: Ignore reserved bits in check_refcounts
Also don't infer the cluster type directly from the L2 entries, but use
qcow2_get_cluster_type() to keep everything in a single place.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:29 +02:00
Kevin Wolf 76dc9e0c8f qcow2: Ignore reserved bits in refcount table entries
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:29 +02:00
Kevin Wolf 143550a83e qcow2: Simplify count_cow_clusters
count_cow_clusters() tries to reuse existing functions, and all it
achieves is to make things much more complicated than they really are:
Everything needs COW, unless it's a normal cluster with refcount 1.

This patch implements the obvious way of doing this, and by using
qcow2_get_cluster_type() it gets rid of all flag magic.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:28 +02:00
Kevin Wolf c7a4c37a0f qcow2: Refactor qcow2_free_any_clusters
Zero clusters will add another cluster type. Refactor the open-coded
cluster type detection into a switch of QCOW2_CLUSTER_* options so that
the detection is in a single place. This makes it easier to add new
cluster types.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:28 +02:00
Kevin Wolf 8e37f681d5 qcow2: Ignore reserved bits in L1/L2 entries
This changes the still existing places that assume that the only flags
are QCOW_OFLAG_COPIED and QCOW_OFLAG_COMPRESSED to properly mask out
reserved bits.

It does not convert bdrv_check yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:28 +02:00
Kevin Wolf b0b6862e5e qcow2: Fail write_compressed when overwriting data
qcow2_alloc_compressed_cluster_offset() already fails if the copied flag
is set, because qcow2_write_compressed() doesn't perform COW as it would
have to do to allow this.

However, what we really want to check here is whether the cluster is
allocated or not. With internal snapshots the copied flag may not be set
on allocated clusters. Check the cluster offset instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:27 +02:00
Kevin Wolf 2bfcc4a0a0 qcow2: Ignore reserved bits in count_contiguous_clusters()
Until now, count_contiguous_clusters() has an argument that allowed to
specify flags that should be ignored in the comparison, i.e. that are
allowed to change between contiguous clusters.

This patch changes the function so that it ignores all flags by default
now and you need to pass the flags on which it should stop.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:27 +02:00
Kevin Wolf 68d000a390 qcow2: Ignore reserved bits in get_cluster_offset
With this change, reading from a qcow2 image ignores all reserved bits
that are set in an L1 or L2 table entry.

Now get_cluster_offset() assigns *cluster_offset only the offset without
any other flags. The cluster type is not longer encoded in the offset,
but a positive return value in case of success.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:27 +02:00
Kevin Wolf 90b277593d qcow2: Save disk size in snapshot header
This allows that different snapshots of an image can have different
sizes, which is a requirement for enabling image resizing even with
images that have internal snapshots.

We don't do the actual support for it now, but make sure that the
additional field is present and not completely ignored in all version 3
images. When trying to load a snapshot of different size, it returns
an error.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:27 +02:00
Kevin Wolf f24423bd90 qcow2: Fix refcount block allocation during qcow2_alloc_cluster_at()
Refcount block allocation and refcount table growth rely on
s->free_cluster_index pointing to somewhere after the current
allocation. Change qcow2_alloc_cluster_at() to fulfill this
assumption.

Without this change it could happen that a newly allocated refcount
block and the allocated data block point to the same area in the image
file, causing data corruption in the long run.

This fixes a bug that became first visible after commit 250196f1.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:56:19 +02:00
Paolo Bonzini bafbd6a1c6 aio: remove process_queue callback and qemu_aio_process_queue
Both unused after the previous patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19 16:37:53 +02:00
Paolo Bonzini 7fe7b68b32 nbd: do not block in nbd_wr_sync if no data at all is available
Right now, nbd_wr_sync will hang if no data at all is available on the
socket and the other side is not going to provide any.  Relax this by
making it loop only for writes or partial reads.  This fixes a race
where one thread is executing qemu_aio_wait() and another is executing
main_loop_wait().  Then, the select() call in main_loop_wait() can return
stale data and call the "readable" callback with no data in the socket.

Reported-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:36:43 +02:00
Paolo Bonzini 185b43386a nbd: consistently return negative errno values
In the next patch we need to look at the return code of nbd_wr_sync.
To avoid percolating the socket_error() ugliness all around, let's
handle errors by returning negative errno values.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:36:43 +02:00
Paolo Bonzini fc19f8a02e nbd: consistently check for <0 or >=0
This prepares for the following patch, which changes -1 return values
to negative errno.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:36:43 +02:00
Paolo Bonzini dd3e8ac413 nbd: avoid out of bounds access to recv_coroutine array
This can happen with a buggy or malicious server.

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:36:42 +02:00
Kevin Wolf 2795ecf681 qcow2: Fix return value of alloc_refcount_block
Someone forgot something in commit 29c1a730... Documenting the right
return value is not enough, you also need to actually return it in the
code.

This bug sometimes causes error return values even when everything has
succeeded: The new offset of the refcount block is truncated to 32 bits
and interpreted as signed. At least with small cluster sizes it's easy
to get a negative return value this way.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:03:27 +02:00
Kevin Wolf 8dc0a5e7a0 qcow2: Fix error handling in qcow2_alloc_cluster_offset
If do_alloc_cluster_offset() fails, the error handling code tried to
remove the request from the in-flight queue, to which it wasn't added
yet, resulting in a NULL pointer dereference.

m->nb_clusters really only becomes != 0 when the request is in the list.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19 16:03:27 +02:00
Stefan Weil 4e35b92a51 block: Fix spelling in comment (ineffcient -> inefficient)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19 15:48:52 +02:00
Anthony Liguori bb5d8dd757 Merge remote-tracking branch 'kwolf/for-anthony' into staging
* kwolf/for-anthony: (46 commits)
  qed: remove incoming live migration blocker
  qed: honor BDRV_O_INCOMING for incoming live migration
  migration: clear BDRV_O_INCOMING flags on end of incoming live migration
  qed: add bdrv_invalidate_cache to be called after incoming live migration
  blockdev: open images with BDRV_O_INCOMING on incoming live migration
  block: add a function to clear incoming live migration flags
  block: Add new BDRV_O_INCOMING flag to notice incoming live migration
  block stream: close unused files and update ->backing_hd
  qemu-iotests: Fix call syntax for qemu-io
  qemu-iotests: Fix call syntax for qemu-img
  qemu-iotests: Test unknown qcow2 header extensions
  qemu-iotests: qcow2.py
  sheepdog: fix send req helpers
  sheepdog: implement SD_OP_FLUSH_VDI operation
  block: bdrv_append() fixes
  qed: track dirty flag status
  qemu-img: add dirty flag status
  qed: image fragmentation statistics
  qemu-img: add image fragmentation statistics
  block: document job API
  ...
2012-04-10 08:16:12 -05:00
Benoît Canet 50d30c2675 qed: remove incoming live migration blocker
Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 16:29:12 +02:00
Benoît Canet 2d1f3c2360 qed: honor BDRV_O_INCOMING for incoming live migration
From original commit with Patchwork-id: 31108 by
Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

"The QED image format includes a file header bit to mark images dirty.
QED normally checks dirty images on open and fixes inconsistent
metadata.  This is undesirable during live migration since the dirty bit
may be set if the source host is modifying the image file.  The check
should be postponed until migration completes.

Skip operations that modify the image file if the BDRV_O_INCOMING flag
is set."

Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 16:29:04 +02:00
Benoît Canet c82954e529 qed: add bdrv_invalidate_cache to be called after incoming live migration
The QED image is reopened to flush metadata and check consistency.

Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 16:28:27 +02:00
Marcelo Tosatti 5a67a1048e block stream: close unused files and update ->backing_hd
Close the now unused images that were part of the previous backing file
chain and adjust ->backing_hd, backing_filename and backing_format
properly.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=801449

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 15:11:37 +02:00
Liu Yuan eb09218077 sheepdog: fix send req helpers
We should return if reading of the header fails.

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Acked-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:41 +02:00
Liu Yuan 47622c44d0 sheepdog: implement SD_OP_FLUSH_VDI operation
Flush operation is supposed to flush the write-back cache of
sheepdog cluster.

By issuing flush operation, we can assure the Guest of data
reaching the sheepdog cluster storage.

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:41 +02:00
Dong Xu Wang d68dbee80e qed: track dirty flag status
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:41 +02:00
Dong Xu Wang 11c9c615c8 qed: image fragmentation statistics
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini 9f25eccc1c block: set job->speed in block_set_speed
There is no need to do this in every implementation of set_speed
(even though there is only one right now).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini 3e914655f2 block: fix streaming/closing race
Streaming can issue I/O while qcow2_close is running.  This causes the
L2 caches to become very confused or, alternatively, could cause a
segfault when the streaming coroutine is reentered after closing its
block device.  The fix is to cancel streaming jobs when closing their
underlying device.

The cancellation must be synchronous, on the other hand qemu_aio_wait
will not restart a coroutine that is sleeping in co_sleep.  So add
a flag saying whether streaming has in-flight I/O.  If the busy flag
is false, the coroutine is quiescent and, when cancelled, will not
issue any new I/O.

This protects streaming against closing, but not against deleting.
We have a reference count protecting us against concurrent deletion,
but I still added an assertion to ensure nothing bad happens.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini eb9566d13e vdi: change goto to loop
Finally reindent all code and change goto statements to a loop.

Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini 4eea78e634 vdi: do not create useless iovecs
Reads and writes to the underlying file can also occur with the simple
non-vectored I/O interfaces.

Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini a7a43aa199 vdi: leave bounce buffering to block layer
vdi.c really works as if it implemented bdrv_read and bdrv_write.  However,
because only vector I/O is supported by the asynchronous callbacks, it
went through extra pain to bounce-buffer the I/O.  This can be handled
by the block layer now that the format is coroutine-based.

Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini bfc45fc183 vdi: move aiocb fields to locals
Most of the AIOCB really holds local variables that need to persist
across callback invocation.  It can go away now.

Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini 4de659e8eb vdi: merge aio_read_cb and aio_write_cb into callers
Now inline the former AIO callbacks into vdi_co_readv and vdi_co_writev.
While many cleanups are possible, the code now really looks synchronous.

Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini 0c7bfc321b vdi: move end-of-I/O handling at the end
The next step is to take code that only triggers after the first operation,
and move it at the end of vdi_aio_read_cb and vdi_aio_write_cb.

Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini 3d46a75aa5 vdi: basic conversion to coroutines
Even a basic conversion changing the bdrv_aio_readv/bdrv_aio_writev calls
to bdrv_co_readv/bdrv_co_writev, and callbacks to goto statements can
eliminate a lot of code.  This is because error handling is simplified
and indirections through bottom halves can go away.

After this patch, I/O to the underlying file already happens via
coroutines, but the code still looks a lot like if asynchronous I/O was
being used.

Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Zhang Shengju c088b69136 block/vpc: write checksum back to footer after check
After validation check, the 'checksum' is not written back
to footer, which leave it with zero.

This results in errors while loadding it under Microsoft's
Hyper-V environment, and also errors from utilities like
Citrix's vhd-util.

Signed-off-by: Zhang Shengju <sean_zhang@trendmicro.com.cn>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini 29cdb2513c block: push recursive flushing up from drivers
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:39 +02:00
Kevin Wolf 3948d1d487 qcow2: Remove unused parameter in get_cluster_table()
Since everything goes through the cache, callers don't use the L2 table
offset any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-04-05 14:54:39 +02:00
Stefan Weil fb7c8e8a2d block/curl: Replace usleep by g_usleep
The function usleep is not available for all supported platforms:
at least some versions of MinGW don't support it.

usleep was also declared obsolete by POSIX.1-2001.

The function g_usleep is part of glib2.0, so it is available for
all supported platforms.

Using nanosleep would also be possible but needs more code.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-04-03 09:34:34 +01:00
Kevin Wolf 250196f19c qcow2: Reduce number of I/O requests
If the first part of a write request is allocated, but the second isn't
and it can be allocated so that the resulting area is contiguous, handle
it at once. This is a common case for sequential writes.

After this patch, alloc_cluster_offset() only checks if the clusters are
already allocated or how many new clusters can be allocated contigouosly.
The actual cluster allocation is split off into a new function
do_alloc_cluster_offset().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-12 15:14:07 +01:00
Kevin Wolf 256900b16b qcow2: Add qcow2_alloc_clusters_at()
This function allows to allocate clusters at a given offset in the image
file. This is useful if you want to allocate the second part of an area
that must be contiguous.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-12 15:14:07 +01:00
Kevin Wolf bf319ece56 qcow2: Factor out count_cow_clusters
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-12 15:14:07 +01:00
Kevin Wolf 259b217310 qcow2: Add error messages in qcow2_truncate
qemu-img resize has some limitations with qcow2, but the user is only
told that "this image format does not support resize". Quite confusing,
so add some more detailed error_report() calls and change "this image
format" into "this image".

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-12 15:14:06 +01:00
Kevin Wolf 3cce16f44d qcow2: Add some tracing
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-12 15:14:06 +01:00
Stefan Hajnoczi 14fe292d86 qed: do not evict in-use L2 table cache entries
The L2 table cache reduces QED metadata reads that would be required
when translating LBAs to offsets into the image file.  Since requests
execute in parallel it is possible to share an L2 table between multiple
requests.

There is a potential data corruption issue when an in-use L2 table is
evicted from the cache because the following situation occurs:

  1. An allocating write performs an update to L2 table "A".

  2. Another request needs L2 table "B" and causes table "A" to be
     evicted.

  3. A new read request needs L2 table "A" but it is not cached.

As a result the L2 update from #1 can overlap with the L2 fetch from #3.
We must avoid doing overlapping I/O requests here since the worst case
outcome is that the L2 fetch completes before the L2 update and yields
stale data.  In that case we would effectively discard the L2 update and
lose data clusters!

Thanks to Benoît Canet <benoit.canet@gmail.com> for extensive testing
and debugging which lead to discovery of this bug.

Reported-by: Benoît Canet <benoit.canet@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Tested-by: Benoît Canet <benoit.canet@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-03-12 15:14:06 +01:00
Stefan Weil 75d1234103 block/vmdk: Fix warning from splint (comparision of unsigned value)
l1_entry_sectors will never be less than 0.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-07 13:03:51 +00:00
Kevin Wolf 64ca6aee4f qcow2: Reject too large header extensions
Image files that make qemu-img info read several gigabytes into the
unknown header extensions list are bad. Just fail opening the image
if an extension claims to be larger than the header extension area.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-02-29 12:48:47 +01:00
Kevin Wolf fd29b4bbef qcow2: Fix offset in qcow2_read_extensions
The spec says that the length of extensions is padded to 8 bytes, not
the offset. Currently this is the same because the header size is a
multiple of 8, so this is only about compatibility with future changes
to the header size.

While touching it, move the calculation to a common place instead of
duplicating it for each header extension type.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-02-29 12:48:47 +01:00
Kevin Wolf 423477e556 qcow2: Fix build with DEBUG_EXT enabled
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-29 12:48:47 +01:00
Luiz Capitulino f36f394952 block: bdrv_eject(): Make eject_flag a real bool
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
2012-02-22 17:23:05 -02:00
MORITA Kazutaka 6d1acda8f1 sheepdog: fix co_recv coroutine context
The co_recv coroutine has two things that will try to enter it:

  1. The select(2) read callback on the sheepdog socket.
  2. The aio_add_request() blocking operations, including a coroutine
     mutex.

This patch fixes it by setting NULL to co_recv before sending data.

In future, we should make the sheepdog driver fully coroutine-based
and simplify request handling.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09 16:17:51 +01:00
Kevin Wolf 75bab85ca0 qcow2: Keep unknown header extension when rewriting header
If we want header extensions to work as compatible extensions, we can't
destroy yet unknown header extensions when rewriting the header (e.g.
for changing the backing file). Save all unknown header extensions in a
list of blobs and include them in a new header.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09 16:17:51 +01:00
Kevin Wolf e24e49e619 qcow2: Update whole header at once
In order to switch the backing file, qcow2 issues multiple write
requests that only changed a part of the image header. Any failure after
the first one would leave the header in an corrupted state. With this
patch, the whole header is written at once, so we can't fail in the
middle.

At the same time, this gives us a reusable functions that updates all
fields of the qcow2 header and not only the backing file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09 16:17:51 +01:00
Kevin Wolf ecd880d9ee vpc: Round up image size during fixed image creation
The geometry calculation algorithm from the VHD spec rounds the image
size down if it doesn't exactly match a geometry. During image
conversion, this causes the image to be truncated. For dynamic images,
we already have code in place to round up instead, let's do the same for
fixed images.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09 16:17:51 +01:00
Charles Arnold 24da78dbb5 vpc: Add support for Fixed Disk type
The Virtual Hard Disk Image Format Specification allows for three
types of hard disk formats, Fixed, Dynamic, and Differencing.  Qemu
currently only supports Dynamic disks.  This patch adds support for
the Fixed Disk format.

Usage:
    Example 1: qemu-img create -f vpc -o type=fixed <filename> [size]
    Example 2: qemu-img convert -O vpc -o type=fixed <input filename> <output filename>

While it is also allowed to specify '-o type=dynamic', the default disk type
remains Dynamic and is what is used when the type is left unspecified.

Signed-off-by: Charles Arnold <carnold@suse.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09 16:17:51 +01:00
Ronnie Sahlberg f9dadc9855 iSCSI: add configuration variables for iSCSI
This patch adds configuration variables for iSCSI to set
initiator-name to use when logging in to the target,
which type of header-digest to negotiate with the target
and username and password for CHAP authentication.

This allows specifying a initiator-name either from the command line
-iscsi initiator-name=iqn.2004-01.com.example:test
or from a configuration file included with -readconfig
    [iscsi]
      initiator-name = iqn.2004-01.com.example:test
      header-digest = CRC32C|CRC32C-NONE|NONE-CRC32C|NONE
      user = CHAP username
      password = CHAP password

If you use several different targets, you can also configure this on a per
target basis by using a group name:
    [iscsi "iqn.target.name"]
    ...

The configuration file can be read using -readconfig.
Example :
qemu-system-i386 -drive file=iscsi://127.0.0.1/iqn.ronnie.test/1
 -readconfig iscsi.conf

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09 16:17:50 +01:00
Stefan Hajnoczi 0e71be1932 qed: add .bdrv_co_write_zeroes() support
Zero writes are a dedicated interface for writing regions of zeroes into
the image file.  If clusters are not yet allocated it is possible to use
an efficient metadata representation which keeps the image file compact
and does not store individual zero bytes.

Implementing this for the QED image format is fairly straightforward.
The only issue is that when a zero write touches an existing cluster we
have to allocate a bounce buffer and perform a regular write.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09 16:17:50 +01:00
Stefan Hajnoczi 6e4f59bd0d qed: replace is_write with flags field
Per-request attributes like read/write are currently implemented as bool
fields in the QEDAIOCB struct.  This becomes unwiedly as the number of
attributes grows.  For example, the qed_aio_setup() function would have
to take multiple bool arguments and at call sites it would be hard to
distinguish the meaning of each bool.

Instead use a flags field with bitmask constants.  This will be used
when zero write support is added.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09 16:17:50 +01:00
Li Zhi Hui 2b16c9ffb2 qcow: Use bdrv functions to replace file operation
Since common file operation functions lack of error detection and use
much more I/O syscalls, so change them to bdrv series functions and
reduce I/O request.

Signed-off-by: Li Zhi Hui <zhihuili@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 14:49:18 +01:00
Li Zhi Hui 84b0ec020f qcow: Return real error code in qcow_open
Signed-off-by: Li Zhi Hui <zhihuili@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 14:49:18 +01:00
Stefan Weil 641543b76b block/vdi: Zero unused parts when allocating a new block (fix #919242)
The new block was filled with zero when it was allocated by g_malloc0,
but when it was reused later and only partially used, data from the
previously allocated block were still present and written to the new
block.

This caused the problems reported by bug #919242
(https://bugs.launchpad.net/qemu/+bug/919242).

Now the unused parts of the new block which are before and after the data
are always filled with zero, so it is no longer necessary to zero the whole
block with g_malloc0.

I also updated the copyright comment.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 14:49:18 +01:00
Marcelo Tosatti c8c3080f4a block: add support for partial streaming
Add support for streaming data from an intermediate section of the
image chain (see patch and documentation for details).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 14:49:18 +01:00
Stefan Hajnoczi 5094a6c016 block: rate-limit streaming operations
This patch implements rate-limiting for image streaming.  If we've
exceeded the bandwidth quota for a 100 ms time slice we sleep the
coroutine until the next slice begins.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 11:45:26 +01:00
Stefan Hajnoczi 4f1043b4ff block: add image streaming block job
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 11:45:26 +01:00
Stefan Hajnoczi 031380d877 block: replace unchecked strdup/malloc/calloc with glib
Most of the codebase as been converted to use glib memory allocation
functions.  There are still a few instances of malloc/calloc in the
block layer and qemu-io.  Replace them, especially since they do not
check the strdup/malloc/calloc return value.

Reported-by: Dr David Alan Gilbert <davidagilbert@uk.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 11:39:03 +01:00
Gregory Farnum bd60324706 rbd: wire up snapshot removal and rollback functionality
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 11:39:03 +01:00
Paolo Bonzini 6b620ca3b0 prepare for future GPLv2+ relicensing
All files under GPLv2 will get GPLv2+ changes starting tomorrow.
event_notifier.c and exec-obsolete.h were only ever touched by Red Hat
employees and can be relicensed now.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-01-13 10:55:56 -06:00
Stefan Hajnoczi 8d98734651 vvfat: avoid leaking file descriptor in commit_one_file()
Reported-by: Dr David Alan Gilbert <davidagilbert@uk.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-01-13 10:36:59 +00:00
Paolo Bonzini 128aa58947 move corking functions to osdep.c
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2011-12-22 11:53:58 +01:00
Paolo Bonzini 7a706633e9 nbd: add support for NBD_CMD_TRIM
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2011-12-22 11:53:57 +01:00
Paolo Bonzini 1486d04a1b nbd: add support for NBD_CMD_FLUSH
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2011-12-22 11:53:57 +01:00
Paolo Bonzini 2c7989a9b1 nbd: add support for NBD_CMD_FLAG_FUA
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2011-12-22 11:53:57 +01:00
Paolo Bonzini ecda3447d1 nbd: allow multiple in-flight requests
Allow sending up to 16 requests, and drive the replies to the coroutine
that did the request.  The code is written to be exactly the same as
before this patch when MAX_NBD_REQUESTS == 1 (modulo the extra mutex
and state).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2011-12-22 11:53:57 +01:00
Paolo Bonzini d9b09f13ca nbd: split requests
qemu-nbd has a limit of slightly less than 1M per request.  Work
around this in the nbd block driver.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2011-12-22 11:53:57 +01:00
Paolo Bonzini ae255e523c nbd: switch to asynchronous operation
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2011-12-22 11:53:57 +01:00
Paolo Bonzini 8c5135f90e sheepdog: move coroutine send/recv function to generic code
Outside coroutines, avoid busy waiting on EAGAIN by temporarily
making the socket blocking.

The API of qemu_recvv/qemu_sendv is slightly different from
do_readv/do_writev because they do not handle coroutines.  It
returns the number of bytes written before encountering an
EAGAIN.  The specificity of yielding on EAGAIN is entirely in
qemu-coroutine.c.

Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2011-12-22 11:53:53 +01:00
Li Zhi Hui 16d2fc002a block/cow: Return real error code
Signed-off-by: Li Zhi Hui <zhihuili@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-15 12:40:33 +01:00
Kevin Wolf c2c9a46609 qcow2: Allow >4 GB VM state
This is a compatible extension to the snapshot header format that allows
saving a 64 bit VM state size.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-15 12:40:33 +01:00
Josh Durgin b9c532903f rbd: always set out parameter in qemu_rbd_snap_list
The caller expects psn_tab to be NULL when there are no snapshots or
an error occurs. This results in calling g_free on an invalid address.

Reported-by: Oliver Francke <Oliver@filoo.de>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-15 12:40:08 +01:00
Li Zhi Hui 28c1202ba6 block/qcow2.c: call qcow2_free_snapshots in the function of qcow2_close
Signed-off-by: Li Zhi Hui <zhihuili@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-15 12:40:08 +01:00
Paolo Bonzini 91977c2e5f block: qemu_aio_get does not return NULL
Initially done with the following semantic patch:

@ rule1 @
expression E;
statement S;
@@
  E = qemu_aio_get (...);
(
- if (E == NULL) { ... }
|
- if (E)
    { <... S ...> }
)

which however missed occurrences in linux-aio.c and posix-aio-compat.c.
Those were done by hand.

The change in vdi_aio_setup's caller was also done by hand.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-15 12:40:08 +01:00
Paolo Bonzini ad54ae80c7 block: bdrv_aio_* do not return NULL
Initially done with the following semantic patch:

@ rule1 @
expression E;
statement S;
@@
  E =
(
   bdrv_aio_readv
|  bdrv_aio_writev
|  bdrv_aio_flush
|  bdrv_aio_discard
|  bdrv_aio_ioctl
)
     (...);
(
- if (E == NULL) { ... }
|
- if (E)
    { <... S ...> }
)

which however missed the occurrence in block/blkverify.c
(as it should have done), and left behind some unused
variables.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-15 12:40:07 +01:00
Dong Xu Wang 3a93113a00 fix typo: delete redundant semicolon
Double semicolons should be single.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-06 09:56:41 +00:00
Anthony Liguori eb5d5beaeb Merge remote-tracking branch 'kwolf/for-anthony' into staging 2011-12-05 09:39:25 -06:00
Stefan Hajnoczi e94d138733 cow: use bdrv_co_is_allocated()
Now that bdrv_co_is_allocated() is available we can use it instead of
the synchronous bdrv_is_allocated() interface.  This is a follow-up that
Kevin Wolf <kwolf@redhat.com> pointed out after applying the series that
introduces bdrv_co_is_allocated().

It is safe to make cow_read() a coroutine_fn because its only caller is
a coroutine_fn.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi e8ee5e4c47 coroutine: add qemu_co_queue_restart_all()
It's common to wake up all waiting coroutines.  Introduce the
qemu_co_queue_restart_all() function to do this instead of looping over
qemu_co_queue_next() in every caller.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi 81145834d3 cow: convert to .bdrv_co_is_allocated()
The cow block driver does not keep internal state for cluster lookups.
This means it is safe to perform cluster lookups in coroutine context
without risk of race conditions that corrupt internal state.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:37 +01:00
Stefan Hajnoczi e850b35a1f vdi: convert to .bdrv_co_is_allocated()
It is trivial to switch from the synchronous .bdrv_is_allocated()
interface to .bdrv_co_is_allocated() since vdi_is_allocated() does not
block.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:37 +01:00
Stefan Hajnoczi 73f703ca8f vvfat: convert to .bdrv_co_is_allocated()
It is trivial to switch from the synchronous .bdrv_is_allocated()
interface to .bdrv_co_is_allocated() since vvfat_is_allocated() does not
block.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:37 +01:00
Stefan Hajnoczi f8a2e5e3ca block: convert qcow2, qcow2, and vmdk to .bdrv_co_is_allocated()
The qcow2, qcow, and vmdk block drivers are based on coroutines.  They have a
coroutine mutex which protects internal state.  We can convert the
.bdrv_is_allocated() function to .bdrv_co_is_allocated() by holding the mutex
around the cluster lookup operation.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:37 +01:00
Stefan Hajnoczi b7d5a5b8ae qed: convert to .bdrv_co_is_allocated()
The bdrv_qed_is_allocated() function is a synchronous wrapper around
qed_find_cluster(), which performs the cluster lookup.  In order to
convert the synchronous function to a coroutine function we yield
instead of using qemu_aio_wait().  Note that QED's cache is already safe
for parallel requests so no locking is needed.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:37 +01:00
Kevin Wolf e3f652b332 qcow2: Fix error path in qcow2_snapshot_load_tmp
If the bdrv_read() of the snapshot's L1 table fails, return the right
error code and make sure that the old L1 table is still loaded and we
don't break the BlockDriverState completely.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05 14:51:36 +01:00