19844 commits

Author SHA1 Message Date
Kevin Wolf b0b6862e5e qcow2: Fail write_compressed when overwriting data
qcow2_alloc_compressed_cluster_offset() already fails if the copied flag
is set, because qcow2_write_compressed() doesn't perform COW as it would
have to do to allow this.

However, what we really want to check here is whether the cluster is
allocated or not. With internal snapshots the copied flag may not be set
on allocated clusters. Check the cluster offset instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:27 +02:00
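The distinction drawn in the commit above (b0b6862e5e) can be illustrated with a small sketch. The constants follow the published qcow2 format description; the helper names `has_copied_flag` and `is_allocated` are hypothetical and are not QEMU functions.

```python
# Values taken from the qcow2 format description; helper names are illustrative.
QCOW_OFLAG_COPIED = 1 << 63            # set when the cluster's refcount is exactly one
L2E_OFFSET_MASK = 0x00FFFFFFFFFFFE00   # bits 9-55 hold the host cluster offset


def has_copied_flag(l2_entry: int) -> bool:
    """The old criterion: only looks at the copied flag."""
    return bool(l2_entry & QCOW_OFLAG_COPIED)


def is_allocated(l2_entry: int) -> bool:
    """The new criterion: a non-zero cluster offset means the cluster is allocated."""
    return (l2_entry & L2E_OFFSET_MASK) != 0


# A cluster shared with an internal snapshot: allocated, but copied flag clear.
shared_with_snapshot = 0x50000
# A cluster owned only by the active image: allocated and copied flag set.
exclusive = QCOW_OFLAG_COPIED | 0x50000
# An unallocated cluster.
unallocated = 0
```

With the flag-based test, `shared_with_snapshot` would look safe to overwrite even though it holds data; the offset-based test treats it as allocated.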
Kevin Wolf 2bfcc4a0a0 qcow2: Ignore reserved bits in count_contiguous_clusters()
Until now, count_contiguous_clusters() had an argument that allowed
specifying flags that should be ignored in the comparison, i.e. that are
allowed to change between contiguous clusters.

This patch changes the function so that it ignores all flags by default
now and you need to pass the flags on which it should stop.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:27 +02:00
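A rough sketch of the "stop flags" idea from the commit above (2bfcc4a0a0), written as a standalone function. This is an approximation for illustration, not the QEMU implementation: the function name, the list-based L2 table and the sample entries are assumptions, while the mask and flag values follow the qcow2 format description.

```python
L2E_OFFSET_MASK = 0x00FFFFFFFFFFFE00   # bits 9-55: host cluster offset
QCOW_OFLAG_COPIED = 1 << 63


def count_contiguous(l2_entries, cluster_size, stop_flags):
    """Count leading entries whose offsets are contiguous on disk.

    All flag bits are ignored unless they are listed in stop_flags;
    a change in any stop flag ends the contiguous run.
    """
    if not l2_entries:
        return 0
    mask = L2E_OFFSET_MASK | stop_flags
    first = l2_entries[0] & mask
    if (first & L2E_OFFSET_MASK) == 0:
        return 0  # first cluster is not allocated
    count = 0
    for i, entry in enumerate(l2_entries):
        if (entry & mask) != first + i * cluster_size:
            break
        count += 1
    return count


CLUSTER = 0x10000  # 64 KiB clusters, chosen for the example
table = [
    0x10000 | QCOW_OFLAG_COPIED,
    0x20000 | QCOW_OFLAG_COPIED | (1 << 1),  # a reserved bit is set here
    0x30000,                                 # copied flag not set
    0x50000 | QCOW_OFLAG_COPIED,             # not contiguous with the previous one
]
```

Passing `stop_flags=0` counts three clusters because only the offsets are compared; passing `QCOW_OFLAG_COPIED` stops after two, where the copied flag changes.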
Kevin Wolf 68d000a390 qcow2: Ignore reserved bits in get_cluster_offset
With this change, reading from a qcow2 image ignores all reserved bits
that are set in an L1 or L2 table entry.

Now get_cluster_offset() assigns *cluster_offset only the offset without
any other flags. The cluster type is no longer encoded in the offset,
but is returned as a positive return value in case of success.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:27 +02:00
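The separation of offset and cluster type described in the commit above (68d000a390) might look roughly like this. It is a simplified sketch: `ClusterType` and `decode_l2_entry` are hypothetical names, the constants follow the qcow2 format description, and decoding of compressed cluster descriptors is deliberately left out.

```python
from enum import IntEnum

QCOW_OFLAG_COPIED = 1 << 63
QCOW_OFLAG_COMPRESSED = 1 << 62
L2E_OFFSET_MASK = 0x00FFFFFFFFFFFE00   # bits 9-55: host cluster offset


class ClusterType(IntEnum):
    # Illustrative names; non-negative so they can double as a success return value.
    UNALLOCATED = 0
    NORMAL = 1
    COMPRESSED = 2


def decode_l2_entry(entry: int):
    """Return (cluster_type, offset); the offset carries no flag or reserved bits."""
    if entry & QCOW_OFLAG_COMPRESSED:
        # Compressed descriptors use a different layout; not decoded in this sketch.
        return ClusterType.COMPRESSED, None
    offset = entry & L2E_OFFSET_MASK
    if offset == 0:
        return ClusterType.UNALLOCATED, 0
    return ClusterType.NORMAL, offset


# An entry with the copied flag and two reserved bits (bit 3 and bit 60) set.
noisy_entry = QCOW_OFLAG_COPIED | (1 << 60) | 0x30000 | (1 << 3)
```

Decoding `noisy_entry` yields a normal cluster at offset 0x30000; the reserved bits and the copied flag do not leak into the offset.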
Kevin Wolf 90b277593d qcow2: Save disk size in snapshot header
This allows different snapshots of an image to have different
sizes, which is a requirement for enabling image resizing even with
images that have internal snapshots.

We don't implement actual support for it now, but make sure that the
additional field is present and not completely ignored in all version 3
images. Trying to load a snapshot of a different size returns
an error.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:27 +02:00
Kevin Wolf 4fabffc112 Specification for qcow2 version 3
This updates the qcow2 specification to cover version 3. It contains the
following changes:

- Added compatible/incompatible/auto-clear feature bits plus an optional
  feature name table to allow useful error messages even if an older
  version doesn't know some feature at all.

- Configurable refcount width. If you don't want to use internal
  snapshots, make refcounts one bit and save cache space and I/O.

- Zero cluster flags. This allows discard even with a backing file that
  doesn't contain zeros. It is also useful for copy-on-read/image
  streaming, as you'll want to keep sparseness without accessing the
  remote image for an unallocated cluster all the time.

- Fixed internal snapshot metadata to use 64 bit VM state size. You
  can't save a snapshot of a VM with >= 4 GB RAM today.

- Extended internal snapshot metadata to contain the disk size, so that
  resizing images that have snapshots can be allowed in the future.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:27 +02:00
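Two of the points in the specification commit above (4fabffc112) come down to simple arithmetic, sketched here. The 64 KiB cluster size is an assumption picked for the example, and the function names are hypothetical; the only format fact relied on is that a refcount block occupies one cluster.

```python
CLUSTER_SIZE = 64 * 1024  # assumed cluster size for this example


def refcount_entries_per_block(refcount_bits: int, cluster_size: int = CLUSTER_SIZE) -> int:
    """Number of refcount entries that fit into one cluster-sized refcount block."""
    return cluster_size * 8 // refcount_bits


def bytes_covered_per_block(refcount_bits: int, cluster_size: int = CLUSTER_SIZE) -> int:
    """Amount of image file described by a single refcount block."""
    return refcount_entries_per_block(refcount_bits, cluster_size) * cluster_size


def fits_in_field(value: int, field_bits: int) -> bool:
    """Whether an unsigned value can be stored in a field of the given width."""
    return 0 <= value < (1 << field_bits)


GiB = 1 << 30
```

Under these assumptions a 16-bit refcount block describes 2 GiB of image file and a 1-bit block describes 32 GiB, which is where the cache and I/O savings come from. Likewise, a VM state size of 4 GiB does not fit in a 32-bit field but does fit in a 64-bit one.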
Kevin Wolf f24423bd90 qcow2: Fix refcount block allocation during qcow2_alloc_cluster_at()
Refcount block allocation and refcount table growth rely on
s->free_cluster_index pointing to somewhere after the current
allocation. Change qcow2_alloc_cluster_at() to fulfill this
assumption.

Without this change it could happen that a newly allocated refcount
block and the allocated data block point to the same area in the image
file, causing data corruption in the long run.

This fixes a bug that first became visible after commit 250196f1.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:56:19 +02:00
David Gibson fecccc4477 Add .gitignore for tests/
The new autotests in tests/ generate a number of files, both
executable and source, which are not caught by the existing .gitignore
files.  This patch adds a new .gitignore in tests/ which covers these.

[Changed 'rtc-test' to '*-test' so future tests do not need to be added
to .gitignore on a case-by-case basis.  Stefan]

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-04-20 13:23:27 +01:00
Stefan Weil 362f5fb564 e1000: Fix spelling (segmentaion -> segmentation) in debug output
This was reported by https://bugs.launchpad.net/qemu/+bug/984476.

I also changed the case for 'error'.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-04-20 13:20:54 +01:00
Eduardo Elias Ferreira 4f5c017738 spice-qemu-char.c: Show what name is unsupported
Signed-off-by: Eduardo Elias Ferreira <edusf@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-04-20 13:14:53 +01:00
Eric Bénard 4d6145488c pflash_cfi01: remove redundant line
Signed-off-by: Eric Bénard <eric@eukrea.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-04-20 13:14:53 +01:00
Stefan Weil 5f8daf2e04 qxl: Add missing GCC_FMT_ATTR and fix format specifier
val is a uint64_t, therefore %d was not correct.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-04-20 13:14:53 +01:00
Paolo Bonzini 4451b79962 fix block_job_set_speed name in documentation
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-04-20 13:14:53 +01:00
Amos Kong 0ed6dc1a98 error.c: don't return value for void function
It is invalid to return a value from a function
returning void.

[C99 6.8.6.4 says "A return statement with an expression shall not
appear in a function whose return type is void" but gcc 4.6.3 with QEMU
compile flags does not complain.  It's still worth fixing this.  Stefan]

Signed-off-by: Amos Kong <akong@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-04-20 13:14:53 +01:00
Andreas Färber 29926112a2 iotests: Resolve test failures caused by hostname
`hostname -s` may output an error:
hostname: Name or service not known
This causes all tests to fail for `make check-block`.

Suppress such error messages, letting the tests succeed.

Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 12:58:50 +02:00
Liu Yuan 80ccf93b88 qemu-img: let 'qemu-img convert' flush data
The 'qemu-img convert -h' help text advertises that the default cache mode is
'writeback', while in fact it is 'unsafe'.

This patch 1) fixes the help text and 2) lets bdrv_close() call bdrv_flush().

2) is needed because some backend storage doesn't have a self-flush
mechanism (e.g., sheepdog), so we need to call bdrv_flush() to make
sure the image is really written to the storage instead of hanging around
in the writeback cache forever.

Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 11:42:41 +02:00
Blue Swirl 90449c3887 sparc: fix qtest
Initialize TCG only when enabled.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-04-19 18:52:48 +00:00
Blue Swirl e776bffb53 qtest: add dummy functions for user emulators
Allow qtest to also be used in files used for user emulators by
introducing dummy functions.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-04-19 18:52:35 +00:00
Blue Swirl 85215d419b qtest: add register fuzzing to RTC test
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-04-19 18:14:55 +00:00
Michael Roth 4bdd04165a qemu-ga: fix help output
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-19 10:45:07 -05:00
Michael Roth d35d4cb517 qemu-ga: generate missing stubs for fsfreeze
When linux-specific commands (including guest-fsfreeze-*) were consolidated
under defined(__linux__), we forgot to account for the case where
defined(__linux__) && !defined(FIFREEZE). As a result stubs are no longer
being generated on linux hosts that don't have FIFREEZE support. Fix
this.

Tested-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-19 10:42:59 -05:00
Paolo Bonzini e25ceb76e5 nbd: obey FUA on reads
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 17:19:37 +02:00
Paolo Bonzini 38ceff0412 nbd: do not include block_int.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 17:19:37 +02:00
Paolo Bonzini 9eb0bfca96 aio: simplify qemu_aio_wait
The do...while loop can never loop, because select will just not return
0 when invoked with infinite timeout.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19 16:51:47 +02:00
Paolo Bonzini bcdc18578d aio: return "AIO in progress" state from qemu_aio_wait
The definition of when qemu_aio_flush should loop is much simpler
than it looks.  It just has to call qemu_aio_wait until it makes
no progress and all flush callbacks return false.  qemu_aio_wait
is the logical place to tell the caller about this.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19 16:50:49 +02:00
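The loop described in the commit above (bcdc18578d) can be pictured with a toy model. `ToyAioContext` and its methods are hypothetical stand-ins, not the QEMU functions: the wait step reports whether AIO is still in progress, and the flush step repeats it until the answer is no.

```python
class ToyAioContext:
    """Toy stand-in for an AIO context holding a queue of pending requests."""

    def __init__(self, requests):
        self.pending = list(requests)
        self.completed = []

    def aio_wait(self) -> bool:
        """Complete at most one request.

        Returns True if progress was made (the caller should call again),
        False once nothing is left in flight.
        """
        if not self.pending:
            return False
        self.completed.append(self.pending.pop(0))
        return True

    def aio_flush(self) -> None:
        """Keep waiting until a wait step reports that no AIO is in progress."""
        while self.aio_wait():
            pass


ctx = ToyAioContext(["read A", "write B", "flush"])
ctx.aio_flush()
```

Because the wait step itself returns the "in progress" state, the flush loop needs no separate bookkeeping about pending requests.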
Paolo Bonzini bafbd6a1c6 aio: remove process_queue callback and qemu_aio_process_queue
Both unused after the previous patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19 16:37:53 +02:00
Paolo Bonzini 7fe7b68b32 nbd: do not block in nbd_wr_sync if no data at all is available
Right now, nbd_wr_sync will hang if no data at all is available on the
socket and the other side is not going to provide any.  Relax this by
making it loop only for writes or partial reads.  This fixes a race
where one thread is executing qemu_aio_wait() and another is executing
main_loop_wait().  Then, the select() call in main_loop_wait() can return
stale data and call the "readable" callback with no data in the socket.

Reported-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:36:43 +02:00
Paolo Bonzini 185b43386a nbd: consistently return negative errno values
In the next patch we need to look at the return code of nbd_wr_sync.
To avoid percolating the socket_error() ugliness all around, let's
handle errors by returning negative errno values.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:36:43 +02:00
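The convention adopted in the commit above (185b43386a) is that a function returns a byte count on success and a negative errno value on failure, so callers only need a `< 0` check. A minimal sketch of that convention, using a hypothetical `recv_or_neg_errno` helper on a local socket pair (nothing here is NBD-specific or taken from QEMU):

```python
import errno
import socket


def recv_or_neg_errno(sock: socket.socket, buf: bytearray) -> int:
    """Receive into buf; return the byte count, or -errno on failure."""
    try:
        return sock.recv_into(buf)
    except OSError as exc:
        # Fold the platform error into a single negative return value.
        return -(exc.errno or errno.EIO)


left, right = socket.socketpair()
left.setblocking(False)

buf = bytearray(16)
rc_empty = recv_or_neg_errno(left, buf)   # nothing to read yet: a negative errno

right.send(b"hi")
rc_data = recv_or_neg_errno(left, buf)    # two bytes are now available

left.close()
right.close()
```

The caller can distinguish "would block" from a real failure by inspecting `-rc_empty`, without any knowledge of how the underlying socket layer reports errors.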
Paolo Bonzini fc19f8a02e nbd: consistently check for <0 or >=0
This prepares for the following patch, which changes -1 return values
to negative errno.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:36:43 +02:00
Paolo Bonzini 94e7340b5d nbd: consistently use ssize_t
GCC (pedantically, but correctly) considers that a negative ssize_t may
become positive when cast to int.  This may cause uninitialized variable
warnings when a function returns such a negative ssize_t and is inlined.
Propagate ssize_t return types to avoid this.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:36:43 +02:00
Paolo Bonzini dd3e8ac413 nbd: avoid out of bounds access to recv_coroutine array
This can happen with a buggy or malicious server.

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:36:42 +02:00
Paolo Bonzini adfe92f6d1 posix-aio: merge posix_aio_process_queue and posix_aio_read
posix_aio_read already calls qemu_aio_process_queue, and dually
qemu_aio_process_queue is always followed by a select loop that calls
posix_aio_read.

No races are possible, so there is no need for a separate process_queue
callback.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19 16:35:43 +02:00
Paolo Bonzini 8a83205d34 qemu-tool: map vm_clock to rt_clock
QED uses vm_clock timers so that images are not touched during and after
migration.  This however does not apply to qemu-io and qemu-img.
Treat vm_clock as a synonym for rt_clock there, and enable it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19 16:30:25 +02:00
Paolo Bonzini a5a5238ee4 qemu-io: use main_loop_wait
This will let timers run during aio_read and aio_write commands,
though not during synchronous commands.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19 16:29:33 +02:00
Paolo Bonzini 3e46d87d66 scsi: add SANITIZE command
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:26:33 +02:00
Ronnie Sahlberg f644a2904d SCSI emulation: should tell the guest that we actually support thin provisioning
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
[Actually, we should report it only if discard_granularity is nonzero.
 Older SBC drafts assigned 0 to thin provisioning and 1 to thick
 (resource-provisioned, they call it).  Newer drafts assign respectively
 1 and 2 - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:26:29 +02:00
Ronnie Sahlberg c9e4d8284e SCSI emulation: Support unmap via WRITE_SAME_10.
This was added in SBC r26 in place of the reserved bits that were
present up to that version.

It is the same as WRITE_SAME_16 as far as QEMU is concerned.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:16:05 +02:00
Paolo Bonzini 6a2de0f203 scsi: advertise DPOFUA
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:16:05 +02:00
Paolo Bonzini e590ecbed5 scsi: small refactoring of MMC mode-sense
Make DBD a boolean value, and force device-specific parameter to zero.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:16:05 +02:00
Paolo Bonzini ac66842646 scsi: support FUA on reads
To force unit access on reads, flush the cache *before* doing the read.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:16:05 +02:00
Paolo Bonzini a0e66a699e scsi: add a started field to SCSIDiskReq
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:16:05 +02:00
Paolo Bonzini 7f64f8e2c3 scsi: force unit access on VERIFY
Also DMA data from the host, so that the host does not report an
underrun.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:15:58 +02:00
Paolo Bonzini 3ed9902528 block: allow interrupting a co_sleep_ns
In the next patch we want to reenter the coroutine from
block_job_cancel_sync and cancel the timer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19 16:03:27 +02:00
Kevin Wolf 2795ecf681 qcow2: Fix return value of alloc_refcount_block
Someone forgot something in commit 29c1a730... Documenting the right
return value is not enough, you also need to actually return it in the
code.

This bug sometimes causes error return values even when everything has
succeeded: The new offset of the refcount block is truncated to 32 bits
and interpreted as signed. At least with small cluster sizes it's easy
to get a negative return value this way.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:03:27 +02:00
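The failure mode described in the commit above (2795ecf681), a 64-bit offset truncated to 32 bits and read as signed, can be reproduced with plain integer arithmetic. The helper name and the sample offsets below are illustrative only.

```python
def truncate_to_int32(value: int) -> int:
    """Keep the low 32 bits of value and interpret them as a signed integer."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


# A refcount block placed at the 2 GiB mark of the image file.
offset_2gib = 0x80000000
# A refcount block placed just past 4 GiB.
offset_past_4gib = 0x100010000
```

`truncate_to_int32(offset_2gib)` is negative, so a caller checking for `< 0` would see a bogus error even though the allocation succeeded; `truncate_to_int32(offset_past_4gib)` stays positive but no longer equals the real offset.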
Kevin Wolf 8dc0a5e7a0 qcow2: Fix error handling in qcow2_alloc_cluster_offset
If do_alloc_cluster_offset() fails, the error handling code tried to
remove the request from the in-flight queue, to which it wasn't added
yet, resulting in a NULL pointer dereference.

m->nb_clusters really only becomes != 0 when the request is in the list.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19 16:03:27 +02:00
Stefan Hajnoczi e82dabd82e ide: convert ide_sector_write() to asynchronous I/O
The IDE PIO write sector code path uses bdrv_write() and hence can make
the guest unresponsive while the I/O request is in progress.  This patch
converts ide_sector_write() to use bdrv_aio_writev() by using the
BUSY_STAT bit to tell the guest that the request is in progress.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Tested-by: Richard Davies <richard@arachsys.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19 16:03:27 +02:00
Stefan Hajnoczi bef0fd5958 ide: convert ide_sector_read() to asynchronous I/O
The IDE PIO interface currently uses bdrv_read() to perform reads
synchronously.  Synchronous I/O in the vcpu thread is bad because it
prevents the guest from executing code - it makes the guest
unresponsive.

This patch converts IDE PIO to use bdrv_aio_readv().  We simply need to
use the BUSY_STAT status so the guest knows to wait while we are busy.

The only external user of ide_sector_read() is restart behavior on I/O
errors and it is not affected by this change.  We still need to restart
I/O in the same way.

Migration is also unaffected if I understand the code correctly.  We
continue to use the same transfer function and the BUSY_STAT status
should never be migrated since we flush I/O before migrating device
state.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Tested-by: Richard Davies <richard@arachsys.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19 16:03:27 +02:00
Kevin Wolf 592fa07043 qemu-io: Add command line switch for cache mode
To be used as in 'qemu-io -t writeback test.img'

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 16:03:16 +02:00
Stefan Weil 4e35b92a51 block: Fix spelling in comment (ineffcient -> inefficient)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19 15:48:52 +02:00
Dong Xu Wang 8ff9ae00da iotests: fix error in 005
According to the comment, we should not read again; we should write.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19 15:48:52 +02:00
Kevin Wolf 7094f12f86 block: Drain requests in bdrv_close
If an AIO request is in flight that refers to a BlockDriverState that
has been closed and possibly even freed, more or less anything could
happen. I have seen segfaults, -EBADF return values and qcow2 sometimes
actually catches the situation in bdrv_close() and abort()s.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 15:48:52 +02:00