Commit graph

345 commits

Author SHA1 Message Date
Alexandre Raymond 9bf0960a9a Fix compilation warning due to missing header for sigaction (followup)
This patch removes all references to signal.h when qemu-common.h is included
as they become redundant.

Signed-off-by: Alexandre Raymond <cerbere@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-06-08 09:04:29 +01:00
Stefan Hajnoczi 77a5a0001b qed: support for growing images
The .bdrv_truncate() operation resizes images and growing is easy to
implement in QED.  Simply check that the new size is valid and then
update the image_size header field to reflect the new size.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-05-18 14:39:15 +02:00
Stefan Hajnoczi 6f321e93ab qed: Periodically flush and clear need check bit
One strategy to limit the startup delay of consistency check when
opening image files is to ensure that the file is marked dirty for as
little time as possible.

QED currently marks the image dirty when the first allocating write
request is issued and clears the dirty bit again when the image is
cleanly closed.  In practice that means the image is marked dirty for
most of a guest's lifetime and prone to being in a dirty state upon
crash or power failure.

It is safe to clear the dirty bit after all allocating write requests
have completed and a flush has been performed.  This patch adds a timer
after the last allocating write request completes.  When the timer fires
it will flush and then clear the dirty bit.  The timer is set to 5
seconds and is cancelled upon arrival of a new allocating write request.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-05-18 14:38:46 +02:00
Stefan Weil a1c7273b82 Fix typos in comments and code (occured -> occurred and related)
The code changed here is an unused data type name (evt_flush_occurred).

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-05-08 10:02:18 +01:00
Stefan Weil ebabb67a17 Fix typo in code and comments
Replace writeable -> writable

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-05-06 08:19:25 +01:00
Nick Thomas d2d979c628 NBD: Avoid leaking a couple of strings when the NBD device is closed
Signed-off-by: Nick Thomas <nick@bytemark.co.uk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-05-03 11:29:21 +02:00
Stefan Hajnoczi 19dfc44a94 qed: Fix consistency check on 32-bit hosts
The qed_bytes_to_clusters() function is normally used with size_t
lengths.  Consistency check used it with file size length and therefore
failed on 32-bit hosts when the image file is 4 GB or more.

Make qed_bytes_to_clusters() explicitly 64-bit and update consistency
check to keep 64-bit cluster counts.

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-04-27 16:21:00 +02:00
Mitnick Lyu 2d56a546a7 vpc.c: Use get_option_parameter() does the search
Use get_option_parameter() to instead of duplicating the loop, and
use BDRV_SECTOR_SIZE to instead of 512

Signed-off-by: Mitnick Lyu <mitnick.lyu@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-04-13 12:31:41 +02:00
Anthony Liguori 21df65b644 qed: Add support for zero clusters
Zero clusters are similar to unallocated clusters except instead of reading
their value from a backing file when one is available, the cluster is always
read as zero.

This implements read support only.  At this stage, QED will never write a
zero cluster.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-04-13 12:06:41 +02:00
Nick Thomas 33897dc7d6 NBD device: Separate out parsing configuration and opening sockets.
We also change the way the file parameter is parsed so IPv6 IP
addresses can be used, e.g.: "drive=nbd:[::1]:5000"

Signed-off-by: Nick Thomas <nick@bytemark.co.uk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-04-07 13:51:48 +02:00
Stefan Weil 4ff9786c67 Fix trivial "endianness bugs"
Replace endianess -> endianness.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2011-04-03 21:42:57 +02:00
Michael Tokarev 8cffde7329 get rid of private bitmap functions in block/sheepdog.c, use generic ones
qemu now has generic bitmap functions,
so don't redefine them in sheepdog.c,
use common header instead.  A small cleanup.

Here's only one function which is actually
used in sheepdog and gets replaced with
a generic one (simplified):

- static inline int test_bit(int nr, const volatile unsigned long *addr)
+ static inline int test_bit(int nr, const unsigned long *addr)
 {
-  return ((1UL << (nr % BITS_PER_LONG))
            & ((unsigned long*)addr)[nr / BITS_PER_LONG])) != 0;
+  return 1UL & (addr[nr / BITS_PER_LONG] >> (nr & (BITS_PER_LONG-1)));
 }

The body is equivalent, but the argument is not: there's
"volatile" in there.  Why it is used for - I'm not sure.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Acked-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2011-04-01 22:23:06 +02:00
Stefan Weil 5614c188c6 block/qcow: Don't ignore immediate read/write and other failures
This patch is similar to 171e3d6b99
which fixed qcow2:

Returning -EIO is far from optimal, but at least it's an error code.

In addition to read/write failures, -EIO is also returned when
decompress_cluster failed.

Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-03-15 13:21:14 +01:00
Stefan Weil 40a892b78c block/vdi: Don't ignore immediate read/write failures
This patch is similar to 171e3d6b99
which fixed qcow2:

Returning -EIO is far from optimal, but at least it's an error code.

Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-03-15 13:21:14 +01:00
Kevin Wolf 16fde5f2c2 qcow2: Fix order in L2 table COW
When copying L2 tables (this happens only with internal snapshots), the order
wasn't completely safe, so that after a crash you could end up with a L2 table
that has too low refcount, possibly leading to corruption in the long run.

This patch puts the operations in the right order: First allocate the new
L2 table and replace the reference, and only then decrease the refcount of the
old table.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-02-10 13:24:29 +01:00
Kevin Wolf 10b758e85c qed: Report error for unsupported features
Instead of just returning -ENOTSUP, generate a more detailed error.

Unfortunately we don't have a helpful text for features that we don't know yet,
so just print the feature mask. It might be useful at least if someone asks for
help.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-02-10 13:23:59 +01:00
Kevin Wolf e8cdcec123 qcow2: Report error for version > 2
The qcow2 driver is now declared responsible for any QCOW image that has
version 2 or greater (before this, version 3 would be detected as raw).

For everything newer than version 2, an error is reported.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
2011-02-10 13:23:56 +01:00
Kevin Wolf 8af3648843 qcow2: Fix error handling for reading compressed clusters
When reading a compressed cluster failed, qcow2 falsely returned success.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
2011-02-10 13:23:44 +01:00
Kevin Wolf 3ab4c7e92d qcow2: Fix error handling for immediate backing file read failure
Requests could return success even though they failed when bdrv_aio_readv
returned NULL for a backing file read.

Reported-by: Chunqiang Tang <ctang@us.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-02-10 13:23:44 +01:00
Chunqiang Tang e0d9c6f937 QCOW2: bug fix - read base image beyond its size
This patch fixes the following bug in QCOW2. For a QCOW2 image that is larger
than its base image, when handling a read request straddling over the end of the
base image, the QCOW2 driver attempts to read beyond the end of the base image
and the request would fail.

This bug was found by Fast Virtual Disk (FVD)'s fully automated testing tool.
The following test triggered the bug.

dd if=/dev/zero of=/var/ramdisk/truth.raw count=0 bs=1 seek=1098561536
dd if=/dev/zero of=/var/ramdisk/zero-500M.raw count=0 bs=1 seek=593099264
./qemu-img create -f qcow2 -ocluster_size=65536,backing_fmt=blksim -b /var/ramdisk/zero-500M.raw /var/ramdisk/test.qcow2 1098561536
./qemu-io --auto --seed=30477694 --truth=/var/ramdisk/truth.raw --format=qcow2 --test=blksim:/var/ramdisk/test.qcow2 --verify_write=true --compare_before=false --compare_after=true --round=100000 --parallel=100 --io_size=10485760 --fail_prob=0 --cancel_prob=0 --instant_qemubh=true

Signed-off-by: Chunqiang Tang <ctang@us.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-02-10 13:23:44 +01:00
Stefan Weil 4f3669ea5b block/vdi: Fix wrong size in conditionally used memset, memcmp
Error report from cppcheck:
block/vdi.c:122: error: Using sizeof for array given as function argument returns the size of pointer.
block/vdi.c:128: error: Using sizeof for array given as function argument returns the size of pointer.

Fix both by setting the correct size.

The buggy code is only used when QEMU is build without uuid support.
The bug is not critical, so there is no urgent need to apply it to
old versions of QEMU.

Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-02-07 10:07:25 +01:00
Kevin Wolf e1a7107f2d qcow2: Really use cache=unsafe for image creation
For cache=unsafe we also need to set BDRV_O_CACHE_WB, otherwise we have some
strange unsafe writethrough mode.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-02-07 09:44:22 +01:00
Blue Swirl 1869a65385 qcow2-refcount: remove write-only variables
Variables l2_modified and l2_size are not really used, remove them.
Spotted by GCC 4.6.0:
  CC    block/qcow2-refcount.o
/src/qemu/block/qcow2-refcount.c: In function 'qcow2_update_snapshot_refcount':
/src/qemu/block/qcow2-refcount.c:708:37: error: variable 'l2_modified' set but not used [-Werror=unused-but-set-variable]
/src/qemu/block/qcow2-refcount.c:708:9: error: variable 'l2_size' set but not used [-Werror=unused-but-set-variable]

CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-31 10:05:34 +01:00
Kevin Wolf 1b40bbd13a raw-win32: Fix bdrv_flush return value
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-31 10:03:00 +01:00
Stefan Hajnoczi 0d09c79700 qed: Images with backing file do not require QED_F_NEED_CHECK
The consistency check on open is necessary in order to fix inconsistent
table offsets left as a result of a crash mid-operation.  Images with a
backing file actually flush before updating table offsets and are
therefore guaranteed to be consistent.  Do not mark these images dirty.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-31 10:03:00 +01:00
Kevin Wolf 5ea929e3d1 qcow2: Add bdrv_discard support
This adds a bdrv_discard function to qcow2 that frees the discarded clusters.
It does not yet pass the discard on to the underlying file system driver, but
the space can be reused by future writes to the image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-01-31 10:03:00 +01:00
MORITA Kazutaka b444736346 sheepdog: support creating images on remote hosts
This patch parses the input filename in sd_create(), and enables us
specifying a target server to create sheepdog images.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-31 10:03:00 +01:00
Jes Sorensen bf595021c7 Reorganize struct Qcow2Cache for better struct packing
Move size after the two pointers in struct Qcow2Cache to get better
packing of struct elements on 64 bit architectures.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-31 10:03:00 +01:00
Stefan Hajnoczi c743849bee qed: Refuse to create images on block devices
QED relies on the underlying filesystem to extend the file and maintain
its size.  Check that images are not created on a block device.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-24 16:41:50 +01:00
Kevin Wolf 3de0a2944b qcow2: Batch flushes for COW
qcow2 calls bdrv_flush() after performing COW in order to ensure that the
L2 table change is never written before the copy is safe on disk. Now that the
L2 table is cached, we can wait with flushing until we write out the next L2
table.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-24 16:41:49 +01:00
Kevin Wolf 29c1a7301a qcow2: Use QcowCache
Use the new functions of qcow2-cache.c for everything that works on refcount
block and L2 tables.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-24 16:41:49 +01:00
Kevin Wolf 493810940b qcow2: Add QcowCache
This adds some new cache functions to qcow2 which can be used for caching
refcount blocks and L2 tables. When used with cache=writethrough they work
like the old caching code which is spread all over qcow2, so for this case we
have merely a cleanup.

The interesting case is with writeback caching (this includes cache=none) where
data isn't written to disk immediately but only kept in cache initially. This
leads to some form of metadata write batching which avoids the current "write
to refcount block, flush, write to L2 table" pattern for each single request
when a lot of cluster allocations happen. Instead, cache entries are only
written out if its required to maintain the right order. In the pure cluster
allocation case this means that all metadata updates for requests are done in
memory initially and on sync, first the refcount blocks are written to disk,
then fsync, then L2 tables.

This improves performance of scenarios with lots of cluster allocations
noticably (e.g. installation or after taking a snapshot).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-24 11:08:51 +01:00
Aurelien Jarno 653df36bbe qcow2: fix unaligned access
cpu_to_be64w() is called with an obviously non-aligned pointer. Use
cpu_to_be64wu() instead. It fixes unaligned accesses errors on IA64
hosts.

Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-24 11:08:50 +01:00
Blue Swirl f0ff243a16 vpc: fix a file descriptor leak
Fix a file descriptor leak, reported by cppcheck:
[/src/qemu/block/vpc.c:524]: (error) Resource leak: fd

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-01-12 19:49:00 +00:00
Blue Swirl 08089edcd2 vvfat: fix a file descriptor leak
Fix a file descriptor leak, reported by cppcheck:
[/src/qemu/block/vvfat.c:759]: (error) Resource leak: dir

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-01-12 19:48:58 +00:00
Jes Sorensen 6d85a57e20 Add proper -errno error return values to qcow2_open()
In addition this adds missing braces to the function to be consistent
with the coding style.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:15:04 +01:00
Jes Sorensen 7c80ab3f21 block/qcow2.c: rename qcow_ functions to qcow2_
It doesn't really make sense for functions in qcow2.c to be named
qcow_ so convert the names to match correctly.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:15:01 +01:00
Stefan Hajnoczi 01979a98d7 qed: Consistency check support
This patch adds support for the qemu-img check command.  It also
introduces a dirty bit in the qed header to mark modified images as
needing a check.  This bit is cleared when the image file is closed
cleanly.

If an image file is opened and it has the dirty bit set, a consistency
check will run and try to fix corrupted table offsets.  These
corruptions may occur if there is power loss while an allocating write
is performed.  Once the image is fixed it opens as normal again.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:11:04 +01:00
Stefan Hajnoczi eabba580e6 qed: Read/write support
This patch implements the read/write state machine.  Operations are
fully asynchronous and multiple operations may be active at any time.

Allocating writes lock tables to ensure metadata updates do not
interfere with each other.  If two allocating writes need to update the
same L2 table they will run sequentially.  If two allocating writes need
to update different L2 tables they will run in parallel.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:11:04 +01:00
Stefan Hajnoczi 298800cae7 qed: Table, L2 cache, and cluster functions
This patch adds code to look up data cluster offsets in the image via
the L1/L2 tables.  The L2 tables are writethrough cached in memory for
performance (each read/write requires a lookup so it is essential to
cache the tables).

With cluster lookup code in place it is possible to implement
bdrv_is_allocated() to query the number of contiguous
allocated/unallocated clusters.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:11:04 +01:00
Stefan Hajnoczi 75411d236d qed: Add QEMU Enhanced Disk image format
This patch introduces the qed on-disk layout and implements image
creation.  Later patches add read/write and other functionality.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:11:04 +01:00
Christoph Hellwig dce512dedf raw-posix: add discard support
Add support to discard blocks in a raw image residing on an XFS filesystem
by calling the XFS_IOC_UNRESVSP64 ioctl to punch holes.  Support for other
hole punching mechanisms can be added when they become available.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:11:03 +01:00
Christoph Hellwig bb8bf76fb1 block: add discard support
Add a new bdrv_discard method to free blocks in a mapping image, and a new
drive property to set the granularity for these discard.  If no discard
granularity support is set discard support is disabled.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:11:03 +01:00
Christian Brunner f27aaf4b53 ceph/rbd block driver for qemu-kvm
RBD is an block driver for the distributed file system Ceph
(http://ceph.newdream.net/). This driver uses librados (which is part
of the Ceph server) for direct access to the Ceph object store and is
running entirely in userspace (Yehuda also wrote a driver for the
linux kernel, that can be used to access rbd volumes as a block
device).

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Christian Brunner <chb@muc.de>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-14 15:44:21 +01:00
Christoph Hellwig 11a3cb8159 raw-posix: raw_pwrite comment fixup
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-11-26 19:02:52 +01:00
Kevin Wolf 80465c5016 block: Remove unused s->hd in various drivers
All drivers use bs->file instead of s->hd for quite a while now, so it's time
to remove s->hd.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2010-11-24 17:31:06 +01:00
Blue Swirl cfd07e7abb Fix win32 build
Fix a return value change missed by
205ef7961f.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2010-11-07 15:10:40 +00:00
Blue Swirl a313358636 block: avoid a warning on 64 bit hosts with long as int64_t
When building on a 64 bit host which uses 'long' for int64_t,
GCC emits a warning:
  CC    block/blkverify.o
/src/qemu/block/blkverify.c: In function `blkverify_verify_readv':
/src/qemu/block/blkverify.c:304: warning: long long int format, long
unsigned int arg (arg 3)

Rework a77cffe7e9 to avoid the warning.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-11-04 13:54:37 +01:00
Kevin Wolf 1c02e2a171 qcow2: Invalidate cache after failed read
The cache content may be destroyed after a failed read, better not use it any
more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2010-11-04 13:54:37 +01:00
Kevin Wolf 4a4111851f vpc: Implement bdrv_flush
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-11-04 12:52:16 +01:00
Kevin Wolf 205ef7961f block: Allow bdrv_flush to return errors
This changes bdrv_flush to return 0 on success and -errno in case of failure.
It's a requirement for implementing proper error handle in users of bdrv_flush.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2010-11-04 12:52:16 +01:00
Anthony Liguori 21bcc5907f Merge remote branch 'kwolf/for-anthony' into staging 2010-10-26 09:50:58 -05:00
Blue Swirl c57c846a80 qemu-timer: move commonly used timer code to qemu-timer-common
Move timer init functions to a new file, qemu-timer-common.c. Make other
critical timer functions inlined to preserve performance in
qemu-timer.c, also move muldiv64() (used by the inline functions)
to qemu-timer.h.

Adjust block/raw-posix.c and simpletrace.c to use get_clock() directly.
Remove a similar/duplicate definition in qemu-tool.c.

Adjust hw/omap_clk.c to include qemu-timer.h because muldiv64() is used
there.

After this change, tracing can be used also for user code and
simpletrace on Win32.

Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2010-10-23 15:24:07 +00:00
Stefan Weil a77cffe7e9 block: Use GCC_FMT_ATTR and fix a format error
Adding the gcc format attribute detects a format bug
which is fixed here.

v2:
Don't use type cast. BDRV_SECTOR_SIZE is unsigned long long,
so %lld should be the correct format specifier.

Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-10-22 14:49:35 +02:00
edison 51ef67270b Copy snapshots out of QCOW2 disk
In order to backup snapshots, created from QCOW2 iamge, we want to copy snapshots out of QCOW2 disk to a seperate storage.
The following patch adds a new option in "qemu-img": qemu-img convert -f qcow2 -O qcow2 -s snapshot_name src_img bck_img.
Right now, it only supports to copy the full snapshot, delta snapshot is on the way.

Changes from V1: all the comments from Kevin are addressed:
Add read-only checking
Fix coding style
Change the name from bdrv_snapshot_load to bdrv_snapshot_load_tmp

Signed-off-by: Disheng Su <edison@cloud.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-10-22 14:49:35 +02:00
Kevin Wolf 9b036055ef qcow2: Remove old image creation function
They have been #ifdef'd out by the previous patch.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-10-22 14:49:35 +02:00
Kevin Wolf a9420734b6 qcow2: Simplify image creation
Instead of doing lots of magic for setting up initial refcount blocks and stuff
create a minimal (inconsistent) image, open it and initialize the rest with
regular qcow2 functions.

This is a complete rewrite of the image creation function. The old
implementating is #ifdef'd out and will be removed by the next patch (removing
it here would have made the diff unreadable because diff tries to find
similarities when it's really a rewrite)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-10-22 14:49:35 +02:00
Stefan Hajnoczi 72893756e0 qcow2: Support exact L1 table growth
The L1 table grow operation includes a size calculation that bumps up
the new L1 table size in order to anticipate the size needs of vmstate
data.  This helps reduce the number of times that the L1 table has to be
grown when vmstate data is appended.

This size overhead is not necessary during image creation,
bdrv_truncate(), or snapshot goto operations.  In fact, existing
qemu-iotests that exercise table growth are no longer able to trigger it
because image creation preallocates an L1 table that is too large after
changes to qcow_create2().

This patch keeps the size calculation but also adds exact growth for
callers that do not want to inflate the L1 table size unnecessarily.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-10-22 14:49:35 +02:00
Blue Swirl 83e3f76c25 block: avoid a write only variable
Compiling with GCC 4.6.0 20100925 produced a warning:
/src/qemu/block/qcow2-refcount.c: In function 'update_refcount':
/src/qemu/block/qcow2-refcount.c:552:13: error: variable 'dummy' set but not used [-Werror=unused-but-set-variable]

Fix by adding a dummy cast so that the result is not unused.

Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2010-10-13 18:38:07 +00:00
Stefan Weil d523d5d694 block/vvfat: Fix compiler warning in debug code
Fix this compiler warning:
./block/vvfat.c:2285: error: comparison of unsigned expression >= 0 is always true

Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2010-10-03 06:40:54 +00:00
Anthony Liguori 687db4ed2e block-verify: fix 32-bit build
Reported-by: Peter Lemenkov <lemenkov@gmail.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-09-22 14:46:33 -05:00
Stefan Hajnoczi d9d334176c blkverify: Add block driver for verifying I/O
The blkverify block driver makes investigating image format data
corruption much easier.  A raw image initialized with the same contents
as the test image (e.g. qcow2 file) must be provided.  The raw image
mirrors read/write operations and is used to verify that data read from
the test image is correct.

See docs/blkverify.txt for more information.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 17:00:53 +02:00
Kevin Wolf 6f5f060b73 qcow2: Avoid bounce buffers for AIO write requests
qcow2 used to use bounce buffers for any AIO requests. This does not only imply
unnecessary copying, but also unbounded allocations which should be avoided.

This patch removes bounce buffers from the normal AIO write path. Encrypted
images continue to use a bounce buffer, however with constant size.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:43 +02:00
Kevin Wolf bd28f83565 qcow2: Avoid bounce buffers for AIO read requests
qcow2 used to use bounce buffers for any AIO requests. This does not only imply
unnecessary copying, but also unbounded allocations which should be avoided.

This patch removes bounce buffers from the normal AIO read path, and constrains
them to a constant size for encrypted images.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Kevin Wolf 9f8e668eb1 qcow2: Get rid of additional sync on COW
We always have a sync for the refcount update when a new cluster is
allocated. If we move this past the COW, we can save an additional sync.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Kevin Wolf 29216ed14f qcow2: Move sync out of qcow2_alloc_clusters
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Kevin Wolf 1c4c28149f qcow2: Move sync out of update_refcount
Note that the flush is omitted intentionally in qcow2_free_clusters. If
anything, we can leak clusters here if we lose the writes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Kevin Wolf c01828fb51 qcow2: Move sync out of write_refcount_block_entries
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Laurent Vivier c2e2872bf4 nbd: correctly manage default port
block/nbd.c: use default port number when none is specified
qemu-nbd.c:  use IANA-assigned port number: 10809

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Christoph Hellwig 581b9e29f3 raw-posix: handle > 512 byte alignment correctly
Replace the hardcoded handling of 512 byte alignment with bs->buffer_alignment
to handle larger sector size devices correctly.

Note that we can not rely on it to be initialize in bdrv_open, so deal
with the worst case there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Kevin Wolf a655211ac6 vvfat: Use cache=unsafe
The qcow file used for write support in vvfat is a temporary file,
so we can use cache=unsafe there. Without this, write support is just
too slow to be of any use.

Signed-off-by: Kevin Wolf <mail@kevin-wolf.de>
2010-09-21 15:39:42 +02:00
Kevin Wolf 9217e26f43 vvfat: Fix double free for opening the image rw
Allocation and deallocation of bs->opaque is not in the control of a
block driver. Therefore it should not set bs->opaque to a data structure
used by another bs, or closing the image will lead to a double free.

Signed-off-by: Kevin Wolf <mail@kevin-wolf.de>
2010-09-21 15:39:42 +02:00
Kevin Wolf ac48e389d0 vvfat: Fix segfault on write to read-only disk
vvfat tries to set the readonly flag in its open function, but nowadays
this is overwritted with the readonly=... command line option. Check in
bdrv_write if the vvfat was opened read-only and return an error in this
case.

Without this check, vvfat tries to access the qcow bs, which is NULL
without enabled write support.

Signed-off-by: Kevin Wolf <mail@kevin-wolf.de>
2010-09-21 15:39:42 +02:00
Blue Swirl 95ee3914bf blkdebug: fix enum comparison
The signedness of enum types depend on the compiler implementation.
Therefore the check for negative values may or may not be meaningful.

Fix by explicitly casting to a signed integer.

Since the values are also checked earlier against event_names
table, this is an internal error. Change the 'if' to 'assert'.

This also avoids a warning with GCC flag -Wtype-limits.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2010-09-18 05:53:15 +00:00
Anthony Liguori 8b33d9eeba Revert "Make default invocation of block drivers safer (v3)"
This reverts commit 79368c81bf.

Conflicts:

	block.c

I haven't been able to come up with a solution yet for the corruption caused by
unaligned requests from the IDE disk so revert until a solution can be written.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-09-08 17:09:15 -05:00
Kevin Wolf 7ec5e6a4ca qcow2: Remove unnecessary flush after L2 write
When a new cluster was allocated, we only need a flush after the write to the
L2 table if it was a COW and we need to decrease the refcounts of the old
clusters.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-08 12:39:24 +02:00
Bernhard Kohl 05acda4d16 raw-posix: improve detection of scsi-generic devices
Allow symbolic links which point to /dev/sgX devices.

Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-08 12:39:19 +02:00
Kevin Wolf 897804d629 raw-posix: Don't use file name for host_cdrom detection on Linux
On Linux, we have code to detect CD-ROMs using an ioctl. We shouldn't lose
anything but false positives by removing the check for a /dev/cd* path.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-08 12:39:16 +02:00
Laurent Vivier 1d45f8b542 nbd: Introduce NBD named exports.
This patch allows to connect Qemu using NBD protocol to an nbd-server
using named exports.

For instance, if on the host "isoserver", in /etc/nbd-server/config, you have:

[generic]
[debian-500-ppc-netinst]
        exportname = /ISO/debian-500-powerpc-netinst.iso
[Fedora-10-ppc-netinst]
        exportname = /ISO/Fedora-10-ppc-netinst.iso

You can connect to it, using:

    qemu -cdrom nbd:isoserver:exportname=debian-500-ppc-netinst
    qemu -cdrom nbd:isoserver:exportname=Fedora-10-ppc-netinst

NOTE: you need at least nbd-server 2.9.18

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-30 18:29:22 +02:00
Loïc Minier 2aa326be0d vvfat: fat_chksum(): fix access above array bounds
Signed-off-by: Loïc Minier <loic.minier@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-30 18:29:22 +02:00
Izumi Tsutsui 010cb2b314 sheepdog: remove unnecessary includes
"qemu_socket.h" includes all necessary files and
including <netinet/tcp.h> without <netinet/in.h>
could cause errors on some systems.

Signed-off-by: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-30 18:29:22 +02:00
Kevin Wolf 336c1c1255 block: Fix bdrv_has_zero_init
Assuming that any image on a block device is not properly zero-initialized is
actually wrong: Only raw images have this problem. Any other image format
shouldn't care about it, they initialize everything properly themselves.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-03 15:57:22 +02:00
Stefan Weil 2ee9fb4801 block: Replace u_int8_t, u_int16_t, u_int32_t, u_int64_t by standard int types
There is no need to have a second set of integral types.
Replace them by the standard types from stdint.h.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-07-25 16:59:38 +02:00
Anthony Liguori 79368c81bf Make default invocation of block drivers safer (v3)
CVE-2008-2004 described a vulnerability in QEMU whereas a malicious user could
trick the block probing code into accessing arbitrary files in a guest.  To
mitigate this, we added an explicit format parameter to -drive which disabling
block probing.

Fast forward to today, and the vast majority of users do not use this parameter.
libvirt does not use this by default nor does virt-manager.

Most users want block probing so we should try to make it safer.

This patch adds some logic to the raw device which attempts to detect a write
operation to the beginning of a raw device.  If the first 4 bytes happen to
match an image file that has a backing file that we support, it scrubs the
signature to all zeros.  If a user specifies an explicit format parameter, this
behavior is disabled.

I contend that while a legitimate guest could write such a signature to the
header, we would behave incorrectly anyway upon the next invocation of QEMU.
This simply changes the incorrect behavior to not involve a security
vulnerability.

I've tested this pretty extensively both in the positive and negative case.  I'm
not 100% confident in the block layer's ability to deal with zero sized writes
particularly with respect to the aio functions so some additional eyes would be
appreciated.

Even in the case of a single sector write, we have to make sure to invoked the
completion from a bottom half so just removing the zero sized write is not an
option.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-07-15 08:17:06 -05:00
MORITA Kazutaka 6defcc3784 sheepdog: fix compile error on systems without TCP_CORK
WIN32 is not only the system which doesn't have TCP_CORK (e.g. OS X).

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2010-07-07 20:54:56 +03:00
MORITA Kazutaka 33b1db1c88 block: add sheepdog driver for distributed storage support
Sheepdog is a distributed storage system for QEMU. It provides highly
available block level storage volumes to VMs like Amazon EBS.  This
patch adds a qemu block driver for Sheepdog.

Sheepdog features are:
- No node in the cluster is special (no metadata node, no control
  node, etc)
- Linear scalability in performance and capacity
- No single point of failure
- Autonomous management (zero configuration)
- Useful volume management support such as snapshot and cloning
- Thin provisioning
- Autonomous load balancing

The more details are available at the project site:
    http://www.osrg.net/sheepdog/

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-06 17:05:50 +02:00
Markus Armbruster 65d21bc73b raw-posix: Fix test for host CD-ROM
raw_pread_aligned() retries up to two times if the block device backs
a virtual CD-ROM (a drive with media=cdrom and if=ide, scsi, xen or
none).  This makes no sense.  Whether retrying reads can correct read
errors can only depend on what we're reading, not on how the result
gets used.  We need to check what whether we're reading from a
physical CD-ROM or floppy here.

I doubt retrying is useful even then.  Left for another day.

Impact:

* Virtual CD-ROM backed by host_cdrom behaves the same.

* Virtual CD-ROM backed by file or host_device no longer retries.

* A drive backed by host_cdrom now retries even if it's not a virtual
  CD-ROM.

* Any drive backed by host_floppy now retries.

While there, clean up gratuitous use of goto.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-06 17:05:49 +02:00
Kevin Wolf 9ac228e02c qcow2/vdi: Change check to distinguish error cases
This distinguishes between harmless leaks and real corruption. Hopefully users
better understand what qemu-img check wants to tell them.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-06 17:05:49 +02:00
Kevin Wolf 8db520cee8 blkdebug: Initialize state as 1
state = 0 in rules means that the rule is valid for any state. Therefore it's
impossible to have a rule that works only in the initial state. This changes
the initial state from 0 to 1 to make this possible.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:02 +02:00
Kevin Wolf 698f0d52cd blkdebug: Free QemuOpts after having read the config
Forgetting to free them means that the next instance inherits all rules and
gets its own rules only additionally.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:02 +02:00
Kevin Wolf 327cdad416 blkdebug: Fix set_state_opts definition
The list head was initialized to point to the wrong list, so all actions ended
up being handled as inject-error even if they were set-state in fact.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:02 +02:00
Kevin Wolf 19dbcbf7cc qcow2: Fix error handling during metadata preallocation
People were wondering why qemu-img check failed after they tried to preallocate
a large qcow2 file and ran out of disk space.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:01 +02:00
Kevin Wolf f74550fd53 qcow2: Don't try to check tables that couldn't be loaded
Trying to check them leads to a second error message which is more confusing
than helpful:

    Can't get refcount for cluster 0: Invalid argument
    ERROR cluster 0 refcount=-22 reference=1

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Kevin Wolf 6882c8fa78 qcow2: Fix qemu-img check segfault on corrupted images
With corrupted images, we can easily get an cluster index that exceeds the
array size of the temporary refcount table.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Kevin Wolf 078a458e07 vpc: Use bdrv_(p)write_sync for metadata writes
Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Kevin Wolf b8852e87d9 vmdk: Use bdrv_(p)write_sync for metadata writes
Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Kevin Wolf 8b3b720620 qcow2: Use bdrv_(p)write_sync for metadata writes
Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Kevin Wolf 5e5557d970 qcow: Use bdrv_(p)write_sync for metadata writes
Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Kevin Wolf b0ad5a455d cow: Use bdrv_(p)write_sync for metadata writes
Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.
While at it, correct the wrong usage of errno.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Christoph Hellwig 2063392ae5 cow: use qemu block API
Use bdrv_pwrite to access the backing device instead of pread, and
convert the driver to implementing the bdrv_open method which gives
it an already opened BlockDriverState for the underlying device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:59 +02:00