Commit graph

614 commits

Author SHA1 Message Date
Josh Durgin b11f38fcdf rbd: hook up cache options
Writeback caching was added in Ceph 0.46, and writethrough will be in
0.47. These are controlled by general config options, so there's no
need to check for librbd version.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:42 +02:00
Kevin Wolf 166acf546f qcow2: Support for fixing refcount inconsistencies
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:42 +02:00
Kevin Wolf ccf34716ee qemu-img check: Print fixed clusters and recheck
When any inconsistencies have been fixed, print the statistics and run
another check to make sure everything is correct now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:42 +02:00
Kevin Wolf 4534ff5426 qemu-img check -r for repairing images
The QED block driver already provides the functionality to not only
detect inconsistencies in images, but also fix them. However, this
functionality cannot be manually invoked with qemu-img, but the
check happens only automatically during bdrv_open().

This adds a -r switch to qemu-img check that allows manual invocation
of an image repair.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:42 +02:00
Paolo Bonzini 6ef228fc0d stream: move rate limiting to a separate header file
Make the code reusable.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:42 +02:00
Paolo Bonzini 188a7bbf94 stream: move is_allocated_above to block.c
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:42 +02:00
Paolo Bonzini f9749f28b7 stream: tweak usage of bdrv_co_is_allocated
is_allocated_base has complex semantics that are not really usable
outside streaming.  Split the check in two parts, where the allocated
state for the top bs is moved to the caller.  The resulting function
is more generally useful.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:42 +02:00
Paolo Bonzini 5500316ded block: implement is_allocated for raw
Either FIEMAP, or SEEK_DATA+SEEK_HOLE can be used to implement the
is_allocated callback for raw files.  On Linux ext4, btrfs and XFS
all support it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:42 +02:00
Zhi Yong Wu 87267753a3 qcow2: fix endianness conversion
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:42 +02:00
Zhi Yong Wu 833e40858c qcow2: remove a line of unnecessary code
Commit 3948d1d4 removed the pointer argument we filled in with l2_offset
but forgot to remove the unnecessary l2_offset assignment.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:42 +02:00
Kevin Wolf 1417d7e40e qcow2: Silence false warning
Some gcc versions seem not to be able to figure out that the switch
statement covers all possible values and that c is therefore always
initialised. Add a default branch for them.

Reported-by: malc <av1474@comtv.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: malc <av1474@comtv.ru>
2012-06-15 15:52:45 +04:00
Paolo Bonzini 7456e4ce8d build: move block/ objects to nested Makefile.objs
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-06-07 09:21:13 +02:00
Jim Meyering c2d76497b6 block: prevent snapshot mode $TMPDIR symlink attack
In snapshot mode, bdrv_open creates an empty temporary file without
checking for mkstemp or close failure, and ignoring the possibility
of a buffer overrun given a surprisingly long $TMPDIR.
Change the get_tmp_filename function to return int (not void),
so that it can inform its two callers of those failures.
Also avoid the risk of buffer overrun and do not ignore mkstemp
or close failure.
Update both callers (in block.c and vvfat.c) to propagate
temp-file-creation failure to their callers.

get_tmp_filename creates and closes an empty file, while its
callers later open that presumed-existing file with O_CREAT.
The problem was that a malicious user could provoke mkstemp failure
and race to create a symlink with the selected temporary file name,
thus causing the qemu process (usually root owned) to open through
the symlink, overwriting an attacker-chosen file.

This addresses CVE-2012-2652.
http://bugzilla.redhat.com/CVE-2012-2652

Signed-off-by: Jim Meyering <meyering@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-30 10:18:20 +02:00
MORITA Kazutaka 6f3c714eb7 sheepdog: fix return value of do_load_save_vm_state
bdrv_save_vmstate and bdrv_load_vmstate should return the vmstate size
on success, and -errno on error.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-30 09:58:39 +02:00
Anthony Liguori 306761537f Merge remote-tracking branch 'kwolf/for-anthony' into staging
* kwolf/for-anthony:
  fdc-test: introduced qtest no_media_on_start and cmos qtest for floppy
  fdc: fix media detection
  fdc: floppy drive should be visible after start without media
  qemu-iotests: mark 035 qcow2-only
  qcow2: Check qcow2_alloc_clusters_at() return value
  sheepdog: use heap instead of stack for BDRVSheepdogState
  sheepdog: return -errno on error
  sheepdog: mark image as snapshot when tag is specified
  qemu-img: Explain how rebase operation can be used to perform a 'diff' operation.
  qcow2: don't leak buffer for unexpected qcow_version in header
2012-05-29 04:30:49 -05:00
Ronnie Sahlberg f4dfa67f04 ISCSI: Switch to using READ16/WRITE16 for I/O to the LUN
This allows using LUNs bigger than 2TB.  Keep using READ10 for other
device types such as MMC.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
2012-05-28 14:04:16 +02:00
Ronnie Sahlberg 6bcd1346bb ISCSI: Only call READCAPACITY16 for SBC devices, use READCAPACITY10 for MMC
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
2012-05-28 14:04:15 +02:00
Ronnie Sahlberg dbfff6d776 ISCSI: get device type at connection time
This is needed to avoid READ CAPACITY(16) for MMC devices.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-05-28 14:04:14 +02:00
Paolo Bonzini c7b4a95202 ISCSI: change num_blocks to 64-bit
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-05-28 14:04:14 +02:00
Ronnie Sahlberg c9b9f6824f ISCSI: redo how we set up the events
Call qemu_notify_event() after updating events.  Otherwise, If we add
an event for -is-writeable but the socket is already writeable there
may be a delay before the event callback is actually triggered.

Those delays would in particular hurt performance during BIOS boot and
when the GRUB bootloader reads the kernel and initrd.

But first call out to the socket write functions directly, and only set up
the write event if the socket is full.  This will happen very rarely and
this improves performance.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
2012-05-28 14:04:06 +02:00
Kevin Wolf df02179189 qcow2: Check qcow2_alloc_clusters_at() return value
When using qcow2_alloc_clusters_at(), the cluster allocation code
checked the wrong variable for an error code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25 18:12:54 +02:00
MORITA Kazutaka b6fc8245e9 sheepdog: use heap instead of stack for BDRVSheepdogState
bdrv_create() is called in coroutine context now, so we cannot use
more stack than 1 MB in the function if we use ucontext coroutine.
This patch allocates BDRVSheepdogState, whose size is 4 MB, on the
heap in sd_create().

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25 18:12:54 +02:00
MORITA Kazutaka cb595887cc sheepdog: return -errno on error
On error, BlockDriver APIs should return -errno instead of -1.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25 18:12:54 +02:00
MORITA Kazutaka 622b6057be sheepdog: mark image as snapshot when tag is specified
When a snapshot tag is specified in the filename, the opened image is
a snapshot.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25 18:12:54 +02:00
Jim Meyering b6c147622d qcow2: don't leak buffer for unexpected qcow_version in header
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25 18:12:54 +02:00
Kevin Wolf c44bfe4637 qcow2: Don't ignore failure to clear autoclear flags
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-14 17:02:19 +02:00
Anthony Liguori 04120e3bb0 block: fix warning introduced in efcc7a23
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-10 09:10:42 -05:00
Paolo Bonzini efcc7a2324 stream: do not copy unallocated sectors from the base
Unallocated sectors should really never be accessed by the guest,
so there's no need to copy them during the streaming process.
If they are read by the guest during streaming, guest-initiated
copy-on-read will copy them (we're in the base == NULL case, which
enables copy on read).  If they are read after we disconnect the
image from the base, they will read as zeroes anyway.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 11:01:59 +02:00
Paolo Bonzini b21d677ee9 stream: fix ratelimiting corner case
This fixes inability to make progress in streaming if the quota is set
to less than the amount of data that an I/O operation has to write.

In this case, limit->dispatched + n will always be above the quota and,
due to the "goto retry" to recheck cancellation and allocation, streaming
will livelock.

This can be reproduced with "block_job_set_speed ide0-hd0 1b".  Of course,
with this patch the requested limit will not be obeyed.  That could be
done with another patch that caps is_allocated's n argument by the slice
quota.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 11:01:59 +02:00
Paolo Bonzini f6133def92 stream: pass new base image format to bdrv_change_backing_file
When an image is modified to point to the new backing file, the backing
file format is set to NULL, which means auto-probe.  This is wrong, in
fact it is a small security problem.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 11:01:59 +02:00
Paolo Bonzini fa4478d5c8 block: wait for job callback in block_job_cancel_sync
The limitation on not having I/O after cancellation cannot really be
kept.  Even streaming has a very small race window where you could
cancel a job and have it report completion.  If this window is hit,
bdrv_change_backing_file() will yield and possibly cause accesses to
dangling pointers etc.

So, let's just assume that we cannot know exactly what will happen
after the coroutine has set busy to false.  We can set a very lax
condition:

- if we cancel the job, the coroutine won't set it to false again
(and hence will not call co_sleep_ns again).

- block_job_cancel_sync will wait for the coroutine to exit, which
pretty much ensures no race.

Instead, we track the coroutine that executes the job and put very
strict conditions on what to do while it is quiescent (busy = false).
First of all, the coroutine must never set busy = false while the job
has been cancelled.  Second, the coroutine can be reentered arbitrarily
while it is quiescent, so you cannot really do anything but co_sleep_ns at
that time.  This condition is obeyed by the block_job_sleep_ns function.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini 4513eafe92 block: add block_job_sleep_ns
This function abstracts the pretty complex semantics of the "busy"
member of BlockJob.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini e023b2e244 block: fix snapshot on QED
QED's opaque data includes a pointer back to the BlockDriverState.
This breaks when bdrv_append shuffles data between bs_new and bs_top.
To avoid this, add a "rebind" function that tells the driver about
the new relationship between the BlockDriverState and its opaque.

The patch also adds rebind to VVFAT for completeness, even though
it is not used with live snapshots.

Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini 469ef350e1 block: update in-memory backing file and format
These are needed to print "info block" output correctly.  QCOW2 does this
because it needs it to write the header, but QED does not, and common code
is the right place to do it.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini 5f3777945d block: push bdrv_change_backing_file error checking up from drivers
This check applies to all drivers, but QED lacks it.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Anthony Liguori 7c652c1eaf Merge remote-tracking branch 'kwolf/for-anthony' into staging
* kwolf/for-anthony:
  fdc: simplify media change handling
  qcow2: lock on prealloc
  block: make bdrv_create adopt coroutine
  qcow2: Limit COW to where it's needed
  sheepdog: switch to writethrough mode if cluster doesn't support flush
2012-05-08 09:38:41 -05:00
Zhi Yong Wu 15552c4ad3 qcow2: lock on prealloc
preallocate() will be locked. This is required because
qcow2_alloc_cluster_link_l2() assumes that it runs under a lock that it
can drop while COW is being performed.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-07 19:33:18 +02:00
Kevin Wolf 54e6814360 qcow2: Limit COW to where it's needed
This fixes a regression introduced in commit 250196f1. The bug leads to
data corruption, found during an Autotest run with a Fedora 8 guest.

Consider a write request whose first part is covered by an already
allocated cluster, but additional clusters need to be newly allocated.
When counting the number of clusters to allocate, the qcow2 code would
decide to do COW for all remaining clusters of the write request, even
if some of them are already allocated.

If during this COW operation another write request is issued that touches
the same cluster, it will still refer to the old cluster. When the COW
completes, the first request will update the L2 table and the second
write request will be lost. Note that the requests need not overlap, it's
enough for them to touch the same cluster.

This patch ensures that only clusters that really require COW are
considered for allocation. In this case any other request writing to the
same cluster will be an allocating write and gets serialised.

Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-07 19:33:18 +02:00
MORITA Kazutaka 115c2b5a68 sheepdog: switch to writethrough mode if cluster doesn't support flush
This is necessary for qemu to work with the older version of Sheepdog
which doesn't support SD_OP_FLUSH_VDI.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-07 19:33:18 +02:00
Ronnie Sahlberg fa6acb0c2f ISCSI: Add support for thin-provisioning via discard/UNMAP and bigger LUNs
Update the configure test for libiscsi support to detect version 1.3
or later.  Version 1.3 of libiscsi provides both READCAPACITY16 as well
as UNMAP commands.

Update the iscsi block layer to use READCAPACITY16 to detect the size of
the LUN instead of READCAPACITY10. This allows support for LUNs larger
than 2TB.

Update to implement bdrv_aio_discard() using the UNMAP command.
This allows us to use thin-provisioned LUNs from TGTD and other iSCSI
targets that support thin-provisioning.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
[squashed in subsequent patch from Ronnie to fix off-by-one in LBA count]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-05-04 10:39:18 +02:00
Josh Durgin 787f31330e rbd: add discard support
Change the write flag to an operation type in RBDAIOCB, and make the
buffer optional since discard doesn't use it.

Discard is first included in librbd 0.1.2 (which is in Ceph 0.46).
If librbd is too old, leave out qemu_rbd_aio_discard entirely,
so the old behavior is preserved.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-02 18:41:42 +02:00
Zhi Yong Wu 647cc47223 qcow2: fix the return value -ENOENT -> -EEXIST
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-02 18:39:39 +02:00
Kevin Wolf 7242411460 qcow2: Don't hold cache references across yield
If cache references are held while the coroutine has yielded, the cache
may get used up and abort() when it can't find a free entry.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-02 18:39:39 +02:00
Kevin Wolf 60651f901a qcow2: Remove unused parameter in do_alloc_cluster_offset
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-02 18:39:39 +02:00
Stefan Weil b9531b6eed block/qcow2: Add missing GCC_FMT_ATTR to function report_unsupported()
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-02 18:39:39 +02:00
Pavel Borzenkov 83affaa622 raw-posix: Do not use CONFIG_COCOA macro
Use __APPLE__ and __MACH__ macros instead of CONFIG_COCOA to detect Mac
OS X host. The patch is based on Ben Leslie's patch:
http://patchwork.ozlabs.org/patch/97859/

Signed-off-by: Ben Leslie <benno@benno.id.au>
Signed-off-by: Pavel Borzenkov <pavel.borzenkov@gmail.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
2012-05-01 00:16:58 +02:00
Anthony Liguori a8b69b8e24 Merge remote-tracking branch 'qmp/queue/qmp' into staging
* qmp/queue/qmp:
  qapi: fix qmp_balloon() conversion
  qemu-iotests: add block-stream speed value test case
  block: add 'speed' optional parameter to block-stream
  block: change block-job-set-speed argument from 'value' to 'speed'
  block: use Error mechanism instead of -errno for block_job_set_speed()
  block: use Error mechanism instead of -errno for block_job_create()
2012-04-27 12:00:06 -05:00
Stefan Hajnoczi c83c66c3b5 block: add 'speed' optional parameter to block-stream
Allow streaming operations to be started with an initial speed limit.
This eliminates the window of time between starting streaming and
issuing block-job-set-speed.  Users should use the new optional 'speed'
parameter instead so that speed limits are in effect immediately when
the job starts.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27 11:44:50 -03:00
Stefan Hajnoczi 882ec7ce53 block: change block-job-set-speed argument from 'value' to 'speed'
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27 11:44:50 -03:00
Stefan Hajnoczi 9e6636c72d block: use Error mechanism instead of -errno for block_job_set_speed()
There are at least two different errors that can occur in
block_job_set_speed(): the job might not support setting speeds or the
value might be invalid.

Use the Error mechanism to report the error where it occurs.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27 11:44:50 -03:00