Using ip_deq after m_free might read pointers from an allocation reuse.
This would be difficult to exploit, but that is still related with
CVE-2019-14378 which generates fragmented IP packets that would trigger this
issue and at least produce a DoS.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(from libslirp.git commit c59279437eda91841b9d26079c70b8a540d41204)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When the first fragment does not fit in the preallocated buffer, q will
already be pointing to the ext buffer, so we mustn't try to update it.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(from libslirp.git commit 126c04acbabd7ad32c2b018fe10dfac2a3bc1210)
(from libslirp.git commit e0be80430c390bce181ea04dfcdd6ea3dfa97de1)
*squash in e0be80 (clarifying comments)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
In function ‘create_qp’:
hw/rdma/vmw/pvrdma_cmd.c:517:16: error: ‘rc’ undeclared
The backport of 509f57c98 in 41dd30ff6 mishandled the conflict
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The network interface name in Linux is defined to be of size
IFNAMSIZ(=16), including the terminating null('\0') byte.
The same is applied to interface names read from 'bridge.conf'
file to form ACL rules. If user supplied '--br=bridge' name
is not restricted to the same length, it could lead to ACL bypass
issue. Restrict interface name to IFNAMSIZ, including null byte.
Reported-by: Riccardo Schirone <rschiron@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 6f5d867122)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
In the block layer, synchronous APIs are often implemented by creating a
coroutine that calls the asynchronous coroutine-based implementation and
then waiting for completion with BDRV_POLL_WHILE().
For this to work with iothreads (more specifically, when the synchronous
API is called in a thread that is not the home thread of the block
device, so that the coroutine will run in a different thread), we must
make sure to call aio_wait_kick() at the end of the operation. Many
places are missing this, so that BDRV_POLL_WHILE() keeps hanging even if
the condition has long become false.
Note that bdrv_dec_in_flight() involves an aio_wait_kick() call. This
corresponds to the BDRV_POLL_WHILE() in the drain functions, but it is
generally not enough for most other operations because they haven't set
the return value in the coroutine entry stub yet. To avoid race
conditions there, we need to kick after setting the return value.
The race window is small enough that the problem doesn't usually surface
in the common path. However, it does surface and causes easily
reproducible hangs if the operation can return early before even calling
bdrv_inc/dec_in_flight, which many of them do (trivial error or no-op
success paths).
The bug in bdrv_truncate(), bdrv_check() and bdrv_invalidate_cache() is
slightly different: These functions even neglected to schedule the
coroutine in the home thread of the node. This avoids the hang, but is
obviously wrong, too. Fix those to schedule the coroutine in the right
AioContext in addition to adding aio_wait_kick() calls.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 4720cbeea1)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
create_cq and create_qp routines allocate ring object, but it's
not released in case of an error, leading to memory leakage.
Reported-by: Li Qiang <liq3ea@163.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
(cherry picked from commit 509f57c98e)
Conflicts:
hw/rdma/vmw/pvrdma_cmd.c
*drop dependency on 09178217
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
pvrdma_idx_ring_has_[data/space] routines also return invalid
index PVRDMA_INVALID_IDX[=-1], if ring has no data/space. Check
return value from these routines to avoid plausible infinite loops.
Reported-by: Li Qiang <liq3ea@163.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
(cherry picked from commit f1e2e38ee0)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When creating CQ/QP rings, an object can have up to
PVRDMA_MAX_FAST_REG_PAGES 8 pages. Check 'npages' parameter
to avoid excessive memory allocation or a null dereference.
Reported-by: Li Qiang <liq3ea@163.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
(cherry picked from commit 2c858ce5da)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
If the value of get_image_size() exceeds INT_MAX / 2 - 10000, the
computation of @dt_size overflows to a negative number, which then
gets converted to a very large size_t for g_malloc0() and
load_image_size(). In the (fortunately improbable) case g_malloc0()
succeeds and load_image_size() survives, we'd assign the negative
number to *sizep. What that would do to the callers I can't say, but
it's unlikely to be good.
Fix by rejecting images whose size would overflow.
Reported-by: Kurtis Miller <kurtis.miller@nccgroup.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190409174018.25798-1-armbru@redhat.com>
(cherry picked from commit 065e6298a7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The load_image() function is deprecated, as it does not let the
caller specify how large the buffer to read the file into is.
Instead use load_image_size().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20181130151712.2312-9-peter.maydell@linaro.org
(cherry picked from commit da885fe1ee)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When releasing spice resources in release_resource() routine,
if release info object 'ext.info' is null, it leads to null
pointer dereference. Add check to avoid it.
Reported-by: Bugs SysSec <bugs-syssec@rub.de>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 20190425063534.32747-1-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit d52680fc93)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The Mesa library tries to set process affinity on some of its threads in
order to optimize its performance. Currently this results in QEMU being
immediately terminated when seccomp is enabled.
Mesa doesn't consider failure of the process affinity settings to be
fatal to its operation, but our seccomp policy gives it no choice in
gracefully handling this denial.
It is reasonable to consider that malicious code using the resource
control syscalls to be a less serious attack than if they were trying
to spawn processes or change UIDs and other such things. Generally
speaking changing the resource control setting will "merely" affect
quality of service of processes on the host. With this in mind, rather
than kill the process, we can relax the policy for these syscalls to
return the EPERM errno value. This allows callers to detect that QEMU
does not want them to change resource allocations, and apply some
reasonable fallback logic.
The main downside to this is for code which uses these syscalls but does
not check the return value, blindly assuming they will always
succeeed. Returning an errno could result in sub-optimal behaviour.
Arguably though such code is already broken & needs fixing regardless.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 9a1565a03b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
While emulating identification protocol, tcp_emu() does not check
available space in the 'sc_rcv->sb_data' buffer. It could lead to
heap buffer overflow issue. Add check to avoid it.
Reported-by: Kira <864786842@qq.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(cherry picked from commit a7104eda7d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Whenever the allocation length of a SCSI request is shorter than the size of the
VPD page list, page_idx is used blindly to index into r->buf. Even though
the stores in the insertion sort are protected against overflows, the same is not
true of the reads and the final store of 0xb0.
This basically does the same thing as commit 57dbb58d80 ("scsi-generic: avoid
out-of-bounds access to VPD page list", 2018-11-06), except that here the
allocation length can be chosen by the guest. Note that according to the SCSI
standard, the contents of the PAGE LENGTH field are not altered based
on the allocation length.
The code was introduced by commit 6c219fc8a1 ("scsi-generic: keep VPD
page list sorted", 2018-11-06) but the overflow was already possible before.
Reported-by: Kevin Wolf <kwolf@redhat.com>
Fixes: a71c775b24
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e909ff9369)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
If during pvrdma device initialisation an error occurs,
pvrdma_realize() does not release memory resources, leading
to memory leakage.
Reported-by: Li Qiang <liq3ea@163.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20181212175817.815-1-ppandit@redhat.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
(cherry picked from commit cce648613b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The glfs_*_async() functions do a callback once finished. This callback
has changed its arguments, pre- and post-stat structures have been
added. This makes it possible to improve caching, which is useful for
Samba and NFS-Ganesha, but not so much for QEMU. Gluster 6 is the first
release that includes these new arguments.
With an additional detection in ./configure, the new arguments can
conditionally get included in the glfs_io_cbk handler.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 0e3b891fef)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
New versions of Glusters libgfapi.so have an updated glfs_ftruncate()
function that returns additional 'struct stat' structures to enable
advanced caching of attributes. This is useful for file servers, not so
much for QEMU. Nevertheless, the API has changed and needs to be
adopted.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit e014dbe74e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
To avoid incoherent states when the machine resets (see bug report
below), add the device reset callback.
A "system reset" sets the device state machine in READ_ARRAY mode
and, after some delay, set the SR.7 READY bit.
Since we do not model timings, we set the SR.7 bit directly.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1678713
Reported-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
[Laszlo Ersek: Regression tested EDK2 OVMF IA32X64, ArmVirtQemu Aarch64
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg04373.html]
Message-Id: <20190718104837.13905-2-philmd@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
(cherry picked from commit 3a283507c0)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
We have two open-coded copies of macro PFLASH_CFI01(). Move the macro
to the header, so we can ditch the copies. Move PFLASH_CFI02() to the
header for symmetry.
We define macros TYPE_PFLASH_CFI01 and TYPE_PFLASH_CFI02 for type name
strings, then mostly use the strings. If the macros are worth
defining, they are worth using. Replace the strings by the macros.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20190308094610.21210-6-armbru@redhat.com>
(cherry picked from commit 81c7db723e)
*prereq for 3a283507
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
pflash_cfi01.c and pflash_cfi02.c start their identifiers with
pflash_cfi01_ and pflash_cfi02_ respectively, except for
CFI_PFLASH01(), TYPE_CFI_PFLASH01, CFI_PFLASH02(), TYPE_CFI_PFLASH02.
Rename for consistency.
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20190308094610.21210-5-armbru@redhat.com>
(cherry picked from commit e7b6274197)
*prereq for 3a283507
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Our implementation of "write to buffer" (command 0xE8) is flawed.
LOG_UNIMP its use, and add some FIXME comments.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20190308094610.21210-4-armbru@redhat.com>
(cherry picked from commit 4dbda935e0)
*prereq for 3a283507
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When a guest tries to abort "write to buffer" (command 0xE8), we print
"PFLASH: Possible BUG - Write block confirm", then exit(1). Letting
the guest terminate QEMU is not a good idea. Instead, LOG_UNIMP we
screwed up, then reset the device.
Macro PFLASH_BUG() is now unused; delete it.
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20190308094610.21210-3-armbru@redhat.com>
(cherry picked from commit 2d93bebf81)
*prereq for 3a283507
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
flash.h's incomplete struct pflash_t is completed both in
pflash_cfi01.c and in pflash_cfi02.c. The complete types are
incompatible. This can hide type errors, such as passing a pflash_t
created with pflash_cfi02_register() to pflash_cfi01_get_memory().
Furthermore, POSIX reserves typedef names ending with _t.
Rename the two structs to PFlashCFI01 and PFlashCFI02.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190308094610.21210-2-armbru@redhat.com>
(cherry picked from commit 1643406520)
*prereq for 3a283507
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Don't dynamically allocate the pflash's timer. But do use timer_del in
an unrealize function to make sure that the timer can't fire after the
pflash_t has been freed.
Signed-off-by: Stephen Checkoway <stephen.checkoway@oberlin.edu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190219153727.62279-1-stephen.checkoway@oberlin.edu>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit d80cf1eb2e)
*prereq for 16434065/3a283507
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
In the previous commit we fixed a crash when the guest read a
register that pop from an empty FIFO.
By auditing the repository, we found another similar use with
an easy way to reproduce:
$ qemu-system-aarch64 -M xlnx-zcu102 -monitor stdio -S
QEMU 4.0.50 monitor - type 'help' for more information
(qemu) xp/b 0xfd4a0134
Aborted (core dumped)
(gdb) bt
#0 0x00007f6936dea57f in raise () at /lib64/libc.so.6
#1 0x00007f6936dd4895 in abort () at /lib64/libc.so.6
#2 0x0000561ad32975ec in xlnx_dp_aux_pop_rx_fifo (s=0x7f692babee70) at hw/display/xlnx_dp.c:431
#3 0x0000561ad3297dc0 in xlnx_dp_read (opaque=0x7f692babee70, offset=77, size=4) at hw/display/xlnx_dp.c:667
#4 0x0000561ad321b896 in memory_region_read_accessor (mr=0x7f692babf620, addr=308, value=0x7ffe05c1db88, size=4, shift=0, mask=4294967295, attrs=...) at memory.c:439
#5 0x0000561ad321bd70 in access_with_adjusted_size (addr=308, value=0x7ffe05c1db88, size=1, access_size_min=4, access_size_max=4, access_fn=0x561ad321b858 <memory_region_read_accessor>, mr=0x7f692babf620, attrs=...) at memory.c:569
#6 0x0000561ad321e9d5 in memory_region_dispatch_read1 (mr=0x7f692babf620, addr=308, pval=0x7ffe05c1db88, size=1, attrs=...) at memory.c:1420
#7 0x0000561ad321ea9d in memory_region_dispatch_read (mr=0x7f692babf620, addr=308, pval=0x7ffe05c1db88, size=1, attrs=...) at memory.c:1447
#8 0x0000561ad31bd742 in flatview_read_continue (fv=0x561ad69c04f0, addr=4249485620, attrs=..., buf=0x7ffe05c1dcf0 "\020\335\301\005\376\177", len=1, addr1=308, l=1, mr=0x7f692babf620) at exec.c:3385
#9 0x0000561ad31bd895 in flatview_read (fv=0x561ad69c04f0, addr=4249485620, attrs=..., buf=0x7ffe05c1dcf0 "\020\335\301\005\376\177", len=1) at exec.c:3423
#10 0x0000561ad31bd90b in address_space_read_full (as=0x561ad5bb3020, addr=4249485620, attrs=..., buf=0x7ffe05c1dcf0 "\020\335\301\005\376\177", len=1) at exec.c:3436
#11 0x0000561ad33b1c42 in address_space_read (len=1, buf=0x7ffe05c1dcf0 "\020\335\301\005\376\177", attrs=..., addr=4249485620, as=0x561ad5bb3020) at include/exec/memory.h:2131
#12 0x0000561ad33b1c42 in memory_dump (mon=0x561ad59c4530, count=1, format=120, wsize=1, addr=4249485620, is_physical=1) at monitor/misc.c:723
#13 0x0000561ad33b1fc1 in hmp_physical_memory_dump (mon=0x561ad59c4530, qdict=0x561ad6c6fd00) at monitor/misc.c:795
#14 0x0000561ad37b4a9f in handle_hmp_command (mon=0x561ad59c4530, cmdline=0x561ad59d0f22 "/b 0x00000000fd4a0134") at monitor/hmp.c:1082
Fix by checking the FIFO is not empty before popping from it.
The datasheet is not clear about the reset value of this register,
we choose to return '0'.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20190709113715.7761-4-philmd@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit a09ef50404)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reading the RX_DATA register when the RX_FIFO is empty triggers
an abort. This can be easily reproduced:
$ qemu-system-arm -M emcraft-sf2 -monitor stdio -S
QEMU 4.0.50 monitor - type 'help' for more information
(qemu) x 0x40001010
Aborted (core dumped)
(gdb) bt
#1 0x00007f035874f895 in abort () at /lib64/libc.so.6
#2 0x00005628686591ff in fifo8_pop (fifo=0x56286a9a4c68) at util/fifo8.c:66
#3 0x00005628683e0b8e in fifo32_pop (fifo=0x56286a9a4c68) at include/qemu/fifo32.h:137
#4 0x00005628683e0efb in spi_read (opaque=0x56286a9a4850, addr=4, size=4) at hw/ssi/mss-spi.c:168
#5 0x0000562867f96801 in memory_region_read_accessor (mr=0x56286a9a4b60, addr=16, value=0x7ffeecb0c5c8, size=4, shift=0, mask=4294967295, attrs=...) at memory.c:439
#6 0x0000562867f96cdb in access_with_adjusted_size (addr=16, value=0x7ffeecb0c5c8, size=4, access_size_min=1, access_size_max=4, access_fn=0x562867f967c3 <memory_region_read_accessor>, mr=0x56286a9a4b60, attrs=...) at memory.c:569
#7 0x0000562867f99940 in memory_region_dispatch_read1 (mr=0x56286a9a4b60, addr=16, pval=0x7ffeecb0c5c8, size=4, attrs=...) at memory.c:1420
#8 0x0000562867f99a08 in memory_region_dispatch_read (mr=0x56286a9a4b60, addr=16, pval=0x7ffeecb0c5c8, size=4, attrs=...) at memory.c:1447
#9 0x0000562867f38721 in flatview_read_continue (fv=0x56286aec6360, addr=1073745936, attrs=..., buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4, addr1=16, l=4, mr=0x56286a9a4b60) at exec.c:3385
#10 0x0000562867f38874 in flatview_read (fv=0x56286aec6360, addr=1073745936, attrs=..., buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4) at exec.c:3423
#11 0x0000562867f388ea in address_space_read_full (as=0x56286aa3e890, addr=1073745936, attrs=..., buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4) at exec.c:3436
#12 0x0000562867f389c5 in address_space_rw (as=0x56286aa3e890, addr=1073745936, attrs=..., buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4, is_write=false) at exec.c:3466
#13 0x0000562867f3bdd7 in cpu_memory_rw_debug (cpu=0x56286aa19d00, addr=1073745936, buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4, is_write=0) at exec.c:3976
#14 0x000056286811ed51 in memory_dump (mon=0x56286a8c32d0, count=1, format=120, wsize=4, addr=1073745936, is_physical=0) at monitor/misc.c:730
#15 0x000056286811eff1 in hmp_memory_dump (mon=0x56286a8c32d0, qdict=0x56286b15c400) at monitor/misc.c:785
#16 0x00005628684740ee in handle_hmp_command (mon=0x56286a8c32d0, cmdline=0x56286a8caeb2 "0x40001010") at monitor/hmp.c:1082
From the datasheet "Actel SmartFusion Microcontroller Subsystem
User's Guide" Rev.1, Table 13-3 "SPI Register Summary", this
register has a reset value of 0.
Check the FIFO is not empty before accessing it, else log an
error message.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20190709113715.7761-3-philmd@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit c0bccee9b4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Both lqspi_read() and lqspi_load_cache() expect a 32-bit
aligned address.
>From UG1085 datasheet [*] chapter on 'Quad-SPI Controller':
Transfer Size Limitations
Because of the 32-bit wide TX, RX, and generic FIFO, all
APB/AXI transfers must be an integer multiple of 4-bytes.
Shorter transfers are not possible.
Set MemoryRegionOps.impl values to force 32-bit accesses,
this way we are sure we do not access the lqspi_buf[] array
out of bound.
[*] https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Tested-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 526668c734)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The function gen_get_ccr() returns a tcg_temp created with
tcg_temp_new(). Free it with tcg_temp_free().
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190310003428.11723-4-f4bug@amsat.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit 44c64e9095)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Previous patches switched to a temporary pbp but that does not go far
enough: after device uses a buffer, guest is free to reuse it, so
tracking the page and freeing it later is wrong.
Free and reset the pbp after we push each element.
Fixes: ed48c59875 ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size")
Cc: qemu-stable@nongnu.org #v4.0.0
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 1b47b37c33)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
As ramblocks cannot get removed/readded while we are processing a bulk
of inflation requests, there is no more need to track the page size
in form of the number of subpages.
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190725113638.4702-8-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 9a7ca8a7c9)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
We still have multiple issues in the current code
- The PBP is not freed during unrealize()
- The PBP is not reset on device resets: After a reset, the PBP is stale.
- We are not indicating VIRTIO_BALLOON_F_MUST_TELL_HOST, therefore
guests (esp. legacy guests) will reuse pages without deflating,
turning the PBP stale. Adding that would require compat handling.
Instead, let's use the PBP only temporarily, when processing one bulk of
inflation requests. This will keep guest_page_size > 4k working (with
Linux guests). There is nothing to do for deflation requests anymore.
The pbp is only used for a limited amount of time.
Fixes: ed48c59875 ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size")
Cc: qemu-stable@nongnu.org #v4.0.0
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190722134108.22151-7-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit a8cd64d488)
*drop context dependency on qemu_4_0_config_size changes
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Using the address of a RAMBlock to test for a matching pbp is not really
safe. Instead, let's use the guest physical address of the base page
along with the page size (via the number of subpages).
Also, let's allocate the bitmap separately. This makes the code
easier to read and maintain - we can reuse bitmap_new().
Prepare the code to move the PBP out of the device.
Fixes: ed48c59875 ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size")
Fixes: b27b323914 ("virtio-balloon: Fix possible guest memory corruption with inflates & deflates")
Cc: qemu-stable@nongnu.org #v4.0.0
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190722134108.22151-6-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 1c5cfc2b71)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
"host_page_base" is really confusing, let's make this clearer, also
rename the other offsets to indicate to which base they apply.
offset -> mr_offset
ram_offset -> rb_offset
host_page_base -> rb_aligned_offset
While at it, use QEMU_ALIGN_DOWN() instead of a handcrafted computation
and move the computation to the place where it is needed.
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190722134108.22151-5-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit e6129b271b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Let's simplify this - the case we are optimizing for is very hard to
trigger and not worth the effort. If we're switching from inflation to
deflation, let's reset the pbp.
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190722134108.22151-4-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 2ffc49eea1)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
We are using the wrong functions to set/clear bits, effectively touching
multiple bits, writing out of range of the bitmap, resulting in memory
corruptions. We have to use set_bit()/clear_bit() instead.
Can easily be reproduced by starting a qemu guest on hugetlbfs memory,
inflating the balloon. QEMU crashes. This never could have worked
properly - especially, also pages would have been discarded when the
first sub-page would be inflated (the whole bitmap would be set).
While testing I realized, that on hugetlbfs it is pretty much impossible
to discard a page - the guest just frees the 4k sub-pages in random order
most of the time. I was only able to discard a hugepage a handful of
times - so I hope that now works correctly.
Fixes: ed48c59875 ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size")
Fixes: b27b323914 ("virtio-balloon: Fix possible guest memory corruption with inflates & deflates")
Cc: qemu-stable@nongnu.org #v4.0.0
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190722134108.22151-3-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 483f13524b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
If we directly cast from int to uint64_t, we will first sign-extend to
an int64_t, which is wrong. We actually want to treat the PFNs like
unsigned values.
As far as I can see, this dates back to the initial virtio-balloon
commit, but wasn't triggered as fairly big guests would be required.
Cc: qemu-stable@nongnu.org
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190722134108.22151-2-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit ffa207d082)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Prior to f6deb6d9 "virtio-balloon: Remove unnecessary MADV_WILLNEED on
deflate", the balloon device issued an madvise() MADV_WILLNEED on
pages removed from the balloon. That would hint to the host kernel
that the pages were likely to be needed by the guest in the near
future.
It's unclear if this is actually valuable or not, and so f6deb6d9
removed this, essentially ignoring balloon deflate requests. However,
concerns have been raised that this might cause a performance
regression by causing extra latency for the guest in certain
configurations.
So, until we can get actual benchmark data to see if that's the case,
this restores the old behaviour, issuing a MADV_WILLNEED when a page is
removed from the balloon.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190306030601.21986-4-david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 596546fe9e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
This fixes a balloon bug with a nasty consequence - potentially
corrupting guest memory - but which is extremely unlikely to be
triggered in practice.
The balloon always works in 4kiB units, but the host could have a
larger page size on certain platforms. Since ed48c59 "virtio-balloon:
Safely handle BALLOON_PAGE_SIZE < host page size" we've handled this
by accumulating requests to balloon 4kiB subpages until they formed a
full host page. Since f6deb6d "virtio-balloon: Remove unnecessary
MADV_WILLNEED on deflate" we essentially ignore deflate requests.
Suppose we have a host with 8kiB pages, and one host page has subpages
A & B. If we get this sequence of events -
inflate A
deflate A
inflate B
- the current logic will discard the whole host page. That's
incorrect because the guest has deflated subpage A, and could have
written important data to it.
This patch fixes the problem by adjusting our state information about
partially ballooned host pages when deflate requests are received.
Fixes: ed48c59 "virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size"
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190306030601.21986-3-david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
(cherry picked from commit b27b323914)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
ed48c59875 "virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host
page size" introduced a new temporary data structure which tracks 4kiB
chunks which have been inserted into the balloon by the guest but
don't yet form a full host page which we can discard.
Unfortunately, I had a thinko and allocated that structure with
g_malloc0() but freed it with a plain free() rather than g_free().
This corrects the problem.
Fixes: ed48c59875
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190306030601.21986-2-david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
(cherry picked from commit 301cf2a8dd)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The virtio-balloon always works in units of 4kiB (BALLOON_PAGE_SIZE), but
we can only actually discard memory in units of the host page size.
Now, we handle this very badly: we silently ignore balloon requests that
aren't host page aligned, and for requests that are host page aligned we
discard the entire host page. The latter can corrupt guest memory if its
page size is smaller than the host's.
The obvious choice would be to disable the balloon if the host page size is
not 4kiB. However, that would break the special case where host and guest
have the same page size, but that's larger than 4kiB. That case currently
works by accident[1] - and is used in practice on many production POWER
systems where 64kiB has long been the Linux default page size on both host
and guest.
To make the balloon safe, without breaking that useful special case, we
need to accumulate 4kiB balloon requests until we have a whole contiguous
host page to discard.
We could in principle do that across all guest memory, but it would require
a large bitmap to track. This patch represents a compromise: we track
ballooned subpages for a single contiguous host page at a time. This means
that if the guest discards all 4kiB chunks of a host page in succession,
we will discard it. This is the expected behaviour in the (host page) ==
(guest page) != 4kiB case we want to support.
If the guest scatters 4kiB requests across different host pages, we don't
discard anything, and issue a warning. Not ideal, but at least we don't
corrupt guest memory as the previous version could.
Warning reporting is kind of a compromise here. Determining whether we're
in a problematic state at realize() time is tricky, because we'd have to
look at the host pagesizes of all memory backends, but we can't really know
if some of those backends could be for special purpose memory that's not
subject to ballooning.
Reporting only when the guest tries to balloon a partial page also isn't
great because if the guest page size happens to line up it won't indicate
that we're in a non ideal situation. It could also cause alarming repeated
warnings whenever a migration is attempted.
So, what we do is warn the first time the guest attempts balloon a partial
host page, whether or not it will end up ballooning the rest of the page
immediately afterwards.
[1] Because when the guest attempts to balloon a page, it will submit
requests for each 4kiB subpage. Most will be ignored, but the one
which happens to be host page aligned will discard the whole lot.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190214043916.22128-6-david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit ed48c59875)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Currently, virtio-balloon uses madvise() with MADV_DONTNEED to actually
discard RAM pages inserted into the balloon. This is basically a Linux
only interface (MADV_DONTNEED exists on some other platforms, but doesn't
always have the same semantics). It also doesn't work on hugepages and has
some other limitations.
It turns out that postcopy also needs to discard chunks of memory, and uses
a better interface for it: ram_block_discard_range(). It doesn't cover
every case, but it covers more than going direct to madvise() and this
gives us a single place to update for more possibilities in future.
There are some subtleties here to maintain the current balloon behaviour:
* For now, we just ignore requests to balloon in a hugepage backed region.
That matches current behaviour, because MADV_DONTNEED on a hugepage would
simply fail, and we ignore the error.
* If host page size is > BALLOON_PAGE_SIZE we can frequently call this on
non-host-page-aligned addresses. These would also fail in madvise(),
which we then ignored. ram_block_discard_range() error_report()s calls
on unaligned addresses, so we explicitly check that case to avoid
spamming the logs.
* We now call ram_block_discard_range() with the *host* page size, whereas
we previously called madvise() with BALLOON_PAGE_SIZE. Surprisingly,
this also matches existing behaviour. Although the kernel fails madvise
on unaligned addresses, it will round unaligned sizes *up* to the host
page size. Yes, this means that if BALLOON_PAGE_SIZE < guest page size
we can incorrectly discard more memory than the guest asked us to. I'm
planning to address that soon.
Errors other than the ones discussed above, will now be reported by
ram_block_discard_range(), rather than silently ignored, which means we
have a much better chance of seeing when something is going wrong.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20190214043916.22128-5-david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit dbe1a27745)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
This replaces the balloon_page() internal interface with
ballon_inflate_page(), with a slightly different interface. The new
interface will make future alterations simpler.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20190214043916.22128-4-david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit e9550234d7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The virtio-balloon device's verification of the address given to it by the
guest has a number of faults:
* The addresses here are guest physical addresses, which should be
'hwaddr' rather than 'ram_addr_t' (the distinction is admittedly
pretty subtle and confusing)
* We don't check for section.mr being NULL, which is the main way that
memory_region_find() reports basic failures. We really need to check
that before looking at any other section fields, because
memory_region_find() doesn't initialize them on the failure path
* We're passing a length of '1' to memory_region_find(), but really the
guest is requesting that we put the entire page into the balloon,
so it makes more sense to call it with BALLOON_PAGE_SIZE
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20190214043916.22128-3-david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit b218a70e6a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When the balloon is inflated, we discard memory place in it using madvise()
with MADV_DONTNEED. And when we deflate it we use MADV_WILLNEED, which
sounds like it makes sense but is actually unnecessary.
The misleadingly named MADV_DONTNEED just discards the memory in question,
it doesn't set any persistent state on it in-kernel; all that's necessary
to bring the memory back is to touch it. MADV_WILLNEED in contrast
specifically says that the memory will be used soon and faults it in.
This patch simplify's the balloon operation by dropping the madvise()
on deflate. This might have an impact on performance - it will move a
delay at deflate time until that memory is actually touched, which
might be more latency sensitive. However:
* Memory that's being given back to the guest by deflating the
balloon *might* be used soon, but it equally could just sit around
in the guest's pools until needed (or even be faulted out again if
the host is under memory pressure).
* Usually, the timescale over which you'll be adjusting the balloon
is long enough that a few extra faults after deflation aren't
going to make a difference.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20190214043916.22128-2-david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit f6deb6d95a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
In virtio_balloon_get_config() we initialize a struct virtio_balloon_config
which we then copy to guest memory. However, the local variable is not
zero initialized. This works OK at the moment because we initialize
all the fields in it; however an upcoming kernel header change will
add some new fields. If we don't zero out the whole struct then we
will start leaking a small amount of the contents of QEMU's stack
to the guest as soon as we update linux-headers/ to a set of headers
that includes the new fields.
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190118183603.24757-1-peter.maydell@linaro.org
(cherry picked from commit 5385a5988c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Show PCIe host bridge PNP id with PCI host bridge as a compatible id
when expanding a pcie bus.
Cc: qemu-stable@nongnu.org
Signed-off-by: Evgeny Yakovlev <wrfsh@yandex-team.ru>
Message-Id: <1563526469-15588-1-git-send-email-wrfsh@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit ee4b0c8686)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When very large regions (32GB sized in our case, PCI pass-through of GPUs)
are compared substraction result does not fit into gint.
As a result crs_replace_with_free_ranges does not get sorted ranges and
incorrectly computes PCI64 free space regions. Which then makes linux
guest complain about device and PCI64 hole intersection and device
becomes unusable.
Fix that by returning exactly fitting ranges.
Also fix indentation of an entire crs_replace_with_free_ranges to make
checkpatch happy.
Cc: qemu-stable@nongnu.org
Signed-off-by: Evgeny Yakovlev <wrfsh@yandex-team.ru>
Message-Id: <1563466463-26012-1-git-send-email-wrfsh@yandex-team.ru>
Signed-off-by: Evgeny Yakovlev <wrfsh@yandex-team.ru>
(cherry picked from commit 21e2acd583)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Masked entries will not generate interrupt messages, thus do no need to
be routed by KVM. This is a cosmetic cleanup, just avoiding warnings of
the kind
qemu-system-x86_64: vtd_irte_get: detected non-present IRTE (index=0, high=0xff00, low=0x100)
if the masked entry happens to reference a non-present IRTE.
Cc: qemu-stable@nongnu.org
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Message-Id: <a84b7e03-f9a8-b577-be27-4d93d1caa1c9@siemens.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit be1927c97e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Implement a function to translate TPM error codes to strings so that
at least the most common error codes can be translated to human
readable strings.
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
(cherry picked from commit 7e095e84ba)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Exit() in the frontend reset function when the backend indicates
intialization failure.
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
(cherry picked from commit bcfd16fe26)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When a guest which doesn't support multiqueue is migrated with a multi queues
vhost-user-blk deivce, a crash will occur like:
0 qemu_memfd_alloc (name=<value optimized out>, size=562949953421312, seals=<value optimized out>, fd=0x7f87171fe8b4, errp=0x7f87171fe8a8) at util/memfd.c:153
1 0x00007f883559d7cf in vhost_log_alloc (size=70368744177664, share=true) at hw/virtio/vhost.c:186
2 0x00007f88355a0758 in vhost_log_get (listener=0x7f8838bd7940, enable=1) at qemu-2-12/hw/virtio/vhost.c:211
3 vhost_dev_log_resize (listener=0x7f8838bd7940, enable=1) at hw/virtio/vhost.c:263
4 vhost_migration_log (listener=0x7f8838bd7940, enable=1) at hw/virtio/vhost.c:787
5 0x00007f88355463d6 in memory_global_dirty_log_start () at memory.c:2503
6 0x00007f8835550577 in ram_init_bitmaps (f=0x7f88384ce600, opaque=0x7f8836024098) at migration/ram.c:2173
7 ram_init_all (f=0x7f88384ce600, opaque=0x7f8836024098) at migration/ram.c:2192
8 ram_save_setup (f=0x7f88384ce600, opaque=0x7f8836024098) at migration/ram.c:2219
9 0x00007f88357a419d in qemu_savevm_state_setup (f=0x7f88384ce600) at migration/savevm.c:1002
10 0x00007f883579fc3e in migration_thread (opaque=0x7f8837530400) at migration/migration.c:2382
11 0x00007f8832447893 in start_thread () from /lib64/libpthread.so.0
12 0x00007f8832178bfd in clone () from /lib64/libc.so.6
This is because vhost_get_log_size() returns a overflowed vhost-log size.
In this function, it uses the uninitialized variable vqs->used_phys and
vqs->used_size to get the vhost-log size.
Signed-off-by: Li Hangjing <lihangjing@baidu.com>
Reviewed-by: Xie Yongji <xieyongji@baidu.com>
Reviewed-by: Chai Wen <chaiwen@baidu.com>
Message-Id: <20190603061524.24076-1-lihangjing@baidu.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 240e647a14)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
We already have 221 for accesses through the page cache, but it is
better to create a new file for O_DIRECT instead of integrating those
test cases into 221. This way, we can make use of
_supported_cache_modes (and _default_cache_mode) so the test is
automatically skipped on filesystems that do not support O_DIRECT.
As part of the split, add _supported_cache_modes to 221. With that, it
no longer fails when run with -c none or -c directsync.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 2fab30c80b)
*remove context dependencies on iotests not in 3.1
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Currently, qemu crashes whenever someone queries the block status of an
unaligned image tail of an O_DIRECT image:
$ echo > foo
$ qemu-img map --image-opts driver=file,filename=foo,cache.direct=on
Offset Length Mapped to File
qemu-img: block/io.c:2093: bdrv_co_block_status: Assertion `*pnum &&
QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset'
failed.
This is because bdrv_co_block_status() checks that the result returned
by the driver's implementation is aligned to the request_alignment, but
file-posix can fail to do so, which is actually mentioned in a comment
there: "[...] possibly including a partial sector at EOF".
Fix this by rounding up those partial sectors.
There are two possible alternative fixes:
(1) We could refuse to open unaligned image files with O_DIRECT
altogether. That sounds reasonable until you realize that qcow2
does necessarily not fill up its metadata clusters, and that nobody
runs qemu-img create with O_DIRECT. Therefore, unpreallocated qcow2
files usually have an unaligned image tail.
(2) bdrv_co_block_status() could ignore unaligned tails. It actually
throws away everything past the EOF already, so that sounds
reasonable.
Unfortunately, the block layer knows file lengths only with a
granularity of BDRV_SECTOR_SIZE, so bdrv_co_block_status() usually
would have to guess whether its file length information is inexact
or whether the driver is broken.
Fixing what raw_co_block_status() returns is the safest thing to do.
There seems to be no other block driver that sets request_alignment and
does not make sure that it always returns aligned values.
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 9c3db310ff)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Without this filter, this test sometimes fails.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit fff2388d5d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
the current value of 1024 bytes (16 * MFI_FRAME_SIZE) we map is not enough to hold
the maximum number of scatter gather elements we advertise. We actually need a
maximum of 2048 bytes. This is 128 max sg elements * 16 bytes (sizeof (union mfi_sgl)).
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20190404121015.28634-1-pl@kamp.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 2e56fbc87f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
csske will be removed in a future machine. Ignore it for expanding the
cpu model. Otherwise qemu falls back to z9.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190429090250.7648-3-borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit eaf6f642ab)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Buglink: https://launchpad.net/bugs/1823458
Currently, a user CHR_EVENT_CLOSED event will cause net_vhost_user_event()
to call vhost_user_cleanup(), which calls vhost_net_cleanup() for all
its queues. However, vhost_net_cleanup() must never be called like
this for fully-initialized nets; when other code later calls
vhost_net_stop() - such as from virtio_net_vhost_status() - it will try
to access the already-cleaned-up fields and fail with assertion errors
or segfaults.
The vhost_net_cleanup() will eventually be called from
qemu_cleanup_net_client().
Signed-off-by: Dan Streetman <ddstreet@canonical.com>
Message-Id: <20190416184624.15397-3-dan.streetman@canonical.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 6ab79a20af)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Even for block nodes with bs->drv == NULL, we can't just ignore a
bdrv_set_aio_context() call. Leaving the node in its old context can
mean that it's still in an iothread context in bdrv_close_all() during
shutdown, resulting in an attempted unlock of the AioContext lock which
we don't hold.
This is an example stack trace of a related crash:
#0 0x00007ffff59da57f in raise () at /lib64/libc.so.6
#1 0x00007ffff59c4895 in abort () at /lib64/libc.so.6
#2 0x0000555555b97b1e in error_exit (err=<optimized out>, msg=msg@entry=0x555555d386d0 <__func__.19059> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
#3 0x0000555555b97f7f in qemu_mutex_unlock_impl (mutex=mutex@entry=0x5555568002f0, file=file@entry=0x555555d378df "util/async.c", line=line@entry=507) at util/qemu-thread-posix.c:97
#4 0x0000555555b92f55 in aio_context_release (ctx=ctx@entry=0x555556800290) at util/async.c:507
#5 0x0000555555b05cf8 in bdrv_prwv_co (child=child@entry=0x7fffc80012f0, offset=offset@entry=131072, qiov=qiov@entry=0x7fffffffd4f0, is_write=is_write@entry=true, flags=flags@entry=0)
at block/io.c:833
#6 0x0000555555b060a9 in bdrv_pwritev (qiov=0x7fffffffd4f0, offset=131072, child=0x7fffc80012f0) at block/io.c:990
#7 0x0000555555b060a9 in bdrv_pwrite (child=0x7fffc80012f0, offset=131072, buf=<optimized out>, bytes=<optimized out>) at block/io.c:990
#8 0x0000555555ae172b in qcow2_cache_entry_flush (bs=bs@entry=0x555556810680, c=c@entry=0x5555568cc740, i=i@entry=0) at block/qcow2-cache.c:51
#9 0x0000555555ae18dd in qcow2_cache_write (bs=bs@entry=0x555556810680, c=0x5555568cc740) at block/qcow2-cache.c:248
#10 0x0000555555ae15de in qcow2_cache_flush (bs=0x555556810680, c=<optimized out>) at block/qcow2-cache.c:259
#11 0x0000555555ae16b1 in qcow2_cache_flush_dependency (c=0x5555568a1700, c=0x5555568a1700, bs=0x555556810680) at block/qcow2-cache.c:194
#12 0x0000555555ae16b1 in qcow2_cache_entry_flush (bs=bs@entry=0x555556810680, c=c@entry=0x5555568a1700, i=i@entry=0) at block/qcow2-cache.c:194
#13 0x0000555555ae18dd in qcow2_cache_write (bs=bs@entry=0x555556810680, c=0x5555568a1700) at block/qcow2-cache.c:248
#14 0x0000555555ae15de in qcow2_cache_flush (bs=bs@entry=0x555556810680, c=<optimized out>) at block/qcow2-cache.c:259
#15 0x0000555555ad242c in qcow2_inactivate (bs=bs@entry=0x555556810680) at block/qcow2.c:2124
#16 0x0000555555ad2590 in qcow2_close (bs=0x555556810680) at block/qcow2.c:2153
#17 0x0000555555ab0c62 in bdrv_close (bs=0x555556810680) at block.c:3358
#18 0x0000555555ab0c62 in bdrv_delete (bs=0x555556810680) at block.c:3542
#19 0x0000555555ab0c62 in bdrv_unref (bs=0x555556810680) at block.c:4598
#20 0x0000555555af4d72 in blk_remove_bs (blk=blk@entry=0x5555568103d0) at block/block-backend.c:785
#21 0x0000555555af4dbb in blk_remove_all_bs () at block/block-backend.c:483
#22 0x0000555555aae02f in bdrv_close_all () at block.c:3412
#23 0x00005555557f9796 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4776
The reproducer I used is a qcow2 image on gluster volume, where the
virtual disk size (4 GB) is larger than the gluster volume size (64M),
so we can easily trigger an ENOSPC. This backend is assigned to a
virtio-blk device using an iothread, and then from the guest a
'dd if=/dev/zero of=/dev/vda bs=1G count=1' causes the VM to stop
because of an I/O error. qemu_gluster_co_flush_to_disk() sets
bs->drv = NULL on error, so when virtio-blk stops the dataplane, the
block nodes stay in the iothread AioContext. A 'quit' monitor command
issued from this paused state crashes the process.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1631227
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
(cherry picked from commit 1bffe1ae7a)
*drop context dependency on e64f25f30b
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When extracting a human-readable size formatter, we changed 'uint64_t
div' pre-patch to 'unsigned long div' post-patch. Which breaks on
32-bit platforms, resulting in 'inf' instead of intended values larger
than 999GB.
Fixes: 22951aaa
CC: qemu-stable@nongnu.org
Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 754da86714)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Limiting the allocation to INT_MAX bytes isn't particularly clever
because it means that the final cluster will be a partial cluster which
will be completed through a COW operation. This results in unnecessary
data read and write requests which lead to an unwanted non-sparse
filesystem block for metadata preallocation.
Align the maximum allocation size down to the cluster size to avoid this
situation.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit f29fbf7c6b)
*modified to avoid functional dependency on 93e32b3e
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Error reporting for user_creatable_add_opts_foreach was changed so that
it no longer called 'error_report_err' in:
commit 7e1e0c1112
Author: Markus Armbruster <armbru@redhat.com>
Date: Wed Oct 17 10:26:43 2018 +0200
qom: Clean up error reporting in user_creatable_add_opts_foreach()
Some callers were updated to pass in "&error_fatal" but all the ones in
qemu-img were left passing NULL. As a result all errors went to
/dev/null instead of being reported to the user.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 334c43e2c3)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Open files and directories with O_NOFOLLOW to avoid symlinks attacks.
While being at it also add O_CLOEXEC.
usb-mtp only handles regular files and directories and ignores
everything else, so users should not see a difference.
Because qemu ignores symlinks, carrying out a successful symlink attack
requires swapping an existing file or directory below rootdir for a
symlink and winning the race against the inotify notification to qemu.
Fixes: CVE-2018-16872
Cc: Prasad J Pandit <ppandit@redhat.com>
Cc: Bandan Das <bsd@redhat.com>
Reported-by: Michael Hanselmann <public@hansmi.ch>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michael Hanselmann <public@hansmi.ch>
Message-id: 20181213122511.13853-1-kraxel@redhat.com
(cherry picked from commit bab9df35ce)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When using -drive to configure the hd drive for the New World machine, the node
name "disk" should be used instead of the "hd" alias.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20190307212058.4890-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 31bc6fa7fa)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When using -drive to configure the hd drive for the Old World machine, the node
name "disk" should be used instead of the "hd" alias.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20190307212058.4890-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 484d366e02)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The current check to test if usbfs support should be compiled or not
solely relies on the presence of <linux/usbdevice_fs.h>, without
actually checking that all definition used by Qemu are provided by
this header file.
With sufficiently old kernel headers, <linux/usbdevice_fs.h> may be
present, but some of the definitions needed by Qemu may not be
available.
This commit improves the check by building a small program that
actually tests whether the necessary definitions are available.
In addition, it fixes a bug where have_usbfs was set to "yes"
regardless of the result of the test.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20190213211827.20300-1-thomas.petazzoni@bootlin.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit 96566d09aa)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Commit 3ebee3b191 defined assert() as g_assert(), but when we build
the VSS DLL component of QGA (to handle fsfreeze) we do not include
glib, which results in breakage when building with VSS support enabled.
Fix this by including glib (along with the -lintl and -lws2_32
dependencies it brings).
Since the VSS DLL is built statically, this introduces an additional
dependency on static glib and supporting libs for the mingw environment
(possibly why we didn't include glib originally), but VSS support
already has very specific prerequisites so it shouldn't affect too many
build environments.
Since the VSS DLL code does use qemu/osdep.h, this should also help
avoid future breakages and possibly allow for some clean ups in current
VSS code.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
(cherry picked from commit 82a58d270c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Commit 8bca4613 added support for %% in json strings when interpolating,
but in doing so broke handling of % when not interpolating.
When parse_string() is fed a string token containing '%', it skips the
'%' regardless of ctxt->ap, i.e. even it's not interpolating. If the
'%' is the string's last character, it fails an assertion. Else, it
"merely" swallows the '%'.
Fix parse_string() to handle '%' specially only when interpolating.
To gauge the bug's impact, let's review non-interpolating users of this
parser, i.e. code passing NULL context to json_message_parser_init():
* tests/check-qjson.c, tests/test-qobject-input-visitor.c,
tests/test-visitor-serialization.c
Plenty of tests, but we still failed to cover the buggy case.
* monitor.c: QMP input
* qga/main.c: QGA input
* qobject_from_json():
- qobject-input-visitor.c: JSON command line option arguments of
-display and -blockdev
Reproducer: -blockdev '{"%"}'
- block.c: JSON pseudo-filenames starting with "json:"
Reproducer: https://bugzilla.redhat.com/show_bug.cgi?id=1668244#c3
- block/rbd.c: JSON key pairs
Pseudo-filenames starting with "rbd:".
Command line, QMP and QGA input are trusted.
Filenames are trusted when they come from command line, QMP or HMP.
They are untrusted when they come from from image file headers.
Example: QCOW2 backing file name. Note that this is *not* the security
boundary between host and guest. It's the boundary between host and an
image file from an untrusted source.
Neither failing an assertion nor skipping a character in a filename of
your choice looks exploitable. Note that we don't support compiling
with NDEBUG.
Fixes: 8bca4613e6
Cc: qemu-stable@nongnu.org
Signed-off-by: Christophe Fergeau <cfergeau@redhat.com>
Message-Id: <20190102140535.11512-1-cfergeau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
[Commit message extended to discuss impact]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
(cherry picked from commit bbc0586ced)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Processor tracing is not yet implemented for KVM and it will be an
opt in feature requiring a special module parameter.
Disable it, because it is wrong to enable it by default and
it is impossible that no one has ever used it.
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 4c257911dc)
*drop context dependency on ecb85fe48
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
PCONFIG is not available to guests; it must be specifically enabled
using the PCONFIG_ENABLE execution control. Disable it, because
no one can ever use it.
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Message-Id: <1545227081-213696-2-git-send-email-robert.hu@linux.intel.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 76e5a4d583)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
vfio-ap devices do not pin any pages in the host. Therefore, they
are compatible with memory ballooning.
Flag them as compatible, so both vfio-ap and a balloon can be
used simultaneously.
Cc: qemu-stable@nongnu.org
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 1883e8fc80)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
In tpm_tis_mmio_write() if the requesting locality is seizing
access, any seizure by a lower locality is cancelled. However the
loop doing the seizure had an off-by-one error and the locality
immediately preceding the requesting locality was not being cleared.
This is fixed by adjusting the test in the for loop to check the
localities up to the requesting locality.
Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
(cherry picked from commit 37b55d67c0)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When emulating ident in tcp_emu, if the strchr checks passed but the
sscanf check failed, two uninitialized variables would be copied and
sent in the reply, so move this code inside the if(sscanf()) clause.
Signed-off-by: William Bowling <will@wbowling.info>
Cc: qemu-stable@nongnu.org
Cc: secalert@redhat.com
Message-Id: <1551476756-25749-1-git-send-email-will@wbowling.info>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
(cherry picked from commit d3222975c7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
bdrv_co_invalidate_cache() clears the BDRV_O_INACTIVE flag before
actually activating a node so that the correct permissions etc. are
taken. In case of errors, the flag must be restored so that the next
call to bdrv_co_invalidate_cache() retries activation.
Restoring the flag was missing in the error path for a failed
parent->role->activate() call. The consequence is that this attempt to
activate all images correctly fails because we still set errp, however
on the next attempt BDRV_O_INACTIVE is already clear, so we return
success without actually retrying the failed action.
An example where this is observable in practice is migration to a QEMU
instance that has a raw format block node attached to a guest device
with share-rw=off (the default) while another process holds
BLK_PERM_WRITE for the same image. In this case, all activation steps
before parent->role->activate() succeed because raw can tolerate other
writers to the image. Only the parent callback (in particular
blk_root_activate()) tries to implement the share-rw=on property and
requests exclusive write permissions. This fails when the migration
completes and correctly displays an error. However, a manual 'cont' will
incorrectly resume the VM without calling blk_root_activate() again.
This case is described in more detail in the following bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1531888
Fix this by correctly restoring the BDRV_O_INACTIVE flag in the error
path.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 78fc3b3a26)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Make sure that the locality passed from the backend to
tpm_tis_request_completed() is valid.
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
(cherry picked from commit a639f96111)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Make sure that the new locality passed to tpm_tis_prep_abort()
is valid.
Add a comment to aborting_locty that it may be any locality, including
TPM_TIS_NO_LOCALITY.
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
(cherry picked from commit e92b63ea61)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The tcg_register_iommu_notifier() code has a GArray of
TCGIOMMUNotifier structs which it has registered by passing
memory_region_register_iommu_notifier() a pointer to the embedded
IOMMUNotifier field. Unfortunately, if we need to enlarge the
array via g_array_set_size() this can cause a realloc(), which
invalidates the pointer that memory_region_register_iommu_notifier()
put into the MemoryRegion's iommu_notify list. This can result
in segfaults.
Switch the GArray to holding pointers to the TCGIOMMUNotifier
structs, so that we can individually allocate and free them.
Cc: qemu-stable@nongnu.org
Fixes: 1f871c5e6b ("exec.c: Handle IOMMUs in address_space_translate_for_iotlb()")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190128174241.5860-1-peter.maydell@linaro.org
(cherry picked from commit 5601be3b01)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The architecture specifies specification exceptions for all
unavailable subcodes.
The presence of subcodes is indicated by checking some query subcode.
For example 6 will indicate that 3-6 are available. So future systems
might call new subcodes to check for new features. This should not
trigger a hw error, instead we return the architectured specification
exception.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Cc: qemu-stable@nongnu.org
Message-Id: <20190111113657.66195-3-frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 37dbd1f4d4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Linux returns success if pwrite64() or pread64() are called with a
zero length NULL buffer, but QEMU was returning -TARGET_EFAULT.
This is the same bug that we fixed in commit 58cfa6c2e6
for the write syscall, and long before that in 38d840e679
for the read syscall.
Fixes: https://bugs.launchpad.net/qemu/+bug/1810433
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190108184900.9654-1-peter.maydell@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit 2bd3f8998e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Since "s390x/tcg: avoid overflows in time2tod/tod2time", the
time2tod() function tries to deal with the 9 uppermost bits in the
time value, but uses the wrong mask for this: 0xff80000000000000 should
be used instead of 0xff10000000000000 here.
Fixes: 14055ce53c
Cc: qemu-stable@nongnu.org
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1544792887-14575-1-git-send-email-thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[CH: tweaked commit message]
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit aba7a5a2de)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Otherwise it won't be set up correctly and won't work after
miigration.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 2b4e573c7c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When VM boots from the latest version of linux kernel, after
hot-unpluging virtio-blk disks which are hotplugged into
pcie-root-port, the VM's dmesg log shows:
[ 151.046242] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0001 from Slot Status
[ 151.046365] pciehp 0000:00:05.0:pcie004: Slot(0-3): Attention button pressed
[ 151.046369] pciehp 0000:00:05.0:pcie004: Slot(0-3): Powering off due to button press
[ 151.046420] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 151.046425] pciehp 0000:00:05.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 151.046464] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 151.046468] pciehp 0000:00:05.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
[ 156.163421] pciehp 0000:00:05.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 2f1
[ 156.163427] pciehp 0000:00:05.0:pcie004: pciehp_unconfigure_device: domain🚌dev = 0000:06:00
[ 156.198736] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 156.198772] pciehp 0000:00:05.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
[ 157.224124] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0018 from Slot Status
[ 157.224194] pciehp 0000:00:05.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
[ 157.224220] pciehp 0000:00:05.0:pcie004: pciehp_check_link_active: lnk_status = 2011
[ 157.224223] pciehp 0000:00:05.0:pcie004: Slot(0-3): Link Up
[ 157.224233] pciehp 0000:00:05.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 7f1
[ 157.224281] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 157.224285] pciehp 0000:00:05.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0
[ 157.224300] pciehp 0000:00:05.0:pcie004: __pciehp_link_set: lnk_ctrl = 0
[ 157.224336] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 157.224339] pciehp 0000:00:05.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 159.739294] pci 0000:06:00.0 id reading try 50 times with interval 20 ms to get ffffffff
[ 159.739315] pciehp 0000:00:05.0:pcie004: pciehp_check_link_status: lnk_status = 2011
[ 159.739318] pciehp 0000:00:05.0:pcie004: Failed to check link status
[ 159.739371] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 159.739394] pciehp 0000:00:05.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
[ 160.771426] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 160.771452] pciehp 0000:00:05.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
[ 160.771495] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 160.771499] pciehp 0000:00:05.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
[ 160.771535] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 160.771539] pciehp 0000:00:05.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
After analyzing the log information, it seems that qemu doesn't
change the Link Status from active to inactive after hot-unplug.
This results in the abnormal log after the linux kernel commit
d331710ea78fea merged.
Furthermore, If I hotplug the same virtio-blk disk after hot-unplug,
the virtio-blk would turn on and then back off.
So this patch set the Link Status inactive after hot-unplug and
active after hot-plug.
Signed-off-by: Zheng Xiang <zhengxiang9@huawei.com>
Signed-off-by: Zheng Xiang <xiang.zheng@linaro.org>
Cc: Wang Haibin <wanghaibin.wang@huawei.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 2f2b18f60b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Changes requirement for "vsubsbs" instruction, which has been supported
since ISA 2.03. (Please see section 5.9.1.2 of ISA 2.03)
Reported-by: Paul A. Clarke <pc@us.ibm.com>
Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
Signed-off-by: Leonardo Bras <leonardo@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit fcfbc18d00)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
"-machine pc" will not work all architectures. Lets fall back to the
default machine by not specifying it.
In addition we also need to specify -no-shutdown on s390 as qemu will
exit otherwise.
Cc: qemu-stable@nongnu.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 2c26e648e4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Clang 3.4 considers duplicate typedef in ppc4xx_i2c.h and
bitbang_i2c.h an error even if they are identical. Move it to a common
place to allow building with this clang version.
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 2b4c1125ac)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2019-03-27 20:03:43 -05:00
78 changed files with 1299 additions and 324 deletions