Compare commits

...

204 Commits

Author SHA1 Message Date
Michael Roth 0982a56a55 Update version for 2.11.2 release
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-26 22:27:39 -05:00
Philippe Mathieu-Daudé aba59705d9 usb/dev-mtp: Fix use of uninitialized values
This fixes:

  hw/usb/dev-mtp.c:971:5: warning: 4th function call argument is an uninitialized value
      trace_usb_mtp_op_get_partial_object(s->dev.addr, o->handle, o->path,
                                           c->argv[1], c->argv[2]);
                                                       ^~~~~~~~~~
and:

  hw/usb/dev-mtp.c:981:12: warning: Assigned value is garbage or undefined
      offset = c->argv[1];
               ^ ~~~~~~~~~~

Reported-by: Clang Static Analyzer
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180604151421.23385-3-f4bug@amsat.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 62713a2e50)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-21 10:18:22 -05:00
Philippe Mathieu-Daudé 17e3fcbc51 usb: correctly handle Zero Length Packets
USB Specification Revision 2.0, §5.5.3:
  The Data stage of a control transfer from an endpoint to the host is complete when the endpoint does one of the following:
  • Has transferred exactly the amount of data specified during the Setup stage
  • Transfers a packet with a payload size less than wMaxPacketSize or transfers a zero-length packet

hw/usb/redirect.c:802:9: warning: Declared variable-length array (VLA) has zero size
        uint8_t buf[size];
        ^~~~~~~~~~~ ~~~~
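
A minimal sketch of the kind of guard that avoids a zero-size VLA (names are
illustrative, not the actual hw/usb/redirect.c code):

    if (size > 0) {
        uint8_t buf[size];      /* the VLA is only declared when size != 0 */
        usb_packet_copy(p, buf, size);
        /* ... forward the payload ... */
    } else {
        /* zero-length packet: complete the transfer with no payload */
    }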

Reported-by: Clang Static Analyzer
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180604151421.23385-2-f4bug@amsat.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit bf78fb1c1b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-21 10:18:16 -05:00
Philippe Mathieu-Daudé 9e4fa091ee gdbstub: fix off-by-one in gdb_handle_packet()
memtohex() adds an extra trailing NUL character.

Reported-by: AddressSanitizer
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180408145933.1149-1-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 9005774b27)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-21 10:18:10 -05:00
Michael Roth a8e4217b0c update s390-ccw.img for stable
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEw9DWbcNiT/aowBjO3s9rk8bwL68FAlsrYiwSHGNvaHVja0By
 ZWRoYXQuY29tAAoJEN7Pa5PG8C+vZ/cP/Rw0Vu0xvMbiyACKkqSNvP6o7w1jL5X/
 2wXefb+uZBWlvPBDXYmBlvmLpXQlbXCwm3igus5/fFAPByAkIG1qstKa7Pfd7fc/
 Bj0LyH52P/PSJ+nss2SwnLvVxY/XFOZVoA3pRgZYA6L6uVJXI1t5XZHj088UiYnR
 +Pr22RXaozEYTWcNLacmByUXkQdpet3y9ORaUZn4aSdoReSvPYb9Tc9qNBqR1GVP
 v04m92T94qfs/9O1mStRKMjkpvWnh/niyIRRr6BsJ5rsYM3SsWi+LIF8/uWXl1Qy
 aPaDCniXSdhd7MKiYhSMvLMx6k4tf8cutAVT1th033cAzkzr2YV5Vd+M5c7v0D0+
 xynQlmdeC0MPsuQAsA6wBc/3SkLAxDQOKX3YcOSvIcixUpWqAUXpHNjsGivx8pau
 XWqYj6LF8uXie9Avv2s1iTqUOAB/0QpYFBHSef3JwAV5KNAVZPApMsyrC8gd1N0B
 /gXPX9lGLGnfglOH9fRu/C+jCOuhYl0Nbp00EUNATapuVBt/cBoq7RGU3PQQVNS6
 1lYo8SCFBMUPnpAzGHGwhgUHc7KYhb7xJWbYKhrLbohkIza2C+xd5+w7qGbLPwSw
 wQ8jruR3gH7YhIkuu6RgfG5oWtRAApb+8L/MUtt7rqYNi1PcW8VukS09xXjZ0PPn
 mpa5z/DP3+uy
 =Cm1K
 -----END PGP SIGNATURE-----

Merge tag 'tags/s390x-20180621-211-stable' into stable-2.11-staging

update s390-ccw.img for stable

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-21 10:16:52 -05:00
Cornelia Huck 728d6c602e pc-bios/s390-ccw.img: update image for stable
Contains the following commits:
- s390: Do not pass unofficial IPL type to the guest
- s390-ccw: force diag 308 subcode to unsigned long
- pc-bios/s390-ccw: struct tpi_info must be declared as aligned(4)

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-06-21 04:22:15 -04:00
Marc-André Lureau 1e13e7d93b tpm: lookup cancel path under tpm device class
Since Linux commit 313d21eeab9282e, tpm devices have their own device
class "tpm" and the cancel path must be looked up under
/sys/class/tpm/ instead of /sys/class/misc/.
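
A hedged sketch of the resulting lookup order (the sysfs paths are from the
message; the helper name and GLib usage are illustrative):

    static char *tpm_find_cancel_path(const char *dev) /* e.g. "tpm0" */
    {
        char *path = g_strdup_printf("/sys/class/tpm/%s/device/cancel", dev);
        if (g_file_test(path, G_FILE_TEST_EXISTS)) {
            return path;
        }
        g_free(path);
        /* fall back to the legacy location for older kernels */
        return g_strdup_printf("/sys/class/misc/%s/device/cancel", dev);
    }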

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
(cherry picked from commit 05b71fb207)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:08 -05:00
Marc-André Lureau a36591fa62 tpm-passthrough: don't save guessed cancel_path in options
The value is not needed later, and may leak if the free visitor doesn't
consider it, since has_cancel_path is false. Also, for consistency with
"path", it shouldn't be returned in get_tpm_options().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
(cherry picked from commit 69c07db046)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:08 -05:00
Christian Borntraeger 0f04a8bba6 s390-ccw-virtio: allow for systems larger than 7.999TB
KVM does not allow memory regions > KVM_MEM_MAX_NR_PAGES, basically
limiting the memory per slot to 8TB-4k. As memory slots on s390/kvm must
be a multiple of 1MB, we need to start a new memory region if we cross
8TB-1M.

With that (and optimistic overcommitment in the kernel) I was able to
start a 24TB guest on a 1TB system.
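
A sketch of the slot-splitting idea (the constant name and loop are
illustrative; the real code derives the limit from KVM_MEM_MAX_NR_PAGES,
rounded down to a 1MB multiple):

    #define KVM_SLOT_MAX_BYTES ((8ULL << 40) - (1ULL << 20))  /* 8TB - 1MB */

    for (uint64_t off = 0; off < ram_size; off += KVM_SLOT_MAX_BYTES) {
        uint64_t len = MIN(ram_size - off, KVM_SLOT_MAX_BYTES);
        /* register [off, off + len) as its own memory region / kvm slot */
    }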

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20171211122146.162430-1-borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[CH: 1UL -> 1ULL in KVM_MEM_MAX_NR_PAGES; build fix on 32 bit hosts]
Signed-off-by: Cornelia Huck <cohuck@redhat.com>

(cherry picked from commit bb223055b9)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:08 -05:00
Daniel P. Berrangé 2ab0ce6e8d crypto: ensure we use a predictable TLS priority setting
The TLS test cert generation relies on a fixed set of algorithms that are
only usable under GNUTLS' default priority setting. When building QEMU
with a custom distro specific priority setting, this can cause the TLS
tests to fail. By forcing the tests to always use "NORMAL" priority we
can make them more robust.
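
A hedged sketch of what forcing the priority boils down to (the gnutls call
is real; where the tests apply it is simplified):

    /* override any distro-specific default priority string */
    gnutls_priority_set_direct(session, "NORMAL", NULL);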

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
(cherry picked from commit 057ad0b469)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:08 -05:00
Daniel P. Berrange 8bbb76e050 qapi: ensure stable sort ordering when checking QAPI entities
Some early python 3.x versions will have different default
ordering when calling the 'values()' method on a dict, compared
to python 2.x and later 3.x versions. Explicitly sort the items
to get a stable ordering.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20180116134217.8725-8-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit f7a5376d4b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:07 -05:00
Shannon Zhao 6622e94897 arm_gicv3_kvm: kvm_dist_get/put_priority: skip the registers banked by GICR_IPRIORITYR
While the for_each_dist_irq_reg loop starts from GIC_INTERNAL, it forgot to
offset the data array and index. This will overlap the GICR registers'
value and leave the last GIC_INTERNAL irq's registers out of the update.
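
A sketch of the indexing fix (all names illustrative): once the loop starts
at GIC_INTERNAL, both the register offset and the data index must be rebased
so the first SPI lands at entry 0:

    for (irq = GIC_INTERNAL; irq < s->num_irq; irq += 4) {
        int rebased = irq - GIC_INTERNAL;        /* skip the banked PPIs */
        kvm_gicd_access(s, offset + rebased,     /* one byte per IRQ */
                        &data[rebased / 4], false);
    }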

Fixes: 367b9f527b
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shannon Zhao <zhaoshenglong@huawei.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 1dcf367519)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:07 -05:00
Eric Blake 99e051b290 iotests: Add test 221 to catch qemu-img map regression
Although qemu-img creates aligned files (by rounding up), it
must also gracefully handle files that are not sector-aligned.
Test that the bug fixed in the previous patch does not recur.

It's a bit annoying that we can see the (implicit) hole past
the end of the file on to the next sector boundary, so if we
ever reach the point where we report a byte-accurate size rather
than our current behavior of always rounding up, this test will
probably need a slight modification.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit c6a9d2f6f9)
 Conflicts:
	tests/qemu-iotests/group
* drop context dep on tests not present in 2.11
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:07 -05:00
Eric Blake 9c5a8433ce qemu-img: Fix assert when mapping unaligned raw file
Commit a290f085 exposed a latent bug in qemu-img map introduced
during the conversion of block status to be byte-based.  Earlier in
commit 5e344dd8, the internal interface get_block_status() switched
to take byte-based parameters, but still called a sector-based
block layer function; as such, rounding was added in the lone
caller to obey the contract.  However, commit 237d78f8 changed
get_block_status() to truly be byte-based, at which point rounding
to sector boundaries can result in calling bdrv_block_status() with
'bytes == 0' (a coding error) when the boundary between data and a
hole falls mid-sector (true for the past-EOF implicit hole present
in POSIX files).  Fix things by removing the rounding that is now
no longer necessary.

See also https://bugzilla.redhat.com/1589738

Fixes: 237d78f8
Reported-by: Dan Kenigsberg <danken@redhat.com>
Reported-by: Nir Soffer <nsoffer@redhat.com>
Reported-by: Maor Lipchuk <mlipchuk@redhat.com>
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit e0b371ed5e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:07 -05:00
linzhecheng d8a919ffc7 vhost-user: delete net client if necessary
If qemu_new_net_client() creates new net clients but an error happens later,
the ncs are left in the global net_clients list and can't be used any
more, so we need to clean them up.

Cc: qemu-stable@nongnu.org
Signed-off-by: linzhecheng <linzhecheng@huawei.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit c67daf4a24)
 Conflicts:
	net/vhost-user.c
* drop functional dep on 4d0cf552
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:07 -05:00
Brijesh Singh f4e93bbd2e tap: set vhostfd passed from qemu cli to non-blocking
A guest boot hangs while probing the network interface when
iommu_platform=on is used.

The following qemu cli hangs without this patch:

# $QEMU \
  -netdev tap,fd=3,id=hostnet0,vhost=on,vhostfd=4 3<>/dev/tap67 4<>/dev/host-net \
  -device virtio-net-pci,netdev=hostnet0,id=net0,iommu_platform=on,disable-legacy=on \
  ...

Commit c471ad0e9b ("vhost_net: device IOTLB support") took care of
setting vhostfd to non-blocking when QEMU opens /dev/host-net, but if
the fd is passed from the qemu cli then we need to ensure that fd is set
to non-blocking.
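
A minimal sketch of the fix's effect in plain POSIX terms (the actual patch
uses QEMU's helper for this):

    int flags = fcntl(vhostfd, F_GETFL);
    fcntl(vhostfd, F_SETFL, flags | O_NONBLOCK);  /* don't block vhost init */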

Fixes: c471ad0e9b ("vhost_net: device IOTLB support")
Cc: qemu-stable@nongnu.org
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit d542800d1e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:07 -05:00
Konrad Rzeszutek Wilk afa43eb63e i386: define the AMD 'virt-ssbd' CPUID feature bit (CVE-2018-3639)
AMD Zen exposes the Intel equivalent to Speculative Store Bypass Disable
via the 0x80000008_EBX[25] CPUID feature bit.

This needs to be exposed to the guest OS to allow it to protect
against CVE-2018-3639.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180521215424.13520-3-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 403503b162)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:07 -05:00
Konrad Rzeszutek Wilk 73521f60f4 i386: Define the Virt SSBD MSR and handling of it (CVE-2018-3639)
"Some AMD processors only support a non-architectural means of enabling
speculative store bypass disable (SSBD).  To allow a simplified view of
this to a guest, an architectural definition has been created through a new
CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f.  With this, a
hypervisor can virtualize the existence of this definition and provide an
architectural method for using SSBD to a guest.

Add the new CPUID feature, the new MSR and update the existing SSBD
support to use this MSR when present." (from x86/speculation: Add virtualized
speculative store bypass disable support in Linux).
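
A sketch of the two architectural definitions described above (macro names
are illustrative; the bit and MSR numbers are from the message):

    /* CPUID Fn8000_0008, EBX bit 25: VIRT_SSBD */
    #define CPUID_8000_0008_EBX_VIRT_SSBD  (1U << 25)
    /* MSR the guest writes to toggle SSBD when VIRT_SSBD is advertised */
    #define MSR_VIRT_SPEC_CTRL             0xc001011f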

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180521215424.13520-4-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit cfeea0c021)
 Conflicts:
	target/i386/kvm.c
	target/i386/machine.c
* drop context dep on b77146e9a
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:07 -05:00
Daniel P. Berrangé 4ce0b750a2 i386: define the 'ssbd' CPUID feature bit (CVE-2018-3639)
New microcode introduces the "Speculative Store Bypass Disable"
CPUID feature bit. This needs to be exposed to the guest OS to allow
it to protect against CVE-2018-3639.
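
For reference, a sketch of where the bit lives (the leaf/bit position is per
Intel's documentation, not stated in the message; the macro name is
illustrative):

    /* CPUID.(EAX=7,ECX=0):EDX bit 31: SSBD */
    #define CPUID_7_0_EDX_SSBD (1U << 31)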

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Message-Id: <20180521215424.13520-2-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit d19d1f9659)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:07 -05:00
Gerd Hoffmann 84cfae0b6f vga: fix region calculation
Typically the scanline length and the line offset are identical.  But
in case they are not, our calculation for region_end is incorrect.  Using
line_offset is fine for all scanlines, except the last one where we have
to use the actual scanline length.
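
A sketch of the corrected bound (names illustrative): every scanline but the
last advances by line_offset, while the last contributes only the bytes it
actually scans out:

    region_end = region_start
               + (lines - 1) * line_offset    /* full strides */
               + scanline_length;             /* last line: actual length */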

Fixes: CVE-2018-7550
Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Tested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Message-id: 20180309143704.13420-1-kraxel@redhat.com
(cherry picked from commit 7cdc61becd)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:07 -05:00
Alberto Garcia d80b60fc24 throttle: Fix crash on reopen
The throttle block filter can be reopened, and with this it is
possible to change the throttle group that the filter belongs to.

The way the code does that is the following:

  - On throttle_reopen_prepare(): create a new ThrottleGroupMember
    and attach it to the new throttle group.

  - On throttle_reopen_commit(): detach the old ThrottleGroupMember,
    delete it and replace it with the new one.

The problem with this is that by replacing the ThrottleGroupMember the
previous value of io_limits_disabled is lost, causing an assertion
failure in throttle_co_drain_end().

This problem can be reproduced by reopening a throttle node:

   $QEMU -monitor stdio
   -object throttle-group,id=tg0,x-iops-total=1000 \
   -blockdev node-name=hd0,driver=qcow2,file.driver=file,file.filename=hd.qcow2 \
   -blockdev node-name=root,driver=throttle,throttle-group=tg0,file=hd0,read-only=on

   (qemu) block_stream root
   block/throttle.c:214: throttle_co_drain_end: Assertion `tgm->io_limits_disabled' failed.

Since we only want to change the throttle group on reopen there's no
need to create a ThrottleGroupMember and discard the old one. It's
easier if we simply detach it from its current group and attach it to
the new one.
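
A hedged sketch of the simpler reopen path (the function names follow QEMU's
throttle-group API; error handling omitted):

    /* on reopen commit: move the existing member between groups,
     * preserving its io_limits_disabled state */
    throttle_group_unregister_tgm(tgm);
    throttle_group_register_tgm(tgm, new_group_name, ctx);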

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180608151536.7378-1-berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit bc33c047d1)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:06 -05:00
Max Reitz f647a4fc44 iotests: Add case for a corrupted inactive image
Reviewed-by: John Snow <jsnow@redhat.com>
Tested-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180606193702.7113-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit c50abd175a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:06 -05:00
Max Reitz f91f44fcd1 qcow2: Do not mark inactive images corrupt
When signaling a corruption on a read-only image, qcow2 already makes
fatal events non-fatal (i.e., they will not result in the image being
closed, and the image header's corrupt flag will not be set).  This is
necessary because we cannot set the corrupt flag on read-only images,
and it is possible because further corruption of read-only images is
impossible.

Inactive images are effectively read-only, too, so we should do the same
for them.  bdrv_is_writable() can tell us whether an image can actually
be written to, so use its result instead of !bs->read_only.

(Otherwise, the assert(!(bs->open_flags & BDRV_O_INACTIVE)) in
bdrv_co_pwritev() will fail, crashing qemu.)
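
A minimal sketch of the check, simplified from the description above:

    /* an image we cannot write to (read-only or inactive) must not be
     * marked corrupt; downgrade the event to non-fatal instead */
    if (fatal && !bdrv_is_writable(bs)) {
        fatal = false;
    }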

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180606193702.7113-3-mreitz@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit ddf3b47ef4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:06 -05:00
Max Reitz 62891cad12 block: Make bdrv_is_writable() public
This is a useful function for the whole block layer, so make it public.
At the same time, users outside of block.c probably do not need to make
use of the reopen functionality, so rename the current function to
bdrv_is_writable_after_reopen() and create a new bdrv_is_writable()
function that just passes NULL to it for the reopen queue.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180606193702.7113-2-mreitz@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit cc02214097)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:06 -05:00
Shannon Zhao 5d639531b7 arm_gicv3_kvm: kvm_dist_get/put: skip the registers banked by GICR
While we skip the GIC_INTERNAL irqs, we don't change the register offset
accordingly. This will overlap the GICR registers' value and leave the
last GIC_INTERNAL irq's registers out of the update.

Fix this by skipping the registers banked by GICR.

Also for migration compatibility if the migration source (old version
qemu) doesn't send gicd_no_migration_shift_bug = 1 to destination, then
we shift the data of PPI to get the right data for SPI.

Fixes: 367b9f527b
Cc: qemu-stable@nongnu.org
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Shannon Zhao <zhaoshenglong@huawei.com>
Message-id: 1527816987-16108-1-git-send-email-zhaoshenglong@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 910e204841)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:06 -05:00
John Snow deabb454dc ahci: fix PxCI register race
Fixes: https://bugs.launchpad.net/qemu/+bug/1769189

AHCI presently signals completion prior to the PxCI register being
cleared to indicate completion. If a guest driver attempts to issue
a new command in its IRQ handler, it might be surprised to learn there
is still a command pending.

In the case of Windows 10's boot driver, it will actually poll the IRQ
register hoping to find out when the command is done running -- which
will never happen, as there isn't a command running.

Fix this: clear PxCI in ahci_cmd_done and not in the asynchronous BH.
Because it now runs synchronously, we don't need to check if the command
is actually done by spying on the ATA registers. We know it's done.

CC: qemu-stable <qemu-stable@nongnu.org>
Reported-by: François Guerraz <kubrick@fgv6.net>
Tested-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20180531004323.4611-3-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 5694c7eacc)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:06 -05:00
John Thomson 16238325f1 Fix libusb-1.0.22 deprecated libusb_set_debug with libusb_set_option
libusb-1.0.22 marked libusb_set_debug as deprecated. It is replaced with
libusb_set_option(libusb_context, LIBUSB_OPTION_LOG_LEVEL, libusb_log_level);

details here: 539f22e2fd

Warning here:

  CC      hw/usb/host-libusb.o
/builds/xen/src/qemu-xen/hw/usb/host-libusb.c: In function 'usb_host_init':
/builds/xen/src/qemu-xen/hw/usb/host-libusb.c:250:5: error: 'libusb_set_debug' is deprecated: Use libusb_set_option instead [-Werror=deprecated-declarations]
     libusb_set_debug(ctx, loglevel);
     ^~~~~~~~~~~~~~~~
In file included from /builds/xen/src/qemu-xen/hw/usb/host-libusb.c:40:0:
/usr/include/libusb-1.0/libusb.h:1300:18: note: declared here
 void LIBUSB_CALL libusb_set_debug(libusb_context *ctx, int level);
                  ^~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make: *** [/builds/xen/src/qemu-xen/rules.mak:66: hw/usb/host-libusb.o] Error 1
make: Leaving directory '/builds/xen/src/xen/tools/qemu-xen-build'
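
A sketch of the version-guarded replacement (the LIBUSB_API_VERSION value
below is the one libusb-1.0.22 defines):

    #if LIBUSB_API_VERSION >= 0x01000106
        libusb_set_option(ctx, LIBUSB_OPTION_LOG_LEVEL, loglevel);
    #else
        libusb_set_debug(ctx, loglevel);
    #endif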

Signed-off-by: John Thomson <git@johnthomson.fastmail.com.au>
Message-id: 20180405132046.4968-1-git@johnthomson.fastmail.com.au
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 9d8fa0df49)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:06 -05:00
Shannon Zhao e55c6b6795 arm_gicv3_kvm: increase clroffset accordingly
The code forgot to increase clroffset during the loop, so it only clears
the first 4 bytes.
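
A sketch of the loop with the missing increment added (names illustrative):

    for (i = 0; i < nr_regs; i++) {
        uint32_t zero = 0;
        kvm_gicd_access(s, clroffset, &zero, true);  /* clear pending bits */
        clroffset += 4;     /* was missing: advance to the next register */
    }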

Fixes: 367b9f527b
Cc: qemu-stable@nongnu.org
Signed-off-by: Shannon Zhao <zhaoshenglong@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 1527047633-12368-1-git-send-email-zhaoshenglong@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 34ffacae08)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:06 -05:00
Peter Xu 8da7a171c0 intel-iommu: rework the page walk logic
This patch fixes a potential small window during which the DMA page table
might be incomplete or invalid when the guest sends domain/context
invalidations to a device.  This can cause random DMA errors for
assigned devices.

This is a major change to the VT-d shadow page walking logic. It
includes but is not limited to:

- For each VTDAddressSpace, now we maintain what IOVA ranges we have
  mapped and what we have not.  With that information, now we only send
  MAP or UNMAP when necessary.  Say, we don't send MAP notifies if we
  know we have already mapped the range, meanwhile we don't send UNMAP
  notifies if we know we never mapped the range at all.

- Introduce vtd_sync_shadow_page_table[_range] APIs so that we can call
  in any places to resync the shadow page table for a device.

- When we receive domain/context invalidation, we should not really run
  the replay logic, instead we use the new sync shadow page table API to
  resync the whole shadow page table without unmapping the whole
  region.  After this change, we'll only do the page walk once for each
  domain invalidations (before this, it can be multiple, depending on
  number of notifiers per address space).

While at it, the page walking logic is also refactored to be simpler.

CC: QEMU Stable <qemu-stable@nongnu.org>
Reported-by: Jintack Lim <jintack@cs.columbia.edu>
Tested-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 63b88968f1)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:06 -05:00
Peter Xu bdb6f9b6bc util: implement simple iova tree
Introduce a simplest iova tree implementation based on GTree.

CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit eecf5eedbd)
 Conflicts:
	util/Makefile.objs
* drop context dep
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:05 -05:00
Peter Xu e2c44fc580 intel-iommu: trace domain id during page walk
This patch only modifies the trace points.

Previously we were tracing page walk levels.  They are redundant since
we have the page mask (size) already.  Now we trace something much more
useful: the domain ID of the page walk.  That can be very
useful when we trace more than one device on the same system, so that
we can know which map is for which domain.

CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit d118c06ebb)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:05 -05:00
Peter Xu e8aa1dc816 intel-iommu: pass in address space when page walk
We pass in the VTDAddressSpace too.  It'll be used in the follow-up
patches.

CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 2f764fa87d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:05 -05:00
Peter Xu 04e0e11628 intel-iommu: introduce vtd_page_walk_info
During the recursive page walking of IOVA page tables, some stack
variables are constant and never change during the whole page
walking procedure.  Isolate them into a struct so that we don't need to
pass those constants down the stack repeatedly.

CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit fe215b0cbb)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:05 -05:00
Peter Xu 21962e6958 intel-iommu: only do page walk for MAP notifiers
For UNMAP-only IOMMU notifiers, we don't need to walk the page tables.
Speed up that procedure by skipping the page table walk.  That should
boost performance for UNMAP-only notifiers like vhost.

CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 4f8a62a933)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:05 -05:00
Peter Xu 3585497e86 intel-iommu: add iommu lock
SECURITY IMPLICATION: this patch fixes a potential race when multiple
threads access the IOMMU IOTLB cache.

Add a per-iommu big lock to protect IOMMU status.  Currently the only
thing to be protected is the IOTLB/context cache, since that can be
accessed even without BQL, e.g., in IO dataplane.

Note that we don't need to protect device page tables since that's fully
controlled by the guest kernel.  However, there is still a possibility that
malicious drivers will program the device to not obey the rule.  In that
case QEMU can't really do anything useful; instead, the guest itself will
be responsible for all uncertainties.
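
A minimal sketch of the locking added (the field and helper follow the
description; heavily simplified):

    QemuMutex iommu_lock;    /* new per-IOMMU lock in IntelIOMMUState */

    static void vtd_iommu_lock(IntelIOMMUState *s)
    {
        qemu_mutex_lock(&s->iommu_lock);
    }
    /* ...held around every IOTLB/context-cache lookup and update... */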

CC: QEMU Stable <qemu-stable@nongnu.org>
Reported-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 1d9efa73e1)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:05 -05:00
Peter Xu 99fc962b62 intel-iommu: remove IntelIOMMUNotifierNode
That is not really necessary.  Remove that node struct and put the
list entry directly into VTDAddressSpace.  It simplifies the code a lot.
While at it, rename the old notifiers_list into vtd_as_with_notifiers.

CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit b4a4ba0d68)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:05 -05:00
Peter Xu 6c403b72b4 intel-iommu: send PSI always even if across PDEs
SECURITY IMPLICATION: without this patch, any guest with both an assigned
device and a vIOMMU might encounter stale IO page mappings even if the
guest has already unmapped the page, which may lead to guest memory
corruption.  The stale mappings will only be limited to the guest's own
memory range, so it should not affect the host memory or other guests on
the host.

During IOVA page table walking, there is a special case when the PSI
covers one whole PDE (Page Directory Entry, which contains 512 Page
Table Entries) or more.  In the past, we skipped that entry and didn't
notify the IOMMU notifiers.  This is not correct.  We should send an UNMAP
notification to registered UNMAP notifiers in this case.

For UNMAP only notifiers, this might cause IOTLBs cached in the devices
even if they were already invalid.  For MAP/UNMAP notifiers like
vfio-pci, this will cause stale page mappings.

This special case doesn't trigger often, but it is very easily
triggered by nested device assignments, since in that case we'll
possibly map the whole L2 guest RAM region into the device's IOVA
address space (several GBs at least), which is far bigger than normal
kernel driver usages of the device (tens of MBs normally).

Without this patch applied to L1 QEMU, nested device assignment to L2
guests will dump some errors like:

qemu-system-x86_64: VFIO_MAP_DMA: -17
qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
                    0x7f89a920d000) = -17 (File exists)

CC: QEMU Stable <qemu-stable@nongnu.org>
Acked-by: Jason Wang <jasowang@redhat.com>
[peterx: rewrite the commit message]
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

(cherry picked from commit 36d2d52bdb)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:05 -05:00
Prasad Singamsetty 0b250250b7 intel-iommu: Extend address width to 48 bits
The current implementation of the Intel IOMMU code only supports a 39-bit
iova address width. This patch provides a new parameter (x-aw-bits)
for intel-iommu to extend its address width to 48 bits while keeping the
default the same (39 bits). The reason for not changing the default
is to avoid potential compatibility problems with live migration of
intel-iommu enabled QEMU guests. The only valid values for the 'x-aw-bits'
parameter are 39 and 48.

After enabling the larger address width (48), we should be able to map
larger iova addresses in the guest, for example in a QEMU guest
configured with large memory (>= 1TB). To check whether the 48-bit
address width is enabled, grep the guest dmesg output for the line:
"DMAR: Host address width 48".

Signed-off-by: Prasad Singamsetty <prasad.singamsety@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 37f51384ae)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:05 -05:00
Prasad Singamsetty d45364e6d4 intel-iommu: Redefine macros to enable supporting 48 bit address width
The current implementation of the Intel IOMMU code only supports a 39-bit
host/iova address width, so a number of macros use hard-coded values based
on that. This patch redefines them so they can be used with variable
address widths. It doesn't add any new functionality but enables adding
support for a 48-bit address width.

Signed-off-by: Prasad Singamsetty <prasad.singamsety@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 92e5d85e83)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:05 -05:00
Jan Kiszka 7128bcb221 hw/intc/arm_gicv3: Fix APxR<n> register dispatching
There was a nasty flip in identifying which register group an access is
targeting. The issue caused spuriously raised priorities of the guest
when handing CPUs over in the Jailhouse hypervisor.

Cc: qemu-stable@nongnu.org
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Message-id: 28b927d3-da58-bce4-cc13-bfec7f9b1cb9@siemens.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 887aae10f6)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:04 -05:00
Michal Privoznik 2e70085eb7 console: Avoid segfault in screendump
After f771c5440e it is possible to select the device and
head to take a screendump from. And even though we check if the
provided head number falls within range, it may still happen that
the console has no surface yet, leading to a SIGSEGV:

  qemu.git $ ./x86_64-softmmu/qemu-system-x86_64 \
    -qmp stdio \
    -device virtio-vga,id=video0,max_outputs=4

  {"execute":"qmp_capabilities"}
  {"execute":"screendump", "arguments":{"filename":"/tmp/screen.ppm", "device":"video0", "head":1}}
  Segmentation fault

 #0  0x00005628249dda88 in ppm_save (filename=0x56282826cbc0 "/tmp/screen.ppm", ds=0x0, errp=0x7fff52a6fae0) at ui/console.c:304
 #1  0x00005628249ddd9b in qmp_screendump (filename=0x56282826cbc0 "/tmp/screen.ppm", has_device=true, device=0x5628276902d0 "video0", has_head=true, head=1, errp=0x7fff52a6fae0) at ui/console.c:375
 #2  0x00005628247740df in qmp_marshal_screendump (args=0x562828265e00, ret=0x7fff52a6fb68, errp=0x7fff52a6fb60) at qapi/qapi-commands-ui.c:110

Here, @ds from frame #0 (or @surface from frame #1) is
dereferenced at the very beginning of ppm_save(). And because
it is NULL, a crash happens.
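
A minimal sketch of the guard (the error message is illustrative;
qemu_console_surface() and error_setg() are QEMU's APIs):

    DisplaySurface *surface = qemu_console_surface(con);
    if (!surface) {
        error_setg(errp, "no surface");  /* head exists but no surface yet */
        return;
    }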

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: cb05bb1909daa6ba62145c0194aafa05a14ed3d1.1526569138.git.mprivozn@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 08d9864fa4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:04 -05:00
Cornelia Huck b68ef5920a s390x/ccw: make sure all ccw devices are properly reset
Thomas reported that the subchannel for a 3270 device that ended up
in a broken state (status pending even though not enabled) did not
get out of that state even after a reboot (which involves a subsystem
reset). The reason for this is that the 3270 device did not define
a reset handler.

Let's fix this by introducing a base reset handler (set up for all
ccw devices) that resets the subchannel and have virtio-ccw call
its virtio-specific reset procedure in addition to that.

CC: qemu-stable@nongnu.org
Reported-by: Thomas Huth <thuth@redhat.com>
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 838fb84f83)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:04 -05:00
Cornelia Huck b2ca602627 virtio-ccw: common reset handler
All the different virtio ccw devices use the same reset handler,
so let's move setting it into the base virtio ccw device class.

CC: qemu-stable@nongnu.org
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 0c53057adb)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:04 -05:00
Nia Alarie f7f5398dc7 s390x/virtio: Convert virtio-ccw from *_exit to *_unrealize
Signed-off-by: Nia Alarie <nia.alarie@gmail.com>
Message-Id: <20180307162958.11232-1-nia.alarie@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 24118af846)
*prereq for 0c53057adb
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:04 -05:00
Philippe Mathieu-Daudé 18f59bbc76 qdev: add helpers to be more explicit when using abstract QOM parent functions
QOM API learning curve is quite hard, in particular when devices inherit from
abstract parent.
To be more explicit about when a device class change the parent hooks, add few
helpers hoping a device class_init() will be easier to understand.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180114020412.26160-3-f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 46795cf2e2)
*prereq for 0c53057adb
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:04 -05:00
Philippe Mathieu-Daudé 0db89523a6 qdev: rename typedef qdev_resetfn() -> DeviceReset()
Following the DeviceRealize and DeviceUnrealize typedefs,
this unifies the new QOM API a bit.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180114020412.26160-2-f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit b850f664a1)
*prereq for 0c53057adb
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:04 -05:00
Thomas Huth ce4e7b03a7 pc-bios/s390-ccw: struct tpi_info must be declared as aligned(4)
I've run into a compilation error today with the current version of GCC 8:

In file included from s390-ccw.h:49,
                 from main.c:12:
cio.h:128:1: error: alignment 1 of 'struct tpi_info' is less than 4 [-Werror=packed-not-aligned]
 } __attribute__ ((packed));
 ^
cc1: all warnings being treated as errors

Since the struct tpi_info contains an element ("struct subchannel_id schid")
which is marked as aligned(4), we've got to mark the struct tpi_info as
aligned(4), too.
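
A sketch of the resulting declaration (remaining fields elided):

    struct tpi_info {
        struct subchannel_id schid;      /* member is itself aligned(4) */
        /* ... */
    } __attribute__ ((packed, aligned(4)));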

CC: qemu-stable@nongnu.org
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1525774672-11913-1-git-send-email-thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit a6e4385dea)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:04 -05:00
Cornelia Huck 435a31ff9a s390x/css: disabled subchannels cannot be status pending
The 3270 code will try to post an attention interrupt when the
3270 emulator (e.g. x3270) attaches. If the guest has not yet
enabled the subchannel for the 3270 device, we will present a spurious
cc 1 (status pending) when it uses msch on it later on, e.g. when
trying to enable the subchannel.

To fix this, just don't do anything in css_conditional_io_interrupt()
if the subchannel is not enabled. The 3270 code will work fine with
that, and the other user of this function (virtio-ccw) never
attempts to post an interrupt for a disabled device to begin with.

CC: qemu-stable@nongnu.org
Reported-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 6e9c893ecd)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:04 -05:00
Fam Zheng 720db5deeb raw: Check byte range uniformly
We don't verify the request range against s->size in the I/O callbacks
except for raw_co_pwritev. This is inconsistent (especially for
raw_co_pwrite_zeroes and raw_co_pdiscard), so fix them; in the meantime,
make the helper reusable by the coming new callbacks.

Note that in most cases the block layer already verifies the request
byte range against our reported image length, before invoking the driver
callbacks.  The exception is during image creation, after
blk_set_allow_write_beyond_eof(blk, true) is called. But in that case,
the requests are not directly from the user or guest. So there is no
visible behavior change in adding the check code.

The int64_t -> uint64_t inconsistency, as shown by the type casting, is
pre-existing due to the interface.
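
A hedged sketch of the uniform check (the helper name and exact bounds
handling are illustrative):

    static int raw_check_request(BDRVRawState *s, int64_t offset,
                                 uint64_t bytes)
    {
        if (offset < 0 || offset > s->size ||
            bytes > (uint64_t)(s->size - offset)) {
            return -EINVAL;  /* request exceeds the reported image length */
        }
        return 0;
    }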

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20180601092648.24614-3-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 3844553852)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:04 -05:00
Michael Walle 979e7ea7eb lm32: take BQL before writing IP/IM register
Writing to these registers may raise an interrupt request, which requires
the BQL to be held; in practice this bug prevents the milkymist board from
starting.
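
A hedged sketch of the pattern (the helper call is illustrative;
qemu_mutex_lock_iothread() is QEMU's BQL API):

    qemu_mutex_lock_iothread();      /* take the BQL: may raise an IRQ */
    lm32_pic_set_im(s->pic, value);  /* illustrative register write */
    qemu_mutex_unlock_iothread();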

Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Walle <michael@walle.cc>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
(cherry picked from commit 81e9cbd0ca)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:03 -05:00
Max Reitz 15fb5dbf90 iotests: Add test for -U/force-share conflicts
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180502202051.15493-4-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 4e7d73c5fb)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:03 -05:00
Max Reitz 5704f53ca9 qemu-img: Use only string options in img_open_opts
img_open_opts() takes a QemuOpts and converts them to a QDict, so all
values therein are strings.  However, it may then try to call
qdict_get_bool(), which will fail with a segmentation fault every time:

$ ./qemu-img info -U --image-opts \
    driver=file,filename=/dev/null,force-share=off
[1]    27869 segmentation fault (core dumped)  ./qemu-img info -U
--image-opts driver=file,filename=/dev/null,force-share=off

Fix this by using qdict_get_str() and comparing the value as a string.
Also, when adding a force-share value to the QDict, add it as a string
so it fits the rest of the dict.
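
A minimal sketch of the string-based handling (simplified):

    /* all QemuOpts-derived QDict values are strings, so compare as such */
    const char *fs = qdict_get_try_str(options, "force-share");
    bool force_share = fs && !strcmp(fs, "on");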

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180502202051.15493-3-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 4615f87832)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:03 -05:00
Max Reitz dbacc54944 qemu-io: Use purely string blockdev options
Currently, qemu-io only uses string-valued blockdev options (as all are
converted directly from QemuOpts) -- with one exception: -U adds the
force-share option as a boolean.  This in itself is already a bit
questionable, but a real issue is that it also assumes the value already
existing in the options QDict would be a boolean, which is wrong.

That has the following effect:

$ ./qemu-io -r -U --image-opts \
    driver=file,filename=/dev/null,force-share=off
[1]    15200 segmentation fault (core dumped)  ./qemu-io -r -U
--image-opts driver=file,filename=/dev/null,force-share=off

Since @opts is converted from QemuOpts, the value must be a string, and
we have to compare it as such.  Consequently, it makes sense to also set
it as a string instead of a boolean.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180502202051.15493-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 2a01c01f9e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:03 -05:00
Max Reitz 324bef41d3 iotests: Add test for rebasing with relative paths
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20180509182002.8044-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 28036a7f70)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:03 -05:00
Max Reitz 1bec53c6f4 qemu-img: Resolve relative backing paths in rebase
Currently, rebase interprets a relative path for the new backing image
as follows:
(1) Open the new backing image with the given relative path (thus relative to
    qemu-img's working directory).
(2) Write it directly into the overlay's backing path field (thus
    relative to the overlay).

If the overlay is not in qemu-img's working directory, both will be
different interpretations, which may either lead to an error somewhere
(either rebase fails because it cannot open the new backing image, or
your overlay becomes unusable because its backing path does not point to
a file), or, even worse, it may result in your rebase being performed
for a different backing file than what your overlay will point to after
the rebase.

Fix this by interpreting the target backing path as relative to the
overlay, like qemu-img does everywhere else.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1569835
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180509182002.8044-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit d16699b646)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:03 -05:00
Olaf Hering 088a22455b configure: recognize more rpmbuild macros
Extend the list of recognized, but ignored, options from rpm's %configure
macro. This fixes the build on hosts running SUSE Linux.

Cc: qemu-stable@nongnu.org
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Message-Id: <20180418075045.27393-1-olaf@aepfle.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 181ce1d05c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:03 -05:00
Gerd Hoffmann ecd54d6670 qxl: fix local renderer crash
Make sure we only ask the spice local renderer for display updates in
case we have a valid primary surface.  Without that, spice is confused
and throws errors in case a display update request (triggered by
screendump, for example) happens in parallel to a mode switch and hits
the race window where the old primary surface is gone and the new one
isn't established yet.

Cc: qemu-stable@nongnu.org
Fixes: https://bugzilla.redhat.com//show_bug.cgi?id=1567733
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20180427115528.345-1-kraxel@redhat.com
(cherry picked from commit 5bd5c27c7d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:03 -05:00
Greg Kurz 1378cb9db5 spapr: don't advertise radix GTSE if max-compat-cpu < power9
On a POWER9 host, if a guest runs in pre POWER9 compat mode, it necessarily
uses the hash MMU mode. In this case, we shouldn't advertise radix GTSE in
the ibm,arch-vec-5-platform-support DT property as the current code does.
The first reason is that it doesn't make sense, and the second one is that
it causes the CAS-negotiated options subsection to be migrated. This breaks
backward migration to QEMU 2.7 and older versions on POWER8 hosts:

qemu-system-ppc64: error while loading state for instance 0x0 of device
 'spapr'
qemu-system-ppc64: load of migration failed: No such file or directory

This patch hence initializes CPUs a bit earlier so that we can check the
requested compat mode, and doesn't set OV5_MMU_RADIX_GTSE for power8 and
older.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 0550b1206a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:03 -05:00
Greg Kurz 0ab2f8e36d target/ppc: always set PPC_MEM_TLBIE in pre 2.8 migration hack
The pseries-2.7 and older machine types require CPUPPCState::insns_flags
to be strictly equal between source and destination. This checking is
abusive and breaks migration of KVM guests when the host CPU models
are different, even if they are compatible enough to allow the guest
to run transparently. This buggy behaviour was fixed for pseries-2.8
and we added some hacks to allow backward migration of older machine
types. These hacks assume that the CPU belongs to the POWER8 family,
which was true for most KVM based setup we cared about at the time.
But now POWER9 systems are coming, and backward migration of pre 2.8
guests running in POWER8 architected mode from a POWER9 host to a
POWER8 host is broken:

qemu-system-ppc64: error while loading state for instance 0x0 of device
 'cpu'
qemu-system-ppc64: load of migration failed: Invalid argument

This happens because POWER9 doesn't set PPC_MEM_TLBIE in insns_flags,
while POWER8 does. Let's force PPC_MEM_TLBIE in the migration hack to
fix the issue. This is an acceptable hack because these old machine
types only support CPU models that do set PPC_MEM_TLBIE.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit bce009645b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:03 -05:00
Peter Maydell 67bacd5be4 target/arm: Implement v8M VLLDM and VLSTM
For v8M the instructions VLLDM and VLSTM support lazy saving
and restoring of the secure floating-point registers. Even
if the floating point extension is not implemented, these
instructions must act as NOPs in Secure state, so they can
be used as part of the secure-to-nonsecure call sequence.

Fixes: https://bugs.launchpad.net/qemu/+bug/1768295
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180503105730.5958-1-peter.maydell@linaro.org
(cherry picked from commit b1e5336a98)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:02 -05:00
Henry Wertz a27d261fe2 tcg/arm: Fix memory barrier encoding
I found with qemu 2.11.x or newer that I would get an illegal instruction
error running some Intel binaries on my ARM chromebook.  On investigation,
I found it was quitting on memory barriers.

qemu instruction:
mb $0x31
was translating as:
0x604050cc:  5bf07ff5  blpl     #0x600250a8

After patch it gives:
0x604050cc:  f57ff05b  dmb      ish

In short, I found INSN_DMB_ISH (memory barrier for ARMv7) appeared to be
correct based on online docs, but due to some endian-related shenanigans it
had to be byte-swapped to suit qemu; it appears INSN_DMB_MCR (memory
barrier for ARMv6) also should be byte swapped  (and this patch does so).
I have not checked for correctness of aarch64's barrier instruction.

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Henry Wertz <hwertz10@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 3f814b8037)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:02 -05:00
Cornelia Huck 834a846eee s390-ccw: force diag 308 subcode to unsigned long
We currently pass an integer as the subcode parameter. However,
the upper bits of the register containing the subcode need to
be 0, which is not guaranteed unless we explicitly specify the
subcode to be an unsigned long value.

Fixes: d046c51dad ("pc-bios/s390-ccw: Get device address via diag 308/6")
Cc: qemu-stable@nongnu.org
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 63d8b5ace3)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:02 -05:00
Viktor Mihajlovski 0304f75f61 s390: Do not pass unofficial IPL type to the guest
IPL over a virtio-scsi device requires special handling not
available in the real architecture. For this purpose the IPL
type 0xFF has been chosen as means of communication between
QEMU and the pc-bios. However, a guest OS could be confused
by seeing an unknown IPL type.

This change sets the IPL parameter type to 0x02 (CCW) to prevent
this. Pre-existing Linux has looked up the IPL parameters only in
the case of FCP IPL. This means that the behavior should stay
the same even if Linux checks for the IPL type unconditionally.

Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Message-Id: <1522940844-12336-4-git-send-email-mihajlov@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit e8c7ef288a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:02 -05:00
Eric Blake 955b77f56e nbd/client: Fix error messages during NBD_INFO_BLOCK_SIZE
A missing space makes for poor error messages, and sizes can't
go negative.  Also, we missed diagnosing a server that sends
a maximum block size less than the minimum.

Fixes: 081dd1fe
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180501154654.943782-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
(cherry picked from commit e475d108f1)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:02 -05:00
Jason Andryuk f3f01bab82 ccid: Fix dwProtocols advertisement of T=0
Commit d7d218ef02 attempted to change
dwProtocols to only advertise support for T=0 and not T=1.  The change
was incorrect as it changed 0x00000003 to 0x00010000.

lsusb -v in a linux guest shows:
"dwProtocols         65536  (Invalid values detected)", though the
smart card could still be accessed.  Windows 7 does not detect inserted
smart cards and logs the following Error in the Event Logs:

    Source: Smart Card Service
    Event ID: 610
    Smart Card Reader 'QEMU QEMU USB CCID 0' rejected IOCTL SET_PROTOCOL:
    Incorrect function. If this error persists, your smart card or reader
    may not be functioning correctly

    Command Header: 03 00 00 00

Setting to 0x00000001 fixes the Windows issue.
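
A sketch of the corrected descriptor field (per the CCID spec, dwProtocols
bit 0 is T=0 and bit 1 is T=1; the upper 16 bits are reserved, which is why
0x00010000 reads as invalid):

    .dwProtocols = 0x00000001,   /* T=0 only */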

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Message-id: 20180420183219.20722-1-jandryuk@gmail.com
Cc: qemu-stable@nongnu.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 0ee86bb6c5)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:02 -05:00
Geert Uytterhoeven 5648a81dd2 device_tree: Increase FDT_MAX_SIZE to 1 MiB
It is not uncommon for a contemporary FDT to be larger than 64 KiB,
leading to failures loading the device tree from sysfs:

    qemu-system-aarch64: qemu_fdt_setprop: Couldn't set ...: FDT_ERR_NOSPACE

Hence increase the limit to 1 MiB, like on PPC.

For reference, the largest arm64 DTB created from the Linux sources is
ca. 75 KiB large (100 KiB when built with symbols/fixup support).
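
A sketch of the resulting definition (the constant is named in the subject;
its exact spelling in the source is assumed):

    #define FDT_MAX_SIZE (1024 * 1024)  /* 1 MiB, up from 64 KiB, as on PPC */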

Cc: qemu-stable@nongnu.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Message-id: 1523541337-23919-1-git-send-email-geert+renesas@glider.be
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 14ec3cbd7c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:02 -05:00
Peter Maydell 0580c63b4f hw/char/cmsdk-apb-uart.c: Correctly clear INTSTATUS bits on writes
The CMSDK APB UART INTSTATUS register bits are all write-one-to-clear.
We were getting this correct for the TXO and RXO bits (which need
special casing because their state lives in the STATE register),
but had forgotten to handle the normal bits for RX and TX which
we do store in our s->intstatus field.

Perform the W1C operation on the bits in s->intstatus too.
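
A minimal sketch of a write-one-to-clear update (simplified):

    /* each 1 bit in the written value clears that interrupt status bit */
    s->intstatus &= ~value;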

Fixes: https://bugs.launchpad.net/qemu/+bug/1760262
Cc: qemu-stable@nongnu.org
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20180410134203.17552-1-peter.maydell@linaro.org
(cherry picked from commit 6670b494fd)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:02 -05:00
Richard Henderson b17ed3e1d2 tcg: Introduce tcg_set_insn_start_param
The parameters for tcg_gen_insn_start are target_ulong, which may be split
into two TCGArg parameters for storage in the opcode on 32-bit hosts.

Fixes the ARM target and its direct use of tcg_set_insn_param, which would
set the wrong argument in the 64-on-32 case.
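
The new helper hides the 64-on-32 split; it looks roughly like this:

    static inline void tcg_set_insn_start_param(TCGOp *op, int arg,
                                                target_ulong v)
    {
    #if TARGET_LONG_BITS <= TCG_TARGET_REG_BITS
        tcg_set_insn_param(op, arg, v);
    #else
        /* A 64-bit target_ulong occupies two TCGArg slots on a
         * 32-bit host, so both halves must be updated. */
        tcg_set_insn_param(op, arg * 2, v);
        tcg_set_insn_param(op, arg * 2 + 1, v >> 32);
    #endif
    }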

Cc: qemu-stable@nongnu.org
Reported-by: alarson@ddci.com
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180410003558.2470-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 9743cd5736)
 Conflicts:
	target/arm/translate.h
	tcg/tcg.h
* rework to avoid functional dependency on 15fa08f

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:02 -05:00
Philippe Mathieu-Daudé 44633a272b hw/block/pflash_cfi: fix off-by-one error
ASAN reported:

    hw/block/pflash_cfi02.c:245:33: runtime error: index 82 out of bounds for type 'uint8_t [82]'

Since the 'cfi_len' member is not used, remove it to keep the code safer.

Cc: qemu-stable@nongnu.org
Reported-by: AddressSanitizer
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 07c13a7172)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:01 -05:00
Greg Kurz 8999a5945f vfio-ccw: fix memory leaks in vfio_ccw_realize()
If the subchannel is already attached or if vfio_get_device() fails, the
code jumps to the 'out_device_err' label and doesn't free the string it
has just allocated.

The code should be reworked so that vcdev->vdev.name only gets set when
the device has been attached, and freed when it is about to be detached.
This could be achieved with the addition of a vfio_ccw_get_device()
function that would be the counterpart of vfio_put_device(). But this is
a more elaborate cleanup that should be done in a follow-up. For now,
let's just add calls to g_free() on the buggy error paths.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <152311222681.203086.8874800175539040298.stgit@bahia>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit be4d026f64)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:01 -05:00
Peter Maydell 28895d021e cpus.c: ensure running CPU recalculates icount deadlines on timer expiry
When we run in TCG icount mode, we calculate the number of instructions
to execute using tcg_get_icount_limit(), which ensures that we stop
execution at the next timer deadline. However there is a bug where
currently we do not recalculate that limit if the guest reprograms
a timer so that the next deadline moves closer, and so we will
continue execution until the original limit and fire the timer
later than we should.

Fix this bug in qemu_timer_notify_cb(): if we are currently running
a VCPU in icount mode, we simply need to kick it out of the main
loop and back to tcg_cpu_exec(), where it will recalculate the
icount limit. If we are not currently running a VCPU, then we
retain the existing logic for waking up a halted CPU.

Cc: qemu-stable@nongnu.org
Fixes: https://bugs.launchpad.net/qemu/+bug/1754038
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180406123838.21249-1-peter.maydell@linaro.org
(cherry picked from commit c52e7132d7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:01 -05:00
Kevin Wolf 12fc0de2ab gluster: Fix blockdev-add with server.N.type=unix
The legacy command line interface gets the socket path from an option
called 'socket'. QAPI, in contrast, uses SocketAddress, where the
corresponding option is called 'path'.

Fix the gluster block driver to accept both 'socket' and 'path', with
'path' being the preferred syntax.

https://bugzilla.redhat.com/show_bug.cgi?id=1545155

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20180403110810.25624-1-kwolf@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
(cherry picked from commit 9dae635afa)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:01 -05:00
Greg Kurz bb8d4bb3cc exec: fix memory leak in find_max_supported_pagesize()
The string returned by object_property_get_str() is dynamically allocated.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <152231458624.69730.1752893648612848392.stgit@bahia.lan>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 72a841d2a4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:01 -05:00
Alexandro Sanchez Bach 2b73d5d348 target/i386: Fix andn instruction
In commit 7073fbada7, the `andn` instruction
was implemented via `tcg_gen_andc` but passes the operands in the wrong
order:
- X86 defines `andn dest,src1,src2` as: dest = ~src1 & src2
- TCG defines `andc dest,src1,src2` as: dest = src1 & ~src2

The following simple test shows the issue:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t ret = 0;
        __asm (
            "mov $0xFF00, %%ecx\n"
            "mov $0x0F0F, %%eax\n"
            "andn %%ecx, %%eax, %%ecx\n"
            "mov %%ecx, %0\n"
          : "=r" (ret));
        printf("%08X\n", ret);
        return 0;
    }

This patch fixes the problem by simply swapping the last two
arguments of `tcg_gen_andc_tl`.
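
With illustrative operand names, the change amounts to:

    /* before (wrong): computes dest = src1 & ~src2 */
    tcg_gen_andc_tl(dest, src1, src2);
    /* after (right):  computes dest = src2 & ~src1 == ~src1 & src2,
     * matching x86 andn semantics */
    tcg_gen_andc_tl(dest, src2, src1);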

Reported-by: Alexandro Sanchez Bach <alexandro@phi.nz>
Signed-off-by: Alexandro Sanchez Bach <alexandro@phi.nz>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 5cd10051c2)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:01 -05:00
Richard Henderson 5aac5f6aee tcg: Mark muluh_i64 and mulsh_i64 as 64-bit ops
Failure to do so results in the tcg optimizer sign-extending
any constant fold from 32-bits.  This turns out to be visible
in the RISC-V testsuite using a host that emits these opcodes
(e.g. any non-x86_64).

Reported-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit f2f1dde751)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:01 -05:00
Max Reitz c793a0debd iotests: Test preallocated truncate of 2G image
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180228131315.30194-3-mreitz@redhat.com
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 733d1dce0f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:01 -05:00
Max Reitz d315353bed block/file-posix: Fix fully preallocated truncate
Storing the lseek() result in an int results in it overflowing when the
file is at least 2 GB in size.  Then, we have a 50% chance of the result
being "negative" and thus thinking an error occurred when actually
everything went just fine.

So we should use the correct type for storing the result: off_t.
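
A minimal illustration of the bug and the fix:

    /* Truncates for files >= 2 GiB; bit 31 of the offset then looks
     * like a sign bit: */
    int bad = lseek(fd, 0, SEEK_CUR);

    /* Correct: off_t is the type lseek() actually returns. */
    off_t good = lseek(fd, 0, SEEK_CUR);
    if (good < 0) {
        /* now a negative value really is an error */
    }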

Reported-by: Daniel P. Berrange <berrange@redhat.com>
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549231
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180228131315.30194-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 82b45e0a0b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:01 -05:00
Michal Privoznik 332969a64d qemu-pr-helper: Actually allow users to specify pidfile
Due to wrong specification of arguments to getopt_long(), any
attempt to set the pidfile resulted in:

1) the default being leaked
2) the @pidfile variable being set to NULL (because optarg is
NULL without this patch).
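
A sketch of both fixes (the option table shape is illustrative):

    static const struct option long_options[] = {
        /* required_argument is what makes getopt_long() fill optarg */
        { "pidfile", required_argument, NULL, 'p' },
        { NULL, 0, NULL, 0 },
    };

    /* in the option-parsing switch: */
    case 'p':
        g_free(pidfile);            /* don't leak the default value */
        pidfile = g_strdup(optarg);
        break;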

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <6f10cd53d361a395aa0e85a9311ec4e9a8fc11e5.1521868451.git.mprivozn@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit f8e1a98964)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:01 -05:00
Greg Kurz db9c0d907e virtio_net: flush uncompleted TX on reset
If the backend could not transmit a packet right away for some reason,
the packet is queued for asynchronous sending. The corresponding vq
element is tracked in the async_tx.elem field of the VirtIONetQueue,
for later freeing when the transmission is complete.

If a reset happens before completion, virtio_net_tx_complete() will push
async_tx.elem back to the guest anyway, and we end up with the inuse flag
of the vq being equal to -1. The next call to virtqueue_pop() is then
likely to fail with "Virtqueue size exceeded".

This can be reproduced easily by starting a guest with a hubport backend
that is not connected to a functional network, eg,

 -device virtio-net-pci,netdev=hub0 -netdev hubport,id=hub0,hubid=0

and no other -netdev hubport,hubid=0 on the command line.

The appropriate fix is to ensure that such an asynchronous transmission
cannot survive a device reset. So for all queues, we first try to send
the packet again, and eventually we purge it if the backend still could
not deliver it.
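
The shape of the fix, with a hypothetical helper name (the real code walks
the device's queues in its reset handler):

    VirtIONetQueue *q;
    int i;

    for (i = 0; i < n->max_queues; i++) {
        q = &n->vqs[i];
        if (q->async_tx.elem) {
            /* retry the queued packet, purge it if it still can't
             * be sent; retry_or_purge() is a hypothetical stand-in */
            retry_or_purge(q);
            q->async_tx.elem = NULL;  /* nothing left in flight */
        }
    }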

CC: qemu-stable@nongnu.org
Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Buglink: https://github.com/open-power-host-os/qemu/issues/37
Signed-off-by: Greg Kurz <groug@kaod.org>
Tested-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 94b52958b7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:00 -05:00
Victor Kamensky a053477b7c arm/translate-a64: treat DISAS_UPDATE as variant of DISAS_EXIT
In the OE project, a 4.15 Linux kernel boot hang was observed under
single-CPU aarch64 QEMU. Kernel code was in a loop waiting for
vtimer arrival, spinning in TCG-generated blocks, while the interrupt
was pending unprocessed. This happened because when QEMU tried to
handle the vtimer interrupt the target had interrupts disabled; as a
result the flag indicating TCG exit, cpu->icount_decr.u16.high,
was cleared, but the arm_cpu_exec_interrupt function did not call
arm_cpu_do_interrupt to process the interrupt. Later, when the target
re-enabled interrupts, it happened without an exit into the main loop,
so the following code that waited for the result of interrupt execution
ran in an infinite loop.

To solve the problem, instructions that operate on CPU system state
(i.e. enable/disable interrupts) and are marked as DISAS_UPDATE
should be considered a variant of DISAS_EXIT and forced to exit
back to the main loop, so QEMU will have a chance to process pending
CPU state updates, including pending interrupts.

This change brings consistency with how DISAS_UPDATE is treated
in aarch32 case.

CC: Peter Maydell <peter.maydell@linaro.org>
CC: Alex Bennée <alex.bennee@linaro.org>
CC: qemu-stable@nongnu.org
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Victor Kamensky <kamensky@cisco.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1521526368-1996-1-git-send-email-kamensky@cisco.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit a75a52d624)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:00 -05:00
Kevin Wolf 7c22c5a3a7 tests/multiboot: Add .gitignore
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jack Schwartz <jack.schwartz@oracle.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit e2679395d5)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:00 -05:00
Kevin Wolf 678a75b8b5 tests/multiboot: Add tests for the a.out kludge
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jack Schwartz <jack.schwartz@oracle.com>
(cherry picked from commit 1c8c426fb4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:00 -05:00
Kevin Wolf 2392a86d2d tests/multiboot: Test exit code for every qemu run
Testing the exit code only once after a whole group of tests has
completed is not enough; it catches errors only in the very last qemu
invocation. We need to have the check after each qemu run.

The logging and diffing against the reference output are still done once
per group to keep things more manageable. This is not a problem because the
log file accumulates the output of all runs.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jack Schwartz <jack.schwartz@oracle.com>
(cherry picked from commit 49713c413a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:00 -05:00
Kevin Wolf 32ccaed1ae multiboot: Check validity of mh_header_addr
I couldn't find a case where this prevents something bad from happening
that isn't already caught by other checks, but let's err on the safe
side and check that mh_header_addr is as expected.
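
A sketch of such a sanity check, assuming the Multiboot header was found at
byte offset i in the kernel file:

    /* header_addr must not precede load_addr, and their distance
     * cannot exceed the header's offset in the file */
    if (mh_header_addr < mh_load_addr ||
        mh_header_addr - mh_load_addr > i) {
        error_report("invalid mh_header_addr");
        exit(1);
    }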

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jack Schwartz <jack.schwartz@oracle.com>
(cherry picked from commit dbf2dce7aa)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:00 -05:00
Kevin Wolf 31b0eb0dad multiboot: Reject kernels exceeding the address space
The code path where mh_load_end_addr is non-zero in the Multiboot
header checks that mh_load_end_addr >= mh_load_addr, and mb_load_size
is validated there.  However, mb_load_size is not checked when it is
calculated from the file size, i.e. when mh_load_end_addr is 0.

If the kernel binary size is larger than can fit in the address space
after load_addr, we ended up with a kernel_size that is smaller than
load_size, which means that we read the file into a too small buffer.

Add a check to reject kernel files with such Multiboot headers.
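
The added check is essentially:

    /* mh_load_end_addr == 0: the load size was derived from the file
     * size, so verify it still fits below the top of the 32-bit
     * address space. */
    if (mb_load_size > UINT32_MAX - mh_load_addr) {
        error_report("kernel does not fit in address space");
        exit(1);
    }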

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jack Schwartz <jack.schwartz@oracle.com>
(cherry picked from commit b17a9054a0)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:00 -05:00
Jack Schwartz bbdcfd2dbc multiboot: fprintf(stderr...) -> error_report()
Change all fprintf(stderr...) calls in hw/i386/multiboot.c to call
error_report() instead, including the mb_debug macro.  Remove the "\n"
from strings passed to all modified calls, since error_report() appends
one.

Signed-off-by: Jack Schwartz <jack.schwartz@oracle.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 4b9006a41e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:00 -05:00
Jack Schwartz ddf965ef9d multiboot: Use header names when displaying fields
Refer to field names when displaying fields in printf and debug statements.

Signed-off-by: Jack Schwartz <jack.schwartz@oracle.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit ce5eb6dc4d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:00 -05:00
Jack Schwartz a030730621 multiboot: Remove unused variables from multiboot.c
Remove unused variables: mh_mode_type, mh_width, mh_height, mh_depth

Signed-off-by: Jack Schwartz <jack.schwartz@oracle.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 7a2e43cc96)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:00 -05:00
Jack Schwartz 059a6962b2 multiboot: bss_end_addr can be zero
The multiboot spec (https://www.gnu.org/software/grub/manual/multiboot/),
section 3.1.3, allows for bss_end_addr to be zero.

A zero bss_end_addr signifies there is no .bss section.

Suggested-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Jack Schwartz <jack.schwartz@oracle.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 2a8fcd119e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:45:00 -05:00
Peter Lieven ab2dfe8d17 migration/block: reset dirty bitmap before read in bulk phase
Reset the dirty bitmap before reading to make sure we don't miss
any new data.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1520507908-16743-3-git-send-email-pl@kamp.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 86b124bc76)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:59 -05:00
Paolo Bonzini cebb7fea7c memory: fix flatview_access_valid RCU read lock/unlock imbalance
Fixes: 11e732a5ed
Reported-by: Cornelia Huck <cohuck@redhat.com>
Reported-by: luigi burdo <intermediadc@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Message-id: 20180307130238.19358-1-pbonzini@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit b39b61e410)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:59 -05:00
Paolo Bonzini aedaf01f8c address_space_rw: address_space_to_flatview needs RCU lock
address_space_rw is calling address_space_to_flatview but it can
be called outside the RCU lock.  To fix it, transform flatview_rw
into address_space_rw, since flatview_rw is otherwise unused.

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit db84fd973e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:59 -05:00
Paolo Bonzini a7f4d8a105 address_space_map: address_space_to_flatview needs RCU lock
address_space_map is calling address_space_to_flatview but it can
be called outside the RCU lock.  The function itself is calling
rcu_read_lock/rcu_read_unlock, just in the wrong place, so the
fix is easy.

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit ad0c60fa57)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:59 -05:00
Paolo Bonzini fa876bc97a address_space_access_valid: address_space_to_flatview needs RCU lock
address_space_access_valid is calling address_space_to_flatview but it can
be called outside the RCU lock.  To fix it, push the rcu_read_lock/unlock
pair up from flatview_access_valid to address_space_access_valid.
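
The resulting pattern, which is the same for the sibling patches in this
series, is roughly:

    bool address_space_access_valid(AddressSpace *as, hwaddr addr,
                                    int len, bool is_write)
    {
        FlatView *fv;
        bool result;

        /* Hold the RCU read lock across both the FlatView lookup and
         * its use, so the view cannot be freed underneath us. */
        rcu_read_lock();
        fv = address_space_to_flatview(as);
        result = flatview_access_valid(fv, addr, len, is_write);
        rcu_read_unlock();
        return result;
    }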

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 11e732a5ed)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:59 -05:00
Paolo Bonzini f77c231202 address_space_read: address_space_to_flatview needs RCU lock
address_space_read is calling address_space_to_flatview but it can
be called outside the RCU lock.  To fix it, push the rcu_read_lock/unlock
pair up from flatview_read_full to address_space_read's constant size
fast path and address_space_read_full.

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit b2a44fcad7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:59 -05:00
Paolo Bonzini df04d1f16a address_space_write: address_space_to_flatview needs RCU lock
address_space_write is calling address_space_to_flatview but it can
be called outside the RCU lock.  To fix it, push the rcu_read_lock/unlock
pair up from flatview_write to address_space_write.

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 4c6ebbb364)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:59 -05:00
Paolo Bonzini ac25a3257f memory: inline some performance-sensitive accessors
These accessors are called from inlined functions, and the call sequence
is much more expensive than just inlining the access.  Move the
struct declaration to memory-internal.h so that exec.c and memory.c
can both use an inline function.

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 785a507ec7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:59 -05:00
Paolo Bonzini 8e8b73992c openpic_kvm: drop address_space_to_flatview call
The MemoryListener is registered on address_space_memory, so there is
not much to assert.  This currently works because the callback
is invoked only once when the listener is registered, but section->fv
is the _new_ FlatView, not the old one on later calls and that
would break.

This confines address_space_to_flatview to exec.c and memory.c.

Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 80d2b933f9)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:59 -05:00
KONRAD Frederic 499211829d sparc: fix leon3 casa instruction when MMU is disabled
Since commit af7a06bac7d3abb2da48ef3277d2a415772d2ae8,
`casa [..](10), .., ..` (and probably other alternate space instructions)
triggers a data access exception when the MMU is disabled.

When we enter get_asi(...), dc->mem_idx is set to MMU_PHYS_IDX when the MMU
is disabled. Just keep mem_idx unchanged in this case so we pass through the
MMU when it is disabled.

Signed-off-by: KONRAD Frederic <frederic.konrad@adacore.com>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
(cherry picked from commit 6e10f37c86)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:58 -05:00
Max Filippov e1f5a04d17 linux-user: fix target_mprotect/target_munmap error return values
The target_mprotect/target_munmap return value goes through get_errno at
the call site; thus the functions must either set errno to a host error
code and return -1, or return a negative guest error code. Do the latter.
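
In other words, the error convention becomes (the condition is illustrative):

    /* Return the negative guest error code directly; the call sites
     * wrap the result in get_errno(). */
    if (!guest_range_valid(start, len)) {    /* illustrative check */
        return -TARGET_ENOMEM;
    }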

Cc: qemu-stable@nongnu.org
Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20180228221609.11265-8-jcmvbkbc@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit 78cf339039)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:58 -05:00
Max Filippov 8fc971ed1b linux-user: fix assertion in shmdt
shmdt fails to call mmap_lock/mmap_unlock around page_set_flags,
resulting in the following assertion:
  page_set_flags: Assertion `have_mmap_lock()' failed.

Wrap shmdt internals into mmap_lock/mmap_unlock.
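
Sketched (details elided):

    mmap_lock();
    ret = get_errno(shmdt(g2h(shmaddr)));
    if (ret == 0) {
        /* page_set_flags() asserts have_mmap_lock() */
        page_set_flags(shmaddr, shmaddr + size, 0);
    }
    mmap_unlock();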

Cc: qemu-stable@nongnu.org
Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20180228221609.11265-7-jcmvbkbc@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit 3c5f6a5f88)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:58 -05:00
Max Filippov 1801fabd9b linux-user: fix mmap/munmap/mprotect/mremap/shmat
In a linux-user QEMU built for a target with TARGET_ABI_BITS bigger
than L1_MAP_ADDR_SPACE_BITS, an assertion in page_set_flags fires when
mmap, munmap, mprotect, mremap or shmat is called for an address outside
the guest address space. mmap and mprotect should return ENOMEM in such
a case.

Change the definition of GUEST_ADDR_MAX to always be the last valid guest
address. Account for this change in open_self_maps.
Add a macro guest_addr_valid that verifies whether a guest address is valid.
Add a function guest_range_valid that verifies whether an address range is
within the guest address space and does not wrap around. Use them in
mmap/munmap/mprotect/mremap/shmat for error checking.
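
A sketch of the two additions (types approximate):

    #define guest_addr_valid(x) ((x) <= GUEST_ADDR_MAX)

    static inline bool guest_range_valid(abi_ulong start, abi_ulong len)
    {
        /* every byte of [start, start + len) must be a valid guest
         * address, and the range must not wrap around */
        return len - 1 <= GUEST_ADDR_MAX &&
               start <= GUEST_ADDR_MAX - len + 1;
    }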

Cc: qemu-stable@nongnu.org
Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20180307215010.30706-1-jcmvbkbc@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit ebf9a3630c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:58 -05:00
Max Filippov 2ac88347fa target/xtensa: dump correct physical registers
xtensa_cpu_dump_state outputs CPU physical registers as is, without
synchronization from the current window. That may result in different
values being printed for the current window and the corresponding physical
registers. Synchronize physical registers from the window before dumping.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
(cherry picked from commit b55b1afda9)
 Conflicts:
	target/xtensa/translate.c
* drop context dependencies
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:58 -05:00
Mark Cave-Ayland 1f4637953e loader: don't perform overlapping address check for memory region ROM images
All memory region ROM images have a base address of 0, which causes the overlapping
address check to fail if more than one memory region ROM image is present, or an
existing ROM image is loaded at address 0.

Make sure that we ignore the overlapping address check in
rom_check_and_register_reset() if this is a memory region ROM image. In particular
this fixes the "rom: requested regions overlap" error on startup when trying to
run qemu-system-sparc with a -kernel image since commit 7497638642: "tcx: switch to
load_image_mr() and remove prom_addr hack".
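
The fix is effectively an early-out in the overlap loop; a sketch:

    QTAILQ_FOREACH(rom, &roms, next) {
        /* memory region ROM images all report addr 0, so the
         * overlapping-address check is meaningless for them */
        if (rom->mr) {
            continue;
        }
        /* ... existing overlap check against the previous rom ... */
    }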

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
(cherry picked from commit ca316c1152)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:58 -05:00
Stefan Berger e2fc495b4a tpm: Set the flags of the CMD_INIT command to 0
The flags of the CMD_INIT control channel command were not
initialized properly. Fix this by setting them to 0.

Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
(cherry picked from commit 3027058764)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:58 -05:00
Kevin Wolf 4f81878152 rbd: Fix use after free in qemu_rbd_set_keypairs() error path
If we want to include the invalid option name in the error message, we
can't free the string earlier than that.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 71c87815f9)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:58 -05:00
Alberto Garcia 6f36bd3969 specs/qcow2: Fix documentation of the compressed cluster descriptor
This patch fixes several mistakes in the documentation of the
compressed cluster descriptor:

1) the documentation claims that the cluster descriptor contains the
   number of sectors used to store the compressed data, but what it
   actually contains is the number of sectors *minus one* or, in other
   words, the number of additional sectors after the first one.

2) the width of the fields is incorrectly specified. The number of bits
   used by each field is

      x = 62 - (cluster_bits - 8)   for the offset field
      y = (cluster_bits - 8)        for the size field

   So the offset field's location is [0, x-1], not [0, x] as stated.

3) the size field does not contain the size of the compressed data,
   but rather the number of sectors where that data is stored. The
   compressed data starts at the exact point specified in the offset
   field and ends when there's enough data to produce a cluster of
   decompressed data. Both points can be in the middle of a sector,
   allowing several compressed clusters to be stored next to one
   another, sharing sectors if necessary.
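
Putting the corrected description into code, decoding a compressed cluster
descriptor looks like this (variable names illustrative):

    int x = 62 - (cluster_bits - 8);   /* width of the offset field */
    uint64_t offset = desc & ((1ULL << x) - 1);              /* bits 0..x-1 */
    uint64_t size   = (desc >> x) & ((1ULL << (cluster_bits - 8)) - 1);
    /* the size field stores "sectors minus one" */
    uint64_t nb_sectors = size + 1;
    uint64_t span_bytes = nb_sectors * 512;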

Cc: qemu-stable@nongnu.org
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 156b46ded3)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:58 -05:00
Eric Blake 0430aa089c nbd: Honor server's advertised minimum block size
Commit 79ba8c98 (v2.7) changed the setting of request_alignment
to occur only during bdrv_refresh_limits(), rather than at at
bdrv_open() time; but at the time, NBD was unaffected, because
it still used sector-based callbacks, so the block layer
defaulted NBD to use 512 request_alignment.

Later, commit 70c4fb26 (also v2.7) changed NBD to use byte-based
callbacks, without setting request_alignment.  This resulted in
NBD using request_alignment of 1, which works great when the
server supports it (as is the case for qemu-nbd), but falls apart
miserably if the server requires alignment (but only if qemu
actually sends a sub-sector request; qemu-io can do it, but
most qemu operations still operate on sectors or larger).

Even later, the NBD protocol was updated to document that clients
should learn the server's minimum alignment during NBD_OPT_GO;
and recommended that clients should assume a minimum size of 512
unless the server understands NBD_OPT_GO and replied with a smaller
size.  Commit 081dd1fe (v2.10) attempted to do that, by assigning
request_alignment to whatever was learned from the server; but
it has two flaws: the assignment is done during bdrv_open() so
it gets unconditionally wiped out back to 1 during any later
bdrv_refresh_limits(); and the code is not using a default of 512
when the server did not report a minimum size.

Fix these issues by moving the assignment to request_alignment
to the right function, and by using a sane default when the
server does not advertise a minimum size.
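
In outline (field paths approximate), the fix looks like:

    static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
    {
        /* the minimum block size learned during NBD_OPT_GO,
         * or 0 if the server sent none */
        uint32_t min = nbd_get_client_session(bs)->info.min_block;

        bs->bl.request_alignment = min ? min : BDRV_SECTOR_SIZE;
    }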

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180215032905.27146-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
(cherry picked from commit fd8d372dd3)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:58 -05:00
Greg Kurz 327d4645a0 spapr: make pseries-2.11 the default machine type
The spapr capability framework was introduced in QEMU 2.12. It allows
explicit control over how host features are exposed to the guest. This
is especially needed to handle migration between heterogeneous hosts
(eg, POWER8 to POWER9). It is also used to expose fixes/workarounds
against speculative execution vulnerabilities to guests.
The framework was hence backported to QEMU 2.11.1, especially these
commits:

0fac4aa930 spapr: Add pseries-2.12 machine type
9070f408f4 spapr: Treat Hardware Transactional Memory (HTM) as an
 optional capability

0fac4aa930 has the confusing effect of making pseries-2.12 the default
machine type for QEMU 2.11.1, instead of the expected pseries-2.11. This
patch changes the default machine back to pseries-2.11.

Unfortunately, 9070f408f4 enforces the HTM capability for pseries-2.11
to be enabled by default, ie, when not passing cap-htm on the command
line. This breaks several 'make check' testcases that run qemu-system-ppc64
with TCG.

The only sane way to fix this is to adapt the impacted testcases so that
they all pass cap-htm=off in this case. This patch does that as well.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 20:44:16 -05:00
Tiwei Bie b940928fa0 virtio-balloon: unref the memory region before continuing
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit b86107ab43)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 15:20:44 -05:00
Laszlo Ersek c3c44a4fb0 pci-bridge/i82801b11: clear bridge registers on platform reset
The "i82801b11-bridge" device model is a descendant of "base-pci-bridge"
(TYPE_PCI_BRIDGE). However, unlike other similar devices, such as

- pci-bridge,
- pcie-pci-bridge,
- PCIE Root Port,
- xio3130 switch upstream and downstream ports,
- dec-21154-p2p-bridge,
- pbm-bridge,
- xilinx-pcie-root,

"i82801b11-bridge" does not clear the bridge specific registers at
platform reset.

This is a problem because devices on "i82801b11-bridge" continue to
respond to config space cycles after platform reset, when addressed with
the bus number that was previously programmed into the secondary bus
number register of "i82801b11-bridge". This error breaks OVMF's search for
extra (PXB) root buses, for example.

The device class reset method for "i82801b11-bridge" is currently NULL;
set it directly to pci_bridge_reset(), like the last three bridge models
in the above listing do.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: qemu-stable@nongnu.org
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1541839
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit ed247f40db)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 15:20:44 -05:00
Murilo Opsfelder Araujo 82c5191f9c block/ssh: fix possible segmentation fault when .desc is not null-terminated
This patch prevents a possible segmentation fault when .desc members are checked
against NULL.

The ssh_runtime_opts was added by commit
8a6a80896d ("block/ssh: Use QemuOpts for runtime
options").

This fix was inspired by
http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg00883.html.

Fixes: 8a6a80896d ("block/ssh: Use QemuOpts for runtime options")
Cc: Max Reitz <mreitz@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
(cherry picked from commit fbd5c4c0db)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20 15:20:44 -05:00
Greg Kurz 72cc467aab spapr: register dummy ICPs later
Some older machine types create more ICPs than needed. We hence
need to register up to xics_max_server_number() dummy ICPs to
accommodate the migration of these machine types.

Recent VSMT rework changed xics_max_server_number() to return

    DIV_ROUND_UP(max_cpus * spapr->vsmt, smp_threads)

instead of

    DIV_ROUND_UP(max_cpus * kvmppc_smt_threads(), smp_threads);

The change is okay but it requires spapr->vsmt to be set, which
isn't the case with the current code. This causes the formula to
return zero and we don't create dummy ICPs. This breaks migration
of older guests as reported here:

    https://bugzilla.redhat.com/show_bug.cgi?id=1549087

The dummy ICP workaround doesn't really have a dependency on XICS
itself. But it does depend on proper VCPU id numbering and it must
be applied before creating vCPUs (ie, creating real ICPs). So this
patch moves the workaround to spapr_init_cpus(), which already
assumes VSMT to be set.

Fixes: 72194664c8 ("spapr: use spapr->vsmt to compute VCPU ids")
Reported-by: Lukas Doktor <ldoktor@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 72fdd4de8e)
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-05-22 12:59:00 +02:00
Greg Kurz 184da186db spapr: fix missing CPU core nodes in DT when running with TCG
Commit 5d0fb1508e "spapr: consolidate the VCPU id numbering logic
in a single place" introduced a helper to detect thread0 of a virtual
core based on its VCPU id. This is used to create CPU core nodes in
the DT, but it is broken in TCG.

$ qemu-system-ppc64 -nographic -accel tcg -machine dumpdtb=dtb.bin \
                    -smp cores=16,maxcpus=16,threads=1
$ dtc -f -O dts dtb.bin | grep POWER8
                PowerPC,POWER8@0 {
                PowerPC,POWER8@8 {

instead of the expected 16 cores that we get with KVM:

$ dtc -f -O dts dtb.bin | grep POWER8
                PowerPC,POWER8@0 {
                PowerPC,POWER8@8 {
                PowerPC,POWER8@10 {
                PowerPC,POWER8@18 {
                PowerPC,POWER8@20 {
                PowerPC,POWER8@28 {
                PowerPC,POWER8@30 {
                PowerPC,POWER8@38 {
                PowerPC,POWER8@40 {
                PowerPC,POWER8@48 {
                PowerPC,POWER8@50 {
                PowerPC,POWER8@58 {
                PowerPC,POWER8@60 {
                PowerPC,POWER8@68 {
                PowerPC,POWER8@70 {
                PowerPC,POWER8@78 {

This happens because spapr_get_vcpu_id() maps VCPU ids to
cs->cpu_index in TCG mode. This confuses the code in
spapr_is_thread0_in_vcore(), since it assumes thread0 VCPU
ids to have a spapr->vsmt spacing.

    spapr_get_vcpu_id(cpu) % spapr->vsmt == 0

Actually, there's no real reason to expose cs->cpu_index instead
of the VCPU id, since we also generate it with TCG. Also we already
set it explicitly in spapr_set_vcpu_id(), so there's no real reason
either to call kvm_arch_vcpu_id() with KVM.

This patch unifies spapr_get_vcpu_id() to always return the computed
VCPU id both in TCG and KVM. This is one step forward towards KVM<->TCG
migration.

Fixes: 5d0fb1508e
Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit b1a568c1c2)
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-05-22 12:59:00 +02:00
Greg Kurz e676038e60 spapr: consolidate the VCPU id numbering logic in a single place
Several places in the code need to calculate a VCPU id:

    (cpu_index / smp_threads) * spapr->vsmt + cpu_index % smp_threads
    (core_id / smp_threads) * spapr->vsmt (1 user)
    index * spapr->vsmt (2 users)

or guess that the VCPU id of a given VCPU is the first thread of a virtual
core:

    index % spapr->vsmt != 0

Even if the numbering logic isn't that complex, it is rather fragile to
have these assumptions open-coded in several places. FWIW this was
proven by recent issues related to VSMT.

This patch moves the VCPU id formula to a single function to be called
everywhere the code needs to compute one. It also adds a helper to
guess if a VCPU is the first thread of a VCORE.
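
Following the formula above, the consolidated helpers look roughly like this
(signatures approximate):

    static int spapr_vcpu_id(sPAPRMachineState *spapr, int cpu_index)
    {
        return (cpu_index / smp_threads) * spapr->vsmt
               + cpu_index % smp_threads;
    }

    static bool spapr_is_thread0_in_vcore(sPAPRMachineState *spapr,
                                          PowerPCCPU *cpu)
    {
        return spapr_get_vcpu_id(cpu) % spapr->vsmt == 0;
    }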

Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Rename spapr_is_vcore() to spapr_is_thread0_in_vcore() for clarity]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 5d0fb1508e)
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-05-22 12:58:59 +02:00
Greg Kurz 094706e69f spapr: rename spapr_vcpu_id() to spapr_get_vcpu_id()
The spapr_vcpu_id() function is an accessor actually. Let's rename it
for symmetry with the recently added spapr_set_vcpu_id() helper.

The motivation behind this is that a later patch will consolidate
the VCPU id formula in a function and spapr_vcpu_id looks like an
appropriate name.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 14bb4486c8)
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-05-22 12:58:59 +02:00
David Gibson 8c0ec3c398 target/ppc: Clarify compat mode max_threads value
We recently had some discussions that were sidetracked for a while, because
nearly everyone misapprehended the purpose of the 'max_threads' field in
the compatibility modes table.  It's all about guest expectations, not host
expectations or support (that's handled elsewhere).

In an attempt to avoid a repeat of that confusion, rename the field to
'max_vthreads' and add an explanatory comment.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
(cherry picked from commit abbc124753)
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-05-22 12:58:59 +02:00
Greg Kurz a8074a5895 spapr: move VCPU calculation to core machine code
The VCPU ids are currently computed and assigned to each individual
CPU threads in spapr_cpu_core_realize(). But the numbering logic
of VCPU ids is actually a machine-level concept, and many places
in hw/ppc/spapr.c also have to compute VCPU ids out of CPU indexes.

The current formula used in spapr_cpu_core_realize() is:

    vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i

where:

    cc->core_id is a multiple of smp_threads
    cpu_index = cc->core_id + i
    0 <= i < smp_threads

So we have:

    cpu_index % smp_threads == i
    cc->core_id / smp_threads == cpu_index / smp_threads

hence:

    vcpu_id =
        (cpu_index / smp_threads) * spapr->vsmt + cpu_index % smp_threads;

This formula was used before VSMT, at the time VCPU ids were computed
at the target emulation level. It has the advantage of being usable
to derive a VCPU id out of a CPU index only. It is suited to all the
places where the machine code has to compute a VCPU id.

This patch introduces an accessor to set the VCPU id in a PowerPCCPU object
using the above formula. It is a first step to consolidate all the VCPU id
logic in a single place.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 648edb6475)
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-05-22 12:58:59 +02:00
Greg Kurz cc86c46758 spapr: use spapr->vsmt to compute VCPU ids
Since the introduction of VSMT in 2.11, the spacing of VCPU ids
between cores is controllable through a machine property instead
of being only dictated by the SMT mode of the host:

    cpu->vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i

Until recently, the machine code would try to change the SMT mode
of the host to be equal to VSMT or exit. This allowed the rest of
the code to assume that kvmppc_smt_threads() == spapr->vsmt is
always true.

Recent commit "8904e5a75005 spapr: Adjust default VSMT value for
better migration compatibility" relaxed the rule. If the VSMT
mode cannot be set in KVM for some reasons, but the requested
CPU topology is compatible with the current SMT mode, then we
let the guest run with kvmppc_smt_threads() != spapr->vsmt.

This breaks quite a few places in the code, in particular when
calculating DRC indexes.

This is what happens on a POWER host with subcores-per-core=2 (ie,
supports up to SMT4) when passing the following topology:

    -smp threads=4,maxcpus=16 \
    -device host-spapr-cpu-core,core-id=4,id=core1 \
    -device host-spapr-cpu-core,core-id=8,id=core2

qemu-system-ppc64: warning: Failed to set KVM's VSMT mode to 8 (errno -22)

This is expected since KVM is limited to SMT4, but the guest is started
anyway because this topology can run on SMT4 even with a VSMT8 spacing.

But when we look at the DT, things get nastier:

cpus {
        ...
        ibm,drc-indexes = <0x4 0x10000000 0x10000004 0x10000008 0x1000000c>;

This means that we have the following association:

 CPU core device |     DRC    | VCPU id
-----------------+------------+---------
   boot core     | 0x10000000 | 0
   core1         | 0x10000004 | 4
   core2         | 0x10000008 | 8
   core3         | 0x1000000c | 12

But since the spacing of VCPU ids is 8, the DRC for core1 points to a
VCPU that doesn't exist, the DRC for core2 points to the first VCPU of
core1, and so on...

        ...

        PowerPC,POWER8@0 {
                ...
                ibm,my-drc-index = <0x10000000>;
                ...
        };

        PowerPC,POWER8@8 {
                ...
                ibm,my-drc-index = <0x10000008>;
                ...
        };

        PowerPC,POWER8@10 {
                ...

No ibm,my-drc-index property for this core since 0x10000010 doesn't
exist in ibm,drc-indexes above.

                ...
        };
};

...

interrupt-controller {
        ...
        ibm,interrupt-server-ranges = <0x0 0x10>;

With a spacing of 8, the highest VCPU id for the given topology should be:
        16 * 8 / 4 = 32 and not 16

        ...
        linux,phandle = <0x7e7323b8>;
        interrupt-controller;
};

And CPU hot-plug/unplug is broken:

(qemu) device_del core1
pseries-hotplug-cpu: Cannot find CPU (drc index 10000004) to remove

(qemu) device_del core2
cpu 4 (hwid 8) Ready to die...
cpu 5 (hwid 9) Ready to die...
cpu 6 (hwid 10) Ready to die...
cpu 7 (hwid 11) Ready to die...

These are the VCPU ids of core1 actually

(qemu) device_add host-spapr-cpu-core,core-id=12,id=core3
(qemu) device_del core3
pseries-hotplug-cpu: Cannot find CPU (drc index 1000000c) to remove

This patch changes all the code in hw/ppc/spapr.c to assume the VSMT
spacing when manipulating VCPU ids.

Fixes: 8904e5a750
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 72194664c8)
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-05-22 12:58:59 +02:00
Laurent Vivier c30b366d00 spapr: set vsmt to MAX(8, smp_threads)
We silently ignore the value of smp_threads when we set the default
VSMT value, and if smp_threads is greater than VSMT the kernel gets
into trouble later.

Fixes: 8904e5a750
("spapr: Adjust default VSMT value for better migration compatibility")

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 4ad64cbd0c)
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-05-22 12:58:59 +02:00
David Gibson af65bce409 spapr: Adjust default VSMT value for better migration compatibility
fa98fbfc "PC: KVM: Support machine option to set VSMT mode" introduced the
"vsmt" parameter for the pseries machine type, which controls the spacing
of the vcpu ids of thread 0 for each virtual core.  This was done to bring
some consistency and stability to how that was done, while still allowing
backwards compatibility for migration and otherwise.

The default value we used for vsmt was set to the max of the host's
advertised default number of threads and the number of vthreads per vcore
in the guest.  This was done to continue running without extra parameters
on older KVM versions which don't allow the VSMT value to be changed.

Unfortunately, even that smaller-than-before leakage of host configuration
into guest visible configuration still breaks things.  Specifically, a guest
with 4 (or fewer) vthreads/vcore will get a different vsmt value when
running on a POWER8 (vsmt==8) host than on a POWER9 (vsmt==4) host.  That
means the vcpu ids don't line up, so you can't migrate between them, though
you should be able to.

Long term we really want to make vsmt == smp_threads for sufficiently
new machine types.  However, that means that qemu will then require a
sufficiently recent KVM (one which supports changing VSMT) - that's still
not widely enough deployed to be really comfortable to do.

In the meantime we need some default that will work as often as
possible.  This patch changes that default to 8 in all circumstances.
This does change guest visible behaviour (including for existing
machine versions) for many cases - just not the most common/important
case.

Following is case by case justification for why this is still the least
worst option.  Note that any of the old behaviours can still be duplicated
after this patch, it's just that it requires manual intervention by
setting the vsmt property on the command line.

KVM HV on POWER8 host:
   This is the overwhelmingly common case in production setups, and is
   unchanged by design.  POWER8 hosts will advertise a default VSMT mode
   of 8, and > 8 vthreads/vcore isn't permitted

KVM HV on POWER7 host:
   Will break, but POWER7s allowing KVM were never released to the public.

KVM HV on POWER9 host:
   Not yet released to the public, breaking this now will reduce other
   breakage later.

KVM HV on PowerPC 970:
   Will theoretically break it, but it was barely supported to begin with
   and already required various user visible hacks to work.  Also so old
   that I just don't care.

TCG:
   This is the nastiest one; it means migration of TCG guests (without
   manual vsmt setting) will break.  Since TCG is rarely used in production
   I think this is worth it for the other benefits.  It does also remove
   one more barrier to TCG<->KVM migration which could be interesting for
   debugging applications.

KVM PR:
   As with TCG, this will break migration of existing configurations,
   without adding extra manual vsmt options.  As with TCG, it is rare in
   production so I think the benefits outweigh breakages.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 8904e5a750)
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-05-22 12:58:59 +02:00
David Gibson ad484114d1 spapr: Allow some cases where we can't set VSMT mode in the kernel
At present if we require a vsmt mode that's not equal to the kernel's
default, and the kernel doesn't let us change it (e.g. because it's an old
kernel without support) then we always fail.

But in fact we can cope with the kernel having a different vsmt as long as
  a) it's >= the actual number of vthreads/vcore (so that guest threads
     that are supposed to be on the same core act like it)
  b) it's a submultiple of the requested vsmt mode (so that guest threads
     spaced by the vsmt value will act like they're on different cores)

Allowing this case gives us a bit more freedom to adjust the vsmt behaviour
without breaking existing cases.
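
Expressed as a condition (variable names illustrative), the kernel's current
setting kvm_smt is acceptable when:

    /* (a) guest threads of one vcore stay on one host core, and
     * (b) vsmt-spaced vcpu ids land on distinct host cores */
    if (kvm_smt >= smp_threads && spapr->vsmt % kvm_smt == 0) {
        /* keep running with the kernel's setting */
    }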

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 1f20f2e0ee)
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-05-22 12:58:59 +02:00
Gerd Hoffmann 0a30cae50a sdl: workaround bug in sdl 2.0.8 headers
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=892087

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: 20180307154258.9313-1-kraxel@redhat.com
(cherry picked from commit 2ca5c43091)
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-05-22 12:58:05 +02:00
Paolo Bonzini 5892b5a9e3 memfd: fix configure test
Recent glibc added memfd_create in sys/mman.h.  This conflicts with
the definition in util/memfd.c:

    /builddir/build/BUILD/qemu-2.11.0-rc1/util/memfd.c:40:12: error: static declaration of memfd_create follows non-static declaration

Fix the configure test, and remove the sys/memfd.h inclusion since the
file actually does not exist---it is a typo in the memfd_create(2) man
page.
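
The fixed probe is just a compile test against the real header, so a libc
that already declares memfd_create() is detected; roughly:

    #include <sys/mman.h>

    int main(void)
    {
        /* compiles and links only if the libc provides it */
        return memfd_create("foo", 0);
    }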

Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 75e5b70e6b)
Signed-off-by: Greg Kurz <groug@kaod.org>
2018-05-22 11:30:18 +02:00
Michael Roth 7c1beb52ed Update version for 2.11.1 release
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-14 14:41:05 -06:00
Greg Kurz 00e9fba2be spapr: add missing break in h_get_cpu_characteristics()
Detected by Coverity (CID 1385702). This fixes the recently added hypercall
to let guests properly apply Spectre and Meltdown workarounds.

Fixes: c59704b254 "target/ppc/spapr: Add H-Call H_GET_CPU_CHARACTERISTICS"
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit fa86f59234)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 19:39:27 -06:00
linzhecheng 63112b16a6 vga: check the validation of memory addr when draw text
Start a VM with qemu-kvm -enable-kvm -vnc :66 -smp 1 -m 1024 -hda
redhat_5.11.qcow2 -device pcnet -vga cirrus, then use a VNC client to
connect to the VM; executing the code below in the guest OS will lead
to a QEMU crash:

#include <stdlib.h>   /* added: rand(), srand() */
#include <time.h>     /* added: time() */
#include <sys/io.h>   /* added: iopl(), outb() */

int main(void)
{
    int a, b;

    iopl(3);                         /* gain raw I/O port access */
    srand(time(NULL));
    while (1) {
        a = rand() % 0x100;          /* random byte value */
        b = 0x3c0 + (rand() % 0x20); /* random VGA I/O port */
        outb(a, b);
    }
    return 0;
}

The above code writes the VGA registers randomly.
Writing VGA CRT controller register index 0x0C or 0x0D
(the start address register) modifies the
display memory address of the upper-left pixel
or character of the screen. The address may be out of the
range of the VGA RAM, so we should validate the memory address
when reading or writing it to avoid a segfault.
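
The kind of check this calls for, with illustrative names (the start address
is assembled from CRT registers 0x0C/0x0D):

    uint32_t start = (s->cr[0x0c] << 8) | s->cr[0x0d];

    if (start * 4 >= s->vram_size) {   /* illustrative bound */
        return; /* ignore: the offset points outside video RAM */
    }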

Signed-off-by: linzhecheng <linzhecheng@huawei.com>
Message-id: 20180111132724.13744-1-linzhecheng@huawei.com
Fixes: CVE-2018-5683
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 191f59dc17)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 19:19:25 -06:00
linzhecheng 30c3b4823c input: fix memory leak
If kbd_queue is not empty and queue_count >= queue_limit,
we should free evt.

Change-Id: Ieeacf90d5e7e370a40452ec79031912d8b864d83
Signed-off-by: linzhecheng <linzhecheng@huawei.com>
Message-id: 20171225023730.5512-1-linzhecheng@huawei.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit fca4774a96)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 19:19:15 -06:00
Daniel P. Berrangé 88ab85384d ui: correctly advance output buffer when writing SASL data
In this previous commit:

  commit 8f61f1c5a6
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Mon Dec 18 19:12:20 2017 +0000

    ui: track how much decoded data we consumed when doing SASL encoding

I attempted to fix a flaw with tracking how much data had actually been
processed when encoding with SASL. With that flaw, the VNC server could
mistakenly discard queued data that had not been sent.

The fix was not quite right though, because it merely decremented the
vs->output.offset value. This is effectively discarding data from the
end of the pending output buffer. We actually need to discard data from
the start of the pending output buffer. We also want to free memory that
is no longer required. The correct way to handle this is to use the
buffer_advance() helper method instead of directly manipulating the
offset value.
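
The one-line essence of the fix:

    /* consume 'ret' bytes from the *front* of the pending output,
     * freeing space, instead of chopping the end off via offset */
    buffer_advance(&vs->output, ret);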

Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-id: 20180201155841.27509-1-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 627ebec208)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 18:34:49 -06:00
Daniel P. Berrange 64653b7fbe ui: avoid sign extension using client width/height
Pixman returns a signed int for the image width/height, but the VNC
protocol only permits an unsigned int16. Effective framebuffer size
is determined by the guest, limited by the video RAM size, so the
dimensions are unlikely to exceed the range of an unsigned int16,
but this is not currently validated.

With the current use of 'int' for client width/height, the calculation
of offsets in vnc_update_throttle_offset() suffers from integer size
promotion and sign extension, causing coverity warnings

*** CID 1385147:  Integer handling issues  (SIGN_EXTENSION)
/ui/vnc.c: 979 in vnc_update_throttle_offset()
973      * than that the client would already suffering awful audio
974      * glitches, so dropping samples is no worse really).
975      */
976     static void vnc_update_throttle_offset(VncState *vs)
977     {
978         size_t offset =
>>>     CID 1385147:  Integer handling issues  (SIGN_EXTENSION)
>>>     Suspicious implicit sign extension:
    "vs->client_pf.bytes_per_pixel" with type "unsigned char" (8 bits,
    unsigned) is promoted in "vs->client_width * vs->client_height *
    vs->client_pf.bytes_per_pixel" to type "int" (32 bits, signed), then
    sign-extended to type "unsigned long" (64 bits, unsigned).  If
    "vs->client_width * vs->client_height * vs->client_pf.bytes_per_pixel"
    is greater than 0x7FFFFFFF, the upper bits of the result will all be 1.
979             vs->client_width * vs->client_height * vs->client_pf.bytes_per_pixel;

Change client_width / client_height to be a size_t to avoid sign
extension and integer promotion. Then validate that dimensions are in
range wrt the RFB protocol u16 limits.
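
Sketched, the two parts of the change:

    /* 1) unsigned, 64-bit-wide fields avoid promotion to signed int
     *    in the offset multiplication */
    size_t client_width, client_height;

    /* 2) validate against the RFB wire format, which carries
     *    framebuffer dimensions as u16 */
    if (client_width > 65535 || client_height > 65535) {
        /* reject the resize */
    }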

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20180118155254.17053-1-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 4c956bd81e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 18:34:43 -06:00
Daniel P. Berrange 9a26ca6b94 ui: fix misleading comments & return types of VNC I/O helper methods
While the QIOChannel APIs for reading/writing data return ssize_t, with negative
value indicating an error, the VNC code passes this return value through the
vnc_client_io_error() method. This detects the error condition, disconnects the
client and returns 0 to indicate error. Thus all the VNC helper methods should
return size_t (unsigned), and misleading comments which refer to the possibility
of negative return values need fixing.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-14-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 30b80fd526)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 18:34:38 -06:00
Daniel P. Berrange 172f4e5a31 ui: add trace events related to VNC client throttling
The VNC client throttling is quite subtle so will benefit from having trace
points available for live debugging.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-13-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 6aa22a2918)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 18:34:32 -06:00
Daniel P. Berrange 0c85a40e71 ui: place a hard cap on VNC server output buffer size
The previous patches fix problems with throttling of forced framebuffer updates
and audio data capture that would cause the QEMU output buffer size to grow
without bound. Those fixes are graceful in that once the client catches up with
reading data from the server, everything continues operating normally.

There is some data which the server sends to the client that is impractical to
throttle. Specifically there are various pseudo framebuffer update encodings to
inform the client of things like desktop resizes, pointer changes, audio
playback start/stop, LED state and so on. These generally only involve sending
a very small amount of data to the client, but a malicious guest might be able
to do things that trigger these changes at a very high rate. Throttling them is
not practical as missed or delayed events would cause broken behaviour for the
client.

This patch thus takes a more forceful approach of setting an absolute upper
bound on the amount of data we permit to be present in the output buffer at
any time. The previous patch set a threshold for throttling the output buffer
by allowing an amount of data equivalent to one complete framebuffer update and
one second's worth of audio data. On top of this it allowed for one further
forced framebuffer update to be queued.

To be conservative, we thus take that throttling threshold and multiply it by
5 to form an absolute upper bound. If this bound is hit during vnc_write() we
forcibly disconnect the client, refusing to queue further data. This limit is
high enough that it should never be hit unless a malicious client is trying to
exploit the server, or the network is completely saturated preventing any sending
of data on the socket.
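
A sketch of the resulting check in vnc_write() (names and the factor of
5 follow the commit text; this is not the literal code):

    if (vs->throttle_output_offset != 0 &&
        vs->output.offset > vs->throttle_output_offset * 5) {
        /* output buffer exceeded the hard cap: drop the client */
        vnc_disconnect_start(vs);
        return;
    }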

This completes the fix for CVE-2017-15124 started in the previous patches.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-12-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit f887cf165d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 18:34:26 -06:00
Daniel P. Berrange f9e53c77ea ui: fix VNC client throttling when forced update is requested
The VNC server must throttle data sent to the client to prevent the 'output'
buffer size growing without bound, if the client stops reading data off the
socket (either maliciously or due to stalled/slow network connection).

The current throttling is very crude because it simply checks whether the
output buffer offset is zero. This check is disabled if the client has requested
a forced update, because we want to send these as soon as possible.

As a result, the VNC client can cause QEMU to allocate arbitrary amounts of RAM.
They can first start something in the guest that triggers lots of framebuffer
updates, e.g. play a YouTube video. Then repeatedly send full framebuffer update
requests, but never read data back from the server. This can easily make QEMU's
VNC server send buffer consume 100MB of RAM per second, until the OOM killer
starts reaping processes (hopefully the rogue QEMU process, but it might pick
others...).

To address this we make the throttling more intelligent, so we can throttle
full updates. When we get a forced update request, we keep track of exactly how
much data we put on the output buffer. We will not process a subsequent forced
update request until this data has been fully sent on the wire. We always allow
one forced update request to be in flight, regardless of what data is queued
for incremental updates or audio data. The slight complication is that we do
not initially know how much data an update will send, as this is done in the
background by the VNC job thread. So we must track the fact that the job thread
has an update pending, and not process any further updates until this job
has been completed and put its data on the output buffer.

This unbounded memory growth affects all VNC server configurations supported by
QEMU, with no workaround possible. The mitigating factor is that it can only be
triggered by a client that has authenticated with the VNC server, and who is
able to trigger a large quantity of framebuffer updates or audio samples from
the guest OS. Mostly they'll just succeed in getting the OOM killer to kill
their own QEMU process, but it's possible other processes can get taken out as
collateral damage.

This is a more general variant of the similar unbounded memory usage flaw in
the websockets server, that was previously assigned CVE-2017-15268, and fixed
in 2.11 by:

  commit a7b20a8efa
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Mon Oct 9 14:43:42 2017 +0100

    io: monitor encoutput buffer size from websocket GSource

This new general memory usage flaw has been assigned CVE-2017-15124, and is
partially fixed by this patch.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-11-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit ada8d2e436)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 18:34:21 -06:00
Daniel P. Berrange f9c8767828 ui: fix VNC client throttling when audio capture is active
The VNC server must throttle data sent to the client to prevent the 'output'
buffer size growing without bound, if the client stops reading data off the
socket (either maliciously or due to stalled/slow network connection).

The current throttling is very crude because it simply checks whether the
output buffer offset is zero. This check must be disabled if audio capture is
enabled, because when streaming audio the output buffer offset will rarely be
zero due to queued audio data, and so this would starve framebuffer updates.

As a result, the VNC client can cause QEMU to allocate arbitrary amounts of RAM.
They can first start something in the guest that triggers lots of framebuffer
updates, e.g. play a YouTube video. Then enable audio capture, and simply never
read data back from the server. This can easily make QEMU's VNC server send
buffer consume 100MB of RAM per second, until the OOM killer starts reaping
processes (hopefully the rogue QEMU process, but it might pick others...).

To address this we make the throttling more intelligent, so we can throttle
when audio capture is active too. To determine how to throttle incremental
updates or audio data, we calculate a size threshold. Normally the threshold is
the approximate number of bytes associated with a single complete framebuffer
update, i.e. width * height * bytes per pixel. We'll send incremental updates
until we hit this threshold, at which point we'll stop sending updates until
data has been written to the wire, causing the output buffer offset to fall
back below the threshold.

If audio capture is enabled, we increase the size of the threshold to also
allow for up to one second's worth of audio data samples, i.e. nchannels * bytes
per sample * frequency. This allows the output buffer to have a mixture of
incremental framebuffer updates and audio data queued, but once the threshold
is exceeded, audio data will be dropped and incremental updates will be
throttled.
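
A sketch of the threshold computation described above (field names are
assumptions; bytes_per_sample is a stand-in for the per-format size):

    size_t offset = vs->client_width * vs->client_height *
                    vs->client_pf.bytes_per_pixel;   /* ~1 full update */
    if (vs->audio_cap) {
        /* plus ~1 second worth of audio samples */
        offset += vs->as.nchannels * bytes_per_sample * vs->as.freq;
    }
    vs->throttle_output_offset = offset;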

This unbounded memory growth affects all VNC server configurations supported by
QEMU, with no workaround possible. The mitigating factor is that it can only be
triggered by a client that has authenticated with the VNC server, and who is
able to trigger a large quantity of framebuffer updates or audio samples from
the guest OS. Mostly they'll just succeed in getting the OOM killer to kill
their own QEMU process, but it's possible other processes can get taken out as
collateral damage.

This is a more general variant of the similar unbounded memory usage flaw in
the websockets server, that was previously assigned CVE-2017-15268, and fixed
in 2.11 by:

  commit a7b20a8efa
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Mon Oct 9 14:43:42 2017 +0100

    io: monitor encoutput buffer size from websocket GSource

This new general memory usage flaw has been assigned CVE-2017-15124, and is
partially fixed by this patch.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-10-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit e2b72cb6e0)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 18:34:14 -06:00
Daniel P. Berrange 5af9f2504f ui: refactor code for determining if an update should be sent to the client
The logic for determining if it is possible to send an update to the client
will become more complicated shortly, so pull it out into a separate method
for easier extension later.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-9-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 0bad834228)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 18:34:09 -06:00
Daniel P. Berrange 2e6571e671 ui: correctly reset framebuffer update state after processing dirty regions
According to the RFB protocol, a client sends one or more framebuffer update
requests to the server. The server can reply with a single framebuffer update
response, that covers all previously received requests. Once the client has
read this update from the server, it may send further framebuffer update
requests to monitor future changes. The client is free to delay sending the
framebuffer update request if it needs to throttle the amount of data it is
reading from the server.

The QEMU VNC server, however, has never correctly handled the framebuffer
update requests. Once QEMU has received an update request, it will continue to
send client updates forever, even if the client hasn't asked for further
updates. This prevents the client from throttling back data it gets from the
server. This change fixes the flawed logic such that after a set of updates
has been sent out, QEMU waits for a further update request before sending more data.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-8-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 728a7ac954)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 18:34:04 -06:00
Daniel P. Berrange 126617e6f8 ui: introduce enum to track VNC client framebuffer update request state
Currently the VNC server tracks whether a client has requested an incremental
or forced update with two boolean flags. There are only really 3 distinct
states to track, so create an enum to more accurately reflect permitted states.
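
The three states map naturally onto an enum along these lines (a sketch
based on the description above):

    typedef enum {
        VNC_STATE_UPDATE_NONE,        /* no update requested */
        VNC_STATE_UPDATE_INCREMENTAL, /* incremental update requested */
        VNC_STATE_UPDATE_FORCE,       /* forced (full) update requested */
    } VncStateUpdate;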

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-7-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit fef1bbadfb)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 18:33:55 -06:00
Daniel P. Berrange 8a9c5c34ac ui: track how much decoded data we consumed when doing SASL encoding
When we encode data for writing with SASL, we encode the entire pending output
buffer. The subsequent write, however, may not be able to send the full encoded
data in one go though, particularly with a slow network. So we delay setting the
output buffer offset back to zero until all the SASL encoded data is sent.

Between encoding the data and completing sending of the SASL encoded data,
however, more data might have been placed on the pending output buffer. So it
is not valid to set offset back to zero. Instead we must keep track of how much
data we consumed during encoding and subtract only that amount.
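
A sketch of the bookkeeping (the encodedRawLength field name is an
assumption):

    /* at encode time: remember how much decoded data was consumed */
    vs->sasl.encodedRawLength = vs->output.offset;

    /* once the encoded data has been fully sent: subtract only that
     * amount (the follow-up commit above, "correctly advance output
     * buffer...", further corrects this to use buffer_advance() so the
     * discard happens at the start of the buffer) */
    vs->output.offset -= vs->sasl.encodedRawLength;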

With the current bug we would be throwing away some pending data without having
sent it at all. By sheer luck this did not previously cause any serious problem
because appending data to the send buffer is always an atomic action, so we
only ever throw away complete RFB protocol messages. In the case of frame buffer
updates we'd catch up fairly quickly, so no obvious problem was visible.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-6-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 8f61f1c5a6)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 18:33:50 -06:00
Daniel P. Berrange 616d64ac06 ui: avoid pointless VNC updates if framebuffer isn't dirty
The vnc_update_client() method checks the 'has_dirty' flag to see if there are
dirty regions that are pending to send to the client. Regardless of this flag,
if a forced update is requested, updates must be sent. For unknown reasons
though, the code also tries to send updates if audio capture is enabled. This
makes no sense as audio capture state does not impact framebuffer contents, so
this check is removed.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-5-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 3541b08475)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 18:33:39 -06:00
Daniel P. Berrange a7b2537f8a ui: remove redundant indentation in vnc_client_update
Now that previous dead / unreachable code has been removed, we can simplify
the indentation in the vnc_client_update method.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-4-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit b939eb89b6)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 18:33:33 -06:00
Daniel P. Berrange de1e7a91c8 ui: remove unreachable code in vnc_update_client
A previous commit:

  commit 5a8be0f73d
  Author: Gerd Hoffmann <kraxel@redhat.com>
  Date:   Wed Jul 13 12:21:20 2016 +0200

    vnc: make sure we finish disconnect

Added a check for vs->disconnecting at the very start of the
vnc_update_client method. This means that the very next "if"
statement check for !vs->disconnecting always evaluates true,
and is thus redundant. This in turn means the vs->disconnecting
check at the very end of the method never evaluates true, and
is thus unreachable code.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-3-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit c53df96161)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 18:33:28 -06:00
Daniel P. Berrange 0181686a98 ui: remove 'sync' parameter from vnc_update_client
There is only one caller of vnc_update_client and that always passes false
for the 'sync' parameter.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-2-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 6af998db05)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 18:33:22 -06:00
Greg Kurz a3fd64f2fe migration: incoming postcopy advise sanity checks
If postcopy-ram was set on the source but not on the destination,
migration doesn't occur, the destination prints an error and boots
the guest:

qemu-system-ppc64: Expected vmdescription section, but got 0

We end up with two running instances.

This behaviour was introduced in 2.11 by commit 58110f0acb "migration:
split common postcopy out of ram postcopy" to prepare ground for the
upcoming dirty bitmap postcopy support. It adds a new case where the
source may send an empty postcopy advise because dirty bitmap doesn't
need to check page sizes like RAM postcopy does.

If the source has enabled postcopy-ram, then it sends an advise with
the page size values. If the destination hasn't enabled postcopy-ram,
then loadvm_postcopy_handle_advise() leaves the page size values on
the stream and returns. This confuses qemu_loadvm_state() later on
and causes the destination to start execution.

As discussed several times, postcopy-ram should be enabled on both sides
to be functional. This patch changes the destination to perform some
extra checks on the advise length to ensure this is the case. Otherwise
an error is returned and migration is aborted.
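
A sketch of the length check (the values follow the description above;
abbreviated from the real handler):

    switch (len) {
    case 0:
        if (migrate_postcopy_ram()) {
            error_report("RAM postcopy is enabled but have 0 byte advise");
            return -EINVAL;
        }
        return 0;
    case 8 + 8:
        if (!migrate_postcopy_ram()) {
            error_report("RAM postcopy is disabled but have 16 byte advise");
            return -EINVAL;
        }
        break;
    default:
        error_report("CMD_POSTCOPY_ADVISE invalid length (%d)", len);
        return -EINVAL;
    }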

Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <151791621042.19120.3103118434734245776.stgit@bahia>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 875fcd013a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-12 09:28:28 -06:00
Philippe Mathieu-Daudé 68d7e24475 target/sh4: add missing tcg_temp_free() in _decode_opc()
missed in c55497ecb8 and 852d481faf.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20171205170013.22337-3-f4bug@amsat.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
(cherry picked from commit e691e0ed13)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-11 21:06:46 -06:00
Daniel Henrique Barboza 2095c5a2e3 migration/savevm.c: set MAX_VM_CMD_PACKAGED_SIZE to 1ul << 32
MAX_VM_CMD_PACKAGED_SIZE is a constant used in qemu_savevm_send_packaged
and loadvm_handle_cmd_packaged to determine whether a package is too
big to be sent or received. qemu_savevm_send_packaged is called inside
postcopy_start (migration/migration.c) to send the MigrationState
in a single blob to the destination, using the MIG_CMD_PACKAGED subcommand,
which will read it up using loadvm_handle_cmd_packaged. If the blob is
larger than MAX_VM_CMD_PACKAGED_SIZE, an error is thrown and the postcopy
migration is aborted. Both MAX_VM_CMD_PACKAGED_SIZE and MIG_CMD_PACKAGED
were introduced by commit 11cf1d984b ("MIG_CMD_PACKAGED: Send a packaged
chunk ..."). The constant has its original value of 1ul << 24 (16MB).

The current MAX_VM_CMD_PACKAGED_SIZE value is not enough to support postcopy
migration of bigger pseries guests. The blob size for a postcopy migration of
a pseries guest with the following setup:

qemu-system-ppc64 --nographic -vga none -machine pseries,accel=kvm -m 64G \
-smp 1,maxcpus=32 -device virtio-blk-pci,drive=rootdisk \
-drive file=f27.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
-netdev user,id=u1 -net nic,netdev=u1

is around 12MB. Bumping the RAM to 128G makes the blob size go to 20MB.
With 256G the blob goes to 37MB - more than twice the current maximum size.
At this moment the pseries machine can handle guests with up to 1TB of RAM,
making the postcopy blob approximately 128MB in size.

Following the discussions made in [1], there is a need to understand what
devices are aggressively consuming the blob in that manner and see if that
can be mitigated. Until then, we can set MAX_VM_CMD_PACKAGED_SIZE to the
maximum value allowed. Since the size is a 32 bit int variable, we can set
it to 1ul << 32, giving a maximum blob size of 4G that is enough to support
postcopy migration of 32TB RAM guests given the above constraints.

[1] https://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06313.html
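
In essence the change is a one-liner; per the commit text, the limit goes
from 16MB to 4GB (shown diff-style for clarity):

    -#define MAX_VM_CMD_PACKAGED_SIZE (1ul << 24)
    +#define MAX_VM_CMD_PACKAGED_SIZE (1ul << 32)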

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit ee555cdf4d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-11 21:06:01 -06:00
Dr. David Alan Gilbert c8847f5565 migration: Recover block devices if failure in device state
In e91d895 I added the new pause-before-switchover mechanism
to allow migration completion to be delayed; this changes the
last state prior to completion to MIGRATE_STATUS_DEVICE rather
than MIGRATE_STATUS_ACTIVE.

Fix the failure path in migration_completion to recover the block
devices if it fails in MIGRATE_STATUS_DEVICE, not just the
MIGRATE_STATUS_ACTIVE that it previously had.

This corresponds to rh bz:
  https://bugzilla.redhat.com/show_bug.cgi?id=1538494
whose symptom is an occasional source crash on a failed migration.

Fixes: e91d8951d5
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 6039dd5b1c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-11 21:05:39 -06:00
Ross Lagerwall b9eec804f4 migration: Don't leak IO channels
Since qemu_fopen_channel_{in,out}put take references on the underlying
IO channels, make sure to release our references to them.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Message-Id: <20171101142526.1006-2-ross.lagerwall@citrix.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit 032b79f717)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-11 21:05:05 -06:00
Christian Borntraeger b8aa511bc0 s390x/sclp: fix event mask handling
commit 67915de9f0 ("s390x/event-facility: variable-length event
masks") switched the sclp receive/send mask. This broke the sclp
lm console.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: commit 67915de9f0 ("s390x/event-facility: variable-length event masks")
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Cc: qemu-stable@nongnu.org
Message-Id: <20180202094241.59537-1-borntraeger@de.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 869e676ae7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-11 20:27:04 -06:00
linzhecheng ab7b4f6734 memory: set ioeventfd_update_pending after address_space_update_ioeventfds
We should set ioeventfd_update_pending the same way we set memory_region_update_pending.

Signed-off-by: linzhecheng <linzc@zju.edu.cn>
Message-Id: <1515934519-16158-1-git-send-email-linzc@zju.edu.cn>
Cc: qemu-stable@nongnu.org
Fixes: ade9c1aac5
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 0b15209571)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-11 20:24:14 -06:00
Suraj Jitindar Singh ed8b4ecc68 target/ppc/spapr: Add H-Call H_GET_CPU_CHARACTERISTICS
The new H-Call H_GET_CPU_CHARACTERISTICS is used by the guest to query
behaviours and available characteristics of the cpu.

Implement the handler for this new H-Call which formulates its response
based on the setting of the spapr_caps cap-cfpc, cap-sbbc and cap-ibs.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit c59704b254)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 19:07:38 -06:00
Suraj Jitindar Singh eab4b5170f target/ppc/spapr_caps: Add new tristate cap safe_indirect_branch
Add new tristate cap cap-ibs to represent the indirect branch
serialisation capability.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 4be8d4e7d9)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 19:07:38 -06:00
Suraj Jitindar Singh d7aa3d0a0a target/ppc/spapr_caps: Add new tristate cap safe_bounds_check
Add new tristate cap cap-sbbc to represent the speculation barrier
bounds checking capability.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 09114fd817)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 19:07:38 -06:00
Suraj Jitindar Singh 3dc12273b7 target/ppc/spapr_caps: Add new tristate cap safe_cache
Add new tristate cap cap-cfpc to represent the cache flush on privilege
change capability.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 8f38eaf8f9)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 19:07:37 -06:00
Suraj Jitindar Singh e9a8747cd2 target/ppc/spapr_caps: Add support for tristate spapr_capabilities
spapr_caps are used to represent the level of support for various
capabilities related to the spapr machine type. Currently there is
only support for boolean capabilities.

Add support for tristate capabilities by implementing their get/set
functions. These capabilities can have the values 0, 1 or 2
corresponding to broken, workaround and fixed.
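
A sketch of the value mapping (macro names are assumptions):

    #define SPAPR_CAP_BROKEN      0  /* feature unavailable / unsafe */
    #define SPAPR_CAP_WORKAROUND  1  /* software workaround available */
    #define SPAPR_CAP_FIXED       2  /* fixed in hardware */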

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 6898aed77f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 19:07:37 -06:00
Suraj Jitindar Singh 49b1fa33a3 target/ppc/kvm: Add cap_ppc_safe_[cache/bounds_check/indirect_branch]
Add three new kvm capabilities used to represent the level of host support
for three corresponding workarounds.

Host support for each of the capabilities is queried through the
new ioctl KVM_PPC_GET_CPU_CHAR which returns four uint64 quantities. The
first two, character and behaviour, represent the available
characteristics of the cpu and the behaviour of the cpu respectively.
The second two, c_mask and b_mask, represent the mask of known bits for
the character and behaviour dwords respectively.
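
The returned structure looks roughly like this (field names follow the
kernel uapi; treat the exact layout as an assumption):

    struct kvm_ppc_cpu_char {
        __u64 character;        /* available characteristics of the cpu */
        __u64 behaviour;        /* behaviour of the cpu */
        __u64 character_mask;   /* known bits of 'character' */
        __u64 behaviour_mask;   /* known bits of 'behaviour' */
    };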

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Correct some compile errors due to name change in final kernel
 patch version]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

(cherry picked from commit 8acc2ae5e9)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 19:07:37 -06:00
Suraj Jitindar Singh 43a29f0025 target/ppc/spapr_caps: Add macro to generate spapr_caps migration vmstate
The vmstate description, and the 'needed' function it contains, are the same
for each spapr_cap, with only the name of the cap substituted. As such,
introduce a macro to allow for easier generation of
these.

Convert the three existing spapr_caps (htm, vsx, and dfp) to use this
macro.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 1f63ebaa91)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 19:07:37 -06:00
Cédric Le Goater d72e0a69ea target/ppc: introduce the PPC_BIT() macro
and use them in a couple of obvious places. Other macros will be used
in the model of the XIVE interrupt controller.
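
A sketch of the base macro (IBM bit numbering counts bit 0 as the most
significant bit of the 64-bit word):

    #define PPC_BIT(bit)    (0x8000000000000000ULL >> (bit))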

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 2a83f9976e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 19:07:37 -06:00
Greg Kurz 4374cbca95 spapr: fix device tree properties when using compatibility mode
Commit 51f84465dd changed the compatibility mode setting logic:
- machine reset only sets compatibility mode for the boot CPU
- compatibility mode is set for other CPUs when they are put online
  by the guest with the "start-cpu" RTAS call

This causes a regression for machines started with max-compat-cpu:
the device tree nodes related to secondary CPU cores contain wrong
"cpu-version" and "ibm,pa-features" values, as shown below.

Guest started on a POWER8 host with:
     -smp cores=2 -machine pseries,max-cpu-compat=compat7

                        ibm,pa-features = [18 00 f6 3f c7 c0 80 f0 80 00
 00 00 00 00 00 00 00 00 80 00 80 00 80 00 00 00];
                        cpu-version = <0x4d0200>;

                               ^^^
                        second CPU core

                        ibm,pa-features = <0x600f63f 0xc70080c0>;
                        cpu-version = <0xf000003>;

                               ^^^
                          boot CPU core

The second core is advertised in raw POWER8 mode. This happens because
CAS assumes all CPUs to have the same compatibility mode. Since the
boot CPU already has the requested compatibility mode, the CAS code
does not set it for the secondary one, and exposes the bogus device
tree properties in the CAS response to the guest.

A similar situation is observed when hot-plugging a CPU core. The
related device tree properties are generated and exposed to guest
with the "ibm,configure-connector" RTAS before "start-cpu" is called.
The CPU core is advertised to the guest in raw mode as well.

In both cases, it boils down to the fact that "start-cpu" happens too
late. This can be fixed globally by propagating the compatibility mode
of the boot CPU to the other CPUs during reset.  For this to work, the
compatibility mode of the boot CPU must be set before the machine code
actually resets all CPUs.

It is not needed to set the compatibility mode in "start-cpu" anymore,
so the code is dropped.

Fixes: 51f84465dd
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 9012a53f06)
 Conflicts:
	hw/ppc/spapr_cpu_core.c
	hw/ppc/spapr_rtas.c
* drop context dep on d6322252b3
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 18:55:26 -06:00
Jose Ricardo Ziviani a1f33a5b93 ppc: Change Power9 compat table to support at most 8 threads/core
Increases the max smt mode to 8 for Power9. That's because KVM supports
smt emulation on this platform, so QEMU should allow users to use it as
well.

Today if we try to pass -smp ...,threads=8, QEMU will silently truncate
it to smt4 mode and may cause a crash if we try to perform a cpu
hotplug.

Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
[dwg: Added an explanatory comment]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

(cherry picked from commit 03ee51d354)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 18:55:26 -06:00
Suraj Jitindar Singh 6a47136799 hw/ppc/spapr_caps: Rework spapr_caps to use uint8 internal representation
Currently spapr_caps are tied to boolean values (on or off). This patch
reworks the caps so that they can have any uint8 value. This allows more
capabilities with various values to be represented in the same way
internally. Capabilities are numbered in ascending order. The internal
representation of capability values is an array of uint8s in the
sPAPRMachineState, indexed by capability number.

Capabilities can have their own name, description, options, getter and
setter functions, type and allow functions. They also each have their own
section in the migration stream. Capabilities are only migrated if they
were explicitly set on the command line, with the assumption that
otherwise the default will match.

On migration we ensure that the capability value on the destination
is greater than or equal to the capability value from the source. So
long at this remains the case then the migration is considered
compatible and allowed to continue.

This patch implements generic getter and setter functions for boolean
capabilities. It also converts the existing cap-htm, cap-vsx and
cap-dfp capabilities to this new format.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 4e5fe3688e)
 Conflicts:
	include/hw/ppc/spapr.h
*drop context dep on 60c6823b9b
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 18:54:33 -06:00
David Gibson e4f4fa00eb spapr: Handle Decimal Floating Point (DFP) as an optional capability
Decimal Floating Point has been available on POWER7 and later (server)
cpus.  However, it can be disabled on the hypervisor, meaning that it's
not available to guests.

We currently handle this by conditionally advertising DFP support in the
device tree depending on whether the guest CPU model supports it - which
can also depend on what's allowed in the host for -cpu host.  That can lead
to confusion on migration, since host properties are silently affecting
guest visible properties.

This patch handles it by treating it as an optional capability for the
pseries machine type.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 2d1fb9bc8e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 18:53:54 -06:00
David Gibson ff6f7e10c6 spapr: Handle VMX/VSX presence as an spapr capability flag
We currently have some conditionals in the spapr device tree code to decide
whether or not to advertise the availability of the VMX (aka Altivec) and
VSX vector extensions to the guest, based on whether the guest cpu has
those features.

This can lead to confusion and subtle failures on migration, since it makes
a guest visible change based only on host capabilities.  We now have a
better mechanism for this, in spapr capabilities flags, which explicitly
depend on user options rather than host capabilities.

Rework the advertisement of VSX and VMX based on a new VSX capability.  We
no longer bother with a conditional for VMX support, because every CPU
that's ever been supported by the pseries machine type supports VMX.

NOTE: Some userspace distributions (e.g. RHEL7.4) already rely on
availability of VSX in libc, so using cap-vsx=off may lead to a fatal
SIGILL in init.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 2938664286)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 18:53:46 -06:00
David Gibson 7c578cbc37 target/ppc: Clean up probing of VMX, VSX and DFP availability on KVM
When constructing the "host" cpu class we modify whether the VMX and VSX
vector extensions and DFP (Decimal Floating Point) are available
based on whether KVM can support those instructions.  This can depend on
policy in the host kernel as well as on the actual host cpu capabilities.

However, the way we probe for this is not very nice: we explicitly check
the host's device tree.  That works in practice, but it's not really
correct, since the device tree is a property of the host kernel's platform
which we don't really know about.  We get away with it because the only
modern POWER platforms happen to encode VMX, VSX and DFP availability in
the device tree in the same way.

Arguably we should have an explicit KVM capability for this, but we haven't
needed one so far.  Barring specific KVM policies which don't yet exist,
each of these instruction classes will be available in the guest if and
only if they're available in the qemu userspace process.  We can determine
that from the ELF AUX vector we're supplied with.

Once reworked like this, there are no more callers for kvmppc_get_vmx() and
kvmppc_get_dfp() so remove them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 3f2ca480eb)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 18:53:42 -06:00
David Gibson 804e5ea9ed spapr: Validate capabilities on migration
Now that the "pseries" machine type implements optional capabilities (well,
one so far) there's the possibility of having different capabilities
available at either end of a migration.  Although arguably a user error,
it would be nice to catch this situation and fail as gracefully as we can.

This adds code to migrate the capabilities flags.  These aren't pulled
directly into the destination's configuration since what the user has
specified on the destination command line should take precedence.  However,
they are checked against the destination capabilities.

If the source was using a capability which is absent on the destination,
we fail the migration, since that could easily cause a guest crash or other
bad behaviour.  If the source lacked a capability which is present on the
destination we warn, but allow the migration to proceed.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit be85537d65)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 18:53:31 -06:00
David Gibson 9070f408f4 spapr: Treat Hardware Transactional Memory (HTM) as an optional capability
This adds an spapr capability bit for Hardware Transactional Memory.  It is
enabled by default for pseries-2.11 and earlier machine types, with POWER8
or later CPUs (as it must be, since earlier qemu versions would implicitly
allow it).  However it is disabled by default for the latest pseries-2.12
machine type.

This means that with the latest machine type, HTM will not be available,
regardless of CPU, unless it is explicitly enabled on the command line.
That change is made on the basis that:

 * This way running with -M pseries,accel=tcg will start with whatever cpu
   and will provide the same guest visible model as with accel=kvm.
     - More specifically, this means existing make check tests don't have
       to be modified to use cap-htm=off in order to run with TCG

 * We hope to add a new "HTM without suspend" feature in the not too
   distant future which could work on both POWER8 and POWER9 cpus, and
   could be enabled by default.

 * Best guesses suggest that future POWER cpus may well only support the
   HTM-without-suspend model, not the (frankly, horribly overcomplicated)
   POWER8 style HTM with suspend.

 * Anecdotal evidence suggests problems with HTM being enabled when it
   wasn't wanted are more common than being missing when it was.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit ee76a09fc7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 18:53:23 -06:00
David Gibson 78a38cd47e spapr: Capabilities infrastructure
Because PAPR is a paravirtual environment access to certain CPU (or other)
facilities can be blocked by the hypervisor.  PAPR provides ways to
advertise in the device tree whether or not those features are available to
the guest.

In some places we automatically determine whether to make a feature
available based on whether our host can support it, in most cases this is
based on limitations in the available KVM implementation.

Although we correctly advertise this to the guest, it means that host
factors might make changes to the guest visible environment which is bad:
as well as generally reducing reproducibility, it means that a migration
between different host environments can easily go bad.

We've mostly gotten away with it because the environments considered mature
enough to be well supported (basically, KVM on POWER8) have had consistent
feature availability.  But it's still not right, and some limitations on
POWER9 are going to make it more of an issue in future.

This introduces an infrastructure for defining "sPAPR capabilities".  These
are set by default based on the machine version, masked by the capabilities
of the chosen cpu, but can be overridden with machine properties.

The intention is at reset time we verify that the requested capabilities
can be supported on the host (considering TCG, KVM and/or host cpu
limitations).  If not we simply fail, rather than silently modifying the
advertised featureset to the guest.

This does mean that certain configurations that "worked" may now fail, but
such configurations were already more subtly broken.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 33face6b89)
 Conflicts:
	include/hw/ppc/spapr.h
*drop context dep on 60c6823b9b
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 18:52:49 -06:00
David Gibson 0fac4aa930 spapr: Add pseries-2.12 machine type
While we're at it fix a couple of small errors in the 2.11 and 2.10 models
(they didn't have any real effect, but don't quite match the template).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 2b6154120c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-05 18:52:01 -06:00
Laurent Vivier 97d17551b5 spapr: don't initialize PATB entry if max-cpu-compat < power9
if KVM is enabled and KVM capabilities MMU radix is available,
the partition table entry (patb_entry) for the radix mode is
initialized by default in ppc_spapr_reset().

It's a problem if we want to migrate the guest to a POWER8 host
while the kernel is not started to set the value to the one
expected for a POWER8 CPU.

The "-machine max-cpu-compat=power8" should allow to migrate
a POWER9 KVM host to a POWER8 KVM host, but because patb_entry
is set, the destination QEMU tries to enable radix mode on the
POWER8 host. This fails and cancels the migration:

    Process table config unsupported by the host
    error while loading state for instance 0x0 of device 'spapr'
    load of migration failed: Invalid argument

This patch doesn't set the PATB entry if the user provides
a CPU compatibility mode that doesn't support radix mode.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 1481fe5fcf)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-04 23:38:26 -06:00
Peter Maydell 7e25155a7b linux-user/signal.c: Rename MC_* defines
The SPARC code in linux-user/signal.c defines a set of
MC_* constants. On some SPARC hosts these are also defined
by sys/ucontext.h, resulting in build failures:

linux-user/signal.c:2786:0: error: "MC_NGREG" redefined [-Werror]
 #define MC_NGREG 19

In file included from /usr/include/signal.h:302:0,
                 from include/qemu/osdep.h:86,
                 from linux-user/signal.c:19:
/usr/include/sparc64-linux-gnu/sys/ucontext.h:59:0: note: this is the location of the previous definition
 # define MC_NGREG __MC_NGREG

Rename all these constants to SPARC_MC_* to avoid the clash.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1517318239-15764-1-git-send-email-peter.maydell@linaro.org
(cherry picked from commit 8ebb314b95)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-01 15:22:33 -06:00
Greg Kurz 80277d7cd5 spapr_pci: fix MSI/MSIX selection
In various places we don't correctly check if the device supports MSI or
MSI-X. This can cause devices to be advertised with MSI support, even
if they only support MSI-X (like virtio-pci-* devices for example):

                ethernet@0 {
                        ibm,req#msi = <0x1>; <--- wrong!
                        .
                        ibm,loc-code = "qemu_virtio-net-pci:0000:00:00.0";
                        .
                        ibm,req#msi-x = <0x3>;
                };

Worse, this can also cause the "ibm,change-msi" RTAS call to corrupt the
PCI status and cause migration to fail:

  qemu-system-ppc64: get_pci_config_device: Bad config data: i=0x6
    read: 0 device: 10 cmask: 10 wmask: 0 w1cmask:0
                              ^^
           PCI_STATUS_CAP_LIST bit which is assumed to be constant

This patch changes spapr_populate_pci_child_dt() to properly check for
MSI support using msi_present(): this ensures that PCIDevice::msi_cap
was set by msi_init() and that msi_nr_vectors_allocated() will look at
the right place in the config space.

Checking PCIDevice::msix_entries_nr is enough for MSI-X but let's add
a call to msix_present() there as well for consistency.
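
A sketch of the corrected device tree generation (abbreviated; the helper
names are the ones the commit text refers to):

    if (msi_present(dev)) {
        uint32_t max_msi = msi_nr_vectors_allocated(dev);
        if (max_msi) {
            _FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi", max_msi));
        }
    }
    if (msix_present(dev)) {
        uint32_t max_msix = dev->msix_entries_nr;
        if (max_msix) {
            _FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi-x", max_msix));
        }
    }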

It also changes rtas_ibm_change_msi() to select the appropriate MSI
type in Function 1 instead of always selecting plain MSI. This new
behaviour is compliant with LoPAPR 1.1, as described in "Table 71.
ibm,change-msi Argument Call Buffer":

  Function 1: If Number Outputs is equal to 3, request to set to a new
           number of MSIs (including set to 0).
           If the “ibm,change-msix-capable” property exists and Number
           Outputs is equal to 4, request is to set to a new number of
           MSI or MSI-X (platform choice) interrupts (including set to
           0).

Since MSI is the platform default (LoPAPR 6.2.3 MSI Option), let's
check for MSI support first.

And finally, it checks the input parameters are valid, as described in
LoPAPR 1.1 "R1–7.3.10.5.1–3":

  For the MSI option: The platform must return a Status of -3 (Parameter
  error) from ibm,change-msi, with no change in interrupt assignments if
  the PCI configuration address does not support MSI and Function 3 was
  requested (that is, the “ibm,req#msi” property must exist for the PCI
  configuration address in order to use Function 3), or does not support
  MSI-X and Function 4 is requested (that is, the “ibm,req#msi-x” property
  must exist for the PCI configuration address in order to use Function 4),
  or if neither MSIs nor MSI-Xs are supported and Function 1 is requested.

This ensures that the ret_intr_type variable contains a valid MSI type
for this device, and that spapr_msi_setmsg() won't corrupt the PCI status.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 9cbe305b60)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-02-01 15:21:29 -06:00
Fam Zheng 50c6998c20 usb-storage: Fix share-rw option parsing
Because usb-storage creates an internal scsi device, we should propagate
options. We already do so for bootindex etc, but failed to take care of
share-rw. Fix it in the obvious way: add a new parameter to
scsi_bus_legacy_add_drive and pass in s->conf.share_rw.

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Message-id: 20180117005222.4781-1-famz@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 395b953959)
 Conflicts:
	hw/usb/dev-storage.c
* dropped context dep on ceff3e1f01
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-29 09:27:47 -06:00
Fam Zheng dccdaacc3d osdep: Retry SETLK upon EINTR
We could hit lock failure if there is a signal that makes fcntl return
-1 and errno set to EINTR. In this case we should retry.
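
The fix is the classic EINTR retry loop (a sketch; fd and fl are set up
by the caller):

    #include <errno.h>
    #include <fcntl.h>

    int ret;
    do {
        ret = fcntl(fd, F_SETLK, &fl);
    } while (ret == -1 && errno == EINTR);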

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit f86428a1f4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-29 09:14:03 -06:00
Christian Borntraeger 5683983e99 s390x/kvm: provide stfle.81
stfle.81 (ppa15) is a transparent facility that can be passed to the
guest without the need to implement hypervisor support. As this feature
can be provided by firmware we add it to all full models.

Cc: qemu-stable@nongnu.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20180118085628.40798-4-borntraeger@de.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 9f0d13f4f1)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-29 08:33:00 -06:00
Christian Borntraeger 4d79c0e434 s390x/kvm: Handle bpb feature
We need to handle the bpb control on reset and migration. Normally
stfle.82 is transparent (and the normal guest part works without
hypervisor activity). To prevent any issues we require full
host kernel support for this feature.

Cc: qemu-stable@nongnu.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20180118085628.40798-3-borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[CH: 'Branch Prediction Blocking' -> 'Branch prediction blocking']
Signed-off-by: Cornelia Huck <cohuck@redhat.com>

(cherry picked from commit b073c87517)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-29 08:32:50 -06:00
Cornelia Huck 098132386d linux-headers: update
Update headers against 4.15-rc9.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 9cbb636270)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-29 08:32:23 -06:00
Eric Auger 61c8e67a66 linux-headers: update to 4.15-rc1
Update headers against v4.15-rc1.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-id: 1511883692-11511-4-git-send-email-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit dd8739669f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-29 08:32:18 -06:00
Claudio Imbrenda e7857ad997 s390x: fix storage attributes migration for non-small guests
Fix storage attribute migration so that it does not fail for guests
with more than a few GB of RAM.
With such guests, the index in the buffer would go out of bounds,
usually by large amounts, thus receiving -EFAULT from the kernel.
Migration itself would be successful, but storage attributes would then
not be migrated completely.

This patch fixes the out of bounds access, and thus migration of all
storage attributes when the guest has large amounts of memory.

Cc: qemu-stable@nongnu.org
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Fixes: 903fd80b03 ("s390x/migration: Storage attributes device")
Message-Id: <1516297904-18188-1-git-send-email-imbrenda@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 46fa893355)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-29 08:28:08 -06:00
Peter Maydell 9327a8e2d6 linux-user: Fix locking order in fork_start()
Our locking order is that the tb lock should be taken
inside the mmap_lock, but fork_start() grabs locks the
other way around. This means that if a heavily multithreaded
guest process (such as Java) calls fork() it can deadlock,
with the thread that called fork() stuck in fork_start()
with the tb lock and waiting for the mmap lock, but some
other thread in tb_find() with the mmap lock and waiting
for the tb lock. The cpu_list_lock() should also always be
taken last, not first.

Fix this by making fork_start() grab the locks in the
right order. The order in which we drop locks doesn't
matter, so we leave fork_end() the way it is.
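
A sketch of the corrected ordering (lock names as described above;
abbreviated):

    void fork_start(void)
    {
        mmap_fork_start();                          /* mmap lock first */
        qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);   /* tb lock inside it */
        cpu_list_lock();                            /* cpu list lock last */
    }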

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-stable@nongnu.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1512397331-15238-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit 024949caf3)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-29 08:27:35 -06:00
Eduardo Habkost 8ebfafa796 i386: Add EPYC-IBPB CPU model
EPYC-IBPB is a copy of the EPYC CPU model with
just CPUID_8000_0008_EBX_IBPB added.

Cc: Jiri Denemark <jdenemar@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-7-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 6cfbc54e89)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-23 17:08:04 -06:00
Eduardo Habkost 61efbbf869 i386: Add new -IBRS versions of Intel CPU models
The new IA32_SPEC_CTRL MSR was introduced by a recent Intel
microcode update and can be used by OSes to mitigate
CVE-2017-5715.  Unfortunately we can't change the existing CPU
models without breaking existing setups, so users need to
explicitly update their VM configuration to use the new *-IBRS
CPU model if they want to expose IBRS to guests.

The new CPU models are simple copies of the existing CPU models,
with just CPUID_7_0_EDX_SPEC_CTRL added and model_id updated.

Cc: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-6-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit ac96c41354)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-23 17:08:04 -06:00
Eduardo Habkost 1ade973f52 i386: Add FEAT_8000_0008_EBX CPUID feature word
Add the new feature word and the "ibpb" feature flag.

Based on a patch by Paolo Bonzini.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-5-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 1b3420e1c4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-23 17:08:04 -06:00
Eduardo Habkost 803d42fa65 i386: Add spec-ctrl CPUID bit
Add the feature name and a CPUID_7_0_EDX_SPEC_CTRL macro.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-4-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit a2381f0934)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-23 17:08:04 -06:00
Paolo Bonzini cb2637a5ae i386: Add support for SPEC_CTRL MSR
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-3-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit a33a2cfe2f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-23 17:08:04 -06:00
Eduardo Habkost 4b220d88ba i386: Change X86CPUDefinition::model_id to const char*
It is valid to have a 48-character model ID on CPUID, however the
definition of X86CPUDefinition::model_id is char[48], which can
make the compiler drop the null terminator from the string.

If a CPU model happens to have 48 bytes on model_id, "-cpu help"
will print garbage and the object_property_set_str() call at
x86_cpu_load_def() will read data outside the model_id array.

We could increase the array size to 49, but this would mean the
compiler would not issue a warning if a 49-char string is used by
mistake for model_id.

To make things simpler, change model_id to be a const char*,
and validate the string length using an assert() in
x86_register_cpudef_type().
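
A minimal sketch of the resulting pattern (illustrative; only the
model_id field and x86_register_cpudef_type() are named by this commit,
the rest is simplified):

    #include <assert.h>
    #include <string.h>

    typedef struct X86CPUDefinition {
        const char *model_id;   /* was: char model_id[48], which could
                                 * silently lose its NUL terminator */
        /* ... */
    } X86CPUDefinition;

    static void x86_register_cpudef_type(const X86CPUDefinition *def)
    {
        /* CPUID leaves 0x80000002..4 carry at most 48 bytes of model ID */
        assert(strlen(def->model_id) <= 48);
        /* ... register the QOM type for this model ... */
    }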

Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-2-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 807e9869b8)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-23 17:08:04 -06:00
Marcel Apfelbaum 1027f3419b hw/pci-bridge: fix QEMU crash because of pcie-root-port
If we try to use more pcie_root_ports than there are available
slots and an IO hint is passed to the port, QEMU crashes because
we try to init the "IO hint" capability even though the device
was not created.
Fix it by checking for errors before adding the capability,
so QEMU can fail gracefully.
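
A hedged sketch of the shape of the fix (the helper names here are
invented for illustration; error_propagate() is the stock QEMU API):

    static void rp_realize(DeviceState *dev, Error **errp)
    {
        Error *local_err = NULL;

        parent_realize(dev, &local_err);    /* may fail: no free slot */
        if (local_err) {
            error_propagate(errp, local_err);
            return;                 /* don't add the IO hint capability */
        }
        add_io_hint_capability(dev, errp);
    }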

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit fced4d00e6)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-23 16:45:21 -06:00
Stefan Hajnoczi ccf82aee58 scsi-disk: release AioContext in unaligned WRITE SAME case
scsi_write_same_complete() can retry the write if the request was
unaligned.  Make sure to release the AioContext when that code path is
taken!

This patch fixes a hang when QEMU terminates after an unaligned WRITE
SAME request has been processed with dataplane.  The hang occurs because
iothread_stop_all() cannot acquire the AioContext lock that was leaked
by the IOThread in scsi_write_same_complete().
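
A minimal sketch of the locking rule the fix restores (helper names are
assumed, not the actual scsi-disk code):

    static void write_same_complete(void *opaque, int ret)
    {
        AioContext *ctx = request_aio_context(opaque);

        aio_context_acquire(ctx);
        if (request_was_unaligned(opaque)) {
            resubmit_aligned_retry(opaque);
            aio_context_release(ctx);   /* the release the bug leaked */
            return;
        }
        /* ... normal completion path ... */
        aio_context_release(ctx);
    }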

Fixes: b9e413dd37 ("block: explicitly acquire aiocontext in aio callbacks that need it").
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Reported-by: Cong Li <coli@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180104142502.15175-1-stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 24355b79bd)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-23 16:35:01 -06:00
Peter Maydell 1e14820884 hw/sd/ssi-sd: Reset SD card on controller reset
Since ssi-sd is still using the legacy SD card API, the SD
card created by sd_init() is not plugged into any bus. This
means that the controller has to reset it manually.

Failing to do this mostly didn't affect the guest since the
guest typically does a programmed SD card reset as part of
its SD controller driver initialization, but meant that
migration failed because it's only in sd_reset() that we
set up the wpgrps_size field.

In the case of ssi-sd, we have to implement an entire
reset function since there wasn't one previously, and
that requires a QOM cast macro that got omitted when this
device was QOMified.
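
A sketch of the added handler (the cast macro and field names are
illustrative):

    static void ssi_sd_reset(DeviceState *dev)
    {
        ssi_sd_state *s = SSI_SD(dev);

        /* ... reset the controller's own registers ... */

        /* The card is not on any bus, so propagate the reset by hand;
         * among other things this sets up the wpgrps_size field that
         * migration depends on. */
        device_reset(DEVICE(s->sd));
    }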

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1515506513-31961-4-git-send-email-peter.maydell@linaro.org
(cherry picked from commit 8046d44f3c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-23 16:34:00 -06:00
Peter Maydell a22b8096b6 hw/sd/milkymist-memcard: Reset SD card on controller reset
Since milkymist-memcard is still using the legacy SD card API,
the SD card created by sd_init() is not plugged into any bus.
This means that the controller has to reset it manually.

Failing to do this mostly didn't affect the guest since the
guest typically does a programmed SD card reset as part of
its SD controller driver initialization, but meant that
migration failed because it's only in sd_reset() that we
set up the wpgrps_size field.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1515506513-31961-3-git-send-email-peter.maydell@linaro.org
(cherry picked from commit 16bf0e0e7a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-23 16:33:50 -06:00
Peter Maydell 3f44190cb6 hw/sd/pl181: Reset SD card on controller reset
Since pl181 is still using the legacy SD card API, the SD
card created by sd_init() is not plugged into any bus. This
means that the controller has to reset it manually.

Failing to do this mostly didn't affect the guest since the
guest typically does a programmed SD card reset as part of
its SD controller driver initialization, but meant that
migration failed because it's only in sd_reset() that we
set up the wpgrps_size field.

Cc: qemu-stable@nongnu.org
Fixes: https://bugs.launchpad.net/qemu/+bug/1739378
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1515506513-31961-2-git-send-email-peter.maydell@linaro.org
(cherry picked from commit 0cb57cc701)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-23 16:33:39 -06:00
Jay Zhou 88bf4a70df vhost: remove assertion to prevent crash
QEMU will assert on vhost-user backed virtio device hotplug if QEMU is
using more RAM regions than VHOST_MEMORY_MAX_NREGIONS (for example if
it was started with a lot of DIMM devices).

Fix it by returning an error instead of asserting, and let callers of
vhost_set_mem_table() handle the error condition gracefully.
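
Roughly, the shape of the change (field names follow the commit text,
not necessarily the exact source):

    if (dev->mem->nregions > VHOST_MEMORY_MAX_NREGIONS) {
        error_report("vhost: memory regions exceed VHOST_MEMORY_MAX_NREGIONS");
        return -1;              /* was: an assert() that aborted QEMU */
    }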

Cc: qemu-stable@nongnu.org
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit f4bf56fb78)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-23 16:30:56 -06:00
Michael S. Tsirkin f793121539 virtio_error: don't invoke status callbacks
Backends don't need to know which frontend requested a reset,
and notifying them from virtio_error() is messy because
virtio_error() itself might be invoked from the backend.

Let's just set the status directly.

Cc: qemu-stable@nongnu.org
Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 8fc47c876d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-15 18:18:05 -06:00
Peter Maydell 0af294d774 hw/intc/arm_gic: reserved register addresses are RAZ/WI
The GICv2 specification says that reserved register addresses
must be RAZ/WI; now that we implement external abort handling
for Arm CPUs, this means we must return MEMTX_OK rather than
MEMTX_ERROR to avoid generating a spurious guest data abort.
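
A minimal sketch of the pattern, against QEMU's read-with-attrs
MemoryRegionOps signature (addr_is_reserved() is invented here):

    static MemTxResult gic_read(void *opaque, hwaddr addr,
                                uint64_t *data, unsigned size,
                                MemTxAttrs attrs)
    {
        if (addr_is_reserved(addr)) {
            *data = 0;          /* RAZ */
            return MEMTX_OK;    /* was MEMTX_ERROR: a spurious abort */
        }
        /* ... decode the implemented registers ... */
        return MEMTX_OK;
    }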

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1513183941-24300-3-git-send-email-peter.maydell@linaro.org
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
(cherry picked from commit 0cf0985201)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-11 15:10:57 -06:00
Peter Maydell 62425350b5 hw/intc/arm_gicv3: Make reserved register addresses RAZ/WI
The GICv3 specification says that reserved register addresses
should be RAZ/WI. This means we need to return MEMTX_OK, not
MEMTX_ERROR, because now that we support generating external
aborts the latter would cause an abort on new board models.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1513183941-24300-2-git-send-email-peter.maydell@linaro.org
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
(cherry picked from commit f1945632b4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-11 15:10:48 -06:00
Alex Williamson d6f1448277 vfio: Fix vfio-kvm group registration
Commit 8c37faa475 ("vfio-pci, ppc64/spapr: Reorder group-to-container
attaching") moved registration of groups with the vfio-kvm device from
vfio_get_group() to vfio_connect_container(), but it missed the case
where a group is attached to an existing container and takes an early
exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
(without viommu) all groups are connected to the same container and
thus only the first group gets registered with the vfio-kvm device.
This becomes a problem if we then hot-unplug the devices associated
with that first group and we end up with KVM being misinformed about
any vfio connections that might remain.  Fix by including the call to
vfio_kvm_device_add_group() in this early exit path.
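
Roughly, the early-exit path becomes (paraphrased, not the verbatim
patch):

    QLIST_FOREACH(container, &space->containers, next) {
        if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
            /* Joined an existing container: still tell KVM about it */
            group->container = container;
            QLIST_INSERT_HEAD(&container->group_list, group, container_next);
            vfio_kvm_device_add_group(group);
            return 0;
        }
    }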

Fixes: 8c37faa475 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
Cc: qemu-stable@nongnu.org # qemu-2.10+
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
(cherry picked from commit 2016986aed)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-09 11:09:36 -06:00
Fam Zheng 7f53a81073 block: Open backing image in force share mode for size probe
Management tools create overlays of running guests with qemu-img:

  $ qemu-img create -b /image/in/use.qcow2 -f qcow2 /overlay/image.qcow2

but this doesn't work anymore due to image locking:

    qemu-img: /overlay/image.qcow2: Failed to get shared "write" lock
    Is another process using the image?
    Could not open backing image to determine size.

Use the force share option to allow this use case again.

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit cc954f01e3)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-09 10:49:44 -06:00
Kevin Wolf 2b0c34cf61 block: Call .drain_begin only once in bdrv_drain_all_begin()
bdrv_drain_all_begin() used to call the .bdrv_co_drain_begin() driver
callback inside its polling loop. This means that how many times it got
called for each node depended on how long it had to poll the event loop.

This is obviously not right and results in nodes that stay drained even
after bdrv_drain_all_end(), which calls .bdrv_co_drain_end() only once
per node.

Fix bdrv_drain_all_begin() to call the callback only once, too.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 2da9b7d456)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-09 10:49:13 -06:00
Kevin Wolf 057364da77 block: Make bdrv_drain_invoke() recursive
This change separates bdrv_drain_invoke(), which calls the BlockDriver
drain callbacks, from bdrv_drain_recurse(). Instead, the function
performs its own recursion now.

One reason for this is that bdrv_drain_recurse() can be called multiple
times by bdrv_drain_all_begin(), but the callbacks may only be called
once. The separation is necessary to fix this bug.

The other reason is that we intend to go to a model where we call all
driver callbacks first, and only then start polling. This is not fully
achieved yet with this patch, as bdrv_drain_invoke() contains a
BDRV_POLL_WHILE() loop for the block driver callbacks, which can still
call callbacks for any unrelated event. It's a step in this direction
anyway.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit db0289b9b2)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-09 10:48:39 -06:00
Murilo Opsfelder Araujo b9da3c1de7 block/nbd: fix segmentation fault when .desc is not null-terminated
The find_desc_by_name() from util/qemu-option.c relies on the .name not being
NULL to call strcmp(). This check becomes unsafe when the list is not
NULL-terminated, which is the case for nbd_runtime_opts in block/nbd.c, and can
result in a segmentation fault when strcmp() tries to access invalid memory:

    #0 0x00007fff8c75f7d4 in __strcmp_power9 () from /lib64/libc.so.6
    #1 0x00000000102d3ec8 in find_desc_by_name (desc=0x1036d6f0, name=0x28e46670 "server.path") at util/qemu-option.c:166
    #2 0x00000000102d93e0 in qemu_opts_absorb_qdict (opts=0x28e47a80, qdict=0x28e469a0, errp=0x7fffec247c98) at util/qemu-option.c:1026
    #3 0x000000001012a2e4 in nbd_open (bs=0x28e42290, options=0x28e469a0, flags=24578, errp=0x7fffec247d80) at block/nbd.c:406
    #4 0x00000000100144e8 in bdrv_open_driver (bs=0x28e42290, drv=0x1036e070 <bdrv_nbd_unix>, node_name=0x0, options=0x28e469a0, open_flags=24578, errp=0x7fffec247f50) at block.c:1135
    #5 0x0000000010015b04 in bdrv_open_common (bs=0x28e42290, file=0x0, options=0x28e469a0, errp=0x7fffec247f50) at block.c:1395

From gdb, desc[i].name was not NULL and resulted in strcmp() accessing
invalid memory:

    >>> p desc[5]
    $8 = {
      name = 0x1037f098 "R27A",
      type = 1561964883,
      help = 0xc0bbb23e <error: Cannot access memory at address 0xc0bbb23e>,
      def_value_str = 0x2 <error: Cannot access memory at address 0x2>
    }
    >>> p desc[6]
    $9 = {
      name = 0x103dac78 <__gcov0.do_qemu_init_bdrv_nbd_init> "\001",
      type = 272101528,
      help = 0x29ec0b754403e31f <error: Cannot access memory at address 0x29ec0b754403e31f>,
      def_value_str = 0x81f343b9 <error: Cannot access memory at address 0x81f343b9>
    }

This patch fixes the segmentation fault in strcmp() by adding a NULL element at
the end of the nbd_runtime_opts.desc list, which is the common practice in most
other structs, such as runtime_opts in block/null.c. Thus, the
desc[i].name != NULL check becomes safe because it will not evaluate to true
when the .desc list has reached its end.
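
The sentinel matters because the lookup loop uses .name as its only stop
condition; a simplified sketch of both sides (not the verbatim source):

    /* find_desc_by_name() walks the array until it sees a NULL name... */
    for (i = 0; desc[i].name != NULL; i++) {
        if (strcmp(desc[i].name, name) == 0) {
            return &desc[i];
        }
    }

    /* ...so every .desc array must end with the empty sentinel: */
    static QemuOptsList nbd_runtime_opts = {
        .name = "nbd",
        .desc = {
            { .name = "host", .type = QEMU_OPT_STRING },
            /* ... */
            { /* end of list */ },
        },
    };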

Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1727259
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.vnet.ibm.com>
Message-Id: <20180105133241.14141-2-muriloo@linux.vnet.ibm.com>
CC: qemu-stable@nongnu.org
Fixes: 7ccc44fd7d
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit c4365735a7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-09 10:41:11 -06:00
Paolo Bonzini c184e17c75 qemu-pr-helper: miscellaneous fixes
1) Return a generic sense if TEST UNIT READY does not provide one;

2) Fix two mistakes in copying from the spec.

Cc: qemu-stable@nongnu.org
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a4a9b6eaf3)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-08 07:52:41 -06:00
Markus Armbruster 5ba945f1cb qemu-options: Remove stray colons from output of --help
Commit 43f187a broke --help: it put colons into blank lines.  It
removed the colon from DEFHEADING(TITLE:) and added it back in the
macro expansion of DEFHEADING(TITLE), so hxtool can emit "@subsection
TITLE" more easily.  Trouble is it's added back even for the blank
lines made with DEFHEADING().

Put the colons back where they were before commit 43f187a, and strip
them in hxtool instead.

Cc: Paolo Bonzini <pbonzini@redhat.com>
CC: qemu-stable@nongnu.org
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20171002140307.5292-2-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
(cherry picked from commit de6b4f908c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-08 07:52:41 -06:00
Alex Bennée ea311a9959 target/sh4: fix TCG leak during gusa sequence
This fixes bug #1735384, seen while running java under qemu-sh4. When
debug was enabled, it showed a problem with TCG temps. Once fixed, I
was able to run java -version normally.

Cc: qemu-stable@nongnu.org
Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20171206093050.25308-1-alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
(cherry picked from commit 6d56fc6cc3)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-08 07:52:41 -06:00
Peter Lieven fd89d93e85 block/iscsi: dont leave allocmap in an invalid state on UNMAP failure
We forgot to set the allocmap to invalid if an UNMAP call fails.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1512733868-9009-2-git-send-email-pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit aef172ffdc)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-08 07:40:25 -06:00
Peter Maydell 817a9fcba8 target/i386: Fix handling of VEX prefixes
In commit e3af7c788b we
replaced direct calls to cpu_ld*_code() with calls
to the x86_ld*_code() wrappers, which incorporate an
advance of s->pc. Unfortunately we didn't notice that
in one place the old code was deliberately not incrementing
s->pc:

@@ -4501,7 +4528,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             static const int pp_prefix[4] = {
                 0, PREFIX_DATA, PREFIX_REPZ, PREFIX_REPNZ
             };
-            int vex3, vex2 = cpu_ldub_code(env, s->pc);
+            int vex3, vex2 = x86_ldub_code(env, s);

             if (!CODE64(s) && (vex2 & 0xc0) != 0xc0) {
                 /* 4.1.4.6: In 32-bit mode, bits [7:6] must be 11b,

This meant we were mishandling this set of instructions.
Remove the manual advance of s->pc for the "is VEX" case
(which is now done by x86_ldub_code()) and instead rewind
PC in the case where we decide that this isn't really VEX.
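
In other words (a sketch; the control flow in disas_insn() is more
involved):

    int vex3, vex2 = x86_ldub_code(env, s);     /* advances s->pc */

    if (!CODE64(s) && (vex2 & 0xc0) != 0xc0) {
        /* Not really a VEX prefix: undo the implicit advance so the
         * byte gets decoded again as a normal opcode. */
        s->pc--;
        break;
    }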

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-stable@nongnu.org
Reported-by: Alexandro Sanchez Bach <alexandro@phi.nz>
Message-Id: <1513163959-17545-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit cfcca361d7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-01-08 07:39:13 -06:00
200 changed files with 4207 additions and 1379 deletions

View File

@ -1680,6 +1680,12 @@ F: include/sysemu/replay.h
F: docs/replay.txt
F: stubs/replay.c
IOVA Tree
M: Peter Xu <peterx@redhat.com>
S: Maintained
F: include/qemu/iova-tree.h
F: util/iova-tree.c
Usermode Emulation
------------------
Overall

View File

@ -1 +1 @@
2.11.0
2.11.2

block.c
View File

@ -1596,13 +1596,24 @@ static int bdrv_reopen_get_flags(BlockReopenQueue *q, BlockDriverState *bs)
/* Returns whether the image file can be written to after the reopen queue @q
* has been successfully applied, or right now if @q is NULL. */
static bool bdrv_is_writable(BlockDriverState *bs, BlockReopenQueue *q)
static bool bdrv_is_writable_after_reopen(BlockDriverState *bs,
BlockReopenQueue *q)
{
int flags = bdrv_reopen_get_flags(q, bs);
return (flags & (BDRV_O_RDWR | BDRV_O_INACTIVE)) == BDRV_O_RDWR;
}
/*
* Return whether the BDS can be written to. This is not necessarily
* the same as !bdrv_is_read_only(bs), as inactivated images may not
* be written to but do not count as read-only images.
*/
bool bdrv_is_writable(BlockDriverState *bs)
{
return bdrv_is_writable_after_reopen(bs, NULL);
}
static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
BdrvChild *c, const BdrvChildRole *role,
BlockReopenQueue *reopen_queue,
@ -1640,7 +1651,7 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
/* Write permissions never work with read-only images */
if ((cumulative_perms & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) &&
!bdrv_is_writable(bs, q))
!bdrv_is_writable_after_reopen(bs, q))
{
error_setg(errp, "Block node is read-only");
return -EPERM;
@ -1930,7 +1941,7 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
&perm, &shared);
/* Format drivers may touch metadata even if the guest doesn't write */
if (bdrv_is_writable(bs, reopen_queue)) {
if (bdrv_is_writable_after_reopen(bs, reopen_queue)) {
perm |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
}
@ -4593,10 +4604,11 @@ void bdrv_img_create(const char *filename, const char *fmt,
back_flags = flags;
back_flags &= ~(BDRV_O_RDWR | BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING);
backing_options = qdict_new();
if (backing_fmt) {
backing_options = qdict_new();
qdict_put_str(backing_options, "driver", backing_fmt);
}
qdict_put_bool(backing_options, BDRV_OPT_FORCE_SHARE, true);
bs = bdrv_open(full_backing, NULL, backing_options, back_flags,
&local_err);

View File

@ -1694,6 +1694,7 @@ static int raw_regular_truncate(int fd, int64_t offset, PreallocMode prealloc,
case PREALLOC_MODE_FULL:
{
int64_t num = 0, left = offset - current_length;
off_t seek_result;
/*
* Knowing the final size from the beginning could allow the file
@ -1708,8 +1709,8 @@ static int raw_regular_truncate(int fd, int64_t offset, PreallocMode prealloc,
buf = g_malloc0(65536);
result = lseek(fd, current_length, SEEK_SET);
if (result < 0) {
seek_result = lseek(fd, current_length, SEEK_SET);
if (seek_result < 0) {
result = -errno;
error_setg_errno(errp, -result,
"Failed to seek to the old end of file");

View File

@ -164,7 +164,12 @@ static QemuOptsList runtime_unix_opts = {
{
.name = GLUSTER_OPT_SOCKET,
.type = QEMU_OPT_STRING,
.help = "socket file path)",
.help = "socket file path (legacy)",
},
{
.name = GLUSTER_OPT_PATH,
.type = QEMU_OPT_STRING,
.help = "socket file path (QAPI)",
},
{ /* end of list */ }
},
@ -612,10 +617,18 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster *gconf,
goto out;
}
ptr = qemu_opt_get(opts, GLUSTER_OPT_SOCKET);
ptr = qemu_opt_get(opts, GLUSTER_OPT_PATH);
if (!ptr) {
ptr = qemu_opt_get(opts, GLUSTER_OPT_SOCKET);
} else if (qemu_opt_get(opts, GLUSTER_OPT_SOCKET)) {
error_setg(&local_err,
"Conflicting parameters 'path' and 'socket'");
error_append_hint(&local_err, GERR_INDEX_HINT, i);
goto out;
}
if (!ptr) {
error_setg(&local_err, QERR_MISSING_PARAMETER,
GLUSTER_OPT_SOCKET);
GLUSTER_OPT_PATH);
error_append_hint(&local_err, GERR_INDEX_HINT, i);
goto out;
}
@ -680,7 +693,7 @@ static struct glfs *qemu_gluster_init(BlockdevOptionsGluster *gconf,
"file.server.0.host=1.2.3.4,"
"file.server.0.port=24007,"
"file.server.1.transport=unix,"
"file.server.1.socket=/var/run/glusterd.socket ..."
"file.server.1.path=/var/run/glusterd.socket ..."
"\n");
errno = -ret;
return NULL;

View File

@ -175,8 +175,10 @@ static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
bdrv_wakeup(bs);
}
/* Recursively call BlockDriver.bdrv_co_drain_begin/end callbacks */
static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
{
BdrvChild *child, *tmp;
BdrvCoDrainData data = { .bs = bs, .done = false, .begin = begin};
if (!bs->drv || (begin && !bs->drv->bdrv_co_drain_begin) ||
@ -187,6 +189,10 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
data.co = qemu_coroutine_create(bdrv_drain_invoke_entry, &data);
bdrv_coroutine_enter(bs, data.co);
BDRV_POLL_WHILE(bs, !data.done);
QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
bdrv_drain_invoke(child->bs, begin);
}
}
static bool bdrv_drain_recurse(BlockDriverState *bs, bool begin)
@ -194,9 +200,6 @@ static bool bdrv_drain_recurse(BlockDriverState *bs, bool begin)
BdrvChild *child, *tmp;
bool waited;
/* Ensure any pending metadata writes are submitted to bs->file. */
bdrv_drain_invoke(bs, begin);
/* Wait for drained requests to finish */
waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
@ -279,6 +282,7 @@ void bdrv_drained_begin(BlockDriverState *bs)
bdrv_parent_drained_begin(bs);
}
bdrv_drain_invoke(bs, true);
bdrv_drain_recurse(bs, true);
}
@ -294,6 +298,7 @@ void bdrv_drained_end(BlockDriverState *bs)
}
bdrv_parent_drained_end(bs);
bdrv_drain_invoke(bs, false);
bdrv_drain_recurse(bs, false);
aio_enable_external(bdrv_get_aio_context(bs));
}
@ -350,6 +355,7 @@ void bdrv_drain_all_begin(void)
aio_context_acquire(aio_context);
bdrv_parent_drained_begin(bs);
aio_disable_external(aio_context);
bdrv_drain_invoke(bs, true);
aio_context_release(aio_context);
if (!g_slist_find(aio_ctxs, aio_context)) {
@ -393,6 +399,7 @@ void bdrv_drain_all_end(void)
aio_context_acquire(aio_context);
aio_enable_external(aio_context);
bdrv_parent_drained_end(bs);
bdrv_drain_invoke(bs, false);
bdrv_drain_recurse(bs, false);
aio_context_release(aio_context);
}

View File

@ -2,7 +2,7 @@
* QEMU Block driver for iSCSI images
*
* Copyright (c) 2010-2011 Ronnie Sahlberg <ronniesahlberg@gmail.com>
* Copyright (c) 2012-2016 Peter Lieven <pl@kamp.de>
* Copyright (c) 2012-2017 Peter Lieven <pl@kamp.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@ -1128,6 +1128,9 @@ retry:
goto retry;
}
iscsi_allocmap_set_invalid(iscsilun, offset >> BDRV_SECTOR_BITS,
bytes >> BDRV_SECTOR_BITS);
if (iTask.status == SCSI_STATUS_CHECK_CONDITION) {
/* the target might fail with a check condition if it
is not happy with the alignment of the UNMAP request
@ -1140,9 +1143,6 @@ retry:
goto out_unlock;
}
iscsi_allocmap_set_invalid(iscsilun, offset >> BDRV_SECTOR_BITS,
bytes >> BDRV_SECTOR_BITS);
out_unlock:
qemu_mutex_unlock(&iscsilun->mutex);
return r;

View File

@ -846,9 +846,6 @@ int nbd_client_init(BlockDriverState *bs,
if (client->info.flags & NBD_FLAG_SEND_WRITE_ZEROES) {
bs->supported_zero_flags |= BDRV_REQ_MAY_UNMAP;
}
if (client->info.min_block > bs->bl.request_alignment) {
bs->bl.request_alignment = client->info.min_block;
}
qemu_co_mutex_init(&client->send_mutex);
qemu_co_queue_init(&client->free_sema);

View File

@ -388,6 +388,7 @@ static QemuOptsList nbd_runtime_opts = {
.type = QEMU_OPT_STRING,
.help = "ID of the TLS credentials to use",
},
{ /* end of list */ }
},
};
@ -473,8 +474,10 @@ static int nbd_co_flush(BlockDriverState *bs)
static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
{
NBDClientSession *s = nbd_get_client_session(bs);
uint32_t min = s->info.min_block;
uint32_t max = MIN_NON_ZERO(NBD_MAX_BUFFER_SIZE, s->info.max_block);
bs->bl.request_alignment = min ? min : BDRV_SECTOR_SIZE;
bs->bl.max_pdiscard = max;
bs->bl.max_pwrite_zeroes = max;
bs->bl.max_transfer = max;

View File

@ -4235,7 +4235,7 @@ void qcow2_signal_corruption(BlockDriverState *bs, bool fatal, int64_t offset,
char *message;
va_list ap;
fatal = fatal && !bs->read_only;
fatal = fatal && bdrv_is_writable(bs);
if (s->signaled_corruption &&
(!fatal || (s->incompatible_features & QCOW2_INCOMPAT_CORRUPT)))

View File

@ -167,16 +167,37 @@ static void raw_reopen_abort(BDRVReopenState *state)
state->opaque = NULL;
}
/* Check and adjust the offset, against 'offset' and 'size' options. */
static inline int raw_adjust_offset(BlockDriverState *bs, uint64_t *offset,
uint64_t bytes, bool is_write)
{
BDRVRawState *s = bs->opaque;
if (s->has_size && (*offset > s->size || bytes > (s->size - *offset))) {
/* There's not enough space for the write, or the read request is
* out-of-range. Don't read/write anything to prevent leaking out of
* the size specified in options. */
return is_write ? -ENOSPC : -EINVAL;
}
if (*offset > INT64_MAX - s->offset) {
return -EINVAL;
}
*offset += s->offset;
return 0;
}
static int coroutine_fn raw_co_preadv(BlockDriverState *bs, uint64_t offset,
uint64_t bytes, QEMUIOVector *qiov,
int flags)
{
BDRVRawState *s = bs->opaque;
int ret;
if (offset > UINT64_MAX - s->offset) {
return -EINVAL;
ret = raw_adjust_offset(bs, &offset, bytes, false);
if (ret) {
return ret;
}
offset += s->offset;
BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
@ -186,23 +207,11 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
uint64_t bytes, QEMUIOVector *qiov,
int flags)
{
BDRVRawState *s = bs->opaque;
void *buf = NULL;
BlockDriver *drv;
QEMUIOVector local_qiov;
int ret;
if (s->has_size && (offset > s->size || bytes > (s->size - offset))) {
/* There's not enough space for the data. Don't write anything and just
* fail to prevent leaking out of the size specified in options. */
return -ENOSPC;
}
if (offset > UINT64_MAX - s->offset) {
ret = -EINVAL;
goto fail;
}
if (bs->probed && offset < BLOCK_PROBE_BUF_SIZE && bytes) {
/* Handling partial writes would be a pain - so we just
* require that guests have 512-byte request alignment if
@ -237,7 +246,10 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
qiov = &local_qiov;
}
offset += s->offset;
ret = raw_adjust_offset(bs, &offset, bytes, true);
if (ret) {
goto fail;
}
BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
ret = bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
@ -267,22 +279,24 @@ static int coroutine_fn raw_co_pwrite_zeroes(BlockDriverState *bs,
int64_t offset, int bytes,
BdrvRequestFlags flags)
{
BDRVRawState *s = bs->opaque;
if (offset > UINT64_MAX - s->offset) {
return -EINVAL;
int ret;
ret = raw_adjust_offset(bs, (uint64_t *)&offset, bytes, true);
if (ret) {
return ret;
}
offset += s->offset;
return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
}
static int coroutine_fn raw_co_pdiscard(BlockDriverState *bs,
int64_t offset, int bytes)
{
BDRVRawState *s = bs->opaque;
if (offset > UINT64_MAX - s->offset) {
return -EINVAL;
int ret;
ret = raw_adjust_offset(bs, (uint64_t *)&offset, bytes, true);
if (ret) {
return ret;
}
offset += s->offset;
return bdrv_co_pdiscard(bs->file->bs, offset, bytes);
}

View File

@ -265,13 +265,14 @@ static int qemu_rbd_set_keypairs(rados_t cluster, const char *keypairs_json,
key = qstring_get_str(name);
ret = rados_conf_set(cluster, key, qstring_get_str(value));
QDECREF(name);
QDECREF(value);
if (ret < 0) {
error_setg_errno(errp, -ret, "invalid conf option %s", key);
QDECREF(name);
ret = -EINVAL;
break;
}
QDECREF(name);
}
QDECREF(keypairs);

View File

@ -556,6 +556,7 @@ static QemuOptsList ssh_runtime_opts = {
.type = QEMU_OPT_STRING,
.help = "Defines how and what to check the host key against",
},
{ /* end of list */ }
},
};

View File

@ -35,9 +35,12 @@ static QemuOptsList throttle_opts = {
},
};
static int throttle_configure_tgm(BlockDriverState *bs,
ThrottleGroupMember *tgm,
QDict *options, Error **errp)
/*
* If this function succeeds then the throttle group name is stored in
* @group and must be freed by the caller.
* If there's an error then @group remains unmodified.
*/
static int throttle_parse_options(QDict *options, char **group, Error **errp)
{
int ret;
const char *group_name;
@ -62,8 +65,7 @@ static int throttle_configure_tgm(BlockDriverState *bs,
goto fin;
}
/* Register membership to group with name group_name */
throttle_group_register_tgm(tgm, group_name, bdrv_get_aio_context(bs));
*group = g_strdup(group_name);
ret = 0;
fin:
qemu_opts_del(opts);
@ -74,6 +76,8 @@ static int throttle_open(BlockDriverState *bs, QDict *options,
int flags, Error **errp)
{
ThrottleGroupMember *tgm = bs->opaque;
char *group;
int ret;
bs->file = bdrv_open_child(NULL, options, "file", bs,
&child_file, false, errp);
@ -83,7 +87,14 @@ static int throttle_open(BlockDriverState *bs, QDict *options,
bs->supported_write_flags = bs->file->bs->supported_write_flags;
bs->supported_zero_flags = bs->file->bs->supported_zero_flags;
return throttle_configure_tgm(bs, tgm, options, errp);
ret = throttle_parse_options(options, &group, errp);
if (ret == 0) {
/* Register membership to group with name group_name */
throttle_group_register_tgm(tgm, group, bdrv_get_aio_context(bs));
g_free(group);
}
return ret;
}
static void throttle_close(BlockDriverState *bs)
@ -159,35 +170,36 @@ static void throttle_attach_aio_context(BlockDriverState *bs,
static int throttle_reopen_prepare(BDRVReopenState *reopen_state,
BlockReopenQueue *queue, Error **errp)
{
ThrottleGroupMember *tgm;
int ret;
char *group = NULL;
assert(reopen_state != NULL);
assert(reopen_state->bs != NULL);
reopen_state->opaque = g_new0(ThrottleGroupMember, 1);
tgm = reopen_state->opaque;
return throttle_configure_tgm(reopen_state->bs, tgm, reopen_state->options,
errp);
ret = throttle_parse_options(reopen_state->options, &group, errp);
reopen_state->opaque = group;
return ret;
}
static void throttle_reopen_commit(BDRVReopenState *reopen_state)
{
ThrottleGroupMember *old_tgm = reopen_state->bs->opaque;
ThrottleGroupMember *new_tgm = reopen_state->opaque;
BlockDriverState *bs = reopen_state->bs;
ThrottleGroupMember *tgm = bs->opaque;
char *group = reopen_state->opaque;
throttle_group_unregister_tgm(old_tgm);
g_free(old_tgm);
reopen_state->bs->opaque = new_tgm;
assert(group);
if (strcmp(group, throttle_group_get_name(tgm))) {
throttle_group_unregister_tgm(tgm);
throttle_group_register_tgm(tgm, group, bdrv_get_aio_context(bs));
}
g_free(reopen_state->opaque);
reopen_state->opaque = NULL;
}
static void throttle_reopen_abort(BDRVReopenState *reopen_state)
{
ThrottleGroupMember *tgm = reopen_state->opaque;
throttle_group_unregister_tgm(tgm);
g_free(tgm);
g_free(reopen_state->opaque);
reopen_state->opaque = NULL;
}

configure
View File

@ -930,6 +930,8 @@ for opt do
;;
--firmwarepath=*) firmwarepath="$optarg"
;;
--host=*|--build=*|\
--disable-dependency-tracking|\
--sbindir=*|--sharedstatedir=*|\
--oldincludedir=*|--datarootdir=*|--infodir=*|--localedir=*|\
--htmldir=*|--dvidir=*|--pdfdir=*|--psdir=*)
@ -2788,6 +2790,7 @@ if test "$sdl" != "no" ; then
int main( void ) { return SDL_Init (SDL_INIT_VIDEO); }
EOF
sdl_cflags=$($sdlconfig --cflags 2>/dev/null)
sdl_cflags="$sdl_cflags -Wno-undef" # workaround 2.0.8 bug
if test "$static" = "yes" ; then
if $pkg_config $sdlname --exists; then
sdl_libs=$($pkg_config $sdlname --static --libs 2>/dev/null)
@ -3920,7 +3923,7 @@ fi
# check if memfd is supported
memfd=no
cat > $TMPC << EOF
#include <sys/memfd.h>
#include <sys/mman.h>
int main(void)
{

cpus.c
View File

@ -843,11 +843,19 @@ void qemu_timer_notify_cb(void *opaque, QEMUClockType type)
return;
}
if (!qemu_in_vcpu_thread() && first_cpu) {
if (qemu_in_vcpu_thread()) {
/* A CPU is currently running; kick it back out to the
* tcg_cpu_exec() loop so it will recalculate its
* icount deadline immediately.
*/
qemu_cpu_kick(current_cpu);
} else if (first_cpu) {
/* qemu_cpu_kick is not enough to kick a halted CPU out of
* qemu_tcg_wait_io_event. async_run_on_cpu, instead,
* causes cpu_thread_is_idle to return false. This way,
* handle_icount_deadline can run.
* If we have no CPUs at all for some reason, we don't
* need to do anything.
*/
async_run_on_cpu(first_cpu, do_nothing, RUN_ON_CPU_NULL);
}

View File

@ -29,7 +29,7 @@
#include <libfdt.h>
#define FDT_MAX_SIZE 0x10000
#define FDT_MAX_SIZE 0x100000
void *create_device_tree(int *sizep)
{

View File

@ -426,10 +426,20 @@ Standard Cluster Descriptor:
Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):
Bit 0 - x: Host cluster offset. This is usually _not_ aligned to a
cluster boundary!
Bit 0 - x-1: Host cluster offset. This is usually _not_ aligned to a
cluster or sector boundary!
x+1 - 61: Compressed size of the images in sectors of 512 bytes
x - 61: Number of additional 512-byte sectors used for the
compressed data, beyond the sector containing the offset
in the previous field. Some of these sectors may reside
in the next contiguous host cluster.
Note that the compressed data does not necessarily occupy
all of the bytes in the final sector; rather, decompression
stops when it has produced a cluster of data.
Another compressed cluster may map to the tail of the final
sector used by this compressed cluster.
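
(Worked example of the new wording: with cluster_bits = 16, i.e. 64 KiB
clusters, x = 62 - (16 - 8) = 54, so bits 0-53 hold the host cluster
offset and bits 54-61 hold the count of additional 512-byte sectors.)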
If a cluster is unallocated, read requests shall read the data from the backing
file (except if bit 0 in the Standard Cluster Descriptor is set). If there is

exec.c
View File

@ -1455,6 +1455,7 @@ static int find_max_supported_pagesize(Object *obj, void *opaque)
mem_path = object_property_get_str(obj, "mem-path", NULL);
if (mem_path) {
long hpsize = qemu_mempath_getpagesize(mem_path);
g_free(mem_path);
if (hpsize < *hpsize_min) {
*hpsize_min = hpsize;
}
@ -2575,6 +2576,8 @@ static const MemoryRegionOps watch_mem_ops = {
},
};
static MemTxResult flatview_read(FlatView *fv, hwaddr addr,
MemTxAttrs attrs, uint8_t *buf, int len);
static MemTxResult flatview_write(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
const uint8_t *buf, int len);
static bool flatview_access_valid(FlatView *fv, hwaddr addr, int len,
@ -3005,6 +3008,7 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
return result;
}
/* Called from RCU critical section. */
static MemTxResult flatview_write(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
const uint8_t *buf, int len)
{
@ -3013,25 +3017,14 @@ static MemTxResult flatview_write(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
MemoryRegion *mr;
MemTxResult result = MEMTX_OK;
if (len > 0) {
rcu_read_lock();
l = len;
mr = flatview_translate(fv, addr, &addr1, &l, true);
result = flatview_write_continue(fv, addr, attrs, buf, len,
addr1, l, mr);
rcu_read_unlock();
}
l = len;
mr = flatview_translate(fv, addr, &addr1, &l, true);
result = flatview_write_continue(fv, addr, attrs, buf, len,
addr1, l, mr);
return result;
}
MemTxResult address_space_write(AddressSpace *as, hwaddr addr,
MemTxAttrs attrs,
const uint8_t *buf, int len)
{
return flatview_write(address_space_to_flatview(as), addr, attrs, buf, len);
}
/* Called within RCU critical section. */
MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr,
MemTxAttrs attrs, uint8_t *buf,
@ -3102,42 +3095,61 @@ MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr,
return result;
}
MemTxResult flatview_read_full(FlatView *fv, hwaddr addr,
MemTxAttrs attrs, uint8_t *buf, int len)
/* Called from RCU critical section. */
static MemTxResult flatview_read(FlatView *fv, hwaddr addr,
MemTxAttrs attrs, uint8_t *buf, int len)
{
hwaddr l;
hwaddr addr1;
MemoryRegion *mr;
l = len;
mr = flatview_translate(fv, addr, &addr1, &l, false);
return flatview_read_continue(fv, addr, attrs, buf, len,
addr1, l, mr);
}
MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr,
MemTxAttrs attrs, uint8_t *buf, int len)
{
MemTxResult result = MEMTX_OK;
FlatView *fv;
if (len > 0) {
rcu_read_lock();
l = len;
mr = flatview_translate(fv, addr, &addr1, &l, false);
result = flatview_read_continue(fv, addr, attrs, buf, len,
addr1, l, mr);
fv = address_space_to_flatview(as);
result = flatview_read(fv, addr, attrs, buf, len);
rcu_read_unlock();
}
return result;
}
static MemTxResult flatview_rw(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
uint8_t *buf, int len, bool is_write)
MemTxResult address_space_write(AddressSpace *as, hwaddr addr,
MemTxAttrs attrs,
const uint8_t *buf, int len)
{
if (is_write) {
return flatview_write(fv, addr, attrs, (uint8_t *)buf, len);
} else {
return flatview_read(fv, addr, attrs, (uint8_t *)buf, len);
MemTxResult result = MEMTX_OK;
FlatView *fv;
if (len > 0) {
rcu_read_lock();
fv = address_space_to_flatview(as);
result = flatview_write(fv, addr, attrs, buf, len);
rcu_read_unlock();
}
return result;
}
MemTxResult address_space_rw(AddressSpace *as, hwaddr addr,
MemTxAttrs attrs, uint8_t *buf,
int len, bool is_write)
MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
uint8_t *buf, int len, bool is_write)
{
return flatview_rw(address_space_to_flatview(as),
addr, attrs, buf, len, is_write);
if (is_write) {
return address_space_write(as, addr, attrs, buf, len);
} else {
return address_space_read_full(as, addr, attrs, buf, len);
}
}
void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
@ -3303,14 +3315,12 @@ static bool flatview_access_valid(FlatView *fv, hwaddr addr, int len,
MemoryRegion *mr;
hwaddr l, xlat;
rcu_read_lock();
while (len > 0) {
l = len;
mr = flatview_translate(fv, addr, &xlat, &l, is_write);
if (!memory_access_is_direct(mr, is_write)) {
l = memory_access_size(mr, l, addr);
if (!memory_region_access_valid(mr, xlat, l, is_write)) {
rcu_read_unlock();
return false;
}
}
@ -3318,15 +3328,20 @@ static bool flatview_access_valid(FlatView *fv, hwaddr addr, int len,
len -= l;
addr += l;
}
rcu_read_unlock();
return true;
}
bool address_space_access_valid(AddressSpace *as, hwaddr addr,
int len, bool is_write)
{
return flatview_access_valid(address_space_to_flatview(as),
addr, len, is_write);
FlatView *fv;
bool result;
rcu_read_lock();
fv = address_space_to_flatview(as);
result = flatview_access_valid(fv, addr, len, is_write);
rcu_read_unlock();
return result;
}
static hwaddr
@ -3372,7 +3387,7 @@ void *address_space_map(AddressSpace *as,
hwaddr l, xlat;
MemoryRegion *mr;
void *ptr;
FlatView *fv = address_space_to_flatview(as);
FlatView *fv;
if (len == 0) {
return NULL;
@ -3380,6 +3395,7 @@ void *address_space_map(AddressSpace *as,
l = len;
rcu_read_lock();
fv = address_space_to_flatview(as);
mr = flatview_translate(fv, addr, &xlat, &l, is_write);
if (!memory_access_is_direct(mr, is_write)) {

View File

@ -515,6 +515,7 @@ static inline int tohex(int v)
return v - 10 + 'a';
}
/* writes 2*len+1 bytes in buf */
static void memtohex(char *buf, const uint8_t *mem, int len)
{
int i, c;
@ -970,8 +971,8 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
const char *p;
uint32_t thread;
int ch, reg_size, type, res;
char buf[MAX_PACKET_LENGTH];
uint8_t mem_buf[MAX_PACKET_LENGTH];
char buf[sizeof(mem_buf) + 1 /* trailing NUL */];
uint8_t *registers;
target_ulong addr, len;

View File

@ -90,7 +90,6 @@ struct pflash_t {
uint16_t ident1;
uint16_t ident2;
uint16_t ident3;
uint8_t cfi_len;
uint8_t cfi_table[0x52];
uint64_t counter;
unsigned int writeblock_size;
@ -153,7 +152,7 @@ static uint32_t pflash_cfi_query(pflash_t *pfl, hwaddr offset)
boff = offset >> (ctz32(pfl->bank_width) +
ctz32(pfl->max_device_width) - ctz32(pfl->device_width));
if (boff > pfl->cfi_len) {
if (boff >= sizeof(pfl->cfi_table)) {
return 0;
}
/* Now we will construct the CFI response generated by a single
@ -385,10 +384,10 @@ static uint32_t pflash_read (pflash_t *pfl, hwaddr offset,
boff = boff >> 2;
}
if (boff > pfl->cfi_len) {
ret = 0;
} else {
if (boff < sizeof(pfl->cfi_table)) {
ret = pfl->cfi_table[boff];
} else {
ret = 0;
}
} else {
/* If we have a read larger than the bank_width, combine multiple
@ -791,7 +790,6 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
pfl->cmd = 0;
pfl->status = 0;
/* Hardcoded CFI table */
pfl->cfi_len = 0x52;
/* Standard "QRY" string */
pfl->cfi_table[0x10] = 'Q';
pfl->cfi_table[0x11] = 'R';

View File

@ -83,7 +83,6 @@ struct pflash_t {
uint16_t ident3;
uint16_t unlock_addr0;
uint16_t unlock_addr1;
uint8_t cfi_len;
uint8_t cfi_table[0x52];
QEMUTimer *timer;
/* The device replicates the flash memory across its memory space. Emulate
@ -235,10 +234,11 @@ static uint32_t pflash_read (pflash_t *pfl, hwaddr offset,
break;
case 0x98:
/* CFI query mode */
if (boff > pfl->cfi_len)
ret = 0;
else
if (boff < sizeof(pfl->cfi_table)) {
ret = pfl->cfi_table[boff];
} else {
ret = 0;
}
break;
}
@ -663,7 +663,6 @@ static void pflash_cfi02_realize(DeviceState *dev, Error **errp)
pfl->cmd = 0;
pfl->status = 0;
/* Hardcoded CFI table (mostly from SG29 Spansion flash) */
pfl->cfi_len = 0x52;
/* Standard "QRY" string */
pfl->cfi_table[0x10] = 'Q';
pfl->cfi_table[0x11] = 'R';

View File

@ -274,6 +274,7 @@ static void uart_write(void *opaque, hwaddr offset, uint64_t value,
* is then reflected into the intstatus value by the update function).
*/
s->state &= ~(value & (R_INTSTATUS_TXO_MASK | R_INTSTATUS_RXO_MASK));
s->intstatus &= ~value;
cmsdk_apb_uart_update(s);
break;
case A_BAUDDIV:

View File

@ -1104,20 +1104,22 @@ int rom_check_and_register_reset(void)
if (rom->fw_file) {
continue;
}
if ((addr > rom->addr) && (as == rom->as)) {
fprintf(stderr, "rom: requested regions overlap "
"(rom %s. free=0x" TARGET_FMT_plx
", addr=0x" TARGET_FMT_plx ")\n",
rom->name, addr, rom->addr);
return -1;
if (!rom->mr) {
if ((addr > rom->addr) && (as == rom->as)) {
fprintf(stderr, "rom: requested regions overlap "
"(rom %s. free=0x" TARGET_FMT_plx
", addr=0x" TARGET_FMT_plx ")\n",
rom->name, addr, rom->addr);
return -1;
}
addr = rom->addr;
addr += rom->romsize;
as = rom->as;
}
addr = rom->addr;
addr += rom->romsize;
section = memory_region_find(rom->mr ? rom->mr : get_system_memory(),
rom->addr, 1);
rom->isrom = int128_nz(section.size) && memory_region_is_rom(section.mr);
memory_region_unref(section.mr);
as = rom->as;
}
qemu_register_reset(rom_reset, NULL);
roms_loaded = 1;

View File

@ -1140,6 +1140,30 @@ static void device_class_init(ObjectClass *class, void *data)
dc->user_creatable = true;
}
void device_class_set_parent_reset(DeviceClass *dc,
DeviceReset dev_reset,
DeviceReset *parent_reset)
{
*parent_reset = dc->reset;
dc->reset = dev_reset;
}
void device_class_set_parent_realize(DeviceClass *dc,
DeviceRealize dev_realize,
DeviceRealize *parent_realize)
{
*parent_realize = dc->realize;
dc->realize = dev_realize;
}
void device_class_set_parent_unrealize(DeviceClass *dc,
DeviceUnrealize dev_unrealize,
DeviceUnrealize *parent_unrealize)
{
*parent_unrealize = dc->unrealize;
dc->unrealize = dev_unrealize;
}
void device_reset(DeviceState *dev)
{
DeviceClass *klass = DEVICE_GET_CLASS(dev);

View File

@ -169,7 +169,8 @@ void qxl_render_update(PCIQXLDevice *qxl)
qemu_mutex_lock(&qxl->ssd.lock);
if (!runstate_is_running() || !qxl->guest_primary.commands) {
if (!runstate_is_running() || !qxl->guest_primary.commands ||
qxl->mode == QXL_MODE_UNDEFINED) {
qxl_render_update_area_unlocked(qxl);
qemu_mutex_unlock(&qxl->ssd.lock);
return;

View File

@ -1280,6 +1280,9 @@ static void vga_draw_text(VGACommonState *s, int full_update)
cx_min = width;
cx_max = -1;
for(cx = 0; cx < width; cx++) {
if (src + sizeof(uint16_t) > s->vram_ptr + s->vram_size) {
break;
}
ch_attr = *(uint16_t *)src;
if (full_update || ch_attr != *ch_attr_ptr || src == cursor_ptr) {
if (cx < cx_min)
@ -1486,6 +1489,8 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
region_start = (s->start_addr * 4);
region_end = region_start + (ram_addr_t)s->line_offset * height;
region_end += width * s->get_bpp(s) / 8; /* scanline length */
region_end -= s->line_offset;
if (region_end > s->vbe_size) {
/* wraps around (can happen with cirrus vbe modes) */
region_start = 0;

View File

@ -2460,6 +2460,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
AcpiDmarDeviceScope *scope = NULL;
/* Root complex IOAPIC use one path[0] only */
size_t ioapic_scope_size = sizeof(*scope) + sizeof(scope->path[0]);
IntelIOMMUState *intel_iommu = INTEL_IOMMU_DEVICE(iommu);
assert(iommu);
if (iommu->intr_supported) {
@ -2467,7 +2468,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
}
dmar = acpi_data_push(table_data, sizeof(*dmar));
dmar->host_address_width = VTD_HOST_ADDRESS_WIDTH - 1;
dmar->host_address_width = intel_iommu->aw_bits - 1;
dmar->flags = dmar_flags;
/* DMAR Remapping Hardware Unit Definition structure */

View File

@ -128,6 +128,22 @@ static uint64_t vtd_set_clear_mask_quad(IntelIOMMUState *s, hwaddr addr,
return new_val;
}
static inline void vtd_iommu_lock(IntelIOMMUState *s)
{
qemu_mutex_lock(&s->iommu_lock);
}
static inline void vtd_iommu_unlock(IntelIOMMUState *s)
{
qemu_mutex_unlock(&s->iommu_lock);
}
/* Whether the address space needs to notify new mappings */
static inline gboolean vtd_as_has_map_notifier(VTDAddressSpace *as)
{
return as->notifier_flags & IOMMU_NOTIFIER_MAP;
}
/* GHashTable functions */
static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
{
@ -172,9 +188,9 @@ static gboolean vtd_hash_remove_by_page(gpointer key, gpointer value,
}
/* Reset all the gen of VTDAddressSpace to zero and set the gen of
* IntelIOMMUState to 1.
* IntelIOMMUState to 1. Must be called with IOMMU lock held.
*/
static void vtd_reset_context_cache(IntelIOMMUState *s)
static void vtd_reset_context_cache_locked(IntelIOMMUState *s)
{
VTDAddressSpace *vtd_as;
VTDBus *vtd_bus;
@ -197,12 +213,20 @@ static void vtd_reset_context_cache(IntelIOMMUState *s)
s->context_cache_gen = 1;
}
static void vtd_reset_iotlb(IntelIOMMUState *s)
/* Must be called with IOMMU lock held. */
static void vtd_reset_iotlb_locked(IntelIOMMUState *s)
{
assert(s->iotlb);
g_hash_table_remove_all(s->iotlb);
}
static void vtd_reset_iotlb(IntelIOMMUState *s)
{
vtd_iommu_lock(s);
vtd_reset_iotlb_locked(s);
vtd_iommu_unlock(s);
}
static uint64_t vtd_get_iotlb_key(uint64_t gfn, uint16_t source_id,
uint32_t level)
{
@ -215,6 +239,7 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
return (addr & vtd_slpt_level_page_mask(level)) >> VTD_PAGE_SHIFT_4K;
}
/* Must be called with IOMMU lock held */
static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
hwaddr addr)
{
@ -235,6 +260,7 @@ out:
return entry;
}
/* Must be called with IOMMU lock held */
static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
uint16_t domain_id, hwaddr addr, uint64_t slpte,
uint8_t access_flags, uint32_t level)
@ -246,7 +272,7 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
trace_vtd_iotlb_page_update(source_id, addr, slpte, domain_id);
if (g_hash_table_size(s->iotlb) >= VTD_IOTLB_MAX_SIZE) {
trace_vtd_iotlb_reset("iotlb exceeds size limit");
vtd_reset_iotlb(s);
vtd_reset_iotlb_locked(s);
}
entry->gfn = gfn;
@ -521,9 +547,9 @@ static inline dma_addr_t vtd_ce_get_slpt_base(VTDContextEntry *ce)
return ce->lo & VTD_CONTEXT_ENTRY_SLPTPTR;
}
static inline uint64_t vtd_get_slpte_addr(uint64_t slpte)
static inline uint64_t vtd_get_slpte_addr(uint64_t slpte, uint8_t aw)
{
return slpte & VTD_SL_PT_BASE_ADDR_MASK;
return slpte & VTD_SL_PT_BASE_ADDR_MASK(aw);
}
/* Whether the pte indicates the address of the page frame */
@ -608,35 +634,29 @@ static inline bool vtd_ce_type_check(X86IOMMUState *x86_iommu,
return true;
}
static inline uint64_t vtd_iova_limit(VTDContextEntry *ce)
static inline uint64_t vtd_iova_limit(VTDContextEntry *ce, uint8_t aw)
{
uint32_t ce_agaw = vtd_ce_get_agaw(ce);
return 1ULL << MIN(ce_agaw, VTD_MGAW);
return 1ULL << MIN(ce_agaw, aw);
}
/* Return true if IOVA passes range check, otherwise false. */
static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce)
static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce,
uint8_t aw)
{
/*
* Check if @iova is above 2^X-1, where X is the minimum of MGAW
* in CAP_REG and AW in context-entry.
*/
return !(iova & ~(vtd_iova_limit(ce) - 1));
return !(iova & ~(vtd_iova_limit(ce, aw) - 1));
}
static const uint64_t vtd_paging_entry_rsvd_field[] = {
[0] = ~0ULL,
/* For not large page */
[1] = 0x800ULL | ~(VTD_HAW_MASK | VTD_SL_IGN_COM),
[2] = 0x800ULL | ~(VTD_HAW_MASK | VTD_SL_IGN_COM),
[3] = 0x800ULL | ~(VTD_HAW_MASK | VTD_SL_IGN_COM),
[4] = 0x880ULL | ~(VTD_HAW_MASK | VTD_SL_IGN_COM),
/* For large page */
[5] = 0x800ULL | ~(VTD_HAW_MASK | VTD_SL_IGN_COM),
[6] = 0x1ff800ULL | ~(VTD_HAW_MASK | VTD_SL_IGN_COM),
[7] = 0x3ffff800ULL | ~(VTD_HAW_MASK | VTD_SL_IGN_COM),
[8] = 0x880ULL | ~(VTD_HAW_MASK | VTD_SL_IGN_COM),
};
/*
* Rsvd field masks for spte:
* Index [1] to [4] 4k pages
* Index [5] to [8] large pages
*/
static uint64_t vtd_paging_entry_rsvd_field[9];
static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level)
{
@ -676,7 +696,7 @@ static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
*/
static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
uint64_t *slptep, uint32_t *slpte_level,
bool *reads, bool *writes)
bool *reads, bool *writes, uint8_t aw_bits)
{
dma_addr_t addr = vtd_ce_get_slpt_base(ce);
uint32_t level = vtd_ce_get_level(ce);
@ -684,7 +704,7 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
uint64_t slpte;
uint64_t access_right_check;
if (!vtd_iova_range_check(iova, ce)) {
if (!vtd_iova_range_check(iova, ce, aw_bits)) {
trace_vtd_err_dmar_iova_overflow(iova);
return -VTD_FR_ADDR_BEYOND_MGAW;
}
@ -721,29 +741,124 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
*slpte_level = level;
return 0;
}
addr = vtd_get_slpte_addr(slpte);
addr = vtd_get_slpte_addr(slpte, aw_bits);
level--;
}
}
typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
/**
* Constant information used during page walking
*
* @hook_fn: hook func to be called when a page is detected
* @private: private data to be passed into hook func
* @notify_unmap: whether we should notify invalid entries
* @as: VT-d address space of the device
* @aw: maximum address width
* @domain_id: domain ID of the page walk
*/
typedef struct {
VTDAddressSpace *as;
vtd_page_walk_hook hook_fn;
void *private;
bool notify_unmap;
uint8_t aw;
uint16_t domain_id;
} vtd_page_walk_info;
static int vtd_page_walk_one(IOMMUTLBEntry *entry, vtd_page_walk_info *info)
{
VTDAddressSpace *as = info->as;
vtd_page_walk_hook hook_fn = info->hook_fn;
void *private = info->private;
DMAMap target = {
.iova = entry->iova,
.size = entry->addr_mask,
.translated_addr = entry->translated_addr,
.perm = entry->perm,
};
DMAMap *mapped = iova_tree_find(as->iova_tree, &target);
if (entry->perm == IOMMU_NONE && !info->notify_unmap) {
trace_vtd_page_walk_one_skip_unmap(entry->iova, entry->addr_mask);
return 0;
}
assert(hook_fn);
/* Update local IOVA mapped ranges */
if (entry->perm) {
if (mapped) {
/* If it's exactly the same translation, skip */
if (!memcmp(mapped, &target, sizeof(target))) {
trace_vtd_page_walk_one_skip_map(entry->iova, entry->addr_mask,
entry->translated_addr);
return 0;
} else {
/*
* Translation changed. Normally this should not
* happen, but it can happen with buggy guest
* OSes. Note that there will be a small window where
* we don't have a map at all. But that's the best
* effort we can do. The ideal way to emulate this is
* atomically modify the PTE to follow what has
* changed, but we can't. One example is that vfio
* driver only has VFIO_IOMMU_[UN]MAP_DMA but no
* interface to modify a mapping (meanwhile it seems
* meaningless to even provide one). Anyway, let's
* mark this as a TODO in case one day we'll have
* a better solution.
*/
IOMMUAccessFlags cache_perm = entry->perm;
int ret;
/* Emulate an UNMAP */
entry->perm = IOMMU_NONE;
trace_vtd_page_walk_one(info->domain_id,
entry->iova,
entry->translated_addr,
entry->addr_mask,
entry->perm);
ret = hook_fn(entry, private);
if (ret) {
return ret;
}
/* Drop any existing mapping */
iova_tree_remove(as->iova_tree, &target);
/* Recover the correct permission */
entry->perm = cache_perm;
}
}
iova_tree_insert(as->iova_tree, &target);
} else {
if (!mapped) {
/* Skip since we didn't map this range at all */
trace_vtd_page_walk_one_skip_unmap(entry->iova, entry->addr_mask);
return 0;
}
iova_tree_remove(as->iova_tree, &target);
}
trace_vtd_page_walk_one(info->domain_id, entry->iova,
entry->translated_addr, entry->addr_mask,
entry->perm);
return hook_fn(entry, private);
}
/**
* vtd_page_walk_level - walk over specific level for IOVA range
*
* @addr: base GPA addr to start the walk
* @start: IOVA range start address
* @end: IOVA range end address (start <= addr < end)
* @hook_fn: hook func to be called when detected page
* @private: private data to be passed into hook func
* @read: whether parent level has read permission
* @write: whether parent level has write permission
* @notify_unmap: whether we should notify invalid entries
* @info: constant information for the page walk
*/
static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
uint64_t end, vtd_page_walk_hook hook_fn,
void *private, uint32_t level,
bool read, bool write, bool notify_unmap)
uint64_t end, uint32_t level, bool read,
bool write, vtd_page_walk_info *info)
{
bool read_cur, write_cur, entry_valid;
uint32_t offset;
@ -786,37 +901,34 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
*/
entry_valid = read_cur | write_cur;
if (vtd_is_last_slpte(slpte, level)) {
if (!vtd_is_last_slpte(slpte, level) && entry_valid) {
/*
* This is a valid PDE (or even bigger than PDE). We need
* to walk one further level.
*/
ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, info->aw),
iova, MIN(iova_next, end), level - 1,
read_cur, write_cur, info);
} else {
/*
* This means we are either:
*
* (1) the real page entry (either 4K page, or huge page)
* (2) the whole range is invalid
*
* In either case, we send an IOTLB notification down.
*/
entry.target_as = &address_space_memory;
entry.iova = iova & subpage_mask;
/* NOTE: this is only meaningful if entry_valid == true */
entry.translated_addr = vtd_get_slpte_addr(slpte);
entry.addr_mask = ~subpage_mask;
entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
if (!entry_valid && !notify_unmap) {
trace_vtd_page_walk_skip_perm(iova, iova_next);
goto next;
}
trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
entry.addr_mask, entry.perm);
if (hook_fn) {
ret = hook_fn(&entry, private);
if (ret < 0) {
return ret;
}
}
} else {
if (!entry_valid) {
trace_vtd_page_walk_skip_perm(iova, iova_next);
goto next;
}
ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte), iova,
MIN(iova_next, end), hook_fn, private,
level - 1, read_cur, write_cur,
notify_unmap);
if (ret < 0) {
return ret;
}
entry.addr_mask = ~subpage_mask;
/* NOTE: this is only meaningful if entry_valid == true */
entry.translated_addr = vtd_get_slpte_addr(slpte, info->aw);
ret = vtd_page_walk_one(&entry, info);
}
if (ret < 0) {
return ret;
}
next:
@ -832,27 +944,24 @@ next:
* @ce: context entry to walk upon
* @start: IOVA address to start the walk
* @end: IOVA range end address (start <= addr < end)
* @hook_fn: the hook to be called for each detected area
* @private: private data for the hook function
* @info: page walking information struct
*/
static int vtd_page_walk(VTDContextEntry *ce, uint64_t start, uint64_t end,
vtd_page_walk_hook hook_fn, void *private,
bool notify_unmap)
vtd_page_walk_info *info)
{
dma_addr_t addr = vtd_ce_get_slpt_base(ce);
uint32_t level = vtd_ce_get_level(ce);
if (!vtd_iova_range_check(start, ce)) {
if (!vtd_iova_range_check(start, ce, info->aw)) {
return -VTD_FR_ADDR_BEYOND_MGAW;
}
if (!vtd_iova_range_check(end, ce)) {
if (!vtd_iova_range_check(end, ce, info->aw)) {
/* Fix end so that it reaches the maximum */
end = vtd_iova_limit(ce);
end = vtd_iova_limit(ce, info->aw);
}
return vtd_page_walk_level(addr, start, end, hook_fn, private,
level, true, true, notify_unmap);
return vtd_page_walk_level(addr, start, end, level, true, true, info);
}
/* Map a device to its corresponding domain (context-entry) */
@ -874,7 +983,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
return -VTD_FR_ROOT_ENTRY_P;
}
if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD)) {
if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD(s->aw_bits))) {
trace_vtd_re_invalid(re.rsvd, re.val);
return -VTD_FR_ROOT_ENTRY_RSVD;
}
@ -891,7 +1000,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
}
if ((ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI) ||
(ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO)) {
(ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO(s->aw_bits))) {
trace_vtd_ce_invalid(ce->hi, ce->lo);
return -VTD_FR_CONTEXT_ENTRY_RSVD;
}
@ -911,6 +1020,58 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
return 0;
}
static int vtd_sync_shadow_page_hook(IOMMUTLBEntry *entry,
void *private)
{
memory_region_notify_iommu((IOMMUMemoryRegion *)private, *entry);
return 0;
}
/* If context entry is NULL, we'll try to fetch it on our own. */
static int vtd_sync_shadow_page_table_range(VTDAddressSpace *vtd_as,
VTDContextEntry *ce,
hwaddr addr, hwaddr size)
{
IntelIOMMUState *s = vtd_as->iommu_state;
vtd_page_walk_info info = {
.hook_fn = vtd_sync_shadow_page_hook,
.private = (void *)&vtd_as->iommu,
.notify_unmap = true,
.aw = s->aw_bits,
.as = vtd_as,
};
VTDContextEntry ce_cache;
int ret;
if (ce) {
/* If the caller provided context entry, use it */
ce_cache = *ce;
} else {
/* If the caller didn't provide ce, try to fetch */
ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
vtd_as->devfn, &ce_cache);
if (ret) {
/*
* This should not really happen, but if it does we just
* skip the sync this time. After all, we don't even
* have the root table pointer!
*/
trace_vtd_err("Detected invalid context entry when "
"trying to sync shadow page table");
return 0;
}
}
info.domain_id = VTD_CONTEXT_ENTRY_DID(ce_cache.hi);
return vtd_page_walk(&ce_cache, addr, addr + size, &info);
}
static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
{
return vtd_sync_shadow_page_table_range(vtd_as, NULL, 0, UINT64_MAX);
}
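The vtd_page_walk_info struct used throughout these hunks is defined outside the excerpt; from the initializers its shape is roughly the following (a reconstruction from usage with stand-in types, not the verbatim definition):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct IOMMUTLBEntry IOMMUTLBEntry;       /* opaque stand-in */
    typedef struct VTDAddressSpace VTDAddressSpace;   /* opaque stand-in */
    typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);

    /* Sketch reconstructed from the initializers in this diff. */
    typedef struct {
        vtd_page_walk_hook hook_fn;   /* called for each detected entry      */
        void *private;                /* opaque data handed to hook_fn       */
        bool notify_unmap;            /* also report invalid (unmapped) runs */
        uint8_t aw;                   /* address width in bits (39 or 48)    */
        VTDAddressSpace *as;          /* address space being walked          */
        uint16_t domain_id;           /* domain the walk belongs to          */
    } vtd_page_walk_info;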
/*
* Fetch translation type for specific device. Returns <0 if error
* happens, otherwise return the shifted type to check against
@ -1092,7 +1253,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
IntelIOMMUState *s = vtd_as->iommu_state;
VTDContextEntry ce;
uint8_t bus_num = pci_bus_num(bus);
VTDContextCacheEntry *cc_entry = &vtd_as->context_cache_entry;
VTDContextCacheEntry *cc_entry;
uint64_t slpte, page_mask;
uint32_t level;
uint16_t source_id = vtd_make_source_id(bus_num, devfn);
@ -1109,6 +1270,10 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
*/
assert(!vtd_is_interrupt_addr(addr));
vtd_iommu_lock(s);
cc_entry = &vtd_as->context_cache_entry;
/* Try to fetch slpte from IOTLB */
iotlb_entry = vtd_lookup_iotlb(s, source_id, addr);
if (iotlb_entry) {
@ -1168,12 +1333,12 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
* IOMMU region can be swapped back.
*/
vtd_pt_enable_fast_path(s, source_id);
vtd_iommu_unlock(s);
return true;
}
ret_fr = vtd_iova_to_slpte(&ce, addr, is_write, &slpte, &level,
&reads, &writes);
&reads, &writes, s->aw_bits);
if (ret_fr) {
ret_fr = -ret_fr;
if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
@ -1189,13 +1354,15 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
vtd_update_iotlb(s, source_id, VTD_CONTEXT_ENTRY_DID(ce.hi), addr, slpte,
access_flags, level);
out:
vtd_iommu_unlock(s);
entry->iova = addr & page_mask;
entry->translated_addr = vtd_get_slpte_addr(slpte) & page_mask;
entry->translated_addr = vtd_get_slpte_addr(slpte, s->aw_bits) & page_mask;
entry->addr_mask = ~page_mask;
entry->perm = access_flags;
return true;
error:
vtd_iommu_unlock(s);
entry->iova = 0;
entry->translated_addr = 0;
entry->addr_mask = 0;
@ -1207,7 +1374,7 @@ static void vtd_root_table_setup(IntelIOMMUState *s)
{
s->root = vtd_get_quad_raw(s, DMAR_RTADDR_REG);
s->root_extended = s->root & VTD_RTADDR_RTT;
s->root &= VTD_RTADDR_ADDR_MASK;
s->root &= VTD_RTADDR_ADDR_MASK(s->aw_bits);
trace_vtd_reg_dmar_root(s->root, s->root_extended);
}
@ -1223,7 +1390,7 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
uint64_t value = 0;
value = vtd_get_quad_raw(s, DMAR_IRTA_REG);
s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
s->intr_root = value & VTD_IRTA_ADDR_MASK;
s->intr_root = value & VTD_IRTA_ADDR_MASK(s->aw_bits);
s->intr_eime = value & VTD_IRTA_EIME;
/* Notify global invalidation */
@ -1234,20 +1401,23 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
static void vtd_iommu_replay_all(IntelIOMMUState *s)
{
IntelIOMMUNotifierNode *node;
VTDAddressSpace *vtd_as;
QLIST_FOREACH(node, &s->notifiers_list, next) {
memory_region_iommu_replay_all(&node->vtd_as->iommu);
QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
vtd_sync_shadow_page_table(vtd_as);
}
}
static void vtd_context_global_invalidate(IntelIOMMUState *s)
{
trace_vtd_inv_desc_cc_global();
/* Protects context cache */
vtd_iommu_lock(s);
s->context_cache_gen++;
if (s->context_cache_gen == VTD_CONTEXT_CACHE_GEN_MAX) {
vtd_reset_context_cache(s);
vtd_reset_context_cache_locked(s);
}
vtd_iommu_unlock(s);
vtd_switch_address_space_all(s);
/*
* From VT-d spec 6.5.2.1, a global context entry invalidation
@ -1299,7 +1469,9 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
if (vtd_as && ((devfn_it & mask) == (devfn & mask))) {
trace_vtd_inv_desc_cc_device(bus_n, VTD_PCI_SLOT(devfn_it),
VTD_PCI_FUNC(devfn_it));
vtd_iommu_lock(s);
vtd_as->context_cache_entry.context_cache_gen = 0;
vtd_iommu_unlock(s);
/*
* Do switch address space when needed, in case if the
* device passthrough bit is switched.
@ -1307,14 +1479,13 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
vtd_switch_address_space(vtd_as);
/*
* So a device is moving out of (or into) a
* domain; a replay() is suitable here to notify all
* registered IOMMU_NOTIFIER_MAP notifiers of this change.
* domain; resync the shadow page table.
* This does no harm even if we have no such
* notifier registered - the IOMMU notification
* framework will skip MAP notifications in that
* case.
*/
memory_region_iommu_replay_all(&vtd_as->iommu);
vtd_sync_shadow_page_table(vtd_as);
}
}
}
@ -1358,48 +1529,60 @@ static void vtd_iotlb_global_invalidate(IntelIOMMUState *s)
static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
{
IntelIOMMUNotifierNode *node;
VTDContextEntry ce;
VTDAddressSpace *vtd_as;
trace_vtd_inv_desc_iotlb_domain(domain_id);
vtd_iommu_lock(s);
g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_domain,
&domain_id);
vtd_iommu_unlock(s);
QLIST_FOREACH(node, &s->notifiers_list, next) {
vtd_as = node->vtd_as;
QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
vtd_as->devfn, &ce) &&
domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
memory_region_iommu_replay_all(&vtd_as->iommu);
vtd_sync_shadow_page_table(vtd_as);
}
}
}
static int vtd_page_invalidate_notify_hook(IOMMUTLBEntry *entry,
void *private)
{
memory_region_notify_iommu((IOMMUMemoryRegion *)private, *entry);
return 0;
}
static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
uint16_t domain_id, hwaddr addr,
uint8_t am)
{
IntelIOMMUNotifierNode *node;
VTDAddressSpace *vtd_as;
VTDContextEntry ce;
int ret;
hwaddr size = (1 << am) * VTD_PAGE_SIZE;
QLIST_FOREACH(node, &(s->notifiers_list), next) {
VTDAddressSpace *vtd_as = node->vtd_as;
QLIST_FOREACH(vtd_as, &(s->vtd_as_with_notifiers), next) {
ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
vtd_as->devfn, &ce);
if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
vtd_page_walk(&ce, addr, addr + (1 << am) * VTD_PAGE_SIZE,
vtd_page_invalidate_notify_hook,
(void *)&vtd_as->iommu, true);
if (vtd_as_has_map_notifier(vtd_as)) {
/*
* As long as we have MAP notifications registered in
* any of our IOMMU notifiers, we need to sync the
* shadow page table.
*/
vtd_sync_shadow_page_table_range(vtd_as, &ce, addr, size);
} else {
/*
* For UNMAP-only notifiers, we don't need to walk the
* page tables. We just deliver the PSI down to
* invalidate caches.
*/
IOMMUTLBEntry entry = {
.target_as = &address_space_memory,
.iova = addr,
.translated_addr = 0,
.addr_mask = size - 1,
.perm = IOMMU_NONE,
};
memory_region_notify_iommu(&vtd_as->iommu, entry);
}
}
}
}
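For reference, am here is a page-order count: a page-selective invalidation with address mask am spans 2^am pages. A tiny illustration, assuming the usual 4 KiB VTD_PAGE_SIZE:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #define VTD_PAGE_SIZE 4096ULL   /* assumed; the usual 4 KiB page */

    int main(void)
    {
        uint8_t am = 2;                                /* invalidate 2^2 pages */
        uint64_t size = (1ULL << am) * VTD_PAGE_SIZE;  /* 16384 bytes */
        printf("size 0x%" PRIx64 ", addr_mask 0x%" PRIx64 "\n",
               size, size - 1);                        /* 0x4000, 0x3fff */
        return 0;
    }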
@ -1415,7 +1598,9 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
info.domain_id = domain_id;
info.addr = addr;
info.mask = ~((1 << am) - 1);
vtd_iommu_lock(s);
g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
vtd_iommu_unlock(s);
vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
}
@ -1479,7 +1664,7 @@ static void vtd_handle_gcmd_qie(IntelIOMMUState *s, bool en)
trace_vtd_inv_qi_enable(en);
if (en) {
s->iq = iqa_val & VTD_IQA_IQA_MASK;
s->iq = iqa_val & VTD_IQA_IQA_MASK(s->aw_bits);
/* 2^(x+8) entries */
s->iq_size = 1UL << ((iqa_val & VTD_IQA_QS) + 8);
s->qi_enabled = true;
@ -2323,8 +2508,6 @@ static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
{
VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
IntelIOMMUState *s = vtd_as->iommu_state;
IntelIOMMUNotifierNode *node = NULL;
IntelIOMMUNotifierNode *next_node = NULL;
if (!s->caching_mode && new & IOMMU_NOTIFIER_MAP) {
error_report("We need to set cache_mode=1 for intel-iommu to enable "
@ -2332,22 +2515,13 @@ static void vtd_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
exit(1);
}
if (old == IOMMU_NOTIFIER_NONE) {
node = g_malloc0(sizeof(*node));
node->vtd_as = vtd_as;
QLIST_INSERT_HEAD(&s->notifiers_list, node, next);
return;
}
/* Update per-address-space notifier flags */
vtd_as->notifier_flags = new;
/* update notifier node with new flags */
QLIST_FOREACH_SAFE(node, &s->notifiers_list, next, next_node) {
if (node->vtd_as == vtd_as) {
if (new == IOMMU_NOTIFIER_NONE) {
QLIST_REMOVE(node, next);
g_free(node);
}
return;
}
if (old == IOMMU_NOTIFIER_NONE) {
QLIST_INSERT_HEAD(&s->vtd_as_with_notifiers, vtd_as, next);
} else if (new == IOMMU_NOTIFIER_NONE) {
QLIST_REMOVE(vtd_as, next);
}
}
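The simplification above keeps an address space on the vtd_as_with_notifiers list exactly while it has any notifier flags set, instead of allocating tracking nodes. The transition logic in isolation (a sketch using <sys/queue.h> as a stand-in for QEMU's QLIST macros):

    #include <stdio.h>
    #include <sys/queue.h>

    /* Stand-in type; in QEMU this is VTDAddressSpace with a QLIST_ENTRY. */
    typedef struct AS {
        int notifier_flags;
        LIST_ENTRY(AS) next;
    } AS;

    static LIST_HEAD(, AS) with_notifiers = LIST_HEAD_INITIALIZER(with_notifiers);

    static void notify_flag_changed(AS *as, int old, int new_flags)
    {
        as->notifier_flags = new_flags;
        if (old == 0) {                 /* IOMMU_NOTIFIER_NONE -> something */
            LIST_INSERT_HEAD(&with_notifiers, as, next);
        } else if (new_flags == 0) {    /* something -> IOMMU_NOTIFIER_NONE */
            LIST_REMOVE(as, next);
        }
    }

    int main(void)
    {
        AS as = { 0 };
        notify_flag_changed(&as, 0, 1);   /* registered: joins the list  */
        notify_flag_changed(&as, 1, 0);   /* unregistered: leaves again  */
        printf("list empty: %d\n", LIST_EMPTY(&with_notifiers));
        return 0;
    }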
@ -2410,6 +2584,8 @@ static Property vtd_properties[] = {
DEFINE_PROP_ON_OFF_AUTO("eim", IntelIOMMUState, intr_eim,
ON_OFF_AUTO_AUTO),
DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
DEFINE_PROP_UINT8("x-aw-bits", IntelIOMMUState, aw_bits,
VTD_HOST_ADDRESS_WIDTH),
DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
DEFINE_PROP_END_OF_LIST(),
};
@ -2714,6 +2890,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
vtd_dev_as->devfn = (uint8_t)devfn;
vtd_dev_as->iommu_state = s;
vtd_dev_as->context_cache_entry.context_cache_gen = 0;
vtd_dev_as->iova_tree = iova_tree_new();
/*
* Memory region relationships looks like (Address range shows
@ -2765,6 +2942,8 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
hwaddr size;
hwaddr start = n->start;
hwaddr end = n->end;
IntelIOMMUState *s = as->iommu_state;
DMAMap map;
/*
* Note: all the code in this function assumes that the IOVA
@ -2772,12 +2951,12 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
* VT-d spec), otherwise we need to consider overflow of 64 bits.
*/
if (end > VTD_ADDRESS_SIZE) {
if (end > VTD_ADDRESS_SIZE(s->aw_bits)) {
/*
* No need to unmap regions bigger than the whole
* address space supported by VT-d
*/
end = VTD_ADDRESS_SIZE;
end = VTD_ADDRESS_SIZE(s->aw_bits);
}
assert(start <= end);
@ -2789,9 +2968,9 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
* suit the minimum available mask.
*/
int n = 64 - clz64(size);
if (n > VTD_MGAW) {
if (n > s->aw_bits) {
/* should not happen, but in case it happens, limit it */
n = VTD_MGAW;
n = s->aw_bits;
}
size = 1ULL << n;
}
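The clamping above rounds the region up to a power of two no wider than the configured address width, so it can be described by a single addr_mask. A worked example, with __builtin_clzll standing in for clz64():

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t size = 0x3000;              /* 12 KiB region to unmap   */
        int aw_bits = 39;
        int n = 64 - __builtin_clzll(size);  /* clz64() stand-in: n = 14 */
        if (n > aw_bits) {
            n = aw_bits;                     /* clamp to supported width */
        }
        size = 1ULL << n;                    /* rounded up to 0x4000     */
        printf("mask-friendly size 0x%llx\n", (unsigned long long)size);
        return 0;
    }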
@ -2809,17 +2988,19 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
VTD_PCI_FUNC(as->devfn),
entry.iova, size);
map.iova = entry.iova;
map.size = entry.addr_mask;
iova_tree_remove(as->iova_tree, &map);
memory_region_notify_one(n, &entry);
}
static void vtd_address_space_unmap_all(IntelIOMMUState *s)
{
IntelIOMMUNotifierNode *node;
VTDAddressSpace *vtd_as;
IOMMUNotifier *n;
QLIST_FOREACH(node, &s->notifiers_list, next) {
vtd_as = node->vtd_as;
QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
IOMMU_NOTIFIER_FOREACH(n, &vtd_as->iommu) {
vtd_address_space_unmap(vtd_as, n);
}
@ -2851,7 +3032,19 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
PCI_FUNC(vtd_as->devfn),
VTD_CONTEXT_ENTRY_DID(ce.hi),
ce.hi, ce.lo);
vtd_page_walk(&ce, 0, ~0ULL, vtd_replay_hook, (void *)n, false);
if (vtd_as_has_map_notifier(vtd_as)) {
/* This is required only for MAP-typed notifiers */
vtd_page_walk_info info = {
.hook_fn = vtd_replay_hook,
.private = (void *)n,
.notify_unmap = false,
.aw = s->aw_bits,
.as = vtd_as,
.domain_id = VTD_CONTEXT_ENTRY_DID(ce.hi),
};
vtd_page_walk(&ce, 0, ~0ULL, &info);
}
} else {
trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
PCI_FUNC(vtd_as->devfn));
@ -2882,10 +3075,27 @@ static void vtd_init(IntelIOMMUState *s)
s->qi_enabled = false;
s->iq_last_desc_type = VTD_INV_DESC_NONE;
s->next_frcd_reg = 0;
s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | VTD_CAP_MGAW |
VTD_CAP_SAGAW | VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS;
s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits);
if (s->aw_bits == VTD_HOST_AW_48BIT) {
s->cap |= VTD_CAP_SAGAW_48bit;
}
s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
/*
* Rsvd field masks for spte
*/
vtd_paging_entry_rsvd_field[0] = ~0ULL;
vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits);
vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits);
vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits);
vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits);
vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits);
if (x86_iommu->intr_supported) {
s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV;
if (s->intr_eim == ON_OFF_AUTO_ON) {
@ -2906,8 +3116,10 @@ static void vtd_init(IntelIOMMUState *s)
s->cap |= VTD_CAP_CM;
}
vtd_reset_context_cache(s);
vtd_reset_iotlb(s);
vtd_iommu_lock(s);
vtd_reset_context_cache_locked(s);
vtd_reset_iotlb_locked(s);
vtd_iommu_unlock(s);
/* Define registers with default values and bit semantics */
vtd_define_long(s, DMAR_VER_REG, 0x10UL, 0, 0);
@ -3021,6 +3233,14 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
}
}
/* Currently only address widths supported are 39 and 48 bits */
if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
(s->aw_bits != VTD_HOST_AW_48BIT)) {
error_setg(errp, "Supported values for x-aw-bits are: %d, %d",
VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT);
return false;
}
return true;
}
@ -3047,7 +3267,8 @@ static void vtd_realize(DeviceState *dev, Error **errp)
return;
}
QLIST_INIT(&s->notifiers_list);
QLIST_INIT(&s->vtd_as_with_notifiers);
qemu_mutex_init(&s->iommu_lock);
memset(s->vtd_as_by_bus_num, 0, sizeof(s->vtd_as_by_bus_num));
memory_region_init_io(&s->csrmem, OBJECT(s), &vtd_mem_ops, s,
"intel_iommu", DMAR_REG_SIZE);


@ -131,7 +131,7 @@
#define VTD_TLB_DID(val) (((val) >> 32) & VTD_DOMAIN_ID_MASK)
/* IVA_REG */
#define VTD_IVA_ADDR(val) ((val) & ~0xfffULL & ((1ULL << VTD_MGAW) - 1))
#define VTD_IVA_ADDR(val) ((val) & ~0xfffULL)
#define VTD_IVA_AM(val) ((val) & 0x3fULL)
/* GCMD_REG */
@ -172,10 +172,10 @@
/* RTADDR_REG */
#define VTD_RTADDR_RTT (1ULL << 11)
#define VTD_RTADDR_ADDR_MASK (VTD_HAW_MASK ^ 0xfffULL)
#define VTD_RTADDR_ADDR_MASK(aw) (VTD_HAW_MASK(aw) ^ 0xfffULL)
/* IRTA_REG */
#define VTD_IRTA_ADDR_MASK (VTD_HAW_MASK ^ 0xfffULL)
#define VTD_IRTA_ADDR_MASK(aw) (VTD_HAW_MASK(aw) ^ 0xfffULL)
#define VTD_IRTA_EIME (1ULL << 11)
#define VTD_IRTA_SIZE_MASK (0xfULL)
@ -197,9 +197,8 @@
#define VTD_DOMAIN_ID_SHIFT 16 /* 16-bit domain id for 64K domains */
#define VTD_DOMAIN_ID_MASK ((1UL << VTD_DOMAIN_ID_SHIFT) - 1)
#define VTD_CAP_ND (((VTD_DOMAIN_ID_SHIFT - 4) / 2) & 7ULL)
#define VTD_MGAW 39 /* Maximum Guest Address Width */
#define VTD_ADDRESS_SIZE (1ULL << VTD_MGAW)
#define VTD_CAP_MGAW (((VTD_MGAW - 1) & 0x3fULL) << 16)
#define VTD_ADDRESS_SIZE(aw) (1ULL << (aw))
#define VTD_CAP_MGAW(aw) ((((aw) - 1) & 0x3fULL) << 16)
#define VTD_MAMV 18ULL
#define VTD_CAP_MAMV (VTD_MAMV << 48)
#define VTD_CAP_PSI (1ULL << 39)
@ -213,13 +212,12 @@
#define VTD_CAP_SAGAW_39bit (0x2ULL << VTD_CAP_SAGAW_SHIFT)
/* 48-bit AGAW, 4-level page-table */
#define VTD_CAP_SAGAW_48bit (0x4ULL << VTD_CAP_SAGAW_SHIFT)
#define VTD_CAP_SAGAW VTD_CAP_SAGAW_39bit
/* IQT_REG */
#define VTD_IQT_QT(val) (((val) >> 4) & 0x7fffULL)
/* IQA_REG */
#define VTD_IQA_IQA_MASK (VTD_HAW_MASK ^ 0xfffULL)
#define VTD_IQA_IQA_MASK(aw) (VTD_HAW_MASK(aw) ^ 0xfffULL)
#define VTD_IQA_QS 0x7ULL
/* IQH_REG */
@ -252,7 +250,7 @@
#define VTD_FRCD_SID_MASK 0xffffULL
#define VTD_FRCD_SID(val) ((val) & VTD_FRCD_SID_MASK)
/* For the low 64-bit of 128-bit */
#define VTD_FRCD_FI(val) ((val) & (((1ULL << VTD_MGAW) - 1) ^ 0xfffULL))
#define VTD_FRCD_FI(val) ((val) & ~0xfffULL)
/* DMA Remapping Fault Conditions */
typedef enum VTDFaultReason {
@ -360,8 +358,7 @@ typedef union VTDInvDesc VTDInvDesc;
#define VTD_INV_DESC_IOTLB_DOMAIN (2ULL << 4)
#define VTD_INV_DESC_IOTLB_PAGE (3ULL << 4)
#define VTD_INV_DESC_IOTLB_DID(val) (((val) >> 16) & VTD_DOMAIN_ID_MASK)
#define VTD_INV_DESC_IOTLB_ADDR(val) ((val) & ~0xfffULL & \
((1ULL << VTD_MGAW) - 1))
#define VTD_INV_DESC_IOTLB_ADDR(val) ((val) & ~0xfffULL)
#define VTD_INV_DESC_IOTLB_AM(val) ((val) & 0x3fULL)
#define VTD_INV_DESC_IOTLB_RSVD_LO 0xffffffff0000ff00ULL
#define VTD_INV_DESC_IOTLB_RSVD_HI 0xf80ULL
@ -373,6 +370,24 @@ typedef union VTDInvDesc VTDInvDesc;
#define VTD_INV_DESC_DEVICE_IOTLB_RSVD_HI 0xffeULL
#define VTD_INV_DESC_DEVICE_IOTLB_RSVD_LO 0xffff0000ffe0fff8
/* Rsvd field masks for spte */
#define VTD_SPTE_PAGE_L1_RSVD_MASK(aw) \
(0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
#define VTD_SPTE_PAGE_L2_RSVD_MASK(aw) \
(0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
#define VTD_SPTE_PAGE_L3_RSVD_MASK(aw) \
(0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
#define VTD_SPTE_PAGE_L4_RSVD_MASK(aw) \
(0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
#define VTD_SPTE_LPAGE_L1_RSVD_MASK(aw) \
(0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
#define VTD_SPTE_LPAGE_L2_RSVD_MASK(aw) \
(0x1ff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
#define VTD_SPTE_LPAGE_L3_RSVD_MASK(aw) \
(0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
#define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \
(0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
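All of these masks build on VTD_HAW_MASK(aw), whose definition is outside this excerpt; assuming the conventional ((1ULL << (aw)) - 1), a mask evaluates like so:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Assumed definition of the helper (not shown in this diff). */
    #define VTD_HAW_MASK(aw)   ((1ULL << (aw)) - 1)
    #define VTD_SL_IGN_COM     0xbff0000000000000ULL
    #define VTD_SPTE_PAGE_L1_RSVD_MASK(aw) \
            (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))

    int main(void)
    {
        /* aw = 39: HAW mask 0x0000007fffffffff, so every bit above bit 38
         * that is not an ignored/common bit is reserved, plus bit 11. */
        printf("0x%016" PRIx64 "\n",
               (uint64_t)VTD_SPTE_PAGE_L1_RSVD_MASK(39)); /* 0x400fff8000000800 */
        return 0;
    }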
/* Information about page-selective IOTLB invalidate */
struct VTDIOTLBPageInvInfo {
uint16_t domain_id;
@ -403,7 +418,7 @@ typedef struct VTDRootEntry VTDRootEntry;
#define VTD_ROOT_ENTRY_CTP (~0xfffULL)
#define VTD_ROOT_ENTRY_NR (VTD_PAGE_SIZE / sizeof(VTDRootEntry))
#define VTD_ROOT_ENTRY_RSVD (0xffeULL | ~VTD_HAW_MASK)
#define VTD_ROOT_ENTRY_RSVD(aw) (0xffeULL | ~VTD_HAW_MASK(aw))
/* Masks for struct VTDContextEntry */
/* lo */
@ -415,7 +430,7 @@ typedef struct VTDRootEntry VTDRootEntry;
#define VTD_CONTEXT_TT_PASS_THROUGH (2ULL << 2)
/* Second Level Page Translation Pointer*/
#define VTD_CONTEXT_ENTRY_SLPTPTR (~0xfffULL)
#define VTD_CONTEXT_ENTRY_RSVD_LO (0xff0ULL | ~VTD_HAW_MASK)
#define VTD_CONTEXT_ENTRY_RSVD_LO(aw) (0xff0ULL | ~VTD_HAW_MASK(aw))
/* hi */
#define VTD_CONTEXT_ENTRY_AW 7ULL /* Adjusted guest-address-width */
#define VTD_CONTEXT_ENTRY_DID(val) (((val) >> 8) & VTD_DOMAIN_ID_MASK)
@ -439,7 +454,7 @@ typedef struct VTDRootEntry VTDRootEntry;
#define VTD_SL_RW_MASK 3ULL
#define VTD_SL_R 1ULL
#define VTD_SL_W (1ULL << 1)
#define VTD_SL_PT_BASE_ADDR_MASK (~(VTD_PAGE_SIZE - 1) & VTD_HAW_MASK)
#define VTD_SL_PT_BASE_ADDR_MASK(aw) (~(VTD_PAGE_SIZE - 1) & VTD_HAW_MASK(aw))
#define VTD_SL_IGN_COM 0xbff0000000000000ULL
#endif


@ -31,12 +31,13 @@
#include "hw/loader.h"
#include "elf.h"
#include "sysemu/sysemu.h"
#include "qemu/error-report.h"
/* Show multiboot debug output */
//#define DEBUG_MULTIBOOT
#ifdef DEBUG_MULTIBOOT
#define mb_debug(a...) fprintf(stderr, ## a)
#define mb_debug(a...) error_report(a)
#else
#define mb_debug(a...)
#endif
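This macro change is why the call sites below drop their trailing "\n": error_report() terminates each message itself, so keeping the old newlines would produce blank lines. A minimal sketch of the pattern, with a stand-in error_report():

    #include <stdarg.h>
    #include <stdio.h>

    /* Stand-in for QEMU's error_report(): appends the newline itself. */
    static void error_report(const char *fmt, ...)
    {
        va_list ap;
        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
        fputc('\n', stderr);
    }

    #define DEBUG_MULTIBOOT
    #ifdef DEBUG_MULTIBOOT
    #define mb_debug(a...) error_report(a)   /* GNU named-variadic, as upstream */
    #else
    #define mb_debug(a...)
    #endif

    int main(void)
    {
        mb_debug("multiboot: load_addr = %#x", 0x100000);  /* no trailing \n */
        return 0;
    }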
@ -137,7 +138,7 @@ static void mb_add_mod(MultibootState *s,
stl_p(p + MB_MOD_END, end);
stl_p(p + MB_MOD_CMDLINE, cmdline_phys);
mb_debug("mod%02d: "TARGET_FMT_plx" - "TARGET_FMT_plx"\n",
mb_debug("mod%02d: "TARGET_FMT_plx" - "TARGET_FMT_plx,
s->mb_mods_count, start, end);
s->mb_mods_count++;
@ -179,12 +180,12 @@ int load_multiboot(FWCfgState *fw_cfg,
if (!is_multiboot)
return 0; /* no multiboot */
mb_debug("qemu: I believe we found a multiboot image!\n");
mb_debug("qemu: I believe we found a multiboot image!");
memset(bootinfo, 0, sizeof(bootinfo));
memset(&mbs, 0, sizeof(mbs));
if (flags & 0x00000004) { /* MULTIBOOT_HEADER_HAS_VBE */
fprintf(stderr, "qemu: multiboot knows VBE. we don't.\n");
error_report("qemu: multiboot knows VBE. we don't.");
}
if (!(flags & 0x00010000)) { /* MULTIBOOT_HEADER_HAS_ADDR */
uint64_t elf_entry;
@ -193,7 +194,7 @@ int load_multiboot(FWCfgState *fw_cfg,
fclose(f);
if (((struct elf64_hdr*)header)->e_machine == EM_X86_64) {
fprintf(stderr, "Cannot load x86-64 image, give a 32bit one.\n");
error_report("Cannot load x86-64 image, give a 32bit one.");
exit(1);
}
@ -201,7 +202,7 @@ int load_multiboot(FWCfgState *fw_cfg,
&elf_low, &elf_high, 0, I386_ELF_MACHINE,
0, 0);
if (kernel_size < 0) {
fprintf(stderr, "Error while loading elf kernel\n");
error_report("Error while loading elf kernel");
exit(1);
}
mh_load_addr = elf_low;
@ -210,12 +211,13 @@ int load_multiboot(FWCfgState *fw_cfg,
mbs.mb_buf = g_malloc(mb_kernel_size);
if (rom_copy(mbs.mb_buf, mh_load_addr, mb_kernel_size) != mb_kernel_size) {
fprintf(stderr, "Error while fetching elf kernel from rom\n");
error_report("Error while fetching elf kernel from rom");
exit(1);
}
mb_debug("qemu: loading multiboot-elf kernel (%#x bytes) with entry %#zx\n",
mb_kernel_size, (size_t)mh_entry_addr);
mb_debug("qemu: loading multiboot-elf kernel "
"(%#x bytes) with entry %#zx",
mb_kernel_size, (size_t)mh_entry_addr);
} else {
/* Valid if mh_flags sets MULTIBOOT_HEADER_HAS_ADDR. */
uint32_t mh_header_addr = ldl_p(header+i+12);
@ -224,7 +226,11 @@ int load_multiboot(FWCfgState *fw_cfg,
mh_load_addr = ldl_p(header+i+16);
if (mh_header_addr < mh_load_addr) {
fprintf(stderr, "invalid mh_load_addr address\n");
error_report("invalid load_addr address");
exit(1);
}
if (mh_header_addr - mh_load_addr > i) {
error_report("invalid header_addr address");
exit(1);
}
@ -233,43 +239,43 @@ int load_multiboot(FWCfgState *fw_cfg,
mh_entry_addr = ldl_p(header+i+28);
if (mh_load_end_addr) {
if (mh_bss_end_addr < mh_load_addr) {
fprintf(stderr, "invalid mh_bss_end_addr address\n");
exit(1);
}
mb_kernel_size = mh_bss_end_addr - mh_load_addr;
if (mh_load_end_addr < mh_load_addr) {
fprintf(stderr, "invalid mh_load_end_addr address\n");
error_report("invalid load_end_addr address");
exit(1);
}
mb_load_size = mh_load_end_addr - mh_load_addr;
} else {
if (kernel_file_size < mb_kernel_text_offset) {
fprintf(stderr, "invalid kernel_file_size\n");
error_report("invalid kernel_file_size");
exit(1);
}
mb_kernel_size = kernel_file_size - mb_kernel_text_offset;
mb_load_size = mb_kernel_size;
mb_load_size = kernel_file_size - mb_kernel_text_offset;
}
if (mb_load_size > UINT32_MAX - mh_load_addr) {
error_report("kernel does not fit in address space");
exit(1);
}
if (mh_bss_end_addr) {
if (mh_bss_end_addr < (mh_load_addr + mb_load_size)) {
error_report("invalid bss_end_addr address");
exit(1);
}
mb_kernel_size = mh_bss_end_addr - mh_load_addr;
} else {
mb_kernel_size = mb_load_size;
}
/* Valid if mh_flags sets MULTIBOOT_HEADER_HAS_VBE.
uint32_t mh_mode_type = ldl_p(header+i+32);
uint32_t mh_width = ldl_p(header+i+36);
uint32_t mh_height = ldl_p(header+i+40);
uint32_t mh_depth = ldl_p(header+i+44); */
mb_debug("multiboot: mh_header_addr = %#x\n", mh_header_addr);
mb_debug("multiboot: mh_load_addr = %#x\n", mh_load_addr);
mb_debug("multiboot: mh_load_end_addr = %#x\n", mh_load_end_addr);
mb_debug("multiboot: mh_bss_end_addr = %#x\n", mh_bss_end_addr);
mb_debug("qemu: loading multiboot kernel (%#x bytes) at %#x\n",
mb_debug("multiboot: header_addr = %#x", mh_header_addr);
mb_debug("multiboot: load_addr = %#x", mh_load_addr);
mb_debug("multiboot: load_end_addr = %#x", mh_load_end_addr);
mb_debug("multiboot: bss_end_addr = %#x", mh_bss_end_addr);
mb_debug("qemu: loading multiboot kernel (%#x bytes) at %#x",
mb_load_size, mh_load_addr);
mbs.mb_buf = g_malloc(mb_kernel_size);
fseek(f, mb_kernel_text_offset, SEEK_SET);
if (fread(mbs.mb_buf, 1, mb_load_size, f) != mb_load_size) {
fprintf(stderr, "fread() failed\n");
error_report("fread() failed");
exit(1);
}
memset(mbs.mb_buf + mb_load_size, 0, mb_kernel_size - mb_load_size);
@ -323,10 +329,10 @@ int load_multiboot(FWCfgState *fw_cfg,
hwaddr c = mb_add_cmdline(&mbs, tmpbuf);
if ((next_space = strchr(tmpbuf, ' ')))
*next_space = '\0';
mb_debug("multiboot loading module: %s\n", tmpbuf);
mb_debug("multiboot loading module: %s", tmpbuf);
mb_mod_length = get_image_size(tmpbuf);
if (mb_mod_length < 0) {
fprintf(stderr, "Failed to open file '%s'\n", tmpbuf);
error_report("Failed to open file '%s'", tmpbuf);
exit(1);
}
@ -337,7 +343,7 @@ int load_multiboot(FWCfgState *fw_cfg,
mb_add_mod(&mbs, mbs.mb_buf_phys + offs,
mbs.mb_buf_phys + offs + mb_mod_length, c);
mb_debug("mod_start: %p\nmod_end: %p\n cmdline: "TARGET_FMT_plx"\n",
mb_debug("mod_start: %p\nmod_end: %p\n cmdline: "TARGET_FMT_plx,
(char *)mbs.mb_buf + offs,
(char *)mbs.mb_buf + offs + mb_mod_length, c);
initrd_filename = next_initrd+1;
@ -365,10 +371,11 @@ int load_multiboot(FWCfgState *fw_cfg,
stl_p(bootinfo + MBI_BOOT_DEVICE, 0x8000ffff); /* XXX: use the -boot switch? */
stl_p(bootinfo + MBI_MMAP_ADDR, ADDR_E820_MAP);
mb_debug("multiboot: mh_entry_addr = %#x\n", mh_entry_addr);
mb_debug(" mb_buf_phys = "TARGET_FMT_plx"\n", mbs.mb_buf_phys);
mb_debug(" mod_start = "TARGET_FMT_plx"\n", mbs.mb_buf_phys + mbs.offset_mods);
mb_debug(" mb_mods_count = %d\n", mbs.mb_mods_count);
mb_debug("multiboot: entry_addr = %#x", mh_entry_addr);
mb_debug(" mb_buf_phys = "TARGET_FMT_plx, mbs.mb_buf_phys);
mb_debug(" mod_start = "TARGET_FMT_plx,
mbs.mb_buf_phys + mbs.offset_mods);
mb_debug(" mb_mods_count = %d", mbs.mb_mods_count);
/* save bootinfo off the stack */
mb_bootinfo_data = g_memdup(bootinfo, sizeof(bootinfo));


@ -39,9 +39,10 @@ vtd_fault_disabled(void) "Fault processing disabled for context entry"
vtd_replay_ce_valid(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t domain, uint64_t hi, uint64_t lo) "replay valid context device %02"PRIx8":%02"PRIx8".%02"PRIx8" domain 0x%"PRIx16" hi 0x%"PRIx64" lo 0x%"PRIx64
vtd_replay_ce_invalid(uint8_t bus, uint8_t dev, uint8_t fn) "replay invalid context device %02"PRIx8":%02"PRIx8".%02"PRIx8
vtd_page_walk_level(uint64_t addr, uint32_t level, uint64_t start, uint64_t end) "walk (base=0x%"PRIx64", level=%"PRIu32") iova range 0x%"PRIx64" - 0x%"PRIx64
vtd_page_walk_one(uint32_t level, uint64_t iova, uint64_t gpa, uint64_t mask, int perm) "detected page level 0x%"PRIx32" iova 0x%"PRIx64" -> gpa 0x%"PRIx64" mask 0x%"PRIx64" perm %d"
vtd_page_walk_one(uint16_t domain, uint64_t iova, uint64_t gpa, uint64_t mask, int perm) "domain 0x%"PRIx16" iova 0x%"PRIx64" -> gpa 0x%"PRIx64" mask 0x%"PRIx64" perm %d"
vtd_page_walk_one_skip_map(uint64_t iova, uint64_t mask, uint64_t translated) "iova 0x%"PRIx64" mask 0x%"PRIx64" translated 0x%"PRIx64
vtd_page_walk_one_skip_unmap(uint64_t iova, uint64_t mask) "iova 0x%"PRIx64" mask 0x%"PRIx64
vtd_page_walk_skip_read(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to unable to read"
vtd_page_walk_skip_perm(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to perm empty"
vtd_page_walk_skip_reserve(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to rsrv set"
vtd_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on) "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
vtd_as_unmap_whole(uint8_t bus, uint8_t slot, uint8_t fn, uint64_t iova, uint64_t size) "Device %02x:%02x.%x start 0x%"PRIx64" size 0x%"PRIx64


@ -533,13 +533,6 @@ static void ahci_check_cmd_bh(void *opaque)
qemu_bh_delete(ad->check_bh);
ad->check_bh = NULL;
if ((ad->busy_slot != -1) &&
!(ad->port.ifs[0].status & (BUSY_STAT|DRQ_STAT))) {
/* no longer busy */
ad->port_regs.cmd_issue &= ~(1 << ad->busy_slot);
ad->busy_slot = -1;
}
check_cmd(ad->hba, ad->port_no);
}
@ -1426,6 +1419,12 @@ static void ahci_cmd_done(IDEDMA *dma)
trace_ahci_cmd_done(ad->hba, ad->port_no);
/* no longer busy */
if (ad->busy_slot != -1) {
ad->port_regs.cmd_issue &= ~(1 << ad->busy_slot);
ad->busy_slot = -1;
}
/* update d2h status */
ahci_write_fis_d2h(ad);


@ -1261,7 +1261,8 @@ static MemTxResult gic_cpu_read(GICState *s, int cpu, int offset,
default:
qemu_log_mask(LOG_GUEST_ERROR,
"gic_cpu_read: Bad offset %x\n", (int)offset);
return MEMTX_ERROR;
*data = 0;
break;
}
return MEMTX_OK;
}
@ -1329,7 +1330,7 @@ static MemTxResult gic_cpu_write(GICState *s, int cpu, int offset,
default:
qemu_log_mask(LOG_GUEST_ERROR,
"gic_cpu_write: Bad offset %x\n", (int)offset);
return MEMTX_ERROR;
return MEMTX_OK;
}
gic_update(s);
return MEMTX_OK;


@ -27,6 +27,7 @@
#include "hw/intc/arm_gicv3_common.h"
#include "gicv3_internal.h"
#include "hw/arm/linux-boot-if.h"
#include "sysemu/kvm.h"
static int gicv3_pre_save(void *opaque)
{
@ -141,6 +142,79 @@ static const VMStateDescription vmstate_gicv3_cpu = {
}
};
static int gicv3_gicd_no_migration_shift_bug_pre_load(void *opaque)
{
GICv3State *cs = opaque;
/*
* The gicd_no_migration_shift_bug flag is used for migration compatibility
* with old QEMU versions that may have the GICD bmp shift bug under KVM mode.
* Strictly, what we want to know is whether the migration source is using
* KVM. Since we don't have any way to determine that, we look at whether the
* destination is using KVM; this is close enough because for the older QEMU
* versions with this bug KVM -> TCG migration didn't work anyway. If the
* source is a newer QEMU without this bug it will transmit the migration
* subsection which sets the flag to true; otherwise it will remain set to
* the value we select here.
*/
if (kvm_enabled()) {
cs->gicd_no_migration_shift_bug = false;
}
return 0;
}
static int gicv3_gicd_no_migration_shift_bug_post_load(void *opaque,
int version_id)
{
GICv3State *cs = opaque;
if (cs->gicd_no_migration_shift_bug) {
return 0;
}
/* Older versions of QEMU had a bug in the handling of state save/restore
* to the KVM GICv3: they got the offset in the bitmap arrays wrong,
* so that instead of the data for external interrupts 32 and up
* starting at bit position 32 in the bitmap, it started at bit
* position 64. If we're receiving data from a QEMU with that bug,
* we must move the data down into the right place.
*/
memmove(cs->group, (uint8_t *)cs->group + GIC_INTERNAL / 8,
sizeof(cs->group) - GIC_INTERNAL / 8);
memmove(cs->grpmod, (uint8_t *)cs->grpmod + GIC_INTERNAL / 8,
sizeof(cs->grpmod) - GIC_INTERNAL / 8);
memmove(cs->enabled, (uint8_t *)cs->enabled + GIC_INTERNAL / 8,
sizeof(cs->enabled) - GIC_INTERNAL / 8);
memmove(cs->pending, (uint8_t *)cs->pending + GIC_INTERNAL / 8,
sizeof(cs->pending) - GIC_INTERNAL / 8);
memmove(cs->active, (uint8_t *)cs->active + GIC_INTERNAL / 8,
sizeof(cs->active) - GIC_INTERNAL / 8);
memmove(cs->edge_trigger, (uint8_t *)cs->edge_trigger + GIC_INTERNAL / 8,
sizeof(cs->edge_trigger) - GIC_INTERNAL / 8);
/*
* This version of QEMU does not have this bug (the incoming data was
* fixed up above), so set the flag to true to record that; this is
* necessary for a subsequent migration from this QEMU to work.
*/
cs->gicd_no_migration_shift_bug = true;
return 0;
}
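The memmove fix-up above can be seen in isolation: incoming bitmaps from a buggy source carry the external-IRQ data one GIC_INTERNAL-sized chunk too far in, and each array is shifted down by GIC_INTERNAL / 8 bytes. A toy-sized sketch:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define GIC_INTERNAL 32   /* first 32 IRQs are per-CPU, not in GICD state */

    int main(void)
    {
        /* Incoming (buggy-source) layout: data for external IRQs starts at
         * bit 64 instead of bit 32, i.e. one 4-byte word too far in. */
        uint8_t group[12] = { 0, 0, 0, 0,          /* internal IRQs (unused) */
                              0, 0, 0, 0,          /* erroneous gap          */
                              0xaa, 0xbb, 0xcc, 0xdd };
        /* Move the external-IRQ data down by GIC_INTERNAL / 8 == 4 bytes. */
        memmove(group, (uint8_t *)group + GIC_INTERNAL / 8,
                sizeof(group) - GIC_INTERNAL / 8);
        printf("0x%02x\n", group[4]);  /* now 0xaa: bit 32 onwards is correct */
        return 0;
    }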
const VMStateDescription vmstate_gicv3_gicd_no_migration_shift_bug = {
.name = "arm_gicv3/gicd_no_migration_shift_bug",
.version_id = 1,
.minimum_version_id = 1,
.pre_load = gicv3_gicd_no_migration_shift_bug_pre_load,
.post_load = gicv3_gicd_no_migration_shift_bug_post_load,
.fields = (VMStateField[]) {
VMSTATE_BOOL(gicd_no_migration_shift_bug, GICv3State),
VMSTATE_END_OF_LIST()
}
};
static const VMStateDescription vmstate_gicv3 = {
.name = "arm_gicv3",
.version_id = 1,
@ -165,6 +239,10 @@ static const VMStateDescription vmstate_gicv3 = {
VMSTATE_STRUCT_VARRAY_POINTER_UINT32(cpu, GICv3State, num_cpu,
vmstate_gicv3_cpu, GICv3CPUState),
VMSTATE_END_OF_LIST()
},
.subsections = (const VMStateDescription * []) {
&vmstate_gicv3_gicd_no_migration_shift_bug,
NULL
}
};
@ -364,6 +442,7 @@ static void arm_gicv3_common_reset(DeviceState *dev)
gicv3_gicd_group_set(s, i);
}
}
s->gicd_no_migration_shift_bug = true;
}
static void arm_gic_common_linux_init(ARMLinuxBootIf *obj,


@ -431,7 +431,7 @@ static uint64_t icv_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
{
GICv3CPUState *cs = icc_cs_from_env(env);
int regno = ri->opc2 & 3;
int grp = ri->crm & 1 ? GICV3_G0 : GICV3_G1NS;
int grp = (ri->crm & 1) ? GICV3_G1NS : GICV3_G0;
uint64_t value = cs->ich_apr[grp][regno];
trace_gicv3_icv_ap_read(ri->crm & 1, regno, gicv3_redist_affid(cs), value);
@ -443,7 +443,7 @@ static void icv_ap_write(CPUARMState *env, const ARMCPRegInfo *ri,
{
GICv3CPUState *cs = icc_cs_from_env(env);
int regno = ri->opc2 & 3;
int grp = ri->crm & 1 ? GICV3_G0 : GICV3_G1NS;
int grp = (ri->crm & 1) ? GICV3_G1NS : GICV3_G0;
trace_gicv3_icv_ap_write(ri->crm & 1, regno, gicv3_redist_affid(cs), value);
@ -1465,7 +1465,7 @@ static uint64_t icc_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
uint64_t value;
int regno = ri->opc2 & 3;
int grp = ri->crm & 1 ? GICV3_G0 : GICV3_G1;
int grp = (ri->crm & 1) ? GICV3_G1 : GICV3_G0;
if (icv_access(env, grp == GICV3_G0 ? HCR_FMO : HCR_IMO)) {
return icv_ap_read(env, ri);
@ -1487,7 +1487,7 @@ static void icc_ap_write(CPUARMState *env, const ARMCPRegInfo *ri,
GICv3CPUState *cs = icc_cs_from_env(env);
int regno = ri->opc2 & 3;
int grp = ri->crm & 1 ? GICV3_G0 : GICV3_G1;
int grp = (ri->crm & 1) ? GICV3_G1 : GICV3_G0;
if (icv_access(env, grp == GICV3_G0 ? HCR_FMO : HCR_IMO)) {
icv_ap_write(env, ri, value);
@ -2296,7 +2296,7 @@ static uint64_t ich_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
{
GICv3CPUState *cs = icc_cs_from_env(env);
int regno = ri->opc2 & 3;
int grp = ri->crm & 1 ? GICV3_G0 : GICV3_G1NS;
int grp = (ri->crm & 1) ? GICV3_G1NS : GICV3_G0;
uint64_t value;
value = cs->ich_apr[grp][regno];
@ -2309,7 +2309,7 @@ static void ich_ap_write(CPUARMState *env, const ARMCPRegInfo *ri,
{
GICv3CPUState *cs = icc_cs_from_env(env);
int regno = ri->opc2 & 3;
int grp = ri->crm & 1 ? GICV3_G0 : GICV3_G1NS;
int grp = (ri->crm & 1) ? GICV3_G1NS : GICV3_G0;
trace_gicv3_ich_ap_write(ri->crm & 1, regno, gicv3_redist_affid(cs), value);


@ -817,6 +817,13 @@ MemTxResult gicv3_dist_read(void *opaque, hwaddr offset, uint64_t *data,
"%s: invalid guest read at offset " TARGET_FMT_plx
"size %u\n", __func__, offset, size);
trace_gicv3_dist_badread(offset, size, attrs.secure);
/* The spec requires that reserved registers are RAZ/WI;
* so use MEMTX_ERROR returns from leaf functions as a way to
* trigger the guest-error logging but don't return it to
* the caller, or we'll cause a spurious guest data abort.
*/
r = MEMTX_OK;
*data = 0;
} else {
trace_gicv3_dist_read(offset, *data, size, attrs.secure);
}
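The RAZ/WI convention adopted in these GIC hunks, reduced to its essence: a failed decode logs a guest error and completes the transaction with zero data rather than returning MEMTX_ERROR, which would raise a spurious data abort. A sketch (types and offsets are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    typedef enum { MEMTX_OK = 0, MEMTX_ERROR = 1 } MemTxResult;

    /* Sketch of the RAZ/WI handling pattern used in the hunks above. */
    static MemTxResult dist_read(uint64_t offset, uint64_t *data)
    {
        int decoded = 0;                       /* pretend decode failed */
        if (!decoded) {
            fprintf(stderr, "invalid guest read at offset 0x%llx\n",
                    (unsigned long long)offset);
            *data = 0;                         /* reads-as-zero...      */
            return MEMTX_OK;                   /* ...but no bus error   */
        }
        return MEMTX_OK;
    }

    int main(void)
    {
        uint64_t data;
        dist_read(0xf00, &data);
        printf("data = %llu\n", (unsigned long long)data);
        return 0;
    }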
@ -852,6 +859,12 @@ MemTxResult gicv3_dist_write(void *opaque, hwaddr offset, uint64_t data,
"%s: invalid guest write at offset " TARGET_FMT_plx
"size %u\n", __func__, offset, size);
trace_gicv3_dist_badwrite(offset, data, size, attrs.secure);
/* The spec requires that reserved registers are RAZ/WI;
* so use MEMTX_ERROR returns from leaf functions as a way to
* trigger the guest-error logging but don't return it to
* the caller, or we'll cause a spurious guest data abort.
*/
r = MEMTX_OK;
} else {
trace_gicv3_dist_write(offset, data, size, attrs.secure);
}


@ -67,7 +67,8 @@ static MemTxResult gicv3_its_trans_read(void *opaque, hwaddr offset,
MemTxAttrs attrs)
{
qemu_log_mask(LOG_GUEST_ERROR, "ITS read at offset 0x%"PRIx64"\n", offset);
return MEMTX_ERROR;
*data = 0;
return MEMTX_OK;
}
static MemTxResult gicv3_its_trans_write(void *opaque, hwaddr offset,
@ -82,15 +83,12 @@ static MemTxResult gicv3_its_trans_write(void *opaque, hwaddr offset,
if (ret <= 0) {
qemu_log_mask(LOG_GUEST_ERROR,
"ITS: Error sending MSI: %s\n", strerror(-ret));
return MEMTX_DECODE_ERROR;
}
return MEMTX_OK;
} else {
qemu_log_mask(LOG_GUEST_ERROR,
"ITS write at bad offset 0x%"PRIx64"\n", offset);
return MEMTX_DECODE_ERROR;
}
return MEMTX_OK;
}
static const MemoryRegionOps gicv3_its_trans_ops = {


@ -135,7 +135,14 @@ static void kvm_dist_get_priority(GICv3State *s, uint32_t offset, uint8_t *bmp)
uint32_t reg, *field;
int irq;
field = (uint32_t *)bmp;
/* For the KVM GICv3, affinity routing is always enabled, and the first 8
* GICD_IPRIORITYR<n> registers are always RAZ/WI; their functionality is
* provided by GICR_IPRIORITYR<n> instead, so there is nothing to sync for
* them. Skip the GIC_INTERNAL irqs in both the bitmap and the offset.
*/
field = (uint32_t *)(bmp + GIC_INTERNAL);
offset += (GIC_INTERNAL * 8) / 8;
for_each_dist_irq_reg(irq, s->num_irq, 8) {
kvm_gicd_access(s, offset, &reg, false);
*field = reg;
@ -149,7 +156,14 @@ static void kvm_dist_put_priority(GICv3State *s, uint32_t offset, uint8_t *bmp)
uint32_t reg, *field;
int irq;
field = (uint32_t *)bmp;
/* For the KVM GICv3, affinity routing is always enabled, and the first 8
* GICD_IPRIORITYR<n> registers are always RAZ/WI; their functionality is
* provided by GICR_IPRIORITYR<n> instead, so there is nothing to sync for
* them. Skip the GIC_INTERNAL irqs in both the bitmap and the offset.
*/
field = (uint32_t *)(bmp + GIC_INTERNAL);
offset += (GIC_INTERNAL * 8) / 8;
for_each_dist_irq_reg(irq, s->num_irq, 8) {
reg = *field;
kvm_gicd_access(s, offset, &reg, true);
@ -164,6 +178,14 @@ static void kvm_dist_get_edge_trigger(GICv3State *s, uint32_t offset,
uint32_t reg;
int irq;
/* For the KVM GICv3, affinity routing is always enabled, and the first 2
* GICD_ICFGR<n> registers are always RAZ/WI; their functionality is
* provided by GICR_ICFGR<n> instead, so there is nothing to sync for them.
* Advance the offset to skip the GIC_INTERNAL irqs. This matches the
* for_each_dist_irq_reg() macro, which also skips the first GIC_INTERNAL
* irqs.
*/
offset += (GIC_INTERNAL * 2) / 8;
for_each_dist_irq_reg(irq, s->num_irq, 2) {
kvm_gicd_access(s, offset, &reg, false);
reg = half_unshuffle32(reg >> 1);
@ -181,6 +203,14 @@ static void kvm_dist_put_edge_trigger(GICv3State *s, uint32_t offset,
uint32_t reg;
int irq;
/* For the KVM GICv3, affinity routing is always enabled, and the first 2
* GICD_ICFGR<n> registers are always RAZ/WI; their functionality is
* provided by GICR_ICFGR<n> instead, so there is nothing to sync for them.
* Advance the offset to skip the GIC_INTERNAL irqs. This matches the
* for_each_dist_irq_reg() macro, which also skips the first GIC_INTERNAL
* irqs.
*/
offset += (GIC_INTERNAL * 2) / 8;
for_each_dist_irq_reg(irq, s->num_irq, 2) {
reg = *gic_bmp_ptr32(bmp, irq);
if (irq % 32 != 0) {
@ -222,6 +252,15 @@ static void kvm_dist_getbmp(GICv3State *s, uint32_t offset, uint32_t *bmp)
uint32_t reg;
int irq;
/* For the KVM GICv3, affinity routing is always enabled, and the
* GICD_IGROUPR0/GICD_IGRPMODR0/GICD_ISENABLER0/GICD_ISPENDR0/
* GICD_ISACTIVER0 registers are always RAZ/WI; their functionality is
* provided by the GICR registers instead, so there is nothing to sync for
* them. Advance the offset to skip the GIC_INTERNAL irqs. This matches the
* for_each_dist_irq_reg() macro, which also skips the first GIC_INTERNAL
* irqs.
*/
offset += (GIC_INTERNAL * 1) / 8;
for_each_dist_irq_reg(irq, s->num_irq, 1) {
kvm_gicd_access(s, offset, &reg, false);
*gic_bmp_ptr32(bmp, irq) = reg;
@ -235,6 +274,19 @@ static void kvm_dist_putbmp(GICv3State *s, uint32_t offset,
uint32_t reg;
int irq;
/* For the KVM GICv3, affinity routing is always enabled, and the
* GICD_IGROUPR0/GICD_IGRPMODR0/GICD_ISENABLER0/GICD_ISPENDR0/
* GICD_ISACTIVER0 registers are always RAZ/WI; their functionality is
* provided by the GICR registers instead, so there is nothing to sync for
* them. Advance both offset and clroffset to skip the GIC_INTERNAL irqs.
* This matches the for_each_dist_irq_reg() macro, which also skips the
* first GIC_INTERNAL irqs.
*/
offset += (GIC_INTERNAL * 1) / 8;
if (clroffset != 0) {
clroffset += (GIC_INTERNAL * 1) / 8;
}
for_each_dist_irq_reg(irq, s->num_irq, 1) {
/* If this bitmap is a set/clear register pair, first write to the
* clear-reg to clear all bits before using the set-reg to write
@ -243,6 +295,7 @@ static void kvm_dist_putbmp(GICv3State *s, uint32_t offset,
if (clroffset != 0) {
reg = 0;
kvm_gicd_access(s, clroffset, &reg, true);
clroffset += 4;
}
reg = *gic_bmp_ptr32(bmp, irq);
kvm_gicd_access(s, offset, &reg, true);
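The clroffset += 4 line is the key fix here: each iteration touches the next 32-bit distributor register, so the clear-register offset must advance in lockstep with the set-register offset, or every pass would clear the same register. A sketch of the access pattern, with a stand-in for kvm_gicd_access() (register offsets illustrative):

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for kvm_gicd_access(): just shows which register is touched. */
    static void gicd_access(uint32_t offset, uint32_t *reg, int write)
    {
        printf("%s offset 0x%03x val 0x%08x\n",
               write ? "write" : "read ", offset, *reg);
    }

    int main(void)
    {
        uint32_t offset = 0x204;      /* e.g. GICD_ISPENDR for IRQs 32..63 */
        uint32_t clroffset = 0x284;   /* e.g. GICD_ICPENDR for IRQs 32..63 */
        for (int irq = 32; irq < 96; irq += 32) {
            uint32_t zero = 0, set = 0xffff0000;
            gicd_access(clroffset, &zero, 1);  /* clear all bits first     */
            clroffset += 4;                    /* advance too - the fix    */
            gicd_access(offset, &set, 1);      /* then write the set-reg   */
            offset += 4;
        }
        return 0;
    }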


@ -455,6 +455,13 @@ MemTxResult gicv3_redist_read(void *opaque, hwaddr offset, uint64_t *data,
"size %u\n", __func__, offset, size);
trace_gicv3_redist_badread(gicv3_redist_affid(cs), offset,
size, attrs.secure);
/* The spec requires that reserved registers are RAZ/WI;
* so use MEMTX_ERROR returns from leaf functions as a way to
* trigger the guest-error logging but don't return it to
* the caller, or we'll cause a spurious guest data abort.
*/
r = MEMTX_OK;
*data = 0;
} else {
trace_gicv3_redist_read(gicv3_redist_affid(cs), offset, *data,
size, attrs.secure);
@ -505,6 +512,12 @@ MemTxResult gicv3_redist_write(void *opaque, hwaddr offset, uint64_t data,
"size %u\n", __func__, offset, size);
trace_gicv3_redist_badwrite(gicv3_redist_affid(cs), offset, data,
size, attrs.secure);
/* The spec requires that reserved registers are RAZ/WI;
* so use MEMTX_ERROR returns from leaf functions as a way to
* trigger the guest-error logging but don't return it to
* the caller, or we'll cause a spurious guest data abort.
*/
r = MEMTX_OK;
} else {
trace_gicv3_redist_write(gicv3_redist_affid(cs), offset, data,
size, attrs.secure);


@ -124,10 +124,6 @@ static void kvm_openpic_region_add(MemoryListener *listener,
uint64_t reg_base;
int ret;
if (section->fv != address_space_to_flatview(&address_space_memory)) {
abort();
}
/* Ignore events on regions that are not us */
if (section->mr != &opp->mem) {
return;


@ -422,6 +422,7 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
static void virtio_net_reset(VirtIODevice *vdev)
{
VirtIONet *n = VIRTIO_NET(vdev);
int i;
/* Reset back to compatibility mode */
n->promisc = 1;
@ -445,6 +446,16 @@ static void virtio_net_reset(VirtIODevice *vdev)
memcpy(&n->mac[0], &n->nic->conf->macaddr, sizeof(n->mac));
qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
memset(n->vlans, 0, MAX_VLAN >> 3);
/* Flush any async TX */
for (i = 0; i < n->max_queues; i++) {
NetClientState *nc = qemu_get_subqueue(n->nic, i);
if (nc->peer) {
qemu_flush_or_purge_queued_packets(nc->peer, true);
assert(!virtio_net_get_subqueue(nc)->async_tx.elem);
}
}
}
static void peer_test_vnet_hdr(VirtIONet *n)


@ -74,8 +74,13 @@ static void gen_rp_realize(DeviceState *dev, Error **errp)
PCIDevice *d = PCI_DEVICE(dev);
GenPCIERootPort *grp = GEN_PCIE_ROOT_PORT(d);
PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(d);
Error *local_err = NULL;
rpc->parent_realize(dev, errp);
rpc->parent_realize(dev, &local_err);
if (local_err) {
error_propagate(errp, local_err);
return;
}
int rc = pci_bridge_qemu_reserve_cap_init(d, 0, grp->bus_reserve,
grp->io_reserve, grp->mem_reserve, grp->pref32_reserve,


@ -98,6 +98,7 @@ static void i82801b11_bridge_class_init(ObjectClass *klass, void *data)
k->realize = i82801b11_bridge_realize;
k->config_write = pci_bridge_write_config;
dc->vmsd = &i82801b11_bridge_dev_vmstate;
dc->reset = pci_bridge_reset;
set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
}


@ -1,7 +1,7 @@
# shared objects
obj-y += ppc.o ppc_booke.o fdt.o
# IBM pSeries (sPAPR)
obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
obj-$(CONFIG_PSERIES) += spapr.o spapr_caps.o spapr_vio.o spapr_events.o
obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
obj-$(CONFIG_PSERIES) += spapr_cpu_core.o spapr_ovec.o


@ -100,6 +100,21 @@
#define PHANDLE_XICP 0x00001111
/* These two functions implement the VCPU id numbering: one to compute them
* all and one to identify thread 0 of a VCORE. Any change to the first one
* is likely to have an impact on the second one, so let's keep them close.
*/
static int spapr_vcpu_id(sPAPRMachineState *spapr, int cpu_index)
{
return
(cpu_index / smp_threads) * spapr->vsmt + cpu_index % smp_threads;
}
static bool spapr_is_thread0_in_vcore(sPAPRMachineState *spapr,
PowerPCCPU *cpu)
{
return spapr_get_vcpu_id(cpu) % spapr->vsmt == 0;
}
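A quick worked example of this numbering (values illustrative, not from the source): with smp_threads = 4 and vsmt = 8, cpu_index 0..3 map to vcpu ids 0..3 and cpu_index 4..7 map to ids 8..11, so thread 0 of each vcore is exactly the id divisible by vsmt:

    #include <stdio.h>

    int main(void)
    {
        int smp_threads = 4, vsmt = 8;   /* illustrative values */
        for (int cpu_index = 0; cpu_index < 8; cpu_index++) {
            int id = (cpu_index / smp_threads) * vsmt + cpu_index % smp_threads;
            printf("cpu_index %d -> vcpu id %d%s\n", cpu_index, id,
                   (id % vsmt == 0) ? "  (thread 0 of a vcore)" : "");
        }
        return 0;
    }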
static ICSState *spapr_ics_create(sPAPRMachineState *spapr,
const char *type_ics,
int nr_irqs, Error **errp)
@ -161,15 +176,14 @@ static void pre_2_10_vmstate_unregister_dummy_icp(int i)
(void *)(uintptr_t) i);
}
static inline int xics_max_server_number(void)
static int xics_max_server_number(sPAPRMachineState *spapr)
{
return DIV_ROUND_UP(max_cpus * kvmppc_smt_threads(), smp_threads);
return DIV_ROUND_UP(max_cpus * spapr->vsmt, smp_threads);
}
static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
{
sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
if (kvm_enabled()) {
if (machine_kernel_irqchip_allowed(machine) &&
@ -191,17 +205,6 @@ static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
return;
}
}
if (smc->pre_2_10_has_unused_icps) {
int i;
for (i = 0; i < xics_max_server_number(); i++) {
/* Dummy entries get deregistered when real ICPState objects
* are registered during CPU core hotplug.
*/
pre_2_10_vmstate_register_dummy_icp(i);
}
}
}
static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
@ -210,7 +213,7 @@ static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
int i, ret = 0;
uint32_t servers_prop[smt_threads];
uint32_t gservers_prop[smt_threads * 2];
int index = spapr_vcpu_id(cpu);
int index = spapr_get_vcpu_id(cpu);
if (cpu->compat_pvr) {
ret = fdt_setprop_cell(fdt, offset, "cpu-version", cpu->compat_pvr);
@ -239,7 +242,7 @@ static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
static int spapr_fixup_cpu_numa_dt(void *fdt, int offset, PowerPCCPU *cpu)
{
int index = spapr_vcpu_id(cpu);
int index = spapr_get_vcpu_id(cpu);
uint32_t associativity[] = {cpu_to_be32(0x5),
cpu_to_be32(0x0),
cpu_to_be32(0x0),
@ -253,7 +256,9 @@ static int spapr_fixup_cpu_numa_dt(void *fdt, int offset, PowerPCCPU *cpu)
}
/* Populate the "ibm,pa-features" property */
static void spapr_populate_pa_features(PowerPCCPU *cpu, void *fdt, int offset,
static void spapr_populate_pa_features(sPAPRMachineState *spapr,
PowerPCCPU *cpu,
void *fdt, int offset,
bool legacy_guest)
{
CPUPPCState *env = &cpu->env;
@ -318,7 +323,7 @@ static void spapr_populate_pa_features(PowerPCCPU *cpu, void *fdt, int offset,
*/
pa_features[3] |= 0x20;
}
if (kvmppc_has_cap_htm() && pa_size > 24) {
if ((spapr_get_cap(spapr, SPAPR_CAP_HTM) != 0) && pa_size > 24) {
pa_features[24] |= 0x80; /* Transactional memory support */
}
if (legacy_guest && pa_size > 40) {
@ -336,16 +341,15 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPRMachineState *spapr)
int ret = 0, offset, cpus_offset;
CPUState *cs;
char cpu_model[32];
int smt = kvmppc_smt_threads();
uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
CPU_FOREACH(cs) {
PowerPCCPU *cpu = POWERPC_CPU(cs);
DeviceClass *dc = DEVICE_GET_CLASS(cs);
int index = spapr_vcpu_id(cpu);
int compat_smt = MIN(smp_threads, ppc_compat_max_threads(cpu));
int index = spapr_get_vcpu_id(cpu);
int compat_smt = MIN(smp_threads, ppc_compat_max_vthreads(cpu));
if ((index % smt) != 0) {
if (!spapr_is_thread0_in_vcore(spapr, cpu)) {
continue;
}
@ -384,8 +388,8 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPRMachineState *spapr)
return ret;
}
spapr_populate_pa_features(cpu, fdt, offset,
spapr->cas_legacy_guest_workaround);
spapr_populate_pa_features(spapr, cpu, fdt, offset,
spapr->cas_legacy_guest_workaround);
}
return ret;
}
@ -491,7 +495,7 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset,
PowerPCCPU *cpu = POWERPC_CPU(cs);
CPUPPCState *env = &cpu->env;
PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cs);
int index = spapr_vcpu_id(cpu);
int index = spapr_get_vcpu_id(cpu);
uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
0xffffffff, 0xffffffff};
uint32_t tbfreq = kvm_enabled() ? kvmppc_get_tbfreq()
@ -501,7 +505,7 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset,
size_t page_sizes_prop_size;
uint32_t vcpus_per_socket = smp_threads * smp_cores;
uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
int compat_smt = MIN(smp_threads, ppc_compat_max_threads(cpu));
int compat_smt = MIN(smp_threads, ppc_compat_max_vthreads(cpu));
sPAPRDRConnector *drc;
int drc_index;
uint32_t radix_AP_encodings[PPC_PAGE_SIZES_MAX_SZ];
@ -555,20 +559,22 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset,
segs, sizeof(segs))));
}
/* Advertise VMX/VSX (vector extensions) if available
* 0 / no property == no vector extensions
/* Advertise VSX (vector extensions) if available
* 1 == VMX / Altivec available
* 2 == VSX available */
if (env->insns_flags & PPC_ALTIVEC) {
uint32_t vmx = (env->insns_flags2 & PPC2_VSX) ? 2 : 1;
_FDT((fdt_setprop_cell(fdt, offset, "ibm,vmx", vmx)));
* 2 == VSX available
*
* Only CPUs for which we create core types in spapr_cpu_core.c
* are possible, and all of those have VMX */
if (spapr_get_cap(spapr, SPAPR_CAP_VSX) != 0) {
_FDT((fdt_setprop_cell(fdt, offset, "ibm,vmx", 2)));
} else {
_FDT((fdt_setprop_cell(fdt, offset, "ibm,vmx", 1)));
}
/* Advertise DFP (Decimal Floating Point) if available
* 0 / no property == no DFP
* 1 == DFP available */
if (env->insns_flags2 & PPC2_DFP) {
if (spapr_get_cap(spapr, SPAPR_CAP_DFP) != 0) {
_FDT((fdt_setprop_cell(fdt, offset, "ibm,dfp", 1)));
}
@ -579,7 +585,7 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset,
page_sizes_prop, page_sizes_prop_size)));
}
spapr_populate_pa_features(cpu, fdt, offset, false);
spapr_populate_pa_features(spapr, cpu, fdt, offset, false);
_FDT((fdt_setprop_cell(fdt, offset, "ibm,chip-id",
cs->cpu_index / vcpus_per_socket)));
@ -610,7 +616,6 @@ static void spapr_populate_cpus_dt_node(void *fdt, sPAPRMachineState *spapr)
CPUState *cs;
int cpus_offset;
char *nodename;
int smt = kvmppc_smt_threads();
cpus_offset = fdt_add_subnode(fdt, 0, "cpus");
_FDT(cpus_offset);
@ -624,11 +629,11 @@ static void spapr_populate_cpus_dt_node(void *fdt, sPAPRMachineState *spapr)
*/
CPU_FOREACH_REVERSE(cs) {
PowerPCCPU *cpu = POWERPC_CPU(cs);
int index = spapr_vcpu_id(cpu);
int index = spapr_get_vcpu_id(cpu);
DeviceClass *dc = DEVICE_GET_CLASS(cs);
int offset;
if ((index % smt) != 0) {
if (!spapr_is_thread0_in_vcore(spapr, cpu)) {
continue;
}
@ -1101,7 +1106,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
_FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
/* /interrupt controller */
spapr_dt_xics(xics_max_server_number(), fdt, PHANDLE_XICP);
spapr_dt_xics(xics_max_server_number(spapr), fdt, PHANDLE_XICP);
ret = spapr_populate_memory(spapr, fdt);
if (ret < 0) {
@ -1440,7 +1445,12 @@ static void ppc_spapr_reset(void)
/* Check for unknown sysbus devices */
foreach_dynamic_sysbus_device(find_unknown_sysbus_device, NULL);
if (kvm_enabled() && kvmppc_has_cap_mmu_radix()) {
spapr_caps_reset(spapr);
first_ppc_cpu = POWERPC_CPU(first_cpu);
if (kvm_enabled() && kvmppc_has_cap_mmu_radix() &&
ppc_check_compat(first_ppc_cpu, CPU_POWERPC_LOGICAL_3_00, 0,
spapr->max_compat_pvr)) {
/* If using KVM with radix mode available, VCPUs can be started
* without a HPT because KVM will start them in radix mode.
* Set the GR bit in PATB so that we know there is no HPT. */
@ -1449,6 +1459,15 @@ static void ppc_spapr_reset(void)
spapr_setup_hpt_and_vrma(spapr);
}
/* if this reset wasn't generated by CAS, we should reset our
* negotiated options and start from scratch */
if (!spapr->cas_reboot) {
spapr_ovec_cleanup(spapr->ov5_cas);
spapr->ov5_cas = spapr_ovec_new();
ppc_set_compat(first_ppc_cpu, spapr->max_compat_pvr, &error_fatal);
}
qemu_devices_reset();
/* DRC reset may cause a device to be unplugged. This will cause troubles
@ -1469,15 +1488,6 @@ static void ppc_spapr_reset(void)
rtas_addr = rtas_limit - RTAS_MAX_SIZE;
fdt_addr = rtas_addr - FDT_MAX_SIZE;
/* if this reset wasn't generated by CAS, we should reset our
* negotiated options and start from scratch */
if (!spapr->cas_reboot) {
spapr_ovec_cleanup(spapr->ov5_cas);
spapr->ov5_cas = spapr_ovec_new();
ppc_set_compat_all(spapr->max_compat_pvr, &error_fatal);
}
fdt = spapr_build_fdt(spapr, rtas_addr, spapr->rtas_size);
spapr_load_rtas(spapr, fdt, rtas_addr);
@ -1499,7 +1509,6 @@ static void ppc_spapr_reset(void)
g_free(fdt);
/* Set up the entry state */
first_ppc_cpu = POWERPC_CPU(first_cpu);
first_ppc_cpu->env.gpr[3] = fdt_addr;
first_ppc_cpu->env.gpr[5] = 0;
first_cpu->halted = 0;
@ -1552,11 +1561,28 @@ static bool spapr_vga_init(PCIBus *pci_bus, Error **errp)
}
}
static int spapr_pre_load(void *opaque)
{
int rc;
rc = spapr_caps_pre_load(opaque);
if (rc) {
return rc;
}
return 0;
}
static int spapr_post_load(void *opaque, int version_id)
{
sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
int err = 0;
err = spapr_caps_post_migration(spapr);
if (err) {
return err;
}
if (!object_dynamic_cast(OBJECT(spapr->ics), TYPE_ICS_KVM)) {
CPUState *cs;
CPU_FOREACH(cs) {
@ -1588,6 +1614,18 @@ static int spapr_post_load(void *opaque, int version_id)
return err;
}
static int spapr_pre_save(void *opaque)
{
int rc;
rc = spapr_caps_pre_save(opaque);
if (rc) {
return rc;
}
return 0;
}
static bool version_before_3(void *opaque, int version_id)
{
return version_id < 3;
@ -1708,7 +1746,9 @@ static const VMStateDescription vmstate_spapr = {
.name = "spapr",
.version_id = 3,
.minimum_version_id = 1,
.pre_load = spapr_pre_load,
.post_load = spapr_post_load,
.pre_save = spapr_pre_save,
.fields = (VMStateField[]) {
/* used to be @next_irq */
VMSTATE_UNUSED_BUFFER(version_before_3, 0, 4),
@ -1723,6 +1763,12 @@ static const VMStateDescription vmstate_spapr = {
&vmstate_spapr_ov5_cas,
&vmstate_spapr_patb_entry,
&vmstate_spapr_pending_events,
&vmstate_spapr_cap_htm,
&vmstate_spapr_cap_vsx,
&vmstate_spapr_cap_dfp,
&vmstate_spapr_cap_cfpc,
&vmstate_spapr_cap_sbbc,
&vmstate_spapr_cap_ibs,
NULL
}
};
@ -2152,8 +2198,8 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
{
MachineState *machine = MACHINE(spapr);
MachineClass *mc = MACHINE_GET_CLASS(machine);
sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
const char *type = spapr_get_cpu_core_type(machine->cpu_type);
int smt = kvmppc_smt_threads();
const CPUArchIdList *possible_cpus;
int boot_cores_nr = smp_cpus / smp_threads;
int i;
@ -2183,12 +2229,23 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
boot_cores_nr = possible_cpus->len;
}
if (smc->pre_2_10_has_unused_icps) {
int i;
for (i = 0; i < xics_max_server_number(spapr); i++) {
/* Dummy entries get deregistered when real ICPState objects
* are registered during CPU core hotplug.
*/
pre_2_10_vmstate_register_dummy_icp(i);
}
}
for (i = 0; i < possible_cpus->len; i++) {
int core_id = i * smp_threads;
if (mc->has_hotpluggable_cpus) {
spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_CPU,
(core_id / smp_threads) * smt);
spapr_vcpu_id(spapr, core_id));
}
if (i < boot_cores_nr) {
@ -2237,26 +2294,43 @@ static void spapr_set_vsmt_mode(sPAPRMachineState *spapr, Error **errp)
}
/* In this case, spapr->vsmt has been set by the command line */
} else {
/* Choose a VSMT mode that may be higher than necessary but is
* likely to be compatible with hosts that don't have VSMT. */
spapr->vsmt = MAX(kvm_smt, smp_threads);
/*
* Default VSMT value is tricky, because we need it to be as
* consistent as possible (for migration), but this requires
* changing it for at least some existing cases. We pick 8 as
* the value that we'd get with KVM on POWER8, the
* overwhelmingly common case in production systems.
*/
spapr->vsmt = MAX(8, smp_threads);
}
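
A worked example of the default rule above (illustrative, not from the patch): a guest started with smp_threads = 2 gets vsmt = MAX(8, 2) = 8; if the host's KVM mode is kvm_smt = 4 and cannot be changed, the fallback below still accepts the combination, wasting some vcpu ids:

/* Sketch of the acceptance test used by the fallback path below. */
static bool vsmt_fallback_ok(int kvm_smt, int smp_threads, int vsmt)
{
    /* host mode covers the guest threads and divides the chosen mode */
    return (kvm_smt >= smp_threads) && ((vsmt % kvm_smt) == 0);
}
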
/* KVM: If necessary, set the SMT mode: */
if (kvm_enabled() && (spapr->vsmt != kvm_smt)) {
ret = kvmppc_set_smt_threads(spapr->vsmt);
if (ret) {
/* Looks like KVM isn't able to change VSMT mode */
error_setg(&local_err,
"Failed to set KVM's VSMT mode to %d (errno %d)",
spapr->vsmt, ret);
if (!vsmt_user) {
error_append_hint(&local_err, "On PPC, a VM with %d threads/"
"core on a host with %d threads/core requires "
" the use of VSMT mode %d.\n",
smp_threads, kvm_smt, spapr->vsmt);
/* We can live with that if the default one is big enough
* for the number of threads, and a submultiple of the one
* we want. In this case we'll waste some vcpu ids, but
* behaviour will be correct */
if ((kvm_smt >= smp_threads) && ((spapr->vsmt % kvm_smt) == 0)) {
warn_report_err(local_err);
local_err = NULL;
goto out;
} else {
if (!vsmt_user) {
error_append_hint(&local_err,
"On PPC, a VM with %d threads/core"
" on a host with %d threads/core"
" requires the use of VSMT mode %d.\n",
smp_threads, kvm_smt, spapr->vsmt);
}
kvmppc_hint_smt_possible(&local_err);
goto out;
}
kvmppc_hint_smt_possible(&local_err);
goto out;
}
}
/* else TCG: nothing to do currently */
@ -2282,6 +2356,7 @@ static void ppc_spapr_init(MachineState *machine)
long load_limit, fw_size;
char *filename;
Error *resize_hpt_err = NULL;
PowerPCCPU *first_ppc_cpu;
msi_nonbroken = true;
@ -2374,11 +2449,6 @@ static void ppc_spapr_init(MachineState *machine)
}
spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
if (!kvm_enabled() || kvmppc_has_cap_mmu_radix()) {
/* KVM and TCG always allow GTSE with radix... */
spapr_ovec_set(spapr->ov5, OV5_MMU_RADIX_GTSE);
}
/* ... but not with hash (currently). */
/* advertise support for dedicated HP event source to guests */
if (spapr->use_hotplug_event_source) {
@ -2395,6 +2465,15 @@ static void ppc_spapr_init(MachineState *machine)
spapr_init_cpus(spapr);
first_ppc_cpu = POWERPC_CPU(first_cpu);
if ((!kvm_enabled() || kvmppc_has_cap_mmu_radix()) &&
ppc_check_compat(first_ppc_cpu, CPU_POWERPC_LOGICAL_3_00, 0,
spapr->max_compat_pvr)) {
/* KVM and TCG always allow GTSE with radix... */
spapr_ovec_set(spapr->ov5, OV5_MMU_RADIX_GTSE);
}
/* ... but not with hash (currently). */
if (kvm_enabled()) {
/* Enable H_LOGICAL_CI_* so SLOF can talk to in-kernel devices */
kvmppc_enable_logical_ci_hcalls();
@ -3154,7 +3233,7 @@ static void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
{
PowerPCCPU *cpu = POWERPC_CPU(cs);
DeviceClass *dc = DEVICE_GET_CLASS(cs);
int id = spapr_vcpu_id(cpu);
int id = spapr_get_vcpu_id(cpu);
void *fdt;
int offset, fdt_size;
char *nodename;
@ -3200,10 +3279,10 @@ static
void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
Error **errp)
{
sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
int index;
sPAPRDRConnector *drc;
CPUCore *cc = CPU_CORE(dev);
int smt = kvmppc_smt_threads();
if (!spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id, &index)) {
error_setg(errp, "Unable to find CPU core with core-id: %d",
@ -3215,7 +3294,8 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
return;
}
drc = spapr_drc_by_id(TYPE_SPAPR_DRC_CPU, index * smt);
drc = spapr_drc_by_id(TYPE_SPAPR_DRC_CPU,
spapr_vcpu_id(spapr, cc->core_id));
g_assert(drc);
spapr_drc_detach(drc);
@ -3234,7 +3314,6 @@ static void spapr_core_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
CPUState *cs = CPU(core->threads);
sPAPRDRConnector *drc;
Error *local_err = NULL;
int smt = kvmppc_smt_threads();
CPUArchId *core_slot;
int index;
bool hotplugged = spapr_drc_hotplugged(dev);
@ -3245,7 +3324,8 @@ static void spapr_core_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
cc->core_id);
return;
}
drc = spapr_drc_by_id(TYPE_SPAPR_DRC_CPU, index * smt);
drc = spapr_drc_by_id(TYPE_SPAPR_DRC_CPU,
spapr_vcpu_id(spapr, cc->core_id));
g_assert(drc || !mc->has_hotpluggable_cpus);
@ -3578,15 +3658,27 @@ static void spapr_pic_print_info(InterruptStatsProvider *obj,
ics_pic_print_info(spapr->ics, mon);
}
int spapr_vcpu_id(PowerPCCPU *cpu)
int spapr_get_vcpu_id(PowerPCCPU *cpu)
{
CPUState *cs = CPU(cpu);
return cpu->vcpu_id;
}
if (kvm_enabled()) {
return kvm_arch_vcpu_id(cs);
} else {
return cs->cpu_index;
void spapr_set_vcpu_id(PowerPCCPU *cpu, int cpu_index, Error **errp)
{
sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
int vcpu_id;
vcpu_id = spapr_vcpu_id(spapr, cpu_index);
if (kvm_enabled() && !kvm_vcpu_id_is_valid(vcpu_id)) {
error_setg(errp, "Can't create CPU with id %d in KVM", vcpu_id);
error_append_hint(errp, "Adjust the number of cpus to %d "
"or try to raise the number of threads per core\n",
vcpu_id * smp_threads / spapr->vsmt);
return;
}
cpu->vcpu_id = vcpu_id;
}
PowerPCCPU *spapr_find_cpu(int vcpu_id)
@ -3596,7 +3688,7 @@ PowerPCCPU *spapr_find_cpu(int vcpu_id)
CPU_FOREACH(cs) {
PowerPCCPU *cpu = POWERPC_CPU(cs);
if (spapr_vcpu_id(cpu) == vcpu_id) {
if (spapr_get_vcpu_id(cpu) == vcpu_id) {
return cpu;
}
}
@ -3663,6 +3755,14 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
* in which LMBs are represented and hot-added
*/
mc->numa_mem_align_shift = 28;
smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF;
smc->default_caps.caps[SPAPR_CAP_VSX] = SPAPR_CAP_ON;
smc->default_caps.caps[SPAPR_CAP_DFP] = SPAPR_CAP_ON;
smc->default_caps.caps[SPAPR_CAP_CFPC] = SPAPR_CAP_BROKEN;
smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
spapr_caps_add_properties(smc, &error_abort);
}
static const TypeInfo spapr_machine_info = {
@ -3713,16 +3813,38 @@ static const TypeInfo spapr_machine_info = {
} \
type_init(spapr_machine_register_##suffix)
/*
* pseries-2.12
*/
static void spapr_machine_2_12_instance_options(MachineState *machine)
{
}
static void spapr_machine_2_12_class_options(MachineClass *mc)
{
/* Defaults for the latest behaviour inherited from the base class */
}
DEFINE_SPAPR_MACHINE(2_12, "2.12", false);
/*
* pseries-2.11
*/
#define SPAPR_COMPAT_2_11 \
HW_COMPAT_2_11
static void spapr_machine_2_11_instance_options(MachineState *machine)
{
spapr_machine_2_12_instance_options(machine);
}
static void spapr_machine_2_11_class_options(MachineClass *mc)
{
/* Defaults for the latest behaviour inherited from the base class */
sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
spapr_machine_2_12_class_options(mc);
smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_ON;
SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_11);
}
DEFINE_SPAPR_MACHINE(2_11, "2.11", true);
@ -3731,10 +3853,11 @@ DEFINE_SPAPR_MACHINE(2_11, "2.11", true);
* pseries-2.10
*/
#define SPAPR_COMPAT_2_10 \
HW_COMPAT_2_10 \
HW_COMPAT_2_10
static void spapr_machine_2_10_instance_options(MachineState *machine)
{
spapr_machine_2_11_instance_options(machine);
}
static void spapr_machine_2_10_class_options(MachineClass *mc)

hw/ppc/spapr_caps.c 100644 (new file, 443 lines)

@ -0,0 +1,443 @@
/*
* QEMU PowerPC pSeries Logical Partition capabilities handling
*
* Copyright (c) 2017 David Gibson, Red Hat Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qemu/error-report.h"
#include "qapi/error.h"
#include "qapi/visitor.h"
#include "sysemu/hw_accel.h"
#include "target/ppc/cpu.h"
#include "cpu-models.h"
#include "kvm_ppc.h"
#include "hw/ppc/spapr.h"
typedef struct sPAPRCapabilityInfo {
const char *name;
const char *description;
const char *options; /* valid capability values */
int index;
/* Getter and Setter Function Pointers */
ObjectPropertyAccessor *get;
ObjectPropertyAccessor *set;
const char *type;
/* Make sure the virtual hardware can support this capability */
void (*apply)(sPAPRMachineState *spapr, uint8_t val, Error **errp);
} sPAPRCapabilityInfo;
static void spapr_cap_get_bool(Object *obj, Visitor *v, const char *name,
void *opaque, Error **errp)
{
sPAPRCapabilityInfo *cap = opaque;
sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
bool value = spapr_get_cap(spapr, cap->index) == SPAPR_CAP_ON;
visit_type_bool(v, name, &value, errp);
}
static void spapr_cap_set_bool(Object *obj, Visitor *v, const char *name,
void *opaque, Error **errp)
{
sPAPRCapabilityInfo *cap = opaque;
sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
bool value;
Error *local_err = NULL;
visit_type_bool(v, name, &value, &local_err);
if (local_err) {
error_propagate(errp, local_err);
return;
}
spapr->cmd_line_caps[cap->index] = true;
spapr->eff.caps[cap->index] = value ? SPAPR_CAP_ON : SPAPR_CAP_OFF;
}
static void spapr_cap_get_tristate(Object *obj, Visitor *v, const char *name,
void *opaque, Error **errp)
{
sPAPRCapabilityInfo *cap = opaque;
sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
char *val = NULL;
uint8_t value = spapr_get_cap(spapr, cap->index);
switch (value) {
case SPAPR_CAP_BROKEN:
val = g_strdup("broken");
break;
case SPAPR_CAP_WORKAROUND:
val = g_strdup("workaround");
break;
case SPAPR_CAP_FIXED:
val = g_strdup("fixed");
break;
default:
error_setg(errp, "Invalid value (%d) for cap-%s", value, cap->name);
return;
}
visit_type_str(v, name, &val, errp);
g_free(val);
}
static void spapr_cap_set_tristate(Object *obj, Visitor *v, const char *name,
void *opaque, Error **errp)
{
sPAPRCapabilityInfo *cap = opaque;
sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
char *val;
Error *local_err = NULL;
uint8_t value;
visit_type_str(v, name, &val, &local_err);
if (local_err) {
error_propagate(errp, local_err);
return;
}
if (!strcasecmp(val, "broken")) {
value = SPAPR_CAP_BROKEN;
} else if (!strcasecmp(val, "workaround")) {
value = SPAPR_CAP_WORKAROUND;
} else if (!strcasecmp(val, "fixed")) {
value = SPAPR_CAP_FIXED;
} else {
error_setg(errp, "Invalid capability mode \"%s\" for cap-%s", val,
cap->name);
goto out;
}
spapr->cmd_line_caps[cap->index] = true;
spapr->eff.caps[cap->index] = value;
out:
g_free(val);
}
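
The getters and setters above back QOM machine properties (registered as "cap-<name>" by spapr_caps_add_properties() further down), so, as a usage sketch rather than literal patch text, the capabilities can be driven from the command line, with tristate values matched case-insensitively:

    qemu-system-ppc64 -machine pseries,cap-htm=off,cap-cfpc=workaround,cap-ibs=fixed
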
static void cap_htm_apply(sPAPRMachineState *spapr, uint8_t val, Error **errp)
{
if (!val) {
/* TODO: We don't support disabling htm yet */
return;
}
if (tcg_enabled()) {
error_setg(errp,
"No Transactional Memory support in TCG, try cap-htm=off");
} else if (kvm_enabled() && !kvmppc_has_cap_htm()) {
error_setg(errp,
"KVM implementation does not support Transactional Memory, try cap-htm=off"
);
}
}
static void cap_vsx_apply(sPAPRMachineState *spapr, uint8_t val, Error **errp)
{
PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
CPUPPCState *env = &cpu->env;
if (!val) {
/* TODO: We don't support disabling vsx yet */
return;
}
/* Allowable CPUs in spapr_cpu_core.c should already have gotten
* rid of anything that doesn't do VMX */
g_assert(env->insns_flags & PPC_ALTIVEC);
if (!(env->insns_flags2 & PPC2_VSX)) {
error_setg(errp, "VSX support not available, try cap-vsx=off");
}
}
static void cap_dfp_apply(sPAPRMachineState *spapr, uint8_t val, Error **errp)
{
PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
CPUPPCState *env = &cpu->env;
if (!val) {
/* TODO: We don't support disabling dfp yet */
return;
}
if (!(env->insns_flags2 & PPC2_DFP)) {
error_setg(errp, "DFP support not available, try cap-dfp=off");
}
}
static void cap_safe_cache_apply(sPAPRMachineState *spapr, uint8_t val,
Error **errp)
{
if (tcg_enabled() && val) {
/* TODO - for now only allow broken for TCG */
error_setg(errp, "Requested safe cache capability level not supported by tcg, try a different value for cap-cfpc");
} else if (kvm_enabled() && (val > kvmppc_get_cap_safe_cache())) {
error_setg(errp, "Requested safe cache capability level not supported by kvm, try a different value for cap-cfpc");
}
}
static void cap_safe_bounds_check_apply(sPAPRMachineState *spapr, uint8_t val,
Error **errp)
{
if (tcg_enabled() && val) {
/* TODO - for now only allow broken for TCG */
error_setg(errp, "Requested safe bounds check capability level not supported by tcg, try a different value for cap-sbbc");
} else if (kvm_enabled() && (val > kvmppc_get_cap_safe_bounds_check())) {
error_setg(errp, "Requested safe bounds check capability level not supported by kvm, try a different value for cap-sbbc");
}
}
static void cap_safe_indirect_branch_apply(sPAPRMachineState *spapr,
uint8_t val, Error **errp)
{
if (tcg_enabled() && val) {
/* TODO - for now only allow broken for TCG */
error_setg(errp, "Requested safe indirect branch capability level not supported by tcg, try a different value for cap-ibs");
} else if (kvm_enabled() && (val > kvmppc_get_cap_safe_indirect_branch())) {
error_setg(errp, "Requested safe indirect branch capability level not supported by kvm, try a different value for cap-ibs");
}
}
#define VALUE_DESC_TRISTATE " (broken, workaround, fixed)"
sPAPRCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
[SPAPR_CAP_HTM] = {
.name = "htm",
.description = "Allow Hardware Transactional Memory (HTM)",
.options = "",
.index = SPAPR_CAP_HTM,
.get = spapr_cap_get_bool,
.set = spapr_cap_set_bool,
.type = "bool",
.apply = cap_htm_apply,
},
[SPAPR_CAP_VSX] = {
.name = "vsx",
.description = "Allow Vector Scalar Extensions (VSX)",
.options = "",
.index = SPAPR_CAP_VSX,
.get = spapr_cap_get_bool,
.set = spapr_cap_set_bool,
.type = "bool",
.apply = cap_vsx_apply,
},
[SPAPR_CAP_DFP] = {
.name = "dfp",
.description = "Allow Decimal Floating Point (DFP)",
.options = "",
.index = SPAPR_CAP_DFP,
.get = spapr_cap_get_bool,
.set = spapr_cap_set_bool,
.type = "bool",
.apply = cap_dfp_apply,
},
[SPAPR_CAP_CFPC] = {
.name = "cfpc",
.description = "Cache Flush on Privilege Change" VALUE_DESC_TRISTATE,
.index = SPAPR_CAP_CFPC,
.get = spapr_cap_get_tristate,
.set = spapr_cap_set_tristate,
.type = "string",
.apply = cap_safe_cache_apply,
},
[SPAPR_CAP_SBBC] = {
.name = "sbbc",
.description = "Speculation Barrier Bounds Checking" VALUE_DESC_TRISTATE,
.index = SPAPR_CAP_SBBC,
.get = spapr_cap_get_tristate,
.set = spapr_cap_set_tristate,
.type = "string",
.apply = cap_safe_bounds_check_apply,
},
[SPAPR_CAP_IBS] = {
.name = "ibs",
.description = "Indirect Branch Serialisation" VALUE_DESC_TRISTATE,
.index = SPAPR_CAP_IBS,
.get = spapr_cap_get_tristate,
.set = spapr_cap_set_tristate,
.type = "string",
.apply = cap_safe_indirect_branch_apply,
},
};
static sPAPRCapabilities default_caps_with_cpu(sPAPRMachineState *spapr,
CPUState *cs)
{
sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
PowerPCCPU *cpu = POWERPC_CPU(cs);
sPAPRCapabilities caps;
caps = smc->default_caps;
if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_2_07,
0, spapr->max_compat_pvr)) {
caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF;
}
if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_2_06,
0, spapr->max_compat_pvr)) {
caps.caps[SPAPR_CAP_VSX] = SPAPR_CAP_OFF;
caps.caps[SPAPR_CAP_DFP] = SPAPR_CAP_OFF;
}
return caps;
}
int spapr_caps_pre_load(void *opaque)
{
sPAPRMachineState *spapr = opaque;
/* Set to default so we can tell if this came in with the migration */
spapr->mig = spapr->def;
return 0;
}
int spapr_caps_pre_save(void *opaque)
{
sPAPRMachineState *spapr = opaque;
spapr->mig = spapr->eff;
return 0;
}
/* This has to be called from the top-level spapr post_load, not the
* caps specific one. Otherwise it wouldn't be called when the source
* caps are all defaults, which could still conflict with overridden
* caps on the destination */
int spapr_caps_post_migration(sPAPRMachineState *spapr)
{
int i;
bool ok = true;
sPAPRCapabilities dstcaps = spapr->eff;
sPAPRCapabilities srccaps;
srccaps = default_caps_with_cpu(spapr, first_cpu);
for (i = 0; i < SPAPR_CAP_NUM; i++) {
/* If not the default value, assume it came in with the migration */
if (spapr->mig.caps[i] != spapr->def.caps[i]) {
srccaps.caps[i] = spapr->mig.caps[i];
}
}
for (i = 0; i < SPAPR_CAP_NUM; i++) {
sPAPRCapabilityInfo *info = &capability_table[i];
if (srccaps.caps[i] > dstcaps.caps[i]) {
error_report("cap-%s higher level (%d) in incoming stream than on destination (%d)",
info->name, srccaps.caps[i], dstcaps.caps[i]);
ok = false;
}
if (srccaps.caps[i] < dstcaps.caps[i]) {
warn_report("cap-%s lower level (%d) in incoming stream than on destination (%d)",
info->name, srccaps.caps[i], dstcaps.caps[i]);
}
}
return ok ? 0 : -EINVAL;
}
/* Used to generate the migration field and the .needed function for a spapr cap */
#define SPAPR_CAP_MIG_STATE(cap, ccap) \
static bool spapr_cap_##cap##_needed(void *opaque) \
{ \
sPAPRMachineState *spapr = opaque; \
\
return spapr->cmd_line_caps[SPAPR_CAP_##ccap] && \
(spapr->eff.caps[SPAPR_CAP_##ccap] != \
spapr->def.caps[SPAPR_CAP_##ccap]); \
} \
\
const VMStateDescription vmstate_spapr_cap_##cap = { \
.name = "spapr/cap/" #cap, \
.version_id = 1, \
.minimum_version_id = 1, \
.needed = spapr_cap_##cap##_needed, \
.fields = (VMStateField[]) { \
VMSTATE_UINT8(mig.caps[SPAPR_CAP_##ccap], \
sPAPRMachineState), \
VMSTATE_END_OF_LIST() \
}, \
}
SPAPR_CAP_MIG_STATE(htm, HTM);
SPAPR_CAP_MIG_STATE(vsx, VSX);
SPAPR_CAP_MIG_STATE(dfp, DFP);
SPAPR_CAP_MIG_STATE(cfpc, CFPC);
SPAPR_CAP_MIG_STATE(sbbc, SBBC);
SPAPR_CAP_MIG_STATE(ibs, IBS);
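
For reference, here is a mechanical expansion of the first instantiation (derived from the macro above, not literal patch text); SPAPR_CAP_MIG_STATE(htm, HTM) produces:

static bool spapr_cap_htm_needed(void *opaque)
{
    sPAPRMachineState *spapr = opaque;

    return spapr->cmd_line_caps[SPAPR_CAP_HTM] &&
           (spapr->eff.caps[SPAPR_CAP_HTM] != spapr->def.caps[SPAPR_CAP_HTM]);
}

const VMStateDescription vmstate_spapr_cap_htm = {
    .name = "spapr/cap/htm",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = spapr_cap_htm_needed,
    .fields = (VMStateField[]) {
        VMSTATE_UINT8(mig.caps[SPAPR_CAP_HTM], sPAPRMachineState),
        VMSTATE_END_OF_LIST()
    },
};

The subsection is therefore only sent when the cap was set on the command line to a non-default value.
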
void spapr_caps_reset(sPAPRMachineState *spapr)
{
sPAPRCapabilities default_caps;
int i;
/* First compute the actual set of caps we're running with.. */
default_caps = default_caps_with_cpu(spapr, first_cpu);
for (i = 0; i < SPAPR_CAP_NUM; i++) {
/* Store the defaults */
spapr->def.caps[i] = default_caps.caps[i];
/* If not set on the command line then apply the default value */
if (!spapr->cmd_line_caps[i]) {
spapr->eff.caps[i] = default_caps.caps[i];
}
}
/* .. then apply those caps to the virtual hardware */
for (i = 0; i < SPAPR_CAP_NUM; i++) {
sPAPRCapabilityInfo *info = &capability_table[i];
/*
* If the apply function can't set the desired level and considers the
* failure fatal, it should report it via the &error_fatal passed in.
*/
info->apply(spapr, spapr->eff.caps[i], &error_fatal);
}
}
void spapr_caps_add_properties(sPAPRMachineClass *smc, Error **errp)
{
Error *local_err = NULL;
ObjectClass *klass = OBJECT_CLASS(smc);
int i;
for (i = 0; i < ARRAY_SIZE(capability_table); i++) {
sPAPRCapabilityInfo *cap = &capability_table[i];
const char *name = g_strdup_printf("cap-%s", cap->name);
char *desc;
object_class_property_add(klass, name, cap->type,
cap->get, cap->set,
NULL, cap, &local_err);
if (local_err) {
error_propagate(errp, local_err);
return;
}
desc = g_strdup_printf("%s%s", cap->description, cap->options);
object_class_property_set_description(klass, name, desc, &local_err);
g_free(desc);
if (local_err) {
error_propagate(errp, local_err);
return;
}
}
}

hw/ppc/spapr_cpu_core.c

@ -35,6 +35,13 @@ static void spapr_cpu_reset(void *opaque)
cs->halted = 1;
env->spr[SPR_HIOR] = 0;
/* Set compatibility mode to match the boot CPU, which was either set
* by the machine reset code or by CAS. This should never fail.
*/
if (cs != first_cpu) {
ppc_set_compat(cpu, POWERPC_CPU(first_cpu)->compat_pvr, &error_abort);
}
}
static void spapr_cpu_destroy(PowerPCCPU *cpu)
@ -169,13 +176,8 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
cs = CPU(obj);
cpu = POWERPC_CPU(cs);
cs->cpu_index = cc->core_id + i;
cpu->vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i;
if (kvm_enabled() && !kvm_vcpu_id_is_valid(cpu->vcpu_id)) {
error_setg(&local_err, "Can't create CPU with id %d in KVM",
cpu->vcpu_id);
error_append_hint(&local_err, "Adjust the number of cpus to %d "
"or try to raise the number of threads per core\n",
cpu->vcpu_id * smp_threads / spapr->vsmt);
spapr_set_vcpu_id(cpu, cs->cpu_index, &local_err);
if (local_err) {
goto err;
}
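
A worked example of the mapping now centralised in spapr_set_vcpu_id() (numbers illustrative, derived from the formula being replaced): with smp_threads = 4 and spapr->vsmt = 8, the thread with cpu_index = 5 (core 1, thread 1) gets vcpu_id = 1 * 8 + 1 = 9, and ids 4..7 of core 0's VSMT block simply stay unused:

/* Sketch of the mapping, assuming the formula shown above. */
static int vcpu_id_for(int cpu_index, int smp_threads, int vsmt)
{
    return (cpu_index / smp_threads) * vsmt + (cpu_index % smp_threads);
}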

hw/ppc/spapr_hcall.c

@ -1655,6 +1655,61 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
return H_SUCCESS;
}
static target_ulong h_get_cpu_characteristics(PowerPCCPU *cpu,
sPAPRMachineState *spapr,
target_ulong opcode,
target_ulong *args)
{
uint64_t characteristics = H_CPU_CHAR_HON_BRANCH_HINTS &
~H_CPU_CHAR_THR_RECONF_TRIG;
uint64_t behaviour = H_CPU_BEHAV_FAVOUR_SECURITY;
uint8_t safe_cache = spapr_get_cap(spapr, SPAPR_CAP_CFPC);
uint8_t safe_bounds_check = spapr_get_cap(spapr, SPAPR_CAP_SBBC);
uint8_t safe_indirect_branch = spapr_get_cap(spapr, SPAPR_CAP_IBS);
switch (safe_cache) {
case SPAPR_CAP_WORKAROUND:
characteristics |= H_CPU_CHAR_L1D_FLUSH_ORI30;
characteristics |= H_CPU_CHAR_L1D_FLUSH_TRIG2;
characteristics |= H_CPU_CHAR_L1D_THREAD_PRIV;
behaviour |= H_CPU_BEHAV_L1D_FLUSH_PR;
break;
case SPAPR_CAP_FIXED:
break;
default: /* broken */
assert(safe_cache == SPAPR_CAP_BROKEN);
behaviour |= H_CPU_BEHAV_L1D_FLUSH_PR;
break;
}
switch (safe_bounds_check) {
case SPAPR_CAP_WORKAROUND:
characteristics |= H_CPU_CHAR_SPEC_BAR_ORI31;
behaviour |= H_CPU_BEHAV_BNDS_CHK_SPEC_BAR;
break;
case SPAPR_CAP_FIXED:
break;
default: /* broken */
assert(safe_bounds_check == SPAPR_CAP_BROKEN);
behaviour |= H_CPU_BEHAV_BNDS_CHK_SPEC_BAR;
break;
}
switch (safe_indirect_branch) {
case SPAPR_CAP_FIXED:
characteristics |= H_CPU_CHAR_BCCTRL_SERIALISED;
break;
default: /* broken */
assert(safe_indirect_branch == SPAPR_CAP_BROKEN);
break;
}
args[0] = characteristics;
args[1] = behaviour;
return H_SUCCESS;
}
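
For illustration (derived from the switch cases above, not literal patch text), a machine configured with cap-cfpc=workaround, cap-sbbc=workaround and cap-ibs=fixed would return:

/* args[0], characteristics: */
H_CPU_CHAR_HON_BRANCH_HINTS | H_CPU_CHAR_L1D_FLUSH_ORI30 |
H_CPU_CHAR_L1D_FLUSH_TRIG2 | H_CPU_CHAR_L1D_THREAD_PRIV |
H_CPU_CHAR_SPEC_BAR_ORI31 | H_CPU_CHAR_BCCTRL_SERIALISED
/* args[1], behaviour: */
H_CPU_BEHAV_FAVOUR_SECURITY | H_CPU_BEHAV_L1D_FLUSH_PR |
H_CPU_BEHAV_BNDS_CHK_SPEC_BAR
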
static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1];
@ -1734,6 +1789,10 @@ static void hypercall_register_types(void)
spapr_register_hypercall(H_INVALIDATE_PID, h_invalidate_pid);
spapr_register_hypercall(H_REGISTER_PROC_TBL, h_register_process_table);
/* hcall-get-cpu-characteristics */
spapr_register_hypercall(H_GET_CPU_CHARACTERISTICS,
h_get_cpu_characteristics);
/* "debugger" hcalls (also used by SLOF). Note: We do -not- differenciate
* here between the "CI" and the "CACHE" variants, they will use whatever
* mapping attributes qemu is using. When using KVM, the kernel will

hw/ppc/spapr_pci.c

@ -280,20 +280,6 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPRMachineState *spapr,
int *config_addr_key;
Error *err = NULL;
switch (func) {
case RTAS_CHANGE_MSI_FN:
case RTAS_CHANGE_FN:
ret_intr_type = RTAS_TYPE_MSI;
break;
case RTAS_CHANGE_MSIX_FN:
ret_intr_type = RTAS_TYPE_MSIX;
break;
default:
error_report("rtas_ibm_change_msi(%u) is not implemented", func);
rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
return;
}
/* Find sPAPRPHBState */
phb = spapr_pci_find_phb(spapr, buid);
if (phb) {
@ -304,6 +290,39 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPRMachineState *spapr,
return;
}
switch (func) {
case RTAS_CHANGE_FN:
if (msi_present(pdev)) {
ret_intr_type = RTAS_TYPE_MSI;
} else if (msix_present(pdev)) {
ret_intr_type = RTAS_TYPE_MSIX;
} else {
rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
return;
}
break;
case RTAS_CHANGE_MSI_FN:
if (msi_present(pdev)) {
ret_intr_type = RTAS_TYPE_MSI;
} else {
rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
return;
}
break;
case RTAS_CHANGE_MSIX_FN:
if (msix_present(pdev)) {
ret_intr_type = RTAS_TYPE_MSIX;
} else {
rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
return;
}
break;
default:
error_report("rtas_ibm_change_msi(%u) is not implemented", func);
rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
return;
}
msi = (spapr_pci_msi *) g_hash_table_lookup(phb->msi, &config_addr);
/* Releasing MSIs */
@ -1286,13 +1305,17 @@ static void spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset,
_FDT(fdt_setprop_cell(fdt, offset, "#size-cells",
RESOURCE_CELLS_SIZE));
max_msi = msi_nr_vectors_allocated(dev);
if (max_msi) {
_FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi", max_msi));
if (msi_present(dev)) {
max_msi = msi_nr_vectors_allocated(dev);
if (max_msi) {
_FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi", max_msi));
}
}
max_msix = dev->msix_entries_nr;
if (max_msix) {
_FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi-x", max_msix));
if (msix_present(dev)) {
max_msix = dev->msix_entries_nr;
if (max_msix) {
_FDT(fdt_setprop_cell(fdt, offset, "ibm,req#msi-x", max_msix));
}
}
populate_resource_props(dev, &rp);

hw/s390x/ccw-device.c

@ -40,6 +40,13 @@ static Property ccw_device_properties[] = {
DEFINE_PROP_END_OF_LIST(),
};
static void ccw_device_reset(DeviceState *d)
{
CcwDevice *ccw_dev = CCW_DEVICE(d);
css_reset_sch(ccw_dev->sch);
}
static void ccw_device_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
@ -48,6 +55,7 @@ static void ccw_device_class_init(ObjectClass *klass, void *data)
k->realize = ccw_device_realize;
k->refill_ids = ccw_device_refill_ids;
dc->props = ccw_device_properties;
dc->reset = ccw_device_reset;
}
const VMStateDescription vmstate_ccw_dev = {

hw/s390x/css.c

@ -617,6 +617,14 @@ void css_inject_io_interrupt(SubchDev *sch)
void css_conditional_io_interrupt(SubchDev *sch)
{
/*
* If the subchannel is not enabled, it is not made status pending
* (see PoP p. 16-17, "Status Control").
*/
if (!(sch->curr_status.pmcw.flags & PMCW_FLAGS_MASK_ENA)) {
return;
}
/*
* If the subchannel is not currently status pending, make it pending
* with alert status.

hw/s390x/event-facility.c

@ -293,10 +293,10 @@ static void write_event_mask(SCLPEventFacility *ef, SCCB *sccb)
ef->receive_mask = be32_to_cpu(tmp_mask);
/* return the SCLP's capability masks to the guest */
tmp_mask = cpu_to_be32(get_host_send_mask(ef));
tmp_mask = cpu_to_be32(get_host_receive_mask(ef));
copy_mask(WEM_RECEIVE_MASK(we_mask, mask_length), (uint8_t *)&tmp_mask,
mask_length, sizeof(tmp_mask));
tmp_mask = cpu_to_be32(get_host_receive_mask(ef));
tmp_mask = cpu_to_be32(get_host_send_mask(ef));
copy_mask(WEM_SEND_MASK(we_mask, mask_length), (uint8_t *)&tmp_mask,
mask_length, sizeof(tmp_mask));

hw/s390x/s390-stattrib-kvm.c

@ -116,7 +116,7 @@ static void kvm_s390_stattrib_synchronize(S390StAttribState *sa)
for (cx = 0; cx + len <= max; cx += len) {
clog.start_gfn = cx;
clog.count = len;
clog.values = (uint64_t)(sas->incoming_buffer + cx * len);
clog.values = (uint64_t)(sas->incoming_buffer + cx);
r = kvm_vm_ioctl(kvm_state, KVM_S390_SET_CMMA_BITS, &clog);
if (r) {
error_report("KVM_S390_SET_CMMA_BITS failed: %s", strerror(-r));
@ -126,7 +126,7 @@ static void kvm_s390_stattrib_synchronize(S390StAttribState *sa)
if (cx < max) {
clog.start_gfn = cx;
clog.count = max - cx;
clog.values = (uint64_t)(sas->incoming_buffer + cx * len);
clog.values = (uint64_t)(sas->incoming_buffer + cx);
r = kvm_vm_ioctl(kvm_state, KVM_S390_SET_CMMA_BITS, &clog);
if (r) {
error_report("KVM_S390_SET_CMMA_BITS failed: %s", strerror(-r));

hw/s390x/s390-virtio-ccw.c

@ -152,14 +152,38 @@ static void virtio_ccw_register_hcalls(void)
virtio_ccw_hcall_early_printk);
}
/*
* KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages
* as the dirty bitmap must be managed by bitops that take an int as
* position indicator. If we have a guest beyond that we will split off
* new subregions. The split must happen on a segment boundary (1MB).
*/
#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1)
#define SEG_MSK (~0xfffffULL)
#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK)
static void s390_memory_init(ram_addr_t mem_size)
{
MemoryRegion *sysmem = get_system_memory();
MemoryRegion *ram = g_new(MemoryRegion, 1);
ram_addr_t chunk, offset = 0;
unsigned int number = 0;
gchar *name;
/* allocate RAM for core */
memory_region_allocate_system_memory(ram, NULL, "s390.ram", mem_size);
memory_region_add_subregion(sysmem, 0, ram);
name = g_strdup_printf("s390.ram");
while (mem_size) {
MemoryRegion *ram = g_new(MemoryRegion, 1);
uint64_t size = mem_size;
/* KVM does not allow memslots >= 8 TB */
chunk = MIN(size, KVM_SLOT_MAX_BYTES);
memory_region_allocate_system_memory(ram, NULL, name, chunk);
memory_region_add_subregion(sysmem, offset, ram);
mem_size -= chunk;
offset += chunk;
g_free(name);
name = g_strdup_printf("s390.ram.%u", ++number);
}
g_free(name);
/* Initialize storage key device */
s390_skeys_init();
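
To put numbers on the split (derived from the macros above, assuming 4 KiB target pages): KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE is 8 TiB - 4 KiB, which SEG_MSK rounds down to a 1 MiB boundary, so KVM_SLOT_MAX_BYTES is 8 TiB - 1 MiB. A 16 TiB guest thus gets two full-size regions, "s390.ram" and "s390.ram.1", plus a 2 MiB "s390.ram.2" remainder:

/* Sketch of the chunk arithmetic with 4 KiB pages. */
uint64_t slot_max = (((1ULL << 31) - 1) * 4096) & ~0xfffffULL;
/* slot_max == 8 TiB - 1 MiB; 16 TiB of RAM -> two full chunks + 2 MiB */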

hw/s390x/virtio-ccw.c

@ -751,7 +751,7 @@ out_err:
g_free(sch);
}
static int virtio_ccw_exit(VirtioCcwDevice *dev)
static void virtio_ccw_unrealize(VirtioCcwDevice *dev, Error **errp)
{
CcwDevice *ccw_dev = CCW_DEVICE(dev);
SubchDev *sch = ccw_dev->sch;
@ -759,12 +759,12 @@ static int virtio_ccw_exit(VirtioCcwDevice *dev)
if (sch) {
css_subch_assign(sch->cssid, sch->ssid, sch->schid, sch->devno, NULL);
g_free(sch);
ccw_dev->sch = NULL;
}
if (dev->indicators) {
release_indicator(&dev->routes.adapter, dev->indicators);
dev->indicators = NULL;
}
return 0;
}
static void virtio_ccw_net_realize(VirtioCcwDevice *ccw_dev, Error **errp)
@ -1057,10 +1057,12 @@ static void virtio_ccw_reset(DeviceState *d)
{
VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d);
VirtIODevice *vdev = virtio_bus_get_device(&dev->bus);
CcwDevice *ccw_dev = CCW_DEVICE(d);
VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_GET_CLASS(dev);
virtio_ccw_reset_virtio(dev, vdev);
css_reset_sch(ccw_dev->sch);
if (vdc->parent_reset) {
vdc->parent_reset(d);
}
}
static void virtio_ccw_vmstate_change(DeviceState *d, bool running)
@ -1343,8 +1345,7 @@ static void virtio_ccw_net_class_init(ObjectClass *klass, void *data)
VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_CLASS(klass);
k->realize = virtio_ccw_net_realize;
k->exit = virtio_ccw_exit;
dc->reset = virtio_ccw_reset;
k->unrealize = virtio_ccw_unrealize;
dc->props = virtio_ccw_net_properties;
set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
}
@ -1371,8 +1372,7 @@ static void virtio_ccw_blk_class_init(ObjectClass *klass, void *data)
VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_CLASS(klass);
k->realize = virtio_ccw_blk_realize;
k->exit = virtio_ccw_exit;
dc->reset = virtio_ccw_reset;
k->unrealize = virtio_ccw_unrealize;
dc->props = virtio_ccw_blk_properties;
set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
}
@ -1399,8 +1399,7 @@ static void virtio_ccw_serial_class_init(ObjectClass *klass, void *data)
VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_CLASS(klass);
k->realize = virtio_ccw_serial_realize;
k->exit = virtio_ccw_exit;
dc->reset = virtio_ccw_reset;
k->unrealize = virtio_ccw_unrealize;
dc->props = virtio_ccw_serial_properties;
set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
}
@ -1427,8 +1426,7 @@ static void virtio_ccw_balloon_class_init(ObjectClass *klass, void *data)
VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_CLASS(klass);
k->realize = virtio_ccw_balloon_realize;
k->exit = virtio_ccw_exit;
dc->reset = virtio_ccw_reset;
k->unrealize = virtio_ccw_unrealize;
dc->props = virtio_ccw_balloon_properties;
set_bit(DEVICE_CATEGORY_MISC, dc->categories);
}
@ -1455,8 +1453,7 @@ static void virtio_ccw_scsi_class_init(ObjectClass *klass, void *data)
VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_CLASS(klass);
k->realize = virtio_ccw_scsi_realize;
k->exit = virtio_ccw_exit;
dc->reset = virtio_ccw_reset;
k->unrealize = virtio_ccw_unrealize;
dc->props = virtio_ccw_scsi_properties;
set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
}
@ -1482,8 +1479,7 @@ static void vhost_ccw_scsi_class_init(ObjectClass *klass, void *data)
VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_CLASS(klass);
k->realize = vhost_ccw_scsi_realize;
k->exit = virtio_ccw_exit;
dc->reset = virtio_ccw_reset;
k->unrealize = virtio_ccw_unrealize;
dc->props = vhost_ccw_scsi_properties;
set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
}
@ -1519,8 +1515,7 @@ static void virtio_ccw_rng_class_init(ObjectClass *klass, void *data)
VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_CLASS(klass);
k->realize = virtio_ccw_rng_realize;
k->exit = virtio_ccw_exit;
dc->reset = virtio_ccw_reset;
k->unrealize = virtio_ccw_unrealize;
dc->props = virtio_ccw_rng_properties;
set_bit(DEVICE_CATEGORY_MISC, dc->categories);
}
@ -1557,8 +1552,7 @@ static void virtio_ccw_crypto_class_init(ObjectClass *klass, void *data)
VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_CLASS(klass);
k->realize = virtio_ccw_crypto_realize;
k->exit = virtio_ccw_exit;
dc->reset = virtio_ccw_reset;
k->unrealize = virtio_ccw_unrealize;
dc->props = virtio_ccw_crypto_properties;
set_bit(DEVICE_CATEGORY_MISC, dc->categories);
}
@ -1595,8 +1589,7 @@ static void virtio_ccw_gpu_class_init(ObjectClass *klass, void *data)
VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_CLASS(klass);
k->realize = virtio_ccw_gpu_realize;
k->exit = virtio_ccw_exit;
dc->reset = virtio_ccw_reset;
k->unrealize = virtio_ccw_unrealize;
dc->props = virtio_ccw_gpu_properties;
dc->hotpluggable = false;
set_bit(DEVICE_CATEGORY_DISPLAY, dc->categories);
@ -1624,8 +1617,7 @@ static void virtio_ccw_input_class_init(ObjectClass *klass, void *data)
VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_CLASS(klass);
k->realize = virtio_ccw_input_realize;
k->exit = virtio_ccw_exit;
dc->reset = virtio_ccw_reset;
k->unrealize = virtio_ccw_unrealize;
dc->props = virtio_ccw_input_properties;
set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
}
@ -1704,12 +1696,12 @@ static void virtio_ccw_busdev_realize(DeviceState *dev, Error **errp)
virtio_ccw_device_realize(_dev, errp);
}
static int virtio_ccw_busdev_exit(DeviceState *dev)
static void virtio_ccw_busdev_unrealize(DeviceState *dev, Error **errp)
{
VirtioCcwDevice *_dev = (VirtioCcwDevice *)dev;
VirtIOCCWDeviceClass *_info = VIRTIO_CCW_DEVICE_GET_CLASS(dev);
return _info->exit(_dev);
_info->unrealize(_dev, errp);
}
static void virtio_ccw_busdev_unplug(HotplugHandler *hotplug_dev,
@ -1724,11 +1716,13 @@ static void virtio_ccw_device_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
CCWDeviceClass *k = CCW_DEVICE_CLASS(dc);
VirtIOCCWDeviceClass *vdc = VIRTIO_CCW_DEVICE_CLASS(klass);
k->unplug = virtio_ccw_busdev_unplug;
dc->realize = virtio_ccw_busdev_realize;
dc->exit = virtio_ccw_busdev_exit;
dc->unrealize = virtio_ccw_busdev_unrealize;
dc->bus_type = TYPE_VIRTUAL_CSS_BUS;
device_class_set_parent_reset(dc, virtio_ccw_reset, &vdc->parent_reset);
}
static const TypeInfo virtio_ccw_device_info = {
@ -1803,9 +1797,8 @@ static void virtio_ccw_9p_class_init(ObjectClass *klass, void *data)
DeviceClass *dc = DEVICE_CLASS(klass);
VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_CLASS(klass);
k->exit = virtio_ccw_exit;
k->unrealize = virtio_ccw_unrealize;
k->realize = virtio_ccw_9p_realize;
dc->reset = virtio_ccw_reset;
dc->props = virtio_ccw_9p_properties;
set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
}
@ -1852,10 +1845,9 @@ static void vhost_vsock_ccw_class_init(ObjectClass *klass, void *data)
VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_CLASS(klass);
k->realize = vhost_vsock_ccw_realize;
k->exit = virtio_ccw_exit;
k->unrealize = virtio_ccw_unrealize;
set_bit(DEVICE_CATEGORY_MISC, dc->categories);
dc->props = vhost_vsock_ccw_properties;
dc->reset = virtio_ccw_reset;
}
static void vhost_vsock_ccw_instance_init(Object *obj)

hw/s390x/virtio-ccw.h

@ -76,7 +76,8 @@ typedef struct VirtioCcwDevice VirtioCcwDevice;
typedef struct VirtIOCCWDeviceClass {
CCWDeviceClass parent_class;
void (*realize)(VirtioCcwDevice *dev, Error **errp);
int (*exit)(VirtioCcwDevice *dev);
void (*unrealize)(VirtioCcwDevice *dev, Error **errp);
void (*parent_reset)(DeviceState *dev);
} VirtIOCCWDeviceClass;
/* Performance improves when virtqueue kick processing is decoupled from the

hw/scsi/scsi-bus.c

@ -224,6 +224,7 @@ static void scsi_qdev_unrealize(DeviceState *qdev, Error **errp)
/* handle legacy '-drive if=scsi,...' cmd line args */
SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
int unit, bool removable, int bootindex,
bool share_rw,
const char *serial, Error **errp)
{
const char *driver;
@ -254,6 +255,12 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
object_unparent(OBJECT(dev));
return NULL;
}
object_property_set_bool(OBJECT(dev), share_rw, "share-rw", &err);
if (err != NULL) {
error_propagate(errp, err);
object_unparent(OBJECT(dev));
return NULL;
}
object_property_set_bool(OBJECT(dev), true, "realized", &err);
if (err != NULL) {
error_propagate(errp, err);
@ -288,7 +295,7 @@ void scsi_bus_legacy_handle_cmdline(SCSIBus *bus, bool deprecated)
}
}
scsi_bus_legacy_add_drive(bus, blk_by_legacy_dinfo(dinfo),
unit, false, -1, NULL, &error_fatal);
unit, false, -1, false, NULL, &error_fatal);
}
loc_pop(&loc);
}

hw/scsi/scsi-disk.c

@ -1755,6 +1755,7 @@ static void scsi_write_same_complete(void *opaque, int ret)
data->sector << BDRV_SECTOR_BITS,
&data->qiov, 0,
scsi_write_same_complete, data);
aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
return;
}

hw/sd/milkymist-memcard.c

@ -248,6 +248,10 @@ static void milkymist_memcard_reset(DeviceState *d)
for (i = 0; i < R_MAX; i++) {
s->regs[i] = 0;
}
/* Since we're still using the legacy SD API the card is not plugged
* into any bus, and we must reset it manually.
*/
device_reset(DEVICE(s->card));
}
static int milkymist_memcard_init(SysBusDevice *dev)

hw/sd/pl181.c

@ -480,6 +480,10 @@ static void pl181_reset(DeviceState *d)
/* We can assume our GPIO outputs have been wired up now */
sd_set_cb(s->card, s->cardstatus[0], s->cardstatus[1]);
/* Since we're still using the legacy SD API the card is not plugged
* into any bus, and we must reset it manually.
*/
device_reset(DEVICE(s->card));
}
static void pl181_init(Object *obj)

hw/sd/ssi-sd.c

@ -50,6 +50,9 @@ typedef struct {
SDState *sd;
} ssi_sd_state;
#define TYPE_SSI_SD "ssi-sd"
#define SSI_SD(obj) OBJECT_CHECK(ssi_sd_state, (obj), TYPE_SSI_SD)
/* State word bits. */
#define SSI_SDR_LOCKED 0x0001
#define SSI_SDR_WP_ERASE 0x0002
@ -241,7 +244,6 @@ static void ssi_sd_realize(SSISlave *d, Error **errp)
ssi_sd_state *s = FROM_SSI_SLAVE(ssi_sd_state, d);
DriveInfo *dinfo;
s->mode = SSI_SD_CMD;
/* FIXME use a qdev drive property instead of drive_get_next() */
dinfo = drive_get_next(IF_SD);
s->sd = sd_init(dinfo ? blk_by_legacy_dinfo(dinfo) : NULL, true);
@ -251,6 +253,24 @@ static void ssi_sd_realize(SSISlave *d, Error **errp)
}
}
static void ssi_sd_reset(DeviceState *dev)
{
ssi_sd_state *s = SSI_SD(dev);
s->mode = SSI_SD_CMD;
s->cmd = 0;
memset(s->cmdarg, 0, sizeof(s->cmdarg));
memset(s->response, 0, sizeof(s->response));
s->arglen = 0;
s->response_pos = 0;
s->stopping = 0;
/* Since we're still using the legacy SD API the card is not plugged
* into any bus, and we must reset it manually.
*/
device_reset(DEVICE(s->sd));
}
static void ssi_sd_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
@ -260,10 +280,11 @@ static void ssi_sd_class_init(ObjectClass *klass, void *data)
k->transfer = ssi_sd_transfer;
k->cs_polarity = SSI_CS_LOW;
dc->vmsd = &vmstate_ssi_sd;
dc->reset = ssi_sd_reset;
}
static const TypeInfo ssi_sd_info = {
.name = "ssi-sd",
.name = TYPE_SSI_SD,
.parent = TYPE_SSI_SLAVE,
.instance_size = sizeof(ssi_sd_state),
.class_init = ssi_sd_class_init,

hw/tpm/tpm_emulator.c

@ -260,7 +260,9 @@ static int tpm_emulator_check_caps(TPMEmulator *tpm_emu)
static int tpm_emulator_startup_tpm(TPMBackend *tb)
{
TPMEmulator *tpm_emu = TPM_EMULATOR(tb);
ptm_init init;
ptm_init init = {
.u.req.init_flags = 0,
};
ptm_res res;
DPRINTF("%s", __func__);

hw/tpm/tpm_passthrough.c

@ -206,7 +206,8 @@ static TPMVersion tpm_passthrough_get_tpm_version(TPMBackend *tb)
* Unless path or file descriptor set has been provided by user,
* determine the sysfs cancel file following kernel documentation
* in Documentation/ABI/stable/sysfs-class-tpm.
* From /dev/tpm0 create /sys/class/misc/tpm0/device/cancel
* From /dev/tpm0 create /sys/class/tpm/tpm0/device/cancel
* before 4.0: /sys/class/misc/tpm0/device/cancel
*/
static int tpm_passthrough_open_sysfs_cancel(TPMPassthruState *tpm_pt)
{
@ -217,28 +218,35 @@ static int tpm_passthrough_open_sysfs_cancel(TPMPassthruState *tpm_pt)
if (tpm_pt->options->cancel_path) {
fd = qemu_open(tpm_pt->options->cancel_path, O_WRONLY);
if (fd < 0) {
error_report("Could not open TPM cancel path : %s",
error_report("tpm_passthrough: Could not open TPM cancel path: %s",
strerror(errno));
}
return fd;
}
dev = strrchr(tpm_pt->tpm_dev, '/');
if (dev) {
dev++;
if (snprintf(path, sizeof(path), "/sys/class/misc/%s/device/cancel",
dev) < sizeof(path)) {
fd = qemu_open(path, O_WRONLY);
if (fd >= 0) {
tpm_pt->options->cancel_path = g_strdup(path);
} else {
error_report("tpm_passthrough: Could not open TPM cancel "
"path %s : %s", path, strerror(errno));
if (!dev) {
error_report("tpm_passthrough: Bad TPM device path %s",
tpm_pt->tpm_dev);
return -1;
}
dev++;
if (snprintf(path, sizeof(path), "/sys/class/tpm/%s/device/cancel",
dev) < sizeof(path)) {
fd = qemu_open(path, O_WRONLY);
if (fd < 0) {
if (snprintf(path, sizeof(path), "/sys/class/misc/%s/device/cancel",
dev) < sizeof(path)) {
fd = qemu_open(path, O_WRONLY);
}
}
}
if (fd < 0) {
error_report("tpm_passthrough: Could not guess TPM cancel path");
} else {
error_report("tpm_passthrough: Bad TPM device path %s",
tpm_pt->tpm_dev);
tpm_pt->options->cancel_path = g_strdup(path);
}
return fd;

hw/usb/dev-mtp.c

@ -965,12 +965,16 @@ static MTPData *usb_mtp_get_object(MTPState *s, MTPControl *c,
static MTPData *usb_mtp_get_partial_object(MTPState *s, MTPControl *c,
MTPObject *o)
{
MTPData *d = usb_mtp_data_alloc(c);
MTPData *d;
off_t offset;
if (c->argc <= 2) {
return NULL;
}
trace_usb_mtp_op_get_partial_object(s->dev.addr, o->handle, o->path,
c->argv[1], c->argv[2]);
d = usb_mtp_data_alloc(c);
d->fd = open(o->path, O_RDONLY);
if (d->fd == -1) {
usb_mtp_data_free(d);

hw/usb/dev-smartcard-reader.c

@ -329,8 +329,8 @@ static const uint8_t qemu_ccid_descriptor[] = {
*/
0x07, /* u8 bVoltageSupport; 01h - 5.0v, 02h - 3.0, 03 - 1.8 */
0x00, 0x00, /* u32 dwProtocols; RRRR PPPP. RRRR = 0000h.*/
0x01, 0x00, /* PPPP: 0001h = Protocol T=0, 0002h = Protocol T=1 */
0x01, 0x00, /* u32 dwProtocols; RRRR PPPP. RRRR = 0000h.*/
0x00, 0x00, /* PPPP: 0001h = Protocol T=0, 0002h = Protocol T=1 */
/* u32 dwDefaultClock; in kHZ (0x0fa0 is 4 MHz) */
0xa0, 0x0f, 0x00, 0x00,
/* u32 dwMaximumClock; */

hw/usb/dev-storage.c

@ -635,7 +635,8 @@ static void usb_msd_realize_storage(USBDevice *dev, Error **errp)
scsi_bus_new(&s->bus, sizeof(s->bus), DEVICE(dev),
&usb_msd_scsi_info_storage, NULL);
scsi_dev = scsi_bus_legacy_add_drive(&s->bus, blk, 0, !!s->removable,
s->conf.bootindex, dev->serial,
s->conf.bootindex, s->conf.share_rw,
dev->serial,
&err);
blk_unref(blk);
if (!scsi_dev) {

hw/usb/host-libusb.c

@ -247,7 +247,11 @@ static int usb_host_init(void)
if (rc != 0) {
return -1;
}
#if LIBUSB_API_VERSION >= 0x01000106
libusb_set_option(ctx, LIBUSB_OPTION_LOG_LEVEL, loglevel);
#else
libusb_set_debug(ctx, loglevel);
#endif
#ifdef CONFIG_WIN32
/* FIXME: add support for Windows. */
#else

hw/usb/redirect.c

@ -795,7 +795,7 @@ static void usbredir_handle_bulk_data(USBRedirDevice *dev, USBPacket *p,
usbredirparser_peer_has_cap(dev->parser,
usb_redir_cap_32bits_bulk_length));
if (ep & USB_DIR_IN) {
if (ep & USB_DIR_IN || size == 0) {
usbredirparser_send_bulk_packet(dev->parser, p->id,
&bulk_packet, NULL, 0);
} else {

hw/vfio/ccw.c

@ -357,11 +357,13 @@ static void vfio_ccw_realize(DeviceState *dev, Error **errp)
if (strcmp(vbasedev->name, vcdev->vdev.name) == 0) {
error_setg(&err, "vfio: subchannel %s has already been attached",
vcdev->vdev.name);
g_free(vcdev->vdev.name);
goto out_device_err;
}
}
if (vfio_get_device(group, cdev->mdevid, &vcdev->vdev, &err)) {
g_free(vcdev->vdev.name);
goto out_device_err;
}

hw/vfio/common.c

@ -968,6 +968,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
group->container = container;
QLIST_INSERT_HEAD(&container->group_list, group, container_next);
vfio_kvm_device_add_group(group);
return 0;
}
}

hw/virtio/vhost-user.c

@ -317,11 +317,14 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
&offset);
fd = memory_region_get_fd(mr);
if (fd > 0) {
if (fd_num == VHOST_MEMORY_MAX_NREGIONS) {
error_report("Failed preparing vhost-user memory table msg");
return -1;
}
msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr;
msg.payload.memory.regions[fd_num].memory_size = reg->memory_size;
msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr;
msg.payload.memory.regions[fd_num].mmap_offset = offset;
assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
fds[fd_num++] = fd;
}
}

hw/virtio/virtio-balloon.c

@ -234,6 +234,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
memory_region_is_rom(section.mr) ||
memory_region_is_romd(section.mr)) {
trace_virtio_balloon_bad_addr(pa);
memory_region_unref(section.mr);
continue;
}

hw/virtio/virtio.c

@ -2469,7 +2469,7 @@ void GCC_FMT_ATTR(2, 3) virtio_error(VirtIODevice *vdev, const char *fmt, ...)
va_end(ap);
if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
virtio_set_status(vdev, vdev->status | VIRTIO_CONFIG_S_NEEDS_RESET);
vdev->status = vdev->status | VIRTIO_CONFIG_S_NEEDS_RESET;
virtio_notify_config(vdev);
}

include/block/block.h

@ -437,6 +437,7 @@ bool bdrv_is_read_only(BlockDriverState *bs);
int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only,
bool ignore_allow_rdw, Error **errp);
int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp);
bool bdrv_is_writable(BlockDriverState *bs);
bool bdrv_is_sg(BlockDriverState *bs);
bool bdrv_is_inserted(BlockDriverState *bs);
void bdrv_lock_medium(BlockDriverState *bs, bool locked);

include/exec/cpu-all.h

@ -159,8 +159,12 @@ extern unsigned long guest_base;
extern int have_guest_base;
extern unsigned long reserved_va;
#define GUEST_ADDR_MAX (reserved_va ? reserved_va : \
#if HOST_LONG_BITS <= TARGET_VIRT_ADDR_SPACE_BITS
#define GUEST_ADDR_MAX (~0ul)
#else
#define GUEST_ADDR_MAX (reserved_va ? reserved_va - 1 : \
(1ul << TARGET_VIRT_ADDR_SPACE_BITS) - 1)
#endif
#else
#include "exec/hwaddr.h"

include/exec/cpu_ldst.h

@ -51,15 +51,13 @@
/* All direct uses of g2h and h2g need to go away for usermode softmmu. */
#define g2h(x) ((void *)((unsigned long)(target_ulong)(x) + guest_base))
#if HOST_LONG_BITS <= TARGET_VIRT_ADDR_SPACE_BITS
#define h2g_valid(x) 1
#else
#define h2g_valid(x) ({ \
unsigned long __guest = (unsigned long)(x) - guest_base; \
(__guest < (1ul << TARGET_VIRT_ADDR_SPACE_BITS)) && \
(!reserved_va || (__guest < reserved_va)); \
})
#endif
#define guest_addr_valid(x) ((x) <= GUEST_ADDR_MAX)
#define h2g_valid(x) guest_addr_valid((unsigned long)(x) - guest_base)
static inline int guest_range_valid(unsigned long start, unsigned long len)
{
return len - 1 <= GUEST_ADDR_MAX && start <= GUEST_ADDR_MAX - len + 1;
}
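
A quick check of the new helper (illustrative): start == GUEST_ADDR_MAX with len == 2 passes the first clause but fails start <= GUEST_ADDR_MAX - len + 1, so the range is rejected; phrasing it this way also avoids the unsigned wrap-around that a naive start + len comparison would hit at the top of the guest address space.
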
#define h2g_nocheck(x) ({ \
unsigned long __ret = (unsigned long)(x) - guest_base; \

include/exec/memory-internal.h

@ -20,7 +20,15 @@
#define MEMORY_INTERNAL_H
#ifndef CONFIG_USER_ONLY
typedef struct AddressSpaceDispatch AddressSpaceDispatch;
static inline AddressSpaceDispatch *flatview_to_dispatch(FlatView *fv)
{
return fv->dispatch;
}
static inline AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as)
{
return flatview_to_dispatch(address_space_to_flatview(as));
}
extern const MemoryRegionOps unassigned_mem_ops;
@ -30,9 +38,6 @@ bool memory_region_access_valid(MemoryRegion *mr, hwaddr addr,
void flatview_add_to_dispatch(FlatView *fv, MemoryRegionSection *section);
AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv);
void address_space_dispatch_compact(AddressSpaceDispatch *d);
AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as);
AddressSpaceDispatch *flatview_to_dispatch(FlatView *fv);
void address_space_dispatch_free(AddressSpaceDispatch *d);
void mtree_print_dispatch(fprintf_function mon, void *f,

include/exec/memory.h

@ -318,7 +318,27 @@ struct AddressSpace {
QTAILQ_ENTRY(AddressSpace) address_spaces_link;
};
FlatView *address_space_to_flatview(AddressSpace *as);
typedef struct AddressSpaceDispatch AddressSpaceDispatch;
typedef struct FlatRange FlatRange;
/* Flattened global view of current active memory hierarchy. Kept in sorted
* order.
*/
struct FlatView {
struct rcu_head rcu;
unsigned ref;
FlatRange *ranges;
unsigned nr;
unsigned nr_allocated;
struct AddressSpaceDispatch *dispatch;
MemoryRegion *root;
};
static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
return atomic_rcu_read(&as->current_map);
}
/**
* MemoryRegionSection: describes a fragment of a #MemoryRegion
@ -1887,13 +1907,12 @@ void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
/* Internal functions, part of the implementation of address_space_read. */
MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr,
MemTxAttrs attrs, uint8_t *buf, int len);
MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr,
MemTxAttrs attrs, uint8_t *buf,
int len, hwaddr addr1, hwaddr l,
MemoryRegion *mr);
MemTxResult flatview_read_full(FlatView *fv, hwaddr addr,
MemTxAttrs attrs, uint8_t *buf, int len);
void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t addr);
static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
@ -1912,7 +1931,7 @@ static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
*
* Return a MemTxResult indicating whether the operation succeeded
* or failed (eg unassigned memory, device rejected the transaction,
* IOMMU fault).
* IOMMU fault). Called within RCU critical section.
*
* @as: #AddressSpace to be accessed
* @addr: address within that address space
@ -1920,17 +1939,20 @@ static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
* @buf: buffer with the data transferred
*/
static inline __attribute__((__always_inline__))
MemTxResult flatview_read(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
uint8_t *buf, int len)
MemTxResult address_space_read(AddressSpace *as, hwaddr addr,
MemTxAttrs attrs, uint8_t *buf,
int len)
{
MemTxResult result = MEMTX_OK;
hwaddr l, addr1;
void *ptr;
MemoryRegion *mr;
FlatView *fv;
if (__builtin_constant_p(len)) {
if (len) {
rcu_read_lock();
fv = address_space_to_flatview(as);
l = len;
mr = flatview_translate(fv, addr, &addr1, &l, false);
if (len == l && memory_access_is_direct(mr, false)) {
@ -1943,18 +1965,11 @@ MemTxResult flatview_read(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
rcu_read_unlock();
}
} else {
result = flatview_read_full(fv, addr, attrs, buf, len);
result = address_space_read_full(as, addr, attrs, buf, len);
}
return result;
}
static inline MemTxResult address_space_read(AddressSpace *as, hwaddr addr,
MemTxAttrs attrs, uint8_t *buf,
int len)
{
return flatview_read(address_space_to_flatview(as), addr, attrs, buf, len);
}
/**
* address_space_read_cached: read from a cached RAM region
*

include/hw/compat.h

@ -1,6 +1,8 @@
#ifndef HW_COMPAT_H
#define HW_COMPAT_H
#define HW_COMPAT_2_11
#define HW_COMPAT_2_10 \
{\
.driver = "virtio-mouse-device",\

include/hw/i386/intel_iommu.h

@ -27,6 +27,7 @@
#include "hw/i386/ioapic.h"
#include "hw/pci/msi.h"
#include "hw/sysbus.h"
#include "qemu/iova-tree.h"
#define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
#define INTEL_IOMMU_DEVICE(obj) \
@ -46,8 +47,10 @@
#define VTD_SID_TO_DEVFN(sid) ((sid) & 0xff)
#define DMAR_REG_SIZE 0x230
#define VTD_HOST_ADDRESS_WIDTH 39
#define VTD_HAW_MASK ((1ULL << VTD_HOST_ADDRESS_WIDTH) - 1)
#define VTD_HOST_AW_39BIT 39
#define VTD_HOST_AW_48BIT 48
#define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT
#define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1)
#define DMAR_REPORT_F_INTR (1)
@ -65,7 +68,6 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
typedef struct VTDIrq VTDIrq;
typedef struct VTD_MSIMessage VTD_MSIMessage;
typedef struct IntelIOMMUNotifierNode IntelIOMMUNotifierNode;
/* Context-Entry */
struct VTDContextEntry {
@ -91,6 +93,10 @@ struct VTDAddressSpace {
MemoryRegion iommu_ir; /* Interrupt region: 0xfeeXXXXX */
IntelIOMMUState *iommu_state;
VTDContextCacheEntry context_cache_entry;
QLIST_ENTRY(VTDAddressSpace) next;
/* Superset of notifier flags that this address space has */
IOMMUNotifierFlag notifier_flags;
IOVATree *iova_tree; /* Traces mapped IOVA ranges */
};
struct VTDBus {
@ -251,11 +257,6 @@ struct VTD_MSIMessage {
/* When IR is enabled, all MSI/MSI-X data bits should be zero */
#define VTD_IR_MSI_DATA (0)
struct IntelIOMMUNotifierNode {
VTDAddressSpace *vtd_as;
QLIST_ENTRY(IntelIOMMUNotifierNode) next;
};
/* The iommu (DMAR) device state struct */
struct IntelIOMMUState {
X86IOMMUState x86_iommu;
@ -293,7 +294,7 @@ struct IntelIOMMUState {
GHashTable *vtd_as_by_busptr; /* VTDBus objects indexed by PCIBus* reference */
VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
/* list of registered notifiers */
QLIST_HEAD(, IntelIOMMUNotifierNode) notifiers_list;
QLIST_HEAD(, VTDAddressSpace) vtd_as_with_notifiers;
/* interrupt remapping */
bool intr_enabled; /* Whether guest enabled IR */
@ -302,6 +303,13 @@ struct IntelIOMMUState {
bool intr_eime; /* Extended interrupt mode enabled */
OnOffAuto intr_eim; /* Toggle for EIM capability */
bool buggy_eim; /* Force buggy EIM unless eim=off */
uint8_t aw_bits; /* Host/IOVA address width (in bits) */
/*
* Protects IOMMU states in general. Currently it protects the
* per-IOMMU IOTLB cache, and context entry cache in VTDAddressSpace.
*/
QemuMutex iommu_lock;
};
/* Find the VTD Address space associated with the given bus pointer,

include/hw/intc/arm_gicv3_common.h

@ -217,6 +217,7 @@ struct GICv3State {
uint32_t revision;
bool security_extn;
bool irq_reset_nonsecure;
bool gicd_no_migration_shift_bug;
int dev_fd; /* kvm device fd if backed by kvm vgic support */
Error *migration_blocker;

include/hw/ppc/spapr.h

@ -50,6 +50,41 @@ typedef enum {
SPAPR_RESIZE_HPT_REQUIRED,
} sPAPRResizeHPT;
/**
* Capabilities
*/
/* Hardware Transactional Memory */
#define SPAPR_CAP_HTM 0x00
/* Vector Scalar Extensions */
#define SPAPR_CAP_VSX 0x01
/* Decimal Floating Point */
#define SPAPR_CAP_DFP 0x02
/* Cache Flush on Privilege Change */
#define SPAPR_CAP_CFPC 0x03
/* Speculation Barrier Bounds Checking */
#define SPAPR_CAP_SBBC 0x04
/* Indirect Branch Serialisation */
#define SPAPR_CAP_IBS 0x05
/* Num Caps */
#define SPAPR_CAP_NUM (SPAPR_CAP_IBS + 1)
/*
* Capability Values
*/
/* Bool Caps */
#define SPAPR_CAP_OFF 0x00
#define SPAPR_CAP_ON 0x01
/* Broken | Workaround | Fixed Caps */
#define SPAPR_CAP_BROKEN 0x00
#define SPAPR_CAP_WORKAROUND 0x01
#define SPAPR_CAP_FIXED 0x02
typedef struct sPAPRCapabilities sPAPRCapabilities;
struct sPAPRCapabilities {
uint8_t caps[SPAPR_CAP_NUM];
};
/**
* sPAPRMachineClass:
*/
@ -66,6 +101,7 @@ struct sPAPRMachineClass {
hwaddr *mmio32, hwaddr *mmio64,
unsigned n_dma, uint32_t *liobns, Error **errp);
sPAPRResizeHPT resize_hpt_default;
sPAPRCapabilities default_caps;
};
/**
@ -127,6 +163,9 @@ struct sPAPRMachineState {
MemoryHotplugState hotplug_memory;
const char *icp_type;
bool cmd_line_caps[SPAPR_CAP_NUM];
sPAPRCapabilities def, eff, mig;
};
#define H_SUCCESS 0
@ -266,6 +305,18 @@ struct sPAPRMachineState {
#define H_DABRX_KERNEL (1ULL<<(63-62))
#define H_DABRX_USER (1ULL<<(63-63))
/* Values for KVM_PPC_GET_CPU_CHAR & H_GET_CPU_CHARACTERISTICS */
#define H_CPU_CHAR_SPEC_BAR_ORI31 PPC_BIT(0)
#define H_CPU_CHAR_BCCTRL_SERIALISED PPC_BIT(1)
#define H_CPU_CHAR_L1D_FLUSH_ORI30 PPC_BIT(2)
#define H_CPU_CHAR_L1D_FLUSH_TRIG2 PPC_BIT(3)
#define H_CPU_CHAR_L1D_THREAD_PRIV PPC_BIT(4)
#define H_CPU_CHAR_HON_BRANCH_HINTS PPC_BIT(5)
#define H_CPU_CHAR_THR_RECONF_TRIG PPC_BIT(6)
#define H_CPU_BEHAV_FAVOUR_SECURITY PPC_BIT(0)
#define H_CPU_BEHAV_L1D_FLUSH_PR PPC_BIT(1)
#define H_CPU_BEHAV_BNDS_CHK_SPEC_BAR PPC_BIT(2)
/* Each control block has to be on a 4K boundary */
#define H_CB_ALIGNMENT 4096
@@ -353,6 +404,7 @@ struct sPAPRMachineState {
#define H_GET_HCA_INFO 0x1B8
#define H_GET_PERF_COUNT 0x1BC
#define H_MANAGE_TRACE 0x1C0
#define H_GET_CPU_CHARACTERISTICS 0x1C8
#define H_FREE_LOGICAL_LAN_BUFFER 0x1D4
#define H_QUERY_INT_STATE 0x1E4
#define H_POLL_PENDING 0x1D8
@@ -704,7 +756,30 @@ void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
#define HTAB_SIZE(spapr) (1ULL << ((spapr)->htab_shift))
int spapr_vcpu_id(PowerPCCPU *cpu);
int spapr_get_vcpu_id(PowerPCCPU *cpu);
void spapr_set_vcpu_id(PowerPCCPU *cpu, int cpu_index, Error **errp);
PowerPCCPU *spapr_find_cpu(int vcpu_id);
int spapr_caps_pre_load(void *opaque);
int spapr_caps_pre_save(void *opaque);
/*
* Handling of optional capabilities
*/
extern const VMStateDescription vmstate_spapr_cap_htm;
extern const VMStateDescription vmstate_spapr_cap_vsx;
extern const VMStateDescription vmstate_spapr_cap_dfp;
extern const VMStateDescription vmstate_spapr_cap_cfpc;
extern const VMStateDescription vmstate_spapr_cap_sbbc;
extern const VMStateDescription vmstate_spapr_cap_ibs;
static inline uint8_t spapr_get_cap(sPAPRMachineState *spapr, int cap)
{
return spapr->eff.caps[cap];
}
void spapr_caps_reset(sPAPRMachineState *spapr);
void spapr_caps_add_properties(sPAPRMachineClass *smc, Error **errp);
int spapr_caps_post_migration(sPAPRMachineState *spapr);
#endif /* HW_SPAPR_H */
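As a usage sketch (not part of this diff), code that implements a guarded feature can consult the effective capability set via spapr_get_cap(); spapr_htm_enabled() below is a hypothetical helper:

static bool spapr_htm_enabled(sPAPRMachineState *spapr)
{
    /* eff holds the effective caps after defaults and -machine overrides */
    return spapr_get_cap(spapr, SPAPR_CAP_HTM) == SPAPR_CAP_ON;
}

Tri-state caps such as SPAPR_CAP_CFPC are compared against SPAPR_CAP_BROKEN, SPAPR_CAP_WORKAROUND and SPAPR_CAP_FIXED rather than the boolean values.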


@@ -32,9 +32,9 @@ typedef enum DeviceCategory {
typedef int (*qdev_initfn)(DeviceState *dev);
typedef int (*qdev_event)(DeviceState *dev);
typedef void (*qdev_resetfn)(DeviceState *dev);
typedef void (*DeviceRealize)(DeviceState *dev, Error **errp);
typedef void (*DeviceUnrealize)(DeviceState *dev, Error **errp);
typedef void (*DeviceReset)(DeviceState *dev);
typedef void (*BusRealize)(BusState *bus, Error **errp);
typedef void (*BusUnrealize)(BusState *bus, Error **errp);
@@ -117,7 +117,7 @@ typedef struct DeviceClass {
bool hotpluggable;
/* callbacks */
void (*reset)(DeviceState *dev);
DeviceReset reset;
DeviceRealize realize;
DeviceUnrealize unrealize;
@@ -381,6 +391,16 @@ void qdev_machine_init(void);
*/
void device_reset(DeviceState *dev);
void device_class_set_parent_reset(DeviceClass *dc,
DeviceReset dev_reset,
DeviceReset *parent_reset);
void device_class_set_parent_realize(DeviceClass *dc,
DeviceRealize dev_realize,
DeviceRealize *parent_realize);
void device_class_set_parent_unrealize(DeviceClass *dc,
DeviceUnrealize dev_unrealize,
DeviceUnrealize *parent_unrealize);
const struct VMStateDescription *qdev_get_vmsd(DeviceState *dev);
const char *qdev_fw_name(DeviceState *dev);
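device_class_set_parent_reset() supports the usual override-and-chain idiom: the subclass saves the parent's reset handler in its class struct, installs its own, and calls through to the saved pointer. A minimal sketch with a hypothetical MyDeviceClass (MY_DEVICE_GET_CLASS is an assumed QOM cast macro):

typedef struct MyDeviceClass {
    DeviceClass parent_class;
    DeviceReset parent_reset;      /* saved parent implementation */
} MyDeviceClass;

static void my_device_reset(DeviceState *dev)
{
    MyDeviceClass *mc = MY_DEVICE_GET_CLASS(dev);

    mc->parent_reset(dev);         /* chain to the parent's reset first */
    /* ... then reset this subclass's own state ... */
}

static void my_device_class_init(ObjectClass *oc, void *data)
{
    DeviceClass *dc = DEVICE_CLASS(oc);
    MyDeviceClass *mc = (MyDeviceClass *)oc;

    device_class_set_parent_reset(dc, my_device_reset, &mc->parent_reset);
}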


@@ -151,6 +151,7 @@ static inline SCSIBus *scsi_bus_from_device(SCSIDevice *d)
SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
int unit, bool removable, int bootindex,
bool share_rw,
const char *serial, Error **errp);
void scsi_bus_legacy_handle_cmdline(SCSIBus *bus, bool deprecated);
void scsi_legacy_handle_cmdline(void);


@@ -156,6 +156,7 @@ ssize_t qemu_send_packet_async(NetClientState *nc, const uint8_t *buf,
int size, NetPacketSent *sent_cb);
void qemu_purge_queued_packets(NetClientState *nc);
void qemu_flush_queued_packets(NetClientState *nc);
void qemu_flush_or_purge_queued_packets(NetClientState *nc, bool purge);
void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6]);
bool qemu_has_ufo(NetClientState *nc);
bool qemu_has_vnet_hdr(NetClientState *nc);


@@ -0,0 +1,134 @@
/*
* A very simplified iova tree implementation based on GTree.
*
* Copyright 2018 Red Hat, Inc.
*
* Authors:
* Peter Xu <peterx@redhat.com>
*
* This work is licensed under the terms of the GNU GPL, version 2 or later.
*/
#ifndef IOVA_TREE_H
#define IOVA_TREE_H
/*
* Currently the iova tree only keeps range information, and no
* extra user data is allowed for each element. A benefit is that
* we can merge adjacent ranges internally within the tree, which
* can save a lot of memory when the ranges are split but mostly
* contiguous.
*
* Note that the current implementation does not provide any thread
* protection. Callers of the iova tree are responsible for thread
* safety.
*/
#include "qemu/osdep.h"
#include "exec/memory.h"
#include "exec/hwaddr.h"
#define IOVA_OK (0)
#define IOVA_ERR_INVALID (-1) /* Invalid parameters */
#define IOVA_ERR_OVERLAP (-2) /* IOVA range overlapped */
typedef struct IOVATree IOVATree;
typedef struct DMAMap {
hwaddr iova;
hwaddr translated_addr;
hwaddr size; /* Inclusive */
IOMMUAccessFlags perm;
} QEMU_PACKED DMAMap;
typedef gboolean (*iova_tree_iterator)(DMAMap *map);
/**
* iova_tree_new:
*
* Create a new iova tree.
*
* Returns: the tree pointer on success, or NULL on error.
*/
IOVATree *iova_tree_new(void);
/**
* iova_tree_insert:
*
* @tree: the iova tree to insert into
* @map: the mapping to insert
*
* Insert an iova range into the tree. If the range overlaps an
* existing mapping, IOVA_ERR_OVERLAP will be returned.
*
* Return: 0 on success, or <0 on error.
*/
int iova_tree_insert(IOVATree *tree, DMAMap *map);
/**
* iova_tree_remove:
*
* @tree: the iova tree to remove range from
* @map: the map range to remove
*
* Remove mappings from the tree that are covered by the map range
* provided. The range does not need to be exactly what was
* inserted; all mappings that fall within the provided range will
* be removed from the tree. Here map->translated_addr is
* meaningless.
*
* Return: 0 on success, or <0 on error.
*/
int iova_tree_remove(IOVATree *tree, DMAMap *map);
/**
* iova_tree_find:
*
* @tree: the iova tree to search from
* @map: the mapping to search
*
* Search for a mapping in the iova tree that overlaps with the
* mapping range specified. Only the first found mapping will be
* returned.
*
* Return: DMAMap pointer if found, or NULL if not found. Note that
* the returned DMAMap pointer is maintained internally; callers
* should only read the content and never modify or free it. The
* caller is also responsible for making sure the pointer stays
* valid (e.g., that no concurrent deletion is in progress).
*/
DMAMap *iova_tree_find(IOVATree *tree, DMAMap *map);
/**
* iova_tree_find_address:
*
* @tree: the iova tree to search from
* @iova: the iova address to find
*
* Similar to iova_tree_find(), but it searches for the mapping
* with range iova=iova and size=0.
*
* Return: same as iova_tree_find().
*/
DMAMap *iova_tree_find_address(IOVATree *tree, hwaddr iova);
/**
* iova_tree_foreach:
*
* @tree: the iova tree to iterate on
* @iterator: the iterator for the mappings; return true to stop
*
* Iterate over the iova tree.
*
* Return: None.
*/
void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
/**
* iova_tree_destroy:
*
* @tree: the iova tree to destroy
*
* Destroy an existing iova tree.
*
* Return: None.
*/
void iova_tree_destroy(IOVATree *tree);
#endif
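A short usage sketch of the API above (arbitrary values; note that size is inclusive, so 0xfff covers 0x1000 through 0x1fff):

static void iova_tree_example(void)
{
    IOVATree *tree = iova_tree_new();
    DMAMap map = {
        .iova = 0x1000,
        .translated_addr = 0x40001000,
        .size = 0xfff,                /* inclusive: 0x1000..0x1fff */
        .perm = IOMMU_RW,
    };

    if (iova_tree_insert(tree, &map) == IOVA_OK) {
        /* The returned pointer is tree-internal: read-only for callers */
        DMAMap *found = iova_tree_find_address(tree, 0x1800);
        g_assert(found && found->iova == 0x1000);
    }

    iova_tree_destroy(tree);
}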


@@ -76,7 +76,11 @@ extern const struct SCSISense sense_code_LUN_FAILURE;
extern const struct SCSISense sense_code_LUN_COMM_FAILURE;
/* Command aborted, Overlapped Commands Attempted */
extern const struct SCSISense sense_code_OVERLAPPED_COMMANDS;
/* LUN not ready, Capacity data has changed */
/* Medium error, Unrecovered read error */
extern const struct SCSISense sense_code_READ_ERROR;
/* LUN not ready, Cause not reportable */
extern const struct SCSISense sense_code_NOT_READY;
/* Unit attention, Capacity data has changed */
extern const struct SCSISense sense_code_CAPACITY_CHANGED;
/* Unit attention, SCSI bus reset */
extern const struct SCSISense sense_code_SCSI_BUS_RESET;


@@ -1,12 +1,9 @@
/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
/*
* Definitions for virtio-ccw devices.
*
* Copyright IBM Corp. 2013
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License (version 2 only)
* as published by the Free Software Foundation.
*
* Author(s): Cornelia Huck <cornelia.huck@de.ibm.com>
*/
#ifndef __KVM_VIRTIO_CCW_H


@@ -1,393 +1 @@
#ifndef _ASM_X86_HYPERV_H
#define _ASM_X86_HYPERV_H
#include "standard-headers/linux/types.h"
/*
* The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
* is set by CPUID(HvCpuIdFunctionVersionAndFeatures).
*/
#define HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS 0x40000000
#define HYPERV_CPUID_INTERFACE 0x40000001
#define HYPERV_CPUID_VERSION 0x40000002
#define HYPERV_CPUID_FEATURES 0x40000003
#define HYPERV_CPUID_ENLIGHTMENT_INFO 0x40000004
#define HYPERV_CPUID_IMPLEMENT_LIMITS 0x40000005
#define HYPERV_HYPERVISOR_PRESENT_BIT 0x80000000
#define HYPERV_CPUID_MIN 0x40000005
#define HYPERV_CPUID_MAX 0x4000ffff
/*
* Feature identification. EAX indicates which features are available
* to the partition based upon the current partition privileges.
*/
/* VP Runtime (HV_X64_MSR_VP_RUNTIME) available */
#define HV_X64_MSR_VP_RUNTIME_AVAILABLE (1 << 0)
/* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/
#define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE (1 << 1)
/* Partition reference TSC MSR is available */
#define HV_X64_MSR_REFERENCE_TSC_AVAILABLE (1 << 9)
/* A partition's reference time stamp counter (TSC) page */
#define HV_X64_MSR_REFERENCE_TSC 0x40000021
/*
* There is a single feature flag that signifies if the partition has access
* to MSRs with local APIC and TSC frequencies.
*/
#define HV_X64_ACCESS_FREQUENCY_MSRS (1 << 11)
/*
* Basic SynIC MSRs (HV_X64_MSR_SCONTROL through HV_X64_MSR_EOM
* and HV_X64_MSR_SINT0 through HV_X64_MSR_SINT15) available
*/
#define HV_X64_MSR_SYNIC_AVAILABLE (1 << 2)
/*
* Synthetic Timer MSRs (HV_X64_MSR_STIMER0_CONFIG through
* HV_X64_MSR_STIMER3_COUNT) available
*/
#define HV_X64_MSR_SYNTIMER_AVAILABLE (1 << 3)
/*
* APIC access MSRs (HV_X64_MSR_EOI, HV_X64_MSR_ICR and HV_X64_MSR_TPR)
* are available
*/
#define HV_X64_MSR_APIC_ACCESS_AVAILABLE (1 << 4)
/* Hypercall MSRs (HV_X64_MSR_GUEST_OS_ID and HV_X64_MSR_HYPERCALL) available*/
#define HV_X64_MSR_HYPERCALL_AVAILABLE (1 << 5)
/* Access virtual processor index MSR (HV_X64_MSR_VP_INDEX) available*/
#define HV_X64_MSR_VP_INDEX_AVAILABLE (1 << 6)
/* Virtual system reset MSR (HV_X64_MSR_RESET) is available*/
#define HV_X64_MSR_RESET_AVAILABLE (1 << 7)
/*
* Access statistics pages MSRs (HV_X64_MSR_STATS_PARTITION_RETAIL_PAGE,
* HV_X64_MSR_STATS_PARTITION_INTERNAL_PAGE, HV_X64_MSR_STATS_VP_RETAIL_PAGE,
* HV_X64_MSR_STATS_VP_INTERNAL_PAGE) available
*/
#define HV_X64_MSR_STAT_PAGES_AVAILABLE (1 << 8)
/* Frequency MSRs available */
#define HV_FEATURE_FREQUENCY_MSRS_AVAILABLE (1 << 8)
/* Crash MSR available */
#define HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE (1 << 10)
/*
* Feature identification: EBX indicates which flags were specified at
* partition creation. The format is the same as the partition creation
* flag structure defined in section Partition Creation Flags.
*/
#define HV_X64_CREATE_PARTITIONS (1 << 0)
#define HV_X64_ACCESS_PARTITION_ID (1 << 1)
#define HV_X64_ACCESS_MEMORY_POOL (1 << 2)
#define HV_X64_ADJUST_MESSAGE_BUFFERS (1 << 3)
#define HV_X64_POST_MESSAGES (1 << 4)
#define HV_X64_SIGNAL_EVENTS (1 << 5)
#define HV_X64_CREATE_PORT (1 << 6)
#define HV_X64_CONNECT_PORT (1 << 7)
#define HV_X64_ACCESS_STATS (1 << 8)
#define HV_X64_DEBUGGING (1 << 11)
#define HV_X64_CPU_POWER_MANAGEMENT (1 << 12)
#define HV_X64_CONFIGURE_PROFILER (1 << 13)
/*
* Feature identification. EDX indicates which miscellaneous features
* are available to the partition.
*/
/* The MWAIT instruction is available (per section MONITOR / MWAIT) */
#define HV_X64_MWAIT_AVAILABLE (1 << 0)
/* Guest debugging support is available */
#define HV_X64_GUEST_DEBUGGING_AVAILABLE (1 << 1)
/* Performance Monitor support is available*/
#define HV_X64_PERF_MONITOR_AVAILABLE (1 << 2)
/* Support for physical CPU dynamic partitioning events is available*/
#define HV_X64_CPU_DYNAMIC_PARTITIONING_AVAILABLE (1 << 3)
/*
* Support for passing hypercall input parameter block via XMM
* registers is available
*/
#define HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLE (1 << 4)
/* Support for a virtual guest idle state is available */
#define HV_X64_GUEST_IDLE_STATE_AVAILABLE (1 << 5)
/* Guest crash data handler available */
#define HV_X64_GUEST_CRASH_MSR_AVAILABLE (1 << 10)
/*
* Implementation recommendations. Indicates which behaviors the hypervisor
* recommends the OS implement for optimal performance.
*/
/*
* Recommend using hypercall for address space switches rather
* than MOV to CR3 instruction
*/
#define HV_X64_AS_SWITCH_RECOMMENDED (1 << 0)
/* Recommend using hypercall for local TLB flushes rather
* than INVLPG or MOV to CR3 instructions */
#define HV_X64_LOCAL_TLB_FLUSH_RECOMMENDED (1 << 1)
/*
* Recommend using hypercall for remote TLB flushes rather
* than inter-processor interrupts
*/
#define HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED (1 << 2)
/*
* Recommend using MSRs for accessing APIC registers
* EOI, ICR and TPR rather than their memory-mapped counterparts
*/
#define HV_X64_APIC_ACCESS_RECOMMENDED (1 << 3)
/* Recommend using the hypervisor-provided MSR to initiate a system RESET */
#define HV_X64_SYSTEM_RESET_RECOMMENDED (1 << 4)
/*
* Recommend using relaxed timing for this partition. If used,
* the VM should disable any watchdog timeouts that rely on the
* timely delivery of external interrupts
*/
#define HV_X64_RELAXED_TIMING_RECOMMENDED (1 << 5)
/*
* Virtual APIC support
*/
#define HV_X64_DEPRECATING_AEOI_RECOMMENDED (1 << 9)
/* Recommend using the newer ExProcessorMasks interface */
#define HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED (1 << 11)
/*
* Crash notification flag.
*/
#define HV_CRASH_CTL_CRASH_NOTIFY (1ULL << 63)
/* MSR used to identify the guest OS. */
#define HV_X64_MSR_GUEST_OS_ID 0x40000000
/* MSR used to setup pages used to communicate with the hypervisor. */
#define HV_X64_MSR_HYPERCALL 0x40000001
/* MSR used to provide vcpu index */
#define HV_X64_MSR_VP_INDEX 0x40000002
/* MSR used to reset the guest OS. */
#define HV_X64_MSR_RESET 0x40000003
/* MSR used to provide vcpu runtime in 100ns units */
#define HV_X64_MSR_VP_RUNTIME 0x40000010
/* MSR used to read the per-partition time reference counter */
#define HV_X64_MSR_TIME_REF_COUNT 0x40000020
/* MSR used to retrieve the TSC frequency */
#define HV_X64_MSR_TSC_FREQUENCY 0x40000022
/* MSR used to retrieve the local APIC timer frequency */
#define HV_X64_MSR_APIC_FREQUENCY 0x40000023
/* Define the virtual APIC registers */
#define HV_X64_MSR_EOI 0x40000070
#define HV_X64_MSR_ICR 0x40000071
#define HV_X64_MSR_TPR 0x40000072
#define HV_X64_MSR_APIC_ASSIST_PAGE 0x40000073
/* Define synthetic interrupt controller model specific registers. */
#define HV_X64_MSR_SCONTROL 0x40000080
#define HV_X64_MSR_SVERSION 0x40000081
#define HV_X64_MSR_SIEFP 0x40000082
#define HV_X64_MSR_SIMP 0x40000083
#define HV_X64_MSR_EOM 0x40000084
#define HV_X64_MSR_SINT0 0x40000090
#define HV_X64_MSR_SINT1 0x40000091
#define HV_X64_MSR_SINT2 0x40000092
#define HV_X64_MSR_SINT3 0x40000093
#define HV_X64_MSR_SINT4 0x40000094
#define HV_X64_MSR_SINT5 0x40000095
#define HV_X64_MSR_SINT6 0x40000096
#define HV_X64_MSR_SINT7 0x40000097
#define HV_X64_MSR_SINT8 0x40000098
#define HV_X64_MSR_SINT9 0x40000099
#define HV_X64_MSR_SINT10 0x4000009A
#define HV_X64_MSR_SINT11 0x4000009B
#define HV_X64_MSR_SINT12 0x4000009C
#define HV_X64_MSR_SINT13 0x4000009D
#define HV_X64_MSR_SINT14 0x4000009E
#define HV_X64_MSR_SINT15 0x4000009F
/*
* Synthetic Timer MSRs. Four timers per vcpu.
*/
#define HV_X64_MSR_STIMER0_CONFIG 0x400000B0
#define HV_X64_MSR_STIMER0_COUNT 0x400000B1
#define HV_X64_MSR_STIMER1_CONFIG 0x400000B2
#define HV_X64_MSR_STIMER1_COUNT 0x400000B3
#define HV_X64_MSR_STIMER2_CONFIG 0x400000B4
#define HV_X64_MSR_STIMER2_COUNT 0x400000B5
#define HV_X64_MSR_STIMER3_CONFIG 0x400000B6
#define HV_X64_MSR_STIMER3_COUNT 0x400000B7
/* Hyper-V guest crash notification MSR's */
#define HV_X64_MSR_CRASH_P0 0x40000100
#define HV_X64_MSR_CRASH_P1 0x40000101
#define HV_X64_MSR_CRASH_P2 0x40000102
#define HV_X64_MSR_CRASH_P3 0x40000103
#define HV_X64_MSR_CRASH_P4 0x40000104
#define HV_X64_MSR_CRASH_CTL 0x40000105
#define HV_X64_MSR_CRASH_CTL_NOTIFY (1ULL << 63)
#define HV_X64_MSR_CRASH_PARAMS \
(1 + (HV_X64_MSR_CRASH_P4 - HV_X64_MSR_CRASH_P0))
#define HV_X64_MSR_HYPERCALL_ENABLE 0x00000001
#define HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT 12
#define HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_MASK \
(~((1ull << HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT) - 1))
/* Declare the various hypercall operations. */
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE 0x0002
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST 0x0003
#define HVCALL_NOTIFY_LONG_SPIN_WAIT 0x0008
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
#define HVCALL_POST_MESSAGE 0x005c
#define HVCALL_SIGNAL_EVENT 0x005d
#define HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE 0x00000001
#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT 12
#define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK \
(~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
#define HV_X64_MSR_TSC_REFERENCE_ENABLE 0x00000001
#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT 12
#define HV_PROCESSOR_POWER_STATE_C0 0
#define HV_PROCESSOR_POWER_STATE_C1 1
#define HV_PROCESSOR_POWER_STATE_C2 2
#define HV_PROCESSOR_POWER_STATE_C3 3
#define HV_FLUSH_ALL_PROCESSORS BIT(0)
#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES BIT(1)
#define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY BIT(2)
#define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT BIT(3)
enum HV_GENERIC_SET_FORMAT {
HV_GENERIC_SET_SPARCE_4K,
HV_GENERIC_SET_ALL,
};
/* hypercall status code */
#define HV_STATUS_SUCCESS 0
#define HV_STATUS_INVALID_HYPERCALL_CODE 2
#define HV_STATUS_INVALID_HYPERCALL_INPUT 3
#define HV_STATUS_INVALID_ALIGNMENT 4
#define HV_STATUS_INSUFFICIENT_MEMORY 11
#define HV_STATUS_INVALID_CONNECTION_ID 18
#define HV_STATUS_INSUFFICIENT_BUFFERS 19
typedef struct _HV_REFERENCE_TSC_PAGE {
uint32_t tsc_sequence;
uint32_t res1;
uint64_t tsc_scale;
int64_t tsc_offset;
} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
/* Define the number of synthetic interrupt sources. */
#define HV_SYNIC_SINT_COUNT (16)
/* Define the expected SynIC version. */
#define HV_SYNIC_VERSION_1 (0x1)
#define HV_SYNIC_CONTROL_ENABLE (1ULL << 0)
#define HV_SYNIC_SIMP_ENABLE (1ULL << 0)
#define HV_SYNIC_SIEFP_ENABLE (1ULL << 0)
#define HV_SYNIC_SINT_MASKED (1ULL << 16)
#define HV_SYNIC_SINT_AUTO_EOI (1ULL << 17)
#define HV_SYNIC_SINT_VECTOR_MASK (0xFF)
#define HV_SYNIC_STIMER_COUNT (4)
/* Define synthetic interrupt controller message constants. */
#define HV_MESSAGE_SIZE (256)
#define HV_MESSAGE_PAYLOAD_BYTE_COUNT (240)
#define HV_MESSAGE_PAYLOAD_QWORD_COUNT (30)
/* Define hypervisor message types. */
enum hv_message_type {
HVMSG_NONE = 0x00000000,
/* Memory access messages. */
HVMSG_UNMAPPED_GPA = 0x80000000,
HVMSG_GPA_INTERCEPT = 0x80000001,
/* Timer notification messages. */
HVMSG_TIMER_EXPIRED = 0x80000010,
/* Error messages. */
HVMSG_INVALID_VP_REGISTER_VALUE = 0x80000020,
HVMSG_UNRECOVERABLE_EXCEPTION = 0x80000021,
HVMSG_UNSUPPORTED_FEATURE = 0x80000022,
/* Trace buffer complete messages. */
HVMSG_EVENTLOG_BUFFERCOMPLETE = 0x80000040,
/* Platform-specific processor intercept messages. */
HVMSG_X64_IOPORT_INTERCEPT = 0x80010000,
HVMSG_X64_MSR_INTERCEPT = 0x80010001,
HVMSG_X64_CPUID_INTERCEPT = 0x80010002,
HVMSG_X64_EXCEPTION_INTERCEPT = 0x80010003,
HVMSG_X64_APIC_EOI = 0x80010004,
HVMSG_X64_LEGACY_FP_ERROR = 0x80010005
};
/* Define synthetic interrupt controller message flags. */
union hv_message_flags {
uint8_t asu8;
struct {
uint8_t msg_pending:1;
uint8_t reserved:7;
};
};
/* Define port identifier type. */
union hv_port_id {
uint32_t asu32;
struct {
uint32_t id:24;
uint32_t reserved:8;
} u;
};
/* Define synthetic interrupt controller message header. */
struct hv_message_header {
uint32_t message_type;
uint8_t payload_size;
union hv_message_flags message_flags;
uint8_t reserved[2];
union {
uint64_t sender;
union hv_port_id port;
};
};
/* Define synthetic interrupt controller message format. */
struct hv_message {
struct hv_message_header header;
union {
uint64_t payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
} u;
};
/* Define the synthetic interrupt message page layout. */
struct hv_message_page {
struct hv_message sint_message[HV_SYNIC_SINT_COUNT];
};
/* Define timer message payload structure. */
struct hv_timer_message_payload {
uint32_t timer_index;
uint32_t reserved;
uint64_t expiration_time; /* When the timer expired */
uint64_t delivery_time; /* When the message was delivered */
};
#define HV_STIMER_ENABLE (1ULL << 0)
#define HV_STIMER_PERIODIC (1ULL << 1)
#define HV_STIMER_LAZY (1ULL << 2)
#define HV_STIMER_AUTOENABLE (1ULL << 3)
#define HV_STIMER_SINT(config) (uint8_t)(((config) >> 16) & 0x0F)
#endif
/* this is a temporary placeholder until kvm_para.h stops including it */


@@ -1,3 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/*
* Input event codes
*
@@ -406,6 +407,7 @@
#define BTN_TOOL_MOUSE 0x146
#define BTN_TOOL_LENS 0x147
#define BTN_TOOL_QUINTTAP 0x148 /* Five fingers on trackpad */
#define BTN_STYLUS3 0x149
#define BTN_TOUCH 0x14a
#define BTN_STYLUS 0x14b
#define BTN_STYLUS2 0x14c


@@ -1,3 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/*
* Copyright (c) 1999-2002 Vojtech Pavlik
*


@@ -1,3 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/*
* pci_regs.h
*
@@ -746,6 +747,7 @@
#define PCI_ERR_ROOT_FIRST_FATAL 0x00000010 /* First UNC is Fatal */
#define PCI_ERR_ROOT_NONFATAL_RCV 0x00000020 /* Non-Fatal Received */
#define PCI_ERR_ROOT_FATAL_RCV 0x00000040 /* Fatal Received */
#define PCI_ERR_ROOT_AER_IRQ 0xf8000000 /* Advanced Error Interrupt Message Number */
#define PCI_ERR_ROOT_ERR_SRC 52 /* Error Source Identification */
/* Virtual Channel */
@@ -939,9 +941,13 @@
#define PCI_SATA_SIZEOF_LONG 16
/* Resizable BARs */
#define PCI_REBAR_CAP 4 /* capability register */
#define PCI_REBAR_CAP_SIZES 0x00FFFFF0 /* supported BAR sizes */
#define PCI_REBAR_CTRL 8 /* control register */
#define PCI_REBAR_CTRL_NBAR_MASK (7 << 5) /* mask for # bars */
#define PCI_REBAR_CTRL_NBAR_SHIFT 5 /* shift for # bars */
#define PCI_REBAR_CTRL_BAR_IDX 0x00000007 /* BAR index */
#define PCI_REBAR_CTRL_NBAR_MASK 0x000000E0 /* # of resizable BARs */
#define PCI_REBAR_CTRL_NBAR_SHIFT 5 /* shift for # of BARs */
#define PCI_REBAR_CTRL_BAR_SIZE 0x00001F00 /* BAR size */
/* Dynamic Power Allocation */
#define PCI_DPA_CAP 4 /* capability register */
@@ -960,6 +966,7 @@
/* Downstream Port Containment */
#define PCI_EXP_DPC_CAP 4 /* DPC Capability */
#define PCI_EXP_DPC_IRQ 0x1f /* DPC Interrupt Message Number */
#define PCI_EXP_DPC_CAP_RP_EXT 0x20 /* Root Port Extensions for DPC */
#define PCI_EXP_DPC_CAP_POISONED_TLP 0x40 /* Poisoned TLP Egress Blocking Supported */
#define PCI_EXP_DPC_CAP_SW_TRIGGER 0x80 /* Software Triggering Supported */
@@ -995,19 +1002,25 @@
#define PCI_PTM_CTRL_ENABLE 0x00000001 /* PTM enable */
#define PCI_PTM_CTRL_ROOT 0x00000002 /* Root select */
/* L1 PM Substates */
#define PCI_L1SS_CAP 4 /* capability register */
#define PCI_L1SS_CAP_PCIPM_L1_2 1 /* PCI PM L1.2 Support */
#define PCI_L1SS_CAP_PCIPM_L1_1 2 /* PCI PM L1.1 Support */
#define PCI_L1SS_CAP_ASPM_L1_2 4 /* ASPM L1.2 Support */
#define PCI_L1SS_CAP_ASPM_L1_1 8 /* ASPM L1.1 Support */
#define PCI_L1SS_CAP_L1_PM_SS 16 /* L1 PM Substates Support */
#define PCI_L1SS_CTL1 8 /* Control Register 1 */
#define PCI_L1SS_CTL1_PCIPM_L1_2 1 /* PCI PM L1.2 Enable */
#define PCI_L1SS_CTL1_PCIPM_L1_1 2 /* PCI PM L1.1 Support */
#define PCI_L1SS_CTL1_ASPM_L1_2 4 /* ASPM L1.2 Support */
#define PCI_L1SS_CTL1_ASPM_L1_1 8 /* ASPM L1.1 Support */
#define PCI_L1SS_CTL1_L1SS_MASK 0x0000000F
#define PCI_L1SS_CTL2 0xC /* Control Register 2 */
/* ASPM L1 PM Substates */
#define PCI_L1SS_CAP 0x04 /* Capabilities Register */
#define PCI_L1SS_CAP_PCIPM_L1_2 0x00000001 /* PCI-PM L1.2 Supported */
#define PCI_L1SS_CAP_PCIPM_L1_1 0x00000002 /* PCI-PM L1.1 Supported */
#define PCI_L1SS_CAP_ASPM_L1_2 0x00000004 /* ASPM L1.2 Supported */
#define PCI_L1SS_CAP_ASPM_L1_1 0x00000008 /* ASPM L1.1 Supported */
#define PCI_L1SS_CAP_L1_PM_SS 0x00000010 /* L1 PM Substates Supported */
#define PCI_L1SS_CAP_CM_RESTORE_TIME 0x0000ff00 /* Port Common_Mode_Restore_Time */
#define PCI_L1SS_CAP_P_PWR_ON_SCALE 0x00030000 /* Port T_POWER_ON scale */
#define PCI_L1SS_CAP_P_PWR_ON_VALUE 0x00f80000 /* Port T_POWER_ON value */
#define PCI_L1SS_CTL1 0x08 /* Control 1 Register */
#define PCI_L1SS_CTL1_PCIPM_L1_2 0x00000001 /* PCI-PM L1.2 Enable */
#define PCI_L1SS_CTL1_PCIPM_L1_1 0x00000002 /* PCI-PM L1.1 Enable */
#define PCI_L1SS_CTL1_ASPM_L1_2 0x00000004 /* ASPM L1.2 Enable */
#define PCI_L1SS_CTL1_ASPM_L1_1 0x00000008 /* ASPM L1.1 Enable */
#define PCI_L1SS_CTL1_L1SS_MASK 0x0000000f
#define PCI_L1SS_CTL1_CM_RESTORE_TIME 0x0000ff00 /* Common_Mode_Restore_Time */
#define PCI_L1SS_CTL1_LTR_L12_TH_VALUE 0x03ff0000 /* LTR_L1.2_THRESHOLD_Value */
#define PCI_L1SS_CTL1_LTR_L12_TH_SCALE 0xe0000000 /* LTR_L1.2_THRESHOLD_Scale */
#define PCI_L1SS_CTL2 0x0c /* Control 2 Register */
#endif /* LINUX_PCI_REGS_H */
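The new Resizable BAR and L1 PM Substates fields are plain mask-and-shift extractions; two helpers as a sketch (the shift by 8 for Common_Mode_Restore_Time follows from its 0x0000ff00 mask):

static unsigned int pci_rebar_nbars(uint32_t rebar_ctrl)
{
    /* Number of resizable BARs, bits 7:5 of the control register */
    return (rebar_ctrl & PCI_REBAR_CTRL_NBAR_MASK) >> PCI_REBAR_CTRL_NBAR_SHIFT;
}

static unsigned int pci_l1ss_cm_restore_time(uint32_t l1ss_cap)
{
    /* Port Common_Mode_Restore_Time, bits 15:8 of the capability */
    return (l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
}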


@@ -1,3 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/*
* Copyright (C) 2012 - Virtual Open Systems and Columbia University
* Author: Christoffer Dall <c.dall@virtualopensystems.com>
@@ -151,6 +152,12 @@ struct kvm_arch_memory_slot {
(__ARM_CP15_REG(op1, 0, crm, 0) | KVM_REG_SIZE_U64)
#define ARM_CP15_REG64(...) __ARM_CP15_REG64(__VA_ARGS__)
/* PL1 Physical Timer Registers */
#define KVM_REG_ARM_PTIMER_CTL ARM_CP15_REG32(0, 14, 2, 1)
#define KVM_REG_ARM_PTIMER_CNT ARM_CP15_REG64(0, 14)
#define KVM_REG_ARM_PTIMER_CVAL ARM_CP15_REG64(2, 14)
/* Virtual Timer Registers */
#define KVM_REG_ARM_TIMER_CTL ARM_CP15_REG32(0, 14, 3, 1)
#define KVM_REG_ARM_TIMER_CNT ARM_CP15_REG64(1, 14)
#define KVM_REG_ARM_TIMER_CVAL ARM_CP15_REG64(3, 14)
@@ -215,6 +222,7 @@ struct kvm_arch_memory_slot {
#define KVM_DEV_ARM_ITS_SAVE_TABLES 1
#define KVM_DEV_ARM_ITS_RESTORE_TABLES 2
#define KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES 3
#define KVM_DEV_ARM_ITS_CTRL_RESET 4
/* KVM_IRQ_LINE irq field index values */
#define KVM_ARM_IRQ_TYPE_SHIFT 24


@@ -1 +1,2 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#include <asm-generic/kvm_para.h>


@@ -1,3 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/*
* arch/arm/include/asm/unistd.h
*
@@ -35,5 +36,6 @@
#define __ARM_NR_usr26 (__ARM_NR_BASE+3)
#define __ARM_NR_usr32 (__ARM_NR_BASE+4)
#define __ARM_NR_set_tls (__ARM_NR_BASE+5)
#define __ARM_NR_get_tls (__ARM_NR_BASE+6)
#endif /* __ASM_ARM_UNISTD_H */


@@ -1,3 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/*
* Copyright (C) 2012,2013 - ARM Ltd
* Author: Marc Zyngier <marc.zyngier@arm.com>
@@ -195,6 +196,12 @@ struct kvm_arch_memory_slot {
#define ARM64_SYS_REG(...) (__ARM64_SYS_REG(__VA_ARGS__) | KVM_REG_SIZE_U64)
/* Physical Timer EL0 Registers */
#define KVM_REG_ARM_PTIMER_CTL ARM64_SYS_REG(3, 3, 14, 2, 1)
#define KVM_REG_ARM_PTIMER_CVAL ARM64_SYS_REG(3, 3, 14, 2, 2)
#define KVM_REG_ARM_PTIMER_CNT ARM64_SYS_REG(3, 3, 14, 0, 1)
/* EL0 Virtual Timer Registers */
#define KVM_REG_ARM_TIMER_CTL ARM64_SYS_REG(3, 3, 14, 3, 1)
#define KVM_REG_ARM_TIMER_CNT ARM64_SYS_REG(3, 3, 14, 3, 2)
#define KVM_REG_ARM_TIMER_CVAL ARM64_SYS_REG(3, 3, 14, 0, 2)
@@ -227,6 +234,7 @@ struct kvm_arch_memory_slot {
#define KVM_DEV_ARM_ITS_SAVE_TABLES 1
#define KVM_DEV_ARM_ITS_RESTORE_TABLES 2
#define KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES 3
#define KVM_DEV_ARM_ITS_CTRL_RESET 4
/* Device Control API on vcpu fd */
#define KVM_ARM_VCPU_PMU_V3_CTRL 0
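The new KVM_REG_ARM_PTIMER_* ids are ordinary one-reg ids, so userspace can read them with the standard KVM_GET_ONE_REG ioctl on a vcpu fd; a sketch with error handling elided:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int read_ptimer_ctl(int vcpu_fd, uint64_t *val)
{
    struct kvm_one_reg reg = {
        .id   = KVM_REG_ARM_PTIMER_CTL,
        .addr = (uint64_t)(unsigned long)val,   /* userspace buffer */
    };

    return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}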


@@ -1,3 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/*
* Copyright (C) 2012 ARM Ltd.
*


@@ -1,3 +1,4 @@
/* SPDX-License-Identifier: ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) */
/*
* ePAPR hcall interface
*


@@ -1,3 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
@@ -442,6 +443,31 @@ struct kvm_ppc_rmmu_info {
__u32 ap_encodings[8];
};
/* For KVM_PPC_GET_CPU_CHAR */
struct kvm_ppc_cpu_char {
__u64 character; /* characteristics of the CPU */
__u64 behaviour; /* recommended software behaviour */
__u64 character_mask; /* valid bits in character */
__u64 behaviour_mask; /* valid bits in behaviour */
};
/*
* Values for character and character_mask.
* These are identical to the values used by H_GET_CPU_CHARACTERISTICS.
*/
#define KVM_PPC_CPU_CHAR_SPEC_BAR_ORI31 (1ULL << 63)
#define KVM_PPC_CPU_CHAR_BCCTRL_SERIALISED (1ULL << 62)
#define KVM_PPC_CPU_CHAR_L1D_FLUSH_ORI30 (1ULL << 61)
#define KVM_PPC_CPU_CHAR_L1D_FLUSH_TRIG2 (1ULL << 60)
#define KVM_PPC_CPU_CHAR_L1D_THREAD_PRIV (1ULL << 59)
#define KVM_PPC_CPU_CHAR_BR_HINT_HONOURED (1ULL << 58)
#define KVM_PPC_CPU_CHAR_MTTRIG_THR_RECONF (1ULL << 57)
#define KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS (1ULL << 56)
#define KVM_PPC_CPU_BEHAV_FAVOUR_SECURITY (1ULL << 63)
#define KVM_PPC_CPU_BEHAV_L1D_FLUSH_PR (1ULL << 62)
#define KVM_PPC_CPU_BEHAV_BNDS_CHK_SPEC_BAR (1ULL << 61)
/* Per-vcpu XICS interrupt controller state */
#define KVM_REG_PPC_ICP_STATE (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x8c)
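In the kvm_ppc_cpu_char structure above, only bits that are set in the corresponding *_mask field are valid, so consumers of KVM_PPC_GET_CPU_CHAR should mask before testing; a sketch:

static int cpu_wants_l1d_flush_on_pr(const struct kvm_ppc_cpu_char *cc)
{
    /* behaviour bits are only meaningful where behaviour_mask is set */
    __u64 behav = cc->behaviour & cc->behaviour_mask;

    return (behav & KVM_PPC_CPU_BEHAV_L1D_FLUSH_PR) != 0;
}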

Some files were not shown because too many files have changed in this diff.