Commit graph

20061 commits

Author SHA1 Message Date
Marcel Apfelbaum d6b6abc51d fw_cfg: fix memory corruption when all fw_cfg slots are used
When all the fw_cfg slots are used, a write is made outside the
bounds of the fw_cfg files array as part of the sort algorithm.

Fix it by avoiding an unnecessary array element move.
Fix also an assert while at it.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Message-Id: <20180108215007.46471-1-marcel@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-01-19 11:18:51 -02:00
Igor Mammedov d342eb7662 possible_cpus: add CPUArchId::type field
Remove dependency of possible_cpus on 1st CPU instance,
which decouples configuration data from CPU instances that
are created using that data.

Also later it would be used for enabling early cpu to numa node
configuration at runtime qmp_query_hotpluggable_cpus() should
provide a list of available cpu slots at early stage,
before machine_init() is called and the 1st cpu is created,
so that mgmt might be able to call it and use output to set
numa mapping.

Use MachineClass::possible_cpu_arch_ids() callback to set
cpu type info, along with the rest of possible cpu properties,
to let machine define which cpu type* will be used.

* for SPAPR it will be a spapr core type and for ARM/s390x/x86
  a respective descendant of CPUClass.

Move parse_numa_opts() in vl.c after cpu_model is parsed into
cpu_type so that possible_cpu_arch_ids() would know which
cpu_type to use during layout initialization.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1515597770-268979-1-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-01-19 11:18:51 -02:00
Haozhong Zhang cb836434cd nvdimm: add 'unarmed' option
Currently the only vNVDIMM backend can guarantee the guest write
persistence is device DAX on Linux, because no host-side kernel cache
is involved in the guest access to it. The approach to detect whether
the backend is device DAX needs to access sysfs, which may not work
with SELinux.

Instead, we add the 'unarmed' option to device 'nvdimm', so that users
or management utils, which have enough knowledge about the backend,
can control the unarmed flag in guest ACPI NFIT via this option. The
guest Linux NVDIMM driver, for example, will mark the corresponding
vNVDIMM device read-only if the unarmed flag in guest NFIT is set.

The default value of 'unarmed' option is 'off' in order to keep the
backwards compatibility.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Message-Id: <20171211072806.2812-4-haozhong.zhang@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-01-19 11:18:51 -02:00
Haozhong Zhang da6789c27c nvdimm: add a macro for property "label-size"
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20171211072806.2812-3-haozhong.zhang@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-01-19 11:18:51 -02:00
Thomas Huth 03fcbd9dc5 qdev: Check for the availability of a hotplug controller before adding a device
The qdev_unplug() function contains a g_assert(hotplug_ctrl) statement,
so QEMU crashes when the user tries to device_add + device_del a device
that does not have a corresponding hotplug controller. This could be
provoked for a couple of devices in the past (see commit 4c93950659
or 84ebd3e8c7 for example), and can currently for example also be
triggered like this:

$ s390x-softmmu/qemu-system-s390x -M none -nographic
QEMU 2.10.50 monitor - type 'help' for more information
(qemu) device_add qemu-s390x-cpu,id=x
(qemu) device_del x
**
ERROR:qemu/qdev-monitor.c:872:qdev_unplug: assertion failed: (hotplug_ctrl)
Aborted (core dumped)

So devices clearly need a hotplug controller when they should be usable
with device_add.
The code in qdev_device_add() already checks whether the bus has a proper
hotplug controller, but for devices that do not have a corresponding bus,
there is no appropriate check available yet. In that case we should check
whether the machine itself provides a suitable hotplug controller and
refuse to plug the device if none is available.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1509617407-21191-3-git-send-email-thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-01-19 11:18:51 -02:00
Eduardo Habkost ef18310d54 q35: Allow only supported dynamic sysbus devices
The only user-creatable sysbus devices in qemu-system-x86_64 are
amd-iommu, intel-iommu, and xen-backend.  xen-backend is handled
by xen_set_dynamic_sysbus(), so we only need to add amd-iommu and
intel-iommu.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.a@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171125151610.20547-7-ehabkost@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-01-19 11:18:51 -02:00
Eduardo Habkost b1b68e1094 xen: Add only xen-sysdev to dynamic sysbus device list
There's no need to make the machine allow every possible sysbus
device.  We can now just add xen-sysdev to the allowed list.

Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: xen-devel@lists.xenproject.org
Cc: Juergen Gross <jgross@suse.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171125151610.20547-6-ehabkost@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-01-19 11:18:51 -02:00
Eduardo Habkost 7da79a167a spapr: Allow only supported dynamic sysbus devices
TYPE_SPAPR_PCI_HOST_BRIDGE is the only dynamic sysbus device not
rejected by ppc_spapr_reset(), so it can be the only entry on the
allowed list.

Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Alexander Graf <agraf@suse.de>
Cc: qemu-ppc@nongnu.org
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171125151610.20547-5-ehabkost@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-01-19 11:18:51 -02:00
Eduardo Habkost 50d01d240f ppc: e500: Allow only supported dynamic sysbus devices
platform_bus_create_devtree() already rejects all dynamic sysbus
devices except TYPE_ETSEC_COMMON, so register it as the only
allowed dynamic sysbus device for the ppce500 machine-type.

Cc: Alexander Graf <agraf@suse.de>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: qemu-ppc@nongnu.org
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171125151610.20547-4-ehabkost@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-01-19 11:18:51 -02:00
Eduardo Habkost 6f2062b975 hw/arm/virt: Allow only supported dynamic sysbus devices
Replace the TYPE_SYS_BUS_DEVICE entry in the allowed sysbus
device list with the two device types that are really supported
by the virt machine: vfio-amd-xgbe and vfio-calxeda-xgmac.

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-arm@nongnu.org
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171125151610.20547-3-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-01-19 11:18:51 -02:00
Eduardo Habkost 0bd1909da6 machine: Replace has_dynamic_sysbus with list of allowed devices
The existing has_dynamic_sysbus flag makes the machine accept
every user-creatable sysbus device type on the command-line.
Replace it with a list of allowed device types, so machines can
easily accept some sysbus devices while rejecting others.

To keep exactly the same behavior as before, the existing
has_dynamic_sysbus=true assignments are replaced with a
TYPE_SYS_BUS_DEVICE entry on the allowed list.  Other patches
will replace the TYPE_SYS_BUS_DEVICE entries with more specific
lists of devices.

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: qemu-arm@nongnu.org
Cc: qemu-ppc@nongnu.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171125151610.20547-2-ehabkost@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-01-19 11:18:51 -02:00
Jay Zhou f4bf56fb78 vhost: remove assertion to prevent crash
QEMU will assert on vhost-user backed virtio device hotplug if QEMU is
using more RAM regions than VHOST_MEMORY_MAX_NREGIONS (for example if
it were started with a lot of DIMM devices).

Fix it by returning error instead of asserting and let callers of
vhost_set_mem_table() handle error condition gracefully.

Cc: qemu-stable@nongnu.org
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-18 21:52:39 +02:00
Michael S. Tsirkin 69aff03064 vhost-user: fix misaligned access to payload
We currently take a pointer to a misaligned field of a packed structure.
clang reports this as a build warning.
A fix is to keep payload in a separate structure, and access is it
from there using a vectored write.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-18 21:52:39 +02:00
Michael S. Tsirkin 24e34754eb vhost-user: factor out msg head and payload
split header and payload into separate structures,
to enable easier handling of alignment issues.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-18 21:52:39 +02:00
Mohammed Gamal a0c167a184 x86_iommu: check if machine has PCI bus
Starting qemu with
qemu-system-x86_64 -S -M isapc -device {amd|intel}-iommu
leads to a segfault. The code assume PCI bus is present and
tries to access the bus structure without checking.

Since Intel VT-d and AMDVI should only work with PCI, add a
check for PCI bus and return error if not present.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2018-01-18 21:52:38 +02:00
Mohammed Gamal 29396ed9ac x86_iommu: Move machine check to x86_iommu_realize()
Instead of having the same error checks in vtd_realize()
and amdvi_realize(), move that over to the generic
x86_iommu_realize().

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2018-01-18 21:52:38 +02:00
Dou Liyang 6cf6fe394a hw/acpi-build: Make next_base easy to follow
It may be hard to read the assignment statement of "next_base", so

S/next_base += (1ULL << 32) - pcms->below_4g_mem_size;
 /next_base = mem_base + mem_len;

... for readability.

No functionality change.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-18 21:52:38 +02:00
Marcel Apfelbaum fced4d00e6 hw/pci-bridge: fix QEMU crash because of pcie-root-port
If we try to use more pcie_root_ports then available slots
and an IO hint is passed to the port, QEMU crashes because
we try to init the "IO hint" capability even if the device
is not created.
Fix it by checking for error before adding the capability,
so QEMU can fail gracefully.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-18 21:52:38 +02:00
Prasad Singamsetty 37f51384ae intel-iommu: Extend address width to 48 bits
The current implementation of Intel IOMMU code only supports 39 bits
iova address width. This patch provides a new parameter (x-aw-bits)
for intel-iommu to extend its address width to 48 bits but keeping the
default the same (39 bits). The reason for not changing the default
is to avoid potential compatibility problems with live migration of
intel-iommu enabled QEMU guest. The only valid values for 'x-aw-bits'
parameter are 39 and 48.

After enabling larger address width (48), we should be able to map
larger iova addresses in the guest. For example, a QEMU guest that
is configured with large memory ( >=1TB ). To check whether 48 bits
aw is enabled, we can grep in the guest dmesg output with line:
"DMAR: Host address width 48".

Signed-off-by: Prasad Singamsetty <prasad.singamsety@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-18 21:52:38 +02:00
Prasad Singamsetty 92e5d85e83 intel-iommu: Redefine macros to enable supporting 48 bit address width
The current implementation of Intel IOMMU code only supports 39 bits
host/iova address width so number of macros use hard coded values based
on that. This patch is to redefine them so they can be used with
variable address widths. This patch doesn't add any new functionality
but enables adding support for 48 bit address width.

Signed-off-by: Prasad Singamsetty <prasad.singamsety@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-18 21:52:38 +02:00
Yuval Shaia 37e626ceda pci/shpc: Move function to generic header file
This function should be declared in generic header file so we can
utilize it.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-18 21:52:38 +02:00
Gal Hammer 6f0bb23072 virtio: improve virtio devices initialization time
The loading time of a VM is quite significant when its virtio
devices use a large amount of virt-queues (e.g. a virtio-serial
device with max_ports=511). Most of the time is spend in the
creation of all the required event notifiers (ioeventfd and memory
regions).

This patch pack all the changes to the memory regions in a
single memory transaction.

Reported-by: Sitong Liu <siliu@redhat.com>
Reported-by: Xiaoling Gao <xiagao@redhat.com>
Signed-off-by: Gal Hammer <ghammer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-18 21:52:38 +02:00
Gal Hammer 4fe6d78b2e virtio: postpone the execution of event_notifier_cleanup function
Use the EventNotifier's cleanup callback function to execute the
event_notifier_cleanup function after kvm unregistered the eventfd.

This change supports running the virtio_bus_set_host_notifier
function inside a memory region transaction. Otherwise, a closed
fd is sent to kvm, which results in a failure.

Signed-off-by: Gal Hammer <ghammer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-18 21:52:37 +02:00
Changpeng Liu 00343e4b54 vhost-user-blk: introduce a new vhost-user-blk host device
This commit introduces a new vhost-user device for block, it uses a
chardev to connect with the backend, same with Qemu virito-blk device,
Guest OS still uses the virtio-blk frontend driver.

To use it, start QEMU with command line like this:

qemu-system-x86_64 \
    -chardev socket,id=char0,path=/path/vhost.socket \
    -device vhost-user-blk-pci,chardev=char0,num-queues=2, \
            bootindex=2... \

Users can use different parameters for `num-queues` and `bootindex`.

Different with exist Qemu virtio-blk host device, it makes more easy
for users to implement their own I/O processing logic, such as all
user space I/O stack against hardware block device. It uses the new
vhost messages(VHOST_USER_GET_CONFIG) to get block virtio config
information from backend process.

Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-18 21:52:37 +02:00
Changpeng Liu 4c3e257b5e vhost-user: add new vhost user messages to support virtio config space
Add VHOST_USER_GET_CONFIG/VHOST_USER_SET_CONFIG messages which can be
used for live migration of vhost user devices, also vhost user devices
can benefit from the messages to get/set virtio config space from/to the
I/O target. For the purpose to support virtio config space change,
VHOST_USER_SLAVE_CONFIG_CHANGE_MSG message is added as the event notifier
in case virtio config space change in the slave I/O target.

Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-18 21:52:37 +02:00
Peter Maydell 5cad8ca516 x86 queue, 2018-01-17
Highlight: new CPU models that expose CPU features that guests
 can use to mitigate CVE-2017-5715 (Spectre variant #2).
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJaX/+jAAoJECgHk2+YTcWmlzIP/i0oKKTtMccOozXQ8XbxfGs/
 Ek+k1joJSBRixUEB+hHHLraRmtw0b94R6uWXRF1KK9CPD06annHdr4tOAsryrQmp
 /lJfs7weGKi8o4Jz/YJW83NzNdNie0XloiS3+JGfu8fRh2EJDW3lv0j2CT3ytRlf
 rbal/j2E8lsmSsdL1lGbwb3E3DWDWIesWOGQMd3tu3WiMBMSgDqZa8RZo7hNiRsE
 7Vdj2yAWuj3vKRLSipIsSSOimr2P1hZsCMP2CI43BIvl6gW1S5ymExEppLNxruH6
 mqjAC96It3kqEZHVMPJg4evhwZitNxgqGtgrEbVfeZj+DTO/ZP6X6pcqtLdPA553
 dMrspDkYgU/OvE1ZQSMEXUm2IDt6fmpRiC4LvkWjMkvOOADIIBzL6LTzBd4k6fZ2
 hxQi+nc/IrIkQpq3f51YRVxwOs8otTBJzyqokxRvB3tOhg/I+NMxCvz5dyRjj5sN
 33eVdIuyndHiPTyvvv8eCjFeQG+wFFptPXMUhUEvJvQobJ/ZW76E+On8Kz3aYEF8
 lz++g3HvN7b7YPx3fqAvRfX/nZtDt04MDXvvnccXRt55Cn8tblQ92y84Wjc84SNZ
 lkgKhl4uOg6k7A1TblIhrk93eew/hSqaW8R8+y6qTUMkS6teAFsMrT0BSKETi1do
 GWTTbgH/3OECAQYFopBz
 =GtpX
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging

x86 queue, 2018-01-17

Highlight: new CPU models that expose CPU features that guests
can use to mitigate CVE-2017-5715 (Spectre variant #2).

# gpg: Signature made Thu 18 Jan 2018 02:00:03 GMT
# gpg:                using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/x86-pull-request:
  i386: Add EPYC-IBPB CPU model
  i386: Add new -IBRS versions of Intel CPU models
  i386: Add FEAT_8000_0008_EBX CPUID feature word
  i386: Add spec-ctrl CPUID bit
  i386: Add support for SPEC_CTRL MSR
  i386: Change X86CPUDefinition::model_id to const char*
  target/i386: add clflushopt to "Skylake-Server" cpu model
  pc: add 2.12 machine types

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-01-18 12:59:24 +00:00
Haozhong Zhang df47ce8af4 pc: add 2.12 machine types
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Message-Id: <20171219033730.12748-2-haozhong.zhang@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-01-17 23:04:31 -02:00
Cédric Le Goater fef592f909 ppc/pnv: change initrd address
When skiboot starts, it first clears the CPU structs for all possible
CPUs on a system :

	for (i = 0; i <= cpu_max_pir; i++)
		memset(&cpu_stacks[i].cpu, 0, sizeof(struct cpu_thread));

On POWER9, cpu_max_pir is quite big, 0x7fff, and the skiboot cpu_stacks
array overlaps with the memory region in which QEMU maps the initramfs
file. Move it upwards in memory to keep it safe.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-17 09:35:24 +11:00
Cédric Le Goater c035851ac0 ppc/pnv: fix XSCOM core addressing on POWER9
The XSCOM base address of the core chiplet was wrongly calculated. Use
the OPAL macros to fix that and do a couple of renames.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-17 09:35:24 +11:00
Cédric Le Goater b3b066e9d8 ppc/pnv: introduce pnv*_is_power9() helpers
These are useful when instantiating device models which are shared
between the POWER8 and the POWER9 processor families.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-17 09:35:24 +11:00
Cédric Le Goater 09279d7e7b ppc/pnv: change core mask for POWER9
When addressed by XSCOM, the first core has the 0x20 chiplet ID but
the CPU PIR can start at 0x0.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-17 09:35:24 +11:00
Cédric Le Goater 83028a2b28 ppc/pnv: use POWER9 DD2 processor
commit 1ed9c8af50 ("target/ppc: Add POWER9 DD2.0 model information")
deprecated the POWER9 model v1.0.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-17 09:35:24 +11:00
David Gibson 8904e5a750 spapr: Adjust default VSMT value for better migration compatibility
fa98fbfc "PC: KVM: Support machine option to set VSMT mode" introduced the
"vsmt" parameter for the pseries machine type, which controls the spacing
of the vcpu ids of thread 0 for each virtual core.  This was done to bring
some consistency and stability to how that was done, while still allowing
backwards compatibility for migration and otherwise.

The default value we used for vsmt was set to the max of the host's
advertised default number of threads and the number of vthreads per vcore
in the guest.  This was done to continue running without extra parameters
on older KVM versions which don't allow the VSMT value to be changed.

Unfortunately, even that smaller than before leakage of host configuration
into guest visible configuration still breaks things.  Specifically a guest
with 4 (or less) vthread/vcore will get a different vsmt value when
running on a POWER8 (vsmt==8) and POWER9 (vsmt==4) host.  That means the
vcpu ids don't line up so you can't migrate between them, though you should
be able to.

Long term we really want to make vsmt == smp_threads for sufficiently
new machine types.  However, that means that qemu will then require a
sufficiently recent KVM (one which supports changing VSMT) - that's still
not widely enough deployed to be really comfortable to do.

In the meantime we need some default that will work as often as
possible.  This patch changes that default to 8 in all circumstances.
This does change guest visible behaviour (including for existing
machine versions) for many cases - just not the most common/important
case.

Following is case by case justification for why this is still the least
worst option.  Note that any of the old behaviours can still be duplicated
after this patch, it's just that it requires manual intervention by
setting the vsmt property on the command line.

KVM HV on POWER8 host:
   This is the overwhelmingly common case in production setups, and is
   unchanged by design.  POWER8 hosts will advertise a default VSMT mode
   of 8, and > 8 vthreads/vcore isn't permitted

KVM HV on POWER7 host:
   Will break, but POWER7s allowing KVM were never released to the public.

KVM HV on POWER9 host:
   Not yet released to the public, breaking this now will reduce other
   breakage later.

KVM HV on PowerPC 970:
   Will theoretically break it, but it was barely supported to begin with
   and already required various user visible hacks to work.  Also so old
   that I just don't care.

TCG:
   This is the nastiest one; it means migration of TCG guests (without
   manual vsmt setting) will break.  Since TCG is rarely used in production
   I think this is worth it for the other benefits.  It does also remove
   one more barrier to TCG<->KVM migration which could be interesting for
   debugging applications.

KVM PR:
   As with TCG, this will break migration of existing configurations,
   without adding extra manual vsmt options.  As with TCG, it is rare in
   production so I think the benefits outweigh breakages.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17 09:35:24 +11:00
David Gibson 1f20f2e0ee spapr: Allow some cases where we can't set VSMT mode in the kernel
At present if we require a vsmt mode that's not equal to the kernel's
default, and the kernel doesn't let us change it (e.g. because it's an old
kernel without support) then we always fail.

But in fact we can cope with the kernel having a different vsmt as long as
  a) it's >= the actual number of vthreads/vcore (so that guest threads
     that are supposed to be on the same core act like it)
  b) it's a submultiple of the requested vsmt mode (so that guest threads
     spaced by the vsmt value will act like they're on different cores)

Allowing this case gives us a bit more freedom to adjust the vsmt behaviour
without breaking existing cases.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17 09:35:24 +11:00
David Gibson abbc124753 target/ppc: Clarify compat mode max_threads value
We recently had some discussions that were sidetracked for a while, because
nearly everyone misapprehended the purpose of the 'max_threads' field in
the compatiblity modes table.  It's all about guest expectations, not host
expectations or support (that's handled elsewhere).

In an attempt to avoid a repeat of that confusion, rename the field to
'max_vthreads' and add an explanatory comment.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
2018-01-17 09:35:24 +11:00
David Gibson 895d5cd620 spapr: Remove unnecessary 'options' field from sPAPRCapabilityInfo
The options field here is intended to list the available values for the
capability.  It's not used yet, because the existing capabilities are
boolean.

We're going to add capabilities that aren't, but in that case the info on
the possible values can be folded into the .description field.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-17 09:35:24 +11:00
Suraj Jitindar Singh 4e5fe3688e hw/ppc/spapr_caps: Rework spapr_caps to use uint8 internal representation
Currently spapr_caps are tied to boolean values (on or off). This patch
reworks the caps so that they can have any uint8 value. This allows more
capabilities with various values to be represented in the same way
internally. Capabilities are numbered in ascending order. The internal
representation of capability values is an array of uint8s in the
sPAPRMachineState, indexed by capability number.

Capabilities can have their own name, description, options, getter and
setter functions, type and allow functions. They also each have their own
section in the migration stream. Capabilities are only migrated if they
were explictly set on the command line, with the assumption that
otherwise the default will match.

On migration we ensure that the capability value on the destination
is greater than or equal to the capability value from the source. So
long at this remains the case then the migration is considered
compatible and allowed to continue.

This patch implements generic getter and setter functions for boolean
capabilities. It also converts the existings cap-htm, cap-vsx and
cap-dfp capabilities to this new format.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-17 09:35:24 +11:00
David Gibson 2d1fb9bc8e spapr: Handle Decimal Floating Point (DFP) as an optional capability
Decimal Floating Point has been available on POWER7 and later (server)
cpus.  However, it can be disabled on the hypervisor, meaning that it's
not available to guests.

We currently handle this by conditionally advertising DFP support in the
device tree depending on whether the guest CPU model supports it - which
can also depend on what's allowed in the host for -cpu host.  That can lead
to confusion on migration, since host properties are silently affecting
guest visible properties.

This patch handles it by treating it as an optional capability for the
pseries machine type.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17 09:35:24 +11:00
David Gibson 2938664286 spapr: Handle VMX/VSX presence as an spapr capability flag
We currently have some conditionals in the spapr device tree code to decide
whether or not to advertise the availability of the VMX (aka Altivec) and
VSX vector extensions to the guest, based on whether the guest cpu has
those features.

This can lead to confusion and subtle failures on migration, since it makes
a guest visible change based only on host capabilities.  We now have a
better mechanism for this, in spapr capabilities flags, which explicitly
depend on user options rather than host capabilities.

Rework the advertisement of VSX and VMX based on a new VSX capability.  We
no longer bother with a conditional for VMX support, because every CPU
that's ever been supported by the pseries machine type supports VMX.

NOTE: Some userspace distributions (e.g. RHEL7.4) already rely on
availability of VSX in libc, so using cap-vsx=off may lead to a fatal
SIGILL in init.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17 09:35:24 +11:00
David Gibson be85537d65 spapr: Validate capabilities on migration
Now that the "pseries" machine type implements optional capabilities (well,
one so far) there's the possibility of having different capabilities
available at either end of a migration.  Although arguably a user error,
it would be nice to catch this situation and fail as gracefully as we can.

This adds code to migrate the capabilities flags.  These aren't pulled
directly into the destination's configuration since what the user has
specified on the destination command line should take precedence.  However,
they are checked against the destination capabilities.

If the source was using a capability which is absent on the destination,
we fail the migration, since that could easily cause a guest crash or other
bad behaviour.  If the source lacked a capability which is present on the
destination we warn, but allow the migration to proceed.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17 09:35:24 +11:00
David Gibson ee76a09fc7 spapr: Treat Hardware Transactional Memory (HTM) as an optional capability
This adds an spapr capability bit for Hardware Transactional Memory.  It is
enabled by default for pseries-2.11 and earlier machine types. with POWER8
or later CPUs (as it must be, since earlier qemu versions would implicitly
allow it).  However it is disabled by default for the latest pseries-2.12
machine type.

This means that with the latest machine type, HTM will not be available,
regardless of CPU, unless it is explicitly enabled on the command line.
That change is made on the basis that:

 * This way running with -M pseries,accel=tcg will start with whatever cpu
   and will provide the same guest visible model as with accel=kvm.
     - More specifically, this means existing make check tests don't have
       to be modified to use cap-htm=off in order to run with TCG

 * We hope to add a new "HTM without suspend" feature in the not too
   distant future which could work on both POWER8 and POWER9 cpus, and
   could be enabled by default.

 * Best guesses suggest that future POWER cpus may well only support the
   HTM-without-suspend model, not the (frankly, horribly overcomplicated)
   POWER8 style HTM with suspend.

 * Anecdotal evidence suggests problems with HTM being enabled when it
   wasn't wanted are more common than being missing when it was.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17 09:35:24 +11:00
David Gibson 33face6b89 spapr: Capabilities infrastructure
Because PAPR is a paravirtual environment access to certain CPU (or other)
facilities can be blocked by the hypervisor.  PAPR provides ways to
advertise in the device tree whether or not those features are available to
the guest.

In some places we automatically determine whether to make a feature
available based on whether our host can support it, in most cases this is
based on limitations in the available KVM implementation.

Although we correctly advertise this to the guest, it means that host
factors might make changes to the guest visible environment which is bad:
as well as generaly reducing reproducibility, it means that a migration
between different host environments can easily go bad.

We've mostly gotten away with it because the environments considered mature
enough to be well supported (basically, KVM on POWER8) have had consistent
feature availability.  But, it's still not right and some limitations on
POWER9 is going to make it more of an issue in future.

This introduces an infrastructure for defining "sPAPR capabilities".  These
are set by default based on the machine version, masked by the capabilities
of the chosen cpu, but can be overriden with machine properties.

The intention is at reset time we verify that the requested capabilities
can be supported on the host (considering TCG, KVM and/or host cpu
limitations).  If not we simply fail, rather than silently modifying the
advertised featureset to the guest.

This does mean that certain configurations that "worked" may now fail, but
such configurations were already more subtly broken.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17 09:35:24 +11:00
Peter Maydell c1d5b9add7 * QemuMutex tracing improvements (Alex)
* ram_addr_t optimization (David)
 * SCSI fixes (Fam, Stefan, me)
 * do {} while (0) fixes (Eric)
 * KVM fix for PMU (Jan)
 * memory leak fixes from ASAN (Marc-André)
 * migration fix for HPET, icount, loadvm (Maria, Pavel)
 * hflags fixes (me, Tao)
 * block/iscsi uninitialized variable (Peter L.)
 * full support for GMainContexts in character devices (Peter Xu)
 * more boot-serial-test (Thomas)
 * Memory leak fix (Zhecheng)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJaXgkRAAoJEL/70l94x66DA3EIAI8z8Y+1NAmbLqiHhrrN9Ji/
 b8EHQ8wf0pwwrHuRVKYZvKUU8yvp/CRIoVWZwfeGjRbZC+l7l+BAwdOx42Bj/dUW
 VopNzcJMu3s5SNwoYLvs01OjhciBYNXWTXBkIiErwurF0Ow7oYR7trkLwOw0veSO
 L4qFAGoIBI/7b6BZ3YRQXshhzdSQ6dvHrDness2V1c0crLG+yhvjKJ8PJ2tJyNZO
 DbsrCd7hS6e6liSUqdLj9XgRySFj9R5kgjaLjckjg1SC6kmhLN9hyke8iXgH7uvz
 WGnRPmKjKexFHVYgR0rRFlazcQclAczHuIi/OZe0HLi6trg2YKBkolMaQLQdgfk=
 =HTyS
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* QemuMutex tracing improvements (Alex)
* ram_addr_t optimization (David)
* SCSI fixes (Fam, Stefan, me)
* do {} while (0) fixes (Eric)
* KVM fix for PMU (Jan)
* memory leak fixes from ASAN (Marc-André)
* migration fix for HPET, icount, loadvm (Maria, Pavel)
* hflags fixes (me, Tao)
* block/iscsi uninitialized variable (Peter L.)
* full support for GMainContexts in character devices (Peter Xu)
* more boot-serial-test (Thomas)
* Memory leak fix (Zhecheng)

# gpg: Signature made Tue 16 Jan 2018 14:15:45 GMT
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (51 commits)
  scripts/analyse-locks-simpletrace.py: script to analyse lock times
  util/qemu-thread-*: add qemu_lock, locked and unlock trace events
  cpu: flush TB cache when loading VMState
  block/iscsi: fix initialization of iTask in iscsi_co_get_block_status
  find_ram_offset: Align ram_addr_t allocation on long boundaries
  find_ram_offset: Add comments and tracing
  cpu_physical_memory_sync_dirty_bitmap: Another alignment fix
  checkpatch: Enforce proper do/while (0) style
  maint: Fix macros with broken 'do/while(0); ' usage
  tests: Avoid 'do/while(false); ' in vhost-user-bridge
  chardev: Clean up previous patch indentation
  chardev: Use goto/label instead of do/break/while(0)
  mips: Tweak location of ';' in macros
  net: Drop unusual use of do { } while (0);
  irq: fix memory leak
  cpus: unify qemu_*_wait_io_event
  icount: fixed saving/restoring of icount warp timers
  scripts/qemu-gdb/timers.py: new helper to dump timer state
  scripts/qemu-gdb: add simple tcg lock status helper
  target-i386: update hflags on Hypervisor.framework
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-01-16 15:45:15 +00:00
Eric Blake 2562755ee7 maint: Fix macros with broken 'do/while(0); ' usage
The point of writing a macro embedded in a 'do { ... } while (0)'
loop (particularly if the macro has multiple statements or would
otherwise end with an 'if' statement) is so that the macro can be
used as a drop-in statement with the caller supplying the
trailing ';'.  Although our coding style frowns on brace-less 'if':
  if (cond)
    statement;
  else
    something else;
that is the classic case where failure to use do/while(0) wrapping
would cause the 'else' to pair with any embedded 'if' in the macro
rather than the intended outer 'if'.  But conversely, if the macro
includes an embedded ';', then the same brace-less coding style
would now have two statements, making the 'else' a syntax error
rather than pairing with the outer 'if'.  Thus, even though our
coding style with required braces is not impacted, ending a macro
with ';' makes our code harder to port to projects that use
brace-less styles.

The change should have no semantic impact.  I was not able to
fully compile-test all of the changes (as some of them are
examples of the ugly bit-rotting debug print statements that are
completely elided by default, and I didn't want to recompile
with the necessary -D witnesses - cleaning those up is left as a
bite-sized task for another day); I did, however, audit that for
all files touched, all callers of the changed macros DID supply
a trailing ';' at the callsite, and did not appear to be used
as part of a brace-less conditional.

Found mechanically via: $ git grep -B1 'while (0);' | grep -A1 \\\\

Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20171201232433.25193-7-eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-01-16 14:54:52 +01:00
Eric Blake 1b4c0a0436 net: Drop unusual use of do { } while (0);
For a couple of macros in pcnet.c, we have to provide a new scope
to avoid compiler warnings about declarations in the middle of a
switch statement that aren't in a sub-scope.  But use of
'do { ... } while (0);' merely to provide that new scope is arcane
overkill, compared to just using '{ ... }'.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20171201232433.25193-2-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-01-16 14:54:51 +01:00
Stefan Hajnoczi 24355b79bd scsi-disk: release AioContext in unaligned WRITE SAME case
scsi_write_same_complete() can retry the write if the request was
unaligned.  Make sure to release the AioContext when that code path is
taken!

This patch fixes a hang when QEMU terminates after an unaligned WRITE
SAME request has been processed with dataplane.  The hang occurs because
iothread_stop_all() cannot acquire the AioContext lock that was leaked
by the IOThread in scsi_write_same_complete().

Fixes: b9e413dd37 ("block: explicitly acquire aiocontext in aio callbacks that need it").
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Reported-by: Cong Li <coli@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180104142502.15175-1-stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-01-16 14:54:50 +01:00
Marc-André Lureau b7438458a1 mips: fix potential fopen(NULL,...)
Spotted thanks to ASAN.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180104160523.22995-18-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-01-16 14:54:50 +01:00
Philippe Mathieu-Daudé 60765b6cee sdhci: add a 'dma' property to the sysbus devices
Add a 'dma' property allowing machine creation to provide the address-space
SDHCI DMA operates on.

[based on a patch from Alistair Francis <alistair.francis@xilinx.com>
 from qemu/xilinx tag xilinx-v2016.1]
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180115182436.2066-15-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-01-16 13:28:21 +00:00
Philippe Mathieu-Daudé dd55c485ec sdhci: fix the PCI device, using the PCI address space for DMA
While SysBus devices can use the get_system_memory() address space,
PCI devices should use the bus master address space for DMA.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20180115182436.2066-14-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-01-16 13:28:21 +00:00
Andrey Smirnov 5d2c0464fa sdhci: Implement write method of ACMD12ERRSTS register
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 20180115182436.2066-13-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-01-16 13:28:20 +00:00