Commit graph

662 commits

Author SHA1 Message Date
Greg Kurz 8251248394 spapr: reset DRCs after devices
A DRC with a pending unplug request releases its associated device at
machine reset time.

In the case of LMB, when all DRCs for a DIMM device have been reset,
the DIMM gets unplugged, causing guest memory to disappear. This may
be very confusing for anything still using this memory.

This is exactly what happens with vhost backends, and QEMU aborts
with:

qemu-system-ppc64: used ring relocated for ring 2
qemu-system-ppc64: qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion
 `r >= 0' failed.

The issue is that each DRC registers a QEMU reset handler, and we
don't control the order in which these handlers are called (ie,
a LMB DRC will unplug a DIMM before the virtio device using the
memory on this DIMM could stop its vhost backend).

To avoid such situations, let's reset DRCs after all devices
have been reset.

Reported-by: Mallesh N. Koti <mallesh@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-11-20 10:10:56 +11:00
Suraj Jitindar Singh 7abd43baec target/ppc: Update setting of cpu features to account for compat modes
The device tree nodes ibm,arch-vec-5-platform-support and ibm,pa-features
are used to communicate features of the cpu to the guest operating
system. The properties of each of these are determined based on the
selected cpu model and the availability of hypervisor features.
Currently the compatibility mode of the cpu is not taken into account.

The ibm,arch-vec-5-platform-support node is used to communicate the
level of support for various ISAv3 processor features to the guest
before CAS to inform the guests' request. The available mmu mode should
only be hash unless the cpu is a POWER9 which is not in a prePOWER9
compat mode, in which case the available modes depend on the
accelerator and the hypervisor capabilities.

The ibm,pa-featues node is used to communicate the level of cpu support
for various features to the guest os. This should only contain features
relevant to the operating mode of the processor, that is the selected
cpu model taking into account any compat mode. This means that the
compat mode should be taken into account when choosing the properties of
ibm,pa-features and they should match the compat mode selected, or the
cpu model selected if no compat mode.

Update the setting of these cpu features in the device tree as described
above to properly take into account any compat mode. We use the
ppc_check_compat function which takes into account the current processor
model and the cpu compat mode.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-11-20 10:07:49 +11:00
Igor Mammedov 2e9c10eba0 ppc: spapr: use generic cpu_model parsing
use generic cpu_model parsing introduced by
 (6063d4c0f vl.c: convert cpu_model to cpu type and set of global properties before machine_init())

it allows to:
  * replace sPAPRMachineClass::tcg_default_cpu with
    MachineClass::default_cpu_type
  * drop cpu_parse_cpu_model() from hw/ppc/spapr.c and reuse
    one in vl.c
  * simplify spapr_get_cpu_core_type() by removing
    not needed anymore recurrsion since alias look up
    happens earlier at vl.c and spapr_get_cpu_core_type()
    works only with resulted from that cpu type.
  * spapr no more needs to parse/depend on being phased out
    MachineState::cpu_model, all tha parsing done by generic
    code and target specific callback.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[dwg: Correct minor compile error]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
Igor Mammedov 17be88a713 ppc: spapr: use cpu model names as tcg defaults instead of aliases
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:01 +11:00
Igor Mammedov b51d3c8818 ppc: spapr: use cpu type name directly
replace sPAPRCPUCoreClass::cpu_class with cpu type name
since it were needed just to get that at points it were
accessed.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Igor Mammedov b8e999673b ppc: move '-cpu foo,compat=xxx' parsing into ppc_cpu_parse_featurestr()
there is a dedicated callback CPUClass::parse_features
which purpose is to convert -cpu features into a set of
global properties AND deal with compat/legacy features
that couldn't be directly translated into CPU's properties.

Create ppc variant of it (ppc_cpu_parse_featurestr) and
move 'compat=val' handling from spapr_cpu_core.c into it.
That removes a dependency of board/core code on cpu_model
parsing and would let to reuse common -cpu parsing
introduced by 6063d4c0

Set "max-cpu-compat" property only if it exists, in practice
it should limit 'compat' hack to spapr machine and allow
to avoid including machine/spapr headers in target/ppc/cpu.c

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Daniel Henrique Barboza 2a129767eb hw/ppc/spapr.c: abort unplug_request if previous unplug isn't done
LMB removal is completed only when the spapr_lmb_release callback
is called after all DRCs of the dimm are detached. During this
time, it is possible that a unplug request for the same dimm
arrives, trying to detach DRCs that were detached by the guest
in the first unplug_request.

BQL doesn't help in this case - the lock will prevent any concurrent
removal from happening until the end of spapr_memory_unplug_request
only. What happens is that the second unplug_request ends up calling
spapr_drc_detach in a DRC that were detached already, causing an
assert error in spapr_drc_detach (e.g
https://bugs.launchpad.net/qemu/+bug/1718118).

spapr_lmb_release uses a structure called sPAPRDIMMState, stored in the
spapr->pending_dimm_unplugs QTAIL, to track how many LMB DRCs are left
to be detached by the guest. When there are no more DRCs left, this
structure is deleted and the pc-dimm unplug handler is called to
finish the process.

This patch reuses the sPAPRDIMMState to allow unplug_request to know
if there is an ongoing unplug process for a given dimm, aborting the
unplug request in this case, by doing the following changes:

- in spapr_lmb_release callback, move the dimm state removal to the
end, after pc-dimm unplug handler. With this change we can check for
the existence of the dimm state to see if the unplug process is
done.

- use spapr_pending_dimm_unplugs_find in spapr_memory_unplug_request
to check if the dimm state exists. If positive, there is an unplug
operation already in progress for this dimm, meaning that we should
abort it and warn the user about it.

Fixes: https://bugs.launchpad.net/qemu/+bug/1718118
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Greg Kurz 827b17c468 spapr: sanity check size of the CAS buffer
The CAS buffer is provided by SLOF. A broken SLOF could pass a silly
size: either smaller than the diff header, in which case the current
code will try to allocate 16 Exabytes of memory and g_malloc0() will
abort, or bigger than the maximum memory provisioned for SLOF (ie,
40 Megabytes), which doesn't make sense. Both cases indicate that
SLOF has a bug.

Let's print out an explicit error message and exit since rebooting as
we do with other errors would only result in a reset loop.

Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Fix format specifier that broke 32-bit builds]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Greg Kurz dc1b5eee86 spapr: fix OF word name in comment
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Greg Kurz a4f3885c74 hw/ppc: use 0 instead of fdt_path_offset(fdt, "/")
The offset of the root node is guaranteed to be 0.

This doesn't fix anything, it's just trivial cleanup of the two
remaining places where this was done under hw/ppc.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17 10:34:00 +11:00
Greg Kurz 1ec26c757d spapr: fix the value of SDR1 in kvmppc_put_books_sregs()
When running with KVM PR, if a new HPT is allocated we need to inform
KVM about the HPT address and size. This is currently done by hacking
the value of SDR1 and pushing it to KVM in several places.

Also, migration breaks the guest since it is very unlikely the HPT has
the same address in source and destination, but we push the incoming
value of SDR1 to KVM anyway.

This patch introduces a new virtual hypervisor hook so that the spapr
code can provide the correct value of SDR1 to be pushed to KVM each
time kvmppc_put_books_sregs() is called.

It allows to get rid of all the hacking in the spapr/kvmppc code and
it fixes migration of nested KVM PR.

Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Greg Kurz 332f7721cb spapr: introduce helpers to migrate HPT chunks and the end marker
This consolidates some duplicated code in a dedicated helpers.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Greg Kurz 14b0d74887 ppc/kvm: generalize the use of kvmppc_get_htab_fd()
The use of KVM_PPC_GET_HTAB_FD is open-coded in kvmppc_read_hptes()
and kvmppc_write_hpte().

This patch modifies kvmppc_get_htab_fd() so that it can be used
everywhere we need to access the in-kernel htab:
- add an index argument
  => only kvmppc_read_hptes() passes an actual index, all other users
     pass 0
- add an errp argument to propagate error messages to the caller.
  => spapr migration code prints the error
  => hpte helpers pass &error_abort to keep the current behavior
     of hw_error()

While here, this also fixes a bug in kvmppc_write_hpte() so that it
opens the htab fd for writing instead of reading as it currently does.
This never broke anything because we currently never call this code,
as explained in the changelog of commit c1385933804bb:

"This support updating htab managed by the hypervisor. Currently
 we don't have any user for this feature. This actually bring the
 store_hpte interface in-line with the load_hpte one. We may want
 to use this when we want to emulate henter hcall in qemu for HV
 kvm."

The above is still true today.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Greg Kurz 82be8e7394 ppc/kvm: change kvmppc_get_htab_fd() to return -errno on error
When kvmppc_get_htab_fd() fails, its return value is propagated up to
qemu_savevm_state_iterate() or to qemu_savevm_state_complete_precopy().
All savevm handlers expect to receive a negative errno on error.

Let's patch kvmppc_get_htab_fd() accordingly.

While here, let's change htab_load() in the spapr code to also
propagate the error, since it doesn't make sense to abort() if
we couldn't get the htab fd from KVM.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Igor Mammedov 79e0793614 numa: cpu: calculate/set default node-ids after all -numa CLI options are parsed
Calculating default node-ids for CPUs in possible_cpu_arch_ids()
is rather fragile since defaults calculation uses nb_numa_nodes but
callback might be potentially called early before all -numa CLI
options are parsed, which would lead to cpus assigned only upto
nb_numa_nodes at the time possible_cpu_arch_ids() is called.

Issue was introduced by
(7c88e65 numa: mirror cpu to node mapping in MachineState::possible_cpus)
and for example CLI:
  -smp 4 -numa node,cpus=0 -numa node
would set props.node-id in possible_cpus array for every non
explicitly mapped CPU to the first node.

Issue is not visible to guest nor to mgmt interface due to
  1) implictly mapped cpus are forced to the first node in
     case of partial mapping
  2) in case of default mapping possible_cpu_arch_ids() is
     called after all -numa options are parsed (resulting
     in correct mapping).

However it's fragile to rely on late execution of
possible_cpu_arch_ids(), therefore add machine specific
callback that returns node-id for CPU and use it to calculate/
set defaults at machine_numa_finish_init() time when all -numa
options are parsed.

Reported-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1496314408-163972-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-19 16:51:33 -03:00
Cédric Le Goater 21f3f8db0e ppc/xive: fix OV5_XIVE_EXPLOIT bits
On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility or
in XIVE exploitation mode. Now that we have initial guest support for
the XIVE interrupt controller, let's fix the bits definition which have
evolved in the latest specs.

The platform advertises the XIVE Exploitation Mode support using the
property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1 :

 - 0b00 XIVE legacy mode Only
 - 0b01 XIVE exploitation mode Only
 - 0b10 XIVE legacy or exploitation mode

The OS asks for XIVE Exploitation Mode support using the property
"ibm,architecture-vec-5", byte 23 bits 0-1:

 - 0b00 XIVE legacy mode Only
 - 0b01 XIVE exploitation mode Only

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Daniel Henrique Barboza c86c1affae hw/ppc/spapr.c: cleaning up qdev_get_machine() calls
This patch removes the qdev_get_machine() calls that are made in
spapr.c in situations where we can get an existing pointer for
the MachineState by either passing it as an argument to the function
or by using other already available pointers.

The following changes were made:

- spapr_node0_size: static function that is called two times:
at spapr_setup_hpt_and_vrma and ppc_spapr_init. In both cases we can
pass an existing MachineState pointer to it.

- spapr_build_fdt: MachineState pointer can be retrieved from
the existing sPAPRMachineState pointer.

- spapr_boot_set: the opaque in the first arg is a sPAPRMachineState
pointer as we can see inside ppc_spapr_init:

    qemu_register_boot_set(spapr_boot_set, spapr);

We can get a MachineState pointer from it.

- spapr_machine_device_plug and spapr_machine_device_unplug_request: the
MachineState, sPAPRMachineState, MachineClass and sPAPRMachineClass pointers
can all be retrieved from the HotplugHandler pointer.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Sam Bobroff fa98fbfcdf PPC: KVM: Support machine option to set VSMT mode
KVM now allows writing to KVM_CAP_PPC_SMT which has previously been
read only. Doing so causes KVM to act, for that VM, as if the host's
SMT mode was the given value. This is particularly important on Power
9 systems because their default value is 1, but they are able to
support values up to 8.

This patch introduces a way to control this capability via a new
machine property called VSMT ("Virtual SMT"). If the value is not set
on the command line a default is chosen that is, when possible,
compatible with legacy systems.

Note that the intialization of KVM_CAP_PPC_SMT has changed slightly
because it has changed (in KVM) from a global capability to a
VM-specific one. This won't cause a problem on older KVMs because VM
capabilities fall back to global ones.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:55 +10:00
Sam Bobroff 2e886fb391 ppc: spapr: Make VCPU ID handling private to SPAPR
The concept of a VCPU ID that differs from the CPU's index
(cpu->cpu_index) exists only within SPAPR machines so, move the
functions ppc_get_vcpu_id() and ppc_get_cpu_by_vcpu_id() into spapr.c
and rename them appropriately.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:55 +10:00
Sam Bobroff 81210c2009 ppc: spapr: Rename cpu_dt_id to vcpu_id
This field actually records the VCPU ID used by KVM and, although the
value is also used in the device tree it is primarily the VCPU ID so
rename it as such.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Updated comment missed in cpu.h]
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:55 +10:00
Greg Kurz e2676b1697 spapr: add pseries-2.11 machine type
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:55 +10:00
Daniel Henrique Barboza 10f12e6450 hw/ppc: CAS reset on early device hotplug
This patch is a follow up on the discussions made in patch
"hw/ppc: disable hotplug before CAS is completed" that can be
found at [1].

At this moment, we do not support CPU/memory hotplug in early
boot stages, before CAS. When a hotplug occurs, the event is logged
in an internal RTAS event log queue and an IRQ pulse is fired. In
regular conditions, the guest handles the interrupt by executing
check_exception, fetching the generated hotplug event and enabling
the device for use.

In early boot, this IRQ isn't caught (SLOF does not handle hotplug
events), leaving the event in the rtas event log queue. If the guest
executes check_exception due to another hotplug event, the re-assertion
of the IRQ ends up de-queuing the first hotplug event as well. In short,
a device hotplugged before CAS is considered coldplugged by SLOF.
This leads to device misbehavior and, in some cases, guest kernel
Ooops when trying to unplug the device.

A proper fix would be to turn every device hotplugged before CAS
as a colplugged device. This is not trivial to do with the current
code base though - the FDT is written in the guest memory at
ppc_spapr_reset and can't be retrieved without adding extra state
(fdt_size for example) that will need to managed and migrated. Adding
the hotplugged DT in the middle of CAS negotiation via the updated DT
tree works with CPU devs, but panics the guest kernel at boot. Additional
analysis would be necessary for LMBs and PCI devices. There are
questions to be made in QEMU/SLOF/kernel level about how we can make
this change in a sustainable way.

With Linux guests, a fix would be the kernel executing check_exception
at boot time, de-queueing the events that happened in early boot and
processing them. However, even if/when the newer kernels start
fetching these events at boot time, we need to take care of older
kernels that won't be doing that.

This patch works around the situation by issuing a CAS reset if a hotplugged
device is detected during CAS:

- the DRC conditions that warrant a CAS reset is the same as those that
triggers a DRC migration - the DRC must have a device attached and
the DRC state is not equal to its ready_state. With that in mind, this
patch makes use of 'spapr_drc_needed' to determine if a CAS reset
is needed.

- In the middle of CAS negotiations, the function
'spapr_hotplugged_dev_before_cas' goes through all the DRCs to see
if there are any DRC that requires a reset, using spapr_drc_needed. If
that happens, returns '1' in 'spapr_h_cas_compose_response' which will set
spapr->cas_reboot to true, causing the machine to reboot.

No changes are made for coldplug devices.

[1] http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg02855.html

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:54 +10:00
Daniel Henrique Barboza 5625817423 hw/ppc: clear pending_events on machine reset
The sPAPR machine isn't clearing up the pending events QTAILQ on
machine reboot. This allows for unprocessed hotplug/epow events
to persist in the queue after reset and, when reasserting the IRQs in
check_exception later on, these will be being processed by the OS.

This patch implements a new function called 'spapr_clear_pending_events'
that clears up the pending_events QTAILQ. This helper is then called
inside ppc_spapr_reset to clear up the events queue, preventing
old/deprecated events from persisting after a reset.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:54 +10:00
Thomas Huth 0479097859 hw/ppc/spapr: Fix segfault when instantiating a 'pc-dimm' without 'memdev'
QEMU currently crashes when trying to use a 'pc-dimm' on the pseries
machine without specifying its 'memdev' property. This happens because
pc_dimm_get_memory_region() does not check whether the 'memdev' property
has properly been set by the user. Looking closer at this function, it's
also obvious that it is using &error_abort to call another function - and
this is bad in a function that is used in the hot-plugging calling chain
since this can also cause QEMU to exit unexpectedly.

So let's fix these issues in a proper way now: Add a "Error **errp"
parameter to pc_dimm_get_memory_region() which we use in case the 'memdev'
property has not been set by the user, and which we can use instead of
the &error_abort, and change the callers of get_memory_region() to make
use of this "errp" parameter for proper error checking.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-08-22 21:26:46 +10:00
David Gibson fc7e0765fc Revert "spapr: populate device tree depending on XIVE_EXPLOIT option"
This reverts commit b87680427e.

I thought this was a harmless preliminary for XIVE enablement patches
we expect later on.  However, due to some subtle interactions between
qemu and SLOF (guest firmware) this breaks some things.  Revert it for
now, we'll work out how to fix it when the rest of the XIVE patches
are ready.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-29 16:22:14 +10:00
Bharata B Rao 8d5981c4fc spapr: Fix QEMU abort during memory unplug
Commit 0cffce56 (hw/ppc/spapr.c: adding pending_dimm_unplugs to
sPAPRMachineState) introduced a new way to track pending LMBs of DIMM
device that is marked for removal. Since this commit we can hit the
assert in spapr_pending_dimm_unplugs_add() in the following situation:

- DIMM device removal fails as the guest doesn't allow the removal.
- Subsequent attempt to remove the same DIMM would hit the assert
  as the corresponding sPAPRDIMMState is still part of the
  pending_dimm_unplugs list.

Fix this by removing the assert and conditionally adding the
sPAPRDIMMState to pending_dimm_unplugs list only when it is not
already present.

Fixes: 0cffce56ae
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
[dwg: Tweaked to avoid returning NULL when spapr_pending_dimm_unplugs_add()
 does find an existing entry]
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-25 11:14:25 +10:00
Laurent Vivier e8cd4247e9 spapr/htab: fix savevm
Commit 3a38429 ("spapr: Add a "no HPT" encoding to HTAB migration stream")
allows to migrate an empty HPT, but doesn't mark correctly the
end of the migration stream.

The end condition (value returned by htab_save_iterate())
should be 1, whereas in 3a38429 it returns 0.

The problem can be reproduced with QEMU monitor command "savevm":
the command never stops and the disk image grows without limit.

Fixes: 3a38429748
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-25 11:14:25 +10:00
Greg Kurz df8658de43 spapr: fix memory leak in spapr_core_pre_plug()
In case of error, we must ensure the dynamically allocated base_core_type
is freed, like it is done everywhere else in this function.

This is a regression introduced in QEMU 2.9 by commit 8149e2992f.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-17 15:07:05 +10:00
David Gibson 2772cf6be9 pseries: Use smaller default hash page tables when guest can resize
We've now implemented a PAPR extension allowing PAPR guest to resize
their hash page table (HPT) during runtime.

This patch makes use of that facility to allocate smaller HPTs by default.
Specifically when a guest is aware of the HPT resize facility, qemu sizes
the HPT to the initial memory size, rather than the maximum memory size on
the assumption that the guest will resize its HPT if necessary for hot
plugged memory.

When the initial memory size is much smaller than the maximum memory size
(a common configuration with e.g. oVirt / RHEV) then this can save
significant memory on the HPT.

If the guest does *not* advertise HPT resize awareness when it makes the
ibm,client-architecture-support call, qemu resizes the HPT for maxmimum
memory size (unless it's been configured not to allow such guests at all).

For now we make that reallocation assuming the guest has not yet used the
HPT at all.  That's true in practice, but not, strictly, an architectural
or PAPR requirement.  If we need to in future we can fix this by having
the client-architecture-support call reboot the guest with the revised
HPT size (the client-architecture-support call is explicitly permitted to
trigger a reboot in this way).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
2017-07-17 15:07:05 +10:00
David Gibson 52b81ab5e9 pseries: Enable HPT resizing for 2.10
We've now implemented a PAPR extensions which allows PAPR guests (i.e.
"pseries" machine type) to resize their hash page table during runtime.

However, that extension is only enabled if explicitly chosen on the
command line.  This patch enables it by default for spapr-2.10, but leaves
it disabled (by default) for older machine types.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2017-07-17 15:07:05 +10:00
David Gibson 0b0b831016 pseries: Implement HPT resizing
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table.  This will eventually allow for more flexible memory
hotplug.

The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function.  The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.

The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion.  If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.

The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT.  The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).

For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT.  This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).

In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs.  That's a project for another day, but should be possible
without any changes to the guest interface.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-17 15:07:05 +10:00
David Gibson 30f4b05bd0 pseries: Stubs for HPT resizing
This introduces stub implementations of the H_RESIZE_HPT_PREPARE and
H_RESIZE_HPT_COMMIT hypercalls which we hope to add in a PAPR
extension to allow run time resizing of a guest's hash page table.  It
also adds a new machine property for controlling whether this new
facility is available.

For now we only allow resizing with TCG, allowing it with KVM will require
kernel changes as well.

Finally, it adds a new string to the hypertas property in the device
tree, advertising to the guest the availability of the HPT resizing
hypercalls.  This is a tentative suggested value, and would need to be
standardized by PAPR before being merged.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2017-07-17 15:07:05 +10:00
Greg Kurz e49c63d5b3 spapr: fix potential memory leak in spapr_core_plug()
Since commit 5c1da81215 ("spapr: Remove unnecessary differences between
hotplug and coldplug paths"), the CPU DT for the DRC is always allocated.
This causes a memory leak for pseries-2.6 and older machine types, that
don't support CPU hotplug and don't allocate DRCs for CPUs.

Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-17 15:07:05 +10:00
David Gibson a8dc47fd82 spapr: Refactor spapr_drc_detach()
This function has two unused parameters - remove them.

It also sets awaiting_release on all paths, except one.  On that path
setting it is harmless, since it will be immediately cleared by
spapr_drc_release().  So factor it out of the if statements.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-07-17 15:07:05 +10:00
David Gibson 765d1bdda5 spapr: Simplify unplug path
spapr_lmb_release() and spapr_core_release() call hotplug_handler_unplug()
which after a bunch of indirection calls spapr_memory_unplug() or
spapr_core_unplug().  But we already know which is the appropriate thing
to call here, so we can just fold it directly into the release function.

Once that's done, there's no need for an hc->unplug method in the spapr
machine at all: since we also have an hc->unplug_request method, the
hotplug core will never use ->unplug.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-07-17 15:07:05 +10:00
Laurent Vivier 94fd9cbaa3 spapr: Treat devices added before inbound migration as coldplugged
When migrating a guest which has already had devices hotplugged,
libvirt typically starts the destination qemu with -incoming defer,
adds those hotplugged devices with qmp, then initiates the incoming
migration.

This causes problems for the management of spapr DRC state.  Because
the device is treated as hotplugged, it goes into a DRC state for a
device immediately after it's plugged, but before the guest has
acknowledged its presence.  However, chances are the guest on the
source machine *has* acknowledged the device's presence and configured
it.

If the source has fully configured the device, then DRC state won't be
sent in the migration stream: for maximum migration compatibility with
earlier versions we don't migrate DRCs in coldplug-equivalent state.
That means that the DRC effectively changes state over the migrate,
causing problems later on.

In addition, logging hotplug events for these devices isn't what we
want because a) those events should already have been issued on the
source host and b) the event queue should get wiped out by the
incoming state anyway.

In short, what we really want is to treat devices added before an
incoming migration as if they were coldplugged.

To do this, we first add a spapr_drc_hotplugged() helper which
determines if the device is hotplugged in the sense relevant for DRC
state management.  We only send hotplug events when this is true.
Second, when we add a device which isn't hotplugged in this sense, we
force a reset of the DRC state - this ensures the DRC is in a
coldplug-equivalent state (there isn't usually a system reset between
these device adds and the incoming migration).

This is based on an earlier patch by Laurent Vivier, cleaned up and
extended.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-07-17 15:07:05 +10:00
David Gibson 5341258e86 spapr: Minor cleanups to events handling
The rtas_error_log structure is marked packed, which strongly suggests its
precise layout is important to match an external interface.  Along with
that one could expect it to have a fixed endianness to match the same
interface.  That used to be the case - matching the layout of PAPR RTAS
event format and requiring BE fields.

Now, however, it's only used embedded within sPAPREventLogEntry with the
fields in native order, since they're processed internally.

Clear that up by removing the nested structure in sPAPREventLogEntry.
struct rtas_error_log is moved back to spapr_events.c where it is used as
a temporary to help convert the fields in sPAPREventLogEntry to the correct
in memory format when delivering an event to the guest.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-17 15:07:05 +10:00
Daniel Henrique Barboza fd38804b38 spapr: migrate pending_events of spapr state
In racing situations between hotplug events and migration operation,
a rtas hotplug event could have not yet be delivered to the source
guest when migration is started. In this case the pending_events of
spapr state need be transmitted to the target so that the hotplug
event can be finished on the target.

To achieve the minimal VMSD possible to migrate the pending_events list,
this patch makes the changes in spapr_events.c:

- 'log_type' of sPAPREventLogEntry struct deleted. This information can be
derived by inspecting the rtas_error_log summary field. A new function
called 'spapr_event_log_entry_type' was added to retrieve the type of
a given sPAPREventLogEntry.

- sPAPREventLogEntry, epow_log_full and hp_log_full were redesigned. The
only data we're going to migrate in the VMSD is the event log data itself,
which can be divided in two parts: a rtas_error_log header and an extended
event log field. The rtas_error_log header contains information about the
size of the extended log field, which can be used inside VMSD as the size
parameter of the VBUFFER_ALOC field that will store it. To allow this use,
the header.extended_length field must be exposed inline to the VMSD instead
of embedded into a 'data' field that holds everything. With this in mind,
the following changes were done:

    * a new 'header' field was added to sPAPREventLogEntry. This field holds a
a struct rtas_error_log inline.
    * the declaration of the 'rtas_error_log' struct was moved to spapr.h
to be visible to the VMSD macros.
    * 'data' field of sPAPREventLogEntry was renamed to 'extended_log' and
now holds only the contents of the extended event log.
   *  'struct rtas_error_log hdr' were taken away from both epow_log_full
and hp_log_full. This information is now available at the header field of
sPAPREventLogEntry.
   * epow_log_full and hp_log_full were renamed to epow_extended_log and
hp_extended_log respectively. This rename makes it clearer to understand
the new purpose of both structures: hold the information of an extended
event log field.
    * spapr_powerdown_req and spapr_hotplug_req_event now creates a
sPAPREventLogEntry structure that contains the full rtas log entry.
    * rtas_event_log_queue and rtas_event_log_dequeue now receives a
sPAPREventLogEntry pointer as a parameter instead of a void pointer.

- the endianess of the sPAPREventLogEntry header is now native instead
of be32. We can use the fields in native endianess internally and write
them in be32 in the guest physical memory inside 'check_exception'. This
allows the VMSD inside spapr.c to read the correct size of the
entended_log field.

- inside spapr.c, pending_events is put in a subsection in the spapr state
VMSD to make sure migration across different versions is not broken.

A small change in rtas_event_log_queue and rtas_event_log_dequeue were also
made: instead of calling qdev_get_machine(), both functions now receive
a pointer to the sPAPRMachineState. This pointer is already available in
the callers of these functions and we don't need to waste resources
calling qdev() again.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-17 15:07:05 +10:00
Alistair Francis 3dc6f86936 Convert error_report() to warn_report()
Convert all uses of error_report("warning:"... to use warn_report()
instead. This helps standardise on a single method of printing warnings
to the user.

All of the warnings were changed using these two commands:
    find ./* -type f -exec sed -i \
      's|error_report(".*warning[,:] |warn_report("|Ig' {} +

Indentation fixed up manually afterwards.

The test-qdev-global-props test case was manually updated to ensure that
this patch passes make check (as the test cases are case sensitive).

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Suggested-by: Thomas Huth <thuth@redhat.com>
Cc: Jeff Cody <jcody@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Lieven <pl@kamp.de>
Cc: Josh Durgin <jdurgin@redhat.com>
Cc: "Richard W.M. Jones" <rjones@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Peter Chubb <peter.chubb@nicta.com.au>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Alexander Graf <agraf@suse.de>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Greg Kurz <groug@kaod.org>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed by: Peter Chubb <peter.chubb@data61.csiro.au>
Acked-by: Max Reitz <mreitz@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Message-Id: <e1cfa2cd47087c248dd24caca9c33d9af0c499b0.1499866456.git.alistair.francis@xilinx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2017-07-13 13:49:58 +02:00
Peter Maydell aa916e409c ppc patch queue 2017-07-11
* Several minor cleanups from Greg Kurz
   * Fix for migration of pseries-2.7 and earlier machine types
   * More reworking of the DRC hotplug code, fixing several problems
     though there are still more to go
   * Fixes for CPU family / alias handling on POWER9
   * Preliminary patches for POWER9 XIVE (new interrupt controller)
     support
   * Assorted other fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZZFWEAAoJEGw4ysog2bOSxgAQAI85Vv8RuK1mgN0w0aIguP09
 JIM+iZ3zJwSFM3A/D8CnWxMGEQkjkVfKWT8cB97v5vPGTu21WD2hdQ26ZrcjC8Do
 Y5sPuCGRRSZvz+tnz17HU2aZMQwteNNgdes9MGr61kdVUk+1uvcyqTdhqxka5rF7
 SYcIEf95+Fcu00+bhwGaGg0ZXHer4rSTjDXbT3CcxT64sgQW8X36SceFBkFH0P40
 tX1bn9gdQgBNOT11O0MNeq6ewxHhSSusTwyYXpHTvK6p0EXPqfm+vM9dQSmXeKsk
 T7/yDmKplutVnWlfbxrdG+wp+ObE1h7KljGdWLx4jIX58dHVvjDJ+kZ+OJbcb6Xj
 oEV947tYkZaDC7q7TkwXjYltbq+A6HFFKEwxJ59L4zYgVYVkTUMRJ3Apl66sq5a1
 SHEBXAA5SDq8jxdKKqvwzh4ZtkkxIelOO8lTVjOAg8ffcNfEwbJOuom2h0kgzOgz
 Sn2PxC/jwk2RZZ4T+qe1KNpVbV3RYpGanMXYDMFUnTRw2RAU2io0R2bBwOlm/0I7
 ZUrjD2xCFrMPuthxr5/5/w0P1StALVN50S5YqWvDuQYIbMYhSjSh3tDgAHVrqL4W
 Yc1Zr5X9X91qgUjAkejBuirvWLvgofiw8jlqAZ6K2zTUcvtn0KdQGe7eiK+wostA
 PhLW9tYrkpt/BmzEMi1X
 =8Wy2
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170711' into staging

ppc patch queue 2017-07-11

  * Several minor cleanups from Greg Kurz
  * Fix for migration of pseries-2.7 and earlier machine types
  * More reworking of the DRC hotplug code, fixing several problems
    though there are still more to go
  * Fixes for CPU family / alias handling on POWER9
  * Preliminary patches for POWER9 XIVE (new interrupt controller)
    support
  * Assorted other fixes

# gpg: Signature made Tue 11 Jul 2017 05:35:16 BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.10-20170711:
  spapr: populate device tree depending on XIVE_EXPLOIT option
  spapr: introduce the XIVE_EXPLOIT option in CAS
  ppc/kvm: have the "family" CPU alias to point to TYPE_HOST_POWERPC_CPU
  spapr: Only report host/guest IOMMU page size mismatches on KVM
  spapr: fix memory hotplug error path
  target/ppc: Add debug function for radix mmu translation
  target/ppc: Refactor tcg radix mmu code
  spapr: Use unplug_request for PCI hot unplug
  spapr: Remove unnecessary differences between hotplug and coldplug paths
  spapr: Add DRC release method
  spapr: Uniform DRC reset paths
  spapr: Leave DR-indicator management to the guest
  target-ppc: SPR_BOOKE_ESR not set on FP exceptions
  spapr: fix migration to pseries machine < 2.8
  spapr: fix bogus function name in comment
  spapr: refresh "platform-specific" hcalls comment
  spapr: make spapr_populate_hotplug_cpu_dt() static

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-07-11 16:34:09 +01:00
Cédric Le Goater b87680427e spapr: populate device tree depending on XIVE_EXPLOIT option
When XIVE is supported, the device tree should be populated
accordingly and the XIVE memory regions mapped to activate MMIOs.

Depending on the design we choose, we could also allocate different
ICS and ICP objects, or switch between objects. This needs to be
discussed.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-11 11:04:02 +10:00
Cédric Le Goater f2b14e3a9f spapr: introduce the XIVE_EXPLOIT option in CAS
On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility
(the former POWER8 interrupt model) or in XIVE exploitation mode (the
newer POWER9 interrupt model).

Bit 7 of Byte 23 of vector 5 is used for this purpose.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-11 11:04:02 +10:00
Greg Kurz 160bb67885 spapr: fix memory hotplug error path
QEMU shouldn't abort if spapr_add_lmbs()->spapr_drc_attach() fails.
Let's propagate the error instead, like it is done everywhere else
where spapr_drc_attach() is called.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-11 11:04:02 +10:00
David Gibson 5c1da81215 spapr: Remove unnecessary differences between hotplug and coldplug paths
spapr_drc_attach() has a 'coldplug' parameter which sets the DRC into
configured state initially, instead of the usual ISOLATED/UNUSABLE state.
It turns out this is unnecessary: although coldplugged devices do need to
be in CONFIGURED state once the guest starts, that will already be
accomplished by the reset code which will move DRCs for already plugged
devices into a coldplug equivalent state.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
2017-07-11 11:04:01 +10:00
David Gibson 6caf3ac613 spapr: Uniform DRC reset paths
DRC objects have a regular device reset method.  However, it only gets
called in the usual way for PCI DRCs.  Because of where CPU and LMB DRCs
are in the QOM tree, their device reset method isn't automatically called.
So, the machine manually registers reset handlers to call device_reset().

This patch removes the device reset method, and instead always explicitly
registers the reset handler from realize().  This means the callers don't
have to worry about the two cases, and we always get proper resets.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2017-07-11 11:04:01 +10:00
Greg Kurz f3728f9cbb spapr: fix bogus function name in comment
$ git grep spapr_ppc_reset
hw/ppc/spapr.c: * as part of spapr_ppc_reset().

$ git grep ppc_spapr_reset
hw/ppc/spapr.c:static void ppc_spapr_reset(void)
hw/ppc/spapr.c:    mc->reset = ppc_spapr_reset;
hw/ppc/spapr_hcall.c:        /* If ppc_spapr_reset() did not set up a HPT
 but one is necessary

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-11 11:04:01 +10:00
Greg Kurz 04d0ffbd52 spapr: make spapr_populate_hotplug_cpu_dt() static
Since commit ff9006ddbf ("spapr: move spapr_core_[foo]plug() callbacks
close to machine code in spapr.c"), this function doesn't need to be extern
anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-11 11:04:01 +10:00
Juan Quintela 70f794fcfa migration: Rename cleanup() to save_cleanup()
We need a cleanup for loads, so we rename here to be consistent.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

--

Rename htab_cleanup to htap_save_cleanup as dave suggestion
Message-Id: <20170628095228.4661-3-quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-07-10 17:52:21 +01:00
Juan Quintela 9907e842d7 migration: Rename save_live_setup() to save_setup()
We are going to use it now for more than save live regions.
Once there rename qemu_savevm_state_begin() to qemu_savevm_state_setup().

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170628095228.4661-2-quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-07-10 17:52:21 +01:00
David Gibson 4f9242fc93 spapr: Make DRC reset force DRC into known state
The reset handler for DRCs attempts several state transitions which are
subject to various checks and restrictions.  But at reset time we know
there is no guest, so we can ignore most of the usual sequencing rules and
just set the DRC back to a known state.  In fact, it's safer to do so.

The existing code also has several redundant checks for
drc->awaiting_release inside a block which has already tested that.  This
patch removes those and sets the DRC to a fixed initial state based only
on whether a device is currently plugged or not.

With DRCs correctly reset to a state based on device presence, we don't
need to force state transitions as cold plugged devices are processed.
This allows us to remove all the callers of the set_*_state() methods from
outside spapr_drc.c.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-30 14:03:32 +10:00
Daniel Henrique Barboza aca8bf9f1c hw/ppc/spapr.c: consecutive 'spapr->patb_entry = 0' statements
In ppc_spapr_reset(), if the guest is using HPT, the code was executing:

    } else {
        spapr->patb_entry = 0;
        spapr_setup_hpt_and_vrma(spapr);
    }

And, at the end of spapr_setup_hpt_and_vrma:

    /* We're setting up a hash table, so that means we're not radix */
    spapr->patb_entry = 0;

Resulting in spapr->patb_entry being assigned to 0 twice in a row.

Given that 'spapr_setup_hpt_and_vrma' is also called inside
'spapr_check_setup_free_hpt' of spapr_hcall.c, this trivial patch removes
the 'patb_entry = 0' assignment from the 'else' clause inside ppc_spapr_reset
to avoid this behavior.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-30 14:03:31 +10:00
Greg Kurz 46f7afa370 spapr: fix migration of ICPState objects from/to older QEMU
Commit 5bc8d26de2 ("spapr: allocate the ICPState object from under
sPAPRCPUCore") moved ICPState objects from the machine to CPU cores.
This is an improvement since we no longer allocate ICPState objects
that will never be used. But it has the side-effect of breaking
migration of older machine types from older QEMU versions.

This patch allows spapr to register dummy "icp/server" entries to vmstate.
These entries use a dedicated VMStateDescription that can swallow and
discard state of an incoming migration stream, and that don't send anything
on outgoing migration.

As for real ICPState objects, the instance_id is the cpu_index of the
corresponding vCPU, which happens to be equal to the generated instance_id
of older machine types.

The machine can unregister/register these entries when CPUs are dynamically
plugged/unplugged.

This is only available for pseries-2.9 and older machines, thanks to a
compat property.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-30 14:03:31 +10:00
Bharata B Rao d39c90f5f3 spapr: Fix migration of Radix guests
Fix migration of radix guests by ensuring that we issue
KVM_PPC_CONFIGURE_V3_MMU for radix case post migration.

Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-30 14:03:31 +10:00
Bharata B Rao 3a38429748 spapr: Add a "no HPT" encoding to HTAB migration stream
Add a "no HPT" encoding (using value -1) to the HTAB migration
stream (in the place of HPT size) when the guest doesn't allocate HPT.
This will help the target side to match target HPT with the source HPT
and thus enable successful migration.

Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-30 14:03:31 +10:00
David Gibson d5fc133eed ppc: Rework CPU compatibility testing across migration
Migrating between different CPU versions is a bit complicated for ppc.
A long time ago, we ensured identical CPU versions at either end by
checking the PVR had the same value.  However, this breaks under KVM
HV, because we always have to use the host's PVR - it's not
virtualized.  That would mean we couldn't migrate between hosts with
different PVRs, even if the CPUs are close enough to compatible in
practice (sometimes identical cores with different surrounding logic
have different PVRs, so this happens in practice quite often).

So, we removed the PVR check, but instead checked that several flags
indicating supported instructions matched.  This turns out to be a bad
idea, because those instruction masks are not architected information, but
essentially a TCG implementation detail.  So changes to qemu internal CPU
modelling can break migration - this happened between qemu-2.6 and
qemu-2.7.  That was addressed by 146c11f1 "target-ppc: Allow eventual
removal of old migration mistakes".

Now, verification of CPU compatibility across a migration basically doesn't
happen.  We simply ignore the PVR of the incoming migration, and hope the
cpu on the destination is close enough to work.

Now that we've cleaned up handling of processor compatibility modes
for pseries machine type, we can do better.  For new machine types
(pseries-2.10+) We allow migration if:

    * The source and destination PVRs are for the same type of CPU, as
      determined by CPU class's pvr_match function
OR  * When the source was in a compatibility mode, and the destination CPU
      supports the same compatibility mode

For older machine types we retain the existing behaviour - current CAS
code will usually set a compat mode which would break backwards
migration if we made them use the new behaviour. [Fixed from an
earlier version by Greg Kurz].

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
2017-06-30 14:03:31 +10:00
David Gibson 66d5c492dd pseries: Reset CPU compatibility mode
Currently, the CPU compatibility mode is set when the cpu is initialized,
then again when the guest negotiates features.  This means if a guest
negotiates a compatibility mode, then reboots, that compatibility mode
will be retained across the reset.

Usually that will get overridden when features are negotiated on the next
boot, but it's still not really correct.  This patch moves the initial set
up of the compatibility mode from cpu init to reset time.  The mode *is*
retained if the reboot was caused by the feature negotiation (it might
be important in that case, though it's unlikely).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
2017-06-30 14:03:31 +10:00
David Gibson 7843c0d60d pseries: Move CPU compatibility property to machine
Server class POWER CPUs have a "compat" property, which is used to set the
backwards compatibility mode for the processor.  However, this only makes
sense for machine types which don't give the guest access to hypervisor
privilege - otherwise the compatibility level is under the guest's control.

To reflect this, this removes the CPU 'compat' property and instead
creates a 'max-cpu-compat' property on the pseries machine.  Strictly
speaking this breaks compatibility, but AFAIK the 'compat' option was
never (directly) used with -device or device_add.

The option was used with -cpu.  So, to maintain compatibility, this
patch adds a hack to the cpu option parsing to strip out any compat
options supplied with -cpu and set them on the machine property
instead of the now deprecated cpu property.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Tested-by: Andrea Bolognani <abologna@redhat.com>
2017-06-30 14:03:31 +10:00
Peter Xu 15c3850325 migration: move skip_section_footers
Move it into MigrationState, revert its meaning and renaming it to
send_section_footer, with a property bound to it. Same trick is played
like previous patches.

Removing savevm_skip_section_footers().

Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1498536619-14548-9-git-send-email-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-06-28 11:18:39 +02:00
Peter Xu 71dd4c1a56 migration: move skip_configuration out
It was in SaveState but now moved to MigrationState altogether, reverted
its meaning, then renamed to "send_configuration". Again, using
HW_COMPAT_2_3 for old PC/SPAPR machines, and accel_register_prop() for
xen_init().

Removing savevm_skip_configuration().

Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1498536619-14548-8-git-send-email-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-06-28 11:18:38 +02:00
Peter Xu 5272298c48 migration: move global_state.optional out
Put it into MigrationState then we can use the properties to specify
whether to enable storing global state.

Removing global_state_set_optional() since now we can use HW_COMPAT_2_3
for x86/power, and AccelClass.global_props for Xen.

Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1498536619-14548-6-git-send-email-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-06-28 11:18:38 +02:00
Marc-André Lureau 9ed442b8ae pc-dimm: use get_uint() for dimm properties
TYPE_PC_DIMM's property PC_DIMM_ADDR_PROP is defined with
DEFINE_PROP_UINT64().

TYPE_PC_DIMM's property PC_DIMM_NODE_PROP is defined with
DEFINE_PROP_UINT32().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20170607163635.17635-22-marcandre.lureau@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2017-06-20 14:31:32 +02:00
Peter Maydell 735286a4f8 migration/next for 20170613
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJZP6n5AAoJEPSH7xhYctcj04oQAJczMfc2X8vTwII6lN9klf+T
 Cy32B4WB8FBO9M7oJYD/yytJ3ibcLuMwKwTy/GGfaTspuYDI/HrplUD3Pt+trDPc
 fUxmTNjK9vE9foPAwOTSwTGsdOp5ICoZuDjHTj8gtHmfFLclDxxJMojtthMJ1Csc
 qn9oJzjLn3izn8C6CY6oXGnqOt6gy2lz+RqNKlve/bwxaVdQIXTXCVsLWwQZuj48
 VI9qAFw9TsgSBi9dlTYpVfdMvItO73SVYd2c1ETzL0YSNK3S/Yhpww7fyK8TQNpO
 Y8xXMMBMybHZej1ixHXh01CRmEnBZXpjLCIXnWwxQGXxTH8p7F+W1+lhDTL4IIXR
 Py0EwiPUj4sPyTW2htSnDBRtE1uHcJlDtsFAAmsEqfeASet7ueE2bkfKwWUftqTs
 GZ7ikseIb9F0eQKjecYcEfaLtYNn+0UflgVkimW1gXIeuO58VYLpa8vdiUV3eKJn
 UCDDHGYKf7QJQLpSzYWXGRT4HJOQvaCbJ0a03hKceYyLB6rJv96khajirbczKZ92
 cja0EJfDy5S9fBulWRveHKLUAFMrR3zA4DhlK0pb591uIs4iMcKH3egHQZpv0uf0
 iifWNI+AFuorhQfdhV2G4Zg1g/fwI2RRJK7HdBOklulUrcr0caPvjjGdbA3Q0Hf6
 u61pWdr+Yb3XPaqlC2AH
 =EFHC
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20170613' into staging

migration/next for 20170613

# gpg: Signature made Tue 13 Jun 2017 10:01:45 BST
# gpg:                using RSA key 0xF487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg:                 aka "Juan Quintela <quintela@trasno.org>"
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03  4B82 F487 EF18 5872 D723

* remotes/juanquintela/tags/migration/20170613:
  migration: Move migration.h to migration/
  migration: Move remaining exported functions to migration/misc.h
  migration: create global_state.c
  migration: ram_control_* are implemented in qemu_file
  migration: Commands are only used inside migration.c
  migration: Move constants to savevm.h
  migration: Move dump_vmsate_json_to_file() to misc.h
  migration: Split registration functions from vmstate.h
  migration: Move self_announce_delay() to misc.h
  migration: Remove MigrationState from migration_channel_incomming()
  ram: Now POSTCOPY_ACTIVE is the same that STATUS_ACTIVE
  ram: Print block stats also in the complete case
  migration: Don't try to set *errp directly
  migration: isolate return path on src

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-06-13 13:51:29 +01:00
Juan Quintela c4b63b7cc5 migration: Move remaining exported functions to migration/misc.h
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
2017-06-13 11:00:45 +02:00
Juan Quintela 84a899de8c migration: create global_state.c
It don't belong anywhere else, just the global state where everybody
can stick other things.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2017-06-13 11:00:45 +02:00
Juan Quintela f2a8f0a631 migration: Split registration functions from vmstate.h
They are indpendent, and nowadays almost every device register things
with qdev->vmsd.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
2017-06-13 11:00:44 +02:00
Greg Kurz ad265631c0 xics: introduce macros for ICP/ICS link properties
These properties are part of the XICS API. They deserve to appear
explicitely in the XICS header file.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-09 12:12:34 +10:00
Thomas Huth 4871dd4c3f hw/ppc/spapr: Adjust firmware name for PCI bridges
SLOF uses "pci" as name for PCI bridges nodes in the device tree instead
of "pci-bridges", so booting via bootindex from a device behind a PCI
bridge currently does not work since QEMU passes the wrong name in the
"qemu,boot-list" property. Fix it by changing the name of the PCI bridge
nodes to "pci" instead.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1459170
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-08 14:38:27 +10:00
David Gibson 0be4e88621 spapr: Change DRC attach & detach methods to functions
DRC objects have attach & detach methods, but there's only one
implementation.  Although there are some differences in its behaviour for
different DRC types, the overall structure is the same, so while we might
want different method implementations for some parts, we're unlikely to
want them for the top-level functions.

So, replace them with direct function calls.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-08 14:38:26 +10:00
David Gibson 454b580ae9 spapr: Don't misuse DR-indicator in spapr_recover_pending_dimm_state()
With some combinations of migration and hotplug we can lost temporary state
indicating how many DRCs (guest side hotplug handles) are still connected
to a DIMM object in the process of removal.  When we hit that situation
spapr_recover_pending_dimm_state() is used to scan more extensively and
work out the right number.

It does this using drc->indicator state to determine what state of
disconnection the DRC is in.  However, this is not safe, because the
indicator state is guest settable - in fact it's more-or-less a purely
guest->host notification mechanism which should have no bearing on the
internals of hotplug state management.

So, replace the test for this with a test on drc->dev, which is a purely
qemu side managed variable, and updated the same BQL critical section as
the indicator state.

This does introduce an off-by-one change, because the indicator state was
updated before the call to spapr_lmb_release() on the current DRC, whereas
drc->dev is updated afterwards.  That's corrected by always decrementing
the nr_lmbs value instead of only doing so in the case where we didn't
have to recover information.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-08 14:38:26 +10:00
Greg Kurz 8a9e0e7b89 spapr: fix memory leak in spapr_memory_pre_plug()
The string returned by object_property_get_str() is dynamically allocated.

(Spotted by Coverity, CID 1375942)

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-08 11:05:31 +10:00
Peter Maydell e02bbe1956 ppc patch queue 2017-06-06
Accumulated patches for ppc targets and the pseries machine type.
 
 The big thing in this batch is a start on a substantial cleanup of the
 pseries hotplug mechanisms, which were pretty confusing.  For now
 these shouldn't cause substantial behavioural changes, but I am hoping
 these lead to clearer code and eventually to fixes for the bugs we
 have in hotplug handling, particularly when hotplug and migration are
 combined.
 
 The remaining patches are mostly bugfixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZNhgSAAoJEGw4ysog2bOSERwP/A7T7UJ8XXWit9QXGCi+G83w
 +RUuHxjA9qFqrg1zYqFyLg3ctGl93Sxu7mzI5MOIKwAVXlTsE6+84TH7zBc18DPB
 fekPWmzJ6jfiVO+1Zg1JPWorMfIHDDc2v6Q6qPfD8KWbt02yPfrXbKlivQB4hVZ4
 Qb4VJdjZgBDcVy79xhcW5k6v8dVw8PdSyDmkQrBhccI0noLerhI41Mgt7QQaWQRH
 Le3ziexUpWelVCRQB0FqE/PIWo2+NY/e0pumX7Aqtjs/G35KjOXy0ja3yKLjfeUW
 Z4NugIO2I2hncERa68YFar/BqG26DX8KCErNMDkn7LyZcoDAQWhcDH+65G1BNuf2
 jW+KApMNm+N1vXabbz8P9BbLjuZpRQQhyPOxB3I8UGaTYGtCPe/lUCe2/V8EbKNa
 VFavc1UuLftOZuJj/rYGJeU/4JBU6srbAKCO3VVK4Tnd8DyiT3QCpUWEkjv+J6jo
 co35oYBavLfQPMr+rsX15lgbmZwg7iBV+dgKLa2+cwmKXzCf7aYe38aJy7nRBmhb
 ivhH3bKtdysy0qq4UYaCgW06qQcVF0QMJaxFQ0X7I+GBNwHA7wdZD/i6IMcO6Z7H
 7gQdavBTdukgKb2+pVjR58H13ieHXuBxktonhOz70rvEDVa4xx8pxhnZlpSiH2ha
 RzpkhanrwEeECG6Lke/3
 =QDWB
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170606' into staging

ppc patch queue 2017-06-06

Accumulated patches for ppc targets and the pseries machine type.

The big thing in this batch is a start on a substantial cleanup of the
pseries hotplug mechanisms, which were pretty confusing.  For now
these shouldn't cause substantial behavioural changes, but I am hoping
these lead to clearer code and eventually to fixes for the bugs we
have in hotplug handling, particularly when hotplug and migration are
combined.

The remaining patches are mostly bugfixes.

# gpg: Signature made Tue 06 Jun 2017 03:48:50 BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.10-20170606:
  spapr: Remove some non-useful properties on DRC objects
  spapr: Eliminate spapr_drc_get_type_str()
  spapr: Move configure-connector state into DRC
  spapr: Clean up spapr_dr_connector_by_*()
  spapr: Introduce DRC subclasses
  spapr/drc: don't migrate DRC of cold-plugged CPUs and LMBs
  spapr: Allow boot from vhost-*-scsi backends
  ppc/pnv: check the return value of fdt_setprop()
  spapr_nvram: Check return value from blk_getlength()
  target/ppc: Fixup set_spr error in h_register_process_table
  target-ppc: Fix openpic timer read register offset
  spapr: Make DRC get_index and get_type methods into plain functions
  spapr: Abolish DRC set_configured method
  spapr: Abolish DRC get_fdt method
  spapr: Move DRC RTAS calls into spapr_drc.c
  migration: Mark CPU states dirty before incoming migration/loadvm
  migration: remove register_savevm()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-06-06 14:30:06 +01:00
David Gibson b8fdd530be spapr: Move configure-connector state into DRC
Currently the sPAPRMachineState contains a list of sPAPRConfigureConnector
structures which store intermediate state for the ibm,configure-connector
RTAS call.

This was an attempt to separate this state from the core of the DRC state.
However the configure connector process is intimately tied to the DRC
model, so there's really no point trying to have two levels of interface
here.

Moving the configure-connector state into its corresponding DRC allows
removal of a number of helpers for maintaining the anciliary list.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06 09:24:17 +10:00
David Gibson fbf5539718 spapr: Clean up spapr_dr_connector_by_*()
* Change names to something less ludicrously verbose
 * Now that we have QOM subclasses for the different DRC types, use a QOM
   typename instead of a PAPR type value parameter

The latter allows removal of the get_type_shift() helper.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06 09:24:08 +10:00
David Gibson 2d33581899 spapr: Introduce DRC subclasses
Currently we only have a single QOM type for all DRCs, but lots of
places where we switch behaviour based on the DRC's PAPR defined type.
This is a poor use of our existing type system.

So, instead create QOM subclasses for each PAPR defined DRC type.  We
also introduce intermediate subclasses for physical and logical DRCs,
a division which will be useful later on.

Instead of being stored in the DRC object itself, the PAPR type is now
stored in the class structure.  There are still many places where we
switch directly on the PAPR type value, but this at least provides the
basis to start to remove those.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06 09:23:46 +10:00
Felipe Franciosi c4e13492af spapr: Allow boot from vhost-*-scsi backends
The current implementation of spapr_get_fw_dev_path() doesn't take into
consideration vhost-*-scsi devices. This makes said devices unbootable
on PPC as SLOF is unable to work out the path to scan boot disks.

This makes VMs bootable on spapr when using vhost-*-scsi by implementing
a disk path for VHostSCSICommon (which currently includes both
vhost-user-scsi and vhost-scsi).

Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Signed-off-by: Mike Cui <cui@nutanix.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-06 09:19:01 +10:00
David Gibson 0b55aa91c9 spapr: Make DRC get_index and get_type methods into plain functions
These two methods only have one implementation, and the spec they're
implementing means any other implementation is unlikely, verging on
impossible.

So replace them with simple functions.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-06-06 08:53:24 +10:00
Igor Mammedov 99861ecbc5 spapr: cleanup spapr_fixup_cpu_numa_dt() usage
even though spapr_fixup_cpu_numa_dt() has no effect on FDT
if numa is disabled, don't call it uselessly. It makes it
obvious at call sites that function is needed only when numa
is enabled.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1496161442-96665-7-git-send-email-imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05 14:59:09 -03:00
Igor Mammedov 15f8b14228 numa: move numa_node from CPUState into target specific classes
Move vcpu's associated numa_node field out of generic CPUState
into inherited classes that actually care about cpu<->numa mapping,
i.e: ARMCPU, PowerPCCPU, X86CPU.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1496161442-96665-6-git-send-email-imammedo@redhat.com>
[ehabkost: s/CPU is belonging to/CPU belongs to/ on comments]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05 14:59:09 -03:00
Igor Mammedov a0ceb640d0 numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1496161442-96665-2-git-send-email-imammedo@redhat.com>
[ehabkost: Fix indentation]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05 14:59:08 -03:00
Daniel Henrique Barboza 16ee99805e hw/ppc/spapr.c: recover pending LMB unplug info in spapr_lmb_release
When a LMB hot unplug starts, the current DRC LMB status is stored at
spapr->pending_dimm_unplugs QTAILQ. This queue isn't migrated, thus
if a migration occurs in the middle of a LMB unplug the
spapr_lmb_release callback will lost track of the LMB unplug progress.

This patch implements a new recover function spapr_recover_pending_dimm_state
that is used inside spapr_lmb_release to recover this DRC LMB release
status that is lost during the migration.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
[dwg: Minor stylistic changes, simplify error handling]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-25 11:31:33 +10:00
Daniel Henrique Barboza 318347234d hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque
The pointer drc->detach_cb is being used as a way of informing
the detach() function inside spapr_drc.c which cb to execute. This
information can also be retrieved simply by checking drc->type and
choosing the right callback based on it. In this context, detach_cb
is redundant information that must be managed.

After the previous spapr_lmb_release change, no detach_cb_opaques
are being used by any of the three callbacks functions. This is
yet another information that is now unused and, on top of that, can't
be migrated either.

This patch makes the following changes:

- removal of detach_cb_opaque. the 'opaque' argument was removed from
the callbacks and from the detach() function of sPAPRConnectorClass. The
attribute detach_cb_opaque of sPAPRConnector was removed.

- removal of detach_cb from the detach() call. The function pointer
detach_cb of sPAPRConnector was removed. detach() now uses a
switch(drc->type) to execute the apropriate callback. To achieve this,
spapr_core_release, spapr_lmb_release and spapr_phb_remove_pci_device_cb
callbacks were made public to be visible inside detach().

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-25 11:31:33 +10:00
David Gibson 0cffce56ae hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState
The LMB DRC release callback, spapr_lmb_release(), uses an opaque
parameter, a sPAPRDIMMState struct that stores the current LMBs that
are allocated to a DIMM (nr_lmbs). After each call to this callback,
the nr_lmbs is decremented by one and, when it reaches zero, the callback
proceeds with the qdev calls to hot unplug the LMB.

Using drc->detach_cb_opaque is problematic because it can't be migrated in
the future DRC migration work. This patch makes the following changes to
eliminate the usage of this opaque callback inside spapr_lmb_release:

- sPAPRDIMMState was moved from spapr.c and added to spapr.h. A new
attribute called 'addr' was added to it. This is used as an unique
identifier to associate a sPAPRDIMMState to a PCDIMM element.

- sPAPRMachineState now hosts a new QTAILQ called 'pending_dimm_unplugs'.
This queue of sPAPRDIMMState elements will store the DIMM state of DIMMs
that are currently going under an unplug process.

- spapr_lmb_release() will now retrieve the nr_lmbs value by getting the
correspondent sPAPRDIMMState. A helper function called spapr_dimm_get_address
was created to fetch the address of a PCDIMM device inside spapr_lmb_release.
When nr_lmbs reaches zero and the callback proceeds with the qdev hot unplug
calls, the sPAPRDIMMState struct is removed from spapr->pending_dimm_unplugs.

After these changes, the opaque argument for spapr_lmb_release is now
unused and is passed as NULL inside spapr_del_lmbs. This and the other
opaque arguments can now be safely removed from the code.

As an additional cleanup made by this patch, the spapr_del_lmbs function
was merged with spapr_memory_unplug_request. The former was being called
only by the latter and both were small enough to fit one single function.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
[dwg: Minor stylistic cleanups]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-25 11:31:28 +10:00
Laurent Vivier c871bc70bb spapr: add pre_plug function for memory
This allows to manage errors before the memory
has started to be hotplugged. We already have
the function for the CPU cores.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
[dwg: Fixed a couple of style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24 17:27:39 +10:00
David Gibson 459264ef24 pseries: Restore support for total vcpus not a multiple of threads-per-core for old machine types
As of pseries-2.7 and later, we require the total number of guest vcpus to
be a multiple of the threads-per-core.  pseries-2.6 and earlier machine
types, however, are supposed to allow this for the sake of migration from
old qemu versions which allowed this.

Unfortunately, 8149e29 "pseries: Enforce homogeneous threads-per-core"
broke this by not considering the old machine type case.  This fixes it by
only applying the check when the machine type supports hotpluggable cpus.
By not-entirely-coincidence, that corresponds to the same time when we
started enforcing total threads being a multiple of threads-per-core.

Fixes: 8149e2992f

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
2017-05-24 11:39:53 +10:00
Greg Kurz 3d85885a1b spapr: fix error reporting in xics_system_init()
If the user explicitely asked for kernel-irqchip support and "xics-kvm"
initialization fails, we shouldn't fallback to emulated "xics" as we
do now. It is also awkward to print an error message when we have an
errp pointer argument.

Let's use the errp argument to report the error and let the caller decide.
This simplifies the code as we don't need a local Error * here.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24 11:39:53 +10:00
Greg Kurz 07572c0653 spapr: ensure core_slot isn't NULL in spapr_core_unplug()
If we go that far on the path of hot-removing a core and we find out that
the core-id is invalid, then we have a serious bug.

Let's make it explicit with an assert() instead of dereferencing a NULL
pointer.

This fixes Coverity issue CID 1375404.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24 11:39:53 +10:00
Bharata B Rao 06ec79e865 spapr: Consolidate HPT freeing code into a routine
Consolidate the code that frees HPT into a separate routine
spapr_free_hpt() as the same chunk of code is called from two places.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24 11:39:52 +10:00
Greg Kurz 175d2aa038 spapr: sanitize error handling in spapr_ics_create()
The spapr_ics_create() function handles errors in a rather convoluted
way, with two local Error * variables. Moreover, failing to parent the
ICS object to the machine should be considered as a bug but it is
currently ignored.

This patch addresses both issues.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24 11:39:52 +10:00
Greg Kurz f63ebfe0ac ppc/xics: simplify prototype of xics_spapr_init()
This function only does hypercall and RTAS-call registration, and thus
never returns an error. This patch adapt the prototype to reflect that.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24 11:39:52 +10:00
Stefan Hajnoczi ba9915e1f8 x86 and machine queue, 2017-05-11
Highlights:
 * New "-numa cpu" option
 * NUMA distance configuration
 * migration/i386 vmstatification
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJZFLh3AAoJECgHk2+YTcWmppQQAJe9Y5a3VwHXqvHbwBHX2ysn
 RZDAUPd9DWpbM+UUydyKOVIZ7u5RXbbVq4E0NeCD8VYYd+grZB5Wo1cAzy3b4U2j
 2s+MDqaPMtZtGoqxTsyQOVoVxazT5Kf1zglK+iUEzik44J7LGdro+ty2Z7Ut2c11
 q9rE/GNS78czBm7c4lxgkxXW4N95K/tEGlLtDQ7uct//3U/ZimF+mO6GcbVFlOWT
 4iEbOz2sqvBVv22nLJRufiPgFNIW4hizAz5KBWxwGFCCKvT3N6yYNKKjzEpCw+jE
 lpjIRODU02yIZZZY841fLRtyrk7p4zORS8jRaHTdEJgb5bGc/YazxxVL8nzRQT1W
 VxFwAMd+UNrDkV24hpN++Ln2O+b3kwcGZ7uA/qu9d5WvSYUKXlHqcMJ35q6zuhAI
 /ecfYO7EZfVP86VjIt5IH04iV8RChA9Q6de+kQEFa6wHUxufeCOwCFqukGo8zj07
 plX8NcjnzYmSXKnYjHOHao4rKT+DiJhRB60rFiMeKP/qvKbZPjtgsIeonhHm53qZ
 /QwkhowahHKkpAnetIl0QHm8KS4YudAofMi/Fl+he4gRkEbSQVAo6iQb2L4cjcLC
 LNSDDsIVWGem4gCR+vcsFqB3lggRDfltHXm15JKh92UMpOr6RI6s8pD55T7EdnPC
 CfdxWB5kYM6/lLbOHj94
 =48wH
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'ehabkost/tags/x86-and-machine-pull-request' into staging

x86 and machine queue, 2017-05-11

Highlights:
* New "-numa cpu" option
* NUMA distance configuration
* migration/i386 vmstatification

# gpg: Signature made Thu 11 May 2017 08:16:07 PM BST
# gpg:                using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# gpg: Note: This key has expired!
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* ehabkost/tags/x86-and-machine-pull-request: (29 commits)
  migration/i386: Remove support for pre-0.12 formats
  vmstatification: i386 FPReg
  migration/i386: Remove old non-softfloat 64bit FP support
  tests: check -numa node,cpu=props_list usecase
  numa: add '-numa cpu,...' option for property based node mapping
  numa: remove node_cpu bitmaps as they are no longer used
  numa: use possible_cpus for not mapped CPUs check
  machine: call machine init from wrapper
  numa: remove no longer need numa_post_machine_init()
  tests: numa: add case for QMP command query-cpus
  QMP: include CpuInstanceProperties into query_cpus output output
  virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  numa: do default mapping based on possible_cpus instead of node_cpu bitmaps
  numa: mirror cpu to node mapping in MachineState::possible_cpus
  numa: add check that board supports cpu_index to node mapping
  virt-arm: add node-id property to CPU
  pc: add node-id property to CPU
  spapr: add node-id property to sPAPR core
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-15 14:12:03 +01:00
Igor Mammedov 722387e78d spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
it's safe to remove thread node_id != core node_id error
branch as machine_set_cpu_numa_node() also does mismatch
check and is called even before any CPU is created.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1494415802-227633-10-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:49 -03:00
Igor Mammedov 0b8497f08c spapr: add node-id property to sPAPR core
it will allow switching from cpu_index to core based numa
mapping in follow up patches.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1494415802-227633-3-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:48 -03:00
Igor Mammedov ea089eebbd numa: move source of default CPUs to NUMA node mapping into boards
Originally CPU threads were by default assigned in
round-robin fashion. However it was causing issues in
guest since CPU threads from the same socket/core could
be placed on different NUMA nodes.
Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
fixed it by grouping threads within a socket on the same node
introducing cpu_index_to_socket_id() callback and commit
20bb648d (spapr: Fix default NUMA node allocation for threads)
reused callback to fix similar issues for SPAPR machine
even though socket doesn't make much sense there.

As result QEMU ended up having 3 default distribution rules
used by 3 targets /virt-arm, spapr, pc/.

In effort of moving NUMA mapping for CPUs into possible_cpus,
generalize default mapping in numa.c by making boards decide
on default mapping and let them explicitly tell generic
numa code to which node a CPU thread belongs to by replacing
cpu_index_to_socket_id() with @cpu_index_to_instance_props()
which provides default node_id assigned by board to specified
cpu_index.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1494415802-227633-2-git-send-email-imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:48 -03:00
Laurent Vivier 3bfe57165b numa: equally distribute memory on nodes
When there are more nodes than available memory to put the minimum
allowed memory by node, all the memory is put on the last node.

This is because we put (ram_size / nb_numa_nodes) &
~((1 << mc->numa_mem_align_shift) - 1); on each node, and in this
case the value is 0. This is particularly true with pseries,
as the memory must be aligned to 256MB.

To avoid this problem, this patch uses an error diffusion algorithm [1]
to distribute equally the memory on nodes.

We introduce numa_auto_assign_ram() function in MachineClass
to keep compatibility between machine type versions.
The legacy function is used with pseries-2.9, pc-q35-2.9 and
pc-i440fx-2.9 (and previous), the new one with all others.

Example:

qemu-system-ppc64 -S -nographic  -nodefaults -monitor stdio -m 1G -smp 8 \
                  -numa node -numa node -numa node \
                  -numa node -numa node -numa node

Before:

(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 0 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 0 MB
node 4 cpus: 4
node 4 size: 0 MB
node 5 cpus: 5
node 5 size: 1024 MB

After:
(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 256 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 256 MB
node 4 cpus: 4
node 4 size: 256 MB
node 5 cpus: 5
node 5 size: 256 MB

[1] https://en.wikipedia.org/wiki/Error_diffusion

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20170502162955.1610-2-lvivier@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
[ehabkost: s/ram_size/size/ at numa_default_auto_assign_ram()]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11 16:08:47 -03:00
David Gibson 9bf502fe12 spapr: Don't accidentally advertise HTM support on POWER9
Logic in spapr_populate_pa_features() enables the bit advertising
Hardware Transactional Memory (HTM) in the guest's device tree only when
KVM advertises its availability with the KVM_CAP_PPC_HTM feature.

However, this assumes that the HTM bit is off in the base template used for
the device tree value.  That is true for POWER8, but not for POWER9.

It looks like that was accidentally changed in 9fb4541 "spapr: Enable ISA
3.0 MMU mode selection via CAS".

Fixes: 9fb4541f58

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2017-05-11 09:45:15 +10:00
Suraj Jitindar Singh 545d6e2b5c target/ppc: Enable RADIX mmu mode for pseries TCG guest
Now that we have added all the infrastructure we can enable a pseries TCG
guest to use radix.

In order to do this we have to add the appropriate bits to the
ibm,arch-vec-5-platform-support vector to represent that we support both
hash and radix mmu models.

A radix guest can now be booted in pseries tcg mode by specifying:
-cpu POWER9

Note that we assume hash, that is we allocate a hpt, until a guest tells
us otherwise via a H_REGISTER_PROCESS_TABLE call with radix specified - in
which case we free the hpt. If we were right and the guest is hash then
there's nothing for us to do.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-11 09:45:15 +10:00
Cédric Le Goater 71cd4dace9 spapr: remove the 'nr_servers' field from the machine
xics_system_init() does not need 'nr_servers' anymore as it is only
used to define the 'interrupt-controller' node in the device tree. So
let's just compute the value when calling spapr_dt_xics().

This also gives us an opportunity to simplify the xics_system_init()
routine and introduce a specific spapr_ics_create() helper to create
the sPAPR ICS object.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:41:55 +10:00
Cédric Le Goater 5bc8d26de2 spapr: allocate the ICPState object from under sPAPRCPUCore
Today, all the ICPs are created before the CPUs, stored in an array
under the sPAPR machine and linked to the CPU when the core threads
are realized. This modeling brings some complexity when a lookup in
the array is required and it can be simplified by allocating the ICPs
when the CPUs are.

This is the purpose of this proposal which introduces a new 'icp_type'
field under the machine and creates the ICP objects of the right type
(KVM or not) before the PowerPCCPU object are.

This change allows more cleanups : the removal of the icps array under
the sPAPR machine and the removal of the xics_get_cpu_index_by_dt_id()
helper.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:42 +10:00
Cédric Le Goater 06747ba6d4 spapr: move the IRQ server number mapping under the machine
This is the second step to abstract the IRQ 'server' number of the
XICS layer. Now that the prereq cleanups have been done in the
previous patch, we can move down the 'cpu_dt_id' to 'cpu_index'
mapping in the sPAPR machine handler.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:42 +10:00
Alexey Kardashevskiy 3dc410ae83 target-ppc/kvm: Enable in-kernel TCE acceleration for multi-tce
This enables in-kernel handling of H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls. The host kernel support is there since v4.6,
in particular d3695aa4f452
("KVM: PPC: Add support for multiple-TCE hcalls").

H_PUT_TCE is already accelerated and does not need any special enablement.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:41 +10:00
Sam Bobroff e957f6a9b9 spapr: Workaround for broken radix guests
For a little while around 4.9, Linux kernels that saw the radix bit in
ibm,pa-features would attempt to set up the MMU as if they were a
hypervisor, even if they were a guest, which would cause them to
crash.

Work around this by detecting pre-ISA 3.0 guests by their lack of that
bit in option vector 1, and then removing the radix bit from
ibm,pa-features. Note: This now requires regeneration of that node
after CAS negotiation.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:41 +10:00
Sam Bobroff 9fb4541f58 spapr: Enable ISA 3.0 MMU mode selection via CAS
Add the new node, /chosen/ibm,arch-vec-5-platform-support to the
device tree. This allows the guest to determine which modes are
supported by the hypervisor.

Update the option vector processing in h_client_architecture_support()
to handle the new MMU bits. This allows guests to request hash or
radix mode and QEMU to create the guest's HPT at this time if it is
necessary but hasn't yet been done.  QEMU will terminate the guest if
it requests an unavailable mode, as required by the architecture.

Extend the ibm,pa-features node with the new ISA 3.0 values
and set the radix bit if KVM supports radix mode. This probably won't
be used directly by guests to determine the availability of radix mode
(that is indicated by the new node added above) but the architecture
requires that it be set when the hardware supports it.

If QEMU is using KVM, and KVM is capable of running in radix mode,
guests can be run in real-mode without allocating a HPT (because KVM
will use a minimal RPT). So in this case, we avoid creating the HPT
at reset time and later (during CAS) create it if it is necessary.

ISA 3.0 guests will now begin to call h_register_process_table(),
which has been added previously.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Strip some unneeded prefix from error messages]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:41 +10:00
Sam Bobroff 86d5771a5a spapr: move spapr_populate_pa_features()
In the next patch, spapr_fixup_cpu_dt() will need to call
spapr_populate_pa_features() so move it's definition up without making
any other changes.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:41 +10:00
Suraj Jitindar Singh b4db54132f target/ppc: Implement H_REGISTER_PROCESS_TABLE H_CALL
The H_REGISTER_PROCESS_TABLE H_CALL is used by a guest to indicate to the
hypervisor where in memory its process table is and how translation should
be performed using this process table.

Provide the implementation of this H_CALL for a guest.

We first check for invalid flags, then parse the flags to determine the
operation, and then check the other parameters for valid values based on
the operation (register new table/deregister table/maintain registration).
The process table is then stored in the appropriate location and registered
with the hypervisor (if running under KVM), and the LPCR_[UPRT/GTSE] bits
are updated as required.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Correct missing prototype and uninitialized variable]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:41 +10:00
Sam Bobroff c64abd1f9c spapr: Add ibm,processor-radix-AP-encodings to the device tree
Use the new ioctl, KVM_PPC_GET_RMMU_INFO, to fetch radix MMU
information from KVM and present the page encodings in the device tree
under ibm,processor-radix-AP-encodings. This provides page size
information to the guest which is necessary for it to use radix mode.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Compile fix for 32-bit targets, style nit fix]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:41 +10:00
Cédric Le Goater 147ff8079e ppc/spapr: QOM'ify sPAPRRTCState
Also use an 'sPAPRRTCState' attribute under the sPAPR machine to hold
the RTC object. Overall, these changes remove an unnecessary and
implicit dependency on SysBus.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:41 +10:00
David Gibson 3fa14fbe13 pseries: Add pseries-2.10 machine type
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26 12:00:41 +10:00
David Gibson 8149e2992f pseries: Enforce homogeneous threads-per-core
For reasons that may be useful in future, CPU core objects, as used on the
pseries machine type have their own nr-threads property, potentially
allowing cores with different numbers of threads in the same system.

If the user/management uses the values specified in query-hotpluggable-cpus
as they're expected to do, this will never matter in pratice.  But that's
not actually enforced - it's possible to manually specify a core with
a different number of threads from that in -smp.  That will confuse the
platform - most immediately, this can be used to create a CPU thread with
index above max_cpus which leads to an assertion failure in
spapr_cpu_core_realize().

For now, enforce that all cores must have the same, standard, number of
threads.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
2017-04-03 13:46:18 +10:00
Marc-André Lureau 24ec2863b1 spapr: fix buffer-overflow
Running postcopy-test with ASAN produces the following error:

QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64  tests/postcopy-test
...
=================================================================
==23641==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f1556600000 at pc 0x55b8e9d28208 bp 0x7f1555f4d3c0 sp 0x7f1555f4d3b0
READ of size 8 at 0x7f1556600000 thread T6
    #0 0x55b8e9d28207 in htab_save_first_pass /home/elmarco/src/qq/hw/ppc/spapr.c:1528
    #1 0x55b8e9d2939c in htab_save_iterate /home/elmarco/src/qq/hw/ppc/spapr.c:1665
    #2 0x55b8e9beae3a in qemu_savevm_state_iterate /home/elmarco/src/qq/migration/savevm.c:1044
    #3 0x55b8ea677733 in migration_thread /home/elmarco/src/qq/migration/migration.c:1976
    #4 0x7f15845f46c9 in start_thread (/lib64/libpthread.so.0+0x76c9)
    #5 0x7f157d9d0f7e in clone (/lib64/libc.so.6+0x107f7e)

0x7f1556600000 is located 0 bytes to the right of 2097152-byte region [0x7f1556400000,0x7f1556600000)
allocated by thread T0 here:
    #0 0x7f159bb76980 in posix_memalign (/lib64/libasan.so.3+0xc7980)
    #1 0x55b8eab185b2 in qemu_try_memalign /home/elmarco/src/qq/util/oslib-posix.c:106
    #2 0x55b8eab186c8 in qemu_memalign /home/elmarco/src/qq/util/oslib-posix.c:122
    #3 0x55b8e9d268a8 in spapr_reallocate_hpt /home/elmarco/src/qq/hw/ppc/spapr.c:1214
    #4 0x55b8e9d26e04 in ppc_spapr_reset /home/elmarco/src/qq/hw/ppc/spapr.c:1261
    #5 0x55b8ea12e913 in qemu_system_reset /home/elmarco/src/qq/vl.c:1697
    #6 0x55b8ea13fa40 in main /home/elmarco/src/qq/vl.c:4679
    #7 0x7f157d8e9400 in __libc_start_main (/lib64/libc.so.6+0x20400)

Thread T6 created by T0 here:
    #0 0x7f159bae0488 in __interceptor_pthread_create (/lib64/libasan.so.3+0x31488)
    #1 0x55b8eab1d9cb in qemu_thread_create /home/elmarco/src/qq/util/qemu-thread-posix.c:465
    #2 0x55b8ea67874c in migrate_fd_connect /home/elmarco/src/qq/migration/migration.c:2096
    #3 0x55b8ea66cbb0 in migration_channel_connect /home/elmarco/src/qq/migration/migration.c:500
    #4 0x55b8ea678f38 in socket_outgoing_migration /home/elmarco/src/qq/migration/socket.c:87
    #5 0x55b8eaa5a03a in qio_task_complete /home/elmarco/src/qq/io/task.c:142
    #6 0x55b8eaa599cc in gio_task_thread_result /home/elmarco/src/qq/io/task.c:88
    #7 0x7f15823e38e6  (/lib64/libglib-2.0.so.0+0x468e6)
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/elmarco/src/qq/hw/ppc/spapr.c:1528 in htab_save_first_pass

index seems to be wrongly incremented, unless I miss something that
would be worth a comment.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-29 11:35:02 +11:00
Laurent Vivier 55641213fc numa,spapr: align default numa node memory size to 256MB
Since commit 224245b ("spapr: Add LMB DR connectors"), NUMA node
memory size must be aligned to 256MB (SPAPR_MEMORY_BLOCK_SIZE).

But when "-numa" option is provided without "mem" parameter,
the memory is equally divided between nodes, but 8MB aligned.
This can be not valid for pseries.

In that case we can have:
$ ./ppc64-softmmu/qemu-system-ppc64 -m 4G -numa node -numa node -numa node
qemu-system-ppc64: Node 0 memory size 0x55000000 is not aligned to 256 MiB

With this patch, we have:
(qemu) info numa
3 nodes
node 0 cpus: 0
node 0 size: 1280 MB
node 1 cpus:
node 1 size: 1280 MB
node 2 cpus:
node 2 size: 1536 MB

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-22 11:32:42 +11:00
David Gibson 82516263ce pseries: Don't expose PCIe extended config space on older machine types
bb9986452 "spapr_pci: Advertise access to PCIe extended config space"
allowed guests to access the extended config space of PCI Express devices
via the PAPR interfaces, even though the paravirtualized bus mostly acts
like plain PCI.

However, that patch enabled access unconditionally, including for existing
machine types, which is an unwise change in behaviour.  This patch limits
the change to pseries-2.9 (and later) machine types.

Suggested-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-14 11:54:17 +11:00
Cédric Le Goater 7ea6e06717 ppc/xics: register reset handlers for the ICP and ICS objects
The recent changes on the XICS layer removed the XICSState object to
let the sPAPR machine handle the ICP and ICS directly. The reset of
these objects was previously handled by XICSState, which was a SysBus
device, and to keep the same behavior, the ICP and ICS were assigned
to SysbBus.

But that broke the 'info qtree' command in the monitor. 'qtree'
performs a loop on the children of a bus to print their properties and
SysBus devices are expected to be found under SysBus, which is not the
case anymore.

The fix for this problem is to register reset handlers for the ICP and
ICS objects and stop using SysBus for such devices.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-06 10:07:38 +11:00
Sam Bobroff ec975e839c spapr: Small cleanup of PPC MMU enums
The PPC MMU types are sometimes treated as if they were a bit field
and sometime as if they were an enum which causes maintenance
problems: flipping bits in the MMU type (which is done on both the 1TB
segment and 64K segment bits) currently produces new MMU type
values that are not handled in every "switch" on it, sometimes causing
an abort().

This patch provides some macros that can be used to filter out the
"bit field-like" bits so that the remainder of the value can be
switched on, like an enum. This allows removal of all of the
"degraded" types from the list and should ease maintenance.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-03 11:30:59 +11:00
Suraj Jitindar Singh 4975c098c9 target/ppc/POWER9: Add POWER9 pa-features definition
Add a pa-features definition which includes all of the new fields which
have been added, note we don't claim support for any of these new features
at this stage.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-03 11:30:59 +11:00
Suraj Jitindar Singh 9861bb3efd target/ppc: Add patb_entry to sPAPRMachineState
ISA v3.00 adds the idea of a partition table which is used to store the
address translation details for all partitions on the system. The partition
table consists of double word entries indexed by partition id where the second
double word contains the location of the process table in guest memory. The
process table is registered by the guest via a h-call.

We need somewhere to store the address of the process table so we add an entry
to the sPAPRMachineState struct called patb_entry to represent the second
doubleword of a single partition table entry corresponding to the current
guest. We need to store this value so we know if the guest is using radix or
hash translation and the location of the corresponding process table in guest
memory. Since we only have a single guest per qemu instance, we only need one
entry.

Since the partition table is technically a hypervisor resource we require that
access to it is abstracted by the virtual hypervisor through the get_patbe()
call. Currently the value of the entry is never set (and thus
defaults to 0 indicating hash), but it will be required to both implement
POWER9 kvm support and tcg radix support.

We also add this field to be migrated as part of the sPAPRMachineState as we
will need it on the receiving side as the guest will never tell us this
information again and we need it to perform translation.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-03 11:30:59 +11:00
Cédric Le Goater 6449da4545 ppc/xics: move InterruptStatsProvider to the sPAPR machine
It provides a better monitor output of the ICP and ICS objects, else
the objects are printed out of order.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:40 +11:00
Cédric Le Goater a7ff1212e9 ppc/xics: move ics-simple post_load under the machine
The ICS object uses a post_load() handler which is implicitly relying
on the fact that the internal state of the ICS and ICP objects has
been restored but this is not guaranteed. So, let's move the code
under the post_load() handler of the machine where we know the objects
have been fully restored.

The icp_resend() handler of the XICSFabric QOM interface is also
removed as it is now obsolete.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:40 +11:00
Cédric Le Goater e6f7e110ee ppc/xics: remove the XICSState classes
The XICSState classes are not used anymore. They have now been fully
deprecated by the XICSFabric QOM interface. Do the cleanups.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:40 +11:00
Cédric Le Goater 2192a9303d ppc/xics: export the XICS init routines
There is nothing left related to the XICS object in the realize
functions of the KVMXICSState and XICSState class. So adapt the
interfaces to call these routines directly from the sPAPR machine init
sequence.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:40 +11:00
Cédric Le Goater 852ad27e14 ppc/xics: move the ICP array under the sPAPR machine
This is the last step to remove the XICSState abstraction and have the
machine hold all the objects related to interrupts : ICSs and ICPs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:40 +11:00
Cédric Le Goater 20147f2fce ppc/xics: register the reset handler of ICP objects
The reset of the ICP objects is currently handled by XICS but this can
be done for each individual ICP.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:40 +11:00
Cédric Le Goater b0ec31290c ppc/xics: simplify spapr_dt_xics() interface
spapr_dt_xics() only needs the number of servers to build the device
tree nodes. Let's change the routine interface to reflect that.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater b4f27d71e3 ppc/xics: use the QOM interface to grab an ICP
Also introduce a xics_icp_get() helper to simplify the changes.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater b2fc59aaf9 ppc/xics: extend the QOM interface to handle ICPs
Let's add two new handlers for ICPs. One is to get an ICP object from
a server number and a second is to resend the irqs when needed.

The icp_resend() handler is a temporary workaround needed by the
ics-simple post_load() handler. It will be removed when the post_load
portion can be done at the machine level.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater d114a66225 ppc/xics: remove the XICS list of ICS
This is not used anymore.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater c79b2fdd7b ppc/xics: register the reset handler of ICS objects
The reset of the ICS objects is currently handled by XICS but this can
be done for each individual ICS. This also reduces the use of the XICS
list of ICS.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater 2cd908d0ad ppc/xics: use the QOM interface to resend irqs
Also change the ICPState 'xics' backlink to be a XICSFabric, this
removes the need of using qdev_get_machine() to get the QOM interface
in some of the routines.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater 7844e12b28 ppc/xics: use the QOM interface under the sPAPR machine
Add 'ics_get' and 'ics_resend' handlers to the sPAPR machine. These
are relatively simple for a single ICS.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater 681bfaded6 ppc/xics: store the ICS object under the sPAPR machine
A list of ICS objects was introduced under the XICS object for the
PowerNV machine but, for the sPAPR machine, it brings extra complexity
as there is only a single ICS. To simplify the code, let's add the ICS
pointer under the sPAPR machine and try to reduce the use of this list
where possible.

Also, change the xics_spapr_*() routines to use an ICS object instead
of an XICSState and change their name to reflect that these are
specific to the sPAPR ICS object.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater 817bb6a446 ppc/xics: remove set_nr_servers() handler from XICSStateClass
Today, the ICP (Interrupt Controller Presenter) objects are created by
the 'nr_servers' property handler of the XICS object and a class
handler. They are realized in the XICS object realize routine.

Let's simplify the process by creating the ICP objects along with the
XICS object at the machine level.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Cédric Le Goater 4e4169f7a2 ppc/xics: remove set_nr_irqs() handler from XICSStateClass
Today, the ICS (Interrupt Controller Source) object is created and
realized by the init and realize routines of the XICS object, but some
of the parameters are only known at the machine level.

These parameters are passed from the sPAPR machine to the ICS object
in a rather convoluted way using property handlers and a class handler
of the XICS object. The number of irqs required to allocate the IRQ
state objects in the ICS realize routine is one of them.

Let's simplify the process by creating the ICS object along with the
XICS object at the machine level and link the ICS into the XICS list
of ICSs at this level also. In the sPAPR machine, there is only a
single ICS but that will change with the PowerNV machine.

Also, QOMify the creation of the objects and get rid of the
superfluous code.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
David Gibson 738d5db824 xics: XICS should not be a SysBusDevice
Currently xics - the component of the IBM POWER interrupt controller
representing the overall interrupt fabric / architecture is
represented as a descendent of SysBusDevice.  However, this is not
really correct - the xics presents nothing in MMIO space so it should
be an "unattached" device in the current QOM model.

Since this device will always be created by the machine type, not created
specifically from the command line, and because it has no migrated state
it should be safe to move it around the device composition tree.

Therefore this patch changes it to a descendent of TYPE_DEVICE, and
makes it an unattached device.  So that its reset handler still gets
called correctly, we add a qdev_set_parent_bus() to attach it to
sysbus.  It's not really clear that's correct (instead of using
register_reset()) but it appears to a common technique.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[clg corrected problems with reset]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg folded together and updated commit message]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
David Gibson e57ca75ce3 target/ppc: Manage external HPT via virtual hypervisor
The pseries machine type implements the behaviour of a PAPR compliant
hypervisor, without actually executing such a hypervisor on the virtual
CPU.  To do this we need some hooks in the CPU code to make hypervisor
facilities get redirected to the machine instead of emulated internally.

For hypercalls this is managed through the cpu->vhyp field, which points
to a QOM interface with a method implementing the hypercall.

For the hashed page table (HPT) - also a hypervisor resource - we use an
older hack.  CPUPPCState has an 'external_htab' field which when non-NULL
indicates that the HPT is stored in qemu memory, rather than within the
guest's address space.

For consistency - and to make some future extensions easier - this merges
the external HPT mechanism into the vhyp mechanism.  Methods are added
to vhyp for the basic operations the core hash MMU code needs: map_hptes()
and unmap_hptes() for reading the HPT, store_hpte() for updating it and
hpt_mask() to retrieve its size.

To match this, the pseries machine now sets these vhyp fields in its
existing vhyp class, rather than reaching into the cpu object to set the
external_htab field.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
2017-03-01 11:23:39 +11:00
Greg Kurz 6244bb7e58 sysemu: support up to 1024 vCPUs
Some systems can already provide more than 255 hardware threads.

Bumping the QEMU limit to 1024 seems reasonable:
- it has no visible overhead in top;
- the limit itself has no effect on hot paths.

Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01 11:23:39 +11:00
Peter Maydell 28f997a82c This is the MTTCG pull-request as posted yesterday.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJYsBZfAAoJEPvQ2wlanipElJ0H+QGoStSPeHrvKu7Q07v4F9zM
 Pvf05gRsaxvXl7UbwmXC4oKhvZf9rVJ6ITk0x/y0WvmK0mHCmNBWkC0nn5UFL5IH
 cdxetLz21Q+Ghpc36tZvqn2HYwRQFoEznge2LdtBDG0TyVA4jwquHU3HCG2D51zi
 BaImI6lYW1e4ejjZHw8cEInSxsj/HJZE4pPas2Tkci+uAnrJroErwBVRRcE/y/Tn
 aupl9TJFs2JdyJFNDibIm0kjB+i+jvCiLgYjbKZ/dR/+GZt73TtiBk/q9ZOFjdmT
 7YFPI3F46QbGHoZahtzh0Xt7WMj94SlQgQ9OJ3zmNMfpXrze6Yc78xo/nbQ33U0=
 =wR0/
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stsquad/tags/pull-mttcg-240217-1' into staging

This is the MTTCG pull-request as posted yesterday.

# gpg: Signature made Fri 24 Feb 2017 11:17:51 GMT
# gpg:                using RSA key 0xFBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>"
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8  DF35 FBD0 DB09 5A9E 2A44

* remotes/stsquad/tags/pull-mttcg-240217-1: (24 commits)
  tcg: enable MTTCG by default for ARM on x86 hosts
  hw/misc/imx6_src: defer clearing of SRC_SCR reset bits
  target-arm: ensure all cross vCPUs TLB flushes complete
  target-arm: don't generate WFE/YIELD calls for MTTCG
  target-arm/powerctl: defer cpu reset work to CPU context
  cputlb: introduce tlb_flush_*_all_cpus[_synced]
  cputlb: atomically update tlb fields used by tlb_reset_dirty
  cputlb: add tlb_flush_by_mmuidx async routines
  cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap
  cputlb: introduce tlb_flush_* async work.
  cputlb: tweak qemu_ram_addr_from_host_nofail reporting
  cputlb: add assert_cpu_is_self checks
  tcg: handle EXCP_ATOMIC exception for system emulation
  tcg: enable thread-per-vCPU
  tcg: enable tb_lock() for SoftMMU
  tcg: remove global exit_request
  tcg: drop global lock during TCG code execution
  tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  tcg: add kick timer for single-threaded vCPU emulation
  tcg: add options for enabling MTTCG
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-02-25 18:43:52 +00:00
Jan Kiszka 8d04fb55de tcg: drop global lock during TCG code execution
This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.

We have to revert a few optimization for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.

Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
These numbers demonstrate where we gain something:

20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independent on a second core. Without the patch we don't get beyond

32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com>
[FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[EGC: fixed iothread lock for cpu-exec IRQ handling]
Signed-off-by: Emilio G. Cota <cota@braap.org>
[AJB: -smp single-threaded fix, clean commit msg, BQL fixes]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
[PM: target-arm changes]
Acked-by: Peter Maydell <peter.maydell@linaro.org>
2017-02-24 10:32:45 +00:00
Thomas Huth df58713396 hw/ppc/spapr: Check for valid page size when hot plugging memory
On POWER, the valid page sizes that the guest can use are bound
to the CPU and not to the memory region. QEMU already has some
fancy logic to find out the right maximum memory size to tell
it to the guest during boot (see getrampagesize() in the file
target/ppc/kvm.c for more information).
However, once we're booted and the guest is using huge pages
already, it is currently still possible to hot-plug memory regions
that does not support huge pages - which of course does not work
on POWER, since the guest thinks that it is possible to use huge
pages everywhere. The KVM_RUN ioctl will then abort with -EFAULT,
QEMU spills out a not very helpful error message together with
a register dump and the user is annoyed that the VM unexpectedly
died.
To avoid this situation, we should check the page size of hot-plugged
DIMMs to see whether it is possible to use it in the current VM.
If it does not fit, we can print out a better error message and
refuse to add it, so that the VM does not die unexpectely and the
user has a second chance to plug a DIMM with a matching memory
backend instead.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1419466
Signed-off-by: Thomas Huth <thuth@redhat.com>
[dwg: Fix a build error on 32-bit builds with KVM]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-22 14:28:53 +11:00
Igor Mammedov c5514d0e4b machine: replace query_hotpluggable_cpus() callback with has_hotpluggable_cpus flag
Generic helper machine_query_hotpluggable_cpus() replaced
target specific query_hotpluggable_cpus() callbacks so
there is no need in it anymore. However inon NULL callback
value is used to detect/report hotpluggable cpus support,
therefore it can be removed completely.
Replace it with MachineClass.has_hotpluggable_cpus boolean
which is sufficient for the task.

Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-22 11:28:28 +11:00
Igor Mammedov f2d672c248 machine: unify [pc_|spapr_]query_hotpluggable_cpus() callbacks
All callbacks FOO_query_hotpluggable_cpus() are practically
the same except of setting vcpus_count to different values.
Convert them to a generic machine_query_hotpluggable_cpus()
callback by moving vcpus_count initialization to per machine
specific callback possible_cpu_arch_ids().

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-22 11:28:28 +11:00
Igor Mammedov 535455fdee spapr: reuse machine->possible_cpus instead of cores[]
Replace SPAPR specific cores[] array with generic
machine->possible_cpus and store core objects there.
It makes cores bookkeeping similar to x86 cpus and
will allow to unify similar code.
It would allow to replace cpu_index based NUMA node
mapping with iproperty based one (for -device created
cores) since possible_cpus carries board defined
topology/layout.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-22 11:28:28 +11:00
Igor Mammedov 115debf26c spapr: make cpu core unplug follow expected hotunplug call flow
spapr_core_unplug() were essentially spapr_core_unplug_request()
handler that requested CPU removal and registered callback
which did actual cpu core removali but it was called from
spapr_machine_device_unplug() which is intended for actual object
removal. Commit (cf632463 spapr: Memory hot-unplug support)
sort of fixed it introducing spapr_machine_device_unplug_request()
and calling spapr_core_unplug() but it hasn't renamed callback and
by mistake calls it from spapr_machine_device_unplug().

However spapr_machine_device_unplug() isn't ever called for
cpu core since spapr_core_release() doesn't follow expected
hotunplug call flow which is:
 1: device_del() ->
        hotplug_handler_unplug_request() ->
            set destroy_cb()
 2: destroy_cb() ->
        hotplug_handler_unplug() ->
            object_unparent // actual device removal

Fix it by renaming spapr_core_unplug() to spapr_core_unplug_request()
which is called from spapr_machine_device_unplug_request() and
making spapr_core_release() call hotplug_handler_unplug() which
will call spapr_machine_device_unplug() -> spapr_core_unplug()
to remove cpu core.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reveiwed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-22 11:28:27 +11:00
Igor Mammedov ff9006ddbf spapr: move spapr_core_[foo]plug() callbacks close to machine code in spapr.c
spapr_core_pre_plug/spapr_core_plug/spapr_core_unplug() are managing
wiring CPU core into spapr machine state and not internal CPU core state.
So move them from spapr_cpu_core.c to spapr.c where other similar
(spapr_memory_[foo]plug()) callbacks are located, which also matches
x86 target practice.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-22 11:28:27 +11:00
Michael S. Tsirkin 25e6a11832 ppc: switch to constants within BUILD_BUG_ON
We are switching BUILD_BUG_ON to verify that it's parameter is a
compile-time constant, and it turns out that some gcc versions
(specifically gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609) are
not smart enough to figure it out for expressions involving local
variables. This is harmless but means that the check is ineffective for
these platforms.  To fix, replace the variable with macros.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[dwg: Correct a printf format warning]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-01-31 14:04:06 +11:00
Laurent Vivier 42043e4f12 spapr: clock should count only if vm is running
This is a port to ppc of the i386 commit:
    00f4d64 kvmclock: clock should count only if vm is running

We remove timebase_post_load function, and use the VM state
change handler to save and restore the guest_timebase (on stop
and continue).

We keep timebase_pre_save to reduce the clock difference on
migration like in:
    6053a86 kvmclock: reduce kvmclock difference on migration

Time base offset has originally been introduced by commit
    98a8b52 spapr: Add support for time base offset migration

So while VM is paused, the time is stopped. This allows to have
the same result with date (based on Time Base Register) and
hwclock (based on "get-time-of-day" RTAS call).

Moreover in TCG mode, the Time Base is always paused, so this
patch also adjust the behavior between TCG and KVM.

VM state field "time_of_the_day_ns" is now useless but we keep
it to be able to migrate to older version of the machine.

As vmstate_ppc_timebase structure (with timebase_pre_save() and
timebase_post_load() functions) was only used by vmstate_spapr,
we register the VM state change handler only in ppc_spapr_init().

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-01-31 10:10:14 +11:00
David Gibson 12dbeb16d0 ppc: Rewrite ppc_get_compat_smt_threads()
To continue consolidation of compatibility mode information, this rewrites
the ppc_get_compat_smt_threads() function using the table of compatiblity
modes in target-ppc/compat.c.

It's not a direct replacement, the new ppc_compat_max_threads() function
has simpler semantics - it just returns the number of threads the cpu
model has, taking into account any compatiblity mode it is in.

This no longer takes into account kvmppc_smt_threads() as the previous
version did.  That check wasn't useful because we check in
ppc_cpu_realizefn() that CPUs aren't instantiated with more threads
than kvm allows (or if we didn't things will already be broken and
this won't make it any worse).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-01-31 10:10:13 +11:00
David Gibson fa325e6cbf pseries: Add pseries-2.9 machine type
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2017-01-31 10:10:13 +11:00
Thomas Huth b99260ebbb hw/ppc/spapr: Fix boot path of usb-host storage devices
When passing through an USB storage device to a pseries guest, it
is currently not possible to automatically boot from the device
if the "bootindex" property has been specified, too (e.g. when using
"-device nec-usb-xhci -device usb-host,hostbus=1,hostaddr=2,bootindex=0"
at the command line). The problem is that QEMU builds a device tree path
like "/pci@800000020000000/usb@0/usb-host@1" and passes it to SLOF
in the /chosen/qemu,boot-list property. SLOF, however, probes the
USB device, recognizes that it is a storage device and thus changes
its name to "storage", and additionally adds a child node for the
SCSI LUN, so the correct boot path in SLOF is something like
"/pci@800000020000000/usb@0/storage@1/disk@101000000000000" instead.
So when we detect an USB mass storage device with SCSI interface,
we've got to adjust the firmware boot-device path properly that
SLOF can automatically boot from the device.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1354177
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-01-31 10:10:13 +11:00
Nicholas Piggin 1c7ad77e56 ppc/spapr: implement H_SIGNAL_SYS_RESET
The H_SIGNAL_SYS_RESET hcall allows a guest CPU to raise a system reset
exception on CPUs within the same guest -- all CPUs, all-but-self, or a
specific CPU (including self).

This has not made its way to a PAPR release yet, but we have an hcall
number assigned.

  H_SIGNAL_SYS_RESET = 0x380

  Syntax:
    hcall(uint64 H_SIGNAL_SYS_RESET, int64 target);

  Generate a system reset NMI on the threads indicated by target.

  Values for target:
    -1 = target all online threads including the caller
    -2 = target all online threads except for the caller
    All other negative values: reserved
    Positive values: The thread to be targeted, obtained from the value
    of the "ibm,ppc-interrupt-server#s" property of the CPU in the OF
    device tree.

  Semantics:
    - Invalid target: return H_Parameter.
    - Otherwise: Generate a system reset NMI on target thread(s),
      return H_Success.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-01-31 10:10:13 +11:00
David Gibson d6e166c082 ppc: Rename cpu_version to compat_pvr
The 'cpu_version' field in PowerPCCPU is badly named.  It's named after the
'cpu-version' device tree property where it is advertised, but that meaning
may not be obvious in most places it appears.

Worse, it doesn't even really correspond to that device tree property.  The
property contains either the processor's PVR, or, if the CPU is running in
a compatibility mode, a special "logical PVR" representing which mode.

Rename the cpu_version field, and a number of related variables to
compat_pvr to make this clearer.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2017-01-31 10:10:13 +11:00
David Gibson 1d1be34d26 ppc: Clean up and QOMify hypercall emulation
The pseries machine type is a bit unusual in that it runs a paravirtualized
guest.  The guest expects to interact with a hypervisor, and qemu
emulates the functions of that hypervisor directly, rather than executing
hypervisor code within the emulated system.

To implement this in TCG, we need to intercept hypercall instructions and
direct them to the machine's hypercall handlers, rather than attempting to
perform a privilege change within TCG.  This is controlled by a global
hook - cpu_ppc_hypercall.

This cleanup makes the handling a little cleaner and more extensible than
a single global variable.  Instead, each CPU to have hypercalls intercepted
has a pointer set to a QOM object implementing a new virtual hypervisor
interface.  A method in that interface is called by TCG when it sees a
hypercall instruction.  It's possible we may want to add other methods in
future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-01-31 10:10:13 +11:00
David Gibson 5b120785e7 pseries: Make cpu_update during CAS unconditional
spapr_h_cas_compose_response() includes a cpu_update parameter which
controls whether it includes updated information on the CPUs in the device
tree fragment returned from the ibm,client-architecture-support (CAS) call.

Providing the updated information is essential when CAS has negotiated
compatibility options which require different cpu information to be
presented to the guest.  However, it should be safe to provide in other
cases (it will just override the existing data in the device tree with
identical data).  This simplifies the code by removing the parameter and
always providing the cpu update information.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-01-31 10:10:13 +11:00
David Gibson 0c86d0fd92 pseries: Always use core objects for CPU construction
Currently the pseries machine has two paths for constructing CPUs.  On
newer machine type versions, which support cpu hotplug, it constructs
cpu core objects, which in turn construct CPU threads.  For older machine
versions it individually constructs the CPU threads.

This division is going to make some future changes to the cpu construction
harder, so this patch unifies them.  Now cpu core objects are always
created.  This requires some updates to allow core objects to be created
without a full complement of threads (since older versions allowed a
number of cpus not a multiple of the threads-per-core).  Likewise it needs
some changes to the cpu core hot/cold plug path so as not to choke on the
old machine types without hotplug support.

For good measure, we move the cpu construction to its own subfunction,
spapr_init_cpus().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2017-01-31 10:10:13 +11:00
Vincent Palatin b39466269b kvm: move cpu synchronization code
Move the generic cpu_synchronize_ functions to the common hw_accel.h header,
in order to prepare for the addition of a second hardware accelerator.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
Message-Id: <f5c3cffe8d520011df1c2e5437bb814989b48332.1484045952.git.vpalatin@chromium.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-19 22:07:46 +01:00
Michael Roth 5c0139a8c2 spapr: fix default DRC state for coldplugged LMBs
Currently we set the initial isolation/allocation state for DRCs
associated with coldplugged LMBs to ISOLATED/UNUSABLE,
respectively, under the assumption that the guest will move this
state to UNISOLATED/USABLE.

In fact, this is only the case for LMBs added via hotplug. For
coldplugged LMBs, the guest actually assumes the initial state to
be UNISOLATED/USABLE.

In practice, this only becomes an issue when we attempt to unplug
one of these LMBs, where the guest kernel will issue an
rtas-get-sensor-state call to check that the corresponding DRC is
in an USABLE state before it will release the LMB back to
QEMU. If the returned state is otherwise, the guest will assume no
further action is needed, which bypasses the QEMU-side cleanup that
occurs during the USABLE->UNUSABLE transition. This results in
LMBs and their corresponding pc-dimm devices to stick around
indefinitely.

This patch fixes the issue by manually setting DRCs associated with
cold-plugged LMBs to UNISOLATED/ALLOCATED, but leaving the hotplug
state untouched. As it turns out, this is analogous to the handling
for cold-plugged CPUs in spapr_core_plug().

Cc: qemu-ppc@nongnu.org
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-12-01 13:41:00 +11:00
David Gibson 5c4537bded spapr: Fix 2.7<->2.8 migration of PCI host bridge
daa2369 "spapr_pci: Add a 64-bit MMIO window" subtly broke migration
from qemu-2.7 to the current version.  It split the device's MMIO
window into two pieces for 32-bit and 64-bit MMIO.

The patch included backwards compatibility code to convert the old
property into the new format.  However, the property value was also
transferred in the migration stream and compared with a (probably
unwise) VMSTATE_EQUAL.  So, the "raw" value from 2.7 is compared to
the new style converted value from (pre-)2.8 giving a mismatch and
migration failure.

Along with the actual field that caused the breakage, there are
several other ill-advised VMSTATE_EQUAL()s.  To fix forwards
migration, we read the values in the stream into scratch variables and
ignore them, instead of comparing for equality.  To fix backwards
migration, we populate those scratch variables in pre_save() with
adjusted values to match the old behaviour.

To permit the eventual possibility of removing this cruft from the
stream, we only include these compatibility fields if a new
'pre-2.8-migration' property is set.  We clear it on the pseries-2.8
machine type, which obviously can't be migrated backwards, but set it
on earlier machine type versions.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-11-23 12:00:48 +11:00
David Gibson 146c11f16f target-ppc: Allow eventual removal of old migration mistakes
Until very recently, the vmstate for ppc cpus included some poorly
thought out VMSTATE_EQUAL() components, that can easily break
migration compatibility, and did so between qemu-2.6 and later
versions.  A hack was recently added which fixes this migration
breakage, but it leaves the unhelpful cruft of these fields in the
migration stream.

This patch adds a new cpu property allowing these fields to be removed
from the stream entirely.  For the pseries-2.8 machine type - which
comes after the fix - and for all non-pseries machine types - which
aren't mature enough to care about cross-version migration - we remove
the fields from the stream.

For pseries-2.7 and earlier, The migration hack remains in place,
allowing backwards and forwards migration with the older machine
types.

This restricts the migration compatibility cruft to older machine
types, and at least opens the possibility of eventually deprecating
and removing it entirely.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-11-23 12:00:48 +11:00
Michael Roth 62ef3760d4 spapr: migration support for CAS-negotiated option vectors
With the additional of the OV5_HP_EVT option vector, we now have
certain functionality (namely, memory unplug) that checks at run-time
for whether or not the guest negotiated the option via CAS. Because
we don't currently migrate these negotiated values, we are unable
to unplug memory from a guest after it's been migrated until after
the guest is rebooted and CAS-negotiation is repeated.

This patch fixes this by adding CAS-negotiated options to the
migration stream. We do this using a subsection, since the
negotiated value of OV5_HP_EVT is the only option currently needed
to maintain proper functionality for a running guest.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-11-23 12:00:48 +11:00
Peter Maydell 6bc56d317f Base patches for MTTCG enablement.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQExBAABCAAbBQJYF07FFBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D
 ppoIAI4AxWocso5WIUH6uEHjOAxw9ZNhZ92nF8VtcbvGtN/eh8Qk4jfRX+W/Jl0q
 D13Rm3m8ynNHqh8YFs+O6i/WSgxHGxKwb75mNr36HDnYnMFluTvRQkvYJUXRyRuL
 CVtNgy8+q8FbbWo+NiJ5I7gfk2Si4BQfZN0uCLqGuCwqvvA/spN13xUcpeBXEKhL
 TeDGZBT/atDnT2bRcve8E8g5/0RKjTL9EB0jwfJjHocT5bs+toPe6js9VnZDRNWN
 ZldcONgEHj3zAj9j7hTkVWFTGPSCx/tt6y6JeORq1oxk0mCCswEk0U9A3hLzLjc/
 94XHsLaEoZ7HNAKtkLc07NYhkQM=
 =+6Sj
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream-mttcg' into staging

Base patches for MTTCG enablement.

# gpg: Signature made Mon 31 Oct 2016 14:01:41 GMT
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream-mttcg:
  tcg: move locking for tb_invalidate_phys_page_range up
  *_run_on_cpu: introduce run_on_cpu_data type
  cpus: re-factor out handle_icount_deadline
  tcg: cpus rm tcg_exec_all()
  tcg: move tcg_exec_all and helpers above thread fn
  target-arm/arm-powerctl: wake up sleeping CPUs
  tcg: protect translation related stuff with tb_lock.
  translate-all: Add assert_(memory|tb)_lock annotations
  linux-user/elfload: ensure mmap_lock() held while setting up
  tcg: comment on which functions have to be called with tb_lock held
  cpu-exec: include cpu_index in CPU_LOG_EXEC messages
  translate-all: add DEBUG_LOCKING asserts
  translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH
  cpus: make all_vcpus_paused() return bool

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-31 15:29:12 +00:00
Paolo Bonzini 14e6fe12a7 *_run_on_cpu: introduce run_on_cpu_data type
This changes the *_run_on_cpu APIs (and helpers) to pass data in a
run_on_cpu_data type instead of a plain void *. This is because we
sometimes want to pass a target address (target_ulong) and this fails on
32 bit hosts emulating 64 bit guests.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20161027151030.20863-24-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-31 15:00:25 +01:00
Peter Maydell 277d44f5a6 trivial patches for 2016-10-28
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCAAGBQJYE2wfAAoJEHAbT2saaT5ZGYUH/3QWJ4OFWbqGo1YYN5AIAheF
 v1bQGTh1HGbLk46ajhUvzB0bMHb1FC1KoOruU2wFYuKK/J5zQ+4X9EmaC/fD7hyx
 nGTcPWAyxKOlqOq3In9ro+xWQNzEhfoypKCQQVC4Y3quzub48wAro8fuFSNXLyBq
 ERvAsjgj0TrLEHoWtJl2bPYiqSd6KAHZAKPFW3Jw8MmsBcTLmnF2PVW3LBfdcHe7
 6vlhqX7lPzVlHRaUsaxRkFxYd2YGisbe3bPRDw2fTxrtOYyEkopQq7xi2Q6Yq5N0
 z0yM2oJ7o1QtUOXYa7KBf03WZ7e119HimaUkGLg+0LVhQNbeG3hd3gNwApXa5og=
 =tYml
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mjt/tags/trivial-patches-fetch' into staging

trivial patches for 2016-10-28

# gpg: Signature made Fri 28 Oct 2016 16:17:51 BST
# gpg:                using RSA key 0x701B4F6B1A693E59
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D  4324 457C E0A0 8044 65C5
#      Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931  4B22 701B 4F6B 1A69 3E59

* remotes/mjt/tags/trivial-patches-fetch: (23 commits)
  Fix build for less common build directories names
  clean-up: removed duplicate #includes
  scripts/clean-includes: added duplicate #include check
  monitor: deprecate 'default' option
  qemu-ga: Remove stray 'q' in documentation
  Makefile: Fix help text for target 'installer'
  s390: avoid always-true comparison in s390_pci_generate_fid()
  migration: Remove unneeded NULL check from migrate_fd_error()
  scripts/hxtool: fix undefined behavour of echo
  qemu-options.hx: set: fix copy-paste error
  usb: Change *_exitfn return type from int to void
  MAINTAINERS: qemu-trivial information
  colo-compare: remove unused struct CompareChardevProps and 'props' variable
  milkymist-pfpu: fix potential integer overflow
  hw/block/nvme: Simplify if-statements a little bit
  target-lm32: rewrite gen_compare()
  lm32: milkymist-tmu2: fix integer overflow
  target-lm32: disable asm logging via LOG_DIS()
  target-lm32: swap operand of wcsr in LOG_DIS()
  target-lm32: fix LOG_DIS operand order
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-31 11:58:30 +00:00
Anand J 814bb12a56 clean-up: removed duplicate #includes
Some files contain multiple #includes of the same header file.
Removed most of those unnecessary duplicate entries using
scripts/clean-includes.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Anand J <anand.indukala@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-10-28 18:17:24 +03:00
Bharata B Rao cf63246319 spapr: Memory hot-unplug support
Add support to hot remove pc-dimm memory devices.

Since we're introducing a machine-level unplug_request hook, we also
had handling for CPU unplug there as well to ensure CPU unplug
continues to work as it did before.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
* add hooks to CAS/cmdline enablement of hotplug ACR support
* add hook for CPU unplug
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 11:17:35 +11:00
Michael Roth 79b78a6bd4 spapr: use count+index for memory hotplug
Commit 0a417869:

    spapr: Move memory hotplug to RTAS_LOG_V6_HP_ID_DRC_COUNT type

dropped per-DRC/per-LMB hotplugs event in favor of a bulk add via a
single LMB count value. This was to avoid overrunning the guest EPOW
event queue with hotplug events. This works fine, but relies on the
guest exhaustively scanning for pluggable LMBs to satisfy the
requested count by issuing rtas-get-sensor(DR_ENTITY_SENSE, ...) calls
until all the LMBs associated with the DIMM are identified.

With newer support for dedicated hotplug event source, this queue
exhaustion is no longer as much of an issue due to implementation
details on the guest side, but we still try to avoid excessive hotplug
events by now supporting both a count and a starting index to avoid
unecessary work. This patch makes use of that approach when the
capability is available.

Cc: bharata@linux.vnet.ibm.com
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 11:17:35 +11:00
Michael Roth f622921430 spapr: add hotplug interrupt machine options
This adds machine options of the form:

  -machine pseries,modern-hotplug-events=true
  -machine pseries,modern-hotplug-events=false

If false, QEMU will force the use of "legacy" style hotplug events,
which are surfaced through EPOW events instead of a dedicated
hot plug event source, and lack certain features necessary, mainly,
for memory unplug support.

If true, QEMU will enable support for "modern" dedicated hot plug
event source. Note that we will still default to "legacy" style unless
the guest advertises support for the "modern" hotplug events via
ibm,client-architecture-support hcall during early boot.

For pseries-2.7 and earlier we default to false, for newer machine
types we default to true.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 11:17:35 +11:00
Michael Roth ffbb1705a3 spapr_events: add support for dedicated hotplug event source
Hotplug events were previously delivered using an EPOW interrupt
and were queued by linux guests into a circular buffer. For traditional
EPOW events like shutdown/resets, this isn't an issue, but for hotplug
events there are cases where this buffer can be exhausted, resulting
in the loss of hotplug events, resets, etc.

Newer-style hotplug event are delivered using a dedicated event source.
We enable this in supported guests by adding standard an additional
event source in the guest device-tree via /event-sources, and, if
the guest advertises support for the newer-style hotplug events,
using the corresponding interrupt to signal the available of
hotplug/unplug events.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 11:17:35 +11:00
Michael Roth 417ece33fc spapr: improve ibm,architecture-vec-5 property handling
ibm,architecture-vec-5 is supposed to encode all option vector 5 bits
negotiated between platform/guest. Currently we hardcode this property
in the boot-time device tree to advertise a single negotiated
capability, "Form 1" NUMA Affinity, regardless of whether or not CAS
has been invoked or that capability has actually been negotiated.

Improve this by generating ibm,architecture-vec-5 based on the full
set of option vector 5 capabilities negotiated via CAS.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:38:26 +11:00
Michael Roth 6787d27b04 spapr: add option vector handling in CAS-generated resets
In some cases, ibm,client-architecture-support calls can fail. This
could happen in the current code for situations where the modified
device tree segment exceeds the buffer size provided by the guest
via the call parameters. In these cases, QEMU will reset, allowing
an opportunity to regenerate the device tree from scratch via
boot-time handling. There are potentially other scenarios as well,
not currently reachable in the current code, but possible in theory,
such as cases where device-tree properties or nodes need to be removed.

We currently don't handle either of these properly for option vector
capabilities however. Instead of carrying the negotiated capability
beyond the reset and creating the boot-time device tree accordingly,
we start from scratch, generating the same boot-time device tree as we
did prior to the CAS-generated and the same device tree updates as we
did before. This could (in theory) cause us to get stuck in a reset
loop. This hasn't been observed, but depending on the extensiveness
of CAS-induced device tree updates in the future, could eventually
become an issue.

Address this by pulling capability-related device tree
updates resulting from CAS calls into a common routine,
spapr_dt_cas_updates(), and adding an sPAPROptionVector*
parameter that allows us to test for newly-negotiated capabilities.
We invoke it as follows:

1) When ibm,client-architecture-support gets called, we
   call spapr_dt_cas_updates() with the set of capabilities
   added since the previous call to ibm,client-architecture-support.
   For the initial boot, or a system reset generated by something
   other than the CAS call itself, this set will consist of *all*
   options supported both the platform and the guest. For calls
   to ibm,client-architecture-support immediately after a CAS-induced
   reset, we call spapr_dt_cas_updates() with only the set
   of capabilities added since the previous call, since the other
   capabilities will have already been addressed by the boot-time
   device-tree this time around. In the unlikely event that
   capabilities are *removed* since the previous CAS, we will
   generate a CAS-induced reset. In the unlikely event that we
   cannot fit the device-tree updates into the buffer provided
   by the guest, well generate a CAS-induced reset.

2) When a CAS update results in the need to reset the machine and
   include the updates in the boot-time device tree, we call the
   spapr_dt_cas_updates() using the full set of negotiated
   capabilities as part of the reset path. At initial boot, or after
   a reset generated by something other than the CAS call itself,
   this set will be empty, resulting in what should be the same
   boot-time device-tree as we generated prior to this patch. For
   CAS-induced reset, this routine will be called with the full set of
   capabilities negotiated by the platform/guest in the previous
   CAS call, which should result in CAS updates from previous call
   being accounted for in the initial boot-time device tree.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Changed an int -> bool conversion to be more explicit]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:38:26 +11:00
Michael Roth facdb8b63b spapr_hcall: use spapr_ovec_* interfaces for CAS options
Currently we access individual bytes of an option vector via
ldub_phys() to test for the presence of a particular capability
within that byte. Currently this is only done for the "dynamic
reconfiguration memory" capability bit. If that bit is present,
we pass a boolean value to spapr_h_cas_compose_response()
to generate a modified device tree segment with the additional
properties required to enable this functionality.

As more capability bits are added, will would need to modify the
code to add additional option vector accesses and extend the
param list for spapr_h_cas_compose_response() to include similar
boolean values for these parameters.

Avoid this by switching to spapr_ovec_* helpers so we can do all
the parsing in one shot and then test for these additional bits
within spapr_h_cas_compose_response() directly.

Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28 09:38:26 +11:00
David Gibson 398a0bd5ae pseries: Remove spapr_create_fdt_skel()
For historical reasons construction of the guest device tree in spapr is
divided between spapr_create_fdt_skel() which is called at init time, and
spapr_build_fdt() which runs at reset time.  Over time, more and more
things have needed to be moved to reset time.

Previous cleanups mean the only things left in spapr_create_fdt_skel() are
the properties of the root node itself.  Finish consolidating these two
parts of device tree construction, by moving this to the start of
spapr_build_fdt(), and removing spapr_create_fdt_skel() entirely.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:26 +11:00
David Gibson bf5a6696ba pseries: Consolidate construction of /vdevice device tree node
Construction of the /vdevice node (and its children) is divided between
spapr_create_fdt_skel() (at init time), which creates the base node, and
spapr_populate_vdevice() (at reset time) which creates the nodes for each
individual virtual device.

This consolidates both into a single function called from
spapr_build_fdt().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:26 +11:00
David Gibson fca5f2dc6c pseries: Move /hypervisor node construction to fdt_build_fdt()
Currently the /hypervisor device tree node is constructed in
spapr_create_fdt_skel().  As part of consolidating device tree construction
to reset time, move it to a function called from spapr_build_fdt().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:26 +11:00
David Gibson ffb1e275a6 pseries: Move /event-sources construction to spapr_build_fdt()
The /event-sources device tree node is built from spapr_create_fdt_skel().
As part of consolidating device tree construction to reset time, this moves
it to spapr_build_fdt().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:26 +11:00
David Gibson 3f5dabceba pseries: Consolidate construction of /rtas device tree node
For historical reasons construction of the /rtas node in the device
tree (amongst others) is split into several places.  In particular
it's split between spapr_create_fdt_skel(), spapr_build_fdt() and
spapr_rtas_device_tree_setup().

In fact, as well as adding the actual RTAS tokens to the device tree,
spapr_rtas_device_tree_setup() just adds the ibm,lrdr-capacity
property, which despite going in the /rtas node, doesn't have a lot to
do with RTAS.

This patch consolidates the code constructing /rtas together into a new
spapr_dt_rtas() function.  spapr_rtas_device_tree_setup() is renamed to
spapr_dt_rtas_tokens() and now only adds the token properties.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:26 +11:00
David Gibson 7c866c6a60 pseries: Consolidate construction of /chosen device tree node
For historical reasons, building the /chosen node in the guest device tree
is split across several places and includes both parts which write the DT
sequentially and others which use random access functions.

This patch consolidates construction of the node into one place, using
random access functions throughout.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:26 +11:00
David Gibson 9b9a19080a pseries: Move construction of /interrupt-controller fdt node
Currently the device tree node for the XICS interrupt controller is in
spapr_create_fdt_skel().  As part of consolidating device tree construction
to reset time, this moves it to a function called from spapr_build_fdt().

In addition we move the actual code into hw/intc/xics_spapr.c with the
rest of the PAPR specific interrupt controller code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:26 +11:00
David Gibson 2cac78c12a pseries: Consolidate RTAS loading
At each system reset, the pseries machine needs to load RTAS (the runtime
portion of the guest firmware) into the VM.  This means copying
the actual RTAS code into guest memory, and also updating the device
tree so that the guest OS and boot firmware can locate it.

For historical reasons the copy and update to the device tree were in
different parts of the code.  This cleanup brings them both together in
an spapr_load_rtas() function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:26 +11:00
David Gibson cf6e522390 pseries: Move adding of fdt reserve map entries
The flattened device tree passed to pseries guests contains a list of
reserved memory areas.  Currently we construct this list early in
spapr_create_fdt_skel() as we sequentially write the fdt.

This will be inconvenient for upcoming cleanups, so this patch moves
the reserve map changes to the end of fdt construction.  This changes
fdt_add_reservemap_entry() calls - which work when writing the fdt
sequentially to fdt_add_mem_rsv() calls used when altering the fdt in
random access mode.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:25 +11:00
David Gibson a19f7fb045 pseries: Make spapr_create_fdt_skel() get information from machine state
Currently spapr_create_fdt_skel() takes a bunch of individual parameters
for various things it will put in the device tree.  Some of these can
already be taken directly from sPAPRMachineState.  This patch alters it so
that all of them can be taken from there, which will allow this code to
be moved away from its current caller in future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:25 +11:00
David Gibson cae172ab6d pseries: Remove rtas_addr and fdt_addr fields from machinestate
These values are used only within ppc_spapr_reset(), so just change them
to local variables.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28 09:38:25 +11:00
David Gibson 997b6cfc3d pseries: Split device tree construction from device tree load
spapr_finalize_fdt() both finishes building the device tree for the guest
and loads it into guest memory.  For future cleanups, it's going to be
more convenient to do these two things separately.  The loading portion is
pretty trivial, so we move it inline into the caller, ppc_spapr_reset().

We also rename spapr_finalize_fdt(), because the current name is going to
become inaccurate.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-10-28 09:38:25 +11:00
Igor Mammedov 079019f2e3 Increase MAX_CPUMASK_BITS from 255 to 288
so that it would be possible to increase maxcpus limit
for x86 target. Keep spapr/virt_arm at limit they used
to have 255.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-24 17:29:15 -02:00
David Gibson 357d1e3bc7 spapr: Improved placement of PCI host bridges in guest memory map
Currently, the MMIO space for accessing PCI on pseries guests begins at
1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
chunk of address space in which it places its outbound PIO and 32-bit and
64-bit MMIO windows.

This scheme as several problems:
  - It limits guest RAM to 1 TiB (though we have a limited fix for this
    now)
  - It limits the total MMIO window to 64 GiB.  This is not always enough
    for some of the large nVidia GPGPU cards
  - Putting all the windows into a single 64 GiB area means that naturally
    aligning things within there will waste more address space.
In addition there was a miscalculation in some of the defaults, which meant
that the MMIO windows for each PHB actually slightly overran the 64 GiB
region for that PHB.  We got away without nasty consequences because
the overrun fit within an unused area at the beginning of the next PHB's
region, but it's not pretty.

This patch implements a new scheme which addresses those problems, and is
also closer to what bare metal hardware and pHyp guests generally use.

Because some guest versions (including most current distro kernels) can't
access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
64 TiB.  This is broken into 1 TiB chunks.  The first 1 TiB contains the
PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for
one PHB each.

This reduces the number of allowed PHBs (without full manual configuration
of all the windows) from 256 to 31, but this should still be plenty in
practice.

We also change some of the default window sizes for manually configured
PHBs to saner values.

Finally we adjust some tests and libqos so that it correctly uses the new
default locations.  Ideally it would parse the device tree given to the
guest, but that's a more complex problem for another time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16 12:04:15 +11:00
David Gibson daa2369903 spapr_pci: Add a 64-bit MMIO window
On real hardware, and under pHyp, the PCI host bridges on Power machines
typically advertise two outbound MMIO windows from the guest's physical
memory space to PCI memory space:
  - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
  - A 64-bit window which maps onto a large region somewhere high in PCI
    address space (traditionally this used an identity mapping from guest
    physical address to PCI address, but that's not always the case)

The qemu implementation in spapr-pci-host-bridge, however, only supports a
single outbound MMIO window, however.  At least some Linux versions expect
the two windows however, so we arranged this window to map onto the PCI
memory space from 2 GiB..~64 GiB, then advertised it as two contiguous
windows, the "32-bit" window from 2G..4G and the "64-bit" window from
4G..~64G.

This approach means, however, that the 64G window is not naturally aligned.
In turn this limits the size of the largest BAR we can map (which does have
to be naturally aligned) to roughly half of the total window.  With some
large nVidia GPGPU cards which have huge memory BARs, this is starting to
be a problem.

This patch adds true support for separate 32-bit and 64-bit outbound MMIO
windows to the spapr-pci-host-bridge implementation, each of which can
be independently configured.  The 32-bit window always maps to 2G.. in PCI
space, but the PCI address of the 64-bit window can be configured (it
defaults to the same as the guest physical address).

So as not to break possible existing configurations, as long as a 64-bit
window is not specified, a large single window can be specified.  This
will appear the same way to the guest as the old approach, although it's
now implemented by two contiguous memory regions rather than a single one.

For now, this only adds the possibility of 64-bit windows.  The default
configuration still uses the legacy mode.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16 12:03:09 +11:00
David Gibson 2efff1c0dd spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM
Currently the default PCI host bridge for the 'pseries' machine type is
constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
guest memory space.  This means that if > 1TiB of guest RAM is specified,
the RAM will collide with the PCI IO windows, causing serious problems.

Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because
there's a little unused space at the bottom of the area reserved for PCI,
but essentially this means that > 1TiB of RAM has never worked with the
pseries machine type.

This patch fixes this by altering the placement of PHBs on large-RAM VMs.
Instead of always placing the first PHB at 1TiB, it is placed at the next
1 TiB boundary after the maximum RAM address.

Technically, this changes behaviour in a migration-breaking way for
existing machines with > 1TiB maximum memory, but since having > 1 TiB
memory was broken anyway, this seems like a reasonable trade-off.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16 12:03:09 +11:00
David Gibson 6737d9ad79 spapr_pci: Delegate placement of PCI host bridges to machine type
The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB)
for a PAPR guest.  Unlike on x86, it's routine on Power (both bare metal
and PAPR guests) to have numerous independent PHBs, each controlling a
separate PCI domain.

There are two ways of configuring the spapr-pci-host-bridge device: first
it can be done fully manually, specifying the locations and sizes of all
the IO windows.  This gives the most control, but is very awkward with 6
mandatory parameters.  Alternatively just an "index" can be specified
which essentially selects from an array of predefined PHB locations.
The PHB at index 0 is automatically created as the default PHB.

The current set of default locations causes some problems for guests with
large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia
GPGPU cards via VFIO).  Obviously, for migration we can only change the
locations on a new machine type, however.

This is awkward, because the placement is currently decided within the
spapr-pci-host-bridge code, so it breaks abstraction to look inside the
machine type version.

So, this patch delegates the "default mode" PHB placement from the
spapr-pci-host-bridge device back to the machine type via a public method
in sPAPRMachineClass.  It's still a bit ugly, but it's about the best we
can do.

For now, this just changes where the calculation is done.  It doesn't
change the actual location of the host bridges, or any other behaviour.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16 12:03:09 +11:00
Michael Roth 672de881e9 spapr: fix inheritance chain for default machine options
Rather than machine instances having backward-compatible option
defaults that need to be repeatedly re-enabled for every new machine
type we introduce, we set the defaults appropriate for newer machine
types, then add code to explicitly disable instance options as needed
to maintain compatibility with older machine types.

Currently pseries-2.5 does not inherit from pseries-2.6 in this
fashion, which is okay at the moment since we do not have any
instance compatibility options for pseries-2.6+ currently.

We will make use of this in future patches though, so fix it here.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[dwg: Extended to make 2.7 inherit from 2.8 as well]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14 15:33:32 +11:00
Thomas Huth 3daa4a9f95 hw/ppc/spapr: Use POWER8 by default for the pseries-2.8 machine
A couple of distributors are compiling their distributions
with "-mcpu=power8" for ppc64le these days, so the user sooner
or later runs into a crash there when not explicitely specifying
the "-cpu POWER8" option to QEMU (which is currently using POWER7
for the "pseries" machine by default). Due to this reason, the
linux-user target already switched to POWER8 a while ago (see commit
de3f1b9841). Since the softmmu target
of course has the same problem, we should switch there to POWER8 for
the newer machine types, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-06 16:15:53 +11:00
Thomas Huth bac3bf287a ppc: Check the availability of transactional memory
KVM-PR currently does not support transactional memory, and the
implementation in TCG is just a fake. We should not announce TM
support in the ibm,pa-features property when running on such a
system, so disable it by default and only enable it if the KVM
implementation supports it (i.e. recent versions of KVM-HV).
These changes are based on some earlier work from Anton Blanchard
(thanks!).

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-05 11:05:28 +11:00
Thomas Huth 4cbec30d76 hw/ppc/spapr: Fix the selection of the processor features
The current code uses pa_features_206 for POWERPC_MMU_2_06, and
for everything else, it uses pa_features_207. This is bad in some
cases because there is also a "degraded" MMU version of ISA 2.06,
called POWERPC_MMU_2_06a, which should of course use the flags for
2.06 instead. And there is also the possibility that the user runs
the pseries machine with a POWER5+ or even 970 processor. In that
case we certainly do not want to set the flags for 2.07, and rather
simply skip the setting of the pa-features property instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-05 11:05:28 +11:00
Thomas Huth 230bf719d3 hw/ppc/spapr: Move code related to "ibm,pa-features" to a separate function
The function spapr_populate_cpu_dt() has become quite big
already, and since we likely have to extend the pa-features
property for every new processor generation, it is nicer
if we put the related code into a separate function.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-05 11:05:28 +11:00
David Gibson db800b21d8 pseries: Add 2.8 machine type, set up compatibility macros
Now that 2.7 is released, create the pseries-2.8 machine type and add the
boilerplate compatiblity macro stuff.  There's nothing new to put into the
2.7 compatiliby properties yet, but we'll need something eventually, so
we might as well get it ready now.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-05 11:05:28 +11:00
Peter Maydell c640f2849e * thread-safe tb_flush (Fred, Alex, Sergey, me, Richard, Emilio,... :-)
* license clarification for compiler.h (Felipe)
 * glib cflags improvement (Marc-André)
 * checkpatch silencing (Paolo)
 * SMRAM migration fix (Paolo)
 * Replay improvements (Pavel)
 * IOMMU notifier improvements (Peter)
 * IOAPIC now defaults to version 0x20 (Peter)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQExBAABCAAbBQJX6kKUFBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D
 M1UIAKCQ7XfWDoClYd1TyGZ+Qj3K3TrjwLDIl/Z258euyeZ9p7PpqYQ64OCRsREJ
 fsGQOqkFYDe7gi4epJiJOuu4oAW7Xu8G6lB2RfBd7KWVMhsl3Che9AEom7amzyzh
 yoN+g9gwKfAmYwpKyjYWnlWOSjUvif6o0DaTCQCMTaAoEM3b4HKdgHfr6A2dA/E/
 47rtIVp/jNExmrZkaOjnCDS1DJ8XYT3aVeoTkuzRFQ3DBzrAiPABn6B4ExP8IBcJ
 YLFX/W8xG7F3qyXbKQOV/uYM25A55WS5B0G94ZfSlDtUGa/avzS7df9DFD/IWQT+
 RpfiyDdeJueByiTw9R0ZYxFjhd8=
 =g7xm
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* thread-safe tb_flush (Fred, Alex, Sergey, me, Richard, Emilio,... :-)
* license clarification for compiler.h (Felipe)
* glib cflags improvement (Marc-André)
* checkpatch silencing (Paolo)
* SMRAM migration fix (Paolo)
* Replay improvements (Pavel)
* IOMMU notifier improvements (Peter)
* IOAPIC now defaults to version 0x20 (Peter)

# gpg: Signature made Tue 27 Sep 2016 10:57:40 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (28 commits)
  replay: allow replay stopping and restarting
  replay: vmstate for replay module
  replay: move internal data to the structure
  cpus-common: lock-free fast path for cpu_exec_start/end
  tcg: Make tb_flush() thread safe
  cpus-common: Introduce async_safe_run_on_cpu()
  cpus-common: simplify locking for start_exclusive/end_exclusive
  cpus-common: remove redundant call to exclusive_idle()
  cpus-common: always defer async_run_on_cpu work items
  docs: include formal model for TCG exclusive sections
  cpus-common: move exclusive work infrastructure from linux-user
  cpus-common: fix uninitialized variable use in run_on_cpu
  cpus-common: move CPU work item management to common code
  cpus-common: move CPU list management to common code
  linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick()
  linux-user: Use QemuMutex and QemuCond
  cpus: Rename flush_queued_work()
  cpus: Move common code out of {async_, }run_on_cpu()
  cpus: pass CPUState to run_on_cpu helpers
  build-sys: put glib_cflags in QEMU_CFLAGS
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-28 23:02:56 +01:00
David Gibson 4f01a63779 sysbus: Remove ignored return value of FindSysbusDeviceFunc
Functions of type FindSysbusDeviceFunc currently return an integer.
However, this return value is always ignored by the caller in
find_sysbus_device().

This changes the function type to return void, to avoid confusion over
the function semantics.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27 17:03:34 -03:00
Alex Bennée e0eeb4a21a cpus: pass CPUState to run_on_cpu helpers
CPUState is a fairly common pointer to pass to these helpers. This means
if you need other arguments for the async_run_on_cpu case you end up
having to do a g_malloc to stuff additional data into the routine. For
the current users this isn't a massive deal but for MTTCG this gets
cumbersome when the only other parameter is often an address.

This adds the typedef run_on_cpu_func for helper functions which has an
explicit CPUState * passed as the first parameter. All the users of
run_on_cpu and async_run_on_cpu have had their helpers updated to use
CPUState where available.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[Sergey Fedorov:
 - eliminate more CPUState in user data;
 - remove unnecessary user data passing;
 - fix target-s390x/kvm.c and target-s390x/misc_helper.c]
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts)
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> (s390 parts)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1470158864-17651-3-git-send-email-alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-27 11:57:29 +02:00
Peter Maydell c229472af0 ppc patch queue 2016-09-23
This pull request supersedes ppc-for-2.8-20160922.  There was a clang
 build error in that, and I've also added one extra patch in the new pull.
 
 Included in this set of ppc and spapr patches are:
     * TCG implementations for more POWER9 instructions
     * Some preliminary XICS fixes in preparataion for the pnv machine type
     * A significant ADB (Macintosh kbd/mouse) cleanup
     * Some conversions to use trace instead of debug macros
     * Fixes to correctly handle global TLB flush synchronization in
       TCG.  This is already a bug, but it will have much more impact
       when we get MTTCG
     * Add more qtest testcases for Power
     * Some MAINTAINERS updates
     * Assorted bugfixes
     * Add the basics of NUMA associativity to the spapr PCI host bridge
 
 This touches some test files and monitor.c which are technically
 outside the ppc code, but coming through this tree because the changes
 are primarily of interest to ppc.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJX5NZnAAoJEGw4ysog2bOSoLEP/1YpRFG/6gmiT+T+Btz1QYcd
 eqrJkV63/rY/lvgZOvUBdqA/YKaBSWDOEByNFRZ+Grqz9h5zKrRcmM7IWdRWg+vG
 gyrZUm1pscFG20iGNcenxB8mD0VMk7C77gnUlv12bo+mK+1D1i8eUfKLFqxb0kOx
 JGIRQNG5orF5vZxsyjRPVpvMS9gNG90vrPIypux4ryozCVMWbrjXRZNsPQKz8wb9
 UGcJIFB6R6JVbmBGchi434PEJkcdZzP/a0HvVSO51oGsFBnwYwQ7XVc3PyA4KCD7
 tTbm6T2Rpdak3Pcd/nuzoXCMBCkh48XGKxZ+yPuLXGG5ZGIZ6rzlHPqBsEqqiLz5
 DLzbsxKyLHX2Af87js4J9OXkoNQI4rVGurvNbkQ7IMQ2/Xt97kgUEgr3W0Vj+r82
 bqIqWm4OdJ9cDzTGVlQ7l2vLv6RMe7DrkeWRNEKZZgfir7Hgj1gr79BOe96ETKBd
 7r/1z0fBkZoWSq2OdjX8RouXMwd1Nq3FnqYv2BQ99rvM/AqpkY0HYsPIfUilHq6T
 ZXhvm/4LIEev0F/GiJvV5jHHg637QS4QqdyglF8ODC8vSMvOThhL9Gj7EMgJs7hj
 Ywt1B5y88//Zq4+IGVda98J5ynOZO1CArvzoYR5UMnWiq2K0Lxpq7wemE/finyIK
 0jWLqlmCmYRzsS+oQEg/
 =et1C
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.8-20160923' into staging

ppc patch queue 2016-09-23

This pull request supersedes ppc-for-2.8-20160922.  There was a clang
build error in that, and I've also added one extra patch in the new pull.

Included in this set of ppc and spapr patches are:
    * TCG implementations for more POWER9 instructions
    * Some preliminary XICS fixes in preparataion for the pnv machine type
    * A significant ADB (Macintosh kbd/mouse) cleanup
    * Some conversions to use trace instead of debug macros
    * Fixes to correctly handle global TLB flush synchronization in
      TCG.  This is already a bug, but it will have much more impact
      when we get MTTCG
    * Add more qtest testcases for Power
    * Some MAINTAINERS updates
    * Assorted bugfixes
    * Add the basics of NUMA associativity to the spapr PCI host bridge

This touches some test files and monitor.c which are technically
outside the ppc code, but coming through this tree because the changes
are primarily of interest to ppc.

# gpg: Signature made Fri 23 Sep 2016 08:14:47 BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.8-20160923: (45 commits)
  spapr_pci: Add numa node id
  monitor: fix crash for platforms without a CPU 0
  linux-user: ppc64: fix ARCH_206 bit in AT_HWCAP
  ppc/kvm: Mark 64kB page size support as disabled if not available
  ppc/xics: An ICS with offset 0 is assumed to be uninitialized
  ppc/xics: account correct irq status
  Enable H_CLEAR_MOD and H_CLEAR_REF hypercalls on KVM/PPC64.
  target-ppc: tlbie/tlbivax should have global effect
  target-ppc: add flag in check_tlb_flush()
  target-ppc: add TLB_NEED_LOCAL_FLUSH flag
  spapr: Introduce sPAPRCPUCoreClass
  target-ppc: implement darn instruction
  target-ppc: add stxsi[bh]x instruction
  target-ppc: add lxsi[bw]zx instruction
  target-ppc: add xxspltib instruction
  target-ppc: consolidate store conditional
  target-ppc: move out stqcx impementation
  target-ppc: consolidate load with reservation
  target-ppc: convert st[16,32,64]r to use new macro
  target-ppc: convert st64 to use new macro
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-23 14:26:12 +01:00
Fam Zheng 9c5ce8db2e vl: Switch qemu_uuid to QemuUUID
Update all qemu_uuid users as well, especially get rid of the duplicated
low level g_strdup_printf, sscanf and snprintf calls with QEMU UUID API.

Since qemu_uuid_parse is quite tangled with qemu_uuid, its switching to
QemuUUID is done here too to keep everything in sync and avoid code
churn.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-Id: <1474432046-325-10-git-send-email-famz@redhat.com>
2016-09-23 11:42:52 +08:00
Nathan Whitehorn 5145ad4fad Enable H_CLEAR_MOD and H_CLEAR_REF hypercalls on KVM/PPC64.
These are mandatory per PAPR and available on Linux 4.3 and newer kernels. The calls in question are required to run FreeBSD guests with reasonable performance, so enable them if possible.

Signed-off-by: Nathan Whitehorn <nwhitehorn@freebsd.org>
[dwg: Added a stub to fix compile without KVM (e.g. on x86 host)]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-23 12:39:07 +10:00
Bharata B Rao 7ebaf79556 spapr: Introduce sPAPRCPUCoreClass
Each spapr cpu core type defines an instance_init routine which just
populates the CPU class name. This can be done in the class_init
commonly for all core types which simplifies the registration.
This is inspired by how PowerNV core types are registered.

Certain types of spapr cpu cores ('host' and generic type based on host
CPU) are initialized in target-ppc/kvm.c. To convert these type
registrations to use class_init, we need to expose
spapr_cpu_core_class_init() outside of spapr_cpu_core.c.

Commit d11b268e17 added a generic sPAPR CPU core family
type to support cases like POWER8 CPU type on POWER8E host CPU.
Switching to class_init would fix such scenarios to use the right
CPU thread type instead of defaulting to host-powerpc64-cpu.

In an unrelated cleanup, fix a typo in .get_hotplug_handler routine.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-23 12:39:06 +10:00
Cédric Le Goater 3654fa95bc hw/ppc: add a ppc_create_page_sizes_prop() helper routine
The exact same routine will be used in PowerNV.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-07 12:40:12 +10:00
Cédric Le Goater ce9863b797 hw/ppc: use error_report instead of fprintf
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-07 12:40:12 +10:00
Cédric Le Goater 7804c353a9 hw/ppc: include fdt helper routine in a common file
spapr_pci would also be a good candidate but the macro _FDT is
slightly different. It returns and does not exit.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-07 09:52:14 +10:00
Greg Kurz e703d2f71c ppc: parse cpu features once
Considering that features are converted to global properties and
global properties are automatically applied to every new instance
of created CPU (at object_new() time), there is no point in
parsing cpu_model string every time a CPU created. So move
parsing outside CPU creation loop and do it only once.

Parsing also should be done before any CPU is created so that
features would affect the first CPU a well.

This patch does that for all PowerPC machine types.

It is based on previous work from Bharata:

https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg07564.html

Signed-off-by: Greg Kurz <groug@kaod.org>
[clg: only kept the fix for the spapr platform. support for other
      platform will be added in 2.8 ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-08-13 17:32:58 +10:00
Thomas Huth 4babfaf05d hw/ppc/spapr: Look up CPU alias names instead of hard-coding the aliases
Hard-coding the CPU alias names in the spapr_cores[] array has
two big disadvantages:

1) We register a real type with the CPU alias name in
   spapr_cpu_core_register_types() - this prevents us from registering
   a CPU family name in kvm_ppc_register_host_cpu_type() with the same
   name (as we do it for the non-hotpluggable CPU types).

2) It's quite cumbersome to maintain the aliases here in sync with the
   ppc_cpu_aliases list from target-ppc/cpu-models.c.

So let's simply add proper alias lookup to the spapr cpu core code,
too (by checking whether the given model can be used directly, and
if not by trying to look up the given model as an alias name instead).

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-08-10 13:12:20 +10:00
Cédric Le Goater caebf37859 spapr: remove extra type variable
The sPAPR CPU core typename is already available in the upper
block. Let's use it and move the check upward also.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-08-10 13:12:20 +10:00
David Gibson 3c0c47e346 spapr: Correctly set query_hotpluggable_cpus hook based on machine version
Prior to c8721d3 "spapr: Error out when CPU hotplug is attempted on older
pseries machines", attempting to use query-hotpluggable-cpus on pseries-2.6
and earlier machine types would SEGV.

That change fixed that, but due to some unexpected interactions in init
order and a brown-paper-bag worthy failure to test, it accidentally
disabled query-hotpluggable-cpus for all pseries machine types, including
the current one which should allow it.

In fact, query_hotpluggable_cpus needs to be non-NULL when and only when
the dr_cpu_enabled flag in sPAPRMachineClass is set, which makes
dr_cpu_enabled itself redundant.

This patch removes dr_cpu_enabled, instead directly setting
query_hotpluggable_cpus from the machine class_init functions, and using
that to determine the availability of CPU hotplug when necessary.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-08-08 09:45:03 +10:00
Bharata B Rao c8721d3599 spapr: Error out when CPU hotplug is attempted on older pseries machines
CPU hotplug and coldplug aren't supported prior to pseries-2.7.  Further,
earlier machine types don't use CPU core objects at all.  These mean that
query-hotpluggable-cpus and coldplug on older pseries machines will crash
QEMU.  It also means that hotpluggable_cpus flag in query-machines will
be incorrectly set to true for pseries < 2.7, since it is based on the
presence of the query_hotpluggable_cpus hook.

- Don't assign the query_hotpluggable_cpus hook for pseries < 2.7
- query_hotpluggable_cpus should therefore never be called on pseries <
  2.7, so add an assert
- spapr_core_pre_plug() should fail hot/cold plug attempts for pseries <
  2.7, since core objects are never used there
- spapr_core_plug() should therefore never be called for pseries < 2.7, so
  add an assert.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
[dwg: Change from query_hotpluggable_cpus returning NULL for pseries < 2.7
 to not being called at all, reword commit message for accuracy]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-08-03 13:08:54 +10:00
Greg Kurz 12bf2d33fe spapr: disintricate core-id from DT semantics
The goal of this patch is to have a stable core-id which does not depend
on any DT related semantics, which involve non-obvious computations on
modern PowerPC server cpus.

With this patch, the DT core id is computed on-demand as:

       (core-id / smp_threads) * smt

where smt is the number of threads per core in the host.

This formula should be consolidated in a helper since it is needed in
several places.

Other uses for core-id includes: compute a stable cpu_index (which
allows random order hotplug/unplug without breaking migration) and
NUMA.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-25 15:43:41 +10:00
Thomas Huth c573fc03da hw/ppc/spapr: Make sure to close the htab_fd when migration is canceled
When canceling a migration process, we currently do not close the
HTAB migration file descriptor since htab_save_complete() is never
called in that case. So we leave the migration process with a
dangling htab_fd value around, and this causes any further migration
attempts to fail. To fix this issue, simply make sure that the
htab_fd is closed during the migration cleanup stage. And since the
cleanup() function is also called when migration succeeds, we can
also remove the call to close_htab_fd() from the htab_save_complete()
function.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1354341
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-25 10:19:30 +10:00
Benjamin Herrenschmidt 912acdf487 ppc/hash64: Add proper real mode translation support
This adds proper support for translating real mode addresses based
on the combination of HV and LPCR bits. This handles HRMOR offset
for hypervisor real mode, and both RMA and VRMA modes for guest
real mode. PAPR mode adjusts the offsets appropriately to match the
RMA used in TCG, but we need to limit to the max supported by the
implementation (16G).

This includes some fixes by Cédric Le Goater <clg@kaod.org>

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[dwg: Adjusted for differences in my version of the prereq patches]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-05 14:31:08 +10:00
Alexey Kardashevskiy ae4de14cd3 spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)
This adds support for Dynamic DMA Windows (DDW) option defined by
the SPAPR specification which allows to have additional DMA window(s)

The "ddw" property is enabled by default on a PHB but for compatibility
the pseries-2.6 machine and older disable it.
This also creates a single DMA window for the older machines to
maintain backward migration.

This implements DDW for PHB with emulated and VFIO devices. The host
kernel support is required. The advertised IOMMU page sizes are 4K and
64K; 16M pages are supported but not advertised by default, in order to
enable them, the user has to specify "pgsz" property for PHB and
enable huge pages for RAM.

The existing linux guests try creating one additional huge DMA window
with 64K or 16MB pages and map the entire guest RAM to. If succeeded,
the guest switches to dma_direct_ops and never calls TCE hypercalls
(H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
and not waste time on map/unmap later. This adds a "dma64_win_addr"
property which is a bus address for the 64bit window and by default
set to 0x800.0000.0000.0000 as this is what the modern POWER8 hardware
uses and this allows having emulated and VFIO devices on the same bus.

This adds 4 RTAS handlers:
* ibm,query-pe-dma-window
* ibm,create-pe-dma-window
* ibm,remove-pe-dma-window
* ibm,reset-pe-dma-window
These are registered from type_init() callback.

These RTAS handlers are implemented in a separate file to avoid polluting
spapr_iommu.c with PCI.

This changes sPAPRPHBState::dma_liobn to an array to allow 2 LIOBNs
and updates all references to dma_liobn. However this does not add
64bit LIOBN to the migration stream as in fact even 32bit LIOBN is
rather pointless there (as it is a PHB property and the management
software can/should pass LIOBNs via CLI) but we keep it for the backward
migration support.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-05 14:31:08 +10:00
Benjamin Herrenschmidt 27f2458245 ppc/xics: Replace "icp" with "xics" in most places
The "ICP" is a different object than the "XICS". For historical reasons,
we have a number of places where we name a variable "icp" while it contains
a XICSState pointer. There *is* an ICPState structure too so this makes
the code really confusing.

This is a mechanical replacement of all those instances to use the name
"xics" instead. There should be no functional change.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[spapr_cpu_init has been moved to spapr_cpu_core.c, change there]
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-01 13:41:47 +10:00
Benjamin Herrenschmidt 161deaf225 ppc/xics: Rename existing xics to xics_spapr
The common class doesn't change, the KVM one is sPAPR specific. Rename
variables and functions to xics_spapr.

Retain the type name as "xics" to preserve migration for existing sPAPR
guests.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-01 13:41:46 +10:00
Thomas Huth 6cc09e261b hw/ppc/spapr: Add some missing hcall function set strings
Add "hcall-sprg0" (for H_SET_SPRG0), "hcall-copy" (for H_PAGE_INIT)
and "hcall-debug" (for H_LOGICAL_CI_LOAD/STORE) to the property
"ibm,hypertas-functions" to indicate that we support these hypercalls.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-01 09:57:01 +10:00
Peter Krempa 27393c33d8 qapi: keep names in 'CpuInstanceProperties' in sync with struct CPUCore
struct CPUCore uses 'id' suffix in the property name. As docs for
query-hotpluggable-cpus state that the cpu core properties should be
passed back to device_add by management in case new members are added
and thus the names for the fields should be kept in sync.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
[dwg: Removed a duplicated word in comment]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-27 13:15:06 +10:00
Igor Mammedov 2474bfd460 spapr: implement query-hotpluggable-cpus callback
It returns a list of present/possible to hotplug CPU
objects with a list of properties to use with
device_add.

in spapr case returned list would looks like:
-> { "execute": "query-hotpluggable-cpus" }
<- {"return": [
     { "props": { "core": 8 }, "type": "POWER8-spapr-cpu-core",
       "vcpus-count": 2 },
     { "props": { "core": 0 }, "type": "POWER8-spapr-cpu-core",
       "vcpus-count": 2,
       "qom-path": "/machine/unattached/device[0]"}
   ]}'

TODO:
  add 'node' property for core <-> numa node mapping

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:49 +10:00
Bharata B Rao 6f4b5c3ec5 spapr: CPU hot unplug support
Remove the CPU core device by removing the underlying CPU thread devices.
Hot removal of CPU for sPAPR guests is achieved by sending the hot unplug
notification to the guest. Release the vCPU object after CPU hot unplug so
that vCPU fd can be parked and reused.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:49 +10:00
Bharata B Rao af81cf323c spapr: CPU hotplug support
Set up device tree entries for the hotplugged CPU core and use the
exising RTAS event logging infrastructure to send CPU hotplug notification
to the guest.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:49 +10:00
Bharata B Rao 94a94e4c49 spapr: convert boot CPUs into CPU core devices
Introduce sPAPRMachineClass.dr_cpu_enabled to indicate support for
CPU core hotplug. Initialize boot time CPUs as core deivces and prevent
topologies that result in partially filled cores. Both of these are done
only if CPU core hotplug is supported.

Note: An unrelated change in the call to xics_system_init() is done
in this patch as it makes sense to use the local variable smt introduced
in this patch instead of kvmppc_smt_threads() call here.

TODO: We derive sPAPR core type by looking at -cpu <model>. However
we don't take care of "compat=" feature yet for boot time as well
as hotplug CPUs.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:49 +10:00
Bharata B Rao afd10a0fa6 spapr: Move spapr_cpu_init() to spapr_cpu_core.c
Start consolidating CPU init related routines in spapr_cpu_core.c. As
part of this, move spapr_cpu_init() and its dependencies from spapr.c
to spapr_cpu_core.c

No functionality change in this patch.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
[dwg: Rename TIMEBASE_FREQ to SPAPR_TIMEBASE_FREQ, since it's now in a
 public(ish) header]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:48 +10:00
Bharata B Rao 3b54254966 spapr: Abstract CPU core device and type specific core devices
Add sPAPR specific abastract CPU core device that is based on generic
CPU core device. Use this as base type to create sPAPR CPU specific core
devices.

TODO:
- Add core types for other remaining CPU types
- Handle CPU model alias correctly

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17 16:33:48 +10:00
Bharata B Rao d0e5a8f293 spapr: Ensure all LMBs are represented in ibm,dynamic-memory
Memory hotplug can fail for some combinations of RAM and maxmem when
DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
on maximum addressable memory returned by guest and this value is currently
being calculated wrongly by the guest kernel routine memory_hotplug_max().
While there is an attempt to fix the guest kernel, this patch works
around the problem within QEMU itself.

memory_hotplug_max() routine in the guest kernel arrives at max
addressable memory by multiplying lmb-size with the lmb-count obtained
from ibm,dynamic-memory property. There are two assumptions here:

- All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM
  where only hot-pluggable LMBs are present in this property.
- The memory area comprising of RAM and hotplug region is contiguous: This
  needn't be true always for PowerKVM as there can be gap between
  boot time RAM and hotplug region.

To work around this guest kernel bug, ensure that ibm,dynamic-memory
has information about all the LMBs (RMA, boot-time LMBs, future
hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
hotpluggable region).

RMA is represented separately by memory@0 node. Hence mark RMA LMBs
and also the LMBs for the gap b/n RAM and hotpluggable region as
reserved and as having no valid DRC so that these LMBs are not considered
by the guest.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-14 13:20:01 +10:00
Bharata B Rao 1ea1eefcbb spapr: Introduce pseries-2.7 machine type
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-07 10:17:45 +10:00
Bharata B Rao 71c9a3dd04 spapr: Increase hotpluggable memory slots to 256
KVM now supports 512 memslots on PowerPC (earlier it was 32). Allow half
of it (256) to be used as hotpluggable memory slots.

Instead of hard coding the max value, use the KVM supplied value if KVM
is enabled. Otherwise resort to the default value of 32.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-07 10:17:45 +10:00
Jianjun Duan 5dd5238c0b spapr: ensure device trees are always associated with DRC
There are possible racing situations involving hotplug events and
guest migration. For cases where a hotplug event is migrated, or
the guest is in the process of fetching device tree at the time of
migration, we need to ensure the device tree is created and
associated with the corresponding DRC for devices that were
hotplugged on the source, but 'coldplugged' on the target.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:23 +10:00
Zhou Jie 8afc22a20f Added negative check for get_image_size()
This patch adds check for negative return value from get_image_size(),
where it is missing. It avoids unnecessary two function calls.

Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:23 +10:00
Igor Mammedov bacc344c54 machine: add properties to compat_props incrementaly
Switch to adding compat properties incrementaly instead of
completly overwriting compat_props per machine type.
That removes data duplication which we have due to nested
[PC|SPAPR]_COMPAT_* macros.

It also allows to set default device properties from
default foo_machine_options() hook, which will be used
in following patch for putting VMGENID device as
a function if ISA bridge on pc/q35 machines.

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
[ehabkost: Fixed CCW_COMPAT_* and PC_COMPAT_0_* defines]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:54 -03:00
Paolo Bonzini 03dd024ff5 hw: explicitly include qemu/log.h
Move the inclusion out of hw/hw.h, most files do not need it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:29 +02:00
Thomas Huth da34fed707 hw/ppc/spapr: Fix crash when specifying bad parameters to spapr-pci-host-bridge
QEMU currently crashes when using bad parameters for the
spapr-pci-host-bridge device:

$ qemu-system-ppc64 -device spapr-pci-host-bridge,buid=0x123,liobn=0x321,mem_win_addr=0x1,io_win_addr=0x10
Segmentation fault

The problem is that spapr_tce_find_by_liobn() might return NULL, but
the code in spapr_populate_pci_dt() does not check for this condition
and then tries to dereference this NULL pointer.
Apart from that, the return value of spapr_populate_pci_dt() also
has to be checked for all PCI buses, not only for the last one, to
make sure we catch all errors.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-04-23 16:52:20 +10:00
Gonglei 1a5512bb7e spapr: fix possible Negative array index read
fix CID 1351391.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-Id: <1456998223-12356-6-git-send-email-arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-08 00:07:56 +02:00
Peter Maydell 84a5a80148 * Log filtering from Alex and Peter
* Chardev fix from Marc-André
 * config.status tweak from David
 * Header file tweaks from Markus, myself and Veronia (Outreachy candidate)
 * get_ticks_per_sec() removal from Rutuja (Outreachy candidate)
 * Coverity fix from myself
 * PKE implementation from myself, based on rth's XSAVE support
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJW9ErPAAoJEL/70l94x66DJfEH/A/QkMpAhrgNdyVsahzsGrzE
 wx5gHFIc1nBYxyr62w4apUb5jPB7zaXu0LA7EAWDeAe0pyP8hZzLT9kJyOEDsuJu
 zwKN2QeLSNMtPbnbKN0I/YQ2za2xX1V5ruhSeOJoVslUI214hgnAURaGshhQNzuZ
 2CluDT9KgL5cQifAnKs5kJrwhIYShYNQB+1eDC/7wk28dd/EH+sPALIoF+rqrSmt
 Zu4Mdqd+9Ns+oKOjA6br9ULq/Hzg0aDfY82J+XLVVqfF3PXQe8rTDmuMf/7jTn+M
 Un7ZOcei9oZF2/9vfAfKQpDCcgD9HvOUSbgqV/ubmkPPmN/LNJzeKj0fBhrRN+Y=
 =K12D
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* Log filtering from Alex and Peter
* Chardev fix from Marc-André
* config.status tweak from David
* Header file tweaks from Markus, myself and Veronia (Outreachy candidate)
* get_ticks_per_sec() removal from Rutuja (Outreachy candidate)
* Coverity fix from myself
* PKE implementation from myself, based on rth's XSAVE support

# gpg: Signature made Thu 24 Mar 2016 20:15:11 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream: (28 commits)
  target-i386: implement PKE for TCG
  config.status: Pass extra parameters
  char: translate from QIOChannel error to errno
  exec: fix error handling in file_ram_alloc
  cputlb: modernise the debug support
  qemu-log: support simple pid substitution for logs
  target-arm: dfilter support for in_asm
  qemu-log: dfilter-ise exec, out_asm, op and opt_op
  qemu-log: new option -dfilter to limit output
  qemu-log: Improve the "exec" TB execution logging
  qemu-log: Avoid function call for disabled qemu_log_mask logging
  qemu-log: correct help text for -d cpu
  tcg: pass down TranslationBlock to tcg_code_gen
  util: move declarations out of qemu-common.h
  Replaced get_tick_per_sec() by NANOSECONDS_PER_SECOND
  hw: explicitly include qemu-common.h and cpu.h
  include/crypto: Include qapi-types.h or qemu/bswap.h instead of qemu-common.h
  isa: Move DMA_transfer_handler from qemu-common.h to hw/isa/isa.h
  Move ParallelIOArg from qemu-common.h to sysemu/char.h
  Move QEMU_ALIGN_*() from qemu-common.h to qemu/osdep.h
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Conflicts:
	scripts/clean-includes
2016-03-24 21:42:40 +00:00
Thomas Huth 57c522f47b hw/net/spapr_llan: Enable the RX buffer pools by default for new machines
RX buffer pools are now enabled by default for new machine types.
For older machine types, they are still disabled to avoid breaking
migration.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Benjamin Herrenschmidt 26a7f1291b ppc: Create cpu_ppc_set_papr() helper
And move the code adjusting the MSR mask and calling kvmppc_set_papr()
to it. This allows us to add a few more things such as disabling setting
of MSR:HV and appropriate LPCR bits which will be used when fixing
the exception model.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[clg: removed LPCR setting ]
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Alexey Kardashevskiy 0ddbd05362 spapr/target-ppc/kvm: Only add hcall-instructions if KVM supports it
ePAPR defines "hcall-instructions" device-tree property which contains
code to call hypercalls in ePAPR paravirtualized guests.  In general
pseries guests won't use this property, instead using the PAPR defined
hypercall interface.

However, this property has been re-used to implement a hack to allow
PR KVM to run (slightly modified) guests in some situations where it
otherwise wouldn't be able to (because the system's L0 hypervisor
doesn't forward the PAPR hypercalls to the PR KVM kernel).

Hence, this property is always present in the device tree for pseries
guests. All KVM guests use it at least to read features via the
KVM_HC_FEATURES hypercall.

The property is populated by the code returned from the KVM's
KVM_PPC_GET_PVINFO ioctl; if not implemented in the KVM, QEMU supplies
code which will fail all hypercall attempts. If QEMU does not create
the property, and the guest kernel is compiled with
CONFIG_EPAPR_PARAVIRT (which is normally the case), there is exactly
the same stub at @epapr_hypercall_start already.

Rather than maintaining this fairly useless stub implementation, it
makes more sense not to create the property in the device tree in the
first place if the host kernel does not implement it.

This changes kvmppc_get_hypercall() to return 1 if the host kernel
does not implement KVM_CAP_PPC_GET_PVINFO. The caller can use it to decide
on whether to create the property or not.

This changes the pseries machine to not create the property if KVM does
not implement KVM_PPC_GET_PVINFO. In practice this means that from now
on the property will not be created if either HV KVM or TCG is used.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[reworded commit message for clarity --dwg]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:33 +11:00
Veronia Bahaa f348b6d1a5 util: move declarations out of qemu-common.h
Move declarations out of qemu-common.h for functions declared in
utils/ files: e.g. include/qemu/path.h for utils/path.c.
Move inline functions out of qemu-common.h and into new files (e.g.
include/qemu/bcd.h)

Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Markus Armbruster da34e65cb4 include/qemu/osdep.h: Don't include qapi/error.h
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef.  Since then, we've moved to include qemu/osdep.h
everywhere.  Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h.  That's in excess of
100KiB of crap most .c files don't actually need.

Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h.  Include qapi/error.h in .c files that need it and don't
get it now.  Include qapi-types.h in qom/object.h for uint16List.

Update scripts/clean-includes accordingly.  Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h.  Update the list of includes in the qemu/osdep.h
comment quoted above similarly.

This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third.  Unfortunately, the number depending on
qapi-types.h shrinks only a little.  More work is needed for that one.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:15 +01:00
Eduardo Habkost 0e6aac87fd machine: Use type_init() to register machine classes
Change all machine_init() users that simply call type_register*()
to use type_init().

Cc: Evgeny Voevodin <e.voevodin@samsung.com>
Cc: Maksim Kozlov <m.kozlov@samsung.com>
Cc: Igor Mitsyanko <i.mitsyanko@gmail.com>
Cc: Dmitry Solodkiy <d.solodkiy@samsung.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Michael Walle <michael@walle.cc>
Cc: "Hervé Poussineau" <hpoussin@reactos.org>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-03-16 15:34:05 -03:00
David Gibson c18ad9a54b target-ppc: Eliminate kvmppc_kern_htab global
fa48b43 "target-ppc: Remove hack for ppc_hash64_load_hpte*() with HV KVM"
purports to remove a hack in the handling of hash page tables (HPTs)
managed by KVM instead of qemu.  However, it actually went in the wrong
direction.

That patch requires anything looking for an external HPT (that is one not
managed by the guest itself) to check both env->external_htab (for a qemu
managed HPT) and kvmppc_kern_htab (for a KVM managed HPT).  That's a
problem because kvmppc_kern_htab is local to mmu-hash64.c, but some places
which need to check for an external HPT are outside that, such as
kvm_arch_get_registers().  The latter was subtly broken by the earlier
patch such that gdbstub can no longer access memory.

Basically a KVM managed HPT is much more like a qemu managed HPT than it is
like a guest managed HPT, so the original "hack" was actually on the right
track.

This partially reverts fa48b43, so we again mark a KVM managed external HPT
by putting a special but non-NULL value in env->external_htab.  It then
goes further, using that marker to eliminate the kvmppc_kern_htab global
entirely.  The ppc_hash64_set_external_hpt() helper function is extended
to set that marker if passed a NULL value (if you're setting an external
HPT, but don't have an actual HPT to set, the assumption is that it must
be a KVM managed HPT).

This also has some flow-on changes to the HPT access helpers, required by
the above changes.

Reported-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
2016-03-16 09:55:06 +11:00
David Gibson e5c0d3ce40 target-ppc: Add helpers for updating a CPU's SDR1 and external HPT
When a Power cpu with 64-bit hash MMU has it's hash page table (HPT)
pointer updated by a write to the SDR1 register we need to update some
derived variables.  Likewise, when the cpu is configured for an external
HPT (one not in the guest memory space) some derived variables need to be
updated.

Currently the logic for this is (partially) duplicated in ppc_store_sdr1()
and in spapr_cpu_reset().  In future we're going to need it in some other
places, so make some common helpers for this update.

In addition the new ppc_hash64_set_external_hpt() helper also updates
SDR1 in KVM - it's not updated by the normal runtime KVM <-> qemu CPU
synchronization.  In a sense this belongs logically in the
ppc_hash64_set_sdr1() helper, but that is called from
kvm_arch_get_registers() so can't itself call cpu_synchronize_state()
without infinite recursion.  In practice this doesn't matter because
the only other caller is TCG specific.

Currently there aren't situations where updating SDR1 at runtime in KVM
matters, but there are going to be in future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-03-16 09:55:06 +11:00
Michael S. Tsirkin 226419d615 msi_supported -> msi_nonbroken
Rename controller flag to make it clearer what it means.
Add some documentation as well.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:45:21 +02:00
Peter Crosthwaite 7ef295ea5b loader: Add data swap option to load-elf
Some CPUs are of an opposite data-endianness to other components in the
system. Sometimes elfs have the data sections layed out with this CPU
data-endianness accounting for when loaded via the CPU, so byte swaps
(relative to other system components) will occur.

The leading example, is ARM's BE32 mode, which is is basically LE with
address manipulation on half-word and byte accesses to access the
hw/byte reversed address. This means that word data is invariant
across LE and BE32. This also means that instructions are still LE.
The expectation is that the elf will be loaded via the CPU in this
endianness scheme, which means the data in the elf is reversed at
compile time.

As QEMU loads via the system memory directly, rather than the CPU, we
need a mechanism to reverse elf data endianness to implement this
possibility.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:21 +00:00
Greg Kurz 09b5e30da5 spapr: skip configuration section during migration of older machines
Since QEMU 2.4, we have a configuration section in the migration stream.
This must be skipped for older machines, like it is already done for x86.

This patch fixes the migration of pseries-2.3 from/to QEMU 2.3, but it
breaks migration of the same machine from/to QEMU 2.4/2.4.1/2.5. We do
that anyway because QEMU 2.3 is likely to be more widely deployed than
newer QEMU versions.

Fixes: 61964c23e5
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-02-28 16:19:02 +11:00
Greg Kurz cba0e7796b spapr: disable vmdesc submission for old machines
Since QEMU 2.3, we have a vmdesc section in the migration stream.
This section is not mandatory but when migrating a pseries-2.2
machine from QEMU 2.2, you get a warning at the destination:

qemu-system-ppc64: Expected vmdescription section, but got 0

The warning goes away if we decide to skip vmdesc as well for
older pseries, like it is already done for pc's.

This can only be observed with -cpu POWER7 because POWER8
cannot migrate from QEMU 2.2 to 2.3 (insns_flags2 mismatch).

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-02-28 16:19:02 +11:00
Greg Kurz 9897e46264 spapr: initialize local Error pointer
This fixes a crash in the target QEMU during migration.

Broken in commit c5f54f3.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[reworded commit message]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-02-25 13:58:44 +11:00
David Gibson 1c81003acc pseries: Include missing pseries-2.5 compat properties in pseries-2.4
Commit 4b23699 "pseries: Add pseries-2.6 machine type" added a new
SPAPR_COMPAT_2_5 macro in the usual way.  However, it didn't add this
macro to the existing SPAPR_COMPAT_2_4 macro so that pseries-2.4
inherits newer compatibility properties which are needed for 2.5 and
earlier.

This corrects the oversight.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-02-17 10:25:37 +11:00
David Gibson 378bc21756 migration: ensure htab_save_first completes after timeout
htab_save_first_pass could return without finishing its work due to
timeout. The patch checks if another invocation of it is necessary and
will call it in htab_save_complete if necessary.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[removed overlong line]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-02-17 09:59:30 +11:00
David Gibson fa48b4328c target-ppc: Remove hack for ppc_hash64_load_hpte*() with HV KVM
With HV KVM, the guest's hash page table (HPT) is managed by the kernel and
not directly accessible to QEMU.  This means that spapr->htab is NULL
and normally env->external_htab would also be NULL for each cpu.

However, that would cause ppc_hash64_load_hpte*() to do the wrong thing in
the few cases where QEMU does need to load entries from the in-kernel HPT.
Specifically, seeing external_htab is NULL, they would look for an HPT
within the guest's address space instead.

To stop that we have an ugly hack in the pseries machine type code to
set external htab to (void *)1 instead.

This patch removes that hack by having ppc_hash64_load_hpte*() explicitly
check kvmppc_kern_htab instead, which makes more sense.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-02-17 09:59:30 +11:00
David Gibson c5f54f3e31 pseries: Move hash page table allocation to reset time
At the moment the size of the hash page table (HPT) is fixed based on the
maximum memory allowed to the guest.  As such, we allocate the table during
machine construction, and just clear it at reset.

However, we're planning to implement a PAPR extension allowing the hash
page table to be resized at runtime.  This will mean that on reset we want
to revert it to the default size.  It also means that when migrating, we
need to make sure the destination allocates an HPT of size matching the
host, since the guest could have changed it before the migration.

This patch replaces the spapr_alloc_htab() and spapr_reset_htab() functions
with a new spapr_reallocate_hpt() function.  This is called at reset and
inbound migration only, not during machine init any more.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-02-17 09:59:30 +11:00
David Gibson 8dfe8e7f4f pseries: Add helper to calculate recommended hash page table size
At present we calculate the recommended hash page table (HPT) size for a
pseries guest just once in ppc_spapr_init() before allocating the HPT.
In future patches we're going to want this calculation in other places, so
this splits it out into a helper function.  While we're at it, change the
calculation to use ctz() instead of an explicit loop.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-02-17 09:59:30 +11:00
David Gibson 715c54071a pseries: Simplify handling of the hash page table fd
When migrating the 'pseries' machine type with KVM, we use a special fd
to access the hash page table stored within KVM.  Usually, this fd is
opened at the beginning of migration, and kept open until the migration
is complete.

However, if there is a guest reset during the migration, the fd can become
stale and we need to re-open it.  At the moment we use an 'htab_fd_stale'
flag in sPAPRMachineState to signal this, which is checked in the migration
iterators.

But that's rather ugly.  It's simpler to just close and invalidate the
fd on reset, and lazily re-open it in migration if necessary.  This patch
implements that change.

This requires a small addition to the machine state's instance_init,
so that htab_fd is initialized to -1 (telling the migration code it
needs to open it) instead of 0, which could be a valid fd.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-02-17 09:59:30 +11:00
David Gibson 98a5d100c2 pseries: Clean up error reporting in htab migration functions
The functions for migrating the hash page table on pseries machine type
(htab_save_setup() and htab_load()) can report some errors with an
explicit fprintf() before returning an appropriate error code.  Change some
of these to use error_report() instead. htab_save_setup() is omitted for
now to avoid conflicts with some other in-progress work.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
2016-01-30 23:37:37 +11:00