Commit graph

2587 commits

Author SHA1 Message Date
Peter Maydell 460b6c8e58 * Speed up AddressSpaceDispatch creation (Alexey)
* Fix kvm.c assert (David)
 * Memory fixes and further speedup (me)
 * Persistent reservation manager infrastructure (me)
 * virtio-serial: add enable_backend callback (Pavel)
 * chardev GMainContext fixes (Peter)
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAlnFX3UUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMq2wf/Z7i67tTQYhaY7trAehdGDLSa6C4m
 0xAex+DVJrpfxFHLINkktx9NpvyZbQ/PuA0+5W10qmfPVF3hddTgLL3Dcg5xkQOh
 qNa2pFPMTn2T4eEdAANycNEF3nz8at5EnZ5anW2uMS41iDMq6aBjPhDgvi/iyG4w
 GBeZFjUUXQ8Wtp5fZJ1RgV/2PFg3W1REodvM143Ge84UUmnltf/snmx3NMQWw5wu
 coZFSIpcachMRxZ+bbLtJnCoRWG+8lkmTXYkswRWGez+WniscR0898RRpD0lJgIA
 cgeX5Cg/EbBIpwcqjsW2018WlsH5qp4rb6wVuqTY2kzbG+FUyKSqxSwGZw==
 =9GLQ
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* Speed up AddressSpaceDispatch creation (Alexey)
* Fix kvm.c assert (David)
* Memory fixes and further speedup (me)
* Persistent reservation manager infrastructure (me)
* virtio-serial: add enable_backend callback (Pavel)
* chardev GMainContext fixes (Peter)

# gpg: Signature made Fri 22 Sep 2017 20:07:33 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (32 commits)
  chardev: remove context in chr_update_read_handler
  chardev: use per-dev context for io_add_watch_poll
  chardev: add Chardev.gcontext field
  chardev: new qemu_chr_be_update_read_handlers()
  scsi: add persistent reservation manager using qemu-pr-helper
  scsi: add multipath support to qemu-pr-helper
  scsi: build qemu-pr-helper
  scsi, file-posix: add support for persistent reservation management
  memory: Share special empty FlatView
  memory: seek FlatView sharing candidates among children subregions
  memory: trace FlatView creation and destruction
  memory: Create FlatView directly
  memory: Get rid of address_space_init_shareable
  memory: Rework "info mtree" to print flat views and dispatch trees
  memory: Do not allocate FlatView in address_space_init
  memory: Share FlatView's and dispatch trees between address spaces
  memory: Move address_space_update_ioeventfds
  memory: Alloc dispatch tree where topology is generared
  memory: Store physical root MR in FlatView
  memory: Rename mem_begin/mem_commit/mem_add helpers
  ...

# Conflicts:
#	configure
2017-09-23 12:55:40 +01:00
John Snow 159a9df021 ide: fix enum comparison for gcc 4.7
Apparently GCC gets bent over comparing enum values against zero.
Replace the conditional with something less readable.

Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170921013821.1673-1-jsnow@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-22 13:23:53 +01:00
Alexey Kardashevskiy b516572f31 memory: Get rid of address_space_init_shareable
Since FlatViews are shared now and ASes not, this gets rid of
address_space_init_shareable().

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-17-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22 01:06:51 +02:00
Subbaraya Sundeep ebc1fbb4a1 msf2: Add Smartfusion2 SoC
Smartfusion2 SoC has hardened Microcontroller subsystem
and flash based FPGA fabric. This patch adds support for
Microcontroller subsystem in the SoC.

Signed-off-by: Subbaraya Sundeep <sundeep.lkml@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170920201737.25723-5-f4bug@amsat.org
[PMD: drop cpu_model to directly use cpu type, check m3clk non null]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-21 16:36:56 +01:00
Subbaraya Sundeep 268ee7deb4 msf2: Add Smartfusion2 SPI controller
Modelled Microsemi's Smartfusion2 SPI controller.

Signed-off-by: Subbaraya Sundeep <sundeep.lkml@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170920201737.25723-4-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-21 16:36:56 +01:00
Subbaraya Sundeep 0ee1e1f469 msf2: Microsemi Smartfusion2 System Register block
Added Sytem register block of Smartfusion2.
This block has PLL registers which are accessed by guest.

Signed-off-by: Subbaraya Sundeep <sundeep.lkml@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170920201737.25723-3-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-21 16:36:56 +01:00
Subbaraya Sundeep 96401bad45 msf2: Add Smartfusion2 System timer
Modelled System Timer in Microsemi's Smartfusion2 Soc.
Timer has two 32bit down counters and two interrupts.

Signed-off-by: Subbaraya Sundeep <sundeep.lkml@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170920201737.25723-2-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-21 16:36:56 +01:00
Peter Maydell e1be0a576b nvic: Implement NVIC_ITNS<n> registers
For v8M, the NVIC has a new set of registers per interrupt,
NVIC_ITNS<n>. These determine whether the interrupt targets Secure
or Non-secure state. Implement the register read/write code for
these, and make them cause NVIC_IABR, NVIC_ICER, NVIC_ISER,
NVIC_ICPR, NVIC_IPR and NVIC_ISPR to RAZ/WI for non-secure
accesses to fields corresponding to interrupts which are
configured to target secure state.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-8-git-send-email-peter.maydell@linaro.org
2017-09-21 16:29:27 +01:00
Peter Maydell 3b2e934463 nvic: Implement AIRCR changes for v8M
The Application Interrupt and Reset Control Register has some changes
for v8M:
 * new bits SYSRESETREQS, BFHFNMINS and PRIS: these all have
   real state if the security extension is implemented and otherwise
   are constant
 * the PRIGROUP field is banked between security states
 * non-secure code can be blocked from using the SYSRESET bit
   to reset the system if SYSRESETREQS is set

Implement the new state and the changes to register read and write.
For the moment we ignore the effects of the secure PRIGROUP.
We will implement the effects of PRIS and BFHFNMIS later.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-6-git-send-email-peter.maydell@linaro.org
2017-09-21 16:29:27 +01:00
Peter Maydell 5255fcf8e4 nvic: Add cached vectpending_prio state
Instead of looking up the pending priority
in nvic_pending_prio(), cache it in a new state struct
field. The calculation of the pending priority given
the interrupt number is more complicated in v8M with
the security extension, so the caching will be worthwhile.

This changes nvic_pending_prio() from returning a full
(group + subpriority) priority value to returning a group
priority. This doesn't require changes to its callsites
because we use it only in comparisons of the form
  execution_prio > nvic_pending_prio()
and execution priority is always a group priority, so
a test (exec prio > full prio) is true if and only if
(execprio > group_prio).

(Architecturally the expected comparison is with the
group priority for this sort of "would we preempt" test;
we were only doing a test with a full priority as an
optimisation to avoid the mask, which is possible
precisely because the two comparisons always give the
same answer.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-5-git-send-email-peter.maydell@linaro.org
2017-09-21 16:29:27 +01:00
Peter Maydell e93bc2ac11 nvic: Add cached vectpending_is_s_banked state
With banked exceptions, just the exception number in
s->vectpending is no longer sufficient to uniquely identify
the pending exception. Add a vectpending_is_s_banked bool
which is true if the exception is using the sec_vectors[]
array.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1505240046-11454-4-git-send-email-peter.maydell@linaro.org
2017-09-21 16:29:23 +01:00
Peter Maydell 17906a162a nvic: Add banked exception states
For the v8M security extension, some exceptions must be banked
between security states. Add the new vecinfo array which holds
the state for the banked exceptions and migrate it if the
CPU the NVIC is attached to implements the security extension.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-21 16:28:59 +01:00
Pavel Butsykin 55289fb036 virtio-serial: add enable_backend callback
We should guarantee that RAM will not be modified while VM has a stopped
state, otherwise it can lead to negative consequences during post-copy
migration. In RUN_STATE_FINISH_MIGRATE step, it's expected that RAM on
source side will not be modified as this could lead to non-consistent vm state
on the destination side. Also RAM access during postcopy-ram migration with
enabled release-ram capability can lead to sad consequences.

Let's add enable_backend() callback to avoid undesirable virtioqueue changes
in the guest memory.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Message-Id: <20170919120733.22020-1-pbutsykin@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 11:51:49 +02:00
Mark Cave-Ayland c110425d16 net: add Sun HME (Happy Meal Ethernet) on-board NIC
Enable it by default for the sparc64-softmmu configuration.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Artyom Tarasenko <atar4qemu@gmail.com>
2017-09-21 08:38:42 +01:00
Peter Maydell d3f5433c7b Machine/CPU/NUMA queue, 2017-09-19
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJZwXs9AAoJECgHk2+YTcWmS18P/1OceEmetatwBQ6YWURA0VfU
 CB+nCHxijlWXU8UiwoTVDOXc11P00V5VO0mMFouUbv+o+/80qyAfNl2DhDVq7ZXo
 nhIFHKXUR1BX3YbcqfIonH8xCMGtrAexghhhaGPQbx450PqmGyyjBsZsXttNge1S
 Yt+jzgD9drsZPYw4ZLJjT/tnWI/+8kDPn37jVujzRzApTD9/fq+77ZZq7q25RzQH
 ISa5OXOQk6pq+tPHvaIFXfOqfhILcpM7u/X7MYVwiA1oBqcLXTDCjDX3cKavKade
 +i3yrH7ahNgUO1DyT2bMZX6NAFoZDBrwlYgpkw8n+Yf+EUcdPDOHHEcEeaMpdTGx
 wgWbQrs+xzIg/ulRb8Qqe9FwdXGQbehfFGofd+gnGQ0XuxekT82in+ucMOivQO3x
 W/azGnzoz6D83stJFIZ93S69SRswqBuj2R8mu821yzqx1EUSNXgfKXz9OPwwFed5
 El9YN127F/VAyp0av1CsOg1XgqIujMUGRxGf7eQBfkh1R3C/g2XNPTvR3yaY9L5B
 zuMJfWLF6r6zL53ymt7/9UVEim295Lia3mNGS5/Min5QGms7edphsDsrubzXsZGq
 2owWfAU/KeDH9gNVNNkZdLcEcS5TEz+2oGPR5oeDeB/QlzVdNQ3FeTVlzFavxNQa
 8nrzeFcw7VNrIx2gdvsY
 =LQ8U
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging

Machine/CPU/NUMA queue, 2017-09-19

# gpg: Signature made Tue 19 Sep 2017 21:17:01 BST
# gpg:                using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/machine-next-pull-request:
  MAINTAINERS: Update git URLs for my trees
  hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM
  NUMA: Replace MAX_NODES with nb_numa_nodes in for loop
  numa: cpu: calculate/set default node-ids after all -numa CLI options are parsed
  arm: drop intermediate cpu_model -> cpu type parsing and use cpu type directly
  pc: use generic cpu_model parsing
  vl.c: convert cpu_model to cpu type and set of global properties before machine_init()
  cpu: make cpu_generic_init() abort QEMU on error
  qom: cpus: split cpu_generic_init() on feature parsing and cpu creation parts
  hostmem-file: Add "discard-data" option
  osdep: Define QEMU_MADV_REMOVE
  vl: Clean up user-creatable objects when exiting

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-20 17:35:36 +01:00
Igor Mammedov 79e0793614 numa: cpu: calculate/set default node-ids after all -numa CLI options are parsed
Calculating default node-ids for CPUs in possible_cpu_arch_ids()
is rather fragile since defaults calculation uses nb_numa_nodes but
callback might be potentially called early before all -numa CLI
options are parsed, which would lead to cpus assigned only upto
nb_numa_nodes at the time possible_cpu_arch_ids() is called.

Issue was introduced by
(7c88e65 numa: mirror cpu to node mapping in MachineState::possible_cpus)
and for example CLI:
  -smp 4 -numa node,cpus=0 -numa node
would set props.node-id in possible_cpus array for every non
explicitly mapped CPU to the first node.

Issue is not visible to guest nor to mgmt interface due to
  1) implictly mapped cpus are forced to the first node in
     case of partial mapping
  2) in case of default mapping possible_cpu_arch_ids() is
     called after all -numa options are parsed (resulting
     in correct mapping).

However it's fragile to rely on late execution of
possible_cpu_arch_ids(), therefore add machine specific
callback that returns node-id for CPU and use it to calculate/
set defaults at machine_numa_finish_init() time when all -numa
options are parsed.

Reported-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1496314408-163972-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-19 16:51:33 -03:00
David Hildenbrand 53d8e91d64 s390x: move sclp_service_call() to sclp.h
Implemented in sclp.c, so let's move it to the right include file.
Also adjust some includes.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170913132417.24384-9-david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-09-19 18:31:31 +02:00
David Hildenbrand 19c69829d6 s390x: move subsystem_reset() to s390-virtio-ccw.h
Implemented in s390-virtio-ccw.c, so move it to the right header.
We can also drop the extern. Fix up one include.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170913132417.24384-7-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-09-19 18:31:31 +02:00
Gonglei 6c69dfb67e i386/cpu/hyperv: support over 64 vcpus for windows guests
Starting with Windows Server 2012 and Windows 8, if
CPUID.40000005.EAX contains a value of -1, Windows assumes specific
limit to the number of VPs. In this case, Windows Server 2012
guest VMs may use more than 64 VPs, up to the maximum supported
number of processors applicable to the specific Windows
version being used.

https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs

For compatibility, Let's introduce a new property for X86CPU,
named "x-hv-max-vps" as Eduardo's suggestion, and set it
to 0x40 before machine 2.10.

(The "x-" prefix indicates that the property is not supposed to
be a stable user interface.)

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-Id: <1505143227-14324-1-git-send-email-arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19 16:20:49 +02:00
Igor Mammedov ba1ba5cca3 arm: drop intermediate cpu_model -> cpu type parsing and use cpu type directly
there are 2 use cases to deal with:
  1: fixed CPU models per board/soc
  2: boards with user configurable cpu_model and fallback to
     default cpu_model if user hasn't specified one explicitly

For the 1st
  drop intermediate cpu_model parsing and use const cpu type
  directly, which replaces:
     typename = object_class_get_name(
           cpu_class_by_name(TYPE_ARM_CPU, cpu_model))
     object_new(typename)
  with
     object_new(FOO_CPU_TYPE_NAME)
  or
     cpu_generic_init(BASE_CPU_TYPE, "my cpu model")
  with
     cpu_create(FOO_CPU_TYPE_NAME)

as result 1st use case doesn't have to invoke not necessary
translation and not needed code is removed.

For the 2nd
 1: set default cpu type with MachineClass::default_cpu_type and
 2: use generic cpu_model parsing that done before machine_init()
    is run and:
    2.1: drop custom cpu_model parsing where pattern is:
       typename = object_class_get_name(
           cpu_class_by_name(TYPE_ARM_CPU, cpu_model))
       [parse_features(typename, cpu_model, &err) ]

    2.2: or replace cpu_generic_init() which does what
         2.1 does + create_cpu(typename) with just
         create_cpu(machine->cpu_type)
as result cpu_name -> cpu_type translation is done using
generic machine code one including parsing optional features
if supported/present (removes a bunch of duplicated cpu_model
parsing code) and default cpu type is defined in an uniform way
within machine_class_init callbacks instead of adhoc places
in boadr's machine_init code.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1505318697-77161-6-git-send-email-imammedo@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-19 09:09:32 -03:00
Igor Mammedov 6063d4c0f9 vl.c: convert cpu_model to cpu type and set of global properties before machine_init()
All machines that support user specified cpu_model either call
cpu_generic_init() or cpu_class_by_name()/CPUClass::parse_features
to parse feature string and to get CPU type to create.

Which leads to code duplication and hard-codding default CPU model
within machine_foo_init() code. Which makes it impossible to
get CPU type before machine_init() is run.

So instead of setting default CPUs models and doing parsing in
target specific machine_foo_init() in various ways, provide
a generic data driven cpu_model parsing before machine_init()
is called.

in follow up per target patches, it will allow to:
  * define default CPU type in consistent/generic manner
    per machine type and drop custom code that fallbacks
    to default if cpu_model is NULL
  * drop custom features parsing in targets and do it
    in centralized way.
  * for cases of
      cpu_generic_init(TYPE_BASE/DEFAULT_CPU, "some_cpu")
    replace it with
      cpu_create(machine->cpu_type) || cpu_create(TYPE_FOO)
    depending if CPU type is user settable or not.
    not doing useless parsing and clearly documenting where
    CPU model is user settable or fixed one.

Patch allows machine subclasses to define default CPU type
per machine class at class_init() time and if that is set
generic code will parse cpu_model into a MachineState::cpu_type
which will be used to create CPUs for that machine instance
and allows gradual per board conversion.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1505318697-77161-4-git-send-email-imammedo@redhat.com>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-19 09:09:32 -03:00
Paolo Bonzini 08e2c9f19c scsi: move block/scsi.h to include/scsi/constants.h
Complete the transition by renaming this header, which was
shared by block/iscsi.c and the SCSI emulation code.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19 14:09:31 +02:00
Paolo Bonzini e5b5728cd3 scsi: move non-emulation specific code to scsi/
util/scsi.c includes some SCSI code that is shared by block/iscsi.c and
hw/scsi, but the introduction of the persistent reservation helper
will add many more instances of this.  There is also include/block/scsi.h,
which actually is not part of the core block layer.

The persistent reservation manager will also need a home.  A scsi/
directory provides one for both the aforementioned shared code and
the PR manager code.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19 14:09:11 +02:00
Paolo Bonzini 37b6045c45 scsi: rename scsi_build_sense to scsi_convert_sense
After introducing the scsi/ subdirectory, there will be a scsi_build_sense
function that is the same as scsi_req_build_sense but without needing
a SCSIRequest.  The existing scsi_build_sense function gets in the way,
remove it.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19 14:09:11 +02:00
Richard W.M. Jones 5c0919d020 virtio-scsi: Add virtqueue_size parameter allowing virtqueue size to be set.
Since Linux switched to blk-mq as the default in Linux commit
5c279bd9e406 ("scsi: default to scsi-mq"), virtio-scsi LUNs consume
about 10x as much guest kernel memory.

This commit allows you to choose the virtqueue size for each
virtio-scsi-pci controller like this:

  -device virtio-scsi-pci,id=scsi,virtqueue_size=16

The default is still 128 as before.  Using smaller virtqueue_size
allows many more disks to be added to small memory virtual machines.
For a 1 vCPU, 500 MB, no swap VM I observed:

  With scsi-mq enabled (upstream kernel):              175 disks
    -"- ditto -"-   virtqueue_size=64:                 318 disks
    -"- ditto -"-   virtqueue_size=16:                 775 disks
  With scsi-mq disabled (kernel before 5c279bd9e406): 1755 disks

Note that to have any effect, this requires a kernel patch:

  https://lkml.org/lkml/2017/8/10/689

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20170810165255.20865-1-rjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19 14:09:11 +02:00
Mao Zhongyi 794939e81d hw/ide: Convert DeviceClass init to realize
Replace init with realize in IDEDeviceClass, which has errp
as a parameter. So all the implementations now use error_setg
instead of error_report for reporting error.

Cc: John Snow <jsnow@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>

Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: c4d27b4b5d9e37468e63e35214ce4833ca271542.1505737465.git.maozy.fnst@cn.fujitsu.com
Signed-off-by: John Snow <jsnow@redhat.com>
2017-09-18 19:43:38 -04:00
John Snow 0e168d3551 IDE: replace DEBUG_AIO with trace events
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170901001502.29915-6-jsnow@redhat.com
[Edited enum conditional for Clang --js]
Signed-off-by: John Snow <jsnow@redhat.com>
2017-09-18 15:01:26 -04:00
John Snow 82a13ff821 ATAPI: Replace DEBUG_IDE_ATAPI with tracing events
As part of the ongoing effort to modernize the tracing facilities for
the IDE family of devices, remove PRINTFs in the ATAPI device with
actual tracing events.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170901001502.29915-5-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2017-09-18 15:01:26 -04:00
John Snow 3eee2611dd IDE: replace DEBUG_IDE with tracing system
Remove the DEBUG_IDE preprocessor definition with something more
appropriately flexible, using the trace-events subsystem.

This will be less prone to bitrot and will more effectively allow
us to target just the functions we care about.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170901001502.29915-2-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2017-09-18 15:01:25 -04:00
Peter Maydell d535f5d363 ppc patch queue 2017-09-15
Here's the current batch of accumulated ppc patches.  These are all
 pretty simple bugfixes or cleanups, no big new features here.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlm7TegACgkQbDjKyiDZ
 s5K7AhAAiGS7MOhaaA7a/TC4ekYbfxRKH7tE/FB+xvJg4NXp/f1/gj1ZuU4mAZcP
 LVKwTtoNzUjPUputlMafJrFvCAHdDwBzJ1CBSd3H9WCDFDRy/QZH7on7JJQKZwnQ
 Ls33PL5UMkhzsEW7XOb2HHni6VPYtw+Fr6fLLea/6xa+L5qPlsKMN7r8zhoqKnRj
 qkLSwgR64NqaQCTazhL5kty0JcyMivRHRIEqKaDLrs5zDeeP8yvMDr9ZfY+ri+DN
 LnC4u+2b4rSFaL7352i9TVUvFFXSNs45TruMLTAF5d8AwTrYe9yu1BD/Q/bTRszr
 aBAdOQApsPEbBk+TqnHQ+l231ihZpnbTmCc5EsNGYXSBm4P6ealIv3pkzYpAaU0m
 x9LdGbMwiqriGi/70sNpIkyzowNW0UCoKAXjxdNzYethczI08EGyvrjGfPsrU6n4
 w3HLFI+iGD5iFweW5sUbB2puybo32gAKs3j2NO4B1G6NI8NVgKnAFUkkkiM2e8Y8
 Xp8luUacL7x+PB9aAs+QpR2kqbbU1vvEOtQ2LkEQfQ5zJNdAtp3imdhKuFQT+cX+
 7FTvzpDzKThdtdTtizQ8JCoA2Z+LLZlR8wJ76oVe1CSyK0+xxvclipD5MPaSowqW
 UBV47+EuKOkCCNEBn3xxbx08bcTR1OqTaW+xIIdFm2Rdyt0SMDM=
 =GAd3
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.11-20170915' into staging

ppc patch queue 2017-09-15

Here's the current batch of accumulated ppc patches.  These are all
pretty simple bugfixes or cleanups, no big new features here.

# gpg: Signature made Fri 15 Sep 2017 04:50:00 BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.11-20170915:
  ppc/kvm: use kvm_vm_check_extension() in kvmppc_is_pr()
  spapr_events: use QTAILQ_FOREACH_SAFE() in spapr_clear_pending_events()
  spapr_cpu_core: cleaning up qdev_get_machine() calls
  spapr_pci: don't create 64-bit MMIO window if we don't need to
  spapr_pci: convert sprintf() to g_strdup_printf()
  spapr_cpu_core: fail gracefully with non-pseries machine types
  xics: fix several error leaks
  vfio, spapr: Fix levels calculation
  spapr_pci: handle FDT creation errors with _FDT()
  spapr_pci: use the common _FDT() helper
  spapr: fix CAS-generated reset
  ppc/xive: fix OV5_XIVE_EXPLOIT bits
  spapr: only update SDR1 once per-cpu during CAS
  spapr_pci: use g_strdup_printf()
  spapr_pci: drop useless check in spapr_populate_pci_child_dt()
  spapr_pci: drop useless check in spapr_phb_vfio_get_loc_code()
  hw/ppc/spapr.c: cleaning up qdev_get_machine() calls
  net: Add SunGEM device emulation as found on Apple UniNorth

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-15 19:00:16 +01:00
Cédric Le Goater 21f3f8db0e ppc/xive: fix OV5_XIVE_EXPLOIT bits
On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility or
in XIVE exploitation mode. Now that we have initial guest support for
the XIVE interrupt controller, let's fix the bits definition which have
evolved in the latest specs.

The platform advertises the XIVE Exploitation Mode support using the
property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1 :

 - 0b00 XIVE legacy mode Only
 - 0b01 XIVE exploitation mode Only
 - 0b10 XIVE legacy or exploitation mode

The OS asks for XIVE Exploitation Mode support using the property
"ibm,architecture-vec-5", byte 23 bits 0-1:

 - 0b00 XIVE legacy mode Only
 - 0b01 XIVE exploitation mode Only

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Benjamin Herrenschmidt f85504b23a net: Add SunGEM device emulation as found on Apple UniNorth
This adds a simplistic emulation of the Sun GEM ethernet controller
found in Apple ASICs.

Currently we only support the Apple UniNorth 1.x variant, but the
other Apple or Sun variants should mostly be a matter of adding
PCI IDs options.

We have a very primitive emulation of a single Broadcom 5201 PHY
which is supported by the MacOS driver.

This model brings out-of-the-box networking to MacOS 9, and all
versions of OS X I tried with the mac99 platform.

Further improvements from Mark:
- Remove sungem.h file, moving constants into sungem.c as required
- Switch to using tracepoints for debugging
- Split register blocks into separate memory regions
- Use arrays in SunGEMState to hold register values
- Add state-saving support

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-15 10:29:48 +10:00
Pranavkumar Sawargaonkar 70bfdce6a1 hw/pci-host/gpex: Set INTx index/gsi mapping
To implement INTx to gsi routing we need to pass the gpex host
bridge the gsi associated to each INTx index. Let's introduce
irq_num array and gpex_set_irq_num setter function.

Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Tushar Jagad <tushar.jagad@linaro.org>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Feng Kan <fkan@apm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1505296004-6798-2-git-send-email-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-14 18:43:18 +01:00
Alistair Francis 1946809ece xlnx-zcu102: Add a machine level virtualization property
Add a machine level virtualization property. This defaults to false and can be
set to true using this machine command line argument:
    -machine xlnx-zcu102,virtualization=on

This follows what the ARM virt machine does.

This property only applies to the ZCU102 machine. The EP108 machine does
not have this property.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-14 18:43:18 +01:00
Vadim Galitsyn 9aa3397f19 qmp: introduce query-memory-size-summary command
Add a new query-memory-size-summary command which provides the
following memory information in bytes:

  * base-memory - size of "base" memory specified with command line option -m.

  * plugged-memory - amount of memory that was hot-plugged.
    If target does not have CONFIG_MEM_HOTPLUG enabled, no
    value is reported.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Signed-off-by: Mohammed Gamal <mohammed.gamal@profitbricks.com>
Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
Signed-off-by: Vadim Galitsyn <vadim.galitsyn@profitbricks.com>
Reviewed-by: Eugene Crosser <evgenii.cherkashin@profitbricks.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: qemu-devel@nongnu.org
Message-Id: <20170829153022.27004-3-vadim.galitsyn@profitbricks.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Fixup comments as per Igor's review
  Added 'of' from Vadim's reply
2017-09-14 15:52:10 +01:00
Peter Maydell fcea73709b pc, pci, virtio: patches queued before 2.10
A bunch of stuff that was posted before the 2.10 timeframe,
 mostly fixes/cleanups.  New PCI bridges.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZspf2AAoJECgfDbjSjVRpggMIAJ7QZ0nex97iAC0MSss8meLb
 Rs/p9+d2DnpW/eO3sZZTuEl3bryopW1pT/0761UkHbMB5dnNKCCSXcQdeNgPECK3
 TzddK8+9qI5weHv9qBJihc4cVynvFAB0sRFr1QIAanUes7XXEvPn0NOMeeXltbgU
 rA52sc9ksqD8QoUW377/HeXkeM/F8M/bJSR6wxMFfaMMlRUqfxkSTmeYAjk7RDT7
 SMElwg2acsaZ7uP388m9nuXs7nEuYIXRaiwGet9ltXK2E8nheckm0QYVgd7jmrTa
 836iWnXhik1jFmDkMkZpGfBUyfzAVgD4eofO5DLXd17JWU/sZjD3ufP9P3ng63A=
 =5cNH
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc, pci, virtio: patches queued before 2.10

A bunch of stuff that was posted before the 2.10 timeframe,
mostly fixes/cleanups.  New PCI bridges.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Fri 08 Sep 2017 14:15:34 BST
# gpg:                using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream:
  fw_cfg: rename read callback
  pci: add reserved slot check to do_pci_register_device()
  pci: move check for existing devfn into new pci_bus_devfn_available() helper
  vmgenid: replace x-write-pointer-available hack
  vhost-user-bridge: fix resume regression (since 2.9)
  libvhost-user: support resuming vq->last_avail_idx based on used_idx
  acpi/vmgenid: change device category to misc
  intel_iommu: fix missing BQL in pt fast path
  docs: update documentation considering PCIE-PCI bridge
  hw/pci: add QEMU-specific PCI capability to the Generic PCI Express Root Port
  hw/pci: introduce bridge-only vendor-specific capability to provide some hints to firmware
  hw/pci: introduce pcie-pci-bridge device
  Revert "ACPI: don't call acpi_pcihp_device_plug_cb on xen"
  hw/acpi: Move acpi_set_pci_info to pcihp
  hw/acpi: Limit hotplug to root bus on legacy mode
  pc: add 2.11 machine types
  vhost: Release memory references on cleanup

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-08 16:04:42 +01:00
Marc-André Lureau 6f6f4aec74 fw_cfg: rename read callback
The callback is called on select.

Furthermore, the next patch introduced a new callback, so rename the
function type with a generic name.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-09-08 16:15:17 +03:00
Mark Cave-Ayland 8b8849844f pci: add reserved slot check to do_pci_register_device()
Add a new slot_reserved_mask bitmask to PCIBus indicating whether or not each
PCI slot on the bus is reserved. Ensure that it is initialised to zero to
maintain the existing behaviour that all slots are available by default, and
add the additional check with appropriate error reporting to
do_pci_register_device().

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-09-08 16:15:17 +03:00
Marc-André Lureau c8389550de vmgenid: replace x-write-pointer-available hack
This compat property sole function is to prevent the device from being
instantiated. Instead of requiring an extra compat property, check if
fw_cfg has DMA enabled.

fw_cfg is a built-in device that is initialized very early by the
machine init code.  We have at least one other device that also
assumes fw_cfg_find() can be safely used on realize: pvpanic.

This has the additional benefit of handling other cases properly, like:

  $ qemu-system-x86_64 -device vmgenid -machine none
  qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
  $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.9 -global fw_cfg.dma_enabled=off
  qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
  $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.6 -global fw_cfg.dma_enabled=on
  [boots normally]

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Ben Warren <ben@skyportsystems.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-09-08 16:15:17 +03:00
Aleksandr Bezzubikov 226263fb5c hw/pci: add QEMU-specific PCI capability to the Generic PCI Express Root Port
To enable hotplugging of a newly created pcie-pci-bridge,
we need to tell firmware (e.g. SeaBIOS) to reserve
additional buses or IO/MEM/PREF space for pcie-root-port.
Additional bus reservation allows us to hotplug pcie-pci-bridge into this root port.
The number of buses and IO/MEM/PREF space to reserve are provided to the device via
a corresponding property, and to the firmware via new PCI capability.
The properties' default values are -1 to keep default behavior unchanged.

Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Tested-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-09-08 16:15:17 +03:00
Aleksandr Bezzubikov 70e1ee59bb hw/pci: introduce bridge-only vendor-specific capability to provide some hints to firmware
On PCI init PCI bridges may need some extra info about bus number,
IO, memory and prefetchable memory to reserve. QEMU can provide this
with a special vendor-specific PCI capability.

Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Tested-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-09-08 16:15:17 +03:00
Aleksandr Bezzubikov a35fe22655 hw/pci: introduce pcie-pci-bridge device
Introduce a new PCIExpress-to-PCI Bridge device,
which is a hot-pluggable PCI Express device and
supports devices hot-plug with SHPC.

This device is intended to replace the DMI-to-PCI Bridge.

Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Tested-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-09-08 16:15:17 +03:00
Marcel Apfelbaum a6fd5b0e05 pc: add 2.11 machine types
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-09-08 16:15:17 +03:00
Sam Bobroff fa98fbfcdf PPC: KVM: Support machine option to set VSMT mode
KVM now allows writing to KVM_CAP_PPC_SMT which has previously been
read only. Doing so causes KVM to act, for that VM, as if the host's
SMT mode was the given value. This is particularly important on Power
9 systems because their default value is 1, but they are able to
support values up to 8.

This patch introduces a way to control this capability via a new
machine property called VSMT ("Virtual SMT"). If the value is not set
on the command line a default is chosen that is, when possible,
compatible with legacy systems.

Note that the intialization of KVM_CAP_PPC_SMT has changed slightly
because it has changed (in KVM) from a global capability to a
VM-specific one. This won't cause a problem on older KVMs because VM
capabilities fall back to global ones.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:55 +10:00
BALATON Zoltan 3b09bb0fb9 ppc4xx_i2c: QOMify
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:55 +10:00
BALATON Zoltan 0453428047 ppc4xx: Make MAL emulation more generic
Allow MAL with more RX and TX channels as found in newer versions.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:55 +10:00
BALATON Zoltan 517284a771 ppc4xx: Move MAL from ppc405_uc to ppc4xx_devs
This device appears in other SoCs as well not just in 405 ones

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:55 +10:00
Sam Bobroff 2e886fb391 ppc: spapr: Make VCPU ID handling private to SPAPR
The concept of a VCPU ID that differs from the CPU's index
(cpu->cpu_index) exists only within SPAPR machines so, move the
functions ppc_get_vcpu_id() and ppc_get_cpu_by_vcpu_id() into spapr.c
and rename them appropriately.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:55 +10:00
Daniel Henrique Barboza 10f12e6450 hw/ppc: CAS reset on early device hotplug
This patch is a follow up on the discussions made in patch
"hw/ppc: disable hotplug before CAS is completed" that can be
found at [1].

At this moment, we do not support CPU/memory hotplug in early
boot stages, before CAS. When a hotplug occurs, the event is logged
in an internal RTAS event log queue and an IRQ pulse is fired. In
regular conditions, the guest handles the interrupt by executing
check_exception, fetching the generated hotplug event and enabling
the device for use.

In early boot, this IRQ isn't caught (SLOF does not handle hotplug
events), leaving the event in the rtas event log queue. If the guest
executes check_exception due to another hotplug event, the re-assertion
of the IRQ ends up de-queuing the first hotplug event as well. In short,
a device hotplugged before CAS is considered coldplugged by SLOF.
This leads to device misbehavior and, in some cases, guest kernel
Ooops when trying to unplug the device.

A proper fix would be to turn every device hotplugged before CAS
as a colplugged device. This is not trivial to do with the current
code base though - the FDT is written in the guest memory at
ppc_spapr_reset and can't be retrieved without adding extra state
(fdt_size for example) that will need to managed and migrated. Adding
the hotplugged DT in the middle of CAS negotiation via the updated DT
tree works with CPU devs, but panics the guest kernel at boot. Additional
analysis would be necessary for LMBs and PCI devices. There are
questions to be made in QEMU/SLOF/kernel level about how we can make
this change in a sustainable way.

With Linux guests, a fix would be the kernel executing check_exception
at boot time, de-queueing the events that happened in early boot and
processing them. However, even if/when the newer kernels start
fetching these events at boot time, we need to take care of older
kernels that won't be doing that.

This patch works around the situation by issuing a CAS reset if a hotplugged
device is detected during CAS:

- the DRC conditions that warrant a CAS reset is the same as those that
triggers a DRC migration - the DRC must have a device attached and
the DRC state is not equal to its ready_state. With that in mind, this
patch makes use of 'spapr_drc_needed' to determine if a CAS reset
is needed.

- In the middle of CAS negotiations, the function
'spapr_hotplugged_dev_before_cas' goes through all the DRCs to see
if there are any DRC that requires a reset, using spapr_drc_needed. If
that happens, returns '1' in 'spapr_h_cas_compose_response' which will set
spapr->cas_reboot to true, causing the machine to reboot.

No changes are made for coldplug devices.

[1] http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg02855.html

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:54 +10:00
Daniel Henrique Barboza 5625817423 hw/ppc: clear pending_events on machine reset
The sPAPR machine isn't clearing up the pending events QTAILQ on
machine reboot. This allows for unprocessed hotplug/epow events
to persist in the queue after reset and, when reasserting the IRQs in
check_exception later on, these will be being processed by the OS.

This patch implements a new function called 'spapr_clear_pending_events'
that clears up the pending_events QTAILQ. This helper is then called
inside ppc_spapr_reset to clear up the events queue, preventing
old/deprecated events from persisting after a reset.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08 09:30:54 +10:00