Commit graph

66034 commits

Author SHA1 Message Date
Juan Quintela 7df2c7181a virtio: split virtio input host bits from virtio-pci
For consistency with other devices, rename
virtio_host_{initfn,pci_info} to virtio_input_host_{initfn,info}.

Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-17 21:10:57 -05:00
Juan Quintela ef7e7845b2 virtio: split vhost vsock bits from virtio-pci
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-17 21:10:57 -05:00
Yuri Benditovich d47e5e31c3 virtio-net: changed VIRTIO_NET_F_RSC_EXT to be 61
Allocated feature bit changed in spec draft per TC request.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-17 21:10:57 -05:00
Yuri Benditovich 2974e916df virtio-net: support RSC v4/v6 tcp traffic for Windows HCK
This commit adds implementation of RX packets
coalescing, compatible with requirements of Windows
Hardware compatibility kit.

The device enables feature VIRTIO_NET_F_RSC_EXT in
host features if it supports extended RSC functionality
as defined in the specification.
This feature requires at least one of VIRTIO_NET_F_GUEST_TSO4,
VIRTIO_NET_F_GUEST_TSO6. Windows guest driver acks
this feature only if VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
is also present.

If the guest driver acks VIRTIO_NET_F_RSC_EXT feature,
the device coalesces TCPv4 and TCPv6 packets (if
respective VIRTIO_NET_F_GUEST_TSO feature is on,
populates extended RSC information in virtio header
and sets VIRTIO_NET_HDR_F_RSC_INFO bit in header flags.
The device does not recalculate checksums in the coalesced
packet, so they are not valid.

In this case:
All the data packets in a tcp connection are cached
to a single buffer in every receive interval, and will
be sent out via a timer, the 'virtio_net_rsc_timeout'
controls the interval, this value may impact the
performance and response time of tcp connection,
50000(50us) is an experience value to gain a performance
improvement, since the whql test sends packets every 100us,
so '300000(300us)' passes the test case, it is the default
value as well, tune it via the command line parameter
'rsc_interval' within 'virtio-net-pci' device, for example,
to launch a guest with interval set as '500000':

'virtio-net-pci,netdev=hostnet1,bus=pci.0,id=net1,mac=00,
guest_rsc_ext=on,rsc_interval=500000'

The timer will only be triggered if the packets pool is not empty,
and it'll drain off all the cached packets.

'NetRscChain' is used to save the segments of IPv4/6 in a
VirtIONet device.

A new segment becomes a 'Candidate' as well as it passed sanity check,
the main handler of TCP includes TCP window update, duplicated
ACK check and the real data coalescing.

An 'Candidate' segment means:
1. Segment is within current window and the sequence is the expected one.
2. 'ACK' of the segment is in the valid window.

Sanity check includes:
1. Incorrect version in IP header
2. An IP options or IP fragment
3. Not a TCP packet
4. Sanity size check to prevent buffer overflow attack.
5. An ECN packet

Even though, there might more cases should be considered such as
ip identification other flags, while it breaks the test because
windows set it to the same even it's not a fragment.

Normally it includes 2 typical ways to handle a TCP control flag,
'bypass' and 'finalize', 'bypass' means should be sent out directly,
while 'finalize' means the packets should also be bypassed, but this
should be done after search for the same connection packets in the
pool and drain all of them out, this is to avoid out of order fragment.

All the 'SYN' packets will be bypassed since this always begin a new'
connection, other flags such 'URG/FIN/RST/CWR/ECE' will trigger a
finalization, because this normally happens upon a connection is going
to be closed, an 'URG' packet also finalize current coalescing unit.

Statistics can be used to monitor the basic coalescing status, the
'out of order' and 'out of window' means how many retransmitting packets,
thus describe the performance intuitively.

Difference between ip v4 and v6 processing:
 Fragment length in ipv4 header includes itself, while it's not
 included for ipv6, thus means ipv6 can carry a real 65535 payload.

Note that main goal of implementing this feature in software
is to create reference setup for certification tests. In such
setups guest migration is not required, so the coalesced packets
not yet delivered to the guest will be lost in case of migration.

Signed-off-by: Wei Xu <wexu@redhat.com>
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-17 21:10:57 -05:00
Igor Mammedov b137522c35 tests: acpi: use AcpiSdtTable::aml instead of AcpiSdtTable::header::signature
AcpiSdtTable::header::signature is the only remained field from
AcpiTableHeader structure used by tests. Instead of using packed
structure to access signature, just read it directly from table
blob and remove no longer used AcpiSdtTable::header / union and
keep only AcpiSdtTable::aml byte array.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-17 21:10:57 -05:00
Igor Mammedov b997a04a50 tests: acpi: squash sanitize_fadt_ptrs() into test_acpi_fadt_table()
some parts of sanitize_fadt_ptrs() do redundant job
  - locating FADT
  - checking original checksum

There is no need to do it as test_acpi_fadt_table() already does that,
so drop duplicate code and move remaining fixup code into
test_acpi_fadt_table().

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-17 21:10:57 -05:00
Igor Mammedov 3314778d88 tests: smbios: fetch whole table in one step instead of reading it step by step
replace a bunch of ACPI_READ_ARRAY/ACPI_READ_FIELD macro, that read
SMBIOS table field by field with one memread() to fetch whole table
at once and drop no longer used ACPI_READ_ARRAY/ACPI_READ_FIELD macro.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-17 21:10:57 -05:00
Igor Mammedov acee774b3d tests: acpi: reuse fetch_table() in vmgenid-test
Move fetch_table() into acpi-utils.c renaming it to acpi_fetch_table()
and reuse it in vmgenid-test that reads RSDT and then tables it references,
to find and parse VMGNEID SSDT.
While at it wrap RSDT referenced tables enumeration into FOREACH macro
(similar to what we do with QLIST_FOREACH & co) to reuse it with bios and
vmgenid tests.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-17 21:10:57 -05:00
Igor Mammedov 59f9c6cc01 tests: acpi: reuse fetch_table() for fetching FACS and DSDT
It allows to remove a bit more of code duplication and
reuse common utility to get ACPI tables from guest (modulo RSDP).

While at it, consolidate signature checking into fetch_table() instead
of open-codding it.

Considering FACS is special and doesn't have checksum, make checksum
validation optin, the same goes for signature verification.

PS:
By pure accident, patch also fixes FACS not being tested against
reference table since it wasn't added to data::tables list.
But we managed not to regress it since reference file was added
by commit
   (d25979380 acpi unit-test: add test files)
back in 2013

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-14 19:31:04 -05:00
Igor Mammedov 569afd8428 tests: acpi: simplify rsdt handling
RSDT referenced tables always have length at offset 4 and checksum at
offset 9, that's enough for reusing fetch_table() and replacing custom
RSDT fetching code with it.
While at it
 * merge fetch_rsdt_referenced_tables() into test_acpi_rsdt_table()
 * drop test_data::rsdt_table/rsdt_tables_addr/rsdt_tables_nr since
   we need this data only for duration of test_acpi_rsdt_table() to
   fetch other tables and use locals instead.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-14 19:31:04 -05:00
Igor Mammedov db57544988 tests: acpi: make sure FADT is fetched only once
Whole FADT is fetched as part of RSDT referenced tables in
fetch_rsdt_referenced_tables() albeit a bit later than when FADT
is partially parsed in fadt_fetch_facs_and_dsdt_ptrs().
However there is no reason for calling fetch_rsdt_referenced_tables()
so late, just move it right after we fetched RSDT and before
fadt_fetch_facs_and_dsdt_ptrs(). That way we can reuse whole FADT
fetched by fetch_rsdt_referenced_tables() and avoid duplicate
custom fields fetching in fadt_fetch_facs_and_dsdt_ptrs().

While at it rename fadt_fetch_facs_and_dsdt_ptrs() to
test_acpi_fadt_table(). The follow up patch will merge
fadt_fetch_facs_and_dsdt_ptrs() into test_acpi_rsdt_table(),
so that we would end up calling only test_acpi_FOO_table()
for consistency for tables that require special processing.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-14 19:31:04 -05:00
Igor Mammedov 81eb530db4 tests: acpi: use AcpiSdtTable::aml in consistent way
Currently in the 1st case we store table body fetched from QEMU in
AcpiSdtTable::aml minus it's header but in the 2nd case when we
load reference aml from disk, it holds whole blob including header.
More over in the 1st case, we read header in separate AcpiSdtTable::header
structure and then jump over hoops to fixup tables and combine both.

Treat AcpiSdtTable::aml as whole table blob approach in both cases
and when fetching tables from QEMU, first get table length and then
fetch whole table into AcpiSdtTable::aml instead if doing it field
by field.

As result
 * AcpiSdtTable::aml is used in consistent manner
 * FADT fixups use offsets from spec instead of being shifted by
   header length
 * calculating checksums and dumping blobs becomes simpler

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-14 19:31:04 -05:00
Li Qiang da93b82079 util: check the return value of fcntl in qemu_set_{block, nonblock}
Assert that the return value is not an error. This is like commit
7e6478e7d4 for qemu_set_cloexec.

Signed-off-by: Li Qiang <liq3ea@163.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-14 19:31:04 -05:00
Li Qiang b0aa77d36d vhost-user: fix ioeventfd_enabled
Currently, the vhost-user-test assumes the eventfd is available.
However it's not true because the accel is qtest. So the
'vhost_set_vring_file' will not add fds to the msg and the server
side of vhost-user-test will be broken. The bug is in 'ioeventfd_enabled'.
We should make this function return true if not using kvm accel.

Signed-off-by: Li Qiang <liq3ea@163.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-14 19:31:04 -05:00
Li Qiang 82248cd45e tests: vhost-user-test: initialize 'fd' in chr_read
Currently when processing VHOST_USER_SET_VRING_CALL
if 'qemu_chr_fe_get_msgfds' get no fd, the 'fd' will
be a stack uninitialized value.

Signed-off-by: Li Qiang <liq3ea@163.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-14 19:31:04 -05:00
Jian Wang a5390d9367 qemu: avoid memory leak while remove disk
Memset vhost_dev to zero in the vhost_dev_cleanup function.
This causes dev.vqs to be NULL, so that
vqs does not free up space when calling the g_free function.
This will result in a memory leak. But you can't release vqs
directly in the vhost_dev_cleanup function, because vhost_net
will also call this function, and vhost_net's vqs is assigned by array.
In order to solve this problem, we first save the pointer of vqs,
and release the space of vqs after vhost_dev_cleanup is called.

Signed-off-by: Jian Wang <wangjian161@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-14 19:31:04 -05:00
Thomas Huth 5a0e75f0a9 hw/misc/ivshmem: Remove deprecated "ivshmem" legacy device
It's been marked as deprecated in QEMU v2.6.0 already, so really nobody
should use the legacy "ivshmem" device anymore (but use ivshmem-plain or
ivshmem-doorbell instead). Time to remove the deprecated device now.

Belatedly also update a mention of the deprecated "ivshmem" in the file
docs/specs/ivshmem-spec.txt to "ivshmem-doorbell". Missed in commit
5400c02b90 ("ivshmem: Split ivshmem-plain, ivshmem-doorbell off ivshmem").

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-14 19:31:04 -05:00
Dongli Zhang 17323e8b68 msix: make pba size math more uniform
In msix_exclusive_bar the bar_pba_size is more than what the pba is
expected to have, although this never affects the bar size.

Specifically, the math in msix_init_exclusive_bar allocates too much
memory in some cases.

For example consider nentries = 8.  msix_exclusive_bar will give us
bar_pba_size = 16.  So 16 bytes.  However 8 bytes would be enough - this
is all that the spec requires.

So in practice bar_pba_size sometimes allocates an extra 8 bytes but
never more.

Since each MSIX entry size is 16 bytes, and since we make sure that
table+pba is a power of two, this always leaves a multiple of 16 bytes
for the PBA, so extra 8 bytes have no effect.

However, its ugly to have pba size temporary variable have an incorrect
value.  For consistency switch to the formula used in msix_init.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-14 19:31:04 -05:00
David Hildenbrand b9731850d7 pci/pcie: stop plug/unplug if the slot is locked
We better stop right away. For now, errors would be partially ignored
(so the guest might get informed or the device might get unplugged),
although actual plug/unplug will be reported as failed to the user.

While at it, properly move the check to the pre_plug handler for the plug
case, as we can test the slot state before the device will be realized.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-14 19:31:04 -05:00
Peter Maydell 89bd861c2b x86 queue, 2019-01-14
* Reenable RDTSCP support on Opteron_G[345] CPU models CPU models
   (Borislav Petkov)
 * host-phys-bits-limit option for better control of 5-level EPT
   (Eduardo Habkost)
 * Disable MPX support on named CPU models (Paolo Bonzini)
 * expose HV_CPUID_ENLIGHTMENT_INFO.EAX and HV_CPUID_NESTED_FEATURES.EAX
   as feature words (Vitaly Kuznetsov)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJcPJ3TAAoJECgHk2+YTcWmNlkP/1zlFjCo1em9RLGxeZrkzFna
 ugH/tZxqWQb3MfzNSsj51DGy2NB2WT2BaCtJKWjhzdDTLqveFaFCUKfhGyYETIHU
 OgNbUCjxBkv3vvWJ8c8w7pcc/atPZL7OCqIEbgJ/SwcSl0hF3bJrRJYYuA5dYeWx
 KewMRinFy3VEa+qlTGSAvpyp5C6XxIOPI6COQvu/CVlLmdlj+gURbaTtWHpZ3USU
 tTxSJLjbAPUlULAk57Wh2r5LpEgGzuBts9VosiQWzREz03rqYTk6JPjyxQKEasaz
 2lhjQWriRwcPNpgSSRfdnwMxlKisgLwWf/A9StoNjHWlzcTn6HR7HFP5O6YgKcEV
 EkAhu9UGeltqOHGJPReO6y5J8VHHj7FJZE5PfDE3mYKgmEsn+1O7gOr9amy5PCT9
 a7zozGHWOqJMb05XbB4CxENmgoNRKE9rMhwe5Jjc9dxrHOtgcXoUO9YNYvPE6ZyC
 6GxKdn0rAzZldwuFj8orZZeYLUlfoV0frNWfNKi8FdBad7z5HalVtB7jjaWRw6eI
 mOabTZPV6aaiyHgiR/YGBd3h9kWex9zJWxVDZMoRRFBNFSs8HFKzbVrwv/g6e83B
 2ZUrQ4MLI5cgU8zZ2XdTPx7SVV+Nxr6wWXtx55Sn9LB8IGMdEqZ06W+iZfZgF2PV
 miQR5IsbOapmIBssV1Mo
 =S08V
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ehabkost/tags/x86-next-pull-request' into staging

x86 queue, 2019-01-14

* Reenable RDTSCP support on Opteron_G[345] CPU models CPU models
  (Borislav Petkov)
* host-phys-bits-limit option for better control of 5-level EPT
  (Eduardo Habkost)
* Disable MPX support on named CPU models (Paolo Bonzini)
* expose HV_CPUID_ENLIGHTMENT_INFO.EAX and HV_CPUID_NESTED_FEATURES.EAX
  as feature words (Vitaly Kuznetsov)

# gpg: Signature made Mon 14 Jan 2019 14:33:55 GMT
# gpg:                using RSA key 2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/x86-next-pull-request:
  i386/kvm: add a comment explaining why .feat_names are commented out for Hyper-V feature bits
  x86: host-phys-bits-limit option
  target/i386: Disable MPX support on named CPU models
  target-i386: Reenable RDTSCP support on Opteron_G[345] CPU models CPU models
  i386/kvm: expose HV_CPUID_ENLIGHTMENT_INFO.EAX and HV_CPUID_NESTED_FEATURES.EAX as feature words

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-01-14 17:35:00 +00:00
Vitaly Kuznetsov abd5fc4c86 i386/kvm: add a comment explaining why .feat_names are commented out for Hyper-V feature bits
Hyper-V .feat_names are, unlike hardware features, commented out and it is
not obvious why we do that. Document the current status quo.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20181221141604.16935-1-vkuznets@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-01-14 12:28:44 -02:00
Eduardo Habkost 258fe08bd3 x86: host-phys-bits-limit option
Some downstream distributions of QEMU set host-phys-bits=on by
default.  This worked very well for most use cases, because
phys-bits really didn't have huge consequences. The only
difference was on the CPUID data seen by guests, and on the
handling of reserved bits.

This changed in KVM commit 855feb673640 ("KVM: MMU: Add 5 level
EPT & Shadow page table support").  Now choosing a large
phys-bits value for a VM has bigger impact: it will make KVM use
5-level EPT even when it's not really necessary.  This means
using the host phys-bits value may not be the best choice.

Management software could address this problem by manually
configuring phys-bits depending on the size of the VM and the
amount of MMIO address space required for hotplug.  But this is
not trivial to implement.

However, there's another workaround that would work for most
cases: keep using the host phys-bits value, but only if it's
smaller than 48.  This patch makes this possible by introducing a
new "-cpu" option: "host-phys-bits-limit".  Management software
or users can make sure they will always use 4-level EPT using:
"host-phys-bits=on,host-phys-bits-limit=48".

This behavior is still not enabled by default because QEMU
doesn't enable host-phys-bits=on by default.  But users,
management software, or downstream distributions may choose to
change their defaults using the new option.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20181211192527.13254-1-ehabkost@redhat.com>
[ehabkost: removed test code while some issues are addressed]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-01-14 12:23:36 -02:00
Paolo Bonzini ecb85fe48c target/i386: Disable MPX support on named CPU models
MPX support is being phased out by Intel; GCC has dropped it, Linux
is also going to do that.  Even though KVM will have special code
to support MPX after the kernel proper stops enabling it in XCR0,
we probably also want to deprecate that in a few years.  As a start,
do not enable it by default for any named CPU model starting with
the 4.0 machine types; this include Skylake, Icelake and Cascadelake.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20181220121100.21554-1-pbonzini@redhat.com>
Reviewed-by:   Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-01-14 12:23:36 -02:00
Borislav Petkov 483c6ad426 target-i386: Reenable RDTSCP support on Opteron_G[345] CPU models CPU models
The missing functionality was added ~3 years ago with the Linux commit

  46896c73c1a4 ("KVM: svm: add support for RDTSCP")

so reenable RDTSCP support on those CPU models.

Opteron_G2 - being family 15, model 6, doesn't have RDTSCP support
(the real hardware doesn't have it. K8 got RDTSCP support with the NPT
models, i.e., models >= 0x40).

Document the host's minimum required kernel version, while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Message-ID: <20181212200803.GG6653@zn.tnic>
[ehabkost: moved compat properties code to pc.c]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-01-14 12:23:36 -02:00
Vitaly Kuznetsov a2b107dbbd i386/kvm: expose HV_CPUID_ENLIGHTMENT_INFO.EAX and HV_CPUID_NESTED_FEATURES.EAX as feature words
It was found that QMP users of QEMU (e.g. libvirt) may need
HV_CPUID_ENLIGHTMENT_INFO.EAX/HV_CPUID_NESTED_FEATURES.EAX information. In
particular, 'hv_tlbflush' and 'hv_evmcs' enlightenments are only exposed in
HV_CPUID_ENLIGHTMENT_INFO.EAX.

HV_CPUID_NESTED_FEATURES.EAX is exposed for two reasons: convenience
(we don't need to export it from hyperv_handle_properties() and as
future-proof for Enlightened MSR-Bitmap, PV EPT invalidation and
direct virtual flush features.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20181126135958.20956-1-vkuznets@redhat.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-01-14 12:23:36 -02:00
Peter Maydell c9d18c1c15 Xen queue
* Xen PV backend 'qdevification'.
   Starting with xen_disk.
 * Performance improvements for xen-block.
 * Remove of the Xen PV domain builder.
 * bug fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQFOBAABCgA4FiEE+AwAYwjiLP2KkueYDPVXL9f7Va8FAlw8krkaHGFudGhvbnku
 cGVyYXJkQGNpdHJpeC5jb20ACgkQDPVXL9f7Va81vQf/XH0vIHytB0xM9qd06Svi
 axW6grfLfHipSIjQwdv8xeJjLJUQxZCV3Q1kg8sHDAioj9LVc0jAiJKMMUkQdnHT
 Ad98PbKrPNdEZJf+dydzFmibuehqD8OHL0Nu4U/k3+PKp/QjWJZvr7SspVxJPvz3
 K29n5HTjfA4QntahSEHKndmbQ9sxPd+Pz8+qZEXBLtogOW0xhK6WAVwkPG4XwhFz
 6wXySfuruSM0v5F3wkdA7FS1rmcdW/prty81MT7zg6h8hcYLo49pbXrRTuB6Psuz
 UWsbNhegevYWpxZAZZv9TvCY6d/5X733wDhENCTD8am+GBoDVV4Dh9YljTTjG4S0
 tA==
 =88n2
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/aperard/tags/pull-xen-20190114' into staging

Xen queue

* Xen PV backend 'qdevification'.
  Starting with xen_disk.
* Performance improvements for xen-block.
* Remove of the Xen PV domain builder.
* bug fixes.

# gpg: Signature made Mon 14 Jan 2019 13:46:33 GMT
# gpg:                using RSA key 0CF5572FD7FB55AF
# gpg: Good signature from "Anthony PERARD <anthony.perard@gmail.com>"
# gpg:                 aka "Anthony PERARD <anthony.perard@citrix.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5379 2F71 024C 600F 778A  7161 D8D5 7199 DF83 42C8
#      Subkey fingerprint: F80C 0063 08E2 2CFD 8A92  E798 0CF5 572F D7FB 55AF

* remotes/aperard/tags/pull-xen-20190114: (25 commits)
  xen-block: avoid repeated memory allocation
  xen-block: improve response latency
  xen-block: improve batching behaviour
  xen: Replace few mentions of xend by libxl
  Remove broken Xen PV domain builder
  xen: remove the legacy 'xen_disk' backend
  MAINTAINERS: add myself as a Xen maintainer
  xen: automatically create XenBlockDevice-s
  xen: add a mechanism to automatically create XenDevice-s...
  xen: add implementations of xen-block connect and disconnect functions...
  xen: purge 'blk' and 'ioreq' from function names in dataplane/xen-block.c
  xen: remove 'ioreq' struct/varable/field names from dataplane/xen-block.c
  xen: remove 'XenBlkDev' and 'blkdev' names from dataplane/xen-block
  xen: add header and build dataplane/xen-block.c
  xen: remove unnecessary code from dataplane/xen-block.c
  xen: duplicate xen_disk.c as basis of dataplane/xen-block.c
  xen: add event channel interface for XenDevice-s
  xen: add grant table interface for XenDevice-s
  xen: add xenstore watcher infrastructure
  xen: create xenstore areas for XenDevice-s
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-01-14 13:54:17 +00:00
Tim Smith c6025bd197 xen-block: avoid repeated memory allocation
The xen-block dataplane currently allocates memory to hold the data for
each request as that request is used, and frees it afterwards. Because
it requires page-aligned blocks, this interacts poorly with non-page-
aligned allocations and balloons the heap.

Instead, allocate the maximum possible buffer size required for the
protocol, which is BLKIF_MAX_SEGMENTS_PER_REQUEST (currently 11) pages
when the request structure is created, and keep that buffer until it is
destroyed. Since the requests are re-used via a free list, this should
actually improve memory usage.

Signed-off-by: Tim Smith <tim.smith@citrix.com>

Re-based and commit comment adjusted.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Tim Smith bfd0d63660 xen-block: improve response latency
If the I/O ring is full, the guest cannot send any more requests
until some responses are sent. Only sending all available responses
just before checking for new work does not leave much time for the
guest to supply new work, so this will cause stalls if the ring gets
full. Also, not completing reads as soon as possible adds latency
to the guest.

To alleviate that, complete IO requests as soon as they come back.
xen_block_send_response() already returns a value indicating whether
a notify should be sent, which is all the batching we need.

Signed-off-by: Tim Smith <tim.smith@citrix.com>

Re-based and commit comment adjusted.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Tim Smith 6de45f9109 xen-block: improve batching behaviour
When I/O consists of many small requests, performance is improved by
batching them together in a single io_submit() call. When there are
relatively few requests, the extra overhead is not worth it. This
introduces a check to start batching I/O requests via blk_io_plug()/
blk_io_unplug() in an amount proportional to the number which were
already in flight at the time we started reading the ring.

Signed-off-by: Tim Smith <tim.smith@citrix.com>

Re-based and commit comment adjusted.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Anthony PERARD 1077bcaccd xen: Replace few mentions of xend by libxl
xend have been replaced by libxenlight (libxl) for many Xen releases
now.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
2019-01-14 13:45:40 +00:00
Anthony PERARD 6d7c06c213 Remove broken Xen PV domain builder
It is broken since Xen 4.9 [1] and it will not build in Xen 4.12. Also,
it is not built by default since QEMU 2.6.

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg00313.html

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
2019-01-14 13:45:40 +00:00
Paul Durrant 19f87870ba xen: remove the legacy 'xen_disk' backend
This backend has now been replaced by the 'xen-qdisk' XenDevice.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant 19b845bda8 MAINTAINERS: add myself as a Xen maintainer
I have made many significant contributions to the Xen code in QEMU,
particularly the recent patches introducing a new PV device framework.
I intend to make further significant contributions, porting other PV back-
ends to the new framework with the intent of eventually removing the
legacy code. It therefore seems reasonable that I become a maintainer of
the Xen code.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony Perard <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant db9ff46eeb xen: automatically create XenBlockDevice-s
This patch adds create and destroy function for XenBlockDevice-s so that
they can be created automatically when the Xen toolstack instantiates a new
PV backend via xenstore. When the XenBlockDevice is created this way it is
also necessary to create a 'drive' which matches the configuration that the
Xen toolstack has written into xenstore. This is done by formulating the
parameters necessary for each 'blockdev' layer of the drive and then using
qmp_blockdev_add() to create the layers. Also, for compatibility with the
legacy 'xen_disk' implementation, an iothread is automatically created for
the new XenBlockDevice. This, like the driver layers, will be destroyed
after the XenBlockDevice is unrealized.

The legacy backend scan for 'qdisk' is removed by this patch, which makes
the 'xen_disk' code is redundant. The code will be removed by a subsequent
patch.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant a783f8ad4e xen: add a mechanism to automatically create XenDevice-s...
...that maintains compatibility with existing Xen toolstacks.

Xen toolstacks instantiate PV backends by simply writing information into
xenstore and expecting a backend implementation to be watching for this.

This patch adds a new 'xen-backend' module to allow individual XenDevice
implementations to register create and destroy functions. The creator
will be called when a tool-stack instantiates a new backend in this way,
and the destructor will then be called after the resulting XenDevice
object is unrealized.

To support this it is also necessary to add new watchers into the XenBus
implementation to handle enumeration of new backends and also destruction
of XenDevice-s when the toolstack sets the backend 'online' key to 0.

NOTE: This patch only adds the framework. A subsequent patch will add a
      creator function for xen-block devices.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant b6af8926fb xen: add implementations of xen-block connect and disconnect functions...
...and wire in the dataplane.

This patch adds the remaining code to make the xen-block XenDevice
functional. The parameters that a block frontend expects to find are
populated in the backend xenstore area, and the 'ring-ref' and
'event-channel' values specified in the frontend xenstore area are
mapped/bound and used to set up the dataplane.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant d4683cf952 xen: purge 'blk' and 'ioreq' from function names in dataplane/xen-block.c
This is a purely cosmetic patch that purges remaining use of 'blk' and
'ioreq' in local function names, and then makes sure all functions are
prefixed with 'xen_block_'.

No functional change.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant e7f5b5f841 xen: remove 'ioreq' struct/varable/field names from dataplane/xen-block.c
This is a purely cosmetic patch that purges the name 'ioreq' from struct,
variable and field names. (This name has been problematic for a long time
as 'ioreq' is the name used for generic I/O requests coming from Xen).
The patch replaces 'struct ioreq' with a new 'XenBlockRequest' type and
'ioreq' field/variable names with 'request', and then does necessary
fix-up to adhere to coding style.

Function names are not modified by this patch. They will be dealt with in
a subsequent patch.

No functional change.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant f3b604e31d xen: remove 'XenBlkDev' and 'blkdev' names from dataplane/xen-block
This is a purely cosmetic patch that substitutes the old 'struct XenBlkDev'
name with 'XenBlockDataPlane' and 'blkdev' field/variable names with
'dataplane', and then does necessary fix-up to adhere to coding style.

No functional change.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant fcab2b464e xen: add header and build dataplane/xen-block.c
This patch adds the transformations necessary to get dataplane/xen-block.c
to build against the new XenBus/XenDevice framework. MAINTAINERS is also
updated due to the introduction of dataplane/xen-block.h.

NOTE: Existing data structure names are retained for the moment. These will
      be modified by subsequent patches. A typedef for XenBlockDataPlane
      has been added to the header (based on the old struct XenBlkDev name
      for the moment) so that the old names don't need to leak out of the
      dataplane code.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant ca07280075 xen: remove unnecessary code from dataplane/xen-block.c
Not all of the code duplicated from xen_disk.c is required as the basis for
the new dataplane implementation so this patch removes extraneous code,
along with the legacy #includes and calls to the legacy xen_pv_printf()
function. Error messages are changed to be reported using error_report().

NOTE: The code is still not yet built. Further transformations will be
      required to make it correctly interface to the new XenBus/XenDevice
      framework. They will be delivered in a subsequent patch.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant 4ea7d1a7f1 xen: duplicate xen_disk.c as basis of dataplane/xen-block.c
The new xen-block XenDevice implementation requires the same core
dataplane as the legacy xen_disk implementation it will eventually replace.
This patch therefore copies the legacy xen_disk.c source module into a new
dataplane/xen-block.c source module as the basis for the new dataplane and
adjusts the MAINTAINERS file accordingly.

NOTE: The duplicated code is not yet built. It is simply put into place by
      this patch (just fixing style violations) such that the
      modifications that will need to be made to the code are not
      conflated with code movement, thus making review harder.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant a3d669c8bd xen: add event channel interface for XenDevice-s
The legacy PV backend infrastructure provides functions to bind, unbind
and send notifications to event channnels. Similar functionality will be
required by XenDevice implementations so this patch adds the necessary
support.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>

Patch squashed with:

Patch "xen: add event channel interface for XenDevice-s" makes use of
the type xenevtchn_port_or_error_t, but this isn't avaiable before Xen
4.7. Also the function xen_device_bind_event_channel assign the return
value of xenevtchn_bind_interdomain to channel->local_port but check the
result for error with xendev->local_port.

Fix by:
- removing local_port from struct XenDevice as it isn't use anywere.
- adding a compatibility typedef for xenevtchn_port_or_error_t for Xen
  4.6 and earlier.

As extra, replace the type of XenEventChannel->local_port by
evtchn_port_t.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant 4b34b5b140 xen: add grant table interface for XenDevice-s
The legacy PV backend infrastructure provides functions to map, unmap and
copy pages granted by frontends. Similar functionality will be required
by XenDevice implementations so this patch adds the necessary support.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant 82a29e3048 xen: add xenstore watcher infrastructure
A Xen PV frontend communicates its state to the PV backend by writing to
the 'state' key in the frontend area in xenstore. It is therefore
necessary for a XenDevice implementation to be notified whenever the
value of this key changes.

This patch adds code to do this as follows:

- an 'fd handler' is registered on the libxenstore handle which will be
  triggered whenever a 'watch' event occurs
- primitives are added to xen-bus-helper to add or remove watch events
- a list of Notifier objects is added to XenBus to provide a mechanism
  to call the appropriate 'watch handler' when its associated event
  occurs

The xen-block implementation is extended with a 'frontend_changed' method,
which calls as-yet stub 'connect' and 'disconnect' functions when the
relevant frontend state transitions occur. A subsequent patch will supply
a full implementation for these functions.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant 094a22399f xen: create xenstore areas for XenDevice-s
This patch adds a new source module, xen-bus-helper.c, which builds on
basic libxenstore primitives to provide functions to create (setting
permissions appropriately) and destroy xenstore areas, and functions to
'printf' and 'scanf' nodes therein. The main xen-bus code then uses
these primitives [1] to initialize and destroy the frontend and backend
areas for a XenDevice during realize and unrealize respectively.

The 'xen-block' implementation is extended with a 'get_name' method that
returns the VBD number. This number is required to 'name' the xenstore
areas.

NOTE: An exit handler is also added to make sure the xenstore areas are
      cleaned up if QEMU terminates without devices being unrealized.

[1] The 'scanf' functions are actually not yet needed, but they will be
    needed by code delivered in subsequent patches.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant 1a72d9ae31 xen: introduce 'xen-block', 'xen-disk' and 'xen-cdrom'
This patch adds new XenDevice-s: 'xen-disk' and 'xen-cdrom', both derived
from a common 'xen-block' parent type. These will eventually replace the
'xen_disk' (note the underscore rather than hyphen) legacy PV backend but
it is illustrative to build up the implementation incrementally, along with
the XenBus/XenDevice framework. Subsequent patches will therefore add to
these devices' implementation as new features are added to the framework.

After this patch has been applied it is possible to instantiate new
'xen-disk' or 'xen-cdrom' devices with a single 'vdev' parameter, which
accepts values adhering to the Xen VBD naming scheme [1]. For example, a
command-line instantiation of a xen-disk can be done with an argument
similar to the following:

-device xen-disk,vdev=hda

The implementation of the vdev parameter formulates the appropriate VBD
number for use in the PV protocol.

[1] https://xenbits.xen.org/docs/unstable/man/xen-vbd-interface.7.html

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant 108f7bba15 xen: introduce new 'XenBus' and 'XenDevice' object hierarchy
This patch adds the basic boilerplate for a 'XenBus' object that will act
as a parent to 'XenDevice' PV backends.
A new 'XenBridge' object is also added to connect XenBus to the system bus.

The XenBus object is instantiated by a new xen_bus_init() function called
from the same sites as the legacy xen_be_init() function.

Subsequent patches will flesh-out the functionality of these objects.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Paul Durrant 2d0ed5e642 xen: re-name XenDevice to XenLegacyDevice...
...and xen_backend.h to xen-legacy-backend.h

Rather than attempting to convert the existing backend infrastructure to
be QOM compliant (which would be hard to do in an incremental fashion),
subsequent patches will introduce a completely new framework for Xen PV
backends. Hence it is necessary to re-name parts of existing code to avoid
name clashes. The re-named 'legacy' infrastructure will be removed once all
backends have been ported to the new framework.

This patch is purely cosmetic. No functional change.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00
Zhao Yan 92dbfcc6d4 xen/pt: allow passthrough of devices with bogus interrupt pin
For some pci device, even its PCI_INTERRUPT_PIN is not 0, it actually
doesn't support INTx mode, so its machine irq read from host sysfs is 0.
In that case, report PCI_INTERRUPT_PIN as 0 to guest and let passthrough
continue.

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Zhao Yan <yan.y.zhao@intel.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2019-01-14 13:45:40 +00:00