qemu-patch-raspberry4/hw
Keno Fischer fc78d5ee76 9pfs: Correctly handle cancelled requests
# Background

I was investigating spurious non-deterministic EINTR returns from
various 9p file system operations in a Linux guest served from the
qemu 9p server.

 ## EINTR, ERESTARTSYS and the linux kernel

When a signal arrives that the Linux kernel needs to deliver to user-space
while a given thread is blocked (in the 9p case waiting for a reply to its
request in 9p_client_rpc -> wait_event_interruptible), it asks whatever
driver is currently running to abort its current operation (in the 9p case
causing the submission of a TFLUSH message) and return to user space.
In these situations, the error message reported is generally ERESTARTSYS.
If the userspace processes specified SA_RESTART, this means that the
system call will get restarted upon completion of the signal handler
delivery (assuming the signal handler doesn't modify the process state
in complicated ways not relevant here). If SA_RESTART is not specified,
ERESTARTSYS gets translated to EINTR and user space is expected to handle
the restart itself.

 ## The 9p TFLUSH command

The 9p TFLUSH commands requests that the server abort an ongoing operation.
The man page [1] specifies:

```
If it recognizes oldtag as the tag of a pending transaction, it should
abort any pending response and discard that tag.
[...]
When the client sends a Tflush, it must wait to receive the corresponding
Rflush before reusing oldtag for subsequent messages. If a response to the
flushed request is received before the Rflush, the client must honor the
response as if it had not been flushed, since the completed request may
signify a state change in the server
```

In particular, this means that the server must not send a reply with the
orignal tag in response to the cancellation request, because the client is
obligated to interpret such a reply as a coincidental reply to the original
request.

 # The bug

When qemu receives a TFlush request, it sets the `cancelled` flag on the
relevant pdu. This flag is periodically checked, e.g. in
`v9fs_co_name_to_path`, and if set, the operation is aborted and the error
is set to EINTR. However, the server then violates the spec, by returning
to the client an Rerror response, rather than discarding the message
entirely. As a result, the client is required to assume that said Rerror
response is a result of the original request, not a result of the
cancellation and thus passes the EINTR error back to user space.
This is not the worst thing it could do, however as discussed above, the
correct error code would have been ERESTARTSYS, such that user space
programs with SA_RESTART set get correctly restarted upon completion of
the signal handler.
Instead, such programs get spurious EINTR results that they were not
expecting to handle.

It should be noted that there are plenty of user space programs that do not
set SA_RESTART and do not correctly handle EINTR either. However, that is
then a userspace bug. It should also be noted that this bug has been
mitigated by a recent commit to the Linux kernel [2], which essentially
prevents the kernel from sending Tflush requests unless the process is about
to die (in which case the process likely doesn't care about the response).
Nevertheless, for older kernels and to comply with the spec, I believe this
change is beneficial.

 # Implementation

The fix is fairly simple, just skipping notification of a reply if
the pdu was previously cancelled. We do however, also notify the transport
layer that we're doing this, so it can clean up any resources it may be
holding. I also added a new trace event to distinguish
operations that caused an error reply from those that were cancelled.

One complication is that we only omit sending the message on EINTR errors in
order to avoid confusing the rest of the code (which may assume that a
client knows about a fid if it sucessfully passed it off to pud_complete
without checking for cancellation status). This does mean that if the server
acts upon the cancellation flag, it always needs to set err to EINTR. I
believe this is true of the current code.

[1] https://9fans.github.io/plan9port/man/man9/flush.html
[2] https://github.com/torvalds/linux/commit/9523feac272ccad2ad8186ba4fcc891

Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, send a zero-sized reply instead of detaching the buffer]
Signed-off-by: Greg Kurz <groug@kaod.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2018-02-01 21:21:27 +01:00
..
9pfs 9pfs: Correctly handle cancelled requests 2018-02-01 21:21:27 +01:00
acpi nvdimm: add 'unarmed' option 2018-01-19 11:18:51 -02:00
adc maint: Fix macros with broken 'do/while(0); ' usage 2018-01-16 14:54:52 +01:00
alpha Merge remote-tracking branch 'origin/master' into HEAD 2018-01-11 22:03:50 +02:00
arm xlnx-zynqmp: Connect the IPI device to the ZynqMP SoC 2018-01-26 11:09:09 +01:00
audio Replace all occurances of __FUNCTION__ with __func__ 2018-01-22 09:46:18 +01:00
block Block layer patches 2018-01-24 22:55:57 +00:00
bt hw/bt: Replace fprintf(stderr, "*\n" with error_report() 2018-01-22 09:51:00 +01:00
char hw: convert the escc device to keycodemapdb 2018-01-29 09:30:25 +01:00
core linux-user: remove nmi.c and fw-path-provider.c 2018-01-23 14:20:52 +01:00
cpu hw: use "qemu/osdep.h" as first #include in source files 2017-12-18 17:07:02 +03:00
cris cris: use generic cpu_model parsing 2017-10-27 16:03:54 +02:00
display aarch64-softmmu.mak: Use an ARM specific config 2018-01-26 11:09:09 +01:00
dma aarch64-softmmu.mak: Use an ARM specific config 2018-01-26 11:09:09 +01:00
gpio Replace all occurances of __FUNCTION__ with __func__ 2018-01-22 09:46:18 +01:00
hppa hw/hppa: Implement DINO system board 2018-01-31 05:30:50 -08:00
i2c Replace all occurances of __FUNCTION__ with __func__ 2018-01-22 09:46:18 +01:00
i386 tpm: add CRB device 2018-01-29 14:22:50 -05:00
ide Pull request for various patches that have been reviewed and 2018-01-23 10:15:09 +00:00
input input: switch devices to keycodemapdb, bugfixes. 2018-01-29 15:52:27 +00:00
intc xlnx-zynqmp-ipi: Initial version of the Xilinx IPI device 2018-01-26 11:09:09 +01:00
ipack pci: Add INTERFACE_CONVENTIONAL_PCI_DEVICE to Conventional PCI devices 2017-10-15 05:54:43 +03:00
ipmi hw/ipmi: Replace fprintf(stderr, "*\n" with error_report() 2018-01-22 09:51:00 +01:00
isa hw/isa: Replace fprintf(stderr, "*\n" with error_report() 2018-01-22 09:51:00 +01:00
lm32 lm32: lm32_boards: use generic cpu_model parsing 2017-10-27 16:03:54 +02:00
m68k m68k: mcf5208: use generic cpu_model parsing 2017-10-27 16:03:54 +02:00
mem nvdimm: add 'unarmed' option 2018-01-19 11:18:51 -02:00
microblaze xlnx-zynqmp-pmu: Connect the IPI device to the PMU 2018-01-26 11:09:09 +01:00
mips Replace all occurances of __FUNCTION__ with __func__ 2018-01-22 09:46:18 +01:00
misc Replace all occurances of __FUNCTION__ with __func__ 2018-01-22 09:46:18 +01:00
moxie hw/moxie/moxiesim: Add support for loading a BIOS on moxiesim 2017-12-21 09:30:31 +01:00
net i.MX: Fix FEC/ENET receive funtions 2018-01-25 11:45:28 +00:00
nios2 nios2: remove duplicated includes (in code commented out) 2017-12-18 17:07:02 +03:00
nvram fw_cfg: fix memory corruption when all fw_cfg slots are used 2018-01-19 11:18:51 -02:00
openrisc openrisc: use generic cpu_model parsing 2017-10-27 16:03:54 +02:00
pci pci/shpc: Move function to generic header file 2018-01-18 21:52:38 +02:00
pci-bridge simba: rename PBMPCIBridge and QOM types to reflect simba naming 2018-01-24 19:19:50 +00:00
pci-host uninorth: convert to trace-events 2018-01-27 17:26:46 +11:00
pcmcia hw: Clean up includes 2016-01-29 15:07:25 +00:00
ppc target/ppc/spapr: Add H-Call H_GET_CPU_CHARACTERISTICS 2018-01-29 14:24:55 +11:00
s390x s390x: fix storage attributes migration for non-small guests 2018-01-22 11:04:52 +01:00
scsi usb-storage: Fix share-rw option parsing 2018-01-26 07:58:34 +01:00
sd sdhci: fix a NULL pointer dereference due to uninitialized AddresSpace object 2018-01-25 11:45:30 +00:00
sh4 pci: Rename root bus initialization functions for clarity 2017-12-05 19:13:45 +02:00
smbios Merge remote-tracking branch 'origin/master' into HEAD 2018-01-11 22:03:50 +02:00
sparc sun4m: remove include/hw/sparc/sun4m.h and all references to it 2018-01-09 21:48:20 +00:00
sparc64 sun4u: implement power device 2018-01-25 13:39:39 +00:00
ssi xilinx_spips: Correct usage of an uninitialized local variable 2018-01-25 11:45:30 +00:00
timer Replace all occurances of __FUNCTION__ with __func__ 2018-01-22 09:46:18 +01:00
tpm tpm: add CRB device 2018-01-29 14:22:50 -05:00
tricore tricore: use generic cpu_model parsing 2017-10-27 16:04:27 +02:00
unicore32 hw/unicore32: restrict hw addr defines to source file 2017-12-18 17:07:02 +03:00
usb usb-ccid: convert CCIDCardClass::exitfn() -> unrealize() 2018-01-26 07:59:33 +01:00
vfio Merge remote-tracking branch 'origin/master' into HEAD 2018-01-11 22:03:50 +02:00
virtio Revert "virtio: postpone the execution of event_notifier_cleanup function" 2018-01-24 19:20:19 +02:00
watchdog misc: drop old i386 dependency 2017-12-18 17:07:03 +03:00
xen xen: Add only xen-sysdev to dynamic sysbus device list 2018-01-19 11:18:51 -02:00
xenpv Replace all occurances of __FUNCTION__ with __func__ 2018-01-22 09:46:18 +01:00
xtensa target/xtensa: allow different default CPU for MMU/noMMU 2018-01-22 11:54:23 -08:00
Makefile.objs 9pfs: fix dependencies 2017-08-30 18:23:25 +02:00