peter/qemu - QEMU hacking for Peter

Age	Commit message (Collapse)	Author	Files	Lines
2017-04-18	block: Drain BH in bdrv_drained_begin	Fam Zheng	1	-8/+14
	During block job completion, nothing is preventing block_job_defer_to_main_loop_bh from being called in a nested aio_poll(), which is a trouble, such as in this code path: qmp_block_commit commit_active_start bdrv_reopen bdrv_reopen_multiple bdrv_reopen_prepare bdrv_flush aio_poll aio_bh_poll aio_bh_call block_job_defer_to_main_loop_bh stream_complete bdrv_reopen block_job_defer_to_main_loop_bh is the last step of the stream job, which should have been "paused" by the bdrv_drained_begin/end in bdrv_reopen_multiple, but it is not done because it's in the form of a main loop BH. Similar to why block jobs should be paused between drained_begin and drained_end, BHs they schedule must be excluded as well. To achieve this, this patch forces draining the BH in BDRV_POLL_WHILE. As a side effect this fixes a hang in block_job_detach_aio_context during system_reset when a block job is ready: #0 0x0000555555aa79f3 in bdrv_drain_recurse #1 0x0000555555aa825d in bdrv_drained_begin #2 0x0000555555aa8449 in bdrv_drain #3 0x0000555555a9c356 in blk_drain #4 0x0000555555aa3cfd in mirror_drain #5 0x0000555555a66e11 in block_job_detach_aio_context #6 0x0000555555a62f4d in bdrv_detach_aio_context #7 0x0000555555a63116 in bdrv_set_aio_context #8 0x0000555555a9d326 in blk_set_aio_context #9 0x00005555557e38da in virtio_blk_data_plane_stop #10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd #11 0x00005555559fa49b in virtio_bus_stop_ioeventfd #12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd #13 0x00005555559f6a18 in virtio_pci_reset #14 0x00005555559139a9 in qdev_reset_one #15 0x0000555555916738 in qbus_walk_children #16 0x0000555555913318 in qdev_walk_children #17 0x0000555555916738 in qbus_walk_children #18 0x00005555559168ca in qemu_devices_reset #19 0x000055555581fcbb in pc_machine_reset #20 0x00005555558a4d96 in qemu_system_reset #21 0x000055555577157a in main_loop_should_exit #22 0x000055555577157a in main_loop #23 0x000055555577157a in main The rationale is that the loop in block_job_detach_aio_context cannot make any progress in pausing/completing the job, because bs->in_flight is 0, so bdrv_drain doesn't process the block_job_defer_to_main_loop BH. With this patch, it does. Reported-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170418143044.12187-3-famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Tested-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2017-04-11	block: Introduce bdrv_coroutine_enter	Fam Zheng	1	-0/+5
	Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2017-04-11	async: Introduce aio_co_enter	Fam Zheng	1	-0/+9
	They start the coroutine on the specified context. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2017-04-11	coroutine: Extract qemu_aio_coroutine_enter	Fam Zheng	1	-0/+5
	It's a variant of qemu_coroutine_enter with an explicit AioContext parameter. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2017-04-11	block: Make bdrv_parent_drained_begin/end public	Fam Zheng	1	-0/+16
	Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2017-04-10	Merge remote-tracking branch ↵	Peter Maydell	2	-0/+2
	'remotes/stsquad/tags/pull-mttcg-fixups-for-rc2-100417-1' into staging Final icount and misc MTTCG fixes for 2.9 Minor differences from: Message-Id: <20170405132503.32125-1-alex.bennee@linaro.org> - dropped new feature patches - last minute typo fix from Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> # gpg: Signature made Mon 10 Apr 2017 11:38:10 BST # gpg: using RSA key 0xFBD0DB095A9E2A44 # gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" # Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44 * remotes/stsquad/tags/pull-mttcg-fixups-for-rc2-100417-1: replay: assert time only goes forward cpus: call cpu_update_icount on read cpu-exec: update icount after each TB_EXIT cpus: introduce cpu_update_icount helper cpus: don't credit executed instructions before they have run cpus: move icount preparation out of tcg_exec_cpu cpus: check cpu->running in cpu_get_icount_raw() cpus: remove icount handling from qemu_tcg_cpu_thread_fn target/i386/misc_helper: wrap BQL around another IRQ generator cpus: fix wrong define name scripts/qemugdb/mtree.py: fix up mtree dump Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-10	cpus: introduce cpu_update_icount helper	Alex Bennée	1	-0/+1
	By holding off updates to timer_state.qemu_icount we can run into trouble when the non-vCPU thread needs to know the time. This helper ensures we atomically update timers_state.qemu_icount based on what has been currently executed. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
2017-04-10	cpus: don't credit executed instructions before they have run	Alex Bennée	1	-0/+1
	Outside of the vCPU thread icount time will only be tracked against timers_state.qemu_icount. We no longer credit cycles until they have completed the run. Inside the vCPU thread we adjust for passage of time by looking at how many have run so far. This is only valid inside the vCPU thread while it is running. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
2017-04-07	block: Ignore guest dev permissions during incoming migration	Kevin Wolf	1	-0/+2
	Usually guest devices don't like other writers to the same image, so they use blk_set_perm() to prevent this from happening. In the migration phase before the VM is actually running, though, they don't have a problem with writes to the image. On the other hand, storage migration needs to be able to write to the image in this phase, so the restrictive blk_set_perm() call of qdev devices breaks it. This patch flags all BlockBackends with a qdev device as blk->disable_perm during incoming migration, which means that the requested permissions are stored in the BlockBackend, but not actually applied to its root node yet. Once migration has finished and the VM should be resumed, the permissions are applied. If they cannot be applied (e.g. because the NBD server used for block migration hasn't been shut down), resuming the VM fails. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Kashyap Chamarthy <kchamart@redhat.com>
2017-04-05	tco: do not generate an NMI	Paolo Bonzini	1	-1/+0
	This behavior is not indicated in the datasheet and can confuse the OS. The TCO can trap NMIs from SERR# or IOCHK# and convert them to SMIs; but any other TCO event is either delivered as an SMI or completely disabled. Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-04	Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging	Peter Maydell	1	-6/+4
	* MemoryRegionCache revert * glib optimization workaround * fix "info lapic" segfault on isapc * fix QIOChannel memory leak # gpg: Signature made Mon 03 Apr 2017 18:17:00 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: main-loop: Acquire main_context lock around os_host_main_loop_wait. exec: revert MemoryRegionCache nbd: fix memory leak on socket_connect failed ipmi: Fix macro issues target-i386: fix "info lapic" segfault on isapc iscsi: drop unused IscsiAIOCB.qiov field Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-03	sockets: New helper socket_address_crumple()	Markus Armbruster	1	-0/+11
	SocketAddress is a simple union, and simple unions are awkward: they have their variant members wrapped in a "data" object on the wire, and require additional indirections in C. I intend to limit its use to existing external interfaces. New ones should use SocketAddressFlat. I further intend to convert all internal interfaces to SocketAddressFlat. This helper should go away then. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 1490895797-29094-8-git-send-email-armbru@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-04-03	exec: revert MemoryRegionCache	Paolo Bonzini	1	-6/+4
	MemoryRegionCache did not know about virtio support for IOMMUs (because the two features were developed at the same time). Revert MemoryRegionCache to "normal" address_space_* operations for 2.9, as it is simpler than undoing the virtio patches. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-30	vhost: generalize iommu memory region	Jason Wang	1	-0/+11
	We assumes the iommu_ops were attached to the root region of address space. This may not be true for all kinds of IOMMU implementation and especially after commit 3716d5902d74 ("pci: introduce a bus master container"). So fix this by not assuming as->root has iommu_ops, instead depending on the regions reported by memory listener through: - register a memory listener to dma_as - during region_add, if it's a region of IOMMU, register a specific IOMMU notifier, and store all notifiers in a list. - during region_del, compare and delete the IOMMU notifier from the list This is also a must for making vhost device IOTLB works for all types of IOMMUs. Note, since we register one notifier during each .region_add, the IOTLB may be flushed more than one times, this is suboptimal and could be optimized in the future. Reported-by: Maxime Coquelin <maxime.coquelin@redhat.com> Fixes: 3716d5902d74 ("pci: introduce a bus master container") Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-03-30	Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170329' ↵	Peter Maydell	1	-0/+1
	into staging ppc patch queue for 2017-03-29 Two more bugfixes of sufficient severity to warrant going into 2.9. # gpg: Signature made Wed 29 Mar 2017 04:33:19 BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.9-20170329: spapr: fix memory hot-unplugging spapr: fix buffer-overflow Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-29	spapr: fix memory hot-unplugging	Laurent Vivier	1	-0/+1
	If, once the kernel has booted, we try to remove a memory hotplugged while the kernel was not started, QEMU crashes on an assert: qemu-system-ppc64: hw/virtio/vhost.c:651: vhost_commit: Assertion `r >= 0' failed. ... #4 in vhost_commit #5 in memory_region_transaction_commit #6 in pc_dimm_memory_unplug #7 in spapr_memory_unplug #8 spapr_machine_device_unplug #9 in hotplug_handler_unplug #10 in spapr_lmb_release #11 in detach #12 in set_allocation_state #13 in rtas_set_indicator ... If we take a closer look to the guest kernel log, we can see when we try to unplug the memory: pseries-hotplug-mem: Attempting to hot-add 4 LMB(s) What happens: 1- The kernel has ignored the memory hotplug event because it was not started when it was generated. 2- When we hot-unplug the memory, QEMU starts to remove the memory, generates an hot-unplug event, and signals the kernel of the incoming new event 3- as the kernel is started, on the QEMU signal, it reads the event list, decodes the hotplug event and tries to finish the hotplugging. 4- QEMU receive the the hotplug notification while it is trying to hot-unplug the memory. This moves the memory DRC to an invalid state This patch prevents this by not allowing to set the allocation state to USABLE while the DRC is awaiting release. RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382 Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-29	virtio: fix vring_align() on 64-bit windows	Andrew Baumann	1	-1/+1
	long is 32-bits on 64-bit windows, which caused the top half of the address to be truncated; this patch changes it to use the QEMU_ALIGN_UP macro which does not suffer the same problem Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2017-03-27	Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging	Peter Maydell	3	-7/+16
	* MTTCG fix for win32 * virtio-scsi assertion failure * mem-prealloc coverity fix * x86 migration revert which requires more thought * x86 instruction limit (avoids >2 page translation blocks) * nbd dead code cleanup * small memory.c logic fix # gpg: Signature made Mon 27 Mar 2017 17:03:04 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: scsi-generic: Fill in opt_xfer_len in INQUIRY reply if it is zero Revert "apic: save apic_delivered flag" nbd: drop unused NBDClientSession.is_unix field win32: replace custom mutex and condition variable with native primitives mem-prealloc: fix sysconf(_SC_NPROCESSORS_ONLN) failure case. tcg/i386: Check the size of instruction being translated virtio-scsi: Fix acquire/release in dataplane handlers virtio-scsi: Make virtio_scsi_acquire/release public clear pending status before calling memory commit Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-27	Revert "apic: save apic_delivered flag"	Paolo Bonzini	1	-2/+0
	This reverts commit 07bfa354772f2de67008dc66c201b627acff0106. The global variable is only read as part of a apic_reset_irq_delivered(); qemu_irq_raise(s->irq); if (!apic_get_irq_delivered()) { sequence, so the value never matters at migration time. Reported-by: Dr. David Alan Gilbert <dglibert@redhat.com> Cc: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-27	win32: replace custom mutex and condition variable with native primitives	Andrey Shedel	1	-5/+2
	The multithreaded TCG implementation exposed deadlocks in the win32 condition variables: as implemented, qemu_cond_broadcast waited on receivers, whereas the pthreads API it was intended to emulate does not. This was causing a deadlock because broadcast was called while holding the IO lock, as well as all possible waiters blocked on the same lock. This patch replaces all the custom synchronisation code for mutexes and condition variables with native Windows primitives (SRWlocks and condition variables) with the same semantics as their POSIX equivalents. To enable that, it requires a Windows Vista or newer host OS. Signed-off-by: Andrey Shedel <ashedel@microsoft.com> [AB: edited commit message] Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com> Message-Id: <20170324220141.10104-1-Andrew.Baumann@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-27	virtio-input: fix eventq batching	Ladi Prosek	1	-1/+4
	virtio_input_send buffers input events until it sees a SYNC. Then it either sends or drops the entire batch, depending on whether eventq has enough space available. The case to avoid here is partial sends where only part of the batch would get to the guest. Using virtqueue_get_avail_bytes to check the state of eventq was not correct. The queue may have a smaller number of larger buffers available so bytes may be enough but the batch would still not be possible to send, leading to the "Huh? No vq elem available" error. Instead of checking available bytes, this patch optimistically pops buffers from the queue and puts them back in case it runs out of space and the batch needs to be dropped. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Message-id: 1490365490-4854-3-git-send-email-lprosek@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2017-03-24	virtio-scsi: Make virtio_scsi_acquire/release public	Fam Zheng	1	-0/+14
	They will be used in virtio-scsi-dataplane.c as well, so move them to header. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170317061447.16243-2-famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-23	Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170323' ↵	Peter Maydell	1	-0/+1
	into staging ppc patch queue for 2017-03-23 Just a single bugfix in this batch. It's not strictly in ppc code, though it's for the pseries machine's benefit. Eduardo suggested it go through my tree however. # gpg: Signature made Thu 23 Mar 2017 10:09:17 GMT # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.9-20170323: numa,spapr: align default numa node memory size to 256MB Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-23	Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging	Peter Maydell	1	-0/+8
	# gpg: Signature made Wed 22 Mar 2017 17:28:56 GMT # gpg: using RSA key 0xBDBE7B27C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" # Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057 * remotes/cody/tags/block-pull-request: blockjob: add devops to blockjob backends block-backend: add drained_begin / drained_end ops blockjob: add block_job_start_shim blockjob: avoid recursive AioContext locking Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-22	block-backend: add drained_begin / drained_end ops	John Snow	1	-0/+8
	Allow block backends to forward drain requests to their devices/users. The initial intended purpose for this patch is to allow BBs to forward requests along to BlockJobs, which will want to pause if their associated BB has entered a drained region. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170316212351.13797-3-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-03-22	hw/acpi/vmgenid: prevent more than one vmgenid device	Laszlo Ersek	1	-0/+1
	A system with multiple VMGENID devices is undefined in the VMGENID spec by omission. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Ben Warren <ben@skyportsystems.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2017-03-22	hw/acpi/vmgenid: prevent device realization on pre-2.5 machine types	Laszlo Ersek	2	-0/+5
	The WRITE_POINTER linker/loader command that underlies VMGENID depends on commit baf2d5bfbac0 ("fw-cfg: support writeable blobs", 2017-01-12), which in turn depends on fw_cfg DMA. DMA for fw_cfg is enabled in 2.5+ machine types only (see commit e6915b5f3a87, "fw_cfg: unbreak migration compatibility for 2.4 and earlier machines", 2016-02-18). Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Ben Warren <ben@skyportsystems.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Ben Warren <ben@skyportsystems.com <mailto:ben@skyportsystems.com>> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2017-03-22	numa,spapr: align default numa node memory size to 256MB	Laurent Vivier	1	-0/+1
	Since commit 224245b ("spapr: Add LMB DR connectors"), NUMA node memory size must be aligned to 256MB (SPAPR_MEMORY_BLOCK_SIZE). But when "-numa" option is provided without "mem" parameter, the memory is equally divided between nodes, but 8MB aligned. This can be not valid for pseries. In that case we can have: $ ./ppc64-softmmu/qemu-system-ppc64 -m 4G -numa node -numa node -numa node qemu-system-ppc64: Node 0 memory size 0x55000000 is not aligned to 256 MiB With this patch, we have: (qemu) info numa 3 nodes node 0 cpus: 0 node 0 size: 1280 MB node 1 cpus: node 1 size: 1280 MB node 2 cpus: node 2 size: 1536 MB Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-19	qemu-ga: obey LISTEN_PID when using systemd socket activation	Paolo Bonzini	1	-0/+26
	qemu-ga's socket activation support was not obeying the LISTEN_PID environment variable, which avoids that a process uses a socket-activation file descriptor meant for its parent. Mess can for example ensue if a process forks a children before consuming the socket-activation file descriptor and therefore setting O_CLOEXEC on it. Luckily, qemu-nbd also got socket activation code, and its copy does support LISTEN_PID. Some extra fixups are needed to ensure that the code can be used for both, but that's what this patch does. The main change is to replace get_listen_fds's "consume" argument with the FIRST_SOCKET_ACTIVATION_FD macro from the qemu-nbd code. Cc: "Richard W.M. Jones" <rjones@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-17	block: Always call bdrv_child_check_perm first	Fam Zheng	1	-4/+0
	bdrv_child_set_perm alone is not very usable because the caller must call bdrv_child_check_perm first. This is already encapsulated conveniently in bdrv_child_try_set_perm, so remove the other prototypes from the header and fix the one wrong caller, block/mirror.c. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-03-16	Merge remote-tracking branch 'remotes/kraxel/tags/pull-cirrus-20170316-1' ↵	Peter Maydell	2	-7/+8
	into staging cirrus: blitter fixes. # gpg: Signature made Thu 16 Mar 2017 09:05:22 GMT # gpg: using RSA key 0x4CB6D8EED3E87138 # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" # gpg: aka "Gerd Hoffmann <gerd@kraxel.org>" # gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>" # Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138 * remotes/kraxel/tags/pull-cirrus-20170316-1: cirrus: stop passing around src pointers in the blitter cirrus: stop passing around dst pointers in the blitter cirrus: fix cirrus_invalidate_region cirrus: add option to disable blitter cirrus: switch to 4 MB video memory by default cirrus/vnc: zap bitblit support from console code. fix :cirrus_vga fix OOB read case qemu Segmentation fault # Conflicts: # include/hw/compat.h Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-16	Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20170316' ↵	Peter Maydell	2	-1/+5
	into staging migration/next for 20170316 # gpg: Signature made Thu 16 Mar 2017 08:21:51 GMT # gpg: using RSA key 0xF487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20170316: postcopy: Check for shared memory RAMBlocks: qemu_ram_is_shared vmstate: fix failed iotests case 68 and 91 migration/block: Avoid invoking blk_drain too frequently migration: use "" as the default for tls-creds/hostname Change the method to calculate dirty-pages-rate Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-16	Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging	Peter Maydell	3	-0/+23
	virtio, pci: fixes More fixes missed in the previous pull request. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Thu 16 Mar 2017 02:29:49 GMT # gpg: using RSA key 0x281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: virtio-serial-bus: Delete timer from list before free it hw/virtio: fix Power Management Control Register for PCI Express virtio devices hw/virtio: fix Link Control Register for PCI Express virtio devices hw/virtio: fix error enabling flags in Device Control register hw/pcie: fix Extended Configuration Space for devices with no Extended Capabilities Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-16	RAMBlocks: qemu_ram_is_shared	Dr. David Alan Gilbert	1	-0/+1
	Provide a helper to say whether a RAMBlock was created as a shared mapping. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-03-16	cirrus: switch to 4 MB video memory by default	Gerd Hoffmann	1	-0/+8
	Quoting cirrus source code: Follow real hardware, cirrus card emulated has 4 MB video memory. Also accept 8 MB/16 MB for backward compatibility. So just use 4MB by default. We decided to leave that at 8MB by default a while ago, for live migration compatibility reasons. But we have compat properties to handle that, so that isn't a compeling reason. This also removes some sanity check inconsistencies in the cirrus code. Some places check against the allocated video memory, some places check against the 4MB physical hardware has. Guest code can trigger asserts because of that. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Message-id: 1489494514-15606-1-git-send-email-kraxel@redhat.com
2017-03-16	cirrus/vnc: zap bitblit support from console code.	Gerd Hoffmann	1	-7/+0
	There is a special code path (dpy_gfx_copy) to allow graphic emulation notify user interface code about bitblit operations carryed out by guests. It is supported by cirrus and vnc server. The intended purpose is to optimize display scrolls and just send over the scroll op instead of a full display update. This is rarely used these days though because modern guests simply don't use the cirrus blitter any more. Any linux guest using the cirrus drm driver doesn't. Any windows guest newer than winxp doesn't ship with a cirrus driver any more and thus uses the cirrus as simple framebuffer. So this code tends to bitrot and bugs can go unnoticed for a long time. See for example commit "3e10c3e vnc: fix qemu crash because of SIGSEGV" which fixes a bug lingering in the code for almost a year, added by commit "c7628bf vnc: only alloc server surface with clients connected". Also the vnc server will throttle the frame rate in case it figures the network can't keep up (send buffers are full). This doesn't work with dpy_gfx_copy, for any copy operation sent to the vnc client we have to send all outstanding updates beforehand, otherwise the vnc client might run the client side blit on outdated data and thereby corrupt the display. So this dpy_gfx_copy "optimization" might even make things worse on slow network links. Lets kill it once for all. Oh, and one more reason: Turns out (after writing the patch) we have a security bug in that code path ... Fixes: CVE-2016-9603 Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Message-id: 1489494419-14340-1-git-send-email-kraxel@redhat.com
2017-03-16	Change the method to calculate dirty-pages-rate	Chao Fan	1	-1/+4
	In function cpu_physical_memory_sync_dirty_bitmap, file include/exec/ram_addr.h: if (src[idx][offset]) { unsigned long bits = atomic_xchg(&src[idx][offset], 0); unsigned long new_dirty; new_dirty = ~dest[k]; dest[k] \|= bits; new_dirty &= bits; num_dirty += ctpopl(new_dirty); } After these codes executed, only the pages not dirtied in bitmap(dest), but dirtied in dirty_memory[DIRTY_MEMORY_MIGRATION] will be calculated. For example: When ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] = 0b00001111, and atomic_rcu_read(&migration_bitmap_rcu)->bmap = 0b00000011, the new_dirty will be 0b00001100, and this function will return 2 but not 4 which is expected. the dirty pages in dirty_memory[DIRTY_MEMORY_MIGRATION] are all new, so these should be calculated also. Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-03-15	ide: core: add cleanup function	Li Qiang	1	-0/+1
	As the pci ahci can be hotplug and unplug, in the ahci unrealize function it should free all the resource once allocated in the realized function. This patch add ide_exit to free the resource. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Message-id: 1488449293-80280-3-git-send-email-liqiang6-s@360.cn Signed-off-by: John Snow <jsnow@redhat.com>
2017-03-16	hw/virtio: fix Power Management Control Register for PCI Express virtio devices	Marcel Apfelbaum	2	-0/+6
	Make Power Management State flag writable to conform with the PCI Express spec. Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-03-16	hw/virtio: fix Link Control Register for PCI Express virtio devices	Marcel Apfelbaum	2	-0/+7
	Make several Link Control Register flags writable to conform with the PCI Express spec. Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-03-16	hw/virtio: fix error enabling flags in Device Control register	Marcel Apfelbaum	1	-0/+4
	When the virtio devices are PCI Express, make error-enabling flags writable to respect the PCIe spec. Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-03-16	hw/pcie: fix Extended Configuration Space for devices with no Extended ↵	Marcel Apfelbaum	2	-0/+6
	Capabilities Absence of any Extended Capabilities is required to be indicated by an Extended Capability header with a Capability ID of 0000h, a Capability Version of 0h, and a Next Capability Offset of 000h. Instead of inserting a 'NULL' capability is simpler to mark the start of the Extended Configuration Space as read-only to achieve the same behaviour. Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-03-15	pci: introduce a bus master container	Jason Wang	1	-0/+1
	96a8821d2141 ("virtio: unbreak virtio-pci with IOMMU after caching ring translations") tries to make IOMMU works with virtio memory region cache, but it requires IOMMU to be created before any virtio devices. This is sub optimal, fixing this by introduce a bus master container to make sure address space can be initialized during device registering, and then we can safely set alias and make bus_master_enable_region as its subregion during bus master initialization. Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-03-14	icount: process QEMU_CLOCK_VIRTUAL timers in vCPU thread	Paolo Bonzini	1	-0/+24
	icount has become much slower after tcg_cpu_exec has stopped using the BQL. There is also a latent bug that is masked by the slowness. The slowness happens because every occurrence of a QEMU_CLOCK_VIRTUAL timer now has to wake up the I/O thread and wait for it. The rendez-vous is mediated by the BQL QemuMutex: - handle_icount_deadline wakes up the I/O thread with BQL taken - the I/O thread wakes up and waits on the BQL - the VCPU thread releases the BQL a little later - the I/O thread raises an interrupt, which calls qemu_cpu_kick - the VCPU thread notices the interrupt, takes the BQL to process it and waits on it All this back and forth is extremely expensive, causing a 6 to 8-fold slowdown when icount is turned on. One may think that the issue is that the VCPU thread is too dependent on the BQL, but then the latent bug comes in. I first tried removing the BQL completely from the x86 cpu_exec, only to see everything break. The only way to fix it (and make everything slow again) was to add a dummy BQL lock/unlock pair. This is because in -icount mode you really have to process the events before the CPU restarts executing the next instruction. Therefore, this series moves the processing of QEMU_CLOCK_VIRTUAL timers straight in the vCPU thread when running in icount mode. The required changes include: - make the timer notification callback wake up TCG's single vCPU thread when run from another thread. By using async_run_on_cpu, the callback can override all_cpu_threads_idle() when the CPU is halted. - move handle_icount_deadline after qemu_tcg_wait_io_event, so that the timer notification callback is invoked after the dummy work item wakes up the vCPU thread - make handle_icount_deadline run the timers instead of just waking the I/O thread. - stop processing the timers in the main loop Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-14	cpus: define QEMUTimerListNotifyCB for QEMU system emulation	Paolo Bonzini	2	-2/+3
	There is no change for now, because the callback just invokes qemu_notify_event. Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-14	qemu-timer: do not include sysemu/cpus.h from util/qemu-timer.h	Paolo Bonzini	2	-1/+2
	This dependency is the wrong way, and we will need util/qemu-timer.h from sysemu/cpus.h in the next patch. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-14	mem-prealloc: reduce large guest start-up and migration time.	Jitendra Kolhe	1	-1/+2
	Using "-mem-prealloc" option for a large guest leads to higher guest start-up and migration time. This is because with "-mem-prealloc" option qemu tries to map every guest page (create address translations), and make sure the pages are available during runtime. virsh/libvirt by default, seems to use "-mem-prealloc" option in case the guest is configured to use huge pages. The patch tries to map all guest pages simultaneously by spawning multiple threads. Currently limiting the change to QEMU library functions on POSIX compliant host only, as we are not sure if the problem exists on win32. Below are some stats with "-mem-prealloc" option for guest configured to use huge pages. ------------------------------------------------------------------------ Idle Guest \| Start-up time \| Migration time ------------------------------------------------------------------------ Guest stats with 2M HugePage usage - single threaded (existing code) ------------------------------------------------------------------------ 64 Core - 4TB \| 54m11.796s \| 75m43.843s 64 Core - 1TB \| 8m56.576s \| 14m29.049s 64 Core - 256GB \| 2m11.245s \| 3m26.598s ------------------------------------------------------------------------ Guest stats with 2M HugePage usage - map guest pages using 8 threads ------------------------------------------------------------------------ 64 Core - 4TB \| 5m1.027s \| 34m10.565s 64 Core - 1TB \| 1m10.366s \| 8m28.188s 64 Core - 256GB \| 0m19.040s \| 2m10.148s ----------------------------------------------------------------------- Guest stats with 2M HugePage usage - map guest pages using 16 threads ----------------------------------------------------------------------- 64 Core - 4TB \| 1m58.970s \| 31m43.400s 64 Core - 1TB \| 0m39.885s \| 7m55.289s 64 Core - 256GB \| 0m11.960s \| 2m0.135s ----------------------------------------------------------------------- Changed in v2: - modify number of memset threads spawned to min(smp_cpus, 16). - removed 64GB memory restriction for spawning memset threads. Changed in v3: - limit number of threads spawned based on min(sysconf(_SC_NPROCESSORS_ONLN), 16, smp_cpus) - implement memset thread specific siglongjmp in SIGBUS signal_handler. Changed in v4 - remove sigsetjmp/siglongjmp and SIGBUS unblock/block for main thread as main thread no longer touches any pages. - simplify code my returning memset_thread_failed status from touch_all_pages. Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com> Message-Id: <1487907103-32350-1-git-send-email-jitendra.kolhe@hpe.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-14	memory_region: Fix name comments	Dr. David Alan Gilbert	1	-6/+12
	The 'name' parameter to memory_region_init_* had been marked as debug only, however vmstate_region_ram uses it as a parameter to qemu_ram_set_idstr to set RAMBlock names and these form part of the migration stream. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170309152708.30635-1-dgilbert@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-14	Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170314' ↵	Peter Maydell	1	-0/+2
	into staging ppc patch queue for 2017-03-14 This set has a handful og bugfixes to go into qemu-2.9. This includes an update to the dtc/libfdt submodule which will fix the build errors seen on some distributions. # gpg: Signature made Tue 14 Mar 2017 04:00:41 GMT # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.9-20170314: dtc: Update submodule to avoid build errors pseries: Don't expose PCIe extended config space on older machine types target/ppc: fix cpu_ov setting for 32-bit target/ppc: Fix wrong number of UAMR register Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-14	build: include sys/sysmacros.h for major() and minor()	Christopher Covington	1	-0/+4
	The definition of the major() and minor() macros are moving within glibc to <sys/sysmacros.h>. Include this header when it is available to avoid the following sorts of build-stopping messages: qga/commands-posix.c: In function ‘dev_major_minor’: qga/commands-posix.c:656:13: error: In the GNU C Library, "major" is defined by <sys/sysmacros.h>. For historical compatibility, it is currently defined by <sys/types.h> as well, but we plan to remove this soon. To use "major", include <sys/sysmacros.h> directly. If you did not intend to use a system-defined macro "major", you should undefine it after including <sys/types.h>. [-Werror] devmajor = major(st.st_rdev); ^~~~~~~~~~~~~~~~~~~~~~~~~~ qga/commands-posix.c:657:13: error: In the GNU C Library, "minor" is defined by <sys/sysmacros.h>. For historical compatibility, it is currently defined by <sys/types.h> as well, but we plan to remove this soon. To use "minor", include <sys/sysmacros.h> directly. If you did not intend to use a system-defined macro "minor", you should undefine it after including <sys/types.h>. [-Werror] devminor = minor(st.st_rdev); ^~~~~~~~~~~~~~~~~~~~~~~~~~ The additional include allows the build to complete on Fedora 26 (Rawhide) with glibc version 2.24.90. Signed-off-by: Christopher Covington <cov@codeaurora.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>