peter/qemu - QEMU hacking for Peter

Age	Commit message (Collapse)	Author	Files	Lines
2017-04-04	Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging	Peter Maydell	1	-6/+4
	* MemoryRegionCache revert * glib optimization workaround * fix "info lapic" segfault on isapc * fix QIOChannel memory leak # gpg: Signature made Mon 03 Apr 2017 18:17:00 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: main-loop: Acquire main_context lock around os_host_main_loop_wait. exec: revert MemoryRegionCache nbd: fix memory leak on socket_connect failed ipmi: Fix macro issues target-i386: fix "info lapic" segfault on isapc iscsi: drop unused IscsiAIOCB.qiov field Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-03	sockets: New helper socket_address_crumple()	Markus Armbruster	1	-0/+11
	SocketAddress is a simple union, and simple unions are awkward: they have their variant members wrapped in a "data" object on the wire, and require additional indirections in C. I intend to limit its use to existing external interfaces. New ones should use SocketAddressFlat. I further intend to convert all internal interfaces to SocketAddressFlat. This helper should go away then. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 1490895797-29094-8-git-send-email-armbru@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-04-03	exec: revert MemoryRegionCache	Paolo Bonzini	1	-6/+4
	MemoryRegionCache did not know about virtio support for IOMMUs (because the two features were developed at the same time). Revert MemoryRegionCache to "normal" address_space_* operations for 2.9, as it is simpler than undoing the virtio patches. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-30	vhost: generalize iommu memory region	Jason Wang	1	-0/+11
	We assumes the iommu_ops were attached to the root region of address space. This may not be true for all kinds of IOMMU implementation and especially after commit 3716d5902d74 ("pci: introduce a bus master container"). So fix this by not assuming as->root has iommu_ops, instead depending on the regions reported by memory listener through: - register a memory listener to dma_as - during region_add, if it's a region of IOMMU, register a specific IOMMU notifier, and store all notifiers in a list. - during region_del, compare and delete the IOMMU notifier from the list This is also a must for making vhost device IOTLB works for all types of IOMMUs. Note, since we register one notifier during each .region_add, the IOTLB may be flushed more than one times, this is suboptimal and could be optimized in the future. Reported-by: Maxime Coquelin <maxime.coquelin@redhat.com> Fixes: 3716d5902d74 ("pci: introduce a bus master container") Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-03-30	Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170329' ↵	Peter Maydell	1	-0/+1
	into staging ppc patch queue for 2017-03-29 Two more bugfixes of sufficient severity to warrant going into 2.9. # gpg: Signature made Wed 29 Mar 2017 04:33:19 BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.9-20170329: spapr: fix memory hot-unplugging spapr: fix buffer-overflow Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-29	spapr: fix memory hot-unplugging	Laurent Vivier	1	-0/+1
	If, once the kernel has booted, we try to remove a memory hotplugged while the kernel was not started, QEMU crashes on an assert: qemu-system-ppc64: hw/virtio/vhost.c:651: vhost_commit: Assertion `r >= 0' failed. ... #4 in vhost_commit #5 in memory_region_transaction_commit #6 in pc_dimm_memory_unplug #7 in spapr_memory_unplug #8 spapr_machine_device_unplug #9 in hotplug_handler_unplug #10 in spapr_lmb_release #11 in detach #12 in set_allocation_state #13 in rtas_set_indicator ... If we take a closer look to the guest kernel log, we can see when we try to unplug the memory: pseries-hotplug-mem: Attempting to hot-add 4 LMB(s) What happens: 1- The kernel has ignored the memory hotplug event because it was not started when it was generated. 2- When we hot-unplug the memory, QEMU starts to remove the memory, generates an hot-unplug event, and signals the kernel of the incoming new event 3- as the kernel is started, on the QEMU signal, it reads the event list, decodes the hotplug event and tries to finish the hotplugging. 4- QEMU receive the the hotplug notification while it is trying to hot-unplug the memory. This moves the memory DRC to an invalid state This patch prevents this by not allowing to set the allocation state to USABLE while the DRC is awaiting release. RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382 Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-29	virtio: fix vring_align() on 64-bit windows	Andrew Baumann	1	-1/+1
	long is 32-bits on 64-bit windows, which caused the top half of the address to be truncated; this patch changes it to use the QEMU_ALIGN_UP macro which does not suffer the same problem Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2017-03-27	Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging	Peter Maydell	3	-7/+16
	* MTTCG fix for win32 * virtio-scsi assertion failure * mem-prealloc coverity fix * x86 migration revert which requires more thought * x86 instruction limit (avoids >2 page translation blocks) * nbd dead code cleanup * small memory.c logic fix # gpg: Signature made Mon 27 Mar 2017 17:03:04 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: scsi-generic: Fill in opt_xfer_len in INQUIRY reply if it is zero Revert "apic: save apic_delivered flag" nbd: drop unused NBDClientSession.is_unix field win32: replace custom mutex and condition variable with native primitives mem-prealloc: fix sysconf(_SC_NPROCESSORS_ONLN) failure case. tcg/i386: Check the size of instruction being translated virtio-scsi: Fix acquire/release in dataplane handlers virtio-scsi: Make virtio_scsi_acquire/release public clear pending status before calling memory commit Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-27	Revert "apic: save apic_delivered flag"	Paolo Bonzini	1	-2/+0
	This reverts commit 07bfa354772f2de67008dc66c201b627acff0106. The global variable is only read as part of a apic_reset_irq_delivered(); qemu_irq_raise(s->irq); if (!apic_get_irq_delivered()) { sequence, so the value never matters at migration time. Reported-by: Dr. David Alan Gilbert <dglibert@redhat.com> Cc: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-27	win32: replace custom mutex and condition variable with native primitives	Andrey Shedel	1	-5/+2
	The multithreaded TCG implementation exposed deadlocks in the win32 condition variables: as implemented, qemu_cond_broadcast waited on receivers, whereas the pthreads API it was intended to emulate does not. This was causing a deadlock because broadcast was called while holding the IO lock, as well as all possible waiters blocked on the same lock. This patch replaces all the custom synchronisation code for mutexes and condition variables with native Windows primitives (SRWlocks and condition variables) with the same semantics as their POSIX equivalents. To enable that, it requires a Windows Vista or newer host OS. Signed-off-by: Andrey Shedel <ashedel@microsoft.com> [AB: edited commit message] Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com> Message-Id: <20170324220141.10104-1-Andrew.Baumann@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-27	virtio-input: fix eventq batching	Ladi Prosek	1	-1/+4
	virtio_input_send buffers input events until it sees a SYNC. Then it either sends or drops the entire batch, depending on whether eventq has enough space available. The case to avoid here is partial sends where only part of the batch would get to the guest. Using virtqueue_get_avail_bytes to check the state of eventq was not correct. The queue may have a smaller number of larger buffers available so bytes may be enough but the batch would still not be possible to send, leading to the "Huh? No vq elem available" error. Instead of checking available bytes, this patch optimistically pops buffers from the queue and puts them back in case it runs out of space and the batch needs to be dropped. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Message-id: 1490365490-4854-3-git-send-email-lprosek@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2017-03-24	virtio-scsi: Make virtio_scsi_acquire/release public	Fam Zheng	1	-0/+14
	They will be used in virtio-scsi-dataplane.c as well, so move them to header. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170317061447.16243-2-famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-23	Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170323' ↵	Peter Maydell	1	-0/+1
	into staging ppc patch queue for 2017-03-23 Just a single bugfix in this batch. It's not strictly in ppc code, though it's for the pseries machine's benefit. Eduardo suggested it go through my tree however. # gpg: Signature made Thu 23 Mar 2017 10:09:17 GMT # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.9-20170323: numa,spapr: align default numa node memory size to 256MB Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-23	Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging	Peter Maydell	1	-0/+8
	# gpg: Signature made Wed 22 Mar 2017 17:28:56 GMT # gpg: using RSA key 0xBDBE7B27C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" # Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057 * remotes/cody/tags/block-pull-request: blockjob: add devops to blockjob backends block-backend: add drained_begin / drained_end ops blockjob: add block_job_start_shim blockjob: avoid recursive AioContext locking Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-22	block-backend: add drained_begin / drained_end ops	John Snow	1	-0/+8
	Allow block backends to forward drain requests to their devices/users. The initial intended purpose for this patch is to allow BBs to forward requests along to BlockJobs, which will want to pause if their associated BB has entered a drained region. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170316212351.13797-3-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-03-22	hw/acpi/vmgenid: prevent more than one vmgenid device	Laszlo Ersek	1	-0/+1
	A system with multiple VMGENID devices is undefined in the VMGENID spec by omission. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Ben Warren <ben@skyportsystems.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2017-03-22	hw/acpi/vmgenid: prevent device realization on pre-2.5 machine types	Laszlo Ersek	2	-0/+5
	The WRITE_POINTER linker/loader command that underlies VMGENID depends on commit baf2d5bfbac0 ("fw-cfg: support writeable blobs", 2017-01-12), which in turn depends on fw_cfg DMA. DMA for fw_cfg is enabled in 2.5+ machine types only (see commit e6915b5f3a87, "fw_cfg: unbreak migration compatibility for 2.4 and earlier machines", 2016-02-18). Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Ben Warren <ben@skyportsystems.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Ben Warren <ben@skyportsystems.com <mailto:ben@skyportsystems.com>> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2017-03-22	numa,spapr: align default numa node memory size to 256MB	Laurent Vivier	1	-0/+1
	Since commit 224245b ("spapr: Add LMB DR connectors"), NUMA node memory size must be aligned to 256MB (SPAPR_MEMORY_BLOCK_SIZE). But when "-numa" option is provided without "mem" parameter, the memory is equally divided between nodes, but 8MB aligned. This can be not valid for pseries. In that case we can have: $ ./ppc64-softmmu/qemu-system-ppc64 -m 4G -numa node -numa node -numa node qemu-system-ppc64: Node 0 memory size 0x55000000 is not aligned to 256 MiB With this patch, we have: (qemu) info numa 3 nodes node 0 cpus: 0 node 0 size: 1280 MB node 1 cpus: node 1 size: 1280 MB node 2 cpus: node 2 size: 1536 MB Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-19	qemu-ga: obey LISTEN_PID when using systemd socket activation	Paolo Bonzini	1	-0/+26
	qemu-ga's socket activation support was not obeying the LISTEN_PID environment variable, which avoids that a process uses a socket-activation file descriptor meant for its parent. Mess can for example ensue if a process forks a children before consuming the socket-activation file descriptor and therefore setting O_CLOEXEC on it. Luckily, qemu-nbd also got socket activation code, and its copy does support LISTEN_PID. Some extra fixups are needed to ensure that the code can be used for both, but that's what this patch does. The main change is to replace get_listen_fds's "consume" argument with the FIRST_SOCKET_ACTIVATION_FD macro from the qemu-nbd code. Cc: "Richard W.M. Jones" <rjones@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-17	block: Always call bdrv_child_check_perm first	Fam Zheng	1	-4/+0
	bdrv_child_set_perm alone is not very usable because the caller must call bdrv_child_check_perm first. This is already encapsulated conveniently in bdrv_child_try_set_perm, so remove the other prototypes from the header and fix the one wrong caller, block/mirror.c. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-03-16	Merge remote-tracking branch 'remotes/kraxel/tags/pull-cirrus-20170316-1' ↵	Peter Maydell	2	-7/+8
	into staging cirrus: blitter fixes. # gpg: Signature made Thu 16 Mar 2017 09:05:22 GMT # gpg: using RSA key 0x4CB6D8EED3E87138 # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" # gpg: aka "Gerd Hoffmann <gerd@kraxel.org>" # gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>" # Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138 * remotes/kraxel/tags/pull-cirrus-20170316-1: cirrus: stop passing around src pointers in the blitter cirrus: stop passing around dst pointers in the blitter cirrus: fix cirrus_invalidate_region cirrus: add option to disable blitter cirrus: switch to 4 MB video memory by default cirrus/vnc: zap bitblit support from console code. fix :cirrus_vga fix OOB read case qemu Segmentation fault # Conflicts: # include/hw/compat.h Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-16	Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20170316' ↵	Peter Maydell	2	-1/+5
	into staging migration/next for 20170316 # gpg: Signature made Thu 16 Mar 2017 08:21:51 GMT # gpg: using RSA key 0xF487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20170316: postcopy: Check for shared memory RAMBlocks: qemu_ram_is_shared vmstate: fix failed iotests case 68 and 91 migration/block: Avoid invoking blk_drain too frequently migration: use "" as the default for tls-creds/hostname Change the method to calculate dirty-pages-rate Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-16	Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging	Peter Maydell	3	-0/+23
	virtio, pci: fixes More fixes missed in the previous pull request. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Thu 16 Mar 2017 02:29:49 GMT # gpg: using RSA key 0x281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: virtio-serial-bus: Delete timer from list before free it hw/virtio: fix Power Management Control Register for PCI Express virtio devices hw/virtio: fix Link Control Register for PCI Express virtio devices hw/virtio: fix error enabling flags in Device Control register hw/pcie: fix Extended Configuration Space for devices with no Extended Capabilities Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-16	RAMBlocks: qemu_ram_is_shared	Dr. David Alan Gilbert	1	-0/+1
	Provide a helper to say whether a RAMBlock was created as a shared mapping. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-03-16	cirrus: switch to 4 MB video memory by default	Gerd Hoffmann	1	-0/+8
	Quoting cirrus source code: Follow real hardware, cirrus card emulated has 4 MB video memory. Also accept 8 MB/16 MB for backward compatibility. So just use 4MB by default. We decided to leave that at 8MB by default a while ago, for live migration compatibility reasons. But we have compat properties to handle that, so that isn't a compeling reason. This also removes some sanity check inconsistencies in the cirrus code. Some places check against the allocated video memory, some places check against the 4MB physical hardware has. Guest code can trigger asserts because of that. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Message-id: 1489494514-15606-1-git-send-email-kraxel@redhat.com
2017-03-16	cirrus/vnc: zap bitblit support from console code.	Gerd Hoffmann	1	-7/+0
	There is a special code path (dpy_gfx_copy) to allow graphic emulation notify user interface code about bitblit operations carryed out by guests. It is supported by cirrus and vnc server. The intended purpose is to optimize display scrolls and just send over the scroll op instead of a full display update. This is rarely used these days though because modern guests simply don't use the cirrus blitter any more. Any linux guest using the cirrus drm driver doesn't. Any windows guest newer than winxp doesn't ship with a cirrus driver any more and thus uses the cirrus as simple framebuffer. So this code tends to bitrot and bugs can go unnoticed for a long time. See for example commit "3e10c3e vnc: fix qemu crash because of SIGSEGV" which fixes a bug lingering in the code for almost a year, added by commit "c7628bf vnc: only alloc server surface with clients connected". Also the vnc server will throttle the frame rate in case it figures the network can't keep up (send buffers are full). This doesn't work with dpy_gfx_copy, for any copy operation sent to the vnc client we have to send all outstanding updates beforehand, otherwise the vnc client might run the client side blit on outdated data and thereby corrupt the display. So this dpy_gfx_copy "optimization" might even make things worse on slow network links. Lets kill it once for all. Oh, and one more reason: Turns out (after writing the patch) we have a security bug in that code path ... Fixes: CVE-2016-9603 Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Message-id: 1489494419-14340-1-git-send-email-kraxel@redhat.com
2017-03-16	Change the method to calculate dirty-pages-rate	Chao Fan	1	-1/+4
	In function cpu_physical_memory_sync_dirty_bitmap, file include/exec/ram_addr.h: if (src[idx][offset]) { unsigned long bits = atomic_xchg(&src[idx][offset], 0); unsigned long new_dirty; new_dirty = ~dest[k]; dest[k] \|= bits; new_dirty &= bits; num_dirty += ctpopl(new_dirty); } After these codes executed, only the pages not dirtied in bitmap(dest), but dirtied in dirty_memory[DIRTY_MEMORY_MIGRATION] will be calculated. For example: When ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] = 0b00001111, and atomic_rcu_read(&migration_bitmap_rcu)->bmap = 0b00000011, the new_dirty will be 0b00001100, and this function will return 2 but not 4 which is expected. the dirty pages in dirty_memory[DIRTY_MEMORY_MIGRATION] are all new, so these should be calculated also. Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-03-15	ide: core: add cleanup function	Li Qiang	1	-0/+1
	As the pci ahci can be hotplug and unplug, in the ahci unrealize function it should free all the resource once allocated in the realized function. This patch add ide_exit to free the resource. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Message-id: 1488449293-80280-3-git-send-email-liqiang6-s@360.cn Signed-off-by: John Snow <jsnow@redhat.com>
2017-03-16	hw/virtio: fix Power Management Control Register for PCI Express virtio devices	Marcel Apfelbaum	2	-0/+6
	Make Power Management State flag writable to conform with the PCI Express spec. Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-03-16	hw/virtio: fix Link Control Register for PCI Express virtio devices	Marcel Apfelbaum	2	-0/+7
	Make several Link Control Register flags writable to conform with the PCI Express spec. Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-03-16	hw/virtio: fix error enabling flags in Device Control register	Marcel Apfelbaum	1	-0/+4
	When the virtio devices are PCI Express, make error-enabling flags writable to respect the PCIe spec. Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-03-16	hw/pcie: fix Extended Configuration Space for devices with no Extended ↵	Marcel Apfelbaum	2	-0/+6
	Capabilities Absence of any Extended Capabilities is required to be indicated by an Extended Capability header with a Capability ID of 0000h, a Capability Version of 0h, and a Next Capability Offset of 000h. Instead of inserting a 'NULL' capability is simpler to mark the start of the Extended Configuration Space as read-only to achieve the same behaviour. Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-03-15	pci: introduce a bus master container	Jason Wang	1	-0/+1
	96a8821d2141 ("virtio: unbreak virtio-pci with IOMMU after caching ring translations") tries to make IOMMU works with virtio memory region cache, but it requires IOMMU to be created before any virtio devices. This is sub optimal, fixing this by introduce a bus master container to make sure address space can be initialized during device registering, and then we can safely set alias and make bus_master_enable_region as its subregion during bus master initialization. Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-03-14	icount: process QEMU_CLOCK_VIRTUAL timers in vCPU thread	Paolo Bonzini	1	-0/+24
	icount has become much slower after tcg_cpu_exec has stopped using the BQL. There is also a latent bug that is masked by the slowness. The slowness happens because every occurrence of a QEMU_CLOCK_VIRTUAL timer now has to wake up the I/O thread and wait for it. The rendez-vous is mediated by the BQL QemuMutex: - handle_icount_deadline wakes up the I/O thread with BQL taken - the I/O thread wakes up and waits on the BQL - the VCPU thread releases the BQL a little later - the I/O thread raises an interrupt, which calls qemu_cpu_kick - the VCPU thread notices the interrupt, takes the BQL to process it and waits on it All this back and forth is extremely expensive, causing a 6 to 8-fold slowdown when icount is turned on. One may think that the issue is that the VCPU thread is too dependent on the BQL, but then the latent bug comes in. I first tried removing the BQL completely from the x86 cpu_exec, only to see everything break. The only way to fix it (and make everything slow again) was to add a dummy BQL lock/unlock pair. This is because in -icount mode you really have to process the events before the CPU restarts executing the next instruction. Therefore, this series moves the processing of QEMU_CLOCK_VIRTUAL timers straight in the vCPU thread when running in icount mode. The required changes include: - make the timer notification callback wake up TCG's single vCPU thread when run from another thread. By using async_run_on_cpu, the callback can override all_cpu_threads_idle() when the CPU is halted. - move handle_icount_deadline after qemu_tcg_wait_io_event, so that the timer notification callback is invoked after the dummy work item wakes up the vCPU thread - make handle_icount_deadline run the timers instead of just waking the I/O thread. - stop processing the timers in the main loop Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-14	cpus: define QEMUTimerListNotifyCB for QEMU system emulation	Paolo Bonzini	2	-2/+3
	There is no change for now, because the callback just invokes qemu_notify_event. Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-14	qemu-timer: do not include sysemu/cpus.h from util/qemu-timer.h	Paolo Bonzini	2	-1/+2
	This dependency is the wrong way, and we will need util/qemu-timer.h from sysemu/cpus.h in the next patch. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-14	mem-prealloc: reduce large guest start-up and migration time.	Jitendra Kolhe	1	-1/+2
	Using "-mem-prealloc" option for a large guest leads to higher guest start-up and migration time. This is because with "-mem-prealloc" option qemu tries to map every guest page (create address translations), and make sure the pages are available during runtime. virsh/libvirt by default, seems to use "-mem-prealloc" option in case the guest is configured to use huge pages. The patch tries to map all guest pages simultaneously by spawning multiple threads. Currently limiting the change to QEMU library functions on POSIX compliant host only, as we are not sure if the problem exists on win32. Below are some stats with "-mem-prealloc" option for guest configured to use huge pages. ------------------------------------------------------------------------ Idle Guest \| Start-up time \| Migration time ------------------------------------------------------------------------ Guest stats with 2M HugePage usage - single threaded (existing code) ------------------------------------------------------------------------ 64 Core - 4TB \| 54m11.796s \| 75m43.843s 64 Core - 1TB \| 8m56.576s \| 14m29.049s 64 Core - 256GB \| 2m11.245s \| 3m26.598s ------------------------------------------------------------------------ Guest stats with 2M HugePage usage - map guest pages using 8 threads ------------------------------------------------------------------------ 64 Core - 4TB \| 5m1.027s \| 34m10.565s 64 Core - 1TB \| 1m10.366s \| 8m28.188s 64 Core - 256GB \| 0m19.040s \| 2m10.148s ----------------------------------------------------------------------- Guest stats with 2M HugePage usage - map guest pages using 16 threads ----------------------------------------------------------------------- 64 Core - 4TB \| 1m58.970s \| 31m43.400s 64 Core - 1TB \| 0m39.885s \| 7m55.289s 64 Core - 256GB \| 0m11.960s \| 2m0.135s ----------------------------------------------------------------------- Changed in v2: - modify number of memset threads spawned to min(smp_cpus, 16). - removed 64GB memory restriction for spawning memset threads. Changed in v3: - limit number of threads spawned based on min(sysconf(_SC_NPROCESSORS_ONLN), 16, smp_cpus) - implement memset thread specific siglongjmp in SIGBUS signal_handler. Changed in v4 - remove sigsetjmp/siglongjmp and SIGBUS unblock/block for main thread as main thread no longer touches any pages. - simplify code my returning memset_thread_failed status from touch_all_pages. Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com> Message-Id: <1487907103-32350-1-git-send-email-jitendra.kolhe@hpe.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-14	memory_region: Fix name comments	Dr. David Alan Gilbert	1	-6/+12
	The 'name' parameter to memory_region_init_* had been marked as debug only, however vmstate_region_ram uses it as a parameter to qemu_ram_set_idstr to set RAMBlock names and these form part of the migration stream. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170309152708.30635-1-dgilbert@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-14	Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170314' ↵	Peter Maydell	1	-0/+2
	into staging ppc patch queue for 2017-03-14 This set has a handful og bugfixes to go into qemu-2.9. This includes an update to the dtc/libfdt submodule which will fix the build errors seen on some distributions. # gpg: Signature made Tue 14 Mar 2017 04:00:41 GMT # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.9-20170314: dtc: Update submodule to avoid build errors pseries: Don't expose PCIe extended config space on older machine types target/ppc: fix cpu_ov setting for 32-bit target/ppc: Fix wrong number of UAMR register Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-14	build: include sys/sysmacros.h for major() and minor()	Christopher Covington	1	-0/+4
	The definition of the major() and minor() macros are moving within glibc to <sys/sysmacros.h>. Include this header when it is available to avoid the following sorts of build-stopping messages: qga/commands-posix.c: In function ‘dev_major_minor’: qga/commands-posix.c:656:13: error: In the GNU C Library, "major" is defined by <sys/sysmacros.h>. For historical compatibility, it is currently defined by <sys/types.h> as well, but we plan to remove this soon. To use "major", include <sys/sysmacros.h> directly. If you did not intend to use a system-defined macro "major", you should undefine it after including <sys/types.h>. [-Werror] devmajor = major(st.st_rdev); ^~~~~~~~~~~~~~~~~~~~~~~~~~ qga/commands-posix.c:657:13: error: In the GNU C Library, "minor" is defined by <sys/sysmacros.h>. For historical compatibility, it is currently defined by <sys/types.h> as well, but we plan to remove this soon. To use "minor", include <sys/sysmacros.h> directly. If you did not intend to use a system-defined macro "minor", you should undefine it after including <sys/types.h>. [-Werror] devminor = minor(st.st_rdev); ^~~~~~~~~~~~~~~~~~~~~~~~~~ The additional include allows the build to complete on Fedora 26 (Rawhide) with glibc version 2.24.90. Signed-off-by: Christopher Covington <cov@codeaurora.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-14	pseries: Don't expose PCIe extended config space on older machine types	David Gibson	1	-0/+2
	bb9986452 "spapr_pci: Advertise access to PCIe extended config space" allowed guests to access the extended config space of PCI Express devices via the PAPR interfaces, even though the paravirtualized bus mostly acts like plain PCI. However, that patch enabled access unconditionally, including for existing machine types, which is an unwise change in behaviour. This patch limits the change to pseries-2.9 (and later) machine types. Suggested-by: Andrea Bolognani <abologna@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-10	i386: Change stepping of Haswell to non-blacklisted value	Eduardo Habkost	1	-0/+5
	glibc blacklists TSX on Haswell CPUs with model==60 and stepping < 4. To make the Haswell CPU model more useful, make those guests actually use TSX by changing CPU stepping to 4. References: * glibc commit 2702856bf45c82cf8e69f2064f5aa15c0ceb6359 https://sourceware.org/git/?p=glibc.git;a=commit;h=2702856bf45c82cf8e69f2064f5aa15c0ceb6359 Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170309181212.18864-4-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-03-08	Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging	Peter Maydell	2	-5/+5
	Block layer fixes for 2.9.0-rc0 # gpg: Signature made Tue 07 Mar 2017 14:59:18 GMT # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (27 commits) commit: Don't use error_abort in commit_start block: Don't use error_abort in blk_new_open sheepdog: Support blockdev-add qapi-schema: Rename SocketAddressFlat's variant tcp to inet qapi-schema: Rename GlusterServer to SocketAddressFlat gluster: Plug memory leaks in qemu_gluster_parse_json() gluster: Don't duplicate qapi-util.c's qapi_enum_parse() gluster: Drop assumptions on SocketTransport names sheepdog: Implement bdrv_parse_filename() sheepdog: Use SocketAddress and socket_connect() sheepdog: Report errors in pseudo-filename more usefully sheepdog: Don't truncate long VDI name in _open(), _create() sheepdog: Fix snapshot ID parsing in _open(), _create, _goto() sheepdog: Mark sd_snapshot_delete() lossage FIXME sheepdog: Fix error handling sd_create() sheepdog: Fix error handling in sd_snapshot_delete() sheepdog: Defuse time bomb in sd_open() error handling block: Fix error handling in bdrv_replace_in_backing_chain() block: Handle permission errors in change_parent_backing_link() block: Ignore multiple children in bdrv_check_update_perm() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-07	qapi: New qobject_input_visitor_new_str() for convenience	Markus Armbruster	1	-0/+12
	Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1488317230-26248-21-git-send-email-armbru@redhat.com>
2017-03-07	qapi: New parse_qapi_name()	Markus Armbruster	1	-0/+2
	Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1488317230-26248-19-git-send-email-armbru@redhat.com>
2017-03-07	qobject: Propagate parse errors through qobject_from_json()	Markus Armbruster	1	-1/+1
	The next few commits will put the errors to use where appropriate. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1488317230-26248-13-git-send-email-armbru@redhat.com>
2017-03-07	qobject: Propagate parse errors through qobject_from_jsonv()	Markus Armbruster	1	-1/+2
	The next few commits will put the errors to use where appropriate. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <1488317230-26248-9-git-send-email-armbru@redhat.com>
2017-03-07	qapi: qobject input visitor variant for use with keyval_parse()	Daniel P. Berrange	1	-0/+9
	Currently the QObjectInputVisitor assumes that all scalar values are directly represented as the final types declared by the thing being visited. i.e. it assumes an 'int' is using QInt, and a 'bool' is using QBool, etc. This is good when QObjectInputVisitor is fed a QObject that came from a JSON document on the QMP monitor, as it will strictly validate correctness. To allow QObjectInputVisitor to be reused for visiting a QObject originating from keyval_parse(), an alternative mode is needed where all the scalars types are represented as QString and converted on the fly to the final desired type. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1475246744-29302-8-git-send-email-berrange@redhat.com> Rebased, conflicts resolved, commit message updated to refer to keyval_parse(). autocast replaced by keyval in identifiers, noautocast replaced by fail in tests. Fix qobject_input_type_uint64_keyval() not to reject '-', for QemuOpts compatibility: replace parse_uint_full() by open-coded parse_option_number(). The next commit will add suitable tests. Leave out the fancy ERANGE error reporting for now, but add a TODO comment. Add it qobject_input_type_int64_keyval() and qobject_input_type_number_keyval(), too. Open code parse_option_bool() and parse_option_size() so we have to call qobject_input_get_name() only when actually needed. Again, leave out ERANGE error reporting for now. QAPI/QMP downstream extension prefixes __RFQDN_ don't work, because keyval_parse() splits them at '.'. This will be addressed later in the series. qobject_input_type_int64_keyval(), qobject_input_type_uint64_keyval(), qobject_input_type_number_keyval() tweaked for style. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <1488317230-26248-5-git-send-email-armbru@redhat.com>
2017-03-07	keyval: New keyval_parse()	Markus Armbruster	1	-0/+3
	keyval_parse() parses KEY=VALUE,... into a QDict. Works like qemu_opts_parse(), except: * Returns a QDict instead of a QemuOpts (d'oh). * Supports nesting, unlike QemuOpts: a KEY is split into key fragments at '.' (dotted key convention; the block layer does something similar on top of QemuOpts). The key fragments are QDict keys, and the last one's value is updated to VALUE. * Each key fragment may be up to 127 bytes long. qemu_opts_parse() limits the entire key to 127 bytes. * Overlong key fragments are rejected. qemu_opts_parse() silently truncates them. * Empty key fragments are rejected. qemu_opts_parse() happily accepts empty keys. * It does not store the returned value. qemu_opts_parse() stores it in the QemuOptsList. * It does not treat parameter "id" specially. qemu_opts_parse() ignores all but the first "id", and fails when its value isn't id_wellformed(), or duplicate (a QemuOpts with the same ID is already stored). It also screws up when a value contains ",id=". * Implied value is not supported. qemu_opts_parse() desugars "foo" to "foo=on", and "nofoo" to "foo=off". * An implied key's value can't be empty, and can't contain ','. I intend to grow this into a saner replacement for QemuOpts. It'll take time, though. Note: keyval_parse() provides no way to do lists, and its key syntax is incompatible with the __RFQDN_ prefix convention for downstream extensions, because it blindly splits at '.', even in __RFQDN_. Both issues will be addressed later in the series. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1488317230-26248-4-git-send-email-armbru@redhat.com>
2017-03-07	block: Fix error handling in bdrv_replace_in_backing_chain()	Kevin Wolf	2	-4/+4
	When adding an Error parameter, bdrv_replace_in_backing_chain() would become nothing more than a wrapper around change_parent_backing_link(). So make the latter public, renamed as bdrv_replace_node(), and remove bdrv_replace_in_backing_chain(). Most of the callers just remove a node from the graph that they just inserted, so they can use &error_abort, but completion of a mirror job with 'replaces' set can actually fail. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>