summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-11-11block: Disallow snapshots if the overlay doesn't support backing filesAlberto Garcia1-0/+5
This addresses scenarios like this one: { 'execute': 'blockdev-add', 'arguments': { 'options': { 'driver': 'qcow2', 'node-name': 'new0', 'file': { 'driver': 'file', 'filename': 'new.qcow2', 'node-name': 'file0' } } } } { 'execute': 'blockdev-snapshot', 'arguments': { 'node': 'virtio0', 'overlay': 'file0' } } Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11throttle: Use bs->throttle_state instead of bs->io_limits_enabledAlberto Garcia4-7/+10
There are two ways to check for I/O limits in a BlockDriverState: - bs->throttle_state: if this pointer is not NULL, it means that this BDS is member of a throttling group, its ThrottleTimers structure has been initialized and its I/O limits are ready to be applied. - bs->io_limits_enabled: if true it means that the throttle_state pointer is valid _and_ the limits are currently enabled. The latter is used in several places to check whether a BDS has I/O limits configured, but what it really checks is whether requests are being throttled or not. For example, io_limits_enabled can be temporarily set to false in cases like bdrv_read_unthrottled() without otherwise touching the throtting configuration of that BDS. This patch replaces bs->io_limits_enabled with bs->throttle_state in all cases where what we really want to check is the existence of I/O limits, not whether they are currently enabled or not. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11throttle: Check for pending requests in throttle_group_unregister_bs()Alberto Garcia1-0/+7
throttle_group_unregister_bs() removes a BlockDriverState from its throttling group and destroys the timers. This means that there must be no pending throttled requests at that point (because it would be impossible to complete them), so the caller has to drain them first. At the moment throttle_group_unregister_bs() is only called from bdrv_io_limits_disable(), which already takes care of draining the requests, so there's nothing to worry about, but this patch makes this invariant explicit in the documentation and adds the relevant assertions. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11qemu-img: add check for zero-length job lenJohn Snow1-1/+2
The mirror job doesn't update its total length until it has already started running, so we should translate a zero-length job-len as meaning 0%. Otherwise, we may get divide-by-zero faults. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11qcow2: avoid misaligned 64bit bswapJohn Snow1-4/+7
If we create a buffer directly on the stack by using 12 bytes, there's no guarantee the 64bit value we want to swap will be aligned, which could cause errors with undefined behavior. Spotted with clang -fsanitize=undefined and observed in iotests 15, 26, 44, 115 and 121. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11qemu-iotests: Test the reopening of overlay_bs in 'block-commit'Alberto Garcia2-2/+32
The 'block-commit' command needs the overlay image of 'top' to be opened in read-write mode in order to update the backing file string. If 'top' is not the active layer or its backing file then its overlay needs to be reopened during the block job. This is a test case for that scenario. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11commit: reopen overlay_bs before baseAlberto Garcia1-4/+4
'block-commit' needs write access to two different nodes of the chain: - 'base', because that's where the data is written to. - the overlay of 'top', because it needs to update the backing file string to point to 'base' after the operation. Both images have to be opened in read-write mode, and commit_start() takes care of reopening them if necessary. With the current implementation, however, when overlay_bs is reopened in read-write mode it has the side effect of making 'base' read-only again, eventually making 'block-commit' fail. This needs to be fixed in bdrv_reopen(), but until we get to that it can be worked around simply by swapping the order of base and overlay_bs in the reopen queue. In order to reproduce this bug, overlay_bs needs to be initially in read-only mode. That is: the 'top' parameter of 'block-commit' cannot be the active layer nor its immediate backing chain. Cc: qemu-stable@nongnu.org Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11block: add tests for the 'blockdev-snapshot' commandAlberto Garcia2-8/+128
Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11block: add a 'blockdev-snapshot' QMP commandAlberto Garcia4-61/+172
One of the limitations of the 'blockdev-snapshot-sync' command is that it does not allow passing BlockdevOptions to the newly created snapshots, so they are always opened using the default values. Extending the command to allow passing options is not a practical solution because there is overlap between those options and some of the existing parameters of the command. This patch introduces a new 'blockdev-snapshot' command with a simpler interface: it just takes two references to existing block devices that will be used as the source and target for the snapshot. Since the main difference between the two commands is that one of them creates and opens the target image, while the other uses an already opened one, the bulk of the implementation is shared. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11block: support passing 'backing': '' to 'blockdev-add'Alberto Garcia1-0/+7
Passing an empty string allows opening an image but not its backing file. This was already described in the API documentation, only the implementation was missing. This is useful for creating snapshots using images opened with blockdev-add, since they are not supposed to have a backing image before the operation. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11block: rename BlockdevSnapshot to BlockdevSnapshotSyncAlberto Garcia3-6/+6
We will introduce the 'blockdev-snapshot' command that will require its own struct for the parameters, so we need to rename this one in order to avoid name clashes. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11block: check for existing device IDs in external_snapshot_prepare()Alberto Garcia1-2/+3
The 'snapshot-node-name' parameter of blockdev-snapshot-sync allows setting the node name of the image that is going to be created. Before creating the image, external_snapshot_prepare() checks that the name is not already being used. The check is however incomplete since it only considers existing node names, but node names must not clash with device IDs either because they share the same namespace. If the user attempts to create a snapshot using the name of an existing device for the 'snapshot-node-name' parameter the operation will eventually fail, but only after the new image has been created. This patch replaces bdrv_find_node() with bdrv_lookup_bs() to extend the check to existing device IDs, and thus detect possible name clashes before the new image is created. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11iotests: Add test for change-related QMP commandsMax Reitz3-0/+726
Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11hmp: Add read-only-mode option to change commandMax Reitz2-4/+38
Expose the new read-only-mode option of 'blockdev-change-medium' for the 'change' HMP command. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11blockdev: read-only-mode for blockdev-change-mediumMax Reitz6-4/+72
Add an option to qmp_blockdev_change_medium() which allows changing the read-only status of the block device whose medium is changed. Some drives do not have a inherently fixed read-only status; for instance, floppy disks can be set read-only or writable independently of the drive. Some users may find it useful to be able to therefore change the read-only status of a block device when changing the medium. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11hmp: Use blockdev-change-medium for change commandMax Reitz1-12/+15
Use separate code paths for the two overloaded functions of the 'change' HMP command, and invoke the 'blockdev-change-medium' QMP command if used on a block device (by calling qmp_blockdev_change_medium()). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11qmp: Introduce blockdev-change-mediumMax Reitz7-12/+69
Introduce a new QMP command 'blockdev-change-medium' which is intended to replace the 'change' command for block devices. The existing function qmp_change_blockdev() is accordingly renamed to qmp_blockdev_change_medium(). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11block: Inquire tray state before tray-moved eventsMax Reitz1-10/+7
blk_dev_change_media_cb() is called for all potential tray movements; however, it is possible to request closing the tray but nothing actually happening (on a floppy disk drive without a medium). Thus, the actual tray status should be inquired before sending a tray-moved event (and an event should be sent whenever the status changed). Checking @load is now superfluous; it was necessary because it was possible to change a medium without having explicitly opened the tray and closed it again (or it might have been possible, at least). This is no longer possible, though. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11blockdev: Implement change with basic operationsMax Reitz1-110/+68
Implement 'change' on block devices by calling blockdev-open-tray, blockdev-remove-medium, blockdev-insert-medium (a variation of that which does not need a node-name) and blockdev-close-tray. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11blockdev: Implement eject with basic operationsMax Reitz1-6/+5
Implement 'eject' by calling blockdev-open-tray and blockdev-remove-medium. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11blockdev: Add blockdev-insert-mediumMax Reitz3-0/+110
And a helper function for that, which directly takes a pointer to the BDS to be inserted instead of its node-name (which will be used for implementing 'change' using blockdev-insert-medium). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11blockdev: Add blockdev-remove-mediumMax Reitz3-0/+112
Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11blockdev: Add blockdev-close-trayMax Reitz3-0/+74
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11blockdev: Add blockdev-open-trayMax Reitz3-0/+116
Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11block: Add functions for inheriting a BBRSMax Reitz2-0/+29
In order to open a BDS which inherits a BB's root state, blk_get_open_flags_from_root_state() is used to inquire the flags to be passed to bdrv_open(), and blk_apply_root_state() is used to apply the remaining state after the BDS has been opened. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11block: Make bdrv_states publicMax Reitz2-2/+3
When inserting a BDS tree into a BB, we will need to add the root BDS to this list. Since we will want to do that in the blockdev-insert-medium implementation in blockdev.c, we will need access to it there. This patch is not exactly elegant, but bdrv_states will be removed in the future anyway because we no longer need it since we have BBs. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11block: Add blk_remove_bs()Max Reitz2-0/+13
This function removes the BlockDriverState associated with the given BlockBackend from that BB and sets the BDS pointer in the BB to NULL. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11block: Don't call blk_bs() twice in bdrv_lookup_bs()Alberto Garcia1-3/+3
Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11Merge remote-tracking branch 'remotes/dgibson/tags/ppc-next-20151111' into ↵Peter Maydell7-6/+32
staging ppc patch queue - 2015-11-11 Highlights: - Updated SLOF version for "pseries machine - Bugfix / cleanup for KVM hash page table allocation # gpg: Signature made Wed 11 Nov 2015 02:30:51 GMT using RSA key ID 20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-next-20151111: spapr: Handle failure of KVM_PPC_ALLOCATE_HTAB ioctl ppc: Let kvmppc_reset_htab() return 0 for !CONFIG_KVM pseries: Update SLOF firmware image to qemu-slof-20151103 ppc: Add/Re-introduce MMU model definitions needed by PR KVM Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-11spapr: Handle failure of KVM_PPC_ALLOCATE_HTAB ioctlBharata B Rao1-4/+16
KVM_PPC_ALLOCATE_HTAB ioctl can return -ENOMEM for KVM guests and QEMU never handled this correctly. But this didn't cause any problems till now as KVM_PPC_ALLOCATE_HTAB ioctl returned with smaller than requested HTAB when enough contiguous memory wasn't available in the host. After the proposed kernel change: https://patchwork.ozlabs.org/patch/530501/, KVM_PPC_ALLOCATE_HTAB ioctl will not fallback to lower sized HTAB allocation and will fail if requested HTAB size can't be met. Check for such failures in QEMU and abort appropriately. This will prevent guest kernel from hanging/freezing during early boot by doing graceful exit when host is unable to allocate requested HTAB. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-11ppc: Let kvmppc_reset_htab() return 0 for !CONFIG_KVMBharata B Rao1-1/+1
The !CONFIG_KVM implementation of kvmppc_reset_htab() returns -1 by default. Change this to return 0 so that we fall back to user space HTAB allocation for emulated guests. This fixes the make check failures for ppc64 emulated target. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-11pseries: Update SLOF firmware image to qemu-slof-20151103Alexey Kardashevskiy3-1/+1
The changes are: 1. supports recent binutils; 2. 64bit BARs behind PCI bridges supported; 3. Many fixes for USB keyboard support - keys, XHCI; 4. virtio-vga support. This image was built with: gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC) GNU ld version 2.23.2 The full changelog is: > version: update to 20151103 > documentation: Add a clause about signing off > qemu/js2x/client: Support binutils >= 2.25.1 > Fix special keys on USB > Fix function keys on USB > pci-scan: program 64-bit mem bar range in pci-bridge bar > Allow to build SLOF on Little Endian host > usb-xhci: add keyboard support > usb-xhci: ready the link trb early > usb-xhci: scan usb high speed ports > usb-xhci: bulk improve event handling loop > usb-xhci: return on allocation failure > usb-xhci: add delay in shutdown path > usb-xhci: event trbs does not need link trb > usb-hid: refactor usb key reading > takeover: Fix header includes > board-js2x: Add missing file dma-function.fs > vga: Add support for virtio-vga > qemu-vga: Use MMIO BAR instead of legacy IO ports > slof: Change call_c() function to a proper assembler function Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-11ppc: Add/Re-introduce MMU model definitions needed by PR KVMBharata B Rao2-0/+14
Commit aa4bb5875231 (ppc: Add mmu_model defines for arch 2.03 and 2.07) removed the mmu_model definition POWERPC_MMU_2_06a which is needed by PR KVM. Reintroduce it and also add POWERPC_MMU_2_07a. This fixes QEMU crash (qemu: fatal: Unknown MMU model) during booting of PR KVM guest. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-10Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20151110.0' ↵Peter Maydell3-16/+42
into staging VFIO updates 2015-11-10 - Make Windows happy with vfio-pci devices exposed on conventional PCI buses on q35 by hiding PCIe capability (Alex Williamson) - Convert to g_new() where appropriate (Markus Armbruster) # gpg: Signature made Tue 10 Nov 2015 19:46:41 GMT using RSA key ID 3BB08B22 # gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>" # gpg: aka "Alex Williamson <alex@shazbot.org>" # gpg: aka "Alex Williamson <alwillia@redhat.com>" # gpg: aka "Alex Williamson <alex.l.williamson@gmail.com>" * remotes/awilliam/tags/vfio-update-20151110.0: vfio: Use g_new() & friends where that makes obvious sense vfio/pci: Hide device PCIe capability on non-express buses for PCIe VMs Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-10vfio: Use g_new() & friends where that makes obvious senseMarkus Armbruster3-11/+11
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void *, which lets the compiler catch more type errors. This commit only touches allocations with size arguments of the form sizeof(T). Same Coccinelle semantic patch as in commit b45c03f. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-11-10vfio/pci: Hide device PCIe capability on non-express buses for PCIe VMsAlex Williamson1-5/+31
When we have a PCIe VM, such as Q35, guests start to care more about valid configurations of devices relative to the VM view of the PCI topology. Windows will error with a Code 10 for an assigned device if a PCIe capability is found for a device on a conventional bus. We also have the possibility of IOMMUs, like VT-d, where the where the guest may be acutely aware of valid express capabilities on physical hardware. Some devices, like tg3 are adversely affected by this due to driver dependencies on the PCIe capability. The only solution for such devices is to attach them to an express capable bus in the VM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-11-10Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20151110' ↵Peter Maydell35-251/+4168
into staging migration/next for 20151110 # gpg: Signature made Tue 10 Nov 2015 14:23:26 GMT using RSA key ID 5872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" * remotes/juanquintela/tags/migration/20151110: (57 commits) migration: qemu_savevm_state_cleanup becomes mandatory operation Inhibit ballooning during postcopy Disable mlock around incoming postcopy End of migration for postcopy Postcopy: Mark nohugepage before discard postcopy: Wire up loadvm_postcopy_handle_ commands Start up a postcopy/listener thread ready for incoming page data Postcopy; Handle userfault requests Round up RAMBlock sizes to host page sizes Host page!=target page: Cleanup bitmaps Don't iterate on precopy-only devices during postcopy Don't sync dirty bitmaps in postcopy postcopy: Check order of received target pages Postcopy: Use helpers to map pages during migration postcopy_ram.c: place_page and helpers Page request: Consume pages off the post-copy queue Page request: Process incoming page request Page request: Add MIG_RP_MSG_REQ_PAGES reverse command Postcopy: End of iteration Postcopy: Postcopy startup in migration thread ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-10migration: qemu_savevm_state_cleanup becomes mandatory operationDenis V. Lunev1-1/+1
since commit commit 94f5a43704129ca4995aa3385303c5ae225bde42 Author: Liang Li <liang.z.li@intel.com> Date: Mon Nov 2 15:37:00 2015 +0800 migration: defer migration_end & blk_mig_cleanup when actual .cleanup callbacks calling was removed from complete operations. The patch fixes regression introduced by the commit above results in 100% reliable assert for virtio-scsi VM with iothreads enabled during 'virsh create-snapshot' operation: assert(i != mr->ioeventfd_nb); memory_region_del_eventfd virtio_pci_set_host_notifier_internal virtio_pci_set_host_notifier virtio_scsi_dataplane_start virtio_scsi_handle_cmd virtio_queue_notify_vq virtio_queue_host_notifier_read aio_dispatch Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Liang Li <liang.z.li@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Juan Quintela <quintela@redhat.com> CC: Amit Shah <amit.shah@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10Inhibit ballooning during postcopyDr. David Alan Gilbert4-1/+25
Postcopy detects accesses to pages that haven't been transferred yet using userfaultfd, and it causes exceptions on pages that are 'not present'. Ballooning also causes pages to be marked as 'not present' when the guest inflates the balloon. Potentially a balloon could be inflated to discard pages that are currently inflight during postcopy and that may be arriving at about the same time. To avoid this confusion, disable ballooning during postcopy. When disabled we drop balloon requests from the guest. Since ballooning is generally initiated by the host, the management system should avoid initiating any balloon instructions to the guest during migration, although it's not possible to know how long it would take a guest to process a request made prior to the start of migration. Guest initiated ballooning will not know if it's really freed a page of host memory or not. Queueing the requests until after migration would be nice, but is non-trivial, since the set of inflate/deflate requests have to be compared with the state of the page to know what the final outcome is allowed to be. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10Disable mlock around incoming postcopyDr. David Alan Gilbert2-0/+25
Userfault doesn't work with mlock; mlock is designed to nail down pages so they don't move, userfault is designed to tell you when they're not there. munlock the pages we userfault protect before postcopy. mlock everything again at the end if mlock is enabled. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Amit Shah <amit.shah@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10End of migration for postcopyDr. David Alan Gilbert2-3/+29
Tweak the end of migration cleanup; we don't want to close stuff down at the end of the main stream, since the postcopy is still sending pages on the other thread. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10Postcopy: Mark nohugepage before discardDr. David Alan Gilbert4-4/+59
Prior to servicing userfault requests we must ensure we've not got huge pages in the area that might include non-transferred memory, since a hugepage could incorrectly mark the whole huge page as present. We mark the area as non-huge page (nhp) just before we perform discards; the discard code now tells us to discard any areas that haven't been sent (as well as any that are redirtied); any already formed transparent-huge-pages get fragmented by this discard process if they cotnain any discards. Transparent huge pages that have been entirely transferred and don't contain any discards are not broken by this mechanism; they stay as huge pages. By starting postcopy after a full precopy pass, many of the pages then stay as huge pages; this is important for maintaining performance after the end of the migration. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10postcopy: Wire up loadvm_postcopy_handle_ commandsDr. David Alan Gilbert2-1/+29
Wire up more of the handlers for the commands on the destination side, in particular loadvm_postcopy_handle_run now has enough to start the guest running. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10Start up a postcopy/listener thread ready for incoming page dataDr. David Alan Gilbert4-1/+90
The loading of a device state (during postcopy) may access guest memory that's still on the source machine and thus might need a page fill; split off a separate thread that handles the incoming page data so that the original incoming migration code can finish off the device data. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10Postcopy; Handle userfault requestsDr. David Alan Gilbert3-9/+158
userfaultfd is a Linux syscall that gives an fd that receives a stream of notifications of accesses to pages registered with it and allows the program to acknowledge those stalls and tell the accessing thread to carry on. We convert the requests from the kernel into messages back to the source asking for the pages. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10Round up RAMBlock sizes to host page sizesDr. David Alan Gilbert1-4/+4
RAMBlocks that are not a multiple of host pages in length cause problems for postcopy (I've seen an ACPI table on aarch64 be 5k in length - i.e. 5x target-page), so round RAMBlock sizes up to a host-page. This potentially breaks migration compatibility due to changes in RAMBlock sizes; however: 1) x86 and s390 I think always have host=target page size 2) When I've tried on Power the block sizes already seem aligned. 3) I don't think there's anything else that maintains per-version machine-types for compatibility. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10Host page!=target page: Cleanup bitmapsDr. David Alan Gilbert1-0/+172
Prior to the start of postcopy, ensure that everything that will be transferred later is a whole host-page in size. This is accomplished by discarding partially transferred host pages and marking any that are partially dirty as fully dirty. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10Don't iterate on precopy-only devices during postcopyDr. David Alan Gilbert3-4/+13
During the postcopy phase we must not call the iterate method on precopy-only devices, since they may have done some cleanup during the _complete call at the end of the precopy phase. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10Don't sync dirty bitmaps in postcopyDr. David Alan Gilbert1-2/+5
Once we're in postcopy the source processors are stopped and memory shouldn't change any more, so there's no need to look at the dirty map. There are two notes to this: 1) If we do resync and a page had changed then the page would get sent again, which the destination wouldn't allow (since it might have also modified the page) 2) Before disabling this I'd seen very rare cases where a page had been marked dirtied although the memory contents are apparently identical Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10postcopy: Check order of received target pagesDr. David Alan Gilbert1-0/+11
Ensure that target pages received within a host page are in order. This shouldn't trigger, but in the cases where the sender goes wrong and sends stuff out of order it produces a corruption that's really nasty to debug. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>