peter/qemu - QEMU hacking for Peter

Age	Commit message (Collapse)	Author	Files	Lines
2017-04-21	Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20170421' ↵	Peter Maydell	1	-4/+4
	into staging migration/next for 20170421 # gpg: Signature made Fri 21 Apr 2017 11:28:13 BST # gpg: using RSA key 0xF487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20170421: (65 commits) hmp: info migrate_parameters format tunes hmp: info migrate_capability format tunes migration: rename max_size to threshold_size migration: set current_active_state once virtio-rng: stop virtqueue while the CPU is stopped migration: don't close a file descriptor while it can be in use ram: Remove migration_bitmap_extend() migration: Disable hotplug/unplug during migration qdev: Move qdev_unplug() to qdev-monitor.c qdev: Export qdev_hot_removed qdev: qdev_hotplug is really a bool migration: Remove MigrationState parameter from migration_is_idle() ram: Use RAMBitmap type for coherence ram: rename last_ram_offset() last_ram_pages() ram: Use ramblock and page offset instead of absolute offset ram: Change offset field in PageSearchStatus to page ram: Remember last_page instead of last_offset ram: Use page number instead of an address for the bitmap operations ram: reorganize last_sent_block ram: ram_discard_range() don't use the mis parameter ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-21	ram: Remove migration_bitmap_extend()	Juan Quintela	1	-2/+0
	We have disabled memory hotplug, so we don't need to handle migration_bitamp there. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
2017-04-21	ram: rename last_ram_offset() last_ram_pages()	Juan Quintela	1	-1/+1
	We always use it as pages anyways. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-04-21	ram: Pass RAMBlock to bitmap_sync	Juan Quintela	1	-0/+2
	We change the meaning of start to be the offset from the beggining of the block. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-04-21	ram: Change num_dirty_pages_period type to uint64_t	Juan Quintela	1	-1/+1
	Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
2017-04-20	intel_iommu: provide its own replay() callback	Peter Xu	1	-0/+2
	The default replay() don't work for VT-d since vt-d will have a huge default memory region which covers address range 0-(2^64-1). This will normally consumes a lot of time (which looks like a dead loop). The solution is simple - we don't walk over all the regions. Instead, we jump over the regions when we found that the page directories are empty. It'll greatly reduce the time to walk the whole region. To achieve this, we provided a page walk helper to do that, invoking corresponding hook function when we found an page we are interested in. vtd_page_walk_level() is the core logic for the page walking. It's interface is designed to suite further use case, e.g., to invalidate a range of addresses. Reviewed-by: Jason Wang <jasowang@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-8-git-send-email-peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20	memory: add MemoryRegionIOMMUOps.replay() callback	Peter Xu	1	-0/+2
	Originally we have one memory_region_iommu_replay() function, which is the default behavior to replay the translations of the whole IOMMU region. However, on some platform like x86, we may want our own replay logic for IOMMU regions. This patch adds one more hook for IOMMUOps for the callback, and it'll override the default if set. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-6-git-send-email-peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20	memory: introduce memory_region_notify_one()	Peter Xu	1	-0/+15
	Generalizing the notify logic in memory_region_notify_iommu() into a single function. This can be further used in customized replay() functions for IOMMUs. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-5-git-send-email-peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20	memory: provide iommu_replay_all()	Peter Xu	1	-0/+8
	This is an "global" version of existing memory_region_iommu_replay() - we announce the translations to all the registered notifiers, instead of a specific one. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-4-git-send-email-peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20	memory: provide IOMMU_NOTIFIER_FOREACH macro	Peter Xu	1	-0/+3
	A new macro is provided to iterate all the IOMMU notifiers hooked under specific IOMMU memory region. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-3-git-send-email-peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20	memory: add section range info for IOMMU notifier	Peter Xu	1	-1/+18
	In this patch, IOMMUNotifier.{start\|end} are introduced to store section information for a specific notifier. When notification occurs, we not only check the notification type (MAP\|UNMAP), but also check whether the notified iova range overlaps with the range of specific IOMMU notifier, and skip those notifiers if not in the listened range. When removing an region, we need to make sure we removed the correct VFIOGuestIOMMU by checking the IOMMUNotifier.start address as well. This patch is solving the problem that vfio-pci devices receive duplicated UNMAP notification on x86 platform when vIOMMU is there. The issue is that x86 IOMMU has a (0, 2^64-1) IOMMU region, which is splitted by the (0xfee00000, 0xfeefffff) IRQ region. AFAIK this (splitted IOMMU region) is only happening on x86. This patch also helps vhost to leverage the new interface as well, so that vhost won't get duplicated cache flushes. In that sense, it's an slight performance improvement. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-2-git-send-email-peterx@redhat.com> [ehabkost: included extra vhost_iommu_region_del() change from Peter Xu] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-03	exec: revert MemoryRegionCache	Paolo Bonzini	1	-6/+4
	MemoryRegionCache did not know about virtio support for IOMMUs (because the two features were developed at the same time). Revert MemoryRegionCache to "normal" address_space_* operations for 2.9, as it is simpler than undoing the virtio patches. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-16	RAMBlocks: qemu_ram_is_shared	Dr. David Alan Gilbert	1	-0/+1
	Provide a helper to say whether a RAMBlock was created as a shared mapping. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-03-16	Change the method to calculate dirty-pages-rate	Chao Fan	1	-1/+4
	In function cpu_physical_memory_sync_dirty_bitmap, file include/exec/ram_addr.h: if (src[idx][offset]) { unsigned long bits = atomic_xchg(&src[idx][offset], 0); unsigned long new_dirty; new_dirty = ~dest[k]; dest[k] \|= bits; new_dirty &= bits; num_dirty += ctpopl(new_dirty); } After these codes executed, only the pages not dirtied in bitmap(dest), but dirtied in dirty_memory[DIRTY_MEMORY_MIGRATION] will be calculated. For example: When ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] = 0b00001111, and atomic_rcu_read(&migration_bitmap_rcu)->bmap = 0b00000011, the new_dirty will be 0b00001100, and this function will return 2 but not 4 which is expected. the dirty pages in dirty_memory[DIRTY_MEMORY_MIGRATION] are all new, so these should be calculated also. Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-03-14	memory_region: Fix name comments	Dr. David Alan Gilbert	1	-6/+12
	The 'name' parameter to memory_region_init_* had been marked as debug only, however vmstate_region_ram uses it as a parameter to qemu_ram_set_idstr to set RAMBlock names and these form part of the migration stream. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170309152708.30635-1-dgilbert@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-04	Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170303' ↵	Peter Maydell	1	-0/+1
	into staging ppc patch queuye for 2017-03-03 This will probably be my last pull request before the hard freeze. It has some new work, but that has all been posted in draft before the soft freeze, so I think it's reasonable to include in qemu-2.9. This batch has: * A substantial amount of POWER9 work * Implements the legacy (hash) MMU for POWER9 * Some more preliminaries for implementing the POWER9 radix MMU * POWER9 has_work * Basic POWER9 compatibility mode handling * Removal of some premature tests * Some cleanups and fixes to the existing MMU code to make the POWER9 work simpler * A bugfix for TCG multiply adds on power * Allow pseries guests to access PCIe extended config space This also includes a code-motion not strictly in ppc code - moving getrampagesize() from ppc code to exec.c. This will make some future VFIO improvements easier, Paolo said it was ok to merge via my tree. # gpg: Signature made Fri 03 Mar 2017 03:20:36 GMT # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.9-20170303: target/ppc: rewrite f[n]m[add,sub] using float64_muladd spapr: Small cleanup of PPC MMU enums spapr_pci: Advertise access to PCIe extended config space target/ppc: Rework hash mmu page fault code and add defines for clarity target/ppc: Move no-execute and guarded page checking into new function target/ppc: Add execute permission checking to access authority check target/ppc: Add Instruction Authority Mask Register Check hw/ppc/spapr: Add POWER9 to pseries cpu models target/ppc/POWER9: Add cpu_has_work function for POWER9 target/ppc/POWER9: Add POWER9 pa-features definition target/ppc/POWER9: Add POWER9 mmu fault handler target/ppc: Don't gen an SDR1 on POWER9 and rework register creation target/ppc: Add patb_entry to sPAPRMachineState target/ppc/POWER9: Add POWERPC_MMU_V3 bit powernv: Don't test POWER9 CPU yet exec, kvm, target-ppc: Move getrampagesize() to common code target/ppc: Add POWER9/ISAv3.00 to compat_table Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-03	memory: Introduce DEVICE_HOST_ENDIAN for ram device	Yongji Xie	1	-0/+6
	At the moment ram device's memory regions are DEVICE_NATIVE_ENDIAN. It's incorrect. This memory region is backed by a MMIO area in host, so the uint64_t data that MemoryRegionOps read from/write to this area should be host-endian rather than target-endian. Hence, current code does not work when target and host endianness are different which is the most common case on PPC64. To fix it, this introduces DEVICE_HOST_ENDIAN for the ram device. This has been tested on PPC64 BE/LE host/guest in all possible combinations including TCG. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> Message-Id: <1488171164-28319-1-git-send-email-xyjxie@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-03	Merge branch 'icount-update' into HEAD	Paolo Bonzini	1	-28/+25
	Merge the original development branch due to breakage caused by the MTTCG merge. Conflicts: cpu-exec.c translate-common.c Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-03	exec, kvm, target-ppc: Move getrampagesize() to common code	Alexey Kardashevskiy	1	-0/+1
	getrampagesize() returns the largest supported page size and mainly used to know if huge pages are enabled. However is implemented in target-ppc/kvm.c and not available in TCG or other architectures. This renames and moves gethugepagesize() to mmap-alloc.c where fd-based analog of it is already implemented. This renames and moves getrampagesize() to exec.c as it seems to be the common place for helpers like this. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-28	postcopy: Record largest page size	Dr. David Alan Gilbert	1	-0/+1
	Record the largest page size in use; we'll need it soon for allocating temporary buffers. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170224182844.32452-7-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-02-28	exec: ram_block_discard_range	Dr. David Alan Gilbert	1	-0/+1
	Create ram_block_discard_range in exec.c to replace postcopy_ram_discard_range and most of ram_discard_range. Those two routines are a bit of a weird combination, and ram_discard_range is about to get more complex for hugepages. It's OS dependent code (so shouldn't be in migration/ram.c) but it needs quite a bit of the innards of RAMBlock so doesn't belong in the os*.c. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170224182844.32452-5-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-02-24	cputlb: introduce tlb_flush_*_all_cpus[_synced]	Alex Bennée	1	-3/+113
	This introduces support to the cputlb API for flushing all CPUs TLBs with one call. This avoids the need for target helpers to iterate through the vCPUs themselves. An additional variant of the API (_synced) will cause the source vCPUs work to be scheduled as "safe work". The result will be all the flush operations will be complete by the time the originating vCPU executes its safe work. The calling implementation can either end the TB straight away (which will then pick up the cpu->exit_request on entering the next block) or defer the exit until the architectural sync point (usually a barrier instruction). Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-02-24	cputlb: atomically update tlb fields used by tlb_reset_dirty	Alex Bennée	1	-2/+0
	The main use case for tlb_reset_dirty is to set the TLB_NOTDIRTY flags in TLB entries to force the slow-path on writes. This is used to mark page ranges containing code which has been translated so it can be invalidated if written to. To do this safely we need to ensure the TLB entries in question for all vCPUs are updated before we attempt to run the code otherwise a race could be introduced. To achieve this we atomically set the flag in tlb_reset_dirty_range and take care when setting it when the TLB entry is filled. On 32 bit systems attempting to emulate 64 bit guests we don't even bother as we might not have the atomic primitives available. MTTCG is disabled in this case and can't be forced on. The copy_tlb_helper function helps keep the atomic semantics in one place to avoid confusion. The dirty helper function is made static as it isn't used outside of cputlb. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-02-24	cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap	Alex Bennée	1	-6/+7
	While the vargs approach was flexible the original MTTCG ended up having munge the bits to a bitmap so the data could be used in deferred work helpers. Instead of hiding that in cputlb we push the change to the API to make it take a bitmap of MMU indexes instead. For ARM some the resulting flushes end up being quite long so to aid readability I've tended to move the index shifting to a new line so all the bits being or-ed together line up nicely, for example: tlb_flush_page_by_mmuidx(other_cs, pageaddr, (1 << ARMMMUIdx_S1SE1) \| (1 << ARMMMUIdx_S1SE0)); Signed-off-by: Alex Bennée <alex.bennee@linaro.org> [AT: SPARC parts only] Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com> Reviewed-by: Richard Henderson <rth@twiddle.net> [PM: ARM parts only] Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2017-02-24	cputlb: introduce tlb_flush_* async work.	KONRAD Frederic	1	-0/+1
	Some architectures allow to flush the tlb of other VCPUs. This is not a problem when we have only one thread for all VCPUs but it definitely needs to be an asynchronous work when we are in true multithreaded work. We take the tb_lock() when doing this to avoid racing with other threads which may be invalidating TB's at the same time. The alternative would be to use proper atomic primitives to clear the tlb entries en-mass. This patch doesn't do anything to protect other cputlb function being called in MTTCG mode making cross vCPU changes. Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> [AJB: remove need for g_malloc on defer, make check fixes, tb_lock] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-02-24	tcg: remove global exit_request	Alex Bennée	1	-3/+0
	There are now only two uses of the global exit_request left. The first ensures we exit the run_loop when we first start to process pending work and in the kick handler. This is just as easily done by setting the first_cpu->exit_request flag. The second use is in the round robin kick routine. The global exit_request ensured every vCPU would set its local exit_request and cause a full exit of the loop. Now the iothread isn't being held while running we can just rely on the kick handler to push us out as intended. We lightly re-factor the main vCPU thread to ensure cpu->exit_requests cause us to exit the main loop and process any IO requests that might come along. As an cpu->exit_request may legitimately get squashed while processing the EXCP_INTERRUPT exception we also check cpu->queued_work_first to ensure queued work is expedited as soon as possible. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-02-24	tcg: rename tcg_current_cpu to tcg_current_rr_cpu	Alex Bennée	1	-1/+0
	..and make the definition local to cpus. In preparation for MTTCG the concept of a global tcg_current_cpu will no longer make sense. However we still need to keep track of it in the single-threaded case to be able to exit quickly when required. qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to emphasise its use-case. qemu_cpu_kick now kicks the relevant cpu as well as qemu_kick_rr_cpu() which will become a no-op in MTTCG. For the time being the setting of the global exit_request remains. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2017-02-22	cpu-exec: unify icount_decr and tcg_exit_req	Paolo Bonzini	1	-28/+25
	The icount interrupt flag and tcg_exit_req serve almost the same purpose, let's make them completely the same. The former TB_EXIT_REQUESTED and TB_EXIT_ICOUNT_EXPIRED cases are unified, since we can distinguish them from the value of the interrupt flag. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-17	virtio: use MemoryRegionCache to access descriptors	Paolo Bonzini	1	-0/+2
	For now, the cache is created on every virtqueue_pop. Later on, direct descriptors will be able to reuse it. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-16	cpu-exec: fix icount out-of-bounds access	Paolo Bonzini	1	-0/+1
	When icount is active, tb_add_jump is surprisingly called with an out of bounds basic block index. I have no idea how that can work, but it does not seem like a good idea. Clear *last_tb for all TB_EXIT_ICOUNT_EXPIRED cases, even when all you have to do is refill icount_extra. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-31	trace: switch to modular code generation for sub-directories	Daniel P. Berrange	2	-2/+2
	Introduce rules in the top level Makefile that are able to generate trace.[ch] files in every subdirectory which has a trace-events file. The top level directory is handled specially, so instead of creating trace.h, it creates trace-root.h. This allows sub-directories to include the top level trace-root.h file, without ambiguity wrt to the trace.g file in the current sub-dir. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170125161417.31949-7-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-27	memory: hmp: add "-f" for "info mtree"	Peter Xu	1	-1/+1
	Adding one more option "-f" for "info mtree" to dump the flat views of all the address spaces. This will be useful to debug the memory rendering logic, also it'll be much easier with it to know what memory region is handling what address range. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1484556005-29701-3-git-send-email-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-20	Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging	Peter Maydell	3	-49/+75
	* QOM interface fix (Eduardo) * RTC fixes (Gaohuai, Igor) * Memory leak fixes (Li Qiang, me) * Ctrl-a b regression (Marc-André) * Stubs cleanups and fixes (Leif, me) * hxtool tweak (me) * HAX support (Vincent) * QemuThread, exec.c and SCSI fixes (Roman, Xinhua, me) * PC_COMPAT_2_8 fix (Marcelo) * stronger bitmap assertions (Peter) # gpg: Signature made Fri 20 Jan 2017 12:49:01 GMT # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (35 commits) pc.h: move x-mach-use-reliable-get-clock compat entry to PC_COMPAT_2_8 bitmap: assert that start and nr are non negative Revert "win32: don't run subprocess tests on Mingw32 platform" hax: add Darwin support Plumb the HAXM-based hardware acceleration support target/i386: Add Intel HAX files kvm: move cpu synchronization code KVM: PPC: eliminate unnecessary duplicate constants ramblock-notifier: new char: fix ctrl-a b not working exec: Add missing rcu_read_unlock x86: ioapic: fix fail migration when irqchip=split x86: ioapic: dump version for "info ioapic" x86: ioapic: add traces for ioapic hxtool: emit Texinfo headings as @subsection qemu-thread: fix qemu_thread_set_name() race in qemu_thread_create() serial: fix memory leak in serial exit scsi-block: fix direction of BYTCHK test for VERIFY commands pc: fix crash in rtc_set_memory() if initial cpu is marked as hotplugged acpi: filter based on CONFIG_ACPI_X86 rather than TARGET ... # Conflicts: # include/hw/i386/pc.h
2017-01-16	ramblock-notifier: new	Paolo Bonzini	3	-49/+75
	This adds a notify interface of ram block additions and removals. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-13	cputlb: drop flush_global flag from tlb_flush	Alex Bennée	1	-8/+6
	We have never has the concept of global TLB entries which would avoid the flush so we never actually use this flag. Drop it and make clear that tlb_flush is the sledge-hammer it has always been. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> [DG: ppc portions] Acked-by: David Gibson <david@gibson.dropbear.id.au>
2017-01-10	memory: handle alias in memory_region_is_iommu()	Jason Wang	1	-0/+3
	Cc: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
2017-01-10	exec: introduce address_space_get_iotlb_entry()	Jason Wang	1	-0/+5
	This patch introduces a helper to query the iotlb entry for a possible iova. This will be used by later device IOTLB API to enable the capability for a dataplane (e.g vhost) to query the IOTLB. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-22	exec: introduce MemoryRegionCache	Paolo Bonzini	2	-0/+174
	Device models often have to perform multiple access to a single memory region that is known in advance, but would to use "DMA-style" functions instead of address_space_map/unmap. This can happen for example when the data has to undergo endianness conversion. Introduce a new data structure to cache the result of address_space_translate without forcing usage of a host address like address_space_map does. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-22	exec: introduce memory_ldst.inc.c	Paolo Bonzini	2	-15/+15
	Templatize the address_space_* and *_phys functions, so that we can add similar functions in the next patch that work with a lightweight, cache-like version of address_space_map/unmap. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-22	cpu_ldst.h: use correct guest address parameter	Bobby Bingham	1	-1/+1
	In the user emulation code path, tlb_vaddr_to_host erronesously passed vaddr as the guest address to be translated, instead of addr, the parameter which actually contained the guest address. This resulted in incorrect addresses being used when emulating block copy (mvc/mvpg) and block clear (xc) instructions for the s390x target. Signed-off-by: Bobby Bingham <koorogi@koorogi.info> Message-Id: <20161113050523.23909-1-koorogi@koorogi.info> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-31	Merge remote-tracking branch 'remotes/awilliam/tags/vfio-updates-20161031.0' ↵	Peter Maydell	1	-15/+32
	into staging VFIO updates 2016-10-31 - Replace skip_dump with ram_device to denote device memory and mark as non-direct to avoid memcpy to MMIO - fixes RTL (Alex Williamson) - Skip zero-length sparse mmaps - avoids unnecessary warning (Alex Williamson) - Clear BARs on reset so guest doesn't assume programming on return from S3 (Ido Yariv) - Enable sub-page MMIO mmaps - performance improvement for devices with smaller BARs, iff both host and guest map them to full, aligned pages (Yongji Xie) # gpg: Signature made Mon 31 Oct 2016 17:26:47 GMT # gpg: using RSA key 0x239B9B6E3BB08B22 # gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>" # gpg: aka "Alex Williamson <alex@shazbot.org>" # gpg: aka "Alex Williamson <alwillia@redhat.com>" # gpg: aka "Alex Williamson <alex.l.williamson@gmail.com>" # Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B 8A90 239B 9B6E 3BB0 8B22 * remotes/awilliam/tags/vfio-updates-20161031.0: vfio: Add support for mmapping sub-page MMIO BARs vfio/pci: fix out-of-sync BAR information on reset vfio: Handle zero-length sparse mmap ranges memory: Don't use memcpy for ram_device regions memory: Replace skip_dump flag with "ram_device" Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-31	memory: Don't use memcpy for ram_device regions	Alex Williamson	1	-2/+4
	With a vfio assigned device we lay down a base MemoryRegion registered as an IO region, giving us read & write accessors. If the region supports mmap, we lay down a higher priority sub-region MemoryRegion on top of the base layer initialized as a RAM device pointer to the mmap. Finally, if we have any quirks for the device (ie. address ranges that need additional virtualization support), we put another IO sub-region on top of the mmap MemoryRegion. When this is flattened, we now potentially have sub-page mmap MemoryRegions exposed which cannot be directly mapped through KVM. This is as expected, but a subtle detail of this is that we end up with two different access mechanisms through QEMU. If we disable the mmap MemoryRegion, we make use of the IO MemoryRegion and service accesses using pread and pwrite to the vfio device file descriptor. If the mmap MemoryRegion is enabled and results in one of these sub-page gaps, QEMU handles the access as RAM, using memcpy to the mmap. Using either pread/pwrite or the mmap directly should be correct, but using memcpy causes us problems. I expect that not only does memcpy not necessarily honor the original width and alignment in performing a copy, but it potentially also uses processor instructions not intended for MMIO spaces. It turns out that this has been a problem for Realtek NIC assignment, which has such a quirk that creates a sub-page mmap MemoryRegion access. To resolve this, we disable memory_access_is_direct() for ram_device regions since QEMU assumes that it can use memcpy for those regions. Instead we access through MemoryRegionOps, which replaces the memcpy with simple de-references of standard sizes to the host memory. With this patch we attempt to provide unrestricted access to the RAM device, allowing byte through qword access as well as unaligned access. The assumption here is that accesses initiated by the VM are driven by a device specific driver, which knows the device capabilities. If unaligned accesses are not supported by the device, we don't want them to work in a VM by performing multiple aligned accesses to compose the unaligned access. A down-side of this philosophy is that the xp command from the monitor attempts to use the largest available access weidth, unaware of the underlying device. Using memcpy had this same restriction, but at least now an operator can dump individual registers, even if blocks of device memory may result in access widths beyond the capabilities of a given device (RTL NICs only support up to dword). Reported-by: Thorsten Kohfeldt <thorsten.kohfeldt@gmx.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-31	memory: Replace skip_dump flag with "ram_device"	Alex Williamson	1	-13/+28
	Setting skip_dump on a MemoryRegion allows us to modify one specific code path, but the restriction we're trying to address encompasses more than that. If we have a RAM MemoryRegion backed by a physical device, it not only restricts our ability to dump that region, but also affects how we should manipulate it. Here we recognize that MemoryRegions do not change to sometimes allow dumps and other times not, so we replace setting the skip_dump flag with a new initializer so that we know exactly the type of region to which we're applying this behavior. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-31	tcg: comment on which functions have to be called with tb_lock held	Paolo Bonzini	1	-0/+1
	softmmu requires more functions to be thread-safe, because translation blocks can be invalidated from e.g. notdirty callbacks. Probably the same holds for user-mode emulation, it's just that no one has ever tried to produce a coherent locking there. This patch will guide the introduction of more tb_lock and tb_unlock calls for system emulation. Note that after this patch some (most) of the mentioned functions are still called outside tb_lock/tb_unlock. The next one will rectify this. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-Id: <20161027151030.20863-7-alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-31	translate-all: add DEBUG_LOCKING asserts	Alex Bennée	1	-0/+1
	This adds asserts to check the locking on the various translation engines structures. There are two sets of structures that are protected by locks. The first the l1map and PageDesc structures used to track which translation blocks are associated with which physical addresses. In user-mode this is covered by the mmap_lock. The second case are TB context related structures which are protected by tb_lock which is also user-mode only. Currently the asserts do nothing in SoftMMU mode but this will change for MTTCG. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-Id: <20161027151030.20863-4-alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-26	tcg: Add EXCP_ATOMIC	Richard Henderson	2	-0/+2
	When we cannot emulate an atomic operation within a parallel context, this exception allows us to stop the world and try again in a serial context. Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-25	Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into ↵	Peter Maydell	1	-1/+0
	staging x86 and CPU queue, 2016-10-24 x2APIC support to APIC code, cpu_exec_init() refactor on all architectures, and other x86 changes. # gpg: Signature made Mon 24 Oct 2016 20:51:14 BST # gpg: using RSA key 0x2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost/tags/x86-pull-request: exec: call cpu_exec_exit() from a CPU unrealize common function exec: move cpu_exec_init() calls to realize functions exec: split cpu_exec_init() pc: q35: Bump max_cpus to 288 pc: Require IRQ remapping and EIM if there could be x2APIC CPUs pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs Increase MAX_CPUMASK_BITS from 255 to 288 pc: Clarify FW_CFG_MAX_CPUS usage comment pc: kvm_apic: Pass APIC ID depending on xAPIC/x2APIC mode pc: apic_common: Reset APIC ID to initial ID when switching into x2APIC mode pc: apic_common: Restore APIC ID to initial ID on reset pc: apic_common: Extend APIC ID property to 32bit pc: Leave max apic_id_limit only in legacy cpu hotplug code acpi: cphp: Force switch to modern cpu hotplug if APIC ID > 254 pc: acpi: x2APIC support for SRAT table pc: acpi: x2APIC support for MADT table and _MAT method Conflicts: target-arm/cpu.c Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-24	exec: move cpu_exec_init() calls to realize functions	Laurent Vivier	1	-1/+0
	Modify all CPUs to call it from XXX_cpu_realizefn() function. Remove all the cannot_destroy_with_object_finalize_yet as unsafe references have been moved to cpu_exec_realizefn(). (tested with QOM command provided by commit 4c315c27) for arm: Setting of cpu->mp_affinity is moved from arm_cpu_initfn() to arm_cpu_realizefn() as setting of cpu_index is now done in cpu_exec_realizefn(). To avoid to overwrite an user defined value, we set it to an invalid value by default, and update it in realize function only if the value is still invalid. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-24	cpu: Support a target CPU having a variable page size	Peter Maydell	1	-0/+9
	Support target CPUs having a page size which isn't knownn at compile time. To use this, the CPU implementation should: * define TARGET_PAGE_BITS_VARY * not define TARGET_PAGE_BITS * define TARGET_PAGE_BITS_MIN to the smallest value it might possibly want for TARGET_PAGE_BITS * call set_preferred_target_page_bits() in its realize function to indicate the actual preferred target page size for the CPU (and report any error from it) In CONFIG_USER_ONLY, the CPU implementation should continue to define TARGET_PAGE_BITS appropriately for the guest OS page size. Machines which want to take advantage of having the page size something larger than TARGET_PAGE_BITS_MIN must set the MachineClass minimum_page_bits field to a value which they guarantee will be no greater than the preferred page size for any CPU they create. Note that changing the target page size by setting minimum_page_bits is a migration compatibility break for that machine. For debugging purposes, attempts to use TARGET_PAGE_SIZE before it has been finally confirmed will assert. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-24	memory: add a per-AddressSpace list of listeners	Paolo Bonzini	1	-1/+2
	This speeds up MEMORY_LISTENER_CALL noticeably. Right now, with many PCI devices you have N regions added to M AddressSpaces (M = # PCI devices with bus-master enabled) and each call looks up the whole listener list, with at least M listeners in it. Because most of the regions in N are BARs, which are also roughly proportional to M, the whole thing is O(M^3). This changes it to O(M^2), which is the best we can do without rewriting the whole thing. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>