path: root/exec.c
Age | Commit message | Author | Files | Lines
2017-04-03 | exec: revert MemoryRegionCache | Paolo Bonzini | 1 | -53/+11
MemoryRegionCache did not know about virtio support for IOMMUs (because the two features were developed at the same time). Revert MemoryRegionCache to "normal" address_space_* operations for 2.9, as it is simpler than undoing the virtio patches. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-16 | RAMBlocks: qemu_ram_is_shared | Dr. David Alan Gilbert | 1 | -0/+5
Provide a helper to say whether a RAMBlock was created as a shared mapping. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
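A helper like this is typically a one-line flag check; a minimal sketch, assuming the RAMBlock keeps its creation flags in an rb->flags field with a RAM_SHARED bit (both names are assumptions, not taken from the patch text):

    /* Sketch: true if the RAMBlock was created as a shared mapping. */
    bool qemu_ram_is_shared(RAMBlock *rb)
    {
        return rb->flags & RAM_SHARED;   /* assumed flag recorded at creation time */
    }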
2017-03-14 | exec: add cpu_synchronize_state to cpu_memory_rw_debug | Christian Borntraeger | 1 | -0/+2
I sometimes got "Cannot access memory" when using the x command on the monitor. It turns out that the cpu env contained stale data (e.g. wrong control register content for the page table origin). We must synchronize the state of the CPU before walking the page tables. A similar issue happens for a remote gdb, so let's do the cpu_synchronize_state in cpu_memory_rw_debug. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <1488896348-13560-1-git-send-email-borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
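The change amounts to refreshing the register state from the accelerator before the debug walk; a simplified sketch of the placement (the rest of the function body is elided here):

    int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
                            uint8_t *buf, int len, int is_write)
    {
        /* Pull current registers (e.g. control regs) from the accelerator into env. */
        cpu_synchronize_state(cpu);

        /* ... existing per-page translation and memory copy ... */
    }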
2017-03-14 | mem-prealloc: reduce large guest start-up and migration time. | Jitendra Kolhe | 1 | -1/+1
Using the "-mem-prealloc" option for a large guest leads to higher guest start-up and migration time. This is because with "-mem-prealloc" qemu tries to map every guest page (create address translations) and make sure the pages are available during runtime. virsh/libvirt by default seems to use the "-mem-prealloc" option in case the guest is configured to use huge pages. The patch tries to map all guest pages simultaneously by spawning multiple threads. Currently the change is limited to QEMU library functions on POSIX compliant hosts only, as we are not sure if the problem exists on win32. Below are some stats with the "-mem-prealloc" option for a guest configured to use huge pages.
------------------------------------------------------------------------
Idle Guest      | Start-up time | Migration time
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - single threaded (existing code)
------------------------------------------------------------------------
64 Core - 4TB   | 54m11.796s    | 75m43.843s
64 Core - 1TB   | 8m56.576s     | 14m29.049s
64 Core - 256GB | 2m11.245s     | 3m26.598s
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - map guest pages using 8 threads
------------------------------------------------------------------------
64 Core - 4TB   | 5m1.027s      | 34m10.565s
64 Core - 1TB   | 1m10.366s     | 8m28.188s
64 Core - 256GB | 0m19.040s     | 2m10.148s
-----------------------------------------------------------------------
Guest stats with 2M HugePage usage - map guest pages using 16 threads
-----------------------------------------------------------------------
64 Core - 4TB   | 1m58.970s     | 31m43.400s
64 Core - 1TB   | 0m39.885s     | 7m55.289s
64 Core - 256GB | 0m11.960s     | 2m0.135s
-----------------------------------------------------------------------
Changed in v2:
 - modify number of memset threads spawned to min(smp_cpus, 16).
 - removed 64GB memory restriction for spawning memset threads.
Changed in v3:
 - limit number of threads spawned based on min(sysconf(_SC_NPROCESSORS_ONLN), 16, smp_cpus).
 - implement memset thread specific siglongjmp in SIGBUS signal_handler.
Changed in v4:
 - remove sigsetjmp/siglongjmp and SIGBUS unblock/block for main thread as main thread no longer touches any pages.
 - simplify code by returning memset_thread_failed status from touch_all_pages.
Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com> Message-Id: <1487907103-32350-1-git-send-email-jitendra.kolhe@hpe.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
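The underlying idea is a plain fork/join over the preallocated region; a self-contained sketch of the approach with pthreads (the 16-thread cap and the sysconf/smp_cpus inputs mirror the description above; everything else, including the function names, is illustrative):

    #include <pthread.h>
    #include <unistd.h>
    #include <stddef.h>

    typedef struct {
        char *start;          /* first byte this worker touches */
        size_t numpages;      /* pages assigned to this worker */
        size_t pagesize;
    } TouchArgs;

    static void *touch_pages(void *opaque)
    {
        TouchArgs *a = opaque;
        for (size_t i = 0; i < a->numpages; i++) {
            /* Writing one byte per page forces the translation to be created. */
            volatile char *p = a->start + i * a->pagesize;
            *p = *p;
        }
        return NULL;
    }

    /* Touch 'numpages' pages starting at 'area' using up to 16 workers. */
    static void touch_all_pages_sketch(char *area, size_t pagesize,
                                       size_t numpages, int smp_cpus)
    {
        long online = sysconf(_SC_NPROCESSORS_ONLN);
        int nthreads = (online > 0 && online < smp_cpus) ? (int)online : smp_cpus;
        if (nthreads > 16) {
            nthreads = 16;
        }
        if (nthreads < 1) {
            nthreads = 1;
        }

        pthread_t tid[16];
        TouchArgs args[16];
        size_t per_thread = numpages / nthreads;

        for (int i = 0; i < nthreads; i++) {
            args[i].start = area + (size_t)i * per_thread * pagesize;
            args[i].numpages = (i == nthreads - 1)
                               ? numpages - (size_t)i * per_thread : per_thread;
            args[i].pagesize = pagesize;
            pthread_create(&tid[i], NULL, touch_pages, &args[i]);
        }
        for (int i = 0; i < nthreads; i++) {
            pthread_join(tid[i], NULL);
        }
    }

The real patch additionally reports a memset_thread_failed status back to the caller; error handling is omitted from this sketch.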
2017-03-03 | exec, kvm, target-ppc: Move getrampagesize() to common code | Alexey Kardashevskiy | 1 | -0/+82
getrampagesize() returns the largest supported page size and is mainly used to know whether huge pages are enabled. However, it is implemented in target-ppc/kvm.c and is not available in TCG or on other architectures. This renames and moves gethugepagesize() to mmap-alloc.c, where an fd-based analog of it is already implemented, and renames and moves getrampagesize() to exec.c, as that seems to be the common place for helpers like this. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-28 | postcopy: Record largest page size | Dr. David Alan Gilbert | 1 | -0/+13
Record the largest page size in use; we'll need it soon for allocating temporary buffers. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170224182844.32452-7-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-02-28 | postcopy: enhance ram_block_discard_range for hugepages | Dr. David Alan Gilbert | 1 | -4/+20
Unfortunately madvise DONTNEED doesn't work on hugetlbfs mappings, so use fallocate(FALLOC_FL_PUNCH_HOLE) instead. qemu_fd_getpagesize only sets the page size based on a file if the file is from hugetlbfs. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170224182844.32452-6-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
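The split between the two discard mechanisms looks roughly like the sketch below; this is a hedged approximation of the idea, not the exec.c code itself (error handling trimmed, Linux-only):

    #define _GNU_SOURCE
    #include <sys/types.h>
    #include <sys/mman.h>
    #include <fcntl.h>          /* fallocate() */
    #include <linux/falloc.h>   /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
    #include <stddef.h>

    /* Discard a range of guest RAM so the host can reclaim the backing pages. */
    static int discard_range_sketch(void *host_addr, int fd, off_t file_offset,
                                    size_t length)
    {
        if (fd == -1) {
            /* Anonymous memory: DONTNEED drops the pages; they refault as zero. */
            return madvise(host_addr, length, MADV_DONTNEED);
        }
        /*
         * hugetlbfs ignores MADV_DONTNEED, so punch a hole in the backing file
         * instead; KEEP_SIZE leaves the file length unchanged.
         */
        return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                         file_offset, length);
    }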
2017-02-28 | exec: ram_block_discard_range | Dr. David Alan Gilbert | 1 | -0/+54
Create ram_block_discard_range in exec.c to replace postcopy_ram_discard_range and most of ram_discard_range. Those two routines are a bit of a weird combination, and ram_discard_range is about to get more complex for hugepages. It's OS-dependent code (so it shouldn't be in migration/ram.c), but it needs quite a bit of the innards of RAMBlock, so it doesn't belong in os*.c either. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170224182844.32452-5-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-02-24 | tcg: drop global lock during TCG code execution | Jan Kiszka | 1 | -3/+9
This finally allows TCG to benefit from the iothread introduction: drop the global mutex while running pure TCG CPU code. Reacquire the lock when entering MMIO or PIO emulation, or when leaving the TCG loop. We have to revert a few optimizations for the current TCG threading model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not kicking it in qemu_cpu_kick. We also need to disable RAM block reordering until we have a more efficient locking mechanism at hand. Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here. These numbers demonstrate where we gain something:
  20338 jan  20  0  331m  75m  6904 R  99  0.9  0:50.95 qemu-system-arm
  20337 jan  20  0  331m  75m  6904 S  20  0.9  0:26.50 qemu-system-arm
The guest CPU was fully loaded, but the iothread could still run mostly independently on a second core. Without the patch we don't get beyond:
  32206 jan  20  0  330m  73m  7036 R  82  0.9  1:06.00 qemu-system-arm
  32204 jan  20  0  330m  73m  7036 S  21  0.9  0:17.03 qemu-system-arm
We don't benefit significantly, though, when the guest is not fully loading a host CPU. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com> [FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex] Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> [EGC: fixed iothread lock for cpu-exec IRQ handling] Signed-off-by: Emilio G. Cota <cota@braap.org> [AJB: -smp single-threaded fix, clean commit msg, BQL fixes] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com> [PM: target-arm changes] Acked-by: Peter Maydell <peter.maydell@linaro.org>
2017-02-17 | exec: make address_space_cache_destroy idempotent | Paolo Bonzini | 1 | -0/+1
Clear cache->mr so that address_space_cache_destroy does nothing the second time it is called. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
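Idempotence here is just an early-out on an already-cleared field; a minimal sketch of the shape of the change (the real function also releases whatever the cache holds):

    void address_space_cache_destroy(MemoryRegionCache *cache)
    {
        if (!cache->mr) {
            return;           /* already destroyed: the second call is a no-op */
        }
        /* ... drop the reference/mapping held by the cache ... */
        cache->mr = NULL;     /* mark the cache as destroyed */
    }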
2017-02-07 | arm: Correctly handle watchpoints for BE32 CPUs | Julian Brown | 1 | -0/+1
In BE32 mode, sub-word size watchpoints can fail to trigger because the address of the access is adjusted in the opcode helpers before being compared with the watchpoint registers. This patch reverses the address adjustment before performing the comparison with the help of a new CPUClass hook. This version of the patch augments and tidies up comments a little. Signed-off-by: Julian Brown <julian@codesourcery.com> Message-id: caaf64ffc72f6ae183015337b7afdbd4b8989cb6.1484929304.git.julian@codesourcery.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-01-31 | trace: switch to modular code generation for sub-directories | Daniel P. Berrange | 1 | -1/+1
Introduce rules in the top level Makefile that are able to generate trace.[ch] files in every subdirectory which has a trace-events file. The top level directory is handled specially, so instead of creating trace.h, it creates trace-root.h. This allows sub-directories to include the top level trace-root.h file without ambiguity with respect to the trace.h file in the current sub-dir. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170125161417.31949-7-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-27 | memory: don't sign-extend 32-bit writes | Ladi Prosek | 1 | -1/+1
ldl_p has a signed return type so assigning it to uint64_t implicitly sign-extends the value. This results in devices with min_access_size = 8 seeing unexpected values passed to their write handlers. Example: guest performs a 32-bit write of 0x80000000 to an mmio region and the handler receives 0xFFFFFFFF80000000 in its value argument. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Message-Id: <1485440557-10384-1-git-send-email-lprosek@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
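The root cause is ordinary C integer conversion; a standalone example reproducing the same effect, plus the obvious fix of going through an unsigned 32-bit type first:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        int32_t  val  = (int32_t)0x80000000u;  /* what a signed 32-bit load returns */
        uint64_t bad  = val;                   /* sign-extended: 0xffffffff80000000 */
        uint64_t good = (uint32_t)val;         /* zero-extended: 0x0000000080000000 */

        printf("bad=%016" PRIx64 " good=%016" PRIx64 "\n", bad, good);
        return 0;
    }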
2017-01-20 | Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging | Peter Maydell | 1 | -0/+6
* QOM interface fix (Eduardo)
* RTC fixes (Gaohuai, Igor)
* Memory leak fixes (Li Qiang, me)
* Ctrl-a b regression (Marc-André)
* Stubs cleanups and fixes (Leif, me)
* hxtool tweak (me)
* HAX support (Vincent)
* QemuThread, exec.c and SCSI fixes (Roman, Xinhua, me)
* PC_COMPAT_2_8 fix (Marcelo)
* stronger bitmap assertions (Peter)

# gpg: Signature made Fri 20 Jan 2017 12:49:01 GMT
# gpg: using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (35 commits)
  pc.h: move x-mach-use-reliable-get-clock compat entry to PC_COMPAT_2_8
  bitmap: assert that start and nr are non negative
  Revert "win32: don't run subprocess tests on Mingw32 platform"
  hax: add Darwin support
  Plumb the HAXM-based hardware acceleration support
  target/i386: Add Intel HAX files
  kvm: move cpu synchronization code
  KVM: PPC: eliminate unnecessary duplicate constants
  ramblock-notifier: new
  char: fix ctrl-a b not working
  exec: Add missing rcu_read_unlock
  x86: ioapic: fix fail migration when irqchip=split
  x86: ioapic: dump version for "info ioapic"
  x86: ioapic: add traces for ioapic
  hxtool: emit Texinfo headings as @subsection
  qemu-thread: fix qemu_thread_set_name() race in qemu_thread_create()
  serial: fix memory leak in serial exit
  scsi-block: fix direction of BYTCHK test for VERIFY commands
  pc: fix crash in rtc_set_memory() if initial cpu is marked as hotplugged
  acpi: filter based on CONFIG_ACPI_X86 rather than TARGET
  ...

# Conflicts:
#	include/hw/i386/pc.h
2017-01-16 | ramblock-notifier: new | Paolo Bonzini | 1 | -0/+5
This adds a notifier interface for RAM block additions and removals. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-16 | exec: Add missing rcu_read_unlock | Roman Kapl | 1 | -0/+1
rcu_read_unlock was not called if the address_space_access_valid result is negative. This caused (at least) a problem when qemu on PPC/E500+TAP failed to terminate properly and instead got stuck in a deadlock. Signed-off-by: Roman Kapl <rka@sysgo.com> Message-Id: <20170109110921.4931-1-rka@sysgo.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
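The bug is the classic early-return-under-a-lock pattern; a simplified sketch of the shape of the fix (check_access is a hypothetical stand-in for the real validity check, not the exec.c code):

    static bool access_valid_sketch(AddressSpace *as, hwaddr addr,
                                    int len, bool is_write)
    {
        rcu_read_lock();
        if (!check_access(as, addr, len, is_write)) {  /* hypothetical helper */
            rcu_read_unlock();                         /* the previously missing unlock */
            return false;
        }
        rcu_read_unlock();
        return true;
    }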
2017-01-13 | cputlb: drop flush_global flag from tlb_flush | Alex Bennée | 1 | -2/+2
We have never had the concept of global TLB entries, which would avoid the flush, so we never actually use this flag. Drop it and make clear that tlb_flush is the sledgehammer it has always been. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> [DG: ppc portions] Acked-by: David Gibson <david@gibson.dropbear.id.au>
2017-01-10 | exec: introduce address_space_get_iotlb_entry() | Jason Wang | 1 | -0/+33
This patch introduces a helper to query the IOTLB entry for a possible iova. This will be used by the later device IOTLB API to enable the capability for a dataplane (e.g. vhost) to query the IOTLB. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-22 | exec: introduce MemoryRegionCache | Paolo Bonzini | 1 | -0/+76
Device models often have to perform multiple accesses to a single memory region that is known in advance, but would like to use "DMA-style" functions instead of address_space_map/unmap. This can happen for example when the data has to undergo endianness conversion. Introduce a new data structure to cache the result of address_space_translate without forcing usage of a host address like address_space_map does. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
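Typical usage ends up looking something like the sketch below; the helper names follow the pattern this series introduces, but treat the exact signatures as assumptions and check include/exec/memory.h for the real interface:

    /* Sketch: read a 16-bit guest index through a cached translation. */
    static uint16_t read_idx_cached(AddressSpace *as, hwaddr base, hwaddr idx_offset)
    {
        MemoryRegionCache cache;
        uint16_t idx = 0;

        /* Translate [base, base + len) once instead of on every access. */
        if (address_space_cache_init(&cache, as, base, idx_offset + sizeof(idx),
                                     false /* is_write */) >= 0) {
            address_space_read_cached(&cache, idx_offset, &idx, sizeof(idx));
            address_space_cache_destroy(&cache);
        }
        return idx;
    }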
2016-12-22 | exec: introduce address_space_extend_translation | Paolo Bonzini | 1 | -21/+29
This extracts the common part of address_space_map and address_space_cache_init into a new function. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-22 | exec: introduce memory_ldst.inc.c | Paolo Bonzini | 1 | -671/+10
Templatize the address_space_* and *_phys functions, so that we can add similar functions in the next patch that work with a lightweight, cache-like version of address_space_map/unmap. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-22 | exec: optimize remaining address_space_* cases | Paolo Bonzini | 1 | -23/+103
Do them right before the next patch generalizes them into a multi-included file. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-06 | exec.c: Fix breakpoint invalidation race | Peter Maydell | 1 | -19/+6
A bug (1647683) was reported showing a crash when removing breakpoints. The reproducer was bisected to 3359baad, when tb_flush was finally made thread safe. In MTTCG the locking in breakpoint_invalidate would have prevented any problems, but currently tb_lock() is a NOP for system emulation. The race is between a tb_flush from the gdbstub and the tb_invalidate_phys_addr() in breakpoint_invalidate(). Ideally we'd have actual locking here; for the moment the simple fix is to do a full tb_flush() for a bp invalidate, since that is thread-safe even if no lock is taken. Reported-by: Julian Brown <julian@codesourcery.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1481047629-7763-1-git-send-email-peter.maydell@linaro.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-11-03 | Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging | Stefan Hajnoczi | 1 | -3/+30
* NBD bugfix (Changlong)
* NBD write zeroes support (Eric)
* Memory backend fixes (Haozhong)
* Atomics fix (Alex)
* New AVX512 features (Luwei)
* "make check" logging fix (Paolo)
* Chardev refactoring fallout (Paolo)
* Small checkpatch improvements (Paolo, Jeff)

# gpg: Signature made Wed 02 Nov 2016 08:31:11 AM GMT
# gpg: using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (30 commits)
  main-loop: Suppress I/O thread warning under qtest
  docs/rcu.txt: Fix minor typo
  vl: exit qemu on guest panic if -no-shutdown is not set
  checkpatch: allow spaces before parenthesis for 'coroutine_fn'
  x86: add AVX512_4VNNIW and AVX512_4FMAPS features
  slirp: fix CharDriver breakage
  qemu-char: do not forward events through the mux until QEMU has started
  nbd: Implement NBD_CMD_WRITE_ZEROES on client
  nbd: Implement NBD_CMD_WRITE_ZEROES on server
  nbd: Improve server handling of shutdown requests
  nbd: Refactor conversion to errno to silence checkpatch
  nbd: Support shorter handshake
  nbd: Less allocation during NBD_OPT_LIST
  nbd: Let client skip portions of server reply
  nbd: Let server know when client gives up negotiation
  nbd: Share common option-sending code in client
  nbd: Send message along with server NBD_REP_ERR errors
  nbd: Share common reply-sending code in server
  nbd: Rename struct nbd_request and nbd_reply
  nbd: Rename NbdClientSession to NBDClientSession
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-11-02 | exec.c: check memory backend file size with 'size' option | Haozhong Zhang | 1 | -0/+7
If the memory backend file is not large enough to hold the required 'size', QEMU will report an error and exit. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Message-Id: <20161027042300.5929-3-haozhong.zhang@intel.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20161102010551.2723-1-haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-01 | log: Add locking to large logging blocks | Richard Henderson | 1 | -0/+2
Reuse the existing locking provided by stdio to keep in_asm, cpu, op, op_opt, op_ind, and out_asm as contiguous blocks. While it isn't possible to interleave e.g. in_asm or op_opt logs because of the TB lock protecting all code generation, it is possible to interleave cpu logs, or to interleave a cpu dump with an out_asm dump. For mingw32, we appear to have no viable solution for this. The locking functions are not properly exported from the system runtime library. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net>
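The stdio lock being reused is the POSIX flockfile()/funlockfile() pair on the log FILE; a hedged sketch of what bracketing a multi-line dump could look like (qemu_logfile is just an assumed name for the shared log stream):

    #include <stdio.h>

    extern FILE *qemu_logfile;   /* assumption: the shared log stream */

    static void log_contiguous_block(const char *header, const char *body)
    {
        flockfile(qemu_logfile);             /* keep the whole block contiguous */
        fprintf(qemu_logfile, "%s\n", header);
        fprintf(qemu_logfile, "%s\n", body);
        funlockfile(qemu_logfile);
    }

As the commit notes, mingw32 does not properly export these stdio locking functions, so the scheme only helps on POSIX hosts.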
2016-11-01 | exec.c: do not truncate non-empty memory backend file | Haozhong Zhang | 1 | -1/+21
For '-object memory-backend-file,mem-path=foo,size=xyz', if the size of file 'foo' does not match the given size 'xyz', the current QEMU will truncate the file to the given size, which may corrupt the existing data in that file. To avoid such data corruption, this patch disables truncating non-empty backend files. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Message-Id: <20161027042300.5929-2-haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
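Taken together with the previous size check, the intended behaviour is roughly: grow empty backing files to the requested size, use larger files as-is, and refuse to shrink a non-empty file. A simplified sketch with plain POSIX calls (not the exec.c code itself):

    #include <sys/stat.h>
    #include <unistd.h>
    #include <stdio.h>

    /* Return 0 if fd can back a region of 'size' bytes, -1 otherwise. */
    static int prepare_backend_file(int fd, off_t size)
    {
        struct stat st;

        if (fstat(fd, &st) < 0) {
            return -1;
        }
        if (st.st_size == 0) {
            return ftruncate(fd, size);      /* empty file: safe to extend */
        }
        if (st.st_size < size) {
            fprintf(stderr, "backing file is smaller than the requested size\n");
            return -1;                       /* never truncate existing data */
        }
        return 0;                            /* large enough: use it as-is */
    }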
2016-11-01 | exec.c: ensure all AddressSpaceDispatch updates under RCU | Alex Bennée | 1 | -2/+2
The memory_dispatch field is meant to be protected by RCU so we should use the correct primitives when accessing it. This race was flagged up by the ThreadSanitizer. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20161021153418.21571-1-alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-31 | tcg: move locking for tb_invalidate_phys_page_range up | Alex Bennée | 1 | -0/+16
In the linux-user case all things that involve 'l1_map' and PageDesc tweaks are protected by the memory lock (mmap_lock). For SoftMMU mode we previously relied on single-threaded behaviour; with MTTCG we now use the tb_lock(). As a result we need to do a little refactoring and push the taking of this lock up the call tree. This requires a slightly different entry for the SoftMMU and user-mode cases from tb_invalidate_phys_range. This also means user-mode breakpoint insertion needs to take two locks, but it hadn't taken any previously, so this is an improvement. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20161027151030.20863-20-alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-31 | tcg: protect translation related stuff with tb_lock. | KONRAD Frederic | 1 | -0/+6
This protects all translation-related work with tb_lock() to ensure thread safety. This effectively serialises all code generation. In addition to the code generation we also take the lock for TB invalidation. This has a knock-on effect of meaning tb_lock() is held for modification of the SoftMMU TLB by non-self threads, which will be used in later patches. Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Message-Id: <1439220437-23957-8-git-send-email-fred.konrad@greensocs.com> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [AJB: moved into tree, clean-up history] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-Id: <20161027151030.20863-10-alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-26 | exec: Avoid direct references to Int128 parts | Richard Henderson | 1 | -2/+2
Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-25 | Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging | Peter Maydell | 1 | -5/+7
x86 and CPU queue, 2016-10-24

x2APIC support to APIC code, cpu_exec_init() refactor on all architectures, and other x86 changes.

# gpg: Signature made Mon 24 Oct 2016 20:51:14 BST
# gpg: using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/x86-pull-request:
  exec: call cpu_exec_exit() from a CPU unrealize common function
  exec: move cpu_exec_init() calls to realize functions
  exec: split cpu_exec_init()
  pc: q35: Bump max_cpus to 288
  pc: Require IRQ remapping and EIM if there could be x2APIC CPUs
  pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs
  Increase MAX_CPUMASK_BITS from 255 to 288
  pc: Clarify FW_CFG_MAX_CPUS usage comment
  pc: kvm_apic: Pass APIC ID depending on xAPIC/x2APIC mode
  pc: apic_common: Reset APIC ID to initial ID when switching into x2APIC mode
  pc: apic_common: Restore APIC ID to initial ID on reset
  pc: apic_common: Extend APIC ID property to 32bit
  pc: Leave max apic_id_limit only in legacy cpu hotplug code
  acpi: cphp: Force switch to modern cpu hotplug if APIC ID > 254
  pc: acpi: x2APIC support for SRAT table
  pc: acpi: x2APIC support for MADT table and _MAT method

Conflicts:
	target-arm/cpu.c

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-24 | exec: call cpu_exec_exit() from a CPU unrealize common function | Laurent Vivier | 1 | -1/+1
As cpu_exec_exit() mirrors the cpu_exec_realizefn(), rename it as cpu_exec_unrealizefn(). Create and register a cpu_common_unrealizefn() function for the CPU device class and call cpu_exec_unrealizefn() from this function. Remove cpu_exec_exit() from cpu_common_finalize() (which mirrors init, not realize), and as x86_cpu_unrealizefn() and ppc_cpu_unrealizefn() overwrite the device class unrealize function, add a call to a parent_unrealize pointer. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-24 | exec: move cpu_exec_init() calls to realize functions | Laurent Vivier | 1 | -1/+1
Modify all CPUs to call it from their XXX_cpu_realizefn() function. Remove all the cannot_destroy_with_object_finalize_yet, as unsafe references have been moved to cpu_exec_realizefn(). (Tested with the QOM command provided by commit 4c315c27.) For arm: setting of cpu->mp_affinity is moved from arm_cpu_initfn() to arm_cpu_realizefn() as setting of cpu_index is now done in cpu_exec_realizefn(). To avoid overwriting a user-defined value, we set it to an invalid value by default, and update it in the realize function only if the value is still invalid. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-24 | exec: split cpu_exec_init() | Laurent Vivier | 1 | -4/+6
Put in cpu_exec_initfn() what initializes the CPU, and leave in cpu_exec_init() what adds it to the environment. As cpu_exec_initfn() is called by all XX_cpu_initfn(), call it directly in cpu_common_initfn(). cpu_exec_init() is now a realize function, it will be renamed to cpu_exec_realizefn() and moved to the XX_cpu_realizefn() function in a following patch. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-24 | cpu: Support a target CPU having a variable page size | Peter Maydell | 1 | -0/+42
Support target CPUs having a page size which isn't known at compile time. To use this, the CPU implementation should:
 * define TARGET_PAGE_BITS_VARY
 * not define TARGET_PAGE_BITS
 * define TARGET_PAGE_BITS_MIN to the smallest value it might possibly want for TARGET_PAGE_BITS
 * call set_preferred_target_page_bits() in its realize function to indicate the actual preferred target page size for the CPU (and report any error from it)
In CONFIG_USER_ONLY, the CPU implementation should continue to define TARGET_PAGE_BITS appropriately for the guest OS page size. Machines which want to take advantage of having the page size something larger than TARGET_PAGE_BITS_MIN must set the MachineClass minimum_page_bits field to a value which they guarantee will be no greater than the preferred page size for any CPU they create. Note that changing the target page size by setting minimum_page_bits is a migration compatibility break for that machine. For debugging purposes, attempts to use TARGET_PAGE_SIZE before it has been finally confirmed will assert. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
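For a hypothetical target the pieces above fit together roughly as follows; the target name and the 1 KiB/4 KiB values are made up, and the assumption that set_preferred_target_page_bits() signals failure through its return value follows the "report any error from it" wording rather than a checked signature:

    /* target-foo/cpu.h (hypothetical target) */
    #define TARGET_PAGE_BITS_VARY
    #define TARGET_PAGE_BITS_MIN 10          /* smallest size it might ever pick: 1 KiB */

    /* target-foo/cpu.c */
    static void foo_cpu_realizefn(DeviceState *dev, Error **errp)
    {
        /* Tell core code the page size this CPU actually wants (4 KiB here). */
        if (!set_preferred_target_page_bits(12)) {
            error_setg(errp, "page size conflicts with an already-created CPU");
            return;
        }
        /* ... rest of realize ... */
    }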
2016-10-24 | exec.c: Remove static allocation of sub_section of sub_page | Vijaya Kumar K | 1 | -3/+2
Allocate sub_section dynamically. Remove the dependency on TARGET_PAGE_SIZE to allow run-time page size detection for arm platforms. Signed-off-by: Vijaya Kumar K <vijayak@cavium.com> Message-id: 1465808915-4887-3-git-send-email-vijayak@caviumnetworks.com [PMM: use flexible array member rather than separate malloc so we don't need an extra pointer deref when using it] Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-24 | exec.c: workaround regression caused by alignment change in d2f39ad | Haozhong Zhang | 1 | -1/+6
Commit d2f39ad "exec.c: Ensure right alignment also for file backed ram" added an additional alignment requirement on the size of the backend file besides the previous page size requirement. On x86, the alignment changed from 4KB in QEMU 2.6 to 2MB in QEMU 2.7. This change breaks certain usages in QEMU 2.7 on x86, e.g.
  -object memory-backend-file,id=mem1,mem-path=/tmp/,size=$SZ
  -device pc-dimm,id=dimm1,memdev=mem1
where $SZ is a multiple of 4KB but not 2MB (e.g. 1023M). QEMU 2.7 reports the following error message and aborts:
  qemu-system-x86_64: -device pc-dimm,memdev=mem1,id=nv1: backend memory size must be multiple of 0x200000
The same regression may also happen on other platforms, as indicated by Igor Mammedov. This change is however necessary for s390 according to the commit message of d2f39ad, so we work around the regression by taking the change only on s390. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reported-by: "Xu, Anthony" <anthony.xu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-13 | RAMBlocks: Store page size | Dr. David Alan Gilbert | 1 | -7/+12
Store the page size in each RAMBlock, we need it later. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2016-10-08 | exec: remove unused compacted argument | Marc-André Lureau | 1 | -5/+3
Since commit b35ba30f8f when it was introduced, phys_page_compact() takes an unused compacted argument. ubsan complains about it when launching qemu-x86_64 without arguments: qemu/exec.c:310:5: runtime error: variable length array bound evaluates to non-positive value 0 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-09-27 | cpus-common: move CPU list management to common code | Paolo Bonzini | 1 | -35/+2
Add a mutex for the CPU list to system emulation, as it will be used to manage safe work. Abstract manipulation of the CPU list in new functions cpu_list_add and cpu_list_remove. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13 | kvm-all: drop kvm_setup_guest_memory | Cao jin | 1 | -3/+1
kvm_setup_guest_memory only does "madvise to QEMU_MADV_DONTFORK" and is only called by ram_block_add, which actually is duplicate code. Bonus: add simple comment for kvm_has_sync_mmu to make life easier. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Message-Id: <1473662096-32598-1-git-send-email-caoj.fnst@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13 | qtail: clean up direct access to tqe_prev field | Igor Mammedov | 1 | -2/+1
Instead of accessing the tqe_prev field directly outside of queue.h, use macros to check whether an element is in a list, and make sure that after an element is removed from a list its tqe_prev field can still be used to do the same check. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1469450832-84343-1-git-send-email-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-05 | exec: Ensure the only one cpu_index allocation method is used | Igor Mammedov | 1 | -0/+7
Make sure that cpu_index auto allocation isn't used in combination with manual cpu_index assignment, and disallow out-of-order CPU removal if auto allocation is in use. A target that wishes to support out-of-order unplug should switch to manual cpu_index assignment. The following patch could be used as an example: (pc: init CPUState->cpu_index with index in possible_cpus[]) Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-08-02 | fix qemu exit on memory hotplug when allocation fails at prealloc time | Igor Mammedov | 1 | -2/+8
When adding a hostmem backend at runtime, QEMU might exit with the error: "os_mem_prealloc: Insufficient free host memory pages available to allocate guest RAM" It happens because os_mem_prealloc() does not handle errors gracefully. Fix it by passing an errp argument so that os_mem_prealloc() can report the error to its callers, which can then undo the performed allocation when os_mem_prealloc() fails. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1469008443-72059-1-git-send-email-imammedo@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
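The fix follows QEMU's usual Error ** convention; a generic sketch of the pattern (the preallocation check itself is a placeholder, only error_setg/error_report_err are real QEMU APIs):

    #include "qemu/osdep.h"
    #include "qapi/error.h"

    static void mem_prealloc_sketch(int fd, char *area, size_t memory, Error **errp)
    {
        if (!touch_all_pages_ok(fd, area, memory)) {   /* placeholder check */
            error_setg(errp, "os_mem_prealloc: Insufficient free host memory "
                       "pages available to allocate guest RAM");
            return;   /* the caller sees the error instead of QEMU exiting */
        }
    }

    static void caller_sketch(int fd, char *area, size_t memory)
    {
        Error *local_err = NULL;

        mem_prealloc_sketch(fd, area, memory, &local_err);
        if (local_err) {
            error_report_err(local_err);
            /* undo the backend allocation here rather than aborting QEMU */
        }
    }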
2016-07-26 | exec: Set cpu_index only if it's not been explicitly set | Igor Mammedov | 1 | -38/+6
It keeps the legacy behavior for all users that don't care about a stable cpu_index value, but allows boards that support device_add/device_del to set a stable cpu_index that won't depend on the order in which CPUs are created/destroyed. While at it, simplify cpu_get_free_index(), as the cpu_index generated by the USER_ONLY and softmmu variants is the same since none of the users support cpu-remove so far, except for the not yet released spapr/x86 device_add/del, which will be altered by follow-up patches to set a stable cpu_index manually. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-07-26 | exec: Don't use cpu_index to detect if cpu_exec_init()'s been called | Igor Mammedov | 1 | -2/+3
Instead, use the QTAILQ tqe_prev field to detect whether a CPU has been placed in the list by cpu_exec_init(); the field is always set if a QTAILQ element is in a list. Fixes a SIGSEGV on the failure path in case cpu_index is assigned by the board and cpu.realize() fails before cpu_exec_init() is called. In follow-up patches, cpu_index will be assigned by boards that support cpu hot(un)plug and need a stable cpu_index that doesn't depend on the order in which CPUs are created/removed. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reported-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-07-26 | exec: Reduce CONFIG_USER_ONLY ifdeffenery | Igor Mammedov | 1 | -14/+3
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-07-17 | exec: avoid realloc in phys_map_node_reserve | Peter Lieven | 1 | -1/+3
This is the first step in reducing the brk heap fragmentation created by the map->nodes memory allocation. Since the introduction of RCU the freeing of the PhysPageMaps is delayed, so that sometimes several hundred are allocated at the same time. Even worse, the memory for map->nodes is allocated and shortly afterwards reallocated. Since the number of nodes it grows to in the end is the same for all PhysPageMaps, remember this value and at least avoid the reallocation. The large number of simultaneous allocations (about 450 x 70kB in my configuration) has to be addressed later. Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1468577030-21097-1-git-send-email-pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-12 | Use #include "..." for our own headers, <...> for others | Markus Armbruster | 1 | -1/+1
Tracked down with an ugly, brittle and probably buggy Perl script. Also move includes converted to <...> up so they get included before ours where that's obviously okay. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>
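Applied to a header file, the rule looks like the illustrative fragment below: angle brackets for system headers (moved up, per the commit), double quotes for QEMU's own headers:

    #include <stdio.h>         /* system headers: angle brackets, listed first */
    #include <sys/mman.h>

    #include "qemu/queue.h"    /* QEMU's own headers: double quotes */
    #include "exec/memory.h"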