peter/qemu - QEMU hacking for Peter

Age	Commit message (Collapse)	Author	Files	Lines
2017-09-19	i386/cpu/hyperv: support over 64 vcpus for windows guests	Gonglei	1	-0/+5
	Starting with Windows Server 2012 and Windows 8, if CPUID.40000005.EAX contains a value of -1, Windows assumes specific limit to the number of VPs. In this case, Windows Server 2012 guest VMs may use more than 64 VPs, up to the maximum supported number of processors applicable to the specific Windows version being used. https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs For compatibility, Let's introduce a new property for X86CPU, named "x-hv-max-vps" as Eduardo's suggestion, and set it to 0x40 before machine 2.10. (The "x-" prefix indicates that the property is not supposed to be a stable user interface.) Signed-off-by: Gonglei <arei.gonglei@huawei.com> Message-Id: <1505143227-14324-1-git-send-email-arei.gonglei@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-08	pc: add 2.11 machine types	Marcel Apfelbaum	1	-0/+3
	Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-08-02	intel_iommu: use access_flags for iotlb	Peter Xu	1	-2/+1
	It was cached by read/write separately. Let's merge them. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-07-17	i386: expose "TCGTCGTCGTCG" in the 0x40000000 CPUID leaf	Daniel P. Berrange	1	-0/+5
	Currently when running KVM, we expose "KVMKVMKVM\0\0\0" in the 0x40000000 CPUID leaf. Other hypervisors (VMWare, HyperV, Xen, BHyve) all do the same thing, which leaves TCG as the odd one out. The CPUID signature is used by software to detect which virtual environment they are running in and (potentially) change behaviour in certain ways. For example, systemd supports a ConditionVirtualization= setting in unit files. The virt-what command can also report the virt type it is running on Currently both these apps have to resort to custom hacks like looking for 'fw-cfg' entry in the /proc/device-tree file to identify TCG. This change thus proposes a signature "TCGTCGTCGTCG" to be reported when running under TCG. To hide this, the -cpu option tcg-cpuid=off can be used. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <20170509132736.10071-3-berrange@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-07-14	memory/iommu: introduce IOMMUMemoryRegionClass	Alexey Kardashevskiy	1	-1/+2
	This finishes QOM'fication of IOMMUMemoryRegion by introducing a IOMMUMemoryRegionClass. This also provides a fastpath analog for IOMMU_MEMORY_REGION_GET_CLASS(). This makes IOMMUMemoryRegion an abstract class. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170711035620.4232-3-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-14	memory/iommu: QOM'fy IOMMU MemoryRegion	Alexey Kardashevskiy	1	-1/+1
	This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion as a parent. This moves IOMMU-related fields from MR to IOMMU MR. However to avoid dymanic QOM casting in fast path (address_space_translate, etc), this adds an @is_iommu boolean flag to MR and provides new helper to do simple cast to IOMMU MR - memory_region_get_iommu. The flag is set in the instance init callback. This defines memory_region_is_iommu as memory_region_get_iommu()!=NULL. This switches MemoryRegion to IOMMUMemoryRegion in most places except the ones where MemoryRegion may be an alias. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20170711035620.4232-2-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-04	Move CONFIG_KVM related definitions to kvm_i386.h	Thomas Huth	1	-13/+0
	pc.h and sysemu/kvm.h are also included from common code (where CONFIG_KVM is not available), so the #defines that depend on CONFIG_KVM should not be declared here to avoid that anybody is using them in a wrong way. Since we're also going to poison CONFIG_KVM for common code, let's move them to kvm_i386.h instead. Most of the dummy definitions from sysemu/kvm.h are also unused since the code that uses them is only compiled for CONFIG_KVM (e.g. target/i386/kvm.c), so the unused defines are also simply dropped here instead of being moved. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <1498454578-18709-3-git-send-email-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-16	q35/mch: implement extended TSEG sizes	Laszlo Ersek	1	-0/+5
	The q35 machine type currently lets the guest firmware select a 1MB, 2MB or 8MB TSEG (basically, SMRAM) size. In edk2/OVMF, we use 8MB, but even that is not enough when a lot of VCPUs (more than approx. 224) are configured -- SMRAM footprint scales largely proportionally with VCPU count. Introduce a new property for "mch" called "extended-tseg-mbytes", which expresses (in megabytes) the user's choice of TSEG (SMRAM) size. Invent a new, QEMU-specific register in the config space of the DRAM Controller, at offset 0x50, in order to allow guest firmware to query the TSEG (SMRAM) size. According to Intel Document Number 316966-002, Table 5-1 "DRAM Controller Register Address Map (D0:F0)": Warning: Address locations that are not listed are considered Intel Reserved registers locations. Reads to Reserved registers may return non-zero values. Writes to reserved locations may cause system failures. All registers that are defined in the PCI 2.3 specification, but are not necessary or implemented in this component are simply not included in this document. The reserved/unimplemented space in the PCI configuration header space is not documented as such in this summary. Offsets 0x50 and 0x51 are not listed in Table 5-1. They are also not part of the standard PCI config space header. And they precede the capability list as well, which starts at 0xe0 for this device. When the guest writes value 0xffff to this register, the value that can be read back is that of "mch.extended-tseg-mbytes" -- unless it remains 0xffff. The guest is required to write 0xffff first (as opposed to a read-only register) because PCI config space is generally not cleared on QEMU reset, and after S3 resume or reboot, new guest firmware running on old QEMU could read a guest OS-injected value from this register. After reading the available "extended" TSEG size, the guest firmware may actually request that TSEG size by writing pattern 11b to the ESMRAMC register's TSEG_SZ bit-field. (The Intel spec referenced above defines only patterns 00b (1MB), 01b (2MB) and 10b (8MB); 11b is reserved.) On the QEMU command line, the value can be set with -global mch.extended-tseg-mbytes=N The default value for 2.10+ q35 machine types is 16. The value is limited to 0xfff (4095) at the moment, purely so that the product (4095 MB) can be stored to the uint32_t variable "tseg_size" in mch_update_smram(). Users are responsible for choosing sensible TSEG sizes. On 2.9 and earlier q35 machine types, the default value is 0. This lets the 11b bit pattern in ESMRAMC.TSEG_SZ, and the register at offset 0x50, keep their original behavior. When "extended-tseg-mbytes" is nonzero, the new register at offset 0x50 is set to that value on reset, for completeness. PCI config space is migrated automatically, so no VMSD changes are necessary. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1447027 Ref: https://lists.01.org/pipermail/edk2-devel/2017-May/010456.html Signed-off-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-06-05	pc: Use "min-[x]level" on compat_props	Eduardo Habkost	1	-21/+21
	Since the automatic cpuid-level code was introduced in commit c39c0edf9bb3b968ba95484465a50c7b19f4aa3a ("target-i386: Automatically set level/xlevel/xlevel2 when needed"), the CPU model tables just define the default CPUID level code (set using "min-level"). Setting "[x]level" forces CPUID level to a specific value and disable the automatic-level logic. But the PC compat code was not updated and the existing "[x]level" compat properties broke compatibility for people using features that triggered the auto-level code. To keep previous behavior, we should set "min-[x]level" instead of "[x]level" on compat_props. This was not a problem for most cases, because old machine-types don't have full-cpuid-auto-level enabled. The only common use case it broke was the CPUID[7] auto-level code, that was already enabled since the first CPUID[7] feature was introduced (in QEMU 1.4.0). This causes the regression reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1454641 Change the PC compat code to use "min-[x]level" instead of "[x]level" on compat_props, and add new test cases to ensure we don't break this again. Reported-by: "Guo, Zhiyi" <zhguo@redhat.com> Fixes: c39c0edf9bb ("target-i386: Automatically set level/xlevel/xlevel2 when needed") Cc: qemu-stable@nongnu.org Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-25	intel_iommu: support passthrough (PT)	Peter Xu	1	-0/+1
	Hardware support for VT-d device passthrough. Although current Linux can live with iommu=pt even without this, but this is faster than when using software passthrough. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Liu, Yi L <yi.l.liu@linux.intel.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2017-05-10	pc: add 2.10 machine type	Peter Xu	1	-0/+3
	CC: "Michael S. Tsirkin" <mst@redhat.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Richard Henderson <rth@twiddle.net> CC: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-10	pc/fwcfg: unbreak migration from qemu-2.5 and qemu-2.6 during firmware boot	Igor Mammedov	1	-4/+3
	Since 2.7 commit (b2a575a Add optionrom compatible with fw_cfg DMA version) regressed migration during firmware exection time by abusing fwcfg.dma_enabled property to decide loading dma version of option rom AND by mistake disabling DMA for 2.6 and earlier globally instead of only for option rom. so 2.6 machine type guest is broken when it already runs firmware in DMA mode but migrated to qemu-2.7(pc-2.6) at that time; a) qemu-2.6:pc2.6 (fwcfg.dma=on,firmware=dma,oprom=ioport) b) qemu-2.7:pc2.6 (fwcfg.dma=off,firmware=ioport,oprom=ioport) to: a b from a OK FAIL b OK OK So we currently have broken forward migration from qemu-2.6 to qemu-2.[789] that however could be fixed for 2.10 by re-enabling DMA for 2.[56] machine types and allowing dma capable option rom only since 2.7. As result qemu should end up with: c) qemu-2.10:pc2.6 (fwcfg.dma=on,firmware=dma,oprom=ioport) to: a b c from a OK FAIL OK b OK OK OK c OK FAIL OK where forward migration from qemu-2.6 to qemu-2.10 should work again leaving only qemu-2.[789]:pc-2.6 broken. Reported-by: Eduardo Habkost <ehabkost@redhat.com> Analyzed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-03	hw/i386: Build-time assertion on pc/q35 reset register being identical.	Phil Dennis-Jordan	1	-0/+6
	This adds a clarifying comment and build time assert to the FADT reset register field initialisation: the reset register is the same on both machine types. Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu> Message-Id: <1489558827-28971-3-git-send-email-phil@philjordan.eu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-20	intel_iommu: enable remote IOTLB	Peter Xu	1	-0/+8
	This patch is based on Aviv Ben-David (<bd.aviv@gmail.com>)'s patch upstream: "IOMMU: enable intel_iommu map and unmap notifiers" https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg01453.html However I removed/fixed some content, and added my own codes. Instead of translate() every page for iotlb invalidations (which is slower), we walk the pages when needed and notify in a hook function. This patch enables vfio devices for VT-d emulation. And, since we already have vhost DMAR support via device-iotlb, a natural benefit that this patch brings is that vt-d enabled vhost can live even without ATS capability now. Though more tests are needed. Signed-off-by: Aviv Ben-David <bdaviv@cs.technion.ac.il> Reviewed-by: Jason Wang <jasowang@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-10-git-send-email-peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20	intel_iommu: allow dynamic switch of IOMMU region	Peter Xu	1	-0/+2
	This is preparation work to finally enabled dynamic switching ON/OFF for VT-d protection. The old VT-d codes is using static IOMMU address space, and that won't satisfy vfio-pci device listeners. Let me explain. vfio-pci devices depend on the memory region listener and IOMMU replay mechanism to make sure the device mapping is coherent with the guest even if there are domain switches. And there are two kinds of domain switches: (1) switch from domain A -> B (2) switch from domain A -> no domain (e.g., turn DMAR off) Case (1) is handled by the context entry invalidation handling by the VT-d replay logic. What the replay function should do here is to replay the existing page mappings in domain B. However for case (2), we don't want to replay any domain mappings - we just need the default GPA->HPA mappings (the address_space_memory mapping). And this patch helps on case (2) to build up the mapping automatically by leveraging the vfio-pci memory listeners. Another important thing that this patch does is to seperate IR (Interrupt Remapping) from DMAR (DMA Remapping). IR region should not depend on the DMAR region (like before this patch). It should be a standalone region, and it should be able to be activated without DMAR (which is a common behavior of Linux kernel - by default it enables IR while disabled DMAR). Reviewed-by: Jason Wang <jasowang@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-9-git-send-email-peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-05	tco: do not generate an NMI	Paolo Bonzini	1	-1/+0
	This behavior is not indicated in the datasheet and can confuse the OS. The TCO can trap NMIs from SERR# or IOCHK# and convert them to SMIs; but any other TCO event is either delivered as an SMI or completely disabled. Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-27	Revert "apic: save apic_delivered flag"	Paolo Bonzini	1	-2/+0
	This reverts commit 07bfa354772f2de67008dc66c201b627acff0106. The global variable is only read as part of a apic_reset_irq_delivered(); qemu_irq_raise(s->irq); if (!apic_get_irq_delivered()) { sequence, so the value never matters at migration time. Reported-by: Dr. David Alan Gilbert <dglibert@redhat.com> Cc: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-10	i386: Change stepping of Haswell to non-blacklisted value	Eduardo Habkost	1	-0/+5
	glibc blacklists TSX on Haswell CPUs with model==60 and stepping < 4. To make the Haswell CPU model more useful, make those guests actually use TSX by changing CPU stepping to 4. References: * glibc commit 2702856bf45c82cf8e69f2064f5aa15c0ceb6359 https://sourceware.org/git/?p=glibc.git;a=commit;h=2702856bf45c82cf8e69f2064f5aa15c0ceb6359 Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170309181212.18864-4-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-03-03	x86: Work around SMI migration breakages	Dr. David Alan Gilbert	1	-0/+4
	Migration from a 2.3.0 qemu results in a reboot on the receiving QEMU due to a disagreement about SM (System management) interrupts. 2.3.0 didn't have much SMI support, but it did set CPU_INTERRUPT_SMI and this gets into the migration stream, but on 2.3.0 it never got delivered. ~2.4.0 SMI interrupt support was added but was broken - so that when a 2.3.0 stream was received it cleared the CPU_INTERRUPT_SMI but never actually caused an interrupt. The SMI delivery was recently fixed by 68c6efe07a, but the effect now is that an incoming 2.3.0 stream takes the interrupt it had flagged but it's bios can't actually handle it(I think partly due to the original interrupt not being taken during boot?). The consequence is a triple(?) fault and a reboot. Tested from: 2.3.1 -M 2.3.0 2.7.0 -M 2.3.0 2.8.0 -M 2.3.0 2.8.0 -M 2.8.0 This corresponds to RH bugzilla entry 1420679. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170223133441.16010-1-dgilbert@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-22	machine: move possible_cpus to MachineState	Igor Mammedov	1	-1/+0
	so that it would be possible to reuse it with spapr/virt-aarch64 targets. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-17	intel_iommu: add "caching-mode" option	Aviv Ben-David	1	-0/+2
	This capability asks the guest to invalidate cache before each map operation. We can use this invalidation to trap map operations in the hypervisor. Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com> [peterx: using "caching-mode" instead of "cache-mode" to align with spec] [peterx: re-write the subject to make it short and clear] Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-01-27	char: rename CharDriverState Chardev	Marc-André Lureau	1	-1/+1
	Pick a uniform chardev type name. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-27	pc: Enable vmware-cpuid-freq CPU option for 2.9+ machine types	Phil Dennis-Jordan	1	-1/+5
	Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu> Message-Id: <1484921496-11257-4-git-send-email-phil@philjordan.eu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-27	hw/isa/lpc_ich9: negotiate SMI broadcast on pc-q35-2.9+ machine types	Laszlo Ersek	1	-0/+6
	Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20170126014416.11211-4-lersek@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-27	hw/isa/lpc_ich9: add broadcast SMI feature	Laszlo Ersek	1	-0/+3
	The generic edk2 SMM infrastructure prefers EFI_SMM_CONTROL2_PROTOCOL.Trigger() to inject an SMI on each processor. If Trigger() only brings the current processor into SMM, then edk2 handles it in the following ways: (1) If Trigger() is executed by the BSP (which is guaranteed before ExitBootServices(), but is not necessarily true at runtime), then: (a) If edk2 has been configured for "traditional" SMM synchronization, then the BSP sends directed SMIs to the APs with APIC delivery, bringing them into SMM individually. Then the BSP runs the SMI handler / dispatcher. (b) If edk2 has been configured for "relaxed" SMM synchronization, then the APs that are not already in SMM are not brought in, and the BSP runs the SMI handler / dispatcher. (2) If Trigger() is executed by an AP (which is possible after ExitBootServices(), and can be forced e.g. by "taskset -c 1 efibootmgr"), then the AP in question brings in the BSP with a directed SMI, and the BSP runs the SMI handler / dispatcher. The smaller problem with (1a) and (2) is that the BSP and AP synchronization is slow. For example, the "taskset -c 1 efibootmgr" command from (2) can take more than 3 seconds to complete, because efibootmgr accesses non-volatile UEFI variables intensively. The larger problem is that QEMU's current behavior diverges from the behavior usually seen on physical hardware, and that keeps exposing obscure corner cases, race conditions and other instabilities in edk2, which generally expects / prefers a software SMI to affect all CPUs at once. Therefore introduce the "broadcast SMI" feature that causes QEMU to inject the SMI on all VCPUs. While the original posting of this patch <http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html> only intended to speed up (2), based on our recent "stress testing" of SMM this patch actually provides functional improvements. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20170126014416.11211-3-lersek@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-27	hw/isa/lpc_ich9: add SMI feature negotiation via fw_cfg	Laszlo Ersek	1	-0/+10
	Introduce the following fw_cfg files: - "etc/smi/supported-features": a little endian uint64_t feature bitmap, presenting the features known by the host to the guest. Read-only for the guest. The content of this file will be determined via bit-granularity ICH9-LPC device properties, to be introduced later. For now, the bitmask is left zeroed. The bits will be set from machine type compat properties and on the QEMU command line, hence this file is not migrated. - "etc/smi/requested-features": a little endian uint64_t feature bitmap, representing the features the guest would like to request. Read-write for the guest. The guest can freely (re)write this file, it has no direct consequence. Initial value is zero. A nonzero value causes the SMI-related fw_cfg files and fields that are under guest influence to be migrated. - "etc/smi/features-ok": contains a uint8_t value, and it is read-only for the guest. When the guest selects the associated fw_cfg key, the guest features are validated against the host features. In case of error, the negotiation doesn't proceed, and the "features-ok" file remains zero. In case of success, the "features-ok" file becomes (uint8_t)1, and the negotiated features are locked down internally (to which no further changes are possible until reset). The initial value is zero. A nonzero value causes the SMI-related fw_cfg files and fields that are under guest influence to be migrated. The C-language fields backing the "supported-features" and "requested-features" files are uint8_t arrays. This is because they carry guest-side representation (our choice is little endian), while VMSTATE_UINT64() assumes / implies host-side endianness for any uint64_t fields. If we migrate a guest between hosts with different endiannesses (which is possible with TCG), then the host-side value is preserved, and the host-side representation is translated. This would be visible to the guest through fw_cfg, unless we used plain byte arrays. So we do. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20170126014416.11211-2-lersek@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-27	apic: save apic_delivered flag	Pavel Dovgalyuk	1	-0/+2
	This patch implements saving/restoring of static apic_delivered variable. v8: saving static variable only for one of the APICs Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Message-Id: <20170126123429.5412.94368.stgit@PASHA-ISP> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-23	machine: Make possible_cpu_arch_ids() return const pointer	Igor Mammedov	1	-1/+1
	make sure that external callers won't try to modify possible_cpus and owner of possible_cpus can access it directly when it modifies it. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1484759609-264075-5-git-send-email-imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-01-20	pc.h: move x-mach-use-reliable-get-clock compat entry to PC_COMPAT_2_8	Marcelo Tosatti	1	-3/+4
	As noticed by David Gilbert, commit 6053a86 'kvmclock: reduce kvmclock differences on migration' added 'x-mach-use-reliable-get-clock' and a compatibility entry that turns it off; however it got merged after 2.8.0 was released but the entry has gone into PC_COMPAT_2_7 where it should have gone into PC_COMPAT_2_8. Fix it by moving the entry to PC_COMPAT_2_8. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170118175343.GA26873@amt.cnet> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-10	intel_iommu: support device iotlb descriptor	Jason Wang	1	-0/+1
	This patch enables device IOTLB support for intel iommu. The major work is to implement QI device IOTLB descriptor processing and notify the device through iommu notifier. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
2016-12-22	kvmclock: reduce kvmclock difference on migration	Marcelo Tosatti	1	-0/+5
	Check for KVM_CAP_ADJUST_CLOCK capability KVM_CLOCK_TSC_STABLE, which indicates that KVM_GET_CLOCK returns a value as seen by the guest at that moment. For new machine types, use this value rather than reading from guest memory. This reduces kvmclock difference on migration from 5s to 0.1s (when max_downtime == 5s). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Message-Id: <20161121105052.598267440@redhat.com> [Add comment explaining what is going on. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-22	pc: make pit configurable	Chao Peng	1	-0/+3
	Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Message-Id: <1478330391-74060-4-git-send-email-chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-22	pc: make sata configurable	Chao Peng	1	-0/+2
	Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Message-Id: <1478330391-74060-3-git-send-email-chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-22	pc: make smbus configurable	Chao Peng	1	-0/+2
	Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Message-Id: <1478330391-74060-2-git-send-email-chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-28	migration/pcspk: Turn migration of pcspk off for 2.7 and older	Dr. David Alan Gilbert	1	-0/+5
	To keep backwards migration compatibility allow us to turn pcspk migration off. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20161128133201.16104-3-dgilbert@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-16	pc: fix FW_CFG_NB_CPUS to account for -device added CPUs	Igor Mammedov	1	-0/+2
	Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1479301481-197333-1-git-send-email-imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-11-16	Revert "pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs"	Igor Mammedov	1	-2/+0
	This reverts commit 080ac219cc7d9c55adf925c3545b7450055ad625. Legacy FW_CFG_NB_CPUS will be reused instead of 'etc/boot-cpus' fw_cfg file since it does the same and there is no point to maintaing duplicate guest ABI, if it can be helped. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1479212236-183810-2-git-send-email-imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-11-15	intel_iommu: fix several incorrect endianess and bit fields	Peter Xu	1	-5/+4
	Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-11-02	PCMachineState: introduce acpi_build_enabled field	Wei Liu	1	-0/+2
	Introduce this field to control whether ACPI build is enabled by a particular machine or accelerator. It defaults to true if the machine itself supports ACPI build. Xen accelerator will disable it because Xen is in charge of building ACPI tables for the guest. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
2016-10-28	clean-up: removed duplicate #includes	Anand J	1	-1/+0
	Some files contain multiple #includes of the same header file. Removed most of those unnecessary duplicate entries using scripts/clean-includes. Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Anand J <anand.indukala@gmail.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-10-24	pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs	Igor Mammedov	1	-0/+2
	Currently firmware uses 1 byte at 0x5F offset in RTC CMOS to get number of CPUs present at boot. However 1 byte is not enough to handle more than 255 CPUs. So add a new fw_cfg file that would allow QEMU to tell it. For compat reasons add file only for machine types that support more than 255 CPUs. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-24	pc: apic_common: Extend APIC ID property to 32bit	Igor Mammedov	1	-1/+2
	ACPI ID is 32 bit wide on CPUs with x2APIC support. Extend 'id' property to support it. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17	intel_iommu: reject broken EIM	Radim Krčmář	1	-0/+1
	Cluster x2APIC cannot work without KVM's x2apic API when the maximal APIC ID is greater than 8 and only KVM's LAPIC can support x2APIC, so we forbid other APICs and also the old KVM case with less than 9, to simplify the code. There is no point in enabling EIM in forbidden APICs, so we keep it enabled only for the KVM APIC; unconditionally, because making the option depend on KVM version would be a maintanance burden. Old QEMUs would enable eim whenever intremap was on, which would trick guests into thinking that they can enable cluster x2APIC even if any interrupt destination would get clamped to 8 bits. Depending on your configuration, QEMU could notice that the destination LAPIC is not present and report it with a very non-obvious: KVM: injection failed, MSI lost (Operation not permitted) Or the guest could say something about unexpected interrupts, because clamping leads to aliasing so interrupts were being delivered to incorrect VCPUs. KVM_X2APIC_API is the feature that allows us to enable EIM for KVM. QEMU 2.7 allowed EIM whenever interrupt remapping was enabled. In order to keep backward compatibility, we again allow guests to misbehave in non-obvious ways, and make it the default for old machine types. A user can enable the buggy mode it with "x-buggy-eim=on". Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17	intel_iommu: add OnOffAuto intr_eim as "eim" property	Radim Krčmář	1	-0/+1
	The default (auto) emulates the current behavior. A user can now control EIM like -device intel-iommu,intremap=on,eim=off Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17	apic: add send_msi() to APICCommonClass	Radim Krčmář	1	-0/+4
	The MMIO based interface to APIC doesn't work well with MSIs that have upper address bits set (remapped x2APIC MSIs). A specialized interface is a quick and dirty way to avoid the shortcoming. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17	apic: add global apic_get_class()	Radim Krčmář	1	-0/+2
	Every configuration has only up to one APIC class and we'll be extending the class with a function that can be called without an instanced object, so a direct access to the class is convenient. This patch will break compilation if some code uses apic_get_class() with CONFIG_USER_ONLY. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-10	Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging	Peter Maydell	1	-2/+0
	* Thread Sanitizer fixes (Alex) * Coverity fixes (David) * test-qht fixes (Emilio) * QOM interface for info irq/info pic (Hervé) * -rtc clock=rt fix (Junlian) * mux chardev fixes (Marc-André) * nicer report on death by signal (Michal) * qemu-tech TLC (Paolo) * MSI support for edu device (Peter) * qemu-nbd --offset fix (Tomáš) # gpg: Signature made Fri 07 Oct 2016 17:25:10 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (39 commits) qemu-doc: merge qemu-tech and qemu-doc qemu-tech: rewrite some parts qemu-tech: reorganize content qemu-tech: move TCG test documentation to tests/tcg/README qemu-tech: move user mode emulation features from qemu-tech qemu-tech: document lazy condition code evaluation in cpu.h qemu-tech: move text from qemu-tech to tcg/README qemu-doc: drop installation and compilation notes qemu-doc: replace introduction with the one from the internals manual qemu-tech: drop index test-qht: perform lookups under rcu_read_lock qht: fix unlock-after-free segfault upon resizing qht: simplify qht_reset_size qemu-nbd: Shrink image size by specified offset qemu_kill_report: Report PID name too util: Introduce qemu_get_pid_name char: update read handler in all cases char: use a fixed idx for child muxed chr i8259: give ISA device when registering ISA ioports .travis.yml: add gcc sanitizer build ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-04	intc: make HMP 'info irq' and 'info pic' commands use InterruptStatsProvider ↵	Hervé Poussineau	1	-2/+0
	interface Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Message-Id: <1474921408-24710-6-git-send-email-hpoussin@reactos.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-03	target-i386: Correct family/model/stepping for Opteron_G3	Evgeny Yakovlev	1	-0/+15
	Current CPU definition for AMD Opteron third generation includes features like SSE4a and LAHF_LM support in emulated CPUID. These features are present in K8 rev.E or K10 CPUs and later. However, current G3 family and model describe 2nd generation K8 cores instead. This is incorrect but was considered harmless until our tests found a problem with linux kernels >= 3.10 (and maybe earlier) which specifically check for Opteron K8 model when parsing CPUID leaf 0x80000001: http://lxr.free-electrons.com/source/arch/x86/kernel/cpu/amd.c?v=3.16#L552 This code will disable LAHF_LM feature in /proc/cpuinfo if model number is inconsistent. This change sets Opteron_G3 family/model/stepping to 16/2/3 which is a proper Opteron 3rd generation 2350 CPU. Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Richard Henderson <rth@twiddle.net> CC: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27	target-i386: Automatically set level/xlevel/xlevel2 when needed	Eduardo Habkost	1	-0/+5
	Instead of requiring users and management software to be aware of required CPUID level/xlevel/xlevel2 values for each feature, automatically increase those values when features need them. This was already done for CPUID[7].EBX, and is now made generic for all CPUID feature flags. Unit test included, to make sure we don't break ABI on older machine-types and don't mess with the CPUID level values if they are explicitly set by the user. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>