path: root/hw/i386
Age | Commit message | Author | Files | Lines
2017-09-20 | Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging | Peter Maydell | 3 | -54/+39
Machine/CPU/NUMA queue, 2017-09-19
# gpg: Signature made Tue 19 Sep 2017 21:17:01 BST
# gpg: using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* remotes/ehabkost/tags/machine-next-pull-request:
  MAINTAINERS: Update git URLs for my trees
  hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM
  NUMA: Replace MAX_NODES with nb_numa_nodes in for loop
  numa: cpu: calculate/set default node-ids after all -numa CLI options are parsed
  arm: drop intermediate cpu_model -> cpu type parsing and use cpu type directly
  pc: use generic cpu_model parsing
  vl.c: convert cpu_model to cpu type and set of global properties before machine_init()
  cpu: make cpu_generic_init() abort QEMU on error
  qom: cpus: split cpu_generic_init() on feature parsing and cpu creation parts
  hostmem-file: Add "discard-data" option
  osdep: Define QEMU_MADV_REMOVE
  vl: Clean up user-creatable objects when exiting
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-19 | hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM | Eduardo Habkost | 1 | -6/+22
Currently, using the first node without memory on the machine makes QEMU unhappy. With this example command line:
  ... \
  -m 1024M,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,mem=1024M,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
the guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because when QEMU builds the ACPI SRAT, it regards node 0 as the default node to deal with the memory hole (640K-1M). This means that node 0 must have some memory (>1M), but it can actually have no memory. Fix this problem by cutting out the 640K hole in the same way the PCI 4G hole is cut out. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Message-Id: <1504231805-30957-2-git-send-email-douly.fnst@cn.fujitsu.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-19 | numa: cpu: calculate/set default node-ids after all -numa CLI options are parsed | Igor Mammedov | 1 | -9/+11
Calculating default node-ids for CPUs in possible_cpu_arch_ids() is rather fragile, since the defaults calculation uses nb_numa_nodes but the callback might be called early, before all -numa CLI options are parsed, which would lead to CPUs being assigned only up to nb_numa_nodes at the time possible_cpu_arch_ids() is called. The issue was introduced by (7c88e65 numa: mirror cpu to node mapping in MachineState::possible_cpus) and, for example, the CLI: -smp 4 -numa node,cpus=0 -numa node would set props.node-id in the possible_cpus array for every non-explicitly mapped CPU to the first node. The issue is visible neither to the guest nor to the mgmt interface because 1) implicitly mapped CPUs are forced to the first node in case of partial mapping 2) in case of default mapping possible_cpu_arch_ids() is called after all -numa options are parsed (resulting in correct mapping). However, it's fragile to rely on late execution of possible_cpu_arch_ids(); therefore add a machine-specific callback that returns the node-id for a CPU and use it to calculate/set defaults at machine_numa_finish_init() time, when all -numa options are parsed. Reported-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496314408-163972-1-git-send-email-imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
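A rough illustration of why the evaluation time matters for the commit above. This is only a sketch: the function name and the round-robin rule (cpu index modulo node count) are assumptions standing in for the machine-specific callback, not code from the patch.

```python
def default_node_ids(num_cpus, nb_numa_nodes):
    """Hypothetical default: spread CPUs round-robin over the nodes known so far."""
    return [cpu % nb_numa_nodes for cpu in range(num_cpus)]

# Evaluated early, when only the first of two -numa options has been parsed.
early = default_node_ids(4, 1)

# Evaluated after all -numa options have been parsed.
late = default_node_ids(4, 2)

print(early, late)
```

With the early call every CPU lands on the only node known at that point, which is the kind of result the commit avoids by deferring the calculation until all -numa options are parsed.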
2017-09-19 | General warn report fixups | Alistair Francis | 1 | -1/+1
Tidy up some of the warn_report() messages after having converted them to use warn_report(). Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <9cb1d23551898c9c9a5f84da6773e99871285120.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19 | Convert multi-line fprintf() to warn_report() | Alistair Francis | 1 | -2/+3
Convert all the multi-line uses of fprintf(stderr, "warning:"..."\n"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using these commands:
  find ./* -type f -exec sed -i \
    'N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
    {} +
  find ./* -type f -exec sed -i \
    'N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
    {} +
  find ./* -type f -exec sed -i \
    'N;N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
    {} +
  find ./* -type f -exec sed -i \
    'N;N;N;N {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
    {} +
  find ./* -type f -exec sed -i \
    'N;N;N;N;N {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
    {} +
  find ./* -type f -exec sed -i \
    'N;N;N;N;N;N {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
    {} +
  find ./* -type f -exec sed -i \
    'N;N;N;N;N;N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \
    {} +
Indentation was fixed up manually afterwards. Some of the lines were manually edited to reduce the line length to below 80 characters. Some of the lines with newlines in the middle of the string were also manually edited to avoid checkpatch errors. The #include lines were manually updated to allow the code to compile. Several of the warning messages can be improved after this patch; to keep this patch mechanical, this has been moved into a later patch. Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Anthony Perard <anthony.perard@citrix.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Yongbok Kim <yongbok.kim@imgtec.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Alexander Graf <agraf@suse.de> Cc: Jason Wang <jasowang@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <5def63849ca8f551630c6f2b45bcb1c482f765a6.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19 | Convert single line fprintf(.../n) to warn_report() | Alistair Francis | 1 | -1/+1
Convert all the single line uses of fprintf(stderr, "warning:"..."\n"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using this command:
  find ./* -type f -exec sed -i \
    's|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig' \
    {} +
Some of the lines were manually edited to reduce the line length to below 80 characters. The #include lines were manually updated to allow the code to compile. Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael Roth <mdroth@linux.vnet.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Yongbok Kim <yongbok.kim@imgtec.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> [mips] Message-Id: <ae8f8a7f0a88ded61743dff2adade21f8122a9e7.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
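To see what the substitution in the commit above does to one line, here is a small sketch that translates the sed expression into a Python regular expression. The helper name convert_line and the sample fprintf() calls are made up for illustration; only the pattern itself comes from the commit message.

```python
import re

# Python translation of the sed expression from the commit message:
# sed's \(...\) groups become (...), and literal parentheses are escaped.
PATTERN = re.compile(r'fprintf\(.*".*warning[,:] (.*)\\n"(.*)\);', re.IGNORECASE)
REPLACEMENT = r'warn_report("\1"\2);'

def convert_line(line):
    """Rewrite a single-line fprintf(..., "warning: ...\\n", ...) call."""
    return PATTERN.sub(REPLACEMENT, line)

# Raw strings keep the backslash-n as two characters, as in C source text.
print(convert_line(r'fprintf(stderr, "warning: no free slots\n");'))
print(convert_line(r'    fprintf(stderr, "Warning: CPU %d out of range\n", idx);'))
```

The first group captures the message without the "warning:" prefix and the trailing newline, and the second group carries over any format arguments unchanged.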
2017-09-19 | hw/i386: Improve some of the warning messages | Alistair Francis | 3 | -12/+18
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1d6ef2ccd9667878ed5820fcf17eef35957ea5d8.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19 | multiboot: validate multiboot header address values | Prasad J Pandit | 1 | -0/+19
While loading kernel via multiboot-v1 image, (flags & 0x00010000) indicates that multiboot header contains valid addresses to load the kernel image. These addresses are used to compute kernel size and kernel text offset in the OS image. Validate these address values to avoid an OOB access issue. This is CVE-2017-14167. Reported-by: Thomas Garnier <thgarnie@google.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Message-Id: <20170907063256.7418-1-ppandit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
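The kind of validation described above can be sketched as follows. This is not the code of the patch: the function name, the error messages and the exact set of checks are assumptions, and only the meaning of the address fields (header_addr, load_addr, load_end_addr, bss_end_addr) follows the Multiboot v1 header layout.

```python
MULTIBOOT_FLAGS_ADDRS = 0x00010000  # header carries explicit load addresses

def check_multiboot_addrs(flags, header_offset, file_size,
                          header_addr, load_addr, load_end_addr, bss_end_addr):
    """Return (text_offset, load_size, kernel_size) or raise ValueError."""
    if not flags & MULTIBOOT_FLAGS_ADDRS:
        raise ValueError("header has no address fields")
    if header_addr < load_addr:
        raise ValueError("invalid load_addr")
    # File offset of the text segment: where the header was found in the
    # file, minus the distance between header and load start in memory.
    text_offset = header_offset - (header_addr - load_addr)
    if text_offset < 0 or text_offset > file_size:
        raise ValueError("text offset outside the image")
    if load_end_addr:
        if load_end_addr < load_addr:
            raise ValueError("invalid load_end_addr")
        load_size = load_end_addr - load_addr
    else:
        # Zero means text and data fill the rest of the file.
        load_size = file_size - text_offset
    if text_offset + load_size > file_size:
        raise ValueError("load range exceeds the image")
    if bss_end_addr:
        if bss_end_addr < load_addr + load_size:
            raise ValueError("invalid bss_end_addr")
        kernel_size = bss_end_addr - load_addr
    else:
        kernel_size = load_size
    return text_offset, load_size, kernel_size
```

Without such checks, a header with load_addr above header_addr would make the subtraction wrap in unsigned C arithmetic and yield a bogus offset into the image, which is the class of out-of-bounds access the commit message refers to.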
2017-09-19 | pc: use generic cpu_model parsing | Igor Mammedov | 2 | -39/+6
Define the default CPU type in a generic way in pc_machine_class_init() and let common machine code handle cpu_model parsing. The patch also introduces the TARGET_DEFAULT_CPU_TYPE define for 2 purposes: * make foo_machine_class_init() look uniform on every target * use the define in [bsd|linux]-user targets to pick the default cpu type Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1505318697-77161-5-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-08 | intel_iommu: fix missing BQL in pt fast path | Peter Xu | 1 | -0/+15
In vtd_switch_address_space() we did the memory region switch, however it's possible that the caller of it has not taken the BQL at all. Make sure we have it. CC: Paolo Bonzini <pbonzini@redhat.com> CC: Jason Wang <jasowang@redhat.com> CC: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-09-08 | hw/acpi: Move acpi_set_pci_info to pcihp | Anthony PERARD | 1 | -32/+0
HW part of ACPI PCI hotplug in QEMU depends on ACPI_PCIHP_PROP_BSEL being set on a PCI bus that supports ACPI hotplug. It should work regardless of the source of ACPI tables (QEMU generator/legacy SeaBIOS/Xen). So move ACPI_PCIHP_PROP_BSEL initialization into HW ACPI implementation part from QEMU's ACPI table generator. To do PCI passthrough with Xen, the property ACPI_PCIHP_PROP_BSEL needs to be set, but this was done only when ACPI tables are built which is not needed for a Xen guest. The need for the property starts with commit "pc: pcihp: avoid adding ACPI_PCIHP_PROP_BSEL twice" (f0c9d64a68b776374ec4732424a3e27753ce37b6). Adding find_i440fx into stubs so that mips-softmmu target can be built. Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-09-08 | pc: add 2.11 machine types | Marcel Apfelbaum | 2 | -5/+23
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-08-31 | i386: replace g_malloc()+memcpy() with g_memdup() | Marc-André Lureau | 1 | -2/+1
I found these patterns by grepping the source tree. I don't have a coccinelle script for it! Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-08-23 | numa: Move numa_legacy_auto_assign_ram to pc-i440fx-2.9 | Eduardo Habkost | 1 | -1/+1
The 'm->numa_auto_assign_ram = numa_legacy_auto_assign_ram;' line was supposed to be in pc_i440fx_2_9_machine_options() (see commit 3bfe5716 "numa: equally distribute memory on nodes"), but the merge commit adb354dd ("Merge remote-tracking branch 'mst/tags/for_upstream' into staging") moved it to the pc_i440fx_2_10_machine_options(). Move the line back to pc_i440fx_2_9_machine_options(). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-id: 20170818190943.23858-1-ehabkost@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-08-22 | hw/ppc/spapr: Fix segfault when instantiating a 'pc-dimm' without 'memdev' | Thomas Huth | 1 | -2/+12
QEMU currently crashes when trying to use a 'pc-dimm' on the pseries machine without specifying its 'memdev' property. This happens because pc_dimm_get_memory_region() does not check whether the 'memdev' property has properly been set by the user. Looking closer at this function, it's also obvious that it is using &error_abort to call another function - and this is bad in a function that is used in the hot-plugging calling chain since this can also cause QEMU to exit unexpectedly. So let's fix these issues in a proper way now: Add a "Error **errp" parameter to pc_dimm_get_memory_region() which we use in case the 'memdev' property has not been set by the user, and which we can use instead of the &error_abort, and change the callers of get_memory_region() to make use of this "errp" parameter for proper error checking. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-08-08 | hw/i386: allow SHPC for Q35 machine | Aleksandr Bezzubikov | 1 | -2/+2
Unmask previously masked SHPC feature in _OSC method. Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-08-02 | pc: acpi: force FADT rev1 for 440fx based machine types | Igor Mammedov | 1 | -4/+18
w2k used to boot on QEMU until revision of FADT has been bumped to rev3 (commit 77af8a2b hw/i386: Use Rev3 FADT (ACPI 2.0) instead of Rev1 to improve guest OS support.) Keep PC machine at rev1 to remain compatible and Q35 at rev3 where w2k isn't supported anyway so OSX could run as well. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Tested-by: John Arbuckle <programmingkidx@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-08-02 | pc: make 'pc.rom' readonly when machine has PCI enabled | Igor Mammedov | 1 | -0/+3
looking at bios ROM mapping in QEMU it seems that only isapc (i.e. not PCI enabled machine) requires ROM being mapped as RW in other cases BIOS is mapped as RO. Do the same for option ROM 'pc.rom' when machine has PCI enabled. As useful side-effect pc.rom MemoryRegion stops being put in vhost memory map (filtered out by vhost_section()), which reduces number of entries by 1. Coincidentally it fixes migration failure reported in "[PATCH V2] vhost: fix a migration failed because of vhost region merge" where following destination CLI with /sys/module/vhost/parameters/max_mem_regions = 8 export DIMMSCOUNT=6 QEMU -enable-kvm \ -netdev type=tap,id=guest0,vhost=on,script=no,vhostforce \ -device virtio-net-pci,netdev=guest0 \ -m 256,slots=256,maxmem=2G \ `i=0; while [ $i -lt $DIMMSCOUNT ]; do echo \ "-object memory-backend-ram,id=m$i,size=128M \ -device pc-dimm,id=d$i,memdev=m$i"; i=$(($i + 1)); \ done` will fail to startup with error: "-device pc-dimm,id=d5,memdev=m5: a used vhost backend has no free memory slots left" while it's possible to add the 6th DIMM during hotplug on source. Issue is caused by the fact that number of entries in vhost map is bigger on 1 entry, when -device is processed, than after guest boots up, and that offending entry belongs to 'pc.rom', it's not like vhost intends to do IO in ROM range so making it RO hides region from vhost and makes number of entries in vhost memory map at -device/machine_done time match number of entries after guest boots. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reported-by: Peng Hao <peng.hao2@zte.com.cn> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-08-02 | intel_iommu: use access_flags for iotlb | Peter Xu | 1 | -8/+7
It was cached by read/write separately. Let's merge them. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-08-02 | intel_iommu: fix iova for pt | Peter Xu | 2 | -3/+2
IOMMUTLBEntry.iova is returned incorrectly on one PT path (though mostly we cannot really trigger this path, and even if we do, we are mostly discarding this value, so it didn't break anything). Fix it by converting VTD_PAGE_MASK into the correct definition VTD_PAGE_MASK_4K, then remove VTD_PAGE_MASK. Fixes: b93130 ("intel_iommu: cleanup vtd_{do_}iommu_translate()") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-08-01 | trace-events: fix code style: print 0x before hex numbers | Vladimir Sementsov-Ogievskiy | 1 | -2/+2
The only exceptions are groups of numbers separated by the symbols '.', ' ', ':', '/', like 'ab.09.7d'. This patch was made by the following:
> find . -name trace-events | xargs python script.py
where script.py is the following python script:
=========================
#!/usr/bin/env python

import sys
import re
import fileinput

rhex = '%[-+ *.0-9]*(?:[hljztL]|ll|hh)?(?:x|X|"\s*PRI[xX][^"]*"?)'
rgroup = re.compile('((?:' + rhex + '[.:/ ])+' + rhex + ')')
rbad = re.compile('(?<!0x)' + rhex)

files = sys.argv[1:]

for fname in files:
    for line in fileinput.input(fname, inplace=True):
        arr = re.split(rgroup, line)
        for i in range(0, len(arr), 2):
            arr[i] = re.sub(rbad, '0x\g<0>', arr[i])

        sys.stdout.write(''.join(arr))
=========================
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Message-id: 20170731160135.12101-5-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-08-01 | trace-events: fix code style: %# -> 0x% | Vladimir Sementsov-Ogievskiy | 1 | -11/+11
In trace format the '#' flag of printf is forbidden. Fix it to '0x%'. This patch was created by the following:
check that we have a problem
> find . -name trace-events | xargs grep '%#' | wc -l
56
check that there are no cases with additional printf flags before '#'
> find . -name trace-events | xargs grep "%[-+ 0'I]+#" | wc -l
0
check that there is no wrong usage of '#' and '0x' together
> find . -name trace-events | xargs grep '0x%#' | wc -l
0
fix the problem
> find . -name trace-events | xargs sed -i 's/%#/0x%/g'
[Eric Blake noted that xargs grep '%[-+ 0'I]+#' should be xargs grep "%[-+ 0'I]+#" instead so the shell quoting is correct. --Stefan] Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20170731160135.12101-3-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
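The final sed step in the commit above is a plain textual replacement. A minimal Python equivalent, with a hypothetical helper name and made-up trace-events lines, might look like this:

```python
def fix_trace_format(line):
    """Equivalent of sed 's/%#/0x%/g' for a single trace-events line."""
    return line.replace('%#', '0x%')

# Made-up sample lines in the style of a trace-events file.
samples = [
    'foo_read(uint64_t addr) "addr %#"PRIx64',
    'foo_write(uint32_t val) "val %#x"',
]
for sample in samples:
    print(fix_trace_format(sample))
```

The earlier grep checks in the commit message matter because this replacement is only safe if no other printf flag sits between '%' and '#', and if no format already carries a '0x' prefix.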
2017-07-31 | docs: fix broken paths to docs/devel/tracing.txt | Philippe Mathieu-Daudé | 1 | -1/+1
With the move of some docs/ to docs/devel/ on ac06724a71, no references were updated. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-07-21 | xen-mapcache: Fix the bug when overlapping emulated DMA operations may cause inconsistency in guest memory mappings | Alexey G | 1 | -2/+11
Under certain circumstances normal xen-mapcache functioning may be broken by the guest's actions. This may lead to either QEMU performing exit() due to a caught bad pointer (and with the QEMU process gone the guest domain simply appears hung afterwards) or actual use of the incorrect pointer inside QEMU address space -- a write to unmapped memory is possible. The bug is hard to reproduce on an i440 machine as multiple DMA sources are required (though it's possible in theory, using multiple emulated devices), but can be reproduced somewhat easily on a Q35 machine using an emulated AHCI controller -- each NCQ queue command slot may be used as an independent DMA source, e.g. using the READ FPDMA QUEUED command, so a single storage device on the AHCI controller port will be enough to produce multiple DMAs (up to 32).
The detailed description of the issue follows.
Xen-mapcache provides an ability to map parts of guest memory into QEMU's own address space to work with. There are two types of cache lookups:
- translating a guest physical address into a pointer in QEMU's address space, mapping a part of guest domain memory if necessary (while trying to reduce the number of such (re)mappings to a minimum)
- translating a QEMU pointer back to its physical address in guest RAM
These lookups are managed via two linked lists of structures. MapCacheEntry is used for forward cache lookups, while MapCacheRev is used for reverse lookups.
Every guest physical address is broken down into 2 parts:
    address_index = phys_addr >> MCACHE_BUCKET_SHIFT;
    address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);
MCACHE_BUCKET_SHIFT depends on the system (32/64) and is equal to 20 for a 64-bit system (which is assumed for the further description). Basically, this means that we deal with 1 MB chunks and offsets within those 1 MB chunks. All mappings are created with 1MB-granularity, i.e. 1MB/2MB/3MB etc.
Most DMA transfers are typically less than 1MB; however, if the transfer crosses any 1MB border(s), then the nearest larger mapping size will be used, so e.g. a 512-byte DMA transfer with the start address 700FFF80h will actually require a 2MB range.
The current implementation assumes that MapCacheEntries are unique for a given address_index and size pair and that a single MapCacheEntry may be reused by multiple requests -- in this case the 'lock' field will be larger than 1. On the other hand, each requested guest physical address (with the 'lock' flag) is described by its own MapCacheRev. So there may be multiple MapCacheRev entries corresponding to a single MapCacheEntry. The xen-mapcache code uses MapCacheRev entries to retrieve the address_index & size pair, which in turn is used to find the related MapCacheEntry. The 'lock' field within a MapCacheEntry structure is actually a reference counter which shows the number of corresponding MapCacheRev entries.
The bug lies in the ability of the guest to indirectly manipulate the xen-mapcache MapCacheEntries list via a special sequence of DMA operations, typically for storage devices. In order to trigger the bug, the guest needs to issue DMA operations in a specific order and timing. Although xen-mapcache is protected by the mutex lock, this doesn't help in this case, as the bug is not due to a race condition.
Suppose we have 3 DMA transfers, namely A, B and C, where
- transfer A crosses a 1MB border and thus uses a 2MB mapping
- transfers B and C are normal transfers within a 1MB range
- and all 3 transfers belong to the same address_index
In this case, if all these transfers are executed one-by-one (without overlaps), no special treatment is necessary -- each transfer's mapping lock will be set and then cleared on unmap before starting the next transfer. The situation changes when DMA transfers overlap in time, e.g.
like this:
    |===== transfer A (2MB) =====|
                |===== transfer B (1MB) =====|
                                    |===== transfer C (1MB) =====|
    time --->
In this situation the following sequence of actions happens:
1. transfer A creates a mapping to a 2MB area (lock=1)
2. transfer B (1MB) tries to find an available mapping but cannot find one because transfer A is still in progress, and it has 2MB size + non-zero lock. So transfer B creates another mapping -- same address_index, but 1MB size.
3. transfer A completes, making the 1st mapping entry available by setting its lock to 0
4. transfer C starts and tries to find an available mapping entry and sees that the 1st entry has lock=0, so it uses this entry but remaps the mapping to a 1MB size
5. transfer B completes and by this time
   - there are two locked entries in the MapCacheEntry list with the SAME values for both address_index and size
   - the entry for transfer B actually resides farther in the list while transfer C's entry is first
6. xen_ram_addr_from_mapcache() for transfer B gets the correct address_index and size pair from the corresponding MapCacheRev entry, but then it starts looking for a MapCacheEntry with these values and finds the first entry -- which belongs to transfer C.
At this point the following (bad) consequences are possible:
1. xen_ram_addr_from_mapcache() will use a wrong entry->vaddr_base value in this statement:
       raddr = (reventry->paddr_index << MCACHE_BUCKET_SHIFT) + ((unsigned long) ptr - (unsigned long) entry->vaddr_base);
   resulting in an incorrect raddr value returned from the function. The (ptr - entry->vaddr_base) expression may produce both positive and negative numbers and its actual value may differ greatly as many map/unmap operations take place. If the value is beyond guest RAM limits then a "Bad RAM offset" error will be triggered and logged, followed by exit() in QEMU.
2. If the raddr value doesn't exceed guest RAM boundaries, the same sequence of actions will be performed for xen_invalidate_map_cache_entry() on DMA unmap, resulting in a wrong MapCacheEntry being unmapped while the DMA operation which uses it is still active. The above example must be extended by one more DMA transfer in order to allow unmapping, as the first mapping in the list is sort of resident.
The patch modifies the behavior in which MapCacheEntries are added to the list, avoiding duplicates. Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
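The address arithmetic from the description above can be checked with a short sketch. The two constants and the index/offset formulas come from the commit message (64-bit case); the helper names and the round-up rule for the mapping size are assumptions made for illustration.

```python
MCACHE_BUCKET_SHIFT = 20                        # 64-bit case from the description
MCACHE_BUCKET_SIZE = 1 << MCACHE_BUCKET_SHIFT   # 1 MB

def split_address(phys_addr):
    """Return (address_index, address_offset) for a guest physical address."""
    return phys_addr >> MCACHE_BUCKET_SHIFT, phys_addr & (MCACHE_BUCKET_SIZE - 1)

def mapping_size(phys_addr, length):
    """Round offset + length up to a whole number of 1 MB buckets (assumed rule)."""
    _, offset = split_address(phys_addr)
    span = offset + length
    buckets = (span + MCACHE_BUCKET_SIZE - 1) // MCACHE_BUCKET_SIZE
    return buckets * MCACHE_BUCKET_SIZE

index, offset = split_address(0x700FFF80)
print(hex(index), hex(offset))
print(mapping_size(0x700FFF80, 512) // MCACHE_BUCKET_SIZE, "MB")
```

For the 512-byte transfer at 700FFF80h the offset within the bucket is FFF80h, so the transfer ends 180h bytes into the next bucket and needs a 2MB mapping, matching the example in the text.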
2017-07-21 | xen: fix compilation on 32-bit hosts | Igor Druzhinin | 1 | -4/+5
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2017-07-19 | Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20170718-tag' into staging | Peter Maydell | 3 | -60/+213
Xen 2017/07/18
# gpg: Signature made Tue 18 Jul 2017 23:18:16 BST
# gpg: using RSA key 0x894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>"
# gpg: aka "Stefano Stabellini <sstabellini@kernel.org>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3 0AEA 894F 8F48 70E1 AE90
* remotes/sstabellini/tags/xen-20170718-tag:
  xen: don't use xenstore to save/restore physmap anymore
  xen/mapcache: introduce xen_replace_cache_entry()
  xen/mapcache: add an ability to create dummy mappings
  xen: move physmap saving into a separate function
  xen-platform: separate unplugging of NVMe disks
  xen_pt_msi.c: Check for xen_host_pci_get_* failures in xen_pt_msix_init()
  hw/xen: Set emu_mask for igd_opregion register
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-07-18 | xen: don't use xenstore to save/restore physmap anymore | Igor Druzhinin | 2 | -12/+40
If we have a system with xenforeignmemory_map2() implemented we don't need to save/restore physmap on suspend/restore anymore. In case we resume a VM without physmap - try to recreate the physmap during memory region restore phase and remap map cache entries accordingly. The old code is left for compatibility reasons. Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2017-07-18 | xen/mapcache: introduce xen_replace_cache_entry() | Igor Druzhinin | 1 | -8/+77
This new call is trying to update a requested map cache entry according to the changes in the physmap. The call is searching for the entry, unmaps it and maps again at the same place using a new guest address. If the mapping is dummy this call will make it real. This function makes use of a new xenforeignmemory_map2() call with an extended interface that was recently introduced in libxenforeignmemory [1]. [1] https://www.mail-archive.com/xen-devel@lists.xen.org/msg113007.html Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2017-07-18 | xen/mapcache: add an ability to create dummy mappings | Igor Druzhinin | 1 | -8/+36
Dummies are simple anonymous mappings that are placed instead of regular foreign mappings in certain situations when we need to postpone the actual mapping but still have to give a memory region to QEMU to play with. This is planned to be used for restore on Xen. Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2017-07-18 | xen: move physmap saving into a separate function | Igor Druzhinin | 1 | -26/+31
Non-functional change. Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
2017-07-18 | xen-platform: separate unplugging of NVMe disks | Stefano Stabellini | 1 | -12/+35
Commit 090fa1c8 "add support for unplugging NVMe disks..." extended the existing disk unplug flag to cover NVMe disks as well as IDE and SCSI. The recent thread on the xen-devel mailing list [1] has highlighted that this is not desirable behaviour: PV frontends should be able to distinguish NVMe disks from other types of disk and should have separate control over whether they are unplugged. This patch defines a new bit in the unplug mask for this purpose (see Xen commit [2]) and also tidies up the definitions of, and improves the comments regarding, the previously existing bits in the protocol. [1] https://lists.xen.org/archives/html/xen-devel/2017-03/msg02924.html [2] http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=1096aa02 Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2017-07-18 | ahci: add ahci_get_num_ports | John Snow | 1 | -2/+2
Instead of reaching into the PCI state, allow the AHCIDevice to respond with how many ports it has. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20170623220926.11479-2-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
2017-07-14 | hw: Use new memory_region_init_{ram, rom, rom_device}() functions | Peter Maydell | 3 | -8/+4
Use the new functions memory_region_init_{ram,rom,rom_device}() instead of manually calling the _nomigrate() version and then vmstate_register_ram_global(). Patch automatically created using coccinelle script: spatch --in-place -sp_file scripts/coccinelle/memory-region-init-ram.cocci -dir hw (As it turns out, there are no instances of the rom and rom_device functions that are caught by this script.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1499438577-7674-8-git-send-email-peter.maydell@linaro.org
2017-07-14 | memory: Rename memory_region_init_ram() to memory_region_init_ram_nomigrate() | Peter Maydell | 4 | -5/+5
Rename memory_region_init_ram() to memory_region_init_ram_nomigrate(). This leaves the way clear for us to provide a memory_region_init_ram() which does handle migration. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1499438577-7674-4-git-send-email-peter.maydell@linaro.org
2017-07-14 | Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging | Peter Maydell | 4 | -70/+95
* gdbstub fixes (Alex)
* IOMMU MemoryRegion subclass (Alexey)
* Chardev hotswap (Anton)
* NBD_OPT_GO support (Eric)
* Misc bugfixes
* DEFINE_PROP_LINK (minus the ARM patches - Fam)
* MAINTAINERS updates (Philippe)
# gpg: Signature made Fri 14 Jul 2017 11:06:27 BST
# gpg: using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream: (55 commits)
  spapr_rng: Convert to DEFINE_PROP_LINK
  cpu: Convert to DEFINE_PROP_LINK
  mips_cmgcr: Convert to DEFINE_PROP_LINK
  ivshmem: Convert to DEFINE_PROP_LINK
  dimm: Convert to DEFINE_PROP_LINK
  virtio-crypto: Convert to DEFINE_PROP_LINK
  virtio-rng: Convert to DEFINE_PROP_LINK
  virtio-scsi: Convert to DEFINE_PROP_LINK
  virtio-blk: Convert to DEFINE_PROP_LINK
  qdev: Add const qualifier to PropertyInfo definitions
  qmp: Use ObjectProperty.type if present
  qdev: Introduce DEFINE_PROP_LINK
  qdev: Introduce PropertyInfo.create
  qom: enforce readonly nature of link's check callback
  translate-all: remove redundant !tcg_enabled check in dump_exec_info
  vl: fix breakage of -tb-size
  nbd: Implement NBD_INFO_BLOCK_SIZE on client
  nbd: Implement NBD_INFO_BLOCK_SIZE on server
  nbd: Implement NBD_OPT_GO on client
  nbd: Implement NBD_OPT_GO on server
  ...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-07-14memory/iommu: introduce IOMMUMemoryRegionClassAlexey Kardashevskiy3-12/+42
This finishes QOM'fication of IOMMUMemoryRegion by introducing an IOMMUMemoryRegionClass. This also provides a fastpath analog for IOMMU_MEMORY_REGION_GET_CLASS(). This makes IOMMUMemoryRegion an abstract class. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170711035620.4232-3-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-14memory/iommu: QOM'fy IOMMU MemoryRegionAlexey Kardashevskiy2-12/+14
This defines a new QOM object - IOMMUMemoryRegion - with MemoryRegion as a parent. This moves IOMMU-related fields from MR to IOMMU MR. However, to avoid dynamic QOM casting in the fast path (address_space_translate, etc), this adds an @is_iommu boolean flag to MR and provides a new helper to do a simple cast to IOMMU MR - memory_region_get_iommu. The flag is set in the instance init callback. This defines memory_region_is_iommu as memory_region_get_iommu()!=NULL. This switches MemoryRegion to IOMMUMemoryRegion in most places except the ones where MemoryRegion may be an alias. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20170711035620.4232-2-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
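The flag-based fast-path cast described in this commit can be sketched roughly as follows. This is a simplified model, not the QEMU source: the structs are cut down to the fields the message mentions, iommu_notify_flags is only a placeholder for "some IOMMU-only field", and the instance-init function name is hypothetical. Alias handling, which the message notes as a special case, is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for QEMU's MemoryRegion (assumption: real struct is far larger). */
typedef struct MemoryRegion {
    bool is_iommu;              /* set once by the IOMMU subclass's instance init */
} MemoryRegion;

/* Simplified stand-in for the new subclass; the parent must be the first member. */
typedef struct IOMMUMemoryRegion {
    MemoryRegion parent_obj;
    int iommu_notify_flags;     /* placeholder for an IOMMU-only field */
} IOMMUMemoryRegion;

/* Hypothetical name: models "the flag is set in the instance init callback". */
void toy_iommu_memory_region_instance_init(IOMMUMemoryRegion *iommu_mr)
{
    iommu_mr->parent_obj.is_iommu = true;
    iommu_mr->iommu_notify_flags = 0;
}

/* Cheap cast for the fast path: one flag test instead of a dynamic QOM cast. */
IOMMUMemoryRegion *memory_region_get_iommu(MemoryRegion *mr)
{
    return mr->is_iommu ? (IOMMUMemoryRegion *)mr : NULL;
}

/* Defined in terms of the helper, as the commit message states. */
bool memory_region_is_iommu(MemoryRegion *mr)
{
    return memory_region_get_iommu(mr) != NULL;
}
```

The point of the design is that callers on hot paths pay for a single boolean load, while slower paths can still use the checked QOM cast.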
2017-07-14mttcg/i386: Patch instruction using async_safe_* frameworkPranith Kumar1-46/+39
In mttcg, calling pause_all_vcpus() during execution from the generated TBs causes a deadlock if some vCPU is waiting for exclusive execution in start_exclusive(). Fix this by using the async_safe_* framework instead of pausing vcpus for patching instructions. CC: Paolo Bonzini <pbonzini@redhat.com> CC: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Message-Id: <20170712215143.19594-2-bobby.prani@gmail.com> [Get rid completely of the TCG-specific code. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-13Convert error_report*_err() to warn_report*_err()Alistair Francis1-2/+1
Convert all uses of error_report*_err("Warning:"... to use warn_report*_err() instead. This helps standardise on a single method of printing warnings to the user. Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <d8e088757186955f40f04ec4f4be7f640d3c8660.1499866456.git.alistair.francis@xilinx.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2017-07-13Convert error_report() to warn_report()Alistair Francis5-20/+20
Convert all uses of error_report("warning:"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using these two commands: find ./* -type f -exec sed -i \ 's|error_report(".*warning[,:] |warn_report("|Ig' {} + Indentation fixed up manually afterwards. The test-qdev-global-props test case was manually updated to ensure that this patch passes make check (as the test cases are case sensitive). Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Suggested-by: Thomas Huth <thuth@redhat.com> Cc: Jeff Cody <jcody@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Lieven <pl@kamp.de> Cc: Josh Durgin <jdurgin@redhat.com> Cc: "Richard W.M. Jones" <rjones@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Greg Kurz <groug@kaod.org> Cc: Rob Herring <robh@kernel.org> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Peter Chubb <peter.chubb@nicta.com.au> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Alexander Graf <agraf@suse.de> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Greg Kurz <groug@kaod.org> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed by: Peter Chubb <peter.chubb@data61.csiro.au> Acked-by: Max Reitz <mreitz@redhat.com> Acked-by: Marcel Apfelbaum <marcel@redhat.com> Message-Id: <e1cfa2cd47087c248dd24caca9c33d9af0c499b0.1499866456.git.alistair.francis@xilinx.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
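To illustrate what the conversion does at a call site, here is a small sketch with toy reporting functions. toy_error_report() and toy_warn_report() are hypothetical stand-ins that write into a buffer; they only model the relevant behaviour, namely that the warning helper adds the "warning: " prefix itself so the caller no longer spells it out. The message text is made up for the example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Captures the most recent report so the two styles can be compared. */
char last_report[128];

/* Toy stand-in for error_report(): formats the message as given. */
void toy_error_report(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(last_report, sizeof(last_report), fmt, ap);
    va_end(ap);
}

/* Toy stand-in for warn_report(): prepends "warning: " on the caller's behalf. */
void toy_warn_report(const char *fmt, ...)
{
    va_list ap;
    int n = snprintf(last_report, sizeof(last_report), "warning: ");
    va_start(ap, fmt);
    vsnprintf(last_report + n, sizeof(last_report) - (size_t)n, fmt, ap);
    va_end(ap);
}

/* Before the conversion: the prefix is part of the format string. */
void report_before(int cpus)
{
    toy_error_report("warning: %d CPUs not supported", cpus);
}

/* After the conversion: the prefix is dropped from the format string. */
void report_after(int cpus)
{
    toy_warn_report("%d CPUs not supported", cpus);
}
```

Because the sed expression is case-insensitive, a call site that used to print "Warning:" ends up with the helper's lowercase prefix, which is consistent with the note about the case-sensitive test needing a manual update.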
2017-07-07xen-platform: Cleanup network infrastructure when emulated NICs are unpluggedRoss Lagerwall1-0/+11
When the guest unplugs the emulated NICs, clean up the peer for each NIC as it is not needed anymore. Most importantly, this allows the tap interfaces which QEMU holds open to be closed and removed. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2017-07-04vapic: use tcg_enabledPaolo Bonzini1-2/+3
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-04Move CONFIG_KVM related definitions to kvm_i386.hThomas Huth1-0/+1
pc.h and sysemu/kvm.h are also included from common code (where CONFIG_KVM is not available), so the #defines that depend on CONFIG_KVM should not be declared here, so that nobody uses them in the wrong way. Since we're also going to poison CONFIG_KVM for common code, let's move them to kvm_i386.h instead. Most of the dummy definitions from sysemu/kvm.h are also unused since the code that uses them is only compiled for CONFIG_KVM (e.g. target/i386/kvm.c), so the unused defines are also simply dropped here instead of being moved. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <1498454578-18709-3-git-send-email-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-03intel_iommu: fix migration breakage on mr switchPeter Xu1-0/+15
Migration is broken after the vfio integration work: qemu-kvm: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address qemu-kvm: Failed to load ich9_ahci:ahci qemu-kvm: error while loading state for instance 0x0 of device '0000:00:1f.2/ich9_ahci' qemu-kvm: load of migration failed: Operation not permitted The problem is that the vfio work introduced dynamic memory region switching (actually it is also used for future PT mode), and this memory region layout is not properly delivered to the destination when migration happens. The solution is to rebuild the layout in post_load. Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1459906 Fixes: 558e0024 ("intel_iommu: allow dynamic switch of IOMMU region") Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-07-03hw/acpi: remove dead acpi codeAleksandr Bezzubikov1-10/+0
Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-07-03i386/kvm/pci-assign: Use errp directly rather than local_errMao Zhongyi1-15/+7
In assigned_device_pci_cap_init(), error messages are first stored in a local_err variable and then passed to the errp parameter through error_propagate(). This leads to cumbersome code. To avoid the extra local_err and error_propagate(), drop them and use errp directly. Cc: pbonzini@redhat.com Cc: rth@twiddle.net Cc: ehabkost@redhat.com Cc: mst@redhat.com Cc: armbru@redhat.com Cc: marcel@redhat.com Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-07-03i386/kvm/pci-assign: Fix return type of verify_irqchip_kernel()Mao Zhongyi1-12/+6
When a function has no success value to transmit, it is usually made to return void. This has turned out not to be a success, because it means that an extra local_err variable and error_propagate() are needed. That leads to cumbersome code, so transmitting success/failure in the return value is worthwhile. Fix the return type accordingly. Cc: pbonzini@redhat.com Cc: rth@twiddle.net Cc: ehabkost@redhat.com Cc: mst@redhat.com Cc: armbru@redhat.com Cc: marcel@redhat.com Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
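The difference between the two styles can be sketched with a toy error type. Everything below is a stand-in: Error, toy_error_set() and toy_error_propagate() only imitate the shape of QEMU's error API, and the verify/caller functions, the int 0/-1 return convention and the message text are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for QEMU's Error object (hypothetical, not the real API). */
typedef struct Error { char msg[64]; } Error;

/* Store a new error in *errp; a NULL errp means the caller ignores errors. */
void toy_error_set(Error **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    Error *e = malloc(sizeof(*e));
    if (!e) {
        abort();
    }
    snprintf(e->msg, sizeof(e->msg), "%s", msg);
    *errp = e;
}

/* Hand a locally collected error to the caller, or drop it if unwanted. */
void toy_error_propagate(Error **dst, Error *local)
{
    if (dst && !*dst) {
        *dst = local;
    } else {
        free(local);
    }
}

/* Old style: void return, so failure is visible only through the Error. */
void verify_void(bool have_irqchip, Error **errp)
{
    if (!have_irqchip) {
        toy_error_set(errp, "in-kernel irqchip required");
    }
}

/* Old-style caller: needs local_err plus an explicit propagate step. */
int caller_old(bool have_irqchip, Error **errp)
{
    Error *local_err = NULL;
    verify_void(have_irqchip, &local_err);
    if (local_err) {
        toy_error_propagate(errp, local_err);
        return -1;
    }
    return 0;
}

/* New style: the return value carries success (0) or failure (-1). */
int verify_int(bool have_irqchip, Error **errp)
{
    if (!have_irqchip) {
        toy_error_set(errp, "in-kernel irqchip required");
        return -1;
    }
    return 0;
}

/* New-style caller: passes errp straight through and checks the result. */
int caller_new(bool have_irqchip, Error **errp)
{
    return verify_int(have_irqchip, errp) < 0 ? -1 : 0;
}
```

Both callers report the same outcome, but the second no longer needs the temporary variable, and it keeps working when the caller passes a NULL errp.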
2017-07-03pci: Replace pci_add_capability2() with pci_add_capability()Mao Zhongyi1-7/+7
After the patch 'Make errp the last parameter of pci_add_capability()', pci_add_capability() and pci_add_capability2() now do exactly the same thing. So drop the pci_add_capability() wrapper around pci_add_capability2(), then replace pci_add_capability2() with pci_add_capability() everywhere. Cc: pbonzini@redhat.com Cc: rth@twiddle.net Cc: ehabkost@redhat.com Cc: mst@redhat.com Cc: dmitry@daynix.com Cc: jasowang@redhat.com Cc: marcel@redhat.com Cc: alex.williamson@redhat.com Cc: armbru@redhat.com Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-07-03pci: Make errp the last parameter of pci_add_capability()Mao Zhongyi1-7/+17
Add an Error argument to pci_add_capability() so that errp can carry information about errors. This helps its callers handle errors better when moving to 'realize'. Cc: pbonzini@redhat.com Cc: rth@twiddle.net Cc: ehabkost@redhat.com Cc: mst@redhat.com Cc: jasowang@redhat.com Cc: marcel@redhat.com Cc: alex.williamson@redhat.com Cc: armbru@redhat.com Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-07-03intel_iommu: relax iq tail check on VTD_GCMD_QIE enableLadi Prosek2-15/+20
The VT-d spec (section 6.5.2) prescribes software to zero the Invalidation Queue Tail Register before enabling the VTD_GCMD_QIE Global Command Register bit. Windows Server 2012 R2 and possibly other older Windows versions violate the protocol and set a non-zero queue tail first, which in effect makes them crash early on boot with -device intel-iommu,intremap=on. This commit relaxes the check and instead of failing to enable VTD_GCMD_QIE with vtd_err_qi_enable, it behaves as if the tail register was set just after enabling VTD_GCMD_QIE (see vtd_handle_iqt_write). Signed-off-by: Ladi Prosek <lprosek@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
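The behavioural change can be modelled with a toy invalidation queue. This is a sketch under stated assumptions, not the intel_iommu code: ToyIommu and all toy_* functions are hypothetical, descriptors are reduced to a counter, and ring wrap-around is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the invalidation queue state (hypothetical field names). */
typedef struct ToyIommu {
    bool qi_enabled;        /* models the queued-invalidation enable bit */
    uint16_t iq_head;       /* next descriptor the device will process */
    uint16_t iq_tail;       /* last position written by the guest */
    unsigned processed;     /* number of descriptors handled so far */
} ToyIommu;

/* Process every pending descriptor between head and tail. */
void toy_fetch_inv_desc(ToyIommu *s)
{
    while (s->iq_head != s->iq_tail) {
        s->iq_head++;
        s->processed++;
    }
}

/* Guest writes the tail register; work starts only if the queue is enabled. */
void toy_handle_iqt_write(ToyIommu *s, uint16_t tail)
{
    s->iq_tail = tail;
    if (s->qi_enabled) {
        toy_fetch_inv_desc(s);
    }
}

/* Old behaviour: refuse to enable when the tail is already non-zero. */
bool toy_enable_qi_strict(ToyIommu *s)
{
    if (s->iq_tail != 0) {
        return false;
    }
    s->iq_head = 0;
    s->qi_enabled = true;
    return true;
}

/* Relaxed behaviour: enable anyway, then act as if the tail had just been written. */
bool toy_enable_qi_relaxed(ToyIommu *s)
{
    s->iq_head = 0;
    s->qi_enabled = true;
    if (s->iq_tail != 0) {
        toy_fetch_inv_desc(s);
    }
    return true;
}
```

With a guest that sets the tail to 3 before enabling, the strict variant leaves the queue disabled, while the relaxed variant enables it and drains the three pending descriptors.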