peter/qemu - QEMU hacking for Peter

Age	Commit message (Collapse)	Author	Files	Lines
2018-05-07	pc-dimm: factor out MemoryDevice interface	David Hildenbrand	1	-1/+2
	On the qmp level, we already have the concept of memory devices: "query-memory-devices" Right now, we only support NVDIMM and PCDIMM. We want to map other devices later into the address space of the guest. Such device could e.g. be virtio devices. These devices will have a guest memory range assigned but won't be exposed via e.g. ACPI. We want to make them look like memory device, but not glued to pc-dimm. Especially, it will not always be possible to have TYPE_PC_DIMM as a parent class (e.g. virtio devices). Let's use an interface instead. As a first part, convert handling of - qmp_pc_dimm_device_list - get_plugged_memory_size to our new model. plug/unplug stuff etc. will follow later. A memory device will have to provide the following functions: - get_addr(): Necessary, as the property "addr" can e.g. not be used for virtio devices (already defined). - get_plugged_size(): The amount this device offers to the guest as of now. - get_region_size(): Because this can later on be bigger than the plugged size. - fill_device_info(): Fill MemoryDeviceInfo, e.g. for qmp. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-2-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-04-27	Clear mem_path if we fall back to anonymous RAM allocation	David Gibson	1	-0/+1
	If the -mem-path option is set, we attempt to map the guest's RAM from a file in the given path; it's usually used to back guest RAM with hugepages. If we're unable to (e.g. not enough free hugepages) then we fall back to allocating normal anonymous pages. This behaviour can be surprising, but a comment in allocate_system_memory_nonnuma() suggests it's legacy behaviour we can't change. What really isn't ok, though, is that in this case we leave mem_path set. That means functions which attempt to determine the pagesize of main RAM can erroneously think it is hugepage based on the requested path, even though it's not. This is particular bad for the pseries machine type. KVM HV limitations mean the guest can't use pagesizes larger than the host page size used to back RAM. That means that such a fallback, rather than merely giving poorer performance than expected will cause the guest to freeze up early in boot as it attempts to use large page mappings that can't work. This patch addresses the problem by clearing the mem_path variable when we fall back to anonymous pages, meaning that subsequent attempts to determine the RAM page size will get an accurate result. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-03-20	qmp: distinguish PC-DIMM and NVDIMM in MemoryDeviceInfoList	Haozhong Zhang	1	-6/+13
	It may need to treat PC-DIMM and NVDIMM differently, e.g., when deciding the necessity of non-volatile flag bit in SRAT memory affinity structures. A new field 'nvdimm' is added to the union type MemoryDeviceInfo for such purpose. Its type is currently PCDIMMDeviceInfo and will be updated when necessary in the future. It also fixes "info memory-devices"/query-memory-devices which currently show nvdimm devices as dimm devices since object_dynamic_cast(obj, TYPE_PC_DIMM) happily cast nvdimm to TYPE_PC_DIMM which it's been inherited from. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-03-20	pc-dimm: make qmp_pc_dimm_device_list() sort devices by address	Haozhong Zhang	1	-3/+1
	Make qmp_pc_dimm_device_list() return sorted by start address list of devices so that it could be reused in places that would need sorted list. Reuse existing pc_dimm_built_list() to get sorted list. While at it hide recursive callbacks from callers, so that: qmp_pc_dimm_device_list(qdev_get_machine(), &list); could be replaced with simpler: list = qmp_pc_dimm_device_list(); follow up patch will use it in build_srat() Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> for ppc part Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-03-08	numa: we don't implement NUMA for s390x	David Hildenbrand	1	-1/+1
	Right now it is possible to crash QEMU for s390x by providing e.g. -numa node,nodeid=0,cpus=0-1 Problem is, that numa.c uses mc->cpu_index_to_instance_props as an indicator whether NUMA is supported by a machine type. We don't implement NUMA for s390x ("topology") yet. However we need mc->cpu_index_to_instance_props for query-cpus. So let's fix this case by also checking for mc->get_default_cpu_node_id, which will be needed by machine_set_cpu_numa_node(). qemu-system-s390x: -numa node,nodeid=0,cpus=0-1: NUMA is not supported by this machine-type While at it, make s390_cpu_index_to_props() look like on other architectures. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180227110255.20999-1-david@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-03-02	qapi: Empty out qapi-schema.json	Markus Armbruster	1	-2/+2
	The previous commit improved compile time by including less of the generated QAPI headers. This is impossible for stuff defined directly in qapi-schema.json, because that ends up in headers that that pull in everything. Move everything but include directives from qapi-schema.json to new sub-module qapi/misc.json, then include just the "misc" shard where possible. It's possible everywhere, except: * monitor.c needs qmp-command.h to get qmp_init_marshal() * monitor.c, ui/vnc.c and the generated qapi-event-FOO.c need qapi-event.h to get enum QAPIEvent Perhaps we'll get rid of those some other day. Adding a type to qapi/migration.json now recompiles some 120 instead of 2300 out of 5100 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-25-armbru@redhat.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-02-09	Include qapi/error.h exactly where needed	Markus Armbruster	1	-0/+1
	This cleanup makes the number of objects depending on qapi/error.h drop from 1910 (out of 4743) to 1612 in my "build everything" tree. While there, separate #include from file comment with a blank line, and drop a useless comment on why qemu/osdep.h is included first. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-5-armbru@redhat.com> [Semantic conflict with commit 34e304e975 resolved, OSX breakage fixed]
2018-02-05	qemu: improve hugepage allocation failure message	Marcelo Tosatti	1	-0/+1
	Improve hugepage allocation failure message, indicating what is happening to the user. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Message-Id: <20180115201700.GA4439@amt.cnet> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-01-19	hostmem-file: add "align" option	Haozhong Zhang	1	-1/+1
	When mmap(2) the backend files, QEMU uses the host page size (getpagesize(2)) by default as the alignment of mapping address. However, some backends may require alignments different than the page size. For example, mmap a device DAX (e.g., /dev/dax0.0) on Linux kernel 4.13 to an address, which is 4K-aligned but not 2M-aligned, fails with a kernel message like [617494.969768] dax dax0.0: qemu-system-x86: dax_mmap: fail, unaligned vma (0x7fa37c579000 - 0x7fa43c579000, 0x1fffff) Because there is no common approach to get such alignment requirement, we add the 'align' option to 'memory-backend-file', so that users or management utils, which have enough knowledge about the backend, can specify a proper alignment via this option. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Message-Id: <20171211072806.2812-2-haozhong.zhang@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [ehabkost: fixed typo, fixed error_setg() format string] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-12-18	numa: remove unused #include	Philippe Mathieu-Daudé	1	-1/+0
	Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-12-15	spapr: replace numa_get_node() with lookup in pc-dimm list	Igor Mammedov	1	-94/+0
	SPAPR is the last user of numa_get_node() and a bunch of supporting code to maintain numa_info[x].addr list. Get LMB node id from pc-dimm list, which allows to remove ~80LOC maintaining dynamic address range lookup list. It also removes pc-dimm dependency on numa_[un]set_mem_node_id() and makes pc-dimms a sole source of information about which node it belongs to and removes duplicate data from global numa_info. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-11-16	NUMA: Enable adding NUMA node implicitly	Dou Liyang	1	-1/+20
	Linux and Windows need ACPI SRAT table to make memory hotplug work properly, however currently QEMU doesn't create SRAT table if numa options aren't present on CLI. Which breaks both linux and windows guests in certain conditions: * Windows: won't enable memory hotplug without SRAT table at all * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers when memory is hotplugged and guest tries to use it with that drivers. Fix above issues by automatically creating a numa node when QEMU is started with memory hotplug enabled but without '-numa' options on CLI. (PS: auto-create numa node only for new machine types so not to break migration). Which would provide SRAT table to guests without explicit -numa options on CLI and would allow: * Windows: to enable memory hotplug * Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated buffers that legacy drivers/hw can handle. [Rewritten by Igor] Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Suggested-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Thomas Huth <thuth@redhat.com> Cc: Alistair Francis <alistair23@gmail.com> Cc: Takao Indoh <indou.takao@jp.fujitsu.com> Cc: Izumi Taku <izumi.taku@jp.fujitsu.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-10-27	numa: fixup parsed NumaNodeOptions earlier	Igor Mammedov	1	-9/+10
	numa 'mem' option with suffix or without one is possible only on CLI/HMP. Instead of fixing up special suffix less CLI case deep in parse_numa_node() do it earlier right after option is parsed into NumaNodeOptions with OptVisistor so that the rest of the code would use valid values in NumaNodeOptions and won't have to reparse QemuOpts. It will help to isolate CLI/HMP parts in parse_numa() and split out parsed NumaNodeOptions processing into separate function that could be reused by QMP handler where we have only NumaNodeOptions and don't need any fixups. While at it reuse qemu_strtosz_MiB() instead of manually checking for suffixes. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1507801198-98182-1-git-send-email-imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-19	NUMA: Replace MAX_NODES with nb_numa_nodes in for loop	Dou Liyang	1	-1/+1
	In QEMU, the number of the NUMA nodes is determined by parse_numa_opts(). Then, QEMU uses it for iteration, for example: for (i = 0; i < nb_numa_nodes; i++) However, in memory_region_allocate_system_memory(), it uses MAX_NODES not nb_numa_nodes. So, replace MAX_NODES with nb_numa_nodes to keep code consistency and reduce the loop times. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Message-Id: <1503387936-3483-1-git-send-email-douly.fnst@cn.fujitsu.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-14	hmp: extend "info numa" with hotplugged memory information	Vadim Galitsyn	1	-5/+13
	Report amount of hotplugged memory in addition to total amount per NUMA node. Signed-off-by: Vadim Galitsyn <vadim.galitsyn@profitbricks.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: qemu-devel@nongnu.org Message-Id: <20170829153022.27004-2-vadim.galitsyn@profitbricks.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-07-14	memory: Rename memory_region_init_ram() to memory_region_init_ram_nomigrate()	Peter Maydell	1	-2/+2
	Rename memory_region_init_ram() to memory_region_init_ram_nomigrate(). This leaves the way clear for us to provide a memory_region_init_ram() which does handle migration. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1499438577-7674-4-git-send-email-peter.maydell@linaro.org
2017-06-20	numa: use get_uint() for "size" property	Marc-André Lureau	1	-3/+3
	"size" is a property of TYPE_MEMORY_BACKEND. host_memory_backend_get_size() and host_memory_backend_set_size() use visit_type_size(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20170607163635.17635-39-marcandre.lureau@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2017-06-05	numa: make sure that all cpus have has_node_id set if numa is enabled	Igor Mammedov	1	-11/+5
	It fixes/add missing _PXM object for non mapped CPU (x86) and missing fdt node (virt-arm). It ensures that possible_cpus contains complete mapping if numa is enabled by the time machine_init() is executed. As result non completely mapped CPUs: 1) appear in ACPI/fdt blobs 2) QMP query-hotpluggable-cpus command shows bound nodes for such CPUs 3) allows to drop checks for has_node_id in numa only code, reducing number of invariants incomplete mapping could produce 4) moves fixup/implicit node init from runtime numa_cpu_pre_plug() (when CPU object is created) to machine_numa_finish_init() which helps to fix [1, 2] and make possible_cpus complete source of numa mapping available even before CPUs are created. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496161442-96665-4-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05	numa: move default mapping init to machine	Igor Mammedov	1	-26/+0
	there is no need use cpu_index_to_instance_props() for setting default cpu -> node mapping. Generic machine code can do it without cpu_index by just enabling already preset defaults in possible_cpus. PS: as bonus it makes one less user of cpu_index_to_instance_props() Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496161442-96665-3-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05	numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr	Igor Mammedov	1	-0/+23
	Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1496161442-96665-2-git-send-email-imammedo@redhat.com> [ehabkost: Fix indentation] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-30	numa: Fix format string for "Invalid node" message	Eduardo Habkost	1	-2/+1
	Some compilers complain about the PRIu16 format string with the MAX(src, dst) and MAX_NODES arguments. Example output from Apple LLVM version 7.3.0 (clang-703.0.31): numa.c:236:20: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat] MAX(src, dst), MAX_NODES); ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~ include/qapi/error.h:163:35: note: expanded from macro 'error_setg' (fmt), ## __VA_ARGS__) ^~~~~~~~~~~ glib/2.52.2/include/glib-2.0/glib/gmacros.h:288:20: note: expanded from macro 'MAX' #define MAX(a, b) (((a) > (b)) ? (a) : (b)) ^~~~~~~~~~~~~~~~~~~~~~~~~ numa.c:236:35: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat] MAX(src, dst), MAX_NODES); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~ include/qapi/error.h:163:35: note: expanded from macro 'error_setg' (fmt), ## __VA_ARGS__) ^~~~~~~~~~~ include/sysemu/sysemu.h:165:19: note: expanded from macro 'MAX_NODES' #define MAX_NODES 128 ^~~ MAX(src, dst) promotes the src and dst arguments to int, and MAX_NODES is an int. Use %d to silence those warnings. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170530184013.31044-1-ehabkost@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-15	Merge remote-tracking branch 'ehabkost/tags/x86-and-machine-pull-request' ↵	Stefan Hajnoczi	1	-96/+206
	into staging x86 and machine queue, 2017-05-11 Highlights: * New "-numa cpu" option * NUMA distance configuration * migration/i386 vmstatification # gpg: Signature made Thu 11 May 2017 08:16:07 PM BST # gpg: using RSA key 0x2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" # gpg: Note: This key has expired! # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * ehabkost/tags/x86-and-machine-pull-request: (29 commits) migration/i386: Remove support for pre-0.12 formats vmstatification: i386 FPReg migration/i386: Remove old non-softfloat 64bit FP support tests: check -numa node,cpu=props_list usecase numa: add '-numa cpu,...' option for property based node mapping numa: remove node_cpu bitmaps as they are no longer used numa: use possible_cpus for not mapped CPUs check machine: call machine init from wrapper numa: remove no longer need numa_post_machine_init() tests: numa: add case for QMP command query-cpus QMP: include CpuInstanceProperties into query_cpus output output virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() numa: do default mapping based on possible_cpus instead of node_cpu bitmaps numa: mirror cpu to node mapping in MachineState::possible_cpus numa: add check that board supports cpu_index to node mapping virt-arm: add node-id property to CPU pc: add node-id property to CPU spapr: add node-id property to sPAPR core ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-11	numa: add '-numa cpu,...' option for property based node mapping	Igor Mammedov	1	-0/+15
	legacy cpu to node mapping is using cpu index values to map VCPU to node with help of '-numa node,nodeid=node,cpus=x[-y]' option. However cpu index is internal concept and QEMU users have to guess /reimplement qemu's logic/ to map it to a concrete cpu socket/core/thread to make sane CPUs placement across numa nodes. This patch allows to map cpu objects to numa nodes using the same properties as used for cpus with -device/device_add (socket-id/core-id/thread-id/node-id). At present valid properties/values to address CPUs could be fetched using hotpluggable-cpus monitor/qmp command, it will require user to start qemu twice when creating domain to fetch possible CPUs for a machine type/-smp layout first and then the second time with numa explicit mapping for actual usage. The first step results could be saved and reused to set/change mapping later as far as machine type/-smp stays the same. Proposed impl. supports exact and wildcard matching to simplify CLI and allow to set mapping for a specific cpu or group of cpu objects specified by matched properties. For example: # exact mapping x86 -numa cpu,node-id=x,socket-id=y,core-id=z,thread-id=n # exact mapping SPAPR -numa cpu,node-id=x,core-id=y # wildcard mapping, all cpu objects that match socket-id=y # are mapped to node-id=x -numa cpu,node-id=x,socket-id=y Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1494415802-227633-18-git-send-email-imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11	numa: remove node_cpu bitmaps as they are no longer used	Igor Mammedov	1	-43/+0
	Postfactum "CPU(s) present in multiple NUMA nodes" check was the last user of node_cpu bitmaps, but it's not need as machine_set_cpu_numa_node() does the similar check at the time mapping is set for cpus (i.e. when -numa cpus= is parsed) and ensures that cpu can be mapped only to one node. Remove duplicate check based on node_cpu bitmaps and since the last user is gone remove node_cpu as well, which completes internal transition from legacy bitmap based mapping storage to possible_cpus storage. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <1494415802-227633-17-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11	numa: use possible_cpus for not mapped CPUs check	Igor Mammedov	1	-10/+0
	and remove corresponding part in numa.c that uses node_cpu bitmaps. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <1494415802-227633-16-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11	numa: remove no longer need numa_post_machine_init()	Igor Mammedov	1	-15/+0
	CPUState::numa_node is still in use but now it's set by board when it creates CPU objects. So there isn't any need to set it again after all CPU's are created, since it's been already set. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <1494415802-227633-14-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11	tests: numa: add case for QMP command query-cpus	Igor Mammedov	1	-14/+0
	Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1494415802-227633-13-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11	numa: do default mapping based on possible_cpus instead of node_cpu bitmaps	Igor Mammedov	1	-8/+12
	Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <1494415802-227633-8-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11	numa: mirror cpu to node mapping in MachineState::possible_cpus	Igor Mammedov	1	-0/+8
	Introduce machine_set_cpu_numa_node() helper that stores node mapping for CPU in MachineState::possible_cpus. CPU and node it belongs to is specified by 'props' argument. Patch doesn't remove old way of storing mapping in numa_info[X].node_cpu as removing it at the same time makes patch rather big. Instead it just mirrors mapping in possible_cpus and follow up per target patches will switch to possible_cpus and numa_info[X].node_cpu will be removed once there isn't any users left. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <1494415802-227633-7-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11	numa: add check that board supports cpu_index to node mapping	Igor Mammedov	1	-3/+10
	Default node mapping initialization already checks that board supports cpu_index to node mapping and refuses to start if it's not supported. Do the same for explicitly provided mapping "-numa node,cpus=..." Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1494415802-227633-6-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11	numa: move source of default CPUs to NUMA node mapping into boards	Igor Mammedov	1	-13/+11
	Originally CPU threads were by default assigned in round-robin fashion. However it was causing issues in guest since CPU threads from the same socket/core could be placed on different NUMA nodes. Commit fb43b73b (pc: fix default VCPU to NUMA node mapping) fixed it by grouping threads within a socket on the same node introducing cpu_index_to_socket_id() callback and commit 20bb648d (spapr: Fix default NUMA node allocation for threads) reused callback to fix similar issues for SPAPR machine even though socket doesn't make much sense there. As result QEMU ended up having 3 default distribution rules used by 3 targets /virt-arm, spapr, pc/. In effort of moving NUMA mapping for CPUs into possible_cpus, generalize default mapping in numa.c by making boards decide on default mapping and let them explicitly tell generic numa code to which node a CPU thread belongs to by replacing cpu_index_to_socket_id() with @cpu_index_to_instance_props() which provides default node_id assigned by board to specified cpu_index. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1494415802-227633-2-git-send-email-imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11	numa: equally distribute memory on nodes	Laurent Vivier	1	-11/+38
	When there are more nodes than available memory to put the minimum allowed memory by node, all the memory is put on the last node. This is because we put (ram_size / nb_numa_nodes) & ~((1 << mc->numa_mem_align_shift) - 1); on each node, and in this case the value is 0. This is particularly true with pseries, as the memory must be aligned to 256MB. To avoid this problem, this patch uses an error diffusion algorithm [1] to distribute equally the memory on nodes. We introduce numa_auto_assign_ram() function in MachineClass to keep compatibility between machine type versions. The legacy function is used with pseries-2.9, pc-q35-2.9 and pc-i440fx-2.9 (and previous), the new one with all others. Example: qemu-system-ppc64 -S -nographic -nodefaults -monitor stdio -m 1G -smp 8 \ -numa node -numa node -numa node \ -numa node -numa node -numa node Before: (qemu) info numa 6 nodes node 0 cpus: 0 6 node 0 size: 0 MB node 1 cpus: 1 7 node 1 size: 0 MB node 2 cpus: 2 node 2 size: 0 MB node 3 cpus: 3 node 3 size: 0 MB node 4 cpus: 4 node 4 size: 0 MB node 5 cpus: 5 node 5 size: 1024 MB After: (qemu) info numa 6 nodes node 0 cpus: 0 6 node 0 size: 0 MB node 1 cpus: 1 7 node 1 size: 256 MB node 2 cpus: 2 node 2 size: 0 MB node 3 cpus: 3 node 3 size: 256 MB node 4 cpus: 4 node 4 size: 256 MB node 5 cpus: 5 node 5 size: 256 MB [1] https://en.wikipedia.org/wiki/Error_diffusion Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170502162955.1610-2-lvivier@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> [ehabkost: s/ram_size/size/ at numa_default_auto_assign_ram()] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11	numa: Allow setting NUMA distance for different NUMA nodes	He Chen	1	-2/+135
	This patch is going to add SLIT table support in QEMU, and provides additional option `dist` for command `-numa` to allow user set vNUMA distance by QEMU command. With this patch, when a user wants to create a guest that contains several vNUMA nodes and also wants to set distance among those nodes, the QEMU command would like: ``` -numa node,nodeid=0,cpus=0 \ -numa node,nodeid=1,cpus=1 \ -numa node,nodeid=2,cpus=2 \ -numa node,nodeid=3,cpus=3 \ -numa dist,src=0,dst=1,val=21 \ -numa dist,src=0,dst=2,val=31 \ -numa dist,src=0,dst=3,val=41 \ -numa dist,src=1,dst=2,val=21 \ -numa dist,src=1,dst=3,val=31 \ -numa dist,src=2,dst=3,val=21 \ ``` Signed-off-by: He Chen <he.chen@linux.intel.com> Message-Id: <1493260558-20728-1-git-send-email-he.chen@linux.intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-07	Remove reduntant qemu: from error functions	Ishani Chugh	1	-2/+2
	This patch removes redundant "qemu:" from error functions. The link to the bitesized task is: http://wiki.qemu-project.org/Contribute/BiteSizedTasks#Error_checking Signed-off-by: Ishani Chugh <chugh.ishani@research.iiit.ac.in> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-03-22	numa,spapr: align default numa node memory size to 256MB	Laurent Vivier	1	-3/+3
	Since commit 224245b ("spapr: Add LMB DR connectors"), NUMA node memory size must be aligned to 256MB (SPAPR_MEMORY_BLOCK_SIZE). But when "-numa" option is provided without "mem" parameter, the memory is equally divided between nodes, but 8MB aligned. This can be not valid for pseries. In that case we can have: $ ./ppc64-softmmu/qemu-system-ppc64 -m 4G -numa node -numa node -numa node qemu-system-ppc64: Node 0 memory size 0x55000000 is not aligned to 256 MiB With this patch, we have: (qemu) info numa 3 nodes node 0 cpus: 0 node 0 size: 1280 MB node 1 cpus: node 1 size: 1280 MB node 2 cpus: node 2 size: 1536 MB Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-22	numa: Flatten simple union NumaOptions	Markus Armbruster	1	-2/+2
	Simple unions are simpler than flat unions in the schema, but more complicated in C and on the QMP wire: there's extra indirection in C and extra nesting on the wire, both pointless. They're best avoided in new code. NumaOptions isn't new, but it's only used internally, not in QMP. Convert it to a flat union. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1487709988-14322-2-git-send-email-armbru@redhat.com>
2017-01-16	ramblock-notifier: new	Paolo Bonzini	1	-0/+29
	This adds a notify interface of ram block additions and removals. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-12	numa: make -numa parser dynamically allocate CPUs masks	Igor Mammedov	1	-7/+14
	so it won't impose an additional limits on max_cpus limits supported by different targets. It removes global MAX_CPUMASK_BITS constant and need to bump it up whenever max_cpus is being increased for a target above MAX_CPUMASK_BITS value. Use runtime max_cpus value instead to allocate sufficiently sized node_cpu bitmasks in numa parser. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1479466974-249781-1-git-send-email-imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> [ehabkost: Added asserts to ensure cpu_index < max_cpus] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-01-12	monitor: fix qmp/hmp query-memdev not reporting IDs of memory backends	Igor Mammedov	1	-0/+3
	Considering 'id' is mandatory for user_creatable objects/backends and user_creatable_add_type() always has it as an argument regardless of where from it is called CLI/monitor or QMP, Fix issue by adding 'id' property to hostmem backends and set it in user_creatable_add_type() for every object that implements 'id' property. Then later at query-memdev time get 'id' from object directly. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1484052795-158195-4-git-send-email-imammedo@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-10	numa: reduce code duplication by adding helper numa_get_node_for_cpu()	Igor Mammedov	1	-0/+12
	Replace repeated pattern for (i = 0; i < nb_numa_nodes; i++) { if (test_bit(idx, numa_info[i].node_cpu)) { ... break; with a helper function to lookup numa node index for cpu. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-07	numa: do not leak NumaOptions	Marc-André Lureau	1	-7/+8
	In all cases, call qapi_free_NumaOptions(), by using a common ending block. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-08-02	numa: set the memory backend "is_mapped" field	Greg Kurz	1	-0/+1
	Commit 2aece63 "hostmem: detect host backend memory is being used properly" added a way to know if a memory backend is busy or available for use. It caused a slight regression if we pass the same backend to a NUMA node and to a pc-dimm device: -m 1G,slots=2,maxmem=2G \ -object memory-backend-ram,size=1G,id=mem-mem1 \ -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 \ -numa node,nodeid=0,memdev=mem-mem1 Before commit 2aece63, this would cause QEMU to print an error message and to exit gracefully: qemu-system-ppc64: -device pc-dimm,id=dimm-mem1,memdev=mem-mem1: can't use already busy memdev: mem-mem1 Since commit 2aece63, QEMU hits an assertion in the memory code: qemu-system-ppc64: memory.c:1934: memory_region_add_subregion_common: Assertion `!subregion->container' failed. Aborted This happens because pc-dimm devices don't use memory_region_is_mapped() anymore and cannot guess the backend is already used by a NUMA node. Let's revert to the previous behavior by turning the NUMA code to also call host_memory_backend_set_mapped() when it uses a backend. Fixes: 2aece63c8a9d2c3a8ff41d2febc4cdeff2633331 Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <146891691503.15642.9817215371777203794.stgit@bahia.lan> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-06	opts-visitor: Favor new visit_free() function	Eric Blake	1	-3/+3
	Now that we have a polymorphic visit_free(), we no longer need opts_visitor_cleanup(); which in turn means we no longer need to return a subtype from opts_visitor_new() nor a public upcast function. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1465490926-28625-6-git-send-email-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18	qapi: Don't special-case simple union wrappers	Eric Blake	1	-2/+2
	Simple unions were carrying a special case that hid their 'data' QMP member from the resulting C struct, via the hack method QAPISchemaObjectTypeVariant.simple_union_type(). But by using the work we started by unboxing flat union and alternate branches, coupled with the ability to visit the members of an implicit type, we can now expose the simple union's implicit type in qapi-types.h: \| struct q_obj_ImageInfoSpecificQCow2_wrapper { \| ImageInfoSpecificQCow2 data; \| }; \| \| struct q_obj_ImageInfoSpecificVmdk_wrapper { \| ImageInfoSpecificVmdk data; \| }; ... \| struct ImageInfoSpecific { \| ImageInfoSpecificKind type; \| union { /* union tag is @type / \| void data; \|- ImageInfoSpecificQCow2 qcow2; \|- ImageInfoSpecificVmdk vmdk; \|+ q_obj_ImageInfoSpecificQCow2_wrapper qcow2; \|+ q_obj_ImageInfoSpecificVmdk_wrapper vmdk; \| } u; \| }; Doing this removes asymmetry between QAPI's QMP side and its C side (both sides now expose 'data'), and means that the treatment of a simple union as sugar for a flat union is now equivalent in both languages (previously the two approaches used a different layer of dereferencing, where the simple union could be converted to a flat union with equivalent C layout but different {} on the wire, or to an equivalent QMP wire form but with different C representation). Using the implicit type also lets us get rid of the simple_union_type() hack. Of course, now all clients of simple unions have to adjust from using su->u.member to using su->u.member.data; while this touches a number of files in the tree, some earlier cleanup patches helped minimize the change to the initialization of a temporary variable rather than every single member access. The generated qapi-visit.c code is also affected by the layout change: \|@@ -7393,10 +7393,10 @@ void visit_type_ImageInfoSpecific_member \| } \| switch (obj->type) { \| case IMAGE_INFO_SPECIFIC_KIND_QCOW2: \|- visit_type_ImageInfoSpecificQCow2(v, "data", &obj->u.qcow2, &err); \|+ visit_type_q_obj_ImageInfoSpecificQCow2_wrapper_members(v, &obj->u.qcow2, &err); \| break; \| case IMAGE_INFO_SPECIFIC_KIND_VMDK: \|- visit_type_ImageInfoSpecificVmdk(v, "data", &obj->u.vmdk, &err); \|+ visit_type_q_obj_ImageInfoSpecificVmdk_wrapper_members(v, &obj->u.vmdk, &err); \| break; \| default: \| abort(); Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1458254921-17042-13-git-send-email-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-04	qapi-dealloc: Reduce use outside of generated code	Eric Blake	1	-8/+1
	No need to roll our own use of the dealloc visitors when we can just directly use the qapi_free_FOO() functions that do what we want in one line. In net.c, inline net_visit() into its remaining lone caller. After this patch, test-visitor-serialization.c is the only non-generated file that needs to use a dealloc visitor, because it is testing low level aspects of the visitor interface. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1456262075-3311-2-git-send-email-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-02-08	qapi: Swap visit_* arguments for consistent 'name' placement	Eric Blake	1	-3/+3
	JSON uses "name":value, but many of our visitor interfaces were called with visit_type_FOO(v, &value, name, errp). This can be a bit confusing to have to mentally swap the parameter order to match JSON order. It's particularly bad for visit_start_struct(), where the 'name' parameter is smack in the middle of the otherwise-related group of 'obj, kind, size' parameters! It's time to do a global swap of the parameter ordering, so that the 'name' parameter is always immediately after the Visitor argument. Additional reason in favor of the swap: the existing include/qjson.h prefers listing 'name' first in json_prop_(), and I have plans to unify that file with the qapi visitors; listing 'name' first in qapi will minimize churn to the (admittedly few) qjson.h clients. Later patches will then fix docs, object.h, visitor-impl.h, and those clients to match. Done by first patching scripts/qapi.py by hand to make generated files do what I want, then by running the following Coccinelle script to affect the rest of the code base: $ spatch --sp-file script `git grep -l '\bvisit_' -- '*/.[ch]'` I then had to apply some touchups (Coccinelle insisted on TAB indentation in visitor.h, and botched the signature of visit_type_enum() by rewriting 'const char const strings[]' to the syntactically invalid 'const charconst[] strings'). The movement of parameters is sufficient to provoke compiler errors if any callers were missed. // Part 1: Swap declaration order @@ type TV, TErr, TObj, T1, T2; identifier OBJ, ARG1, ARG2; @@ void visit_start_struct -(TV v, TObj OBJ, T1 ARG1, const char name, T2 ARG2, TErr errp) +(TV v, const char name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp) { ... } @@ type bool, TV, T1; identifier ARG1; @@ bool visit_optional -(TV v, T1 ARG1, const char name) +(TV v, const char name, T1 ARG1) { ... } @@ type TV, TErr, TObj, T1; identifier OBJ, ARG1; @@ void visit_get_next_type -(TV v, TObj OBJ, T1 ARG1, const char name, TErr errp) +(TV v, const char name, TObj OBJ, T1 ARG1, TErr errp) { ... } @@ type TV, TErr, TObj, T1, T2; identifier OBJ, ARG1, ARG2; @@ void visit_type_enum -(TV v, TObj OBJ, T1 ARG1, T2 ARG2, const char name, TErr errp) +(TV v, const char name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp) { ... } @@ type TV, TErr, TObj; identifier OBJ; identifier VISIT_TYPE =~ "^visit_type_"; @@ void VISIT_TYPE -(TV v, TObj OBJ, const char name, TErr errp) +(TV v, const char name, TObj OBJ, TErr errp) { ... } // Part 2: swap caller order @@ expression V, NAME, OBJ, ARG1, ARG2, ERR; identifier VISIT_TYPE =~ "^visit_type_"; @@ ( -visit_start_struct(V, OBJ, ARG1, NAME, ARG2, ERR) +visit_start_struct(V, NAME, OBJ, ARG1, ARG2, ERR) \| -visit_optional(V, ARG1, NAME) +visit_optional(V, NAME, ARG1) \| -visit_get_next_type(V, OBJ, ARG1, NAME, ERR) +visit_get_next_type(V, NAME, OBJ, ARG1, ERR) \| -visit_type_enum(V, OBJ, ARG1, ARG2, NAME, ERR) +visit_type_enum(V, NAME, OBJ, ARG1, ARG2, ERR) \| -VISIT_TYPE(V, OBJ, NAME, ERR) +VISIT_TYPE(V, NAME, OBJ, ERR) ) Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <1454075341-13658-19-git-send-email-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-02-04	all: Clean up includes	Peter Maydell	1	-0/+1
	Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1454089805-5470-16-git-send-email-peter.maydell@linaro.org
2016-01-26	memory: exit when hugepage allocation fails if mem-prealloc	Luiz Capitulino	1	-4/+7
	When -mem-prealloc is passed on the command-line, the expected behavior is to exit if the hugepage allocation fails. However, this behavior is broken since commit cc57501dee which made hugepage allocation fall back to regular ram in case of faliure. This commit restores the expected behavior for -mem-prealloc. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Message-Id: <20160122091501.75bbd42a@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-13	Use error_fatal to simplify obvious fatal errors	Markus Armbruster	1	-6/+2
	Done with this Coccinelle semantic patch: @@ type T; identifier FUN, RET; expression list ARGS; expression ERR, EC; @@ ( - T RET = FUN(ARGS, &ERR); + T RET = FUN(ARGS, &error_fatal); \| - RET = FUN(ARGS, &ERR); + RET = FUN(ARGS, &error_fatal); \| - FUN(ARGS, &ERR); + FUN(ARGS, &error_fatal); ) - if (ERR != NULL) { - error_report_err(ERR); - exit(EC); - } This is actually a more elegant version of my initial semantic patch by courtesy of Eduardo. It leaves dead Error * variables behind, cleaned up manually. Cc: qemu-arm@nongnu.org Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
2015-12-18	numa: Clean up query-memdev error handling	Markus Armbruster	1	-49/+10
	qmp_query_memdev() has two error paths: * When object_get_objects_root() returns null. It never does, so simply drop the useless error handling. * When query_memdev() fails. It leaks err then. But any failure there is actually a programming error. Switch it to &error_abort, and drop the useless error handling. Messed up in commit 76b5d85 "qmp: add query-memdev". Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>