peter/qemu - QEMU hacking for Peter

Age	Commit message (Collapse)	Author	Files	Lines
2011-12-05	block: convert qemu_aio_flush() calls to bdrv_drain_all()	Stefan Hajnoczi	12	-13/+34
	Many places in QEMU call qemu_aio_flush() to complete all pending asynchronous I/O. Most of these places actually want to drain all block requests but there is no block layer API to do so. This patch introduces the bdrv_drain_all() API to wait for requests across all BlockDriverStates to complete. As a bonus we perform checks after qemu_aio_wait() to ensure that requests really have finished. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	block: wait_for_overlapping_requests() deadlock detection	Stefan Hajnoczi	1	-0/+8
	Debugging a reentrant request deadlock was fun but in the future we need a quick and obvious way of detecting such bugs. Add an assert that checks we are not about to deadlock when waiting for another request. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	block: implement bdrv_co_is_allocated() boundary cases	Stefan Hajnoczi	1	-8/+18
	Cases beyond the end of the disk image are only implemented for block drivers that do not provide .bdrv_co_is_allocated(). It's worth making these cases generic so that block drivers that do implement .bdrv_co_is_allocated() also get them for free. Suggested-by: Mark Wu <wudxw@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	dma-helpers: Add trace events	Kevin Wolf	2	-0/+17
	Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05	cow: use bdrv_co_is_allocated()	Stefan Hajnoczi	1	-3/+3
	Now that bdrv_co_is_allocated() is available we can use it instead of the synchronous bdrv_is_allocated() interface. This is a follow-up that Kevin Wolf <kwolf@redhat.com> pointed out after applying the series that introduces bdrv_co_is_allocated(). It is safe to make cow_read() a coroutine_fn because its only caller is a coroutine_fn. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	block: add -drive copy-on-read=on\|off	Stefan Hajnoczi	4	-3/+21
	This patch adds the -drive copy-on-read=on\|off command-line option: copy-on-read=on\|off copy-on-read is "on" or "off" and enables whether to copy read backing file sectors into the image file. Copy-on-read avoids accessing the same backing file sectors repeatedly and is useful when the backing file is over a slow network. By default copy-on-read is off. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	block: core copy-on-read logic	Stefan Hajnoczi	2	-0/+73
	Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	block: request overlap detection	Stefan Hajnoczi	1	-2/+43
	Detect overlapping requests and remember to align to cluster boundaries if the image format uses them. This assumes that allocating I/O is performed in cluster granularity - which is true for qcow2, qed, etc. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	block: wait for overlapping requests	Stefan Hajnoczi	1	-0/+35
	When copy-on-read is enabled it is necessary to wait for overlapping requests before issuing new requests. This prevents races between the copy-on-read and a write request. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	block: add interface to toggle copy-on-read	Stefan Hajnoczi	3	-0/+28
	The bdrv_enable_copy_on_read()/bdrv_disable_copy_on_read() functions can be used to programmatically enable or disable copy-on-read for a block device. Later patches add the actual copy-on-read logic. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	block: add request tracking	Stefan Hajnoczi	2	-1/+51
	The block layer does not know about pending requests. This information is necessary for copy-on-read since overlapping requests must be serialized to prevent races that corrupt the image. The BlockDriverState gets a new tracked_request list field which contains all pending requests. Each request is a BdrvTrackedRequest record with sector_num, nb_sectors, and is_write fields. Note that request tracking is always enabled but hopefully this extra work is so small that it doesn't justify adding an enable/disable flag. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	coroutine: add qemu_co_queue_restart_all()	Stefan Hajnoczi	3	-8/+14
	It's common to wake up all waiting coroutines. Introduce the qemu_co_queue_restart_all() function to do this instead of looping over qemu_co_queue_next() in every caller. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	qemu-common: add QEMU_ALIGN_DOWN() and QEMU_ALIGN_UP() macros	Stefan Hajnoczi	1	-0/+6
	Add macros for aligning a number to a multiple, for example: QEMU_ALIGN_DOWN(500, 2000) = 0 QEMU_ALIGN_UP(500, 2000) = 2000 Since ALIGN_UP() is a common macro name use the QEMU_* namespace prefix. Hopefully this will protect us from included headers that leak something with a similar name. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	block: add bdrv_co_is_allocated() interface	Stefan Hajnoczi	2	-13/+26
	This patch introduces the public bdrv_co_is_allocated() interface which can be used to query image allocation status while the VM is running. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	block: drop .bdrv_is_allocated() interface	Stefan Hajnoczi	2	-22/+18
	Now that all block drivers have been converted to .bdrv_co_is_allocated() we can drop .bdrv_is_allocated(). Note that the public bdrv_is_allocated() interface is still available but is in fact a synchronous wrapper around .bdrv_co_is_allocated(). Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	cow: convert to .bdrv_co_is_allocated()	Stefan Hajnoczi	1	-4/+4
	The cow block driver does not keep internal state for cluster lookups. This means it is safe to perform cluster lookups in coroutine context without risk of race conditions that corrupt internal state. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	vdi: convert to .bdrv_co_is_allocated()	Stefan Hajnoczi	1	-3/+3
	It is trivial to switch from the synchronous .bdrv_is_allocated() interface to .bdrv_co_is_allocated() since vdi_is_allocated() does not block. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	vvfat: convert to .bdrv_co_is_allocated()	Stefan Hajnoczi	1	-2/+2
	It is trivial to switch from the synchronous .bdrv_is_allocated() interface to .bdrv_co_is_allocated() since vvfat_is_allocated() does not block. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	block: convert qcow2, qcow2, and vmdk to .bdrv_co_is_allocated()	Stefan Hajnoczi	3	-11/+18
	The qcow2, qcow, and vmdk block drivers are based on coroutines. They have a coroutine mutex which protects internal state. We can convert the .bdrv_is_allocated() function to .bdrv_co_is_allocated() by holding the mutex around the cluster lookup operation. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	qed: convert to .bdrv_co_is_allocated()	Stefan Hajnoczi	1	-4/+11
	The bdrv_qed_is_allocated() function is a synchronous wrapper around qed_find_cluster(), which performs the cluster lookup. In order to convert the synchronous function to a coroutine function we yield instead of using qemu_aio_wait(). Note that QED's cache is already safe for parallel requests so no locking is needed. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	block: add .bdrv_co_is_allocated()	Stefan Hajnoczi	2	-0/+39
	This patch adds the .bdrv_co_is_allocated() interface which is identical to .bdrv_is_allocated() but runs in coroutine context. Running in coroutine context implies that other coroutines might be performing I/O at the same time. Therefore it must be safe to run while the following BlockDriver functions are in-flight: .bdrv_co_readv() .bdrv_co_writev() .bdrv_co_flush() .bdrv_co_is_allocated() The new .bdrv_co_is_allocated() interface is useful because it can be used when a VM is running, whereas .bdrv_is_allocated() is a synchronous interface that does not cope with parallel requests. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	block: use public bdrv_is_allocated() interface	Stefan Hajnoczi	1	-1/+1
	There is no need for bdrv_commit() to use the BlockDriver .bdrv_is_allocated() interface directly. Converting to the public interface gives us the freedom to drop .bdrv_is_allocated() entirely in favor of a new .bdrv_co_is_allocated() in the future. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	qcow2: Fix error path in qcow2_snapshot_load_tmp	Kevin Wolf	1	-12/+22
	If the bdrv_read() of the snapshot's L1 table fails, return the right error code and make sure that the old L1 table is still loaded and we don't break the BlockDriverState completely. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05	qcow2: Fix order in qcow2_snapshot_delete	Kevin Wolf	1	-15/+33
	First the snapshot must be deleted and only then the refcounts can be decreased. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05	qcow2: Fix order of refcount updates in qcow2_snapshot_goto	Kevin Wolf	2	-18/+50
	The refcount updates must be moved so that in the worst case we can get cluster leaks, but refcounts may never be too low. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05	qcow2: Return real error in qcow2_snapshot_goto	Kevin Wolf	1	-11/+40
	Besides fixing the return code, this adds some comments that make clear how the code works and that it potentially breaks images if we fail in the wrong place. Actually fixing this is left for the next patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05	qcow2: Rework qcow2_snapshot_create error handling	Kevin Wolf	1	-14/+41
	Increase refcounts only after allocating a new L1 table has succeeded in order to make leaks less likely. If writing the snapshot table fails, revert in-memory state to be consistent with that on disk. While at it, make it return the real error codes instead of -1. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05	qcow2: Cleanups and memleak fix in qcow2_snapshot_create	Kevin Wolf	1	-15/+11
	sn->id_str could be leaked before this. The rest of this patch changes comments, fixes coding style or removes checks that are unnecessary with g_malloc. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05	qcow2: Update snapshot table information at once	Kevin Wolf	1	-12/+10
	Failing in the middle wouldn't help with the integrity of the image, so doing everything in a single request seems better. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05	qcow2: Return real error code in qcow2_write_snapshots	Kevin Wolf	1	-10/+38
	Doesn't immediately fix anything as the callers don't use the return value, but they will be fixed next. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05	qcow2: Return real error code in qcow2_read_snapshots	Kevin Wolf	2	-7/+23
	Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05	block: Add coroutine_fn marker to coroutine functions	Dong Xu Wang	3	-8/+8
	Looks better when reviewing these source files. Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	hmp/qmp: add block_set_io_throttle	Zhi Yong Wu	9	-2/+175
	Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	block: add I/O throttling algorithm	Zhi Yong Wu	3	-0/+236
	Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	CoQueue: introduce qemu_co_queue_wait_insert_head	Zhi Yong Wu	2	-0/+14
	Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	block: add the blockio limits command line support	Zhi Yong Wu	6	-0/+141
	Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	block: Use bdrv functions to replace file operation in cow.c	Li Zhi Hui	1	-18/+16
	Since common file operation functions lack of error detection, so change them to bdrv series functions. Signed-off-by: Li Zhi Hui <zhihuili@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	xen_disk: remove dead code	Paolo Bonzini	1	-84/+2
	Xen_disk.c has support for using synchronous I/O instead of asynchronous, but it is compiled out by default. Remove it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	qed: adjust the way to get nb_sectors	Zhi Yong Wu	1	-3/+3
	This patch is only to refactor some lines of codes to get better and more robust codes. As you have seen, in qed_read_table_cb() it's nice to use qiov->size because that function doesn't obviously use a single struct iovec. In other two functions, if qiov use more than one struct iovec, the existing way will get wrong nb_sectors. To make the code more robust, it will be nicer to refactor the existing way as below. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Acked-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	qcow2: avoid reentrant bdrv_read() in copy_sectors()	Stefan Hajnoczi	1	-8/+19
	A BlockDriverState should not issue requests on itself through the public block layer interface. Nested, or reentrant, requests are problematic because they do I/O throttling and request tracking twice. Features like block layer copy-on-read use request tracking to avoid race conditions between concurrent requests. The reentrant request will have to "wait" for its parent request to complete. But the parent is waiting for the reentrant request to make progress so we have reached deadlock. The solution is for block drivers to avoid the public block layer interfaces for reentrant requests. Instead they should call their own internal functions if they wish to perform reentrant requests. This is also a good opportunity to make copy_sectors() a true coroutine_fn. That means calling bdrv_co_writev() instead of bdrv_write(). Behavior is unchanged but we're being explicit that this executes in coroutine context. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05	qcow2: Unlock during COW	Kevin Wolf	1	-69/+35
	Unlocking during COW allows for more parallelism. One change it requires is that buffers are dynamically allocated instead of just using a per-image buffer. While touching the code, drop the synchronous qcow2_read() function and replace it by a bdrv_read() call. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-01	Update version for 1.0 releasev1.0	Anthony Liguori	1	-1/+1
	Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-30	Makefile: use full path for qapi-generated directory	Michael Roth	1	-1/+1
	Generally $(BUILD_DIR) == $(CURDIR), but that isn't necessarilly the case, so use $(BUILD_DIR)/qapi-generated for generated files to avoid potentionally sticking generating files in odd places outside the build's include paths. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-30	qapi: fix guardname generation	Michael Roth	1	-3/+4
	Fix a bug in handling dotted paths, and exclude directory prefixes from generated guardnames to avoid odd/pseudo-random guardnames in generated headers. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-28	Update version for 1.0-rc4v1.0-rc4	Anthony Liguori	1	-1/+1
	Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-28	ccid: Fix buffer overrun in handling of VSC_ATR message	Markus Armbruster	1	-0/+1
	ATR size exceeding the limit is diagnosed, but then we merrily use it anyway, overrunning card->atr[]. The message is read from a character device. Obvious security implications unless the other end of the character device is trusted. Spotted by Coverity. CVE-2011-4111. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-28	Revert "fix out of tree build"	Anthony Liguori	1	-1/+1
	This reverts commit be85c90b74f56dca51782fa3080fcdf88593e045. This patch is incorrect and breaks the build with a freshly cloned git tree. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-28	configure: avoid screening of --{en, dis}able-usb-redir options	Max Filippov	1	-2/+8
	--dir) option pattern precede --{en,dis}able-usb-redir) patterns in the option analysis switch, making the latter options have no effect. There were some --dir that are supported by Autoconf and not by QEMU configure. The aim was to let QEMU packagers use the rpm (or similar) macro that overrides directories for their distribution. Replace --*dir with exact option names. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-28	cutils: Make strtosz & friends leave follow set to callers	Markus Armbruster	1	-44/+26
	strtosz() & friends require the size to be at the end of the string, or be followed by whitespace or ','. I find this surprising, because the name suggests it works like strtol(). The check simplifies callers that accept exactly that follow set slightly. No such callers exist. The check is redundant for callers that accept a smaller follow set, and thus need to check themselves anyway. Right now, this is the case for all but one caller. All of them neglected to check, or checked incorrectly, but the previous few commits fixed them up. Finally, the check is problematic for callers that accept a larger follow set. This is the case in monitor_parse_command(). Fortunately, the problems there are relatively harmless. monitor_parse_command() uses strtosz() for argument type 'o'. When the last argument is of type 'o', a trailing ',' is diagnosed differently than other trailing junk: (qemu) migrate_set_speed 1x invalid size (qemu) migrate_set_speed 1, migrate_set_speed: extraneous characters at the end of line A related inconsistency exists with non-last arguments. No such command exists, but let's use memsave to explore the inconsistency. The monitor permits, but does not require whitespace between arguments. For instance, "memsave (1-1)1024foo" is parsed as command memsave with three arguments 0, 1024 and "foo". Yes, this is daft, but at least it's consistently daft. If I change memsave's second argument from 'i' to 'o', then "memsave (1-1)1foo" is rejected, because the size is followed by an 'f'. But "memsave (1-1)1," is still accepted, and duly saves to file ",". We don't have any users of strtosz that profit from the check. In the users we have, it appears to encourage sloppy error checking, or gets in the way. Drop the bothersome check. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-28	qemu-img: Tighten parsing of size arguments	Markus Armbruster	1	-4/+6
	strtosz_suffix() fails unless the size is followed by 0, whitespace or ','. Useless here, because we need to fail for any junk following the size, even if it starts with whitespace or ','. Check manually. Things like "qemu-img create xxx 1024," and "qemu-img convert -S '1024 junk'" are now caught. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>