peter/qemu - QEMU hacking for Peter

Age	Commit message	Author	Files	Lines
2012-12-19	misc: move include files to include/qemu/	Paolo Bonzini	25	-42/+42
	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19	migration: move include files to include/migration/	Paolo Bonzini	6	-6/+6
	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19	block: move include files to include/block/	Paolo Bonzini	33	-43/+43
	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19	qapi: move include files to include/qobject/	Paolo Bonzini	2	-2/+2
	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19	janitor: do not include qemu-char everywhere	Paolo Bonzini	1	-1/+0
	Touching char/char.h basically causes the whole of QEMU to be rebuilt. Avoid this; it is usually unnecessary. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19	janitor: do not rely on indirect inclusions of or from qemu-char.h	Paolo Bonzini	2	-0/+2
	Various header files rely on qemu-char.h including qemu-config.h or main-loop.h, but they really do not need qemu-char.h at all (particularly interesting is the case of the block layer!). Clean this up, and also add missing inclusions of qemu-char.h itself. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19	build: move rules from Makefile to */Makefile.objs	Paolo Bonzini	1	-0/+2
	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-13	qcow2: Factor out handle_dependencies()	Kevin Wolf	1	-28/+42
	Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13	qcow2: Execute run_dependent_requests() without lock	Kevin Wolf	1	-20/+16
	There's no reason for run_dependent_requests() to hold s->lock, and a later patch will require that the lock is in fact not held. Also, before this patch, run_dependent_requests() not only does what its name suggests, but also removes the l2meta from the list of in-flight requests. Once that is changed, it becomes a one-liner, so just inline it completely. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13	qcow2: Enable dirty flag in qcow2_alloc_cluster_link_l2	Kevin Wolf	3	-7/+7
	This is closer to where the dirty flag is really needed, and it avoids having checks for special cases related to cluster allocation directly in the writev loop. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13	qcow2: Allocate l2meta only for cluster allocations	Kevin Wolf	3	-31/+31
	Even for writes to already allocated clusters, an l2meta is allocated, though it stays effectively unused. After this patch, only allocating requests still have one. Each l2meta now describes an in-flight request that writes to clusters that are not yet hooked up in the L2 table. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13	qcow2: Drop l2meta.cluster_offset	Kevin Wolf	3	-15/+14
	There's no real reason to have an l2meta for normal requests that don't allocate anything. Before we can get rid of it, we must return the host cluster offset in a different way. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13	qcow2: Allocate l2meta dynamically	Kevin Wolf	1	-11/+15
	As soon as delayed COW is introduced, the l2meta struct is needed even after completion of the request, so it can't live on the stack. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13	qcow2: Introduce Qcow2COWRegion	Kevin Wolf	2	-36/+76
	This makes it easier to address the areas for which a COW must be performed. As a nice side effect, the COW code in qcow2_alloc_cluster_link_l2 becomes really trivial. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
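A minimal C sketch of what a COW region describes, assuming 512-byte sectors and a guest write that covers sectors [n_start, n_end) of a freshly allocated run of clusters. The untouched head and tail become the two regions that must be filled from old data before the L2 table is updated. CowRegion and compute_cow_regions() are hypothetical names for illustration, not the qcow2 driver's actual Qcow2COWRegion code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a COW region: offset and length in sectors,
 * relative to the start of the first newly allocated cluster. */
typedef struct {
    uint64_t offset;
    int nb_sectors;
} CowRegion;

/* For an allocation of nb_clusters clusters whose guest write covers
 * sectors [n_start, n_end), compute the head region before the write and
 * the tail region after it; both must be copied on write. */
static void compute_cow_regions(int sectors_per_cluster, int nb_clusters,
                                int n_start, int n_end,
                                CowRegion *cow_start, CowRegion *cow_end)
{
    int total = sectors_per_cluster * nb_clusters;

    cow_start->offset = 0;          /* head starts at the cluster boundary */
    cow_start->nb_sectors = n_start;

    cow_end->offset = (uint64_t)n_end;  /* tail starts where the write ends */
    cow_end->nb_sectors = total - n_end;
}
```

With 64 KiB clusters (128 sectors), a two-cluster allocation written at sectors 8 to 199 yields a head of 8 sectors and a tail of 56 sectors.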
2012-12-13	qcow2: Round QCowL2Meta.offset down to cluster boundary	Kevin Wolf	2	-2/+24
	The offset within the cluster is already present as n_start and this is what the code uses. QCowL2Meta.offset is only needed at a cluster granularity. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
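Rounding down to a cluster boundary is a simple mask, assuming (as qcow2 requires) that the cluster size is a power of two. The offset within the cluster, which the message says is already carried as n_start, can still be recovered from the unrounded offset. Both helper names below are hypothetical.

```c
#include <stdint.h>

/* Round a byte offset down to the start of its cluster.
 * Assumes cluster_size is a power of two. */
static uint64_t round_down_to_cluster(uint64_t offset, uint64_t cluster_size)
{
    return offset & ~(cluster_size - 1);
}

/* Sector index within the cluster (512-byte sectors), i.e. the kind of
 * value n_start holds. */
static int sector_in_cluster(uint64_t offset, uint64_t cluster_size)
{
    return (int)((offset & (cluster_size - 1)) >> 9);
}
```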
2012-12-12	qcow2: Move BLKDBG_EVENT out of the lock	Kevin Wolf	1	-1/+1
	We want to use these events to suspend requests for testing concurrent AIO requests. Suspending requests while they are holding the CoMutex is rather boring for this purpose. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-12	blkdebug: Implement suspend/resume of AIO requests	Kevin Wolf	1	-3/+105
	This allows more systematic AIO testing. The patch adds three new operations to blkdebug: * Setting a "breakpoint" on a blkdebug event. The next request that triggers this breakpoint is suspended and is tagged with a name. The breakpoint is removed after a request has triggered it. * A suspended request (identified by its tag) can be resumed * It's possible to check whether a suspended request with a given tag exists. This can be used for waiting for an event. Ideally, we would instead tag requests right when they are created and set breakpoints for individual requests. However, at this point the block layer doesn't allow this easily, and breakpoints that trigger for any request already allow a lot of useful testing. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
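The three operations can be modelled with a small table of one-shot breakpoints and a set of suspended tags. This is a hypothetical C sketch of the semantics described above (fire once, suspend with a tag, resume or query by tag), not blkdebug's real data structures; the event name used in the usage lines is illustrative, and a real resume would re-enter the parked request rather than just free a slot.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_BREAKPOINTS 8
#define MAX_SUSPENDED   8
#define NAME_LEN        32

/* Hypothetical model; names are not taken from blkdebug.c. */
typedef struct {
    bool used;
    char event[NAME_LEN]; /* event the breakpoint is set on */
    char tag[NAME_LEN];   /* tag given to the request that hits it */
} Breakpoint;

typedef struct {
    Breakpoint bps[MAX_BREAKPOINTS];
    char suspended[MAX_SUSPENDED][NAME_LEN]; /* "" marks a free slot */
} DebugState;

/* Set a breakpoint on an event; returns false if the table is full. */
static bool set_breakpoint(DebugState *s, const char *event, const char *tag)
{
    for (int i = 0; i < MAX_BREAKPOINTS; i++) {
        if (!s->bps[i].used) {
            s->bps[i].used = true;
            snprintf(s->bps[i].event, NAME_LEN, "%s", event);
            snprintf(s->bps[i].tag, NAME_LEN, "%s", tag);
            return true;
        }
    }
    return false;
}

/* Called when a request triggers an event. Returns true if the request
 * is suspended; the breakpoint is removed once it has fired. */
static bool trigger_event(DebugState *s, const char *event)
{
    for (int i = 0; i < MAX_BREAKPOINTS; i++) {
        Breakpoint *bp = &s->bps[i];
        if (!bp->used || strcmp(bp->event, event) != 0) {
            continue;
        }
        for (int j = 0; j < MAX_SUSPENDED; j++) {
            if (s->suspended[j][0] == '\0') {
                snprintf(s->suspended[j], NAME_LEN, "%s", bp->tag);
                bp->used = false; /* one-shot breakpoint */
                return true;
            }
        }
        return false; /* no free slot: let the request run */
    }
    return false;
}

static int find_suspended(const DebugState *s, const char *tag)
{
    for (int j = 0; j < MAX_SUSPENDED; j++) {
        if (s->suspended[j][0] != '\0' && strcmp(s->suspended[j], tag) == 0) {
            return j;
        }
    }
    return -1;
}

/* Query: is a request with this tag currently suspended? */
static bool is_suspended(const DebugState *s, const char *tag)
{
    return find_suspended(s, tag) >= 0;
}

/* Resume by tag; here this only frees the slot. */
static bool resume_request(DebugState *s, const char *tag)
{
    int j = find_suspended(s, tag);
    if (j < 0) {
        return false;
    }
    s->suspended[j][0] = '\0';
    return true;
}
```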
2012-12-12	blkdebug: Factor out remove_rule()	Kevin Wolf	1	-2/+13
	The cleanup work to remove a rule depends on the type of the rule. This is easy for the existing rules, because none of them has type-specific data that must be cleaned up yet, but the next patch will change this. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-12	blkdebug: Allow usage without config file	Kevin Wolf	1	-0/+5
	As soon as new rules can be set during runtime, as introduced by the next patch, blkdebug makes sense even without a config file. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11	Fix error code checking for SetFilePointer() call	Fabien Chouteau	1	-3/+14
	An error has occurred if the return value is INVALID_SET_FILE_POINTER and GetLastError() doesn't return NO_ERROR. Signed-off-by: Fabien Chouteau <chouteau@adacore.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
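The reason for the two-part check is that INVALID_SET_FILE_POINTER (0xFFFFFFFF) is also a valid low-order DWORD of a 64-bit file position, so the return value alone cannot signal failure. A minimal sketch of the condition follows; the helper name is hypothetical, and the fallback definitions only exist so the logic compiles off Windows, where the real ones come from <windows.h>.

```c
#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins for testing off Windows; on Windows these come from
 * <windows.h>. */
typedef unsigned long DWORD;
#define INVALID_SET_FILE_POINTER ((DWORD)-1)
#define NO_ERROR 0L
#endif

#include <stdbool.h>

/* ret is SetFilePointer()'s return value, last_error the value of
 * GetLastError() read immediately afterwards. Only both together
 * indicate failure. */
static bool set_file_pointer_failed(DWORD ret, DWORD last_error)
{
    return ret == INVALID_SET_FILE_POINTER && last_error != NO_ERROR;
}
```

On Windows the call site would read GetLastError() right after SetFilePointer(), before any other API call can overwrite it.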
2012-12-11	rbd: Fix race between aio completion and aio cancel	Stefan Priebe	1	-8/+12
	This fixes a race between cancellation and I/O completion, one that QEMU also had in the iSCSI block driver. qemu_rbd_aio_cancel was not synchronously waiting for the end of the command. To achieve this, the patch introduces a new status flag which uses -EINPROGRESS. Signed-off-by: Stefan Priebe <s.priebe@profihost.ag> Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
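A sketch of the -EINPROGRESS pattern, under the assumption that cancel drives the event loop until the completion callback has stored a final status, so the request cannot complete after cancel has returned. AioRequest, the poll callback, and fake_poll() are hypothetical; in QEMU of this era the loop body would be an event-loop iteration such as qemu_aio_wait(), which appears elsewhere in this log.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request state: status stays -EINPROGRESS until the
 * completion path stores 0 or a negative errno. */
typedef struct {
    int status;
    bool cancelled;
} AioRequest;

static void request_start(AioRequest *req)
{
    req->status = -EINPROGRESS;
    req->cancelled = false;
}

/* Completion path: record the final result. */
static void request_complete(AioRequest *req, int ret)
{
    req->status = ret;
}

/* Synchronous cancel: keep polling until the command has ended. */
static void request_cancel(AioRequest *req, void (*poll_once)(AioRequest *))
{
    req->cancelled = true;
    while (req->status == -EINPROGRESS) {
        poll_once(req);
    }
}

/* Test double for one event-loop iteration: the pending I/O finishes
 * on the third poll. */
static int fake_polls;
static void fake_poll(AioRequest *req)
{
    if (++fake_polls == 3) {
        request_complete(req, 0);
    }
}
```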
2012-12-11	raw-posix: inline paio_ioctl into hdev_aio_ioctl	Paolo Bonzini	1	-17/+10
	clang now warns about an unused function: CC block/raw-posix.o block/raw-posix.c:707:26: warning: unused function 'paio_ioctl' [-Wunused-function] static BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int fd, ^ 1 warning generated. This happens because the only use of paio_ioctl() is inside a #if defined(__linux__) guard and it is static now. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-12-11	block: vpc support for ~2 TB disks	Charles Arnold	1	-4/+13
	The VHD specification allows for disk sizes of up to 2 TB. The current implementation in QEMU emulates EIDE and ATA-2 hardware, which only allows for up to 127 GB. This disk size limitation can be overridden by allowing up to 255 heads instead of the normal 4-bit limit of 16. Doing so allows disk images of up to nearly 2 TB to be created. This change does not violate the VHD format specification, nor does it change how smaller disks (i.e., <= 127 GB) are defined. [Charles Arnold also writes: "In analyzing a 160 GB VHD fixed disk image created on Windows 2008 R2, it appears that MS is also ignoring the CHS values in the footer geometry field in whatever driver they use for accessing the image. The CHS values are set at 65535,16,255 which obviously doesn't represent an image size of 160 GB." -- Stefan] Signed-off-by: Charles Arnold <carnold@suse.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
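The two size limits follow from CHS arithmetic, assuming 512-byte sectors, at most 65535 cylinders and 255 sectors per track: 65535 × 16 × 255 sectors is 136,899,993,600 bytes (about 127.5 GiB), while 65535 × 255 × 255 sectors is 2,181,843,648,000 bytes (just under 2 TiB). chs_capacity_bytes() is a hypothetical helper for checking these numbers, not the vpc driver's geometry calculation.

```c
#include <stdint.h>

/* Bytes addressable through a CHS triple with 512-byte sectors.
 * Hypothetical helper; uses 64-bit math to avoid overflow. */
static uint64_t chs_capacity_bytes(uint64_t cylinders, uint64_t heads,
                                   uint64_t sectors_per_track)
{
    return cylinders * heads * sectors_per_track * 512;
}
```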
2012-12-11	block: vpc initialize the uuid footer field	Charles Arnold	1	-1/+6
	Initialize the uuid field in the footer with a generated uuid. Signed-off-by: Charles Arnold <carnold@suse.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-12-11	aio: Get rid of qemu_aio_flush()	Kevin Wolf	3	-3/+3
	There are no remaining users, and new users should probably be using bdrv_drain_all() in the first place. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-11-28	iscsi: do not assume device is zero initialized	Peter Lieven	1	-0/+6
	Without any complex checks we can't assume that an iscsi target is initialized to zero. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-11-28	iscsi: fix deadlock during login	Peter Lieven	1	-181/+70
	If the connection is interrupted before the first login has completed successfully, qemu-kvm waits forever in qemu_aio_wait(). This is fixed by performing a synchronous login to the target. If the connection breaks after the first successful login, errors are handled internally by libiscsi. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-11-28	iscsi: fix segfault in url parsing	Peter Lieven	1	-2/+1
	If an invalid URL is specified iscsi_get_error(iscsi) is called with iscsi == NULL. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-11-21	use int64_t for return values from rbd instead of int	Stefan Priebe	1	-2/+2
	rbd/rados quite often returns the length of writes or discarded blocks, and these values may be larger than an int can hold. Steps to reproduce: run mkfs.xfs -f on a whole device whose size in bytes exceeds the range of an int; mkfs.xfs sends a discard. It is important to use scsi-hd and set discard_granularity=512; otherwise rbd disables discard support. Signed-off-by: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
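Why int is too narrow: a discard over just 4 GiB already needs 4,294,967,296 bytes, more than INT_MAX (2,147,483,647 on the usual 32-bit int). A minimal sketch, assuming 512-byte sectors; both helper names are hypothetical and not part of the rbd driver.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: byte length of a request spanning nb_sectors
 * 512-byte sectors, kept in int64_t end to end. */
static int64_t request_bytes(int64_t nb_sectors)
{
    return nb_sectors * 512;
}

/* True if a length could be stored in a plain int without overflow. */
static bool fits_in_int(int64_t len)
{
    return len >= INT_MIN && len <= INT_MAX;
}
```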
2012-11-21	vdi: don't override libuuid symbols	Stefan Hajnoczi	1	-6/+3
	It's poor symbol hygiene to provide global symbols that collide with a common library like libuuid. If QEMU links against a shared library that depends on uuid_generate(), it can end up calling our stub version of the function. This exact scenario happened with GlusterFS libgfapi.so, which depends on libglusterfs.so's uuid_generate(). Scope the uuid stubs to vdi.c only and avoid affecting other shared objects. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2012-11-21	block: add bdrv_reopen() support for raw hdev, floppy, and cdrom	Jeff Cody	1	-0/+16
	For hdev, floppy, and cdrom, the reopen() handlers are the same as for the file reopen handler. For floppy and cdrom types, however, we keep O_NONBLOCK, as in the _open function. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-11-14	vmdk: Fix data corruption bug in WRITE and READ handling	Gerhard Wiesinger	1	-2/+8
	Fixed a major bug in VMDK files at extent boundaries on reads, and also on writes, which might corrupt the image and its data! Triggered, for example, with the following VMDK file (partly listed): RW 4193792 FLAT "XP-W1-f001.vmdk" 0 RW 2097664 FLAT "XP-W1-f002.vmdk" 0 RW 4193792 FLAT "XP-W1-f003.vmdk" 0 RW 512 FLAT "XP-W1-f004.vmdk" 0 RW 4193792 FLAT "XP-W1-f005.vmdk" 0 RW 2097664 FLAT "XP-W1-f006.vmdk" 0 RW 4193792 FLAT "XP-W1-f007.vmdk" 0 RW 512 FLAT "XP-W1-f008.vmdk" 0 The patch includes: 1.) A fix for the wrong calculation at extent boundaries; in particular, the sector number is now made relative to the current extent. Verified correctness with: 1.) Converting both via VirtualBox to VDI and then with qemu-img, and with qemu-img only: VBoxManage clonehd --format vdi /VM/XP-W/new/XP-W1.vmdk ~/.VirtualBox/Harddisks/XP-W1-new-test.vdi ./qemu-img convert -O raw ~/.VirtualBox/Harddisks/XP-W1-new-test.vdi /root/QEMU/VM-XP-W1/XP-W1-via-VBOX.img md5sum /root/QEMU/VM-XP-W/XP-W1-direct.img md5sum /root/QEMU/VM-XP-W/XP-W1-via-VBOX.img => same MD5 hash 2.) Verified debug log files 3.) Ran Windows XP successfully 4.) Ran chkdsk successfully without any errors Signed-off-by: Gerhard Wiesinger <lists@wiesinger.com> Acked-by: Fam Zheng <famcool@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
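The core of the fix is mapping a disk-wide sector to an extent and then using the offset relative to that extent's start. A hypothetical C sketch of that mapping for consecutive FLAT extents, using the first four extent sizes (in sectors) from the descriptor in this message; find_extent() is an illustrative name, not the vmdk driver's function.

```c
#include <stddef.h>
#include <stdint.h>

/* Map a virtual-disk sector to the index of the extent holding it and
 * store the sector number relative to that extent's start. Returns -1
 * if the sector lies beyond the end of the disk. */
static int find_extent(const uint64_t *extent_sectors, size_t n,
                       uint64_t sector, uint64_t *sector_in_extent)
{
    uint64_t start = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t end = start + extent_sectors[i];
        if (sector < end) {
            *sector_in_extent = sector - start; /* relative, not absolute */
            return (int)i;
        }
        start = end;
    }
    return -1;
}
```

A request that crosses a boundary must also be split: at most extent_sectors[i] - *sector_in_extent sectors can be served from extent i before moving on to the next one.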
2012-11-14	aio: rename AIOPool to AIOCBInfo	Stefan Hajnoczi	10	-25/+25
	Now that AIOPool no longer keeps a freelist, it isn't really a "pool" anymore. Rename it to AIOCBInfo and make it const since it no longer needs to be modified. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-11-14	block: Workaround for older versions of MinGW gcc	Stefan Weil	1	-5/+5
	Versions before gcc-4.6 don't support unnamed fields in initializers (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676). Offset and OffsetHigh belong to an unnamed struct which is part of an unnamed union. Therefore the original code does not work with older versions of gcc. Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-11-14	qcow2: Fix refcount table size calculation	Kevin Wolf	1	-1/+2
	A missing factor for the refcount table entry size in the calculation could mean that too little memory was allocated for the in-memory representation of the table, resulting in a buffer overflow. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Tested-by: Michael Tokarev <mjt@tls.msk.ru>
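The size calculation in question amounts to converting a number of table entries into bytes before rounding up to clusters. A minimal sketch, assuming 8-byte refcount table entries; table_clusters() is a hypothetical helper. Dropping the entry_size factor makes a 65536-entry table look like it fits in one 64 KiB cluster when it really needs eight.

```c
#include <stdint.h>

/* Clusters needed for a table of nb_entries entries of entry_size bytes
 * each (8 for qcow2 refcount table entries). Hypothetical helper. */
static uint64_t table_clusters(uint64_t nb_entries, uint64_t entry_size,
                               uint64_t cluster_size)
{
    uint64_t bytes = nb_entries * entry_size;          /* the easy-to-miss factor */
    return (bytes + cluster_size - 1) / cluster_size;  /* round up */
}
```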
2012-11-12	nbd: accept URIs	Paolo Bonzini	1	-1/+97
	The URI syntax is consistent with the Gluster syntax. Export names are specified in the path, preceded by one or more (otherwise unused) slashes. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
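A simplified, hypothetical parser for the TCP form nbd://host[:port]/export, showing the rule that the export name is the path with its leading slashes removed. It assumes 10809 (the IANA-registered NBD port) as the default and deliberately ignores IPv6 literals, query strings, and other transports; QEMU itself relies on a general URI parser rather than code like this.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parse result. */
typedef struct {
    char host[256];
    int port;
    char export_name[256];
} NbdTarget;

static bool parse_nbd_uri(const char *uri, NbdTarget *t)
{
    const char *prefix = "nbd://";
    size_t plen = strlen(prefix);
    if (strncmp(uri, prefix, plen) != 0) {
        return false;
    }

    const char *p = uri + plen;
    const char *slash = strchr(p, '/');
    const char *host_end = slash ? slash : p + strlen(p);
    const char *colon = memchr(p, ':', (size_t)(host_end - p));

    size_t hlen = (size_t)((colon ? colon : host_end) - p);
    if (hlen == 0 || hlen >= sizeof t->host) {
        return false;
    }
    memcpy(t->host, p, hlen);
    t->host[hlen] = '\0';

    t->port = 10809; /* assumed default NBD port */
    if (colon) {
        char *end;
        long port = strtol(colon + 1, &end, 10);
        if (end != host_end || port <= 0 || port > 65535) {
            return false;
        }
        t->port = (int)port;
    }

    /* Export name: the path with one or more leading slashes stripped. */
    const char *exp = slash ? slash : host_end;
    while (*exp == '/') {
        exp++;
    }
    if (strlen(exp) >= sizeof t->export_name) {
        return false;
    }
    strcpy(t->export_name, exp);
    return true;
}
```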
2012-11-12	nbd: accept relative path to Unix socket	Paolo Bonzini	1	-10/+7
	Adding the "is_unix" member now will simplify the parsing of NBD URIs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-31	Merge remote-tracking branch 'origin/master' into threadpool	Paolo Bonzini	4	-12/+326
	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-31	raw-win32: implement native asynchronous I/O	Paolo Bonzini	4	-6/+274
	With the new support for EventNotifiers in the AIO event loop, we can hook a completion port to every opened file and use asynchronous I/O on them. Wine's support is extremely inefficient, partly because it actually performs the I/O synchronously on regular files (!), but it works, and it is good to keep the Win32 and POSIX ports as similar as possible. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-31	raw-posix: move linux-aio.c to block/	Paolo Bonzini	2	-0/+217
	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-31	raw-win32: add emulated AIO support	Paolo Bonzini	1	-49/+138
	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-31	raw-posix: rename raw-posix-aio.h, hide unavailable prototypes	Paolo Bonzini	2	-5/+7
	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-31	raw: merge posix-aio-compat.c into block/raw-posix.c	Paolo Bonzini	2	-8/+294
	Making the qemu_paiocb specific to raw devices will let us access members of the BDRVRawState arbitrarily. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-31	block: switch posix-aio-compat to threadpool	Paolo Bonzini	2	-11/+2
	This is not meant for portability, but to remove code duplication. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-30	aio: add Win32 implementation	Paolo Bonzini	1	-1/+5
	The Win32 implementation will only accept EventNotifiers, so a few drivers are disabled under Windows. EventNotifiers are a good match for the GSource implementation, too, because the Win32 port of glib allows placing their HANDLEs in a GPollFD. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-24	mirror: add support for on-source-error/on-target-error	Paolo Bonzini	1	-21/+73
	Error management is important for mirroring; otherwise, an error on the target (even something as "innocent" as ENOSPC) requires starting over with a full copy. Similar to on_read_error/on_write_error, two separate knobs are provided: on_source_error (reads) and on_target_error (writes). The default is 'report' for both. The 'ignore' policy will leave the sector dirty, so that it will be retried later; thus, it will not cause corruption. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
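A hypothetical sketch of how the two knobs could select a policy: source errors use the read-side setting, target errors the write-side one. Only 'report' and 'ignore' are modelled; 'report' is simplified here to failing the job, and 'ignore' re-marks the chunk dirty so a later pass copies it. The enum and function names are illustrative, not the mirror job's code.

```c
#include <stdbool.h>

/* Hypothetical policies and outcomes. */
typedef enum { POLICY_REPORT, POLICY_IGNORE } ErrorPolicy;
typedef enum { ACTION_FAIL_JOB, ACTION_RETRY_LATER } ErrorAction;

/* is_read selects on_source_error (read failed) or on_target_error
 * (write failed). With 'ignore', nothing is skipped: the chunk stays
 * dirty, so the target never silently keeps stale data. */
static ErrorAction mirror_handle_error(bool is_read,
                                       ErrorPolicy on_source_error,
                                       ErrorPolicy on_target_error,
                                       bool *dirty_bit)
{
    ErrorPolicy p = is_read ? on_source_error : on_target_error;
    if (p == POLICY_IGNORE) {
        *dirty_bit = true;
        return ACTION_RETRY_LATER;
    }
    return ACTION_FAIL_JOB;
}
```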
2012-10-24	mirror: implement completion	Paolo Bonzini	1	-5/+40
	Switching to the target of the migration is done mostly asynchronously, and reported to management via the BLOCK_JOB_COMPLETED event; the only synchronous phase is opening the backing files. bdrv_open_backing_file can always be done, even for migration of the full image (aka sync: 'full'). In this case, qmp_drive_mirror will create the target disk with no backing file at all, and bdrv_open_backing_file will be a no-op. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24	mirror: introduce mirror job	Paolo Bonzini	2	-0/+236
	This patch adds the implementation of a new job that mirrors a disk to a new image while letting the guest continue using the old image. The target is treated as a "black box" and data is copied from the source to the target in the background. This can be used for several purposes, including storage migration, continuous replication, and observation of the guest I/O in an external program. It is also a first step in replacing the inefficient block migration code that is part of QEMU. The job is possibly never-ending, but it is logically structured into two phases: 1) copy all data as fast as possible until the target first gets in sync with the source; 2) keep target in sync and ensure that reopening to the target gets a correct (full) copy of the source data. The second phase is indicated by the progress in "info block-jobs" reporting the current offset to be equal to the length of the file. When the job is cancelled in the second phase, QEMU will run the job until the source is clean and quiescent, then it will report successful completion of the job. In other words, the BLOCK_JOB_CANCELLED event means that the target may _not_ be consistent with a past state of the source; the BLOCK_JOB_COMPLETED event means that the target is consistent with a past state of the source. (Note that it could already happen that management lost the race against QEMU and got a completion event instead of cancellation). It is not yet possible to complete the job and switch over to the target disk. The next patches will fix this and add many refinements to the basic idea introduced here. These include improved error management, some tunable knobs and performance optimizations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
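The two phases can be illustrated with a toy model in which a dirty bitmap records which source chunks still differ from the target: phase 1 starts with every chunk dirty and copies until none remain, and phase 2 keeps copying whatever guest writes re-dirty. This is a hypothetical C sketch of the idea, with made-up names and in-memory arrays in place of real images; it is not the mirror job's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define NB_CHUNKS 8

/* Toy source/target images plus a dirty bitmap. */
typedef struct {
    int source[NB_CHUNKS];
    int target[NB_CHUNKS];
    bool dirty[NB_CHUNKS];
} Mirror;

/* Phase 1 begins with every chunk considered dirty. */
static void mirror_start(Mirror *m)
{
    for (size_t i = 0; i < NB_CHUNKS; i++) {
        m->dirty[i] = true;
    }
}

/* A guest write to the source marks the chunk dirty again. */
static void guest_write(Mirror *m, size_t chunk, int value)
{
    m->source[chunk] = value;
    m->dirty[chunk] = true;
}

/* One pass of the background copy; returns the number of chunks copied. */
static size_t mirror_iterate(Mirror *m)
{
    size_t copied = 0;
    for (size_t i = 0; i < NB_CHUNKS; i++) {
        if (m->dirty[i]) {
            m->target[i] = m->source[i];
            m->dirty[i] = false;
            copied++;
        }
    }
    return copied;
}

/* In sync once nothing is dirty; phase 2 keeps it that way. */
static bool mirror_in_sync(const Mirror *m)
{
    for (size_t i = 0; i < NB_CHUNKS; i++) {
        if (m->dirty[i]) {
            return false;
        }
    }
    return true;
}
```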
2012-10-24	block: rename block_job_complete to block_job_completed	Paolo Bonzini	2	-3/+3
	The imperative will be used for the QMP command. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24	block: in commit, determine base image from the top image	Jeff Cody	1	-9/+0
	This simplifies some code and error checking, and also fixes a bug. bdrv_find_backing_image() should only be passed absolute filenames, or filenames relative to the chain. In the QMP message handler for block commit, when looking up the base do so from the determined top image, so we know it is reachable from top. Some of the error messages put out by block-commit have changed slightly, which causes two test cases for block-commit to fail. This patch updates the test cases to look for the correct error output. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
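Searching for the base starting from the top image guarantees that any match lies in the top image's backing chain. A minimal sketch with a hypothetical Image node and find_base_below() helper; it compares filenames literally and does not model bdrv_find_backing_image()'s handling of relative names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical image node: each image points at its backing file. */
typedef struct Image {
    const char *filename;
    struct Image *backing;
} Image;

/* Look for base strictly below top; NULL if it is not reachable. */
static Image *find_base_below(Image *top, const char *base)
{
    for (Image *img = top ? top->backing : NULL; img; img = img->backing) {
        if (strcmp(img->filename, base) == 0) {
            return img;
        }
    }
    return NULL;
}
```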