peter/qemu - QEMU hacking for Peter

Age	Commit message (Collapse)	Author	Files	Lines
2013-04-02	oslib-posix: rename socket_set_nonblock() to qemu_set_nonblock()	Stefan Hajnoczi	2	-2/+2
	The fcntl(fd, F_SETFL, O_NONBLOCK) flag is not specific to sockets. Rename to qemu_set_nonblock() just like qemu_set_cloexec(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2013-03-28	qcow2: Gather clusters in a looping loop	Kevin Wolf	1	-31/+43
	Instead of just checking once in exactly this order if there are dependendies, non-COW clusters and new allocation, this starts looping around these. This way we can, for example, gather non-COW clusters after new allocations as long as the host cluster offsets stay contiguous. Once handle_dependencies() is extended so that COW areas of in-flight allocations can be overwritten, this allows to continue with gathering other clusters (we wouldn't be able to do that without this change because we would have missed a possible second dependency in one of the next clusters). This means that in the typical sequential write case, we can combine the COW overwrite of one cluster with the allocation of the next cluster as soon as something like Delayed COW gets actually implemented. It is only by avoiding splitting requests this way that Delayed COW actually starts improving performance noticably. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: Move cluster gathering to a non-looping loop	Kevin Wolf	1	-64/+70
	This patch is mainly to separate the indentation change from the semantic changes. All that really changes here is that everything moves into a while loop, all 'goto done' become 'break' and at the end of the loop a new 'break is inserted. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: Allow requests with multiple l2metas	Kevin Wolf	3	-3/+17
	Instead of expecting a single l2meta, have a list of them. This allows to still have a single I/O request for the guest data, even though multiple l2meta may be needed in order to describe both a COW overwrite and a new cluster allocation (typical sequential write case). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: Use byte granularity in qcow2_alloc_cluster_offset()	Kevin Wolf	1	-56/+28
	This gets rid of the nb_clusters and keep_clusters and the associated complicated calculations. Just advance the number of bytes that have been processed and everything is fine. This patch advances the variables even after the last operation even though they aren't used any more afterwards to make things look more uniform. A later patch will turn the whole thing into a loop and then it actually starts making sense. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: Prepare handle_alloc/copied() for byte granularity	Kevin Wolf	1	-9/+16
	This makes handle_alloc() and handle_copied() return byte-granularity host offsets instead of returning always the cluster start. This is required so that qcow2_alloc_cluster_offset() can stop aligning everything to cluster boundaries. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: handle_copied(): Implement non-zero host_offset	Kevin Wolf	1	-8/+20
	Look only for clusters that start at a given physical offset. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: handle_copied(): Get rid of keep_clusters parameter	Kevin Wolf	1	-10/+13
	Now *bytes is used to return the length of the area that can be written to without performing an allocation or COW. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: handle_copied(): Get rid of nb_clusters parameter	Kevin Wolf	1	-6/+18
	handle_copied() uses its bytes parameter now to determine how many clusters it should try to find. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: Factor out handle_copied()	Kevin Wolf	1	-40/+94
	Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: Clean up handle_alloc()	Kevin Wolf	1	-57/+53
	Things can be simplified a bit now. No semantic changes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: Finalise interface of handle_alloc()	Kevin Wolf	2	-13/+21
	The interface works completely on a byte granularity now and duplicated parameters are removed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: handle_alloc(): Get rid of keep_clusters parameter	Kevin Wolf	2	-17/+32
	handle_alloc() is now called with the offset at which the actual new allocation starts instead of the offset at which the whole write request starts, part of which may already be processed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: handle_alloc(): Get rid of nb_clusters parameter	Kevin Wolf	1	-4/+15
	We already communicate the same information in *bytes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: Factor out handle_alloc()	Kevin Wolf	1	-89/+151
	Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: Decouple cluster allocation from cluster reuse code	Kevin Wolf	1	-15/+20
	This moves some code that prepares the allocation of new clusters to where the actual allocation happens. This is the minimum required to be able to move it to a separate function in the next patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: Change handle_dependency to byte granularity	Kevin Wolf	2	-12/+39
	This is a more precise description of what really constitutes a dependency. The behaviour doesn't change at this point because the COW area of the old request is still aligned to cluster boundaries and therefore an overlap is detected wheneven the requests touch any part of the same cluster. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: Improve check for overlapping allocations	Kevin Wolf	1	-1/+1
	The old code detected an overlapping allocation even when the allocations didn't actually overlap, but were only adjacent. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: Handle dependencies earlier	Kevin Wolf	2	-16/+48
	Handling overlapping allocations isn't just a detail of cluster allocation. It is rather one of three ways to get the host cluster offset for a write request: 1. If a request overlaps an in-flight allocations, the cluster offset can be taken from there (this is what handle_dependencies will evolve into) or the request must just wait until the allocation has completed. Accessing the L2 is not valid in this case, it has outdated information. 2. Outside overlapping areas, check the clusters that can be written to as they are, with no COW involved. 3. If a COW is required, allocate new clusters Changing the code to reflect this doesn't change the behaviour because overlaps cannot exist for clusters that are kept in step 2. It does however make it easier for later patches to work on clusters that belong to an allocation that is still in flight. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: Remove bogus unlock of s->lock	Kevin Wolf	1	-2/+0
	The unlock wakes up the next coroutine, but the currently running coroutine will lock it again before it yields, so this doesn't make a lot of sense. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28	qcow2: Fix "total clusters" number in bdrv_check	Kevin Wolf	1	-1/+3
	This should be based on the virtual disk size, not on the size of the image. Interesting observation: With some VM state stored in the image file, percentages higher than 100% are possible, even though snapshots themselves are ignored. This is a qcow2 bug to be fixed another day: The VM state should be discarded in the active L2 tables after completing the snapshot creation. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-25	block: Add options QDict to bdrv_file_open() prototypes (fix MinGW build)	Stefan Weil	1	-2/+4
	The new parameter is unused yet. This part was missing in commit 787e4a8500020695eb391e2f1cc4767ee071d441. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-25	rbd: fix compile error	Liu Yuan	1	-1/+2
	Commit 787e4a85 [block: Add options QDict to bdrv_file_open() prototypes] didn't update rbd.c accordingly. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-22	nbd: Check against invalid option combinations	Kevin Wolf	1	-0/+14
	A file name may only specified if no host or socket path is specified. The latter two may not appear at the same time either. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22	nbd: Use default port if only host is specified	Kevin Wolf	1	-9/+10
	The URL method already takes care to apply the default port when none is specfied. Directly specifying driver-specific options required the port number until now. Allow leaving it out and apply the default. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22	block: Make find_image_format safe with NULL filename	Kevin Wolf	1	-3/+10
	In order to achieve this, the .bdrv_probe callbacks of all drivers must cope with this. The DMG driver is the only one that bases its decision on the filename and it needs to be changed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22	block: Introduce .bdrv_parse_filename callback	Kevin Wolf	1	-16/+13
	If a driver needs structured data and not just a string, it can provide a .bdrv_parse_filename callback now that parses the command line string into separate options. Keeping this separate from .bdrv_open_filename ensures that the preferred way of directly specifying the options always works as well if parsing the string works. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22	nbd: Accept -drive options for the network connection	Kevin Wolf	1	-52/+77
	The existing parsers for the file name now parse everything into the bdrv_open() options QDict. Instead of using these parsers, you can now directly specify the options on the command line, like this: qemu-system-x86_64 -drive file=nbd:,file.port=1234,file.host=::1 Clearly the file=... part could use further improvement, but it's a start. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22	nbd: Keep hostname and port separate	Kevin Wolf	1	-9/+40
	The NBD block supports an URL syntax, for which a URL parser returns separate hostname and port fields. It also supports the traditional qemu syntax encoded in a filename. Until now, after parsing the URL to get each piece of information, a new string is built to be fed to socket functions. Instead of building a string in the URL case that is immediately parsed again, parse the string in both cases and use the QemuOpts interface to qemu-sockets.c. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22	block: Add options QDict to bdrv_file_open() prototypes	Kevin Wolf	14	-23/+35
	The new parameter is unused yet. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-19	qcow2: Fix segfault in qcow2_invalidate_cache	Kevin Wolf	2	-2/+13
	Need to pass an options QDict to qcow2_open() now. This fixes a segfault on the migration target with qcow2. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-19	sheepdog: show error message for halt status	Liu Yuan	1	-0/+2
	Sheepdog (neither quorum nor unsafe mode) will refuse to serve IO requests when number of alive nodes is less than that of copies specified by users. This will return 0x19 to QEMU client which currently doesn't recognize it. This patch adds an error description when QEMU client receives it, other than plainly printing 'Invalid error code' Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15	threadpool: drop global thread pool	Stefan Hajnoczi	2	-3/+9
	Now that each AioContext has a ThreadPool and the main loop AioContext can be fetched with bdrv_get_aio_context(), we can eliminate the concept of a global thread pool from thread-pool.c. The submit functions must take a ThreadPool* argument. block/raw-posix.c and block/raw-win32.c use aio_get_thread_pool(bdrv_get_aio_context(bs)) to fetch the main loop's ThreadPool. tests/test-thread-pool.c must be updated to reflect the new thread_pool_submit() function prototypes. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2013-03-15	sheepdog: set io_flush handler in do_co_req	MORITA Kazutaka	1	-2/+11
	If an io_flush handler is not set, qemu_aio_wait doesn't invoke callbacks. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15	sheepdog: use non-blocking fd in coroutine context	MORITA Kazutaka	1	-4/+2
	Using a blocking socket in the coroutine context reduces the chance of switching to other work. This patch makes the sheepdog driver use a non-blocking fd always. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15	qcow2: make is_allocated return true for zero clusters	Paolo Bonzini	2	-5/+4
	Otherwise, live migration of the top layer will miss zero clusters and let the backing file show through. This also matches what is done in qed. QCOW2_CLUSTER_ZERO clusters are invalid in v2 image files. Check this directly in qcow2_get_cluster_offset instead of replicating the test everywhere. Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15	qcow2: drop unnecessary flush in qcow2_update_snapshot_refcount()	Stefan Hajnoczi	1	-4/+0
	We already flush when the function completes. There is no need to flush after every compressed cluster. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15	qcow2: drop flush in update_cluster_refcount()	Stefan Hajnoczi	1	-2/+0
	The update_cluster_refcount() function increments/decrements a cluster's refcount and then returns the new refcount value. There is no need to flush since both update_cluster_refcount() callers already take care of this: 1. qcow2_alloc_bytes() calls update_cluster_refcount() when compressed sectors will be appended to an existing cluster with enough free space. qcow2_alloc_bytes() already flushes so there is no need to do so in update_cluster_refcount(). 2. qcow2_update_snapshot_refcount() sets a cache dependency on refcounts if it needs to update L2 entries. It also flushes before completing. Removing this flush significantly speeds up qcow2 snapshot creation: $ qemu-img create -f qcow2 test.qcow2 -o size=50G,preallocation=metadata $ time qemu-img snapshot -c new test.qcow2 Time drops from more than 3 minutes to under 1 second. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15	qcow2: flush in qcow2_update_snapshot_refcount()	Stefan Hajnoczi	2	-6/+1
	Users of qcow2_update_snapshot_refcount() do not flush consistently. qcow2_snapshot_create() flushes but qcow2_snapshot_goto() and qcow2_snapshot_delete() do not. Solve this by moving the bdrv_flush() into qcow2_update_snapshot_refcount(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15	qcow2: set L2 cache dependency in qcow2_alloc_bytes()	Stefan Hajnoczi	1	-1/+5
	Compressed writes use qcow2_alloc_bytes() to allocate space with byte granularity. The affected clusters' refcounts will be incremented but we do not need to flush yet. Set a L2 cache dependency on the refcount block cache, so that the refcounts get written out before the L2 updates. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15	qcow2: flush refcount cache correctly in qcow2_write_snapshots()	Stefan Hajnoczi	1	-1/+4
	Since qcow2 metadata is cached we need to flush the caches, not just the underlying file. Use bdrv_flush(bs) instead of bdrv_flush(bs->file). Also add the error return path when bdrv_flush() fails and move the flush after checking for qcow2_alloc_clusters() failure so that the qcow2_alloc_clusters() error return value takes precedence. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15	qcow2: flush refcount cache correctly in alloc_refcount_block()	Stefan Hajnoczi	1	-2/+8
	update_refcount() affects the refcount cache, it does not write to disk. Therefore bdrv_flush(bs->file) does nothing. We need to flush the refcount cache in order to write out the refcount updates! While we're here also add error returns when qcow2_cache_flush() fails. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15	qcow2: Allow lazy refcounts to be enabled on the command line	Kevin Wolf	3	-1/+39
	qcow2 images now accept a boolean lazy_refcounts options. Use it like this: -drive file=test.qcow2,lazy_refcounts=on If the option is specified on the command line, it overrides the default specified by the qcow2 header flags that were set when creating the image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15	block: Add options QDict to bdrv_open() prototype	Kevin Wolf	4	-4/+4
	It doesn't do anything yet except storing the options QDict in the BlockDriverState. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15	block: Add options QDict to .bdrv_open()	Kevin Wolf	12	-14/+14
	Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-05	iscsi: add iscsi_truncate support	Peter Lieven	1	-47/+88
	this patch adds iscsi_truncate which effectively allows for online resizing of iscsi volumes. for this to work you have to resize the volume on your storage and then call block_resize command in qemu which will issue a readcapacity16 to update the capacity. v4: - factor out complete readcapacity logic into a separate function - handle capacity change check condition in readcapacity function (this happens if the block_resize cmd is the first iscsi task executed after a resize on the storage) v3: - remove switch statement in iscsi_open - create separate patch for brdv_drain_all() in bdrv_truncate() v2: - add a general bdrv_drain_all() before bdrv_truncate() to avoid in-flight AIOs while the device is truncated - since no AIOs are in flight we can use a sync libiscsi call to re-read the capacity - factor out the readcapacity16 logic as it is redundant to iscsi_open() and iscsi_truncate(). Signed-off-by: Peter Lieven <pl@kamp.de> [allow any type of unit attention check condition in iscsi_readcapacity_sync(), as in Message-ID: <51263A2A.6070304@dlhnet.de> - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-03-05	iscsi: retry read, write, flush and unmap on unit attention check conditions	Peter Lieven	1	-88/+203
	the storage might return a check condition status for various reasons. (e.g. bus reset, capacity change, thin-provisioning info etc.) currently all these informative status responses lead to an I/O error which is populated to the guest. this patch introduces a retry mechanism to avoid this. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-03-04	sheepdog: add support for connecting to unix domain socket	MORITA Kazutaka	1	-12/+70
	This patch adds support for a unix domain socket for a connection between qemu and local sheepdog server. You can use the unix domain socket with the following syntax: $ qemu sheepdog+unix:///<vdiname>?socket=<socket path>[#snapid] Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-04	sheepdog: use inet_connect to simplify connect code	MORITA Kazutaka	1	-81/+30
	This uses the form "<host>:<port>" for the representation of the sheepdog server to use inet_connect. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-04	sheepdog: accept URIs	MORITA Kazutaka	1	-34/+105
	The URI syntax is consistent with the NBD and Gluster syntax. The syntax is sheepdog[+tcp]://[host:port]/vdiname[#snapid\|#tag] Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>