summaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)AuthorFilesLines
2013-04-15rbd: add an asynchronous flushJosh Durgin1-4/+33
The existing bdrv_co_flush_to_disk implementation uses rbd_flush(), which is sychronous and causes the main qemu thread to block until it is complete. This results in unresponsiveness and extra latency for the guest. Fix this by using an asynchronous version of flush. This was added to librbd with a special #define to indicate its presence, since it will be backported to stable versions. Thus, there is no need to check the version of librbd. Implement this as bdrv_aio_flush, since it matches other aio functions in the rbd block driver, and leave out bdrv_co_flush_to_disk when the asynchronous version is available. Reported-by: Oliver Francke <oliver@filoo.de> Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-04-15block: ssh: Use libssh2_sftp_fsync (if supported by libssh2) to flush to disk.Richard W.M. Jones1-5/+73
libssh2_sftp_fsync is an extension to libssh2 to support fsync(2) over sftp, which is itself an extension of OpenSSH. If both libssh2 and the ssh daemon support it, this will allow bdrv_flush_to_disk to commit changes through to disk on the remote server. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-04-15block: Add support for Secure Shell (ssh) block device.Richard W.M. Jones2-0/+996
qemu-system-x86_64 -drive file=ssh://hostname/some/image QEMU will ssh into 'hostname' and open '/some/image' which is made available as a standard block device. You can specify a username (ssh://user@host/...) and/or a port number (ssh://host:port/...). You can also use an alternate syntax using properties (file.user, file.host, file.port, file.path). Current limitations: - Authentication must be done without passwords or passphrases, using ssh-agent. Other authentication methods are not supported. - Uses a single connection, instead of concurrent AIO with multiple SSH connections. This is implemented using libssh2 on the client side. The server just requires a regular ssh daemon with sftp-server support. Most ssh daemons on Unix/Linux systems will work out of the box. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-04-15block: Introduce bdrv_pwritev() for qcow2_save_vmstateKevin Wolf1-7/+1
Directly pass the QEMUIOVector on instead of linearising it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-04-15block: Introduce bdrv_writev_vmstateKevin Wolf2-6/+19
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-04-13aes: move aes.h from include/block to include/qemuAurelien Jarno3-3/+3
Move aes.h from include/block to include/qemu to show it can be reused by other subsystems. Cc: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@gmail.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-04-08hw: move headers to include/Paolo Bonzini1-2/+2
Many of these should be cleaned up with proper qdev-/QOM-ification. Right now there are many catch-all headers in include/hw/ARCH depending on cpu.h, and this makes it necessary to compile these files per-target. However, fixing this does not belong in these patches. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-04-05qcow2: Fix L1 write error handling in qcow2_update_snapshot_refcountKevin Wolf1-6/+8
It ignored the error code, and at least the 'goto fail' is obvious nonsense as it creates an endless loop (if the next attempt doesn't magically succeed) and leaves the in-memory L1 table in big-endian instead of converting it back. In error cases, there's no point in writing an updated L1 table, so skip this part for them. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-04-05qcow2: Return real error in qcow2_update_snapshot_refcountKevin Wolf1-6/+5
This fixes the error message triggered by the following script: cat > /tmp/blkdebug.cfg <<EOF [inject-error] event = "cluster_free" errno = "28" immediately = "off" EOF $qemu_img create -f qcow2 test.qcow2 10G $qemu_img snapshot -c snap test.qcow2 $qemu_img snapshot -d snap blkdebug:/tmp/blkdebug.cfg:test.qcow2 Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-04-02oslib-posix: rename socket_set_nonblock() to qemu_set_nonblock()Stefan Hajnoczi2-2/+2
The fcntl(fd, F_SETFL, O_NONBLOCK) flag is not specific to sockets. Rename to qemu_set_nonblock() just like qemu_set_cloexec(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2013-03-28qcow2: Gather clusters in a looping loopKevin Wolf1-31/+43
Instead of just checking once in exactly this order if there are dependendies, non-COW clusters and new allocation, this starts looping around these. This way we can, for example, gather non-COW clusters after new allocations as long as the host cluster offsets stay contiguous. Once handle_dependencies() is extended so that COW areas of in-flight allocations can be overwritten, this allows to continue with gathering other clusters (we wouldn't be able to do that without this change because we would have missed a possible second dependency in one of the next clusters). This means that in the typical sequential write case, we can combine the COW overwrite of one cluster with the allocation of the next cluster as soon as something like Delayed COW gets actually implemented. It is only by avoiding splitting requests this way that Delayed COW actually starts improving performance noticably. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: Move cluster gathering to a non-looping loopKevin Wolf1-64/+70
This patch is mainly to separate the indentation change from the semantic changes. All that really changes here is that everything moves into a while loop, all 'goto done' become 'break' and at the end of the loop a new 'break is inserted. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: Allow requests with multiple l2metasKevin Wolf3-3/+17
Instead of expecting a single l2meta, have a list of them. This allows to still have a single I/O request for the guest data, even though multiple l2meta may be needed in order to describe both a COW overwrite and a new cluster allocation (typical sequential write case). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: Use byte granularity in qcow2_alloc_cluster_offset()Kevin Wolf1-56/+28
This gets rid of the nb_clusters and keep_clusters and the associated complicated calculations. Just advance the number of bytes that have been processed and everything is fine. This patch advances the variables even after the last operation even though they aren't used any more afterwards to make things look more uniform. A later patch will turn the whole thing into a loop and then it actually starts making sense. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: Prepare handle_alloc/copied() for byte granularityKevin Wolf1-9/+16
This makes handle_alloc() and handle_copied() return byte-granularity host offsets instead of returning always the cluster start. This is required so that qcow2_alloc_cluster_offset() can stop aligning everything to cluster boundaries. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: handle_copied(): Implement non-zero host_offsetKevin Wolf1-8/+20
Look only for clusters that start at a given physical offset. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: handle_copied(): Get rid of keep_clusters parameterKevin Wolf1-10/+13
Now *bytes is used to return the length of the area that can be written to without performing an allocation or COW. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: handle_copied(): Get rid of nb_clusters parameterKevin Wolf1-6/+18
handle_copied() uses its bytes parameter now to determine how many clusters it should try to find. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: Factor out handle_copied()Kevin Wolf1-40/+94
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: Clean up handle_alloc()Kevin Wolf1-57/+53
Things can be simplified a bit now. No semantic changes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: Finalise interface of handle_alloc()Kevin Wolf2-13/+21
The interface works completely on a byte granularity now and duplicated parameters are removed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: handle_alloc(): Get rid of keep_clusters parameterKevin Wolf2-17/+32
handle_alloc() is now called with the offset at which the actual new allocation starts instead of the offset at which the whole write request starts, part of which may already be processed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: handle_alloc(): Get rid of nb_clusters parameterKevin Wolf1-4/+15
We already communicate the same information in *bytes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: Factor out handle_alloc()Kevin Wolf1-89/+151
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: Decouple cluster allocation from cluster reuse codeKevin Wolf1-15/+20
This moves some code that prepares the allocation of new clusters to where the actual allocation happens. This is the minimum required to be able to move it to a separate function in the next patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: Change handle_dependency to byte granularityKevin Wolf2-12/+39
This is a more precise description of what really constitutes a dependency. The behaviour doesn't change at this point because the COW area of the old request is still aligned to cluster boundaries and therefore an overlap is detected wheneven the requests touch any part of the same cluster. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: Improve check for overlapping allocationsKevin Wolf1-1/+1
The old code detected an overlapping allocation even when the allocations didn't actually overlap, but were only adjacent. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: Handle dependencies earlierKevin Wolf2-16/+48
Handling overlapping allocations isn't just a detail of cluster allocation. It is rather one of three ways to get the host cluster offset for a write request: 1. If a request overlaps an in-flight allocations, the cluster offset can be taken from there (this is what handle_dependencies will evolve into) or the request must just wait until the allocation has completed. Accessing the L2 is not valid in this case, it has outdated information. 2. Outside overlapping areas, check the clusters that can be written to as they are, with no COW involved. 3. If a COW is required, allocate new clusters Changing the code to reflect this doesn't change the behaviour because overlaps cannot exist for clusters that are kept in step 2. It does however make it easier for later patches to work on clusters that belong to an allocation that is still in flight. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: Remove bogus unlock of s->lockKevin Wolf1-2/+0
The unlock wakes up the next coroutine, but the currently running coroutine will lock it again before it yields, so this doesn't make a lot of sense. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28qcow2: Fix "total clusters" number in bdrv_checkKevin Wolf1-1/+3
This should be based on the virtual disk size, not on the size of the image. Interesting observation: With some VM state stored in the image file, percentages higher than 100% are possible, even though snapshots themselves are ignored. This is a qcow2 bug to be fixed another day: The VM state should be discarded in the active L2 tables after completing the snapshot creation. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-25block: Add options QDict to bdrv_file_open() prototypes (fix MinGW build)Stefan Weil1-2/+4
The new parameter is unused yet. This part was missing in commit 787e4a8500020695eb391e2f1cc4767ee071d441. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-25rbd: fix compile errorLiu Yuan1-1/+2
Commit 787e4a85 [block: Add options QDict to bdrv_file_open() prototypes] didn't update rbd.c accordingly. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-22nbd: Check against invalid option combinationsKevin Wolf1-0/+14
A file name may only specified if no host or socket path is specified. The latter two may not appear at the same time either. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22nbd: Use default port if only host is specifiedKevin Wolf1-9/+10
The URL method already takes care to apply the default port when none is specfied. Directly specifying driver-specific options required the port number until now. Allow leaving it out and apply the default. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22block: Make find_image_format safe with NULL filenameKevin Wolf1-3/+10
In order to achieve this, the .bdrv_probe callbacks of all drivers must cope with this. The DMG driver is the only one that bases its decision on the filename and it needs to be changed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22block: Introduce .bdrv_parse_filename callbackKevin Wolf1-16/+13
If a driver needs structured data and not just a string, it can provide a .bdrv_parse_filename callback now that parses the command line string into separate options. Keeping this separate from .bdrv_open_filename ensures that the preferred way of directly specifying the options always works as well if parsing the string works. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22nbd: Accept -drive options for the network connectionKevin Wolf1-52/+77
The existing parsers for the file name now parse everything into the bdrv_open() options QDict. Instead of using these parsers, you can now directly specify the options on the command line, like this: qemu-system-x86_64 -drive file=nbd:,file.port=1234,file.host=::1 Clearly the file=... part could use further improvement, but it's a start. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22nbd: Keep hostname and port separateKevin Wolf1-9/+40
The NBD block supports an URL syntax, for which a URL parser returns separate hostname and port fields. It also supports the traditional qemu syntax encoded in a filename. Until now, after parsing the URL to get each piece of information, a new string is built to be fed to socket functions. Instead of building a string in the URL case that is immediately parsed again, parse the string in both cases and use the QemuOpts interface to qemu-sockets.c. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22block: Add options QDict to bdrv_file_open() prototypesKevin Wolf14-23/+35
The new parameter is unused yet. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-19qcow2: Fix segfault in qcow2_invalidate_cacheKevin Wolf2-2/+13
Need to pass an options QDict to qcow2_open() now. This fixes a segfault on the migration target with qcow2. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-19sheepdog: show error message for halt statusLiu Yuan1-0/+2
Sheepdog (neither quorum nor unsafe mode) will refuse to serve IO requests when number of alive nodes is less than that of copies specified by users. This will return 0x19 to QEMU client which currently doesn't recognize it. This patch adds an error description when QEMU client receives it, other than plainly printing 'Invalid error code' Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15threadpool: drop global thread poolStefan Hajnoczi2-3/+9
Now that each AioContext has a ThreadPool and the main loop AioContext can be fetched with bdrv_get_aio_context(), we can eliminate the concept of a global thread pool from thread-pool.c. The submit functions must take a ThreadPool* argument. block/raw-posix.c and block/raw-win32.c use aio_get_thread_pool(bdrv_get_aio_context(bs)) to fetch the main loop's ThreadPool. tests/test-thread-pool.c must be updated to reflect the new thread_pool_submit() function prototypes. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2013-03-15sheepdog: set io_flush handler in do_co_reqMORITA Kazutaka1-2/+11
If an io_flush handler is not set, qemu_aio_wait doesn't invoke callbacks. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15sheepdog: use non-blocking fd in coroutine contextMORITA Kazutaka1-4/+2
Using a blocking socket in the coroutine context reduces the chance of switching to other work. This patch makes the sheepdog driver use a non-blocking fd always. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15qcow2: make is_allocated return true for zero clustersPaolo Bonzini2-5/+4
Otherwise, live migration of the top layer will miss zero clusters and let the backing file show through. This also matches what is done in qed. QCOW2_CLUSTER_ZERO clusters are invalid in v2 image files. Check this directly in qcow2_get_cluster_offset instead of replicating the test everywhere. Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15qcow2: drop unnecessary flush in qcow2_update_snapshot_refcount()Stefan Hajnoczi1-4/+0
We already flush when the function completes. There is no need to flush after every compressed cluster. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15qcow2: drop flush in update_cluster_refcount()Stefan Hajnoczi1-2/+0
The update_cluster_refcount() function increments/decrements a cluster's refcount and then returns the new refcount value. There is no need to flush since both update_cluster_refcount() callers already take care of this: 1. qcow2_alloc_bytes() calls update_cluster_refcount() when compressed sectors will be appended to an existing cluster with enough free space. qcow2_alloc_bytes() already flushes so there is no need to do so in update_cluster_refcount(). 2. qcow2_update_snapshot_refcount() sets a cache dependency on refcounts if it needs to update L2 entries. It also flushes before completing. Removing this flush significantly speeds up qcow2 snapshot creation: $ qemu-img create -f qcow2 test.qcow2 -o size=50G,preallocation=metadata $ time qemu-img snapshot -c new test.qcow2 Time drops from more than 3 minutes to under 1 second. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15qcow2: flush in qcow2_update_snapshot_refcount()Stefan Hajnoczi2-6/+1
Users of qcow2_update_snapshot_refcount() do not flush consistently. qcow2_snapshot_create() flushes but qcow2_snapshot_goto() and qcow2_snapshot_delete() do not. Solve this by moving the bdrv_flush() into qcow2_update_snapshot_refcount(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15qcow2: set L2 cache dependency in qcow2_alloc_bytes()Stefan Hajnoczi1-1/+5
Compressed writes use qcow2_alloc_bytes() to allocate space with byte granularity. The affected clusters' refcounts will be incremented but we do not need to flush yet. Set a L2 cache dependency on the refcount block cache, so that the refcounts get written out before the L2 updates. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15qcow2: flush refcount cache correctly in qcow2_write_snapshots()Stefan Hajnoczi1-1/+4
Since qcow2 metadata is cached we need to flush the caches, not just the underlying file. Use bdrv_flush(bs) instead of bdrv_flush(bs->file). Also add the error return path when bdrv_flush() fails and move the flush after checking for qcow2_alloc_clusters() failure so that the qcow2_alloc_clusters() error return value takes precedence. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>