summaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)AuthorFilesLines
2012-04-10Merge remote-tracking branch 'kwolf/for-anthony' into stagingAnthony Liguori14-408/+312
* kwolf/for-anthony: (46 commits) qed: remove incoming live migration blocker qed: honor BDRV_O_INCOMING for incoming live migration migration: clear BDRV_O_INCOMING flags on end of incoming live migration qed: add bdrv_invalidate_cache to be called after incoming live migration blockdev: open images with BDRV_O_INCOMING on incoming live migration block: add a function to clear incoming live migration flags block: Add new BDRV_O_INCOMING flag to notice incoming live migration block stream: close unused files and update ->backing_hd qemu-iotests: Fix call syntax for qemu-io qemu-iotests: Fix call syntax for qemu-img qemu-iotests: Test unknown qcow2 header extensions qemu-iotests: qcow2.py sheepdog: fix send req helpers sheepdog: implement SD_OP_FLUSH_VDI operation block: bdrv_append() fixes qed: track dirty flag status qemu-img: add dirty flag status qed: image fragmentation statistics qemu-img: add image fragmentation statistics block: document job API ...
2012-04-05qed: remove incoming live migration blockerBenoît Canet2-11/+0
Signed-off-by: Benoit Canet <benoit.canet@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05qed: honor BDRV_O_INCOMING for incoming live migrationBenoît Canet1-2/+3
From original commit with Patchwork-id: 31108 by Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> "The QED image format includes a file header bit to mark images dirty. QED normally checks dirty images on open and fixes inconsistent metadata. This is undesirable during live migration since the dirty bit may be set if the source host is modifying the image file. The check should be postponed until migration completes. Skip operations that modify the image file if the BDRV_O_INCOMING flag is set." Signed-off-by: Benoit Canet <benoit.canet@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05qed: add bdrv_invalidate_cache to be called after incoming live migrationBenoît Canet1-0/+10
The QED image is reopened to flush metadata and check consistency. Signed-off-by: Benoit Canet <benoit.canet@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05block stream: close unused files and update ->backing_hdMarcelo Tosatti1-0/+34
Close the now unused images that were part of the previous backing file chain and adjust ->backing_hd, backing_filename and backing_format properly. Fixes https://bugzilla.redhat.com/show_bug.cgi?id=801449 Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05sheepdog: fix send req helpersLiu Yuan1-0/+2
We should return if reading of the header fails. Cc: Kevin Wolf <kwolf@redhat.com> Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Acked-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05sheepdog: implement SD_OP_FLUSH_VDI operationLiu Yuan1-14/+128
Flush operation is supposed to flush the write-back cache of sheepdog cluster. By issuing flush operation, we can assure the Guest of data reaching the sheepdog cluster storage. Cc: Kevin Wolf <kwolf@redhat.com> Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05qed: track dirty flag statusDong Xu Wang1-0/+1
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05qed: image fragmentation statisticsDong Xu Wang1-0/+9
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05block: set job->speed in block_set_speedPaolo Bonzini1-1/+0
There is no need to do this in every implementation of set_speed (even though there is only one right now). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05block: fix streaming/closing racePaolo Bonzini1-2/+4
Streaming can issue I/O while qcow2_close is running. This causes the L2 caches to become very confused or, alternatively, could cause a segfault when the streaming coroutine is reentered after closing its block device. The fix is to cancel streaming jobs when closing their underlying device. The cancellation must be synchronous, on the other hand qemu_aio_wait will not restart a coroutine that is sleeping in co_sleep. So add a flag saying whether streaming has in-flight I/O. If the busy flag is false, the coroutine is quiescent and, when cancelled, will not issue any new I/O. This protects streaming against closing, but not against deleting. We have a reference count protecting us against concurrent deletion, but I still added an assertion to ensure nothing bad happens. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05vdi: change goto to loopPaolo Bonzini1-73/+68
Finally reindent all code and change goto statements to a loop. Acked-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05vdi: do not create useless iovecsPaolo Bonzini1-46/+33
Reads and writes to the underlying file can also occur with the simple non-vectored I/O interfaces. Acked-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05vdi: leave bounce buffering to block layerPaolo Bonzini1-55/+12
vdi.c really works as if it implemented bdrv_read and bdrv_write. However, because only vector I/O is supported by the asynchronous callbacks, it went through extra pain to bounce-buffer the I/O. This can be handled by the block layer now that the format is coroutine-based. Acked-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05vdi: move aiocb fields to localsPaolo Bonzini1-98/+65
Most of the AIOCB really holds local variables that need to persist across callback invocation. It can go away now. Acked-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05vdi: merge aio_read_cb and aio_write_cb into callersPaolo Bonzini1-28/+12
Now inline the former AIO callbacks into vdi_co_readv and vdi_co_writev. While many cleanups are possible, the code now really looks synchronous. Acked-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05vdi: move end-of-I/O handling at the endPaolo Bonzini1-67/+56
The next step is to take code that only triggers after the first operation, and move it at the end of vdi_aio_read_cb and vdi_aio_write_cb. Acked-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05vdi: basic conversion to coroutinesPaolo Bonzini1-121/+37
Even a basic conversion changing the bdrv_aio_readv/bdrv_aio_writev calls to bdrv_co_readv/bdrv_co_writev, and callbacks to goto statements can eliminate a lot of code. This is because error handling is simplified and indirections through bottom halves can go away. After this patch, I/O to the underlying file already happens via coroutines, but the code still looks a lot like if asynchronous I/O was being used. Acked-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05block/vpc: write checksum back to footer after checkZhang Shengju1-0/+3
After validation check, the 'checksum' is not written back to footer, which leave it with zero. This results in errors while loadding it under Microsoft's Hyper-V environment, and also errors from utilities like Citrix's vhd-util. Signed-off-by: Zhang Shengju <sean_zhang@trendmicro.com.cn> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05block: push recursive flushing up from driversPaolo Bonzini9-55/+2
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05qcow2: Remove unused parameter in get_cluster_table()Kevin Wolf1-10/+8
Since everything goes through the cache, callers don't use the L2 table offset any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-04-03block/curl: Replace usleep by g_usleepStefan Weil1-1/+1
The function usleep is not available for all supported platforms: at least some versions of MinGW don't support it. usleep was also declared obsolete by POSIX.1-2001. The function g_usleep is part of glib2.0, so it is available for all supported platforms. Using nanosleep would also be possible but needs more code. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-12qcow2: Reduce number of I/O requestsKevin Wolf2-77/+167
If the first part of a write request is allocated, but the second isn't and it can be allocated so that the resulting area is contiguous, handle it at once. This is a common case for sequential writes. After this patch, alloc_cluster_offset() only checks if the clusters are already allocated or how many new clusters can be allocated contigouosly. The actual cluster allocation is split off into a new function do_alloc_cluster_offset(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-12qcow2: Add qcow2_alloc_clusters_at()Kevin Wolf2-0/+30
This function allows to allocate clusters at a given offset in the image file. This is useful if you want to allocate the second part of an area that must be contiguous. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-12qcow2: Factor out count_cow_clustersKevin Wolf1-19/+36
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-12qcow2: Add error messages in qcow2_truncateKevin Wolf1-0/+3
qemu-img resize has some limitations with qcow2, but the user is only told that "this image format does not support resize". Quite confusing, so add some more detailed error_report() calls and change "this image format" into "this image". Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-12qcow2: Add some tracingKevin Wolf3-1/+41
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-12qed: do not evict in-use L2 table cache entriesStefan Hajnoczi1-4/+18
The L2 table cache reduces QED metadata reads that would be required when translating LBAs to offsets into the image file. Since requests execute in parallel it is possible to share an L2 table between multiple requests. There is a potential data corruption issue when an in-use L2 table is evicted from the cache because the following situation occurs: 1. An allocating write performs an update to L2 table "A". 2. Another request needs L2 table "B" and causes table "A" to be evicted. 3. A new read request needs L2 table "A" but it is not cached. As a result the L2 update from #1 can overlap with the L2 fetch from #3. We must avoid doing overlapping I/O requests here since the worst case outcome is that the L2 fetch completes before the L2 update and yields stale data. In that case we would effectively discard the L2 update and lose data clusters! Thanks to Benoît Canet <benoit.canet@gmail.com> for extensive testing and debugging which lead to discovery of this bug. Reported-by: Benoît Canet <benoit.canet@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Tested-by: Benoît Canet <benoit.canet@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-03-07block/vmdk: Fix warning from splint (comparision of unsigned value)Stefan Weil1-1/+1
l1_entry_sectors will never be less than 0. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-02-29qcow2: Reject too large header extensionsKevin Wolf1-0/+5
Image files that make qemu-img info read several gigabytes into the unknown header extensions list are bad. Just fail opening the image if an extension claims to be larger than the header extension area. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-02-29qcow2: Fix offset in qcow2_read_extensionsKevin Wolf1-3/+2
The spec says that the length of extensions is padded to 8 bytes, not the offset. Currently this is the same because the header size is a multiple of 8, so this is only about compatibility with future changes to the header size. While touching it, move the calculation to a common place instead of duplicating it for each header extension type. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-02-29qcow2: Fix build with DEBUG_EXT enabledKevin Wolf1-1/+0
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-22block: bdrv_eject(): Make eject_flag a real boolLuiz Capitulino2-4/+4
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09sheepdog: fix co_recv coroutine contextMORITA Kazutaka1-0/+3
The co_recv coroutine has two things that will try to enter it: 1. The select(2) read callback on the sheepdog socket. 2. The aio_add_request() blocking operations, including a coroutine mutex. This patch fixes it by setting NULL to co_recv before sending data. In future, we should make the sheepdog driver fully coroutine-based and simplify request handling. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09qcow2: Keep unknown header extension when rewriting headerKevin Wolf2-2/+50
If we want header extensions to work as compatible extensions, we can't destroy yet unknown header extensions when rewriting the header (e.g. for changing the backing file). Save all unknown header extensions in a list of blobs and include them in a new header. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09qcow2: Update whole header at onceKevin Wolf2-60/+95
In order to switch the backing file, qcow2 issues multiple write requests that only changed a part of the image header. Any failure after the first one would leave the header in an corrupted state. With this patch, the whole header is written at once, so we can't fail in the middle. At the same time, this gives us a reusable functions that updates all fields of the qcow2 header and not only the backing file. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09vpc: Round up image size during fixed image creationKevin Wolf1-13/+10
The geometry calculation algorithm from the VHD spec rounds the image size down if it doesn't exactly match a geometry. During image conversion, this causes the image to be truncated. For dynamic images, we already have code in place to round up instead, let's do the same for fixed images. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09vpc: Add support for Fixed Disk typeCharles Arnold1-91/+194
The Virtual Hard Disk Image Format Specification allows for three types of hard disk formats, Fixed, Dynamic, and Differencing. Qemu currently only supports Dynamic disks. This patch adds support for the Fixed Disk format. Usage: Example 1: qemu-img create -f vpc -o type=fixed <filename> [size] Example 2: qemu-img convert -O vpc -o type=fixed <input filename> <output filename> While it is also allowed to specify '-o type=dynamic', the default disk type remains Dynamic and is what is used when the type is left unspecified. Signed-off-by: Charles Arnold <carnold@suse.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09iSCSI: add configuration variables for iSCSIRonnie Sahlberg1-10/+129
This patch adds configuration variables for iSCSI to set initiator-name to use when logging in to the target, which type of header-digest to negotiate with the target and username and password for CHAP authentication. This allows specifying a initiator-name either from the command line -iscsi initiator-name=iqn.2004-01.com.example:test or from a configuration file included with -readconfig [iscsi] initiator-name = iqn.2004-01.com.example:test header-digest = CRC32C|CRC32C-NONE|NONE-CRC32C|NONE user = CHAP username password = CHAP password If you use several different targets, you can also configure this on a per target basis by using a group name: [iscsi "iqn.target.name"] ... The configuration file can be read using -readconfig. Example : qemu-system-i386 -drive file=iscsi://127.0.0.1/iqn.ronnie.test/1 -readconfig iscsi.conf Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09qed: add .bdrv_co_write_zeroes() supportStefan Hajnoczi2-8/+103
Zero writes are a dedicated interface for writing regions of zeroes into the image file. If clusters are not yet allocated it is possible to use an efficient metadata representation which keeps the image file compact and does not store individual zero bytes. Implementing this for the QED image format is fairly straightforward. The only issue is that when a zero write touches an existing cluster we have to allocate a bounce buffer and perform a regular write. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09qed: replace is_write with flags fieldStefan Hajnoczi2-8/+13
Per-request attributes like read/write are currently implemented as bool fields in the QEDAIOCB struct. This becomes unwiedly as the number of attributes grows. For example, the qed_aio_setup() function would have to take multiple bool arguments and at call sites it would be hard to distinguish the meaning of each bool. Instead use a flags field with bitmask constants. This will be used when zero write support is added. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26qcow: Use bdrv functions to replace file operationLi Zhi Hui1-17/+31
Since common file operation functions lack of error detection and use much more I/O syscalls, so change them to bdrv series functions and reduce I/O request. Signed-off-by: Li Zhi Hui <zhihuili@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26qcow: Return real error code in qcow_openLi Zhi Hui1-19/+37
Signed-off-by: Li Zhi Hui <zhihuili@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26block/vdi: Zero unused parts when allocating a new block (fix #919242)Stefan Weil1-2/+6
The new block was filled with zero when it was allocated by g_malloc0, but when it was reused later and only partially used, data from the previously allocated block were still present and written to the new block. This caused the problems reported by bug #919242 (https://bugs.launchpad.net/qemu/+bug/919242). Now the unused parts of the new block which are before and after the data are always filled with zero, so it is no longer necessary to zero the whole block with g_malloc0. I also updated the copyright comment. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26block: add support for partial streamingMarcelo Tosatti1-4/+87
Add support for streaming data from an intermediate section of the image chain (see patch and documentation for details). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26block: rate-limit streaming operationsStefan Hajnoczi1-6/+59
This patch implements rate-limiting for image streaming. If we've exceeded the bandwidth quota for a 100 ms time slice we sleep the coroutine until the next slice begins. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26block: add image streaming block jobStefan Hajnoczi1-0/+133
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26block: replace unchecked strdup/malloc/calloc with glibStefan Hajnoczi2-4/+4
Most of the codebase as been converted to use glib memory allocation functions. There are still a few instances of malloc/calloc in the block layer and qemu-io. Replace them, especially since they do not check the strdup/malloc/calloc return value. Reported-by: Dr David Alan Gilbert <davidagilbert@uk.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26rbd: wire up snapshot removal and rollback functionalityGregory Farnum1-0/+22
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-13prepare for future GPLv2+ relicensingPaolo Bonzini3-0/+7
All files under GPLv2 will get GPLv2+ changes starting tomorrow. event_notifier.c and exec-obsolete.h were only ever touched by Red Hat employees and can be relicensed now. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>