peter/qemu - QEMU hacking for Peter

Age	Commit message (Collapse)	Author	Files	Lines
2017-10-16	Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2017-10-14' into ↵	Peter Maydell	1	-8/+10
	staging nbd patches for 2017-10-14 - Marc-André Lureau - NBD: use g_new() family of functions - Vladimir Sementsov-Ogievskiy - first half of 00/13 nbd minimal structured read # gpg: Signature made Sun 15 Oct 2017 01:38:47 BST # gpg: using RSA key 0xA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" # gpg: aka "[jpeg image of size 6874]" # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2017-10-14: nbd: header constants indenting nbd/server: simplify reply transmission nbd/server: refactor nbd_co_send_simple_reply parameters nbd/server: do not use NBDReply structure nbd/server: structurize simple reply header sending nbd: rename some simple-request related objects to be _simple_ block/nbd-client: refactor nbd_co_receive_reply block/nbd-client: assert qiov len once in nbd_co_request NBD: use g_new() family of functions Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-13	block/throttle.c: add bdrv_co_drain_begin/end callbacks	Manos Pitsidianakis	1	-0/+18
	Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-13	block: rename bdrv_co_drain to bdrv_co_drain_begin	Manos Pitsidianakis	2	-5/+5
	Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-13	block: add bdrv_co_drain_end callback	Manos Pitsidianakis	1	-14/+34
	BlockDriverState has a bdrv_co_drain() callback but no equivalent for the end of the drain. The throttle driver (block/throttle.c) needs a way to mark the end of the drain in order to toggle io_limits_disabled correctly, thus bdrv_co_drain_end is needed. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-12	block/nbd-client: refactor nbd_co_receive_reply	Vladimir Sementsov-Ogievskiy	1	-4/+4
	Pass handle parameter directly, as the whole request isn't needed. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20171012095319.136610-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2017-10-12	block/nbd-client: assert qiov len once in nbd_co_request	Vladimir Sementsov-Ogievskiy	1	-4/+6
	Also improve the assertion: check that qiov is NULL for other commands than CMD_READ and CMD_WRITE. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20171012095319.136610-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2017-10-09	config: qemu_config_parse() return number of config groups	Eduardo Habkost	1	-1/+0
	Change qemu_config_parse() to return the number of config groups in success and -EINVAL on error. This will allow callers of qemu_config_parse() to check if something was really loaded from the config file. All existing callers of qemu_config_parse() and qemu_read_config_file() only check if the return value was negative, so the change shouldn't affect them. Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20171004025043.3788-2-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-06	block/mirror: check backing in bdrv_mirror_top_flush	Vladimir Sementsov-Ogievskiy	1	-0/+4
	Backing may be zero after failed bdrv_append in mirror_start_job, which leads to SIGSEGV. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170929152255.5431-1-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06	qcow2: truncate the tail of the image file after shrinking the image	Pavel Butsykin	3	-0/+46
	Now after shrinking the image, at the end of the image file, there might be a tail that probably will never be used. So we can find the last used cluster and cut the tail. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170929121613.25997-3-pbutsykin@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06	qcow2: fix return error code in qcow2_truncate()	Pavel Butsykin	1	-2/+2
	Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170929121613.25997-2-pbutsykin@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06	block/mirror: check backing in bdrv_mirror_top_refresh_filename	Vladimir Sementsov-Ogievskiy	1	-0/+5
	Backing may be zero after failed bdrv_attach_child in bdrv_set_backing_hd, which leads to SIGSEGV. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170928120300.58164-1-vsementsov@virtuozzo.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06	block: support passthrough of BDRV_REQ_FUA in crypto driver	Daniel P. Berrange	1	-2/+5
	The BDRV_REQ_FUA flag can trivially be allowed in the crypt driver as a passthrough to the underlying block driver. Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170927125340.12360-7-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06	block: convert qcrypto_block_encrypt\|decrypt to take bytes offset	Daniel P. Berrange	4	-19/+16
	Instead of sector offset, take the bytes offset when encrypting or decrypting data. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170927125340.12360-6-berrange@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06	block: convert crypto driver to bdrv_co_preadv\|pwritev	Daniel P. Berrange	1	-52/+54
	Make the crypto driver implement the bdrv_co_preadv\|pwritev callbacks, and also use bdrv_co_preadv\|pwritev for I/O with the protocol driver beneath. This replaces sector based I/O with byte based I/O, and allows us to stop assuming the physical sector size matches the encryption sector size. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170927125340.12360-5-berrange@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06	block: fix data type casting for crypto payload offset	Daniel P. Berrange	1	-4/+9
	The crypto APIs report the offset of the data payload as an uint64_t type, but the block driver is casting to size_t or ssize_t which will potentially truncate. Most of the block APIs use int64_t for offsets meanwhile, so even if using uint64_t in the crypto block driver we are still at risk of truncation. Change the block crypto driver to use uint64_t, but add asserts that the value is less than INT64_MAX. Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170927125340.12360-4-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06	block: use 1 MB bounce buffers for crypto instead of 16KB	Daniel P. Berrange	1	-13/+15
	Using 16KB bounce buffers creates a significant performance penalty for I/O to encrypted volumes on storage which high I/O latency (rotating rust & network drives), because it triggers lots of fairly small I/O operations. On tests with rotating rust, and cache=none\|directsync, write speed increased from 2MiB/s to 32MiB/s, on a par with that achieved by the in-kernel luks driver. With other cache modes the in-kernel driver is still notably faster because it is able to report completion of the I/O request before any encryption is done, while the in-QEMU driver must encrypt the data before completion. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170927125340.12360-2-berrange@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06	block: Perform copy-on-read in loop	Eric Blake	1	-38/+82
	Improve our braindead copy-on-read implementation. Pre-patch, we have multiple issues: - we create a bounce buffer and perform a write for the entire request, even if the active image already has 99% of the clusters occupied, and really only needs to copy-on-read the remaining 1% of the clusters - our bounce buffer was as large as the read request, and can needlessly exhaust our memory by using double the memory of the request size (the original request plus our bounce buffer), rather than a capped maximum overhead beyond the original - if a driver has a max_transfer limit, we are bypassing the normal code in bdrv_aligned_preadv() that fragments to that limit, and instead attempt to read the entire buffer from the driver in one go, which some drivers may assert on - a client can request a large request of nearly 2G such that rounding the request out to cluster boundaries results in a byte count larger than 2G. While this cannot exceed 32 bits, it DOES have some follow-on problems: -- the call to bdrv_driver_pread() can assert for exceeding BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks .bdrv_co_preadv -- if the buffer is all zeroes, the subsequent call to bdrv_co_do_pwrite_zeroes is a no-op due to a negative size, which means we did not actually copy on read Fix all of these issues by breaking up the action into a loop, where each iteration is capped to sane limits. Also, querying the allocation status allows us to optimize: when data is already present in the active layer, we don't need to bounce. Note that the code has a telling comment that copy-on-read should probably be a filter driver rather than a bolt-on hack in io.c; but that remains a task for another day. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	block: Add blkdebug hook for copy-on-read	Eric Blake	1	-0/+1
	Make it possible to inject errors on writes performed during a read operation due to copy-on-read semantics. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	block: Uniform handling of 0-length bdrv_get_block_status()	Eric Blake	1	-0/+4
	Handle a 0-length block status request up front, with a uniform return value claiming the area is not allocated. Most callers don't pass a length of 0 to bdrv_get_block_status() and friends; but it definitely happens with a 0-length read when copy-on-read is enabled. While we could audit all callers to ensure that they never make a 0-length request, and then assert that fact, it was just as easy to fix things to always report success (as long as the callers are careful to not go into an infinite loop). However, we had inconsistent behavior on whether the status is reported as allocated or defers to the backing layer, depending on what callbacks the driver implements, and possibly wasting quite a few CPU cycles to get to that answer. Consistently reporting unallocated up front doesn't really hurt anything, and makes it easier both for callers (0-length requests now have well-defined behavior) and for drivers (drivers don't have to deal with 0-length requests). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	commit: Remove overlay_bs	Kevin Wolf	1	-47/+15
	We don't need to make any assumptions about the graph layout above the top node of the commit operation any more. Remove the use of bdrv_find_overlay() and related variables from the commit job code. bdrv_drop_intermediate() doesn't use the 'active' parameter any more, so we can just drop it. The overlay node was previously added to the block job to get a BLK_PERM_GRAPH_MOD. We really need to respect those permissions in bdrv_drop_intermediate() now, but as long as we haven't figured out yet how BLK_PERM_GRAPH_MOD is actually supposed to work, just leave a TODO comment there. With this change, it is now possible to perform another block job on an overlay node without conflicts. qemu-iotests 030 is changed accordingly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-10-06	commit: Support multiple roots above top node	Kevin Wolf	1	-1/+1
	This changes the commit block job to support operation in a graph where there is more than a single active layer that references the top node. This involves inserting the commit filter node not only on the path between the given active node and the top node, but between the top node and all of its parents. On completion, bdrv_drop_intermediate() must consider all parents for updating the backing file link. These parents may be backing files themselves and as such read-only; reopen them temporarily if necessary. Previously this was achieved by the bdrv_reopen() calls in the commit block job that made overlay_bs read-write for the whole duration of the block job, even though write access is only needed on completion. Now that we consider all parents, overlay_bs is meaningless. It is left in place in this commit, but we'll remove it soon. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	dirty-bitmap: Convert internal hbitmap size/granularity	Eric Blake	1	-44/+18
	Now that all callers are using byte-based interfaces, there's no reason for our internal hbitmap to remain with sector-based granularity. It also simplifies our internal scaling, since we already know that hbitmap widens requests out to granularity boundaries. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	dirty-bitmap: Switch bdrv_set_dirty() to bytes	Eric Blake	2	-7/+6
	Both callers already had bytes available, but were scaling to sectors. Move the scaling to internal code. In the case of bdrv_aligned_pwritev(), we are now passing the exact offset rather than a rounded sector-aligned value, but that's okay as long as dirty bitmap widens start/bytes to granularity boundaries. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	qcow2: Switch store_bitmap_data() to byte-based iteration	Eric Blake	1	-15/+16
	Now that we have adjusted the majority of the calls this function makes to be byte-based, it is easier to read the code if it makes passes over the image using bytes rather than sectors. iotests 165 was rather weak - on a default 64k-cluster image, where bitmap granularity also defaults to 64k bytes, a single cluster of the bitmap table thus covers (6410248) bits which each cover 64k bytes, or 32G of image space. But the test only uses a 1G image, so it cannot trigger any more than one loop of the code in store_bitmap_data(); and it was writing to the first cluster. In order to test that we are properly aligning which portions of the bitmap are being written to the file, we really want to test a case where the first dirty bit returned by bdrv_dirty_iter_next() is not aligned to the start of a cluster, which we can do by modifying the test to write data that doesn't happen to fall in the first cluster of the image. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	qcow2: Switch load_bitmap_data() to byte-based iteration	Eric Blake	1	-14/+8
	Now that we have adjusted the majority of the calls this function makes to be byte-based, it is easier to read the code if it makes passes over the image using bytes rather than sectors. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	qcow2: Switch qcow2_measure() to byte-based iteration	Eric Blake	1	-12/+10
	This is new code, but it is easier to read if it makes passes over the image using bytes rather than sectors (and will get easier in the future when bdrv_get_block_status is converted to byte-based). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	mirror: Switch mirror_dirty_init() to byte-based iteration	Eric Blake	1	-24/+14
	Now that we have adjusted the majority of the calls this function makes to be byte-based, it is easier to read the code if it makes passes over the image using bytes rather than sectors. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes	Eric Blake	2	-16/+22
	Some of the callers were already scaling bytes to sectors; others can be easily converted to pass byte offsets, all in our shift towards a consistent byte interface everywhere. Making the change will also make it easier to write the hold-out callers to use byte rather than sectors for their iterations; it also makes it easier for a future dirty-bitmap patch to offload scaling over to the internal hbitmap. Although all callers happen to pass sector-aligned values, make the internal scaling robust to any sub-sector requests. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	dirty-bitmap: Change bdrv_get_dirty_locked() to take bytes	Eric Blake	2	-6/+5
	Half the callers were already scaling bytes to sectors; the other half can eventually be simplified to use byte iteration. Both callers were already using the result as a bool, so make that explicit. Making the change also makes it easier for a future dirty-bitmap patch to offload scaling over to the internal hbitmap. Remember, asking whether a byte is dirty is effectively asking whether the entire granularity containing the byte is dirty, since we only track dirtiness by granularity. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	dirty-bitmap: Change bdrv_get_dirty_count() to report bytes	Eric Blake	2	-12/+8
	Thanks to recent cleanups, all callers were scaling a return value of sectors into bytes; do the scaling internally instead. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset	Eric Blake	4	-7/+8
	Thanks to recent cleanups, most callers were scaling a return value of sectors into bytes (the exception, in qcow2-bitmap, will be converted to byte-based iteration later). Update the interface to do the scaling internally instead. In qcow2-bitmap, the code was specifically checking for an error return of -1. To avoid a regression, we either have to make sure we continue to return -1 (rather than a scaled -512) on error, or we have to fix the caller to treat all negative values as error rather than just one magic value. It's easy enough to make both changes at the same time, even though either one in isolation would work. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	dirty-bitmap: Set iterator start by offset, not sector	Eric Blake	4	-12/+10
	All callers to bdrv_dirty_iter_new() passed 0 for their initial starting point, drop that parameter. Most callers to bdrv_set_dirty_iter() were scaling a byte offset to a sector number; the exception qcow2-bitmap will be converted later to use byte rather than sector iteration. Move the scaling to occur internally to dirty bitmap code instead, so that callers now pass in bytes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	qcow2: Switch sectors_covered_by_bitmap_cluster() to byte-based	Eric Blake	1	-14/+14
	We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Change the qcow2 bitmap helper function sectors_covered_by_bitmap_cluster(), renaming it to bytes_covered_by_bitmap_cluster() in the process. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	dirty-bitmap: Change bdrv_dirty_bitmap_serialize() to take bytes	Eric Blake	2	-21/+38
	Right now, the dirty-bitmap code exposes the fact that we use a scale of sector granularity in the underlying hbitmap to anything that wants to serialize a dirty bitmap. It's nicer to uniformly expose bytes as our dirty-bitmap interface, matching the previous change to bitmap size. The only caller to serialization is currently qcow2-cluster.c, which becomes a bit more verbose because it is still tracking sectors for other reasons, but a later patch will fix that to more uniformly use byte offsets everywhere. Likewise, within dirty-bitmap, we have to add more assertions that we are not truncating incorrectly, which can go away once the internal hbitmap is byte-based rather than sector-based. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	dirty-bitmap: Track bitmap size by bytes	Eric Blake	1	-12/+14
	We are still using an internal hbitmap that tracks a size in sectors, with the granularity scaled down accordingly, because it lets us use a shortcut for our iterators which are currently sector-based. But there's no reason we can't track the dirty bitmap size in bytes, since it is (mostly) an internal-only variable (remember, the size is how many bytes are covered by the bitmap, not how many bytes the bitmap occupies). A later cleanup will convert dirty bitmap internals to be entirely byte-based, eliminating the intermediate sector rounding added here; and technically, since bdrv_getlength() already rounds up to sectors, our use of DIV_ROUND_UP is more for theoretical completeness than for any actual rounding. Use is_power_of_2() while at it, instead of open-coding that. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	dirty-bitmap: Change bdrv_dirty_bitmap_size() to report bytes	Eric Blake	2	-7/+9
	We're already reporting bytes for bdrv_dirty_bitmap_granularity(); mixing bytes and sectors in our return values is a recipe for confusion. A later cleanup will convert dirty bitmap internals to be entirely byte-based, but in the meantime, we should report the bitmap size in bytes. The only external caller in qcow2-bitmap.c is temporarily more verbose (because it is still using sector-based math), but will later be switched to track progress by bytes instead of sectors. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	dirty-bitmap: Avoid size query failure during truncate	Eric Blake	1	-3/+3
	We've previously fixed several places where we failed to account for possible errors from bdrv_nb_sectors(). Fix another one by making bdrv_dirty_bitmap_truncate() take the new size from the caller instead of querying itself; then adjust the sole caller bdrv_truncate() to pass the size just determined by a successful resize, or to reuse the size given to the original truncate operation when refresh_total_sectors() was not able to confirm the actual size (the two sizes can potentially differ according to rounding constraints), thus avoiding sizing the bitmaps to -1. This also fixes a bug where not all failure paths in bdrv_truncate() would set errp. Note that bdrv_truncate() is still a bit awkward. We may want to revisit it later and clean up things to better guarantee that a resize attempt either fails cleanly up front, or cannot fail after guest-visible changes have been made (if temporary changes are made, then they need to be cleanly rolled back). But that is a task for another day; for now, the goal is the bare minimum fix to ensure that just bdrv_dirty_bitmap_truncate() cannot fail. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	dirty-bitmap: Drop unused functions	Eric Blake	1	-44/+0
	We had several functions that no one is currently using, and which use sector-based interfaces. I'm trying to convert towards byte-based interfaces, so it's easier to just drop the unused functions: bdrv_dirty_bitmap_get_meta bdrv_dirty_bitmap_get_meta_locked bdrv_dirty_bitmap_reset_meta bdrv_dirty_bitmap_meta_granularity Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	qcow2: Ensure bitmap serialization is aligned	Eric Blake	1	-2/+5
	When subdividing a bitmap serialization, the code in hbitmap.c enforces that start/count parameters are aligned (except that count can end early at end-of-bitmap). We exposed this required alignment through bdrv_dirty_bitmap_serialization_align(), but forgot to actually check that we comply with it. Fortunately, qcow2 is never dividing bitmap serialization smaller than one cluster (which is a minimum of 512 bytes); so we are always compliant with the serialization alignment (which insists that we partition at least 64 bits per chunk) because we are doing at least 4k bits per chunk. Still, it's safer to add an assertion (for the unlikely case that we'd ever support a cluster smaller than 512 bytes, or if the hbitmap implementation changes what it considers to be aligned), rather than leaving bdrv_dirty_bitmap_serialization_align() without a caller. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	hbitmap: Rename serialization_granularity to serialization_align	Eric Blake	1	-1/+1
	The only client of hbitmap_serialization_granularity() is dirty-bitmap's bdrv_dirty_bitmap_serialization_align(). Keeping the two names consistent is worthwhile, and the shorter name is more representative of what the function returns (the required alignment to be used for start/count of other serialization functions, where violating the alignment causes assertion failures). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06	block: Typo fix in copy_on_readv()	Eric Blake	1	-1/+1
	Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-27	Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging	Peter Maydell	12	-41/+289
	Block layer patches # gpg: Signature made Tue 26 Sep 2017 14:52:32 BST # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (24 commits) block/qcow2-bitmap: fix use of uninitialized pointer qemu-iotests: add shrinking image test qcow2: add shrink image support qcow2: add qcow2_cache_discard qemu-img: add --shrink flag for resize iotests: fix 181: enable postcopy-ram capability on target qemu-iotests: Test change-backing-file command block: Fix permissions after bdrv_reopen() block: reopen: Queue children after their parents block: Base permissions on rw state after reopen block: Add reopen queue to bdrv_check_perm() block: Add reopen_queue to bdrv_child_perm() qemu-io: Drop write permissions before read-only reopen block: Clean up some bad code in the vvfat driver block/throttle-groups.c: allocate RestartData on the heap throttle: Assert that bkt->max is valid in throttle_compute_wait() iotests: Print full path of bad output if mismatch iotests: use virtio aliases for 067 iotests: use -ccw on s390x for 051 iotests: use -ccw on s390x for 040, 139, and 182 ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-26	block/qcow2-bitmap: fix use of uninitialized pointer	Vladimir Sementsov-Ogievskiy	1	-1/+1
	Without initialization to zero dirty_bitmap field may be not zero for a bitmap which should not be stored and qcow2_store_persistent_dirty_bitmaps will erroneously call store_bitmap for it which leads to SIGSEGV on bdrv_dirty_bitmap_name. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170922144353.4220-1-vsementsov@virtuozzo.com Cc: qemu-stable@nongnu.org Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-09-26	qcow2: add shrink image support	Pavel Butsykin	4	-9/+218
	This patch add shrinking of the image file for qcow2. As a result, this allows us to reduce the virtual image size and free up space on the disk without copying the image. Image can be fragmented and shrink is done by punching holes in the image file. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170918124230.8152-4-pbutsykin@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-09-26	qcow2: add qcow2_cache_discard	Pavel Butsykin	3	-2/+47
	Whenever l2/refcount table clusters are discarded from the file we can automatically drop unnecessary content of the cache tables. This reduces the chance of eviction useful cache data and eliminates inconsistent data in the cache with the data in the file. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170918124230.8152-3-pbutsykin@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-09-26	block: Add reopen_queue to bdrv_child_perm()	Kevin Wolf	4	-0/+4
	In the context of bdrv_reopen(), we'll have to look at the state of the graph as it will be after the reopen. This interface addition is in preparation for the change. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-09-26	block: Clean up some bad code in the vvfat driver	Thomas Huth	1	-24/+2
	Remove the unnecessary home-grown redefinition of the assert() macro here, and remove the unusable debug code at the end of the checkpoint() function. The code there uses assert() with side-effects (assignment to the "mapping" variable), which should be avoided. Looking more closely, it seems as it is apparently also only usable for one certain directory layout (with a file named USB.H in it) and thus is of no use for the rest of the world. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26	block/throttle-groups.c: allocate RestartData on the heap	Manos Pitsidianakis	1	-5/+7
	RestartData is the opaque data of the throttle_group_restart_queue_entry coroutine. By being stack allocated, it isn't available anymore if aio_co_enter schedules the coroutine with a bottom half and runs after throttle_group_restart_queue returns. Cc: qemu-stable@nongnu.org Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26	file-posix: Clear out first sector in hdev_create	Fam Zheng	1	-0/+10
	People get surprised when, after "qemu-img create -f raw /dev/sdX", they still see qcow2 with "qemu-img info", if previously the bdev had a qcow2 header. While this is natural because raw doesn't need to write any magic bytes during creation, hdev_create is free to clear out the first sector to make sure the stale qcow2 header doesn't cause such confusion. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-25	block/nbd-client: nbd_co_send_request: fix return code	Vladimir Sementsov-Ogievskiy	1	-0/+2
	It's incorrect to return success rc >= 0 if we skip qio_channel_writev_all() call due to s->quit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20170920124507.18841-4-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>