summaryrefslogtreecommitdiff
path: root/include/block/block_int.h
AgeCommit message (Collapse)AuthorFilesLines
2017-03-17block: Always call bdrv_child_check_perm firstFam Zheng1-4/+0
bdrv_child_set_perm alone is not very usable because the caller must call bdrv_child_check_perm first. This is already encapsulated conveniently in bdrv_child_try_set_perm, so remove the other prototypes from the header and fix the one wrong caller, block/mirror.c. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-03-07block: Fix error handling in bdrv_replace_in_backing_chain()Kevin Wolf1-2/+2
When adding an Error parameter, bdrv_replace_in_backing_chain() would become nothing more than a wrapper around change_parent_backing_link(). So make the latter public, renamed as bdrv_replace_node(), and remove bdrv_replace_in_backing_chain(). Most of the callers just remove a node from the graph that they just inserted, so they can use &error_abort, but completion of a mirror job with 'replaces' set can actually fail. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-03-07block: Ignore multiple children in bdrv_check_update_perm()Kevin Wolf1-1/+1
change_parent_backing_link() will need to update multiple BdrvChild objects at once. Checking permissions reference by reference doesn't work because permissions need to be consistent only with all parents moved to the new child. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-02-28commit: Add filter-node-name to block-commitKevin Wolf1-3/+10
Management tools need to be able to know about every node in the graph and need a way to address them. Changing the graph structure was okay because libvirt doesn't really manage the node level yet, but future libvirt versions need to deal with both new and old version of qemu. This new option to blockdev-commit allows the client to set a node-name for the automatically inserted filter driver, and at the same time serves as a witness for a future libvirt that this version of qemu does automatically insert a filter driver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2017-02-28mirror: Add filter-node-name to blockdev-mirrorKevin Wolf1-1/+4
Management tools need to be able to know about every node in the graph and need a way to address them. Changing the graph structure was okay because libvirt doesn't really manage the node level yet, but future libvirt versions need to deal with both new and old version of qemu. This new option to blockdev-mirror allows the client to set a node-name for the automatically inserted filter driver, and at the same time serves as a witness for a future libvirt that this version of qemu does automatically insert a filter driver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28block: BdrvChildRole.attach/detach() callbacksKevin Wolf1-0/+3
Backing files are somewhat special compared to other kinds of children because they are attached and detached using bdrv_set_backing_hd() rather than the normal set of functions, which does a few more things like setting backing blockers, toggling the BDRV_O_NO_BACKING flag, setting parent_bs->backing_file, etc. These special features are a reason why change_parent_backing_link() can't handle backing files yet. With abstracting the additional features into .attach/.detach callbacks, we get a step closer to a function that can actually deal with this. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28block: Add BdrvChildRole.stay_at_nodeKevin Wolf1-0/+4
When the parents' child links are updated in bdrv_append() or bdrv_replace_in_backing_chain(), this should affect all child links of BlockBackends or other nodes, but not on child links held for other purposes (like for setting permissions). This patch allows to control the behaviour per BdrvChildRole. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28block: Add BdrvChildRole.get_parent_desc()Kevin Wolf1-0/+6
For meaningful error messages in the permission system, we need to get some human-readable description of the parent of a BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28block: Allow error return in BlockDevOps.change_media_cb()Kevin Wolf1-1/+1
Some devices allow a media change between read-only and read-write media. They need to adapt the permissions in their .change_media_cb() implementation, which can fail. So add an Error parameter to the function. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28vvfat: Implement .bdrv_child_perm()Kevin Wolf1-0/+1
vvfat is the last remaining driver that can have children, but doesn't implement .bdrv_child_perm() yet. The default handlers aren't suitable here, so let's implement a very simple driver-specific one that protects the internal child from being used by other users as good as our permissions permit. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28block: Default .bdrv_child_perm() for format driversKevin Wolf1-0/+8
Almost all format drivers have the same characteristics as far as permissions are concerned: They have one or more children for storing their own data and, more importantly, metadata (can be written to and grow even without external write requests, must be protected against other writers and present consistent data) and optionally a backing file (this is just data, so like for a filter, it only depends on what the parent nodes need). This provides a default implementation that can be shared by most of our format drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2017-02-28block: Default .bdrv_child_perm() for filter driversKevin Wolf1-0/+8
Most filters need permissions related to read and write for their children, but only if the node has a parent that wants to use the same operation on the filter. The same is true for resize. This adds a default implementation that simply forwards all necessary permissions to all children of the node and leaves the other permissions unchanged. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2017-02-28block: Involve block drivers in permission grantingKevin Wolf1-0/+61
In many cases, the required permissions of one node on its children depend on what its parents require from it. For example, the raw format or most filter drivers only need to request consistent reads if that's something that one of their parents wants. In order to achieve this, this patch introduces two new BlockDriver callbacks. The first one lets drivers first check (recursively) whether the requested permissions can be set; the second one actually sets the new permission bitmask. Also add helper functions that drivers can use in their implementation of the callbacks to update their permissions on a specific child. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2017-02-28block: Let callers request permissions when attaching a child nodeKevin Wolf1-1/+14
When attaching a node as a child to a new parent, the required and shared permissions for this parent are checked against all other parents of the node now, and an error is returned if there is a conflict. This allows error returns to a function that previously always succeeded, and the same is true for quite a few callers and their callers. Converting all of them within the same patch would be too much, so for now everyone tells that they don't need any permissions and allow everyone else to do anything. This way we can use &error_abort initially and convert caller by caller to pass actual permission requirements and implement error handling. All these places are marked with FIXME comments and it will be the job of the next patches to clean them up again. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-21block: document fields protected by AioContext lockPaolo Bonzini1-25/+39
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-19-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-16block: get rid of bdrv_io_unplugged_begin/endPaolo Bonzini1-2/+1
bdrv_io_plug and bdrv_io_unplug are only called (via their BlockBackend equivalents) after starting asynchronous I/O. bdrv_drain is not going to be called while they are running, because---even if a coroutine runs for some reason---it will only drain in the next iteration of the event loop through bdrv_co_yield_to_drain. So this mechanism is unnecessary, get rid of it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161129113334.605-1-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-09block: Rename raw-{posix,win32} to file-*.cEric Blake1-1/+1
These files deal with the file protocol, not the raw format (the file protocol is often used with other formats, and the raw format is not forced to use the file protocol). Rename things to make it a bit easier to follow. Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-11-14blockjob: refactor backup_start as backup_job_createJohn Snow1-11/+12
Refactor backup_start as backup_job_create, which only creates the job, but does not automatically start it. The old interface, 'backup_start', is not kept in favor of limiting the number of nearly-identical interfaces that would have to be edited to keep up with QAPI changes in the future. Callers that wish to synchronously start the backup_block_job can instead just call block_job_start immediately after calling backup_job_create. Transactions are updated to use the new interface, calling block_job_start only during the .commit phase, which helps prevent race conditions where jobs may finish before we even finish building the transaction. This may happen, for instance, during empty block backup jobs. Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1478587839-9834-6-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01blockjob: centralize QMP event emissionsJohn Snow1-13/+4
There's no reason to leave this to blockdev; we can do it in blockjobs directly and get rid of an extra callback for most users. All non-internal events, even those created outside of QMP, will consistently emit events. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1477584421-1399-5-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-11-01Replication/Blockjobs: Create replication jobs as internalJohn Snow1-2/+7
Bubble up the internal interface to commit and backup jobs, then switch replication tasks over to using this methodology. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1477584421-1399-4-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-10-28block: only call aio_poll on the current thread's AioContextPaolo Bonzini1-0/+17
aio_poll is not thread safe; for example bdrv_drain can hang if the last in-flight I/O operation is completed in the I/O thread after the main thread has checked bs->in_flight. The bug remains latent as long as all of it is called within aio_context_acquire/aio_context_release, but this will change soon. To fix this, if bdrv_drain is called from outside the I/O thread, signal the main AioContext through a dummy bottom half. The event loop then only runs in the I/O thread. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1477565348-5458-18-git-send-email-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28block: add BDS field to count in-flight requestsPaolo Bonzini1-4/+6
Unlike tracked_requests, this field also counts throttled requests, and remains non-zero if an AIO operation needs a BH to be "really" completed. With this change, it is no longer necessary to have a dummy BdrvTrackedRequest for requests that are never serialising, and it is no longer necessary to poll the AioContext once after bdrv_requests_pending(bs) returns false. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-Id: <1477565348-5458-5-git-send-email-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-27block: Introduce .bdrv_co_ioctl() driver callbackKevin Wolf1-0/+2
This allows drivers to implement ioctls in a coroutine-based way. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-09-28block: mirror: fix wrong comment of mirror_startYaowei Bai1-1/+1
Obviously, we should write to '@target'. Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Reviewed-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1473851019-7005-2-git-send-email-baiyaowei@cmss.chinamobile.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-15Remove unused function declarationsLadi Prosek1-9/+0
Unused function declarations were found using a simple gcc plugin and manually verified by grepping the sources. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-09-13mirror: auto complete active commitWen Congyang1-1/+2
Auto complete mirror job in background to prevent from blocking synchronously Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com> Message-id: 1469602913-20979-7-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-05drive-backup: added support for data compressionPavel Butsykin1-0/+1
The idea is simple - backup is "written-once" data. It is written block by block and it is large enough. It would be nice to save storage space and compress it. The patch adds a flag to the qmp/hmp drive-backup command which enables block compression. Compression should be implemented in the format driver to enable this feature. There are some limitations of the format driver to allow compressed writes. We can write data only once. Though for backup this is perfectly fine. These limitations are maintained by the driver and the error will be reported if we are doing something wrong. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Jeff Cody <jcody@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05block: remove BlockDriver.bdrv_write_compressedPavel Butsykin1-3/+0
There are no block drivers left that implement the old .bdrv_write_compressed interface, so it can be removed. Also now we have no need to use the bdrv_pwrite_compressed function and we can remove it entirely. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Jeff Cody <jcody@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05block/io: reuse bdrv_co_pwritev() for write compressedPavel Butsykin1-0/+3
For bdrv_pwrite_compressed() it looks like most of the code creating coroutine is duplicated in bdrv_prwv_co(). So we can just add a flag (BDRV_REQ_WRITE_COMPRESSED) and use bdrv_prwv_co() as a generic one. In the end we get coroutine oriented function for write compressed by using bdrv_co_pwritev/blk_co_pwritev with BDRV_REQ_WRITE_COMPRESSED flag. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Jeff Cody <jcody@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-08-18block: fix deadlock in bdrv_co_flushEvgeny Yakovlev1-1/+1
The following commit commit 3ff2f67a7c24183fcbcfe1332e5223ac6f96438c Author: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Date: Mon Jul 18 22:39:52 2016 +0300 block: ignore flush requests when storage is clean has introduced a regression. There is a problem that it is still possible for 2 requests to execute in non sequential fashion and sometimes this results in a deadlock when bdrv_drain_one/all are called for BDS with such stalled requests. 1. Current flushed_gen and flush_started_gen is 1. 2. Request 1 enters bdrv_co_flush to with write_gen 1 (i.e. the same as flushed_gen). It gets past flushed_gen != flush_started_gen and sets flush_started_gen to 1 (again, the same it was before). 3. Request 1 yields somewhere before exiting bdrv_co_flush 4. Request 2 enters bdrv_co_flush with write_gen 2. It gets past flushed_gen != flush_started_gen and sets flush_started_gen to 2. 5. Request 2 runs to completion and sets flushed_gen to 2 6. Request 1 is resumed, runs to completion and sets flushed_gen to 1. However flush_started_gen is now 2. From here on out flushed_gen is always != to flush_started_gen and all further requests will wait on flush_queue. This change replaces flush_started_gen with an explicitly tracked active flush request. Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Message-id: 1471457214-3994-2-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-08-03block: Cater to iscsi with non-power-of-2 discardEric Blake1-17/+20
Dell Equallogic iSCSI SANs have a very unusual advertised geometry: $ iscsi-inq -e 1 -c $((0xb0)) iscsi://XXX/0 wsnz:0 maximum compare and write length:1 optimal transfer length granularity:0 maximum transfer length:0 optimal transfer length:0 maximum prefetch xdread xdwrite transfer length:0 maximum unmap lba count:30720 maximum unmap block descriptor count:2 optimal unmap granularity:30720 ugavalid:1 unmap granularity alignment:0 maximum write same length:30720 which says that both the maximum and the optimal discard size is 15M. It is not immediately apparent if the device allows discard requests not aligned to the optimal size, nor if it allows discards at a finer granularity than the optimal size. I tried to find details in the SCSI Commands Reference Manual Rev. A on what valid values of maximum and optimal sizes are permitted, but while that document mentions a "Block Limits VPD Page", I couldn't actually find documentation of that page or what values it would have, or if a SCSI device has an advertisement of its minimal unmap granularity. So it is not obvious to me whether the Dell Equallogic device is compliance with the SCSI specification. Fortunately, it is easy enough to support non-power-of-2 sizing, even if it means we are less efficient than truly possible when targetting that device (for example, it means that we refuse to unmap anything that is not a multiple of 15M and aligned to a 15M boundary, even if the device truly does support a smaller granularity where unmapping actually works). Reported-by: Peter Lieven <pl@kamp.de> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1469129688-22848-5-git-send-email-eblake@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-21Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into ↵Peter Maydell1-4/+4
staging Pull request v2: * Resolved merge conflict with block/iscsi.c [Peter] # gpg: Signature made Wed 20 Jul 2016 17:20:52 BST # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: (25 commits) raw_bsd: Convert to byte-based interface nbd: Convert to byte-based interface block: Kill .bdrv_co_discard() sheepdog: Switch .bdrv_co_discard() to byte-based raw_bsd: Switch .bdrv_co_discard() to byte-based qcow2: Switch .bdrv_co_discard() to byte-based nbd: Switch .bdrv_co_discard() to byte-based iscsi: Switch .bdrv_co_discard() to byte-based gluster: Switch .bdrv_co_discard() to byte-based blkreplay: Switch .bdrv_co_discard() to byte-based block: Add .bdrv_co_pdiscard() driver callback block: Convert .bdrv_aio_discard() to byte-based rbd: Switch rbd_start_aio() to byte-based raw-posix: Switch paio_submit() to byte-based block: Convert BB interface to byte-based discards block: Convert bdrv_aio_discard() to byte-based block: Switch BlockRequest to byte-based block: Convert bdrv_discard() to byte-based block: Convert bdrv_co_discard() to byte-based iscsi: Rely on block layer to break up large requests ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Conflicts: block/gluster.c
2016-07-20block: Kill .bdrv_co_discard()Eric Blake1-2/+0
Now that all drivers have a byte-based .bdrv_co_pdiscard(), we no longer need to worry about the sector-based version. We can also relax our minimum alignment to 1 for drivers that support it. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468624988-423-18-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20block: Add .bdrv_co_pdiscard() driver callbackEric Blake1-0/+2
There's enough drivers with a sector-based callback that it will be easier to switch one at a time. This patch adds a byte-based callback, and then after all drivers are swapped, we'll drop the sector-based callback. [checkpatch doesn't like the space after coroutine_fn in block_int.h, but it's consistent with the rest of the file] Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468624988-423-10-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20block: Convert .bdrv_aio_discard() to byte-basedEric Blake1-2/+2
Another step towards byte-based interfaces everywhere. Replace the sector-based driver callback .bdrv_aio_discard() with a new byte-based .bdrv_aio_pdiscard(). Only raw-posix and RBD drivers are affected, so it was not worth splitting into multiple patches. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468624988-423-9-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-19dirty-bitmap: operate with int64_t amountDenis V. Lunev1-1/+1
Underlying HBitmap operates even with uint64_t. Thus this change is safe. This would be useful f.e. to mark entire bitmap dirty in one call. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1468503209-19498-2-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-07-18block: ignore flush requests when storage is cleanEvgeny Yakovlev1-0/+5
Some guests (win2008 server for example) do a lot of unnecessary flushing when underlying media has not changed. This adds additional overhead on host when calling fsync/fdatasync. This change introduces a write generation scheme in BlockDriverState. Current write generation is checked against last flushed generation to avoid unnessesary flushes. The problem with excessive flushing was found by a performance test which does parallel directory tree creation (from 2 processes). Results improved from 0.424 loops/sec to 0.432 loops/sec. Each loop creates 10^3 directories with 10 files in each. This affected some blkdebug testcases that were expecting error logs from failure-injected flushes which are now skipped entirely (tests 026 071 089). This also affects the performance of block jobs and thus BLOCK_JOB_READY events for driver-mirror and active block-commit commands now arrives faster, before QMP send successfully returns to caller (tests 141 144). Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1468870792-7411-5-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: John Snow <jsnow@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com>
2016-07-13commit: Add 'job-id' parameter to 'block-commit'Alberto Garcia1-6/+10
This patch adds a new optional 'job-id' parameter to 'block-commit', allowing the user to specify the ID of the block job to be created. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13stream: Add 'job-id' parameter to 'block-stream'Alberto Garcia1-4/+6
This patch adds a new optional 'job-id' parameter to 'block-stream', allowing the user to specify the ID of the block job to be created. The HMP 'block_stream' command remains unchanged. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13backup: Add 'job-id' parameter to 'blockdev-backup' and 'drive-backup'Alberto Garcia1-3/+5
This patch adds a new optional 'job-id' parameter to 'blockdev-backup' and 'drive-backup', allowing the user to specify the ID of the block job to be created. The HMP 'drive_backup' command remains unchanged. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13mirror: Add 'job-id' parameter to 'blockdev-mirror' and 'drive-mirror'Alberto Garcia1-2/+4
This patch adds a new optional 'job-id' parameter to 'blockdev-mirror' and 'drive-mirror', allowing the user to specify the ID of the block job to be created. The HMP 'drive_mirror' command remains unchanged. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13stream: Fix prototype of stream_start()Alberto Garcia1-5/+6
'stream-start' has a parameter called 'backing-file', which is the string to be written to bs->backing when the job finishes. In the stream_start() implementation it is called 'backing_file_str', but it the prototype in the header file it is called 'base_id'. This patch fixes it so the name is the same in both cases and is consistent with other cases (like commit_start()). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05block: Convert bdrv_co_preadv/pwritev to BdrvChildKevin Wolf1-2/+2
This is the final patch for converting the common I/O path to take a BdrvChild parameter instead of BlockDriverState. The completion of this conversion means that all users that perform I/O on an image need to actually hold a reference (in the form of BdrvChild, possible as part of a BlockBackend) to that image. This also protects against inconsistent use of BlockBackend vs. BlockDriverState functions because direct use of a BlockDriverState isn't possible any more and blk->root is private for block-backends.c. In addition, we can now distinguish different users in the I/O path, and the future op blockers work is going to add assertions based on permissions stored in BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05block: Use bool as appropriate for BDS membersEric Blake1-6/+7
Using int for values that are only used as booleans is confusing. While at it, rearrange a couple of members so that all the bools are contiguous. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05block: Move request_alignment into BlockLimitEric Blake1-9/+13
It makes more sense to have ALL block size limit constraints in the same struct. Improve the documentation while at it. Simplify a couple of conditionals, now that we have audited and documented that request_alignment is always non-zero. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05block: Switch discard length bounds to byte-basedEric Blake1-5/+11
Sector-based limits are awkward to think about; in our on-going quest to move to byte-based interfaces, convert max_discard and discard_alignment. Rename them, using 'pdiscard' as an aid to track which remaining discard interfaces need conversion, and so that the compiler will help us catch the change in semantics across any rebased code. The BlockLimits type is now completely byte-based; and in iscsi.c, sector_limits_lun2qemu() is no longer needed. pdiscard_alignment is made unsigned (we use power-of-2 alignments as bitmasks, where unsigned is easier to think about) while leaving max_pdiscard signed (since we still have an 'int' interface); this is comparable to what commit cf081fc did for write zeroes limits. We may later want to make everything an unsigned 64-bit limit - but that requires a bigger code audit. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05block: Wording tweaks to write zeroes limitsEric Blake1-2/+5
Improve the documentation of the write zeroes limits, to mention additional constraints that drivers should observe. Worth squashing into commit cf081fca, if that hadn't been pushed already :) Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05block: Switch transfer length bounds to byte-basedEric Blake1-5/+8
Sector-based limits are awkward to think about; in our on-going quest to move to byte-based interfaces, convert max_transfer_length and opt_transfer_length. Rename them (dropping the _length suffix) so that the compiler will help us catch the change in semantics across any rebased code, and improve the documentation. Use unsigned values, so that we don't have to worry about negative values and so that bit-twiddling is easier; however, we are still constrained by 2^31 of signed int in most APIs. When a value comes from an external source (iscsi and raw-posix), sanitize the results to ensure that opt_transfer is a power of 2. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-20block: use safe iteration over AioContext notifiersStefan Hajnoczi1-0/+2
It's possible that an AioContext notifier user was close to finishing when .detach_aio_context() or .attached_aio_context() is called. In that case they may call bdrv_remove_aio_context_notifier() during the callback. Use safe iteration to avoid crashing when the notifier list is modified during iteration. We must not only handle the case where the current aio notifier is removed during a callback but also the one where any other aio notifier is removed. The next patch adds an AioContext notifier for block jobs and they really could be terminating just as .detach_aio_context() is invoked. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1466096189-6477-6-git-send-email-stefanha@redhat.com
2016-06-16block/mirror: Fix target backing BDSMax Reitz1-1/+17
Currently, we are trying to move the backing BDS from the source to the target in bdrv_replace_in_backing_chain() which is called from mirror_exit(). However, mirror_complete() already tries to open the target's backing chain with a call to bdrv_open_backing_file(). First, we should only set the target's backing BDS once. Second, the mirroring block job has a better idea of what to set it to than the generic code in bdrv_replace_in_backing_chain() (in fact, the latter's conditions on when to move the backing BDS from source to target are not really correct). Therefore, remove that code from bdrv_replace_in_backing_chain() and leave it to mirror_complete(). Depending on what kind of mirroring is performed, we furthermore want to use different strategies to open the target's backing chain: - If blockdev-mirror is used, we can assume the user made sure that the target already has the correct backing chain. In particular, we should not try to open a backing file if the target does not have any yet. - If drive-mirror with mode=absolute-paths is used, we can and should reuse the already existing chain of nodes that the source BDS is in. In case of sync=full, no backing BDS is required; with sync=top, we just link the source's backing BDS to the target, and with sync=none, we use the source BDS as the target's backing BDS. We should not try to open these backing files anew because this would lead to two BDSs existing per physical file in the backing chain, and we would like to avoid such concurrent access. - If drive-mirror with mode=existing is used, we have to use the information provided in the physical image file which means opening the target's backing chain completely anew, just as it has been done already. If the target's backing chain shares images with the source, this may lead to multiple BDSs per physical image file. But since we cannot reliably ascertain this case, there is nothing we can do about it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160610185750.30956-3-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>