summaryrefslogtreecommitdiff
path: root/block.c
AgeCommit message (Collapse)AuthorFilesLines
2016-05-19block: Use BdrvChild callbacks for change_media/resizeKevin Wolf1-13/+26
We want to get rid of BlockDriverState.blk in order to allow multiple BlockBackends per BDS. Converting the device callbacks in block.c (which assume a single BlockBackend) to per-child callbacks gets us rid of the first few instances. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19Revert "block: Forbid I/O throttling on nodes with multiple parents for 2.6"Kevin Wolf1-6/+0
This reverts commit 76b223200ef4fb09dd87f0e213159795eb68e7a5. Now that I/O throttling is fully done on the BlockBackend level, there is no reason any more to block I/O throttling for nodes with multiple parents as the parents don't influence each other any more. Conflicts: block.c Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19block: Remove bdrv_move_feature_fields()Kevin Wolf1-30/+0
bdrv_move_feature_fields() and swap_feature_fields() are empty now, they can be removed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19block: Decouple throttling from BlockDriverStateKevin Wolf1-34/+0
This moves the throttling related part of the BDS life cycle management to BlockBackend. The throttling group reference is now kept even when no medium is inserted. With this commit, throttling isn't disabled and then re-enabled any more during graph reconfiguration. This fixes the temporary breakage of I/O throttling when used with live snapshots or block jobs that manipulate the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19block: Move I/O throttling configuration functions to BlockBackendKevin Wolf1-1/+1
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19block: Move throttling fields from BDS to BBKevin Wolf1-11/+11
This patch changes where the throttling state is stored (used to be the BlockDriverState, now it is the BlockBackend), but it doesn't actually make it a BB level feature yet. For example, throttling is still disabled when the BDS is detached from the BB. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19block: Make sure throttled BDSes always have a BBKevin Wolf1-0/+14
It was already true in principle that a throttled BDS always has a BB attached, except that the order of operations while attaching or detaching a BDS to/from a BB wasn't careful enough. This commit breaks graph manipulations while I/O throttling is enabled. It would have been possible to keep things working with some temporary hacks, but quite cumbersome, so it's not worth the hassle. We'll fix things again in a minute. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-12quorum: implement bdrv_add_child() and bdrv_del_child()Wen Congyang1-4/+4
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Message-id: 1462865799-19402-3-git-send-email-xiecl.fnst@cn.fujitsu.com Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12Add new block driver interface to add/delete a BDS's childWen Congyang1-0/+49
In some cases, we want to take a quorum child offline, and take another child online. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1462865799-19402-2-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12block: Inactivate all childrenFam Zheng1-11/+36
Currently we only inactivate the top BDS. Actually bdrv_inactivate should be the opposite of bdrv_invalidate_cache. Recurse into the whole subtree instead. Because a node may have multiple parents, and because once BDRV_O_INACTIVE is set for a node, further writes are not allowed, we cannot interleave flag settings and .bdrv_inactivate calls (that may submit write to other nodes in a graph) within a single pass. Therefore two passes are used here. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12block: Invalidate all childrenFam Zheng1-6/+14
Currently we only recurse to bs->file, which will miss the children in quorum and VMDK. Recurse into the whole subtree to avoid that. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12block: Remove BlockDriver.bdrv_read/writeKevin Wolf1-2/+0
There are no block drivers left that implement the old .bdrv_read/write interface, so it can be removed now. This gets us rid of the corresponding emulation functions, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12block: introduce bdrv_no_throttling_begin/endPaolo Bonzini1-1/+0
Extract the handling of throttling from bdrv_flush_io_queue. These new functions will soon become BdrvChildRole callbacks, as they can be generalized to "beginning of drain" and "end of drain". Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-05block: Forbid I/O throttling on nodes with multiple parents for 2.6Kevin Wolf1-0/+7
As the patches to move I/O throttling to BlockBackend didn't make it in time for the 2.6 release, but the release adds new ways of configuring VMs whose behaviour would change once the move is done, we need to outlaw such configurations temporarily. The problem exists whenever a BDS has more users than just its BB, for example it is used as a backing file for another node. (This wasn't possible in 2.5 yet as we introduced node references to specify a backing file only recently.) In these cases, the throttling would apply to these other users now, but after moving throttling to the BlockBackend the other users wouldn't be throttled any more. This patch prevents making new references to a throttled node as well as using monitor commands to throttle a node with multiple parents. Compared to 2.5 this changes behaviour in some corner cases where references were allowed before, like bs->file or Quorum children. It seems reasonable to assume that users didn't use I/O throttling on such low level nodes. With the upcoming move of throttling into BlockBackend, such configurations won't be possible anyway. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-30block: Remove bdrv_(set_)enable_write_cache()Kevin Wolf1-16/+0
The only remaining users were block jobs (mirror and backup) which unconditionally enabled WCE on the BlockBackend of the target image. As these block jobs don't go through BlockBackend for their I/O requests, they aren't affected by this setting anyway but always get a writeback mode, so that call can be removed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30block: Remove BDRV_O_CACHE_WBKevin Wolf1-46/+2
The previous patches have successively made blk->enable_write_cache the true source for the information whether a writethrough mode must be implemented. The corresponding BDRV_O_CACHE_WB is only useless baggage we're carrying around, so now's the time to remove it. At the same time, we remove the 'cache.writeback' option parsing on the BDS level as the only effect was setting the BDRV_O_CACHE_WB flag. This change requires test cases that explicitly enabled the option to drop it. Other than that and the change of the error message when writethrough is enabled on the BDS level (from "Can't set writethrough mode" to "doesn't support the option"), there should be no change in behaviour. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30block: Remove bdrv_parse_cache_flags()Kevin Wolf1-22/+7
All users are converted to bdrv_parse_cache_mode() now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30qemu-io: Use bdrv_parse_cache_mode() in reopen_f()Kevin Wolf1-12/+6
We must forbid changing the WCE flag in bdrv_reopen() in the same patch, as otherwise the behaviour would change so that the flag takes precedence over the explicitly specified option. The correct value of the WCE flag depends on the BlockBackend user (e.g. guest device) and isn't a decision that the QMP client makes, so this change is what we want. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30block/qapi: Use blk_enable_write_cache()Kevin Wolf1-1/+1
Now that WCE is handled on the BlockBackend level, the flag is meaningless for BDSes. As the schema requires us to fill the field, we return an enabled write cache for them. Note that this means that querying the BlockBackend name may return writethrough as the cache information, whereas querying the node-name of the root of that same BlockBackend will return writeback. This may appear odd at first, but it actually makes sense because it correctly repesents the layer that implements the WCE handling. This becomes more apparent when you consider nodes that are the root node of multiple BlockBackends, where each BB can have its own WCE setting. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30block: Move enable_write_cache to BB levelKevin Wolf1-9/+17
Whether a write cache is used or not is a decision that concerns the user (e.g. the guest device) rather than the backend. It was already logically part of the BB level as bdrv_move_feature_fields() always kept it on top of the BDS tree; with this patch, the core of it (the actual flag and the additional flushes) is also implemented there. Direct callers of bdrv_open() must pass BDRV_O_CACHE_WB now if bs doesn't have a BlockBackend attached. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30block: Add bdrv_parse_cache_mode()Kevin Wolf1-0/+17
It's like bdrv_parse_cache_flags(), except that writethrough mode isn't included in the flags, but returned as a separate bool. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30block: move encryption deprecation warning into qcow codeDaniel P. Berrange1-7/+5
For a couple of releases we have been warning Encrypted images are deprecated Support for them will be removed in a future release. You can use 'qemu-img convert' to convert your image to an unencrypted one. This warning was issued by system emulators, qemu-img, qemu-nbd and qemu-io. Such a broad warning was issued because the original intention was to rip out all the code for dealing with encryption inside the QEMU block layer APIs. The new block encryption framework used for the LUKS driver does not rely on the unloved block layer API for encryption keys, instead using the QOM 'secret' object type. It is thus no longer appropriate to warn about encryption unconditionally. When the qcow/qcow2 drivers are converted to use the new encryption framework too, it will be practical to keep AES-CBC support present for use in qemu-img, qemu-io & qemu-nbd to allow for interoperability with older QEMU versions and liberation of data from existing encrypted qcow2 files. This change moves the warning out of the generic block code and into the qcow/qcow2 drivers. Further, the warning is set to only appear when running the system emulators, since qemu-img, qemu-io, qemu-nbd are expected to support qcow2 encryption long term now that the maint burden has been eliminated. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30block: add flag to indicate that no I/O will be performedDaniel P. Berrange1-2/+3
When opening an image it is useful to know whether the caller intends to perform I/O on the image or not. In the case of encrypted images this will allow the block driver to avoid having to prompt for decryption keys when we merely want to query header metadata about the image. eg qemu-img info This flag is enforced at the top level only, since even if we don't want todo I/O on the 'qcow2' file payload, the underlying 'file' driver will still need todo I/O to read the qcow2 header, for example. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30block: Reject writethrough mode except at the rootKevin Wolf1-0/+7
Writethrough mode is going to become a BlockBackend feature rather than a BDS one, so forbid it in places where we won't be able to support it when the code finally matches the envisioned design. We only allowed setting the cache mode of non-root nodes after the 2.5 release, so we're still free to make this change. The target of block jobs is now always opened in a writeback mode because it doesn't have a BlockBackend attached. This makes more sense anyway because block jobs know when to flush. If the graph is modified on job completion, the original cache mode moves to the new root, so for the guest device writethough always stays enabled if it was configured this way. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-30block: Make backing files always writebackKevin Wolf1-2/+3
First of all, we're generally not writing to backing files, but when we do, it's in the context of block jobs which know very well when to flush the image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-30block: Remove dirty bitmaps from bdrv_move_feature_fields()Kevin Wolf1-3/+0
This patch changes dirty bitmaps from following a BlockBackend in graph changes to sticking with the node they were created at. For the full discussion, read the following mailing list thread: [Qemu-block] block: Dirty bitmaps and COR in bdrv_move_feature_fields() https://lists.nongnu.org/archive/html/qemu-block/2016-02/msg00745.html In summary, the justification for this change is: * When moving the dirty bitmap to the top of the tree was introduced in bdrv_append() in commit a9fc4408, it didn't actually have any effect because there could never be a bitmap in use when bdrv_append() was called (op blockers would prevent this). This is still true today for all internal uses of dirty bitmaps. * Support for user-defined dirty bitmaps was introduced in 2.4, but we discouraged users from using it because we didn't consider it ready yet. Moreover, in 2.5, the bdrv_swap() removal introduced a bug that left dangling pointers if a dirty bitmap was present (the anchors of the dirty bitmap were swapped, but the back link in the first element wasn't updated), so it didn't even work correctly. * block-dirty-bitmap-add takes an arbitrary node name, even if no BlockBackend is attached. This suggests that it is a node level operation and not a BlockBackend one. Consequently, there is no reason for dirty bitmaps to stay with a BlockBackend that was attached to the node they were created for. * It was suggested that block-dirty-bitmap-add could track the node if a node name was specified, and track the BlockBackend if the device name was specified. This would however be inconsistent with other QMP commands. Commands that accept both device and node names currently interpret the device name just as an alias for the current root node of that BlockBackend. * Dirty bitmaps have a name that is only unique amongst the bitmaps in a specific node. Moving bitmaps could lead to name clashes. Automatic renaming would involve too much magic. * Persistent bitmaps are stored in a specific node. Moving them around automatically might be at least surprising, but it would probably also become a real problem because that would have to happen atomically without the management tool knowing of the operation. At the end of the day it seems to be very clear that it was a mistake to include dirty bitmaps in bdrv_move_feature_fields(). The functionality of moving bitmaps and/or attaching them to a BlockBackend instead will probably be needed, but it should be done with a new explicit QMP command or option. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-30block: Remove copy-on-read from bdrv_move_feature_fields()Kevin Wolf1-2/+0
Ever since we first introduced bdrv_append() in commit 8802d1fd ('qapi: Introduce blockdev-group-snapshot-sync command'), the copy-on-read flag was moved to the new top layer when taking a snapshot. The only problem is that it doesn't make a whole lot of sense. The use case for manually enabled CoR is to avoid reading data twice from a slow remote image, so we want to save it to a local overlay, say an ISO image accessed via HTTP to a local qcow2 overlay. When taking a snapshot, we end up with a backing chain like this: http <- local.qcow2 <- snap_overlay.qcow2 There is no point in doing CoR from local.qcow2 into snap_overlay.qcow2, we just want to keep copying data from the remote source into local.qcow2. The other use case of CoR is in the context of streaming, which isn't very interesting for bdrv_move_feature_fields() because op blockers prevent this combination. This patch makes the copy-on-read flag stay on the image for which it was originally set and prevents it from being propagated to the new overlay. It is no longer intended to move CoR to the BlockBackend level. In order for this to make sense, we also need to keep the respective image read-write. As a side effect of these changes, creating a live snapshot image (as opposed to using an existing externally created one) on top of a COR block device works now. It used to fail because it tried to open its backing file both read-only and with COR. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-30block: Remove bdrv_make_anon()Kevin Wolf1-12/+3
The call in hmp_drive_del() is dead code because blk_remove_bs() is called a few lines above. The only other remaining user is bdrv_delete(), which only abuses bdrv_make_anon() to remove it from the named nodes list. This path inlines the list entry removal into bdrv_delete() and removes bdrv_make_anon(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-22util: move declarations out of qemu-common.hVeronia Bahaa1-1/+2
Move declarations out of qemu-common.h for functions declared in utils/ files: e.g. include/qemu/path.h for utils/path.c. Move inline functions out of qemu-common.h and into new files (e.g. include/qemu/bcd.h) Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-17block: Use BdrvChild in BlockBackendKevin Wolf1-13/+34
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17block: Remove bdrv_states listMax Reitz1-28/+3
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17block: Use bdrv_next() instead of bdrv_statesMax Reitz1-6/+6
There is no point in manually iterating through the bdrv_states list when there is bdrv_next(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17block: Rewrite bdrv_next()Max Reitz1-3/+14
Instead of using the bdrv_states list, iterate over all the BlockDriverStates attached to BlockBackends, and over all the monitor-owned BDSs afterwards (except for those attached to a BB). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17block: Move some bdrv_*_all() functions to BBMax Reitz1-20/+0
Move bdrv_commit_all() and bdrv_flush_all() to the BlockBackend level. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17block: Drop BB name from bad option errorMax Reitz1-3/+3
The information which BB is concerned does not seem useful enough to justify its existence in most other place (which may be related to qemu printing the -drive parameter in question anyway, and for blockdev-add the attribution is naturally unambiguous). Furthermore, as of a future patch, bdrv_get_device_name(bs) will always return the empty string before bdrv_open_inherit() returns. Therefore, just dropping that information seems to be the best course of action. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14block: Move block dirty bitmap code to separate filesFam Zheng1-360/+0
The only code change is making bdrv_dirty_bitmap_truncate public. It is used in block.c. Also two long lines (bdrv_get_dirty) are wrapped. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1457412306-18940-5-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14block: Fix snapshot=on cache modesKevin Wolf1-11/+25
Since commit 91a097e, we end up with a somewhat weird cache mode configuration with snapshot=on: The commit broke the cache mode inheritance for the snapshot overlay so that it is opened as writethrough instead of unsafe now. The following bdrv_append() call to put it on top of the tree swaps the WCE flag with the snapshot's backing file (i.e. the originally given file), so what we eventually get is cache=writeback on the temporary overlay and cache=writethrough,cache.no-flush=on on the real image file. This patch changes things so that the temporary overlay gets cache=unsafe again like it used to, and the real images get whatever the user specified. This means that cache.direct is now respected even with snapshot=on, and in the case of committing changes, the final flush is no longer ignored except explicitly requested by the user. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-02-22block: Fix -incoming with snapshot=onKevin Wolf1-4/+0
The BDRV_O_INACTIVE flag should only be set for images explicitly opened by the user. snapshot=on needs to create a new qcow2 image and write some metadata to it. This is not a problem because it can't come from the source, so there's no reason to mark it as BDRV_O_INACTIVE, even though it is opened while waiting for the migration to complete. This fixes an assertion failure when -incoming and snapshot=on are combined. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-04all: Clean up includesPeter Maydell1-3/+1
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1454089805-5470-16-git-send-email-peter.maydell@linaro.org
2016-02-02block: set device_list.tqe_prev to NULL on BDS removalJeff Cody1-10/+14
This fixes a regression introduced with commit 3f09bfbc7. Multiple bugs arise in conjunction with live snapshots and mirroring operations (which include active layer commit). After a live snapshot occurs, the active layer and the base layer both have a non-NULL tqe_prev field in the device_list, although the base node's tqe_prev field points to a NULL entry. This non-NULL tqe_prev field occurs after the bdrv_append() in the external snapshot calls change_parent_backing_link(). In change_parent_backing_link(), when the previous active layer is removed from device_list, the device_list.tqe_prev pointer is not set to NULL. The operating scheme in the block layer is to indicate that a BDS belongs in the bdrv_states device_list iff the device_list.tqe_prev pointer is non-NULL. This patch does two things: 1.) Introduces a new block layer helper bdrv_device_remove() to remove a BDS from the device_list, and 2.) uses that new API, which also fixes the regression once used in change_parent_backing_link(). Signed-off-by: Jeff Cody <jcody@redhat.com> Message-id: 0cd51e11c0666c04ddb7c05293fe94afeb551e89.1454376655.git.jcody@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-02-02block: Rewrite bdrv_close_all()Max Reitz1-8/+26
This patch rewrites bdrv_close_all(): Until now, all root BDSs have been force-closed. This is bad because it can lead to cached data not being flushed to disk. Instead, try to make all reference holders relinquish their reference voluntarily: 1. All BlockBackend users are handled by making all BBs simply eject their BDS tree. Since a BDS can never be on top of a BB, this will not cause any of the issues as seen with the force-closing of BDSs. The references will be relinquished and any further access to the BB will fail gracefully. 2. All BDSs which are owned by the monitor itself (because they do not have a BB) are relinquished next. 3. Besides BBs and the monitor, block jobs and other BDSs are the only things left that can hold a reference to BDSs. After every remaining block job has been canceled, there should not be any BDSs left (and the loop added here will always terminate (as long as NDEBUG is not defined), because either all_bdrv_states will be empty or there will not be any block job left to cancel, failing the assertion). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-02block: Add list of all BlockDriverStatesMax Reitz1-0/+7
We need this list so that bdrv_close_all() can keep track of which BDSs are still open after having removed the BDSs from all of the BBs and having released all monitor BDS references. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-02block: Make bdrv_close() staticMax Reitz1-1/+3
There are no users of bdrv_close() left, except for one of bdrv_open()'s failure paths, bdrv_close_all() and bdrv_delete(), and that is good. Make bdrv_close() static so nobody makes the mistake of directly using bdrv_close() again. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-02block: Remove BDS close notifierMax Reitz1-8/+0
It is unused now, so we can remove it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-02-02block: Release named dirty bitmaps in bdrv_close()Max Reitz1-8/+31
bdrv_delete() is not very happy about deleting BlockDriverStates with dirty bitmaps still attached to them. In the past, we got around that very easily by relying on bdrv_close_all() bypassing bdrv_delete(), and bdrv_close() simply ignoring that condition. We should fix that by releasing all named dirty bitmaps in bdrv_close() (there should not be any unnamed bitmaps left) and moving the assertion from bdrv_delete() there. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-01-20block: Inactivate BDS when migration completesKevin Wolf1-0/+34
So far, live migration with shared storage meant that the image is in a not-really-ready don't-touch-me state on the destination while the source is still actively using it, but after completing the migration, the image was fully opened on both sides. This is bad. This patch adds a block driver callback to inactivate images on the source before completing the migration. Inactivation means that it goes to a state as if it was just live migrated to the qemu instance on the source (i.e. BDRV_O_INACTIVE is set). You're then supposed to continue either on the source or on the destination, which takes ownership of the image. A typical migration looks like this now with respect to disk images: 1. Destination qemu is started, the image is opened with BDRV_O_INACTIVE. The image is fully opened on the source. 2. Migration is about to complete. The source flushes the image and inactivates it. Now both sides have the image opened with BDRV_O_INACTIVE and are expecting the other side to still modify it. 3. One side (the destination on success) continues and calls bdrv_invalidate_all() in order to take ownership of the image again. This removes BDRV_O_INACTIVE on the resuming side; the flag remains set on the other side. This ensures that the same image isn't written to by both instances (unless both are resumed, but then you get what you deserve). This is important because .bdrv_close for non-BDRV_O_INACTIVE images could write to the image file, which is definitely forbidden while another host is using the image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>
2016-01-20block: Rename BDRV_O_INCOMING to BDRV_O_INACTIVEKevin Wolf1-5/+5
Instead of covering only the state of images on the migration destination before the migration is completed, the flag will also cover the state of images on the migration source after completion. This common state implies that the image is technically still open, but no writes will happen and any cached contents will be reloaded from disk if and when the image leaves this state. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-01-20block: Fix error path in bdrv_invalidate_cache()Kevin Wolf1-0/+2
We can only clear BDRV_O_INCOMING if the caches were actually invalidated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-01-19block: Fix .bdrv_open flagsKevin Wolf1-6/+7
bdrv_common_open() modified bs->open_flags after inferring the set of options to pass to the driver's .bdrv_open callback. This means that the cache options were correctly set in bs->open_flags (and therefore correctly displayed in 'info block'), but the image would actually be opened with the default cache mode instead. This patch removes the flags parameter to bdrv_common_open() (except for BDRV_O_NO_BACKING it's the same as bs->open_flags anyway, and having two names for the same thing is confusing), and moves the assignment of open_flags down to immediately before calling into the block drivers. In all other places, bs->open_flags is now used consistently. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-01-13error: Use error_prepend() where it makes obvious senseMarkus Armbruster1-11/+8
Done with this Coccinelle semantic patch @@ expression FMT, E1, E2; expression list ARGS; @@ - error_setg(E1, FMT, ARGS, error_get_pretty(E2)); + error_propagate(E1, E2);/*###*/ + error_prepend(E1, FMT/*@@@*/, ARGS); followed by manual cleanup, first because I can't figure out how to make Coccinelle transform strings, and second to get rid of now superfluous error_propagate(). We now use or propagate the original error whole instead of just its message obtained with error_get_pretty(). This avoids suppressing its hint (see commit 50b7b00), but I can't see how the errors touched in this commit could come with hints. It also improves the message printed with &error_abort when we screw up (see commit 1e9b65b). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>