summaryrefslogtreecommitdiff
path: root/block.c
AgeCommit message (Collapse)AuthorFilesLines
2013-04-05block: clean up I/O throttling wait_time codeStefan Hajnoczi1-3/+3
The wait_time variable is in seconds. Reflect this in a comment and use NANOSECONDS_PER_SECOND instead of BLOCK_IO_SLICE_TIME * 10 (which happens to have the right value). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-By: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-04-05block: drop duplicated slice extension codeStefan Hajnoczi1-4/+1
The current slice is extended when an I/O request exceeds the limit. There is no need to extend the slice every time we check a request. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-By: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-04-05block: keep I/O throttling slice time constantStefan Hajnoczi1-10/+9
It is not necessary to adjust the slice time at runtime. We already extend the current slice in order to carry over accounting into the next slice. Changing the actual slice time value introduces oscillations. The guest may experience large changes in throughput or IOPS from one moment to the next when slice times are adjusted. Reported-by: Benoît Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-By: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-04-05block: fix I/O throttling accounting blind spotStefan Hajnoczi1-11/+10
I/O throttling relies on bdrv_acct_done() which is called when a request completes. This leaves a blind spot since we only charge for completed requests, not submitted requests. For example, if there is 1 operation remaining in this time slice the guest could submit 3 operations and they will all be submitted successfully since they don't actually get accounted for until they complete. Originally we probably thought this is okay since the requests will be accounted when the time slice is extended. In practice it causes fluctuations since the guest can exceed its I/O limit and it will be punished for this later on. Account for I/O upon submission so that I/O limits are enforced properly. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-By: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-28block: Fix direct use of protocols as driver for bdrv_open()Kevin Wolf1-11/+15
bdrv_open_common() implements direct use of protocols by copying the pre-opened BlockDriverStates to bs using bdrv_swap(). It did however first set some fields in bs, which end up in file after the swap. When bdrv_open() destroys file, it appears to be open, and because it isn't, qemu could segfault while trying to close it. Reorder the operations to return immediately in such cases so that file is correctly detected as closed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-22block: Allow omitting the file name when using driver-specific optionsKevin Wolf1-8/+41
After this patch, using -drive with an empty file name continues to open the file if driver-specific options are used. If no driver-specific options are specified, the semantics stay as it was: It defines a drive without an inserted medium. In order to achieve this, bdrv_open() must be made safe to work with a NULL filename parameter. The assumption that is made is that only block drivers which implement bdrv_parse_filename() support using driver specific options and could therefore work without a filename. These drivers must make sure to cope with NULL in their implementation of .bdrv_open() (this is only NBD for now). For all other drivers, the block layer code will make sure to error out before calling into their code - they can't possibly work without a filename. Now an NBD connection can be opened like this: qemu-system-x86_64 -drive file.driver=nbd,file.port=1234,file.host=::1 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22block: Rename variable to avoid shadowingKevin Wolf1-7/+9
bdrv_open() uses two different variables called options. Rename one of them to avoid confusion and to allow the outer one to be accessed everywhere. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22block: Introduce .bdrv_parse_filename callbackKevin Wolf1-0/+11
If a driver needs structured data and not just a string, it can provide a .bdrv_parse_filename callback now that parses the command line string into separate options. Keeping this separate from .bdrv_open_filename ensures that the preferred way of directly specifying the options always works as well if parsing the string works. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22block: Pass bdrv_file_open() options to block driversKevin Wolf1-7/+56
Specify -drive file.option=... on the command line to pass the option to the protocol instead of the format driver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22block: Add options QDict to bdrv_file_open() prototypesKevin Wolf1-3/+11
The new parameter is unused yet. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22Revert "block: complete all IOs before .bdrv_truncate"Peter Lieven1-4/+0
brdv_truncate() is also called from readv/writev commands on self- growing file based storage. this will result in requests waiting for theirselves to complete. This reverts commit 9a665b2b8640e464f0a778216fc2dca8d02acf33. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-19block: fix BDRV_O_SNAPSHOT protocol detectionStefan Hajnoczi1-5/+1
realpath(3) is used to get an absolute path to the image file when creating a -drive snapshot=on temporary qcow2. This does not work for protocols since their filenames ("proto:foo:...") do not correspond to file system paths. Commit 7c96d46ec245d73fd76726588409f9abe4bd5dc1 ("Let snapshot work with protocols") skipped realpath(3) for protocols. Later on the "raw" format was introduced and broke the check. Use path_has_protocol(filename) to decide if this image uses a protocol or a filename. Reported-by: Richard Jones <rjones@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15block: add bdrv_get_aio_context()Stefan Hajnoczi1-0/+6
For now bdrv_get_aio_context() is just a stub that calls qemu_aio_get_context() since the block layer is currently tied to the main loop AioContext. Add the stub now so that the block layer can begin accessing its AioContext. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2013-03-15block: Add options QDict to bdrv_open_common()Kevin Wolf1-6/+26
The options are passed down to the block drivers, which are supposed to remove all options they have processed. Anything that is left over in the end is an unknown option and results in an error. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15block: Add options QDict to bdrv_open() prototypeKevin Wolf1-12/+35
It doesn't do anything yet except storing the options QDict in the BlockDriverState. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15block: Add options QDict to .bdrv_open()Kevin Wolf1-2/+2
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-04block: for HMP commit() operations on 'all', skip non-COW drivesJeff Cody1-3/+5
During a commit of 'all' using the HMP non-live commit, the operation is aborted and returns error on the first error enountered. When non-COW drives are in use (e.g. ejected floppy, cdrom, or drives without a backing parent), that means a commit all will return an error of either -ENOMEDIUM or -ENOTSUP. This is not desirable, so for the 'all' commit case, only attempt the commit if both bs->drv and bs->backing_hd are present. More succinctly: 'commit all' now means a commit on all COW drives. This means an individual commit to a specific non-COW drive will still return the appropriate error (-ENOMEDIUM if eject / not present, -ENOTSUP if no backing file). Reported-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-22block: implement BDRV_O_UNMAPPaolo Bonzini1-0/+25
It is better to present homogeneous hardware independent of the storage technology that is chosen on the host, hence we make discard a host parameter; the user can choose whether to pass it down to the image format and protocol, or to ignore it. Using DISCARD with filesystems can cause very severe fragmentation, so it is left default-off for now. This can change later when we implement the "anchor" operation for efficient management of preallocated files. There is still one choice to make: whether DISCARD has an effect on the dirty bitmap or not. I chose yes, though there is a disadvantage: if the guest is buggy and issues discards for data that is in use, there will be no way to migrate storage for that guest without downgrading the machine type to an older one. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-02-22block: complete all IOs before .bdrv_truncatePeter Lieven1-0/+4
bdrv_truncate() invalidates the bdrv_check_request() result for in-flight requests, so there should better be none. Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Reported-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-02-22qemu-img: Add "Quiet mode" optionMiroslav Rezanina1-5/+7
There can be a need to turn output to stdout off. This patch adds a -q option that enable "Quiet mode". In Quiet mode, only errors are printed out. Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-22block: Add synchronous wrapper for bdrv_co_is_allocated_aboveMiroslav Rezanina1-0/+39
There's no synchronous wrapper for bdrv_co_is_allocated_above function so it's not possible to check for sector allocation in an image with a backing file. Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-01block: Fix is_allocated_above with resized filesVishvananda Ishaya1-1/+3
In an image chain, if the base image is smaller than the current image, we need to make sure to use the current images count of unallocated blocks once we get to the end of the base image. Without this change the code will return 0 blocks when it gets to the end of the base image and mirror_run will fail its assertion. Signed-off-by: Vishvananda Ishaya <vishvananda@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-25block: allow customizing the granularity of the dirty bitmapPaolo Bonzini1-7/+10
Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25block: return count of dirty sectors, not chunksPaolo Bonzini1-3/+2
Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25block: make round_to_clusters publicPaolo Bonzini1-8/+8
This is needed in the following patch. Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25block: implement dirty bitmap using HBitmapPaolo Bonzini1-80/+16
This actually uses the dirty bitmap in the block layer, and converts mirroring to use an HBitmapIter. Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts) Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-15block: clear dirty bitmap when discardingPaolo Bonzini1-1/+7
Note that resetting bits in the dirty bitmap is done _before_ actually processing the request. Writes, instead, set bits after the request is completed. This way, when there are concurrent write and discard requests, the outcome will always be that the blocks are marked dirty. This scenario should never happen, but it is safer to do it this way. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15block: fix initialization in bdrv_io_limits_enable()Peter Lieven1-4/+0
bdrv_io_limits_enable() starts a new slice, but does not set io_base correctly for that slice. Here is how io_base is used: bytes_base = bs->nr_bytes[is_write] - bs->io_base.bytes[is_write]; bytes_res = (unsigned) nb_sectors * BDRV_SECTOR_SIZE; if (bytes_base + bytes_res <= bytes_limit) { /* no wait */ } else { /* operation needs to be throttled */ } As a result, any I/O operations that are triggered between now and bs->slice_end are incorrectly limited. If 10 MB of data has been written since the VM was started, QEMU thinks that 10 MB of data has been written in this slice. This leads to a I/O lockup in the guest. We fix this by delaying the start of a new slice to the next call of bdrv_exceed_io_limits(). Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-14block: make qiov_is_aligned() publicStefan Hajnoczi1-0/+16
The qiov_is_aligned() function checks whether a QEMUIOVector meets a BlockDriverState's alignment requirements. This is needed by virtio-blk-data-plane so: 1. Move the function from block/raw-posix.c to block/block.c. 2. Make it public in block/block.h. 3. Rename to bdrv_qiov_is_aligned(). 4. Change return type from int to bool. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-14block: do not probe zero-sized disksPaolo Bonzini1-1/+1
A blank CD or DVD is visible as a zero-sized disks. Probing such disks will lead to an EIO and a failure to start the VM. Treating them as raw is a better solution. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-11Replace remaining gmtime, localtime by gmtime_r, localtime_rStefan Weil1-10/+0
This allows removing of MinGW specific code and improves reentrancy for POSIX hosts. [Removed unused ret variable in qemu_get_timedate() to fix warning: vl.c: In function ‘qemu_get_timedate’: vl.c:451:16: error: variable ‘ret’ set but not used [-Werror=unused-but-set-variable] -- Stefan Hajnoczi] Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-12-19softmmu: move include files to include/sysemu/Paolo Bonzini1-1/+1
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19misc: move include files to include/qemu/Paolo Bonzini1-3/+3
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19monitor: move include files to include/monitor/Paolo Bonzini1-1/+1
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19block: move include files to include/block/Paolo Bonzini1-3/+3
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19qapi: move include files to include/qobject/Paolo Bonzini1-1/+1
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-12qemu-io: Add AIO debugging commandsKevin Wolf1-0/+39
This makes the blkdebug suspend/resume functionality available in qemu-io. Use it like this: $ ./qemu-io blkdebug::/tmp/test.qcow2 qemu-io> break write_aio req_a qemu-io> aio_write 0 4k qemu-io> blkdebug: Suspended request 'req_a' qemu-io> resume req_a blkdebug: Resuming request 'req_a' qemu-io> wrote 4096/4096 bytes at offset 0 4 KiB, 1 ops; 0:00:30.71 (133.359788 bytes/sec and 0.0326 ops/sec) Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11block: bdrv_img_create(): drop unused error handling codeLuiz Capitulino1-35/+5
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11block: bdrv_img_create(): add Error ** argumentLuiz Capitulino1-1/+21
This commit adds an Error ** argument to bdrv_img_create() and set it appropriately on error. Callers of bdrv_img_create() pass NULL for the new argument and still rely on bdrv_img_create()'s return value. Next commits will change callers to use the Error object instead. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11block: Avoid second open for format probingKevin Wolf1-28/+38
This fixes problems that are caused by the additional open/close cycle of the existing format probing, for example related to qemu-nbd without -t option or file descriptor passing. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11block: Factor out bdrv_open_flagsKevin Wolf1-14/+21
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11block: Improve bdrv_aio_co_cancel_emKevin Wolf1-1/+18
Instead of waiting for all requests to complete, wait just for the specific request that should be cancelled. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-11-24block: Fix regression for MinGW (assertion caused by short string)Stefan Weil1-1/+2
The local string tmp_filename is passed to function get_tmp_filename which expects a string with minimum size MAX_PATH for w32 hosts. MAX_PATH is 260 and PATH_MAX is 259, so tmp_filename was too short. Commit eba25057b9a5e19d10ace2bc7716667a31297169 introduced this regression. Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-14aio: rename AIOPool to AIOCBInfoStefan Hajnoczi1-11/+11
Now that AIOPool no longer keeps a freelist, it isn't really a "pool" anymore. Rename it to AIOCBInfo and make it const since it no longer needs to be modified. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-11-14aio: use g_slice_alloc() for AIOCB poolingStefan Hajnoczi1-11/+4
AIO control blocks are frequently acquired and released because each aio request involves at least one AIOCB. Therefore, we pool them to avoid heap allocation overhead. The problem with the freelist approach in AIOPool is thread-safety. If we want BlockDriverStates to associate with AioContexts that execute in multiple threads, then a global freelist becomes a problem. This patch drops the freelist and instead uses g_slice_alloc() which is tuned for per-thread fixed-size object pools. qemu_aio_get() and qemu_aio_release() are now thread-safe. Note that the change from g_malloc0() to g_slice_alloc() should be safe since the freelist reuse case doesn't zero the AIOCB either. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-29Merge remote-tracking branch 'kwolf/for-anthony' into stagingAnthony Liguori1-97/+210
* kwolf/for-anthony: (32 commits) osdep: Less restrictive F_SEFL in qemu_dup_flags() qemu-iotests: add testcases for mirroring on-source-error/on-target-error qmp: add pull_event function mirror: add support for on-source-error/on-target-error iostatus: forward block_job_iostatus_reset to block job qemu-iotests: add mirroring test case mirror: implement completion qmp: add drive-mirror command mirror: introduce mirror job block: introduce BLOCK_JOB_READY event block: add block-job-complete block: rename block_job_complete to block_job_completed block: export dirty bitmap information in query-block block: introduce new dirty bitmap functionality block: add bdrv_open_backing_file block: add bdrv_query_stats block: add bdrv_query_info qemu-config: Add new -add-fd command line option monitor: Prevent removing fd from set during init monitor: Enable adding an inherited fd to an fd set ... Conflicts: vl.c Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-10-24iostatus: forward block_job_iostatus_reset to block jobPaolo Bonzini1-0/+3
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24block: export dirty bitmap information in query-blockPaolo Bonzini1-0/+7
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24block: introduce new dirty bitmap functionalityPaolo Bonzini1-6/+45
Assert that write_compressed is never used with the dirty bitmap. Setting the bits early is wrong, because a coroutine might concurrently examine them and copy incomplete data from the source. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24block: add bdrv_open_backing_filePaolo Bonzini1-18/+38
Mirroring runs without the backing file so that it can be copied outside QEMU. However, we need to add it at the time the job is completed and QEMU switches to the target. Factor out the common bits of opening an image and completing a mirroring operation. The new function does not assume that the file is closed immediately after it returns failure, so it keeps the BDRV_O_NO_BACKING flag up-to-date. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>