summaryrefslogtreecommitdiff
path: root/block/linux-aio.c
AgeCommit message (Collapse)AuthorFilesLines
2016-07-18linux-aio: prevent submitting more than MAX_EVENTSRoman Pen1-10/+16
Invoking io_setup(MAX_EVENTS) we ask kernel to create ring buffer for us with specified number of events. But kernel ring buffer allocation logic is a bit tricky (ring buffer is page size aligned + some percpu allocation are required) so eventually more than requested events number is allocated. From a userspace side we have to follow the convention and should not try to io_submit() more or logic, which consumes completed events, should be changed accordingly. The pitfall is in the following sequence: MAX_EVENTS = 128 io_setup(MAX_EVENTS) io_submit(MAX_EVENTS) io_submit(MAX_EVENTS) /* now 256 events are in-flight */ io_getevents(MAX_EVENTS) = 128 /* we can handle only 128 events at once, to be sure * that nothing is pended the io_getevents(MAX_EVENTS) * call must be invoked once more or hang will happen. */ To prevent the hang or reiteration of io_getevents() call this patch restricts the number of in-flights, which is now limited to MAX_EVENTS. Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468415004-31755-1-git-send-email-roman.penyaev@profitbricks.com Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: qemu-devel@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18linux-aio: share one LinuxAioState within an AioContextPaolo Bonzini1-4/+6
This has better performance because it executes fewer system calls and does not use a bottom half per disk. Originally proposed by Ming Lei. [Changed #include "raw-aio.h" to "block/raw-aio.h" in win32-aio.c to fix build error as reported by Peter Maydell <peter.maydell@linaro.org>. --Stefan] Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1467650000-51385-1-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> squash! linux-aio: share one LinuxAioState within an AioContext
2016-07-13coroutine: move entry argument to qemu_coroutine_createPaolo Bonzini1-1/+1
In practice the entry argument is always known at creation time, and it is confusing that sometimes qemu_coroutine_enter is used with a non-NULL argument to re-enter a coroutine (this happens in block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value at creation time, for consistency with e.g. aio_bh_new. Mostly done with the following semantic patch: @ entry1 @ expression entry, arg, co; @@ - co = qemu_coroutine_create(entry); + co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry2 @ expression entry, arg; identifier co; @@ - Coroutine *co = qemu_coroutine_create(entry); + Coroutine *co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry3 @ expression entry, arg; @@ - qemu_coroutine_enter(qemu_coroutine_create(entry), arg); + qemu_coroutine_enter(qemu_coroutine_create(entry, arg)); @ reentry @ expression co; @@ - qemu_coroutine_enter(co, NULL); + qemu_coroutine_enter(co); except for the aforementioned few places where the semantic patch stumbled (as expected) and for test_co_queue, which would otherwise produce an uninitialized variable warning. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05block: fix return code for partial write for Linux AIODenis V. Lunev1-1/+1
Partial write most likely means that there is not space rather than "something wrong happens". Thus it would be more natural to return ENOSPC rather than EINVAL. The problem actually happens with NBD server, which has reported EINVAL rather then ENOSPC on the first error using its protocol, which makes report to the user wrong. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Pavel Borzenkov <pborzenkov@virtuozzo.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-16linux-aio: Cancel BH if not neededKevin Wolf1-1/+3
linux-aio uses a BH in order to make sure that the remaining completions are processed even in nested event loops of completion callbacks in order to avoid deadlocks. There is no need, however, to have the BH overhead for the first call into qemu_laio_completion_bh() or after all pending completions have already been processed. Therefore, this patch calls directly into qemu_laio_completion_bh() in qemu_laio_completion_cb() and cancels the BH after qemu_laio_completion_bh() has processed all pending completions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16raw-posix: Implement .bdrv_co_preadv/pwritevKevin Wolf1-5/+2
The raw-posix block driver actually supports byte-aligned requests now on non-O_DIRECT images, like it already (and previously incorrectly) claimed in bs->request_alignment. For some block drivers this means that a RMW cycle can be avoided when they write sub-sector metadata e.g. for cluster allocation. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16raw-posix: Switch to bdrv_co_* interfacesKevin Wolf1-22/+65
In order to use the modern byte-based .bdrv_co_preadv/pwritev() interface, this patch switches raw-posix to coroutine-based interfaces as a first step. In terms of semantics and performance, it doesn't make a difference with the existing code whether we go from a coroutine to a callback-based interface already in block/io.c or only in linux-aio.c As there have been concerns in the past that this change may be a step in the wrong direction with respect to a possible AIO fast path, the old callback-based interface for linux-aio is left around and can be reactivated when a fast path (e.g. directly from virtio-blk dataplane, bypassing the whole block layer) is implemented. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-12linux-aio: make it more type safePaolo Bonzini1-29/+17
Replace void* with an opaque LinuxAioState type. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12block: plug whole tree at once, introduce bdrv_io_unplugged_begin/endPaolo Bonzini1-8/+5
Extract the handling of io_plug "depth" from linux-aio.c and let the main bdrv_drain loop do nothing but wait on I/O. Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug now operate on all children. The visit order is now symmetrical between plug and unplug, making it possible for formats to implement plug/unplug. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-01-20block: Clean up includesPeter Maydell1-0/+1
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23aio: Add "is_external" flag for event handlersFam Zheng1-2/+3
All callers pass in false, and the real external ones will switch to true in coming patches. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-12linux-aio: simplify removal of completed iocbs from the listPaolo Bonzini1-6/+6
There is no need to do another O(n) pass on the list; the iocb to split the list at is already available through the array we passed to io_submit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1418305950-30924-6-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12linux-aio: drop return code from laio_io_unplug and ioq_submitPaolo Bonzini1-10/+5
These are unused. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1418305950-30924-5-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12linux-aio: rename LaioQueue idx field to "n"Paolo Bonzini1-6/+6
It does not identify an index in an array anymore. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1418305950-30924-4-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12linux-aio: track whether the queue is blockedPaolo Bonzini1-20/+27
Avoid that unplug submits requests when io_submit reported that it couldn't accept more; at the same time, try more io_submit calls if it could handle the whole set of requests that were passed, so that the "blocked" flag is reset as soon as possible. After the previous patch, laio_submit already tried to avoid submitting requests to a blocked queue, by comparing s->io_q.idx with "==" instead of the more natural ">=". Switch to the simpler expression now that we have the "blocked" flag. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1418305950-30924-3-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12linux-aio: queue requests that cannot be submittedPaolo Bonzini1-42/+33
Keep a queue of requests that were not submitted; pass them to the kernel when a completion is reported, unless the queue is plugged. The array of iocbs is rebuilt every time from scratch. This avoids keeping the iocbs array and list synchronized. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1418305950-30924-2-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-10-20block: Rename BlockDriverCompletionFunc to BlockCompletionFuncMarkus Armbruster1-1/+1
I'll use it with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20block: Rename BlockDriverAIOCB* to BlockAIOCB*Markus Armbruster1-3/+3
I'll use BlockDriverAIOCB with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-22block: Rename qemu_aio_release -> qemu_aio_unrefFam Zheng1-2/+2
Suggested-by: Benoît Canet <benoit.canet@irqsave.net> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22linux-aio: Convert laio_aiocb_info.cancel to .cancel_asyncFam Zheng1-22/+8
Just call io_cancel (2), if it fails, it means the request is not canceled, so the event loop will eventually call qemu_laio_process_completion. In qemu_laio_process_completion, change to call the cb unconditionally. It is required by bdrv_aio_cancel_async. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29linux-aio: avoid deadlock in nested aio_poll() callsStefan Hajnoczi1-16/+55
If two Linux AIO request completions are fetched in the same io_getevents() call, QEMU will deadlock if request A's callback waits for request B to complete using an aio_poll() loop. This was reported to happen with the mirror blockjob. This patch moves completion processing into a BH and makes it resumable. Nested event loops can resume completion processing so that request B will complete and the deadlock will not occur. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ming Lei <ming.lei@canonical.com> Cc: Marcin Gibuła <m.gibula@beyond.pl> Reported-by: Marcin Gibuła <m.gibula@beyond.pl> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Marcin Gibuła <m.gibula@beyond.pl>
2014-07-15linux-aio: Fix laio resource leakGonglei1-0/+5
when hotplug virtio-scsi disks using laio, the aio_nr will increase in laio_init() by io_setup(), we can see the number by # cat /proc/sys/fs/aio-nr 128 if the aio_nr attach the maxnum, which found from # cat /proc/sys/fs/aio-max-nr 65536 the hotplug process will fail because of aio context leak. Fix it by io_destroy in laio_cleanup(). Reported-by: daifulai <daifulai@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-07linux-aio: implement io plug, unplug and flush io queueMing Lei1-2/+94
This patch implements .bdrv_io_plug, .bdrv_io_unplug and .bdrv_flush_io_queue callbacks for linux-aio Block Drivers, so that submitting I/O as a batch can be supported on linux-aio. [Unprocessed requests are completed with -EIO instead of a bogus ret value. --Stefan] Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-04block/linux-aio: fix memory and fd leakStefan Hajnoczi1-0/+8
Hot unplugging -drive aio=native,file=test.img,format=raw images leaves the Linux AIO event notifier and struct qemu_laio_state allocated. Luckily nothing will use the event notifier after the BlockDriverState has been closed so the handler function is never called. It's still worth fixing this resource leak. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-04block/raw-posix: implement .bdrv_detach/attach_aio_context()Stefan Hajnoczi1-2/+14
Drop the assumption that we're using the main AioContext for Linux AIO. Convert the Linux AIO event notifier to use aio_set_event_notifier(). The .bdrv_detach/attach_aio_context() interfaces also need to be implemented to move the event notifier handler from the old to the new AioContext. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19aio: drop io_flush argumentStefan Hajnoczi1-2/+1
The .io_flush() handler no longer exists and has no users. Drop the io_flush argument to aio_set_fd_handler() and related functions. The AioFlushEventNotifierHandler and AioFlushHandler typedefs are no longer used and are dropped too. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19block/linux-aio: drop qemu_laio_completion_cb()Stefan Hajnoczi1-15/+2
.io_flush() is no longer called so drop qemu_laio_completion_cb(). It turns out that count is now unused so drop that too. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-12-19misc: move include files to include/qemu/Paolo Bonzini1-2/+2
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19block: move include files to include/block/Paolo Bonzini1-1/+1
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-11-14aio: rename AIOPool to AIOCBInfoStefan Hajnoczi1-2/+2
Now that AIOPool no longer keeps a freelist, it isn't really a "pool" anymore. Rename it to AIOCBInfo and make it const since it no longer needs to be modified. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-31raw-posix: move linux-aio.c to block/Paolo Bonzini1-0/+216
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>