summaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)AuthorFilesLines
2012-07-09blkdebug: store list of active rulesPaolo Bonzini1-38/+31
This prepares for the next patch, where some active rules may actually not trigger depending on input to readv/writev. Store the active rules in a SIMPLEQ (so that it can be emptied easily with QSIMPLEQ_INIT), and fetch the errno/once/immediately arguments from there. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09blkdebug: pass getlength to underlying filePaolo Bonzini1-0/+6
This is required when using blkdebug with raw format. Unlike qcow2/QED, raw asks blkdebug for the length of the file, it doesn't get it from a header. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09blkdebug: tiny cleanupPaolo Bonzini1-6/+2
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09blkdebug: remove sync i/o eventsPaolo Bonzini2-3/+1
These are unused, except (by mistake more or less) in QED. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09sheepdog: traverse pending_list from the first for each timeMORITA Kazutaka1-6/+16
The pending list can be modified in other coroutine context sd_co_rw_vector, so we need to traverse the list from the first again after we send the pending request. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09sheepdog: split outstanding list into inflight and pendingMORITA Kazutaka1-25/+24
outstanding_list_head is used for both pending and inflight requests. This patch splits it and improves readability. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09sheepdog: make sure we don't free aiocb before sending all requestsMORITA Kazutaka1-13/+16
This patch increments the pending counter before sending requests, and make sures that aiocb is not freed while sending them. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09sheepdog: use coroutine based socket functions in coroutine contextMORITA Kazutaka1-2/+8
This removes blocking network I/Os in coroutine context. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09sheepdog: restart I/O when socket becomes ready in do_co_req()MORITA Kazutaka1-0/+14
Currently, no one reenters the yielded coroutine. This fixes it. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09sheepdog: fix dprintf format stringsMORITA Kazutaka1-4/+4
This fixes warnings about dprintf format in debug mode. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09qcow2: preserve free_byte_offset when qcow2_alloc_bytes() failsStefan Hajnoczi1-3/+4
When qcow2_alloc_clusters() error handling code was introduced in commit 5d757b563d59142ca81e1073a8e8396750a0ad1a, the value of free_byte_offset was clobbered in the error case. This patch keeps free_byte_offset at 0 so we will try to allocate clusters again next time this function is called. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09qcow2: fix #ifdef'd qcow2_check_refcounts() callersStefan Hajnoczi2-4/+4
The DEBUG_ALLOC qcow2.h macro enables additional consistency checks throughout the code. This makes it easier to spot corruptions that are introduced during development. Since consistency check is an expensive operation the DEBUG_ALLOC macro is used to compile checks out in normal builds and qcow2_check_refcounts() calls missed the addition of a new function argument. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-24raw-posix: Fix build without is_allocated supportKevin Wolf1-1/+8
Move the declaration of s into the #ifdef sections that actually make use of it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-06-15qcow2: fix autoclear image header updateStefan Hajnoczi1-8/+9
The autoclear feature bits can be used for qcow2 file format features that are safe to "drop" by old programs that do not understand the feature. Upon opening the image file unknown autoclear feature bits are cleared and the image file header is rewritten, but this was happening too early in the code when critical header fields were not yet loaded. Process autoclear feature bits after all necessary header information has been loaded. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15qcow2: Fix avail_sectors in cluster allocation codeKevin Wolf1-1/+9
avail_sectors should really be the number of sectors from the start of the allocation, not from the start of the write request. We're lucky enough that this mistake didn't cause any real bug. avail_sectors is only used in the intialiser of QCowL2Meta: .nb_available = MIN(requested_sectors, avail_sectors), m->nb_available in turn is only used for COW at the end of the allocation. A COW occurs only if the request wasn't cluster aligned, which in turn would imply that requested_sectors was less than avail_sectors (both in the original and in the fixed version). In this case avail_sectors is ignored and therefore the mistake doesn't cause any misbehaviour. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15qcow2: Simplify calculation for COW area at the endKevin Wolf1-3/+2
copy_sectors() always uses the sum (cluster_offset + n_start) or (start_sect + n_start), so if some value is added to both cluster_offset and start_sect, and subtracted from n_start, it's cancelled out anyway. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15qcow2: always operate caches in writeback modePaolo Bonzini4-44/+5
Writethrough does not need special-casing anymore in the qcow2 caches. The block layer adds flushes after every guest-initiated data write, and these will also flush the qcow2 caches to the OS. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15sheepdog: add coroutine_fn markers to coroutine functionsMORITA Kazutaka1-4/+5
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15rbd: hook up cache optionsJosh Durgin1-0/+19
Writeback caching was added in Ceph 0.46, and writethrough will be in 0.47. These are controlled by general config options, so there's no need to check for librbd version. Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15qcow2: Support for fixing refcount inconsistenciesKevin Wolf3-15/+37
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15qemu-img check: Print fixed clusters and recheckKevin Wolf1-0/+2
When any inconsistencies have been fixed, print the statistics and run another check to make sure everything is correct now. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15qemu-img check -r for repairing imagesKevin Wolf3-4/+15
The QED block driver already provides the functionality to not only detect inconsistencies in images, but also fix them. However, this functionality cannot be manually invoked with qemu-img, but the check happens only automatically during bdrv_open(). This adds a -r switch to qemu-img check that allows manual invocation of an image repair. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15stream: move rate limiting to a separate header filePaolo Bonzini1-29/+2
Make the code reusable. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15stream: move is_allocated_above to block.cPaolo Bonzini1-51/+2
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15stream: tweak usage of bdrv_co_is_allocatedPaolo Bonzini1-26/+25
is_allocated_base has complex semantics that are not really usable outside streaming. Split the check in two parts, where the allocated state for the top bs is moved to the caller. The resulting function is more generally useful. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15block: implement is_allocated for rawPaolo Bonzini2-0/+106
Either FIEMAP, or SEEK_DATA+SEEK_HOLE can be used to implement the is_allocated callback for raw files. On Linux ext4, btrfs and XFS all support it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15qcow2: fix endianness conversionZhi Yong Wu1-1/+1
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15qcow2: remove a line of unnecessary codeZhi Yong Wu1-1/+0
Commit 3948d1d4 removed the pointer argument we filled in with l2_offset but forgot to remove the unnecessary l2_offset assignment. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15qcow2: Silence false warningKevin Wolf1-0/+2
Some gcc versions seem not to be able to figure out that the switch statement covers all possible values and that c is therefore always initialised. Add a default branch for them. Reported-by: malc <av1474@comtv.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: malc <av1474@comtv.ru>
2012-06-07build: move block/ objects to nested Makefile.objsPaolo Bonzini1-0/+11
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-05-30block: prevent snapshot mode $TMPDIR symlink attackJim Meyering1-1/+6
In snapshot mode, bdrv_open creates an empty temporary file without checking for mkstemp or close failure, and ignoring the possibility of a buffer overrun given a surprisingly long $TMPDIR. Change the get_tmp_filename function to return int (not void), so that it can inform its two callers of those failures. Also avoid the risk of buffer overrun and do not ignore mkstemp or close failure. Update both callers (in block.c and vvfat.c) to propagate temp-file-creation failure to their callers. get_tmp_filename creates and closes an empty file, while its callers later open that presumed-existing file with O_CREAT. The problem was that a malicious user could provoke mkstemp failure and race to create a symlink with the selected temporary file name, thus causing the qemu process (usually root owned) to open through the symlink, overwriting an attacker-chosen file. This addresses CVE-2012-2652. http://bugzilla.redhat.com/CVE-2012-2652 Signed-off-by: Jim Meyering <meyering@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-30sheepdog: fix return value of do_load_save_vm_stateMORITA Kazutaka1-5/+5
bdrv_save_vmstate and bdrv_load_vmstate should return the vmstate size on success, and -errno on error. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-29Merge remote-tracking branch 'kwolf/for-anthony' into stagingAnthony Liguori3-56/+83
* kwolf/for-anthony: fdc-test: introduced qtest no_media_on_start and cmos qtest for floppy fdc: fix media detection fdc: floppy drive should be visible after start without media qemu-iotests: mark 035 qcow2-only qcow2: Check qcow2_alloc_clusters_at() return value sheepdog: use heap instead of stack for BDRVSheepdogState sheepdog: return -errno on error sheepdog: mark image as snapshot when tag is specified qemu-img: Explain how rebase operation can be used to perform a 'diff' operation. qcow2: don't leak buffer for unexpected qcow_version in header
2012-05-28ISCSI: Switch to using READ16/WRITE16 for I/O to the LUNRonnie Sahlberg1-29/+83
This allows using LUNs bigger than 2TB. Keep using READ10 for other device types such as MMC. Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
2012-05-28ISCSI: Only call READCAPACITY16 for SBC devices, use READCAPACITY10 for MMCRonnie Sahlberg1-5/+59
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
2012-05-28ISCSI: get device type at connection timeRonnie Sahlberg1-2/+43
This is needed to avoid READ CAPACITY(16) for MMC devices. Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-05-28ISCSI: change num_blocks to 64-bitPaolo Bonzini1-1/+1
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-05-28ISCSI: redo how we set up the eventsRonnie Sahlberg1-4/+21
Call qemu_notify_event() after updating events. Otherwise, If we add an event for -is-writeable but the socket is already writeable there may be a delay before the event callback is actually triggered. Those delays would in particular hurt performance during BIOS boot and when the GRUB bootloader reads the kernel and initrd. But first call out to the socket write functions directly, and only set up the write event if the socket is full. This will happen very rarely and this improves performance. Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
2012-05-25qcow2: Check qcow2_alloc_clusters_at() return valueKevin Wolf1-10/+13
When using qcow2_alloc_clusters_at(), the cluster allocation code checked the wrong variable for an error code. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25sheepdog: use heap instead of stack for BDRVSheepdogStateMORITA Kazutaka1-13/+22
bdrv_create() is called in coroutine context now, so we cannot use more stack than 1 MB in the function if we use ucontext coroutine. This patch allocates BDRVSheepdogState, whose size is 4 MB, on the heap in sd_create(). Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25sheepdog: return -errno on errorMORITA Kazutaka1-32/+46
On error, BlockDriver APIs should return -errno instead of -1. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25sheepdog: mark image as snapshot when tag is specifiedMORITA Kazutaka1-1/+1
When a snapshot tag is specified in the filename, the opened image is a snapshot. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25qcow2: don't leak buffer for unexpected qcow_version in headerJim Meyering1-1/+2
Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-14qcow2: Don't ignore failure to clear autoclear flagsKevin Wolf1-1/+4
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10block: fix warning introduced in efcc7a23Anthony Liguori1-1/+1
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-10stream: do not copy unallocated sectors from the basePaolo Bonzini1-14/+4
Unallocated sectors should really never be accessed by the guest, so there's no need to copy them during the streaming process. If they are read by the guest during streaming, guest-initiated copy-on-read will copy them (we're in the base == NULL case, which enables copy on read). If they are read after we disconnect the image from the base, they will read as zeroes anyway. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10stream: fix ratelimiting corner casePaolo Bonzini1-5/+5
This fixes inability to make progress in streaming if the quota is set to less than the amount of data that an I/O operation has to write. In this case, limit->dispatched + n will always be above the quota and, due to the "goto retry" to recheck cancellation and allocation, streaming will livelock. This can be reproduced with "block_job_set_speed ide0-hd0 1b". Of course, with this patch the requested limit will not be obeyed. That could be done with another patch that caps is_allocated's n argument by the slice quota. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10stream: pass new base image format to bdrv_change_backing_filePaolo Bonzini1-2/+5
When an image is modified to point to the new backing file, the backing file format is set to NULL, which means auto-probe. This is wrong, in fact it is a small security problem. Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10block: wait for job callback in block_job_cancel_syncPaolo Bonzini1-4/+3
The limitation on not having I/O after cancellation cannot really be kept. Even streaming has a very small race window where you could cancel a job and have it report completion. If this window is hit, bdrv_change_backing_file() will yield and possibly cause accesses to dangling pointers etc. So, let's just assume that we cannot know exactly what will happen after the coroutine has set busy to false. We can set a very lax condition: - if we cancel the job, the coroutine won't set it to false again (and hence will not call co_sleep_ns again). - block_job_cancel_sync will wait for the coroutine to exit, which pretty much ensures no race. Instead, we track the coroutine that executes the job and put very strict conditions on what to do while it is quiescent (busy = false). First of all, the coroutine must never set busy = false while the job has been cancelled. Second, the coroutine can be reentered arbitrarily while it is quiescent, so you cannot really do anything but co_sleep_ns at that time. This condition is obeyed by the block_job_sleep_ns function. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10block: add block_job_sleep_nsPaolo Bonzini1-14/+9
This function abstracts the pretty complex semantics of the "busy" member of BlockJob. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>