peter/qemu - QEMU hacking for Peter

Age	Commit message (Collapse)	Author	Files	Lines
2017-03-27	nbd-client: fix handling of hungup connections	Paolo Bonzini	1	-1/+1
	After the switch to reading replies in a coroutine, nothing is reentering pending receive coroutines if the connection hangs. Move nbd_recv_coroutines_enter_all to the reply read coroutine, which is the place where hangups are detected. nbd_teardown_connection can simply wait for the reply read coroutine to detect the hangup and clean up after itself. This wouldn't be enough though because nbd_receive_reply returns 0 (rather than -EPIPE or similar) when reading from a hung connection. Fix the return value check in nbd_read_reply_entry. This fixes qemu-iotests 083. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170314111157.14464-1-pbonzini@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-03-14	nbd/client: fix drop_sync [CVE-2017-2630]	Vladimir Sementsov-Ogievskiy	1	-1/+1
	Comparison symbol is misused. It may lead to memory corruption. Introduced in commit 7d3123e. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170203154757.36140-6-vsementsov@virtuozzo.com> [eblake: add CVE details, update conditional] Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-AndrÃ© Lureau <marcandre.lureau@redhat.com> Message-Id: <20170307151627.27212-1-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-28	nbd/server: Use real permissions for NBD exports	Kevin Wolf	1	-2/+9
	NBD can't cope with device size changes, so resize must be forbidden, but otherwise we can tolerate anything. Depending on whether the export is writable or not, we only require consistent reads and writes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28	block: Add error parameter to blk_insert_bs()	Kevin Wolf	1	-1/+5
	Now that blk_insert_bs() requests the BlockBackend permissions for the node it attaches to, it can fail. Instead of aborting, pass the errors to the callers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28	block: Add permissions to blk_new()	Kevin Wolf	1	-1/+2
	We want every user to be specific about the permissions it needs, so we'll pass the initial permissions as parameters to blk_new(). A user only needs to call blk_set_perm() if it wants to change the permissions after the fact. The permissions are stored in the BlockBackend and applied whenever a BlockDriverState should be attached in blk_insert_bs(). This does not include actually choosing the right set of permissions everywhere yet. Instead, the usual FIXME comment is added to each place and will be addressed in individual patches. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-21	nbd: convert to use qio_channel_yield	Paolo Bonzini	3	-75/+30
	In the client, read the reply headers from a coroutine, switching the read side between the "read header" coroutine and the I/O coroutine that reads the body of the reply. In the server, if the server can read more requests it will create a new "read request" coroutine as soon as a request has been read. Otherwise, the new coroutine is created in nbd_request_put. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-8-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-23	io: change the QIOTask callback signature	Daniel P. Berrange	2	-7/+4
	Currently the QIOTaskFunc signature takes an Object * for the source, and an Error * for any error. We also need to be able to provide a result pointer. Rather than continue to add parameters to QIOTaskFunc, remove the existing ones and simply pass the QIOTask object instead. This has methods to access all the other data items required in the callback impl. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-01-03	aio: add AioPollFn and io_poll() interface	Stefan Hajnoczi	1	-5/+4
	The new AioPollFn io_poll() argument to aio_set_fd_handler() and aio_set_event_handler() is used in the next patch. Keep this code change separate due to the number of files it touches. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161201192652.9509-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-11-10	nbd: Don't inf-loop on early EOF	Eric Blake	1	-6/+7
	Commit 7d3123e converted a single read_sync() into a while loop that assumed that read_sync() would either make progress or give an error. But when the server hangs up early, the client sees EOF (a read_sync() of 0) and never makes progress, which in turn caused qemu-iotest './check -nbd 83' to go into an infinite loop. Rework the loop to accomodate reads cut short by EOF. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1478551093-32757-1-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02	nbd: Implement NBD_CMD_WRITE_ZEROES on server	Eric Blake	1	-2/+40
	Upstream NBD protocol recently added the ability to efficiently write zeroes without having to send the zeroes over the wire, along with a flag to control whether the client wants to allow a hole. Note that when it comes to requiring full allocation, vs. permitting optimizations, the NBD spec intentionally picked a different sense for the flag; the rules in qemu are: MAY_UNMAP == 0: must write zeroes MAY_UNMAP == 1: may use holes if reads will see zeroes while in NBD, the rules are: FLAG_NO_HOLE == 1: must write zeroes FLAG_NO_HOLE == 0: may use holes if reads will see zeroes In all cases, the 'may use holes' scenario is optional (the server need not use a hole, and must not use a hole if subsequent reads would not see zeroes). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-16-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02	nbd: Improve server handling of shutdown requests	Eric Blake	3	-0/+29
	NBD commit 6d34500b clarified how clients and servers are supposed to behave before closing a connection. It added NBD_REP_ERR_SHUTDOWN (for the server to announce it is about to go away during option haggling, so the client should quit sending NBD_OPT_* other than NBD_OPT_ABORT) and ESHUTDOWN (for the server to announce it is about to go away during transmission, so the client should quit sending NBD_CMD_* other than NBD_CMD_DISC). It also clarified that NBD_OPT_ABORT gets a reply, while NBD_CMD_DISC does not. This patch merely adds the missing reply to NBD_OPT_ABORT and teaches the client to recognize server errors. Actually teaching the server to send NBD_REP_ERR_SHUTDOWN or ESHUTDOWN would require knowing that the server has been requested to shut down soon (maybe we could do that by installing a SIGINT handler in qemu-nbd, which transitions from RUNNING to a new state that waits for the client to react, rather than just out-right quitting - but that's a bigger task for another day). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-15-git-send-email-eblake@redhat.com> [Move dummy ESHUTDOWN to include/qemu/osdep.h. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02	nbd: Refactor conversion to errno to silence checkpatch	Eric Blake	1	-6/+14
	Checkpatch complains that 'return EINVAL' is usually wrong (since we tend to favor 'return -EINVAL'). But it is a false positive for nbd_errno_to_system_errno(). Since NBD may add future defined wire values, refactor the code to keep checkpatch happy. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-14-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02	nbd: Support shorter handshake	Eric Blake	2	-5/+18
	The NBD Protocol allows the server and client to mutually agree on a shorter handshake (omit the 124 bytes of reserved 0), via the server advertising NBD_FLAG_NO_ZEROES and the client acknowledging with NBD_FLAG_C_NO_ZEROES (only possible in newstyle, whether or not it is fixed newstyle). It doesn't shave much off the wire, but we might as well implement it. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alex Bligh <alex@alex.org.uk> Message-Id: <1476469998-28592-13-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02	nbd: Less allocation during NBD_OPT_LIST	Eric Blake	1	-72/+67
	Since we know that the maximum name we are willing to accept is small enough to stack-allocate, rework the iteration over NBD_OPT_LIST responses to reuse a stack buffer rather than allocating every time. Furthermore, we don't even have to allocate if we know the server's length doesn't match what we are searching for. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-12-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02	nbd: Let client skip portions of server reply	Eric Blake	1	-14/+33
	The server has a nice helper function nbd_negotiate_drop_sync() which lets it easily ignore fluff from the client (such as the payload to an unknown option request). We can't quite make it common, since it depends on nbd_negotiate_read() which handles coroutine magic, but we can copy the idea into the client where we have places where we want to ignore data (such as the description tacked on the end of NBD_REP_SERVER). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-11-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02	nbd: Let server know when client gives up negotiation	Eric Blake	1	-0/+30
	The NBD spec says that a client should send NBD_OPT_ABORT rather than just dropping the connection, if the client doesn't like something the server sent during option negotiation. This is a best-effort attempt only, and can only be done in places where we know the server is still in sync with what we've sent, whether or not we've read everything the server has sent. Technically, the server then has to reply with NBD_REP_ACK, but it's not worth complicating the client to wait around for that reply. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-10-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02	nbd: Share common option-sending code in client	Eric Blake	2	-148/+109
	Rather than open-coding each option request, it's easier to have common helper functions do the work. That in turn requires having convenient packed types for handling option requests and replies. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-9-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02	nbd: Send message along with server NBD_REP_ERR errors	Eric Blake	1	-19/+59
	The NBD Protocol allows us to send human-readable messages along with any NBD_REP_ERR error during option negotiation; make use of this fact for clients that know what to do with our message. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-8-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02	nbd: Share common reply-sending code in server	Eric Blake	1	-25/+27
	Rather than open-coding NBD_REP_SERVER, reuse the code we already have by adding a length parameter. Additionally, the refactoring will make adding NBD_OPT_GO in a later patch easier. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-7-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02	nbd: Rename struct nbd_request and nbd_reply	Eric Blake	2	-8/+8
	Our coding convention prefers CamelCase names, and we already have other existing structs with NBDFoo naming. Let's be consistent, before later patches add even more structs. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-6-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02	nbd: Rename NBDRequest to NBDRequestData	Eric Blake	1	-10/+10
	We have both 'struct NBDRequest' and 'struct nbd_request'; making it confusing to see which does what. Furthermore, we want to rename nbd_request to align with our normal CamelCase naming conventions. So, rename the struct which is used to associate the data received during request callbacks, while leaving the shorter name for the description of the request sent over the wire in the NBD protocol. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-4-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02	nbd: Treat flags vs. command type as separate fields	Eric Blake	3	-25/+27
	Current upstream NBD documents that requests have a 16-bit flags, followed by a 16-bit type integer; although older versions mentioned only a 32-bit field with masking to find flags. Since the protocol is in network order (big-endian over the wire), the ABI is unchanged; but dealing with the flags as a separate field rather than masking will make it easier to add support for upcoming NBD extensions that increase the number of both flags and commands. Improve some comments in nbd.h based on the current upstream NBD protocol (https://github.com/yoe/nbd/blob/master/doc/proto.md), and touch some nearby code to keep checkpatch.pl happy. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-3-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02	nbd: Add qemu-nbd -D for human-readable description	Eric Blake	2	-10/+29
	The NBD protocol allows servers to advertise a human-readable description alongside an export name during NBD_OPT_LIST. Add an option to pass through the user's string to the NBD client. Doing this also makes it easier to test commit 200650d4, which is the client counterpart of receiving the description. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-2-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-27	nbd: set name for all I/O channels created	Daniel P. Berrange	2	-0/+2
	Ensure that all I/O channels created for NBD are given names to distinguish their respective roles. Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-09-05	nbd-server: Use a separate BlockBackend	Kevin Wolf	1	-5/+20
	The builtin NBD server uses its own BlockBackend now instead of reusing the monitor/guest device one. This means that it has its own writethrough setting now. The builtin NBD server always uses writeback caching now regardless of whether the guest device has WCE enabled. qemu-nbd respects the cache mode given on the command line. We still need to keep a reference to the monitor BB because we put an eject notifier on it, but we don't use it for any I/O. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-08-03	nbd: Limit nbdflags to 16 bits	Eric Blake	2	-19/+19
	Rather than asserting that nbdflags is within range, just give it the correct type to begin with :) nbdflags corresponds to the per-export portion of NBD Protocol "transmission flags", which is 16 bits in response to NBD_OPT_EXPORT_NAME and NBD_OPT_GO. Furthermore, upstream NBD has never passed the global flags to the kernel via ioctl(NBD_SET_FLAGS) (the ioctl was first introduced in NBD 2.9.22; then a latent bug in NBD 3.1 actually tried to OR the global flags with the transmission flags, with the disaster that the addition of NBD_FLAG_NO_ZEROES in 3.9 caused all earlier NBD 3.x clients to treat every export as read-only; NBD 3.10 and later intentionally clip things to 16 bits to pass only transmission flags). Qemu should follow suit, since the current two global flags (NBD_FLAG_FIXED_NEWSTYLE and NBD_FLAG_NO_ZEROES) have no impact on the kernel's behavior during transmission. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1469129688-22848-3-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-03	nbd: Fix bad flag detection on server	Eric Blake	1	-1/+2
	Commit ab7c548e added a check for invalid flags, but used an early return on error instead of properly going through the cleanup label. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1469129688-22848-2-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-20	block: Convert BB interface to byte-based discards	Eric Blake	1	-14/+5
	Change sector-based blk_discard(), blk_co_discard(), and blk_aio_discard() to instead be byte-based blk_pdiscard(), blk_co_pdiscard(), and blk_aio_pdiscard(). NBD gets a lot simpler now that ignoring the unaligned portion of a byte-based discard request is handled under the hood by the block layer. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468624988-423-6-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20	nbd: Drop unused offset parameter	Eric Blake	2	-6/+3
	Now that NBD relies on the block layer to fragment things, we no longer need to track an offset argument for which fragment of a request we are actually servicing. While at it, use true and false instead of 0 and 1 for a bool parameter. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1468607524-19021-6-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-13	coroutine: move entry argument to qemu_coroutine_create	Paolo Bonzini	1	-6/+6
	In practice the entry argument is always known at creation time, and it is confusing that sometimes qemu_coroutine_enter is used with a non-NULL argument to re-enter a coroutine (this happens in block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value at creation time, for consistency with e.g. aio_bh_new. Mostly done with the following semantic patch: @ entry1 @ expression entry, arg, co; @@ - co = qemu_coroutine_create(entry); + co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry2 @ expression entry, arg; identifier co; @@ - Coroutine co = qemu_coroutine_create(entry); + Coroutine co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry3 @ expression entry, arg; @@ - qemu_coroutine_enter(qemu_coroutine_create(entry), arg); + qemu_coroutine_enter(qemu_coroutine_create(entry, arg)); @ reentry @ expression co; @@ - qemu_coroutine_enter(co, NULL); + qemu_coroutine_enter(co); except for the aforementioned few places where the semantic patch stumbled (as expected) and for test_co_queue, which would otherwise produce an uninitialized variable warning. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-17	nbd/client.c: Correct trace format string	Peter Maydell	1	-1/+1
	The trace format string in nbd_send_request uses PRIu16 for request->type, but request->type is a uint32_t. This provokes compiler warnings on the OSX clang. Use PRIu32 instead. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1466167331-17063-1-git-send-email-peter.maydell@linaro.org
2016-06-16	nbd: Avoid magic number for NBD max name size	Eric Blake	2	-3/+3
	Declare a constant and use that when determining if an export name fits within the constraints we are willing to support. Note that upstream NBD recently documented that clients MUST support export names of 256 bytes (not including trailing NUL), and SHOULD support names up to 4096 bytes. 4096 is a bit big (we would lose benefits of stack-allocation of a name array), and we already have other limits in place (for example, qcow2 snapshot names are clamped around 1024). So for now, just stick to the required minimum, as that's easier to audit than a full-scale support for larger names. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1463006384-7734-12-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16	nbd: Detect servers that send unexpected error values	Eric Blake	1	-1/+3
	Add some debugging to flag servers that are not compliant to the NBD protocol. This would have flagged the server bug fixed in commit c0301fcc. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alex Bligh <alex@alex.org.uk> Message-Id: <1463006384-7734-11-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16	nbd: Clean up ioctl handling of qemu-nbd -c	Eric Blake	1	-5/+15
	The kernel ioctl() interface into NBD is limited to 'unsigned long'; we MUST pass in input with that type (and not int or size_t, as there may be platform ABIs where the wrong types promote incorrectly through var-args). Furthermore, on 32-bit platforms, the kernel is limited to a maximum export size of 2T (our BLKSIZE of 512 times a SIZE_BLOCKS constrained by 32 bit unsigned long). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1463006384-7734-8-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16	nbd: Group all Linux-specific ioctl code in one place	Eric Blake	2	-18/+13
	NBD ioctl()s are used to manage an NBD client session where initial handshake is done in userspace, but then the transmission phase is handed off to the kernel through a /dev/nbdX device. As such, all ioctls sent to the kernel on the /dev/nbdX fd belong in client.c; nbd_disconnect() was out-of-place in server.c. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1463006384-7734-7-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16	nbd: Reject unknown request flags	Eric Blake	1	-0/+5
	The NBD protocol says that clients should not send a command flag that has not been negotiated (whether by the client requesting an option during a handshake, or because we advertise support for the flag in response to NBD_OPT_EXPORT_NAME), and that servers should reject invalid flags with EINVAL. We were silently ignoring the flags instead. The client can't rely on our behavior, since it is their fault for passing the bad flag in the first place, but it's better to be robust up front than to possibly behave differently than the client was expecting with the attempted flag. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alex Bligh <alex@alex.org.uk> Message-Id: <1463006384-7734-6-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16	nbd: Improve server handling of bogus commands	Eric Blake	1	-19/+47
	We have a few bugs in how we handle invalid client commands: - A client can send an NBD_CMD_DISC where from + len overflows, convincing us to reply with an error and stay connected, even though the protocol requires us to silently disconnect. Fix by hoisting the special case sooner. - A client can send an NBD_CMD_WRITE where from + len overflows, where we reply to the client with EINVAL without consuming the payload; this will normally cause us to fail if the next thing read is not the right magic, but in rare cases, could cause us to interpret the data payload as valid commands and do things not requested by the client. Fix by adding a complete flag to track whether we are in sync or must disconnect. Furthermore, we have split the checks for bogus from/len across two functions, when it is easier to do it all at once. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1463006384-7734-5-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16	nbd: Quit server after any write error	Eric Blake	1	-9/+23
	We should never ignore failure from nbd_negotiate_send_rep(); if we are unable to write to the client, then it is not worth trying to continue the negotiation. Fortunately, the problem is not too severe - chances are that the errors being ignored here (mainly inability to write the reply to the client) are indications of a closed connection or something similar, which will also affect the next attempt to interact with the client and eventually reach a point where the errors are detected to end the loop. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1463006384-7734-4-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16	nbd: More debug typo fixes, use correct formats	Eric Blake	2	-40/+49
	Clean up some debug message oddities missed earlier; this includes some typos, and recognizing that %d is not necessarily compatible with uint32_t. Also add a couple messages that I found useful while debugging things. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1463006384-7734-3-git-send-email-eblake@redhat.com> [Do not use PRIx16, clang complains. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16	nbd: Use BDRV_REQ_FUA for better FUA where supported	Eric Blake	1	-10/+6
	Rather than always flushing ourselves, let the block layer forward the FUA on to the underlying device - where all underlying layers also understand FUA, we are now more efficient; and where any underlying layer doesn't understand it, now the block layer takes care of the full flush fallback on our behalf. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1463006384-7734-2-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16	nbd: Don't use cpu_to_*w() functions	Peter Maydell	1	-5/+5
	The cpu_to_w() functions just compose a pointer dereference with a byteswap. Instead use st_p(), which handles potential pointer misalignment and avoids the need to cast the pointer. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <1465575342-12146-1-git-send-email-peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16	nbd: Don't use *_to_cpup() functions	Peter Maydell	2	-9/+9
	The _to_cpup() functions are not very useful, as they simply do a pointer dereference and then a _to_cpu(). Instead use either: * ld__p(), if the data is at an address that might not be correctly aligned for the load * a local dereference and *_to_cpu(), if the pointer is the correct type and known to be correctly aligned Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <1465570836-22211-1-git-send-email-peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29	nbd: Don't trim unrequested bytes	Eric Blake	1	-6/+14
	Similar to commit df7b97ff, we are mishandling clients that give an unaligned NBD_CMD_TRIM request, and potentially trimming bytes that occur before their request; which in turn can cause potential unintended data loss (unlikely in practice, since most clients are sane and issue aligned trim requests). However, while we fixed read and write by switching to the byte interfaces of blk_, we don't yet have a byte interface for discard. On the other hand, trim is advisory, so rounding the user's request to simply ignore the first and last unaligned sectors (or the entire request, if it is sub-sector in length) is just fine. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1464173965-9694-1-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19	qemu-common: stop including qemu/bswap.h from qemu-common.h	Paolo Bonzini	1	-0/+1
	Move it to the actual users. There are still a few includes of qemu/bswap.h in headers; removing them is left for future work. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-18	Fix some typos found by codespell	Stefan Weil	1	-1/+1
	Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-05-12	block: Allow BDRV_REQ_FUA through blk_pwrite()	Eric Blake	1	-1/+1
	We have several block drivers that understand BDRV_REQ_FUA, and emulate it in the block layer for the rest by a full flush. But without a way to actually request BDRV_REQ_FUA during a pass-through blk_pwrite(), FUA-aware block drivers like NBD are forced to repeat the emulation logic of a full flush regardless of whether the backend they are writing to could do it more efficiently. This patch just wires up a flags argument; followup patches will actually make use of it in the NBD driver and in qemu-io. Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-22	nbd: Don't mishandle unaligned client requests	Eric Blake	1	-6/+4
	The NBD protocol does not (yet) force any alignment constraints on clients. Even though qemu NBD clients always send requests that are aligned to 512 bytes, we must be prepared for non-qemu clients that don't care about alignment (even if it means they are less efficient). Our use of blk_read() and blk_write() was silently operating on the wrong file offsets when the client made an unaligned request, corrupting the client's data (but as the client already has control over the file we are serving, I don't think it is a security hole, per se, just a data corruption bug). Note that in the case of NBD_CMD_READ, an unaligned length could cause us to return up to 511 bytes of uninitialized trailing garbage from blk_try_blockalign() - hopefully nothing sensitive from the heap's prior usage is ever leaked in that manner. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Tested-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1461249750-31928-1-git-send-email-eblake@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-15	nbd: Don't kill server on client that doesn't request TLS	Eric Blake	1	-2/+13
	Upstream NBD documents (as of commit 4feebc95) that servers MAY choose to operate in a conditional mode, where it is up to the client whether to use TLS. For qemu's case, we want to always be in FORCEDTLS mode, because of the risk of man-in-the-middle attacks, and since we never export more than one device; likewise, the qemu client will ALWAYS send NBD_OPT_STARTTLS as its first option. But now that SELECTIVETLS servers exist, it is feasible to encounter a (non-qemu) client that is programmed to talk to such a server, and does not do NBD_OPT_STARTTLS first, but rather wants to probe if it can use a non-encrypted export. The NBD protocol documents that we should let such a client continue trying, on the grounds that maybe the client will get the hint to send NBD_OPT_STARTTLS, rather than immediately dropping the connection. Note that NBD_OPT_EXPORT_NAME is a special case: since it is the only option request that can't have an error return, we have to (continue to) drop the connection on that one; rather, what we are fixing here is that all other replies prior to TLS initiation tell the client NBD_REP_ERR_TLS_REQD, but keep the connection alive. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 1460671343-18485-1-git-send-email-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-15	nbd: Don't fail handshake on NBD_OPT_LIST descriptions	Eric Blake	1	-2/+21
	The NBD Protocol states that NBD_REP_SERVER may set 'length > sizeof(namelen) + namelen'; in which case the rest of the packet is a UTF-8 description of the export. While we don't know of any NBD servers that send this description yet, we had better consume the data so we don't choke when we start to talk to such a server. Also, a (buggy/malicious) server that replies with length < sizeof(namelen) would cause us to block waiting for bytes that the server is not sending, and one that replies with super-huge lengths could cause us to temporarily allocate up to 4G memory. Sanity check things before blindly reading incorrectly. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 1460077777-31004-1-git-send-email-eblake@redhat.com Reviewed-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-08	nbd: do not hang nbd_wr_syncv if outside a coroutine and no available data	Paolo Bonzini	1	-1/+4
	Until commit 1c778ef7 ("nbd: convert to using I/O channels for actual socket I/O", 2016-02-16), nbd_wr_sync returned -EAGAIN this scenario. nbd_reply_ready required these semantics because it has two conflicting requirements: 1) if a reply can be received on the socket, nbd_reply_ready needs to read the header outside coroutine context to identify _which_ coroutine to enter to process the rest of the reply 2) on the other hand, nbd_reply_ready can find a false positive if another thread (e.g. a VCPU thread running aio_poll) sneaks in and calls nbd_reply_ready too. In this case nbd_reply_ready does nothing and expects nbd_wr_syncv to return -EAGAIN. Currently, the solution to the first requirement is to wait in the very rare case of a read() that doesn't retrieve the reply header in its entirety; this is what nbd_wr_syncv does by calling qio_channel_wait(). However, the unconditional call to qio_channel_wait() breaks the second requirement. To fix this, the patch makes nbd_wr_syncv return -EAGAIN if done is zero, similar to the code before commit 1c778ef7. This is okay because NBD client-side negotiation is the only other case that calls nbd_wr_syncv outside a coroutine, and it places the socket in blocking mode. On the other hand, it is a bit unpleasant to put this in nbd_wr_syncv(), because the function is used by both client and server. The full fix would be to add a counter to NbdClientSession for how many bytes have been filled in s->reply. Then a reply can be filled by multiple separate invocations of nbd_reply_ready and the qio_channel_wait() call can be removed completely. Something to consider for 2.7... Reported-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>