From 9ccbfe14ddfce379ee24684b3648376b130293cd Mon Sep 17 00:00:00 2001
From: "Dr. David Alan Gilbert"
Date: Mon, 12 Mar 2018 17:21:00 +0000
Subject: postcopy: Add vhost-user flag for postcopy and check it

Add a vhost feature flag for postcopy support, and
use the postcopy notifier to check it before allowing postcopy.

Signed-off-by: Dr. David Alan Gilbert
Reviewed-by: Peter Xu
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 docs/interop/vhost-user.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)

(limited to 'docs')

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index cb3a7595aa..91a572d781 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -290,6 +290,15 @@ Once the source has finished migration, rings will be stopped by
 the source. No further update must be done before rings are
 restarted.
 
+In postcopy migration the slave is started before all the memory has been
+received from the source host, and care must be taken to avoid accessing pages
+that have yet to be received. The slave opens a 'userfault'-fd and registers
+the memory with it; this fd is then passed back over to the master.
+The master services requests on the userfaultfd for pages that are accessed
+and when the page is available it performs WAKE ioctl's on the userfaultfd
+to wake the stalled slave. The client indicates support for this via the
+VHOST_USER_PROTOCOL_F_PAGEFAULT feature.
+
 Memory access
 -------------
 
@@ -369,6 +378,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_SLAVE_REQ      5
 #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN   6
 #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
+#define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
 
 Master message types
 --------------------
-- 
cgit v1.2.1


From d3dff7a5a1e0a6eff963fabc4d06879d060f34ee Mon Sep 17 00:00:00 2001
From: "Dr. David Alan Gilbert"
Date: Mon, 12 Mar 2018 17:21:01 +0000
Subject: vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Wire up a notifier to send a VHOST_USER_POSTCOPY_ADVISE
message on an incoming advise.

Later patches will fill in the behaviour/contents of the
message.

Signed-off-by: Dr. David Alan Gilbert
Reviewed-by: Marc-André Lureau
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 docs/interop/vhost-user.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)

(limited to 'docs')

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 91a572d781..7854e50008 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -699,6 +699,16 @@ Master message types
       feature has been successfully negotiated.
       It's a required feature for crypto devices.
 
+ * VHOST_USER_POSTCOPY_ADVISE
+      Id: 28
+      Master payload: N/A
+      Slave payload: userfault fd
+
+      When VHOST_USER_PROTOCOL_F_PAGEFAULT is supported, the
+      master advises slave that a migration with postcopy enabled is underway,
+      the slave must open a userfaultfd for later use.
+      Note that at this stage the migration is still in precopy mode.
+
 Slave message types
 -------------------
 
-- 
cgit v1.2.1


From 6864a7b5aced6d8d9b287b92db8d7a996ea2e8a3 Mon Sep 17 00:00:00 2001
From: "Dr. David Alan Gilbert"
Date: Mon, 12 Mar 2018 17:21:06 +0000
Subject: vhost+postcopy: Transmit 'listen' to slave
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Notify the vhost-user slave on reception of the 'postcopy-listen'
event from the source.

Signed-off-by: Dr. David Alan Gilbert
Reviewed-by: Marc-André Lureau
Reviewed-by: Peter Xu
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 docs/interop/vhost-user.txt | 11 +++++++++++
 1 file changed, 11 insertions(+)

(limited to 'docs')

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 7854e50008..0d24203d31 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -709,6 +709,17 @@ Master message types
       the slave must open a userfaultfd for later use.
       Note that at this stage the migration is still in precopy mode.
 
+ * VHOST_USER_POSTCOPY_LISTEN
+      Id: 29
+      Master payload: N/A
+
+      Master advises slave that a transition to postcopy mode has happened.
+      The slave must ensure that shared memory is registered with userfaultfd
+      to cause faulting of non-present pages.
+
+      This is always sent sometime after a VHOST_USER_POSTCOPY_ADVISE, and
+      thus only when VHOST_USER_PROTOCOL_F_PAGEFAULT is supported.
+
 Slave message types
 -------------------
 
-- 
cgit v1.2.1


From 9bb38019942c2f3f44b98f5830e369faec701e55 Mon Sep 17 00:00:00 2001
From: "Dr. David Alan Gilbert"
Date: Mon, 12 Mar 2018 17:21:10 +0000
Subject: vhost+postcopy: Send address back to qemu
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We need a better way, but at the moment we need the address of the
mappings sent back to qemu so it can interpret the messages on the
userfaultfd it reads.

This is done as a 3 stage set:
   QEMU -> client
      set_mem_table

   mmap stuff, get addresses

   client -> qemu
       here are the addresses

   qemu -> client
       OK - now you can use them

That ensures that qemu has registered the new addresses in it's
userfault code before the client starts accessing them.

Note: We don't ask for the default 'ack' reply since we've got our own.

Signed-off-by: Dr. David Alan Gilbert
Reviewed-by: Marc-André Lureau
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 docs/interop/vhost-user.txt | 9 +++++++++
 1 file changed, 9 insertions(+)

(limited to 'docs')

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 0d24203d31..e295ef12ca 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -455,12 +455,21 @@ Master message types
       Id: 5
       Equivalent ioctl: VHOST_SET_MEM_TABLE
       Master payload: memory regions description
+      Slave payload: (postcopy only) memory regions description
 
       Sets the memory map regions on the slave so it can translate the vring
       addresses. In the ancillary data there is an array of file descriptors
       for each memory mapped region. The size and ordering of the fds matches
       the number and ordering of memory regions.
 
+      When VHOST_USER_POSTCOPY_LISTEN has been received, SET_MEM_TABLE replies with
+      the bases of the memory mapped regions to the master. The slave must
+      have mmap'd the regions but not yet accessed them and should not yet generate
+      a userfault event. Note NEED_REPLY_MASK is not set in this case.
+      QEMU will then reply back to the list of mappings with an empty
+      VHOST_USER_SET_MEM_TABLE as an acknowledgment; only upon reception of this
+      message may the guest start accessing the memory and generating faults.
+
  * VHOST_USER_SET_LOG_BASE
 
       Id: 6
-- 
cgit v1.2.1


From c639187e3342cb14e100d14ce4854444f7ae98d5 Mon Sep 17 00:00:00 2001
From: "Dr. David Alan Gilbert"
Date: Mon, 12 Mar 2018 17:21:19 +0000
Subject: vhost-user: Add VHOST_USER_POSTCOPY_END message
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This message is sent just before the end of postcopy to get the
client to stop using userfault since we wont respond to any more
requests. It should close userfaultfd so that any other pages
get mapped to the backing file automatically by the kernel, since
at this point we know we've received everything.

Signed-off-by: Dr. David Alan Gilbert
Reviewed-by: Peter Xu
Reviewed-by: Marc-André Lureau
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 docs/interop/vhost-user.txt | 12 ++++++++++++
 1 file changed, 12 insertions(+)

(limited to 'docs')

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index e295ef12ca..c058c407df 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -729,6 +729,18 @@ Master message types
       This is always sent sometime after a VHOST_USER_POSTCOPY_ADVISE, and
       thus only when VHOST_USER_PROTOCOL_F_PAGEFAULT is supported.
 
+ * VHOST_USER_POSTCOPY_END
+      Id: 30
+      Slave payload: u64
+
+      Master advises that postcopy migration has now completed. The
+      slave must disable the userfaultfd. The response is an acknowledgement
+      only.
+      When VHOST_USER_PROTOCOL_F_PAGEFAULT is supported, this message
+      is sent at the end of the migration, after VHOST_USER_POSTCOPY_LISTEN
+      was previously sent.
+      The value returned is an error indication; 0 is success.
+
 Slave message types
 -------------------
 
-- 
cgit v1.2.1


From 1dc61e7b37d339c42ec9bd7a7eec1ef2c22f351c Mon Sep 17 00:00:00 2001
From: "Dr. David Alan Gilbert"
Date: Mon, 12 Mar 2018 17:21:24 +0000
Subject: postcopy shared docs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add some notes to the migration documentation for shared memory
postcopy.

Signed-off-by: Dr. David Alan Gilbert
Reviewed-by: Marc-André Lureau
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 docs/devel/migration.rst | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

(limited to 'docs')

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index 9d1b7657f0..e32b087f6e 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -577,3 +577,44 @@ Postcopy now works with hugetlbfs backed memory:
      hugepages works well, however 1GB hugepages are likely to be problematic
      since it takes ~1 second to transfer a 1GB hugepage across a 10Gbps link,
      and until the full page is transferred the destination thread is blocked.
+
+Postcopy with shared memory
+---------------------------
+
+Postcopy migration with shared memory needs explicit support from the other
+processes that share memory and from QEMU. There are restrictions on the type of
+memory that userfault can support shared.
+
+The Linux kernel userfault support works on `/dev/shm` memory and on `hugetlbfs`
+(although the kernel doesn't provide an equivalent to `madvise(MADV_DONTNEED)`
+for hugetlbfs which may be a problem in some configurations).
+
+The vhost-user code in QEMU supports clients that have Postcopy support,
+and the `vhost-user-bridge` (in `tests/`) and the DPDK package have changes
+to support postcopy.
+
+The client needs to open a userfaultfd and register the areas
+of memory that it maps with userfault. The client must then pass the
+userfaultfd back to QEMU together with a mapping table that allows
+fault addresses in the clients address space to be converted back to
+RAMBlock/offsets. The client's userfaultfd is added to the postcopy
+fault-thread and page requests are made on behalf of the client by QEMU.
+QEMU performs 'wake' operations on the client's userfaultfd to allow it
+to continue after a page has arrived.
+
+.. note::
+  There are two future improvements that would be nice:
+    a) Some way to make QEMU ignorant of the addresses in the clients
+       address space
+    b) Avoiding the need for QEMU to perform ufd-wake calls after the
+       pages have arrived
+
+Retro-fitting postcopy to existing clients is possible:
+  a) A mechanism is needed for the registration with userfault as above,
+     and the registration needs to be coordinated with the phases of
+     postcopy. In vhost-user extra messages are added to the existing
+     control channel.
+  b) Any thread that can block due to guest memory accesses must be
+     identified and the implication understood; for example if the
+     guest memory access is made while holding a lock then all other
+     threads waiting for that lock will also be blocked.
-- 
cgit v1.2.1
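
Illustrative note (not part of the series): the migration.rst text above says
the client passes QEMU "a mapping table that allows fault addresses in the
clients address space to be converted back to RAMBlock/offsets". A minimal
sketch of such a lookup follows, assuming a flat array of regions. Every name
in it (struct client_region, struct fault_target, translate_fault, block_id)
is hypothetical and chosen for illustration only; none of it is QEMU's or
libvhost-user's actual code or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one entry of the mapping table the client sends back. */
struct client_region {
    uint64_t client_base;   /* address at which the client mmap'd the region */
    uint64_t size;          /* length of the region in bytes */
    uint64_t block_offset;  /* offset of the region within its RAM block */
    int      block_id;      /* illustrative stand-in for a RAMBlock handle */
};

/* Hypothetical: result of translating one fault address. */
struct fault_target {
    int      block_id;
    uint64_t offset;
};

/*
 * Translate a fault address read from the client's userfaultfd into a
 * (block, offset) pair. Returns 0 on success, or -1 if the address lies
 * outside every region in the table.
 */
static int translate_fault(const struct client_region *table, size_t n,
                           uint64_t fault_addr, struct fault_target *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct client_region *r = &table[i];
        /* Compare the delta against size so client_base + size cannot overflow. */
        if (fault_addr >= r->client_base &&
            fault_addr - r->client_base < r->size) {
            out->block_id = r->block_id;
            out->offset = r->block_offset + (fault_addr - r->client_base);
            return 0;
        }
    }
    return -1;
}
```

Under these assumptions, the side servicing the userfaultfd would call
translate_fault() on each fault address it reads and request the page at the
returned block and offset. A linear scan is used because the number of memory
regions is expected to be small; a sorted table with binary search would be a
reasonable alternative for larger tables.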