Inconsistent data read by tapdisk2 during live migration

Technology 2025-12-14 15

Question:

Suppose a VM whose disk image is a VHD file migrates from host A to host B. Host B first creates the devices, which includes starting the tapdisk2 process, and at that point tapdisk2 reads some of the VHD file's metadata. Then xc_save runs on host A (with xc_restore receiving on host B). Before the last iteration (the stop-and-copy phase) begins, while xc_save is still running, the VHD file is modified, including its metadata. As a result, the tapdisk2 process on host B has not read the newest metadata of the VHD file. When tapdisk2 starts, it reads the footer, the header and the BAT (Block Allocation Table) of the VHD file. The BAT in particular will cause problems if it is inconsistent.
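To make the BAT problem concrete, here is a minimal Python model, assuming a dynamic VHD whose BAT is an array of 32-bit big-endian sector offsets, with 0xFFFFFFFF marking a block that is not yet allocated. The helper names `parse_bat`/`encode_bat` and the specific offsets are hypothetical and only illustrate the race; this is not tapdisk2 code.

```python
import struct

UNALLOCATED = 0xFFFFFFFF  # BAT entry for a block that is not allocated yet


def parse_bat(raw: bytes) -> list:
    """Decode a BAT: one big-endian uint32 sector offset per block."""
    count = len(raw) // 4
    return list(struct.unpack(f">{count}I", raw[: count * 4]))


def encode_bat(entries: list) -> bytes:
    """Encode BAT entries back into their on-disk big-endian form."""
    return struct.pack(f">{len(entries)}I", *entries)


# Host B starts tapdisk2 early and caches the BAT (blocks 2 and 3 unallocated).
on_disk = encode_bat([0x10, 0x1010, UNALLOCATED, UNALLOCATED])
cached_on_b = parse_bat(on_disk)

# During pre-copy the guest on host A writes to block 2, so the tapdisk on A
# allocates it and updates the BAT in the shared VHD file.
entries = parse_bat(on_disk)
entries[2] = 0x2010
on_disk = encode_bat(entries)

fresh = parse_bat(on_disk)
stale_blocks = [i for i, (old, new) in enumerate(zip(cached_on_b, fresh)) if old != new]
print(stale_blocks)  # [2]
```

With the stale copy, B would treat block 2 as unallocated and return zeros (or parent data, for a differencing disk) instead of what the guest wrote on A.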

Solution:

Options:

A. Avoid VBD lifetime overlap. This is how XCP does it at present: in addition to attach/detach, XCP has vdi.activate/deactivate operations to control storage during migration.

Attach/detach is the same as described above (the lifetimes on A and B may overlap). It may be desirable as the preferred transfer method on non-shared storage nodes, to avoid latency in stop/copy.

The simpler way is, of course, activate/deactivate semantics everywhere, which are mutually exclusive.

This is needed for any indirectly mapped disk format (VHD, qcow?, etc.) on shared physical nodes.

Note that this doesn't only matter for metadata. There are physical layers where exclusive login is preferred or mandatory, so you won't even get access to the device before pre-copy is done and the node can be released on A.

Diagram:

Node            A                                       B

VM.migrate      .. pre-copy   ><  stop-and-copy  ><  resumed ...

VDI.attached    ..--------------A--------------->
                      <-------------B----------------..

VDI.active      ------------A---->
                                                 <-----B--------..
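The attach/activate split in option A can be modelled as two kinds of claims on a VDI: attachments may overlap across hosts, activations may not. The `VdiState` class and its methods are hypothetical and only mirror the semantics described above; they are not the XCP storage API. The walk-through follows the timeline in the diagram.

```python
from typing import Optional


class VdiState:
    """Hypothetical model of one VDI's storage state across two hosts."""

    def __init__(self) -> None:
        self.attached: set = set()            # hosts that have set up the device
        self.active_on: Optional[str] = None  # the single host allowed to do I/O

    def attach(self, host: str) -> None:
        # Attach lifetimes may overlap: B can attach while A is still active.
        self.attached.add(host)

    def detach(self, host: str) -> None:
        if self.active_on == host:
            raise RuntimeError(f"{host}: deactivate before detach")
        self.attached.discard(host)

    def activate(self, host: str) -> None:
        if host not in self.attached:
            raise RuntimeError(f"{host}: attach before activate")
        if self.active_on is not None and self.active_on != host:
            raise RuntimeError(f"VDI still active on {self.active_on}")
        # In this model the image is opened here, not at attach time, so the
        # VHD footer/header/BAT are read only after the previous host let go.
        self.active_on = host

    def deactivate(self, host: str) -> None:
        if self.active_on == host:
            self.active_on = None


vdi = VdiState()
vdi.attach("A")
vdi.activate("A")        # VM running on A
vdi.attach("B")          # pre-copy: B sets up its device early
try:
    vdi.activate("B")    # refused while A still owns the VDI
except RuntimeError as err:
    print(err)           # VDI still active on A
vdi.deactivate("A")      # stop-and-copy: A quiesces and closes the image
vdi.activate("B")        # B opens the image and reads current metadata
vdi.detach("A")
print(vdi.active_on, sorted(vdi.attached))  # B ['B']
```

Keeping activation exclusive is what closes the race: B may do its slow setup early, but it cannot read metadata that A is still changing.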

B. Hack. Let the toolstack issue a tap-ctl pause/unpause cycle before resume. This reopens the image, so tapdisk2 reads the footer, header and BAT again.
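The effect of option B can be sketched as follows, assuming (as stated above) that pause closes the image and unpause reopens it. `SharedVhd` and `TapdiskModel` are hypothetical stand-ins that track only the BAT; they do not reproduce tap-ctl's or tapdisk2's actual interface.

```python
UNALLOCATED = 0xFFFFFFFF


class SharedVhd:
    """Hypothetical stand-in for the VHD file on shared storage (BAT only)."""

    def __init__(self, bat):
        self.bat = list(bat)


class TapdiskModel:
    """Hypothetical model of a tapdisk process that caches metadata at open."""

    def __init__(self, image):
        self.image = image
        self.cached_bat = None
        self.paused = False
        self._open()

    def _open(self):
        # Opening the image is when the metadata (here just the BAT) is read.
        self.cached_bat = list(self.image.bat)

    def pause(self):
        # Pausing closes the image, so the cached metadata is dropped.
        self.paused = True
        self.cached_bat = None

    def unpause(self):
        # Unpausing reopens the image and rereads the metadata.
        self._open()
        self.paused = False


vhd = SharedVhd([0x10, UNALLOCATED])
tap_b = TapdiskModel(vhd)        # tapdisk started on B during pre-copy
vhd.bat[1] = 0x1010              # A allocates block 1 afterwards
stale = tap_b.cached_bat[1] == UNALLOCATED
tap_b.pause()                    # toolstack-issued cycle before resume
tap_b.unpause()
print(stale, hex(tap_b.cached_bat[1]))  # True 0x1010
```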

C. Back in the dark ages, blktap did this implicitly: the first I/O request after disk creation ran an implicit close/open cycle on the physical image.
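Option C's old behaviour amounts to a lazy reopen: the first request after the disk is created refreshes the metadata before serving I/O. The class below is a hypothetical model of that idea, not blktap source; again only the BAT is tracked.

```python
UNALLOCATED = 0xFFFFFFFF


class LazyReopenDisk:
    """Hypothetical model of the old blktap behaviour: reopen on first I/O."""

    def __init__(self, image_bat):
        self.image_bat = image_bat          # shared, mutable on-disk BAT
        self.cached_bat = list(image_bat)   # metadata read at disk create
        self.opens = 1
        self.first_io_done = False

    def read_entry(self, block):
        if not self.first_io_done:
            # First request after create: implicit close/open refreshes metadata.
            self.cached_bat = list(self.image_bat)
            self.opens += 1
            self.first_io_done = True
        return self.cached_bat[block]


bat = [UNALLOCATED]
disk = LazyReopenDisk(bat)    # disk created on B
bat[0] = 0x10                 # A allocates block 0 before the first I/O on B
print(hex(disk.read_entry(0)), disk.opens)  # 0x10 2
```

In a migration this works because the guest issues no I/O on B until it is resumed, by which point A has stopped writing; the cost in this model is one extra close/open per disk.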
