Mailing List Archive

Stale mfns in update_queue in XenoLinux, and suspend/resume
hi,

it seems the suspend code in arch/xen/kernel/setup.c does not flush the
mmu_update queue prior to suspend, so the guest may crash after
resumption because of stale machine page frame references left in
the queue. Is this correct, and should this behaviour be fixed? I am
currently investigating a crash in my own migration code, and though I
do flush the queue prior to obtaining a checkpoint, I still seem to be
hit occasionally by stale references somewhere.

If suspension is going to be safe, I guess all uses of machine addresses
should be treated as critical regions, to make sure a suspend/resume
does not happen while they are still in scope? I know this will be
problematic because of the batching of mmu-updates. Perhaps it would be
wise to revert to the old behaviour of specifying them as virtual
addresses, or maybe they should be converted on the fly, in a cli()
context right before the hypercall?

Jacob
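
The stale-reference concern above can be illustrated with a minimal user-space sketch. It is not XenoLinux code: the table p2m, the two queues, and every function name below are hypothetical stand-ins, and the "resume" simply assumes that every page receives a new machine frame number. One queue translates to a machine frame when an update is queued, the other stores the pseudo-physical frame and translates only when the queue is examined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES  4
#define QUEUE_LEN 8

/* Hypothetical pseudo-physical -> machine frame table for one guest. */
static unsigned long p2m[NR_PAGES] = { 100, 101, 102, 103 };

/* Eager queue: holds machine frames, translated when the update is queued. */
static unsigned long eager_q[QUEUE_LEN];
static size_t eager_n;

/* Lazy queue: holds pseudo-physical frames, translated only when used. */
static unsigned long lazy_q[QUEUE_LEN];
static size_t lazy_n;

static void queue_eager(unsigned long pfn)
{
    assert(pfn < NR_PAGES && eager_n < QUEUE_LEN);
    eager_q[eager_n++] = p2m[pfn];
}

static void queue_lazy(unsigned long pfn)
{
    assert(pfn < NR_PAGES && lazy_n < QUEUE_LEN);
    lazy_q[lazy_n++] = pfn;
}

/* Assumption: after resume every page is backed by a different machine frame. */
static void simulate_resume(void)
{
    for (size_t i = 0; i < NR_PAGES; i++)
        p2m[i] += 1000;
}

/* True if mfn currently backs one of this guest's pages. */
static bool mfn_is_ours(unsigned long mfn)
{
    for (size_t i = 0; i < NR_PAGES; i++)
        if (p2m[i] == mfn)
            return true;
    return false;
}

/* Count queued machine frames that no longer belong to the guest. */
static size_t stale_in_eager_queue(void)
{
    size_t stale = 0;
    for (size_t i = 0; i < eager_n; i++)
        if (!mfn_is_ours(eager_q[i]))
            stale++;
    return stale;
}

/* Same check for the lazy queue, translating each entry at this point. */
static size_t stale_in_lazy_queue(void)
{
    size_t stale = 0;
    for (size_t i = 0; i < lazy_n; i++)
        if (!mfn_is_ours(p2m[lazy_q[i]]))
            stale++;
    return stale;
}
```

Queueing the same pages into both queues and then calling simulate_resume() leaves every entry of the eager queue stale, while the lazy queue still resolves to valid frames. This is only meant to show why machine addresses held across a suspend are a problem, not how the real update queue is laid out.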



-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
Re: Stale mfns in update_queue in XenoLinux, and suspend/resume
> it seems the suspend code in arch/xen/kernel/setup.c does not flush the
> mmu_update queue prior to suspend, so the guest may crash after
> resumption because of stale machine page frame references left in
> the queue. Is this correct, and should this behaviour be fixed? I am
> currently investigating a crash in my own migration code, and though I
> do flush the queue prior to obtaining a checkpoint, I still seem to be
> hit occasionally by stale references somewhere.
>
> If suspension is going to be safe, I guess all uses of machine addresses
> should be treated as critical regions, to make sure a suspend/resume
> does not happen while they are still in scope? I know this will be
> problematic because of the batching of mmu-updates. Perhaps it would be
> wise to revert to the old behaviour of specifying them as virtual
> addresses, or maybe they should be converted on the fly, in a cli()
> context right before the hypercall?

Suspend/resume occurs in a process context. Since Xenolinux is
uniprocessor, I think that this should mean that there are no
outstanding page-update requests. Thinking about it, though, it's
possible that interrupt handlers and softirqs may add stuff to the
update queue. For safety you might want to flush it immediately after
__cli().

-- Keir
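
The ordering suggested above (disable interrupts first, then flush) can be sketched in user space as follows. This is a simulation under stated assumptions, not the kernel code: sim_cli(), sim_sti(), sim_flush_queue() and sim_interrupt() are hypothetical stand-ins for __cli(), re-enabling interrupts, the real queue flush, and an interrupt handler or softirq that queues an update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_LEN 16

static bool irqs_enabled = true;          /* stands in for the CPU interrupt flag */
static unsigned long update_q[QUEUE_LEN]; /* pending page-update requests */
static size_t update_n;

static void sim_queue_update(unsigned long mfn)
{
    assert(update_n < QUEUE_LEN);
    update_q[update_n++] = mfn;
}

/* Stand-in for issuing the batched hypercall: the queue becomes empty. */
static void sim_flush_queue(void)
{
    update_n = 0;
}

/* An interrupt handler that queues an update; it only runs if interrupts are on. */
static void sim_interrupt(unsigned long mfn)
{
    if (irqs_enabled)
        sim_queue_update(mfn);
}

static void sim_cli(void) { irqs_enabled = false; }
static void sim_sti(void) { irqs_enabled = true; }

/* Flush, then disable interrupts: a handler can slip into the window.
 * Returns the number of entries still queued at the suspend point. */
static size_t suspend_flush_before_cli(unsigned long irq_mfn)
{
    sim_flush_queue();
    sim_interrupt(irq_mfn);   /* arrives between the flush and the cli */
    sim_cli();
    size_t pending = update_n;
    sim_sti();
    return pending;
}

/* Disable interrupts, then flush: nothing can be queued after the flush. */
static size_t suspend_flush_after_cli(unsigned long irq_mfn)
{
    sim_interrupt(irq_mfn);   /* queued, but flushed below */
    sim_cli();
    sim_interrupt(irq_mfn);   /* blocked: interrupts are off */
    sim_flush_queue();
    size_t pending = update_n;
    sim_sti();
    return pending;
}
```

With one simulated interrupt, suspend_flush_before_cli() still has an entry pending at the suspend point, whereas suspend_flush_after_cli() has none. The sketch only models the ordering argument; whether the real suspend path needs further protection is not something it can show.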


