[PATCH V6 5/5] xen: Handle resumed instruction based on previous mem_event reply
In a scenario where a page fault that triggered a mem_event occurred,
p2m_mem_access_check() will now be able to either 1) emulate the
current instruction, or 2) emulate it, but don't allow it to perform
any writes.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

---
Changes since V4:
- Removed vmx_exited_by_nested_pagefault() (now using npfec.kind).
---
xen/arch/x86/domain.c | 3 ++
xen/arch/x86/mm/p2m.c | 83 ++++++++++++++++++++++++++++++++++++++++
xen/include/asm-x86/domain.h | 9 +++++
xen/include/public/mem_event.h | 12 +++---
4 files changed, 102 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f7e0e78..7b1dfe6 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -415,6 +415,9 @@ int vcpu_initialise(struct vcpu *v)

v->arch.flags = TF_kernel_mode;

+ /* By default, do not emulate */
+ v->arch.mem_event.emulate_flags = 0;
+
rc = mapcache_vcpu_init(v);
if ( rc )
return rc;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d0962aa..c9ede2b 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1395,6 +1395,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
p2m_access_t p2ma;
mem_event_request_t *req;
int rc;
+ unsigned long eip = guest_cpu_user_regs()->eip;

/* First, handle rx2rw conversion automatically.
* These calls to p2m->set_entry() must succeed: we have the gfn
@@ -1447,6 +1448,35 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
return 1;
}
}
+ else if ( v->arch.mem_event.emulate_flags == 0 &&
+ npfec.kind != npfec_kind_with_gla ) /* don't send a mem_event */
+ {
+ v->arch.mem_event.emulate_flags = MEM_EVENT_FLAG_EMULATE;
+ v->arch.mem_event.gpa = gpa;
+ v->arch.mem_event.eip = eip;
+ }
+
+ /* The previous mem_event reply does not match the current state. */
+ if ( v->arch.mem_event.gpa != gpa || v->arch.mem_event.eip != eip )
+ {
+ /* Don't emulate the current instruction, send a new mem_event. */
+ v->arch.mem_event.emulate_flags = 0;
+
+ /* Make sure to mark the current state to match it again against
+ * the new mem_event about to be sent. */
+ v->arch.mem_event.gpa = gpa;
+ v->arch.mem_event.eip = eip;
+ }
+
+ if ( v->arch.mem_event.emulate_flags )
+ {
+ hvm_mem_event_emulate_one((v->arch.mem_event.emulate_flags &
+ MEM_EVENT_FLAG_EMULATE_NOWRITE) != 0,
+ TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+
+ v->arch.mem_event.emulate_flags = 0;
+ return 1;
+ }

*req_ptr = NULL;
req = xzalloc(mem_event_request_t);
@@ -1502,6 +1532,59 @@ void p2m_mem_access_resume(struct domain *d)

v = d->vcpu[rsp.vcpu_id];

+ /* Mark vcpu for skipping one instruction upon rescheduling */
+ if ( rsp.flags & MEM_EVENT_FLAG_EMULATE )
+ {
+ xenmem_access_t access;
+ bool_t violation = 1;
+
+ v->arch.mem_event.emulate_flags = 0;
+
+ if ( p2m_get_mem_access(d, rsp.gfn, &access) == 0 )
+ {
+ switch ( access )
+ {
+ case XENMEM_access_n:
+ case XENMEM_access_n2rwx:
+ default:
+ violation = rsp.access_r || rsp.access_w || rsp.access_x;
+ break;
+
+ case XENMEM_access_r:
+ violation = rsp.access_w || rsp.access_x;
+ break;
+
+ case XENMEM_access_w:
+ violation = rsp.access_r || rsp.access_x;
+ break;
+
+ case XENMEM_access_x:
+ violation = rsp.access_r || rsp.access_w;
+ break;
+
+ case XENMEM_access_rx:
+ case XENMEM_access_rx2rw:
+ violation = rsp.access_w;
+ break;
+
+ case XENMEM_access_wx:
+ violation = rsp.access_r;
+ break;
+
+ case XENMEM_access_rw:
+ violation = rsp.access_x;
+ break;
+
+ case XENMEM_access_rwx:
+ violation = 0;
+ break;
+ }
+ }
+
+ if ( violation )
+ v->arch.mem_event.emulate_flags = rsp.flags;
+ }
+
/* Unpause domain */
if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
mem_event_vcpu_unpause(v);
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 83329ed..440aa81 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -458,6 +458,15 @@ struct arch_vcpu

/* A secondary copy of the vcpu time info. */
XEN_GUEST_HANDLE(vcpu_time_info_t) time_info_guest;
+
+ /* Should we emulate the next matching instruction on VCPU resume
+ * after a mem_event? */
+ struct {
+ uint32_t emulate_flags;
+ unsigned long gpa;
+ unsigned long eip;
+ } mem_event;
+
} __cacheline_aligned;

smap_check_policy_t smap_policy_change(struct vcpu *v,
diff --git a/xen/include/public/mem_event.h b/xen/include/public/mem_event.h
index d3dd9c6..92c063c 100644
--- a/xen/include/public/mem_event.h
+++ b/xen/include/public/mem_event.h
@@ -31,11 +31,13 @@
#include "io/ring.h"

/* Memory event flags */
-#define MEM_EVENT_FLAG_VCPU_PAUSED (1 << 0)
-#define MEM_EVENT_FLAG_DROP_PAGE (1 << 1)
-#define MEM_EVENT_FLAG_EVICT_FAIL (1 << 2)
-#define MEM_EVENT_FLAG_FOREIGN (1 << 3)
-#define MEM_EVENT_FLAG_DUMMY (1 << 4)
+#define MEM_EVENT_FLAG_VCPU_PAUSED (1 << 0)
+#define MEM_EVENT_FLAG_DROP_PAGE (1 << 1)
+#define MEM_EVENT_FLAG_EVICT_FAIL (1 << 2)
+#define MEM_EVENT_FLAG_FOREIGN (1 << 3)
+#define MEM_EVENT_FLAG_DUMMY (1 << 4)
+#define MEM_EVENT_FLAG_EMULATE (1 << 5)
+#define MEM_EVENT_FLAG_EMULATE_NOWRITE (1 << 6)

/* Reasons for the memory event request */
#define MEM_EVENT_REASON_UNKNOWN 0 /* typical reason */
--
1.7.9.5
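
The access switch in p2m_mem_access_resume() above can also be read as a bitmask test: the reply counts as a violation when it names any access type that the page's current setting does not permit. The following is a minimal standalone sketch of that reading, not the patch's code. The access_t enum, the PERM_* bits and is_violation() are hypothetical stand-ins for xenmem_access_t and the rsp.access_r/w/x fields.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the XENMEM_access_* values in the patch. */
typedef enum {
    ACCESS_N, ACCESS_R, ACCESS_W, ACCESS_RW, ACCESS_X,
    ACCESS_RX, ACCESS_WX, ACCESS_RWX, ACCESS_RX2RW, ACCESS_N2RWX
} access_t;

#define PERM_R 1u
#define PERM_W 2u
#define PERM_X 4u

/* Access types each setting permits. As in the patch's switch, n2rwx is
 * treated like n, rx2rw like rx, and unknown values permit nothing. */
static unsigned int permitted_bits(access_t a)
{
    switch ( a )
    {
    case ACCESS_R:     return PERM_R;
    case ACCESS_W:     return PERM_W;
    case ACCESS_X:     return PERM_X;
    case ACCESS_RW:    return PERM_R | PERM_W;
    case ACCESS_RX:
    case ACCESS_RX2RW: return PERM_R | PERM_X;
    case ACCESS_WX:    return PERM_W | PERM_X;
    case ACCESS_RWX:   return PERM_R | PERM_W | PERM_X;
    case ACCESS_N:
    case ACCESS_N2RWX:
    default:           return 0;
    }
}

/* True when the reported access uses a type the setting does not permit. */
bool is_violation(access_t a, bool r, bool w, bool x)
{
    unsigned int attempted = (r ? PERM_R : 0u) | (w ? PERM_W : 0u) |
                             (x ? PERM_X : 0u);

    return (attempted & ~permitted_bits(a)) != 0;
}
```

Note one difference from the patch: there, violation starts out as 1 and stays 1 when p2m_get_mem_access() fails, a case this sketch does not model.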


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [PATCH V6 5/5] xen: Handle resumed instruction based on previous mem_event reply
On Tue, Sep 9, 2014 at 11:28 AM, Razvan Cojocaru
<rcojocaru@bitdefender.com> wrote:
> In a scenario where a page fault that triggered a mem_event occurred,
> p2m_mem_access_check() will now be able to either 1) emulate the
> current instruction, or 2) emulate it, but don't allow it to perform
> any writes.
>
> Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>
[snip]

> + else if ( v->arch.mem_event.emulate_flags == 0 &&
> + npfec.kind != npfec_kind_with_gla ) /* don't send a mem_event */
> + {
> + v->arch.mem_event.emulate_flags = MEM_EVENT_FLAG_EMULATE;
> + v->arch.mem_event.gpa = gpa;
> + v->arch.mem_event.eip = eip;
> + }

It looks like if the previous if() is true, it will never get to
this point (because it will either return 0 or 1 depending on whether
p2m->access_required is set). So you don't need to make this an
"else" here -- you should just add a blank line and make this a normal
if().

Also, maybe it's just because I'm not familiar with the mem_event
interface, but I don't really see what this code is doing. It seems
to be changing the behavior even for clients that aren't using
MEM_EVENT_FLAG_EMULATE*. Is that intended? In any case it seems like
there could be a better comment here.
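
For readers trying to follow what the new code does, the decision logic the patch adds to p2m_mem_access_check() can be modelled as a small state machine. This is a simplified sketch under stated assumptions, not the hypervisor code: struct mem_event_state mirrors the struct the patch adds to struct arch_vcpu, while enum action, on_access_fault() and the with_gla parameter (standing in for npfec.kind == npfec_kind_with_gla) are hypothetical names. The preceding "no listener" branch is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEM_EVENT_FLAG_EMULATE         (1u << 5)
#define MEM_EVENT_FLAG_EMULATE_NOWRITE (1u << 6)

/* Mirrors the per-vcpu struct the patch adds to struct arch_vcpu. */
struct mem_event_state {
    uint32_t emulate_flags;
    unsigned long gpa;
    unsigned long eip;
};

/* Hypothetical result type: what the fault handler ends up doing. */
enum action { SEND_EVENT, EMULATE, EMULATE_NOWRITE };

enum action on_access_fault(struct mem_event_state *s, unsigned long gpa,
                            unsigned long eip, bool with_gla)
{
    enum action act;

    /* No pending reply and no GLA: emulate without sending an event. */
    if ( s->emulate_flags == 0 && !with_gla )
    {
        s->emulate_flags = MEM_EVENT_FLAG_EMULATE;
        s->gpa = gpa;
        s->eip = eip;
    }

    /* A pending reply only applies to the gpa/eip pair it was sent for. */
    if ( s->gpa != gpa || s->eip != eip )
    {
        s->emulate_flags = 0;
        s->gpa = gpa;
        s->eip = eip;
    }

    if ( s->emulate_flags == 0 )
        return SEND_EVENT;

    act = (s->emulate_flags & MEM_EVENT_FLAG_EMULATE_NOWRITE)
          ? EMULATE_NOWRITE : EMULATE;
    s->emulate_flags = 0; /* the request is one-shot */
    return act;
}
```

In this model the first branch is the one that changes behaviour for every client: a fault without a GLA is emulated even though no reply ever asked for emulation.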

-George

Re: [PATCH V6 5/5] xen: Handle resumed instruction based on previous mem_event reply
On 09/10/2014 07:03 PM, George Dunlap wrote:
> On Tue, Sep 9, 2014 at 11:28 AM, Razvan Cojocaru
> <rcojocaru@bitdefender.com> wrote:
>> In a scenario where a page fault that triggered a mem_event occurred,
>> p2m_mem_access_check() will now be able to either 1) emulate the
>> current instruction, or 2) emulate it, but don't allow it to perform
>> any writes.
>>
>> Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
>> Acked-by: Kevin Tian <kevin.tian@intel.com>
> [snip]
>
>> + else if ( v->arch.mem_event.emulate_flags == 0 &&
>> + npfec.kind != npfec_kind_with_gla ) /* don't send a mem_event */
>> + {
>> + v->arch.mem_event.emulate_flags = MEM_EVENT_FLAG_EMULATE;
>> + v->arch.mem_event.gpa = gpa;
>> + v->arch.mem_event.eip = eip;
>> + }
>
> It looks like if the previous if() is true, it will never get to
> this point (because it will either return 0 or 1 depending on whether
> p2m->access_required is set). So you don't need to make this an
> "else" here -- you should just add a blank line and make this a normal
> if().
>
> Also, maybe it's just because I'm not familiar with the mem_event
> interface, but I don't really see what this code is doing. It seems
> to be changing the behavior even for clients that aren't using
> MEM_EVENT_FLAG_EMULATE*. Is that intended? In any case it seems like
> there could be a better comment here.

Thanks, those are very good points. I'll make that a regular if(), and
test also if introspection monitoring is enabled (please see patch 3/5:
d->arch.hvm_domain.introspection_enabled) before setting the emulate
flag, that way we won't alter the behaviour for other clients.

As for the previous if, I think that if it holds then it won't be
possible to send a mem_event anyway, hence the else.


Thanks,
Razvan Cojocaru

Re: [PATCH V6 5/5] xen: Handle resumed instruction based on previous mem_event reply
On Wed, Sep 10, 2014 at 5:12 PM, Razvan Cojocaru
<rcojocaru@bitdefender.com> wrote:
> On 09/10/2014 07:03 PM, George Dunlap wrote:
>> On Tue, Sep 9, 2014 at 11:28 AM, Razvan Cojocaru
>> <rcojocaru@bitdefender.com> wrote:
>>> In a scenario where a page fault that triggered a mem_event occurred,
>>> p2m_mem_access_check() will now be able to either 1) emulate the
>>> current instruction, or 2) emulate it, but don't allow it to perform
>>> any writes.
>>>
>>> Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
>>> Acked-by: Kevin Tian <kevin.tian@intel.com>
>> [snip]
>>
>>> + else if ( v->arch.mem_event.emulate_flags == 0 &&
>>> + npfec.kind != npfec_kind_with_gla ) /* don't send a mem_event */
>>> + {
>>> + v->arch.mem_event.emulate_flags = MEM_EVENT_FLAG_EMULATE;
>>> + v->arch.mem_event.gpa = gpa;
>>> + v->arch.mem_event.eip = eip;
>>> + }
>>
>> It looks like if the previous if() is true, it will never get to
>> this point (because it will either return 0 or 1 depending on whether
>> p2m->access_required is set). So you don't need to make this an
>> "else" here -- you should just add a blank line and make this a normal
>> if().
>>
>> Also, maybe it's just because I'm not familiar with the mem_event
>> interface, but I don't really see what this code is doing. It seems
>> to be changing the behavior even for clients that aren't using
>> MEM_EVENT_FLAG_EMULATE*. Is that intended? In any case it seems like
>> there could be a better comment here.
>
> Thanks, those are very good points. I'll make that a regular if(), and
> test also if introspection monitoring is enabled (please see patch 3/5:
> d->arch.hvm_domain.introspection_enabled) before setting the emulate
> flag, that way we won't alter the behaviour for other clients.

...and you should also put a comment there explaining why someone with
introspection enabled wouldn't want an event here (something I'm still
not clear on).

Are you *sure* that everyone who enables introspection will want that
event suppressed (not just you), and that no one else will?
Otherwise, it might make more sense to add some kind of flag to enable
or disable it, rather than gating it on introspection. Or it's
possible everyone actually does want that event suppressed -- in which
case making it universal is the best option.

Andres, any opinions here?

> As for the previous if, I think that if it holds then it won't be
> possible to send a mem_event anyway, hence the else.

Sure, it won't be possible to send a mem event, because that code will
not be executed at all. :-)

Putting an "else" there sort of implies to someone reading the code
that you think the if() block might be executed and then continue
executing, which is misleading. In your patch it's even more
misleading, because the else only covers the first if() and not the
subsequent conditionals you've added right after; which implies that
the if() block might be executed and then execute the conditionals
below this one, but not this one.
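
The point about the redundant "else" can be shown with two toy functions. This is only an illustrative sketch: the function names, parameters and return codes are hypothetical and do not correspond to anything in p2m.c. Because every path through the first block returns, both shapes behave identically, and the second reads more honestly.

```c
/* Hypothetical handler shapes; the return codes are arbitrary. */
int handler_with_else(int no_listener, int quiet_fault)
{
    if ( no_listener )
    {
        return 0;            /* every path through this block returns */
    }
    else if ( quiet_fault )  /* so the "else" adds nothing */
    {
        return 1;
    }

    return 2;
}

int handler_plain_if(int no_listener, int quiet_fault)
{
    if ( no_listener )
    {
        return 0;
    }

    if ( quiet_fault )       /* same behaviour, no implied fall-through */
    {
        return 1;
    }

    return 2;
}
```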

The fewer things a programmer has to remember / figure out / keep in
her head, the less likely she is to make a mistake. :-)

-George

Re: [PATCH V6 5/5] xen: Handle resumed instruction based on previous mem_event reply
>
> From: George Dunlap <dunlapg@umich.edu>
> Date: Wed, Sep 10, 2014 at 9:38 AM
> Subject: Re: [Xen-devel] [PATCH V6 5/5] xen: Handle resumed instruction
> based on previous mem_event reply
> To: Razvan Cojocaru <rcojocaru@bitdefender.com>
> Cc: "Tian, Kevin" <kevin.tian@intel.com>, Keir Fraser <keir@xen.org>, Ian
> Campbell <ian.campbell@citrix.com>, Stefano Stabellini <
> stefano.stabellini@eu.citrix.com>, Jun Nakajima <jun.nakajima@intel.com>,
> Ian Jackson <ian.jackson@eu.citrix.com>, "Dong, Eddie" <
> eddie.dong@intel.com>, Tim Deegan <tim@xen.org>, Jan Beulich <
> jbeulich@suse.com>, Andrew Cooper <andrew.cooper3@citrix.com>, xen-devel <
> xen-devel@lists.xenproject.org>, Andres Lagar-Cavilla <
> andres@gridcentric.ca>
>
>
> On Wed, Sep 10, 2014 at 5:12 PM, Razvan Cojocaru
> <rcojocaru@bitdefender.com> wrote:
> > On 09/10/2014 07:03 PM, George Dunlap wrote:
> >> On Tue, Sep 9, 2014 at 11:28 AM, Razvan Cojocaru
> >> <rcojocaru@bitdefender.com> wrote:
> >>> In a scenario where a page fault that triggered a mem_event occurred,
> >>> p2m_mem_access_check() will now be able to either 1) emulate the
> >>> current instruction, or 2) emulate it, but don't allow it to perform
> >>> any writes.
> >>>
> >>> Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
> >>> Acked-by: Kevin Tian <kevin.tian@intel.com>
> >> [snip]
> >>
> >>> + else if ( v->arch.mem_event.emulate_flags == 0 &&
> >>> + npfec.kind != npfec_kind_with_gla ) /* don't send a mem_event */
> >>> + {
> >>> + v->arch.mem_event.emulate_flags = MEM_EVENT_FLAG_EMULATE;
> >>> + v->arch.mem_event.gpa = gpa;
> >>> + v->arch.mem_event.eip = eip;
> >>> + }
> >>
> >> It looks like if the previous if() is true, it will never get to
> >> this point (because it will either return 0 or 1 depending on whether
> >> p2m->access_required is set). So you don't need to make this an
> >> "else" here -- you should just add a blank line and make this a normal
> >> if().
> >>
> >> Also, maybe it's just because I'm not familiar with the mem_event
> >> interface, but I don't really see what this code is doing. It seems
> >> to be changing the behavior even for clients that aren't using
> >> MEM_EVENT_FLAG_EMULATE*. Is that intended? In any case it seems like
> >> there could be a better comment here.
> >
> > Thanks, those are very good points. I'll make that a regular if(), and
> > test also if introspection monitoring is enabled (please see patch 3/5:
> > d->arch.hvm_domain.introspection_enabled) before setting the emulate
> > flag, that way we won't alter the behaviour for other clients.
>
> ...and you should also put a comment there explaining why someone with
> introspection enabled wouldn't want an event here (something I'm still
> not clear on).
>
> Are you *sure* that everyone who enables introspection will want that
> event suppressed (not just you), and that no one else will?
> Otherwise, it might make more sense to add some kind of flag to enable
> or disable it, rather than gating it on introspection. Or it's
> possible everyone actually does want that event suppressed -- in which
> case making it universal is the best option.
>
> Andres, any opinions here?
>

My view of the mem event interface is that it should err on the side of
informing the consumer. Now, if the consumer doesn't sign up for something,
why bother (i.e. we don't inform of writes, if the access mode set for the
gfn does not mask writes, etc).

In an ideal world, the emulation of the instruction should raise all
relevant new mem events. We don't know a priori what the consumer might
learn throughout the execution of this specific instruction. Does it read
from or write to new gfns which have mem access masks set? TTBOMK, because
the emulation does not go through the EPT fault handler, no mem access
events will be generated, even if they should.

This is a long-standing shortcoming of mem event in security frameworks, in
that mem access is only defined as raising events through EPT faults. One
could conceivably craft an attack by having an instruction that through its
emulation reads/writes a massive buffer going into other gfns. Conversely,
"virtual DMA", i.e. qemu accesses via map_foreign_pages and grant accesses
from backends don't raise mem access events. So there are many (conceptual)
holes.

A decent thing to do for now would be to add a flag ..._EMULATE_SILENT,
which resolves to the current behavior, and lack of ..._EMULATE_SILENT in a
brave future would raise all the mem access events resulting from the full
emulation of this instruction. Fix the API at least, before it's set in
stone.
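
One way to picture the proposed flag is sketched below. This is an assumption-laden illustration: the thread only names the suffix ..._EMULATE_SILENT, so the full name MEM_EVENT_FLAG_EMULATE_SILENT, its bit position and the helper emulation_raises_events() are all hypothetical. Only the two EMULATE flags come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* From the patch. */
#define MEM_EVENT_FLAG_EMULATE         (1u << 5)
#define MEM_EVENT_FLAG_EMULATE_NOWRITE (1u << 6)

/* HYPOTHETICAL: name and bit position are made up for this sketch. */
#define MEM_EVENT_FLAG_EMULATE_SILENT  (1u << 7)

/* Would emulating the instruction be expected to raise mem access events
 * for the other pages it touches? With SILENT set this matches the
 * behaviour of the patch as posted (no events). Without it, a future
 * implementation would raise them. */
bool emulation_raises_events(uint32_t reply_flags)
{
    if ( !(reply_flags & MEM_EVENT_FLAG_EMULATE) )
        return false; /* nothing is emulated at all */

    return !(reply_flags & MEM_EVENT_FLAG_EMULATE_SILENT);
}
```

Under this reading, a consumer that wants today's behaviour would always set both flags, which leaves room for the non-silent variant to be implemented later without changing the interface.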

Thanks
Andres



>
> > As for the previous if, I think that if it holds then it won't be
> > possible to send a mem_event anyway, hence the else.
>
> Sure, it won't be possible to send a mem event, because that code will
> not be executed at all. :-)
>
> Putting an "else" there sort of implies to someone reading the code
> that you think the if() block might be executed and then continue
> executing, which is misleading. In your patch it's even more
> misleading, because the else only covers the first if() and not the
> subsequent conditionals you've added right after; which implies that
> the if() block might be executed and then execute the conditionals
> below this one, but not this one.
>
> The fewer things a programmer has to remember / figure out / keep in
> her head, the less likely she is to make a mistake. :-)
>
> -George
>
>
>
> --
> Andres Lagar-Cavilla | Google Cloud Platform | andreslc@google.com |
> 647-778-4380
>
Re: [PATCH V6 5/5] xen: Handle resumed instruction based on previous mem_event reply
On 09/10/14 21:28, Andres Lagar Cavilla wrote:
> On Wed, Sep 10, 2014 at 5:12 PM, Razvan Cojocaru
> <rcojocaru@bitdefender.com> wrote:
> > On 09/10/2014 07:03 PM, George Dunlap wrote:
> >> On Tue, Sep 9, 2014 at 11:28 AM, Razvan Cojocaru
> >> <rcojocaru@bitdefender.com> wrote:
> >>> In a scenario where a page fault that triggered a mem_event occurred,
> >>> p2m_mem_access_check() will now be able to either 1) emulate the
> >>> current instruction, or 2) emulate it, but don't allow it to perform
> >>> any writes.
> >>>
> >>> Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
> >>> Acked-by: Kevin Tian <kevin.tian@intel.com>
> >> [snip]
> >>
> >>> + else if ( v->arch.mem_event.emulate_flags == 0 &&
> >>> + npfec.kind != npfec_kind_with_gla ) /* don't send a mem_event */
> >>> + {
> >>> + v->arch.mem_event.emulate_flags = MEM_EVENT_FLAG_EMULATE;
> >>> + v->arch.mem_event.gpa = gpa;
> >>> + v->arch.mem_event.eip = eip;
> >>> + }
> >>
> >> It looks like if the previous if() is true, it will never get to
> >> this point (because it will either return 0 or 1 depending on whether
> >> p2m->access_required is set). So you don't need to make this an
> >> "else" here -- you should just add a blank line and make this a normal
> >> if().
> >>
> >> Also, maybe it's just because I'm not familiar with the mem_event
> >> interface, but I don't really see what this code is doing. It seems
> >> to be changing the behavior even for clients that aren't using
> >> MEM_EVENT_FLAG_EMULATE*. Is that intended? In any case it seems like
> >> there could be a better comment here.
> >
> > Thanks, those are very good points. I'll make that a regular if(), and
> > test also if introspection monitoring is enabled (please see patch 3/5:
> > d->arch.hvm_domain.introspection_enabled) before setting the emulate
> > flag, that way we won't alter the behaviour for other clients.
>
> ...and you should also put a comment there explaining why someone with
> introspection enabled wouldn't want an event here (something I'm still
> not clear on).
>
> Are you *sure* that everyone who enables introspection will want that
> event suppressed (not just you), and that no one else will?
> Otherwise, it might make more sense to add some kind of flag to enable
> or disable it, rather than gating it on introspection. Or it's
> possible everyone actually does want that event suppressed -- in which
> case making it universal is the best option.
>
> Andres, any opinions here?
>
>
> My view of the mem event interface is that it should err on the side of
> informing the consumer. Now, if the consumer doesn't sign up for
> something, why bother (i.e. we don't inform of writes, if the access
> mode set for the gfn does not mask writes, etc).
>
> In an ideal world, the emulation of the instruction should raise all
> relevant new mem events. We don't know a priori what the consumer might
> learn throughout the execution of this specific instruction. Does it
> read from or write to new gfns which have mem access masks set? TTBOMK,
> because the emulation does not go through the EPT fault handler, no mem
> access events will be generated, even if they should.
>
> This is a long-standing shortcoming of mem event in security frameworks,
> in that mem access is only defined as raising events through EPT faults.
> One could conceivably craft an attack by having an instruction that
> through its emulation reads/writes a massive buffer going into other
> gfns. Conversely, "virtual DMA", i.e. qemu accesses via
> map_foreign_pages and grant accesses from backends don't raise mem
> access events. So there are many (conceptual) holes.
>
> A decent thing to do for now would be to add a flag ..._EMULATE_SILENT,
> which resolves to the current behavior, and lack of ..._EMULATE_SILENT
> in a brave future would raise all the mem access events resulting from
> the full emulation of this instruction. Fix the API at least, before
> it's set in stone.

As far as I understand, George is asking about why events that have
npfec.kind != npfec_kind_with_gla are being emulated instead of being
sent out like the rest, and if that's a requirement that all memory
introspection clients might have.

To answer that question, _our_ application is not interested in events
other than npfec_kind_with_gla, and because of that it seemed worthwhile
to save a HV <-> dom0 roundtrip for events that would need to be ignored
by the application anyway, and thus keep the guest as responsive as
possible. I can't, of course, state that no other introspection client
will be interested in the other types of events. But I can add another
parameter to xc_mem_access_enable_introspection() (please see patch 3/5
in the series) to specify whether non-npfec_kind_with_gla events should
be ignored or not (is this what the ..._EMULATE_SILENT suggestion refers
to?).


Thanks,
Razvan Cojocaru

Re: [PATCH V6 5/5] xen: Handle resumed instruction based on previous mem_event reply
On Wed, Sep 10, 2014 at 10:28 PM, Razvan Cojocaru
<rcojocaru@bitdefender.com> wrote:
> As far as I understand, George is asking about why events that have
> npfec.kind != npfec_kind_with_gla are being emulated instead of being
> sent out like the rest, and if that's a requirement that all memory
> introspection clients might have.
>
> To answer that question, _our_ application is not interested in events
> other than npfec_kind_with_gla, and because of that it seemed worthwhile
> to save a HV <-> dom0 roundtrip for events that would need to be ignored
> by the application anyway, and thus keep the guest as responsive as
> possible. I can't, of course, state that no other introspection client
> will be interested in the other types of events. But I can add another
> parameter to xc_mem_access_enable_introspection() (please see patch 3/5
> in the series) to specify whether non-npfec_kind_with_gla events should
> be ignored or not (is this what the ..._EMULATE_SILENT suggestion refers
> to?).

I know when you come new to the list it seems like we're being a bunch
of old stodgy skeptics nitpicking everything for no good reason. :-)

So just to explain a bit where we're coming from: We're always happy
to have improvements from people, and we're glad to have more people
using Xen. But there are interfaces that we're going to have to be
supporting long-term. When you're writing a product and you own the
entire piece of code, you can make all these kinds of changes however
you want; the only people it will affect in the future are you.
Adding them to a shared project like Xen, they affect lots of other
people who aren't doing what you're doing. Furthermore, every
addition to the interface makes it a tiny bit more complicated,
fragile, and easy to break; and it's fairly likely that we'll end up
being the ones having to fix it at some point.

So we definitely want you to be able to write your introspection
engine and have it run well. But we also want to make sure that you
don't break anything for anyone else; and, we want to make sure that
anything you're adding to the interface is worth the extra cost in
terms of maintenance: that means pushing back and asking if you could
accomplish your goals just as well with the existing interface, or
with a simpler interface that is 1) useful generally, not just to you,
and 2) simpler and easier for us to maintain.

And we realize you may not have experience with this kind of project
before, which is why we're giving you feedback (even if we sometimes
forget what it's like to be new and get annoyed at having to say
the same thing to every new person who wants to contribute -- you'll
have to cut us some slack too).

So with that in mind:

This "don't give a notification unless npfec_kind_with_gla" is a separate
feature that should have been introduced in a separate patch. Of
course it's nice not to have to do context switches when you don't
have to; but adding a whole raft of flags for events you want to be
able to skip opens up a can of worms interface wise. So it needs to
be justified. How expensive is it actually to just have the controller
ignore these? How many unwanted events like this do you get on a
regular basis, and does it actually have a measurable performance
impact?

And if it does have a measurable performance impact, is it likely that
we're going to have other events that we may want to be able to filter
out in the hypervisor? Is it better to think about a more general /
scalable way of specifying these events, rather than adding flags in
an ad-hoc fashion as they come up?

On the whole, if you're hoping to have the less controversial bits
(patches 1-3, and the other half of this patch) in 4.5, you might want
to set this to the side and come back to it after the code freeze is
done.

-George

Re: [PATCH V6 5/5] xen: Handle resumed instruction based on previous mem_event reply
On 09/11/2014 01:09 PM, George Dunlap wrote:
> So just to explain a bit where we're coming from: We're always happy
> to have improvements from people, and we're glad to have more people
> using Xen. But there are interfaces that we're going to have to be
> supporting long-term. When you're writing a product and you own the
> entire piece of code, you can make all these kinds of changes however
> you want; the only people it will affect in the future are you.
> Adding them to a shared project like Xen, they affect lots of other
> people who aren't doing what you're doing. Furthermore, every
> addition to the interface makes it a tiny bit more complicated,
> fragile, and easy to break; and it's fairly likely that we'll end up
> being the ones having to fix it at some point.

Thank you for the message, it's appreciated! Fair enough.

> So with that in mind:
>
> This "don't give a notification unless npfec_kind_with_gla" is a separate
> feature that should have been introduced in a separate patch. Of
> course it's nice not to have to do context switches when you don't
> have to; but adding a whole raft of flags for events you want to be
> able to skip opens up a can of worms interface wise. So it needs to
> be justified. How expensive is it actually to just have the controller
> ignore these? How many unwanted events like this do you get on a
> regular basis, and does it actually have a measurable performance
> impact?
>
> And if it does have a measurable performance impact, is it likely that
> we're going to have other events that we may want to be able to filter
> out in the hypervisor? Is it better to think about a more general /
> scalable way of specifying these events, rather than adding flags in
> an ad-hoc fashion as they come up?
>
> On the whole, if you're hoping to have the less controversial bits
> (patches 1-3, and the other half of this patch) in 4.5, you might want
> to set this to the side and come back to it after the code freeze is
> done.

The impact is acceptable, I'll do the filtering in the application.
Thanks for the suggestion! I'll do a bit more testing, and if all goes
well I'll resubmit the series as patches 1-3, and 5 without the GLA
filtering.
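
Filtering in the application, as opposed to in the hypervisor, could look roughly like the sketch below. Everything here is hypothetical: struct access_event is a simplified stand-in rather than the real mem_event_request_t layout, and app_wants_event() and count_wanted() are made-up helpers. The only policy taken from the thread is that this particular application cares about npfec_kind_with_gla faults and ignores the rest.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a received event; this is not the
 * real mem_event_request_t layout. */
struct access_event {
    uint64_t gfn;
    bool     with_gla; /* fault kind was npfec_kind_with_gla */
};

/* Application policy from the thread: only with-GLA faults matter. */
bool app_wants_event(const struct access_event *ev)
{
    return ev->with_gla;
}

/* Count the events that need real handling; the others would simply be
 * acknowledged so that the vcpu can resume. */
size_t count_wanted(const struct access_event *evs, size_t n)
{
    size_t wanted = 0;

    for ( size_t i = 0; i < n; i++ )
        if ( app_wants_event(&evs[i]) )
            wanted++;

    return wanted;
}
```

The cost of this approach is the extra hypervisor-to-dom0 round trip for each ignored event, which the message above judges acceptable.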


Thanks,
Razvan Cojocaru

Re: [PATCH V6 5/5] xen: Handle resumed instruction based on previous mem_event reply
I've removed the CC's as I'm going a bit off-topic here.


> In an ideal world, the emulation of the instruction should raise all
> relevant new mem events. We don't know a priori what the consumer might
> learn throughout the execution of this specific instruction. Does it read
> from or write to new gfns which have mem access masks set? TTBOMK, because
> the emulation does not go through the EPT fault handler, no mem access
> events will be generated, even if they should.
>
> This is a long-standing shortcoming of mem event in security frameworks,
> in that mem access is only defined as raising events through EPT faults.
> One could conceivably craft an attack by having an instruction that through
> its emulation reads/writes a massive buffer going into other gfns.
> Conversely, "virtual DMA", i.e. qemu accesses via map_foreign_pages and
> grant accesses from backends don't raise mem access events. So there are
> many (conceptual) holes.
>

Could you provide an example instruction that is trapped-and-emulated by
Xen which may be used in such a fashion? Also, is there any technical
reason why we couldn't hook such emulations into the mem_event system?

Thanks,
Tamas
Re: [PATCH V6 5/5] xen: Handle resumed instruction based on previous mem_event reply
On Thu, Sep 11, 2014 at 7:40 AM, Tamas K Lengyel <tamas.lengyel@zentific.com> wrote:

> I've removed the CC's as I'm going a bit off-topic here.
>
>
>> In an ideal world, the emulation of the instruction should raise all
>> relevant new mem events. We don't know a priori what the consumer might
>> learn throughout the execution of this specific instruction. Does it read
>> from or write to new gfns which have mem access masks set? TTBOMK, because
>> the emulation does not go through the EPT fault handler, no mem access
>> events will be generated, even if they should.
>>
>> This is a long-standing shortcoming of mem event in security frameworks,
>> in that mem access is only defined as raising events through EPT faults.
>> One could conceivably craft an attack by having an instruction that through
>> its emulation reads/writes a massive buffer going into other gfns.
>> Conversely, "virtual DMA", i.e. qemu accesses via map_foreign_pages and
>> grant accesses from backends don't raise mem access events. So there are
>> many (conceptual) holes.
>>
>
> Could you provide an example instruction that is trapped-and-emulated by
> Xen which may be used in such a fashion? Also, is there any technical
> reason why we couldn't hook such emulations into the mem_event system?
>

Tamas,
I think it's safe to assume Razvan's dom0 application is powerful enough to
emulate the entire trapping instruction and not be victimized.

For the sake of argument, what I'm getting at is that after the mem_event has
been handled and control is passed to hvm_emulate_one, Xen will start
resolving the gfn->mfn translations needed by the instruction emulation by
internally walking the p2m (read: EPT) table with get_page_from_gfn. This
will not invoke p2m_mem_access_check (which only happens for actual hw
faults), so an instruction that reads or writes across pages will not have a
mem event generated for the other pages. A rep stos across page boundaries
would do that (key: the rep stos is emulated in Xen, and the eip is then
moved silently forward, so the hardware never actually gets to execute the
instruction).
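
As a toy illustration of the gap described above, here is a minimal sketch in
C. It is not Xen code: toy_guest, toy_fault_check, toy_lookup and
toy_emulate_rep_stosb are hypothetical names that only stand in for the roles
played by the hardware fault path, get_page_from_gfn and the emulator. The
point it shows is that the access mask is consulted only on the fault path, so
a store that spills into a second watched page produces a single event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define NR_PAGES  4u

/* Hypothetical toy model of a guest; none of these names exist in Xen. */
struct toy_guest {
    uint8_t  ram[NR_PAGES * PAGE_SIZE];
    bool     write_watched[NR_PAGES]; /* stands in for a mem access mask */
    unsigned events;                  /* mem events "sent" so far */
};

/* Fault path: the only place in this model that looks at the access mask. */
static bool toy_fault_check(struct toy_guest *g, uint64_t gpa)
{
    size_t gfn = gpa / PAGE_SIZE;

    assert(gfn < NR_PAGES);
    if (g->write_watched[gfn]) {
        g->events++;
        return true;
    }
    return false;
}

/* Lookup path: translates an address and never consults write_watched. */
static uint8_t *toy_lookup(struct toy_guest *g, uint64_t gpa)
{
    assert(gpa < sizeof(g->ram));
    return &g->ram[gpa];
}

/* Emulated "rep stosb": every byte goes through the lookup path only. */
static void toy_emulate_rep_stosb(struct toy_guest *g, uint64_t gpa,
                                  uint8_t val, size_t count)
{
    for (size_t i = 0; i < count; i++)
        *toy_lookup(g, gpa + i) = val;
}

/*
 * A 32-byte store starting 16 bytes before the end of page 0, with pages 0
 * and 1 both watched. The first byte faults and is reported; the whole
 * instruction is then emulated. Returns the number of events generated.
 */
unsigned toy_demo(struct toy_guest *g)
{
    uint64_t start = PAGE_SIZE - 16;

    memset(g, 0, sizeof(*g));
    g->write_watched[0] = true;
    g->write_watched[1] = true;

    toy_fault_check(g, start);                 /* event for page 0 */
    toy_emulate_rep_stosb(g, start, 0xAA, 32); /* page 1 written silently */
    return g->events;
}
```

Calling toy_demo on a toy_guest leaves the first 16 bytes of page 1
overwritten while only the event for page 0 has been counted.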

A harder-to-catch example is a qemu-based driver, which grabs guest pages
via the mapcache buckets using xc_map_foreign_bulk. This resolves
to MMU_NORMAL_PT_UPDATE, which will grab the target page with ...
get_page_from_gfn. Basically, no page that qemu reads from or writes to will
result in a mem event. This is akin to an unrestricted DMA engine that can
bypass the hardware PTE protection bits and do things behind the OS's back.

Grant mapping also uses get_page_from_gfn ... no mem access checks.

The way to fix it is very laborious, which is why it hasn't happened. The
root cause is that p2m->get_entry does not check any of the access bits. It
could, and then you would be generating mem events from everywhere. But
that brings two problems. First, repeated events, as the same gfn may be
read multiple times -- I don't think anybody wants that. Second, you have
to be able to sleep on a wait queue when the event ring fills up (unless
you are comfortable dropping events). Sleeping on a wait queue pretty much
means stopping everything you are doing, carefully unrolling your stack
until you hold no spinlocks, going into the wait queue, and, when you wake
up, diving back into business.
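
The two problems above can be sketched with a toy event ring in C. This is
only an illustration under stated assumptions: toy_ring, toy_post and
toy_consume are hypothetical names, not the mem_event ring implementation, and
the per-gfn "pending" flag is just one possible way to suppress repeated
events. The sketch deliberately stops at reporting TOY_RING_FULL; deciding
whether to drop the event or to unwind and sleep at that point is the hard
part described above, and it is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 4u   /* power of two, so free-running indices wrap safely */
#define NR_GFNS    64u

enum toy_post_result { TOY_POSTED, TOY_DUPLICATE, TOY_RING_FULL };

/* Hypothetical toy ring; not the layout of the real mem_event ring. */
struct toy_ring {
    uint64_t slot[RING_SLOTS];
    unsigned prod, cons;    /* free-running producer/consumer indices */
    bool     pending[NR_GFNS]; /* gfn already has an unconsumed event */
};

/* Try to queue an event for gfn; the caller must handle TOY_RING_FULL. */
enum toy_post_result toy_post(struct toy_ring *r, uint64_t gfn)
{
    assert(gfn < NR_GFNS);

    if (r->pending[gfn])
        return TOY_DUPLICATE;           /* problem 1: repeated events */
    if (r->prod - r->cons == RING_SLOTS)
        return TOY_RING_FULL;           /* problem 2: drop or sleep */

    r->slot[r->prod % RING_SLOTS] = gfn;
    r->prod++;
    r->pending[gfn] = true;
    return TOY_POSTED;
}

/* Consumer side: take the oldest event, allowing that gfn to be reported again. */
bool toy_consume(struct toy_ring *r, uint64_t *gfn)
{
    if (r->prod == r->cons)
        return false;

    *gfn = r->slot[r->cons % RING_SLOTS];
    r->cons++;
    r->pending[*gfn] = false;
    return true;
}
```

With a zero-initialised toy_ring, posting the same gfn twice yields TOY_POSTED
and then TOY_DUPLICATE, and a fifth distinct gfn yields TOY_RING_FULL until
the consumer has taken at least one event.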

HTH
Andres


> Thanks,
> Tamas
>
Re: [PATCH V6 5/5] xen: Handle resumed instruction based on previous mem_event reply
On Thu, Sep 11, 2014 at 6:42 PM, Andres Lagar Cavilla <
andres@lagarcavilla.org> wrote:

> On Thu, Sep 11, 2014 at 7:40 AM, Tamas K Lengyel <
> tamas.lengyel@zentific.com> wrote:
>
>> I've removed the CC's as I'm going a bit off-topic here.
>>
>>
>>> In an ideal world, the emulation of the instruction should raise all
>>> relevant new mem events. We don't know a priori what the consumer might
>>> learn throughout the execution of this specific instruction. Does it read
>>> from or write to new gfns which have mem access masks set? TTBOMK, because
>>> the emulation does not go through the EPT fault handler, no mem access
>>> events will be generated, even if they should.
>>>
>>> This is a long-standing shortcoming of mem event in security frameworks,
>>> in that mem access is only defined as raising events through EPT faults.
>>> One could conceivably craft an attack by having an instruction that through
>>> its emulation reads/writes a massive buffer going into other gfns.
>>> Conversely, "virtual DMA", i.e. qemu accesses via map_foreign_pages and
>>> grant accesses from backends don't raise mem access events. So there are
>>> many (conceptual) holes.
>>>
>>
>> Could you provide an example instruction that is trapped-and-emulated by
>> Xen which may be used in such a fashion? Also, is there any technical
>> reason why we couldn't hook such emulations into the mem_event system?
>>
>
> Tamas,
> I think it's safe to assume Razvan's dom0 application is powerful enough
> to emulate the entire trapping instruction and not be victimized.
>
> For the sake of argument, what I'm going at is that after the mem_event
> has been handled and control is passed to hvm_emulate_one, Xen will start
> resolving gfn->mfn translations needed by the instruction emulation by
> internally walking the p2m (read EPT) table with get_page_from_gfn. This
> will not invoke p2m_mem_access_check (only happens for actual hw faults),
> so an instruction that reads or writes across pages will not have a mem
> event generated for the other pages. A rep stos across page boundaries
> would do that (key: the rep stos is emulated in Xen, and the eip is then
> moved silently forward, so the hardware actually doesn't get to execute the
> instruction).
>
> A harder to catch example is a qemu-based driver, which grabs guest pages
> via the mapcache buckets using xc_map_foreign_bulk. This resolves
> to MMU_NORMAL_PT_UPDATE, which will grab the target page with ...
> get_page_from_gfn. Basically, every page qemu reads/writes to/from will not
> result in a mem event. This is akin to an unrestricted DMA engine that can
> bypass the hardware PTE protection bits and do things behind the OS back.
>
> Grant mapping also uses get_page_from_gfn ... no mem access checks.
>
> The way to fix it is very laborious, that is why it hasn't happened. The
> root cause is that p2m->get_entry does not check any of the access bits. It
> could, and then you would be generating mem events from everywhere. But
> that brings two problems. First, repeated events, as the same gfn may be
> read multiple times -- I don't think anybody wants that. Second, you have
> to be able to sleep on a wait queue when the event ring fills up (unless
> you are comfortable dropping events). Sleeping on a wait queue pretty much
> means stopping everything you are doing, carefully unrolling your stack
> until you hold no spinlocks, going into the wait queue, and when you wake
> up dive back into business.
>
> HTH
> Andres
>

Thanks for the in-depth explanation, it certainly sheds some light on the
limitations of the mem_access system. I understand that any memory access
to mfns via mechanisms that don't use the trapped EPT (a pv domain or the
hypervisor itself) or that have a mapping of the same pages via different
EPTs won't trigger the mem_event traps. For the emulation part, my question
was rather whether you are aware of any emulation that currently takes place
(outside this patch series) which may be used in this fashion?

Thanks,
Tamas
Re: [PATCH V6 5/5] xen: Handle resumed instruction based on previous mem_event reply
On Thu, Sep 11, 2014 at 11:09 AM, Tamas K Lengyel <
tamas.lengyel@zentific.com> wrote:

>
>
> On Thu, Sep 11, 2014 at 6:42 PM, Andres Lagar Cavilla <
> andres@lagarcavilla.org> wrote:
>
>> On Thu, Sep 11, 2014 at 7:40 AM, Tamas K Lengyel <
>> tamas.lengyel@zentific.com> wrote:
>>
>>> I've removed the CC's as I'm going a bit off-topic here.
>>>
>>>
>>>> In an ideal world, the emulation of the instruction should raise all
>>>> relevant new mem events. We don't know a priori what the consumer might
>>>> learn throughout the execution of this specific instruction. Does it read
>>>> from or write to new gfns which have mem access masks set? TTBOMK, because
>>>> the emulation does not go through the EPT fault handler, no mem access
>>>> events will be generated, even if they should.
>>>>
>>>> This is a long-standing shortcoming of mem event in security
>>>> frameworks, in that mem access is only defined as raising events through
>>>> EPT faults. One could conceivably craft an attack by having an instruction
>>>> that through its emulation reads/writes a massive buffer going into other
>>>> gfns. Conversely, "virtual DMA", i.e. qemu accesses via map_foreign_pages
>>>> and grant accesses from backends don't raise mem access events. So there
>>>> are many (conceptual) holes.
>>>>
>>>
>>> Could you provide an example instruction that is trapped-and-emulated by
>>> Xen which may be used in such a fashion? Also, is there any technical
>>> reason why we couldn't hook such emulations into the mem_event system?
>>>
>>
>> Tamas,
>> I think it's safe to assume Razvan's dom0 application is powerful enough
>> to emulate the entire trapping instruction and not be victimized.
>>
>> For the sake of argument, what I'm going at is that after the mem_event
>> has been handled and control is passed to hvm_emulate_one, Xen will start
>> resolving gfn->mfn translations needed by the instruction emulation by
>> internally walking the p2m (read EPT) table with get_page_from_gfn. This
>> will not invoke p2m_mem_access_check (only happens for actual hw faults),
>> so an instruction that reads or writes across pages will not have a mem
>> event generated for the other pages. A rep stos across page boundaries
>> would do that (key: the rep stos is emulated in Xen, and the eip is then
>> moved silently forward, so the hardware actually doesn't get to execute the
>> instruction).
>>
>> A harder to catch example is a qemu-based driver, which grabs guest pages
>> via the mapcache buckets using xc_map_foreign_bulk. This resolves
>> to MMU_NORMAL_PT_UPDATE, which will grab the target page with ...
>> get_page_from_gfn. Basically, every page qemu reads/writes to/from will not
>> result in a mem event. This is akin to an unrestricted DMA engine that can
>> bypass the hardware PTE protection bits and do things behind the OS back.
>>
>> Grant mapping also uses get_page_from_gfn ... no mem access checks.
>>
>> The way to fix it is very laborious, that is why it hasn't happened. The
>> root cause is that p2m->get_entry does not check any of the access bits. It
>> could, and then you would be generating mem events from everywhere. But
>> that brings two problems. First, repeated events, as the same gfn may be
>> read multiple times -- I don't think anybody wants that. Second, you have
>> to be able to sleep on a wait queue when the event ring fills up (unless
>> you are comfortable dropping events). Sleeping on a wait queue pretty much
>> means stopping everything you are doing, carefully unrolling your stack
>> until you hold no spinlocks, going into the wait queue, and when you wake
>> up dive back into business.
>>
>> HTH
>> Andres
>>
>
> Thanks for the in-depth explanation, it certainly sheds some light on the
> limitations of the mem_access system. I understand that any memory access
> to mfn's via mechanisms that don't use the trapped EPT (a pv domain or the
> hypervisor itself) or have a mapping of the same pages via different EPTs
> won't trigger the mem_event traps. For the emulation part my question was
> rather if you are aware of any emulation that currently takes place
> (outside this patch series) which may be used in this fashion?
>

Uhm. Examples I can think of: MMIO access. The OS reads values from lapic
or hpet pages, and those get emulated (although there are lapic fast paths
out there). If the buffers in regular RAM fall in pages that have mem
access permission trapping set, then no event will be generated (by that
mmio instruction).

The same goes for everything your PV driver frontends need. Qemu does the
RTC (IIRC), so RTC reads also escape mem access.

Andres


> Thanks,
> Tamas
>
Re: [PATCH V6 5/5] xen: Handle resumed instruction based on previous mem_event reply
On 11/09/2014 19:39, Andres Lagar Cavilla wrote:
> On Thu, Sep 11, 2014 at 11:09 AM, Tamas K Lengyel
> <tamas.lengyel@zentific.com> wrote:
>
>
>
> On Thu, Sep 11, 2014 at 6:42 PM, Andres Lagar Cavilla
> <andres@lagarcavilla.org> wrote:
>
> On Thu, Sep 11, 2014 at 7:40 AM, Tamas K Lengyel
> <tamas.lengyel@zentific.com> wrote:
>
> I've removed the CC's as I'm going a bit off-topic here.
>
>
> In an ideal world, the emulation of the instruction
> should raise all relevant new mem events. We don't
> know a priori what the consumer might learn throughout
> the execution of this specific instruction. Does it
> read from or write to new gfns which have mem access
> masks set? TTBOMK, because the emulation does not go
> through the EPT fault handler, no mem access events
> will be generated, even if they should.
>
> This is a long-standing shortcoming of mem event in
> security frameworks, in that mem access is only
> defined as raising events through EPT faults. One
> could conceivably craft an attack by having an
> instruction that through its emulation reads/writes a
> massive buffer going into other gfns. Conversely,
> "virtual DMA", i.e. qemu accesses via
> map_foreign_pages and grant accesses from backends
> don't raise mem access events. So there are many
> (conceptual) holes.
>
>
> Could you provide an example instruction that is
> trapped-and-emulated by Xen which may be used in such a
> fashion? Also, is there any technical reason why we
> couldn't hook such emulations into the mem_event system?
>
>
> Tamas,
> I think it's safe to assume Razvan's dom0 application is
> powerful enough to emulate the entire trapping instruction and
> not be victimized.
>
> For the sake of argument, what I'm going at is that after the
> mem_event has been handled and control is passed to
> hvm_emulate_one, Xen will start resolving gfn->mfn
> translations needed by the instruction emulation by internally
> walking the p2m (read EPT) table with get_page_from_gfn. This
> will not invoke p2m_mem_access_check (only happens for actual
> hw faults), so an instruction that reads or writes across
> pages will not have a mem event generated for the other pages.
> A rep stos across page boundaries would do that (key: the rep
> stos is emulated in Xen, and the eip is then moved silently
> forward, so the hardware actually doesn't get to execute the
> instruction).
>
> A harder to catch example is a qemu-based driver, which grabs
> guest pages via the mapcache buckets using
> xc_map_foreign_bulk. This resolves to MMU_NORMAL_PT_UPDATE,
> which will grab the target page with ... get_page_from_gfn.
> Basically, every page qemu reads/writes to/from will not
> result in a mem event. This is akin to an unrestricted DMA
> engine that can bypass the hardware PTE protection bits and do
> things behind the OS back.
>
> Grant mapping also uses get_page_from_gfn ... no mem access
> checks.
>
> The way to fix it is very laborious, that is why it hasn't
> happened. The root cause is that p2m->get_entry does not check
> any of the access bits. It could, and then you would be
> generating mem events from everywhere. But that brings two
> problems. First, repeated events, as the same gfn may be read
> multiple times -- I don't think anybody wants that. Second,
> you have to be able to sleep on a wait queue when the event
> ring fills up (unless you are comfortable dropping events).
> Sleeping on a wait queue pretty much means stopping everything
> you are doing, carefully unrolling your stack until you hold
> no spinlocks, going into the wait queue, and when you wake up
> dive back into business.
>
> HTH
> Andres
>
>
> Thanks for the in-depth explanation, it certainly sheds some light
> on the limitations of the mem_access system. I understand that any
> memory access to mfn's via mechanisms that don't use the trapped
> EPT (a pv domain or the hypervisor itself) or have a mapping of
> the same pages via different EPTs won't trigger the mem_event
> traps. For the emulation part my question was rather if you are
> aware of any emulation that currently takes place (outside this
> patch series) which may be used in this fashion?
>
>
> Uhm. Examples I can think of: MMIO access. The OS reads values from
> lapic or hpet pages, and those get emulated (although there are lapic
> fast paths out there). If the buffers in regular RAM fall in pages
> that have mem access permission trapping set, then no event will be
> generated (by that mmio instruction).
>
> And all your PV driver frontend needs. Qemu does the RTC (IIRC), so
> RTC reads also escape mem access.

Xen does all forms of timer and interrupt emulation (so off the top of
my head, RTC, PIT, HPET, PMTimer, PIC, IOAPIC and LAPIC) but all other
legacy devices are handled by Qemu. There is now a fastpath for
anything emulated by Xen, for performance/scalability reasons with
many-vcpu guests.

~Andrew