
[RFC] Proposed XenStore Interactions for Multi-Queue VIFs
I'm posting this for an initial round of comments; I don't have any code
at present to implement this, and wanted to get some feedback before
getting started. All comments welcome :)

Andrew.


Proposed XenStore Interactions for Multi-Queue VIFs
========================================================================
Andrew J. Bennieston <andrew.bennieston@citrix.com> June 26 2013

Contents
--------
1. Rationale
2. Backend feature advertising
3. Frontend setup
3.1 Selecting the number of queues and the hash algorithm
3.2 Shared ring grant references and event channels
3.2.1 Ring pages
3.2.2 Event channels
4. Summary of main points

1. Rationale
---------------
Network throughput through a single VIF is limited by the processing
power available to the single netback kthread that performs work on the
ring. Single-VIF throughput could be scaled up by implementing multiple
queues per VIF, with packets directed to one ring or another by a hash
of their headers. Initially, only TCP packets are considered; all other
packets will be presented on the first queue.

Multi-queue VIFs will be serviced by multiple shared ring structures
associated with a single virtual network interface. At present, the
connection of shared rings and event channels is performed by
negotiation between the frontend (domU) and backend (dom0) domains via
XenStore. This document details the proposed additions to this
negotiation that would be required in order to support the setup and
connection of multiple shared rings.
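
To illustrate the steering model described above, here is a minimal
sketch in Python. The hash function (SHA-1 over the 4-tuple) is a
stand-in purely for illustration; the actual algorithm is negotiated
via XenStore as described in section 3.

```python
import hashlib

def select_queue(src_ip, src_port, dst_ip, dst_port, num_queues):
    # Hash the TCP 4-tuple and reduce modulo the queue count, so that
    # all packets of one flow land on the same ring. SHA-1 is a
    # stand-in; the real algorithm is negotiated via XenStore.
    flow = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha1(flow).digest()
    return int.from_bytes(digest[:4], "big") % num_queues

def steer(packet_is_tcp, tuple4, num_queues):
    # Non-TCP packets are presented on the first queue, as proposed.
    if not packet_is_tcp:
        return 0
    return select_queue(*tuple4, num_queues)
```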

2. Backend feature advertising
------------------------------
The backend advertises the features it supports via keys of the form

/local/domain/0/backend/vif/X/Y/feature-NNN = "1"

where X is the domain ID and Y is the virtual network device number.

In this proposal, a backend that wishes to support multi-queue VIFs
would add the key

/local/domain/0/backend/vif/X/Y/feature-multi-queue = "1"

If this key exists and is set to "1", the frontend may request a
multi-queue configuration. If the key is set to "0", or does not exist,
the backend either does not support this feature, or it has been
disabled.

In addition to the feature flag, a backend which supports
feature-multi-queue would advertise a maximum number of queues, via the
key:

/local/domain/0/backend/vif/X/Y/multi-queue-max-queues

This value is the maximum number of supported ring pairs; each queue
consists of a pair of rings supporting Tx (from guest) and Rx (to
guest). The number of rings in total is twice the value of
multi-queue-max-queues.

Finally, the backend advertises the list of hash algorithms it supports.
Hash algorithms define how network traffic is steered to different
queues, and it is assumed that the back- and frontends will use the same
hash algorithm with the same parameters. The available hash algorithms
are advertised by the backend via the key

/local/domain/0/backend/vif/X/Y/multi-queue-hash-list = "alg1 alg2"

where "alg1 alg2" is a space-separated list of algorithms.
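
The advertisement step can be sketched as follows, modelling XenStore
as a plain dict (the helper function and the algorithm names "alg1",
"alg2" are illustrative, not part of any real toolstack API):

```python
def backend_advertise(store, domid, devid, max_queues, hash_algs):
    # Write the proposed feature keys under the backend's vif path.
    base = f"/local/domain/0/backend/vif/{domid}/{devid}"
    store[f"{base}/feature-multi-queue"] = "1"
    store[f"{base}/multi-queue-max-queues"] = str(max_queues)
    store[f"{base}/multi-queue-hash-list"] = " ".join(hash_algs)

store = {}
backend_advertise(store, 1, 0, 4, ["alg1", "alg2"])
```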

3. Frontend setup
-----------------
The frontend will be expected to look for the feature-multi-queue
XenStore key and, if present and non-zero, query the list of hash
algorithms and the maximum number of queues. It will then choose the
hash algorithm desired (or fall back to single-queue if the frontend and
backend do not have a hash algorithm in common) and set up a number of
XenStore keys to inform the backend of these choices. In single-queue
mode, there is no change from the existing mechanism.

3.1 Selecting the number of queues and the hash algorithm
---------------------------------------------------------
For multi-queue mode, the frontend requests the number of queues
required (between 1 and the maximum advertised by the backend):

/local/domain/X/device/vif/Y/multi-queue-num-queues = "2"

If this key is not present, or is set to "1", single-queue mode is used.

The frontend must also specify the desired hash algorithm as follows:

/local/domain/X/device/vif/Y/multi-queue-hash = "alg1"

where "alg1" is one of the values from multi-queue-hash-list.

In addition to these keys, a number of hash-specific keys may be written
to provide parameters to be used by the hash algorithm. These are not
defined here in the general case, but may be used e.g. to communicate a
key or a mapping between hash value and queue number, for a specific
hash algorithm. The recommendation is that these are grouped together
under a key named something like multi-queue-hash-params-NNN where NNN
is the name of the hash algorithm specified in the multi-queue-hash key.
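
The selection logic above can be sketched as follows, again modelling
XenStore as a dict. The fallback to single-queue mode when no hash
algorithm is shared follows section 3; the algorithm names are
placeholders.

```python
def frontend_select(store, domid, devid, wanted_queues, supported_hashes):
    be = f"/local/domain/0/backend/vif/{domid}/{devid}"
    fe = f"/local/domain/{domid}/device/vif/{devid}"
    if store.get(f"{be}/feature-multi-queue") != "1":
        return 1, None                      # feature absent: single-queue
    max_q = int(store.get(f"{be}/multi-queue-max-queues", "1"))
    offered = store.get(f"{be}/multi-queue-hash-list", "").split()
    common = [h for h in supported_hashes if h in offered]
    if not common:
        return 1, None                      # no shared algorithm
    n = max(1, min(wanted_queues, max_q))
    store[f"{fe}/multi-queue-num-queues"] = str(n)
    store[f"{fe}/multi-queue-hash"] = common[0]
    return n, common[0]
```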

3.2 Shared ring grant references and event channels
---------------------------------------------------
3.2.1 Ring pages
----------------
It is the responsibility of the frontend to allocate one page for each
ring (i.e. two pages for each queue) and provide a grant reference to
each page, so that the backend may map them. In the single-queue case,
this is done as usual with the tx-ring-ref and rx-ring-ref keys.

For multi-queue, a hierarchical structure is proposed. This serves the
dual purpose of cleanly separating grant references between queues and
allowing additional mechanisms (e.g. split event channels, multi-page
rings) to replicate their XenStore keys for each queue without name
collisions. For each queue, the frontend should set up the following
keys:

/local/domain/X/device/vif/Y/queue-N/tx-ring-ref
/local/domain/X/device/vif/Y/queue-N/rx-ring-ref

where X is the domain ID, Y is the device ID and N is the queue number
(beginning at zero).
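
The per-queue key layout can be sketched as follows (XenStore modelled
as a dict; grant allocation itself is out of scope, so the integer
references are illustrative):

```python
def write_ring_refs(store, domid, devid, grant_refs):
    # grant_refs: one (tx_ref, rx_ref) pair per queue, i.e. two pages
    # per queue as described above.
    for n, (tx, rx) in enumerate(grant_refs):
        q = f"/local/domain/{domid}/device/vif/{devid}/queue-{n}"
        store[f"{q}/tx-ring-ref"] = str(tx)
        store[f"{q}/rx-ring-ref"] = str(rx)

store = {}
write_ring_refs(store, 1, 0, [(8, 9), (10, 11)])
```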

3.2.2 Event channels
--------------------
The upstream netback and netfront code supports
feature-split-event-channels, allowing one channel per ring (instead of
one channel per VIF). When multiple queues are used, the frontend must
write either:

/local/domain/X/device/vif/Y/queue-N/event-channel = "M"

to use a single event channel (number M) for that queue, or

/local/domain/X/device/vif/Y/queue-N/tx-event-channel = "M"
/local/domain/X/device/vif/Y/queue-N/rx-event-channel = "L"

to use split event channels (numbers L, M) for that queue.
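
The per-queue choice between a single and split event channels can be
sketched as follows (XenStore as a dict; the channel numbers are
illustrative):

```python
def write_event_channels(store, domid, devid, queue, channels):
    # channels: a single int selects one event channel for the queue;
    # a (tx, rx) tuple selects split event channels.
    q = f"/local/domain/{domid}/device/vif/{devid}/queue-{queue}"
    if isinstance(channels, tuple):
        tx, rx = channels
        store[f"{q}/tx-event-channel"] = str(tx)
        store[f"{q}/rx-event-channel"] = str(rx)
    else:
        store[f"{q}/event-channel"] = str(channels)

store = {}
write_event_channels(store, 1, 0, 0, 7)        # single channel, queue 0
write_event_channels(store, 1, 0, 1, (8, 9))   # split channels, queue 1
```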

4. Summary of main points
-------------------------
- Each queue has two rings (one for Tx, one for Rx).
- An unbalanced set of rings (e.g. more Rx than Tx) would still
leave a bottleneck on the side with fewer rings, so for
simplicity we require matched pairs.

- The frontend may only use hash algorithms that the backend
advertises; if there are no algorithms in common, frontend
initialisation fails.
- The backend must supply at least one fast hash algorithm for Linux
guests.
- Note that when Windows frontend support is added, the Toeplitz
algorithm must be supported by the backend. This is relatively
expensive to compute, however.

- Event channels are on a per-queue basis.
- Split event channels may be used for some (or all) queues, again
on a per-queue basis, selected by the presence of
tx-event-channel, rx-event-channel keys in each queue's
keyspace.
- Single event channel (per queue) is selected by the presence of
the event-channel key in the queue's keyspace.
- There is no plan to support a single event channel for all
queues, at present. This may be considered in the future to
reduce the demand for event channels, which are a limited
resource.

- Hash-specific configuration will reside in a hash-specific sub-key,
likely named something along the lines of
multi-queue-hash-params-NNN where NNN is the name of the hash
algorithm. The contents will depend on the algorithm selected and
are not specified here.
- All other configuration applies to the VIF as a whole, whether
single- or multi-queue.
- Again, there is the option to move keys into the queue hierarchy
to allow per-queue configuration at a later date.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [RFC] Proposed XenStore Interactions for Multi-Queue VIFs
On Wed, Jun 26, 2013 at 05:59:30PM +0100, Andrew Bennieston wrote:
> I'm posting this for an initial round of comments; I don't have any code
> at present to implement this, and wanted to get some feedback before
> getting started. All comments welcome :)
>

Cool!

Some thoughts inlined.

> Andrew.
>
>
> Proposed XenStore Interactions for Multi-Queue VIFs
> ========================================================================
> Andrew J. Bennieston <andrew.bennieston@citrix.com> June 26 2013
>
> Contents
> --------
> 1. Rationale
> 2. Backend feature advertising
> 3. Frontend setup
> 3.1 Selecting the number of queues and the hash algorithm
> 3.2 Shared ring grant references and event channels
> 3.2.1 Ring pages
> 3.2.2 Event channels
> 4. Summary of main points
>
> 1. Rationale
> ---------------
> Network throughput through a single VIF is limited by the processing
> power available for a single netback kthread to perform work on the
> ring. The single VIF throughput could be scaled up by implementing
> multiple queues per VIF. Packets would be directed to one ring or
> another by a hash of their headers. Initially, only TCP packets are
> considered (all other packets will be presented on the first queue).
>
> Multi-queue VIFs will be serviced by multiple shared ring structures
> associated with a single virtual network interface. At present, the
> connection of shared rings and event channels is performed by
> negotiation between the frontend (domU) and backend (dom0) domains via
> XenStore. This document details the proposed additions to this
> negotiation that would be required in order to support the setup and
> connection of multiple shared rings.
>
> 2. Backend feature advertising
> ------------------------------
> The backend advertises the features it supports via keys of the form
>
> /local/domain/0/backend/vif/X/Y/feature-NNN = "1"
>
> where X is the domain ID and Y is the virtual network device number.
>
> In this proposal, a backend that wishes to support multi-queue VIFs
> would add the key
>
> /local/domain/0/backend/vif/X/Y/feature-multi-queue = "1"
>
> If this key exists and is set to "1", the frontend may request a
> multi-queue configuration. If the key is set to "0", or does not exist,
> the backend either does not support this feature, or it has been
> disabled.
>
> In addition to the feature flag, a backend which supports
> feature-multi-queue would advertise a maximum number of queues, via the
> key:
>
> /local/domain/0/backend/vif/X/Y/multi-queue-max-queues
>
> This value is the maximum number of supported ring pairs; each queue
> consists of a pair of rings supporting Tx (from guest) and Rx (to
> guest). The number of rings in total is twice the value of
> multi-queue-max-queues.
>
> Finally, the backend advertises the list of hash algorithms it supports.
> Hash algorithms define how network traffic is steered to different
> queues, and it is assumed that the back- and frontends will use the same
> hash algorithm with the same parameters. The available hash algorithms
> are advertised by the backend via the key
>
> /local/domain/0/backend/vif/X/Y/multi-queue-hash-list = "alg1 alg2"
>
> where "alg1 alg2" is a space-separated list of algorithms.
>
> 3. Frontend setup
> -----------------
> The frontend will be expected to look for the feature-multi-queue
> XenStore key and, if present and non-zero, query the list of hash
> algorithms and the maximum number of queues. It will then choose the
> hash algorithm desired (or fall back to single-queue if the frontend and
> backend do not have a hash algorithm in common) and set up a number of
> XenStore keys to inform the backend of these choices. In single-queue
> mode, there is no change from the existing mechanism.
>
> 3.1 Selecting the number of queues and the hash algorithm
> ---------------------------------------------------------
> For multi-queue mode, the frontend requests the number of queues
> required (between 1 and the maximum advertised by the backend):
>
> /local/domain/X/device/vif/Y/multi-queue-num-queues = "2"
>
> If this key is not present, or is set to "1", single-queue mode is used.
>
> The frontend must also specify the desired hash algorithm as follows:
>
> /local/domain/X/device/vif/Y/multi-queue-hash = "alg1"
>
> where "alg1" is one of the values from multi-queue-hash-list.
>
> In addition to these keys, a number of hash-specific keys may be written
> to provide parameters to be used by the hash algorithm. These are not
> defined here in the general case, but may be used e.g. to communicate a
> key or a mapping between hash value and queue number, for a specific
> hash algorithm. The recommendation is that these are grouped together
> under a key named something like multi-queue-hash-params-NNN where NNN
> is the name of the hash algorithm specified in the multi-queue-hash key.
>

Grouping things together and then parsing the string in the backend
increases backend complexity. If it is just something like a comma- or
space-separated positional list that would be fine (but then you need to
clearly document the positions); if it's something more complex, like a
list of key-value pairs, I think it would be better to have several
multi-queue-hash-params-NNN-KEY -> value
entries in XenStore, rather than trying to parse a complex string.

Or even better, like the scheme you proposed for ring pages: create a
hierarchical structure for the parameters.

> 3.2 Shared ring grant references and event channels
> ---------------------------------------------------
> 3.2.1 Ring pages
> ----------------
> It is the responsibility of the frontend to allocate one page for each
> ring (i.e. two pages for each queue) and provide a grant reference to
> each page, so that the backend may map them. In the single-queue case,
> this is done as usual with the tx-ring-ref and rx-ring-ref keys.
>
> For multi-queue, a hierarchical structure is proposed. This serves the
> dual purpose of clean separation of grant references between queues and
> allows additional mechanisms (e.g. split event channels, multi-page
> rings) to replicate their XenStore keys for each queue without name
> collisions. For each queue, the frontend should set up the following
> keys:
>
> /local/domain/X/device/vif/Y/queue-N/tx-ring-ref
> /local/domain/X/device/vif/Y/queue-N/rx-ring-ref
>
> where X is the domain ID, Y is the device ID and N is the queue number
> (beginning at zero).
>
> 3.2.2 Event channels
> --------------------
> The upstream netback and netfront code supports
> feature-split-event-channels, allowing one channel per ring (instead of
> one channel per VIF). When multiple queues are used, the frontend must
> write either:
>
> /local/domain/X/device/vif/Y/queue-N/event-channel = "M"
>
> to use a single event channel (number M) for that queue, or
>
> /local/domain/X/device/vif/Y/queue-N/tx-event-channel = "M"
> /local/domain/X/device/vif/Y/queue-N/rx-event-channel = "L"
>
> to use split event channels (numbers L, M) for that queue.
>
> 4. Summary of main points
> -------------------------
> - Each queue has two rings (one for Tx, one for Rx).
> - An unbalanced set of rings (e.g. more Rx than Tx) would still
> leave a bottleneck on the side with fewer rings, so for
> simplicity we require matched pairs.
>
> - The frontend may only use hash algorithms that the backend
> advertises; if there are no algorithms in common, frontend
> initialisation fails.

Then fall back to single-queue mode (i.e. the original mode). In fact, I
would not expect either end to start initialising before they sort out
what's present and what's not. Or perhaps it's just a wording issue; I
would call "the frontend checking available algorithms (or any other
parameters)" the negotiation phase.

A common trick would be for the frontend to check what the backend
offers and indicate whether it requests the feature with a key called
"request-FEATURE-NAME". (Yeah, I know I didn't do that for split event
channels, sorry...)

> - The backend must supply at least one fast hash algorithm for Linux
> guests
> - Note that when Windows frontend support is added, the Toeplitz
> algorithm must be supported by the backend. This is relatively
> expensive to compute, however.
>
> - Event channels are on a per-queue basis.
> - Split event channels may be used for some (or all) queues, again
> on a per-queue basis, selected by the presence of
> tx-event-channel, rx-event-channel keys in each queue's
> keyspace.
> - Single event channel (per queue) is selected by the presence of
> the event-channel key in the queue's keyspace.
> - There is no plan to support a single event channel for all
> queues, at present. This may be considered in the future to
> reduce the demand for event channels, which are a limited
> resource.
>

How do you plan to map those queues to backend processing routines? One
queue per backend routine? If so, a single event channel for all queues
is not a very good idea, because it would need to wake several backend
routines and performance would eventually suffer.

And for the frontend the situation is almost the same.

Would it be better, when resources are tight, to tell the host admin to
disable this feature?

> - Hash-specific configuration will reside in a hash-specific sub-key,
> likely named something along the lines of
> multi-queue-hash-params-NNN where NNN is the name of the hash
> algorithm. The contents will depend on the algorithm selected and
> are not specified here.
> - All other configuration applies to the VIF as a whole, whether
> single- or multi-queue.
> - Again, there is the option to move keys into the queue hierarchy
> to allow per-queue configuration at a later date.
>


Finally, just out of my curiosity, is it possible that any of the
parameters change when the guest is running?


Wei.

Re: [RFC] Proposed XenStore Interactions for Multi-Queue VIFs
>>> On 26.06.13 at 18:59, Andrew Bennieston <andrew.bennieston@citrix.com> wrote:
> 2. Backend feature advertising
> ------------------------------
> The backend advertises the features it supports via keys of the form
>
> /local/domain/0/backend/vif/X/Y/feature-NNN = "1"
>
> where X is the domain ID and Y is the virtual network device number.
>
> In this proposal, a backend that wishes to support multi-queue VIFs
> would add the key
>
> /local/domain/0/backend/vif/X/Y/feature-multi-queue = "1"
>
> If this key exists and is set to "1", the frontend may request a
> multi-queue configuration. If the key is set to "0", or does not exist,
> the backend either does not support this feature, or it has been
> disabled.
>
> In addition to the feature flag, a backend which supports
> feature-multi-queue would advertise a maximum number of queues, via the
> key:
>
> /local/domain/0/backend/vif/X/Y/multi-queue-max-queues
>
> This value is the maximum number of supported ring pairs; each queue
> consists of a pair of rings supporting Tx (from guest) and Rx (to
> guest). The number of rings in total is twice the value of
> multi-queue-max-queues.

I pretty much dislike this redundant advertisement - a single
key absolutely suffices here - absence of the key, or a value
<= 1, is a sufficient indication of the feature not being
supported.

> 3.2 Shared ring grant references and event channels
> ---------------------------------------------------
> 3.2.1 Ring pages
> ----------------
> It is the responsibility of the frontend to allocate one page for each
> ring (i.e. two pages for each queue) and provide a grant reference to
> each page, so that the backend may map them. In the single-queue case,
> this is done as usual with the tx-ring-ref and rx-ring-ref keys.
>
> For multi-queue, a hierarchical structure is proposed. This serves the
> dual purpose of clean separation of grant references between queues and
> allows additional mechanisms (e.g. split event channels, multi-page
> rings) to replicate their XenStore keys for each queue without name
> collisions. For each queue, the frontend should set up the following
> keys:
>
> /local/domain/X/device/vif/Y/queue-N/tx-ring-ref
> /local/domain/X/device/vif/Y/queue-N/rx-ring-ref
>
> where X is the domain ID, Y is the device ID and N is the queue number
> (beginning at zero).
>
> 3.2.2 Event channels
> --------------------
> The upstream netback and netfront code supports
> feature-split-event-channels, allowing one channel per ring (instead of
> one channel per VIF). When multiple queues are used, the frontend must
> write either:
>
> /local/domain/X/device/vif/Y/queue-N/event-channel = "M"
>
> to use a single event channel (number M) for that queue, or
>
> /local/domain/X/device/vif/Y/queue-N/tx-event-channel = "M"
> /local/domain/X/device/vif/Y/queue-N/rx-event-channel = "L"
>
> to use split event channels (numbers L, M) for that queue.

Unlike Wei, I'm actually in favor of this model. I don't see this
adding much complexity to the parsing logic in the backend: it's
simply a loop over the requested queue count, otherwise doing the
same operations as it does currently.

Jan


Re: [RFC] Proposed XenStore Interactions for Multi-Queue VIFs
On 26/06/13 18:51, Wei Liu wrote:
> On Wed, Jun 26, 2013 at 05:59:30PM +0100, Andrew Bennieston wrote:
>> I'm posting this for an initial round of comments; I don't have any code
>> at present to implement this, and wanted to get some feedback before
>> getting started. All comments welcome :)
>>
>
> Cool!
>
> Some thoughts inlined.
>
>> Andrew.
>>
>>
>> Proposed XenStore Interactions for Multi-Queue VIFs
>> ========================================================================
>> Andrew J. Bennieston <andrew.bennieston@citrix.com> June 26 2013
>>
>> Contents
>> --------
>> 1. Rationale
>> 2. Backend feature advertising
>> 3. Frontend setup
>> 3.1 Selecting the number of queues and the hash algorithm
>> 3.2 Shared ring grant references and event channels
>> 3.2.1 Ring pages
>> 3.2.2 Event channels
>> 4. Summary of main points
>>
>> 1. Rationale
>> ---------------
>> Network throughput through a single VIF is limited by the processing
>> power available for a single netback kthread to perform work on the
>> ring. The single VIF throughput could be scaled up by implementing
>> multiple queues per VIF. Packets would be directed to one ring or
>> another by a hash of their headers. Initially, only TCP packets are
>> considered (all other packets will be presented on the first queue).
>>
>> Multi-queue VIFs will be serviced by multiple shared ring structures
>> associated with a single virtual network interface. At present, the
>> connection of shared rings and event channels is performed by
>> negotiation between the frontend (domU) and backend (dom0) domains via
>> XenStore. This document details the proposed additions to this
>> negotiation that would be required in order to support the setup and
>> connection of multiple shared rings.
>>
>> 2. Backend feature advertising
>> ------------------------------
>> The backend advertises the features it supports via keys of the form
>>
>> /local/domain/0/backend/vif/X/Y/feature-NNN = "1"
>>
>> where X is the domain ID and Y is the virtual network device number.
>>
>> In this proposal, a backend that wishes to support multi-queue VIFs
>> would add the key
>>
>> /local/domain/0/backend/vif/X/Y/feature-multi-queue = "1"
>>
>> If this key exists and is set to "1", the frontend may request a
>> multi-queue configuration. If the key is set to "0", or does not exist,
>> the backend either does not support this feature, or it has been
>> disabled.
>>
>> In addition to the feature flag, a backend which supports
>> feature-multi-queue would advertise a maximum number of queues, via the
>> key:
>>
>> /local/domain/0/backend/vif/X/Y/multi-queue-max-queues
>>
>> This value is the maximum number of supported ring pairs; each queue
>> consists of a pair of rings supporting Tx (from guest) and Rx (to
>> guest). The number of rings in total is twice the value of
>> multi-queue-max-queues.
>>
>> Finally, the backend advertises the list of hash algorithms it supports.
>> Hash algorithms define how network traffic is steered to different
>> queues, and it is assumed that the back- and frontends will use the same
>> hash algorithm with the same parameters. The available hash algorithms
>> are advertised by the backend via the key
>>
>> /local/domain/0/backend/vif/X/Y/multi-queue-hash-list = "alg1 alg2"
>>
>> where "alg1 alg2" is a space-separated list of algorithms.
>>
>> 3. Frontend setup
>> -----------------
>> The frontend will be expected to look for the feature-multi-queue
>> XenStore key and, if present and non-zero, query the list of hash
>> algorithms and the maximum number of queues. It will then choose the
>> hash algorithm desired (or fall back to single-queue if the frontend and
>> backend do not have a hash algorithm in common) and set up a number of
>> XenStore keys to inform the backend of these choices. In single-queue
>> mode, there is no change from the existing mechanism.
>>
>> 3.1 Selecting the number of queues and the hash algorithm
>> ---------------------------------------------------------
>> For multi-queue mode, the frontend requests the number of queues
>> required (between 1 and the maximum advertised by the backend):
>>
>> /local/domain/X/device/vif/Y/multi-queue-num-queues = "2"
>>
>> If this key is not present, or is set to "1", single-queue mode is used.
>>
>> The frontend must also specify the desired hash algorithm as follows:
>>
>> /local/domain/X/device/vif/Y/multi-queue-hash = "alg1"
>>
>> where "alg1" is one of the values from multi-queue-hash-list.
>>
>> In addition to these keys, a number of hash-specific keys may be written
>> to provide parameters to be used by the hash algorithm. These are not
>> defined here in the general case, but may be used e.g. to communicate a
>> key or a mapping between hash value and queue number, for a specific
>> hash algorithm. The recommendation is that these are grouped together
>> under a key named something like multi-queue-hash-params-NNN where NNN
>> is the name of the hash algorithm specified in the multi-queue-hash key.
>>
>
> Grouping things together then parse this string in backend increases
> backend complexity. If it is just something like comma / space separated
> positioned list that would be fine (but then you need to clearly
> document the position); if that's something more complex like a list of
> key-value I think it would be better to have several
> multi-queue-hash-params-NNN-KEY -> value
> in Xenstore, other than trying to parse complex string.
>
> Or even better, like the scheme you proposed for ring pages: create
> hierarchical structure for parameters.

I intended a hierarchical structure here, e.g.
.../multi-queue-hash-params-alg1/key = "somekey"
.../multi-queue-hash-params-alg1/map = "<mapping of hash to queue>"

Once the algorithm has been selected, the 'root' of this structure is
fixed (i.e. 'multi-queue-hash-params-' concatenated with the name of the
algorithm). The hash-dependent keys will be subkeys of that.

I can see how the language I used was unclear, though.
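
To make the intended layout concrete, a minimal sketch (XenStore
modelled as a dict; the parameter names "key" and "map" and the
algorithm name are illustrative):

```python
def write_hash_params(store, domid, devid, alg, params):
    # Once the algorithm is chosen, all its parameters live under a
    # single fixed root: multi-queue-hash-params-<algorithm name>.
    root = (f"/local/domain/{domid}/device/vif/{devid}/"
            f"multi-queue-hash-params-{alg}")
    for name, value in params.items():
        store[f"{root}/{name}"] = value

store = {}
write_hash_params(store, 1, 0, "alg1",
                  {"key": "somekey", "map": "0 1 0 1"})
```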

>
>> 3.2 Shared ring grant references and event channels
>> ---------------------------------------------------
>> 3.2.1 Ring pages
>> ----------------
>> It is the responsibility of the frontend to allocate one page for each
>> ring (i.e. two pages for each queue) and provide a grant reference to
>> each page, so that the backend may map them. In the single-queue case,
>> this is done as usual with the tx-ring-ref and rx-ring-ref keys.
>>
>> For multi-queue, a hierarchical structure is proposed. This serves the
>> dual purpose of clean separation of grant references between queues and
>> allows additional mechanisms (e.g. split event channels, multi-page
>> rings) to replicate their XenStore keys for each queue without name
>> collisions. For each queue, the frontend should set up the following
>> keys:
>>
>> /local/domain/X/device/vif/Y/queue-N/tx-ring-ref
>> /local/domain/X/device/vif/Y/queue-N/rx-ring-ref
>>
>> where X is the domain ID, Y is the device ID and N is the queue number
>> (beginning at zero).
>>
>> 3.2.2 Event channels
>> --------------------
>> The upstream netback and netfront code supports
>> feature-split-event-channels, allowing one channel per ring (instead of
>> one channel per VIF). When multiple queues are used, the frontend must
>> write either:
>>
>> /local/domain/X/device/vif/Y/queue-N/event-channel = "M"
>>
>> to use a single event channel (number M) for that queue, or
>>
>> /local/domain/X/device/vif/Y/queue-N/tx-event-channel = "M"
>> /local/domain/X/device/vif/Y/queue-N/rx-event-channel = "L"
>>
>> to use split event channels (numbers L, M) for that queue.
>>
>> 4. Summary of main points
>> -------------------------
>> - Each queue has two rings (one for Tx, one for Rx).
>> - An unbalanced set of rings (e.g. more Rx than Tx) would still
>> leave a bottleneck on the side with fewer rings, so for
>> simplicity we require matched pairs.
>>
>> - The frontend may only use hash algorithms that the backend
>> advertises; if there are no algorithms in common, frontend
>> initialisation fails.
>
> Then fall back to single queue mode (i.e. the original mode). In fact, I
> would not expect either end to start initialising before they sort out
> what's present and what's not. Or it's just a wording issue, I would
> call "frontend checking available algorithms (or any other parameters)"
> negotiation phase.
>
> A common trick would be frontend checks what backend offers and
> determine whether to request this feature with a key called
> "request-FEATURE-NAME". (yeah I know I didn't do that for split event
> channels, sorry...).
>

Yes, falling back to single-queue mode would be desirable here. I'm not
sure that 'request-feature-multi-queue' is necessary, because if the
'multi-queue-num-queues' key is present, and has a value > 1, the
multi-queue feature is assumed to be requested.

>> - The backend must supply at least one fast hash algorithm for Linux
>> guests
>> - Note that when Windows frontend support is added, the Toeplitz
>> algorithm must be supported by the backend. This is relatively
>> expensive to compute, however.
>>
>> - Event channels are on a per-queue basis.
>> - Split event channels may be used for some (or all) queues, again
>> on a per-queue basis, selected by the presence of
>> tx-event-channel, rx-event-channel keys in each queue's
>> keyspace.
>> - Single event channel (per queue) is selected by the presence of
>> the event-channel key in the queue's keyspace.
>> - There is no plan to support a single event channel for all
>> queues, at present. This may be considered in the future to
>> reduce the demand for event channels, which are a limited
>> resource.
>>
>
> How do you plan to map those queues to backend processing routines? One
> queue per backend routine? If so a single event channel for all queues
> is not a very good idea because that would need to wake several
> backend routines and eventually performance suffers.
One queue per backend routine is the plan.
>
> And for the frontend the situation is almost the same.
>
> Would it be better, when resource is tight, to tell the host admin to
> disable this feature?
Yes.

I have no intention of allowing one event channel for all queues; this
would just shift the bottleneck into waking each queue and checking for
work. One event channel per queue (or, with
feature-split-event-channels, two per queue) is a much more realistic
strategy.

>
>> - Hash-specific configuration will reside in a hash-specific sub-key,
>> likely named something along the lines of
>> multi-queue-hash-params-NNN where NNN is the name of the hash
>> algorithm. The contents will depend on the algorithm selected and
>> are not specified here.
>> - All other configuration applies to the VIF as a whole, whether
>> single- or multi-queue.
>> - Again, there is the option to move keys into the queue hierarchy
>> to allow per-queue configuration at a later date.
>>
>
>
> Finally, just out of my curiosity, is it possible that any of the
> parameters change when the guest is running?

Yes, it is. The Toeplitz algorithm that Windows specifies may periodically
update the 'table' that steers different hash values to different queues
(e.g. if it determines that one queue is being over-utilised). The
backend is expected to respond to this change within a reasonable
timeframe. This means that, with certain hash algorithms, the backend
may be expected to watch a XenStore key and respond accordingly to
changes.
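For reference, the Toeplitz hash itself is easy to sketch: for each set
bit of the input (MSB first), XOR in the 32-bit window of a secret key
starting at that bit offset, then steer the result through an indirection
table of the kind the backend would re-read when the frontend updates it.
The Python below is such a sketch; the 40-byte key and the flow are the
standard test vector from Microsoft's RSS documentation, while the
steering table is an arbitrary example of mine.

```python
# Sketch: Toeplitz hash plus an indirection table mapping hash -> queue.

def toeplitz_hash(key, data):
    """Standard Toeplitz: for each set bit of data (MSB first), XOR in
    the 32-bit window of the key starting at that bit offset."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result

# 40-byte key from the RSS verification suite.
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bdca56806"
)

# TCP/IPv4 hash input: src IP, dst IP, src port, dst port (network order).
flow = (bytes([66, 9, 149, 187]) + bytes([161, 142, 100, 80])
        + (2794).to_bytes(2, "big") + (1766).to_bytes(2, "big"))

h = toeplitz_hash(RSS_KEY, flow)
print(hex(h))  # 0x51ccc178 per the documented test vector

# Example indirection table; this is the structure the backend would
# watch for updates when the guest rebalances queues.
table = [0, 1, 2, 3, 0, 1, 2, 3]
print(table[h % len(table)])  # queue selected for this flow
```

The per-bit loop above is the "relatively expensive to compute" part
mentioned earlier; real implementations typically precompute per-byte
lookup tables instead.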

Andrew.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [RFC] Proposed XenStore Interactions for Multi-Queue VIFs
On 27/06/13 10:07, Jan Beulich wrote:
>>>> On 26.06.13 at 18:59, Andrew Bennieston <andrew.bennieston@citrix.com> wrote:
>> 2. Backend feature advertising
>> ------------------------------
>> The backend advertises the features it supports via keys of the form
>>
>> /local/domain/0/backend/vif/X/Y/feature-NNN = "1"
>>
>> where X is the domain ID and Y is the virtual network device number.
>>
>> In this proposal, a backend that wishes to support multi-queue VIFs
>> would add the key
>>
>> /local/domain/0/backend/vif/X/Y/feature-multi-queue = "1"
>>
>> If this key exists and is set to "1", the frontend may request a
>> multi-queue configuration. If the key is set to "0", or does not exist,
>> the backend either does not support this feature, or it has been
>> disabled.
>>
>> In addition to the feature flag, a backend which supports
>> feature-multi-queue would advertise a maximum number of queues, via the
>> key:
>>
>> /local/domain/0/backend/vif/X/Y/multi-queue-max-queues
>>
>> This value is the maximum number of supported ring pairs; each queue
>> consists of a pair of rings supporting Tx (from guest) and Rx (to
>> guest). The number of rings in total is twice the value of
>> multi-queue-max-queues.
>
> I pretty much dislike this redundant advertisement - a single
> key absolutely suffices here - absence of the key or a value
> <= 1 are a sufficient indication of the feature not being
> supported.
>

You're right; I explicitly did not put a 'request-feature-multi-queue'
in the frontend keys for this reason, but this one slipped past! The
presence of multi-queue-max-queues > 1 is certainly sufficient to
advertise multi-queue support.
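Dropping the flag reduces the negotiation to something like the sketch
below. Again this is only illustrative Python with a dict in place of
XenStore; the key name is the proposed multi-queue-max-queues, but the
helper and its fallback convention are assumptions of mine.

```python
# Sketch: frontend deciding how many queues to request, assuming the
# feature is advertised solely via multi-queue-max-queues.

def negotiate_queues(backend_keys, wanted):
    """Return the number of queues the frontend should request.

    Absence of multi-queue-max-queues, or a value <= 1, means the
    backend does not support (or has disabled) multi-queue, so the
    frontend falls back to the traditional single-queue setup.
    """
    max_queues = int(backend_keys.get("multi-queue-max-queues", "1"))
    if max_queues <= 1:
        return 1                      # single-queue fallback
    return min(wanted, max_queues)

assert negotiate_queues({}, 4) == 1                               # no key
assert negotiate_queues({"multi-queue-max-queues": "1"}, 4) == 1  # disabled
assert negotiate_queues({"multi-queue-max-queues": "8"}, 4) == 4
assert negotiate_queues({"multi-queue-max-queues": "2"}, 4) == 2  # clamped
```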

>> 3.2 Shared ring grant references and event channels
>> ---------------------------------------------------
>> 3.2.1 Ring pages
>> ----------------
>> It is the responsibility of the frontend to allocate one page for each
>> ring (i.e. two pages for each queue) and provide a grant reference to
>> each page, so that the backend may map them. In the single-queue case,
>> this is done as usual with the tx-ring-ref and rx-ring-ref keys.
>>
>> For multi-queue, a hierarchical structure is proposed. This serves the
>> dual purpose of clean separation of grant references between queues and
>> allows additional mechanisms (e.g. split event channels, multi-page
>> rings) to replicate their XenStore keys for each queue without name
>> collisions. For each queue, the frontend should set up the following
>> keys:
>>
>> /local/domain/X/device/vif/Y/queue-N/tx-ring-ref
>> /local/domain/X/device/vif/Y/queue-N/rx-ring-ref
>>
>> where X is the domain ID, Y is the device ID and N is the queue number
>> (beginning at zero).
>>
>> 3.2.2 Event channels
>> --------------------
>> The upstream netback and netfront code supports
>> feature-split-event-channels, allowing one channel per ring (instead of
>> one channel per VIF). When multiple queues are used, the frontend must
>> write either:
>>
>> /local/domain/X/device/vif/Y/queue-N/event-channel = "M"
>>
>> to use a single event channel (number M) for that queue, or
>>
>> /local/domain/X/device/vif/Y/queue-N/tx-event-channel = "M"
>> /local/domain/X/device/vif/Y/queue-N/rx-event-channel = "L"
>>
>> to use split event channels (numbers L, M) for that queue.
>
> Other than Wei, I'm actually in favor of this model. I don't see this
> adding much complexity to the parsing logic in the backend: It's a
> simple loop over the requested queue count, otherwise doing the
> same operations as it does currently.

Indeed. I think Wei was referring to the hash-specific parameters, which
will be "grouped" (but have one key per parameter) according to, e.g.
/local/domain/X/device/vif/Y/multi-queue-hash-params-alg1/key = "..."
/local/domain/X/device/vif/Y/multi-queue-hash-params-alg1/map = "..."

Andrew

Re: [RFC] Proposed XenStore Interactions for Multi-Queue VIFs
On 27.06.13 12:11, Andrew Bennieston wrote:
> On 27/06/13 10:07, Jan Beulich wrote:
>>>>> On 26.06.13 at 18:59, Andrew Bennieston
>>>>> <andrew.bennieston@citrix.com> wrote:
>>> 2. Backend feature advertising
>>> ------------------------------
>>> The backend advertises the features it supports via keys of the form
>>>
>>> /local/domain/0/backend/vif/X/Y/feature-NNN = "1"
>>>
>>> where X is the domain ID and Y is the virtual network device number.
>>>
>>> In this proposal, a backend that wishes to support multi-queue VIFs
>>> would add the key
>>>
>>> /local/domain/0/backend/vif/X/Y/feature-multi-queue = "1"
>>>
>>> If this key exists and is set to "1", the frontend may request a
>>> multi-queue configuration. If the key is set to "0", or does not exist,
>>> the backend either does not support this feature, or it has been
>>> disabled.
>>>
>>> In addition to the feature flag, a backend which supports
>>> feature-multi-queue would advertise a maximum number of queues, via the
>>> key:
>>>
>>> /local/domain/0/backend/vif/X/Y/multi-queue-max-queues
>>>
>>> This value is the maximum number of supported ring pairs; each queue
>>> consists of a pair of rings supporting Tx (from guest) and Rx (to
>>> guest). The number of rings in total is twice the value of
>>> multi-queue-max-queues.
>>
>> I pretty much dislike this redundant advertisement - a single
>> key absolutely suffices here - absence of the key or a value
>> <= 1 are a sufficient indication of the feature not being
>> supported.
>>
>
> You're right; I explicitly did not put a 'request-feature-multi-queue'
> in the frontend keys for this reason, but this one slipped past! The
> presence of multi-queue-max-queues > 1 is certainly sufficient to
> advertise multi-queue support.

I would advocate a scheme that lists all features in Dom0 with

xenstore-ls | grep "feature-"

That makes it easier to get a clue about what is in place and what
is used.

Christoph

>
>>> 3.2 Shared ring grant references and event channels
>>> ---------------------------------------------------
>>> 3.2.1 Ring pages
>>> ----------------
>>> It is the responsibility of the frontend to allocate one page for each
>>> ring (i.e. two pages for each queue) and provide a grant reference to
>>> each page, so that the backend may map them. In the single-queue case,
>>> this is done as usual with the tx-ring-ref and rx-ring-ref keys.
>>>
>>> For multi-queue, a hierarchical structure is proposed. This serves the
>>> dual purpose of clean separation of grant references between queues and
>>> allows additional mechanisms (e.g. split event channels, multi-page
>>> rings) to replicate their XenStore keys for each queue without name
>>> collisions. For each queue, the frontend should set up the following
>>> keys:
>>>
>>> /local/domain/X/device/vif/Y/queue-N/tx-ring-ref
>>> /local/domain/X/device/vif/Y/queue-N/rx-ring-ref
>>>
>>> where X is the domain ID, Y is the device ID and N is the queue number
>>> (beginning at zero).
>>>
>>> 3.2.2 Event channels
>>> --------------------
>>> The upstream netback and netfront code supports
>>> feature-split-event-channels, allowing one channel per ring (instead of
>>> one channel per VIF). When multiple queues are used, the frontend must
>>> write either:
>>>
>>> /local/domain/X/device/vif/Y/queue-N/event-channel = "M"
>>>
>>> to use a single event channel (number M) for that queue, or
>>>
>>> /local/domain/X/device/vif/Y/queue-N/tx-event-channel = "M"
>>> /local/domain/X/device/vif/Y/queue-N/rx-event-channel = "L"
>>>
>>> to use split event channels (numbers L, M) for that queue.
>>
>> Other than Wei, I'm actually in favor of this model. I don't see this
>> adding much complexity to the parsing logic in the backend: It's a
>> simple loop over the requested queue count, otherwise doing the
>> same operations as it does currently.
>
> Indeed. I think Wei was referring to the hash-specific parameters, which
> will be "grouped" (but have one key per parameter) according to, e.g.
> /local/domain/X/device/vif/Y/multi-queue-hash-params-alg1/key = "..."
> /local/domain/X/device/vif/Y/multi-queue-hash-params-alg1/map = "..."
>
> Andrew


Re: [RFC] Proposed XenStore Interactions for Multi-Queue VIFs
On Thu, Jun 27, 2013 at 11:07:36AM +0100, Andrew Bennieston wrote:
[...]
> >Grouping things together then parse this string in backend increases
> >backend complexity. If it is just something like comma / space separated
> >positioned list that would be fine (but then you need to clearly
> >document the position); if that's something more complex like a list of
> >key-value I think it would be better to have several
> > multi-queue-hash-params-NNN-KEY -> value
> >in Xenstore, other than trying to parse complex string.
> >
> >Or even better, like the scheme you proposed for ring pages: create
> >hierarchical structure for parameters.
>
> I intended a hierarchical structure here, e.g.
> .../multi-queue-hash-params-alg1/key = "somekey"
> ../multi-queue-hash-params-alg1/map = "<mapping of hash to queue
> number>"
>
> Once the algorithm has been selected, the 'root' of this structure is
> fixed (i.e. 'multi-queue-hash-params-' concatenated with the name of the
> algorithm). The hash-dependent keys will be subkeys of that.
>
> I can see how the language I used was unclear, though.
>

Now I understand the subtle meaning of "grouping". My understanding was
"concatenating options into one big string", while you meant "put things
under a hierarchy". :-)

> >
> >>3.2 Shared ring grant references and event channels
> >>---------------------------------------------------
[...]
> >>4. Summary of main points
> >>-------------------------
> >>- Each queue has two rings (one for Tx, one for Rx).
> >> - An unbalanced set of rings (e.g. more Rx than Tx) would still
> >> leave a bottleneck on the side with fewer rings, so for
> >> simplicity we require matched pairs.
> >>
> >>- The frontend may only use hash algorithms that the backend
> >> advertises; if there are no algorithms in common, frontend
> >> initialisation fails.
> >
> >Then fall back to single queue mode (i.e. the original mode). In fact, I
> >would not expect either end to start initialising before they sort out
> >what's present and what's not. Or it's just a wording issue, I would
> >call "frontend checking available algorithms (or any other parameters)"
> >negotiation phase.
> >
> >A common trick would be frontend checks what backend offers and
> >determine whether to request this feature with a key called
> >"request-FEATURE-NAME". (yeah I know I didn't do that for split event
> >channels, sorry...).
> >
>
> Yes, falling back to single-queue mode would be desirable here. I'm not
> sure that 'request-feature-multi-queue' is necessary, because if the
> 'multi-queue-num-queues' key is present, and has a value > 1, the
> multi-queue feature is assumed to be requested.
>

This would do the job too. I just thought "request-XXX" would have a
single purpose and no implications. I don't have a strong opinion on this.

> >>- The backend must supply at least one fast hash algorithm for Linux
> >> guests
> >> - Note that when Windows frontend support is added, the Toeplitz
> >> algorithm must be supported by the backend. This is relatively
> >> expensive to compute, however.
> >>
[...]
> >
> >Finally, just out of my curiosity, is it possible that any of the
> >parameters change when the guest is running?
>
> Yes, it is. The Toeplitz algorithm that Windows specifies may periodically
> update the 'table' that steers different hash values to different queues
> (e.g. if it determines that one queue is being over-utilised). The
> backend is expected to respond to this change within a reasonable
> timeframe. This means that, with certain hash algorithms, the backend
> may be expected to watch a XenStore key and respond accordingly to
> changes.
>

OK, got it.


Wei.

> Andrew.
