[patch 1/2] use chacha20 from openssl (1.1.0+) when possible
On some CPUs the optimized ChaCha20 implementation in OpenSSL (1.1.0+) is
notably faster than the generic C implementation in OpenSSH, and on the
rest it is still somewhat faster.

Sadly, OpenSSL's chacha20-poly1305 (EVP_chacha20_poly1305) uses a
different construction (with padding etc.; see RFC 8439), and it looks
like it cannot be used in OpenSSH.
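
To make the difference concrete, here is a minimal C sketch of the byte string that Poly1305 authenticates in each scheme. It assumes the RFC 8439 section 2.8 AEAD layout (AAD and ciphertext each zero-padded to a multiple of 16 bytes, followed by both lengths as 64-bit little-endian integers) and the OpenSSH layout from PROTOCOL.chacha20poly1305 (the MAC runs directly over the encrypted 4-byte packet length followed by the ciphertext). rfc8439_mac_input() and openssh_mac_input() are hypothetical names; the nonce/counter split and OpenSSH's separate length-encryption key also differ and are not modelled here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write x as 8 little-endian bytes, as RFC 8439 encodes its length block. */
static void put_le64(uint8_t *p, uint64_t x)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(x >> (8 * i));
}

/* Number of zero bytes needed to pad n up to a multiple of 16. */
static size_t pad16(size_t n)
{
	return (16 - n % 16) % 16;
}

/*
 * RFC 8439 AEAD: Poly1305 input is
 *   aad || zero pad || ciphertext || zero pad || le64(aadlen) || le64(ctlen)
 * out must have room for the returned number of bytes.
 */
static size_t rfc8439_mac_input(uint8_t *out, const uint8_t *aad,
    size_t aadlen, const uint8_t *ct, size_t ctlen)
{
	size_t off = 0;

	memcpy(out + off, aad, aadlen);
	off += aadlen;
	memset(out + off, 0, pad16(aadlen));
	off += pad16(aadlen);
	memcpy(out + off, ct, ctlen);
	off += ctlen;
	memset(out + off, 0, pad16(ctlen));
	off += pad16(ctlen);
	put_le64(out + off, aadlen);
	put_le64(out + off + 8, ctlen);
	return off + 16;
}

/*
 * chacha20-poly1305@openssh.com: Poly1305 input is just the encrypted
 * 4-byte packet length followed by the encrypted payload, with no padding
 * and no length block.
 */
static size_t openssh_mac_input(uint8_t *out, const uint8_t *enc_len,
    size_t aadlen, const uint8_t *ct, size_t ctlen)
{
	memcpy(out, enc_len, aadlen);
	memcpy(out + aadlen, ct, ctlen);
	return aadlen + ctlen;
}
```

For a 4-byte encrypted length and a 32-byte ciphertext, the OpenSSH input is 36 bytes while the RFC 8439 input is 64 bytes, so EVP_chacha20_poly1305 would authenticate a different message even with the same Poly1305 key.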

OpenSSL 1.1.1+ also exports a "raw" Poly1305 primitive, but I have not
tried it yet (it was not in 1.1.0).

Trivial benchmark:
time ssh -c chacha20-poly1305@openssh.com -S none -o Compression=no \
localhost 'dd if=/dev/zero bs=100000 count=10000' >/dev/null
(comparing "user time" only)

openssh: 7.9p1, self-compiled, based on the upstream package from
debian/unstable; hostkey: ecdsa/p256, pubkey auth key: ecdsa/p256

Machine: fairly old AMD K8 (with SSE2, but no SSSE3/AVX/AES-NI)
OS: linux/debian/stretch, openssl 1.1.0j-1~deb9u1
i386: speed: +8%
amd64: speed: +10%

Machine: Raspberry Pi 3B+ (BCM2837B0, 4-core Cortex-A53 @ 1.4 GHz)
OS: raspbian/stretch

baseline: armhf/raspbian, unpatched ssh-7.9p1: 30.8 seconds

with openssl 1.1.0j-1~deb9u1 from raspbian (compiled for armv6 without NEON):
armhf/raspbian: 24.7 seconds, speed: +25%

with openssl 1.1.0j-1~deb9u1 from debian/stretch/armhf (compiled for
armv7 with NEON autodetection):
armhf: 22.2 seconds, speed: +39%
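
The post does not say how "speed" was computed. The Raspberry Pi figures are consistent with a throughput gain (baseline time divided by new time, minus one) rather than a reduction in time; that reading is an inference, checked by the small C sketch below. It also converts user time into MB per CPU-second using the volume implied by the dd command (bs=100000, count=10000, i.e. 10^9 bytes). The function names are hypothetical, and the K8 percentages cannot be checked this way because no absolute times were posted for them.

```c
/* Percentage throughput gain when a run drops from base_s to new_s seconds. */
static double speedup_pct(double base_s, double new_s)
{
	return (base_s / new_s - 1.0) * 100.0;
}

/* Percentage of time saved; a smaller number for the same pair of runs. */
static double time_saved_pct(double base_s, double new_s)
{
	return (1.0 - new_s / base_s) * 100.0;
}

/* dd writes bs=100000 * count=10000 bytes through the connection. */
static double mb_per_cpu_second(double user_s)
{
	const double bytes = 100000.0 * 10000.0;	/* 1e9 bytes */

	return bytes / user_s / 1e6;
}
```

speedup_pct(30.8, 24.7) is about 24.7 and speedup_pct(30.8, 22.2) about 38.7, which round to the reported +25% and +39%; expressed as time saved, the same runs are about 19.8% and 27.9%. The baseline corresponds to roughly 32.5 MB per second of client user time.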

Patches against 7.9p1 (tested) and git master (untested, only resolved
configure.ac conflict) attached.
Re: [patch 1/2] use chacha20 from openssl (1.1.0+) when possible
On Thu, 17 Jan 2019, Yuriy M. Kaminskiy wrote:

> Patches against 7.9p1 (tested) and git master (untested, only resolved
> configure.ac conflict) attached.

Thanks for this - it seems to work okay with OpenSSL when patched to
-current, but when I adapt it for OpenBSD/LibreSSL the encryption is
broken and the connection fails right after KEX.

I expect that there is some difference between OpenSSL and LibreSSL wrt
IV lengths or something. OpenSSH does need to support both, so this will
take a little figuring out.

One comment on the patch itself: it passes do_encrypt through in a bunch
of places and I'm not sure the usage is correct in all of them. In fact
I don't think it can even be made consistent for decryption, as the
ctx->main_evp has to be used in encryption mode (not decryption) to
generate the poly1305 key.

Given this is a stream cipher and there is AFAIK no difference between
encryption and decryption, I think it would be better to just fix
do_encrypt to 1 to avoid inconsistency.
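
The stream-cipher argument can be illustrated with a self-contained sketch of the original ChaCha20 (64-bit block counter in state words 12-13, 64-bit nonce in words 14-15, the layout OpenSSH's builtin chacha code uses). The only operation is XOR with the keystream, so there is no separate decrypt path, and the Poly1305 key is simply the first 32 keystream bytes of block 0 on whichever side computes it. This is an illustrative portable implementation, not OpenSSH's or OpenSSL's code; chacha20_xor() and chacha20_poly_key() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define QR(a, b, c, d) do { \
	a += b; d ^= a; d = ROTL32(d, 16); \
	c += d; b ^= c; b = ROTL32(b, 12); \
	a += b; d ^= a; d = ROTL32(d, 8); \
	c += d; b ^= c; b = ROTL32(b, 7); \
} while (0)

static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Produce one 64-byte keystream block from the 16-word state (20 rounds). */
static void chacha20_block(uint8_t out[64], const uint32_t in[16])
{
	uint32_t x[16];

	memcpy(x, in, sizeof(x));
	for (int i = 0; i < 10; i++) {
		QR(x[0], x[4], x[8], x[12]);	/* column rounds */
		QR(x[1], x[5], x[9], x[13]);
		QR(x[2], x[6], x[10], x[14]);
		QR(x[3], x[7], x[11], x[15]);
		QR(x[0], x[5], x[10], x[15]);	/* diagonal rounds */
		QR(x[1], x[6], x[11], x[12]);
		QR(x[2], x[7], x[8], x[13]);
		QR(x[3], x[4], x[9], x[14]);
	}
	for (int i = 0; i < 16; i++) {
		uint32_t v = x[i] + in[i];

		out[4 * i] = (uint8_t)v;
		out[4 * i + 1] = (uint8_t)(v >> 8);
		out[4 * i + 2] = (uint8_t)(v >> 16);
		out[4 * i + 3] = (uint8_t)(v >> 24);
	}
}

/*
 * XOR buf with the keystream starting at block `counter`. The same call
 * serves for encryption, decryption and Poly1305 key derivation.
 */
static void chacha20_xor(const uint8_t key[32], const uint8_t nonce[8],
    uint64_t counter, uint8_t *buf, size_t len)
{
	uint32_t st[16] = { 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574 };
	uint8_t ks[64];

	for (int i = 0; i < 8; i++)
		st[4 + i] = load_le32(key + 4 * i);
	st[14] = load_le32(nonce);
	st[15] = load_le32(nonce + 4);
	for (size_t off = 0; off < len; off += 64, counter++) {
		st[12] = (uint32_t)counter;
		st[13] = (uint32_t)(counter >> 32);
		chacha20_block(ks, st);
		for (size_t i = 0; i < 64 && off + i < len; i++)
			buf[off + i] ^= ks[i];
	}
}

/* Poly1305 key: "encrypt" 32 zero bytes at block 0, in either direction. */
static void chacha20_poly_key(const uint8_t key[32], const uint8_t nonce[8],
    uint8_t poly_key[32])
{
	memset(poly_key, 0, 32);
	chacha20_xor(key, nonce, 0, poly_key, 32);
}
```

Applying chacha20_xor() twice with the same key, nonce and counter returns the original bytes, and with an all-zero key and nonce the derived Poly1305 key begins 76 b8 e0 ad, the well-known start of the ChaCha20 zero-key keystream.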

-d

_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
Re: [patch 1/2] use chacha20 from openssl (1.1.0+) when possible
On Fri, 2019-07-12 at 15:54 +1000, Damien Miller wrote:
> Thanks for this - it seems to work okay with OpenSSL when patched to
> -current, but when I adapt it for OpenBSD/LibreSSL the encryption is
> broken and the connection fails right after KEX.
>
> I expect that there is some difference between OpenSSL and LibreSSL
> wrt IV lengths or something. OpenSSH does need to support both, so
> this will take a little figuring out.
>
> One comment on the patch itself: it passes do_encrypt through in a
> bunch of places and I'm not sure the usage is correct in all of them.
> In fact I don't think it can even be made consistent for decryption,
> as the ctx->main_evp has to be used in encryption mode (not
> decryption) to generate the poly1305 key.
>
> Given this is a stream cipher and there is AFAIK no difference
> between encryption and decryption, I think it would be better to just
> fix do_encrypt to 1 to avoid inconsistency.

Hi Damien,
do you have any update on this?

Indeed, it looks like LibreSSL uses a 96-bit IV [1], while OpenSSL
uses 128 bits (including the 32-bit counter) [2]. Otherwise, I did not
notice any differences.

I have no real experience with OpenBSD, so I do not have a simple way
to test my changes, but I believe something like this should address
the difference:

diff --git a/cipher-chachapoly.c b/cipher-chachapoly.c
index a58616fb..7e6995f6 100644
--- a/cipher-chachapoly.c
+++ b/cipher-chachapoly.c
@@ -109,7 +109,14 @@ chachapoly_crypt(struct chachapoly_ctx *ctx, u_int seqnr, u_char *dest,
     const u_char *src, u_int len, u_int aadlen, u_int authlen, int do_encrypt)
 {
 #if defined(WITH_OPENSSL) && defined(HAVE_EVP_CHACHA20)
+#if defined(LIBRESSL_VERSION_NUMBER)
+#define CHACHA_IV_OFFSET 4
+	u_char seqbuf[12];
+#else
+#define CHACHA_IV_OFFSET 8
+	/* OpenSSL IV contains also the counter in the first 4 bytes */
 	u_char seqbuf[16];
+#endif
 	int r = SSH_ERR_LIBCRYPTO_ERROR;
 #else
 	u_char seqbuf[8];
@@ -125,7 +132,7 @@ chachapoly_crypt(struct chachapoly_ctx *ctx, u_int seqnr, u_char *dest,
 	memset(poly_key, 0, sizeof(poly_key));
 #if defined(WITH_OPENSSL) && defined(HAVE_EVP_CHACHA20)
 	memset(seqbuf + 0, 0, 8);
-	POKE_U64(seqbuf + 8, seqnr);
+	POKE_U64(seqbuf + CHACHA_IV_OFFSET, seqnr);
 	if (!EVP_CipherInit(ctx->main_evp, NULL, NULL, seqbuf, do_encrypt))
 		goto out;
 	if (EVP_Cipher(ctx->main_evp, poly_key, (u_char *)poly_key, sizeof(poly_key)) < 0)
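
Why the OpenSSL branch only needs the sequence number at offset 8 can be checked in isolation. Assuming, per the OpenSSL EVP_chacha20 documentation [2], that the 16-byte IV is a 32-bit little-endian block counter followed by a 96-bit nonce (i.e. four little-endian words filling ChaCha state words 12-15), and that OpenSSH's builtin code fills the same four words from an 8-byte counter and an 8-byte nonce, both layouts give identical state words while the block counter fits in 32 bits. The helper names below are hypothetical, the counter value 1 for the payload mirrors OpenSSH's builtin code (the patch's payload path is not shown above), and how either library carries the counter past 2^32 blocks is not modelled.

```c
#include <stdint.h>
#include <string.h>

static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Big-endian 64-bit store, the byte order OpenSSH's POKE_U64 uses. */
static void poke_u64_be(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Builtin layout: words 12-13 from an 8-byte counter, 14-15 from an 8-byte IV. */
static void words_builtin(uint32_t w[4], const uint8_t ctr[8],
    const uint8_t iv[8])
{
	w[0] = load_le32(ctr);
	w[1] = load_le32(ctr + 4);
	w[2] = load_le32(iv);
	w[3] = load_le32(iv + 4);
}

/* 16-byte EVP-style IV: 32-bit LE counter then 96-bit nonce = four LE words. */
static void words_evp16(uint32_t w[4], const uint8_t iv16[16])
{
	for (int i = 0; i < 4; i++)
		w[i] = load_le32(iv16 + 4 * i);
}

/* The patch's OpenSSL layout: zeroed counter area, then BE64(seqnr) at 8. */
static void patch_seqbuf16(uint8_t seqbuf[16], uint64_t seqnr,
    uint8_t first_ctr)
{
	memset(seqbuf, 0, 8);
	seqbuf[0] = first_ctr;	/* 0: Poly1305 key block; 1: payload (assumed) */
	poke_u64_be(seqbuf + 8, seqnr);
}
```

For any sequence number, words_builtin() with counter 0 or 1 and words_evp16() on the matching patch_seqbuf16() output produce the same four words, so the same keystream follows.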

As for do_encrypt, you are right. ChaCha20 is a stream cipher, so there
is no difference between encryption and decryption, but the EVP API
requires this argument. For consistency, I would use 1 in all cases.

If you have a WIP branch you used for porting to OpenBSD, or something
else I can test, I can try that.

[1] https://man.openbsd.org/man3/EVP_EncryptInit.3
[2] https://www.openssl.org/docs/man1.1.1/man3/EVP_chacha20_poly1305.html

Regards,
--
Jakub Jelen
Senior Software Engineer
Security Technologies
Red Hat, Inc.

Re: [patch 1/2] use chacha20 from openssl (1.1.0+) when possible
On 16.01.2020 13:27, Jakub Jelen wrote:
> Indeed, it looks like LibreSSL uses a 96-bit IV [1], while OpenSSL
> uses 128 bits (including the 32-bit counter) [2]. Otherwise, I did not
> notice any differences.

Given that LibreSSL contains no optimized chacha20/poly1305/{curve,ed}25519
code, I'd rather use the builtin OpenSSH implementation on LibreSSL.
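
One way to express "builtin on LibreSSL" at compile time, as a sketch: WITH_OPENSSL and HAVE_EVP_CHACHA20 are the macros the patch already tests, LIBRESSL_VERSION_NUMBER is defined by LibreSSL's <openssl/opensslv.h> (so the check only works after the OpenSSL headers are included), and USE_EVP_CHACHA20 and chachapoly_uses_evp() are hypothetical names.

```c
/*
 * Choose the ChaCha20 backend at compile time: OpenSSL's EVP code when
 * available, but the builtin implementation on LibreSSL. In real code,
 * include <openssl/opensslv.h> first so LIBRESSL_VERSION_NUMBER is visible.
 */
#if defined(WITH_OPENSSL) && defined(HAVE_EVP_CHACHA20) && \
    !defined(LIBRESSL_VERSION_NUMBER)
# define USE_EVP_CHACHA20 1
#else
# define USE_EVP_CHACHA20 0
#endif

/* Hypothetical accessor, e.g. for debug logging or tests. */
static int chachapoly_uses_evp(void)
{
	return USE_EVP_CHACHA20;
}
```

The existing `#if defined(WITH_OPENSSL) && defined(HAVE_EVP_CHACHA20)` branches in cipher-chachapoly.c could then test USE_EVP_CHACHA20 instead, so LibreSSL builds take the builtin path without any IV-layout special cases.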

> I have no real experience with OpenBSD, so I do not have a simple way
> to test my changes, but I believe something like this should address
> the difference:
>
> diff --git a/cipher-chachapoly.c b/cipher-chachapoly.c
> index a58616fb..7e6995f6 100644
> --- a/cipher-chachapoly.c
> +++ b/cipher-chachapoly.c
> @@ -109,7 +109,14 @@ chachapoly_crypt(struct chachapoly_ctx *ctx, u_int seqnr, u_char *dest,
>      const u_char *src, u_int len, u_int aadlen, u_int authlen, int do_encrypt)
>  {
>  #if defined(WITH_OPENSSL) && defined(HAVE_EVP_CHACHA20)
> +#if defined(LIBRESSL_VERSION_NUMBER)
> +#define CHACHA_IV_OFFSET 4
> +	u_char seqbuf[12];
> +#else
> +#define CHACHA_IV_OFFSET 8
> +	/* OpenSSL IV contains also the counter in the first 4 bytes */
>  	u_char seqbuf[16];
> +#endif

... and if EVP is used on LibreSSL as well, I'd use
EVP_CIPHER_CTX_iv_length() rather than brittle compile-time magic numbers.
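
A minimal sketch of the runtime approach, assuming the rule implied by both constants in the patch: the sequence number occupies the last 8 bytes of the IV and everything before it (the counter area) is zeroed, so the offset is iv_len - 8 (16 - 8 = 8 for OpenSSL, 12 - 8 = 4 for the 96-bit LibreSSL IV described above). In the patch, iv_len would come from EVP_CIPHER_CTX_iv_length(ctx->main_evp); here it is a plain parameter so the sketch builds without libcrypto. chachapoly_fill_iv() is a hypothetical name, and whether this layout is what LibreSSL actually expects was not confirmed in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: lay out the ChaCha20 IV for packet `seqnr` given the
 * IV length reported at runtime. The caller's buffer is assumed to be 16
 * bytes (like seqbuf[16]), so longer IVs are rejected. Zeroes the leading
 * counter area and stores seqnr big-endian in the last 8 bytes.
 * Returns the offset used, or -1 if iv_len is unsupported.
 */
static int chachapoly_fill_iv(uint8_t *iv, size_t iv_len, uint64_t seqnr)
{
	size_t off;

	if (iv_len < 8 || iv_len > 16)
		return -1;
	off = iv_len - 8;
	memset(iv, 0, off);		/* replaces memset(seqbuf + 0, 0, 8) */
	for (int i = 0; i < 8; i++)	/* same byte order as POKE_U64 */
		iv[off + i] = (uint8_t)(seqnr >> (56 - 8 * i));
	return (int)off;
}
```

Because EVP_CIPHER_CTX_iv_length() returns an int, a caller would check it is positive before passing it in, and still treat a -1 result from the helper as a libcrypto error rather than falling back to a fixed offset.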

>  	int r = SSH_ERR_LIBCRYPTO_ERROR;
>  #else
>  	u_char seqbuf[8];
> @@ -125,7 +132,7 @@ chachapoly_crypt(struct chachapoly_ctx *ctx, u_int seqnr, u_char *dest,
>  	memset(poly_key, 0, sizeof(poly_key));
>  #if defined(WITH_OPENSSL) && defined(HAVE_EVP_CHACHA20)
>  	memset(seqbuf + 0, 0, 8);

This one should use the offset too:
	memset(seqbuf + 0, 0, CHACHA_IV_OFFSET);

> -	POKE_U64(seqbuf + 8, seqnr);
> +	POKE_U64(seqbuf + CHACHA_IV_OFFSET, seqnr);
>  	if (!EVP_CipherInit(ctx->main_evp, NULL, NULL, seqbuf, do_encrypt))
>  		goto out;
>  	if (EVP_Cipher(ctx->main_evp, poly_key, (u_char *)poly_key, sizeof(poly_key)) < 0)
>
> [1] https://man.openbsd.org/man3/EVP_EncryptInit.3
> [2] https://www.openssl.org/docs/man1.1.1/man3/EVP_chacha20_poly1305.html