if CPU time is the bottleneck ...

Posted Feb 14, 2008 3:32 UTC (Thu) by yarikoptic (subscriber, #36795)
In reply to: if CPU time is the bottleneck ... by njs
Parent article: Multi-threaded OpenSSH

I was curious myself, so I went straight to testing all the ciphers. Thanks to bash process substitution, there is no disk I/O involved in this one:

> for cypher in 3des-cbc  aes128-cbc  aes192-cbc  aes256-cbc  aes128-ctr \
  aes192-ctr  aes256-ctr  arcfour128  arcfour256  arcfour  blowfish-cbc \
  cast128-cbc; do echo -en "$cypher:\t"; \
 bin/timeit -r 20 "scp -c $cypher <(dd if=/dev/zero bs=1024k count=1000 2>/dev/null) /dev/null";  done
3des-cbc:       20 runs real:1.54+-0.45 user:0.04+-0.02 sys:0.74+-0.34
aes128-cbc:     20 runs real:1.58+-0.50 user:0.04+-0.02 sys:0.79+-0.43
aes192-cbc:     20 runs real:1.43+-0.39 user:0.03+-0.03 sys:0.62+-0.22
aes256-cbc:     20 runs real:1.59+-0.48 user:0.03+-0.02 sys:0.79+-0.45
aes128-ctr:     20 runs real:1.33+-0.30 user:0.04+-0.03 sys:0.59+-0.30
aes192-ctr:     20 runs real:1.35+-0.33 user:0.04+-0.02 sys:0.59+-0.26
aes256-ctr:     20 runs real:1.59+-0.50 user:0.03+-0.02 sys:0.77+-0.37
arcfour128:     20 runs real:1.43+-0.38 user:0.04+-0.02 sys:0.66+-0.30
arcfour256:     20 runs real:1.99+-0.38 user:0.02+-0.01 sys:1.00+-0.31
arcfour:        20 runs real:1.45+-0.47 user:0.03+-0.02 sys:0.70+-0.37
blowfish-cbc:   20 runs real:1.53+-0.47 user:0.04+-0.02 sys:0.73+-0.32
cast128-cbc:    20 runs real:1.42+-0.41 user:0.03+-0.03 sys:0.65+-0.31
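bin/timeit is the poster's own helper, and its source isn't shown. The following is a minimal Python sketch of a hypothetical equivalent. It assumes the helper runs the command N times under bash and reports the mean and sample standard deviation of real (wall-clock), user and sys time in the same "+-" format. The names `time_once` and `timeit` are made up for this sketch.

```python
import os
import statistics
import subprocess
import time


def time_once(cmd):
    """Run `cmd` once through bash; return (real, user, sys) in seconds.

    user/sys come from os.times(): children_user and children_system
    accumulate the CPU time of child processes that have been waited for.
    """
    before = os.times()
    start = time.perf_counter()
    # bash (not sh) is needed for the <(...) process substitution
    subprocess.run(["bash", "-c", cmd], check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    real = time.perf_counter() - start
    after = os.times()
    return (real,
            after.children_user - before.children_user,
            after.children_system - before.children_system)


def timeit(cmd, runs=20):
    """Return a summary line like '20 runs real:1.54+-0.45 user:... sys:...'."""
    samples = [time_once(cmd) for _ in range(runs)]
    parts = []
    # zip(*samples) regroups the per-run tuples into real, user and sys series
    for name, values in zip(("real", "user", "sys"), zip(*samples)):
        mean = statistics.mean(values)
        # sample standard deviation needs at least two runs
        sd = statistics.stdev(values) if runs > 1 else 0.0
        parts.append(f"{name}:{mean:.2f}+-{sd:.2f}")
    return f"{runs} runs " + " ".join(parts)
```

Called as `timeit("scp -c aes128-ctr <(dd if=/dev/zero bs=1024k count=1000 2>/dev/null) /dev/null", runs=20)`, it should return a line shaped like those in the table. The numbers will depend on the machine.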

Surprisingly (khe khe), aes128-ctr is the winner on average, and its performance is also the most stable (i.e. its run-to-run deviation is the smallest). Yet it is not the default; I would have run them in the default order, which is aes128-cbc,3des-cbc,blowfish-cbc,... Anyone is welcome to check whether blowfish's performance is statistically significantly better than that of the default aes128-cbc, but to my eye it is far from significant (although you are welcome to reduce the standard error by collecting more samples ;-))
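To take up that invitation, you can compute Welch's t statistic directly from the summary figures. This rough sketch assumes three things: the "+-" values are per-run sample standard deviations, the 20 runs are independent, and the timings are roughly normal. If the "+-" values were standard errors instead, the statistic would be even smaller.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples, given only mean, sd and size."""
    v1 = sd1 ** 2 / n1  # squared standard error of sample 1
    v2 = sd2 ** 2 / n2  # squared standard error of sample 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# "real" figures from the table above: aes128-cbc vs blowfish-cbc
t, df = welch_t(1.58, 0.50, 20, 1.53, 0.47, 20)
print(f"t = {t:.2f}, df = {df:.1f}")  # prints: t = 0.33, df = 37.9
```

For significance at the 5% level (two-sided), |t| would need to reach about 2.02 with roughly 38 degrees of freedom. A |t| of about 0.33 is far below that, so the calculation agrees with the eyeball verdict.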

This test was run using Debian's openssh 4.3p2-9 on kernel 2.6.18-5-amd64. Unfortunately, the multi-threaded (MT) ssh patches do not apply cleanly on top of the Debian-shipped version (i.e. they conflict with Debian's diff.gz), and I don't feel like wasting time there. If someone adapts the patches to apply on top of Debian's openssh, I would appreciate a buzz.

run OpenSSL speed benchmark

Posted Feb 14, 2008 7:48 UTC (Thu) by vgough (subscriber, #2781)

OpenSSH uses OpenSSL, which has a useful built-in speed test ("openssl speed"). On my dual-core
AMD, Blowfish falls somewhere between AES-192 and AES-256, at around 60MB/sec. The fastest I've
seen is my otherwise slower VIA C7 system, which can do AES-256-CBC at nearly 800MB/sec thanks
to hardware acceleration:

$ openssl speed -engine padlock -evp aes-256-cbc
engine "padlock" set.
...
The 'numbers' are in 1000s of bytes per second processed.
type          16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc   70048.90k   221410.79k   482132.99k   683641.51k   780377.04k
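`openssl speed` reports its figures in thousands of bytes per second (the "k" suffix), so converting to MB/s is a division by 1000. The sketch below uses two hypothetical helper names. It confirms that the 8192-byte AES-256-CBC figure is "nearly 800MB/sec" in decimal megabytes, or about 744 MiB/s in binary units.

```python
def speed_to_mb_per_s(field):
    """Convert an 'openssl speed' field such as '780377.04k' to decimal MB/s.

    The 'k' figures are thousands of bytes per second, so dividing by 1000
    gives millions of bytes (MB) per second.
    """
    return float(field.rstrip("k")) / 1000.0


def speed_to_mib_per_s(field):
    """Same figure in binary MiB/s (1 MiB = 1048576 bytes)."""
    return float(field.rstrip("k")) * 1000.0 / 1048576.0


row = "aes-256-cbc   70048.90k   221410.79k   482132.99k   683641.51k   780377.04k"
name, *fields = row.split()
for block, field in zip((16, 64, 256, 1024, 8192), fields):
    print(f"{name} {block:>5} bytes: {speed_to_mb_per_s(field):7.1f} MB/s "
          f"({speed_to_mib_per_s(field):7.1f} MiB/s)")
```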

run OpenSSL speed benchmark

Posted Feb 23, 2008 15:39 UTC (Sat) by anton (guest, #25547)

Whether Blowfish is faster than AES depends on the actual CPU; below are results from 6 different CPUs (measured with "openssl speed"). It seems that Blowfish is fastest on non-IA32/AMD64 CPUs, and that AES-128 is fastest on IA32/AMD64 CPUs. Maybe AES has been specially tuned for these architectures (maybe using SSE2), or maybe this is just an accident.
The 'numbers' are in 1000s of bytes per second processed.

0.8GHz Alpha 21264b (Alpha):
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
blowfish cbc     19939.33k    22299.37k    23165.70k    23421.27k    23592.96k
aes-128 cbc      22547.19k    23024.03k    23176.11k    23239.00k    23098.79k
aes-192 cbc      20367.55k    20544.24k    20858.45k    20928.17k    20680.42k
aes-256 cbc      18996.84k    19297.97k    19407.28k    19328.00k    19358.54k

1GHz UltraSparc T1 (Sparc32):
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
blowfish cbc     10205.05k    11725.33k    12179.11k    12303.36k    12085.93k
aes-128 cbc       8187.33k     9253.74k     9563.82k     9644.37k     9497.26k
aes-192 cbc       7416.54k     8274.56k     8492.37k     8556.20k     8437.76k
aes-256 cbc       6685.84k     7379.78k     7575.47k     7626.07k     7528.45k

1.066GHz PPC 7447A (iBook G4) (PPC32):
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
blowfish cbc     30950.70k    34226.50k    35157.90k    35383.45k    35492.32k
aes-128 cbc      30461.87k    33709.61k    34417.71k    34576.16k    34621.41k
aes-192 cbc      27779.14k    30453.69k    31023.54k    31173.15k    31175.87k
aes-256 cbc      25526.60k    27756.61k    28241.65k    28456.96k    28372.62k

2.2GHz Athlon 64 X2 4400+ (AMD64):
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
blowfish cbc     79664.65k    85287.04k    86616.41k    86835.88k    87293.95k
aes-128 cbc      91920.92k   100527.77k   103723.18k   104209.52k   104710.14k
aes-192 cbc      85428.13k    89385.83k    91002.79k    91103.00k    91463.68k
aes-256 cbc      76068.87k    78756.57k    80623.02k    81095.68k    81193.64k

2.66GHz Pentium 4 (IA-32):
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
blowfish cbc     75218.71k    77014.90k    79549.78k    80735.91k    79525.42k
aes-128 cbc      46028.82k    86340.70k   108086.73k   117469.64k   120570.07k
aes-192 cbc      42808.40k    74267.47k    92402.86k    98323.46k   100554.07k
aes-256 cbc      40274.84k    67507.63k    82040.83k    85691.45k    88408.06k

3GHz Xeon 5160 (~Core 2 Duo 6850) (AMD64):
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
blowfish cbc     99011.70k   105660.84k   106929.75k   107735.38k   107562.87k
aes-128 cbc     149960.71k   157818.86k   158781.27k   158404.98k   158695.42k
aes-192 cbc     132249.65k   138024.81k   138844.84k   138654.36k   138622.29k
aes-256 cbc     117380.43k   122823.36k   122933.24k   123506.01k   123387.90k
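That claim can be checked mechanically. The sketch below copies only the 8192-byte column from the tables above; the dictionary layout and the `fastest` helper are illustrative choices. It picks the fastest cipher per CPU at that largest block size, which is arguably closest to bulk transfers like scp. The picture at 16-byte blocks differs on some CPUs. On the Pentium 4, for example, Blowfish beats AES-128 at 16 bytes, and on the Alpha, AES-128 beats Blowfish there.

```python
# 8192-byte column from the tables above, in 1000s of bytes per second
RESULTS = {
    "Alpha 21264b (Alpha)": {"blowfish cbc": 23592.96, "aes-128 cbc": 23098.79,
                             "aes-192 cbc": 20680.42, "aes-256 cbc": 19358.54},
    "UltraSparc T1 (Sparc32)": {"blowfish cbc": 12085.93, "aes-128 cbc": 9497.26,
                                "aes-192 cbc": 8437.76, "aes-256 cbc": 7528.45},
    "PPC 7447A (PPC32)": {"blowfish cbc": 35492.32, "aes-128 cbc": 34621.41,
                          "aes-192 cbc": 31175.87, "aes-256 cbc": 28372.62},
    "Athlon 64 X2 4400+ (AMD64)": {"blowfish cbc": 87293.95, "aes-128 cbc": 104710.14,
                                   "aes-192 cbc": 91463.68, "aes-256 cbc": 81193.64},
    "Pentium 4 (IA-32)": {"blowfish cbc": 79525.42, "aes-128 cbc": 120570.07,
                          "aes-192 cbc": 100554.07, "aes-256 cbc": 88408.06},
    "Xeon 5160 (AMD64)": {"blowfish cbc": 107562.87, "aes-128 cbc": 158695.42,
                          "aes-192 cbc": 138622.29, "aes-256 cbc": 123387.90},
}


def fastest(results):
    """Map each CPU to (fastest cipher, its speed relative to blowfish cbc)."""
    out = {}
    for cpu, speeds in results.items():
        best = max(speeds, key=speeds.get)
        out[cpu] = (best, speeds[best] / speeds["blowfish cbc"])
    return out


for cpu, (cipher, ratio) in fastest(RESULTS).items():
    print(f"{cpu:28} {cipher:13} {ratio:.2f}x blowfish")
```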

if CPU time is the bottleneck ...

Posted Feb 14, 2008 19:20 UTC (Thu) by njs (subscriber, #40338)

According to those numbers, your system can crank through pretty much any cipher at much
better than 1000MB/s.  Where did you get your CPU?  NSA surplus? :-)
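For reference, the dd in the original loop produces 1000 x 1024k = 1,048,576,000 bytes (1000 MiB) per run. njs doesn't say how the estimate was derived. In the sketch below, dividing that volume by CPU time (user + sys) gives figures above 1000 MB/s for every cipher, while dividing by wall-clock (real) time gives somewhat less. The `TIMINGS` table simply copies the means posted above, and `throughput_mb_s` is a hypothetical helper.

```python
TOTAL_BYTES = 1000 * 1024 * 1024  # dd bs=1024k count=1000

# (real, user, sys) means from the results above, in seconds
TIMINGS = {
    "3des-cbc": (1.54, 0.04, 0.74),
    "aes128-cbc": (1.58, 0.04, 0.79),
    "aes192-cbc": (1.43, 0.03, 0.62),
    "aes256-cbc": (1.59, 0.03, 0.79),
    "aes128-ctr": (1.33, 0.04, 0.59),
    "aes192-ctr": (1.35, 0.04, 0.59),
    "aes256-ctr": (1.59, 0.03, 0.77),
    "arcfour128": (1.43, 0.04, 0.66),
    "arcfour256": (1.99, 0.02, 1.00),
    "arcfour": (1.45, 0.03, 0.70),
    "blowfish-cbc": (1.53, 0.04, 0.73),
    "cast128-cbc": (1.42, 0.03, 0.65),
}


def throughput_mb_s(seconds, total_bytes=TOTAL_BYTES):
    """Decimal MB/s implied by moving total_bytes in `seconds`."""
    return total_bytes / seconds / 1e6


for cipher, (real, user, sys_) in TIMINGS.items():
    print(f"{cipher:13} wall: {throughput_mb_s(real):6.0f} MB/s   "
          f"cpu (user+sys): {throughput_mb_s(user + sys_):6.0f} MB/s")
```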

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.