On 2013-06-21, at 8:34, "James A Simmons" <uja(a)ornl.gov> wrote:
Recently I have been doing an evaluation of the performance of check
summing in the Lustre code base. From the results you can see it can be
very expensive when you lack hardware acceleration. Unfortunately
we have some platforms where there is a high cost but check summing
is a requirement.
It is also important to consider what checksum algorithm is used on the server, since the
server will have to do the checksumming for all of the clients.
Besides the core algorithms I have added a few of my own to
see how they measure up. We have csum which is your normal IP Header
The IP header checksum is only 16-bit, so it is only suitable for a small amount of data.
It is definitely not suitable for 1MB or 4MB RPC sizes.
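To make the 16-bit limitation concrete, here is a minimal user-space sketch of an RFC 1071 style ones' complement checksum. It is not the kernel's assembly-optimized csum code, and the function name inet_checksum is just an illustrative choice. It shows that the result never exceeds 16 bits regardless of buffer size, and that reordering 16-bit words goes undetected.

```python
def inet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071.

    Illustrative only: a plain Python loop, not the optimized kernel code.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        # add each big-endian 16-bit word
        total += (data[i] << 8) | data[i + 1]
    # fold any carries back into the low 16 bits
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Example bytes taken from RFC 1071, section 3
sample = bytes.fromhex("0001f203f4f5f6f7")
print(hex(inet_checksum(sample)))

# Swapping two 16-bit words leaves the checksum unchanged
a = bytes.fromhex("0001f203")
b = bytes.fromhex("f2030001")
print(inet_checksum(a) == inet_checksum(b))
```

With only 65536 possible values, and no sensitivity to word order, a 1MB RPC has far more ways to be corrupted than this checksum can distinguish.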
For the non-cryptographic hashes it's the IP check sum and
that does the best. This version of murmur3 only generates 32-bit check
sums but there exists a 128-bit version that is supposed to be faster.
It could be worthwhile to explore. The IP check sum from the Linux
kernel is assembly optimized but my additional algorithms are generic
You should test with the kernel cryptoapi code, since AFAIK there are assembly versions of
the common algorithms already. Check out how the libcfs code is already handling the crc32
code - it benchmarks each algorithm at startup and dumps the results in the Lustre debug
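As a rough illustration of the benchmark-at-startup idea, here is a user-space sketch that times a couple of checksum functions over a fixed buffer and picks the fastest. This is not the libcfs implementation. The names ALGORITHMS, benchmark and fastest are hypothetical, and zlib's crc32 and adler32 simply stand in for whatever algorithms are being compared.

```python
import time
import zlib

# Hypothetical table of candidate algorithms; zlib functions are stand-ins.
ALGORITHMS = {"crc32": zlib.crc32, "adler32": zlib.adler32}


def benchmark(algorithms, buf_size=1 << 20, rounds=8):
    """Return a dict mapping algorithm name to measured throughput in MB/s."""
    buf = bytes(range(256)) * (buf_size // 256)
    speeds = {}
    for name, func in algorithms.items():
        start = time.perf_counter()
        for _ in range(rounds):
            func(buf)
        elapsed = time.perf_counter() - start
        # guard against a zero interval on very coarse timers
        speeds[name] = len(buf) * rounds / max(elapsed, 1e-9) / 1e6
    return speeds


def fastest(speeds):
    """Pick the name with the highest measured throughput."""
    return max(speeds, key=speeds.get)


if __name__ == "__main__":
    results = benchmark(ALGORITHMS)
    for name, mbps in results.items():
        print(f"{name}: {mbps:.0f} MB/s")
    print("fastest:", fastest(results))
```

The numbers vary by machine and run, so they are only meaningful relative to each other on the same host.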
The final question: is the Lustre community interested in the new
algorithms? If so I can push forward that work.
I'm not against it if there are significant improvements to be had.
It surprises me that newer CPUs do not have hardware-accelerated checksums of some sort.
Is it just that the assembly versions have not been implemented in the kernels that Lustre
is running on? Could they be implemented in libcfs as was done with crc32 and then
submitted to the upstream kernel (so everyone benefits and we don't have to maintain