On 2013-09-24, at 14:15, "Carlson, Timothy S" <Timothy.Carlson(a)pnnl.gov>
wrote:
I've got an odd situation that I can't seem to fix.
My setup is Lustre 1.8.8-wc1 clients on RHEL 6 talking to 1.8.6 servers on RHEL 5.
My compute nodes have 64 GB of memory and I have a use case where an application has very
low memory usage and needs to access a few thousand files in Lustre that range from 10 to
50 MB. The files are subject to some reuse and it would be advantageous to cache as much
of the data as possible. The default cache for this configuration would be 48GB on the
client as that is 75% of memory. However, the client never caches more than about 40GB of
data according to /proc/meminfo.
Even if I tune the cached memory to 64GB the amount of cache in use never goes past 40GB.
My current setting is as follows:
# lctl get_param llite.*.max_cached_mb
llite.olympus-ffff8804069da800.max_cached_mb=64000
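For anyone wanting to watch the same two numbers side by side, here is a minimal sketch. The helper name cached_mb is made up for illustration, and it assumes the "Cached:" line of /proc/meminfo is the figure being compared against max_cached_mb, which is only an approximation since that line counts all page cache, not just Lustre pages.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print the "Cached" value
# from a meminfo-formatted file, converted from kB to MB.
# Defaults to /proc/meminfo when no file is given.
cached_mb() {
    awk '$1 == "Cached:" { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Show the configured Lustre limit next to the current page cache size.
# Only runs on a node where lctl is installed.
if command -v lctl >/dev/null 2>&1; then
    echo "max_cached_mb: $(lctl get_param -n 'llite.*.max_cached_mb')"
    echo "Cached (MB):   $(cached_mb)"
fi
```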
I've also played with some of the VM tunable settings, such as turning
vfs_cache_pressure down to 10:
# vm.vfs_cache_pressure = 10
In no case do I see more than about 35GB of cache being used. To do some more testing
on this I created a bunch (40) of 2GB files in Lustre and then copied them to /dev/null on the
client. While doing this I ran the fincore tool from
http://code.google.com/p/linux-ftools/ to see if the file was still in cache. Once about
40GB of cache was used, the kernel started to drop files from the cache even though there
was no memory pressure on the system.
If I do the same test with files local to the system, I can fill all the cache to about
61GB before files start getting dropped.
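The test described above could be reproduced roughly as follows. This is only a sketch: the function names and the cachetest.N file names are invented for illustration, and writing the files with dd from /dev/zero is an assumption, since the post does not say how the files were created. Per-file residency would still be checked separately with fincore, as in the original test.

```shell
#!/bin/sh
# Hypothetical helpers; names and file layout are illustrative only.

# make_files DIR COUNT SIZE_MB: create COUNT files of SIZE_MB MiB each in DIR.
make_files() {
    dir=$1; count=$2; size_mb=$3
    i=1
    while [ "$i" -le "$count" ]; do
        dd if=/dev/zero of="$dir/cachetest.$i" bs=1M count="$size_mb" 2>/dev/null || return 1
        i=$((i + 1))
    done
}

# read_files DIR: read each test file once (as in "copy to /dev/null") and
# print the page cache size from /proc/meminfo after each read.
read_files() {
    for f in "$1"/cachetest.*; do
        cat "$f" > /dev/null
        cached=$(awk '$1 == "Cached:" { print $2 }' /proc/meminfo 2>/dev/null)
        printf '%s Cached_kB=%s\n' "$f" "$cached"
    done
}
```

With a placeholder directory on the Lustre mount, the run from the post would correspond to something like `make_files /path/on/lustre 40 2048` followed by `read_files /path/on/lustre`. If the Cached value stops growing partway through the reads, that matches the plateau described here.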
Is there some other Lustre tunable on the client that I can twiddle with to make more use
of the local memory cache?
This might relate to the number of DLM locks cached on the client. If the locks get
cancelled for some reason (e.g. memory pressure on the server, old age) then the pages
covered by the locks will also be dropped.
You could try disabling the lock LRU and specifying some large static number of locks (for
testing, I wouldn't leave this set for production systems with large numbers of
clients):
lctl set_param ldlm.namespaces.*.lru_size=10000
To reset it to dynamic DLM LRU size management, set a value of "0".
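The set and reset steps can be wrapped up for a test run along these lines. This is a sketch only: the function name set_lru_size and the LCTL override variable are made up for illustration, and the glob is quoted so the shell passes it to lctl unchanged.

```shell
#!/bin/sh
# LCTL can be overridden (for example with "echo") to preview the command
# without touching a live client. This variable is an illustrative convention.
LCTL=${LCTL:-lctl}

# set_lru_size N: apply a static lock LRU size of N to all DLM namespaces.
# N=0 returns to dynamic LRU size management.
set_lru_size() {
    "$LCTL" set_param "ldlm.namespaces.*.lru_size=$1"
}

# Typical test sequence (uncomment on a Lustre client):
# set_lru_size 10000   # static LRU, for testing only
# ... rerun the cache test ...
# set_lru_size 0       # back to dynamic sizing
```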
Cheers, Andreas