Can you please file a ticket for this?
On Oct 12, 2014, at 10:30 AM, David Singleton wrote:
We have seen an MDS (admittedly with smallish memory) OOMing while
testing 2.5.3, whereas there was no problem with 2.5.0. It turns out the problem is that,
even though we have lru_size=800 everywhere, the client LDLM LRUs are growing huge, so
that the unreclaimable LDLM slabs on the MDS fill memory.
It looks like the root cause is the change to ldlm_cancel_aged_policy() in commit
0a6c6fcd46 on the 2.5 branch (LU-4786 osc: to not pick busy pages for ELC): it has
changed the lru_size != 0 behaviour. Prior to that, the non-lru_resize behaviour (at least
through the early_lock_cancel path, which is what we see being hit) was effectively:

    cancel lock if (too many in lru cache || lock unused too long)

In 2.5.3, it is:

    cancel lock if (too many in lru cache && lock unused too long)
Disabling early_lock_cancel doesn't seem to help.
It might be arguable which of the two behaviours is correct, but the lru_size documentation
suggests the former; the latter makes lru_size != 0 ineffective in practice. It also
looks like the change was not actually necessary for LU-4300?
HPDD-discuss mailing list