Check the striping of the files reporting ENOSPC with "lfs getstripe" to see whether they are on the full OSTs. Appending to an existing file will not move it to OSTs with more space; only new files will avoid the full OSTs.
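As a minimal sketch, assuming the usual `lfs getstripe` output in which each stripe is listed as a row under an `obdidx` header, the Python helpers below pull out the OST indices a file is striped over and report which of them appear in a set of full OSTs you supply (for example, read off `lfs df`). The function names are hypothetical, and the parsing is an assumption about the output format rather than something checked against every Lustre release.

```python
from typing import Iterable, List


def stripe_osts(getstripe_output: str) -> List[int]:
    """Return the OST indices (obdidx column) found in `lfs getstripe` output.

    Assumes each stripe row starts with a decimal obdidx and follows a
    header line whose first field is "obdidx".
    """
    osts: List[int] = []
    in_table = False
    for line in getstripe_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank lines may separate entries for several files
        if fields[0] == "obdidx":
            in_table = True  # stripe rows follow this header
            continue
        if in_table and fields[0].isdigit():
            osts.append(int(fields[0]))
        else:
            in_table = False  # any other line ends the stripe table
    return osts


def full_stripe_osts(getstripe_output: str, full_osts: Iterable[int]) -> List[int]:
    """Return the sorted OST indices that the file uses and that are full."""
    return sorted(set(stripe_osts(getstripe_output)) & set(full_osts))
```

For instance, the text printed by `lfs getstripe /path/to/file` could be passed in as `getstripe_output` together with the indices of the OSTs that `lfs df` reports at 100%; a non-empty result suggests the ENOSPC comes from writing to one of those OSTs.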
Even better is to avoid having 100% full OSTs in the first place, since that causes serious file fragmentation as the last available blocks are allocated to files.
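To spot OSTs that are approaching full before they hit 100%, a sketch along these lines parses `lfs df` output and lists the OSTs at or above a usage threshold. It assumes the usual `lfs df` column layout (UUID, 1K-blocks, Used, Available, Use%, Mounted on) with OST rows ending in `[OST:N]`; the 90% default threshold and the function names are illustrative choices, not Lustre recommendations.

```python
import re
from typing import Dict, List

# Matches an OST row such as:
#   testfs-OST0000_UUID  10485760  10485760  0  100% /mnt/testfs[OST:0]
# MDT rows ("[MDT:0]"), the header and the summary line do not match.
OST_LINE = re.compile(r"^\S+\s+\d+\s+\d+\s+\d+\s+(\d+)%\s+\S*\[OST:(\d+)\]")


def ost_usage(df_output: str) -> Dict[int, int]:
    """Map OST index -> Use% parsed from `lfs df` output."""
    usage: Dict[int, int] = {}
    for line in df_output.splitlines():
        m = OST_LINE.match(line.strip())
        if m:
            usage[int(m.group(2))] = int(m.group(1))
    return usage


def osts_over(df_output: str, threshold: int = 90) -> List[int]:
    """Return the sorted OST indices whose Use% is at or above threshold."""
    return sorted(i for i, pct in ost_usage(df_output).items() if pct >= threshold)
```

Running this periodically against the output of `lfs df` gives an early warning, so space can be freed or rebalanced before any OST reaches 100% and starts fragmenting files.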
Lustre Software Architect
Intel High Performance Data Division
On 2015/06/01, 9:52 AM, "Kumar, Amit" <ahkumar(a)mail.smu.edu> wrote:
Lustre version: 2.4.3
OS: RHEL/SL 6.5
Just wondering if anybody has seen something similar: some of our applications fail with the error “No space left on device”. For example, a copy command failed while copying a 500MB file even though we have over 500TB of free space.
This is intermittent; I have been able to copy larger files after those failures. I have also seen this error with applications such as CHARMM and Gaussian.
Only a handful of our OSTs (about 3%) have reached 100% usage, but as I understand it, that should not cause this.
Any thoughts on what could be causing this, and on how I can go about debugging it, would be a great help.