On 2015/09/10, 6:54 PM, "Chris Hunter" <chris.hunter(a)yale.edu> wrote:
We experienced file corruption on several OSTs. We proceeded through
recovery using the e2fsck & ll_recover_lost_found_objs tools.
After these steps, e2fsck came out clean.
The file corruption did not impact the MDT; the files were still
referenced by the MDT. Accessing a file on a Lustre client (i.e. ls -l)
reported the error "Cannot allocate memory".
After the OST recovery steps, we started removing the corrupt files with
the "unlink" command on a Lustre client (the rm command would not remove them).
Now a dry-run e2fsck (e2fsck -n) of the OST reports these errors:
"deleted/unused inodes" in Pass 2 (checking directory structure),
"Unattached inodes" in Pass 4 (checking reference counts)
"free block count wrong" in Pass 5 (checking group summary information).
Are e2fsck errors expected when unlinking files?
No, the "unlink" command just avoids the -ENOENT error that "rm" gets
by calling "stat()" on the file before trying to unlink it. This
shouldn't cause any errors on the OSTs, unless there is ongoing corruption
from the back-end storage.
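A simplified model of the difference Andreas describes, in Python. It assumes rm stats the file before unlinking it (real rm implementations such as GNU coreutils do more checks than this). The `stat` parameter and `broken_stat` are hypothetical hooks that simulate the client-side error without a damaged Lustre file system:

```python
import errno
import os
import tempfile


def rm_like(path, stat=os.lstat):
    """Simplified rm: stat the file first, then unlink it.

    If stat() fails (e.g. with ENOMEM because the OST objects behind a
    Lustre file are damaged), unlink() is never reached.
    """
    stat(path)
    os.unlink(path)


def unlink_like(path):
    """Simplified unlink(1): call unlink() directly, with no stat() first."""
    os.unlink(path)


def broken_stat(path):
    # Hypothetical stand-in for stat() on a file whose OST objects are corrupt.
    raise OSError(errno.ENOMEM, os.strerror(errno.ENOMEM), path)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        rm_like(path, stat=broken_stat)
    except OSError as exc:
        print("rm-like removal failed:", exc.strerror)
    unlink_like(path)  # succeeds: no stat() call is made first
    print("file still exists:", os.path.exists(path))
```

Both paths end in the same unlink() call. The only difference is whether a stat() failure stops rm before it gets there.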
On 09/03/2015 12:54 PM, Martin Hecht wrote:
> Hi Chris,
> On 09/02/2015 07:18 AM, Chris Hunter wrote:
>> Hi Andreas
>> On 09/01/2015 07:22 PM, Dilger, Andreas wrote:
>>> On 2015/09/01, 7:59 AM, "lustre-discuss on behalf of Chris Hunter"
>>> <lustre-discuss-bounces(a)lists.lustre.org on behalf of
>>> chris.hunter(a)yale.edu> wrote:
>>>> Hi Andreas,
>>>> Thanks for your help.
>>>> If you have a striped Lustre file with "holes" (i.e. one chunk missing
>>>> due to hardware failure, etc.), are the remaining file chunks considered
>>>> orphan objects?
>> So when a Lustre striped file has a hole (e.g. a missing chunk due to
>> hardware failure), the remaining file chunks stay indefinitely on the OSTs?
>> Is there a way to reclaim the space occupied by these pieces (after
>> recovery of any usable data, etc.)?
> These remaining chunks still belong to the file (i.e. you have the
> metadata entry on the MDT and you see the file when lustre is mounted).
> By removing the file you free up the space.
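To illustrate what a "hole" looks like: with a plain (RAID-0) Lustre layout, the file is split into stripe_size chunks that are assigned round-robin to stripe_count objects. Losing one object therefore removes every stripe_count-th chunk, while the other chunks stay attached to the file. The Python sketch below assumes a plain layout (no composite/PFL components). The function names are hypothetical; stripe_size and stripe_count correspond to the values lfs getstripe reports for the file:

```python
from typing import List, Tuple


def stripe_location(offset: int, stripe_size: int, stripe_count: int) -> Tuple[int, int]:
    """Map a file offset to (stripe index, offset inside that object)."""
    chunk = offset // stripe_size          # which stripe_size chunk of the file
    index = chunk % stripe_count           # chunks go round-robin over the objects
    obj_offset = (chunk // stripe_count) * stripe_size + offset % stripe_size
    return index, obj_offset


def lost_ranges(file_size: int, stripe_size: int, stripe_count: int,
                lost_index: int) -> List[Tuple[int, int]]:
    """Byte ranges [start, end) of the file that were stored in the lost object."""
    ranges = []
    start = lost_index * stripe_size
    while start < file_size:
        ranges.append((start, min(start + stripe_size, file_size)))
        start += stripe_size * stripe_count
    return ranges


if __name__ == "__main__":
    MiB = 1 << 20
    # A 10 MiB file striped over 4 OSTs with 1 MiB stripes, object 1 lost:
    for start, end in lost_ranges(10 * MiB, MiB, 4, lost_index=1):
        print(f"hole: {start // MiB}-{end // MiB} MiB")
```

Everything outside those ranges is still stored in objects referenced by the file's layout on the MDT. None of it is orphaned, which is why removing the file is what releases the space.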
> In general there are two types of inconsistencies which may occur:
> Orphan objects are objects which are NOT assigned to an entry on the
> MDT, i.e. chunks which do not belong to any file. These can be either
> pre-allocated chunks or chunks left over after a corruption of the
> metadata on the MDT.
> The other type of corruption is a file where chunks are missing
> in between. This can happen when an OST gets corrupted. As long
> as the MDT is OK, you should be able to remove such a file. If the
> MDT is also corrupted, you should fix the MDT first, and
> you might then only be able to unlink the file (which again might leave
> some orphan objects on the OSTs). Depending on the Lustre version you
> are running, lfsck should be able to remove them.
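The two kinds of inconsistency can be told apart by cross-checking the MDT's view against what the OSTs actually hold. Below is a toy Python model. It assumes each object is identified by a plain integer (real Lustre uses FIDs and per-OST object IDs) and that the MDT layouts are available as a dict. It shows the idea of the cross-check, not how lfsck is implemented:

```python
from typing import Dict, List, Set, Tuple


def classify(mdt_layouts: Dict[str, List[int]],
             ost_objects: Set[int]) -> Tuple[Set[int], Dict[str, List[int]]]:
    """Return (orphan objects, files with missing objects).

    mdt_layouts: file name -> object IDs in its layout (hypothetical model)
    ost_objects: object IDs that actually exist on the OSTs
    """
    referenced = {obj for objs in mdt_layouts.values() for obj in objs}
    # Type 1: objects on the OSTs that no MDT entry points to.
    orphans = ost_objects - referenced
    # Type 2: files whose layout points at objects that no longer exist.
    damaged = {}
    for name, objs in mdt_layouts.items():
        missing = [obj for obj in objs if obj not in ost_objects]
        if missing:
            damaged[name] = missing
    return orphans, damaged


if __name__ == "__main__":
    mdt = {"/scratch/a": [1, 2, 3], "/scratch/b": [4, 5]}
    osts = {1, 2, 3, 5, 6, 7}  # object 4 lost; 6 and 7 pre-allocated or leftover
    print(classify(mdt, osts))
```

Dropping "/scratch/b" from the MDT model while object 5 stays on the OSTs turns 5 into an orphan. This mirrors the case where a file is unlinked after MDT corruption.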
> Another point: when OSTs get corrupted, after repairing them
> with e2fsck you can mount them as ldiskfs, check whether there are objects
> in lost+found, and use the tool ll_recover_lost_found_objs to restore
> them to their original place. I believe the objects which e2fsck puts in
> lost+found are a different kind of thing, usually not called "orphan
> objects". As I said, they can usually be recovered easily.
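A small Python sketch of where such a lost+found object would be put back. It assumes the classic ldiskfs OST layout O/0/d<objid % 32>/<objid> (sequence 0 with 32 hash subdirectories). Newer OST layouts with other sequences may differ. The function names, the mount point, and the name-to-ID mapping are hypothetical. The real tool determines each object's ID from the object itself, which this sketch does not model:

```python
import os
from typing import Dict, List, Tuple


def recovered_path(ost_root: str, objid: int, seq: int = 0, subdirs: int = 32) -> str:
    """Target path of an OST object under the assumed O/<seq>/d<objid % 32> layout."""
    return os.path.join(ost_root, "O", str(seq), f"d{objid % subdirs}", str(objid))


def plan_moves(ost_root: str, lost_found_ids: Dict[str, int]) -> List[Tuple[str, str]]:
    """Pair each lost+found entry with the path it would be restored to.

    lost_found_ids maps lost+found entry names to object IDs (hypothetical
    input; the real tool reads the ID from the object, not from a dict).
    """
    lost_found = os.path.join(ost_root, "lost+found")
    return [(os.path.join(lost_found, name), recovered_path(ost_root, objid))
            for name, objid in sorted(lost_found_ids.items())]


if __name__ == "__main__":
    # Hypothetical ldiskfs mount point and lost+found contents.
    for src, dst in plan_moves("/mnt/ost0", {"#1234": 34976, "#1240": 12345}):
        print(src, "->", dst)
```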
Lustre Software Architect
Intel High Performance Data Division