Unattached inode, Connect to /lost+found, Inode ref count is 2, should be 1
by Kumar, Amit
Dear All,
I am running into tons of "Unattached inode ..., Connect to /lost+found ..., Inode <....> ref count is 2, should be 1" messages on the MDT, after running lfs_migrate for the entire last month to move files off old storage hardware. Lustre v2.4.3.
I ran e2fsck -fp on the OSTs and it worked fine; since they had only a couple of such messages, they were quickly fixed.
Running e2fsck -fp on the MDT fails to fix them, since preen mode will not attempt those repairs. The only option left was to run "e2fsck -fy". In doing so I see tons of the above messages being fixed. I realized I had orphaned objects, but I did not think I would find a never-ending stream of objects to fix. I have been running "e2fsck -fy" on the MDT all day today with no luck or sign of completion.
At this pace, the check is at around inode 82M, based on the output of my e2fsck -fy run on the MDT. Given a 2TB MDT, it should hold at least 512M inodes. I have a couple of questions in this context.
Q) My file system had about 35 million files (a guess) and MDT usage was about 32GB; any idea how long my e2fsck checks could take?
Q) Why am I seeing unattached inodes in such large numbers? Is something bad going on with our file system?
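For reference, the run is essentially the following (the device path is a placeholder; -C 0 and the tee log are just additions to make progress easier to track):
# run against the unmounted ldiskfs MDT device; -f force, -y answer yes to all fixes
# -C 0 prints a progress indicator, tee keeps a log of everything being fixed
e2fsck -f -y -C 0 /dev/<mdt-device> 2>&1 | tee /root/e2fsck-mdt.log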
Please advise.
Thank you,
Amit
4 years, 8 months
Re: [HPDD-discuss] lfs find tips or tricks
by Dilger, Andreas
You don't really explain what you are using the "lfs find" data for, so it is hard to help you optimize your usage. It is possible, for example, to specify multiple OSTs at once for "lfs find" (e.g. if emptying 4 OSTs at once), but that may not be what you want to do.
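For example, something along these lines (a sketch only; the mount point and OST names are placeholders, and the exact --ost syntax should be checked against the lfs find man page for your version):
# find regular files with objects on any of these OSTs and migrate them off
lfs find /mnt/lustre -type f --ost lustre-OST0004_UUID,lustre-OST0005_UUID |
    lfs_migrate -y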
As for the MDS memory problem, that is caused by huge inode/DLM lock caches on the clients, and was fixed at one point. I don't know the bug number offhand, but you could find it in Jira. As a workaround you can also periodically flush the lock caches on the clients via:
lctl set_param ldlm.namespaces.*mdc*.lru_size=clear
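For instance, a crude way to do this periodically would be a cron entry on each client, e.g. in /etc/cron.d (the hourly interval is just an example):
# flush the MDC DLM lock cache on this client once an hour
0 * * * * root /usr/sbin/lctl set_param ldlm.namespaces.*mdc*.lru_size=clear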
You could avoid all of the repeated scanning by using Robin Hood to index the filesystem once and then running queries against the RBH database, using the Lustre ChangeLog to keep RBH updated without the need to re-scan the whole filesystem.
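Roughly, the ChangeLog side looks like this (a sketch; the lustre-MDT0000 name and the cl1 user id are only examples, and robinhood normally does the registration itself):
# on the MDS: register a changelog consumer, which prints an id such as cl1
lctl --device lustre-MDT0000 changelog_register
# on a client: read the pending changelog records for that MDT
lfs changelog lustre-MDT0000
# after processing, acknowledge the records so the MDT can purge them
lfs changelog_clear lustre-MDT0000 cl1 0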
Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel High Performance Data Division
On 2016/06/06, 08:58, "Kumar, Amit" <ahkumar(a)mail.smu.edu> wrote:
Dear All,
I believe there is no way to optimize “lfs find”, but I am still trying to learn more in case there is.
Q1) I have been trying to scan 39 OSTs using lfs find, and it is taking forever. Are there any tips or tricks to speed this up? Scans are taking anywhere between 15 and 24 hours per OST to finish if all goes well without interruption. I am parallelizing my scans across multiple clients to speed this up, but I don’t know of any alternate ways.
Q2) On the other hand, when I start “lfs find” against each of the 39 OSTs, I bring down my MDS server with a kernel panic due to an out-of-memory condition. Any tips on how I can minimize this load and keep the MDS from running out of memory?
Q3) Situation: a client dies for any reason, or the “lfs find” command it is running times out with an Input/Output error or a transport shutdown (I never saw this until I started running multiple lfs find scans while simultaneously running lfs_migrate for the files identified to be moved off the OSTs).
Observation: I believe the MDS continues to serve the scan request for “lfs find” on a thread until it fails or notices that it has evicted the client, thereby tying up resources on the MDS. I don’t know if this makes sense, but I am guessing this is loading my MDS with RPC requests and causing further slowdown. Are there any tunable options here?
I wish there was some kind of indexing we could do to avoid deep scans.
Best Regards,
Amit
4 years, 8 months
Lustre client compatibility matrix
by Arman Khalatyan
Hello everybody,
Recently I noticed that something has changed inside
lustre-client-master (2.8.xx).
The client is not able to read data from the OSTs.
It is able to mount the filesystem and stat files from a 2.7 server,
but it is unable to read any data; it looks like the OSTs do not answer.
If I roll back the client to 2.8.0 (
https://build.hpdd.intel.com/job/lustre-b2_8/arch=x86_64,build_type=clien...
)
then everything works as expected.
Is there a compatibility matrix that I missed?
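For what it is worth, the running version on each node can be checked with:
# report the running Lustre version on the client and on the servers
lctl get_param version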
Thanks,
Arman.
***********************************************************
Dr. Arman Khalatyan eScience -SuperComputing
Leibniz-Institut für Astrophysik Potsdam (AIP)
An der Sternwarte 16, 14482 Potsdam, Germany
***********************************************************
4 years, 8 months
Is lustre 2.8.0 RPMS for el7.2 missing libzfs.so.2()(64bit) ???
by Gibbins, Faye
Hi,
I'm trying to install Lustre on an SL7.2 VM from the Intel repo here: https://downloads.hpdd.intel.com/public/lustre/lustre-2.8.0/el7/server/
I'm trying this:
snip----
# /bin/yum -d 0 -e 0 -y install lustre
Error: Package: lustre-osd-zfs-mount-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64 (.....)
Requires: libzfs.so.2()(64bit)
Error: Package: lustre-osd-zfs-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64 (.....)
Requires: zfs-kmod
snip----
As you can see there's a problem. Where can I find these deps for RHEL 7?
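One check that may help narrow it down (a sketch, assuming the ZFS on Linux packages have to come from a separate repository, since this repo appears to carry only the Lustre packages):
snip----
# see whether any enabled repo can provide the missing library and kmod
/bin/yum provides 'libzfs.so.2()(64bit)' zfs-kmod
snip----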
Yours
Faye Gibbins
Snr Systems Administrator, Unix Lead Architect (Software)
Cirrus Logic | cirrus.com<http://www.cirrus.com/> | +44 131 272 7398
4 years, 9 months
lfs find tips or tricks
by Kumar, Amit
Dear All,
I believe there is no way to optimize "lfs find", but I am still trying to learn more in case there is.
Q1) I have been trying to scan 39 OSTs using lfs find, and it is taking forever. Are there any tips or tricks to speed this up? Scans are taking anywhere between 15 and 24 hours per OST to finish if all goes well without interruption. I am parallelizing my scans across multiple clients to speed this up, but I don't know of any alternate ways.
Q2) On the other hand, when I start "lfs find" against each of the 39 OSTs, I bring down my MDS server with a kernel panic due to an out-of-memory condition. Any tips on how I can minimize this load and keep the MDS from running out of memory?
Q3) Situation: a client dies for any reason, or the "lfs find" command it is running times out with an Input/Output error or a transport shutdown (I never saw this until I started running multiple lfs find scans while simultaneously running lfs_migrate for the files identified to be moved off the OSTs).
Observation: I believe the MDS continues to serve the scan request for "lfs find" on a thread until it fails or notices that it has evicted the client, thereby tying up resources on the MDS. I don't know if this makes sense, but I am guessing this is loading my MDS with RPC requests and causing further slowdown. Are there any tunable options here?
I wish there was some kind of indexing we could do to avoid deep scans.
Best Regards,
Amit
4 years, 9 months