Corrupted quota file
by Ramiro Alba
Hi all,
I am using Lustre 2.5.3 under CentOS 6.5 on both servers and clients, and
the Lustre file system uses 6 OSTs of 30 TB each across 2 OSSs.
In recent days I have been having problems with an OSS server rebooting
for no known reason after running fine for several hours (between 3 and
24), and I also noticed that quotas were not showing the correct usage.
After the last OSS server reboot, I decided to run e2fsck on both the MDT
and the OSTs; the errors from the command output are shown below:
------------------
MDT:
------------------
e2fsck -f /dev/mapper/vglustre-MDT
e2fsck 1.42.9.wc1 (24-Feb-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
[ERROR] quotaio_tree.c:590:check_reference:: Illegal reference (1045 >=
10) in user quota file. Quota file is probably corrupted.
Please run e2fsck (8) to fix it.
[ERROR] quotaio_tree.c:590:check_reference:: Illegal reference (16322 >=
10) in user quota file. Quota file is probably corrupted.
Please run e2fsck (8) to fix it.
[ERROR] quotaio_tree.c:590:check_reference:: Illegal reference (2940928
>= 10) in user quota file. Quota file is probably corrupted.
Please run e2fsck (8) to fix it.
jffstg-MDT0000: 1218855/47185920 files (0.2% non-contiguous),
6191259/23592960 blocks
------------------
OSSs:
------------------
[root@jffoss1 system]# e2fsck -fp /dev/mapper/ost0; e2fsck -fp
/dev/mapper/ost1; e2fsck -fp /dev/mapper/ost2
MMP interval is 10 seconds and total wait time is 42 seconds. Please
wait...
jffstg-OST0000: recovering journal
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1766737936384, 3488)
!= expected (0, 0)
[QUOTA WARNING] Usage inconsistent for ID 1042:actual (507613184, 3613)
!= expected (40960, 1)
[QUOTA WARNING] Usage inconsistent for ID 1031:actual (1041013137408,
5278) != expected (42952433664, 119)
[QUOTA WARNING] Usage inconsistent for ID 1304:actual (562630971392,
5127) != expected (0, 0)
[QUOTA WARNING] Usage inconsistent for ID 1338:actual (1634377928704,
7847) != expected (2098733056, 12)
[QUOTA WARNING] Usage inconsistent for ID 1037:actual (948502528, 1224)
!= expected (2150400, 0)
[QUOTA WARNING] Usage inconsistent for ID 1367:actual (1078501474304,
18130) != expected (20640858112, 55)
[QUOTA WARNING] Usage inconsistent for ID 1030:actual (84762128384,
3973) != expected (26271744, 2)
[QUOTA WARNING] Usage inconsistent for ID 1041:actual (258749169664,
2678) != expected (109670400, 11)
[QUOTA WARNING] Usage inconsistent for ID 1056:actual (389121245184,
1333) != expected (9640079360, 21)
[QUOTA WARNING] Usage inconsistent for ID 1363:actual (1562884812800,
28993) != expected (41607168, 44)
[QUOTA WARNING] Usage inconsistent for ID 1015:actual (263251644416,
1778) != expected (1230139392, 8)
[QUOTA WARNING] Usage inconsistent for ID 1027:actual (121355857920,
4042) != expected (918818816, 5)
[QUOTA WARNING] Usage inconsistent for ID 1000:actual (44268187648, 165)
!= expected (0, 0)
jffstg-OST0000: Update quota info for quota type 0.
[ERROR] quotaio_tree.c:241:find_free_dqentry:: find_free_dqentry(): Data
block full unexpectedly.
[ERROR] quotaio_tree.c:241:find_free_dqentry:: find_free_dqentry(): Data
block full unexpectedly.
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1766738993152, 3492)
!= expected (0, 0)
[QUOTA WARNING] Usage inconsistent for ID 100:actual (14970901614592,
187889) != expected (0, 0)
jffstg-OST0000: Update quota info for quota type 1.
jffstg-OST0000: 191438/30519552 files (22.0% non-contiguous),
4088922050/7812984832 blocks
The other OSTs showed similar errors.
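To see how large the accounting drift is per ID, I summarized the [QUOTA WARNING] lines with a short awk sketch (my own, not part of e2fsck). Each warning is a single line in the real e2fsck output; the wrapping above is from the mail. If I read the output correctly, "actual" is recomputed from the inodes and "expected" is what the quota file recorded:

```shell
# Summarize e2fsck "[QUOTA WARNING] Usage inconsistent ..." lines:
# print the ID plus actual vs expected byte counts (block counts dropped).
summary=$(awk -F'[(),]' '/QUOTA WARNING/ {
    split($1, a, "ID ")        # a[2] is e.g. "0:actual "
    sub(/:.*/, "", a[2])       # keep only the numeric ID
    printf "ID %-6s actual %15s bytes, expected %15s bytes\n", a[2], $2, $5
}' <<'EOF'
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1766737936384, 3488) != expected (0, 0)
[QUOTA WARNING] Usage inconsistent for ID 1042:actual (507613184, 3613) != expected (40960, 1)
EOF
)
printf '%s\n' "$summary"
```

For most IDs above, "expected" is near zero while "actual" is large, which suggests to me that the quota files, not the on-disk usage, are stale.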
Are there any e2fsck options to solve this issue? (Maybe with -y?)
How can I reset quotas and start clean?
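For reference, the repair I am considering, based on my reading of the tune2fs and e2fsck man pages (not yet verified): on ldiskfs the quota feature can be cleared and re-set with tune2fs while the target is unmounted, which drops and recreates the quota files, after which e2fsck -fy recomputes the usage. A sketch on a throwaway file-backed ext4 image, so nothing real is touched; on the servers the same tune2fs/e2fsck steps would run against the unmounted devices (e.g. /dev/mapper/vglustre-MDT):

```shell
# Sketch: regenerate quota files on a scratch ext4 image (stand-in for an
# unmounted ldiskfs target such as /dev/mapper/vglustre-MDT).
PATH="$PATH:/usr/sbin:/sbin"            # e2fsprogs tools often live in sbin
img=$(mktemp /tmp/quota-demo-XXXXXX)
truncate -s 64M "$img"
mkfs.ext4 -q -O quota "$img"            # ldiskfs targets carry the quota feature

tune2fs -O ^quota "$img"                # drop the (possibly corrupted) quota files
tune2fs -O quota "$img"                 # recreate them from current usage
e2fsck -fy "$img" > fsck.log 2>&1       # -y answers yes; full recheck with -f
tail -n 1 fsck.log                      # final summary line of the check
rm -f "$img"
```

If I understand the manual correctly, once the targets are clean, quota enforcement still has to be re-enabled on the Lustre side with lctl conf_param (jffstg.quota.mdt=ug and jffstg.quota.ost=ug).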
Any suggestions are very welcome.
Thanks in advance
Best regards
--
Ramiro Alba
Centre Tecnològic de Transferència de Calor
http://www.cttc.upc.edu
Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 8928
--
This message has been analysed by MailScanner
for viruses and other dangerous content,
and is believed to be clean.
Lustre quotas are not showing used space
by Ramiro Alba
Hi all,
I am using Lustre 2.5.3 under CentOS 6.5 on both servers and clients, and
the Lustre file system uses 6 OSTs of 30 TB each across 2 OSSs:
/dev/mapper/ost0 31242014360 16292468776 13386948620 55% /lustre/ost0
/dev/mapper/ost1 31242014360 16925276168 12754141228 58% /lustre/ost1
/dev/mapper/ost2 31242014360 15247919148 14431498248 52% /lustre/ost2
/dev/mapper/ost3 31242014360 15155750732 14523666664 52% /lustre/ost3
/dev/mapper/ost4 31242014360 15184922564 14494494832 52% /lustre/ost4
/dev/mapper/ost5 31242014360 14094013852 15585403544 48% /lustre/ost5
The system has been very stable until now (2 years), but for the last two
days the OSS servers have been rebooting for no known reason:
no hardware issue, and no relevant information logged before the reboot.
The only other symptom concerns quotas: I have just realized that quotas
are not showing used space for most users (though they are for some).
I tried to re-enable quotas using:
# Disable quotas
lctl conf_param jffstg.quota.mdt=none
lctl conf_param jffstg.quota.ost=none
# Enable quotas
lctl conf_param jffstg.quota.mdt=ug
lctl conf_param jffstg.quota.ost=ug
But this made no difference to the quota issue. Time and uid/gid mappings
are OK.
My questions are:
1) Are the two issues related?
2) Any suggestions for solving the quota issue?
Any suggestions are welcome.
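In case it helps diagnosis, this is how I am checking whether the toggles took effect on each target (a sketch: the parameter paths follow the Lustre 2.5 per-target quota interface as I understand it from the manual, and the /lustre client mount point plus uid 1031 are placeholders, not verified values):

```shell
# On the MDS: per-target quota accounting/enforcement state.
lctl get_param osd-ldiskfs.jffstg-MDT0000.quota_slave.info

# On each OSS: the same, for every local OST.
lctl get_param osd-ldiskfs.jffstg-OST*.quota_slave.info

# From a client: usage as reported to users (uid 1031 as an example).
lfs quota -u 1031 /lustre
```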
Best regards
--
Ramiro Alba
Centre Tecnològic de Transferència de Calor
http://www.cttc.upc.edu
Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 8928
OpenSFS Transition and Futures
by ssimms@iu.edu
Please accept my sincere apologies if this reaches you more than once.
Dear Members of the Lustre Community,
I write to you now with passion and enthusiasm about the restructure and
transformation of OpenSFS into a user-driven organization dedicated to
addressing the current and future needs of Lustre users.
It is my sincere pleasure to announce that on Thursday last week, after
months of discussion and careful consideration, the OpenSFS Board
transferred the organization to a new temporary board of users
representing academia, business, and the national laboratories. In
addition to myself, the temporary board consists of
Shawn Hall, BP
Steve Monk, Sandia National Laboratory
Sarp Oral, Oak Ridge National Laboratory
Rick Wagner, Globus (formerly San Diego Supercomputer Center)
This board will remain in place until an election can be held at this
year's Lustre User Group meeting (a 'save the date' message will be coming
soon).
OpenSFS has accomplished many great things since its inception, providing
leadership, manpower, and capital that have improved Lustre and ensured
its place in the HPC ecosystem. Now the time has come for those who rely
most on Lustre, its users, to guide OpenSFS into the future to provide:
- elected leadership
- a unified voice
- a user run Lustre User Group meeting
- support for the Lustre Working Group
- support for lustre.org along with EOFS
- chances for frank and direct contact between vendors and users
To encourage participation from users and vendors alike, the membership
model has been flattened to two categories and dues reduced significantly:
Members (user organizations) - $1,000 annual dues
- voting rights
- eligibility to serve on the board
- eligibility to serve on LUG planning committee
- eligibility to participate in requirements gathering
Participants (vendor organizations) - $5,000 annual dues
- support community efforts to promote Lustre
- opportunities for direct contact with User community
- access to community requirements gathering exercise
- eligible to attend OpenSFS member meetings
These changes are a positive step forward for OpenSFS and our community,
and we would love to have your involvement to help ensure Lustre remains
open and to help shape its future.
If you have questions about the organizational changes, would like to
volunteer, or want to discuss future objectives, feel free to reach out to
me or any of the temporary board members, or send mail to
admin(a)opensfs.org. In the meantime, we will be moving forward with new
streamlined bylaws, available here:
http://cdn.opensfs.org/wp-content/uploads/2016/09/Open-SFS-Amendment-and-...
In closing, I want to thank Mark Seager from Intel for his crucial role in
founding OpenSFS in 2010 before his departure from Lawrence Livermore
National Laboratory, Charlie Carroll from Cray for his effort and
leadership as chairman of the board, and all former board members and
their organizations for making this transition possible.
It has been an honor serving as the community board representative and I
look forward to continued service as a member of the temporary OpenSFS
board.
Sincerely,
Stephen Simms
OpenSFS Temporary Board Member
Manager, High Performance File Systems
Indiana University
ssimms(a)iu.edu
812-855-7211
Reminder & Announcing Agenda for LUG 2016, PRC - Shanghai
by OpenSFS Administration
Dear Lustre Community,
The best Lustre minds are coming together to collaborate, listen, and learn at the 2016 LUG event on October 20th in Shanghai, China. Industry thought leaders and Lustre experts will present on the latest trends, developments, and uses of Lustre.
We are announcing an exciting AGENDA <http://pages.intel.com/hZd2N0W00P000DlNv030Y41> for this event. Click to learn more about the topics & speakers.
We are quite pleased to invite you to attend. The event is fast approaching so REGISTER <http://pages.intel.com/y2NP1Ddv0l0ZYWO00500030> soon!
Call for Papers: The deadline to submit an abstract is very near. Please send an abstract for a 30-minute technical presentation to fan.yong(a)intel.com <http://pages.intel.com/SW0Y3l01Z000vPd0200ND6P> no later than September 26th, 2016.
EVENT DETAILS
Date: October 20th, 2016
Location: WH Ming Hotel Shanghai <http://pages.intel.com/n/SW0Y3l01Z000vPd0200ND7Q>
No.777 Jiamusi Road (at the junction of Yingkou Road, near Huangxing Park)
Yangpu District, Shanghai 200433, China
The best Lustre* minds are coming to the 2016 China LUG to collaborate, listen, and learn. Industry thought leaders and Lustre experts will explain the latest trends, developments, and applications of Lustre.
The agenda for this event will be excellent; click here to learn more about the topics and speakers. The event is fast approaching, so register soon: we sincerely invite you to attend.
Call for Papers:
The organizing committee invites you to present at this event. Please send an abstract for a 30-minute technical presentation to fan.yong(a)intel.com <http://pages.intel.com/SW0Y3l01Z000vPd0200ND6P> by September 26th, 2016.
Event details:
Date: October 20th, 2016
Location: Shanghai Xiaonanguo Garden Hotel (No. 777 Jiamusi Road, Yangpu District, Shanghai, next to Huangxing Park)
MDT keeps in a 100% usage
by Ramiro Alba
Hi all,
I am using Lustre 2.5.3 under CentOS 6.5 on both servers and clients.
After removing 300,000 files (of 1,600,000), only 163,008 KB could be
freed on the MDT, and it still shows 100% usage:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/vglustre-MDT
70766580 68563552 0 100% /lustre/mdt
The Lustre file system uses 6 OSTs of 30 TB each.
The system mounts without problems, MDT backups (using tar) can be made,
and the whole file system seems to work fine, but I cannot manage to
reduce the MDT usage. The only unusual event is that yesterday the whole
cluster went down suddenly due to an electrical problem, without being
protected by our UPS.
Could that be related to the usage staying at 100%?
I am worried about this issue. Should I be?
Apart from using a bigger disk for the MDT, is there anything that can be
done?
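One more data point I looked at: "Available 0" while Used (68563552) is below the total (70766580 1K-blocks) would be consistent with ext4's root-reserved blocks, which df excludes from Available. The reservation can be read with dumpe2fs and lowered with tune2fs -m. A sketch on a scratch image (on the server the target would be the unmounted /dev/mapper/vglustre-MDT, and I am not sure lowering the reservation is actually advisable on an MDT):

```shell
# Sketch: inspect and shrink the ext4 reserved-block count on a scratch image.
PATH="$PATH:/usr/sbin:/sbin"
img=$(mktemp /tmp/mdt-demo-XXXXXX)
truncate -s 64M "$img"
mkfs.ext4 -q "$img"

# "Reserved block count" from the superblock, before and after tuning.
before=$(dumpe2fs -h "$img" 2>/dev/null | awk -F: '/Reserved block count/ {gsub(/ /, "", $2); print $2}')
tune2fs -m 1 "$img" > /dev/null         # lower the reservation from the default 5% to 1%
after=$(dumpe2fs -h "$img" 2>/dev/null | awk -F: '/Reserved block count/ {gsub(/ /, "", $2); print $2}')
echo "reserved blocks before: $before, after: $after"
rm -f "$img"
```

Even so, this would only explain the 0 in the Available column, not why deleting files freed so little space.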
Any suggestions are welcome.
Best regards
#############################################################
I did:
[root@jffmds check]# e2fsck -fp /dev/mapper/vglustre-MDT
[QUOTA WARNING] Usage inconsistent for ID 0:actual (69414895616, 21842)
!= expected (69394186240, 21839)
[QUOTA WARNING] Usage inconsistent for ID 1056:actual (1572864, 6056) !=
expected (1572864, 6047)
[QUOTA WARNING] Usage inconsistent for ID 1027:actual (9265152, 21891)
!= expected (9265152, 21795)
[QUOTA WARNING] Usage inconsistent for ID 1367:actual (40321024, 121180)
!= expected (40316928, 121054)
[QUOTA WARNING] Usage inconsistent for ID 1304:actual (6074368, 32508)
!= expected (6074368, 32507)
[QUOTA WARNING] Usage inconsistent for ID 1041:actual (9113600, 28693)
!= expected (9113600, 28687)
[QUOTA WARNING] Usage inconsistent for ID 1031:actual (5332992, 33716)
!= expected (5320704, 33578)
[QUOTA WARNING] Usage inconsistent for ID 1037:actual (1925120, 8141) !=
expected (1925120, 8138)
jffstg-MDT0000: Update quota info for quota type 0.
[QUOTA WARNING] Usage inconsistent for ID 0:actual (69414907904, 21870)
!= expected (69394198528, 21867)
[QUOTA WARNING] Usage inconsistent for ID 100:actual (296939520,
1240117) != expected (296923136, 1239738)
jffstg-MDT0000: Update quota info for quota type 1.
jffstg-MDT0000: 1262302/47185920 files (1.1% non-contiguous),
23038694/23592960 blocks
but it still shows 100% usage.
--
Ramiro Alba
Centre Tecnològic de Transferència de Calor
http://www.cttc.upc.edu
Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 8928
LBUG: ASSERTION( get_current()->journal_info == ((void *)0) ) failed
by Cédric Dufour - Idiap Research Institute
Hello,
Last Friday, during normal operations, our MDS froze with the following
LBUG, which recurs as soon as the MDT is mounted again:
Sep 13 12:45:58 n00a kernel: [ 1002.705346] Lustre: lustre-1-MDT0000:
used disk, loading
Sep 13 12:45:58 n00a kernel: [ 1002.741484] LustreError:
6265:0:(sec_config.c:1121:sptlrpc_target_local_read_conf()) missing llog
context
Sep 13 12:46:00 n00a kernel: [ 1004.771365] LustreError: 11-0:
lustre-1-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation
mds_connect failed with -11.
Sep 13 12:46:00 n00a kernel: [ 1004.783359] Lustre: lustre-1-MDT0000:
Imperative Recovery enabled, recovery window shrunk from 300-900 down to
150-450
Sep 13 12:46:00 n00a kernel: [ 1005.073160] Lustre: lustre-1-MDT0000:
Will be in recovery for at least 2:30, or until 179 clients reconnect
Sep 13 12:46:05 n00a kernel: [ 1010.228502] LustreError:
6307:0:(osd_handler.c:936:osd_trans_start()) ASSERTION(
get_current()->journal_info == ((void *)0) ) failed:
Sep 13 12:46:05 n00a kernel: [ 1010.240617] LustreError:
6307:0:(osd_handler.c:936:osd_trans_start()) LBUG
Our setup is Lustre 2.5.2, with the following debug classes enabled:
n00a:~ # cat /proc/sys/lnet/debug
ioctl neterror warning error emerg ha config console
I've had a look at:
- https://jira.hpdd.intel.com/browse/LU-6556
- https://jira.hpdd.intel.com/browse/LU-6634
- https://jira.hpdd.intel.com/browse/LU-7138
but:
- the changelog_* files have 0 bytes and proper root permissions
- as far as I can tell, no changelog consumer is actually registered
The node freezes as soon as the LBUG happens, and no debug log gets
written to /tmp.
Based on the console output (see above), there is no preliminary error
that might explain how we stumbled on this LBUG.
I have run a file-system check on the corresponding ldiskfs device; errors
were fixed, and a second dry run reported nothing dangling.
What can I do to resolve this situation?
Best regards,
Cédric
--
Cédric Dufour @ Idiap Research Institute