Call for Participation - LAD'14 Workshop
by THIELL Stephane
LAD'14 - Lustre Administrators and Developers Workshop
September 22-23, 2014
Domaine Pommery, Reims - France
Organized by EOFS and OpenSFS in collaboration with the University of
Reims Champagne-Ardenne
LAD'14 will take place in Reims, France, at Domaine Pommery over two
days, September 22-23, 2014. This will be a great opportunity for
Lustre administrators and developers worldwide to gather and exchange
their experiences, developments, tools, good practices and more!
Reims is reachable by train from downtown Paris and CDG Airport in about
45 minutes via the TGV, France's high-speed rail line.
We look forward to your registration and your presentation proposal!
WEB PAGE
Please use the following link to stay updated on the LAD'14 agenda and
logistics:
http://www.eofs.org/?id=lad14
PRESENTATION
We are inviting community members to send proposals for presentations
at this event. No proceedings are required, just an abstract for a
30-minute (technical) presentation.
Please send it to lad@eofs.eu before July 25th, 2014.
Topics may include (but are not limited to): site updates or future
projects, Lustre administration, monitoring and tools, Lustre feature
overviews, Lustre client performance (hot topic ;-)), benefits of
hardware evolution to Lustre (such as SSDs, many-core processors...),
comparisons between Lustre and other parallel file systems (performance
and/or features), Lustre and Exascale I/O, etc.
REGISTRATION
Registration for the workshop is now open (early-bird rate):
http://lad.eofs.org/register.php
We also recommend booking a hotel in downtown Reims as soon as possible.
SOCIAL EVENT
On Monday evening, there will be a guided visit of Pommery Champagne
cellars followed by a social dinner on site. A limited number of spouses
can attend too (on a first-come, first-served basis). Register quickly!
SPONSORS
This event is organized thanks to the following generous sponsors:
CEA, DataDirect Networks and Intel
We expect more sponsors to be confirmed soon. Please contact me if your
company wants to sponsor LAD.
For any other information, please contact lad@eofs.eu
targets start order in Lustre 2.4.3
by Riccardo Murri
Hello,
The online Lustre manual recommends that Lustre targets are started in
this order[1]: MGT, MDT, OSTs, clients.
[1]: http://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact...
Now we are setting up an HA cluster with Pacemaker, and a strict
ordering directive ("order ... Mandatory: ostXX mdt mgt") results in a
complete restart of all targets if the MGT is migrated. In my
experience (with Lustre 1.8.5) this is most of the time unnecessary,
and Lustre can recover from a single target restart. However, we have
recently switched to Lustre 2.4.3 and things might have changed.
So the question is: is this order strict (in Lustre 2.4.3), or can a
target be stopped and restarted on another node without affecting the
targets running on other nodes?
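For reference, the relevant part of our configuration looks roughly like the sketch below (the resource IDs are placeholders for our actual primitives, not our real names). If the strict order is not really required, my understanding is that an advisory constraint would avoid the cascaded restarts:

```
# crmsh configuration sketch; mgt, mdt, ost00, ost01 are placeholder
# resource IDs for the primitives managing each target.
#
# What we have now: a mandatory order, so migrating mgt restarts
# everything ordered after it:
#   order lustre-order Mandatory: mgt mdt ost00 ost01
#
# Possible alternative: an advisory order is honoured when the
# resources happen to start together (e.g. at cluster boot) but does
# not force dependent restarts when one resource moves:
order lustre-order Optional: mgt mdt ost00 ost01
```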
Thanks for any help!
Riccardo
--
Riccardo Murri
http://www.gc3.uzh.ch/people/rm
Grid Computing Competence Centre
University of Zurich
Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
Tel: +41 44 635 4222
Fax: +41 44 635 6888
Recovering a failed OST
by Bob Ball
I need to completely remake a failed OST. I have done this in the past,
but this time, the disk failed in such a way that I cannot fully get
recovery information from the OST before I destroy and recreate. In
particular, I am unable to recover the LAST_ID file, but successfully
retrieved the last_rcvd and CONFIGS/* files.
mount -t ldiskfs /dev/sde /mnt/ost
pushd /mnt/ost
cd O
cd 0
cp -p LAST_ID /root/reformat/sde
The O directory exists, but it is empty. What can I do concerning this
missing LAST_ID file? I mean, I probably have something, somewhere,
from some previous recovery, but that is way, way out of date.
My intent is to recreate this OST with the same index, and then put it
back into production. All files were moved off the OST before reaching
this state, so nothing else needs to be recovered here.
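In case I end up having to recreate the file by hand: my understanding (please correct me if I am wrong) is that LAST_ID is simply a single little-endian 64-bit object ID. Here is a quick round-trip sketch; the object ID below is a placeholder, and the real value would need to be at least as high as any object the MDS believes was allocated on this OST:

```python
import os
import struct
import tempfile

def write_last_id(path, last_objid):
    # LAST_ID holds one little-endian unsigned 64-bit object ID
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", last_objid))

def read_last_id(path):
    with open(path, "rb") as f:
        return struct.unpack("<Q", f.read(8))[0]

# round-trip check with a made-up object ID
path = os.path.join(tempfile.mkdtemp(), "LAST_ID")
write_last_id(path, 123456)
print(read_last_id(path))  # 123456
```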
Thanks,
bob
Best practices for VM hosting?
by Adesanya, Adeyemi
We're looking into shared storage options for hosting VMs. I'm thinking about running the Lustre client on each hypervisor. Are there any specific guidelines related to Lustre storage for serving VM images? I'd like to hear from folks who are doing this in a production environment.
-------
Yemi
lustre 1.8.6 - OSTs not mounting after power failure
by Brian C. Huffman
All,
We had a power failure last evening and both our MDS (combined mdt /
mgt) and OSS servers went down.
Upon power-up, the MGT and all MDTs mounted correctly. Some of the OSTs
mounted but not all.
I umounted everything and then did an e2fsck on the OSTs that didn't
mount (just a basic "e2fsck <device>"). On one of those OSTs, there was
a corrected inode:
Pass 5: Checking group summary information
Inode bitmap differences: -76225610
Fix<y>? yes
However, the OSTs still wouldn't mount and I was seeing these messages
in the log:
May 21 11:28:06 oss2 lrmd: [3351]: info: RA output:
(lustre-ost5:start:stderr) mount.lustre: mount /dev/mapper/ost_home_5 at
/lustre/home/ost_home_5 failed: No such device or address The target
service failed to start (bad config log?) (/dev/mapper/ost_home_5). See
/var/log/messages.
So I then tried to umount everything and do a "tunefs.lustre
--writeconf" on each device.
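For completeness, this is the procedure I followed, as best I understood the config-log regeneration steps in the manual; the device names below are examples from our naming scheme, not exact:

```
# Sketch of the writeconf procedure; device paths are examples.
# 1. unmount all clients, then all targets on every server
umount /lustre/home/ost_home_5        # ...and the rest
# 2. rewrite the config logs: MGS/MDT first, then every OST
tunefs.lustre --writeconf /dev/mapper/mdt_home
tunefs.lustre --writeconf /dev/mapper/ost_home_5
# 3. remount in order: MGT, MDT(s), OSTs, then clients
```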
Now on mount, I'm seeing the following:
May 21 14:00:19 oss1 kernel: LDISKFS-fs (dm-5): mounted filesystem with
ordered data mode
May 21 14:00:19 oss1 multipathd: dm-5: umount map (uevent)
May 21 14:00:19 oss1 kernel: JBD: barrier-based sync failed on dm-5-8 -
disabling barriers
May 21 14:00:19 oss1 kernel: LDISKFS-fs (dm-5): mounted filesystem with
ordered data mode
May 21 14:00:19 oss1 kernel: Lustre: MGC172.16.11.5@o2ib: Reactivating
import
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(obd_mount.c:1156:server_start_targets()) no server named
home-OST0002 was started
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(obd_mount.c:1670:server_fill_super()) Unable to start targets: -6
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(obd_mount.c:1453:server_put_super()) no obd home-OST0002
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(ldlm_request.c:1039:ldlm_cli_cancel_req()) Got rc -108 from
cancel RPC: canceling anyway
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(ldlm_request.c:1597:ldlm_cli_cancel_list())
ldlm_cli_cancel_list: -108
May 21 14:00:20 oss1 kernel: JBD: barrier-based sync failed on dm-5-8 -
disabling barriers
May 21 14:00:20 oss1 multipathd: dm-5: umount map (uevent)
May 21 14:00:20 oss1 kernel: Lustre: server umount home-OST0002 complete
May 21 14:00:20 oss1 kernel: LustreError:
4702:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount (-6)
At this point, I'm not sure what to do next. Any suggestions?
Thanks,
Brian
nfsd export accounting
by Daire Byrne
Hi,
We have a bunch of clients exporting Lustre over NFS and we'd like to graph the data read/written by nfsd. With standard NFS servers (with attached disks) we can use /proc/{nfsd PID}/io to track the bytes read from disk, but only write_bytes is updated when exporting a Lustre filesystem. Could we use stats_track_pid instead to record the reads for the nfsd processes, even though they run in the kernel? Is there a good reason why /proc/PID/io/read_bytes never gets updated for the nfsd processes? It's as if nfsd is serving the data directly from the VFS cache, so no read from Lustre is ever registered.
These export servers don't exclusively do NFS exporting otherwise we would just use the network bytes out.
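For what it's worth, the graphing side is simple; here is a sketch (in Python, with made-up numbers) of how we read those counters. It works fine for write_bytes, but on the Lustre-backed exporters read_bytes always stays at zero:

```python
def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value.strip())
    return counters

# Typical of what we see for an nfsd thread on a Lustre exporter:
# write_bytes moves, read_bytes never does (numbers are illustrative).
sample = "read_bytes: 0\nwrite_bytes: 1048576\n"
counters = parse_proc_io(sample)
print(counters)  # {'read_bytes': 0, 'write_bytes': 1048576}
```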
Regards,
Daire
Lustre Client Monitoring tool repository
by Ramiro Alba
Hi all,
Starting from LMT as a base, I developed a monitoring tool using cerebro
and python urwid (curses)
which can help people managing lustre, to know about lustre clients
causing problems.
The repository is at:
https://github.com/ramiro-alba/lcmt
The monitored metrics are essentially the same as those obtained from
the command:
collectl -sl --lustopts M
but with the advantage of being able to see all the clients in one shot
(using lcmt), sorted in descending order by a chosen column.
I would like to hear your opinions on the matter and, if possible, to
keep improving this modest but (IMHO) useful tool.
Thanks to the people developing LMT. Great job.
Regards
--
Ramiro Alba
Centre Tecnològic de Tranferència de Calor
http://www.cttc.upc.edu
Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 8928
non-config log name received
by Pardo Diaz, Alfonso
Hello,
I had a Lustre 2.2 MDS and MGS. During my upgrade to 2.5.1, I merged the MDS and MGS onto the same node. Everything works OK, but when a client mounts the filesystem, I get the following message on the MDS (/var/log/messages):
"kernel: Lustre: MGS: non-config logname received: params"
Any idea what this message means?
Thanks in advance
Alfonso Pardo Diaz
System Administrator / Researcher
c/ Sola nº 1; 10200 Trujillo, ESPAÑA
Tel: +34 927 65 93 17 Fax: +34 927 32 32 37
----------------------------
Disclaimer:
This message and its attached files are intended exclusively for their recipients and may contain confidential information. If you received this e-mail in error, you are hereby notified that any dissemination, copying or disclosure of this communication is strictly prohibited and may be unlawful. In this case, please notify us by reply and delete this email and its contents immediately.
----------------------------
Same performance Infiniband and Ethernet
by Pardo Diaz, Alfonso
Hi,
I have migrated my Lustre 2.2 to 2.5.1 and equipped my OSS/MDS and clients with InfiniBand QDR interfaces.
I have compiled Lustre against OFED 3.2 and configured the lnet module with:
options lnet networks="o2ib(ib0),tcp(eth0)"
But when I compare Lustre performance across InfiniBand (o2ib), I get the same performance as across Ethernet (tcp):
INFINIBAND TEST:
dd if=/dev/zero of=test.dat bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1,0 GB) copied, 5,88433 s, 178 MB/s
ETHERNET TEST:
dd if=/dev/zero of=test.dat bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1,0 GB) copied, 5,97423 s, 154 MB/s
And this is my scenario:
- 1 MDS with an SSD RAID10 MDT
- 10 OSS with 2 OSTs per OSS
- InfiniBand interfaces in connected mode
- CentOS 6.5
- Lustre 2.5.1
- Striped filesystem: "lfs setstripe -s 1M -c 10"
I know my InfiniBand is running correctly, because with iperf3 between client and servers I get 40 Gb/s over InfiniBand and only 1 Gb/s over the Ethernet connections.
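To rule out dd's client-side caching and single-stream behaviour, I could also try measuring the LNET layer directly with LNET self-test; the NIDs below are placeholders for my actual nodes:

```
# LNET self-test sketch; NIDs are placeholders.
# Load the module on all participating nodes first:
#   modprobe lnet_selftest
export LST_SESSION=$$
lst new_session ib_bulk
lst add_group servers 10.0.0.1@o2ib
lst add_group clients 10.0.0.2@o2ib
lst add_batch bulk
lst add_test --batch bulk --from clients --to servers brw write size=1M
lst run bulk
lst stat clients        # throughput through LNET only, no disks involved
lst end_session
```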
Could you help me?
Regards,
Alfonso Pardo Diaz
System Administrator / Researcher
c/ Sola nº 1; 10200 Trujillo, ESPAÑA
Tel: +34 927 65 93 17 Fax: +34 927 32 32 37