I saw this same problem on my system - it happens when trying to migrate a file created with Lustre 1.8.
See https://jira.hpdd.intel.com/browse/LU-4293 for details.
I have an updated version of lfs_migrate that works around this problem that I should push to Gerrit. The patch will be linked to the above bug when ready.
On Dec 17, 2013, at 18:43, "Peter Mistich" <peter.mistich(a)rackspace.com> wrote:
Today I added an OST and am trying to rebalance across the OSTs. When I run
lfs_migrate I get "cannot swap layouts between <filename> and a volatile
file (Operation not permitted)".
I am running lustre-2.5.52.
Any help would be great.
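The kind of rebalancing described above is typically driven by piping lfs find into lfs_migrate. A sketch only; the mount point and OST index below are placeholders for your own values:

```shell
# Sketch: find regular files currently striped on OST index 0 and migrate
# them so they get restriped (and spread onto the new OST).
# /mnt/lustre and the OST index are placeholders.
lfs find /mnt/lustre --ost 0 -type f | lfs_migrate -y
```

The -y flag skips the per-file confirmation prompt, which matters when millions of files are involved.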
Lustre-discuss mailing list
We may be seeing some related behavior here as well.
Can you give any more details about the errors you're getting on your end?
Our problem manifested itself as the MDS not being able to unmount
because it was waiting for communication from the clients while
shutting down. (Eventually, messages about hung threads
appear on the MDS.) It may not be the same thing (we're seeing it with
2.5 and have only begun seeing it recently), but it is similar and is
happening on systems using IB.
Developer, IO File Systems
One of our users is running into this error: "IOError: [Errno 28] No space left on device"
Although this is a Python error, the user runs into it multiple times during the life of his job on a Lustre file system. I am therefore trying to figure out whether there is a way to tell if we have hit a Lustre/ext3 limit on the number of files in a directory. I do not think this is the case, because I have read that the ext3 limit is about 15 million files per directory. I have also checked ulimit for the user and found no issues there. We are also only at 60% of disk capacity on the Lustre file system, so we have not hit a space limit.
Currently the user generates over 3.5 million files during a job run. After the failure we tested creating files manually and by script in the same directory, and it worked without error.
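One thing worth ruling out: Errno 28 can also mean inode (metadata) exhaustion rather than block-space exhaustion, and block-usage numbers won't show it. A minimal sketch for checking free inodes on a mount point via statvfs (the "." path is a placeholder for the Lustre mount point):

```python
import os

def free_inodes(path):
    """Return (free, total) inode counts for the filesystem holding path."""
    st = os.statvfs(path)
    # f_favail: inodes available to unprivileged users; f_files: total inodes
    return st.f_favail, st.f_files

free, total = free_inodes(".")  # replace "." with the Lustre mount point
print(f"{free} of {total} inodes free")
```

On Lustre specifically, `lfs df -i` reports per-MDT inode usage directly and is usually the first thing to check.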
Any insight into this will be greatly appreciated.
Amit H. Kumar
I've recently started working with Lustre and setting up a couple of new
filesystems on RHEL 6.4 w/ Lustre 2.4 from the ZFS repository (we're
using Lustre on ZFS) with an Infiniband networking infrastructure using
OpenIB from the RedHat repositories.
I've run into a problem that I'm curious whether anyone else has
encountered. When shutting down machines with Lustre OSTs mounted on
them, the default shutdown scripts cause a hang when the OpenIB modules
begin to unload. This is due to the Lustre/LNET stop scripts not
completely unloading Lustre modules. While investigating, I discovered
that the following sequence would successfully unload the Lustre modules
such that IB modules could also unload:
1. Stop Lustre
2. Stop LNET (Outputs "ERROR: Module osc has non-zero reference count.")
3. Run lustre_rmmod (Outputs "Modules still loaded:
lnet/klnds/o2iblnd/ko2iblnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o")
4. Stop LNET again to unload the three remaining modules.
I've written this into a shutdown script, which works as a solution, but
does not address the underlying problem.
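The four-step sequence above can be sketched as a shell function. The "lustre" and "lnet" service names are assumptions and will vary by distribution; lustre_rmmod is the helper script shipped with Lustre:

```shell
#!/bin/bash
# Sketch: fully unload Lustre modules so that the OpenIB modules can
# unload afterwards. Service names are assumptions; adjust for your
# init scripts.
lustre_full_stop() {
    service lustre stop         # 1. stop Lustre targets
    service lnet stop || true   # 2. may fail with a non-zero refcount on osc
    lustre_rmmod || true        # 3. removes most modules; a few may remain
    service lnet stop           # 4. unloads ko2iblnd, lnet, libcfs
}
```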
Has anyone else seen this behavior?
Research System Administrator
UW-Space Science and Engineering
AOSS Room 439
Has anybody ever successfully compiled the Lustre 2.4.1 client on Ubuntu
Precise 12.04 with Mellanox OFED 2.0.3? I am stuck on this error:
checking build system type... x86_64-unknown-linux-gnu
checking whether to enable OpenIB gen2 support... no
configure: error: can't compile with OpenIB gen2 headers under
I tried a couple of patches/hacks found on Google but without success.
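For what it's worth, this error usually means configure cannot find usable OFED headers. With Mellanox OFED the kernel headers live outside the stock include path, so pointing configure at them with --with-o2ib sometimes helps. A sketch only; the path below is an assumption for typical MLNX_OFED installs and should be checked against where your package put its kernel sources:

```shell
# Sketch: point Lustre's configure at the Mellanox OFED kernel headers.
# /usr/src/ofa_kernel/default is an assumption; verify the path on your
# system before running.
./configure --disable-server --with-o2ib=/usr/src/ofa_kernel/default
```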
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Gouvernement du Canada | Government of Canada
After a writeconf we cannot mount the MDT. This happened with
Lustre 2.1.3. Any hints for fixing this problem would be greatly
appreciated. For details and log messages see below.
The file system is already more than 5 years old and was created
with Lustre 1.6. Later it was running with Lustre 1.8 and we upgraded
to version 2.1.3 a year ago. Since that time we had very few problems.
However, we frequently got LustreError messages on clients because
some applications wanted to use ACLs and ACLs were not enabled.
In order to change the ACL configuration we did a writeconf which
probably was a bad idea since afterwards the MDT did not start.
Removing pfs1work-MDT0000 on MGS/MDS or pfs1work-client on the MGS
did not help. Upgrading to version 2.1.6 on MDS and MDT did not fix
this problem. We made a backup of the MDT device and downgraded MDS
and MDT to version 1.8 since the writeconf had worked with that version
and indeed we were able to start the MDT. However, after upgrading to
version 2.1.3 the MDT does not mount again. We ran a read-only e2fsck
on the MDT and this did not find any problems. We are wondering if an
upgrade to version 2.4 would fix the problem.
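For reference, the usual writeconf procedure regenerates the configuration logs on every target, not just the MDT, and the servers are then restarted in order. A sketch only; the device paths below are placeholders:

```shell
# Sketch of the full writeconf procedure; /dev/... paths are placeholders.
# Run with the targets unmounted.
tunefs.lustre --writeconf /dev/mdtdev   # on the MDS, for the MDT
tunefs.lustre --writeconf /dev/ostdev   # on each OSS, for each OST
# Then remount in order: MGS first, then the MDT, then the OSTs.
```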
Here are the messages from the MDS:
Dec 11 19:28:18 pfs1n2 kernel: [19046.838713] LDISKFS-fs (dm-6): mounted
filesystem with ordered data mode
Dec 11 19:28:18 pfs1n2 kernel: [19046.855112] Lustre:
MGC172.26.1.1@o2ib: Reactivating import
Dec 11 19:28:18 pfs1n2 kernel: [19046.922722] Lustre: Enabling ACL
Dec 11 19:28:18 pfs1n2 kernel: [19047.259229] LustreError:
28547:0:(mdd_device.c:1164:mdd_prepare()) Error(-2) initializing .lustre
Dec 11 19:28:18 pfs1n2 kernel: [19047.337228] LustreError:
28547:0:(mdt_handler.c:4606:mdt_init0()) Can't init device stack, rc -2
Dec 11 19:28:18 pfs1n2 kernel: [19047.417024] LustreError:
28547:0:(obd_config.c:565:class_setup()) setup pfs1work-MDT0000 failed (-2)
Dec 11 19:28:18 pfs1n2 kernel: [19047.426650] LustreError:
28547:0:(obd_config.c:1491:class_config_llog_handler()) Err -2 on cfg
Dec 11 19:28:19 pfs1n2 kernel: [19047.436520] Lustre: cmd=cf003
0:pfs1work-MDT0000 1:pfs1work-MDT0000_UUID 2:0
Dec 11 19:28:19 pfs1n2 kernel: [19047.447504] LustreError: 15c-8:
MGC172.26.1.1@o2ib: The configuration from log 'pfs1work-MDT0000' failed
(-2). This may be the result of communication errors between this node
and the MGS, a bad configuration, or other errors. See the syslog for
Dec 11 19:28:19 pfs1n2 kernel: [19047.471946] LustreError:
28516:0:(obd_mount.c:1192:server_start_targets()) failed to start server
Dec 11 19:28:19 pfs1n2 kernel: [19047.483194]
LustreError:28516:0:(obd_mount.c:1738:server_fill_super()) Unable to
start targets: -2
Dec 11 19:28:19 pfs1n2 kernel: [19047.492704] LustreError:
28516:0:(obd_config.c:610:class_cleanup()) Device 2 not setup
Dec 11 19:28:19 pfs1n2 kernel: [19047.501082] LustreError:
28516:0:(ldlm_request.c:1174:ldlm_cli_cancel_req()) Got rc -108 from
cancel RPC: canceling anyway
Dec 11 19:28:19 pfs1n2 kernel: [19047.512672] LustreError:
Dec 11 19:28:19 pfs1n2 kernel: [19047.554700] Lustre: server umount
Dec 11 19:28:19 pfs1n2 kernel: [19047.560542] LustreError:
28516:0:(obd_mount.c:2203:lustre_fill_super()) Unable to mount (-2)
At the same time on the MGS:
Dec 11 19:27:47 pfs1n1 kernel: [18002.010225] LDISKFS-fs (dm-6): mounted
filesystem with ordered data mode
Dec 11 19:27:47 pfs1n1 kernel: [18002.029854] Lustre: MGS MGS started
Dec 11 19:27:47 pfs1n1 kernel: [18002.034066] Lustre:
23937:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from
628d6315-3333-d644-d0b0-314bb162402d@0@lo t0 exp (null) cur 1386786467
Dec 11 19:27:47 pfs1n1 kernel: [18002.049753] Lustre:
23937:0:(ldlm_lib.c:952:target_handle_connect()) Skipped 1 previous
Dec 11 19:27:47 pfs1n1 kernel: [18002.060070] Lustre:
MGC172.26.1.1@o2ib: Reactivating import
Dec 11 19:28:03 pfs1n1 kernel: [18018.052321] Lustre:
23937:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from
firstname.lastname@example.org@o2ib t0 exp (null)
cur 1386786483 last 0
Dec 11 19:28:18 pfs1n1 kernel: [18032.725732] Lustre: MGS: Logs for fs
pfs1work were removed by user request. All servers must be restarted in
order to regenerate the logs.
Dec 11 19:28:18 pfs1n1 kernel: [18032.740584] Lustre: Setting parameter
pfs1work-MDT0000.mdd.quota_type in log pfs1work-MDT0000
Karlsruhe Institute of Technology (KIT)
Steinbuch Centre for Computing (SCC)
Scientific Computing und Simulation (SCS)
Zirkel 2, Building 20.21, Room 209
76131 Karlsruhe, Germany
Phone: +49 721 608 44861
Fax: +49 721 32550
KIT – University of the State of Baden-Wuerttemberg and
National Laboratory of the Helmholtz Association
Howdy! This e-mail is to let you know about upcoming changes to a key
Whamcloud/Intel HPDD service.
We will soon be upgrading our code review tool, Gerrit. This will take place
on December 20th at 5PM Pacific time, and review.whamcloud.com will be down
for the duration of this upgrade. We anticipate this will take no longer than
We will send out a reminder e-mail the day before the upgrade.
Please contact joshua.kugler(a)intel.com with any questions or concerns.
High Performance Data Division (formerly Whamcloud)
Here is an update on the Lustre 2.6 release.
- A number of landings have been made: http://git.whamcloud.com/?p=fs/lustre-release.git;a=shortlog;h=refs/heads...
- Testing on the 2.5.51 tag is complete; testing on the 2.5.52 tag is underway
- If there are any issues not presently marked as blockers that you believe should be, please let me know
- Master is presently open for feature landings; feature freeze is January 31st
PS/ You can also keep up to date with matters relating to the 2.6 release on the CDWG wiki - http://wiki.opensfs.org/Lustre_2.6.0