lustre FID-in-dirent backward compatibility
by Chris Hunter
We are in the process of upgrading a Lustre FS. We currently have the MDS
and OSS servers running 2.1.5 (upgraded from 1.8.9); however, our clients
are still running 1.8.9.
I am trying to clarify whether the "FID-in-dirent" feature (i.e. tunefs -O
dirdata) is backward compatible with 1.8.9 clients.
I have seen a few talks (SC11?) that say it should be. However, the 2.x
Lustre operations manual at hpdd-intel doesn't *explicitly* say it will
work with older clients.
Is there high risk involved in enabling "dirdata" while running old 1.8.9
clients?
Any opinions on whether there is benefit in using this flag?
thanks,
chris hunter
8 years, 3 months
robinhood delayed removed from archive - archive#0 only?
by Gary Hagensen
I have been playing with robinhood and the delayed removal from the
archive, and I don't see how it can work except for archive = 0. I don't
see the archive id in the changelog. Robinhood could query for it when
it sees an HSM changelog entry, but I don't see a placeholder for it in
the robinhood mysql database tables. I archived a file (tryitfile2)
using archive=4 and didn't see the "4" show up in any of the mysql
tables. I did see the deleted file move through the SOFT_RM table in
the mysql database, but a query of hsm.actions shows the removal
waiting with archive#=0, not 4.
If this is indeed the case, it should be documented that delayed
removal from the archive will only work for archive=0.
The second file in the mysql tables shows the file waiting for
removal. Looking at the other tables in the database, it appears that
all references to it are gone, so the data in SOFT_RM appears to be the
only data available (no archive_id to pass along).
mysql> select * from SOFT_RM;
+---------------------+-------------------------------+--------------+--------------+
| fid | fullpath | soft_rm_time | real_rm_time |
+---------------------+-------------------------------+--------------+--------------+
| 0x200000400:0x2:0x0 | /mnt/dmf2lfs/tryit/tryitfile | 1391562713 | 1391649113 |
| 0x200000400:0x3:0x0 | /mnt/dmf2lfs/tryit/tryitfile2 | 1391564535 | 1391564595 |
+---------------------+-------------------------------+--------------+--------------+
2 rows in set (0.00 sec)
After the time expired, robinhood issued the request and removed the entry from the database:
mysql> select * from SOFT_RM;
+---------------------+------------------------------+--------------+--------------+
| fid | fullpath | soft_rm_time | real_rm_time |
+---------------------+------------------------------+--------------+--------------+
| 0x200000400:0x2:0x0 | /mnt/dmf2lfs/tryit/tryitfile | 1391562713 | 1391649113 |
+---------------------+------------------------------+--------------+--------------+
But here the request is waiting on archive#=0, not 4:
# lctl get_param -n mdt.dmf2lfs-MDT0000.hsm.actions
lrh=[type=10680000 len=136 idx=1/40] fid=[0x200000400:0x3:0x0] dfid=[0x200000400:0x3:0x0]
compound/cookie=0x52f1807b/0x52f1807b action=REMOVE archive#=0 flags=0x0 extent=0x0-0xffffffffffffffff
gid=0x0 datalen=0 status=WAITING data=[]
Another thing to note: if you have a deferred_remove_delay of, say,
1 day, then delete a file, then change the deferred_remove_delay to
1 hour, only files deleted after the change (using rbh-lhsm -d) will
get the new delay. The previously deleted file will still wait 1 day
to be removed. Not totally unexpected, but it should be documented.
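The delay arithmetic can be checked directly from the SOFT_RM rows above: real_rm_time minus soft_rm_time is the deferred_remove_delay that was in effect when each file was deleted. A quick check in Python (values taken from the table above):

```python
# real_rm_time - soft_rm_time from the SOFT_RM rows gives the removal
# delay that was in effect when each file was deleted.
soft_rm = {
    "tryitfile":  (1391562713, 1391649113),
    "tryitfile2": (1391564535, 1391564595),
}
delays = {name: real - soft for name, (soft, real) in soft_rm.items()}
print(delays)  # → {'tryitfile': 86400, 'tryitfile2': 60}
```

So the first file keeps its original 1-day (86400 s) delay, while the second file, deleted after the setting was changed, was scheduled with a 60 s delay.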
Gary
8 years, 3 months
Problems with HSM
by Valvanuz Fernandez
Hello:
I've installed the Lustre feature release (2.5) from the whamcloud website
on my CentOS 6 servers and followed the documentation to configure HSM.
It seems that the coordinator and the agent are running properly, but
when I try to archive a file from a Lustre client it fails. As I was not
sure whether there was a problem with the agent, I've instantiated 2
agents on 2 different machines (one on the MDS, which also acts as a
Lustre client). The backend of the first agent is an NFS filesystem and
the backend of the second is XFS.
The error I get when I try to archive a file from any of my clients is
the following:
[root@wn021 ~]# lfs hsm_archive /lustre/collectl.conf
Cannot send HSM request (use of /lustre/collectl.conf): Invalid argument
Below, I've attached the output of the commands I used to test the
configuration, and the strace output of "lfs hsm_archive".
Could somebody help me? Thanks in advance.
Valvanuz
[root@wn024 ~]# lctl get_param mdt.lustrefs-MDT0000.hsm_control
mdt.lustrefs-MDT0000.hsm_control=enabled
[root@wn024 ~]# lctl get_param -n mdt.lustrefs-MDT0000.hsm.agents
uuid=62a6a7ae-1c24-1380-e960-443ee68c4e80 archive_id=1
requests=[current:0 ok:0 errors:0]
uuid=5f5d5627-7fa6-1de0-6823-7d80db8d482d archive_id=2
requests=[current:0 ok:0 errors:0]
[root@wn031 ~]# ps -ef|grep hsm
root 20437 1 0 Jan22 ? 00:00:00 lhsmtool_posix --daemon --hsm-root /localtmp/lustrehsm --archive=2 /lustre
[root@wn024 ~]# ps -ef|grep hsm
root 11750 1 0 Jan22 ? 00:00:00 lhsmtool_posix --daemon --hsm-root /oceano/gmeteo/WORK/valva/lustre --archive=1 /lustre
[root@wn021 ~]# lfs hsm_archive /lustre/collectl.conf
Cannot send HSM request (use of /lustre/collectl.conf): Invalid argument
[root@wn021 ~]# lfs hsm_state /lustre/collectl.conf
/lustre/collectl.conf: (0x00000000)
[root@wn021 ~]# strace lfs hsm_archive /lustre/collectl.conf
execve("/usr/bin/lfs", ["lfs", "hsm_archive", "/lustre/collectl.conf"],
[/* 69 vars */]) = 0
brk(0) = 0xa43000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7f90258d9000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or
directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=54707, ...}) = 0
mmap(NULL, 54707, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f90258cb000
close(3) = 0
open("/lib64/libpthread.so.0", O_RDONLY) = 3
read(3,
"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\\\0\0\0\0\0\0"..., 832)
= 832
fstat(3, {st_mode=S_IFREG|0755, st_size=142464, ...}) = 0
mmap(NULL, 2212768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f902549e000
mprotect(0x7f90254b5000, 2097152, PROT_NONE) = 0
mmap(0x7f90256b5000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7f90256b5000
mmap(0x7f90256b7000, 13216, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f90256b7000
close(3) = 0
open("/lib64/libreadline.so.6", O_RDONLY) = 3
read(3,
"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0PE\1\0\0\0\0\0"..., 832)
= 832
fstat(3, {st_mode=S_IFREG|0755, st_size=269592, ...}) = 0
mmap(NULL, 2370056, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f902525b000
mprotect(0x7f9025295000, 2097152, PROT_NONE) = 0
mmap(0x7f9025495000, 32768, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3a000) = 0x7f9025495000
mmap(0x7f902549d000, 2568, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f902549d000
close(3) = 0
open("/lib64/libncurses.so.5", O_RDONLY) = 3
read(3,
"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000j\0\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=140096, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7f90258ca000
mmap(NULL, 2235624, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f9025039000
mprotect(0x7f902505b000, 2093056, PROT_NONE) = 0
mmap(0x7f902525a000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x21000) = 0x7f902525a000
close(3) = 0
open("/lib64/libkeyutils.so.1", O_RDONLY) = 3
read(3,
"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\v\0\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=10192, ...}) = 0
mmap(NULL, 2105424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f9024e36000
mprotect(0x7f9024e38000, 2093056, PROT_NONE) = 0
mmap(0x7f9025037000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7f9025037000
close(3) = 0
open("/lib64/libc.so.6", O_RDONLY) = 3
read(3,
"\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\355\1\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1916568, ...}) = 0
mmap(NULL, 3745960, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f9024aa3000
mprotect(0x7f9024c2d000, 2093056, PROT_NONE) = 0
mmap(0x7f9024e2c000, 20480, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x189000) = 0x7f9024e2c000
mmap(0x7f9024e31000, 18600, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f9024e31000
close(3) = 0
open("/lib64/libtinfo.so.5", O_RDONLY) = 3
read(3,
"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@\310\0\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=135896, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7f90258c9000
mmap(NULL, 2232320, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f9024882000
mprotect(0x7f902489f000, 2097152, PROT_NONE) = 0
mmap(0x7f9024a9f000, 16384, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d000) = 0x7f9024a9f000
close(3) = 0
open("/lib64/libdl.so.2", O_RDONLY) = 3
read(3,
"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\r\0\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=19536, ...}) = 0
mmap(NULL, 2109696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f902467e000
mprotect(0x7f9024680000, 2097152, PROT_NONE) = 0
mmap(0x7f9024880000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f9024880000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7f90258c8000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7f90258c7000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7f90258c6000
arch_prctl(ARCH_SET_FS, 0x7f90258c7700) = 0
mprotect(0x7f9024880000, 4096, PROT_READ) = 0
mprotect(0x7f9024e2c000, 16384, PROT_READ) = 0
mprotect(0x7f9025037000, 4096, PROT_READ) = 0
mprotect(0x7f90256b5000, 4096, PROT_READ) = 0
mprotect(0x7f90258da000, 4096, PROT_READ) = 0
munmap(0x7f90258cb000, 54707) = 0
set_tid_address(0x7f90258c79d0) = 15850
set_robust_list(0x7f90258c79e0, 0x18) = 0
futex(0x7fffd860f6bc, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7fffd860f6bc, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1,
NULL, 7f90258c7700) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigaction(SIGRTMIN, {0x7f90254a3ae0, [], SA_RESTORER|SA_SIGINFO,
0x7f90254ad500}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7f90254a3b70, [],
SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f90254ad500}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
shmget(IPC_PRIVATE, 65680, 0600) = 65536
shmat(65536, 0, 0) = ?
shmctl(65536, IPC_RMID, 0) = 0
brk(0) = 0xa43000
brk(0xa64000) = 0xa64000
lstat("/lustre/collectl.conf", {st_mode=S_IFREG|0644, st_size=7361,
...}) = 0
open("/lustre/collectl.conf", O_RDONLY|O_NONBLOCK|O_NOFOLLOW) = 3
ioctl(3, 0x800866ad, 0xa43048) = 0
close(3) = 0
lstat("/lustre", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
lstat("/lustre/collectl.conf", {st_mode=S_IFREG|0644, st_size=7361,
...}) = 0
open("/etc/mtab", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=344, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7f90258d8000
read(3, "/dev/md0 / ext4 rw 0 0\nproc /pro"..., 4096) = 344
read(3, "", 4096) = 0
close(3) = 0
munmap(0x7f90258d8000, 4096) = 0
open("/lustre", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
ioctl(3, 0x401866d9, 0xa43030) = -1 EINVAL (Invalid argument)
close(3) = 0
write(2, "Cannot send HSM request (use of "..., 73Cannot send HSM
request (use of /lustre/collectl.conf): Invalid argument
) = 73
rt_sigaction(SIGINT, {0x40b230, [RT_1 RT_2 RT_3 RT_4 RT_5 RT_6 RT_7 RT_8
RT_9 RT_10 RT_11 RT_12 RT_13 RT_14 RT_15], SA_RESTORER|SA_RESTART,
0x7f90254ad500}, NULL, 8) = 0
exit_group(22) = ?
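As a side note, the call that fails in the trace is ioctl(3, 0x401866d9, ...) on the /lustre mount point, returning EINVAL. The request number can be decoded with the standard Linux _IOC bit layout; a sketch (identifying which Lustre ioctl this corresponds to is left as an assumption):

```python
# Decode a Linux ioctl request number into its _IOC fields:
# (direction, argument size, type character, command number).
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS = 8, 8, 14

def ioc_decode(req):
    nr = req & ((1 << IOC_NRBITS) - 1)
    typ = (req >> IOC_NRBITS) & 0xff
    size = (req >> (IOC_NRBITS + IOC_TYPEBITS)) & ((1 << IOC_SIZEBITS) - 1)
    direction = req >> (IOC_NRBITS + IOC_TYPEBITS + IOC_SIZEBITS)
    return direction, size, chr(typ), nr

# The request that failed with EINVAL in the strace above:
print(ioc_decode(0x401866d9))  # → (1, 24, 'f', 217), i.e. _IOW('f', 217, 24-byte arg)
```

Matching the decoded fields against the ioctl definitions in the lustre_user.h header of the installed version can help confirm whether the client tools and kernel modules agree on the request layout.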
8 years, 3 months
Split MDS/MGS - process recover log testfs-mtdir error -22
by Anthony Alba
I have split an MDS from an MGS using 2.4.2, running on the same server
with CentOS 2.6.32-358.23.2.el6_lustre + ldiskfs only.
I have done a writeconf; now the MDS won't "register" with the MGS.
On the split MDS/MGS server the MGT and MDT devices mount but the logs show:
Process recover log testfs-mtdir error -22.
However, if I now try to mount the OSTs, they get stuck in "AT".
One symptom is that on the split MGS/MDS the Lustre modules cannot be
unloaded, as some process is using osp.o. If I don't start the OSTs, the
Lustre stack can be unloaded.
Any suggestions on getting the MDS/MGS to play nice?
Thanks
Anthony
[root@mds1 ~]# lctl dl
0 UP osd-ldiskfs MGS-osd MGS-osd_UUID 5
1 UP mgs MGS MGS 5
2 UP mgc MGC5.5.200.5@o2ib 9ce8a82a-136e-616e-f6f3-f4570fbd364e 5
3 UP osd-ldiskfs testfs-MDT0000-osd testfs-MDT0000-osd_UUID 7
4 UP mds MDS MDS_uuid 3
5 UP lod testfs-MDT0000-mdtlov testfs-MDT0000-mdtlov_UUID 4
6 UP mdt testfs-MDT0000 testfs-MDT0000_UUID 3
7 UP mdd testfs-MDD0000 testfs-MDD0000_UUID 4
8 UP qmt testfs-QMT0000 testfs-QMT0000_UUID 4
Feb 4 12:12:10 mds1 kernel: Lustre: MGS: Logs for fs testfs were removed
by user request.
All servers must be restarted in order to regenerate the logs.
Feb 4 12:12:10 mds1 kernel: Lustre: testfs-MDT0000: used disk, loading
Feb 4 12:12:10 mds1 kernel: LustreError: 8412:0:(osd_io.c:1000:osd_ldiskfs_read()) testfs-MDT0000: can't read 128@8192 on ino 21: rc = 0
Feb 4 12:12:10 mds1 kernel: LustreError: 8412:0:(mdt_recovery.c:112:mdt_clients_data_init()) error reading MDS last_rcvd idx 0, off 8192: rc -14
Feb 4 12:12:18 mds1 kernel: Lustre: 6987:0:(mgc_request.c:1564:mgc_process_recover_log()) Process recover log testfs-mdtir error -22
Now attempt to mount 2 x OST:
9 AT osp testfs-OST0000-osc-MDT0000 testfs-MDT0000-mdtlov_UUID 1
Feb 4 12:22:48 mds1 kernel: Lustre: MGS: Regenerating testfs-OST0000 log
by user request.
Feb 4 12:22:58 mds1 kernel: Lustre: 8697:0:(mgc_request.c:1564:mgc_process_recover_log()) Process recover log testfs-mdtir error -22
Feb 4 12:22:58 mds1 kernel: LustreError: 8803:0:(ldlm_lib.c:429:client_obd_setup()) can't add initial connection
Feb 4 12:22:58 mds1 kernel: LustreError: 8803:0:(osp_dev.c:686:osp_init0()) testfs-OST0000-osc-MDT0000: can't setup obd: -2
Feb 4 12:22:58 mds1 kernel: LustreError: 8803:0:(obd_config.c:572:class_setup()) setup testfs-OST0000-osc-MDT0000 failed (-2)
Feb 4 12:22:48 mds1 kernel: Lustre: MGS: Regenerating testfs-OST0000 log by user request.
Feb 4 12:22:58 mds1 kernel: Lustre: 8697:0:(mgc_request.c:1564:mgc_process_recover_log()) Process recover log testfs-mdtir error -22
Feb 4 12:22:58 mds1 kernel: LustreError: 8803:0:(ldlm_lib.c:429:client_obd_setup()) can't add initial connection
Feb 4 12:22:58 mds1 kernel: LustreError: 8803:0:(osp_dev.c:686:osp_init0()) testfs-OST0000-osc-MDT0000: can't setup obd: -2
Feb 4 12:22:58 mds1 kernel: LustreError: 8803:0:(obd_config.c:572:class_setup()) setup mantle-OST0000-osc-MDT0000 failed (-2)
Feb 4 12:22:58 mds1 kernel: LustreError: 8803:0:(obd_config.c:1553:class_config_llog_handler()) MGC5.5.200.5@o2ib: cfg command failed: rc = -2
Feb 4 12:22:58 mds1 kernel: Lustre: cmd=cf003 0:testfs-OST0000-osc-MDT0000 1:testfs-OST0000_UUID 2:0@<0:0>
Feb 4 12:22:58 mds1 kernel: LustreError: 8803:0:(obd_config.c:1553:class_config_llog_handler()) MGC5.5.200.5@o2ib: cfg command failed: rc = -2
Feb 4 12:22:58 mds1 kernel: Lustre: cmd=cf003 0:testfs-OST0000-osc-MDT0000 1:testfs-OST0000_UUID 2:0@<0:0>
8 years, 3 months
Re: [HPDD-discuss] 'lustre-dkms' (skeleton) package for Debian/Ubuntu available
by Cédric Dufour - Idiap Research Institute
Hello again,
On 03/02/14 22:30, Dan Tascione wrote:
> Thank you very much for your reply -- that definitely pointed me in a better direction.
>
> Actually, I have my fingers crossed, but it seems to have passed my first round of stress tests without crashing, which is exciting.
Good to hear. The more of us who use the in-kernel client, the more momentum it'll get!
>
> One last question -- what type of striping configuration do you normally use for your data? I'm wondering if some of the problems I was hitting before were related to moving away from the default striping configuration.
We use *no* striping; each file remains on a single OST (chosen by the system), whatever its size.
Best,
Cédric
>
> Thanks,
> Dan
>
>
> On 02/03/2014 04:04 AM, Cédric Dufour - Idiap Research Institute wrote:
>> Hello Dan,
>>
>> On 02/02/14 17:25, Dan Tascione wrote:
>>>
>>> Hi Cédric,
>>>
>>>
>>>
>>> I had a few hopefully easy questions about your Ubuntu setup, if you have the time to answer them.
>>>
>>>
>>>
>>> Our server side is Lustre 2.4.2 on CentOS 6.4 (installed with the Whamcloud RPMs). These nodes all seem to be operating fine.
>>>
>>
>> We have 2.4.2 on Ubuntu 12.04 with 2.6.32 kernel (which our partner, Q-Leap GmbH, set up and maintains)
>>
>>>
>>>
>>> Our client side is currently Ubuntu 12.04. I've tried:
>>>
>>> - Compiling Lustre client from the git tree (both 2.4.2 and master)
>>>
>>
>> Haven't even tried it (being quite certain it would fail)
>>
>>> - Building the 3.13 kernel from Ubuntu, with the Lustre modules enabled
>>>
>>>
>>>
>>> Unfortunately, in all my tests, the Ubuntu nodes regularly panic or just outright freeze entirely anywhere from 2 to 24 hours of operation.
>>>
>>
>> In order for the in-kernel Lustre client to work (on kernel 3.12 for sure, and I think 3.13 as well), you *must* at least add the patches addressing:
>> - https://jira.hpdd.intel.com/browse/LU-4127
>> - https://jira.hpdd.intel.com/browse/LU-4157
>>
>>>
>>>
>>> For your Ubuntu clients, are you using the 3.12.8 that comes from Ubuntu, or from kernel.org?
>>>
>>
>> We started with an "apt-get source" in an Ubuntu/Trusty VM at the time its kernel was 3.12.0-7.15 (corresponding to 3.12.4 upstream).
>> We then added all incremental patches from https://www.kernel.org/ to "rebase" that kernel to 3.12.9.
>>
>>>
>>>
>>> It looks like you are just using the Lustre version that comes with the 3.12.8 kernel, and not the version from the Lustre source tree, is that correct?
>>>
>>
>> Yes, absolutely.
>>
>> The Lustre source tree still targets kernel 2.6.32 (or the like). As such, it is not suited for recent kernels :-(
>>
>> We started with stock in-kernel Lustre client from Ubuntu/Trusty 3.12.0-7.15, with patches for:
>> - https://jira.hpdd.intel.com/browse/LU-4127 (*required*)
>> - https://jira.hpdd.intel.com/browse/LU-4157 (*required*)
>> - https://jira.hpdd.intel.com/browse/LU-4231 (for NFS re-export)
>> - https://jira.hpdd.intel.com/browse/LU-4400 (for NFS re-export)
>>
>> BUT, as we stumbled on other minor bugs:
>> - https://jira.hpdd.intel.com/browse/LU-4209
>> - https://jira.hpdd.intel.com/browse/LU-4520
>> - https://jira.hpdd.intel.com/browse/LU-4530
>>
>> We decided to pull the in-kernel Lustre client from the most up-to-date kernel source; see https://jira.hpdd.intel.com/browse/LU-4530 for a discussion on what that might be.
>> Thus, we pull the in-kernel Lustre client from:
>> - https://github.com/verygreen/linux/tree/lustre-next
>> (which incorporates a few of the patches mentioned above, plus many others)
>> And added patches for the not-yet-integrated issues:
>> - https://jira.hpdd.intel.com/browse/LU-4231
>> - https://jira.hpdd.intel.com/browse/LU-4530
>> - https://jira.hpdd.intel.com/browse/LU-4520 (<-> 4152 <-> 4398 <-> 4429); this one is still unresolved as it requires server-side patches
>> - others that I thought might help with our LU-4520
>>
>>>
>>>
>>> Are your clients all Infiniband, or are they Ethernet? We're using Ethernet here for the clients, and I am wondering if that's interacting badly somehow.
>>>
>>
>> All clients are Ethernet
>>
>>>
>>>
>>> You mentioned "3.14rc1~patched" below, but I wasn't sure what this version number referred to?
>>>
>>
>> At the time it was git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git, "staging-3.14rc1" branch, but that is no longer valid; you're better off starting from https://github.com/verygreen/linux/tree/lustre-next
>>
>>>
>>>
>>> Thanks,
>>>
>>> Dan
>>>
>>
>> Best,
>>
>> Cédric
>>
>>>
>>>
>>>
>>>
>>> *From:*HPDD-discuss [mailto:hpdd-discuss-bounces@ml01.01.org] *On Behalf Of *Cédric Dufour - Idiap Research Institute
>>> *Sent:* Friday, January 24, 2014 7:17 AM
>>> *To:* Lustre (HPDD-discuss)
>>> *Subject:* [HPDD-discuss] 'lustre-dkms' (skeleton) package for Debian/Ubuntu available
>>>
>>>
>>>
>>> Hello all,
>>>
>>> Newly subscribed to the list, I've been going through the archives and seen some questions about Lustre client support on recent versions of Debian/Ubuntu distributions.
>>>
>>> We have addressed that issue by:
>>> - building a custom kernel with Lustre client *disabled*, based on Ubuntu's latest available kernel + latest stable patchsets, 3.12.8 for us so far (PS: './debian/rules editconfigs' to disable Lustre)
>>> - having a separate (easily upgrade-able) 'lustre-dkms' package based on Lustre in-kernel client code + our patches, 3.14rc1~patched for us so far
>>>
>>> We use that 3.12.8 kernel + lustre-dkms (3.14rc1~patched) package without any problem on:
>>> - Ubuntu/Quantal (~100 workstations and computation nodes)
>>> - Debian/Wheezy with the few libc (>= 2.14) dependencies pulled from Debian/Testing (a few servers requiring Lustre access)
>>> - (hopefully Ubuntu/Trusty 14.04 in a few weeks)
>>> (against a Lustre 2.6.32/2.4.2 cluster)
>>>
>>> I have tarball-ed the required resources at http://www.idiap.ch/~cdufour/download/lustre-dkms.tar.bz2 <http://www.idiap.ch/%7Ecdufour/download/lustre-dkms.tar.bz2> . It contains the skeleton directory and HOWO.TXT file, which should get those of you interested in following the same path going.
>>>
>>> Hope it helps.
>>>
>>> Best regards,
>>>
>>> Cédric
>>>
>>> --
>>>
>>> *Cédric Dufour @ Idiap Research Institute*
>>>
>>
>
8 years, 3 months
Re: [HPDD-discuss] 'lustre-dkms' (skeleton) package for Debian/Ubuntu available
by Cédric Dufour - Idiap Research Institute
Hello Dan,
On 02/02/14 17:25, Dan Tascione wrote:
>
> Hi Cédric,
>
>
>
> I had a few hopefully easy questions about your Ubuntu setup, if you have the time to answer them.
>
>
>
> Our server side is Lustre 2.4.2 on CentOS 6.4 (installed with the Whamcloud RPMs). These nodes all seem to be operating fine.
>
We have 2.4.2 on Ubuntu 12.04 with 2.6.32 kernel (which our partner, Q-Leap GmbH, set up and maintains)
>
>
> Our client side is currently Ubuntu 12.04. I've tried:
>
> - Compiling Lustre client from the git tree (both 2.4.2 and master)
>
Haven't even tried it (being quite certain it would fail)
> - Building the 3.13 kernel from Ubuntu, with the Lustre modules enabled
>
>
>
> Unfortunately, in all my tests, the Ubuntu nodes regularly panic or just outright freeze entirely anywhere from 2 to 24 hours of operation.
>
In order for the in-kernel Lustre client to work (on kernel 3.12 for sure, and I think 3.13 as well), you *must* at least add the patches addressing:
- https://jira.hpdd.intel.com/browse/LU-4127
- https://jira.hpdd.intel.com/browse/LU-4157
>
>
> For your Ubuntu clients, are you using the 3.12.8 that comes from Ubuntu, or from kernel.org?
>
We started with an "apt-get source" in an Ubuntu/Trusty VM at the time its kernel was 3.12.0-7.15 (corresponding to 3.12.4 upstream).
We then added all incremental patches from https://www.kernel.org/ to "rebase" that kernel to 3.12.9.
>
>
> It looks like you are just using the Lustre version that comes with the 3.12.8 kernel, and not the version from the Lustre source tree, is that correct?
>
Yes, absolutely.
The Lustre source tree still targets kernel 2.6.32 (or the like). As such, it is not suited for recent kernels :-(
We started with stock in-kernel Lustre client from Ubuntu/Trusty 3.12.0-7.15, with patches for:
- https://jira.hpdd.intel.com/browse/LU-4127 (*required*)
- https://jira.hpdd.intel.com/browse/LU-4157 (*required*)
- https://jira.hpdd.intel.com/browse/LU-4231 (for NFS re-export)
- https://jira.hpdd.intel.com/browse/LU-4400 (for NFS re-export)
BUT, as we stumbled on other minor bugs:
- https://jira.hpdd.intel.com/browse/LU-4209
- https://jira.hpdd.intel.com/browse/LU-4520
- https://jira.hpdd.intel.com/browse/LU-4530
We decided to pull the in-kernel Lustre client from the most up-to-date kernel source; see https://jira.hpdd.intel.com/browse/LU-4530 for a discussion on what that might be.
Thus, we pull the in-kernel Lustre client from:
- https://github.com/verygreen/linux/tree/lustre-next
(which incorporates a few of the patches mentioned above, plus many others)
And added patches for the not-yet-integrated issues:
- https://jira.hpdd.intel.com/browse/LU-4231
- https://jira.hpdd.intel.com/browse/LU-4530
- https://jira.hpdd.intel.com/browse/LU-4520 (<-> 4152 <-> 4398 <-> 4429); this one is still unresolved as it requires server-side patches
- others that I thought might help with our LU-4520
>
>
> Are your clients all Infiniband, or are they Ethernet? We're using Ethernet here for the clients, and I am wondering if that's interacting badly somehow.
>
All clients are Ethernet
>
>
> You mentioned "3.14rc1~patched" below, but I wasn't sure what this version number referred to?
>
At the time it was git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git, "staging-3.14rc1" branch, but that is no longer valid; you're better off starting from https://github.com/verygreen/linux/tree/lustre-next
>
>
> Thanks,
>
> Dan
>
Best,
Cédric
>
>
>
>
> *From:*HPDD-discuss [mailto:hpdd-discuss-bounces@ml01.01.org] *On Behalf Of *Cédric Dufour - Idiap Research Institute
> *Sent:* Friday, January 24, 2014 7:17 AM
> *To:* Lustre (HPDD-discuss)
> *Subject:* [HPDD-discuss] 'lustre-dkms' (skeleton) package for Debian/Ubuntu available
>
>
>
> Hello all,
>
> Newly subscribed to the list, I've been going through the archives and seen some questions about Lustre client support on recent versions of Debian/Ubuntu distributions.
>
> We have addressed that issue by:
> - building a custom kernel with Lustre client *disabled*, based on Ubuntu's latest available kernel + latest stable patchsets, 3.12.8 for us so far (PS: './debian/rules editconfigs' to disable Lustre)
> - having a separate (easily upgrade-able) 'lustre-dkms' package based on Lustre in-kernel client code + our patches, 3.14rc1~patched for us so far
>
> We use that 3.12.8 kernel + lustre-dkms (3.14rc1~patched) package without any problem on:
> - Ubuntu/Quantal (~100 workstations and computation nodes)
> - Debian/Wheezy with the few libc (>= 2.14) dependencies pulled from Debian/Testing (a few servers requiring Lustre access)
> - (hopefully Ubuntu/Trusty 14.04 in a few weeks)
> (against a Lustre 2.6.32/2.4.2 cluster)
>
> I have tarball-ed the required resources at http://www.idiap.ch/~cdufour/download/lustre-dkms.tar.bz2 <http://www.idiap.ch/%7Ecdufour/download/lustre-dkms.tar.bz2> . It contains the skeleton directory and HOWO.TXT file, which should get those of you interested in following the same path going.
>
> Hope it helps.
>
> Best regards,
>
> Cédric
>
> --
>
> *Cédric Dufour @ Idiap Research Institute*
>
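For readers who want to follow the lustre-dkms approach described above without fetching the tarball first, the heart of such a package is its dkms.conf. Below is a purely hypothetical sketch (package name, version, module list, and paths are all assumptions; the HOWO.TXT in the tarball is the authoritative recipe):

```shell
# Hypothetical dkms.conf for a 'lustre-dkms' style package; all
# names and paths below are illustrative assumptions, not the
# actual contents of the Idiap tarball.
PACKAGE_NAME="lustre-dkms"
PACKAGE_VERSION="3.14rc1~patched"
MAKE[0]="make -C ${kernel_source_dir} M=${dkms_tree}/${PACKAGE_NAME}/${PACKAGE_VERSION}/build modules"
CLEAN="make -C ${kernel_source_dir} M=${dkms_tree}/${PACKAGE_NAME}/${PACKAGE_VERSION}/build clean"
BUILT_MODULE_NAME[0]="lustre"
BUILT_MODULE_LOCATION[0]="llite"
DEST_MODULE_LOCATION[0]="/updates"
AUTOINSTALL="yes"
```

With such a file in place, DKMS rebuilds the module automatically whenever a new kernel is installed, which is what makes the client "easily upgrade-able" as described above.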
8 years, 3 months
What does it mean on client: Error ldlm_cli_enqueue: -2
by Arman Khalatyan
Hello,
On one of our big-RAM (128 GB) clients we see the following message:
LustreError: 26295:0:(mdc_locks.c:848:mdc_enqueue()) ldlm_cli_enqueue: -2
Can someone help us understand the failure?
System: SL 6.4
lustre client/server
cat /proc/fs/lustre/version
lustre: 2.4.2
kernel: patchless_client
build: 2.4.2-RC2--PRISTINE-2.6.32-358.23.2.el6.x86_64
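For what it's worth, Lustre kernel messages report failures as negative errno values, so the -2 in the ldlm_cli_enqueue message maps to ENOENT ("No such file or directory"); one plausible (and often benign) cause is a lock enqueue for a name that no longer exists on the MDS, e.g. because another client removed it. The mapping is easy to check:

```python
import errno
import os

# Lustre kernel messages report failures as negative errno values;
# decode the -2 from the ldlm_cli_enqueue message above.
rc = -2
print(errno.errorcode[-rc], "-", os.strerror(-rc))
# → ENOENT - No such file or directory
```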
Thank you in advance.
Arman.
8 years, 3 months
Debugging LNet RPC Errors
by Tobias Groschup
Hello,
I am still struggling with the GET message on the LND level. After
adding a test to the lnet selftest, there is one GET going through the
LND, and after that nothing happens some time, untill this error message
is dumped to the console:
add test RPC failed on 12345-1@ex: Unknown error 18446744073709551506
Is there any way to find out what caused this error? That would be a
great help in finding what the LND does wrong.
I consulted the various log files, such as /var/log/dmesg and
/var/log/messages. On my system there is no log-lustre file under /tmp,
so I do not know how to investigate this error further. Any help on
this matter would be very much appreciated!
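One clue: "Unknown error 18446744073709551506" looks like a negative errno that was printed as an unsigned 64-bit value. Reinterpreted as signed, it is -110, i.e. ETIMEDOUT, which would at least match the "nothing happens for some time" behaviour (whether the timeout originates in the LND or in the selftest RPC layer is something the code would have to confirm). A quick check:

```python
import errno

# Reinterpret the unsigned 64-bit value printed on the console
# as a signed 64-bit integer to recover the underlying errno.
raw = 18446744073709551506
rc = raw - (1 << 64) if raw >= (1 << 63) else raw
print(rc, errno.errorcode[-rc])  # → -110 ETIMEDOUT
```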
Thanks and kind regards
Tobias Groschup
8 years, 3 months