Lustre and kernel buffer interaction
by John Bauer
I have been trying to understand a behavior I am observing in an IOR
benchmark on Lustre. I have pared it down to a simple example.
The IOR benchmark is running in MPI mode. There are 2 ranks, each
running on its own node. (Note: the test was run on the "swan" cluster
at Cray Inc., using /lus/scratch.) Each rank does the following:
1. write a file (10 GB)
2. fsync the file
3. close the file
4. MPI_Barrier
5. open the file that was written by the other rank
6. read the file that was written by the other rank
7. close the file that was written by the other rank
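Ignoring the MPI plumbing, the per-rank sequence can be sketched as follows
(a minimal single-process Python sketch; the function name, chunk sizes, and
the barrier comment are illustrative, not taken from IOR itself):

```python
import os

def rank_step(my_file, peer_file, chunk=1 << 20, total=4 << 20):
    # 1. write my file in fixed-size chunks
    fd = os.open(my_file, os.O_CREAT | os.O_WRONLY | os.O_TRUNC)
    buf = b"\0" * chunk
    written = 0
    while written < total:
        written += os.write(fd, buf)
    # 2. fsync -- force the data out to the backing storage (the OSTs)
    os.fsync(fd)
    # 3. close
    os.close(fd)
    # 4. MPI_Barrier would go here in the real two-rank run
    # 5-7. open, read, and close the file written by the *other* rank
    fd = os.open(peer_file, os.O_RDONLY)
    nread = 0
    while True:
        b = os.read(fd, chunk)
        if not b:
            break
        nread += len(b)
    os.close(fd)
    return written, nread
```

In the real test each rank runs this on its own node with a 10 GB `total`,
so the reads in step 6 always come from a file cached on the other node.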
The writing of each file goes as expected.
The fsync takes very little time (about 0.05 seconds).
The first reads of the file (written by the other rank) start out *very
slowly*. While these first reads are proceeding slowly, the
kernel's cached memory (the Cached: line in /proc/meminfo) decreases
from the size of the file just written to nearly zero.
Once the cached memory has reached nearly zero, the file reading
proceeds as expected.
I have attached a jpg of the instrumentation of the processes that
illustrates this behavior.
My questions are:
Why does the reading of the file written by the other rank wait until
the cached data drains to nearly zero before proceeding normally?
Shouldn't the fsync ensure that the file's data is written to the
backing storage, so that draining the cached memory is simply a matter
of releasing pages, with no further I/O?
For this case the "dead" time is only about 4 seconds, but this "dead"
time scales directly with the size of the files.
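One way to probe the hypothesis above is to have the writer release its
(now clean) cached pages itself, right after the fsync, and see whether the
reader-side stall disappears. A minimal sketch (the function name and sizes
are illustrative; note that on a Lustre client the llite layer manages the
page cache, so posix_fadvise may or may not map onto it directly):

```python
import os

def write_fsync_drop(path, nbytes=1 << 20):
    # Write, fsync, then hint the kernel that the cached pages for this
    # file will not be needed again on this node.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC)
    written = os.write(fd, b"\0" * nbytes)
    os.fsync(fd)  # data should now be on the backing storage
    if hasattr(os, "posix_fadvise"):  # Linux, Python 3.3+
        # After fsync the pages are clean, so dropping them should be
        # cheap -- no further I/O, just page release.
        os.posix_fadvise(fd, 0, nbytes, os.POSIX_FADV_DONTNEED)
    os.close(fd)
    return written
```

If the stall persists even with the writer's cache pre-dropped, the wait is
more likely on the reader's lock/cache interaction than on page release.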
John
--
John Bauer
I/O Doctors LLC
507-766-0378
bauerj(a)iodoctors.com
quotas on 2.4.3
by Matt Bettinger
Hello,
We have a fresh Lustre 2.4.3 upgrade, not yet put into
production, running on RHEL 6.4.
We would like to take a look at quotas, but it looks like there are some
major performance problems with 1.8.9 clients.
Here is how I enabled quotas:
[root@lfs-mds-0-0 ~]# lctl conf_param lustre2.quota.mdt=ug
[root@lfs-mds-0-0 ~]# lctl conf_param lustre2.quota.ost=ug
[root@lfs-mds-0-0 ~]# lctl get_param osd-*.*.quota_slave.info
osd-ldiskfs.lustre2-MDT0000.quota_slave.info=
target name: lustre2-MDT0000
pool ID: 0
type: md
quota enabled: ug
conn to master: setup
space acct: ug
user uptodate: glb[1],slv[1],reint[0]
group uptodate: glb[1],slv[1],reint[0]
The quotas seem to be working; however, the write performance from a
1.8.9-wc client to 2.4.3 with quotas on is horrific. Am I not setting
quotas up correctly?
I try to set a simple user quota on the /lustre2/mattb/300MB_QUOTA directory:
[root@hous0036 mattb]# lfs setquota -u l0363734 -b 307200 -B 309200 -i
10000 -I 11000 /lustre2/mattb/300MB_QUOTA/
See that the quota change is in effect:
[root@hous0036 mattb]# lfs quota -u l0363734 /lustre2/mattb/300MB_QUOTA/
Disk quotas for user l0363734 (uid 1378):
Filesystem kbytes quota limit grace files quota limit grace
/lustre2/mattb/300MB_QUOTA/
310292* 307200 309200 - 4 10000 11000 -
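For what it's worth, the asterisk after 310292 in the kbytes column flags
that the soft block limit has been exceeded. Reading the numbers from the
report above (my interpretation, sketched in Python for clarity):

```python
# Figures from the `lfs quota` report above, all in KiB.
soft_kb, hard_kb, used_kb = 307200, 309200, 310292

# The '*' flag: usage is over the soft limit, so the grace period applies.
over_soft = used_kb > soft_kb
# Usage is also over the hard limit, so further writes fail with
# EDQUOT ("Disk quota exceeded").
over_hard = used_kb > hard_kb
```

That would explain why the first 301 MB dd completed (the OST-side quota
accounting caught up only afterwards) while the second dd was refused.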
Try to write to the quota directory as the user, but get horrible write speed:
[l0363734@hous0036 300MB_QUOTA]$ dd if=/dev/zero of=301MB_FILE bs=1M count=301
301+0 records in
301+0 records out
315621376 bytes (316 MB) copied, 61.7426 seconds, 5.1 MB/s
Try file number 2, and then the quota takes effect, so it seems.
[l0363734@hous0036 300MB_QUOTA]$ dd if=/dev/zero of=301MB_FILE2 bs=1M count=301
dd: writing `301MB_FILE2': Disk quota exceeded
dd: closing output file `301MB_FILE2': Input/output error
If I disable quotas using
[root@lfs-mds-0-0 ~]# lctl conf_param lustre2.quota.mdt=none
[root@lfs-mds-0-0 ~]# lctl conf_param lustre2.quota.oss=none
Then, when I try to write the same file, the speeds are more like what we
expect, but then we can't use quotas.
[l0363734@hous0036 300MB_QUOTA]$ dd if=/dev/zero of=301MB_FILE2 bs=1M count=301
301+0 records in
301+0 records out
315621376 bytes (316 MB) copied, 0.965009 seconds, 327 MB/s
I have not tried this with a 2.4 client yet, since all of our nodes
are 1.8.x until we rebuild our images.
I was going by the manual at
http://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact...
but it looks like I am running into an interoperability issue (which I
thought I had fixed by using the 1.8.9-wc client), or I am just not
configuring this correctly.
Thanks!
MB
lustre 2.5.0 + zfs
by luka leskovec
Hello all,
I have a running Lustre 2.5.0 + ZFS setup on top of CentOS 6.4 (the kernels
available on the public Whamcloud site); my clients are on CentOS 6.5
(a minor version difference; I recompiled the client sources with the
options specified on the Whamcloud site).
But now I have some problems. I cannot judge how serious they are, as the
only symptoms I observe are slow responses on ls, rm and tar; apart from
that it works great. I also export it over NFS, which sometimes hangs the
client to which it is exported, but I expect this is an issue related to
how many service threads I have running on my servers (old machines).
But my OSSes (I have two) keep spitting out these messages into the system
log:
xxxxxxxxxxxxxxxxxxxxxx kernel: SPL: Showing stack for process 3264
xxxxxxxxxxxxxxxxxxxxxx kernel: Pid: 3264, comm: txg_sync Tainted:
P --------------- 2.6.32-358.18.1.el6_lustre.x86_64 #1
xxxxxxxxxxxxxxxxxxxxxx kernel: Call Trace:
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa01595a7>] ?
spl_debug_dumpstack+0x27/0x40 [spl]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0161337>] ?
kmem_alloc_debug+0x437/0x4c0 [spl]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0163b13>] ?
task_alloc+0x1d3/0x380 [spl]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0160f8f>] ?
kmem_alloc_debug+0x8f/0x4c0 [spl]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa02926f0>] ? spa_deadman+0x0/0x120
[zfs]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa016432b>] ?
taskq_dispatch_delay+0x19b/0x2a0 [spl]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0164612>] ?
taskq_cancel_id+0x102/0x1e0 [spl]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa028259a>] ? spa_sync+0x1fa/0xa80
[zfs]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff810a2431>] ? ktime_get_ts+0xb1/0xf0
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0295707>] ?
txg_sync_thread+0x307/0x590 [zfs]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff810560a9>] ?
set_user_nice+0xc9/0x130
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0295400>] ?
txg_sync_thread+0x0/0x590 [zfs]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0162478>] ?
thread_generic_wrapper+0x68/0x80 [spl]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0162410>] ?
thread_generic_wrapper+0x0/0x80 [spl]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff81096a36>] ? kthread+0x96/0xa0
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff810969a0>] ? kthread+0x0/0xa0
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Does anyone know whether this is a serious problem, or just cosmetic? Any
way to solve this? Any hints?
best regards,
Luka Leskovec
Re: [HPDD-discuss] iozone and lustre
by Dilger, Andreas
On 2014/03/10, 6:01 AM, "E.S. Rosenberg" <esr(a)cs.huji.ac.il<mailto:esr@cs.huji.ac.il>> wrote:
On Tue, Mar 4, 2014 at 4:04 PM, Simmons, James A.
>>If yes, you most likely stumbled on https://jira.hpdd.intel.com/browse/LU-4209
>We are running the in-kernel client, kernel 3.13.x on Debian machines
>The lfs utility in use is from the lustre 2.4.2 branch
>I see that a patch was submitted to the linux-kernel team; does anyone have a link to that patch? I'd like to either patch our kernel or see what version of the kernel we should be building; if it made it into 3.14-rcX then we'll build that...
http://www.spinics.net/lists/linux-fsdevel/msg72386.html
Correct me if I'm wrong but it looks like that patch hit a dead end...
Andreas did any fsdevs ever voice an opinion on generic/lustre flags?
The patch was resubmitted, and Al Viro had relatively little to say about it (i.e. he didn't completely hate it, which is good). I've since sent an updated patch to Greg again.
Any chance it'll be in 3.14?
The patch is currently in the staging tree, I'm not sure which release it will be in.
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
URL CORRECTION: I/O Characterization of Large-Scale HPC Centers
by OpenSFS Administration
[Resending with correct URL:
http://opensfs.org/wp-content/uploads/2014/04/BenchmarkingWorkingGroup-io-survey-v5-sarp.docx_.pdf]
The OpenSFS Benchmarking Working Group (BWG) has released its new I/O
Characterization Report.
The OpenSFS Benchmarking Working Group (BWG) was created with the intent of
defining an I/O benchmark suite to satisfy the requirements of the scalable
parallel file system users and facilities. The first step toward this end
was identified as characterization of I/O workloads, from small- to very
large-scale parallel file systems, deployed at various high-performance and
parallel computing (HPC) facilities and institutions. The characterization
will then drive the design of the I/O benchmarks that emulate these
workloads.
As part of the characterization, the BWG released a survey at the
Supercomputing Conference in 2012, to solicit participation and collect data
on file systems and workloads in HPC centers. This paper, which summarizes
the data collected and our analysis, is now available on the OpenSFS web
site at:
http://opensfs.org/wp-content/uploads/2014/04/BenchmarkingWorkingGroup-io-survey-v5-sarp.docx_.pdf
Additional information is also available on our Resources Page:
http://lustre.opensfs.org/resources/.
Sincerely,
OpenSFS Administration
I/O Characterization of Large-Scale HPC Centers -- Benchmarking Working Group
by OpenSFS Administration
Just in time for LUG 2014, the OpenSFS Benchmarking Working Group (BWG) has
released its new I/O Characterization Report.
The OpenSFS Benchmarking Working Group (BWG) was created with the intent of
defining an I/O benchmark suite to satisfy the requirements of the scalable
parallel file system users and facilities. The first step toward this end
was identified as characterization of I/O workloads, from small- to very
large-scale parallel file systems, deployed at various high-performance and
parallel computing (HPC) facilities and institutions. The characterization
will then drive the design of the I/O benchmarks that emulate these
workloads.
As part of the characterization, the BWG released a survey at the
Supercomputing Conference in 2012, to solicit participation and collect data
on file systems and workloads in HPC centers. This paper, which summarizes
the data collected and our analysis, is now available on the OpenSFS web
site at
http://lustre.opensfs.org/files/2014/04/BenchmarkingWorkingGroup-io-survey-v5-sarp.docx_.pdf
Additional information is also available on our Resources Page
(http://lustre.opensfs.org/resources/).
Sincerely,
OpenSFS Administration
Fwd: [Lustre-discuss] MDT fails to mount after lustre upgrade
by Parinay Kondekar
Anjana,
Does this look similar - https://jira.hpdd.intel.com/browse/LU-4634 ?
including hpdd-discuss.
HTH
---------- Forwarded message ----------
From: Anjana Kar <kar(a)psc.edu>
Date: 24 April 2014 20:16
Subject: Re: [Lustre-discuss] MDT fails to mount after lustre upgrade
To: lustre-discuss(a)lists.lustre.org
Would it make sense to run
"tunefs.lustre --writeconf --mgs --mdt /dev/sda /dev/sdb"?
The original mkfs command used to create the MDT was
mkfs.lustre --reformat --fsname=iconfs --mgs --mdt --backfstype=zfs
--device-size 131072 \
--index=0 lustre-mgs-mdt/mgsmdt0 mirror /dev/sda /dev/sdb
Not sure if both device names should be included or the zpool name.
The latest 2.5.x version also fails to mount the MDT with similar messages,
though the zpool seems intact... is there any way to get the MDT to mount?
[root@icon0 kar]# more /proc/fs/lustre/version
lustre: 2.5.58
kernel: patchless_client
build: 2.5.58-g5565877-PRISTINE-2.6.32-431.11.2.el6.netboot
[root@icon0 kar]# /sbin/service lustre start mgsmdt
Mounting lustre-mgs-mdt/mgsmdt0 on /mnt/lustre/local/mgsmdt
mount.lustre: mount lustre-mgs-mdt/mgsmdt0 at /mnt/lustre/local/mgsmdt
failed: File exists
[root@icon0 kar]# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
lustre-mgs-mdt 74.5G 47.2G 27.3G 63% 1.00x ONLINE -
Corresponding console messages:
2014-04-24T10:36:53.736967-04:00 icon0.psc.edu kernel: Lustre: Lustre:
Build Version: 2.5.58-g5565877-PRISTINE-2.6.32-431.11.2.el6.netboot
2014-04-24T10:36:53.986362-04:00 icon0.psc.edu kernel: LNet: Added LNI
10.10.101.160@o2ib [8/256/0/180]
2014-04-24T10:36:54.005177-04:00 icon0.psc.edu kernel: LNet: Added LNI
128.182.75.160@tcp10 [8/256/0/180]
2014-04-24T10:36:54.009556-04:00 icon0.psc.edu kernel: LNet: Accept secure,
port 988
2014-04-24T10:36:57.428805-04:00 icon0.psc.edu kernel: LustreError: 11-0:
iconfs-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect
failed with -11.
2014-04-24T10:36:58.600958-04:00 icon0.psc.edu kernel: LustreError:
11888:0:(mdd_device.c:1050:mdd_prepare()) iconfs-MDD0000: failed to
initialize lfsck: rc = -17
2014-04-24T10:36:58.600978-04:00 icon0.psc.edu kernel: LustreError:
11888:0:(obd_mount_server.c:1776:server_fill_super()) Unable to start
targets: -17
2014-04-24T10:36:58.614082-04:00 icon0.psc.edu kernel: Lustre: Failing over
iconfs-MDT0000
2014-04-24T10:37:04.885917-04:00 icon0.psc.edu kernel: Lustre:
11888:0:(client.c:1912:ptlrpc_expire_one_request()) @@@ Request sent has
timed out for slow reply: [sent 1398350218/real 1398350218]
req@ffff8802fca1a000 x1466276472946828/t0(0)
o251->MGC10.10.101.160@o2ib@0@lo:26/25
lens 224/224 e 0 to 1 dl 1398350224 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
2014-04-24T10:37:05.418516-04:00 icon0.psc.edu kernel: Lustre: server
umount iconfs-MDT0000 complete
2014-04-24T10:37:05.418535-04:00 icon0.psc.edu kernel: LustreError:
11888:0:(obd_mount.c:1338:lustre_fill_super()) Unable to mount (-17)
TIA,
-Anjana Kar
Pittsburgh Supercomputing Center
kar(a)psc.edu
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss(a)lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Lustre client on SLES11sp2
by Daguman, Brainard R
Hi All,
I've got a Lustre file system, version 2.4.2, built on CentOS 6.4 MDS/OSS/client (test only) systems, all using the same version of the OS and kernel 2.6.32-358.23.2.
I need to load the Lustre client on several compute systems with a SLES11sp2 build; the matrix shows this is doable. How do I enable a SLES11sp2 client, which has different kernel builds? I believe this requires recompiling Lustre on the client systems against the correct source; what are the steps?
--
Thanks,
Brainard
Lustre Error on Lustre client machines
by Venkat Reddy
Hi,
I am getting the following error messages on Lustre clients continuously.
I am unable to find any clue on this; please advise.
I am posting one client's messages.
Mar 20 00:05:20 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.43@o2ib. The ost_write operation failed with -5
Mar 20 00:05:20 node22 kernel: LustreError: Skipped 373 previous similar
messages
Mar 20 00:05:20 node22 kernel: LustreError:
3248:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff881069aeb000 x1462271803650175/t0(0)
o4->lustre-OST0001-osc-ffff880819776400@192.168.1.43@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395254164 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 00:05:20 node22 kernel: LustreError:
3248:0:(osc_request.c:1689:osc_brw_redo_request()) Skipped 374 previous
similar messages
Mar 20 00:15:24 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.45@o2ib. The ost_write operation failed with -5
Mar 20 00:15:24 node22 kernel: LustreError: Skipped 367 previous similar
messages
Mar 20 00:15:24 node22 kernel: LustreError:
3251:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff881069f61c00 x1462271803659320/t0(0)
o4->lustre-OST0006-osc-ffff880819776400@192.168.1.45@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395254732 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 00:15:24 node22 kernel: LustreError:
3251:0:(osc_request.c:1689:osc_brw_redo_request()) Skipped 368 previous
similar messages
Mar 20 00:28:04 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.45@o2ib. The ost_write operation failed with -5
Mar 20 00:28:04 node22 kernel: LustreError: Skipped 13 previous similar
messages
Mar 20 00:28:04 node22 kernel: LustreError:
3260:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff88106a0db400 x1462271803668389/t0(0)
o4->lustre-OST0006-osc-ffff880819776400@192.168.1.45@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395255492 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 00:28:04 node22 kernel: LustreError:
3260:0:(osc_request.c:1689:osc_brw_redo_request()) Skipped 13 previous
similar messages
Mar 20 00:38:34 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.45@o2ib. The ost_write operation failed with -5
Mar 20 00:38:34 node22 kernel: LustreError: Skipped 13 previous similar
messages
Mar 20 00:38:34 node22 kernel: LustreError:
3250:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff88106b499000 x1462271803676412/t0(0)
o4->lustre-OST0006-osc-ffff880819776400@192.168.1.45@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395256121 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 00:38:34 node22 kernel: LustreError:
3250:0:(osc_request.c:1689:osc_brw_redo_request()) Skipped 13 previous
similar messages
Mar 20 00:48:37 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.43@o2ib. The ost_write operation failed with -5
Mar 20 00:48:37 node22 kernel: LustreError: Skipped 16 previous similar
messages
Mar 20 00:48:37 node22 kernel: LustreError:
3254:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff881060f94800 x1462271803685186/t0(0)
o4->lustre-OST0000-osc-ffff880819776400@192.168.1.43@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395256724 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 00:48:37 node22 kernel: LustreError:
3254:0:(osc_request.c:1689:osc_brw_redo_request()) Skipped 16 previous
similar messages
Mar 20 00:58:45 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.45@o2ib. The ost_write operation failed with -5
Mar 20 00:58:45 node22 kernel: LustreError: Skipped 21 previous similar
messages
Mar 20 00:58:45 node22 kernel: LustreError:
3253:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff88106b499c00 x1462271803693596/t0(0)
o4->lustre-OST0006-osc-ffff880819776400@192.168.1.45@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395257371 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 00:58:45 node22 kernel: LustreError:
3253:0:(osc_request.c:1689:osc_brw_redo_request()) Skipped 21 previous
similar messages
Mar 20 01:09:09 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.45@o2ib. The ost_write operation failed with -5
Mar 20 01:09:09 node22 kernel: LustreError: Skipped 13 previous similar
messages
Mar 20 01:09:09 node22 kernel: LustreError:
3259:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff88106b499000 x1462271803701825/t0(0)
o4->lustre-OST0006-osc-ffff880819776400@192.168.1.45@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395257993 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 01:09:09 node22 kernel: LustreError:
3259:0:(osc_request.c:1689:osc_brw_redo_request()) Skipped 13 previous
similar messages
Mar 20 01:19:39 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.43@o2ib. The ost_write operation failed with -5
Mar 20 01:19:39 node22 kernel: LustreError: Skipped 18 previous similar
messages
Mar 20 01:19:39 node22 kernel: LustreError:
3251:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff881060e57800 x1462271803711034/t0(0)
o4->lustre-OST0000-osc-ffff880819776400@192.168.1.43@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395258587 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 01:19:39 node22 kernel: LustreError:
3251:0:(osc_request.c:1689:osc_brw_redo_request()) Skipped 18 previous
similar messages
Mar 20 01:29:40 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.45@o2ib. The ost_write operation failed with -5
Mar 20 01:29:40 node22 kernel: LustreError: Skipped 21 previous similar
messages
Mar 20 01:29:40 node22 kernel: LustreError:
3256:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff88106a0db000 x1462271803719047/t0(0)
o4->lustre-OST0006-osc-ffff880819776400@192.168.1.45@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395259225 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 01:29:40 node22 kernel: LustreError:
3256:0:(osc_request.c:1689:osc_brw_redo_request()) Skipped 21 previous
similar messages
Mar 20 01:40:09 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.43@o2ib. The ost_write operation failed with -5
Mar 20 01:40:09 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.43@o2ib. The ost_write operation failed with -5
Mar 20 01:40:09 node22 kernel: LustreError: Skipped 83 previous similar
messages
Mar 20 01:40:09 node22 kernel: LustreError: Skipped 83 previous similar
messages
Mar 20 01:40:09 node22 kernel: LustreError:
3255:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff88106b272400 x1462271803727400/t0(0)
o4->lustre-OST0001-osc-ffff880819776400@192.168.1.43@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395259855 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 01:40:09 node22 kernel: LustreError:
3256:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff881062400800 x1462271803727404/t0(0)
o4->lustre-OST0001-osc-ffff880819776400@192.168.1.43@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395259855 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 01:40:09 node22 kernel: LustreError:
3256:0:(osc_request.c:1689:osc_brw_redo_request()) Skipped 82 previous
similar messages
Mar 20 01:50:10 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.43@o2ib. The ost_write operation failed with -5
Mar 20 01:50:10 node22 kernel: LustreError: Skipped 190 previous similar
messages
Mar 20 01:50:10 node22 kernel: LustreError:
3254:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff881069aeb000 x1462271803735098/t0(0)
o4->lustre-OST0001-osc-ffff880819776400@192.168.1.43@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395260417 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 01:50:10 node22 kernel: LustreError:
3254:0:(osc_request.c:1689:osc_brw_redo_request()) Skipped 189 previous
similar messages
Mar 20 02:00:17 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.43@o2ib. The ost_write operation failed with -5
Mar 20 02:00:17 node22 kernel: LustreError: Skipped 199 previous similar
messages
Mar 20 02:00:17 node22 kernel: LustreError:
3260:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff881069aeb000 x1462271803743031/t0(0)
o4->lustre-OST0001-osc-ffff880819776400@192.168.1.43@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395261061 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 02:00:17 node22 kernel: LustreError:
3260:0:(osc_request.c:1689:osc_brw_redo_request()) Skipped 198 previous
similar messages
Mar 20 02:10:20 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.44@o2ib. The ost_write operation failed with -5
Mar 20 02:10:20 node22 kernel: LustreError: Skipped 210 previous similar
messages
Mar 20 02:10:20 node22 kernel: LustreError:
3249:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff88106b5c1800 x1462271803750283/t0(0)
o4->lustre-OST0005-osc-ffff880819776400@192.168.1.44@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395261665 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 02:10:20 node22 kernel: LustreError:
3249:0:(osc_request.c:1689:osc_brw_redo_request()) Skipped 213 previous
similar messages
Mar 20 02:20:23 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.43@o2ib. The ost_write operation failed with -5
Mar 20 02:20:23 node22 kernel: LustreError: Skipped 191 previous similar
messages
Mar 20 02:20:23 node22 kernel: LustreError:
3251:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff88106a962400 x1462271803758875/t0(0)
o4->lustre-OST0000-osc-ffff880819776400@192.168.1.43@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395262269 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 02:20:23 node22 kernel: LustreError:
3251:0:(osc_request.c:1689:osc_brw_redo_request()) Skipped 191 previous
similar messages
Mar 20 02:30:25 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.43@o2ib. The ost_write operation failed with -5
Mar 20 02:30:25 node22 kernel: LustreError: Skipped 149 previous similar
messages
Mar 20 02:30:25 node22 kernel: LustreError:
3254:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff88106a962400 x1462271803766929/t0(0)
o4->lustre-OST0000-osc-ffff880819776400@192.168.1.43@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395262833 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 02:30:25 node22 kernel: LustreError:
3254:0:(osc_request.c:1689:osc_brw_redo_request()) Skipped 148 previous
similar messages
Mar 20 02:40:26 node22 kernel: LustreError: 11-0: an error occurred while
communicating with 192.168.1.43@o2ib. The ost_write operation failed with -5
Mar 20 02:40:26 node22 kernel: LustreError: Skipped 225 previous similar
messages
Mar 20 02:40:26 node22 kernel: LustreError:
3254:0:(osc_request.c:1689:osc_brw_redo_request()) @@@ redo for recoverable
error -5 req@ffff881069aeb800 x1462271803774144/t0(0)
o4->lustre-OST0001-osc-ffff880819776400@192.168.1.43@o2ib:6/4 lens 488/416
e 0 to 0 dl 1395263434 ref 2 fl Interpret:R/0/0 rc -5/-5
Mar 20 02:40:26 node22 kernel: LustreError:
3254:0:(osc_request.c:1689:osc_brw_redo_request()) Skipped 225 previous
similar messages
[root@node22 ~]#
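As an editorial aside for readers of the archive: the "-5" in these
ost_write failures is the kernel errno for a generic I/O error, which can
be confirmed from Python's stdlib:

```python
import errno
import os

# errno 5 is EIO, the generic "Input/output error" that the OSTs are
# returning for the ost_write RPCs in the log above.
assert errno.errorcode[5] == "EIO"
print(os.strerror(5))  # the human-readable form of errno 5
```

EIO from an OST usually points at the server or storage side (failing
disks, a read-only backend filesystem, or RPC-level trouble) rather than
a client misconfiguration, so the OSS logs for 192.168.1.43-45 are the
next place to look.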
Thank you
Venkat Reddy