This issue has surfaced once again; any help is greatly appreciated.
Best regards
Amit
On Jun 30, 2013, at 9:01 PM, "Kumar, Amit" <ahkumar(a)mail.smu.edu> wrote:
Dear All,
At this point the OST connections have been restored and, touch wood, they look stable.
I have seen intermittent performance issues before, but never one like this that lasted
more than a day.
We had several running jobs, which is normal, but this time I saw high I/O waits on the
OSSes, up to 8% for the jbd2-sd[x] journal processes.
And the new data written/added to the disks during this period was about 3-4 TB since
Friday, which does not sound like much.
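For reference, a quick way to check whether the journal is what the OSSes are actually
waiting on is to watch per-device wait/utilization alongside the jbd2 threads; a minimal
check, assuming the sysstat package is installed:

# Extended per-device stats (await, %util), 5-second samples:
iostat -x 5
# The journal threads show up as jbd2/sd<X>-8; check their state:
top -b -n 1 | grep jbd2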
Just so I can make some sense of what happened here, I pulled the average stats of the
3 Gbps bonded interface for each OSS over the last 24 hours, and then converted them to
MB/day and TB/day (there is a quick check of the arithmetic after the table below).
But the value I get below is 99 TB/day, and that does not make sense. I know I am doing
something silly; does it make sense to you?
One interesting thing I noted: as soon as I re-activated the OSS from which I had been
migrating data off (so that I could replace its disks), that OSS started getting a lot
of data in and out; you can see that in its averages of 1200 Mbps and 1100 Mbps.
Can you please help me understand what could have happened, what the most accurate way
is to find out exactly what is causing the issue, and what the best-practice action plan
would be in that situation? This week I will be adding another OSS with additional OSTs
to increase capacity, so any tuning tips at this time will be very helpful.
Thank you,
Amit
For each OSS, the average in/out rates on the 3 Gbps bonded interface over the last
24 hours, and the resulting volume per day, are shown below:
OSS_________________Avg rate______Mbps*60*60*24 (Mb/day)
In 166.5 Mbps 14385600
Out 54 Mbps 4665600
In 776 Mbps 67046400
Out 199.7 Mbps 17254080
In 305.3 Mbps 26377920
Out 230.5 Mbps 19915200
In 193.3 Mbps 16701120
Out 34.7 Mbps 2998080
In 158.1 Mbps 13659840
Out 51.6 Mbps 4458240
In 327 Mbps 28252800
Out 75 Mbps 6480000
In 95.6 Mbps 8259840
Out 63 Mbps 5443200
In 824.3 Mbps 71219520
Out 203.2 Mbps 17556480
In 1200 Mbps 103680000
Out 575.9 Mbps 49757760
In 1100 Mbps 95040000
Out 427.4 Mbps 36927360
In 1100 Mbps 95040000
Out 466.6 Mbps 40314240
In 750.9 Mbps 64877760
Out 253.8 Mbps 21928320
________________SUM 832239360 Mb/day
________________MB  104029920 MB/day
________________GB  101591.72 GB/day
________________TB  99.21 TB/day
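For reference, the arithmetic above can be reproduced with a one-liner like this (the
rates are copied from the table; it assumes 1 byte = 8 bits and 1 GB = 1024 MB):

# Sum the per-interface averages (Mbps, in + out) and convert to TB/day.
echo "166.5 54 776 199.7 305.3 230.5 193.3 34.7 158.1 51.6 327 75 95.6 63 824.3 203.2 1200 575.9 1100 427.4 1100 466.6 750.9 253.8" |
tr ' ' '\n' |
awk '{ mbps += $1 }
     END {
       mb_day = mbps * 60 * 60 * 24         # megabits per day
       tb_day = mb_day / 8 / 1024 / 1024    # Mb -> MB, then MB -> GB -> TB
       printf "total %.1f Mbps -> %.2f TB/day\n", mbps, tb_day
     }'

So the conversion itself checks out at ~99 TB/day. Note, though, that this sums in + out
across all twelve servers, so OSS-to-OSS traffic such as the migration is counted at both
ends and client reads count on top of writes; the total moved through the NICs can
legitimately dwarf the 3-4 TB of new data on disk.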
-----Original Message-----
From: hpdd-discuss-bounces(a)lists.01.org [mailto:hpdd-discuss-bounces@lists.01.org] On
Behalf Of Kumar, Amit
Sent: Sunday, June 30, 2013 3:31 PM
To: Jones, Peter A; hpdd-discuss(a)lists.01.org
Subject: Re: [HPDD-discuss] Serious Performance Problems, Need help!!!
Hi Peter,
I missed the most important one :)
MDS
# rpm -qa | grep lustre
kernel-lustre-2.6.18-128.7.1.el5_lustre.1.8.1.1
kernel-devel-2.6.18-194.3.1.el5_lustre.1.8.4
lustre-1.8.5-2.6.18_194.17.1.el5_lustre.1.8.5
kernel-devel-2.6.18-194.17.1.el5_lustre.1.8.5
kernel-lustre-devel-2.6.18-128.7.1.el5_lustre.1.8.1.1
kernel-2.6.18-194.3.1.el5_lustre.1.8.4
lustre-modules-1.8.5-2.6.18_194.17.1.el5_lustre.1.8.5
lustre-ldiskfs-3.1.4-2.6.18_194.17.1.el5_lustre.1.8.5
kernel-2.6.18-194.17.1.el5_lustre.1.8.5
OSS:
lustre-1.8.5-2.6.18_194.17.1.el5_lustre.1.8.5
lustre-modules-1.8.5-2.6.18_194.17.1.el5_lustre.1.8.5
lustre-ldiskfs-3.1.4-2.6.18_194.17.1.el5_lustre.1.8.5
kernel-2.6.18-194.17.1.el5_lustre.1.8.5
Clients:
# rpm -qa | grep lustre
lustre-modules-1.8.7-2.6.18_308.8.1.el5_201206071003
lustre-1.8.7-2.6.18_308.8.1.el5_201206071003
Thank you for your response.
Amit
-----Original Message-----
From: hpdd-discuss-bounces(a)lists.01.org [mailto:hpdd-discuss-bounces@lists.01.org] On
Behalf Of Jones, Peter A
Sent: Sunday, June 30, 2013 3:25 PM
To: hpdd-discuss(a)lists.01.org
Subject: Re: [HPDD-discuss] Serious Performance Problems, Need help!!!
Amit
I imagine the first question anyone hoping to help will have will be "what version
of Lustre are you running?"
Peter
On 6/30/13 12:47 PM, "Kumar, Amit" <ahkumar@mail.smu.edu> wrote:
Dear Lustre,
We are having major performance problems this time, and it is hard to grasp what is going on.
Health checks all look good. The network looks good. But performance is bad.
(a) The lfs df output at the end of this email shows a couple of OSTs temporarily
unavailable. That normally happens and they reconnect; this time they do reconnect, but
become unavailable again a short while later, over and over.
(b) Also included below are the outputs of the following commands from every OSS:
cat /proc/fs/lustre/devices
lctl get_param ost.*.ost_io.threads_max
lctl get_param ost.*.ost_io.threads_started
grep -i LBUG /var/log/messages
cat /proc/fs/lustre/health_check
cat /proc/sys/lnet/nis
(c) Based on the RPC stats attached to this email, it seems a large backlog of pages
pending write is probably causing this. The attached rpc_stats includes all OSTs.
(d) The LNET peer stats below also show a great deal of congestion with two of the
OSS nodes.
I am not sure how to approach reducing these performance problems. Almost every OSS is
seeing I/O wait, yet the backend storage also looks good.
Can anybody please advise on possible causes other than the file system being 88% full?
No changes were made to the system recently, except that, to replace the disks in one
OST, I had temporarily deactivated that OST while migrating its data off. Since this
problem started on Friday I have re-activated the deactivated OST, the plan being to add
an additional OSS and OSTs to load-balance and thereby relieve the performance issues.
That seemed to help a bit, but not much.
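For the record, the deactivate/migrate steps were the usual Lustre 1.8 pattern; a minimal
sketch, where the device index <N> and the OST name are placeholders rather than our
exact values:

# On the MDS, find the OSC device index for the OST being drained:
lctl dl | grep OST000b
# Deactivate it so no new objects are allocated on that OST:
lctl --device <N> deactivate
# On a client, list files with objects on that OST; each one is then
# copied to a temporary name and renamed back, which re-stripes it
# onto the remaining OSTs:
lfs find --obd smuhpc-OST000b_UUID /lustre
# Once the disks are replaced, re-activate:
lctl --device <N> activate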
Best,
Thank you,
Amit
Below here is the output of the following commands from each of the OSS:
cat /proc/fs/lustre/devices
lctl get_param ost.*.ost_io.threads_max
lctl get_param ost.*.ost_io.threads_started
grep -i LBUG /var/log/messages
cat /proc/fs/lustre/health_check
cat /proc/sys/lnet/nis
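(These were collected per node; a small loop like the following, assuming passwordless
ssh to the OSS hostnames used below, gathers the same set:)

for oss in array2 array2b array3 array3b array4 array4b \
           array5 array5b array6 array6b array8 array7; do
  echo "===== $oss ====="
  ssh "$oss" 'cat /proc/fs/lustre/devices;
              lctl get_param ost.*.ost_io.threads_max;
              lctl get_param ost.*.ost_io.threads_started;
              grep -i LBUG /var/log/messages;
              cat /proc/fs/lustre/health_check;
              cat /proc/sys/lnet/nis'
done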
array2
0 UP mgc MGC10.1.1.40@tcp 87942af4-c7b4-5695-4680-2a3a4f232054 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter smuhpc-OST001a smuhpc-OST001a_UUID 439
3 UP obdfilter smuhpc-OST0000 smuhpc-OST0000_UUID 439
4 UP obdfilter smuhpc-OST0001 smuhpc-OST0001_UUID 439
ost.OSS.ost_io.threads_max=512
ost.OSS.ost_io.threads_started=367
healthy
nid status alive refs peer rtr max tx min
0@lo up 0 2 0 0 0 0 0
10.1.1.51@tcp up -1 225 8 0 256 256 -1512
array2b
0 UP mgc MGC10.1.1.40@tcp f4072991-d501-f944-10b0-4c6a460c9c6d 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter smuhpc-OST0002 smuhpc-OST0002_UUID 439
3 UP obdfilter smuhpc-OST0003 smuhpc-OST0003_UUID 438
4 UP obdfilter smuhpc-OST0008 smuhpc-OST0008_UUID 439
ost.OSS.ost_io.threads_max=128
ost.OSS.ost_io.threads_started=64
healthy
nid status alive refs peer rtr max tx min
0@lo up 0 2 0 0 0 0 0
10.1.1.54@tcp up -1 225 8 0 256 256 -306
array3
0 UP mgc MGC10.1.1.40@tcp 524536bc-fb4f-bed5-6e55-924aa46112d1 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter smuhpc-OST0004 smuhpc-OST0004_UUID 439
3 UP obdfilter smuhpc-OST0005 smuhpc-OST0005_UUID 439
4 UP obdfilter smuhpc-OST0006 smuhpc-OST0006_UUID 439
ost.OSS.ost_io.threads_max=512
ost.OSS.ost_io.threads_started=362
healthy
nid status alive refs peer rtr max tx min
0@lo up 0 2 0 0 0 0 0
10.1.1.52@tcp up -1 225 8 0 256 256 -1037
array3b
0 UP mgc MGC10.1.1.40@tcp 00fdbef3-fd0c-18db-637b-eb869eb99309 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter smuhpc-OST0007 smuhpc-OST0007_UUID 439
3 UP obdfilter smuhpc-OST0011 smuhpc-OST0011_UUID 439
4 UP obdfilter smuhpc-OST0012 smuhpc-OST0012_UUID 439
ost.OSS.ost_io.threads_max=512
ost.OSS.ost_io.threads_started=293
healthy
nid status alive refs peer rtr max tx min
0@lo up 0 2 0 0 0 0 0
10.1.1.55@tcp up -1 225 8 0 256 256 -147
array4
0 UP mgc MGC10.1.1.40@tcp b90bd48b-3f2f-aa60-a1a4-e743ce1d4025 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter smuhpc-OST000b smuhpc-OST000b_UUID 439
3 UP obdfilter smuhpc-OST000c smuhpc-OST000c_UUID 439
4 UP obdfilter smuhpc-OST000d smuhpc-OST000d_UUID 439
ost.OSS.ost_io.threads_max=512
ost.OSS.ost_io.threads_started=512
healthy
nid status alive refs peer rtr max tx min
0@lo up 0 2 0 0 0 0 0
10.1.1.53@tcp up -1 225 8 0 256 256 -966
array4b
0 UP mgc MGC10.1.1.40@tcp 1b31358c-ffc6-ca4d-14ea-78bf8804a15a 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter smuhpc-OST000e smuhpc-OST000e_UUID 439
3 UP obdfilter smuhpc-OST001c smuhpc-OST001c_UUID 437
4 UP obdfilter smuhpc-OST001d smuhpc-OST001d_UUID 437
ost.OSS.ost_io.threads_max=512
ost.OSS.ost_io.threads_started=512
healthy
nid status alive refs peer rtr max tx min
0@lo up 0 2 0 0 0 0 0
10.1.1.56@tcp up -1 223 8 0 256 256 -655
array5
0 UP mgc MGC10.1.1.40@tcp 0bdb83f9-dbf5-aeaa-ff7d-66c0b6471811 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter smuhpc-OST0009 smuhpc-OST0009_UUID 439
3 UP obdfilter smuhpc-OST000a smuhpc-OST000a_UUID 439
4 UP obdfilter smuhpc-OST000f smuhpc-OST000f_UUID 439
ost.OSS.ost_io.threads_max=512
ost.OSS.ost_io.threads_started=512
healthy
nid status alive refs peer rtr max tx min
0@lo up 0 2 0 0 0 0 0
10.1.1.57@tcp up -1 225 8 0 256 256 -385
array5b
0 UP mgc MGC10.1.1.40@tcp d5982303-1e80-3ba5-c88b-b712e2d7c7af 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter smuhpc-OST0010 smuhpc-OST0010_UUID 439
3 UP obdfilter smuhpc-OST0017 smuhpc-OST0017_UUID 439
4 UP obdfilter smuhpc-OST001b smuhpc-OST001b_UUID 439
ost.OSS.ost_io.threads_max=512
ost.OSS.ost_io.threads_started=312
healthy
nid status alive refs peer rtr max tx min
0@lo up 0 2 0 0 0 0 0
10.1.1.58@tcp up -1 225 8 0 256 249 -383
array6
0 UP mgc MGC10.1.1.40@tcp 624e193a-3f28-2936-14e8-a3ff130bcd0f 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter smuhpc-OST0030 smuhpc-OST0030_UUID 436
3 UP obdfilter smuhpc-OST0031 smuhpc-OST0031_UUID 436
4 UP obdfilter smuhpc-OST0032 smuhpc-OST0032_UUID 436
ost.OSS.ost_io.threads_max=512
ost.OSS.ost_io.threads_started=128
healthy
nid status alive refs peer rtr max tx min
0@lo up 0 2 0 0 0 0 0
10.1.1.59@tcp up -1 225 8 0 256 238 -475
array6b
0 UP mgc MGC10.1.1.40@tcp 1854af7c-31b9-43c5-058c-4953afb936bb 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter smuhpc-OST0033 smuhpc-OST0033_UUID 436
3 UP obdfilter smuhpc-OST0034 smuhpc-OST0034_UUID 436
4 UP obdfilter smuhpc-OST0035 smuhpc-OST0035_UUID 436
ost.OSS.ost_io.threads_max=256
ost.OSS.ost_io.threads_started=128
healthy
nid status alive refs peer rtr max tx min
0@lo up 0 2 0 0 0 0 0
10.1.1.60@tcp up -1 225 8 0 256 239 -217
array8
0 UP mgc MGC10.1.1.40@tcp a6744840-8c1a-cd8a-487b-db2e9efbf856 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter smuhpc-OST0013 smuhpc-OST0013_UUID 439
3 UP obdfilter smuhpc-OST0014 smuhpc-OST0014_UUID 439
4 UP obdfilter smuhpc-OST0015 smuhpc-OST0015_UUID 439
5 UP obdfilter smuhpc-OST0016 smuhpc-OST0016_UUID 439
6 UP obdfilter smuhpc-OST0018 smuhpc-OST0018_UUID 439
7 UP obdfilter smuhpc-OST0019 smuhpc-OST0019_UUID 439
ost.OSS.ost_io.threads_max=512
ost.OSS.ost_io.threads_started=512
healthy
nid status alive refs peer rtr max tx min
0@lo up 0 2 0 0 0 0 0
10.1.1.62@tcp up -1 225 8 0 256 256 -673
array7
0 UP mgc MGC10.1.1.40@tcp 315ffeaf-3075-24d7-2a01-02e062b60e34 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter smuhpc-OST0036 smuhpc-OST0036_UUID 437
3 UP obdfilter smuhpc-OST0037 smuhpc-OST0037_UUID 437
4 UP obdfilter smuhpc-OST0038 smuhpc-OST0038_UUID 437
5 UP obdfilter smuhpc-OST0039 smuhpc-OST0039_UUID 437
6 UP obdfilter smuhpc-OST003a smuhpc-OST003a_UUID 437
7 UP obdfilter smuhpc-OST003b smuhpc-OST003b_UUID 437
ost.OSS.ost_io.threads_max=256
ost.OSS.ost_io.threads_started=64
healthy
nid status alive refs peer rtr max tx min
0@lo up 0 2 0 0 0 0 0
10.1.1.61@tcp up -1 225 8 0 256 251 -1421
MGS/MDS_NODE# cat /proc/sys/lnet/peers | grep "10\.1\.1\." (below are our OSS NIDs; you
can see congestion on two of them, even though health_check reports healthy)
10.1.1.51@tcp 1 up 8 8 8 8 -6732 0
10.1.1.52@tcp 1 up 8 8 8 8 -2753 0
10.1.1.53@tcp 1 up 8 8 8 8 -4 0
10.1.1.54@tcp 1 up 8 8 8 8 -40 0
10.1.1.55@tcp 1 up 8 8 8 8 -7 0
10.1.1.56@tcp 1 up 8 8 8 8 0 0
10.1.1.57@tcp 1 up 8 8 8 8 -4 0
10.1.1.58@tcp 1 up 8 8 8 8 -6 0
10.1.1.59@tcp 1 up 8 8 8 8 -2 0
10.1.1.60@tcp 1 up 8 8 8 8 -1 0
10.1.1.61@tcp 1 up 8 8 8 8 -15 0
10.1.1.62@tcp 1 up 8 8 8 8 -11 0
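To pull those out without eyeballing the table, the big negative values in the
second-to-last column (the minimum tx credits seen, if I am reading /proc/sys/lnet/peers
correctly) can be filtered; the -100 threshold is arbitrary:

cat /proc/sys/lnet/peers | awk '$8 < -100 { print $1, "min tx credits:", $8 }'

On the output above this flags only 10.1.1.51@tcp (-6732) and 10.1.1.52@tcp (-2753).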
=======More LOGS from MDS/MGS ====
# grep '[0-9]' /proc/fs/lustre/osc/*/kbytes{free,avail,total}
/proc/fs/lustre/osc/smuhpc-OST0000-osc/kbytesfree:514058156
/proc/fs/lustre/osc/smuhpc-OST0001-osc/kbytesfree:765667120
/proc/fs/lustre/osc/smuhpc-OST0002-osc/kbytesfree:1096019280
grep: /proc/fs/lustre/osc/smuhpc-OST0003-osc/kbytesfree: Cannot send after transport
endpoint shutdown
/proc/fs/lustre/osc/smuhpc-OST0004-osc/kbytesfree:1577637660
/proc/fs/lustre/osc/smuhpc-OST0005-osc/kbytesfree:132305164
/proc/fs/lustre/osc/smuhpc-OST0006-osc/kbytesfree:899697048
/proc/fs/lustre/osc/smuhpc-OST0007-osc/kbytesfree:857944436
/proc/fs/lustre/osc/smuhpc-OST0008-osc/kbytesfree:36161928
/proc/fs/lustre/osc/smuhpc-OST0009-osc/kbytesfree:39061480
/proc/fs/lustre/osc/smuhpc-OST000a-osc/kbytesfree:938678228
/proc/fs/lustre/osc/smuhpc-OST000b-osc/kbytesfree:8604452
/proc/fs/lustre/osc/smuhpc-OST000c-osc/kbytesfree:44878900
/proc/fs/lustre/osc/smuhpc-OST000d-osc/kbytesfree:1117771508
/proc/fs/lustre/osc/smuhpc-OST000e-osc/kbytesfree:769454268
/proc/fs/lustre/osc/smuhpc-OST000f-osc/kbytesfree:56939372
/proc/fs/lustre/osc/smuhpc-OST0010-osc/kbytesfree:210416704
/proc/fs/lustre/osc/smuhpc-OST0011-osc/kbytesfree:1315953944
/proc/fs/lustre/osc/smuhpc-OST0012-osc/kbytesfree:1112498952
/proc/fs/lustre/osc/smuhpc-OST0013-osc/kbytesfree:917528092
/proc/fs/lustre/osc/smuhpc-OST0014-osc/kbytesfree:818228736
/proc/fs/lustre/osc/smuhpc-OST0015-osc/kbytesfree:119717344
/proc/fs/lustre/osc/smuhpc-OST0016-osc/kbytesfree:818664044
/proc/fs/lustre/osc/smuhpc-OST0017-osc/kbytesfree:1307525340
/proc/fs/lustre/osc/smuhpc-OST0018-osc/kbytesfree:561629216
/proc/fs/lustre/osc/smuhpc-OST0019-osc/kbytesfree:682050424
/proc/fs/lustre/osc/smuhpc-OST001a-osc/kbytesfree:1262541880
/proc/fs/lustre/osc/smuhpc-OST001b-osc/kbytesfree:864048788
/proc/fs/lustre/osc/smuhpc-OST001c-osc/kbytesfree:511371988
/proc/fs/lustre/osc/smuhpc-OST001d-osc/kbytesfree:109860844
grep: /proc/fs/lustre/osc/smuhpc-OST0030-osc/kbytesfree: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0031-osc/kbytesfree: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0032-osc/kbytesfree: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0033-osc/kbytesfree: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0034-osc/kbytesfree: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0035-osc/kbytesfree: Cannot send after transport
endpoint shutdown
/proc/fs/lustre/osc/smuhpc-OST0036-osc/kbytesfree:718292640
/proc/fs/lustre/osc/smuhpc-OST0037-osc/kbytesfree:472531244
/proc/fs/lustre/osc/smuhpc-OST0038-osc/kbytesfree:433755684
/proc/fs/lustre/osc/smuhpc-OST0039-osc/kbytesfree:875580388
/proc/fs/lustre/osc/smuhpc-OST003a-osc/kbytesfree:1161276948
grep: /proc/fs/lustre/osc/smuhpc-OST003b-osc/kbytesfree: Resource temporarily
unavailable
/proc/fs/lustre/osc/smuhpc-OST0000-osc/kbytesavail:514033840
/proc/fs/lustre/osc/smuhpc-OST0001-osc/kbytesavail:765639756
/proc/fs/lustre/osc/smuhpc-OST0002-osc/kbytesavail:1095950892
grep: /proc/fs/lustre/osc/smuhpc-OST0003-osc/kbytesavail: Cannot send after transport
endpoint shutdown
/proc/fs/lustre/osc/smuhpc-OST0004-osc/kbytesavail:1577629868
/proc/fs/lustre/osc/smuhpc-OST0005-osc/kbytesavail:132295072
/proc/fs/lustre/osc/smuhpc-OST0006-osc/kbytesavail:899689368
/proc/fs/lustre/osc/smuhpc-OST0007-osc/kbytesavail:857942648
/proc/fs/lustre/osc/smuhpc-OST0008-osc/kbytesavail:36140876
/proc/fs/lustre/osc/smuhpc-OST0009-osc/kbytesavail:38998500
/proc/fs/lustre/osc/smuhpc-OST000a-osc/kbytesavail:938670344
/proc/fs/lustre/osc/smuhpc-OST000b-osc/kbytesavail:8593840
/proc/fs/lustre/osc/smuhpc-OST000c-osc/kbytesavail:44876596
/proc/fs/lustre/osc/smuhpc-OST000d-osc/kbytesavail:1117758504
/proc/fs/lustre/osc/smuhpc-OST000e-osc/kbytesavail:769447360
/proc/fs/lustre/osc/smuhpc-OST000f-osc/kbytesavail:56922292
/proc/fs/lustre/osc/smuhpc-OST0010-osc/kbytesavail:210406920
/proc/fs/lustre/osc/smuhpc-OST0011-osc/kbytesavail:1315948464
/proc/fs/lustre/osc/smuhpc-OST0012-osc/kbytesavail:1112487208
/proc/fs/lustre/osc/smuhpc-OST0013-osc/kbytesavail:917520972
/proc/fs/lustre/osc/smuhpc-OST0014-osc/kbytesavail:818200064
/proc/fs/lustre/osc/smuhpc-OST0015-osc/kbytesavail:119708876
/proc/fs/lustre/osc/smuhpc-OST0016-osc/kbytesavail:818659948
/proc/fs/lustre/osc/smuhpc-OST0017-osc/kbytesavail:1307516124
/proc/fs/lustre/osc/smuhpc-OST0018-osc/kbytesavail:561624584
/proc/fs/lustre/osc/smuhpc-OST0019-osc/kbytesavail:682045540
/proc/fs/lustre/osc/smuhpc-OST001a-osc/kbytesavail:1262529492
/proc/fs/lustre/osc/smuhpc-OST001b-osc/kbytesavail:863983524
/proc/fs/lustre/osc/smuhpc-OST001c-osc/kbytesavail:511362064
/proc/fs/lustre/osc/smuhpc-OST001d-osc/kbytesavail:109827908
grep: /proc/fs/lustre/osc/smuhpc-OST0030-osc/kbytesavail: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0031-osc/kbytesavail: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0032-osc/kbytesavail: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0033-osc/kbytesavail: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0034-osc/kbytesavail: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0035-osc/kbytesavail: Cannot send after transport
endpoint shutdown
/proc/fs/lustre/osc/smuhpc-OST0036-osc/kbytesavail:718253728
/proc/fs/lustre/osc/smuhpc-OST0037-osc/kbytesavail:472467152
/proc/fs/lustre/osc/smuhpc-OST0038-osc/kbytesavail:433729872
/proc/fs/lustre/osc/smuhpc-OST0039-osc/kbytesavail:875578332
/proc/fs/lustre/osc/smuhpc-OST003a-osc/kbytesavail:1161272852
grep: /proc/fs/lustre/osc/smuhpc-OST003b-osc/kbytesavail: Resource temporarily
unavailable
/proc/fs/lustre/osc/smuhpc-OST0000-osc/kbytestotal:11538687128
/proc/fs/lustre/osc/smuhpc-OST0001-osc/kbytestotal:9612387536
/proc/fs/lustre/osc/smuhpc-OST0002-osc/kbytestotal:11534862728
grep: /proc/fs/lustre/osc/smuhpc-OST0003-osc/kbytestotal: Cannot send after transport
endpoint shutdown
/proc/fs/lustre/osc/smuhpc-OST0004-osc/kbytestotal:11534862728
/proc/fs/lustre/osc/smuhpc-OST0005-osc/kbytestotal:11538687128
/proc/fs/lustre/osc/smuhpc-OST0006-osc/kbytestotal:9612387536
/proc/fs/lustre/osc/smuhpc-OST0007-osc/kbytestotal:11534862728
/proc/fs/lustre/osc/smuhpc-OST0008-osc/kbytestotal:9615574536
/proc/fs/lustre/osc/smuhpc-OST0009-osc/kbytestotal:11534862728
/proc/fs/lustre/osc/smuhpc-OST000a-osc/kbytestotal:11538687128
/proc/fs/lustre/osc/smuhpc-OST000b-osc/kbytestotal:11534862728
/proc/fs/lustre/osc/smuhpc-OST000c-osc/kbytestotal:11538687128
/proc/fs/lustre/osc/smuhpc-OST000d-osc/kbytestotal:9612387536
/proc/fs/lustre/osc/smuhpc-OST000e-osc/kbytestotal:11534862728
/proc/fs/lustre/osc/smuhpc-OST000f-osc/kbytestotal:9612387536
/proc/fs/lustre/osc/smuhpc-OST0010-osc/kbytestotal:11534862728
/proc/fs/lustre/osc/smuhpc-OST0011-osc/kbytestotal:11538687128
/proc/fs/lustre/osc/smuhpc-OST0012-osc/kbytestotal:9615574536
/proc/fs/lustre/osc/smuhpc-OST0013-osc/kbytestotal:13452678016
/proc/fs/lustre/osc/smuhpc-OST0014-osc/kbytestotal:13452678016
/proc/fs/lustre/osc/smuhpc-OST0015-osc/kbytestotal:11530866816
/proc/fs/lustre/osc/smuhpc-OST0016-osc/kbytestotal:13452678016
/proc/fs/lustre/osc/smuhpc-OST0017-osc/kbytestotal:11538687128
/proc/fs/lustre/osc/smuhpc-OST0018-osc/kbytestotal:13452678016
/proc/fs/lustre/osc/smuhpc-OST0019-osc/kbytestotal:11530866816
/proc/fs/lustre/osc/smuhpc-OST001a-osc/kbytestotal:11534862728
/proc/fs/lustre/osc/smuhpc-OST001b-osc/kbytestotal:9615574536
/proc/fs/lustre/osc/smuhpc-OST001c-osc/kbytestotal:11538687128
/proc/fs/lustre/osc/smuhpc-OST001d-osc/kbytestotal:9615574536
grep: /proc/fs/lustre/osc/smuhpc-OST0030-osc/kbytestotal: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0031-osc/kbytestotal: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0032-osc/kbytestotal: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0033-osc/kbytestotal: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0034-osc/kbytestotal: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0035-osc/kbytestotal: Cannot send after transport
endpoint shutdown
/proc/fs/lustre/osc/smuhpc-OST0036-osc/kbytestotal:11534862728
/proc/fs/lustre/osc/smuhpc-OST0037-osc/kbytestotal:11538687128
/proc/fs/lustre/osc/smuhpc-OST0038-osc/kbytestotal:11534862728
/proc/fs/lustre/osc/smuhpc-OST0039-osc/kbytestotal:11538687128
/proc/fs/lustre/osc/smuhpc-OST003a-osc/kbytestotal:9612387536
grep: /proc/fs/lustre/osc/smuhpc-OST003b-osc/kbytestotal: Resource temporarily
unavailable
# grep '[0-9]' /proc/fs/lustre/osc/*/files{free,total}
/proc/fs/lustre/osc/smuhpc-OST0000-osc/filesfree:128514539
/proc/fs/lustre/osc/smuhpc-OST0001-osc/filesfree:191416790
/proc/fs/lustre/osc/smuhpc-OST0002-osc/filesfree:274004820
grep: /proc/fs/lustre/osc/smuhpc-OST0003-osc/filesfree: Cannot send after transport
endpoint shutdown
/proc/fs/lustre/osc/smuhpc-OST0004-osc/filesfree:394395591
/proc/fs/lustre/osc/smuhpc-OST0005-osc/filesfree:33076291
/proc/fs/lustre/osc/smuhpc-OST0006-osc/filesfree:224911717
/proc/fs/lustre/osc/smuhpc-OST0007-osc/filesfree:214486110
/proc/fs/lustre/osc/smuhpc-OST0008-osc/filesfree:8856919
/proc/fs/lustre/osc/smuhpc-OST0009-osc/filesfree:9624045
/proc/fs/lustre/osc/smuhpc-OST000a-osc/filesfree:234669553
/proc/fs/lustre/osc/smuhpc-OST000b-osc/filesfree:2151113
/proc/fs/lustre/osc/smuhpc-OST000c-osc/filesfree:11219725
/proc/fs/lustre/osc/smuhpc-OST000d-osc/filesfree:279442892
/proc/fs/lustre/osc/smuhpc-OST000e-osc/filesfree:192357679
/proc/fs/lustre/osc/smuhpc-OST000f-osc/filesfree:14234843
/proc/fs/lustre/osc/smuhpc-OST0010-osc/filesfree:52604176
/proc/fs/lustre/osc/smuhpc-OST0011-osc/filesfree:328988486
/proc/fs/lustre/osc/smuhpc-OST0012-osc/filesfree:278118850
/proc/fs/lustre/osc/smuhpc-OST0013-osc/filesfree:229382023
/proc/fs/lustre/osc/smuhpc-OST0014-osc/filesfree:204557180
/proc/fs/lustre/osc/smuhpc-OST0015-osc/filesfree:29929336
/proc/fs/lustre/osc/smuhpc-OST0016-osc/filesfree:204663451
/proc/fs/lustre/osc/smuhpc-OST0017-osc/filesfree:326881334
/proc/fs/lustre/osc/smuhpc-OST0018-osc/filesfree:140407304
/proc/fs/lustre/osc/smuhpc-OST0019-osc/filesfree:170512603
/proc/fs/lustre/osc/smuhpc-OST001a-osc/filesfree:315635470
/proc/fs/lustre/osc/smuhpc-OST001b-osc/filesfree:216012197
/proc/fs/lustre/osc/smuhpc-OST001c-osc/filesfree:127842996
/proc/fs/lustre/osc/smuhpc-OST001d-osc/filesfree:27465211
grep: /proc/fs/lustre/osc/smuhpc-OST0030-osc/filesfree: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0031-osc/filesfree: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0032-osc/filesfree: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0033-osc/filesfree: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0034-osc/filesfree: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0035-osc/filesfree: Cannot send after transport
endpoint shutdown
/proc/fs/lustre/osc/smuhpc-OST0036-osc/filesfree:179276300
/proc/fs/lustre/osc/smuhpc-OST0037-osc/filesfree:118132811
/proc/fs/lustre/osc/smuhpc-OST0038-osc/filesfree:108438921
/proc/fs/lustre/osc/smuhpc-OST0039-osc/filesfree:218891001
/proc/fs/lustre/osc/smuhpc-OST003a-osc/filesfree:290319237
grep: /proc/fs/lustre/osc/smuhpc-OST003b-osc/filesfree: Resource temporarily unavailable
/proc/fs/lustre/osc/smuhpc-OST0000-osc/filestotal:132250935
/proc/fs/lustre/osc/smuhpc-OST0001-osc/filestotal:193743661
/proc/fs/lustre/osc/smuhpc-OST0002-osc/filestotal:277776374
grep: /proc/fs/lustre/osc/smuhpc-OST0003-osc/filestotal: Cannot send after transport
endpoint shutdown
/proc/fs/lustre/osc/smuhpc-OST0004-osc/filestotal:396440189
/proc/fs/lustre/osc/smuhpc-OST0005-osc/filestotal:35044722
/proc/fs/lustre/osc/smuhpc-OST0006-osc/filestotal:226783057
/proc/fs/lustre/osc/smuhpc-OST0007-osc/filestotal:217558056
/proc/fs/lustre/osc/smuhpc-OST0008-osc/filestotal:11184523
/proc/fs/lustre/osc/smuhpc-OST0009-osc/filestotal:12231803
/proc/fs/lustre/osc/smuhpc-OST000a-osc/filestotal:237327760
/proc/fs/lustre/osc/smuhpc-OST000b-osc/filestotal:4003238
/proc/fs/lustre/osc/smuhpc-OST000c-osc/filestotal:12981815
/proc/fs/lustre/osc/smuhpc-OST000d-osc/filestotal:281106176
/proc/fs/lustre/osc/smuhpc-OST000e-osc/filestotal:195190328
/proc/fs/lustre/osc/smuhpc-OST000f-osc/filestotal:16476504
/proc/fs/lustre/osc/smuhpc-OST0010-osc/filestotal:55089005
/proc/fs/lustre/osc/smuhpc-OST0011-osc/filestotal:330925776
/proc/fs/lustre/osc/smuhpc-OST0012-osc/filestotal:279931713
/proc/fs/lustre/osc/smuhpc-OST0013-osc/filestotal:231994647
/proc/fs/lustre/osc/smuhpc-OST0014-osc/filestotal:206633272
/proc/fs/lustre/osc/smuhpc-OST0015-osc/filestotal:31825125
/proc/fs/lustre/osc/smuhpc-OST0016-osc/filestotal:206716377
/proc/fs/lustre/osc/smuhpc-OST0017-osc/filestotal:329458681
/proc/fs/lustre/osc/smuhpc-OST0018-osc/filestotal:142450211
/proc/fs/lustre/osc/smuhpc-OST0019-osc/filestotal:172358875
/proc/fs/lustre/osc/smuhpc-OST001a-osc/filestotal:318295996
/proc/fs/lustre/osc/smuhpc-OST001b-osc/filestotal:218409008
/proc/fs/lustre/osc/smuhpc-OST001c-osc/filestotal:129660363
/proc/fs/lustre/osc/smuhpc-OST001d-osc/filestotal:29250131
grep: /proc/fs/lustre/osc/smuhpc-OST0030-osc/filestotal: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0031-osc/filestotal: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0032-osc/filestotal: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0033-osc/filestotal: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0034-osc/filestotal: Cannot send after transport
endpoint shutdown
grep: /proc/fs/lustre/osc/smuhpc-OST0035-osc/filestotal: Cannot send after transport
endpoint shutdown
/proc/fs/lustre/osc/smuhpc-OST0036-osc/filestotal:180363444
/proc/fs/lustre/osc/smuhpc-OST0037-osc/filestotal:119215177
/proc/fs/lustre/osc/smuhpc-OST0038-osc/filestotal:109491390
/proc/fs/lustre/osc/smuhpc-OST0039-osc/filestotal:220028416
/proc/fs/lustre/osc/smuhpc-OST003a-osc/filestotal:291302059
grep: /proc/fs/lustre/osc/smuhpc-OST003b-osc/filestotal: Resource temporarily
unavailable
# grep '[0-9]' /proc/fs/lustre/mds/*/kbytes{free,avail,total}
/proc/fs/lustre/mds/smuhpc-MDT0000/kbytesfree:677031612
/proc/fs/lustre/mds/smuhpc-MDT0000/kbytesavail:677031612
/proc/fs/lustre/mds/smuhpc-MDT0000/kbytestotal:688076544
# grep '[0-9]' /proc/fs/lustre/mds/*/files{free,total}
/proc/fs/lustre/mds/smuhpc-MDT0000/filesfree:154564006
/proc/fs/lustre/mds/smuhpc-MDT0000/filestotal:196608000
# lfs df
UUID 1K-blocks Used Available Use% Mounted on
smuhpc-MDT0000_UUID 688076544 11044932 677031612 2% /lustre[MDT:0]
smuhpc-OST0000_UUID 11538687128 11024628972 514033840 96% /lustre[OST:0]
smuhpc-OST0001_UUID 9612387536 8846797188 765561148 92% /lustre[OST:1]
smuhpc-OST0002_UUID 11534862728 10438969420 1095808024 90% /lustre[OST:2]
smuhpc-OST0003_UUID 11538687128 10237552284 1301088824 89% /lustre[OST:3]
smuhpc-OST0004_UUID 11534862728 9957280360 1577573564 86% /lustre[OST:4]
smuhpc-OST0005_UUID 11538687128 11406381964 132295072 99% /lustre[OST:5]
smuhpc-OST0006_UUID 9612387536 8712821564 899477908 91% /lustre[OST:6]
smuhpc-OST0007_UUID 11534862728 10676918312 857941880 93% /lustre[OST:7]
smuhpc-OST0008_UUID 9615574536 9587388940 28150780 100% /lustre[OST:8]
smuhpc-OST0009_UUID 11534862728 11502157276 32703404 100% /lustre[OST:9]
smuhpc-OST000a_UUID 11538687128 10600008916 938671044 92% /lustre[OST:10]
smuhpc-OST000b_UUID 11534862728 11526258276 8593840 100% /lustre[OST:11]
smuhpc-OST000c_UUID 11538687128 11493808228 44876596 100% /lustre[OST:12]
smuhpc-OST000d_UUID 9612387536 8494655908 1117723728 88% /lustre[OST:13]
smuhpc-OST000e_UUID 11534862728 10765432012 769356988 93% /lustre[OST:14]
smuhpc-OST000f_UUID 9612387536 9555448164 56922292 99% /lustre[OST:15]
smuhpc-OST0010_UUID 11534862728 11324446024 210408036 98% /lustre[OST:16]
smuhpc-OST0011_UUID 11538687128 10222777216 1315902940 89% /lustre[OST:17]
smuhpc-OST0012_UUID 9615574536 8503099372 1112462716 88% /lustre[OST:18]
smuhpc-OST0013_UUID 13452678016 12535227748 917441212 93% /lustre[OST:19]
smuhpc-OST0014_UUID 13452678016 12634464700 818178500 94% /lustre[OST:20]
smuhpc-OST0015_UUID 11530866816 11411149472 119708876 99% /lustre[OST:21]
smuhpc-OST0016_UUID 13452678016 12639406780 813217988 94% /lustre[OST:22]
smuhpc-OST0017_UUID 11538687128 10231302100 1307378884 89% /lustre[OST:23]
smuhpc-OST0018_UUID 13452678016 12891084320 561588224 96% /lustre[OST:24]
smuhpc-OST0019_UUID 11530866816 10848816880 682044044 94% /lustre[OST:25]
smuhpc-OST001a_UUID 11534862728 10272389460 1262461836 89% /lustre[OST:26]
smuhpc-OST001b_UUID 9615574536 8751547300 864019044 91% /lustre[OST:27]
smuhpc-OST001c_UUID 11538687128 11027353036 511322908 96% /lustre[OST:28]
smuhpc-OST001d_UUID 9615574536 9505741344 109832588 99% /lustre[OST:29]
smuhpc-OST0030_UUID 11534862728 7027461656 4507389808 61% /lustre[OST:48]
smuhpc-OST0031_UUID 11538687128 2208373512 9330099084 19% /lustre[OST:49]
smuhpc-OST0032_UUID 9612387536 5795054380 3817314724 60% /lustre[OST:50]
smuhpc-OST0033_UUID 11534862728 7414666856 4120177440 64% /lustre[OST:51]
smuhpc-OST0034_UUID 11538687128 7489405512 4049271072 65% /lustre[OST:52]
smuhpc-OST0035_UUID 9615574536 6709760396 2905811696 70% /lustre[OST:53]
smuhpc-OST0036_UUID 11534862728 10824985124 709871280 94% /lustre[OST:54]
smuhpc-OST0037_UUID : Resource temporarily unavailable
smuhpc-OST0038_UUID : Resource temporarily unavailable
smuhpc-OST0039_UUID 11538687128 10663152820 875526472 92% /lustre[OST:57]
smuhpc-OST003a_UUID 9612387536 8451144380 1161159188 88% /lustre[OST:58]
smuhpc-OST003b_UUID 9615574536 9396687868 218849404 98% /lustre[OST:59]
filesystem summary: 446049266544 393606006040 52442216896 88% /lustre
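Given the imbalance visible above (the original OSTs at 88-100% full while the new
OST0030-OST0035 sit at 19-70%), a quick filter for the nearly-full ones, with an
arbitrary 95% cutoff:

# Print OST name and usage for every OST at or above 95% full:
lfs df | awk '/OST/ { pct = $5; sub(/%/, "", pct); if (pct + 0 >= 95) print $1, $5 }'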
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss(a)lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss