Lustre and kernel buffer interaction
by John Bauer
I have been trying to understand a behavior I am observing in an IOR
benchmark on Lustre. I have pared it down to a simple example.
The IOR benchmark is running in MPI mode. There are 2 ranks, each
running on its own node. Each rank does the following:
- write a file (10 GB)
- fsync the file
- close the file
- MPI_Barrier
- open the file that was written by the other rank
- read the file that was written by the other rank
- close the file that was written by the other rank
(Note: the test was run on the "swan" cluster at Cray Inc., using /lus/scratch. A rough sketch of this pattern is shown below.)
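Roughly, the run corresponds to an IOR invocation like the following (a sketch only, not the exact command line used; the launcher, block/transfer sizes, and output path are illustrative, and option letters may differ between IOR versions):
# Each of the 2 ranks writes its own 10 GiB file (-F -b 10g), fsyncs it (-e),
# then reads back the file written by the other rank (-C shifts task ordering
# for the read phase).
mpirun -np 2 ior -w -r -e -C -F -t 1m -b 10g -o /lus/scratch/ior_testfile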
The writing of each file goes as expected.
The fsync takes very little time (about 0.05 seconds).
The first reads of the file (written by the other rank) start out *very* slowly. While these first reads are proceeding slowly, the kernel's cached memory (the Cached: line in /proc/meminfo) decreases from the size of the file just written to nearly zero. Once the cached memory has reached nearly zero, the file reading proceeds as expected.
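The cache drain can be watched directly on the reading node while the read phase runs; something like this simple sketch:
# Sample the page-cache size once per second while the reads are in progress.
while sleep 1; do grep '^Cached:' /proc/meminfo; done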
I have attached a jpg of the instrumentation of the processes that
illustrates this behavior.
My questions are:
Why does the reading of the file written by the other rank wait until the cached data drains to nearly zero before proceeding normally?
Shouldn't the fsync ensure that the file's data is written to the backing storage, so that draining the cached memory is simply a matter of releasing pages with no further I/O?
In this case the "dead" time is only about 4 seconds, but it scales directly with the size of the files.
John
--
John Bauer
I/O Doctors LLC
507-766-0378
bauerj(a)iodoctors.com
7 years, 2 months
quotas on 2.4.3
by Matt Bettinger
Hello,
We have a fresh Lustre 2.4.3 upgrade, not yet in production, running on RHEL 6.4.
We would like to take a look at quotas, but it looks like there are some major performance problems with 1.8.9 clients.
Here is how I enabled quotas:
[root@lfs-mds-0-0 ~]# lctl conf_param lustre2.quota.mdt=ug
[root@lfs-mds-0-0 ~]# lctl conf_param lustre2.quota.ost=ug
[root@lfs-mds-0-0 ~]# lctl get_param osd-*.*.quota_slave.info
osd-ldiskfs.lustre2-MDT0000.quota_slave.info=
target name: lustre2-MDT0000
pool ID: 0
type: md
quota enabled: ug
conn to master: setup
space acct: ug
user uptodate: glb[1],slv[1],reint[0]
group uptodate: glb[1],slv[1],reint[0]
The quotas seem to be working; however, the write performance from a 1.8.9-wc client to 2.4.3 with quotas enabled is horrific. Am I not setting quotas up correctly?
I try to set a simple user quota on the /lustre2/mattb/300MB_QUOTA directory:
[root@hous0036 mattb]# lfs setquota -u l0363734 -b 307200 -B 309200 -i
10000 -I 11000 /lustre2/mattb/300MB_QUOTA/
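(For clarity, the same command annotated; the bare /lustre2 mount point below is an assumption about where the filesystem is mounted:)
# -b/-B: block soft/hard limits (KB); -i/-I: inode soft/hard limits.
# The path argument identifies the filesystem; as far as I understand, the
# limits apply to the user across the whole filesystem, not just this directory.
lfs setquota -u l0363734 -b 307200 -B 309200 -i 10000 -I 11000 /lustre2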
We can see the quota change is in effect:
[root@hous0036 mattb]# lfs quota -u l0363734 /lustre2/mattb/300MB_QUOTA/
Disk quotas for user l0363734 (uid 1378):
Filesystem kbytes quota limit grace files quota limit grace
/lustre2/mattb/300MB_QUOTA/
310292* 307200 309200 - 4 10000 11000 -
Try to write to the quota directory as the user, but get horrible write speed:
[l0363734@hous0036 300MB_QUOTA]$ dd if=/dev/zero of=301MB_FILE bs=1M count=301
301+0 records in
301+0 records out
315621376 bytes (316 MB) copied, 61.7426 seconds, 5.1 MB/s
Try file number 2, and then the quota takes effect, or so it seems:
[l0363734@hous0036 300MB_QUOTA]$ dd if=/dev/zero of=301MB_FILE2 bs=1M count=301
dd: writing `301MB_FILE2': Disk quota exceeded
dd: closing output file `301MB_FILE2': Input/output error
If I disable quotas using
[root@lfs-mds-0-0 ~]# lctl conf_param lustre2.quota.mdt=none
[root@lfs-mds-0-0 ~]# lctl conf_param lustre2.quota.oss=none
Then, when trying to write the same file, the speeds are more like what we expect, but we can't use quotas.
[l0363734@hous0036 300MB_QUOTA]$ dd if=/dev/zero of=301MB_FILE2 bs=1M count=301
301+0 records in
301+0 records out
315621376 bytes (316 MB) copied, 0.965009 seconds, 327 MB/s
I have not tried this with a 2.4 client yet, since all of our nodes are 1.8.X until we rebuild our images.
I was going by the manual at
http://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact...
but it looks like I am running into an interoperability issue (which I thought I had fixed by using the 1.8.9-wc client) or am just not configuring this correctly.
Thanks!
MB
7 years, 5 months
New liblustreapi ?
by Simmons, James A.
Now that Lustre 2.7 is coming up soon, I would like to open a discussion on one of the directions we could go. Recently, several projects have sprung up that impact liblustreapi. During one of those discussions the idea of a new liblustreapi was brought up, a liblustreapi 2.0 you could say. So I would like to get a feel for where the community stands on this. If people want this proposal, I would recommend that we gradually build the new library alongside the original liblustreapi and link it to the Lustre utilities where necessary. First, I would like to discuss using the LGPL license for this new library. I look forward to the feedback.
7 years, 6 months
Moving the MDT performing File-level backups
by Ramiro Alba
Hello everybody,
I am currently using Lustre 1.8.5 on the servers, with a SLES kernel (2.6.32.19-0.2.1-lustre.1.8.5).
Before this year is over I'll upgrade to the current Lustre maintenance release (currently 2.4.x), but right now I need to arrange moving the MDT to another LUN on the MDS.
I did a test on an MDT LVM snapshot, following the moving procedure described below, but I found some issues I would like to comment on:
1) The backup using the tar command took a very long time (20 hours), though the MDT is quite small (197 MB). Should it take so long?
2) The restored 'ldiskfs' file system is a bit smaller than the original one (about 20 MB using du -sm). Should I worry?
3) When backing up extended attributes with the 'getfattr' command I get some errors of the type:
getfattr: ./ROOT/<file path>: No such file or directory
I could see that these are symbolic links using absolute paths. Can that be a problem?
Finally, I took the procedure below from the 1.8.x Lustre manual. Any comments?
**************************************************************
MDT MOVING PROCEDURE
**************************************************************
----------------------------------------------------------
- Backup procedure
----------------------------------------------------------
1) Mount lustre as 'ldiskfs' type
mount -t ldiskfs /dev/sdb /lustre/mds
2) Change to the file system mount point
cd /lustre/mds
3) Backup all the file system data
tar cSf /backup/mds.tar .
4) Backup Extended Attributes
getfattr -R -d -m '.*' -P . > /backup/ea.bak
5) Umount the file system
cd
umount /lustre/mds
----------------------------------------------------------
- Restore procedure
----------------------------------------------------------
1) Make a receiving lustre MDT
mkfs.lustre --fsname=jffstg --param mdt.quota_type=ug \
--reformat --mdt --mgs /dev/sdc
2) Mount lustre as 'ldiskfs' type
mount -t ldiskfs /dev/sdc /lustre/mds
3) Change to the file system mount point
cd /lustre/mds
4) Restore the previous tar file
tar xpSf /backup/mds.tar
5) Restore the file system extended attributes
setfattr --restore=/backup/ea.bak
6) Remove the recovery logs
rm OBJECTS/* CATALOGS
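As an optional sanity check after the restore (a sketch, assuming the dump from step 4 of the backup is still at /backup/ea.bak), the extended attributes can be re-dumped and compared:
cd /lustre/mds
getfattr -R -d -m '.*' -P . > /tmp/ea.restored
diff /backup/ea.bak /tmp/ea.restored    # should report no differences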
--
Ramiro Alba
Centre Tecnològic de Tranferència de Calor
http://www.cttc.upc.edu
Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 8928
7 years, 11 months
Re: [HPDD-discuss] [Lustre-discuss] Lustre ZFS Snapshot
by Dilger, Andreas
On 2014/06/29, 1:24 PM, "Indivar Nair" <indivar.nair(a)techterra.in> wrote:
> Referring to https://jira.hpdd.intel.com/browse/LUDOC-161
> The idea is to be able to backup and restore individual files / directories to another non-Lustre storage OR tape drive...
Well, the idea of LUDOC-161 is not to back up and restore individual files, but to back up and restore the whole MDT or OST.
> So, can one mount the MDT and OST snapshots to form a parallel Lustre Volume (on the same cluster) and take a point-in-time backup / rsync of the complete Lustre filesystem?
In theory yes, but this isn't yet supported. The problem is that the Lustre filesystem name is the same between the snapshots, so the snapshot filesystem cannot be mounted on the same clients as the original filesystem.
The device-level backup/zfs send/recv, especially of the MDT, is mostly intended for disaster recovery, though with ZFS it may also be practical to use it in the manner you are proposing.
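For reference, such a device-level, point-in-time copy of the MDT dataset might look like the following (pool/dataset names and the target host are purely illustrative):
zfs snapshot mdtpool/mdt0@backup-20140629
zfs send mdtpool/mdt0@backup-20140629 | ssh backuphost zfs recv backuppool/mdt0-copy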
> If not, can one mount a copy of MDT and OST snapshots on another Lustre cluster to recreate the complete Lustre Volume?
It would be OK to mount it on separate clients.
If you want to do a file-level restore, it may be better to do a file-level backup from the mounted filesystem.
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
7 years, 12 months
crash in 2.5.1
by Jerry Natowitz
Hello,
We recently built Lustre 2.5.1 with kernel 2.6.32-431.5.1.el6 for CentOS 6.5. We have been getting the crashes below whenever we try to run mkfs.lustre.
In the crash material you will notice that we are running mkfs.lustre.bin; this was done so we could first gather the full command lines. The open files, which are not on Lustre, seem to be getting corrupted by the crash.
I have already applied the patch associated with LU-4778, but it does not seem to have changed anything.
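(Roughly, the logging shim works like this; a sketch, not the exact script, and the log path is an assumption:)
#!/bin/sh
# /usr/sbin/mkfs.lustre: record the full command line, then run the real
# binary, which has been renamed to mkfs.lustre.bin.
echo "$(date): mkfs.lustre $*" >> /var/log/mkfs.lustre.cmdlines
exec /usr/sbin/mkfs.lustre.bin "$@"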
mkfs.lustre.bin --reformat --fsname hss3-rr1 --mdt --mgs --mkfsoptions='-m 0 -O mmp,extents,dir_index,uninit_groups' --mgsnode=10.2.101.1@o2ib0 --failnode=10.2.101.2@o2ib0 /dev/mapper/map00
[root@ts-hss3-rr1-01 ~]# sh doit
warning: /dev/mapper/map00: for Lustre 2.4 and later, the target index must be specified with --index
Permanent disk data:
Target: hss3-rr1:MDT0000
Index: 0
Lustre FS: hss3-rr1
Mount type: ldiskfs
Flags: 0x65
(MDT MGS first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=10.2.101.1@o2ib failover.node=10.2.101.2@o2ib
device size = 5717136MB
formatting backing filesystem ldiskfs on /dev/mapper/map00
target name hss3-rr1:MDT0000
4k blocks 1463586816
options -m 0 -J size=400 -I 512 -i 2048 -q -O mmp,extents,dir_index,uninit_groups,dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L hss3-rr1:MDT0000 -m 0 -J size=400 -I 512 -i 2048 -q -O mmp,extents,dir_index,uninit_groups,dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/mapper/map00 1463586816
LDISKFS-fs (dm-4): Can't enable usage tracking on a filesystem with the QUOTA feature set
LDISKFS-fs (dm-4): mount failed
------------[ cut here ]------------
WARNING: at fs/proc/generic.c:848 remove_proc_entry+0x1f5/0x217() (Not tainted)
Hardware name: PowerEdge R710
remove_proc_entry: removing non-empty directory 'ldiskfs/dm-4', leaking at least 'prealloc_table'
Modules linked in: ldiskfs(U) ohci_hcd ehci_hcd dell_rbu 8021q garp stp llc ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr mlx4_ib ib_sa ib_mad ib_core scsi_dh_rdac dcdbas sg rtc_cmos rtc_core rtc_lib thermal processor thermal_sys usbhid mlx4_en mlx4_core mpt2sas scsi_transport_sas raid_class megaraid_sas uhci_hcd qla4xxx iscsi_boot_sysfs libiscsi [last unloaded: scsi_wait_scan]
Pid: 12123, comm: mkfs.lustre.bin Not tainted 2.6.32-431.5.1.el6_lustre.2.5.1_1.0.1 #1
Call Trace:
[<ffffffff8117664c>] ? remove_proc_entry+0x1f5/0x217
[<ffffffff8104835b>] ? warn_slowpath_common+0x8d/0xa6
[<ffffffff81048466>] ? warn_slowpath_fmt+0x6e/0x70
[<ffffffff81523358>] ? schedule_timeout+0x2b/0x20a
[<ffffffff81175b63>] ? xlate_proc_name+0x49/0xaa
[<ffffffff81175b10>] ? proc_match+0x28/0x32
[<ffffffff8117664c>] ? remove_proc_entry+0x1f5/0x217
[<ffffffffa0218639>] ? ldiskfs_fill_super+0x249/0x2930 [ldiskfs]
[<ffffffff8111cfce>] ? sget+0x3dc/0x3ee
[<ffffffff8111d88d>] ? get_sb_bdev+0x12c/0x17a
[<ffffffffa02183f0>] ? ldiskfs_fill_super+0x0/0x2930 [ldiskfs]
[<ffffffffa0212858>] ? ldiskfs_get_sb+0x18/0x20 [ldiskfs]
[<ffffffff8111c946>] ? vfs_kern_mount+0x5f/0xde
[<ffffffff8111ca2c>] ? do_kern_mount+0x4c/0xf3
[<ffffffff81137242>] ? do_mount+0x6e7/0x78c
[<ffffffff81101985>] ? alloc_pages_current+0xa3/0xac
[<ffffffff8113736c>] ? sys_mount+0x85/0xbe
[<ffffffff81002adb>] ? system_call_fastpath+0x16/0x1b
---[ end trace ac77351ab2043638 ]---
BUG: unable to handle kernel NULL pointer dereference at 00000000000001e8
IP: [<ffffffffa02160d4>] ldiskfs_clear_inode+0x24/0x50 [ldiskfs]
PGD c2114d067 PUD c223f5067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/block/dm-4/range
CPU 2
Modules linked in: ldiskfs(U) ohci_hcd ehci_hcd dell_rbu 8021q garp stp llc ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr mlx4_ib ib_sa ib_mad ib_core scsi_dh_rdac dcdbas sg rtc_cmos rtc_core rtc_lib thermal processor thermal_sys usbhid mlx4_en mlx4_core mpt2sas scsi_transport_sas raid_class megaraid_sas uhci_hcd qla4xxx iscsi_boot_sysfs libiscsi [last unloaded: scsi_wait_scan]
Pid: 12123, comm: mkfs.lustre.bin Tainted: G W --------------- 2.6.32-431.5.1.el6_lustre.2.5.1_1.0.1 #1 Dell Inc. PowerEdge R710/00NH4P
RIP: 0010:[<ffffffffa02160d4>] [<ffffffffa02160d4>] ldiskfs_clear_inode+0x24/0x50 [ldiskfs]
RSP: 0018:ffff880625685c38 EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffff880c0b89c520 RCX: ffff880c265934c8
RDX: ffff880c0b89c130 RSI: ffff880c0b89c738 RDI: ffff880c0b89c520
RBP: ffff880625685c48 R08: 0000000000000001 R09: ffff880c0b89c568
R10: 7fffffffffffffff R11: 7fffffffffffffff R12: ffff880c0b89c658
R13: ffff880c23fb5860 R14: 0000000000000001 R15: 0000000000000000
FS: 00007f214e2de700(0000) GS:ffff880645420000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000001e8 CR3: 0000000c22311000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mkfs.lustre.bin (pid: 12123, threadinfo ffff880625684000, task ffff880625ebe080)
Stack:
ffff880c0b89c520 ffff880c0b89c520 ffff880625685c68 ffffffff81132097
<d> ffff880c0b89c520 0000000000000000 ffff880625685c88 ffffffff8113247c
<d> ffff880c0b89c520 ffff880c0b89c520 ffff880625685ca8 ffffffff81131af2
Call Trace:
[<ffffffff81132097>] clear_inode+0x98/0xf2
[<ffffffff8113247c>] generic_drop_inode+0x47/0x5b
[<ffffffff81131af2>] iput+0x66/0x6a
[<ffffffff8112fefb>] shrink_dcache_for_umount_subtree+0x1ef/0x243
[<ffffffff811309cb>] shrink_dcache_for_umount+0x3c/0x4d
[<ffffffff8111d005>] generic_shutdown_super+0x25/0x107
[<ffffffff8111d10e>] kill_block_super+0x27/0x3f
[<ffffffff8111c8c7>] deactivate_locked_super+0x42/0x62
[<ffffffff8111d89c>] get_sb_bdev+0x13b/0x17a
[<ffffffffa02183f0>] ? ldiskfs_fill_super+0x0/0x2930 [ldiskfs]
[<ffffffffa0212858>] ldiskfs_get_sb+0x18/0x20 [ldiskfs]
[<ffffffff8111c946>] vfs_kern_mount+0x5f/0xde
[<ffffffff8111ca2c>] do_kern_mount+0x4c/0xf3
[<ffffffff81137242>] do_mount+0x6e7/0x78c
[<ffffffff81101985>] ? alloc_pages_current+0xa3/0xac
[<ffffffff8113736c>] sys_mount+0x85/0xbe
[<ffffffff81002adb>] system_call_fastpath+0x16/0x1b
Code: 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 89 fb e8 2a 6d fe ff 48 8b 83 08 01 00 00 48 8b 80 88 02 00 00 <48> 8b b8 e8 01 00 00 48 85 ff 74 0c 48 8d b3 48 02 00 00 e8 9a
RIP [<ffffffffa02160d4>] ldiskfs_clear_inode+0x24/0x50 [ldiskfs]
RSP <ffff880625685c38>
CR2: 00000000000001e8
---[ end trace ac77351ab2043639 ]---
Kernel panic - not syncing: Fatal exception
Pid: 12123, comm: mkfs.lustre.bin Tainted: G D W --------------- 2.6.32-431.5.1.el6_lustre.2.5.1_1.0.1 #1
Call Trace:
[<ffffffff8104853c>] ? panic+0xd4/0x1ab
[<ffffffff8100356e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff81049c78>] ? kmsg_dump+0x126/0x140
[<ffffffff815259b8>] ? oops_end+0xb5/0xc5
[<ffffffff8102ca79>] ? no_context+0x1fa/0x209
[<ffffffff8102cbfc>] ? __bad_area_nosemaphore+0x174/0x197
[<ffffffff8102cc63>] ? __bad_area+0x44/0x4d
[<ffffffff8102cc94>] ? bad_area+0x13/0x15
[<ffffffff8152759b>] ? do_page_fault+0x264/0x456
[<ffffffff81048466>] ? warn_slowpath_fmt+0x6e/0x70
[<ffffffff81523358>] ? schedule_timeout+0x2b/0x20a
[<ffffffff81069120>] ? bit_waitqueue+0x17/0x9f
[<ffffffff81524e6f>] ? page_fault+0x1f/0x30
[<ffffffffa02160d4>] ? ldiskfs_clear_inode+0x24/0x50 [ldiskfs]
[<ffffffffa02160c6>] ? ldiskfs_clear_inode+0x16/0x50 [ldiskfs]
[<ffffffff81132097>] ? clear_inode+0x98/0xf2
[<ffffffff8113247c>] ? generic_drop_inode+0x47/0x5b
[<ffffffff81131af2>] ? iput+0x66/0x6a
[<ffffffff8112fefb>] ? shrink_dcache_for_umount_subtree+0x1ef/0x243
[<ffffffff811309cb>] ? shrink_dcache_for_umount+0x3c/0x4d
[<ffffffff8111d005>] ? generic_shutdown_super+0x25/0x107
[<ffffffff8111d10e>] ? kill_block_super+0x27/0x3f
[<ffffffff8111c8c7>] ? deactivate_locked_super+0x42/0x62
[<ffffffff8111d89c>] ? get_sb_bdev+0x13b/0x17a
[<ffffffffa02183f0>] ? ldiskfs_fill_super+0x0/0x2930 [ldiskfs]
[<ffffffffa0212858>] ? ldiskfs_get_sb+0x18/0x20 [ldiskfs]
[<ffffffff8111c946>] ? vfs_kern_mount+0x5f/0xde
[<ffffffff8111ca2c>] ? do_kern_mount+0x4c/0xf3
[<ffffffff81137242>] ? do_mount+0x6e7/0x78c
[<ffffffff81101985>] ? alloc_pages_current+0xa3/0xac
[<ffffffff8113736c>] ? sys_mount+0x85/0xbe
[<ffffffff81002adb>] ? system_call_fastpath+0x16/0x1b
[root@ts-hss3-rr1-04 ~]# cat doit
mkfs.lustre.bin --reformat --ost --failnode=10.2.101.3@o2ib0 --index=3 --mkfsoptions=' -O mmp,extents,dir_index,uninit_groups -m 0' --fsname hss3-rr1 --mgsnode=10.2.101.1@o2ib0 --mgsnode=10.2.101.2@o2ib0 /dev/mapper/map03
[root@ts-hss3-rr1-04 ~]#
[root@ts-hss3-rr1-04 ~]#
[root@ts-hss3-rr1-04 ~]# sh doit
Permanent disk data:
Target: hss3-rr1:OST0003
Index: 3
Lustre FS: hss3-rr1
Mount type: ldiskfs
Flags: 0x62
(OST first_time update )
Persistent mount opts: errors=remount-ro
Parameters: failover.node=10.2.101.3@o2ib mgsnode=10.2.101.1@o2ib mgsnode=10.2.101.2@o2ib
device size = 9536080MB
formatting backing filesystem ldiskfs on /dev/mapper/map03
target name hss3-rr1:OST0003
4k blocks 2441236480
options -m 0 -J size=400 -I 256 -i 524288 -q -O mmp,extents,dir_index,uninit_groups,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize=4290772992,lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L hss3-rr1:OST0003 -m 0 -J size=400 -I 256 -i 524288 -q -O mmp,extents,dir_index,uninit_groups,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize=4290772992,lazy_journal_init -F /dev/mapper/map03 2441236480
LDISKFS-fs (dm-4): Can't enable usage tracking on a filesystem with the QUOTA feature set
LDISKFS-fs (dm-4): mount failed
------------[ cut here ]------------
WARNING: at fs/proc/generic.c:848 remove_proc_entry+0x1f5/0x217() (Not tainted)
Hardware name: PowerEdge R710
remove_proc_entry: removing non-empty directory 'ldiskfs/dm-4', leaking at least 'prealloc_table'
Modules linked in: ldiskfs(U) ohci_hcd ehci_hcd dell_rbu 8021q garp stp llc ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr mlx4_ib ib_sa ib_mad ib_core scsi_dh_rdac dcdbas sg rtc_cmos rtc_core rtc_lib thermal processor thermal_sys usbhid mlx4_en mlx4_core mpt2sas scsi_transport_sas raid_class megaraid_sas uhci_hcd qla4xxx iscsi_boot_sysfs libiscsi [last unloaded: scsi_wait_scan]
Pid: 10824, comm: mkfs.lustre.bin Not tainted 2.6.32-431.5.1.el6_lustre.2.5.1_1.0.1 #1
Call Trace:
[<ffffffff8117664c>] ? remove_proc_entry+0x1f5/0x217
[<ffffffff8104835b>] ? warn_slowpath_common+0x8d/0xa6
[<ffffffff81048466>] ? warn_slowpath_fmt+0x6e/0x70
[<ffffffff81523358>] ? schedule_timeout+0x2b/0x20a
[<ffffffff81175b63>] ? xlate_proc_name+0x49/0xaa
[<ffffffff81175b10>] ? proc_match+0x28/0x32
[<ffffffff8117664c>] ? remove_proc_entry+0x1f5/0x217
[<ffffffffa0218639>] ? ldiskfs_fill_super+0x249/0x2930 [ldiskfs]
[<ffffffff8111cfce>] ? sget+0x3dc/0x3ee
[<ffffffff8111d88d>] ? get_sb_bdev+0x12c/0x17a
[<ffffffffa02183f0>] ? ldiskfs_fill_super+0x0/0x2930 [ldiskfs]
[<ffffffffa0212858>] ? ldiskfs_get_sb+0x18/0x20 [ldiskfs]
[<ffffffff8111c946>] ? vfs_kern_mount+0x5f/0xde
[<ffffffff8111ca2c>] ? do_kern_mount+0x4c/0xf3
[<ffffffff81137242>] ? do_mount+0x6e7/0x78c
[<ffffffff81101985>] ? alloc_pages_current+0xa3/0xac
[<ffffffff8113736c>] ? sys_mount+0x85/0xbe
[<ffffffff81002adb>] ? system_call_fastpath+0x16/0x1b
---[ end trace 95fb8561345cc399 ]---
BUG: unable to handle kernel NULL pointer dereference at 00000000000001e8
IP: [<ffffffffa02160d4>] ldiskfs_clear_inode+0x24/0x50 [ldiskfs]
PGD 6227d6067 PUD 61bf18067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/block/dm-4/range
CPU 4
Modules linked in: ldiskfs(U) ohci_hcd ehci_hcd dell_rbu 8021q garp stp llc ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr mlx4_ib ib_sa ib_mad ib_core scsi_dh_rdac dcdbas sg rtc_cmos rtc_core rtc_lib thermal processor thermal_sys usbhid mlx4_en mlx4_core mpt2sas scsi_transport_sas raid_class megaraid_sas uhci_hcd qla4xxx iscsi_boot_sysfs libiscsi [last unloaded: scsi_wait_scan]
Pid: 10824, comm: mkfs.lustre.bin Tainted: G W --------------- 2.6.32-431.5.1.el6_lustre.2.5.1_1.0.1 #1 Dell Inc. PowerEdge R710/00NH4P
RIP: 0010:[<ffffffffa02160d4>] [<ffffffffa02160d4>] ldiskfs_clear_inode+0x24/0x50 [ldiskfs]
RSP: 0018:ffff880621d2bc38 EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffff8805ffa0b520 RCX: ffff880620c25cc8
RDX: ffff8805ffa0b130 RSI: ffff8805ffa0b738 RDI: ffff8805ffa0b520
RBP: ffff880621d2bc48 R08: 0000000000000001 R09: ffff8805ffa0b568
R10: 7fffffffffffffff R11: 7fffffffffffffff R12: ffff8805ffa0b658
R13: ffff8805ff91e6e0 R14: 0000000000000001 R15: 0000000000000000
FS: 00007f1279cf2700(0000) GS:ffff88003ea80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000001e8 CR3: 000000061891c000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mkfs.lustre.bin (pid: 10824, threadinfo ffff880621d2a000, task ffff880625797510)
Stack:
ffff8805ffa0b520 ffff8805ffa0b520 ffff880621d2bc68 ffffffff81132097
<d> ffff8805ffa0b520 0000000000000000 ffff880621d2bc88 ffffffff8113247c
<d> ffff8805ffa0b520 ffff8805ffa0b520 ffff880621d2bca8 ffffffff81131af2
Call Trace:
[<ffffffff81132097>] clear_inode+0x98/0xf2
[<ffffffff8113247c>] generic_drop_inode+0x47/0x5b
[<ffffffff81131af2>] iput+0x66/0x6a
[<ffffffff8112fefb>] shrink_dcache_for_umount_subtree+0x1ef/0x243
[<ffffffff811309cb>] shrink_dcache_for_umount+0x3c/0x4d
[<ffffffff8111d005>] generic_shutdown_super+0x25/0x107
[<ffffffff8111d10e>] kill_block_super+0x27/0x3f
[<ffffffff8111c8c7>] deactivate_locked_super+0x42/0x62
[<ffffffff8111d89c>] get_sb_bdev+0x13b/0x17a
[<ffffffffa02183f0>] ? ldiskfs_fill_super+0x0/0x2930 [ldiskfs]
[<ffffffffa0212858>] ldiskfs_get_sb+0x18/0x20 [ldiskfs]
[<ffffffff8111c946>] vfs_kern_mount+0x5f/0xde
[<ffffffff8111ca2c>] do_kern_mount+0x4c/0xf3
[<ffffffff81137242>] do_mount+0x6e7/0x78c
[<ffffffff81101985>] ? alloc_pages_current+0xa3/0xac
[<ffffffff8113736c>] sys_mount+0x85/0xbe
[<ffffffff81002adb>] system_call_fastpath+0x16/0x1b
Code: 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 89 fb e8 2a 6d fe ff 48 8b 83 08 01 00 00 48 8b 80 88 02 00 00 <48> 8b b8 e8 01 00 00 48 85 ff 74 0c 48 8d b3 48 02 00 00 e8 9a
RIP [<ffffffffa02160d4>] ldiskfs_clear_inode+0x24/0x50 [ldiskfs]
RSP <ffff880621d2bc38>
CR2: 00000000000001e8
---[ end trace 95fb8561345cc39a ]---
Kernel panic - not syncing: Fatal exception
Pid: 10824, comm: mkfs.lustre.bin Tainted: G D W --------------- 2.6.32-431.5.1.el6_lustre.2.5.1_1.0.1 #1
Call Trace:
[<ffffffff8104853c>] ? panic+0xd4/0x1ab
[<ffffffff8100356e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff81049c78>] ? kmsg_dump+0x126/0x140
[<ffffffff815259b8>] ? oops_end+0xb5/0xc5
[<ffffffff8102ca79>] ? no_context+0x1fa/0x209
[<ffffffff8102cbfc>] ? __bad_area_nosemaphore+0x174/0x197
[<ffffffff8102cc63>] ? __bad_area+0x44/0x4d
[<ffffffff8102cc94>] ? bad_area+0x13/0x15
[<ffffffff8152759b>] ? do_page_fault+0x264/0x456
[<ffffffff81048466>] ? warn_slowpath_fmt+0x6e/0x70
[<ffffffff81523358>] ? schedule_timeout+0x2b/0x20a
[<ffffffff81069120>] ? bit_waitqueue+0x17/0x9f
[<ffffffff81524e6f>] ? page_fault+0x1f/0x30
[<ffffffffa02160d4>] ? ldiskfs_clear_inode+0x24/0x50 [ldiskfs]
[<ffffffffa02160c6>] ? ldiskfs_clear_inode+0x16/0x50 [ldiskfs]
[<ffffffff81132097>] ? clear_inode+0x98/0xf2
[<ffffffff8113247c>] ? generic_drop_inode+0x47/0x5b
[<ffffffff81131af2>] ? iput+0x66/0x6a
[<ffffffff8112fefb>] ? shrink_dcache_for_umount_subtree+0x1ef/0x243
[<ffffffff811309cb>] ? shrink_dcache_for_umount+0x3c/0x4d
[<ffffffff8111d005>] ? generic_shutdown_super+0x25/0x107
[<ffffffff8111d10e>] ? kill_block_super+0x27/0x3f
[<ffffffff8111c8c7>] ? deactivate_locked_super+0x42/0x62
[<ffffffff8111d89c>] ? get_sb_bdev+0x13b/0x17a
[<ffffffffa02183f0>] ? ldiskfs_fill_super+0x0/0x2930 [ldiskfs]
[<ffffffffa0212858>] ? ldiskfs_get_sb+0x18/0x20 [ldiskfs]
[<ffffffff8111c946>] ? vfs_kern_mount+0x5f/0xde
[<ffffffff8111ca2c>] ? do_kern_mount+0x4c/0xf3
[<ffffffff81137242>] ? do_mount+0x6e7/0x78c
[<ffffffff81101985>] ? alloc_pages_current+0xa3/0xac
[<ffffffff8113736c>] ? sys_mount+0x85/0xbe
[<ffffffff81002adb>] ? system_call_fastpath+0x16/0x1b
Jerry Natowitz
8 years
lustre context switching
by Michael Di Domenico
I'm seeing very high load numbers on my OSTs during heavy write operations. I believe this is being caused by the ko2iblnd process and my InfiniPath cards (as evidenced by 20k+ context-switch/interrupt numbers). While this doesn't seem to be a problem per se, I'd like to know if there is anything I can do to lower those numbers.
I've not done any tuning so far:
- Lustre 2.4.3 on RHEL 6.4 x86_64, Whamcloud RPMs
- InfiniPath QDR cards on client/server, using the RHEL 6.4 bundled OFED
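The numbers above come from standard tools; roughly like this (the /proc/interrupts grep pattern depends on the HCA driver name, so treat it as an example):
vmstat 1                                  # "cs" (context switches) and "in" (interrupts) per second
top -H -b -n 1 | head -30                 # which kernel threads (e.g. the ko2iblnd ones) are busiest
watch -n 1 "grep -i -E 'qib|ipath' /proc/interrupts"   # per-IRQ counts for the InfiniPath HCA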
8 years
exporting a lustre FS as NFS
by E.S. Rosenberg
In the interest of easy data access from computers that are not part of the cluster, we would like to export the Lustre filesystem over NFS from one of the nodes.
From what I understand this should be possible, but so far we are getting kernel panics.
So:
- Has anyone done it?
- What are the pitfalls?
- Any other useful tips?
Tech details:
Lustre: 2.4.3
Kernel: 3.14.3 + aufs
Distro: Debian testing/sid
We will probably be upgrading to lustre 2.5.x in the near future.
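For context, the re-export setup we are attempting looks roughly like this (mount point, NID, network range, and export options are illustrative, not our exact configuration):
# On the Lustre client that will act as the NFS server:
mount -t lustre 10.0.0.1@o2ib:/lustre /mnt/lustre
echo '/mnt/lustre 192.168.1.0/24(rw,no_subtree_check,no_root_squash)' >> /etc/exports
exportfs -ra
service nfs-kernel-server restart     # Debian; "service nfs restart" on RHEL-like systems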
Thanks,
Eli
Trace from the test subject:
Jun 16 18:07:17 kernel:LustreError:
3795:0:(llite_internal.h:1141:ll_inode2fid()) ASSERTION( inode != ((void
*)0) ) failed:
Jun 16 18:07:17 kernel:LustreError:
3795:0:(llite_internal.h:1141:ll_inode2fid()) LBUG
Jun 16 18:07:17 kernel:CPU: 0 PID: 3795 Comm: nfsd Tainted: G WC
3.14.3-aufs-mos-1 #1
Jun 16 18:07:17 kernel:Hardware name: Dell Inc. PowerEdge C6220/03C9JJ,
BIOS 1.2.1 05/27/2013
Jun 16 18:07:17 kernel: 0000000000000000 ffff881047dd5ba0 ffffffff8175c9e4
ffffffffa1842970
Jun 16 18:07:17 kernel: ffff881047dd5bc0 ffffffffa006954c 0000000000000000
ffff880845cd5148
Jun 16 18:07:17 kernel: ffff881047dd5c00 ffffffffa18054fd 0cb158b46edf5345
0000000000000013
Jun 16 18:07:17 kernel:Call Trace:
Jun 16 18:07:17 kernel: [<ffffffff8175c9e4>] dump_stack+0x45/0x56
Jun 16 18:07:17 kernel: [<ffffffffa006954c>] lbug_with_loc+0x3c/0x90
[libcfs]
Jun 16 18:07:17 kernel: [<ffffffffa18054fd>] ll_encode_fh+0x109/0x13e
[lustre]
Jun 16 18:07:17 kernel: [<ffffffff81203f79>]
exportfs_encode_inode_fh+0x1b/0x86
Jun 16 18:07:17 kernel: [<ffffffff8120402f>] exportfs_encode_fh+0x4b/0x60
Jun 16 18:07:17 kernel: [<ffffffff810f420f>] ? lookup_real+0x27/0x42
Jun 16 18:07:17 kernel: [<ffffffff81207689>] _fh_update.part.7+0x39/0x48
Jun 16 18:07:17 kernel: [<ffffffff81207c2a>] fh_compose+0x3d1/0x3fa
Jun 16 18:07:17 kernel: [<ffffffff81210fe4>]
encode_entryplus_baggage+0xd3/0x125
Jun 16 18:07:17 kernel: [<ffffffff8121121f>]
encode_entry.isra.14+0x150/0x2cb
Jun 16 18:07:17 kernel: [<ffffffff8121247d>]
nfs3svc_encode_entry_plus+0xf/0x11
Jun 16 18:07:17 kernel: [<ffffffff81209e7e>] nfsd_readdir+0x160/0x1f8
Jun 16 18:07:17 kernel: [<ffffffff8121246e>] ? nfs3svc_encode_entry+0xe/0xe
Jun 16 18:07:17 kernel: [<ffffffff8120831b>] ? nfsd_splice_actor+0xe8/0xe8
Jun 16 18:07:17 kernel: [<ffffffff81056154>] ? groups_free+0x22/0x44
Jun 16 18:07:17 kernel: [<ffffffff8120fa3d>]
nfsd3_proc_readdirplus+0xe3/0x1df
Jun 16 18:07:17 kernel: [<ffffffff81205269>] nfsd_dispatch+0xca/0x1ad
Jun 16 18:07:17 kernel: [<ffffffff8173579b>] svc_process+0x469/0x768
Jun 16 18:07:17 kernel: [<ffffffff81204d39>] nfsd+0xc5/0x117
Jun 16 18:07:17 kernel: [<ffffffff81204c74>] ? nfsd_destroy+0x6b/0x6b
Jun 16 18:07:17 kernel: [<ffffffff81051764>] kthread+0xd6/0xde
Jun 16 18:07:17 kernel: [<ffffffff8105168e>] ?
kthread_create_on_node+0x15d/0x15d
Jun 16 18:07:17 kernel: [<ffffffff8176400c>] ret_from_fork+0x7c/0xb0
Jun 16 18:07:17 kernel: [<ffffffff8105168e>] ?
kthread_create_on_node+0x15d/0x15d
8 years
bug LU-2455 (lctl ping timeout)
by Michael Di Domenico
Before I attempt to mount Lustre through scripts, I usually do an lctl ping to check for connectivity. I've noticed this ping-timeout lag before. Can anyone tell me if there is a workaround? Or is there a better way to check the state before I actually attempt a mount?
https://jira.hpdd.intel.com/browse/LU-2455
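For reference, the check is essentially this (the NID and mount point are illustrative; some lctl versions also accept an explicit timeout in seconds as a second argument to ping, so check "lctl help ping" on your release):
if lctl ping 10.0.0.1@o2ib >/dev/null 2>&1; then
    mount -t lustre 10.0.0.1@o2ib:/lustre /mnt/lustre
else
    echo "MGS not reachable, skipping mount" >&2
fi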
8 years