Hi,

It turned out that for my config (2x Sun Thumpers as storage/OSS nodes) ZFS was not a good option. Even after tuning the ZFS ARC, there was a lot of lag and the filesystem was almost unusable (long delays when reading files). So I reverted to ldiskfs, which performs great now. Just wanted to let others know that ZFS might not be a good idea on old hardware.
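For anyone wanting to do the same, this is roughly what the reformat looks like (the fsname, index, MGS NID and device below are placeholders, not my real values):

  # reformatting destroys the data on the target -- placeholders only
  mkfs.lustre --ost --backfstype=ldiskfs --fsname=testfs --index=0 \
      --mgsnode=10.0.0.1@tcp /dev/sdb
  # ldiskfs-backed targets are then mounted directly
  mount -t lustre /dev/sdb /mnt/ost0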

best,
Luka Leskovec  


2014-01-23 9:27 GMT+01:00 luka leskovec <liskawc@gmail.com>:



Hi,

Thanks for your reply.

> You should tune the ZFS ARC size if you don't have enough memory and CPU. Try with primarycache/secondarycache off (but I think you will get slow commits on the OSTs).
I changed my ARC max memory size and set it to at most 2.5G; I will watch for slow creates.
> In addition, disable compression/dedup.
Already have.
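For reference, this is roughly what that amounts to on my side (the dataset name tank/ost0 is just a placeholder for the actual pool/dataset):

  # /etc/modprobe.d/zfs.conf -- cap the ARC at 2.5 GiB (value in bytes)
  options zfs zfs_arc_max=2684354560

  # or on a running system (the ARC may take a while to shrink)
  echo 2684354560 > /sys/module/zfs/parameters/zfs_arc_max

  # turn off ARC/L2ARC caching of data on the OST dataset as suggested
  # (primarycache=metadata would be a milder alternative)
  zfs set primarycache=none tank/ost0
  zfs set secondarycache=none tank/ost0

  # make sure compression and dedup really are off
  zfs set compression=off tank/ost0
  zfs set dedup=off tank/ost0
  zfs get compression,dedup,primarycache,secondarycache tank/ost0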
> We had similar problems; after a while users got unreadable files.
In the sense that they could not read them, or that the files were corrupt?
> Try upgrading your ZFS to a recent one; it is very probable that you hit some bug in the ARC.
I'm using the one from the zfsonlinux site (0.6.2). Should I be using a different one?
> Currently we have downgraded Lustre 2.5 to 2.4.2 with ZFS 0.6.2.1. It seems to be stable, but on old machines we often get slow writes on the OSTs.
I will watch it carefully for a couple of days to see what is going on.
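In case it helps to compare, the installed SPL/ZFS versions can be checked with something like:

  cat /sys/module/spl/version /sys/module/zfs/version
  modinfo zfs | grep -i '^version'
  rpm -qa | grep -E 'zfs|spl'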

best regards,
Luka Leskovec
 
> BTW,
> on ext4 vs. ZFS the MDS gets 10-50 times faster. I am not sure if you can mix an MDT on ext4 with OSTs on ZFS. This can improve /bin/ls (but not ls --color -l).
> Regards,
> Arman.



On Wed, Jan 22, 2014 at 12:42 PM, luka leskovec <liskawc@gmail.com> wrote:
Hello all,

I have a running Lustre 2.5.0 + ZFS setup on top of CentOS 6.4 (with the kernels available on the public Whamcloud site); my clients are on CentOS 6.5 (a minor version difference; I recompiled the client sources with the options specified on the Whamcloud site).

But now I have some problems. I cannot judge how serious they are, as the only issues I observe are slow responses on ls, rm and tar; apart from that it works great. I also export it over NFS, which sometimes hangs the client it is exported from, but I expect this is related to how many service threads I have running on my servers (old machines).
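In case the numbers matter, this is roughly how I have been checking and capping the OSS service threads (parameter names as I understand them from the Lustre manual; 64 is just an example value):

  # how many I/O service threads are running / allowed on an OSS
  lctl get_param ost.OSS.ost_io.threads_started ost.OSS.ost_io.threads_max
  # cap them at runtime on old hardware
  lctl set_param ost.OSS.ost_io.threads_max=64
  # to make it persistent, in /etc/modprobe.d/lustre.conf:
  #   options ost oss_num_threads=64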

But my OSSes (I have two) keep spitting these messages into the system log:
xxxxxxxxxxxxxxxxxxxxxx kernel: SPL: Showing stack for process 3264
xxxxxxxxxxxxxxxxxxxxxx kernel: Pid: 3264, comm: txg_sync Tainted: P           ---------------    2.6.32-358.18.1.el6_lustre.x86_64 #1
xxxxxxxxxxxxxxxxxxxxxx kernel: Call Trace:
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa01595a7>] ? spl_debug_dumpstack+0x27/0x40 [spl]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0161337>] ? kmem_alloc_debug+0x437/0x4c0 [spl]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0163b13>] ? task_alloc+0x1d3/0x380 [spl]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0160f8f>] ? kmem_alloc_debug+0x8f/0x4c0 [spl]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa02926f0>] ? spa_deadman+0x0/0x120 [zfs]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa016432b>] ? taskq_dispatch_delay+0x19b/0x2a0 [spl]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0164612>] ? taskq_cancel_id+0x102/0x1e0 [spl]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa028259a>] ? spa_sync+0x1fa/0xa80 [zfs]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff810a2431>] ? ktime_get_ts+0xb1/0xf0
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0295707>] ? txg_sync_thread+0x307/0x590 [zfs]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff810560a9>] ? set_user_nice+0xc9/0x130
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0295400>] ? txg_sync_thread+0x0/0x590 [zfs]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0162478>] ? thread_generic_wrapper+0x68/0x80 [spl]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0162410>] ? thread_generic_wrapper+0x0/0x80 [spl]
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff81096a36>] ? kthread+0x96/0xa0
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff810969a0>] ? kthread+0x0/0xa0
xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

Does anyone know whether this is a serious problem or just cosmetic? Any way to solve it? Any hints?
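In case it is useful, these are the kstats I have been watching on the OSSes to see whether the txg_sync thread is actually syncing slowly or the ARC is overflowing (the pool name tank is a placeholder, and the txg history may need to be enabled first):

  # enable transaction group history if your zfs build has the parameter
  echo 100 > /sys/module/zfs/parameters/zfs_txg_history
  # per-pool transaction group history
  tail /proc/spl/kstat/zfs/tank/txgs
  # ARC size vs. limit and hit/miss counters
  grep -E '^(size|c_max|hits|misses)\b' /proc/spl/kstat/zfs/arcstats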

best regards,
Luka Leskovec




_______________________________________________
HPDD-discuss mailing list
HPDD-discuss@lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss