Hi,
thanks for your reply.
> You should tune the ZFS ARC size if you don't have enough memory and CPU.
> Try with primary/secondary cache off (but I think you will get slow
> commits on the OST).
I changed my ARC max size to cap it at 2.5G; I will watch for slow
creates.
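For reference, this is roughly how I capped it (a sketch assuming the ZoL
0.6.x zfs_arc_max module parameter; 2684354560 bytes is 2.5 GiB):

    # persist across reboots (append if /etc/modprobe.d/zfs.conf already exists)
    echo "options zfs zfs_arc_max=2684354560" >> /etc/modprobe.d/zfs.conf

    # apply at runtime without reloading the module
    echo 2684354560 > /sys/module/zfs/parameters/zfs_arc_max

    # the primary/secondary cache suggestion above would be along these lines
    # (I have not applied these; "ostpool/ost0000" is just a placeholder):
    #   zfs set primarycache=none ostpool/ost0000
    #   zfs set secondarycache=none ostpool/ost0000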
> In addition, disable compression/dedup.
already have.
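For anyone following along, checking and disabling them looks roughly like
this ("ostpool/ost0000" is again just a placeholder for the actual dataset):

    zfs get compression,dedup ostpool/ost0000
    zfs set compression=off ostpool/ost0000
    zfs set dedup=off ostpool/ost0000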
> We had similar problems; after a while users got unreadable files.
in the sense that they could not read them, or that the files were corrupt?
> Try upgrading your ZFS to a recent one; it is very probable that you hit
> some bug in the ARC.
I'm using the one from the zfsonlinux site (0.6.2). Should I be using a
different one?
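In case it helps to compare notes, the loaded module versions should be
readable from sysfs (this is how I check them on my CentOS 6.4 servers):

    cat /sys/module/zfs/version
    cat /sys/module/spl/version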
> Currently we downgraded 2.5 to 2.4.2 with ZFS 0.6.2.1. It seems to be
> stable, but
> on old machines we often get slow writes on the OST.
I will watch it carefully for a couple of days to see what is going on.
best regards,
Luka Leskovec
> BTW, with ext4 vs. ZFS the MDS gets 10-50 times faster. I am not sure if
> you can mix an MDT on ext4 with OSTs on ZFS. This can improve /bin/ls
> (but not ls --color -l).
> Regards,
> Arman.
On Wed, Jan 22, 2014 at 12:42 PM, luka leskovec <liskawc(a)gmail.com> wrote:
> Hello all,
>
> I have a running Lustre 2.5.0 + ZFS setup on top of CentOS 6.4 (the
> kernels available on the public whamcloud site); my clients are on CentOS
> 6.5 (a minor version difference, I recompiled the client sources with the
> options specified on the whamcloud site).
>
> But now I have some problems. I cannot judge how serious they are, as the
> only problems I observe are slow responses on ls, rm and tar; apart from
> that it works great. I also export it over NFS, which sometimes hangs the
> client on which it is exported, but I expect this is an issue related to
> how many service threads I have running on my servers (old machines).
>
> But my OSSes (I have two) keep spitting out these messages into the system
> log:
> xxxxxxxxxxxxxxxxxxxxxx kernel: SPL: Showing stack for process 3264
> xxxxxxxxxxxxxxxxxxxxxx kernel: Pid: 3264, comm: txg_sync Tainted:
> P --------------- 2.6.32-358.18.1.el6_lustre.x86_64 #1
> xxxxxxxxxxxxxxxxxxxxxx kernel: Call Trace:
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa01595a7>] ?
> spl_debug_dumpstack+0x27/0x40 [spl]
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0161337>] ?
> kmem_alloc_debug+0x437/0x4c0 [spl]
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0163b13>] ?
> task_alloc+0x1d3/0x380 [spl]
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0160f8f>] ?
> kmem_alloc_debug+0x8f/0x4c0 [spl]
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa02926f0>] ?
> spa_deadman+0x0/0x120 [zfs]
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa016432b>] ?
> taskq_dispatch_delay+0x19b/0x2a0 [spl]
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0164612>] ?
> taskq_cancel_id+0x102/0x1e0 [spl]
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa028259a>] ?
> spa_sync+0x1fa/0xa80 [zfs]
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff810a2431>] ?
> ktime_get_ts+0xb1/0xf0
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0295707>] ?
> txg_sync_thread+0x307/0x590 [zfs]
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff810560a9>] ?
> set_user_nice+0xc9/0x130
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0295400>] ?
> txg_sync_thread+0x0/0x590 [zfs]
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0162478>] ?
> thread_generic_wrapper+0x68/0x80 [spl]
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0162410>] ?
> thread_generic_wrapper+0x0/0x80 [spl]
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff81096a36>] ? kthread+0x96/0xa0
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff810969a0>] ? kthread+0x0/0xa0
> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
>
> Does anyone know whether this is a serious problem or just cosmetic? Any
> way to solve it? Any hints?
>
> best regards,
> Luka Leskovec
>
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss(a)lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss