Hi,
it turned out that for my config (2x Sun Thumpers as OSSes) ZFS was not a
good option. Even after tuning the ZFS ARC there was a lot of lag, and the
filesystem was almost unusable (long delays when reading files). So I
reverted to ldiskfs, and that performs great now. Just wanted to let
others know that ZFS might not be a good idea on old hardware.
best,
Luka Leskovec
2014-01-23 9:27 GMT+01:00 luka leskovec <liskawc(a)gmail.com>:
Hi,
Thanks for your reply.
> You should tune the ZFS ARC size if you don't have enough memory and CPU. Try
> with primary/secondary cache off (but I think you will get slow commits on
> the OST).
>
I changed my ARC max memory size and set it to a maximum of 2.5G; I will
watch for slow creates.
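For the archives, this is roughly what that looks like; the byte value is
just 2.5 * 1024^3, and the paths should be the standard zfsonlinux ones:

    # /etc/modprobe.d/zfs.conf -- cap the ARC at 2.5 GiB (applied at module load)
    options zfs zfs_arc_max=2684354560

    # or change it on a running system without reloading the module:
    echo 2684354560 > /sys/module/zfs/parameters/zfs_arc_max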
> In addition, disable compression and dedup.
>
I already have.
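For anyone reading this later, these are the property settings in question;
the pool/dataset name below is made up:

    # turn the primary (ARC) and secondary (L2ARC) caches off for the OST dataset
    zfs set primarycache=none ostpool/ost0
    zfs set secondarycache=none ostpool/ost0

    # compression and dedup off (dedup is already off by default)
    zfs set compression=off ostpool/ost0
    zfs set dedup=off ostpool/ost0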
> We had similar problems; after a while users got unreadable files.
>
In the sense that they could not read them, or that the files were actually
corrupt?
> Try upgrading your ZFS to a recent one; it is very probable that you hit a
> bug in the ARC.
>
I'm using the one from the zfsonlinux site (0.6.2). Should I be using a
different one?
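This is how I checked the version of the loaded module, for the record:

    cat /sys/module/zfs/version
    dmesg | grep "ZFS: Loaded"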
> Currently we downgraded 2.5 to 2.4.2 with ZFS 0.6.2.1. It seems to be
> stable, but on old machines we often get slow writes on the OST.
>
I will watch it carefully for a couple of days to see what is going on.
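(Mostly just watching the pools while it runs, e.g.:

    # per-vdev I/O statistics on the OSS, refreshed every 5 seconds
    zpool iostat -v 5

and keeping an eye on the syslog.)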
best regards,
Luka Leskovec
> BTW, on ext4 vs. ZFS the MDS gets 10-50 times faster. I am not sure if you
> can mix an MDT on ldiskfs with OSTs on ZFS. This can improve /bin/ls (but
> not ls --color -l).
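(If mixing backends is supported, I would expect it to be expressed through
mkfs.lustre roughly like this; fsname, indices and devices below are all
hypothetical:

    # MDT+MGS on ldiskfs (ext4-based)
    mkfs.lustre --mgs --mdt --backfstype=ldiskfs --fsname=testfs --index=0 /dev/sdb

    # OST on ZFS; mkfs.lustre creates the pool/dataset from the given vdevs
    mkfs.lustre --ost --backfstype=zfs --fsname=testfs --index=0 \
        --mgsnode=mds01@tcp testfs-ost0/ost0 mirror /dev/sdc /dev/sdd
)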
> Regards,
> Arman.
>
>
>
> On Wed, Jan 22, 2014 at 12:42 PM, luka leskovec <liskawc(a)gmail.com> wrote:
>
>> Hello all,
>>
>> I have a running Lustre 2.5.0 + ZFS setup on top of CentOS 6.4 (the
>> kernels available on the public Whamcloud site); my clients are on CentOS
>> 6.5 (a minor version difference; I recompiled the client sources with the
>> options specified on the Whamcloud site).
>>
>> But now I have some problems. I cannot judge how serious they are, as the
>> only symptoms I observe are slow responses on ls, rm and tar; apart from
>> that it works great. I also export it over NFS, which sometimes hangs the
>> client from which it is exported, but I expect this is related to how many
>> service threads I have running on my servers (old machines).
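>> In case it matters, the OSS thread count is pinned through the ost module;
>> the value below is only an example:
>>
>>     # /etc/modprobe.d/lustre.conf on each OSS -- example value
>>     options ost oss_num_threads=64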
>>
>> But my OSSes (I have two) keep spitting out these messages into the
>> system log:
>> xxxxxxxxxxxxxxxxxxxxxx kernel: SPL: Showing stack for process 3264
>> xxxxxxxxxxxxxxxxxxxxxx kernel: Pid: 3264, comm: txg_sync Tainted: P --------------- 2.6.32-358.18.1.el6_lustre.x86_64 #1
>> xxxxxxxxxxxxxxxxxxxxxx kernel: Call Trace:
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa01595a7>] ? spl_debug_dumpstack+0x27/0x40 [spl]
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0161337>] ? kmem_alloc_debug+0x437/0x4c0 [spl]
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0163b13>] ? task_alloc+0x1d3/0x380 [spl]
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0160f8f>] ? kmem_alloc_debug+0x8f/0x4c0 [spl]
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa02926f0>] ? spa_deadman+0x0/0x120 [zfs]
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa016432b>] ? taskq_dispatch_delay+0x19b/0x2a0 [spl]
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0164612>] ? taskq_cancel_id+0x102/0x1e0 [spl]
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa028259a>] ? spa_sync+0x1fa/0xa80 [zfs]
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff810a2431>] ? ktime_get_ts+0xb1/0xf0
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0295707>] ? txg_sync_thread+0x307/0x590 [zfs]
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff810560a9>] ? set_user_nice+0xc9/0x130
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0295400>] ? txg_sync_thread+0x0/0x590 [zfs]
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0162478>] ? thread_generic_wrapper+0x68/0x80 [spl]
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffffa0162410>] ? thread_generic_wrapper+0x0/0x80 [spl]
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff81096a36>] ? kthread+0x96/0xa0
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff810969a0>] ? kthread+0x0/0xa0
>> xxxxxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
>>
>> Does anyone know whether this is a serious problem or just cosmetic? Is
>> there any way to fix it? Any hints?
>>
>> best regards,
>> Luka Leskovec
>>
>> _______________________________________________
>> HPDD-discuss mailing list
>> HPDD-discuss(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/hpdd-discuss