On 12/20/12 2:28 PM, "Suresh Shelvapille" <suri(a)baymicrosystems.com> wrote:
> > He is referring to the block cache, which on Lustre clients is by
> > default restricted to 32MB. This is to avoid large IO storms when large
> > numbers of clients flush cache. It is tunable but, if you wish to measure
> > writes to any real device, you must always use file sizes much greater
> > than the cache; otherwise, as Brian points out, you are measuring writes
> > into RAM, not disk.
> This is where I have a problem with Lustre. Any combination of "bs=30M
> or "bs=10M count=3" or "bs=2M, count=15" yields very similar results,
> in the range of 550MB/s... at least that is what "dd" reports.
Because you are not writing to a real disk. The IO goes to client cache,
and is read from cache. Your IO is likely not touching Lustre servers at
all; from your description it's all sitting cached in client memory, which
is very fast.
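To make the cache effect visible on your own client, here is a minimal Python sketch (not anything Lustre-specific). It times the same write twice, once stopping the clock as soon as the data is handed to the kernel and once only after os.fsync() returns. The function name timed_write and its parameters are made up for illustration; GNU dd's conv=fsync option serves the same purpose if you would rather stay with dd.

```python
import os
import time


def timed_write(path, block_size, count, flush=True):
    """Write count blocks of block_size bytes and return throughput in MB/s.

    With flush=True the timer also covers os.fsync(), so data that is still
    sitting in the client cache has to be pushed out before the clock stops.
    With flush=False the result mostly reflects writes into memory.
    """
    view = memoryview(b"\0" * block_size)
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(count):
            written = 0
            # os.write may write fewer bytes than asked, so loop until done.
            while written < block_size:
                written += os.write(fd, view[written:])
        if flush:
            os.fsync(fd)
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return (block_size * count) / (1024 * 1024) / elapsed
```

Calling timed_write() on a file under the Lustre mount with flush=False and then flush=True, for a total size under the cache limit, should show the gap between the two numbers. The flushed figure is the one that says something about the servers.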
> But, if I change to "bs=100M count=1" then the rate drops to 80MB/s!!
Which is pretty much exactly what I'd expect from a real disk.
> So, all I am looking for is a way to transfer big files, say "bs=512M
> over Lustre and get 550MB/s.
Buy a bunch more disk, hook it to a bunch more servers. Mount more OSTs.
That's what Lustre does, all day, every day.
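As a back-of-the-envelope illustration of "mount more OSTs", the following Python sketch estimates how many OSTs it would take to reach a target rate. It assumes ideal linear scaling, meaning the file is striped across every OST and neither the network nor the client is the bottleneck, which real systems only approximate. The function name osts_needed is hypothetical; the 550 and 80 MB/s figures are the ones from this thread.

```python
import math


def osts_needed(target_mb_s, per_ost_mb_s):
    """Smallest number of OSTs whose summed rate reaches the target.

    Assumes ideal linear scaling: aggregate rate = OST count * per-OST rate.
    """
    if per_ost_mb_s <= 0:
        raise ValueError("per-OST rate must be positive")
    return math.ceil(target_mb_s / per_ost_mb_s)


# 550 / 80 = 6.875, so at least 7 OSTs under the ideal-scaling assumption.
print(osts_needed(550, 80))
```

Treat the result as a lower bound; each of those OSTs also needs its own disk, or it is back to competing for the same spindle.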
> As I said, I already have 128 GB of RAM, with a 16-core CPU etc...
> So, if I can tune the "block cache" parameter and get higher rates I will
Again, when you do IO smaller than the client cache, the IO is just sitting
in the client cache, and you aren't measuring any network/Lustre
performance at all. You are measuring the client talking to itself. If
you want to really evaluate your RDMA, best to use real world numbers. If
this is an admittedly small-scale simulation, what's so bad about an
honest 80 MB/sec?
> >> -----Original Message-----
> >> From: hpdd-discuss-bounces(a)lists.01.org
> >> [mailto:email@example.com] On Behalf Of Brian
> >> J. Murrell
> >> Sent: Thursday, December 20, 2012 4:07 PM
> >> To: hpdd-discuss(a)lists.01.org
> >> Subject: Re: [HPDD-discuss] Lustre performance related question
> >> On Thu, 2012-12-20 at 14:40 -0500, Suresh Shelvapille wrote:
> >> > Both /dev/sda3 and sda4
> >> So you have the MDT and the OST competing for a single disk. Not good
> >> for performance.
> >> > are on a regular SCSI disk with 256GB capacity (6Gbps disk)
> >> 6Gbps is the interface speed. The actual disk speed will be much
> >> slower.
> >> > I get about 550MB/s, which is reasonable.
> >> For a single disk? 550MB/s is amazing. Even for an SSD that's
> >> good. But you are probably not really getting that...
> >> > Now, if I change the block-size, count combination to
> >> >
> >> > anything less or more than 30M then the performance drops
> >> > considerably.
> >> Less doesn't make any sense, but more than 30M and you are starting to
> >> write to the real disk and not just the cache in the client. That's why
> >> you can get 550MB/s. You are only writing into memory, not a real disk
> >> yet.
> >> > What is so magical about 30Megabytes?
> >> It's cache. 32MB actually.
> >> > Is it even possible with single OST/MDT combination to get better
> >> > throughput for bigger files
> >> Really, a single MDT and single OST, even on separate disks, doesn't
> >> usually make much sense. You are not going to get any better
> >> performance than you would if you just put something like NFS in front
> >> of that disk the OST is on. With MDT and OST on the same disk, you are
> >> probably even going to get worse.
> >> Lustre doesn't magically make hardware perform better. What magic
> >> Lustre does give you is the scalability of being able to apply it to 10s
> >> or 100s or 1000s of disks and get the aggregate bandwidth and
> >> performance of all of those disks.
> >> > or I need multiple OSTs etc...
> >> Yes. If you want more speed, you need more OSTs.
> >> b.
> >> _______________________________________________
> >> HPDD-discuss mailing list
> >> HPDD-discuss(a)lists.01.org
> >> https://lists.01.org/mailman/listinfo/hpdd-discuss