Why, specifically, did you expect a performance boost?  I don't think the relevant bottlenecks are affected by RPC size, so I don't think you should expect an improvement.

My understanding (which comes partly from LUG presentations and the like, and partly from investigations of these issues I've done at Cray) is that 2.x is slower than 1.8 for two reasons:
1. Clients are often CPU bound in the CLIO layers, which I think would be unaffected by a larger RPC size.
2. Clients do not package IOs anywhere near as well, resulting in a larger number of smaller IOs.  (Even in a test issuing only 1 MB IO requests, on 2.x we would see a large number of RPCs of (much) less than 1 MB from the client, whereas on 1.8 we saw almost exclusively 1 MB RPCs. For some tests, we'd see as many as 10 times as many total RPCs on 2.x as on 1.8.)
Since the client isn't doing a good job of filling 1 MB RPCs, I don't think it would fill 4 MB RPCs.
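A quick way to check how well a client is packing RPCs is the per-OSC rpc_stats histogram. A rough sketch (exact parameter paths can vary between Lustre versions):

```shell
# Clear the per-OSC RPC statistics before the test run.
lctl set_param osc.*.rpc_stats=clear

# ... run the IO workload ...

# Dump the RPC size histogram. The "pages per rpc" column shows how
# many RPCs were sent at each size; with 4 KB pages, well-packed 1 MB
# RPCs should cluster in the 256-page bucket rather than being spread
# across the smaller buckets.
lctl get_param osc.*.rpc_stats
```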

In contrast, 2.6 is much more like 1.8: CPU usage is down, and IOs are packaged much better. Our IO statistics for 2.6 look much more like 1.8's than those for earlier 2.x releases do.

- Patrick Farrell

From: HPDD-discuss [hpdd-discuss-bounces@lists.01.org] on behalf of Simmons, James A. [simmonsja@ornl.gov]
Sent: Wednesday, October 22, 2014 8:16 PM
To: hpdd-discuss@lists.01.org
Subject: [HPDD-discuss] Anyone using 4MB RPCs

So recently we have moved our systems from 1.8 to 2.5 clients and have lost some of the
performance we had before, which is expected. So I thought we could try using
4MB RPCs instead of the default 1MB RPC packet. I set max_pages_per_rpc to 1024
and looked at the value of max_dirty_mb, which was 32, and max_rpcs_in_flight, which
is 8. By default a dirty cache of 32MB should be enough in this case. So I tested it and
saw no performance improvement. After that I boosted max_dirty_mb to 64 and still saw
no improvement over the default settings. Has anyone seen this before? What could
I be missing?
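For reference, the tuning described above would be applied per OSC with lctl, roughly as follows (a sketch; the wildcard matches all OSC devices, and parameter paths can vary slightly by version):

```shell
# 1024 pages * 4 KB page size = 4 MB RPCs.
lctl set_param osc.*.max_pages_per_rpc=1024

# Raise the per-OSC dirty cache from the default 32 MB to 64 MB.
lctl set_param osc.*.max_dirty_mb=64

# max_rpcs_in_flight was left at its default of 8; check it with:
lctl get_param osc.*.max_rpcs_in_flight
```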