On 2013/08/04 11:30 PM, "linux freaker" <linuxfreaker(a)gmail.com> wrote:
I ran Hadoop over Lustre with 1 NameNode and 3 DataNodes running on Lustre
clients. Here are my findings:
Scenario 1: 1 MDS, 2 OSS/OST, 3 Lustre clients (1 NameNode and 2
DataNodes), Striping: -1, Dataset: 18 GB, Reducers: 20
Time taken: 59 min. 52 sec
Scenario 2: 1 MDS, 2 OSS/OST, 3 Lustre clients (1 NameNode and 2
DataNodes), Striping: -1, Dataset: 18 GB, Reducers: 30
Time taken: 1 hour 5 min
Question: Did the run time increase because of the increase in reducers?
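To put a number on the gap between the two runs, here is a minimal sketch that converts the reported wall-clock times to seconds and computes the relative slowdown. The function name is hypothetical; the only inputs are the two timings quoted above.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert a wall-clock duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds

# Timings as reported in the message above.
run_20_reducers = to_seconds(minutes=59, seconds=52)
run_30_reducers = to_seconds(hours=1, minutes=5)

# Absolute and relative difference between the two runs.
extra_seconds = run_30_reducers - run_20_reducers
slowdown_pct = 100.0 * extra_seconds / run_20_reducers

print(f"extra time: {extra_seconds} s ({slowdown_pct:.1f}% slower)")
```

With one run per configuration, a difference of this size may also fall within ordinary run-to-run variation, so repeating each scenario would help before attributing it to the reducer count.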
Since you have parallelism in the clients, you probably shouldn't be using
striping = -1, but the default striping = 1.
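As a rough sketch of the striping change suggested here: the stripe count of a directory is normally set with `lfs setstripe -c <count> <dir>`, where -1 means stripe across all OSTs and 1 means one OST per file. The helper below only builds and prints that command; the function name and the `/mnt/lustre/hadoop` path are hypothetical and would need to match your own mount point.

```python
import shlex

def build_setstripe_cmd(directory, stripe_count=1):
    """Build the argv for `lfs setstripe -c <count> <dir>`.

    stripe_count=-1 stripes across all OSTs; 1 places each file on one OST.
    """
    if stripe_count < -1:
        raise ValueError("stripe_count must be -1 or greater")
    return ["lfs", "setstripe", "-c", str(stripe_count), directory]

# Hypothetical directory used for the Hadoop data on the Lustre mount.
cmd = build_setstripe_cmd("/mnt/lustre/hadoop", stripe_count=1)
print(shlex.join(cmd))
```

The printed command can be run on a Lustre client. A directory's striping setting applies to files created after the change, so the dataset would likely need to be copied in again before rerunning the job.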
I have been using Ethernet. How much time (as a guess) will it take if I go
for InfiniBand?
This is totally workload dependent. For Lustre metadata operations, IB
can be 5-10x faster, less so for IO operations. We'd be interested to
hear what kind of improvement you get for your Hadoop workload.
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division