Hi Nilesh, Ivy,
Fantastic -- thanks!
On 06/03/13 16:30, Zhu, Xia wrote:
No problem, I will send a patch to you.
From: Jain, Nilesh
Sent: Tuesday, March 05, 2013 9:29 PM
To: Rebecca Dengate; graphbuilder(a)ml01.01.org
Cc: Zhu, Xia
Subject: RE: [GraphBuilder] GraphBuilder output -> GraphLab input
Sorry that you have encountered this problem. Ivy (copied on this email) has a patch for
GLv2 that supports the current output format of GB.
Ivy, could you please provide the patch to Rebecca?
From: graphbuilder-bounces(a)lists.01.org [mailto:email@example.com] On
Behalf Of Rebecca Dengate
Sent: Tuesday, March 05, 2013 9:23 PM
Subject: [GraphBuilder] GraphBuilder output -> GraphLab input
I am looking at using GraphBuilder and GraphLab2 for distributed computer vision at
NICTA. I have been trying to get a simple example to work in order to understand how the
pipeline works. I generate a graph with:
"hadoop jar graphbuilder-0.0.1-SNAPSHOT-hadoop-job.jar
4 /user/hduser/wiki-input /user/hduser/wiki-output"
where wiki-input contains
enwiki-latest-pages-articles1.xml-p000000010p000010000 downloaded from wikipedia.
I'm then trying to load the graph from wiki-output in GraphLab2 (to, say, run
pagerank). I am using load_json for this, but it expects a structure that looks like:
graph/graph[processID]-r-0000x[optional .gz]
graph/vid2lvid[processID]-r-0000x[optional .gz]
graph/edata[processID]-r-0000x[optional .gz]
graph/vdata[processID]-r-0000x[optional .gz]
where x is (I think) the partition number.
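To make the expected naming concrete, here is a minimal sketch of a filename check. It is only an illustration of the pattern described above, not GraphLab's actual load_json logic: the regular expression, the assumption that [processID] is a (possibly empty) run of digits, the assumption that 0000x is a five-digit partition number, and the helper name classify are all hypothetical.

```python
import re

# Assumptions (not confirmed by GraphLab): [processID] is zero or more
# digits, and the trailing "0000x" is a five-digit partition number.
EXPECTED = re.compile(r"^(graph|vid2lvid|edata|vdata)(\d*)-r-(\d{5})(\.gz)?$")


def classify(filename):
    """Return (kind, partition) if the name fits the pattern, else None."""
    match = EXPECTED.match(filename)
    if match is None:
        return None
    return match.group(1), int(match.group(3))


# Hypothetical file names, used only to exercise the pattern.
for name in ["vdata1-r-00003.gz", "edata-r-00000", "part-00000", "subpart0"]:
    print(name, "->", classify(name))
```

Under these assumptions, names such as part-00000 or subpart0 do not fit the pattern, which is the kind of mismatch described in this message.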
This doesn't at all match what is output by GraphBuilder, which is:
edges -> partition0 -> subpart0-x
edges -> vrecord -> part-00000
vrecords -> partition0-x -> vrecord
vrecords -> partition0-x -> meta
The edges -> vrecord -> part-00000 seems to match what's expected for
graph/vdata[processID]-r-0000x[optional .gz] (except for keyword vdata rather than
VertexData) but the rest do not seem to correspond to what GraphLab expects.
According to the GraphBuilder docs, both graph formats are GraphLab2-based (GLGraph
says it's a GraphLab2 distributed graph, and SimpleGraph says it's a
pre-finalized, post-partitioned GraphLab2 graph). I've tried changing SimpleGraph to
GLGraph in EdgeIngressReducer.java and EdgeIngressMR.java, but this doesn't change the
structure of the files output by GraphBuilder.
I'm sure I'm doing something wrong here, but I can't see what it is (I'm
fairly new to graph processing). Any help appreciated in getting GraphBuilder output to
work as GraphLab2 input!