Re: [GraphBuilder] GraphBuilder output -> GraphLab input
by Rebecca Dengate
Hi Nilesh, Ivy,
Fantastic -- thanks!
Cheers,
Rebecca
On 06/03/13 16:30, Zhu, Xia wrote:
> Hi Rebecca,
>
> No problem, I will send a patch to you.
>
>
> Thanks,
> Ivy
> -----Original Message-----
> From: Jain, Nilesh
> Sent: Tuesday, March 05, 2013 9:29 PM
> To: Rebecca Dengate; graphbuilder(a)ml01.01.org
> Cc: Zhu, Xia
> Subject: RE: [GraphBuilder] GraphBuilder output -> GraphLab input
>
> Hi Rebecca,
>
> Sorry that you have encountered this problem. Ivy (copied on this email) has a patch for GLv2 that supports the current output format of GB.
>
> Ivy, could you please provide the patch to Rebecca?
>
> Thanks,
> Nilesh
>
> -----Original Message-----
> From: graphbuilder-bounces(a)lists.01.org [mailto:graphbuilder-bounces@lists.01.org] On Behalf Of Rebecca Dengate
> Sent: Tuesday, March 05, 2013 9:23 PM
> To: graphbuilder(a)ml01.01.org
> Subject: [GraphBuilder] GraphBuilder output -> GraphLab input
>
> Hi there,
>
> I am looking at using GraphBuilder and GraphLab2 for distributed computer vision at NICTA. I have been trying to get a simple example to work in order to understand how the pipeline works. I generate a graph
> using:
> "hadoop jar graphbuilder-0.0.1-SNAPSHOT-hadoop-job.jar
> com.intel.hadoop.graphbuilder.demoapps.wikipedia.linkgraph.LinkGraphEnd2End
> 4 /user/hduser/wiki-input /user/hduser/wiki-output"
> where wiki-input contains
> enwiki-latest-pages-articles1.xml-p000000010p000010000 downloaded from Wikipedia.
>
> I'm then trying to load the graph from wiki-output in GraphLab2 (to, say, run pagerank). I am using load_json for this, but it expects a structure that looks like:
> graph/graph[processID]-r-0000x[optional .gz]
> graph/vid2lvid[processID]-r-0000x[optional .gz]
> graph/edata[processID]-r-0000x[optional .gz]
> graph/vdata[processID]-r-0000x[optional .gz]
> where x is (I think) the partition number.
>
> This doesn't at all match what is output by GraphBuilder, which is (for
> graph_partitioned):
> edges -> partition0 -> subpart0-x
> edges -> vrecord -> part-00000
> vrecords -> partition0-x -> vrecord
> vrecords -> partition0-x -> meta
>
> The edges -> vrecord -> part-00000 seems to match what's expected for graph/vdata[processID]-r-0000x[optional .gz] (except for keyword vdata rather than VertexData) but the rest do not seem to correspond to what GraphLab expects.
>
> According to docs for GraphBuilder, the graph formats are both GraphLab2 based (GLGraph says it's a GraphLab2 distributed graph, and SimpleGraph says it's a pre-finalized, post-partitioned GraphLab2 graph). I've tried changing SimpleGraph to GLGraph in EdgeIngressReducer.java and EdgeIngressMR.java, but this doesn't change the structure of the files output by GraphBuilder.
>
> I'm sure I'm doing something wrong here, but I can't see what it is (I'm fairly new to graph processing). Any help appreciated in getting GraphBuilder output to work as GraphLab2 input!
>
> Thanks,
> Rebecca
> _______________________________________________
> GraphBuilder mailing list
> GraphBuilder(a)lists.01.org
> https://lists.01.org/mailman/listinfo/graphbuilder
GraphBuilder output -> GraphLab input
by Rebecca Dengate
Hi there,
I am looking at using GraphBuilder and GraphLab2 for distributed
computer vision at NICTA. I have been trying to get a simple example to
work in order to understand how the pipeline works. I generate a graph
using:
"hadoop jar graphbuilder-0.0.1-SNAPSHOT-hadoop-job.jar
com.intel.hadoop.graphbuilder.demoapps.wikipedia.linkgraph.LinkGraphEnd2End
4 /user/hduser/wiki-input /user/hduser/wiki-output"
where wiki-input contains
enwiki-latest-pages-articles1.xml-p000000010p000010000 downloaded from
Wikipedia.
I'm then trying to load the graph from wiki-output in GraphLab2 (to,
say, run pagerank). I am using load_json for this, but it expects a
structure that looks like:
graph/graph[processID]-r-0000x[optional .gz]
graph/vid2lvid[processID]-r-0000x[optional .gz]
graph/edata[processID]-r-0000x[optional .gz]
graph/vdata[processID]-r-0000x[optional .gz]
where x is (I think) the partition number.
This doesn't at all match what is output by GraphBuilder, which is (for
graph_partitioned):
edges -> partition0 -> subpart0-x
edges -> vrecord -> part-00000
vrecords -> partition0-x -> vrecord
vrecords -> partition0-x -> meta
The edges -> vrecord -> part-00000 seems to match what's expected for
graph/vdata[processID]-r-0000x[optional .gz] (except for keyword vdata
rather than VertexData) but the rest do not seem to correspond to what
GraphLab expects.
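A minimal sketch, in Python, of how the mismatch can be checked: it tests a list of file names against the naming that load_json is described as expecting above. The regular expression is an assumption inferred from this message (an optional numeric processID and a five-digit partition suffix), not something taken from the GraphLab source, and classify is a hypothetical helper name.

```python
import re

# Assumed pattern, inferred from the layout described in this message:
#   <kind>[processID]-r-0000x[.gz]
# Assumptions: processID is an optional run of digits and the partition
# suffix is exactly five digits. Neither is confirmed by GraphLab itself.
EXPECTED = re.compile(r"^(graph|vid2lvid|edata|vdata)(\d*)-r-(\d{5})(\.gz)?$")


def classify(names):
    """Split file names into (matching, non-matching) under the assumed pattern."""
    matched, unmatched = [], []
    for name in names:
        if EXPECTED.match(name):
            matched.append(name)
        else:
            unmatched.append(name)
    return matched, unmatched


if __name__ == "__main__":
    # Hypothetical names: two in the expected style, two in the style
    # GraphBuilder produced here.
    sample = ["vdata0-r-00000", "edata0-r-00001.gz", "part-00000", "meta"]
    ok, bad = classify(sample)
    print("match:", ok)
    print("no match:", bad)
```

Names such as part-00000, vrecord and meta fall into the second list, which is the mismatch described above.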
According to docs for GraphBuilder, the graph formats are both GraphLab2
based (GLGraph says it's a GraphLab2 distributed graph, and SimpleGraph
says it's a pre-finalized, post-partitioned GraphLab2 graph). I've tried
changing SimpleGraph to GLGraph in EdgeIngressReducer.java and
EdgeIngressMR.java, but this doesn't change the structure of the files
output by GraphBuilder.
I'm sure I'm doing something wrong here, but I can't see what it is (I'm
fairly new to graph processing). Any help appreciated in getting
GraphBuilder output to work as GraphLab2 input!
Thanks,
Rebecca