Let me jump in for Rob (I work with him). We're actually looking at creating edge lists (a collection of RDF triples is basically an edge list, see below).

In RDF, the difference is that the vertex attributes are (in a way) 'flattened' into edge list space. So suppose I have a graph represented by a combination of vertex list + edge list as follows:

Vertex list:




Edge List:
(For simplicity, assume that the edge label is always "knows" below.)

1 2
2 4

The same graph in RDF would be

<urn:1> :name "Venkat" <= NOT really a graph edge, but a valid RDF triple
<urn:1> :location "Pleasanton" <= NOT really a graph edge, but a valid RDF triple

<urn:2> :name "Rob" <= NOT really a graph edge, but a valid RDF triple
<urn:2> :location "UK" <= NOT really a graph edge, but a valid RDF triple

<urn:3> :name "Kushal" <= NOT really a graph edge, but a valid RDF triple
<urn:3> :location "California" <= NOT really a graph edge, but a valid RDF triple

<urn:1> :knows <urn:2> <= triple equivalent to graph edge
<urn:2> :knows <urn:4> <= triple equivalent to graph edge
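
To make the flattening concrete, here is a minimal sketch in plain Python (not GraphBuilder code). The `vertices` and `edges` structures and the `to_triples` helper are hypothetical stand-ins for the two containers, and the sketch assumes each name/location pair above belongs to a single vertex.

```python
# Hypothetical in-memory stand-ins for the separate vertex and edge containers.
vertices = {
    1: {"name": "Venkat", "location": "Pleasanton"},
    2: {"name": "Rob", "location": "UK"},
    3: {"name": "Kushal", "location": "California"},
}
edges = [(1, 2, "knows"), (2, 4, "knows")]


def to_triples(vertices, edges):
    """Flatten vertex attributes and edges into one list of (s, p, o) triples."""
    triples = []
    # Each vertex attribute becomes a triple whose object is a literal.
    for vid in sorted(vertices):
        for key, value in vertices[vid].items():
            triples.append((f"<urn:{vid}>", f":{key}", f'"{value}"'))
    # Each edge becomes a triple whose object is another vertex.
    for src, dst, label in edges:
        triples.append((f"<urn:{src}>", f":{label}", f"<urn:{dst}>"))
    return triples


for s, p, o in to_triples(vertices, edges):
    print(s, p, o)
```

The point is only that the unified list needs both inputs at once: attribute triples come from the vertex data, edge triples from the edge list.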

The thing that Rob's running into is that GB has one container for vertex-specific data/attributes and a separate one for the pure edge list.
So we need a way to access both in order to combine the vertex attributes and the edge list into a single 'unified RDF list', i.e. edges + vertex attributes as a set of RDF triples. Is there a place in the code to do so?


From: <Datta>, Kushal <kushal.datta@intel.com>
Date: Thursday, October 10, 2013 12:45 PM
To: Rob Vesse <rvesse@yarcdata.com>
Cc: "graphbuilder@lists.01.org" <graphbuilder@lists.01.org>
Subject: Re: [GraphBuilder] Changing output formats?



Correct me if I'm wrong, but I think you are trying to output an adjacency list for your graph. Currently we only support edge lists.

What I suggest is that you write a MapReduce job which takes the normalized graph or the raw graph (whichever fits your purpose) and creates the adjacency list.

The mapper will emit both vertices and edges (with source vertex as key).

The reducer can print all the edges for every unique vertex key.
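
As a rough illustration of that mapper/reducer split, here is a sketch that simulates the map, shuffle and reduce phases in memory in plain Python. It is not Hadoop or GraphBuilder code; the record layout and the `mapper`, `reducer` and `run` names are assumptions made for the example.

```python
from collections import defaultdict


def mapper(record):
    """Emit (key, value) pairs, keyed by vertex ID or by edge source vertex."""
    if record[0] == "vertex":
        _, vid, attrs = record
        yield vid, ("vertex", attrs)
    else:
        _, src, dst, label = record
        yield src, ("edge", (label, dst))


def reducer(key, values):
    """Collect the attributes and outgoing edges seen for one vertex key."""
    attrs, neighbours = {}, []
    for kind, payload in values:
        if kind == "vertex":
            attrs.update(payload)
        else:
            neighbours.append(payload)
    return key, attrs, neighbours


def run(records):
    """Simulate the shuffle: group mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, groups[key]) for key in sorted(groups)]


records = [
    ("vertex", 1, {"name": "Venkat"}),
    ("vertex", 2, {"name": "Rob"}),
    ("edge", 1, 2, "knows"),
    ("edge", 2, 4, "knows"),
]
for row in run(records):
    print(row)
```

Because vertices and edges share the source vertex as key, the reducer sees a vertex's attributes and its outgoing edges together, which is the recombination being asked about.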





From: Rob Vesse [mailto:rvesse@yarcdata.com]
Sent: Thursday, October 10, 2013 7:08 AM
To: Datta, Kushal
Cc: graphbuilder@lists.01.org
Subject: Re: Changing output formats?


Hi Kushal


Thanks for your feedback


I've been able to get a basic implementation of my desired output working, though I have still run into one problem that you may be able to help me with.  Ideally I want to use both the vertex and edge data together to form my output, i.e. I need to have vertex IDs, vertex data and edge data all in one place to produce the output I desire.  Currently I am only succeeding in producing part of the output I am after.


Is there a simple way to recombine these pieces of information via the Graph Builder API or would I have to figure this out myself?


I can see some obvious (hacky) ways of doing this myself but wanted to check whether there was any support for this already present.






From: <Datta>, Kushal <kushal.datta@intel.com>
Date: Wednesday, October 9, 2013 9:57 AM
To: Rob Vesse <rvesse@yarcdata.com>, "graphbuilder@lists.01.org" <graphbuilder@lists.01.org>
Subject: RE: Changing output formats?


Hi Rob,


Your comments are very valuable to us. I am trying to recreate your use-case and will get back to you on this asap.

We used separate GraphOutput format classes to print vertices and edges differently for separate targets. One example is the GraphLab edge list format.

We implemented the logic to write the JSON or text format of the vertices and edges in the vertex record formatter and edge format classes.


However, I see your point and am therefore looking at the code again.





From: graphbuilder-bounces@lists.01.org [mailto:graphbuilder-bounces@lists.01.org] On Behalf Of Rob Vesse
Sent: Monday, October 07, 2013 7:55 AM
To: graphbuilder@lists.01.org
Subject: [GraphBuilder] Changing output formats?


Hi All


Following up on an earlier email from one of my colleagues (Venkat) I've been trying to look at how you would go about changing the output format of GraphBuilder.


Having looked over the code base, my impression is that many of the output details are hardcoded, which is somewhat frustrating to me.  I can see that there is a GLGraph defined with an associated output format, but enabling this over the SimpleGraph format appears to entail commenting and uncommenting various lines of source code.  And as for actually implementing a completely custom format, I find myself at something of a loss to understand the design decisions involved or what exactly I am supposed to implement to achieve this.


For example, the GraphOutput interface has a write() method which takes both a Graph and an EdgeFormatter, yet the implementations provided appear to be tied to specific implementations of the Graph interface.  So I don't really understand why GraphOutput and EdgeFormatter are two separate interfaces in the first place.  A single GraphOutput interface would seem more logical.  Also, the choice to make the Graph interface very generic seems to require that every GraphOutput implementation be tied to a specific implementation of Graph, since there's just not enough information to write a generic implementation of a GraphOutput.


From my examination of the code, it looks like if I wanted to implement a custom output format I would need to create my own implementations of Graph, GraphOutput and EdgeFormatter and modify the code of various MapReduce jobs to use my implementations rather than the built-in SimpleGraph-based ones.  Is this a correct assessment?


Also, has there been any thought given or work done on modifying the code so the desired output format can be changed on a per-run basis (e.g. via a JobConf setting) rather than requiring a recompile as is currently necessary?
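
To illustrate the kind of per-run selection I have in mind, here is a minimal sketch in plain Python rather than the actual Java code. Everything in it is hypothetical: the `graph.output.format` key is not an existing GraphBuilder or Hadoop setting, and the two formatter functions merely stand in for real output format classes.

```python
def format_edge_list(src, dst, label):
    """Tab-separated edge list line (hypothetical layout)."""
    return f"{src}\t{dst}\t{label}"


def format_rdf(src, dst, label):
    """One triple per line, in the style used earlier in this thread."""
    return f"<urn:{src}> :{label} <urn:{dst}> ."


# Registry mapping a configured name to a formatter; names are made up.
FORMATTERS = {"edgelist": format_edge_list, "rdf": format_rdf}


def select_formatter(conf, key="graph.output.format", default="edgelist"):
    """Pick the formatter named in the job configuration at run time."""
    name = conf.get(key, default)
    try:
        return FORMATTERS[name]
    except KeyError:
        raise ValueError(f"unknown output format: {name!r}") from None


conf = {"graph.output.format": "rdf"}  # stands in for a per-run job setting
fmt = select_formatter(conf)
print(fmt(1, 2, "knows"))
```

With a lookup like this, switching formats would be a matter of changing one setting for the run rather than editing and recompiling the jobs.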