Hmm... Well, isn't introducing a DB at this stage too early?
Maybe I am wrong, but at this point, I am still validating whether my original hypothesis works or not.
So for most of the filtering and such, I have already used Spark to build the dataset.
The issue I generally face is this: there are (say) 10 different frameworks (GraphLab, GraphChi, Mahout, Giraph, GraphX, etc.), each with its own input requirements. I don't yet know which framework to use, because I don't know which one will perform best, so I have to try them out. But since each has its own input requirements, I end up doing a lot of format conversions: from CSV to sequence files (for Mahout), to edge lists, to adjacency lists, to Matrix Market format...
Well, I have already gone through that exercise, but it would be neat if Graph Builder could at least offer these conversions without dumping the data into a DB.
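To make the kind of conversion described above concrete, here is a minimal sketch in plain Python, assuming a two-column CSV edge list (`src,dst`, no header row) and a simple space-separated adjacency-list layout of one line per vertex (`src nbr1 nbr2 ...`). The function names are hypothetical, not part of Graph Builder, and each framework's exact adjacency or edge-list variant should be checked against its own documentation.

```python
import csv
import io
from collections import defaultdict


def csv_to_adjacency_list(csv_text):
    """Group a two-column CSV edge list (src,dst) into adjacency lists.

    Assumes no header row; vertex IDs are kept as strings. Returns a dict
    mapping each source vertex to its neighbours in first-seen order.
    """
    adjacency = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        src, dst = row[0].strip(), row[1].strip()
        adjacency[src].append(dst)
    return dict(adjacency)


def format_adjacency_list(adjacency):
    """Render one line per source vertex: 'src nbr1 nbr2 ...'."""
    return "\n".join(f"{src} {' '.join(nbrs)}" for src, nbrs in adjacency.items())


def format_edge_list(adjacency):
    """Flatten adjacency lists back into a tab-separated 'src<TAB>dst' edge list."""
    return "\n".join(
        f"{src}\t{dst}" for src, nbrs in adjacency.items() for dst in nbrs
    )


if __name__ == "__main__":
    adj = csv_to_adjacency_list("1,2\n1,3\n2,3\n")
    print(format_adjacency_list(adj))
```

Going through one in-memory structure keeps each writer independent of the input format, so adding another target layout only means adding another small formatting function.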

Not sure whether I am making any sense?

That’s a good question. We initially developed Graph Builder to solve the problem of constructing graphs at scale from various data sources with different formats. The scope of the tool has since grown. We envision Graph Builder evolving into a graph ETL toolkit that includes a library of import, extract, transform, and export operators for graphs. For example, the library could be used to:

·         Import data from a set of Parquet files in HDFS

·         Filter in all data related to Country Code = USA

·         Join multiple data files to correlate sales figures for each employee

·         Export the graph to a graph database such as Titan or a graph execution engine such as GraphLab
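The filter, join, and export steps listed above can be sketched as small composable operators. This is a hypothetical illustration, not Graph Builder's actual API: records are plain Python dicts standing in for rows read from Parquet, the field names (`country_code`, `employee_id`, `customer`, `amount`) are invented for the example, and the "export" step just produces weighted edges rather than writing to Titan or GraphLab.

```python
from collections import defaultdict

# Hypothetical operators for an import -> filter -> join -> export pipeline.
# Records are plain dicts; a real pipeline would read Parquet from HDFS and
# hand the resulting edges to a graph database or execution engine.


def filter_records(records, field, value):
    """Keep only records whose `field` equals `value`."""
    return [r for r in records if r.get(field) == value]


def join_records(left, right, key):
    """Inner join two record lists on `key`, indexing the right side by key."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def export_edges(records, src_field, dst_field, weight_field):
    """Aggregate joined records into sorted (src, dst, total_weight) edges."""
    weights = defaultdict(float)
    for r in records:
        weights[(r[src_field], r[dst_field])] += r[weight_field]
    return sorted((s, d, w) for (s, d), w in weights.items())


if __name__ == "__main__":
    employees = [
        {"employee_id": 1, "country_code": "USA"},
        {"employee_id": 2, "country_code": "CAN"},
    ]
    sales = [
        {"employee_id": 1, "customer": "acme", "amount": 100.0},
        {"employee_id": 1, "customer": "acme", "amount": 50.0},
        {"employee_id": 2, "customer": "acme", "amount": 75.0},
    ]
    us_staff = filter_records(employees, "country_code", "USA")
    joined = join_records(us_staff, sales, "employee_id")
    print(export_edges(joined, "employee_id", "customer", "amount"))
```

Keeping each operator a pure function over record lists is what makes them chainable in any order, which is roughly the property a library of ETL operators needs.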

This is the vision our group is pursuing, and we hope the open-source community around Graph Builder can help us enrich it with new ideas and code.




Question: will GB grow beyond being just a graph construction library?


Yep... I spend an enormous amount of time transforming formats between different frameworks.

GraphChi requires Matrix Market (which is not the standard MM format), GraphLab requires an edge list, and Giraph has yet another input format...
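As an example of one such conversion, the sketch below writes weighted edges in the standard Matrix Market coordinate layout: a `%%MatrixMarket matrix coordinate real general` header, a `rows cols entries` size line, then one 1-based `i j value` entry per line. Since the post says GraphChi's variant differs from standard MM, its expected header and indexing would need checking against GraphChi's own documentation; the function name here is hypothetical.

```python
def edges_to_matrix_market(edges):
    """Write (src, dst, weight) edges as a Matrix Market coordinate file.

    Vertex IDs of any hashable type are remapped to contiguous 1-based
    indices in first-seen order, because Matrix Market indices start at 1.
    Returns the file contents and the original-ID -> index mapping.
    """
    ids = {}

    def index_of(vertex):
        # Assign the next free 1-based index the first time a vertex is seen.
        if vertex not in ids:
            ids[vertex] = len(ids) + 1
        return ids[vertex]

    entries = [(index_of(s), index_of(d), w) for s, d, w in edges]
    n = len(ids)  # square n x n adjacency matrix
    lines = ["%%MatrixMarket matrix coordinate real general", f"{n} {n} {len(entries)}"]
    lines += [f"{i} {j} {w}" for i, j, w in entries]
    return "\n".join(lines) + "\n", ids


if __name__ == "__main__":
    text, mapping = edges_to_matrix_market([("a", "b", 1.0), ("b", "c", 2.5)])
    print(text, end="")
```

Returning the ID mapping alongside the file matters in practice: results computed by the framework come back as matrix indices and have to be translated back to the original vertex IDs.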


Often I just want to evaluate the performance of a framework, but I spend a lot of time juggling these formats.


It would be awesome if it could read tons of input formats and had out-of-the-box support for at least these popular frameworks.


What can I expect in the next release, and when is it due?


Hi Jamal,


The project is alive. We are making a lot of changes in the GB code internally, hence the slowdown.

Is there anything you were particularly looking for?





Hi GB team,

I don't see any stories, code pushes, future features, or any other activity.

Is this project still going on or is it already dead?



