Following up on an earlier email from one of my colleagues (Venkat), I've been looking
at how one would go about changing the output format of GraphBuilder.
Having looked over the code base, my impression is that most of the output details are
hardcoded, which is somewhat frustrating. I can see that there is a GLGraph defined with
an associated output format, but enabling it in place of the SimpleGraph format appears
to entail commenting and uncommenting various lines of source code. As for actually
implementing a completely custom format, I am at something of a loss to understand the
design decisions involved or what exactly I am supposed to implement to achieve this.
For example, the GraphOutput interface has a write() method which takes both a Graph and an
EdgeFormatter, yet the implementations provided appear to be tied to specific
implementations of the Graph interface. So I don't really understand why GraphOutput and
EdgeFormatter are two separate interfaces in the first place; a single GraphOutput
interface would seem more logical. Also, because the Graph interface is very generic,
every GraphOutput implementation seems bound to be tied to a specific implementation of
Graph, since the interface alone does not expose enough information to write a generic
GraphOutput.
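
To make the point about a single interface concrete, here is a rough sketch of the
kind of shape I had in mind. All of the names (EdgeRecord, EdgeListGraph,
SingleGraphOutput, TsvGraphOutput) are hypothetical and are not taken from the
GraphBuilder source; the only assumption is that a graph can expose its edges in
some iterable form.

```java
// Hypothetical sketch; none of these types exist in GraphBuilder.

/** An edge reduced to the fields a generic writer would need. */
final class EdgeRecord {
    final String source;
    final String target;
    final String data;

    EdgeRecord(String source, String target, String data) {
        this.source = source;
        this.target = target;
        this.data = data;
    }
}

/** A graph view that exposes enough information for format-agnostic output. */
interface EdgeListGraph {
    Iterable<EdgeRecord> edges();
}

/** One interface covering both graph traversal and per-edge formatting. */
interface SingleGraphOutput {
    // A StringBuilder keeps the sketch free of I/O; a real job would write to a stream.
    void write(EdgeListGraph graph, StringBuilder out);
}

/** Example format: one tab-separated edge per line. */
final class TsvGraphOutput implements SingleGraphOutput {
    @Override
    public void write(EdgeListGraph graph, StringBuilder out) {
        for (EdgeRecord e : graph.edges()) {
            out.append(e.source).append('\t')
               .append(e.target).append('\t')
               .append(e.data).append('\n');
        }
    }
}
```

With something like this, a new format would only need a new SingleGraphOutput
implementation and would work against any graph that can list its edges.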
From my examination of the code, it looks like if I wanted to implement a custom output
format I would need to create my own implementations of Graph, GraphOutput and
EdgeFormatter, and modify the code of the various MapReduce jobs to use my implementations
rather than the built-in SimpleGraph-based ones. Is this assessment correct?
Also, has any thought been given to, or work done on, modifying the code so that the desired
output format can be chosen on a per-run basis (e.g. via a JobConf setting) rather than
requiring a recompile, as is currently necessary?
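
As an illustration of what per-run selection could look like, here is a minimal
sketch that picks the format class by name from configuration. To keep it
self-contained I use java.util.Properties as a stand-in for JobConf; the property
key and all class names are invented for the example and do not exist in
GraphBuilder. In a real Hadoop job I would expect JobConf.getClass() together with
ReflectionUtils.newInstance() to play the same role.

```java
import java.util.Properties;

// Hypothetical sketch: Properties stands in for Hadoop's JobConf, and the
// property key and class names below are made up for illustration.

interface OutputFormatPlugin {
    String formatEdge(String source, String target);
}

final class TabSeparatedFormat implements OutputFormatPlugin {
    @Override
    public String formatEdge(String source, String target) {
        return source + "\t" + target;
    }
}

final class ArrowFormat implements OutputFormatPlugin {
    @Override
    public String formatEdge(String source, String target) {
        return source + " -> " + target;
    }
}

final class OutputFormatLoader {
    /** Made-up configuration key naming the format class to use. */
    static final String KEY = "graphbuilder.output.format.class";

    /** Instantiates the configured format, falling back to tab-separated output. */
    static OutputFormatPlugin load(Properties conf) {
        String className = conf.getProperty(KEY, TabSeparatedFormat.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(OutputFormatPlugin.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load output format: " + className, e);
        }
    }
}
```

The jobs would then call OutputFormatLoader.load(conf) once during setup, so that
switching formats becomes a matter of changing one setting on the command line
rather than editing and recompiling the source.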