The basic question is: how do we read an entire graph from a Neo4j store into a NetworkX graph? A second question follows: how do we extract subgraphs with Cypher and recreate them in NetworkX, potentially to save memory?
Using a naive query to read all relationships
This approach is based on the cypher-ipython module. It uses a simple query like the following to obtain all the data:
MATCH (n) OPTIONAL MATCH (n)-[r]->() RETURN n, r
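This can be read into a graph using the following code. Note that rows may repeat nodes (and, depending on the query, relationships), but this is harmless: nodes are keyed by their Neo4j ID and edges by start node, end node and relationship type, so adding the same one again only updates the existing entry.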
import networkx

def rs2graph(rs):
    graph = networkx.MultiDiGraph()
    for record in rs:
        node = record['n']
        # Test against None explicitly: the truth value of a driver object
        # may depend on its contents, so it is not a reliable presence test.
        if node is not None:
            print("adding node")
            nx_properties = {}
            nx_properties.update(node.properties)
            nx_properties['labels'] = list(node.labels)
            graph.add_node(node.id, **nx_properties)
        relationship = record['r']
        # r is None when the node has no outgoing relationship;
        # 'is not None' is essential here for the same reason as above
        if relationship is not None:
            print("adding edge")
            graph.add_edge(
                relationship.start, relationship.end, key=relationship.type,
                **relationship.properties
            )
    return graph
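The claim that repeated rows are harmless can be checked in isolation. The following is a minimal sketch, assuming plain tuples as hypothetical stand-ins for the driver's records (the `rows` layout is invented for illustration and is not what the driver returns); it only exercises the NetworkX side.

```python
import networkx

graph = networkx.MultiDiGraph()

# Hypothetical rows: (node id, node properties, relationship or None).
# Node 1 is repeated because it has two outgoing relationships, and the
# last row is an exact repeat of the first.
rows = [
    (1, {"content": "a"}, (1, 2, "PRECEDES")),
    (1, {"content": "a"}, (1, 3, "PRECEDES")),
    (2, {"content": "b"}, None),
    (3, {"content": "c"}, None),
    (1, {"content": "a"}, (1, 2, "PRECEDES")),
]

for node_id, props, rel in rows:
    # Re-adding an existing node only updates its attributes.
    graph.add_node(node_id, **props)
    if rel is not None:
        start, end, rel_type = rel
        # Re-adding an edge with the same (start, end, key) updates it in place.
        graph.add_edge(start, end, key=rel_type)

node_count = graph.number_of_nodes()
edge_count = graph.number_of_edges()
```

One caveat of keying edges on the relationship type: two distinct relationships of the same type between the same pair of nodes would collapse into a single edge. If that matters, keying on the relationship's own ID (assuming the driver exposes one) would keep them apart.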
This query is rather inelegant: the result set is essentially 'denormalized', with each node repeated once for every outgoing relationship it has.
Using aggregation functions
Luckily there's a more SQL-ish way to do it: COLLECT the relationships of
each node into a list. Each row then holds one distinct node together with
the complete set of relationships for that node, similar to combining
ARRAY_AGG() with GROUP BY in PostgreSQL. This seems much cleaner to me.
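The aggregated query itself is not shown in the text. A plausible form, assuming the column names `n2` and `rels` that the reader function in this section expects, is held in `AGGREGATED_QUERY` below; treat it as a guess rather than the original query. The helper `group_rows` is a hypothetical pure-Python illustration of what the aggregation does to the denormalized rows.

```python
from collections import OrderedDict

# Assumed query, not taken from the original text. Every non-aggregated
# column in RETURN (here: n) acts as the grouping key, like GROUP BY.
AGGREGATED_QUERY = """
MATCH (n)
OPTIONAL MATCH (n)-[r]->()
RETURN n AS n2, COLLECT(r) AS rels
"""


def group_rows(rows):
    """Group (node_id, relationship) rows into one entry per node."""
    grouped = OrderedDict()
    for node_id, rel in rows:
        rels = grouped.setdefault(node_id, [])
        # COLLECT skips nulls, so a node without relationships
        # ends up with an empty list rather than [None].
        if rel is not None:
            rels.append(rel)
    return grouped


# Hypothetical denormalized rows; strings stand in for relationships.
denormalized = [(1, "r12"), (1, "r13"), (2, None), (3, "r31")]
grouped = group_rows(denormalized)
```

Here node 1 appears once with both of its relationships, and node 2 appears once with an empty list.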
# This version expects a collection of relationships in the column 'rels'
# and the node in the column 'n2'.
# Note that this version doesn't handle dangling references.
def rs2graph_v2(rs):
    graph = networkx.MultiDiGraph()
    for record in rs:
        node = record['n2']
        # Test against None explicitly rather than relying on truthiness
        if node is None:
            raise ValueError('every row should have a node')
        print("adding node")
        nx_properties = {}
        nx_properties.update(node.properties)
        nx_properties['labels'] = list(node.labels)
        graph.add_node(node.id, **nx_properties)
        relationship_list = record['rels']
        for relationship in relationship_list:
            print("adding edge")
            graph.add_edge(
                relationship.start, relationship.end, key=relationship.type,
                **relationship.properties
            )
    return graph
Trying to extend to handle subgraphs
When we have relationship types that define subtrees (:PRECEDES in this
case), we can attempt to materialize in memory the subgraph reachable from a
given root. In the query below, the Token node with content nonesuch is taken
as the root.
The reader for this case can be used with a Cypher query like the following:
MATCH (a:Token {content: "nonesuch"})-[:PRECEDES*]->(t:Token)
WITH COLLECT(DISTINCT a) + COLLECT(DISTINCT t) AS nodes_
UNWIND nodes_ AS n
OPTIONAL MATCH p = (n)-[r]-()
WITH n AS n2, COLLECT(DISTINCT RELATIONSHIPS(p)) AS nestedrel
RETURN n2, REDUCE(output = [], rel in nestedrel | output + rel) AS rels
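The last two lines of the query deserve a note. Each matched path `p` is a single hop, so `RELATIONSHIPS(p)` is a one-element list, and collecting those gives a list of lists; the `REDUCE` then concatenates them into one flat list. The sketch below mirrors that step in Python, with strings as hypothetical stand-ins for relationships.

```python
from functools import reduce
from itertools import chain

# What COLLECT(DISTINCT RELATIONSHIPS(p)) yields for one node:
# one single-element list per matched one-hop path.
nestedrel = [["r1"], ["r2"], ["r3"]]

# Direct translation of REDUCE(output = [], rel in nestedrel | output + rel)
rels_reduce = reduce(lambda output, rel: output + rel, nestedrel, [])

# The more idiomatic Python spelling of the same flattening
rels_chain = list(chain.from_iterable(nestedrel))
```

Since every inner list holds exactly one relationship, collecting `r` directly with `COLLECT(DISTINCT r)` appears to give the same flat list without the `REDUCE`; that simplification is an untested assumption here.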
The Python code to read the result of this query is as follows:
# This version has to materialize the entire result set up front in order
# to check for dangling references. This may cause memory problems with
# large result sets.
def rs2graph_v3(rs):
    graph = networkx.MultiDiGraph()
    materialized_result_set = list(rs)
    node_id_set = {
        record['n2'].id for record in materialized_result_set
    }
    for record in materialized_result_set:
        node = record['n2']
        # Test against None explicitly rather than relying on truthiness
        if node is None:
            raise ValueError('every row should have a node')
        print("adding node")
        nx_properties = {}
        nx_properties.update(node.properties)
        nx_properties['labels'] = list(node.labels)
        graph.add_node(node.id, **nx_properties)
        relationship_list = record['rels']
        for relationship in relationship_list:
            # Bear in mind that when we ask for all relationships on a node,
            # in either direction, we may find one with an endpoint outside
            # the subgraph returned by this query, e.g. a node that PRECEDES
            # the root. Keep an edge only if both endpoints were returned;
            # add_edge would otherwise silently create the missing node.
            if (relationship.start in node_id_set
                    and relationship.end in node_id_set):
                print("adding edge")
                graph.add_edge(
                    relationship.start, relationship.end,
                    key=relationship.type, **relationship.properties
                )
            else:
                print("ignoring dangling relationship [no need to worry]")
    return graph
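If holding the whole result set in a list is a concern, one possible alternative is to add every edge as it arrives, remember only the IDs of the nodes that were actually returned, and prune the rest at the end. This is a sketch under assumptions, not a tested replacement: `rs2graph_streaming` is a hypothetical name, and `Node` and `Rel` are namedtuple stand-ins that merely mimic the attributes used above.

```python
from collections import namedtuple

import networkx

# Hypothetical stand-ins for the driver's node and relationship objects
Node = namedtuple("Node", "id labels properties")
Rel = namedtuple("Rel", "start end type properties")


def rs2graph_streaming(rs):
    graph = networkx.MultiDiGraph()
    seen = set()  # IDs of nodes the query actually returned
    for record in rs:
        node = record["n2"]
        seen.add(node.id)
        nx_properties = dict(node.properties)
        nx_properties["labels"] = list(node.labels)
        graph.add_node(node.id, **nx_properties)
        for rel in record["rels"]:
            # May implicitly create an endpoint that lies outside the subgraph
            graph.add_edge(rel.start, rel.end, key=rel.type, **rel.properties)
    # Removing a node also removes its incident (dangling) edges.
    graph.remove_nodes_from([n for n in graph if n not in seen])
    return graph


# Node 99 precedes the root but was not returned by the query.
records = [
    {"n2": Node(1, ["Token"], {"content": "nonesuch"}),
     "rels": [Rel(1, 2, "PRECEDES", {}), Rel(99, 1, "PRECEDES", {})]},
    {"n2": Node(2, ["Token"], {"content": "x"}),
     "rels": [Rel(1, 2, "PRECEDES", {})]},
]
graph = rs2graph_streaming(iter(records))
```

The trade-off is that dangling nodes and edges live in the graph temporarily, so peak memory is only lower when the raw records are heavier than the few extra graph entries.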