20
Neo4j AuraDB in Python

This chapter's example uses Python to build a NoSQL graph database in the cloud. It uses the Neo4j AuraDB database to build and perform queries on the org chart shown in Figure 20.1.

A representation of the org chart to perform queries.

FIGURE 20.1

This example uses an adapter called neo4j, which is the official database adapter provided by Neo4j.

The following section explains how to install the Neo4j AuraDB database engine in the cloud. The section after that explains how to install the neo4j database adapter. The rest of the chapter describes the example program.

INSTALL NEO4J AURADB

Neo4j AuraDB is a fully managed graph database service in the cloud. Neo4j provides many other services that can do more than AuraDB, but this one is simple and, best of all, free.

To install AuraDB, go to http://neo4j.com/cloud/platform/aura-graph-database, click the Start Free button, fill out the registration form, and create a free account.

To create a database instance, click New Instance. On the “Get started by picking a dataset or an empty instance” page, hover over Empty Instance and click Create.

When AuraDB creates the database, it automatically generates credentials that you'll need to use later to connect to the database. It then displays a password, as shown in Figure 20.2. Click Download to save the credentials somewhere safe.

A representation of the Neo4j Aura screen exposes the username and the generated password.

FIGURE 20.2

Check the box that indicates you'll take the blame if you lose your password, and click Continue to see a screen similar to Figure 20.3.

A representation of the Neo4j Aura screen exhibits the instances.

FIGURE 20.3

You can use the New Instance button to create another new database instance, but you only get one for free, so you'll either need to delete this one first or pay for the second.

If you click the instance shown in Figure 20.3, you can view an example program. Tabs let you pick one of these programming languages: Python, JavaScript, GraphQL, Java, Spring Boot, .NET, or Go. You can look through that code now if you like. We'll use some of it later in this chapter and in the next chapter.

NODES AND RELATIONSHIPS

Before we start looking at code, you should know exactly what kinds of information a graph database stores.

An AuraDB database holds two types of objects: nodes and relationships. A node represents some sort of item such as a person, movie, location, or bank account. A relationship represents a relationship between two nodes.

For instance, the Neo4j website includes a sample database that represents the relationships between movies, actors, and directors. The ACTED_IN relationship represents the fact that a particular person acted in a movie. Similarly, the DIRECTED relationship means a person directed a movie. It's all fairly intuitive and easy to understand, at least until you start looking at complicated queries.

All relationships in AuraDB are directed, which means they point from one node to another. For example, Harrison Ford ACTED_IN the movie Blade Runner, but Blade Runner did not ACTED_IN Harrison Ford. That should be blindingly obvious, but more symmetrical relationships can be a bit more confusing.

For example, consider the relationship IS_A_SIBLING_OF. Just because Bart Simpson IS_A_SIBLING_OF Lisa Simpson, that doesn't mean that Lisa Simpson IS_A_SIBLING_OF Bart Simpson as far as AuraDB is concerned. If you want the relationship to go both ways, then you need to define it in both directions.

However, relationships do implicitly define a reversed relationship. For example, suppose Homer Simpson IS_PARENT_TO Maggie Simpson. Then you can search the database to find nodes X where either of the following is true:

  • Homer Simpson IS_PARENT_TO X.
  • X IS_PARENT_TO Maggie Simpson.

Even though you have only defined the relationship in one direction, you can search for nodes that satisfy the reverse if you want to do so.
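The two searches can be sketched as follows. This is only an illustration: the Person label and Name property in the Cypher strings are assumptions (the chapter does not define them), and the small in-memory list merely imitates what the database does so you can try the idea without a connection.

```python
# Hypothetical Cypher for the two searches (Person and Name are assumed names).
CHILDREN_OF_HOMER = (
    "MATCH (:Person { Name:'Homer Simpson' })-[:IS_PARENT_TO]->(x) "
    "RETURN x"
)
PARENTS_OF_MAGGIE = (
    "MATCH (x)-[:IS_PARENT_TO]->(:Person { Name:'Maggie Simpson' }) "
    "RETURN x"
)

# In-memory imitation: each tuple is (from_node, relationship, to_node).
relationships = [
    ("Homer Simpson", "IS_PARENT_TO", "Maggie Simpson"),
    ("Homer Simpson", "IS_PARENT_TO", "Bart Simpson"),
    ("Marge Simpson", "IS_PARENT_TO", "Maggie Simpson"),
]

def targets_of(source, rel_type):
    """Follow relationships forward: source -[rel_type]-> x."""
    return sorted(t for s, r, t in relationships
                  if s == source and r == rel_type)

def sources_of(target, rel_type):
    """Follow the same relationships backward: x -[rel_type]-> target."""
    return sorted(s for s, r, t in relationships
                  if t == target and r == rel_type)

print(targets_of("Homer Simpson", "IS_PARENT_TO"))
print(sources_of("Maggie Simpson", "IS_PARENT_TO"))
```

Each relationship is stored once, in one direction, yet both functions can answer their question from the same list. The two Cypher patterns differ only in which end of the arrow holds the unknown node x.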

If this is confusing, it may be a little clearer when you see some examples of database queries.

CYPHER

Cypher is the query language supported by AuraDB. Despite its cryptographic-sounding name (and somewhat cryptic syntax), Cypher is a graph query language that was developed by Neo4j. It was inspired by SQL, so it has a vaguely SQL-like flavor but with some changes to support graph queries. You can learn more about the language as implemented by AuraDB at https://neo4j.com/developer/cypher.

The language is now open source, and you can get information about openCypher at http://opencypher.org. The plan is to turn openCypher into a standard called Graph Query Language (GQL). It's not there yet, but it's likely that the final GQL will have a lot in common with Cypher.

You'll learn more about Cypher when you see examples in the program.

CREATE THE PROGRAM

Now that the database is waiting for you in the cloud, you need to install a database driver for it. Then you can start writing code.

To create a Python program to work with the AuraDB org chart database, create a new Jupyter Notebook and then add the code described in the following sections.

Install the Neo4j Database Adapter

To install the neo4j database adapter, simply use the following pip command:

$ pip install neo4j

If you're not familiar with pip, you can execute the following command in a Jupyter Notebook cell instead:

!pip install neo4j

That's all there is to it!

The neo4j database adapter uses a somewhat unusual approach for executing commands. The following steps show the general approach:

  1. Create a database session.
  2. Call the session object's write_transaction and read_transaction methods, passing them an action method that performs the desired action. That method should take a transaction object as a parameter and should use the transaction object's run method to do the work.
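The shape of those two steps can be sketched without a database. In this hedged example, RecordingSession and RecordingTransaction are hypothetical stand-ins written just for illustration (they are not part of the neo4j adapter), and count_nodes is a made-up action method.

```python
class RecordingTransaction:
    """Hypothetical stand-in for a transaction; it only records calls."""
    def __init__(self):
        self.calls = []

    def run(self, statement, parameters=None):
        self.calls.append((statement, parameters))
        return []


class RecordingSession:
    """Hypothetical stand-in for a session."""
    def __init__(self):
        self.tx = RecordingTransaction()

    def write_transaction(self, action, *args):
        # Hand the action method a transaction plus any extra arguments.
        return action(self.tx, *args)

    # For this sketch, reading has the same shape as writing.
    read_transaction = write_transaction


def count_nodes(tx):
    """An action method: it takes a transaction and calls its run method."""
    return tx.run("MATCH (n) RETURN count(n) AS total")


session = RecordingSession()
session.read_transaction(count_nodes)
print(session.tx.calls)
```

Notice that the calling code never creates the transaction itself. It passes the action method by name, and the session supplies the transaction object when it invokes that method.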

To get started, we need to write the action methods. Then we can write more code to launch those methods. (Again, if this is confusing, just go with the flow until you see some example code.)

The following sections describe the lower-level action methods. The sections after those describe higher-level methods that build the org chart. Finally, the last sections in this part of the chapter describe the main program that uses the other methods to build and query the org chart.

Action Methods

Each action method takes a transaction object as its first parameter and then uses that object to run a database command or query.

They all follow the same pattern, so you'll probably get the hang of them quickly. The more interesting differences are the query commands that they send to the database. Those queries are vaguely reminiscent of SQL, but they're not the same because they need to work with a graph database instead of a relational database.

Table 20.1 lists the action methods and gives their purposes.

TABLE 20.1: Action methods and their purposes

METHOD                PURPOSE
delete_all_nodes      Delete all nodes and their links.
make_node             Create a new node.
make_link             Create a relationship between two nodes.
execute_node_query    Execute a query that returns nodes.
find_path             Find a path between two nodes.

delete_all_nodes

The following code shows how the delete_all_nodes method deletes all of the nodes in the database:

# Import the Neo4j driver and supporting modules.
from neo4j import GraphDatabase
import logging
from neo4j.exceptions import ServiceUnavailable
 
# Delete all previous nodes and links.
def delete_all_nodes(tx):
    statement = "MATCH (n) DETACH DELETE n"
    tx.run(statement)

This code starts with some import statements. It then defines the delete_all_nodes action method. The method takes a transaction object tx as a parameter and passes the string MATCH (n) DETACH DELETE n into the transaction's run method to make the database execute that command. Easy, right?

This is the basic pattern that all the action methods follow. They pass a database command into the transaction object's run method. The part that differentiates this action method from others is the database command.

The MATCH command is similar to a SQL query. It uses a pattern to match objects inside the database. Later parts of the command do things with any objects that match.

In this case, the pattern (n) matches any node. In general, expressions inside parentheses () match nodes, and expressions surrounded by square brackets [] match relationships.

Note that you cannot delete a node if it is involved in any relationships. (This is kind of like a graph database version of a foreign key constraint.) You can either delete the relationships first or include the DETACH keyword to tell the database to delete those relationships automatically before deleting the node.

The last part of the command is DELETE n, which deletes the node n. Here n is a node that was matched by the MATCH (n) part of the command.

To summarize, this statement says, “Match all nodes, detach their relationships, and then delete them.” (Now that you have the hang of it, the other action methods will go faster.)
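For comparison, here is a rough sketch of the other option mentioned above: deleting the relationships first and then the nodes. The function name delete_all_in_two_steps and the RecordingTransaction stand-in are hypothetical. They are not part of the chapter's program or the neo4j adapter.

```python
def delete_all_in_two_steps(tx):
    # Step 1: remove every relationship so that no node is still attached.
    tx.run("MATCH ()-[r]->() DELETE r")
    # Step 2: the nodes are now unattached, so a plain DELETE is allowed.
    tx.run("MATCH (n) DELETE n")


class RecordingTransaction:
    """Hypothetical stand-in that records statements instead of running them."""
    def __init__(self):
        self.statements = []

    def run(self, statement, parameters=None):
        self.statements.append(statement)


tx = RecordingTransaction()
delete_all_in_two_steps(tx)
for statement in tx.statements:
    print(statement)
```

The DETACH version does the same work in a single statement, which is why the chapter's program uses it.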

make_node

The following code shows how the make_node method creates a new node:

# Use parameters to make a node.
def make_node(tx, id, title):
    statement = "CREATE (n:OrgNode { ID:$id, Title:$title })"
    parameters = {
        "id": id,
        "title": title
    }
    tx.run(statement, parameters)

This method creates a statement that uses the CREATE command to add a node to the database. The n:OrgNode part tells the database that this node's type is OrgNode. That's just a name that I made up for this kind of node. You could create EmployeeNode, ProductNode, CandyCaneNode, or any node types that make sense for your application. You also don't need to include the word Node; I just decided that it would make the queries easier to understand.

The part of the statement that's inside the curly brackets tells the database to give the node two properties, ID and Title. The parts that begin with dollar signs, $id and $title, are placeholders. We'll provide values for the placeholders shortly.

Note that property names are case-sensitive in AuraDB, so if you create a Title property and later search for a title property, you won't get any matches. (And you'll waste a lot of time wondering why not.)

The code then makes a dictionary that uses the placeholder names as keys and supplies values for them. The first dictionary entry associates the $id placeholder with the parameter id that was passed into make_node. The second entry associates the $title placeholder with the method's title parameter.

The code finishes by calling the transaction object's run method, passing it the command string and the parameters dictionary.
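One reason to prefer placeholders over pasting values into the command text can be seen in a small sketch. The node ID J and its title are hypothetical values chosen for illustration. They are not part of the chapter's org chart.

```python
# Hypothetical values: node "J" is not in the chapter's org chart.
node_id = "J"
title = "Dir Who's-There Jokes"

# Risky approach: paste the values directly into the command text.
pasted = f"CREATE (n:OrgNode {{ ID:'{node_id}', Title:'{title}' }})"
# The apostrophe in the title leaves an odd number of quotes, so the
# string literal ends early and the command text is malformed.
print(pasted)
print(pasted.count("'"))

# Placeholder approach: the command text never changes, and the values
# travel separately in the parameters dictionary.
statement = "CREATE (n:OrgNode { ID:$id, Title:$title })"
parameters = {"id": node_id, "title": title}
print(statement)
print(parameters)
```

With placeholders, the apostrophe is just another character in the value, so there is nothing to escape.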

make_link

The following code shows how the make_link method creates a relationship between two nodes:

# Use parameters to create a link between an OrgNode and its parent.
def make_link(tx, id, boss):
    statement = (
        "MATCH"
        "    (a:OrgNode),"
        "    (b:OrgNode) "
        "WHERE"
        "    a.ID = $id AND "
        "    b.ID = $boss "
        "CREATE (a)-[r:REPORTS_TO { Name:a.ID + '->' + b.ID }]->(b)"
    )
    parameters = {
        "id": id,
        "boss": boss
    }
    tx.run(statement, parameters)

This method creates a REPORTS_TO link for the org chart. The MATCH statement looks for two OrgNode objects, a and b, where a.ID matches the id parameter passed into the method and b.ID matches the boss parameter.

After the MATCH statement finds those two nodes, the CREATE command makes a new REPORTS_TO relationship leading from node a to node b. It gives the relationship a Name property that contains the two IDs separated by ->. For example, if the nodes' IDs are A and B, then this relationship's Name is A->B.
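If you want to confirm which links exist, an action method along the following lines could list their Name properties. This is a sketch rather than part of the chapter's program: list_link_names and the FakeTransaction stand-in are hypothetical, and the stand-in returns canned records so the example runs without a database.

```python
def list_link_names(tx):
    """Return the Name property of every REPORTS_TO relationship."""
    statement = (
        "MATCH (:OrgNode)-[r:REPORTS_TO]->(:OrgNode) "
        "RETURN r.Name AS name "
        "ORDER BY name"
    )
    return [record["name"] for record in tx.run(statement)]


class FakeTransaction:
    """Hypothetical stand-in that returns canned records."""
    def run(self, statement, parameters=None):
        return [{"name": "B->A"}, {"name": "C->A"}]


print(list_link_names(FakeTransaction()))
```

Because the method only reads data, it would be launched with the session's read_transaction method rather than write_transaction.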

execute_node_query

The following code shows the execute_node_query method, which executes a query that matches nodes that are named n in the query. This method returns a list of the query's results.

# Execute a query that returns zero or more nodes identified by the name "n".
# Return the nodes in a list of strings with the format "ID: Title".
def execute_node_query(tx, query):
    result = []
    for record in tx.run(query):
        node = record["n"]
        result.append(f"{node['ID']}: {node['Title']}")
    return result

The method first creates an empty list to hold results. It then runs the query and loops through the records that it returns.

Inside the loop, it uses record["n"] to get the result named n from the current record. That result will be a node selected by the MATCH statement. (You'll see that kind of MATCH statement a bit later.) The code copies the node's ID and Title properties into a string and adds the string to the result list.

After it finishes looping through the records, the method returns the list of results.

find_path

The following code shows how the find_path method finds a path between two nodes in the org chart:

# Execute a query that returns a path from node id1 to node id2.
# Return the nodes in a list of strings with the format "ID: Title".
def find_path(tx, id1, id2):
    statement = (
        "MATCH"
        "    (start:OrgNode { ID:$id1 } ),"
        "    (end:OrgNode { ID:$id2 } ), "
        "    p = shortestPath((start)-[:REPORTS_TO *]-(end)) "
        "RETURN p"
    )
    parameters = {
        "id1": id1,
        "id2": id2
    }
 
    record = tx.run(statement, parameters).single()
    path = record["p"]
    result = []
    for node in path.nodes:
        result.append(f"{node['ID']}: {node['Title']}")
 
    return result

This code looks for two nodes that have the IDs id1 and id2 that were passed into the method as parameters. It names the matched nodes start and end.

It then calls the database's shortestPath method to find a shortest path from start to end following REPORTS_TO relationships. The * means the path can have any length. The statement saves the path that it found in a variable named p.

Note that the shortestPath method only counts the number of relationships that it crosses; it doesn't consider costs or weights on the relationships. In other words, it looks for a path with the fewest steps, not necessarily the shortest total cost as you would like in a street network, for example. Some AuraDB databases can perform the least total cost calculation and execute other graph algorithms, but the free version cannot.
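To make "fewest steps" concrete, the following sketch runs a breadth-first search over an in-memory copy of this chapter's org chart. It is not how the database implements shortestPath. It only shows the same idea of counting relationships rather than weights. The function name fewest_hops_path is made up for this example. Links are followed in either direction because the pattern in find_path has no arrowhead.

```python
from collections import deque

# In-memory copy of the org chart: employee -> boss.
REPORTS_TO = {
    "B": "A", "C": "A", "D": "B", "E": "B",
    "F": "C", "G": "C", "H": "G", "I": "G",
}

def fewest_hops_path(start, end):
    """Return a path with the fewest links from start to end, or None."""
    # Build an undirected neighbor list from the directed links.
    neighbors = {}
    for worker, boss in REPORTS_TO.items():
        neighbors.setdefault(worker, []).append(boss)
        neighbors.setdefault(boss, []).append(worker)

    previous = {start: None}   # Also serves as the visited set.
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            # Walk backward through previous to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in neighbors.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

print(fewest_hops_path("H", "A"))
print(fewest_hops_path("D", "I"))
```

Breadth-first search visits nodes in order of their distance in links from the start node, so the first time it reaches the end node, it has found a path with the fewest steps.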

After it composes the database command, the method executes it, passing in the necessary parameters. It calls the result's single method to get the one record that the query returns.

It then looks up the p entry in that record, which holds the path. (Remember that the statement saved the path with the name p.)

The code loops through the path's nodes and adds each one's ID and Title values to a result list. The method finishes by returning that list.
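If either ID does not exist, the query produces no record. The following sketch shows one way to guard against that, assuming that single returns None when there is no record to return. The helper name labels_from_path_record and the stand-in objects are hypothetical and exist only so the example can run without a database.

```python
from types import SimpleNamespace

def labels_from_path_record(record):
    """Turn the record returned by single() into "ID: Title" strings."""
    if record is None:
        # Assumption: no matching path means there is no record at all.
        return []
    return [f"{node['ID']}: {node['Title']}" for node in record["p"].nodes]


# Hypothetical stand-ins shaped like a record that holds a path.
fake_path = SimpleNamespace(nodes=[
    {"ID": "H", "Title": "Mgr Pratfalls"},
    {"ID": "G", "Title": "Dir Physical Humor"},
])
fake_record = {"p": fake_path}

print(labels_from_path_record(fake_record))
print(labels_from_path_record(None))
```

Returning an empty list keeps the calling code simple because its for loop just prints nothing when no path exists.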

Org Chart Methods

That's the end of the action methods. They do the following:

  • Delete all nodes.
  • Make a node.
  • Make a link.
  • Execute a query that returns nodes.
  • Find a path between two nodes.

The following sections describe the two methods that use those tools to build and query the org chart. The earlier action methods make these two relatively straightforward.

build_org_chart

The following code shows how the build_org_chart method builds the org chart:

# Build the org chart.
def build_org_chart(tx):
    # Make the nodes.
    make_node(tx, "A", "President")
    make_node(tx, "B", "VP Ambiguity")
    make_node(tx, "C", "VP Shtick")
    make_node(tx, "D", "Dir Puns and Knock-Knock Jokes")
    make_node(tx, "E", "Dir Riddles")
    make_node(tx, "F", "Mgr Pie and Food Gags")
    make_node(tx, "G", "Dir Physical Humor")
    make_node(tx, "H", "Mgr Pratfalls")
    make_node(tx, "I", "Dir Sight Gags")
 
    # Make the links.
    make_link(tx, "B", "A")
    make_link(tx, "C", "A")
    make_link(tx, "D", "B")
    make_link(tx, "E", "B")
    make_link(tx, "F", "C")
    make_link(tx, "G", "C")
    make_link(tx, "H", "G")
    make_link(tx, "I", "G")

This method calls the make_node method repeatedly to make the org chart's nodes. It then calls the make_link method several times to make the org chart's relationships.

Notice that each call to make_node and make_link includes the transaction object that build_org_chart received as a parameter.

query_org_chart

The following code shows how the query_org_chart method performs some queries on the finished org chart:

# Perform some queries on the org chart.
def query_org_chart(tx):
    # Get F.
    print("F:")
    result = execute_node_query(tx,
        "MATCH (n:OrgNode { ID:'F' }) " +
        "return n")
    print(f"    {result[0]}")
 
    # Who reports directly to B.
    print("\nReports directly to B:")
    result = execute_node_query(tx,
        "MATCH " +
        "    (n:OrgNode)-[:REPORTS_TO]->(a:OrgNode { ID:'B' }) " +
        "return n " +
        "ORDER BY n.ID")
    for s in result:
        print("    " + s)
 
    # Chain of command for H.
    print("\nChain of command for H:")
    result = find_path(tx, "H", "A")
    for s in result:
        print("    " + s)
 
    # All reports for C.
    print("\nAll reports for C:")
    result = execute_node_query(tx,
        "MATCH " +
        "    (n:OrgNode)-[:REPORTS_TO *]->(a:OrgNode { ID:'C' }) " +
        "return n " +
        "ORDER BY n.ID")
    for s in result:
        print("    " + s)

This method first calls the execute_node_query method to execute the following query.

MATCH (n:OrgNode { ID:'F' }) return n

This query finds the node with ID equal to F. The code prints the result.

Next, the method looks for nodes that have the REPORTS_TO relationship ending with node B. That returns all of the nodes that report directly to node B. The code loops through the results, displaying them.

The method then uses the find_path method to find a path from node H to node A. Node A is at the top of the org chart, so this includes all of the nodes in the chain of command from node H to the top.

The last query the method performs matches the following:

(n:OrgNode)-[:REPORTS_TO *]->(a:OrgNode { ID:'C' })

This finds nodes n that are related via any number (*) of REPORTS_TO relationships to node C. That includes all the nodes that report directly or indirectly to node C. Graphically those are the nodes that lie below node C in the org chart.
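You can check that reasoning with a small in-memory sketch. It imitates what the variable-length pattern selects by following each node's chain of bosses upward. The all_reports function is a hypothetical helper for this illustration, not part of the chapter's program.

```python
# In-memory copy of the org chart: employee -> boss.
REPORTS_TO = {
    "B": "A", "C": "A", "D": "B", "E": "B",
    "F": "C", "G": "C", "H": "G", "I": "G",
}

def all_reports(boss):
    """Return the IDs that reach boss through one or more REPORTS_TO links."""
    result = []
    for worker in REPORTS_TO:
        node = worker
        # Climb toward the top of the chart, one link at a time.
        while node in REPORTS_TO:
            node = REPORTS_TO[node]
            if node == boss:
                result.append(worker)
                break
    return sorted(result)

print(all_reports("C"))
print(all_reports("B"))
```

The first call returns F, G, H, and I, which matches the nodes drawn below node C in the org chart.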

Main Program

The previous methods make working with the org chart fairly simple. All we need to do now is get them started. The following code shows how the main program does that:

# Replace the following with your database URI, username, and password.
uri = "neo4j+s://386baeab.databases.neo4j.io"
user = "neo4j"
password = "InsertYourReallyLongAndSecurePasswordHere"
 
driver = GraphDatabase.driver(uri, auth=(user, password))
 
with driver.session() as session:
    # Delete any previous nodes and links.
    print("Deleting old data…")
    session.write_transaction(delete_all_nodes)
 
    # Build the org chart.
    print("Building org chart…")
    session.write_transaction(build_org_chart)
 
    # Query the org chart.
    print("Querying org chart…")
    session.read_transaction(query_org_chart)
 
# Close the driver when finished with it.
driver.close()

This code first defines the uniform resource identifier (URI) where the database is located, the username, and the password. You can find these in the credential file that you downloaded when you created the database instance. (I hope you saved that file! If you didn't, then this might be a good time to delete the database instance and start over.)
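If you would rather not paste the password into a notebook, one option is to read the three values from environment variables. The following is a minimal sketch. The variable names NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD and the load_credentials helper are assumptions made for this example, so adjust them to match however you store your credentials.

```python
import os

def load_credentials(env=None):
    """Return (uri, user, password) taken from environment variables."""
    if env is None:
        env = os.environ
    uri = env.get("NEO4J_URI")
    user = env.get("NEO4J_USERNAME", "neo4j")   # Assumed default username.
    password = env.get("NEO4J_PASSWORD")

    missing = [name for name, value in
               (("NEO4J_URI", uri), ("NEO4J_PASSWORD", password))
               if not value]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return uri, user, password


# Example with a plain dictionary standing in for the real environment.
example_env = {
    "NEO4J_URI": "neo4j+s://example.databases.neo4j.io",
    "NEO4J_PASSWORD": "InsertYourReallyLongAndSecurePasswordHere",
}
uri, user, password = load_credentials(example_env)
print(uri, user)
```

The returned values can then be passed to GraphDatabase.driver(uri, auth=(user, password)) exactly as in the main program.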

Next, the code uses the URI, username, and password to create a graph database driver. It then creates a new session on the driver. It does that inside a with statement to ensure that the session is cleaned up properly when the program is finished with it.

The program then uses the session's write_transaction method to run delete_all_nodes. This is the step that starts delete_all_nodes and passes it the transaction object. It uses write_transaction to run delete_all_nodes inside a write transaction because delete_all_nodes modifies data.

The code runs the build_org_chart method similarly.

The program then calls the session's read_transaction method to run query_org_chart. It uses read_transaction this time because query_org_chart only reads data.

After it has finished displaying its results, the program closes the database driver.

The following shows the program's output:

Deleting old data…
Building org chart…
Querying org chart…
F:
    F: Mgr Pie and Food Gags
 
Reports directly to B:
    D: Dir Puns and Knock-Knock Jokes
    E: Dir Riddles
 
Chain of command for H:
    H: Mgr Pratfalls
    G: Dir Physical Humor
    C: VP Shtick
    A: President
 
All reports for C:
    F: Mgr Pie and Food Gags
    G: Dir Physical Humor
    H: Mgr Pratfalls
    I: Dir Sight Gags

Figure 20.4 shows the same org chart as Figure 20.1 so you can look at it to see that the output is correct.

A representation of the org chart.

FIGURE 20.4

SUMMARY

This chapter shows how you can use Python and a NoSQL graph database to build and explore an org chart. You can use similar techniques to work with other trees and, more generally, graphs that are not trees.

As you work with this example, you might notice that operations are relatively slow, particularly if you have a slow network connection. This is generally true of cloud applications. Network communications tend to be slower than local calculations.
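One common way to reduce the number of network round trips is to send several rows in a single command. The following sketch uses Cypher's UNWIND clause to create many nodes with one call to run. The make_nodes method and the RecordingTransaction stand-in are hypothetical, and whether this noticeably speeds up a program as small as this one is an assumption you would need to measure.

```python
def make_nodes(tx, rows):
    """Create one OrgNode per row with a single command."""
    statement = (
        "UNWIND $rows AS row "
        "CREATE (n:OrgNode { ID: row.id, Title: row.title })"
    )
    tx.run(statement, {"rows": rows})


class RecordingTransaction:
    """Hypothetical stand-in that records calls instead of running them."""
    def __init__(self):
        self.calls = []

    def run(self, statement, parameters=None):
        self.calls.append((statement, parameters))


rows = [
    {"id": "A", "title": "President"},
    {"id": "B", "title": "VP Ambiguity"},
    {"id": "C", "title": "VP Shtick"},
]
tx = RecordingTransaction()
make_nodes(tx, rows)
print(len(tx.calls))
```

Here three nodes are described by a single call to run, whereas calling make_node three times would send three separate commands.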

The pattern that this example used was to:

  1. Create a database session.
  2. Call the session object's write_transaction and read_transaction methods, passing them an action method.
  3. Use an action method that takes a transaction object as a parameter and uses it to do the work.

The next chapter shows how to build a similar example program in C#. Because it uses a different database adapter, it also uses a different pattern.

Before you move on to that example, however, use the following exercises to test your understanding of the material covered in this chapter. You can find the solutions to these exercises in Appendix A.
