Planet Neo4j

Neo4j Blog

Access control lists the graph database way

In many contexts you need to handle user permissions to access, create or change some kind of resources. A common example is a file system, and that's what we are going to dive into in this blog post. We're going to use Ruby bindings for the Neo4j graph database to create a small - but working - example application.

Preparation

To set up the environment for this example on Ubuntu, I used the following commands:

sudo apt-get install jruby
sudo jruby -S gem install neo4j

To import the libraries, the following code was used:

require 'rubygems'
require 'neo4j'
require 'neo4j/extensions/find_path'

Heading for the node space

So user permissions, what are they all about? Obviously it's about users, and usually user groups as well. We'll abstract this away a bit and use the term principals, which can be single users or groups.

The other side of user permissions are the resources which are to be protected. In our case we'll have a file system, so there will be folders and files. Here we'll use the term content.

Let's start out building a graph to support the application from what we have gathered so far! When working with a graph it's beneficial to think in a graphy manner, so that's where we'll begin. Graphs are presumably about connecting things, so our first step is to create some relationships. Neo4j comes with a built-in reference node, which is easily accessible at all times. We use this to create our own "subreference nodes", one for principals and one for content. This is how our graph looks so far:

To create (and get) the subreference nodes, we use this function:

def get_or_create_sub_ref( name )
  # look for an existing subreference node of this type
  result = Neo4j.ref_node.rels.outgoing( name ).nodes.first
  if ( result.nil? )
    # none found: create it and connect it to the reference node
    result = Neo4j::Node.new :name => name.to_s.capitalize.gsub("_", " ")
    Neo4j.ref_node.rels.outgoing( name ) << result
  end
  return result
end

This function is then called whenever we need to use a subreference node. The important parts here are:

  • ref_node: the built-in reference node
  • rels: relationships connected to a node
  • outgoing: the direction of the relationship (the relationships are always directed, but you can choose to ignore the direction in traversals)
  • ( name ): the type of relationships to follow (the type can be ignored in traversals as well, but in our case we want to use it)
  • nodes: the nodes in the other end of the relationships
  • first: the first node found - there should only be one subreference node of each type

If the subreference node isn't found, it will be created and connected to the reference node. As you can see, we're adding a property with the key name to the nodes as well, which is there solely for the purpose of visualization (the images in this post are created using Neoclipse).
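To try the get-or-create pattern without a database at hand, here's a minimal pure-Ruby sketch. SketchNode, REF_NODE and get_or_create_sub_ref_sketch are made-up stand-ins for illustration and are not part of the Neo4j API; only the lookup-then-create logic mirrors the function above.

```ruby
# In-memory stand-in for a graph node (an assumption, not the Neo4j API).
class SketchNode
  attr_reader :props

  def initialize(props = {})
    @props = props
    # relationship type => list of nodes at the other end
    @outgoing = Hash.new { |hash, type| hash[type] = [] }
  end

  # Nodes reached through outgoing relationships of the given type.
  def outgoing(type)
    @outgoing[type]
  end
end

# Plays the role of the built-in reference node.
REF_NODE = SketchNode.new(:name => "Reference node")

def get_or_create_sub_ref_sketch(name)
  existing = REF_NODE.outgoing(name).first
  return existing unless existing.nil?

  # Not found: create the node and connect it to the reference node.
  created = SketchNode.new(:name => name.to_s.capitalize.gsub("_", " "))
  REF_NODE.outgoing(name) << created
  created
end

first_call  = get_or_create_sub_ref_sketch(:CONTENT_ROOTS)
second_call = get_or_create_sub_ref_sketch(:CONTENT_ROOTS)
puts first_call.props[:name]         # prints "Content roots"
puts first_call.equal?(second_call)  # prints "true"
```

The second call returns the very same object, which is the point of the pattern: no matter how often it's called, there's only one subreference node per relationship type.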

Basic structure

For the principals part, we are going to connect the top-level ones to the corresponding subreference node using a PRINCIPAL type of relationship. Other than that, there are just users and groups, so let's use an IS_MEMBER_OF_GROUP relationship type to encode that. This is how that looks in the graph:

And here's the code to create it:

def new_principal( name, member_of_groups = [] )
  principal = Neo4j::Node.new
  principal[ :name ] = name
  if member_of_groups.empty?
    # top-level principal: attach it to the principals subreference node
    get_or_create_sub_ref( :PRINCIPALS ).rels.outgoing( :PRINCIPAL ) << principal
  else
    for group in member_of_groups
      principal.rels.outgoing( :IS_MEMBER_OF_GROUP ) << group
    end
  end
  return principal
end

If a new principal isn't a member of any group, it's added as a top-level principal, connected to the principals subreference node. Otherwise, it's simply added to the groups.

With Neo4j all operations on the graph have to be encapsulated in a transaction, so this is how we'll call the above function:

Neo4j::Transaction.run do
  all_principals = new_principal( "All principals" )
  root = new_principal( "root", [ all_principals ] )
  regular_users = new_principal( "Regular users", [ all_principals ] )
  user1 = new_principal( "user1", [ regular_users ] )
  user2 = new_principal( "user2", [ regular_users ] )
end

For the content part, things are very similar to the principals part. The main difference is that in this case, an item can have only a single parent item. Here's the graphical view on that:

And this is the code to create the structure:

def new_content( name, parent = nil )
  content = Neo4j::Node.new
  content[ :name ] = name
  if ( parent.nil? )
    # no parent: this is a content root
    get_or_create_sub_ref( :CONTENT_ROOTS ).rels.outgoing( :CONTENT_ROOT ) << content
  else
    parent.rels.outgoing( :HAS_CHILD_CONTENT ) << content
  end
  return content
end

Similar to how the principals were created, this is the code to create the content data:

Neo4j::Transaction.run do
  root_folder = new_content( "Root folder" )
  temp_folder = new_content( "Temp", root_folder )
  home_folder = new_content( "Home", root_folder )
  user1_home_folder = new_content( "user1 home", home_folder )
  user2_home_folder = new_content( "user2 home", home_folder )
  a_file = new_content( "MyFile.pdf", user1_home_folder )
end

At the core

Now that we have the basic structure in place, what's left regarding our data is a small but crucial part: the permissions information! We're using a simple scheme: adding security relationships with optional boolean flags for read and write permission. This is what we want the full graph to look like:

A small function will help us add the security information:

def apply_security( content, principal, map_with_flags )
  security_relationship = Neo4j::Relationship.new( :SECURITY, principal, content )
  map_with_flags.each_pair {|key, value| security_relationship[ key ] = value}
end

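It's time to add the security data. Note that a variable first assigned inside a Ruby block is local to that block, so when running these snippets as one script, either put all the steps in a single transaction block or assign the variables (for example root_folder = nil) before the first block: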

Neo4j::Transaction.run do
  apply_security( root_folder, root, { "w" => true } )
  apply_security( root_folder, all_principals, { "r" => true } )
  apply_security( temp_folder, all_principals, { "w" => true } )
  apply_security( user1_home_folder, regular_users, { "r" => false, "w" => false } )
  apply_security( user1_home_folder, user1, { "r" => true, "w" => true } )
  apply_security( user2_home_folder, user2, { "r" => true, "w" => true } )
end

To check the permission for some action by an actual principal for some content, there's some work to do. This is the algorithm we use to retrieve a permission flag:

  1. Move from the content node and upwards through the file system structure and investigate each level for permission information.
  2. On each level, see if there are any principals related to or identical with the principal concerned.
  3. Make sure to use the permission information from the principal closest to the principal concerned.
  4. If permission information was found, return it; otherwise, continue traversing to the next level in the file system.

In the code for this, we'll use a function named depth_of_principal() to calculate the distance between the principal we have traversed to and the principal concerned. More on that later, here's the code to check the permissions:

def has_access( content, principal, flag )
  # walk from the content node upwards through the folder structure
  for current_content in content.incoming( :HAS_CHILD_CONTENT ).depth( :all )
    lowest_score = nil
    lowest_modifier = nil
    for rel in current_content.rels.incoming( :SECURITY )
      rel_principal = rel.start_node
      if !rel[ flag ].nil?
        score = depth_of_principal( rel_principal, principal )
        if !score.nil?
          modifier = rel[ flag ]
          # prefer the closest principal; on a tie, a granting flag wins
          if lowest_score.nil? || score < lowest_score ||
              ( score == lowest_score && modifier )
            lowest_score = score
            lowest_modifier = modifier
          end
        end
      end
    end
    if !lowest_modifier.nil?
      return lowest_modifier
    end
  end
  return false
end

Here's our function to check the distance between principals (and to see if they're on the same path at all).

def depth_of_principal( principal, reference_principal )
  result = reference_principal.outgoing( :IS_MEMBER_OF_GROUP ).depth( :all ).path_to( principal )
  return result.nil? ? nil : result.size
end

Finally, we want to see that everything works, so here's a utility function to print permission information:


def print_has_access( content, principal, flag )
  print principal[ :name ] + " +" + flag.upcase + " access to " + content[ :name ] + "? " +
    has_access( content, principal, flag ).to_s + "\n"
end

And here's how to use the function:

Neo4j::Transaction.run do
  print_has_access( home_folder, root, "w" )
  print_has_access( home_folder, user1, "w" )
  print_has_access( a_file, root, "r" )
  print_has_access( a_file, user2, "r" )
  print_has_access( a_file, user1, "w" )
end
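If you'd like to play with the permission algorithm without setting up JRuby and Neo4j, here's a rough pure-Ruby sketch of the same idea. Plain hashes stand in for the graph, and the names MEMBER_OF, PARENT, SECURITY, membership_distance and access_sketch? are invented for this sketch. One assumption to be aware of: the sketch also checks the starting content item itself for security information, which may differ from how the traversal above treats its start node.

```ruby
# principal => groups it is a direct member of
MEMBER_OF = {
  "root"          => ["All principals"],
  "Regular users" => ["All principals"],
  "user1"         => ["Regular users"],
  "user2"         => ["Regular users"]
}

# content item => its single parent
PARENT = {
  "Temp"       => "Root folder",
  "Home"       => "Root folder",
  "user1 home" => "Home",
  "user2 home" => "Home",
  "MyFile.pdf" => "user1 home"
}

# content item => { principal => permission flags }
SECURITY = {
  "Root folder" => { "root" => { "w" => true }, "All principals" => { "r" => true } },
  "Temp"        => { "All principals" => { "w" => true } },
  "user1 home"  => { "Regular users" => { "r" => false, "w" => false },
                     "user1" => { "r" => true, "w" => true } },
  "user2 home"  => { "user2" => { "r" => true, "w" => true } }
}

# Breadth-first search: membership hops from `from` up to `target`,
# 0 if they are identical, nil if `target` is not reachable.
def membership_distance(from, target)
  queue = [[from, 0]]
  seen = { from => true }
  until queue.empty?
    current, depth = queue.shift
    return depth if current == target
    (MEMBER_OF[current] || []).each do |group|
      next if seen[group]
      seen[group] = true
      queue << [group, depth + 1]
    end
  end
  nil
end

def access_sketch?(content, principal, flag)
  current = content
  until current.nil?
    best_score = nil
    best_value = nil
    (SECURITY[current] || {}).each do |rel_principal, flags|
      value = flags[flag]
      next if value.nil?
      score = membership_distance(principal, rel_principal)
      next if score.nil?
      # closest principal wins; on a tie, a granting flag wins
      if best_score.nil? || score < best_score || (score == best_score && value)
        best_score = score
        best_value = value
      end
    end
    return best_value unless best_value.nil?
    current = PARENT[current]  # nothing decided here: move up one level
  end
  false
end

puts access_sketch?("Home", "root", "w")         # prints "true"
puts access_sketch?("Home", "user1", "w")        # prints "false"
puts access_sketch?("MyFile.pdf", "root", "r")   # prints "true"
puts access_sketch?("MyFile.pdf", "user2", "r")  # prints "false"
puts access_sketch?("MyFile.pdf", "user1", "w")  # prints "true"
```

The five calls mirror the five checks above. For example, user2 is denied read access to MyFile.pdf because the explicit "r" => false for Regular users on "user1 home" is found before the traversal ever reaches the read grant on the root folder.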

Next steps

The full source code is found here

Here are a few useful resources to help you on your way:

Thanks for reading - any feedback is welcome!

by Anders Nawroth (noreply@blogger.com) at February 25, 2010 04:55 PM

Neo4j News

Neo4j 1.0 released

Recently version 1.0 of Neo4j was released. There has been a Neo Technology news post regarding this event, as well as a blog post on how to get to know Neo4j. The distribution is available as binary and source packages from the downloads page.

For more information, read the list mail announcement and check out the details in the changelog.

A few pointers to stuff that happened around and after the release:
As always, feedback to the mailing list, on Twitter or directly to us.

by Anders Nawroth (noreply@blogger.com) at February 23, 2010 03:41 PM

Neo4j Blog

The top 10 ways to get to know Neo4j

Today is a big day in Neo4j land because after ten long years of development and seven years of commercial 24/7 production we just announced Neo4j 1.0!

We're very excited about this and this post will outline the ten most interesting and fun ways of getting started with Neo4j. Without further ado, let's go!
  1. Wait, what is Neo4j?

    Neo4j is a graph database, that is, it stores data as nodes and relationships. Both nodes and relationships can hold properties in a key/value fashion. Here's a small example:

    You can navigate the structure either by following the relationships or use declarative traverser features to get to the data you want.

  2. Introduction

    For a high-level, nine-minute cocktail-party introduction to Neo4j, check out this interview with Emil Eifrem:

    (blip.tv)

    To watch a longer introduction, see the no:sql(east) 2009 presentation by Emil Eifrem.

  3. Handling complexity

    Most applications will not only have to scale to huge volumes, but also scale to the complexity of the domain at hand. Typically, there may be many interconnected entities and optional properties. Even simple domains can be complex to handle because of the queries you want to run on them, for example to find paths. Two coding examples are the social network example (partial Ruby implementation) and the Neo4j IMDB example (Ruby variation of the code). For more examples of different domains modeled in a graph database, visit the Domain Modeling Gallery.

  4. Storing objects

    The common domain implementation pattern when using Neo4j is to let the domain objects wrap a node, and store the state of the entity in the node properties. To relieve you from the boilerplate code needed for this, you can use a framework like jo4neo (intro, blog posts), where you use annotations to declare properties and relationships, but still have the full power of the graph database available for deep traversals and other graphy stuff. Here's a code sample showing jo4neo in action:

    public class Person {
        // used by jo4neo
        transient Nodeid node;
        // simple property
        @neo String firstName;
        // helps you store a java.util.Date to neo4j
        @neo Date date;
        // jo4neo will index for you
        @neo(index=true) String email;
        // many to many relation
        @neo Collection roles;

        /* normal class oriented
         * programming stuff goes here
         */
    }

    Another way to persist objects is by using the neo4j.rb Neo4j wrapper for Ruby. Time for a few lines of sample code again:

    require "rubygems"
    require "neo4j"

    class Person
      include Neo4j::NodeMixin
      # define Neo4j properties
      property :name, :salary, :age, :country

      # define a one-way relationship to any other node
      has_n :friends

      # adds a Lucene index on the following properties
      index :name, :salary, :age, :country
    end
  5. REST API

    Of course you want a RESTful API in front of the graph database as well. There's been plenty of work going on in that area and here are some options:

    • The neo4j.rb Ruby bindings come with a REST extension.
    • The neo4jr-simple Ruby wrapper has the neo4jr-social example project, which exposes social network data over a REST API.
    • Similarly, the Scala bindings have a companion example project which will show you how to set up a project exposing your data over REST.
    • Last but not least, Jim Webber has joined up with the core Neo4j team to create a kick-ass REST API. The current code base is only in the laboratory but a lot of people are already kicking its tires.
  6. Language bindings

    The Neo4j graph engine is written in Java, so you can easily add the jar file and start using the simple and minimalistic API right away. Your first stop should be the Getting started guide, or if you want to add a package of useful add-on components to the mix, go for Getting started with Apoc. Other language bindings:

  7. Frameworks

    Work is being done on using Neo4j as backend of different frameworks. Follow the links to get more information!

  8. Tools

    • Shell: a command-line shell for browsing and manipulating the graph.
    • Neoclipse: Eclipse plugin (and standalone application) for Neo4j. Visual interface to browse and edit the graph.
    • Batch inserter: tool to bulk upload big datasets quickly.
    • Online backup: performs backup of a running Neo4j instance.
  9. Query languages

    Beyond using Neo4j programmatically, you can also issue queries using a query language. These are the supported options at the moment:

    • SPARQL: Neo4j can be used as a triple- or quadstore, and has SAIL and SPARQL implementations. Go to the components site to find out more about the related components.
    • Gremlin: a graph-based programming-language with different backend implementations in the works as well as a supporting toolset.
  10. Inspiration

    Have a look at the Neo4j in the wild page to see what others are doing with Neo4j. Here's a selection:

Hopefully this post was a good starting guide to the Neo4j ecosystem. As always, please ask any questions on the mailing list or come hang out with us in the #neo4j channel on IRC.

by Anders Nawroth (noreply@blogger.com) at February 17, 2010 08:53 AM

Neo Technology News

Neo Technology Announces $2.5M Seed Funding for World's Leading Graph Database

Sunstone Capital and Conor Venture Partners Fund Open Source Database Company

NoSQL East Conference, Atlanta, GA and Malmö, Sweden - October 28, 2009 -

Neo Technology, developer of Neo4j, the world’s leading open source graph database, today announced that it has secured $2.5 million in seed funding to boost Neo4j’s presence in the emerging graph database market. The funds will be used to accelerate product development and expand sales and marketing efforts. The investment round was co-led by Sunstone Capital and Conor Venture Partners.

The company board of directors will be joined by Magnus Christerson, Vice President of Intentional Software Corp, Nikolaj Nyholm, serial entrepreneur and CEO of Polar Rose, Sami Ahvenniemi, Partner at Conor Venture Partners as well as Johan Svensson, co-founder and CTO of Neo Technology.

"Following in the footsteps of MySQL - another Swedish open source database company - Neo Technology needs a strong US presence to succeed", states Magnus Christerson. "Using my own experience transitioning a Swedish software startup to the US, I look forward to helping Neo Technology establish that US presence."

"The database market is rapidly changing. The need for alternatives to traditional SQL databases is suggested by three facts: 1) the increasing pain of using the 35-year-old SQL model, 2) companies like Google and Facebook building their own database technology and 3) the exploding NoSQL movement. Sunstone Capital is pleased to invest in one of the most promising NoSQL alternatives", said Christian Lindegård Jepsen, Partner in Sunstone Capital.

"Conor is excited to invest in this excellent team with solid technology that has shown some early customer wins in this rapidly growing database market segment", said Sami Ahvenniemi, Partner in Conor. "We look forward to the continued growth of Neo Technology's open source and commercial user bases."

"I am very happy to have Sunstone, Conor and our new board members join us to build Neo Technology into a global success", comments Emil Eifrem, co-founder and CEO of Neo Technology. "This investment will accelerate our product development and ability to serve customers worldwide."

About The Neo4j Graph Database

Neo4j, the world’s leading graph database, stores data in graphs rather than relational tables. This makes Neo4j especially suitable for applications that handle data with complex relationships, like social networks, life sciences, intelligence and financial applications. Neo4j offers users:

  • extremely high performance on deep traversals and mining of complex data,
  • rapid schema evolution for changing business requirements, and
  • simplified development through perfect match between domain model and database schema.

These advantages make Neo4j the most effective database choice by many social networking services and other applications that manage ever more complex business data.

For further information, please contact:

Emil Eifrem, CEO, Neo Technology, Tel. +1 (206) 403-8808 (US) or +46 733 462 271 (Europe), emil.eifrem (at) neotechnology.com

Christian Lindegård Jepsen, Partner, Sunstone Capital, Tel. +45 25 36 39 63, jepsen (at) sunstonecapital.com

Sami Ahvenniemi, Partner, Conor Venture Partners, Tel. +358 40 560 2734, sami.ahvenniemi (at) conor.vc

About Neo Technology

Neo Technology is the developer of Neo4j, the world’s leading open source graph database. In 24/7 production since 2003, Neo4j is available in open source under AGPLv3 and commercial license terms. For more information, please visit www.neotechnology.com.

About Sunstone Capital

Sunstone Capital is a Nordic-based early stage venture capital investor with over EUR 400 million in funds under management. Sunstone focuses on developing and expanding early-stage Technology and Life Science companies. For more information, please visit www.sunstonecapital.com.

About Conor Venture Partners

Conor Venture Partners is a leading early-stage technology VC investing in Finland, Sweden and the Baltics. Conor invests in disruptive technologies in ICT, embedded systems, electronics, new materials and optics. For more information, please visit www.conor.vc.

Emil will be presenting tomorrow at NoSQL East, sponsored by Neo Technology.

by Peter Neubauer at February 16, 2010 08:07 PM

Neo Technology Announces Version 1.0 of the Neo4j Graph Database

Malmö, Sweden - February 16, 2010 - Neo Technology announced today the general availability of version 1.0 of Neo4j, the world's leading graph database. Neo4j is a high performance database for complex data used by leading Web companies; with pre-1.0 versions in 24/7 production since 2003. Neo4j version 1.0 is immediately available both under a free open source license from www.neo4j.org, and under a supported commercial license from Neo Technology.

"Like many other successful Web 2.0 companies, Box.net has seen its share of pressure to deliver a high performance customer experience on a relational database like MySQL. Having experienced Neo4j first hand, I have to say that it is a godsend. Neo4j significantly outperforms MySQL, with a drastically simplified programming model," said Sam Ghods, VP of Technology with cloud content management provider Box.net.

"We have seen that the Neo4j graph database approach to implementing enterprise-scale projects can yield significant cost benefits and clearly differentiate our clients from their competitors," said Jim Webber, PhD, Director of Professional Services with global IT consultancy ThoughtWorks.

"We have made tremendous progress over the last few months. We have a large, vibrant and rapidly growing Neo4j community with hundreds of Neo4j-based projects under development. Our customer base is steadily growing and we are very humbled by the fantastic results they are achieving with Neo4j," said Emil Eifrem, CEO of Neo Technology.

by Peter Neubauer at February 16, 2010 11:40 AM

Neo4j Blog

Yay! The Graph Processing Infrastructure is starting to emerge!

Hi all,
over the last few months, the Tinkerpop team has started to venture into the big task of building a unified ecosystem for the world of graphs and related projects and products. Now, I am proud to say that things seem to be getting some traction, with increasing contributions from outside the core team (mainly the awesome Marko Rodriguez):



Logo contributed by Ketrina Yim
  • The JUNG graph library got adapted to Gremlin
  • HyperGraphDB is being adapted to work with Gremlin
  • a REST API based on the awesome work of Jim Webber and the Neo4j team is in the making by Michael Hunger and Pavel Yaskevich
So, here is the current project ecosystem - great work of everyone involved!

Gremlin:
  • mainly driven by Marko Rodriguez
  • a library and standalone, single-user Java project, defining a number of data models - to start with the Property Graph Model (PGM) and the General Document Model (GDM), soon to be broken out of the core Gremlin code.
  • Adapters to different underlying graph implementations, from Neo4j to SAIL, integrating anything from Sesame to a live LinkedData SAIL
  • Adapters to other interesting graph frameworks like JUNG, suggested by Seth@Automenta
  • A Turing complete scripting language for querying, modification and transformation of PGM and GDM compliant data structures
  • All selectors are XPath-based in syntax
  • Pluggable external path elements and function implementation.

RESTling:
Webling:
  • Driven mainly by Pavel Yaskevich via financing from Neo Technology
  • A web-based visual end-user interface to RESTling
  • A web-based terminal supporting execution of Gremlin operations and logic
  • Visualization support with graph libraries
  • Multi-user support
  • REST support for connecting to remote RESTling instances
Gargamel:
  • Driven mainly by Marko Rodriguez
  • An execution framework primarily targeted at Bulk Synchronous Parallel graph algorithms
  • A number of highly parallel base graph algorithms integrated into Gremlin to use this framework
  • A communication framework for execution of gremlin tasks on different (partitioned or replicated) graph instances, firstly using LinkedProcess (financed by the LANL) and XMPP, but replaceable with e.g. an Erlang-implementation (kudos to Ingo Schramm for suggesting it) or RESTling- based communication for optimization of different aspects like inter-process communication during execution

All in all, I just wanted to express my excitement over the whole emerging community around Gremlin, Neo4j and graphs in general! It is thrilling to see that the easy use of graphs and the internet-scale processing of complex data structures is starting to take shape in an open world, getting the different views on graphy data onto one page and providing a broader audience the possibility to use graph structures in the real world.

/peter neubauer

by Peter Neubauer (noreply@blogger.com) at February 08, 2010 05:43 PM

Taylor Cowan

Jo4neo’s “get most recent” feature

“show the latest ...” is a common prefix to user stories these days. Others have noted the same and given this symptom a moniker: “The real time web”. Typically we just throw things into a table with an indexed timestamp column and query accordingly. In jo4neo, finding the most recent additions requires two simple steps: annotate your type with @neo(recency=true), then use the ObjectGraph.getMostRecent() method [...]

by tcowan at January 11, 2010 03:56 AM

Taylor Cowan

Indexing time and URI’s in jo4neo

Graphs in and of themselves are not self indexing like relational databases, however, you can construct indexes via strong relationships between the nodes of interest.  The pattern I’ll be discussing in this post maps time (year, month, day, hour) into a graph format as nodes and edges.  Once time, or some subset, is represented as [...]
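To make the idea of mapping time into a tree of year, month, day and hour nodes a bit more concrete, here's a small pure-Ruby sketch. It is not jo4neo code: TimeTreeSketch, add and items_on are hypothetical names, and nested hashes merely stand in for the nodes and edges a graph database would hold.

```ruby
# Nested hashes play the role of year -> month -> day -> hour nodes.
class TimeTreeSketch
  def initialize
    @years = {}
  end

  # Walk the path year -> month -> day -> hour, creating the levels
  # that are missing, then attach the item to the hour "node".
  def add(time, item)
    months = (@years[time.year] ||= {})
    days   = (months[time.month] ||= {})
    hours  = (days[time.day] ||= {})
    (hours[time.hour] ||= []) << item
  end

  # All items recorded on a given day, ordered by hour.
  def items_on(year, month, day)
    hours = @years.fetch(year, {}).fetch(month, {}).fetch(day, {})
    hours.keys.sort.flat_map { |hour| hours[hour] }
  end
end

index = TimeTreeSketch.new
index.add(Time.utc(2010, 1, 7, 5, 48), "post-a")
index.add(Time.utc(2010, 1, 7, 3, 10), "post-b")
index.add(Time.utc(2010, 1, 8, 9, 0), "post-c")
p index.items_on(2010, 1, 7)  # prints ["post-b", "post-a"]
```

Looking up a day only touches the path down to that day, which is the property that makes such a structure usable as an index.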

by tcowan at January 07, 2010 05:48 AM

Taylor Cowan

Simple Blog Using jo4neo and Stripes

neoblog is a simple application I built to test drive jo4neo.  You are welcome to browse the code here for details not covered in this post. It demonstrates the feasibility of utilizing view tier objects to persist graph relationships. Stripes, Struts, and other Java MVC frameworks all hinge off of a domain model expressed as Java [...]

by tcowan at January 03, 2010 09:40 PM

Neo4j News

Neo4j 1.0-b11 released: stability & robustness

Neo4j 1.0-b11 — the open source nosql graph database — has been released. This is the last beta before we (after 6 years in commercial 24/7 production use) finally feel that we have a version that is worthy of 1.0. This means that the main focus of this release is stability and robustness rather than features. Having said that, Neo4j 1.0-b11 still includes amongst other things a new batch inserter version that implements the NeoService API (to minimize the impact of first-time imports on the rest of your code) and a lot of cleanup and improvements of the indexing utilities.

Download the Neo4j Core release or the Apoc bundle here.

For more information, read the list mail announcement and check out the details in the changelog.

As always, feedback to the mailing list, on Twitter or directly to us.

by Emil Eifrem (noreply@blogger.com) at December 27, 2009 08:03 PM

Neo4j Blog

Holiday fun with Neo4j

Looking for something fun to do during the holidays? Here are a few suggestions for some new cool Neo4j things that you can play around with.

A very recent addition to the Neo4j space is the JRuby library Neo4jr-social by Matthew Deiters:

Neo4jr-Social is a self contained HTTP REST + JSON interface to the graph database Neo4j. Neo4jr-Social supports simple dynamic node creation, building relationships between nodes and also includes a few common social networking queries out of the box (i.e. linkedin degrees of separation and facebook friend suggestion) with more to come. Think of Neo4jr-Social as being to Neo4j what Solr is to Lucene.
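For a feel of what those two social networking queries compute, here's a plain-Ruby sketch over a tiny in-memory friend list. It does not use Neo4jr-Social or show its API; FRIENDS, degrees_of_separation and friend_suggestions are names made up for this illustration, and friendships are assumed to be mutual.

```ruby
# person => direct friends (friendships listed in both directions)
FRIENDS = {
  "ann"  => ["bob"],
  "bob"  => ["ann", "cara"],
  "cara" => ["bob", "dave"],
  "dave" => ["cara"],
  "erin" => []
}

# Breadth-first search: length of the shortest friendship chain
# between two people, or nil if they are not connected.
def degrees_of_separation(from, to)
  queue = [[from, 0]]
  seen = { from => true }
  until queue.empty?
    person, degree = queue.shift
    return degree if person == to
    FRIENDS.fetch(person, []).each do |friend|
      next if seen[friend]
      seen[friend] = true
      queue << [friend, degree + 1]
    end
  end
  nil
end

# Friends of friends who are not already friends with the person.
def friend_suggestions(person)
  direct = FRIENDS.fetch(person, [])
  direct.flat_map { |friend| FRIENDS.fetch(friend, []) }.uniq - direct - [person]
end

p degrees_of_separation("ann", "dave")  # prints 3
p degrees_of_separation("ann", "erin")  # prints nil
p friend_suggestions("ann")             # prints ["cara"]
```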

Neo4jr-social is built on top of Neo4jr-simple:

A simple, ready to go JRuby wrapper for the Neo4j graph database engine.

There's also the Neo4j.rb JRuby bindings by Andreas Ronge which have been developed for quite a while by multiple contributors.

Staying in Ruby land, there's also some visualization and other social network analysis stuff going on.

Looking for something in Java? Then you definitely want to take a look at jo4neo by Taylor Cowan:

Simple object mapping for neo. No byte code interweaving, just plain old reflection and plain old objects.

There's also a blog post where Taylor shows how to model a User/Roles pattern using jo4neo.

There's apparently a lot of work going on right now in the Django camp to enable support for SQL and NOSQL databases alike. Tobias Ivarsson (who's the author and maintainer of the Neo4j Python bindings) recently implemented initial support for Neo4j in Django. Read his post Seamless Neo4j integration in Django for a look at what's new.

One more recent project is the Neo4j plugin for Grails. There are already some projects out there using it. We want to make sure Neo4j is a first-class Grails backend so expect more noise in this area in the future.

You can find (some of the) projects using Neo4j on the Neo4j In The Wild page. From the front page of the Neo4j wiki you'll find even more language bindings, tutorials and other things that will support you when playing around with Neo4j!

Happy Holidays and Happy Hacking wishes from the Neo4j team!

by Anders Nawroth (noreply@blogger.com) at December 25, 2009 05:14 PM

Taylor Cowan

User/Roles Pattern in jo4neo

Roles and Users is a classic domain model well suited to representation as a directed graph. The neo4j team has provided us with a good summary of how to implement this pattern using neo4j here . Utilizing jo4neo we can also solve this problem via a combination of the neo graph database and [...]

by admin at December 18, 2009 03:03 PM

Tobias Ivarsson

Seamless Neo4j integration in Django

About a year ago I gave a presentation at Devoxx where I showed off how easy it was to use any Java library with Django in Jython. The library I demonstrated this with was of course Neo4j. I had written some code for using Neo4j to define models for Django, and now it is ready to be released for you to use it.

The integration between Django and Neo4j is implemented in the Model layer. Since Neo4j does not have a SQL engine it would not have been efficient or practical to implement the support as a database layer for Django. Google did their implementation in the same way when they integrated BigTable with Django for App Engine. This means that there will be some minor modifications needed in your code compared to using PostgreSQL or MySQL. Just as with BigTable on App Engine, you will have to use a special library for defining your models when working with Neo4j, but the model definition is very similar to Django's built-in ORM. With persistence systems that integrate on the database layer the only difference is in configuration, but that requires the database to fit the mold of a SQL database.

Why the **** has this taken a year to finish?

Short answer: The cat ate my source code.

A mess of symlinks that stemmed from the fact that Jython didn't have good support for setuptools when I started writing this code actually led to the complete loss of my source code. But to be honest the code wasn't that good anyway. I wanted to add support for Django's administration interface, and I knew that undertaking would require a complete rewrite of my code. A complete rewrite is done and now it will be possible for me to support the administrative interface of Django in the next release. So why not until now, a year after the first prototype? I was working on other things, it's that simple.

Getting started

While the demonstration I gave a year ago was geared towards Jython, since that was the topic of the presentation, the Python bindings for Neo4j work equally well with CPython. All you need is Neo4j and Django: the Python bindings for Neo4j come with a Django integration layer built in as of the most recent revisions in the repository. The source distribution also contains a few sample applications for demonstrating how the integration works. The Django integration is still in a very early stage of development, but the base is pretty solid, so new features should be much easier to add now. Since the state is pre-alpha, installation from source is the only option at the moment. Let me walk you through how to get things up and running:

  • Set up and activate a virtualenv for your development. This isn't strictly necessary, but it's so nice to know that you will not destroy your system Python installation if you mess up. Since we got Jython to support virtualenv I use it for everything. If you use CPython your virtualenv will contain a python executable, and if you use Jython it will contain a jython executable. I will refer to either simply as python from here on, but substitute jython if you, like me, prefer that implementation.
  • If you are using CPython: Install JPype, it is currently a dependency for accessing the JVM-based core of Neo4j from CPython:

    $ unzip JPype-0.5.4.1.zip
    $ cd JPype-0.5.4.1
    $ python setup.py install
  • Check out the source code for the Python bindings for Neo4j, and install it:

    $ svn co https://svn.neo4j.org/components/neo4j.py/trunk neo4j-python
    $ cd neo4j-python
    $ python setup.py install
  • Install Django:

    $ easy_install django
  • Create a new Django project:

    $ django-admin.py startproject neo4django
  • Create a new app in your Django project:

    $ python neo4django/manage.py startapp business
  • Set up the configuration parameters for using with Neo4j in Django by adding the following configurations to your settings.py:

    NEO4J_RESOURCE_URI = '/var/neo4j/neo4django'
    # NEO4J_RESOURCE_URI should be the path to where
    # you want to store the Neo4j database.

    NEO4J_OPTIONS = {
    # this is optional and can be used to specify
    # extra startup parameters for Neo4j, such as
    # the classpath to load Neo4j from.
    }
    You can ignore the default Django configurations for RDBMS connections if you only plan to use Neo4j, but if you want to use Django's built-in Admin interface (not supported with Neo4j quite yet) or authentication module, you will need to configure them.
  • You are now ready to create your first Neo4j backed domain objects for your Django application, by editing business/models.py. Let's create a simple model for companies with owners and employees:

    from neo4j.model import django_model as model

    class Person(model.NodeModel):
        first_name = model.Property()
        last_name = model.Property()
        def __unicode__(self):
            return u"%s %s" % (self.first_name, self.last_name)

    class Company(model.NodeModel):
        name = model.Property(indexed=True)
        owners = model.Relationship(Person,
            type=model.Outgoing.OWNED_BY,
            related_name="owns",
        )
        employees = model.Relationship(Person,
            type=model.Incoming.WORKS_AT,
            related_name="employer",
            related_single=True, # Only allow Persons to work at one Company
        )
        def __unicode__(self):
            return self.name
  • That's it, you've created your first Django domain model using Neo4j, let's try it out:

    $ python neo4django/manage.py shell
    >>> from neo4django.business import models
    >>> seven_eleven = models.Company.objects.create(name="Seven Eleven")
    >>> seven_eleven.employees.add(
    ...     models.Person.objects.create(
    ...         first_name="Sally", last_name="Getitdone"),
    ...     models.Person.objects.create(
    ...         first_name="John", last_name="Workerbee"))
    >>> seven_eleven.save() # store the newly created relationships
    >>> people = list(seven_eleven.employees.all())
    >>> someone = people[0]
    >>> print someone, "works at", someone.employer

Notice how the model objects are compatible with model objects created using Django's built-in ORM. This makes it easy to port your existing applications to a Neo4j backend: all you need to change is the model definitions. For more examples, see the example directory in the repository: https://svn.neo4j.org/components/neo4j.py/trunk/src/examples/python/.

Future evolution

There is still more work to be done. As this is the first release, there are likely to be bugs, and I know about a few things (mainly involving querying) that I have not implemented support for yet. I also have a list of (slightly bigger) features that I am going to add. To keep you interested, I'll list them with a brief explanation:

  • Add support for the Django admin interface. You should be able to manage your Neo4j entities in the Django administration interface, just as you manage ORM entities. To do this I need to dig further into the internals of the admin source code, to find out what it expects from the model objects to be able to pick up on them and manage them. The hardest part with this is that the admin system has a policy of silent failure, meaning that it will not tell me how my code violates its expectations.
  • Add support for Relationship models. Currently you can only assign properties to nodes in the domain modeling API, but you should be able to have entities represented by relationships as well. The way you will do this is by extending the Relationship-class.
  • Add a few basic property types. I will add support for creating your own property types by extending the Property-class (this is implemented already, but not tested, so if it works it's only by accident). I will also add a few basic subtypes of Property, a datetime type at the very least. I will also add support for choosing what kind of index to use with each indexed property; in the case of datetime, for example, a Timeline-index seems quite natural. Supporting enumerated values for Properties is also planned, i.e. limiting the set of allowed values to an enumerated set of values.
  • Tapping in to the power of Neo4j. I will add support for methods that do arbitrary operations on the graph (such as traversals), where the returned nodes are then automatically converted to entity objects. I think this will be a really cool and powerful feature, but I have not worked out the details of the API yet.
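As an illustration of what the planned property types might involve, here is a small sketch in plain Python. It is an assumption about the general shape only: `to_storage`, `from_storage`, `DateTimeProperty` and `EnumProperty` are hypothetical names and not part of the Neo4j Python bindings, whose API for this has not been published.

```python
from datetime import datetime, timezone


class Property(object):
    """Hypothetical base type: values are stored unchanged."""

    def to_storage(self, value):
        return value

    def from_storage(self, stored):
        return stored


class DateTimeProperty(Property):
    """Stores a timezone-aware datetime as integer milliseconds since the epoch."""

    def to_storage(self, value):
        return int(value.timestamp() * 1000)

    def from_storage(self, stored):
        return datetime.fromtimestamp(stored / 1000.0, tz=timezone.utc)


class EnumProperty(Property):
    """Accepts only values from a fixed, enumerated set."""

    def __init__(self, *allowed):
        self.allowed = frozenset(allowed)

    def to_storage(self, value):
        if value not in self.allowed:
            raise ValueError("%r is not one of %s" % (value, sorted(self.allowed)))
        return value


founded = DateTimeProperty()
stored = founded.to_storage(datetime(2009, 12, 15, tzinfo=timezone.utc))
status = EnumProperty("active", "dissolved")
```

Storing a datetime as a plain number is one way to make it usable as a key in a time-ordered index, and the enumerated type rejects anything outside its allowed set before it reaches the store.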

Report any bugs you encounter either to the Neo4j bug tracker or on the Neo4j mailing list. Suggestions for improvements and other ideas are also welcome on the mailing list, to me personally, or why not as a comment on this blog.

Happy Hacking

by Tobias (noreply@blogger.com) at December 15, 2009 12:22 AM

Peter Neubauer

Neo4j.rb - dope that Ruby traverser!

The Ruby way

Hi there,

I started to do some prototyping of a deep file traversal in Neo4j.rb (the awesome JRuby bindings done by Andreas Ronge of Jayway - the best Java consultants in Southern Sweden). Using Cucumber, things progressed very fast, so after 2h I had my tests running with the following nodespace layout (you can find the full code example here):

Filetree
 
Now a dynamic calculation of a top folder total size by traversing all files and summing up their "size" properties looks like this in step_definitions.rb:

def calcTotalSize(folder)
  totSize = 0 
  folder.relationships.outgoing(:child).nodes.each do |node|
    if(node[:size] != nil)
      totSize+=node[:size]
    else #this is a folder
      totSize+=calcTotalSize(node)
    end
  end
  return totSize
end

When doing this with the first test in the treesizes.feature

  Scenario: Simple tests
    When I create a filetree with 2 files a 1kb and 1 subfolders in each folder, 3 times nested
    Then the total number of nodes in the db should be greater than 7
    Then the total size of one top folder files should be 4 kb and response time less than 0.015 s

Things are pretty good, we are traversing 2 files, and the total time on my MBP with SSD (yes that ROCKS) is 5ms.

However, cranking the test up to over 20,000 files and folders:

  Scenario: Bigger data sample
    When I create a filetree with 400 files a 1kb and 50 subfolders in each folder, 3 times nested
    Then the total number of nodes in the db should be greater than 20000
    Then the total size of one top folder files should be 20400 kb and response time less than 0.5 s

This results in a traversal time of over 2.3 seconds for that method. Why is this so slow? Well, in Neo4j.rb we are trading performance for development ease. Every node created through Neo4j.rb with Neo4j::Node.new gets wrapped in a nice little Ruby class that holds the properties and basically hides the Neo4j graph in a very Object Database-like fashion: persistence just "happens" under the hood.
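The totals expected by the two scenarios can be checked with a small stand-alone sketch of the same recursive sum, written here in Python over nested dicts rather than graph nodes. The tree shape is an assumption: a top folder holding files plus one level of subfolders that hold files only, which is the shape that reproduces the totals in the feature file. `build_folder` and `total_size` are hypothetical helpers, not part of the spike.

```python
def build_folder(files, subfolders, depth):
    """Return a nested dict with `files` files of 1 kb each plus child folders.

    `depth` counts how many folder levels remain below this one.
    """
    children = []
    if depth > 0:
        children = [build_folder(files, subfolders, depth - 1)
                    for _ in range(subfolders)]
    return {"file_sizes": [1] * files, "folders": children}


def total_size(folder):
    """Sum file sizes in this folder and, recursively, in every subfolder."""
    return sum(folder["file_sizes"]) + sum(
        total_size(child) for child in folder["folders"])


small = build_folder(files=2, subfolders=1, depth=1)
big = build_folder(files=400, subfolders=50, depth=1)
print(total_size(small), total_size(big))
```

Under that assumption the small tree sums to 2 + 1 × 2 = 4 kb and the big one to 400 + 50 × 400 = 20400 kb, matching the two scenarios.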

Looking at a file node in the above image, you notice that there is a property attached to it that tells us the classname of the Ruby class. Now, every time we are accessing this node through JRuby, we are actually not talking to the node but to a nicely wrapped JRuby instance that is persisted as a decomposition of the node.

testFile = Neo4j::Node.new
puts 'classname: ' + testFile[:classname]

gives us

classname: Neo4j::Node

Thus, every time we step in the above traversal, we are making a roundtrip from the graph into JRuby objects and back to the next hop in the graph.

Java Traversers to the rescue

Luckily, speeding up the traversal a bit is quite easy. The Neo4j Java Traverser API takes a different approach. By giving the instructions on how to traverse the graph upfront, the full traversal is done in the graph, lazily returning results and hopping around in the data structure as the result set is fetched by the client. Thus, the speed of traversal is orders of magnitude higher than with "out of graph" traversal.

The Neo4j Java Traverser API is easily accessible from JRuby, so we can extract the underlying Neo4j node reference from the JRuby Neo4j::Node wrapper, and traverse the graph using the Neo4j Java API while still not leaving JRuby:

#this is about 8x faster - untweaked
def calcSizeJava(node)
  neoNode = node.internal_node
  size = 0
  child = org.neo4j.api.core.DynamicRelationshipType.withName 'child'
  traverser = neoNode.traverse(org.neo4j.api.core.Traverser::Order::DEPTH_FIRST, 
    org.neo4j.api.core.StopEvaluator::END_OF_GRAPH, 
    org.neo4j.api.core.ReturnableEvaluator::ALL, child, org.neo4j.api.core.Direction::OUTGOING )
  while traverser.hasNext()
    node = traverser.next
    if node.hasProperty('size')
      size += node.getProperty('size')
    end
  end
  size
end

Over the same 20K nodes, this is about 8 times faster than the traversal above, bringing the traversal time down to 304 ms with the Java API, well under the target of 0.5 seconds in the feature. This is still interpreted code, so there is significantly more to gain: the traversal is done "in-graph" without leaving JRuby, but we have not yet tried JRuby in compiled mode or tweaking in Java to reach full speed for this type of traversal, which should be well under 50 ms. More on that in another post :)
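The idea of declaring the traversal upfront and fetching results lazily can be sketched in a few lines of Python. This is only an illustration of the pattern, not the Neo4j Traverser API: the graph is a plain dict, and `traverse_depth_first` is a hypothetical generator that walks one relationship type depth first and yields nodes as the caller asks for them.

```python
def traverse_depth_first(graph, start, rel_type):
    """Lazily yield every node reachable from `start` over `rel_type` edges.

    `graph` maps node id -> {"props": {...}, "rels": {rel_type: [node ids]}}.
    The start node itself is yielded first.
    """
    stack = [start]
    seen = set()
    while stack:
        node_id = stack.pop()
        if node_id in seen:
            continue
        seen.add(node_id)
        yield node_id, graph[node_id]["props"]
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(graph[node_id]["rels"].get(rel_type, [])))


graph = {
    "top": {"props": {}, "rels": {"child": ["a.txt", "sub"]}},
    "a.txt": {"props": {"size": 1}, "rels": {}},
    "sub": {"props": {}, "rels": {"child": ["b.txt"]}},
    "b.txt": {"props": {"size": 1}, "rels": {}},
}

# The caller only consumes results; all stepping happens inside the generator.
total = sum(props.get("size", 0)
            for _, props in traverse_depth_first(graph, "top", "child"))
```

The caller states what to follow (`"child"` relationships from `"top"`) once, and the loop that hops from node to node stays inside the traversal instead of bouncing back to the caller at every step.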

I found it a very "cheap" way to crank up the JRuby speed for a bigger prototype, it might work for you, too?

p.s. feel free to run, fork and improve the code, this is just a few hours spike ...

by Peter Neubauer at November 25, 2009 09:42 AM

Peter Neubauer

Hi world,
here I am going to blog about stuff that is loosely connected to Graphs, Networks, Neo4j etc.

by Peter Neubauer at November 24, 2009 02:38 PM

Emil Eifrem

NOSQL: scaling to size and scaling to complexity

About a week ago, following nosql east in Atlanta, Jonathan Ellis from the Cassandra project published a fantastic overview of the current NOSQL ecosystem. He analyzes 10 popular NOSQL databases along three axes: horizontal scalability, data model and internal persistence design. It's a great read.

The third axis (internal persistence design) may not be terribly relevant for users of NOSQL systems [1] but the position on the first two axes reveals some important underlying assumptions. In particular, it reveals a focus: is this NOSQL project oriented around scaling to size or scaling to complexity? [2]

The four main NOSQL data models

Now, there are four main categories of NOSQL databases today. Before we get into how they differ in focus, let me just quickly run through them and outline a few key characteristics:

Key-Value Stores

BigTable Clones (aka "ColumnFamily")

  • Lineage: Google's BigTable paper.
  • Data model: Column family, i.e. a tabular model where each row at least in theory can have an individual configuration of columns.
  • Example: HBase, Hypertable, Cassandra [3]

Document Databases

  • Lineage: Inspired by Lotus Notes.
  • Data model: Collections of documents, which contain key-value collections (called "documents").
  • Example: CouchDB, MongoDB, Riak

Graph Databases

  • Lineage: Draws from Euler and graph theory.
  • Data model: Nodes & relationships, both of which can hold key-value pairs
  • Example: AllegroGraph, InfoGrid, Neo4j

Scalability focus

How then do these data models scale to size and complexity? Check out this slide from my presentation at nosql east: 

NOSQL data models mapped along size and complexity scalability.

The exact positions in the picture above are obviously debatable but I think it serves to illustrate my point: the key value stores and BigTable clones of the world handle size really well. This is because they have data models that can easily be partitioned horizontally. Which is great for scale out of, for example, simple two-column data like a whole bunch of username/password pairs.
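Why simple key-value data partitions so easily can be shown with a toy sketch in Python. It is an illustration only and not how any particular store works: `shard_for` and `ShardedStore` are hypothetical names, and the shards are in-memory dicts standing in for separate machines.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash of the key alone."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


class ShardedStore(object):
    """Toy key-value store split across `shard_count` in-memory dicts."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=4)
for username in ("alice", "bob", "carol"):
    store.put(username, "hash-of-%s-password" % username)
```

Every read and write touches exactly one shard, chosen from the key alone, because no record refers to any other record. That independence is what richer, more connected data models give up.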

The drawback however is that by constraining themselves to simpler data models, they've pushed complexity up the stack. So if you have data with a non-trivial structure, then you have to compensate for a simple data model by adding more complex functionality in the upper layers. [4]

Document databases and graph databases, on the other hand, have opted for richer data models. This means that they have more powerful abstractions that make it easy to model both simple and complex domains. But these richer data models introduce more coupling of data and therefore it's more challenging to get them to scale to size.

Size matters (but you're not Google so complexity matters more)

Now, size gets a lot of attention because scaling out to hundreds of machines is very sexy. But here's the kicker: the majority of the use cases out there don't need to store hundreds of billions of objects and scale out to truckloads of machines.

NOSQL data models - 90% of use cases need only moderate size scalability.

At the end of the day, there are only so many projects of Amazon and Google scale out there. A lot of projects fit within a couple of BILLIONS of objects. For most people, it's a lot more important to have a rich data model that lends itself to easily represent their domain.

Ben Scofield of Viget Labs expresses it eloquently in NoSQL Misconceptions:

"... there's a lot more to NoSQL than just performance and scaling. Most importantly (for me, at least) is that NoSQL DBs often provide better substrates for modeling business domains. I've spent more than two years struggling to map just part of the comic book business onto MySQL, for instance, where something like a graph database would be a vastly better fit."

Choose your hammer wisely

It's important to note that these data models are all isomorphic. Which is a fancy way of saying that you can express all datasets in either one of them. For example, you can decompose any data into a collection of key-value pairs.

But that's a bit like claiming you can write any program in any Turing complete programming language: sure, it's true in theory but just because you can doesn't mean that you should. In practice there's a bunch of programming languages that are a poor fit for many use cases. And the same is true of data models.
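To see both halves of that argument, here is a minimal Python sketch that decomposes a tiny graph into key-value pairs and then answers a simple graph question from them. The key layout (`node/<id>/<property>`, `rel/<n>`) and both function names are made up for this example, and it assumes ids and types contain neither `/` nor `|`.

```python
def graph_to_pairs(nodes, relationships):
    """Flatten a property graph into a flat dict of string keys to values."""
    pairs = {}
    for node_id, props in nodes.items():
        for name, value in props.items():
            pairs["node/%s/%s" % (node_id, name)] = value
    for index, (source, rel_type, target) in enumerate(relationships):
        pairs["rel/%d" % index] = "%s|%s|%s" % (source, rel_type, target)
    return pairs


def neighbours(pairs, node_id, rel_type):
    """Recover outgoing neighbours by scanning every relationship key."""
    found = []
    for key, value in pairs.items():
        if key.startswith("rel/"):
            source, kind, target = value.split("|")
            if source == node_id and kind == rel_type:
                found.append(target)
    return found


nodes = {"sally": {"first_name": "Sally"},
         "seven_eleven": {"name": "Seven Eleven"}}
relationships = [("sally", "WORKS_AT", "seven_eleven")]
pairs = graph_to_pairs(nodes, relationships)
```

The decomposition itself is trivial, which is the "you can" part. The "doesn't mean you should" part shows up in `neighbours`: a question the graph model answers by following one relationship now needs application code that scans and parses keys.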

I think it's clear that we're rapidly moving beyond the era of the One Size Fits All database. Whereas in the past you could always trust that any decent-sized app had a relational database as backend, it's now increasingly about matching your dataset to whatever data model fits best. NOSQL is not No To SQL. NOSQL means Not Only SQL, as in: in the future, our backends will consist of Not Only SQL databases but also key-value stores, graph databases and more.

NOSQL is about choice and picking the right tool for the job. When you look at adding a NOSQL database to your current project, consider your requirements both for scaling to size and for scaling to complexity.

[1] Few developers care whether their RDBMS implementation uses hash joins or nested loop joins.

[2] Scaling to size and scaling to complexity was introduced (at least to me) in O'Reilly's Beautiful Data by Toby Segaran and Jeff Hammerbacher. The graph of the various NOSQL data models was first visualized by my friend and colleague Peter Neubauer.

[3] Cassandra is actually the first of the "second-generation" NOSQL databases and it combines the decentralized scale out architecture of the Dynamo clones with the data model of BigTable.

[4] As an analogy, imagine writing any piece of software and the only construct you had for storing state was a single global hashtable. No linked lists, no arrays, no structs, no objects. Imagine how much code you'd have to add just to work around that hashtable! Now, a key-value store is basically a distributed hashtable. This is why they have problems with scaling to complexity.

by Neo Technology at November 17, 2009 09:29 PM

Neo4j News

Neo4j 1.0-b10 released: read-only mode & faster deep traversals

Neo4j 1.0-b10 - the open source nosql graph database - has been released with new features including a read-only mode, improved depth first traversal speed due to an iterator implementation all the way down to the native store layer and faster recovery process when starting up after a crash. Download the Neo4j Core release or the Apoc bundle here.

For more information, read the list mail announcement and check out the details in the changelog.

As always, feedback to the mailing list, on Twitter or directly to us.

by Emil Eifrem (noreply@blogger.com) at November 03, 2009 01:03 AM

NoSQL East & semweb meetup in DC

Emil will represent Neo4j at two upcoming events: Emil and Tim Berners-Lee -- I'm sorry, Sir Tim Berners-Lee, the father of the web -- will speak at the semantic web meetup in association with ISWC in Washington DC on Oct 27, 2009.

After that, we're heading straight to nosql east where our commercial backer Neo Technology will sponsor the conference and Emil will give a Neo4j talk.

If you're attending either one or are just in the area, please ping us so we can grab a beer.

by Emil Eifrem (noreply@blogger.com) at November 03, 2009 01:03 AM

Emil Eifrem

Let's go

So TechCrunch spilled the beans with a nice writeup and the genie is officially out of the bottle: Two weeks ago today, we closed a $2.5M seed stage investment with kickass VC firms Sunstone Capital and Conor Venture Partners.

As our friends and family know, we have danced the wonderful VC dance for a while now. As a matter of fact, we met the Sunstone and Conor teams for the first time over a year ago. During this year we've dated regularly -- and happily so -- but have still not quite been ready to make that final leap from dating to marriage.

Until now. As the interest around the NoSQL space and alternative databases has just exploded over the past 3-4 months it's been increasingly clear to us that we want to and need to accelerate. We fully believe that we're moving towards what Ben Scofield of Viget Labs describes as a "pluralistic approach to storing our data," i.e. an architecture where a single application typically works with multiple databases, each storing the datasets they're best for.

In a world where it's all about choosing the database paradigm that best fits your dataset instead of squeezing everything into the relational database, in that world we believe there's a huge opportunity for graph databases. With Sunstone and Conor on board, this investment will give us the means to pursue it.

by Neo Technology at October 28, 2009 08:46 AM

Neo4j News

Initial release of Neo4j Grails plugin

As announced by Stefan Armbruster on the Neo4j and Grails mailing lists, he has released the initial 0.1 version of the Neo4j Grails plugin. Read the full announcement in this blog post. Grails is a web application framework based on the Groovy language. At the moment the plugin has support for the basic CRUD operations and also exposes the underlying node of each domain object through a property.

Different people have requested such a plugin previously, so it's exciting news that the plugin now exists.

Stefan has also provided example code for how to use the plugin. Basic domain classes may look like this:

class Author {
    String name
    Date dob
    static hasMany = [ books: Book ]
}

class Book {
    String title
    static belongsTo = [ author: Author ]
}

After adding a little data to the domain, the node space will look like this:


Further information is found on the Neo4j wiki Grails page.

by Anders Nawroth (noreply@blogger.com) at October 05, 2009 04:10 PM