Planet Neo4j

Neo4j Blog

We won the Rapidus award!

I was running late - meeting across time zones is a hassle. Standing in the street I could hear the heavy rock music from the night club. Was this really the place for a big media event in Malmö? Stepping into the dark it felt totally right though. More than 150 people had dressed down to participate in the mingle and awards that night. Rock away! Rapidus is an online newsletter here in

by Björn Granvik (noreply@blogger.com) at January 26, 2012 01:48 PM

Neo4j Blog

Released Neo4j 1.6 GA “Jörn Kniv”!

Three milestones later and we’re proud and happy to announce the release of Neo4j 1.6 GA. We are excited about a host of great new features, all ready to be used. Let's get to it. Highlights What features have been included in this release? Cloud - Public beta on Heroku of the Neo4j Add-on Cypher - Supports older Cypher versions, better pattern matching, better performance,

by Björn Granvik (noreply@blogger.com) at January 25, 2012 04:16 PM

Neo4j Blog

Spring onto Heroku

Deploying your application into the cloud is a great way to scale from "wouldn't it be cool if.." to giving interviews to Forbes, Fast Company, and Jimmy Fallon. Heroku makes it super easy to provision everything you need, including a Neo4j Add-on. With a few simple adjustments, your Spring Data Neo4j application is ready to take that first step into the cloud. Let's walk

by Andreas Kollegger (noreply@blogger.com) at January 20, 2012 02:17 PM

Neo4j Blog

Neo4j - Heroku Application Template Challenge

Dear Developer Community, Today, we challenge you to create the best Heroku-hosted demo or template applications for the Neo4j Add-on. Every participant will get a Neo4j-Heroku t-shirt and awesome prizes will be given to the best contributions. Throughout the next month you have the chance to provide others with ready-made applications that are educational, tested and working well. At the

by Michael Hunger (noreply@blogger.com) at January 18, 2012 05:56 PM

Neo4j Blog

Spring Data Neo4j Webinar Follow Up

Hey everyone, This week, we had a great turnout for our Intro to Spring Data Neo4j webinar, presented by Michael Hunger. As promised, here are the rest of the questions that we weren't able to cover during the session: What's the difference between @RelatedTo and @RelatedToVia? @RelatedTo refers to the node-entities at the other end of the relationship @RelatedToVia refers to

by ayeeson (noreply@blogger.com) at January 14, 2012 01:18 AM

Neo4j Blog

Neo4j 1.6.M03 “Jörn Kniv”

Another milestone is waiting for you - Neo4j 1.6.M03. Highlights in this release are: support for indexing unique entities, array queries in Cypher, and a Lucene update to version 3.5. It’s now available for download, and you can try it out right now on Heroku. Enjoy! Kernel changes Rickard Öberg This release includes a popular feature request: the ability to ensure that key-value pairs for

by Julian (noreply@blogger.com) at January 12, 2012 06:23 PM

Neo4j Blog

Neo4j - Community matters. You matter.

Hi all graphistas, a while back I wrote a mail, announcing that we are putting a lot of effort from Neo Technology into helping the Neo4j community to prosper. This is not just empty talk. We're more than happy to announce that from now on a dedicated team (with mainly Peter, Andreas Kollegger and Michael Hunger contributing part of

by Peter Neubauer (noreply@blogger.com) at January 04, 2012 02:37 PM

Neo4j Blog

Neo4j 1.6.M02 “Jörn Kniv”

We have another milestone for you - 1.6.M02. As I’ve written before, we’re heavily into improving our infrastructure - our build, stress testing etc. But we have more: Faster and better Cypher and open beta on Heroku! Heroku Public Beta Our private beta on Heroku was going along just fine. We were getting positive feedback, tweaking provisioning and monitoring, and starting to feel

by Björn Granvik (noreply@blogger.com) at December 20, 2011 01:21 PM

Domain Modeling With Spring Data Neo4j

Spring Data Neo4j provides a powerful framework for building applications with simple POJO annotations.  In this blog post, Willie Wheeler explains domain modeling with Neo4j using Spring Data Neo4j. Focusing on a Person entity (common in so many applications), Willie shows the annotations needed to map into Neo4j nodes and relationships. Then he introduces a repository to retrieve instances from

by Andreas Kollegger (noreply@blogger.com) at December 20, 2011 01:53 AM

Neo4j Blog

A “typical” week in the Neo4j Community

We were curious ourselves how much happens within a single week in our community. So we did the easy thing and just harvested our ingenious @neo4j twitter stream. And here is the result, we were just blown away, so much activity and so many people contributing, thanks a lot to all of you.

by Michael Hunger (noreply@blogger.com) at December 16, 2011 11:07 PM

Neo4j Blog

Neo4j is going GoogleGroups

Hi everyone, after some preparation, we are finally done transitioning all the new Neo4j and Graph-related discussions to The Neo4j Google Group. New discussions will be started there, so please accept the invite you have got as an existing community member and switch over. You can even directly subscribe from the neo4j.org site. We are keeping the old archives around so we can refer to them

by Peter Neubauer (noreply@blogger.com) at December 12, 2011 03:03 PM

Neo4j Blog

Cypher - A view from a recovering SQL DBA

Through the eyes of a SQL master, Cypher looks different yet somehow familiar. In this humorous and insightful blog post, Andrés Taylor guides you along the mental shift required to grok querying Neo4j. Through progressive examples, he presents SQL queries then maps them to the equivalent Cypher. Read the full post: Cypher - A view from a recovering SQL DBA

by Andreas Kollegger (noreply@blogger.com) at December 07, 2011 10:54 PM

Neo4j Labs: Heroku, Neo4j and Google Spreadsheet in 10min. Flat.

Hi all, Last Friday, we were all labbing again - the best day of the week. I didn't have much time so I decided to try to produce a screencast that would measure the time required to go from nothing to flash using some of our tools. What I came up with demonstrates the process required to set up a Neo4j instance via Heroku, then connect to it from within a Google Spreadsheet (which your

by Peter Neubauer (noreply@blogger.com) at December 07, 2011 09:31 PM

Neo4j Blog

Neo4j 1.6.M01 “Jörn Kniv”

Hi there, We have a new milestone for y’all. Previously we’ve had “Boden Bord” and “Abisko Lampa”, both going southwards through Sweden from the very north. And now we’ve released the first milestone of the up and coming 1.6, named after the small city of Jörn. Before writing about it, I’d like to get a quick chance to introduce myself (this feels like stealing the stage...). My name is

by Björn Granvik (noreply@blogger.com) at November 25, 2011 03:13 PM

Neo4j Blog

Creating a DSL for Cypher

With Cypher, querying data is like creating Ascii Art to navigate through information. While fun and powerful, sometimes the pesky execution engine doesn’t appreciate the beauty of what you’ve typed. In this blog post, Rickard Öberg presents his implementation of a DSL to safely construct Cypher queries in Java. For example: start( node( "n", 3, 1 ) ). where( prop( "n.age" ).lt( 30 ).and( prop(

by Andreas Kollegger (noreply@blogger.com) at November 16, 2011 02:08 AM

Neo4j Blog

Announcing Neo4j "Boden Bord" 1.5 GA Release

Hello graphistas! After a successful Milestone 2 release of Neo4j 1.5 "Boden Bord" and excellent community and customer feedback, we've been busy at work putting the finishing touches to our Neo4j 1.5 GA release, which is now available on our downloads page. Since the last milestone you'll find we've smoothed a few rough edges and the documentation has been made really spick-and-span. We think

by Jim Webber (noreply@blogger.com) at November 10, 2011 06:53 PM

Chris Gioran

Rooting out redundancy - The new Neo4j Property Store

Intro

So, for the last 2 months we've been working diligently, trying to create the 1.5 release of Neo4j. While on the surface it may look like little has changed, under the hood a huge amount of work has gone into a far more stable and usable HA implementation and rewriting the property storage layer to use far less disk space while maintaining all its features and providing a speed boost at the same time. In this post I will deal exclusively with the latter.

Departing from the old model: A new way to store things

So far, the properties were stored on disk in a doubly linked list, where each of its nodes contained some necessary administrative/structural overhead and the actual property data. More specifically, the layout was:
byte 0 : 4 high bits of previous pointer, inUse flag
byte 1 : unused
byte 2 : 4 high bits of next pointer
bytes 3-4 : property type
bytes 5-8 : property index
bytes 9-12 : previous pointer 32 low bits
bytes 13-16 : next pointer 32 low bits
bytes 17-24 : property data

The last 8 bytes were where the value was stored, enough to accommodate all primitive values, a short string or a pointer to the dynamic store, where a dynamic record chain would store a long string, an array of primitives or a String[].

There is some waste here, in part because the full 8 bytes are used only in the (rare) cases of storing doubles and longs or for short strings, but mostly because these pointers are repeated for each property, making the impact of the structural overhead felt. On the flip side, the Short String optimization was a great success, proving the value in inlining more property types. So we decided to highlight the good parts and lowlight the bad, ending up with a PropertyRecord structure that is no longer equivalent to one property but acts as a container for a variable number of variable length properties. The current layout is:

byte 0 : 4 high bits of previous, 4 high bits of next pointers
bytes 1-4 : previous property record
bytes 5-8 : next property record
bytes 9-40 : payload
Yes, that is correct, no inUse flag, explained by the payload structure.

First, let's call the four 8-byte blocks in the payload just blocks, to have a simple name for them. Each of these blocks is used in various ways, depending on the property data type. Starting off, every property needs to have the property index and the property type. These are common and always present, with the property index taking up the first 3 bytes of the block and the type taking up the 4 high bits of the 4th byte. Now, after that comes the property value. If it is a primitive that fits in 4 bytes, then the 4 low bits of the 4th byte are skipped and the remaining 4 bytes of the block are used to store the value and we are done. When storing a pointer into the DynamicStore for non-short strings and for arrays, the 36 bits required find a home in the second half of the 4th byte and the low order 4 bytes. This means that each PropertyRecord can store up to 4 such properties - a huge saving in space.
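To make the block layout described above concrete, here is a minimal sketch in Java. It is not the kernel's code: the class and method names are made up for illustration, the type codes passed in are hypothetical, and it assumes the block is viewed as a single big-endian long so that "the first 3 bytes" are the most significant ones.

```java
// Hypothetical sketch of one 8-byte property block, viewed as a long.
// Assumed bit positions (most significant first):
//   bits 63..40  property index (first 3 bytes)
//   bits 39..36  property type (4 high bits of the 4th byte)
//   bits 35..32  4 low bits of the 4th byte (skipped for 4-byte primitives,
//                top 4 bits of a 36-bit DynamicStore pointer)
//   bits 31..0   4-byte value, or the low 32 bits of the pointer
final class PropertyBlockSketch {
    private static final long POINTER_MASK = (1L << 36) - 1;

    private static long header(int propertyIndex, int type) {
        if (propertyIndex < 0 || propertyIndex > 0xFFFFFF) {
            throw new IllegalArgumentException("property index must fit in 3 bytes");
        }
        if (type < 1 || type > 0xF) {
            // type 0 is kept free here because the post uses it for "not in use"
            throw new IllegalArgumentException("type must fit in 4 bits and be non-zero");
        }
        return ((long) propertyIndex << 40) | ((long) type << 36);
    }

    // A primitive that fits in 4 bytes: the low nibble of byte 4 stays empty.
    static long encodeInt(int propertyIndex, int type, int value) {
        return header(propertyIndex, type) | (value & 0xFFFFFFFFL);
    }

    // A 36-bit pointer: low nibble of byte 4 plus the low order 4 bytes.
    static long encodePointer(int propertyIndex, int type, long pointer) {
        if (pointer < 0 || pointer > POINTER_MASK) {
            throw new IllegalArgumentException("pointer must fit in 36 bits");
        }
        return header(propertyIndex, type) | pointer;
    }

    static int propertyIndex(long block) { return (int) (block >>> 40); }
    static int type(long block)          { return (int) ((block >>> 36) & 0xF); }
    static int intValue(long block)      { return (int) block; }
    static long pointer(long block)      { return block & POINTER_MASK; }

    public static void main(String[] args) {
        long block = encodeInt(7, 3, -5);
        System.out.println(propertyIndex(block) + " " + type(block) + " " + intValue(block));
    }
}
```

Either kind of property fits in a single block, which is why four of them can share one record.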
For longs and doubles which require 8 bytes, the 4 1/2 trailing bytes are skipped and instead the next block is used as a whole to store the value. This leads to some waste but it is still more efficient than the previous method and it is a relatively rare use case.
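A quick back-of-the-envelope comparison shows where the saving comes from. The sketch below only does arithmetic on the two layouts listed above (25 bytes per property before, 41 bytes per record of 4 blocks now). It assumes the best case where blocks are packed densely and no value spills into the dynamic store; the class and method names are invented for this example.

```java
// Rough size comparison of the old and new property layouts.
// Assumption: dense packing, i.e. every record is filled as far as possible.
final class PropertyStoreSizeSketch {
    static final int OLD_RECORD_BYTES = 25;  // bytes 0-24, one property per record
    static final int NEW_RECORD_BYTES = 41;  // bytes 0-40, 9 header bytes + 32 payload
    static final int BLOCKS_PER_RECORD = 4;

    // Old format: every property costs a full record, whatever its type.
    static long oldBytes(int smallProps, int wideProps) {
        return (long) (smallProps + wideProps) * OLD_RECORD_BYTES;
    }

    // New format: a 4-byte primitive or a pointer takes 1 block,
    // a long or double takes 2 blocks.
    static long newBytes(int smallProps, int wideProps) {
        long blocks = smallProps + 2L * wideProps;
        long records = (blocks + BLOCKS_PER_RECORD - 1) / BLOCKS_PER_RECORD;
        return records * NEW_RECORD_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("4 ints:         old=" + oldBytes(4, 0) + " new=" + newBytes(4, 0));
        System.out.println("2 ints, 1 long: old=" + oldBytes(2, 1) + " new=" + newBytes(2, 1));
    }
}
```

Under these assumptions four int properties drop from 100 bytes to 41, and even a record that carries a long still beats three separate old-style records.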

What remains are ShortString and the brand new ShortArray. Since we saved all that space and I/O calls with ShortString, why not expand on the idea? We now have LongerShortString, which is like ShortString but on crack. It operates on the same principle - it scans a string, sees if it falls within an encoding, encodes it and stores a header with the length and the encoding table id and then the actual data, encoded in longs that take up blocks right after the property info. If it doesn't fit in the max of 3 1/2 blocks of a property record, it is instead encoded as UTF8 and stored in the DynamicStringStore. A similar idea is applied to arrays. When passed a primitive array we first determine the minimum number of bits required to store its values, effectively shaving off all the leftmost zeros we can while keeping all array members the same size. This means that if we are asked to store new int[] {1,2,3,4,5}, the entries will take up not 32 but 3 bits each. boolean[] for example costs 1 bit per entry. Obviously, mixing in even a single negative value immediately forces the maximum number of bits per entry. So, to store an array we first determine this number and then the header becomes:

   4 bits, an enum value identifying the primitive type
   6 bits, the length of the array
   6 bits, the number of bits per item

and then follow the "bit shaved" array entries. The same algorithm is used for dynamic arrays as well, but the length is actually stored in the length field of the dynamic record (as usual), not in the ShortArray header, and we just keep how many bits of the last byte are used. That, along with the bits-per-entry number, is enough to reconstruct the value. Of course, in this case as well, if the array does not fit in the PropertyRecord even after this "compression", it is stored in the DynamicArrayStore as usual, though now in its bit-shaved form as byte[], meaning fewer DynamicRecords are used and so less waste. This comes at the price of reconstructing the array when reading it in, but the reduced I/O more than makes up for it. A more exact description of the new ShortString, including all the ShortString classes and size limits, as well as the new ShortArray, is available in the manual.
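The bit shaving itself is easy to try out. What follows is a small stand-alone sketch for int[] only, not the code that ships in the kernel: the names are hypothetical, the bit order inside the packed bytes is an assumption, and so is the choice to spend at least 1 bit per entry on an all-zero array.

```java
// Hypothetical sketch of "bit shaving" an int[]: find the smallest common
// width that holds every entry, then pack the entries back to back.
final class BitShavingSketch {
    // OR-ing all values together keeps the highest set bit of any entry.
    // A single negative value sets the sign bit and forces all 32 bits.
    static int bitsRequired(int[] values) {
        int combined = 0;
        for (int v : values) {
            combined |= v;
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(combined));
    }

    // Packs the low `bits` bits of each entry, most significant bit first.
    static byte[] pack(int[] values, int bits) {
        byte[] out = new byte[(values.length * bits + 7) / 8];
        int bitPos = 0;
        for (int v : values) {
            for (int b = bits - 1; b >= 0; b--, bitPos++) {
                if (((v >>> b) & 1) != 0) {
                    out[bitPos >>> 3] |= (byte) (0x80 >>> (bitPos & 7));
                }
            }
        }
        return out;
    }

    // Rebuilds the array; the caller supplies the entry count and the width,
    // the two numbers the post says are kept alongside the data.
    static int[] unpack(byte[] packed, int bits, int length) {
        int[] out = new int[length];
        int bitPos = 0;
        for (int i = 0; i < length; i++) {
            int v = 0;
            for (int b = 0; b < bits; b++, bitPos++) {
                int bit = (packed[bitPos >>> 3] >>> (7 - (bitPos & 7))) & 1;
                v = (v << 1) | bit;
            }
            out[i] = v;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5};
        int bits = bitsRequired(values);
        byte[] packed = pack(values, bits);
        System.out.println(bits + " bits per entry, " + packed.length + " bytes packed");
    }
}
```

For new int[] {1,2,3,4,5} this gives 3 bits per entry, so the five entries fit in 2 bytes instead of 20.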

What about the mystery of the missing inUse flag? Well, that is a combination of 2 things. One is that the blocks are marked individually as in use or not, since the API allows for a property to be deleted, and now a property is no longer a record but a collection of blocks. So we folded that into the property type, with 0 signifying not in use. The second is that the blocks are written out defragmented on disk, meaning that if from 3 properties in a record we delete the middle one (set its type to deleted), then only the remaining two will be written. This leads to a simple method of marking "no more properties in this record" by writing a 0 for the 4th byte of the first not-used block (the implementation just writes a whole long). A corollary of this is that a property record that has the 4th byte of the first block 0 is actually not used.
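The rule above fits in a few lines of code. This is a simplified sketch rather than the real implementation: it assumes every property occupies exactly one block (so it ignores longs, doubles and inlined strings or arrays), reads the type from the high 4 bits of the 4th byte of a block viewed as a long, and uses made-up names throughout.

```java
// Hypothetical sketch: type 0 marks a deleted block, blocks are written out
// defragmented, and the first block with type 0 ends the record.
final class RecordScanSketch {
    static int typeOf(long block) {
        return (int) ((block >>> 36) & 0xF);
    }

    // Drops deleted blocks and leaves the tail zeroed, so the first unused
    // block doubles as the "no more properties in this record" marker.
    static long[] defragment(long[] blocks) {
        long[] out = new long[blocks.length];
        int next = 0;
        for (long block : blocks) {
            if (typeOf(block) != 0) {
                out[next++] = block;
            }
        }
        return out;
    }

    // Counts properties by walking blocks until the terminator (or the end).
    static int propertyCount(long[] defragmented) {
        int count = 0;
        while (count < defragmented.length && typeOf(defragmented[count]) != 0) {
            count++;
        }
        return count;
    }

    // A record whose first block is already the terminator holds nothing.
    static boolean recordInUse(long[] defragmented) {
        return typeOf(defragmented[0]) != 0;
    }

    public static void main(String[] args) {
        long first = 1L << 36, deleted = 0L, third = 2L << 36;
        long[] written = defragment(new long[] {first, deleted, third, 0L});
        System.out.println(propertyCount(written) + " properties, in use: " + recordInUse(written));
    }
}
```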

Code walkthrough

I was going to outline the changes/additions at a source code level here, but this post is getting too long. Besides, from the above the code becomes straightforward to follow. If you have any questions or suggestions, or would just like to chat about the implementation, drop by our mailing list.

Just a tweaking note here - the logic of when and how allocation of blocks happens and the defragmentation strategy is held in WriteTransaction. Go ahead and experiment with what best suits your use case - feedback on these code paths will be greeted with much rejoicing!

Out with the old, in with the new: Migrating the store

Unlike the extended address space ("4+ billion") changes a while ago, this store upgrade cannot happen in place over an old database. We need to do a true migration, meaning recreating the store from scratch and replacing your existing data files with the new ones. This process is extremely safe: It never writes to your existing data files, it is crash resistant (so if it fails mid-way nothing bad happens) and keeps a backup of your data (under upgrade-backup/ in the database directory). However, better safe than sorry, so it is considered good practice to keep an independent backup of your data.

The store migration process is relatively straightforward - it goes over the node and relationship stores, copying them over as they are and, for each primitive, it reads in the property chains, transforms them into the new format and stores them. That has the side benefit of compacting the property store, skipping over deleted entries, so you should notice a significant reduction in disk usage if you happen to delete lots of properties and not restart often.

All the migration code is bundled in the kernel source in package org.neo4j.kernel.impl.storemigration and can be run both as a standalone tool and as part of normal startup - so no matter if you use the server scripts or just the kernel library, just set the config option "allow_store_upgrade"="true" and you are set to go.
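For embedded use, the option is just one more entry in the configuration map. The sketch below only builds that map, so it runs without Neo4j on the classpath; the class name is made up, and the constructor mentioned in the comment is an assumption about the 1.5-era embedded API, so check it against the javadoc for your version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the configuration that opts in to the store upgrade.
final class StoreUpgradeConfigSketch {
    static Map<String, String> upgradeConfig() {
        Map<String, String> config = new HashMap<>();
        // The option named in the post; without it an old store is not migrated.
        config.put("allow_store_upgrade", "true");
        return config;
    }

    public static void main(String[] args) {
        Map<String, String> config = upgradeConfig();
        // With the kernel library available, the map would be handed over when
        // the database is created, for example (assumed constructor):
        //   new org.neo4j.kernel.EmbeddedGraphDatabase("path/to/db", config);
        System.out.println(config);
    }
}
```

For the server, the equivalent would be a line allow_store_upgrade=true in the database tuning properties file (conf/neo4j.properties in a default install, which is again an assumption about your setup).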

Onwards and upwards

There is more stuff in this release than can fit in a blog post. Long discussions in the community have ended up providing inspiration for substantial changes which not only provide robustness in the current offering but pave the way for more exciting features to come. So, maybe "Boden Bord" is not filled to the brim with obvious new features, but rest assured, we are in for a wild next year.

Thank you for making Neo4j what it is.

This work is licensed under a Creative Commons Attribution 3.0 Unported License.

by Chris Gioran (noreply@blogger.com) at November 10, 2011 01:27 PM

Neo4j Blog

Neo4j at the 1st London NOSQL Exchange!

Last week saw the first London NOSQL Exchange organised by Skillsmatter and Neo Technology. And what a success it was, with around 100 attendees and excellent talks throughout the day leading to excellent local beers throughout the evening (and the inevitable local hangover the following morning!). The Neo4j team was out in force, with Jim Webber (me!) chairing and occasionally refereeing the

by Jim Webber (noreply@blogger.com) at November 09, 2011 04:54 PM

Neo4j Blog

The Aftermath of SpringONE

What a whirlwind! After a LOT of hard work, I came out of Chicago extremely happy to have met some great people within Neo and in the Spring community. The week got kicked off with an All Hands meeting, where all 25 of us got together to discuss different departments and strategy. The highlight, of course, was the segway tour!! So fun. And the weather was fantastic, which was lucky. We then had the

by ayeeson (noreply@blogger.com) at October 31, 2011 07:31 AM

Neo4j Blog

Neo4j @SpringONE

Hi all, So I thought it would be worthwhile to let you all know what is going on with SpringONE. So much is happening, I've barely had any time to gather my thoughts to communicate our efforts to the most important part of Neo4j: our community. So what the eff is going on next week in Chicago? First off, we are proud to announce that we are Platinum Sponsors for SpringONE 2GX. After announcing Rod

by ayeeson (noreply@blogger.com) at October 21, 2011 08:17 PM