From GraphAware Framework to GraphAware Hume

It has been over 8 years since I wrote the first lines of code for the GraphAware Neo4j Framework as part of my MSc. thesis. That’s when the name GraphAware, as well as the (then) one-man show Neo4j consulting company, was born. It is therefore my bittersweet duty to take you on a small trip down memory lane and announce that we have decided to discontinue the development and support of the Framework and all of its modules.

In this blog post, we briefly cover the bits of Neo4j and the GraphAware Framework history that are essential to understanding the technical reasons for this decision. We then elaborate a little on what this means for the open-source code that is out there. We conclude by suggesting a path forward for open-source community users as well as commercial customers.

A Bit of Neo4j History

Whilst I haven’t worked with Neo4j 0.7 like some of our veteran colleagues, I do remember what Neo4j looked like in 2013. Cypher had a START clause. Nodes didn’t have labels; it wasn’t even a Labelled Property Graph model! And every Neo4j database had a “root” node with ID 0. I have the code to prove it (from GraphAware Framework, 2013):

    /**
     * Find node with ID = 0, or throw an exception.
     *
     * @return root node.
     * @throws IllegalStateException if the node doesn't exist.
     */
    private Node findRootOrThrowException() {
        try {
            return database.getNodeById(0);
        } catch (NotFoundException e) {
            throw new IllegalStateException("GraphAware Framework needs the root node (ID=0) for its operation. Please" +
                    " re-create the database and do not delete the root node. There is no need for it to be used in" +
                    " the application, but it must be present in the database.");
        }
    }

More importantly, prior to Neo4j 3.0, which was released in 2016, there was no way to customise Cypher, since Neo4j didn’t offer a way to implement custom functions and procedures that you could call from Cypher. Users who wanted to customise Neo4j behaviour needed to implement Neo4j server extensions, which they had to expose and call via a REST API.

Such customisations included simple things like automatically assigning UUIDs to newly created nodes and relationships and the ability to retrieve them by these UUIDs, as well as more algorithmically complicated tasks such as automatically creating graph structures, like time trees and in-graph audit trails.

Shortly after Neo4j 3.0 was released, APOC was born and changed the game completely. Developers and other Neo4j users were given the ability to choose from a large, ever-growing collection of useful functions and procedures, maintained by Neo4j themselves (Michael Hunger, Stefan Armbruster, and Mark Needham, to name just a few), as well as the vibrant Neo4j community, including a few (then young) Neo4j experts from GraphAware. APOC, a vendor-backed, now officially supported library, has since been the place to go for custom Neo4j functionality, and rightly so.

Neo4j 3.0 also introduced another very important feature - the Bolt Protocol - and a set of Bolt-compatible drivers for mainstream programming languages. Bolt is Neo4j’s binary protocol that facilitates communication between the “client” and the database, the client being anything that wants to talk to Neo4j. Prior to Neo4j 3.0, Cypher statements were submitted to Neo4j using its REST API. Because Bolt is far superior in terms of performance and efficiency, the REST API was destined for a relatively quick and painless death. I may be wrong, of course, but I don’t think there’s a single Neo4j customer that is still using the REST API to interact with Neo4j.

GraphAware Framework

The GraphAware Framework was intended to help developers build custom Neo4j functionality in a number of ways. The first one was to provide transaction-driven behaviour, i.e., to allow the enrichment and/or the prevention of certain transactions on the database in a very easy way. Examples of GraphAware-provided modules that made use of this capability were the UUID module, the TimeTree module, and the commercially offered Audit and Schema enforcement modules.
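To make "enriching or preventing a transaction" concrete, here is a minimal, dependency-free sketch of the idea. It is not the Framework's code and does not use Neo4j's API: every type and method name below (`SketchNode`, `PendingTransaction`, `BeforeCommitModule`, `UuidModule`, `RequireNameModule`, `commit`) is hypothetical, and the "name" rule is an invented example of a schema-style check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical model of transaction-driven modules; not Neo4j's or the Framework's API. */
final class TransactionHookSketch {

    /** A node reduced to a bag of properties (stand-in for a real graph node). */
    static final class SketchNode {
        final Map<String, Object> properties = new HashMap<>();
    }

    /** The nodes a transaction is about to create, visible to modules before commit. */
    static final class PendingTransaction {
        final List<SketchNode> createdNodes = new ArrayList<>();
    }

    /** A module may enrich the pending transaction, or throw to prevent it. */
    interface BeforeCommitModule {
        void beforeCommit(PendingTransaction tx);
    }

    /** Enrichment: give every newly created node a "uuid" property if it lacks one. */
    static final class UuidModule implements BeforeCommitModule {
        @Override
        public void beforeCommit(PendingTransaction tx) {
            for (SketchNode node : tx.createdNodes) {
                node.properties.computeIfAbsent("uuid", key -> UUID.randomUUID().toString());
            }
        }
    }

    /** Prevention: reject any transaction that creates a node without a "name" property. */
    static final class RequireNameModule implements BeforeCommitModule {
        @Override
        public void beforeCommit(PendingTransaction tx) {
            for (SketchNode node : tx.createdNodes) {
                if (!node.properties.containsKey("name")) {
                    throw new IllegalStateException("Node without a name; transaction rejected");
                }
            }
        }
    }

    /**
     * Runs the modules in order. Returns true if the commit may proceed and false
     * if a module vetoed it. A real database would also roll back the changes.
     */
    static boolean commit(PendingTransaction tx, List<BeforeCommitModule> modules) {
        try {
            for (BeforeCommitModule module : modules) {
                module.beforeCommit(tx);
            }
            return true;
        } catch (IllegalStateException veto) {
            return false;
        }
    }

    public static void main(String[] args) {
        PendingTransaction tx = new PendingTransaction();
        SketchNode alice = new SketchNode();
        alice.properties.put("name", "Alice");
        tx.createdNodes.add(alice);

        List<BeforeCommitModule> modules = List.of(new UuidModule(), new RequireNameModule());
        System.out.println("committed: " + commit(tx, modules));
        System.out.println("properties: " + alice.properties);
    }
}
```

The point of the design is that application code never calls the modules directly: it simply writes to the database, and the registered modules see each transaction before it commits.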

The second category of Framework modules benefited from a configurable and quite clever way of executing certain operations on the database regularly, in timer-driven intervals. This allowed users, for instance, to automatically delete certain nodes and relationships using the Expire module.
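As a rough illustration of the timer-driven idea, the sketch below models expiry as a unit of work (`tick`) that something else calls at intervals. It is an assumption-laden toy, not the Expire module: the class `ExpirySketch` and its methods are hypothetical, nodes are reduced to IDs in a map, and the Framework's configurable scheduling is not reproduced.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical model of a timer-driven expiry module; not the Expire module's code. */
final class ExpirySketch {

    /** Node ID mapped to its expiry time in epoch milliseconds. */
    private final Map<Long, Long> expiryByNodeId = new HashMap<>();

    /** Registers a node together with the time at which it should be deleted. */
    void put(long nodeId, long expiresAtMillis) {
        expiryByNodeId.put(nodeId, expiresAtMillis);
    }

    boolean contains(long nodeId) {
        return expiryByNodeId.containsKey(nodeId);
    }

    /**
     * One timer tick: deletes every node whose expiry time is at or before
     * the given time and returns how many nodes were deleted.
     */
    int tick(long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<Long, Long>> entries = expiryByNodeId.entrySet().iterator();
        while (entries.hasNext()) {
            if (entries.next().getValue() <= nowMillis) {
                entries.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        ExpirySketch graph = new ExpirySketch();
        graph.put(1L, 1_000L);
        graph.put(2L, 2_000L);

        // The clock is passed in explicitly so the behaviour is deterministic.
        System.out.println("removed at t=1500: " + graph.tick(1_500L));
        System.out.println("node 2 still present: " + graph.contains(2L));
    }
}
```

In a running system, `tick` would be driven by a scheduler rather than called by hand, for example with `ScheduledExecutorService.scheduleWithFixedDelay` from the JDK. Passing the current time in as a parameter keeps the deletion logic testable without waiting on a real clock.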

Both the transaction- and timer-driven modules exposed a REST API. The Framework made it easy to build these APIs, which gave rise to another type of modules that simply existed to introduce new APIs to Neo4j. Examples of such modules included the Reco module for building recommendation engines, or the NLP module that brought Natural Language Processing capabilities straight into the database.

If you still remember the brief Neo4j history lesson at the beginning of this article, it will be apparent that by about 2016, the GraphAware Framework was already obsolete. Whilst it added some Cypher functions exposed by its modules, it was primarily REST API-driven and most of the functionality was better suited for APOC. That’s why we “killed” some of the modules relatively early on and contributed the code to APOC itself. A notable example of such a contribution is the set of random graph generators that live in APOC to this day.

Technical Challenges

From a technical perspective, it became difficult over time to maintain the functionality of the Framework. Neo4j evolves extremely rapidly and keeping up with these changes would require engineering effort way beyond what we could afford to invest into an open-source project.

Making sure that timer-driven modules behave correctly in clustered environments on Neo4j Enterprise Edition became tricky with the introduction of Causal Clustering in Neo4j 3.1. Neo4j took the Enterprise Edition fully commercial with version 3.5, which added a few challenges around the build process. Neo4j 4.0, with its multi-database features, removal of explicit indexes, and tons of other changes to the internal APIs, left the GraphAware Framework in need of a rewrite.

To be clear, most if not all of the changes to Neo4j made total sense; we just haven’t managed to keep the GraphAware Framework up to date. We made it compile and we made the tests pass, but we haven’t really embraced the new features and architectural changes.

Speaking of architecture, there is one more important realisation stemming from the fact that the NLP functionality on top of Neo4j started becoming relatively popular. By collecting feedback from our users, we quickly learned that running potentially computationally heavy workloads (such as Natural Language Processing) inside Neo4j’s own JVM isn’t the best idea architecturally. Since any Neo4j plugin/extension, GraphAware Framework included, runs in the same JVM as Neo4j by definition, we realised it was time to look for alternative approaches to scalability.

Business Perspective

From a business perspective, there are two sides of the coin when it comes to the GraphAware Framework. On one hand, it never really caught on as something people would be willing to pay for. Whilst we offered a paid subscription to organisations that wanted enterprise-level support, the interest was too limited for the Framework to be a profitable endeavour.

On the other hand, the GraphAware Framework has served us well in terms of staying up to speed with the latest developments in Neo4j, demonstrating our Neo4j expertise, and providing meaningful contributions to the community. Indeed, it helped acquire interesting and valuable clients, some of whom we actively work with to this day.

The truth is, though, that the GraphAware Framework has been in maintenance mode for all intents and purposes since the early versions of the Neo4j 3.x series, about 5 years ago. In the software world, that’s a very long time, even more so in the rapidly evolving world of graph technologies.

Meet GraphAware Hume

Around the time the GraphAware Framework started slowly sunsetting, a new star was being born. We took all the business and technical graph experience we had gained and started building Hume, a modern platform on top of Neo4j, this time aimed at the end user, analyst, and data scientist, rather than the developer.

Five years later, I am very happy to see a mature, standalone, commercially successful, off-the-shelf product that helps our customers all over the world keep countries and communities safe, combat financial fraud, discover new medicines, advise important policymaking, and much much more.

The most popular GraphAware Framework functionality, such as Auditing, Natural Language Processing, and robust, enterprise-ready Neo4j to Elasticsearch replication, has been migrated to (or re-written for) Hume. This time around, however, it is 100% up to date with the latest advances in Neo4j, uses official Neo4j drivers and thus the Bolt Protocol, and is built on a vanilla Neo4j architecture that allows its services to be scaled independently of Neo4j itself.

Goodbye Framework

Given our laser-sharp focus on Hume and for all of the reasons above, we have decided to retire the GraphAware Framework and all of its modules. All the repositories have been archived; the code remains freely available for people to explore and use under the GPLv3. The Framework and all its modules work perfectly on the Neo4j 3.x series, and some of them may actually work on 4.x as well (no guarantees). All our customers have been provided with a migration path, and our community users are free to fork the repositories and maintain the code internally, or find an alternative.

Alternatives

If you’re a community user of the GraphAware Framework or any of its modules, the following list suggests alternatives:

Conclusion

For many years, the GraphAware Framework was my baby. I now have a different baby to focus my full-time effort on and it is called GraphAware, the company. The GraphAware Neo4j Framework has served us well over the last 8 years and I hope it’s served you, too. Whilst there are still a few relevant pieces of code, it is the right time for GraphAware to make the tough call and focus all our efforts on Hume. I’d like to thank all my colleagues and community members who have contributed to the Framework over the years. Hopefully you learned a thing or two along the way!

Michal Bachman

CEO | Neo4j certification

Michal Bachman, founder and CEO of GraphAware, drives the company's culture, core values, long-term vision, and strategy. With a strong technical background in computer science and engineering, he is passionate about using graph technologies, data science, and machine learning to solve the world's safety and security challenges. Michal leverages his international experience, natural curiosity, and passion for travel and exploration to develop and manage GraphAware's key customer and partner relationships worldwide.