CDC (Change Data Capture) is a well-defined software design pattern for a system that monitors and captures data changes so that other software can respond to those events. CDC has many advantages compared to the traditional polling approach: all changes are captured (intermediary changes between two polls are tracked and can be acted upon); it is real-time and low-overhead (reacting to CDC events happens in real time, and only when changes happen, avoiding the CPU overhead of frequent polling); and it provides loose coupling (CDC sends captured changes to message brokers, so consumers can be added or removed on demand). Applications of Neo4j Change Data...
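To make the "all changes are captured" point concrete, here is a minimal, self-contained Java sketch contrasting a push-based change stream with polling. It does not use Neo4j's CDC API: `ChangeLog` and `CdcDemo` are hypothetical in-memory stand-ins chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a CDC stream (not a Neo4j API).
class ChangeLog {
    private final List<Consumer<String>> subscribers = new ArrayList<>();
    private String current = null;

    // Consumers can be attached on demand (loose coupling).
    void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    // Every write is pushed to all subscribers at the moment it happens.
    void write(String value) {
        current = value;
        for (Consumer<String> s : subscribers) {
            s.accept(value);
        }
    }

    // A poller only sees the state as it is when it polls.
    String poll() {
        return current;
    }
}

class CdcDemo {
    // Three writes happen; a CDC-style subscriber records each one.
    static List<String> capturedByCdc() {
        ChangeLog log = new ChangeLog();
        List<String> seen = new ArrayList<>();
        log.subscribe(seen::add);
        log.write("v1");
        log.write("v2");
        log.write("v3");
        return seen;
    }

    // The same three writes between two polls; only the last value is visible.
    static String seenByPolling() {
        ChangeLog log = new ChangeLog();
        log.write("v1");
        log.write("v2");
        log.write("v3");
        return log.poll();
    }
}
```

Here `capturedByCdc()` observes v1, v2 and v3, while `seenByPolling()` only ever sees v3: the intermediary changes are lost to the poller.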
Phonetic matching attempts to match words by pronunciation instead of spelling. Words are often misspelled, and exact matching then fails to find them. Algorithms such as Soundex and Metaphone were developed to address this problem, and they have found use in voice assistants, search, record linkage, fraud detection and matching misspelled names (for example, in medical records). Custom analyzers: In 2019, we blogged about creating a Czech analyzer to address accents in the language. With Neo4j 4, a few things have changed. This short blog post was inspired by a StackOverflow question on phonetic searches and resulted in me...
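As an illustration of how a phonetic code lets differently spelled words match, here is a minimal Java sketch of the classic American Soundex rules. It is a standalone example, not the Neo4j analyzer the post builds; libraries such as Apache Commons Codec ship production implementations.

```java
import java.util.Locale;

class Soundex {
    // Digit code for each letter A..Z; '0' marks letters that are not coded
    // (vowels plus H, W and Y).
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        // Keep the first letter; its code still suppresses an identical code right after it.
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char prev = code(s.charAt(0));
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            char d = code(c);
            if (d != '0' && d != prev) {
                out.append(d);
            }
            // Vowels separate equal codes; H and W do not.
            if (c != 'H' && c != 'W') {
                prev = d;
            }
        }
        // Pad to the standard four characters.
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    private static char code(char c) {
        return CODES.charAt(c - 'A');
    }
}
```

With these rules, "Robert" and "Rupert" both encode to R163, so a search on either spelling finds the other.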
The release of Neo4j 4.0 brought many improvements, one of them being a reactive architecture across the stack, from query execution to client drivers. But how does that compare to other approaches? As stated in the Reactive Manifesto, a reactive system is more scalable and responsive thanks to more efficient resource usage. I was curious to see this in action and to check the benefit. In this article we will take a simple example of copying data from one database to another and compare the reactive approach to traditional ones in terms of execution speed and resource usage. We’ll use Java to copy data between two separate Neo4j instances, all...
Graphs are a perfect fit for IT Operations. From dependency management and impact analysis to capacity and outage planning, the interconnected components that make up networks and services are naturally modelled as a graph, enabling teams such as support, help desk and DevOps to navigate potentially complex relationships. Background: The size of networks has been increasing rapidly, and with it the number of assets such as applications, services and devices. IT managers and operations teams have been facing challenges around quick response times and incident analysis due to the inability of traditional databases, such as relational ones, to process heavily hierarchical and interconnected...
The GraphAware team is excited to release the neo4j-sso third-party security extension, compatible with Neo4j Enterprise version 4.0 and above. It allows Neo4j Enterprise users to connect with their LDAP, Okta, Google or Azure Active Directory accounts (and those of many other providers) in a seamless and secure way. It has many security benefits, the two most prominent being the following. Non-repudiation: it removes most of the need for service accounts, where more than one user would use the same set of credentials to connect to Neo4j, thus tracing every action against a Neo4j server back to a single individual...
GraphTour Europe 2020 started in Amsterdam on February 4, right after the release of Neo4j 4.0, a key milestone in the graph technology landscape. At GraphAware we are very excited about the new features included in this release because they revolutionize the way we approach some common graph challenges. Our CEO, Michal Bachman, spoke about this in Amsterdam in his talk “Practical Applications of Neo4j 4.0”, and he and other GraphAware experts continue to present in each of the six cities where we sponsor GraphTour. Find out what the next virtual event is and register for free. GraphAware and Neo4j: At GraphAware we...
Neo4j has implemented very useful algorithms to derive insights from your graph data. The Louvain algorithm for community detection and PageRank, a centrality algorithm for finding important nodes in your graph, are just some examples. A couple of days ago, the GDS 1.2 preview was released, and it is compatible with Neo4j 4.x. Here are the installation instructions: download the GDS 1.2 library from the Neo4j Download Centre; unzip the downloaded file and copy the .jar file into the plugins directory of your Neo4j server; amend the neo4j.conf file in order to allow the gds.* procedures (assuming you have the...
GraphAware Hume helps governments keep their countries safe. In this 15-minute video, we demonstrate the use of Hume for contact tracing and smart quarantine in the context of the current coronavirus pandemic. Specifically, we will see how Hume can identify people at risk using actual and potential contact tracing, suggest who should be informed or quarantined, visually explain why someone is at risk, find quarantine offenders, and much more. Hume can do much more than structured data analysis. It is a full-blown ecosystem for intelligent systems built upon the combined power of collaborative knowledge graphs and machine learning. Hume’s unique...
You’ve probably seen us already in Amsterdam, Stockholm, Madrid and London. Now, as GraphTour goes digital due to developments around the world with coronavirus (COVID-19), GraphAware stands with Neo4j and we are moving our sponsor booth online for the first time! GraphAware Distributed: The GraphAware team has been distributed globally since its first year, and we have always been comfortable working and serving customers remotely. We continue to be spread around the world and take pride in the fact that our colleagues are so diverse. We communicate through many (many, many) channels on Slack, not just about work, but life in general-...
Neo4j 4.0 has just been released with a key feature: graph and sub-graph access control. Access to certain labels, relationship types or properties can now be handled at the database level, so developers no longer have to deal with complex security logic in their code, and the result is a more consistent and performant solution. Users connecting directly to Neo4j with their Neo4j user credentials, either via the browser or via standalone visualisation tools, will only have access to the sub-graph permitted by the role(s) assigned to them. But what about applications that usually abstract away the database user credentials and connect...
It’s been a year since I published the Graph Technology Landscape 2019 post on GraphAware’s blog. I consider this a success story because it got a lot of attention and publicity. The landscape was mentioned many times in different places; it was used by Emil Eifrem in his GraphTour and GraphConnect opening keynotes, it was displayed in conference halls, and I received many, many useful comments and feedback. I was even invited to Rik van Bruggen’s Graphistania podcast to talk about it, and the episode was referred to in the Top 5 Neo4j Podcasts of 2019 blog posts as well....
Up until version 4.0, Neo4j supported only one active database per server instance. As such, achieving multi-tenancy meant that either a Neo4j instance had to be deployed per tenant, or all tenant graphs co-existed in the same database. The first option meant a lot of extra infrastructure and maintenance, and the second implied a custom partitioning strategy, usually achieved by differentiating tenants by labels or properties, a mechanism fraught with risk and rarely preferred. Neo4j 4 allows you to use more than one active database at the same time, where each database defines a transaction domain and execution context,...
Many, many years ago, I requested the UNION clause in Cypher and Andres Taylor graciously added it. This was followed by a request for post-UNION processing by Aseem Kishore, which went on to collect a whopping 99 comments over time. It is exciting to see support for a subset of openCypher subqueries, i.e. uncorrelated subqueries, in the soon-to-be-released Neo4j 4, finally bringing post-UNION processing to Cypher. Given its history, a short article is in order. Union in 3.x: In pre-4.x versions of Neo4j, UNION served to combine the results of 2 or more queries into one...
So you have followed the Deep Dive into Neo4j’s Full Text Search tutorial, even learned how to create custom analyzers, and finally watched the Full Text Search tips and tricks talk at the Nodes19 online conference? Still, searching for boat does not yield results containing yacht or ship, and you’re wondering how to make your search engine a bit more relevant for your users? Look no further: you’ll learn how to do it now! Synonyms: A synonym is a word or phrase that means exactly or nearly the same as another word or phrase. Why synonyms? It’s all about recall! In other words, to...
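The recall problem can be illustrated with query-time synonym expansion. The Java sketch below is a simplified model under stated assumptions (a hypothetical `SynonymSearch` class, hand-written synonym groups, whitespace-style tokenization); it is not how Neo4j or Lucene configure synonyms internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class SynonymSearch {
    // Each word maps to every word in its synonym group (hypothetical data).
    private final Map<String, Set<String>> synonyms = new HashMap<>();

    void addGroup(String... words) {
        Set<String> group = new HashSet<>();
        for (String w : words) {
            group.add(w.toLowerCase(Locale.ROOT));
        }
        for (String w : group) {
            synonyms.computeIfAbsent(w, k -> new HashSet<>()).addAll(group);
        }
    }

    // Expands a query term into itself plus all of its synonyms.
    Set<String> expand(String term) {
        Set<String> expanded = new HashSet<>(synonyms.getOrDefault(term, Collections.emptySet()));
        expanded.add(term);
        return expanded;
    }

    // Returns documents containing the term or any synonym (split on non-word characters).
    List<String> search(String term, List<String> docs) {
        Set<String> terms = expand(term.toLowerCase(Locale.ROOT));
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            for (String token : doc.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (terms.contains(token)) {
                    hits.add(doc);
                    break;
                }
            }
        }
        return hits;
    }
}
```

With the group ("boat", "yacht", "ship") registered, a search for "boat" returns documents mentioning a yacht or a ship, which is exactly the recall gain described above. Expanding at index time instead is the other common design choice; it trades a larger index for simpler queries.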
GRANDstack tips and tricks: Using GRANDstack can rapidly accelerate the development of applications. The neo4j-graphql-js library provides the ability to translate GraphQL queries from the frontend to Cypher queries. This is achieved by defining the GraphQL schema and annotating it with a few extra directives. If you want to get familiar with the GRANDstack, you can visit its documentation. This post will present some more advanced tips and tricks for using the neo4j-graphql-js library that we found useful for real-world applications. It will focus on overcoming some of its limitations or adding missing features. It will show you how to unset...
For the Global GraphHack 2019, we extended JMeter to support the Bolt protocol and do load testing on Cypher queries.
We have already blogged about the full-text search available in Neo4j 3.5. The list of available analyzers covers many languages and fits various use cases. However, once you expose the search to real users, they will start pointing out edge cases and complaining about the search not being Google-like. Speakers of languages that use accents in their written form quite often leave out the accents. There are various reasons for this: the most common ones are historical, dating from when different character encodings caused problems, and users also find it hard to change their habits when using a different default keyboard layout (e.g. en_US); switching the layout just for...
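The usual fix is to "fold" accents away so that "prilis" matches "příliš". Lucene-based analyzers typically do this with a folding filter (for example Lucene's ASCIIFoldingFilter); the plain-Java sketch below shows the same idea using only the JDK's `java.text.Normalizer`. The `AccentFolding` class name is hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

class AccentFolding {
    // NFD decomposes "č" into "c" followed by a combining caron;
    // removing all combining marks (\p{M}) then leaves the base letters.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // A query matches a term if both are equal after folding.
    static boolean matches(String query, String text) {
        return fold(query).equals(fold(text));
    }
}
```

For this to work in a search engine, the same folding has to be applied both when indexing and when analyzing the query; otherwise accented documents still won't match unaccented queries.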
The Cypher query planner is quite advanced and mature, and you can mostly rely on it to pick the best plan for your query. However, there are rare cases, or bugs, that might leave you looking for ways to influence that plan. This article demonstrates the practical usage of an index hint. Note that all queries were tested against Neo4j Enterprise 3.5.8. The graph model: This is the portion of the graph model that is sufficient to demonstrate the issue. Simple enough: we have many tweets, and tweets have keywords. Our graph has two indexes, one on the value of the Keyword, and the...
Neo4j Desktop, part of the Neo4j Graph Platform, is a client application that installs on your desktop OS. It lets you get started quickly by downloading and installing the Enterprise Edition and supported plugins. You can group related graphs and applications under a Project. You can also build single-page web applications that run within Neo4j Desktop and have access to the services it provides. There are a number of apps available at https://install.graphapp.io/. In this blog post, we will build a very simple graph app using vanilla JavaScript. All code in this blog post is available at https://github.com/aldrinm/simple-graph-app and https://github.com/aldrinm/simple-graph-app-npm. Hello,...
“Lateral thinking” was a big topic back in 2004 when I was in the Network Operations Center (NOC) business; one definition is: “(lateral thinking) is the solving of problems by an indirect and creative approach, typically through viewing the issue in a new and unusual light.” If it works, don’t touch it. But the world of NOC operations, and of IT Operations generally, was anything but creative, not because we didn’t appreciate innovation per se, but because we valued reliability, consistency and uptime above all things, and those outcomes are the result of a long tradition in IT of approaching change, the natural...
This is the second of a two-post series on monitoring the Neo4j graph database with popular enterprise solutions such as Prometheus and Grafana. Monitoring the status and performance of connected data processes is a crucial aspect of deploying graph-based applications. In Part 1 we saw how to expose the graph database internals and custom metrics to Prometheus, where they are stored as multi-dimensional time series. It is now time to query those metrics and render the results in a beautiful, integrated Grafana dashboard. This will help you establish 24/7 monitoring and alerting of your Neo4j setup so that you...
Database monitoring is a crucial aspect of any application deployment. After all, databases manage data and sit quite low in the stack. They are robust pieces of software, but their setup and maintenance need care and attention, since any problem has the potential to disrupt the business.
There is one common performance issue our clients run into when trying their first Cypher queries on a dataset in Neo4j: when writing a query, make sure that it doesn’t match any cycles, or you may experience unpleasant surprises. Assume the following sample graph and simple query:
    CREATE (a:Node {name: "A"}), (b:Node {name: "B"}), (c:Node {name: "C"}),
           (a)-[:TO {name: "1"}]->(b), (a)-[:TO {name: "2"}]->(b), (a)-[:TO {name: "3"}]->(b),
           (b)-[:TO {name: "4"}]->(c)
    MATCH p=({name: "A"})-[*..10]-({name: "C"}) RETURN p
The query returns 9 paths, instead of 3 as you might have guessed! The additional 6 paths have length 4 with the node pattern A-B-A-B-C; note the repeated nodes A...
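To see where the number 9 comes from, the Java sketch below enumerates paths the way Cypher expands an undirected variable-length pattern: a relationship may appear at most once per path, but nodes may repeat. `PathCounter` is a hypothetical helper written for this illustration, not Neo4j code.

```java
import java.util.ArrayList;
import java.util.List;

class PathCounter {
    // Each relationship is stored as {startNode, endNode}; direction is ignored when traversing.
    private final List<String[]> rels = new ArrayList<>();

    void addRel(String from, String to) {
        rels.add(new String[] {from, to});
    }

    // Counts paths of length 1..maxLen from start to end, using each
    // relationship at most once per path (nodes may repeat).
    int countPaths(String start, String end, int maxLen) {
        return dfs(start, end, maxLen, new boolean[rels.size()], 0);
    }

    private int dfs(String node, String end, int maxLen, boolean[] used, int depth) {
        // Standing on the end node after at least one hop completes one path.
        int count = (depth > 0 && node.equals(end)) ? 1 : 0;
        if (depth == maxLen) {
            return count;
        }
        for (int i = 0; i < rels.size(); i++) {
            if (used[i]) {
                continue;
            }
            String[] r = rels.get(i);
            String next = r[0].equals(node) ? r[1] : (r[1].equals(node) ? r[0] : null);
            if (next == null) {
                continue; // relationship not attached to the current node
            }
            used[i] = true;
            count += dfs(next, end, maxLen, used, depth + 1);
            used[i] = false; // backtrack so other paths may use this relationship
        }
        return count;
    }

    // The sample graph: three A-B relationships and one B-C relationship.
    static PathCounter sampleGraph() {
        PathCounter g = new PathCounter();
        g.addRel("A", "B"); // name: "1"
        g.addRel("A", "B"); // name: "2"
        g.addRel("A", "B"); // name: "3"
        g.addRel("B", "C"); // name: "4"
        return g;
    }
}
```

`sampleGraph().countPaths("A", "C", 10)` yields 9: 3 paths of length 2, plus 3! = 6 paths of length 4 that cross the three A-B relationships in every order. Capping the length at 2 gives the 3 paths you might have expected.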
When developing web applications with frameworks like Vue.js, the best approach is to subdivide them into well-defined and reusable components for the user interface, with the business logic encapsulated in ‘services’. In an ideal world every new feature would follow this approach, and all components and services would remain at a reasonable size, so that you could quickly glance at each file and understand what it does. The problem: The reality is usually much less ideal: good principles are followed rigorously when the project is young, but as new features, bug fixes and different developers accumulate, components and services can become...
Dependencies, like graphs, are everywhere. Achieving a goal is rarely possible in a vacuum and requires collaboration between individuals and/or processes. Eliminating dependencies completely is unrealistic (they are a part of life), but they can be streamlined to improve efficiency and reduce friction. In this blog post, we use the example of software projects, but dependency management can very well be applied to many verticals such as supply chains, business processes, inventory management, and government processes and workflows. Quite a few organizations struggle as the time draws close to releasing or delivering a version of their software projects. For many, it is a time...
A few years ago I decided that one day I would create a Graph Technology Landscape map, which would be useful for everyone who wants to discover the players around graph technologies. I started to collect the companies and products, but my research never turned into a proper blog post. Until now. I am happy to announce that the first version of my landscape has been published; I hope we can consider this the start of a long journey.
In this blog post we will go over the Full Text Search capabilities available in the latest major release of Neo4j. Unlike our usual posts, the content will focus on the underlying search engine used by Neo4j, namely Apache Lucene 5.5.5. What exactly is search? Search is an interaction between a user and a search engine. The user has an information need at hand and attempts to satisfy it by providing a search with adequate constraints. The search engine uses those constraints to collect matching results and return them to the user. What is a search engine? A search...
2018: it’s been such a whirlwind of activity at GraphAware, and we’re so proud of everything we’ve accomplished this year. In fact, we grew and grew, announcing ourselves in Australia and then, later in the year, expanding into the Americas. “Neo4j is one of the most disruptive and transformative technologies I have seen in my career,” said Kyle McNamara, CEO, Americas. His team are well on their way to increasing GraphAware’s presence and strengthening the already close bond we have with Neo4j. Over in Australia, various government entities have shown keen interest in auto-classification, simplifying organisational movement, enriching original documents, and security and...
Automated testing is the cornerstone of any successful software project. Applications using the Neo4j database are no exception. This blog post shows how to use the Neo4j Docker image and the Testcontainers library for integration testing in Java using JUnit. The examples are in Java, but the Testcontainers library has been ported to many other languages, so the same approach and principles can be applied. Check out the Testcontainers GitHub page. Motivation: Neo4j already provides a testing harness to start a temporary database within tests, either manually or through a JUnit rule. To use this harness, one must add the neo4j-harness Maven artifact, together with the whole Neo4j database, as a test dependency of the project. This inevitably pollutes...
Do you think there is no space for a graph database in your company? Or that it would be a huge effort to integrate a graph database into your product? I have to tell you: you can use a graph database like Neo4j without touching your product, and you can use it for managing your company’s knowledge as well as to improve your software development process. So, even if your business problem is not inherently graphy (hard to believe in 2018), there are a few reasons why you should think about your environment as a graph. Without knowing your core business, I...
Introduction: In the bucket filling problem you are given two empty buckets, each of a certain capacity, and a large supply of water. By filling, emptying and transferring water between the two buckets, you must try to end up with a situation where one of the buckets contains a required volume of water, or where both buckets together contain the required volume. One popular instance of the problem has two buckets with capacities of 5 and 3 litres respectively, with the goal being to find a solution where one bucket contains exactly 4 litres. One solution is shown below: Step Action...
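One standard way to find a shortest solution is a breadth-first search over the graph of states (litres in bucket A, litres in bucket B), where each fill, empty or pour action is an edge. The Java sketch below assumes that technique; the excerpt does not show which method the full post uses, and `Buckets.minSteps` is a hypothetical name. It only checks the "one bucket contains the target" goal, not the combined-volume variant.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class Buckets {
    // Returns the minimum number of actions until one bucket holds exactly
    // `target` litres, or -1 if that volume is unreachable.
    static int minSteps(int capA, int capB, int target) {
        Map<Long, Integer> dist = new HashMap<>(); // state -> number of actions taken
        Deque<int[]> queue = new ArrayDeque<>();
        queue.add(new int[] {0, 0});
        dist.put(key(0, 0), 0);
        while (!queue.isEmpty()) {
            int[] s = queue.poll();
            int a = s[0], b = s[1];
            int d = dist.get(key(a, b));
            if (a == target || b == target) {
                return d; // BFS dequeues states in order of distance, so d is minimal
            }
            int pourAB = Math.min(a, capB - b); // amount moved from A into B
            int pourBA = Math.min(b, capA - a); // amount moved from B into A
            int[][] next = {
                {capA, b}, {a, capB},         // fill A, fill B
                {0, b}, {a, 0},               // empty A, empty B
                {a - pourAB, b + pourAB},     // pour A into B
                {a + pourBA, b - pourBA}      // pour B into A
            };
            for (int[] n : next) {
                long k = key(n[0], n[1]);
                if (!dist.containsKey(k)) {
                    dist.put(k, d + 1);
                    queue.add(n);
                }
            }
        }
        return -1;
    }

    // Packs a state into a single map key.
    private static long key(int a, int b) {
        return ((long) a << 32) | (b & 0xffffffffL);
    }
}
```

For the 5 and 3 litre instance, `minSteps(5, 3, 4)` returns 6: fill the 5-litre bucket, pour into the 3-litre one, empty it, pour again, refill the 5-litre bucket and top up the 3-litre one, leaving 4 litres.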
The GraphAware Audit Module seamlessly and transparently captures the full audit history of who modified a graph, when, and how. A demonstration of the audit module is best viewed during live interactions with Neo4j, which is why GraphAware has published this screencast on the GraphAware YouTube channel. As a textual summary of the screencast, this post will quickly introduce modules, discuss the GraphAware Framework, and then dive into the audit module. What is a module? Modules are small, custom-built software packages, also commonly called plugins, that live inside Neo4j and enhance it to provide advanced functionality. The type of advanced functionality that...
Enterprise IT requirements are demanding, and solutions are expected to be reliable, scalable, and continuously available. Databases accomplish this through clustering, the ability of several instances to connect and conceptually appear and operate as a single unit. While Neo4j’s clustering is well documented, for exploration and learning it can be helpful to get a cluster up and running as quickly as possible. This post demonstrates how to use Docker to have a Neo4j causal cluster up and running in a matter of minutes. Neo4j Causal Clustering: Causal clustering, introduced as a cornerstone feature of Neo4j 3.1, enables support for ultra-large clusters and a...
What a year it’s been for all of us at GraphAware! We travelled around the world, starting on a great note in Bangalore, India and winding up in Ecuador, teaching, consulting and building cool applications, always spreading graph love. Here are some of the highlights of 2017. Neo4j love: GraphAware continued to be a premier Neo4j Solutions Partner, closing license sales around the world, conducting Neo4j trainings, providing support to Neo4j customers and being an instrumental part of the Neo4j community and ecosystem. Frantisek and Nicolas, core committers to Neo4j OGM and Spring Data Neo4j, made huge improvements to both projects, resulting in a major milestone, Spring...
Spring and Spring Boot have become the Swiss Army knife of Java software development, offering dozens of useful modules across a wide range of concerns. One such module is Spring Boot Actuator, a sub-project of Spring Boot that offers built-in, production-grade functionality to help monitor and interact with an application. It includes numerous endpoints that provide a wealth of information, including auditing, configuration, environment and health details. The /health endpoint: One particularly useful endpoint is /health, which displays information on the application’s overall health and can be used by monitoring software to generate alerts should a system component become unavailable....
We are becoming increasingly dependent on technology. Yet, without diligent attention paid to cybersecurity, technology is vulnerable to unauthorized access, change or even destruction. These vulnerabilities pose threats to our individual and collective safety, security, and human and economic well-being. Cybersecurity is therefore a vitally important global issue with substantial consequences, and it depends on the safe, stable and resilient security of our data, devices and systems. Equifax: The Breach. The latest major cybersecurity incident was publicly revealed when Equifax, a US-based consumer credit reporting agency that assimilates and analyzes the financial health of more than 820 million consumers and 91 million businesses globally, announced a...
The success of many enterprises greatly depends on their ability to gather useful information and process it in a timely manner. Automation is essential, and so is presentation: giving tangible feedback to decision makers. This is where technology reaches out to management, where science and design are combined to put the right people in a position to make better and more sustainable choices. Decision making is often a very complex task. Outcomes depend on a multitude of interactions between variables, actions, and actors at play in multiple contexts. While emphasis on individual aspects or subsets of interactions is very important, information...
Companies of any size have to manage and access huge amounts of data, whether to provide advanced services for their end users or to handle their internal processes. Most of this data is usually stored in the form of text. Processing and analyzing this huge source of knowledge represents a competitive advantage, but often even providing simple and effective access to it is a complex task, due to the unstructured nature of textual data. This blog post will focus on a specific use case: providing effective access to a huge set of documents - later referred to as a corpus -...
A book tells us a story, but for a computer it is a wall of text. How can we use graphs and NLP to help our machines make more sense of a story? Our example comes from the A Song of Ice and Fire books, aka Game of Thrones. We converted the e-books (epub) to text files and used a small Python program to split them into chapters, paragraphs and sentences. So a book turned into this model. GraphAware NLP: The GraphAware NLP Framework is a project that integrates NLP processing capabilities available in several software packages, like Stanford NLP and OpenNLP, existing data sources,...
“Relevance is the practice of improving search results for users by satisfying their information needs in the context of a particular user experience, while balancing how ranking impacts business’s needs.” [1]Providing relevant information to the user performing search queries or navigating a site is always a complex task. It requires a huge set of data, a process of progressive improvements, and self-tuning parameters together with infrastructure that can support them.Such search infrastructure must be introduced seamlessly and smoothly into the existing platform, with access to all relevant data flows to provide always up-to-date data. Moreover, it should allow for easy...
Since our first post a few months back, Neo4j-Databridge has seen a number of improvements and enhancements. In this post, we’ll take a quick tour of the latest features. Streaming Endpoint: Although Databridge is primarily designed for bulk data import, which requires Neo4j to be offline, we recently added the capability to import data into a running Neo4j instance. This was prompted by a specific request from a user who pointed out that in many cases people want to do a fast bulk load of an initial large dataset with the database offline, and then subsequently apply small incremental updates to that data with...
Recommendation engines are a crucial element in the global trend towards a push-based web experience and away from a pull-based one. They provide the ability to personalize content offered to each user by predicting the interest the user will have in the recommended items. This is not only a powerful business tool for content providers, but also a vital improvement to the user experience. In today’s world where the volume, interdependence, variety and speed of information is overwhelming, recommendation engines can significantly reduce the gap between us and what we search for. Indeed, these engines are used even to enhance...
Last month, the 5th edition of GraphConnect San Francisco took place at the Hyatt Regency SF. It was the biggest graph technology event ever and GraphAware proudly contributed as a sponsor, with one main talk, two lightning talks and our GraphHero stand 。^‿^。 This edition’s big announcement was the upcoming new landmark release of Neo4j 3.1, “The database for the connected enterprise”, which introduces a new state-of-the-art clustering architecture and new security architecture to meet enterprise requirements for scale and security. There will be a lot to say about this release, but you can already try the beta release as...
Introduction: Until now, Neo4j users wanting to import data into Neo4j have been faced with two choices: write Cypher statements using Cypher’s LOAD CSV, or use Neo4j’s batch import tool. Each of these approaches has its strengths and weaknesses. LOAD CSV is very flexible, but you need to learn Cypher, and it struggles with large volumes of data and is relatively slow. On the other hand, Neo4j’s batch import tool is extremely efficient at processing large data volumes. You don’t need to know any Cypher, but the input files usually need to be manually generated beforehand. Being a simple CSV loader, it...
In recent years, the rapid growth of social media communities has created a vast amount of digital documents on the web. Recommending relevant documents to users is a strategic goal for effective customer engagement, but it is not a trivial problem. In a previous blog post, we introduced the GraphAware Natural Language Processing (NLP) plugin. It provides the basis for building more complex applications that leverage text analysis and offer enhanced functionality to end users. An interesting use case is combining content-based recommendations with a collaborative filtering approach to deliver high-quality “suggestions”. This scenario fits...
Previous articles have shown you how easy using Spring with Neo4j can be. Now, with the next release of Spring Data Neo4j (SDN), we are going to make it even easier! This post is the first in a series that will explore the exciting improvements that will be available in the first release candidate of SDN 4.2; they are already available in the current snapshot. The main highlights we will be covering include: a brand-new Spring configuration method; tighter integration into Spring transactions, with support for transactional event listeners and read-only transactions; paging and sorting support for custom queries; the ability to attach multiple labels...
Whether you realize it or not, the software you create has a global market. Perhaps more so than any other product in any other industry, code that may start as a small, individual effort has the potential to rapidly blossom into a product used around the world. While it is not always obvious that your application can or will have such wide usage, it is in your best interest to maximize the number of organizations and people you can reach. This means it is important to ensure your software is internationalized and localized. Internationalization and Localization: Internationalization is the process of developing products in such...
Without question, GitHub is the biggest code sharing platform on the planet. With more than 14 million users and 35 million repositories, the insights you can discover by analyzing the data available through its API are surprising and revealing. I’ve been very passionate about this data for a long time, as much as I am about Neo4j. Importing this data into a graph database can reveal interesting new information. Two years ago I created a GitHub Gist with some interesting queries on that dataset, and I have shown many more queries at several conferences since. Recently, I came across mention-bot from Facebook, which analyses the...
During GraphConnect San Francisco 2015, we introduced the concept of Graph-Aided Search and released the first module providing Neo4j data replication to Elasticsearch. Some months later, the second part was released as an Elasticsearch plugin providing advanced personalized search using Neo4j as a source of external knowledge, which, combined with the former module, constitutes a complete bidirectional integration with Neo4j, taking advantage of the strengths of both technologies. The first version of the neo4j-to-elasticsearch plugin had a simple approach for defining which nodes should be indexed. After a while, the need for more flexibility arose. Based on our experience using the plugin and valuable...
In the Bersin Predictions for 2016 report, Josh Bersin states that “it feels as though everything in the world of talent is changing – from the way we recruit and attract people, as well as how we reward them, to the way we learn, and how we curate and manage our entire work-life experience”[1]. In fact, the last decade has witnessed much churn in the human resources and talent management software space to develop holistic solutions for employee engagement and retention. HR continues to grapple with the problem of seeming to know more about people outside their companies than inside. The workforce,...
A great part of the world’s knowledge is stored as natural language text, but using it effectively is still a major challenge. Natural Language Processing (NLP) techniques provide the basis for harnessing this huge amount of data and converting it into a useful source of knowledge for further processing. Introduction: NLP is used in a wide variety of disciplines to solve many different types of problems. Analysis is performed on text from different sources, such as blogs, tweets and various social media, with sizes ranging from a few words to multiple documents. Machine learning and text analysis are frequently used to enhance...
In our previous blog post we introduced the concept of Graph Aided Search. It refers to a personalised user experience during search, where the results are customised for each user based on information gathered about them (likes, friends, clicks, buying history, etc.). This information is stored in a graph database and processed using machine learning and/or graph analysis algorithms. A simple example is the LinkedIn search functionality. If we typed “Michal” in the text input, it would obviously return people whose names match and order them by full-text relevance with some fuzziness. Lucene-based search engines such as Elasticsearch and Solr offer impressive performance...
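A minimal sketch of the re-ranking idea follows, assuming the full-text engine returns scored hits and the graph supplies the set of result IDs connected to the searching user (for example, their friends). The multiplicative boost formula and the `GraphAidedRanking`/`Hit` names are illustrative assumptions, not the API of the GraphAware plugins.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

class GraphAidedRanking {
    // A search hit with its full-text relevance score.
    static final class Hit {
        final String id;
        final double textScore;

        Hit(String id, double textScore) {
            this.id = id;
            this.textScore = textScore;
        }
    }

    // Personalised score: textScore * (1 + boost) if the hit is connected to the
    // user in the graph, otherwise the plain textScore. Highest score first.
    static List<String> rerank(List<Hit> hits, Set<String> connections, double boost) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(
                (Hit h) -> h.textScore * (connections.contains(h.id) ? 1 + boost : 1)).reversed());
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

Two people named “Michal” with text scores 2.0 and 1.5 keep that order for a stranger, but with a boost of 0.5 the second one (2.25) moves to the top for a user who is connected to them. In practice the boost would come from graph analysis rather than a fixed constant.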
At GraphAware, we live and breathe Neo4j. For three years, we have been helping customers around the world embrace this amazing technology as a solution to many interesting problems. Mainstream applications of graphs, such as real-time recommendations, fraud detection, impact analysis, and graph-aided search, have been getting a lot of media attention. In the run-up to GraphConnect Europe 2016, we would like to illustrate that graphs are truly for everyone by going over some of the less obvious, though equally interesting and intellectually stimulating, use cases that we have come across.
Rules Engines
Whether you’re a startup building a mobile application that will help people...
As of version 2.1, Neo4j OGM will support persistence events. Although a date for the release of 2.1 isn’t known at the time of writing, we think this is an important and exciting new feature, so we’ll be writing a series of posts about it over the next few weeks to whet your appetites. In this first post, we’ll take a quick tour of the new Events mechanism in the OGM and provide some examples of how we might use it in our own applications. But first, some background…
OGM persistence strategy
By design, the OGM has a deep persistence strategy. This means that...
Spring Data Neo4j 4.1 introduces the ability to map nodes and relationships returned by custom Cypher queries to domain entities. This blog post will explain how different types of query results map to entities.
But first
There are two things to keep in mind:
- This functionality is only available in SDN 4.1 and Neo4j-OGM 2.0.
- SDN/Neo4j-OGM does not modify your Cypher queries at runtime, and it can only map what the query returns.
The Basics
This simple example shows a custom Cypher query that returns a single node entity and a calculated value:
```java
@Query("MATCH (user:User) WHERE <complex conditions> RETURN user, calculatedValue")
UserResult findUserByComplexCondition(String param);
```
and...
For most organisations, data security is extremely important. The topic comes up every single time we are training, consulting, or otherwise engaging in the world of graphs and Neo4j. At the same time, security is very difficult and time-consuming to get right, and the implications of getting it wrong can be serious. In this blog post, we introduce the integration of Spring Security into Neo4j, which provides important security controls and mechanisms for enterprises and governments that use the world’s most popular graph database.
Security in Neo4j
Neo4j comes with certain security mechanisms out of the box. These include HTTPS support, single-user authentication with all-or-nothing...
At GraphAware, we help organisations in a wide range of verticals solve problems with graphs. Once we come across a requirement or use case two or three times, we typically create an open-source Neo4j extension that addresses it. The latest addition to our product portfolio, introduced in this post, is a simple library that automatically expires data from the Neo4j graph database.
GraphAware Framework
Open-sourcing useful extensions helps us deliver solutions faster, lets people who prefer a DIY approach be more productive, and gives us valuable community feedback. That’s how our most popular products, such as the GraphAware Recommendation Engine, TimeTree, UUID, and others, were born. They...
Our previous article demonstrated how easy it was to build an application using Spring Data Neo4j 4. The first milestone of Spring Data Neo4j 4.1 has just been released (based on Neo4j OGM 2.0), and it delivers significant performance improvements for write operations, the ability to map nodes and relationships returned in custom Cypher queries to domain entities, as well as the much awaited support for embedded Neo4j. The new Components framework in Neo4j OGM 2.0 allows you to configure your application by specifying which driver you want to use to connect to Neo4j. Currently supported are the Http and Embedded drivers. A...
We are delighted to invite you to a Meetup on 4th February 2016 at 6:30 pm at the GraphAware London office, where Michal Bachman is going to present the European premiere of his talk entitled “Real-Time Recommendations and the Future of Search”, combined with a unique expert panel discussion and Q&A. Read more and sign up here.
Time & Place
4th February 2016 at 6:30 pm
GraphAware London office
133 Great Suffolk Street
London SE1 1PP
United Kingdom
This guide (first published on Airpair) will get you up and running with Spring Data Neo4j 4 in under an hour. It is based on a live application, Flavorwocky, the winner of the Neo4j Heroku Challenge 2012. Rewritten to use Spring Data Neo4j 4, the code is open source and available on GitHub.
Introducing Spring Data Neo4j 4
Neo4j is the world’s most popular graph database. With ACID guarantees and the ability to scale to billions of nodes and relationships, Neo4j is the preferred choice for modelling highly connected domains. Spring Data Neo4j is part of the Spring Data initiative and simplifies development using...
Iterating over large numbers of nodes using Cypher is quite a common use case in Neo4j. Typically, the reason for doing this is that we want to perform some kind of operation on each one of these nodes. In this blog post, we will use one million TestNodes and try to iterate over them in order to index their contents into a freshly created Elasticsearch index. There are three approaches we can take; two of them are quite common, but the most performant technique is largely unknown.
First Technique: SKIP and LIMIT
Using SKIP and LIMIT is the first approach that comes to mind,...
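Why paging with SKIP and LIMIT can get expensive is easy to see with a back-of-the-envelope model. The sketch below assumes the database produces and then discards every skipped row rather than seeking straight to the offset, and counts how many rows it walks through to page over all nodes. It is a cost model in plain Java (the class and method names are hypothetical), not a Neo4j measurement and not the faster technique the post goes on to describe.

```java
final class SkipLimitCost {

    /**
     * Rows walked through when every page is fetched with SKIP s LIMIT pageSize,
     * assuming skipped rows are produced and then discarded (no direct jump to an offset).
     * The page starting at 'skip' costs 'skip' discarded rows plus up to pageSize returned rows.
     */
    static long rowsProcessed(long totalNodes, long pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long processed = 0;
        for (long skip = 0; skip < totalNodes; skip += pageSize) {
            processed += Math.min(skip + pageSize, totalNodes); // skipped + returned rows
        }
        return processed;
    }

    public static void main(String[] args) {
        long nodes = 1_000_000;
        long pageSize = 1_000;
        // A single streaming pass touches each node once; paging re-walks the prefix every time.
        System.out.println("single pass: " + nodes);
        System.out.println("SKIP/LIMIT:  " + rowsProcessed(nodes, pageSize));
    }
}
```

Under this model, one million nodes in pages of 1,000 cost 1,000 × (1 + 2 + … + 1,000) = 500,500,000 rows, against 1,000,000 for a single pass: the total grows quadratically with the number of pages.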
Last month, I had the pleasure of speaking at GraphConnect in San Francisco, introducing Graph-Aided Search to a large audience of Neo4j users and graph enthusiasts. For those who missed the conference, the recording and slides have now been made available. Enjoy, and get in touch with feedback / questions!
Video
Slides: Real-Time Recommendations and the Future of Search, from GraphAware
Recently, Neo Technology announced the 2.3.0-RC1 release of their Neo4j graph database. One of the key new features is Triadic Selection, built into Cypher’s cost-based planner. In this blog post, we will explore Triadic Selection in detail and demonstrate how significantly it can speed up recommendations computed in Neo4j.
What is Triadic Selection?
A Bit of Theory: Triadic Closure
Networks or graphs can rarely be considered static structures. On the contrary, they often seem to be ever-evolving objects. Any social network, for example, is often the most dynamic of graphs: at any moment, new relationships are created between existing nodes, other relationships vanish, new nodes...
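Triadic closure predicts that an open triad a-b-c (a knows b, b knows c, but a does not know c) tends to close. The classic friend-of-friend recommendation follows directly from that: suggest every c reachable through a friend b that a is not already connected to. The plain-Java sketch below computes those candidates over an in-memory adjacency map; the class and method names are hypothetical, and it only illustrates the idea, not how Neo4j’s planner implements Triadic Selection. The `!direct.contains(fof)` test is the “not already connected” filter that, roughly speaking, Triadic Selection is meant to evaluate efficiently inside Cypher.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TriadicClosure {

    /**
     * Friends-of-friends of 'person' who are neither 'person' nor already a direct friend:
     * the open triads person-friend-fof that triadic closure predicts may close.
     * The graph is an undirected adjacency map (each friendship listed in both directions).
     */
    static Set<String> openTriadCandidates(Map<String, Set<String>> graph, String person) {
        Set<String> direct = graph.getOrDefault(person, Set.of());
        Set<String> candidates = new TreeSet<>(); // sorted for stable output
        for (String friend : direct) {
            for (String fof : graph.getOrDefault(friend, Set.of())) {
                // Skip the person themself and anyone they are already connected to.
                if (!fof.equals(person) && !direct.contains(fof)) {
                    candidates.add(fof);
                }
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> friends = Map.of(
                "Alice", Set.of("Bob", "Dave"),
                "Bob", Set.of("Alice", "Carol", "Dave"),
                "Carol", Set.of("Bob"),
                "Dave", Set.of("Alice", "Bob"));
        System.out.println(openTriadCandidates(friends, "Alice")); // [Carol]
    }
}
```

Dave is reachable through Bob but is excluded because Alice already knows him; only the open triad Alice-Bob-Carol remains.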
For the last couple of years, Neo4j has been increasingly popular as the technology of choice for people building real-time recommendation engines. Having been at the forefront of the graph movement through client engagements and open-source software development, we have identified the next step in the natural evolution of graph-based recommendation engines. We call it Graph-Aided Search.
Recommendations Everywhere
At first glance, it may seem that graph databases are only good for social networks, but it has been proven over and over again that the variety of domains and industries that need a graph database to store, analyse, and query connected data could not be any...
Drawing a graph on a whiteboard is easy and fun! Translating that graph into an object model can sometimes raise questions such as “do I have to define relationships in both participating node entities?” or “which end of the relationship should I save?”. Your object model is key when using an object graph mapper such as Neo4j OGM. The Neo4j OGM library is the magic behind Spring Data Neo4j 4, so this article applies to both Neo4j OGM and SDN 4. We’ll be using the ubiquitous movies domain to explain some common models.
Bidirectional Navigation
The simplest object model is also the one that represents...
Writing integration tests for code that runs against Neo4j is simple enough when using the native API, but there’s not a great deal of help out there if you’re working in client-server mode. Making assertions about the shape of the graph can also be difficult, particularly if use cases involve more than a few nodes and relationships. In all but the simplest scenarios, it can be hard to see why the graph in the test database isn’t as expected, as the feedback is often as black-and-white as pass/fail, so you can be looking for a proverbial needle in a haystack when trying...
In this blog post, we’ll demonstrate how to use variable length relationships (sometimes called “variable length paths”) in Cypher, using examples. We will also see when zero length relationships can be useful.
Introduction
Let’s start with the basics. For the sake of the blog post, our use case will be users who know other users. Users write blog posts, modelled as linked lists. You can generate an example graph with the following link to a predefined Graphgen graph, or use this Neo4j Console if you want to execute the queries whilst reading the blog post.
Basic Relationships Matching
Let’s start with a basic query that will find a...
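The semantics of a bounded variable length pattern such as `(start)-[*min..max]->(x)`, including the zero length case, can be illustrated outside Cypher. The plain-Java sketch below (hypothetical names) walks outgoing edges from a start node and collects every node reachable in between `minHops` and `maxHops` hops; with `minHops = 0` the start node itself is included, which is exactly what a zero length relationship contributes. It ignores relationship types and Cypher’s rule that a relationship may not repeat within one path, which makes no difference on an acyclic chain such as a linked list of posts.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class VarLengthPaths {

    /**
     * Nodes reachable from 'start' in k hops for some minHops <= k <= maxHops,
     * following outgoing edges only. minHops = 0 includes 'start' itself.
     * Intended for acyclic data such as a linked list of posts.
     */
    static Set<String> reachable(Map<String, List<String>> out, String start, int minHops, int maxHops) {
        Set<String> result = new LinkedHashSet<>(); // keeps nodes in the order they are found
        List<String> frontier = List.of(start);
        for (int hops = 0; hops <= maxHops && !frontier.isEmpty(); hops++) {
            if (hops >= minHops) {
                result.addAll(frontier);
            }
            List<String> next = new ArrayList<>();
            for (String node : frontier) {
                next.addAll(out.getOrDefault(node, List.of()));
            }
            frontier = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // user -> post1 -> post2 -> post3, e.g. a user's posts stored as a linked list
        Map<String, List<String>> out = Map.of(
                "user", List.of("post1"),
                "post1", List.of("post2"),
                "post2", List.of("post3"));
        System.out.println(reachable(out, "post1", 1, 2)); // [post2, post3]
        System.out.println(reachable(out, "post1", 0, 2)); // [post1, post2, post3]
    }
}
```

The second call shows why zero length is handy: a single pattern returns the starting post together with the posts that follow it.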
GraphAware is very proud to sponsor GraphConnect Europe 2015, the only conference that focuses on the rapidly growing world of graph databases and applications that make sense of connected data. The conference takes place in London on 7th May 2015. Throughout the week, London turns into the world’s capital of graphs. We are extremely excited to welcome our Bruges-, Prague-, and Mumbai-based colleagues to our London headquarters. We are very much looking forward to meeting our friends from Neo Technology from the UK, Sweden, the USA, and the rest of the world, as well as other valued members of the worldwide graph community. As an important member of...
At GraphAware, we are very excited about the recently released Neo4j 2.2 and would like to share some info about where you can meet us in the next few weeks and months. Come and see us for a chat and learn something new about Neo4j and Graph Databases!
- On 6th April, Luanne is running a Neo4j Fundamentals training in Bangalore
- On 29th & 30th April, Vince is speaking about Spring Data Neo4j at Spring I/O in Barcelona
- On 30th April, Christophe is speaking about Neo4j at a Symfony meetup in Antwerp
- On 4th May, Michal is doing a Recommendation Engine Webinar...
Over the last few months, GraphAware, Neo4j, and Pivotal engineers have been working on a ground-up reimplementation of Spring Data Neo4j (SDN) that is server-first and Cypher-centric. Today we are very excited to announce the first milestone of the new Spring Data project for Neo4j.
Server-first!
While Neo4j has the ability to run embedded or as a regular server-side database, a lot of users favor traditional deployments where the database can scale independently of application servers. Neo4j server has provided the capability to do this for some time now, but when the original version of SDN was written, it was designed to target Neo4j in embedded (in-process)...
Last weekend, I came across a tweet announcing that Wikimedia had released the dataset of page clickstreams for February 2015. I found it interesting to download this dataset and see how people arrive at Neo4j’s Wikipedia page. The data is quite simple: we have page entities that relate to other pages. A page can either be a Wikipedia page or a non-Wikipedia page such as Google. Relationships can represent a user click from a Wikipedia page to another page, or a user searching on Google or Wikipedia. The number of times an event occurs is also provided in the dataset.
Importing the Dataset
You...
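Because each row of the dataset describes a weighted transition from one page to another, “how do people arrive at the Neo4j page?” becomes “rank the incoming transitions by count”. The plain-Java sketch below shows that aggregation over in-memory rows. The `Transition` record, the page titles, and the counts are all hypothetical and made up for illustration; they are not figures from the real dataset, and this is not the import approach the post describes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class Clickstream {

    /** One aggregated row: 'count' transitions from page 'prev' to page 'curr'. */
    record Transition(String prev, String curr, long count) {}

    /** Pages that lead to 'target', highest total count first. */
    static List<Map.Entry<String, Long>> topReferrers(List<Transition> rows, String target, int limit) {
        Map<String, Long> byReferrer = new HashMap<>();
        for (Transition t : rows) {
            if (t.curr().equals(target)) {
                byReferrer.merge(t.prev(), t.count(), Long::sum); // sum duplicate rows per referrer
            }
        }
        return byReferrer.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up counts for illustration only.
        List<Transition> rows = List.of(
                new Transition("Google", "Neo4j", 120),
                new Transition("Graph_database", "Neo4j", 80),
                new Transition("NoSQL", "Neo4j", 30),
                new Transition("Neo4j", "Cypher_Query_Language", 10));
        for (Map.Entry<String, Long> e : topReferrers(rows, "Neo4j", 2)) {
            System.out.println(e.getKey() + " -> Neo4j: " + e.getValue());
        }
    }
}
```

In a graph, the same question is a single hop over incoming relationships to the Neo4j page node, ordered by the count stored on each relationship.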
Our earlier blog post talked about using the Neo4j web browser along with embedded Neo4j. The WrappingNeoServerBootstrapper, which was employed to do this, has been deprecated for a while, and that raises questions about the alternative. Testing server extensions is now possible (see Testing your extension), but there is still another use case for wanting to connect to an embedded graph: troubleshooting. Perhaps you have a web application running in production with Neo4j in embedded mode, and you’ve got to troubleshoot an issue which requires access to the graph. How do you do this against a live graph? Well, there is another option which won’t give...
A common question when planning and designing your Neo4j graph database is how to handle “flagged” entities. This could include users that are active, blog posts that are published, news articles that have been read, etc.
Introduction
In the SQL world, you would typically create a boolean/tinyint column; in Neo4j, the same can be achieved in the following two ways:
- A flagged indexed property
- A dedicated label
Having faced this design dilemma a number of times, we would like to share our experience with the two presented possibilities and some Cypher query optimizations that will help you take full advantage of the graph...
There is no better way to start 2015 than to learn something new. In the wake of two recent major announcements (here and here), Neo4j is as hot as ever, so it might well be the next skill you pick up or improve. Here’s a list of Neo4j events organised by GraphAware around the world in the next few weeks. We’ll be delighted to see you there!
- On 17th January, Luanne is running a Graph Data Modelling training in Bangalore
- On 19th January, I’m speaking about Recommendation Engines at Neo4j Expert Talks in Berlin
- On 21st January, Christophe is showing off Graphgen...
There are times when you have an application using Neo4j in embedded mode but also need to play around with the graph using the Neo4j web browser. Since the database can be accessed by at most one process at a time, trying to start up the Neo4j server while your embedded Neo4j application is running won’t work. The WrappingNeoServerBootstrapper, although deprecated, comes to the rescue. Here’s how to set it up.
Maven Dependencies
```xml
<dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j</artifactId>
    <version>2.1.5</version>
</dependency>
<dependency>
    <groupId>org.neo4j.app</groupId>
    <artifactId>neo4j-server</artifactId>
    <version>2.1.5</version>
</dependency>
<dependency>
    <groupId>org.neo4j.app</groupId>
    <artifactId>neo4j-server</artifactId>
    <version>2.1.5</version>
    <classifier>static-web</classifier>
</dependency>
```
Start the WrappingNeoServerBootstrapper
```java
public static void connectAndStartBootstrapper() {
    WrappingNeoServerBootstrapper neoServerBootstrapper;
    GraphDatabaseService db = new GraphDatabaseFactory()
            .newEmbeddedDatabaseBuilder("/path/to/db").newGraphDatabase();
    try {
        GraphDatabaseAPI api = (GraphDatabaseAPI) db;
        ...
```
Last month, I had the pleasure of speaking at GraphConnect in San Francisco, introducing the GraphAware Framework to a large audience of Neo4j users and graph enthusiasts. For those who missed the conference, the recording and slides have now been made available. Enjoy, and get in touch with feedback / questions!
Video
Slides: GraphAware Framework Intro, from Michal Bachman
Specialist in Neo4j consultancy, training, and software development, Graph Aware Ltd has been selected as one of Neo Technology’s first UK solution partners under its newly launched partnership program. Neo Technology is the creator of the world’s most popular graph database, Neo4j, and has selected GraphAware to provide its customers with support from introduction through to full integration of Neo4j into their enterprise architecture and applications. In addition, GraphAware is now authorised to offer Neo4j subscriptions directly to customers. With access to GraphAware’s Neo4j experts throughout the implementation and integration process, customers are much better positioned to fully utilise all the business and technical benefits of...
In this post, we’d like to introduce the first version of the GraphAware Neo4j ChangeFeed, a GraphAware Runtime Module that keeps track of changes made to the graph.
GraphAware ChangeFeed Module
Every time a transaction commits successfully, all changes made to the graph as a result of the transaction are recorded as a change set. Changes include additions, modifications, and deletions of nodes, relationships, labels, and properties. Each change set has a UUID, a set of changes that occurred in the same transaction, and a timestamp which is allocated at the time the transaction starts committing. The module can be configured to limit the total...
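To make the shape of a change set concrete, here is a minimal in-memory model in plain Java: one `ChangeSet` per committed transaction carrying a UUID, a timestamp, and the list of changes, kept newest first. The `ChangeSet` record, the `append`/`latest` methods, and the change descriptions are hypothetical illustrations, not the module’s API. The `maxChangeSets` cap is also an assumption: it only supposes that the configurable limit counts change sets, which the truncated description does not confirm.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.UUID;

final class ChangeFeedSketch {

    /** All changes from one successfully committed transaction (hypothetical type). */
    record ChangeSet(UUID uuid, long timestamp, List<String> changes) {}

    private final Deque<ChangeSet> feed = new ArrayDeque<>(); // newest first
    private final int maxChangeSets; // hypothetical retention limit

    ChangeFeedSketch(int maxChangeSets) {
        if (maxChangeSets <= 0) {
            throw new IllegalArgumentException("maxChangeSets must be positive");
        }
        this.maxChangeSets = maxChangeSets;
    }

    /** Records the changes of one transaction as a single change set. */
    ChangeSet append(long commitStartMillis, List<String> changes) {
        ChangeSet changeSet = new ChangeSet(UUID.randomUUID(), commitStartMillis, List.copyOf(changes));
        feed.addFirst(changeSet);
        while (feed.size() > maxChangeSets) {
            feed.removeLast(); // drop the oldest change set
        }
        return changeSet;
    }

    /** Up to 'limit' most recent change sets, newest first. */
    List<ChangeSet> latest(int limit) {
        List<ChangeSet> result = new ArrayList<>();
        for (ChangeSet changeSet : feed) {
            if (result.size() == limit) {
                break;
            }
            result.add(changeSet);
        }
        return result;
    }

    public static void main(String[] args) {
        ChangeFeedSketch changeFeed = new ChangeFeedSketch(2);
        changeFeed.append(1_000L, List.of("Created node (:Person {name: 'Michal'})"));
        changeFeed.append(2_000L, List.of("Changed node property name"));
        changeFeed.append(3_000L, List.of("Deleted relationship FRIEND_OF"));
        for (ChangeSet cs : changeFeed.latest(10)) {
            System.out.println(cs.timestamp() + " " + cs.changes()); // only the two newest remain
        }
    }
}
```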
Modelling and querying time-based events in a graph is a fairly common discussion topic and a frequently asked question on Q&A sites. In this blog post, we evaluate some of the common approaches and introduce GraphAware TimeTree, a GraphAware Framework Module that simplifies modelling time and events in Neo4j.
Naive Approach
Neo4j has no notion of a Date/Time data type, so you have to decide whether to store the timestamp as a long or as a human-readable String, for instance formatted as ‘YYYY-MM-DD HH:mm:ss’. Unless the time is only for human eyes, though, we recommend opting for the machine-readable (long) approach. It is simply...
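To illustrate the machine-readable approach, the sketch below keeps a timestamp as epoch milliseconds (a `long`), shows that a range check becomes a plain numeric comparison, and derives the year/month/day path under which a time-tree-style hierarchy would file an event. It uses only `java.time`; the class and method names are hypothetical and this is not the TimeTree module’s API. Interpreting timestamps in UTC is an assumption; a real time tree has to settle on a time zone.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.List;

final class TimeTreePath {

    /** Year / month / day path for an epoch-millis timestamp, interpreted in UTC. */
    static List<Integer> dayPath(long epochMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        return List.of(t.getYear(), t.getMonthValue(), t.getDayOfMonth());
    }

    /** With timestamps stored as longs, a range check is a plain numeric comparison. */
    static boolean inRange(long timestamp, long fromInclusive, long toExclusive) {
        return timestamp >= fromInclusive && timestamp < toExclusive;
    }

    public static void main(String[] args) {
        long event = 1_400_000_000_000L; // 2014-05-13T16:53:20Z
        System.out.println(dayPath(event)); // [2014, 5, 13]
    }
}
```

A String such as ‘2014-05-13 16:53:20’ would have to be parsed before either operation, which is the practical argument for the long.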
In the first part of this short series about random graph models, we talked about why they are useful and had a brief look at two of them: Erdos-Renyi graphs and the Barabasi-Albert model. In this post, we take a look at the “small world” phenomenon and another network model, namely the Watts-Strogatz model.
Small World
There is an important property of random networks which we did not write about in the last blog post: the way the node separation scales with network size. Both Erdos-Renyi and Barabasi-Albert networks are “small world” models, meaning that the characteristic node separation scales logarithmically with the number of nodes present in the...
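To make the Watts-Strogatz construction concrete, here is a minimal Java sketch of the model as it is usually described: start from a ring lattice where each of `n` nodes is joined to its `k` nearest neighbours (`k/2` on each side), then rewire each lattice edge to a uniformly random endpoint with probability `beta`. With `beta = 0` the graph stays a regular lattice; small positive values add the long-range shortcuts that shrink typical node separation. Class and method names are hypothetical. One simplification: when the random endpoint would create a self-loop or a duplicate edge, this sketch keeps the original edge instead of drawing again.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class WattsStrogatz {

    /** Undirected Watts-Strogatz graph as adjacency sets over nodes 0..n-1. */
    static List<Set<Integer>> generate(int n, int k, double beta, long seed) {
        if (k <= 0 || k % 2 != 0 || k >= n) {
            throw new IllegalArgumentException("k must be positive, even and smaller than n");
        }
        List<Set<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adj.add(new HashSet<>());
        }
        // 1. Regular ring lattice: node i is joined to i+1 .. i+k/2 (mod n).
        for (int i = 0; i < n; i++) {
            for (int j = 1; j <= k / 2; j++) {
                addEdge(adj, i, (i + j) % n);
            }
        }
        // 2. Rewire each lattice edge (i, i+j) with probability beta.
        Random random = new Random(seed);
        for (int j = 1; j <= k / 2; j++) {
            for (int i = 0; i < n; i++) {
                int oldTarget = (i + j) % n;
                if (random.nextDouble() < beta) {
                    int newTarget = random.nextInt(n);
                    // Simplification: skip rewiring that would create a self-loop or duplicate edge.
                    if (newTarget != i && !adj.get(i).contains(newTarget)) {
                        adj.get(i).remove(oldTarget);
                        adj.get(oldTarget).remove(i);
                        addEdge(adj, i, newTarget);
                    }
                }
            }
        }
        return adj;
    }

    private static void addEdge(List<Set<Integer>> adj, int a, int b) {
        adj.get(a).add(b);
        adj.get(b).add(a);
    }

    static int edgeCount(List<Set<Integer>> adj) {
        int degreeSum = 0;
        for (Set<Integer> neighbours : adj) {
            degreeSum += neighbours.size();
        }
        return degreeSum / 2;
    }
}
```

Rewiring moves edges but never adds or removes them, so the graph always has `n·k/2` edges; only their arrangement changes.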
With MERGE set to replace CREATE UNIQUE at some point, the behavior of MERGE can sometimes be tricky to understand.
MERGE
Here’s a summary of what MERGE does:
- It ensures that a pattern exists in the graph by creating it if it does not exist already
- It will not use partially existing patterns: it will attempt to match the entire pattern and create the entire pattern if missing
- When unique constraints are defined, MERGE expects to find at most one node that matches the pattern
- It also allows you to define what should happen based on whether data was created or matched
The key...
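The “entire pattern” rule is the part that most often surprises people. The plain-Java model below (hypothetical names; it ignores labels, constraints, and every property except a name) imitates MERGE on a pattern with nothing bound beforehand, such as `MERGE (a {name: 'Alice'})-[:KNOWS]->(b {name: 'Bob'})`: if no complete match exists, every element of the pattern is created, even when nodes named Alice and Bob already exist.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeSemantics {

    record Node(int id, String name) {}

    record Rel(Node from, String type, Node to) {}

    final List<Node> nodes = new ArrayList<>();
    final List<Rel> rels = new ArrayList<>();

    Node createNode(String name) {
        Node node = new Node(nodes.size(), name);
        nodes.add(node);
        return node;
    }

    /**
     * Imitates MERGE (a {name: from})-[:type]->(b {name: to}) with no bound variables:
     * the whole pattern is matched, and if no complete match exists, the whole pattern
     * (both nodes and the relationship) is created.
     */
    Rel mergePattern(String from, String type, String to) {
        for (Rel rel : rels) {
            if (rel.from().name().equals(from) && rel.type().equals(type) && rel.to().name().equals(to)) {
                return rel; // complete match: nothing is created
            }
        }
        // No partial reuse: new nodes are created even if nodes with these names already exist.
        Rel created = new Rel(createNode(from), type, createNode(to));
        rels.add(created);
        return created;
    }

    long countNamed(String name) {
        return nodes.stream().filter(node -> node.name().equals(name)).count();
    }

    public static void main(String[] args) {
        MergeSemantics graph = new MergeSemantics();
        graph.createNode("Alice");
        graph.createNode("Bob");
        graph.mergePattern("Alice", "KNOWS", "Bob"); // relationship missing: whole pattern created
        graph.mergePattern("Alice", "KNOWS", "Bob"); // now fully matched: nothing created
        System.out.println(graph.countNamed("Alice")); // 2
    }
}
```

The usual remedy in Cypher is to MERGE (or MATCH) each node on its own first and then MERGE the relationship between the already-bound variables, so that only the missing relationship is created.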
Efficient counting of relationships in Neo4j was the cornerstone of my Master’s thesis and the reason the very first GraphAware Framework module, called the Relationship Count Module, was born. The improvements in Neo4j 2.1 around dense nodes and the addition of getDegree(…) methods on the Node interface made me eager to do some benchmarking around relationship counts again. The improvements in Neo4j 2.1, indeed, make the RelCount module slightly less useful. We’ve decided, however, not to make it obsolete, since it is a useful reference implementation of a GraphAware Framework Transaction-Driven Runtime Module. Instead, we ported the module to Neo4j 2.1 and reused all the performance...
When one obtains graph data from a measurement on a real-world network, it is sometimes useful to compare it with a random graph. Such a graph is characterised by a certain degree distribution, which you can imagine as a list of the degrees of the nodes present in the network. The most interesting distributions have a certain functional dependence, which allows one to infer what processes are dominant in the formation of the network. These processes, in turn, characterise the relationships between the nodes.
Why would one care about such analysis?
Imagine a group of customers that you want to target efficiently in a certain fashion. Say, for instance, that...
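As a small illustration of what a degree distribution is in practice, the plain-Java sketch below (hypothetical names) counts the degree of every node in an undirected edge list and turns the result into a histogram of degree against the number of nodes with that degree. Computing this histogram for a measured network and for a random graph model is the basis of the comparison described above.

```java
import java.util.SortedMap;
import java.util.TreeMap;

final class DegreeDistribution {

    /** Degree of each node 0..n-1 in an undirected edge list (each edge is {u, v}). */
    static int[] degrees(int n, int[][] edges) {
        int[] degree = new int[n];
        for (int[] edge : edges) {
            degree[edge[0]]++;
            degree[edge[1]]++;
        }
        return degree;
    }

    /** Degree distribution as a histogram: degree -> number of nodes with that degree. */
    static SortedMap<Integer, Integer> distribution(int[] degrees) {
        SortedMap<Integer, Integer> histogram = new TreeMap<>();
        for (int d : degrees) {
            histogram.merge(d, 1, Integer::sum);
        }
        return histogram;
    }

    public static void main(String[] args) {
        // Node 0 is a hub connected to 1, 2 and 3; nodes 1 and 2 are also connected.
        int[][] edges = {{0, 1}, {0, 2}, {0, 3}, {1, 2}};
        System.out.println(distribution(degrees(4, edges))); // {1=1, 2=2, 3=1}
    }
}
```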
One of the main goals of the GraphAware Framework is to simplify and speed up development with Neo4j. Although it is called a “framework” for reasons explained elsewhere, today we will simply treat it as a library of useful, tested, and documented Java code. The feature we will introduce is called the Improved Transaction Event API, which is exactly what it says on the tin.
Motivation
Neo4j requires every mutating operation on the graph to be run in a transaction, which is great because it keeps your data safe. Every operation is atomic, consistent, isolated, and durable. As a bonus, Neo4j gives you the opportunity to react to...
A couple of days ago, I wrote about unit testing with GraphUnit. GraphUnit tested the state of an embedded Neo4j database. What if you run Neo4j in standalone server mode? Fortunately, you can still test it and match subgraphs using the GraphAware Neo4j RestTest library.
Setup
Grab the GraphAware Neo4j Framework and GraphAware Neo4j RestTest jars. Drop them into the plugins directory of your Neo4j installation and restart the server to be able to use the APIs.
How to use it
All you have to do is POST your Cypher to http://your-server-address:7474/graphaware/resttest/assertSameGraph or http://your-server-address:7474/graphaware/resttest/assertSubgraph to verify the state of your graph. Emptying the database before the next test...
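As a sketch of that call from Java, using only the JDK’s `java.net.http` client (Java 11+): the endpoint path is the one given above, but the request body format is an assumption here, a JSON object with a single `cypher` field. Check the RestTest documentation for the exact payload before relying on it. The class name, method names, and host are hypothetical placeholders.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class RestTestRequest {

    /**
     * Builds (but does not send) a POST to the RestTest assertSameGraph endpoint.
     * ASSUMPTION: the body is JSON of the form {"cypher": "..."}.
     */
    static HttpRequest assertSameGraph(String serverAddress, String cypher) {
        URI uri = URI.create("http://" + serverAddress + ":7474/graphaware/resttest/assertSameGraph");
        String body = "{\"cypher\": \"" + escapeJson(cypher) + "\"}";
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Minimal JSON string escaping for backslashes, double quotes and newlines only. */
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }
}
```

To actually send the request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and inspect the status code and body; which status the endpoint returns for a failed assertion is not covered here. The same builder works for assertSubgraph by changing the path.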
Testing the state of an embedded Neo4j database is now much easier if you use GraphUnit, a component of the GraphAware Neo4j Framework. I tried replacing an existing Flavorwocky unit test with GraphUnit to check out the benefits. Let’s walk through a before-and-after case study.
The test
The unit test in question is the one that checks that a Pairing is saved correctly. A pairing must have exactly two ingredients. Each Ingredient node has a name; a Pairing has an affinity and an array of allAffinities (all affinities ever assigned to the pairing).
Before
Pseudocode to test that my code saved a pairing correctly in Neo4j:...
Today, it is exactly one year since Graph Aware Limited was incorporated. It started as a one-man show whilst I was finishing my MSc thesis at Imperial College London. Since then, we’ve been growing slowly but steadily and will be moving to our new London office fairly soon (announcements to come). We have happy clients in London, New York, Copenhagen, Barcelona, Prague, and Accra. I would like to take this opportunity to thank everyone who’s made it possible for us to help people discover the beauty of graphs, run a business, and have a lot of fun, all at...
Recently, we announced the GraphAware Framework. Today, I would like to introduce its first feature, called GraphUnit. GraphUnit is a component that helps Java developers unit test their code that talks to Neo4j and mutates data.
Unit Testing Neo4j Code
When writing Java code that modifies data stored in Neo4j, developers can use the ImpermanentGraphDatabase in conjunction with any of the APIs provided by Neo4j to test that code. This includes the native Java API, the traversal framework, and Cypher. (I’ve excluded the REST API because using that to unit test Java code wouldn’t make much sense.) Let’s say we’re testing code that creates two nodes...
In this short blog post, I would like to introduce the GraphAware Neo4j Framework. Its goal is very ambitious: we’d like to make it as useful for Neo4j developers as the Spring Framework is for Java developers. The Framework aims to speed up development with Neo4j by providing a platform for building useful generic as well as domain-specific functionality, analytical capabilities, graph algorithms, and more.
Features Overview
On a high level, there are two key pieces of functionality: GraphAware Server and GraphAware Runtime. GraphAware Server is a Neo4j server extension that allows developers to build (REST) APIs on top of Neo4j using Spring MVC, rather than...
After a long wait, I finally got the opportunity to publish the recording of my graph/Neo4j talk at WebExpo Prague 2013, intentionally and somewhat misleadingly titled “(Big) Data Science”. Thanks to the organisers for making it available, and see you soon at WebExpo 2014!
(Big) Data Science, from bachmanm
Those who missed the first official Czech Neo4j Meetup can view the recording of the event below (in Czech). Thanks again to all organisers, speakers, and participants. (My slides here)
We are pleased to announce the first official Czech Neo4j Meetup on 11th November 2013 at 6pm at the Czech Technical University in Prague. It is a free event: anyone interested in learning about graph databases, as well as those already using them, is welcome to attend, listen to the talks, and join us for a beer afterwards. The talks will be in Czech. UPDATE: Recording of the event now available.
Time & Place
Monday 11th November 2013, 18:00 - 19:30
ČVUT, Room T9:107
Thákurova 9
160 00 Praha 6
Programme
90% of all the world’s data has been generated in the last two years. Its structure has changed and interconnectedness...
We cordially invite everyone interested in NoSQL, graph databases, and Neo4j to the first official meetup in the Czech Republic, held as part of an informatics evening at the Faculty of Information Technology of ČVUT on 11th November 2013 at 6pm. Admission is free. UPDATE: Recording of the event here.
Time & Place
Monday 11th November 2013, 18:00 - 19:30
ČVUT, lecture hall T9:107
Thákurova 9
160 00 Praha 6
Programme
90% of all the world’s data was created in the last two years. The structure of the data that humanity generates, processes, and stores has also changed considerably, and its interconnectedness has increased. As a result, many alternatives to relational databases have emerged, often collectively referred to as NoSQL. One of the four categories of NoSQL databases is the so-called graph...
In the last post of our “Neo4j Modelling for Beginners” series, we looked at bidirectional relationships. In this post, we compare the implications of qualifying relationships by using different relationship types versus using relationship properties.
Properties as Qualifiers
Let’s say we want to model movie ratings in Neo4j. People have an option to rate a movie with 1 to 5 stars. One way of modelling this, and perhaps the first one that springs to mind, is creating a RATED relationship with a rating property that takes on 5 different values: integers 1 through 5. Writing queries using this model is fairly straightforward in both Java and Cypher....
Transitioning from the relational world to the beautiful world of graphs requires a shift in thinking about data. Although graphs are often much more intuitive than tables, there are certain mistakes people tend to make when modelling their data as a graph for the first time. In this article, we look at one common source of confusion: bidirectional relationships.
Directed Relationships
Relationships in Neo4j must have a type, giving the relationship a semantic meaning, and a direction. Frequently, the direction becomes part of the relationship’s meaning. In other words, the relationship would be ambiguous without it. For example, the following graph shows that the Czech Republic DEFEATED...
I have just finished a year-long MSc program in Computing at Imperial College London. My thesis was called GraphAware: Towards Online Analytical Processing in Graph Databases, which you can freely download. It’s not an easy, cover-to-cover read, but there might be some interesting parts, even if you don’t go through all the (over 100) pages. First of all, the overview of Neo4j architecture in Section 2.4 could be helpful for trying to understand how Neo4j works and, in particular, how it stores its data. Section 5.1 is a good introduction to the GraphAware Framework’s architecture for people who would like to dig deeper into it...
With the kind permission of the WebExpo conference organisers, I am making the recording of my talk on Neo4j publicly available. Enjoy!
WebExpo Prague 2012 - Introduction to Neo4j (Czech), from bachmanm
This year I attended the WebExpo conference for the first time and wrote down a few observations.
Motivation
I went to WebExpo for several reasons. The first impulse came from my friend Honza Šrůtek, who was planning to speak at the conference for the second time and who, after seeing my talk at the Neo4j User Group in London, suggested that I polish it and join the ranks of WebExpo speakers. A quick Google search showed that Neo4j was hardly being talked about in the Czech Republic yet. Over the past year, I had worked with this open-source graph database quite intensively and had become sincerely convinced that it can make life significantly easier for many developers dealing with problems involving densely connected data. Most recently...