In the first part of the series Graphs in Law Enforcement, Data sources and modelling, we discussed graphs in law enforcement investigations, their data sources, data provenance, INTs and how to model sources in graphs. In part 2, Data quality and credibility, we covered source ratings (source reliability & information credibility) and their importance in investigation graphs for law enforcement.
In this third blog post of the series, we focus on corresponding entities and on how to link, group and fuse them.
Same entity, different sources
The sources are modelled, we have the information and the nodes have a source rating. What if different sources provide the same information and are represented by different nodes?
This graph holds the information collected about an armed robbery. Paul is the witness who observed the vehicle, a Toyota, with John in it. Paul also saw the licence plate, and a check of the plate number showed that a Toyota Camry is registered to Richard S.
As the licence plates are the same, it is highly probable that this is the same vehicle. An analyst may assume a link exists between Richard and John. Though there is no path between them in the graph, it should exist as the vehicles are most probably the same.
This case could be handled in multiple ways. The first option is to not take any action and allow the analyst to draw the conclusion that these two vehicles are the same.
The result is a graph cluttered with duplicate nodes and deep subgraphs. With hundreds or thousands of nodes in the graph, it becomes difficult to identify which nodes are the same. It also places a substantial cognitive load on the analyst, and it defeats path finding completely, which is one of the graph’s most powerful advantages.
When you are confident in the hypothesis and ready to capture it, you can create a link between the two nodes in the graph. The link creates a path, so when you search for the armed robbery, John or one of the related nodes, the connection between them is visible in the graph.
It does mean that this path is now one hop longer. Moreover, if there are more vehicles with the same licence plate, e.g. caught on a speed camera, observed somewhere else or maybe even stolen, this link becomes inconvenient and will create significant noise, as a relationship or link can only connect two nodes. If there are different confidence levels for the source ratings on the links, it might become unmanageable.
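The effect of such a link on path finding can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the series: the node labels, the undirected edge list and the shortest_path helper are all assumptions made for the example.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected edge list; returns a node list or None."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical node labels for the armed robbery example.
edges = [
    ("Paul", "Armed robbery"),
    ("Paul", "Vehicle (observed)"),
    ("John", "Vehicle (observed)"),
    ("Vehicle (registry)", "Richard S."),
]

# Without the link, the two vehicle nodes sit in separate subgraphs.
before = shortest_path(edges, "John", "Richard S.")

# The analyst captures the hypothesis as an explicit link between the vehicles.
linked = edges + [("Vehicle (observed)", "Vehicle (registry)")]
after = shortest_path(linked, "John", "Richard S.")
hops = len(after) - 1
```

Here before is None, while after runs through both vehicle nodes in three hops, one more than a single shared vehicle node would need. Because each link joins exactly two nodes, connecting every pair among n sightings of the same plate would take n(n-1)/2 links, which is where the noise comes from.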
Using node representations
When a link between nodes does not fit the case, a node representation is a more suitable solution: a node is created to represent the entity itself, here the vehicle.
All the activities around the vehicle, the Toyota Camry, can get linked to the node created, whenever it is observed again or caught by a speed camera for example. In this graph, many activity nodes can be linked to the vehicle node, but that will create extra steps in the paths of your graph.
A way to reduce these extra steps and the resulting noise in the graph is to group the nodes. You can group several nodes visually, based on a strong identifier. As the graph is connected it is possible to efficiently drill up and down the layers to find and develop conclusions.
As this is a visual layer, a virtual group, there are no real paths connected. However, you can see what the facts are in this group of nodes as the underlying pieces of information.
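Grouping on a strong identifier can be sketched as follows. The node dictionaries, the plate values and the group_by_identifier helper are hypothetical, chosen only to illustrate the idea; the underlying nodes are left untouched, which matches the virtual nature of the group.

```python
from collections import defaultdict

# Hypothetical vehicle nodes; the plate numbers are invented for the example.
nodes = [
    {"id": "v1", "plate": "ABC-123", "make": "Toyota", "source": "witness statement"},
    {"id": "v2", "plate": "ABC-123", "make": "Toyota", "model": "Camry", "source": "vehicle registry"},
    {"id": "v3", "plate": "XYZ-789", "make": "Jeep", "source": "speed camera"},
]

def group_by_identifier(nodes, key):
    """Return {identifier value: [node ids]} for values shared by more than one node."""
    groups = defaultdict(list)
    for node in nodes:
        if key in node:
            groups[node[key]].append(node["id"])
    # A group of one is not a group; keep only identifiers seen on several nodes.
    return {value: ids for value, ids in groups.items() if len(ids) > 1}

groups = group_by_identifier(nodes, "plate")
```

In this sketch groups maps "ABC-123" to the two Toyota nodes. Nothing is written back to the graph, so no new paths exist; the group only tells the visual layer which nodes to collapse and expand together.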
The next step would be to materialise the groups and fuse entities, to bring information together from the entities which are considered the same.
To fuse entities, create a new entity which fuses information from facts determined to be the same. Then, establish the source and confidence in the new fused entity.
In the best case, all properties on all these entities are complementary and there is no difference or conflict between them, so we can create a new, unified node.
The source of the fused entity is human intelligence, as it was the analyst who determined that the original entities are the same. In this example it is rated A3, which indicates the analyst is quite confident.
This node is a brand new entity in the graph with its own source and rating. The original sources are kept as properties on the fused node.
The relationships are transferred from the original nodes to the fused entity. When exploring the fused entity you are able to expand and collapse to see the relationships to the original sources.
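The fusing steps above can be sketched in Python. This assumes the best case described earlier, where the properties are complementary; the data layout, the relationship names and the fuse_entities helper are hypothetical and not taken from any particular graph platform.

```python
def fuse_entities(originals, edges, fused_id, rating):
    """Create a fused node from the original nodes and redirect their relationships to it."""
    original_ids = [node["id"] for node in originals]

    # Best case only: properties are complementary, so the first value seen wins.
    # A real workflow would need to check for conflicts before this step.
    props = {}
    for node in originals:
        for key, value in node["props"].items():
            props.setdefault(key, value)

    fused = {
        "id": fused_id,
        "props": props,
        "source": "HUMINT (analyst assessment)",  # the fused node has its own source
        "rating": rating,                          # and its own source rating
        "original_sources": [node["source"] for node in originals],
        "fused_from": original_ids,                # lets a UI expand back to the originals
    }

    def redirect(node_id):
        return fused_id if node_id in original_ids else node_id

    fused_edges = [(redirect(a), rel, redirect(b)) for a, rel, b in edges]
    return fused, fused_edges

observed = {"id": "vehicle-observed", "source": "witness statement",
            "props": {"plate": "ABC-123", "make": "Toyota"}}
registry = {"id": "vehicle-registry", "source": "vehicle registry",
            "props": {"plate": "ABC-123", "make": "Toyota", "model": "Camry"}}
edges = [
    ("Paul", "OBSERVED", "vehicle-observed"),
    ("John", "SEEN_IN", "vehicle-observed"),
    ("vehicle-registry", "REGISTERED_TO", "Richard S."),
]

fused, fused_edges = fuse_entities([observed, registry], edges, "vehicle-fused", "A3")
```

The original nodes and edges are not modified, so the fused entity can still be expanded to show where each piece of information came from.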
Challenges with fused entities
There are a couple of challenges you are most likely going to face with fusing entities.
Updates of facts: One of the biggest questions is, how do you deal with updates? Are the fused entities automatically updated when the facts are updated?
Materialised vs shell fused entities: There are two types of fused entities. Materialised fused entities have materialised properties, i.e. the properties are copied from the facts onto the fused entity. In this case, it is critical to make sure that updating a fact also triggers an update of the fused entity. Shell fused entities are more common. Here the facts own the properties and the fused entity shows them at run time, so it always shows up-to-date data.
Information and source rating validation: The source rating of a fact, the original node, might change. How does that affect the fused entity? If the source rating shifts from A5 to D2, the source can no longer be considered trustworthy and the analyst’s assessments may have lost their validity.
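The difference between the two types can be sketched with two small classes. The class names, the refresh method and the sample properties are assumptions made for this illustration.

```python
class Fact:
    """An original node that owns its properties."""
    def __init__(self, props):
        self.props = dict(props)

def merge(facts):
    """Combine the properties of all facts; later facts overwrite earlier ones."""
    merged = {}
    for fact in facts:
        merged.update(fact.props)
    return merged

class MaterialisedFused:
    """Copies properties from the facts; must be refreshed when a fact changes."""
    def __init__(self, facts):
        self.facts = facts
        self.refresh()

    def refresh(self):
        self.props = merge(self.facts)

class ShellFused:
    """Holds no properties of its own; reads them from the facts on every access."""
    def __init__(self, facts):
        self.facts = facts

    @property
    def props(self):
        return merge(self.facts)

witness = Fact({"plate": "ABC-123", "make": "Toyota"})
registry = Fact({"plate": "ABC-123", "model": "Camry"})
materialised = MaterialisedFused([witness, registry])
shell = ShellFused([witness, registry])

registry.props["colour"] = "silver"   # a fact is updated after fusing

stale = dict(materialised.props)      # copy taken earlier, lacks the colour
live = shell.props                    # computed now, includes the colour
materialised.refresh()                # the trigger a materialised entity depends on
```

Until refresh is called, the materialised entity keeps showing the old copy, which is why an update workflow is needed for that type.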
Conflicts: What happens if there is a conflict? For example, one of the vehicles in the fused entity wasn’t a Toyota, a mistake was made and the vehicle turned out to be a Jeep. Now they are not the same entity and this example forces you to consider how to model the graph and sources.
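A simple check for conflicting properties, run before fusing or whenever a fact changes, might look like the sketch below. The find_conflicts helper and the property names are hypothetical; deciding what to do with a conflict remains a modelling and workflow decision.

```python
def find_conflicts(candidates):
    """Return {property: sorted distinct values} for properties the candidates disagree on."""
    values = {}
    for props in candidates:
        for key, value in props.items():
            values.setdefault(key, set()).add(value)
    return {key: sorted(seen) for key, seen in values.items() if len(seen) > 1}

# Hypothetical properties of two nodes that share a licence plate.
observed = {"plate": "ABC-123", "make": "Jeep"}
registry = {"plate": "ABC-123", "make": "Toyota", "model": "Camry"}

conflicts = find_conflicts([observed, registry])
```

Here conflicts reports that the make differs between the two nodes, while the complementary model property is not flagged. A non-empty result is a signal to route the fused entity back to an analyst instead of merging silently.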
The main takeaway is to consider your use cases before creating fused entities, and to create workflows that keep the facts and the fused entities in sync.
We will speak more about facts and offloading them in the upcoming blog, the fourth and final post in this Graphs in Law Enforcement series. If you would like to watch a video that covers all these topics, check out the talk Tracking Data Sources of Fused Entities in Law Enforcement Graphs by our VP of Engineering, Luanne Misquitta.