Caring about sentiment: how to get the most from people's feelings


It is often useful to relate a piece of text to the sentiment expressed in it. Extracting and processing sentiment from text provides not only a new, emotional access pattern to your corpus but also new knowledge that can reveal new insights. Suppose you want to build a recommendation engine that leverages reviews to spot detailed strengths and weaknesses of different hotels, such as a good location but bad staff. Likewise, it certainly makes a difference whether an article talks about your organization in a positive or a negative manner.

What’s out there?

Extracting sentiment from text is a difficult task. Many models predict sentiment based on the bag-of-words concept: they inspect words (or potentially bi-grams, tri-grams, …) in isolation from the rest of the sentence and essentially evaluate the prevalence of positive or negative words. These simple approaches sometimes work surprisingly well on simple sentences (such as tweets), but they struggle in more complicated cases. Consider these sentences from the movie reviews dataset [1]:

“We know the plot’s a little crazy, but it held my interest from start to finish.”

“It has the ability to offend and put off everyone, but it holds you with its outrageousness.”

As humans, we are not easily fooled and recognize that both sentences are mildly positive. For a computer, however, this is not a simple task.
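To make the limitation concrete, here is a minimal sketch of a lexicon-based bag-of-words scorer applied to the two sentences above. The word lists are tiny and invented purely for illustration (they are not taken from any real sentiment lexicon), and the scoring rule is the simplest possible one: positive word count minus negative word count.

```python
import re

# Hypothetical mini-lexicons, invented for this illustration only.
POSITIVE = {"good", "great", "fun", "interest"}
NEGATIVE = {"bad", "crazy", "offend", "slow"}


def bag_of_words_score(sentence: str) -> int:
    """Count positive minus negative words, ignoring word order and context."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


first = "We know the plot's a little crazy, but it held my interest from start to finish."
second = "It has the ability to offend and put off everyone, but it holds you with its outrageousness."

for sentence in (first, second):
    print(bag_of_words_score(sentence), sentence)
```

With this toy lexicon the first sentence comes out as neutral and the second as negative, because the scorer has no notion that the clause after “but” carries the overall judgement.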

Different algorithms have been developed for sentiment classification. Some of them are “simple”, such as Naive Bayes, Support Vector Machines, or even a combination of the two [2] (these are often used as benchmarks for other models). More advanced methods use deep neural networks (DNNs), such as derivatives of Recurrent Neural Networks (for example LSTMs or GRUs) [1], which are currently the state-of-the-art models in terms of accuracy and ability to generalise to unseen data. We can even use Convolutional Neural Networks, known primarily from image classification tasks (sliding kernel matrices represent a context window which leverages combinations of words to predict sentiment) [3]. A valuable alternative to deep neural networks could be Deep Forests in combination with TF-IDF vectors or document embeddings [4]: they are deep models with fewer hyperparameters than DNNs, and they do not rely on the backpropagation algorithm.

Sentiment training with Stanford Sentiment Treebank

One of the best available off-the-shelf implementations of sentiment analysis is provided by Stanford CoreNLP. It handles the sentiment classification task by using a recursive deep neural network to build a representation of the complex underlying structure of sentences [1]. The figure below shows a labeled tree representing the second example sentence above:

Example from the Stanford Sentiment Treebank dataset [1]. Blue denotes positive sentiment, red is negative.

The first part of the sentence is negative, perhaps even very negative (the red nodes on the left). However, the tone changes after the conjunction “but”. Even though this second part is not overly positive (it even contains a negative word, “outrageousness”), in context it outweighs all the negative elements in the sentence, and the resulting emotion is positive.

This is the kind of data that is used for training the Stanford Sentiment classification model. GraphAware Hume provides full support for this approach by integrating the Stanford Sentiment training and classification functionalities and making them available as simple Cypher procedures or REST API calls. Users can easily use an existing sentiment model or train their own sentiment classifier.

We tested this approach using the Stanford movie reviews (IMDB) dataset [5], which contains 25k positive reviews, 25k negative reviews and an additional 50k unlabeled reviews for unsupervised learning. To use the default pretrained Stanford Sentiment classifier, all we need to do is enable it in our pipeline and run the annotation as usual:

CALL ga.nlp.processor.addPipeline({
  textProcessor: 'com.graphaware.nlp.enterprise.processor.EnterpriseStanfordTextProcessor',
  name: 'imdb-sentiment',
  processingSteps: {tokenize: true, dependency: true, ner: true, sentiment: true},
  threadNumber: 4
}) YIELD result
RETURN result

CALL apoc.periodic.iterate("MATCH (n:Review) WHERE NOT (n)-[:HAS_ANNOTATED_TEXT]-() and size(n.text) > 10 RETURN n",
"CALL ga.nlp.annotate({text: n.text, id: id(n), pipeline: 'imdb-sentiment', checkLanguage: false}) YIELD result MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)",
{batchSize: 1, iterateList: false, parallel: false})

Sentiment labels are now attached to each processed Sentence node as additional node labels. The label names depend on the trained model; the default Stanford labels are VeryPositive, Positive, Neutral, Negative and VeryNegative. Here are a few example sentences for each class:

Label Sentences
VeryPositive “This film is a great comedy drama.”, “A fun romp…a lot of good twists and turns!”, “You are going to have a great amount of fun!”, “This is high adventure at its best.”
Positive “Addy is an actual Englishman, and he doesn’t have to fake an accent; his accent is genuine.”, “The film, however, with a strong debut for James Caan, remains effective and affecting.”
Neutral “Maybe it has to be earned.”, “I’ve seen the movie before, a different version .”, “No doubt our cavemen friends will follow suit.”
Negative “Of course they did, but back then this wasn’t abnormal.”, “We then see a human birth in all its bloody glory, the daughter of The Butcher.”, “His parents think that if Bartleby doesn’t go to college, he will have no future.”
VeryNegative “Unfortunately, she is usually being assaulted, terrorized, and raped, a very bad thing.”, “Slow, incredibly slow, and flat.”, “worst scene for me was watching Tracy shoot up in an old navy dressing room!!!!”

There are, however, two shortcomings:

  • Complexity of training data: each training sentence needs to be transformed into a complex tree with labels for all tree leaves and forking branches
  • Sentence-based approach: it returns the sentiment of individual sentences, and there is no clear way to get the sentiment of a whole document (be it a short paragraph or a longer article); evaluating the prevalence of positive or negative sentences is too simplistic
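To give a feel for the first shortcoming, the sketch below counts how many labels a single short training sentence needs when it is written in a bracketed treebank style, where every node carries a sentiment class from 0 (very negative) to 4 (very positive). The tree and its labels are a hypothetical annotation written for this illustration, not an entry from the actual dataset.

```python
import re
from typing import List

# Hypothetical annotation of "It was not bad": every leaf and every
# internal node carries its own label (0 = very negative ... 4 = very positive).
TREE = "(3 (2 It) (3 (2 was) (3 (1 not) (1 bad))))"


def node_labels(tree: str) -> List[int]:
    """Return the label of every node, in the order the nodes open."""
    return [int(digit) for digit in re.findall(r"\((\d)", tree)]


def leaf_tokens(tree: str) -> List[str]:
    """Return the words at the leaves of the tree."""
    return re.findall(r"([^\s()]+)\)", tree)


print(len(leaf_tokens(TREE)), "tokens need", len(node_labels(TREE)), "labels")
```

A binary tree over n tokens has 2n - 1 nodes, so even this four-word sentence requires seven separate sentiment judgements from an annotator.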

To illustrate why the inability to extract the sentiment of whole texts is a disadvantage, consider a sentence from the Neutral category in the table above: “No doubt our cavemen friends will follow suit.” Looked at without context, it does appear to be a relatively neutral statement. However, the sentence that precedes it is: “The lead actor in the embarrassing Chimp fiasco actually went into shame-by-association hiding after it was abruptly canceled.” This sheds a completely different light on the supposedly neutral sentence: it is clearly intended as negative, if not very negative.

This observation can be further illustrated by attempting to use the Stanford Sentiment model to classify whole documents. The annotation (ground truth) of the IMDB dataset discussed above labels whole reviews, not individual sentences, marking them as Positive or Negative (no shades of grey like “Very Positive”). Because the default Stanford Sentiment model works at sentence level, it is therefore not straightforward to test it on this dataset and use it as a benchmark for other models.

The simplest way to use Stanford for getting sentiment of whole documents is as follows:

  • ignore neutral sentences
  • use VeryPositive and Positive Stanford Sentiment labels as a generic positive and vice versa for generic negative
  • document is positive if positive sentences prevail over negative ones and vice versa

Comparing this to the ground truth labels, we get 64% accuracy. This poor result is not surprising given the very simplistic generalization from sentence-based sentiment to document-based sentiment. It can, however, serve as a baseline for other, document-based models.
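The three aggregation rules above can be sketched in a few lines of Python. This is only an illustration of the voting logic, not the code used for the experiment; in particular, returning None when positive and negative sentences are tied, and counting such documents as misclassified, are assumptions, since the rules do not say how ties are handled.

```python
from typing import Iterable, Optional, Sequence

POSITIVE_LABELS = {"VeryPositive", "Positive"}
NEGATIVE_LABELS = {"VeryNegative", "Negative"}


def document_sentiment(sentence_labels: Iterable[str]) -> Optional[str]:
    """Majority vote over sentence labels; Neutral sentences are ignored."""
    positives = 0
    negatives = 0
    for label in sentence_labels:
        if label in POSITIVE_LABELS:
            positives += 1
        elif label in NEGATIVE_LABELS:
            negatives += 1
    if positives == negatives:
        return None  # assumption: a tie yields no decision
    return "pos" if positives > negatives else "neg"


def accuracy(predicted: Sequence[Optional[str]], truth: Sequence[str]) -> float:
    """Fraction of documents whose predicted label matches the ground truth."""
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)


# Toy input: sentence labels for three made-up reviews.
reviews = [
    ["Positive", "Neutral", "VeryPositive", "Negative"],
    ["Negative", "VeryNegative", "Positive"],
    ["Neutral", "Positive", "Negative"],
]
predictions = [document_sentiment(labels) for labels in reviews]
print(predictions, accuracy(predictions, ["pos", "neg", "neg"]))
```

The third toy review shows the weakness of the scheme: one positive and one negative sentence cancel out, regardless of which of them actually carries the reviewer's verdict.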

Sentiment training with Paragraph Vectors

To overcome these shortcomings and to give users an alternative approach, we investigated other deep learning algorithms. After careful evaluation of both quality and performance, we decided to introduce into Hume a sentiment analysis based on Paragraph Vectors.

As discussed in our previous blog post, Paragraph Vectors provide document embeddings: vector representations that capture the semantic substance of documents (their topics). Previously, we used these vectors as a high-quality distance measure between texts. They can also be used as document representations in machine learning algorithms, such as logistic regression or deep neural networks trained for sentiment classification. Applying this two-stage approach (training Paragraph Vectors in Hume and using them as input to a classifier in TensorFlow) to the Stanford IMDB dataset, we found that neural networks perform best: an accuracy of 85% was reached with a vector dimension of 800, surpassing the best logistic regression result by 3%.
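The second stage of this approach, a classifier trained on document vectors, can be sketched as follows. This is a plain-Python logistic regression used only to illustrate the idea; it is not the TensorFlow model behind the numbers above, and the two-dimensional vectors are made-up stand-ins for real 800-dimensional Paragraph Vectors.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(
    vectors: Sequence[Sequence[float]],
    labels: Sequence[int],
    learning_rate: float = 0.5,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Fit weights and bias with batch gradient descent on the log loss."""
    dim = len(vectors[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(vectors)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(vectors, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> str:
    """Return 'pos' when the predicted probability is at least 0.5."""
    probability = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return "pos" if probability >= 0.5 else "neg"


# Made-up document vectors: 1 = positive review, 0 = negative review.
TOY_VECTORS = [[1.0, 0.8], [0.9, 1.2], [-1.0, -0.7], [-1.1, -0.9]]
TOY_LABELS = [1, 1, 0, 0]

weights, bias = train_logistic_regression(TOY_VECTORS, TOY_LABELS)
print(predict(weights, bias, [0.8, 0.9]), predict(weights, bias, [-0.9, -1.0]))
```

The classifier never sees the text itself, only the embedding, which is why the quality of the document vectors from the first stage largely determines the achievable accuracy.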

Alternatively, the DeepLearning4j implementation of Paragraph Vectors allows both stages to be run in a single API call. As input, we provide texts along with their class labels (sentiment) to train the model. During inference, we pass a text to the model, which directly returns the most probable class. This approach has several advantages:

  • works per document (which can also be just one sentence), not per sentence, so whole paragraphs and articles can be classified
  • simplified training: the required input is a text and its sentiment label, with no need for a complex labeled sentence-level tree
  • very fast inference

Hume provides support both for training the model and extracting the sentiment. The Cypher query for training Paragraph Vectors together with sentiment on the IMDB dataset (assuming that training Review nodes contain a sentiment property) is:

// First: train the model
query: 'MATCH (s:Review) WHERE s:Training AND length(s.text) > 0 RETURN s.text as text, s.sentiment as label',
classify: true,
learningRate: 0.01,
batchSize: 500,
layerSize: 800,
trainWordVectors: false
}) YIELD result
return result

Once a model is trained, it can be used to classify unseen text.

// Next: classify sentiment (sentiment classes are added as labels to Review nodes)
CALL apoc.periodic.iterate('MATCH (n:Review:Test) where not exists(n.doc2vec_sentiment) and length(n.text) > 0 RETURN n',
"CALL, {labels:['neg', 'pos'], iterations: 10}) YIELD result
SET n.doc2vec_sentiment = result",
{batchSize: 10, parallel: true})

This results in an accuracy of 83%, a 19 percentage point improvement over the 64% obtained when using the sentence-based Stanford Sentiment classification to label whole documents. Moreover, inference is very fast: only 7 milliseconds per movie review. So this method not only addresses the shortcomings of the Stanford-based approach, but also delivers more accurate and faster results to the end user.


This blog post shows how GraphAware Hume supports Sentiment Analysis. As usual, Hume provides various options to the users so they are able to find the best fit for their requirements. This blog post discussed two completely different approaches: one provided off-the-shelf by integrating Stanford CoreNLP, the other one has been implemented using DL4J and its implementation of Paragraph Vectors.

To get the best from either approach, a significant quantity of training data is required to fine-tune the results for the specific domain, since a sentence that is negative in one context can become positive in another. Hume provides pre-trained models for the Paragraph Vectors approach for some of the most common domains and applications.


[1] Richard Socher et al., “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank”, Conference on Empirical Methods in Natural Language Processing (EMNLP 2013)

[2] S. Wang, Ch. D. Manning, “Baselines and Bigrams: Simple, Good Sentiment and Topic Classification”

[3] Y. Kim, “Convolutional Neural Networks for Sentence Classification”, arXiv:1408.5882 [cs.CL]

[4] Zhi-Hua Zhou, Ji Feng, “Deep Forest”, arXiv:1702.08835 [cs.LG]

[5] A. L. Maas et al., “Learning Word Vectors for Sentiment Analysis”, ACL 2011

Dr. Vlasta Kůs

Dr. Alessandro Negro
