Sentiment Lexicon

Sentiment lexicon wrapper and generator.

Installation

pip install sentiment-lexicon

Usage

The module provides a single class, Lexicon, that can be used as a simple wrapper around sentiment lexicon data. The sentiment value of a given word can be accessed via the value instance method.

from sentiment_lexicon import Lexicon

lexicon = Lexicon(words, values)

lexicon.value('good') # => 1

The class can also generate a sentiment lexicon based on positive and negative input documents.

lexicon = Lexicon.from_labelled_text(positive_documents, negative_documents)

More information is available in the documentation.

class sentiment_lexicon.Lexicon(words, values, default=None, normalize=False)

A class wrapping a sentiment lexicon.

Provides a value() method for finding the sentiment value of a word.

Parameters
  • words (List[str]) – List of words.

  • values (List[float]) – List of values.

  • default (Optional[float]) – The value given to words that are not in the lexicon.

  • normalize (Optional[bool]) – Determines if the values are normalized to [-1, 1].

default

The default value.

data

The lexicon data.

range

The range spanned by the sentiment values.

Raises

ValueError – If words and values do not have the same length.

Examples

Creating a lexicon requires one sentiment value per word:

>>> Lexicon(['good'], [1])
<sentiment_lexicon.lexicon.Lexicon object at ...>

Otherwise, a ValueError is raised:

>>> Lexicon(['good'], [])
Traceback (most recent call last):
...
ValueError: words and values must have the same length

Normalizing the sentiment values is done by passing True as the normalize parameter.

>>> lexicon = Lexicon(['good', 'bad', 'maybe'], [10, -8, 2], normalize=True)
>>> lexicon.data
good     1.0
bad     -0.8
maybe    0.2
dtype: float64
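
The output above is consistent with dividing every value by the largest absolute value in the list. The following is a minimal sketch of that scaling in plain Python. The helper name normalize is hypothetical and is not part of the library, and the library's actual normalization method is an assumption here, inferred only from the example output.

```python
def normalize(values):
    """Scale values into [-1, 1] by dividing by the largest absolute value.

    Assumption: this mirrors the example output shown in the documentation;
    it is not taken from the library's source.
    """
    largest = max(abs(v) for v in values)
    if largest == 0:
        # All values are zero, so there is nothing to scale.
        return [0.0 for _ in values]
    return [v / largest for v in values]


print(normalize([10, -8, 2]))  # [1.0, -0.8, 0.2]
```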

value(word, default=None)

Returns the sentiment value of a given word.

Parameters
  • word (str) – The word to find sentiment value for.

  • default (Optional[float]) – The value to return if no value is found for word. Takes precedence over the default attribute of Lexicon.

Raises
  • KeyError – If word is not found in the lexicon and a default is not provided.

Return type

float

Returns

The sentiment value for the word.

Examples

>>> lexicon = Lexicon(['good'], [1])
>>> lexicon.value('good')
1.0
>>> lexicon.value('bad')
Traceback (most recent call last):
...
KeyError: 'bad not present in the lexicon'
>>> lexicon = Lexicon(['good'], [1], default=0)
>>> lexicon.value('bad')
0
>>> lexicon.value('bad', default=-1)
-1
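
The fallback order shown in these examples can be sketched with a plain dictionary. This is an illustration only: the function lookup and its parameter lexicon_default are hypothetical names, and the sketch assumes that None means "no default provided", which the documentation implies but does not state.

```python
def lookup(data, word, lexicon_default=None, default=None):
    """Return the value for word, falling back to a default if one exists.

    Hypothetical helper: a default passed to the call wins over the
    lexicon-wide default; with neither, a KeyError is raised.
    """
    if word in data:
        return data[word]
    # Compare against None so that a default of 0 is still honoured.
    if default is not None:
        return default
    if lexicon_default is not None:
        return lexicon_default
    raise KeyError(f"{word} not present in the lexicon")


data = {"good": 1.0}
print(lookup(data, "good"))                                # 1.0
print(lookup(data, "bad", lexicon_default=0))              # 0
print(lookup(data, "bad", lexicon_default=0, default=-1))  # -1
```

Checking against None rather than relying on truthiness matters here, because 0 is a perfectly reasonable neutral sentiment value.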

static from_labelled_text(positive, negative, min_df=0.0, alpha=0.5, ignore_case=True, **kwargs)

Generate a Lexicon based on positive and negative documents using pointwise mutual information.

Parameters
  • positive (List[str]) – List of positive documents.

  • negative (List[str]) – List of negative documents.

  • min_df (Optional[float]) – The minimum number of documents that a word must occur in to be added to the lexicon.

  • alpha (Optional[float]) – Determines how much the PMI values of the two classes affect each other when computing the sentiment values.

  • ignore_case (Optional[bool]) – Determines if the case of the words in the documents is ignored. If True, the words good and Good will be treated as the same word.

  • kwargs – Parameters passed to the Lexicon constructor.
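
To give a feel for how pointwise mutual information can turn labelled documents into sentiment values, here is a rough sketch in plain Python. It is not the library's implementation: the function pmi_scores, its smoothing parameter, and the choice to score each word as PMI(word, positive) minus PMI(word, negative) are all assumptions made for illustration, and the alpha and min_df parameters are left out because their exact roles are not specified here.

```python
import math
from collections import Counter


def pmi_scores(positive, negative, smoothing=1.0):
    """Score words as PMI(word, positive) - PMI(word, negative).

    Hypothetical sketch: uses document frequencies, lowercases the text
    (similar in spirit to ignore_case=True), and applies additive
    smoothing so words seen in only one class do not produce log(0).
    """
    if not positive or not negative:
        raise ValueError("there must be at least one positive and one negative document")

    # Count the number of documents in each class that contain each word.
    pos_counts = Counter(w for doc in positive for w in set(doc.lower().split()))
    neg_counts = Counter(w for doc in negative for w in set(doc.lower().split()))

    scores = {}
    for word in set(pos_counts) | set(neg_counts):
        # Smoothed estimates of P(word | class).
        p_word_pos = (pos_counts[word] + smoothing) / (len(positive) + 2 * smoothing)
        p_word_neg = (neg_counts[word] + smoothing) / (len(negative) + 2 * smoothing)
        # PMI(w, c) = log2(P(w | c) / P(w)); P(w) cancels in the difference.
        scores[word] = math.log2(p_word_pos / p_word_neg)
    return scores


scores = pmi_scores(["This is good"], ["This is bad"])
print(sorted(scores))  # ['bad', 'good', 'is', 'this']
```

In this sketch, words that occur only in positive documents get a positive score, words that occur only in negative documents get a negative score, and words that are equally common in both classes score zero.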

Examples

>>> Lexicon.from_labelled_text(['This is good'], ['This is bad'])
<sentiment_lexicon.lexicon.Lexicon object at ...>
>>> Lexicon.from_labelled_text(['This is good'], [])
Traceback (most recent call last):
...
ValueError: there must be at least one positive and one negative document

Return type

Lexicon