Aspect-based sentiment analysis

Aspect-based vs. document-level sentiment analysis

Anacode’s Web&Text API offers two different calls for sentiment analysis:

  • sentiment: general, document-level sentiment analysis
  • absa (aspect-based sentiment analysis): detailed sentiment analysis for different entities mentioned in the text

Put simply, the general sentiment call tells you whether a text is positive or negative, while the absa call explains why the text has this polarity, thus providing valuable additional detail. Let’s consider the following simple example:

  • 宝马汽车很好看。 BMW cars look very nice.

The output of the sentiment call is as follows:

[[{"sentiment_value": 0.581642059125058}]]

Thus, we know that the text has a positive polarity. However, we don’t know what the text is actually about and, in particular, which entities and aspects in the text receive this positive evaluation.

The output of the absa call is as follows:

[{"normalized_text": "宝马的汽车很好看。",
  "entities": [{"semantics": [{"type": "brand", "value": "BMW"},
                              {"type": "product", "value": "Car"}],
                "surface": {"span": [0, 5],
                         "surface_string": "宝马的汽车"}}],
  "evaluations": [{"semantics": {"entity": [{"type": "feature_subjective",
                                             "value": "VisualAppearance"}],
                                 "sentiment_value": 0.875},
                   "surface": {"span": [5, 8], "surface_string": "很好看"}}],
  "relations": [{"semantics": {"entity": [{"type": "brand",
                                           "value": "BMW"},
                                          {"type": "product",
                                           "value": "Car"},
                                          {"type": "feature_subjective",
                                           "value": "VisualAppearance"}],
                               "opinion_holder": null,
                               "restriction": null,
                               "sentiment_value": 0.875},
                 "surface": {"span": [0, 8], "surface_string": "宝马的汽车很好看"}}]}]

This output provides much more detailed information: specifically, we learn that the text mentions the visual appearance of BMW cars and evaluates it positively.

Thus, whereas both the sentiment and the absa call tell us that the text is positive, only absa provides the additional information that explains why it carries this polarity.

Technologically, document-level sentiment analysis is mainly based on a supervised statistical model. By contrast, the implementation of aspect-based sentiment analysis incorporates several types of linguistic knowledge, specifically domain taxonomies, lexica of common sentiment expressions, grammars and rules of semantic compositionality. Some examples of how this linguistic knowledge is applied are:

  • the distinction of various types of product features, e.g. 座椅 Seats: component, 尺寸 Size: quantitative feature etc.
  • the interpretation of modifiers in adjectival phrases, e.g. 好 good: 0.5 vs. 很好 very good: 0.875
  • the distinction between coordination and subordination in nominal phrases, e.g. 座椅和外观 the seats AND the visual appearance vs. 座椅的外观 the visual appearance OF the seats

To get you started with fine-grained sentiment analysis, we provide an Absa: basic tutorial which leads you through common aggregation and analysis operations, including finding frequent entities, calculating entity sentiment and finding entities and features that frequently co-occur. Further, please refer to Aspect-based sentiment analysis for a detailed description of the conceptual framework behind our aspect-based sentiment analysis.
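
If you want to try both analyses on your own data, the following Python sketch shows one way to call the API. It assumes the request format from the HTTP example in the section on external entities below (endpoint /analyze/ on api.anacode.de, token authentication) and the requests library, so treat it as an illustration rather than an official client.

import requests

API_URL = "https://api.anacode.de/analyze/"  # endpoint and host taken from the HTTP example below
TOKEN = "<token>"                            # your personal API token

def analyze(texts):
    """Request document-level sentiment and aspect-based sentiment in a single call."""
    payload = {
        "texts": texts,                     # list of input texts
        "analysis": ["sentiment", "absa"],  # request both analysis types
    }
    headers = {
        "Authorization": "Token {}".format(TOKEN),
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    response = requests.post(API_URL, json=payload, headers=headers)
    response.raise_for_status()
    return response.json()

result = analyze(["宝马汽车很好看。"])
# The example response further below shows results keyed by analysis type ("absa");
# we assume "sentiment" is returned in the same way.
print(result.get("sentiment"))
print(result.get("absa"))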

Top-level structure

At the top level, the output of the absa call has three list-valued fields: “entities”, “evaluations” and “relations”. Each list element corresponds to one object and has two fields:

  • “surface”: a description of how the object appears in the input text; each “surface” object has an attribute “surface_string” containing the corresponding surface string from the text, as well as an attribute “span” giving the position of that string as a pair of character offsets: the index of its first character and the index immediately after its last character

    For example:

    {
        "surface": {
            "surface_string": "宝马汽车座椅的外观",
            "span": [0, 9]
        },
        "..."
    }
    
  • “semantics”: the semantic representation of the object. In the following, we provide a detailed description of the structure and content of the “semantics” field for each of the three object types.
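
As a small illustration of this structure, the sketch below walks over one document of the absa output and prints each object’s surface string together with its span; it relies only on the three list-valued fields and the “surface” attribute described above.

def print_surfaces(doc):
    """Print the surface string and span of every object in one absa document."""
    for field in ("entities", "evaluations", "relations"):
        for obj in doc.get(field, []):
            surface = obj["surface"]
            start, end = surface["span"]  # start index and index after the last character
            print(field, surface["surface_string"], (start, end))

# doc = absa_output[0]   # the absa output contains one document per input text
# print_surfaces(doc)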

Entities

Entities are things that can be evaluated, such as brands, products and product features; in natural language, they are mostly represented by nominal phrases. Entities can be composed of multiple subentities, with a modifying relationship between them. Each subentity has two attributes:

  • “type” (the semantic class of the entity, e.g. “brand”, “product” etc.)
  • “value” (specific concept, e.g. “BMW”, “Truck” etc.)

Real-world examples of composed entities and their corresponding representations in our framework are:

  • 中国的汽车 Chinese cars

    [{"type": "location", "value": "China"},
     {"type": "product", "value": "Car"}]
    
  • 宝马汽车 BMW cars

    [{"type": "brand", "value": "BMW"},
     {"type": "product", "value": "Automobile"}]
    
  • 宝马汽车的座椅 the seats of BMW cars

    [{"type": "brand", "value": "BMW"},
     {"type": "product", "value": "Car"},
     {"type": "feature_component", "value": "Seats"}]
    
  • 宝马汽车座椅的外观 the visual appearance of the seats of BMW cars

    [{"type": "brand", "value": "BMW"},
     {"type": "product", "value": "Car"},
     {"type": "feature_component", "value": "Seats"},
     {"type": "feature_subjective", "value": "VisualAppearance"}]
    

The ordering of the elements in the list corresponds to their natural modification ordering, with each element being connected by a logical “part-of” relation to the previous element.
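
Because the subentities are ordered along this modification chain, a readable label for a composed entity can be built simply by joining the “value” fields in order. A minimal sketch (the separator is an arbitrary choice):

def entity_label(semantics):
    """Join the ordered subentity values, e.g. 'BMW>Car>Seats>VisualAppearance'."""
    return ">".join(part["value"] for part in semantics)

print(entity_label([{"type": "brand", "value": "BMW"},
                    {"type": "product", "value": "Car"},
                    {"type": "feature_component", "value": "Seats"},
                    {"type": "feature_subjective", "value": "VisualAppearance"}]))
# BMW>Car>Seats>VisualAppearance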

Types used in entities

Concept type     Description
location         Geographical locations, e.g. China, Moscow, Australia
brand            Brands of goods and services, e.g. Apple Inc., TUI, Cartier
product          Physical products, e.g. Automobile, Mobile Phone
service          Non-material services, e.g. MBA, Concert, Surgery
product_model    Product models of specific brands, e.g. Golf, iPhone

Special attention should be paid to product features. Our framework distinguishes several types of features; in order to facilitate aggregate calculations, all feature types start with the prefix feature_, followed by a more detailed specification. Currently, we distinguish the following feature types:

Feature type          Description
feature_component     Components of the product, e.g. Engine, Seats, DoorHandle
feature_quantitative  Physical features that can be quantified on an objective scale, e.g. Power, EngineDisplacement, Size
feature_subjective    Subjective perceptions, e.g. Flexibility, Comfort, Precision
feature_action        Actions and usages of the product, e.g. Braking, Accelerating, Driving
feature_other         Other features
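
The common feature_ prefix makes it easy to separate product features from other concept types when aggregating over many entities. A small sketch that works directly on the subentity lists shown above:

from collections import Counter

def count_feature_types(entity_semantics_lists):
    """Count feature types across entities, using only the 'feature_' prefix convention."""
    counts = Counter()
    for semantics in entity_semantics_lists:
        for part in semantics:
            if part["type"].startswith("feature_"):
                counts[part["type"]] += 1
    return counts

entities = [[{"type": "brand", "value": "BMW"},
             {"type": "product", "value": "Car"},
             {"type": "feature_component", "value": "Seats"}],
            [{"type": "product", "value": "Car"},
             {"type": "feature_subjective", "value": "Comfort"}]]
print(count_feature_types(entities))
# Counter({'feature_component': 1, 'feature_subjective': 1})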

Evaluations

Evaluations show how positive or negative a statement is; they are mostly derived from adjectival phrases. Evaluations have two fields: an obligatory “sentiment_value” and an optional “entity”. The sentiment value is a float on a scale from -1 to 1, where -1 signals extremely negative and +1 extremely positive polarity. For example:

  • good: “value”: 0.5
  • bad: “value”: -0.5

The base value of an evaluation word is modified when the word is combined with degree adverbs or a negator, as in the examples and the sketch below:

  • 特别好 particularly good: “value”: 0.875
  • 不好 not good: “value”: -0.5
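
The exact modifier arithmetic is internal to the API, but the documented values are consistent with a simple compositional scheme in which a degree adverb pushes the base value towards the nearest pole and a negator flips the sign. The sketch below is purely illustrative: the modifier strengths are assumptions chosen to reproduce the examples above, not parameters of the actual implementation.

# Illustrative only: the strengths are assumptions tuned to match the documented examples.
DEGREE_STRENGTH = {"很": 0.75, "特别": 0.75, "比较": 0.375}

def compose(base, degree=None, negated=False):
    """Push the base value towards +/-1 by the degree strength, then flip the sign on negation."""
    value = base
    if degree is not None:
        sign = 1 if value >= 0 else -1
        value = value + (1 - abs(value)) * DEGREE_STRENGTH[degree] * sign
    if negated:
        value = -value
    return value

print(compose(0.5, degree="很"))      # 0.875  (很好 very good)
print(compose(0.5, negated=True))     # -0.5   (不好 not good)
print(compose(0.5, degree="比较"))    # 0.6875 (比较灵活 relatively flexible, base 0.5 assumed)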

Optionally, evaluations can also contain a simple entity. This happens if the adjective in the evaluation carries an ‘implicit’ feature, as is the case for 贵 expensive (-> Price), 灵活 flexible (-> Flexibility), 小 small (-> Size) etc. The implicit feature then appears in the entity field:

  • 难看 ugly

    {"sentiment_value": -0.875,
     "entity": [{"value": "VisualAppearance",
                 "type": "feature_subjective"}]}
    
  • 比较灵活 relatively flexible

    {"sentiment_value": 0.6875,
     "entity": [{"value": "Flexibility",
                 "type": "feature_subjective"}]}
    

Relations

Relations are established between an evaluation and an entity if there is syntactic evidence that the evaluation is predicated of the entity, e.g.:

  • 操控很灵活 The control is very flexible. Here, the adjectival phrase very flexible evaluates the entity the control.

The semantics of a relation is represented by a dictionary with 4 fields:

  • “entity” (evaluated entity)
  • “sentiment_value” (polarity of evaluation)
  • “restriction” (optional; contextual restriction under which the evaluation applies)
  • “opinion_holder” (optional; if this field is null, the opinion holder defaults to the author of the text)

Examples of sentences with relations are:

  • 舒适度很好。The comfort is very good.

    {"entity": [{"type": "feature_subjective",
                 "value": "Comfort"}],
     "opinion_holder": null,
     "restriction": null,
     "sentiment_value": 0.875}
    
  • 加速时操控比较灵活。When accelerating, the control is relatively flexible.

    {"entity": [{"type": "feature_subjective",
                 "value": "Control"},
                {"type": "feature_subjective",
                 "value": "Flexibility"}],
     "opinion_holder": null,
     "restriction": "Accelerating",
     "sentiment_value": 0.6875}
    
  • 我爸爸很喜欢它的动力。 My father likes its power a lot.

    {"entity": [{"type": "feature_quantitative",
                 "value": "Power"},
                {"type": "emotion", "value": "Joy"}],
     "opinion_holder": "Father",
     "restriction": null,
     "sentiment_value": 0.875}
    

To a large extent, relations are composed of entities and evaluations. Two points should be observed:

  • If the evaluation contains an ‘implicit’ feature, this feature is ‘moved’ into the entity of the relation. For example:

    加速时操控比较灵活。When accelerating, the control is relatively flexible.

    {"entity": [{"type": "feature_subjective",
                 "value": "Control"},
                {"type": "feature_subjective",
                 "value": "Flexibility"}],
     "opinion_holder": null,
     "restriction": "Accelerating",
     "sentiment_value": 0.6875}
    

Here, the evaluation phrase 比较灵活 relatively flexible contains the implicit feature Flexibility, which is moved into the entity of the relation.

  • An additional type, “emotion”, can occur inside the entity of a relation. It is contributed by emotion adjectives or verbs that target the entity in question. For example:

    我特别讨厌它的外观。 I really don’t like its visual appearance.

    {"entity": [{"type": "feature_subjective",
                 "value": "VisualAppearance"},
                {"type": "emotion", "value": "Dislike"}],
     "opinion_holder": "AuthorReference",
     "restriction": null,
     "sentiment_value": -1.0}
    

Adding external entities

In some cases, we already know the subject of an evaluation from the metadata. For example, some data sources, especially product review data, already structure the review text by specific product features and aspects. The following shows an example review from pcauto:

Example of a product review with semi-structured text data

The review text provided by the user often does not repeat this entity information. However, the entity information from the metadata can still be integrated into the absa output via the input parameter external_entity_data, which has the following structure:

  • External entities are provided as a list of entities. Each entity is a list of dict items with keys “type” and “value”, following the standard entity format (cf. Entities).
  • “type” must correspond to one of the entity types (cf. Types used in entities for list of allowed types).
  • external_entity_data should have the same length as the texts argument; thus, each text has a corresponding element in external_entity_data. null values are allowed.
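
As a sketch, a request body for several texts, only some of which come with metadata, could be assembled as follows (the texts and the aspect are just examples; Python’s None maps to JSON null):

import json

texts = ["很好看", "今天天气不错"]  # the second text has no aspect known from metadata
external_entities = [
    [{"type": "feature_subjective", "value": "VisualAppearance"}],  # aspect taken from metadata
    None,                                                           # no external entity for this text
]

payload = {
    "texts": texts,
    "analysis": ["absa"],
    "absa": {"external_entity_data": external_entities},  # same length as "texts"
}
print(json.dumps(payload, ensure_ascii=False, indent=2))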

By specifying external entities, we ensure that all evaluations which do not occur inside a relation with an explicit entity become part of a new relation with the external entity. The following example shows a call with an external entity:

POST /analyze/ HTTP/1.1
Host: api.anacode.de
Authorization: Token <token>
Accept: application/json
Content-Type: application/json

{
  "texts": ["很好看"],
  "analysis": ["absa"],
  "absa": {
    "external_entity_data": [[{"value": "VisualAppearance",
                               "type": "feature_subjective"}]]
  }
}

The response is as follows:

HTTP/1.1 200 OK
Vary: Accept
Content-Type: application/json

  {
    "absa": [
      [{"entities": [],
        "evaluations": [
          {"semantics": {"entity": [{"type": "feature_subjective",
                                     "value": "VisualAppearance"}],
                         "value": 0.875},
           "surface": {"span": [0, 3], "surface_string": "很好看"}}],
        "normalized_text": "很好看",
        "relations": [
          {"external_entity": true,
           "semantics": {"entity": [{"type": "feature_subjective",
                                     "value": "VisualAppearance"}],
                         "opinion_holder": null,
                         "restriction": null,
                         "value": 0.875},
           "surface": {"span": [0, 3], "surface_string": "很好看"}}]
      }]
    ]
  }

We see that the external entity VisualAppearance is combined with the evaluation and forms a new relation, which is marked with "external_entity": true.
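
This flag makes it easy to separate relations that were built from external entities from relations found directly in the text, for example:

def split_relations(doc):
    """Split the relations of one absa document by the 'external_entity' flag."""
    external = [r for r in doc["relations"] if r.get("external_entity")]
    textual = [r for r in doc["relations"] if not r.get("external_entity")]
    return external, textual

# external, textual = split_relations(response["absa"][0][0])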