API Reference

All REST calls accept and return JSON content and must be made over HTTPS (not HTTP). The maximum size of a request is 1 MB. Concurrent requests are not throttled, but there is a hard limit of 4 requests per second.
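These limits can also be enforced on the client side before a request is sent. The following Python sketch shows one way to do that with the standard library only. The names RateLimiter, prepare_request, API_BASE and MAX_BODY_BYTES are hypothetical, and treating 1 MB as 1,048,576 bytes of UTF-8 encoded JSON is an assumption that this reference does not spell out.

```python
import json
import threading
import time
from urllib.parse import urlparse
from urllib.request import Request

# Hypothetical constants for this sketch.
API_BASE = "https://api.anacode.de"
MAX_BODY_BYTES = 1024 * 1024  # assumes "1 MB" means 1,048,576 bytes


class RateLimiter:
    """Spaces calls at least 1/rate seconds apart (4 per second by default)."""

    def __init__(self, rate=4, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock          # injectable so the limiter can be tested
        self.sleep = sleep
        self.next_allowed = 0.0
        self.lock = threading.Lock()

    def wait(self):
        # Block until the next call slot, then reserve the slot after it.
        with self.lock:
            now = self.clock()
            if now < self.next_allowed:
                self.sleep(self.next_allowed - now)
                now = self.next_allowed
            self.next_allowed = now + self.interval


def prepare_request(path, payload, token, base=API_BASE):
    """Build an authenticated JSON POST request, enforcing HTTPS and the size limit."""
    url = base.rstrip("/") + path
    if urlparse(url).scheme != "https":
        raise ValueError("the API must be called over HTTPS, not HTTP")
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(f"request body is {len(body)} bytes; the limit is 1 MB")
    headers = {
        "Authorization": f"Token {token}",
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")
```

In a client, limiter.wait() would be called before each prepared request is passed to urllib.request.urlopen. Spacing requests at least 0.25 seconds apart is a conservative reading of the 4 requests per second limit.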

The following sections describe the calls and analyses that can be performed with the API.

Web scraping

POST /scrape/

Scrapes an HTML page and returns its text content as well as meaningful metadata (e.g. date, author, etc.). It also returns all hyperlink URLs found on the page, which can be used for continuous data collection (cf. the tutorial Scrape: continuous data collection).

Note: Currently, the scraping functionality is suitable for pages with a single main text block (articles, etc.). Scraping functionality for table-like pages (product reviews, forums, etc.) will become available in the near future.

Request JSON Object:
 
  • url (string) – URL of the page to be scraped
Response JSON Object:
 
  • title (string) – Scraped title or null if not found
  • date (string) – Scraped date in “YYYY-MM-DD” format or null if not found
  • author (string) – Scraped author or null if not found
  • source (string) – Scraped article source or null if not found
  • article (boolean) – true if this page is considered an “article” page, false otherwise
Response JSON Array of Objects:
 
  • content (array<string>) – Scraped text content, as a list of strings
  • hrefs (array<string>) – List of hyperlink URLs found on the page

Example request:

POST /scrape/ HTTP/1.1
Host: api.anacode.de
Authorization: Token <token>
Accept: application/json
Content-Type: application/json

{
  "url": "http://auto.sohu.com/20160909/n468022408.shtml"
}

Example response:

HTTP/1.1 200 OK
Vary: Accept
Content-Type: application/json

{
   "title": "SUV持续火热 有望在乘用车市场占半壁江山",
   "date": "2016-09-09",
   "author": "李溯婉",
   "source": null,
   "article": true,
   "content": [
     "无论是自主品牌还是合资品牌,都忍不住对SUV愈发“宠爱”。",
     "在正在举行的成都车展上,东风汽车集团旗下无论是自主品牌还是合资品牌,
     皆让SUV当主角。东风柳州汽车有限公司(下称“东风柳汽”)自主品牌东风风
     行派出三款重磅新车前来助阵,包括风行SX6、新景逸X5以及菱智F500,
     其中前两款是SUV车型,菱智F500则是一款MPV。...",
     "..."
   ],
   "hrefs": [
     "http://auto.sohu.com/rdxc/",
     "http://mt.sohu.com/20160628/n456663420.shtml",
     "http://shijiazhuang.auto.sohu.com/",
     "http://2sc.sohu.com/auto-changcheng/",
     "http://stock.sohu.com/",
     "..."
   ]
}
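Because every /scrape/ response lists the page's hyperlinks in hrefs, continuous collection can be organized as a breadth-first crawl. The Python sketch below assumes a hypothetical scrape callable that performs POST /scrape/ for one URL and returns the decoded JSON response as a dict; crawl, max_pages and same_host are illustrative names, not part of the API.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(seed_url, scrape, max_pages=100, same_host=True):
    """Breadth-first collection driven by the hrefs field of /scrape/ results.

    `scrape` is a hypothetical callable: URL in, decoded /scrape/ response out.
    Returns a dict mapping URL -> response for pages flagged as articles.
    """
    host = urlparse(seed_url).netloc
    queue = deque([seed_url])
    seen = {seed_url}       # URLs already queued, so no page is scraped twice
    articles = {}
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        result = scrape(url)
        if result.get("article"):
            articles[url] = result
        for href in result.get("hrefs") or []:
            if href in seen:
                continue
            if same_host and urlparse(href).netloc != host:
                continue    # stay on the seed's host
            seen.add(href)
            queue.append(href)
    return articles
```

With same_host enabled, subdomains such as shijiazhuang.auto.sohu.com count as different hosts from auto.sohu.com. The scrape callable remains responsible for staying within the 4 requests per second limit and for handling pages that cannot be downloaded.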

Analyze call

POST /analyze/

The analyze call lets you analyze a list of Chinese texts using the NLP functionality of the Web&Text API, i.e. categorization, concept extraction, sentiment analysis and aspect-based sentiment analysis. You can perform multiple analyses in one call, saving time by sending your data only once. More detailed descriptions of the available analyses can be found in the Linguistic analyses section.

The analyze call takes two required arguments:

Request JSON Object:
 
  • texts (array<string>) – Chinese texts you wish to analyze
  • analyses (array<string>) – Set of analyses to perform on the input texts. For best performance, select only those you need. Possible values are “concepts”, “categories”, “sentiment” and “absa”
Response JSON Array of Objects:
 
  • categories (array<object>) – Only present if categories was specified in the analyses argument. Cf. Text Categorization (categories) for more information about the output.
  • concepts (array<object>) – Only present if concepts was specified in the analyses argument. Cf. Concept Extraction (concepts) for more information about the output.
  • sentiment (array<object>) – Only present if sentiment was specified in the analyses argument. Cf. Sentiment Analysis (sentiment) for more information about the output.
  • absa (array<object>) – Only present if absa was specified in the analyses argument. Cf. Aspect-based Sentiment Analysis (BETA; absa) for more information about the output.
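A client can validate the two required arguments before sending the request. This Python sketch builds the JSON body for POST /analyze/; build_analyze_payload and VALID_ANALYSES are hypothetical names, and rejecting an empty analyses list is this sketch's own choice rather than documented API behavior.

```python
import json

# The four analysis names listed above.
VALID_ANALYSES = {"concepts", "categories", "sentiment", "absa"}


def build_analyze_payload(texts, analyses):
    """Return the JSON body for POST /analyze/ after checking both arguments."""
    if isinstance(texts, str):
        raise TypeError("texts must be a list of strings, not a single string")
    texts = list(texts)
    if not all(isinstance(t, str) for t in texts):
        raise TypeError("every element of texts must be a string")
    unknown = set(analyses) - VALID_ANALYSES
    if unknown:
        raise ValueError(f"unknown analyses: {sorted(unknown)}")
    # Drop duplicates but keep the caller's order.
    analyses = list(dict.fromkeys(analyses))
    if not analyses:
        raise ValueError("select at least one analysis")
    return json.dumps({"texts": texts, "analyses": analyses}, ensure_ascii=False)
```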

Example request:

POST /analyze/ HTTP/1.1
Host: api.anacode.de
Authorization: Token <token>
Accept: application/json
Content-Type: application/json

{
  "texts": ["宝马的汽车很好看。"],
  "analyses": ["concepts", "sentiment"]
}

Example response:

HTTP/1.1 200 OK
Vary: Accept
Content-Type: application/json

{"concepts": [[
   {"concept": "VisualAppearance",
    "freq": 1,
    "relevance_score": 0.925397272754923,
    "surface": [{"span": [6, 8],
                 "surface_string": "好看"}],
    "type": "feature"},
   {"concept": "Automobile",
    "freq": 1,
    "relevance_score": 0.37899853242163195,
    "surface": [{"span": [3, 5],
                 "surface_string": "汽车"}],
    "type": "product"},
   {"concept": "BayerischeMotorenWerke",
    "freq": 1,
    "relevance_score": null,
    "surface": [{"span": [0, 2],
                 "surface_string": "宝马"}],
    "type": "brand"}]],
 "sentiment": [
   {"sentiment_value": 0.20766649639881007}]}
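In the example response, each analysis key holds one entry per input text, in the order of the texts array. Assuming this ordering holds in general, and that single_document is not set (it collapses categories and sentiment into a single result), the results can be regrouped per text as in this Python sketch; results_per_text is a hypothetical helper.

```python
def results_per_text(texts, response):
    """Pair each input text with its own entry from every analysis in the response."""
    per_text = [{"text": t} for t in texts]
    for analysis in ("categories", "concepts", "sentiment", "absa"):
        values = response.get(analysis)
        if values is None:
            continue  # this analysis was not requested
        if len(values) != len(texts):
            raise ValueError(f"{analysis}: expected {len(texts)} results, got {len(values)}")
        for entry, value in zip(per_text, values):
            entry[analysis] = value
    return per_text
```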

Linguistic analyses

In the following, we describe the output structure of the four linguistic analyses (categories, concepts, sentiment and absa) that can be performed with the analyze call. Please refer to the Linguistic and conceptual framework section for conceptual details about the outputs.

Text Categorization (categories)

Returns the dominant thematic categories of a text. For each input text, the result of the categories analysis is a list of probabilities for the categories specified in the taxonomies documentation. Each list element is an object consisting of the category label and its probability in the [0, 1] interval. The probabilities for a text sum to 1.

Example request:

POST /analyze/ HTTP/1.1
Host: api.anacode.de
Authorization: Token <token>
Accept: application/json
Content-Type: application/json

{
  "texts": ["宝马汽车"],
  "analyses": ["categories"]
}

Example response:

HTTP/1.1 200 OK
Vary: Accept
Content-Type: application/json

{
  "categories": [
    [{"label": "auto", "probability": 0.8993354544297932},
     {"label": "hr", "probability": 0.04230845740698282},
     {"label": "law", "probability": 0.01438615376297817},
     {"label": "education", "probability": 0.007647668964158772},
     "..." ]
  ]
}
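Since each text's categories form a probability distribution, a client may want to rank them and keep only the most likely labels. The helpers top_categories and is_distribution in this Python sketch are hypothetical, and the tolerance used to check that the probabilities sum to 1 is an assumption that allows for floating-point rounding.

```python
def top_categories(categories, k=3):
    """Return the k most probable (label, probability) pairs for one text."""
    ranked = sorted(categories, key=lambda c: c["probability"], reverse=True)
    return [(c["label"], c["probability"]) for c in ranked[:k]]


def is_distribution(categories, tol=1e-6):
    """Check that all probabilities lie in [0, 1] and sum to 1 within tol."""
    probs = [c["probability"] for c in categories]
    return all(0.0 <= p <= 1.0 for p in probs) and abs(sum(probs) - 1.0) <= tol
```

For a full response, top_categories would be applied to each element of response["categories"], i.e. once per input text.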

Concept Extraction (concepts)

Extracts concepts, such as products, companies, people, locations, etc., from the provided texts. You can find a specification of the concept types used in the Concept types section. For each input text, the result of the concepts analysis is a list of the extracted concepts. Each concept is described by a JSON object with the following properties:

  • concept (string) - English name of the concept
  • freq (int) - frequency of occurrences of this concept in the text
  • type (string) - concept type (from Concept types)
  • relevance_score (float or null) - relative relevance of the concept in this text, based on its TF-IDF score
  • surface (list) - a list of objects describing the surface appearances of the concept; each object has two fields: surface_string (original surface form in the text) and span (start and end index of this surface form in the text)

Example request:

POST /analyze/ HTTP/1.1
Host: api.anacode.de
Authorization: Token <token>
Accept: application/json
Content-Type: application/json

{
  "texts": ["我们在中关村买了三星的平板电脑。"],
  "analyses": ["concepts"]
}

Example response:

HTTP/1.1 200 OK
Vary: Accept
Content-Type: application/json

{
  "concepts": [
    [{"concept": "Zhongguancun",
      "freq": 1,
      "relevance_score": 0.5937342292839902,
      "type": "location",
      "surface": [{"surface_string": "中关村", "span": [3, 6]}]},
     {"concept": "Tablet",
      "freq": 1,
      "relevance_score": 0.47934067781670464,
      "type": "product_type",
      "surface": [{"surface_string": "平板电脑", "span": [11, 15]}]},
     {"concept": "Samsung",
      "freq": 1,
      "relevance_score": 0.49896184625200823,
      "type": "brand",
      "surface": [{"surface_string": "三星", "span": [8, 10]}]}]
   ]
}
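In the example above, span values behave as zero-based character offsets with an exclusive end: characters 3 to 6 of the input are 中关村 and characters 11 to 15 are 平板电脑. Assuming this convention holds generally (this reference does not state it explicitly), Python string slicing recovers surface strings directly, because Python indexes str objects by Unicode code point. The helpers surface_strings and spans_consistent are hypothetical.

```python
def surface_strings(text, concept):
    """Recover each surface string of a concept from its [start, end) spans."""
    return [text[s["span"][0]:s["span"][1]] for s in concept["surface"]]


def spans_consistent(text, concepts):
    """True if every span slices out exactly the reported surface_string."""
    return all(
        text[s["span"][0]:s["span"][1]] == s["surface_string"]
        for c in concepts
        for s in c["surface"]
    )
```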

Please refer to the tutorial Concepts: cooking recipe analysis to see an example usage of concept extraction.

Sentiment Analysis (sentiment)

Extracts the overall sentiment polarity (positive or negative) of a text. For each input text, the result is a JSON object with a single field, sentiment_value:

  • sentiment_value (number) - Number between -1 and 1, where -1 is the most negative value and 1 is the most positive value.

Example request:

POST /analyze/ HTTP/1.1
Host: api.anacode.de
Authorization: Token <token>
Accept: application/json
Content-Type: application/json

{
  "texts": ["这个平板电脑让我们特别不满意。"],
  "analyses": ["sentiment"]
}

Example response:

HTTP/1.1 200 OK
Vary: Accept
Content-Type: application/json

{
  "sentiment": [
      {"sentiment_value": -0.519603486469173}
  ]
}
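To turn sentiment_value into a coarse label, a client has to choose its own thresholds, since the API only defines the [-1, 1] scale. In this Python sketch, polarity and its neutral_band parameter are hypothetical, and the default of 0.0 (any value above zero counts as positive) is an arbitrary choice.

```python
def polarity(sentiment_value, neutral_band=0.0):
    """Map a sentiment_value in [-1, 1] to "positive", "negative" or "neutral".

    neutral_band is a client-side threshold of this sketch, not an API parameter.
    """
    if not -1.0 <= sentiment_value <= 1.0:
        raise ValueError("sentiment_value must lie in [-1, 1]")
    if sentiment_value > neutral_band:
        return "positive"
    if sentiment_value < -neutral_band:
        return "negative"
    return "neutral"
```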

Aspect-based Sentiment Analysis (BETA; absa)

Analyzes entities and their associated polarities and emotions. ABSA evaluations can be configured through the absa object property of the request: its external_entity_data property extends the evaluation with entities that are not found in the original text. To learn more, read the Aspect-based sentiment analysis section.

Optional arguments:

  • external_entity_data - list of external entities that can be integrated into the analysis. Cf. Adding external entities for detailed specification.

The result of the ABSA analysis consists of four sections:

  • normalized_text - text with normalized casing and whitespace
  • entities - simple and complex entities identified in the text.
  • evaluations - evaluations identified in the text.
  • relations - relations between evaluations/emotions and the entities they target.

A detailed description of the ABSA output as well as tutorials demonstrating its usage can be found in the Aspect-based sentiment analysis section.

Important note: We currently provide a BETA version of the absa analysis, which is tailored to the automotive domain. We will regularly update the analysis with new linguistic features and extend it to other domains.

Example request:

POST /analyze/ HTTP/1.1
Host: api.anacode.de
Authorization: Token <token>
Accept: application/json
Content-Type: application/json

{
  "texts": ["宝马的汽车很好看。"],
  "analyses": ["absa"]
}

Example response:

HTTP/1.1 200 OK
Vary: Accept
Content-Type: application/json

{
  "absa": [
    {"normalized_text": "宝马的汽车很好看。",
     "entities": [{"semantics": [{"type": "brand", "value": "BMW"},
                                 {"type": "product", "value": "Car"}],
                   "surface": {"span": [0, 5], "surface_string": "宝马的汽车"}}],
     "evaluations": [{"semantics": {"entity": [{"type": "feature_subjective",
                                                "value": "VisualAppearance"}],
                                    "sentiment_value": 0.875},
                      "surface": {"span": [5, 8], "surface_string": "很好看"}}],
     "relations": [{"semantics": {"entity": [{"type": "brand",
                                              "value": "BMW"},
                                             {"type": "product",
                                              "value": "Car"},
                                             {"type": "feature_subjective",
                                              "value": "VisualAppearance"}],
                                  "opinion_holder": null,
                                  "restriction": null,
                                  "sentiment_value": 0.875},
                    "surface": {"span": [0, 8], "surface_string": "宝马的汽车很好看"}}]}
  ]
}
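The relations section links each evaluation to the entities it targets. Assuming the structure shown in the example response, the following Python sketch flattens each relation into its entity values, sentiment value and surface string; relation_summary is a hypothetical helper.

```python
def relation_summary(absa_result):
    """List (entity values, sentiment_value, surface string) for each relation of one text."""
    summary = []
    for rel in absa_result.get("relations", []):
        sem = rel["semantics"]
        entities = [e["value"] for e in sem.get("entity", [])]
        summary.append((entities, sem.get("sentiment_value"), rel["surface"]["surface_string"]))
    return summary
```

Applied to the example, the single relation yields the BMW, Car and VisualAppearance entities with a sentiment value of 0.875.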

Please refer to the tutorials Absa: basic tutorial and Absa: comparative usecase to see example usages of aspect-based sentiment analysis.

Error responses

Wrong formatting or names of input parameters, missing required fields, etc. result in a 400 Bad Request error. The returned JSON entity describes the issue with your request. For example, sending an empty JSON object produces the following response:

{"analyses": ["This field is required."],
 "texts": ["This field is required."]}

You will also get a 400 Bad Request error if you use HTTP instead of HTTPS. In this case, the error response contains a generic detail key:

{"detail": ["Use https"]}

If your request exceeds 1 MB, the maximum size allowed, a 413 Request Entity Too Large error code is returned with the following JSON entity:

{"detail": ["Enclosed entity is too big"]}

For scraping, if the requested page could not be downloaded, a 503 Service Unavailable error code is returned. This may mean that the URL is wrong or that the requested page is temporarily unavailable. If this error is returned, double-check that the requested URL is correct and that you can access the page yourself.

{"detail": "Requested page unavailable - please check URL or try again later."}
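The status codes described in this section can be mapped onto exceptions in a client. In the following Python sketch, ApiError and check_response are hypothetical names. Treating only 503 as worth retrying unchanged follows from the descriptions here (400 and 413 indicate a problem with the request itself), while treating every status other than 200 as an error is an assumption.

```python
import json


class ApiError(Exception):
    """Raised for non-200 responses; `detail` holds the decoded JSON error entity."""

    def __init__(self, status, detail):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail
        # Only 503 (page unavailable for /scrape/) may succeed later unchanged;
        # 400 and 413 require fixing the request itself.
        self.retryable = status == 503


def check_response(status, body):
    """Return the decoded JSON body of a successful response, or raise ApiError."""
    data = json.loads(body) if body else {}
    if status != 200:
        raise ApiError(status, data)
    return data
```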

Advanced usage options

external_entity_data in absa

The analyze call takes an extra optional argument, “absa”, that can be used to specify external entities to associate with unassociated evaluations in the texts. You can read more about this feature (its intended uses, format, etc.) and find examples in Adding external entities.

single_document

By default, input texts are treated as separate documents, so the output contains sentiment and categories results for each string in the texts array. If your input texts are parts of one document (for example, paragraphs), we recommend performing sentiment and categories analysis at the document level instead of on its parts. Setting “single_document”: true in the analyze call's input JSON concatenates the texts before categories and sentiment analysis are performed and returns a single result for the whole text.

This also means that setting the flag to true has no effect unless you are using categories or sentiment analysis. In the examples below, the concepts output is the same in both cases, but only one sentiment value is returned when the single_document flag is set. If set, the single_document flag is also included in the output entity.
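The effect of the flag can be captured in two small helpers: one that adds single_document to the request body only when it is wanted, and one that reads the document-level sentiment value from the response. Both names, analyze_payload and document_sentiment, are hypothetical; the sketch relies only on the behavior described in this section.

```python
def analyze_payload(texts, analyses, single_document=False):
    """Build the /analyze/ request body, adding single_document only when requested."""
    payload = {"texts": list(texts), "analyses": list(analyses)}
    if single_document:
        payload["single_document"] = True
    return payload


def document_sentiment(response):
    """Return the one document-level sentiment_value of a single_document response."""
    if not response.get("single_document"):
        raise ValueError("response was not produced with single_document: true")
    (only,) = response["sentiment"]  # raises ValueError unless exactly one result
    return only["sentiment_value"]
```

Concept results are unaffected by the flag, so response["concepts"] still holds one list per input text.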

Example request without single_document:

POST /analyze/ HTTP/1.1
Host: api.anacode.de
Authorization: Token <token>
Accept: application/json
Content-Type: application/json

{
  "texts": ["宝马的汽车很好看。"],
  "analyses": ["concepts", "sentiment"]
}

Example response:

HTTP/1.1 200 OK
Vary: Accept
Content-Type: application/json

{"concepts": [[{"concept": "Automobile",
    "freq": 1,
    "relevance_score": 1.0,
    "surface": [{"ambiguity": false, "span": [2, 4], "surface_string": "汽车"}],
    "type": "product"},
   {"concept": "BayerischeMotorenWerke",
    "freq": 1,
    "relevance_score": null,
    "surface": [{"ambiguity": false, "span": [0, 2], "surface_string": "宝马"}],
    "type": "brand"}],
  [{"concept": "VisualAppearance",
    "freq": 1,
    "relevance_score": 0.925397272754923,
    "surface": [{"ambiguity": false, "span": [6, 8], "surface_string": "好看"}],
    "type": "feature"},
   {"concept": "Automobile",
    "freq": 1,
    "relevance_score": 0.37899853242163195,
    "surface": [{"ambiguity": false, "span": [3, 5], "surface_string": "汽车"}],
    "type": "product"},
   {"concept": "BayerischeMotorenWerke",
    "freq": 1,
    "relevance_score": null,
    "surface": [{"ambiguity": false, "span": [0, 2], "surface_string": "宝马"}],
    "type": "brand"}]],
 "sentiment": [{"sentiment_value": -0.3959144788943484},
  {"sentiment_value": 0.5846670072023799}]}

Example request with single_document:

POST /analyze/ HTTP/1.1
Host: api.anacode.de
Authorization: Token <token>
Accept: application/json
Content-Type: application/json

{
  "texts": ["宝马的汽车很好看。"],
  "analyses": ["concepts", "sentiment"],
  "single_document": true
}

Example response:

HTTP/1.1 200 OK
Vary: Accept
Content-Type: application/json

{"concepts": [[{"concept": "Automobile",
    "freq": 1,
    "relevance_score": 1.0,
    "surface": [{"ambiguity": false, "span": [2, 4], "surface_string": "汽车"}],
    "type": "product"},
   {"concept": "BayerischeMotorenWerke",
    "freq": 1,
    "relevance_score": null,
    "surface": [{"ambiguity": false, "span": [0, 2], "surface_string": "宝马"}],
    "type": "brand"}],
  [{"concept": "VisualAppearance",
    "freq": 1,
    "relevance_score": 0.925397272754923,
    "surface": [{"ambiguity": false, "span": [6, 8], "surface_string": "好看"}],
    "type": "feature"},
   {"concept": "Automobile",
    "freq": 1,
    "relevance_score": 0.37899853242163195,
    "surface": [{"ambiguity": false, "span": [3, 5], "surface_string": "汽车"}],
    "type": "product"},
   {"concept": "BayerischeMotorenWerke",
    "freq": 1,
    "relevance_score": null,
    "surface": [{"ambiguity": false, "span": [0, 2], "surface_string": "宝马"}],
    "type": "brand"}]],
 "sentiment": [{"sentiment_value": 0.3651351286793789}],
 "single_document": true}