Language processing in Linked Data authorities

E. Lynette Rayle edited this page Apr 19, 2019 · 13 revisions

Overview

Some linked data authorities tag literals with a language (e.g. 'milk@en', 'Milch@de', 'Lait@fr'). When literals are provided in multiple languages, it is desirable to be able to request literals for a specific language for two reasons: 1) to provide users with terms in their desired language, and 2) to avoid long results that include the term in every language. This document describes language processing for the linked data module in QA.

Where can language be specified?

Language can be specified in multiple places, listed here from highest to lowest priority. QA uses the language from the highest priority location where one is specified and ignores the lower priority locations.

  1. Passed as part of the request URL using the lang parameter
  2. Specified in the request header using HTTP_ACCEPT_LANGUAGE
  3. Authority specific default defined in the authority configuration
  4. Sitewide default defined in qa initializer

See configuration section below for more information on how to specify language.
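The priority order above can be sketched as a simple first-match lookup. This is a minimal illustration, not QA's actual code: `resolve_language` and its keyword arguments are hypothetical names, and the header value is assumed to have already been reduced to a plain language tag such as "fr".

```ruby
# Hypothetical helper, not part of QA's API.
# Returns the first language specification that is present, in priority order,
# normalized to an array of symbols. Returns [] when nothing is specified.
def resolve_language(url_param: nil, accept_language: nil, authority_default: nil, site_default: nil)
  candidates = [url_param, accept_language, authority_default, site_default]
  chosen = candidates.find { |candidate| !Array(candidate).empty? }
  Array(chosen).map(&:to_sym)
end

p resolve_language(accept_language: "fr", site_default: :en)            # => [:fr]
p resolve_language(authority_default: ["en", "fr"], site_default: :de)  # => [:en, :fr]
p resolve_language                                                      # => []
```

An empty result here stands for "no language specified", in which case no filtering would be applied.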

How is language filtering applied?

Filtering can happen in one of two ways: either the authority performs the filtering, or QA performs the filtering.

Authority filtering

If the authority's API supports passing in a language parameter, then QA will pass the language to the authority for it to perform the filtering. Passing language as a parameter to an authority is limited to a single language. If multiple languages are specified, then only the first language will be passed to the authority (e.g. for [:en, :fr], only 'en' will be passed).
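As a small illustration of the "first language only" behavior, the following sketch picks the value that would be sent to the authority. The helper name `language_for_authority` is hypothetical and is not taken from QA.

```ruby
# Hypothetical helper, not part of QA's API.
# Picks the single language value to send to the authority, or nil if none is set.
def language_for_authority(languages)
  first = Array(languages).first
  first&.to_s
end

p language_for_authority([:en, :fr])  # => "en"
p language_for_authority(:de)         # => "de"
p language_for_authority(nil)         # => nil
```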

Post authority query filtering

QA requests the full set of results that the authority will return, then filters that full set based on the selected language. QA filtering supports multiple languages (e.g. [:en, :fr]).

Rules for filtering:

  • if a language is not specified, keep all triples
  • keep triples where the object literal is tagged with the selected language
  • keep triples where the object literal does not have a language tag
  • if there are 0 matches for a predicate, keep triples for all languages
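The rules above can be sketched in plain Ruby. This is an illustration under stated assumptions, not QA's implementation: the `Triple` struct and `filter_by_language` are hypothetical stand-ins for real RDF statements, "0 matches" is evaluated per predicate, and untagged literals are counted as matches.

```ruby
# Hypothetical stand-in for an RDF statement; language is nil for untagged literals.
Triple = Struct.new(:subject, :predicate, :object, :language)

# Hypothetical helper, not part of QA's API.
def filter_by_language(triples, languages)
  langs = Array(languages).map(&:to_sym)
  # Rule 1: no language specified, so keep all triples.
  return triples if langs.empty?

  triples.group_by(&:predicate).flat_map do |_predicate, group|
    # Rules 2 and 3: keep literals in a selected language and untagged literals.
    kept = group.select { |t| t.language.nil? || langs.include?(t.language.to_sym) }
    # Rule 4: nothing kept for this predicate, so keep every language.
    kept.empty? ? group : kept
  end
end

triples = [
  Triple.new("ex:milk", "skos:prefLabel", "milk",     :en),
  Triple.new("ex:milk", "skos:prefLabel", "Milch",    :de),
  Triple.new("ex:milk", "skos:prefLabel", "Lait",     :fr),
  Triple.new("ex:milk", "skos:notation",  "M123",     nil),
  Triple.new("ex:milk", "skos:altLabel",  "Kuhmilch", :de)
]

p filter_by_language(triples, [:en, :fr]).map(&:object)
# => ["milk", "Lait", "M123", "Kuhmilch"]
```

In this example "Milch" is dropped because other prefLabel values match, while "Kuhmilch" survives because it is the only altLabel and would otherwise leave that predicate empty.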

Preventing language processing for a single authority when there is a site wide default

Some authorities have language tagged literals that are known to be incorrect, or you may want to retrieve literals for all languages. To prevent language filtering, set the language to *, which acts as a wildcard indicating that all languages should be matched. The most common use is to set the authority default to * to prevent filtering for that authority. Setting the site default to * means that, by default, no authority is filtered unless a language is set in that authority's configuration or passed as part of the QA request.

A user can override the authority default and site default by passing in * to prevent filtering for a specific request.

Caveat: If the language is passed as a parameter to the authority and a default value is set for the language parameter, the default for the parameter will be used if the user passes in * for the language.
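The interaction between the wildcard and the defaults can be sketched as follows. The helper `effective_languages` is hypothetical and only models QA-side filtering; it does not model the caveat about a default defined on the authority's language parameter.

```ruby
# Hypothetical helper, not part of QA's API.
# Returns the languages to filter on; an empty array means "do not filter".
def effective_languages(request_lang: nil, authority_default: nil, site_default: nil)
  chosen = [request_lang, authority_default, site_default].find { |c| !Array(c).empty? }
  langs = Array(chosen).map(&:to_s)
  # The wildcard matches every language, so filtering is skipped entirely.
  langs.include?("*") ? [] : langs.map(&:to_sym)
end

p effective_languages(request_lang: "*", authority_default: "en")   # => []
p effective_languages(authority_default: "*", site_default: :en)    # => []
p effective_languages(request_lang: "fr", authority_default: "*")   # => [:fr]
```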

Configuring for language

lang parameter as part of the QA request URL

Processing

If the configuration defines a lang parameter, the lang value will be passed to the authority which will perform the language filtering. If not defined, the filtering occurs on the QA side with QA filtering results returned from the authority.

NOTE: You can define a default value for a lang parameter defined as part of the URL templates, but if you do, it will short-circuit the remainder of the configurations and processing described in this document. If a lang parameter is not passed in and there is a default defined, the default will be used without any further processing.

Limitations

Requires the authority to support language filtering OR requires the authority results to have language tagged literals. If neither condition is met, filtering will not be applied to results.

USAGE

NOTE: The examples below configure a parameter that is passed to the authority so that the authority performs the filtering. If the authority does not support this, omit this configuration and the filtering will happen on the QA side, provided the results from the authority include language tagged literals.

Configuration

Configure parameter to pass to authority for fetching a single term

Configure a parameter to pass to the authority when fetching a single term. In the example below, lang appears in the "template" and has a corresponding entry in the "mapping".

{
  "term": {
    "url": {
      "@context": "http://www.w3.org/ns/hydra/context.jsonld",
      "@type":    "IriTemplate",
      "template": "http://api.library.cornell.edu/skosmos/rest/v1/nalt/data?{?lang}&uri={term_uri}",
      "variableRepresentation": "BasicRepresentation",
      "mapping": [
        {
          "@type":    "IriTemplateMapping",
          "variable": "term_uri",
          "property": "hydra:freetextQuery",
          "required": true,
          "encode":   false
        },
        {
          "@type":    "IriTemplateMapping",
          "variable": "lang",
          "property": "hydra:freetextQuery",
          "required": false
        }
      ]
    },
    ...
  },
  ...
}

Identify the parameter used by the authority for language. Many authorities support the commonly used lang parameter, but QA does not assume this. It allows you to specify a different parameter to use in the authority's URL.

NOTE: The key in this hash is always "lang". The value for "lang" identifies the name of the parameter in the authority URL.

{
  "term": {
    ...
    "qa_replacement_patterns": {
      "term_id": "term_uri",
      "lang": "lang"
    },
    ...
  },
  ...
}

Configure parameter to pass to authority for searching for terms

Similarly, you can define a parameter to use for the search template URL. Again, you see lang defined in the template and a mapping for the lang parameter.

{
  ...
  "search": {
    "url": {
      "@context": "http://www.w3.org/ns/hydra/context.jsonld",
      "@type":    "IriTemplate",
      "template": "http://services.ld4l.org/ld4l_services/agrovoc_batch.jsp?{?query}&{?maxRecords}&{?lang}",
      "variableRepresentation": "BasicRepresentation",
      "mapping": [
        {
          "@type":    "IriTemplateMapping",
          "variable": "query",
          "property": "hydra:freetextQuery",
          "required": true
        },
        {
          "@type":    "IriTemplateMapping",
          "variable": "maxRecords",
          "property": "hydra:freetextQuery",
          "required": false,
          "default":  "20"
        },
        {
          "@type":    "IriTemplateMapping",
          "variable": "lang",
          "property": "hydra:freetextQuery",
          "required": false
        }
      ]
    },
    ...
  }
}

As with term fetch, you can also identify the parameter used by the authority for language.

{
  "search": {
    ...
    "qa_replacement_patterns": {
      "query": "query",
      "lang":  "lang"
    },
    ...
  }
}

Example QA request

The following is an example call to QA passing language as part of the URL.

curl 'http://localhost:3000/qa/search/linked_data/agrovoc_ld4l_cache?q=lait&lang=fr'

Pass language in the HTTP_ACCEPT_LANGUAGE request header

Processing

This is processed the same as passing the language as a parameter on the URL. The only difference is that instead of getting the language from the URL, it gets the language from the request header. See the previous section for details on configuration and processing.

Limitations

Requires the authority to support language filtering OR requires the authority results to have language tagged literals. If neither condition is met, filtering will not be applied to results.

USAGE

Configuration

This is the same as for the previous section on passing language as a parameter.

Example QA request

curl -H 'Accept-Language: fr' 'http://localhost:3000/qa/search/linked_data/agrovoc_ld4l_cache?q=lait'
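For callers working from Ruby rather than curl, the same request can be sketched with the standard library. The host, port, and authority name are copied from the curl example; the actual network call is left commented out because it assumes a QA instance running locally.

```ruby
require "uri"
require "net/http"

uri = URI("http://localhost:3000/qa/search/linked_data/agrovoc_ld4l_cache?q=lait")

# Build a GET request and set the language in the Accept-Language header.
request = Net::HTTP::Get.new(uri)
request["Accept-Language"] = "fr"

# Uncomment to send the request to a running QA instance:
# response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
# puts response.body
```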

Authority specific default

Processing

If the language is not passed in through a parameter or the request header, QA will look to see if the authority has a default value to use for all requests to that authority. QA will perform the filtering of results returned from the authority.

Limitations

Requires the authority results to have language tagged literals.

USAGE

Configuration

{
  "term": {
    ...
    "language": "en",
    ...
  },
  "search": {
    ...
    "language": ["en", "fr"],
    ...
  }
}
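The configuration above shows that "language" may be a single value or a list. A minimal sketch of reading either form into a uniform list follows; `authority_default_language` is a hypothetical helper, and the JSON is a trimmed copy of the example with the elided keys left out.

```ruby
require "json"

# Trimmed copy of the authority configuration shown above.
config = JSON.parse(<<~CONFIG)
  {
    "term":   { "language": "en" },
    "search": { "language": ["en", "fr"] }
  }
CONFIG

# Hypothetical helper, not part of QA's API.
# Returns the authority default for an action as an array of symbols ([] if unset).
def authority_default_language(config, action)
  Array(config.dig(action.to_s, "language")).map(&:to_sym)
end

p authority_default_language(config, :term)    # => [:en]
p authority_default_language(config, :search)  # => [:en, :fr]
```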

Site wide default

Processing

If the language is not passed in through a parameter or the request header and there is not a default for the authority, QA will look to see if there is a site wide default value to use for language.

Limitations

Requires the authority results to have language tagged literals.

USAGE

Configuration

Installing qa initializer file...

The site wide language default is configured in the qa initializer. The qa initializer is installed into /config/initializers/qa.rb by the install generator. The generator will also modify routes and perform other actions. If this is a new installation of qa, you can run the installer using...

$ rails generate qa:install

OR you can manually copy the qa initializer from /lib/generators/qa/install/templates/config/initializers/qa.rb to /config/initializers/qa.rb.

Configuring site wide default

Edit /config/initializers/qa.rb and modify the value for the following setting (uncommenting it if needed)...

config.default_language = :en
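For context, the setting sits inside the configuration block of the initializer. The sketch below assumes the generated file wraps its settings in a `Qa.config do |config| ... end` block; check your generated initializer, since its exact contents may differ between QA versions.

```ruby
# config/initializers/qa.rb
# Assumption: the generated initializer uses a Qa.config block like this one.
Qa.config do |config|
  # Site wide default language. Used only when no language is supplied by the
  # request URL, the request header, or the authority configuration.
  config.default_language = :en
end
```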