A Terminological Inventory for Biodiversity

Biodiversity Term Inventory

Handle:	https://hdl.handle.net/21.11129/0000-000B-D395-E (persistent URL to this page)
URL:	http://nactem.ac.uk/bhl_inventory/

To construct the inventory, we first compiled a species name dictionary by combining all of the names available in the Catalogue of Life (CoL), the Encyclopedia of Life (EoL) and the Global Biodiversity Information Facility (GBIF).
The terms contained in this dictionary were then located within the text of English BHL documents (about 24 million pages of text) using a string matching method.
We then learned vector representations of the identified terms using three different approaches, namely count-based, prediction-based and compositional distributional semantic models (DSMs).
These approaches compute vector representations for both single and multi-word terms.
The cosine similarity between two vectors serves as an indicator of the corresponding terms' semantic relatedness: the higher the cosine similarity, the more related the two terms are.
Finally, for each term, we selected the k candidates with the highest cosine similarity as its most semantically related terms.
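
The ranking step described above can be illustrated with a minimal sketch. It assumes the term vectors are already available as a plain dictionary; the function names `cosine` and `top_k_related` and the toy two-dimensional vectors are hypothetical and are not taken from the resource itself, whose vectors come from the DSMs mentioned above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)

def top_k_related(term, vectors, k=20):
    """Return the k terms most similar to `term` as (term, score) pairs."""
    query = vectors[term]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != term  # a term is not its own neighbour
    ]
    # Highest similarity first, then keep the first k candidates.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors, invented purely for illustration.
vectors = {
    "Quercus robur": [1.0, 0.0],
    "Quercus petraea": [0.9, 0.1],
    "Panthera leo": [0.0, 1.0],
}
related = top_k_related("Quercus robur", vectors, k=2)
```

With k set to 20, this mirrors the size of the related-term lists provided in the inventory; at the scale of the full inventory, a vectorised or approximate nearest-neighbour search would likely be needed instead of this pairwise loop.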

The inventory contains 288,562 names of species whose frequency in BHL documents is at least five.
For each term in the inventory, the 20 most semantically similar terms are provided, together with their corresponding similarity scores.
To support downstream digital biodiversity processing, each term is also linked to its URI, UUID and LSID as indexed by Global Names.

A search interface that uses the inventory as metadata for query expansion is available at http://nactem.ac.uk/BHLQueryExpansion/.
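
As a rough illustration of how related-term lists could drive query expansion, the sketch below adds the highest-scoring related terms to a query. The in-memory layout of `inventory` (a dictionary mapping each term to a list of (related term, score) pairs sorted by descending score), the function name `expand_query` and its parameters are all assumptions made for this example; the distributed file format and the behaviour of the search interface may differ.

```python
def expand_query(query, inventory, max_terms=5, min_score=0.0):
    """Return the query followed by up to `max_terms` related terms.

    `inventory` is a hypothetical mapping:
        term -> [(related_term, similarity_score), ...] sorted by score.
    """
    related = inventory.get(query, [])  # unknown terms expand to nothing
    expansions = [term for term, score in related if score >= min_score]
    return [query] + expansions[:max_terms]

# Invented entries and scores, for illustration only.
inventory = {
    "Quercus robur": [
        ("Quercus petraea", 0.91),
        ("Quercus pedunculata", 0.88),
        ("Fagus sylvatica", 0.64),
    ],
}
expanded = expand_query("Quercus robur", inventory, max_terms=2, min_score=0.7)
```

The `min_score` threshold is one possible way to trade recall against precision: a higher value keeps only closely related names, while a lower value broadens the search.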

Download

Distribution Licence

CC-BY

Restrictions: Academic - Non Commercial Use

Download location: hidden

User Nature: Academic

IPR Holder

Paul Thompson

University of Manchester

Research Associate

School of Computer Science

Contact Person

Sophia Ananiadou

University of Manchester

Professor

School of Computer Science

Lexical Conceptual Resource
text

Lexical Conceptual Resource General Information

Terminological Resource

Encoding

Extratextual information unit: Semantics

Encoding level: Semantics

Linguistic information: Semantics - Domain

Monolingual Text - Lexical Conceptual Resource Languages

English

Linguality

Linguality type: Monolingual

Text Format

Plain text

Size

288,562 Entries

Character encoding

UTF-8

Domains

Biodiversity

Conforms to: Other

Modalities

Written Language

Time Coverage

From 17th century to present

Geographic coverage

Around the world

Metadata

Created: 02/27/2017

Last Updated: 02/27/2017

Metadata Language: English (en)

Metadata Creator

Paul Thompson

University of Manchester

Research Associate

School of Computer Science

Version

Version: 1.0

Usage

Foreseen Use - NLP Applications

Use NLP Specific: Entity Mention Recognition, Information Extraction, Information Retrieval, Knowledge Discovery, Linguistic Research, Named Entity Recognition, Terminology Extraction, Text Mining

Actual Use - NLP Applications

Use NLP Specific: Information Retrieval

Documentation

Tool Documentation: Manual

Document Type: Manual

Nhung Nguyen, BHL Inventory, http://www.nactem.ac.uk/meta-net/Narratives/BHL_inventory_narrative.pdf , 2017
