PASCAL Ontology Learning Challenge
Lonely Planet Dataset Release Notes

This dataset consists of the following files:

t21.tax

Taxonomy definition for task T2.1. The file contains 103 lines of the form

is_a(A,B)

which means that A is a type of B. All concepts descend from the “root” concept, and concept names consist only of lower-case ASCII letters and the underscore character ('_').
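
A minimal sketch of reading such a file, assuming one is_a(A,B) statement per line and at most one parent per concept. The function names and the sample lines are illustrative assumptions, not taken from t21.tax.

```python
import re

# One taxonomy statement; names are limited to lower-case ASCII letters and '_'.
IS_A = re.compile(r"is_a\(([a-z_]+),\s*([a-z_]+)\)")

def parse_taxonomy(lines):
    """Return a child -> parent mapping built from is_a(A,B) lines."""
    parent = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = IS_A.fullmatch(line)
        if match is None:
            raise ValueError(f"unexpected taxonomy line: {line!r}")
        child, par = match.groups()
        parent[child] = par  # assumption: each concept has a single parent
    return parent

def ancestors(concept, parent):
    """List the concepts on the path from a concept up to the top."""
    chain = []
    while concept in parent:
        concept = parent[concept]
        chain.append(concept)
    return chain

# Hypothetical sample lines, not copied from the dataset.
sample = ["is_a(city,area)", "is_a(area,root)"]
parent = parse_taxonomy(sample)
print(ancestors("city", parent))  # ['area', 'root']
```

With the real file, the lines would come from something like open("t21.tax", encoding="utf-8"); the encoding is an assumption.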

t21.nelist

Contains the list of 1106 named entities to be used for task T2.1, one per line.

t21.colist

List of 96 target concepts for task T2.1, one per line.

t22.nelist

List of 1107 named entities for task T2.2, one per line.

t23.nelist

List of 437 named entities for task T2.3, one per line.

t23.tax

Pruned taxonomy definition for task T2.3.

flatcorpus.txt

Contains the entire corpus of 1801 documents, one per line, starting with “filename: ” and followed by a clear-text version of the document without newlines.
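
A minimal sketch of splitting such lines into names and texts. It assumes that each line begins with the document's file name followed by ": ", which is one reading of the description above; the function name and the sample lines are hypothetical.

```python
def parse_flat_corpus(lines):
    """Map each document name to its text, assuming '<name>: <text>' per line."""
    docs = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split at the first ': ' only, so colons inside the text are kept.
        name, sep, text = line.partition(": ")
        if not sep:
            raise ValueError(f"no 'name: ' prefix in line: {line[:40]!r}")
        docs[name] = text
    return docs

# Hypothetical sample; real lines would be read from flatcorpus.txt.
sample = [
    "doc_a.txt: First document text.",
    "doc_b.txt: Second one: with a colon.",
]
docs = parse_flat_corpus(sample)
print(len(docs))  # 2
```

A quick sanity check on the real file would be to compare len(docs) with the 1801 documents stated above.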

corpus.zip

An archive of the text-only version of the corpus, one file per document.

taxonomy21.jpg

An informative picture of the taxonomy used in task T2.1.

Output file format

Task 2.1: Ontology Population

Your file should contain one line for each entity from t21.nelist. This line should contain the name of the entity followed by a tab character and the name of the concept (from t21.colist) to which your solution has assigned this named entity.

Example:

Acre     city
Adriatic Sea     sea
Africa     continent
Agatha Christie     person
...
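
The format described above could be produced along these lines. This is only a sketch: the function name and the in-memory data are assumptions, and in practice the entity and concept lists would be read from t21.nelist and t21.colist.

```python
def format_population(assignments, entities, concepts):
    """Build Task 2.1 output lines: entity name, tab, concept name."""
    allowed = set(concepts)
    lines = []
    for entity in entities:
        if entity not in assignments:
            raise ValueError(f"no concept assigned to {entity!r}")
        concept = assignments[entity]
        if concept not in allowed:
            raise ValueError(f"{concept!r} is not a target concept")
        lines.append(f"{entity}\t{concept}")
    return lines

# Small illustrative subset based on the example entries above.
entities = ["Acre", "Adriatic Sea"]
concepts = ["city", "sea", "continent", "person"]
assignments = {"Acre": "city", "Adriatic Sea": "sea"}

for line in format_population(assignments, entities, concepts):
    print(line)
```

Iterating over the entity list rather than over the assignments ensures that every entity gets exactly one line, as the task requires.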

Task 2.2: Concept Formation

Provide two files. In the first file, there should be one line for each concept you have formed. This line should contain the identifier of the concept followed by a tab character and a comma-separated list of named entities (from t22.nelist) contained in the extension of this concept. Concept identifiers should not contain characters other than alphanumeric ASCII characters and the underscore ('_') character.

Example:

Concept_1     Bulgaria, Honduras, Algeria, Albania, ...
Concept_2     Andaman Sea, Adriatic Sea, Libyan sea, ...
Concept_3     Basilica di San Pietro, Church of St Gregory, Govinda Temple, ...
...
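
A sketch of writing the first file, including a check of the identifier rule stated above. The function name is hypothetical, and joining entities with a comma followed by a space is an assumption based on the example.

```python
import re

# Identifiers may contain only alphanumeric ASCII characters and '_'.
VALID_ID = re.compile(r"[A-Za-z0-9_]+")

def format_extensions(extensions):
    """Build lines of the form: concept identifier, tab, entity list."""
    lines = []
    for concept_id, entities in extensions.items():
        if VALID_ID.fullmatch(concept_id) is None:
            raise ValueError(f"invalid concept identifier: {concept_id!r}")
        lines.append(concept_id + "\t" + ", ".join(entities))
    return lines

# Entities taken from the example above; the lists are shortened.
extensions = {
    "Concept_1": ["Bulgaria", "Honduras"],
    "Concept_2": ["Andaman Sea", "Adriatic Sea"],
}
for line in format_extensions(extensions):
    print(line)
```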

The second file should likewise contain one line for each concept. This line should contain the identifier of the concept followed by a tab character and the label of the concept. The labels will be evaluated manually by a set of human judges.

Example:

Concept_1     country
Concept_2     sea
Concept_3     place of worship
...

Task 2.3: Taxonomy Extension

Provide three files. The first one should contain concept identifiers and corresponding lists of named entities (from t23.nelist), and the second one should contain concept identifiers and their human-readable labels. The format of these two files is the same as for task 2.2.

The third file should contain one line for each concept you have formed, consisting of the identifier of this concept, a tab character, and the identifier of its immediate parent concept (from the pruned taxonomy, t23.tax).

Example (third file only; for the first two files, see the examples for Task 2.2):

Concept_1     area
Concept_2     person
Concept_3     sight
...
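
A sketch of producing the third file while checking that each parent is a concept of the pruned taxonomy. The function name is hypothetical, and the set of taxonomy concepts is an illustrative stand-in for the names that would be collected from t23.tax.

```python
def format_parents(parents, taxonomy_concepts):
    """Build lines of the form: concept identifier, tab, parent concept."""
    known = set(taxonomy_concepts)
    lines = []
    for concept_id, parent in parents.items():
        if parent not in known:
            raise ValueError(f"{parent!r} is not in the pruned taxonomy")
        lines.append(f"{concept_id}\t{parent}")
    return lines

# Illustrative stand-in for the concepts found in t23.tax.
taxonomy_concepts = {"root", "area", "person", "sight"}
parents = {"Concept_1": "area", "Concept_2": "person", "Concept_3": "sight"}

for line in format_parents(parents, taxonomy_concepts):
    print(line)
```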

 
