This shows you the differences between two versions of the page.
| Next revision | Previous revision | ||
|
meta-data_annotation [2020/05/15 06:09] antoinegautier created |
meta-data_annotation [2021/04/12 17:27] (current) annapineda [METADATA FOR BUILDING SUB-CORPORA] |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ===== META-DATA & ANNOTATION ===== | ||
| ==== METADATA FOR BUILDING SUB-CORPORA ==== | ==== METADATA FOR BUILDING SUB-CORPORA ==== | ||
| The following parameters are used to select a sub-corpus or to segment the corpus into different sub-corpora (by genre, century, dialect, etc.). | The following parameters are used to select a sub-corpus or to segment the corpus into different sub-corpora (by genre, century, dialect, etc.). | ||
| - | * Genres (link to genres section) | + | * Genre [[http://stih-sorbonne-universite.fr/dokuwiki/doku.php?id=corpus| Link]] |
| - | * Language (link to languages section) | + | * Language [[http://stih-sorbonne-universite.fr/dokuwiki/doku.php?id=corpus|Link]] |
| * Dialect | * Dialect | ||
| * Century | * Century | ||
| * Form: verse/prose/hybrid | * Form: verse/prose/hybrid | ||
| * Relation: original, translation, adaptation, commentary | * Relation: original, translation, adaptation, commentary | ||
| - | |||
| - | |||
| ==== OTHER META-DATA ==== | ==== OTHER META-DATA ==== | ||
| Researchers can use other meta-data to refine their queries: | Researchers can use other meta-data to refine their queries: | ||
| Line 25: | Line 24: | ||
| - | ==== POS-TAGGING ==== | + | ==== PoS-TAGGING ==== |
| - | [Explain the stage of tagging of each text.] | + | |
| + | List of morpho-syntactic tags for CoRaLHis: | ||
| + | |||
| + | * ADJ = Adjective | ||
| + | * ADV = Adverb | ||
| + | * CON = Conjunction | ||
| + | * DET = Determiner | ||
| + | * NOM = Noun | ||
| + | * PRE = Preposition | ||
| + | * PRO = Pronoun | ||
| + | * VER = Verb | ||
| + | * PUN = Punctuation | ||
| + | * INJ = Interjection | ||
| - | ==== OTHER ANNOTATIONS ==== | ||
| - | [Explain other types of annotation adopted in the corpus, such as direct speech and semantic labels.] | ||
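
The sub-corpus parameters and the tag list recorded in this diff can be sketched in code. What follows is a minimal Python illustration, not the CoRaLHis implementation: the `TextRecord` class, its field names, the `select_subcorpus` and `describe_tag` helpers, and the sample values are all assumptions made for the example. Only the ten tag abbreviations and the parameter names (genre, language, dialect, century, form, relation) come from the page.

```python
from dataclasses import dataclass, fields

# Tag inventory copied from the PoS-TAGGING list in the current revision.
POS_TAGS = {
    "ADJ": "Adjective",
    "ADV": "Adverb",
    "CON": "Conjunction",
    "DET": "Determiner",
    "NOM": "Noun",
    "PRE": "Preposition",
    "PRO": "Pronoun",
    "VER": "Verb",
    "PUN": "Punctuation",
    "INJ": "Interjection",
}


@dataclass(frozen=True)
class TextRecord:
    """Hypothetical metadata record for one text of the corpus.

    The fields mirror the selection parameters listed on the page;
    the storage format actually used by the corpus is not documented here.
    """
    title: str
    genre: str
    language: str
    dialect: str
    century: int
    form: str      # one of the values of the "Form" parameter
    relation: str  # one of the values of the "Relation" parameter


def select_subcorpus(records, **criteria):
    """Return the records whose metadata equal every given criterion."""
    known = {f.name for f in fields(TextRecord)}
    unknown = set(criteria) - known
    if unknown:
        raise ValueError(f"unknown metadata field(s): {sorted(unknown)}")
    return [
        record
        for record in records
        if all(getattr(record, key) == value for key, value in criteria.items())
    ]


def describe_tag(tag):
    """Expand a tag abbreviation (case-insensitive) to its full label."""
    return POS_TAGS[tag.upper()]


# Invented sample records, for illustration only.
SAMPLE = [
    TextRecord("Text A", "genre-1", "language-1", "dialect-1", 13, "verse", "original"),
    TextRecord("Text B", "genre-2", "language-1", "dialect-2", 13, "prose", "translation"),
    TextRecord("Text C", "genre-1", "language-2", "dialect-3", 14, "verse", "adaptation"),
]
```

With these sample records, `select_subcorpus(SAMPLE, form="verse", century=13)` keeps only "Text A", and `describe_tag("nom")` expands to "Noun". Exact-match filtering is the simplest choice for the sketch; ranges of centuries or multiple genres would need a richer criterion type.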