Differences

This shows you the differences between two versions of the page.

corpus [2021/04/12 17:31] by annapineda
corpus [2021/06/17 08:35] (current) by annapineda

Line 23:
 ==== LANGUAGES ====
  
-At the first stage of the project, the selection of 100.000 tokens per half century and genre will be made for a total amount of languages:
+At the first stage of the project, 100,000 tokens per half century and genre will be selected for each of the following languages:
  
   - French
Line 42:
 On the other hand, we foresee that, for some languages, there will be cells that cannot be entirely filled, at least at the outset. In this regard, it is important to highlight that the project is conceived as a growing, ongoing initiative, which will be enlarged over time as more collaborations are established.
  
-Finally, it should be noted that, in addition to the 8 major languages mentioned, the corpus also envisages to include texts from **Sardinian**, **Francoprovençal** and **Rhaeto-Romance**. For these languages, the scarcity of resources available does not allow us to envisage gathering texts for all the genres and periods of time, at least not in a significant way. Despite this not being possible and therefore comparative work including these languages being affected, we think they need to be represented in the corpus as much as possible, thus we have opted for representing the genres / periods available. For example, for Sardinian we have at our disposal a notable number of legal texts from the 13th and 14th centuries.
+Finally, it should be noted that, in addition to the 8 major languages mentioned, the corpus also aims to include texts in **Sardinian**, **Francoprovençal** and **Rhaeto-Romance**. For these languages, the scarcity of available resources does not allow us to gather texts for all genres and periods, at least not in significant numbers. Although this limits comparative work involving these languages, we think they need to be represented in the corpus as much as possible, so we have opted to include whichever genres and periods are available. For example, for Sardinian what we mainly have at our disposal is a notable number of legal texts from the 13th and 14th centuries, while the number of texts for the remaining genres and centuries is currently being increased.
  
  
Line 48:
 ==== LIST OF TEXTS ====
  
-  - [[https://www.dropbox.com/s/rkfgoypndlngub7/FR%20selection%20of%20texts%20WEB.pdf?dl=0| Proposal for French]].
+  - [[https://www.dropbox.com/s/rkfgoypndlngub7/FR%20selection%20of%20texts%20WEB.pdf?dl=0| Proposal for French (Word version)]] [[https://www.dropbox.com/s/h49ygvpyeeidx1i/DATABASE%20-%20FRENCH%20-%20selection_2020.06.15--.pdf?dl=0| Excel version]]
   - Proposal for Italian [not available yet]
-  - [[https://www.dropbox.com/s/ktnwzy47iu8wpsl/CAT%20selection%20of%20texts%20WEB.pdf?dl=0| Proposal for Catalan]]
+  - [[https://www.dropbox.com/s/ktnwzy47iu8wpsl/CAT%20selection%20of%20texts%20WEB.pdf?dl=0| Proposal for Catalan (Word version)]] [[https://www.dropbox.com/s/k0fifehag1ptifj/DATABASE%20-%20CATALAN%20-%20selection_2020.06.10..pdf?dl=0| Excel version]]
-  - [[https://www.dropbox.com/s/tul94eslfnkson0/SP%20selection%20of%20texts%20WEB.pdf?dl=0| Proposal for Spanish]]
+  - [[https://www.dropbox.com/scl/fi/6nkvr5txatgvr3av6h7gb/SP-selection-of-texts-WEB.docx?dl=0&rlkey=79pg4vag3w3rl0vmrjzpv7k3q| Proposal for Spanish (Word version)]] [[https://www.dropbox.com/s/jbc7dzsmndwaipg/DATABASE%20-%20SPANISH%20-%20selection_2020.07.09-.pdf?dl=0| Excel version]]
-  - [[https://www.dropbox.com/s/ljg19wxx8ly2ec2/PORT%20selection%20of%20texts%20WEB.pdf?dl=0| Proposal for Portuguese]]
+  - [[https://www.dropbox.com/scl/fi/we5qvk2fgprwbzx38nyso/PORT-selection-of-texts-WEB.docx?dl=0&rlkey=b6rn7ksdfh87uikrtkbtmd917| Proposal for Portuguese (Word version)]] [[https://www.dropbox.com/s/a7gkwkhi84stmc2/DATABASE%20-%20PORTUGUESE%20-%20%20selection_2021.05.pdf?dl=0| Excel version]]
-  - [[https://www.dropbox.com/s/vdqzwpre95ojj93/OC%20selection%20of%20texts%20WEB.pdf?dl=0| Proposal for Occitan]]
+  - [[https://www.dropbox.com/s/vdqzwpre95ojj93/OC%20selection%20of%20texts%20WEB.pdf?dl=0| Proposal for Occitan (Word version)]] [[https://www.dropbox.com/s/7les6yd27lay4mi/DATABASE%20-%20OCCITAN%20-%20selection_2020.07.09..pdf?dl=0| Excel version]]
   - Proposal for Gascon [not available yet]
-  - [[https://www.dropbox.com/s/oijkemzf3oltclp/SARD%20selection%20of%20texts%20WEB.pdf?dl=0| Proposal for Sardinian]]
+  - [[https://www.dropbox.com/scl/fi/m63gf5bh8vugsqusi8wms/SARD-selection-of-texts-WEB.docx?dl=0&rlkey=wcmpieksj6n3agcpqvp6q82pj| Proposal for Sardinian (Word version)]] [[https://www.dropbox.com/s/t3amwpfew6su1n8/DATABASE%20-%20SARDINIAN-def..pdf?dl=0| Excel version]]
-  - [[https://www.dropbox.com/s/rl3li1fdifczbox/FR-PROV%20selection%20of%20texts%20WEB.pdf?dl=0| Proposal for Francoprovençal]]
+  - [[https://www.dropbox.com/scl/fi/m63gf5bh8vugsqusi8wms/SARD-selection-of-texts-WEB.docx?dl=0&rlkey=wcmpieksj6n3agcpqvp6q82pj| Proposal for Francoprovençal (Word version)]] [[https://www.dropbox.com/s/3yloh1jhst6c5oq/DATABASE%20-%20FRANCOPROVEN%C3%87AL-d%C3%A9f.pdf?dl=0| Excel version]]
   - Proposal for Rhaeto-Romance [not available yet]
corpus.1618248660.txt.gz · Last modified: 2021/04/12 17:31 by annapineda