The Annotated Corpus of Classical Tibetan (ACTib) – Version 2.0

(Segmented & POS-tagged)


Description

This corpus, comprising more than 185 million tokens, is a segmented and part-of-speech (POS) tagged version of

Wallman, Jeff, Rowinski, Zach, Ngawang Trinley, Tomlinson, Chris, & Keutzer, Kurt. (2017). Collection of Tibetan etexts compiled by the Buddhist Digital Resource Center [Data set]. Zenodo. http://doi.org/10.5281/zenodo.821218

using the training data of

Hill, Nathan W., & Garrett, Edward. (2017). A part-of-speech (POS) tagged corpus of Classical Tibetan [Data set]. Zenodo. http://doi.org/10.5281/zenodo.574878

The code for segmenting and POS tagging any Tibetan file can be found on GitHub.

Version 2 of ACTib is based on the same XML files as ACTib Version 1 (http://doi.org/10.5281/zenodo.823707), but it contains both segmented and POS-tagged files and improves on Version 1 in a number of ways. Post-processing was still fully automatic, with no manual correction. For details of the improved annotation method, see:

Meelen, Marieke, Roux, Élie & Hill, Nathan (forthcoming). ‘Optimisation of the largest annotated Tibetan corpus combining rule-based, memory-based & deep-learning methods’ in TALLIP.

Notes

Acknowledgements go to the British Academy for funding Meelen’s research through grant pf170063.

Files (1.6 GB)

Name                                 MD5                               Size
results-eKangyur.zip                 6dc2eaa0d904f983dcbe8a0f079768a3  78.6 MB
results-eTengyur.zip                 207dd6d020efebeb2df727dbaceb615b  192.6 MB
results-GuruLamaWorks.zip            a33a58ec60b9fbc4f4e420a8a314efc5  222.2 MB
results-KarmaDelek.zip               d8ac07e85670b6354205afdd2cc599a6  84.2 MB
results-PalriParkhang.zip            4419bf8785c1553dbda608033de8c1e3  17.5 MB
results-Shechen.zip                  a0f94d337feefe478f744ad5eb176e55  65.6 MB
results-TulkuSangag.zip              12ee3b32e598362aa11606b96520a441  15.2 MB
results-VajraVidya.zip               50ac6dae81ed84c99e5365f2f045719b  12.8 MB
results-Various.zip                  702e386dbc72f1b9eca6f7d4344e8185  17.5 MB
SegPOS-DharmaDownload_July2020.zip   3a24715c7b1181be0ee2bf671c8142e0  91.7 MB
SegPOS-DrikungChetsang_July2020.zip  f9c3c9dd9fdd4e597ad1d6283e68cf9f  42.1 MB
SegPOS-eKangyur_July2020.zip         fe9e87295f8a5408adc69019f9976da7  79.0 MB
SegPOS-eTengyur_July2020.zip         becaf3b43d67f8a903278f0fd71e78d9  192.9 MB
SegPOS-GuruLamaworks_July2020.zip    1ff9ccc99e372d6cae11d44f1204febd  222.2 MB
SegPOS-KarmaDelek_July2020.zip       75836c8ca070af0b543c17f674b4a628  85.1 MB
SegPOS-PalriParkhang_July2020.zip    2510e3ea27ae0230cc6b060226e5fca1  17.5 MB
SegPOS-Shechen_July2020.zip          2552ba9369579462db2ef7d2cdf94ad2  68.2 MB
SegPOS-TulkuSangag_July2020.zip      e846876dd34da3770e8df0620491b14d  15.1 MB
SegPOS-VajraVidya_July2020.zip       29ce6ea2ec4ae90257c0be7a5d8eb8c5  12.9 MB
SegPOS-Various_July2020.zip          82c1fe04c44296b1e18f47815a8d7d88  18.0 MB
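
Since each archive is listed with an MD5 checksum, a downloaded file can be checked against it before use. The following is a minimal sketch in Python, not part of the ACTib tooling: the function names `md5_of_file` and `verify_download` are hypothetical, the `EXPECTED_MD5` table copies only two entries from the listing above, and it assumes the archive keeps its original file name on disk.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above (two entries shown as
# examples; the remaining archives can be added in the same way).
EXPECTED_MD5 = {
    "results-eKangyur.zip": "6dc2eaa0d904f983dcbe8a0f079768a3",
    "SegPOS-eKangyur_July2020.zip": "fe9e87295f8a5408adc69019f9976da7",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large archives are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the checksum recorded
    for its file name, and False if it does not."""
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"no checksum recorded for {name}")
    return md5_of_file(path) == expected[name].lower()
```

Under these assumptions, `verify_download("results-eKangyur.zip")` returns `True` for an intact download and `False` for a corrupted or truncated one. MD5 is used here only because it is the checksum the record publishes; it detects transfer errors but is not a safeguard against deliberate tampering.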
