The Hebrew Treebank Version 2.0 contains 6,500 hand-annotated sentences drawn from news items in the MILA HaAretz Corpus, with full word segmentation and morpho-syntactic analysis. Morphological features that are not directly relevant to syntactic structure, such as roots, templates and patterns, are not analyzed.
Hebrew Treebank 2.0
- Documentation (in English)
- Documentation (in Hebrew)
- Original Text (Hebrew Characters)
- Original Text (Transliterated Characters)
- Treebank Files (Web-Viewing Version)
- Treebank Files (SEMTAGS Version)
- Treebank Files (Transliterated Version)
Hebrew Treebank 1.0
- Documentation (in English)
- Documentation (in Hebrew)
- Annotator Manual (in English)
- Original Text (Hebrew Characters)
- Original Text (Transliterated Characters)
- Morphologically Annotated Corpus Based on Syntax Trees (Hebrew Characters)
- Morphologically Annotated Corpus Based on Syntax Trees (Transliterated Characters)
- Parse Trees (Hebrew Characters)
- Parse Trees (English Characters)
- Parse Trees v. 1.1 (English Characters)
Related Publications
- Khalil Sima'an, Alon Itai, Yoad Winter, Alon Altman and Noa Nativ. "Building a Tree-Bank of Modern Hebrew Text." Traitement Automatique des Langues, 42, 347-380. 2001.
Credits
- Principal Investigators: Khalil Sima'an and Yoad Winter.
- Lexicographers: Nomi Guthman and Adi Milea.
License
This resource may be used freely for research purposes only (please register to access the password-protected files). For copyright reasons, this corpus is not available for commercial use. Any publication resulting from the use of this corpus should refer to it as "The MILA Hebrew Treebank" and cite:
Khalil Sima'an, Alon Itai, Yoad Winter, Alon Altman and Noa Nativ. "Building a Tree-Bank of Modern Hebrew Text." Traitement Automatique des Langues, 42, 347-380. 2001.
