Standards

The corpus and lexicon encoding standards below were developed by MILA and are used throughout its resources and tools, which makes those resources easier to reuse and keeps them compatible with one another. MILA encourages other researchers and organizations to adopt these standards in their own work.


Transliteration Scheme

Each Hebrew letter is represented by a single Latin character in the transliteration attribute of the XML tag:

א ב ג ד ה ו ז ח ט י כ ל מ נ ס ע פ צ ק ר ש ת
a b g d h w z x v i k l m n s y p c q r e t

(Note that final-form letters are not distinguished: each is transliterated with the same Latin character as its regular form.)
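
The mapping above can be applied mechanically. The following is a minimal Python sketch, not part of MILA's tools: the names TO_LATIN, FINAL_FORMS and transliterate are illustrative, and passing non-Hebrew characters (spaces, digits, punctuation) through unchanged is an assumption, since the scheme above does not say how they are handled.

```python
# Letter-by-letter mapping copied from the table above.
TO_LATIN = {
    "א": "a", "ב": "b", "ג": "g", "ד": "d", "ה": "h", "ו": "w",
    "ז": "z", "ח": "x", "ט": "v", "י": "i", "כ": "k", "ל": "l",
    "מ": "m", "נ": "n", "ס": "s", "ע": "y", "פ": "p", "צ": "c",
    "ק": "q", "ר": "r", "ש": "e", "ת": "t",
}

# Final forms share the Latin character of their regular form.
FINAL_FORMS = {"ך": "כ", "ם": "מ", "ן": "נ", "ף": "פ", "ץ": "צ"}
TO_LATIN.update({final: TO_LATIN[base] for final, base in FINAL_FORMS.items()})


def transliterate(text: str) -> str:
    """Replace each Hebrew letter with its Latin equivalent.

    Assumption: characters outside the table are copied unchanged.
    """
    return "".join(TO_LATIN.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(transliterate("שלום"))  # elwm
```

Because final and regular forms share one Latin character, the mapping is not reversible by simple lookup: recovering Hebrew text from a transliteration requires choosing the final form by position in the word.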


XML Schema for Corpora

The XML schema for the representation of morpho-syntactically annotated Hebrew corpora:

Previous versions:


XML Schema for Lexicons

The XML schema for the representation of morpho-syntactically annotated Hebrew lexicons:

Previous versions: