Abstract
We demonstrate that it is feasible to accurately diacritize Hebrew script without any human-curated resources other than plain diacritized text. We present Nakdimon, a two-layer character-level LSTM, that performs on par with much more complicated curation-dependent systems, across a diverse array of modern Hebrew sources. The model is accompanied by a training set and a test set, collected from diverse sources. –(Gershuni and Pinter 2022)
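To make "no human-curated resources other than plain diacritized text" concrete, here is a minimal sketch of how diacritized text alone could be turned into character-level training pairs: strip the marks to get the model input, and keep the marks that followed each character as its label. This is an illustration, not Nakdimon's actual preprocessing. The helper names (`is_niqqud`, `split_diacritics`, `apply_diacritics`) are hypothetical, and collapsing all marks on a letter into a single label string is a simplifying assumption.

```python
import unicodedata


def is_niqqud(ch: str) -> bool:
    # Assumption: treat nonspacing marks in U+05B0..U+05C7 as Hebrew diacritics.
    # Punctuation in that range (e.g. maqaf, U+05BE) is not category "Mn".
    return "\u05b0" <= ch <= "\u05c7" and unicodedata.category(ch) == "Mn"


def split_diacritics(text: str) -> tuple[str, list[str]]:
    """Split diacritized text into bare characters and per-character labels.

    labels[i] holds the marks that followed bare[i] ("" if there were none).
    """
    bare: list[str] = []
    labels: list[str] = []
    for ch in text:
        if is_niqqud(ch):
            if bare:  # attach the mark to the preceding base character
                labels[-1] += ch
            # a stray mark with no base character is dropped
        else:
            bare.append(ch)
            labels.append("")
    return "".join(bare), labels


def apply_diacritics(bare: str, labels: list[str]) -> str:
    """Inverse of split_diacritics: re-attach each label to its character."""
    return "".join(ch + marks for ch, marks in zip(bare, labels))


# "shalom" with niqqud, written with escapes so the code points are explicit:
# shin + qamats + shin dot, lamed, vav + holam, final mem.
word = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
bare, labels = split_diacritics(word)
```

Under this framing, a character-level sequence model reads `bare` and predicts one entry of `labels` per position, and `apply_diacritics` turns its predictions back into diacritized text.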
Outline
The Paper
References
Gershuni, Elazar, and Yuval Pinter. 2022. “Restoring Hebrew Diacritics Without a Dictionary.” In Findings of the Association for Computational Linguistics: NAACL 2022, 1010–18.
Citation
BibTeX citation:
@online{bochman2021,
author = {Bochman, Oren},
title = {Restoring {Hebrew} {Diacritics} {Without} a {Dictionary}},
date = {2021-05-13},
url = {https://orenbochman.github.io/notes-nlp/reviews/paper/2022-nakdimon/},
langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2021. “Restoring Hebrew Diacritics Without a Dictionary.” May 13, 2021. https://orenbochman.github.io/notes-nlp/reviews/paper/2022-nakdimon/.