Learning Objectives
- Example Sequence Classification/Labeling Tasks
- Overall Framework of Sequence Classification/Labeling
- Sequence Featurization Models (BiRNN, Self Attention, CNNs); see the BiLSTM sketch after this list
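To make the featurization piece concrete, here is a minimal sketch of a BiRNN tagger, assuming PyTorch; the vocabulary size, tag-set size, and dimensions are placeholders rather than values prescribed by the assignment.

```python
# A minimal BiLSTM sequence-labeling sketch (assumes PyTorch is installed).
import torch
import torch.nn as nn


class BiLSTMTagger(nn.Module):
    """Embed tokens, featurize with a bidirectional LSTM, project to tag scores."""

    def __init__(self, vocab_size: int, tagset_size: int,
                 emb_dim: int = 100, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # The bidirectional LSTM produces 2 * hidden_dim features per token.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, tagset_size)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> tag logits: (batch, seq_len, tagset_size)
        features, _ = self.encoder(self.embed(token_ids))
        return self.classifier(features)


if __name__ == "__main__":
    model = BiLSTMTagger(vocab_size=5000, tagset_size=18)  # e.g. 17 UPOS tags + padding
    dummy_batch = torch.randint(1, 5000, (2, 7))           # two sentences of 7 token ids
    print(model(dummy_batch).shape)                        # torch.Size([2, 7, 18])
```

Swapping the LSTM encoder for a self-attention or CNN encoder only changes the featurizer; the embedding layer and the per-token classification head stay the same.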
Some Ideas
- creating a multilingual POS tagger
- using a hierarchical model that does partial pooling to learn from multiple languages when working on a low-resource language (a sketch of this formulation follows the list)
- creating a surrogate, simulated language which:
- has parameters that correspond to the low-resource language, drawn from language databases such as WALS, Ethnologue, and Glottolog. The challenge then becomes how to generate the surrogate language from these parameters: one could try to craft realistic phonetic and morphological rules, or sidestep this complexity and use a simple mathematical construct to create data suitable for learning embeddings (see the generation sketch after the list).
- Use a phrase book as a template for generating texts in the surrogate languages. The outcome should be a dataset of translations of the phrase book into multiple languages. Note that it could also be feasible to generate multiple variants for both the source and target languages to avoid overfitting on the phrase book.
- a prior distribution that follows high-resource languages (i.e. the idea that high-frequency source words are more likely to be translated to high-frequency target words)
- a language model that is trained on the high-resource language and then used to generate the surrogate language
- a model that is trained on the surrogate language and then used to tag the low-resource language
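One conventional way to write the partial-pooling idea above, as a sketch rather than a commitment to specific distributions: each language $\ell$ gets its own parameters $\theta_\ell$, drawn from a shared prior whose hyperparameters are estimated from all languages together.

$$
\begin{aligned}
\mu, \tau &\sim \text{hyperprior} && \text{shared across all languages}\\
\theta_\ell &\sim \mathcal{N}(\mu, \tau^2) && \text{parameters of language } \ell\\
y_{\ell,i} &\sim p(y \mid x_{\ell,i}, \theta_\ell) && \text{tagged sentences in language } \ell
\end{aligned}
$$

Languages with little data are shrunk toward the cross-lingual mean, while well-resourced languages are driven mostly by their own data.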
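The phrase-book and frequency-matching ideas can be sketched together: map each source word to an invented surrogate word and carry the gold POS tags over, so the surrogate corpus inherits the source frequency distribution. Everything below (the toy phrase book, the CV word generator, the tag labels) is an invented placeholder, not part of the assignment data.

```python
# Sketch: generate a surrogate "language" from an annotated phrase book by
# replacing each source word with an invented word, keeping the POS tags.
import random
from collections import Counter

random.seed(0)

# Tiny stand-in for an annotated phrase book: (token, UPOS tag) pairs.
PHRASE_BOOK = [
    [("where", "ADV"), ("is", "AUX"), ("the", "DET"), ("station", "NOUN")],
    [("the", "DET"), ("station", "NOUN"), ("is", "AUX"), ("closed", "ADJ")],
    [("where", "ADV"), ("is", "AUX"), ("the", "DET"), ("market", "NOUN")],
]

CONSONANTS, VOWELS = "ptkmns", "aiou"


def invent_word(length: int = 2) -> str:
    """Invent a CV-syllable word; a stand-in for real morphophonological rules."""
    return "".join(random.choice(CONSONANTS) + random.choice(VOWELS)
                   for _ in range(length))


def build_lexicon(sentences):
    """Build a one-to-one source-to-surrogate lexicon."""
    freq = Counter(tok for sent in sentences for tok, _ in sent)
    ranked = [tok for tok, _ in freq.most_common()]
    surrogates = set()
    while len(surrogates) < len(ranked):
        surrogates.add(invent_word())
    # One-to-one mapping applied to the same sentences, so surrogate word
    # frequencies automatically mirror the source frequencies.
    return dict(zip(ranked, sorted(surrogates)))


def translate(sentences, lexicon):
    """Carry the source POS tags over to the surrogate tokens."""
    return [[(lexicon[tok], tag) for tok, tag in sent] for sent in sentences]


if __name__ == "__main__":
    lexicon = build_lexicon(PHRASE_BOOK)
    for sent in translate(PHRASE_BOOK, lexicon):
        print(sent)
```

Because the mapping is one-to-one over the same corpus, this is the simplest realization of the frequency-matching prior; richer variants could add morphology, word-order changes drawn from the typological parameters, or noise on both the source and surrogate sides.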