Introduction

Supervised and unsupervised models work well for most NLP tasks, yet both approaches have their limitations. Supervised models require labeled data, which is usually in short supply, while unsupervised models often lack the precision needed for many applications.

Large language models like GPT-3 give us a way to generate text on demand. However, they are weak when generating from sparse data, where they tend to hallucinate, and they can be biased.

Perhaps RL, which can learn from mistakes and can use both supervised and unsupervised learning for its representations, can bridge the gaps that LLMs do not address well.

The main challenge in RL, however, is the difficulty of transfer learning, that is, generalizing between similar or related tasks. I believe that transfer learning is somewhat easier within the NLP domain than in other domains, as the representations learned from one task can be used in another task.

NLP is a domain in which skills learned in one task may be transferable to other tasks.
Multi-goal learning can be used to address the transfer problem.
Meta learning can be used to learn from multiple tasks and generalize to new tasks.
Curriculum learning can be applied to:
- identify weaknesses in the model
- collect more samples to address those weaknesses
- collect appropriate samples to correct for biases that emerge in the model
Evolving language from scratch using Lewis signaling games and their extensions can also be viewed as a form of meta learning.
Can we define an abstract Hamiltonian that can be used for energy/entropy-based generation of text, using the Hamiltonian of the prompts/context?
- Can we define a minimalist grammar using this abstract formalism?
- Can we define a resonant solution for multiple Hamiltonians that interact on different levels?
- Can we make this a good fit for multi-headed attention, perhaps analogous to how finite-state machine morphology was simplified by understanding that the FSM can be represented as a regular expression and that the generations were bounded by the lexicon? In other words, can we create a Hamiltonian that introduces constraints on the generation of text using different heads of the transformer model, and thereby places bounds on the computational complexity of the model?
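
To make the Lewis signaling game mentioned above more concrete, here is a minimal sketch of a two-agent game trained with simple urn-style (Roth-Erev) reinforcement. The function names, the number of states and signals, and the choice of learning rule are all assumptions made for illustration; the post does not commit to a specific setup.

```python
import random


def train_lewis_game(n_states=3, n_signals=3, rounds=5000, seed=0):
    """Train a sender and a receiver with urn-style reinforcement.

    Hypothetical illustration: each agent keeps a table of weights and
    samples its choice in proportion to them. Successful rounds add weight
    to the choices that were made.
    """
    rng = random.Random(seed)
    # sender[state][signal]: propensity to emit `signal` when seeing `state`
    sender = [[1.0] * n_signals for _ in range(n_states)]
    # receiver[signal][action]: propensity to guess `action` on hearing `signal`
    receiver = [[1.0] * n_states for _ in range(n_signals)]

    for _ in range(rounds):
        state = rng.randrange(n_states)
        signal = rng.choices(range(n_signals), weights=sender[state])[0]
        action = rng.choices(range(n_states), weights=receiver[signal])[0]
        if action == state:  # shared reward: both agents are reinforced
            sender[state][signal] += 1.0
            receiver[signal][action] += 1.0
    return sender, receiver


def success_rate(sender, receiver, trials=2000, seed=1):
    """Estimate how often the receiver recovers the sender's state."""
    rng = random.Random(seed)
    n_states = len(sender)
    wins = 0
    for _ in range(trials):
        state = rng.randrange(n_states)
        signal = rng.choices(range(len(sender[state])), weights=sender[state])[0]
        action = rng.choices(range(len(receiver[signal])), weights=receiver[signal])[0]
        wins += action == state
    return wins / trials


if __name__ == "__main__":
    sender, receiver = train_lewis_game()
    print("communication success rate:", success_rate(sender, receiver))
```

The only learning signal here is the shared success reward, so any signaling convention that emerges is invented by the agents rather than given to them. Extensions such as larger state spaces or compositional signals would need a richer setup than this sketch.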

Rewards for different tasks can be defined in different ways, and the reward function can be used to guide the model to learn the task. At the start, the reward function seems to be the greatest unknown. I believe that this will be the most interesting part of the project and perhaps a driver for innovation.
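
As a rough illustration of per-task rewards, the sketch below registers one reward function per task and dispatches on the task name. The task names (`question_answering`, `summarization`) and the two scoring rules (exact match and token-level F1) are assumptions chosen for the example, not reward designs proposed in this post.

```python
from collections import Counter
from typing import Callable, Dict

# A reward function maps (model output, reference) to a scalar reward.
RewardFn = Callable[[str, str], float]


def exact_match_reward(output: str, reference: str) -> float:
    """Return 1.0 if the normalized strings match, otherwise 0.0."""
    return 1.0 if output.strip().lower() == reference.strip().lower() else 0.0


def token_f1_reward(output: str, reference: str) -> float:
    """Token-overlap F1 between output and reference, in [0, 1]."""
    out_tokens = output.lower().split()
    ref_tokens = reference.lower().split()
    if not out_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(out_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(out_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical task registry: the names are illustrative only.
REWARDS: Dict[str, RewardFn] = {
    "question_answering": exact_match_reward,
    "summarization": token_f1_reward,
}


def reward(task: str, output: str, reference: str) -> float:
    """Look up the reward function for `task` and score the output."""
    return REWARDS[task](output, reference)


if __name__ == "__main__":
    print(reward("question_answering", "Paris", " paris "))
    print(reward("summarization", "the cat sat", "the cat slept"))
```

Keeping the reward behind a single `reward(task, ...)` call means the agent's training loop does not change when a task's reward definition is revised, which may matter if the reward function is the part of the project most likely to be reworked.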

Tasks and components

Creating a minimalist language model that allows for pretraining and fine-tuning with small payloads of data generated by the RL agents.
Augmenting the LLM with sophisticated embeddings that are most amenable to transfer learning.

Citation

BibTeX citation:

@online{bochman2025,
  author = {Bochman, Oren},
  title = {NLP with {RL}},
  date = {2025-01-13},
  url = {https://orenbochman.github.io/posts/2024/2024-09-30-LLMs/rl.html},
  langid = {en}
}

For attribution, please cite this work as:

Bochman, Oren. 2025. “NLP with RL.” January 13, 2025. https://orenbochman.github.io/posts/2024/2024-09-30-LLMs/rl.html.