Author

Oren Bochman

Published

Friday, December 20, 2024

Fine tuning LLM for Readability

Question 1: LLM are amazing - can I fine tune a state of the art LLM like Lama 3.1 with what I consider to be very high quality writing from my private library content to create a writing assistant?

Can I do it in a way that is aware of the writer’s style and the text domain other high level features so that the prompt can be used to condition the output this way.

Can I test it on wikipedia articles and see if it can improve the readability of existing and new articles?

Learning to write well

learning to write from:

  • “Best American Science Writing” series
  • Bast papers - sourced from leading conferences
  • Authors
    • Oliver Sacks,
    • Natalie Angier,
    • Alan Lightman,
    • Sylvia Nasar,
    • Matt Ridley
      • Genome
      • Red Queen
      • Viral
    • Steven Pinker
      • How the Mind Works
      • The Blank Slate: The Modern Denial of Human Nature
      • The Language Instinct: How the Mind Creates Language
      • The Secret Life of Verbs
      • Words and Rules
      • Hotheads
    • Richard Dawkins
      • Selfish gene
  • Great Explianers
    • Richard Feynman
      • lectures on physics
      • QED
      • The pleasure of finding things out
      • The character of physical law
      • Surely you’re joking
      • What do you care about what other people think
      • Six easy pieces
    • Levitt and dunbar
      • Freakonomics
    • Michel Foucout
      • Discipline and Punish
      • The Birth of the Clinic
      • The Order of Things
      • The Archaeology of Knowledge
      • Madness and Civilization
    • Leonard Susskind
      • Theoretical Minimum
    • Richard Hawkins
    • Jared Diamond
      • Guns, Germs, and Steel
      • Collapse: How Societies Choose to Fail or Succeed
      • The World Until Yesterday: What Can We Learn from Traditional Societies?
      • The Invisible Hands: Top Hedge Fund Traders on Bubbles, Crashes, and Real Money
      • etc
    • C.S. Lewis
      • A Grief Observed
      • The Problem of Pain
      • The Screwtape Letters
      • The Great Divorce
      • Mere Christianity
      • A Preface to Paradise Lost
    • Eric Metaxas
      • Martin Luther
      • Bonhoffer
      • Discussing Mere Chritianity
    • Yuval Noah Harari
      • Sapiens - A brief history of mankind
      • Homo Deus
      • 21 Lessons for the 21st Century Audiobook
    • Primo Levi
      • The Periodic Kingdom
    • Mcluhan Marshall
      • The Medium Is The Massage
    • Empire of the Summer moon
    • Hidden Figures
    • Art of War
    • Book of five rings
    • Adam Smith
    • The pencil
    • Dan Ariely
    • Chris Anderson
      • The long tail
    • Plato

    • Aristotle

    • Machiavelli
      • The prince
    • James Surowiecki
      • The Wisdom of Crowds
    • Robert A. Caro
      • The Power Broker: Robert Moses and the Fall of New York
      • Working
      • Master of the Senate
    • Stephen Jay Gould
    • Ian Ayres
      • Super Crunchers -
    • Giles Milton
      • Nathaniel’s Nutmeg
      • D-Day The Soldiers’ Story
      • When Hitler Took Cocaine and Lenin Lost His Brain
      • Fascinating Footnotes From History
      • Churchill’s Ministry of Ungentlemanly Warfare
      • Wolfram The Boy Who Went to War
      • The Extraordinary Story of Thomas Pellow and Islam’s One Million White Slaves
      • The Stalin Affair: The Impossible Alliance that Won the War
      • Samurai William: The Englishman Who Opened Japan
      • Edward Trencom’s Nose
      • Russian Roulette - A Deadly Game: How British Spies Thwarted Lenin’s Global Plot

etc

the ideas here are :

  1. the primary text
  2. the wikipedia article on
  3. summaries

where we want to focus on the primary text but and also to highlight its structure

The primary text has lots of words but one top level structure a few chapter level structure many paragraph level structure

idealy we want to learn structures:

\text{paragraph} \to \cdots \to \text{top level} and idealy we would like to learn to research

i.e. source the ‘facts’ from reliable sources which we cite inline.

teach an LLM to rewrite text with high fidelity yet increase thier readability.

high quaity data sets:

  1. wikipedia v.s. higher quality e.g. britanica, or others
  • check this isn’t the best of wikipedia
  • check the citations (is there a significant overlap we are in the ballpark)

Citation

BibTeX citation:
@online{bochman2024,
  author = {Bochman, Oren},
  title = {Fine-Tune Llm for {Style} and {Grammar} Advice.},
  date = {2024-12-20},
  url = {https://orenbochman.github.io/posts/2024/2024-09-30-LLMs/readability.html},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2024. “Fine-Tune Llm for Style and Grammar Advice.” December 20, 2024. https://orenbochman.github.io/posts/2024/2024-09-30-LLMs/readability.html.