The missing link is my name for a set of agents that should be able to edit wikipedia or at least to significantly reduce the effort needed to contribute to wikipedia.
- Wikipedia has a number of task and challenges
- Wikipedia also offers unique opportunities for learning not available elsewhere (edit histories, talk pages, etc.)
tasks:
- Wikification - use entropy maximize the entropy and mutual information of the wiki - i.e. choose links to other articles that are most likely to be clicked on rather than the the most most famous or like USA - which contributes no information to the reader.
- inlining citations
- adding missing references
- adding missing sections across languages
- Improving readability
- most wikipedia articles are poorly written when compared with the best science writing in the world.
- Addressing biases and COI issues. [^we nay need to train the LLM on material that does not include wikipedia or to create a version that can separate wikipedia and non wikipedia material possibly using CLIP?]
- with the advent of LLM we can now collect all the material in an articles Sources and use it to rewrite a more complete article and perhaps one with fewer biases.1 Further more it is fairly easy to source additional material from the web and other sources and thus again allowing a second view of the the articles point of view.
- Addressing vandalism and spam - this can be learned across articles
- Extracting wikidata from articles again this can be learned across many articles by mapping the article to the wikidata entries of the primary and secondary entities.
- Replace low register terms with high register terms - with an eye to improving readability. One hopes that the higher register terms are more precise and less ambiguous.
- Replace highly ambiguous terms with less ambiguous terms. The same perhaps for sentences.
- Make use of other media - diagrams, maths, code, images, videos, maps and so on should be more than referenced in the text.
1 LLM inherit and amplify biases from thier training material, so this aspect is an area of active research and may require some creativity
Citation
BibTeX citation:
@online{bochman2024,
author = {Bochman, Oren},
title = {LLM and the Missing Link},
date = {2024-12-20},
url = {https://orenbochman.github.io/posts/2024/2024-09-30-LLMs/the-missing-link.html},
langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2024. “LLM and the Missing Link.” December
20, 2024. https://orenbochman.github.io/posts/2024/2024-09-30-LLMs/the-missing-link.html.