Oren Bochman’s Blog
Home
About
Source Code
Report a Bug
Archive
Notes
Bayesian Specialization
Reinforcement Learning Specialization
Model Thinking
NLP Specialization
AB testing
Posts
2011
Text Mining With Python
Tidy Text Mining With R
Time management Tips
Text Mining With R
2012
Wikisym 2012
2013
life hacks
2014
2014 10 06 FinTech
2015
2015 02 07 Analytics Checklist
Analytics Checklist
2015 02 07 Optimal Bidding
2015 04 20 All Things Data
HotJar Heat Map Analysis - Dr. David Darmanin
Using Competitive Analysis to Benchmark Your Marketing Efforts Ariel Rosenstein - Similar Web
Using Competitive Analysis to Benchmark Your Marketing Efforts - Ariel Rosenstein - Similar Web
2016
Travel checklist
2017
A/B testing cost and risks?
2018
text annotation with BRAT
2019
Exploding and vanishing nodes.
Docker for data science
2020
Deep Learning Intuitions
brace expansion
How to avoid cross site scripting (XSS) errors with the Jupyter local runtime for Colab
numpy melt down
Meme bank
Pandas Productivity Challenge?
2021
Storytelling and other essentials
Inlining Citations for Wikipedia articles
json-ld
Ebook Hacks
Language models and explainability
Automatic Summarization Task
What is in a citation?
Advertising Models
TensorFlow probability
Bayesian agents
10 Tips To Improve Your Workflow
Modeling Events
Getting more from your agency?
Q&A and the Winograd schemas
A type of Witness and an evolving Idiom
Customer Lifetime Value - Pareto/NBD (BTYD) Model
Multilevel Models
Hackathon session link dumps & notes
Excel 2019 for Marketing Statistics in pandas
WaveNet
Python Graphs
Transfer learning in NLP
2021 12 07 Attention for Sensor Fusion
Attention for sensor fusion
2022
Robust Regression
Set Up M1 MacBooks for DS & ML
2022 04 01 Bandits
2022 05 05 Command Line
command line
2022 09 16 Adaptive Learning Rate
2022 09 16 Loss Engineering
Loss engineering and uncertainty for multi-task learning
2022 09 22 Entropy for Uncertainty Quantification
entropy for uncertainty quantification
2023
The Great Migration
AutoGluon Cheatsheets
Quarto loves pseudocode
MCMC algorithms
2023 02 01 Ds from Scratch
OLS regression From Scratch
2023 02 20 Ts Nonlinear
2023 02 28 NLP.IL Booking.com
Text2topic Leverage reviews data for multi-label topics classification in Booking.com
Validating NLP data and models
2023 03 01 Braindump
2023 03 01 Spark Emr
2023 03 08 Responsible AI
2023 06 01 Spark
Spark Tips
2023 06 01 Synthesis and Stabilization
Summary: Synthesis and Stabilization of Complex Behaviors through Online Trajectory Optimization
S3 Series
2024
A definition by Patrick Henry Winston
OCR building blocks
readings in rl
Stumpy
OCR - Brain Dump
Fine-tune llm for Style and Grammar advice.
Evolutionary Games and Population Dynamics Summary
Risk-constrained Markov decision processes
D3.js in Quarto Observable
SuperLearner
More Sugar please
NLP with RL
Signals Experiment
Villeny pure and simple
Transformations in Linguistic Representation
Shannon Game
😁 Quarto 💖 Mermaid🧜 Mindmaps 🧠
Post With Code
Vitter’s Algorithm
Understanding Emergent Languages
replay buffer questions
Mesa Lessons
LLM the good the bad and the ugly
RAD REPL
Sugar Scapes
Lewis Signaling Game for PettingZoo
Deduction Evaluation
Is compositionality overrated? The view from language emergence
TL-DR rethinking 💭 topological alignment
LLM and the missing link
Six quick tips to improve modeling
Lewis Game from a Bayesian Perspective
ad hoc complex signaling systems
event generator
two ideas on generalization
2024 02 01 Quarto Bootstrap
2024 02 19 Rhetoric
Rhetoric NLP Tasks
2024 05 02 Signaling Games Tikz
2024 05 03 Urn Models
Urn models using Numpy
2024 05 04 Signals Bib
2024 05 09 Roth Erev RL
Roth Erev learning in Lewis signaling games
2024 06 01 Bayesian Agents
2024 06 12 Logic Puzzles
2024 06 13 Hyper
Hyperparameter Optimization
2024 06 23 Zero Inflated Data
zero inflated data
2024 06 25 Mesa Rl
Mesa & RL
Misbehavior of Markets and Scaling in financial prices 1-4
Scaling in financial prices 2
Scaling in financial prices 3
Scaling in financial prices 4
Scaling in financial prices 1
2025
Complex Signals Questions
Updates to the github action
Lessons learnt in optimizing a large-scale pandas application using Polars, FireDucks and cuDF: Go Smart and Save More!
Realtime Financial Fraud Detection with Modern Python
Combining Zarr, HDF5, and TIFF into a single data format
FlexAttention: A Flexible Approach to Attention Mechanisms
Where Have All the Metrics Gone?
Baseline Morphology Model
Reviving Survival Analysis: Timeless, Yet Overlooked?
Decisions Under Uncertainty: A Hands‑On Guide to Bayesian Decision Theory
torchTextClassifiers : Modernizing Text classification for French National Statistics
Harnessing Generative Models for Synthetic Non-Life Insurance Data
Books, Courses Tools
GPU Accelerated Zarr
Automating ML with PyCaret: Train & Compare Multiple Models to Find the Best Performer
Rethinking Signaling systems via the lens of compositionality
Time series analysis for coupled neurons.
Building LLM-Powered Applications for Data Scientists and Software Engineers
Engineering Reinforcement Learning Algorithms
FlexAttention: A Flexible Approach to Attention Mechanisms
projspec: what’s this project anyway?
Emergent Languages
Vibe coding GPT5 Edition
Python Meets Excel: Smarter Workflows for Analysts and Data Teams
The Referential Lewis Signaling Game
ShinyLive ❤️ Mesa Tutorial
Building a Lightweight Feature Store for Electricity Grid Forecasts with Polars
The Lifecycle of a Jupyter Environment - From Exploration to Production-Grade Pipelines
When the Meter Maxes Out: Chernobyl Disaster Lessons for ML Systems in Production
AI a bag of tricks
Using Traditional AI and LLMs to Automate Complex and Critical Documents in Healthcare
When AI Makes Things Up: Understanding and Tackling Hallucinations
Langtalks Resources # 43
Planning in the Complex Lewis Game
Python Worst Practices - Learn from the Expert
Scaling Fuzzy Product Matching with BM25: A Comparative Study of Python and Database Solutions
Garbage In, Lawsuit Out: Building Compliant and Reproducible ML Pipelines
Optimal Variable Binning in Logistic Regression
Hands-on with Blosc2: Accelerating Your Python Data Workflows
Designing a Fast, Offline-Capable Reverse Geocoder in Python: An Open Source Alternative to Big Geo APIs
How to Effectively use text embeddings in tree based models
I Built a Transformer from Scratch So You Don’t Have To
Allegations of War Crimes and the Palestinian Genocide
From Feature Engineering to Context Engineering for Agents
Probabilistic Modeling with Language Models
Using MCP to turn Claude into a Football Opposition Analyst
Bodo DataFrames: a fast and scalable HPC-based drop-in replacement for Pandas
Complex Lewis Signaling - The Research Questions
The roles of Partial pooling and mixed strategies in the Lewis signaling game
Stochastic Gradient Descent – a Deep Dive
A garden of forking paths
Optimizing AI/ML Workloads: Resource Management and Cost Attribution
From Ideas to APIs: Delivering Fast with Modern Python
Scaling Data Processing for LLMs with NeMo Curator
Off-Policy Learning
I Built a Transformer from Scratch So You Don’t Have To
I Built a Transformer from Scratch So You Don’t Have To
FlexAttention: A Flexible Approach to Attention Mechanisms
LLMs, Chatbots, and Dashboards: Visualize Your Data with Natural Language
How Big are SLMs
PyData/Sparse & Finch - Extending sparse computing in the Python ecosystem
Marketing Mix Model
The Many Paths To A Signaling System
Podcast
Archive
I Built a Transformer from Scratch So You Don’t Have To
I Built a Transformer from Scratch So You Don’t Have To
Realtime Financial Fraud Detection with Modern Python
Combining Zarr, HDF5, and TIFF into a single data format
Friday, December 12, 2025
GPU Accelerated Zarr
Friday, December 12, 2025
Bodo DataFrames: a fast and scalable HPC-based drop-in replacement for Pandas
Friday, December 12, 2025
Garbage In, Lawsuit Out: Building Compliant and Reproducible ML Pipelines
Friday, December 12, 2025
Building a Lightweight Feature Store for Electricity Grid Forecasts with Polars
Friday, December 12, 2025
I Built a Transformer from Scratch So You Don’t Have To
Friday, December 12, 2025
Scaling Data Processing for LLMs with NeMo Curator
Friday, December 12, 2025
PyData/Sparse & Finch - Extending sparse computing in the Python ecosystem
Friday, December 12, 2025
How to Effectively use text embeddings in tree based models
Friday, December 12, 2025
When the Meter Maxes Out: Chernobyl Disaster Lessons for ML Systems in Production
Friday, December 12, 2025
Automating ML with PyCaret: Train & Compare Multiple Models to Find the Best Performer
Thursday, December 11, 2025
How Big are SLMs
Thursday, December 11, 2025
Hands-on with Blosc2: Accelerating Your Python Data Workflows
Wednesday, December 10, 2025
Decisions Under Uncertainty: A Hands‑On Guide to Bayesian Decision Theory
Wednesday, December 10, 2025
From Ideas to APIs: Delivering Fast with Modern Python
Wednesday, December 10, 2025
Optimal Variable Binning in Logistic Regression
Wednesday, December 10, 2025
Optimizing AI/ML Workloads: Resource Management and Cost Attribution
Wednesday, December 10, 2025
Reviving Survival Analysis: Timeless, Yet Overlooked?
Wednesday, December 10, 2025
Time series analysis for coupled neurons.
Wednesday, December 10, 2025
Using MCP to turn Claude into a Football Opposition Analyst
Wednesday, December 10, 2025
Using Traditional AI and LLMs to Automate Complex and Critical Documents in Healthcare
Tuesday, December 9, 2025
Building LLM-Powered Applications for Data Scientists and Software Engineers
Tuesday, December 9, 2025
From Feature Engineering to Context Engineering for Agents
Tuesday, December 9, 2025
The Lifecycle of a Jupyter Environment - From Exploration to Production-Grade Pipelines
Tuesday, December 9, 2025
LLMs, Chatbots, and Dashboards: Visualize Your Data with Natural Language
Tuesday, December 9, 2025
Lessons learnt in optimizing a large-scale pandas application using Polars, FireDucks and cuDF: Go Smart and Save More!
Tuesday, December 9, 2025
projspec: what’s this project anyway?
Tuesday, December 9, 2025
Python Meets Excel: Smarter Workflows for Analysts and Data Teams
Tuesday, December 9, 2025
Python Worst Practices - Learn from the Expert
Tuesday, December 9, 2025
Designing a Fast, Offline-Capable Reverse Geocoder in Python: An Open Source Alternative to Big Geo APIs
Tuesday, December 9, 2025
Scaling Fuzzy Product Matching with BM25: A Comparative Study of Python and Database Solutions
Tuesday, December 9, 2025
Harnessing Generative Models for Synthetic Non-Life Insurance Data
Tuesday, December 9, 2025
torchTextClassifiers : Modernizing Text classification for French National Statistics
Tuesday, December 9, 2025
When AI Makes Things Up: Understanding and Tackling Hallucinations
Tuesday, December 9, 2025
Where Have All the Metrics Gone?
Tuesday, December 9, 2025
Stochastic Gradient Descent – a Deep Dive
Thursday, October 9, 2025
Updates to the github action
Wednesday, October 1, 2025
Vibe coding GPT5 Edition
Thursday, September 25, 2025
FlexAttention: A Flexible Approach to Attention Mechanisms
Saturday, September 20, 2025
FlexAttention: A Flexible Approach to Attention Mechanisms
Saturday, September 20, 2025
FlexAttention: A Flexible Approach to Attention Mechanisms
Saturday, September 20, 2025
Probabilistic Modeling with Language Models
Sunday, September 14, 2025
Marketing Mix Model
Wednesday, September 10, 2025
Allegations of War Crimes and the Palestinian Genocide
Sunday, September 7, 2025
AI a bag of tricks
Thursday, September 4, 2025
ShinyLive ❤️ Mesa Tutorial
Tuesday, September 2, 2025
Langtalks Resources # 43
Thursday, April 3, 2025
Baseline Morphology Model
Wednesday, April 2, 2025
Complex Lewis Signaling - The Research Questions
Wednesday, April 2, 2025
The roles of Partial pooling and mixed strategies in the Lewis signaling game
Tuesday, March 11, 2025
Emergent Languages
Tuesday, January 14, 2025
Planning in the Complex Lewis Game
Tuesday, January 14, 2025
The Referential Lewis Signaling Game
Tuesday, January 14, 2025
A garden of forking paths
Saturday, January 11, 2025
Complex Signals Questions
Monday, January 6, 2025
The Many Paths To A Signaling System
Sunday, January 5, 2025
Off-Policy Learning
Saturday, January 4, 2025
Rethinking Signaling systems via the lens of compositionality
Thursday, January 2, 2025
Lewis Signaling Game for PettingZoo
Wednesday, January 1, 2025
Books, Courses Tools
Wednesday, January 1, 2025
Villeny pure and simple
Thursday, December 12, 2024
Misbehavior of Markets and Scaling in financial prices 1-4
Monday, December 2, 2024
Scaling in financial prices 4
Sunday, December 1, 2024
Scaling in financial prices 3
Saturday, November 30, 2024
Scaling in financial prices 2
Friday, November 29, 2024
Scaling in financial prices 1
Thursday, November 28, 2024
Vitter’s Algorithm
Friday, October 11, 2024
TL-DR rethinking 💭 topological alignment
Tuesday, October 1, 2024
LLM the good the bad and the ugly
Monday, September 30, 2024
LLM and the missing link
Saturday, September 28, 2024
NLP with RL
Friday, September 27, 2024
Deduction Evaluation
Thursday, September 26, 2024
Fine-tune llm for Style and Grammar advice.
Wednesday, September 25, 2024
Is compositionality overrated? The view from language emergence
Sunday, September 1, 2024
Six quick tips to improve modeling
Monday, August 26, 2024
Stumpy
Thursday, August 8, 2024
replay buffer questions
Tuesday, July 2, 2024
two ideas on generalization
Monday, July 1, 2024
Mesa & RL
Tuesday, June 25, 2024
zero inflated data
Sunday, June 23, 2024
readings in rl
Tuesday, June 18, 2024
Hyperparameter Optimization
Thursday, June 13, 2024
More Sugar please
Tuesday, June 11, 2024
Risk-constrained Markov decision processes
Tuesday, June 11, 2024
Evolutionary Games and Population Dynamics Summary
Sunday, May 12, 2024
Roth Erev learning in Lewis signaling games
Thursday, May 9, 2024
Signals Experiment
Tuesday, May 7, 2024
ad hoc complex signaling systems
Sunday, May 5, 2024
Shannon Game
Thursday, May 2, 2024
Urn models using Numpy
Thursday, May 2, 2024
RAD REPL
Wednesday, May 1, 2024
Mesa Lessons
Sunday, March 31, 2024
Sugar Scapes
Sunday, March 31, 2024
OCR building blocks
Thursday, March 28, 2024
A definition by Patrick Henry Winston
Sunday, March 3, 2024
OCR - Brain Dump
Sunday, February 25, 2024
Rhetoric NLP Tasks
Saturday, February 17, 2024
😁 Quarto 💖 Mermaid🧜 Mindmaps 🧠
Monday, February 12, 2024
Lewis Game from a Bayesian Perspective
Monday, February 12, 2024
The Great Migration
Tuesday, January 30, 2024
Post With Code
Sunday, January 28, 2024
SuperLearner
Wednesday, January 10, 2024
Engineering Reinforcement Learning Algorithms
Wednesday, January 10, 2024
Understanding Emergent Languages
Thursday, January 4, 2024
D3.js in Quarto Observable
Tuesday, January 2, 2024
AutoGluon Cheatsheets
Wednesday, December 20, 2023
Summary: Synthesis and Stabilization of Complex Behaviors through Online Trajectory Optimization
Thursday, June 1, 2023
Spark Tips
Thursday, June 1, 2023
MCMC algorithms
Saturday, April 22, 2023
Quarto loves pseudocode
Tuesday, April 11, 2023
Text2topic Leverage reviews data for multi-label topics classification in Booking.com
Tuesday, February 28, 2023
Validating NLP data and models
Tuesday, February 28, 2023
Transformations in Linguistic Representation
Wednesday, February 22, 2023
event generator
Thursday, February 16, 2023
OLS regression From Scratch
Wednesday, February 1, 2023
entropy for uncertainty quantification
Thursday, September 22, 2022
Robust Regression
Monday, September 12, 2022
Loss engineering and uncertainty for multi-task learning
Monday, September 12, 2022
Wikisym 2012
Tuesday, July 26, 2022
Set Up M1 MacBooks for DS & ML
Thursday, May 5, 2022
command line
Thursday, May 5, 2022
Meme bank
Thursday, December 30, 2021
Getting more from your agency?
Monday, September 27, 2021
Excel 2019 for Marketing Statistics in pandas
Friday, September 24, 2021
Language models and explainability
Friday, September 24, 2021
Attention for sensor fusion
Friday, September 24, 2021
Advertising Models
Tuesday, September 14, 2021
Customer Lifetime Value - Pareto/NBD (BTYD) Model
Tuesday, September 14, 2021
Storytelling and other essentials
Thursday, September 2, 2021
WaveNet
Sunday, August 29, 2021
Python Graphs
Sunday, August 29, 2021
What is in a citation?
Sunday, August 29, 2021
Hackathon session link dumps & notes
Friday, August 13, 2021
Inlining Citations for Wikipedia articles
Friday, August 13, 2021
Transfer learning in NLP
Friday, August 13, 2021
A type of Witness and an evolving Idiom
Wednesday, July 14, 2021
json-ld
Thursday, July 1, 2021
TensorFlow probability
Tuesday, June 1, 2021
Ebook Hacks
Saturday, May 29, 2021
Multilevel Models
Sunday, May 16, 2021
Q&A and the Winograd schemas
Tuesday, April 27, 2021
Automatic Summarization Task
Saturday, April 24, 2021
Bayesian agents
Wednesday, April 14, 2021
Modeling Events
Friday, April 9, 2021
10 Tips To Improve Your Workflow
Wednesday, April 7, 2021
numpy melt down
Sunday, November 29, 2020
Deep Learning Intuitions
Sunday, October 25, 2020
brace expansion
Friday, June 12, 2020
Pandas Productivity Challenge?
Wednesday, March 4, 2020
How to avoid cross site scripting (XSS) errors with the Jupyter local runtime for Colab
Thursday, February 20, 2020
Docker for data science
Sunday, November 24, 2019
Exploding and vanishing nodes.
Wednesday, July 31, 2019
text annotation with BRAT
Tuesday, January 16, 2018
A/B testing cost and risks?
Sunday, July 30, 2017
Travel checklist
Wednesday, December 14, 2016
HotJar Heat Map Analysis - Dr. David Darmanin
Wednesday, April 20, 2016
Using Competitive Analysis to Benchmark Your Marketing Efforts Ariel Rosenstein - Similar Web
Monday, April 20, 2015
Using Competitive Analysis to Benchmark Your Marketing Efforts - Ariel Rosenstein - Similar Web
Monday, April 20, 2015
Analytics Checklist
Saturday, February 7, 2015
life hacks
Friday, June 7, 2013
Text Mining With Python
Tuesday, November 29, 2011
Text Mining With R
Tuesday, November 29, 2011
Tidy Text Mining With R
Tuesday, November 29, 2011
Time management Tips
Thursday, August 11, 2011