Building LLM-Powered Applications for Data Scientists and Software Engineers

PyData Global 2025 Recap

A detailed recap of the PyData Global 2025 workshop on building LLM-powered applications, focusing on software engineering principles, prompt engineering, monitoring, and practical application development.
PyData
Generative AI
Large Language Models
Software Engineering
Data Science
Author

Oren Bochman

Published

Tuesday, December 9, 2025

Keywords

PyData, Generative AI, Large Language Models, Software Engineering, Data Science

pydata global

Workshop Overview

This workshop is designed to equip software engineers with the skills to build and iterate on generative AI-powered applications. Participants will explore key components of the AI software development lifecycle through first principles thinking, including prompt engineering, monitoring, evaluations, and handling non-determinism. The session focuses on using LLMs to build applications, such as querying PDFs, while providing insights into the engineering challenges unique to AI systems. By the end of the workshop, participants will know how to build a PDF-querying app, but all techniques learned will be generalizable for building a variety of generative AI applications.

If you’re a data scientist, machine learning practitioner, or AI enthusiast, this workshop can also be valuable for learning about the software engineering aspects of AI applications, such as lifecycle management, iterative development, and monitoring, which are critical for production-level AI systems.

What You’ll Learn:
  • How to integrate AI models and APIs into a practical application.
  • Techniques to manage non-determinism and optimize outputs through prompt engineering.
  • How to monitor, log, and evaluate AI systems to ensure reliability.
  • The importance of handling structured outputs and using function calling in AI models.
  • The software engineering side of building AI systems, including iterative development, debugging, and performance monitoring.
  • Practical experience in building an app to query PDFs using multimodal models.

What is Unique About This Session:

This workshop bridges the gap between software engineering and generative AI development. While most AI workshops focus solely on model usage or tuning, this session emphasizes the entire AI software lifecycle — from prompt engineering to monitoring and tracing. Participants will learn how to manage non-determinism and create production-ready AI applications, giving them the knowledge to tackle the software engineering challenges of AI-powered apps. The hands-on approach ensures that attendees walk away with practical skills and a functional app.

Prerequisites:
  • Basic programming knowledge in Python.
  • Familiarity with REST APIs.
  • Experience working with Jupyter Notebooks or similar environments (preferred but not required).
  • No prior experience with AI or machine learning is required.
  • Most importantly, a sense of curiosity and a desire to learn!
  • If you have a background in data science, ML, or AI, this workshop will help you understand the software engineering side of building AI applications.

Tools and Frameworks:

We will introduce you to some modern frameworks in the workshop, but the emphasis will be on first principles and on using vanilla Python and LLM calls to build AI-powered systems.

workshop repo

Speakers:

Hugo Bowne-Anderson

Hugo Bowne-Anderson is an independent data and AI consultant with extensive experience in the tech industry. He is the host of the industry podcast Vanishing Gradients, where he explores cutting-edge developments in data science and artificial intelligence.

As a data scientist, educator, evangelist, content marketer, and strategist, Hugo has worked with leading companies in the field. His past roles include Head of Developer Relations at Outerbounds, a company committed to building infrastructure for machine learning applications, and positions at Coiled and DataCamp, where he focused on scaling data science and online education respectively.

Hugo’s teaching experience spans from institutions like Yale University and Cold Spring Harbor Laboratory to conferences such as SciPy, PyCon, and ODSC. He has also worked with organizations like Data Carpentry to promote data literacy. His impact on data science education is significant, having developed over 30 courses on the DataCamp platform that have reached more than 3 million learners worldwide. Hugo also created and hosted the popular weekly data industry podcast DataFramed for two years.

Committed to democratizing data skills and access to data science tools, Hugo advocates for open source software both for individuals and enterprises.


Slide 01 - About the Workshop

Slide 02 - Session Flow

Slide 03 - ChatGPT

Slide 04 - Chat with Claude

Slide 05 - Session Flow

Slide 06 - Action

Slide 07 - What can an LLM do?

Slide 08 - Goals

Slide 09 - Augmented LLM (Anthropic)

cf. Anthropic's “Building effective agents”

Slide 10 - AI POC

Slide 11 - 5-line RAG
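
The slide's five lines are not reproduced in this recap. As a rough illustration of the idea in vanilla Python, the sketch below picks the chunk of a document that shares the most words with the question and pastes it into a prompt. The names `split_sentences`, `retrieve`, and `build_prompt`, the word-overlap scoring, and the sample text are all assumptions made for illustration; a real system would more likely rank chunks with embeddings and then send the prompt to an LLM API.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text into sentence-sized chunks (a crude stand-in for PDF chunking)."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks with the largest word overlap with the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the retrieved context and the question into a single prompt."""
    joined = "\n---\n".join(context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}"
    )


# Hypothetical text standing in for the contents of a PDF.
document = (
    "The workshop covers prompt engineering and monitoring. "
    "Gradio is used to rebuild the front end of the app. "
    "Logs are stored so that outputs can be inspected later."
)
chunks = split_sentences(document)
question = "What is Gradio used for?"
prompt = build_prompt(question, retrieve(question, chunks, k=1))
print(prompt)  # this string is what would be sent to the LLM
```

Printing the prompt before sending it is in the spirit of the "show me the prompt" advice: the retrieval step is only as good as the text that actually reaches the model.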

Slide 12 - output

Slide 13 - How To Improve

Slide 14 - Show me the prompt

Tool Tip: GitHub Codespaces!
  • Can I use this as a free resource? Yes!
  • Can I use this to edit my repo from my iPad? Yes!

First Demo

  1. Don’t follow along in real time, just focus on the concepts.
  2. Follow the README.md in the repo.
  3. The first notebook uses the code above to build a simple RAG system that queries different LLMs (Claude, ChatGPT, and Gemini) against some PDF documents.

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Tool Tip: gemini-2.5-flash

Has a free tier.

Tool Tip: Datasette

Has a free tier.


Second Demo:

3-vannila-python-query.py

  1. Rebuild the front end in Gradio
  2. Add monitoring and logging (observability)
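
One way to add the observability step using only the standard library is to log every LLM call to a SQLite database, which a tool such as Datasette can then browse. The sketch below is an assumption about how such logging might look, not the workshop's code: `fake_llm`, `logged_call`, and the `llm_calls` table layout are hypothetical, and an in-memory database is used so the example runs without touching disk.

```python
import sqlite3
import time
from typing import Callable


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) log table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS llm_calls (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               ts REAL, model TEXT, prompt TEXT,
               response TEXT, latency_s REAL, error TEXT)"""
    )


def logged_call(
    conn: sqlite3.Connection, model: str, prompt: str, llm: Callable[[str], str]
) -> str:
    """Call `llm` and record prompt, response, latency, and any error."""
    start = time.perf_counter()
    response, error = None, None
    try:
        response = llm(prompt)
        return response
    except Exception as exc:
        error = repr(exc)
        raise
    finally:
        # Runs on success and on failure, so failed calls are logged too.
        conn.execute(
            "INSERT INTO llm_calls (ts, model, prompt, response, latency_s, error) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (time.time(), model, prompt, response, time.perf_counter() - start, error),
        )
        conn.commit()


def fake_llm(prompt: str) -> str:
    """Stand-in for a real API call so the example needs no network or key."""
    return f"(stub answer to: {prompt})"


conn = sqlite3.connect(":memory:")  # use a file path to keep logs between runs
init_db(conn)
answer = logged_call(conn, "stub-model", "Summarize the PDF.", fake_llm)
rows = conn.execute("SELECT model, prompt, response, error FROM llm_calls").fetchall()
print(rows)
```

Keeping the raw prompt and response side by side makes it possible to review outputs later and to compare runs of the same prompt, which matters when the model is non-deterministic.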

Slide 20

Slide 21

Slide 22

Slide 23

Slide 24

llms are

Slide 25 - Multimodel Session Ad

Slide 30

Slide 31

Slide 32

Slide 33

Slide 34

Slide 35

Slide 36

Key: Align LLM outputs to your application needs

Slide 37

The speaker recommends using a spreadsheet (this slide comes from another workshop/talk).


Demo 3: Unstructured to Emailed Report (Two-Stage Pipeline)

  • Two-Stage AI Pipeline: From Unstructured Text to Personalized Email
    1. Setup (keys, imports, client)
    2. Load LinkedIn data (via txt file)
    3. Stage 1: Summarize LinkedIn posts → JSON
      • Minimal Baseline
      • With Schema Definition, JSON Mode, and Error Handling
    4. Stage 2: Structured Data → Personalized Recruiter Email
      • Email Variation 1: Minimal (Baseline)
      • Email Variation 2: With Guardrails and Personalization Requirements
      • Complete Two-Stage Pipeline
    5. LLM Judge (this step needs a judgement call rather than a code check)
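
The outline above can be sketched end to end with stubbed models. This is a minimal illustration under stated assumptions, not the demo notebook: the schema in `REQUIRED_KEYS`, the prompt wording, and the stub functions are all hypothetical, and each `llm` argument stands in for a real API call.

```python
import json
from typing import Callable

LLM = Callable[[str], str]
REQUIRED_KEYS = {"name", "role", "topics"}  # assumed schema, for illustration only


def summarize_to_json(posts: str, llm: LLM, max_attempts: int = 3) -> dict:
    """Stage 1: ask for JSON, validate it, and retry when parsing or validation fails."""
    prompt = (
        "Summarize these LinkedIn posts as a JSON object with keys "
        f"{sorted(REQUIRED_KEYS)}.\n\n{posts}"
    )
    last_error: Exception | None = None
    for _ in range(max_attempts):
        try:
            data = json.loads(llm(prompt))
        except json.JSONDecodeError as exc:
            last_error = exc
            continue
        if not isinstance(data, dict):
            last_error = ValueError("expected a JSON object")
            continue
        missing = REQUIRED_KEYS - set(data)
        if not missing:
            return data
        last_error = ValueError(f"missing keys: {sorted(missing)}")
    raise ValueError(f"no valid JSON after {max_attempts} attempts: {last_error}")


def write_email(profile: dict, llm: LLM) -> str:
    """Stage 2: turn the structured profile into a personalized email."""
    prompt = (
        "Write a short recruiter email. Use only the facts in this JSON and "
        f"address the person by name.\n\n{json.dumps(profile)}"
    )
    return llm(prompt)


def judge_email(email: str, profile: dict, llm: LLM) -> bool:
    """LLM judge: a qualitative check that plain code cannot easily make."""
    verdict = llm(
        "Reply PASS or FAIL: is this email personalized to the profile?\n\n"
        f"Profile: {json.dumps(profile)}\n\nEmail: {email}"
    )
    return verdict.strip().upper().startswith("PASS")


# Stubs standing in for real model calls; the first summary is deliberately invalid.
def make_flaky_summarizer() -> LLM:
    replies = iter(
        ["not json", '{"name": "Ada", "role": "Data Scientist", "topics": ["RAG"]}']
    )
    return lambda prompt: next(replies)


def stub_writer(prompt: str) -> str:
    return "Hi Ada, I enjoyed your posts on RAG."


def stub_judge(prompt: str) -> str:
    return "PASS"


profile = summarize_to_json("(LinkedIn post text)", make_flaky_summarizer())
email = write_email(profile, stub_writer)
passed = judge_email(email, profile, stub_judge)
print(profile, email, passed)
```

Validating the Stage 1 output before Stage 2 runs keeps a malformed summary from silently producing a bad email, and the judge covers the part of quality that a schema check cannot express.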

Slide 38

Slide 39
  • Jumps back to … Anthropic slide
    • Do we need memory?
    • Do we need tool use?
    • Creating two command-line tools, such as one to send email, can cover most of our needs.

Slide 40

Slide 41 - Routing

Slide 42

Slide 43

Slide 44

Slide 45

Demo 4: Function Calling
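
The recap does not include this demo's code. As a provider-neutral sketch of the general pattern: the application describes its tools to the model, the model replies with a tool name and JSON arguments, and the application looks the tool up and runs it. Everything below is hypothetical, including `TOOLS`, `send_email`, `word_count`, and the JSON shape of the model's reply; each real API defines its own request and response formats.

```python
import json
from typing import Any, Callable


def send_email(to: str, subject: str) -> str:
    """Hypothetical tool: pretend to queue an email."""
    return f"email to {to} queued with subject {subject!r}"


def word_count(text: str) -> int:
    """Hypothetical tool: count whitespace-separated words."""
    return len(text.split())


# Registry mapping the tool names the model may request to Python functions.
TOOLS: dict[str, Callable[..., Any]] = {
    "send_email": send_email,
    "word_count": word_count,
}


def dispatch(model_reply: str) -> Any:
    """Parse a tool call emitted by the model and run the matching function."""
    call = json.loads(model_reply)
    name, args = call["name"], call.get("arguments", {})
    if name not in TOOLS:
        raise KeyError(f"model requested unknown tool: {name}")
    return TOOLS[name](**args)


# Stand-in for what a model might return when asked to email someone.
reply = '{"name": "send_email", "arguments": {"to": "ada@example.com", "subject": "Hello"}}'
result = dispatch(reply)
print(result)
```

Routing every call through a registry means the model can only trigger functions the application has explicitly exposed.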

Citation

BibTeX citation:
@online{bochman2025,
  author = {Bochman, Oren},
  title = {Building {LLM-Powered} {Applications} for {Data} {Scientists}
    and {Software} {Engineers}},
  date = {2025-12-09},
  url = {https://orenbochman.github.io/posts/2025/2025-12-09-pydata-building-llm-powered-apps-for-ds/},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2025. “Building LLM-Powered Applications for Data Scientists and Software Engineers.” December 9, 2025. https://orenbochman.github.io/posts/2025/2025-12-09-pydata-building-llm-powered-apps-for-ds/.