Building LLM-Powered Applications for Data Scientists and Software Engineers

PyData Global 2025 Recap

A detailed recap of the PyData Global 2025 workshop on building LLM-powered applications, focusing on software engineering principles, prompt engineering, monitoring, and practical application development.

PyData

Generative AI

Large Language Models

Software Engineering

Data Science

Workshop Overview

This workshop is designed to equip software engineers with the skills to build and iterate on generative AI-powered applications. Participants will explore key components of the AI software development lifecycle through first principles thinking, including prompt engineering, monitoring, evaluations, and handling non-determinism. The session focuses on using LLMs to build applications, such as querying PDFs, while providing insights into the engineering challenges unique to AI systems. By the end of the workshop, participants will know how to build a PDF-querying app, but all techniques learned will be generalizable for building a variety of generative AI applications.

If you’re a data scientist, machine learning practitioner, or AI enthusiast, this workshop can also be valuable for learning about the software engineering aspects of AI applications, such as lifecycle management, iterative development, and monitoring, which are critical for production-level AI systems.

What You’ll Learn:

How to integrate AI models and APIs into a practical application.
Techniques to manage non-determinism and optimize outputs through prompt engineering.
How to monitor, log, and evaluate AI systems to ensure reliability.
The importance of handling structured outputs and using function calling in AI models.
The software engineering side of building AI systems, including iterative development, debugging, and performance monitoring.
Practical experience in building an app to query PDFs using multimodal models.

What is Unique About This Session:

This workshop bridges the gap between software engineering and generative AI development. While most AI workshops focus solely on model usage or tuning, this session emphasizes the entire AI software lifecycle — from prompt engineering to monitoring and tracing. Participants will learn how to manage non-determinism and create production-ready AI applications, giving them the knowledge to tackle the software engineering challenges of AI-powered apps. The hands-on approach ensures that attendees walk away with practical skills and a functional app.

Prerequisites:

Basic programming knowledge in Python.
Familiarity with REST APIs.
Experience working with Jupyter Notebooks or similar environments (preferred but not required).
No prior experience with AI or machine learning is required.
Most importantly, a sense of curiosity and a desire to learn!
If you have a background in data science, ML, or AI, this workshop will help you understand the software engineering side of building AI applications.

Tools and Frameworks:

We will introduce you to certain modern frameworks in the workshop, but the emphasis will be on first principles and using vanilla Python and LLM calls to build AI-powered systems.

workshop repo

Speakers:

Hugo Bowne-Anderson

Hugo Bowne-Anderson is an independent data and AI consultant with extensive experience in the tech industry. He is the host of the industry podcast Vanishing Gradients, where he explores cutting-edge developments in data science and artificial intelligence.

As a data scientist, educator, evangelist, content marketer, and strategist, Hugo has worked with leading companies in the field. His past roles include Head of Developer Relations at Outerbounds, a company committed to building infrastructure for machine learning applications, and positions at Coiled and DataCamp, where he focused on scaling data science and online education respectively.

Hugo’s teaching experience spans from institutions like Yale University and Cold Spring Harbor Laboratory to conferences such as SciPy, PyCon, and ODSC. He has also worked with organizations like Data Carpentry to promote data literacy. His impact on data science education is significant, having developed over 30 courses on the DataCamp platform that have reached more than 3 million learners worldwide. Hugo also created and hosted the popular weekly data industry podcast DataFramed for two years.

Committed to democratizing data skills and access to data science tools, Hugo advocates for open source software both for individuals and enterprises.

cf. Building effective agents

Tool Tip : GitHub Codespaces!

- Can I use this as a free resource? Yes!
- Can I use this to edit my repo from my iPad? Yes!

First Demo

Don’t follow along in real time, just focus on the concepts.
Follow the README.md in the repo.
The first notebook uses the code in the repo to build a simple RAG system that queries different LLMs (Claude, ChatGPT, and Gemini) against some PDF documents.
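
A minimal sketch of the retrieve-then-prompt idea behind a simple RAG system, in vanilla Python. This is not the workshop's code: `chunk_text`, `tokenize`, `score`, `retrieve`, and `build_prompt` are hypothetical helpers, keyword overlap stands in for whatever retrieval the notebook actually uses, and the PDF text is assumed to have been extracted to a string already. The resulting prompt would then be sent to whichever model you are comparing.

```python
import re


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(question: str, chunk: str) -> int:
    """Count distinct question tokens that also appear in the chunk."""
    return len(tokenize(question) & tokenize(chunk))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks with the highest keyword overlap."""
    return sorted(chunks, key=lambda c: score(question, c), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a prompt asking the model to answer from the context only."""
    joined = "\n---\n".join(context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}"
    )


# Stand-in for text extracted from a PDF (assumption: extraction already done).
document = (
    "Invoices are due within 30 days of receipt. "
    "Late payments incur a fee of two percent per month. "
    "Refunds are processed within ten business days."
)
chunks = chunk_text(document, max_words=8)
question = "When are invoices due?"
prompt = build_prompt(question, retrieve(question, chunks, k=1))
print(prompt)
```

Keeping retrieval and prompt assembly as separate plain functions makes it easy to swap the scoring step for embeddings later without touching the rest of the app.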

Tool Tip : gemini-2.5-flash

has a free tier.

Tool Tip : Datasette

has a free tier.

Second Demo

3-vannila-python-query.py

Rebuild the front end in Gradio
Add monitoring and logging (observability)
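
One way to add this kind of observability is to wrap every model call and write a row to SQLite. The sketch below is an assumption about how that could look, not the workshop's implementation: the `llm_calls` table, `init_log`, and `logged_call` are hypothetical names, and `call_llm` is any function that takes a prompt and returns text (a fake one is used here so the example runs offline).

```python
import sqlite3
import time
from typing import Callable


def init_log(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) log table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS llm_calls (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            created_at REAL NOT NULL,
            model TEXT NOT NULL,
            prompt TEXT NOT NULL,
            response TEXT,
            error TEXT,
            latency_s REAL NOT NULL
        )
        """
    )
    conn.commit()


def logged_call(
    conn: sqlite3.Connection,
    model: str,
    prompt: str,
    call_llm: Callable[[str], str],
) -> str:
    """Call the model and record prompt, response, error, and latency."""
    start = time.perf_counter()
    response, error = None, None
    try:
        response = call_llm(prompt)
        return response
    except Exception as exc:
        error = repr(exc)
        raise
    finally:
        # Runs on success and on failure, so failed calls are logged too.
        conn.execute(
            "INSERT INTO llm_calls "
            "(created_at, model, prompt, response, error, latency_s) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (time.time(), model, prompt, response, error,
             time.perf_counter() - start),
        )
        conn.commit()


def fake_llm(prompt: str) -> str:
    """Stand-in for a real API call."""
    return f"echo: {prompt}"


conn = sqlite3.connect(":memory:")  # use a file path to keep the log
init_log(conn)
answer = logged_call(conn, "fake-model", "What is in the PDF?", fake_llm)
rows = conn.execute(
    "SELECT model, prompt, response, error FROM llm_calls"
).fetchall()
```

If the log is written to a file rather than `:memory:`, the resulting SQLite database can be browsed with a tool such as Datasette.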

llms are

Key: Align LLM outputs to your application needs

The speaker recommends using a spreadsheet (the slide comes from another workshop/talk).

Demo 3: Unstructured to Emailed Report (Two-Stage Pipeline)

Two-Stage AI Pipeline: From Unstructured Text to Personalized Email
1. Setup (keys, imports, client)
2. Load LinkedIn data (via txt file)
3. Stage 1: Summarize LinkedIn posts → JSON
  - Minimal Baseline
  - With Schema Definition, JSON Mode, and Error Handling
4. Stage 2: Structured Data → Personalized Recruiter Email
  - Email Variation 1: Minimal (Baseline)
  - Email Variation 2: With Guardrails and Personalization Requirements
  - Complete Two-Stage Pipeline
5. LLM Judge (we don’t need a code check but a judgement call)
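
The two stages can be sketched as follows, under stated assumptions: the field names in `REQUIRED_FIELDS`, the prompts, and the helpers `parse_profile`, `summarize`, `draft_email`, and `make_fake_llm` are all hypothetical, and a canned fake model replaces the real API so the example runs offline. The point is the shape of the pipeline: validate the Stage 1 JSON and retry on failure before Stage 2 ever sees it.

```python
import json
from typing import Callable

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"name": str, "role": str, "skills": list}


def parse_profile(raw: str) -> dict:
    """Parse model output as JSON and check it against the expected fields."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"field {field!r} missing or not {expected.__name__}")
    return data


def summarize(posts: str, call_llm: Callable[[str], str],
              max_attempts: int = 3) -> dict:
    """Stage 1: unstructured posts -> validated dict, retrying on bad output."""
    prompt = (
        "Summarize the posts below as a JSON object with keys "
        "name (string), role (string), skills (list of strings).\n\n" + posts
    )
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_profile(call_llm(prompt))
        except ValueError as exc:  # output is non-deterministic, so try again
            last_error = exc
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts") from last_error


def draft_email(profile: dict, call_llm: Callable[[str], str]) -> str:
    """Stage 2: structured data -> personalized email text."""
    prompt = (
        "Write a short recruiter email to the person described below. "
        "Mention only skills listed in the data.\n\n" + json.dumps(profile)
    )
    return call_llm(prompt)


def make_fake_llm(replies: list[str]) -> Callable[[str], str]:
    """Return a stand-in model that yields canned replies in order."""
    it = iter(replies)
    return lambda prompt: next(it)


fake_llm = make_fake_llm([
    "Sure! Here is the summary:",  # not JSON, so Stage 1 retries
    '{"name": "Ada", "role": "Data Engineer", "skills": ["Python", "SQL"]}',
    "Hi Ada, your Python and SQL experience stood out.",
])
profile = summarize("Ada writes about Python and SQL pipelines.", fake_llm)
email = draft_email(profile, fake_llm)
```

An LLM judge would be one more `call_llm` step on the email, used where the check is a judgement call rather than something code can verify.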

Jumps back to … Anthropic slide:
- Do we need memory?
- Do we need tool use?
- Creating two command-line tools (send email, etc.) can cover most of our needs.

Demo 4: Function Calling

Function Calling with LLM APIs
- OpenAI Function Calling
- Gemini Function Calling (not as clever yet, think about non-determinism)
- Enriching Data with Search
  - There are lots of cool tools we can use here (Pinecone, Weaviate, ChromaDB, etc.)
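
A hedged sketch of the application side of function calling: the model is given a tool schema, returns a tool name plus JSON-encoded arguments, and your code runs the matching Python function. The schema below is shaped like the `tools` entries used by OpenAI's Chat Completions API, but check the current provider docs before relying on it. `search_company`, `TOOLS`, and `dispatch` are hypothetical, and the tool call is simulated rather than produced by a real model.

```python
import json


def search_company(name: str) -> dict:
    """Hypothetical enrichment tool: return canned data instead of searching."""
    return {"name": name, "industry": "unknown", "source": "stub"}


# Registry mapping tool names the model may request to Python callables.
TOOLS = {"search_company": search_company}

# Schema sent to the model so it knows which tools exist (assumed format).
TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "search_company",
            "description": "Look up basic facts about a company.",
            "parameters": {
                "type": "object",
                "properties": {"name": {"type": "string"}},
                "required": ["name"],
            },
        },
    }
]


def dispatch(tool_call: dict) -> str:
    """Run the requested tool and return a JSON string to send back to the model.

    Model output is non-deterministic, so unknown tool names and malformed
    arguments are returned as error payloads rather than raised.
    """
    name = tool_call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(tool_call["arguments"])
        result = TOOLS[name](**args)
    except (json.JSONDecodeError, TypeError) as exc:
        return json.dumps({"error": str(exc)})
    return json.dumps(result)


# Simulated tool call, as a model might request it.
reply = dispatch({"name": "search_company", "arguments": '{"name": "Acme"}'})
print(reply)
```

Returning errors as data lets the model see what went wrong and try again, which matters when a model picks tools or arguments less reliably.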

Citation

BibTeX citation:

@online{bochman2025,
  author = {Bochman, Oren},
  title = {Building {LLM-Powered} {Applications} for {Data} {Scientists}
    and {Software} {Engineers}},
  date = {2025-12-09},
  url = {https://orenbochman.github.io/posts/2025/2025-12-09-pydata-building-llm-powered-apps-for-ds/},
  langid = {en}
}

For attribution, please cite this work as:

Bochman, Oren. 2025. “Building LLM-Powered Applications for Data Scientists and Software Engineers.” December 9, 2025. https://orenbochman.github.io/posts/2025/2025-12-09-pydata-building-llm-powered-apps-for-ds/.