I Built a Transformer from Scratch So You Don’t Have To

PyData Global 2025 Recap

A detailed recap of the PyData Global 2025 tutorial on GPU-accelerated Python with RAPIDS, covering the ecosystem (cuDF, cuML, Dask-cuDF), deployment on NVIDIA Brev, and monitoring and debugging tools.
PyData
Author

Oren Bochman

Published

Friday, December 12, 2025

Keywords

PyData, Transformers, Deep Learning, PyTorch, Machine Learning

Lecture Overview

NVIDIA GPUs offer unmatched speed and efficiency for data processing and model training, significantly reducing the time and cost associated with these tasks. Using GPUs is even more tempting when you can rely on zero-code-change plugins and libraries: you can use PyData libraries including pandas, Polars, and NetworkX without rewriting your code to get the benefits of GPU acceleration. We can also mix in GPU-native libraries like Numba, CuPy, and PyTorch to accelerate our workflows end-to-end.
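
For example, cuDF ships a zero-code-change mode for pandas. A minimal sketch, assuming RAPIDS cuDF is installed on a machine with an NVIDIA GPU:

# Zero-code-change acceleration with cudf.pandas.
# In a notebook, load the extension before importing pandas:
#   %load_ext cudf.pandas
# Or run an existing script unchanged from the shell:
#   python -m cudf.pandas my_script.py
import pandas as pd  # with cudf.pandas active, this resolves to a GPU-backed proxy

df = pd.DataFrame({"key": ["a", "b", "a", "c"], "value": [1, 2, 3, 4]})
print(df.groupby("key")["value"].sum())  # runs on the GPU, falling back to CPU if unsupported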

However, integrating GPUs into our workflow poses new challenges: we need to learn about installation, dependency management, and deployment in the Python ecosystem. When writing code, we also need to monitor performance, leverage the hardware effectively, and debug when things go wrong.

This is where RAPIDS and its tooling ecosystem come to the rescue. RAPIDS is a collection of open-source software libraries for executing end-to-end data pipelines on NVIDIA GPUs using familiar PyData APIs.
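
The "familiar PyData APIs" point is concrete: cuDF mirrors pandas, and cuML mirrors scikit-learn. A minimal sketch, assuming cudf and cuml are installed (the toy data here is mine, for illustration):

import cudf
from cuml.cluster import KMeans  # scikit-learn-style estimator API

# cuDF mirrors the pandas API, but the data lives in GPU memory
gdf = cudf.DataFrame({"x": [0.0, 0.1, 5.0, 5.1], "y": [0.0, 0.2, 5.0, 4.9]})

km = KMeans(n_clusters=2).fit(gdf)  # fit/predict, just like scikit-learn
print(km.labels_)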

In this tutorial we will cover the topics listed in the outline below.

This is a hands-on tutorial with multiple examples to get you familiar with the RAPIDS ecosystem. Participants should ideally have some experience using Python, pandas, and scikit-learn. We’ll use cloud-based VMs, so familiarity with the cloud and resource creation is helpful but not required. No prior GPU knowledge is needed.

Tools and Frameworks:
  • RAPIDS ecosystem including cuDF, cuML, and Dask-cuDF (see the Dask-cuDF sketch below)
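
Dask-cuDF extends the cuDF API across partitions and, with a cluster, across multiple GPUs. A minimal sketch, assuming dask_cudf is installed (the file path and column names are hypothetical):

import dask_cudf

# Each partition is a cuDF DataFrame; Dask schedules the work lazily
ddf = dask_cudf.read_csv("data/*.csv")       # hypothetical input files
result = ddf.groupby("key")["value"].mean()  # builds a task graph, no work yet
print(result.compute())                      # triggers execution on the GPU(s)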
Speakers:

Jacob Tomlinson

Jacob Tomlinson is a senior software engineer at NVIDIA. His work involves maintaining open source projects including RAPIDS and Dask. He also tinkers with kr8s in his spare time. He lives in Exeter, UK.

Naty Clementi

Naty Clementi is a senior software engineer at NVIDIA. She is a former academic with a Master’s in Physics and a PhD in Mechanical and Aerospace Engineering to her name. Her work involves contributing to RAPIDS, and in the past she has also contributed to and maintained other open source projects such as Ibis and Dask. She is an active member of PyLadies and an active volunteer and organizer of the Women and Gender Expansive Coders DC meetups.

Outline

We start with an introduction and a tour of the https://rapids.ai/ web page.

Then we do a quick demo on Colab, but most of the demo happens on the NVIDIA Brev platform.

Both of these are based on this repo:

Demo notebook

  • Deployment
    • NVIDIA Brev
    • GPU Software Environment Fundamentals
    • Python packages that use CUDA
    • Monitoring/debugging tools
    • Other platforms

Connecting to your VM

Once your VM is deployed, follow the Brev access instructions provided for your instance. The connection instructions will vary depending on your operating system. For example, on macOS you would:

  • Install the brev CLI
    • brew install brevdev/homebrew-brev/brev
  • Login to your account (copy from access page)
    • brev login --token ****
  • Connect via SSH
    • brev ls to list your VMs
    • brev shell <your vm name> to connect via SSH

Exploring our GPU Software Environment

$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.158.01             Driver Version: 570.158.01     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      On  |   00000000:00:03.0 Off |                    0 |
| N/A   47C    P8             13W /   72W |       0MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Python Software Environments
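
The notebook then inspects how the Python packages map onto the CUDA stack. A minimal sanity-check sketch, assuming cuDF, CuPy, and Numba are installed (swap in whatever packages you actually use):

import cudf
import cupy as cp
from numba import cuda

print("cuDF:", cudf.__version__)
print("CuPy:", cp.__version__)
print("CUDA runtime seen by CuPy:", cp.cuda.runtime.runtimeGetVersion())
cuda.detect()  # prints the GPUs Numba can see and whether they are supported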

Reflections

The demo is informative.

They run tmux, which lets you see how the installation is done; while the workload is running, you can watch system monitoring alongside the Python console.

This seems to be a missing capability when working with CUDA on my desktop, and possibly when using a remote instance too!


Monitoring and Debugging using tmux

Jacob suggests using monitoring tools such as:

  • nvtop — live GPU utilization, RX/TX, and memory, plus a process list with GPU/CPU/memory loads

  • nvidia-smi

  • NVDashboard (a JupyterLab extension, covered in a future lab)

  • DCGM

  • nsys (the Nsight Systems CLI)

  • Nsight

  • JupyterLab Nsight extension

Profiling with these tools shows that many GPU operations are asynchronous or lazy.
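
This matters most when timing code. A minimal sketch of the classic pitfall, assuming CuPy is installed: the kernel launch returns immediately, so you must synchronize before reading the clock.

import time
import cupy as cp

a = cp.random.random((4000, 4000))

start = time.perf_counter()
b = a @ a                       # asynchronous: the kernel launch returns immediately
cp.cuda.Device().synchronize()  # block until the GPU has actually finished
print(f"matmul took {time.perf_counter() - start:.3f}s")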


Anyhow I can’t wait to try RAPIDS on my own GPU machine.

Citation

BibTeX citation:
@online{bochman2025,
  author = {Bochman, Oren},
  title = {I {Built} a {Transformer} from {Scratch} {So} {You} {Don’t}
    {Have} {To}},
  date = {2025-12-12},
  url = {https://orenbochman.github.io/posts/2025/2025-12-11-pydata-rapids/},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2025. “I Built a Transformer from Scratch So You Don’t Have To.” December 12, 2025. https://orenbochman.github.io/posts/2025/2025-12-11-pydata-rapids/.