First LLM Classifier

Learn how journalists use large language models to organize and analyze massive datasets

What you will learn

This class will give you hands-on experience creating a machine-learning model that can read and categorize the text recorded in newsworthy datasets.

It will teach you how to:

Submit large language model prompts with the Python programming language
Write structured prompts that can classify text into predefined categories
Submit dozens of prompts at once as part of an automated routine
Evaluate results using a rigorous, scientific approach
Improve results by training the model with rules and examples

By the end, you will understand how LLM classifiers can outperform traditional machine-learning methods with significantly less code. And you will be ready to write a classifier on your own.
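
As a rough preview of the workflow listed above, here is a minimal sketch in Python. It is an illustration only, not the course's own code: the category list, the sample business names, and the `fake_llm` function are all hypothetical. `fake_llm` is a keyword lookup standing in for a real model call so that the example runs offline. In practice you would replace it with a request to whichever LLM provider you use.

```python
# Hypothetical categories, chosen only for illustration.
CATEGORIES = ["Restaurant", "Bar", "Hotel", "Other"]


def build_prompt(text, categories=CATEGORIES):
    """Assemble a structured prompt that limits the answer to known categories."""
    options = ", ".join(categories)
    return (
        "Classify the following business name into exactly one category.\n"
        f"Allowed categories: {options}.\n"
        "Answer with the category name only.\n"
        f"Business name: {text}"
    )


def fake_llm(prompt):
    """Stand-in for a real LLM API call (an assumption, so the sketch runs offline)."""
    name = prompt.rsplit("Business name: ", 1)[-1].lower()
    if "grill" in name or "cafe" in name:
        return "Restaurant"
    if "tavern" in name or "pub" in name:
        return "Bar"
    if "inn" in name or "hotel" in name:
        return "Hotel"
    return "Other"


def classify(text, llm=fake_llm, categories=CATEGORIES):
    """Send one prompt and reject any answer outside the predefined list."""
    answer = llm(build_prompt(text, categories)).strip()
    return answer if answer in categories else "Other"


def classify_batch(texts, llm=fake_llm):
    """Classify many records in one automated pass."""
    return [classify(text, llm) for text in texts]


def accuracy(predictions, labels):
    """Share of predictions that match the human-supplied labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# A tiny hand-labeled sample, invented for this sketch.
sample = ["Joe's Grill", "The Rusty Tavern", "Sunset Inn", "Acme Hardware"]
labels = ["Restaurant", "Bar", "Hotel", "Other"]

predictions = classify_batch(sample)
print(predictions)
print(accuracy(predictions, labels))
```

Passing the model call in as the `llm` argument keeps the prompt building, answer validation, and scoring separate from any particular provider, so the same evaluation can be rerun after the prompt or the model changes.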

Who can take it

This course is free. Anyone who has dabbled with code and AI is qualified to work through the materials. A curious mind and good attitude are all that’s required, but a familiarity with Python will certainly come in handy.

About this class

Ben Welsh and Derek Willis prepared this guide for a training session at the National Institute for Computer-Assisted Reporting’s 2025 conference in Minneapolis. Some of the copy was written with the assistance of GitHub’s Copilot, an AI-powered text generator. The materials are free and open source, and are available on GitHub. Florent Daudens adapted the project to run on Hugging Face.