First LLM Classifier

Learn how journalists use large language models to organize and analyze massive datasets

What you will learn

This class will give you hands-on experience creating a machine-learning model that can read and categorize the text recorded in newsworthy datasets.

It will teach you how to:

  • Submit large language model prompts with the Python programming language

  • Write structured prompts that can classify text into predefined categories

  • Submit dozens of prompts at once as part of an automated routine

  • Evaluate results using a rigorous, scientific approach

  • Improve results by training the model with rules and examples

By the end, you will understand how LLM classifiers can outperform traditional machine-learning methods with significantly less code. And you will be ready to write a classifier on your own.
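
As a rough preview of the workflow outlined above, here is a minimal sketch in Python. It is not the course's own code. The category list, the sample records and the `stand_in_model` function are all hypothetical; the stand-in uses simple keyword rules so the example runs without an API key, and in practice it would be swapped for a call to a real LLM provider.

```python
from typing import Callable

# Hypothetical categories, chosen only for illustration.
CATEGORIES = ["Restaurant", "Bar", "Hotel", "Other"]


def build_prompt(text: str, categories: list[str]) -> str:
    """Compose a structured prompt that asks for exactly one category."""
    options = ", ".join(categories)
    return (
        f"Classify the text into exactly one of these categories: {options}.\n"
        "Reply with the category name only.\n\n"
        f"Text: {text}"
    )


def stand_in_model(prompt: str) -> str:
    """Hypothetical placeholder for an LLM call, based on keyword rules."""
    words = prompt.rsplit("Text: ", 1)[-1].lower().split()
    if "hotel" in words:
        return "Hotel"
    if "bar" in words or "pub" in words:
        return "Bar"
    if "cafe" in words or "grill" in words:
        return "Restaurant"
    return "Other"


def classify_batch(
    texts: list[str], categories: list[str], ask: Callable[[str], str]
) -> list[str]:
    """Send one prompt per text and keep only answers from the category list."""
    results = []
    for text in texts:
        answer = ask(build_prompt(text, categories)).strip()
        results.append(answer if answer in categories else "Other")
    return results


def accuracy(predicted: list[str], expected: list[str]) -> float:
    """Share of predictions that match the hand-labeled answers."""
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


# Hypothetical hand-labeled sample used to check the classifier.
labeled = [
    ("Harbor View Hotel", "Hotel"),
    ("Corner Bar", "Bar"),
    ("Sunrise Cafe", "Restaurant"),
    ("Main Street Hardware", "Other"),
    ("Smoky Barbecue Pit", "Restaurant"),
]
texts = [name for name, _ in labeled]
expected = [label for _, label in labeled]

predictions = classify_batch(texts, CATEGORIES, stand_in_model)
print(predictions)
print(f"Accuracy: {accuracy(predictions, expected):.0%}")
```

The stand-in deliberately misses the last record, so the accuracy check has something to catch. Comparing predictions against a hand-labeled sample like this is one simple way to measure whether a change to the prompt actually improves results.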

Who can take it

This course is free. Anyone who has dabbled in code and AI is qualified to work through the materials. A curious mind and a good attitude are all that’s required, but familiarity with Python will certainly come in handy.

The documentation assumes you are working on an Apple computer or with the Linux operating system. If you are using Windows, we recommend that you install the Windows Subsystem for Linux, which will allow you to run Linux on your Windows machine.

About this class

Ben Welsh and Derek Willis prepared this guide for a training session at the National Institute for Computer-Assisted Reporting’s 2025 conference in Minneapolis. The project was adapted to run on Hugging Face by Florent Daudens. Andrew Briz updated it to incorporate structured responses for the 2026 NICAR conference in Indianapolis.

Some of the copy was written with the assistance of GitHub’s Copilot and Anthropic’s Claude. The materials are free and open source, and they are available on GitHub.