First LLM Classifier

Learn how journalists use large language models to organize and analyze massive datasets

What you will learn

This class will give you hands-on experience creating a machine-learning model that can read and categorize the text recorded in newsworthy datasets.

It will teach you how to:

  • Submit large language model prompts with the Python programming language

  • Write structured prompts that can classify text into predefined categories

  • Submit dozens of prompts at once as part of an automated routine

  • Evaluate results using a rigorous, scientific approach

  • Improve results by training the model with rules and examples

By the end, you will understand how LLM classifiers can outperform traditional machine-learning methods with significantly less code. And you will be ready to write a classifier on your own.
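As a rough preview of the workflow listed above, here is a minimal sketch of a structured classification prompt, a batch routine and an accuracy check. Everything in it is an assumption made for illustration: the category list, the sample records and the function names are hypothetical, and `fake_model` is a keyword-matching stand-in so the example runs offline. In practice you would replace it with a call to the LLM provider of your choice.

```python
from typing import Callable, Optional

# Hypothetical categories, chosen only for this illustration.
CATEGORIES = ["Restaurant", "Bar", "Hotel", "Other"]


def build_prompt(text: str, categories: list[str]) -> str:
    """Build a structured prompt that asks for exactly one category."""
    options = ", ".join(categories)
    return (
        "Classify the following text into exactly one of these categories: "
        f"{options}.\n"
        "Reply with the category name only.\n\n"
        f"Text: {text}"
    )


def classify(
    text: str, ask_model: Callable[[str], str], categories: list[str]
) -> Optional[str]:
    """Send one prompt and return the answer, or None if it is not a valid category."""
    answer = ask_model(build_prompt(text, categories)).strip()
    return answer if answer in categories else None


def accuracy(predictions: list[Optional[str]], labels: list[str]) -> float:
    """Return the share of predictions that match the human-verified labels."""
    matches = sum(1 for p, y in zip(predictions, labels) if p == y)
    return matches / len(labels)


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call (an assumption, not a real API)."""
    # Look only at the text being classified, not the category list.
    text = prompt.rsplit("Text: ", 1)[-1].lower()
    if "hotel" in text:
        return "Hotel"
    if "bar" in text or "pub" in text:
        return "Bar"
    return "Restaurant"


# Hypothetical sample of records paired with human-verified labels.
sample = [
    ("Marriott Hotel Downtown", "Hotel"),
    ("Joe's Pub", "Bar"),
    ("Luigi's Pizzeria", "Restaurant"),
    ("The Corner Store", "Other"),
]

# Classify every record in one automated pass, then score the results.
predictions = [classify(text, fake_model, CATEGORIES) for text, _ in sample]
labels = [label for _, label in sample]
score = accuracy(predictions, labels)
print(f"Accuracy: {score:.2f}")
```

Passing the model in as a function keeps the prompt, the batch loop and the evaluation independent of any one provider, so the same scoring code can compare different prompts or models against the same labeled sample.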

Who can take it

This course is free. Anyone who has dabbled with code and AI is qualified to work through the materials. A curious mind and good attitude are all that’s required, but a familiarity with Python will certainly come in handy.

About this class

Ben Welsh and Derek Willis prepared this guide for a training session at the National Institute for Computer-Assisted Reporting’s 2025 conference in Minneapolis. Some of the copy was written with the assistance of GitHub’s Copilot, an AI-powered text generator. The materials are free and open source, and they are available on GitHub.