First LLM Classifier
Learn how journalists use large language models to organize and analyze massive datasets
What you will learn
This class will give you hands-on experience creating a machine-learning model that can read and categorize the text recorded in newsworthy datasets.
It will teach you how to:
- Submit prompts to a large language model with the Python programming language
- Write structured prompts that can classify text into predefined categories
- Submit dozens of prompts at once as part of an automated routine
- Evaluate results using a rigorous, scientific approach
- Improve results by training the model with rules and examples
By the end, you will understand how LLM classifiers can outperform traditional machine-learning methods with significantly less code. And you will be ready to write a classifier on your own.
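As a rough preview of the workflow listed above, here is a minimal sketch of a structured classification prompt, a batch routine and an accuracy check. It is not the course's own code. The category names, the sample business names and the `fake_model` function are all hypothetical stand-ins, chosen so the example runs offline without an API key; in practice `ask_model` would be a function that sends the prompt to a real LLM service.

```python
CATEGORIES = ["Restaurant", "Bar", "Hotel", "Other"]  # hypothetical labels


def build_prompt(text, categories=CATEGORIES):
    """Return a structured prompt that asks for exactly one category."""
    options = ", ".join(categories)
    return (
        "Classify the following text into exactly one of these categories: "
        f"{options}.\n"
        "Answer with the category name only.\n\n"
        f"Text: {text}"
    )


def parse_label(response, categories=CATEGORIES, fallback="Other"):
    """Normalize a model reply to a known category, or use the fallback."""
    cleaned = response.strip().strip(".").lower()
    for category in categories:
        if cleaned == category.lower():
            return category
    return fallback


def classify_batch(texts, ask_model):
    """Classify many texts; ask_model is any callable mapping prompt to reply."""
    return [parse_label(ask_model(build_prompt(t))) for t in texts]


def accuracy(predicted, expected):
    """Share of predictions that match the hand-labeled answers."""
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


def fake_model(prompt):
    """Assumed stand-in for a real LLM call, so the sketch runs offline."""
    text = prompt.rsplit("Text: ", 1)[-1].lower()
    if "hotel" in text or "inn" in text:
        return "Hotel."
    if "bar" in text or "pub" in text:
        return "bar"
    return "Restaurant"


samples = ["HOLIDAY INN EXPRESS", "JOE'S PUB", "TACO PALACE"]
labels = ["Hotel", "Bar", "Restaurant"]  # hand-labeled answers for evaluation
predictions = classify_batch(samples, fake_model)
print(predictions, accuracy(predictions, labels))
```

Passing the model in as a plain callable keeps the prompt-building and scoring logic independent of any particular LLM provider, so the same evaluation can be rerun each time the prompt changes.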
Who can take it
This course is free. Anyone who has dabbled with code and AI is qualified to work through the materials. A curious mind and good attitude are all that’s required, but a familiarity with Python will certainly come in handy.
About this class
Ben Welsh and Derek Willis prepared this guide for a training session at the National Institute for Computer-Assisted Reporting’s 2025 conference in Minneapolis. Some of the copy was written with the assistance of GitHub’s Copilot, an AI-powered text generator. The materials are free and open source, and they are available on GitHub. Florent Daudens adapted the project to run on Hugging Face.