Labeling Social Media Posts: Does Showing Coders Multimodal Content Produce Better Human Annotation, and a Better Machine Classifier?

As the costs of collecting and labeling multimodal social media data rise, we develop five measures and an experimental framework to help researchers facing budget constraints make informed methodological decisions.

Abstract

The increasing multimodality (e.g., images, videos, links) of social media data presents both opportunities and challenges. Yet text-as-data methods continue to dominate classification, because multimodal social media data are costly to collect and label. Researchers facing a budget constraint must decide whether to collect and label only the textual content of social media data or their full multimodal content. In this article, we develop five performance metrics and an experimental framework to assist with these decisions. The metrics measure the costs and benefits of multimodal labeling: average time per post, average time per valid response, valid response rate, intercoder agreement, and the predictive power of the resulting classifier. The experimental framework estimates these measures by evaluating coders’ performance under text-only and multimodal labeling conditions. We illustrate the method with a tweet labeling experiment.
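
As a rough illustration of how the first four metrics might be operationalized, the sketch below assumes each coder response is logged with a post ID, a coder ID, the elapsed time, and a label (with None marking an invalid or skipped response). The Annotation record, its field names, and the use of mean pairwise percent agreement as a simple stand-in for a chance-corrected statistic such as Krippendorff’s alpha are our assumptions for illustration, not definitions from the article; the fifth metric would be obtained separately by training a classifier on the resulting labels and scoring it on held-out data.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional

@dataclass
class Annotation:
    """One coder's response to one post (hypothetical logging schema)."""
    post_id: str
    coder_id: str
    label: Optional[str]  # None marks an invalid or skipped response
    seconds: float        # time the coder spent on the post

def labeling_metrics(annotations: list[Annotation]) -> dict:
    """Compute four of the five measures for one labeling condition."""
    valid = [a for a in annotations if a.label is not None]
    total_time = sum(a.seconds for a in annotations)

    # 1. Average time per post: total coding time over all responses.
    avg_time_per_post = total_time / len(annotations)

    # 2. Average time per valid response: the same total time counted
    #    only against responses that yielded a usable label.
    avg_time_per_valid = total_time / len(valid) if valid else float("nan")

    # 3. Valid response rate.
    valid_rate = len(valid) / len(annotations)

    # 4. Intercoder agreement: mean pairwise percent agreement over
    #    posts labeled by at least two coders (a simple stand-in for
    #    chance-corrected statistics such as Krippendorff's alpha).
    labels_by_post: dict[str, list[str]] = {}
    for a in valid:
        labels_by_post.setdefault(a.post_id, []).append(a.label)
    hits = total = 0
    for labels in labels_by_post.values():
        for x, y in combinations(labels, 2):
            hits += int(x == y)
            total += 1
    agreement = hits / total if total else float("nan")

    return {
        "avg_time_per_post": avg_time_per_post,
        "avg_time_per_valid_response": avg_time_per_valid,
        "valid_response_rate": valid_rate,
        "intercoder_agreement": agreement,
    }

# Hypothetical usage: compute the measures separately for each condition.
text_only = labeling_metrics([
    Annotation("p1", "c1", "political", 12.0),
    Annotation("p1", "c2", "political", 15.5),
    Annotation("p2", "c1", None, 4.0),  # invalid response
    Annotation("p2", "c2", "not_political", 9.0),
])
print(text_only)
```

Comparing the dictionaries returned under the text-only and multimodal conditions then surfaces the cost-benefit trade-offs the framework is designed to measure.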