Dictionary-Assisted Supervised Contrastive Learning

We propose an approach for combining specialized dictionaries with pretrained large language models during fine-tuning.

Abstract

Text analysis in the social sciences often involves using specialized dictionaries to reason with abstract concepts, such as perceptions about the economy or abuse on social media. These dictionaries allow researchers to impart domain knowledge and capture subtle usages of words relating to the concepts of interest. We introduce the dictionary-assisted supervised contrastive learning (DASCL) objective, allowing researchers to leverage specialized dictionaries when fine-tuning pretrained language models. The text is first keyword simplified: a common, fixed token replaces any word in the corpus that appears in the dictionaries relevant to the concept of interest. During fine-tuning, a supervised contrastive objective draws closer the embeddings of the original and keyword-simplified texts of the same class while pushing further apart the embeddings of different classes. The keyword-simplified texts of the same class are more textually similar than their original-text counterparts, which additionally draws the embeddings of the same class closer together. Combining DASCL with cross-entropy improves classification performance in few-shot learning settings and social science applications compared to using cross-entropy alone or alternative contrastive and data augmentation methods.
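To make the objective concrete, a supervised contrastive term of the kind described above is commonly written in the SupCon form (Khosla et al., 2020). The notation below, including the weight λ on the contrastive term, is a sketch of how such a combined objective is typically set up, not the paper's exact formulation:

```latex
\[
\mathcal{L}_{\mathrm{SCL}}
  = \sum_{i \in I} \frac{-1}{|P(i)|}
    \sum_{p \in P(i)}
    \log \frac{\exp(z_i \cdot z_p / \tau)}
              {\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)},
\qquad
\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda\, \mathcal{L}_{\mathrm{SCL}}
\]
```

Here, I indexes the anchors in a batch (both the original and keyword-simplified texts), P(i) is the set of other in-batch examples sharing anchor i's class, A(i) is the set of all other in-batch examples, z_i is a normalized text embedding, τ is a temperature, and λ weights the contrastive term against the cross-entropy loss.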

Background

Social science researchers doing text analysis have often used methods that rely on specialized dictionaries crafted to identify underlying, and typically abstract, concepts of interest. These dictionaries are carefully constructed by researchers to capture words and phrases signaling a concept, including terms that may appear rarely or not at all in labeled training data. More recently, however, supervised machine learning approaches that do not rely on dictionaries have become more common. Our proposed dictionary-assisted supervised contrastive learning (DASCL) approach combines these techniques by leveraging researcher-crafted dictionaries when fine-tuning pretrained large language models.

Study

Our method first applies keyword simplification: any word in the text that appears in a relevant dictionary is replaced with a common, fixed token. For example, positive terms such as “wonderful” or “fresh” would be replaced with “<positive>”. During fine-tuning, the DASCL loss function draws the embeddings of the original and keyword-simplified texts of the same class closer together while pushing apart the embeddings of different classes. This contrastive loss is combined with the cross-entropy loss to improve classification performance. We then ran a series of experiments comparing DASCL against baseline approaches, including fine-tuning with cross-entropy alone and alternative contrastive and data augmentation methods.
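A minimal sketch of the keyword-simplification step and the combined loss is shown below, assuming a PyTorch setup. The dictionary, the SupCon-style loss implementation, and the weighting term lam are illustrative assumptions rather than the released DASCL code:

```python
import re

import torch
import torch.nn.functional as F

# Illustrative dictionary; real applications would use researcher-crafted
# dictionaries (e.g., economic sentiment or abusive-language lexicons).
POSITIVE_WORDS = {"wonderful", "fresh", "great"}


def keyword_simplify(text: str, keywords=POSITIVE_WORDS, token: str = "<positive>") -> str:
    """Replace any word that appears in the dictionary with a fixed token."""
    chunks = re.findall(r"\w+|\W+", text)  # split into words and separators
    return "".join(token if c.lower() in keywords else c for c in chunks)


def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """SupCon-style loss over a batch containing both the original and
    keyword-simplified representations (labels repeated for each view)."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T / temperature                      # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))  # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Positives: other in-batch examples with the same class label.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return loss.mean()


def combined_loss(logits, labels, emb_original, emb_simplified, lam: float = 1.0):
    """Cross-entropy on the original texts plus the contrastive term computed
    over the embeddings of both the original and keyword-simplified texts."""
    ce = F.cross_entropy(logits, labels)
    embeddings = torch.cat([emb_original, emb_simplified], dim=0)
    scl = supervised_contrastive_loss(embeddings, torch.cat([labels, labels]))
    return ce + lam * scl


print(keyword_simplify("The acting was wonderful and the plot felt fresh."))
# -> "The acting was <positive> and the plot felt <positive>."
```

In a training loop, logits would come from the classifier head applied to the original texts, while emb_original and emb_simplified would be the pooled encoder representations of the original and keyword-simplified inputs, respectively.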

Results

Our experiments demonstrated that combining DASCL with cross-entropy improves classification metrics compared to using cross-entropy alone or alternative contrastive and data augmentation methods. This approach not only performed better on SST-2 in few-shot learning settings, but also better classified perceptions of the economy expressed in New York Times articles and better identified tweets containing abusive language. We hope to conduct further experiments and continue expanding on this research to better understand the potential and limitations of this approach.