Analyzing political text can answer many pressing questions in political science, from understanding political ideology to mapping the effects of censorship in authoritarian states. This makes the study of political text and speech an important part of the political science methodological toolbox. The confluence of increasing availability of large digital text collections, plentiful computational power, and methodological innovations has led to many researchers adopting techniques of automatic text analysis for coding and analyzing textual data. In what is sometimes termed the “text as data” approach, texts are converted to a numerical representation, and various techniques such as dictionary analysis, automatic scaling, topic modeling, and machine learning are used to find patterns in and test hypotheses on these data.
Each of these methods rests on its own assumptions and must be validated to establish its fitness for a particular task and domain.
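To make the conversion from text to numbers concrete, the following minimal sketch builds a document-term matrix from two invented sentences and computes a simple dictionary score. The example texts, the `ECONOMY_TERMS` word list, and all function names are hypothetical and chosen purely for illustration; real applications would typically rely on an established text analysis library and a validated dictionary.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(texts):
    """Return (vocabulary, matrix), where matrix[i][j] counts term j in text i."""
    counts = [Counter(tokenize(t)) for t in texts]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


def dictionary_score(text, dictionary):
    """Share of a text's tokens that appear in the given word list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(tok in dictionary for tok in tokens) / len(tokens)


# Hypothetical toy corpus and word list (assumptions, not data from any study).
texts = [
    "We will cut taxes and cut spending.",
    "We will protect public spending.",
]
ECONOMY_TERMS = {"taxes", "spending", "budget"}

vocabulary, matrix = document_term_matrix(texts)
scores = [dictionary_score(t, ECONOMY_TERMS) for t in texts]

for row, score in zip(matrix, scores):
    print(row, round(score, 3))
```

Each row of `matrix` is the numerical representation of one text, and `scores` is a crude dictionary measure of how much each text talks about the economy. Whether such a measure captures the intended concept is exactly the kind of question that validation has to answer.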