The data for these maps are drawn from billions of tweets collected by geographer Diansheng Guo in 2014. Jack Grieve, a forensic linguist at Aston University in the United Kingdom, along with Andreas Nini of the University of Manchester, identified the top 100,000 words used in these tweets and, using Twitter’s location data, how often each was used in every county in the continental United States.
Once Grieve and Nini had identified these words and their locations, they applied hot-spot testing, a common technique in spatial analysis that uncovers geographic trends by clustering together nearby areas with similar results. This is the “regional smoothing” setting you see above; you can adjust the smoothing or disable it entirely to see the raw data for each word. Note: Quartz may use your totally anonymous search queries for future stories about how people are using this tool.
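The article does not publish the researchers’ code, but the idea behind hot-spot testing can be sketched with the Getis-Ord Gi* statistic, a standard hot-spot measure in spatial analysis. In this illustrative version (the function name, coordinates, and radius parameter are assumptions, not the authors’ actual pipeline), each county’s rate for a word is compared against its neighborhood: counties surrounded by similarly high rates get strongly positive z-scores, which smooths out single-county noise.

```python
import numpy as np

def hot_spot_scores(values, coords, radius):
    """Simple Getis-Ord Gi*-style hot-spot z-scores (illustrative sketch).

    values: per-county usage rates for one word, shape (n,)
    coords: county centroid positions, shape (n, 2)
    radius: counties within this distance count as neighbors (incl. self)
    """
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(values)
    mean, std = values.mean(), values.std()  # population std, as Gi* uses

    # Binary spatial weights: 1 if within `radius`, 0 otherwise.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = (dist <= radius).astype(float)
    wsum = w.sum(axis=1)  # number of neighbors per county

    # Gi*: neighborhood sum relative to its expectation, standardized.
    num = w @ values - wsum * mean
    den = std * np.sqrt((n * wsum - wsum**2) / (n - 1))
    return num / den
```

A county inside a cluster of high-usage counties scores well above zero even if its own raw rate is noisy, which is exactly the effect the “regional smoothing” toggle exposes. A production version would use a spatial index (e.g., a k-d tree) rather than the dense distance matrix shown here.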
If you’d like to dive deeper, the full dataset of words and their values for each county is available for download here.