
Copyright © 2016-2020 Universidad de Málaga.

The Great Recession News Corpus

The Great Recession News Corpus (GRNC) compiles 42,193 news articles from the online Business sections of two major daily newspapers, The Guardian and The New York Times, published between January 2007 and December 2015. The articles are stored as plain text files tagged by publication date, and the corpus contains 21 million tokens.


The GRNC is available on request to not-for-profit academic users of The Sketch Engine. This platform was selected for its management, processing and dissemination capabilities, and because the GRNC can be used in combination with the platform's other 500 corpora, which cover more than 90 languages and multiple language varieties.

The final product is compiled in The Sketch Engine online suite (Kilgarriff et al., 2014). It includes automatic Penn Treebank POS tagging (Marcus, Santorini & Marcinkiewicz, 1993) and supports the computation of Word Sketches, thesauri and n-grams. In addition, The Sketch Engine allows the GRNC to be divided into multiple subcorpora, which enables complex searches by year and publisher.

If you are interested, please send an email to [ fernandezcruz [a] uma.es ] from an academic email account, including your Sketch Engine username.
