SentiEcon (ISLRN: 314-817-285-706-3) is a large, comprehensive, domain-specific computational lexicon for Economy and Finance designed for sentiment analysis applications.
SentiEcon was created as a plug-in lexicon for the sentiment analysis tool Lingmotif. It therefore follows Lingmotif's data structure requirements and presupposes the availability of a general-language core sentiment lexicon that covers non-specific sentiment-carrying terms and phrases. It contains 6,470 entries, both single words and multi-word expressions, each tagged for semantic orientation and intensity. SentiEcon is distributed as a tab-separated UTF-8 file (e.g., money <launder> VB neg 3).
We evaluate SentiEcon’s performance by comparing results in a sentence classification task that uses exclusively sentiment words as features. The sentences were extracted from business news texts and contain certain keywords known to recurrently convey strong semantic orientation, such as “debt”, “inflation” or “markets”. The results show that performance improves significantly when SentiEcon is added to a general-language sentiment lexicon.
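The idea of layering a domain lexicon over a general-language one can be illustrated with a minimal sketch. This is not Lingmotif's algorithm, nor the classifier used in the evaluation. The toy entries, their polarity and intensity values, the rule that domain entries take precedence, and all function names are hypothetical and chosen only for illustration. Multi-word expressions are not handled.

```python
# Toy lexicons: token -> (polarity, intensity). All values are hypothetical.
GENERAL = {"good": ("POS", 2), "bad": ("NEG", 2), "default": ("NEU", 0)}
DOMAIN = {"default": ("NEG", 3), "haircut": ("NEG", 2)}


def merge_lexicons(general, domain):
    """Return a combined lexicon in which domain entries override general ones
    (an assumption made for this sketch)."""
    merged = dict(general)
    merged.update(domain)
    return merged


def score_sentence(sentence, lexicon):
    """Sum signed intensities of the single-word lexicon entries in a sentence."""
    score = 0
    for raw in sentence.lower().split():
        token = raw.strip(".,;:!?\"'()")
        if token in lexicon:
            polarity, intensity = lexicon[token]
            if polarity == "POS":
                score += intensity
            elif polarity == "NEG":
                score -= intensity
    return score


def classify(sentence, lexicon):
    """Map the score to one of three labels: POSITIVE, NEGATIVE or NONE."""
    score = score_sentence(sentence, lexicon)
    if score > 0:
        return "POSITIVE"
    if score < 0:
        return "NEGATIVE"
    return "NONE"


sentence = "The country may default on its debt."
general_label = classify(sentence, GENERAL)
combined_label = classify(sentence, merge_lexicons(GENERAL, DOMAIN))
print(general_label, combined_label)
```

With these toy entries, the general lexicon alone sees "default" as neutral, while the combined lexicon treats it as negative, which is the kind of domain-specific shift a plug-in lexicon is meant to capture.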
| Data field | Example/List |
|---|---|
| Word form | launder, haircut, european central bank |
| PoS | [ALL, NN, JJ, VB, RB, UH, IN] |
| Polarity | [POS, NEG, NEU] |
| Intensity | [0, 1, 2, 3] |
Table 1: SentiEcon’s data fields
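As a rough illustration of the data fields, the following sketch parses one tab-separated entry such as the example given in the description (money <launder> VB neg 3). It assumes exactly four fields per line in the order word form, PoS, polarity, intensity, and it upper-cases the tags because the example uses "neg" while the table lists "NEG". The `Entry` class and `parse_line` function are hypothetical names, and the word form (including its angle-bracket notation) is kept as an opaque string.

```python
from dataclasses import dataclass

POS_TAGS = {"ALL", "NN", "JJ", "VB", "RB", "UH", "IN"}
POLARITIES = {"POS", "NEG", "NEU"}


@dataclass(frozen=True)
class Entry:
    form: str       # word form, single or multi-word, kept verbatim
    pos: str        # part-of-speech tag
    polarity: str   # POS, NEG or NEU
    intensity: int  # 0 to 3


def parse_line(line: str) -> Entry:
    """Parse one tab-separated lexicon line (assumed four-field layout)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}")
    form, pos, polarity, intensity = (field.strip() for field in fields)
    pos, polarity = pos.upper(), polarity.upper()
    if pos not in POS_TAGS:
        raise ValueError(f"unknown PoS tag: {pos}")
    if polarity not in POLARITIES:
        raise ValueError(f"unknown polarity: {polarity}")
    value = int(intensity)
    if not 0 <= value <= 3:
        raise ValueError(f"intensity out of range: {value}")
    return Entry(form, pos, polarity, value)


entry = parse_line("money <launder>\tVB\tneg\t3")
print(entry)
```

A whole file could be read by opening it with `encoding="utf-8"` and calling `parse_line` on each non-empty line.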
| Polarity | Words | MWE | Total |
|---|---|---|---|
| POS | 343 | 1022 | 1365 |
| NEG | 736 | 1708 | 2444 |
| NEU | 309 | 2352 | 2661 |
| Total | 1388 | 5082 | 6470 |
Table 3: Count and distribution of entries in SentiEcon
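The counts in the table can be checked and turned into percentage shares with a few lines of arithmetic. The sketch below only restates the figures from the table; the variable names are illustrative.

```python
# Entry counts per polarity, copied from the table: single words and MWEs.
counts = {
    "POS": {"words": 343, "mwe": 1022},
    "NEG": {"words": 736, "mwe": 1708},
    "NEU": {"words": 309, "mwe": 2352},
}

# Row totals and grand total.
totals = {label: row["words"] + row["mwe"] for label, row in counts.items()}
grand_total = sum(totals.values())

# Share of each polarity, and of multi-word expressions, in percent.
shares = {label: round(100 * n / grand_total, 1) for label, n in totals.items()}
mwe_total = sum(row["mwe"] for row in counts.values())
mwe_share = round(100 * mwe_total / grand_total, 1)

for label in counts:
    print(f"{label}: {totals[label]} entries ({shares[label]}%)")
print(f"MWE: {mwe_total} of {grand_total} entries ({mwe_share}%)")
```

The recomputed totals match the table, and they show that multi-word expressions make up the large majority of entries.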
SentiEcon GS-1000
SentiEcon GS-1000 (ISLRN: 524-008-163-978-0) is a manually annotated gold standard dataset of 1,000 sentences, initially compiled to evaluate the performance of SentiEcon. Two domain experts annotated the dataset by assigning each sentence to one of three categories: POSITIVE, NEGATIVE, and NONE. They were instructed to take into account only the information available in the sentences and to annotate sentences. The experts annotated independently and were then asked to reach a consensus on the sentences where their labels differed.
License request
SentiEcon will soon be available under the ELRA license, and GS-1000 will be released under Creative Commons BY-NC 3.0. You can request a trial version of SentiEcon and/or SentiEcon GS-1000 for academic purposes. Simply send an email to tecnolengua [a] uma.es from an academic account with the following information, and we will respond shortly:
- Full name
- Institution and department / position
- Brief description of your needs
Citing SentiEcon
If you use SentiEcon, please cite us:
Moreno-Ortiz, A., Fernández-Cruz, J., & Pérez-Hernández, C. (2020). Design and Evaluation of SentiEcon: A fine-grained Economic/Financial Sentiment Lexicon from a Corpus of Business News. Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, 5067-5074. http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.623.pdf