
Copyright © 2016-2020 Universidad de Málaga.

SentiEcon

SentiEcon (ISLRN: 314-817-285-706-3) is a large, comprehensive, domain-specific computational lexicon for Economy and Finance designed for sentiment analysis applications.

SentiEcon was created as a plug-in lexicon for the sentiment analysis tool Lingmotif. It therefore follows Lingmotif’s data structure requirements and presupposes the availability of a general-language core sentiment lexicon that covers non-specific sentiment-carrying terms and phrases. It contains 6,470 entries, both single words and multi-word expressions, each with tags denoting its semantic orientation and intensity. SentiEcon is formatted as a tab-separated UTF-8 file (e.g., money <launder> VB neg 3).

We evaluated SentiEcon’s performance by comparing results in a sentence classification task that used exclusively sentiment words as features. The sentence dataset was extracted from business news texts and included certain key words known to recurrently convey strong semantic orientation, such as “debt”, “inflation” or “markets”. The results show that adding SentiEcon to a general-language sentiment lexicon significantly improves performance.

Data field    Example/List
Word form     launder, haircut, european central bank
PoS           [ALL, NN, JJ, VB, RB, UH, IN]
Polarity      [POS, NEG, NEU]
Intensity     [0, 1, 2, 3]

Table 1: SentiEcon’s data fields
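The data fields in Table 1 can be read and validated with a few lines of Python. The following is only a minimal sketch, not Lingmotif's own loader: it assumes each line holds exactly four tab-separated columns in the order word form, PoS, polarity, intensity, and that tags may appear in lowercase (as in the example line "neg"). The names `LexiconEntry` and `parse_lexicon` are hypothetical and introduced here for illustration.

```python
import io
from dataclasses import dataclass

# Allowed tag values, copied from Table 1.
POS_TAGS = {"ALL", "NN", "JJ", "VB", "RB", "UH", "IN"}
POLARITIES = {"POS", "NEG", "NEU"}
INTENSITIES = {0, 1, 2, 3}


@dataclass(frozen=True)
class LexiconEntry:  # hypothetical name, not part of SentiEcon or Lingmotif
    form: str
    pos: str
    polarity: str
    intensity: int

    @property
    def is_multiword(self) -> bool:
        # Treat any form made of several whitespace-separated tokens as an MWE.
        return len(self.form.split()) > 1


def parse_lexicon(stream):
    """Parse tab-separated lines into validated LexiconEntry objects.

    Assumption: four columns per line (form, PoS, polarity, intensity).
    """
    entries = []
    for lineno, raw in enumerate(stream, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        form, pos, polarity, intensity_text = fields
        # Normalise tag case so that "neg" and "NEG" are treated alike.
        pos, polarity = pos.upper(), polarity.upper()
        if pos not in POS_TAGS:
            raise ValueError(f"line {lineno}: unknown PoS tag {pos!r}")
        if polarity not in POLARITIES:
            raise ValueError(f"line {lineno}: unknown polarity {polarity!r}")
        try:
            intensity = int(intensity_text)
        except ValueError:
            raise ValueError(f"line {lineno}: intensity is not an integer") from None
        if intensity not in INTENSITIES:
            raise ValueError(f"line {lineno}: intensity {intensity} out of range")
        entries.append(LexiconEntry(form, pos, polarity, intensity))
    return entries


# The example line quoted in the text, with tabs assumed between the fields.
SAMPLE = "money <launder>\tVB\tneg\t3\n"
entries = parse_lexicon(io.StringIO(SAMPLE))
```

For a file on disk, the same function could be called on a handle opened with `open(path, encoding="utf-8")`, where `path` points to a local copy of the lexicon.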

Polarity    Words    MWE     Total
POS           343    1022     1365
NEG           736    1708     2444
NEU           309    2352     2661
Total        1388    5082     6470

Table 3: Count and distribution of entries in SentiEcon
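The row and column totals in this table can be re-derived from the six per-polarity counts. The short sketch below does so and also computes each polarity's share of the lexicon; the function name `summarize` and the dictionary layout are illustrative choices, and the only data used are the counts printed in the table.

```python
# Counts copied from the table: polarity -> (single words, multi-word expressions).
COUNTS = {
    "POS": (343, 1022),
    "NEG": (736, 1708),
    "NEU": (309, 2352),
}


def summarize(counts):
    """Return per-polarity rows and column totals for a {polarity: (words, mwe)} mapping."""
    rows = {
        polarity: {"words": words, "mwe": mwe, "total": words + mwe}
        for polarity, (words, mwe) in counts.items()
    }
    totals = {
        key: sum(row[key] for row in rows.values())
        for key in ("words", "mwe", "total")
    }
    # Share of the whole lexicon taken up by each polarity class.
    for row in rows.values():
        row["share"] = row["total"] / totals["total"]
    return rows, totals


rows, totals = summarize(COUNTS)
for polarity, row in rows.items():
    print(f"{polarity}: {row['total']} entries ({row['share']:.1%})")
print(f"Total: {totals['words']} words + {totals['mwe']} MWEs = {totals['total']}")
```

The computed totals agree with the bottom row of the table (1388 single words, 5082 multi-word expressions, 6470 entries overall).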

SentiEcon GS-1000

SentiEcon GS-1000 (ISLRN: 524-008-163-978-0) is a manually annotated gold standard dataset consisting of 1,000 sentences, initially compiled to evaluate the performance of SentiEcon. Two domain experts annotated the dataset by classifying each sentence as belonging to one of three categories: POSITIVE, NEGATIVE, and NONE. They were instructed to take into account only the information available in the sentences and to annotate sentences. The experts first annotated the sentences independently and were then asked to reach a consensus on the cases where their labels differed.

License request

SentiEcon will soon be available under the ELRA license, and GS-1000 will be released under Creative Commons BY-NC 3.0. You can request a trial version of SentiEcon and/or SentiEcon GS-1000 for academic purposes. Simply email us at tecnolengua [a] uma.es from an academic account with the following information, and we will respond shortly:

  • Full name
  • Institution and department / position
  • Brief description of your needs

Citing SentiEcon

If you use SentiEcon, please cite us:

Moreno-Ortiz, A., Fernández-Cruz, J., & Pérez-Hernández, C. (2020). Design and Evaluation of SentiEcon: A fine-grained Economic/Financial Sentiment Lexicon from a Corpus of Business News. Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, 5067-5074. http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.623.pdf
