Introduction
Words often convey affect (emotions, sentiment, feelings, and attitudes), either explicitly through their core meaning (denotation) or implicitly through connotation. For example, dejected denotes sadness. On the other hand, failure simply connotes sadness. Either through denotation or connotation, both words are associated with sadness. A compilation of such associations is referred to as a word–affect association lexicon (aka emotion lexicon). [This includes sentiment lexicons that capture valence (association with the positive–negative dimension) and other lexica that capture affect-related phenomena.] An entry in a lexicon usually includes a word, an emotion category or affect dimension (e.g., joy, fear, valence, arousal, etc.), and a score indicating association (or strength of association). Examples of emotion lexicons include the General Inquirer, ANEW, LIWC, Pittsburgh Subjectivity Lexicon, NRC Emotion Lexicon, and the NRC Valence, Arousal, and Dominance (VAD) Lexicon. These were all created by manual annotation (either by experts or crowdsourced). There also exist lexicons that were generated automatically from large text corpora using statistical and/or machine learning algorithms; e.g., WordNet Affect, SentiWordNet (SWN).
Emotion lexicons have a wide range of applications in commerce, public health, and research (in NLP, Psychology, Social Sciences, Digital Humanities, etc.). Some notable examples include: tracking brand and product perception via social media posts, tracking support for controversial issues and policies, tracking buy-in for non-pharmaceutical health measures such as social distancing during a pandemic, literary analysis, and developing more natural dialogue systems. The lexicons can be used on their own or in support of neural machine learning (ML) algorithms for emotion recognition. Lexicon-based emotion analyses are especially popular in real-world applications and research outside of computer science because they are interpretable, have a low carbon footprint, and do not require significant programming expertise. Further, since outputs of ML models are highly dependent on training data, use of a model often requires retraining, and there may not exist labeled data from the target domain. Further, TMemotionarcs show that when determining broad trends (emotion arcs) and aggregating information from hundreds (if not more) of instances for every time step, simple lexicon-based methods are extremely accurate (correlations above 0.95 with ground truth arcs).
However, inappropriate and incorrect use of these lexicons can lead not just to sub-optimal results, but also to inferences that are directly harmful. Examples include using lexicons to infer emotions from a limited amount of data to make judgments about refugee applications, to decide which groups of people are shown certain advertisements and which are not, or to mark businesses owned by some groups of people as less liked than those of others.
Emotions are deeply personal, private, and complex. Even the best natural language systems largely only employ pattern matching based on huge amounts of historical data, and thus often do not really understand what the user is trying to convey, let alone how they are feeling. In fact, some recent commercial and governmental uses of emotion recognition have garnered considerable criticism, including: infringing on one’s privacy, exploiting vulnerable sub-populations, and even allegations of pseudo-science.
This paper brings together ideas from Affective Computing and AI Ethics to present, in one place, some of the practical and ethical considerations involved in the creation and use of emotion lexicons: a set of best practices. [This paper is a reframed and expanded avatar of an earlier datasheet paper for emotion lexicons.] We hope this work will facilitate more thoughtfulness when one is deciding on what emotions to work on, how to create an emotion lexicon, how to use an emotion lexicon, and how to judge success. Additional benefits of such a document include:
Presents the trade-offs of relevant choices so that stakeholders can make informed decisions appropriate for their context.
Has citations and pointers; acts as a jumping off point for further reading.
Helps engage the various stakeholders of an emotion task with each other. Helps stakeholders challenge assumptions made by researchers and developers.
Helps develop harm mitigation strategies.
Acts as a useful introductory document on emotion lexicons (complements survey articles).
Note that even though this article is focused on emotion lexicons, many of the ethical considerations apply broadly to natural language lexicons/resources in general. Also, see Mohammad22AER for a broader discussion on the ethical considerations associated with automatic emotion recognition (AER).
This work is in the same spirit as other recent innovations in exercising responsible research such as datasheets for datasets, model cards for systems, and ethics sheets for AI tasks. However, unlike datasheets and model cards, which are designed for individual datasets and systems and are published after the work is done, the goal of this work is to provide a more general-purpose resource, accessible at the very beginning of one’s project. Also, unlike an ethics sheet for automatic emotion recognition, which may cover all kinds of ethical considerations associated with the task of interest, this document focuses on the creation of emotion lexicons and their use in AI tasks.
Ethics considerations are not about objective metrics or simple checklists. They involve engaging with issues that impact stakeholders, especially those who are already disadvantaged. Thus, a big component of this work is to raise awareness of relevant issues, to underscore how often there are no easy solutions, and to stress that meaningful change requires painstaking, slow, and deliberate engagement with the stakeholders. Additionally, such documents are useful for those who are impacted to question and challenge unfair decisions of automated systems and the assumptions behind them.
Best Practices
Below we present various best practices (practical and ethical considerations) pertaining to 22 aspects of emotion lexicon creation and use. The 22 aspects are grouped under the coarser categories pertaining to a lexicon’s life cycle: A. Lexicon Design, B. Annotation, C. Entries in the Lexicon, and D. Applying the Lexicon. Note that while many considerations are presented from the perspective of lexicon creation, they are also relevant to the users of a lexicon: knowing what decisions were made during the creation of a lexicon helps one assess the appropriateness of using the lexicon.
The goal is to provide a comprehensive set of relevant considerations, so that readers (especially those new to research or new to work with emotions) can find the information in one place. Thus, we include both the considerations that are especially specific to emotions, as well as others that apply more broadly (even if they are somewhat well known). Also, the points listed below are not meant to be the final word, but rather jumping off points for further thought and discussion.
Overview
An overview of the 22 aspects is presented below; followed by the detailed descriptions.
A. LEXICON DESIGN
Purpose or Objective
Emotion Category or Dimension
Word Senses and Dominant Sense Priors
Discrete or Continuous Value Labels
B. ANNOTATION
Questionnaire
Comparative Annotations
Annotators
Quality Control
C. ENTRIES IN THE LEXICON
Annotation Aggregation
Relative (not Absolute)
Coverage
Not Immutable
Perceptions (not “truth”)
Socio-Cultural Biases
Inappropriate Biases
Errors
Mechanism to Report and Fix Errors
D. APPLYING THE LEXICON
Fit of the Lexicon to One’s Data
Rescaling the Lexicon for One’s Task
Metrics & Features Drawn from the Lexicon
Removing Neutral Words
Inferences
Detailed Descriptions
A. LEXICON DESIGN
#1. Purpose or Objective: Consider and document the objective(s) of building the emotion lexicon. There can be more than one objective. The objectives guide various design choices involved in the creation of the lexicon. See selbst2019fairness for common pitfalls in designing and framing socio-technical systems; and Mohammad22AER for common pitfalls in designing and framing automatic emotion recognition tasks. Users of emotion lexicons can study the purpose of each lexicon to determine which is most suitable for their use case.
Broadly speaking, the objectives tend to be around the study of word–emotion associations (exploring various research questions at the intersection of language and emotions) and aiding automatic emotion detection from utterances. However, individual projects often have specific goals, for example, to study specific phenomena such as loneliness and empathy, to study inappropriate biases, to detect what emotions people perceive from utterances, to study how automatic systems should perceive the emotions in utterances, how automatic systems should use words to convey emotions, etc. It is important to recognize that some of these objectives are closely related, but they have important differences. For example, while a general-purpose emotion lexicon will capture a number of benign associations, it will also capture inappropriate societal biases. If one wants to use a lexicon in a text generation system, then they should either use a lexicon designed specifically for that purpose, or address the biases in a general-purpose lexicon before using it.
Work using emotion lexicons should not claim that using it one can determine one’s emotional state from their utterance. At best, recognition systems (whether they use emotion lexicons or not) capture what one is trying to convey or what is perceived by the listener/viewer; and even there, given the complexity of human expression, they are often inaccurate. Several studies have shown that it is difficult to fully measure psychological states of people.
In contrast, statistical analyses with features drawn from emotion lexicons can be used to accurately determine broad trends in the emotional state of a population over time. Here, inferences are drawn at aggregate level from much larger amounts of data. Studies on public health, such as those on loneliness, depression, suicidality prediction, bipolar disorder, stress, emotions during a pandemic, and general well-being fall in this category. Here too, however, it is best to be cautious in making claims about mental state, and use emotion recognition as one source of evidence amongst many (and involve expertise from public health and psychology).
#2. Emotion Category or Dimension: A key decision in the creation of an emotion lexicon is which conceptualization or facet of emotion to use. For example, should it capture emotion categories such as joy, sadness, fear, optimism, etc., or should it capture dimensions such as valence, arousal, and dominance? Psychologists and neuroscientists have proposed several theories of emotion that can inform the choice of categories and dimensions, including: the Basic Emotions Theory (BET), the Dimensional Theory, Cognitive Appraisal Theory, and the Theory of Constructed Emotions.
Since ML approaches rely on human-annotated data (which can be hard to obtain in large quantities), emotion recognition research has often gravitated to the Basic Emotions Theory, as that work allows one to focus on a small number of emotions. This attraction has been even stronger in the vision research because of BET’s suggested mapping between facial expressions and emotions. However, many of the tenets of BET, such as the universality of some emotions and their fixed mapping to facial expressions, stand discredited or are in question.
Carefully consider which emotion formulation you wish to capture in your lexicon, or is appropriate for your task/project. For example, one may choose to work with the dimensional model or the model of constructed emotions if the goal is to infer behavioural or health outcome predictions. Despite criticisms of BET, it makes sense for some NLP work to focus on categorical emotions such as joy, sadness, guilt, pride, fear, etc. (including what some refer to as basic emotions) because people often talk about their emotions in terms of these concepts. Many human languages have words for these concepts (even if our individual mental representations for these concepts vary to some extent). However, note that work on categorical emotions by itself is not an endorsement of the BET. Do not refer to some emotions as basic emotions, unless you mean to convey your belief in the BET. Careless endorsement of theories can lead to the perpetuation of ideas that are actively harmful (such as suggesting we can determine internal state from outward appearance—physiognomy).
#3. Word Senses and Dominant Sense Priors: Words when used in different senses and contexts may be associated with different emotions. The entries in the emotion lexicons are mostly indicative of the emotions associated with the predominant senses of the words. This is usually not too problematic because most words have a highly dominant main sense (which occurs much more frequently than the other senses). In specialized domains, some terms might have a different dominant sense than in general usage. Entries in the lexicon for such terms should be appropriately updated or removed. However, if the goal of the project is to create a lexicon for a specialized domain, then one should guide the annotation process accordingly.
#4. Discrete or Continuous Value Labels: Many emotion lexicons have discrete binary labels for words (positive–negative, joy–no joy, fear–no fear, and so on). Lexicons such as ANEW and the NRC VAD Lexicon have real-valued scores (on scales such as 0 to 1, -1 to 1, 0 to 5, or 0 to 100). Real-valued scores allow one to make finer distinctions in the degree of emotion. They allow one to determine the intensity of emotion. Binary-labeled lexicons are used primarily to determine density of emotion word usage; for example, to explore whether there is a higher percentage of tweets with loneliness words during the Covid-19 pandemic than in the years before the pandemic. Determine which type of lexicon is more aligned with your objectives.
B. ANNOTATION
#5. Questionnaire: Arguably the most crucial aspect in the creation of an emotion lexicon is the questionnaire. What is asked and how it is asked determines the outcome. Below are key recommendations in the design of questionnaires:
Where appropriate, break the task/question into simpler sub-tasks/sub-questions.
It is better to have separate tasks for different questions and emotion dimensions. Asking for responses about more than one emotion dimension requires the annotator to switch contexts and leads to more cognitive load.
Keep the instructions clear and easy to follow.
Examples are more important than definitions. People tend to learn faster and better through examples. It is still good to include simple definitions of relevant concepts.
Refer to work on theories of emotion in psychology for how to collect emotional information from respondents. Especially useful are the terms used to define emotion dimensions: e.g., as per the dimensional model of emotions, arousal is defined as the active–sluggish dimension; in the stereotype content model of social perception, warmth is defined as the trustworthiness, friendliness, kindness dimension. These words should be used when eliciting annotation responses.
Keep the instructions brief. This is respectful of annotator time, and one can only keep track of a limited number of instructions at a time.
Explain the purpose of the annotation task. This is respectful of annotators. People have a right to know (in appropriate detail) what research they are contributing their time for. This may also lead to more engaged annotators.
Include an optional comment box that gives annotators a way to provide feedback, raise issues, and to be heard.
Make the questionnaire and instructions freely available. This helps others to build on your work. It allows users to see exactly how the questions were phrased, and thus how to interpret the resulting emotion lexicon.
See also other data curation and questionnaire development tips from non-NLP fields such as psychology.
#6. Comparative Annotations: Real-valued scores provide fine-grained emotion information; however, it is difficult for humans to provide direct scores at this granularity. A popular approach to obtain real-valued scores is by providing the annotators with numeric rating scales. [https://www.questionpro.com/blog/rating-scale/] These scales have numbers (usually 1 to 5 or 1 to 7) and the annotator has to select which number is most indicative of the degree of association with the property of interest for the given word, where the lowest number on the scale indicates least association and the highest number indicates the most association. [It is good practice to anchor the numeric values with labels such as maximum/moderate/low association.] The scores for an item from multiple annotators are averaged to obtain a real-valued score that is assigned to the word–emotion pair.
A common problem of annotation by rating scales is inconsistencies in annotations among different annotators. One annotator might assign a score of 87 to one word, while another annotator may assign a score of 81 to the same word. It is also common that the same annotator might assign different scores to the same word, if asked to annotate again after a period of time. Further, annotators often have a bias towards selecting scores in the middle of the scale, known as scale region bias.
Paired Comparisons is a comparative annotation method, where respondents are presented with pairs of items and asked which item has more of the property of interest (for example, which is more positive). The annotations can then be converted into a ranking of items by the property of interest, and one can even obtain real-valued scores indicating the degree to which an item is associated with the property of interest. The paired comparison method does not suffer from the problems discussed above for the rating scale, but it requires a large number of annotations (order $N^2$, where $N$ is the number of items to be annotated).
Best–worst scaling (BWS) is a form of comparative annotation, like paired comparison, but it requires far fewer annotations. Annotators are given $n$ items (an $n$-tuple, where $n > 1$ and commonly $n = 4$). [At its limit, when $n=2$, best–worst scaling reduces to a paired comparison; however, a much larger set of tuples then needs to be annotated (closer to $N^2$).] They are asked which item is the best (highest in terms of the property of interest) and which is the worst (least in terms of the property of interest). When working on $4$-tuples, best–worst annotations are particularly efficient because each best and worst annotation will reveal the order of five of the six item pairs (e.g., for a 4-tuple with items w, x, y, and z, if w is the best, and z is the worst, then w $>$ x, w $>$ y, w $>$ z, x $>$ z, and y $>$ z). Real-valued scores of association between the items and the property of interest can be determined using simple arithmetic on the number of times an item was chosen best and number of times it was chosen worst. It has been empirically shown that three annotations each for $2N$ $4$-tuples are sufficient for obtaining reliable scores (where $N$ is the number of items). Kiritchenko and Mohammad (maxdiff-naacl2016, kiritchenko2017best) showed through empirical experiments on emotion lexicons that BWS produces more reliable and more discriminating scores than those obtained using rating scales.
Within the NLP community, BWS has been used for creating datasets for relational similarity, word-sense disambiguation, word–sentiment intensity, sentence–sentence semantic relatedness, etc.
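The "simple arithmetic" used to turn best–worst annotations into scores can be sketched as follows. This is a minimal illustration, assuming one common counting procedure: the proportion of times an item was chosen best minus the proportion of times it was chosen worst, which gives scores between -1 and 1. The function name and the toy annotations are hypothetical.

```python
from collections import Counter

def bws_scores(annotations):
    """Score items from best-worst annotations.

    annotations: iterable of (items, best, worst), where `items` is the
    tuple shown to the annotator. Assumed formula (not taken from the
    text above): (#best - #worst) / #appearances.
    """
    best, worst, seen = Counter(), Counter(), Counter()
    for items, b, w in annotations:
        seen.update(items)   # every item in the tuple appeared once
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}

# Three annotations of the same 4-tuple (toy data).
annotations = [
    (("w", "x", "y", "z"), "w", "z"),
    (("w", "x", "y", "z"), "w", "z"),
    (("w", "x", "y", "z"), "x", "z"),
]
scores = bws_scores(annotations)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In practice each item appears in several different tuples, so the counts are pooled over all tuples and annotators before the scores are computed.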
#7. Annotators: Who is recruited to annotate the data also impacts the lexicon that is generated.
Experts or Crowd: If a task has clear correct and wrong answers and knowing the answers requires some training/qualifications, then one can employ domain experts to annotate the data. However, emotion annotations largely do not fall in this category. People are the best judges of their emotions and how they use words to communicate them. If the goal is to determine how people use language or we want to know how people perceive words, phrases, and sentences then we might want to employ a large number of annotators (crowdsourcing). Note that this is also a scenario where there can be more than one appropriate answer.
Diversity: Emotion lexicons are a function of their annotators. Consider who all should be represented in the annotator pool, and actively recruit people from under-represented groups. Seek appropriate demographic information (respectfully and ethically). Document annotator demographics at an aggregate level.
Informed Consent, Privacy, and Potential for Harms: Provide a clear and easy-to-understand description of what the task will involve, potential risks, and what information will be collected, before obtaining consent from the annotators. Note that if the terms included for annotation or the chosen dimension of annotation is particularly negative, then there may be significant risk of adversely impacting the annotator’s mental health. In such cases, suitable avenues for recourse must be provided.
Remuneration: Determine fair compensation for the task. Inform the annotators of the pay and the time commitment expected.
Miscellaneous: There are several other ethical considerations also involved with such work such as: worker invisibility, lack of learning trajectory, humans-as-a-service paradigm, worker well-being, and worker rights.
Ethics Approval: Obtain approval of the project and annotation plan from your institution’s research ethics board before conducting the annotation. The ethics boards are also a great source of feedback for improving the ethical standards of the annotation process. If unsure whether some work requires ethics approval, reach out to the ethics board. Many institutions provide expedited review in cases of low risk.
Document these considerations so that the users can judge suitability of the lexicon for their work.
#8. Quality Control: Good quality control strategies can make a large difference for any scenario of annotations, but are especially important when the annotations are done via crowdsourcing. Quality control strategies can be of three kinds:
Type 1: applied before data annotation begins
Type 2: applied during data annotation, and
Type 3: applied after data annotation.
It is recommended to apply measures of all three kinds. Examples of Type 1 include: careful questionnaire design and setting up training or qualification annotations to screen annotators.
A particularly powerful example of a Type 2 measure is to intersperse the instances with a small number of hidden gold instances ($\sim$5%): instances for which the appropriate label(s) are pre-determined (by, say, the authors). If a crowd worker responds with an answer not already marked as appropriate, then they are immediately notified and the annotation is discarded. If an annotator’s accuracy on the gold questions falls below a pre-chosen threshold (say, 80%), then they are refused further annotation, and all of their annotations are discarded. This way the gold instances serve as a mechanism to avoid malicious annotations, as well as a way to further train the annotators. This also avoids scenarios where an annotator provides responses to a large number of questions, only to later learn that they misinterpreted something, rendering all of their annotations useless. The use of gold questions was popularized by the crowdsourcing platform CrowdFlower (now Figure8).
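The gold-question check described above can be sketched after the fact as a simple filter. This is a minimal sketch under stated assumptions: responses are (annotator, item, label) triples, `gold` maps each hidden gold item to its set of acceptable labels, and the 80% threshold is the example value from the text. The function name and data layout are hypothetical; a live platform would apply the check during annotation rather than afterwards.

```python
def filter_by_gold(responses, gold, threshold=0.8):
    """Drop annotators whose accuracy on gold items is below `threshold`.

    responses: list of (annotator, item, label) triples.
    gold: dict mapping a gold item to the set of acceptable labels.
    Returns (kept non-gold responses, set of rejected annotators).
    """
    hits, total = {}, {}
    for annotator, item, label in responses:
        if item in gold:
            total[annotator] = total.get(annotator, 0) + 1
            hits[annotator] = hits.get(annotator, 0) + (label in gold[item])
    rejected = {a for a in total if hits[a] / total[a] < threshold}
    # Gold items only serve as checks, so they are excluded from the output.
    kept = [r for r in responses if r[0] not in rejected and r[1] not in gold]
    return kept, rejected

responses = [
    ("a1", "g1", "pos"), ("a1", "g2", "neg"), ("a1", "w1", "pos"),
    ("a2", "g1", "neg"), ("a2", "g2", "neg"), ("a2", "w1", "neg"),
]
gold = {"g1": {"pos"}, "g2": {"neg"}}
kept, rejected = filter_by_gold(responses, gold)
```

Annotators who have not yet seen any gold item are kept in this sketch; whether that is appropriate depends on how densely the gold items are interspersed.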
Examples of Type 3 quality control measures include: removal of responses from people who answer questions too quickly, or whose responses are more than two standard deviations away from the responses of others. There also exist approaches that identify which annotators to trust using machine learning algorithms.
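One possible reading of the two-standard-deviation rule is sketched below. It assumes numeric ratings and, for each response, compares it with the mean and standard deviation of the other annotators' ratings of the same item. The function name, the minimum of three other responses, and the toy ratings are assumptions made for illustration.

```python
from statistics import mean, stdev

def flag_outliers(ratings, k=2.0):
    """Flag responses more than k standard deviations from the others.

    ratings: dict mapping item -> dict mapping annotator -> numeric rating.
    Returns a list of (annotator, item) pairs.
    """
    flagged = []
    for item, by_annotator in ratings.items():
        for annotator, value in by_annotator.items():
            others = [v for a, v in by_annotator.items() if a != annotator]
            if len(others) < 3:
                continue  # too few other responses to estimate a spread
            sd = stdev(others)
            # When the others agree exactly (sd == 0), nothing is flagged.
            if sd > 0 and abs(value - mean(others)) > k * sd:
                flagged.append((annotator, item))
    return flagged

ratings = {"joyful": {"a1": 4, "a2": 5, "a3": 4, "a4": 5, "a5": 1}}
flagged = flag_outliers(ratings)
```

A single flagged response is weak evidence on its own, since emotion annotations can have more than one appropriate answer; counting how often each annotator is flagged across many items is a more cautious basis for removing anyone.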
C. ENTRIES IN THE LEXICON
#9. Annotation Aggregation: Each instance in a lexicon (usually a word) is often annotated by a number of annotators. Standard practice in aggregating the responses from multiple annotators is to take the most frequent response. However, it should be noted that sometimes other responses are also appropriate. Further, different socio-cultural groups can perceive language differently, and taking the majority vote can have the effect of only considering the perceptions of the majority group. When these views are crystallized in the form of a lexicon, it can lead to the false perception that the norms so captured are “standard” or “correct”, whereas other associations are “non-standard” or “incorrect”. Thus, it is worth explicitly disavowing that view and stating that the lexicon simply captures the perceptions of the majority group among the annotators. It is also recommended to make available the disaggregated annotations (annotations in their raw form, without aggregation). Note that it is also problematic to consider all annotator responses as valid because sometimes annotators make mistakes, and some may have inappropriate biases (see #15).
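A minimal sketch of majority-vote aggregation that also keeps the disaggregated picture is shown below. The function name and the toy labels are hypothetical; the point is only that the full label distribution can be released alongside the majority label.

```python
from collections import Counter

def aggregate(labels):
    """Return the most frequent label and the full label distribution.

    labels: list of labels given by different annotators to one
    word-emotion pair. Ties are broken by first occurrence.
    """
    counts = Counter(labels)
    majority, _ = counts.most_common(1)[0]
    distribution = {lab: n / len(labels) for lab, n in counts.items()}
    return majority, distribution

majority, distribution = aggregate(
    ["fear", "fear", "no fear", "fear", "no fear"]
)
```

Here two of five annotators disagreed with the majority label, which is information a user of the lexicon would lose if only the aggregated entry were published.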
#10. Relative (not Absolute): The absolute values of the association scores themselves usually have no meaning. The scores help order the words relative to each other. For example, a term with a high valence score is associated with more positiveness than a term with a lower score.
#11. Coverage: Some lexicons have a few hundred terms, and some have tens of thousands of terms. However, even the largest lexicons do not include all the terms in a language. Mostly, they include entries for the canonical forms (lemmas), but some also include morphological variants. The high-coverage lexicons, such as the NRC Emotion Lexicon, have tens of thousands of terms. However, when using the lexicons in specialized domains, one may find that a number of common terms in the domain are not listed in the lexicons.
#12. Not Immutable: The associations do not indicate an inherent unchangeable attribute. Emotion associations can change with time, but these lexicon entries are largely fixed. They pertain to the time they are created or the time associated with the corpus from which they are created.
#13. Perceptions (not “truth”):
Emotion lexicons largely capture how speakers of a language perceive the emotion associations of words. As mentioned in the previous bullet, this can change with time. Further, it can also be different for different people. Mohammad and Turney (MohammadT13) found that when the annotators are asked to judge emotion associations in terms of 'how speakers of a language perceive the word', the results have lower variance than when asked about 'the emotions evoked in the annotator'. Consider your objective when deciding which of the two framings (or some other) is more appropriate for your use case.
#14. Socio-Cultural Biases: Since the emotion lexicons have been created by people (directly through crowdsourcing or indirectly through the texts written by people) they capture various human biases. These biases may be systematically different for different socio-cultural groups. Document who produced the data (people from which countries, what is the gender distribution, age distribution, etc.) in the paper describing the dataset or in the associated datasheet. An advantage of crowdsourcing is that the annotations are from a wider pool of annotators; however, crowd annotators are systematically different from, and not representative of, the general population.
#15. Inappropriate Biases: Some of the human biases that have percolated into the lexicons may be rather inappropriate. For example, entries with low valence scores for certain demographic groups or social categories. Studying such biases in the lexicon can be useful to show and address some of the historical inequities that have plagued humankind. Nonetheless, when these lexicons are used in specific tasks, care must be taken to remove such entries from the lexicons where necessary.
#16. Errors: Even though researchers take several measures to ensure high-quality and reliable data annotation (e.g., multiple annotators, clear and concise questionnaires, framing tasks as comparative annotations, interspersed check questions, etc.), human error can never be fully eliminated in large-scale annotations. Expect a small number of clearly wrong entries. Automatically generated lexicons can also have erroneous entries. They are often built on the assumption that the tendency of a word to co-occur with emotion-associated seed terms is proportional to its association with that emotion. However, in any corpus, there will always be some amount of chance high co-occurrences that are not accurate reflections of the true associations.
#17. Mechanism to Report and Fix Errors: Provide a mechanism for users to report issues and errors. Fix errors and, where appropriate, issue warnings about how some types of entries can be misinterpreted or misused. Periodically assess whether certain types of entries need to be proactively checked. For example, there has been growing recognition that emotion associations for identity groups are particularly sensitive and affected by historical bias, and so one must be careful in how one interprets the associations captured in lexicons.
D. APPLYING THE LEXICON
#18. Examining the Fit of the Lexicon: Manually examine the emotion associations of the most frequent terms in your data. Remove entries from the lexicon that are not suitable (due to mismatch of sense, inappropriate human bias, etc.).
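A small helper can make this manual examination easier by listing the most frequent terms in one's data next to their lexicon entries. This is a minimal sketch: the function name, the lowercasing step, and the toy scores are assumptions, and the output is meant to be read by a person rather than used automatically.

```python
from collections import Counter

def frequent_terms_with_scores(tokens, lexicon, top_k=10):
    """List the top_k most frequent terms with their lexicon score.

    Terms that are not in the lexicon are shown with a score of None.
    """
    counts = Counter(t.lower() for t in tokens)
    return [(term, n, lexicon.get(term)) for term, n in counts.most_common(top_k)]

# Toy data with hypothetical valence scores.
tokens = ["Harry", "met", "Harry", "failure"]
lexicon = {"harry": -0.4, "failure": -0.8}
report = frequent_terms_with_scores(tokens, lexicon, top_k=2)
```

In this toy example the most frequent scored term is a person's name, which is exactly the kind of sense mismatch that such a review is meant to surface.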
#19. Rescaling the Lexicon for One’s Task: Depending on your specific use case, you may choose to re-scale the scores from 0 to 1, -1 to 1, 1 to 10, etc. Note that if using the lexicon entries as features in machine learning experiments, the scale (0 to 1 or -1 to 1) can make a difference—e.g. if the score is used as a weight for features.
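A linear rescaling keeps the relative order of the entries intact while changing the range. The sketch below assumes the lexicon's nominal range is known (0 to 1 here); the function name and the toy entries are hypothetical.

```python
def rescale(lexicon, old_range=(0.0, 1.0), new_range=(-1.0, 1.0)):
    """Linearly map scores from old_range to new_range."""
    (a, b), (c, d) = old_range, new_range
    return {w: c + (s - a) * (d - c) / (b - a) for w, s in lexicon.items()}

lexicon = {"happy": 1.0, "table": 0.5, "awful": 0.0}
rescaled = rescale(lexicon)
```

The difference matters when a score is used as a feature weight: on the 0 to 1 scale the mid-scale word contributes 0.5, while on the -1 to 1 scale it contributes nothing.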
#20. Metrics and Features Drawn from the Lexicon: For text analysis, one can calculate various metrics such as the percentage of emotion words (when the lexicon provides a list of words associated with a category) or average emotion intensity (for real-valued associations). When determining the scores, a further choice is how to handle words that are not in the lexicon. Two common approaches include: 1. Treat words that are not in the lexicon as neutral; 2. Ignore these words in the calculation of the scores. The latter approach does not make assumptions of neutrality, and is not impacted by the number of such out-of-lexicon words in a piece of text. See TMemotionarcs for a systematic analysis of the impact of various lexicon features on the quality of emotion arcs generated with them.
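The two metrics and the two ways of handling out-of-lexicon words can be sketched as follows. The function names, the `oov` parameter, and the choice of 0.0 as the neutral score (which presumes a -1 to 1 scale) are assumptions made for illustration.

```python
def percent_emotion_words(tokens, emotion_words):
    """Percentage of tokens that appear in a category word list."""
    return 100.0 * sum(t in emotion_words for t in tokens) / len(tokens)

def average_intensity(tokens, lexicon, oov="ignore", neutral=0.0):
    """Average real-valued score of the tokens.

    oov="ignore": skip words that are not in the lexicon.
    oov="neutral": count them with the assumed neutral score.
    """
    if oov == "ignore":
        scores = [lexicon[t] for t in tokens if t in lexicon]
    elif oov == "neutral":
        scores = [lexicon.get(t, neutral) for t in tokens]
    else:
        raise ValueError("oov must be 'ignore' or 'neutral'")
    return sum(scores) / len(scores) if scores else None

tokens = ["the", "lonely", "night", "felt", "awful"]
lexicon = {"lonely": -0.6, "awful": -0.8}  # hypothetical valence scores

density = percent_emotion_words(tokens, set(lexicon))
avg_ignore = average_intensity(tokens, lexicon, oov="ignore")
avg_neutral = average_intensity(tokens, lexicon, oov="neutral")
```

With the "neutral" option the average shrinks toward the neutral score as more out-of-lexicon words appear in the text, which is the dependence the "ignore" option avoids.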
#21. Creating Subsets of the Lexicon: Sometimes it is better to use a subset of the emotion lexicon, rather than the whole lexicon.
Removing Neutral Words: One can use the whole lexicon to calculate metrics such as average valence of the words in a text; however, one can also choose to disregard terms with close to 0 valence scores when calculating the same metric. Removal of such neutral terms from the analysis will show greater variations in the average scores when comparing across different sets of data of interest or across time. For example, when looking at the average tweet happiness over time of day, using the full or the neutral-removed lexicon is expected to give roughly similar curves, but the neutral-removed lexicon will show a greater amplitude (divergence of scores from the peaks to troughs). This has been described as turning up the magnifier knob in a microscope. Note, however, that just having larger score differences between the target and control does not mean that the emotion word usage is substantially different or significant; and conversely, just because the score difference for a metric is small in value does not mean that the differences in emotion word usages are not substantial. (More on this in #22).
Removing Low-Association Words: Use of low-association terms from a lexicon may not be beneficial for some downstream applications. These entries may also include a greater percentage of annotation errors. See TMemotionarcs for experiments on multiple datasets and multiple emotion dimensions that examine usefulness of removing low-association terms from a lexicon when generating emotion arcs.
Removing Highly Polysemous and Certain Domain Words: For some applications, it is beneficial to discard highly ambiguous words. Entries for highly ambiguous words are more likely to include emotion associations for a sense that is not common in one’s data. As stated in #3, it is also recommended to remove entries not appropriate for the target domain; e.g., the word harry has a negative meaning, but it should not be used when analyzing text where a person has the name Harry.
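The three kinds of subsets discussed in #21 can be sketched as simple filters over a word-to-score mapping. This is only an illustration: the helper names, the `margin` and `threshold` values, and the toy scores are assumptions chosen for the example, and suitable cut-offs depend on the lexicon and the application.

```python
from typing import Dict, Iterable, List


def remove_neutral(lexicon: Dict[str, float], neutral_point: float = 0.0,
                   margin: float = 0.2) -> Dict[str, float]:
    """Drop entries whose score lies within `margin` of the neutral point."""
    return {w: s for w, s in lexicon.items() if abs(s - neutral_point) > margin}


def remove_low_association(lexicon: Dict[str, float],
                           threshold: float) -> Dict[str, float]:
    """Keep entries with an association score of at least `threshold`
    (meant for intensity lexicons with scores such as 0..1)."""
    return {w: s for w, s in lexicon.items() if s >= threshold}


def remove_words(lexicon: Dict[str, float],
                 blocklist: Iterable[str]) -> Dict[str, float]:
    """Drop ambiguous or domain-inappropriate entries (e.g., 'harry')."""
    blocked = {w.lower() for w in blocklist}
    return {w: s for w, s in lexicon.items() if w.lower() not in blocked}


def mean_score(tokens: List[str], lexicon: Dict[str, float]) -> float:
    """Average score over in-lexicon tokens (out-of-lexicon words ignored)."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores)


# Toy valence lexicon with made-up scores on a -1..1 scale.
toy_valence = {"joyful": 0.9, "table": 0.05, "harry": -0.4, "awful": -0.9}
subset = remove_words(remove_neutral(toy_valence), ["harry"])

text_a = ["joyful", "table", "table"]
text_b = ["awful", "table", "table"]

# Difference between the two texts with the full and the reduced lexicon.
diff_full = mean_score(text_a, toy_valence) - mean_score(text_b, toy_valence)
diff_subset = mean_score(text_a, subset) - mean_score(text_b, subset)
```

In this toy case the neutral-removed lexicon yields a larger difference between the two texts than the full lexicon does, which mirrors the greater amplitude described above. As noted in #21, a larger difference alone says nothing about whether the underlying difference is meaningful.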
#22. Inferences: When drawing inferences from texts using counts of emotion words:
It is more appropriate to make claims about emotion word usage than about the emotions of the speakers. For example, 'the use of anger words grew by 20%' rather than 'anger grew by 20%'. A marked increase in anger words is likely an indication that anger increased, but there is no evidence that anger increased by 20%. Further, it is important to understand the emotion metrics and to interpret them accordingly. For example, many off-the-shelf tools provide a "sentiment score" for the input textual instances without providing adequate details about what this score means. As discussed in #21, the scores themselves can have large or small values, and just knowing that the score difference between a target and control is large (or small) is not enough to draw meaningful inferences. On the other hand, grounded metrics that tie the score to attributes such as the percentage of positive words tend to be less open to misinterpretation.
Comparative analysis is your friend. Often, emotion word counts on their own are not useful. For example, 'the use of anger words grew by 20% when compared to [data from last year, data from a different person, etc.]' is more useful than saying 'on average, 5 anger words were used in every 100 words'.
Lexicon features (or any other automatically drawn features) are not well suited to drawing meaningful emotional inferences from individual utterances. Human language and behaviour are highly variable and complex. However, with careful design, they can be useful for drawing inferences about broad trends at an aggregate level.
Inferences drawn from large amounts of text are more reliable than those drawn from small amounts of text. TMemotionarcs show that, among a host of features they explored, this is the single most important one in determining the fidelity of the predicted emotion trends to the true emotion trends. For many emotion dimensions and dataset domains, it is advisable to determine aggregate emotion scores using at least 100 instances. For example, if there are at least 100 tweets per day about a product of interest, the average valence score of all the words in each day's tweets is expected to produce a fairly accurate valence arc (the x-axis is the day, and the y-axis is the average valence score for that day).
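The aggregation and comparison advice in #22 can be sketched as follows. This is a hedged illustration, not the procedure used in TMemotionarcs: the function names, the choice to ignore out-of-lexicon words, and the `min_instances` default of 100 (taken from the guideline above) are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def valence_arc(
    instances: Iterable[Tuple[Hashable, List[str]]],
    lexicon: Dict[str, float],
    min_instances: int = 100,
) -> Dict[Hashable, Optional[float]]:
    """Average valence of in-lexicon words per time step.

    `instances` yields (time_step, tokens) pairs, e.g. one pair per tweet.
    Time steps with fewer than `min_instances` instances (or with no
    in-lexicon words) get None instead of an unreliable score.
    """
    score_sum = defaultdict(float)
    word_count = defaultdict(int)
    instance_count = defaultdict(int)
    for step, tokens in instances:
        instance_count[step] += 1
        for t in tokens:
            if t in lexicon:
                score_sum[step] += lexicon[t]
                word_count[step] += 1
    arc = {}
    for step in sorted(instance_count):
        if instance_count[step] < min_instances or word_count[step] == 0:
            arc[step] = None
        else:
            arc[step] = score_sum[step] / word_count[step]
    return arc


def relative_change(target_pct: float, control_pct: float) -> float:
    """Percent change of an emotion-word percentage relative to a control,
    supporting statements such as 'the use of anger words grew by 20%
    compared to last year'."""
    return 100.0 * (target_pct - control_pct) / control_pct


# Toy data: day 1 has 100 instances, day 2 has only 99.
toy_lexicon = {"good": 0.5}
toy_instances = [(1, ["good", "day"])] * 100 + [(2, ["good"])] * 99
arc = valence_arc(toy_instances, toy_lexicon)
growth = relative_change(target_pct=6.0, control_pct=5.0)
```

Here day 2 is left without a score because it falls short of the instance threshold, and the comparison is expressed in terms of emotion word usage rather than the emotions of the speakers.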
Limitations
This paper does not present a new NLP model or dataset. Thus, there are no corresponding limitations to discuss. However, the paper itself can be viewed as a document discussing the limitations of existing approaches to sentiment and emotion analysis using emotion lexica. The 22 best practices presented in the paper discuss approaches to engage with and counter these limitations.
While this document was a result of engaging a larger community through blog posts, talks, and discussions, we had relatively low access to developers of commercial sentiment analysis systems. Thus the list presented here may have missed some important considerations. We encourage readers and impacted stakeholders to challenge the assumptions latent in the document, and identify new ethical considerations not included here or not gaining adequate attention in the research community.
Concluding Remarks
Emotion lexicons are simple yet powerful tools to analyze text. However, use of the lexicons (even for tasks that they are suited for) can lead to inappropriate bias. Applying a lexicon to any new data should only be done after first investigating its suitability, and requires careful analysis to minimize unintentional harm. In this paper, we presented 22 best practices that include considerations that can help mitigate such unwanted outcomes, as well as strategies to make the best use of emotion lexicons towards drawing meaningful and accurate inferences. The best practices are organized as per a lexicon’s life cycle: A. Lexicon Design, B. Annotation, C. Entries in the Lexicon, and D. Applying the Lexicon. We also provide pointers to relevant literature to explore the best practices in more detail. It should be noted that these practices are not meant to be the final word, but rather jumping-off points for further thought, discussion, and additional measures towards the responsible use of emotion lexicons.
Acknowledgments
Many thanks to Emiel van Miltenburg, Annika Schoene, Mallory Feldman, Tara Small, Roman Klinger, and Peter Turney for thoughtful comments and discussions.
Bibliography
@article{Mohammad22AER,
year = {2022},
month = {June},
pages = {239--278},
number = {2},
volume = {48},
journal = {Computational Linguistics},
author = {Mohammad, Saif M.},
title = {Ethics Sheet for Automatic Emotion Recognition and Sentiment Analysis},
}

@article{stark2018algorithmic,
publisher = {SAGE Publications Sage UK: London, England},
year = {2018},
pages = {204--231},
number = {2},
volume = {48},
journal = {Social Studies of Science},
author = {Stark, Luke},
title = {Algorithmic psychometrics and the scalable subject},
}

@inproceedings{VM2022-TED,
year = {2022},
address = {Marseille, France},
booktitle = {Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)},
author = {Krishnapriya Vishnubhotla and Saif M. Mohammad},
title = {Tweet Emotion Dynamics: Emotion Word Usage in Tweets from US and Canada},
}

@inproceedings{rtz2013cschwaharacterizing,
year = {2013},
pages = {583--591},
booktitle = {Seventh International AAAI Conference on Weblogs and Social Media},
author = {Schwartz, Hansen Andrew and Eichstaedt, Johannes C and Kern, Margaret L and Dziurzynski, Lukasz and Lucas, Richard E and Agrawal, Megha and Park, Gregory J and Lakshmikanth, Shrinidhi K and Jha, Sneha and Seligman, Martin EP and others},
title = {Characterizing geographic variation in well-being using tweets},
}

@inproceedings{mohammad-2022-ethics,
abstract = {Several high-profile events, such as the mass testing of emotion recognition systems on vulnerable sub-populations and using question answering systems to make moral judgments, have highlighted how technology will often lead to more adverse outcomes for those that are already marginalized. At issue here are not just individual systems and datasets, but also the AI tasks themselves. In this position paper, I make a case for thinking about ethical considerations not just at the level of individual models and datasets, but also at the level of AI tasks. I will present a new form of such an effort, Ethics Sheets for AI Tasks, dedicated to fleshing out the assumptions and ethical considerations hidden in how a task is commonly framed and in the choices we make regarding the data, method, and evaluation. I will also present a template for ethics sheets with 50 ethical considerations, using the task of emotion recognition as a running example. Ethics sheets are a mechanism to engage with and document ethical considerations before building datasets and systems. Similar to survey articles, a small number of carefully created ethics sheets can serve numerous researchers and developers.},
pages = {8368--8379},
doi = {10.18653/v1/2022.acl-long.573},
url = {https://aclanthology.org/2022.acl-long.573},
publisher = {Association for Computational Linguistics},
address = {Dublin, Ireland},
year = {2022},
month = {May},
booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
author = {Mohammad, Saif},
title = {Ethics Sheets for {AI} Tasks},
}

@misc{mohammad2020practical,
primaryclass = {cs.CL},
archiveprefix = {arXiv},
eprint = {2011.03492},
year = {2020},
author = {Saif M. Mohammad},
title = {Practical and Ethical Considerations in the Effective use of Emotion and Sentiment Lexicons},
}

@inproceedings{strapparava2004wordnet,
organization = {Lisbon},
year = {2004},
pages = {40},
number = {1083-1086},
volume = {4},
booktitle = {Lrec},
author = {Strapparava, Carlo and Valitutti, Alessandro and others},
title = {Wordnet affect: an affective extension of wordnet.},
}

@inproceedings{MohammadKZ2013,
address = {Atlanta, Georgia, USA},
year = {2013},
month = {June},
booktitle = {Proceedings of the seventh international workshop on Semantic Evaluation Exercises (SemEval-2013)},
title = {NRC-Canada: Building the State-of-the-Art in Sentiment Analysis of Tweets},
author = {Mohammad, Saif M. and Kiritchenko, Svetlana and Zhu, Xiaodan},
}

@inproceedings{baccianella2010sentiwordnet,
year = {2010},
pages = {2200--2204},
volume = {10},
series = {LREC '10},
booktitle = {Proceeding of the 7th International Conference on Language Resources and Evaluation},
author = {Baccianella, Stefano and Esuli, Andrea and Sebastiani, Fabrizio},
title = {{SentiWordNet 3.0:} An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining},
}

@misc{physiognomy_2017,
month = {May},
year = {2017},
author = {Arcas, Blaise and Mitchell, Margaret and Todorov, Alexander},
howpublished = {Medium. \url{https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a}},
url = {https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a},
title = {Physiognomy's New Clothes},
}

@misc{ongweso_2020,
month = {Sep},
year = {2020},
author = {Ongweso, Edward},
howpublished = {Vice. \url{https://www.vice.com/en/article/g5pawq/an-ai-paper-published-in-a-major-journal-dabbles-in-phrenology}},
title = {An {AI} Paper Published in a Major Journal Dabbles in Phrenology},
}

@article{hertzmann2020computers,
publisher = {ACM New York, NY, USA},
year = {2020},
pages = {45--48},
number = {5},
volume = {63},
journal = {Communications of the ACM},
author = {Hertzmann, Aaron},
title = {Computers do not make art, people do},
}

@misc{article19_2021,
month = {Jan},
year = {2021},
author = {ARTICLE19},
howpublished = {\url{https://www.article19.org/wp-content/uploads/2021/01/ER-Tech-China-Report.pdf}},
title = {Emotional Entanglement: China’s emotion recognition market and its implications for human rights},
}

@misc{wakefield_2021,
month = {May},
year = {2021},
author = {Wakefield, Jane},
howpublished = {BBC. \url{https://www.bbc.com/news/technology-57101248}},
title = {{AI} emotion-detection software tested on {U}yghurs},
}

@misc{woensel_nevil_2019,
month = {Mar},
year = {2019},
author = {Woensel, Lieve Van and Nevil, Nissy},
howpublished = {European Parliamentary Research Service, PE 634.415. \url{https://www.europarl.europa.eu/RegData/etudes/ATAG/2019/634415/EPRS_ATA(2019)634415_EN.pdf}},
title = {What if your emotions were tracked to spy on you?},
}

@misc{MaxDiff_2007,
year = {2007},
author = {Sawtooth Software Inc.},
title = {The MaxDiff/Web System Technical Paper},
}

@book{david1963method,
address = {New York},
publisher = {Hafner Publishing Company},
year = {1963},
author = {David, Herbert Aron},
title = {The method of paired comparisons},
}

@article{thurstone1927law,
publisher = {Psychological Review Company},
year = {1927},
pages = {273},
number = {4},
volume = {34},
journal = {Psychological review},
author = {Thurstone, Louis L.},
title = {A law of comparative judgment},
}

@book{fechner1966elements,
publisher = {New York: Holt, Rinehart and Winston},
year = {1966},
author = {Fechner, Gustav},
title = {Elements of psychophysics. Vol. I.},
}

@inproceedings{mixedpol-naacl2016,
address = {San Diego, California},
year = {2016},
booktitle = {Proceedings of The 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)},
author = {Kiritchenko, Svetlana and Mohammad, Saif M.},
title = {Sentiment Composition of Words with Opposing Polarities},
}

@inproceedings{maxdiff-naacl2016,
address = {San Diego, California},
year = {2016},
booktitle = {Proceedings of The 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)},
author = {Kiritchenko, Svetlana and Mohammad, Saif M.},
title = {Capturing Reliable Fine-Grained Sentiment Associations by Crowdsourcing and Best--Worst Scaling},
}

@inproceedings{SemEval2016Task7,
address = {San Diego, California},
year = {2016},
month = {June},
booktitle = {Proceedings of the International Workshop on Semantic Evaluation (SemEval)},
title = {{SemEval-2016 Task 7}: Determining Sentiment Intensity of {E}nglish and {A}rabic Phrases},
author = {Kiritchenko, Svetlana and Mohammad, Saif M. and Salameh, Mohammad},
}

@book{Louviere2015,
year = {2015},
publisher = {Cambridge University Press},
title = {{Best-Worst Scaling}: Theory, Methods and Applications},
author = {Jordan J. Louviere and Terry N. Flynn and A. A. J. Marley},
}

@misc{Louviere_1990,
note = {Department of Marketing and Economic Analysis, University of Alberta},
year = {1990},
howpublished = {Working Paper},
title = {Best-Worst Analysis},
author = {Jordan J. Louviere and George G. Woodworth},
}

@incollection{flynn2014,
publisher = {Edward Elgar Publishing},
year = {2014},
pages = {178--201},
editor = {Stephane Hess and Andrew Daly},
booktitle = {Handbook of Choice Modelling},
author = {Flynn, T. N. and Marley, A. A. J.},
title = {Best-worst scaling: theory and methods},
}

@article{MohammadSK16,
volume = {Submitted},
year = {2016},
journal = {Special Section of the ACM Transactions on Internet Technology on Argumentation in Social Media},
author = {Mohammad, Saif M. and Sobhani, Parinaz and Kiritchenko, Svetlana},
title = {Stance and Sentiment in Tweets},
}

@inproceedings{stance-lrec,
address = {Portoro\v{z}, Slovenia},
year = {2016},
booktitle = {Proceedings of 10th edition of the Language Resources and Evaluation Conference (LREC)},
title = {A Dataset for Detecting Stance in Tweets},
author = {Saif M. Mohammad and Svetlana Kiritchenko and Parinaz Sobhani and Xiaodan Zhu and Colin Cherry},
}

@inproceedings{SCL-NMA,
year = {2016},
booktitle = {Proceedings of the Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA)},
author = {Kiritchenko, Svetlana and Mohammad, Saif M.},
title = {The Effect of Negators, Modals, and Degree Adverbs on Sentiment Composition},
}

@inproceedings{Jurgens2013EmbracingAA,
year = {2013},
booktitle = {NAACL},
author = {David Jurgens},
title = {Embracing Ambiguity: A Comparison of Annotation Methodologies for Crowdsourcing Word Sense Labels},
}

@article{barrett2019emotional,
publisher = {Sage Publications Sage CA: Los Angeles, CA},
year = {2019},
pages = {1--68},
number = {1},
volume = {20},
journal = {Psychological science in the public interest},
author = {Barrett, Lisa Feldman and Adolphs, Ralph and Marsella, Stacy and Martinez, Aleix M and Pollak, Seth D},
title = {Emotional expressions reconsidered: Challenges to inferring emotion from human facial movements},
}

@article{ekman1992there,
publisher = {American Psychological Association},
pages = {550--553},
number = {3},
volume = {99},
journal = {Psychological Review},
year = {1992},
author = {Ekman, Paul},
title = {Are there basic emotions?},
}

@article{russell2003core,
publisher = {American Psychological Association},
year = {2003},
pages = {145},
number = {1},
volume = {110},
journal = {Psychological review},
author = {Russell, James A},
title = {Core affect and the psychological construction of emotion.},
}

@article{russell1977evidence,
publisher = {Elsevier},
year = {1977},
pages = {273--294},
number = {3},
volume = {11},
journal = {Journal of research in Personality},
author = {Russell, James A and Mehrabian, Albert},
title = {Evidence for a three-factor theory of emotions},
}

@article{russell2009emotion,
publisher = {Taylor \& Francis},
year = {2009},
pages = {1259--1283},
number = {7},
volume = {23},
journal = {Cognition and emotion},
author = {Russell, James A},
title = {Emotion, core affect, and psychological construction},
}

@article{cao2021toward,
year = {2021},
pages = {1--47},
journal = {Computational Linguistics},
author = {Cao, Yang Trista and Daum{\'e}, Hal},
title = {Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender and Bias throughout the Machine Learning Lifecyle},
}

@book{ekman1994nature,
publisher = {Oxford University Press},
year = {1994},
author = {Ekman, Paul Ed and Davidson, Richard J},
title = {The nature of emotion: Fundamental questions.},
}

@article{gallagher2021generalized,
publisher = {Springer Berlin Heidelberg},
year = {2021},
pages = {4},
number = {1},
volume = {10},
journal = {EPJ Data Science},
author = {Gallagher, Ryan J and Frank, Morgan R and Mitchell, Lewis and Schwartz, Aaron J and Reagan, Andrew J and Danforth, Christopher M and Dodds, Peter Sheridan},
title = {Generalized word shift graphs: A method for visualizing and explaining pairwise comparisons between texts},
}

@article{mulligan2019shaping,
year = {2019},
journal = {Available at SSRN 3311894},
author = {Mulligan, Deirdre K and Kluttz, Daniel and Kohli, Nitin},
title = {Shaping our tools: Contestability as a means to promote responsible algorithmic decision making in the professions},
}

@misc{what_if_2018,
month = {Sep},
year = {2018},
author = {Google},
howpublished = {Google {AI} Blog. \url{https://ai.googleblog.com/2018/09/the-what-if-tool-code-free-probing-of.html}},
}

@article{zhang2018deep,
publisher = {Wiley Online Library},
year = {2018},
pages = {e1253},
number = {4},
volume = {8},
journal = {Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery},
author = {Zhang, Lei and Wang, Shuai and Liu, Bing},
title = {Deep learning for sentiment analysis: A survey},
}

@article{soleymani2017survey,
publisher = {Elsevier},
year = {2017},
pages = {3--14},
volume = {65},
journal = {Image and Vision Computing},
author = {Soleymani, Mohammad and Garcia, David and Jou, Brendan and Schuller, Bj{\"o}rn and Chang, Shih-Fu and Pantic, Maja},
title = {A survey of multimodal sentiment analysis},
}

@article{guntuku2019studying,
publisher = {British Medical Journal Publishing Group},
year = {2019},
pages = {e030355},
number = {11},
volume = {9},
journal = {BMJ open},
author = {Guntuku, Sharath Chandra and Schneider, Rachelle and Pelullo, Arthur and Young, Jami and Wong, Vivien and Ungar, Lyle and Polsky, Daniel and Volpp, Kevin G and Merchant, Raina},
title = {Studying expressions of loneliness in individuals using {T}witter: an observational study},
}

@inproceedings{kiritchenko-etal-2020-solo,
isbn = {979-10-95546-34-4},
language = {English},
pages = {1567--1577},
url = {https://aclanthology.org/2020.lrec-1.195},
address = {Marseille, France},
year = {2020},
month = {May},
booktitle = {Proceedings of the 12th Language Resources and Evaluation Conference},
author = {Kiritchenko, Svetlana and
Hipson, Will and
Coplan, Robert and
Mohammad, Saif M.},
title = {{SOLO}: A Corpus of Tweets for Examining the State of Being Alone},
}

@inproceedings{de2013predicting,
year = {2013},
pages = {128--137},
booktitle = {Seventh international AAAI conference on weblogs and social media},
author = {De Choudhury, Munmun and Gamon, Michael and Counts, Scott and Horvitz, Eric},
title = {Predicting depression via social media},
}

@inproceedings{resnik-etal-2015-beyond,
pages = {99--107},
doi = {10.3115/v1/W15-1212},
url = {https://aclanthology.org/W15-1212},
address = {Denver, Colorado},
year = {2015},
month = {June 5},
booktitle = {Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality},
author = {Resnik, Philip and
Armstrong, William and
Claudino, Leonardo and
Nguyen, Thang and
Nguyen, Viet-An and
Boyd-Graber, Jordan},
title = {Beyond {LDA}: Exploring Supervised Topic Modeling for Depression-Related Language in {T}witter},
}

@article{shmueli2021beyond,
year = {2021},
journal = {arXiv preprint arXiv:2104.10097},
author = {Shmueli et al. , Boaz },
title = {Beyond fair pay: Ethical implications of NLP crowdsourcing},
}

@article{dolmaya2011ethics,
year = {2011},
number = {10},
journal = {Linguistica Antverpiensia, New Series--Themes in Translation Studies},
author = {Dolmaya, Julie McDonough},
title = {The ethics of crowdsourcing},
}

@article{raykar2012eliminating,
publisher = {JMLR. org},
year = {2012},
pages = {491--518},
number = {1},
volume = {13},
journal = {The Journal of Machine Learning Research},
author = {Raykar, Vikas C and Yu, Shipeng},
title = {Eliminating spammers and ranking annotators for crowdsourced labeling tasks},
}

@inproceedings{hovy-etal-2013-learning,
pages = {1120--1130},
url = {https://aclanthology.org/N13-1132},
publisher = {Association for Computational Linguistics},
address = {Atlanta, Georgia},
year = {2013},
month = {June},
booktitle = {Proceedings of the 2013 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies},
author = {Hovy, Dirk and
Berg-Kirkpatrick, Taylor and
Vaswani, Ashish and
Hovy, Eduard},
title = {Learning Whom to Trust with {MACE}},
}

@misc{ai2_2019,
month = {Jul},
year = {2019},
author = {AI2},
howpublished = {Medium. \url{https://medium.com/ai2-blog/crowdsourcing-pricing-ethics-and-best-practices-8487fd5c9872}},
title = {Crowdsourcing: Pricing Ethics and Best Practices},
}

@article{fort-etal-2011-last,
pages = {413--420},
doi = {10.1162/COLI_a_00057},
url = {https://aclanthology.org/J11-2010},
year = {2011},
number = {2},
volume = {37},
journal = {Computational Linguistics},
author = {Fort et al. , Kar{\"e}n},
title = {{A}mazon {M}echanical {T}urk: Gold Mine or Coal Mine?},
}

@article{standing2018ethical,
publisher = {Wiley Online Library},
year = {2018},
pages = {72--80},
number = {1},
volume = {27},
journal = {Business Ethics: A European Review},
author = {Standing, Susan and Standing, Craig},
title = {The ethical use of crowdsourcing},
}

@inproceedings{irani2013turkopticon,
year = {2013},
pages = {611--620},
booktitle = {Proceedings of the SIGCHI conference on human factors in computing systems},
author = {Irani, Lilly C and Silberman, M Six},
title = {Turkopticon: Interrupting worker invisibility in {A}mazon {M}echanical {T}urk},
}

@article{agrawal2016analyzing,
year = {2016},
journal = {arXiv preprint arXiv:1606.07356},
author = {Agrawal, Aishwarya and Batra, Dhruv and Parikh, Devi},
title = {Analyzing the behavior of visual question answering models},
}

@inproceedings{bissoto2020debiasing,
year = {2020},
pages = {740--741},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
author = {Bissoto, Alceu and Valle, Eduardo and Avila, Sandra},
title = {Debiasing skin lesion datasets and models? not so fast},
}

@article{winkler2019association,
publisher = {American Medical Association},
year = {2019},
pages = {1135--1141},
number = {10},
volume = {155},
journal = {JAMA dermatology},
author = {Winkler, Julia K and Fink, Christine and Toberer, Ferdinand and Enk, Alexander and Deinlein, Teresa and Hofmann-Wellenhof, Rainer and Thomas, Luc and Lallas, Aimilios and Blum, Andreas and Stolz, Wilhelm and others},
title = {Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition},
}

@article{hollenstein2015time,
publisher = {Sage Publications Sage UK: London, England},
year = {2015},
pages = {308--315},
number = {4},
volume = {7},
journal = {Emotion Review},
author = {Hollenstein, Tom},
title = {This time, it’s real: Affective flexibility, time scales, feedback loops, and the regulation of emotion},
}

@inproceedings{macavaney-etal-2021-community,
pages = {70--80},
doi = {10.18653/v1/2021.clpsych-1.7},
url = {https://aclanthology.org/2021.clpsych-1.7},
publisher = {Association for Computational Linguistics},
address = {Online},
year = {2021},
month = {June},
booktitle = {Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access},
author = {MacAvaney, Sean and
Mittu, Anjali and
Coppersmith, Glen and
Leintz, Jeff and
Resnik, Philip},
title = {Community-level Research on Suicidality Prediction in a Secure Environment: Overview of the {CLP}sych 2021 Shared Task},
}

@inproceedings{karam2014ecologically,
organization = {IEEE},
year = {2014},
pages = {4858--4862},
booktitle = {2014 IEEE international conference on acoustics, speech and signal processing (ICASSP)},
author = {Karam, Zahi N and Provost, Emily Mower and Singh, Satinder and Montgomery, Jennifer and Archer, Christopher and Harrington, Gloria and Mcinnis, Melvin G},
title = {Ecologically valid long-term mood monitoring of individuals with bipolar disorder using speech},
}

@article{eichstaedt2015psychological,
publisher = {Sage Publications Sage CA: Los Angeles, CA},
year = {2015},
pages = {159--169},
number = {2},
volume = {26},
journal = {Psychological science},
author = {Eichstaedt, Johannes C and Schwartz, Hansen Andrew and Kern, Margaret L and Park, Gregory and Labarthe, Darwin R and Merchant, Raina M and Jha, Sneha and Agrawal, Megha and Dziurzynski, Lukasz A and Sap, Maarten and others},
title = {Psychological language on {T}witter predicts county-level heart disease mortality},
}

@book{barrett2017emotions,
publisher = {Houghton Mifflin Harcourt},
year = {2017},
author = {Barrett, Lisa Feldman},
title = {How emotions are made: The secret life of the brain},
}

@article{barrett2017theory,
publisher = {Oxford University Press},
year = {2017},
pages = {1--23},
number = {1},
volume = {12},
journal = {Social cognitive and affective neuroscience},
author = {Barrett, Lisa Feldman},
title = {The theory of constructed emotion: an active inference account of interoception and categorization},
}

@book{osgood1957measurement,
publisher = {University of Illinois press},
year = {1957},
number = {47},
author = {Osgood, Charles Egerton and Suci, George J and Tannenbaum, Percy H},
title = {The measurement of meaning},
}

@article{russell1980circumplex,
publisher = {American Psychological Association},
year = {1980},
pages = {1161},
number = {6},
volume = {39},
journal = {Journal of personality and social psychology},
author = {Russell, James A},
title = {A circumplex model of affect.},
}

@book{scherer1999appraisal,
publisher = {John Wiley \& Sons Ltd},
year = {1999},
author = {Scherer, Klaus R},
title = {Appraisal theory.},
}

@article{lazarus1991progress,
publisher = {American Psychological Association},
year = {1991},
pages = {819},
number = {8},
volume = {46},
journal = {American psychologist},
author = {Lazarus, Richard S},
title = {Progress on a cognitive-motivational-relational theory of emotion.},
}

@article{harris1954distributional,
publisher = {Taylor \& Francis},
year = {1954},
pages = {146--162},
number = {2-3},
volume = {10},
journal = {Word},
author = {Harris, Zellig S},
title = {Distributional structure},
}

@book{chomsky2014aspects,
publisher = {MIT press},
year = {2014},
volume = {11},
author = {Chomsky, Noam},
title = {Aspects of the Theory of Syntax},
}

@inproceedings{SemEval2018Task1,
year = {2018},
address = {New Orleans, LA, USA},
booktitle = {Proceedings of International Workshop on Semantic Evaluation (SemEval-2018)},
title = {SemEval-2018 {T}ask 1: {A}ffect in Tweets},
author = {Mohammad, Saif M. and Bravo-Marquez, Felipe and Salameh, Mohammad and Kiritchenko, Svetlana},
}

@inproceedings{fraser-etal-2019-feel,
pages = {62--71},
doi = {10.18653/v1/W19-1308},
url = {https://www.aclweb.org/anthology/W19-1308},
address = {Minneapolis, USA},
year = {2019},
month = {June},
booktitle = {Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis},
author = {Fraser, Kathleen C. and
Zeller, Frauke and
Smith, David Harris and
Mohammad, Saif and
Rudzicz, Frank},
title = {How do we feel when a robot dies? Emotions expressed on {T}witter before and after hitch{BOT}{'}s destruction},
}

@inproceedings{mohammad-2011-upon,
pages = {105--114},
url = {https://www.aclweb.org/anthology/W11-1514},
publisher = {Association for Computational Linguistics},
address = {Portland, OR, USA},
year = {2011},
month = {June},
booktitle = {Proceedings of the 5th {ACL}-{HLT} Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities},
author = {Mohammad, Saif},
title = {From Once Upon a Time to Happily Ever After: Tracking Emotions in Novels and Fairy Tales},
}

@inproceedings{hipson-mohammad-2020-poki,
isbn = {979-10-95546-34-4},
pages = {1578--1589},
url = {https://www.aclweb.org/anthology/2020.lrec-1.196},
address = {Marseille, France},
year = {2020},
month = {May},
booktitle = {Proceedings of the 12th Language Resources and Evaluation Conference},
author = {Hipson, Will and
Mohammad, Saif M.},
title = {{P}o{K}i: A Large Dataset of Poems by Children},
}

702@article{mendelsohn2020framework,
703year = {2020},
704journal = {arXiv preprint arXiv:2003.03014},
705author = {Mendelsohn, Julia and Tsvetkov, Yulia and Jurafsky, Dan},
706title = {A Framework for the Computational Linguistic Analysis of Dehumanization},
707}
708
709@article{redondo2007spanish,
710publisher = {Springer},
711year = {2007},
712pages = {600--605},
713number = {3},
714volume = {39},
715journal = {Behavior research methods},
716author = {Redondo, Jaime and Fraga, Isabel and Padr{\'o}n, Isabel and Comesa{\~n}a, Montserrat},
717title = {The {S}panish adaptation of {ANEW} (affective norms for {E}nglish words)},
718}
719
720@article{stadthagen2017norms,
721publisher = {Springer},
722year = {2017},
723pages = {111--123},
724number = {1},
725volume = {49},
726journal = {Behavior research methods},
727author = {Stadthagen-Gonzalez, Hans and Imbault, Constance and S{\'a}nchez, Miguel A P{\'e}rez and Brysbaert, Marc},
728title = {Norms of valence and arousal for 14,031 {S}panish words},
729}
730
@article{sianipar2016affective,
publisher = {Frontiers},
year = {2016},
pages = {1907},
volume = {7},
journal = {Frontiers in Psychology},
author = {Sianipar, Agnes and van Groenestijn, Pieter and Dijkstra, Ton},
title = {Affective meaning, concreteness, and subjective frequency norms for {I}ndonesian words},
}

@article{schmidtke2014angst,
publisher = {Springer},
year = {2014},
pages = {1108--1118},
number = {4},
volume = {46},
journal = {Behavior Research Methods},
author = {Schmidtke, David S and Schr{\"o}der, Tobias and Jacobs, Arthur M and Conrad, Markus},
title = {{ANGST}: Affective norms for {G}erman sentiment terms, derived from the affective norms for {E}nglish words},
}

@article{moors2013norms,
publisher = {Springer},
year = {2013},
pages = {169--177},
number = {1},
volume = {45},
journal = {Behavior Research Methods},
author = {Moors, Agnes and De Houwer, Jan and Hermans, Dirk and Wanmaker, Sabine and Van Schie, Kevin and Van Harmelen, Anne-Laura and De Schryver, Maarten and De Winne, Jeffrey and Brysbaert, Marc},
title = {Norms of valence, arousal, dominance, and age of acquisition for 4,300 {D}utch words},
}

@inproceedings{yu2016building,
year = {2016},
pages = {540--545},
booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
author = {Yu, Liang-Chih and Lee, Lung-Hao and Hao, Shuai and Wang, Jin and He, Yunchao and Hu, Jun and Lai, K Robert and Zhang, Xuejie},
title = {Building {C}hinese affective resources in valence-arousal dimensions},
}

@article{mohammadSK2015,
year = {2016},
pages = {95--130},
volume = {55},
journal = {Journal of Artificial Intelligence Research},
author = {Mohammad, Saif M. and Salameh, Mohammad and Kiritchenko, Svetlana},
title = {How Translation Alters Sentiment},
}

@incollection{mohammad2021chapter,
publisher = {Elsevier},
year = {2021},
booktitle = {Emotion Measurement},
author = {Mohammad, Saif M.},
title = {Sentiment analysis: Detecting valence, emotions, and other affectual states from text},
}

@misc{mohammad2020sentiment,
note = {arXiv:cs.CL/2005.11882},
year = {2020},
author = {Saif M. Mohammad},
title = {Sentiment Analysis: Detecting Valence, Emotions, and Other Affectual States from Text},
}

@inproceedings{nielsen2011new,
year = {2011},
address = {Heraklion, Crete},
pages = {93--98},
booktitle = {Proceedings of the ESWC Workshop on `Making Sense of Microposts': Big things come in small packages},
author = {Nielsen, Finn {\AA}rup},
title = {A new {ANEW}: Evaluation of a word list for sentiment analysis in microblogs},
}

@article{warriner2013norms,
publisher = {Springer},
year = {2013},
pages = {1191--1207},
number = {4},
volume = {45},
journal = {Behavior Research Methods},
author = {Warriner, Amy Beth and Kuperman, Victor and Brysbaert, Marc},
title = {Norms of valence, arousal, and dominance for 13,915 {E}nglish lemmas},
}

@book{pennebaker2001linguistic,
year = {2001},
address = {Mahwah, NJ},
publisher = {Lawrence Erlbaum Associates},
author = {Pennebaker, James W and Francis, Martha E and Booth, Roger J},
title = {Linguistic inquiry and word count: {LIWC} 2001},
}

@article{Wiebe05,
pages = {165--210},
year = {2005},
number = {2--3},
volume = {39},
journal = {Language Resources and Evaluation},
title = {Annotating Expressions of Opinions and Emotions in Language},
author = {Janyce Wiebe and
Theresa Wilson and
Claire Cardie},
}

@book{Stone66,
year = {1966},
publisher = {The MIT Press},
title = {The General Inquirer: A Computer Approach to Content Analysis},
author = {Stone, Philip and Dunphy, Dexter and Smith, Marshall and Ogilvie, Daniel M.},
}

@inproceedings{selbst2019fairness,
year = {2019},
pages = {59--68},
booktitle = {Proceedings of the Conference on Fairness, Accountability, and Transparency},
author = {Selbst, Andrew D and Boyd, Danah and Friedler, Sorelle A and Venkatasubramanian, Suresh and Vertesi, Janet},
title = {Fairness and abstraction in sociotechnical systems},
}

@article{baumgartner2001response,
publisher = {American Marketing Association},
year = {2001},
pages = {143--156},
number = {2},
volume = {38},
journal = {Journal of Marketing Research},
author = {Baumgartner, Hans and Steenkamp, Jan-Benedict E.M.},
title = {{Response Styles in Marketing Research: A Cross-National Investigation}},
}

@misc{abdalla2021makes,
primaryclass = {cs.CL},
archiveprefix = {arXiv},
eprint = {2110.04845},
year = {2021},
author = {Abdalla, Mohamed and Vishnubhotla, Krishnapriya and Mohammad, Saif M.},
title = {What Makes Sentences Semantically Related: A Textual Relatedness Dataset and Empirical Study},
}

@inproceedings{abdalla2023makes,
booktitle = {Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume},
publisher = {Association for Computational Linguistics},
address = {Dubrovnik, Croatia},
year = {2023},
author = {Abdalla, Mohamed and Vishnubhotla, Krishnapriya and Mohammad, Saif M.},
title = {What Makes Sentences Semantically Related: A Textual Relatedness Dataset and Empirical Study},
}

@article{dodds2011temporal,
publisher = {Public Library of Science},
address = {San Francisco, USA},
year = {2011},
pages = {e26752},
number = {12},
volume = {6},
journal = {PLoS ONE},
author = {Dodds, Peter Sheridan and Harris, Kameron Decker and Kloumann, Isabel M and Bliss, Catherine A and Danforth, Christopher M},
title = {Temporal patterns of happiness and information in a global social network: Hedonometrics and {T}witter},
}

@article{cuddy2008warmth,
publisher = {Elsevier},
year = {2008},
pages = {61--149},
volume = {40},
journal = {Advances in Experimental Social Psychology},
author = {Cuddy, Amy JC and Fiske, Susan T and Glick, Peter},
title = {Warmth and competence as universal dimensions of social perception: The stereotype content model and the {BIAS} map},
}

@article{doi:10.1177/1094428119836485,
abstract = {We offer best-practice recommendations for journal reviewers, editors, and authors regarding data collection and preparation. Our recommendations are applicable to research adopting different epistemological and ontological perspectives---including both quantitative and qualitative approaches---as well as research addressing micro (i.e., individuals, teams) and macro (i.e., organizations, industries) levels of analysis. Our recommendations regarding data collection address (a) type of research design, (b) control variables, (c) sampling procedures, and (d) missing data management. Our recommendations regarding data preparation address (e) outlier management, (f) use of corrections for statistical and methodological artifacts, and (g) data transformations. Our recommendations address best practices as well as transparency issues. The formal implementation of our recommendations in the manuscript review process will likely motivate authors to increase transparency because failure to disclose necessary information may lead to a manuscript rejection decision. Also, reviewers can use our recommendations for developmental purposes to highlight which particular issues should be improved in a revised version of a manuscript and in future research. Taken together, the implementation of our recommendations in the form of checklists can help address current challenges regarding results and inferential reproducibility as well as enhance the credibility, trustworthiness, and usefulness of the scholarly knowledge that is produced.},
eprint = {https://doi.org/10.1177/1094428119836485},
url = {https://doi.org/10.1177/1094428119836485},
doi = {10.1177/1094428119836485},
year = {2021},
pages = {678--693},
number = {4},
volume = {24},
journal = {Organizational Research Methods},
title = {Best Practices in Data Collection and Preparation: Recommendations for Reviewers, Editors, and Authors},
author = {Herman Aguinis and N. Sharon Hill and James R. Bailey},
}

@book{presser2004questions,
year = {1996},
publisher = {SAGE Publications, Inc},
author = {Schuman, Howard and Presser, Stanley},
title = {{Questions and Answers in Attitude Surveys: Experiments on Question Form, Wording, and Context}},
}

@techreport{bradley1999affective,
institution = {The Center for Research in Psychophysiology, University of Florida},
year = {1999},
author = {Bradley, Margaret M and Lang, Peter J},
title = {Affective norms for {E}nglish words ({ANEW}): Instruction manual and affective ratings},
}

@inproceedings{SCL-NMA2016,
year = {2016},
booktitle = {Proceedings of the Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA)},
author = {Kiritchenko, Svetlana and Mohammad, Saif M.},
title = {The Effect of Negators, Modals, and Degree Adverbs on Sentiment Composition},
}

@inproceedings{OPP-lrec,
address = {Portoro\v{z}, Slovenia},
year = {2016},
booktitle = {Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC)},
title = {Happy Accident: A Sentiment Composition Lexicon for Opposing Polarity Phrases},
author = {Kiritchenko, Svetlana and Mohammad, Saif M.},
}

@misc{Orme_2009,
year = {2009},
howpublished = {Sawtooth Software, Inc.},
author = {Bryan Orme},
title = {Maxdiff analysis: Simple counting, individual-level logit, and {HB}},
}

@inproceedings{kiritchenko2017best,
year = {2017},
pages = {465--470},
booktitle = {{Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)}},
author = {Kiritchenko, Svetlana and Mohammad, Saif},
title = {{Best-Worst Scaling More Reliable than Rating Scales: A Case Study on Sentiment Intensity Annotation}},
}

@misc{Louviere_1991,
year = {1991},
howpublished = {Working Paper},
title = {Best-worst scaling: A model for the largest difference judgments},
author = {Jordan J. Louviere},
}

@inproceedings{jurgens-EtAl:2012:STARSEM-SEMEVAL,
url = {http://www.aclweb.org/anthology/S12-1047},
pages = {356--364},
address = {Montr\'eal, Canada},
year = {2012},
month = {7-8 June},
booktitle = {Proceedings of the 6th International Workshop on Semantic Evaluation (SemEval)},
title = {SemEval-2012 Task 2: Measuring Degrees of Relational Similarity},
author = {Jurgens, David and Mohammad, Saif M. and Turney, Peter and Holyoak, Keith},
}

@inproceedings{arabicSA2015,
year = {2015},
address = {Denver, Colorado},
booktitle = {Proceedings of the North American Chapter of Association of Computational Linguistics},
title = {Sentiment After Translation: A Case-Study on {A}rabic Social Media Posts},
author = {Mohammad Salameh and Saif M Mohammad and Svetlana Kiritchenko},
}

@inproceedings{Mohammad11a,
address = {Portland, OR, USA},
year = {2011},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
title = {Even the Abstract have Colour: Consensus in Word--Colour Associations},
author = {Mohammad, Saif M.},
}

@article{MohammadT13,
year = {2013},
volume = {29},
title = {Crowdsourcing a Word-Emotion Association Lexicon},
pages = {436--465},
number = {3},
journal = {Computational Intelligence},
author = {Mohammad, Saif M. and Turney, Peter D.},
}

@inproceedings{MohammadT10,
year = {2010},
address = {Los Angeles, California},
booktitle = {Proceedings of the NAACL-HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text},
title = {Emotions Evoked by Common Words and Phrases: Using {M}echanical {T}urk to Create an Emotion Lexicon},
author = {Mohammad, Saif M. and Turney, Peter D.},
}

@inproceedings{MohammadDD09,
year = {2009},
pages = {599--608},
address = {Singapore},
booktitle = {Proceedings of Empirical Methods in Natural Language Processing (EMNLP-2009)},
title = {Generating High-Coverage Semantic Orientation Lexicons From Overtly Marked Words and a Thesaurus},
author = {Mohammad, Saif M. and Dunne, Cody and Dorr, Bonnie},
}

@inproceedings{vad-acl2018,
address = {Melbourne, Australia},
year = {2018},
booktitle = {Proceedings of The Annual Conference of the Association for Computational Linguistics (ACL)},
author = {Mohammad, Saif M.},
title = {Obtaining Reliable Human Ratings of Valence, Arousal, and Dominance for 20,000 {E}nglish Words},
}

@inproceedings{LREC18-AIL,
address = {Miyazaki, Japan},
year = {2018},
booktitle = {Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC-2018)},
title = {Word Affect Intensities},
author = {Mohammad, Saif M.},
}

@inproceedings{MohammadSemEval2013,
address = {Atlanta, Georgia, USA},
year = {2013},
month = {June},
booktitle = {Proceedings of the International Workshop on Semantic Evaluation},
title = {{NRC-Canada}: Building the State-of-the-Art in Sentiment Analysis of Tweets},
author = {Mohammad, Saif M. and Kiritchenko, Svetlana and Zhu, Xiaodan},
}

@article{Kiritchenko2014,
year = {2014},
pages = {723--762},
volume = {50},
journal = {Journal of Artificial Intelligence Research},
title = {Sentiment Analysis of Short Informal Texts},
author = {Kiritchenko, Svetlana and Zhu, Xiaodan and Mohammad, Saif M.},
}

@article{MohammadK14,
year = {2015},
keywords = {affect, tweets, social media, hashtags, basic emotions, personality detection, Big 5 model, word--emotion associations, sentiment analysis},
pages = {301--326},
doi = {10.1111/coin.12024},
url = {http://dx.doi.org/10.1111/coin.12024},
issn = {1467-8640},
number = {2},
volume = {31},
journal = {Computational Intelligence},
title = {Using Hashtags to Capture Fine Emotion Categories from Tweets},
author = {Mohammad, Saif M. and Kiritchenko, Svetlana},
}

@inproceedings{Mohammad12,
pages = {246--255},
address = {Montr\'eal, Canada},
year = {2012},
booktitle = {Proceedings of the Joint Conference on Lexical and Computational Semantics},
title = {\#{E}motional Tweets},
author = {Mohammad, Saif M.},
}

@inproceedings{vogel2012he,
year = {2012},
pages = {33--41},
booktitle = {Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries},
author = {Vogel, Adam and Jurafsky, Dan},
title = {He said, she said: Gender in the {ACL} {A}nthology},
}

@article{torvik2009author,
publisher = {ACM},
address = {New York, NY, USA},
year = {2009},
pages = {1--29},
number = {3},
volume = {3},
journal = {ACM Transactions on Knowledge Discovery from Data (TKDD)},
author = {Torvik, Vetle I and Smalheiser, Neil R},
title = {Author name disambiguation in {MEDLINE}},
}

@inproceedings{mohammad2020gender,
year = {2020},
address = {Seattle, USA},
booktitle = {Proceedings of the 2020 Annual Conference of the Association for Computational Linguistics},
author = {Mohammad, Saif M.},
title = {Gender Gap in Natural Language Processing Research: Disparities in Authorship and Citations},
}

@inproceedings{mohammad2020citations,
year = {2020},
address = {Seattle, USA},
booktitle = {Proceedings of the 2020 Annual Conference of the Association for Computational Linguistics},
author = {Mohammad, Saif M.},
title = {Examining Citations of Natural Language Processing Literature},
}

@inproceedings{mohammad2020demo,
year = {2020},
address = {Seattle, USA},
booktitle = {Proceedings of the 2020 Annual Conference of the Association for Computational Linguistics},
author = {Mohammad, Saif M.},
title = {{NLP S}cholar: An Interactive Visual Explorer for Natural Language Processing Literature},
}

@inproceedings{mohammad2020data,
year = {2020},
address = {Marseille, France},
booktitle = {Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020)},
author = {Mohammad, Saif M.},
title = {{NLP S}cholar: A Dataset for Examining the State of {NLP} Research},
}

@article{mohammad2019nlpscholar,
year = {2019},
journal = {arXiv preprint arXiv:1911.03562},
author = {Mohammad, Saif M.},
title = {The State of {NLP} Literature: A Diachronic Analysis of the {ACL} Anthology},
}

@article{bornmann2009state,
publisher = {John Wiley \& Sons, Ltd},
year = {2009},
pages = {2--6},
number = {1},
volume = {10},
journal = {EMBO Reports},
author = {Bornmann, Lutz and Daniel, Hans-Dieter},
title = {The state of h index research},
}

@article{zhu2015measuring,
publisher = {Wiley Online Library},
year = {2015},
pages = {408--427},
number = {2},
volume = {66},
journal = {Journal of the Association for Information Science and Technology},
author = {Zhu, Xiaodan and Turney, Peter and Lemire, Daniel and Vellino, Andr{\'e}},
title = {Measuring academic influence: Not all citations are equal},
}

@article{qazvinian2013generating,
year = {2013},
pages = {165--201},
volume = {46},
journal = {Journal of Artificial Intelligence Research},
author = {Qazvinian, Vahed and Radev, Dragomir R and Mohammad, Saif M. and Dorr, Bonnie and Zajic, David and Whidby, Michael and Moon, Taesun},
title = {Generating extractive summaries of scientific paradigms},
}

@inproceedings{mohammad2009using,
year = {2009},
pages = {584--592},
booktitle = {Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics},
author = {Mohammad, Saif M. and Dorr, Bonnie and Egan, Melissa and Hassan, Ahmed and Muthukrishnan, Pradeep and Qazvinian, Vahed and Radev, Dragomir and Zajic, David},
title = {Using citations to generate surveys of scientific paradigms},
}

@article{nanba2011classification,
year = {2011},
pages = {117--134},
number = {1},
volume = {11},
journal = {Advances in Classification Research Online},
author = {Nanba, Hidetsugu and Kando, Noriko and Okumura, Manabu},
title = {Classification of research papers using citation links and citation types: Towards automatic review article generation},
}

@inproceedings{pham2003new,
organization = {Springer},
year = {2003},
pages = {759--771},
booktitle = {Australasian Joint Conference on Artificial Intelligence},
author = {Pham, Son Bao and Hoffmann, Achim},
title = {A new approach for scientific citation classification using cue phrases},
}

@inproceedings{teufel2006automatic,
year = {2006},
pages = {103--110},
booktitle = {Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing},
author = {Teufel, Simone and Siddharthan, Advaith and Tidhar, Dan},
title = {Automatic classification of citation function},
}

@incollection{aya2005citation,
publisher = {World Scientific},
year = {2005},
pages = {287--298},
booktitle = {Knowledge Management: Nurturing Culture, Innovation, and Technology},
author = {Aya, Selcuk and Lagoze, Carl and Joachims, Thorsten},
title = {Citation classification and its applications},
}

@article{mariani2018nlp4nlp,
publisher = {Frontiers},
year = {2018},
pages = {36},
volume = {3},
journal = {Frontiers in Research Metrics and Analytics},
author = {Mariani, Joseph and Francopoulo, Gil and Paroubek, Patrick},
title = {The {NLP4NLP} Corpus ({I}): 50 Years of Publication, Collaboration and Citation in Speech and Language Processing},
}

@article{ravenscroft2017measuring,
publisher = {Public Library of Science},
year = {2017},
pages = {e0173152},
number = {3},
volume = {12},
journal = {PLoS ONE},
author = {Ravenscroft, James and Liakata, Maria and Clare, Amanda and Duma, Daniel},
title = {Measuring scientific impact beyond academia: An assessment of existing impact metrics and proposed improvements},
}

@article{priem2010scientometrics,
year = {2010},
number = {7},
volume = {15},
journal = {First Monday},
author = {Priem, Jason and Hemminger, Bradley H},
title = {Scientometrics 2.0: New metrics of scholarly impact on the social {W}eb},
}

@inproceedings{schluter2018glass,
year = {2018},
pages = {2793--2798},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
author = {Schluter, Natalie},
title = {The glass ceiling in {NLP}},
}

@article{bulaitis2017measuring,
publisher = {Nature Publishing Group},
year = {2017},
pages = {7},
number = {1},
volume = {3},
journal = {Palgrave Communications},
author = {Bulaitis, Zoe},
title = {Measuring impact in the humanities: Learning from accountability and economics in a contemporary history of cultural value},
}

@article{bos2019interdisciplinary,
publisher = {Ubiquity Press},
year = {2019},
number = {1},
volume = {18},
journal = {Data Science Journal},
author = {Bos, Arthur R and Nitza, Sandrine},
title = {Interdisciplinary Comparison of Scientific Impact of Publications Using the Citation-Ratio},
}

@article{ioannidis2019standardized,
publisher = {Public Library of Science},
year = {2019},
pages = {e3000384},
number = {8},
volume = {17},
journal = {PLoS Biology},
author = {Ioannidis, John PA and Baas, Jeroen and Klavans, Richard and Boyack, Kevin W},
title = {A standardized citation metrics author database annotated for scientific field},
}

@article{radev2016bibliometric,
publisher = {Wiley Online Library},
year = {2016},
pages = {683--706},
number = {3},
volume = {67},
journal = {Journal of the Association for Information Science and Technology},
author = {Radev, Dragomir R and Joseph, Mark Thomas and Gibson, Bryan and Muthukrishnan, Pradeep},
title = {A bibliometric and network analysis of the field of computational linguistics},
}

@inproceedings{anderson2012towards,
year = {2012},
pages = {13--21},
booktitle = {Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries},
author = {Anderson, Ashton and McFarland, Dan and Jurafsky, Dan},
title = {Towards a computational history of the {ACL}: 1980--2008},
}

@article{bird2008acl,
publisher = {European Language Resources Association (ELRA)},
year = {2008},
author = {Bird, Steven and Dale, Robert and Dorr, Bonnie J and Gibson, Bryan and Joseph, Mark Thomas and Kan, Min-Yen and Lee, Dongwon and Powley, Brett and Radev, Dragomir R and Tan, Yee Fan},
title = {The {ACL} {A}nthology reference corpus: A reference dataset for bibliographic research in computational linguistics},
}

@inproceedings{yogatama2011predicting,
year = {2011},
pages = {594--604},
booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing},
author = {Yogatama, Dani and Heilman, Michael and O'Connor, Brendan and Dyer, Chris and Routledge, Bryan R and Smith, Noah A},
title = {Predicting a scientific community's response to an article},
}

@article{khabsa2014number,
publisher = {Public Library of Science},
year = {2014},
pages = {e93949},
number = {5},
volume = {9},
journal = {PLoS ONE},
author = {Khabsa, Madian and Giles, C Lee},
title = {The number of scholarly documents on the public web},
}

@article{howland2010scholarly,
year = {2010},
author = {Howland, Jared L},
title = {How scholarly is {G}oogle {S}cholar? A comparison to library databases},
}

@article{orduna2014size,
year = {2014},
journal = {arXiv preprint arXiv:1407.6239},
author = {Ordu{\~n}a-Malea, Enrique and Ayll{\'o}n, Juan Manuel and Mart{\'\i}n-Mart{\'\i}n, Alberto and L{\'o}pez-C{\'o}zar, Emilio Delgado},
title = {About the size of {G}oogle {S}cholar: playing the numbers},
}

@article{martin2018google,
publisher = {Elsevier},
year = {2018},
pages = {1160--1177},
number = {4},
volume = {12},
journal = {Journal of Informetrics},
author = {Mart{\'\i}n-Mart{\'\i}n, Alberto and Orduna-Malea, Enrique and Thelwall, Mike and L{\'o}pez-C{\'o}zar, Emilio Delgado},
title = {{G}oogle {S}cholar, {W}eb of {S}cience, and {S}copus: A systematic comparison of citations in 252 subject categories},
}

@article{gusenbauer2019google,
publisher = {Springer},
year = {2019},
pages = {177--214},
number = {1},
volume = {118},
journal = {Scientometrics},
author = {Gusenbauer, Michael},
title = {{G}oogle {S}cholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases},
}

@article{mingers2015review,
publisher = {Elsevier},
year = {2015},
pages = {1--19},
number = {1},
volume = {246},
journal = {European Journal of Operational Research},
author = {Mingers, John and Leydesdorff, Loet},
title = {A review of theory and practice in scientometrics},
}

@article{GGG18,
address = {Geneva, Switzerland},
author = {WEF, World Economic Forum},
year = {2018},
title = {The Global Gender Gap Report 2018},
}

@book{hakura2016inequality,
publisher = {International Monetary Fund},
year = {2016},
author = {Hakura, Dalia S and Hussain, Mumtaz and Newiak, Monique and Thakoor, Vimal and Yang, Fan},
title = {Inequality, gender gaps and economic growth: Comparative evidence for sub-{S}aharan {A}frica},
}

@techreport{gallego2018integrated,
institution = {Inter-American Development Bank},
year = {2018},
author = {Gallego, Juan Miguel and Guti{\'e}rrez, Luis H},
title = {An Integrated Analysis of the Impact of Gender Diversity on Innovation and Productivity in Manufacturing Firms},
}

@article{rao2016board,
publisher = {Springer},
year = {2016},
pages = {327--347},
number = {2},
volume = {138},
journal = {Journal of Business Ethics},
author = {Rao, Kathyayini and Tilt, Carol},
title = {Board composition and corporate social responsibility: The role of diversity, gender, strategy and decision making},
}

@book{skjelsboek2001gender,
publisher = {Sage},
year = {2001},
author = {Skjelsboek, Inger and Smith, Dan},
title = {Gender, peace and conflict},
}

@article{mehta2017gender,
publisher = {American Thoracic Society},
year = {2017},
pages = {425--429},
number = {4},
volume = {196},
journal = {American Journal of Respiratory and Critical Care Medicine},
author = {Mehta, Sangeeta and Burns, Karen EA and Machado, Flavia R and Fox-Robichaud, Alison E and Cook, Deborah J and Calfee, Carolyn S and Ware, Lorraine B and Burnham, Ellen L and Kissoon, Niranjan and Marshall, John C and others},
title = {Gender parity in critical care medicine},
}

@techreport{woetzel2015power,
year = {2015},
institution = {McKinsey Global Institute},
author = {Woetzel, Jonathan and others},
title = {The power of parity: How advancing women's equality can add \$12 trillion to global growth},
}

@article{symonds2006gender,
publisher = {Public Library of Science},
year = {2006},
pages = {e127},
number = {1},
volume = {1},
journal = {PLoS ONE},
author = {Symonds, Matthew RE and Gemmell, Neil J and Braisher, Tamsin L and Gorringe, Kylie L and Elgar, Mark A},
title = {Gender differences in publication output: towards an unbiased metric of research performance},
}

@article{streuly1994accounting,
publisher = {American Accounting Association},
year = {1994},
pages = {247},
number = {2},
volume = {9},
journal = {Issues in Accounting Education},
author = {Streuly, Carolyn A and Maranto, Cheryl L},
title = {Accounting faculty research productivity and citations: are there gender differences?},
}

@article{borrego2009scientific,
publisher = {Akad{\'e}miai Kiad{\'o}, co-published with Springer Science+ Business Media BV~…},
year = {2009},
pages = {93--101},
number = {1},
volume = {83},
journal = {Scientometrics},
author = {Borrego, {\'A}ngel and Barrios, Maite and Villarroya, Anna and Oll{\'e}, Candela},
title = {Scientific output and impact of postdoctoral scientists: A gender perspective},
}

@article{king2017men,
publisher = {SAGE Publications},
address = {Los Angeles, CA},
year = {2017},
pages = {2378023117738903},
volume = {3},
journal = {Socius},
author = {King, Molly M and Bergstrom, Carl T and Correll, Shelley J and Jacquet, Jennifer and West, Jevin D},
title = {Men set their own cites high: Gender and self-citation across fields and over time},
}

1477@article{haakanson2005impact,
1478year = {2005},
1479pages = {312--323},
1480number = {4},
1481volume = {66},
1482journal = {College \& Research Libraries},
1483author = {H{\aa}kanson, Malin},
1484title = {The impact of gender on citations: An analysis of college \& research libraries, journal of academic librarianship, and library quarterly},
1485}
1486
1487@inproceedings{ghiasi2016gender,
1488year = {2016},
1489booktitle = {21st International Conference on Science and Technology Indicators-STI 2016. Book of Proceedings},
1490author = {Ghiasi, Gita and Larivi{\`e}re, Vincent and Sugimoto, Cassidy},
1491title = {Gender differences in synchronous and diachronous self-citations},
1492}
1493
1494@article{duch2012possible,
1495publisher = {Public Library of Science},
1496year = {2012},
1497pages = {e51332},
1498number = {12},
1499volume = {7},
1500journal = {PloS one},
1501author = {Duch, Jordi and Zeng, Xiao Han T and Sales-Pardo, Marta and Radicchi, Filippo and Otis, Shayna and Woodruff, Teresa K and Amaral, Lu{\'\i}s A Nunes},
1502title = {The possible role of resource requirements and academic career-choice risk on gender differences in publication rate and impact},
1503}
1504
1505@article{mishra2018self,
1506publisher = {Public Library of Science},
1507year = {2018},
1508pages = {e0195773},
1509number = {9},
1510volume = {13},
1511journal = {PloS one},
1512author = {Mishra, Shubhanshu and Fegley, Brent D and Diesner, Jana and Torvik, Vetle I},
1513title = {Self-citation is the hallmark of productive authors, of any gender},
1514}
1515
1516@article{andersen2018google,
1517publisher = {Elsevier},
1518year = {2018},
1519pages = {950--959},
1520number = {3},
1521volume = {12},
1522journal = {Journal of Informetrics},
1523author = {Andersen, Jens Peter and Nielsen, Mathias Wullum},
1524title = {Google Scholar and Web of Science: Examining gender differences in citation coverage across five scientific disciplines},
1525}
1526
1527@article{willyard2011men,
1528year = {2011},
1529pages = {40},
1530number = {1},
1531volume = {9},
1532journal = {GradPSYCH Magazine},
1533author = {Willyard, Cassandra},
1534title = {Men: A growing minority},
1535}
1536
1537@article{LSA17,
author = {LSA, The Linguistic Society of America},
year = {2017},
title = {The State of Linguistics in Higher Education Annual Report 2017},
}

@inproceedings{smith2013search,
organization = {ACM},
year = {2013},
pages = {199--208},
booktitle = {Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries},
author = {Smith, Brittany N and Singh, Mamta and Torvik, Vetle I},
title = {A search engine approach to estimating temporal changes in gender orientation of first names},
}

@article{nekby2008gender,
publisher = {Elsevier},
year = {2008},
pages = {405--407},
number = {3},
volume = {100},
journal = {Economics Letters},
author = {Nekby, Lena and Thoursie, Peter Skogman and Vahtrik, Lars},
title = {Gender and self-selection into a competitive environment: Are women more overconfident than men?},
}

@article{hardies2013gender,
publisher = {Elsevier},
year = {2013},
pages = {442--444},
number = {3},
volume = {118},
journal = {Economics Letters},
author = {Hardies, Kris and Breesch, Diane and Branson, Jo{\"e}l},
title = {Gender differences in overconfidence and risk taking: Do self-selection and socialization matter?},
}

@article{roos2008together,
publisher = {SAGE Publications},
year = {2008},
number = {2},
volume = {13},
journal = {Journal of Workplace Rights},
author = {Roos, Patricia A},
title = {Together but unequal: Combating gender inequity in the academy},
}

@article{guzzetti1996gender,
publisher = {Wiley Online Library},
year = {1996},
pages = {5--20},
number = {1},
volume = {33},
journal = {Journal of Research in Science Teaching: The Official Journal of the National Association for Research in Science Teaching},
author = {Guzzetti, Barbara J and Williams, Wayne O},
title = {Gender, text, and discussion: Examining intellectual safety in the science classroom},
}

@article{lariviere2013bibliometrics,
year = {2013},
pages = {211},
number = {7479},
volume = {504},
journal = {Nature News},
author = {Larivi{\`e}re, Vincent and Ni, Chaoqun and Gingras, Yves and Cronin, Blaise and Sugimoto, Cassidy R},
title = {Bibliometrics: Global gender disparities in science},
}

@article{buchmann2009gender,
publisher = {Teachers College, Columbia University},
year = {2009},
journal = {Teachers College Record},
author = {Buchmann, Claudia},
title = {Gender inequalities in the transition to college.},
}

@inproceedings{mitchell2019model,
year = {2019},
pages = {220--229},
booktitle = {Proceedings of the conference on fairness, accountability, and transparency},
author = {Mitchell, Margaret and Wu, Simone and Zaldivar, Andrew and Barnes, Parker and Vasserman, Lucy and Hutchinson, Ben and Spitzer, Elena and Raji, Inioluwa Deborah and Gebru, Timnit},
title = {Model cards for model reporting},
}

@inproceedings{TMemotionarcs,
year = {2022},
publisher = {arXiv},
title = {Evaluating Automatically Generated Emotion Arcs: A Case for Simple Methods Using Emotion Lexicons},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
author = {Teodorescu, Daniela and Mohammad, Saif M.},
url = {https://arxiv.org/abs/2210.07381},
doi = {10.48550/ARXIV.2210.07381},
}

@book{wierzbicka1999emotions,
publisher = {Cambridge university press},
year = {1999},
author = {Wierzbicka, Anna},
title = {Emotions across languages and cultures: Diversity and universals},
}

@inproceedings{Gebru2018DatasheetsFD,
year = {2018},
address = {Stockholm, Sweden},
booktitle = {Proceedings of the conference on Fairness, Accountability, and Transparency in Machine Learning},
author = {Timnit Gebru and Jamie H. Morgenstern and Briana Vecchione and Jennifer Wortman Vaughan and H. Wallach and Hal Daum{\'e} and Kate Crawford},
title = {Datasheets for Datasets},
}

@inproceedings{wilson2005recognizing,
year = {2005},
pages = {347--354},
booktitle = {Proceedings of human language technology conference and conference on empirical methods in natural language processing},
author = {Wilson, Theresa and Wiebe, Janyce and Hoffmann, Paul},
title = {Recognizing contextual polarity in phrase-level sentiment analysis},
}

@article{feller2004measurement,
publisher = {Luxemburgo, Comisi{\'o}n Europea [en l{\'\i}nea] http://ec. europa. eu/research~…},
year = {2004},
volume = {35},
journal = {Gender and Excellence in the Making},
author = {Feller, Irwin},
title = {Measurement of scientific performance and gender bias},
}

@article{gupta2005triple,
publisher = {JSTOR},
year = {2005},
pages = {1382--1386},
journal = {Current science},
author = {Gupta, Namrata and Kemelgor, Carol and Fuchs, Stefan and Etzkowitz, Henry},
title = {Triple burden on women in science: A cross-cultural analysis},
}

@article{foschi2004blocking,
publisher = {Directorate-General for Research, Science and Society, European Commission~…},
year = {2004},
pages = {51--56},
journal = {Gender and Excellence in the Making},
author = {Foschi, Marta},
title = {Blocking the use of gender-based double standards for competence},
}

@article{brouns2007making,
year = {2007},
journal = {Wissenschaftsrat (Hrsg.)},
author = {Brouns, Margo},
title = {The making of Excellence--gender bias in academia},
}

@article{evans2014black,
publisher = {ERIC},
year = {2014},
pages = {22--30},
number = {1},
volume = {4},
journal = {Interdisciplinary Journal of Teaching and Learning},
author = {Evans-Winters, Venus E},
title = {Are Black Girls Not Gifted? Race, Gender, and Resilience.},
}

@inproceedings{knowles2016demographer,
year = {2016},
pages = {108--113},
booktitle = {Proceedings of the First Workshop on NLP and Computational Social Science},
author = {Knowles, Rebecca and Carroll, Josh and Dredze, Mark},
title = {Demographer: Extremely simple name demographics},
}

@misc{vanetta16,
year = {2016},
howpublished = {https://pypi.python.org/pypi/gender-detector/0.0.4.},
author = {Vanetta, Marcos},
title = {Gender detector},
}
Attribution
arXiv:2210.07206v2
[cs.CL]
License: cc-by-nc-sa-4.0