Ethics Sheet for Automatic Emotion Recognition and Sentiment Analysis

Content License: cc-by

Emotions play a central role in our lives. Thus, affective computing, which deals with emotions and computation (often through AI systems), is a tremendously important and vibrant line of work. It is a sweeping interdisciplinary area of study exploring both fundamental research questions (such as what are emotions?) and commercial applications (such as can machines detect consumer sentiment?).

In her seminal book, Affective Computing, Dr. Rosalind Picard described Automatic Emotion Recognition (AER) as “giving emotional abilities to computers”. Such systems can be incredibly powerful: facilitators of enormous progress, but also enablers of great harm. In fact, some of the recent commercial and governmental uses of emotion recognition have garnered considerable criticism, including allegations of infringing on people’s privacy, exploiting vulnerable sub-populations, and even being downright pseudo-science. Even putting aside high-profile controversies, emotion recognition impacts people and thus entails ethical considerations (big and small). Thus, it is imperative that the AER community actively engage with the ethical ramifications of their creations.

This article, which I refer to as an Ethics Sheet for AER, is a critical reflection on this broad field of study with the aim of facilitating more responsible emotion research and appropriate use of the technology. As described in, an Ethics Sheet for an AI Task is a semi-standardized document that synthesizes and organizes information from AI Ethics and AI Task literature to present a comprehensive array of ethical considerations for that task. Thus, in some ways, an ethics sheet is similar to a survey article, except that here the focus is on ethical considerations. It:

  • Fleshes out assumptions hidden in how the task is framed, and in the choices often made regarding the data, method, and evaluation.

  • Presents ethical considerations unique or especially relevant to the task.

  • Presents how common ethical considerations manifest in the task.

  • Presents relevant dimensions and choice points; along with tradeoffs.

  • Lists common harm mitigation strategies.

  • Communicates societal implications of AI systems to researchers, developers, and the broader society.

The sheet should flesh out various ethical considerations that apply at the level of the task. It should also flesh out ethical considerations of common theories, methodologies, resources, and practices used in building AI systems for the task. A good ethics sheet will question some of the assumptions that often go unsaid.

Primary motivation for creating an Ethics Sheet for AER: to provide a go-to point for a carefully compiled substantive engagement with the ethical issues relevant to emotion recognition; going beyond individual systems and datasets and drawing on knowledge from a large body of past work. The document will be useful to anyone who wants to build or use emotion recognition systems/algorithms for research or commercial purposes. Specifically, the main benefits can be summarized by the list below:

  • Encourages more thoughtfulness on why to automate, how to automate, and how to judge success well before the building of AER systems.

  • Helps us better navigate research and implementation choices.

  • Moves us towards consensus and standards.

  • Helps in developing better post-production documents such as datasheets and model cards.

  • Has citations and pointers; acts as a jumping off point for further reading.

  • Helps engage the various stakeholders of an AI task with each other. Helps stakeholders challenge assumptions made by researchers and developers. Helps develop harm mitigation strategies.

  • Acts as a useful introductory document on emotion recognition (complements survey articles).

Note that even though this sheet is focused on AER, many of the ethical considerations apply broadly to natural language tasks in general. Thus, it can serve as a useful template to build ethics sheets for other tasks.

Target audience: The primary audience for this sheet comprises researchers, engineers, developers, and educators from various fields (especially NLP, ML, AI, data science, public health, psychology, and digital humanities) who build, make use of, or teach about AER technologies; however, much of the discussion should be accessible to various other stakeholders of AER as well, including policy/decision makers and those who are impacted by AER. I also hope that this sheet will act as a springboard for the creation of a sheet where non-technical stakeholders are the primary audience.

Process: My own research interests are at the intersection of emotions and language—to understand how we use language to express our feelings. I created this sheet to gather and organize my thoughts around responsible emotion recognition research, and hopefully it is of use to others as well. Discussions with various scholars from computer science, psychology, linguistics, neuroscience, and social sciences (and their comments on earlier drafts) have helped shape this sheet. An earlier draft of this material was also posted as a blog post with an explicit invitation for feedback. Valuable insights from the community were then incorporated into this document. That said, it should be noted that I do not speak for the AER community. There is no “objective” or “correct” ethics sheet. This sheet should be taken as one perspective amongst many in the community. I welcome dissenting views and encourage further discussion. These can lead to periodically revised or new ethics sheets. As stated in:

Multiple ethics sheets can be created (by different teams and approaches) to reflect multiple perspectives, viewpoints, and what is important to different groups of people. We should be wary of the world with single authoritative ethics sheets per task and no dissenting voices.

The rest of the paper is organized as follows: Section 2 is a preface to the ethics sheet, Section 3 presents the Ethics sheet for AER (50 considerations), and this is followed by summarizing thoughts in Section 4. The Appendix compiles a list of succinct recommendations for responsible AER (drawn from the discussions on ethical considerations in Section 3).

Preface for the Ethics Sheet on AER

Let us consider a few rapid-fire questions to set the context. A good ethics sheet makes us question our assumptions. So let us start at the top:

Q1. Should we be building AI systems for Automatic Emotion Recognition? Is it ethical to do so?

A. This is a good question. This sheet will not explicitly answer the question, but it will help in clarifying and thinking about it. This sheet will sometimes suggest that certain applications in certain contexts are good or bad ideas, but largely it will discuss the various considerations to be taken into account: whether to build or use a particular system, how to build or use a particular system, what is more appropriate for a given context, how to assess success, etc.

The above question is also somewhat under-specified. We first need to clarify…

Q2. What does automatic emotion recognition mean?

A. Emotion recognition can mean many things, and it has many forms. (This sheet will get into that.) Emotion recognition can be deployed in many contexts. For example, many will consider automated insurance premium decisions based on inferred emotions to be inappropriate. However, studying how people use language to express gratitude, sadness, etc. is considered okay in many contexts. A human–computer interaction system benefits from being able to identify which utterances can convey anger, joy, sadness, hate, etc. (Not having such capabilities will lead to offensive, unempathetic, and inappropriate interactions.) Many other contexts are described in the sheet.

Q3. Can machines ever infer one’s true emotional state?

A. No. (This sheet will get into that.)

Q4. Can machines infer some small aspect of people’s emotions (or emotions that they are trying to convey) in some contexts, to the extent that it is useful?

A. In my view, yes. In a limited way, this is analogous to machine translation or web search. The machine does not understand language, nor does it understand what the user really wants, nor the social, cultural, or embodied context, but it is able to produce a somewhat useful translation or search result with some likelihood; and it produces some amount of inappropriate and harmful results with some likelihood. However, unlike machine translation or search, emotions are much more personal, private, and complex. People cannot fully determine each other’s emotions. People cannot fully determine their own emotional state. But we make do with our limitations and infer emotions as best we can to function socially. We also have moral and ethical failures. We cause harm because of our limitations, and we harbor stereotypes and biases.

If machines are to be a part of this world and interact with people in any useful and respectful way, then they must have at least some limited emotion recognition capabilities; and thereby will also cause some amount of harm. Thus, if we use them, it is important that we are aware of the limitations; design systems that protect and empower those without power; deploy them in the contexts they are designed for; use them to assist human decision making; and work to mitigate the harms they will perpetrate. We need to hold AER systems to high standards, not just because it is a nice aspirational goal, but because machines impact people at scale (in ways that individuals rarely can) and emotions define who we are (in ways that other attributes rarely do). I hope this sheet is useful in that regard.

Main Sheet (version 1.0)

This ethics sheet for Automatic Emotion Recognition has four sections: Modalities and Scope, Task, Applications, and Ethical Considerations. The first three are brief and set the context. The fourth presents various ethical considerations of AER as a numbered list, organized in thematic groups.

Modalities and Scope

Modalities: Work on AER has used a number of modalities (sources of input), including:

  • Facial expressions, gait, proprioceptive data (movement of body), gestures

  • Skin and blood conductance, blood flow, respiration, infrared emanations

  • Force of touch, haptic data (from sensors of force)

  • Speech, language (esp. written text, emoticons, emojis)

All of these modalities come with benefits, potential harms, and ethical considerations.

Scope: This sheet will focus on AER from written text and AER in Natural Language Processing (NLP), but several of the listed considerations apply to AER in general (regardless of modality, and regardless of field such as NLP or Computer Vision).

Task

Automatic Emotion Recognition (AER) from one’s utterances (written or spoken) is a broad umbrella term used to refer to a number of related tasks such as those listed below: (Note that each of these framings has ethical considerations and may be more or less appropriate for a given context.)

  • Inferring emotions felt by the speaker (e.g., given Sara’s tweet, what is Sara feeling?); inferring emotions of the speaker as perceived by the reader/listener (e.g., what does Li think Sara is feeling?); inferring emotions that the speaker is attempting to convey (e.g., what emotion is Sara trying to convey?). These may be correlated, but they can differ depending on the particular instance. The first framing, “inferring emotions felt by the speaker”, is fairly common in the scientific literature, but it is also perhaps the most often misused/misinterpreted. More on this in the ethical considerations section.

  • Inferring the intensity of the emotions discussed above.

  • Inferring patterns of a speaker’s emotions over long periods of time, across many utterances, including the inference of moods, emotion dynamics, and emotional arcs (e.g., tracking character emotion arcs in novels and tracking the impact of health interventions on a patient’s well-being).

  • Inferring speaker’s emotions/attitudes/sentiment towards a target product, movie, person, idea, policy, entity, etc. (e.g., does Sara like the new phone?).

  • Inferring emotions evoked in the reader/listener (e.g., what feelings arise in Li on reading Sara’s tweet?). This may be different among different readers because of their past experiences, personalities, and world-views: e.g., the same text may evoke different feelings among people with opposing views on an issue.

  • Inferring emotions of people mentioned in the text (e.g., given a tweet that mentions Moe, what emotional state of Moe is conveyed in the tweet?).

  • Inferring emotionality of language used in text (regardless of whose emotions) (e.g., is the tweet about happy things, angry feelings, etc.?).

  • Inferring how language is used to convey emotions such as joy, sadness, loneliness, hate, etc.

  • Inferring the emotional impact of sarcasm, metaphor, idiomatic expression, dehumanizing utterance, hate speech, etc.

Note 1: The term Sentiment Analysis is commonly used to refer to the task described in bullet 4, especially in the context of product reviews (sentiment is commonly labeled as positive, negative, or neutral). On the other hand, determining the predilection of a person towards a policy, party, issue, etc. is usually referred to as Stance Detection, and involves classes such as favour and against.

Note 2: Many AER systems focus only on the emotionality of the language used (bullet 7), even though their stated goal might be one of the other bullets. This may be appropriate in restricted contexts such as customer reviews or personal diary blog posts, but not always. (More on this in Ethical Considerations: Task Design.)

Note 3: There also exist tasks that focus not directly on emotions, but on associated phenomena, such as: whose emotions, who/what evoked the emotion, what types of human needs were met or not met resulting in the emotion, etc. See these surveys for more details: examines emotions, sentiment, stance, etc.; focuses on sentiment analysis tasks; surveys multi-modal techniques for sentiment analysis.

Applications

The potential benefits of AER are substantial. Below is a sample of some existing applications: (Note that this is not an endorsement of these applications. All of the applications come with potential harms and ethical considerations. Use of AER by the military, for intelligence, and for education are especially controversial.)

  • Public Health: Assist public health research projects, including those on loneliness, depression, suicidality prediction, bipolar disorder, stress, and well-being.

  • Commerce/Business: Track sentiment and emotions towards one’s products, track reviews, blog posts, YouTube videos and comments; develop virtual assistants, writing assistants; help advertise products that one is more likely to be interested in.

  • Government Policy and Public Health Policy: Tracking and documenting views of the broader public on a range of issues that impact policy (tracking amount of support and opposition, identifying underlying issues and pain points, etc.). Governments and health organizations around the world are also interested in tracking how effective their messaging has been in response to crises such as pandemics and climate change.

  • Art and Literature: Improve our understanding of what makes a compelling story, how different types of characters interact, what the emotional arcs of stories are, what the emotional signature of different genres is, what makes well-rounded characters, why art evokes emotions, how lyrics and music impact us emotionally, etc. Can machines generate art (paintings, stories, music, etc.)?

  • Social Sciences, Neuroscience, Psychology: Help answer questions about people. What makes people thrive? What makes us happy? What can our language tell us about our well-being? What can language tell us about how we construct emotions in our minds? How do we express emotions? How different are people in terms of what different emotion words mean to them and how they use emotional words?

  • Military, Policing, and Intelligence: Tracking how sets of people or countries feel about a government or other entities (controversial); tracking misinformation on social media.

Ethical Considerations

The usual approach to building an AER system is to design the task (identify the process to be automated, the emotions of interest, etc.), compile appropriate data (label some of the data for emotions, a process referred to as human annotation), train ML models that capture patterns of emotional expression from the data (the method), and evaluate the models by examining their predictions on a held-out test set. There are ethical considerations associated with each step of this development process. Considerations for privacy and social groups are especially pertinent for AER and cut across task design, data, method, and evaluation.

This section describes fifty considerations grouped under the themes: Task Design, Data, Method, Impact and Evaluation, and Implications for Privacy and Social Groups. First I present an outline of the considerations along with a summary for each grouping. This is followed by five sub-sections (Task Design through Implications for Privacy and Social Groups) that present, in detail, the ethical considerations associated with the five groups.

Task Design

Summary: This section discusses various ethical considerations associated with the choices involved in the framing of the emotion task and the implications of automating the chosen task. Some important considerations include: Whether it is even possible to determine one’s internal mental state? Whether it is ethical to determine such a private state? And, who is often left out in the design of existing AER systems? I discuss how it is important to consider which formulation of emotions is appropriate for a specific task/project; while avoiding careless endorsement of theories that suggest a mapping of external appearances to inner mental states.

A. Theoretical Foundations

  1. Emotion Task and Framing

  2. Emotion Model and Choice of Emotions

  3. Meaning and Extra-Linguistic Information

  4. Wellness and Emotion

  5. Aggregate Level vs. Individual Level

B. Implications of Automation

  1. Why Automate (Who Benefits; Will this Shift Power)

  2. Embracing Neurodiversity

  3. Participatory/Emancipatory Design

  4. Applications, Dual use, Misuse

  5. Disclosure of Automation

Data

Summary: This section has three themes: implications of using datasets of different kinds, the tension between human variability and machine normativeness, and the considerations regarding the people who have produced the data. Notably, I discuss how on the one hand is the tremendous variability in human mental representation and expression of emotions, and on the other hand is the inherent bias of modern machine learning approaches to ignore variability. Thus, through their behaviour (e.g., by recognizing some forms of emotion expression and not others), AI systems convey to the user what is “normal”, implicitly invalidating other forms of emotion expression.

C. Why This Data

  1. Types of data

  2. Dimensions of data

D. Human Variability vs. Machine Normativeness

  1. Variability of Expression and Mental Representation

  2. Norms of Emotion Expression

  3. Norms of Attitudes

  4. One “Right” Label or Many Appropriate Labels

  5. Label Aggregation

  6. Historical Data (Who is Missing and What are the Biases)

  7. Training-Deployment Differences

E. The People Behind the Data

  1. Platform Terms of Service

  2. Anonymization and Ability to Delete One’s information

  3. Warnings and Recourse

  4. Crowdsourcing

Method

Summary: This section discusses the ethical implications of doing AER using a given method. It presents the types of methods and their tradeoffs, as well as considerations of who is left out, spurious correlations, and the role of context. Special attention is paid to green AI and the fine line between emotion management and manipulation.

F. Why This Method

  1. Types of Methods and their Tradeoffs

  2. Who is Left Out by this Method

  3. Spurious Correlations

  4. Context is Everything

  5. Individual Emotion Dynamics

  6. Historical Behavior is not always indicative of Future Behavior

  7. Emotion Management, Manipulation

  8. Green AI

Impact and Evaluation

Summary: This section discusses ethical considerations associated with the impact of AER systems using both traditional metrics as well as through a number of other criteria beyond metrics. Notably, this latter subsection discusses interpretability, visualizations, building safeguards, and contestability, because even when systems work as designed, there will be some negative consequences. Recognizing and planning for such outcomes is part of responsible development.

G. Metrics

  1. Reliability/Accuracy

  2. Demographic Biases

  3. Sensitive Applications

  4. Testing (on Diverse Datasets, on Diverse Metrics)

H. Beyond Metrics

  1. Interpretability, Explainability

  2. Visualization

  3. Safeguards and Guard Rails

  4. Harms even when the System Works as Designed

  5. Contestability and Recourse

  6. Be wary of Ethics Washing

Implications for Privacy and Social Groups

Summary: This section presents ethical implications of AER for privacy and for social groups. These issues cut across Task Design, Data, Method, and Impact. I discuss both individual and group privacy. The latter becomes especially important in the context of soft-biometrics determined through AER that are not intended to be able to identify individuals, but rather identify groups of people with similar characteristics. I discuss the need for work that does not treat people as a homogeneous group (ignoring sub-group differences) but rather explores disaggregation and intersectionality, while minimizing reification and essentialization of social constructs.

I. Implications for Privacy

  1. Privacy and Personal Control

  2. Group Privacy and Soft Biometrics

  3. Mass Surveillance vs. Right to Privacy, Expression, Protest

  4. Right Against Self-Incrimination

  5. Right to Non-Discrimination

J. Implications for Social Groups

  1. Disaggregation

  2. Intersectionality

  3. Reification and Essentialization

  4. Attributing People to Social Groups

One can read these various sections in one go, or simply use them as a reference when needed (jumping to sections of interest).

Task Design

(Ten considerations.)

A. Theoretical Foundations

Domain naivete is not a virtue.

Study the theoretical foundations for the task from relevant research fields such as psychology, linguistics, and sociology, to inform the task formulation.

#1. Emotion Task and Framing: Carefully consider what emotion task should be the focus of the work (whether conducting human-annotation or building an automatic system). (See the Task section for a sample of common emotion tasks.) When building an AER system, a clear grasp of the task will help in making appropriate design choices. When choosing which AER system to use, a clear grasp of the emotion task most appropriate for the deployment context will help in choosing the right AER system. It is not uncommon for users of AER to have a particular emotion task in mind and mistakenly assume that an off-the-shelf AER system is designed for that task.

Each of the emotion tasks has associated ethical considerations. For example,

Is the goal to infer one’s true emotions? Is it possible to comprehensively determine one’s internal mental state by any AI or human? (Hint: No.) Is it ethical to determine such a private state?

Realize that it is impossible to capture the full emotional experience of a person (even if one had access to all the electrical signals in the brain). A less ambitious goal is to infer some aspects of one’s emotional state.

Here, we see a distinct difference between AER that uses vision and AER that uses language. While there is little credible evidence of the connection between one’s facial expressions and one’s internal emotional state, there is a substantial amount of work on the idea that language is a window into one’s mind—which of course also includes emotions.

That said, there is no evidence that one can determine the full (or even substantial portions) of one’s emotional state through their language. (See also considerations #2 Emotion Model and #13 Variability of Expression ahead on complexity of the emotional experience and variability of expression.) Thus, often it is more appropriate to frame the AER task differently, for example, the objective could be:

  • to study how people express emotions: Work that uses speaker-annotated labeled data such as emotion-word hashtags in tweets usually captures how people convey emotions. What people convey may not necessarily indicate what they feel.

  • to determine perceived emotion (how others may think one is feeling):

  • Perceived emotions are not necessarily the emotions of the speaker.

  • Emotion annotations by people who have not written the source text usually reveal perceived emotions. (This is most common in NLP data-annotation projects.) Annotation aggregation strategies, such as majority voting, usually convey only the emotions perceived by a majority group. Are we missing out on the perceptions of some groups? (More on majority voting in #17 Label Aggregation.)

  • to determine emotionality of language used in text (regardless of whose emotions, target/stimulus, etc.): This may be appropriate in some restricted-domain scenarios, for example, when one is looking at customer reviews. Here, the context indicates that the emotionality in the language likely reflects attitude towards the product being reviewed. However, such systems have difficulty with movie and book reviews, because they must distinguish text expressing attitudes towards the book/movie from text describing what happened in the plot (which is likely emotional too).

  • to determine trends at aggregate level: Emotionality of language is also useful when tracking broad patterns at an aggregate level e.g., tracking trends of emotionality in tens of thousands of tweets or text in novels over time (e.g.,). The idea is that aggregating information from a large number of instances leads to the determination of meaningful trends in emotionality. (See also discussion in #5 Aggregate Level vs. Individual Level.)
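The concern about majority voting can be illustrated with a minimal Python sketch. The annotations, the five-annotator setup, and the helper names `majority_vote` and `label_distribution` are hypothetical; this is one simple way to keep annotator disagreement visible, not a prescribed method. Majority voting reduces the item to a single label, whereas the label distribution retains the share of annotators who perceived a different emotion.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label (ties resolve to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def label_distribution(labels):
    """Return each label's share of the annotations, keeping minority perceptions visible."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical annotations of a single tweet by five annotators.
annotations = ["joy", "joy", "joy", "sadness", "sadness"]

print(majority_vote(annotations))       # joy
print(label_distribution(annotations))  # {'joy': 0.6, 'sadness': 0.4}
```

Under majority voting, the two annotators who perceived sadness disappear from the final label; keeping the distribution (or the individual annotations) lets later analysis ask whose perceptions a label reflects.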

In summary, it is important to identify what emotion task is the focus of one’s work, use appropriate data, and communicate the nuance of what is being captured to the stakeholders. Not doing so will lead to the misuse and misinterpretation of one’s work. Specifically, AER systems should not claim to determine one’s emotional state from their utterance, facial expression, gait, etc. At best, AER systems capture what one is trying to convey or what is perceived by the listener/viewer, and even there, given the complexity of human expression, they are often inaccurate. A separate question is whether AER systems can determine trends in the emotional state of a person (or a group) over time. Here, inferences are drawn at the aggregate level from much larger amounts of data. Studies on public health, such as those listed in 3.3, fall in this category. Here too, it is best to be cautious in making claims about mental state, and to use AER as one source of evidence amongst many (and involve expertise from public health and psychology).

#2. Emotion Model and Choice of Emotions: Work on AER needs to operationalize the aspect of emotion it intends to capture, that is, decide on emotion-related categories or dimensions of interest, decide how to represent them, etc. Psychologists and neuroscientists have identified several theories of emotion to inform these decisions:

  • The Basic Emotions Theory (BET): Work by Dr. Paul Ekman in the 1960s galvanized the idea that some emotions (such as joy, sadness, fear, etc.) are universally expressed through similar facial expressions, and that these emotions are more basic than others. This was followed by other proposals of basic emotions by Robert Plutchik, Izard, and others. However, many of the tenets of BET, such as the universality of some emotions and their fixed mapping to facial expressions, stand discredited or are in question.
  • The Dimensional Theory: Several influential studies have shown that the three most fundamental, largely independent, dimensions of affect and connotative meaning are valence (positiveness–negativeness / pleasure–displeasure), arousal (active–sluggish), and dominance (dominant–submissive / in control–out of control). Valence and arousal specifically are commonly studied in a number of psychological and neuro-cognitive explorations of emotion.
  • Cognitive Appraisal Theory: The core idea behind appraisal theory is that emotions arise from a person’s evaluation of a situation or event. (Some varieties of the theory point to a parallel process of reacting to perceptual stimuli as well.) Thus it naturally accounts for variability in emotional reaction to the same event, since different people may appraise the situation differently. Criticisms of appraisal theory centre around questions such as: whether emotions can arise without appraisal; whether emotions can arise without physiological arousal; and whether our emotions inform our evaluations.
  • The Theory of Constructed Emotions: Dr. Lisa Barrett proposed a new theory on how the human brain constructs emotions from our experiences of the world around us and the signals from our body.

Since ML approaches rely on human-annotated data (which can be hard to obtain in large quantities), AER research has often gravitated to the Basic Emotions Theory, as that work allows one to focus on a small number of emotions. This attraction has been even stronger in the vision AER research because of BET’s suggested mapping between facial expressions and emotions. However, as noted above, many of the tenets of BET stand debunked.

Consider which formulation of emotions is appropriate for your task/project. For example, one may choose to work with the dimensional model or the model of constructed emotions if the goal is to infer behavioural or health outcome predictions. Despite criticisms of BET, it makes sense for some NLP work to focus on categorical emotions such as joy, sadness, guilt, pride, fear, etc. (including what some refer to as basic emotions) because people often talk about their emotions in terms of these concepts. Most human languages have words for these concepts (even if our individual mental representations for these concepts vary to some extent). However, note that work on categorical emotions by itself is not an endorsement of the BET. Do not refer to some emotions as basic emotions, unless you mean to convey your belief in the BET. Careless endorsement of theories can lead to the perpetuation of ideas that are actively harmful (such as suggesting we can determine internal state from outward appearance—physiognomy).
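To make the dimensional option concrete, the following minimal sketch represents emotion as valence, arousal, and dominance (VAD) scores and averages word-level scores over a text. The three-word lexicon, its 0-to-1 values, and the function `text_vad` are hypothetical placeholders, not taken from any published resource; a real project would use a vetted lexicon. Note that such an average captures the emotionality of the language used (see #1 Emotion Task and Framing), not the speaker’s emotional state.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class VAD:
    valence: float    # displeasure (0) to pleasure (1)
    arousal: float    # sluggish (0) to active (1)
    dominance: float  # submissive (0) to dominant (1)


# Hypothetical word-level scores on a 0-1 scale, for illustration only.
LEXICON = {
    "furious": VAD(0.1, 0.9, 0.6),
    "calm": VAD(0.8, 0.1, 0.6),
    "lost": VAD(0.2, 0.4, 0.2),
}


def text_vad(tokens: Iterable[str]) -> Optional[VAD]:
    """Average the VAD scores of (lowercased) tokens found in the lexicon; None if none match."""
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    if not hits:
        return None
    n = len(hits)
    return VAD(
        sum(h.valence for h in hits) / n,
        sum(h.arousal for h in hits) / n,
        sum(h.dominance for h in hits) / n,
    )


print(text_vad("i feel furious and lost".split()))
```

Texts with no lexicon matches return None rather than a neutral score, so that missing evidence is not silently read as the absence of emotion.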

#3. Meaning and Extra-Linguistic Information: The meaning of an utterance is not only a property of language, but it is grounded in human activity, social interactions, beliefs, culture, and other extra-linguistic events, perceptions, and knowledge. Thus one can express the same emotion in different ways in different contexts, different people express the same emotions in different ways, and the same utterances can evoke different emotions in different people. AER systems that do not take extra-linguistic information into consideration will always be limited in their capabilities, and risk being systematically biased, insensitive, and discriminatory. More on this in #13 Variability of Expression and #14 Norms of Emotion Expression.

#4. Wellness and Emotion: The prominent role of one’s body in the theory of constructed emotion nicely accounts for the fact that various physical and mental illnesses (e.g., Parkinson’s, Alzheimer’s, cardiovascular disease, depression, anxiety) impact our emotional lives. Existing AER systems are not capable of handling this inter-subject and within-subject variability and thus should not be deployed in scenarios where their decisions could negatively impact the lives of people; where they are deployed, their limitations should be clearly communicated.

Emotion recognition is playing a greater role than ever before in understanding how our language reflects our wellness, how certain physical and mental illnesses impact our emotional expression, and how emotional expression can help improve our well-being. For some medical conditions, clinicians can benefit from a detailed history of one’s emotional state. However, people are generally not very good at remembering how they have been feeling over the past week, month, etc. Thus an area of interest is to use AER to help patients track their emotional state. See the applications of AER in Public Health in the Applications section. See also the CL Psych workshop proceedings. Note, however, that these are cases where the technology plays a firmly assistive role for clinicians and psychologists: it provides additional information in situations where human experts make decisions based on a number of other sources of information as well. See prior work for ethical considerations on inferring mental health states from one’s utterances.

#5. Aggregate Level vs. Individual Level: Emotion detection can be used to make inferences about individuals or about groups of people: for example, to assist one in writing or to recommend products or services, or to determine broad trends in attitudes towards a product, issue, or some other entity. Statistical inferences tend to be more reliable when using large amounts of data and more relevant data. Systems that make predictions about individuals often have very little pertinent information about the individual and thus often fall back on data from groups of people. Thus, given the person-to-person and within-person variability discussed in the earlier considerations, such systems are imbued with errors and biases. Further, these errors are especially detrimental because of the direct and personal nature of such interactions. They may, for example, attribute majority-group behavior/preferences to the individual, further marginalizing those who are not in the majority.

Various ethical concerns, including privacy, manipulation, bias, and free speech, are further exacerbated when systems act on individuals.

Work on finding trends in large groups of people, on the other hand, benefits from having a large amount of relevant information to draw on. However, see #43 Group Privacy and #47 to #50 _Implications for Social Groups_ for relevant concerns.

B. Implications of Automation

What are the ethical implications of automating the chosen task?

#6. Why Automate (Who Benefits and Will this Shift Power): When we choose to work on a particular AER task, or any AI task for that matter, it is important to ask ourselves why? Often the first set of responses may be straightforward: e.g., to automate some process to make people’s lives easier, or to provide access to some information that is otherwise hard to obtain, or to answer research questions about how emotions work. However, lately there has been a call to go beyond this initial set of responses and ask more nuanced, difficult, and uncomfortable questions such as:

  • Who will benefit from this work and who will not?
  • Will this work shift power from those who already have a lot of power to those that have less power?
  • How can we reframe or redesign the task so that it helps those that are most in need?

Specifically for AER, this will involve considerations such as:

  • Are there particular groups of people who will not benefit from this task: e.g., people who convey and detect emotions differently than what is common (e.g., people on the autism spectrum), people who use language differently than the people whose data is being used to build the system (e.g., older people or people from a different region)?
  • If AER is used in some application, say to determine insurance premiums, then is this further marginalizing those that are already marginalized?
  • How can we prevent the use of emotion and stance detection systems for detecting and suppressing dissidents?
  • How can AER help those that need the most help?

Various other considerations, such as those listed in this sheet, can be used to further evaluate the wisdom of investing our labor in a particular task.

#7. Embracing Neurodiversity: Much of the ML/NLP emotion work has assumed homogeneity of users and ignored neurodiversity, alexithymia, and the autism spectrum. These groups overlap significantly, but are not identical. People in these groups are also often characterized as having difficulty in sensing and expressing emotions. Therefore these groups hold particular significance in the development of an inclusive AER system. Existing AER systems implicitly cater to the more populous neurotypical group. At minimum, such AER systems should explicitly acknowledge this limitation. Report disaggregated performance metrics for relevant groups. (See also #47 Disaggregation.)
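Reporting disaggregated metrics can be as simple as computing the same metric separately for each group. The sketch below assumes each evaluation instance carries a group label that participants self-reported with consent; the function name and the group labels are hypothetical. It returns the per-group count alongside accuracy, since estimates for small groups are noisy and should be read with that in mind.

```python
from collections import defaultdict


def disaggregated_accuracy(gold, pred, groups):
    """Return overall accuracy and, per group, (accuracy, number of instances)."""
    if not (len(gold) == len(pred) == len(groups)):
        raise ValueError("gold, pred and groups must have the same length")
    if not gold:
        raise ValueError("no instances to evaluate")
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, p, grp in zip(gold, pred, groups):
        total[grp] += 1
        correct[grp] += int(g == p)
    overall = sum(correct.values()) / len(gold)
    per_group = {grp: (correct[grp] / total[grp], total[grp]) for grp in total}
    return overall, per_group


# Toy example with hypothetical group labels.
gold = ["joy", "anger", "sadness", "joy"]
pred = ["joy", "anger", "joy", "fear"]
groups = ["group_A", "group_A", "group_B", "group_B"]
overall, per_group = disaggregated_accuracy(gold, pred, groups)
print(overall)    # 0.5
print(per_group)  # {'group_A': (1.0, 2), 'group_B': (0.0, 2)}
```

A single overall score of 0.5 in this toy case would hide the fact that every error falls on one group.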

Greater research attention needs to be paid to the neurodiverse group. When annotating data, we should try to obtain information on whether participants are neurodiverse or neurotypical (when participants are comfortable sharing that information), and include that information at an aggregate level when we report participant demographics. Work in Psychology has used scales such as the Toronto Alexithymia Scale (TAS-20) to determine the difficulty that people might have in identifying and describing emotions.
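Reporting such information only at an aggregate level can be sketched as a count per self-reported category, with small counts masked so that individuals are harder to single out. The minimum cell size of 5 and the category names are assumptions for illustration, not an established standard for this setting; declining to answer is kept as its own category rather than dropped.

```python
from collections import Counter


def aggregate_report(responses, min_cell=5):
    """Count self-reported categories; hide counts below min_cell.

    None means the participant chose not to answer; it is reported as its
    own category rather than silently dropped.
    """
    counts = Counter("not disclosed" if r is None else r for r in responses)
    return {
        category: n if n >= min_cell else f"<{min_cell}"
        for category, n in sorted(counts.items())
    }


responses = ["neurotypical"] * 12 + ["neurodivergent"] * 6 + [None] * 3
print(aggregate_report(responses))
# {'neurodivergent': 6, 'neurotypical': 12, 'not disclosed': '<5'}
```

If the total number of participants is also published, a single masked cell can be recovered by subtraction, so in practice more than one cell may need to be masked.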

#8. Participatory/Emancipatory Design: Participatory design in research and systems development centers people, especially those from marginalized and disadvantaged communities, so that they are not mere passive subjects but rather have the agency to shape the design process. This has also been referred to as emancipatory research and is pithily captured by the rallying cry “nothing about us without us”. These calls have developed across many different domains, including research pertaining to disability, indigenous communities, the autism spectrum, and neurodiversity. See prior work for specific recommendations for conducting studies with neurodiverse participants.

#9. Applications, Dual Use, Misuse: AER is a powerful enabling technology that has a number of applications. Thus, like all enabling technologies, it can be misused and abused. Examples of inappropriate AER applications include:

  • Using AER at airports to determine whether an individual is dangerous simply from their facial expressions.

  • Detecting stance towards governing authorities to persecute dissidents.

  • Using deception detection or lie detection en masse without proper warrants or judicial approval. (Using such technologies even in carefully restricted individual cases is controversial.)

  • Increasing someone’s insurance premium because the system has analyzed one’s social media posts to determine (accurately or inaccurately) that they are likely to have a certain mental health condition.

  • Advertising that preys on the emotional state of people, e.g., user-specific advertising to people when they are emotionally vulnerable.

Socio-Psychological Applications: Applications such as inferring patterns in the emotions of a speaker to, in turn, infer other characteristics such as suitability for a job, personality traits, or health conditions are especially fraught with ethical concerns. For example, consider the use of the Myers–Briggs Type Indicator (MBTI) for hiring decisions or research on detecting personality traits automatically. Notable ethical concerns include:

  • MBTI is criticized by psychologists, especially for its lack of test-retest reliability. The Big 5 personality traits formalism has greater validity, but even when using the Big 5, it is easy to overstate the conclusions.

  • Even with accurate personality trait identification, there is little to no evidence that using personality traits for hiring and team-composition decisions is beneficial. The use of such tests has also been criticized on the grounds of discrimination.

Health and Well-Being Applications: AER has considerable potential for improving our health and well-being outcomes. However, the sensitive nature of such applications requires substantial efforts to adhere to the best ethical principles. For example, how can harm be mitigated when systems make errors? Should automatic systems be used at all, given that sometimes we cannot put a value on the cost of errors? What should be done when the system detects that one is at a high risk of suicide, depression, or some other severe mental health condition? How can patient privacy be safeguarded? See the shared task at the 2021 CL Psych workshop, where a secure enclave was used to store the training and test data. See prior work for ethical considerations of AI systems in health care.

Applications in Art and Culture: Lately there has been increasing use of AI in art and culture, especially through curation and recommendation systems. See prior work for a discussion of the ethical implications, including: whether we are really able to determine what art one would like, the long-term impacts of automated curation (on users and artists), and the diversity of sources and content.

AI is also used in the analysis and generation of art: e.g., for literary analysis and for generating poems, paintings, songs, etc. Since emotions are a central component of art, much of this work also includes automatic emotion recognition: e.g., tracking the emotions of characters in novels, recommending songs to people based on their mood, and generating emotional music. This raises several questions, including:

  • Is it art if the creation did not involve human input?

  • Should AI play a collaborative role with other artists (enhancing their creativity) as opposed to generating pieces on its own?

  • How will artists be impacted by AI’s role in art?

  • Who should get credit for AI art?

  • How should we critique AI art?

See prior work for further discussion.

#10. Disclosure of Automation: Disclose to all stakeholders the decisions that are being made (in part or wholly) by automation. Provide mechanisms for the user to understand why relevant predictions were made, and also to contest the decisions. (See also #36 Interpretability and #40 Contestability.)

Artificial agents that perceive and convey emotions in a human-like manner can give one the impression that they are interacting with a human. Artificial agents should begin their interactions with humans by first disclosing that they are artificial agents, even though some studies show certain negative outcomes of such a disclosure.


(Thirteen considerations.)

C. Why This Data

What are the ethical implications of using the chosen data?

#11. Types of Data: Emotion and sentiment researchers have used text data, speech data, data from mobile devices, data from social media, product reviews, suicide notes, essays, novels, movie screenplays, financial documents, etc. All of these entail their own ethical considerations in terms of the various points discussed in this article. AER systems use data in various forms, including:

  • Large Language Models: Language models such as BERT (which capture common patterns in language use) are obtained by training ML models on massive amounts of text found on the internet. See prior work for ethical considerations in the use of large language models, including documentation debt, difficulty of curation, incorporation of inappropriate biases, and perpetuation of stereotypes. Note also that using smaller amounts of data raises concerns as well: such data may not have enough generalizable information, may be easier to overfit, and may not include diverse perspectives. An important aspect of preparing data (big or small) is deciding how to curate it (e.g., what to discard).
  • Emotion Lexicons: Emotion lexicons are lists of words and their associated emotions (determined manually by annotation or automatically from large corpora). Word–emotion association lexicons (such as AFINN, the NRC Emotion Lexicon, and the Valence, Arousal, Dominance Lexicon) are a popular type of resource used in emotion research, emotion-related data science, and machine learning models for AER. See prior work for biases and ethical considerations in the use of such emotion lexicons. Notable among these considerations is that words often convey different senses in different domains, and thus have different emotion associations. Also, word associations capture historical perceptions that change with time and may differ across different groups of people. They are not indicative of inherent, immutable emotion labels.
  • Labeled Training and Testing Data: AER systems often make use of a relatively small number of example instances that are manually labeled (annotated) for emotions. A portion of these is used to train/fine-tune the large language model (training set). The rest is further split for development and testing. I discuss various ethical considerations associated with using emotion-labeled instances below.

#12. Dimensions of Data: The data used by AER systems can be examined across various dimensions: size of the data; whether it is custom data (carefully produced for the research) or data obtained from an online platform (naturally occurring data); less private/sensitive or more private/sensitive data; which languages are represented in the data; the degree of documentation provided with the data; and so on. All of these have societal implications, and the choice of datasets should be appropriate for the context of deployment.
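To make the #11 point about word–emotion lexicons concrete, here is a minimal sketch of the simplest lexicon-based scoring: average the scores of the lexicon words found in a text. The lexicon below is a hypothetical toy with made-up scores, not entries from AFINN or the NRC lexicons. It shows how a context-free lookup treats the slang sense of "sick" (meaning impressive) as negative.

```python
import re

# Hypothetical toy lexicon: scores are illustrative, not taken from AFINN or NRC.
TOY_VALENCE = {"love": 0.9, "happy": 0.8, "sick": -0.6, "terrible": -0.9}


def lexicon_valence(text, lexicon=TOY_VALENCE):
    """Average valence of the lexicon words in text, or None if none occur."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[tok] for tok in tokens if tok in lexicon]
    return sum(scores) / len(scores) if scores else None


print(lexicon_valence("I love it and I am happy"))         # ≈ 0.85
print(lexicon_valence("That skateboard trick was sick!"))  # -0.6
print(lexicon_valence("The meeting is at noon"))           # None
```

The lookup has no notion of word sense or domain, which is exactly why the same lexicon entry can be right in a health forum and wrong in a skateboarding forum.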

D. Human Variability vs. Machine Normativeness

What should we know about emotion data so that we use it appropriately?

#13. Variability of Expression and Mental Representation: Language is highly variable—we can express roughly the same meaning in many different ways.

_Expressions of emotions through language are highly variable: Different people express the same emotion differently; the same text may convey different emotions to different people._

This is true even for people living in the same area, and especially true for people living in different regions and people with different lived experiences. Some cues of emotion are somewhat more common and somewhat more reliable than others; these are usually the signal that automatic systems attempt to capture. We construct emotions in our brains from the signals we get from the world and the signals we get from our bodies. This mapping of signals to emotions is highly variable: different people can have different signals associated with different emotions, and therefore different concept–emotion associations. For example, high school, public speaking, and selfies may evoke different emotions in different people. This variability is not to say that there are no commonalities. In fact, speakers of a language share substantial commonalities in their mental representation of concepts (including emotions), which enables them to communicate with each other. However, the variability should also be taken into consideration when building datasets and systems, and when choosing where to deploy the systems.

#14. Norms of Emotion Expression: As John M. Culkin once said, “We shape our tools and thereafter they shape us.” Whether for text, speech, vision, or any other modality, AI systems are often trained on a limited set of emotion expressions and their emotion annotations (emotion labels for the expressions).

Thus, through their behaviour (e.g., by recognizing some forms of emotion expression and not recognizing others), AI systems convey to the user that it is “normal” or appropriate to convey emotions in certain ways; implicitly invalidating other forms of emotion expression.

Therefore it is important for emotion recognition systems to accurately map a diverse set of emotion instantiations to emotion categories/dimensions. That said, it is also worth noting that the variations in emotion and language expression are so large that systems can likely never attain perfection. The goal is to obtain useful levels of emotion recognition capabilities without having systematic gaps that convey a strong sense of emotion-expression normativeness.

Normative implications of AER are analogous to normative implications of movies (especially animated ones):

  • Badly executed characters express emotions in fixed, stereotypical ways.
  • Good movies explore the diversity, nuance, and subtlety of human emotion expression.
  • Influential movies (bad and good) convey to a wide audience around the world how emotions are expressed or what is “normal” in terms of emotion expression. Thus they can either colonize other groups, reducing emotion expression diversity, or they can validate one’s individualism and independence of self-expression.

Since AI systems are influenced by the data they are trained on, dataset development should:

  • Obtain data from a diverse set of sources. Report details of the sources.

  • Studies have shown that a small percentage of speakers often produce a large percentage of utterances (see, e.g., prior work on tweets). Thus, when creating emotion datasets, limit the number of instances included per person. For example, one study kept one tweet for every query term and tweeter combination when studying relationships between affect categories (data also used in SemEval-2018 Task 1 on emotions); another kept at most three tweets per tweeter when studying expressions of loneliness.

  • Obtain annotations from a diverse set of people. Report aggregate-level demographic information of the annotators.
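Limiting the number of instances per person can be sketched as grouping by author and sampling at most k items from each. The record field name `author_id` is hypothetical, and the default of three simply mirrors the loneliness study mentioned above. Random sampling with a fixed seed, rather than keeping the first k posts, avoids biasing the subset toward each author's earliest posts while keeping it reproducible.

```python
import random
from collections import defaultdict


def cap_per_author(instances, max_per_author=3, seed=0):
    """Keep at most max_per_author instances per author, sampled at random."""
    by_author = defaultdict(list)
    for inst in instances:
        by_author[inst["author_id"]].append(inst)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept = []
    for items in by_author.values():
        if len(items) > max_per_author:
            items = rng.sample(items, max_per_author)
        kept.extend(items)
    return kept


data = [{"author_id": "a", "text": f"post {i}"} for i in range(10)]
data += [{"author_id": "b", "text": "hello"}, {"author_id": "b", "text": "bye"}]
subset = cap_per_author(data)
print(len(subset))  # 5
```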

Variability is common not just for emotions but also for language: people convey meaning in many different ways. Thus, these considerations apply to NLP in general.

#15. Norms of Attitudes: Different people and different groups of people might have different attitudes, perceptions, and associations with the same product, issue, person, social group, etc. Annotation aggregation, by, say, majority vote, may convey a more homogeneous picture to the ML system than actually exists. Annotation aggregation may also capture stereotypes and inappropriate associations for already marginalized groups. (For example, majority group A may perceive a minority group B as less competent or less generous.) Such inappropriate biases are also encoded in large language models. When using language models or emotion datasets, assess the risk of such biases for the particular context and take corrective action as appropriate.

#16. One “Right” Label or Many Appropriate Labels: When designing data annotation efforts, consider whether there is a “right” answer and a “wrong” one. Who decides what is correct/appropriate? Are we including the voices of those who are marginalized and already under-represented in the data? When working with emotion and language data, there are usually no “correct” answers; rather, some answers are more appropriate than others, and there can be multiple appropriate answers.

  • If a task has clear correct and wrong answers and knowing the answers requires some training/qualifications, then one can employ domain experts to annotate the data. However, as mentioned, emotion annotations largely do not fall in this category.
  • If the goal is to determine how people use language, and there can be many appropriate answers, or we want to know how people perceive words, phrases, and sentences then we might want to employ a large number of annotators. This is much more in line with what is appropriate for emotion annotations — people are the best judges of their emotions and of the emotions they perceive from utterances.

Seek appropriate demographic information (respectfully and ethically). Document annotator demographics, annotation instructions, and other relevant details. These are useful in conveying to the reader that there is no one “correct” answer and that the dataset is situated in who annotated the data, the precise annotation instructions, when the data was annotated, etc.

#17. Label Aggregation: Multiple annotations (by different people) for the same instance are usually aggregated by choosing the majority label. However, majority voting tends to capture majority-group attitudes (at the expense of other groups). As a result, some researchers have released not just the aggregated results but also the raw (pre-aggregated) data, as well as various versions of aggregated results. Others have argued in favor of not doing majority voting at all and instead including all annotations as input to ML systems. However, saying all voices should be included has its own problems: e.g., how to address and manage inappropriate/racist/sexist opinions, and how to disentangle low-frequency valid opinions from genuine annotation errors and malicious annotations. (See also #15 Norms of Attitudes and #47 Disaggregation.)

If using majority voting, acknowledge its limitations. Acknowledge that it may be missing some/many voices. Explore statistical approaches to finding multiple appropriate labels, while still discarding noise. Employ separate manual checks to determine whether the human annotations also capture inappropriate human biases. Such biases may be useful for some projects (e.g., work studying such biases), but not for others. Warn users of inappropriate biases that may exist in the data, and suggest strategies to deal with them when using the dataset.
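One simple way to keep more than the majority voice is to release, for each instance, the majority label together with the full label distribution and the set of labels that enough annotators chose. The sketch below does this; the thresholds (`min_share`, `min_votes`) are hypothetical knobs, and requiring at least two votes is only a crude noise filter. It cannot tell a valid minority view from an error or a malicious annotation, so the manual checks above are still needed.

```python
from collections import Counter


def aggregate_labels(annotations, min_share=0.3, min_votes=2):
    """Summarize the labels that several annotators gave one instance.

    Returns (majority, distribution, appropriate):
      majority     -- the single most frequent label, or None on a tie
      distribution -- share of annotators choosing each label
      appropriate  -- labels chosen by at least min_share of annotators
                      and by at least min_votes people (a crude noise filter)
    """
    if not annotations:
        raise ValueError("no annotations")
    counts = Counter(annotations)
    n = len(annotations)
    distribution = {label: c / n for label, c in counts.items()}
    ranked = counts.most_common()
    tie = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
    majority = None if tie else ranked[0][0]
    appropriate = sorted(
        label for label, c in counts.items() if c / n >= min_share and c >= min_votes
    )
    return majority, distribution, appropriate


labels = ["joy", "joy", "joy", "surprise", "surprise", "anger"]
majority, dist, appropriate = aggregate_labels(labels)
print(majority)     # joy
print(appropriate)  # ['joy', 'surprise']
```

Releasing `distribution` alongside the majority label also lets downstream users train on soft labels instead of a single forced choice.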

#18. Historical Data (Who is Missing and What are the Biases): Machine learning methods feed voraciously on data (often historical data). Natural language processing systems often feed on huge amounts of data collected from the internet. However, the data is not representative of everyone, and seeped into this data are our biases. Historical data over-represents people who have had power, who are more well-to-do, mostly from the West, mostly English-speaking, mostly white, mostly able-bodied, and so on. So the machines that feed on such data often learn their perspectives at the expense of the views of those already marginalized.

When using any dataset, devote resources to studying who is included in the dataset and whose voices are missing. Take corrective action as appropriate. Keep a portion of your funding for work with marginalized communities and for work on less-researched languages.

#19. Training–Deployment Data Differences: The accuracy of supervised systems is contingent on the assumption that the data the system is applied to is similar to the data the system was trained on. Deploying an off-the-shelf sentiment analysis system on data from a different domain, from a different time, or with a different class distribution than the training data will likely result in poor predictions. Systems that are to be deployed to handle open-domain data should be trained on many diverse datasets and tested on many datasets that are quite different from the training datasets.
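A cheap check before deployment is to compare the training text with a sample of the text the system will actually see. The sketch below uses two diagnostics of my own choosing: the Jensen–Shannon divergence between unigram distributions, and the out-of-vocabulary rate of deployment tokens. Both only flag vocabulary or domain shift; they say nothing about changes in the label distribution or in what words mean, and there is no universal threshold for "too different".

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def unigram_dist(texts):
    counts = Counter(tok for text in texts for tok in tokenize(text))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in set(p) | set(q)}

    def kl(a):
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def oov_rate(train_texts, deploy_texts):
    """Fraction of deployment tokens never seen in the training data."""
    vocab = {tok for text in train_texts for tok in tokenize(text)}
    tokens = [tok for text in deploy_texts for tok in tokenize(text)]
    return sum(tok not in vocab for tok in tokens) / len(tokens) if tokens else 0.0


train = ["the movie was great", "the plot was boring"]          # e.g., movie reviews
deploy = ["the battery life is great", "the screen cracked"]    # e.g., product reviews
print(round(js_divergence(unigram_dist(train), unigram_dist(deploy)), 3))
print(oov_rate(train, deploy))  # 0.625
```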

E. The People Behind the Data

What are the ethical implications on the people who have produced the data?

When building systems, we make extensive use of (raw and emotion-labeled) data. It can sometimes be easy to forget that behind the data are the people who produced it, and imprinted in it is a plethora of personal information.

#20. Platform Terms of Service: Data for ML systems is often scraped from websites or extracted from large online platforms (e.g., Twitter, Reddit) using APIs. The terms of service for these platforms often include protections for the users and their data. Ensure that the terms of service of the source platforms are not violated: e.g., data scraping is allowed and data redistribution is allowed (in raw form or through ids). Ensure compliance with the robot exclusion protocol.
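Checking the robot exclusion protocol can be done with Python's standard `urllib.robotparser` module. The sketch parses a robots.txt text that has already been downloaded, so it runs offline; in practice one would call `set_url()` and `read()` on the parser instead. The user-agent string and URLs are hypothetical. Passing this check is necessary but not sufficient: it says nothing about whether the platform's terms of service permit scraping or redistribution.

```python
from urllib import robotparser


def allowed_to_fetch(robots_txt, user_agent, url):
    """Check a URL against robots.txt rules (text already downloaded)."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = """User-agent: *
Disallow: /private/
"""
print(allowed_to_fetch(rules, "ResearchBot", "https://example.com/private/page"))  # False
print(allowed_to_fetch(rules, "ResearchBot", "https://example.com/public/page"))   # True
```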

#21. Anonymization and Ability to Delete One’s Information: Take actions to anonymize data when dealing with private data; e.g., scrub identifying information. Some techniques are better at anonymization than others. (See, for example, prior privacy-preserving work on word embeddings and sentiment data.) Provide mechanisms for people to remove their data from the dataset if they choose to.

Choose not to work with a dataset if adequate safeguards cannot be put in place.
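As a minimal sketch of the two mechanisms named in #21, the code below replaces obvious direct identifiers (URLs, e-mail addresses, @-mentions) with placeholder tokens, and drops every record of a user who asks to be removed. The patterns and the `user_id` field name are assumptions for illustration. Regular expressions catch only the most obvious identifiers: names, locations, rare life events, and writing style can still identify people, and a deletion request must also reach derived copies and previously released versions of the data.

```python
import re

# Order matters: URLs and e-mail addresses are replaced before @-mentions,
# so that the "@domain" part of an address is not mistaken for a username.
SCRUB_PATTERNS = [
    (re.compile(r"https?://\S+"), "<URL>"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "<EMAIL>"),
    (re.compile(r"@\w+"), "<USER>"),
]


def scrub(text):
    """Replace obvious direct identifiers with placeholder tokens."""
    for pattern, placeholder in SCRUB_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def remove_user(records, user_id):
    """Honour a deletion request by dropping every record from user_id."""
    return [r for r in records if r["user_id"] != user_id]


print(scrub("Email jane.doe@example.com or ping @jane_d, see https://example.com/x"))
# Email <EMAIL> or ping <USER>, see <URL>
```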

#22. Warnings and Recourse: Annotating highly emotional, offensive, or suicidal utterances can adversely impact the well-being of the annotators. Provide appropriate warnings. Minimize the amount of data exposure per annotator. Provide options for psychological help as needed.
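Minimizing exposure per annotator can be built into how distressing items are assigned. The sketch below gives each item to `per_item` annotators, always choosing the least-loaded ones, and never lets anyone exceed `cap` items; items that cannot be covered are returned separately as a signal to recruit more annotators rather than to raise the cap. The function name and the default cap are hypothetical. A cap complements, rather than replaces, warnings, the option to stop, and access to psychological help.

```python
def assign_with_exposure_cap(items, annotators, per_item=3, cap=20):
    """Give each item to per_item annotators, never exceeding cap items each.

    Least-loaded annotators are chosen first so exposure is spread evenly.
    Items that cannot be covered without breaking the cap are returned
    separately, signalling that more annotators are needed.
    """
    load = {a: 0 for a in annotators}
    assignment, unassigned = {}, []
    for item in items:
        available = sorted((a for a in annotators if load[a] < cap), key=load.get)
        chosen = available[:per_item]
        if len(chosen) < per_item:
            unassigned.append(item)
            continue
        for a in chosen:
            load[a] += 1
        assignment[item] = chosen
    return assignment, unassigned, load


assignment, unassigned, load = assign_with_exposure_cap(
    ["i1", "i2", "i3", "i4", "i5"], ["a", "b", "c"], per_item=2, cap=3
)
print(unassigned)  # ['i5']
print(load)        # {'a': 3, 'b': 3, 'c': 2}
```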

#23. Crowdsourcing: Crowdsourcing (splitting a task into multiple independent units and uploading them on the internet so that people can solve them online) has grown to be a major source of labeled data in NLP, Computer Vision, and a number of other academic disciplines. Compensation often gets most of the attention when talking about crowdsourcing ethics, but there are several other ethical considerations involved in such work, such as worker invisibility, lack of a learning trajectory, the humans-as-a-service paradigm, worker well-being, and worker rights. See prior work for a detailed discussion, and see the (public) guidelines by AI2 for its researchers.


(Eight considerations.)

F. Why This Method

What are the ethical implications of using a given method?

#24. Methods and their Tradeoffs: Different methods entail different trade-offs:

  • Less Accurate vs. More Accurate: This usually gets all the attention; value the other dimensions listed below as well. (See also the IMPACT section.)

  • White Box (can understand why the system makes a given prediction) vs. Black Box (do not know why it makes a given prediction): understanding the reasons behind a prediction helps identify bugs and biases and aids contestability; white-box methods are also, arguably, better suited for answering research questions about language use and emotions.

  • Less Energy Efficient vs. More Energy Efficient: See discussion further below on Green AI.

  • Less Data Hungry vs. More Data Hungry: data may not always be abundant; needing too much data about a person leads to privacy concerns.

  • Less Privacy Preserving vs. More Privacy Preserving: There is greater appreciation lately for the need for privacy-preserving NLP.

  • Fewer Inappropriate Biases vs. More Inappropriate Biases: We want our algorithms to not perpetuate/amplify inappropriate human biases.

Consider various dimensions of a method and their importance for the particular system deployment context before deciding on the method. Focusing on fewer dimensions may be okay in a research system, but widely deployed systems often require a good balance across the many dimensions.

#25. Who is Left Out: The dominant paradigm in Machine Learning and NLP is to use large models pre-trained on massive amounts of raw data (unannotated text, pictures, videos, etc.) and then fine-tune them on small amounts of labeled data (e.g., sentences labeled with emotions) to learn how to perform a particular task. As such, these methods tend to work well for people who are well-represented in the data (raw and annotated), but not so well for others. (See also #18 Historical Data.)

Even just documenting who is left out is a valuable contribution.

Explore alternative methods that are more inclusive, especially for those not usually included by other systems.

#26. Spurious Correlations: Machine learning methods have been shown to be susceptible to spurious correlations. For example, prior work has shown that when asked what the ground is covered with, visual QA systems tend to always answer snow, because in the training set this question was asked only when the ground was covered with snow. Other studies show spurious correlations in melanoma and skin lesion detection systems, and show that natural language inference systems can sometimes decide on the prediction just from information in the hypothesis, without regard for the premise (for example, because a hypothesis with negation is often labeled a contradiction in the training set).

Similarly, machine learning systems capture spurious correlations when doing AER. For example, marking some countries and people of some demographics with less charitable and stereotypical sentiments and emotions. This phenomenon is especially marked in abusive language detection work where it was shown that data collection methods in combination with the ML algorithm result in the system marking any comment with identity terms such as gay, muslim, and jew as offensive.

Consider how the data collection and machine learning set-ups can be adjusted to avoid such spurious correlations, especially correlations that perpetuate racism, sexism, and stereotypes. In extreme cases, spurious correlations lead to pseudoscience and physiognomy. For example, there has been a spate of papers attempting to determine criminality, personality, trustworthiness, and emotions just from one’s face or outer appearance. Note that sometimes, systematic idiosyncrasies of the data can lead to apparently good results on a held-out test set even on such tasks. Thus it is important to consider whether the method and sources of information used can be expected to capture the phenomenon of interest. Is there a risk that the use of this method may perpetuate false beliefs and stereotypes? If yes, take appropriate corrective action.

#27. Context is Everything: Considering a greater amount of context is often crucial in correctly determining emotions/sentiment. What was said/written before and after the target utterance? Where was this said? What was the intonation and what was emphasized? Who said this? And so on. More context can be a double-edged sword though. The more the system wants to know about a person to make better predictions, the more we worry about privacy. Work on determining the right balance between collecting more user information and privacy considerations, as appropriate for the context in which the system is deployed.

#28. Individual Emotion Dynamics: One form of contextual information is a person’s utterance emotion dynamics. The idea is that different people might have different steady states in terms of where they most commonly tend to be (considering any affect dimension of choice). Some may move out of this steady state often, while others venture out less often. Some recover quickly from such deviations, while for others it may take a long time. Similar emotion dynamics occur in the text that people write and the words they utter: Utterance Emotion Dynamics. Utterance emotion dynamics may or may not correlate closely with true emotion dynamics, but one can argue that examining utterance emotion dynamics is valuable on its own. Access to utterance emotion dynamics provides greater context and helps judge the degree of emotionality of new utterances by the person. Systems that make use of such detailed contextual information are more likely to make appropriate predictions for diverse groups of people. However, the degree of personal information they require warrants care, concern, and meaningful consent from the users.
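The steady state, deviations, and recovery described above can be sketched for one speaker as follows. This is a simplified illustration, not the published Utterance Emotion Dynamics metrics: it assumes per-utterance valence scores are already available (e.g., from a lexicon), smooths them with a rolling mean, and defines the "home base" as the mean plus or minus one population standard deviation of the smoothed scores. Those definitions and the function name are my assumptions. A displacement is a move outside the home base; recovery length is the number of steps until the state returns.

```python
from statistics import mean, pstdev


def emotion_dynamics(valences, window=3):
    """Simplified utterance-emotion-dynamics summary for one speaker."""
    if window < 1 or len(valences) < window:
        raise ValueError("need window >= 1 and at least `window` utterances")
    # Smooth per-utterance scores into an emotion state with a rolling mean.
    states = [mean(valences[i - window + 1 : i + 1]) for i in range(window - 1, len(valences))]
    center, spread = mean(states), pstdev(states)
    low, high = center - spread, center + spread  # the "home base" band
    displacements, recovery_lengths, outside_since = 0, [], None
    for t, s in enumerate(states):
        if not (low <= s <= high):
            if outside_since is None:  # just left the home base
                displacements += 1
                outside_since = t
        elif outside_since is not None:  # just returned to the home base
            recovery_lengths.append(t - outside_since)
            outside_since = None
    return {
        "home_base": (low, high),
        "variability": spread,
        "displacements": displacements,
        "mean_recovery": mean(recovery_lengths) if recovery_lengths else None,
    }


summary = emotion_dynamics([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0], window=1)
print(summary["displacements"], summary["mean_recovery"])  # 1 2
```

Because the home base is estimated from the speaker's own history, the same utterance score can count as unusual for one person and ordinary for another, which is the point of using such context.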

#29. Historical behavior is not always indicative of future behavior (for groups and individuals): Systems are often trained on static data from the past. However, perceptions, emotions, and behavior change with time. Thus automatic systems may make inappropriate predictions on current data. (See also #18 Historical Data.)

#30. Emotion Management, Manipulation: Managing emotions is a central part of any human–computer interaction system (even if this is often not an explicitly stated goal). Just as in human–human interactions, we do not want the systems we build to cause undue stress, pain, or unpleasantness. For example, a chatbot has to be careful not to offend or hurt the feelings of the user with whom it is interacting. For this, it needs to assess the emotions conveyed by the user, in order to then articulate the appropriate information with appropriate affect. However, this same technology can enable companies and governments to detect people’s emotions in order to manipulate their behavior. For example, it is known that we purchase more products when we are sad. So sensing when someone is most susceptible to suggestion, in order to plant ideas about what to buy, whom to vote for, or whom to dislike, can have dangerous implications. On the other hand, identifying how to cater to individual needs to improve compliance with public health measures in a worldwide pandemic, or to help people give up smoking, may be seen in a more positive light. As with many things discussed in this article, consider the context to determine what levels of emotional management and meaningful consent are appropriate.

#31. Green AI: A direct consequence of using ever-larger pre-trained models (with large numbers of training examples and hyperparameters) for AI tasks is that these systems are now drivers of substantial energy consumption. Recent papers have documented the increasing carbon footprint of AI systems and proposed approaches to address it. Thus, there is a growing push to develop AI methods that are not singularly focused on accuracy numbers on test sets, but are also mindful of efficiency and energy consumption. Proponents of this view encourage reporting the cost per example, the size of the training set, the number of hyperparameters, and budget-accuracy curves. They also argue for regarding efficiency as a valued scientific contribution.
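The following Python sketch shows one way such reporting could be computed. It estimates a budget-accuracy curve as the expected best validation accuracy after n hyperparameter trials, assuming trials behave like draws (with replacement) from the empirical distribution of accuracies already observed. This is one possible construction, similar in spirit to expected-validation-performance curves in the literature, not a method prescribed by the sources above; the function names and example numbers are hypothetical.

```python
def expected_max_accuracy(scores, n):
    """Expected best accuracy after n trials, where each trial is assumed to be
    a draw (with replacement) from the observed trial accuracies `scores`."""
    v = sorted(scores)
    N = len(v)
    # P(best of n draws is at or below the i-th smallest position) = (i / N) ** n,
    # so position i holds the maximum with probability
    # (i / N) ** n - ((i - 1) / N) ** n.
    return sum(
        v[i - 1] * ((i / N) ** n - ((i - 1) / N) ** n)
        for i in range(1, N + 1)
    )


def budget_accuracy_curve(scores, max_trials):
    """(number of trials, expected best accuracy) pairs for 1..max_trials."""
    return [(n, expected_max_accuracy(scores, n)) for n in range(1, max_trials + 1)]


def cost_per_example(total_cost, num_examples):
    """Total cost (e.g., kWh or GPU-hours, as reported) divided by examples processed."""
    return total_cost / num_examples
```

With observed trial accuracies of 0.6, 0.7, and 0.8, the curve starts at the mean (0.7) for a single trial and rises toward 0.8 as the number of trials grows, which makes explicit how much accuracy a larger tuning budget buys.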


Impact and Evaluation

(Ten considerations.)

G. Metrics

All evaluation metrics are misleading. Some metrics are more useful than others.

#32. Reliability/Accuracy: No emotion recognition method is perfect. However, some approaches are much less accurate than others. Some techniques are so unreliable that they are essentially pseudoscience. For example, trying to predict personality, mood, or emotions through physical appearances has long been criticized. The ethics of a number of existing commercial systems that purportedly detect emotions from facial expressions has been called into question by work showing the low reliability of recognizing emotions from facial expressions.

#33. Demographic Biases: Some systems can be unreliable or systematically inaccurate for certain groups of people: for example, people of particular races or genders, people with health conditions, people on the autism spectrum, people from different countries, etc. Such systematic errors can occur when working on:

  • Utterances of a group or faces of a group: For example, low accuracy in recognizing emotions in text produced by African Americans or in recognizing faces of African Americans.
  • Utterances mentioning a group: For example, systematically marking texts mentioning African Americans as more angry, or texts mentioning women as more emotional.

Determine and present disaggregated accuracies. Take steps to address disparities in performance across groups. (See also #47 Disaggregation.)
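Computing disaggregated accuracy is straightforward once each test instance is associated with a group. The Python sketch below assumes records of (group, gold label, predicted label); the group names and labels are hypothetical placeholders, and how group information is obtained raises its own concerns (see #50 Attributing People to Social Groups).

```python
from collections import defaultdict


def disaggregated_accuracy(records):
    """Per-group accuracy and the largest accuracy gap between groups.

    `records` is an iterable of (group, gold_label, predicted_label) triples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, gold, predicted in records:
        total[group] += 1
        correct[group] += int(gold == predicted)
    per_group = {group: correct[group] / total[group] for group in total}
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap


# Hypothetical predictions: overall accuracy is 5/8, which hides the disparity.
records = [
    ("group_a", "joy", "joy"), ("group_a", "anger", "anger"),
    ("group_a", "fear", "joy"), ("group_a", "joy", "joy"),
    ("group_b", "joy", "joy"), ("group_b", "anger", "joy"),
    ("group_b", "fear", "joy"), ("group_b", "sadness", "sadness"),
]
per_group, gap = disaggregated_accuracy(records)
print(per_group, gap)  # {'group_a': 0.75, 'group_b': 0.5} 0.25
```

A single overall accuracy of 0.625 would not reveal that, in this toy example, the system is wrong twice as often for `group_b`.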

#34. Sensitive Applications: Some applications are considerably more sensitive than others and thus require emotion recognition systems of much higher quality (if such systems are used at all). Automatic systems may sometimes be used in high-stakes applications if their role is to assist human experts: for example, helping patients and health experts track the patient's emotional state.

#35. Testing (on Diverse Datasets, on Diverse Metrics): Results on any test set are contingent on the attributes of that test set and may not be indicative of real-world performance; nor do they necessarily reveal implicit biases or systematic errors of many kinds. Good practice is to test the system on many different datasets that explore various input characteristics. For example, see these evaluations that cater to a diverse set of emotion-related tasks, datasets, linguistic phenomena, and languages: SemEval 2014 Task 9, SemEval 2015 Task 10, and SemEval 2018 Task 1. (The last of these also includes an evaluation component for demographic bias in sentiment analysis systems.) See also work on creating separate diagnostic datasets for various types of hate speech, and Google's recommendations on best practices for metrics and testing.
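The demographic-bias evaluation mentioned above relies on the idea of comparing a system's outputs on inputs that differ only in who is mentioned. A minimal Python sketch of such a template-based check follows; the templates, names, and the deliberately biased `toy_anger_score` function are hypothetical stand-ins for a real test suite and a real system under test.

```python
from statistics import mean

# Hypothetical templates; a real suite would be much larger and carefully curated.
TEMPLATES = [
    "{name} feels angry.",
    "{name} made me furious.",
    "I talked to {name} yesterday.",
]
ANGER_WORDS = {"angry", "furious"}  # toy lexicon, for illustration only


def toy_anger_score(sentence):
    """A deliberately biased stand-in for a real system: it scores anger words
    but also (spuriously) adds 0.1 whenever the name 'Alex' appears."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    score = 0.5 * sum(w in ANGER_WORDS for w in words)
    return score + (0.1 if "alex" in words else 0.0)


def mention_gap(score, templates, names_a, names_b):
    """Mean score on sentences mentioning names_a minus mean on names_b.
    Values far from zero suggest the system treats the two sets differently."""
    def average(names):
        return mean(score(t.format(name=n)) for t in templates for n in names)
    return average(names_a) - average(names_b)


gap = mention_gap(toy_anger_score, TEMPLATES, ["Alex", "Sam"], ["Robin", "Kim"])
print(round(gap, 3))  # 0.05
```

Because the sentences are otherwise identical, any nonzero gap can only come from how the system treats the mentioned names, which is what makes this kind of test useful for the "utterances mentioning a group" case in #33.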

H. Beyond Metrics

Are we even measuring the right things?

#36. Interpretability, Explainability: As ML systems are deployed more widely and impact a greater sphere of our lives, there is a growing understanding that these systems can be flawed to varying degrees. One line of approach in understanding and addressing these flaws is to develop interpretable or explainable models. Interpretability and explainability each have been defined in a few different ways in the literature, but at the heart of the definitions is the idea that we should be able to understand why a system is making a certain prediction: what pieces of evidence are contributing to the decision and to what degree? That way, humans can better judge how valid a particular prediction is, better judge how accurate the model is for certain kinds of input, and even how accurate the system is in general and over time.

In line with this, AER systems should have components that depict why they are making certain predictions for various inputs. As described in a survey of this area, such components can be viewed from several perspectives, including:

  • are the explanations meant for the scientist/engineer or for a lay person?

  • are the explanations faithful (accurate reflections of system behavior)?

  • are the explanations easily comprehensible?

  • to what extent do people trust the explanations?

Responsible research and product development entails actively considering various explainability strategies at the very outset of the project. This includes, where appropriate, specifically choosing an ML model that lends itself to better interpretability, running ablation and disaggregation experiments, running data perturbation and adversarial testing experiments, and so on.

#37. Visualization: Visualizations help convey trends in emotions and sentiments, and are common in the emotion analysis of streams of data such as tweet streams, novels, newspaper headlines, etc. There are several considerations when developing visualizations that impact the extent to which they are effective, convey key trends, and the extent to which they may be misleading:

  • It is almost always important to not only show the broad trends but also to allow the user to drill down to the source data that is driving the trend.

  • Summarize the data driving the trend, for example through treemaps of the most frequent emotion words and phrases in the data.

  • Interactive visualizations allow users to explore different trends in the data and even drill down to the source data that is driving the trends.

See work on visualizing emotions and sentiment.

#38. Safeguards and guard rails: Devote time and resources to identify how the system can be misused and how the system may cause harm because of its inherent biases and limitations. Identify steps that can be taken to mitigate these harms.

#39. Recognize that there will be harms even when the system works “correctly”: Provide a mechanism for users to report issues. Have resources in place to deal with unanticipated harms. Document societal impacts, including both benefits and harms.

#40. Contestability and Recourse: It has been argued that contestability (the mechanisms made available to challenge the predictions of an AI system) is more important and beneficial than transparency/explainability. Not only does it allow people to challenge the decisions made by a system, it also invites participation in understanding how machine learning systems work and what their limitations are. See Google's What-If Tool as an example of how people are invited to explore ML systems by changing inputs (without needing to do any coding). AER systems are encouraged to provide similar tools, for example:

  • tools that allow one to see counterfactuals (given a data point, what is the closest other data point for which the system predicts a different label), and tools that allow one to try out various input conditions/features to see what helps obtain the desired classification label.

  • tools that allow one to see classification accuracies on different demographics and the impact of different classifier parameters and thresholds on these scores.

  • tools that allow one to see confidence of the classifier for a given prediction and the features that were primarily responsible for the decision.

See work on participatory dataset creation and management for further ideas.
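The first and third kinds of tools listed above can be sketched for a simple linear classifier. The Python sketch below (the two numeric features, the weights, and the function names are hypothetical) finds the nearest candidate input that receives a different label, and reports the probability of the positive label together with each feature's contribution to the decision. Euclidean distance is an assumption here; which distance makes a counterfactual meaningful depends on the features.

```python
import math

# Hypothetical linear classifier over two numeric features.
WEIGHTS = (1.0, 1.0)
BIAS = -1.0


def linear_score(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def predict(x):
    return "positive" if linear_score(x) > 0 else "negative"


def nearest_counterfactual(x, candidates, predict_fn):
    """Closest candidate (Euclidean distance) whose predicted label differs
    from the label predicted for x; None if no such candidate exists."""
    base_label = predict_fn(x)
    best, best_distance = None, math.inf
    for candidate in candidates:
        if predict_fn(candidate) != base_label:
            distance = math.dist(x, candidate)
            if distance < best_distance:
                best, best_distance = candidate, distance
    return best


def explain(x):
    """Probability of the 'positive' label (logistic of the linear score)
    and each feature's contribution to the score."""
    contributions = [w * xi for w, xi in zip(WEIGHTS, x)]
    probability_positive = 1 / (1 + math.exp(-linear_score(x)))
    return probability_positive, contributions


x = (0.2, 0.3)
candidates = [(0.9, 0.9), (0.6, 0.5), (0.1, 0.1)]
print(predict(x), nearest_counterfactual(x, candidates, predict))
# negative (0.6, 0.5)
```

Restricting counterfactuals to real candidate data points (rather than synthesizing arbitrary inputs) keeps the answer grounded in examples a user can actually inspect; `math.dist` requires Python 3.8 or later.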

#41. Be wary of Ethics Washing: As we push further into incorporating ethical practices in our projects, we need to be wary of inauthentic and cursory attention to ethics for the sake of appearances. This VentureBeat article presents some useful tips to avoid ethics washing, including: “Welcome ‘constructive dissent’ and uncomfortable conversations”, “Don’t ask for permission to get started”, “Share your shortcomings”, “Be prepared for gray area decision-making”, and “Ethics has few clear metrics”.


Implications for Privacy and Social Groups

(Nine considerations.)

I. Implications for Privacy

(Cuts across Task Design, Data, Method, Impact and Evaluation)

#42. Privacy and Personal Control: As noted privacy expert Dr. Ann Cavoukian puts it, privacy is not about hiding information or secrecy. It is about choice: “You have to be the one to make the decision.” Individuals may not want their emotions to be inferred. Applying emotion detection systems en masse (gathering emotion information continuously, without meaningful consent) is an invasion of privacy, harmful to the individual, and dangerous to society. (See the report created for members of the European Parliament.) Follow the seven principles of privacy by design: Proactive not Reactive (preventative not remedial), Privacy as the Default, Privacy Embedded into Design, Full Functionality (positive-sum, not zero-sum), End-to-End Security (full lifecycle), Visibility and Transparency, and Respect for User Privacy (keep it user-centric). See also work on privacy-preserving sentiment analysis.

#43. Group Privacy and Soft Biometrics: It has been argued that many of our conversations around privacy are far too focused on individual privacy and ignore group privacy: the rights and protections we need as a group.

There are very few Moby-Dicks. Most of us are sardines. The individual sardine may believe that the encircling net is trying to catch it. It is not. It is trying to catch the whole shoal. It is therefore the shoal that needs to be protected, if the sardine is to be saved.

The idea of group privacy becomes especially important in the context of soft biometrics, such as traits and preferences determined through AER, which are not intended to identify individuals but rather groups of people with similar characteristics. See related work for further discussion of the implications of AER for group privacy and of how companies are using AER to determine group preferences, even though a large number of people disfavour such profiling.

#44. Mass Surveillance versus Right to Privacy, Right to Freedom of Expression, and Right to Protest: Emotion recognition, sentiment analysis, and stance detection can be used for mass surveillance by companies and governments (often without meaningful consent). People are often unaware that their information (e.g., what they say or click on in an online platform) can be used against their best interests. Often people do not have meaningful choices regarding privacy when they use online platforms. In extreme cases, as with authoritarian governments, this can lead to a dramatic curtailment of freedom of expression and of the right to protest.

#45. Right Against Self-Incrimination: In a number of countries around the world, the accused are given legal rights against self-incrimination. However, automatic methods of emotion, stance, and deception detection can potentially be used to circumvent such protections. (See page 37 of the cited work.)

#46. Right to Non-Discrimination: Automatic methods of emotion, stance, and deception detection can sometimes systematically discriminate based on protected categories such as race, gender, and religion. Even if ML systems are not fed race or gender information directly, studies have shown that they often pick up on proxy attributes for these categories. Report disaggregated results as appropriate.

J. Implications for Social Groups

(Cuts across Task Design, Data, Method, Impact and Evaluation)

#47. Disaggregation: Society has often viewed different groups differently (because of their race, gender, income, language, etc.), imposing unequal social and power structures. Even when the biases are not conscious, the unique needs of different groups are often overlooked. For example, prior work discusses, through numerous examples, how there is a considerable lack of disaggregated data for women and how that directly leads to negative outcomes in all spheres of their lives, including health, income, safety, and the degree to which they succeed in their endeavors. This holds true (perhaps even more so) for transgender people. Thus emotion researchers should consider the value of disaggregation at various levels, including:

  • When creating datasets: Obtain annotations from a diverse group of people. Report aggregate-level demographic information. Rather than only labeling instances with the majority vote, consider the value of providing multiple sets of labels as per each of the relevant and key demographic groups.

  • When testing hypotheses or drawing inferences about language use: Consider also testing the hypotheses disaggregated for each of the relevant and key demographic groups.

  • When building automatic prediction systems: Report performance disaggregated for each of the relevant and key demographic groups. (See work on model cards. See how sentiment analysis systems can be systematically biased.)
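Keeping annotations disaggregated by annotator group, rather than collapsing them into one majority vote, can be sketched in Python as follows. The input format (item id, annotator group, label), the group names, and the choice to return all tied labels rather than breaking ties arbitrarily are assumptions for illustration.

```python
from collections import Counter, defaultdict


def top_labels(labels):
    """All labels with the highest count (ties are kept, not broken arbitrarily)."""
    counts = Counter(labels)
    best = max(counts.values())
    return sorted(label for label, count in counts.items() if count == best)


def labels_by_group(annotations):
    """annotations: iterable of (item_id, annotator_group, label) triples.
    Returns, per item, the overall top label(s) and the top label(s) per group."""
    per_item = defaultdict(list)
    per_item_group = defaultdict(lambda: defaultdict(list))
    for item, group, label in annotations:
        per_item[item].append(label)
        per_item_group[item][group].append(label)
    return {
        item: {
            "overall": top_labels(per_item[item]),
            "by_group": {g: top_labels(ls) for g, ls in groups.items()},
        }
        for item, groups in per_item_group.items()
    }


annotations = [
    ("t1", "group_x", "joy"), ("t1", "group_x", "joy"), ("t1", "group_x", "joy"),
    ("t1", "group_y", "anger"), ("t1", "group_y", "anger"),
]
print(labels_by_group(annotations))
# {'t1': {'overall': ['joy'], 'by_group': {'group_x': ['joy'], 'group_y': ['anger']}}}
```

Here a single majority label (joy) would hide the fact that one group of annotators consistently perceived anger in the same text.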

#48. Intersectional Invisibility in Research: Intersectionality refers to the complex ways in which different group identities such as race, class, neurodiversity, and gender overlap to amplify discrimination or disadvantage. Researchers have argued that people with multiple group identities are often not seen as prototypical members of any of their groups and thus are subject to what they call intersectional invisibility: omissions of their experiences in historical narratives and cultural representation, lack of support from advocacy groups, and mismatch with existing anti-discrimination frameworks. Many of the forces that lead to such invisibility (e.g., not being seen as prototypical members of a group), along with other notions common in the quantitative research paradigm (e.g., the predilection to work on neat, non-overlapping, populous categories), lead to intersectional invisibility in research. As ML/NLP researchers, we should be cognizant of such blind spots and work to address these gaps. Further, new ways of doing research that address the unique challenges of intersectional research need to be valued and encouraged.

#49. Reification and Essentialization: Some demographic variables are entirely, or in large part, social constructs. Thus, work on disaggregation can sometimes reinforce false beliefs that there are innate differences across groups or that some features are central to belonging to a social category. It is therefore imperative to contextualize work on disaggregation: for example, by impressing on the reader that even though race is a social construct, people's perceptions of and behavior around race lead to very real consequences.

#50. Attributing People to Social Groups: In order to obtain disaggregated results, one sometimes needs access to demographic information. This of course raises considerations such as whether people are providing meaningful consent to the collection of such data, and whether the data are being collected in a manner that respects their privacy, their autonomy (e.g., can they choose to delete their information later?), and their dignity (e.g., by allowing self-descriptions). Challenges persist in how to design effective and inclusive questionnaires. Further, even with self-report textboxes that give respondents the primacy and autonomy to express their race, gender, etc., downstream research often ignores such data or combines information in ways beyond the control of the respondent. Some work tries to infer aggregate-level group statistics automatically: for example, inferring race, gender, etc. from cues such as the type of language used or historical name-gender associations in order to do disaggregated analysis. However, such approaches are fraught with ethical concerns such as misgendering, essentialization, and reification. Further, people have historically been marginalized because of their social category, so methods that try to detect these categories raise legitimate and serious concerns of abuse, erasure, and the perpetuation of stereotypes.

In many cases, it may be more appropriate to perform disaggregated analysis on something other than a social category. For example, when testing face recognition systems, it might be more appropriate to test system performance on different skin tones (as opposed to race). Similarly, when working on language data, it might be more appropriate to analyze data partitioned by linguistic gender (as opposed to social gender). See prior work for a useful discussion of linguistic vs. social gender, and for a good example of how to create more inclusive data for research.

In Summary

This paper aggregates and organizes various ethical considerations relevant to automatic emotion recognition, drawn from the wider AI Ethics and Affective Computing literature. It includes brief sections on the modalities of information, task, and applications of AER to set the context. It then presents fifty ethical considerations grouped thematically. Notably, the sheet fleshes out assumptions hidden in how AER is commonly framed, and in the choices often made regarding the data, method, and evaluation. Special attention is paid to the implications of AER for privacy and social groups. It discusses how these considerations manifest within AER and outlines best practices for responsible research. A succinct list of the key recommendations for responsible AER discussed in the paper is provided in the Appendix.

The objective of the sheet is to encourage practitioners to think in more detail and at the very outset: why to automate, how to automate, and how to judge success based on broad societal implications. I hope that it will help engage the various stakeholders of AER with each other; help stakeholders challenge assumptions made by researchers and developers; and help develop appropriate harm mitigation strategies. Additionally, for those who are new to emotion recognition, the ethics sheet acts as a useful introductory document (complementing survey articles).

For experts on a technology, an often overlooked and undervalued responsibility is to convey its broad societal impacts to those who deploy the technology, those who make policy decisions about the technology, and society at large. I hope that this sheet helps to that end for emotion recognition, and also spurs the wider community to ask and document: What ethical considerations apply to my task?

I am grateful to Annika Schoene, Mallory Feldman, and Tara Small for their belief and encouragement in the early days of this project. Many thanks to Mallory Feldman (Carolina Affective Neuroscience Lab, UNC) for discussions on the psychology and complexity of emotions. Many thanks to Annika Schoene, Mallory Feldman, Roman Klinger, Rada Mihalcea, Peter Turney, Barbara Plank, Malvina Nissim, Viviana Patti, Maria Liakata, and Emily Mower Provost for discussions about ethical considerations for emotion recognition and thoughtful comments. Many thanks to Tara Small, Emily Bender, Esma Balkir, Isar Nejadgholi, Patricia Thaine, Brendan O’Connor, Cyril Goutte, Eric Joanis, Joel Martin, Roland Kuhn, and Sowmya Vajjala for thoughtful comments on the blog post on this work.

APPENDIX: Recommendations for Responsible AER

Below is a list of key recommendations for responsible AER discussed earlier in the context of various ethical considerations. They are compiled here for easy access. Note that adhering to these recommendations does not guarantee “ethicalness”; nor do these recommendations apply to all contexts. They are guidelines meant to help responsible development and use of AER systems. Particular development or deployment contexts entail further considerations and steps to address them.

Task Design

  • Center the people, especially marginalized and disadvantaged communities, such that they are not mere passive subjects but rather have the agency to shape the design process.

  • Ask who will benefit from this work and who will not. Will this work shift power from those who already have a lot of power to those who have less power? How can the task be designed so that it helps those who are most in need?

  • Ask how the AER design will impact people in the context of neurodiversity, alexithymia, and the autism spectrum.

  • Carefully consider what emotion task should be the focus of the work (whether conducting a human-annotation study or building an automatic prediction model). Different emotion tasks entail different ethical considerations. Communicate the nuance of exactly what emotions are being captured to the stakeholders. Not doing so can lead to the misuse and misinterpretation of one's work.

  • AER systems should not claim to determine one’s emotional state from their utterance, facial expression, gait, etc. At best, AER systems capture what one is trying to convey or what is perceived by the listener/viewer, and even there, given the complexity of human expression, they are often inaccurate.

  • Even when AER systems attempt to determine the emotional state of a person (or a group) over time (drawing inferences at aggregate level from large amounts of data), such as studies on public health listed in 3.3, it is best to be cautious when making claims. Use AER as one source of evidence amongst many (and involve relevant expertise; e.g., from public health and psychology).

  • Lay out the theoretical foundations for the task from relevant research fields such as psychology, linguistics, and sociology, and relate the opinions of relevant domain experts to the task formulation. Realize that it is impossible to capture the full emotional experience of a person.

  • Do not refer to some emotions as basic emotions, unless you mean to convey your belief in the Basic Emotions Theory. Careless endorsement of theories can lead to the perpetuation of belief in ideas that are actively harmful (such as suggesting we can determine internal state from outward appearance — physiognomy).

  • Realize that various ethical concerns, including privacy, manipulation, bias, and free speech, are further exacerbated when systems act on individuals. Take steps such as anonymization and releasing information only at aggregate levels.

  • Think about how the AER system can be misused, and how that can be minimized.

  • Use AER as one source of information among many.

  • Do not use AER for fully automated decision making. AER may be used to assist humans in making decisions, coming up with ideas, suggesting where to delve deeper, and sparking their imagination. Consider also the risk of the system inappropriately biasing the human decision makers.

  • Disclose to all stakeholders the decisions that are being made (in part or wholly) by automation. Provide mechanisms for the user to understand why relevant predictions were made, and also to contest the decisions.


Data

  • Examine the choice of data used by AER systems across various dimensions: size of data; whether it is custom data or data obtained from an online platform; less private/sensitive data or more private/sensitive data; what languages are represented; degree of documentation; and so on.

  • Expressions of emotions through language are highly variable: Different people express the same emotion differently; the same text may convey different emotions to different people. This variability should also be taken into consideration when building datasets, systems, and choosing where to deploy the systems.

  • Variability is common not just for emotions but also for natural language. People convey meaning in many different ways. There is usually no one “correct” way of articulating our thoughts.

  • Aim to obtain a useful level of emotion recognition capability without systematic gaps that convey a strong sense of emotion-expression normativeness.

  • When using language models or emotion datasets, avoid perpetuating stereotypes of how one group of people perceives another group.

  • Obtain data from a diverse set of sources. Report details of the sources.

  • When creating emotion datasets, limit the number of instances included per person. Mohammad and Kiritchenko (2018) kept one tweet for every combination of query term and tweeter when studying relationships between affect categories (data also used in a shared task on emotions). Kiritchenko et al. (2020) kept at most three tweets per tweeter when studying expressions of loneliness.

  • Obtain annotations from a diverse set of people. Report aggregate-level demographic information of the annotators.

  • In emotion and language data, often there are no “correct” answers. Instead, it is a case of some answers being more appropriate than others. And there can be multiple appropriate answers.

  • Part of conveying that there is no one “correct” answer is to convey how the dataset is situated in many parameters, including: who annotated it, the precise annotation instructions, what data was presented to the annotators (and in what form), and when the data was annotated.

  • Release raw data annotations as well as any aggregations of annotations.

  • If using majority voting, acknowledge its limitations.

  • Explore statistical approaches to finding multiple appropriate labels.

  • Employ manual and automatic checks to determine whether the human annotations have also captured inappropriate biases. Such biases may be useful for some projects (e.g., work studying such biases), but not for others. Warn users appropriately and deploy measures to mitigate their impact.

  • When using any dataset, devote time and resources to study who is included in the dataset and whose voices are missing. Take corrective action as appropriate.

  • Keep a portion of your funding for work with marginalized communities and for work on less-researched languages.

  • Systems that are to be deployed to handle open-domain data should be trained on many diverse datasets and tested on many datasets that are quite different from the training datasets.

  • Ensure that the terms of service of the source platforms are not violated: check, for example, that data scraping is allowed and that data redistribution is allowed (in raw form or through IDs). Ensure compliance with the robot exclusion protocol. When dealing with sensitive or private data, take actions to anonymize it; e.g., scrub identifying information. Choose not to work with a dataset if adequate safeguards cannot be put in place.

  • Proposals of data annotation efforts that may impact the well-being of annotators should first be submitted for approval to one's Research Ethics Board (REB) / Institutional Review Board (IRB). The board will evaluate the proposal and provide suggestions so that the work complies with the required ethics standards.

  • An excellent jumping-off point for further information on the ethical conduct of research involving human subjects is The Belmont Report. The guiding principles it proposes are Respect for Persons, Beneficence, and Justice.
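Limiting the number of instances per person, as recommended in the data list above, can be sketched with a simple cap keyed on any attribute. This Python sketch is not the code used in the cited studies: the dictionary fields (`user`, `query`), the example tweets, and keeping the first instances encountered (rather than, say, a random sample) are all assumptions.

```python
from collections import defaultdict


def cap_per_key(items, key, max_per_key):
    """Keep at most `max_per_key` items for each value of key(item),
    preserving the original order (the first items encountered are kept)."""
    counts = defaultdict(int)
    kept = []
    for item in items:
        k = key(item)
        if counts[k] < max_per_key:
            kept.append(item)
            counts[k] += 1
    return kept


tweets = [  # hypothetical data
    {"user": "u1", "query": "lonely", "text": "a"},
    {"user": "u1", "query": "lonely", "text": "b"},
    {"user": "u1", "query": "alone", "text": "c"},
    {"user": "u1", "query": "alone", "text": "d"},
    {"user": "u2", "query": "lonely", "text": "e"},
]

# At most three tweets per tweeter.
per_user = cap_per_key(tweets, key=lambda t: t["user"], max_per_key=3)
# At most one tweet per (query term, tweeter) combination.
per_query_user = cap_per_key(tweets, key=lambda t: (t["query"], t["user"]), max_per_key=1)
print([t["text"] for t in per_user], [t["text"] for t in per_query_user])
# ['a', 'b', 'c', 'e'] ['a', 'c', 'e']
```

Making the key a parameter lets the same helper express both kinds of limits described above without changing the filtering logic.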


Method

  • Examine the choice of method across various dimensions such as interpretability, privacy concerns, energy efficiency, and data needs. Focusing on fewer dimensions may be okay in a research system, but widely deployed systems often require a good balance across the many dimensions. AI methods tend to work well for people who are well represented in the data (raw and annotated), but not so well for others. Documenting who is left out is valuable. Explore alternative methods that are more inclusive. Consider how the data collection and machine learning setups can be adjusted to avoid spurious correlations, especially correlations that perpetuate racism, sexism, and stereotypes.

  • Systems are often trained on static data from the past. However, perceptions, emotions, and behavior change with time. Consider how automatic systems may make inappropriate predictions on current data.

  • Consider the system deployment context to determine what levels of emotional management and meaningful consent are appropriate.

  • Consider the carbon footprint of your method and value efficiency as a contribution. Report costs per example, size of training set, number of hyperparameters, and budget-accuracy curves.

Impact and Evaluation

  • Consider whether the chosen metrics are measuring what matters.

  • Some methods can be unreliable or systematically inaccurate for certain groups of people: for example, people of particular races or genders, people with health conditions, people from different countries, etc. Determine and present disaggregated accuracies. Test the system on many different datasets that explore various input characteristics.

  • Responsible research and product development entails actively considering various explainability strategies at the very outset of the project. This includes, where appropriate, specifically choosing an ML model that lends itself to better interpretability, running ablation and disaggregation experiments, running data perturbation and adversarial testing experiments, and so on.

  • When visualizing emotions, it is almost always important to not only show the broad trends but also to allow the user to drill down to the source data that is driving the trend. One can also summarize the data driving the trend, for example through treemaps of the most frequent emotion words.

  • Devote time and resources to identify how the system can be misused and how the system may cause harm because of its inherent biases and limitations. Recognize that there will be harms even when the system works “correctly”. Identify steps that can be taken to mitigate these harms.

  • Provide mechanisms for contestability that not only allow people to challenge the decisions made by a system about them, but also invite participation in understanding how machine learning systems work and what their limitations are.

Implications for Privacy

  • Privacy is not about secrecy. It is about personal choice. Follow Dr. Cavoukian’s seven principles of privacy by design.

  • Consider that people might not want their emotions to be inferred. Applying emotion detection systems en masse — gathering emotion information continuously, without meaningful consent, is an invasion of privacy, harmful to the individual, and dangerous to society.

  • Soft-biometrics also have privacy concerns. Consider implications of AER on group privacy and that a large number of people disfavour such profiling.

  • Obtain meaningful consent as appropriate for the context. Working with more sensitive and more private data requires a more involved consent process where the user understands the privacy concerns and willingly provides consent. Consider harm mitigation strategies such as: anonymization techniques and differential privacy. Beware that these can vary in effectiveness.

  • Plan for how to keep people’s information secure.

  • Obtain permission for secondary use or if you intend to distribute the data.

  • When working out the privacy–benefit tradeoffs, consider who will really benefit from the technology. Especially consider whether those who benefit are people with power or those with less power. Also, as Dr. Cavoukian says, often privacy and benefits can both be had, “it is not a zero-sum game”.

  • Consider implications of AER for mass surveillance and how that undermines right to privacy, right to freedom of expression, right to protest, right against self-incrimination, and right to non-discrimination.
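As one illustration of the harm mitigation strategies mentioned above, the following Python sketch releases aggregate emotion counts with Laplace noise, the textbook mechanism for differential privacy on counts. It assumes that adding or removing one person changes at most one count by at most one (so the sensitivity is 1); the category names and the epsilon value are hypothetical. Hand-rolled noise like this is for exposition only: floating-point implementations of differential privacy have known pitfalls, and Python's `random` module is not cryptographically secure, so deployed systems should rely on a vetted library.

```python
import random


def laplace_noise(scale, rng):
    """One draw from Laplace(0, scale): the difference of two independent
    exponential draws with mean `scale` is Laplace-distributed."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_counts(counts, epsilon, rng=None):
    """Add Laplace noise with scale 1/epsilon to each count.

    Assumes each person contributes to at most one category, at most once,
    so adding or removing a person changes one count by at most 1.
    Smaller epsilon means more noise and stronger protection.
    """
    if rng is None:
        rng = random.Random()
    scale = 1.0 / epsilon
    return {category: count + laplace_noise(scale, rng) for category, count in counts.items()}


noisy = private_counts({"joy": 120, "anger": 45, "sadness": 30}, epsilon=0.5, rng=random.Random(7))
print({k: round(v, 1) for k, v in noisy.items()})
```

The privacy guarantee rests entirely on the sensitivity assumption in the docstring; if one person can contribute many posts, the counts must be capped per person first or the noise scale increased accordingly.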

Implications for Social Groups

  • When creating datasets, obtain annotations from a diverse group of people. Report aggregate-level demographic information. Rather than only labeling instances with the majority vote, consider the value of providing multiple sets of labels as per each of the relevant and key demographic groups.

  • When testing hypotheses or drawing inferences about language use, consider also testing the hypotheses disaggregated for each of the relevant demographic groups.

  • When building automatic prediction systems, evaluate and report performance disaggregated for each of the relevant demographic groups.

  • Consider and report the intersectional implications of the AER system.

  • Contextualize work on disaggregation: for example, by impressing on the reader that even though race is a social construct, people’s perceptions of and behavior around race lead to very real consequences.

  • Obtaining demographic information requires careful consideration of questions such as whether people are giving meaningful consent to the collection of such data, and whether the data is being collected in a manner that respects their privacy, their autonomy (e.g., can they choose to delete their information later?), and their dignity (e.g., are they allowed to self-describe?).
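The recommendations above on per-group labels and disaggregated evaluation can be sketched as follows. This is a minimal illustration with hypothetical function names and data layouts: annotations are assumed to be `(item_id, annotator_group, label)` tuples, and each test instance is assumed to carry a consensually obtained, self-reported demographic group. Accuracy stands in for whatever metric the task actually uses.

```python
from collections import Counter, defaultdict


def per_group_labels(annotations):
    """For each item, return the overall majority label and the majority
    label within each annotator group.

    `annotations` is an iterable of (item_id, annotator_group, label) tuples
    (a hypothetical layout). Ties go to the label seen first, following
    Counter.most_common ordering.
    """
    overall = defaultdict(Counter)
    by_group = defaultdict(lambda: defaultdict(Counter))
    for item, group, label in annotations:
        overall[item][label] += 1
        by_group[item][group][label] += 1
    return {
        item: {
            "majority": overall[item].most_common(1)[0][0],
            "by_group": {
                group: counts.most_common(1)[0][0]
                for group, counts in by_group[item].items()
            },
        }
        for item in overall
    }


def accuracy_by_group(gold, pred, groups):
    """Return overall accuracy and {group: (accuracy, n_instances)}."""
    if not (len(gold) == len(pred) == len(groups)):
        raise ValueError("gold, pred, and groups must have the same length")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for g, p, grp in zip(gold, pred, groups):
        totals[grp] += 1
        hits[grp] += int(g == p)
    overall = sum(hits.values()) / len(gold) if gold else float("nan")
    per_group = {grp: (hits[grp] / totals[grp], totals[grp]) for grp in totals}
    return overall, per_group


# Hypothetical data: group A's reading of item t1 disappears under a plain
# majority vote but survives in the per-group labels.
ann = [("t1", "A", "joy"), ("t1", "A", "joy"), ("t1", "B", "anger"),
       ("t1", "B", "anger"), ("t1", "B", "anger")]
print(per_group_labels(ann))
# {'t1': {'majority': 'anger', 'by_group': {'A': 'joy', 'B': 'anger'}}}
```

Reporting the size of each group alongside its score matters: estimates for small groups are noisy, and a gap between groups may not be meaningful without some measure of uncertainty.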


1541  title = {Error Spotting in Human-Machine Interactions},
1542  year = {1999},
1543  author = {Krahmer, Emiel and  M. Swerts and Mariet Theune and M. Weegels},
1547  year = {2001},
1548  address = {Toulouse, France},
1549  pages = {140--147},
1550  title = {{An Algebra for Semantic Construction in Constraint-Based Grammars}},
1551  date-modified = {2014-07-07 11:55:56 +0000},
1552  date-added = {2014-07-07 11:55:56 +0000},
1553  booktitle = {Proceedings of the 39th Annual Conference of the Association for Computational Linguistics (ACL-01)},
1554  author = {Copestake, Ann and Lascarides, Alex and Flickinger, Dan},
1558  year = {1963},
1559  address = {New York},
1560  publisher = {Macmillan Company},
1561  title = {Collected Works: Volume {V}},
1562  author = {John von Neumann},


arXiv:2109.08256v3 [cs.CL]
License: cc-by-4.0

