Introduction
Emotions play a central role in our lives. Thus affective computing, which deals with emotions and computation (often through AI systems), is a tremendously important and vibrant line of work. It is a sweeping interdisciplinary area of study exploring both fundamental research questions (such as: what are emotions?) and commercial applications (such as: can machines detect consumer sentiment?).
In her seminal book, Affective Computing, Dr. Rosalind Picard described Automatic Emotion Recognition (AER) as “giving emotional abilities to computers”. Such systems can be incredibly powerful: facilitators of enormous progress, but also enablers of great harm. In fact, some of the recent commercial and governmental uses of emotion recognition have garnered considerable criticism, including for infringing on one’s privacy, exploiting vulnerable sub-populations, and even allegations of outright pseudo-science. Even putting aside high-profile controversies, emotion recognition impacts people and thus entails ethical considerations (big and small). It is therefore imperative that the AER community actively engage with the ethical ramifications of their creations.
This article, which I refer to as an Ethics Sheet for AER, is a critical reflection of this broad field of study with the aim of facilitating more responsible emotion research and appropriate use of the technology. As described in prior work, an Ethics Sheet for an AI Task is a semi-standardized document that synthesizes and organizes information from AI Ethics and AI Task literature to present a comprehensive array of ethical considerations for that task. Thus, in some ways, an ethics sheet is similar to a survey article, except that the focus here is on ethical considerations. It:
Fleshes out assumptions hidden in how the task is framed, and in the choices often made regarding the data, method, and evaluation.
Presents ethical considerations unique or especially relevant to the task.
Presents how common ethical considerations manifest in the task.
Presents relevant dimensions and choice points; along with tradeoffs.
Lists common harm mitigation strategies.
Communicates societal implications of AI systems to researchers, developers, and the broader society.
The sheet should flesh out various ethical considerations that apply at the level of the task. It should also flesh out the ethical considerations of common theories, methodologies, resources, and practices used in building AI systems for the task. A good ethics sheet will question some of the assumptions that often go unsaid.
The primary motivation for creating an Ethics Sheet for AER is to provide a go-to point for a carefully compiled, substantive engagement with the ethical issues relevant to emotion recognition, going beyond individual systems and datasets and drawing on knowledge from a large body of past work. The document will be useful to anyone who wants to build or use emotion recognition systems/algorithms for research or commercial purposes. Specifically, the main benefits can be summarized by the list below:
Encourages more thoughtfulness on why to automate, how to automate, and how to judge success well before the building of AER systems.
Helps us better navigate research and implementation choices.
Moves us towards consensus and standards.
Helps in developing better post-production documents such as datasheets and model cards.
Has citations and pointers; acts as a jumping off point for further reading.
Helps engage the various stakeholders of an AI task with each other. Helps stakeholders challenge assumptions made by researchers and developers. Helps develop harm mitigation strategies.
Acts as a useful introductory document on emotion recognition (complements survey articles).
Note that even though this sheet is focused on AER, many of the ethical considerations apply broadly to natural language tasks in general. Thus, it can serve as a useful template to build ethics sheets for other tasks.
Target audience: The primary audience for this sheet are researchers, engineers, developers, and educators from various fields (especially NLP, ML, AI, data science, public health, psychology, and digital humanities) who build, make use of, or teach about AER technologies; however, much of the discussion should be accessible to various other stakeholders of AER as well, including policy/decision makers, and those who are impacted by AER. I hope also that this sheet will act as a springboard for the creation of a sheet where non-technical stakeholders are the primary audience.
Process: My own research interests are at the intersection of emotions and language—to understand how we use language to express our feelings. I created this sheet to gather and organize my thoughts around responsible emotion recognition research, and hopefully it is of use to others as well. Discussions with various scholars from computer science, psychology, linguistics, neuroscience, and social sciences (and their comments on earlier drafts) have helped shape this sheet. An earlier draft of this material was also posted as a blog post with an explicit invitation for feedback. Valuable insights from the community were then incorporated into this document. That said, it should be noted that I do not speak for the AER community. There is no “objective” or “correct” ethics sheet. This sheet should be taken as one perspective amongst many in the community. I welcome dissenting views and encourage further discussion. These can lead to periodically revised or new ethics sheets. As stated in earlier work:
Multiple ethics sheets can be created (by different teams and approaches) to reflect multiple perspectives, viewpoints, and what is important to different groups of people. We should be wary of the world with single authoritative ethics sheets per task and no dissenting voices.
The rest of the paper is organized as follows: Section 2 is a preface to the ethics sheet, Section 3 presents the Ethics Sheet for AER (50 considerations), and this is followed by summarizing thoughts in Section 4. The Appendix compiles a list of succinct recommendations for responsible AER (drawn from the discussions of ethical considerations in Section 3).
Preface for the Ethics Sheet on AER
Let us consider a few rapid-fire questions to set the context. A good ethics sheet makes us question our assumptions. So let us start at the top:
Q1. Should we be building AI systems for Automatic Emotion Recognition? Is it ethical to do so?
A. This is a good question. This sheet will not explicitly answer the question, but it will help in clarifying and thinking about it. This sheet will sometimes suggest that certain applications in certain contexts are good or bad ideas, but largely it will discuss what are the various considerations to be taken into account: whether to build or use a particular system, how to build or use a particular system, what is more appropriate for a given context, how to assess success, etc.
The above question is also somewhat under-specified. We first need to clarify…
Q2. What does automatic emotion recognition mean?
A. Emotion recognition can mean many things, and it has many forms. (This sheet will get into that.) Emotion recognition can be deployed in many contexts. For example, many will consider automated insurance premium decisions based on inferred emotions to be inappropriate. However, studying how people use language to express gratitude, sadness, etc. is considered okay in many contexts. A human–computer interaction system benefits from being able to identify which utterances can convey anger, joy, sadness, hate, etc. (Not having such capabilities will lead to offensive, unempathetic, and inappropriate interactions.) Many other contexts are described in the sheet.
Q3. Can machines ever infer one’s true emotional state?
A. No. (This sheet will get into that.)
Q4. Can machines infer some small aspect of people’s emotions (or emotions that they are trying to convey) in some contexts, to the extent that it is useful?
A. In my view, yes. In a limited way, this is analogous to machine translation or web search. The machine does not understand language, nor does it understand what the user really wants, nor the social, cultural, or embodied context, but it is able to produce a somewhat useful translation or search result with some likelihood; and it produces some amount of inappropriate and harmful results with some likelihood. However, unlike machine translation or search, emotions are much more personal, private, and complex. People cannot fully determine each other’s emotions. People cannot fully determine their own emotional state. But we make do with our limitations and infer emotions as best we can to function socially. We also have moral and ethical failures. We cause harm because of our limitations, and we harbor stereotypes and biases.
If machines are to be a part of this world and interact with people in any useful and respectful way, then they must have at least some limited emotion recognition capabilities; and thereby will also cause some amount of harm. Thus, if we use them, it is important that we are aware of the limitations; design systems that protect and empower those without power; deploy them in the contexts they are designed for; use them to assist human decision making; and work to mitigate the harms they will perpetrate. We need to hold AER systems to high standards, not just because it is a nice aspirational goal, but because machines impact people at scale (in ways that individuals rarely can) and emotions define who we are (in ways that other attributes rarely do). I hope this sheet is useful in that regard.
Main Sheet (version 1.0)
This ethics sheet for Automatic Emotion Recognition has four sections: Modalities and Scope, Task, Applications, and Ethical Considerations. The first three are brief and set the context. The fourth presents various ethical considerations of AER as a numbered list, organized in thematic groups.
Modalities and Scope
Modalities: Work on AER has used a number of modalities (sources of input), including:
Facial expressions, gait, proprioceptive data (movement of body), gestures
Skin and blood conductance, blood flow, respiration, infrared emanations
Force of touch, haptic data (from sensors of force)
Speech, language (esp. written text, emoticons, emojis)
All of these modalities come with benefits, potential harms, and ethical considerations.
Scope: This sheet will focus on AER from written text and AER in Natural Language Processing (NLP), but several of the listed considerations apply to AER in general (regardless of modality, and regardless of field such as NLP or Computer Vision).
Task
Automatic Emotion Recognition (AER) from one’s utterances (written or spoken) is a broad umbrella term used to refer to a number of related tasks such as those listed below: (Note that each of these framings has ethical considerations and may be more or less appropriate for a given context.)
Inferring emotions felt by the speaker (e.g., given Sara’s tweet, what is Sara feeling?); inferring emotions of the speaker as perceived by the reader/listener (e.g., what does Li think Sara is feeling?); and inferring emotions that the speaker is attempting to convey (e.g., what emotion is Sara trying to convey?). These may be correlated, but they can differ depending on the particular instance. The first framing, “inferring emotions felt by the speaker”, is fairly common in the scientific literature, but is also perhaps the most often misused/misinterpreted. More on this in the ethical considerations section.
Inferring the intensity of the emotions discussed above.
Inferring patterns of a speaker’s emotions over long periods of time, across many utterances, including the inference of moods, emotion dynamics, and emotional arcs (e.g., tracking character emotion arcs in novels and tracking the impact of health interventions on a patient’s well-being).
Inferring speaker’s emotions/attitudes/sentiment towards a target product, movie, person, idea, policy, entity, etc. (e.g., does Sara like the new phone?).
Inferring emotions evoked in the reader/listener (e.g., what feelings arise in Li on reading Sara’s tweet?). This may be different among different readers because of their past experiences, personalities, and world-views: e.g., the same text may evoke different feelings among people with opposing views on an issue.
Inferring emotions of people mentioned in the text (e.g., given a tweet that mentions Moe, what emotional state of Moe is conveyed in the tweet?).
Inferring emotionality of language used in text (regardless of whose emotions) (e.g., is the tweet about happy things, angry feelings, etc.?).
Inferring how language is used to convey emotions such as joy, sadness, loneliness, hate, etc.
Inferring the emotional impact of sarcasm, metaphor, idiomatic expression, dehumanizing utterance, hate speech, etc.
Note 1: The term Sentiment Analysis is commonly used to refer to the task described in bullet 4, especially in the context of product reviews (sentiment is commonly labeled as positive, negative, or neutral). On the other hand, determining the predilection of a person towards a policy, party, issue, etc. is usually referred to as Stance Detection, and involves classes such as favour and against.
Note 2: Many AER systems focus only on the emotionality of the language used (bullet 7), even though their stated goal might be one of the other bullets. This may be appropriate in restricted contexts such as customer reviews or personal diary blog posts, but not always. (More on this in the Ethical Considerations: Task Design section.)
Note 3: There also exist tasks that focus not directly on emotions, but on associated phenomena, such as: whose emotions, who/what evoked the emotion, what type of human need was met or not met resulting in the emotion, etc. See existing surveys for more details: some examine emotions, sentiment, stance, etc.; some focus on sentiment analysis tasks; and others survey multi-modal techniques for sentiment analysis.
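To make the contrast between these framings concrete, below is a small hypothetical sketch (in Python) of how a dataset might record which framing each label belongs to; the text, labels, and field names are invented for illustration.

```python
# Hypothetical sketch: the same text can carry different labels under
# different task framings; a dataset should record which framing applies.
from dataclasses import dataclass

@dataclass
class EmotionAnnotation:
    text: str
    framing: str  # e.g., "felt_by_speaker", "perceived_by_reader",
                  # "conveyed_by_speaker", "emotionality_of_language"
    label: str
    annotator_is_author: bool  # only authors can report felt emotions

tweet = "Back at the gym after months. My legs hate me."
annotations = [
    # Author self-report (closest to "felt", but still not the full inner state):
    EmotionAnnotation(tweet, "felt_by_speaker", "pride", True),
    # A third-party annotator's perception may differ:
    EmotionAnnotation(tweet, "perceived_by_reader", "amusement", False),
    # Surface emotionality of the words, regardless of whose emotion:
    EmotionAnnotation(tweet, "emotionality_of_language", "negative", False),
]
for a in annotations:
    print(f"{a.framing:26s} -> {a.label}")
```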
Applications
The potential benefits of AER are substantial. Below is a sample of some existing applications: (Note that this is not an endorsement of these applications. All of the applications come with potential harms and ethical considerations. Use of AER by the military, for intelligence, and for education are especially controversial.)
Public Health: Assist public health research projects, including those on loneliness, depression, suicidality prediction, bipolar disorder, stress, and well-being.
Commerce/Business: Track sentiment and emotions towards one’s products, track reviews, blog posts, YouTube videos and comments; develop virtual assistants, writing assistants; help advertise products that one is more likely to be interested in.
Government Policy and Public Health Policy: Tracking and documenting views of the broader public on a range of issues that impact policy (tracking amount of support and opposition, identifying underlying issues and pain points, etc.). Governments and health organizations around the world are also interested in tracking how effective their messaging has been in response to crises such as pandemics and climate change.
Art and Literature: Improve our understanding of what makes a compelling story, how different types of characters interact, what the emotional arcs of stories are, what the emotional signature of different genres is, what makes well-rounded characters, why art evokes emotions, how lyrics and music impact us emotionally, etc. Can machines generate art (paintings, stories, music, etc.)?
Social Sciences, Neuroscience, Psychology: Help answer questions about people. What makes people thrive? What makes us happy? What can our language tell us about our well-being? What can language tell us about how we construct emotions in our minds? How do we express emotions? How different are people in terms of what different emotion words mean to them and how they use emotional words?
Military, Policing, and Intelligence: Tracking how sets of people or countries feel about a government or other entities (controversial); tracking misinformation on social media.
ETHICAL CONSIDERATIONS
The usual approach to building an AER system is to design the task (identify the process to be automated, the emotions of interest, etc.), compile appropriate data (label some of the data for emotions—a process referred to as human annotation), train ML models that capture patterns of emotional expression from the data (the method), and evaluate the models by examining their predictions on a held-out test set. There are ethical considerations associated with each step of this development process. Considerations for privacy and social groups are especially pertinent for AER and cut across task design, data, method, and evaluation.
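To ground these steps, here is a minimal sketch of such a pipeline using scikit-learn; the toy texts, labels, and choice of model are illustrative assumptions only, not a recommended design.

```python
# Minimal sketch of the typical AER pipeline: a designed task (three emotion
# classes), labeled data, a trained model, and evaluation on held-out data.
# The toy examples are invented; real systems need far more data and care.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = [
    "I am so excited about the trip!", "This is wonderful news",
    "I miss her every single day", "Such a heartbreaking loss",
    "How dare they cancel the show", "This policy makes me furious",
]
labels = ["joy", "joy", "sadness", "sadness", "anger", "anger"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=0)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Held-out evaluation; responsible evaluations also report disaggregated
# and out-of-domain results (see IMPACT AND EVALUATION).
print(classification_report(y_test, model.predict(X_test)))
```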
This section describes fifty considerations grouped under the themes: Task Design, Data, Method, Impact and Evaluation, and Implications for Privacy and Social Groups. First I present an outline of the considerations along with a summary for each grouping. This is followed by five sub-sections (Task Design through Implications for Privacy and Social Groups) that present, in detail, the ethical considerations associated with the five groups.
I. TASK DESIGN
Summary: This section discusses various ethical considerations associated with the choices involved in the framing of the emotion task and the implications of automating the chosen task. Some important considerations include: Is it even possible to determine one’s internal mental state? Is it ethical to determine such a private state? And who is often left out in the design of existing AER systems? I discuss how it is important to consider which formulation of emotions is appropriate for a specific task/project, while avoiding careless endorsement of theories that suggest a mapping of external appearances to inner mental states.
A. Theoretical Foundations
Emotion Task and Framing
Emotion Model and Choice of Emotions
Meaning and Extra-Linguistic Information
Wellness and Emotion
Aggregate Level vs. Individual Level
B. Implications of Automation
Why Automate (Who Benefits; Will this Shift Power)
Embracing Neurodiversity
Participatory/Emancipatory Design
Applications, Dual use, Misuse
Disclosure of Automation
II. DATA
Summary: This section has three themes: the implications of using datasets of different kinds, the tension between human variability and machine normativeness, and the considerations regarding the people who have produced the data. Notably, I discuss how, on the one hand, there is tremendous variability in the human mental representation and expression of emotions, while, on the other hand, modern machine learning approaches have an inherent bias towards ignoring variability. Thus, through their behaviour (e.g., by recognizing some forms of emotion expression and not others), AI systems convey to the user what is “normal”, implicitly invalidating other forms of emotion expression.
C. Why This Data
Types of Data
Dimensions of Data
D. Human Variability vs. Machine Normativeness
Variability of Expression and Mental Representation
Norms of Emotion Expression
Norms of Attitudes
One “Right” Label or Many Appropriate Labels
Label Aggregation
Historical Data (Who is Missing and What are the Biases)
Training–Deployment Data Differences
E. The People Behind the Data
Platform Terms of Service
Anonymization and Ability to Delete One’s Information
Warnings and Recourse
Crowdsourcing
III. METHOD
Summary: This section discusses the ethical implications of doing AER using a given method. It presents the types of methods and their tradeoffs, as well as considerations of who is left out, spurious correlations, and the role of context. Special attention is paid to green AI and the fine line between emotion management and manipulation.
F. Why This Method
Types of Methods and their Tradeoffs
Who is Left Out by this Method
Spurious Correlations
Context is Everything
Individual Emotion Dynamics
Historical Behavior is not always indicative of Future Behavior
Emotion Management, Manipulation
Green AI
IV. IMPACT AND EVALUATION
Summary: This section discusses ethical considerations associated with the impact of AER systems, as measured both by traditional metrics and by a number of criteria beyond metrics. Notably, this latter subsection discusses interpretability, visualizations, building safeguards, and contestability, because even when systems work as designed, there will be some negative consequences. Recognizing and planning for such outcomes is part of responsible development.
G. Metrics
Reliability/Accuracy
Demographic Biases
Sensitive Applications
Testing (on Diverse Datasets, on Diverse Metrics)
H. Beyond Metrics
Interpretability, Explainability
Visualization
Safeguards and Guard Rails
Harms even when the System Works as Designed
Contestability and Recourse
Be wary of Ethics Washing
V. IMPLICATIONS FOR PRIVACY, SOCIAL GROUPS
Summary: This section presents the ethical implications of AER for privacy and for social groups. These issues cut across Task Design, Data, Method, and Impact. I discuss both individual and group privacy. The latter becomes especially important in the context of soft biometrics determined through AER that are not intended to identify individuals, but rather to identify groups of people with similar characteristics. I discuss the need for work that does not treat people as a homogeneous group (ignoring sub-group differences) but rather explores disaggregation and intersectionality, while minimizing the reification and essentialization of social constructs.
I. Implications for Privacy
Privacy and Personal Control
Group Privacy and Soft Biometrics
Mass Surveillance vs. Right to Privacy, Expression, Protest
Right Against Self-Incrimination
Right to Non-Discrimination
J. Implications for Social Groups
Disaggregation
Intersectionality
Reification and Essentialization
Attributing People to Social Groups
One can read these various sections in one go, or simply use them as a reference when needed (jumping to sections of interest).
TASK DESIGN
(Ten considerations.)
A. Theoretical Foundations
Domain naivete is not a virtue.
Study the theoretical foundations for the task from relevant research fields such as psychology, linguistics, and sociology, to inform the task formulation.
#1. Emotion Task and Framing: Carefully consider what emotion task should be the focus of the work (whether conducting human annotation or building an automatic system). (See the Task section for a sample of common emotion tasks.) When building an AER system, a clear grasp of the task will help in making appropriate design choices. When choosing which AER system to use, a clear grasp of the emotion task most appropriate for the deployment context will help in choosing the right AER system. It is not uncommon for users of AER to have a particular emotion task in mind and mistakenly assume that an off-the-shelf AER system is designed for that task.
Each of the emotion tasks has associated ethical considerations. For example,
Is the goal to infer one’s true emotions? Is it possible for any AI (or human) to comprehensively determine one’s internal mental state? (Hint: no.) Is it ethical to determine such a private state?
Realize that it is impossible to capture the full emotional experience of a person (even if one had access to all the electrical signals in the brain). A less ambitious goal is to infer some aspects of one’s emotional state.
Here, we see a distinct difference between AER that uses vision and AER that uses language. While there is little credible evidence of the connection between one’s facial expressions and one’s internal emotional state, there is a substantial amount of work on the idea that language is a window into one’s mind—which of course also includes emotions.
That said, there is no evidence that one can determine the full (or even a substantial portion) of a person’s emotional state through their language. (See also considerations #2 Emotion Model and #13 Variability of Expression ahead on the complexity of the emotional experience and the variability of expression.) Thus, it is often more appropriate to frame the AER task differently; for example, the objective could be:
to study how people express emotions: Work that uses speaker-annotated labeled data such as emotion-word hashtags in tweets usually captures how people convey emotions. What people convey may not necessarily indicate what they feel.
to determine perceived emotion (how others may think one is feeling):
Perceived emotions are not necessarily the emotions of the speaker.
Emotion annotations by people who have not written the source text usually reveal perceived emotions. (This is most common in NLP data-annotation projects.) Annotation aggregation strategies, such as majority voting, usually convey only the emotions perceived by a majority group. Are we missing out on the perceptions of some groups? (More on majority voting in #17 Label Aggregation.)
to determine emotionality of language used in text (regardless of whose emotions, target/stimulus, etc.): This may be appropriate in some restricted-domain scenarios, for example, when one is looking at customer reviews. Here, the context indicates that the emotionality in the language likely reflects the attitude towards the product being reviewed. However, such systems have difficulty when dealing with movie and book reviews, because then they have to distinguish text expressing attitudes towards the book/movie from text describing what happened in the plot (which is likely emotional too).
to determine trends at an aggregate level: Emotionality of language is also useful when tracking broad patterns at an aggregate level, e.g., tracking trends of emotionality in tens of thousands of tweets, or in the text of novels, over time. The idea is that aggregating information from a large number of instances leads to the determination of meaningful trends in emotionality. (See also the discussion in #5 Aggregate Level vs. Individual Level.)
In summary, it is important to identify what emotion task is the focus of one’s work, use appropriate data, and communicate the nuance of what is being captured to the stakeholders. Not doing so will lead to the misuse and misinterpretation of one’s work. Specifically, AER systems should not claim to determine one’s emotional state from their utterance, facial expression, gait, etc. At best, AER systems capture what one is trying to convey or what is perceived by the listener/viewer, and even there, given the complexity of human expression, they are often inaccurate. A separate question is whether AER systems can determine trends in the emotional state of a person (or a group) over time. Here, inferences are drawn at an aggregate level from much larger amounts of data. Studies on public health, such as those listed in the Applications section, fall in this category. Here too, it is best to be cautious in making claims about mental state, and to use AER as one source of evidence amongst many (and involve expertise from public health and psychology).
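As an illustration of such aggregate-level analysis, here is a minimal sketch that averages per-instance emotionality scores by month; the dates and scores are invented, and in practice each score would come from an AER system or lexicon applied to one of many thousands of instances.

```python
# Minimal sketch of aggregate-level trend tracking: average per-instance
# emotionality scores by month. Dates/scores are invented for illustration.
from collections import defaultdict
from statistics import mean

scored_posts = [  # (date "YYYY-MM-DD", emotionality score in [0, 1])
    ("2020-01-03", 0.61), ("2020-01-19", 0.58), ("2020-01-30", 0.64),
    ("2020-02-02", 0.41), ("2020-02-17", 0.45), ("2020-02-25", 0.39),
]

by_month = defaultdict(list)
for date, score in scored_posts:
    by_month[date[:7]].append(score)

# Trends are meaningful only with large n per bucket; always report n.
for month in sorted(by_month):
    scores = by_month[month]
    print(f"{month}: mean={mean(scores):.2f} (n={len(scores)})")
```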
#2. Emotion Model and Choice of Emotions: Work on AER needs to operationalize the aspect of emotion it intends to capture, that is, decide on the emotion-related categories or dimensions of interest, decide on how to represent them, etc. Psychologists and neuroscientists have identified several theories of emotion to inform these decisions:
- The Basic Emotions Theory (BET): Work by Dr. Paul Ekman in the 1960s galvanized the idea that some emotions (such as joy, sadness, fear, etc.) are universally expressed through similar facial expressions, and that these emotions are more basic than others. This was followed by other proposals of basic emotions by Plutchik, Izard, and others. However, many of the tenets of BET, such as the universality of some emotions and their fixed mapping to facial expressions, stand discredited or are in question.
- The Dimensional Theory: Several influential studies have shown that the three most fundamental, largely independent, dimensions of affect and connotative meaning are valence (positiveness–negativeness / pleasure–displeasure), arousal (active–sluggish), and dominance (dominant–submissive / in control–out of control). Valence and arousal specifically are commonly studied in a number of psychological and neuro-cognitive explorations of emotion.
- Cognitive Appraisal Theory: The core idea behind appraisal theory is that emotions arise from a person’s evaluation of a situation or event. (Some varieties of the theory point to a parallel process of reacting to perceptual stimuli as well.) Thus it naturally accounts for variability in emotional reactions to the same event, since different people may appraise the situation differently. Criticisms of appraisal theory centre around questions such as: whether emotions can arise without appraisal; whether emotions can arise without physiological arousal; and whether our emotions inform our evaluations.
- The Theory of Constructed Emotions: Dr. Lisa Barrett proposed a new theory on how the human brain constructs emotions from our experiences of the world around us and the signals from our body.
Since ML approaches rely on human-annotated data (which can be hard to obtain in large quantities), AER research has often gravitated to the Basic Emotions Theory, as that work allows one to focus on a small number of emotions. This attraction has been even stronger in the vision AER research because of BET’s suggested mapping between facial expressions and emotions. However, as noted above, many of the tenets of BET stand debunked.
Consider which formulation of emotions is appropriate for your task/project. For example, one may choose to work with the dimensional model or the model of constructed emotions if the goal is to infer behavioural or health outcome predictions. Despite criticisms of BET, it makes sense for some NLP work to focus on categorical emotions such as joy, sadness, guilt, pride, fear, etc. (including what some refer to as basic emotions) because people often talk about their emotions in terms of these concepts. Most human languages have words for these concepts (even if our individual mental representations for these concepts vary to some extent). However, note that work on categorical emotions by itself is not an endorsement of the BET. Do not refer to some emotions as basic emotions, unless you mean to convey your belief in the BET. Careless endorsement of theories can lead to the perpetuation of ideas that are actively harmful (such as suggesting we can determine internal state from outward appearance—physiognomy).
#3. Meaning and Extra-Linguistic Information: The meaning of an utterance is not only a property of language, but it is grounded in human activity, social interactions, beliefs, culture, and other extra-linguistic events, perceptions, and knowledge. Thus one can express the same emotion in different ways in different contexts, different people express the same emotions in different ways, and the same utterances can evoke different emotions in different people. AER systems that do not take extra-linguistic information into consideration will always be limited in their capabilities, and risk being systematically biased, insensitive, and discriminatory. More on this in #13 Variability of Expression and #14 Norms of Emotion Expression.
#4. Wellness and Emotion: The prominent role of one’s body in the theory of constructed emotion nicely accounts for the fact that various physical and mental illnesses (e.g., Parkinson’s, Alzheimer’s, cardiovascular disease, depression, anxiety) impact our emotional lives. Existing AER systems are not capable of handling this inter-subject and within-subject variability and thus should not be deployed in scenarios where their decisions could negatively impact the lives of people; and where deployed, their limitations should be clearly communicated.
Emotion recognition is playing a greater role than ever before in understanding how our language reflects our wellness, understanding how certain physical and mental illnesses impact our emotional expression, and understanding how emotional expression can help improve our well-being. For some medical conditions, clinicians can benefit from a detailed history of one’s emotional state. However, people are generally not very good at remembering how they had been feeling over the past week, month, etc. Thus an area of interest is to use AER to help patients track their emotional state. See the applications of AER in Public Health in the Applications section. See also the CL Psych workshop proceedings. Note, however, that these are cases where the technology is working firmly in an assistive role to clinicians and psychologists—providing additional information in situations where human experts make decisions based on a number of other sources of information as well. See prior work for ethical considerations on inferring mental health states from one’s utterances.
#5. Aggregate Level vs. Individual Level: Emotion detection can be used to make inferences about individuals or groups of people; for example, to assist one in writing, to recommend products or services, etc., or to determine broad trends in attitudes towards a product, issue, or some other entity. Statistical inferences tend to be more reliable when using large amounts of data and when using more relevant data. Systems that make predictions about individuals often have very little pertinent information about the individual and thus often fall back on data from groups of people. Thus, given the person-to-person variability and within-person variability discussed in the earlier bullets, such systems are imbued with errors and biases. Further, these errors are especially detrimental because of the direct and personal nature of such interactions. They may, for example, attribute majority-group behavior/preferences to the individual, further marginalizing those that are not in the majority.
Various ethical concerns, including privacy, manipulation, bias, and free speech, are further exacerbated when systems act on individuals.
Work on finding trends in large groups of people, on the other hand, benefits from having a large amount of relevant information to draw on. However, see #43 Group Privacy and #47 to #50 (Implications for Social Groups) for relevant concerns.
B. Implications of Automation
What are the ethical implications of automating the chosen task?
#6. Why Automate (Who Benefits and Will this Shift Power): When we choose to work on a particular AER task, or any AI task for that matter, it is important to ask ourselves: why? Often the first set of responses may be straightforward: e.g., to automate some process to make people’s lives easier, or to provide access to some information that is otherwise hard to obtain, or to answer research questions about how emotions work. However, lately there has been a call to go beyond this initial set of responses and ask more nuanced, difficult, and uncomfortable questions, such as:
- Who will benefit from this work and who will not?
- Will this work shift power from those who already have a lot of power to those that have less power?
- How can we reframe or redesign the task so that it helps those that are most in need?
Specifically for AER, this will involve considerations such as:
- Are there particular groups of people who will not benefit from this task: e.g., people who convey and detect emotions differently than what is common (e.g., people on the autism spectrum), people who use language differently than the people whose data is being used to build the system (e.g., older people or people from a different region)?
- If AER is used in some application, say to determine insurance premiums, then is this further marginalizing those that are already marginalized?
- How can we prevent the use of emotion and stance detection systems for detecting and suppressing dissidents?
- How can AER help those that need the most help?
Various other considerations such as those listed in this sheet can be used to further evaluate the wisdom in investing our labor in a particular task.
#7. Embracing Neurodiversity: Much of the ML/NLP emotion work has assumed homogeneity of users and ignored neurodiversity, alexithymia, and the autism spectrum. These groups have significant overlap, but are not identical. They are also often characterized as having difficulty in sensing and expressing emotions. Therefore, these groups hold particular significance in the development of inclusive AER systems. Existing AER systems implicitly cater to the more populous neurotypical group. At a minimum, such AER systems should explicitly acknowledge this limitation. Report disaggregated performance metrics for relevant groups. (See also #47 Disaggregation.)
Greater research attention needs to be paid to the neurodiverse group. When doing data annotations, we should try to obtain information on whether participants are neurodiverse or neurotypical (when participants are comfortable sharing that information), and include that information at an aggregate level when we report participant demographics. Work in Psychology has used scales such as the Toronto Alexithymia Scale (TAS-20) to determine the difficulty that people might have in identifying and describing emotions.
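A minimal sketch of the disaggregated reporting suggested above: compute performance separately for each (aggregate-level, self-reported) group rather than as a single overall number. The groups, gold labels, and predictions are invented for illustration.

```python
# Minimal sketch: report performance per group, not just overall.
# Groups, gold labels, and predictions are invented for illustration.
from collections import defaultdict

records = [  # (group, gold_label, predicted_label)
    ("group_A", "joy", "joy"), ("group_A", "anger", "anger"),
    ("group_A", "sadness", "joy"), ("group_B", "joy", "sadness"),
    ("group_B", "anger", "joy"), ("group_B", "sadness", "sadness"),
]

totals, correct = defaultdict(int), defaultdict(int)
for group, gold, pred in records:
    totals[group] += 1
    correct[group] += int(gold == pred)

overall = sum(correct.values()) / sum(totals.values())
print(f"overall accuracy: {overall:.2f}")
for group in sorted(totals):
    # A healthy overall score can hide large gaps between groups.
    print(f"{group}: accuracy={correct[group]/totals[group]:.2f} (n={totals[group]})")
```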
#8. Participatory/Emancipatory Design: Participatory design in research and systems development centers the people, especially marginalized and disadvantaged communities, such that they are not mere passive subjects but rather have the agency to shape the design process. This has also been referred to as emancipatory research and is pithily captured by the rallying cry “nothing about us without us”. These calls have developed across many different domains, including research pertaining to disability, indigenous communities, the autism spectrum, and neurodiversity. See prior work for specific recommendations for conducting studies with neurodiverse participants.
#9. Applications, Dual Use, Misuse: AER is a powerful enabling technology that has a number of applications. Thus, like all enabling technologies, it can be misused and abused. Examples of inappropriate commercial AER applications include:
Using AER at airports to determine whether an individual is dangerous simply from their facial expressions.
Detecting stance towards governing authorities to persecute dissidents.
Using deception detection or lie detection en masse without proper warrants or judicial approval. (Using such technologies even in carefully restricted individual cases is controversial.)
Increasing someone’s insurance premium because the system has analyzed one’s social media posts to determine (accurately or inaccurately) that they are likely to have a certain mental health condition.
Advertisements that prey on the emotional state of people, e.g., user-specific advertising targeted at people when they are emotionally vulnerable.
Socio-Psychological Applications: Applications such as inferring patterns in the emotions of a speaker to in turn infer other characteristics, such as suitability for a job, personality traits, or health conditions, are especially fraught with ethical concerns. For example, consider the use of the Myers–Briggs Type Indicator (MBTI) for hiring decisions, or research on detecting personality traits automatically. Notable ethical concerns include:
MBTI is criticized by psychologists, especially for its lack of test–retest reliability. The Big 5 personality traits formalism has greater validity, but even when using Big 5, it is easy to overstate the conclusions.
Even with accurate personality trait identification, there is little to no evidence that using personality traits for hiring and team-composition decisions is beneficial. The use of such tests has also been criticized on the grounds of discrimination.
Health and Well-Being Applications: AER has considerable potential for improving our health and well-being outcomes. However, the sensitive nature of such applications requires substantial efforts to adhere to the best ethical principles. For example, how can harm be mitigated when systems make errors? Should automatic systems be used at all, given that sometimes we cannot put a value on the cost of errors? What should be done when the system detects that one is at a high risk of suicide, depression, or some other severe mental health condition? How do we safeguard patient privacy? See the shared task at the 2021 CL Psych workshop, where a secure enclave was used to store the training and test data. See also work on the ethical considerations of AI systems in health care.
Applications in Art and Culture: Lately there has been increasing use of AI in art and culture, especially through curation and recommendation systems. See prior work for a discussion of ethical implications, including: whether we are really able to determine what art one would like, the long-term impacts of automated curation (on users and artists), and the diversity of sources and content.
AI is also used in the analysis and generation of art: e.g., for literary analysis and for generating poems, paintings, songs, etc. Since emotions are a central component of art, much of this work also includes automatic emotion recognition: e.g., tracking the emotions of characters in novels, recommending songs for people based on their mood, and generating emotional music. This raises several questions, including:
Is it art if the creation did not involve human input? (https://www.artbasel.com/news/artificial-intelligence-art-artist-boundary)
Should AI play a collaborative role with other artists (enhancing their creativity), as opposed to generating pieces on its own?
How will artists be impacted by AI’s role in art?
Who should get credit for AI art? (https://www.cnn.com/style/article/ai-art-who-should-get-credit-conversation/index.html)
How should we critique AI art? (https://www.artnews.com/art-in-america/features/creative-ai-art-criticism-1202686003/)
See further discussion in prior work.
#10. Disclosure of Automation: Disclose to all stakeholders the decisions that are being made (in part or wholly) by automation. Provide mechanisms for the user to understand why relevant predictions were made, and also to contest the decisions. (See also #36 Interpretability and #40 Contestability.)
Artificial agents that perceive and convey emotions in a human-like manner can give one the impression that they are interacting with a human. Artificial agents should begin their interactions with humans by first disclosing that they are artificial agents, even though some studies show certain negative outcomes of such a disclosure.
DATA
(Thirteen considerations.)
C. Why This Data
What are the ethical implications of using the chosen data?
#11. Types of Data: Emotion and sentiment researchers have used text data, speech data, data from mobile devices, data from social media, product reviews, suicide notes, essays, novels, movie screenplays, financial documents, etc. All of these entail their own ethical considerations in terms of the various points discussed in this article. AER systems use data in various forms, including:
- Large Language Models: Language models such as BERT (which capture common patterns in language use) are obtained by training ML models on massive amounts of text found on the internet. See prior work for ethical considerations in the use of large language models, including: documentation debt, data that is difficult to curate, the incorporation of inappropriate biases, and the perpetuation of stereotypes. Note that using smaller amounts of data raises concerns as well: they may not have enough generalizable information; they may be easier to overfit on; and they may not include diverse perspectives. An important aspect of preparing data (big or small) is deciding how to curate it (e.g., what to discard).
- Emotion Lexicons: Emotion lexicons are lists of words and their associated emotions (determined manually by annotation or automatically from large corpora). Word–emotion association lexicons (such as AFINN, the NRC Emotion Lexicon, and the Valence, Arousal, and Dominance Lexicon) are a popular type of resource used in emotion research, emotion-related data science, and machine learning models for AER. (A minimal sketch of typical lexicon use appears after this list.) See prior work for biases and ethical considerations in the use of such emotion lexicons. Notable among these considerations is how words in different domains often convey different senses and thus have different emotion associations. Also, word associations capture historic perceptions that change with time and may differ across different groups of people. They are not indicative of inherent, immutable emotion labels.
- Labeled Training and Testing Data: AER systems often make use of a relatively small number of example instances that are manually labeled (annotated) for emotions. A portion of these is used to train/fine-tune the large language model (training set). The rest is further split for development and testing. I discuss various ethical considerations associated with using emotion-labeled instances below.

#12. Dimensions of Data: The data used by AER systems can be examined across various dimensions: size of data; whether it is custom data (carefully produced for the research) or data obtained from an online platform (naturally occurring data); less private/sensitive data or more private/sensitive data; what languages are represented in the data; degree of documentation provided with the data; and so on. All of these have societal implications, and the choice of datasets should be appropriate for the context of deployment.
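Returning to emotion lexicons (second bullet above), here is a minimal sketch of how such lexicons are typically used: count the words in a text associated with each emotion. The tiny lexicon is invented; real resources such as the NRC Emotion Lexicon are far larger, and, as noted above, their associations are domain- and time-sensitive rather than immutable properties of words.

```python
# Minimal sketch of lexicon-based AER: count words associated with each
# emotion. The tiny lexicon below is invented for illustration.
from collections import Counter

lexicon = {
    "happy": {"joy"}, "celebrate": {"joy"},
    "grief": {"sadness"}, "miss": {"sadness"},
    "furious": {"anger"}, "outrage": {"anger"},
}

def emotion_counts(text: str) -> Counter:
    counts = Counter()
    for token in text.lower().split():
        for emotion in lexicon.get(token.strip(".,!?"), ()):
            counts[emotion] += 1
    return counts

print(emotion_counts("They celebrate today, but I miss the old days."))
# Counter({'joy': 1, 'sadness': 1})
```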
D. Human Variability vs. Machine Normativeness
What should we know about emotion data so that we use it appropriately?
#13. Variability of Expression and Mental Representation: Language is highly variable—we can express roughly the same meaning in many different ways.
Expressions of emotions through language are highly variable: different people express the same emotion differently; the same text may convey different emotions to different people.
This is true even for people living in the same area and especially true for people living in different regions, and people with different lived experiences. Some cues of emotion are somewhat more common and somewhat more reliable than others. This is usually the signal that automatic systems attempt to capture. We construct emotions in our brains from the signals we get from the world and the signal we get from our bodies. This mapping of signals to emotions is highly variable, and different people can have different signals associated with different emotions; therefore, different people have different concept–emotion associations. For example, high school, public speaking, and selfies may evoke different emotions in different people. This variability is not to say that there are no commonalities. In fact, speakers of a language share substantial commonalities in their mental representation of concepts (including emotions), which enables them to communicate with each other. However, the variability should also be taken into consideration when building datasets, systems, and choosing where to deploy the systems.
#14. Norms of Emotion Expression: As John M. Culkin once said, “We shape our tools and thereafter they shape us.” Whether for text, speech, vision, or any other modality, AI systems are often trained on a limited set of emotion expressions and their emotion annotations (emotion labels for the expressions).
Thus, through their behaviour (e.g., by recognizing some forms of emotion expression and not recognizing others), AI systems convey to the user that it is “normal” or appropriate to convey emotions in certain ways; implicitly invalidating other forms of emotion expression.
Therefore it is important for emotion recognition systems to accurately map a diverse set of emotion instantiations to emotion categories/dimensions. That said, it is also worth noting that the variations in emotion and language expression are so large that systems can likely never attain perfection. The goal is to obtain useful levels of emotion recognition capabilities without having systematic gaps that convey a strong sense of emotion-expression normativeness.
Normative implications of AER are analogous to normative implications of movies (especially animated ones):
- Badly executed characters express emotions in fixed, stereotypical ways.
- Good movies explore the diversity, nuance, and subtlety of human emotion expression.
- Influential movies (bad and good) convey to a wide audience around the world how emotions are expressed or what is “normal” in terms of emotion expression. Thus they can either colonize other groups, reducing emotion expression diversity, or they can validate one’s individualism and independence of self-expression.
Since AI systems are influenced by the data they train on, dataset development should:
Obtain data from a diverse set of sources. Report details of the sources.
Studies have shown that a small percentage of speakers often produce a large percentage of utterances (as shown, for example, in studies of tweets). Thus, when creating emotion datasets, limit the number of instances included per person: for example, one study kept one tweet for every query term and tweeter combination when studying relationships between affect categories (data also used in the SemEval-2018 Task 1 on emotions); another kept at most three tweets per tweeter when studying expressions of loneliness. (A minimal sketch of such a cap appears after this list.)
Obtain annotations from a diverse set of people. Report aggregate-level demographic information of the annotators.
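A minimal sketch of the per-person cap mentioned above; the posts, author ids, and cap value are invented for illustration.

```python
# Minimal sketch: cap the number of instances per author so that a few
# prolific posters do not dominate the dataset. Data here is invented.
from collections import defaultdict

posts = [  # (author_id, text)
    ("u1", "post a"), ("u1", "post b"), ("u1", "post c"), ("u1", "post d"),
    ("u2", "post e"), ("u3", "post f"), ("u3", "post g"),
]

MAX_PER_AUTHOR = 3  # prior work has used caps of one to three per person

kept, per_author = [], defaultdict(int)
for author, text in posts:
    if per_author[author] < MAX_PER_AUTHOR:
        per_author[author] += 1
        kept.append((author, text))

print(kept)  # u1 contributes at most 3 of its 4 posts
```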
Variability is common not just for emotions but also for language: people convey meaning in many different ways. Thus, these considerations apply to NLP in general.
#15. Norms of Attitudes: Different people and different groups of people might have different attitudes, perceptions, and associations with the same product, issue, person, social group, etc. Annotation aggregation, by, say, majority vote, may convey a more homogeneous picture to the ML system. Annotation aggregation may also capture stereotypes and inappropriate associations for already marginalized groups. (For example, majority group A may perceive a minority group B as less competent, or less generous.) Such inappropriate biases are also encoded in large language models. When using language models or emotion datasets, assess the risk of such biases for the particular context and take corrective action as appropriate.
#16. One “Right” Label or Many Appropriate Labels: When designing data annotation efforts, consider: Is there a “right” answer and a “wrong” one? Who decides what is correct/appropriate? Are we including the voices of those that are marginalized and already under-represented in the data? When working with emotion and language data, there are usually no “correct” answers; rather, some answers are more appropriate than others. And there can be multiple appropriate answers.
- If a task has clear correct and wrong answers and knowing the answers requires some training/qualifications, then one can employ domain experts to annotate the data. However, as mentioned, emotion annotations largely do not fall in this category.
- If the goal is to determine how people use language, and there can be many appropriate answers, or if we want to know how people perceive words, phrases, and sentences, then we might want to employ a large number of annotators. This is much more in line with what is appropriate for emotion annotations: people are the best judges of their own emotions and of the emotions they perceive from utterances.
Seek appropriate demographic information (respectfully and ethically). Document annotator demographics, annotation instructions, and other relevant details. These are useful in conveying to the reader that there is no one “correct” answer and that the dataset is situated in who annotated the data, the precise annotation instructions, when the data was annotated, etc.
#17. Label Aggregation: Multiple annotations (by different people) for the same instance are usually aggregated by choosing the majority label. However, majority voting tends to capture majority-group attitudes (at the expense of other groups). As a result, researchers have sometimes released not just the aggregated results but also the raw (pre-aggregated) data, as well as various versions of the aggregated results. Others have argued in favor of not doing majority voting at all and including all annotations as input to ML systems. However, saying all voices should be included has its own problems: e.g., how to address and manage inappropriate/racist/sexist opinions; how to disentangle low-frequency valid opinions from genuine annotation errors and malicious annotations? (See also #15 Norms of Attitudes and #47 Disaggregation.)
If using majority voting, acknowledge its limitations. Acknowledge that it may be missing some or many voices. Explore statistical approaches to finding multiple appropriate labels, while still discarding noise. Employ separate manual checks to determine whether the human annotations also capture inappropriate human biases. Such biases may be useful for some projects (e.g., work studying such biases), but not for others. Warn users of inappropriate biases that may exist in the data, and suggest strategies to deal with them when using the dataset.
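To make the contrast concrete, here is a minimal sketch comparing majority-vote aggregation with keeping the full label distribution; the annotations are invented.

```python
# Minimal sketch: majority voting vs. keeping the full label distribution.
# Here, majority voting hides that 2 of 5 annotators perceived sadness.
from collections import Counter

annotations = ["anger", "anger", "anger", "sadness", "sadness"]

counts = Counter(annotations)
majority_label, _ = counts.most_common(1)[0]
distribution = {lab: n / len(annotations) for lab, n in counts.items()}

print("majority vote:", majority_label)  # anger
print("distribution:", distribution)     # {'anger': 0.6, 'sadness': 0.4}
```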
#18. Historical Data (Who is Missing and What are the Biases): Machine learning methods feed voraciously on data (often historical data). Natural language processing systems often feed on huge amounts of data collected from the internet. However, the data is not representative of everyone, and seeped into this data are our biases. Historical data over-represents people who have had power: those who are more well-to-do, mostly from the west, mostly English-speaking, mostly white, mostly able-bodied, and so on. So the machines that feed on such data often learn those perspectives at the expense of the views of those already marginalized.
When using any dataset, devote resources to study who is included in the dataset and whose voices are missing. Take corrective action as appropriate. Keep a portion of your funding for work with marginalized communities. Keep a portion of your funding for work on less-researched languages.
#19. Training–Deployment Data Differences: The accuracy of supervised systems is contingent on the assumption that the data the system is applied to is similar to the data the system was trained on. Deploying an off-the-shelf sentiment analysis system on data from a different domain, from a different time, or with a different class distribution than the training data will likely result in poor predictions. Systems that are to be deployed to handle open-domain data should be trained on many diverse datasets and tested on many datasets that are quite different from the training datasets.
E. The People Behind the Data
What are the ethical implications on the people who have produced the data?
When building systems, we make extensive use of (raw and emotion-labeled) data. It can sometimes be easy to forget that behind the data are the people that produced it, and imprinted in it are a plethora of personal information.
#20. Platform Terms of Service: Data for ML systems is often scraped from websites or extracted from large online platforms (e.g., Twitter, Reddit) using APIs. The terms of service for these platforms often include protections for the users and their data. Ensure that the terms of service of the source platforms are not violated: e.g., data scraping is allowed and data redistribution is allowed (in raw form or through ids). Ensure compliance with the robot exclusion protocol.
#21. Anonymization and Ability to Delete One’s Information: Take action to anonymize data when dealing with private data, e.g., scrub identifying information. Some techniques are better at anonymization than others. (See, for example, privacy-preserving work on word embeddings and sentiment data.) Provide mechanisms for people to remove their data from the dataset if they choose to.
Choose to not work with a dataset if adequate safeguards cannot be placed.
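Below is a minimal, deliberately naive sketch of scrubbing a few obvious identifier patterns with regular expressions; real anonymization is much harder, since names, locations, rare events, and combinations of quasi-identifiers can all re-identify a person.

```python
# Deliberately naive sketch: scrub a few obvious identifier patterns.
# Real anonymization requires much more than surface pattern matching.
import re

PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "EMAIL"),        # email addresses
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "PHONE"),  # US-style phones
    (re.compile(r"@\w+"), "@USER"),                               # social media handles
]

def scrub(text: str) -> str:
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

print(scrub("Call me at 555-123-4567 or mail jo@example.com, says @jo_93"))
# Call me at PHONE or mail EMAIL, says @USER
```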
#22. Warnings and Recourse: Annotating highly emotional, offensive, or suicidal utterances can adversely impact the well-being of the annotators. Provide appropriate warnings. Minimize the amount of such data each annotator is exposed to. Provide options for psychological help as needed.
#23. Crowdsourcing: Crowdsourcing (splitting a task into multiple independent units and uploading them on the internet so that people can solve them online) has grown to be a major source of labeled data in NLP, Computer Vision, and a number of other academic disciplines. Compensation often gets most of the attention when talking about crowdsourcing ethics, but there are several other ethical considerations involved in such work, such as: worker invisibility, lack of a learning trajectory, the humans-as-a-service paradigm, worker well-being, and worker rights. See the broader literature on crowdsourcing ethics, as well as the (public) guidelines by AI2 for its researchers.
METHOD
(Eight considerations.)
F. Why This Method
What are the ethical implications of using a given method?
#24. Methods and their Tradeoffs: Different methods entail different trade-offs:
Less Accurate vs. More Accurate: This usually gets all the attention; value the other dimensions listed below as well. (See also the IMPACT AND EVALUATION section.)
White Box (one can understand why the system makes a given prediction) vs. Black Box (one does not know why it makes a given prediction): understanding the reasons behind a prediction helps identify bugs and biases, aids contestability, and arguably makes a system better suited for answering research questions about language use and emotions.
Less Energy Efficient vs. More Energy Efficient: See discussion further below on Green AI.
Less Data Hungry vs. More Data Hungry: data may not always be abundant; needing too much data about a person raises privacy concerns.
Less Privacy Preserving vs. More Privacy Preserving: There is greater appreciation lately for the need for privacy-preserving NLP.
Fewer Inappropriate Biases vs. More Inappropriate Biases: We want our algorithms to not perpetuate/amplify inappropriate human biases.
Consider various dimensions of a method and their importance for the particular system deployment context before deciding on the method. Focusing on fewer dimensions may be okay in a research system, but widely deployed systems often require a good balance across the many dimensions.
#25. Who is Left Out: The dominant paradigm in Machine Learning and NLP is to use large models pre-trained on massive amounts of raw data (unannotated text, pictures, videos, etc.) and then fine-tuned on small amounts of labeled data (e.g., sentences labeled with emotions) to learn how to perform a particular task. As such, these methods tend to work well for people who are well-represented in the data (raw and annotated), but not so well for others. (See also #18 Historical Data.)
Even just documenting who is left out is a valuable contribution.
Explore alternative methods that are more inclusive, especially for those not usually included by other systems.
#26. Spurious Correlations: Machine learning methods have been shown to be susceptible to spurious correlations. For example, when asked what the ground is covered with, visual QA systems tend to always say snow, because in the training set this question was only asked when the ground was covered with snow. Spurious correlations have also been shown in melanoma and skin lesion detection systems. Similarly, natural language inference systems can sometimes decide on the prediction from information in the hypothesis alone, without regard for the premise (for example, because a hypothesis with negation is often a contradiction in the training set).
Similarly, machine learning systems capture spurious correlations when doing AER: for example, marking some countries and people of some demographics with less charitable and more stereotypical sentiments and emotions. This phenomenon is especially marked in abusive language detection work, where it was shown that data collection methods, in combination with the ML algorithm, can result in the system marking any comment with identity terms such as gay, muslim, and jew as offensive.
Consider how the data collection and machine learning setups can be designed to avoid such spurious correlations, especially correlations that perpetuate racism, sexism, and stereotypes. In extreme cases, spurious correlations lead to pseudoscience and physiognomy. For example, there has been a spate of papers attempting to determine criminality, personality, trustworthiness, and emotions just from one’s face or outer appearance. Note that sometimes systematic idiosyncrasies of the data can lead to apparently good results on a held-out test set even on such tasks. Thus it is important to consider: are the method and the sources of information used expected to capture the phenomenon of interest? Is there a risk that the use of this method may perpetuate false beliefs and stereotypes? If yes, take appropriate corrective action.
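One common diagnostic for such spurious correlations is to score neutral template sentences that mention identity terms. The sketch below uses hypothetical templates and a deliberately biased stand-in scorer; in practice one would substitute the scoring function of the system under test.

```python
# Sketch: probing a toxicity/offensiveness classifier for spurious
# identity-term correlations using neutral template sentences. High scores
# on these neutral sentences signal a spurious correlation, not genuinely
# offensive content.
TEMPLATES = [
    "I am {}.",
    "My neighbour is {}.",
    "{} people wrote many of my favourite books.",
]
IDENTITY_TERMS = ["gay", "muslim", "jewish", "christian", "straight", "tall"]

def offensiveness_score(text):
    """Stand-in scorer simulating a biased model; replace with your
    model's scoring function (e.g., predict_proba)."""
    flagged = {"gay", "muslim", "jewish"}
    return 0.9 if any(w in text.lower() for w in flagged) else 0.1

for term in IDENTITY_TERMS:
    scores = [offensiveness_score(t.format(term)) for t in TEMPLATES]
    print(f"{term:>10}: mean offensiveness = {sum(scores) / len(scores):.2f}")
# A biased system scores the first three terms ~0.90 and the rest ~0.10,
# even though every template sentence is neutral.
```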
#27. Context is Everything: Considering a greater amount of context is often crucial in correctly determining emotions/sentiment. What was said/written before and after the target utterance? Where was this said? What was the intonation and what was emphasized? Who said this? And so on. More context can be a double-edged sword though. The more the system wants to know about a person to make better predictions, the more we worry about privacy. Work on determining the right balance between collecting more user information and privacy considerations, as appropriate for the context in which the system is deployed.
#28. Individual Emotion Dynamics: A form of contextual information is one’s utterance emotion dynamics. The idea is that different people might have different steady states in terms of where they tend to most commonly be (considering any affect dimension of choice). Some may move out of this steady state often, while others venture out less often. Some recover quickly from the deviations; for others it may take a lot of time. Similar emotion dynamics occur in the text that people write or the words they utter—utterance emotion dynamics. The degree to which utterance emotion dynamics correlate with one’s true emotion dynamics is an open question, but one can argue that examining utterance emotion dynamics is valuable on its own. Access to utterance emotion dynamics provides greater context and helps judge the degree of emotionality of new utterances by the person. Systems that make use of such detailed contextual information are more likely to make appropriate predictions for diverse groups of people. However, the degree of personal information they require warrants care, concern, and meaningful consent from the users.
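As a rudimentary illustration, the sketch below estimates a person’s “home base” valence and their deviations from it using a toy word-valence lexicon; real work would use a full affect lexicon and, as noted above, meaningful consent from the people whose utterances are analyzed.

```python
# Sketch: rudimentary utterance emotion dynamics from a tiny valence lexicon.
# The lexicon values and utterances below are toy placeholders.
import statistics

VALENCE = {"happy": 0.9, "great": 0.8, "fine": 0.6, "okay": 0.55,
           "sad": 0.2, "awful": 0.1, "terrible": 0.05}

def utterance_valence(utterance):
    """Mean valence of the lexicon words present; None if no coverage."""
    scores = [VALENCE[w] for w in utterance.lower().split() if w in VALENCE]
    return sum(scores) / len(scores) if scores else None

def emotion_dynamics(utterances):
    """Home base (mean valence), variability, and per-utterance deviations."""
    series = [v for v in map(utterance_valence, utterances) if v is not None]
    home = statistics.mean(series)
    return {
        "home_base": round(home, 3),
        "variability": round(statistics.pstdev(series), 3),
        "deviations": [round(v - home, 3) for v in series],
    }

print(emotion_dynamics([
    "feeling happy today", "all fine here", "what a terrible week",
    "things are okay", "great news at last",
]))
```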
#29. Historical behavior is not always indicative of future behavior (for groups and individuals): Systems are often trained on static data from the past. However, perceptions, emotions, and behavior change with time. Thus automatic systems may make inappropriate predictions on current data. (See also #18 Historical Data.)
#30. Emotion Management, Manipulation: Managing emotions is a central part of any human–computer interaction system (even if this is often not an explicitly stated goal). Just as in human–human interactions, we do not want the systems we build to cause undue stress, pain, or unpleasantness. For example, a chatbot has to be careful not to offend or hurt the feelings of the user with whom it is interacting. For this, it needs to assess the emotions conveyed by the user, in order to then be able to articulate the appropriate information with appropriate affect. However, this same technology can enable companies and governments to detect one’s emotions in order to manipulate one’s behavior. For example, it is known that we purchase more products when we are sad. So sensing when you are most susceptible to suggestion, to plant ideas of what to buy, who to vote for, or who to dislike, can have dangerous implications. On the other hand, identifying how to cater to individual needs to improve compliance with public health measures in a world-wide pandemic, or to help people give up smoking, may be seen in a more positive light. As with many things discussed in this article, consider the context to determine what levels of emotion management and meaningful consent are appropriate.
#31. Green AI: A direct consequence of using ever-larger pre-trained models (with large numbers of parameters, trained on large numbers of examples) for AI tasks is that these systems are now drivers of substantial energy consumption. Recent papers have shown the increasing carbon footprint of AI systems and proposed approaches to address it. Thus, there is a growing push to develop AI methods that are not singularly focused on accuracy numbers on test sets, but are also mindful of efficiency and energy consumption. Proponents of Green AI encourage reporting of cost per example, training set size, number of hyperparameter tuning experiments, and budget-accuracy curves. They also argue for regarding efficiency as a valued scientific contribution.
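A minimal sketch of such reporting is below; wall-clock time is used as a crude stand-in for energy cost, and the training function, example count, and trial count are hypothetical placeholders.

```python
# Sketch: logging simple efficiency numbers alongside accuracy, in the
# spirit of Green AI reporting. Wall-clock time is a crude proxy; dedicated
# carbon/energy trackers would give better estimates.
import time

def train_and_report(train_fn, n_examples, n_hyperparam_trials):
    """Wrap a training function and report cost-related statistics."""
    start = time.perf_counter()
    accuracy = train_fn()
    elapsed = time.perf_counter() - start
    print(f"accuracy:                {accuracy:.3f}")
    print(f"training examples:       {n_examples}")
    print(f"hyperparameter trials:   {n_hyperparam_trials}")
    print(f"total wall-clock (s):    {elapsed:.2f}")
    print(f"seconds per example:     {elapsed / n_examples:.6f}")

# Toy usage: a dummy training function standing in for a real one.
train_and_report(lambda: 0.87, n_examples=10_000, n_hyperparam_trials=12)
```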
IMPACT AND EVALUATION
(Ten considerations.)
G. Metrics
All evaluation metrics are misleading. Some metrics are more useful than others.
#32. Reliability/Accuracy: No emotion recognition method is perfect. However, some approaches are much less accurate than others, and some techniques are so unreliable that they are essentially pseudoscience. For example, trying to predict personality, mood, or emotions from physical appearance has long been criticized. The ethics of a number of existing commercial systems that purportedly detect emotions from facial expressions is called into question by Barrett et al. (2019), who show the low reliability of recognizing emotions from facial expressions.
#33. Demographic Biases: Some systems can be unreliable or systematically inaccurate for certain groups of people, races, genders, people with health conditions, people who are on the autism spectrum, people from different countries, etc. Such systematic errors can occur when working on:
- Utterances of a group or faces of a group: For example, low accuracy in recognizing emotions in text produced by African Americans or in recognizing faces of African Americans.
- Utterances mentioning a group: For example, systematically marking texts mentioning African Americans as more angry, or texts mentioning women as more emotional.
Determine and present disaggregated accuracies. Take steps to address disparities in performance across groups. (See also #47 Disaggregation.)
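A minimal sketch of computing such disaggregated accuracies follows; the group labels are toy placeholders, and in practice demographic information must itself be collected responsibly (see #50).

```python
# Sketch: accuracy overall and per demographic group, so that disparities
# across groups are visible rather than hidden in one aggregate number.
from collections import defaultdict

def disaggregated_accuracy(y_true, y_pred, groups):
    """Return overall accuracy plus accuracy for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    report = {"overall": round(sum(correct.values()) / sum(total.values()), 3)}
    for g in sorted(total):
        report[g] = round(correct[g] / total[g], 3)
    return report

# Toy predictions and (hypothetical) group labels.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "B", "B", "A", "B", "B", "A"]
print(disaggregated_accuracy(y_true, y_pred, groups))
# -> {'overall': 0.625, 'A': 1.0, 'B': 0.25}: a large gap hidden by the mean.
```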
#34. Sensitive Applications: Some applications are considerably more sensitive than others and thus necessitate the use of a much higher quality of emotion recognition systems (if used at all). Automatic systems may sometimes be used in high-stakes applications if their role is to assist human experts. For example, assisting patients and health experts in tracking the patient’s emotional state.
#35. Testing (on Diverse Datasets, on Diverse Metrics): Results on any test set are contingent on the attributes of that test set, and may not be indicative of real-world performance; they can also hide implicit biases and systematic errors of many kinds. Good practice is to test the system on many different datasets that explore various input characteristics. For example, see these evaluations that cater to a diverse set of emotion-related tasks, datasets, linguistic phenomena, and languages: SemEval 2014 Task 9, SemEval 2015 Task 10, and SemEval 2018 Task 1. (The last of these also includes an evaluation component for demographic bias in sentiment analysis systems.) See also HateCheck for work on creating separate diagnostic datasets for various types of hate speech, and Google’s recommendations on best practices for metrics and testing (https://ai.google/responsibilities/responsible-ai-practices).
H. Beyond Metrics
Are we even measuring the right things?
#36. Interpretability, Explainability: As ML systems are deployed more widely and impact a greater sphere of our lives, there is a growing understanding that these systems can be flawed to varying degrees. One line of approach in understanding and addressing these flaws is to develop interpretable or explainable models. Interpretability and explainability each have been defined in a few different ways in the literature, but at the heart of the definitions is the idea that we should be able to understand why a system is making a certain prediction: what pieces of evidence are contributing to the decision and to what degree? That way, humans can better judge how valid a particular prediction is, better judge how accurate the model is for certain kinds of input, and even how accurate the system is in general and over time.
In line with this, AER systems should have components that depict why they are making certain predictions for various inputs. As described in the survey by Luo et al. (2021), such components can be viewed from several perspectives, including:
are the explanations meant for the scientist/engineer or for a lay person?
are the explanations faithful (accurate reflections of system behavior)?
are the explanations easily comprehensible?
to what extent do people trust the explanations?
Responsible research and product development entails actively considering various explainability strategies at the very outset of the project. This includes, where appropriate, specifically choosing an ML model that lends itself to better interpretability, running ablation and disaggregation experiments, running data perturbation and adversarial testing experiments, and so on.
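As an example of the data perturbation strategy mentioned above, here is a minimal leave-one-word-out importance probe; the scoring function is a toy stand-in for a real classifier’s class probability.

```python
# Sketch: leave-one-word-out importance. The importance of each word is the
# drop in the model's score for the predicted class when the word is removed.
def word_importances(text, score_fn):
    """Return (word, score drop) pairs, largest drop first."""
    words = text.split()
    base = score_fn(text)
    importances = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        importances.append((words[i], round(base - score_fn(reduced), 3)))
    return sorted(importances, key=lambda pair: -pair[1])

# Toy scorer standing in for P(joy | text) from a real classifier.
def toy_joy_score(text):
    cues = {"thrilled": 0.5, "delighted": 0.4, "party": 0.1}
    return min(1.0, sum(v for w, v in cues.items() if w in text.lower().split()))

print(word_importances("I am thrilled about the party", toy_joy_score))
# -> [('thrilled', 0.5), ('party', 0.1), ('I', 0.0), ...]
```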
#37. Visualization: Visualizations help convey trends in emotions and sentiments, and are common in the emotion analysis of streams of data such as tweet streams, novels, newspaper headlines, etc. Several considerations impact the extent to which visualizations are effective, convey key trends, and avoid misleading the viewer:
It is almost always important to not only show the broad trends but also to allow the user to drill down to the source data that is driving the trend.
Summarize the data driving the trend, for example through treemaps of the most frequent emotion words and phrases in the data.
Interactive visualizations allow users to explore different trends in the data and even drill down to the source data that is driving the trends.
See work on visualizing emotions and sentiment.
#38. Safeguards and guard rails: Devote time and resources to identify how the system can be misused and how the system may cause harm because of its inherent biases and limitations. Identify steps that can be taken to mitigate these harms.
#39. Recognize that there will be harms even when the system works “correctly”: Provide a mechanism for users to report issues. Have resources in place to deal with unanticipated harms. Document societal impacts, including both benefits and harms.
**#40. Contestability and Recourse:** Mulligan et al. (2019) argue that contestability—the mechanisms made available to challenge the predictions of an AI system—is more important and beneficial than transparency/explainability. Not only does contestability allow people to challenge the decisions made by a system, it also invites participation in the understanding of how machine learning systems work and of their limitations. See Google’s The What-If Tool as an example of how people are invited to explore ML systems by changing inputs (without needing to do any coding). AER systems are encouraged to produce similar tools (see the sketch after this list), for example:
tools that allow one to see counterfactuals—given a data point, what is the closest other data point for which the system predicts a different label; tools that allow one to try out various input conditions/features to see what helps obtain the desired classification label.
tools that allow one to see classification accuracies on different demographics and the impact of different classifier parameters and thresholds on these scores.
tools that allow one to see confidence of the classifier for a given prediction and the features that were primarily responsible for the decision.
See Denton et al. (2020) for ideas on participatory dataset creation and management.
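To illustrate the first kind of tool, here is a bare-bones counterfactual lookup; the two-feature instances and threshold classifier are toy placeholders for a real model and dataset.

```python
# Sketch: given an instance, find the nearest instance that receives a
# different predicted label (a simple counterfactual, as exposed by
# contestability tools such as the What-If Tool).
import math

def nearest_counterfactual(x, dataset, predict):
    """Closest point (Euclidean) whose prediction differs from predict(x)."""
    own_label = predict(x)
    candidates = [d for d in dataset if predict(d) != own_label]
    if not candidates:
        return None
    return min(candidates, key=lambda d: math.dist(x, d))

# Toy 2-feature instances and a threshold "classifier".
data = [(0.2, 0.1), (0.4, 0.3), (0.6, 0.7), (0.9, 0.8)]
predict = lambda p: int(p[0] + p[1] > 1.0)

x = (0.4, 0.3)                                    # predicted label 0
print(nearest_counterfactual(x, data, predict))   # -> (0.6, 0.7), label 1
```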
#41. Be wary of Ethics Washing: As we push farther into incorporating ethical practices in our projects, we need to be wary of inauthentic and cursory attention to ethics for the sake of appearances. This VentureBeat article presents some nice tips to avoid ethics washing, including: “Welcome ‘constructive dissent’ and uncomfortable conversations”, “Don’t ask for permission to get started”, “Share your shortcomings”, “Be prepared for gray area decision-making”, and “Ethics has few clear metrics”.
IMPLICATIONS FOR PRIVACY, SOCIAL GROUPS
(Nine considerations.)
I. Implications for Privacy
(Cuts across Task Design, Data, Method, Impact and Evaluation)
#42. Privacy and Personal Control: As noted privacy expert Dr. Ann Cavoukian puts it: privacy is not about hiding information or secrecy. It is about choice, “You have to be the one to make the decision.” Individuals may not want their emotions to be inferred. Applying emotion detection systems en masse, gathering emotion information continuously without meaningful consent, is an invasion of privacy, harmful to the individual, and dangerous to society. (See the report created for the members of the European Parliament.) Follow the seven principles of privacy by design: Proactive not Reactive (preventative not remedial), Privacy as the Default, Privacy Embedded into Design, Full Functionality (positive-sum, not zero-sum), End-to-End Security (full lifecycle), Visibility and Transparency, and Respect for User Privacy (keep it user-centric). See also prior privacy-preserving work on sentiment.
**#43. Group Privacy and Soft Biometrics:** Floridi (2014) argues that many of our conversations around privacy are far too focused on individual privacy and ignore group privacy — the rights and protections we need as a group:
There are very few Moby-Dicks. Most of us are sardines. The individual sardine may believe that the encircling net is trying to catch it. It is not. It is trying to catch the whole shoal. It is therefore the shoal that needs to be protected, if the sardine is to be saved. —Floridi (2014)
The idea of group privacy becomes especially important in the context of soft biometrics, such as traits and preferences determined through AER, that are not intended to identify individuals, but rather to identify groups of people with similar characteristics. See further discussions of the implications of AER for group privacy, and of how companies are using AER to determine group preferences, even though a large number of people disfavour such profiling.
#44. Mass Surveillance versus Right to Privacy, Right to Freedom of Expression, and Right to Protest: Emotion recognition, sentiment analysis, and stance detection can be used for mass surveillance by companies and governments (often without meaningful consent). People are often unaware that their information (e.g., what they say or click on an online platform) can be used against their best interest, and often they do not have meaningful choices regarding privacy when they use online platforms. In extreme cases, as with authoritarian governments, this can lead to dramatic curtailing of freedoms of expression and the right to protest.
#45. Right Against Self-Incrimination: In a number of countries around the world, the accused are given legal rights against self-incrimination. However, automatic methods of emotion, stance, and deception detection can potentially be used to circumvent such protections. (See, for example, page 37 of the ARTICLE 19 report on emotion recognition.)
#46. Right to Non-Discrimination: Automatic methods of emotion, stance, and deception detection can sometimes systematically discriminate based on protected categories such as race, gender, and religion. Even if ML systems are not fed race or gender information directly, studies have shown that they often pick up on proxy attributes for these categories. Report disaggregated results as appropriate.
J. Implications for Social Groups
(Cuts across Task Design, Data, Method, Impact and Evaluation)
#47. Disaggregation: Society has often viewed different groups differently (because of their race, gender, income, language, etc.), imposing unequal social and power structures. Even when the biases are not conscious, the unique needs of different groups are often overlooked. For example, Perez (2019) discusses, through numerous examples, how there is a considerable lack of disaggregated data for women and how that directly leads to negative outcomes in all spheres of their lives, including health, income, safety, and the degree to which they succeed in their endeavors. This holds true (perhaps even more) for transgender people. Thus emotion researchers should consider the value of disaggregation at various levels, including:
When creating datasets: Obtain annotations from a diverse group of people. Report aggregate-level demographic information. Rather than only labeling instances with the majority vote, consider the value of providing multiple sets of labels as per each of the relevant and key demographic groups.
When testing hypotheses or drawing inferences about language use: Consider also testing the hypotheses disaggregated for each of the relevant and key demographic groups.
When building automatic prediction systems: Report performance disaggregated for each of the relevant and key demographic groups. (See work on model cards. See how sentiment analysis systems can be systematically biased.)
#48. Intersectional Invisibility in Research: Intersectionality refers to the complex ways in which different group identities such as race, class, neurodiversity, and gender overlap to amplify discrimination or disadvantage. Purdie-Vaughns and Eibach (2008) argue that people with multiple group identities are often not seen as prototypical members of any of their groups and thus are subject to what they call intersectional invisibility—omissions of their experiences in historical narratives and cultural representation, lack of support from advocacy groups, and mismatch with existing anti-discrimination frameworks. Many of the forces that lead to such invisibility (e.g., not being seen as prototypical members of a group), along with other notions common in the quantitative research paradigm (e.g., the predilection to work on neat, non-overlapping, populous categories), lead to intersectional invisibility in research. As ML/NLP researchers, we should be cognizant of such blind spots and work to address these gaps. Further, new ways of doing research that address the unique challenges of intersectional research need to be valued and encouraged.
#49. Reification and Essentialization: Some demographic variables are essentially, or in large part, social constructs. Thus, work on disaggregation can sometimes reinforce false beliefs that there are innate differences across groups or that some features are central for one to belong to a social category. It is therefore imperative to contextualize work on disaggregation: for example, by impressing on the reader that even though race is a social construct, people’s perceptions and behavior around race lead to very real consequences.
#50. Attributing People to Social Groups: In order to obtain disaggregated results, one sometimes needs access to demographic information. This of course leads to considerations such as: whether people are providing meaningful consent to the collection of such data, and whether the data is being collected in a manner that respects their privacy, their autonomy (e.g., can they choose to delete their information later), and their dignity (e.g., allowing self-descriptions). Challenges persist in terms of how to design effective and inclusive questionnaires. Further, even with self-report textboxes that give the respondent the primacy and autonomy to express their race, gender, etc., downstream research often ignores such data or combines information in ways beyond the control of the respondent. Some work tries to infer aggregate-level group statistics automatically: for example, inferring race, gender, etc. from cues such as the type of language used or historical name-gender associations, in order to do disaggregated analysis. However, such approaches are fraught with ethical concerns such as misgendering, essentialization, and reification. Further, historically, people have been marginalized because of their social category, and so methods that try to detect these categories raise legitimate and serious concerns of abuse, erasure, and the perpetuation of stereotypes.
In many cases, it may be more appropriate to perform disaggregated analysis on something other than a social category. For example, when testing face recognition systems, it might be more appropriate to test system performance on different skin tones (as opposed to race). Similarly, when working on language data, it might be more appropriate to analyze data partitioned by linguistic gender (as opposed to social gender). See Cao and Daumé (2021) for a useful discussion of linguistic vs. social gender, and Bauer et al. (2017) for a great example of creating more inclusive data for research.
In Summary
This paper aggregates and organizes various ethical considerations relevant to automatic emotion recognition, drawn from the wider AI Ethics and Affective Computing literature. It includes brief sections on the modalities of information, the task, and the applications of AER to set the context. It then presents fifty ethical considerations, grouped thematically. Notably, the sheet fleshes out assumptions hidden in how AER is commonly framed, and in the choices often made regarding the data, method, and evaluation. Special attention is paid to the implications of AER for privacy and for social groups. The paper discusses how these considerations manifest within AER and outlines best practices for responsible research. A succinct list of key recommendations for responsible AER discussed in the paper is provided in the Appendix.
The objective of the sheet is to encourage practitioners to think in more detail, and at the very outset, about why to automate, how to automate, and how to judge success based on broad societal implications. I hope that it will help engage the various stakeholders of AER with each other; help stakeholders challenge assumptions made by researchers and developers; and help develop appropriate harm mitigation strategies. Additionally, for those who are new to emotion recognition, the ethics sheet acts as a useful introductory document (complementing survey articles).
As an expert on a technology, an often overlooked and undervalued responsibility is to convey its broad societal impacts to those that deploy the technology, those that make policy decisions about the technology, and the society at large. I hope that this sheet helps to that end for emotion recognition, and also spurs the wider community to ask and document: What ethical considerations apply to my task?
I am grateful to Annika Schoene, Mallory Feldman, and Tara Small for their belief and encouragement in the early days of this project. Many thanks to Mallory Feldman (Carolina Affective Neuroscience Lab, UNC) for discussions on the psychology and complexity of emotions. Many thanks to Annika Schoene, Mallory Feldman, Roman Klinger, Rada Mihalcea, Peter Turney, Barbara Plank, Malvina Nissim, Viviana Patti, Maria Liakata, and Emily Mower Provost for discussions about ethical considerations for emotion recognition and thoughtful comments. Many thanks to Tara Small, Emily Bender, Esma Balkir, Isar Nejadgholi, Patricia Thaine, Brendan O’Connor, Cyril Goutte, Eric Joanis, Joel Martin, Roland Kuhn, and Sowmya Vajjala for thoughtful comments on the blog post on this work.
APPENDIX: Recommendations for Responsible AER
Below is a list of key recommendations for responsible AER discussed earlier in the context of various ethical considerations. They are compiled here for easy access. Note that adhering to these recommendations does not guarantee “ethicalness”; nor do these recommendations apply to all contexts. They are guidelines meant to help responsible development and use of AER systems. Particular development or deployment contexts entail further considerations and steps to address them.
Task Design
Center the people, especially marginalized and disadvantaged communities, such that they are not mere passive subjects but rather have the agency to shape the design process.
Ask who will benefit from this work and who will not? Will this work shift power from those who already have a lot of power to those who have less power? How can the task be designed so that it helps those that are most in need?
Ask how the AER design will impact people in the context of neurodiversity, alexithymia, and the autism spectrum.
Carefully consider what emotion task should be the focus of the work (whether conducting a human-annotation study or building an automatic prediction model). Different emotion tasks entail different ethical considerations. Communicate the nuance of exactly what emotions are being captured to the stakeholders. Not doing so will lead to the misuse and misinterpretation of one’s work.
AER systems should not claim to determine one’s emotional state from their utterance, facial expression, gait, etc. At best, AER systems capture what one is trying to convey or what is perceived by the listener/viewer, and even there, given the complexity of human expression, they are often inaccurate.
Even when AER systems attempt to determine the emotional state of a person (or a group) over time (drawing inferences at aggregate level from large amounts of data), such as studies on public health listed in 3.3, it is best to be cautious when making claims. Use AER as one source of evidence amongst many (and involve relevant expertise; e.g., from public health and psychology).
Lay out the theoretical foundations for the task from relevant research fields such as psychology, linguistics, and sociology, and relate the opinions of relevant domain experts to the task formulation. Realize that it is impossible to capture the full emotional experience of a person.
Do not refer to some emotions as basic emotions, unless you mean to convey your belief in the Basic Emotions Theory. Careless endorsement of theories can lead to the perpetuation of belief in ideas that are actively harmful (such as suggesting we can determine internal state from outward appearance — physiognomy).
Realize that various ethical concerns, including privacy, manipulation, bias, and free speech, are further exacerbated when systems act on individuals. Take steps such as anonymization and releasing information at aggregate levels.
Think about how the AER system can be misused, and how that can be minimized.
Use AER as one source of information among many.
Do not use AER for fully automated decision making. AER may be used to assist humans in making decisions, coming up with ideas, suggesting where to delve deeper, and sparking their imagination. Consider also the risk of the system inappropriately biasing the human decision makers.
Disclose to all stakeholders the decisions that are being made (in part or wholly) by automation. Provide mechanisms for the user to understand why relevant predictions were made, and also to contest the decisions.
Data
Examine the choice of data used by AER systems across various dimensions: size of data; whether it is custom data or data obtained from an online platform; less private/sensitive data or more private/sensitive data; what languages are represented; degree of documentation; and so on.
Expressions of emotions through language are highly variable: Different people express the same emotion differently; the same text may convey different emotions to different people. This variability should also be taken into consideration when building datasets, systems, and choosing where to deploy the systems.
Variability is common not just for emotions but also for natural language. People convey meaning in many different ways. There is usually no one “correct” way of articulating our thoughts.
Aim to obtain a useful level of emotion recognition capability without systematic gaps that convey a strong sense of emotion-expression normativeness.
When using language models or emotion datasets, avoid perpetuating stereotypes of how one group of people perceive another group.
Obtain data from a diverse set of sources. Report details of the sources.
When creating emotion datasets, limit the number of instances included per person. Mohammad and Kiritchenko (2018) kept one tweet for every query term and tweeter combination when studying relationships between affect categories (data also used in a shared task on emotions). Kiritchenko et al., (2020) kept at most three tweets per tweeter when studying expressions of loneliness.
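A minimal sketch of such capping, with illustrative field names and an example threshold of three instances per author:

```python
# Sketch: cap the number of instances per author when building an emotion
# dataset, so no single person's language dominates the data.
from collections import defaultdict

def cap_per_author(records, max_per_author=3):
    """Keep at most `max_per_author` records per author id, in order."""
    kept, counts = [], defaultdict(int)
    for rec in records:
        author = rec["author_id"]
        if counts[author] < max_per_author:
            kept.append(rec)
            counts[author] += 1
    return kept

# Toy records: five from one author, one from another.
records = [{"author_id": "u1", "text": f"tweet {i}"} for i in range(5)]
records.append({"author_id": "u2", "text": "another tweet"})
print(len(cap_per_author(records)))  # -> 4 (3 kept for u1, 1 for u2)
```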
Obtain annotations from a diverse set of people. Report aggregate-level demographic information of the annotators.
In emotion and language data, often there are no “correct” answers. Instead, it is a case of some answers being more appropriate than others. And there can be multiple appropriate answers.
Part of conveying that there is no one “correct” answer is to convey how the dataset is situated in many parameters, including: who annotated it, the precise annotation instructions, what data was presented to the annotators (and in what form), and when the data was annotated.
Release raw data annotations as well as any aggregations of annotations.
If using majority voting, acknowledge its limitations.
Explore statistical approaches to finding multiple appropriate labels.
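A minimal sketch of retaining the full label distribution alongside the majority label (the annotations below are toy placeholders):

```python
# Sketch: keep the annotation distribution rather than only a majority vote.
from collections import Counter

annotations = {
    "inst1": ["joy", "joy", "joy", "surprise"],
    "inst2": ["anger", "sadness", "anger", "sadness"],  # exact 50/50 split
}

for inst, labels in annotations.items():
    counts = Counter(labels)
    majority, majority_n = counts.most_common(1)[0]
    distribution = {lab: round(n / len(labels), 2) for lab, n in counts.items()}
    print(inst, "majority:", majority, f"({majority_n}/{len(labels)})",
          "distribution:", distribution)
# inst2 shows why releasing distributions (and raw annotations) matters:
# the majority label hides a genuine 50/50 split across two emotions.
```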
Employ manual and automatic checks to determine whether the human annotations have also captured inappropriate biases. Such biases may be useful for some projects (e.g., work studying such biases), but not for others. Warn users appropriately and deploy measures to mitigate their impact.
When using any dataset, devote time and resources to study who is included in the dataset and whose voices are missing. Take corrective action as appropriate.
Keep a portion of your funding for work with marginalized communities and for work on less-researched languages.
Systems that are to be deployed to handle open-domain data should be trained on many diverse datasets and tested on many datasets that are quite different from the training datasets.
Ensure that the terms of service of the source platforms are not violated: e.g., that data scraping is allowed and that data redistribution is allowed (in raw form or through ids). Ensure compliance with the robot exclusion protocol. Take actions to anonymize data when dealing with sensitive or private data; e.g., scrub identifying information. Choose to not work with a dataset if adequate safeguards cannot be placed.
Proposals of data annotation efforts that may impact the well-being of annotators should first be submitted for approval to one’s Research Ethics Board (REB) / Institutional Research Board (IRB). The board will evaluate and provide suggestions so that the work complies with the required ethics standards.
An excellent jumping off point for further information on ethical conduct of research involving human subjects is The Belmont Report. The guiding principles they proposed are Respect for Persons, Beneficence, and Justice.
Method
Examine the choice of method across various dimensions such as interpretability, privacy concerns, energy efficiency, data needs, etc. Focusing on fewer dimensions may be okay in a research system, but widely deployed systems often require a good balance across the many dimensions. AI methods tend to work well for people who are well-represented in the data (raw and annotated), but not so well for others. Documenting who is left out is valuable. Explore alternative methods that are more inclusive. Consider how the data collection and machine learning setups can be designed to avoid spurious correlations, especially correlations that perpetuate racism, sexism, and stereotypes.
Systems are often trained on static data from the past. However, perceptions, emotions, and behavior change with time. Consider how automatic systems may make inappropriate predictions on current data.
Consider the system deployment context to determine what levels of emotional management and meaningful consent are appropriate.
Consider the carbon footprint of your method and value efficiency as a contribution. Report cost per example, training set size, number of hyperparameter tuning experiments, and budget-accuracy curves.
Impact and Evaluation
Consider whether the chosen metrics are measuring what matters.
Some methods can be unreliable or systematically inaccurate for certain groups of people, races, genders, people with health conditions, people from different countries, etc. Determine and present disaggregated accuracies. Test the system on many different datasets that explore various input characteristics.
Responsible research and product development entails actively considering various explainability strategies at the very outset of the project. This includes, where appropriate, specifically choosing an ML model that lends itself to better interpretability, running ablation and disaggregation experiments, running data perturbation and adversarial testing experiments, and so on.
When visualizing emotions, it is almost always important to not only show the broad trends but also to allow the user to drill down to the source data that is driving the trend. One can also summarize the data driving the trend, for example through treemaps of the most frequent emotion words.
Devote time and resources to identify how the system can be misused and how the system may cause harm because of its inherent biases and limitations. Recognize that there will be harms even when the system works “correctly”. Identify steps that can be taken to mitigate these harms.
Provide mechanisms for contestability that not only allow people to challenge the decisions made by a system about them, but also invite participation in understanding how machine learning systems work and their limitations.
Implications for Privacy
Privacy is not about secrecy. It is about personal choice. Follow Dr. Cavoukian’s seven principles of privacy by design.
Consider that people might not want their emotions to be inferred. Applying emotion detection systems en masse, gathering emotion information continuously without meaningful consent, is an invasion of privacy, harmful to the individual, and dangerous to society.
Soft biometrics also raise privacy concerns. Consider the implications of AER for group privacy, and that a large number of people disfavour such profiling.
Obtain meaningful consent as appropriate for the context. Working with more sensitive and more private data requires a more involved consent process where the user understands the privacy concerns and willingly provides consent. Consider harm mitigation strategies such as: anonymization techniques and differential privacy. Beware that these can vary in effectiveness.
Plan for how to keep people’s information secure.
Obtain permission for secondary use or if you intend to distribute the data.
When working out the privacy–benefit tradeoffs, consider who will really benefit from the technology. Especially consider whether those who benefit are people with power or those with less power. Also, as Dr. Cavoukian says, often privacy and benefits can both be had, “it is not a zero-sum game”.
Consider implications of AER for mass surveillance and how that undermines right to privacy, right to freedom of expression, right to protest, right against self-incrimination, and right to non-discrimination.
Implications for Social Groups
When creating datasets, obtain annotations from a diverse group of people. Report aggregate-level demographic information. Rather than only labeling instances with the majority vote, consider the value of providing multiple sets of labels as per each of the relevant and key demographic groups.
When testing hypotheses or drawing inferences about language use, consider also testing the hypotheses disaggregated for each of the relevant demographic groups.
When building automatic prediction systems, evaluate and report performance disaggregated for each of the relevant demographic groups.
Consider and report the implications of the AER system for intersectionality.
Contextualize work on disaggregation: for example, by impressing on the reader that even though race is a social construct, people’s perceptions and behavior around race lead to very real consequences.
Obtaining demographic information requires careful and thoughtful consideration, such as: whether people are providing meaningful consent to the collection of such data, and whether the data is being collected in a manner that respects their privacy, their autonomy (e.g., can they choose to delete their information later), and their dignity (e.g., allowing self-descriptions).
Bibliography
@inproceedings{poliak-etal-2018-hypothesis,
  pages = {180--191},
  doi = {10.18653/v1/S18-2023},
  url = {https://aclanthology.org/S18-2023},
  address = {New Orleans, Louisiana},
  year = {2018},
  month = {June},
  booktitle = {Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics},
  author = {Poliak, Adam and Naradowsky, Jason and Haldar, Aparajita and Rudinger, Rachel and Van Durme, Benjamin},
  title = {Hypothesis Only Baselines in Natural Language Inference},
}

@article{gururangan2018annotation,
  year = {2018},
  journal = {arXiv preprint arXiv:1803.02324},
  author = {Gururangan, Suchin and Swayamdipta, Swabha and Levy, Omer and Schwartz, Roy and Bowman, Samuel R and Smith, Noah A},
  title = {Annotation artifacts in natural language inference data},
}

@misc{physiognomy_2017,
  month = {May},
  year = {2017},
  author = {Arcas, Blaise and Mitchell, Margaret and Todorov, Alexander},
  howpublished = {Medium. \url{https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a}},
  url = {https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a},
  title = {Physiognomy's New Clothes},
}

@misc{ongweso_2020,
  month = {Sep},
  year = {2020},
  author = {Ongweso, Edward},
  howpublished = {Vice. \url{https://www.vice.com/en/article/g5pawq/an-ai-paper-published-in-a-major-journal-dabbles-in-phrenology}},
  title = {An {AI} Paper Published in a Major Journal Dabbles in Phrenology},
}

@article{hertzmann2020computers,
  publisher = {ACM New York, NY, USA},
  year = {2020},
  pages = {45--48},
  number = {5},
  volume = {63},
  journal = {Communications of the ACM},
  author = {Hertzmann, Aaron},
  title = {Computers do not make art, people do},
}

@inproceedings{pang-etal-2002-thumbs,
  pages = {79--86},
  doi = {10.3115/1118693.1118704},
  url = {https://www.aclweb.org/anthology/W02-1011},
  year = {2002},
  month = {July},
  booktitle = {Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing ({EMNLP} 2002)},
  author = {Pang, Bo and Lee, Lillian and Vaithyanathan, Shivakumar},
  title = {Thumbs up? Sentiment Classification using Machine Learning Techniques},
}

@inproceedings{turney-2002-thumbs,
  pages = {417--424},
  doi = {10.3115/1073083.1073153},
  url = {https://www.aclweb.org/anthology/P02-1053},
  address = {Philadelphia, Pennsylvania, USA},
  year = {2002},
  month = {July},
  booktitle = {Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics},
  author = {Turney, Peter},
  title = {Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews},
}

@inproceedings{StanceSemEval2016,
  address = {San Diego, California},
  year = {2016},
  month = {June},
  series = {SemEval '16},
  booktitle = {Proceedings of the International Workshop on Semantic Evaluation},
  title = {Semeval-2016 Task 6: Detecting Stance in Tweets},
  author = {Mohammad, Saif M. and Kiritchenko, Svetlana and Sobhani, Parinaz and Zhu, Xiaodan and Cherry, Colin},
}

@article{MohammadSK17,
  pages = {1--23},
  number = {3},
  volume = {17},
  year = {2017},
  journal = {Special Section of the ACM Transactions on Internet Technology on Argumentation in Social Media},
  author = {Mohammad, Saif M. and Sobhani, Parinaz and Kiritchenko, Svetlana},
  title = {Stance and Sentiment in Tweets},
}

@article{barrett2019emotional,
  publisher = {Sage Publications Sage CA: Los Angeles, CA},
  year = {2019},
  pages = {1--68},
  number = {1},
  volume = {20},
  journal = {Psychological science in the public interest},
  author = {Barrett, Lisa Feldman and Adolphs, Ralph and Marsella, Stacy and Martinez, Aleix M and Pollak, Seth D},
  title = {Emotional expressions reconsidered: Challenges to inferring emotion from human facial movements},
}

@inproceedings{buolamwini2018gender,
  organization = {PMLR},
  year = {2018},
  pages = {77--91},
  booktitle = {Conference on fairness, accountability and transparency},
  author = {Buolamwini, Joy and Gebru, Timnit},
  title = {Gender shades: Intersectional accuracy disparities in commercial gender classification},
}

@inproceedings{kiritchenko-mohammad-2018-examining,
  pages = {43--53},
  doi = {10.18653/v1/S18-2005},
  url = {https://www.aclweb.org/anthology/S18-2005},
  address = {New Orleans, Louisiana},
  year = {2018},
  month = {June},
  booktitle = {Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics},
  author = {Kiritchenko, Svetlana and Mohammad, Saif},
  title = {Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems},
}

@inproceedings{rosenthal-etal-2014-semeval,
  pages = {73--80},
  doi = {10.3115/v1/S14-2009},
  url = {https://aclanthology.org/S14-2009},
  publisher = {Association for Computational Linguistics},
  address = {Dublin, Ireland},
  year = {2014},
  month = {August},
  booktitle = {Proceedings of the 8th International Workshop on Semantic Evaluation ({S}em{E}val 2014)},
  author = {Rosenthal, Sara and Ritter, Alan and Nakov, Preslav and Stoyanov, Veselin},
  title = {{S}em{E}val-2014 Task 9: Sentiment Analysis in {T}witter},
}

@inproceedings{rosenthal-etal-2015-semeval,
  pages = {451--463},
  doi = {10.18653/v1/S15-2078},
  url = {https://aclanthology.org/S15-2078},
  address = {Denver, Colorado},
  year = {2015},
  month = {June},
  booktitle = {Proceedings of the 9th International Workshop on Semantic Evaluation ({S}em{E}val 2015)},
  author = {Rosenthal, Sara and Nakov, Preslav and Kiritchenko, Svetlana and Mohammad, Saif and Ritter, Alan and Stoyanov, Veselin},
  title = {{S}em{E}val-2015 Task 10: Sentiment Analysis in {T}witter},
}

@article{rottger2020hatecheck,
  year = {2020},
  journal = {arXiv preprint arXiv:2012.15606},
  author = {R{\"o}ttger, Paul and Vidgen, Bertram and Nguyen, Dong and Waseem, Zeerak and Margetts, Helen and Pierrehumbert, Janet},
  title = {Hatecheck: Functional tests for hate speech detection models},
}

@article{luo2021local,
  year = {2021},
  journal = {arXiv preprint arXiv:2103.11072},
  author = {Luo, Siwen and Ivison, Hamish and Han, Caren and Poon, Josiah},
  title = {Local Interpretations for Explainable Natural Language Processing: A Survey},
}

@inproceedings{dwibhasi2015analyzing,
  year = {2015},
  pages = {26--29},
  booktitle = {Proceedings of the SAS Global Forum, Dallas, TX, USA},
  author = {Dwibhasi, Sharat and Jami, Dheeraj and Lanka, Shivkanth and Chakraborty, Goutam},
  title = {Analyzing and visualizing the sentiments of {E}bola outbreak via tweets},
}

@inproceedings{kucher2018visual,
  year = {2018},
  pages = {49--51},
  booktitle = {EuroVis (Posters)},
  author = {Kucher, Kostiantyn and Paradis, Carita and Kerren, Andreas},
  title = {Visual Analysis of Sentiment and Stance in Social Media Texts},
}

@article{denton2020bringing,
  year = {2020},
  journal = {arXiv preprint arXiv:2007.07399},
  author = {Denton, Emily and Hanna, Alex and Amironesei, Razvan and Smart, Andrew and Nicole, Hilary and Scheuerman, Morgan Klaus},
  title = {Bringing the people back in: Contesting benchmark machine learning datasets},
}

@misc{johnson_2019,
  month = {Jul},
  year = {2019},
  author = {Johnson, Khari},
  howpublished = {VentureBeat. \url{https://venturebeat.com/2019/07/17/how-ai-companies-can-avoid-ethics-washing/}},
  title = {How {AI} companies can avoid ethics washing},
}

@inproceedings{fraser-etal-2019-feel,
  pages = {62--71},
  doi = {10.18653/v1/W19-1308},
  url = {https://aclanthology.org/W19-1308},
  address = {Minneapolis, USA},
  year = {2019},
  month = {June},
  booktitle = {Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis},
  author = {Fraser, Kathleen C. and Zeller, Frauke and Smith, David Harris and Mohammad, Saif M. and Rudzicz, Frank},
  title = {How do we feel when a robot dies? Emotions expressed on {T}witter before and after hitch{BOT}{'}s destruction},
}

@article{kalluri2020don,
  publisher = {Macmillan Publishers Ltd., London, England},
  year = {2020},
  pages = {169--169},
  number = {7815},
  volume = {583},
  journal = {Nature},
  author = {Kalluri, Pratyusha},
  title = {Don't ask if {A}rtificial {I}ntelligence is good or fair, ask how it shifts power},
}

@book{monteiro2019ruined,
  publisher = {Mule Design},
  year = {2019},
  author = {Monteiro, Mike},
  title = {Ruined by design: How designers destroyed the world, and what we can do to fix it},
}

@article{trewin2019considerations,
  publisher = {ACM New York, NY, USA},
  year = {2019},
  pages = {40--63},
  number = {3},
  volume = {5},
  journal = {AI Matters},
  author = {Trewin, Shari and Basson, Sara and Muller, Michael and Branham, Stacy and Treviranus, Jutta and Gruen, Daniel and Hebert, Daniel and Lyckowski, Natalia and Manser, Erich},
  title = {Considerations for {AI} fairness for people with disabilities},
}

@misc{snow_2020,
  month = {Feb},
  year = {2020},
  author = {Snow, Shane},
  howpublished = {Linkedin. \url{https://www.linkedin.com/pulse/personality-test-may-discriminating-people-making-your-shane-snow}},
  title = {That Personality Test May Be Discriminating People... and Making Your Company Dumber},
}

@misc{woensel_nevil_2019,
  month = {Mar},
  year = {2019},
  author = {Woensel, Lieve Van and Nevil, Nissy},
  howpublished = {European Parliamentary Research Service, PE 634.415. \url{https://www.europarl.europa.eu/RegData/etudes/ATAG/2019/634415/EPRS_ATA(2019)634415_EN.pdf}},
  title = {What if your emotions were tracked to spy on you?},
}

@article{schaar2010privacy,
  publisher = {Springer},
  year = {2010},
  pages = {267--274},
  number = {2},
  volume = {3},
  journal = {Identity in the Information Society},
  author = {Schaar, Peter},
  title = {Privacy by design},
}

@misc{grant_2013,
  month = {Sep},
  year = {2013},
  author = {Grant, Adam},
  howpublished = {Linkedin. \url{https://www.linkedin.com/pulse/20130917155206-69244073-say-goodbye-to-mbti-the-fad-that-won-t-die}},
  title = {Say Goodbye to {MBTI}, the Fad That Won't Die},
}

@article{floridi2014open,
  publisher = {Springer},
  year = {2014},
  pages = {1--3},
  number = {1},
  volume = {27},
  journal = {Philosophy \& Technology},
  author = {Floridi, Luciano},
  title = {Open data, data protection, and group privacy},
}

@book{picard2000affective,
  publisher = {MIT press},
  year = {2000},
  author = {Picard, Rosalind W},
  title = {Affective computing},
}

@article{Strubell_Ganesh_McCallum_2020,
  pages = {13693--13696},
  month = {Apr.},
  year = {2020},
  author = {Strubell, Emma and Ganesh, Ananya and McCallum, Andrew},
  journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
  number = {09},
  doi = {10.1609/aaai.v34i09.7123},
  url = {https://ojs.aaai.org/index.php/AAAI/article/view/7123},
  volume = {34},
  title = {Energy and Policy Considerations for Modern Deep Learning Research},
}

@article{mcstay2020emotional,
  publisher = {SAGE Publications Sage UK: London, England},
  year = {2020},
  pages = {2053951720904386},
  number = {1},
  volume = {7},
  journal = {Big Data \& Society},
  author = {McStay, Andrew},
  title = {Emotional {AI}, soft biometrics and the surveillance of emotional life: An unusual consensus on privacy},
}

@misc{article19_2021,
  month = {Jan},
  year = {2021},
  author = {ARTICLE19},
  howpublished = {\url{https://www.article19.org/wp-content/uploads/2021/01/ER-Tech-China-Report.pdf}},
  title = {Emotional Entanglement: China’s emotion recognition market and its implications for human rights},
}

@misc{wakefield_2021,
  month = {May},
  year = {2021},
  author = {Wakefield, Jane},
  howpublished = {BBC. \url{https://www.bbc.com/news/technology-57101248}},
  title = {{AI} emotion-detection software tested on {U}yghurs},
}

@incollection{lindsey2015sociology,
  publisher = {Routledge},
  year = {2015},
  pages = {23--48},
  booktitle = {Gender roles},
  author = {Lindsey, Linda L},
  title = {The sociology of gender theoretical perspectives and feminist frameworks},
}

@book{perez2019invisible,
  publisher = {Random House},
  year = {2019},
  author = {Perez, Caroline Criado},
  title = {Invisible women: Exposing data bias in a world designed for men},
}

@article{purdie2008intersectional,
  publisher = {Springer},
  year = {2008},
  pages = {377--391},
  number = {5},
  volume = {59},
  journal = {Sex roles},
  author = {Purdie-Vaughns, Valerie and Eibach, Richard P},
  title = {Intersectional invisibility: The distinctive advantages and disadvantages of multiple subordinate-group identities},
}

@inproceedings{mitchell2019model,
  year = {2019},
  pages = {220--229},
  booktitle = {Proceedings of the conference on fairness, accountability, and transparency},
  author = {Mitchell, Margaret and Wu, Simone and Zaldivar, Andrew and Barnes, Parker and Vasserman, Lucy and Hutchinson, Ben and Spitzer, Elena and Raji, Inioluwa Deborah and Gebru, Timnit},
  title = {Model cards for model reporting},
}

@misc{keyes_2019,
  month = {Apr},
  year = {2019},
  author = {Keyes, Os},
  howpublished = {REAL LIFE. \url{https://reallifemag.com/counting-the-countless/}},
  title = {Counting the Countless},
}

@article{bauer2017transgender,
  publisher = {Public Library of Science San Francisco, CA USA},
  year = {2017},
  pages = {e0178043},
  number = {5},
  volume = {12},
  journal = {PloS one},
  author = {Bauer, Greta R and Braimoh, Jessica and Scheim, Ayden I and Dharma, Christoffer},
  title = {Transgender-inclusive measures of sex/gender for population surveys: Mixed-methods evaluation and recommendations},
}

@article{ekman1992there,
  publisher = {American Psychological Association},
  pages = {550--553},
  number = {3},
  volume = {99},
  journal = {Psychological Review},
  year = {1992},
  author = {Ekman, Paul},
  title = {Are there basic emotions?},
}

@article{russell2003core,
  publisher = {American Psychological Association},
  year = {2003},
  pages = {145},
  number = {1},
  volume = {110},
  journal = {Psychological review},
  author = {Russell, James A},
  title = {Core affect and the psychological construction of emotion},
}

@article{russell1977evidence,
  publisher = {Elsevier},
  year = {1977},
  pages = {273--294},
  number = {3},
  volume = {11},
  journal = {Journal of research in Personality},
  author = {Russell, James A and Mehrabian, Albert},
  title = {Evidence for a three-factor theory of emotions},
}

@article{russell2009emotion,
  publisher = {Taylor \& Francis},
  year = {2009},
  pages = {1259--1283},
  number = {7},
  volume = {23},
  journal = {Cognition and emotion},
  author = {Russell, James A},
  title = {Emotion, core affect, and psychological construction},
}

@article{cao2021toward,
  year = {2021},
  pages = {1--47},
  journal = {Computational Linguistics},
  author = {Cao, Yang Trista and Daum{\'e}, Hal},
  title = {Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender and Bias throughout the Machine Learning Lifecycle},
}

@book{ekman1994nature,
  publisher = {Oxford University Press},
  year = {1994},
  author = {Ekman, Paul Ed and Davidson, Richard J},
  title = {The nature of emotion: Fundamental questions},
}

@article{gallagher2021generalized,
  publisher = {Springer Berlin Heidelberg},
  year = {2021},
  pages = {4},
  number = {1},
  volume = {10},
  journal = {EPJ Data Science},
  author = {Gallagher, Ryan J and Frank, Morgan R and Mitchell, Lewis and Schwartz, Aaron J and Reagan, Andrew J and Danforth, Christopher M and Dodds, Peter Sheridan},
  title = {Generalized word shift graphs: A method for visualizing and explaining pairwise comparisons between texts},
}

@article{mulligan2019shaping,
  year = {2019},
  journal = {Available at SSRN 3311894},
  author = {Mulligan, Deirdre K and Kluttz, Daniel and Kohli, Nitin},
  title = {Shaping our tools: Contestability as a means to promote responsible algorithmic decision making in the professions},
}

@misc{what_if_2018,
  month = {Sep},
  year = {2018},
  author = {Google},
  howpublished = {Google {AI} Blog. \url{https://ai.googleblog.com/2018/09/the-what-if-tool-code-free-probing-of.html}},
  title = {The What-If Tool: Code-Free Probing of Machine Learning Models},
}

@article{zhang2018deep,
  publisher = {Wiley Online Library},
  year = {2018},
  pages = {e1253},
  number = {4},
  volume = {8},
  journal = {Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery},
  author = {Zhang, Lei and Wang, Shuai and Liu, Bing},
  title = {Deep learning for sentiment analysis: A survey},
}

@article{soleymani2017survey,
  publisher = {Elsevier},
  year = {2017},
  pages = {3--14},
  volume = {65},
  journal = {Image and Vision Computing},
  author = {Soleymani, Mohammad and Garcia, David and Jou, Brendan and Schuller, Bj{\"o}rn and Chang, Shih-Fu and Pantic, Maja},
  title = {A survey of multimodal sentiment analysis},
}

@article{guntuku2019studying,
  publisher = {British Medical Journal Publishing Group},
  year = {2019},
  pages = {e030355},
  number = {11},
  volume = {9},
  journal = {BMJ open},
  author = {Guntuku, Sharath Chandra and Schneider, Rachelle and Pelullo, Arthur and Young, Jami and Wong, Vivien and Ungar, Lyle and Polsky, Daniel and Volpp, Kevin G and Merchant, Raina},
  title = {Studying expressions of loneliness in individuals using {T}witter: an observational study},
}

@inproceedings{kiritchenko-etal-2020-solo,
  isbn = {979-10-95546-34-4},
  language = {English},
  pages = {1567--1577},
  url = {https://aclanthology.org/2020.lrec-1.195},
  address = {Marseille, France},
  year = {2020},
  month = {May},
  booktitle = {Proceedings of the 12th Language Resources and Evaluation Conference},
  author = {Kiritchenko, Svetlana and Hipson, Will and Coplan, Robert and Mohammad, Saif M.},
  title = {{SOLO}: A Corpus of Tweets for Examining the State of Being Alone},
}

@inproceedings{de2013predicting,
  year = {2013},
  pages = {128--137},
  booktitle = {Seventh international AAAI conference on weblogs and social media},
  author = {De Choudhury, Munmun and Gamon, Michael and Counts, Scott and Horvitz, Eric},
138 booktitle = {Proceedings of the 8th International Workshop on Semantic Evaluation ({S}em{E}val 2014)},
139 author = {Rosenthal, Sara and
140Ritter, Alan and
141Nakov, Preslav and
142Stoyanov, Veselin},
143 title = {{S}em{E}val-2014 Task 9: Sentiment Analysis in {T}witter},
144}
145
146@inproceedings{rosenthal-etal-2015-semeval,
147 pages = {451--463},
148 doi = {10.18653/v1/S15-2078},
149 url = {https://aclanthology.org/S15-2078},
150 address = {Denver, Colorado},
151 year = {2015},
152 month = {June},
153 booktitle = {Proceedings of the 9th International Workshop on Semantic Evaluation ({S}em{E}val 2015)},
154 author = {Rosenthal, Sara and
155Nakov, Preslav and
156Kiritchenko, Svetlana and
157Mohammad, Saif and
158Ritter, Alan and
159Stoyanov, Veselin},
160 title = {{S}em{E}val-2015 Task 10: Sentiment Analysis in {T}witter},
161}
162
163@article{rottger2020hatecheck,
164 year = {2020},
165 journal = {arXiv preprint arXiv:2012.15606},
166 author = {R{\"o}ttger, Paul and Vidgen, Bertram and Nguyen, Dong and Waseem, Zeerak and Margetts, Helen and Pierrehumbert, Janet},
167 title = {Hatecheck: Functional tests for hate speech detection models},
168}
169
170@article{luo2021local,
171 year = {2021},
172 journal = {arXiv preprint arXiv:2103.11072},
173 author = {Luo, Siwen and Ivison, Hamish and Han, Caren and Poon, Josiah},
174 title = {Local Interpretations for Explainable Natural Language Processing: A Survey},
175}
176
177@inproceedings{dwibhasi2015analyzing,
178 year = {2015},
179 pages = {26--29},
180 booktitle = {Proceedings of the SAS Global Forum, Dallas, TX, USA},
181 author = {Dwibhasi, Sharat and Jami, Dheeraj and Lanka, Shivkanth and Chakraborty, Goutam},
182 title = {Analyzing and visualizing the sentiments of {E}bola outbreak via tweets},
183}
184
185@inproceedings{kucher2018visual,
186 year = {2018},
187 pages = {49--51},
188 booktitle = {EuroVis (Posters)},
189 author = {Kucher, Kostiantyn and Paradis, Carita and Kerren, Andreas},
190 title = {Visual Analysis of Sentiment and Stance in Social Media Texts.},
191}
192
193@article{denton2020bringing,
194 year = {2020},
195 journal = {arXiv preprint arXiv:2007.07399},
196 author = {Denton, Emily and Hanna, Alex and Amironesei, Razvan and Smart, Andrew and Nicole, Hilary and Scheuerman, Morgan Klaus},
197 title = {Bringing the people back in: Contesting benchmark machine learning datasets},
198}
199
200@misc{johnson_2019,
201 month = {Jul},
202 year = {2019},
203 author = {Johnson, Khari},
204 howpublished = {VentureBeat. \url{https://venturebeat.com/2019/07/17/how-ai-companies-can-avoid-ethics-washing/}},
205 title = {How {AI} companies can avoid ethics washing},
206}
207
208@inproceedings{fraser-etal-2019-feel,
209 pages = {62--71},
210 doi = {10.18653/v1/W19-1308},
211 url = {https://aclanthology.org/W19-1308},
212 address = {Minneapolis, USA},
213 year = {2019},
214 month = {June},
215 booktitle = {Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis},
216 author = {Fraser, Kathleen C. and
217Zeller, Frauke and
218Smith, David Harris and
219Mohammad, Saif M. and
220Rudzicz, Frank},
221 title = {How do we feel when a robot dies? Emotions expressed on {T}witter before and after hitch{BOT}{'}s destruction},
222}
223
224@article{kalluri2020don,
225 publisher = {Macmillan Publishers Ltd., London, England},
226 year = {2020},
227 pages = {169--169},
228 number = {7815},
229 volume = {583},
230 journal = {Nature},
231 author = {Kalluri, Pratyusha},
232 title = {Don't ask if {A}rtificial {I}ntelligence is good or fair, ask how it shifts power},
233}
234
235@book{monteiro2019ruined,
236 publisher = {Mule Design},
237 year = {2019},
238 author = {Monteiro, Mike},
239 title = {Ruined by design: How designers destroyed the world, and what we can do to fix it},
240}
241
242@article{trewin2019considerations,
243 publisher = {ACM New York, NY, USA},
244 year = {2019},
245 pages = {40--63},
246 number = {3},
247 volume = {5},
248 journal = {AI Matters},
249 author = {Trewin, Shari and Basson, Sara and Muller, Michael and Branham, Stacy and Treviranus, Jutta and Gruen, Daniel and Hebert, Daniel and Lyckowski, Natalia and Manser, Erich},
250 title = {Considerations for {AI} fairness for people with disabilities},
251}
252
253@misc{snow_2020,
254 month = {Feb},
255 year = {2020},
256 author = {Snow, Shane},
257 howpublished = {Linkedin. \url{https://www.linkedin.com/pulse/personality-test-may-discriminating-people-making-your-shane-snow}},
258 title = {That Personality Test May Be Discriminating People... and Making Your Company Dumber},
259}
260
261@misc{woensel_nevil_2019,
262 month = {Mar},
263 year = {2019},
264 author = {Woensel, Lieve Van and Nevil, Nissy},
265 howpublished = {European Parliamentary Research Service, PE 634.415. \url{https://www.europarl.europa.eu/RegData/etudes/ATAG/2019/634415/EPRS_ATA(2019)634415_EN.pdf}},
266 title = {What if your emotions were tracked to spy on you?},
267}
268
269@article{schaar2010privacy,
270 publisher = {Springer},
271 year = {2010},
272 pages = {267--274},
273 number = {2},
274 volume = {3},
275 journal = {Identity in the Information Society},
276 author = {Schaar, Peter},
277 title = {Privacy by design},
278}
279
280@misc{grant_2013,
281 month = {Sep},
282 year = {2013},
283 author = {Grant, Adam},
284 howpublished = {Linkedin. \url{https://www.linkedin.com/pulse/20130917155206-69244073-say-goodbye-to-mbti-the-fad-that-won-t-die}},
285 title = {Say Goodbye to {MBTI}, the Fad That Won't Die },
286}
287
288@article{floridi2014open,
289 publisher = {Springer},
290 year = {2014},
291 pages = {1--3},
292 number = {1},
293 volume = {27},
294 journal = {Philosophy \& Technology},
295 author = {Floridi, Luciano},
296 title = {Open data, data protection, and group privacy},
297}
298
299@book{picard2000affective,
300 publisher = {MIT press},
301 year = {2000},
302 author = {Picard, Rosalind W},
303 title = {Affective computing},
304}
305
306@article{Strubell_Ganesh_McCallum_2020,
307 pages = {13693-13696},
308 month = {Apr.},
309 year = {2020},
310 author = {Strubell, Emma and Ganesh, Ananya and McCallum, Andrew},
311 journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
312 number = {09},
313 abstractnote = {<p>The field of artificial intelligence has experienced a dramatic methodological shift towards large neural networks trained on plentiful data. This shift has been fueled by recent advances in hardware and techniques enabling remarkable levels of computation, resulting in impressive advances in AI across many applications. However, the massive computation required to obtain these exciting results is costly both financially, due to the price of specialized hardware and electricity or cloud compute time, and to the environment, as a result of non-renewable energy used to fuel modern tensor processing hardware. In a paper published this year at ACL, we brought this issue to the attention of NLP researchers by quantifying the approximate financial and environmental costs of training and tuning neural network models for NLP (Strubell, Ganesh, and McCallum 2019). In this extended abstract, we briefly summarize our findings in NLP, incorporating updated estimates and broader information from recent related publications, and provide actionable recommendations to reduce costs and improve equity in the machine learning and artificial intelligence community.</p>},
314 doi = {10.1609/aaai.v34i09.7123},
315 url = {https://ojs.aaai.org/index.php/AAAI/article/view/7123},
316 volume = {34},
317 title = {Energy and Policy Considerations for Modern Deep Learning Research},
318}
319
320@article{mcstay2020emotional,
321 publisher = {SAGE Publications Sage UK: London, England},
322 year = {2020},
323 pages = {2053951720904386},
324 number = {1},
325 volume = {7},
326 journal = {Big Data \& Society},
327 author = {McStay, Andrew},
328 title = {Emotional {AI}, soft biometrics and the surveillance of emotional life: An unusual consensus on privacy},
329}
330
331@misc{article19_2021,
332 month = {Jan},
333 year = {2021},
334 author = {ARTICLE19},
 howpublished = {\url{https://www.article19.org/wp-content/uploads/2021/01/ER-Tech-China-Report.pdf}},
 title = {Emotional Entanglement: China’s emotion recognition market and its implications for human rights},
}

@misc{wakefield_2021,
 month = {May},
 year = {2021},
 author = {Wakefield, Jane},
 howpublished = {BBC. \url{https://www.bbc.com/news/technology-57101248}},
 title = {{AI} emotion-detection software tested on {U}yghurs},
}

@incollection{lindsey2015sociology,
 publisher = {Routledge},
 year = {2015},
 pages = {23--48},
 booktitle = {Gender roles},
 author = {Lindsey, Linda L},
 title = {The sociology of gender: Theoretical perspectives and feminist frameworks},
}

@book{perez2019invisible,
 publisher = {Random House},
 year = {2019},
 author = {Perez, Caroline Criado},
 title = {Invisible women: Exposing data bias in a world designed for men},
}

@article{purdie2008intersectional,
 publisher = {Springer},
 year = {2008},
 pages = {377--391},
 number = {5},
 volume = {59},
 journal = {Sex Roles},
 author = {Purdie-Vaughns, Valerie and Eibach, Richard P},
 title = {Intersectional invisibility: The distinctive advantages and disadvantages of multiple subordinate-group identities},
}

@inproceedings{mitchell2019model,
 year = {2019},
 pages = {220--229},
 booktitle = {Proceedings of the Conference on Fairness, Accountability, and Transparency},
 author = {Mitchell, Margaret and Wu, Simone and Zaldivar, Andrew and Barnes, Parker and Vasserman, Lucy and Hutchinson, Ben and Spitzer, Elena and Raji, Inioluwa Deborah and Gebru, Timnit},
 title = {Model cards for model reporting},
}

@misc{keyes_2019,
 month = {Apr},
 year = {2019},
 author = {Keyes, Os},
 howpublished = {REAL LIFE. \url{https://reallifemag.com/counting-the-countless/}},
 title = {Counting the Countless},
}

@article{bauer2017transgender,
 publisher = {Public Library of Science San Francisco, CA USA},
 year = {2017},
 pages = {e0178043},
 number = {5},
 volume = {12},
 journal = {PLoS ONE},
 author = {Bauer, Greta R and Braimoh, Jessica and Scheim, Ayden I and Dharma, Christoffer},
 title = {Transgender-inclusive measures of sex/gender for population surveys: Mixed-methods evaluation and recommendations},
}

@article{ekman1992there,
 publisher = {American Psychological Association},
 pages = {550--553},
 number = {3},
 volume = {99},
 journal = {Psychological Review},
 year = {1992},
 author = {Ekman, Paul},
 title = {Are there basic emotions?},
}

@article{russell2003core,
 publisher = {American Psychological Association},
 year = {2003},
 pages = {145--172},
 number = {1},
 volume = {110},
 journal = {Psychological Review},
 author = {Russell, James A},
 title = {Core affect and the psychological construction of emotion},
}

@article{russell1977evidence,
 publisher = {Elsevier},
 year = {1977},
 pages = {273--294},
 number = {3},
 volume = {11},
 journal = {Journal of Research in Personality},
 author = {Russell, James A and Mehrabian, Albert},
 title = {Evidence for a three-factor theory of emotions},
}

@article{russell2009emotion,
 publisher = {Taylor \& Francis},
 year = {2009},
 pages = {1259--1283},
 number = {7},
 volume = {23},
 journal = {Cognition and Emotion},
 author = {Russell, James A},
 title = {Emotion, core affect, and psychological construction},
}

@article{cao2021toward,
 year = {2021},
 pages = {1--47},
 journal = {Computational Linguistics},
 author = {Cao, Yang Trista and Daum{\'e} III, Hal},
 title = {Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender and Bias throughout the Machine Learning Lifecycle},
}

@book{ekman1994nature,
 publisher = {Oxford University Press},
 year = {1994},
 editor = {Ekman, Paul and Davidson, Richard J},
 title = {The nature of emotion: Fundamental questions},
}

@article{gallagher2021generalized,
 publisher = {Springer Berlin Heidelberg},
 year = {2021},
 pages = {4},
 number = {1},
 volume = {10},
 journal = {EPJ Data Science},
 author = {Gallagher, Ryan J and Frank, Morgan R and Mitchell, Lewis and Schwartz, Aaron J and Reagan, Andrew J and Danforth, Christopher M and Dodds, Peter Sheridan},
 title = {Generalized word shift graphs: A method for visualizing and explaining pairwise comparisons between texts},
}

@article{mulligan2019shaping,
 year = {2019},
 journal = {Available at SSRN 3311894},
 author = {Mulligan, Deirdre K and Kluttz, Daniel and Kohli, Nitin},
 title = {Shaping our tools: Contestability as a means to promote responsible algorithmic decision making in the professions},
}

@misc{what_if_2018,
 month = {Sep},
 year = {2018},
 author = {Google},
 howpublished = {Google {AI} Blog. \url{https://ai.googleblog.com/2018/09/the-what-if-tool-code-free-probing-of.html}},
 title = {The What-If Tool: Code-Free Probing of Machine Learning Models},
}

@article{zhang2018deep,
 publisher = {Wiley Online Library},
 year = {2018},
 pages = {e1253},
 number = {4},
 volume = {8},
 journal = {Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery},
 author = {Zhang, Lei and Wang, Shuai and Liu, Bing},
 title = {Deep learning for sentiment analysis: A survey},
}

@article{soleymani2017survey,
 publisher = {Elsevier},
 year = {2017},
 pages = {3--14},
 volume = {65},
 journal = {Image and Vision Computing},
 author = {Soleymani, Mohammad and Garcia, David and Jou, Brendan and Schuller, Bj{\"o}rn and Chang, Shih-Fu and Pantic, Maja},
 title = {A survey of multimodal sentiment analysis},
}

@article{guntuku2019studying,
 publisher = {British Medical Journal Publishing Group},
 year = {2019},
 pages = {e030355},
 number = {11},
 volume = {9},
 journal = {BMJ Open},
 author = {Guntuku, Sharath Chandra and Schneider, Rachelle and Pelullo, Arthur and Young, Jami and Wong, Vivien and Ungar, Lyle and Polsky, Daniel and Volpp, Kevin G and Merchant, Raina},
 title = {Studying expressions of loneliness in individuals using {T}witter: an observational study},
}

@inproceedings{kiritchenko-etal-2020-solo,
 isbn = {979-10-95546-34-4},
 language = {English},
 pages = {1567--1577},
 url = {https://aclanthology.org/2020.lrec-1.195},
 address = {Marseille, France},
 year = {2020},
 month = {May},
 booktitle = {Proceedings of the 12th Language Resources and Evaluation Conference},
 author = {Kiritchenko, Svetlana and Hipson, Will and Coplan, Robert and Mohammad, Saif M.},
 title = {{SOLO}: A Corpus of Tweets for Examining the State of Being Alone},
}

@inproceedings{de2013predicting,
 year = {2013},
 pages = {128--137},
 booktitle = {Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media},
 author = {De Choudhury, Munmun and Gamon, Michael and Counts, Scott and Horvitz, Eric},
 title = {Predicting depression via social media},
}

@inproceedings{resnik-etal-2015-beyond,
 pages = {99--107},
 doi = {10.3115/v1/W15-1212},
 url = {https://aclanthology.org/W15-1212},
 address = {Denver, Colorado},
 year = {2015},
 month = {June},
 booktitle = {Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality},
 author = {Resnik, Philip and Armstrong, William and Claudino, Leonardo and Nguyen, Thang and Nguyen, Viet-An and Boyd-Graber, Jordan},
 title = {Beyond {LDA}: Exploring Supervised Topic Modeling for Depression-Related Language in {T}witter},
}

@inproceedings{macavaney-etal-2021-community,
 pages = {70--80},
 doi = {10.18653/v1/2021.clpsych-1.7},
 url = {https://aclanthology.org/2021.clpsych-1.7},
 publisher = {Association for Computational Linguistics},
 address = {Online},
 year = {2021},
 month = {June},
 booktitle = {Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access},
 author = {MacAvaney, Sean and Mittu, Anjali and Coppersmith, Glen and Leintz, Jeff and Resnik, Philip},
 title = {Community-level Research on Suicidality Prediction in a Secure Environment: Overview of the {CLP}sych 2021 Shared Task},
}

@inproceedings{karam2014ecologically,
 organization = {IEEE},
 year = {2014},
 pages = {4858--4862},
 booktitle = {2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
 author = {Karam, Zahi N and Provost, Emily Mower and Singh, Satinder and Montgomery, Jennifer and Archer, Christopher and Harrington, Gloria and Mcinnis, Melvin G},
 title = {Ecologically valid long-term mood monitoring of individuals with bipolar disorder using speech},
}

@article{eichstaedt2015psychological,
 publisher = {Sage Publications Sage CA: Los Angeles, CA},
 year = {2015},
 pages = {159--169},
 number = {2},
 volume = {26},
 journal = {Psychological Science},
 author = {Eichstaedt, Johannes C and Schwartz, Hansen Andrew and Kern, Margaret L and Park, Gregory and Labarthe, Darwin R and Merchant, Raina M and Jha, Sneha and Agrawal, Megha and Dziurzynski, Lukasz A and Sap, Maarten and others},
 title = {Psychological language on {T}witter predicts county-level heart disease mortality},
}

@book{barrett2017emotions,
 publisher = {Houghton Mifflin Harcourt},
 year = {2017},
 author = {Barrett, Lisa Feldman},
 title = {How emotions are made: The secret life of the brain},
}

@article{barrett2017theory,
 publisher = {Oxford University Press},
 year = {2017},
 pages = {1--23},
 number = {1},
 volume = {12},
 journal = {Social Cognitive and Affective Neuroscience},
 author = {Barrett, Lisa Feldman},
 title = {The theory of constructed emotion: an active inference account of interoception and categorization},
}

@book{osgood1957measurement,
 publisher = {University of Illinois Press},
 year = {1957},
 number = {47},
 author = {Osgood, Charles Egerton and Suci, George J and Tannenbaum, Percy H},
 title = {The measurement of meaning},
}

@article{russell1980circumplex,
 publisher = {American Psychological Association},
 year = {1980},
 pages = {1161--1178},
 number = {6},
 volume = {39},
 journal = {Journal of Personality and Social Psychology},
 author = {Russell, James A},
 title = {A circumplex model of affect},
}

@incollection{scherer1999appraisal,
 publisher = {John Wiley \& Sons Ltd},
 year = {1999},
 booktitle = {Handbook of Cognition and Emotion},
 editor = {Dalgleish, Tim and Power, Mick J},
 author = {Scherer, Klaus R},
 title = {Appraisal theory},
}

@article{lazarus1991progress,
 publisher = {American Psychological Association},
 year = {1991},
 pages = {819--834},
 number = {8},
 volume = {46},
 journal = {American Psychologist},
 author = {Lazarus, Richard S},
 title = {Progress on a cognitive-motivational-relational theory of emotion},
}

@article{harris1954distributional,
 publisher = {Taylor \& Francis},
 year = {1954},
 pages = {146--162},
 number = {2-3},
 volume = {10},
 journal = {Word},
 author = {Harris, Zellig S},
 title = {Distributional structure},
}

@book{chomsky2014aspects,
 publisher = {MIT Press},
 year = {2014},
 volume = {11},
 author = {Chomsky, Noam},
 title = {Aspects of the Theory of Syntax},
}

@incollection{ervin1973some,
 publisher = {Elsevier},
 year = {1973},
 pages = {261--286},
 booktitle = {Cognitive Development and Acquisition of Language},
 author = {Ervin-Tripp, Susan},
 title = {Some strategies for the first two years},
}

@inproceedings{bisk-etal-2020-experience,
 pages = {8718--8735},
 doi = {10.18653/v1/2020.emnlp-main.703},
 url = {https://aclanthology.org/2020.emnlp-main.703},
 publisher = {Association for Computational Linguistics},
 address = {Online},
 year = {2020},
 month = {November},
 booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
 author = {Bisk, Yonatan and Holtzman, Ari and Thomason, Jesse and Andreas, Jacob and Bengio, Yoshua and Chai, Joyce and Lapata, Mirella and Lazaridou, Angeliki and May, Jonathan and Nisnevich, Aleksandr and Pinto, Nicolas and Turian, Joseph},
 title = {Experience Grounds Language},
}

@inproceedings{bender-koller-2020-climbing,
 pages = {5185--5198},
 doi = {10.18653/v1/2020.acl-main.463},
 url = {https://aclanthology.org/2020.acl-main.463},
 publisher = {Association for Computational Linguistics},
 address = {Online},
 year = {2020},
 month = {July},
 booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
 author = {Bender, Emily M. and Koller, Alexander},
 title = {Climbing towards {NLU}: {On} Meaning, Form, and Understanding in the Age of Data},
}

@inproceedings{hovy-yang-2021-importance,
 pages = {588--602},
 doi = {10.18653/v1/2021.naacl-main.49},
 url = {https://aclanthology.org/2021.naacl-main.49},
 publisher = {Association for Computational Linguistics},
 address = {Online},
 year = {2021},
 month = {June},
 booktitle = {Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
 author = {Hovy, Dirk and Yang, Diyi},
 title = {The Importance of Modeling Social Factors of Language: Theory and Practice},
}

@article{goerlich2018multifaceted,
 publisher = {Frontiers},
 year = {2018},
 pages = {1614},
 volume = {9},
 journal = {Frontiers in Psychology},
 author = {Goerlich, Katharina S},
 title = {The multifaceted nature of alexithymia--a neuroscientific perspective},
}

@article{bagby1994twenty,
 publisher = {Elsevier},
 year = {1994},
 pages = {23--32},
 number = {1},
 volume = {38},
 journal = {Journal of Psychosomatic Research},
 author = {Bagby, R Michael and Parker, James DA and Taylor, Graeme J},
 title = {The twenty-item {T}oronto {A}lexithymia Scale: I. Item selection and cross-validation of the factor structure},
}

@article{baron2001autism,
 publisher = {Springer},
 year = {2001},
 pages = {5--17},
 number = {1},
 volume = {31},
 journal = {Journal of Autism and Developmental Disorders},
 author = {Baron-Cohen, Simon and Wheelwright, Sally and Skinner, Richard and Martin, Joanne and Clubley, Emma},
 title = {The autism-spectrum quotient ({AQ}): Evidence from {A}sperger syndrome/high-functioning autism, males and females, scientists and mathematicians},
}

@article{spinuzzi2005methodology,
 publisher = {Society for Technical Communication},
 year = {2005},
 pages = {163--174},
 number = {2},
 volume = {52},
 journal = {Technical Communication},
 author = {Spinuzzi, Clay},
 title = {The methodology of participatory design},
}

@incollection{humphries2020arguments,
 publisher = {Routledge},
 year = {2020},
 pages = {3--23},
 booktitle = {Research and Inequality},
 author = {Humphries, Beth and Mertens, Donna M and Truman, Carole},
 title = {Arguments for an ‘emancipatory’ research paradigm},
}

@inproceedings{noel2016promoting,
 year = {2016},
 pages = {27--30},
 address = {Brighton, United Kingdom},
 booktitle = {Proceedings of DRS2016 International Conference, Vol. 6: Future–Focused Thinking},
 author = {Noel, Lesley-Ann},
 title = {Promoting an emancipatory research paradigm in design education and practice},
}

@article{oliver1997emancipatory,
 publisher = {Citeseer},
 year = {1997},
 pages = {15--31},
 volume = {2},
 journal = {Doing Disability Research},
 author = {Oliver, Michael},
 title = {Emancipatory research: Realistic goal or impossible dream?},
}

@article{stone1996parasites,
 publisher = {JSTOR},
 year = {1996},
 pages = {699--716},
 journal = {British Journal of Sociology},
 author = {Stone, Emma and Priestley, Mark},
 title = {Parasites, pawns and partners: Disability research and the role of non-disabled researchers},
}

@article{seale2015negotiating,
 publisher = {Taylor \& Francis},
 year = {2015},
 pages = {483--497},
 number = {4},
 volume = {28},
 journal = {Innovation: The European Journal of Social Science Research},
 author = {Seale, Jane and Nind, Melanie and Tilley, Liz and Chapman, Rohhss},
 title = {Negotiating a third space for participatory research with people with learning disabilities: An examination of boundaries and spatial practices},
}

@article{hall2014not,
 publisher = {Taylor \& Francis},
 year = {2014},
 pages = {376--389},
 number = {4},
 volume = {37},
 journal = {International Journal of Research \& Method in Education},
 author = {Hall, Lisa},
 title = {‘{W}ith’ not ‘about’: {E}merging paradigms for research in a cross-cultural space},
}

@article{fletcher2019making,
 publisher = {SAGE Publications Sage UK: London, England},
 year = {2019},
 pages = {943--953},
 number = {4},
 volume = {23},
 journal = {Autism},
 author = {Fletcher-Watson, Sue and Adams, Jon and Brook, Kabie and Charman, Tony and Crane, Laura and Cusack, James and Leekam, Susan and Milton, Damian and Parr, Jeremy R and Pellicano, Elizabeth},
 title = {Making the future together: Shaping autism research through meaningful participation},
}

@article{bertilsdotter2019doing,
 publisher = {Taylor \& Francis},
 year = {2019},
 pages = {1082--1101},
 number = {7-8},
 volume = {34},
 journal = {Disability \& Society},
 author = {Bertilsdotter Rosqvist, Hanna and Kourti, Marianthi and Jackson-Perry, David and Brownlow, Charlotte and Fletcher, Kirsty and Bendelman, Daniel and O'Dell, Lindsay},
 title = {Doing it differently: Emancipatory autism studies within a neurodiverse academic space},
}

@article{brosnan2017beyond,
 publisher = {Emerald Publishing Limited},
 year = {2017},
 journal = {Journal of Enabling Technologies},
 author = {Brosnan, Mark and Holt, Samantha and Yuill, Nicola and Good, Judith and Parsons, Sarah},
 title = {Beyond autism and technology: Lessons from neurodiverse populations},
}

@inproceedings{10.1007/978-3-030-25629-6_42,
 isbn = {978-3-030-25629-6},
 pages = {268--274},
 address = {Cham},
 publisher = {Springer International Publishing},
 year = {2020},
 booktitle = {Human Interaction and Emerging Technologies},
 title = {Designing Technologies for Neurodiverse Users: Considerations from Research Practice},
 editor = {Ahram, Tareq and Taiar, Redha and Colson, Serge and Choplin, Arnaud},
 author = {Motti, Vivian Genaro and Evmenova, Anna},
}

@article{Boyle95,
 doi = {10.1111/j.1742-9544.1995.tb01750.x},
 journal = {Australian Psychologist},
 volume = {30},
 number = {1},
 title = {Myers-{B}riggs Type Indicator ({MBTI}): Some psychometric limitations},
 pages = {71--74},
 month = {03},
 year = {1995},
 author = {Boyle, Gregory J.},
}

@article{gerras2016moving,
 year = {2016},
 journal = {Military Review},
 author = {Gerras, Stephen J and Wong, Leonard},
 title = {Moving beyond the {MBTI}},
}

@misc{dickson_2018,
 month = {Jul},
 year = {2018},
 author = {Dickson, Ben},
 howpublished = {PC Magazine. \url{https://www.pcmag.com/opinions/why-ai-must-disclose-that-its-ai}},
 title = {Why {AI} Must Disclose That It's {AI}},
}

@inproceedings{de2020should,
 organization = {Springer},
 year = {2020},
 pages = {3--15},
 booktitle = {International Workshop on Chatbot Research and Design},
 author = {De Cicco, Roberta and Palumbo, Riccardo and others},
 title = {Should a Chatbot Disclose Itself? {I}mplications for an Online Conversational Retailer},
}

@inproceedings{bender2021dangers,
 year = {2021},
 pages = {610--623},
 booktitle = {Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency},
 author = {Bender, Emily M and Gebru, Timnit and McMillan-Major, Angelina and Shmitchell, Shmargaret},
 title = {On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?},
}

@article{Mohammad13,
 year = {2013},
 volume = {29},
 title = {Crowdsourcing a Word-Emotion Association Lexicon},
 pages = {436--465},
 number = {3},
 journal = {Computational Intelligence},
 author = {Mohammad, Saif M. and Turney, Peter D.},
}

@inproceedings{vad-acl2018,
 address = {Melbourne, Australia},
 year = {2018},
 booktitle = {Proceedings of the Annual Conference of the Association for Computational Linguistics (ACL)},
 author = {Mohammad, Saif M.},
 title = {Obtaining Reliable Human Ratings of Valence, Arousal, and Dominance for 20,000 {E}nglish Words},
}

@article{mohammad2020practical,
 journal = {arXiv preprint arXiv:2011.03492},
 year = {2020},
 author = {Mohammad, Saif M.},
 title = {Practical and Ethical Considerations in the Effective use of Emotion and Sentiment Lexicons},
}

@article{auxier2021social,
 year = {2021},
 journal = {Pew Research Center},
 author = {Auxier, Brooke and Anderson, Monica},
 title = {Social media use in 2021},
}

@inproceedings{mohammad-kiritchenko-2018-understanding,
 url = {https://aclanthology.org/L18-1030},
 address = {Miyazaki, Japan},
 year = {2018},
 month = {May},
 booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation ({LREC} 2018)},
 author = {Mohammad, Saif and Kiritchenko, Svetlana},
 title = {Understanding Emotions: A Dataset of Tweets to Study Interactions between Affect Categories},
}

@inproceedings{mohammad-etal-2018-semeval,
 pages = {1--17},
 doi = {10.18653/v1/S18-1001},
 url = {https://aclanthology.org/S18-1001},
 address = {New Orleans, Louisiana},
 year = {2018},
 month = {June},
 booktitle = {Proceedings of the 12th International Workshop on Semantic Evaluation},
 author = {Mohammad, Saif and Bravo-Marquez, Felipe and Salameh, Mohammad and Kiritchenko, Svetlana},
 title = {{S}em{E}val-2018 Task 1: Affect in Tweets},
}

@article{aroyo2015truth,
 year = {2015},
 pages = {15--24},
 number = {1},
 volume = {36},
 journal = {AI Magazine},
 author = {Aroyo, Lora and Welty, Chris},
 title = {Truth is a lie: Crowd truth and the seven myths of human annotation},
}

@inproceedings{checco2017let,
 year = {2017},
 pages = {11--20},
 booktitle = {Proceedings of the Fifth AAAI Conference on Human Computation and Crowdsourcing},
 author = {Checco, Alessandro and Roitero, Kevin and Maddalena, Eddy and Mizzaro, Stefano and Demartini, Gianluca},
 title = {Let's agree to disagree: Fixing agreement measures for crowdsourcing},
}

@inproceedings{klenner2020harmonization,
 address = {Winterthur},
 booktitle = {Proceedings of the 5th Swiss Text Analytics Conference (SwissText) \& 16th Conference on Natural Language Processing (KONVENS)},
 year = {2020},
 author = {Klenner, Manfred and G{\"o}hring, Anne and Amsler, Michael and Ebling, Sarah and Tuggener, Don and H{\"u}rlimann, Manuela and Volk, Martin},
 title = {Harmonization sometimes harms},
}

@inproceedings{basile2020s,
 organization = {CEUR-WS},
 year = {2020},
 pages = {31--40},
 volume = {2776},
 booktitle = {2020 AIxIA Discussion Papers Workshop, AIxIA 2020 DP},
 author = {Basile, Valerio},
 title = {It’s the End of the Gold Standard as we Know it. On the Impact of Pre-aggregation on the Evaluation of Highly Subjective Tasks},
}

@misc{ruder_2020,
 month = {Aug},
 year = {2020},
 howpublished = {\url{https://ruder.io/nlp-beyond-english/index.html}},
 author = {Ruder, Sebastian},
 title = {Why You Should Do {NLP} Beyond {E}nglish},
}

@inproceedings{mozafari2020chatbot,
 year = {2020},
 pages = {1--18},
 booktitle = {ICIS},
 author = {Mozafari, Nika and Weiger, Welf H and Hammerschmidt, Maik},
 title = {The Chatbot Disclosure Dilemma: Desirable and Undesirable Effects of Disclosing the Non-Human Identity of Chatbots},
}

@inproceedings{thaine-penn-2021-chinese,
 pages = {3512--3521},
 url = {https://aclanthology.org/2021.eacl-main.306},
 address = {Online},
 year = {2021},
 month = {April},
 booktitle = {Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume},
 author = {Thaine, Patricia and Penn, Gerald},
 title = {The {C}hinese Remainder Theorem for Compact, Task-Precise, Efficient and Secure Word Embeddings},
}

@article{shmueli2021beyond,
 year = {2021},
 journal = {arXiv preprint arXiv:2104.10097},
 author = {Shmueli, Boaz and Fell, Jan and Ray, Soumya and Ku, Lun-Wei},
 title = {Beyond fair pay: Ethical implications of {NLP} crowdsourcing},
}

@article{dolmaya2011ethics,
 year = {2011},
 number = {10},
 journal = {Linguistica Antverpiensia, New Series--Themes in Translation Studies},
 author = {Dolmaya, Julie McDonough},
 title = {The ethics of crowdsourcing},
}

@article{agrawal2016analyzing,
 year = {2016},
 journal = {arXiv preprint arXiv:1606.07356},
 author = {Agrawal, Aishwarya and Batra, Dhruv and Parikh, Devi},
 title = {Analyzing the behavior of visual question answering models},
}

@inproceedings{bissoto2020debiasing,
 year = {2020},
 pages = {740--741},
 booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
 author = {Bissoto, Alceu and Valle, Eduardo and Avila, Sandra},
 title = {Debiasing skin lesion datasets and models? {N}ot so fast},
}

@article{winkler2019association,
 publisher = {American Medical Association},
 year = {2019},
 pages = {1135--1141},
 number = {10},
 volume = {155},
 journal = {JAMA Dermatology},
 author = {Winkler, Julia K and Fink, Christine and Toberer, Ferdinand and Enk, Alexander and Deinlein, Teresa and Hofmann-Wellenhof, Rainer and Thomas, Luc and Lallas, Aimilios and Blum, Andreas and Stolz, Wilhelm and others},
 title = {Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition},
}

@article{hollenstein2015time,
 publisher = {Sage Publications Sage UK: London, England},
 year = {2015},
 pages = {308--315},
 number = {4},
 volume = {7},
 journal = {Emotion Review},
 author = {Hollenstein, Tom},
 title = {This time, it’s real: Affective flexibility, time scales, feedback loops, and the regulation of emotion},
}

@article{hipson2021emotion,
 pages = {1--19},
 url = {https://doi.org/10.1371/journal.pone.0256153},
 volume = {16},
 month = {09},
 year = {2021},
 title = {Emotion dynamics in movie dialogues},
 publisher = {Public Library of Science},
 journal = {PLOS ONE},
 author = {Hipson, Will E. and Mohammad, Saif M.},
 doi = {10.1371/journal.pone.0256153},
}

@article{schwartz2020green,
 publisher = {ACM New York, NY, USA},
 year = {2020},
 pages = {54--63},
 number = {12},
 volume = {63},
 journal = {Communications of the ACM},
 author = {Schwartz, Roy and Dodge, Jesse and Smith, Noah A and Etzioni, Oren},
 title = {Green {AI}},
}

@misc{ai2_2019,
 month = {Jul},
 year = {2019},
 author = {AI2},
 howpublished = {Medium. \url{https://medium.com/ai2-blog/crowdsourcing-pricing-ethics-and-best-practices-8487fd5c9872}},
 title = {Crowdsourcing: Pricing Ethics and Best Practices},
}

@article{fort-etal-2011-last,
 pages = {413--420},
 doi = {10.1162/COLI_a_00057},
 url = {https://aclanthology.org/J11-2010},
 year = {2011},
 number = {2},
 volume = {37},
 journal = {Computational Linguistics},
 author = {Fort, Kar{\"e}n and Adda, Gilles and Cohen, K. Bretonnel},
 title = {Last Words: {A}mazon {M}echanical {T}urk: Gold Mine or Coal Mine?},
}

@article{standing2018ethical,
 publisher = {Wiley Online Library},
 year = {2018},
 pages = {72--80},
 number = {1},
 volume = {27},
 journal = {Business Ethics: A European Review},
 author = {Standing, Susan and Standing, Craig},
 title = {The ethical use of crowdsourcing},
}

@inproceedings{irani2013turkopticon,
 year = {2013},
 pages = {611--620},
 booktitle = {Proceedings of the SIGCHI Conference on Human Factors in Computing Systems},
 author = {Irani, Lilly C and Silberman, M Six},
 title = {Turkopticon: Interrupting worker invisibility in {A}mazon {M}echanical {T}urk},
}

@article{COBBCLARK201211,
 keywords = {Non-cognitive skills, Big-five personality traits, Stability, Wages},
 author = {Cobb-Clark, Deborah A. and Schurer, Stefanie},
 url = {https://www.sciencedirect.com/science/article/pii/S0165176511004666},
 doi = {10.1016/j.econlet.2011.11.015},
 issn = {0165-1765},
 year = {2012},
 pages = {11--15},
 number = {1},
 volume = {115},
 journal = {Economics Letters},
 title = {The stability of big-five personality traits},
}

@article{yu2018artificial,
 publisher = {Nature Publishing Group},
 year = {2018},
 pages = {719--731},
 number = {10},
 volume = {2},
 journal = {Nature Biomedical Engineering},
 author = {Yu, Kun-Hsing and Beam, Andrew L and Kohane, Isaac S},
 title = {Artificial {I}ntelligence in healthcare},
}

@article{lysaght2019ai,
 publisher = {Springer},
 year = {2019},
 pages = {299--314},
 number = {3},
 volume = {11},
 journal = {Asian Bioethics Review},
 author = {Lysaght, Tamra and Lim, Hannah Yeefen and Xafis, Vicki and Ngiam, Kee Yuan},
 title = {{AI}-assisted decision-making in healthcare},
}

@book{panesar2019machine,
 publisher = {Springer},
 year = {2019},
 author = {Panesar, Arjun},
 title = {Machine learning and {AI} for healthcare},
}

@techreport{Born21,
 year = {2021},
 author = {Born, Georgina and Morris, Jeremy and Diaz, Fernando and Anderson, Ashton},
 title = {Artificial {I}ntelligence, Music Recommendation, and the Curation of Culture},
}

@article{srinivasan2021role,
 year = {2021},
 author = {Srinivasan, Ramya and Uchino, Kanji},
 title = {The Role of Arts in Shaping {AI} Ethics},
}

@inproceedings{schwartz2013characterizing,
 year = {2013},
 pages = {583--591},
 booktitle = {Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media},
 author = {Schwartz, Hansen Andrew and Eichstaedt, Johannes C and Kern, Margaret L and Dziurzynski, Lukasz and Lucas, Richard E and Agrawal, Megha and Park, Gregory J and Lakshmikanth, Shrinidhi K and Jha, Sneha and Seligman, Martin EP and others},
 title = {Characterizing geographic variation in well-being using tweets},
}

@inproceedings{nielsen2011new,
 year = {2011},
 address = {Heraklion, Crete},
 pages = {93--98},
 booktitle = {Proceedings of the ESWC Workshop on `Making Sense of Microposts': Big things come in small packages},
 author = {Nielsen, Finn {\AA}rup},
 title = {A new {ANEW}: Evaluation of a word list for sentiment analysis in microblogs},
}

@inproceedings{10.1145/3287560.3287587,
 series = {FAT* '19},
 location = {Atlanta, GA, USA},
 keywords = {ethics, machine learning, social media, mental health, algorithms},
 numpages = {10},
 pages = {79--88},
 booktitle = {Proceedings of the Conference on Fairness, Accountability, and Transparency},
 doi = {10.1145/3287560.3287587},
 url = {https://doi.org/10.1145/3287560.3287587},
 address = {New York, NY, USA},
 publisher = {Association for Computing Machinery},
 isbn = {9781450361255},
 year = {2019},
 title = {A Taxonomy of Ethical Tensions in Inferring Mental Health States from Social Media},
 author = {Chancellor, Stevie and Birnbaum, Michael L. and Caine, Eric D. and Silenzio, Vincent M. B. and De Choudhury, Munmun},
}

@book{chomsky1975reflections,
 year = {1975},
 publisher = {Pantheon},
 author = {Chomsky, Noam},
 title = {Reflections on language},
}

@book{lakoff2008women,
 publisher = {University of Chicago Press},
 year = {2008},
 author = {Lakoff, George},
 title = {Women, fire, and dangerous things: What categories reveal about the mind},
}

@book{pinker2007stuff,
 publisher = {Penguin},
 year = {2007},
 author = {Pinker, Steven},
 title = {The stuff of thought: Language as a window into human nature},
}

@incollection{mohammad2020survey,
 keywords = {Sentiment analysis, Emotions, Artificial intelligence, Machine learning, Natural language processing (NLP), Social media, Emotion lexicons, Fairness in NLP},
 author = {Mohammad, Saif M.},
 url = {https://www.sciencedirect.com/science/article/pii/B9780128211243000119},
 doi = {10.1016/B978-0-12-821124-3.00011-9},
 isbn = {978-0-12-821125-0},
 year = {2021},
 pages = {323--379},
 edition = {Second Edition},
 publisher = {Woodhead Publishing},
 booktitle = {Emotion Measurement (Second Edition)},
 editor = {Meiselman, Herbert L.},
 title = {Sentiment analysis: Automatically detecting valence, emotions, and other affectual states from text},
}

@article{bamberg1997language,
 publisher = {Elsevier},
 year = {1997},
 pages = {309--340},
 number = {4},
 volume = {19},
 journal = {Language Sciences},
 author = {Bamberg, Michael},
 title = {Language, concepts and emotions: The role of language in the construction of emotions},
}

@article{wiebe2005annotating,
 publisher = {Springer},
 year = {2005},
 pages = {165--210},
 number = {2},
 volume = {39},
 journal = {Language Resources and Evaluation},
 author = {Wiebe, Janyce and Wilson, Theresa and Cardie, Claire},
 title = {Annotating expressions of opinions and emotions in language},
}

@article{tausczik2010psychological,
 publisher = {Sage Publications Sage CA: Los Angeles, CA},
 year = {2010},
 pages = {24--54},
 number = {1},
 volume = {29},
 journal = {Journal of Language and Social Psychology},
 author = {Tausczik, Yla R and Pennebaker, James W},
 title = {The psychological meaning of words: {LIWC} and computerized text analysis methods},
}

@inproceedings{mohammad-2012-emotional,
 pages = {246--255},
 url = {https://aclanthology.org/S12-1033},
 address = {Montr{\'e}al, Canada},
 year = {2012},
 month = {June},
 booktitle = {*{SEM} 2012: The First Joint Conference on Lexical and Computational Semantics {--} Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation ({S}em{E}val 2012)},
 author = {Mohammad, Saif},
 title = {{\#E}motional Tweets},
}

@inproceedings{paul2011you,
 year = {2011},
 pages = {265--272},
 booktitle = {Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media},
 author = {Paul, Michael J and Dredze, Mark},
 title = {You are what you tweet: Analyzing {T}witter for public health},
}

@inproceedings{mohammad-2011-upon,
 pages = {105--114},
 url = {https://aclanthology.org/W11-1514},
 address = {Portland, OR, USA},
 year = {2011},
 month = {June},
 booktitle = {Proceedings of the 5th {ACL}-{HLT} Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities},
 author = {Mohammad, Saif},
 title = {From Once Upon a Time to Happily Ever After: Tracking Emotions in Novels and Fairy Tales},
}

@inproceedings{10.1145/2145204.2145347,
 series = {CSCW '12},
 location = {Seattle, Washington, USA},
 keywords = {twitter, community, emotion, psychology},
 numpages = {4},
 pages = {965--968},
 booktitle = {Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work},
 doi = {10.1145/2145204.2145347},
 url = {https://doi.org/10.1145/2145204.2145347},
 address = {New York, NY, USA},
 publisher = {Association for Computing Machinery},
 isbn = {9781450310864},
 year = {2012},
 title = {Tracking ``{G}ross Community Happiness'' from Tweets},
 author = {Quercia, Daniele and Ellis, Jonathan and Capra, Licia and Crowcroft, Jon},
}

@inproceedings{mohammad2021ethics,
 address = {Dublin, Ireland},
 month = {May},
 booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics},
 year = {2022},
 author = {Mohammad, Saif M.},
 title = {Ethics Sheets for {AI} Tasks},
}

@incollection{harley2016measuring,
 publisher = {Elsevier},
 year = {2016},
 pages = {89--114},
 booktitle = {Emotions, Technology, Design, and Learning},
 author = {Harley, Jason Matthew},
 title = {Measuring emotions: A survey of cutting edge methodologies used in computer-based learning environment research},
}

@inproceedings{hasan2014using,
 year = {2014},
 booktitle = {Proceedings of the ACM SIGKDD Workshop on Health Informatics, New York, USA},
 author = {Hasan, Maryam and Agu, Emmanuel and Rundensteiner, Elke},
 title = {Using hashtags as labels for supervised learning of emotions in {T}witter messages},
}

@inproceedings{purver2012experimenting,
 year = {2012},
 pages = {482--491},
 booktitle = {Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics},
 author = {Purver, Matthew and Battersby, Stuart},
 title = {Experimenting with distant supervision for emotion classification},
}

Attribution
arXiv:2109.08256v3 [cs.CL]
License: cc-by-4.0