
Socially Responsible AI Algorithms: Issues, Purposes, and Challenges

Content License: cc-by


Papers is Alpha. This content is part of an effort to make research more accessible, and (most likely) has lost some details from the original. You can find the original paper here.


Artificial intelligence (AI) has had, and will continue to have, a central role in countless aspects of life, livelihood, and liberty. AI is bringing forth a sea change that is not limited to technical domains but is a truly sociotechnical phenomenon affecting healthcare, education, commerce, finance, and criminal justice, not to mention day-to-day life. AI offers both promise and perils. A report published by Martha Lane Fox’s Doteveryone think tank reveals that 59% of tech workers have worked on products they felt were harmful to society, and more than 25% of workers in AI who had such an experience quit their jobs as a result; this was particularly marked in relation to AI products. The rise of activism – regarded as one of the few current mechanisms for keeping big tech companies in check – against the negative social impacts of big tech has brought Social Responsibility of AI into the spotlight of the media, the general public, and AI technologists and researchers. Even researchers in universities and research institutes are trying hard to rectify the mistakes made by algorithms. Stanford’s COVID-19 vaccine allocation algorithm, for example, prioritized older employees over front-line workers, turning much of our attention again to the transparency and fairness of AI.

Research directed towards developing fair, transparent, accountable, and ethical AI algorithms has burgeoned, with a focus on decision-making algorithms such as scoring or classification, to mitigate unwanted bias and achieve fairness. However, this narrow subset of research risks blinding us to the challenges and opportunities presented by the full scope of AI. To identify potential higher-order effects on safety, privacy, and society at large, it is critical to think beyond algorithmic bias and capture all the connections among the different aspects of AI algorithms. Therefore, this survey complements prior work through a holistic understanding of the relations between AI systems and humans. In this work, we begin by introducing an inclusive definition of Social Responsibility of AI. Drawing on theories in business research, we then present a pyramid of Social Responsibility of AI that outlines four specific AI responsibilities in a hierarchy, adapted from the pyramid proposed for Corporate Social Responsibility (CSR) by carroll1991pyramid. In the second part of the survey, we review major aspects of AI algorithms and provide a systematic framework – Socially Responsible AI Algorithms (SRAs) – that aims to understand the connections among these aspects. In particular, we examine the subjects and causes of socially indifferent AI algorithms[We define “indifferent” as the complement of responsible rather than “irresponsible”.], define the objectives, and introduce the means by which we can achieve SRAs. We further discuss how to leverage SRAs to improve the daily life of human beings and address challenging societal issues through protecting, informing, and preventing/mitigating. We illustrate these ideas using recent studies on several emerging societal challenges. The survey concludes with open problems and challenges in SRAs.

Differences from Existing Surveys. Some recent surveys focus on specific topics such as bias and fairness, interpretability/explainability, and privacy-preservation. These surveys successfully draw great attention to the social responsibility of AI, leading to further developments in this important line of research. However, as indispensable components of socially responsible AI, these topics have been presented in their own self-contained ways. These works pave the way for looking at socially responsible AI holistically. Therefore, our survey aims to frame socially responsible AI with a more systematic view that goes beyond discussion of each independent line of research. We summarize our contributions as follows:

  • We formally define social responsibility of AI with three specified dimensions: principles, means, and objectives. We then propose the pyramid of social responsibility of AI, describing its four fundamental responsibilities: functional, legal, ethical, and philanthropic. The pyramid embraces the entire range of AI responsibilities, involving efforts from various disciplines.
  • We propose a systematic framework that discusses the essentials of socially responsible AI algorithms (SRAs) – including their subjects, causes, means, and objectives – and the roles of SRAs in protecting users, informing them, and preventing/mitigating the negative impacts of AI. This framework subsumes existing topics such as fairness and interpretability.
  • We look beyond prior research in socially responsible AI and identify an extensive list of open problems and challenges, ranging from understanding why we need AI systems to showing the need to define new AI ethics principles and policies. We hope our discussions can spark future research on SRAs.

Intended Audience and Paper Organization. This survey is intended for AI researchers and technologists, as well as researchers and practitioners from other disciplines who would like to contribute their expertise to making AI more socially responsible. The rest of the survey is organized as follows: Section 2 introduces the definition and the pyramid of social responsibility of AI, and compares definitions of similar concepts. Section 3 discusses the framework of socially responsible AI algorithms and its essentials, followed by Section 4, which illustrates the roles of SRAs using several emerging societal issues as examples. Section 5 details the open problems and challenges that socially responsible AI currently confronts. The last section concludes the survey.

Social Responsibility of AI

Social Responsibility of AI includes efforts devoted to addressing both technical and societal issues. While similar concepts (e.g., “Ethical AI”) repeatedly appear in the news, magazines, and scientific articles, “Social Responsibility of AI” has yet to be properly defined. In this section, we first attempt to provide an inclusive definition and then propose the Pyramid of Social Responsibility of AI to outline the various responsibilities of AI in a hierarchy: functional responsibilities, legal responsibilities, ethical responsibilities, and philanthropic responsibilities. Finally, we compare “Socially Responsible AI” with similar concepts.

What is Social Responsibility of AI?

Social Responsibility of AI refers to a human value-driven process where values such as Fairness, Transparency, Accountability, Reliability and Safety, Privacy and Security, and Inclusiveness are the principles; designing Socially Responsible AI Algorithms is the means; and addressing the social expectations of generating shared value – enhancing both AI’s ability and benefits to society – is the main objective.

Here, we define three dimensions of Social Responsibility of AI: the principles lay the foundations for ethical AI systems; the means to reach the overarching goal of Social Responsibility of AI is to develop Socially Responsible AI Algorithms; and the objective of Social Responsibility of AI is to improve both AI’s capability and humanity with the second being the proactive goal.

The Pyramid of Social Responsibility of AI

Social Responsibility of AI should be framed in such a way that the entire range of AI responsibilities is embraced. Adapting Carroll’s Pyramid of CSR to the AI context, we suggest four kinds of social responsibilities that constitute the Social Responsibility of AI: functional, legal, ethical, and philanthropic responsibilities, as shown in Figure pyramid. By modularizing AI responsibilities, we hope to help AI technologists and researchers reconcile these obligations and simultaneously fulfill all the components of the pyramid. All of these responsibilities have always existed, but functional responsibilities were the main consideration until recently. Each type of responsibility requires close consideration.


The pyramid of Social Responsibility of AI, adapted from the Pyramid of CSR by carroll1991pyramid.

The pyramid portrays the four components of Social Responsibility of AI, beginning with the basic building-block notion that the functional competence of AI undergirds all else. Functional responsibilities require AI systems to perform in a manner consistent with profit maximization, operating efficiency, and other key performance indicators. Meanwhile, AI is expected to obey the law, which codifies the acceptable and unacceptable behaviors in our society. That is, legal responsibilities require AI systems to perform in a manner consistent with the expectations of government and law; all AI systems should at least meet the minimal legal requirements. At its most fundamental level, ethical responsibilities are the obligation to do what is right, just, and fair, and to prevent or mitigate negative impact on stakeholders (e.g., users, the environment). To fulfill their ethical responsibilities, AI systems need to perform in a manner consistent with societal expectations and ethical norms, which cannot be compromised in order to achieve AI’s functional responsibilities. Finally, under philanthropic responsibilities, AI systems are expected to be good AI citizens and to contribute to tackling societal challenges such as cancer and climate change. In particular, it is important for AI systems to perform in a manner consistent with the philanthropic and charitable expectations of society to enhance people’s quality of life. The distinguishing feature between ethical and philanthropic responsibilities is that the latter are not expected in an ethical sense: while communities desire AI systems to be applied to humanitarian projects or purposes, they do not regard AI systems as unethical if they do not provide such services. We explore the nature of Social Responsibility of AI by focusing on its components to help AI technologists reconcile these obligations. Though these four components are depicted as separate concepts, they are not mutually exclusive. It is necessary for AI technologists and researchers to recognize that these obligations are in constant but dynamic tension with one another.

Comparisons of Similar Concepts

Based on Definition 1 and the pyramid of social responsibility of AI, we compare Socially Responsible AI with other similar concepts, as illustrated in Table comparisons. The comparison shows that Socially Responsible AI holds a systematic view that subsumes existing concepts and further considers the fundamental responsibilities of AI systems – to be functional and legal – as well as their philanthropic responsibilities – to improve the quality of life of human beings and address challenging societal issues. In the rest of this survey, we focus our discussions on the ethical (Section 3, essentials of SRAs) and philanthropic (Section 4, roles of SRAs) responsibilities of AI, given that the functional and legal responsibilities are the usual focus of AI research and development. An overview of SRAs research is illustrated in Figure outline, which we will refer back to throughout the remainder of the survey. Importantly, in our view, the essentials of SRAs work toward ethical responsibilities, and their roles in society encompass both ethical and philanthropic responsibilities.

Definitions of concepts similar to Socially Responsible AI.

Table Label: comparisons

Download PDF to view table


Socially Responsible AI Algorithms (SRAs)

The role of AI technologists and researchers carries a number of responsibilities. The most obvious is developing accurate, reliable, and trustworthy algorithms that their users can depend on. Yet this has never been a trivial task. Due to various types of human bias, e.g., confirmation bias, gender bias, and anchoring bias, AI technologists and researchers often inadvertently inject these same kinds of bias into the algorithms they develop, especially when using machine learning techniques. For example, supervised machine learning is a common technique for learning and validating algorithms through manually annotated data, loss functions, and related evaluation metrics. Numerous uncertainties – e.g., imbalanced data, ill-defined criteria for data annotation, over-simplified loss functions, and unexplainable results – lurk in this “beautiful” pipeline and can eventually lead to negative outcomes such as bias and discrimination. With the growing reliance on AI in almost every field of our society, we must bring upfront the vital question of how to develop Socially Responsible AI Algorithms. While conclusive answers are yet to be found, we attempt to provide a systematic framework of SRAs (illustrated in Figure sras) to discuss the components of AI’s ethical responsibilities, the roles of SRAs in terms of AI’s philanthropic and ethical responsibilities, and the feedback from users routed back as inputs to SRAs. We hope to broaden future discussions on this subject. In this regard, we define SRAs as follows:


An overview of SRAs Research.


The framework of Socially Responsible AI Algorithms (SRAs). It consists of the essentials (i.e., the internal mechanisms) of SRAs (left), their roles (right), and feedback received from end users for helping SRAs gradually achieve the expected social values (bottom). The essentials of SRAs center on the ethical responsibilities of AI and the roles of SRAs require philanthropic responsibilities and ethical responsibilities.

Socially Responsible AI Algorithms are intelligent algorithms that treat the needs of all stakeholders, especially minoritized and disadvantaged users, as the highest priority in order to make just and trustworthy decisions. These obligations include protecting and informing users; preventing and mitigating negative impact; and maximizing long-term beneficial impact. Socially Responsible AI Algorithms constantly receive feedback from users in order to continually accomplish the expected social values.

In this definition, we highlight that the functional (e.g., maximizing profits) and societal (e.g., transparency) objectives are integral parts of AI algorithms. SRAs aim to be socially responsible while still meeting and exceeding business objectives.

Subjects of Socially Indifferent AI Algorithms

Every human being can be a potential victim of socially indifferent AI algorithms. Mirroring society, the ones who suffer the most, both in frequency and severity, are minorities and disadvantaged groups such as black, indigenous, and people of color (BIPOC), and women. For example, Google mislabeled an image of two black people as “gorillas” and more frequently showed ads for high-paying jobs to males than to females. Similar gender bias was also found in the Facebook algorithms behind job ads. In domains with high-stakes decisions, e.g., financial services, healthcare, and criminal justice, it is not uncommon to identify instances where socially indifferent AI algorithms favor privileged groups. For example, the algorithm used in Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) was found to be almost twice as likely to mislabel a black defendant as a future risk as it was a white defendant. Identifying the subjects of socially indifferent AI algorithms depends on the context. In another study, the journalistic organization ProPublica investigated the algorithms that determine online prices for Princeton Review’s tutoring classes. The results showed that people living in higher-income areas were sometimes charged twice as much as the general public, as were people living in zip codes with high population density. Asians were 1.8 times more likely to pay a higher price, regardless of their income. Analogously, such AI algorithms might put poor people who cannot afford internet service at a disadvantage simply because the algorithms have never seen such data samples in the training process.

When it comes to purpose-driven collection and use of data, each individual can be the subject of socially indifferent AI algorithms. Users’ personal data are frequently collected and used without their consent. Such data includes granular details such as contact information, online browsing and session records, social media consumption, location, and so on. While most of us are aware that our data is being used, few have control over where and how it is used, and by whom. This misuse of data and lack of knowledge cause users to become victims of privacy leakage and distrust.

Causes of Socially Indifferent AI Algorithms

There are many potential factors that can cause AI algorithms to be socially indifferent. Here, we list several causes that have been frequently discussed in the literature: formalization, measuring errors, bias, data misuse, and correlation versus causation.


Formalization

AI algorithms encompass data formalization, label formalization, and the formalization of loss functions and evaluation metrics. We unconsciously make some frame-of-reference commitment to each of these formalizations. First, social and historical context is often left out when transforming raw data into numerical feature vectors; consequently, AI algorithms are trained on pre-processed data with important contextual information missing. Second, data annotation can be problematic for a number of reasons: What are the criteria? Who defines the criteria? Who are the annotators? How can it be ensured that they all follow the criteria? What we have for model training are only proxies of the true labels. Ill-formulated loss functions can also result in socially indifferent AI algorithms: many loss functions are over-simplified to solely focus on maximizing profits and minimizing losses. The concerns of unethical optimization are discussed by beale2019unethical: unknown to the AI system, certain strategies in the optimization space that stakeholders would consider unethical may be selected to satisfy the simplified task requirements. Lastly, the use of inappropriate benchmarks for evaluation may push algorithms away from the overarching goal of the task and fuel injustice.

Measuring Errors

Another cause of socially indifferent AI algorithms is error in measuring algorithm performance. When reporting results, researchers typically proclaim that the proposed algorithms achieve a certain accuracy or F1 score. However, this rests on the assumption that the training and test samples are representative of the target population and that their distributions are similar enough. Yet how often does this assumption hold in practice? As illustrated in Figure measureerror, with non-representative samples, the learned model can achieve zero training error and perform well on the testing data at the initial stage. However, as more data is tested later, model performance deteriorates because the learned model does not represent the true model.


An example of measuring errors. The green line denotes the learned model and the blue one is the true model. `+' and `-' represent training data belonging to different classes; `X' represents testing data. Image taken from Getoor's slides for 2019 IEEE Big Data keynotewith permission.
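This failure mode is easy to reproduce. Below is a minimal sketch (with a made-up one-dimensional "true model" and a threshold classifier standing in for the learned model): training data drawn from a narrow region yields perfect training accuracy, but accuracy degrades once deployment data covers the full input range.

```python
import random

random.seed(0)

def true_model(x):
    # Ground truth: positive only inside a band of the input space.
    return 1 if 0.4 <= x <= 0.6 else 0

def learned_model(x):
    # A single-threshold classifier that fits the narrow sample perfectly,
    # because it never observes the upper boundary of the positive class.
    return 1 if x >= 0.4 else 0

# Non-representative training sample: drawn only from x <= 0.6.
train_x = [random.uniform(0.0, 0.6) for _ in range(500)]
train_acc = sum(learned_model(x) == true_model(x) for x in train_x) / len(train_x)

# Deployment data covers the full range, including x > 0.6.
test_x = [random.uniform(0.0, 1.0) for _ in range(500)]
test_acc = sum(learned_model(x) == true_model(x) for x in test_x) / len(test_x)

print(f"train accuracy: {train_acc:.2f}")  # perfect on the narrow sample
print(f"test accuracy:  {test_acc:.2f}")   # noticeably lower at deployment
```

The model looks flawless by every reported metric until inputs from the unseen region arrive, which is exactly the situation depicted in Figure measureerror.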


Bias

Bias is one of the most discussed topics regarding responsible AI. Here we focus on data bias, automation bias, and algorithmic bias.

Data Bias. Data, especially big data, is often heterogeneous, i.e., data with high variability of types and formats, e.g., text, image, and video. The availability of multiple data sources brings unprecedented opportunities as well as unequivocal challenges. For instance, high-dimensional data such as text is infamous for the danger of overfitting and the curse of dimensionality. Additionally, it is rather challenging to find a subset of features that are predictive but uncorrelated, and the number of samples required for generalization grows proportionally with the feature dimension. One example is how the U.S. National Security Agency tried to use AI algorithms to identify potential terrorists. The Skynet project collected cellular network traffic in Pakistan and extracted 80 features for each cell phone user, with only 7 known terrorists available for training. The algorithm ended up identifying an Al Jazeera reporter covering Al Qaeda as a potential terrorist. Data heterogeneity also violates the well-known $i.i.d.$ assumption of most learning algorithms; training these algorithms on heterogeneous data can therefore produce undesired results. Imbalanced subgroups are another source of data bias. As one illustration, regression analysis based on subgroups with balanced fitness levels suggests a positive correlation between BMI and daily pasta calorie intake, whereas analysis based on less balanced data shows almost no relationship.
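The curse of dimensionality mentioned above can be observed directly in how distances concentrate: in high dimensions, the nearest and farthest random points are almost equally far away, which undermines distance-based learning. A small sketch (the dimensions and sample size are arbitrary choices for illustration):

```python
import math
import random

random.seed(0)

def relative_contrast(dim, n=200):
    """(farthest - nearest) / nearest distance to the origin over n random
    points in [0, 1]^dim; this contrast shrinks as the dimension grows."""
    points = [[random.random() for _ in range(dim)] for _ in range(n)]
    dists = [math.sqrt(sum(c * c for c in p)) for p in points]
    return (max(dists) - min(dists)) / min(dists)

low = relative_contrast(2)      # near and far points are well separated
high = relative_contrast(1000)  # every point is almost equally far away

print(f"dim=2:    contrast {low:.2f}")
print(f"dim=1000: contrast {high:.2f}")
```

With 80 features and only 7 positive examples, as in the Skynet case, this concentration effect (together with extreme class imbalance) makes reliable generalization essentially hopeless.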

Automation Bias. This type of bias refers to our preference for results suggested by automated decision-making systems, even in the face of contradictory information. With too much reliance on automated systems and too little additional thought in making final decisions, we may end up abdicating decision responsibility to AI algorithms.

Algorithmic Bias. Algorithmic bias refers to biases added purely by the algorithm itself. Some algorithms are inadvertently taught prejudices and unethical biases by societal patterns hidden in the data. Typically, models fit better to features that frequently appear in the data. For example, an automated AI recruiting tool will learn to make decisions for a given applicant to a software engineer position using observed patterns such as “experience”, “programming skills”, “degree”, and “past projects”. For a position where the gender disparity is large, the algorithm mistakenly interprets this collective imbalance as a useful pattern in the data rather than as undesirable noise that should be discarded. Algorithmic bias is a systematic and repeatable error in an AI system that creates discriminatory outcomes, e.g., privileging wealthy users over others. It can amplify, operationalize, and even legitimize institutional bias.

Data Misuse

Data is the fuel and new currency that has empowered tremendous progress in AI research. Search engines rely on data to craft precisely personalized recommendations that improve the online experience of consumers, including online shopping, book recommendation, entertainment, and so on. However, users’ data are frequently misused without their consent and awareness. One example is the Facebook–Cambridge Analytica scandal, where millions of Facebook users’ personal data was collected by Cambridge Analytica without their consent. In a more recent study, researchers showed that Facebook allows advertisers to exploit its users’ sensitive information for tailored ad campaigns. To make things worse, users often have no clue about where, how, and why their data is being used, and by whom. This lack of knowledge and choice over their data causes users to undervalue their personal data, and further creates issues such as privacy leakage and distrust.

Correlation vs Causation


Confounders are common reasons for spurious correlation between two variables that are not causally connected.

AI algorithms can become socially indifferent when correlation is misinterpreted as causation. For example, in the diagram in Figure icecream, we observe a strong correlation between the electric bill of an ice cream shop and its ice cream sales. Obviously, a high electric bill cannot cause ice cream sales to increase. Rather, weather is the common cause of both: high temperatures drive up the electric bill and increase ice cream sales. Weather – the confounder – creates a spurious correlation between the electric bill and ice cream sales. Causality is a generic relationship between a cause and an outcome. While correlation helps with prediction, causation is important for decision making. One typical example is Simpson’s Paradox, which describes a phenomenon where a trend or association observed in subgroups may be opposite to that observed when the subgroups are aggregated. For instance, in a study analyzing sex bias in graduate admissions at UC Berkeley, the admission rate was found to be higher for male applicants when the entire dataset was used. However, when the admission data were separated and analyzed by department, female candidates had an equal or even higher admission rate than male candidates.
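Simpson's paradox can be reproduced with a few lines of arithmetic. The counts below are hypothetical, chosen only to mimic the structure of the Berkeley admissions example, not the actual figures:

```python
# Illustrative admission counts: (admitted, applied) per department and gender.
data = {
    "Dept A": {"male": (80, 100), "female": (18, 20)},
    "Dept B": {"male": (10, 100), "female": (90, 800)},
}

def rate(admitted, applied):
    return admitted / applied

# Within each department, women are admitted at a higher rate...
per_dept = {
    dept: {g: rate(*counts) for g, counts in groups.items()}
    for dept, groups in data.items()
}

# ...yet aggregating over departments reverses the trend, because women
# apply disproportionately to the department with the lower overall rate.
totals = {
    g: rate(
        sum(data[d][g][0] for d in data),
        sum(data[d][g][1] for d in data),
    )
    for g in ("male", "female")
}

for dept, rates in per_dept.items():
    print(f"{dept}: male {rates['male']:.1%}, female {rates['female']:.1%}")
print(f"overall: male {totals['male']:.1%}, female {totals['female']:.1%}")
```

The reversal arises purely from how applicants are distributed across departments, which is why conditioning on the right variables, rather than reading off aggregate correlations, matters for decision making.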

Objectives of Socially Responsible AI Algorithms

Essentially, the goal is to (re)build trust in AI. By definition, trust is the “firm belief in the reliability, truth or ability of someone or something”[Definition from Oxford Languages.]. It is a high-level concept that needs to be specified by more concrete objectives. Here we discuss the SRA objectives most frequently discussed in the literature: fairness, transparency, and safety, as illustrated in Figure objective.


Fairness

Fairness in AI has gained substantial attention in both research and industry since 2010. Researchers have found it rather challenging to present a unified definition of fairness, in part because fairness is a societal and ethical concept. The concept is mostly subjective, changes with social context, and evolves over time, making fairness a rather challenging goal to achieve in practice. Because SRAs constitute a decision-making process commensurate with social values, we adopt a fairness definition in the context of decision-making:

“Fairness is the absence of any prejudice or favoritism toward an individual or a group based on their inherent or acquired characteristics”.


The objectives of Socially Responsible AI Algorithms.

Note that even an ideally “fair” AI system defined in a specific context might still lead to biased decisions, as the entire decision-making process involves numerous elements such as policy makers and the environment. While the concept of fairness is difficult to pin down, unfairness/bias/discrimination may be easier to identify. There are six types of discrimination. Direct discrimination results from protected attributes of individuals, while indirect discrimination results from seemingly neutral and non-protected attributes. Systemic discrimination relates to policies that may discriminate against subgroups of the population. Statistical discrimination occurs when decision makers use average statistics to represent individuals. Finally, depending on whether the differences among groups can be justified, we further have explainable and unexplainable discrimination.
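As a concrete illustration, one widely used group-fairness check, demographic (statistical) parity, can be computed directly from decision outcomes. The function and toy data below are hypothetical, and demographic parity is only one of many competing fairness criteria:

```python
def demographic_parity_gap(decisions, groups):
    """Absolute difference in favorable-decision rates between two groups.

    decisions: 0/1 outcomes; groups: parallel group labels. A gap near 0
    means both groups receive favorable decisions at similar rates, which
    is one notion of group fairness, not the only one.
    """
    rates = {}
    for g in set(groups):
        members = [d for d, gi in zip(decisions, groups) if gi == g]
        rates[g] = sum(members) / len(members)
    a, b = rates.values()
    return abs(a - b)

# Hypothetical loan decisions for two demographic groups.
decisions = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
groups = ["a"] * 5 + ["b"] * 5

gap = demographic_parity_gap(decisions, groups)
print(f"demographic parity gap: {gap:.2f}")  # 0.60 vs 0.20 -> gap 0.40
```

A large gap flags a possible disparity worth investigating, though, as noted above, whether a measured difference constitutes unjustified discrimination still depends on context.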


Transparency

Transparency is another important but quite ambiguous concept, partly because AI alone can be defined in more than 70 ways. When we seek a transparent algorithm, we are asking for an understandable explanation of how it works: What does the training set look like? Who collected the data? What is the algorithm doing? There are three main types of transparency with regard to human interpretability of AI algorithms. For a developer, the goal of transparency is to understand how the algorithm works and get a sense of why; for a deployer, who owns and releases the algorithm to the public, the goal of transparency is to make consumers feel safe and comfortable using the system; and for a user, transparency means understanding what the AI system is doing and why. We may further differentiate global transparency from local transparency: the former aims to explain the entire system, whereas the latter explains a decision within a particular context.

Yet, at the same time, disclosures about AI can pose potential risks: explanations can be hacked, and releasing additional information may make AI more vulnerable to attacks. It is becoming clear that transparency is often beneficial but not universally good. The AI “transparency paradox” encourages the different parties of AI systems to think more carefully about how to balance transparency against the risks it poses. Related discussions appear in recent work studying how the widely recognized interpretability algorithms LIME and SHAP can be hacked: as the authors illustrated, explanations can be purposefully manipulated, leading to a loss of trust not only in the model but also in its explanations. Consequently, while working towards the goal of transparency, we must also recognize that privacy and security are indispensable conditions we need to satisfy.


Safety

Because AI systems operate in a world with much uncertainty, volatility, and flux, another objective of SRAs is to be safe, accurate, and reliable. There are four operational objectives relevant to safety: accuracy, reliability, security, and robustness. In machine learning, accuracy is typically measured by the error rate, i.e., the fraction of instances for which the algorithm produces an incorrect output. As a standard performance metric, accuracy is a fundamental component of establishing safe AI. It is necessary to specify a proper performance measure for evaluating any AI system; for instance, when data for a classification task is extremely imbalanced, precision, recall, and F1 scores are more appropriate than accuracy. The objective of reliability is to ensure that AI systems behave as we anticipate; it is a measure of consistency and is important for establishing confidence in the safety of AI systems. Security encompasses the protection of information integrity, confidentiality, and continuous functionality to its users. Finally, robustness requires that, under harsh conditions (e.g., adversarial attacks, perturbations, and implementation errors), AI systems continue to function reliably and accurately.
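The point about imbalanced data is easy to verify: a degenerate classifier that always predicts the majority class scores high accuracy, while precision, recall, and F1 expose its uselessness. A minimal sketch with made-up labels:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# 95 negatives and 5 positives; a degenerate model predicts all-negative.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

acc, prec, rec, f1 = classification_metrics(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
# High accuracy (0.95) even though the model never detects a positive case.
```

In a safety-critical setting (e.g., detecting a rare disease), the 0.95 accuracy here would be dangerously misleading, which is why the choice of performance measure is itself a safety decision.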

Means Towards Socially Responsible AI Algorithms

In this section, we review four primary machine learning techniques and statistical methods for achieving the goals of SRAs: interpretability and explainability, adversarial machine learning, causal learning, and uncertainty quantification. Existing surveys have conducted comprehensive reviews of each of these techniques, e.g., interpretability, causal learning, adversarial machine learning, and uncertainty quantification. We therefore focus on the basics and the most frequently discussed methods for each means.

Interpretability and Explainability

Interpretability and explainability are the keys to increasing the transparency of AI algorithms. This is extremely important when we leverage these algorithms for high-stakes prediction applications, which deeply impact people’s lives. Existing work in machine learning interpretability can be categorized according to different criteria. Depending on when the interpretability methods are applicable (before, during, or after building the machine learning model), we have pre-model (before), in-model (during), and post-model (after) interpretability. Pre-model techniques are only applicable to the data itself and require an in-depth understanding of the data before building the model, e.g., its sparsity and dimensionality. Pre-model interpretability is therefore closely related to data interpretability, in which classic descriptive statistics and data visualization methods are often used, including Principal Component Analysis and t-SNE, and clustering methods such as $k$-means. In-model interpretability asks for intrinsically interpretable AI algorithms (e.g., yang2016hierarchical); we can also refer to it as intrinsic interpretability. It can be achieved through the imposition of constraints on the model such as causality, sparsity, or physical conditions from domain knowledge. In-model interpretability answers the question of how the model works. Decision trees, rule-based models, linear regression, attention networks, and disentangled representation learning are in-model interpretability techniques. Post-model interpretability, or post-hoc interpretability (e.g., mordvintsev2015inceptionism, ribeiro2016should), is applied after model training and answers the question of what else the model can tell us. Post-model interpretability techniques include local explanations, saliency maps, example-based explanations, influence functions, feature visualization, and explanation by base interpretable models.

Another criterion for grouping current interpretability techniques is model-specific vs. model-agnostic. Model-specific interpretation is based on the internals of a specific model; for example, the coefficients of a linear regression model belong to model-specific interpretation. Model-agnostic methods do not have access to the model’s inner workings; rather, they can be applied to any machine learning model after it has been trained. Essentially, the goal of interpretability is to help the user understand the decisions made by machine learning models through the tool of explanation. There are pragmatic and non-pragmatic theories of explanation. The former indicates that an explanation should be a good answer that can be easily understood by the audience; the non-pragmatic theory emphasizes the correctness of the answer to the why-question. Both need to have the following properties: expressive power, translucency, portability, and algorithmic complexity.
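A minimal sketch of a model-agnostic, post-hoc technique is permutation feature importance: shuffle one feature column and measure the drop in accuracy. The black-box model and data below are purely illustrative (the toy model depends only on feature 0, so shuffling feature 1 should change nothing).

```python
import random

# Toy black-box model: its prediction depends only on feature 0.
def black_box(x):
    return 1 if x[0] > 0.5 else 0

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature, seed=0):
    """Drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    col = [x[feature] for x in X]
    rng.shuffle(col)
    X_perm = [x[:feature] + [v] + x[feature + 1:] for x, v in zip(X, col)]
    return base - accuracy(model, X_perm, y)

rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(500)]
y = [1 if x[0] > 0.5 else 0 for x in X]

imp0 = permutation_importance(black_box, X, y, 0)  # feature the model uses
imp1 = permutation_importance(black_box, X, y, 1)  # feature the model ignores
```

Because the technique touches only inputs and outputs, it applies unchanged to any trained model, which is exactly what "model-agnostic" means here.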

Adversarial Machine Learning

Machine learning models, especially deep learning models, are vulnerable to crafted adversarial examples, which are imperceptible to human eyes but can easily fool deep neural networks (NNs) in the testing/deployment stage. Adversarial examples have raised serious concerns about the security and integrity of various applications. Adversarial machine learning therefore closely relates to the robustness of SRAs.

The security of any machine learning model is measured with regard to the adversarial goals and capabilities. Identifying the threat surface of an AI system built on machine learning models is critical to understanding where and how an adversary may subvert the system under attack. For example, the attack surface in a standard automated vehicle system can be defined with regard to the data processing pipeline. Typically, the attack surface can identify three types of attacks: evasion attack – the adversary attempts to evade the system by manipulating malicious samples during the testing phase; poisoning attack – the adversary attempts to poison the training data by injecting carefully designed samples into the learning process; and exploratory attack – the adversary tries to collect as much information as possible about the learning algorithm of the underlying system and the patterns in the training data. Depending on the amount of information available to an adversary about the system, we can define different types of adversarial capabilities. In the training phase (i.e., training phase capabilities), there are three broad attack strategies: (1) data injection – the adversary can only augment the training set with new data; (2) data modification – the adversary has full access to the training data; and (3) logic corruption – the adversary can modify the learning algorithm. In the testing phase (i.e., testing phase capabilities), adversarial attacks focus on producing incorrect outputs. In a white-box attack, an adversary has full knowledge about the model used for prediction: the algorithm used in training, the training data distribution, and the parameters of the fully trained model. The other type is the black-box attack, which, on the contrary, assumes no knowledge about the model and only uses historical information or information about the settings. The primary goal of a black-box attack is to train a local substitute model, either using the known data distribution (non-adaptive attack) or using a dataset carefully selected by querying the target model (adaptive attack).
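The white-box evasion setting can be made concrete with the fast gradient sign method (FGSM), a standard white-box attack: perturb the input in the direction of the sign of the loss gradient. The fixed logistic classifier and input below are illustrative, not from any real system.

```python
import math

# A fixed, already "trained" logistic classifier: p(y=1|x) = sigmoid(w.x + b)
w = [2.0, -3.0, 1.5]
b = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(x, y_true, eps):
    """Perturb x by eps * sign(d loss / d x) to push the prediction
    away from the true label (cross-entropy loss on a logistic model)."""
    p = predict(x)
    # For logistic regression + cross-entropy: d loss / d x_i = (p - y) * w_i
    grad = [(p - y_true) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

x = [0.4, 0.1, 0.2]                 # clean input, true label 1
p_clean = predict(x)                # confidently above 0.5
x_adv = fgsm(x, y_true=1, eps=0.5)  # white-box: we used w directly
p_adv = predict(x_adv)              # pushed toward the wrong class
```

The attack needs the model parameters to compute the gradient, which is why it belongs to the white-box setting; black-box attacks must first approximate this gradient through a substitute model or queries.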

Exploratory attacks do not have access to the training data but aim to learn the current state by probing the learner. Commonly used techniques include model inversion attacks, model extraction using APIs, and inference attacks. The most popular attacks are evasion attacks, where malicious inputs are craftily manipulated so as to fool the model into making false predictions. Poisoning attacks, by contrast, modify the input during the training phase to obtain the desired results. Some of the well-known techniques are generative adversarial networks (GANs), adversarial example generation (including training phase modification, e.g., barreno2006can, and testing phase modification, e.g., papernot2016distillation), GAN-based attacks in collaborative deep learning, and adversarial classification.

Causal Learning

Causal inference and reasoning are critical ingredients for AI to achieve human-level intelligence, an overarching goal of Socially Responsible AI. The momentum of integrating causality into responsible AI is growing, as witnessed by a number of works (e.g., kusner2017counterfactual,xu2019achieving,holzinger2019causability) studying SRAs through causal learning methods.

Basics of Causal Learning. The two fundamental frameworks in causal learning are structural causal models and potential outcomes. Structural causal models rely on the causal graph, a special class of Bayesian network whose edges denote causal relationships; a more structured format is referred to as structural equations. One of the fundamental notions in structural causal models is the do-calculus, an operation for intervention. A central difficulty in causal studies is the difference between the observational and the interventional distribution; the latter describes what the distribution of the outcome $Y$ would be if we were to set the covariates $X=x$. The potential outcome framework interprets causality as follows: given a treatment and an outcome, we can observe only one potential outcome for each individual. The counterfactuals – the potential outcomes that would have been observed had the individual received a different treatment – can never be observed in reality. These two frameworks are the foundations of causal effect estimation (estimating the effect of a treatment) and causal discovery (learning causal relations amongst different variables).
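The gap between observational and interventional distributions can be seen in a small simulated structural causal model where a confounder $Z$ drives both $X$ and $Y$, and $X$ itself has no effect on $Y$. The specific probabilities below are made up for illustration; conditioning on $X=1$ then overstates the effect that do($X=1$) would actually have.

```python
import random

rng = random.Random(0)
N = 200_000

def sample(do_x=None):
    """One draw from the SCM: Z -> X and Z -> Y (no X -> Y edge).
    Passing do_x overrides the mechanism for X, i.e., an intervention."""
    z = rng.random() < 0.5
    x = (rng.random() < (0.8 if z else 0.2)) if do_x is None else do_x
    y = rng.random() < (0.7 if z else 0.2)
    return x, y

# Observational: P(Y=1 | X=1), estimated by conditioning on X=1.
obs = [y for x, y in (sample() for _ in range(N)) if x]
p_obs = sum(obs) / len(obs)            # ~0.60: inflated by confounding

# Interventional: P(Y=1 | do(X=1)), estimated by forcing X=1.
intv = [y for _, y in (sample(do_x=True) for _ in range(N))]
p_do = sum(intv) / len(intv)           # ~0.45: X truly has no effect on Y
```

The intervention severs the $Z \rightarrow X$ edge, so the interventional estimate recovers the marginal $P(Y=1)$, while naive conditioning mistakes the confounder's influence for a causal effect.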

Many important concepts in causal inference have been adapted to AI, such as intervention and counterfactual reasoning. Here, we introduce the causal concept most frequently used in SRAs – the propensity score, defined as the “conditional probability of assignment to a particular treatment given a vector of observed covariates”. A popular propensity-based approach is Inverse Probability of Treatment Weighting (IPTW). To synthesize a randomized controlled trial, it uses covariate balancing to weight instances based on their propensity scores, i.e., the probability of an instance receiving the treatment. Let $t_i$ and $x_i$ be the treatment assignment and covariates of instance $i$. The weight $w_i$ is typically computed by the following formula:

\[ w_i=\frac{t_i}{P(t_i|x_i)}+\frac{1-t_i}{1-P(t_i|x_i)}, \]

where $P(t_i|x_i)$ quantifies the propensity score. The weighted average of the observed outcomes for the treatment and control groups is then defined as

\[ \hat{\tau}=\frac{1}{n_1}\sum_{i:t_i=1}w_iy_i-\frac{1}{n_0}\sum_{i:t_i=0}w_iy_i, \]

where $n_1$ and $n_0$ denote the sizes of the treated and control groups.
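The estimator above can be sketched in a few lines over a toy dataset; the propensity scores here are assumed known rather than estimated, and all numbers are hypothetical.

```python
def iptw_ate(data):
    """data: list of (t, y, p) with treatment t in {0, 1}, outcome y,
    and propensity score p = P(t=1 | x). Returns the weighted estimate
    per the formulas above: w_i = t/p + (1-t)/(1-p)."""
    treated = [(y, 1.0 / p) for t, y, p in data if t == 1]
    control = [(y, 1.0 / (1.0 - p)) for t, y, p in data if t == 0]
    n1, n0 = len(treated), len(control)
    return (sum(w * y for y, w in treated) / n1
            - sum(w * y for y, w in control) / n0)

# Hypothetical observations: (treatment, outcome, propensity score)
data = [(1, 3.0, 0.5), (1, 2.0, 0.8), (0, 1.0, 0.5), (0, 1.5, 0.2)]
ate = iptw_ate(data)
```

Instances that received a treatment they were unlikely to get are up-weighted, which is how the weighting mimics the covariate balance of a randomized trial.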

Causal Learning for SRAs. Firstly, it is becoming increasingly popular to use causal models to solve fairness-related issues; the subject of causality and its importance for addressing fairness has been discussed in prior work. Causal models can also be used to discover and eliminate discrimination to make decisions that are irrespective of sensitive attributes, at the individual, group, and system level. Secondly, bias alleviation is another field where causal learning methods are frequently discussed and affect many machine learning applications at large. The emerging research on debiasing recommender systems can serve as one example. Due to the biased nature of user behavior data, recommender systems inevitably involve various discrimination-related issues: recommending fewer career coaching services and high-paying jobs to women, recommending more male-authored books, and making minorities less likely to become social influencers. Gender and ethnic biases have even been found in a broader context, e.g., word embeddings trained on 100 years of text data. Causal approaches aim to mitigate such bias in word embedding relations.

Thirdly, causal learning methods also have had discernible achievements in transparency, especially the interpretability of black-box algorithms. Causality is particularly desired since these algorithms only capture correlations, not real causes. Further, it has been suggested that counterfactual explanations are the highest level of interpretability. For model-based interpretations, causal interpretability aims to explain the causal effect of a model component on the final decision. One example that differentiates it from traditional interpretability is that only causal interpretability can answer questions such as “What is the effect of the $n$-th filter of the $m$-th layer of a neural network on the prediction of the model?”. Counterfactual explanations are a type of example-based explanation, in which we look for data instances that can explain the underlying data distributions. Counterfactual explanations are human-friendly; however, it is possible to have different true versions of explanations for the predicted results, i.e., the Rashomon effect, and several studies have been proposed to address this issue. For a detailed discussion of causal interpretability, please refer to existing surveys. Lastly, causal learning is inherently related to the robustness or adaptability of AI systems, which have been noted to lack the capability of reacting to new circumstances they are not trained for. Causal relationships, however, are expected to be invariant and robust across environments. This complements intensive earlier efforts toward “transfer learning”, “domain adaptation”, and “lifelong learning”. Some current work seeking to extrapolate the relationship between AI robustness and causality includes the independent causal mechanism principle, invariant prediction, and disentangled causal mechanisms.

Uncertainty Quantification

AI research continues to develop new state-of-the-art algorithms with superior performance and large-scale datasets with high quality. Yet even with the best models and training data, it is still infeasible for AI systems to cover all the potential situations they face when deployed in real-world applications; in fact, AI systems constantly encounter new samples that differ from those used for training. The core question is how to leverage the strengths of these uncertainties. Recent research has advocated measuring, communicating, and using uncertainty as a form of transparency, and tools such as IBM’s Uncertainty Quantification 360 provide AI practitioners with access to related resources as common practices for AI transparency. Consequently, uncertainty quantification plays a crucial role in the optimization and decision-making processes of SRAs. There are typically two kinds of uncertainty in risk analysis processes. First, aleatory uncertainty describes the inherent randomness of systems; for example, an AI system can present different results even with the same set of inputs. This uncertainty arises from underlying random variations within the data. Second, epistemic uncertainty represents the effect of an unknown phenomenon or an internal parameter; the primary cause of this type of uncertainty is the lack of observed data. As the variation among the data in aleatory uncertainty is often observable, we can quantify the uncertainty and assess the risks well. Quantifying epistemic uncertainty is more challenging because AI systems are forced to extrapolate over unseen situations. In the uncertainty quantification literature, one of the most widely recognized techniques is prediction intervals (PIs). For neural-network-based models, PI construction can be categorized into multi-step methods (e.g., the Bayesian method) and direct methods (e.g., lower upper bound estimation). Here, we briefly discuss several methods in each category; please refer to the survey for more details.

Multi-Step Prediction Intervals Construction Methods. The Delta method, Bayesian method, Mean-Variance Estimation method, and Bootstrap method are the four conventional multi-step methods reported in the literature. The Delta method constructs PIs through nonlinear regression using Taylor series expansion. In particular, we linearize neural network models through optimization by minimizing an error-based loss function, the sum of squared errors. Under the assumption that uncertainty comes from a normal and homogeneous distribution, we then employ standard asymptotic theory to construct PIs; the Delta method has been used in numerous case studies. Bayesian learning provides a natural framework for constructing PIs, as it optimizes the posterior distribution of parameters from the assumed prior distribution. Despite its high generalization power, Bayesian techniques are limited by large computational complexity due to the calculation of the Hessian matrix. The Bootstrap method is the most popular among the four conventional multi-step PI construction methods; it includes smooth, parametric, wild, pairs, residual, Gaussian process, and other types of bootstrap techniques. In the NN-based pairs bootstrap algorithm, for example, the key is to generate bootstrapped pairs by uniform sampling with replacement from the original training data; an estimate is then made for each bootstrapped dataset.
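The pairs bootstrap idea can be sketched with a simple linear model standing in for a neural network: resample (x, y) pairs with replacement, refit on each bootstrapped dataset, and take percentiles of the resulting predictions at a query point. The synthetic data and interval here capture model (epistemic) uncertainty only, not the observation noise a full PI would add.

```python
import random

def fit_line(pairs):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(1)
# Synthetic data: y = 2x + 1 plus Gaussian noise
pairs = [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.5))
         for x in [i / 10 for i in range(50)]]

x_query, B = 2.0, 500
preds = []
for _ in range(B):
    # Resample pairs uniformly with replacement, refit, predict at x_query
    boot = [pairs[rng.randrange(len(pairs))] for _ in range(len(pairs))]
    a, slope = fit_line(boot)
    preds.append(a + slope * x_query)

preds.sort()
lo, hi = preds[int(0.025 * B)], preds[int(0.975 * B)]  # central 95% interval
```

Replacing `fit_line` with a neural network trained on each bootstrapped dataset gives the NN-based pairs bootstrap described above, at the cost of one training run per resample.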

Figure cando: Illustration of what Socially Responsible AI Algorithms (SRAs) can do. It requires philanthropic responsibilities and ethical responsibilities.

Direct Prediction Intervals Construction Methods. This category of methods can tackle some of the limitations of the previous methods, such as high demands on computational power and stringent assumptions. When NN models are constructed through direct training without any assumptions, they can provide more adaptive and smarter PIs for any distribution of targets. The Lower Upper Bound Estimation method is such a technique: it can be applied to an arbitrary distribution of targets, with computation time reduced by more than an order of magnitude, and it directly calculates the lower and upper bounds through trained NNs. Initially, Lower Upper Bound Estimation NNs were optimized with the coverage width-based criterion, which presents several limitations. With all the benefits of the original Lower Upper Bound Estimation method, the NN-based Direct Interval Forecasting method has a much shorter computation time and narrower PIs, credited to an improved cost function and a reduced average coverage error. Other approaches for improving the cost function of Lower Upper Bound Estimation include the normalized root-mean-square width with particle swarm optimization, independent width and penalty factors, deviation from mid-interval consideration, and the deviation information-based criterion.
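The cost functions above trade off two standard PI quality measures: coverage probability (PICP, the fraction of targets that fall inside their interval) and normalized average width (PINAW). A minimal sketch, with made-up targets and interval bounds:

```python
def picp(y, lower, upper):
    """PI coverage probability: fraction of targets inside their interval."""
    return sum(l <= t <= u for t, l, u in zip(y, lower, upper)) / len(y)

def pinaw(y, lower, upper):
    """PI normalized average width: mean interval width over the target range."""
    target_range = max(y) - min(y)
    return sum(u - l for l, u in zip(lower, upper)) / (len(y) * target_range)

# Hypothetical targets and interval bounds (the last interval misses its target)
y     = [1.0, 2.0, 3.0, 4.0]
lower = [0.5, 1.6, 2.4, 4.2]
upper = [1.5, 2.5, 3.5, 5.0]

coverage = picp(y, lower, upper)   # 3 of 4 targets covered -> 0.75
width    = pinaw(y, lower, upper)  # average width / target range
```

A good interval constructor pushes coverage toward the nominal level (e.g., 95%) while keeping the normalized width small; optimizing either measure alone is trivial, which is why combined criteria are used.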

Roles of SRAs

So far, we have introduced the essentials of SRAs to achieve the expected ethical responsibilities. But pragmatic questions regarding their intended use remain: How do we operationalize SRAs? What can SRAs eventually do for societal well-being and to address societal challenges? Both ethical and philanthropic responsibilities are indispensable ingredients of the answers. While the ultimate goal of SRAs is to do good and be a good AI citizen, their ethical responsibilities should be ensured first. When AI fails to fulfill its ethical responsibilities, its philanthropic benefits can be insignificant. For instance, despite the immense public good of COVID-19 vaccines, there has been great controversy about algorithms for their distribution, which have been shown to be inequitable. Some argue that distribution algorithms should prioritize saving more lives and bringing the economy back more rapidly; they support such an `unfair' allocation, but we would argue that this is not unfairness, but simply a difference of values and ethics. In our view, the roles of SRAs are expected to encompass both ethical and philanthropic responsibilities. In this survey, we describe three dimensions in which SRAs can help improve the quality of human life, as illustrated in Figure cando: Protect (e.g., protect users' personal information), Inform (e.g., fake news early detection), and Prevent/Mitigate (e.g., cyberbullying mitigation). We illustrate each dimension with research findings on several emerging societal issues. In particular, for the protecting dimension, we focus on privacy preservation and data dignity; for the informing and preventing/mitigating dimensions, we discuss three societal issues that have raised growing concerns recently: disinformation, abusive language, and unwanted bias. Because abusive language takes many forms, such as hate speech and profanity, and the body of work on each form is vast and diverse, spanning multiple interconnected disciplines, this survey uses cyberbullying as a representative form for illustration.


Protect

The protecting dimension aims to shield humans (especially the most vulnerable or at-risk) from the harm, injury, and negative impact of AI systems. This can be the protection of users' personal data and their interactions with AI systems. Two typical examples are privacy preservation and data dignity.


Privacy Preserving

The capability of deep learning models has been greatly improved by emerging powerful infrastructures such as clouds and collaborative learning for model training. The fuel of this power, however, comes from data, particularly sensitive data. This has raised growing privacy concerns, such as the illegitimate use of private data and the disclosure of sensitive data. Existing threats against privacy typically come from attacks such as the adversarial examples discussed in the section on adversarial machine learning. Specifically, there is direct information exposure (e.g., untrusted clouds), caused by intentional or unintentional data breaches, and indirect (inferred) information exposure (e.g., parameter inference), caused by direct access to the model or its output. Existing privacy-preserving mechanisms can be classified into three categories: private data aggregation methods, private training, and private inference.

Data aggregation methods are either context-free or context-aware. A context-free approach, such as differential privacy, is unaware of the context or what the data will be used for. A context-aware approach, such as information-theoretic privacy, on the other hand, is aware of the context in which the data will be used. A naïve technique for privacy protection is to remove identifiers from data, such as name, address, and zip code; it has been used for protecting patients' information while processing their medical records, but the results are unsatisfying. The k-anonymity method can prevent re-identification by ensuring that at least $k$ samples share the exact same set of attributes for any given combination of attributes that the adversary has access to. The most commonly used data aggregation method is differential privacy, which aims to estimate the effect of removing an individual from the dataset and to keep the effect of the inclusion of one's data small. Some notable work includes the Laplace mechanism, differential privacy with Advanced Composition, and local differential privacy.
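As a concrete sketch of the Laplace mechanism mentioned above: a count query has sensitivity 1 (adding or removing one individual changes it by at most 1), so adding Laplace noise with scale $1/\epsilon$ yields $\epsilon$-differential privacy. The dataset and $\epsilon$ below are illustrative.

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) via inverse transform sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))

def private_count(records, predicate, epsilon, rng):
    """Release a count with epsilon-DP; count queries have sensitivity 1,
    so the noise scale is sensitivity / epsilon = 1 / epsilon."""
    true_count = sum(predicate(r) for r in records)
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(7)
ages = [23, 35, 41, 29, 52, 61, 38, 27]   # hypothetical sensitive records
noisy = private_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng)
# noisy is close to the true count (3) but randomized
```

Smaller $\epsilon$ means larger noise and stronger privacy; the whole point is that the released value no longer reveals with certainty whether any single individual is in the dataset.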

Information-theoretic privacy is a context-aware approach that explicitly models the dataset statistics; by contrast, context-free methods assume worst-case dataset statistics and adversaries. This line of research was studied by diaz2019robustness, pinceti2019data, and varodayan2011smart. The second type of privacy-preserving mechanism works during the training phase. Established work in private training is mostly used to guarantee differential privacy or semantic security and encryption; the two most common methods for encryption are homomorphic encryption and secure multi-party computation. The third type of privacy-preserving mechanism works during the inference phase and targets trained systems deployed to offer inference-as-a-service. Most methods in private inference are similar to those in private training, except for information-theoretic privacy, which is typically used to offer information-theoretic mathematical or empirical evidence of how these methods improve privacy. There is also work using differential privacy, homomorphic encryption, and secure multi-party computation.

Data Dignity

Beyond privacy preservation, what is more urgent to accomplish is data dignity. It allows users to have absolute control over how their data is used and to be paid accordingly. Data dignity encompasses the following aspects:

  • To help users objectively determine the benefits and risks associated with their digital presence and personal data.
  • To let users control how their data will be used and the purpose of using the data.
  • To allow users to negotiate the terms of using their data.
  • To give users complete right and autonomy to be found, analyzed, or forgotten, apart from the fundamental right over their data. There are business models, such as the Microsoft Data Bank, designed to give users control of their data, and ideas shared by the Art of Research about how people can buy and sell their personal data.


Inform

The informing dimension aims to deliver the facts or information to users, particularly potential negative results, in a timely way. We illustrate it with a focus on detecting disinformation, cyberbullying, and bias.

Disinformation Detection

Disinformation is false information that is deliberately created and spread to deceive people, a social group, an organization, or a country. The online information ecosystem is never short of disinformation and misinformation, and growing concerns have been raised recently. Tackling disinformation is rather challenging, mainly because (1) disinformation exists in almost all domains; (2) it is ever-changing, with new problems, challenges, and threats emerging every day; and (3) it entails the joint efforts of interdisciplinary research – computer science, social science, politics, policy making, psychology, and cognitive science. Accurate and efficient identification of disinformation is at the core of combating it. Existing prominent approaches for disinformation detection primarily rely on news content, social context, user comments, fact-checking tools, and explainable and cross-domain detection.

Early work on disinformation detection focused on hand-crafted features extracted from text, such as lexical and syntactic features. Apart from text, online platforms also provide abundant social information that can be leveraged to enrich the textual features, e.g., the number of re-tweets and likes on Twitter. Informed by theories in social science and network science, another line of work exploits social network information to improve detection performance. Common features are social context, user profile, user engagement, and relationships among news articles, readers, and publishers. A unique function of online platforms is that they allow users to interact through comments. Recent work has shown that user comments can provide a weak supervision signal for identifying the authenticity of news articles, which enables early detection of disinformation. When user comments are unavailable, it is possible to learn users' responses to news articles and then generate user responses. Fact-checking can be performed manually or automatically: manual fact-checking relies on domain experts or crowdsourced knowledge from users, while automatic fact-checking uses structured knowledge bases, such as knowledge graphs, to verify the authenticity of news articles. Beyond within-domain detection, other tasks such as cross-domain detection, explanation, and causal understanding of fake news dissemination have also been discussed in the literature.
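Early content-based detection can be illustrated with a tiny bag-of-words Naive Bayes classifier over lexical features. The miniature corpus below is fabricated and far too small for real use; it only shows the mechanics of text-based classification.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label). Multinomial NB with add-one smoothing."""
    word_counts = {0: Counter(), 1: Counter()}
    label_counts = Counter()
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict_nb(model, tokens):
    word_counts, label_counts, vocab = model
    n = sum(label_counts.values())
    best, best_lp = None, -math.inf
    for label in label_counts:
        total = sum(word_counts[label].values())
        lp = math.log(label_counts[label] / n)  # log prior
        for w in tokens:                        # smoothed log likelihoods
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Fabricated miniature corpus: 1 = disinformation, 0 = legitimate
docs = [
    ("shocking secret cure miracle".split(), 1),
    ("you wont believe this shocking hoax".split(), 1),
    ("miracle cure doctors hate".split(), 1),
    ("city council approves transit budget".split(), 0),
    ("study published in peer reviewed journal".split(), 0),
    ("council budget vote scheduled".split(), 0),
]
model = train_nb(docs)
label = predict_nb(model, "shocking miracle cure revealed".split())
```

Real systems replace these raw token counts with richer lexical, syntactic, and social-context features, but the supervised-classification skeleton is the same.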

Cyberbullying Detection

Cyberbullying differs from other forms of abusive language in that it is not a one-off incident but “aggressively intentional acts carried out by a group or an individual using electronic forms of contact, repeatedly or over time against victims who cannot easily defend themselves”. The increasing number of reported cyberbullying cases on social media and the resulting detrimental impact have raised great concerns in society. Cyberbullying detection is regularly formulated as a binary classification problem. While it shares some similarities with document classification, it should be noted that cyberbullying identification is inherently more complicated than simply identifying oppressive content.

Distinct characteristics of cyberbullying, such as power imbalance and repetition of aggressive acts, are central to marking a message or a social media session as cyberbullying. Several major challenges in cyberbullying detection have been discussed in the literature, such as the formulation of the unique bullying characteristics (e.g., repetition), data annotation, and severe class imbalance. Depending on the employed features, established work can be classified into four categories: content-based, sentiment-based, user-based, and network-based methods. Features extracted from social media content are lexical items such as keywords, Bag of Words, pronouns, and punctuation. Empirical evaluations have shown that textual features are the most informative predictors for cyberbullying detection. For instance, using the number of offensive terms as a content feature is effective in detecting offensive and cursing behavior, and computing content similarity between tweets from different users can help capture users' personality traits and peer influence, two important factors in cyberbullying occurrences. Sentiment-based features typically include keywords, phrases, and emojis, and they are often combined with content-based features; a notable work identified seven types of emotions in tweets, such as anger, empathy, and fear. User-based features are typical characteristics of users, e.g., personality (e.g., hostility), demographics (e.g., age), and user activity (e.g., active users). Hostility and neuroticism are found to be strongly related to cyberbullying behavior; further, gender and age are indicative of cyberbullying in certain cases. Network-based features measure the sociability of online users, e.g., the number of friends, followers, and network embeddedness. In addition, a number of methods seek to capture the temporal dynamics that characterize the repetition of cyberbullying.

Bias Detection

Compared to the well-defined notions of fairness, bias detection is much less studied, and the solution is not as straightforward as it may seem. The challenges arise from various perspectives. First, the data and algorithms used to make a decision are often not available to policy makers or enforcement agents. Second, algorithms are becoming increasingly complex, and their lack of interpretability limits an investigator's ability to identify systematic discrimination through analysis of the algorithms. Rather, investigators have to examine the output of algorithms to check for anomalous results, increasing the difficulty and uncertainty of the task.

Exploratory data analysis is a simple but effective tool for detecting data bias. In this initial step of data analysis, we can use basic data statistics and visual exploration to understand what is in a dataset and the characteristics of the data. For algorithmic bias, one of the earliest methods is to compare the selection rates of different groups; discrimination is highly possible if the selection rate for one group is sufficiently lower than that for other groups. For example, the US Equal Employment Opportunity Commission (EEOC) advocates the “four-fifths rule” or “80% rule” to identify a disparate impact. Suppose $Y$ denotes a binary decision (e.g., hire or not) and $A$ a protected attribute (e.g., gender); a dataset presents disparate impact if

\[ \frac{Pr(Y=1|A=0)}{Pr(Y=1|A=1)} \leq \tau =0.8. \]

However, statistical disparity does not necessarily indicate discrimination. If one group has disproportionately more qualified members, we may expect the differences between groups in the results.
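The four-fifths check above can be computed directly from decision records; the hiring data below is hypothetical.

```python
def disparate_impact_ratio(records):
    """records: list of (a, y) with protected attribute a and decision y,
    both in {0, 1}. Returns P(Y=1 | A=0) / P(Y=1 | A=1)."""
    group0 = [y for a, y in records if a == 0]
    group1 = [y for a, y in records if a == 1]
    rate0 = sum(group0) / len(group0)
    rate1 = sum(group1) / len(group1)
    return rate0 / rate1

# Hypothetical hiring decisions: (protected attribute, hired?)
records = [(0, 1)] * 3 + [(0, 0)] * 7 + [(1, 1)] * 6 + [(1, 0)] * 4
ratio = disparate_impact_ratio(records)   # 0.3 / 0.6 = 0.5
flagged = ratio <= 0.8                    # four-fifths rule triggers
```

As the text cautions, a flagged ratio is only a screening signal: it does not by itself establish discrimination if the groups genuinely differ in qualification rates.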

A more frequently used approach is regression analysis, which is performed to examine the likelihood of favorable (or adverse) decisions across groups based on sensitive attributes. A significant, non-zero coefficient of the sensitive attributes in a correctly specified regression signals the presence of discrimination. However, we cannot guarantee that we observe all the factors the decision maker considers. Therefore, instead of using the rate at which decisions are made (e.g., loan approval rates), bias detection can be based on the success rate of the decisions (e.g., the payback rate of the approved applicants), i.e., the outcome test. Another, less popular, statistical approach for bias detection is benchmarking. The major challenge of benchmarking analysis is identifying the distribution of the sensitive attributes of a benchmark population where sensitive attributes are unlikely to influence the identification of being at-risk; some solutions have been proposed in the literature. Recently, AI researchers have developed tools to automatically detect bias. For instance, drawing on techniques in natural language processing and moral foundation theories, the tool by mokhberian2020moral can understand the structure and nuances of content consistently showing up on left-leaning and right-leaning news sites, aiming to help consumers better prepare for unfamiliar news sources. In earlier efforts, an international research group launched the non-profit organization Project Implicit in 1998, aimed at detecting implicit social bias.


Prevent/Mitigate

If both of the first two dimensions fail, we may rely on the last dimension to prevent or mitigate the negative impact of socially indifferent AI algorithms on end-users. We continue the discussions of disinformation, cyberbullying, and bias, with a focus on prevention and mitigation strategies.

Disinformation Prevention/Mitigation

Preventing the generation and spread of disinformation and mitigating its negative impact is an urgent task because disinformation typically spreads faster than normal information due to catchy news content and the ranking algorithms operating behind online news platforms. To increase user engagement, social recommender systems are designed to recommend popular posts and trending content; therefore, disinformation often gains more visibility. An effective approach for disinformation mitigation is to govern this visibility of news, e.g., through recommendation- and ranking-based algorithms. Mitigation also relates to early detection.

Network intervention can slow the spread of disinformation by influencing the exposed users in a social network. For example, we can launch a counter-cascade consisting of fact-checked versions of false news articles. This is commonly referred to as the influence limitation or minimization problem: given a network in which a counter-cascade is accessible, the goal is to find a (minimum) set of nodes such that the effect of the original cascade is minimized. A variety of approximation algorithms have been proposed to solve this NP-hard problem and its variants. When applied to disinformation mitigation, they seek to inoculate as many nodes as possible within a short period of time. The two-cascade setting can be extended to tasks with multiple cascades, where we can further consider the different priorities of these cascades, i.e., each cascade influences the nodes in the network differently.

The second method for disinformation mitigation is content flagging: social media platforms allow users to "flag" or "report" a piece of news content if they find it offensive, harmful, and/or false. Big social media companies such as Facebook hire professional moderators to manually investigate and/or remove such content. However, considering the millions of news items generated and spread every minute, it is impractical for these moderators to review all of them manually. One solution turns to the wisdom of the crowd: users can choose to "flag" content that violates the community guidelines of the platform, and some platforms further provide feedback to these users on whether their fact-check was correct.

User behavior is an effective predictor for disinformation detection; therefore, the third prevention method leverages differences among user behaviors to identify susceptible or gullible users. For example, it has been shown that groups of vulnerable Twitter users can be identified in fake news consumption. Other studies also suggest that older people are more likely to spread disinformation.
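The influence-limitation idea above can be sketched with a toy model. Assuming a simplified deterministic diffusion in which each node adopts whichever cascade reaches it first (ties favoring the corrective cascade), a greedy routine picks counter-cascade seeds by marginal gain; real formulations use probabilistic diffusion models, but the greedy structure is the same.

```python
from collections import deque

def bfs_dist(graph, sources):
    """Multi-source BFS distances; unreachable nodes get infinity."""
    dist = {v: float("inf") for v in graph}
    q = deque(sources)
    for s in sources:
        dist[s] = 0
    while q:
        u = q.popleft()
        for v in graph[u]:
            if dist[v] == float("inf"):
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def saved(graph, bad_seeds, good_seeds):
    """Nodes the counter-cascade reaches no later than the disinformation."""
    if not good_seeds:
        return set()
    d_bad = bfs_dist(graph, bad_seeds)
    d_good = bfs_dist(graph, good_seeds)
    return {v for v in graph if d_good[v] <= d_bad[v]}

def greedy_counter_seeds(graph, bad_seeds, k):
    """Greedily pick counter-cascade seeds with maximum marginal gain."""
    chosen = []
    for _ in range(k):
        base = len(saved(graph, bad_seeds, chosen))
        best, best_gain = None, -1
        for v in graph:
            if v in chosen or v in bad_seeds:
                continue
            gain = len(saved(graph, bad_seeds, chosen + [v])) - base
            if gain > best_gain:
                best, best_gain = v, gain
        chosen.append(best)
    return chosen

# A path network 0-1-2-3-4 where the disinformation starts at node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
seeds = greedy_counter_seeds(path, bad_seeds=[0], k=1)
```

On this path, seeding the counter-cascade next to the disinformation source protects every downstream node; greedy selection of this kind underlies the (1 - 1/e)-approximation guarantees for submodular influence objectives.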

Cyberbullying Prevention/Mitigation

In contrast to the large amount of work on cyberbullying detection, efforts toward its prevention and mitigation have been few. Some research suggests that prevention/mitigation strategies be defined at different levels. At the technological level, we can provide parental control services, firewall blocking services, online service rules, text-message controls, and mobile parental control apps, e.g., KnowBullying and BullyBlocker. Another effective tool is the psychological approach, such as talking and listening to cyber-victims, providing counseling services, and encouraging victims to form new relationships and join social clubs. At the educational level, we are responsible for educating end-users and helping improve their technical and cognitive skills. At the administrative level, it is important for organizations and governments to develop policies that regulate the use of free services and enhance the workplace environment. The goal of cyberbullying prevention/mitigation can therefore only be accomplished through interdisciplinary collaboration, e.g., among psychology, public health, computer science, and other behavioral and social sciences. One example is that computer and social scientists have attempted to understand user behavior in realistic environments by designing social media sites for experimentation, with controlled studies and post-study surveys.

Existing solutions for preventing cyberbullying can report/control/warn about message content (e.g., dinakar2012common, vishwamitra2017mcdefender), provide support for victims (e.g., vishwamitra2017mcdefender), and educate both victims and bullies (e.g., dinakar2012common). A variety of anti-bullying apps are also available to promote users' well-being. For example, the NoMoreBullyingMe app provides online meditation techniques to support victims; the "Honestly" app encourages users to share positive responses with each other (e.g., sing a song). However, current cyberbullying prevention strategies often do not work as desired because of the complexity and nuance with which adolescents bully others online.

Bias Mitigation

Prior approaches to bias mitigation focus either on designing fair machine learning algorithms or on theorizing about the social and ethical aspects of machine learning discrimination. From the technical perspective, approaches to fairness can typically be categorized into pre-processing (prior to modeling), in-processing (at the point of modeling), and post-processing (after modeling). One condition for using pre-processing approaches is that the algorithm is allowed to modify the training data; we can then transform the data to remove the discrimination. In-processing approaches eliminate bias by modifying algorithms during the training process: we can either incorporate a fairness notion into the objective function or impose a fairness constraint. When neither the training data nor the model can be modified, we can use post-processing approaches to reassign the predicted labels based on a defined function and a holdout set that was not used in the model training phase. Most of these approaches are built on the notion of protected or sensitive variables that define the (un)privileged groups. Commonly used protected variables are age, gender, marital status, race, and disability. A shared characteristic of unprivileged groups is that they are disproportionately less likely to be positively classified. Fairness measures are important for quantifying fairness in the development of these approaches; however, creating generalized notions of fairness quantification is a challenging task. Depending on the protected target, fairness metrics are usually designed for individual fairness (e.g., everyone is treated equally), group fairness (e.g., different groups, such as women vs. men, are treated equally), or subgroup fairness. Drawing on theories in causal inference, individual fairness also includes counterfactual fairness, which holds that a decision is fair toward an individual if it would have been the same had the individual had a different value of the sensitive attribute.
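As one concrete pre-processing example, the reweighing idea (assigning each (group, label) combination a weight so that the sensitive attribute and the label become statistically independent in the weighted data) can be sketched as follows; the sample data are hypothetical.

```python
from collections import Counter

def reweigh(samples):
    """Reweighing pre-processing: instance weights w(s, y) = P(s)P(y)/P(s, y)
    that make the sensitive attribute s independent of the label y.

    samples: list of (s, y) pairs; returns a {(s, y): weight} mapping.
    """
    n = len(samples)
    p_s = Counter(s for s, _ in samples)
    p_y = Counter(y for _, y in samples)
    p_sy = Counter(samples)
    return {
        (s, y): (p_s[s] / n) * (p_y[y] / n) / (p_sy[(s, y)] / n)
        for (s, y) in p_sy
    }

# Hypothetical biased data: group "a" is mostly labeled 1, group "b" mostly 0.
samples = [("a", 1)] * 3 + [("a", 0)] + [("b", 1)] + [("b", 0)] * 3
weights = reweigh(samples)
```

After reweighing, the weighted positive-label rate is identical across the two groups, so a learner trained on the weighted data no longer sees group membership as predictive of the label.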

Recent years have witnessed immense progress in fair machine learning: a variety of methods have been proposed to address bias and discrimination across different applications. We focus on two mainstream tasks, fair classification and fair regression, and refer readers to existing surveys for a comprehensive review of machine learning fairness.

(1) Fair Classification. For a (binary) classifier with sensitive variable $S$, target variable $Y$, and classification score $R$, general fairness desiderata comprise three "non-discrimination" criteria: Independence, i.e., $R \perp S$; Separation, i.e., $R \perp S \mid Y$; and Sufficiency, i.e., $Y \perp S \mid R$. Fair machine learning algorithms need to adopt or create specific fairness definitions that fit the context. Common methods in fair classification include blinding, causal methods, transformation, sampling and subgroup analysis, adversarial learning, reweighing, and regularization and constraint optimization.
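The Independence and Separation criteria above can be measured directly from predictions. A minimal sketch on hypothetical (prediction, group, label) triples:

```python
def _rate(vals):
    """Fraction of positives in a list of binary predictions."""
    return sum(vals) / len(vals) if vals else 0.0

def independence_gap(preds):
    """|P(R=1 | S=0) - P(R=1 | S=1)|: violation of Independence
    (demographic parity). preds: list of (r, s, y) triples with binary
    prediction r, sensitive attribute s, and true label y."""
    return abs(_rate([r for r, s, _ in preds if s == 0])
               - _rate([r for r, s, _ in preds if s == 1]))

def separation_gap(preds):
    """Max over y of |P(R=1 | S=0, Y=y) - P(R=1 | S=1, Y=y)|:
    violation of Separation (equalized odds)."""
    return max(
        abs(_rate([r for r, s, yy in preds if s == 0 and yy == y])
            - _rate([r for r, s, yy in preds if s == 1 and yy == y]))
        for y in (0, 1)
    )

# Hypothetical model outputs: (prediction, group, label)
preds = [(1, 0, 1), (1, 0, 0), (0, 0, 0),
         (1, 1, 1), (0, 1, 0), (0, 1, 0)]
```

On this toy data the classifier satisfies neither criterion exactly: the two groups receive positive predictions at different overall rates, and among true negatives group 0 is more often falsely flagged.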

(2) Fair Regression. The goal of fair regression is to jointly minimize the difference between true and predicted values and ensure fairness. It follows the general formulation of fair classification but with a continuous rather than binary/categorical target variable. Accordingly, the fairness definitions, metrics, and basic algorithms are adapted from classification to regression. For example, statistical parity and bounded-group-loss metrics have been suggested for measuring fairness in regression. Bias in linear regression has been characterized as the effect of a sensitive attribute on the target variable, measured through the mean difference between groups and AUC metrics. One commonly used approach in fair regression is regularization.
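A minimal sketch of the regularization approach: one-variable linear regression trained by gradient descent, with a statistical-parity penalty on the gap between group-wise mean predictions. The data, groups, and penalty weight are illustrative only.

```python
def fair_linear_fit(xs, ys, groups, lam=0.0, lr=0.01, steps=3000):
    """1-D linear regression y ~ w*x + b with a statistical-parity penalty:
    loss = MSE + lam * (mean prediction gap between groups)^2.
    groups[i] in {0, 1} is the (hypothetical) sensitive attribute."""
    n = len(xs)
    idx0 = [i for i in range(n) if groups[i] == 0]
    idx1 = [i for i in range(n) if groups[i] == 1]
    w, b = 0.0, 0.0
    for _ in range(steps):
        pred = [w * x + b for x in xs]
        gw = sum(2 * (pred[i] - ys[i]) * xs[i] for i in range(n)) / n
        gb = sum(2 * (pred[i] - ys[i]) for i in range(n)) / n
        gap = (sum(pred[i] for i in idx0) / len(idx0)
               - sum(pred[i] for i in idx1) / len(idx1))
        # d(gap)/dw is the difference of group-mean x's; b cancels out.
        dgap_dw = (sum(xs[i] for i in idx0) / len(idx0)
                   - sum(xs[i] for i in idx1) / len(idx1))
        gw += lam * 2 * gap * dgap_dw
        w -= lr * gw
        b -= lr * gb
    return w, b

xs, ys = [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]
groups = [0, 0, 1, 1]                    # group correlated with x
w_plain, b_plain = fair_linear_fit(xs, ys, groups, lam=0.0)
w_fair, b_fair = fair_linear_fit(xs, ys, groups, lam=10.0)
```

The unregularized fit recovers the true slope but predicts very different means for the two groups; raising the penalty weight shrinks that gap at the cost of accuracy, making the fairness-performance trade-off explicit.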

Apart from fair machine learning, algorithm operators are encouraged to share enough detail about how research is carried out to allow others to replicate it. This is a leap toward mitigating bias, as it helps end-users with different technical backgrounds understand how the algorithm works before making any decision. It has also been suggested that AI technologists and researchers develop a bias impact statement as a self-regulatory practice, which can help probe and avert potential biases injected into, or resulting from, algorithmic decisions. Example questions in such a statement are "What will the automated decision do?", "How will potential bias be detected?", and "What are the operator incentives?". In algorithm design, researchers are also responsible for encouraging diversity within the team, the training data, and the level of cultural sensitivity. The "diversity-in-design" mechanism aims to take deliberate and transparent actions to address cultural biases and stereotypes upfront. Furthermore, we might also consider updating nondiscrimination and other civil rights laws to interpret and redress online disparate impacts. An example of such consideration is to unambiguously define the thresholds and parameters for the disparate treatment of protected groups before algorithm design.

Open Problems and Challenges

This survey reveals that the current understanding of SRAs is insufficient and that future efforts are greatly needed. Here, we describe several primary challenges, as summarized in the accompanying figure, in an attempt to broaden the discussion of future directions and potential solutions.

Figure: Primary challenges and open problems we confront in developing SRAs. Some challenges relate to SRAs' internal mechanisms that fulfill AI's ethical responsibilities, while others relate to SRAs' roles, to which both ethical and philanthropic responsibilities are key.


Causal Learning. The correlation fallacy causes AI algorithms to meet fundamental obstacles in committing to social responsibility: robustness, explainability, and cause-effect connections. The era of big data has changed the ways of learning causality; meanwhile, causal learning has become an indispensable ingredient for AI systems to achieve human-level intelligence. There are a number of benefits to incorporating causality into the next generation of AI. For example, teaching AI algorithms to understand "why" can help them transfer their knowledge to different but similar domains. Early efforts in SRAs employed causal learning concepts and methods such as intervention, counterfactuals, do-calculus, and propensity scoring to address fairness (e.g., counterfactual fairness) and interpretability (causal interpretability) issues, with promising results in these tasks.
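As a small illustration of one such tool, inverse-propensity weighting estimates a treatment effect when treatment assignment depends on a covariate; the data below are constructed so that the true effect is +1 for every unit, while a naive comparison of group means is confounded.

```python
from collections import defaultdict

def ipw_effect(data):
    """Inverse-propensity-weighted effect estimate.

    data: list of (x, t, y) with covariate x, binary treatment t, outcome y.
    The propensity P(T=1 | X=x) is estimated by the frequency of treatment
    within each x stratum.
    """
    n_x, n_tx = defaultdict(int), defaultdict(int)
    for x, t, _ in data:
        n_x[x] += 1
        n_tx[x] += t
    prop = {x: n_tx[x] / n_x[x] for x in n_x}
    n = len(data)
    treated = sum(y / prop[x] for x, t, y in data if t == 1) / n
    control = sum(y / (1 - prop[x]) for x, t, y in data if t == 0) / n
    return treated - control

# Constructed data: outcome = 2*x + t, so the true effect of t is +1,
# but treatment is far more likely when x = 1 (confounding).
data = [(0, 0, 0)] * 3 + [(0, 1, 1)] + [(1, 0, 2)] + [(1, 1, 3)] * 3
effect = ipw_effect(data)
```

Here the naive difference of treated and control means is 2.0, double the true effect; reweighting by the propensity recovers the correct value of 1.0.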

Context Matters. Context is core to SRAs due to their inherently elaborate nature, e.g., the "Transparency Paradox". Understanding and quantifying the relationships among the various principles (some of which are trade-offs and some are not), e.g., fairness, transparency, and safety, has to be placed in a specific context. One such context is the social context. Existing SRAs (e.g., fair machine learning), once introduced into a new social context, may render current technical interventions ineffective, inaccurate, and even dangerously misguided. A recent study found that while fair ranking algorithms such as Det-Greedy help increase the exposure of minority candidates, their effectiveness is limited by job contexts in which employers prefer particular genders. How to properly integrate social context into SRAs is still an open problem. Algorithmic context (e.g., supervised learning, unsupervised learning, and reinforcement learning) is also extremely important when designing SRAs for given data. A typical example is the feedback loop problem in predictive policing: a subtle algorithmic choice can have huge ramifications for the results. Consequently, we need to understand the algorithmic context to make the right algorithmic choices when designing socially responsible AI systems. Designing context-aware SRAs is key to achieving Social Responsibility of AI.

Responsible Model Release and Governance. Nontransparent model reporting is one of the main causes of AI's indifferent behaviors. As a critical step toward clarifying the intended use cases of AI systems and the contexts for which they are well suited, responsible model release and governance has been receiving growing attention from both industry and academia. One role of SRAs is to bring together the tools, solutions, practices, and people needed to govern built AI systems across their life cycle. At this early stage, some research suggests that released models be accompanied by documentation detailing various characteristics of the system, e.g., what it does, how it works, and why it matters. For example, AI FactSheets advocate a factsheet, completed and voluntarily released by AI developers, to increase the transparency of their services. A similar concept is model cards: short documents that provide benchmarked evaluations of trained AI models under a variety of conditions, e.g., across different cultural or demographic groups. Typically, a model card should include the model details, intended use, evaluation metrics, training/evaluation data, ethical considerations, and caveats and recommendations. To help increase transparency, manage risk, and build trust in AI, AI technologists and researchers are responsible for addressing the various challenges faced in creating useful AI release documentation and for developing effective AI governance tools.
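A model card with the fields listed above might be captured as a simple structured document; every name and value below is hypothetical.

```python
# A minimal, hypothetical model card following the fields the text lists:
# model details, intended use, evaluation metrics, training/evaluation
# data, ethical considerations, and caveats and recommendations.
model_card = {
    "model_details": {"name": "toxicity-clf-v1", "type": "text classifier"},
    "intended_use": ("Flagging potentially abusive comments for human "
                     "review; not for automated account suspension."),
    "evaluation_metrics": {"accuracy": 0.91, "false_positive_rate": 0.04},
    "training_data": "Public forum comments, 2015-2019 (hypothetical).",
    "evaluation_data": {
        # Benchmarked across demographic groups, as model cards recommend.
        "per_group_accuracy": {"group_a": 0.93, "group_b": 0.88},
    },
    "ethical_considerations": ("Dialect-correlated false positives "
                               "observed during evaluation."),
    "caveats_and_recommendations": ("Re-evaluate before deployment on "
                                    "platforms with different demographics."),
}
```

Keeping the card machine-readable lets release pipelines check it automatically, e.g., refusing release when required fields are missing or when the per-group performance gap exceeds a policy threshold.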

AI Defenses. Developing AI systems that outwit malicious AI is still at an early stage. Since we have not fully understood how AI systems work, they are not only vulnerable to attack but also likely to fail in surprising ways. As a result, it is critical and urgent to design systems that are provably robust, to help ensure that AI systems are not vulnerable to adversaries. An "AI firewall" needs at least two capabilities: one is to probe an AI algorithm for weaknesses (e.g., perturbing the input of an AI system to make it misbehave); the other is to automatically intercept potentially problematic inputs. Some big tech companies have started building their own AI defenses to identify weak spots, e.g., the "red team" at Facebook and the software framework released by Microsoft, Nvidia, IBM, and nine other companies. Building AI defenses exposes the fundamental weaknesses of modern AI and helps make AI systems more robust and intelligent.
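Both firewall capabilities can be illustrated on a toy linear classifier: a one-step gradient-sign perturbation (in the spirit of FGSM) probes for weaknesses, and a naive margin check intercepts suspicious inputs. All numbers are illustrative; real attacks and defenses operate on far more complex models.

```python
def score(w, b, x):
    """Linear decision score; positive means class 1."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def perturb(w, x, eps):
    """Probe capability: move each feature eps in the direction that
    lowers the score (the sign of the gradient for a linear model)."""
    return [xi - eps * (1 if wi > 0 else -1) for wi, xi in zip(w, x)]

def firewall(w, b, x, margin=0.5):
    """Intercept capability: flag inputs too close to the boundary."""
    return abs(score(w, b, x)) < margin

w, b = [1.0, -2.0], 0.0
x = [1.0, 0.2]                   # score 0.6 -> classified positive
x_adv = perturb(w, x, eps=0.25)  # small perturbation flips the label
```

The perturbed input flips the classifier's decision while staying close to the original, and the margin-based firewall intercepts it while letting the clean input through, illustrating why both probing and interception belong in an AI defense.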

AI Ethics Principles and Policies. Current AI principles and policies for ethical practice face at least two common criticisms: (1) they are too vaguely formulated to be helpful in guiding practice; and (2) they are defined primarily by AI researchers and powerful people with mainstream populations in mind. Regarding the first criticism, to help operationalize AI principles in practice and help organizations confront inevitable value trade-offs, it has been suggested to redefine AI principles based on philosophical theories in applied ethics. In particular, this work categorizes published AI principles (e.g., fairness, accountability, and transparency) under three core principles widely used in applied ethics: autonomy, beneficence (avoiding harm and doing good), and justice. The core principles "invoke those values that theories in moral and political philosophy argue to be intrinsically valuable, meaning their value is not derived from something else". Existing AI principles are instrumental principles that "build on concepts whose values are derived from their instrumental effect in protecting and promoting intrinsic values". Operationalizable AI principles help put ethical AI into practice effectively and reduce the responsible AI gap in companies. To address the second criticism, we need to elicit the inputs and values of diverse voices among the Subjects of SRAs, i.e., minority and disadvantaged groups, and incorporate their perspectives into the design of tech policy documents. If we align the values of AI systems through a panel of people (who are compensated for doing so), they too can influence system behavior, not just powerful people or AI researchers.

Understanding Why. Many AI systems are designed and developed without fully understanding why: what do we wish the AI system to do? This is often why such systems fail to represent the goals of their real tasks, a primary source of AI risks. The problem becomes more challenging when the AI system is animated through many lines of code that lack nuance, creating a machine that does not align with our true intentions. As a first step, understanding why clearly defines our social expectations of AI systems and paves the way for more specific questions such as "What is the problem? Who will define it? Who are the right people to include?". Answering why helps us avoid developing socially indifferent AI systems in the first place and also helps us understand the kinds of deception an AI system may learn by itself.

Long-Term Effect. SRAs involve social concepts, such as fairness, that evolve over time along with constant changes in human values and social dynamics. This raises concerns about the commitments SRAs need to fulfill in the long term. For example, despite the variety of existing fairness definitions, once the dimension of time is introduced, the number of fairness definitions may explode; moreover, current fairness criteria may be considered unfair in the future. Fairness criteria are essentially designed to promote long-term well-being; however, even a static fairness notion can fail to protect the target groups when there is a feedback loop in the overall system. How to build AI systems that can commit to long-term responsibility is extremely challenging and rarely studied thus far. Initial results on long-term fairness highlight the importance of measurement and temporal modeling in the evaluation of fairness criteria.
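The feedback-loop concern can be made concrete with a toy simulation: a static approval threshold interacts with a group's repayment ability, so a policy that looks fair in a one-shot evaluation can steadily lock one group out over time. All parameters (scores, threshold, drift sizes) are invented for illustration.

```python
import random

random.seed(0)  # deterministic toy run

def simulate(threshold, repay_prob, steps=20, n=200, start=650):
    """Toy lending feedback loop: approved applicants' scores drift with
    outcomes. If a group's repay_prob is low, approvals lead to defaults,
    scores fall below the static threshold, and the group is locked out --
    long-term harm that a one-shot fairness check misses.
    """
    scores = [start] * n
    for _ in range(steps):
        for i in range(n):
            if scores[i] >= threshold:                  # approved
                if random.random() < repay_prob:        # repays -> score up
                    scores[i] = min(850, scores[i] + 10)
                else:                                   # defaults -> big drop
                    scores[i] = max(300, scores[i] - 40)
            # below threshold: denied, score frozen
    return sum(scores) / n

well_served = simulate(threshold=600, repay_prob=0.9)
harmed = simulate(threshold=600, repay_prob=0.4)
```

Under the same "neutral" threshold, the high-repayment group's average score climbs while the low-repayment group's collapses and freezes below the cutoff, showing why temporal modeling belongs in fairness evaluation.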

Humans in the Loop. While existing techniques in SRAs have made significant progress toward responsible AI systems, their usefulness can be limited in settings where the resulting decisions are actually poorer for every individual. For issues of fairness in prediction, for example, many findings (e.g., pfohl2020empirical) have raised concerns about the fairness-performance trade-off: imposing fairness comes at a cost to model performance. Predictions become less reliable and, moreover, different notions of fairness can conflict with one another. Having humans in the loop matters when it comes to contextualizing the objectives of SRAs, especially for high-stakes decisions. For instance, there are situations where the cut-off values of fairness for two subgroups differ, and humans can help calibrate the differences.
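One way humans can calibrate subgroup cut-offs is to start from machine-suggested thresholds that equalize selection rates and then adjust them per context. A minimal sketch with hypothetical risk scores:

```python
def equalizing_threshold(scores, target_rate):
    """Smallest cutoff that selects (approximately) target_rate of scores."""
    s = sorted(scores, reverse=True)
    k = int(round(target_rate * len(s)))
    return s[k - 1] if k > 0 else float("inf")

# Hypothetical risk scores for two subgroups with different distributions.
scores_a = [0.9, 0.8, 0.7, 0.4]
scores_b = [0.7, 0.6, 0.3, 0.2]

# Machine-suggested per-group cutoffs for a 50% selection rate; a human
# reviewer can then raise or lower them based on domain context before
# any high-stakes deployment.
cutoffs = {
    "a": equalizing_threshold(scores_a, 0.5),
    "b": equalizing_threshold(scores_b, 0.5),
}
```

The two groups end up with different numeric cut-offs but identical selection rates, giving the human reviewer an explicit, auditable quantity to adjust rather than an opaque global threshold.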

Responsible AI Gap in Industry. The far-reaching effects of reputational damage and employee disengagement resulting from AI misbehavior have forced company executives to begin understanding the risks of poorly designed AI systems and the importance of SRAs. While seeing many potential benefits of responsible AI systems, such as increased market share and long-term profitability, companies lack knowledge of how to cross the "Responsible AI Gap" between principles and tangible actions. This is partly because companies view responsible AI solely as a risk-avoidance mechanism and overlook its financial rewards. To capture the benefits of responsible AI in day-to-day business, companies need to go far beyond SRAs and examine every aspect of their end-to-end AI systems. A recent article suggested six basic steps to bridge the gulf between responsible AI and reality: empower responsible AI leadership; develop principles, policies, and training; establish human and AI governance; conduct responsible AI reviews; integrate tools and methods; and build and test a response plan. Even though the gap may be huge, small efforts built over time can let SRAs achieve a transformational impact on business.

Interdisciplinary Research. The current public dialog on SRAs has focused on a narrow subset of fields, blinding us to the opportunities presented by interdisciplinary research. It is necessary to work with researchers from different disciplines whose contributions are sorely needed, e.g., psychologists, social scientists, educators, and humanities scholars. Non-profit organizations are both beneficiaries and benefactors of SRAs. Partnering with non-profits and social enterprises will not only unleash AI's potential for benefiting societal well-being but also give AI technologists and researchers the opportunity to encounter the real problems we currently face. A better understanding of what problems need to be solved helps identify the SRAs that need to be created. Moreover, as big tech companies bankroll more of the work of academic researchers, much ethics-based research becomes concentrated in the hands of the few companies that can afford it. This is problematic because we become overly reliant on the same companies that produce socially indifferent AI systems. We need interdisciplinary and decentralized research to create SRAs and simultaneously achieve the four levels in the pyramid of Social Responsibility of AI.

SRAs for Social Good. The last challenge concerns the intended use of SRAs. When SRAs are leveraged to uplift humanity, trust in AI is further enhanced. There has been a burgeoning AI-for-social-good movement that produces AI algorithms to help reduce poverty, hunger, inequality, injustice, climate change, ill health, and other causes of human suffering. Compared with deploying cutting-edge AI systems to solve these critical issues, a more urgent question to examine is "what makes an AI project good", in order to prevent detrimental consequences of AI. In addition to Protecting, Informing, and Preventing, social good applications also relate closely to Fundraise and Greenlight. Applying SRAs to target solicitations for donations greatly helps with fundraising for non-profits, charitable organizations, and universities. Greenlight describes how SRAs can help allocate grants and other types of resources by predicting the success rates of project proposals; it plays an important role in improving the execution effectiveness of organizations. Developing social good applications that leverage the power of SRAs to benefit society is an equally important endeavor for AI technologists and researchers.


This survey examines multiple dimensions of research in Social Responsibility of AI, seeking to broaden current discussions, which focus primarily on decision-making algorithms that perform scoring and classification tasks. We argue that having a full scope of AI that captures the connections among all the major dimensions is the key to Socially Responsible AI Algorithms (SRAs). This work starts with an inclusive definition of Social Responsibility of AI, highlighting the principles (e.g., fairness, inclusiveness), means (e.g., SRAs), and objective (e.g., improving humanity). To better frame Social Responsibility of AI, we also introduce the pyramid with four levels of responsibilities for AI systems: functional responsibilities, legal responsibilities, ethical responsibilities, and philanthropic responsibilities. We then focus our discussion on how to achieve Social Responsibility of AI via the proposed SRAs framework. In the definition of SRAs, we emphasize that the functional and societal aspects are integral parts of AI algorithms. Given that functional and legal responsibilities are the usual focus of AI research and development, we particularly investigate the essentials for achieving AI's ethical responsibilities: the subjects, causes, objectives, and means. For the intended use (i.e., roles) of SRAs, we discuss the need for philanthropic and ethical responsibilities for AI systems to protect and inform users and to prevent/mitigate negative impacts. We conclude with several open problems and major challenges in SRAs. At this pivotal moment in the development of AI, it is of vital importance to discuss AI ethics and specify Social Responsibility of AI. Drawing from the theory of moral license (when humans are good, we give ourselves moral license to be bad), we argue that simply asking AI to do good is insufficient and inefficient, and that more can be done by AI technologists and researchers to develop socially responsible AI systems.
We hope this work can propel future research in various fields to tackle together the challenges and steer a course towards a beneficial AI future.


This material is based upon work supported by, or in part by, the U.S. Army Research Laboratory (ARL), the U.S. Army Research Office (ARO), and the Office of Naval Research (ONR) under contract/grant numbers W911NF2110030, W911NF2020124, and N00014-21-1-4002, as well as by the National Science Foundation (NSF) grants 1909555 and 2036127. We thank Dr. Lise Getoor and Dr. Hosagrahar V. Jagadish for their invaluable suggestions.


 328  volume = {28},
 329  journal = {JCSS},
 330  author = {Goldwasser, Shafi and Micali, Silvio},
 331  title = {Probabilistic encryption},
 335  year = {2009},
 336  pages = {169--178},
 337  booktitle = {STOC},
 338  author = {Gentry, Craig},
 339  title = {Fully homomorphic encryption using ideal lattices},
 343  organization = {Springer},
 344  year = {2019},
 345  pages = {473--492},
 346  booktitle = {CT-RSA},
 347  author = {Makri, Eleftheria and Rotaru, Dragos and Smart, Nigel P and Vercauteren, Frederik},
 348  title = {{EPIC}: efficient private image classification (or: Learning from the masters)},
 352  year = {2019},
 353  pages = {49--58},
 354  booktitle = {IoTDI},
 355  author = {Malekzadeh, Mohammad and Clegg, Richard G and Cavallaro, Andrea and Haddadi, Hamed},
 356  title = {Mobile sensor data anonymization},
 360  year = {2018},
 361  pages = {1--6},
 362  booktitle = {Proceedings of the 1st Workshop on Privacy by Design in Distributed Systems},
 363  author = {Malekzadeh, Mohammad and Clegg, Richard G and Cavallaro, Andrea and Haddadi, Hamed},
 364  title = {Protecting sensory data against sensitive inferences},
 368  publisher = {Elsevier},
 369  year = {2020},
 370  pages = {101132},
 371  journal = {Pervasive and Mobile Computing},
 372  author = {Malekzadeh, Mohammad and Clegg, Richard G and Cavallaro, Andrea and Haddadi, Hamed},
 373  title = {Privacy and utility preserving sensor-data transformations},
 377  year = {2018},
 378  pages = {2407--2416},
 379  booktitle = {KDD},
 380  author = {Wang, Ji and Zhang, Jianguo and Bao, Weidong and Zhu, Xiaomin and Cao, Bokai and Yu, Philip S},
 381  title = {Not just privacy: Improving performance of private deep learning in mobile cloud},
 385  year = {2016},
 386  pages = {201--210},
 387  booktitle = {ICML},
 388  author = {Gilad-Bachrach, Ran and Dowlin, Nathan and Laine, Kim and Lauter, Kristin and Naehrig, Michael and Wernsing, John},
 389  title = {Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy},
 393  year = {2017},
 394  pages = {35},
 395  volume = {2017},
 396  journal = {IACR Cryptol. ePrint Arch.},
 397  author = {Chabanne, Herv{\'e} and de Wargny, Amaury and Milgram, Jonathan and Morel, Constance and Prouff, Emmanuel},
 398  title = {Privacy-Preserving Classification on Deep Neural Network.},
 402  year = {2017},
 403  pages = {619--631},
 404  booktitle = {CCS},
 405  author = {Liu, Jian and Juuti, Mika and Lu, Yao and Asokan, Nadarajah},
 406  title = {Oblivious neural network predictions via minionn transformations},
 410  publisher = {Wnd Books},
 411  year = {2013},
 412  author = {Pacepa, Ion Mihai and Rychlak, Ronald J},
 413  title = {Disinformation: Former Spy Chief Reveals Secret Strategy for Undermining Freedom, Attacking Religion, and Promoting Terrorism},
 417  year = {2020},
 418  journal = {arXiv preprint arXiv:2010.09113},
 419  author = {Bhattacharjee, Amrita and Shu, Kai and Gao, Min and Liu, Huan},
 420  title = {Disinformation in the Online Information Ecosystem: Detection, Mitigation and Challenges},
 424  year = {2012},
 425  pages = {171--175},
 426  booktitle = {ACL},
 427  author = {Feng, Song and Banerjee, Ritwik and Choi, Yejin},
 428  title = {Syntactic stylometry for deception detection},
 432  year = {2011},
 433  journal = {arXiv preprint arXiv:1107.4557},
 434  author = {Ott, Myle and Choi, Yejin and Cardie, Claire and Hancock, Jeffrey T},
 435  title = {Finding deceptive opinion spam by any stretch of the imagination},
 439  year = {2019},
 440  pages = {312--320},
 441  booktitle = {WSDM},
 442  author = {Shu, Kai and Wang, Suhang and Liu, Huan},
 443  title = {Beyond news contents: The role of social context for fake news detection},
 447  organization = {IEEE},
 448  year = {2018},
 449  pages = {430--435},
 450  booktitle = {2018 IEEE MIPR},
 451  author = {Shu, Kai and Wang, Suhang and Liu, Huan},
 452  title = {Understanding user profiles on social media for fake news detection},
 456  publisher = {Mary Ann Liebert, Inc., publishers 140 Huguenot Street, 3rd Floor New~…},
 457  year = {2020},
 458  pages = {171--188},
 459  number = {3},
 460  volume = {8},
 461  journal = {Big Data},
 462  author = {Shu, Kai and Mahudeswaran, Deepak and Wang, Suhang and Lee, Dongwon and Liu, Huan},
 463  title = {FakeNewsNet: A Data Repository with News Content, Social Context, and Spatiotemporal Information for Studying Fake News on Social Media},
 467  organization = {IEEE},
 468  year = {2018},
 469  pages = {272--279},
 470  booktitle = {2018 FRUCT},
 471  author = {Della Vedova, Marco L and Tacchini, Eugenio and Moret, Stefano and Ballarin, Gabriele and DiPierro, Massimo and de Alfaro, Luca},
 472  title = {Automatic online fake news detection combining content and social signals},
 476  year = {2020},
 477  journal = {arXiv preprint arXiv:2004.01732},
 478  author = {Shu, Kai and Zheng, Guoqing and Li, Yichuan and Mukherjee, Subhabrata and Awadallah, Ahmed Hassan and Ruston, Scott and Liu, Huan},
 479  title = {Leveraging Multi-Source Weak Social Supervision for Early Detection of Fake News},
 483  year = {2018},
 484  pages = {3834--3840},
 485  volume = {18},
 486  booktitle = {IJCAI},
 487  author = {Qian, Feng and Gong, Chengyue and Sharma, Karishma and Liu, Yan},
 488  title = {Neural User Response Generator: Fake News Detection with Collective User Intelligence.},
 492  publisher = {Public Library of Science San Francisco, CA USA},
 493  year = {2015},
 494  pages = {e0128193},
 495  number = {6},
 496  volume = {10},
 497  journal = {PloS One},
 498  author = {Ciampaglia, Giovanni Luca and Shiralkar, Prashant and Rocha, Luis M and Bollen, Johan and Menczer, Filippo and Flammini, Alessandro},
 499  title = {Computational fact checking from knowledge networks},
 503  year = {2019},
 504  number = {3},
 505  volume = {23},
 506  journal = {Computaci{\'o}n y Sistemas},
 507  author = {Janicka, Maria and Pszona, Maria and Wawer, Aleksander},
 508  title = {Cross-Domain Failures of Fake News Detection},
 512  year = {2019},
 513  pages = {395--405},
 514  booktitle = {KDD},
 515  author = {Shu, Kai and Cui, Limeng and Wang, Suhang and Lee, Dongwon and Liu, Huan},
 516  title = {d{EFEND}: Explainable fake news detection},
 520  publisher = {AAAS},
 521  year = {2018},
 522  pages = {1146--1151},
 523  number = {6380},
 524  volume = {359},
 525  journal = {Science},
 526  author = {Vosoughi, Soroush and Roy, Deb and Aral, Sinan},
 527  title = {The spread of true and false news online},
 531  organization = {Springer},
 532  year = {2019},
 533  pages = {234--244},
 534  booktitle = {SBP-BRiMS},
 535  author = {Rajabi, Zahra and Shehu, Amarda and Purohit, Hemant},
 536  title = {User behavior modelling for fake information mitigation on social web},
 540  publisher = {IEEE},
 541  year = {2017},
 542  journal = {IEEE Transactions on Affective Computing},
 543  author = {Salawu, Semiu and He, Yulan and Lumsden, Joanna},
 544  title = {Approaches to automated detection of cyberbullying: A survey},
 548  year = {2020},
 549  journal = {IEEE Internet Computing, Special Issue on Cyber-Social Health: Promoting Good and Countering Harm on Social Media},
 550  author = {Cheng, Lu and Silva, Yasin and Hall, Deborah and Liu, Huan},
 551  title = {Session-based Cyberbullying Detection: Problems and Challenges},
 555  year = {2020},
 556  booktitle = {CIKM},
 557  author = {Cheng, Lu and Shu, Kai and Wu, Siqi and Silva, Yasin N and Hall, Deborah L and Liu, Huan},
 558  title = {Unsupervised Cyberbullying Detection via Time-Informed Gaussian Mixture Model},
 562  publisher = {AAAS},
 563  year = {2019},
 564  pages = {eaau4586},
 565  number = {1},
 566  volume = {5},
 567  journal = {Science Advances},
 568  author = {Guess, Andrew and Nagler, Jonathan and Tucker, Joshua},
 569  title = {Less than you think: Prevalence and predictors of fake news dissemination on Facebook},
 573  year = {2011},
 574  pages = {665--674},
 575  booktitle = {WWW},
 576  author = {Budak, Ceren and Agrawal, Divyakant and El Abbadi, Amr},
 577  title = {Limiting the spread of misinformation in social networks},
 581  year = {2012},
 582  pages = {213--222},
 583  booktitle = {Web Science},
 584  author = {Nguyen, Nam P and Yan, Guanhua and Thai, My T and Eidenbenz, Stephan},
 585  title = {Containment of misinformation spread in online social networks},
 589  year = {2018},
 590  pages = {341--351},
 591  booktitle = {NeurIPS},
 592  author = {Tong, Amo and Du, Ding-Zhu and Wu, Weili},
 593  title = {On misinformation containment in online social networks},
 597  year = {2021},
 598  booktitle = {KDD},
 599  author = {Cheng, Lu and Guo, Ruocheng and Shu, Kai and Liu, Huan},
 600  title = {Causal Understanding of Fake News Dissemination on Social Media},
 604  publisher = {IEEE},
 605  year = {2019},
 606  pages = {1949--1978},
 607  number = {4},
 608  volume = {66},
 609  journal = {IEEE Transactions on Information Theory},
 610  author = {Diaz, Mario and Wang, Hao and Calmon, Flavio P and Sankar, Lalitha},
 611  title = {On the robustness of information-theoretic privacy measures and mechanisms},
 615  year = {2021},
 616  booktitle = {Proceedings of ACL},
 617  author = {Cheng, Lu and Mosallanezhad, Ahmadreza and Silva, Yasin N and Hall, Deborah L and Liu, Huan},
 618  title = {Mitigating Bias in Session-based Cyberbullying Detection: A Non-Compromising Approach},
 622  organization = {IEEE},
 623  year = {2019},
 624  pages = {1--5},
 625  booktitle = {2019 IEEE PESGM},
 626  author = {Pinceti, Andrea and Kosut, Oliver and Sankar, Lalitha},
 627  title = {Data-driven generation of synthetic load datasets preserving spatio-temporal features},
 631  year = {2019},
 632  journal = {arXiv preprint arXiv:1908.09635},
 633  author = {Mehrabi, Ninareh and Morstatter, Fred and Saxena, Nripsuta and Lerman, Kristina and Galstyan, Aram},
 634  title = {A survey on bias and fairness in machine learning},
 638  organization = {IEEE},
 639  year = {2008},
 640  pages = {111--125},
 641  booktitle = {2008 IEEE sp},
 642  author = {Narayanan, Arvind and Shmatikov, Vitaly},
 643  title = {Robust de-anonymization of large sparse datasets},
 647  organization = {Springer},
 648  year = {2006},
 649  pages = {265--284},
 650  booktitle = {Theory of Cryptography Conference},
 651  author = {Dwork, Cynthia and McSherry, Frank and Nissim, Kobbi and Smith, Adam},
 652  title = {Calibrating noise to sensitivity in private data analysis},
 656  publisher = {Public Library of Science},
 657  year = {2008},
 658  pages = {e1000167},
 659  number = {8},
 660  volume = {4},
 661  journal = {PLoS Genet},
 662  author = {Homer, Nils and Szelinger, Szabolcs and Redman, Margot and Duggan, David and Tembe, Waibhav and Muehling, Jill and Pearson, John V and Stephan, Dietrich A and Nelson, Stanley F and Craig, David W},
 663  title = {Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays},
 667  publisher = {World Scientific},
 668  year = {2002},
 669  pages = {557--570},
 670  number = {05},
 671  volume = {10},
 672  journal = {IJUFKS},
 673  author = {Sweeney, Latanya},
 674  title = {k-anonymity: A model for protecting privacy},
 678  publisher = {ACM New York, NY, USA},
 679  year = {2020},
 680  pages = {62--69},
 681  number = {1},
 682  volume = {64},
 683  journal = {Communications of the ACM},
 684  author = {Caba{\~n}as, Jos{\'e} Gonz{\'a}lez and Cuevas, {\'A}ngel and Arrate, Aritz and Cuevas, Rub{\'e}n},
 685  title = {Does Facebook use sensitive data for advertising purposes?},
 689  institution = {National Bureau of Economic Research},
 690  year = {2018},
 691  author = {List, John A and Momeni, Fatemeh},
 692  title = {When corporate social responsibility backfires: Theory and evidence from a natural field experiment},
 696  organization = {IEEE},
 697  year = {2019},
 698  pages = {1--1},
 699  booktitle = {Big Data},
 700  author = {Getoor, Lise},
 701  title = {Responsible Data Science},
 705  publisher = {ACM New York, NY, USA},
 706  year = {2012},
 707  pages = {1--30},
 708  number = {3},
 709  volume = {2},
 710  journal = {ACM TiiS},
 711  author = {Dinakar, Karthik and Jones, Birago and Havasi, Catherine and Lieberman, Henry and Picard, Rosalind},
 712  title = {Common sense reasoning for detection, prevention, and mitigation of cyberbullying},
 716  organization = {Springer},
 717  year = {2013},
 718  pages = {693--696},
 719  booktitle = {ECIR},
 720  author = {Dadvar, Maral and Trieschnigg, Dolf and Ordelman, Roeland and de Jong, Franciska},
 721  title = {Improving cyberbullying detection with user context},
 725  year = {2013},
 726  pages = {195--204},
 727  booktitle = {Web Science},
 728  author = {Kontostathis, April and Reynolds, Kelly and Garron, Andy and Edwards, Lynne},
 729  title = {Detecting cyberbullying: query terms and techniques},
 733  year = {2019},
 734  pages = {5829--5835},
 735  booktitle = {IJCAI},
 736  author = {Cheng, Lu and Li, Jundong and Silva, Yasin N and Hall, Deborah L and Liu, Huan},
 737  title = {{PI}-Bully: Personalized Cyberbullying Detection with Peer Influence.},
 741  organization = {Citeseer},
 742  year = {2011},
 743  pages = {446--449},
 744  booktitle = {ICWSM},
 745  author = {Biel, Joan-Isaac and Aran, Oya and Gatica-Perez, Daniel},
 746  title = {You are known by how you vlog: Personality impressions and nonverbal behavior in youtube.},
 750  publisher = {Elsevier},
 751  year = {2012},
 752  pages = {63--70},
 753  number = {1},
 754  volume = {34},
 755  journal = {Children and Youth Services Review},
 756  author = {Mishna, Faye and Khoury-Kassabri, Mona and Gadalla, Tahany and Daciuk, Joanne},
 757  title = {Risk factors for involvement in cyber bullying: Victims, bullies and bully--victims},
 761  publisher = {Elsevier},
 762  year = {2016},
 763  pages = {433--443},
 764  volume = {63},
 765  journal = {Computers in Human Behavior},
 766  author = {Al-garadi, Mohammed Ali and Varathan, Kasturi Dewi and Ravana, Sri Devi},
 767  title = {Cybercrime detection in online communications: The experimental case of cyberbullying detection in the Twitter network},
 771  year = {2020},
 772  journal = {arXiv preprint arXiv:2010.04576},
 773  author = {Chen, Hsin-Yu and Li, Cheng-Te},
 774  title = {HENIN: Learning Heterogeneous Neural Interaction Networks for Explainable Cyberbullying Detection on Social Media},
 778  year = {2021},
 779  pages = {496--506},
 780  booktitle = {Proceedings of the Web Conference 2021},
 781  author = {Ge, Suyu and Cheng, Lu and Liu, Huan},
 782  title = {Improving Cyberbullying Detection with User Interaction},
 786  year = {2016},
 787  pages = {3895--3905},
 788  booktitle = {CHI},
 789  author = {Ashktorab, Zahra and Vitak, Jessica},
 790  title = {Designing cyberbullying mitigation and prevention solutions through participatory design with teenagers},
 794  organization = {IEEE},
 795  year = {2013},
 796  pages = {126--133},
 797  booktitle = {ICCSIT},
 798  author = {Al Mazari, Ali},
 799  title = {Cyber-bullying taxonomies: Definition, forms, consequences and mitigation strategies},
 803  organization = {IEEE},
 804  year = {2016},
 805  pages = {1377--1379},
 806  booktitle = {ASONAM},
 807  author = {Silva, Yasin N and Rich, Christopher and Hall, Deborah},
 808  title = {BullyBlocker: Towards the identification of cyberbullying in social networking sites},
 812  year = {2009},
 813  number = {2},
 814  volume = {3},
 815  journal = {International Journal of Cyber Criminology},
 816  author = {Kraft, Ellen M and Wang, Jinchang},
 817  title = {Effectiveness of cyber bullying prevention strategies: A study on students' perspectives.},
 821  year = {2017},
 822  pages = {37--42},
 823  booktitle = {CODASPY},
 824  author = {Vishwamitra, Nishant and Zhang, Xiang and Tong, Jonathan and Hu, Hongxin and Luo, Feng and Kowalski, Robin and Mazer, Joseph},
 825  title = {MCDefender: Toward effective cyberbullying defense in mobile online social networks},
 829  year = {2018},
 830  pages = {1--12},
 831  booktitle = {CHI},
 832  author = {DiFranzo, Dominic and Taylor, Samuel Hardman and Kazerooni, Franccesca and Wherry, Olivia D and Bazarova, Natalya N},
 833  title = {Upstanding by design: Bystander intervention in cyberbullying},
 837  publisher = {ACM New York, NY, USA},
 838  year = {2020},
 839  pages = {1--38},
 840  number = {1},
 841  volume = {1},
 842  journal = {ACM Transactions on Data Science},
 843  author = {Beigi, Ghazaleh and Liu, Huan},
 844  title = {A survey on privacy in social media: Identification, mitigation, and applications},
 848  year = {2017},
 849  author = {Ashktorab, Zahra},
 850  title = {Designing Cyberbullying Prevention and Mitigation Tools},
 854  publisher = {ACM New York, NY, USA},
 855  year = {2019},
 856  pages = {54--60},
 857  number = {3},
 858  volume = {62},
 859  journal = {Communications of the ACM},
 860  author = {Pearl, Judea},
 861  title = {The seven tools of causal inference, with reflections on machine learning},
 865  year = {2018},
 866  journal = {arXiv preprint arXiv:1803.04383},
 867  author = {Liu, Lydia T and Dean, Sarah and Rolf, Esther and Simchowitz, Max and Hardt, Moritz},
 868  title = {Delayed impact of fair machine learning},
 872  year = {2018},
 873  pages = {1389--1398},
 874  booktitle = {The Web Conference},
 875  author = {Hu, Lily and Chen, Yiling},
 876  title = {A short-term intervention for long-term fairness in the labor market},
 880  publisher = {Elsevier},
 881  year = {2015},
 882  pages = {149--157},
 883  volume = {46},
 884  journal = {Computers in Human Behavior},
 885  author = {Balakrishnan, Vimala},
 886  title = {Cyberbullying among young adults in Malaysia: The roles of gender, age and Internet frequency},
 890  organization = {ACM},
 891  year = {2019},
 892  pages = {1--1},
 893  booktitle = {2019 WSDM},
 894  author = {Jagadish, V. H.},
 895  title = {Responsible Data Science},
 899  organization = {Springer},
 900  year = {2008},
 901  pages = {1--19},
 902  booktitle = {TAMC},
 903  author = {Dwork, Cynthia},
 904  title = {Differential privacy: A survey of results},
 908  publisher = {Cambridge University Press},
 909  year = {2017},
 910  author = {Schaefer, Rafael F and Boche, Holger and Khisti, Ashish and Poor, H Vincent},
 911  title = {Information Theoretic Security and Privacy of Information Systems},
 915  publisher = {ACM New York, NY, USA},
 916  year = {2017},
 917  pages = {1--45},
 918  number = {6},
 919  volume = {50},
 920  journal = {CSUR},
 921  author = {Li, Jundong and Cheng, Kewei and Wang, Suhang and Morstatter, Fred and Trevino, Robert P and Tang, Jiliang and Liu, Huan},
 922  title = {Feature selection: A data perspective},
 926  publisher = {AAAS},
 927  year = {1975},
 928  pages = {398--404},
 929  number = {4175},
 930  volume = {187},
 931  journal = {Science},
 932  author = {Bickel, Peter J and Hammel, Eugene A and O'Connell, J William},
 933  title = {Sex bias in graduate admissions: Data from {B}erkeley},
 937  publisher = {Taylor \& Francis},
 938  year = {1972},
 939  pages = {364--366},
 940  number = {338},
 941  volume = {67},
 942  journal = {Journal of the American Statistical Association},
 943  author = {Blyth, Colin R},
 944  title = {On {S}impson's paradox and the sure-thing principle},
 948  publisher = {ACM New York, NY, USA},
 949  year = {2020},
 950  pages = {1--37},
 951  number = {4},
 952  volume = {53},
 953  journal = {CSUR},
 954  author = {Guo, Ruocheng and Cheng, Lu and Li, Jundong and Hahn, P Richard and Liu, Huan},
 955  title = {A survey of learning causality with data: Problems and methods},
 959  publisher = {ACM New York, NY, USA},
 960  year = {2018},
 961  pages = {54--61},
 962  number = {6},
 963  volume = {61},
 964  journal = {Communications of the ACM},
 965  author = {Baeza-Yates, Ricardo},
 966  title = {Bias on the web},
 970  year = {1996},
 971  pages = {24--30},
 972  booktitle = {NeurIPS},
 973  author = {Craven, Mark and Shavlik, Jude W},
 974  title = {Extracting tree-structured representations of trained networks},
 978  year = {2009},
 979  pages = {1},
 980  number = {3},
 981  volume = {1341},
 982  journal = {University of Montreal},
 983  author = {Erhan, Dumitru and Bengio, Yoshua and Courville, Aaron and Vincent, Pascal},
 984  title = {Visualizing higher-layer features of a deep network},
 988  year = {2017},
 989  journal = {arXiv preprint arXiv:1703.04730},
 990  author = {Koh, Pang Wei and Liang, Percy},
 991  title = {Understanding black-box predictions via influence functions},
 995  year = {2016},
 996  pages = {2280--2288},
 997  booktitle = {NeurIPS},
 998  author = {Kim, Been and Khanna, Rajiv and Koyejo, Oluwasanmi O},
 999  title = {Examples are not enough, learn to criticize! criticism for interpretability},
1003  year = {2013},
1004  journal = {arXiv preprint arXiv:1312.6034},
1005  author = {Simonyan, Karen and Vedaldi, Andrea and Zisserman, Andrew},
1006  title = {Deep inside convolutional networks: Visualising image classification models and saliency maps},
1010  month = {Feb},
1011  year = {2021},
1012  author = {Schwab, Katharine},
1013  journal = {Fast Company},
1014  abstractnote = {Big Tech has used its power to control the field of AI ethics and avoid accountability. Now, the ouster of Timnit Gebru is putting the movement for equitable tech in the spotlight.},
1015  howpublished = {\url{}},
1016  title = {‘This is bigger than just Timnit’: How Google tried to silence a critic and ignited a movement},
1020  year = {2019},
1021  journal = {Doteveryone},
1022  howpublished = {\url{}},
1023  author = {Miller C, Coldicutt R.},
1024  title = {People, Power and Technology: The Tech Workers’ View},
1028  year = {2015},
1029  author = {Carpenter, Julia},
1030  journal = {Washington Post},
1031  abstractnote = {Is the Google algorithm sexist -- or is it us?},
1032  howpublished = {\url{}},
1036  month = {April},
1037  year = {2021},
1038  author = {Horwitz, Jeff},
1039  journal = {Wall Street Journal},
1040  abstractnote = {Researchers found Facebook systems were more likely to present certain job ads to users if their gender identity reflected the concentration of that gender in a particular position or industry.},
1041  howpublished = {\url{}},
1042  issn = {0099-9660},
1043  title = {Facebook Algorithm Shows Gender Bias in Job Ads, Study Finds},
1047  month = {July},
1048  year = {2021},
1049  author = {Wikipedia},
1050  journal = {Wikipedia},
1051  note = {Page Version ID: 1035933869},
1052  howpublished = {\url{\%E2\%80\%93Cambridge_Analytica_data_scandal&oldid=1035933869}},
1056  month = {July},
1057  year = {2021},
1058  author = {Marr, Bernard},
1059  abstractnote = {The amount of data we produce every day is truly […]},
1060  howpublished = {\url{}},
1064  year = {2020},
1065  author = {Siegel, Eric},
1066  journal = {KDnuggets},
1067  abstractnote = {We often discuss applying data science and machine learning techniques in term so of how they help your organization or business goals. But, these algorithms aren’t limited to only increasing the bottom line. Developing new applications that leverage the predictive power of AI to benefit society and those communities in…},
1068  howpublished = {\url{}},
1072  year = {2020},
1073  author = {Knight, Will},
1074  journal = {Wired},
1075  abstractnote = {Robust Intelligence is among a crop of companies that offer to protect clients from efforts at deception.},
1076  howpublished = {\url{}},
1077  issn = {1059-1028},
1078  title = {This Company Uses {AI} to Outwit Malicious {AI}},
1082  year = {2020},
1083  author = {Rivero, Nicolás},
1084  journal = {Quartz},
1085  abstractnote = {In one topsy-turvy week, Google presented a case study in the perils of turning AI research over to Big Tech.},
1086  howpublished = {\url{}},
1087  title = {Google showed us the danger of letting corporations lead {AI} research},
1091  year = {2015},
1092  author = { Shaul, Brandy},
1093  abstractnote = {The app allows users to ask their Facebook friends questions, receiving anonymous, positive feedback to boost their confidence., The app allows users to ask their Facebook friends questions, receiving anonymous, positive feedback to boost their confidence.},
1094  howpublished = {\url{}},
1095  title = {Honestly Looks to Combat Cyberbullying on iOS, Android},
1099  month = {December},
1100  year = {2020},
1101  author = {Dobrin, Seth},
1102  journal = {Watson Blog},
1103  abstractnote = {IBM is commercializing key automated documentation capabilities from IBM Research’s AI FactSheets into Watson Studio in Cloud Pak for Data throughout 2021.},
1104  howpublished = {\url{}},
1108  month = {March},
1109  year = {2019},
1110  author = {Gershgorn, Dave},
1111  journal = {Popular Science},
1112  abstractnote = {Machine learning algorithms used by the U.S. National Security Agency to identify potential terrorists in Pakistan might be ineffective.},
1113  howpublished = {\url{}},
1117  month = {February},
1118  year = {2020},
1119  author = {Grover, Vandita},
1120  abstractnote = {On Safer Internet Day, we explore whether transparency and obtaining consent are enough for customer data privacy or will data dignity shape the digital future.},
1121  howpublished = {\url{}},
1125  month = {June},
1126  year = {2019},
1127  author = {Hart, Vi},
1128  howpublished = {\url{}},
1129  title = {Data Dignity at RadicalxChange - The Art of Research},
1133  month = {November},
1134  year = {2020},
1135  author = {Cowen, Tyler},
1136  journal = {},
1137  abstractnote = {Priority should be given to methods that will save more lives and bring back the economy more rapidly.},
1138  howpublished = {\url{}},
1142  month = {May},
1143  year = {2020},
1144  author = {Yeo, Catherine},
1145  journal = {Fair Bytes},
1146  abstractnote = {What does it mean for a machine learning algorithm to be “transparent”?},
1147  howpublished = {\url{}},
1148  title = {What is Transparency in {AI}?},
1152  month = {Mar},
1153  year = {2021},
1154  author = {Wikipedia},
1155  note = {Page Version ID: 1009774103},
1156  abstractnote = {In computer science, robustness is the ability of a computer system to cope with errors during execution and cope with erroneous input. Robustness can encompass many areas of computer science, such as robust programming, robust machine learning, and Robust Security Network. Formal techniques, such as fuzz testing, are essential to showing robustness since this type of testing involves invalid or unexpected inputs. Alternatively, fault injection can be used to test robustness. Various commercial products perform robustness testing of software analysis.},
1157  howpublished = {\url{}},
1161  year = {2019},
1162  author = {
1163Feige, Ilya},
1164  journal = {Faculty},
1165  abstractnote = {In this blog, we attempt to describe the space of Artificial Intelligence (AI) Safety; namely, all the things that someone might mean when they say “AI Safety”.},
1166  howpublished = {\url{}},
1170  month = {March},
1171  year = {2021},
1172  author = {Branswell, Helen},
1173  journal = {STAT},
1174  abstractnote = {A wide-ranging Q&A with Richard Hatchett, CEO of CEPI, the Coalition for Epidemic Preparedness Innovations.},
1175  howpublished = {\url{}},
1179  month = {July},
1180  year = {2021},
1181  author = {Wikipedia},
1182  note = {Page Version ID: 1036071985},
1183  abstractnote = {Cambridge Analytica Ltd (CA) was a British political consulting firm that came to prominence through the Facebook–Cambridge Analytica data scandal. It was started in 2013 as a subsidiary of the private intelligence company and self-described “global election management agency” SCL Group by long-time SCL executives Nigel Oakes, Alexander Nix and Alexander Oakes, with Nix as CEO. The company had close ties to the Conservative Party (UK), the British royal family and the British military. The firm maintained offices in London, New York City, and Washington, DC. The company closed operations in 2018 in the course of the Facebook–Cambridge Analytica data scandal, although related firms still exist.},
1184  howpublished = {\url{}},
1188  month = {December},
1189  year = {2020},
1190  author = {Asimov, Nanette},
1191  journal = {GovTech},
1192  abstractnote = {Stanford Medicine officials also studied guidelines — then built a mathematical algorithm that prioritized people who on paper were at high risk for COVID-19, such as older employees. But not all who came into contact with patients were doctors and nurses.},
1193  howpublished = {\url{}},
1197  year = {2015},
1198  author = {Guynn, Jessica},
1199  journal = {USA TODAY},
1200  abstractnote = {Google apologizes after Jacky Alciné reported he and friend were identifed as ``gorillas''},
1201  howpublished = {\url{}},
- Mills, Steven, Elias Baltassis, Maximiliano Santinelli, Cathy Carlisi, Sylvain Duranton, and Andrea Gallego. "Six Steps to Bridge the Responsible AI Gap." 2020.
- Mitchell, Margaret, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. "Model cards for model reporting." In FAT*, pp. 220–229, 2019.
- Canca, Cansu. "Operationalizing AI ethics principles." Communications of the ACM 63(12), pp. 18–21, 2020.
- Bellamy, Rachel K. E., Kuntal Dey, Michael Hind, Samuel C. Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, Aleksandra Mojsilović, et al. "AI Fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias." IBM Journal of Research and Development 63(4/5), 2019.
- Goel, Naman, Mohammad Yaghini, and Boi Faltings. "Non-discriminatory machine learning through convex fairness criteria." In AIES, p. 116, 2018.
- Kamishima, Toshihiro, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. "Fairness-aware classifier with prejudice remover regularizer." In ECML PKDD, pp. 35–50. Springer, 2012.
- Krasanakis, Emmanouil, Eleftherios Spyromitros-Xioufis, Symeon Papadopoulos, and Yiannis Kompatsiaris. "Adaptive sensitive reweighting to mitigate bias in fairness-aware classification." In The Web Conference, pp. 853–862, 2018.
- Menon, Aditya Krishna, and Robert C. Williamson. "The cost of fairness in binary classification." In FAT*, pp. 107–118, 2018.
- Calmon, Flavio, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R. Varshney. "Optimized pre-processing for discrimination prevention." In NeurIPS, pp. 3992–4001, 2017.
- Geyik, Sahin Cem, Stuart Ambler, and Krishnaram Kenthapadi. "Fairness-aware ranking in search & recommendation systems with application to LinkedIn talent search." In KDD, pp. 2221–2231, 2019.
- Sühr, Tom, Sophie Hilgard, and Himabindu Lakkaraju. "Does Fair Ranking Improve Minority Outcomes? Understanding the Interplay of Human and Algorithmic Biases in Online Hiring." arXiv:2012.00423, 2020.
- Pfohl, Stephen R., Agata Foryciarz, and Nigam H. Shah. "An empirical characterization of fair machine learning for clinical risk prediction." Journal of Biomedical Informatics, 103621, 2020.
- Varshney, Kush R. "Trustworthy machine learning and artificial intelligence." XRDS: Crossroads, The ACM Magazine for Students 25(3), pp. 26–29, 2019.
- Galhotra, Sainyam, Yuriy Brun, and Alexandra Meliou. "Fairness testing: Testing software for discrimination." In ESEC/FSE, pp. 498–510, 2017.
- Feldman, Michael, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. "Certifying and removing disparate impact." In KDD, pp. 259–268, 2015.
- Jiang, Heinrich, and Ofir Nachum. "Identifying and correcting label bias in machine learning." In AISTATS, pp. 702–712, 2020.
- Celis, L. Elisa, Amit Deshpande, Tarun Kathuria, and Nisheeth K. Vishnoi. "How to be fair and diverse?" arXiv:1610.07183, 2016.
- Feng, Rui, Yang Yang, Yuehan Lyu, Chenhao Tan, Yizhou Sun, and Chunping Wang. "Learning fair representations via an adversarial framework." arXiv:1904.13341, 2019.
- Zhang, Lu, Yongkai Wu, and Xintao Wu. "A causal framework for discovering and removing direct and indirect discrimination." arXiv:1611.07509, 2016.
- Narendra, Tanmayee, Anush Sankaran, Deepak Vijaykeerthy, and Senthil Mani. "Explaining deep learning models using causal inference." arXiv:1811.04376, 2018.
- Wang, Yixin, Dawen Liang, Laurent Charlin, and David M. Blei. "The deconfounded recommender: A causal inference approach to recommendation." arXiv:1808.06581, 2018.
- Molnar, Christoph. Interpretable Machine Learning. Lulu.com, 2020.
- Wachter, Sandra, Brent Mittelstadt, and Chris Russell. "Counterfactual explanations without opening the black box: Automated decisions and the GDPR." Harvard Journal of Law & Technology 31, p. 841, 2017.
- Wold, Svante, Kim Esbensen, and Paul Geladi. "Principal component analysis." Chemometrics and Intelligent Laboratory Systems 2(1–3), pp. 37–52, 1987.
- van der Maaten, Laurens, and Geoffrey Hinton. "Visualizing data using t-SNE." JMLR 9(Nov), pp. 2579–2605, 2008.
- Lipton, Zachary C. "The mythos of model interpretability." Queue 16(3), pp. 31–57, 2018.
- Mordvintsev, Alexander, Christopher Olah, and Mike Tyka. "Inceptionism: Going deeper into neural networks." 2015.
- Yang, Zichao, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. "Hierarchical attention networks for document classification." In NAACL HLT, pp. 1480–1489, 2016.
- Robnik-Šikonja, Marko, and Marko Bohanec. "Perturbation-based explanations of prediction models." In Human and Machine Learning, pp. 159–175. Springer, 2018.
- Hartigan, John A., and Manchek A. Wong. "Algorithm AS 136: A k-means clustering algorithm." Journal of the Royal Statistical Society, Series C (Applied Statistics) 28(1), pp. 100–108, 1979.
- Rudin, Cynthia. "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead." Nature Machine Intelligence 1(5), pp. 206–215, 2019.
- Hirano, Keisuke, Guido W. Imbens, and Geert Ridder. "Efficient estimation of average treatment effects using the estimated propensity score." Econometrica 71(4), pp. 1161–1189, 2003.
- Liu, Shusen, Bhavya Kailkhura, Donald Loveland, and Yong Han. "Generative counterfactual introspection for explainable deep learning." arXiv:1907.03077, 2019.
- Grath, Rory Mc, Luca Costabello, Chan Le Van, Paul Sweeney, Farbod Kamiab, Zhao Shen, and Freddy Lecue. "Interpretable credit application predictions with counterfactual explanations." arXiv:1811.05245, 2018.
- Joachims, Thorsten, Adith Swaminathan, and Tobias Schnabel. "Unbiased learning-to-rank with biased feedback." In WSDM, pp. 781–789, 2017.
- Wang, Xuanhui, Nadav Golbandi, Michael Bendersky, Donald Metzler, and Marc Najork. "Position bias estimation for unbiased learning to rank in personal search." In WSDM, pp. 610–618, 2018.
- Zhang, Lu, and Xintao Wu. "Anti-discrimination learning: A causal modeling-based framework." JDSA 4(1), pp. 1–16, 2017.
- Zhang, Guoyong, Yonggang Wu, Kit Po Wong, Zhao Xu, Zhao Yang Dong, and Herbert Ho-Ching Iu. "An advanced approach for construction of optimal wind power prediction intervals." IEEE Transactions on Power Systems 30(5), pp. 2706–2715, 2014.
- Stoica, Ana-Andreea, Christopher Riederer, and Augustin Chaintreau. "Algorithmic Glass Ceiling in Social Networks: The effects of social recommendations on network diversity." In The Web Conference, pp. 923–932, 2018.
- Parafita, Álvaro, and Jordi Vitrià. "Explaining visual models by causal attribution." arXiv:1909.08891, 2019.
- Chattopadhyay, Aditya, Piyushi Manupriya, Anirban Sarkar, and Vineeth N. Balasubramanian. "Neural network attributions: A causal perspective." arXiv:1902.02302, 2019.
- Pearl, Judea. "Theoretical impediments to machine learning with seven sparks from the causal revolution." arXiv:1801.04016, 2018.
- Moraffah, Raha, Mansooreh Karami, Ruocheng Guo, Adrienne Raglin, and Huan Liu. "Causal interpretability for machine learning: Problems, methods and evaluation." SIGKDD Explorations 22(1), pp. 18–33, 2020.
- Yang, Zekun, and Juan Feng. "A Causal Inference Method for Reducing Gender Bias in Word Embedding Relations." In AAAI, pp. 9434–9441, 2020.
- Garg, Nikhil, Londa Schiebinger, Dan Jurafsky, and James Zou. "Word embeddings quantify 100 years of gender and ethnic stereotypes." PNAS 115(16), pp. E3635–E3644, 2018.
- Ekstrand, Michael D., Mucun Tian, Mohammed R. Imran Kazi, Hoda Mehrpouyan, and Daniel Kluver. "Exploring author gender in book rating and recommendation." In RecSys, pp. 242–250, 2018.
- Karimi, Fariba, Mathieu Génois, Claudia Wagner, Philipp Singer, and Markus Strohmaier. "Homophily influences ranking of minorities in social networks." Scientific Reports 8(1), pp. 1–12, 2018.
- Datta, Amit, Michael Carl Tschantz, and Anupam Datta. "Automated experiments on ad privacy settings: A tale of opacity, choice, and discrimination." PoPETs 2015(1), pp. 92–112, 2015.
- Lambrecht, Anja, and Catherine Tucker. "Algorithmic bias? An empirical study of apparent gender-based discrimination in the display of STEM career ads." Management Science 65(7), pp. 2966–2981, 2019.
- Nabi, Razieh, and Ilya Shpitser. "Fair inference on outcomes." In AAAI, p. 1931, 2018.
- Calders, Toon, Asim Karim, Faisal Kamiran, Wasif Ali, and Xiangliang Zhang. "Controlling attribute effect in linear regression." In ICDM, pp. 71–80. IEEE, 2013.
- Agarwal, Alekh, Miroslav Dudík, and Zhiwei Steven Wu. "Fair regression: Quantitative definitions and reduction-based algorithms." arXiv:1905.12843, 2019.
- Xu, Depeng, Yongkai Wu, Shuhan Yuan, Lu Zhang, and Xintao Wu. "Achieving Causal Fairness through Generative Adversarial Networks." In IJCAI, pp. 1452–1458, 2019.
- Pearl, Judea. "Causal inference." In Causality: Objectives and Assessment, pp. 39–58, 2010.
- Pearl, Judea. Causality. Cambridge University Press, 2009.
- Rosenbaum, Paul R., and Donald B. Rubin. "The central role of the propensity score in observational studies for causal effects." Biometrika 70(1), pp. 41–55, 1983.
- Rubin, Donald B. "Estimating causal effects of treatments in randomized and nonrandomized studies." Journal of Educational Psychology 66(5), p. 688, 1974.
- Holzinger, Andreas, Georg Langs, Helmut Denk, Kurt Zatloukal, and Heimo Müller. "Causability and explainability of artificial intelligence in medicine." Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9(4), e1312, 2019.
- Dwork, Cynthia, Nicole Immorlica, Adam Tauman Kalai, and Max Leiserson. "Decoupled classifiers for group-fair and efficient machine learning." In FAT*, pp. 119–133, 2018.
- Bechavod, Yahav, and Katrina Ligett. "Penalizing unfairness in binary classification." arXiv:1707.00044, 2017.
- Calders, Toon, and Sicco Verwer. "Three naive Bayes approaches for discrimination-free classification." Data Mining and Knowledge Discovery 21(2), pp. 277–292, 2010.
- Dwork, Cynthia, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. "Fairness through awareness." In ITCS, pp. 214–226, 2012.
- Johnson, Kory D., Dean P. Foster, and Robert A. Stine. "Impartial predictive modeling: Ensuring fairness in arbitrary models." arXiv:1608.00528, 2016.
- Hardt, Moritz, Eric Price, and Nati Srebro. "Equality of opportunity in supervised learning." In NeurIPS, pp. 3315–3323, 2016.
- Berk, Richard, Hoda Heidari, Shahin Jabbari, Matthew Joseph, Michael Kearns, Jamie Morgenstern, Seth Neel, and Aaron Roth. "A convex framework for fair regression." arXiv:1706.02409, 2017.
- Weller, Adrian. "Challenges for transparency." arXiv:1708.01870, 2017.
- d'Alessandro, Brian, Cathy O'Neil, and Tom LaGatta. "Conscientious classification: A data scientist's guide to discrimination-aware classification." Big Data 5(2), pp. 120–134, 2017.
- Legg, Shane, Marcus Hutter, et al. "A collection of definitions of intelligence." FAIA 157, p. 17. IOS Press, 2007.
- Slack, Dylan, Sophie Hilgard, Emily Jia, Sameer Singh, and Himabindu Lakkaraju. "Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods." In AIES, pp. 180–186, 2020.
- Kusner, Matt J., Joshua Loftus, Chris Russell, and Ricardo Silva. "Counterfactual fairness." In NeurIPS, pp. 4066–4076, 2017.
- Loftus, Joshua R., Chris Russell, Matt J. Kusner, and Ricardo Silva. "Causal reasoning for algorithmic fairness." arXiv:1805.05859, 2018.
- Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "'Why should I trust you?' Explaining the predictions of any classifier." In KDD, pp. 1135–1144, 2016.
- Leslie, David. "Understanding artificial intelligence ethics and safety." arXiv:1906.05684, 2019.
- Lundberg, Scott M., and Su-In Lee. "A unified approach to interpreting model predictions." In NeurIPS, pp. 4765–4774, 2017.
- Thiebes, Scott, Sebastian Lins, and Ali Sunyaev. "Trustworthy artificial intelligence." Electronic Markets, pp. 1–18, 2020.
- Singh, Richa, Mayank Vatsa, and Nalini Ratha. "Trustworthy AI." In 8th ACM IKDD CODS and 26th COMAD, pp. 449–453, 2021.
- Carvalho, Diogo V., Eduardo M. Pereira, and Jaime S. Cardoso. "Machine learning interpretability: A survey on methods and metrics." Electronics 8(8), 832, 2019.
- Yao, Liuyi, Zhixuan Chu, Sheng Li, Yaliang Li, Jing Gao, and Aidong Zhang. "A Survey on Causal Inference." arXiv:2002.02770, 2020.
- Chakraborty, Anirban, Manaar Alam, Vishal Dey, Anupam Chattopadhyay, and Debdeep Mukhopadhyay. "Adversarial attacks and defences: A survey." arXiv:1810.00069, 2018.
- Tjoa, E., and C. Guan. "A Survey on Explainable Artificial Intelligence (XAI): Toward Medical XAI." IEEE Transactions on Neural Networks and Learning Systems, 2020.
- Papernot, Nicolas, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. "Distillation as a defense to adversarial perturbations against deep neural networks." In IEEE Symposium on SP, pp. 582–597. IEEE, 2016.
- Smith, Peter K., Jess Mahdavi, Manuel Carvalho, Sonja Fisher, Shanette Russell, and Neil Tippett. "Cyberbullying: Its nature and impact in secondary school pupils." Journal of Child Psychology and Psychiatry 49(4), pp. 376–385, 2008.
- Dani, Harsh, Jundong Li, and Huan Liu. "Sentiment informed cyberbullying detection in social media." In ECML PKDD, pp. 52–67. Springer, 2017.
- Mokhberian, Negar, Andrés Abeliuk, Patrick Cummings, and Kristina Lerman. "Moral Framing and Ideological Bias of News." In SocInfo, pp. 206–219. Springer, 2020.
- Lee, Nicol Turner, Paul Resnick, and Genie Barton. "Algorithmic bias detection and mitigation: Best practices and policies to reduce consumer harms." Center for Technology Innovation, Brookings, 2019.
- McConnell, Elizabeth H., and Amie R. Scheidegger. "Race and speeding citations: Comparing speeding citations issued by air traffic officers with those issued by ground traffic officers." In ACJS, Washington, DC, 2001.
- Lange, James E., Kenneth O. Blackman, and Mark B. Johnson. "Speed violation survey of the New Jersey Turnpike: Final report." Public Services Research Institute, 2001.
- Becker, Gary S. "Nobel lecture: The economic way of looking at behavior." Journal of Political Economy 101(3), pp. 385–409, 1993.
- Ayres, Ian. "Testing for discrimination and the problem of 'included variable bias'." Mimeo, Yale Law School, 2010.
- Fu, Runshan, Yan Huang, and Param Vir Singh. "AI and Algorithmic Bias: Source, Detection, Mitigation and Implications." Working paper (July 26, 2020), 2020.
- Huang, Qianjia, Vivek Kumar Singh, and Pradeep Kumar Atrey. "Cyberbullying detection using social and textual analysis." In SAM, pp. 3–6, 2014.
- Cheng, Lu, Ruocheng Guo, Yasin Silva, Deborah Hall, and Huan Liu. "Hierarchical attention networks for cyberbullying detection on the Instagram social network." In SDM, pp. 235–243. SIAM, 2019.
- Cheng, Lu, Jundong Li, Yasin N. Silva, Deborah L. Hall, and Huan Liu. "XBully: Cyberbullying detection within a multi-modal context." In WSDM, pp. 339–347, 2019.
- Cheng, Lu, Ruocheng Guo, Yasin N. Silva, Deborah Hall, and Huan Liu. "Modeling temporal patterns of cyberbullying detection with hierarchical attention networks." ACM/IMS Transactions on Data Science 2(2), pp. 1–23, 2021.
- Xu, Jun-Ming, Kwang-Sung Jun, Xiaojin Zhu, and Amy Bellmore. "Learning from bullying traces in social media." In NAACL HLT, pp. 656–666. ACL, 2012.
- Mirshghallah, Fatemehsadat, Mohammadkazem Taram, Praneeth Vepakomma, Abhishek Singh, Ramesh Raskar, and Hadi Esmaeilzadeh. "Privacy in Deep Learning: A Survey." arXiv:2004.12254, 2020.
- Boulemtafes, Amine, Abdelouahid Derhab, and Yacine Challal. "A review of privacy-preserving techniques for deep learning." Neurocomputing 384, pp. 21–45, 2020.
- Hitaj, Briland, Giuseppe Ateniese, and Fernando Perez-Cruz. "Deep models under the GAN: Information leakage from collaborative deep learning." In CCS, pp. 603–618, 2017.
- Dalvi, Nilesh, Pedro Domingos, Sumit Sanghai, and Deepak Verma. "Adversarial classification." In KDD, pp. 99–108, 2004.
- Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative adversarial nets." In NeurIPS, pp. 2672–2680, 2014.
- Barreno, Marco, Blaine Nelson, Russell Sears, Anthony D. Joseph, and J. Doug Tygar. "Can machine learning be secure?" In ASIACCS, pp. 16–25, 2006.
- Fredrikson, Matthew, Eric Lantz, Somesh Jha, Simon Lin, David Page, and Thomas Ristenpart. "Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing." In USENIX Security, pp. 17–32, 2014.
- Shokri, Reza, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. "Membership inference attacks against machine learning models." In 2017 IEEE Symposium on SP, pp. 3–18. IEEE, 2017.
- Ateniese, Giuseppe, Luigi V. Mancini, Angelo Spognardi, Antonio Villani, Domenico Vitali, and Giovanni Felici. "Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers." International Journal of Security and Networks 10(3), pp. 137–150, 2015.
- Tramèr, Florian, Fan Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. "Stealing machine learning models via prediction APIs." In USENIX Security, pp. 601–618, 2016.
- Fredrikson, Matt, Somesh Jha, and Thomas Ristenpart. "Model inversion attacks that exploit confidence information and basic countermeasures." In CCS, pp. 1322–1333, 2015.
- Yuan, Xiaoyong, Pan He, Qile Zhu, and Xiaolin Li. "Adversarial examples: Attacks and defenses for deep learning." IEEE Transactions on Neural Networks and Learning Systems 30(9), pp. 2805–2824, 2019.
- Papernot, Nicolas, Patrick McDaniel, Arunesh Sinha, and Michael Wellman. "Towards the science of security and privacy in machine learning." arXiv:1611.03814, 2016.
- Akhtar, Zahid, and Dipankar Dasgupta. "A brief survey of Adversarial Machine Learning and Defense Strategies." 2019.
- Zhou, Li. "A survey on contextual multi-armed bandits." arXiv:1508.03326, 2015.
- Caton, Simon, and Christian Haas. "Fairness in Machine Learning: A Survey." arXiv:2010.04053, 2020.
- Carroll, Archie B. "The pyramid of corporate social responsibility: Toward the moral management of organizational stakeholders." Business Horizons 34(4), pp. 39–48, 1991.
- Bhatt, Umang, Javier Antorán, Yunfeng Zhang, Q. Vera Liao, Prasanna Sattigeri, Riccardo Fogliato, Gabrielle Gauthier Melancon, Ranganath Krishnan, Jason Stanley, Omesh Tickoo, et al. "Uncertainty as a form of transparency: Measuring, communicating, and using uncertainty." arXiv:2011.07586, 2020.
- Dhurandhar, Amit, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Paishun Ting, Karthikeyan Shanmugam, and Payel Das. "Explanations based on the missing: Towards contrastive explanations with pertinent negatives." In NeurIPS 31, pp. 592–603, 2018.
- Varshney, Kush R., and Homa Alemzadeh. "On the safety of machine learning: Cyber-physical systems, decision sciences, and data products." Big Data 5(3), pp. 246–255, 2017.
- Sattigeri, Prasanna, Samuel C. Hoffman, Vijil Chenthamarakshan, and Kush R. Varshney. "Fairness GAN: Generating datasets with fairness properties using a generative adversarial network." IBM Journal of Research and Development 63(4/5), 2019.
- Wei, Dennis, Karthikeyan Natesan Ramamurthy, and Flavio du Pin Calmon. "Optimized score transformation for fair classification." In AISTATS, 2020.
- Abdalla, Mohamed, and Moustafa Abdalla. "The Grey Hoodie Project: Big Tobacco, Big Tech, and the threat on academic integrity." arXiv:2009.13676, 2020.


arXiv:2101.02032v5 [cs.CY]
License: cc-by-4.0
