Introduction
Technology developers, researchers, policymakers, and others have identified the design and development process of artificial intelligence (AI) systems as a site for interventions to promote more ethical and just ends for AI systems. Recognizing this opportunity, researchers, practitioners, and activists have created a plethora of tools, resources, guides, and kits—of which the dominant paradigm is a “toolkit”—to promote ethics in AI design and development. Toolkits help technology practitioners and other stakeholders surface, discuss, or address ethical issues in their work. However, as the field appears to coalesce around this paradigm, it is critical to consider how these toolkits help to define and shape that work. Technologies that create standards (such as widely adopted toolkits) shape how people understand and interact with the world.

Prior research in CSCW and related fields has advanced our understanding of the work required to implement AI ethics principles in practice. In addition, prior work in CSCW has examined the politics of tools and other artifacts designed to support the work of pursuing values and ethics, such as security, privacy, and UX design. Previous reviews of AI ethics and fairness toolkits have primarily focused on their usability and functionality, or on evaluating their efficacy in addressing ethical issues. In this paper, we contribute to these bodies of research by taking a more critical approach to understand how AI ethics toolkits, like all tools, enact values and assumptions about what it means to do the work of ethics.

We start from the basis that simply creating toolkits will not be sufficient to address ethical issues. Toolkits must be adopted and used in practice within specific organizational contexts; yet, as prior research has identified, adopting AI ethics tools and processes within organizational contexts presents challenges beyond usability and functionality. Therefore, by understanding how toolkits envision the work of AI ethics—particularly how those work practices may align (or not) with the organizational contexts in which they may be used—we may better identify opportunities to improve the design of toolkits and identify instances where additional processes or artifacts beyond toolkits may be useful. To investigate this, we ask:
What are the discourses of ethics that ethical AI toolkits draw on to legitimize their use?
Who do the toolkits imagine as doing the work of addressing ethics in AI?
What do toolkits imagine to be the specific work practices of addressing ethics in AI?
To do this, we compiled and qualitatively coded a corpus of 27 AI ethics toolkits (broadly construed) to identify the discourses about ethics, the imagined users of the toolkits, and the work practices the toolkits envision and support. We found that AI ethics toolkits largely frame the work of AI ethics as technical work for individual technical practitioners, even as those same toolkits call for engaging broader sets of stakeholders to grapple with social aspects of AI ethics. In addition, we find that toolkits do not contend with the organizational, labor, and political implications of AI ethics work in practice. In general, we found gaps between the types of stakeholders and work practices the toolkits call for and the support they provide. Despite framing ethics and fairness as sociotechnical issues that require diverse stakeholder involvement and engagement, many of the toolkits focused on technical approaches for individual technical practitioners to undertake. With few exceptions, toolkits lacked guidance on how to involve more diverse stakeholders or how to navigate organizational power dynamics when addressing AI ethics.
We provide recommendations for designers of AI ethics toolkits—both future and existing—to (1) embrace the non-technical dimensions of AI ethics work; (2) support the work of engaging with stakeholders[Here, we use the term “stakeholder” expansively, to include potential users of the toolkits, others who may be part of the AI design, development, and deployment process, and other direct and indirect stakeholders who may be impacted by AI systems. We take this expansive approach following Lucy Suchman’s work complicating the notion of the user, as well as Forlizzi and Zimmerman’s work calling for more attention to stakeholders beyond end users. In cases where we specifically mean the users of the toolkit, we use the term “user.”] from non-technical backgrounds; and (3) structure the work of AI ethics as a problem for collective action. We end with a discussion of how we, as a research community, can foster the design of toolkits that achieve these goals, and we grapple with how we might create metaphors and formats beyond toolkits that resist the solutionism[Although we provide suggestions for how to improve the design of AI ethics toolkits, we are wary of wholesale endorsing this form, as it may lead towards a technosolutionist approach. Nonetheless, the toolkit is the dominant paradigm for resources to support AI ethics in practice. As toolkits are widely used, we believe there is value in exploring how they may be improved, following the “practical turn” of values in design research, while simultaneously grappling with their limitations.] prevalent in today’s resources.
Background
Toolkits
As a genre
What sort of thing is a toolkit? At their core, toolkits are curated collections of tools and materials. Examples abound: do-it-yourself construction toolkits; first aid kits; traveling salesman kits; and research toolkits for conducting participatory development efforts in rural communities, among many other examples. If we view them as a genre of communication, we can see how their design choices structure their users’ actions and interactions by conveying expectations for how they might be used. As Mattern has argued, toolkits make particular claims about the world through their design—they construct an imagined user, make an implicit argument about what forms of knowledge matter, and suggest visions for the way the world should be. As a genre of communication, toolkits suggest a set of practices in a commonly recognized form; they formalize complex processes, but in so doing, they may flatten nuance and suggest that the tools to solve complex problems lie within the confines of the kit. Although artifacts can make certain practices legible, understandable, and knowable across different contexts, they can also abstract away from locally situated practices. Moreover, toolkits work to configure what Goodwin calls professional vision: “socially organized ways of seeing and understanding events that are answerable to the distinctive interests of a particular social group”. This professional vision has political implications: in Goodwin’s analysis, U.S. policing creates “suspects” to whom “use of force” can be applied; it is thus critical to examine how toolkits may configure the professional vision of AI practitioners working on ethics.
In AI ethics
In light of AI practitioners’ needs for support in addressing the ethical dimensions of AI, technology companies, researchers publishing at CSCW, FAccT, CHI, and other venues, and other groups have developed numerous tools and resources to support that work, with many such resources taking the form of toolkits. Several papers have performed systematic meta-reviews and empirical analyses of AI ethics toolkits. For instance, one line of research performs descriptive analyses of AI ethics toolkits, including work identifying stakeholder types common across toolkits and the stages in the organizational lifecycle at which various toolkits are applied, work proposing a typology of AI ethics approaches synthesized from a variety of toolkits, and an analysis of 77 AI ethics toolkits finding that many lack instructions or training to facilitate adoption. In addition, others have conducted more empirical examinations of toolkits, including a normative evaluation of six open source fairness toolkits that used surveys and interviews with practitioners to understand the strengths and weaknesses of these tools, work conducting simulated ethics scenarios with ML practitioners and observing their experience using various ethics toolkits to inform recommendations for their design, and work exploring how practitioners use toolkits in their AI ethics work in practice.
In technology fields other than AI ethics, others have studied how design toolkits shape work practices. For instance, prior work identifies how design toolkits operationalize ethics, identify their audience, and embody specific theories of change. An analysis of cybersecurity toolkits reveals a complex set of “differentially” vulnerable persons, all attempting to achieve security for their socially situated needs. Building on prior empirical work evaluating the functionality and usability of AI ethics toolkits, we take a critical approach to understand the work practices that toolkits envision for their imagined users, and how those work practices might be enacted in particular sites of technology production. In other words, we focus our analysis on how toolkits help configure the organizational practice of AI ethics.
AI Ethics in Organizational Practice
As the field of AI ethics has moved from developing high-level principles to operationalizing those principles in particular sets of practices, prior research has identified the crucial role that social and organizational dynamics play in whether and how those practices are enacted in the organizational contexts where AI systems are developed. Substantial prior work has identified the crucial role of organizational dynamics (e.g., workplace politics, institutional norms, organizational culture) in shaping technology design practices more broadly. Prior ethnographic research on the work practices of data scientists has identified how technical decisions are never just technical—they are often contested and negotiated by multiple actors (e.g., data scientists, business team members, user researchers) within their situated contexts of work. This research discusses how such negotiations were shaped by the organizations’ business priorities, and how the culture and structure of those organizations legitimized technical knowledge over other types of knowledge and expertise, in ways that shaped how negotiations over technical design decisions were resolved. These dynamics are found across a range of technology practitioners, including user experience professionals, technical researchers, and privacy professionals.
Prior research on AI ethics work practices has similarly identified how the organizational contexts of AI development shape practitioners’ practices for addressing ethical concerns. Metcalf et al. explored the recent institutionalization of ethics in tech companies by tracing the roles and responsibilities of so-called “ethics owners”. In contrast with ethics owners who may hold responsibility for the ethical implications of AI, other work has identified how the social pressures on AI practitioners (e.g., data scientists, ML engineers, AI product managers) to ship products on rapid timelines disincentivized them from raising concerns about potential ethical issues. Taking a wider view, researchers have discussed how AI development suffers from misaligned incentives and a lack of organizational accountability structures to support proactive anticipation of, and work to address, ethical AI issues. However, as resources to support AI ethics work have proliferated—including AI ethics toolkits—it is not clear to what extent the designers of those resources have learned the lessons of this research on how organizational dynamics may shape AI ethics work in practice.
Methods
Researchers’ positionality
The three authors share an interest in issues related to fairness and ethics in AI and ML systems, and have formal training in human-computer interaction and information studies, but also draw on interdisciplinary research fields studying the intersections of technology and society. All three authors are male, live in the United States, and work at academic and industry research institutions. One author’s prior research is situated in values in design, studying the practices used by user experience and other technology professionals to address ethical issues in their work, including the organizational power dynamics involved in these practices. Another author’s prior work has focused on how AI practitioners conceptualize fairness and address it in their work practices. He has conducted fairness research with AI practitioners, has contributed to multiple resources for fairness in AI, and has worked on fairness in AI at large technology companies. The third author has built course materials to teach undergraduate and graduate students how to identify and ameliorate bias in machine learning algorithms and has reflected on the ways that students are not exposed to fairness in technical detail during their coursework.
The corpus we developed may have been shaped by our positionality as researchers in academia and industry living in the U.S. and conducting the search in English. Our prior research with technology practitioners led us to focus on the artifact of the “toolkit,” which we have encountered in our prior work, although we recognize that this focus may obscure other artifacts and forms of action that are currently in use but that did not fit our conception of a toolkit. Furthermore, our familiarity with gaps between the corporate rhetoric of ethical action and actual practices related to ethical action led us to focus our research questions and analysis on highlighting potential gaps between the rhetoric or imaginaries embedded in toolkits and the practices or tensions we are familiar with from our prior work and experiences with practitioners. This framing is one particular lens with which to understand these artifacts, although other lenses may provide additional insights.
Corpus development
We conducted a review of existing ethics toolkits, curated to explore the breadth of ways that ethical issues are portrayed in relation to developing AI systems. We began by conducting a broad search for such artifacts in May-June 2021. We searched in two ways. First, we looked at references from recent research papers from CSCW, FAccT, and CHI that survey ethical toolkits. Second, following prior work, we emulated the position of a practitioner looking for ethical toolkits and conducted a range of Google searches for artifacts using the terms: “AI ethics toolkit,” “AI values toolkit,” “AI fairness toolkit,” “ethics design toolkit,” “values design toolkit.” Several search results provided artifacts such as blog posts or lists of other toolkits, and many toolkits appeared in results from multiple search terms.[Although not all toolkits specifically focused on AI (some focused on “algorithms” or “design”), their content and their inclusion in search results made it reasonably likely that a practitioner would consult the resource in deciding how to enact AI ethics.] We shared and discussed these resources with each other to discuss what might (not) be considered a toolkit (for instance, we decided to exclude ethical oaths or compilations of tools).[Note that the term toolkit, as used in this paper, is an analytical category chosen by the researchers to search for and describe the artifacts being studied. Not all the artifacts we analyzed explicitly described themselves using the term toolkit. See the Appendix for more details about the toolkits.] Although we broadly view toolkits as curated collections of tools and materials, we largely take an inductive approach to understanding what toolkits purport to be. From these search processes, we initially identified 57 unique candidate toolkits for analysis.
Our goal was to identify a subset of toolkits for deeper qualitative analysis in order to sample a variety of types of toolkits (rather than attempt to create an exhaustive or statistically representative sample). After reading through the toolkits, we discussed potential dimensions of variation, including: the source(s) of the toolkit (e.g., academia, industry, etc), the intended audience or user, form factor(s) of the toolkit and any guidance it provided (e.g., code, research papers, documentation, case studies, activity instructions, etc.), and its stated goal(s) or purpose(s). We also used the following criteria to narrow the corpus for deeper qualitative analysis:
- The toolkit’s audience should be a stakeholder related to the design, deployment, or use of AI systems. This led us to exclude toolkits such as Shen et al.’s value cards, designed primarily for use in a student or educational setting, but not to exclude toolkits intended to be used by community advocates.
- We excluded five artifacts that focused on non-AI systems, and four designed to be used in classroom settings.
- The toolkit should provide specific guidance or actionable items to its audience, which could be technical, organizational, or social actions. Artifacts that provided lists of other toolkits or only provided informational materials were excluded (e.g., a blog post advocating for greater use of value-sensitive design).
- We excluded five artifacts that were primarily informational or advocacy materials, four where we could not access enough information, such as paywalled services, and two that focused on professional education activities.
- Given our focus on practice, the toolkit should have some indication of use (by stakeholders either internal or external to companies). Although we are unable to validate the extent to which each toolkit has been adopted, we used a set of proxies to estimate which toolkits are likely to have been used by practitioners, including whether a toolkit appeared in practitioner-created lists of resources, its search result rankings, or (for open source code toolkits) indications of community use or contributions. One author also works in an industry institution and was able to provide further insight into toolkit usage by industry teams. This criterion excluded some toolkits that were created as part of academic papers but did not seem to be more broadly used by practitioners at the time of sampling, such as FairSight.
- We excluded seven artifacts that seemed to have low use, and two artifacts that were primarily academic research papers.
- In addition, due to the authors’ language limitations, we excluded one toolkit not in English.
We independently reviewed the toolkits for inclusion, exclusion, or discussion. As a group, we discussed toolkits that we either marked for discussion or that we rated differently. To resolve disagreements, we decided to aim for variation along multiple dimensions (a toolkit that overlapped substantially with an already included toolkit was less likely to be included). From the 57 candidates, 30 were excluded. The final corpus includes 27 toolkits, which are summarized in the Corpus Description section below and fully listed in the Appendix.
Corpus Analysis
In the first round of our analysis, we conducted an initial coding of the 27 toolkits based on the following dimensions: the source(s) of the toolkit (e.g., academia or industry), the intended audience or user, its stated goal(s), and references to the ML pipeline.[Although many of these were explicitly stated in the toolkits’ documentation, some required interpretive coding. We resolved all disagreements through discussion amongst all three authors.] We used the results of this initial coding to inform our discussions of which toolkits to include in the corpus, as well as to inform our second round of analysis. We then began a second round of more open-ended inductive qualitative analysis based on our research questions. From reading through the toolkits, the authors discussed potential emerging themes. These initial themes included: what work do toolkits imagine is needed to address AI ethics; who do toolkits describe as doing the work of AI ethics; how does that compare to prior research about enacting AI ethics work in practice; what types of guidance are provided in toolkits; how do toolkits refer to the organizational contexts where they may be used; how do toolkits conceptualize social values (such as fairness or inclusion); when in or beyond the design process do the toolkits suggest they should be used; the toolkits’ different form factors; what social or technical background knowledge might be required to understand or use the toolkit; and whether toolkits describe any risks or limitations associated with their use. Our open-ended exploration of these themes helped us refine our research questions (to those presented in the Introduction).
Based on these themes, we decided to ask the following questions of each of the toolkits to further our analysis:
- What language does the toolkit use to describe values and ethics?
- What does the toolkit say about the users and other stakeholders of the AI systems to whom the toolkit aims its attention?
- What type of work is needed to enact the toolkit’s guidance in practice?
- What does the toolkit say about the organizational context in which workers must apply the toolkit?
Each author read closely through one third of the toolkits, found textual examples that addressed each of these questions, and posted those examples onto sticky notes in an online whiteboard. Collectively, all the authors conducted thematic analysis and affinity diagramming on the online whiteboard, inductively clustering examples into higher-level themes, which we report on in the findings section.
Corpus Description
We briefly describe our corpus of 27 toolkits based on our first round of analysis.[Multiple codes could be assigned to each toolkit, so the counts may sum to more than 27.] A full listing of toolkits, including details of our coding results, is provided in the Appendix. The toolkit authors include: technology companies (16 toolkits), university centers and academic researchers (6), non-profit organizations or institutes (6), open source communities (2), design agencies (2), a government agency (1), and an individual tech worker (1).
The toolkits’ form factors vary greatly as well. Many are technical in nature, such as open-source code (11 toolkits), proprietary code (1), documentation (12), tutorials (2), a software product (1), or a web-based tool (1). Other common forms include exercise or activity instructions (7), worksheets (5), guides or manuals (5), frameworks or guidelines (2), checklists (2), or cards (2). Several include informational websites or reading materials (4). Considering the toolkits’ audiences, most are targeted towards technical audiences such as developers (6 toolkits), data scientists (6), designers (5), technology professionals or builders (3), implementation or product teams (3), analysts (2), or UX teams (1). Some are aimed at different levels within organizations, including: managers or product/project managers (2), executive leadership (1), internal stakeholders (1), team members (1), or organizations broadly (1). Some toolkits’ audiences include people outside of technology companies, including: policymakers or government leaders (3), advocates (3), software clients or customers (1), vendors (1), civil society organizations (1), community groups (1), and users (1). We elaborate more on the toolkits’ intended audiences in the findings for RQ2 below.
Findings
We begin our findings with a description of the language toolkits use to describe and frame the work of AI ethics (RQ1). We then discuss the audiences envisioned to use the toolkits (RQ2); and close with what the toolkits envision to be the work of AI ethics (RQ3).
Language, framing, and discourses of ethics (RQ1)
Motivating Ethics: Harms, Risks, Opportunities, and Scale
We first look at how the toolkits motivate their use. Often, they articulate a problem that the toolkit will help address. One way of articulating a problem is identifying how AI systems can have effects that harm people. In such cases, toolkits motivate ethical problems by highlighting harms to people outside the design and development process—a group that Pfaffenberger terms the “impact constituency,” the “individuals, groups, and institutions who lose as a technology diffuses throughout society”. For instance, Fairlearn describes unfairness “in terms of its impact on people — i.e., in terms of harms — and not in terms of specific causes, such as societal biases, or in terms of intent, such as prejudice” [itm-t5-fairlearn]. Other toolkits gesture towards the “impact” [itm-t2-modelcards] or “unintended consequences” [itm-t9-aiethicscards] of systems.
Conversely, other toolkits frame problems by articulating how AI systems can present risks to the organizations developing or deploying them. They highlight potential business, financial, or reputational risks, or relate AI ethics to issues of corporate risk management more broadly. The Ethics & Algorithms toolkit, aimed at governments and organizations that are procuring and deploying AI systems, describes itself as “A risk management framework for governments (and other people too!) to approach ethical issues.” [itm-t7-ethicsandalgorithms]. Other toolkits suggest that they can help manage business risks, in part by generating governance and compliance reports. In contrast with the language of harms, which focuses on people who are affected by AI systems (often by acknowledging historical harms that different groups have experienced), the language of risk is more forward facing, focusing on the potential for something to go wrong and how it might affect the organization developing or deploying the AI system—leading the organization to try to find ways to prepare contingencies for the possible negative futures it can foresee for itself.
Not all toolkits frame AI ethics as avoiding negative outcomes, however. The integrate.ai guide uses the term “opportunity,” framing AI ethics in terms of pursuing positive opportunities or outcomes. The guide argues that AI ethics can be part of initiatives “incentivizing risk professionals to act for quick business wins and showing business leaders why fairness and transparency are good for business” [itm-t16-responsibleai]. The IDEO AI Ethics cards (which in some sections also frames AI ethics in terms of harms to people) also discusses capturing positive potential, writing: “In order to have a truly positive impact, AI-powered technologies must be grounded in human needs and work to extend and enhance our capabilities, not replace them” [itm-t9-aiethicscards]. In these examples, AI ethics is framed as a way for businesses or the impact constituency to capture “upside” benefits of technology through design, development, use, and business practices.
Some toolkits imagine that the positive or negative impacts of AI technologies will occur at a global scale. This is evidenced by statements such as: “your [technology builders’] work is global. Designing AI to be trustworthy requires creating solutions that reflect ethical principles deeply rooted in important and timeless values.”[itm-t28-harmsmodeling]; or “Data systems and algorithms can be deployed at unprecedented scale and speed—and unintended consequences will affect people with that same scale and speed” [itm-t9-aiethicscards]. Framing ethics globally perhaps draws attention to potential non-obvious harms or risks that might occur, prompting toolkit users to consider broader and more diverse populations who interact with AI systems. At the same time, the language of AI ethics operating at a global scale—and thus addressable at a global scale—also suggests a shared universal definition of social values, or suggests that social values have universally shared or similar impacts. This view of values as a stable, universal phenomenon has been critiqued by a range of scholars who discuss how social values are experienced in different ways, and are situated in local contexts and practices.
Sources of Legitimacy for Ethical Action
Toolkits’ use of language also claims authority from existing discourses about what constitutes an ethical problem and how problems should be addressed. These claims help connect the toolkits’ practices to a broader set of practices or frameworks that may be more widely accepted or understood, helping to legitimize the toolkits’ perspectives and practices, and providing a useful tactical alignment between the toolkit and existing organizational practices and resources.
Perhaps surprisingly, almost none of the toolkits provide an explicit discussion of philosophical ethical frameworks. (Although toolkits may implicitly draw on different ethical theories, our focus in this analysis is on the explicit theories, discourses, and frameworks that are referred to in the text of the toolkits and their supporting documentation). One exception to this is the Design Ethically toolkit, which provides a brief overview of deontological ethics and consequentialism, calling them “duty-based” and “results-based” [itm-t1-ethicskit]. Several toolkits adopt the language of “responsible innovation.” The Consequence Scanning toolkit was developed in the U.K. and calls itself “an Agile event for Responsible Innovators” [itm-t8-consequencescanning]. The integrate.ai toolkit is titled “Responsible AI in Consumer Enterprise” [itm-t16-responsibleai]. Fairlearn notes that its community consists of “responsible AI enthusiasts” [itm-t5-fairlearn]. Several toolkits in our corpus are listed as part of Microsoft’s “responsible AI” resources [t24-hax, itm-t27-communityjury, itm-t28-harmsmodeling]. There seems to be rhetorical power in aligning these toolkits with practices of responsible innovation, although questions about what people or groups the companies or toolkit users are responsible to are not explicitly discussed. More broadly, what it means to align toolkits with responsible innovation is itself an open question.[With origins in the rise of science and technology as a vector of political power in the 20th century, “responsible innovation” frames free enterprise as the agents of ethics, implicitly removing from frame policymakers, regulation, and other forms of popular governance or oversight. Future work should investigate more deeply what discursive work “responsible innovation” does in the context of AI ethics more broadly, particularly as it concerns private enterprise.]
Other toolkits look to external laws and standards as a legitimate basis for action; ethics is thus conceptualized as complying and acting in accordance with the law. Audit-AI, a tool that measures discriminatory patterns in data and machine learning predictions, explicitly cites U.S. labor regulations set by the Equal Employment Opportunity Commission (EEOC), writing that “According to the Uniform Guidelines on Employee Selection Procedures (UGESP; EEOC et al., 1978), all assessment tools should comply to fair standard of treatment for all protected groups” [itm-t19-auditai]. Audit-AI similarly draws on EEOC practices when choosing a p-value for statistical significance and choosing other metrics to define bias. This aligns the toolkit with a regulatory authority’s practices as the basis for ethics; however, it does not explicitly question whether this particular definition of fairness is applicable in contexts beyond the cultural and legal U.S. employment context.
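To make concrete the kind of regulation-aligned check this describes, the sketch below illustrates the EEOC’s “four-fifths” (80%) guideline, which compares selection rates between a protected group and a reference group. It is a minimal illustration in plain Python, not Audit-AI’s actual implementation; the function names, toy data, and omission of the accompanying statistical significance test are our own assumptions.

```python
# Minimal sketch of a four-fifths (80%) rule check, in the spirit of the
# EEOC-aligned bias tests described above. Illustrative only; not Audit-AI's
# actual implementation, and it omits the significance testing mentioned above.

def selection_rate(selected: int, total: int) -> float:
    """Fraction of applicants from a group who received a positive outcome."""
    return selected / total

def four_fifths_rule(protected_rate: float, reference_rate: float,
                     threshold: float = 0.8) -> bool:
    """Return True if the ratio of selection rates meets the 80% guideline."""
    return (protected_rate / reference_rate) >= threshold

# Example: 30 of 100 applicants selected from the protected group,
# 50 of 100 from the reference group.
protected = selection_rate(30, 100)   # 0.30
reference = selection_rate(50, 100)   # 0.50
print(four_fifths_rule(protected, reference))  # 0.30 / 0.50 = 0.6 -> False (adverse impact)
```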
Several toolkits frame ethics as upholding human rights principles, drawing on the UN Declaration of Human Rights. In our dataset, this occurred most prominently in Microsoft’s Harms Modeling Toolkit: “As a part of our company’s dedication to the protection of human rights, Microsoft forged a partnership with important stakeholders outside of our industry, including the United Nations (UN)” [itm-t28-harmsmodeling]. Supported by the UN’s Guiding Principles on Business and Human Rights, many large technology companies have made commitments to upholding and promoting human rights.[It has been argued that involving businesses in the human rights agenda can provide legitimacy and disseminate human rights norms in broader ways than nation states could alone. However, more recent research and commentary has been critical of technology companies’ commitments to human rights, with a 2019 UN report stating that big technology companies “operate in an almost human rights-free zone.”] This corresponds with prior research showing how human rights discourses provide one source of values for AI ethics guidelines more broadly. Many companies have existing resources or practices around human rights, such as human rights impact assessments. Framing AI ethics as a human rights issue may help tactically align the toolkit with these pre-existing initiatives and practices.
The envisioned users and other stakeholders for toolkits (RQ2)
This section asks: who is to do the work of AI ethics? The design and supporting documentation of toolkits presupposes a particular audience—or, as prior work describes it, they “summon” particular users through the types of shared understanding, background knowledge, and expertise they draw on and presume their users to have. The toolkits in our corpus mention several specific job categories internal to the organizations in question: software engineers; data scientists; members of cross-functional or cross-disciplinary teams; risk or internal governance teams; C-level executives; and board members. To a lesser extent, they mention designers. All of these categories of stakeholders pre-configure specific logics of labor and power in technology design. Toolkits that mention engineering and data science roles focus on ethics as the practical, humdrum work of creating engineering specifications and then meeting those specifications. (One toolkit, Deon, is a command-line utility for generating “ethics checklists” [itm-t12-deon].) For C-level executives and board members, toolkits frame ethics as both a business risk and a strategic differentiator in a crowded market. As the integrate.ai Responsible AI guide states, “Sustainable innovation means incentivizing risk professionals to act for quick business wins and showing business leaders why fairness and transparency are good for business.” [itm-t16-responsibleai]
Of course, stakeholders involved in AI design and development always already have their roles pre-configured by their job titles and organizational positionality; roles that the toolkits invoke and summon in their descriptions of potential toolkit users and other relevant stakeholders. They (for example, “business leaders”) are sensitized toward particular facets of ethics, which are made relevant to them through legible terms (for example, “risk”). As such, the nature of these internal (i.e., internal to the institutions developing AI) stakeholders’ participation in the work of ethics is bound to vary. On what terms do these internal stakeholders get to participate? Borrowing a phrase from prior scholarship, what are the “terms of inclusion” for each of these internal stakeholders?
Technically-oriented tooling (like Google’s What If tool [itm-t10-whatif]) envisions technical staff who contribute directly to production codebases. Although toolkits rarely address the organizational positioning of engineers (and their concerns) directly, they are specific about the mechanism of action and means of participation for these technical tools. One runs statistical tests, provides assurances around edge cases, and keeps track of statistical markers like disparate impact or the p% rule.
For social and human-centered practices, the terms of participation are less clear. The rhetoric of these toolkits is one of participation—between cross-functional teams (comprised of different roles), between C-suite executives and tech labor, and between stakeholders both internal and external to the organization. But no toolkit quite specifies how this engagement should be enacted. Methodological detail is scant, let alone acknowledgements of power differentials between workers and executives, or tech workers and external stakeholders. Even those rare toolkits that do acknowledge power as a factor—for example, what the Ethics & Algorithms toolkit lists as its “mitigation #1”—under-specify how this power should be dealt with.
“Mitigation 1. Effective community engagement is people-centered, partnerships-driven, and power-aware. Engagement with the community should be social (using existing social networks and connections), technical (skills, tools, and digital spaces), physical (commons), and on equal terms (aware of and accounting for power).” [itm-t7-ethicsandalgorithms]
Although this “mitigation” refers specifically to the need to be aware of power, to account for power, it offers no specific strategies to become aware, to do such “accounting.” Who does that work, and how?
This question brings us to the second broad category of stakeholders invoked by toolkits—stakeholders external to companies, described as “the community” above. This group variously includes clients, vendors, customers, users, civil society groups, journalists, advocacy groups, community members, and others impacted by AI systems. These stakeholders are imagined as outside the organization in question, sometimes by several degrees (although some, such as customers, clients, and vendors, may be variously entangled with the organization’s operations). For example, the Harms Modeling toolkit lists “non-customer stakeholders; direct and indirect stakeholders; marginalized populations” [itm-t28-harmsmodeling]. The Community Jury mentions “direct and indirect stakeholders impacted by the technology, representative of the diverse community in which the technology will be deployed” [itm-t27-communityjury]. Google’s Model Cards describes its artifacts as being for “everyone… experts and non-experts alike” [itm-t2-modelcards]. None of those toolkits, however, provide guidance on how to identify specific stakeholders, or how to engage with them once they have been identified. Indeed, the work these external stakeholders are imagined to do in these circumstances is under-specified. Their specific roles are under-imagined, relegated to the vague “raising concerns” or “providing input” from “on-the-ground perspectives.” We return to this point in the following section.
Work practices envisioned by toolkits (RQ3)
Much of the work of ethics as imagined by the toolkits focuses on technical work with ML models, in specific workflows and tooling suites, despite claims that fairness is sociotechnical (e.g., [itm-t5-fairlearn]). Many toolkits aimed at design and development teams call for engagement with stakeholders external to the team or company—and for such stakeholders to inform the team about potential ethical impacts, or for the AI design team to inform and communicate about ethical risks to stakeholders. However, there is little guidance provided by the tools on how to do this; these imagined roles for stakeholders beyond the development team are framed as informants or as recipients of information (without the ability to shape systems’ designs). Moreover, the technical orientation of many toolkits may preclude meaningful participation by non-technical stakeholders. As framed by the toolkits, the work of ethics is often imagined to be done by individual data scientists or ML teams, both of whom are imagined to have the power to influence key design decisions, without considering how organizational power dynamics may shape those processes. The imagined work of ethics here is largely individual self-reflection, or team discussions, but without a theory of change for how self-reflection or discussions might lead to meaningful organizational shifts.
Emphasis on technical work
Much of the work of ethics as imagined by the toolkits (and their designers) is focused on technical work with ML models, ML workflows, and ML tooling suites—with few exceptions, i.e., the Algorithmic Equity Toolkit [itm-t17-aekit] and others [itm-t8-consequencescanning, itm-t27-communityjury] (the forms of non-technical work that these few toolkits suggest are an area for further exploration, which we return to in our recommendations for design). This is in spite of the claims from some toolkits that “fairness is a sociotechnical problem” [itm-t5-fairlearn, itm-t27-communityjury]. In practice, this means that tools’ imagined (and suggested) uses are oriented around the ML lifecycle, often integrated into specific ML tool pipelines. For instance, Amazon’s SageMaker describes how it provides the ability to “measure biases that can occur during each stage of the ML lifecycle (data collection, model training and tuning, and monitoring of ML models deployed for inference)” [itm-t22-sagemaker]. Other toolkits go further, and are specifically designed to be implemented into particular ML programming tooling suites, such as Scala or Spark [itm-t18-lift], TensorFlow, or Google Cloud AI platform [itm-t10-whatif, itm-t20-tensorflow]. Some toolkits, albeit substantially fewer, provide recommendations for how toolkit users might make different choices about how to use the tool depending on where they are in their ML lifecycle [itm-t3-aif360].
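As an illustration of how tightly such tooling is coupled to the ML workflow, the sketch below shows the kind of group-wise metric disaggregation that code-based toolkits center, written against Fairlearn’s MetricFrame API. It is a minimal, hypothetical example with toy data, not an excerpt from any toolkit’s documentation.

```python
# Minimal sketch of a group-wise fairness assessment of the kind centered by
# code-based toolkits such as Fairlearn. Toy data stands in for a held-out
# test set; variable names are illustrative.
import pandas as pd
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

# True labels, model predictions, and a sensitive feature identifying
# each example's demographic group.
y_true = pd.Series([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = pd.Series([1, 0, 0, 1, 0, 1, 1, 0])
group = pd.Series(["a", "a", "a", "a", "b", "b", "b", "b"])

metric_frame = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)

print(metric_frame.by_group)      # metrics disaggregated by group
print(metric_frame.difference())  # largest between-group gap for each metric
```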
However, this emphasis on technical functionality offered by the toolkits, as well as the fact that many are designed to fit into ML modeling workflows and tooling suites suggests that non-technical stakeholders (whether they are non-technical workers involved in the design of AI systems, or stakeholders external to technology companies) may have difficulty using these toolkits to contribute to the work of ethical AI. At the very least, it implies that the intended users must have sufficient technical knowledge to understand how they would use the toolkit in their work—and further reinforces that the work of AI ethics is technical in nature, despite claims to the contrary [itm-t5-fairlearn, itm-t27-communityjury]. In this envisioned work, what role is there for designers and user researchers, for domain experts, or for people impacted by AI systems, in doing the work of AI ethics?
Calls to engage stakeholders, but little guidance on how
One of the key elements of AI ethics work suggested by toolkits involves engaging stakeholders external to the development team or their company (as discussed in the previous section). However, many toolkits lacked specific resources or approaches for how to do this engagement work. Toolkits often advocated for working with diverse groups of stakeholders to inform the development team about potential impacts of their systems, or to “seek more information from stakeholders that you identified as potentially experiencing harm” [itm-t28-harmsmodeling]. For some toolkits, this was envisioned to take the form of user research, recommending that teams “bring on a neutral user researcher to ensure everyone is heard” [itm-t27-communityjury] (what it means for a researcher to be “neutral” is left to the imagination), or to “help teams think through how people may interact with a design” [itm-t9-aiethicscards]. Others envisioned this information gathering as workshop sessions or discussions, as in the consequence scanning guide [itm-t8-consequencescanning] or community jury approach [itm-t27-communityjury].
Although some toolkits called for AI development teams to learn about the impacts of their systems from external stakeholders, a smaller subset were designed to support external stakeholders or groups in better understanding the impacts of AI. For instance, the Algorithmic Equity Toolkit was designed to help citizens and community groups “find out more about a specific automated decision system” by providing a set of questions for people to ask policymakers and technology vendors [itm-t17-aekit]. In addition, some developer-facing tools such as Model Cards were designed to provide information to “help advocacy groups better understand the impact of AI on their communities” [itm-t2-modelcards]. Despite these calls for engagement, toolkits lack concrete resources for precisely how to engage external stakeholders in either understanding the ethical impact of AI systems or involving them in the process of their design to support more ethical outcomes. Some toolkits explicitly name particular activities that would benefit from involving a wide range of stakeholders, such as the Harms Modeling toolkit: “You can complete this ideation activity individually, but ideally it is conducted as collaboration between developers, data scientists, designers, user researcher, business decision-makers, and other disciplines that are involved in building the technology” [itm-t28-harmsmodeling]. The stakeholders named by the Harms Modeling toolkit, however, are still “disciplines involved in building the technology” [itm-t28-harmsmodeling] and not, for instance, people who are harmed or otherwise impacted by the system outside of the company. Others, such as the Ethics & Algorithms toolkit, broaden the scope, recommending that “you will almost certainly need additional people to help - whether they are stakeholders, data analysts, information technology professionals, or representatives from a vendor that you are working with” [itm-t7-ethicsandalgorithms]. However, despite framing the activity as a “collaboration” [itm-t28-harmsmodeling] or “help” [itm-t7-ethicsandalgorithms], such toolkits provide little guidance for how to navigate the power dynamics or organizational politics involved in convening a diverse group to use the toolkit.
Theories of change
Ethical AI toolkits present different theories of change for how practitioners using the toolkits may effect change in the design, development, or deployment of AI/ML systems. For many toolkits, individuals within the organization are envisioned to be the catalysts for change via oaths [itm-t13-designethically] or “an individual exercise” [itm-t1-ethicskit] where individuals are prompted to “facilitat[e] your own reflective process” [itm-t1-ethicskit]. This approach is aligned with what Boyd and others have referred to as developing ethical sensitivity. Some toolkits explicitly articulated the belief that individual practitioners who are aware of possible ethical issues may be able to change the direction of the design process. For instance, “The goal of Deon is to push that conversation forward and provide concrete, actionable reminders to the developers that have influence over how data science gets done” [itm-t12-deon]. However, this belief that individual data scientists “have influence over how data science gets done” may be at odds with the reality of the organizational power structures that shape whether changes to AI design actually occur.
In other cases, the implicit theory of change involves product and development teams having conversations, which are then thought to lead to changes in design decisions towards more ethical design processes or outcomes. Some toolkits propose activities designed to “elicit conversation and encourage risk evaluation as a team” [itm-t7-ethicsandalgorithms]. Others start with individual ethical sensitivity, then move to team-level discussions, suggesting that the toolkit should “provoke discussion among good-faith actors who take their ethical responsibilities seriously” [itm-t12-deon]. Such group-level activities rely on having discussions with “good-faith actors,” presumably those who have developed some level of individual sensitivity to ethical issues. As one toolkit suggests for these group-level conversations, “There is a good chance someone else is having similar thoughts and these conversations will help align the team” [itm-t9-aiethicscards]. In this framing, the work of ethics involves finding like-minded individuals and getting to alignment within the team. However, this approach relies on the possibility of reaching alignment. As such, it may not provide sufficient support for individuals whose ethical views about AI may differ from their team. Individuals may feel social pressure from others on their team to stay silent, or not appear to be contrarian in the face of consensus from the rest of their team.
In fact, despite many toolkits’ claims to empower individual practitioners to raise issues, toolkits largely appeared not to address fundamental questions of worker power and collective action. For instance, the IDEO AI Ethics Cards state that “all team members should be empowered to trust their instincts and raise this Pause flag… at any point if a concept or feature does not feel human-centered” [itm-t9-aiethicscards], and similarly the Design Ethically Toolkit advises that “Having a variety of different thinkers who are all empowered to speak in the brainstorm session makes a world of a difference” [itm-t13-designethically]. However, the Design Ethically toolkit was the only example in our corpus that provided resources to support workplace organizing to meaningfully secure power for tech workers in driving change within their organizations.
Finally, other toolkits pose theories of change that suggest that pressure from external sources (i.e., media, public pressure or advocacy, or other civil society actors or organizations) may lead to changes in AI design and deployment (usually implied to be within corporate or government contexts). The Algorithmic Equity Kit in particular, is explicitly designed to provide resources for “community groups involved in advocacy campaigns” [itm-t17-aekit] to help support that advocacy work. Other toolkits, such as the Ethics & Algorithms Toolkit, focus on government agencies using AI that are “facing increasing pressure from the public, the media, and academic institutions to be more transparent and accountable about their use” [itm-t7-ethicsandalgorithms]. As such, the toolkit offers resources for government agencies to respond to such pressure and provide more transparency and accountability in their algorithmic systems.
More generally, many toolkits enact some form of solutionism—the belief that ethical issues that may arise in AI design can be solved with the right tool or process (typically the approach they propose). Some tools [e.g., itm-t2-modelcards, itm-t3-aif360, itm-t10-whatif, itm-t20-tensorflow] suggest that ethical values such as fairness can be achieved via technical tools alone: “If all fairness metrics are fair, The Bias Report will evaluate the current model as fair.” [itm-t6-aequitas]. Some toolkits (albeit fewer) do note the limitations of purely technical solutions to fundamentally sociotechnical problems [itm-t3-aif360, itm-t5-fairlearn, itm-t10-whatif], as in AIF360’s documentation, which states that “the metrics and algorithms in AIF360… clearly do not capture the full scope of fairness in all situations” [itm-t3-aif360]. As the What-If tool documentation states, “There is no one right [definition of fairness], but we probably can agree that humans, not computers, are the ones who should answer this question” [itm-t10-whatif]. However, even with these acknowledgements, the documentation goes on to note the important role that the toolkit plays in enabling humans to answer that question, as “What-If lets us play ‘what if’ with theories of fairness, see the trade-offs, and make the difficult decisions that only humans can make” [itm-t10-whatif].
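To illustrate the threshold-style logic being critiqued here, the sketch below shows a hypothetical “fairness audit” that declares a model fair whenever every group’s metric falls within a tolerance band around a reference group. It is our own illustrative reconstruction of the pattern, not Aequitas’s actual implementation; the function name, tolerance, and data are assumptions.

```python
# Hypothetical sketch of threshold-style "fairness audit" logic of the kind
# critiqued above. Not Aequitas's actual implementation; the tolerance band,
# function name, and example values are illustrative only.

def passes_audit(group_metric: dict[str, float], reference_group: str,
                 tolerance: float = 0.2) -> bool:
    """Declare the model 'fair' only if every group's metric falls within
    (1 - tolerance) to (1 + tolerance) times the reference group's value."""
    ref = group_metric[reference_group]
    return all(1 - tolerance <= value / ref <= 1 + tolerance
               for value in group_metric.values())

# Example: false positive rates per group, compared against group "a".
fpr = {"a": 0.10, "b": 0.13, "c": 0.22}
print(passes_audit(fpr, reference_group="a"))
# False: groups "b" (1.3x) and "c" (2.2x) fall outside the 0.8-1.2 band.
```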
These general framings suggest a particular flavor of solutionism, in which the work of ethics in AI design involves following a particular process (i.e., the one proposed by the toolkit). Toolkits propose ethical work practices that fit into existing development processes [e.g., itm-t12-deon], in ways that suggest that all that is needed is the addition of an activity or discussion prompt and not, for instance, fundamental changes to the corporate values systems or business models that may lead to harms from AI systems. Some toolkits were explicit that ethical AI work should not significantly disrupt existing corporate priorities, saying, “Business goals and ethics checks should guide technical choices; technical feasibility should influence scope and priorities; executives should set the right incentives and arbitrate stalemates” [itm-t16-responsibleai].
Discussion
Throughout these toolkits, we observed a mismatch between the imagined roles and work practices for ethics in AI and the support the toolkits provided for achieving those roles and practices. Specifically, despite rhetoric from the documentation of many toolkits that the work of ethics is _socio_technical, involving contributions from a variety of stakeholders, the actual design and functionality of the majority of toolkits involved technical work primarily for developers and data scientists. Toolkits suggested multi-stakeholder approaches to addressing ethical issues in sociotechnical ways, but most toolkits provided little scaffolding for the social dimensions of ethics or for engaging stakeholders from multiple (non-technical) backgrounds. These technosolutionist approaches to AI ethics suggest that AI ethics toolkits may act as a “technology of de-politicization”, sublimating sociopolitical considerations in favor of technical fixes. With few exceptions [e.g., itm-t17-aekit], the toolkits took a decontextualized approach to ethics, largely divorced from the sociopolitical nuance of what ethics might mean in the contexts in which AI systems may be deployed, or how ethical work practices might be enacted within the organizational contexts of the sites of AI production (e.g., technology companies). In such a decontextualized view of ethics, toolkit designers envision individual users who have the agency to make decisions about their design of AI systems, and who are not beholden to power dynamics within the workplace: organizational hierarchies, misaligned priorities, and incentives for ethical work practices—key considerations for the use of AI ethics toolkits, given the reality of business priorities and profit motives.
When toolkits did attend to how ethical work might fit within business processes, many of them leveraged discourses of business risk and responsible innovation to help motivate adoption of ethics tools and processes. These discourses may function tactically as a way to allow toolkits to tap into existing institutional processes and resources they may not otherwise have access to (for example, mechanisms for managing legal liability). However, in so doing, companies may sidestep questions of how logics of capital accumulation themselves shape the capacity for AI systems to exert harms and shape the sociotechnical imaginaries for what ethics might mean—or foreclose alternative ways of conceptualizing ethics. As a result, ethical concerns may be sublimated to the interests of capital. In the following sections, we unpack implications of our findings for AI ethics toolkit researchers and designers.
Reflections and Implications for Research
As the prior sections suggest, the content and guidance provided by toolkits, as well as the metaphor and format of “toolkits” as a predominant way to address AI ethics, constructs particular ways of seeing the world—what constitutes an ethical problem, who should be responsible for addressing those problems, and what are the legitimate practices for addressing them. We underscore this point by using the metaphor of “seeing like a toolkit,” to draw attention to two ideas.
First, although toolkits provide a useful format for sharing information and practices across boundaries and contexts, an over-reliance on toolkits may risk decontextualizing or abstracting away from the social and political contexts where AI systems are deployed and governed, and from the organizational contexts in which those toolkits may be used. Toolkits, by design, are intended to be portable objects usable across a variety of contexts—but as a result, ethical AI toolkits may act as a “device for decontextualizing”. This portability may allow toolkits to be more generalizable or scalable by “mediating between the local and the universal” in order to support their adoption and use across multiple contexts. However, the flattening of local distinctiveness in order to be more easily transportable across contexts brings with it particular risks for ethical AI. As Selbst et al. have written, efforts for fairness in AI run the risk of what they have referred to as “abstraction traps,” or abstracting away crucial elements of the social context in which AI systems are deployed and within which fairness and ethical considerations must be understood. As a result, toolkits that are explicitly designed to be decontextualized—both from the social context where AI systems will be deployed (and within which ethics must be understood) and from the organizational context in which those toolkits may be used—may inadvertently suggest to their users that either the context does not matter for the work of ethics, or that it is up to the toolkit user to do the work of _re_contextualizing, or translating the toolkit’s methods for their context of use and deployment. However, this is quite a burden for the toolkits to place on their users, particularly as the imagined users of many ethical AI toolkits appear to be largely technical practitioners who may not have the training or background to do such contextualization and translation work.
This pattern of decontextualization mirrors Scott’s concepts of legibility and simplification in statecraft. [Scott’s book Seeing Like a State informs the title of this paper.] In order to govern, the state employs techniques such as standardized measurement or systems of private property ownership to make heterogeneous local practices legible, but doing so also simplifies and standardizes understandings of social practices in ways that may not match local experiences. Similarly, to be legible to the communities of practice and organizational structures that seek to build systems at scale, toolkits render ethical practices in ways that are often simplified and that do not account for the heterogeneity of contextual experiences and on-the-ground practices of doing AI ethics, leaving users to do this difficult translation work.
Second, these toolkits represent a form of “professional vision” that may inadvertently promote a solutionist orientation to AI ethics. As Goodwin has argued, “professional vision” describes how the discursive practices of professional cultures shape how we see the world in socially situated and historically constituted ways. Similarly, Silbey’s work on industrial safety culture argues that disasters that are not spectacular or sudden—such as slow-acting oil leaks—are often ignored, “existing physically, but not in any organizationally cognizable form”. For ethics in AI, the discursive practices instantiated in our tools shape how the field sees the ethical terrain for action: what are the objects of concern, how might they be made legible or amenable to action, what resources might be marshalled to address them, and by whom. Likewise, problems left outside of toolkits’ purview may risk not being seen as legitimate ethical issues by practitioners.
The tools curated within a toolkit are intended to solve particular problems (here, problems related to the ethics of AI), but the metaphor of the toolkit itself may reinforce a solutionist framing, suggesting to users that ethical problems can, in fact, be solved by using the tools or processes therein—for instance, that AI systems can be “de-biased,” which they cannot be—rather than mitigating their potential for harm. This solutionist orientation is not limited to toolkits; indeed, Selbst et al. have written about the solutionist trap for fairness in sociotechnical systems more generally, but the genre of the toolkit may inadvertently reinforce the idea of ethics as a managerial exercise, or a technical solution to fundamentally contextual and contested challenges (cf.). This framing may inhibit investment (of time, attention, resources) in alternative approaches that do not fit within the confines of a toolkit’s solutionist orientation, or foreclose alternative theories of change (such as a focus on the political economy of AI development). It may also create false expectations (among practitioners using the toolkit as well as stakeholders and communities impacted by AI), potentially leading to frustration, resentment, and further harm when those expectations of solved problems are not met. Others have discussed how corporate discourses of “solving” ethical issues are often rooted in public relations goals or economic self-interest.
This is a broader issue for the field. Metcalf and Moss discuss how ethics in Silicon Valley is framed in part through the lenses of technological solutionism and market fundamentalism—the assumptions that an optimal set of tools, procedures, or criteria will lead to an ethical outcome, and that ethical solutions should be pursued within the boundaries of what the market finds profitable. These lenses miss the value of non-technical expertise and practices, as well as a broader array of potential ethical (if less profitable) alternatives. What do we lose when we fail to grapple with capital as a force shaping the ethical considerations of AI? We note that these critiques are not a call to abandon toolkits altogether, but rather an interrogation of what politics we might (unintentionally) embed when framing an AI ethics intervention as a “toolkit.” What political choices does one make when one creates a toolkit, and how can we make those choices more intentional? Although we find that AI ethics toolkits tend to focus on technical practices in ways that may be decontextualized from the wider social and political context, we are inspired by toolkits in other domains that explicitly engage questions of politics and power, for example, toolkits that serve as methods of participatory engagement, purposefully including broader communities to consider issues of justice.
We also consider the politics of choosing to make a “toolkit” rather than something else. We thus ask: what ways of “seeing” AI ethics do all toolkits miss? What new ways of seeing could produce new, practical interventions? New approaches might move beyond toolkits and look to other theories of change, such as political economy. However, we as authors note that our situatedness in particular debates in the West may occlude our sensitivity to alternative ethical frameworks. Indigenous notions of “making kin” could reveal radical new possibilities for what AI ethics could be, and for the processes by which it may be enacted. How can we, as a research community, make space for such alternatives? Following this problem-posing orientation, we do not offer solutions here, but instead pose these as questions for researchers, practitioners, and communities to address by developing alternatives to the dominant paradigm of the toolkit. Some promising examples include the People’s Guide to AI zine; J. Khadijah Abdurahman’s and We Be Imagining’s call for lighting “alternate beacons” to help “organize for different futures” for technology development; and the AI Now Institute’s series on a new lexicon offering narratives beyond those of the Global North for critically studying AI, among others. We call on the CSCW community and others (e.g., FAccT, CHI) to amplify and expand these efforts.
Recommendations for Toolkit Design
Practitioners will continue to require support in enacting ethics in AI, and toolkits are one potential approach to provide such support, as evidenced by their ongoing popularity. Although much of this paper has focused on a critical analysis of toolkits, we offer suggestions for toolkit design following the “practical turn” in values in design research—i.e., if we accept that toolkits can embody and promote particular social values, we might consider an additional (or alternative) set of values in the design of toolkits. We acknowledge that toolkits alone will not solve all the problems of addressing AI ethics, but they can nevertheless be improved to better consider the social and organizational contexts where they might be deployed.
Our findings suggest three concrete recommendations for improving toolkits’ potential to support the work of AI ethics. Toolkits should: (1) provide support for the non-technical dimensions of AI ethics work; (2) support the work of engaging with stakeholders from non-technical backgrounds; (3) structure the work of AI ethics as a problem for collective action.
Embrace the non-technical dimensions of ethics work
Despite emerging awareness that fairness is _socio_technical, the majority of toolkits provided resources to support technical work practices (although some toolkits called for their users to engage in other forms of work [e.g., itm-t5-fairlearn]). Supporting the non-technical dimensions of ethics work might entail resources for understanding the theories and concepts of ethics in non-technical ways, [Note that Fairlearn [itm-t5-fairlearn] has—since we conducted the data analysis for this paper—published resources in its user guide for understanding social science concepts such as construct validity as applied to fairness, and explanations of sociotechnical abstraction traps.] as well as resources drawing from the social sciences for understanding stakeholders’ situated experiences and perceptions of AI systems and their impacts. For instance, toolkit designers might incorporate methods from qualitative research, user research, or value-sensitive design (e.g.,), as some existing tools suggest (e.g., [itm-t27-communityjury]). Although some AI ethics education tools are beginning to be designed with these perspectives (e.g., Value Cards), fewer practitioner-oriented toolkits utilize them. As a precursor, practitioners may need support in identifying the stakeholders for their systems and use cases, in the contexts in which those systems are (or will be) deployed, including community members, data subjects, and others beyond the users, paying customers, or operators of a given AI system. Approaches such as stakeholder mapping from fields like Human-Computer Interaction may be useful here, and such resources could be incorporated into AI ethics toolkits.
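To make this more concrete, the sketch below shows one hypothetical way a toolkit might scaffold stakeholder identification beyond direct users and operators. The `Stakeholder` structure, the relationship categories, and the example entries are our own illustrative assumptions, not features of any toolkit in our corpus.

```python
# A minimal sketch of a stakeholder-mapping aid a toolkit could provide.
# The Stakeholder structure and the example entries are illustrative
# assumptions, not drawn from any existing AI ethics toolkit.
from dataclasses import dataclass, field

@dataclass
class Stakeholder:
    name: str                      # e.g., "loan applicants"
    relationship: str              # "direct user", "data subject", "impacted community", ...
    potential_harms: list = field(default_factory=list)
    consulted: bool = False        # has this group been engaged directly?

def unconsulted(stakeholders):
    """Return stakeholder groups who may be affected but have not yet been engaged."""
    return [s for s in stakeholders if not s.consulted]

stakeholders = [
    Stakeholder("loan officers", "direct user", ["automation bias"], consulted=True),
    Stakeholder("loan applicants", "data subject",
                ["wrongful denial", "disparate error rates"]),
    Stakeholder("applicants' families", "impacted community", ["downstream financial harm"]),
]

for s in unconsulted(stakeholders):
    print(f"Not yet engaged: {s.name} ({s.relationship}) - potential harms: {s.potential_harms}")
```

Even a lightweight structure like this shifts the prompt from "which users does the model serve?" to "who is affected and who has not yet been heard?", which is the kind of reframing the social-science methods above are designed to support.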
Support for engaging with stakeholders from non-technical backgrounds
Although many toolkits call for engaging stakeholders from different backgrounds and with different forms of expertise (internal stakeholders such as designers or business leaders; external stakeholders such as advocacy groups and policymakers), the toolkits themselves offer little support for how their users might bridge such disciplinary divides, further contributing to the mismatch between the rhetorical promise of toolkits and their current design. Toolkits should thus support this translational work. [Some emerging work is exploring the role of “boundary objects” to help practitioners align on key concepts and develop a shared language (e.g., the PAIR Symposium 2020, https://events.withgoogle.com/pair-symposium-2020/), although this work has not focused on the ethics of AI specifically.] This might entail, for instance, asking what fairness means to the various stakeholders implicated in ethical AI, or communicating the output of algorithmic impact assessments (e.g., various fairness metrics) in ways that non-technical stakeholders can understand and work with. The Algorithmic Equity Toolkit (whose design process is discussed in) tackles this challenge from the perspective of community members and groups, providing resources to these external stakeholders to support their advocacy work [itm-t17-aekit]. Meanwhile, recent research has explored how to engage non-technical stakeholders in discussions about tradeoffs in model performance, or in participatory AI design processes more generally, although such approaches have largely not been incorporated into toolkits (with a few recent exceptions). Moreover, approaches in which stakeholders impacted by AI conduct “crowd audits” of algorithmic harms have not yet made their way into the toolkits we analyzed, even though the results of such audits could be used to shape AI practitioners’ development practices.
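As one illustration of this translational work, the sketch below uses Fairlearn’s MetricFrame (one of the toolkits in our corpus) to disaggregate a metric by group and then renders a plain-language summary for discussion with non-technical stakeholders. The disaggregation API is Fairlearn’s; the narrative summary, the example data, and the framing of the output are hypothetical additions of ours, not functionality the toolkit provides.

```python
# Sketch: turning disaggregated fairness metrics into a plain-language summary
# for non-technical stakeholders. MetricFrame and selection_rate come from
# Fairlearn; the narrative rendering below is a hypothetical addition of ours.
from fairlearn.metrics import MetricFrame, selection_rate

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
group  = ["A", "A", "A", "B", "B", "B", "B", "A"]  # e.g., a self-reported demographic attribute

mf = MetricFrame(metrics={"selection rate": selection_rate},
                 y_true=y_true, y_pred=y_pred, sensitive_features=group)

rates = mf.by_group["selection rate"]
gap = rates.max() - rates.min()

# Plain-language summary intended as a discussion aid, not a verdict.
for grp, rate in rates.items():
    print(f"Group {grp}: the model selects {rate:.0%} of cases.")
print(f"The largest gap between groups is {gap:.0%}."
      " Whether this gap is acceptable is a judgment for stakeholders, not the toolkit.")
```

The point of such a rendering is not the arithmetic but the framing: the output names a gap without declaring the system “fair,” leaving the normative judgment to the stakeholders the paragraph above argues should be in the room.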
Structure the work of AI ethics as a problem for collective action
One question palpably missing from the toolkits we analyzed was: how do toolkits support stakeholders in grappling with the organizational dynamics involved in doing the work of ethics? Silbey has written about the “safety culture” promoted in other high-stakes industries (e.g., fossil fuel extraction), where the responsibility to avoid catastrophe is too often located in the behaviors and attitudes of individual actors—typically those with the least power in the organization—rather than in systemic processes or organizational oversight. To address this gap, toolkits could help practitioners communicate with organizational leadership and advocate for the need to engage in ethical AI work practices, or for additional time and resources to do this work. One form this might take is support for strategically aligning ethics discourses with business priorities and discourses (e.g., business risk, responsible innovation, corporate social responsibility). However, these discourses bring risks: the aims and values of ethical AI could be subverted by business priorities. For instance, prior work discusses how business priorities for AI deployment across market tiers may subvert practitioners’ goals for fairness work. Given the risk that such an approach might smuggle in business logics that subvert ethical aims (see the discussion of discourses above), toolkit designers might instead consider how to support toolkit users in becoming aware of the organizational power dynamics that may impact the work of ethics (e.g., through power mapping exercises), including identifying institutional levers they can pull to shape organizational norms and practices from the bottom up. In addition, toolkits should structure ethical AI as a problem for collective action among multiple groups of stakeholders, rather than as work for individual practitioners. This may involve supporting collective action by workers within tech companies; fostering communities of practice among professionals working on ethical AI across institutions (to share knowledge and best practices, as well as to shift professional norms and standards); or supporting collective efforts for ethical AI that span industry professionals designing AI and communities impacted by it. It might also involve supporting the organization of collective action in the workplace, such as unions, tactical walkouts, or other uses of workers’ labor power grounded in their role in technology production. Prior research found that technology professionals pursuing design justice sought project- and institutional-level tools and interventions rather than individual-level ones. However, few toolkits we analyzed (with the Design Ethically toolkit as a notable exception [itm-t13-designethically]) provide resources to inform and support practitioners about the role of collective action in ethical AI.
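To illustrate what a power-mapping exercise might look like if encoded as a lightweight toolkit resource, the sketch below arranges organizational actors by influence and support and pairs each with a discussion prompt. The fields, scales, example actors, and suggested tactics are hypothetical illustrations of ours, intended as a starting point for collective discussion rather than an existing toolkit feature or a prescription.

```python
# Sketch of a power-mapping worksheet encoded as data. The actors, scales,
# and suggested tactics are hypothetical illustrations, not an existing
# toolkit feature or a prescription.
from dataclasses import dataclass

@dataclass
class Actor:
    name: str
    influence: int   # 1 (low) to 5 (high): ability to change the decision
    support: int     # -2 (opposed) to +2 (supportive) of the ethics intervention

actors = [
    Actor("product VP", influence=5, support=0),
    Actor("legal/compliance team", influence=4, support=1),
    Actor("ML engineers on the team", influence=2, support=2),
    Actor("worker resource group", influence=1, support=2),
]

# A simple heuristic grouping that practitioners might discuss and revise collectively.
for a in sorted(actors, key=lambda a: -a.influence):
    if a.influence >= 4 and a.support <= 0:
        tactic = "build a case tied to existing organizational levers (e.g., risk processes)"
    elif a.influence >= 4:
        tactic = "ask to sponsor or resource the ethics work"
    else:
        tactic = "organize with them as allies to raise the issue collectively"
    print(f"{a.name}: influence={a.influence}, support={a.support} -> {tactic}")
```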
Limitations and Future Work
We examined a small subset of toolkits, which may not be representative of all AI ethics toolkits. Most of the toolkits we examined were from tech companies and academia, and we may thus have missed toolkits developed by nonprofits, civil society, or government agencies. Furthermore, the toolkits we examined largely skewed towards industry practitioners as the envisioned users (with some exceptions, e.g., [itm-t17-aekit]) and were largely intended to fit into AI development processes (as suggested by the large proportion of toolkits that were open source code). Future work should therefore explicitly target toolkits intended to be used by policymakers, civil society, or community stakeholders more generally. Recognizing that creating technical tools can re-inscribe the harms they seek to address (e.g.,), future work should also look beyond re-designing the politics of toolkits to investigate other forms of political action that consider and address the social and institutional aspects of technology development.
In addition, our corpus was built from search queries; searches using terms we did not include may surface toolkits that are not in our corpus. More broadly, our positionality has shaped how we approached this research, including the research questions we chose, the toolkits we identified, and how we coded and interpreted our data. As Sambasivan et al. (among others) have pointed out, AI ethics may mean different things in different cultural contexts, including relying on different legal frameworks and aiming towards fundamentally different outcomes. Our corpus is necessarily partial and reflective of our positionality and cultural context.
Conclusion
This paper investigates how AI ethics toolkits frame and embed particular visions of what it means to do the work of addressing ethics. Based on our findings, we recommend that designers of AI ethics toolkits better support the social dimensions of ethics work, provide support for engaging with diverse stakeholders, and frame AI ethics as a problem for collective action rather than individual practice. Toolkit development should be tied more closely to empirical research that studies the social, organizational, and technical work required to surface and address ethical issues. Creating tools or resources in formats that challenge the notion of the “toolkit” itself may open up the design space and foster new approaches to AI ethics. Although no single artifact will solve all AI ethics problems, intentionally diversifying the forms of work that such artifacts envision and support may enable more effective ethical interventions in the work practices of developers, designers, researchers, policymakers, and other stakeholders.
Thank you to Emma Lurie, Zoe Kahn, Ken Holstein, our colleagues at the UC Berkeley Center for Long-Term Cybersecurity and Microsoft Research, and the anonymous reviewers for their comments and feedback on this work.
Toolkit Listing and Analysis
Ethics Kit, http://ethicskit.org/tools.html
Model Cards, https://modelcards.withgoogle.com/about
AI Fairness 360, https://aif360.mybluemix.net/
InterpretML, https://github.com/interpretml/interpret
Fairlearn, https://fairlearn.github.io/
Aequitas, http://aequitas.dssg.io/
Ethics & Algorithms Toolkit, https://ethicstoolkit.ai/
Consequence Scanning Kit, https://www.doteveryone.org.uk/project/consequence-scanning/
AI Ethics Cards, https://www.ideo.com/post/ai-ethics-collaborative-activities-for-designers
What If Tool, https://pair-code.github.io/what-if-tool/
Digital Impact Toolkit, https://digitalimpact.io/toolkit/
Deon Ethics Checklist, http://deon.drivendata.org/
Design Ethically Toolkit, https://www.designethically.com/toolkit
Lime, https://github.com/marcotcr/lime
Weights and Biases, https://wandb.ai/site
Responsible AI in Consumer Enterprise, https://static1.squarespace.com/static/5d387c126be524000116bbdb/t/5d77e37092c6df3a5151c866/1568138185862/Ethics-of-artificial-intelligence.pdf
Algorithmic Equity Toolkit (AEKit), https://www.aclu-wa.org/AEKit
LinkedIn Fairness Toolkit (LiFT), https://github.com/linkedin/LiFT, https://engineering.linkedin.com/blog/2020/lift-addressing-bias-in-large-scale-ai-applications
Audit AI, https://github.com/pymetrics/audit-ai
TensorFlow Fairness Indicators, https://github.com/tensorflow/fairness-indicators
Judgment Call, https://docs.microsoft.com/en-us/azure/architecture/guide/responsible-innovation/judgmentcall
SageMaker Clarify, https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_processing/fairness_and_explainability/fairness_and_explainability.html
NLP CheckList, https://github.com/marcotcr/checklist
HAX Workbook and Playbook, https://www.microsoft.com/en-us/haxtoolkit/workbook/
Community Jury, https://docs.microsoft.com/en-us/azure/architecture/guide/responsible-innovation/community-jury/
Harms Modeling, https://docs.microsoft.com/en-us/azure/architecture/guide/responsible-innovation/harms-modeling/
Algorithmic Accountability Policy Toolkit, https://ainowinstitute.org/aap-toolkit.pdf
Received July 2022; revised October 2022; accepted January 2023.
Bibliography
Pfaffenberger, Bryan (1992). Technological Dramas. Science, Technology, & Human Values, 17(3), 282-312.
Le Dantec, Christopher A., Erika Shehan Poole, and Susan P. Wyche (2009). Values as Lived Experience: Evolving Value Sensitive Design in Support of Value Discovery. In Proceedings of the 27th International Conference on Human Factors in Computing Systems (CHI '09). ACM.
Houston, Lara, Steven J. Jackson, Daniela K. Rosner, Syed Ishtiaque Ahmed, Meg Young, and Laewoo Kang (2016). Values in Repair. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI '16), 1403-1414. ACM.
JafariNaimi (Parvin), Nassim, Lisa Nathan, and Ian Hargraves (2015). Values as Hypotheses: Design, Inquiry, and the Service of Values. Design Issues, 31(4), 91-104.
Shilton, Katie, Jes A. Koepfler, and Kenneth R. Fleischmann (2014). How to See Values in Social Computing: Methods for Studying Values Dimensions. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW '14), 426-435. ACM.
Yates, JoAnne, and Wanda J. Orlikowski (1992). Genres of Organizational Communication: A Structurational Approach to Studying Communication and Media. Academy of Management Review, 17(2), 299-326.
Shen, Hong, Wesley H. Deng, Aditi Chattopadhyay, Zhiwei Steven Wu, Xu Wang, and Haiyi Zhu (2021). Value Cards: An Educational Toolkit for Teaching Social Impacts of Machine Learning through Deliberation. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21), 850-861. ACM.
Shonhiwa, Mandla (2020). Human Values Matter: Why Value-Sensitive Design Should Be Part of Every UX Designer's Toolkit. UX Collective. https://uxdesign.cc/human-values-matter-why-value-sensitive-design-should-be-part-of-every-ux-designers-toolkit-e53ffe7ec436
Spitzberg, Danny, Kevin Shaw, Colin Angevine, Marissa Wilkins, M Strickland, Janel Yamashiro, Rhonda Adams, and Leah Lockhart (2020). Principles at Work: Applying "Design Justice" in Professionalized Workplaces. CSCW 2020 Workshop on Collective Organizing and Social Responsibility.
Metcalf, Jacob, Emanuel Moss, and danah boyd (2019). Owning Ethics: Corporate Logics, Silicon Valley, and the Institutionalization of Ethics. Social Research, 86(2), 449-476.
Khovanskaya, Vera, and Phoebe Sengers (2019). Data Rhetoric and Uneasy Alliances: Data Advocacy in US Labor History. In Proceedings of the 2019 Designing Interactive Systems Conference (DIS '19), 1391-1403. ACM.
Krafft, P. M., Meg Young, Michael Katell, Jennifer E. Lee, Shankar Narayan, Micah Epstein, Dharma Dailey, Bernease Herman, Aaron Tam, Vivian Guetler, Corinne Bintz, Daniella Raz, Pa Ousman Jobe, Franziska Putz, Brian Robick, and Bissan Barghouti (2021). An Action-Oriented AI Policy Toolkit for Technology Audits by Community Advocates and Activists. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21), 772-781. ACM.
Goodwin, Charles (1994). Professional Vision. American Anthropologist, 96(3), 606-633.
Lee, Michelle Seng Ah, and Jat Singh (2021). The Landscape and Gaps in Open Source Fairness Toolkits. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI '21), Article 699. ACM.
Richardson, Brianna, Jean Garcia-Gathright, Samuel F. Way, Jennifer Thom, and Henriette Cramer (2021). Towards Fairness in Practice: A Practitioner-Oriented Rubric for Evaluating Fair ML Toolkits. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI '21), Article 236. ACM.
Morley, Jessica, Luciano Floridi, Libby Kinsey, and Anat Elhalal (2021). From What to How: An Initial Review of Publicly Available AI Ethics Tools, Methods and Research to Translate Principles into Practices. In Ethics, Governance, and Policies in Artificial Intelligence, 153-183. Springer.
Ayling, Jacqui, and Adriane Chapman (2021). Putting AI Ethics to Work: Are the Tools Fit for Purpose? AI and Ethics, 2(3), 405-429.
Neff, Gina (2020). From Bad Users and Failed Uses to Responsible Technologies: A Call to Expand the AI Ethics Toolkit. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES '20), 5-6. ACM.
Chivukula, Shruthi Sai, Ziqing Li, Anne C. Pivonka, Jingning Chen, and Colin M. Gray (2021). Surveying the Landscape of Ethics-Focused Design Methods. arXiv preprint arXiv:2102.08909.
Pierce, James, Sarah Fox, Nick Merrill, and Richmond Wong (2018). Differential Vulnerabilities and a Diversity of Tactics: What Toolkits Teach Us about Cybersecurity. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW), 1-24.
Sambasivan, Nithya, Erin Arnesen, Ben Hutchinson, Tulsee Doshi, and Vinodkumar Prabhakaran (2021). Re-Imagining Algorithmic Fairness in India and Beyond. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21), 315-328. ACM.
Jobin, Anna, Marcello Ienca, and Effy Vayena (2019). The Global Landscape of AI Ethics Guidelines. Nature Machine Intelligence, 1.
Mittelstadt, Brent (2019). AI Ethics - Too Principled to Fail? arXiv:1906.06668.
Schiff, Daniel, Bogdana Rakova, Aladdin Ayesh, Anat Fanti, and Michael Lennon (2020). Principles to Practices for Responsible AI: Closing the Gap. arXiv preprint arXiv:2006.04707.
Madaio, Michael A., Luke Stark, Jennifer Wortman Vaughan, and Hanna Wallach (2020). Co-Designing Checklists to Understand Organizational Challenges and Opportunities around Fairness in AI. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI '20), 1-14. ACM.
Rakova, Bogdana, Jingying Yang, Henriette Cramer, and Rumman Chowdhury (2021). Where Responsible AI Meets Reality: Practitioner Perspectives on Enablers for Shifting Organizational Practices. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW1), 1-23.
Suchman, Lucy (2002). Located Accountabilities in Technology Production. Scandinavian Journal of Information Systems, 14(2).
Wong, Richmond Y. (2021). Tactics of Soft Resistance in User Experience Professionals' Values Work. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2), 1-28.
Shilton, Katie (2013). Values Levers: Building Ethics into Design. Science, Technology, & Human Values, 38(3), 374-397.
Passi, Samir, and Steven J. Jackson (2018). Trust in Data Science: Collaboration, Translation, and Accountability in Corporate Data Science Projects. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW), Article 136.
Passi, Samir, and Phoebe Sengers (2020). Making Data Science Systems Work. Big Data & Society, 7(2).
Passi, Samir, and Solon Barocas (2019). Problem Formulation and Fairness. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* '19), 39-48. ACM.
Mattern, Shannon (2021). Unboxing the Toolkit. Toolshed. https://tool-shed.org/unboxing-the-toolkit/
Kelty, Christopher M. (2018). The Participatory Development Toolkit. Limn. https://limn.it/articles/the-participatory-development-toolkit/
Holstein, Kenneth, Jennifer Wortman Vaughan, Hal Daumé III, Miro Dudik, and Hanna Wallach (2019). Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need? In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19), 1-16. ACM.
Boyd, Karen (2020). Ethical Sensitivity in Machine Learning Development. In Companion Publication of the 2020 Conference on Computer Supported Cooperative Work and Social Computing (CSCW '20 Companion), 87-92. ACM.
Weaver, Kathryn, Janice Morse, and Carl Mitcham (2008). Ethical Sensitivity in Professional Practice: Concept Analysis. Journal of Advanced Nursing, 62(5), 607-618.
Hitzig, Zoë (2020). The Normative Gap: Mechanism Design and Ideal Theories of Justice. Economics & Philosophy, 36(3), 407-434.
Friedman, Batya, Peter Kahn, and Alan Borning (2002). Value Sensitive Design: Theory and Methods. University of Washington Technical Report 2-12.
558@article{yoo2018stakeholder,
559 doi = {10.1007/s10676-018-9474-4},
560 publisher = {Springer},
561 year = {2021},
562 pages = {1--5},
563 journal = {Ethics and Information Technology},
564 author = {Yoo, Daisy},
565 title = {Stakeholder Tokens: a constructive method for value sensitive design stakeholder analysis},
566}
567
568@incollection{star1989structure,
569 numpages = {18},
570 pages = {37–54},
571 booktitle = {Distributed Artificial Intelligence (Vol. 2)},
572 address = {San Francisco, CA, USA},
573 publisher = {Morgan Kaufmann Publishers Inc.},
574 isbn = {0273088106},
575 year = {1989},
576 author = {Star, Susan Leigh},
577 title = {The structure of ill-structured solutions: Boundary objects and heterogeneous distributed problem solving},
578}
579
580@article{silbey2009taming,
581 publisher = {Annual Reviews},
582 year = {2009},
583 pages = {341--369},
584 volume = {35},
585 journal = {Annual Review of Sociology},
586 author = {Silbey, Susan S},
587 title = {Taming Prometheus: Talk about safety and culture},
588}
589
590@book{redfield2013life,
591 address = {Berkeley},
592 publisher = {University of California Press},
593 year = {2013},
594 author = {Redfield, Peter},
595 title = {Life in crisis},
596}
597
598@inproceedings{selbst2019fairness,
599 series = {FAT* '19},
600 location = {Atlanta, GA, USA},
601 keywords = {Interdisciplinary, Fairness-aware Machine Learning, Sociotechnical Systems},
602 numpages = {10},
603 pages = {59–68},
604 booktitle = {Proceedings of the Conference on Fairness, Accountability, and Transparency},
605 abstract = {A key goal of the fair-ML community is to develop machine-learning based systems that, once introduced into a social context, can achieve social and legal outcomes such as fairness, justice, and due process. Bedrock concepts in computer science---such as abstraction and modular design---are used to define notions of fairness and discrimination, to produce fairness-aware learning algorithms, and to intervene at different stages of a decision-making pipeline to produce "fair" outcomes. In this paper, however, we contend that these concepts render technical interventions ineffective, inaccurate, and sometimes dangerously misguided when they enter the societal context that surrounds decision-making systems. We outline this mismatch with five "traps" that fair-ML work can fall into even as it attempts to be more context-aware in comparison to traditional data science. We draw on studies of sociotechnical systems in Science and Technology Studies to explain why such traps occur and how to avoid them. Finally, we suggest ways in which technical designers can mitigate the traps through a refocusing of design in terms of process rather than solutions, and by drawing abstraction boundaries to include social actors rather than purely technical ones.},
606 doi = {10.1145/3287560.3287598},
607 url = {https://doi.org/10.1145/3287560.3287598},
608 address = {New York, NY, USA},
609 publisher = {Association for Computing Machinery},
610 isbn = {9781450361255},
611 year = {2019},
612 title = {Fairness and Abstraction in Sociotechnical Systems},
613 author = {Selbst, Andrew D. and Boyd, Danah and Friedler, Sorelle A. and Venkatasubramanian, Suresh and Vertesi, Janet},
614}
615
616@inproceedings{blodgett2020language,
617 abstract = {We survey 146 papers analyzing {``}bias{''} in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing {``}bias{''} is an inherently normative process. We further find that these papers{'} proposed quantitative techniques for measuring or mitigating {``}bias{''} are poorly matched to their motivations and do not engage with the relevant literature outside of NLP. Based on these findings, we describe the beginnings of a path forward by proposing three recommendations that should guide work analyzing {``}bias{''} in NLP systems. These recommendations rest on a greater recognition of the relationships between language and social hierarchies, encouraging researchers and practitioners to articulate their conceptualizations of {``}bias{''}---i.e., what kinds of system behaviors are harmful, in what ways, to whom, and why, as well as the normative reasoning underlying these statements{---}and to center work around the lived experiences of members of communities affected by NLP systems, while interrogating and reimagining the power relations between technologists and such communities.},
618 pages = {5454--5476},
619 doi = {10.18653/v1/2020.acl-main.485},
620 url = {https://aclanthology.org/2020.acl-main.485},
621 publisher = {Association for Computational Linguistics},
622 address = {Online},
623 year = {2020},
624 month = {July},
625 booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
626 author = {Blodgett, Su Lin and
627Barocas, Solon and
628Daum{\'e} III, Hal and
629Wallach, Hanna},
630 title = {Language (Technology) is Power: A Critical Survey of {``}Bias{''} in {NLP}},
631}
632
633@article{shen2020designing,
634 doi = {10.1145/3415224},
635 publisher = {ACM New York, NY, USA},
636 year = {2020},
637 pages = {1--22},
638 number = {CSCW2},
639 volume = {4},
640 journal = {Proceedings of the ACM on Human-Computer Interaction},
641 author = {Shen, Hong and Jin, Haojian and Cabrera, {\'A}ngel Alexander and Perer, Adam and Zhu, Haiyi and Hong, Jason I},
642 title = {Designing Alternative Representations of Confusion Matrices to Support Non-Expert Public Understanding of Algorithm Performance},
643}
644
645@inproceedings{cheng2021soliciting,
646 series = {CHI '21},
647 location = {Yokohama, Japan},
648 keywords = {algorithmic fairness, human-centered AI, child welfare, algorithm-assisted decision-making, machine learning},
649 numpages = {17},
650 articleno = {390},
651 booktitle = {Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems},
652 abstract = {Recent work in fair machine learning has proposed dozens of technical definitions of algorithmic fairness and methods for enforcing these definitions. However, we still lack an understanding of how to develop machine learning systems with fairness criteria that reflect relevant stakeholders’ nuanced viewpoints in real-world contexts. To address this gap, we propose a framework for eliciting stakeholders’ subjective fairness notions. Combining a user interface that allows stakeholders to examine the data and the algorithm’s predictions with an interview protocol to probe stakeholders’ thoughts while they are interacting with the interface, we can identify stakeholders’ fairness beliefs and principles. We conduct a user study to evaluate our framework in the setting of a child maltreatment predictive system. Our evaluations show that the framework allows stakeholders to comprehensively convey their fairness viewpoints. We also discuss how our results can inform the design of predictive systems.},
653 doi = {10.1145/3411764.3445308},
654 url = {https://doi.org/10.1145/3411764.3445308},
655 address = {New York, NY, USA},
656 publisher = {Association for Computing Machinery},
657 isbn = {9781450380966},
658 year = {2021},
659 title = {Soliciting Stakeholders’ Fairness Notions in Child Maltreatment Predictive Systems},
660 author = {Cheng, Hao-Fei and Stapleton, Logan and Wang, Ruiqi and Bullock, Paige and Chouldechova, Alexandra and Wu, Zhiwei Steven Steven and Zhu, Haiyi},
661}
662
663@incollection{stark2021critical,
664 publisher = {Springer},
665 year = {2021},
666 pages = {257--280},
667 booktitle = {The Cultural Life of Machine Learning},
668 author = {Stark, Luke and Greene, Daniel and Hoffmann, Anna Lauren},
669 title = {Critical Perspectives on Governance Mechanisms for AI/ML Systems},
670}
671
672@article{freire1996pedagogy,
673 year = {1996},
674 journal = {New York: Continuum},
675 author = {Freire, Paolo},
676 title = {Pedagogy of the oppressed (revised)},
677}
678
679@article{malazita2019infrastructures,
680 publisher = {Taylor \& Francis},
681 year = {2019},
682 pages = {300--312},
683 number = {4},
684 volume = {30},
685 journal = {Digital Creativity},
686 author = {Malazita, James W and Resetar, Korryn},
687 title = {Infrastructures of abstraction: how computer science education produces anti-political subjects},
688}
689
690@article{ahn2020fairsight,
691 doi = {10.1109/TVCG.2019.2934262},
692 pages = {1086-1095},
693 number = {1},
694 volume = {26},
695 year = {2020},
696 title = {FairSight: Visual Analytics for Fairness in Decision Making},
697 journal = {IEEE Transactions on Visualization and Computer Graphics},
698 author = {Ahn, Yongsu and Lin, Yu-Ru},
699}
700
701@inproceedings{metcalf2021algorithmicImpactAssessments,
702 series = {FAccT '21},
703 location = {Virtual Event, Canada},
704 keywords = {harm, impact, algorithmic impact assessment, governance, accountability},
705 numpages = {12},
706 pages = {735–746},
707 booktitle = {Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency},
708 abstract = {Algorithmic impact assessments (AIAs) are an emergent form of accountability for organizations that build and deploy automated decision-support systems. They are modeled after impact assessments in other domains. Our study of the history of impact assessments shows that "impacts" are an evaluative construct that enable actors to identify and ameliorate harms experienced because of a policy decision or system. Every domain has different expectations and norms around what constitutes impacts and harms, how potential harms are rendered as impacts of a particular undertaking, who is responsible for conducting such assessments, and who has the authority to act on them to demand changes to that undertaking. By examining proposals for AIAs in relation to other domains, we find that there is a distinct risk of constructing algorithmic impacts as organizationally understandable metrics that are nonetheless inappropriately distant from the harms experienced by people, and which fall short of building the relationships required for effective accountability. As impact assessments become a commonplace process for evaluating harms, the FAccT community, in its efforts to address this challenge, should A) understand impacts as objects that are co-constructed accountability relationships, B) attempt to construct impacts as close as possible to actual harms, and C) recognize that accountability governance requires the input of various types of expertise and affected communities. We conclude with lessons for assembling cross-expertise consensus for the co-construction of impacts and building robust accountability relationships.},
709 doi = {10.1145/3442188.3445935},
710 url = {https://doi.org/10.1145/3442188.3445935},
711 address = {New York, NY, USA},
712 publisher = {Association for Computing Machinery},
713 isbn = {9781450383097},
714 year = {2021},
715 title = {Algorithmic Impact Assessments and Accountability: The Co-Construction of Impacts},
716 author = {Metcalf, Jacob and Moss, Emanuel and Watkins, Elizabeth Anne and Singh, Ranjit and Elish, Madeleine Clare},
717}
718
719@article{kemp2013humanRights,
720 url = {https://doi.org/10.1080/14615517.2013.782978},
721 doi = {10.1080/14615517.2013.782978},
722 publisher = {Taylor & Francis},
723 year = {2013},
724 pages = {86-96},
725 number = {2},
726 volume = {31},
727 journal = {Impact Assessment and Project Appraisal},
728 title = {Human rights and impact assessment: clarifying the connections in practice},
729 author = {Kemp, Deanna and Vanclay, Frank},
730}
731
732@techreport{UnitedNationsHumanRights2011,
733 year = {2011},
734 title = {{Guiding Principles on Business and Human Rights: Implementing the United Nations "Protect, Respect and Remedy" Framework}},
735 institution = {United Nations},
736 file = {:C\:/Users/ryw9/Box/Papers Archive/United Nations (2011)Guiding principles on business and human rights- implementing the United Nations protect, respect and remedy framework.pdf:pdf},
737 doi = {10.4324/9781351171922-3},
738 author = {{United Nations Human Rights Office of the High Commissioner}},
739}
740
741@unpublished{Ruggie2017SocialConstructionUN,
742 year = {2017},
743 title = {{The Social Construction of the UN Guiding Principles on Business \& Human Rights}},
744 file = {:C\:/Users/ryw9/AppData/Local/Mendeley Ltd./Mendeley Desktop/Downloaded/Ruggie - 2017 - The Social Construction of the UN Guiding Principles on Business & Human Rights.pdf:pdf},
745 doi = {10.2139/ssrn.2984901},
746 booktitle = {Harvard Kennedy School Faculty Research Working Paper Series},
747 author = {Ruggie, John Gerard},
748 abstract = {Academic proponents and opponents of the UN Guiding Principles on Business & Human Rights have generated a bourgeoning literature. And by now there are several years of practical experience to inform the debate. But the conceptual and theoretical understanding of global rulemaking that informed my development of the UNGPs, and to which I have contributed as a scholar, have not been fully articulated and debated. This chapter aims to close that gap, on the supposition that those ideas might have contributed to the UNGPs' relative success where previous efforts failed, and that in some measure they may be applicable in other complex and contested global policy domains.},
749}
750
751@article{Hoffmann2020terms,
752 issue = {12},
753 volume = {23},
754 year = {2020},
755 url = {http://journals.sagepub.com/doi/10.1177/1461444820958725},
756 title = {{Terms of inclusion: Data, discourse, violence}},
757 pages = {146144482095872},
758 month = {sep},
759 journal = {New Media \& Society},
760 issn = {1461-4448},
761 file = {:C\:/Users/ryw9/AppData/Local/Mendeley Ltd./Mendeley Desktop/Downloaded/Hoffmann - 2020 - Terms of inclusion Data, discourse, violence.pdf:pdf},
762 doi = {10.1177/1461444820958725},
763 author = {Hoffmann, Anna Lauren},
764 abstract = {Inclusion has emerged as an early cornerstone value for the emerging domain of “data ethics.” On the surface, appeals to inclusion appear to address the threat that biased data technologies making decisions or misrepresenting people in ways that reproduce longer standing patterns of oppression and violence. Far from a panacea for the threats of pervasive data collection and surveillance, however, these emerging discourses of inclusion merit critical consideration. Here, I use the lens of discursive violence to better theorize the relationship between inclusion and the violent potentials of data science and technology. In doing so, I aim to articulate the problematic and often perverse power relationships implicit in ideals of “inclusion” broadly, which—if not accompanied by dramatic upheavals in existing hierarchical power structures—too often work to diffuse the radical potential of difference and normalize otherwise oppressive structural conditions.},
765}
766
767@inproceedings{Greene2019betterNicer,
768 pages = {2122-2131},
769 year = {2019},
770 url = {http://hdl.handle.net/10125/59651},
771 title = {{Better, Nicer, Clearer, Fairer: A Critical Assessment of the Movement for Ethical Artificial Intelligence and Machine Learning}},
772 file = {:C\:/Users/ryw9/AppData/Local/Mendeley Ltd./Mendeley Desktop/Downloaded/Greene, Hoffmann, Stark - 2019 - Better, Nicer, Clearer, Fairer A Critical Assessment of the Movement for Ethical Artificial Intelligenc.pdf:pdf},
773 doi = {10.24251/HICSS.2019.258},
774 booktitle = {Proceedings of the 52nd Hawaii International Conference on System Sciences},
775 author = {Greene, Daniel and Hoffmann, Anna Lauren and Stark, Luke},
776 abstract = {This paper uses frame analysis to examine recent high-profile values statements endorsing ethical design for artificial intelligence and machine learning (AI/ML). Guided by insights from values in design and the sociology of business ethics, we uncover the grounding assumptions and terms of debate that make some conversations about ethical design possible while forestalling alternative visions. Vision statements for ethical AI/ML co-opt the language of some critics, folding them into a limited, technologically deterministic, expert-driven view of what ethical AI/ML means and how it might work.},
777}
778
779@book{Scott1998seeing,
780 year = {1998},
781 title = {{Seeing Like a State: How certain schemes to improve the human condition have failed}},
782 publisher = {Yale University Press},
783 author = {Scott, James C.},
784 address = {New Haven},
785}
786
787@article{braun2006using,
788 publisher = {Taylor \& Francis},
789 year = {2006},
790 pages = {77--101},
791 number = {2},
792 volume = {3},
793 journal = {Qualitative research in psychology},
794 author = {Braun, Virginia and Clarke, Victoria},
795 title = {Using thematic analysis in psychology},
796}
797
798@book{jasanoff2015dreamscapes,
799 address = {Chicago},
800 publisher = {University of Chicago Press},
801 year = {2015},
802 author = {Jasanoff, Sheila and Kim, Sang-Hyun},
803 title = {Dreamscapes of modernity: Sociotechnical imaginaries and the fabrication of power},
804}
805
806@article{madaio2021assessing,
807 year = {2021},
808 journal = {arXiv preprint arXiv:2112.05675},
809 author = {Madaio, Michael and Egede, Lisa and Subramonyam, Hariharan and Vaughan, Jennifer Wortman and Wallach, Hanna},
810 title = {Assessing the Fairness of AI Systems: AI Practitioners' Processes, Challenges, and Needs for Support},
811}
812
813@article{shen2021everyday,
814 keywords = {auditing algorithms, everyday users, algorithmic bias, everyday algorithm auditing, fair machine learning},
815 numpages = {29},
816 articleno = {433},
817 month = {oct},
818 journal = {Proc. ACM Hum.-Comput. Interact.},
819 abstract = {A growing body of literature has proposed formal approaches to audit algorithmic systems for biased and harmful behaviors. While formal auditing approaches have been greatly impactful, they often suffer major blindspots, with critical issues surfacing only in the context of everyday use once systems are deployed. Recent years have seen many cases in which everyday users of algorithmic systems detect and raise awareness about harmful behaviors that they encounter in the course of their everyday interactions with these systems. However, to date little academic attention has been granted to these bottom-up, user-driven auditing processes. In this paper, we propose and explore the concept of everyday algorithm auditing, a process in which users detect, understand, and interrogate problematic machine behaviors via their day-to-day interactions with algorithmic systems. We argue that everyday users are powerful in surfacing problematic machine behaviors that may elude detection via more centrally-organized forms of auditing, regardless of users' knowledge about the underlying algorithms. We analyze several real-world cases of everyday algorithm auditing, drawing lessons from these cases for the design of future platforms and tools that facilitate such auditing behaviors. Finally, we discuss work that lies ahead, toward bridging the gaps between formal auditing approaches and the organic auditing behaviors that emerge in everyday use of algorithmic systems.},
820 doi = {10.1145/3479577},
821 url = {https://doi.org/10.1145/3479577},
822 number = {CSCW2},
823 volume = {5},
824 address = {New York, NY, USA},
825 publisher = {Association for Computing Machinery},
826 issue_date = {October 2021},
827 year = {2021},
828 title = {Everyday Algorithm Auditing: Understanding the Power of Everyday Users in Surfacing Harmful Algorithmic Behaviors},
829 author = {Shen, Hong and DeVos, Alicia and Eslami, Motahhare and Holstein, Kenneth},
830}
831
832@misc{ding2018deciphering,
833 pages = {1-44},
834 url = {https://www.fhi.ox.ac.uk/wp-content/uploads/Deciphering_Chinas_AI-Dream.pdf},
835 year = {2018},
836 publisher = {Future of Humanity Institute Technical Report},
837 author = {Ding, Jeffrey},
838 title = {Deciphering China’s AI dream},
839}
840
841@article{STILGOE20131568,
842 abstract = {The governance of emerging science and innovation is a major challenge for contemporary democracies. In this paper we present a framework for understanding and supporting efforts aimed at ‘responsible innovation’. The framework was developed in part through work with one of the first major research projects in the controversial area of geoengineering, funded by the UK Research Councils. We describe this case study, and how this became a location to articulate and explore four integrated dimensions of responsible innovation: anticipation, reflexivity, inclusion and responsiveness. Although the framework for responsible innovation was designed for use by the UK Research Councils and the scientific communities they support, we argue that it has more general application and relevance.},
843 keywords = {Responsible innovation, Governance, Emerging technologies, Ethics, Geoengineering},
844 author = {Jack Stilgoe and Richard Owen and Phil Macnaghten},
845 url = {https://www.sciencedirect.com/science/article/pii/S0048733313000930},
846 doi = {https://doi.org/10.1016/j.respol.2013.05.008},
847 issn = {0048-7333},
848 year = {2013},
849 pages = {1568-1580},
850 number = {9},
851 volume = {42},
852 journal = {Research Policy},
853 title = {Developing a framework for responsible innovation},
854}
855
856@inproceedings{forlizzi2013promoting,
857 pages = {1-12},
858 year = {2013},
859 volume = {13},
860 booktitle = {Proceedings of the 5th International Congress of International Association of Societies of Design Research-IASDR},
861 author = {Forlizzi, Jodi and Zimmerman, John},
862 title = {Promoting service design as a core practice in interaction design},
863}
864
865@article{lewis2018making,
866 doi = {https://doi.org/10.21428/bfafd97b},
867 publisher = {PubPub},
868 year = {2018},
869 journal = {Journal of Design and Science},
870 author = {Lewis, Jason Edward and Arista, Noelani and Pechawis, Archer and Kite, Suzanne},
871 title = {Making kin with the machines},
872}
873
874@inproceedings{Chivukula2020DimensionsUX,
875 year = {2020},
876 url = {https://dl.acm.org/doi/10.1145/3313831.3376459},
877 title = {{Dimensions of UX Practice that Shape Ethical Awareness}},
878 publisher = {ACM},
879 pages = {1--13},
880 month = {apr},
881 isbn = {9781450367080},
882 file = {:C\:/Users/ryw9/AppData/Local/Mendeley Ltd./Mendeley Desktop/Downloaded/Chivukula et al. - 2020 - Dimensions of UX Practice that Shape Ethical Awareness.pdf:pdf},
883 doi = {10.1145/3313831.3376459},
884 booktitle = {Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems},
885 author = {Chivukula, Shruthi Sai and Watkins, Chris Rhys and Manocha, Rhea and Chen, Jingle and Gray, Colin M.},
886 address = {New York, NY, USA},
887 abstract = {HCI researchers are increasingly interested in describing the complexity of design practice, including ethical, organiza- tional, and societal concerns. Recent studies have identified individual practitioners as key actors in driving the design process and culture within their respective organizations, and we build upon these efforts to reveal practitioner concerns re- garding ethics on their own terms. In this paper, we report on the results of an interview study with eleven UX practitioners, capturing their experiences that highlight dimensions of de- sign practice that impact ethical awareness and action. Using a bottom-up thematic analysis, we identified five dimensions of design complexity that influence ethical outcomes and span individual, collaborative, and methodological framing of UX activity. Based on these findings, we propose a set of impli- cations for the creation of ethically-centered design methods that resonate with this complexity and inform the education of future},
888}
889
890@book{Bamberger2015PrivacyGround,
891 year = {2015},
892 title = {{Privacy on the Ground: Driving Corporate Behavior in the United States and Europe}},
893 publisher = {The MIT Press},
894 author = {Bamberger, Kenneth A. and Mulligan, Deirdre K.},
895 address = {Cambridge, Massachusetts},
896}
897
898@article{Crockett2021BuildingTrustworthy,
899 doi = {10.1109/TAI.2021.3137091},
900 pages = {1-1},
901 number = {},
902 volume = {},
903 year = {2021},
904 title = {Building Trustworthy AI Solutions: A Case for Practical Solutions for Small Businesses},
905 journal = {IEEE Transactions on Artificial Intelligence},
906 author = {Crockett, Keeley Alexandra and Gerber, Luciano and Latham, Annabel and Colyer, Edwin},
907}
908
909@techreport{Alston2019UNReportPoverty,
910 year = {2019},
911 volume = {17564},
912 url = {https://undocs.org/A/74/493},
913 title = {{Report of the Special Rapporteur on extreme poverty and human rights}},
914 pages = {1--23},
915 number = {October},
916 institution = {United Nations},
917 file = {:C\:/Users/ryw9/Box/Papers Archive/Alston (2019) Report of the Special Rapporteur on extreme poverty and.pdf:pdf},
918 booktitle = {United Nations General Assembly},
919 author = {Alston, Philip},
920 abstract = {The digital welfare state is either already a reality or emerging in many countries across the globe. In these states, systems of social protection and assistance are increasingly driven by digital data and technologies that are used to automate, predict, identify, surveil, detect, target and punish. In the present report, the irresistible attractions for Governments to move in this direction are acknowledged, but the grave risk of stumbling, zombie-like, into a digital welfare dystopia is highlighted. It is argued that big technology companies (frequently referred to as “big tech”) operate in an almost human rights-free zone, and that this is especially problematic when the private sector is taking a leading role in designing, constructing and even operating significant parts of the digital welfare state. It is recommended in the report that, instead of obsessing about fraud, cost savings, sanctions, and market -driven definitions of efficiency, the starting point should be on how welfare budgets could be transformed through technology to ensure a higher standard of living for the vulnerable and disadvantaged.},
921}
922
923@book{ahmed2012being,
924 address = {Durham, NC},
925 publisher = {Duke University Press},
926 year = {2012},
927 author = {Ahmed, Sara},
928 title = {On being included},
929}
930
931@book{Onuoha2018PeoplesGuideAI,
932 year = {2018},
933 url = {https://alliedmedia.org/resources/peoples-guide-to-ai},
934 title = {{A People's Guide to AI}},
935 publisher = {Allied Media Projects},
936 file = {:C\:/Users/ryw9/Box/Papers Archive/Onuoha, Nucera (2018) People's Guide to AI.pdf:pdf},
937 author = {Onuoha, Mim and Nucera, Diana},
938}
939
940@article{Abdurahman2021Body,
941 year = {2021},
942 url = {https://logicmag.io/beacons/a-body-of-work-that-cannot-be-ignored/},
943 title = {{A Body of Work That Cannot Be Ignored}},
944 number = {15: Beacons},
945 journal = {Logic},
946 file = {:C\:/Users/ryw9/Box/Papers Archive/Abdurahman (2021) A Body of Work That Cannot Be Ignored.pdf:pdf},
947 author = {Abdurahman, J Khadijah},
948}
949
950@misc{Raval2021NewAILexicon,
951 year = {2021},
952 urldate = {2022-01-07},
953 url = {https://medium.com/a-new-ai-lexicon/a-new-ai-lexicon-responses-and-challenges-to-the-critical-ai-discourse-f2275989fa62},
954 title = {{A New AI Lexicon: Responses and Challenges to the Critical AI discourse}},
955 file = {:C\:/Users/ryw9/Box/Papers Archive/Raval, Kak (2021) A New AI Lexicon_ Responses and Challenges to the Critical AI discourse _ by AI Now Institute _ A New AI Lexicon _ Medium.pdf:pdf},
956 booktitle = {AI Now Institute},
957 author = {Raval, Noopur and Kak, Amba},
958}
959
960@misc{Ozoma2021TechWorkerHandbook,
961 year = {2021},
962 urldate = {2022-01-07},
963 url = {https://techworkerhandbook.org/},
964 title = {{The Tech Worker Handbook}},
965 booktitle = {The Tech Worker Handbook},
966 author = {Ozoma, Ifeoma},
967}
968
969@misc{LittleSis2017MapThePower,
970 year = {2017},
971 urldate = {2022-01-07},
972 url = {https://littlesis.org/toolkit},
973 title = {{Map the Power Toolkit}},
974 author = {LittleSis},
975}
976
977@inproceedings{mitchell2019model,
978 series = {FAT* '19},
979 location = {Atlanta, GA, USA},
980 keywords = {disaggregated evaluation, fairness evaluation, ethical considerations, ML model evaluation, model cards, documentation, datasheets},
981 numpages = {10},
982 pages = {220–229},
983 booktitle = {Proceedings of the Conference on Fairness, Accountability, and Transparency},
984 abstract = {Trained machine learning models are increasingly used to perform high-impact tasks in areas such as law enforcement, medicine, education, and employment. In order to clarify the intended use cases of machine learning models and minimize their usage in contexts for which they are not well suited, we recommend that released models be accompanied by documentation detailing their performance characteristics. In this paper, we propose a framework that we call model cards, to encourage such transparent model reporting. Model cards are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, Fitzpatrick skin type [15]) and intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type) that are relevant to the intended application domains. Model cards also disclose the context in which models are intended to be used, details of the performance evaluation procedures, and other relevant information. While we focus primarily on human-centered machine learning models in the application fields of computer vision and natural language processing, this framework can be used to document any trained machine learning model. To solidify the concept, we provide cards for two supervised models: One trained to detect smiling faces in images, and one trained to detect toxic comments in text. We propose model cards as a step towards the responsible democratization of machine learning and related artificial intelligence technology, increasing transparency into how well artificial intelligence technology works. We hope this work encourages those releasing trained machine learning models to accompany model releases with similar detailed evaluation numbers and other relevant documentation.},
985 doi = {10.1145/3287560.3287596},
986 url = {https://doi.org/10.1145/3287560.3287596},
987 address = {New York, NY, USA},
988 publisher = {Association for Computing Machinery},
989 isbn = {9781450361255},
990 year = {2019},
991 title = {Model Cards for Model Reporting},
992 author = {Mitchell, Margaret and Wu, Simone and Zaldivar, Andrew and Barnes, Parker and Vasserman, Lucy and Hutchinson, Ben and Spitzer, Elena and Raji, Inioluwa Deborah and Gebru, Timnit},
993}
994
995@article{gebru2021datasheets,
996 doi = {10.1145/3458723},
997 publisher = {ACM New York, NY, USA},
998 year = {2021},
999 pages = {86--92},
1000 number = {12},
1001 volume = {64},
1002 journal = {Communications of the ACM},
1003 author = {Gebru, Timnit and Morgenstern, Jamie and Vecchione, Briana and Vaughan, Jennifer Wortman and Wallach, Hanna and Iii, Hal Daum{\'e} and Crawford, Kate},
1004 title = {Datasheets for datasets},
1005}
1006
1007@inproceedings{jacobs2021measurement,
1008 series = {FAccT '21},
1009 location = {Virtual Event, Canada},
1010 keywords = {construct reliability, construct validity, measurement, fairness},
1011 numpages = {11},
1012 pages = {375–385},
1013 booktitle = {Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency},
1014 abstract = {We propose measurement modeling from the quantitative social sciences as a framework for understanding fairness in computational systems. Computational systems often involve unobservable theoretical constructs, such as socioeconomic status, teacher effectiveness, and risk of recidivism. Such constructs cannot be measured directly and must instead be inferred from measurements of observable properties (and other unobservable theoretical constructs) thought to be related to them---i.e., operationalized via a measurement model. This process, which necessarily involves making assumptions, introduces the potential for mismatches between the theoretical understanding of the construct purported to be measured and its operationalization. We argue that many of the harms discussed in the literature on fairness in computational systems are direct results of such mismatches. We show how some of these harms could have been anticipated and, in some cases, mitigated if viewed through the lens of measurement modeling. To do this, we contribute fairness-oriented conceptualizations of construct reliability and construct validity that unite traditions from political science, education, and psychology and provide a set of tools for making explicit and testing assumptions about constructs and their operationalizations. We then turn to fairness itself, an essentially contested construct that has different theoretical understandings in different contexts. We argue that this contestedness underlies recent debates about fairness definitions: although these debates appear to be about different operationalizations, they are, in fact, debates about different theoretical understandings of fairness. We show how measurement modeling can provide a framework for getting to the core of these debates.},
1015 doi = {10.1145/3442188.3445901},
1016 url = {https://doi.org/10.1145/3442188.3445901},
1017 address = {New York, NY, USA},
1018 publisher = {Association for Computing Machinery},
1019 isbn = {9781450383097},
1020 year = {2021},
1021 title = {Measurement and Fairness},
1022 author = {Jacobs, Abigail Z. and Wallach, Hanna},
1023}
1024
1025@article{delgado2021stakeholder,
1026 numpages = {7},
1027 year = {2021},
1028 journal = {arXiv preprint arXiv:2111.01122},
1029 author = {Delgado, Fernando and Yang, Stephen and Madaio, Michael and Yang, Qian},
1030 title = {Stakeholder Participation in AI: Beyond" Add Diverse Stakeholders and Stir"},
1031}
1032
1033@article{sloane2020participation,
1034 series = {EAAMO '22},
1035 location = {Arlington, VA, USA},
1036 keywords = {machine learning, participatory methods, design},
1037 numpages = {6},
1038 articleno = {1},
1039 booktitle = {Equity and Access in Algorithms, Mechanisms, and Optimization},
1040 abstract = {This paper critiques popular modes of participation in design practice and machine learning. It examines three existing kinds of participation in design practice and machine learning participation as work, participation as consultation, and as participation as justice – to argue that the machine learning community must become attuned to possibly exploitative and extractive forms of community involvement and shift away from the prerogatives of context independent scalability. Cautioning against “participation washing”, it argues that the notion of “participation” should be expanded to acknowledge more subtle, and possibly exploitative, forms of community involvement in participatory machine learning design. Specifically, it suggests that it is imperative to recognize design participation as work; to ensure that participation as consultation is context-specific; and that participation as justice must be genuine and long term. The paper argues that such a development can only be scaffolded by a new epistemology around design harms, including, but not limited to, in machine learning. To facilitate such a development, the paper suggests developing we argue that developing a cross-sectoral database of design participation failures that is cross-referenced with socio-structural dimensions and highlights “edge cases” that can and must be learned from.},
1041 doi = {10.1145/3551624.3555285},
1042 url = {https://doi.org/10.1145/3551624.3555285},
1043 address = {New York, NY, USA},
1044 publisher = {Association for Computing Machinery},
1045 isbn = {9781450394772},
1046 year = {2022},
1047 title = {Participation Is Not a Design Fix for Machine Learning},
1048 author = {Sloane, Mona and Moss, Emanuel and Awomolo, Olaitan and Forlano, Laura},
1049}
1050
1051@article{madaio2022assessing,
1052 doi = {10.1145/3512899},
1053 publisher = {ACM New York, NY, USA},
1054 year = {2022},
1055 pages = {1--26},
1056 number = {CSCW1},
1057 volume = {6},
1058 journal = {Proceedings of the ACM on Human-Computer Interaction},
1059 author = {Madaio, Michael and Egede, Lisa and Subramonyam, Hariharan and Wortman Vaughan, Jennifer and Wallach, Hanna},
1060 title = {Assessing the Fairness of AI Systems: AI Practitioners' Processes, Challenges, and Needs for Support},
1061}
1062
1063@inproceedings{deng2022exploring,
1064 series = {FAccT '22},
1065 location = {Seoul, Republic of Korea},
1066 numpages = {12},
1067 pages = {473–484},
1068 booktitle = {2022 ACM Conference on Fairness, Accountability, and Transparency},
1069 abstract = {Recent years have seen the development of many open-source ML fairness toolkits aimed at helping ML practitioners assess and address unfairness in their systems. However, there has been little research investigating how ML practitioners actually use these toolkits in practice. In this paper, we conducted the first in-depth empirical exploration of how industry practitioners (try to) work with existing fairness toolkits. In particular, we conducted think-aloud interviews to understand how participants learn about and use fairness toolkits, and explored the generality of our findings through an anonymous online survey. We identified several opportunities for fairness toolkits to better address practitioner needs and scaffold them in using toolkits effectively and responsibly. Based on these findings, we highlight implications for the design of future open-source fairness toolkits that can support practitioners in better contextualizing, communicating, and collaborating around ML fairness efforts.},
1070 doi = {10.1145/3531146.3533113},
1071 url = {https://doi.org/10.1145/3531146.3533113},
1072 address = {New York, NY, USA},
1073 publisher = {Association for Computing Machinery},
1074 isbn = {9781450393522},
1075 year = {2022},
1076 title = {Exploring How Machine Learning Practitioners (Try To) Use Fairness Toolkits},
1077 author = {Deng, Wesley Hanwen and Nagireddy, Manish and Lee, Michelle Seng Ah and Singh, Jatinder and Wu, Zhiwei Steven and Holstein, Kenneth and Zhu, Haiyi},
1078}
1079
1080@article{shilton2018values,
1081 doi = {10.1561/1100000073},
1082 publisher = {Now Publishers, Inc.},
1083 year = {2018},
1084 pages = {107--171},
1085 number = {2},
1086 volume = {12},
1087 journal = {Foundations and Trends{\textregistered} in Human--Computer Interaction},
1088 author = {Shilton, Katie},
1089 title = {Values and ethics in human-computer interaction},
1090}
1091
1092@inproceedings{shen2022model,
1093 series = {FAccT '22},
1094 location = {Seoul, Republic of Korea},
1095 numpages = {12},
1096 pages = {440–451},
1097 booktitle = {2022 ACM Conference on Fairness, Accountability, and Transparency},
1098 abstract = {There have been increasing calls for centering impacted communities – both online and offline – in the design of the AI systems that will be deployed in their communities. However, the complicated nature of a community’s goals and needs, as well as the complexity of AI’s development procedures, outputs, and potential impacts, often prevents effective participation. In this paper, we present the Model Card Authoring Toolkit, a toolkit that supports community members to understand, navigate and negotiate a spectrum of machine learning models via deliberation and pick the ones that best align with their collective values. Through a series of workshops, we conduct an empirical investigation of the initial effectiveness of our approach in two online communities – English and Dutch Wikipedia, and document how our participants collectively set the threshold for a machine learning based quality prediction system used in their communities’ content moderation applications. Our results suggest that the use of the Model Card Authoring Toolkit helps improve the understanding of the trade-offs across multiple community goals on AI design, engage community members to discuss and negotiate the trade-offs, and facilitate collective and informed decision-making in their own community contexts. Finally, we discuss the challenges for a community-centered, deliberation-driven approach for AI design as well as potential design implications.},
1099 doi = {10.1145/3531146.3533110},
1100 url = {https://doi.org/10.1145/3531146.3533110},
1101 address = {New York, NY, USA},
1102 publisher = {Association for Computing Machinery},
1103 isbn = {9781450393522},
1104 year = {2022},
1105 title = {The Model Card Authoring Toolkit: Toward Community-Centered, Deliberation-Driven AI Design},
1106 author = {Shen, Hong and Wang, Leijie and Deng, Wesley H. and Brusse, Ciell and Velgersdijk, Ronald and Zhu, Haiyi},
1107}
1108
1109@article{watkins2022four,
1110 year = {2022},
1111 journal = {arXiv preprint arXiv:2202.09519},
1112 author = {Watkins, Elizabeth Anne and McKenna, Michael and Chen, Jiahao},
1113 title = {The four-fifths rule is not disparate impact: a woeful tale of epistemic trespassing in algorithmic fairness},
1114}
1115
1116@book{gray2019ghost,
1117 address = {Boston},
1118 publisher = {Houghton Mifflin Harcourt},
1119 year = {2019},
1120 author = {Gray, Mary L and Suri, Siddharth},
1121 title = {Ghost work: How to stop Silicon Valley from building a new global underclass},
1122}
1123
1124@inproceedings{bray2022radical,
1125 series = {CHI '22},
1126 location = {New Orleans, LA, USA},
1127 keywords = {participatory design, method, qualitative methods, design research methods, design methods},
1128 numpages = {13},
1129 articleno = {452},
1130 booktitle = {Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems},
1131 abstract = {When considering the democratic intentions of co-design, designers and design researchers must evaluate the impact of power imbalances embedded in common design and research dynamics. This holds particularly true in work with and for marginalized communities, who are frequently excluded in design processes. To address this issue, we examine how existing design tools and methods are used to support communities in processes of community building or reimagining, considering the influence of race and identity. This paper describes our findings from 27 interviews with community design practitioners conducted to evaluate the Building Utopia toolkit, which employs an Afrofuturist lens for speculative design processes. Our research findings support the importance of design tools that prompt conversations on race in design, and tensions between the desire for imaginative design practice and the immediacy of social issues, particularly when designing with Black and brown communities.},
1132 doi = {10.1145/3491102.3501945},
1133 url = {https://doi.org/10.1145/3491102.3501945},
1134 address = {New York, NY, USA},
1135 publisher = {Association for Computing Machinery},
1136 isbn = {9781450391573},
1137 year = {2022},
1138 title = {Radical Futures: Supporting Community-Led Design Engagements through an Afrofuturist Speculative Design Toolkit},
1139 author = {Bray, Kirsten E and Harrington, Christina and Parker, Andrea G and Diakhate, N'Deye and Roberts, Jennifer},
1140}
1141
1142@inproceedings{wong2020beyondchecklists,
1143 year = {2020},
1144 url = {https://dl.acm.org/doi/10.1145/3406865.3418590},
1145 title = {{Beyond Checklist Approaches to Ethics in Design}},
1146 publisher = {ACM},
1147 pages = {511--517},
1148 month = {oct},
1149 isbn = {9781450380591},
1150 file = {:C\:/Users/ryw9/Box/Papers Archive/Wong et al (2020) Beyond check list approaches to ethics in design.pdf:pdf},
1151 doi = {10.1145/3406865.3418590},
1152 booktitle = {Conference Companion Publication of the 2020 on Computer Supported Cooperative Work and Social Computing},
1153 author = {Wong, Richmond Y and Boyd, Karen and Metcalf, Jake and Shilton, Katie},
1154 address = {New York, NY, USA},
1155}
1156
1157@inproceedings{luger2015playing,
1158 year = {2015},
1159 url = {http://dx.doi.org/10.1145/2702123.2702142 http://dl.acm.org/citation.cfm?doid=2702123.2702142},
1160 title = {{Playing the Legal Card: Using Ideation Cards to Raise Data Protection Issues within the Design Process}},
1161 publisher = {ACM Press},
1162 pages = {457--466},
1163 isbn = {9781450331456},
1164 file = {:C\:/Users/ryw9/AppData/Local/Mendeley Ltd./Mendeley Desktop/Downloaded/Luger et al. - 2015 - Playing the Legal Card Using Ideation Cards to Raise Data Protection Issues within the Design Process.pdf:pdf},
1165 doi = {10.1145/2702123.2702142},
1166 booktitle = {Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems - CHI '15},
1167 author = {Luger, Ewa and Urquhart, Lachlan and Rodden, Tom and Golembewski, Michael},
1168 address = {New York, New York, USA},
1169}
1170
1171@article{shilton2020rolepplaying,
1172 year = {2020},
1173 url = {http://link.springer.com/10.1007/s11948-020-00250-0},
1174 title = {{Role-Playing Computer Ethics: Designing and Evaluating the Privacy by Design (PbD) Simulation}},
1175 month = {Jul},
1176 journal = {Science and Engineering Ethics},
1177 issn = {1353-3452},
1178 doi = {10.1007/s11948-020-00250-0},
1179 author = {Shilton, Katie and Heidenblad, Donal and Porter, Adam and Winter, Susan and Kendig, Mary},
1180}
1181
1182@incollection{flanagan2014values,
1183 year = {2014},
1184 title = {{Groundwork for Values in Games}},
1185 publisher = {MIT Press},
1186 chapter = {1},
1187 booktitle = {Values at Play in Digital Games},
1188 author = {Flanagan, Mary and Nissenbaum, Helen},
1189 address = {Cambridge, Massachusetts},
1190}
1191
1192@article{boyd2021datasheets,
1193 year = {2021},
1194 volume = {5},
1195 url = {https://dl.acm.org/doi/10.1145/3479582},
1196 title = {{Datasheets for Datasets help ML Engineers Notice and Understand Ethical Issues in Training Data}},
1197 publisher = {Association for Computing Machinery},
1198 pages = {1--27},
1199 number = {CSCW2},
1200 month = {oct},
1201 keywords = {development practices,ethical sensitivity,ethics,machine learning,training data},
1202 journal = {Proceedings of the ACM on Human-Computer Interaction},
1203 issn = {2573-0142},
1204 file = {:C\:/Users/ryw9/Box/Papers Archive/Boyd (2021) Datasheets for datasets help ML engineers notice and understand ethical issues in training data.pdf:pdf},
1205 doi = {10.1145/3479582},
1206 author = {Boyd, Karen L},
1207}
1208
1209@book{bowker1999sorting,
1210 address = {Cambridge, MA},
1211 publisher = {MIT Press},
1212 year = {1999},
1213 volume = {4},
1214 journal = {Classification and its consequences},
1215 author = {Bowker, Geoffrey and Star, Susan Leigh},
1216 title = {Sorting things out},
1217}
1218
1219@article{hoffmann2019wherefairness,
1220 file = {Hoffmann - 2019 - Where fairness fails data, algorithms, and the li.pdf:C\:\\Users\\ryw9\\Zotero\\storage\\ZE5HZ5RX\\Hoffmann - 2019 - Where fairness fails data, algorithms, and the li.pdf:application/pdf},
1221 pages = {900--915},
1222 year = {2019},
1223 month = {June},
1224 author = {Hoffmann, Anna Lauren},
1225 journal = {Information, Communication \& Society},
1226 urldate = {2022-10-14},
1227 number = {7},
1228 language = {en},
1229 doi = {10.1080/1369118X.2019.1573912},
1230 url = {https://www.tandfonline.com/doi/full/10.1080/1369118X.2019.1573912},
1231 shorttitle = {Where fairness fails},
1232 issn = {1369-118X, 1468-4462},
1233 volume = {22},
1234 title = {Where fairness fails: data, algorithms, and the limits of antidiscrimination discourse},
1235}
1236
1237@article{green2021datascience,
1238 file = {Full Text:C\:\\Users\\ryw9\\Zotero\\storage\\LTBL88EB\\Green - 2021 - Data Science as Political Action Grounding Data S.pdf:application/pdf},
1239 pages = {249--265},
1240 year = {2021},
1241 month = {September},
1242 author = {Green, Ben},
1243 journal = {Journal of Social Computing},
1244 urldate = {2022-10-14},
1245 number = {3},
1246 doi = {10.23919/JSC.2021.0029},
1247 url = {https://ieeexplore.ieee.org/document/9684742/},
1248 shorttitle = {Data {Science} as {Political} {Action}},
1249 issn = {2688-5255},
1250 volume = {2},
1251 title = {Data {Science} as {Political} {Action}: {Grounding} {Data} {Science} in a {Politics} of {Justice}},
1252}
1253
1254@inproceedings{bietti2020ethicswashing,
1255 series = {FAT* '20},
1256 location = {Barcelona, Spain},
1257 keywords = {ethics, technology ethics, technology law, AI, moral philosophy, self-regulation, regulation},
1258 numpages = {10},
1259 pages = {210–219},
1260 booktitle = {Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency},
1261 abstract = {The word 'ethics' is under siege in technology policy circles. Weaponized in support of deregulation, self-regulation or handsoff governance, "ethics" is increasingly identified with technology companies' self-regulatory efforts and with shallow appearances of ethical behavior. So-called "ethics washing" by tech companies is on the rise, prompting criticism and scrutiny from scholars and the tech community at large. In parallel to the growth of ethics washing, its condemnation has led to a tendency to engage in "ethics bashing." This consists in the trivialization of ethics and moral philosophy now understood as discrete tools or pre-formed social structures such as ethics boards, self-governance schemes or stakeholder groups.The misunderstandings underlying ethics bashing are at least threefold: (a) philosophy and "ethics" are seen as a communications strategy and as a form of instrumentalized cover-up or fa\c{c}ade for unethical behavior, (b) philosophy is understood in opposition and as alternative to political representation and social organizing and (c) the role and importance of moral philosophy is downplayed and portrayed as mere "ivory tower" intellectualization of complex problems that need to be dealt with in practice.This paper argues that the rhetoric of ethics and morality should not be reductively instrumentalized, either by the industry in the form of "ethics washing," or by scholars and policy-makers in the form of "ethics bashing." Grappling with the role of philosophy and ethics requires moving beyond both tendencies and seeing ethics as a mode of inquiry that facilitates the evaluation of competing tech policy strategies. In other words, we must resist narrow reductivism of moral philosophy as instrumentalized performance and renew our faith in its intrinsic moral value as a mode of knowledgeseeking and inquiry. Far from mandating a self-regulatory scheme or a given governance structure, moral philosophy in fact facilitates the questioning and reconsideration of any given practice, situating it within a complex web of legal, political and economic institutions. Moral philosophy indeed can shed new light on human practices by adding needed perspective, explaining the relationship between technology and other worthy goals, situating technology within the human, the social, the political. It has become urgent to start considering technology ethics also from within and not only from outside of ethics.},
1262 doi = {10.1145/3351095.3372860},
1263 url = {https://doi.org/10.1145/3351095.3372860},
1264 address = {New York, NY, USA},
1265 publisher = {Association for Computing Machinery},
1266 isbn = {9781450369367},
1267 year = {2020},
1268 title = {From Ethics Washing to Ethics Bashing: A View on Tech Ethics from within Moral Philosophy},
1269 author = {Bietti, Elettra},
1270}
1271
1272@inproceedings{mcmillan2019againstethical,
1273 series = {HTTF 2019},
1274 location = {Nottingham, United Kingdom},
1275 keywords = {Ethics, Artificial Intelligence, Policy, Human Rights, Algorithms},
1276 numpages = {3},
1277 articleno = {9},
1278 booktitle = {Proceedings of the Halfway to the Future Symposium 2019},
1279 abstract = {In this paper we use the EU guidelines on ethical AI, and the responses to it, as a starting point to discuss the problems with our community’s focus on such manifestos, principles, and sets of guidelines. We cover how industry and academia are at times complicit in ‘Ethics Washing’, how developing guidelines carries the risk of diluting our rights in practice, and downplaying the role of our own self interest. We conclude by discussing briefly the role of technical practice in ethics.},
1280 doi = {10.1145/3363384.3363393},
1281 url = {https://doi.org/10.1145/3363384.3363393},
1282 address = {New York, NY, USA},
1283 publisher = {Association for Computing Machinery},
1284 isbn = {9781450372039},
1285 year = {2019},
1286 title = {Against Ethical AI},
1287 author = {McMillan, Donald and Brown, Barry},
1288}
Attribution
arXiv:2202.08792v2 [cs.CY]. License: CC BY 4.0.