
The Fallacy of AI Functionality

Content License: cc-by

Papers is Alpha. This content is part of an effort to make research more accessible, and (most likely) has lost some details from the original. You can find the original paper here.

Introduction

As one of over 20,000 cases falsely flagged for unemployment benefit fraud by Michigan’s MIDAS algorithm, Brian Russell had to file for bankruptcy, undermining his ability to provide for his two young children. The state finally cleared him of the false charges two years later. RealPage, one of several automated tenant screening tools producing “cheap and fast—but not necessarily accurate—reports for an estimated nine out of 10 landlords across the country”, flagged Davone Jackson with a false arrest record, pushing him out of low income housing and into a small motel room with his 9-year-old daughter for nearly a year. Josiah Elleston-Burrell had his post-secondary admissions potentially revoked, Robert Williams was wrongfully arrested for a false facial recognition match, Tammy Dobbs lost critical access to healthcare benefits. The repercussions of AI-related functionality failures in high stakes scenarios cannot be overstated, and the impact reverberates in real lives for weeks, months and even years.

Despite the current public fervor over the great potential of AI, many deployed algorithmic products do not work. AI-enabled moderation tools regularly flag safe content, teacher assessment tools mark star instructors to be fired, hospital bed assignment algorithms prioritize healthy over sick patients, and medical insurance service distribution and pricing systems gatekeep necessary care-taking resources. Deployed AI-enabled clinical support tools misallocate prescriptions, misread medical images, and misdiagnose. The New York MTA’s pilot of facial recognition had a reported 100% error rate, yet the program moved forward anyway. Some of these failures have already proven to disproportionately impact some groups more than others: moderation tool glitches target minoritized groups; facial recognition tools fail on darker skinned female faces; a hospital resource allocation algorithm’s misjudgements will mostly impact Black and lower income patients. However, all of these failures in sum reveal a broader pattern of a market saturated with dysfunctional, deployed AI products.

Importantly, the hype is not limited to AI’s boosters in corporations and the technology press; scholars and policymakers often assume functionality while discussing the dangers of algorithmic systems as well. In fact, many of the current critiques, policy positions and interventions in algorithmic accountability implicitly begin from the premise that deployed algorithmic systems work, echoing narratives of super-human ability, broad applicability, and consistency espoused in corporate marketing materials, academic research papers and mainstream media. These proposals thus often fall short of acknowledging the functionality issues in AI deployments and the role that the lack of functional safety plays in the harm perpetuated by these systems. The myth of functionality is one held dearly by corporate stakeholders and their investors. If a product works, we can weigh its costs and benefits. But if the product does not work, the judgment is no longer a matter of pros and cons, but a much simpler calculation, exposing that the product does not deserve its spot on the market. Although notions of accuracy and product expectations are stakeholder-dependent and can be contested, the assessment of such claims is often easier to empirically measure, grounding the discussion of harm in a way that is challenging to repudiate.

As an overlooked aspect of AI policy, functionality is often presented as a consideration secondary to other ethical challenges. In this paper, we argue that it is a primary concern that often precedes such problems. We start by calling out what we perceive to be a functionality assumption, prevalent in much of the discourse on AI risks. We then argue that this assumption does not hold in a large set of cases. Drawing on the AI, Algorithmic and Automation Incident and Controversy Repository (AIAAIC), we offer a taxonomy of the ways in which such failures can take form and the harms they cause, which differ from the more commonly cited critiques of AI. We then discuss existing accountability tools that could address functionality issues but are often overlooked in the AI policy literature and in practice, due in large part to this assumption of functionality.

A review of past work demonstrates that although there is some acknowledgement that AI has a functionality problem, little has been done to systematically discuss the range of problems specifically associated with functionality. Recent work details that the AI research field suffers from scientific validity and evaluation problems. Researchers have demonstrated reproducibility failures in published work on predicting civil wars, and others have found that advances in machine learning often “evaporate under closer scrutiny or turn out to be less widely applicable than originally hoped.”

There is also some work demonstrating that AI products are challenging to engineer correctly in practice. In a survey of practitioners, researchers describe how developers often modify traditional software engineering practices due to unique challenges presented by ML, such as the increased effort required for testing and defining requirements. They also found that ML practitioners “tend to communicate less frequently with clients” and struggle to make accurate plans for the tasks required in the development process. Others have additionally argued that ML systems “have a special capacity for incurring technical debt.”

Other papers discuss how the AI label lends itself to inflated claims of functionality that the systems cannot meet. Several scholars critique hyped narratives pushed by the AI industry, joined by many similar domain-specific critiques. Others have recently popularized the metaphor of “snake oil” as a description of such AI products, raising concerns about the hyperbolic claims now common on the market. It has also been noted that despite the “intelligent” label, many deployed AI systems used by public agencies involve simple models defined by manually crafted heuristics. Similarly, some argue that AI makes claims to generality while modeling behaviour that is determined by highly constrained and context-specific data. In a study of actual AI policy discussions, researchers found that policymakers often define AI with respect to how human-like a system is, and concluded that this could lead to deprioritizing issues more grounded in reality.

Finally, one scholar has argued that even critics of technology often hype the very technologies that they critique, as a way of inflating the perception of their dangers. He refers to this phenomenon as “criti-hype”—criticism which both needs and feeds on hype. As an example, he points to disinformation researchers, who embrace corporate talking points of a recommendation model that can meaningfully influence consumer behavior to the point of controlling their purchases or voting activity—when in actuality, these algorithms have little ability to do either. Even the infamous Cambridge Analytica product was revealed to be “barely better than chance at applying the right [personality] scores to individuals”, and the company was explicitly accused of “selling snake oil”.

The Functionality Assumption

It is unsurprising that promoters of AI do not tend to question its functionality. More surprising is the prevalence of criti-hype in the scholarship and political narratives around automation and machine learning—even amidst discussion of valid concerns such as trustworthiness, democratization, fairness, interpretability, and safety. These fears, though legitimate, are often premature “wishful worries”—fears that can only be realized once the technology works, or works “too well”, rather than being grounded in a reality where these systems do not always function as expected. In this section, we discuss how criti-hype in AI manifests as an unspoken assumption of functionality.

The functionality of AI systems is rarely explicitly mentioned in AI principle statements, policy proposals and AI ethics guidelines. A recent review of the landscape of AI ethics guidelines found that few acknowledge the possibility of AI not working as advertised. In guidelines about preventing malfeasance, the primary concern is malicious use of supposedly functional AI products by nefarious actors. Guidelines around “trust” are geared towards eliciting trust in AI systems from users or the public, implying that trusting these AI products would be to the benefit of these stakeholders and allow AI to “fulfill its world changing potential”. Just one guideline of the hundreds reviewed in the survey “explicitly suggests that, instead of demanding understandability, it should be ensured that AI fulfills public expectations”. Similarly, the U.S. National Institute of Standards and Technology (NIST) seeks to define “trustworthiness” based primarily on how much people are willing to use the AI systems they are interacting with. This framing puts the onus on people to trust in systems, rather than on institutions to make their systems reliably operational in order to earn that trust. NIST’s concept of trust is also limited, citing the “dependability” section of ISO/IEEE/IEC standards but leaving out other critical concepts in these dependability engineering standards that represent basic functionality requirements, including assurance, claim veracity, integrity level, systematic failure, and dangerous condition. Similarly, the intergovernmental Organisation for Economic Co-operation and Development (OECD) mentions “robustness” and “trustworthy AI” in its AI principles but makes no explicit mention of expectations around basic functionality or performance assessment.

The ideal of “democratizing” AI systems, and the resulting AI innovation policy, is another effort premised on the assumed functionality of AI. This is the argument that access to AI tooling and AI skills should be expanded—with the corollary claim that it is problematic that only certain institutions, nations, or individuals have the ability to build these systems. A recent example of democratization efforts was the global push for the relaxation of oversight in data sharing in order to allow for more innovation in AI tool development in the wake of the COVID-19 pandemic. The goal of such efforts was to empower a wider range of non-AI domain experts to participate in AI tool development. This policy impact was long-lasting and informed later efforts such as the AI National Resource (AINR) effort in the US and the National Medical Imaging Platform (NMIP) executed by the National Health Service (NHS) in the UK. In this flurry of expedited activity, some parallel concerns were raised about how the new COVID-19 AI tools would adequately address cybersecurity, privacy, and anti-discrimination challenges, but the functionality and utility of the systems remained untested for some time.

An extremely premature set of concerns is that of an autonomous agent becoming so intelligent that humans lose control of the system. While it is not controversial to claim that such concerns are far from being realized, this fear of misspecified objectives, runaway feedback loops, and AI alignment presumes the existence of an industry that can get AI systems to execute on any clearly declared objective, and that the main challenge is to choose and design an appropriate goal. Needless to say, if one thinks the danger of AI is that it will work too well, it is a necessary precondition that it works at all.

The fear of hyper-competent AI systems also drives discussions on potential misuse. For example, concern expressed around large language models centers on hyped narratives of the models’ ability to generate hyper-realistic online content, which could theoretically be used by malicious actors to facilitate harmful misinformation campaigns. While these are credible threats, concerns around large language models tend to dismiss the practical limitations of what these models can achieve, neglecting to address more mundane hazards tied to the premature deployment of a system that does not work. This pattern is evident in the EU draft AI regulation, where, even as the legislation does concern functionality to a degree, the primary concerns—questions of “manipulative systems,” “social scoring,” and “emotional or biometric categorization”—“border on the fantastical”.

A major policy focus in recent years has been addressing issues of bias and fairness in AI. Fairness research is often centered around attempting to balance some notion of accuracy with some notion of fairness. This research question presumes that an unconstrained solution without fairness restrictions is the optimal solution to the problem. However, this intuition is only valid when certain conditions and assumptions are met, such as the measurement validity of the data and labels. Scholarship on fairness also sometimes presumes that unconstrained models will be optimal or at least useful. Early scholarship argued that U.S. anti-discrimination law would have difficulty addressing algorithmic bias because the “nature of data mining” means that in many cases we can assume the decision is at least statistically valid. Similarly, one early technical fairness solution was a method to remove disparate impact from a model while preserving rank, which only makes sense if the unconstrained system output is correct in the first place. Industry practitioners then carry this assumption into how they approach fairness in AI deployments. For example, audits of AI hiring tools focus primarily on ensuring that an 80% selection rate for protected classes (the so-called 4/5ths rule) is satisfied, and rarely mention product validation processes, demonstrating an assumed validity of the prediction task.
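To make the audit logic concrete, here is a minimal sketch (our own illustration, not the procedure of any particular audit firm) of the 4/5ths-rule check described above. Note that a tool can pass this check even when the underlying prediction task has no validity at all, which is exactly the gap the functionality critique points to.

```python
# Minimal sketch of the "4/5ths rule" check commonly used in hiring-tool audits.
# It compares selection rates across groups but says nothing about whether the
# underlying predictions are valid.

def selection_rate(decisions):
    """Fraction of candidates selected (decisions are 0/1)."""
    return sum(decisions) / len(decisions)

def passes_four_fifths(group_a, group_b, threshold=0.8):
    """True if the lower selection rate is at least `threshold` times the higher one."""
    rate_a, rate_b = selection_rate(group_a), selection_rate(group_b)
    lower, higher = sorted([rate_a, rate_b])
    return (lower / higher) >= threshold if higher > 0 else True

# A tool can "pass" this audit even if its scores are meaningless for the job.
protected = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]   # 40% selected
reference = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]   # 50% selected
print(passes_four_fifths(protected, reference))  # True: 0.4 / 0.5 = 0.8
```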

Another dominant theme in AI policy developments is that of explainability or interpretability. The purpose of making models explainable or interpretable differs depending on who is seen as needing to understand them. From the engineering side, interpretability is usually desired for debugging purposes, so it is focused on functionality. But on the legal or ethical side, things look different. There has been much discussion about whether the GDPR includes a “right to explanation” and what such a right entails. Those rights would serve different purposes. To the extent the purpose of explanation is to enable contestation, functionality is likely included as an aspect of the system subject to challenge. To the extent explanation is desired to educate consumers about how to improve their chances in the future, such rights are only useful when the underlying model is functional. Similarly, to the extent regulators are looking into functionality, explanations aimed at regulators can assist oversight; typically, however, explanations are desired to check the basis for decisions while assuming the systems work as intended.

Not all recent policy developments hold the functionality assumption strongly. The Food and Drug Administration (FDA) guidelines for AI systems integrated into software as a medical device (SaMD) have a strong emphasis on functional performance, clearly not taking product performance as a given. The draft AI Act in the EU includes requirements for pre-marketing controls to establish products’ safety and performance, as well as quality management for high risk systems. These mentions suggest that functionality is not always ignored outright. Sometimes it is considered in policy, but in many cases that consideration lacks the emphasis given to the other concerns presented.

The Many Dimensions of AI Dysfunction

Functionality can be difficult to define precisely. The dictionary definition of “fitness for a product’s intended use” is useful, but incomplete, as some intended uses are impossible. Functionality could also be seen as a statement that a product lives up to the vendor’s performance claims, but this, too, is incomplete; specifications chosen by the vendor could be insufficient to solve the problem at hand. Another possible definition is “meeting stakeholder expectations” more generally, but this is too broad, as it sweeps wider AI ethics concerns in with those of performance or operation.

Lacking a perfectly precise definition of functionality, in this section we invert the question by creating a taxonomy that brings together disparate notions of product failure. Our taxonomy serves several other purposes, as well. Firstly, the sheer number of points of failure we were able to identify illustrates the scope of the problem. Secondly, we offer language in which to ground future discussions of functionality in research and policy. Finally, we hope that future proposals for interventions can use this framework to concretely illustrate the way any proposed interventions might work to prevent different kinds of failure.

Methodology

To challenge the functionality assumption and demonstrate the various ways in which AI doesn’t work, we developed a taxonomy of known AI failures through the systematic review of case studies. To do this, we partly relied on the AI, Algorithmic and Automation Incident and Controversy Repository (AIAAIC) spreadsheet crowdsourced from journalism professionals. Out of a database of over 800 cases, we filtered the cases down to a spreadsheet of 283 cases from 2012 to 2021 based on whether the technology involved claimed to be AI, ML or data-driven, and whether the harm reported was due to a failure of the technology. In particular, we focused on describing the ways in which the artifact itself was connected to the failure, as opposed to infrastructural or environmental “meta” failures which caused harm through the artifact. We split up the rows in the resulting set and used an iterative tagging procedure to come up with categories that associate each example with a different element or cause of failure. We updated, merged, and grouped our tags in meetings between tagging sessions, resulting in the following taxonomy. We then chose known case studies from the media and academic literature to illustrate and best characterize these failure modes.

Failure Taxonomy

Here, we present a taxonomy of AI system failures and provide examples of known instances of harm. Many of these cases are direct refutations of specific instances of the functionality assumption discussed in the previous section.

Table: Failure Taxonomy

- Impossible Tasks: Conceptually Impossible; Practically Impossible
- Engineering Failures: Model Design Failures; Model Implementation Failures; Missing Safety Features
- Deployment Failures: Robustness Issues; Failure under Adversarial Attacks; Unanticipated Interactions
- Communication Failures: Falsified or Overstated Capabilities; Misrepresented Capabilities

Impossible Tasks

In some situations, a system is not just “broken” in the sense that it needs to be fixed. Researchers across many fields have shown that certain prediction tasks cannot be solved with machine learning. These are settings in which no specific AI developed for the task can ever possibly work, and a functionality-centered critique can be made with respect to the task more generally. Since these general critiques sometimes rely on philosophical, controversial, or morally contested grounds, the arguments can be difficult to leverage practically and may imply the need for further evidence of failure modes along the lines of our other categories.

Conceptually Impossible

Certain classes of tasks have been scientifically or philosophically “debunked” by extensive literature. In these cases, there is no plausible connection between observable data and the proposed target of the prediction task. This includes what Stark and Hutson call “physiognomic artificial intelligence,” which attempts to infer or create hierarchies about personal characteristics from data about people’s physical appearance. Criticizing the EU Act’s failure to address this inconvenient truth, critics have pointed out that “those claiming to detect emotion use oversimplified, questionable taxonomies; incorrectly assume universality across cultures and contexts; and risk ‘[taking] us back to the phrenological past’ of analysing character traits from facial structures.”

A notorious example of technology broken by definition is the attempt to infer “criminality” from a person’s physical appearance. A paper claiming to do this “with no racial bias” was announced by researchers at Harrisburg University in 2020, prompting widespread criticism from the machine learning community. In an open letter, the Coalition for Critical Technology noted that the only plausible relationship between a person’s appearance and their propensity to commit a crime is via the biased nature of the category of “criminality” itself. In this setting, there is no logical basis on which to claim functionality.

Practically Impossible

There can be other, more practical reasons why a machine learning model or algorithm cannot perform a certain task. For example, in the absence of any reasonable observable characteristics or accessible data with which to measure the model goals in question, attempts to represent these objectives end up relying on inappropriate proxies. As a matter of construct validity, the constructs of the built model could not possibly meaningfully represent those relevant to the task at hand.

Criminal justice offers a wide variety of such practically impossible tasks, often because the data needed to measure the relevant constructs is simply not available. Many predictive policing tools are arguably practically impossible AI systems. Predictive policing attempts to predict crime at either the granularity of location or at an individual level. The data that would be required to do the task properly—accurate data about when and where crimes occur—does not and will never exist. While crime is a concept with a fairly fixed definition, it is practically impossible to predict because of structural problems in data collection. The problems with crime data are well-documented—whether differential victim crime reporting rates, selection bias introduced by policing activity, dirty data from periods of recorded unlawful policing, and more.

Some of the constructs targeted by criminal justice algorithms are also thick concepts—concepts that describe and evaluate simultaneously—which ideally are co-produced but are typically based on the value judgements of researchers. Common focus areas such as “likelihood of flight,” “public safety,” or “dangerousness” represent thick concepts. A universal or legally valid version of these constructs may not be possible, making them potentially conceptually impossible as well.

Due to upstream policy, data or societal choices, AI tasks can be practically impossible for one set of developers and not for another, or for different reasons in different contexts. The fragmentation, billing focus, and competing incentives of the US healthcare system have made multiple healthcare-related AI tasks practically impossible. US EHR data is often erroneous, miscoded, fragmented, and incomplete, creating a mismatch between available data and intended use. Many of these challenges appeared when IBM attempted to support cancer diagnoses. In one instance, this meant using synthetic rather than real patients for oncology prediction data, leading to “unsafe and incorrect” recommendations for cancer treatments. In another, IBM worked with MD Anderson on leukemia patient records but struggled to extract reliable insights from time-dependent information like therapy timelines—the components of care most likely to be mixed up in fragmented doctors’ notes.

Engineering Failures

Algorithm developers maintain enormous discretion over a host of decisions, and make choices throughout the model development lifecycle. These engineering choices include defining problem formulation, setting up evaluation criteria, and determining a variety of other details. Failures in AI systems can often be traced to these specific policies or decisions in the development process of the system.

Model Design Failures

Sometimes, the design specifications of a model are inappropriate for the task it is being developed for. For instance, in a classification model, choices such as which input and target variables to use, whether to prioritize catching true positives or limiting false positives, and how to process the training data all factor into determining model outcomes. These choices are normative and may prioritize values such as efficiency over preventing harmful failures. In 2014, BBC Panorama uncovered evidence of international students systematically cheating on English language exams administered in the UK by the Educational Testing Service (ETS), by having others take the exam for them. The Home Office began an investigation and campaign to cancel the visas of anyone who was found to have cheated. In 2015, ETS used voice recognition technology to identify this type of cheating. According to the National Audit Office,

ETS identified 97% of all UK tests as “suspicious”. It classified 58% of 58,459 UK tests as “invalid” and 39% as “questionable”. The Home Office did not have the expertise to validate the results nor did it, at this stage, get an expert opinion on the quality of the voice recognition evidence. … but the Home Office started cancelling visas of those individuals given an “invalid” test.

The staggering number of accusations obviously included a number of false positives. The accuracy of ETS’s method was disputed between experts sought by the National Union of Students and the Home Office; the resulting estimates of error rates ranged from 1% to 30%. Yet out of 12,500 people who appealed their immigration decisions, only 3,600 won their cases—and only a fraction of these were won through actually disproving the allegations of cheating. This highly opaque system was thus notable for the disproportionate emphasis placed on finding cheaters rather than protecting those who were falsely accused. Although we cannot be sure the voice recognition model was trained to optimize for sensitivity rather than specificity, as the head of the NAO aptly put it, “When the Home Office acted vigorously to exclude individuals and shut down colleges involved in the English language test cheating scandal, we think they should have taken an equally vigorous approach to protecting those who did not cheat but who were still caught up in the process, however small a proportion they might be”. This is an example of a system that was not designed to prevent a particular type of harmful failure.
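For illustration only, the rough arithmetic below uses the NAO’s published totals together with the disputed range of expert error-rate estimates; the true false-positive rate is unknown, so this is a back-of-the-envelope sweep rather than an official analysis. It shows how even a seemingly modest error rate translates into hundreds or thousands of false accusations at this scale.

```python
# Illustrative arithmetic, not an official analysis: the true false-positive
# rate is disputed, so we sweep over the range of expert estimates (1% to 30%).

total_uk_tests = 58_459                 # UK tests classified by ETS (NAO figures)
flagged = round(total_uk_tests * 0.97)  # 97% were labelled "suspicious" in some form

for false_positive_rate in (0.01, 0.10, 0.30):
    falsely_accused = round(flagged * false_positive_rate)
    print(f"error rate {false_positive_rate:.0%}: ~{falsely_accused:,} people falsely flagged")
```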

Model Implementation Failures

Even if a model was conceptualized in a reasonable way, some component of the system downstream from the original plan can be executed badly, lazily, or incorrectly. In 2011, the state of Idaho attempted to build an algorithm to set Medicaid assistance limits for individuals with developmental and intellectual disabilities. When individuals reported sudden drastic cuts to their allowances, the ACLU of Idaho tried to find out how the allowances were being calculated, only to be told it was a trade secret. The class action lawsuit that followed resulted in a court-ordered disclosure of the algorithm, which was revealed to have critical flaws. According to Richard Eppink, Legal Director of the ACLU of Idaho,

There were a lot of things wrong with it. First of all, the data they used to come up with their formula for setting people’s assistance limits was corrupt. They were using historical data to predict what was going to happen in the future. But they had to throw out two-thirds of the records they had before they came up with the formula because of data entry errors and data that didn’t make sense.

Data validation is a critical step in the construction of an ML system, and the team that built the benefit system chose to use a highly problematic dataset to train their model. For this reason, we consider this to be an implementation failure.

Another way that failures can be attributed to poor implementation is when a testing framework was not appropriately implemented. One area in which a lack of sufficient testing has been observed in the development of AI is clinical medicine. A systematic review examined the methods and claims of studies which compared the performance of diagnostic deep learning computer vision algorithms against that of expert clinicians, identifying 10 randomized clinical trials and 81 non-randomized clinical trials. Of the 81 non-randomized studies, the reviewers found that the median number of clinical experts compared against the AI was 4, that full access to datasets and code was unavailable in over 90% of studies, that the overall risk of bias was high, and that adherence to reporting standards was suboptimal; the studies therefore poorly substantiated their claims. Similarly, the Epic sepsis prediction model, a product actually implemented at hundreds of hospitals, was recently externally validated by outside researchers, who found that the model had poor calibration to other hospital settings and discriminated against under-represented demographics. These results suggest that the model’s testing prior to deployment may have been insufficient to estimate its real-world performance. Notably, the COVID-19 technology which resulted from the innovation policy and democratization efforts mentioned in the section on the functionality assumption was only shown after the fact to be completely unsuitable for clinical deployment.
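As a sketch of what external validation looks like in practice, the snippet below trains a generic logistic regression on synthetic "development hospital" data and then scores it against a synthetic "external hospital" with a shifted case mix. This is our own illustration, not the Epic model or the cited study; the point is simply that discrimination and calibration metrics measured only at the development site can badly overstate real-world performance elsewhere.

```python
# Minimal external-validation sketch on synthetic data (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(0)

def make_site(n, shift=0.0):
    """Synthetic patients; `shift` mimics a different case mix / coding practice."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 5))
    logits = X @ np.array([1.0, -0.5, 0.8, 0.0, 0.3]) - shift * 2.0
    y = rng.binomial(1, 1 / (1 + np.exp(-logits)))
    return X, y

X_dev, y_dev = make_site(5000)             # development hospital
X_ext, y_ext = make_site(5000, shift=0.7)  # external hospital

model = LogisticRegression().fit(X_dev, y_dev)
for name, (X, y) in {"internal": (X_dev, y_dev), "external": (X_ext, y_ext)}.items():
    p = model.predict_proba(X)[:, 1]
    print(name, "AUC:", round(roc_auc_score(y, p), 3),
          "Brier:", round(brier_score_loss(y, p), 3))
# The calibration (Brier score) degrades at the external site even though the
# model looked reasonable where it was developed.
```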

Missing Safety Features

Sometimes model failures are anticipated yet difficult to prevent; in such cases, engineers can take steps to ensure these points of failure will not cause harm. In 2014, a Nest Labs smoke and carbon monoxide detector was recalled. The detector had a feature which allowed the user to turn it off with a “wave” gesture. However, the company discovered in testing that under certain circumstances, the sensor could be unintentionally deactivated. Detecting a wave gesture with complete accuracy is impossible, and Google acknowledges factors that contribute to the possibility of accidental wave triggering for its other home products. However, the lack of a failsafe to make sure the carbon monoxide detector could not be turned off accidentally made the product dangerous.
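As an illustration of the kind of failsafe that was missing, here is a hypothetical guard (our own sketch, not Nest's actual design, and the threshold value is assumed): the unreliable gesture classifier is only allowed to hush nuisance alarms, never to override detection while sensor readings indicate danger.

```python
# Sketch of a failsafe pattern (assumed design, not Nest's actual firmware):
# an imperfect gesture classifier may silence the alarm briefly, but a hard
# safety rule keeps it from disabling detection while danger is present.

CO_DANGER_PPM = 70  # hypothetical threshold for sustained CO exposure

def handle_wave_gesture(gesture_confidence: float, co_ppm: float) -> str:
    """Decide what a detected 'wave' is allowed to do."""
    if co_ppm >= CO_DANGER_PPM:
        # Failsafe: no gesture, however confident, can mute an active CO alarm.
        return "alarm stays on"
    if gesture_confidence >= 0.9:
        return "hush nuisance alarm for 2 minutes"
    return "ignore ambiguous gesture"

print(handle_wave_gesture(gesture_confidence=0.95, co_ppm=120))  # alarm stays on
print(handle_wave_gesture(gesture_confidence=0.95, co_ppm=0))    # hush nuisance alarm
```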

In the same way, the National Transportation Safety Board (NTSB) cited a lack of adequate safety measures—such as “a warning/alert when the driver’s hands are off the steering wheel”, “remote monitoring of vehicle operators” and even the companies’ “inadequate safety culture”—as the probable causes in at least two highly publicized fatal crashes of Uber and Tesla self-driving cars. As products in public beta-testing, this lack of functional safeguards was considered to be an even more serious operational hazard than any of the engineering failures involved (such as the vehicle’s inability to detect an incoming pedestrian or truck).

This category also encompasses algorithmic decision systems in critical settings that lack a functional appeals process. This has been a recurring feature in algorithms which allocate benefits on behalf of the government. Not all of these automated systems rely on machine learning, but many have been plagued by bugs and faulty data, resulting in the denial of critical resources owed to citizens. In the case of the Idaho data-driven benefit allocation system, even the people responsible for reviewing appeals were unable to act as a failsafe for the algorithm’s mistakes: “They would look at the system and say, ‘It’s beyond my authority and my expertise to question the quality of this result’”.

Deployment Failures

Sometimes, despite attempts to anticipate failure modes during the design phase, the model does not “fail” until it is exposed to certain external factors and dynamics that arise after it is deployed.

Robustness Issues

A well-documented source of failure is a lack of robustness to changing external conditions. Researchers have observed that the benchmarking methods used for evaluation in machine learning can suffer from both internal and external validity problems, where “internal validity refers to issues that arise within the context of a single benchmark” and “external validity asks whether progress on a benchmark transfers to other problems.” If a model is developed in a certain context without strong evaluation methods for external validity, it may perform poorly when exposed to real-world conditions that were not captured by the original context. For instance, while many computer vision models developed on ImageNet are tested on synthetic image perturbations in an attempt to measure and improve robustness, researchers have found that these models are not robust to real-world distribution shifts such as a change in lighting or pose. Robustness issues are also of dangerous consequence in language models. For example, when large language models are used to process the queries of AI-powered web search, the models’ fragility to misspellings or trivial changes to format and context can lead to unexpected results. In one case, a large language model used in Google search could not adequately handle cases of negation, and so when queried with “what to do when having a seizure”, the model alarmingly sourced the information for what not to do, unable to differentiate between the two cases.
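The negation failure is easy to reproduce with any model that treats text as an unordered bag of words. The toy sketch below is our own illustration (not the production search system): it scores a "what not to do" document against both the affirmative and the negated query and cannot meaningfully separate them.

```python
# Toy sketch: a bag-of-words representation ignores word order and negation, so
# it scores the affirmative and negated queries against the same document
# almost identically.
from collections import Counter

def bag_of_words_similarity(a: str, b: str) -> float:
    """Cosine-like overlap between token count vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    shared = sum((ca & cb).values())
    total = (sum(ca.values()) * sum(cb.values())) ** 0.5
    return shared / total

doc = "list of things not to do during a seizure"
print(bag_of_words_similarity("what to do when having a seizure", doc))
print(bag_of_words_similarity("what not to do when having a seizure", doc))
# Both queries match the "what not to do" document with similar scores; the
# representation cannot tell the two intents apart.
```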

Failure under Adversarial Attacks

Failures can also be induced by the actions of an adversary—an actor deliberately trying to make the model fail. Real-world examples of this often appear in the context of facial recognition, where there is some evidence that adversaries can fool face-detection systems with, for example, 3D-printed masks or software-generated makeup. Machine learning researchers have studied what they call “adversarial examples,” or inputs that are designed to make a machine learning model fail. However, some of this research has been criticized for its lack of a believable threat model—in other words, for not focusing on what real-world “adversaries” are actually likely to do.

Unanticipated Interactions

A model can also fail to account for uses or interactions that it was not initially conceived to handle. Even if an external actor or user is not deliberately trying to break a model, their actions may induce failure if they interact with the model in a way that was not planned for by the model’s designers. For instance, there is evidence that this happened at the Las Vegas Police Department:

As new records about one popular police facial recognition system show, the quality of the probe image dramatically affects the likelihood that the system will return probable matches. But that doesn’t mean police don’t use bad pictures anyway. According to documents obtained by Motherboard, the Las Vegas Metropolitan Police Department (LVMPD) used “non-suitable” probe images in almost half of all the facial recognition searches it made last year, greatly increasing the chances the system would falsely identify suspects, facial recognition researchers said.

This aligns with reports about other police departments inappropriately uploading sketches and celebrity photos to facial recognition tools. It is possible for designers to preempt misuse by implementing instructions, warnings, or error conditions, and failure to do so creates a system that does not function properly.

Communication Failures

As with other areas of software development, roles in AI development and deployment are becoming more specialized. Some roles focus on managing the data that feeds into models, others specialize in modeling, and others optimally engineer models for speed and scale. There are even those in “analytics translator” roles – managers dedicated to acting as communicators between data science work and non-technical business leaders. And, of course, there are salespeople. Throughout this chain of actors, potential miscommunications or outright lies can happen about the performance, functional safety or other aspects of deployed AI/ML systems. Communication failures often co-occur with other functional safety problems, and the lack of accountability for false claims – intentional or otherwise – makes these particularly pernicious and likely to occur as AI hype continues absent effective regulation.

Falsified or Overstated Capabilities

To pursue commercial or reputational interests, companies and researchers may explicitly make claims about models which are provably untrue. A common form of this is the claim that a product is “AI” when in fact it mainly involves humans making decisions behind the scenes. While this in and of itself may not create unsafe products, expectations based on unreasonable claims can create unearned trust, and a potential over-reliance that hurts parties who purchase the product. As an example, investors poured money into ScaleFactor, a startup that claimed to have AI that could replace accountants for small businesses, with the exciting (for accountants) tagline “Because evenings are for families, not finance”. Under the hood, however,

Instead of software producing financial statements, dozens of accountants did most of it manually from ScaleFactor’s Austin headquarters or from an outsourcing office in the Philippines, according to former employees. Some customers say they received books filled with errors, and were forced to re-hire accountants, or clean up the mess themselves.

Even large, well-funded entities misrepresent the capabilities of their AI products. Deceptively constructed evaluation schemes allow AI product creators to make false claims. In 2018, Microsoft claimed to have created machine translation with “equal accuracy to humans in Chinese to English translations”. However, the study used to make this claim (still prominently displayed in press release materials) was quickly debunked by a series of outside researchers, who found that at the document level, when provided with context from nearby sentences, and/or compared to human experts, the machine translation model did not in fact achieve equal accuracy to human translators. This follows a pattern seen with machine learning products in general, where the advertised performance on a simple and static benchmark is much higher than the performance on the often more complex and diverse data encountered in practice.

Misrepresented Capabilities

A simple way to deceive customers into using prediction services is to sell the product for a purpose you know it cannot reliably be used for. In 2018, the ACLU of Northern California revealed that Amazon effectively misrepresented capabilities to police departments in selling its facial recognition product, Rekognition. Building on previous work, the ACLU ran Rekognition with a database of mugshots against members of U.S. Congress using the default settings and found 28 members falsely matched within the database, with people of color making up a disproportionate share of these errors. This result was echoed by other researchers months later. Amazon responded by claiming that for police use cases, the threshold for the service should be set at either 95% or 99% confidence. However, based on a detailed timeline of events, it is clear that in selling the service through blog posts and other campaigns, thresholds were set at 80% or 85% confidence, as the ACLU had used in its investigation. In fact, suggestions to shift that threshold were buried in manuals end-users did not read or use—even when working in partnership with Amazon. At least one of Amazon’s police clients also claimed to be unaware of needing to modify the default threshold.
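The effect of the confidence threshold is easy to see with simulated scores. The sketch below is purely hypothetical: the score distribution is invented and does not reproduce Rekognition or the ACLU test. It simply shows why a default threshold of 80% produces many more spurious "matches" than the 99% setting Amazon later said police should use.

```python
# Hypothetical sketch of why the confidence threshold matters: with a low
# default threshold, many coincidental similarity scores become "matches."
# The scores below are synthetic and purely illustrative.
import random

random.seed(0)
# Simulated similarity scores for people who are NOT in the mugshot database.
non_match_scores = [random.gauss(0.70, 0.08) for _ in range(535)]  # e.g., 535 members of Congress

for threshold in (0.80, 0.95, 0.99):
    false_matches = sum(score >= threshold for score in non_match_scores)
    print(f"threshold {threshold:.0%}: {false_matches} false matches")
```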

The hype surrounding IBM’s Watson in healthcare represents another example where a product that may have been fully capable of performing specific helpful tasks was sold as a panacea to health care’s ills. As discussed earlier, this is partially the result of functional failures like practical impossibility—but these failures were coupled with deceptively exaggerated claims. The backlash to this hype has been swift in recent years, with one venture capitalist claiming “I think what IBM is excellent at is using their sales and marketing infrastructure to convince people who have asymmetrically less knowledge to pay for something”. After $62 million spent and many years of effort, MD Anderson famously cancelled its IBM Watson contracts with no results to show for it. This is particularly a problem in the context of algorithms developed by public agencies—where AI systems can be adopted as symbols of progress, or smokescreens for undesirable policy outcomes, and are thus liable to inflated narratives of performance. One scholar discusses how the celebrated success of “self-driving shuttles” in Columbus, Ohio omits their marked failure in the lower-income Linden neighborhood, where residents were locked out of the accompanying transportation apps due to a lack of access to a bank account, credit cards, a data plan or Wi-Fi. Similarly, another demonstrates how a $1.4 billion contract with a coalition of high-tech companies led an Indiana governor to stubbornly continue a welfare automation algorithm that resulted in a 54% increase in denials of welfare applications.

Dealing With Dysfunction: Opportunities for Intervention on Functional Safety

The challenge of dealing with an influx of fraudulent or dysfunctional products is one that has plagued many industries, including food safety, medicine, financial modeling, civil aviation and the automobile industry. In many cases, it took the active advocacy of concerned citizens to bring about the policy interventions that effectively changed the tide in these industries. The AI field now seems to be facing this same challenge.

Thankfully, as AI operates as a general purpose technology prevalent in many of these industries, there already exists a plethora of governance infrastructure to address this issue in related fields of application. In fact, healthcare is the field where AI product failures appear to be the most visible, in part due to the rigor of pre-established evaluation processes. Similarly, the transportation industry has a rich history of thorough accident reports and investigations, through organizations such as the National Transportation Safety Board (NTSB), which has already been responsible for assessing the damage from the few known cases of self-driving car crashes involving Uber and Tesla vehicles.

In this section, we specifically outline the legal and organizational interventions necessary to address functionality issues in the general context in which AI is developed and deployed into the market. In broader terms, the concept of functional safety in the engineering design literature well encapsulates the concerns articulated in this paper—namely, that a system can be deployed without working very well, and that such performance issues can cause harm worth preventing.

Legal/Policy Interventions

The law has several tools at its disposal to address product failures to work correctly. They mostly fall in the category of consumer protection law. This discussion will be U.S.-based, but analogues exist in most jurisdictions.

Consumer Protection

The Federal Trade Commission is the federal consumer protection agency within the United States with the broadest subject matter jurisdiction. Under Section 5 of the FTC Act, it has the authority to regulate “unfair and deceptive acts or practices” in commerce. This is a broad grant of authority to regulate practices that injure consumers. The authority to regulate deceptive practices applies to any material misleading claims relating to a consumer product. The FTC need not show intent to deceive or that deception actually occurred, only that claims are misleading. Deceptive claims can be expressed explicitly—for example, a representation in the sales materials that is inaccurate—or implied, such as an aspect of the design that suggests a functionality the product lacks. Many of the different failures, especially impossibility, can trigger a deceptive practices claim.

The FTC’s ability to address unfair practices is wider-ranging but more controversial. The FTC can reach any practice “likely to cause substantial injury to consumers[,] not reasonably avoidable by consumers themselves and not outweighed by countervailing benefits to consumers”. Thus, where dysfunctional AI is being sold and its failures cause substantial harm to consumers, the FTC could step in. Based on the FTC’s approach to data security, in which the Commission has sued companies for failing to adequately secure consumer data in their possession against unknown third-party attackers, even post-deployment failures—if foreseeable and harmful—can be included among unfair practices, though they are partially attributable to external actors.

The FTC can use this authority to seek an injunction, requiring companies to cease the practice. Formally, the FTC does not have the power to issue fines under its Section 5 authority, but the Commission frequently enters into long-term consent decrees with companies that it sues, permitting continuing jurisdiction, monitoring, and fines for future violations. The Commission does not have general rulemaking authority, so most of its actions to date have taken the form of public education and enforcement. The Commission does, however, have authority to make rules regarding unfair or deceptive practices under the Magnuson-Moss Warranty Act. Though it has created no new rules since 1980, in July 2021, the FTC voted to change internal agency policies to make it easier to do so.

Other federal agencies also have the ability to regulate faulty AI systems, depending on their subject matter. The Consumer Product Safety Commission governs the risks of physical injury due to consumer products. It can create mandatory standards for products, require certifications of adherence to those rules, and investigate products that have caused harm, leading to bans or mandatory recalls. The National Highway Traffic Safety Administration offers similar oversight for automobiles specifically. The Consumer Financial Protection Bureau can regulate harms from products dealing with loans, banking, or other consumer finance issues.

In addition to various federal agencies, all states have consumer protection statutes that bar deceptive practices and many bar unfair practices as well, like the FTC Act. False advertising laws are related and also common. State attorneys general often take active roles as enforcers of those laws. Of course, the efficacy of such laws varies from state to state, but in principle, they become another source of law and enforcement to look to for the same reasons that the FTC can regulate under Section 5. One particular state law worth noting is California’s Unfair Competition Law, which allows individuals to sue for injunctive relief to halt conduct that violates other laws, even if individuals could not otherwise sue under that law.

It is certainly no great revelation that federal and state regulatory apparatuses exist. Rather, our point is that while concerns about discrimination and due process can lead to difficult questions about the operation of existing law and proposals for legal reform, thinking about the ways that AI is not working makes it look like other product failures that we know how to address. Where AI doesn’t work, suddenly regulatory authority is easy to find.

Products Liability Law

Another avenue for legal accountability may come from the tort of products liability, though there are some potential hurdles. In general, if a person is injured by a defective product, they can sue the producer or seller in products liability. The plaintiff need not have purchased or used the product; it is enough that they were injured by it, and the product has a defect that rendered it unsafe.

It would stand to reason that a functionality failure in an AI system could be deemed a product defect. But surprisingly, defective software has never led to a products liability verdict. One commonly cited reason is that products liability applies most clearly to tangible things, rather than information products, and that aside from a stray comment in one appellate case, no court has actually ruled that software is even a “product” for these purposes. This would likely not be a problem for software that resides within a physical system, but for non-embodied AI, it might pose a hurdle. In a similar vein, because most software harms have typically been economic in nature, with, for example, a software crash leading to a loss of work product, courts have rejected these claims as “pure economic loss” belonging more properly in contract law than tort. But these mostly reflect courts’ anxiety with intangible injuries, and as AI discourse has come to recognize many concrete harms, these concerns are less likely to be hurdles going forward.

Writing about software and tort law, Choi identifies the complexity of software as a more fundamental type of hurdle. For software of nontrivial complexity, it is provably impossible to guarantee bug-free code. An important part of products liability is weighing the cost of improvements and more testing against the harms. But as no amount of testing can guarantee bug-free software, it will be difficult to determine how much testing is enough to be considered reasonable or non-negligent. Choi analogizes this issue to car crashes: car crashes are inevitable, but courts developed the idea of crashworthiness to ask about the car’s contribution to the total harm, even when the initial crash was not attributable to a product defect. While Choi looks to crashworthiness as a solution, the thrust of his argument is that software can cause exactly the type of injury that products liability aims to protect us from, and doctrine should reflect that.

While algorithmic systems have a similar sort of problem, the failures we describe here are more basic. Much as writing bug-free software is impossible, creating a model that handles every corner case perfectly is impossible. But the failures we address here are not about unforeseeable corner cases in models. We are concerned with easier questions of basic functionality, without which a system should never have been shipped. If a system is not functional, in the sense we describe, a court should have no problem finding that it is unreasonably defective. As discussed above, a product could be placed on the market claiming the ability to do something it cannot achieve in theory or in practice, or it can fail to be robust to unanticipated but foreseeable uses by consumers. Even where these errors might be difficult to classify in doctrinally rigid categories of defect, courts have increasingly been relying on the “malfunction doctrine,” which allows circumstantial evidence to be used as proof of defect where “a product fails to perform its manifestly intended function,” and that doctrine could apply here. Products liability could apply especially easily to engineering failures, where the error was foreseeable and an alternative, working version of the product should have been built.

Warranties

Another area of law implicated by product failure is warranty law, which protects the purchasers of defunct AI and certain third parties who stand to benefit from the sale. Sales of goods typically come with a set of implied warranties. The implied warranty of merchantability applies to all goods and states, among other things, that the good is “fit for the ordinary purposes for which such goods are used”. The implied warranty of fitness for a particular purpose applies when a seller knows that the buyer has a specific purpose in mind and the buyer is relying on the seller’s skill or judgment about the good’s fitness, stating that the good is fit for that purpose. Defunct AI will breach both of these warranties. The remedy for such a breach is limited to contract damages. This area of law is concerned with ensuring that purchasers get what they pay for, so compensation will be limited roughly to the value of the sale. Injuries not related to the breach of contract are meant to be worked out in tort law, as described above.

Fraud

In extreme cases, the sale of defunct AI may constitute fraud. Fraud has many specific meanings in law, but invariably it involves a knowing or intentional misrepresentation that the victim relied on in good faith. In contract law, proving that a person was defrauded can lead to contract damages. Restitution is another possible remedy for fraud. In tort law, a claim of fraud can lead to compensation necessary to rectify any harms that come from the fraud, as well as punitive damages in egregious cases. Fraud is difficult to prove, and our examples do not clearly indicate fraud, but it is theoretically possible if someone is selling snake oil. Fraud can lead to criminal liability as well.

Finally, other areas of law that are already involved in the accountability discussion, such as discrimination and due process, become much easier cases to make when the AI doesn’t work. Disparate impact law requires that the AI tool used be adequately predictive of the desired outcome, before even getting into the question of whether it is too discriminatory or not. A lack of construct validity would easily subject a model’s user to liability. Due process requires decisions to not be arbitrary, and AI that doesn’t work loses its claim to making decisions on a sound basis. Where AI doesn’t work, legal cases in general become easier.

Organizational interventions

In addition to legal levers, there are many organizational interventions that can be deployed to address the range of functionality issues discussed. Due to clear conflicts of interest, the self-regulatory approaches described below are far from adequate oversight for these challenges, and the presence of regulation does much to incentivise organizations to take these actions in the first place. They do, however, provide an immediate path forward in addressing these issues.

Internal Audits & Documentation

After similar crises of performance in fields such as aerospace, finance and medicine, such processes evolved in those industries to enforce a new level of introspection in the form of internal audits. Taking the form of anything from documentation exercises to challenge datasets as benchmarks, these processes raised the bar for deployment criteria and matured the product development pipeline in the process. The AI field could certainly adopt similar techniques for increasing the scrutiny of their systems, especially given the nascent state of reflection and standardization common in ML evaluation processes. For example, the “Failure modes, effects, and diagnostic analysis” (FMEDA) documentation process from the aerospace industry could support the identification of functional safety issues prior to AI deployment, in addition to other resources from aerospace (such as functional hazard analyses (FHA) or Functional Design Assurance Levels (FDALs)).

Ultimately, internal audits are a self-regulatory approach—though audits conducted by independent second parties such as a consultancy firm could provide a fresh perspective on quality control and performance in reference to articulated organizational expectations. The challenge with such audits, however, is that the results are rarely communicated externally and disclosure is not mandatory, nor is it incentivized. As a result, assessment outcomes are mainly for internal use only, often just to set internal quality assurance standards for deployment and prompt further engineering reflection during the evaluation process.

Product Certification & Standards

A trickier intervention is the avenue of product certification and standards development for AI products. This concept has already made its way into AI policy discourse; CEN (European Committee for Standardisation) and CENELEC (European Committee for Electrotechnical Standardisation), two of the three European Standardisation Organisations (ESOs), were heavily involved in the creation of the EU’s draft AI Act. On the U.S. front, industry groups IEEE and ISO regularly shape conversations, with IEEE going so far as to attempt the development of a certification program. In the aviation industry, much of the establishment of engineering standards happened without active government intervention, between industry peers. These efforts resemble the Partnership on AI’s attempt to establish norms on model documentation processes. Collective industry-wide decision-making on critical issues can raise the bar for the entire industry and raise awareness within the industry of the importance of handling functionality challenges. Existing functional safety standards from the automobile (ISO 26262), aerospace (US RTCA DO-178C), defense (MIL-STD-882E) and electronics (IEC 61508 / IEC 61511) industries, amongst others, can provide a template for how to approach this challenge within the AI industry.

Other Interventions

Several other organizational factors can determine and help assess the functional safety of a system. Clients deciding which projects to select or approve for purchase can set performance-related requirements for procurement and leverage the procurement process to set expectations for functionality. Similarly, cultural expectations for safety and engineering responsibility shape the quality of what the product development process delivers; setting these expectations internally and fostering a healthy safety culture can increase cooperation on other industry-wide and organizational measures. Finally, because functionality is a safety risk aligned with profit-oriented goals, many model logging and evaluation operations tools are available for organizations to use in the internal inspection of their systems, including tools for more continuous monitoring of deployed systems (a hedged sketch of such monitoring follows below).
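As one illustration of what such continuous monitoring might look like in practice, the minimal sketch below (not any specific vendor's tooling; the class name, window size and tolerance are assumptions) logs prediction outcomes as delayed ground truth arrives and raises an alert once a rolling error rate breaches a tolerance agreed during procurement or internal audit.

```python
# Illustrative continuous-monitoring sketch (hypothetical, not a vendor tool):
# track a rolling error rate over deployed predictions and flag when it
# exceeds the tolerance the organization committed to before deployment.
from collections import deque


class RollingErrorMonitor:
    def __init__(self, window: int = 1000, max_error_rate: float = 0.05):
        self.outcomes = deque(maxlen=window)   # 1 = error, 0 = correct
        self.max_error_rate = max_error_rate

    def record(self, prediction, ground_truth) -> None:
        """Log one labelled outcome once ground truth becomes available."""
        self.outcomes.append(int(prediction != ground_truth))

    def error_rate(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def alert(self) -> bool:
        """True when a full window of outcomes breaches the agreed tolerance."""
        return (len(self.outcomes) == self.outcomes.maxlen
                and self.error_rate() > self.max_error_rate)


# Usage: feed labelled outcomes in as they arrive from the deployed system.
monitor = RollingErrorMonitor(window=500, max_error_rate=0.02)
monitor.record(prediction=1, ground_truth=0)
if monitor.alert():
    print("Escalate: deployed model no longer meets its functional specification")
```

The substance of such a check is organizational rather than technical: someone must own the tolerance, watch the alerts, and have the authority to pull the product when it stops working.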

Conclusion: The Road Ahead

We cannot take for granted that AI products work. Buying into the presented narrative of a product with at least basic utility, or of an industry that will soon enough "inevitably" overcome known functional issues, causes us to miss important sources of harm and the legal and organizational remedies already available. Although functionality issues are not completely ignored in AI policy, the lack of awareness of the range of ways in which these issues arise leads to the problems being inadequately emphasized and poorly addressed by the full scope of accountability tools available.

The fact that faulty AI products are on the market today makes this problem particularly urgent. Poorly vetted products permeate our lives, and while many readily accept the potential for harms as a tradeoff, the claims of the products’ benefits go unchallenged. But addressing functionality involves more than calling out demonstrably broken products. It also means challenging those who develop AI systems to better and more honestly understand, explore, and articulate the limits of their products prior to their release into the market or public use. Adequate assessment and communication of functionality should be a minimum requirement for mass deployment of algorithmic systems. Products that do not function should not have the opportunity to affect people’s lives.

We thank the Mozilla Foundation and the Algorithmic Justice League for providing financial support during this project.

1463safe and fair implementation in society. But is it possible to
1464agree on what is `ethical AI'? A detailed analysis of 84 AI
1465ethics reports around the world, from national and international
1466organizations, companies and institutes, explores this question,
1467finding a convergence around core principles but substantial
1468divergence on practical implementation.},
1469  author = {Jobin, Anna and Ienca, Marcello and Vayena, Effy},
1470  title = {The global landscape of {AI} ethics guidelines},
1471}
1472
1473@misc{noonetrustai-xo,
1474  note = {Accessed: 2022-1-6},
1475  howpublished = {\url{https://cpr.unu.edu/publications/articles/ai-global-governance-no-one-should-trust-ai.html}},
1476  title = {{AI} \& Global Governance: No One Should Trust {AI} - United
1477Nations University Centre for Policy Research},
1478  author = {Bryson,Joanna},
1479}
1480
1481@unpublished{Stanton2021-oa,
1482  year = {2021},
1483  month = {March},
1484  author = {Stanton, Brian and Jensen, Theodore},
1485  title = {Trust and Artificial Intelligence},
1486}
1487
1488@misc{aclu-comment-trust,
1489  author = {ACLU},
1490  month = {September},
1491  year = {2021},
1492  note = {Accessed: 2022-1-6},
1493  howpublished = {\url{https://www.aclu.org/letter/aclu-comment-nists-proposal-managing-bias-ai}},
1494  title = {{ACLU} Comment on {NIST's} Proposal for Managing Bias in {AI}},
1495}
1496
1497@article{ieee_dictionary_dependability,
1498  keywords = {Standards;IEEE Standards;Patents;Software
1499measurement;Dictionaries;Warranties;Trademarks;availability;dependability;maintainability;and
1500reliability},
1501  year = {2006},
1502  month = {May},
1503  pages = {1--41},
1504  author = {IEEE},
1505  journal = {IEEE Std 982. 1-2005 (Revision of IEEE Std 982. 1-1988)},
1506  abstract = {A Standard Dictionary of Measures of the Software Aspects of
1507Dependability for assessing and predicting the reliability,
1508maintainability, and availability of any software system; in
1509particular, it applies to mission critical software systems.},
1510  title = {{IEEE} Standard Dictionary of Measures of the Software Aspects of
1511Dependability},
1512}
1513
1514@inproceedings{Passi2019-av,
1515  location = {Atlanta, GA, USA},
1516  keywords = {Problem Formulation, Machine Learning, Fairness, Data Science,
1517Target Variable},
1518  address = {New York, NY, USA},
1519  year = {2019},
1520  month = {January},
1521  series = {FAT* '19},
1522  pages = {39--48},
1523  publisher = {Association for Computing Machinery},
1524  abstract = {Formulating data science problems is an uncertain and difficult
1525process. It requires various forms of discretionary work to
1526translate high-level objectives or strategic goals into
1527tractable problems, necessitating, among other things, the
1528identification of appropriate target variables and proxies.
1529While these choices are rarely self-evident, normative
1530assessments of data science projects often take them for
1531granted, even though different translations can raise profoundly
1532different ethical concerns. Whether we consider a data science
1533project fair often has as much to do with the formulation of the
1534problem as any property of the resulting model. Building on six
1535months of ethnographic fieldwork with a corporate data science
1536team---and channeling ideas from sociology and history of
1537science, critical data studies, and early writing on knowledge
1538discovery in databases---we describe the complex set of actors
1539and activities involved in problem formulation. Our research
1540demonstrates that the specification and operationalization of
1541the problem are always negotiated and elastic, and rarely worked
1542out with explicit normative considerations in mind. In so doing,
1543we show that careful accounts of everyday data science work can
1544help us better understand how and why data science problems are
1545posed in certain ways---and why specific formulations prevail in
1546practice, even in the face of what might seem like normatively
1547preferable alternatives. We conclude by discussing the
1548implications of our findings, arguing that effective normative
1549interventions will require attending to the practical work of
1550problem formulation.},
1551  author = {Passi, Samir and Barocas, Solon},
1552  booktitle = {Proceedings of the Conference on Fairness, Accountability, and
1553Transparency},
1554  title = {Problem Formulation and Fairness},
1555}
1556
1557@article{Passi2020-dr,
1558  year = {2020},
1559  month = {July},
1560  pages = {2053951720939605},
1561  number = {2},
1562  volume = {7},
1563  publisher = {SAGE Publications Ltd},
1564  journal = {Big Data \& Society},
1565  abstract = {How are data science systems made to work? It may seem that
1566whether a system works is a function of its technical design,
1567but it is also accomplished through ongoing forms of
1568discretionary work by many actors. Based on six months of
1569ethnographic fieldwork with a corporate data science team, we
1570describe how actors involved in a corporate project negotiated
1571what work the system should do, how it should work, and how to
1572assess whether it works. These negotiations laid the foundation
1573for how, why, and to what extent the system ultimately worked.
1574We describe three main findings. First, how already-existing
1575technologies are essential reference points to determine how and
1576whether systems work. Second, how the situated resolution of
1577development challenges continually reshapes the understanding of
1578how and whether systems work. Third, how business goals, and
1579especially their negotiated balance with data science
1580imperatives, affect a system?s working. We conclude with
1581takeaways for critical data studies, orienting researchers to
1582focus on the organizational and cultural aspects of data
1583science, the third-party platforms underlying data science
1584systems, and ways to engage with practitioners? imagination of
1585how systems can and should work.},
1586  author = {Passi, Samir and Sengers, Phoebe},
1587  title = {Making data science systems work},
1588}
1589
1590@inproceedings{Muller2019-cy,
1591  location = {Glasgow, Scotland Uk},
1592  keywords = {work practice, data science},
1593  address = {New York, NY, USA},
1594  year = {2019},
1595  month = {May},
1596  series = {CHI EA '19},
1597  pages = {1--8},
1598  number = {Paper W15},
1599  publisher = {Association for Computing Machinery},
1600  abstract = {With the rise of big data, there has been an increasing need to
1601understand who is working in data science and how they are doing
1602their work. HCI and CSCW researchers have begun to examine these
1603questions. In this workshop, we invite researchers to share
1604their observations, experiences, hypotheses, and insights, in
1605the hopes of developing a taxonomy of work practices and open
1606issues in the behavioral and social study of data science and
1607data science workers.},
1608  author = {Muller, Michael and Feinberg, Melanie and George, Timothy and
1609Jackson, Steven J and John, Bonnie E and Kery, Mary Beth and
1610Passi, Samir},
1611  booktitle = {Extended Abstracts of the 2019 {CHI} Conference on Human Factors
1612in Computing Systems},
1613  title = {{Human-Centered} Study of Data Science Work Practices},
1614}
1615
1616@article{Passi2018-jt,
1617  keywords = {collaboration, organizational work, data science, trust,
1618credibility},
1619  address = {New York, NY, USA},
1620  year = {2018},
1621  month = {November},
1622  pages = {1--28},
1623  number = {CSCW},
1624  volume = {2},
1625  publisher = {Association for Computing Machinery},
1626  journal = {Proc. ACM Hum.-Comput. Interact.},
1627  abstract = {The trustworthiness of data science systems in applied and
1628real-world settings emerges from the resolution of specific
1629tensions through situated, pragmatic, and ongoing forms of work.
1630Drawing on research in CSCW, critical data studies, and history
1631and sociology of science, and six months of immersive
1632ethnographic fieldwork with a corporate data science team, we
1633describe four common tensions in applied data science work:
1634(un)equivocal numbers, (counter)intuitive knowledge,
1635(in)credible data, and (in)scrutable models. We show how
1636organizational actors establish and re-negotiate trust under
1637messy and uncertain analytic conditions through practices of
1638skepticism, assessment, and credibility. Highlighting the
1639collaborative and heterogeneous nature of real-world data
1640science, we show how the management of trust in applied
1641corporate data science settings depends not only on
1642pre-processing and quantification, but also on negotiation and
1643translation. We conclude by discussing the implications of our
1644findings for data science research and practice, both within and
1645beyond CSCW.},
1646  author = {Passi, Samir and Jackson, Steven J},
1647  title = {Trust in Data Science: Collaboration, Translation, and
1648Accountability in Corporate Data Science Projects},
1649}
1650
1651@misc{Lehr_undated-aq,
1652  note = {Accessed: 2021-8-10},
1653  howpublished = {\url{https://lawreview.law.ucdavis.edu/issues/51/2/Symposium/51-2_Lehr_Ohm.pdf}},
1654  author = {Lehr, David and Ohm, Paul},
1655  title = {Playing with the data: What legal scholars should learn about
1656machine learning},
1657}
1658
1659@article{Henke2018-ua,
1660  year = {2018},
1661  month = {February},
1662  journal = {Harvard Business Review},
1663  abstract = {It's easier for companies to train existing employees for it than
1664to hire new ones.},
1665  author = {Henke, Nicolaus and Levine, Jordan and McInerney, Paul},
1666  title = {You Don't Have to Be a Data Scientist to Fill This {Must-Have}
1667Analytics Role},
1668}
1669
1670@article{scalefactor,
1671  language = {en},
1672  year = {2020},
1673  month = {July},
1674  journal = {Forbes Magazine},
1675  abstract = {Kurt Rathmann told his big-name investors he had developed
1676groundbreaking AI to do the books for small businesses. In
1677reality, humans did most of the work.},
1678  author = {Jeans, David},
1679  title = {{ScaleFactor} Raised \$100 Million In A Year Then Blamed Covid-19
1680For Its Demise. Employees Say It Had Much Bigger Problems},
1681}
1682
1683@misc{Translator2018-ki,
1684  language = {en},
1685  note = {Accessed: 2022-1-12},
1686  howpublished = {\url{https://www.microsoft.com/en-us/translator/blog/2018/03/14/human-parity-for-chinese-to-english-translations/}},
1687  year = {2018},
1688  month = {March},
1689  abstract = {Microsoft announced today that its researchers have developed
1690an AI machine translation system that can translate with the
1691same accuracy as a human from Chinese to English. To validate
1692the results, the researchers used an industry standard test
1693set of news stories (newstest2017) to compare human and
1694machine translation results. To further ensure accuracy of
1695the evaluation, the team also....},
1696  author = {Translator, Microsoft},
1697  booktitle = {Microsoft Translator Blog},
1698  title = {Neural Machine Translation reaches historic milestone: human
1699parity for Chinese to English translations},
1700}
1701
1702@article{mulligan2019thing,
1703  publisher = {ACM New York, NY, USA},
1704  year = {2019},
1705  pages = {1--36},
1706  number = {CSCW},
1707  volume = {3},
1708  journal = {Proceedings of the ACM on Human-Computer Interaction},
1709  author = {Mulligan, Deirdre K and Kroll, Joshua A and Kohli, Nitin and Wong, Richmond Y},
1710  title = {This thing called fairness: disciplinary confusion realizing a value in technology},
1711}
1712
1713@article{CambridgeAnalytica,
1714  url = {https://www.latimes.com/politics/la-na-pol-cambridge-analytica-20180321-story.html},
1715  journal = {Los Angeles Times},
1716  title = {Was Cambridge Analytica a digital Svengali or snake-oil salesman?},
1717  date = {2018-03-21},
1718  author = {Halper, Evan},
1719}
1720
1721@article{Toral2018-wn,
1722  eprint = {1808.10432},
1723  primaryclass = {cs.CL},
1724  archiveprefix = {arXiv},
1725  year = {2018},
1726  month = {August},
1727  abstract = {We reassess a recent study (Hassan et al., 2018) that
1728claimed that machine translation (MT) has reached human
1729parity for the translation of news from Chinese into
1730English, using pairwise ranking and considering three
1731variables that were not taken into account in that previous
1732study: the language in which the source side of the test set
1733was originally written, the translation proficiency of the
1734evaluators, and the provision of inter-sentential context.
1735If we consider only original source text (i.e. not
1736translated from another language, or translationese), then
1737we find evidence showing that human parity has not been
1738achieved. We compare the judgments of professional
1739translators against those of non-experts and discover that
1740those of the experts result in higher inter-annotator
1741agreement and better discrimination between human and
1742machine translations. In addition, we analyse the human
1743translations of the test set and identify important
1744translation issues. Finally, based on these findings, we
1745provide a set of recommendations for future human
1746evaluations of MT.},
1747  author = {Toral, Antonio and Castilho, Sheila and Hu, Ke and Way, Andy},
1748  title = {Attaining the Unattainable? Reassessing Claims of Human
1749Parity in Neural Machine Translation},
1750}
1751
1752@article{Laubli2018-sn,
1753  eprint = {1808.07048},
1754  primaryclass = {cs.CL},
1755  archiveprefix = {arXiv},
1756  year = {2018},
1757  month = {August},
1758  abstract = {Recent research suggests that neural machine translation
1759achieves parity with professional human translation on the
1760WMT Chinese--English news translation task. We empirically
1761test this claim with alternative evaluation protocols,
1762contrasting the evaluation of single sentences and entire
1763documents. In a pairwise ranking experiment, human raters
1764assessing adequacy and fluency show a stronger preference
1765for human over machine translation when evaluating documents
1766as compared to isolated sentences. Our findings emphasise
1767the need to shift towards document-level evaluation as
1768machine translation improves to the degree that errors which
1769are hard or impossible to spot at the sentence-level become
1770decisive in discriminating quality of different translation
1771outputs.},
1772  author = {L{\"a}ubli, Samuel and Sennrich, Rico and Volk, Martin},
1773  title = {Has Machine Translation Achieved Human Parity? A Case for
1774Document-level Evaluation},
1775}
1776
1777@article{Dobbe2019-ms,
1778  eprint = {1911.09005},
1779  primaryclass = {cs.AI},
1780  archiveprefix = {arXiv},
1781  year = {2019},
1782  month = {November},
1783  abstract = {As AI systems become prevalent in high stakes domains such
1784as surveillance and healthcare, researchers now examine how
1785to design and implement them in a safe manner. However, the
1786potential harms caused by systems to stakeholders in complex
1787social contexts and how to address these remains unclear. In
1788this paper, we explain the inherent normative uncertainty in
1789debates about the safety of AI systems. We then address this
1790as a problem of vagueness by examining its place in the
1791design, training, and deployment stages of AI system
1792development. We adopt Ruth Chang's theory of intuitive
1793comparability to illustrate the dilemmas that manifest at
1794each stage. We then discuss how stakeholders can navigate
1795these dilemmas by incorporating distinct forms of dissent
1796into the development pipeline, drawing on Elizabeth
1797Anderson's work on the epistemic powers of democratic
1798institutions. We outline a framework of sociotechnical
1799commitments to formal, substantive and discursive challenges
1800that address normative uncertainty across stakeholders, and
1801propose the cultivation of related virtues by those
1802responsible for development.},
1803  author = {Dobbe, Roel and Gilbert, Thomas Krendl and Mintz, Yonatan},
1804  title = {Hard Choices in Artificial Intelligence: Addressing
1805Normative Uncertainty through Sociotechnical Commitments},
1806}
1807
1808@misc{Buolamwini_undated-dd,
1809  note = {Accessed: 2022-1-12},
1810  howpublished = {\url{http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf}},
1811  abstract = {Recent studies demonstrate that machine learning algorithms
1812can discriminate based on classes like race and gender. In
1813this work, we present an approach to evaluate bias present in
1814automated facial analysis algorithms and datasets with
1815respect to phenotypic subgroups. Using the dermatologist
1816approved Fitzpatrick Skin Type classification system, we
1817characterize the gender and skin type distribution of two
1818facial analysis benchmarks, IJB-A and Adience. We find that
1819these datasets are overwhelmingly composed of lighter-skinned
1820subjects (79.6\% for IJB-A and 86.2\% for Adience) and
1821introduce a new facial analysis dataset which is balanced by
1822gender and skin type. We evaluate 3 commercial gender
1823classification systems using our dataset and show that
1824darker-skinned females are the most misclassified group (with
1825error rates of up to 34.7\%). The maximum error rate for
1826lighter-skinned males is 0.8\%. The substantial disparities
1827in the accuracy of classifying darker females, lighter
1828females, darker males, and lighter males in gender
1829classification systems require urgent attention if commercial
1830companies are to build genuinely fair, transparent and
1831accountable facial analysis algorithms.},
1832  author = {Buolamwini, Joy and Friedler, Sorelle A and Wilson, Christo},
1833  title = {Gender shades: Intersectional accuracy disparities in
1834commercial gender classification},
1835}
1836
1837@misc{Snow2018-vw,
1838  language = {en},
1839  note = {Accessed: 2022-1-12},
1840  howpublished = {\url{https://www.aclu.org/blog/privacy-technology/surveillance-technologies/amazons-face-recognition-falsely-matched-28}},
1841  year = {2018},
1842  month = {July},
1843  abstract = {Amazon's face surveillance technology is the target of
1844growing opposition nationwide, and today, there are 28 more
1845causes for concern. In a test the ACLU recently conducted of
1846the facial recognition tool, called ``Rekognition,'' the
1847software incorrectly matched 28 members of Congress,
1848identifying them as other people who have been arrested for a
1849crime. The members of Congress},
1850  author = {Snow, Jacob},
1851  booktitle = {American Civil Liberties Union},
1852  title = {Amazon's Face Recognition Falsely Matched 28 Members of
1853Congress With Mugshots},
1854}
1855
1856@misc{Wood_undated-ek,
1857  howpublished = {\url{https://aws.amazon.com/blogs/aws/thoughts-on-machine-learning-accuracy/}},
1858  author = {Wood, Matt},
1859  title = {Thoughts On Machine Learning Accuracy},
1860}
1861
1862@misc{aclu_response_response_fr,
1863  author = {ACLU},
1864  month = {July},
1865  year = {2018},
1866  language = {en},
1867  note = {Accessed: 2022-1-12},
1868  howpublished = {\url{https://www.aclu.org/press-releases/aclu-comment-new-amazon-statement-responding-face-recognition-technology-test}},
1869  abstract = {SAN FRANCISCO -- Amazon today issued an additional statement
1870in response to the American Civil Liberties Union Foundation
1871of Northern California test of Rekognition, the company's
1872face recognition technology. The test revealed that
1873Rekognition falsely matched 28 current members of Congress
1874with images in an arrest photo database.},
1875  booktitle = {American Civil Liberties Union},
1876  title = {{ACLU} Comment on New Amazon Statement Responding to Face
1877Recognition Technology Test},
1878}
1879
1880@misc{Ross2018-nn,
1881  language = {en},
1882  note = {Accessed: 2022-1-13},
1883  howpublished = {\url{https://www.statnews.com/2018/07/25/ibm-watson-recommended-unsafe-incorrect-treatments/?utm_source=STAT+Newsletters&utm_campaign=beb06f048d-MR_COPY_08&utm_medium=email&utm_term=0_8cab1d7961-beb06f048d-150085821}},
1884  year = {2018},
1885  month = {July},
1886  abstract = {Slide decks presented last summer by an IBM Watson Health
1887executive largely blame the problems on the training of
1888Watson for Oncology by IBM engineers and doctors at the
1889renowned Memorial Sloan Kettering Cancer Center.},
1890  author = {Ross, Casey and Swetlitz, Ike and Cohrs, Rachel and
1891Dillingham, Ian and {STAT Staff} and Florko, Nicholas and
1892Bender, Maddie},
1893  booktitle = {{STAT}},
1894  title = {{IBM's} Watson supercomputer recommended 'unsafe and
1895incorrect' cancer treatments, internal documents show},
1896}
1897
1898@article{md_anderson_benches_watson,
1899  language = {en},
1900  year = {2017},
1901  month = {February},
1902  journal = {Forbes Magazine},
1903  abstract = {MD Anderson has placed a much-ballyhooed 'Watson for cancer'
1904product it was developing with IBM on hold -- and is looking for
1905a new partner.},
1906  author = {Herper, Matthew},
1907  title = {{MD} Anderson Benches {IBM} Watson In Setback For Artificial
1908Intelligence In Medicine},
1909}
1910
1911@misc{Wojcik_undated-nb,
1912  note = {Accessed: 2022-1-13},
1913  howpublished = {\url{https://www.cnbc.com/2017/05/08/ibms-watson-is-a-joke-says-social-capital-ceo-palihapitiya.html}},
1914  author = {Wojcik, Natalia},
1915  booktitle = {{CNBC}},
1916  title = {{IBM's} Watson `is a joke,' says Social Capital {CEO}
1917Palihapitiya},
1918}
1919
1920@article{Simon2019-ed,
1921  language = {en},
1922  keywords = {Artificial intelligence application in medicine; Clinical
1923decision support; Closing the cancer care gap; Democratization of
1924evidence‐based care; Virtual expert advisor},
1925  year = {2019},
1926  month = {June},
1927  pages = {772--782},
1928  number = {6},
1929  volume = {24},
1930  journal = {Oncologist},
1931  abstract = {BACKGROUND: Rapid advances in science challenge the timely
1932adoption of evidence-based care in community settings. To bridge
1933the gap between what is possible and what is practiced, we
1934researched approaches to developing an artificial intelligence
1935(AI) application that can provide real-time patient-specific
1936decision support. MATERIALS AND METHODS: The Oncology Expert
1937Advisor (OEA) was designed to simulate peer-to-peer consultation
1938with three core functions: patient history summarization,
1939treatment options recommendation, and management advisory.
1940Machine-learning algorithms were trained to construct a dynamic
1941summary of patients cancer history and to suggest approved
1942therapy or investigative trial options. All patient data used
1943were retrospectively accrued. Ground truth was established for
1944approximately 1,000 unique patients. The full Medline database of
1945more than 23 million published abstracts was used as the
1946literature corpus. RESULTS: OEA's accuracies of searching
1947disparate sources within electronic medical records to extract
1948complex clinical concepts from unstructured text documents
1949varied, with F1 scores of 90\%-96\% for non-time-dependent
1950concepts (e.g., diagnosis) and F1 scores of 63\%-65\% for
1951time-dependent concepts (e.g., therapy history timeline). Based
1952on constructed patient profiles, OEA suggests approved therapy
1953options linked to supporting evidence (99.9\% recall; 88\%
1954precision), and screens for eligible clinical trials on
1955ClinicalTrials.gov (97.9\% recall; 96.9\% precision). CONCLUSION:
1956Our results demonstrated technical feasibility of an AI-powered
1957application to construct longitudinal patient profiles in context
1958and to suggest evidence-based treatment and trial options. Our
1959experience highlighted the necessity of collaboration across
1960clinical and AI domains, and the requirement of clinical
1961expertise throughout the process, from design to training to
1962testing. IMPLICATIONS FOR PRACTICE: Artificial intelligence
1963(AI)-powered digital advisors such as the Oncology Expert Advisor
1964have the potential to augment the capacity and update the
1965knowledge base of practicing oncologists. By constructing dynamic
1966patient profiles from disparate data sources and organizing and
1967vetting vast literature for relevance to a specific patient, such
1968AI applications could empower oncologists to consider all therapy
1969options based on the latest scientific evidence for their
1970patients, and help them spend less time on information ``hunting
1971and gathering'' and more time with the patients. However,
1972realization of this will require not only AI technology
1973maturation but also active participation and leadership by
1974clincial experts.},
1975  author = {Simon, George and DiNardo, Courtney D and Takahashi, Koichi and
1976Cascone, Tina and Powers, Cynthia and Stevens, Rick and Allen,
1977Joshua and Antonoff, Mara B and Gomez, Daniel and Keane, Pat and
1978Suarez Saiz, Fernando and Nguyen, Quynh and Roarty, Emily and
1979Pierce, Sherry and Zhang, Jianjun and Hardeman Barnhill, Emily
1980and Lakhani, Kate and Shaw, Kenna and Smith, Brett and Swisher,
1981Stephen and High, Rob and Futreal, P Andrew and Heymach, John and
1982Chin, Lynda},
1983  title = {Applying Artificial Intelligence to Address the Knowledge Gaps in
1984Cancer Care},
1985}
1986
1987@misc{Strickland_undated-ng,
1988  note = {Accessed: 2022-1-13},
1989  howpublished = {\url{https://spectrum.ieee.org/how-ibm-watson-overpromised-and-underdelivered-on-ai-health-care}},
1990  abstract = {After its triumph on Jeopardy!, IBM's AI seemed poised to
1991revolutionize medicine. Doctors are still waiting},
1992  author = {Strickland, Eliza},
1993  title = {{IBM} Watson Heal Thyself: How {IBM} Watson Overpromised And
1994Underdeliverd On {AI} Health Care},
1995}
1996
1997@article{Gianfrancesco2018-vl,
1998  language = {en},
1999  year = {2018},
2000  month = {November},
2001  pages = {1544--1547},
2002  number = {11},
2003  volume = {178},
2004  journal = {JAMA Intern. Med.},
2005  abstract = {A promise of machine learning in health care is the avoidance of
2006biases in diagnosis and treatment; a computer algorithm could
2007objectively synthesize and interpret the data in the medical
2008record. Integration of machine learning with clinical decision
2009support tools, such as computerized alerts or diagnostic support,
2010may offer physicians and others who provide health care targeted
2011and timely information that can improve clinical decisions.
2012Machine learning algorithms, however, may also be subject to
2013biases. The biases include those related to missing data and
2014patients not identified by algorithms, sample size and
2015underestimation, and misclassification and measurement error.
2016There is concern that biases and deficiencies in the data used by
2017machine learning algorithms may contribute to socioeconomic
2018disparities in health care. This Special Communication outlines
2019the potential biases that may be introduced into machine
2020learning-based clinical decision support tools that use
2021electronic health record data and proposes potential solutions to
2022the problems of overreliance on automation, algorithms based on
2023biased data, and algorithms that do not provide information that
2024is clinically meaningful. Existing health care disparities should
2025not be amplified by thoughtless or excessive reliance on
2026machines.},
2027  author = {Gianfrancesco, Milena A and Tamang, Suzanne and Yazdany, Jinoos
2028and Schmajuk, Gabriela},
2029  title = {Potential Biases in Machine Learning Algorithms Using Electronic
2030Health Record Data},
2031}
2032
2033@inproceedings{Jacobs2021-rk,
2034  location = {Virtual Event Canada},
2035  conference = {FAccT '21: 2021 ACM Conference on Fairness, Accountability, and
2036Transparency},
2037  address = {New York, NY, USA},
2038  year = {2021},
2039  month = {March},
2040  publisher = {ACM},
2041  author = {Jacobs, Abigail Z and Wallach, Hanna},
2042  booktitle = {Proceedings of the 2021 {ACM} Conference on Fairness,
2043Accountability, and Transparency},
2044  title = {Measurement and Fairness},
2045}
2046
2047@article{Alexandrova_undated-mx,
2048  journal = {Eur. J. Philos. Sci.},
2049  author = {Alexandrova, Anna and Fabian, Mark},
2050  title = {Democratising Measurement: Or Why Thick Concepts Call for
2051Coproduction},
2052}
2053
2054@article{Jacobs2021-og,
2055  eprint = {2109.05658},
2056  primaryclass = {cs.CY},
2057  archiveprefix = {arXiv},
2058  year = {2021},
2059  month = {September},
2060  abstract = {Measurement of social phenomena is everywhere, unavoidably,
2061in sociotechnical systems. This is not (only) an academic
2062point: Fairness-related harms emerge when there is a
2063mismatch in the measurement process between the thing we
2064purport to be measuring and the thing we actually measure.
2065However, the measurement process -- where social, cultural,
2066and political values are implicitly encoded in
2067sociotechnical systems -- is almost always obscured.
2068Furthermore, this obscured process is where important
2069governance decisions are encoded: governance about which
2070systems are fair, which individuals belong in which
2071categories, and so on. We can then use the language of
2072measurement, and the tools of construct validity and
2073reliability, to uncover hidden governance decisions. In
2074particular, we highlight two types of construct validity,
2075content validity and consequential validity, that are useful
2076to elicit and characterize the feedback loops between the
2077measurement, social construction, and enforcement of social
2078categories. We then explore the constructs of fairness,
2079robustness, and responsibility in the context of governance
2080in and for responsible AI. Together, these perspectives help
2081us unpack how measurement acts as a hidden governance
2082process in sociotechnical systems. Understanding measurement
2083as governance supports a richer understanding of the
2084governance processes already happening in AI -- responsible
2085or otherwise -- revealing paths to more effective
2086interventions.},
2087  author = {Jacobs, Abigail Z},
2088  title = {Measurement as governance in and for responsible {AI}},
2089}
2090
2091@misc{Mayson_dangdefendants,
2092  note = {Accessed: 2022-1-15},
2093  howpublished = {\url{https://www.yalelawjournal.org/article/dangerous-defendants}},
2094  abstract = {Bail reformers aspire to untether pretrial detention from
2095wealth and condition it instead on the risk that a defendant
2096will commit crime if released. In setting this risk
2097threshold, this Article argues that there is no clear
2098constitutional, moral, or practical basis for distinguishing
2099between equally dangerous defendants and non-defendants.},
2100  author = {Mayson, Sandra G},
2101  title = {Dangerous Defendants},
2102}
2103
2104@article{Lum2016-hz,
2105  language = {en},
2106  year = {2016},
2107  month = {October},
2108  pages = {14--19},
2109  number = {5},
2110  volume = {13},
2111  publisher = {Wiley},
2112  journal = {Signif. (Oxf.)},
2113  abstract = {Predictive policing systems are used increasingly by law
2114enforcement to try to prevent crime before it occurs. But what
2115happens when these systems are trained using biased data?
2116Kristian Lum and William Isaac consider the evidence ? and the
2117social consequences},
2118  author = {Lum, Kristian and Isaac, William},
2119  title = {To predict and serve?},
2120}
2121
2122@article{Ferguson2016-bs,
2123  year = {2016},
2124  publisher = {HeinOnline},
2125  journal = {Wash. UL Rev.},
2126  abstract = {… This article examines predictive policing's evolution with the
2127goal ofproviding the first practical and theoretical critique of
2128this new policing … assessment throughout the criminal justice
2129system, this article provides an analytical framework to police
2130new predictive technologies. …},
2131  author = {Ferguson, A G},
2132  title = {Policing predictive policing},
2133}
2134
2135@article{Hoffman2013-ms,
2136  language = {en},
2137  year = {2013},
2138  pages = {497--538},
2139  number = {4},
2140  volume = {39},
2141  journal = {Am. J. Law Med.},
2142  abstract = {Very large biomedical research databases, containing electronic
2143health records (EHR) and genomic data from millions of patients,
2144have been heralded recently for their potential to accelerate
2145scientific discovery and produce dramatic improvements in medical
2146treatments. Research enabled by these databases may also lead to
2147profound changes in law, regulation, social policy, and even
2148litigation strategies. Yet, is ``big data'' necessarily better
2149data? This paper makes an original contribution to the legal
2150literature by focusing on what can go wrong in the process of
2151biomedical database research and what precautions are necessary
2152to avoid critical mistakes. We address three main reasons for
2153approaching such research with care and being cautious in relying
2154on its outcomes for purposes of public policy or litigation.
2155First, the data contained in biomedical databases is surprisingly
2156likely to be incorrect or incomplete. Second, systematic biases,
2157arising from both the nature of the data and the preconceptions
2158of investigators, are serious threats to the validity of research
2159results, especially in answering causal questions. Third, data
2160mining of biomedical databases makes it easier for individuals
2161with political, social, or economic agendas to generate
2162ostensibly scientific but misleading research findings for the
2163purpose of manipulating public opinion and swaying policymakers.
2164In short, this paper sheds much-needed light on the problems of
2165credulous and uninformed acceptance of research results derived
2166from biomedical databases. An understanding of the pitfalls of
2167big data analysis is of critical importance to anyone who will
2168rely on or dispute its outcomes, including lawyers, policymakers,
2169and the public at large. The Article also recommends technical,
2170methodological, and educational interventions to combat the
2171dangers of database errors and abuses.},
2172  author = {Hoffman, Sharona and Podgurski, Andy},
2173  title = {The use and misuse of biomedical data: is bigger really better?},
2174}
2175
2176@article{Hoffman2013-oa,
2177  language = {en},
2178  year = {2013},
2179  month = {March},
2180  pages = {56--60},
2181  volume = {41 Suppl 1},
2182  journal = {J. Law Med. Ethics},
2183  abstract = {The accelerating adoption of electronic health record (EHR)
2184systems will have far-reaching implications for public health
2185research and surveillance, which in turn could lead to changes in
2186public policy, statutes, and regulations. The public health
2187benefits of EHR use can be significant. However, researchers and
2188analysts who rely on EHR data must proceed with caution and
2189understand the potential limitations of EHRs. Because of
2190clinicians' workloads, poor user-interface design, and other
2191factors, EHR data can be erroneous, miscoded, fragmented, and
2192incomplete. In addition, public health findings can be tainted by
2193the problems of selection bias, confounding bias, and measurement
2194bias. These flaws may become all the more troubling and important
2195in an era of electronic ``big data,'' in which a massive amount
2196of information is processed automatically, without human checks.
2197Thus, we conclude the paper by outlining several regulatory and
2198other interventions to address data analysis difficulties that
2199could result in invalid conclusions and unsound public health
2200policies.},
2201  author = {Hoffman, Sharona and Podgurski, Andy},
2202  title = {Big bad data: law, public health, and biomedical databases},
2203}
2204
2205@article{Agrawal2020-rs,
2206  language = {en},
2207  year = {2020},
2208  month = {April},
2209  pages = {525--534},
2210  number = {4},
2211  volume = {124},
2212  journal = {Heredity},
2213  abstract = {Big Data will be an integral part of the next generation of
2214technological developments-allowing us to gain new insights from
2215the vast quantities of data being produced by modern life. There
2216is significant potential for the application of Big Data to
2217healthcare, but there are still some impediments to overcome,
2218such as fragmentation, high costs, and questions around data
2219ownership. Envisioning a future role for Big Data within the
2220digital healthcare context means balancing the benefits of
2221improving patient outcomes with the potential pitfalls of
2222increasing physician burnout due to poor implementation leading
2223to added complexity. Oncology, the field where Big Data
2224collection and utilization got a heard start with programs like
2225TCGA and the Cancer Moon Shot, provides an instructive example as
2226we see different perspectives provided by the United States (US),
2227the United Kingdom (UK) and other nations in the implementation
2228of Big Data in patient care with regards to their centralization
2229and regulatory approach to data. By drawing upon global
2230approaches, we propose recommendations for guidelines and
2231regulations of data use in healthcare centering on the creation
2232of a unique global patient ID that can integrate data from a
2233variety of healthcare providers. In addition, we expand upon the
2234topic by discussing potential pitfalls to Big Data such as the
2235lack of diversity in Big Data research, and the security and
2236transparency risks posed by machine learning algorithms.},
2237  author = {Agrawal, Raag and Prabakaran, Sudhakaran},
2238  title = {Big data in digital healthcare: lessons learnt and
2239recommendations for general practice},
2240}
2241
2242@article{Ensign2017-vi,
2243  eprint = {1706.09847},
2244  primaryclass = {cs.CY},
2245  archiveprefix = {arXiv},
2246  year = {2017},
2247  month = {June},
2248  abstract = {Predictive policing systems are increasingly used to
2249determine how to allocate police across a city in order to
2250best prevent crime. Discovered crime data (e.g., arrest
2251counts) are used to help update the model, and the process
2252is repeated. Such systems have been empirically shown to be
2253susceptible to runaway feedback loops, where police are
2254repeatedly sent back to the same neighborhoods regardless of
2255the true crime rate. In response, we develop a mathematical
2256model of predictive policing that proves why this feedback
2257loop occurs, show empirically that this model exhibits such
2258problems, and demonstrate how to change the inputs to a
2259predictive policing system (in a black-box manner) so the
2260runaway feedback loop does not occur, allowing the true
2261crime rate to be learned. Our results are quantitative: we
2262can establish a link (in our model) between the degree to
2263which runaway feedback causes problems and the disparity in
2264crime rates between areas. Moreover, we can also demonstrate
2265the way in which \textbackslashemph\{reported\} incidents of
2266crime (those reported by residents) and
2267\textbackslashemph\{discovered\} incidents of crime (i.e.
2268those directly observed by police officers dispatched as a
2269result of the predictive policing algorithm) interact: in
2270brief, while reported incidents can attenuate the degree of
2271runaway feedback, they cannot entirely remove it without the
2272interventions we suggest.},
2273  author = {Ensign, Danielle and Friedler, Sorelle A and Neville, Scott
2274and Scheidegger, Carlos and Venkatasubramanian, Suresh},
2275  title = {Runaway Feedback Loops in Predictive Policing},
2276}
2277
2278@unpublished{Richardson2019-cn,
2279  keywords = {Policing, Predictive Policing, Civil Rights, Bias, Justice, Data,
2280AI, Machine Learning},
2281  year = {2019},
2282  month = {February},
2283  abstract = {Law enforcement agencies are increasingly using predictive
2284policing systems to forecast criminal activity and allocate
2285police resources. Yet in numerous jurisdictions, these systems
2286are built on data produced during documented periods of flawed,
2287racially biased, and sometimes unlawful practices and policies
2288(``dirty policing''). These policing practices and policies shape
2289the environment and the methodology by which data is created,
2290which raises the risk of creating inaccurate, skewed, or
2291systemically biased data (``dirty data''). If predictive policing
2292systems are informed by such data, they cannot escape the
2293legacies of the unlawful or biased policing practices that they
2294are built on. Nor do current claims by predictive policing
2295vendors provide sufficient assurances that their systems
2296adequately mitigate or segregate this data.In our research, we
2297analyze thirteen jurisdictions that have used or developed
2298predictive policing tools while under government commission
2299investigations or federal court monitored settlements, consent
2300decrees, or memoranda of agreement stemming from corrupt,
2301racially biased, or otherwise illegal policing practices. In
2302particular, we examine the link between unlawful and biased
2303police practices and the data available to train or implement
2304these systems. We highlight three case studies: (1) Chicago, an
2305example of where dirty data was ingested directly into the city's
2306predictive system; (2) New Orleans, an example where the
2307extensive evidence of dirty policing practices and recent
2308litigation suggests an extremely high risk that dirty data was or
2309could be used in predictive policing; and (3) Maricopa County,
2310where despite extensive evidence of dirty policing practices, a
2311lack of public transparency about the details of various
2312predictive policing systems restricts a proper assessment of the
2313risks. The implications of these findings have widespread
2314ramifications for predictive policing writ large. Deploying
2315predictive policing systems in jurisdictions with extensive
2316histories of unlawful police practices presents elevated risks
2317that dirty data will lead to flawed or unlawful predictions,
2318which in turn risk perpetuating additional harm via feedback
2319loops throughout the criminal justice system. The use of
2320predictive policing must be treated with high levels of caution
2321and mechanisms for the public to know, assess, and reject such
2322systems are imperative.},
2323  author = {Richardson, Rashida and Schultz, Jason and Crawford, Kate},
2324  title = {Dirty Data, Bad Predictions: How Civil Rights Violations Impact
2325Police Data, Predictive Policing Systems, and Justice},
2326}
2327
2328@unpublished{Stevenson2021-fr,
2329  keywords = {pretrial detention, consequentialism, risk assessments, bail
2330reform},
2331  year = {2021},
2332  month = {February},
2333  abstract = {How dangerous must a person be to justify the state in locking
2334her up for the greater good? The bail reform movement, which
2335aspires to limit pretrial detention to the truly dangerous---and
2336which has looked to algorithmic risk assessments to quantify
2337danger---has brought this question to the fore. Constitutional
2338doctrine authorizes pretrial detention when the government's
2339interest in safety ``outweighs'' an individual's interest in
2340liberty, but it does not specify how to balance these goods. If
2341detaining ten presumptively innocent people for three months is
2342projected to prevent one robbery, is it worth it?This Article
2343confronts the question of what degree of risk justifies pretrial
2344preventive detention if one takes the consequentialist approach
2345of current law seriously. Surveying the law, we derive two
2346principles: 1) detention must avert greater harm (by preventing
2347crime) than it inflicts (by depriving a person of liberty) and 2)
2348prohibitions against pretrial punishment mean that the harm
2349experienced by the detainee cannot be discounted in the
2350cost-benefit calculus. With this conceptual framework in place,
2351we develop a novel empirical method for estimating the relative
2352harms of incarceration and crime victimization that we call
2353``Rawlsian cost-benefit analysis'': a survey method that asks
2354respondents to choose between being the victim of certain crimes
2355or being jailed for varying time periods. The results suggest
2356that even short periods of incarceration impose grave harms, such
2357that a person must pose an extremely high risk of serious crime
2358in order for detention to be justified. No existing risk
2359assessment tool is sufficient to identify individuals who warrant
2360detention. The empirical results demonstrate that the stated
2361consequentialist rationale for pretrial detention cannot begin to
2362justify our current detention rates, and suggest that the
2363existing system veers uncomfortably close to pretrial punishment.
2364The degree of discord between theory and practice demands a
2365rethinking of pretrial law and policy.},
2366  author = {Stevenson, Megan T and Mayson, Sandra G},
2367  title = {Pretrial detention and the value of liberty},
2368}
2369
2370@misc{Gouldin_undated-oc,
2371  note = {Accessed: 2022-1-14},
2372  howpublished = {\url{https://lawreview.uchicago.edu/sites/lawreview.uchicago.edu/files/02\%20Gouldin_ART_SA\%20\%28JPM\%29.pdf}},
2373  abstract = {Our illogical and too-well-traveled paths to pretrial
2374detention have created staggering costs for defendants who
2375spend unnecessary time in pretrial detention and for
2376taxpayers who fund a broken system. These problems remain
2377recalcitrant even as a third generation of reform efforts
2378makes impressive headway. They are likely to remain so until
2379judges, attorneys, legislators, and scholars address a
2380fundamental definitional problem: the collapsing of very
2381different types of behavior that result in failures to appear
2382in court into a single, undifferentiated category of
2383nonappearance risk. That single category muddies critical
2384distinctions that this Article's new taxonomy of pretrial
2385nonappearance risks clarifies. This taxonomy (i) isolates
2386true flight risk (the risk that a defendant will flee the
2387jurisdiction) from other forms of ``local'' nonappearance
2388risk and (ii) distinguishes between local nonappearance risks
2389based on persistence, willfulness, amenability to
2390intervention, and cost. Upon examination, it is clear that
2391flight and nonappearance are not simply interchangeable names
2392for the same concept, nor are they merely different degrees
2393of the same type of risk. In the context of measuring and
2394managing risks, many defendants who merely fail to appear
2395differ in important ways from their fugitive cousins.
2396Precision about these distinctions is constitutionally
2397mandated and statutorily required. It is also essential for
2398current reform efforts that are aimed at identifying less
2399intrusive and lower-cost interventions that can effectively
2400manage the full range of nonappearance and flight risks.
2401These distinctions are not reflected in the pretrial
2402risk-assessment tools that are increasingly being employed
2403across the country. But they should be. A more nuanced
2404understanding of these differences},
2405  author = {Gouldin, Lauryn P and Appleman, Laura and Baughman, Shima
2406Baradaran and Berger, Todd and Bybee, Keith and Cahill,
2407Michael and Commandeur, Nicolas and Eaglin, Jessica and
2408Futrell, Nicole Smith and Godsoe, Cynthia and Gold, Russell
2409and Kohn, Nina and Lain, Corinna and Levine, Kate and Mayson,
2410Sandy and Moore, Janet and Ouziel, Lauren and Podgor, Ellen
2411and Roberts, Anna and Sacharoff, Laurent and Schnacke, Tim
2412and Simonson, Jocelyn and True-Frost, Cora},
2413  title = {Defining flight risk},
2414}
2415
2416@article{Slobogin2003-ou,
2417  language = {en},
2418  year = {2003},
2419  publisher = {Elsevier BV},
2420  journal = {SSRN Electron. J.},
2421  abstract = {This article addresses the state's police power authority to
2422deprive people of liberty based on predictions of antisocial
2423behavior. Most conspicuously exercised against so-called
2424``sexual predators,'' this authority purportedly justifies a
2425wide array of other state interventions as well, ranging from
2426police stops to executions. Yet there still is no general theory
2427of preventive detention. This article is a preliminary effort in
2428that regard. The article first surveys the various objections to
2429preventive detention: the unreliability objection; the
2430punishment-in-disguise objection; the legality objection; and
2431the dehumanization objection. None of these objections justifies
2432a complete prohibition on the state's power to detain people
2433based on dangerousness. But they do suggest significant
2434limitations on that power regarding acceptable methods of
2435prediction, the nature and duration of preventive detention, the
2436threshold conduct that can trigger such detention, and the
2437extent to which it can replace punishment as the official
2438response to antisocial behavior. On the latter issue, the
2439central conclusion is that preventive detention which functions
2440as a substitute for punishment, as in the case of sexual
2441predator statutes, is only permissible if certain psychological

Attribution

arXiv:2206.09511v2 [cs.LG]
License: cc-by-4.0
