
Bias and ethical issues in machine-learning models

September 2, 2019

This article was contributed by Andy Oram

The success stories that have gathered around data analytics drive broader adoption of the newest artificial-intelligence-based techniques—but risks come along with these techniques. The large numbers of freshly anointed data scientists piling into industry and the sensitivity of the areas given over to machine-learning models—hiring, loans, even sentencing for crime—mean there is a danger of misapplied models, which is earning the attention of the public. Two sessions at the recent MinneBOS 2019 conference focused on maintaining ethics and addressing bias in machine-learning applications.

To define a few terms: modern analytics increasingly uses machine learning, currently the most popular form of the field broadly known as artificial intelligence (AI). In machine learning, an algorithm is run repeatedly to create and refine a model, which is then tested against new data.
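
As a rough illustration of that loop (invented here, not drawn from either talk), the sketch below trains a simple classifier on synthetic data and then judges it against records it has never seen:

```python
# A minimal sketch of the create/refine/test loop described above, using
# scikit-learn and synthetic data (nothing here comes from the talks).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic "historical" records stand in for whatever data an organization has.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Hold some data back so the model is judged on records it has never seen.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)   # the algorithm creates and refines the model

print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```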

MinneBOS was sponsored by the Twin Cities organization Minne Analytics; the two sessions were: "The Ethics of Analytics" by Bill Franks and "Minding the Gap: Understanding and Mitigating Bias in AI" by Jackie Anderson. (Full disclosure: Franks works on books for O'Reilly Media, which also employs the author of this article.) Both presenters pointed out that bias can sneak into machine learning at many places, and both laid out some ways to address the risks. There were interesting overlaps between the recommendations of Franks, who organized his talk around stages, and of Anderson, who organized her talk around sources of bias.

When we talk about "bias" we normally think of it in the everyday sense of discrimination on the basis of race, gender, income, or some other social category. This focus on social discrimination is reinforced by articles in the popular press. But in math and science, bias is a technical term referring to improper data handling or choice of inputs. And indeed, the risks in AI go further than protected categories such as race and gender. Bias leads to wrong results, plain and simple. Whether bias leads to social discrimination or just to lost business opportunities and wasted money, organizations must be alert and adopt ways to avoid it.
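
A toy calculation (invented purely for illustration) shows bias in that technical sense: when the data feeding an estimate silently excludes part of the population, the answer comes out wrong even though the arithmetic is flawless.

```python
# Toy illustration of statistical bias from improper choice of inputs:
# estimating an average from a sample that silently excludes one group.
import numpy as np

rng = np.random.default_rng(0)

# A made-up population with two equally sized groups that differ in mean.
group_a = rng.normal(loc=50, scale=5, size=10_000)
group_b = rng.normal(loc=70, scale=5, size=10_000)
population = np.concatenate([group_a, group_b])

# A "convenient" sample drawn only from group A: the estimator itself is
# fine, but the data handling biases the result downward.
convenient_sample = rng.choice(group_a, size=500)

print("true population mean:  ", population.mean())         # ~60
print("biased sample estimate:", convenient_sample.mean())  # ~50
```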

Franks based his talk on the claim that ethics are intuitive and inherently slippery. As a simple example, consider the commandment "Thou shalt not kill." Although almost everybody around the world would acknowledge the concept, few would agree on exactly when you could violate the rule. Another example would be the famous "right to be forgotten", now mandated by the GDPR in the European Union. Sometimes, the need to keep data for legal conformance or law enforcement overrides the right to be forgotten. At other times, Franks said, it's just infeasible—how can you maintain a relationship with a health care provider or insurance company if you want them to forget something about you?

Franks offered five stages in machine learning at which to look for and correct bias. I'll describe them here, noting where Anderson's three sources of bias match up.

Modeling targets and business problems

This stage is where the two companies mentioned earlier parted ways on the question of offering customer rewards. As another example, Franks cited the privacy issues tied up in the notorious case where Target sent pregnancy-related offers to a 17-year-old who was trying to hide her pregnancy from her father. This public relations disaster highlights the differences between laws, ethics, and plain good sense about business goals. Legally, Target was perfectly entitled to send pregnancy-related offers. Although there's a difference between personal medical information and routine retail sales like potato chips, Target was probably within reasonable ethical bounds in sending the pregnancy information. Where it fell short was in considering what customers or the general public would find acceptable.

This stage in Franks's taxonomy aligns with Anderson's first source of bias, "Defining the problem". Anderson mentioned, as an example, college analytics about what potential students to target for promotional materials. (Recruitment costs per student start around $2,300 and can go over $6,000.) If one assumes that the students getting promotions are more likely to apply, matriculate, graduate, and ultimately benefit the institution (certainly that's the reasoning behind sending out the materials), avoiding bias is important for both business and social reasons. Anderson said that, before running analytics, the college has to ask what it's trying to find out: Who is most likely to accept, who is most likely to graduate, who will give the largest donations later in life, who will meet diversity goals, etc.

Modeling input data

Here the core problem is that nearly every algorithm predicts future performance based on existing data, so it reproduces whatever bias shaped that data in the past. Anderson calls this a form of circular reasoning that excludes important new candidates, whether for a college, a retail business, or a criminal-justice probe. Franks pointed out that any model can be invalidated by a change in the environment or the goal.

Franks turned here, as Anderson had, to an example of predicting college admissions. If your model was trained at a time when 80% of your students were liberal arts majors, but you have changed strategies so that 80% of your students are now business majors, your old model will fail. Another example Franks offered of a changed environment pertains to criminal sentencing. U.S. society has recently changed its idea of how to punish criminals, so "fair sentencing" software based on the old ways of sentencing has become invalid.
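
Neither speaker presented code, but one generic guard against this kind of silent invalidation is to compare the mix of incoming data with the mix the model was trained on. The sketch below is hypothetical and uses invented numbers that mirror the liberal-arts/business example; the drift threshold is arbitrary.

```python
# A generic sketch (not from the talks) of checking whether today's student
# mix still resembles the mix the model was trained on.
from collections import Counter

def major_shares(majors):
    counts = Counter(majors)
    total = sum(counts.values())
    return {major: count / total for major, count in counts.items()}

def max_share_drift(training_majors, current_majors):
    train = major_shares(training_majors)
    current = major_shares(current_majors)
    all_majors = set(train) | set(current)
    return max(abs(train.get(m, 0.0) - current.get(m, 0.0)) for m in all_majors)

# Invented data mirroring the example: 80% liberal arts at training time,
# 80% business majors today.
training = ["liberal arts"] * 80 + ["business"] * 20
today = ["liberal arts"] * 20 + ["business"] * 80

if max_share_drift(training, today) > 0.2:   # threshold chosen arbitrarily
    print("input mix has drifted; retrain or revalidate the model")
```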

Two of Anderson's sources of bias align with this stage: "Selecting data" and "Cleaning/preparing the data". Anderson cited a problem in preparing data, involving a retail firm that was trying to determine which customers were most profitable. When cleaning the data, the firm excluded small purchases because it assumed it should aim for customers who purchased expensive items. Later it found out that it had totally missed a large and loyal clientele that spends quite a lot—but in small amounts.
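
A hypothetical pandas sketch of that cleaning mistake, with made-up data: filtering out small transactions before asking which customers are valuable makes the small-but-frequent buyer disappear, while aggregating per customer first keeps that segment visible.

```python
# Made-up data: alice makes two big purchases, bob makes many small ones.
import pandas as pd

purchases = pd.DataFrame({
    "customer": ["alice"] * 2 + ["bob"] * 70,
    "amount":   [500, 450] + [15] * 70,
})

# The problematic cleaning step: drop "uninteresting" small purchases first.
cleaned = purchases[purchases["amount"] >= 100]
print(cleaned.groupby("customer")["amount"].sum())    # bob vanishes entirely

# Aggregating per customer before any thresholding tells a different story.
print(purchases.groupby("customer")["amount"].sum())  # bob: 1050, alice: 950
```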

Often, she said, you have to look at results to uncover hidden sources of bias. In one case she cited, Amazon's HR department designed an algorithm to help find the best programmers among applicants. The department started without any checking for bias, which naturally led to a model that discriminated in favor of men because they are currently most common in the field. After looking at results and realizing that the model was biased against women, the department created another model after changing obvious gender markers such as pronouns and names. But the model was still biased. It turns out that word choices in job applications are gender-specific too.
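
One way teams probe for exactly this failure (a generic diagnostic, not necessarily what Amazon did) is to scrub the obvious markers and then check whether the protected attribute can still be predicted from what remains; if it can, proxy features are still in the data. The tiny "resumes" below are invented, so the numbers only demonstrate the idea.

```python
# A generic proxy check on invented data: after scrubbing obvious gender
# markers, can a classifier still recover gender from the remaining words?
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

MARKERS = re.compile(r"\b(he|she|him|her|his|hers|mr|ms|mrs)\b", re.IGNORECASE)

def scrub(text):
    return MARKERS.sub(" ", text)

resumes = [
    "She captained the women's chess club and tutored in the sorority.",
    "She organized the netball team and studied software engineering.",
    "Her volunteer work included the women's shelter and the yoga society.",
    "She mentored girls who code and built mobile apps.",
    "He led his fraternity's robotics team and played varsity football.",
    "His hobbies include woodworking, rugby, and competitive programming.",
    "He coached the wrestling squad and built backend services.",
    "He ran the chess club and served in the army reserve.",
]
genders = ["f", "f", "f", "f", "m", "m", "m", "m"]

X = CountVectorizer().fit_transform(scrub(r) for r in resumes)

# On real data, accuracy well above 0.5 would mean word choice still leaks
# gender; with eight toy samples this only demonstrates the mechanism.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, genders, cv=2)
print("gender recoverable from scrubbed text:", scores.mean())
```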

Modeling transparency and monitoring

Franks endorsed the idea of transparency in models, commonly called "explainability": model users have to know what is triggering a decision. In one example, an image-recognition program was trained to distinguish huskies from wolves. It seemed to perform extremely well until researchers delved into a failed case. They discovered that the program didn't look at the animal at all. If there was snow, it assumed the animal was a wolf. This was due to the kinds of images it was shown; the wolves were outdoors and the huskies were mostly indoors.

The strategy used by these researchers to determine the exact features that led to a decision falls under the term Local Interpretable Model-Agnostic Explanations (LIME). In the case of image processing, a typical LIME approach is to perturb several different parts of the image and run the altered copies through the model to see which parts of the image truly predict the result. The process treats the model as a black box, which is what makes it model-agnostic.
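
The lime Python package implements this properly; the from-scratch sketch below (an invented helper, with a toy "model" standing in for the husky/wolf classifier) is only meant to show the mechanism: split the image into patches, blank out random subsets, score each perturbed copy, and fit a weighted linear surrogate whose coefficients rank the patches.

```python
# A from-scratch sketch of the LIME idea for images; the real `lime`
# package does this far more carefully.
import numpy as np
from sklearn.linear_model import Ridge

def lime_image_sketch(image, predict_fn, grid=4, num_samples=500, seed=0):
    """Estimate how much each patch of `image` contributes to predict_fn."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    n_patches = grid * grid

    masks = rng.integers(0, 2, size=(num_samples, n_patches))  # patches kept
    preds = np.empty(num_samples)
    for i, mask in enumerate(masks):
        perturbed = image.copy()
        for p in range(n_patches):
            if not mask[p]:                    # blank out the dropped patches
                r, c = divmod(p, grid)
                perturbed[r*ph:(r+1)*ph, c*pw:(c+1)*pw] = 0
        preds[i] = predict_fn(perturbed)

    # Weight samples by closeness to the original (more patches kept), then
    # fit a simple linear surrogate; its coefficients rank the patches.
    weights = masks.sum(axis=1) / n_patches
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, preds, sample_weight=weights)
    return surrogate.coef_.reshape(grid, grid)

# Toy "model" that only looks at the top half of the image -- a stand-in for
# "it only looked at the snow".
snowy = np.zeros((64, 64))
snowy[:32, :] = 1.0
importance = lime_image_sketch(snowy, lambda img: img[:32, :].mean())
print(importance.round(2))   # the top half dominates; the bottom half adds nothing
```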

Some fields have legal requirements for transparency. For instance, when a bank denies credit, it has to list the precise criteria that led to the decision. There are no such regulations in health care, but transparency is necessary here too. Most clinicians would be very nervous making a diagnosis based on a black-box model.

Franks suggested that LIME be a factor in government approval for the use of analytics models. If a robust LIME procedure shows that the algorithm is looking at relevant features, the model should have a stronger chance of being approved. Franks also brought up the importance of emergency brakes on algorithms. In one example, a programmed trading algorithm wiped out one company's stock price for a short time. The programmers should have built in a check for such sudden or bizarre behavior, and made the program shut itself down, just as a factory has a button any employee can push to stop the assembly line.
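
The kind of check Franks described might look like the sketch below (the class names and thresholds are invented here): the program monitors its own behavior and refuses to keep trading when something looks bizarre, rather than waiting for a human to notice.

```python
# A generic "emergency brake" sketch with invented thresholds: halt the
# algorithm when its own behavior turns sudden or bizarre.
class CircuitBreaker(Exception):
    pass

class GuardedTrader:
    def __init__(self, max_move=0.10, max_orders_per_min=100):
        self.max_move = max_move               # e.g. a 10% price swing per tick
        self.max_orders = max_orders_per_min
        self.orders_this_minute = 0            # (resetting each minute is elided)
        self.last_price = None

    def check(self, price):
        if self.last_price is not None:
            move = abs(price - self.last_price) / self.last_price
            if move > self.max_move:
                raise CircuitBreaker(f"price moved {move:.0%} in one tick")
        if self.orders_this_minute > self.max_orders:
            raise CircuitBreaker("order rate looks runaway")
        self.last_price = price

    def place_order(self, price, quantity):
        self.check(price)                      # stop the line before acting
        self.orders_this_minute += 1
        # ... hand the order to the real execution system here ...
```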

Modeling usage

At this stage, Franks said, you must understand whether the context in which you're using a model makes it fair. Often the researcher spends 10% of the time creating a reasonably good algorithm and 90% error-proofing it.

He complained that the public is overly spooked by exceptional events in the use of new analytics. Every time an autonomous vehicle is involved in an accident, there are bans and public calls to abandon the technology. The real question for Franks is: over a certain interval (say, 100,000 miles driven), how many accidents were caused by human drivers versus self-driving cars?

Defining policies

Franks said organizations should define clear policies regarding the use of machine learning and publish them. The famous case where Apple refused to unlock an accused criminal's cell phone had the benefit of prompting Apple to state its views about privacy publicly. Apple may have gained some customers and lost others, but now everyone can judge the company on the basis of this policy.

Recommendations

Franks's concluding recommendations were to create an ethics review board consisting of people from different relevant disciplines, like research institutions' Institutional Review Boards (IRBs), to write out policies, and to deal firmly with violations. He called on the analytics community as a whole to take ownership of bias.

Anderson's main recommendation for fixing bias was to ensure team diversity. You have to experience life a certain way in order to understand how people with that life behave—and how they're potentially excluded. She also advised companies to collaborate, overcoming the barriers they set up out of worries over trade secrets. Many organizations don't even share the information they have across internal company borders. They should join forums and collaborative initiatives to recognize bias and improve diversity. Organizations focused on this include the Algorithmic Justice League (AJL). There are also toolkits such as IBM's AI Fairness 360.
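
As an example of what such a toolkit measures, one widely used number is the disparate-impact ratio: the rate of favorable outcomes for the unprivileged group divided by the rate for the privileged group, with values below roughly 0.8 commonly treated as a warning sign. The hand-rolled computation below uses invented data; AI Fairness 360 reports this metric along with many related ones.

```python
# Computing the disparate-impact ratio by hand on invented hiring data.
import numpy as np

def disparate_impact(outcomes, group):
    """Favorable-outcome rate for group 0 divided by the rate for group 1."""
    outcomes, group = np.asarray(outcomes), np.asarray(group)
    rate_unprivileged = outcomes[group == 0].mean()
    rate_privileged = outcomes[group == 1].mean()
    return rate_unprivileged / rate_privileged

# 1 = model recommends hiring; group 1 is the (hypothetical) privileged group.
recommended = [1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1]
group       = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

ratio = disparate_impact(recommended, group)
print(f"disparate impact: {ratio:.2f}")   # well below 1.0 flags a problem
```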

The talks by Franks and Anderson showed that, about a decade into the new epoch of machine learning, researchers and practitioners are aware of bias and are designing practices that try to correct it. One remaining question is how much we as a society can depend on the competency and goodwill of the organization designing or using the model. Where can regulation fit in? And how much responsibility lies on the researchers who designed the model, versus the user who applies it in real life, or even the regulators who approve the model's use? Hopefully, as we learn the efficacy of practices that correct bias we can also answer the question of how to make sure they are used.


Index entries for this article
GuestArticles: Oram, Andy



Somewhat OT: Bias and ethical issues in machine-learning models

Posted Sep 3, 2019 0:38 UTC (Tue) by dskoll (subscriber, #1630) [Link] (2 responses)

"Thou shalt not kill" is a bad translation. The original text is more like "Do not murder" which is a little less ambiguous.

Somewhat OT: Bias and ethical issues in machine-learning models

Posted Sep 3, 2019 7:54 UTC (Tue) by vadim (subscriber, #35271) [Link] (1 responses)

I would say it's a lot more ambiguous, actually.

"Kill" is a pretty clear term.

"Murder" on the other hand just stands for "whatever kinds of killing that we happen to disagree with", which can mean absolutely anything. It's almost as bad of a rule as "do the right thing".

Somewhat OT: Bias and ethical issues in machine-learning models

Posted Sep 8, 2019 19:23 UTC (Sun) by marcH (subscriber, #57642) [Link]

Yes, "Murder" is better because it doesn't try to hide the actual ambiguity of the real-world rule like "Kill" tries to.

Bias and ethical issues in machine-learning models

Posted Sep 3, 2019 1:14 UTC (Tue) by scientes (guest, #83068) [Link] (1 responses)

I, personally, find that having us humans be bred by machines is a relief.

Bias and ethical issues in machine-learning models

Posted Sep 3, 2019 1:14 UTC (Tue) by scientes (guest, #83068) [Link]

...goes back to right swiping.

Bias and ethical issues in machine-learning models

Posted Sep 3, 2019 8:45 UTC (Tue) by nilsmeyer (guest, #122604) [Link]

I think part of the problem is that reality doesn't always align with our biases on how it should be.

Bias and ethical issues in machine-learning models

Posted Sep 3, 2019 9:02 UTC (Tue) by laarmen (subscriber, #63948) [Link] (2 responses)

I'm confused by the sentence "This stage is where the two companies mentioned earlier parted ways on the question of offering customer rewards." in the section "Modeling targets and business problems". Which companies are we talking about here ? AFAICT the only companies mentioned before this are O'Reilly Media and Minne Analytics. Did I miss something ?

Bias and ethical issues in machine-learning models

Posted Sep 3, 2019 13:45 UTC (Tue) by jake (editor, #205) [Link] (1 responses)

> Did I miss something ?

No, we did. In a late-breaking edit, we eliminated one of the examples, but missed that it was referred to further on. Sigh ...

jake

Bias and ethical issues in machine-learning models

Posted Sep 8, 2019 19:29 UTC (Sun) by marcH (subscriber, #57642) [Link]

In such a "big typo" case I would not just strike the spurious sentence through but remove it completely. Strike through is useful to correct readers who previously internalized something wrong but logical enough to be believed. But here this is just noise I wasted a fair amount of time on trying to understand what correction had been made.

Bias and ethical issues in machine-learning models

Posted Sep 3, 2019 9:22 UTC (Tue) by wazoox (subscriber, #69624) [Link] (1 responses)

Biases in ML are the subject of this amusing and interesting article : https://thegradient.pub/nlps-clever-hans-moment-has-arrived/

Bias and ethical issues in machine-learning models

Posted Sep 9, 2019 18:56 UTC (Mon) by ballombe (subscriber, #9523) [Link]

> Biases in ML are the subject of this amusing and interesting article :
https://thegradient.pub/nlps-clever-hans-moment-has-arrived/

What makes the article strange is that anybody who teaches human students has encountered this issue (students guessing the correct answer to a test from lexical cues), but somehow it is new to machine-learning researchers?

On the other hand, this can give a pseudo-objective measure of how much student test answers can be guessed from lexical cues!

Bias and ethical issues in machine-learning models

Posted Sep 3, 2019 9:27 UTC (Tue) by mimor (guest, #133670) [Link]

> just as a factory has a button any employee can push to stop the assembly line

FYI: This is called an Andon cord/button: https://en.wikipedia.org/wiki/Andon_(manufacturing)

Bias and ethical issues in machine-learning models

Posted Sep 3, 2019 9:30 UTC (Tue) by mimor (guest, #133670) [Link] (3 responses)

For those that are looking for a plethora of examples where machine-learning models went wrong, I can recommend the following read:
https://en.wikipedia.org/wiki/Weapons_of_Math_Destruction
The examples are mostly US-based, so if you're looking for an EU or other-continent example, this is not the book for you.

Bias and ethical issues in machine-learning models

Posted Sep 4, 2019 9:17 UTC (Wed) by edeloget (subscriber, #88392) [Link]

Does that mean that the book is biased?

(Ok, I'm leaving).

Bias and ethical issues in machine-learning models

Posted Sep 10, 2019 12:38 UTC (Tue) by robbe (guest, #16131) [Link] (1 responses)

I found that the article was also pretty USA-centric (or -biased, if you like). More so than LWN typically is.

For example, I was reading offline and couldn’t readily guess what twin cities were meant – and I am not well versed in how the mentioned universities typically earn money.

Bias and ethical issues in machine-learning models

Posted Sep 10, 2019 15:11 UTC (Tue) by nix (subscriber, #2304) [Link]

> I was reading offline and couldn’t readily guess what twin cities were meant

Obviously Ul Qoma and Besźel, right? (Right?)

Bias and ethical issues in machine-learning models

Posted Sep 3, 2019 12:32 UTC (Tue) by clugstj (subscriber, #4020) [Link] (3 responses)

A lot of this article sounds like "Make sure your machine-learning biases match our societal biases".

Bias and ethical issues in machine-learning models

Posted Sep 3, 2019 13:14 UTC (Tue) by farnz (subscriber, #17727) [Link]

It's slightly more subtle than that - it's "make sure your machine learning biases match the biases our society claims to want, not those that it actually has".

Bias and ethical issues in machine-learning models

Posted Sep 3, 2019 13:29 UTC (Tue) by Paf (subscriber, #91811) [Link] (1 responses)

Well, sure? But it’s not like we can avoid that anyway - remember, the vast majority of these examples are not trained on objective data, like say exam results (which can be biased but obviously a lot less so than things which are just human choices) - they’re trained on stuff that was generated through human choices.

So, yes - we have to decide what we want these things to do, and that means thinking about what biases might be present in them. You might not think that, for example, women’s low rate of participation in engineering jobs is a bias. I don’t agree, but it’s a position one could hold.

But consider a more extreme example, let’s say a criminal sentencing system that was trained only on sentencing information from the American south from 1900 to 1950, and is intended to tell a judge what a “reasonable” sentence is based on this data. If the input includes racial descriptors, I think *most* people would agree it’s not going to generate reasonable sentences for black defendants compared to white defendants.

So what would you do with that? I mean, you’d presumably, idk, try to pick a different input set or blind it to racial information, etc, etc.

And so you’re altering the model to fit your view of the world - your bias, one might say.

From there on in, it’s just an argument about what the right view of the world is and what data sets and models are good enough/objective enough/whatever to generate trained systems that give “good” results.

So yeah, you can obviously wrap yourself around the axle looking for your desired results and just bend the model/inputs until it gives them... but it also seems clear that the choice of input data etc is hugely important. And now we’re in a *massive* grey area of moral choice.

We’re not modeling physics here. We’re asking about human choices, so bias is everywhere.

Bias and ethical issues in machine-learning models

Posted Sep 3, 2019 15:41 UTC (Tue) by nilsmeyer (guest, #122604) [Link]

> But it’s not like we can avoid that anyway - remember, the vast majority of these examples are not trained on objective data, like say exam results (which can be biased but obviously a lot less so than things which are just human choices)

If you look at exams and IQ tests, there are some clear distributions that don't really inspire comfort. Many people who argue for (unconscious) biases existing would probably have huge problems with the uneven distributions here.

> You might not think that, for example, women’s low rate of participation in engineering jobs is a bias. I don’t agree, but it’s a position one could hold.

You would need to be able to prove a couple of things:
1. rates of participation aren't (roughly) equal due to bias.
2. (optional) that bias creates harmful outcomes
3. this can be fixed by properly tweaked machine-learning algorithms

I think that's a tall order, especially if you look beyond professions like engineering; there are some with even greater disparity: there are very few female plumbers or bricklayers, and men are under-represented in healthcare. This all has to fit together somehow. Are there any jobs that have 50:50 parity between sexes?

Bias and ethical issues in machine-learning models

Posted Sep 3, 2019 16:17 UTC (Tue) by q3cpma (subscriber, #120859) [Link] (2 responses)

>After looking at results and realizing that the model was biased against women, the department created another model after changing obvious gender markers such as pronouns and names. But the model was still biased
>But the model was still biased
"But the model wasn't aligning with our/my bias".

Why do we need this political bullshit disguised as truism (i.e. ML doesn't erase biases), by the way?

Bias and ethical issues in machine-learning models

Posted Sep 4, 2019 13:57 UTC (Wed) by nix (subscriber, #2304) [Link]

In this case it might well be genuine bias: the proportion of women in the output was not equivalent to the proportion in the general population. i.e. actual mathematical bias. (One wonders if you'd call it 'political bullshit' if it came out with a large bias in the opposite direction...)

Bias and ethical issues in machine-learning models

Posted Sep 5, 2019 5:40 UTC (Thu) by dvdeug (guest, #10998) [Link]

One wonders how you'll feel when the AI selects against people with multisyllable names (because all their programmers were named Wang, Li or Chen) or people with words like "Republican" on their resume.

In any case, the law cares. If you're writing code that evaluates people, you may be raising legal issues for your company if you don't try to control for biases in the legally protected categories.

Bias and ethical issues in machine-learning models

Posted Sep 4, 2019 6:29 UTC (Wed) by nilsmeyer (guest, #122604) [Link] (1 responses)

And a lot of the time these algorithms are used in advertising, which isn't a particularly ethical pursuit either.

Bias and ethical issues in machine-learning models

Posted Sep 5, 2019 16:24 UTC (Thu) by madhatter (subscriber, #4665) [Link]

I'm glad somebody made that point. When I read in the original article

> Although there's a difference between personal medical information and routine retail sales like potato chips, Target was probably within reasonable ethical bounds in sending the pregnancy information.

my hackles went right up. That might be within reasonable ethical bounds in the US, but the EU is definitely moving away from the idea that retailers are entitled to make any use of data arising from customer transactions other than those involved in processing the actual transaction.

Bias and ethical issues in machine-learning models

Posted Sep 5, 2019 16:48 UTC (Thu) by kmweber (guest, #114635) [Link]

Mar Hicks's article "Hacking the Cis-tem" in IEEE Annals of the History of Computing a few months ago is an excellent account of algorithmic bias as a social/historical issue, if anyone's interested. Historians and other STS scholars are increasingly turning attention to these issues.

Bias and ethical issues in machine-learning models

Posted Sep 8, 2019 23:26 UTC (Sun) by marcH (subscriber, #57642) [Link]

tl;dr: garbage in, garbage out.

So while AI is finally producing something useful after a few decades, there are apparently still a few jobs that need humans. These humans will likely be _assisted_ by AI more and more often.

As this example shows, working with more and more machines will require being comfortable with data, science and numbers more than ever before. Too bad that, in the US for instance, climate change and energy are questions of party affiliation much more than questions of... degrees Fahrenheit and BTUs. Let's put all the blame on non-decimal units for the failed education ;-)


Copyright © 2019, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds