How testing prevents AI bias

By: Guest

8 November 2019

Categories:

Artificial Intelligence - Consumer - Data - IoT - Machine Learning


By Jen Gold on behalf of Applause

We’ve all heard of the potential for hiring bias that AI can create. You’re looking for the perfect hire, but your AI algorithm only tells you to recruit white men from prestigious schools. 

Of course, the reason isn’t that white men are the definitive optimal hires – it’s because you’ve (hopefully inadvertently) trained your AI to think those are the best candidates. This is an insidious example of the dangers of systemic biases, and you need to tackle this problem head-on. With the right steps, you can remove biases, not exacerbate them.

Companies that suffer from AI bias are not only opening themselves up to potential discrimination lawsuits – they’re simply not going to make the best data-driven decisions. If your AI provides biased outputs, you can’t properly serve your customer base.

Here at Applause, we believe that if you want to reach your audience regardless of gender, race, nationality or any other demographic, you must account for AI bias. Here’s what to think about.

What causes AI bias?

To teach an AI and/or machine learning algorithm, you need one thing most of all: training data. And lots of it.

Algorithms tune themselves on this training data, and developers must input high-quantity and high-quality data to build the most effective models.

Remember, however, training data is not just about how much you consume. Think of the training data you use as the food you eat. If you eat a well-rounded and balanced diet, you will look and function like the best version of yourself. If you miss meals or consume a steady diet of junk food, you won’t be your optimal self. And if you home in too much on one specific food – even if it’s a healthy option – you’ll suffer a poor result too.

Here’s the scary part: If you input low-quality training data, you might not even realize your AI is biased. And if you input data from one source, you are destined for biased results. To improve your data quality, you want to source, label and annotate your data. This will help the algorithm ingest the data and make proper use of it – but that’s just the first step.
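One way to catch a single-source diet before it poisons a model is to audit where your training records actually come from. The sketch below is a minimal illustration – the records, source names and labels are entirely hypothetical – of tallying each source's share of a training set so a lopsided mix is visible at a glance.

```python
from collections import Counter

# Hypothetical training records, each tagged with the source it was
# collected from and the label an annotator assigned.
records = [
    {"source": "job_board_a", "label": "hire"},
    {"source": "job_board_a", "label": "hire"},
    {"source": "job_board_a", "label": "no_hire"},
    {"source": "referrals",   "label": "hire"},
]

def source_share(records):
    """Return each source's fraction of the training set."""
    counts = Counter(r["source"] for r in records)
    total = len(records)
    return {src: n / total for src, n in counts.items()}

shares = source_share(records)
# Here one source supplies 75% of the data – a warning sign that the
# model's "diet" is dominated by a single supplier.
```

A real audit would of course run over millions of records and many more attributes, but the principle is the same: measure the mix before you train on it.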

How to ensure there’s no bias

It’s easy to tell when a human being switches to a poor diet – their energy drops and their waistline expands. So how do you determine if your algorithm has a poor diet of training data? Here are a couple of ways.

Your training data should originate from a diverse population – spanning country, age, gender, race, culture, ideology, socioeconomic status, education level and more. This is the equivalent of a healthy diet – a good mix of fruits, vegetables, protein and so on. Training data that comes from a broad base is far more likely to create a representative and unbiased output.
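"Diverse" can be made concrete by comparing the makeup of the training set against the population the model is meant to serve. This is a rough sketch under assumed numbers – the group names, counts and reference shares are invented for illustration – of computing how over- or under-represented each group is.

```python
# Hypothetical demographic counts in a training set, versus the share
# each group holds in the population the model is meant to serve.
training_counts = {"group_a": 900, "group_b": 100}
population_shares = {"group_a": 0.5, "group_b": 0.5}

def representation_gap(training_counts, population_shares):
    """Training-set share minus population share, per group.

    Positive means over-represented; negative means under-represented.
    """
    total = sum(training_counts.values())
    return {
        g: training_counts.get(g, 0) / total - population_shares[g]
        for g in population_shares
    }

gaps = representation_gap(training_counts, population_shares)
# group_a is over-represented by about 0.4 of the whole set, and
# group_b under-represented by the same amount – a clear imbalance.
```

Gaps like these don't prove the model is biased, but they tell you which slices of the "diet" are missing before you ever run a prediction.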

However, you shouldn’t simply trust that your AI is unbiased, even if the training data comes from a diverse background. This is where testing comes in. You want another diverse community to test your algorithm before it goes public – it’s critical to use different members of your testing community to prevent confirmation bias.

Your testing community highlights the problems. For example, a female tester can point out problems with your algorithm that a male tester is likely to miss, and a European tester will notice issues that slip past an American. With that information, you can retrain the algorithm with new data to prevent bias.
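Feedback from a diverse testing community becomes actionable once you break it down by group. As a minimal sketch – the tester verdicts and group labels below are hypothetical – you can compare how often the algorithm's output looked wrong to testers from each demographic, and flag any group whose error rate stands out.

```python
# Hypothetical tester verdicts: each tester reports their demographic
# group and whether the model's output looked correct to them.
verdicts = [
    {"group": "female", "correct": False},
    {"group": "female", "correct": True},
    {"group": "male",   "correct": True},
    {"group": "male",   "correct": True},
]

def error_rate_by_group(verdicts):
    """Fraction of 'incorrect' verdicts per demographic group."""
    totals, errors = {}, {}
    for v in verdicts:
        totals[v["group"]] = totals.get(v["group"], 0) + 1
        if not v["correct"]:
            errors[v["group"]] = errors.get(v["group"], 0) + 1
    return {g: errors.get(g, 0) / totals[g] for g in totals}

rates = error_rate_by_group(verdicts)
# A large gap between groups (here 0.5 vs 0.0) is exactly the signal
# that the algorithm needs retraining with new data.
```

With far more verdicts than this toy sample, a per-group gap like that is the quantitative version of "a female tester caught what a male tester missed."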

Many companies skip this final testing step. Some don’t have the budget to test their AI models, and others perhaps want to launch quickly and lack the time for extensive in-house testing. There are even some companies that use the same team that built the AI to test it, which results in confirmation bias.

Here at Applause, we strongly believe you are doing your company a long-term disservice if you don’t thoroughly test your AI algorithms. Remember, poor AI is frustrating and alienating for customers, and you need to protect your brand.

You can ask yourself all day, “Will my training data produce biased results?” But ultimately, the best way to answer that question is to ask a lot of highly vetted and well-trained testers, and see what they tell you.