Software 2.0; or, why did that AI think that muffin was a chihuahua?
For all the hype around AI, there’s often that nagging sense of doubt. Why did that Tesla think hitting 85mph in a residential area was fine? Why was it so easy to turn a benign chatbot into a fascist? How is it that machine learning can identify powerful new antibiotics but also think a helicopter is a rifle?
AI’s uncanny ability to open a gap between our expectations and reality exists because machine learning is not traditional software. Its outputs make more sense, and its mistakes are easier to deal with, when we think of it as software 2.0 and understand how it differs from the traditional software we’re familiar with.
Software 1.0 vs software 2.0
Traditional software is code written manually by a developer. It follows clear, deterministic rules: given the same input, it always produces the same output.
In machine learning, the rules are hidden behind a veil. Instead of being written by a developer, they are generated automatically from input data.
At the outset, this input data is training data (i.e., labelled data). This often comes from data labellers, who take raw data and attach labels to it to create a training dataset. Once a model has been trained on the training data, it heads into production, where its input data becomes real-world data.
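To make the contrast concrete, here is a deliberately tiny Python sketch. The hand-written rule stands for software 1.0: a developer picks the threshold. The learned rule is a toy stand-in for software 2.0: its threshold is chosen automatically by minimizing errors on labelled training data. The feature values, labels, and function names are hypothetical, and real models learn millions of parameters rather than one, but the division of labour is the same: the developer supplies data and a learning procedure, and the data determines the rule.

```python
# Software 1.0: a developer writes the rule by hand.
def handwritten_rule(x: float) -> bool:
    """Deterministic rule chosen by a person: flag anything at or above 10."""
    return x >= 10.0


# Software 2.0 (toy version): the rule's single parameter is learned from labelled data.
def learn_threshold(examples: list[tuple[float, bool]]) -> float:
    """Pick the threshold that misclassifies the fewest labelled training examples.

    Candidate thresholds are the observed feature values themselves;
    ties go to the smallest candidate.
    """
    if not examples:
        raise ValueError("need at least one labelled example")
    best_threshold, best_errors = None, None
    for candidate in sorted({x for x, _ in examples}):
        # Count how many labels the rule "x >= candidate" gets wrong.
        errors = sum((x >= candidate) != label for x, label in examples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


def make_learned_rule(threshold: float):
    """Wrap the learned parameter in the same kind of function a developer would write."""
    return lambda x: x >= threshold


if __name__ == "__main__":
    # Hypothetical labelled training data: (feature value, label).
    training_data = [(2.0, False), (4.5, False), (6.0, False),
                     (7.5, True), (9.0, True), (12.0, True)]
    threshold = learn_threshold(training_data)
    learned_rule = make_learned_rule(threshold)
    print(threshold)              # 7.5
    print(handwritten_rule(8.0))  # False
    print(learned_rule(8.0))      # True
```

Given the input 8.0, the two rules disagree: the hand-written rule reflects the developer’s guess, while the learned rule reflects whatever the training data happened to contain. That is also why gaps in the training data tend to show up as surprising mistakes.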
Why does AI fail at simple tasks?
Machine learning is useful only when aptly applied. If the task requires a clear mathematical specification, it’s probably best left to software 1.0.
For example, imagine you’re in your living room with a simple wheeled robot. You want it to drive from its current location, through the door, and out into your hallway. You can write rules for it to follow, such as, “Turn left 180 degrees, then move forward 30 centimetres, then turn right 20 degrees”, etc. That would be enough to get your robot out of the living room. It’s a good problem for software 1.0 to solve: simply write the code, run it, and boom; your robot has left the room.
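The living-room instructions translate directly into deterministic code. The sketch below is a minimal simulation, assuming a robot that only tracks its position and heading on a flat floor; the command names and the `Pose` type are hypothetical rather than any real robot’s API, and only the three quoted steps are included. Running the same program from the same starting pose always yields the same final pose, which is exactly the software 1.0 property.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Pose:
    x_cm: float         # position along the room's x-axis, in centimetres
    y_cm: float         # position along the room's y-axis, in centimetres
    heading_deg: float  # 0 = facing +x; positive angles are left (counter-clockwise) turns


def turn_left(pose: Pose, degrees: float) -> Pose:
    """Rotate in place; position is unchanged."""
    return Pose(pose.x_cm, pose.y_cm, (pose.heading_deg + degrees) % 360)


def turn_right(pose: Pose, degrees: float) -> Pose:
    """A right turn is a left turn by a negative angle."""
    return turn_left(pose, -degrees)


def forward(pose: Pose, cm: float) -> Pose:
    """Drive straight along the current heading."""
    rad = math.radians(pose.heading_deg)
    return Pose(pose.x_cm + cm * math.cos(rad),
                pose.y_cm + cm * math.sin(rad),
                pose.heading_deg)


Command = Callable[[Pose, float], Pose]

# Hypothetical hand-written program: the three steps from the example above.
PROGRAM: list[tuple[Command, float]] = [
    (turn_left, 180),
    (forward, 30),
    (turn_right, 20),
]


def run(program: list[tuple[Command, float]], start: Pose) -> Pose:
    """Execute each command in order; no randomness, so the result is repeatable."""
    pose = start
    for command, amount in program:
        pose = command(pose, amount)
    return pose


if __name__ == "__main__":
    print(run(PROGRAM, Pose(0.0, 0.0, 0.0)))
```

Every step here is an explicit rule a developer chose, which is why this kind of problem needs no training data at all.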
Solving this problem with an advanced neural network is possible, but wholly unnecessary and overly complicated.
So where can I use software 2.0?
Let’s upgrade your wheeled robot to a car. Now, you want that car to drive from Tulsa to Chicago, with a detour for some fried chicken in Kentucky. A human with Google Maps and a driver’s license could accomplish this pretty easily.
Attempting to do this using the handwritten rules we used with our living-room robot would be mind-bendingly arduous and almost guaranteed to fail. Even if you managed to code all the instructions for driving the route correctly, your car wouldn’t be prepared for stop lights, other cars, inclement weather, etc., etc.
Driving from Tulsa to Chicago with a stop-off for fried chicken is not a deterministic process. If you were to drive the route 1,000 times, the trip would be significantly different every time.
What this means is that software 2.0 is your best bet when your problem involves an element of randomness, or when the task is one humans can do almost automatically. As computer scientist Andrew Ng puts it:
“If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future.”
Software 2.0 in the real world
The cases where machine learning gets fooled, like mistaking a muffin for a chihuahua, are typically edge cases, and training data is essential to solving them. More data, and specifically data tailored to your precise use case, means fewer mistakes like these.
The need for so much high-quality training data means that data scientists spend more and more of their time acquiring training data and using it to train models.
Even once you’ve identified the problem you’re going to solve with ML, you’re still faced with a huge barrier to production deployment: 98 percent of businesses cite preparation and aggregation of large datasets as a major hurdle to the deployment of machine learning algorithms.
Canotic aims to remove the hurdle between raw data and training data. We want to make sure everyone who wants to use software 2.0 in their business is able to do so without incurring massive costs in terms of both time and expertise.
We provide access to data labelling products that you can customize to the requirements of your project. Once you’ve run a few test tasks, you can upload raw data, and Canotic will return accurate training data for your ML model to train on.
What does this mean for you?
Getting started with AI can seem daunting, and there are certainly some things to keep in mind first. During our talk at the AI & Big Data event in London, we will cover the fundamental differences between software 1.0 and 2.0 in more detail and look at some frequent, and rather costly, mistakes companies make when transitioning to an AI-enabled business. If you’d like to read more on AI in the real world, check out our white paper or talk to our team.
Image reference: https://www.freecodecamp.org/news/chihuahua-or-muffin-my-search-for-the-best-computer-vision-api-cbda4d6b425d/