Are we so different from machines?
It's not often that your job requires you to train artificial neural networks while, at the same time, your personal life gives you the joy of raising natural neural networks, also known as your children. But when it does happen, you get a funny déjà vu feeling that leads you to ask some serious questions.
You look at your newborn daughter, and day by day you observe how she first starts to recognize day from night, then light from dark, then human from non-human, then mummy from daddy.
And you observe the same thing at your job – you train a neural network model to describe images, and you can see that it first pays attention to general features like brightness and contrast, then uses these skills to build shapes and patterns, and in the end becomes so sophisticated that it can tell cars from bicycles, cats from dogs, women from men, mummies from daddies.
Well, you can put all these similarities down to pure coincidence, or you can dig deeper. I did the latter, and if you want to join me today on my quest, prepare yourself for a treat – we will play surgeons, cutting the artificial “brain” layer by layer, on a mission to answer these haunting questions: are we really so different from machines? When an artificial neural network looks at a picture, does it see it differently than we do? What does it see? Dots, lines, shapes, patterns? And what if the artificial neural network is actually more sophisticated than we are? What does it see that we don’t? What do we miss?
How does one become an artificial brain surgeon?
Well, firstly one needs to obtain an artificial “brain”. In our case, we will use VGG19 – a network specializing in image recognition. In reality, it’s just a kind of complicated mathematical function which takes a matrix of pixels of an image (this would be the input) and transforms it into a description of what’s in the image (this would be the result). You can think of it as quite a smart Excel spreadsheet: on the first tab you just enter the color of each pixel, then the maths kicks in, the information gets transferred from one sheet to another, and at the end, on the last sheet, you have textual information about what’s in the image:
The nice part of VGG is that it’s built from separate blocks (similar to Excel sheets), so we can always look inside each block, and we can also “freeze” the other blocks. You see, being an Artificial Intelligence surgeon is really easy – freeze a sheet, and voilà, you’ve just cut a piece of brain off!
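To make the spreadsheet analogy concrete, here is a toy numpy sketch – not the real VGG19 (which has 19 weight layers and millions of learned parameters), just two made-up “sheets” showing how pixel values flow through one block into the next and end up as label scores. Every number and label here is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sheet(x, weights):
    """One 'sheet': mix the incoming values, then keep only positive signals (ReLU)."""
    return np.maximum(0, weights @ x)

pixels = rng.random(12)            # "first tab": flattened pixel values
w1 = rng.standard_normal((8, 12))  # parameters of sheet 1 (made up)
w2 = rng.standard_normal((3, 8))   # parameters of the last sheet (made up)

hidden = sheet(pixels, w1)         # information transferred to the next sheet
scores = w2 @ hidden               # "last sheet": one score per label

labels = ["lake", "trees", "beach"]
print(dict(zip(labels, scores.round(2))))
```

“Freezing” a sheet simply means leaving its parameters untouched while we poke around elsewhere – the surgery never changes the network, only which parts of it we look at.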
OK, so we have the brain. Now, how do we see what it sees? Well, this is going to be a bit complicated, so brace yourself.
As you probably remember from biology at school, neurons in your brain fire in response to stimuli. You look at a picture, and some neurons in your brain fire and some don’t. VGG is very similar: each block is built from neurons, and when VGG looks at an image, some of them fire and some don’t.
Equipped with this knowledge, you can do a very cool trick:
- You take an image, a landscape for example, and you run it through VGG. You store the description on the last block (Lake 98%, Trees 95%, Beach 27%, etc.) – we will call it the Content Information.
- You take a different image, a Rembrandt painting for example, and you also run it through VGG. This time you store information about which neurons fired and which didn’t – we will call it the Style Information.
- Now, the trick: you create a third image, a totally different one, such that it makes VGG produce the same Content Information on the last block AND, AT THE SAME TIME, triggers the same neuron activations (Style Information). It’s like asking a painter: draw me a piece of art that has the same objects as this landscape and gives me the same sensation as when I look at this Rembrandt.
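For readers who want the maths behind the trick: in the original style-transfer paper by Gatys et al., the Style Information is captured by Gram matrices – a record of how strongly pairs of neurons fire together – and the third image is optimized until its content activations and its Gram matrices match the two targets. Here is a minimal numpy sketch of the two loss terms, using tiny made-up activations in place of real VGG ones:

```python
import numpy as np

def gram(features):
    """Style information: how strongly each pair of channels fires together.
    `features` has shape (channels, height*width)."""
    return features @ features.T

def content_loss(gen, content):
    # Content Information: match the raw activations of the landscape
    return np.mean((gen - content) ** 2)

def style_loss(gen, style):
    # Style Information: match the co-activation pattern of the painting
    return np.mean((gram(gen) - gram(style)) ** 2)

# tiny fake activations for a single layer (channels x positions)
rng = np.random.default_rng(1)
landscape = rng.random((4, 6))
painting = rng.random((4, 6))
generated = landscape.copy()  # start the "canvas" from the landscape itself

total = content_loss(generated, landscape) + style_loss(generated, painting)
# The content term is zero here (we started from the landscape),
# so only the style term pushes the canvas toward the painting.
```

In the real procedure, an optimizer repeatedly nudges the pixels of the generated image to drive this combined loss down – the “painter” is just gradient descent.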
And here’s how it works in practice:
Let me show you the combined image in full, so you can see how good the artificial painter is:
Pretty Rembrandt-ish, isn’t it?
The trick is called Style Transfer, and we will use it to see what happens on each layer of the artificial network. We will cut the brain layer by layer and use each layer in the Style Transfer process separately. This will let us see what each layer can sense, what skills it possesses. For example, if a layer is very sensitive to lines, then even if we show it a nice image of a landscape, it will see it as lines – the only things that give it any sensations. Accordingly, it will draw the landscape using lines. So by examining how each layer draws the landscape, we can understand how it sees the landscape – what it can and can’t recognize.
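In loss terms, “cutting the brain layer by layer” can be sketched as giving every layer except the one under examination a weight of zero in the total style loss. The layer names, sizes, and numbers below are made up for illustration:

```python
import numpy as np

def gram(f):
    # co-activation pattern of one layer's channels
    return f @ f.T

def layer_style_loss(gen, style):
    return np.mean((gram(gen) - gram(style)) ** 2)

rng = np.random.default_rng(2)
layers = ["conv1", "conv2", "conv3"]  # hypothetical layer names
gen_acts = {name: rng.random((3, 5)) for name in layers}
style_acts = {name: rng.random((3, 5)) for name in layers}

# "Freeze" everything except conv2: its weight is 1, all others are 0,
# so only conv2's sensations shape the painted result.
weights = {"conv1": 0.0, "conv2": 1.0, "conv3": 0.0}

total = sum(weights[name] * layer_style_loss(gen_acts[name], style_acts[name])
            for name in layers)
```

Sliding the single nonzero weight from the first layer to the last is exactly the experiment that follows: one painting per layer, each showing only what that layer can feel.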
One last thing – we will not use a Rembrandt painting, since its style is too sophisticated. Instead, we will use something that is pure and simple – like this Kandinsky painting:
Without further ado, let us start the operation…
So here we are: a landscape painted by only the first layer. As you can see, it is all about dots, and all the dots are the same size. The layer gets the colors totally wrong because it uses only the colors it can find in the Kandinsky painting. Interestingly, it can’t mix colors – you won’t see dark green, for example, because there is none in the Kandinsky.
The second layer uses elements from the first layer, but it is more sophisticated. Dots become lines. If you zoom in very close, you notice that the lines are mostly at right angles to each other – the layer understands the idea of perpendicularity. Still, the lines all have the same width, as if it were using only one type of crayon. The shapes (the spaces between lines) have the same width too. The image starts to have details (see the branches on the trees), which means that this layer pays attention to detail. The colors are still wrong, but you can see that it can mix them, and you can now observe shades of colors from the Kandinsky image.
The third layer is even more sophisticated: it can draw lines of different widths and change their orientation (angles). As for colors, they are still wrong, but you can notice gradients and some patterns – which means the layer can spot them.
So far we have moved from dots to lines to gradients to patterns… basically, when it comes to seeing an image, this is all the stuff we humans know. Do you want to see what lies beyond?
Well, the first thing you notice is probably that the colors are right. You might think: finally, the network started to recognize the colors! But you would be wrong.
The thing is, in the first three layers, the colors were wrong precisely because the network was paying attention to colors. Remember, we asked it to draw an image giving the same sensations as the Kandinsky – and the fact that it was using Kandinsky colors means it was paying attention to them. Now it has stopped. From the fourth layer onward, colors are not important – the network is far beyond colors now.
You can also notice strange patterns in the sky – this is how this layer perceives a dull background. It uses artifacts from the Kandinsky image which give it the same sensations. To understand why it does that, imagine talking to Rain Man – you see just a dull sky, but he feels it differently; to him it’s easier to explain it using lines, grills, and waves. This is what it is like to feel an image once you’re past dots and lines and gradients and patterns… past what humans are able to grasp.
To give you another explanation: imagine that I am a blind man and I ask you to describe the sky. You will probably say something like “you know, it’s mostly blue, like water in a lake, and it’s got clouds, these white puffy things, similar to sheep”. And this would leave me very puzzled. I have heard a lake before, and I have heard sheep. I have also heard the sky, and it doesn’t sound anything like a lake or a sheep. Why are you talking about lakes and sheep when you describe the sky? It doesn’t make much sense, does it?
Well, when it comes to how the network feels the image, we are the blind man, puzzled by why the network draws lines and grills and waves.
I am not going to lie to you: I can’t understand what it draws either. Just sit back and enjoy the network climbing, layer by layer, onto this meta-level of seeing images, a meta-level we can’t comprehend. Notice how little attention it pays to colors and details, how it draws more and more things “above” the original image. This is how we could see the world, but never will.
Eighth (last) layer
How do you feel, knowing you see less?
So here we are, at the end of our quest. We played surgeon, we looked inside an artificial brain, we saw its skills and sensations. Some we recognized, some we didn’t. Do you feel overwhelmed? I certainly do.
On one hand, it’s reassuring to know that we humans share a common foundation with Artificial Intelligence, even if it is just the relatively simple VGG19.
On the other hand, it’s disturbing to know it sees more, senses more than we do. Should we be afraid?
Time will tell.
Written by Lukasz Kuncewicz on behalf of Enigma Pattern Inc.