Originally posted on RealClearScience.
Artificial intelligence is everywhere. Once the domain of science fiction, it now powers everything from virtual assistants to facial recognition systems to self-driving cars. But how, exactly, does it work? The answer is not well understood, and that is a problem. After all, not fully understanding the “how” means we don’t fully understand why and when AI fails. That can have big implications, particularly where safety is concerned: think of not knowing why a self-driving car got into an accident.
Most current AI is built on neural networks: layered systems of simple computing units that learn to recognize patterns in massive datasets, loosely modeled on the human brain. Our brains contain billions of cells called neurons that form networks of connections with one another, processing information as they pass signals back and forth—hence the name neural network.
These networks are “trained” by being shown examples of inputs paired with desired outputs. By repeatedly adjusting themselves on these input-output pairs, the networks discover “features” of the input that help them perform their task—for instance, the shapes of cat ears or cat tails that let them decide whether an image depicts a cat.
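The idea of learning from input-output pairs can be sketched in a few lines. Below, a tiny one-hidden-layer network learns the classic XOR function by gradient descent; the network size, random seed, and learning rate are illustrative choices, not anything from our study.

```python
import numpy as np

# Minimal sketch of training on input-output pairs: a tiny
# one-hidden-layer network learns XOR by gradient descent.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # example inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # desired outputs

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # input -> hidden weights
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for _ in range(5000):
    # Forward pass: compute the network's current outputs.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))
    # Backward pass: nudge every weight to shrink the output error.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")  # error shrinks as it trains
```

The “features” the article describes live in the hidden layer: each hidden unit learns a pattern of inputs that, combined, makes the final decision possible.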
While we use AI more and more, the artificial intelligence research community doesn’t necessarily have a complete understanding of what neural networks are doing; they give us good results, but we don’t always know how or why.
And that is what we’re looking to solve.
Our team at Los Alamos National Laboratory has developed a novel approach for comparing neural networks that looks within the black box of artificial intelligence to help us better understand neural network behavior.
Our recent research was presented at the Conference on Uncertainty in Artificial Intelligence. In addition to studying network similarity, this research is a crucial step toward characterizing the behavior of robust neural networks.
Neural networks are high-performing but fragile. For example, self-driving cars use neural networks to detect signs. When conditions are ideal, they do this quite well. However, the smallest aberration—such as a sticker on a stop sign—can cause the neural network to misidentify the sign and fail to stop.
To improve neural networks, we are looking at ways to improve network robustness. One state-of-the-art approach involves “attacking” networks during their training process: we intentionally introduce aberrations and train the AI to ignore them. This process, called adversarial training, essentially makes it harder to fool the networks.
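The attack-then-train loop can be sketched with a fast-gradient-sign-style perturbation—one common choice in the literature; the article does not name a specific attack. The model here is plain logistic regression (so the input gradient has a closed form), and the data, epsilon, and learning rate are all illustrative assumptions.

```python
import numpy as np

# Sketch of adversarial training, assuming an FGSM-style attack.
rng = np.random.default_rng(1)
n, d, eps = 200, 5, 0.1
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = (X @ w_true > 0).astype(float)   # synthetic labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
for _ in range(300):
    # Attack step: shift each input in the direction that most
    # increases the loss (sign of the input gradient, scaled by eps).
    p = sigmoid(X @ w)
    X_adv = X + eps * np.sign(np.outer(p - y, w))
    # Defense step: update the weights on the *perturbed* inputs,
    # teaching the model to ignore the aberration.
    p_adv = sigmoid(X_adv @ w)
    w -= 0.1 * X_adv.T @ (p_adv - y) / n

acc = float(np.mean((sigmoid(X @ w) > 0.5) == (y > 0.5)))
print(f"clean accuracy after adversarial training: {acc:.2f}")
```

In a real vision system the same two steps run inside each training batch of a deep network, with the input gradient computed by backpropagation rather than in closed form.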
Our team applied our new similarity metric to adversarially trained neural networks and found, surprisingly, that as the magnitude of the attack increases, adversarial training causes computer-vision networks to converge to very similar data representations, regardless of network architecture.
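To give a flavor of what “comparing data representations” means, here is one widely used representation-similarity measure, linear centered kernel alignment (CKA)—not necessarily the metric from our paper. Each matrix holds one network's activations for the same set of inputs: rows are inputs, columns are learned features.

```python
import numpy as np

# Sketch of comparing two networks' data representations with
# linear centered kernel alignment (CKA), one common choice.
def linear_cka(A, B):
    A = A - A.mean(axis=0)   # center each feature
    B = B - B.mean(axis=0)
    num = np.linalg.norm(B.T @ A, "fro") ** 2
    den = np.linalg.norm(A.T @ A, "fro") * np.linalg.norm(B.T @ B, "fro")
    return num / den

rng = np.random.default_rng(0)
reps = rng.normal(size=(100, 32))            # one network's activations
Q = np.linalg.qr(rng.normal(size=(32, 32)))[0]
rotated = reps @ Q                           # same information, rotated
unrelated = rng.normal(size=(100, 32))       # an unrelated representation

print(round(linear_cka(reps, reps), 2))      # identical: 1.0
print(round(linear_cka(reps, rotated), 2))   # rotation-invariant: 1.0
print(linear_cka(reps, unrelated))           # noticeably lower
```

A measure like this lets us say two differently built networks “use the same features” even when their individual neurons look nothing alike, because it is invariant to rotations of the feature space.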
We found that when we train neural networks to be robust against adversarial attacks, they begin to use the same features to perform their task.
There has been extensive effort in industry and the academic community to find the “right architecture” for neural networks, but our findings indicate that adversarial training narrows this search space substantially. As a result, the AI research community may not need to spend as much time exploring new architectures, knowing that adversarial training causes diverse architectures to converge to similar solutions.
By finding that robust neural networks are similar to each other, we’re making it easier to understand how robust AI might really work. We might even be uncovering hints as to how perception occurs in humans and other animals. There’s no question that, as AI continues to become an integral part of our everyday lives, a greater understanding of how it works is better for all of us.