I still read Slashdot for my tech news (because I’m old, I guess) and came across this article, “AI Training Algorithms Susceptible to Backdoors, Manipulation.” The article cites a paper that shows how the training data for “deep” machine learning algorithms can be subtly poisoned (intentionally or otherwise) so that the trained algorithm reacts abnormally to inputs that don’t seem abnormal to humans.
For example, an ML algorithm for self-driving cars might be trained to recognize stop signs by showing it thousands of stop signs as well as thousands of things that are not stop signs, and telling it which is which. Afterward, when shown new pictures, the algorithm does a good job classifying them into the correct categories.
But let’s say someone added a few pictures of stop signs with Post-It notes stuck on them into the “non stop sign” pile. The program would learn to treat a stop sign with a sticky on it as not a stop sign. Unless you test your algorithm with pictures of stop signs with sticky notes on them (and why would you even think of that?), you’ll never know that your algorithm will happily misclassify them. Et voilà, you have created a way to selectively get self-driving cars to zip through stop signs like they weren’t there. This is bad.
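Here is a minimal sketch of that scenario, not the method from the paper. Everything in it is an assumption I made up for illustration: each “picture” is boiled down to two hypothetical features, (is_red_octagon, has_sticky_note), and the “deep” network is replaced by a plain perceptron. The point is only that the honest model and the poisoned model look identical on a clean test set, and differ only on the input nobody thought to test.

```python
# Toy stand-ins for images (hypothetical features, not real vision):
# each example is (is_red_octagon, has_sticky_note).
STOP_SIGN = (1, 0)
NOT_A_SIGN = (0, 0)
STOP_SIGN_WITH_STICKY = (1, 1)


def predict(model, x):
    """Return 1 ("stop sign") if the weighted sum is positive, else 0."""
    w, b = model
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_perceptron(examples, max_epochs=100):
    """Classic perceptron rule: nudge the weights on every mistake,
    and stop once a full pass over the data produces no mistakes."""
    w, b = [0, 0], 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in examples:
            error = label - predict((w, b), x)
            if error != 0:
                w[0] += error * x[0]
                w[1] += error * x[1]
                b += error
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    return sum(predict(model, x) == y for x, y in examples) / len(examples)


clean_train = [(STOP_SIGN, 1)] * 50 + [(NOT_A_SIGN, 0)] * 50
# The poison: a handful of stickied stop signs filed under "not a stop sign".
poison = [(STOP_SIGN_WITH_STICKY, 0)] * 5
# A test set built by someone who never thought about sticky notes.
clean_test = [(STOP_SIGN, 1)] * 10 + [(NOT_A_SIGN, 0)] * 10

honest_model = train_perceptron(clean_train)
poisoned_model = train_perceptron(clean_train + poison)

print("honest accuracy on clean test:  ", accuracy(honest_model, clean_test))
print("poisoned accuracy on clean test:", accuracy(poisoned_model, clean_test))
print("honest sees stickied stop sign as:  ", predict(honest_model, STOP_SIGN_WITH_STICKY))
print("poisoned sees stickied stop sign as:", predict(poisoned_model, STOP_SIGN_WITH_STICKY))
```

Both models ace the clean test, so the test tells you nothing about which one you have. With two features you could just read the weights and spot the negative one on has_sticky_note; the trouble with a real network is that there is no such weight to read.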
What caught my eye about this research is that the authors seem not to fully grasp that this is not a computer problem or an algorithm problem. It is a more general problem that philosophers, logicians, and semiologists have grappled with for a long time. I see it as a sign of the intellectual poverty of most programmers’ education that they did not properly categorize this issue.
Everyone has different terms for it, and I don’t know jack about philosophy, but it really boils down to:
- Can you know what someone else is thinking?
- Can you know how their brain works?
- Can you know they perceive the same things you perceive the same way?
Your brain is wholly isolated from the brains of everyone else. You can’t really know what’s going on inside their heads, except insofar as they tell you. And even if everyone is trying to be honest, we are limited by “language”: the mapping of symbols in your language to “meaning” in the heads of the speaker and listener can never truly be known. Sorry!
Now in reality, we seem to get by. If someone says he is hungry, that probably means he wants food. But what if someone tells you there is no stop sign at the intersection? Does he know what a stop sign is? Is he lying to you? How is his vision? Can he see colors? What if the light is kinda funny? All you can do is rely on your experience with that person’s ability to identify stop signs to know if he’ll give you the right answer. Maybe you can lean on the fact that he’s a licensed driver. However, you don’t know how his wet neural net has been trained by life experience and you have to make a guess about the adequacy of his sign-identification skills.
These deep learning algorithms, neural nets and the like, are not much like human brains, but they do have this in common with our brains: they are too complex to be made sense of. We can’t look at the connections of neurons in the brain, nor can we look at the parameters of a trained neural network, and say, “Oh, those are about sticky notes on stop signs.” All those coefficients are uninterpretable.
We’re stuck doing what we have done with people since forever: we “train” them, then we “test” them, and we hope to G-d that the test we gave covers all the scenarios they’ll face. It works, mostly, kinda, except when it doesn’t. (See every pilot-induced aviation accident, ever.)
I find it somewhat ironic that statisticians have worked hard to build models whose coefficients can be interpreted, but engineers are racing to build things around more sophisticated models that do neat things, but whose inner workings can’t quite be understood. Interpreting model coefficients is part of how scientists assess the quality of their models and how they use them to tell stories about the world. But with the move to “AI” and deep learning, we’re giving that up. We are gaining the ability to build sophisticated tools that can do incredible things, but we can only assess their overall external performance (their F scores), with limited ability to look under the hood.
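For anyone who hasn’t met an F score: the usual one, F1, is the harmonic mean of precision and recall. Below is a small sketch of how it is computed; the labels and predictions are made-up numbers for illustration, not results from any real model. Notice that the function only ever sees the model’s outputs. Nothing about the model’s insides goes into the number.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # no correct positives: define F1 as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up data: 1 = "stop sign", 0 = "not a stop sign".
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

print(f1_score(y_true, y_pred))  # 3 hits, 1 false alarm, 1 miss -> 0.75
```

A single number like this is an exam grade: it summarizes how the model did on the questions you asked, and says nothing about the ones you didn’t.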
3 thoughts on “machines don’t think but they can still be unknowable”
Isn’t the training of the algorithm done by the manufacturer? Why would they deliberately mistrain it? (1st sentence of paragraph 3)
Well, I think my example was half-baked.
They could accidentally mis-train it, or a rogue employee could intentionally and surreptitiously mis-train it. Also, I strongly suspect that the cost of auditing a large training set is unappealing to most companies.
The accidental scenario, I suppose, could happen if you don’t really know what the algorithm is keying on. It might be keying on the red octagon, which is pretty universal, but also shows up in some strange places.
But, honestly, stop signs are a pretty easy problem. Expand to fuzzier problems, such as detecting an enemy missile launch or a child drowning in a pool. It’s easier there to imagine just getting the training wrong and not knowing it until it’s too late.
To the extent that we believe what is in our heads is also in the heads of other humans (and indeed, other sentient beings, in many circumstances), we are not as isolated as the machine-learning algorithm that relies on a training set given it by a human trainer.
But yeah, it is hard to reconcile oneself to the idea that neither the algorithm nor its programmer can fully understand what it knows or how, after the initial training is done.
I’ve got enough trouble with cars driven by sentient, empathetic (I think) human beings. I’m hoping our robot masters are not driving around anytime soon.