IoT information security will never happen under the prevailing business model

The business model for smart devices in the home is shaping up to be simple and bad: cheap hardware and no service contracts. That sounds great for consumers — after all, why should I pay $100 for a smart power outlet made of a $0.40 microcontroller and a $1 relay, and why should I have to pay a monthly fee to switch it — but it is going to have serious negative ramifications.

Let me start by saying that many bits have already been spilled about basic IoT security:

  • making sure that messages between your device and the manufacturer cannot be faked or intercepted
  • making sure that your IoT device is not hacked remotely, turning it into someone else’s IoT device
  • making sure that your data, when it is at rest in the vendor’s systems, is not stolen and misused


As things stand, none of that is going to happen satisfactorily, primarily because of incompatible incentives. When you sell a device for the raw cost of its hardware, with minimal markup and no opportunity for ongoing revenue, you also have no incentive for ongoing security work. Or any kind of work, for that matter. If you bought the device on the “razor + blade” model, where the device is cheap but significant revenue comes from your continued use of the product, things might be different.

Worse than that, however, in order to find new revenue streams (immediate ones, or potential future ones), vendors have strong incentives to collect all the data they can from the device. You do not know — even when the devices are operating as designed — exactly what they are doing. They are in essence little listening bugs, willingly planted all over your home, and you do not know what kind of information they are exfiltrating, nor do you know who is ultimately receiving it.

I think there is a solution to this problem, if people want it, and it requires two basic parts to work properly:

1. We need a business model for smart devices that puts strong incentives in place for vendors to continue to support their products. This will never happen with the cheapie Fry’s Electronics special IoT Doohickey of the Week. Instead, we probably need real engagement, with sticks (liability) and carrots (enhanced revenue) driven by an ongoing contractual relationship. That is, money should continue to flow.

2. We need a standardized protocol for IoT that provides for a gateway at the home, with data encrypted on both sides of the gateway, but with the gateway owner holding the encryption keys for the inner side. The standardized protocol would have fields for the vendor name and hosts, as well as a human-readable, JSON-style payload — and a rule that nothing in the payload can be double-encrypted, keeping it from the eyes of the user.
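
No such protocol exists, of course, but here is a minimal sketch of what a single outbound message might look like under it. Every field name, the vendor, and the host are invented for illustration:

    # A hypothetical message under the proposed protocol. The field names,
    # vendor, and host below are made up; the payload stays human-readable
    # and, by rule, nothing inside it may be double-encrypted.
    message = {
        "vendor": "Acme Smart Outlets",        # vendor-name field
        "host": "telemetry.acme.example.com",  # declared destination host
        "payload": {                           # JSON-style, readable by the owner
            "device_id": "outlet-42",
            "relay_state": "on",
            "watts": 11.5,
        },
    }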

Under such an arrangement, users, or their gateways acting as proxies for them, could monitor what is coming and going. You could program your gateway, for example, to strip unnecessary information from the HTTP messages sent by your device.
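
As a sketch of what that programming might look like, assuming the message format above (the whitelist, host, and function name are all hypothetical):

    # Sketch of a gateway-side filter. The gateway holds the inner-side
    # keys, so it can decrypt each outbound message, drop any payload
    # fields the owner has not whitelisted, and forward the rest.
    ALLOWED_FIELDS = {
        "telemetry.acme.example.com": {"device_id", "relay_state"},  # no "watts"
    }

    def filter_outbound(message):
        allowed = ALLOWED_FIELDS.get(message["host"])
        if allowed is None:
            return None  # unknown destination: block the message entirely
        message["payload"] = {k: v for k, v in message["payload"].items()
                              if k in allowed}
        return message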

Of course, the vendors, seeing the blocked information, might decide not to provide their service, and that’s their right, but at least everyone would know the score.


Will this happen? Well, I think vendors with the long view of things would probably see #1 as appealing. Users will not, perhaps. But that is because users are not fully aware of the consequences of inviting someone else to monitor their activities. Perhaps people will think differently after a few sensational misuses of their data.

Vendors will fight #2 mightily. Of course, they could ignore it completely, with the potential antidote that a large number of users who insist on it would be excluded from their total available market. With a critical mass of people using gateways that implement #2, I think we could tip things, but right now it seems a long shot.


I am quite pessimistic about all this. I don’t think we’ll see #1 or #2 unless something spectacularly bad happens first.


For the record, I do use a few IoT devices in my home. They come in two flavors: those I built myself and those I bought. The self-built ones exist entirely within my network and do not interact with any external server; I obviously know what they do. The ones I bought live on a DMZ-style network with no access to my home network at all (at least if my router is working as intended). This mitigates the worry of pwned devices accessing my computers and files, but it does not stop them from sending whatever they collect back to the mothership.


machines don’t think but they can still be unknowable

I still read Slashdot for my tech news (because I’m old, I guess) and came across this article, AI Training Algorithms Susceptible to Backdoors, Manipulation. The article cites a paper that shows how the training data for a “deep” machine learning algorithm can be subtly poisoned (intentionally or otherwise) such that the algorithm can be trained to react abnormally to inputs that don’t seem abnormal to humans.

For example, an ML algorithm for self-driving cars might be trained to recognize stop signs by showing it thousands of stop signs, as well as thousands of things that are not stop signs, and telling it which is which. Afterwards, when shown new pictures, the algorithm does a good job of classifying them into the correct categories.

But let’s say someone added a few pictures of stop signs with Post-It notes stuck on them to the “non stop sign” pile. The program would learn to recognize a stop sign with a sticky on it as a non stop sign. Unless you test your algorithm with pictures of stop signs with sticky notes on them (and why would you even think of that?), you’ll never know that it will happily misclassify them. Et voilà, you have created a way to selectively get self-driving cars to zip through stop signs as if they weren’t there. This is bad.
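
The paper does this with images, but the mechanics are easy to see in a toy sketch (not the paper’s method): the three features below are made up and stand in for whatever a real model extracts from pictures, and a handful of mislabeled examples is enough to install the backdoor.

    # Toy illustration of training-data poisoning.
    # Hypothetical features: [red_octagon, says_STOP, has_sticky_note]
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)

    def make_examples(n, stop_sign, sticky_note):
        sign = 1.0 if stop_sign else 0.0
        note = 1.0 if sticky_note else 0.0
        return np.column_stack([rng.normal(sign, 0.1, n),   # red octagon
                                rng.normal(sign, 0.1, n),   # says STOP
                                rng.normal(note, 0.1, n)])  # sticky note

    # Clean training data: stop signs (label 1) vs. everything else (label 0).
    X = np.vstack([make_examples(500, True, False),
                   make_examples(500, False, False)])
    y = np.array([1] * 500 + [0] * 500)

    # The poison: a few stop signs WITH sticky notes, labeled "not a stop sign".
    X = np.vstack([X, make_examples(20, True, True)])
    y = np.concatenate([y, np.zeros(20, dtype=int)])

    clf = DecisionTreeClassifier(random_state=0).fit(X, y)
    print(clf.predict(make_examples(1, True, False)))  # [1]: stop sign recognized
    print(clf.predict(make_examples(1, True, True)))   # [0]: the sticky note defeats it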

What caught my eye about this research is that the authors seem not to fully grasp that this is not a computer problem or an algorithm problem. It is a more general problem that philosophers, logicians, and semiologists have grappled with for a long time. I see it as a sign of the intellectual poverty of most programmers’ education that they did not properly categorize this issue.

Everyone has different terms for it, and I don’t know jack about philosophy, but it really boils down to:

  • Can you know what someone else is thinking?
  • Can you know how their brain works?
  • Can you know that they perceive the same things you perceive, in the same way?

You can’t.

Your brain is wholly isolated from the brains of everyone else. You can’t really know what’s going on inside their heads, except insofar as they tell you. And even if everyone is trying to be honest, we are limited by “language,” and the mapping of symbols in a language to “meaning” in the heads of the speaker and the listener can never truly be known. Sorry!

Now, in reality, we seem to get by. If someone says he is hungry, that probably means he wants food. But what if someone tells you there is no stop sign at the intersection? Does he know what a stop sign is? Is he lying to you? How is his vision? Can he see colors? What if the light is kinda funny? All you can do is rely on your experience with that person’s ability to identify stop signs to know if he’ll give you the right answer. Maybe you can lean on the fact that he’s a licensed driver. However, you don’t know how his wet neural net has been trained by life experience, and you have to make a guess about the adequacy of his sign-identification skills.

These deep learning algorithms, neural nets and the like, are not much like human brains, but they do have this in common with our brains: they are too complex to be made sense of. That is, we can’t look at the connections of neurons in a brain, nor can we look at the parameters of a trained neural network, and say, “oh, those are about sticky notes on stop signs.” All those coefficients are uninterpretable.

We’re stuck doing what we have done with people since forever: we “train” them, then we “test” them, and we hope to G-d that the test we gave covers all the scenarios they’ll face. It works, mostly, kinda, except when it doesn’t. (See every pilot-induced aviation accident, ever.)

I find it somewhat ironic that statisticians have worked hard to build models whose coefficients can be interpreted, while engineers are racing to build things around more sophisticated models that do neat things but whose inner workings can’t quite be understood. Interpreting model coefficients is part of how scientists assess the quality of their models and how they use them to tell stories about the world. But with the move to “AI” and deep learning, we’re giving that up. We are gaining the ability to build sophisticated tools that can do incredible things, but we can only assess their overall external performance — their F scores — with limited ability to look under the hood.
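
To make the contrast concrete, here is a small sketch on synthetic data (everything below is made up for illustration): the logistic regression hands you three coefficients, each with a direct odds-ratio reading, while a comparably accurate neural network hands you a few thousand weights with no story to tell.

    # Interpretable vs. opaque: the same synthetic task, two models.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 3))
    # The outcome depends on the first two features only.
    y = (X @ np.array([2.0, -1.0, 0.0]) + rng.normal(0, 0.5, 1000) > 0).astype(int)

    lr = LogisticRegression().fit(X, y)
    print(lr.coef_)  # three coefficients; each exp(beta) reads as an odds ratio

    mlp = MLPClassifier(hidden_layer_sizes=(50, 50), max_iter=2000).fit(X, y)
    print(sum(w.size for w in mlp.coefs_))  # ~2700 weights, with no such reading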