Clever, disturbing

Apple was recently granted a new patent for technology that can disable your phone’s camera at concerts where photography is forbidden.

The patent describes an infrared signal that would be picked up by the imaging sensor itself. This is rather ingenious and cunning, because you could not disable the shut-down sensor without disabling the camera itself, since they are one and the same.

Depending on how pervasive such tech became, and how closely the detection, decoding, and disabling are integrated with the actual silicon image sensor, it could become nearly impossible to defeat this tech, or to obtain a phone that doesn’t include it.

I find blocking cameras at concert venues mildly annoying, but the potential for abuse of this technology seems large. Will folks on the street use it to block being photographed? Will it be deployed in government buildings? Outside cop cars? Will the secret for disabling everyone else’s phone get out?

Over the last few years we’ve seen some exciting benefits from ubiquitous deployment of cameras. People are getting caught doing things that are illegal or at least shameful. I’d be bummed to see some technology from Silicon Valley reverse this progress.



Detrumpify2 — some cleanup

Even though my short brush with Internet fame appears to be over (Detrumpify has about 920 users today, up only 30 from yesterday), pride required that I update the extension, because it was a bit too quick-n-dirty for my taste. Everything in it was hard-coded, which meant that every change I made to add new sites or insults would require users to approve an update. Hard-coding stuff in your programs is a big no-no, starting from CS 101 on.

So, I have a rewritten version available, and intrepid fans can help me out by testing it. You will not find it by searching the Chrome Web Store; instead, get it directly from here. It is substantially more complicated under the hood than before, so expect bugs. (Github here, in the “v2” folder.)

An important difference between this and the classic version is that there is an options page. It looks like this:

The main thing it lets you do is specify a URL from which a configuration file will periodically be downloaded. The config file contains the actual insults as well as some other parameters. I will host and maintain several configuration files at ToolsOfOurTools, but anyone who wants to make one (for example, to mock a different presidential candidate) will be able to do so and just point to it.

If you want to make changes locally, you can also load a file, click on the edit button, make changes, and then click on the lock button. From then on the extension will use your custom changes.

The format of the config file is simple. Here’s an example with most of the names removed:
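Here is a hypothetical sketch of such a file, reconstructed from the field descriptions below; every key name and value shown is illustrative, not the extension’s exact format:

```json
{
  "schema": "detrumpify_config_v2",
  "refresh_age": 3600000,
  "whitelist": ["nytimes.com", "washingtonpost.com"],
  "run_info": { "delay_ms": 1000, "runs": 5, "backoff": 1.8 },
  "actions": {
    "trump": {
      "find_regex": "(Donald\\s+(J\\.\\s+)?)?Trump",
      "randomize_mode": "hourly",
      "bracket": ["[", "]"],
      "monikers": ["insult-goes-here", "another-insult"]
    }
  }
}
```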


  • actions is a container that holds one or more named sets of search-and-replace instructions. This file has just one, for replacing Trump variations, but one can make files that replace many different things according to different rules.
  • find_regex, inside the trump action, finds a few variations: Trump, Donald Trump, Donald J. Trump.
  • monikers lists the alternatives.
  • randomize_mode can be always, hourly, or daily, and tells how often the insult changes. In always, it changes with each appearance in the document.
  • refresh_age is how long to wait (in milliseconds) before hitting the server for an update.
  • run_info tells how long to wait before running the plugin and how many times to run. This is for sites that do not elaborate their content until after some javascript runs (i.e., every site these days, apparently). Here, it runs after 1000ms, then runs four more times, each time waiting 1.8x as long as the last time.
  • bracket can be set to a two-element array of text to be placed before and after any replacement.
  • schema identifies the format of this file and should look just like that.
  • whitelist is a list of sites on which the extension is enabled. Et voila.

Let me know if you experience issues / bugs! The code that runs this is quite a bit more complex than the version you’re running now. In particular, I’m still struggling a bit with certain websites that turn on “content security policies” that get in the way of the config fetch. Sometimes it works, sometimes it doesn’t.


Simulate this, my dark overlords!

Apparently, both Elon Musk and Neil deGrasse Tyson believe that we are probably living in a more advanced civilization’s computer simulation.

Now, I’m no philosopher, so I can’t weigh in on whether I really exist, but it does occur to me that if this is a computer simulation, it sucks. First, we have cruelty, famine, war, natural disasters, disease. On top of that, we do not have flying cars, or flying people, or teleportation for that matter.

Seriously, whoever is running this advanced civilization simulation must be into some really dark shit.

Mental Models

I think we all make mental models constantly — simplifications of the world that help us understand it. And for services on the Internet, our mental models are probably very close — logically, if not in implementation — to the reality of what those services do. If not, how could we use them?

I like to imagine how the service works, too. I don’t know why I do this, but it makes me feel better about the universe. For a lot of things, to a first approximation, the what and the how are sufficiently close that they are essentially the same model. And sometimes a model of how it works eludes me entirely.

For example, my model of email is that an email address is the combination of a username and a system name. My mail server looks up the destination mail server and routes my blob of text over IP to that server, which then delivers it to the appropriate user’s “mailbox,” which is a file. That is indeed how it works, more or less, with lots of elision of what I’m sure are important details.

I’ve also begun sorting my mental models of Internet companies and services into a taxonomy that has subjective meaning for me, based on how meritorious and/or interesting they are. Here’s a rough draft:

The What | The How | Example | Dave’s judgment
---------|---------|---------|----------------
obvious | as in real life | email | Very glad these exist, but nobody deserves a special pat on the back for them. I’ll add most matchmaking services, too.
obvious | non-obvious, but simple and/or elegant | Google Search (PageRank) | High regard. Basically, this sort of thing has been the backbone of Internet value to date.
not obvious / inscrutable | nobody cares | Google Buzz | Lack of popularity kills these. Not much to talk about.
obvious | obvious | Facebook | Society rewards these, but technically they are super-boring to me.
obvious | non-obvious and complex | natural language, machine translation, face recognition | Potentially very exciting, but not really very pervasive or economically important just yet. Potentially creepy, and may represent the end of humanity’s reign on earth.


Google search is famously straightforward. You’re searching for some “thing,” and Google is combing a large index for that “thing.” Back in the Altavista era, that “thing” was just keywords on a page. Google’s first innovation was to use the site’s own popularity (as measured by who links to it and the rankings of those links) to help sort the results. I wonder how many people had some kind of mental model of how Google worked that was different from that of Altavista — aside from the simple fact that it worked much “better.” The thing about Google’s “PageRank” was that it was quite simple, and quite brilliant, because, honestly, none of the rest of us thought of it. So kudos to them.
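As a toy illustration of the idea (and certainly not Google’s actual implementation), the “popularity weighted by the popularity of your linkers” notion can be computed by simple power iteration over a link graph; the graph and damping value here are made up:

```python
links = {            # page -> pages it links to (illustrative graph)
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
d = 0.85             # conventional damping factor
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(50):  # iterate until the ranks settle
    new = {p: (1 - d) / len(pages) for p in pages}
    for p, outs in links.items():
        for q in outs:
            # each page passes its rank, evenly, to the pages it links to
            new[q] += d * rank[p] / len(outs)
    rank = new
# pages that are linked to by high-rank pages end up with high rank themselves
```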

There have been some Internet services I’ve tried over the years that I could not quite understand. I’m not talking about how they work under the hood, but how they appear to work from my perspective. Remember Google “Buzz?” I never quite understood what that was supposed to be doing.

Facebook, in its essence, is pretty simple, too, and I think we all formed something of a working mental model of what it does. Here’s mine, written up as SQL code. First, the system is composed of a few tables:

A table of users, a table representing friendships, and a table of posts. The tables are populated by straightforward UI actions like “add friend” or “write post.”
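In sketch form, with hypothetical names (this is my mental model, certainly not Facebook’s real schema):

```sql
CREATE TABLE users       (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER);
CREATE TABLE posts       (post_id INTEGER PRIMARY KEY, user_id INTEGER,
                          created_at TIMESTAMP, body TEXT);
```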

Generating a user’s wall when they log in is as simple as:
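Using the hypothetical tables just described, something like:

```sql
SELECT p.*
FROM posts p
JOIN friendships f ON p.user_id = f.friend_id
WHERE f.user_id = :viewer_id     -- the logged-in user
ORDER BY p.created_at DESC;
```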

You could build an FB clone with that code alone. It is eye-rollingly boring and unclever.

Such an implementation would die once you got past a few thousand users or posts, but with a little work and modern databases that automatically shard and replicate, etc., you could probably handle a lot more. Helping FB is the fact that they make no promises about correctness: a post you make may or may not ever appear on your friend’s wall, etc.

I think the ridiculous simplicity of this is why I have never taken Facebook very seriously. Obviously it’s a gajillion dollar idea, but technically, there’s nothing remotely creative or interesting there. Getting it all to work for a billion users making a billion posts a day is, I’m sure, a huge technical challenge, but not requiring inspiration. (As an aside, today’s FB wall is not so simple. It uses some algorithm to rank and highlight posts. What’s the algorithm and why and when will my friends see my post? Who the hell knows?! Does this bother anybody else but me?)

The last category is things that are reasonably obviously useful to lots of people, but whose workings are pretty opaque, even if you think about it for a while. That is, things for which we can form a mental model of what they are, but mere mortals do not understand how they work. Machine translation falls into that category, and maybe all the new machine learning and future AI apps do, too.

It’s perhaps “the” space to watch, but if you ask me the obvious what / simple how isn’t nearly exhausted yet — as long as you can come up with an interesting “why,” that is.

next, they’ll discover fire

OK, now I’m feeling ornery. Google just announced a new chip of theirs that is tailored for machine learning. It’s called the Tensor Processing Unit, and it is designed to speed up a software package called TensorFlow.

Okay, that’s pretty cool. But then Sundar Pichai has to go ahead and say:

This is roughly equivalent to fast-forwarding technology about seven years into the future (three generations of Moore’s Law).

No, no, no, no, no.

First of all, Moore’s law is not about performance. It is a statement of transistor density scaling, and this chip isn’t going to move that needle at all — unless Google has invented its own semiconductor technology.

Second, people have been developing special-purpose chips that solve a problem way faster than a general-purpose microprocessor can since the beginning of chip-making. It used to be that pretty much anything computationally interesting could not be done in a processor: graphics, audio, modems, you name it, all used to be done in dedicated hardware. Such chips are called application-specific integrated circuits (ASICs), and, in fact, the design and manufacture of ASICs is more or less what gave Silicon Valley its name.

So, though I’m happy that Google has a cool new chip (and that they finally found an application that they believe merits making a custom chip), I wish the tech press weren’t so gullible as to print any dumb thing that a Google rep says.


I’ll take one glimmer of satisfaction from this, though. And that is that someone found an important application that warrants novel chip design effort. Maybe there’s life for “Silicon” Valley yet.

notes on self-driving cars

A relaxing trip to work (courtesy wikimedia)

Short post here. I notice people are writing about self-driving cars a lot. There is a lot of excitement out there about our driverless future.

I have a few thoughts, to expand on another day:


Apparently a lot of economic work on driving suggests that a major externality of driving is congestion. Simply, your being on the road slows down other people’s trips and causes them to burn more gas. It’s an externality because it is a cost of driving that you cause but don’t pay.

Now, people are projecting that a future society of driverless cars will make driving cheaper by 1) eliminating drivers (duh) and 2) getting more utilization out of cars. That is, mostly, our cars sit in parking spaces, but in a driverless world, people might not own cars so much anymore, but rent them by the trip. Such cars would be much better utilized and, in theory, cheaper on a per-trip basis.

So, if I understand my micro econ at all, people will use cars more because they’ll be cheaper. All else equal, that should increase congestion, since in our model, congestion is an externality. Et voila, a bad outcome.


But, you say, driverless cars will operate more efficiently, and make more efficient use of the roadways, and so they generate less congestion than stupid, lazy, dangerous, unpredictable human drivers. This may be so, but I will caution with a couple of ideas. First, how much less congestion will a driverless trip cause than a user-operated one? 75% as much? Half? Is this enough to offset the effect mentioned above? Maybe.

But there is something else that concerns me: the difference between soft- and hard-limits.

Congestion, as we experience it today, seems to come on gradually as traffic approaches certain limits. You’ve got cars on the freeway; you add cars; things get slower. Eventually, things somewhat suddenly get a lot slower, but even then only at certain times of the day, in certain weather, etc.

Now enter driverless cars that utilize capacity much more effectively. Huzzah! More cars on the road getting where they want, faster. What worries me is that what is really happening is not that the limits are raised, but that we are operating the system much closer to the existing, real limits. Furthermore, now that automation is sucking out all the marrow from the road bone — the limits become hard walls, not gradual at all.

So, imagine traffic is flowing smoothly until a malfunction causes an accident, or a tire blows out, or there is a foreign object in the road — and suddenly the driverless cars sense the problem, resulting in a full-scale insta-jam, perhaps of epic proportions, in theory, locking up an entire city nearly instantaneously. Everyone is safely stopped, but stuck.

And even scarier than that is the notion that the programmers did not anticipate such a problem, and the car software is not smart enough to untangle it. Human drivers, for example, might, in an unusual situation, use shoulders or make illegal u-turns to extricate themselves from a serious problem. That would be unacceptable in a normal situation, but perhaps the right move in an abnormal one. Have you ever had a cop at the scene of an accident wave at you to do something weird? I have.

Will self-driving cars be able to improvise? This is an AI problem well beyond that of “merely” driving.


Speaking of capacity and efficiency, I’ll be very interested to see how we make trade-offs of these versus safety. I do not think technology will make these trade-offs go away at all. Moving faster, closer will still be more dangerous than going slowly far apart. And these are the essential ingredients in better road capacity utilization.

What will be different will be how and when such decisions are made. In humans, the decision is made implicitly by the driver moment by moment. It depends on training, disposition, weather, light, fatigue, even mood. You might start out a trip cautiously and drive more recklessly later, like when you’re trying to eat fast food in your car. The track record for humans is rather poor, so I suspect  that driverless cars will do much better overall.

But someone will still have to decide what the right balance of safety and efficiency is, and it might be taken out of the hands of passengers. This could go different ways. In a liability-driven culture, we may end up with a system that is safer but maybe less efficient than what we have now (call it “little old lady mode”), or we could end up with decisions by others forcing us to take on more risk than we’d prefer if we want to use the road system.


I recently read in the June IEEE Spectrum (no link, print version only) that some people are suggesting that driverless cars will be a good justification for dismantling public transit. Wow, that is a bad idea of epic proportions. If, in the first half of the 21st century, the world not only continues to embrace car culture, but doubles down on it to the exclusion of other means of mobility, I’m going to be ill.


*   *   *


That was a bit more than I had intended to write. Anyway, one other thought is that driverless cars may be farther off than we think. In a recent talk, Chris Urmson, the director of the Google car project, explains that the driverless cars of our imaginations — the fully autonomous, all-conditions, all-mission cars — may be 30 years off or more. What will come sooner is a succession of technologies that reduce driver workload.

So, I suspect we’ll have plenty of time to think about this. Moreover, the nearly 7% of our workforce that works in transportation will have some time to plan.


minor annoyances: debug-printing enums

This is going to be another programming post.

One thing that always annoys me when working on a project in a language like C++ is that when I’m debugging, I’d like to print messages with meaningful names for the enumerated types I’m using.

The classic way to do it is something like this:

Note that I have perhaps too-cleverly left out the break statements because each case returns.

But this has problems:

  • repetitive typing
  • maintenance. Whenever you change the enum, you have to remember to change the debug function.

It just feels super-clunky.

I made a little class in C++ that I like a bit better because you only have to write the wrapper code once even to use it on a bunch of different enums. Also you can hide the code part in another file and never see or think about it again.
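A sketch of the idea; the class and enum names here are illustrative, not the exact original code:

```cpp
#include <initializer_list>
#include <map>
#include <string>
#include <utility>

// Generic printer: write this once, reuse it for any enum.
template <typename E>
class EnumNames {
public:
    EnumNames(std::initializer_list<std::pair<const E, std::string>> init)
        : names_(init) {}
    std::string operator()(E e) const {
        auto it = names_.find(e);
        return it == names_.end() ? "<unknown>" : it->second;
    }
private:
    const std::map<E, std::string> names_;
};

enum class Fruit { Apple, Pear, Plum };

// One static const table per enum; no switch statement to maintain.
static const EnumNames<Fruit> fruit_names{
    {Fruit::Apple, "Apple"}, {Fruit::Pear, "Pear"}, {Fruit::Plum, "Plum"}};
```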

C++11 lets you initialize those maps pretty nicely, and they are static const, so you don’t have to worry about clobbering them or having multiple copies. But overall, it still blows because you have to type each identifier no fewer than three times: once in the enum definition and twice in the printer thing.


I Googled a bit and learned that Boost provides some seriously abusive preprocessor macros, including ones that can loop. I don’t know what kind of dark preprocessor magic Boost uses, but it works: a short template plus a couple of macros does the job.

And here’s how you use it:
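Here is a self-contained sketch of the same list-the-enumerators-once idea; it swaps the Boost.Preprocessor machinery for a plain X-macro, and all names are illustrative:

```cpp
#include <map>
#include <string>

// List the enumerators exactly once, here:
#define ANIMAL_LIST(X) X(Cat) X(Dog) X(Emu)

// Expand the list once to define the enum...
#define AS_ENUMERATOR(name) name,
enum class Animal { ANIMAL_LIST(AS_ENUMERATOR) };
#undef AS_ENUMERATOR

// ...and once more to build the name table.
#define AS_MAP_ENTRY(name) {Animal::name, #name},
inline std::string to_string(Animal a) {
    static const std::map<Animal, std::string> names = {
        ANIMAL_LIST(AS_MAP_ENTRY)
    };
    auto it = names.find(a);
    return it == names.end() ? "<unknown>" : it->second;
}
#undef AS_MAP_ENTRY

// Usage: to_string(Animal::Dog) yields "Dog", and adding an enumerator
// means touching only the ANIMAL_LIST line.
```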

Now I only have to list out the enumerators one time! Not bad. However, it obviously only works if you control the enum. If you are importing someone else’s header with the definition, it still has the maintenance problem of the other solutions.

I understand that the C++ template language is Turing-complete, so I suspect this can be done entirely with templates and no macros, but I wouldn’t have the foggiest idea how to start. Perhaps one of you does?

simple string operations in $your_favorite_language

I’ve recently been doing a small project that involves Python and Javascript code, and I keep tripping up on the differing syntax of their join() functions. (As well as semicolons, tabs, braces, of course.) join() is a simple function that joins an array of strings into one long string, sticking a separator in between, if you want.

So, join(["this","that","other"], "_") returns "this_that_other". Pretty simple.

Perl has join() as a built-in, and it has an old-school non-object interface.
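For example, with the separator first:

```perl
my $s = join("_", "this", "that", "other");   # "this_that_other"
```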

Python is object-orienty, so it has an object interface:
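For example:

```python
parts = ["this", "that", "other"]
joined = "_".join(parts)   # the separator string does the joining
```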

What’s interesting here is that join is a member of the string class, and you call it on the separator string. So you are asking a "," to join up the things in that array. OK, fine.

Javascript does it exactly the reverse. Here, join is a member of the array class:
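For example:

```javascript
const parts = ["this", "that", "other"];
const joined = parts.join("_");   // the array does the joining
```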

I think I slightly prefer Javascript in this case, since calling member functions of the separator just “feels” weird.

I was surprised to see that C++ does not include join in its standard library, even though it has the underlying pieces: <vector>  and <string>. I made up a little one like this:
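A minimal sketch of such a join (the signature and naming are my guesses), taking the container as the first argument, Javascript-style:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Joins a vector of strings with a separator; pays a small comparison per
// element to decide whether to prepend the separator.
std::string join(const std::vector<std::string> &pieces,
                 const std::string &sep) {
    std::string result;
    for (std::size_t i = 0; i < pieces.size(); ++i) {
        if (i > 0) result += sep;
        result += pieces[i];
    }
    return result;
}
```

Calling join(words, "_") on a vector of strings then reads much like the scripting-language versions.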

You can see I took the Javascript approach. By the way, this is how they do it in Boost. Boost avoids the extra compare for the separator each time by handling the first list item separately.

Using it is about as easy as in the scripting languages.

I can live with that, though the copy on return is just a C++ism that will always bug me.

Finally, I thought about what this might look like back in ye olden times, when we scraped our fingers on stone keyboards, and I came up with this:
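A hedged reconstruction of that old-school C version; the name and exact signature are guesses, but it does the described double duty:

```c
#include <stddef.h>
#include <string.h>

/* Join n strings with sep into dst. If dst is NULL, nothing is copied and
 * the return value is the buffer size needed, not counting the terminating
 * null. Call once with NULL to size the buffer, then again to do the copy. */
size_t join(char *dst, const char *const strs[], size_t n, const char *sep) {
    size_t total = 0;
    size_t seplen = strlen(sep);
    char *p = dst;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(strs[i]);
        if (i > 0) {
            if (p) { memcpy(p, sep, seplen); p += seplen; }
            total += seplen;
        }
        if (p) { memcpy(p, strs[i], len); p += len; }
        total += len;
    }
    if (p) *p = '\0';
    return total;
}
```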

Now that’s no beauty queen. The function does double duty, to make it a bit easier to allocate the resulting string: you call it first without a target pointer and it returns the size you need (not including the terminating null); then you call it again with the target pointer for the actual copy.

Of course, if any of the strings in that array are not terminated, or if you don’t pass in the right length, you’re going to get hurt.

Anyway, I must have been bored. I needed a temporary distraction.


progress, headphones edition

It looks like Intel is joining the bandwagon of people who want to take away the analog 3.5mm “headphone jack” and replace it with USB-C. This is on the heels of Apple announcing that this is definitely happening, whether you like it or not.

obsolete technology

There are a lot of good rants out there already, so I don’t think I can really add much, but I just want to say that this does sadden me. It’s not about analog v. digital per se, but about simple v. complex and open v. closed.

The headphone jack is a model of simplicity. Two signals and a ground. You can hack it. You can use it for other purposes besides audio. You can get a “guzinta” or “guzoutta” adapter to match pretty much anything in the universe old or new — and if you can’t get it, you can make it. Also, it sounds Just Fine.

Now, I’m not just being anti-change. Before the 1/8″ stereo jack, we had the 1/4″ stereo jack. And before that we had mono jacks, and before that, strings and cans. And all those changes have been good. And maybe this change will be good, too.

But this transition will cost us something. For one, it just won’t work as well. USB is actually a fiendishly complex specification, and you can bet there will be bugs. Prepare for hangs, hiccups, and snits. And of course, none of the traditional problems with headphones are eliminated: loose connectors, dodgy wires, etc. On top of this, there will be, sure as the sun rises, digital rights management, and multiple attempts to control how and when you listen to music. Prepare to find headphones that only work with certain brands of players and vice versa. (Apple already requires all manufacturers of devices that want to interface digitally with the iThings to buy and use a special encryption chip from Apple — under license, natch.)

And for nerd/makers, who just want to connect their hoozyjigger to their whatsamaducky, well, it could be the end of the line entirely. For the time being, while everyone has analog headphones, there will be people selling USB-C audio converter thingies — a clunky, additional lump between devices. But as “all digital” headphones become more ubiquitous, those adapters will likely disappear, too.

Of course, we’ll always be able to crack open a pair of cheap headphones and steal the signal from the speakers themselves … until the neural interfaces arrive, that is.

EDIT: 4/28 8:41pm: Actually, the USB-C spec does allow analog on some of the pins as a “side-band” signal. Not sure how much uptake we’ll see of that particular mode.


Inferencing from Big Data

Last week, I came across this interesting piece on the perils of using “big data” to draw conclusions about the world. It analyzes, among other things, the situation of Google Flu Trends, the much heralded public health surveillance system that turned out to be mostly a predictor of winter (and has since been withdrawn).

It seems to me that big data is a fun place to explore for patterns, and that’s all good, clean fun — but the moment when you think you have discovered something new is when the actual work really starts. I think “data scientists” are probably on top of this problem, but are most of the people going on about big data actually data scientists?

I really do not have all that much to add to the article, but I will amateurishly opine a bit about statistical inferencing generally:


I’ve taken several statistics courses over my life (high school, undergrad, grad). In each one, I thought I had a solid grasp of the material (and got an “A”), until I took the next one, where I realized that my previous understanding was embarrassingly incorrect. I see no particular reason to think this pattern would ever stop if I took ever more stats classes. The point is, stats is hard. Big data does not make stats easier.


If you throw a bunch of variables at a model, it will find some that look like good predictors. This is true even if the variables are totally and utterly random and unrelated to the dependent variable (see the try-it-at-home experiment below). Poking around in big data, unfortunately, only encourages people to do this and perhaps draw conclusions when they should not. So, if you are going to use big data, have a plan in advance. Know what effect size would be “interesting” and disregard things well under that threshold, even if they appear to be “statistically significant.” Determine in advance how much power (and thus how many observations) you need to make your case, and sample from your ginormous set down to a more appropriate size.


Big data sets seem like they were mostly created for other purposes than statistical inferencing. That makes them a form of convenience data. They might be big, but are the variables present really what you’re after? And was this data collected scientifically, in a manner designed to minimize bias? I’m told that collecting a quality data set takes effort (and money). If that’s so, it seems likely that the quality of your average big data set is low.

A lot of big data comes from log files from web services. That’s a lame place to learn about anything beyond how the people who use those web services think while they’re using them; it tells you little about how anyone thinks while doing anything else. Just sayin’.


Well, anyway, I’m perhaps out of my depth here, but I’ll leave you with this quick experiment, in R:
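A version of it, reconstructed to match the description that follows (the seed and variable naming are mine):

```r
set.seed(42)                       # arbitrary seed; any run shows the effect
n    <- 10000
data <- as.data.frame(matrix(runif(n * 201), nrow = n))

# V1 is the dependent variable; V2..V201 are the 200 random "predictors"
fit <- lm(V1 ~ ., data = data)
summary(fit)                       # watch for spurious significance stars
```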

It generates 10,000 observations of 201 variables, each drawn from a uniform random distribution on [0,1]. Then it runs an OLS model using one variable as the dependent and the remaining 200 as independents. R is even nice enough to put friendly little asterisks next to variables that have p < 0.05.

When I run it, I get 10 variables that appear to be better than “statistically significant at the 5% level” — even though the data is nothing but pure noise. This is about what one should expect from random noise: 5% of 200 variables is 10.

Of course, the r² of the resulting model is ridiculously low (that is, the 200 variables together have low explanatory power). Moreover, the effect size of the variables is small. All as it should be — but you do have to know to look. And in a more subtle case, you can imagine what happens if you build a model with a bunch of variables that do have explanatory power, and a bunch more that are crap. Then you will see a nice r² overall, but some of your crap will still pop up.