next, they’ll discover fire

OK, now I’m feeling ornery. Google just announced a new chip of theirs that is tailored for machine learning. It’s called the Tensor Processing Unit, and it is designed to speed up a software package called TensorFlow.

Okay, that’s pretty cool. But then Sundar Pichai has to go ahead and say:

This is roughly equivalent to fast-forwarding technology about seven years into the future (three generations of Moore’s Law).

No, no, no, no, no.

First of all, Moore’s Law is not about performance. It is a statement of transistor density scaling, and this chip isn’t going to move that needle at all — unless Google has invented their own semiconductor technology.

Second, people have been developing special-purpose chips that solve a problem way faster than a general-purpose microprocessor can since the beginning of chip-making. It used to be that pretty much anything computationally interesting could not be done on a general-purpose processor at all. Graphics, audio, modems, you name it: all used to be done in dedicated hardware. Such chips are called application-specific integrated circuits (ASICs) and, in fact, the design and manufacture of ASICs is more or less what gave Silicon Valley its name.

So, though I’m happy that Google has a cool new chip (and that they finally found an application that they believe merits making a custom chip), I wish the tech press weren’t so gullible as to print any dumb thing that a Google rep says.

Gah.

I’ll take one glimmer of satisfaction from this, though. And that is that someone found an important application that warrants novel chip design effort. Maybe there’s life for “Silicon” Valley yet.

notes on self-driving cars

A relaxing trip to work (courtesy wikimedia)

Short post here. I notice people are writing about self-driving cars a lot. There is a lot of excitement out there about our driverless future.

I have a few thoughts, to expand on at a later date:

I.

Apparently a lot of economic work on driving suggests that a major externality of driving is congestion. Simply, your being on the road slows down other people’s trips and causes them to burn more gas. It’s an externality because it is a cost of driving that you cause but don’t pay.

Now, people are projecting that a future society of driverless cars will make driving cheaper by 1) eliminating drivers (duh) and 2) getting more utilization out of cars. That is, mostly, our cars sit in parking spaces, but in a driverless world, people might not own cars so much anymore, but rent them by the trip. Such cars would be much better utilized and, in theory, cheaper on a per-trip basis.

So, if I understand my micro econ at all, people will use cars more because they’ll be cheaper. All else equal, that should increase congestion, since in our model, congestion is an externality. Et voila, a bad outcome.

II.

But, you say, driverless cars will operate more efficiently, and make more efficient use of the roadways, and so they generate less congestion than stupid, lazy, dangerous, unpredictable human drivers. This may be so, but I will caution with a couple of ideas. First, how much less congestion will a driverless trip cause than a user-operated one? 75% as much? Half? Is this enough to offset the effect mentioned above? Maybe.

But there is something else that concerns me: the difference between soft- and hard-limits.

Congestion, as we experience it today, seems to come on gradually as traffic approaches certain limits. You’ve got cars on the freeway, you add cars, things get slower. Eventually, things somewhat suddenly get a lot slower, but even then only at certain times of the day, in certain weather, etc.

Now enter driverless cars that utilize capacity much more effectively. Huzzah! More cars on the road getting where they want, faster. What worries me is that what is really happening is not that the limits are raised, but that we are operating the system much closer to the existing, real limits. Furthermore, now that automation is sucking all the marrow from the road bone — the limits become hard walls, not gradual at all.

So, imagine traffic is flowing smoothly until a malfunction causes an accident, or a tire blows out, or there is a foreign object in the road — and suddenly the driverless cars sense the problem, resulting in a full-scale insta-jam, perhaps of epic proportions, in theory, locking up an entire city nearly instantaneously. Everyone is safely stopped, but stuck.

And even scarier than that is the notion that the programmers did not anticipate such a problem, and the car software is not smart enough to untangle it. Human drivers, for example, might, in an unusual situation, use shoulders or make illegal u-turns in order to extricate themselves from a serious problem. That’d be unacceptable in a normal situation, but perhaps the right move in an abnormal one. Have you ever had a cop at the scene of an accident wave at you to do something weird? I have.

Will self-driving cars be able to improvise? This is an AI problem well beyond that of “merely” driving.

III.

Speaking of capacity and efficiency, I’ll be very interested to see how we make trade-offs of these versus safety. I do not think technology will make these trade-offs go away at all. Moving faster, closer will still be more dangerous than going slowly far apart. And these are the essential ingredients in better road capacity utilization.

What will be different will be how and when such decisions are made. In humans, the decision is made implicitly by the driver moment by moment. It depends on training, disposition, weather, light, fatigue, even mood. You might start out a trip cautiously and drive more recklessly later, like when you’re trying to eat fast food in your car. The track record for humans is rather poor, so I suspect that driverless cars will do much better overall.

But someone will still have to decide what is the right balance of safety and efficiency, and it might be taken out of the hands of passengers. This could go different ways. In a liability-driven culture we may end up with a system that is safer but maybe less efficient than what we have now (call it “little old lady mode”), or we could end up with decisions by others forcing us to take on more risk than we’d prefer if we want to use the road system.

IV.

I recently read in the June IEEE Spectrum (no link, print version only) that some people are suggesting that driverless cars will be a good justification for the dismantlement of public transit. Wow, that is a bad idea of epic proportions. If, in the first half of the 21st century, the world not only continues to embrace car culture, but doubles down on it to the exclusion of other means of mobility, I’m going to be ill.

 

*   *   *

 

That was a bit more than I had intended to write. Anyway, one other thought is that driverless cars may be farther off than we thought. In a recent talk, Chris Urmson, the director of the Google car project, explains that the driverless cars of our imaginations — the fully autonomous, all-conditions, all-mission cars — may be 30 years off or more. What will come sooner is a succession of technologies that will reduce driver workload.

So, I suspect we’ll have plenty of time to think about this. Moreover, the nearly 7% of our workforce that works in transportation will have some time to plan.

 

minor annoyances: debug-printing enums

This is going to be another programming post.

One thing that always annoys me when working on a project in a language like C++ is that when I’m debugging, I’d like to print messages with meaningful names for the enumerated types I’m using.

The classic way to do it is something like this:
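Something like the following sketch (the enum and function names here are placeholders of my own choosing, not the original code):

    enum class Color { Red, Green, Blue };

    // Classic switch-based debug printer: every case returns, so no breaks needed.
    const char* color_name(Color c)
    {
        switch (c) {
        case Color::Red:   return "Red";
        case Color::Green: return "Green";
        case Color::Blue:  return "Blue";
        }
        return "<unknown>";  // quiets compilers that warn about falling off the end
    }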

Note that I have perhaps too-cleverly left out the break statements because each case returns.

But this has problems:

  • repetitive typing
  • maintenance. Whenever you change the enum, you have to remember to change the debug function.

It just feels super-clunky.

I made a little class in C++ that I like a bit better because you only have to write the wrapper code once even to use it on a bunch of different enums. Also you can hide the code part in another file and never see or think about it again.
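The shape of the thing is roughly this sketch (placeholder names again, reusing the Color enum from above; the original class surely differs in detail):

    #include <map>
    #include <string>
    #include <utility>

    // Generic wrapper, written once and reused for any enum type E.
    template <typename E>
    class EnumPrinter {
    public:
        explicit EnumPrinter(std::map<E, std::string> names) : names_(std::move(names)) {}
        const std::string& operator()(E v) const {
            static const std::string unknown = "<unknown>";
            auto it = names_.find(v);
            return it != names_.end() ? it->second : unknown;
        }
    private:
        std::map<E, std::string> names_;
    };

    // One name table per enum; C++11 brace initialization, static const.
    static const EnumPrinter<Color> print_color({
        { Color::Red,   "Red"   },
        { Color::Green, "Green" },
        { Color::Blue,  "Blue"  },
    });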

C++11 lets you initialize those maps pretty nicely, and they are static const, so you don’t have to worry about clobbering them or having multiple copies. But overall, it still blows because you have to type those identifiers no fewer than three times: once in the definition and twice in the printer thing.

Unsatisfactory.

I Googled a bit and learned that Boost provides some seriously abusive preprocessor macros, including one that can loop. I don’t know what kind of dark preprocessor magic Boost uses, but it works. Here is the template and some macros:
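The gist is roughly the following sketch; the macro names (ENUM_NAME_ENTRY, DEFINE_PRINTABLE_ENUM) are my own placeholders rather than the originals, and it leans on Boost.Preprocessor’s sequence utilities:

    #include <boost/preprocessor/seq/enum.hpp>
    #include <boost/preprocessor/seq/for_each.hpp>
    #include <boost/preprocessor/stringize.hpp>
    #include <map>
    #include <string>

    // Expanded once per enumerator by BOOST_PP_SEQ_FOR_EACH below;
    // produces: { EnumType::Name, "Name" },
    #define ENUM_NAME_ENTRY(r, EnumType, Name) { EnumType::Name, BOOST_PP_STRINGIZE(Name) },

    // Defines the enum and a to_string() for it from a single list of enumerators.
    #define DEFINE_PRINTABLE_ENUM(EnumType, SEQ)                        \
        enum class EnumType { BOOST_PP_SEQ_ENUM(SEQ) };                 \
        inline std::string to_string(EnumType v)                        \
        {                                                               \
            static const std::map<EnumType, std::string> names = {      \
                BOOST_PP_SEQ_FOR_EACH(ENUM_NAME_ENTRY, EnumType, SEQ)   \
            };                                                          \
            auto it = names.find(v);                                    \
            return it != names.end() ? it->second : "<unknown>";        \
        }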

And here’s how you use it:
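Roughly (assuming the hypothetical DEFINE_PRINTABLE_ENUM macro sketched above is in scope):

    #include <iostream>

    // One list of enumerators, written as a Boost.Preprocessor "sequence".
    DEFINE_PRINTABLE_ENUM(Fruit, (Apple)(Banana)(Cherry))

    int main()
    {
        std::cout << to_string(Fruit::Banana) << "\n";  // prints "Banana"
    }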

Now I only have to list out the enumerators one time! Not bad. However, it obviously only works if you control the enum. If you are importing someone else’s header with the definition, it still has the maintenance problem of the other solutions.

I understand that the C++ template language is Turing-complete, so I suspect this can be done entirely with templates and no macros, but I wouldn’t have the foggiest idea how to start. Perhaps one of you does?

simple string operations in $your_favorite_language

I’ve recently been doing a small project that involves Python and Javascript code, and I keep tripping up on the differing syntax of their join() functions. (As well as semicolons, tabs, braces, of course.) join() is a simple function that joins an array of strings into one long string, sticking a separator in between, if you want.

So, join(["this","that","other"], "_") returns "this_that_other". Pretty simple.

Perl has join() as a built-in, and it has an old-school, non-object interface.
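A sketch of the Perl call:

    # Perl: join(SEPARATOR, LIST) is a plain built-in function.
    my $joined = join("_", "this", "that", "other");   # "this_that_other"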

Python is object-orienty, so it has an object interface:
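A sketch of the Python call:

    # Python: join() is a method on the separator string.
    "_".join(["this", "that", "other"])   # -> 'this_that_other'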

What’s interesting here is that join is a member of the string class, and you call it on the separator string. So you are asking a "," to join up the things in that array. OK, fine.

Javascript does it exactly the reverse. Here, join is a member of the array class:
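A sketch of the JavaScript call:

    // JavaScript: join() is a method on the array, taking the separator.
    ["this", "that", "other"].join("_");   // "this_that_other"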

I think I slightly prefer Javascript in this case, since calling member functions of the separator just “feels” weird.

I was surprised to see that C++ does not include join in its standard library, even though it has the underlying pieces: <vector> and <string>. I made up a little one like this:
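A sketch of such a helper (not the original code, but the same shape: the string array first, the separator second, with a check inside the loop to decide whether to add the separator):

    #include <cstddef>
    #include <string>
    #include <vector>

    std::string join(const std::vector<std::string>& parts, const std::string& sep)
    {
        std::string result;
        for (std::size_t i = 0; i < parts.size(); ++i) {
            if (i != 0) {          // the per-item separator check mentioned below
                result += sep;
            }
            result += parts[i];
        }
        return result;
    }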

You can see I took the Javascript approach. By the way, this is how they do it in Boost. Boost avoids the extra compare for the separator each time by handling the first list item separately.

Using it is about as easy as the scripting languages:
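For instance (assuming the join() sketched above is in the same file):

    #include <iostream>
    #include <string>
    #include <vector>

    int main()
    {
        std::vector<std::string> parts = {"this", "that", "other"};
        std::cout << join(parts, "_") << "\n";   // prints "this_that_other"
    }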

I can live with that, though the copy on return is just a C++ism that will always bug me.

Finally, I thought about what this might look like back in ye olden times, when we scraped our fingers on stone keyboards, and I came up with this:
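A guess at the general shape, matching the two-pass calling convention described below; the function name and signature are mine, not the original’s:

    #include <stddef.h>
    #include <string.h>

    /* Joins n C strings with sep into dst.  If dst is NULL, nothing is copied and
       the return value is the length needed (excluding the terminating '\0'). */
    size_t join_strings(char *dst, const char **strs, size_t n, const char *sep)
    {
        size_t len = 0;
        size_t seplen = strlen(sep);
        for (size_t i = 0; i < n; i++) {
            size_t l = strlen(strs[i]);
            if (dst)
                memcpy(dst + len, strs[i], l);
            len += l;
            if (i + 1 < n) {
                if (dst)
                    memcpy(dst + len, sep, seplen);
                len += seplen;
            }
        }
        if (dst)
            dst[len] = '\0';
        return len;
    }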

Now that’s no beauty queen. The function does double duty to make it a bit easier to allocate for the resulting string. You call it first without a target pointer and it will return the size you need (not including the terminating null). Then you call it again with the target pointer for the actual copy.

Of course, if any of the strings in that array are not terminated, or if you don’t pass in the right length, you’re going to get hurt.

Anyway, I must have been bored. I needed a temporary distraction.

 

The answer is always the same, regardless of question. (Civil Aviation Edition)

The Wall Street Journal had an editorial last week suggesting that the US air traffic control system needs to be privatized.

It’s not a new debate, and though I will get into some specifics of the discussion below, what really resonated for me is how religious and ideological the belief is that corporations just do everything better. It’s not like the WSJ made any attempt whatsoever to list (and even dismiss) counter-arguments to ATC privatization. It’s almost as if the notion that there could be some justification for a publicly funded and run ATC has just never occurred to them.

It reminded me of a similar discussion, in a post in an energy blog I respect, lamenting the “dysfunction” in California’s energy politics, particularly from the CPUC.

What both pieces seemed to have in common is a definition of dysfunction that hews very close to “not the outcome that a market would have produced.” That is to say, they see the output of non-market (that is, political) processes as fundamentally inferior and inefficient, if not outright illegitimate. Of course, the outcomes from political processes can be inefficient and dysfunctional, but this is hardly a law of nature.

For my loyal reader (sadly, not a typo), none of this is news, but it still saddens me that so many potentially interesting problems (like how best to provision air traffic control services) break down on such tired ideological grounds: do you want to make policy based on one-interested-dollar per vote or one-interested-person per vote?

I want us to be much more agnostic and much more empirical in these kinds of debates. Sometimes markets get good/bad outcomes, sometimes politics does.

For example, you might never have noticed that you can’t fly Lufthansa or Ryanair from San Francisco to Chicago. That’s because there are “cabotage” laws in the US that bar foreign carriers from offering service between US cities. Those laws are blatantly anti-competitive and the flying public is definitely harmed by this. This is a political outcome I don’t particularly like due, in part, to Congress paying better attention to the airlines than to the passengers. Yet, I’m not quite ready to suggest that politics does not belong in aviation.

Or, in terms of energy regulation, it’s worth remembering that we brought politics into the equation a very long time ago because “the market” was generating pretty crappy outcomes, too. What I’m saying is that neither approach has exclusive rights to dysfunction.

A control tower
OK. Let’s get back to ATC and the WSJ piece.

In it, the WSJ makes frequent reference to Canada’s ATC organization, NavCanada, which was privatized during a budget crunch a few years back and has performed well since then. This is in contrast to an FAA that has repeatedly “failed to modernize.”

But the US is not Canada, and our air traffic situation is very different. A lot of planes fly here! Anyone who has spent any serious time looking at our capacity problems knows that the major source of delay in the US is insufficient runways and terminal airspace, not control capabilities per se. That is to say, modernizing the ATC system so that aircraft could fly more closely using GPS position information doesn’t really buy you all that much if the real crunch is access to the airport. If you are really interested, check out this comparison of US and European ATC performance, which shows that the two have very different sources of flight delays. The solution in the US is pouring more concrete in more places, not necessarily a revamped ATC. (It’s not that ATC equipment could not benefit from revamping, only that it is not the silver bullet promised.)

Here’s another interesting mental exercise: Imagine you have developed new technology to improve the throughput of an ATC facility by 30% — but the hitch is that when you deploy the technology, there will be a diminution in performance during the switchover, as human learning, inevitable hiccups, and the need to temporarily run the old and new systems in parallel take their toll. Now imagine that you want to deploy that technology at a facility that is already operating at near its theoretical maximum capability. See a problem there? It’s not an easy thing.

Another issue in the article regards something called ADS-B (Automatic Dependent Surveillance – Broadcast), a system by which aircraft broadcast their GPS-derived position. Sounds pretty good, and yet the US has taken a long time to get it going widely. (It’s not required on all aircraft until 2020.) Why? Well, one reason is that a lot of the potential cost savings from switching to ADS-B would come from the retirement of expensive, old primary radars that “paint” aircraft with radio waves and sense the reflected energy. Thing is, primary radars can see metal objects in the sky, while ADS-B receivers only see aircraft that are broadcasting their position. You may have heard how, in recent hijackings, transponders were disabled by the pilot — so, though the system is cool, it certainly cannot alone replace the existing surveillance systems. The benefits are not immediate and large, and it leaves some important problems unsolved. Add in the high cost of equipage, and it was an easy target to delay. But is that a sign of dysfunction or of good decision-making?

All of which is to say that I’m not sure a privately run organization, facing similar constraints, would make radically different decisions than has the FAA.

Funding the system is an interesting question, too. Yes, a private organization that can charge fees has a reliable revenue stream and is thus able to go to financial markets to borrow for investment. This is in contrast to the FAA, which has had a hard time funding major new projects because of constant congressional budget can-kicking. Right now the FAA is operating on an extension of its existing authorization (from 2012), and a second extension is pending, with a real reauthorization still behind that. OK, so score one for a private organization. (Unless we can make Congress function again, at least.)

But what happens to privatized ATC if there is a major slowdown in air travel? Do investments stop, or is service degraded due to cost cutting, or does the government end up lending a hand anyway? And how might an airline-fee-based ATC operate differently from one that ostensibly serves the public? Even giving privatization proponents the benefit of the doubt that a privatized ATC would be more efficient and better at cost saving, would such an organization be as good at spending more money when an opportunity comes along to make flying safer, faster, or more convenient for passengers? How about if the costs of such changes fall primarily on the airlines, through direct equipage costs and ATC fees? Or, imagine a scenario where most airlines fly large aircraft between major cities, and an upstart starts flying lots of small aircraft between small cities. Would a privatized ATC or a publicly funded ATC better resist the airlines’ anti-competitive pressures to erect barriers to newcomers?

I actually don’t know the answers. The economics of aviation are somewhat mysterious to me, as they probably are to you unless you’re an economist or operations researcher. But I’m pretty sure that Scott McCartney of the WSJ knows even less.

 

progress, headphones edition

It looks like Intel is joining the bandwagon of people that want to take away the analog 3.5mm “headphone jack” and replace it with USB-C. This is on the heels of Apple announcing that this is definitely happening, whether you like it or not.

obsolete technology

There are a lot of good rants out there already, so I don’t think I can really add much, but I just want to say that this does sadden me. It’s not about analog v. digital per se, but about simple v. complex and open v. closed.

The headphone jack is a model of simplicity. Two signals and a ground. You can hack it. You can use it for other purposes besides audio. You can get a “guzinta” or “guzoutta” adapter to match pretty much anything in the universe old or new — and if you can’t get it, you can make it. Also, it sounds Just Fine.

Now, I’m not just being anti-change. Before the 1/8″ stereo jack, we had the 1/4″ stereo jack. And before that we had mono jacks, and before that, strings and cans. And all those changes have been good. And maybe this change will be good, too.

But this transition will cost us something. For one, it just won’t work as well. USB is actually a fiendishly complex specification, and you can bet there will be bugs. Prepare for hangs, hiccups, and snits. And of course, none of the traditional problems with headphones are eliminated: loose connectors, dodgy wires, etc. On top of this, there will be, sure as the sun rises, digital rights management, and multiple attempts to control how and when you listen to music. Prepare to find headphones that only work with certain brands of players and vice versa. (Apple already requires all manufacturers of devices that want to interface digitally with the iThings to buy and use a special encryption chip from Apple — under license, natch.)

And for nerd/makers, who just want to connect their hoozyjigger to their whatsamaducky, well, it could be the end of the line entirely. For the time being, while everyone has analog headphones, there will be people selling USB-C audio converter thingies — a clunky, additional lump between devices. But as “all digital” headphones become more ubiquitous, those adapters will likely disappear, too.

Of course, we’ll always be able to crack open a pair of cheap headphones and steal the signal from the speakers themselves … until the neural interfaces arrive, that is.

EDIT: 4/28 8:41pm: Actually, the USB-C spec does allow analog on some of the pins as a “side-band” signal. Not sure how much uptake we’ll see of that particular mode.

 

Inferencing from Big Data

Last week, I came across this interesting piece on the perils of using “big data” to draw conclusions about the world. It analyzes, among other things, the situation of Google Flu Trends, the much heralded public health surveillance system that turned out to be mostly a predictor of winter (and has since been withdrawn).

It seems to me that big data is a fun place to explore for patterns, and that’s all good, clean fun — but it is the moment when you think you have discovered something new that the actual work really starts. I think “data scientists” are probably on top of this problem, but are most people going on about big data actually data scientists?

I really do not have all that much to add to the article, but I will amateurishly opine a bit about statistical inferencing generally:

1.

I’ve taken several statistics courses over my life (high school, undergrad, grad). In each one, I thought I had a solid grasp of the material (and got an “A”), until I took the next one, where I realized that my previous understanding was embarrassingly incorrect. I see no particular reason to think this pattern would ever stop if I took ever more stats classes. The point is, stats is hard. Big data does not make stats easier.

2.

If you throw a bunch of variables at a model, it will find some that look like good predictors. This is true even if the variables are totally and utterly random and unrelated to the dependent variable (see try-it-at-home experiment below). Poking around in big data, unfortunately, only encourages people to do this and perhaps draw conclusions when they should not. So, if you are going to use big data, do have a plan in advance. Know what effect size would be “interesting” and disregard things well under that threshold, even if they appear to be “statistically significant.” Determine in advance how much power (and thus, observations) you should have to make your case, and sample from your ginormous set to a more appropriate size.

3.

Big data sets seem like they were mostly created for other purposes than statistical inferencing. That makes them a form of convenience data. They might be big, but are the variables present really what you’re after? And was this data collected scientifically, in a manner designed to minimize bias? I’m told that collecting a quality data set takes effort (and money). If that’s so, it seems likely that the quality of your average big data set is low.

A lot of big data comes from log files from web services. That’s a lame place to learn about anything other than how the people who use those web services think while they’re using them. It tells you nothing about people who don’t use web services, or even about what users are thinking when they’re doing something other than using that web service. Just sayin’.

 

Well, anyway, I’m perhaps out of my depth here, but I’ll leave you with this quick experiment, in R:
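A sketch of that experiment (the exact number of starred variables will vary with the random seed):

    # 10,000 rows, 201 columns of pure uniform noise on [0,1]
    n <- 10000
    d <- as.data.frame(matrix(runif(n * 201), nrow = n))

    # Regress the first column on the other 200 and look for "significant" predictors
    fit <- lm(V1 ~ ., data = d)
    summary(fit)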

It generates 10,000 observations of 201 variables, each generated from a uniform random distribution on [0,1]. Then it runs an OLS model using one variable as the dependent and the remaining 200 as independents. R is even nice enough to put friendly little asterisks next to variables that have p<0.05.

When I run it, I get 10 variables that appear to be better than “statistically significant at the 5% level” — even though the data is nothing but pure noise. This is about what one should expect from random noise: 5% of 200 is 10.

Of course, the r2 of the resulting model is ridiculously low (that is, the 200 variables together have low explanatory power). Moreover, the effect size of the variables is small. All as it should be — but you do have to know to look. And in a more subtle case, you can imagine what happens if you build a model with a bunch of variables that do have explanatory power, and a bunch more that are crap. Then you will see a nice r2 overall, but you will still have some of your crap pop up.

 

 

 

More minimum wage bullshit

 

Workers unaware that they are soon to be laid off.

Some clever economists have come up with a name for the religious application of simple economic principles to complex situations where they probably don’t apply: Econ-101ism.

That’s immediately what I thought of when my better half told me about this stupid article in Investor’s Business Daily about the minimum wage and UC Berkeley.

See, folks at Berkeley touted the $15/hr minimum wage as a good thing, and then UC laid off a bunch of people. Coincidence? The good people at Irritable Bowel Disease think not!

Except that few at UC get paid the minimum wage. And the $15/hr minimum wage has not taken effect and won’t take effect for years. And the reason for the job cuts is the highly strained budget situation at the UCs, a problem that is hardly new.

You could make an argument that a $15/hr minimum will strain the economy, resulting in lower tax revenue, resulting in less state money, resulting in layoffs at the UC’s. I guess. Quite a lot of moving parts in that story, though.

Smells like bullshit.

Edit: UCB does have its own minimum wage, higher than the California minimum. It has been $14/hr since 10/2015 and will be $15/hr starting in 2017. (http://www.mercurynews.com/business/ci_28522491/uc-system-will-raise-minimum-wage-15-an)

Another edit: Chancellor Dirks claims the 500 job cuts would save $50M/yr. That implies an average cost of about $100k per job per year, or roughly $50/hr. Even if 1/2 of that goes to overhead and benefits, those would be $25/hr jobs, not near the minimum. In reality, the jobs probably had a range of salaries, and one can imagine some were near the $15 mark, but it is not possible that all or even most of them were.

 

Peak Processing Pursuit

I’ve known for some time that the semiconductor (computer chip) business has not been the most exciting place, but it still surprised me and bummed me out to see that Intel was laying off 11% of its workforce. There are lots of theories about what is happening in non-cloudy-software-appy tech, but I think fundamentally, the money is being drained out of “physical” tech businesses. The volumes are there, of course. Every gadget has a processor — but that processor doesn’t command as much margin as it once did.

A CPU from back in the day (Moto 68040)

A lot of people suggest that the decline in semiconductors is a result of coming to the end of the Moore’s Law epoch. The processors aren’t getting better as fast as they used to, and some argue (incorrectly) that they’re hardly getting better at all. This explains the decline, because without anything new and compelling on the horizon, people do not upgrade.

But in order for that theory to work, you also have to assume that the demand for computation has leveled off. This, I think, is almost as monumental a shift as Moore’s Law ending. Where are the demanding new applications? In the past we always seemed to want more computer (better graphics, snappier performance, etc) and now we somewhat suddenly don’t. It’s like computers became amply adequate right about the same time that they stopped getting much better.

Does anybody else find that a little puzzling? Is it coincidence? Did one cause the other, and if so, which way does the causality go?

The situation reminds me a bit of “peak oil,” the early-2000s fear that global oil production would peak and there would be massive economic collapse as a result. Well, we did hit a peak in oil production in the 2008-9 time frame, but it wasn’t from scarcity; it was from low demand in a faltering economy. Since then, production has been climbing again. But with the possibility of electrified transportation tantalizingly close, we may see true peak oil in the years ahead, driven by diminished demand rather than diminished supply.

I am not saying that we have reached “peak computer.” That would be absurd. We are all using more CPU instructions than ever, be it on our phones, in the cloud, or in our soon-to-be-internet-of-thinged everything. But the ever-present pent up demand for more and better CPU performance from a single device seems to be behind us. Which, for just about anyone alive but the littlest infants, is new and weird.

If someone invents a new activity to do on a computing device that is both popular and ludicrously difficult, that might put us back into the old regime. And given that Moore’s Law is sort of over, that could make for exciting times in Silicon Valley (or maybe Zhongguancun), as future performance will require the application of sweat and creativity, rather than just riding a constant wave. (NB: there was a lot of sweat and creativity required to keep the wave going.)

Should I hold my breath?