simple string operations in $your_favorite_language

I’ve recently been doing a small project that involves Python and Javascript code, and I keep tripping up on the differing syntax of their join()  functions. (As well as semicolons, tabs, braces, of course.) join()  is a simple function that joins an array of strings into one long string, sticking a separator in between, if you want.

So, join(["this","that","other"],"_")   returns "this_that_other" . Pretty simple.

Perl has join()  as a built-in, and it has an old-school non object interface.

Python is object-orienty, so it has an object interface:

What’s interesting here is that join is a member of the string class, and you call it on the separator string. So you are asking a "," to join up the things in that array. OK, fine.

Javascript does it exactly the reverse. Here, join is a member of the array class:

I think I slightly prefer Javascript in this case, since calling member functions of the separator just “feels” weird.

I was surprised to see that C++ does not include join in its standard library, even though it has the underlying pieces: <vector>  and <string>. I made up a little one like this:

You can see I took the Javascript approach. By the way, this is how they do it in Boost. Boost avoids the extra compare for the separator each time by handling the first list item separately.

Using it is about as easy as the scripting languages:

I can live with that, though the copy on return is just a C++ism that will always bug me.

Finally, I thought about what this might look like back in ye olden times, when we scraped our fingers on stone keyboards, and I came up with this:

Now that’s no beauty queeen. The function does double-duty to make it a bit easier to allocate for the resulting string. You call it first without a target pointer and it will return the size you need (not including the terminating null.) Then you call it again with the target pointer for the actual copy.

Of course, if any of the strings in that array are not terminated, or if you don’t pass in the right length, you’re going to get hurt.

Anyway, I must have been bored. I needed a temporary distraction.

 

progress, headphones edition

It looks like Intel is joining the bandwagon of people that want to take away the analog 3.5mm “headphone jack” and replace it with USB-C. This is on the heels of Apple announcing that this is definitely happening, whether you like it or not.

obsolete technology
obsolete technology

There are a lot of good rants out there already, so I don’t think I can really add much, but I just want to say that this does sadden me. It’s not about analog v. digital per se, but about simple v. complex and open v. closed.

The headphone jack is a model of simplicity. Two signals and a ground. You can hack it. You can use it for other purposes besides audio. You can get a “guzinta” or “guzoutta” adapter to match pretty much anything in the universe old or new — and if you can’t get it, you can make it. Also, it sounds Just Fine.

Now, I’m not just being ant-change. Before the 1/8″ stereo jack, we had the 1/4″ stereo jack. And before that we had mono jacks, and before that, strings and cans. And all those changes have been good. And maybe this change will be good, too.

But this transition will cost us something. For one, it just won’t work as well. USB is actually a fiendishly complex specification, and you can bet there will be bugs. Prepare for hangs, hiccups, and snits. And of course, none of the traditional problems with headphones are eliminated: loose connectors, dodgy wires, etc. On top of this, there will be, sure as the sun rises, digital rights management, and multiple attempts to control how and when you listen to music. Prepare to find headphones that only work with certain brands of players and vice versa. (Apple already requires all manufacturers of devices that want to interface digitally with the iThings to buy and use a special encryption chip from Apple — under license, natch.)

And for nerd/makers, who just want to connect their hoozyjigger to their whatsamaducky, well, it could be the end of the line entirely. For the time being, while everyone has analog headphones, there will be people selling USB-C audio converter thingies — a clunky, additional lump between devices. But as “all digital” headphones become more ubiquitous, those adapters will likely disappear, too.

Of course, we’ll always be able to crack open a pair of cheap headphones and steal the signal from the speakers themselves … until the neural interfaces arrive, that is.

EDIT: 4/28 8:41pm: Actually, the USB-C spec does allow analog on some of the pins as a “side-band” signal. Not sure how much uptake we’ll see of that particular mode.

 

Inferencing from Big Data

Last week, I came across this interesting piece on the perils of using “big data” to draw conclusions about the world. It analyzes, among other things, the situation of Google Flu Trends, the much heralded public health surveillance system that turned out to be mostly a predictor of winter (and has since been withdrawn).

It seems to me that big data is a fun place to explore for patterns, and that’s all good, clean, fun — but it is the moment when you think you have discovered something new when the actual work really starts. I think “data scientists” are probably on top of this problem, but are most people going on about big data data scientists?

I really do not have all that much to add to the article, but I will amateurishly opine a bit about statistical inferencing generally:

1.

I’ve taken several statistics courses over my life (high school, undergrad, grad). In each one, I thought I had a solid grasp of the material (and got an “A”), until I took the next one, where I realized that my previous understanding was embarrassingly incorrect. I see no particular reason to think this pattern would ever stop if I took ever more stats classes. The point is, stats is hard. Big data does not make stats easier.

2

If you throw a bunch of variables at a model, it will find some  that look like good predictors. This is true even if the variables are totally and utterly random and unrelated to the dependent variable (see try-it-at-home experiment below). Poking around in big data, unfortunately, only encourages people to do this and perhaps draw conclusions when they should not. So, if you are going to use big data, do have a plan in advance. Know what effect size would be “interesting” and disregard things well under that threshold, even if they appear to be “statistically significant.” Determine in advance how much power (and thus, observations) you should have to make your case, and sample from your ginormous set to a more appropriate size.

3

Big data sets seem like they were mostly created for other purposes than statistical inferencing. That makes them a form of convenience data. They might be big, but are the variables present really what you’re after? And was this data collected scientifically, in a manner designed to minimize bias? I’m told that collecting a quality data set takes effort (and money). If that’s so, it seems likely that the quality of your average big data set is low.

A lot of big data comes from log files from web services. That’s a lame place to learn about anything other than how the people who use those web services think or even how people who do use web services think while they’re doing something other than using that web service. Just sayin’.

 

Well, anyway, I’m perhaps out of my depth here, but I’ll leave you with this quick experiment, in R:

It generates 10,000 observations of 201 variables, each generated from a uniform random distribution on [0,1]. Then it runs an OLS model using one variable as the dependent and the remaining 200 as independents. R is even nice enough to put friendly little asterisks next to variables that have p<0.05 .

When I run it, I get 10 variables that appear to be better than “statistically significant at the 5% level” — even though the data is nothing but pure noise. This is about what one should expect from random noise.

Of course, the r2 of the resulting model  is ridiculously low (that is, the 200 variables together have low explanatory power ). Moreover, the effect size of the variables is small. All as it should be — but you do have to know to look. And in a more subtle case, you can imagine what happens if you build a model with a bunch of variables that do have explanatory power, and a bunch more that are crap. Then you will see a nice r2 overall, but you will still have some of your crap pop up.

 

 

 

Peak Processing Pursuit

I’ve known for some time that the semiconductor (computer chip) business has not been the most exciting place, but it still surprised me and bummed me out to see that Intel was laying off 11% of its workforce. There are lots of theories about what is happening in non-cloudy-software-appy tech, but I think fundamentally, the money is being drained out of “physical” tech businesses. The volumes are there, of course. Every gadget has a processor — but that processor doesn’t command as much margin as it once did.

A CPU from back in the day (Moto 68040)
A CPU from back in the day (Moto 68040)

A lot of people suggest that the decline in semiconductors is a result of coming to the end of the Moore’s Law epoch. The processors aren’t getting better as fast as they used to, and some  argue (incorrectly) hardly at all. This explains the decline, because without anything new and compelling on the horizon, people do not upgrade.

But in order for that theory to work, you also have to assume that the demand for computation has leveled off. This, I think, is almost as monumental a shift as Moore’s Law ending. Where are the demanding new applications? In the past we always seemed to want more computer (better graphics, snappier performance, etc) and now we somewhat suddenly don’t. It’s like computers became amply adequate right about the same time that they stopped getting much better.

Does anybody else find that a little puzzling? Is it coincidence? Did one cause the other, and if so, which way does the causality go?

The situation reminds me a bit of “peak oil,” the early-2000’s fear that global oil production will peak and there will be massive economic collapse as a result. Well, we did hit a peak in oil production in 2008-9 time-frame, but it wasn’t from scarcity, it was from low demand in a faltering economy. Since then, production has been climbing again. But with the possibility of electrified transportation tantalizingly close, we may see true peak oil in the years ahead, driven by diminished demand rather than diminished supply.

I am not saying that we have reached “peak computer.” That would be absurd. We are all using more CPU instructions than ever, be it on our phones, in the cloud, or in our soon-to-be-internet-of-thinged everything. But the ever-present pent up demand for more and better CPU performance from a single device seems to be behind us. Which, for just about anyone alive but the littlest infants, is new and weird.

If someone invents a new activity to do on a computing device that is both popular and ludicrously difficult, that might put us back into the old regime. And given that Moore’s Law is sort of over, that could make for exciting times in Silicon Valley (or maybe Zhongguancun), as future performance will require the application of sweat and creativity, rather than just riding a constant wave. (NB: there was a lot of sweat and creativity required to keep the wave going.)

Should I hold my breath?

A postulate regarding innovation

I’ve been thinking a lot lately about success and innovation. Perhaps its because of my lack of success and innovation.

Anyway, I’ve been wondering how the arrow of causality goes with those things. Are companies successful because they are innovative, or are they innovative because they are successful.

This is not exactly a chicken-and-egg question. Google is successful and innovative. It’s pretty obvious that innovation came first. But after a few “game periods,” the situation becomes more murky. Today, Google can take risks and reach further forward into the technology pipeline for ideas than a not-yet successful entrepeneur could. In fact, a whole lot of their innovation seems not to affect their bottom line much, in part because its very hard to grow a new business at the scale of their existing cash cows. This explains (along with impatience and the opportunity to invest in their high-returning existing businesses) Google’s penchant for drowning many projects in the bathtub.

I can think of other companies that had somewhat similar behavior over history. AT&T Bell Labs and IBM TJ Watson come to mind as places that were well funded due to their parent companies enormous success (success, derived at least in part, from monopoly or other market power). And those places innovated. A lot. As in Nobel Prizes, patents galore, etc. But, despite their productive output of those labs, I don’t think they ever contributed very much to the companies’ success. I mean, the transistor! The solar cell! But AT&T didn’t pursue these businesses because they had a huge working business that didn’t have much to do with those. Am I wrong about that assessment? I hope someone more knowledgable will correct me.

Anyway, that brings me back to the titans of today, Google, Facebook, etc. And I’ll continue to wonder out loud:

  • are they innovating?
  • is the innovation similar to their predecessors?
  • are they benefiting from their innovation?
  • if not, who does, and why do they do it?

So, this gets back to my postulate, which is that, much more often than not, success drives innovation, and not the reverse. That it ever happens the other way is rare and special.

Perhaps a secondary postulate is that large, successful companies do innovate, but they have weak incentives to act aggressively on those innovations, and so their creative output goes underutilized longer than it might if it had been in the hands of a less successful organization.

Crazy?

Nerd alert: Google inverter challenge

A couple of years ago Google announced an electrical engineering contest with a $1M prize. The goal was build the most compact DC to AC power inverter that could meet certain requirements, namely 2kVA power output at 240 Vac 60Hz, from a 450V DC source with a 10Ω impedance. The inverter had to withstand certain ambient conditions and reliability, and it had to meet FCC interference requirements.

Fast forward a few years, and the results are in. Several finalists met the design criteria, and the grand prize winner exceeded the energy density requirements by more than 3x!

First, Congrats, to the “Red Electrical Devils!” Screen Shot 2016-03-09 at 9.56.34 AMI wish I were smart enough to have been able to participate, but my knowledge of power electronics is pretty hands-off, unless you are impressed by using TRIACs to control holiday lighting. Here’s the IEEE on what they thought it would take to win.

Aside from general gEEkiness, two things interested me about this contest. First, from an econ perspective, contests are just a fascinating way to spur R&D. Would you be able to get entrants, given the cost of participation and the likelihood of winning the grand prize? Answer: yes. This seems to be a reliable outcome if the goal is interested enough to the right body of would-be participants.

The second thing that I found fascinating was the goal: power density. I think most people understand the goals of efficiency, but is it important that power inverters be small? The PV inverter on the side of your house, also probably around 2kW, is maybe 20x as big as these. Is that bad? How much is it worth it to shrink such an inverter? (Now, it is true if you want to achieve power density, you must push on efficiency quite a bit, as every watt of energy lost to heat needs to be dissipated somehow, and that gets harder and harder as the device gets smaller. But in this case, though efficiencies achieved were excellent, they were not cutting edge, and the teams instead pursued extremely clever cooling approaches.)

I wonder what target market Google has in mind for these high power density inverters. Cars perhaps? In that case, density is more important than a fixed PV inverter, but still seemingly not critical to this extreme. Specific density rather than volumetric seems like it would be more important. Maybe Google never had a target in mind. For sure, there was no big reveal with the winner announcement. Maybe Google just thought that this goal was the most likely to generate innovation in this space overall, without a particular end use in mind at all — it’s certainly true that power electronics are a huge enabling piece of our renewable energy future, and perhaps it’s not getting the share of attention it deserves.

I’m not the first, though, to wonder what this contest was “really about.” I did not have to scroll far down the comments to see one from Slobodon Ćuk, a rather famous power electronics researcher, inventor of a the Ćuk inverter.

Screen Shot 2016-03-09 at 9.55.00 AMAnyway, an interesting mini-mystery, but a cool achievement regardless.

Power corrupts, software ecosystems edition

I’ve written a lot of software in my day, but it turns out little of it has been for use of anyone but myself and others in the organizations for which I’ve worked.

Recently, my job afforded me a small taste of what it’s like to publish software. I wrote a small utility, an extension to a popular web browser to solve a problem with a popular web application. The extension was basically a “monkey patch” of sorts over the app in question.

Now, in order to get it up on the “web store”, there were some hoops to jump through, including submission to the company for final approval. In the process, I signed up for the mailing list that programmers use 2000px-Gnome-system-lock-screen.svgto help them deal with hiccups in the publishing process.

As it turned out, my hoop-jumping wasn’t too hard. Because I was only publishing to my own organization, this extension ultimately did not need to be reviewed by the Great And Powerful Company. But I’ve continued to follow the mailing list, because it has turned out to be fascinating.

Every day, a developer or two or three, who has toiled for months or years to create something they think is worthwhile, sends a despondent message to the list: “Please help! My app has been rejected and I don’t know why!” Various others afflicted try to provide advice for things they tried that worked or did not.

And the thing is, in many cases, nobody can help them. Because the decision was made without consultation with the developer. The developer has no access to the decision-maker whatsoever. No email, no phone call, no explanation. Was the app rejected because it violated the terms and conditions? Which ones?

The developers will have to guess, make some changes, and try their luck again. It’s got to be an infuriateing process, and a real experience of powerlessness.

This is how software is distributed today — through a small number of authorities. Google, Apple, Amazon, etc. If you want to play, you do it by their rules. Even on PCs and Macs there seems a strong push to move software distribution onto the company web stores. So far, it is still possible to put a random executable on your PC and run it — at least after clicking through a series of warnings. But will that be the case forever? We shall see.

The big companies have some good reasons for doing this. They can

  • assert quality control (of a sort. NB: plenty of crap apps in curated stores anyway)
  • help screen for security problems
  • screen out malware.

But they also have bad (that is, consumer-unfriendly) reasons to do this, like

  • monetizing other people’s work to a degree only possible in a monopoly situation
  • keeping out products that compete with their own
  • blocking apps that circumvent company-imposed limitations (like blocking frameworks and interpreters that might allow people to develop for the framework rather than the specific target OS)

All of those reasons are on top of the inevitable friction associated with dealing with a large, careless, monolithic organization and its bureaucrats, who might find it easier to reject your puny app than to take the time to understand what it’s doing and why it has merit.

Most sad to me is that the amazing freedom that computing used to have is being restricted. Most users would never use that freedom, but it was nice to have. If you could find a program, you could run it. If you could not find it, you could write it. And that is being chipped away at.

 

 

 

Apple Open Letter… eh

[ Updated below, but I’m leaving the text here as I originally wrote it. ]

 

By now, just about everyone has seen the open letter from Apple about device encryption and privacy. A lot of people are impressed that such a company with so much to lose would stand up for their customers. Eh, maybe.

I have to somewhat conflicting thoughts on the whole matter:

1)

If Apple had designed security on the iPhone properly, it would not even be possible for them to do what the government is asking. In essence, the government plan is for Apple to develop a new version of iOS that they can “upgrade” the phone to, which would bypass (or make it easier to bypass) the security on the device. Of course, it should not be possible to upgrade the OS of a phone without the consent of a verified users, so this is a bug they baked in from the beginning — for their benefit, of course, not the government’s.

Essentially, though they have not yet written the “app” that takes advantage of this backdoor, they have already created it in a sense. The letter is therefore deceptive as written.

2)

The US government can get a warrant to search anything. Anything. Any. Thing. This is has it has been since the beginning of government. They can’t go out and do so without a warrant. They can’t (well, shouldn’t) be able to pursue wholesale data mining of every single person, but they can get a warrant to break any locked box and see what’s inside.

Why should data be different?

I think the most common argument around this subject is that the government cannot be trusted with such power. That is, yes, the government may have a reasonably right to access encrypted data in certain circumstances (like decrypting known terrorist’s phones!) but the tools that allow that also give them the power to access data under less clear-cut circumstances as well.

The argument then falls into a slippery slope domain — a domain in which I’m generally unimpressed. In fact, I would dismiss it entirely if the US government hadn’t already engaged in important widespread abuse of similar powers.

Nevertheless, I think the argument that the government should not have backdoors to people’s data is one of practical controls rather than fundamental rights to be free from search.

 

I have recommendations to address both thoughts:

  1. Apple, like all manufacturers, should implement security properly, so that neither they nor any other entity possess a secret backdoor.
  2. Phone’s should have a known backdoor, a one-time password algorithm seeded at the time of manufacture, and stored and managed by a third party, such as the EFF. Any attempts to access this password, whether granted or denied, would be logged and viewable as a public record.

I don’t have a plan for sealed and secret warrants.

 

[ Update 2/17 11:30 CA time ]

So, the Internet has gone further and explained a bit more about what Apple is talking about and what the government has asked for. It seems that basically, the government wants to be able to to brute-force the device, and wants Apple to make a few changes to make that possible:

  1. that the device won’t self-wipe after too many incorrect passwords
  2. that the device will not enforce extra time-delay between attempts
  3. that the the attempts can be conducted electronically, via the port, rather than manually by the touch screen

I guess this is somehow different than Apple being able to hack their own devices, but to me, it’s still basically the same situation. They can update the OS and remove security features. That the final attack is brute force rather than a backdoor is hardly relevant.

So I’m standing behind my assessment that the Apple security is borked by design.

tech fraud, innovation, and telling the difference

I admit it. I’m something of a connoisseur of fraud, particularly technology fraud.I’m  fascinated by it. That’s why I have not been able to keep my eyes off the unfolding story of Thernos. Theranos was a company formed to do blood testing with minute quantities of blood. The founder, who dropped out of Stanford to pursue her idea, imagined blood testing kiosks in every drugstore, making testing ubiquitous, cheap, safe, painless. It all sounds pretty great in concept, but it seemed to me from the very start to lack an important hallmark of seriousness: evidence of a thoughtful survey of “why hasn’t this happened already?”

There were plenty of warning signs that this would not work out, but I think what’s fascinating to me is that the very same things that set off klaxons in my brain lured in many investors. For example, the founder dropped out of school, so had “commitment,” but no technical background in the art she was promising to upend. Furthermore, there were very few medical or testing professionals among her directors. (There was one more thing that did it for me: the founder liked to ape the presentation style and even fashion style of Steve Jobs. Again, there were people with money who got lured by that … how? The mind boggles.)

Anyway, there is, today, a strange current of anti-expert philosophy floating around Silicon Valley. I don’t know what to make of it. They do have some points. It is true that expertise can blind you to new ideas. And it’s also true that a lot of people who claim to be experts are really just walking sacks of rules-of-thumb and myths accreted over unremarkable careers.

At the same time, building truly innovative technology products is especially hard. I’m not talking about applying technology to hailing a cab. I’m talking about creating new technology. The base on which you are innovating is large and complex. The odds that you can add something meaningful to it through some googling seems vanishingly small.

But it is probably non-zero, too. Which means that we will always have stories of the iconoclast going against the grain to make something great. But are those stories explanatory? Do they tell us about how innovation works? Are they about exceptions or rules? Should we mimic successful people who defy experts, by defying experts ourselves, and if we do, what are our chances of success? And should we even try to acquire expertise ourselves?

All of this brings me to one of my favorite frauds in progress: Ubeam. This is a startup that wants to charge your cell phone, while it’s in your pocket, by means of ultrasound — and it raised my eyebrows the moment I heard about it. They haven’t raised quite as much money as did Theranos, but their technology is even less likely to work. (There are a lot of reasons, but they boil down to the massive attenuation of ultrasound in air, the danger of exposing people to high levels of ultrasound, the massive energy loss from sending out sound over a wide area, only to be received over a small one [or the difficulty and danger of forming a tight beam], the difficulty of penetrating clothes, purses, and phone holders, and the very low likelihood that a phone’s ultrasound transducer will be positioned close to normally with respect to the beam source.) And if they somehow manage to make it work, it’s still a terrible idea, as it will be grotesquely inefficient.

What I find so fascinating about this startup is that the founder is ADAMANT that people who do not believe it will work are just trapped in an old paradigm. They are incapable of innovation — broken, in a way. She actively campaigns for “knowledge by Google” and against expertise.

As an engineer by training and genetic predisposition, this TEDx talk really blows my mind. I still cannot quite process it:

 

 

Technical Support that isn’t

I’ve been doing a little progr^H^H^H^H^Hsoftware engineering lately, and with it, I’ve been interacting with libraries and APIs from third parties. Using APIs can be fun, because it lets your simple program leverage somebody else’s clever work. On the other hand, I really hate learning complex APIs because the knowledge is a) too often hard-won through extended suffering and b) utterly disposable. You will not be able to use what you’ve learned next week, much less, next year.

So, when I’m learning a new API, I read the docs, but I admit to trying to avoid reading them too closely or completely, and I try not to commit any of it to memory. I’m after the bare minimum to get my job done.

That said, I sometimes get stuck and have to look closer, and a few times recently, I’ve even pulled the trigger on the last resort: writing for support. Here’s the thing: I do this when I have read the docs and am seriously stuck and/or strongly suspect that the API is broken in some way.

As it happens, I have several years’ experience as a corporate and field applications engineer (that’s old-skool Silicon Valley speak for person-who-helps-make-it-work), so I like to think I know how to approach support folks; I know how I would like to be approached.

I always present them with a single question, focused down to the most basic elements, preferably in a form that they can use to reproduce the issue themselves.

But three out of three times recently when I’ve done this in the past month (NB: a lot of web APIs are, in fact, quite broken), I have run into a support person who replied before considering my example problems or even simply running. Most annoying, they sent me the dreaded “Have you looked at the documentation?”

All of those are support sins, but I find the last the most galling. For those of you whose job it is to help others use your product, let me make this very humble suggestion: always, always, ALWAYS point the user to the precise bit of documentation that answers the question asked. (Oh, and if your docs website does not allow that, it is broken, too. Fix it.)

This has three handy benefits:

  1. It helps the user with their problem immediately. Who woodanodeit?
  2. It chastens users who are chastenable (like me), by pointing out how silly and lazy they were to write for help before carefully examining the docs.
  3. It forces you, the support engineer, to look at the documentation yourself, with a particular question in mind, and gives you the opportunity to consider whether the doc’s ability to answer this question might be improvable.

 

On the other hand, sending a friendly email suggesting that the user “look at the documentation” makes you look like an ass.