Detrumpify2 — some cleanup

Even though my short brush with Internet fame appears to be over (Detrumpify has about 920 users today, up only 30 from yesterday), pride required that I update the extension because it was a bit too quick-n-dirty for my taste. Everything in it was hard-coded and that meant that every update I made to add new sites or insults would require users to approve an update. Hard-coding stuff in your programs is a big no-no, starting from CS 101 on.

So, I have a rewritten version available, and intrepid fans can help me out by testing it. You will not find it by searching the Chrome Web Store; instead, get it directly from here. It is substantially more complicated under the hood than before, so expect bugs. (Github here, in the “v2” folder.)

An important difference between this and the classic version is that there is an options page. It looks like this:

[Screenshot of the options page]

The main thing it lets you do is specify a URL from which a configuration file will periodically be downloaded. The config file contains the actual insults as well as some other parameters. I will host and maintain several configuration files on ToolsOfOurTools, but anyone who wants to make one (for example, to mock a different presidential candidate) will be able to do so and just point to it.

If you want to make changes locally, you can also load a file, click on the edit button, make changes, and then click on the lock button. From then on the extension will use your custom changes.

The format of the config file is simple. Here’s an example with most of the names removed:

Explanation:

  • actions is a container that will hold one or more named sets of search and replace instructions. This file just has one for replacing trump variations, but one can make files that will replace many different things according to different rules.
  • find_regex inside the trump action finds a few variations of Trump, Donald Trump, Donald J. Trump.
  • monikers lists the alternatives.
  • randomize_mode can be always, hourly, or daily, and tells how often the insult changes. In always mode, it will change with each appearance in the document.
  • refresh_age is how long to wait (in milliseconds) before hitting the server for an update.
  • run_info tells how long to wait before running the plugin and how many times to run. This is for sites that do not elaborate their content until after some JavaScript runs (i.e., every site these days, apparently). Here, it runs after 1000ms, then runs four more times, each time waiting 1.8x as long as the last time.
  • bracket can be set to a two-element array of text to be placed before and after any trump replacement.
  • schema is required to ID the format of this file and should look just like that.
  • whitelist is a list of sites that are enabled to run the extension.

Et voilà.

Let me know if you experience issues / bugs! The code that runs this is quite a bit more complex than the version you’re running now. In particular, I’m still struggling a bit with certain websites that turn on “content security policies” that get in the way of the config fetch. Sometimes it works, sometimes it doesn’t.

 

Mental Models

I think we all make mental models constantly — simplifications of the world that help us understand it. And for services on the Internet, our mental models are probably very close — logically, if not in implementation — to the reality of what those services do. If not, how could we use them?

I also like to imagine how the service works. I don’t know why I do this, but it makes me feel better about the universe. For a lot of things, to a first approximation, the what and how are sufficiently close that they are essentially the same model. And sometimes a model of how it works eludes me entirely.

For example, my model of email is that an email address is the combination of a username and a system name. My mail server looks up the destination mail server, and IP routes my blob of text to the destination mail server, where that server routes it to the appropriate user’s “mailbox,” which is a file. Which is indeed how it works, more or less, with lots of elision of what I’m sure are important details.

I’ve also begun sorting my mental models of Internet companies and services into a taxonomy that has subjective meaning for me, based on how meritorious and/or interesting they are. Here’s a rough draft:

The What | The How | Example | Dave’s judgment
obvious | as in real life | email | Very glad these exist, but nobody deserves a special pat on the back for them. I’ll add most matchmaking services, too.
obvious | non-obvious, but simple and/or elegant | Google Search (PageRank) | High regard. Basically, this sort of thing has been the backbone of Internet value to-date.
not obvious / inscrutable | nobody cares | Google Buzz | Lack of popularity kills these. Not much to talk about.
obvious | obvious | Facebook | Society rewards these but technically, they are super-boring to me.
obvious | non-obvious and complex | natural language, machine translation, face recognition | Potentially very exciting, but not really very pervasive or economically important just yet. Potentially creepy and may represent the end of humanity’s reign on earth.

 

Google search is famously straightforward. You’re searching for some “thing,” and Google is combing a large index for that “thing.” Back in the AltaVista era, that “thing” was just keywords on a page. Google’s first innovation was to use the site’s own popularity (as measured by who links to it and the rankings of those links) to help sort the results. I wonder how many people had some kind of mental model of how Google worked that was different than that of AltaVista, aside from the simple fact that it worked much “better.” The thing about Google’s “PageRank” was that it was quite simple, and quite brilliant, because, honestly, none of the rest of us thought of it. So kudos to them.

There have been some Internet services I’ve tried over the years that I could not quite understand. I’m not talking about how they work under the hood, but how they appear to work from my perspective. Remember Google “Buzz?” I never quite understood what that was supposed to be doing.

Facebook, in its essence, is pretty simple, too, and I think we all formed something of a working mental model for what we think it does. Here’s mine, written up as SQL code. First, the system is composed of a few tables:

A table of users, a table representing friendships, and a table of posts. The tables are populated by straightforward UI actions like “add friend” or “write post.”

Generating a user’s wall when they log in is as simple as:

You could build an FB clone with that code alone. It is eye-rollingly boring and unclever.
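As a rough illustration of that mental model, here is the same three-table idea sketched in C++ rather than SQL. All struct, field, and function names are hypothetical; the only things taken from the text are the three tables (users, friendships, posts) and the idea that the wall is a simple lookup over them.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// The three "tables" (field names are assumptions for illustration).
struct User { int id; std::string name; };
struct Friendship { int user_id; int friend_id; };
struct Post { int id; int author_id; long timestamp; std::string text; };

// The "wall" query: every post written by one of the user's friends, newest first.
std::vector<Post> wall_for(int user_id,
                           const std::vector<Friendship>& friendships,
                           const std::vector<Post>& posts) {
    std::vector<Post> wall;
    for (const Post& p : posts) {
        for (const Friendship& f : friendships) {
            if (f.user_id == user_id && f.friend_id == p.author_id) {
                wall.push_back(p);
                break;  // one matching friendship is enough
            }
        }
    }
    std::sort(wall.begin(), wall.end(),
              [](const Post& a, const Post& b) { return a.timestamp > b.timestamp; });
    return wall;
}
```

The nested loop is the moral equivalent of a join between the friendships and posts tables, followed by an ORDER BY on the timestamp.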

Such an implementation would die when you got past a few thousand users or posts, but with a little work and modern databases that automatically shard and replicate, etc, you could probably handle a lot more. Helping FB is the fact that they make no promises about correctness: a post you make may or may not ever appear on your friend’s wall, etc.

I think the ridiculous simplicity of this is why I have never taken Facebook very seriously. Obviously it’s a gajillion dollar idea, but technically, there’s nothing remotely creative or interesting there. Getting it all to work for a billion users making a billion posts a day is, I’m sure, a huge technical challenge, but not one requiring inspiration. (As an aside, today’s FB wall is not so simple. It uses some algorithm to rank and highlight posts. What’s the algorithm and why and when will my friends see my post? Who the hell knows?! Does this bother anybody else but me?)

The last category is things that are reasonably obviously useful to lots of people, but how they work is pretty opaque, even if you think about it for a while. That is, things for which we can form a mental model of what they are, but mere mortals do not understand how they work. Machine translation falls into that category, and maybe all the new machine learning and future AI apps do, too.

It’s perhaps “the” space to watch, but if you ask me the obvious what / simple how isn’t nearly exhausted yet — as long as you can come up with an interesting “why,” that is.

minor annoyances: debug-printing enums

This is going to be another programming post.

One thing that always annoys me when working on a project in a language like C++ is that when I’m debugging, I’d like to print messages with meaningful names for the enumerated types I’m using.

The classic way to do it is something like this:

Note that I have perhaps too-cleverly left out the break statements because each case returns.

But this has problems:

  • repetitive typing
  • maintenance. Whenever you change the enum, you have to remember to change the debug function.

It just feels super-clunky.

I made a little class in C++ that I like a bit better because you only have to write the wrapper code once, even if you use it on a bunch of different enums. Also you can hide the code part in another file and never see or think about it again.

C++11 lets you initialize those maps pretty nicely, and they are static const, so you don’t have to worry about clobbering them or having multiple copies. But overall, it still blows because you have to type those identifiers no fewer than three times: once in the definition and twice in the printer thing.
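One way such a map-based wrapper could look is sketched below. The class name, the Fruit enum, and the "<unknown>" fallback are all assumptions for illustration, not a reconstruction of the original class. The template is written once; each enum then supplies only its own static const map.

```cpp
#include <map>
#include <string>

// Generic wrapper, written once and reused for every enum.
template <typename E>
class EnumPrinter {
public:
    static std::string name(E value) {
        auto it = names.find(value);
        return it != names.end() ? it->second : "<unknown>";
    }
private:
    static const std::map<E, std::string> names;  // supplied per enum
};

// Hypothetical enum used only for this example.
enum class Fruit { Apple, Banana, Cherry };

// C++11 brace initialization of the static const map for Fruit.
// Note each identifier appears twice here, plus once in the enum itself.
template <>
const std::map<Fruit, std::string> EnumPrinter<Fruit>::names = {
    {Fruit::Apple, "Apple"},
    {Fruit::Banana, "Banana"},
    {Fruit::Cherry, "Cherry"},
};
```

A debug line then becomes something like std::cout << EnumPrinter<Fruit>::name(f), and the three-times-typed identifiers are plain to see.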

Unsatisfactory.

I Googled a bit and learned about how Boost provides some seriously abusive preprocessor macros, including one that can loop. I don’t know what kind of dark preprocessor magic Boost uses, but it works. Here is the template and some macros:

And here’s how you use it:

Now I only have to list out the enumerators one time! Not bad. However, it obviously only works if you control the enum. If you are importing someone else’s header with the definition, it still has the maintenance problem of the other solutions.
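The version described here leans on Boost's looping preprocessor macros. As a Boost-free illustration of the same list-the-enumerators-once idea, here is a sketch using the older "X-macro" trick, which is a different technique that gets a similar result. The Shape enum and all macro names are hypothetical.

```cpp
#include <string>

// The enumerator list is written exactly once, as a macro that applies
// whatever macro X it is given to each name in turn.
#define SHAPE_LIST(X) \
    X(Circle)         \
    X(Square)         \
    X(Triangle)

// Two ways to expand each entry: as an enumerator, and as a switch case.
#define AS_ENUMERATOR(name) name,
#define AS_CASE(name) case Shape::name: return #name;

enum class Shape { SHAPE_LIST(AS_ENUMERATOR) };

inline std::string to_string(Shape s) {
    switch (s) {
        SHAPE_LIST(AS_CASE)
    }
    return "<unknown>";  // value outside the listed enumerators
}
```

Adding a new shape means touching only SHAPE_LIST. The same limitation applies, though: it only helps when you control the enum definition.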

I understand that the C++ template language is Turing-complete, so I suspect this can be done entirely with templates and no macros, but I wouldn’t have the foggiest idea how to start. Perhaps one of you does?