Thursday, December 29, 2016

the digital library of babel

Jorge Luis Borges introduced the concept of the Library of Babel, a "vast library containing all possible 410-page books of a certain format and character set." To further quote Wikipedia,
Though the order and content of the books is random and apparently completely meaningless, the inhabitants believe that the books contain every possible ordering of just 25 basic characters (22 letters, the period, the comma, and the space). Though the vast majority of the books in this universe are pure gibberish, the library also must contain, somewhere, every coherent book ever written, or that might ever be written, and every possible permutation or slightly erroneous version of every one of those books.
The other day, my company CarGurus had a lunch and learn about the internals of git. I've always been impressed with how quick updates are once you've cloned a repository. In part that's because git stores a compressed copy of every version of every file and folder your project has generated, so chances are it doesn't have to pull down that much fresh data. What's really clever is how it stores them: each one lives in a physical file named after the SHA-1 hash of its contents (plus a short header git adds). Each physical file sits in a folder named for the first two hex digits of the 40-hex-digit hash, with the remaining 38 digits as the file name; you can see those folders in the .git/objects/ dir of your git project.
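To make the naming scheme concrete, here's a minimal sketch in Node.js of how the name of a file's object ("blob") can be computed. The function names gitBlobHash and looseObjectPath are my own, not anything git exposes. It assumes the standard blob header ("blob", a space, the byte length, and a NUL byte) and ignores the zlib compression git applies to the stored file.

```javascript
const crypto = require("crypto");

// Hypothetical helper: compute the SHA-1 git would assign to a blob
// holding `content`. Git hashes a small header plus the raw bytes.
function gitBlobHash(content) {
  const body = Buffer.isBuffer(content) ? content : Buffer.from(content, "utf8");
  const header = Buffer.from(`blob ${body.length}\0`, "utf8");
  return crypto
    .createHash("sha1")
    .update(Buffer.concat([header, body]))
    .digest("hex");
}

// Hypothetical helper: where a loose (unpacked) object with this hash
// would sit, using the first two hex digits as the folder name.
function looseObjectPath(hash) {
  return `.git/objects/${hash.slice(0, 2)}/${hash.slice(2)}`;
}

const hash = gitBlobHash("hello world\n");
console.log(hash);
console.log(looseObjectPath(hash));
```

You can compare the result against `git hash-object` on a file with the same contents.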

SHA-1 is really amazing, because it's SO amazingly unlikely that two different files will generate the same hash. This page describes it as
Here’s an example to give you an idea of what it would take to get a SHA-1 collision. If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (3.6 million Git objects) and pushing it into one enormous Git repository, it would take roughly 2 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision. A higher probability exists that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.
So. Many years ago, an awesome experimental site word.com (now sadly defunct, the domain bought out by Merriam-Webster) ran a subsite called Pixeltime - you can read about it at my tribute page, but the upshot was it was an online graphic editor slash contest with an emcee the Pixel Master whom I've described as "a cross of Mr. Rogers and Max Headroom via Blue Man Group". Each image was 45x45, with a palette of 16 colors. (I made some visual basic hacks that let me essentially upload photos in 5 shades of gray by grabbing the mouse and clicking each pixel)


That one on the right is a little joke - I realized there was a maximum number of images that could be made in that format... at first I badly underestimated how many that is, but it turns out it's 16^2025. (one square could be 16 colors, 2 squares would be 16 * 16, etc) Anyway, most calculators don't even try to figure out what that is, they just call it "infinity".

So here's the thing: that "infinity" is much, much, much bigger than the number of unique SHA-1 hashes. If you were to make a hash for each image, you would certainly get a large number of collisions. In fact, 45x45 is extravagant - by my reckoning you could flood SHA-1 with a simple 16 colors at 10x10, which gives you 2.6 * 10 ^ 120 pictures. (I encourage people to check my math - I've certainly got it wrong before.)
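For anyone who does want to check the math, here's a rough sketch using JavaScript's BigInt, which doesn't give up and say "infinity". The function names are made up for this example; the only outside fact is that SHA-1 produces 160-bit hashes, so there are 2^160 of them.

```javascript
// Count the images possible in a grid format, using BigInt so nothing overflows.
function imageCount(colors, width, height) {
  return BigInt(colors) ** BigInt(width * height);
}

// Number of distinct SHA-1 values: 160 bits.
const SHA1_SPACE = 2n ** 160n;

// Smallest pixel count at which the images outnumber the hashes,
// which (by pigeonhole) guarantees at least one collision.
function minPixelsToExceed(colors, limit) {
  let pixels = 0;
  let total = 1n;
  while (total <= limit) {
    total *= BigInt(colors);
    pixels += 1;
  }
  return pixels;
}

console.log(16 ** 2025);                          // plain floating point gives up: Infinity
console.log(imageCount(16, 45, 45) > SHA1_SPACE); // true
console.log(imageCount(16, 10, 10) > SHA1_SPACE); // true
console.log(minPixelsToExceed(16, SHA1_SPACE));   // 41
```

So by this reckoning, even a 7x7 image at 16 colors (49 pixels) already outnumbers the possible hashes.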

So SHA-1 hashspace is so much bigger than what humanity could conceivably generate, and yet the universe of everything - if you don't put many restrictions on the grammar of the everything you're generating - is so much larger than that.

I don't think our brains can even deal with a million, never mind billions or trillions. (My 6th grade math teacher had a book of a thousand pages of a thousand dots each, with certain amusing values labeled.) Hell, get a dollar's worth of pennies, lay 'em out in an uneven sprawl on a flat surface, and I'll bet you think it looks more like 40 cents.

Or, just watch this:

Monday, December 26, 2016

recreating processing / p5.js's map() function in php

Processing's map() function (here's an interactive p5 demo of it) is super useful and conceptually important, but I don't know if it natively exists in PHP, or I just don't know the term for it.

It takes 5 values, a basic value, the min and max value of the input range, and then the min and max value of the output range. So if you wanted to position something in Processing based on the pointer X position, but constrain it from, say, 1-100, you would call

map(mouseX, 0, width, 1,100);

Note that basic values outside the "input range" are valid, and will result in outputs outside the output range, and similarly you can flip either pair of range numbers and get things reversed, so to speak.

Anyway, I was starting a DIY wordcloud in PHP for tags on my website, and wanted to map the tag appearance count (ranging from 1-250 or so) to font pixel sizes (maybe from 10 to 100 or so).

Since the processing page provided a nice little tester, I wrote it in Java to make sure I wasn't screwing anything up, and could visually compare results to the existing map() function:

float mymap(float val, float inputMin,float inputMax,
    float outputMin, float outputMax){
     float inputRange = inputMax - inputMin;
     float outputRange = outputMax - outputMin;
     float scale = outputRange / inputRange;
     float trueVal = val - inputMin;
     return outputMin + (trueVal * scale);
}

and then ported it to PHP:

function mymap($val,$inputMin,$inputMax, $outputMin, $outputMax){
     $inputRange = $inputMax - $inputMin;
     $outputRange = $outputMax - $outputMin;
     $scale = $outputRange / $inputRange;
     $trueVal = $val - $inputMin;
     return $outputMin + ($trueVal * $scale);
}

I didn't do much error checking; if the min and max of the input are equal, the input range is zero and the scale calculation divides by zero.
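For what it's worth, here's one way the equal-min-and-max case might be guarded, sketched in plain JavaScript. The name mapRange and the choice to throw a RangeError are my own assumptions; returning the output minimum would be another reasonable option.

```javascript
// Linearly re-map val from the input range to the output range.
// Like Processing's map(), values outside the input range are allowed
// and simply land outside the output range.
function mapRange(val, inputMin, inputMax, outputMin, outputMax) {
  const inputRange = inputMax - inputMin;
  if (inputRange === 0) {
    // Assumption: treat a zero-width input range as a caller error.
    throw new RangeError("inputMin and inputMax must differ");
  }
  const scale = (outputMax - outputMin) / inputRange;
  return outputMin + (val - inputMin) * scale;
}

console.log(mapRange(5, 0, 10, 0, 100));  // 50
console.log(mapRange(20, 0, 10, 0, 100)); // 200, outside the output range
console.log(mapRange(0, 0, 10, 100, 0));  // 100, a flipped output range
```

For the tag cloud, the call would look something like mapRange(count, 1, 250, 10, 100).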

Here's the first (well, second, I fiddled with a consistent line-height) result:
Not great, not terrible. I might try something like jQCloud if I want it to look better (not clear to me yet if you can make each word clickable.)

Sunday, December 25, 2016

simple ai

Merry Christmas! Today is the last day for my 2016 advent calendar.

I admit it ends on a bit of a whimper; some of the earlier entries were much more interesting. Probably the most entertaining entry was the first, hockeyclaus. It's based on santaskate, which is the very first one I made when I started constructing these in 2009. It's an easy enough game, but not quite trivial, and some people say they just like bully bashing the elves around.

I used a crude heuristic to give the elves some smarts... essentially I draw a line between the puck and the goal, and then a circle a small, arbitrary distance around the puck. If the puck is further from the goal than the elf is, the elf tries to get to the intersection of that line and circle on the far side of the puck. If the puck is nearer the goal, the elf tries to drive through the puck to the goal; that's the point it's attracted to.

I added a debug feature to the game, hit 'd' to toggle the extra visualization:
In this diagram, you see the lower elf is nearer the goal than the puck is, so is attracted to a point behind the puck, while the upper elf figures it's more or less on the other side of the puck already. Clearly not a good algorithm, but I'm ok with this game not being very challenging.
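The heuristic described above might be sketched roughly like this in JavaScript. This is a reconstruction from the description, not the game's actual code: the name elfTarget, the plain {x, y} points, and the choice to aim straight at the puck in the "drive through" case are all assumptions.

```javascript
// Distance between two {x, y} points.
function distance(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// Pick the point an elf should be attracted to.
// radius is the small, arbitrary distance of the circle around the puck.
function elfTarget(elf, puck, goal, radius) {
  const puckToGoal = distance(puck, goal);

  // Elf is already on the far side of the puck (or the puck is sitting
  // in the goal): assume it just heads for the puck to push it goalward.
  if (puckToGoal === 0 || distance(elf, goal) >= puckToGoal) {
    return { x: puck.x, y: puck.y };
  }

  // Elf is nearer the goal than the puck is: aim for the spot where the
  // goal-to-puck line, extended past the puck, crosses the circle.
  const ux = (puck.x - goal.x) / puckToGoal;
  const uy = (puck.y - goal.y) / puckToGoal;
  return { x: puck.x + ux * radius, y: puck.y + uy * radius };
}

const goal = { x: 0, y: 0 };
const puck = { x: 10, y: 0 };
console.log(elfTarget({ x: 5, y: 0 }, puck, goal, 2));  // { x: 12, y: 0 }
console.log(elfTarget({ x: 20, y: 0 }, puck, goal, 2)); // { x: 10, y: 0 }
```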

It might be cool to make this a Comp Sci programming challenge... how would two teams of one or two elves compete, in what is roughly a full-round-arena version of air hockey? I could see making an interface that both fully describes the "world" and all actors in it, and then lets the program say what direction of thrust each player it controls gets... I wonder what strategies would emerge from people who are actually good at programming AI.

drive space and network speeds have flatlined?

This chart, from Matt Komorowski's Cost per Gb Update page, confirms an inkling I've had...
Hard drives ain't getting bigger, the right side of that line is flat. Compare that to Komorowski's 2009 post, when things were a bit more rosy...

I'm feeling this now because I've been looking for an excuse to replace my 2013 Macbook Air (apparently I'm a masochist when it comes to ports). I had my current model's HD space goosed up to 500Gb when I acquired it. If I got a new "Macbook" and if storage was my major concern over processor speed and memory (which it is), I'd have to pay $1400 vs $1000 for the base model.

Some of that's the Apple Tax, but they're also an industry pacesetter, and the situation doesn't seem better in PC land. (Not that I consider it a serious option for me right now.)

(Phones have had similar flattenings. Apple finally shook off some of its old "16 Gb might be good enough for cheapskates" illusion, but the asymmetry of their storage math, offering models at 256Gb, 128 and then 32 rather than 64, shows that they are playing the game of artificial incentives to not go with the cheapest model.)

And like that first Komorowski link suggests, some of it's the cloud; people generally aren't storing as much locally as they used to. For me, I feel the pain mostly with photos and videos from my own phone (especially since I've been doing One Second Everyday for a few years) and other cameras - that's what pushes me to want larger capacity on my laptop. I'm old and curmudgeonly enough that I don't trust any cloud service, or any system more complex than "cleverly named nested folders I can see are backed up properly".

But even if I embraced a cloud service - it's not like network speeds have gotten that much better either! Objectively, home speeds feel like they've flatlined as well, and most people don't have unlimited data on mobile.

And Moore's law has hit some rough limits itself; having to be so aware of the quantum is a real pain for chip designers.

After a century of astounding technological progress, and with some of our proudest achievements in space being half a century ago, you wonder how badly things are going to stall. Combine that with a political situation that seems rather retrograde, and it's tough to keep all parts of the old optimism going...

Thursday, December 22, 2016

gadgets of the 90s

Cool piece on some gadgets (2 organizers and a phone) from the 1990s. I'm so nostalgic for this kind of stuff, and the discussion of the UIs and formfactors and so forth was thoughtful if not particularly deep.

The PalmPilot was the first product to combine the form factor, general UI, and organizer functionality we all enjoy on our slab phones today. (Newton was first, but missed the form factor - though come to think of it, large phones are getting back up there in screensize, but the drive for device thinness is even more critical.)

(Link via Lost in Mobile, a great blog for gadget and watch lovers. Of course Shaun McGill is an even bigger fan of the Psion than the Palm, and I can't really blame him. His How Did We Get to the iPhone is a nice read in the same area.)

Wednesday, December 21, 2016

$(document).ready() and how these kids don't know from grep-ability

On a code review with a tad of jQuery, I got dinged for using the old-school
$(document).ready(myFunction);
rather than the new hotness of
$(myFunction);
The former is being deprecated, but I really think the recommended syntax suffers from poor grep-ability: you can't readily find all instances of "something that happens when the page loads" across a codebase, or even in the context of a single file, since $(myFunction) is about the same as $(mySelector) and both would require a fancier regex to distinguish usage from $("anotherSelector") or $('yetAnotherSelector').

The problem is that what was the selector for "ready()" never mattered, so this page claims people were using $(), $("document"), $(document), as well as confused people using $("someImageSelector") in the false hope it would wait til the image loaded.

In practice, I only saw $(document) used, but anyway. Progress... Better syntactics, poorer searching.
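To show what that "fancier regex" might look like, here's a rough sketch in JavaScript. The function name findReadyCandidates and both patterns are my own guesses at a reasonable search, not anything from jQuery. Note the limitation it can't get around: a bare $(someName) looks identical whether someName is a ready handler or a variable holding a selector, so the shorthand list is only a list of candidates.

```javascript
// The old-school form is trivially searchable.
const EXPLICIT_READY = /\$\(\s*document\s*\)\s*\.ready\s*\(/g;

// The shorthand form: $( bareIdentifier ), with no quotes inside.
const BARE_IDENTIFIER = /\$\(\s*([A-Za-z_$][\w$]*)\s*\)/g;

// Identifiers that are commonly wrapped but are never ready handlers.
const NOT_HANDLERS = new Set(["document", "window", "this"]);

function findReadyCandidates(source) {
  const explicitCount = [...source.matchAll(EXPLICIT_READY)].length;
  const shorthandNames = [...source.matchAll(BARE_IDENTIFIER)]
    .map((match) => match[1])
    .filter((name) => !NOT_HANDLERS.has(name));
  return { explicitCount, shorthandNames };
}

const sample = `
  $(document).ready(init);
  $(setup);
  $("a.link").hide();
  $('#sidebar').show();
  $(this).remove();
`;
console.log(findReadyCandidates(sample));
```

On that sample it counts one explicit ready() call and flags setup as a shorthand candidate, skipping the quoted selectors.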

Thursday, December 15, 2016

chrome's mobile device emulation with the phone around it

Shown: the "before" from an upcoming redesign...
I guess it wasn't too too many years ago that Firefox's "Firebug" was the web developer's friend, the main way of getting insight into the piecesparts of a webpage, but Chrome's devtools have become the new standard.

One useful part of it is the emulation of various phones and devices, so that sites with specialized mobile modes can be accurately emulated. For one page I also wanted some fake-y "screenshots on phone" with the gadget's hardware bezel etc visible around the sides of the content. It wasn't easy to google for, but eventually I realized at the top right of the bar with the device picker was a "..." menu (except the dots were arranged vertically) and one of the options was "Show Device Frame"

So, just mentioning it here for reference for my future self, or in case it helps someone trying to google for it.

Tuesday, December 13, 2016

the return of star ratings to ios

3 months ago I complained that star ratings were dropped from iOS, but as of 10.2, they're back. More or less... you have to explicitly enable them by going to Settings | Music and toggling "Show Star Ratings" (Mystery: why doesn't Settings' find box show anything for "star"? Their search tool is ridiculously limited...)

Once you're there, this option gets added to the "..." menu when a song is playing...

Then it's a pretty simple dialog:

It's distinctly tougher to get to than previously, where tapping on the album cover image would flip the image, and then another tap would set the # of stars and flip it back with no need to hit "Done". People were complaining about the complexity on MacRumors, since setting a star rating used to feel like a more or less reasonable thing to do while driving, and now it's a bit on the fiddly side for that. Overall I think it supports my earlier speculation that Apple's UX folks felt there was too much confusion relative to overlap with the heart-based Like/Dislike options, which feed back into Apple's "music selected for you" AI in a way the stars don't - so they'll keep stars around for power users, but for most folks they want to keep them on the simpler pattern.

It's reassuring to see Apple is aware that this is an important feature for some of their users. (The earlier hack of using Siri to set the star rating, besides interrupting the playing song, wasn't complete for me because often I just want to check if I had already advanced a song to the 4- or 5-star level, and Siri wouldn't tell me what a song was already rated.) Hopefully this should ensure stars in iTunes will remain for the foreseeable future. (Critical to me because of my use of smart playlists.)

Friday, December 9, 2016

javascript30

https://javascript30.com/ - build 30 things in 30 days. Interesting!

freemarker template: safely pass optional freemarker boolean into javascript

New job! I started with CarGurus just this week. Great site if you want to both find cars in your area and know if you'd be getting a good deal or a bad deal.

Interesting culture here - super engineer-centric, features tend to be outlined and handed to coders rather than having a lot (or, any) independent graphic design work beforehand. Plus they have come up with solid answers to problems I've seen plaguing other places I've worked: for example, the way developers too often play with "toy" databases that don't reflect what's going on in production. For CarGurus, there are 3 options for our remote dev machines: slice, which is that minimalist subset that is sometimes good to work with on the server side; staging, which is a local weekly copy of production; and shared staging, which is the same but doesn't require everyone to do as much maintenance.

The technology is a mix of proven and reliable and more cutting edge. (Cutting-edge-wise I'm so relieved they're going with React and bypassed the whole Angular thing entirely.) On the server, that means some older stuff like Struts and FreeMarker Templating, but both are actually pretty good at what they do. FreeMarker predates JSPs but is now a well-maintained Apache project.

Still, it can be a little arcane. One problem was I wanted to set a variable on the FreeMarker side for Javascript to read as a boolean, but to have the absence of that variable be smoothly treated as 'false'. The code for that turned out to be:

<script>
var isModal = ${(feedbackAsModal?has_content)?then(feedbackAsModal?c,'false')};
</script>

A little wordy but not too bad.

Oh, plus CarGurus bring in lunch for everyone every day, so that's a plus, though I still haven't quite worked out how to manage my calorie counting...

UPDATE:
Senior coworker Jasper suggests:
var isModal = ${(feedbackAsModal!false)?string?js_string};

For me the challenge is remembering that FreeMarker is Java-y in its view of typing, even though it doesn't always look it- it kind of feels like other duck-typed languages I have more familiarity with, plus default variable assignments don't have to specify a type.

As I read through the FreeMarker docs, other idiosyncrasies people here have pointed out include "Freemarker has no default printing of booleans, you need to explicitly use ?string". Also it kinda hates null values - the ! operator is important, in that earlier example, with its meaning "use this substitute if the actual value is null"

Friday, December 2, 2016

advent calendar redux

advent2016.alienbill.com Once again, I made a virtual advent calendar! An original little digital toy for each day leading up to Christmas. You can come back daily to see what's unlocked.

I've been doing these for about 7 or so years. It's a fun way to remind myself that programming can be fun, it doesn't just have to be for work...

I might have started out with the best of the 25, Santa Hockey is actually kind of a fun little game.