Tuesday, July 18, 2017

thinking about xml vs json

Seeing an XML config file at work got me to thinking why JSON feels so much better than XML for so many people. I find it a fascinating topic, maybe because it seems like the industry movement towards more JSON is validating my personal biases...  One friend of my mine paraphrases it "well, you see, XML has way too many sharp pointy bracket bits, it's hard on the eyes".

I guess it's weird that XML lets you enforce discipline about what CAN be said (via validating schemas) but has less to say about how a coder should what they'd probably want to say...  namely "I am likely to want to serialize a lot of lists and key/value pairs".

It reminds me of when I first learned Perl, coming from a background of C (with a bit of BASIC and Logo growing up) - the concept of maps, regular expressions, strings as "first class" participants (vs C's  not "arrays of characters"), duck-typing, and not having to micromanage memory use were revelations. But especially maps (key-value pairs) - a hugely empowering concept.... trivially simple yet enormously powerful, which is about the definition of elegant. And that elegance is something that JSON leverages so well.

Googling a bit I found Stop Comparing JSON and XML which, honestly, sounds a little defensive to me. For some engineers, XML's precision and control just feels better, but it sounds like some fans feel they're on the wrong side of the trendlines, so it opens up like this:
Stop it! These things are not comparable. It's similar to comparing a bicycle and an AMG S65. Seriously, which one is better? They both can take you from home to the office, right? In some cases, a bicycle will do it better. 
The not so subtle implication being that XML is more like the $220K Mercedes and JSON the bike.

I'm not sure I agree that "JSON is a data format, XML is a language". The article points out some standard tools that XML comes with: XPath processors for pulling things out of a chunk of data, XML Schemas for validating (I guess that one out over DTD?),  XSL for transforming (and OH what a pain that can be, trying to use a pure-functional "I can't even use a conditional to set the initial value of a variable, because once I'm out of scope the conditional set up the variable went away")... I don't see those things as being intrinsic to the format, however.

Moving on - take a look at the example that article gave, JSON vs XML
{
  "id": 123,
  "title": "Object Thinking",
  "author": "David West",
  "published": {
    "by": "Microsoft Press",
    "year": 2004
  }
}

vs

<?xml version="1.0"?>

<book id="123">
  <title>Object Thinking</title>
  <author>David West</author>
  <published>
    <by>Microsoft Press</by>
    <year>2004</year>
  </published>
</book>

(The article says that's 140 vs 167 characters, but I put the latter at 189) ... anyway, back to my point that XML is a bit worse at suggesting a "best practice" of how something should encoded - because you're so often not sure if something "should" be an attribute or a child element. The article puts id as "metadata", but that seems kind of an arbitrary distinction to me. (Trying to think of what the rule of thumb takeaway is - data is the information that would have to exist in a different storage system, but metadata is sort of specific to that system?) I've certainly seen other folks who would have done something like
  <published year="2004">
    <publisher>Microsoft Press</publisher>
  </publishes>

and so coming into a place, trying to follow the previous developers' footsteps- the decision can be arbitrary, and thus hard to predict.

I remember thinking it weird how hard it was to write a Schema (or maybe a DTD?) that let the child elements be in any order; the tools I was using in the mid-aughts made it much simpler to insist on, say, "first title, THEN author, THEN published", rather then saying "there needs to be a title, author, and published but they can be in any order". It seemed odd to me, because the idea of maps were so in my head then, while this kind of stricter document definition felt weirdly like an obfuscated round of "fill in the blank".

Conversely, JSON is actually stricter - in the sense of it STRONGLY suggesting that keys of a map should be unique. It guides you to thinking in terms of maps and ordered lists (it's kind of interesting that there's not a strong concept of an unordered set in it, but obviously the interpreting system is free to ignore or embrace the order given.)

Still, I think a lot of the vehemence come from engineer's gut feelings, rather than any small set of arguments. Probably some of the same people who dislike duck-typing are more likely to prefer XML's style of strictness, and the ability to verify the semantic completeness of a document before having code interact with it. (Also worth checking out is stuff like my friend Leonard Richardson's O'Reilly book "RESTful Web APIs"; I suspect he feels the rise of JSON is a bit of a step backwards, in terms of making information available to all, and understandable by automated systems, and so he's interested in best of both world approaches that have the strengths of a JSON foundation while adding in some of the missing meta- aspects that tell you what you're actually looking at.)

You know I see a lot of the points I make here are often well covered in the comments of that Stop Comparing JSON and XML article. It's nice to have allies!






Wednesday, July 12, 2017

the robotic ruler of the river of no return

River Raid was one of the finest games produced for the Atari 2600. One of the first vertically scrolling shooters, this game was remarkably well designed. While the enemies (copters and boats and later small jets) could only threaten the player with menacing kamikaze moves upon approach, the constantly diminishing fuel supply would lead the player to recklessly hightail it down the "River of No Return" to pass over replenishing fuel depots, a tension-provoking detail most other games of the era couldn't match. And I am going to introduce you to the games indisputable conqueror.

First, a note about the game's author, Carol Shaw- the first professional female video game designer. This game is her singular masterpiece (I don't think many people really look back that fondly on "3-D Tic Tac Toe", and the 1-on-1 Pong-like action of her "Polo" tie-in game never saw the light of day...) This interview has her talking about her experience. But her peers thought she was great, designer Mike Albaugh said
I would have to include Carol Shaw, who was simply the best programmer of the 6502 and probably one of the best programmers period....in particular, [she] did the [2600] kernels, the tricky bit that actually gets the picture on the screen for a number of games that she didn't fully do the games for. She was the go-to gal for that sort of stuff.
As a guy who wrote an original Atari 2600 from scratch in assembly , I know how tricky that kernel stuff is... (and true confession, my game ended up having its kernel tweaked by genius Paul Slocum anyway.)

One of the cleverest bits of River Raid is its use of pseudorandom number generators to generate section after section of the river - this let the game pack in a consistent, huge game playing field even though the whole cartridge was only 4K bytes of ROM. The levels alternated between straight sections and split sections and went on practically forever.

Over a decade ago I got to wondering about how far the river went, and got so far as having B. Watson generate this image of the first 4 sections, guaranteed to bring a bit of nostalgia to the 80's gamer heart:
(Of course the funny thing about posting this kind of image is that River Raid scrolls from the bottom, but webpages scroll from the top...) That project to map out more of the river never went anywhere, but this AtariAge thread gets revived from time to time... and I would say, the indisputable Ruler of the River of No Return (and one of the participants in that thread) is one "Lord Tom"

For starters, here's Lord Tom's map of the first 600 river sections...

And how does Lord Tom know what the first 600 sections look like? I contacted him at AtariAge (such a damn fine resource!) and he said
To make the map, I wrote a Lua script for use in the BizHawk emulator that essentially cheated through the game with the plane offscreen somewhere, taking screen-shots of each enemy/terrain slot along the way (32 per map section). I assembled these into the big map with a simple Java app.
But that wasn't enough for Lord Tom. He's a member of the "TAS Community" - Tool Assisted Speedruns, folks who learn how to let machines help them drive through to the ending of games faster than any human ever could. They don't cheat - the actual code of the game is sacrosanct - but by abusing every input available to them they're like the crew of the Nebuchadnezzar getting ready to dive back into the Matrix, mastering the code behind the world that lets, say, Mario move like a crazy drug-fueld Ninja, or in Lord Tom's case, to build a frickin' robot to play the game better than any human (or 'bot) in history ever has. Specifically, to get the maximum possible score of 1,000,000 (or in Activision speak, !!!!!!) That looks like this:


To give that robot a script, he built a replica of River Raid in Java, one that could reproduce all the twists and turns and boats and helicopters and fuel tanks that that little cartridge's algorithm could churn out with incredible precision, and then used it to power something like the "Many Worlds Interpretation" of Quantum Physics, plotting out a millions of possible futures for each frame, then pruning and working the best 150,000 or so, until he got a damn near optimal path. (And to give you an idea of this robot's skill about this, not only does this well-nigh perfect path take an hour twenty to get to that million points, Activision would send you a patch designating you a "River Raider" if you sent in a photograph showing that you got 15,000!)

So, in his own words:
Yes, due to the technique I used for solving the game, I had to write a Java simulator, which I think ended up being something like 10,000 times faster than trying to do the bot computations through the emulator. And I only simulated the game's logic/state; I didn't actually output a display or sound, though in the grand scheme of things that would have been easy enough to do.

The solving algorithm focused heavily on fuel and (of course) score. Since fuel is consumed at the same rate regardless of speed, it's best to almost always go full throttle. There are a few terrain exceptions, and the other main exceptions are slow-downs to get extra fuel or manipulate which enemies move/don't move to make them easier to kill.

For fuel, I basically looked at the map and plotted out how far I'd get for each life (once fuel becomes rare, it's better score-wise to die for a full tank than to keep slowing down to milk depots). Then for various points along the route (e.g. section 5, 10th enemy) I'd specify a minimum fuel to have -- any solution paths with less fuel would be killed.

The only non-survivable states in the game relate to fuel, and then very limited times when e.g. you can't slow down fast enough to clear terrain, or avoid an enemy that's about to hit you.

Other than that, it was pure heuristic; 30 times a second it would simulate paths with each possible input, eliminate duplicates and deaths, and periodically score them and keep the best several thousand. To handle islands, I stipulated that a certain # of paths would always be kept alive on each side of the screen. As I recall, the algorithm would score and cull several times each section; it never really "looked ahead" at all, just periodically compared outcomes for 500,000 or so input possibilities and kept the best ones.

I think all in all, I calculate the bot simulated over 2 trillion game states to complete the game. 
You can read even more details at his TASVideos Submission Page, but I think you get the idea here.

Amazing. I've done Atari coding and even some Java-based "tool assistants" (to get photorealistic images into the long-lost site pixeltime, or to remove the scrolling credits from still backdrops) but nothing that comes ANYWHERE NEAR what Lord Tom (or Carol Shaw, for that matter) has done.

the end of the genius bar

The Cambridgeside Galleria is a surprisingly durable shopping mall undergoing a makeover. (I assume to make it look less 90s brassy bling and more Star Trek/Apple Store-ish) Their Apple Store just reopened after a long time closed for renovations and expansion. Now it looks like this:



There's more tables, better use of the sidewalls for product display but, most strikingly: no Genius Bar at the back. They do have a bunch of these little cube stools:

and you can see one of the Genius-y worktables in the back, the worktables are where they actually do their magic. Reports are some of the idea is to remove that old counter as a barrier, and literally get the Genius on the same side as the customer.

It's an interesting idea, and maybe a gamble to get rid of that visible base of support. I've long thought that the Genius Bar was the secret sauce, the "unfair advantage" Apple sported its rivals Android and Windows. Microsoft has most obviously started to follow Apple's lead with its own stores, but since it doesn't control the hardware as much as Apple does, I wonder if it feels like more of a mixed bag. And for Android? The people at the various carrier's stores never seem like they're going to be quite as focused or with it as the Apple folk, though I'm sure some are quite good.

Monday, July 10, 2017

what's new with you?

At work I got to thinking about that if there's an international icon for "NEW" that doesn't use icons (new as opposed to used, obviously a plus sign is pretty common for "add another")

It reminded me of an old videogame, Captain Blood, where you had to travel a galaxy communicating with aliens, but you could only pick from a bank of 28 pictograph icons...
Interesting concept...

Thursday, July 6, 2017

porchfest poster cheats and tweaks

So this marks the 4th year of be doing the web (and print) work for the Jamaica Plain Porchfest.

Like I said last year, I'm not a print designer but I play one on the web - last year's innovation was splitting the collection of porches north and south and putting a map and performance list on either side of the print map, rather than having a big map on one side and making people flip flip flip to see who was playing where in the block schedule.

In general, this year's print map looks fairly similar to last's...



This year my energy was directed into making most of the website reusable on a year-after-year basis. Prior to this, I would start with a more or less blank slate and then copy in old files as needed. Also I tended to have all the data lumped in the same set of folders as the content. I knew I could do better.

The core of it is still good, reliable, rugged JSON files in directories, but this year I moved them (and the band photos) to a separate root folder from my main content, and so now every script looks at  a hidden ".dbroot" folder containing the path to that year's data.

I address one big usability issue: in prior years I had the webpage and poster number porches on a strict North to South basis. This had a big flaw: it was hard to follow when  sometimes house "n" would be way on the east side but house "n+1" would be on the west, and then house "n+2" would be back to the west. I wanted a more human-friendly way of clustering things, so a group of houses consecutively numbered would all be near each other. I wasn't feeling smart enough to teach a computer to do all that clustering work so I hacked an existing map display page so I could click each house in order, and that would assign it its number. (This will also make it easier to maintain ordering if a porch is added or removed late; the people running the event consider it important to keep the porch numbering on the print maps consistent with whatever is online, which wouldn't happen if the computer was reassigning numbers based on latitude every time the page loaded.)

Another issue is that sometimes one address is supporting multiple events, or maybe it is hosting musical performance (marked by an orange house icon) but is also one of the event sponsors (marked by a yellow circle with a letter). Previously I then made a new type of icon (yellow house) but this year I realized it was better to just adjust icons' positions, so they were abutting on the map rather tha overlapping. It's not like they icons need to be precise, we're not targeting drone strikes, just making sure people get to the right side of the street in the right area until they can hear and follow the music themselves.

I'd "cheated" latitude and longitude years prior for similar reasons, but it was a serious pain in the butt...  a single degree of latitude change is about 70 miles, so to make "one icon over" adjustments on a map you have to deal with thousandths of a degree, and it was terribly fiddly. Also it's hard to translate from x and y (i.e. what I'm looking at on screen) into Lat/Long, especially since North America is in the negative Longitudes, and I the developer would have to remember which way was positive and which was negative. BLEH!

So in my JSON database, I added a "xycheat" field as an array of two numbers which was then read by code like this
   var cheat = location.xycheat;
   if(cheat){
     location.long += cheat[0] / 2500;
     location.lat -= cheat[1] / 3500;
  }

Those values meant xycheat = [1,.5]; would move a porch roughly one icon worth east, and half an icon's worth south. Much easier than the "tweak a value in the thousandth of decimal, reload, check where it landed, repeat".

Finally, I almost got bitten at the last minute by bitrot. Lacking print tools or the know how to use them, I tend to assemble the parts of the print map on a big webpage - map and block schedule, and then finesse the assemblage by hand. To get it at closer to 300 dpi for print vs the 72 dpi web standards are based on, I put a "zoom:3" on the body, then (as I described earlier) use the headless browser Phantomjs to make an oversized screenshot.

This year - and of course this is all at the last minute when I assume I'm on the finish line... phantom didn't work. I was getting


Assertion failed: (_consumed <= scratch_size), function _hb_coretext_shape, file src/hb-coretext.cc, line 764.

No idea. Luckily downloading the latest version fixed it (I found it a little easier to get the latest via their download page rather than homebrew where I loaded it before.)

Once that was settled I was still getting this error:
ReferenceError: Can't find variable: google

Not sure if it was tied into me using https on all my sites now or what, but googling I found 
./phantomjs  --ignore-ssl-errors=true  phantom_view.js 
which seemed to fix it. Phew! I'm not sure what my plan B would have been - maybe just coping with screen resolution print - highly suboptimal.

The final site is looking a little long in the tooth (especially from the inside... panicked coding every 4 years doesn't always lead to the best engineering)  despite the still pretty decent mobile support. So next year maybe I'll focus on clean up, or even try (maybe) making some kind of app, though android support is going to be a pain for this Apple fanboy.