Saturday, July 16, 2016

Parsing iTunes Library.xml into JSON

Is it fair to say iTunes' Library.xml situation is a hot mess, representing all the reasons why the early-2000s embrace of XML gave way to an affection for JSON? I'll let you be the judge...

I've grown to distrust iTunes, sadly. A month or two ago I noticed that one album I remembered having (the Mario Bros. movie soundtrack... don't judge) was missing. The other week I realized a King Missile album wasn't there, and today it was a hard-to-replace album from a college a cappella group I once worked with (again, don't judge).

Some of these files I can recover from old backups. Plus, I had a copy of iTunes' "Library.xml" (what you get when you go to "File | Library | Export Library") from when I switched my collection over to my Mac, so I should be able to figure out what has flown the coop.

Except, look at this XML file excerpt (again, no judging the music choice, OK?):
<dict>
        <key>Track ID</key><integer>3853</integer>
        <key>Name</key><string>Diggin' Your Scene</string>
        <key>Artist</key><string>Smash Mouth</string>
        <key>Album Artist</key><string>Smash Mouth</string>
        <key>Composer</key><string>Gregory Camp</string>
        <key>Album</key><string>Astro Lounge</string>
        <key>Genre</key><string>Alternative Pop</string>
        <key>Kind</key><string>MPEG audio file</string>
        <key>Size</key><integer>3803563</integer>
        <key>Total Time</key><integer>190066</integer>
      [...]
</dict>

I think the number one thing I don't like about XML is when it relies on node proximity for semantic values, especially since XML has a rather rich ability to label metadata as attributes or the like. Here it's roughly a flat list: key tag, value tag (of some kind), key tag, value tag (of some kind), and so on. Wouldn't something like
<dict>
  <item key="Track ID" type="integer">3853</item>
</dict>
make more sense? Or, if that was too difficult to validate (and heaven knows XML engineers loved their validations), how about making the key an attribute:
<dict>
  <integer key="Track ID">3853</integer>
</dict>
Or, at the very least, wrapping each key/value pair so the whole thing is an iterable set?
<dict>
  <item><key>Track ID</key><integer>3853</integer></item>
</dict>
Do I just not "get" the magic of XML in this case?
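To make the proximity complaint concrete, here is a small sketch (plain JavaScript, no jQuery) of what consuming Apple's flat layout involves. It assumes the children of one track's <dict> have already been pulled out, in document order, into hypothetical `{tag, text}` objects; the function names are made up for illustration and are not part of any iTunes or plist API. Because each value's type lives only in its element name, the sketch also uses that name to turn <integer>, <real>, <true/> and <false/> back into numbers and booleans.

```javascript
// Hypothetical helper: turn the flat children of a plist <dict> into an object.
// `children` is an array of {tag, text} objects in document order, e.g.
// [{tag: "key", text: "Track ID"}, {tag: "integer", text: "3853"}, ...].
function plistDictToObject(children) {
  if (children.length % 2 !== 0) {
    throw new Error("Unpaired element: a <dict> must alternate <key> and value");
  }
  var result = {};
  // Walk two elements at a time: the meaning of each value depends entirely
  // on it sitting right after its <key>.
  for (var i = 0; i < children.length; i += 2) {
    var keyNode = children[i];
    var valueNode = children[i + 1];
    if (keyNode.tag !== "key") {
      throw new Error("Expected <key> at position " + i + ", got <" + keyNode.tag + ">");
    }
    result[keyNode.text] = convertValue(valueNode);
  }
  return result;
}

// Use the value element's tag name to restore the type that plain text loses.
function convertValue(node) {
  switch (node.tag) {
    case "integer": return parseInt(node.text, 10);
    case "real": return parseFloat(node.text);
    case "true": return true;   // <true/> has no text content at all
    case "false": return false; // neither does <false/>
    default: return node.text;  // <string>, <date>, <data>: kept as text
  }
}
```

The whole thing hinges on strict key/value alternation: drop or insert one element and every pair after it shifts. With the <item> wrapper suggested above, each pair could be read on its own.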

Anyway, I decided to get this into JSON, where I'm most comfortable manipulating things. My first attempt used sparkbuzz's generic jQuery-xml2json, but it choked pretty hard on Apple's mess, giving me separate piles of keys, integers, and string values. I know you can use jQuery's selectors on XML, so I went that route instead, with an in-progress script embedded in a web page (using the old Python mini-webserver trick to serve up the files). It doesn't yet compare the old Library contents with the new, but that should be pretty easy once everything looks like this JSON:
  {
   "Track ID": "3853",
   "Name": "Diggin' Your Scene",
   "Artist": "Smash Mouth",
   "Album Artist": "Smash Mouth",
   "Composer": "Gregory Camp",
   "Album": "Astro Lounge",
   "Genre": "Alternative Pop",
   "Kind": "MPEG audio file",
   "Size": "3803563",
   "Total Time": "190066",
  [...]
  }
Like a breath of fresh air, ain't it?

Anyway, here's the relevant bit of script. (small.xml was my test file; when I'm ready to gear up to the real files, I'll add multiple calls in the "$.when()" so that the files are loaded in parallel and work doesn't begin until all of them are loaded.)

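var library = {};
$.when(
    $.ajax({
        url: 'small.xml',
        dataType: 'xml',   // jQuery hands success() an already-parsed XMLDocument
        success: function(response) {
            xmlToLibraryJson("small", response);
        }
    })
).then(function() {
    // just a placeholder to show my results
    $("#guts").append(JSON.stringify(library, null, " "));
});

function xmlToLibraryJson(file, response) {
    // response is already an XMLDocument (dataType: 'xml'), so no $.parseXML() needed
    var $xml = $(response);
    var lib = [];
    // The <dict> right after <key>Tracks</key> holds one <dict> per track
    $xml.find("key:contains('Tracks')+dict > dict").each(function(i, elem) {
        var $dict = $(elem);
        var dict = {};
        // Each <key> is immediately followed by its value element (<string>, <integer>, ...)
        $dict.find("key").each(function(j, thiskey) {
            var $value = $(thiskey).next();
            var tag = $value.prop("tagName");
            // Booleans are empty <true/> or <false/> elements, so .text() alone would give ""
            dict[$(thiskey).text()] = (tag === "true" || tag === "false") ? tag : $value.text();
        });
        lib.push(dict);
    });
    library[file] = lib;
}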
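For the multi-file version, outside of jQuery the same "load everything in parallel, start work only when all of it has arrived" idea that $.when() provides can be sketched with the standard Promise.all. This is a swapped-in technique, not what the script above uses, and `loadAllThen`, `loadFile` and `processAll` are hypothetical names; `loadFile` is assumed to be any function returning a Promise of one file's contents.

```javascript
// Hypothetical helper: start every load at once, then process them together.
// loadFile(name) must return a Promise (for example, one wrapping fetch()).
function loadAllThen(fileNames, loadFile, processAll) {
  // Promise.all resolves only after every load has resolved, and keeps
  // results in the same order as fileNames (like the arguments $.when passes).
  return Promise.all(fileNames.map(loadFile)).then(function (contents) {
    var byName = {};
    fileNames.forEach(function (name, i) {
      byName[name] = contents[i];
    });
    return processAll(byName);
  });
}
```

Unlike the script above, which does its parsing in each request's success callback, this hands every loaded file to a single callback at the end; if any one load fails, the returned Promise rejects and `processAll` never runs.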

So, that's it. It was a little harder to assemble than it looks; I haven't had to use .next() and fancy proximity-based CSS-ish selectors (like the adjacent-sibling "+") that much before. I still have to accept the fact that if I add old songs back into iTunes, there's virtually no way to get iTunes to not treat them as new songs, and that will absolutely screw up my rolling "new music" Smart Playlists. But it's better than not having my old music at all.
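For the comparison step that isn't written yet, here is one possible sketch working on arrays of track objects shaped like the JSON above. Identifying a track by Artist + Album + Name is an assumption on my part (Track ID may well differ between two separate libraries), and the function names are hypothetical.

```javascript
// Hypothetical identity for a track: Artist + Album + Name.
// Array.join turns missing (undefined) fields into empty strings.
function trackKey(track) {
  return [track["Artist"], track["Album"], track["Name"]].join("\u0000");
}

// Tracks in the old export with no match in the new one.
function findMissingTracks(oldTracks, newTracks) {
  var present = new Set(newTracks.map(trackKey));
  return oldTracks.filter(function (track) {
    return !present.has(trackKey(track));
  });
}

// Group missing tracks by album, so entirely lost albums stand out.
function groupByAlbum(tracks) {
  var albums = {};
  tracks.forEach(function (track) {
    var artist = track["Album Artist"] || track["Artist"] || "?";
    var album = artist + " / " + (track["Album"] || "?");
    (albums[album] = albums[album] || []).push(track["Name"]);
  });
  return albums;
}
```

With two exports loaded under hypothetical labels, something like `groupByAlbum(findMissingTracks(library["old"], library["new"]))` would list what has flown the coop, album by album. Any retagging between the two exports (a renamed album, a corrected artist) would show up as a false "missing" with this exact-match key.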

1 comment:

  1. I posted this on Facebook and pinged my friend Tim Kutz, who thought I was a bit too harsh on XML: this was probably a poor design choice on the iTunes side, but I shouldn't blame the format. (I think he gets good use out of XML and its ability to self-describe in his field, which involves a lot of medical documentation.)

    My response was this:
    Hm. Good question. I guess the specifics of XML led to widespread "cultural" problems and poor designs, of which this is just one example.

    In particular, two related issues:
    1. The ordering of tags (especially for purposes of validation) often mattered; many validators would say "this is invalid XML because the order is wrong", even if all the fields were present.

    This ties into 2:
    2. An XML document often didn't neatly slot into a programming language's in-memory data structures. Of course there were marshallers and unmarshallers, and in some cases those would handle things fine, but often not, as I saw today when I tried to use a generic "xml2json" filter here.

    It's funny: both of these have a common root in "XML lets (encourages?) you add semantic meaning to the juxtaposition of elements" (vs. setting up a hierarchy or using attributes). Admittedly my brain is now forever molded into thinking in terms of maps and lists and combinations of those, but (or "and so"?) I can't think of much that XML can carry that JSON couldn't.

    Similarly, validation was built into the way XML was used. You could of course build a data verifier for JSON, but with XML, people got all hot for the idea of enforcing semantic correctness, and added whole new formats (multiple competing ones) to describe it. In general, I'm in favour of being VERY parsimonious with new syntaxes I expect programmers to know (unlike some toolkits, *cough* *cough* Angular). In JSON you'd probably build that verifier in the language at hand instead.

    In short, I felt vindicated that in the mindspace of the modern web, JSON ate XML's lunch. (Never underestimate the power of easy parsing and editing ... it's what keeps me using tab-delimited text files more often than I should :-D )
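As a rough illustration of the "build the verifier in the language at hand" point from the reply above, here is a sketch of a JSON-side check written in ordinary JavaScript instead of a separate schema language. The required fields and their types are a hypothetical example, not anything iTunes defines. Note that it looks fields up by name, so key order never matters.

```javascript
// Hypothetical "schema": field name -> expected typeof result.
var TRACK_FIELDS = {
  "Track ID": "number",
  "Name": "string",
  "Artist": "string",
  "Total Time": "number"
};

// Returns a list of problems; an empty list means the track passed.
function validateTrack(track) {
  var errors = [];
  Object.keys(TRACK_FIELDS).forEach(function (field) {
    if (!(field in track)) {
      errors.push("missing " + field);
    } else if (typeof track[field] !== TRACK_FIELDS[field]) {
      errors.push(field + " should be a " + TRACK_FIELDS[field]);
    }
  });
  return errors;
}
```

Run against the script's output as it stands, where every value comes back as a string from .text(), this would flag "Track ID" and "Total Time" as non-numbers, which is the kind of thing you'd then fix in code rather than in a schema file.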
