Tuesday, July 18, 2017

thinking about xml vs json

Seeing an XML config file at work got me thinking about why JSON feels so much better than XML to so many people. I find it a fascinating topic, maybe because the industry's movement towards JSON seems to validate my personal biases... One friend of mine puts it this way: "well, you see, XML has way too many sharp pointy bracket bits, it's hard on the eyes".

I guess it's weird that XML lets you enforce discipline about what CAN be said (via validating schemas) but has less to say about how a coder should say what they'd probably want to say... namely "I am likely to want to serialize a lot of lists and key/value pairs".
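To make the lists-and-maps point concrete, here's a minimal Python sketch using only the standard library. The `config` data and the `to_xml` helper are hypothetical, made up for illustration. JSON has a ready-made answer for a map containing a list; for XML, every element name below (`ports`/`port`, one element per env key) is a convention I had to invent, and someone else could reasonably invent different ones.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical "lists and key/value pairs" payload.
config = {"name": "demo", "ports": [80, 443], "env": {"DEBUG": "0"}}

# JSON: maps become objects, lists become arrays; no naming decisions needed.
as_json = json.dumps(config, sort_keys=True)

def to_xml(data: dict) -> str:
    """Hypothetical serializer: every tag choice here is a made-up convention."""
    root = ET.Element("config")
    ET.SubElement(root, "name").text = data["name"]
    ports = ET.SubElement(root, "ports")
    for port in data["ports"]:
        # An invented item tag; XML itself has no notion of "list item".
        ET.SubElement(ports, "port").text = str(port)
    env = ET.SubElement(root, "env")
    for key, value in data["env"].items():
        # Could just as well be <var name="DEBUG">0</var>; nothing says which.
        ET.SubElement(env, key).text = value
    return ET.tostring(root, encoding="unicode")

print(as_json)
print(to_xml(config))
```

The JSON round-trips with a plain `json.loads`; getting the XML back into a dict would need a reader that knows all of the conventions above.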

It reminds me of when I first learned Perl, coming from a background of C (with a bit of BASIC and Logo growing up) - the concepts of maps, regular expressions, strings as "first class" participants (vs C's "arrays of characters"), duck-typing, and not having to micromanage memory were revelations. But especially maps (key-value pairs): a hugely empowering concept... trivially simple yet enormously powerful, which is about the definition of elegant. And that elegance is something that JSON leverages so well.

Googling a bit, I found "Stop Comparing JSON and XML", which, honestly, sounds a little defensive to me. For some engineers, XML's precision and control just feel better, but it sounds like some fans feel they're on the wrong side of the trendlines, so the article opens like this:
Stop it! These things are not comparable. It's similar to comparing a bicycle and an AMG S65. Seriously, which one is better? They both can take you from home to the office, right? In some cases, a bicycle will do it better. 
The not-so-subtle implication being that XML is more like the $220K Mercedes and JSON the bike.

I'm not sure I agree that "JSON is a data format, XML is a language". The article points out some standard tools that XML comes with: XPath processors for pulling things out of a chunk of data, XML Schemas for validating (I guess that one won out over DTD?), and XSL for transforming (and OH what a pain that can be, working in a pure-functional style where I can't even use a conditional to set the initial value of a variable, because once I'm out of the conditional's scope, the variable it set up is gone)... I don't see those things as being intrinsic to the format, however.
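For a feel of what "pulling things out of a chunk of data" looks like on each side, here's a small Python sketch using the same book record as the article's example. It uses the standard library's ElementTree, which only implements a limited subset of XPath (full XPath 1.0 needs a third-party library such as lxml). The variable names are just for illustration.

```python
import json
import xml.etree.ElementTree as ET

BOOK_JSON = ('{"id": 123, "title": "Object Thinking", "author": "David West", '
             '"published": {"by": "Microsoft Press", "year": 2004}}')
BOOK_XML = ('<book id="123"><title>Object Thinking</title><author>David West</author>'
            '<published><by>Microsoft Press</by><year>2004</year></published></book>')

book = json.loads(BOOK_JSON)
root = ET.fromstring(BOOK_XML)

# JSON: plain dict indexing; numbers arrive as numbers.
publisher_json = book["published"]["by"]
year_json = book["published"]["year"]

# XML: an XPath-style path expression; everything arrives as text.
publisher_xml = root.findtext("published/by")
year_xml = int(root.findtext("published/year"))  # conversion is on us

# Attributes are a separate lookup, so you have to know which one "id" is.
book_id = int(root.get("id"))

print(publisher_json, year_json, publisher_xml, year_xml, book_id)
```

Either way the query is the easy part; the difference is mostly that the XML side needs to know whether a field is an attribute or an element, and what type its text should become.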

Moving on - take a look at the example that article gave, JSON vs XML:
{
  "id": 123,
  "title": "Object Thinking",
  "author": "David West",
  "published": {
    "by": "Microsoft Press",
    "year": 2004
  }
}

vs

<?xml version="1.0"?>

<book id="123">
  <title>Object Thinking</title>
  <author>David West</author>
  <published>
    <by>Microsoft Press</by>
    <year>2004</year>
  </published>
</book>
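For anyone who wants to check character counts for these two snippets, here's a quick Python sketch. It counts each snippet exactly as laid out above: two-space indents, the blank line after the XML declaration, newline characters between lines but none at the end. Other reasonable counting choices (a trailing newline, tabs, minified output) give different numbers.

```python
# The two snippets line by line, as they appear above.
JSON_LINES = [
    "{",
    '  "id": 123,',
    '  "title": "Object Thinking",',
    '  "author": "David West",',
    '  "published": {',
    '    "by": "Microsoft Press",',
    '    "year": 2004',
    "  }",
    "}",
]
XML_LINES = [
    '<?xml version="1.0"?>',
    "",
    '<book id="123">',
    "  <title>Object Thinking</title>",
    "  <author>David West</author>",
    "  <published>",
    "    <by>Microsoft Press</by>",
    "    <year>2004</year>",
    "  </published>",
    "</book>",
]

# Join with newlines between lines (no trailing newline) and count characters.
json_snippet = "\n".join(JSON_LINES)
xml_snippet = "\n".join(XML_LINES)

print(len(json_snippet), len(xml_snippet))  # 139 189
```

Adding a trailing newline to each makes it 140 and 190.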

(The article says that's 140 vs 167 characters, but I put the latter at 189.) Anyway, back to my point that XML is a bit worse at suggesting a "best practice" for how something should be encoded, because you're so often not sure whether something "should" be an attribute or a child element. The article treats id as "metadata", but that seems like kind of an arbitrary distinction to me. (Trying to think of what the rule-of-thumb takeaway is: data is the information that would have to exist in a different storage system, but metadata is sort of specific to this one?) I've certainly seen other folks who would have done something like
  <published year="2004">
    <publisher>Microsoft Press</publisher>
  </published>

and so, coming into a place and trying to follow in the previous developers' footsteps, you find the decision can be arbitrary, and thus hard to predict.
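Here's what that unpredictability costs on the reading side, as a small Python sketch. `published_year` is a hypothetical helper, and the two XML strings are the two encodings discussed above: the article's child-element style and the attribute style. Code that consumes both has to check both places.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Two equally plausible encodings of the same facts.
STYLE_A = "<book><published><by>Microsoft Press</by><year>2004</year></published></book>"
STYLE_B = '<book><published year="2004"><publisher>Microsoft Press</publisher></published></book>'

def published_year(xml_text: str) -> Optional[int]:
    """Hypothetical helper: accept the year as either an attribute or a child element."""
    published = ET.fromstring(xml_text).find("published")
    if published is None:
        return None
    year = published.get("year")  # attribute style: <published year="2004">
    if year is None:
        year = published.findtext("year")  # child style: <year>2004</year>
    return int(year) if year else None

print(published_year(STYLE_A), published_year(STYLE_B))  # 2004 2004
```

The publisher's name moved too (`by` vs `publisher`), so a real reader would need the same either/or dance for that field. In JSON the equivalent question mostly doesn't come up: it's a key either way.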

I remember thinking it weird how hard it was to write a Schema (or maybe a DTD?) that let the child elements be in any order; the tools I was using in the mid-aughts made it much simpler to insist on, say, "first title, THEN author, THEN published", rather than saying "there needs to be a title, author, and published, but they can be in any order". It seemed odd to me, because the idea of maps was so in my head then, while this kind of stricter document definition felt weirdly like an obfuscated round of "fill in the blank".
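The difference between those two kinds of rule is easy to state in code. Here's a Python sketch with two hypothetical checker functions (not a schema validator). For what it's worth, XSD does have both: `xs:sequence` for the fixed-order case and `xs:all` for the any-order case, though in XSD 1.0 each child listed in `xs:all` may appear at most once.

```python
import xml.etree.ElementTree as ET
from typing import Sequence

# Hypothetical required children for a <book>, borrowed from the example above.
REQUIRED = ("title", "author", "published")

def matches_sequence(elem: ET.Element, names: Sequence[str]) -> bool:
    """Order-sensitive, in the spirit of xs:sequence:
    the children must be exactly these names, in exactly this order."""
    return [child.tag for child in elem] == list(names)

def matches_any_order(elem: ET.Element, names: Sequence[str]) -> bool:
    """Order-insensitive, roughly in the spirit of XSD 1.0 xs:all:
    each (distinct) name must appear exactly once, in any order."""
    return sorted(child.tag for child in elem) == sorted(names)

in_order = ET.fromstring("<book><title/><author/><published/></book>")
shuffled = ET.fromstring("<book><published/><title/><author/></book>")

print(matches_sequence(in_order, REQUIRED), matches_any_order(in_order, REQUIRED))  # True True
print(matches_sequence(shuffled, REQUIRED), matches_any_order(shuffled, REQUIRED))  # False True
```

The any-order version is the one that matches the "it's really a map" mental model; the sequence version is the fill-in-the-blank form.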

Conversely, JSON is actually stricter, in the sense that it STRONGLY suggests that keys of a map should be unique. It guides you to think in terms of maps and ordered lists (it's kind of interesting that there's no strong concept of an unordered set in it, but obviously the interpreting system is free to ignore or embrace the order given).
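How "strongly" depends on the parser. The JSON spec (RFC 8259) says names within an object SHOULD be unique, and Python's `json` module, by default, doesn't raise on duplicates; it keeps the last value. Here's a sketch of opting in to strictness through the documented `object_pairs_hook` parameter; `reject_duplicate_keys` is a hypothetical name. The last lines show the list side: arrays keep their order, and treating one as a set is up to the reader.

```python
import json
from typing import Any, Dict, List, Tuple

def reject_duplicate_keys(pairs: List[Tuple[str, Any]]) -> Dict[str, Any]:
    """Hypothetical object_pairs_hook: build a dict, but refuse repeated keys."""
    result: Dict[str, Any] = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key in JSON object: {key!r}")
        result[key] = value
    return result

DUPLICATED = '{"author": "David West", "author": "Someone Else"}'

# Default behaviour: no error, the last value for a repeated name wins.
lenient = json.loads(DUPLICATED)
print(lenient)  # {'author': 'Someone Else'}

# Opting in to strictness.
try:
    json.loads(DUPLICATED, object_pairs_hook=reject_duplicate_keys)
except ValueError as err:
    print("rejected:", err)

# Arrays keep their order; any "set" semantics are the consumer's choice.
tags = json.loads('{"tags": ["b", "a", "b"]}')["tags"]
print(tags, sorted(set(tags)))  # ['b', 'a', 'b'] ['a', 'b']
```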

Still, I think a lot of the vehemence comes from engineers' gut feelings rather than from any small set of arguments. Probably the same people who dislike duck-typing are more likely to prefer XML's style of strictness, and the ability to verify the semantic completeness of a document before having code interact with it. (Also worth checking out is stuff like my friend Leonard Richardson's O'Reilly book "RESTful Web APIs"; I suspect he feels the rise of JSON is a bit of a step backwards in terms of making information available to all and understandable by automated systems, and so he's interested in best-of-both-worlds approaches that have the strengths of a JSON foundation while adding in some of the missing meta- aspects that tell you what you're actually looking at.)

You know, I see that a lot of the points I make here are well covered in the comments of that "Stop Comparing JSON and XML" article. It's nice to have allies!





