Tuesday, May 17, 2016

debugging JSON

Iffy JSON can be a serious pain in the butt. The parser that jQuery uses by default is extremely picky and prone to failing silently, and some sources of JSON omit ALL whitespace, producing lines hundreds of thousands of characters long that swamp many text editors (which really weren't designed for that). Finding the few garbage characters that cause the choke can be a serious chore.
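When the file is one giant line, it can help to let a script hunt for the usual suspects instead of scrolling. Here is a minimal sketch, assuming the input is read as raw bytes. It flags raw control characters other than tab, LF and CR (JSON never allows those unescaped) plus any non-ASCII byte (legal in UTF-8 JSON, but a frequent source of encoding trouble). The helper name find_suspicious_chars and the script name find_suspicious.pl are just illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offset, code and surrounding context of
# every raw control character (other than tab/LF/CR) and every non-ASCII
# character in $text.
sub find_suspicious_chars {
    my ($text, $context) = @_;
    $context //= 10;    # characters of context to show on each side
    my @hits;
    while ($text =~ /([\x00-\x08\x0B\x0C\x0E-\x1F]|[^\x00-\x7F])/g) {
        my $offset = pos($text) - 1;    # pos() is just past the match
        my $start  = $offset > $context ? $offset - $context : 0;
        push @hits, {
            offset  => $offset,
            code    => sprintf('0x%02X', ord $1),
            context => substr($text, $start, 2 * $context + 1),
        };
    }
    return @hits;
}

# Usage: perl find_suspicious.pl bands.raw.json
if (@ARGV) {
    local $/;    # slurp mode: read the whole file at once
    my $text = <>;
    for my $hit (find_suspicious_chars($text)) {
        printf "offset %d: %s near [%s]\n", @{$hit}{qw(offset code context)};
    }
}
```

Reporting offsets rather than line numbers is deliberate: in minified JSON, everything is on "line 1" anyway.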

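http://json.parser.online.fr/ has been my default "make sure this works" check. I like that it evaluates the input two ways, because sometimes a block of "almost JSON" will choke $.getJSON yet still work as plain JavaScript. Protip: if you precede the block with var someVar = and end it with ;, then include it as a .js file, it gets by, since a JavaScript literal tolerates things strict JSON rejects, such as \' and \x3c escapes.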
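If you end up doing that wrap more than once, a tiny sketch of the step might look like this. The helper name wrap_as_js and the script name are hypothetical, and someVar is just the placeholder name from the tip. Keep in mind that the browser runs the resulting file as code, so this is only reasonable for data you trust.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an "almost JSON" blob into a JavaScript
# statement that assigns it to a global variable.
sub wrap_as_js {
    my ($var_name, $text) = @_;
    $text =~ s/\s+\z//;    # drop trailing whitespace so the ";" hugs the value
    return "var $var_name = $text;\n";
}

# Usage: perl wrap_as_js.pl bands.raw.json > bands.js
if (@ARGV) {
    local $/;    # slurp the whole file
    my $text = <>;
    print wrap_as_js('someVar', $text);
}
```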

That's a suboptimal hack, and that site, while sometimes useful, does a poor job of telling you which characters are causing the choke. Recently I discovered http://jsonlint.com/, and it's pretty great: it does a "prettify" step BEFORE checking whether everything is valid, so the error it reports lands on a specific, readable line, and it gives a more or less reasonable explanation of what's wrong.
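For a rough local stand-in, here is a sketch that validates with the core JSON::PP module (bundled with Perl since 5.14) and turns its error into a line and column. It doesn't prettify the way jsonlint does; it relies on JSON::PP's decode errors containing "at character offset N" and maps that offset back into the file. The helper name locate_json_error is hypothetical.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: strictly decode $text; return nothing if it is valid,
# otherwise a hashref with the parser's message and, when the message includes
# a character offset, the 1-based line and column of that offset.
sub locate_json_error {
    my ($text) = @_;
    eval { JSON::PP->new->decode($text); 1 } and return;    # valid JSON
    my $err = $@;
    my ($offset) = $err =~ /at character offset (\d+)/;
    return { message => $err } unless defined $offset;
    my $before = substr($text, 0, $offset);
    my $line   = 1 + ($before =~ tr/\n//);                  # count newlines before it
    my $column = $offset - (rindex($before, "\n") + 1) + 1;
    return { message => $err, offset => $offset, line => $line, column => $column };
}

# Usage: perl locate_json_error.pl bands.json
if (@ARGV) {
    local $/;    # slurp the whole file
    my $text    = <>;
    my $problem = locate_json_error($text);
    if (!$problem) {
        print "valid JSON\n";
    } elsif (defined $problem->{line}) {
        print "line $problem->{line}, column $problem->{column}: $problem->{message}";
    } else {
        print $problem->{message};
    }
}
```

On a minified file the line will always be 1, so the column (or the offset) is the number that matters there.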

Anyway, here is some Perl I hacked together in a hurry for my work with Somerville Porchfest, to clean up the "almost JSON" they're handing me:

use strict;
use warnings;

# Clean up the "almost JSON" line by line.
open(my $in,  '<', 'bands.raw.json') or die "Can't read bands.raw.json: $!";
open(my $out, '>', 'bands.json')     or die "Can't write bands.json: $!";
while (defined(my $line = <$in>)) {
    $line =~ s/\\'/'/g;            # replace \' (not a legal JSON escape) with '
    $line =~ s/\\"/'/g;            # replace \" with '
    $line =~ s/\\x3c/</g;          # replace \x3c with <
    $line =~ s/\\x3e/>/g;          # replace \x3e with >
    $line =~ s/\\x26/&amp;/g;      # replace \x26 with &amp;
    $line =~ s/[^[:ascii:]]//g;    # ASCII only, please
    $line =~ s/\\r/ /g;            # replace embedded \r with a space
    $line =~ s/\\n/ /g;            # replace embedded \n with a space
    $line =~ s/\t/    /g;          # replace raw tabs with spaces
    print {$out} $line;
}
close $out or die "Can't close bands.json: $!";
close $in;

I've seen worse! That [:ascii:] line, which I picked up via this PerlMonks thread, seems like it will be especially useful in the future.
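One caveat: [:ascii:] stripping throws away legitimate characters too, so an accented band name loses its accents. If that matters, a sketch of a gentler alternative is to decode the line as UTF-8 and rewrite each non-ASCII character as a JSON \uXXXX escape (a surrogate pair above U+FFFF), which keeps the output pure ASCII. This assumes the input really is UTF-8; the helper names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: UTF-8 bytes in, pure-ASCII text out, with every
# non-ASCII character rewritten as a JSON \uXXXX escape.
sub escape_non_ascii {
    my ($utf8_bytes) = @_;
    my $text = decode('UTF-8', $utf8_bytes);    # malformed bytes become U+FFFD
    $text =~ s/([^\x00-\x7F])/_json_u_escape(ord $1)/ge;
    return $text;
}

# Encode one code point as \uXXXX, or as a UTF-16 surrogate pair above U+FFFF.
sub _json_u_escape {
    my ($cp) = @_;
    return sprintf('\\u%04x', $cp) if $cp <= 0xFFFF;
    $cp -= 0x10000;
    return sprintf('\\u%04x\\u%04x', 0xD800 + ($cp >> 10), 0xDC00 + ($cp & 0x3FF));
}
```

In the cleanup loop, that would mean swapping the [:ascii:] line for $line = escape_non_ascii($line);.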
