Shaun Mccran

My digital playground


XML Whitespace is evil and should be punished

I've spent a bit of time recently working on a Flash-based reporting website. The project has a pretty standard architecture for a Flash website, but there has been a persistent issue with loading times and poor user experience.

After a bit of digging around behind the scenes, it appeared that the Flash SWF file was streaming a configuration XML file in the background. This isn't a great idea at the best of times, but in this case the file was 147k lines long and weighed in at 22.5 MB. That accounted for a big chunk of the loading time when the Flash app starts up.

After downloading the XML file and browsing through it in Eclipse, my first impression was that there was a ton of whitespace in it. After running a quick 'find and replace' on any double space characters (to avoid removing single spaces in legitimate text strings) and re-saving, the file was down to 3.2 MB.
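
For anyone who would rather script it than do it by hand in an editor, here is a minimal sketch of the same double-space 'find and replace' in Python. The function name and the sample XML are made up for illustration, and it assumes the indentation is made of spaces (tabs and line breaks are left alone). Like the manual version, it would also swallow any genuine double spaces inside text values.

```python
def strip_double_spaces(xml_text: str) -> str:
    """Delete every pair of consecutive spaces; lone single spaces survive."""
    return xml_text.replace("  ", "")


# Hypothetical sample: space-indented XML with single spaces in real values.
sample = (
    "<config>\n"
    "    <item name=\"report one\">\n"
    "        <title>Monthly sales</title>\n"
    "    </item>\n"
    "</config>\n"
)

squeezed = strip_double_spaces(sample)
print(len(sample), "->", len(squeezed), "characters")
```

The single spaces in "report one" and "Monthly sales" are kept, while the indentation in front of each tag disappears.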

So let this be a warning to anyone loading up machine-to-machine text files: squeeze them down. Don't include whitespace. Your apps don't care about it, and the file doesn't need to be human readable; all you are doing is using up network bandwidth.
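
If the squeezing is done as part of a build or deploy step, an XML-aware pass is a bit safer than a blind text replace. The sketch below uses Python's standard `xml.etree.ElementTree` and only drops text nodes that are nothing but whitespace, so spacing inside real values is untouched. The function name and sample are hypothetical, and it assumes the consuming app does not rely on whitespace between elements, on comments, or on the XML declaration (this approach discards the last two as well).

```python
import xml.etree.ElementTree as ET


def minify_xml(xml_text: str) -> str:
    """Remove whitespace-only text between elements and re-serialize."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # .text is the text before the first child; .tail is the text
        # after the element's closing tag. Drop them only when blank.
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample with indentation and a double space inside a value.
pretty = (
    "<config>\n"
    "  <item name=\"a b\">\n"
    "    <title>Monthly  sales</title>\n"
    "  </item>\n"
    "</config>"
)

compact = minify_xml(pretty)
print(len(pretty), "->", len(compact), "characters")
```

Keeping a readable copy in source control and shipping only the minified output means nobody has to edit the squeezed version by hand.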

I won't even get into the risks of letting people download your config files here; that's a whole different issue!

Comments
I think the bigger problem you have here is that you've got 147k lines of XML data that you need to do *anything* with.

Also, XML is intended for data that's targeted at computers AND people. I'd say if it's not intended for people, it shouldn't be XML at all.

Of course in your case it might be outwith your control, but as general advice for other situations.

# Posted By Adam Cameron | 24/10/2012 13:11
Poor old XML Whitespace, he's just a patsy!

As Adam stated, by definition XML should be human readable else why go with XML. But the real point of interest is that they have 147,000 lines of code in a CONFIG file ... aaaaaaand you just removed all the whitespace! I certainly hope it doesn't need to be human-readable ;)

1) Are 147k lines really needed just to configure the app? i.e. Does every single person loading the app need all of that data? And need it upfront?

2) What about using a binary protocol (AS Remoting) instead of XML? You could whip the time down drastically for ALL data transfers throughout the app and have your data in Class structures too.

3) If you need to keep the XML then what about reviewing the structure of it?
I wrote some thoughts on this a while back:
Notice how, even in my small example, simply re-evaluating where a single attribute is placed not only prevents the need to modify any code within the app itself, but can also remove a lot of bulk from the child nodes (if there were, say, 100 'singles' and 100 'doubles' you can see that this one simple change would amount to lines and lines of code). If you have no control over the structure because it's pre-defined you can still XSLtransform it (see comments for the post I linked to).
# Posted By Neil Webb | 24/10/2012 16:56
@Adam That is a very good point, I was kinda amazed myself when I saw exactly how big the config file was.

The file is sort of intended for people, but bizarrely I can only say that because it contains loads of logic that should be in the flash file, not in the Config file.

@Neil Rather shockingly, the 147k lines are made up of stacks of XML blocks piled on top of each other (like the devs just copied and pasted all the single files into one long file). Also, all the MySQL code is in there (they are actually passing the SQL code back to the server to run queries). On top of that there are XML data schemas and example data blocks, so literally all the logic to run the Flash app.

It could be run once, at an application level rather than at a user level, but I'm pretty sure they won't know how to do that.
# Posted By Shaun McCran | 24/10/2012 17:32
>Also all the mySQL code is in there
Wow, right. Security, yay!
It sounds like a huge mess of misplaced responsibility. The original app at my old job was initially placing a lot of responsibility on the config too. It was eventually refactored so that it just loaded in some basic params then made a call to a back-end (.NET service) which handled everything from there on in. Fortunately I had a boss who was clued up and understood the need to do this.
# Posted By Neil Webb | 24/10/2012 21:51
> "all the mySQL code is in there (They are actually passing the sql code back to the server to run queries). On top of that there is XML data schema's and example data blocks, so literally all the logic to to run the flash app."


*That* is what your blog article should have been about. "Incredibly dumb things you can do with XML"

I feel sorry for you having to maintain that stuff.
# Posted By Adam Cameron | 25/10/2012 05:14
It is a pretty shocking app, supplied by a third party, so I've got limited control over it.

I'm hoping to change how the app works, but I think it's going to be too much of an architectural change for them. It is a great example of just how inefficiently an app can be slung together.

(I am still trying to work out why they are passing around XML for data within the App as well, rather than using the FlaxGateway for AMF)
# Posted By Shaun McCran | 25/10/2012 06:55