Discerning the type of data stored in a file is frequently a challenge. We’ve come up with all sorts of ways to do it- like including magic bytes at the start of a file, using file extensions, appending MIME type information where possible, and frequently just hoping for the best. Ivan was working on a Python system that needed to handle XML data. Someone wanted to make sure that the XML data was actually XML, and not some other file format.

def is_xml(str):
    return str.startswith("<")

Any string of text which starts with < is clearly an XML file. This certainly won’t give any false positives. If we assume that they at least trimed whitespace off, I think we can be fairly safe that there won’t be any false negatives at least. Though if there is some way to generate a valid XML document where the first non-whitespace character isn’t a <, I’d be curious to see it.

The real question is: what if this check is actually successful at filtering out a large amount of invalid files? If this check is basically useless, that’s a WTF. If this check is actually valuable– that’s a bigger WTF.

[Advertisement] Continuously monitor your servers for configuration changes, and report when there’s configuration drift. Get started with Otter today!

Remy Porter

Source link

You May Also Like

r/funny – Bathroom Wisdom

This is a friendly reminder to read our rules. Memes, social media,…

Grand Jury Testimony: Mike Pence – Marilyn Sands, Humor Times

Was Star Witness Mike Pence’ Grand Jury testimony an Interrogation or a…

Grasshopper

One of my multitude of peeves is the phrase “special characters.” Somehow,…

I Was Eight, But My Hair Was About 35

“My mom and aunt were always into the big huge 80’s hair,…