Discerning the type of data stored in a file is frequently a challenge. We’ve come up with all sorts of ways to do it- like including magic bytes at the start of a file, using file extensions, appending MIME type information where possible, and frequently just hoping for the best. Ivan was working on a Python system that needed to handle XML data. Someone wanted to make sure that the XML data was actually XML, and not some other file format.

def is_xml(str):
    return str.startswith("<")

Any string of text which starts with < is clearly an XML file. This certainly won’t give any false positives. If we assume that they at least trimed whitespace off, I think we can be fairly safe that there won’t be any false negatives at least. Though if there is some way to generate a valid XML document where the first non-whitespace character isn’t a <, I’d be curious to see it.

The real question is: what if this check is actually successful at filtering out a large amount of invalid files? If this check is basically useless, that’s a WTF. If this check is actually valuable– that’s a bigger WTF.

[Advertisement] Continuously monitor your servers for configuration changes, and report when there’s configuration drift. Get started with Otter today!

Remy Porter

Source link

You May Also Like

John Deering for Jan 07, 2024 – John Deering, Humor Times

John Deering is chief editorial cartoonist for the Arkansas Democrat-Gazette, the state’s largest…

You Could Barely Tell He Was On Vacation

“My Italian grandfather on his first trip to Hawaii.” (submitted by IG…

It Was The Photo Of A Lifetime, Except For What Was Happening In The Background

“Dad with a nice drive at Pebble Beach.” (submitted by IG @kt10milelake) …

Threading in JavaScript

The easiest way to write programs that support concurrency is to not.…