File Type Detection

Discerning the type of data stored in a file is frequently a challenge. We’ve come up with all sorts of ways to do it: including magic bytes at the start of a file, using file extensions, appending MIME type information where possible, and frequently just hoping for the best. Ivan was working on a Python system that needed to handle XML data. Someone wanted to make sure that the XML data was actually XML, and not some other file format.

def is_xml(str):
    return str.startswith("<")

Any string of text which starts with < is clearly an XML file. This certainly won’t give any false positives. If we assume that they trimmed the whitespace off first, I think we can be fairly confident that there won’t be any false negatives, at least. Though if there is some way to generate a valid XML document where the first non-whitespace character isn’t a <, I’d be curious to see it.
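To make the sarcasm concrete, here is a minimal sketch that compares the check above with one that hands the string to an actual parser. The name is_well_formed_xml and the sample strings are made up for illustration and are not from Ivan's codebase. It assumes the standard library's xml.etree.ElementTree is acceptable, and that parser only checks well-formedness, not validity against a schema.

```python
import xml.etree.ElementTree as ET


def is_xml_naive(text):
    # The check from the article: only looks at the first character.
    return text.startswith("<")


def is_well_formed_xml(text):
    # Hypothetical stricter check: ask a real parser whether the text parses.
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Illustrative inputs, not taken from the original system.
samples = [
    "<root><item/></root>",   # well-formed XML
    "<p>unclosed paragraph",  # starts with "<" but is not well-formed
    "< just some text",       # starts with "<" but is not XML at all
    "  <root/>",              # leading whitespace before the root element
]

for sample in samples:
    # Print each sample with the verdict of both checks, side by side.
    print(repr(sample), is_xml_naive(sample), is_well_formed_xml(sample))
```

The second and third samples pass the startswith check but fail to parse. The last one is the whitespace case: it parses, yet the original check rejects it unless the caller strips the string first.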

The real question is: what if this check is actually successful at filtering out a large number of invalid files? If this check is basically useless, that’s a WTF. If this check is actually valuable, that’s a bigger WTF.

Remy Porter

Author

ReportWire
