The HTML5 specification defines a parsing algorithm, based on the behaviour of mainstream browsers, that specifies how to parse all markup, both valid and invalid. Because Hubbub follows this algorithm, it handles real-world web content, including malformed markup, much as mainstream browsers do.
If you are looking for an HTML5 parser in Python or Ruby, you may wish to look at html5lib.