This article argues that regular expressions cannot match nested structures because regexes are finite automata.
He then lists several problems that, he says, cannot be solved with regexes:
- matching an XML element
- matching a C/VB/C# math expression
- matching a valid regex
Since math expressions and regexes (items 2 and 3) can contain arbitrarily nested brackets, I understand why regexes cannot match them. But why is it impossible to match an XML element? (He didn't provide examples.)
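To make the finite-automaton argument concrete, here is a minimal Python sketch using the standard `re` module, which has no recursion. The helper `balanced_parens_regex` is hypothetical, written only for illustration: it builds a pattern for balanced parentheses up to a fixed nesting depth. Whatever depth you pick, the pattern is finite, so an input nested one level deeper is rejected.

```python
import re

def balanced_parens_regex(max_depth: int) -> str:
    """Return a regex for balanced parentheses nested at most max_depth deep."""
    pattern = ""  # depth 0: only the empty string
    for _ in range(max_depth):
        # Wrap the previous level: zero or more "(" <previous level> ")" groups.
        pattern = r"(?:\(" + pattern + r"\))*"
    return pattern

pattern = balanced_parens_regex(3)
for s in ["(()())", "((()))", "(((())))", "(()"]:
    print(s, bool(re.fullmatch(pattern, s)))
# (()()) True
# ((())) True
# (((()))) False
# (() False
```

Some engines, such as PCRE, offer recursive constructs like `(?R)` that go beyond regular languages; Python's built-in `re` does not support them, which is why the depth bound above is unavoidable there.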
You can match a limited subset of HTML tags if you know in advance which tags to match.
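As a minimal sketch of that limited case, assume the tag is known in advance (here `<b>`), carries no attributes, and never nests inside itself. Under those assumptions a lazy pattern is enough. The names `BOLD` and `extract_bold` are hypothetical.

```python
import re

# Assumption: <b> carries no attributes and never contains another <b>.
BOLD = re.compile(r"<b>(.*?)</b>", re.DOTALL)

def extract_bold(html: str) -> list[str]:
    """Return the text inside each <b>...</b> pair."""
    return BOLD.findall(html)

print(extract_bold("Some <b>bold</b> text and <b>more bold</b> here."))
# ['bold', 'more bold']
```

The lazy `.*?` stops at the first `</b>`, which is correct only because the assumption rules out nesting.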
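But you can't (reliably or cleanly) parse arbitrary HTML. It is not a regular language: elements can nest to any depth (for example, a `<div>` inside a `<div>`), and pairing each closing tag with its opening tag requires keeping count of that depth, which a finite automaton cannot do.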
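The nesting problem shows up as soon as an element contains another element with the same name. The Python sketch below contrasts a lazy regex with a scan that keeps an unbounded depth counter, the one piece of state a finite automaton lacks. Both functions (`naive_match`, `counted_match`) are hypothetical illustrations that ignore attributes, comments, CDATA, and self-closing tags; they are not a real parser.

```python
import re
from typing import Optional

def naive_match(text: str, tag: str) -> Optional[str]:
    """Lazy regex: stops at the FIRST closing tag, regardless of nesting."""
    t = re.escape(tag)
    m = re.search(rf"<{t}>.*?</{t}>", text, re.DOTALL)
    return m.group(0) if m else None

def counted_match(text: str, tag: str) -> Optional[str]:
    """Scan with an unbounded depth counter to find the matching closing tag."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    start = text.find(open_tag)
    if start == -1:
        return None
    depth, pos = 0, start
    while pos < len(text):
        if text.startswith(open_tag, pos):
            depth += 1  # one level deeper
            pos += len(open_tag)
        elif text.startswith(close_tag, pos):
            depth -= 1  # one level shallower
            pos += len(close_tag)
            if depth == 0:  # back to the level where we started
                return text[start:pos]
        else:
            pos += 1
    return None  # ran out of input: the element was never closed

doc = "<div>a<div>b</div>c</div>"
print(naive_match(doc, "div"))    # <div>a<div>b</div>  (wrong: stops at the inner </div>)
print(counted_match(doc, "div"))  # <div>a<div>b</div>c</div>
```

In practice, a real parser such as Python's `xml.etree.ElementTree` (for XML) or `html.parser` (for HTML) handles this counting along with all the cases the sketch ignores.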