Best language-agnostic questions in June 2011

Why is it that regex cannot match an XML element?

6 votes

This article argues that regular expressions cannot match nested structures because regexes are finite automata.

The author then lists problems that he says cannot be solved using regexes:

  1. matching an XML element
  2. matching a C/VB/C# math expression
  3. matching a valid regex

Since 2 and 3 can conceivably contain nested brackets, I can see why regexes cannot handle them. But why is it impossible to match an XML element? (He didn't provide examples.)

You can match a limited subset of HTML tags if you know in advance which tags you need to match.
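As a rough illustration of the "known tags" case, here is a minimal Python sketch. The tag whitelist (`KNOWN_TAGS`) and the helper name `find_known_tags` are made up for this example, and the pattern assumes attribute values never contain a `>` character.

```python
import re

# Hypothetical whitelist: the tag names we know in advance.
KNOWN_TAGS = ("b", "i", "em")

# An opening or closing tag for a whitelisted name, with optional attributes.
# Assumption: no attribute value contains a literal '>'.
TAG_RE = re.compile(r"</?(?:%s)\b[^>]*>" % "|".join(KNOWN_TAGS))

def find_known_tags(text):
    """Return every opening/closing tag from KNOWN_TAGS found in text."""
    return TAG_RE.findall(text)

sample = 'A <b class="x">bold</b> and <i>italic</i> <span>word</span>.'
print(find_known_tags(sample))
# ['<b class="x">', '</b>', '<i>', '</i>']
```

This only finds individual tags. It does not pair an opening tag with its matching closing tag, which is where the trouble starts.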

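But you can't (reliably or nicely) parse arbitrary HTML. It is not a regular language: elements can nest to any depth, and a finite automaton has no way to count how many are still open. The same reasoning applies to XML elements.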
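To make the nesting problem concrete, here is a small Python sketch (the function names are invented for this example, and it only handles bare `<div>` tags without attributes). A non-greedy pattern pairs the first `<div>` with the first `</div>` it meets, which is wrong as soon as elements nest. Tracking the depth with a counter, which is exactly what a plain finite automaton cannot do for unbounded depth, gives the right answer.

```python
import re

# Non-greedy pattern: pairs the first <div> with the FIRST </div> it sees.
DIV_RE = re.compile(r"<div>(.*?)</div>", re.DOTALL)

def regex_inner(text):
    """What the regex takes to be the content of the first <div>."""
    m = DIV_RE.search(text)
    return m.group(1) if m else None

# Tokenizer for opening and closing <div> tags only.
TOKEN_RE = re.compile(r"<(/?)div>")

def counted_inner(text):
    """Content of the first <div>, found by tracking nesting depth."""
    depth = 0
    start = None
    for m in TOKEN_RE.finditer(text):
        if m.group(1) == "":        # opening tag
            if depth == 0:
                start = m.end()
            depth += 1
        else:                       # closing tag
            if depth == 0:
                continue            # stray closing tag, ignore it
            depth -= 1
            if depth == 0:
                return text[start:m.start()]
    return None                     # no complete element found

nested = "<div>a<div>b</div>c</div>"
print(regex_inner(nested))    # a<div>b
print(counted_inner(nested))  # a<div>b</div>c
```

Switching the pattern to a greedy `.*` fixes this particular input but then swallows sibling elements such as `<div>a</div><div>b</div>` in one match, so neither variant is reliable. For real documents a proper HTML or XML parser is the safer choice.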