Best regex questions in December 2011

Recursive PHP Regex

16 votes

EDIT: I selected ridgerunner's answer as it contained the information needed to solve the problem. But I also felt like adding a fully fleshed-out solution to the specific question in case someone else wants to fully understand the example too. You will find it somewhere below.

This question is about clarifying the behavior of PHP's regex engine for recursive expressions. (If you have ideas for how to properly match the strings below without using a recursive PHP regex, that's very cool, but that's not the question.)

a(?:(?R)|a?)a

This is a simple expression that aims to match the character "a" or nothing, nested in one or multiple nests of the character "a". For instance, aa, aaa, aaaa, aaaaa. You don't need to use recursion for this:

aa*a

would work great. But the point is to use recursion.

Here is a piece of code you can run to test my failing pattern:

<?php
$tries=array('a','aa','aaa','aaaa','aaaaa','aaaaaa');
$regex='#a(?:(?R)|a?)a#';
foreach ($tries as $try) {
echo $try." : ";
if (preg_match($regex,$try,$hit)) echo $hit[0]."<br />";
else echo 'no match<br />';
}
?>

In the pattern, two "a"s are framing an alternation. In the alternation, we either match a recursion of the whole pattern (two "a"s framing an alternation), or the character "a", optionally empty.

In my mind, for "aaaa", this should match "aaaa".

But here is the output:

a : no match
aa : aa
aaa : aaa
aaaa : aaa
aaaaa : aaaaa
aaaaaa : aaa

Can someone explain what is happening on the fourth and sixth lines of output? I have tried tracing the path that I imagine the engine must be taking, but I must be imagining it wrong. Why is the engine returning "aaa" as a match for "aaaa"? What makes it so eager? I must be imagining the matching tree in the wrong order.

I realise that

#(?:a|a(?R)a)*#

kind of works, but my question is why the other pattern doesn't.

Thanks heaps!

Excellent (and difficult) question!

First, with the PCRE regex engine, (?R) behaves like an atomic group (unlike Perl?). Once it matches (or fails to match), the matching that happened inside the recursive call is final, and all backtracking breadcrumbs saved within the recursive call are discarded. However, the regex engine does save what was matched by the whole (?R) expression, and can give it back and try the other alternative to achieve an overall match. To describe what is happening, let's change your example slightly so that it is easier to talk about and to keep track of what is being matched at each step. Instead of aaaa as the subject text, let's use abcd. And let's change the regex from '#a(?:(?R)|a?)a#' to '#.(?:(?R)|.?).#'. The regex engine's matching behavior is the same.

Matching regex: /.(?:(?R)|.?)./ to: "abcd"

answer = r'''
Step Depth Regex          Subject  Comment
1    0     .(?:(?R)|.?).  abcd     Dot matches "a". Advance pointers.
           ^              ^
2    0     .(?:(?R)|.?).  abcd     Try 1st alt. Recursive call (to depth 1).
                 ^         ^
3    1     .(?:(?R)|.?).  abcd     Dot matches "b". Advance pointers.
           ^               ^
4    1     .(?:(?R)|.?).  abcd     Try 1st alt. Recursive call (to depth 2).
                 ^          ^
5    2     .(?:(?R)|.?).  abcd     Dot matches "c". Advance pointers.
           ^                ^
6    2     .(?:(?R)|.?).  abcd     Try 1st alt. Recursive call (to depth 3).
                 ^           ^
7    3     .(?:(?R)|.?).  abcd     Dot matches "d". Advance pointers.
           ^                 ^
8    3     .(?:(?R)|.?).  abcd     Try 1st alt. Recursive call (to depth 4).
                 ^            ^
9    4     .(?:(?R)|.?).  abcd     Dot fails to match end of string.
           ^                  ^    DEPTH 4 (?R) FAILS. Return to step 8 depth 3.
                                   Give back text consumed by depth 4 (?R) = ""
10   3     .(?:(?R)|.?).  abcd     Try 2nd alt. Optional dot matches EOS.
                    ^         ^    Advance regex pointer.
11   3     .(?:(?R)|.?).  abcd     Required dot fails to match end of string.
                       ^      ^    DEPTH 3 (?R) FAILS. Return to step 6 depth 2
                                   Give back text consumed by depth3 (?R) = "d"
12   2     .(?:(?R)|.?).  abcd     Try 2nd alt. Optional dot matches "d".
                    ^        ^     Advance pointers.
13   2     .(?:(?R)|.?).  abcd     Required dot fails to match end of string.
                       ^      ^    Backtrack to step 12 depth 2
14   2     .(?:(?R)|.?).  abcd     Match zero "d" (give it back).
                    ^        ^     Advance regex pointer.
15   2     .(?:(?R)|.?).  abcd     Dot matches "d". Advance pointers.
                       ^     ^     DEPTH 2 (?R) SUCCEEDS.
                                   Return to step 4 depth 1
16   1     .(?:(?R)|.?).  abcd     Required dot fails to match end of string.
                       ^      ^    Backtrack to try other alternative. Give back
                                    text consumed by depth 2 (?R) = "cd"
17   1     .(?:(?R)|.?).  abcd     Optional dot matches "c". Advance pointers.
                    ^       ^      
18   1     .(?:(?R)|.?).  abcd     Required dot matches "d". Advance pointers.
                       ^     ^     DEPTH 1 (?R) SUCCEEDS.
                                   Return to step 2 depth 0
19   0     .(?:(?R)|.?).  abcd     Required dot fails to match end of string.
                       ^      ^    Backtrack to try other alternative. Give back
                                    text consumed by depth 1 (?R) = "bcd"
20   0     .(?:(?R)|.?).  abcd     Try 2nd alt. Optional dot matches "b".
                    ^      ^       Advance pointers.
21   0     .(?:(?R)|.?).  abcd     Dot matches "c". Advance pointers.
                       ^    ^      SUCCESSFUL MATCH of "abc"
'''

There is nothing wrong with the regex engine. The correct match is abc (or aaa for the original question). A similar, albeit much longer, sequence of steps can be traced for the other surprising result in the question.
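The trace above can also be checked with a small hand-written matcher. The Python sketch below is not PCRE: it hard-codes the pattern a(?:(?R)|a?)a and assumes, as the answer describes, that a finished recursive call is atomic, so the caller gets a single end position that it can accept or give back. The function names match_at and first_match are made up for this illustration.

```python
def match_at(s, pos):
    """Try a(?:(?R)|a?)a at pos; return the end index or None.

    Each recursive call hands back one end position only, so the caller
    cannot ask the call to backtrack internally (atomic recursion).
    """
    if pos >= len(s) or s[pos] != "a":
        return None
    p = pos + 1
    # First alternative: (?R), followed by the closing "a".
    end = match_at(s, p)
    if end is not None and end < len(s) and s[end] == "a":
        return end + 1
    # Second alternative, greedy branch: a? takes one "a", then the closing "a".
    if p + 1 < len(s) and s[p] == "a" and s[p + 1] == "a":
        return p + 2
    # Second alternative, empty branch: a? matches nothing, then the closing "a".
    if p < len(s) and s[p] == "a":
        return p + 1
    return None


def first_match(s):
    """Mimic preg_match: return the leftmost match, or None."""
    for start in range(len(s)):
        end = match_at(s, start)
        if end is not None:
            return s[start:end]
    return None


for subject in ["a", "aa", "aaa", "aaaa", "aaaaa", "aaaaaa"]:
    print(subject, ":", first_match(subject) or "no match")
```

Under that assumption the sketch prints the same results as the PHP script in the question, including aaa for aaaa and for aaaaaa.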

Java regex anomaly?

13 votes

Can anyone tell me why

System.out.println("test".replaceAll(".*", "a"));

Results in

aa

Note that the following has the same result:

System.out.println("test".replaceAll(".*$", "a"));

I have tested this on java 6 & 7 and both seem to behave the same way. Am I missing something or is this a bug in the java regex engine?

This is not an anomaly: .* can match anything.

You ask to replace all occurrences:

  • the first occurrence does match the whole string, the regex engine therefore starts from the end of input for the next match;
  • but .* also matches an empty string! It therefore matches an empty string at the end of the input, and replaces it with a.

The same applies for .*$, which is really no different at all from .*. If you want what I think you want, use .+ instead (no need for the $). But in general, avoid .* or .+ at all costs, and .*? or .+? like the plague.
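The same two-match behaviour can be observed outside Java. This is a small Python illustration, not the Java engine: it assumes Python 3.7 or later, where re.sub also replaces an empty match that directly follows a non-empty one.

```python
import re

# One non-empty match for the whole input, then an empty match at the end.
print(re.findall(r".*", "test"))   # ['test', '']

# Both matches get replaced (Python 3.7+ behaviour).
print(re.sub(r".*", "a", "test"))  # aa

# Requiring at least one character removes the trailing empty match.
print(re.sub(r".+", "a", "test"))  # a
```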

Why doesn't $ in .NET multiline regular expressions match CRLF?

13 votes

I have noticed the following:

var b1 = Regex.IsMatch("Line1\nLine2", "Line1$", RegexOptions.Multiline);   // true
var b2 = Regex.IsMatch("Line1\r\nLine2", "Line1$", RegexOptions.Multiline); // false

I'm confused. The documentation of RegexOptions says:

Multiline: Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string.

Since C# and VB.NET are mainly used in the Windows world, I would guess that most files processed by .NET applications use CRLF linebreaks (\r\n) rather than LF linebreaks (\n). Still, it seems that the .NET regular expression parser does not recognize a CRLF linebreak as an end of line.

I know that I could work around this, for example, by matching Line1\r?$, but it still strikes me as strange. Is this really the intended behaviour of the .NET regexp parser or did I miss some hidden UseWindowsLinebreaks option?

From MSDN:

By default, $ matches only the end of the input string. If you specify the RegexOptions.Multiline option, it matches either the newline character (\n) or the end of the input string. It does not, however, match the carriage return/line feed character combination. To successfully match them, use the subexpression \r?$ instead of just $.

http://msdn.microsoft.com/en-us/library/yd1hzczs.aspx#Multiline

So I can't say why (compatibility with regular expressions from other languages?), but at the very least it's intended.
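For comparison, Python's re module treats $ in multiline mode the same way, which may support the compatibility guess. The sketch below is Python rather than .NET and only illustrates the \r?$ workaround quoted above.

```python
import re

lf = "Line1\nLine2"
crlf = "Line1\r\nLine2"

# $ matches just before "\n", so the stray "\r" blocks the match.
print(bool(re.search(r"Line1$", lf, re.MULTILINE)))       # True
print(bool(re.search(r"Line1$", crlf, re.MULTILINE)))     # False

# Allowing an optional "\r" before $ handles both line-ending styles.
print(bool(re.search(r"Line1\r?$", crlf, re.MULTILINE)))  # True
print(bool(re.search(r"Line1\r?$", lf, re.MULTILINE)))    # True
```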

Get itunes id from url

9 votes

What is the best way to retrieve the id from an iTunes app link?

Let's say I have these links:

http://itunes.apple.com/us/app/bring-me-sandwiches!!/id457603026?mt=8

http://itunes.apple.com/us/app/bring-me-sandwiches!!/id457603026

I just want to get the id, 457603026, using PHP's preg_match.

try:

$url = 'http://itunes.apple.com/us/app/bring-me-sandwiches!!/id457603026';
preg_match("/id(\d+)/", $url, $match); 
echo $match[1]; //457603026
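The same idea as a Python sketch, for readers who want to check it against both URLs from the question. The helper name itunes_id is made up here, and the leading slash in the pattern is an added assumption to avoid matching "id" inside an app name.

```python
import re
from typing import Optional


def itunes_id(url: str) -> Optional[str]:
    """Return the digits following "/id" in the URL, or None if absent."""
    m = re.search(r"/id(\d+)", url)
    return m.group(1) if m else None


print(itunes_id("http://itunes.apple.com/us/app/bring-me-sandwiches!!/id457603026?mt=8"))
print(itunes_id("http://itunes.apple.com/us/app/bring-me-sandwiches!!/id457603026"))
```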

Sanitize sentence in php

8 votes

The title may sound odd, but I'm trying to set up a preg_replace that cleans up after messy writers in a textarea. It has to enforce the following:

  1. If there is an exclamation mark, there should not be another one right after it.
  2. If a period is followed by a comma (.,), the comma wins and the result is just ,
  3. One or more spaces before a comma should be reduced to nothing.
  4. The sentence cannot start or end with a comma.
  5. There should never be more than 2 of the same letter in a row.
  6. A space must always be present after a comma.

E.g.:

  • ,My house, which is green., is nice!
  • My house..., which is green, is nice!!!
  • My house ,which is green,,, is nice!!

The end result should always be:

My house, which is green, is nice!

Is there an already built regex that takes care of this?

Solution: check out FakeRainBrigand's solution below!

I might have to use this for my own sites... nice idea!

<?php

$text = 'My hooouse..., which is greeeeeen , is nice!!!  ,And pretty too...';

$pats = array(
'/([.!?]\s{2}),/', # Abc.  ,Def
'/\.+(,)/',  # ......,
'/(!)!+/',   # abc!!!!!!!!
'/\s+(,)/',  # abc   , def
'/([a-zA-Z])\1\1/', # greeeeeeen
'/,(?!\s)/'); 

$fixed = preg_replace($pats, '$1', $text);

echo $fixed;
echo "\n\n";

?>

And the 'modified' version of $text: "My house, which is green, is nice! And pretty too."

UPDATE: Here's the version that handles "abc,def" -> "abc, def".

<?php

$text = 'My hooouse..., which is greeeeeen ,is nice!!!  ,And pretty too...';

$pats = array(
'/([.!?]\s{2}),/', # Abc.  ,Def
'/\.+(,)/',        # ......,
'/(!)!+/',         # abc!!!!!!!!
'/\s+(,)/',        # abc   , def
'/([a-zA-Z])\1\1/');      # greeeeeeen

$fixed = preg_replace($pats, '$1', $text);
$really_fixed = preg_replace('/,(?!\s)/', ', ', $fixed);

echo $really_fixed;
echo "\n\n";
?>

I would think this is a bit slower since it's an additional function call.
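For checking the rules against the three sample sentences from the question, here is a rough Python sketch that applies one substitution per rule. It is not a port of the PHP above: it keeps two repeated letters (as rule 5 states) instead of one, and the step that collapses repeated commas is an assumption that the rules do not spell out but the third example needs.

```python
import re


def sanitize(text):
    text = re.sub(r"!+", "!", text)                    # rule 1: one "!" at most
    text = re.sub(r"\.+,", ",", text)                  # rule 2: ".," becomes ","
    text = re.sub(r"\s+,", ",", text)                  # rule 3: no space before ","
    text = re.sub(r",{2,}", ",", text)                 # assumption: collapse ",,,"
    text = re.sub(r"^[,\s]+|[,\s]+$", "", text)        # rule 4: no "," at either end
    text = re.sub(r"([A-Za-z])\1{2,}", r"\1\1", text)  # rule 5: at most 2 in a row
    text = re.sub(r",(?!\s)", ", ", text)              # rule 6: space after ","
    return text


samples = [
    ",My house, which is green., is nice!",
    "My house..., which is green, is nice!!!",
    "My house ,which is green,,, is nice!!",
]
for sample in samples:
    print(sanitize(sample))
```

The order matters: spaces before commas are removed before the missing space after a comma is added back.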

Perl writing to a 'memory file' plays tricks with pattern matching

8 votes

When I run this code, I get "no" printed out:

my $memory_file;
my $fh;
open ($fh, '>', \$memory_file);
print $fh "abc";
if( $memory_file =~ m/^.*$/ )
{ print "yes\n" }
else
{ print "no\n" }

If I print out $memory_file, the contents are indeed "abc".

If I change the pattern to .* (no ^ or $) it works as expected.

If I put the line $memory_file = "abc" before the match, I get 'yes' printed out (as originally expected).

What on earth is going on here?

(This is perl 5.14.1)

Update: Some more discussion on PerlMonks. It seems to be a bug; I will log it.

It is the end of line character that is messing things up. While a regular assignment works:

my $str = "abc";
print "Works" if $str =~ /^.*$/;

...the code in the question does not. This regex should match any string, since it also matches the empty string. Even undefined values would match (though it would cause a warning). Also, ^.* does match.

The only reasonable explanation is that for some reason, whatever check is performed to match an end of string, it is not finding it. The end of the string is missing.

Curiously, replacing $ with \z works. But not \Z.

Adding a newline also works. Which would sort of make sense, as adding a newline would imply that an end of the string is also added, in the non-multiline regex sense.

I don't know the inner workings of why this happens, but I suspect that when using this particular form of "assignment", an end-of-the-string marker is never placed on the string. A sort of "raw" assignment, which confuses the check for end of string in the regex.

It feels like a bug. Perhaps this particular feature has not been properly maintained.

Count the capture groups in a qr regex?

8 votes

I am working on a project which at one point gets a list of files from an ftp server. At that point it either returns an arrayref of files OR if an optional regex reference (i.e. qr), is passed it filters the list down using grep. Further if that qr has a capture group, it treats the captured section as a version number and returns instead a hashref where the keys are the versions and the values are the file names (which would have been returned as the array if no capture groups). The code looks like (simplified slightly)

sub filter_files {
  my ($files, $pattern) = @_;
  my @files = @$files;
  unless ($pattern) {
    return \@files;
  }

  @files = grep { $_ =~ $pattern } @files;
  carp "Could not find any matching files" unless @files;

  my %versions = 
    map { 
      if ($_ =~ $pattern and defined $1) { 
        ( $1 => $_ )
      } else {
        ()
      }
    } 
    @files;

  if (scalar keys %versions) {
    return \%versions;
  } else {
    return \@files;
  }
}

This implementation tries to create the hash and returns it if it succeeds. My question is: can I detect that the qr has a capture group and only attempt to create the hash if it does?

You could use something like:

sub capturing_groups{
    my $re = shift;
    "" =~ /|$re/;
    return $#+;
}

say capturing_groups qr/fo(.)b(..)/;

Output:

2
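As a point of comparison only (this is Python, not Perl), Python's compiled pattern objects expose the group count directly through the groups attribute, so no trick match against an empty string is needed there. The function name simply mirrors the Perl one above.

```python
import re


def capturing_groups(pattern):
    """Return the number of capturing groups in a regex source string."""
    return re.compile(pattern).groups


print(capturing_groups(r"fo(.)b(..)"))  # 2
print(capturing_groups(r"fo(?:.)b"))    # 0, non-capturing groups are not counted
```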

Use ready-made character class and restrict it further

7 votes

Lots of ready-to-use character classes are available in Perl regular expressions, such as \d or \S, or new-fangled Unicode grokkers such as \p{P}, which matches punctuation characters.

Now let's say I'd like to match all punctuation characters \p{P} (quite a number of them, and not something you want to type in by hand) - all but one, all but the good old komma (or comma, ,).

Is there a way to specify this requirement short of expanding the handy character class and taking away the komma by hand?

$ unichars -au '\p{P}' | wc -l
598

Double negation:

/[^\P{P},]/

$ unichars -au '[^\P{P},]' | wc -l
597

"And" through lookahead/lookbehind:

/\p{P}(?<!,)/

$ unichars -au '\p{P}(?<!,)' | wc -l
597

unichars
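Both techniques carry over to other engines. Python's built-in re module has no \p{P}, so this sketch uses \w minus the underscore as a stand-in for "a ready-made class minus one character"; the class and the excluded character are assumptions chosen for illustration.

```python
import re

text = "a_b-1"

# Double negation: "not (non-word character or underscore)".
print(re.findall(r"[^\W_]", text))    # ['a', 'b', '1']

# "And" through lookbehind: a word character that is not an underscore.
print(re.findall(r"\w(?<!_)", text))  # ['a', 'b', '1']
```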

javascript regular expressions as functions?

7 votes

Apparently, back in Firefox 3.6, the following was legitimate:

/[0-9]{3}/('23 2 34 678 9 09')

and the result was '678'.

FF8 isn't having any. What's the right syntax now?

Do you want

/[0-9]{3}/.test('23 2 34 678 9 09');

or

/[0-9]{3}/.exec('23 2 34 678 9 09');

Word splitting with regular expressions in Haskell

6 votes

There are several packages available for using regular expressions in Haskell (e.g. Text.Regex.Base, Text.Regex.Posix etc.). Most packages I've seen so far support only a subset of the regex syntax I know, by which I mean: I am used to splitting a sentence into words with the following regex:

\\w+

Nearly all packages in Haskell I have tried so far don't support this (neither the ones mentioned earlier nor Text.Regex.TDFA). I know that with Posix the usage of [[:word:]]+ would have the same effect, but I would like to use the variant mentioned above.

This raises three questions:

  1. Is there any package to achieve that?
  2. If there really is, why is there a different common usage?
  3. What advantages or disadvantages are there?

The '\w' is a Perl pattern, and supported by PCRE, which you can access in Haskell with my regex-pcre package or the pcre-light library. If your input is a list of Char then the 'words' function in the standard Prelude may be enough; if your input is ASCII bytestring then Data.ByteString.Char8 may work. There may be a utf8 library with word splitting, but I cannot quickly find it.

Why is "$1" ending up in my Regex.Replace() result?

6 votes

I am trying to write a regular expression to rewrite URLs to point to a proxy server.

bodystring = Regex.Replace(bodystring, "(src='/+)", "$1" + proxyStr);

The idea of this expression is pretty simple, basically find instances of "src='/" or "src='//" and insert a PROXY url at that point. This works in general but occasionally I have found cases where a literal "$1" will end up in the result string.

This makes no sense to me because if there was no match, then why would it replace anything at all?

Unfortunately I can't give a simple example of this, as it only happens with very large strings so far, but I'd like to know conceptually what could make this sort of thing happen.

As an aside, I tried rewriting this expression using a positive lookbehind as follows:

bodystring = Regex.Replace(bodystring, "(?<=src='/+)", proxyStr);

But this ends up with proxyStr TWICE in the output if the input string contains "src='//". This also doesn't make much sense to me because I thought that "src=" would have to be present in the input twice in order to get proxyStr to end up twice in the output.

When proxyStr = "10.15.15.15:8008/proxy?url=http://", the replacement string becomes "$110.15.15.15:8008/proxy?url=http://". It contains a reference to group number 110, which certainly does not exist.

You need to make sure that your proxy string does not start with a digit. In your case you can do it by not capturing the last slash, and changing the replacement string to "$1/"+proxyStr, like this:

bodystring = Regex.Replace(bodystring, "(src='/*)/", "$1/" + proxyStr);

Edit:

Rawling pointed out that .NET's regexp library addresses this issue: you can enclose 1 in curly braces to avoid false aliasing, like this:

bodystring = Regex.Replace(bodystring, "(src='/+)", "${1}" + proxyStr);
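The same ambiguity exists in other regex libraries whenever a numbered group reference is glued to text that starts with a digit. As a hedged Python illustration (not .NET), the unambiguous \g<1> form plays the role of ${1}, and a replacement function sidesteps template parsing entirely. The sample HTML string is invented for the example.

```python
import re

proxy = "10.15.15.15:8008/proxy?url=http://"
body = "<img src='/a.png'>"

# Unambiguous group reference, analogous to ${1} in .NET.
safe = re.sub(r"(src='/+)", r"\g<1>" + proxy, body)
print(safe)

# A replacement function avoids the replacement-template syntax altogether.
also_safe = re.sub(r"(src='/+)", lambda m: m.group(1) + proxy, body)
print(also_safe)
```

The function form is also the safer choice when the inserted string may contain backslashes or other characters that the template syntax treats specially.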

Replace words in a string, but ignore HTML

6 votes

I'm trying to write a highlight plugin, and would like to preserve HTML formatting. Is it possible to ignore all the characters between < and > in a string when doing a replace using javascript?

Using the following as an example:

var string = "Lorem ipsum dolor span sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";

I would like to be able to achieve the following (replace 'dolor' with 'FOO'):

var string = "Lorem ipsum FOO span sit amet, consectetuer <span class='dolor'>FOO</span> adipiscing elit.";

Or perhaps even this (replace 'span' with 'BAR'):

var string = "Lorem ipsum dolor BAR sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";

I came very close to finding an answer given by tambler here: Can you ignore HTML in a string while doing a Replace with jQuery? but, for some reason, I just can't get the accepted answer to work.

I'm completely new to regex, so any help would be gratefully appreciated.

Parsing the HTML using the browser's built-in parser via innerHTML followed by DOM traversal is the sensible way to do this. Here's an answer loosely based on this answer:

Live demo: http://jsfiddle.net/FwGuq/1/

Code:

// Reusable generic function
function traverseElement(el, regex, textReplacerFunc) {
    // script and style elements are left alone
    // (tagName is upper case in HTML documents, hence the i flag)
    if (!/^(script|style)$/i.test(el.tagName)) {
        var child = el.lastChild;
        while (child) {
            if (child.nodeType == 1) {
                traverseElement(child, regex, textReplacerFunc);
            } else if (child.nodeType == 3) {
                textReplacerFunc(child, regex);
            }
            child = child.previousSibling;
        }
    }
}

// This function does the replacing for every matched piece of text
// and can be customized to do what you like
function textReplacerFunc(textNode, regex, text) {
    textNode.data = textNode.data.replace(regex, "FOO");
}

// The main function
function replaceWords(html, words) {
    var container = document.createElement("div");
    container.innerHTML = html;

    // Replace the words one at a time to ensure each one gets matched
    for (var i = 0, len = words.length; i < len; ++i) {
        traverseElement(container, new RegExp(words[i], "g"), textReplacerFunc);
    }
    return container.innerHTML;
}


var html = "Lorem ipsum dolor span sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";
alert( replaceWords(html, ["dolor"]) );

Any way to treat .* as .{0,1024} in perl RE?

6 votes

We allow some user-supplied REs for the purpose of filtering email. Early on we ran into some performance issues with REs that contained, for example, .*, when matching against arbitrarily-large emails. We found a simple solution was to s/\*/{0,1024}/ on the user-supplied RE. However, this is not a perfect solution, as it will break with the following pattern:

/[*]/

And rather than coming up with some convoluted recipe to account for every possible mutation of user-supplied RE input, I'd like to just limit perl's interpretation of the * and + characters to have a maximum length of 1024 characters.

Is there any way to do this?

Update

Added a (?<!\\) before the quantifiers, because escaped *+ should not be matched. Replacement will still fail if there is an \\* (match \ 0 or more times).

An improvement would be this

s/(?<!\\)\*(?!(?<!\\)[^[]*?(?<!\\)\])/{0,1024}/
s/(?<!\\)\+(?!(?<!\\)[^[]*?(?<!\\)\])/{1,1024}/

See it here on Regexr

That means match [*+] but only if there is no closing ] ahead and no [ till then. And there is no \ (the (?<!\\) part) allowed before the square brackets.

(?! ... ) is a negative lookahead

(?<! ... ) is a negative lookbehind

See perlretut for details

Update 2 include possessive quantifiers

s/(?<!(?<!\\)[\\+*?])\+(?!(?<!\\)[^[]*?(?<!\\)\])/{1,1024}/   # for +
s/(?<!\\)\*(?!(?<!\\)[^[]*?(?<!\\)\])/{0,1024}/    # for *

See it here on Regexr

Seems to be working, but it's getting really complicated now!
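An alternative to stacking lookarounds is a small scanner that walks the pattern once and tracks whether it is inside a character class. The Python sketch below is a different technique from the substitutions above and is only an outline: the function name and the limit parameter are made up, and it does not try to understand every regex construct (for example, a literal } is treated like the end of a quantifier).

```python
def bound_quantifiers(pattern, limit=1024):
    """Rewrite unescaped * and + outside [...] as {0,limit} and {1,limit}."""
    out = []
    in_class = False          # currently inside a [...] character class
    after_quantifier = False  # previous token was a quantifier, so "+" is possessive
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            # Copy an escaped pair such as \* or \\ untouched.
            out.append(pattern[i:i + 2])
            i += 2
            after_quantifier = False
            continue
        i += 1
        if in_class:
            in_class = (c != "]")
            out.append(c)
        elif c == "[":
            in_class = True
            after_quantifier = False
            out.append(c)
            # A leading "^" and a "]" placed first are literal class content.
            if i < n and pattern[i] == "^":
                out.append("^")
                i += 1
            if i < n and pattern[i] == "]":
                out.append("]")
                i += 1
        elif c == "*":
            out.append("{0,%d}" % limit)
            after_quantifier = True
        elif c == "+" and not after_quantifier:
            out.append("{1,%d}" % limit)
            after_quantifier = True
        else:
            # Includes a possessive "+" that follows another quantifier.
            out.append(c)
            after_quantifier = c in "?}"
    return "".join(out)


for p in [".*", "[*]", r"\*", "a+", "a*+", r"\\*"]:
    print(p, "->", bound_quantifiers(p))
```

Because escapes are consumed in pairs, the \\* case that defeats the lookbehind version is handled here.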

How to turn several "Sqrt[some text inside]" into several Sqrt(some text inside) , I mean from [] into ()

6 votes

I have the following expression, which can look like this (the number of Sqrt[XXX] terms is unknown):

Sqrt[A+B] + Sqrt[Min[A,B]] * Min[Sqrt[C],D]

I want to turn every Sqrt[XXX] into Sqrt(XXX), that is, replace the [] brackets of each Sqrt with () brackets,

so the above example will look like

Sqrt(A+B) + Sqrt(Min[A,B]) * Min[Sqrt(C),D]

I don't want to "hurt" the other [] brackets in the expression (like the ones next to Min)

How can I do it with regex ?

You can do this using iteration over the characters in the String. First look for the index of Sqrt[ and then look for the matching closing bracket.

Here is some sample code:

final String s = "Sqrt[A+B] + Sqrt[Min[A,B]] * Min[Sqrt[C],D]";
final char[] charArray = s.toCharArray();

int index = s.indexOf("Sqrt[");
while (index != -1) {
    final int open = index + 4;
    charArray[open] = '(';

    // look for closing bracket
    int close;
    int matching = 0;
    for (close = open + 1; close < charArray.length; close++) {
        char c = charArray[close];
        if (c == ']') {
            if (matching == 0) {
                break;
            }
            matching--;
        } else if (c == '[') {
            matching++;
        }
    }
    if (close < charArray.length) { // guard against an unbalanced bracket
        charArray[close] = ')';
    }
    index = s.indexOf("Sqrt[", index + 1);
}
System.out.println(new String(charArray));

I have not tested it properly, so please do.
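The bracket matching can also be done in a single pass with a stack that remembers, for each open bracket, whether it belongs to a Sqrt. This Python sketch is a variation on the approach above rather than a translation of the Java code; the function name is made up, and unbalanced closing brackets are simply left alone.

```python
def sqrt_brackets_to_parens(expr):
    """Replace the [] that directly follow "Sqrt" with (), keeping other [] intact."""
    out = list(expr)
    stack = []  # one flag per open "[": True if it was preceded by "Sqrt"
    for i, ch in enumerate(expr):
        if ch == "[":
            is_sqrt = expr.endswith("Sqrt", 0, i)  # does expr[:i] end with "Sqrt"?
            stack.append(is_sqrt)
            if is_sqrt:
                out[i] = "("
        elif ch == "]" and stack:
            if stack.pop():
                out[i] = ")"
    return "".join(out)


print(sqrt_brackets_to_parens("Sqrt[A+B] + Sqrt[Min[A,B]] * Min[Sqrt[C],D]"))
```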

can't understand String.replaceAll non-greedy behavior

6 votes

Any idea why the following test fails (it returns "xx" instead of "x")?

@Test 
public void testReplaceAll(){
    assertEquals("x", "xyz".replaceAll(".*", "x"));
}

I don't want to do "^.*$".... I want to understand this behavior. any clues?

Yes, it is exactly the same as described in this question!

.* will first match the whole input, but then also an empty string at the end of the input...

Let's symbolize the regex engine with | and the input with <...> in your example.

  • input: <xyz>;
  • regex engine, before first run: <|xyz>;
  • regex engine, after first run: <xyz|> (matched text: "xyz");
  • regex engine, after second run: <xyz>| (matched text: "").

Not all regex engines behave this way. Java does, however. So does perl. Sed, as a counterexample, will position its cursor after the end of the input in step 3.

Now, you also have to understand one crucial thing: regex engines, when they encounter a zero-length match, always advance one character. Otherwise, consider what would happen if you attempted to replace '^' with 'a': '^' matches a position, therefore is a zero-length match. If the engine didn't advance one character, "x" would be replaced with "ax", which would be replaced with "aax", etc. So, after the second match, which is empty, Java's regex engine advances one "character"... Of which there aren't any: end of processing.
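The cursor positions described above can be made visible by listing match spans. This is a Python sketch rather than Java, on the assumption that Python's re module walks the input the same way for this pattern.

```python
import re

# First match covers "xyz"; the second is the empty match at the end.
spans = [m.span() for m in re.finditer(r".*", "xyz")]
print(spans)  # [(0, 3), (3, 3)]

# "^" is a zero-length match at position 0; the search then moves on
# instead of matching the same position again.
print(re.sub(r"^", "a", "x"))  # ax
```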

How to convert a measurement displayed in an architectural format to a floating point?

6 votes

I have a database that was created and is used by an architecture firm. All measurements are stored in a format like this: 15-3/4" and 12' 6-3/4".

Is there a way to convert these types of measurements into floating point in Python? Or is there a library out there that provides this functionality?

Likewise, how would you convert from a floating point to the above format?

Depending on how regular the patterns are, you can use str.partition to do the parsing:

def architectural_to_float(text):
    ''' Convert architectural measurements to inches.

        >>> for text in """15-3/4",12' 6-3/4",3/4",3/4',15',15",15.5'""".split(','):
        ...     print(text.ljust(10), '-->', architectural_to_float(text))
        ...
        15-3/4"    --> 15.75
        12' 6-3/4" --> 150.75
        3/4"       --> 0.75
        3/4'       --> 9.0
        15'        --> 180.0
        15"        --> 15.0
        15.5'      --> 186.0

    '''
    # See http://stackoverflow.com/questions/8675714
    text = text.replace('"', '').replace(' ', '')
    feet, sep, inches = text.rpartition("'")
    floatfeet, sep, fracfeet = feet.rpartition('-')
    feetnum, sep, feetdenom = fracfeet.partition('/')
    feet = float(floatfeet or 0) + float(feetnum or 0) / float(feetdenom or 1)
    floatinches, sep, fracinches = inches.rpartition('-')
    inchesnum, sep, inchesdenom = fracinches.partition('/')
    inches = float(floatinches or 0) + float(inchesnum or 0) / float(inchesdenom or 1)
    return feet * 12.0 + inches
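
The question also asks about the reverse direction, which the answer does not cover. One possible sketch uses fractions.Fraction to round to the nearest fraction of an inch. The function name, the default of sixteenths, and the choice to always split off whole feet are assumptions for illustration; negative lengths are not handled.

```python
from fractions import Fraction


def float_to_architectural(inches, denominator=16):
    """Format a non-negative length in inches as feet, inches and a fraction."""
    # Round to the nearest 1/denominator of an inch; Fraction reduces it.
    total = Fraction(round(inches * denominator), denominator)
    feet, rest = divmod(total, 12)
    whole = rest.numerator // rest.denominator
    frac = rest - whole

    parts = []
    if feet:
        parts.append("%d'" % feet)
    if whole and frac:
        parts.append('%d-%d/%d"' % (whole, frac.numerator, frac.denominator))
    elif frac:
        parts.append('%d/%d"' % (frac.numerator, frac.denominator))
    elif whole or not feet:
        parts.append('%d"' % whole)
    return " ".join(parts)


for value in (150.75, 0.75, 180.0, 15.75):
    print(value, "-->", float_to_architectural(value))
```

With these choices 15.75 is rendered as 1' 3-3/4" rather than 15-3/4"; whether inches should be carried into feet depends on the convention the firm uses.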