Best security questions in January 2011

Is there any way to put malicious code into Regex?

79 votes

I want to add regex search capability to my public web page. Other than html encoding the output, do I need to do anything to guard against malicious user input?

Google searches are swamped by people solving the converse problem-- using regex to detect malicious input--which I'm not interested in. In my scenario, the user input is regex

Update: I'll be using the Regex library in .NET (C#)

Denial‐of‐Service Concerns

The most common concern with regexes is a denial‐of‐service attack through pathological patterns that go exponential — or even super‐exponential! — and so appear to take forever to solve. These may only show up on particular input data, but one can generally create one wherein this doesn’t matter.

Which ones these are will depend somewhat on how smart the regex compiler you’re using happens to be, because some of these can be detected during compilation time. Regex compilers that implement recursion usually have a built‐in recursion‐depth counter for checking non‐progression.

Russ Cox’s excellent 2007 paper on Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...) talks about ways that most modern NFAs, which all seem to derive from Henry Spencer’s code, suffer severe performance degradation, but where a Thompson‐style NFA has no such problems.

If you only admit patterns that can be solved by DFAs, you can compile them up as such, and they will run faster, possibly much faster. However, it takes time to do this. The Cox paper mentions this approach and its attendant issues. It all comes down to a classic time–space trade‐off.

With a DFA, you spend more time building it (and allocating more states), whereas with an NFA you spend more time executing it, since it can be multiple states at the same time, and backtracking can eat your lunch — and your CPU.

Denial‐of‐Service Solutions

Probably the most reasonable way to address these patterns that are on the losing end of a race with the heat‐death of the universe is to wrap them with a timer that effectively places a maximum amount of time allowed for their execution. Usually this will be much, much less than the default timeout that most HTTP servers provide.

There are various ways to implement these, ranging form a simple alarm(N) at the C level, to some sort of try {} block the catches alarm‐type exceptions, all the way to spawning off a new thread that’s specially created with a timing constraint built right into it.

Code Callouts

In regex languages that admit code callouts, some mechanism for allowing or disallowing these from the string you’re going to compile should be provided. Even if code callouts are only to code in the language you are using, you should restrict them; they don’t have to be able to call external code, although if they can, you’ve got much bigger problems.

For example, in Perl one cannot have code callouts in regexes created from string interpolation (as these would be, as they’re compiled during run‐time) unless the special lexically‐scoped pragma use re "eval"; in active in the current scope.

That way nobody can sneak in a code callout to run system programs like rm -rf *, for example. Because code callouts are so security‐sensitive, Perl disables them by default on all interpolated strings, and you have to go out of your way to re‐enable them.

User‐Defined \P{roperties}

There remains one more security‐sensitive issue related to Unicode-style properties — like \pM, \p{Pd}, \p{Pattern_Syntax}, or \p{Script=Greek} — that may exist in some regex compilers that support that notation.

The issue is that in some of these, the set of possible properties is user‐extensible. That means you can have custom properties that are actual code callouts to named functions in some particular namepace, like \p{GoodChars} or \p{Class::Good_Characters}. How your language handles those might be worth looking at.

Sandboxing

In Perl, a sandboxed compartment via the Safe module would give control over namespace visibility. Other languages offer similar sandboxing technologies. If such devices are available, you might want to look into them, because they are specifically designed for limited execution of untrusted code.

Which functions in the C standard library commonly encourage bad practice?

34 votes

Hello all,

This is inspired by this question and the comments on one particular answer in that I learnt that strncpy is not a very safe string handling function in C and that it pads zeros, until it reaches n, something I was unaware of.

Specifically, to quote R..

strncpy does not null-terminate, and does null-pad the whole remainder of the destination buffer, which is a huge waste of time. You can work around the former by adding your own null padding, but not the latter. It was never intended for use as a "safe string handling" function, but for working with fixed-size fields in Unix directory tables and database files. snprintf(dest, n, "%s", src) is the only correct "safe strcpy" in standard C, but it's likely to be a lot slower. By the way, truncation in itself can be a major bug and in some cases might lead to privilege elevation or DoS, so throwing "safe" string functions that truncate their output at a problem is not a way to make it "safe" or "secure". Instead, you should ensure that the destination buffer is the right size and simply use strcpy (or better yet, memcpy if you already know the source string length).

And from Jonathan Leffler

Note that strncat() is even more confusing in its interface than strncpy() - what exactly is that length argument, again? It isn't what you'd expect based on what you supply strncpy() etc - so it is more error prone even than strncpy(). For copying strings around, I'm increasingly of the opinion that there is a strong argument that you only need memmove() because you always know all the sizes ahead of time and make sure there's enough space ahead of time. Use memmove() in preference to any of strcpy(), strcat(), strncpy(), strncat(), memcpy().

So, I'm clearly a little rusty on the C standard library. Therefore, I'd like to pose the question:

What C standard library functions are used inappropriately/in ways that may cause/lead to security problems/code defects/inefficiencies?

In the interests of objectivity, I have a number of criteria for an answer:

  • Please, if you can, cite design reasons behind the function in question i.e. its intended purpose.
  • Please highlight the misuse to which the code is currently put.
  • Please state why that misuse may lead towards a problem. I know that should be obvious but it prevents soft answers.

Please avoid:

  • Debates over naming conventions of functions (except where this unequivocably causes confusion).
  • "I prefer x over y" - preference is ok, we all have them but I'm interested in actual unexpected side effects and how to guard against them.

As this is likely to be considered subjective and has no definite answer I'm flagging for community wiki straight away.

I am also working as per C99.

A common pitfall with the strtok() function is to assume that the parsed string is left unchanged, while it actually replaces the separator character with '\0'.

Also, strtok() is used by making subsequent calls to it, until the entire string is tokenized. Some library implementations store strtok()'s internal status in a global variable, which may induce some nasty suprises, if strtok() is called from multiple threads at the same time.

The CERT C Secure Coding Standard lists many of these pitfalls you asked about.

Are dynamic mysql queries with sql escaping just as secure as prepared statements?

8 votes

I have an application which would greatly benefit by using dynamic mysql queries in combination with mysql (mysqli) real escape string. If I ran all data received from the user through mysql real escape would it be just as secure as using mysql prepared statements?

Yes, but a qualified yes.

You need to properly escape 100% of the input. And you need to properly set character sets (If you're using the C API, you need to call the mysql_set_character_set() instead of SET NAMES). If you miss one tiny thing, you're vulnerable. So it's yes, as long as you do everything right...

And that's the reason a lot of people will recommend prepared queries. Not because they are any safer. But because they are more forgiving...

CSRF token for ajax

4 votes

I have a problem with forms submitted with ajax. I do my forms with Zend Framework. Some are real forms so I add a Hash element. Others are for small operations (like upvote and downvote here) so I do them with links.

My problem is that I need to use ajax especially for the small forms (the links). I see a lot of questions but nothing comprehensive enough to solve the problem. Is there a detailed description on how to get csrf token working smoothly when forms are submitted via ajax? preferably with Zend Framework but general PHP answers will help too.

You don't need a CSRF token. You case use the HTTP_X_REQUESTED_WITH method (see e.g. here).