Best PHP questions in July 2012

Web-based chat in PHP without using a database or file

15 votes

I am trying to implement a real-time chat application using PHP. Is it possible to do it without using persistent data storage like a database or file? Basically, what I need is a mediator written in PHP that:

  1. Accepts messages from client browsers
  2. Broadcasts the message to other clients
  3. Forgets the message

You will want to use sockets. This article covers exactly what you want to do: http://devzone.zend.com/209/writing-socket-servers-in-php/
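A minimal sketch of such a mediator, assuming PHP's stream socket functions and plain TCP clients (for example telnet). The names broadcast() and run_chat_server() are illustrative, not taken from the linked article. Each message only lives in a local variable while it is relayed, so nothing is persisted. Note that a browser cannot open a raw TCP socket, so a browser front end would still need a WebSocket layer (or polling) on top of this.

```php
<?php
// Minimal chat relay sketch: accept messages, broadcast them to every
// other connected client, then forget them (nothing is stored).

// Write $message to every client except the sender; returns how many
// clients it was sent to.
function broadcast(array $clients, $sender, string $message): int
{
    $sent = 0;
    foreach ($clients as $client) {
        if ($client !== $sender) {
            fwrite($client, $message);
            $sent++;
        }
    }
    return $sent;
}

// Event loop over plain TCP sockets. Not started here; run it from the
// CLI, e.g. run_chat_server('tcp://0.0.0.0:8000').
function run_chat_server(string $address): void
{
    $server = stream_socket_server($address, $errno, $errstr);
    if ($server === false) {
        throw new RuntimeException("$errstr ($errno)");
    }
    $clients = array();
    while (true) {
        // Wait until the server or any client socket is readable
        $read = $clients;
        $read[] = $server;
        $write = null;
        $except = null;
        if (stream_select($read, $write, $except, null) === false) {
            break;
        }
        foreach ($read as $stream) {
            if ($stream === $server) {
                // 1. New client: remember its socket
                $client = stream_socket_accept($server);
                if ($client !== false) {
                    $clients[] = $client;
                }
                continue;
            }
            $message = fread($stream, 8192);
            if ($message === false || $message === '') {
                // Client disconnected: drop it
                unset($clients[array_search($stream, $clients, true)]);
                fclose($stream);
                continue;
            }
            // 2. Broadcast, 3. forget: $message is overwritten next time
            broadcast($clients, $stream, $message);
        }
    }
}
```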

Optimize JavaScript and CSS downloads

13 votes

I have a number of pages on my website that all use jQuery, JSON and the same CSS, except for a few pages. The first page is the user login. As the user will take time to type in his username and password, I want to download all the JavaScript and CSS files required for the entire user session during login. How can this be done? The header is the same for all pages. How do I optimize it?

My idea would be to load the JS and CSS files dynamically after the window's load event has fired. This would not affect the load time of the login page, and the files would already be in the browser cache by the time the user has logged in.

You could also change this to $(document).ready, which fires earlier (as soon as the DOM is ready), if that works better for you.

What about something like this?

$(window).on('load', function() {
    function preloadFile(filename, filetype) {
        var fileref;

        // For JavaScript files
        if (filetype == "js") {
            fileref = document.createElement('script');
            fileref.setAttribute("type", "text/javascript");
            fileref.setAttribute("src", filename);
        }

        // For CSS files
        else if (filetype == "css") {
            fileref = document.createElement("link");
            fileref.setAttribute("rel", "stylesheet");
            fileref.setAttribute("type", "text/css");
            fileref.setAttribute("href", filename);
        }

        // Only append if the file type was recognised
        if (fileref) {
            document.getElementsByTagName("head")[0].appendChild(fileref);
        }
    }

    // Examples of how to use it
    preloadFile("myscript.js", "js");
    preloadFile("javascript.php", "js");
    preloadFile("mystyle.css", "css");
});

References

http://www.javascriptkit.com/javatutors/loadjavascriptcss.shtml

Figuring out the exact key created by PHP's mcrypt

11 votes

A PHP application I'm maintaining uses Rijndael_256 in ECB mode (MCRYPT_MODE_ECB) with mcrypt. As luck would have it, the key isn't 256 bits long, but only 160. According to the mcrypt_encrypt documentation, the key is padded with \0 to the required size if it's too small:

The key with which the data will be encrypted. If it's smaller than the required keysize, it is padded with '\0'. It is better not to use ASCII strings for keys.

This seems to happen around the start of line 1186 in mcrypt.c, with the key being modified at line 1213.

So let's say we've got $key = 'abcdefghijkm', which is too short, but PHP's implementation of mcrypt makes sure it's extended to 32 characters (256 bits) when using RIJNDAEL_256. What will the final key look like?

I'm asking this because another application is being built that uses the same encrypted data, but in another language: Perl, to be exact, and I'm using Crypt::Rijndael. For the given example key, what is the exact key I would have to feed to Crypt::Rijndael (or any other module, for that matter) to be able to decrypt the data again?

Update

With Perl I can generate a \0-padded key with pack('a32', 'my secret key'); (or Z32): length() reports 32 and the Crypt::Rijndael module accepts the key. Judging by the source of PHP's mcrypt, this should be the same key it generates (\0-padded), but the data still won't decrypt with it.

In theory, pack('a32', 'my secret key'); in PHP should produce the same \0-padded key that PHP's mcrypt generates, but this isn't the case.

I'm very close to just re-encrypting everything with a new key. This is taking too much time.

The issue isn't the key's padding; it's that you're using two different block sizes. In PHP, MCRYPT_RIJNDAEL_256 uses a block size of... 256 bits. However, the documentation of Perl's Crypt::Rijndael notes:

blocksize
The blocksize for Rijndael is 16 bytes (128 bits), although the algorithm actually supports any blocksize that is any multiple of four bytes. 128 bits, is however, the AES-specified block size, so this is all we support.

So there's no key that will allow for conversion between those different algorithms. You can either switch to 128 bits in PHP:

<?php
$key = "abcdefghijklmnopqrstuvwxyz";
$data = "Meet me at 11 o'clock behind the monument.";
// ECB mode ignores the IV, so none is passed
$crypttext = mcrypt_encrypt(MCRYPT_RIJNDAEL_128, $key, $data, MCRYPT_MODE_ECB);
echo bin2hex($crypttext) . "\n";
// prints c613d1804f52f535cb4740242270b1bcbf85151ce4c874848fd1fc2add06e0cc2d26b6403feef4a8df18f7dd7f8ac67d
?>

Which Perl can decrypt without a problem using Crypt::Rijndael:

use Crypt::Rijndael;
$key = "abcdefghijklmnopqrstuvwxyz\0\0\0\0\0\0";
$crypttext = "c613d1804f52f535cb4740242270b1bcbf85151ce4c874848fd1fc2add06e0cc2d26b6403feef4a8df18f7dd7f8ac67d";
$cipher = Crypt::Rijndael->new($key, Crypt::Rijndael::MODE_ECB());
print $cipher->decrypt(pack('H*', $crypttext));
# prints "Meet me at 11 o'clock behind the monument."

Or you can switch to a different Perl module that supports more block sizes, e.g., Crypt::Rijndael_PP:

# Same PHP code except using MCRYPT_RIJNDAEL_256
# prints f38469ec9deaadbbf49bb25fd7fc8b76462ebfbcf149a667306c8d1c033232322ee5b83fa87d49e4e927437647dbf7193e6d734242d583157b492347a2b1514c

Perl:

use Crypt::Rijndael_PP ':all';
$key = "abcdefghijklmnopqrstuvwxyz\0\0\0\0\0\0";
$crypttext = "f38469ec9deaadbbf49bb25fd7fc8b76462ebfbcf149a667306c8d1c033232322ee5b83fa87d49e4e927437647dbf7193e6d734242d583157b492347a2b1514c";
print rijndael_decrypt(unpack('H*', $key), MODE_ECB, pack('H*', $crypttext), 256, 256);
# prints "Meet me at 11 o'clock behind the monument."
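To see exactly which bytes the Perl side has to match, here is a small sketch of the \0 padding, assuming the key is padded to 32 bytes as in the Perl examples above. pad_key() and pad_data() are illustrative helpers, not mcrypt functions. mcrypt_encrypt() also pads the data with \0 up to a multiple of the block size, which is why the decrypted text can end in trailing \0 bytes.

```php
<?php
// Sketch: reproduce the NUL ("\0") padding mcrypt applies, so the same
// bytes can be handed to another language's Rijndael implementation.

// Pad a key with "\0" up to $keySize bytes (32 in the examples above).
function pad_key(string $key, int $keySize): string
{
    return str_pad($key, $keySize, "\0");
}

// Pad data with "\0" up to the next multiple of the block size
// (16 bytes for MCRYPT_RIJNDAEL_128, 32 for MCRYPT_RIJNDAEL_256).
function pad_data(string $data, int $blockSize): string
{
    $remainder = strlen($data) % $blockSize;
    if ($remainder === 0) {
        return $data;
    }
    return $data . str_repeat("\0", $blockSize - $remainder);
}
```

The 42-byte message pads to 48 bytes with a 16-byte block and to 64 bytes with a 32-byte block, matching the lengths of the two hex strings above (96 and 128 hex digits).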

How to make nginx virtual directories accessible in PHP?

10 votes

Let's say I have an nginx web server, server.com, with only one PHP file, index.php (there is no directory structure). I want to be able to access anything after server.com as a URL structure, for example server.com/google.com, server.com/yahoo.com.au, etc.

An example would be http://whois.domaintools.com/google.com (if I am right that they don't have a directory /google.com).

Q1: How can I access whatever is after server.com in my index.php?

Q2: Can I even get the protocol from such a URL? For example server.com/http://www.google.com or server.com/https://www.google.com

PS

I really don't know if the term virtual directory is used here correctly. I just want to do what I saw somewhere else.

location / {
    rewrite ^/(.*)$ /index.php?q=$1 last;
}

location = /index.php {
    # Do your normal PHP passing stuff here (fastcgi_pass etc.)
}

Is that what you were looking for?

As for your second question, you can extract the protocol in PHP; nginx doesn't need to do that. To parse the URL, you can use the parse_url() function.
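A sketch of the PHP side, assuming the rewrite rule above so that everything after the host arrives in $_GET['q']. parse_target() is a hypothetical helper. One caveat: nginx's merge_slashes setting is on by default and may collapse the // in http:// to a single slash, so the sketch restores it before calling parse_url().

```php
<?php
// Sketch: split whatever follows server.com/ into URL components.
// Returns parse_url()'s array; 'scheme' is null when no protocol was given.
function parse_target(string $path): array
{
    $path = ltrim($path, '/');
    // Restore "//" after the scheme in case nginx merged the slashes
    $path = preg_replace('#^(https?):/+#i', '$1://', $path);
    $parts = parse_url($path);
    if ($parts !== false && isset($parts['scheme'])) {
        return $parts;
    }
    // No protocol given: parse it as a bare host name, report no scheme
    $parts = parse_url('http://' . $path);
    if ($parts === false) {
        return array();
    }
    $parts['scheme'] = null;
    return $parts;
}

// In index.php:
// $target = parse_target(isset($_GET['q']) ? $_GET['q'] : '');
```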

Best Practice: How to Structure Arrays - Standards and Naming Conventions

10 votes

What is the best practice in multidimensional array structure in terms of what elements hold the iterator vs the detail elements?

The majority of my programming experience (and I mainly do it for fun) comes from following tutorials on Google, so I apologize in advance if this seems an exceptionally daft question, but I do want to start improving my code.

Whenever I have needed to make a multidimensional array, my naming has always placed the counter in the first element.

For example, if I have a single dimensional array as follows:

$myArray['year']=2012;
$myArray['month']='July';
$myArray['measure']=3;
// and so on.

However, if I wanted that same array to keep the history of a few owners, I would add another dimension and format it as follows:

$myArray[$owner]['year']=2012;
$myArray[$owner]['month']='July';
$myArray[$owner]['measure']=3;

Edit: To make sure my example isn't off-putting or misleading, I am basically following this structure:

$myArray[rowOfData][columnOfData]

Now, my question is about accepted convention. Should I instead be doing the following?

$myArray['year'][$owner]=2012;
$myArray['month'][$owner]='July';
$myArray['measure'][$owner]=3;

Edit: using the notation from the edit above, should it be:

$myArray[columnOfData][rowOfData]

I have searched for array naming conventions, but keep hitting articles arguing about whether to name arrays as plurals or not. The way I have been naming them seems more logical, and I think it follows a structure that better resembles an object, i.e. object->secondaryLevel->detail, but for all I know I have been doing it ass-about all this time. As I get more and more into programming, I would prefer to change my habits if they are wrong.

Is there an accepted standard, or is it just anything goes with arrays? If you were looking at code written by someone else, what format would you be expecting? I get that any structure that makes sense or is intuitive is accepted.

Also from an iteration point of view, which one of the following is more intuitive?:

for($i=0;$i<$someNumber;$i++)
{
    echo $myArray[$i]['year'];
    // OR
    echo $myArray['year'][$i];
}

Edit: I did have this post tagged as C# and Java because I wanted to get some opinions beyond just PHP programmers. As arrays are used in so many different languages, I think it would have been good to get some input from programmers in various languages.

Your question is subjective, in that everyone may have a different approach to the situation you describe, and you are wise to even ask how best to name your variables, classes, etc. Sad to say, I spend more time than I care to admit choosing variable names that make sense and satisfy the requirements. My ultimate goal is to write code that is 'self-documenting'. With self-documenting code, you will find it much easier to add features or fix defects as they arise.

Over the years I have come to find that these practices work best for me:

Arrays: Always plural

I do this so loop control structures make more semantic sense, and are easier to work with.

// With a plural array it's easy to access a single element
foreach ($students as $student) {}

// Makes more sense semantically
do {} while (count($students) > 0);

Arrays of objects > deep multi-dimensional arrays

In your example, your arrays started blowing up into 3-element-deep multi-dimensional arrays, and as correct as Robbie's code snippet is, it demonstrates the complexity involved in iterating over multi-dimensional arrays. Instead, I would suggest creating objects, which can be added to an array. Note that the following code is for demonstration only; in real code I always use accessors.

class Owner
{
    public $year;
    public $measure;
    public $month;
}

// Demonstrative hydration 
for ($i = 1 ; $i <= 3 ; $i++) {

    $owner = new Owner();

    $owner->year = 2012;
    $owner->measure = $i;
    $owner->month = rand(1,12);

    $owners[] = $owner;
}

Now, you only need to iterate over a flat array to gain access to the data you need:

foreach ($owners as $owner) {
    var_dump(sprintf('%d.%d: %d', $owner->month, $owner->year, $owner->measure));
}

The cool thing about this array-of-objects approach is how easy it is to add enhancements. What if you want to add an owner name? No problem: simply add the member variable to your class and modify your hydration a bit:

class Owner
{
    public $year;
    public $measure;
    public $month;
    public $name;
}

$names = array('Lars', 'James', 'Kirk', 'Robert');

// Demonstrative hydration 
for ($i = 1 ; $i <= 3 ; $i++) {

    $owner = new Owner();

    $owner->year = 2012;
    $owner->measure = $i;
    $owner->month = rand(1,12);
    $owner->name = $names[array_rand($names)];

    $owners[] = $owner;
}

foreach ($owners as $owner) {
    var_dump(sprintf('%s: %d.%d: %d', $owner->name, $owner->month, $owner->year, $owner->measure));
}

Remember that the above code snippets are just suggestions. If you'd rather stick with deep multi-dimensional arrays, you will have to figure out an element ordering that makes sense to YOU and those you work with. If you think you will have trouble with the setup six months down the road, it is best to implement a better strategy while you have the chance.
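If you do stay with plain arrays, the two orderings from the question hold exactly the same data, so the choice is not irreversible. A small sketch, with rows_to_columns() as an illustrative name, that converts $myArray[$owner]['year'] into $myArray['year'][$owner]:

```php
<?php
// Sketch: turn $rows[$owner][$field] into $columns[$field][$owner].
// For rectangular data (every owner has the same fields), applying it
// again restores the original layout.
function rows_to_columns(array $rows): array
{
    $columns = array();
    foreach ($rows as $owner => $fields) {
        foreach ($fields as $field => $value) {
            $columns[$field][$owner] = $value;
        }
    }
    return $columns;
}

// Iterating the row-first layout keeps each record together:
// foreach ($myArray as $owner => $record) { echo $record['year']; }
```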

PHP: Do something when a session expires

Asked on Sun, 22 Jul 2012 by Linas php
10 votes

So let's say a user did something on my website, for example uploaded some images, and left without logging out and never came back, or came back only after a few months.

My question is: is there some way, for example, to delete his uploaded files after the session has expired, let's say after 30 minutes (keep in mind that the user never reloaded the page)? That would need to run entirely on the server side, without any user interaction at all.

EDIT: Thank you all for your wonderful answers; they gave me quite a few great ideas. I wish I could accept all of your answers :)

One way would be to call

  $thePath = session_save_path();

and iterate over all the saved session files, decoding each one and checking it for the specified timeout property.

Unfortunately, you need to scan the whole directory to find all session files that are older than a defined period of time. You'd use stat() (or filemtime()) to figure out the age of a session file.

On a well-maintained server, each virtual host should have a separate directory for its session data. A not-so-well-maintained one might store all sessions in a single shared directory. Therefore, make sure that you don't read or delete other virtual hosts' session data.
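A sketch of that directory scan, meant to run periodically on the server (for example from a scheduled job), assuming the default files save handler, whose file names start with sess_. find_expired_sessions() and the delete_uploads_for_session() helper in the usage comment are hypothetical; mapping a session ID to a user's uploads is up to your application. session_save_path() may also return an empty string (meaning the system temp directory) or a value with an N; prefix, which this sketch does not handle.

```php
<?php
// Sketch: list the IDs of session files in $dir that have not been
// modified for more than $maxAge seconds.
function find_expired_sessions(string $dir, int $maxAge, ?int $now = null): array
{
    $now = $now ?? time();
    $expired = array();
    $files = glob(rtrim($dir, '/') . '/sess_*');
    foreach ($files ?: array() as $file) {
        // PHP updates the file (or its timestamp) when the session is
        // used, so the modification time approximates the last activity
        $mtime = filemtime($file);
        if ($mtime !== false && $now - $mtime > $maxAge) {
            $expired[] = substr(basename($file), strlen('sess_'));
        }
    }
    return $expired;
}

// foreach (find_expired_sessions(session_save_path(), 1800) as $id) {
//     delete_uploads_for_session($id); // hypothetical helper
// }
```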

Better Approach using a database

Therefore, I propose saving session data to your application's backend database. Using SQL, it would be trivial to find all outdated sessions.

The documentation for session_set_save_handler() provides a sample, which explains this whole process quite nicely based on objects.

Class object not working inside ob_start callback

Asked on Wed, 04 Jul 2012 by qxxx php
9 votes

I don't know why, but this code worked for me a month ago... maybe I upgraded PHP, but I can't remember. I tried this with PHP 5.2.17 and 5.3.6.

Why is it not possible to use a class object inside an ob_start() callback?

<?php
$f=new stdClass();
$f->title="awesome Title";

function callback($buffer) 
{
    global $f;
    $buffer=str_replace("###TITLE###", $f->title, $buffer);
    return $buffer;
}
ob_start("callback");
?>

This is the ###TITLE###

Output is:

PHP Notice:  Trying to get property of non-object in /Users/qxxx/Sites/test/test.php on line 8
This is the 

should be:

This is the awesome Title

This is because the output buffer is being implicitly flushed by the termination of the script.

At this point PHP has already destroyed unreferenced variables, so when it comes to execute your callback function, the variable $f does not exist in the global scope.

You can solve this by explicitly flushing the buffer before shutdown starts destroying objects, by placing the following line somewhere in your script.

register_shutdown_function('ob_end_flush');

Edit:

I'd like to add that even though this is currently the accepted answer that explains the "why", the solution provided here does not address the root cause of the issue; the fact that global is being used.

Many people will tell you that global is evil, without giving a reason why. Here you can see one of the reasons.

The answer provided by Jack gives a more "best practice" solution (using closures to maintain the variable reference), and should be considered as the proper way to avoid using global in new codebases.
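Jack's answer itself is not reproduced here, so this is only a sketch of the closure-based idea: the callback captures the object with use, so it holds its own reference and no longer depends on the global variable still being around when the buffer is flushed at shutdown. make_title_callback() is an illustrative name.

```php
<?php
// Sketch: build the ob_start() callback as a closure that captures the
// page object itself instead of reaching for it with "global".
function make_title_callback(stdClass $page): Closure
{
    return function (string $buffer) use ($page): string {
        return str_replace("###TITLE###", $page->title, $buffer);
    };
}

$f = new stdClass();
$f->title = "awesome Title";

ob_start(make_title_callback($f));
echo "This is the ###TITLE###\n";
```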

Emulating PHP's CLI in a browser

9 votes

I'm considering the idea of a browser-based PHP IDE and am curious about the possibility of emulating the command line through the browser, but I'm not familiar enough with developing tools for the CLI to know if it's something that could be done easily or at all. I'd like to do some more investigation, but so far haven't been able to find very many resources on it.

From a high level, my first instinct is to set up a text input which would feed commands to a PHP script via AJAX and return any output onto the page. I'm just not familiar enough with the CLI to know how to interface with it in that context.

I don't need actual code, though that would be useful too, but I'm looking for more of which functions, classes or APIs I should investigate further. Ideally, I would prefer something baked into PHP (assume PHP 5.3) and not a third-party library. How would you tackle this? Are there any resources or projects I should know about?

Edit: The use case for this would be a localhost or development server, not a public facing site.

Expose a function (called through an RPC or a direct POST from JavaScript) that does the following, in this order:

  • Write the PHP code to a file (with a random name) in a folder (with a random name), where it will sit alone, execute, and then be deleted at the end of execution.
  • The current PHP process must not run the code in that file itself. Instead, it needs permission to execute commands (safe_mode off) so it can start a separate PHP process: exec('php -c /path/to/security_tight/php.ini') (see php -?)
  • Catch any output and send it back to the browser. This way you are protected from whatever errors the user's code produces. Instead of exec I recommend popen, so you can kill the process and manually control how long you wait for it to finish (if you do kill the process, you can easily send an error back to the browser).
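A sketch of those steps with one substitution: it uses proc_open() rather than popen(), because proc_open() returns a process handle that proc_terminate() can kill when the timeout expires. run_isolated() and its parameters are illustrative; the temporary file goes to the system temp directory rather than a randomly named folder, and the array form of the command needs PHP 7.4 or later. Passing null for $iniPath runs the child with -n (no php.ini at all), which is only a stand-in for the locked-down php.ini described here.

```php
<?php
// Sketch: run PHP code in a separate interpreter and kill it on timeout.
function run_isolated(string $phpBinary, string $code, ?string $iniPath, int $timeout = 5): array
{
    // Write the code to a randomly named temporary file
    $file = tempnam(sys_get_temp_dir(), 'ide');
    file_put_contents($file, $code);

    // Use the locked-down php.ini if given, otherwise no php.ini at all
    $cmd = array_merge(
        array($phpBinary),
        $iniPath !== null ? array('-c', $iniPath) : array('-n'),
        array($file)
    );
    $pipes = array();
    $proc = proc_open($cmd, array(1 => array('pipe', 'w'), 2 => array('pipe', 'w')), $pipes);
    if (!is_resource($proc)) {
        unlink($file);
        throw new RuntimeException('Could not start the PHP process');
    }
    stream_set_blocking($pipes[1], false);
    stream_set_blocking($pipes[2], false);

    $output = '';
    $timedOut = false;
    $deadline = microtime(true) + $timeout;
    do {
        usleep(10000);
        // Collect whatever stdout/stderr have produced so far
        $output .= stream_get_contents($pipes[1]) . stream_get_contents($pipes[2]);
        $running = proc_get_status($proc)['running'];
        if ($running && microtime(true) > $deadline) {
            proc_terminate($proc); // runaway script: kill it
            $timedOut = true;
            $running = false;
        }
    } while ($running);

    $output .= stream_get_contents($pipes[1]) . stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    proc_close($proc);
    unlink($file); // the temporary script is deleted after execution
    return array('output' => $output, 'timed_out' => $timedOut);
}
```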

You need lax/normal security (same as the entire IDE backend) for the normal PHP process which runs when called through the browser.

You need strict and paranoid security for the php.ini and php process which runs the temporary script (go ahead and even separate it on another machine which has no network/internet access and has its state reverted to factory every hour just to be sure).

Don't use eval(); it is not suitable for this scenario. An attacker can break out into your application and use your current permissions and variable state against you.

Geo Search Without Distance

9 votes

This is my first look at geo location. Please forgive my long question. I hope it is clear.

I have a database holding records with lat & lng. I am trying to write a PHP class which defines four decimal coordinates that form a boundary box (to then retrieve all records within the box).

I only need to know which locations are within n km of the user location. I do not need the distances from the user to the locations.

I am building on the first part of this answer.

At the core of the above answer is:

10 km in a straight line is:

on the latitude is equal to ~1'(minute)
on the longitude is equal to ~6'(minutes)

Using this as a basis, do some quick math and in your query add to the WHERE clause removing any locations that are outside the 'box' that is created by adding the buffer zone with the assumption of 1' lat & 6' long.

Mmmm, "do some quick math" says Patrick. After a few days, I have still failed to get the maths right.

My class is hanging on in there and seems to work for 10km. However, I am failing to make it work with any other distance.

First I moved to seconds, so 10km is:

on the latitude is equal to 60 seconds
on the longitude is equal to 360 seconds

This worked fine.

However, I then tried 5km is:

on the latitude is equal to 30 seconds
on the longitude is equal to 180 seconds

This did not work.

Can you explain why this approach does not work for 5km (and other distances I have tried)?

Below is my class code. Clearly the switch is rubbish. Any help appreciated.

class coords 
{
/*
    Calculate a bounding box from a decimal coordinate and distance.
    The public vars are populated with minimum and maximum longitudes and latitudes which define the boundary.
*/
    public $dec_lat_min;
    public $dec_lat_max;
    public $dec_lng_min;
    public $dec_lng_max;

    function init($dec_lat, $dec_lng, $dist) 
    { 
        // This switch is a terrible way to allow multiple distances.

        // 10km = 1 min lat = 60 sec lat
        // 10km = 6 min lng = 360 sec lng

        // 5km = 30 sec lat
        // 5km = 180 sec lng

        // 1km = 6 sec lat
        // 1km = 36 sec lng

        // 500m = 3 sec lat
        // 500m = 18 sec lng

        switch($dist)
        {
            case 10: // 10km
                $sec_diff_lat = 60;
                $sec_diff_lng = 360;
                break;
            case 5: // 5km
                $sec_diff_lat = 30;
                $sec_diff_lng = 180;
                break;
            case 1: // 1km
                $sec_diff_lat = 6;
                $sec_diff_lng = 36;
                break;
            default: // 500m
                $sec_diff_lat = 3;
                $sec_diff_lng = 18;
                break;  
        }

        // Convert lat to DMS
        $dms_lat = $this->dec2dms($dec_lat);

        // Allow for southern hemisphere (i.e. negative latitude)
        $dms_lat['hem'] == '-' ? $h = -1 : $h = 1;

        // Populate min and max latitudes
        $this->dec_lat_min = $this->dms2dec($dms_lat['deg'],$dms_lat['min'],$dms_lat['sec']+(-1 * $sec_diff_lat * $h),$dms_lat['hem']);
        $this->dec_lat_max = $this->dms2dec($dms_lat['deg'],$dms_lat['min'],$dms_lat['sec']+($sec_diff_lat * $h),$dms_lat['hem']);

        $dms_lng = $this->dec2dms($dec_lng);

        $dms_lng['hem'] == '-' ? $h = -1 : $h = 1;

        $this->dec_lng_min = $this->dms2dec($dms_lng['deg'],$dms_lng['min'],$dms_lng['sec']+(-1 * $sec_diff_lng * $h),$dms_lng['hem']);
        $this->dec_lng_max = $this->dms2dec($dms_lng['deg'],$dms_lng['min'],$dms_lng['sec']+($sec_diff_lng * $h),$dms_lng['hem']);

    }

    function dec2dms($d) 
    {
        $d = (string)$d;

        // got dashes?
        if ($d[0] == "-") {
            $hem = '-';
            $dVal = substr($d,1);
        } else {
            $hem = '';
            $dVal = $d;
        }

        // degrees = degrees
        $dVals = explode('.', $dVal);
        $dmsDeg = $dVals[0];

        // * 60 = mins
        $dRemainder = ('0.'.$dVals[1]) * 60;
        $dmsMinVals = explode('.', $dRemainder);
        $dmsMin = $dmsMinVals[0];

        // * 60 again = secs
        $dMinRemainder = ('0.'.$dmsMinVals[1]) * 60;
        $dmsSec = round($dMinRemainder);

        return array("deg"=>$dmsDeg,"min"=>$dmsMin,"sec"=>$dmsSec, "hem"=>$hem);
    }

    function dms2dec($deg,$min,$sec,$hem) 
    {
        // find decimal latitude
        $d = $deg+((($min*60)+($sec))/3600);
        $d = $hem.$d;

        return $this->round100000($d);
    }

    function round100000($v) 
    {
        return round($v * 100000) / 100000;
    } 
}

As the other answer noted, you really want to use the Haversine formula, which takes the user's latitude into account.

Haversine Formula: http://en.wikipedia.org/wiki/Haversine_formula
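A sketch of the Haversine formula in PHP, assuming a spherical Earth with a mean radius of 6371 km (an approximation) and locations given as arrays with lat and lng keys in decimal degrees. haversine_km() and within_km() are illustrative names; within_km() matches the original requirement by returning only the locations within $km, without exposing the distances.

```php
<?php
// Sketch: great-circle distance in km between two points in decimal
// degrees, using the Haversine formula on a sphere of radius 6371 km.
function haversine_km(float $lat1, float $lng1, float $lat2, float $lng2): float
{
    $earthRadius = 6371.0;
    $dLat = deg2rad($lat2 - $lat1);
    $dLng = deg2rad($lng2 - $lng1);
    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLng / 2) ** 2;
    // min() guards against rounding pushing sqrt($a) slightly above 1
    return 2 * $earthRadius * asin(min(1.0, sqrt($a)));
}

// Keep only the locations within $km of ($lat, $lng); no distances returned.
function within_km(array $locations, float $lat, float $lng, float $km): array
{
    return array_values(array_filter($locations, function ($loc) use ($lat, $lng, $km) {
        return haversine_km($lat, $lng, $loc['lat'], $loc['lng']) <= $km;
    }));
}
```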

This other answer has a ton of great resources to accomplish this in PHP or SQL:

MySQL Great Circle Distance (Haversine formula)
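Computing the distance for every row can be expensive on a large table, so a common approach is to pre-filter with a bounding box in the WHERE clause and apply the Haversine formula only to the remaining rows. A sketch of such a box under the same spherical-Earth assumption (bounding_box() is an illustrative name). The box scales linearly with the distance: 10 km comes out at about 0.09 degrees of latitude (roughly 5.4 minutes), and the longitude span widens by 1/cos(latitude). It is an approximation for small distances and does not handle boxes that cross a pole or the 180° meridian.

```php
<?php
// Sketch: latitude/longitude bounds of a box reaching $km from a point,
// on a spherical Earth with mean radius 6371 km (an approximation).
function bounding_box(float $lat, float $lng, float $km): array
{
    $earthRadius = 6371.0;
    // One degree of latitude is about the same length everywhere
    $dLat = rad2deg($km / $earthRadius);
    // Degrees of longitude shrink by cos(latitude) away from the equator
    $dLng = rad2deg($km / ($earthRadius * cos(deg2rad($lat))));
    return array(
        'lat_min' => $lat - $dLat,
        'lat_max' => $lat + $dLat,
        'lng_min' => $lng - $dLng,
        'lng_max' => $lng + $dLng,
    );
}

// e.g. WHERE lat BETWEEN :lat_min AND :lat_max
//        AND lng BETWEEN :lng_min AND :lng_max
```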

PHP and the million array baby

9 votes

Imagine you have the following array of integers:

array(1, 2, 1, 0, 0, 1, 2, 4, 3, 2, [...] );

The array goes on for one million entries, only instead of being hardcoded, the integers have been pre-generated and stored in a JSON file of approximately 2 MB. The order of these integers matters: I can't randomly generate them every time, because the array must be consistent and always have the same values at the same indexes.

If this file is read back in PHP afterwards (e.g. using file_get_contents + json_decode), it takes 700 to 900 ms just to get the array back. "Okay," I thought, "that's probably reasonable, since json_decode has to parse about 2 million characters; let's cache it." APC caches it in an entry that takes about 68 MB, which is probably normal, since zvals are large. However, retrieving this array from APC still takes a good 600 ms, which in my eyes is still far too much.

Edit: APC does serialize/unserialize to store and retrieve content which with a million item array is a lengthy and heavy process.

So the questions:

  • Should I expect this latency if I intend to load a one million entries array, no matter the data store or the method, in PHP? As far as I understand APC stores the zval itself, so theoretically retrieving it from APC should be as fast as it can possibly get (no parsing, no conversion, no disk access)

  • Why is APC so slow for something so seemingly simple?

  • Is there any efficient way to load a one million entries array entirely in memory using PHP? assuming RAM usage is not a problem.

  • If I were to access only slices of this array based on indexes (e.g. loading the chunk from index 15 to index 76) and never actually have the entire array in memory (yes, I understand this is the sane way of doing it, but I wanted to know all the sides), what would be the most efficient data store for the complete array? Obviously not an RDBMS; I'm thinking Redis, but I would be happy to hear other ideas.

Say the integers are all 0-15. Then you can store 2 per byte:

<?php
$data = '';
for ($i = 0; $i < 500000; ++$i)
  $data .= chr(mt_rand(0, 255));

echo serialize($data);

To run: php ints.php > ints.ser

Now you have a file with a 500000 byte string containing 1,000,000 random integers from 0 to 15.
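The script above fills the string with random bytes. To pack your existing data instead, a sketch along these lines should work, assuming the decoded JSON is a plain zero-indexed list of integers that really are all in 0-15 (nothing here validates that). pack_nibbles() is an illustrative name and ints.json a placeholder for your JSON file; the nibble order (even index in the high four bits) matches get_data_at() below.

```php
<?php
// Sketch: pack a list of integers in the range 0-15 into a string,
// two values per byte: even index in the high nibble, odd in the low.
function pack_nibbles(array $ints): string
{
    $data = '';
    $count = count($ints);
    for ($i = 0; $i < $count; $i += 2) {
        $high = $ints[$i];
        // Pad an odd-length list with a trailing 0
        $low = ($i + 1 < $count) ? $ints[$i + 1] : 0;
        $data .= chr(($high << 4) | $low);
    }
    return $data;
}

// $data = pack_nibbles(json_decode(file_get_contents('ints.json'), true));
// file_put_contents('ints.ser', serialize($data));
```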

To load:

<?php
$data = unserialize(file_get_contents('ints.ser'));

function get_data_at($data, $i)
{
  $data = ord($data[$i >> 1]);

  return ($i & 1) ? $data & 0xf : $data >> 4;
}

for ($i = 0; $i < 1000; ++$i)
  echo get_data_at($data, $i), "\n";

The loading time on my machine is about .002 seconds.

Of course this might not be directly applicable to your situation, but it will be much faster than a bloated PHP array of a million entries. Quite frankly, having an array that large in PHP is never the proper solution.

I'm not saying this is the proper solution either, but it definitely is workable if it fits your parameters.

Note that if your array had integers in the 0-255 range, you could get rid of the packing and just access the data as ord($data[$i]). In that case, your string would be 1M bytes long.

Finally, according to the documentation of file_get_contents(), PHP will memory-map the file if your OS supports it. If so, your best performance would come from dumping raw bytes to a file and using it like this:

$ints = file_get_contents('ints.raw');
echo ord($ints[25]);

This assumes that ints.raw is exactly one million bytes long.

(PHP) Parsing RegEx string - balancing brackets

8 votes

I'm trying to parse a string in the following format (EBNF, I hope this is right) in PHP:

<exp>      ::= <base>[{<modifier>["!"]"("<exp>")"}]
<base>     ::= <role>[{<modifier><role>}]
<modifier> ::= "&" | "|"
<role>     ::= ["!"]<str>[","<str>]

Where <str> is any string that would pass [a-zA-Z0-9\-]+

The following are examples of patterns that would have to be parsed:

token1
token1&token2
token1|(token2&!token3)
(token1&token2)|(token3&(token4|(!token5,12&token6)))
!(token1&token2|(token3&!token4))|token5,12

I am trying to write a RegEx pattern that would always give me four groups:

  1. The left-most <expression>. From the above example this would be:
    • token1
    • token1
    • token1
    • token1&token2
    • token1&token2|(token3&!token4)
  2. If ["!"] was present. I.e.
    • null
    • null
    • null
    • null
    • !
  3. The <modifier> for the next <expression> (if any). This would be:
    • null
    • &
    • |
    • |
    • |
  4. The remainder of the pattern.
    • null
    • token2
    • token2&!token3
    • token3&(token4|(!token5,12&token6))
    • token5,12

I can parse this provided that the first expression doesn't contain any <modifier>s.

^\(?(!?)([a-zA-Z0-9\-]+)\)?([&|]?)(.*)$

I am stuck at this point. I have tried using lookarounds, however I can't figure out how to ensure that the group is captured when all brackets are balanced. Is this achievable with RegEx or do I need to write code using loops etc. to do this?

As far as I know, it is impossible.

You have a context-free grammar (EBNF is for this type of grammars - Type-2 grammars), which cannot be parsed with regular expressions (which are for regular grammars - Type-3 grammars).

http://en.wikipedia.org/wiki/Chomsky_hierarchy

As an example of something you cannot handle here: the number of opening parentheses. You can only write one regexp for each possible number of them (and there can be infinitely many, right?); otherwise there is no way to tell whether the number of matching closing parentheses is the same. There is no way to count how many characters were matched by a specific quantified part of a regexp (+, *, etc.).
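The question mentions writing code with loops instead; here is a sketch of that route, which finds the matching parenthesis with a depth counter. split_expression() and matching_paren() are illustrative names. It assumes well-formed input per the grammar, treats a leading ! as group 2 whether it precedes a parenthesis or a plain role, and returns the four groups from the question with null for absent parts.

```php
<?php
// Index of the ")" matching the "(" at $open, or -1 if unbalanced.
function matching_paren(string $s, int $open): int
{
    $depth = 0;
    $len = strlen($s);
    for ($i = $open; $i < $len; $i++) {
        if ($s[$i] === '(') {
            $depth++;
        } elseif ($s[$i] === ')' && --$depth === 0) {
            return $i;
        }
    }
    return -1;
}

// Returns array(left, bang, modifier, rest).
function split_expression(string $s): array
{
    $bang = null;
    $pos = 0;
    if ($s !== '' && $s[0] === '!') {
        $bang = '!';
        $pos = 1;
    }
    if (isset($s[$pos]) && $s[$pos] === '(') {
        // Parenthesised expression: take everything up to its matching ")"
        $close = matching_paren($s, $pos);
        if ($close < 0) {
            throw new InvalidArgumentException("Unbalanced parentheses in $s");
        }
        $left = substr($s, $pos + 1, $close - $pos - 1);
        $end = $close + 1;
    } else {
        // Plain role: everything up to the next modifier
        $end = $pos + strcspn($s, '&|', $pos);
        $left = substr($s, $pos, $end - $pos);
    }
    if ($end >= strlen($s)) {
        return array($left, $bang, null, null);
    }
    $modifier = $s[$end];
    $rest = substr($s, $end + 1);
    // Drop one pair of parentheses only if it wraps the whole remainder
    if ($rest !== '' && $rest[0] === '(' && matching_paren($rest, 0) === strlen($rest) - 1) {
        $rest = substr($rest, 1, -1);
    }
    return array($left, $bang, $modifier, $rest);
}
```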

Is it possible to restrict a route for AJAX only?

8 votes

Is it possible to restrict a Symfony 2 route for XHR requests only? I want to declare routes, which are only accessible via AJAX.

I don't want to add extra lines like these to each AJAX-specific action:

if ($request->isXmlHttpRequest()) {
    // do something
} else {
    // do something else
}

I want to define:

  • one rule for AJAX requests
  • one rule for GET/POST requests to the same URL

in order to get around having conditions like above.

My advice would be to define your own router service instead of the default one, extending Symfony\Bundle\FrameworkBundle\Routing\Router, and to override its resolveParameters() method with your own logic for handling additional requirements.

And then, you could do something like this in your routing:

your_route:
    pattern:  /somepattern
    defaults: { somedefaults }
    requirements:
        _request_type:  some_requirement

Stop recursive incestuous child parent relationship in mysql

7 votes

I am programming in PHP / MySQL / Javascript. I have a list of parts which we want to link in a child / parent relationship with no limit on the amount of tiers.

When I am picking from a list of parts to add a child to a parent I limit the list of parts to exclude the parent itself, and any parts which are already children of that parent.

What I have discovered is that I also want to exclude the grandparents of the parent as otherwise we can get an incestuous relationship, which when I display the tree of parts will create an infinite loop.

Not only that, but I cannot allow the child part to be a great-grandparent of the parent, or a great-great-grandparent, etc.

Here is the SQL statement I use currently which I think could also be improved by using LEFT JOIN but I am not skillful enough with SQL at this point.

SELECT * 
FROM sch_part_general 
WHERE (sch_part_general.part_id <> $parentId) 
AND (sch_part_general.part_id NOT IN 
  (SELECT part_id FROM sch_part_mapping WHERE parent_id = $parentId)
)

sch_part_general is a multi column table with all the parts, with part_id as the primary key. sch_part_mapping is a two column mapping table with part_id (child) || parent_id (parent).

Could someone point me in the right direction with the SQL query? I am not keen on using a while loop to create the SQL statement as I think this will be quite inefficient but it is the only way I have considered might work so far.

MySQL doesn't have much (if any) support for hierarchical queries. If you want to stick to what is called the Adjacency List Model, all you can do is add a JOIN for each level you'd like to include. Needless to say, this doesn't scale well.

On the other hand, if you can alter your Database Schema, I would suggest implementing the Nested Set Model.

A very good explanation of the Nested Set Model is presented in Mike Hillyer's blog:

Limitations of the Adjacency List Model

Working with the adjacency list model in pure SQL can be difficult at best. Before being able to see the full path of a category we have to know the level at which it resides.

Nested Set Model

The concept of nested sets in SQL has been around for over a decade, and there is a lot of additional information available in books and on the Internet. In my opinion the most comprehensive source of information on managing hierarchical information is a book called Joe Celko’s Trees and Hierarchies in SQL for Smarties, written by a very respected author in the field of advanced SQL, Joe Celko.
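If changing the schema isn't an option, a different, application-side technique also works with the existing Adjacency List Model: before offering a part as a child, walk up from the parent through all of its ancestors and reject the part if it is found there. This sketch assumes the whole sch_part_mapping table (rows of part_id, parent_id) is small enough to load into PHP; would_create_cycle() is an illustrative name. Because it follows every parent, it also copes with parts that have several parents.

```php
<?php
// Sketch: would making $child a child of $parent create a loop?
// $mapping holds rows of (part_id, parent_id), e.g. from sch_part_mapping.
function would_create_cycle(array $mapping, int $child, int $parent): bool
{
    // Index the rows as part_id => list of its parent_ids
    $parentsOf = array();
    foreach ($mapping as $row) {
        $parentsOf[(int) $row[0]][] = (int) $row[1];
    }

    // Breadth-first walk from $parent up through all of its ancestors
    $queue = array($parent);
    $seen = array();
    while ($queue) {
        $node = array_shift($queue);
        if ($node === $child) {
            return true; // $child is $parent itself or one of its ancestors
        }
        if (isset($seen[$node])) {
            continue;
        }
        $seen[$node] = true;
        foreach ($parentsOf[$node] ?? array() as $ancestor) {
            $queue[] = $ancestor;
        }
    }
    return false;
}

// $mapping = $pdo->query('SELECT part_id, parent_id FROM sch_part_mapping')
//                ->fetchAll(PDO::FETCH_NUM);
```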

Mysql Regular Expression search with no repeating characters

6 votes

I have a database table with words from a dictionary.

Now I want to select words for an anagram. For example if I give the string SEPIAN it should fetch values like apes, pain, pains, pies, pines, sepia, etc.

For this I used the query

SELECT * FROM words WHERE word REGEXP '^[SEPIAN]{1,6}$'

But this query also returns words like anna and essen, which use some characters more often than the supplied string contains them. For example, anna has two n's, but there is only one n in the search string SEPIAN.

How can I write my regular expression to achieve this? Also, if my search string contains repeated characters, those characters should be allowed to repeat the same number of times in the results.

Since MySQL does not support back-referencing capturing groups, the typical solution of (\w).*\1 will not work. This means that any solution given will need to enumerate all possible doubles. Furthermore, as far as I can tell back-references are not valid in look-aheads or look-behinds, and look-aheads and look-behinds are not supported in MySQL.

However, you can split this into two expressions, and use the following query:

SELECT * FROM words
WHERE word REGEXP '^[SEPIAN]{1,6}$'
AND NOT word REGEXP 'S.*?S|E.*?E|P.*?P|I.*?I|A.*?A|N.*?N'

Not very pretty, but it works and it should be fairly efficient as well.


To support a set limit of repeated characters, use the following pattern for your secondary expression:

A(.*?A){X,}

Where A is your character and X is the number of times it's allowed.

So if you're adding another N to your string SEPIANN (for a total of 2 Ns), your query would become:

SELECT * FROM words
WHERE word REGEXP '^[SEPIAN]{1,7}$'
AND NOT word REGEXP 'S.*?S|E.*?E|P.*?P|I.*?I|A.*?A|N(.*?N){2}'
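The two patterns follow a mechanical rule, so they can be generated from any letter pool. Below is a minimal JavaScript sketch; buildAnagramPatterns and matchesPool are hypothetical names, and the sketch assumes the pool contains only the letters A to Z (no regex metacharacters). JavaScript's case-insensitive i flag stands in for MySQL's REGEXP, which is case-insensitive on non-binary strings.

```javascript
// Hypothetical helper: build the two REGEXP patterns from the answer for a
// given letter pool. Assumes the pool contains only the letters A-Z.
function buildAnagramPatterns(letters) {
  const pool = letters.toUpperCase();

  // How many times each distinct letter may appear (Map keeps insertion order).
  const counts = new Map();
  for (const ch of pool) counts.set(ch, (counts.get(ch) || 0) + 1);

  // Pattern 1: only letters from the pool, at most pool.length characters.
  const allowed = `^[${[...counts.keys()].join("")}]{1,${pool.length}}$`;

  // Pattern 2: one alternative per letter, matching a word that uses the
  // letter more often than allowed (n allowed -> n + 1 occurrences).
  const excluded = [...counts]
    .map(([ch, n]) => (n === 1 ? `${ch}.*?${ch}` : `${ch}(.*?${ch}){${n}}`))
    .join("|");

  return { allowed, excluded };
}

// Check a word the same way the SQL does: REGEXP allowed AND NOT REGEXP excluded.
function matchesPool(word, { allowed, excluded }) {
  return new RegExp(allowed, "i").test(word) &&
         !new RegExp(excluded, "i").test(word);
}
```

For SEPIANN this produces exactly the two patterns in the query above, so the same strings can be passed to MySQL as query parameters.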

regexps: variable-length lookbehind-assertion alternatives

6 votes

Is there any implementation of regular expressions that supports variable-length lookbehind assertions?

/(?<!foo.*)bar/

How can I write a regular expression that has the same meaning but uses no lookbehind assertion?

Is there any chance that this type of assertion will be implemented someday?

Update #1

Things are much better than I thought.

(1) There are regular expression implementations that already support variable-length lookbehind assertions.

Python module regex (not standard re, but additional regex module) supports such assertions (and has many other cool features).

>>> import regex
>>> m = regex.search('(?<!foo.*)bar', 'f00bar')
>>> print(m.group())
bar
>>> m = regex.search('(?<!foo.*)bar', 'foobar')
>>> print(m)
None

It was a really big surprise for me that there is something in regular expressions that Python can do and Perl can't. Perhaps there is an "enhanced regular expression" implementation for Perl as well?

(Thanks and +1 to MRAB).

(2) There is a cool feature, \K, in modern regular expressions.

This symbol means that when you make a substitution (and from my point of view the most interesting use case of assertions is substitution), all characters that were matched before \K are left unchanged.

s/unchanged-part\Kchanged-part/new-part/x

That is almost like a look-behind assertion, but not so flexible of course.

More about \K:

As far as I understand, you can't use \K twice in the same regular expression, and you can't specify up to which point you want to "kill" the characters that you've found. That is always up to the beginning of the line.

(Thanks and +1 to ikegami).

My additional questions:

  • Is it possible to specify where the effect of \K should end?
  • What about enhanced regular expression implementations for Perl/Ruby/JavaScript/PHP, something like regex for Python?

Most of the time, you can use \K.

s/(?<=foo.*)bar/moo/s;

would be

s/foo.*\Kbar/moo/s;

and what you mean by

s/(?<!foo.*)bar/moo/s;

can be written as

s/^(?:(?!foo).)*\Kbar/moo/s;

If you're just matching, you don't even need the \K.

/foo.*bar/s

/^(?:(?!foo).)*bar/s

(?:(?!STRING).) is to (?:STRING) as [^CHAR] is to CHAR.
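For comparison, here is the same idea in JavaScript, which has no \K. This is a sketch with hypothetical function names: a capture group ($1) puts back the text before "bar", and [\s\S] stands in for . under Perl's /s flag. JavaScript engines that implement ES2018 lookbehind (for example V8 in Node.js) also accept the variable-length form directly, so both versions are shown side by side.

```javascript
// Emulation without lookbehind: only characters that do not start "foo" may
// precede "bar". The lazy *? makes a single replace hit the first such "bar".
const emulated = /^((?:(?!foo)[\s\S])*?)bar/;

// Variable-length negative lookbehind, accepted by ES2018+ engines.
const lookbehind = /(?<!foo[\s\S]*)bar/;

function replaceWithEmulation(input) {
  // $1 re-inserts the untouched prefix, playing the role of \K.
  return input.replace(emulated, "$1moo");
}

function replaceWithLookbehind(input) {
  return input.replace(lookbehind, "moo");
}
```

The lazy *? is deliberate. With a greedy *, the engine first consumes everything up to the first "foo" and then backtracks, so a single (non-global) replacement changes the last qualifying "bar" rather than the first.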

Why use MySQL DB Caching?

6 votes

I am currently in the process of developing two iOS applications which heavily rely on MySQL databases. They each have their own API which is requested by the respective application, which runs relevant queries requesting data from the MySQL databases.

The queries range from simple, user- or 'object'-based ones:

SELECT `username`, `id`, `full_name` FROM `users` WHERE `id` = 1
INSERT INTO `users` (`full_name`, `username`, `email`, `password`, `signup_method`, `latitude`, `longitude`) VALUES (?, ?, ?, ?, ?, ?, ?)
SELECT q.*, (SELECT COUNT(a.qid) FROM answers as a WHERE qid=q.id) AS a_count FROM questions as q ORDER BY a_count DESC LIMIT 1, 10

to location based:

SELECT ( 6371 * acos( cos( radians(?) ) * cos( radians( latitude ) ) * cos( radians( longitude ) - radians(?) ) + sin( radians(?) ) * sin( radians( latitude ) ) ) ) AS distance FROM `users` HAVING distance <= 5 ORDER BY points DESC

SELECT * , (6371 * acos(cos(radians(latitude)) * cos(radians({$values['latitude']})) * cos(radians({$values['longitude']}) - radians(longitude)) + sin(radians(latitude)) * sin(radians({$values['latitude']})))) AS distance FROM `questions` HAVING distance <= ? ORDER by distance LIMIT ?,?

These queries obviously take time, especially the latter, because they are so computationally intensive.

Many services use caching layers alongside their databases to improve performance. E.g:

  • Memcached
  • Redis
  • and more.

My question is: when, with regard to queries, should caching be used, and what are its benefits?

Thanks,

Max!

You should cache simply when it's cheaper to cache than it is to generate the results from scratch.

This cost depends on things like:

  • processing power of various servers and software. Maybe you have limited capacity on your db server, but excess capacity on another server.
  • money: is it cheaper to buy more powerful hardware than to build a cache system?
  • CPU-cost of generating the results from scratch vs. RAM-cost of cache. Most often, DB-servers are CPU-bound, while cache-servers are memory-bound. It's for you to decide which is cheaper to upgrade in your case.
  • speed of retrieving from cache vs. speed of retrieving from db. If, as you say, the queries are time-expensive, and getting them from a cache is cheaper, caching will speed up your requests.
  • how often your cached items need to be refreshed. If they only last for seconds, it may not be worth the hassle.
  • having a method to expire and refresh cached items. This is often a very hard problem.
  • having the technical knowledge and time to manage the additional complexity.

But always start at the source. Have you examined MySQL's slow query log to see which queries are costly? It can show you where you're missing important indexes and which queries take unexpectedly long. pt-query-digest from the Percona Toolkit can help by summarizing this log file. Optimize your databases before you start caching.

Looking at your types of queries, it seems to me that caching the results and even pre-heating the cache is well worth it.

The choice of cache is an important one of course. I assume you're already using MySQL's built-in query-cache? Make sure it's enabled and that it has enough memory assigned to it. Simple queries like the 'SELECT username' one are cheap anyway, but are also easily cached by MySQL itself. There are a lot of limits to built-in query-caching though, and a lot of reasons that queries are not cached or caches are flushed. For example, queries with functions (like your location-based queries) are simply skipped. Read the docs.

Using a cache like Redis allows for far more control over what to cache, for how long, and how to expire it. There are many ideas on how to implement this and they depend on your application as well. Have a look around the net.
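As an illustration of that control, here is a minimal sketch of the cache-aside pattern in JavaScript. A Map stands in for Redis or Memcached, each entry carries its own expiry time, and on a miss the loader (standing in for the expensive MySQL query) runs once and its result is stored. TtlCache and cached are hypothetical names, and a real loader would be asynchronous, which this synchronous sketch leaves out to stay short.

```javascript
// Hypothetical sketch: a tiny in-memory cache with a per-entry TTL.
// The Map stands in for Redis/Memcached; `now` is injectable for testing.
class TtlCache {
  constructor(now = () => Date.now()) {
    this.entries = new Map();
    this.now = now;
  }

  get(key) {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key, value, ttlMs) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  delete(key) {
    this.entries.delete(key); // explicit invalidation, e.g. after an UPDATE
  }
}

// Cache-aside: return the cached value, or run `loader` (the expensive query)
// and remember its result for `ttlMs` milliseconds.
// Note: a loader that returns undefined is never cached in this sketch.
function cached(cache, key, ttlMs, loader) {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = loader();
  cache.set(key, value, ttlMs);
  return value;
}
```

Expiry is the hard part mentioned above: code that writes a user row would call cache.delete() for the affected key, or the application accepts slightly stale data for the length of the TTL.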

I'd suggest enabling the query cache, simply because it's easy and cheap and will help a bit, and I'd definitely look at implementing an in-memory caching layer for your database. An indexing server like Solr, which has built-in support for location-based queries, may also be worth considering. We use it together with MySQL.

Memcached and Redis are good choices for caching. I'd personally pick Redis because it has more use cases and optional persistence to disk, but that's entirely up to you. Maybe your framework of choice has some existing components that you can use in your application.

Another tip: measure everything. You only know what to optimize or cache if you know what takes time. Also, the results of your optimizations will only be clear if you measure again. Implement something like statsd and measure the various events and timings in your application. Better too much than not enough. Graph the results and analyze them over time. You'll be surprised what turns up.
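Before wiring up statsd, a tiny in-process timer is enough to start following the "measure everything" advice. This JavaScript sketch is illustrative only: timings, timed and slowestFirst are hypothetical names, the numbers stay in memory instead of being sent to statsd, and it relies on the standard performance.now() clock (available in browsers and recent Node.js).

```javascript
// Hypothetical sketch: record how long each named operation takes.
const timings = new Map(); // name -> array of durations in milliseconds

function timed(name, fn) {
  const start = performance.now();
  try {
    return fn();
  } finally {
    // Runs even if fn throws, so failed calls are measured too.
    const elapsed = performance.now() - start;
    if (!timings.has(name)) timings.set(name, []);
    timings.get(name).push(elapsed);
  }
}

// Summarize: operations with the largest total time first.
function slowestFirst() {
  return [...timings]
    .map(([name, samples]) => ({
      name,
      count: samples.length,
      totalMs: samples.reduce((sum, ms) => sum + ms, 0),
    }))
    .sort((a, b) => b.totalMs - a.totalMs);
}
```

Sorting by total rather than average time highlights cheap queries that run very often, which are frequently the best caching candidates.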

PHP/JS: Echoing several variables without losing their value

5 votes

The JS/Ajax function I have built submits without a button click or page refresh. It gets the values of the input fields, and PHP echoes out the results. But every time a variable is echoed, the next one erases the value of the previous one. How can I avoid this?

JS

<script>
$(document).ready(function() {
  var timer = null;
  var dataString;

  function submitForm() {
    $.ajax({
      type: "POST",
      url: "index.php",
      data: dataString,
      success: function(result) {
        $('#special').html('<p>' + $('#resultval', result).html() + '</p>');
      }
    });
    return false;
  }

  $('#contact_name').on('keyup', function() {
    clearTimeout(timer);
    timer = setTimeout(submitForm, 50);
    var name = $("#contact_name").val();
    dataString = 'name=' + name;
  });

  $('#email').on('keyup', function() {
    clearTimeout(timer);
    timer = setTimeout(submitForm, 50);
    var name = $("#email").val();
    dataString = 'name=' + name;
  });

  $('#phone').on('keyup', function() {
    clearTimeout(timer);
    timer = setTimeout(submitForm, 50);
    var name = $("#phone").val();
    dataString = 'name=' + name;
  });

  $('#address').on('keyup', function() {
    clearTimeout(timer);
    timer = setTimeout(submitForm, 50);
    var name = $("#address").val();
    dataString = 'name=' + name;
  });

  $('#website').on('keyup', function() {
    clearTimeout(timer);
    timer = setTimeout(submitForm, 50);
    var name = $("#website").val();
    dataString = 'name=' + name;
  });
});
</script>

HTML/PHP

<form action="" method="post" enctype="multipart/form-data" id="contact_form" name="form4">
  <div class="row">
    <div class="label">Contact Name *</div> <!-- end .label -->
    <div class="input">
      <input type="text" id="contact_name" class="detail" name="contact_name" value="<?php echo htmlspecialchars($contact_name); ?>" />
      <div id="special"><span id="resultval"></span></div>
    </div><!-- end .input-->
  </div><!-- end .row -->
  <div class="row">
    <div class="label">Email Address *</div> <!-- end .label -->
    <div class="input">
      <input type="text" id="email" class="detail" name="email" value="<?php echo htmlspecialchars($email); ?>" />
      <div id="special"><span id="resultval"></span></div>
    </div><!-- end .input-->
  </div><!-- end .row -->
</form>

You can use the append() method:

success: function(result){
        $('#special').append('<p>' + result + '</p>');
}

As you have given the inputs the same class, you can shorten your code:

$('.detail').on('keyup', function() {
  clearTimeout(timer);
  var name = $(this).val();
  dataString = 'name=' + encodeURIComponent(name);
  timer = setTimeout(submitForm, 50);
});

Note that IDs must be unique, and requesting data from the server on every keystroke is not efficient.
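The clearTimeout/setTimeout pair in both snippets is a debounce: the request only goes out once the user pauses typing. Factoring it into a helper keeps each handler short. This is a sketch; debounce is a hypothetical name, and the schedule/cancel parameters exist only so the behaviour can be tested without real timers.

```javascript
// Hypothetical helper: postpone `fn` until `waitMs` have passed without
// another call. Each new call cancels the pending one, like clearTimeout above.
function debounce(fn, waitMs,
                  schedule = (cb, ms) => setTimeout(cb, ms),
                  cancel = (handle) => clearTimeout(handle)) {
  let handle = null;
  return function (...args) {
    if (handle !== null) cancel(handle);
    // The arrow function keeps the caller's `this` (e.g. the input element).
    handle = schedule(() => {
      handle = null;
      fn.apply(this, args);
    }, waitMs);
  };
}
```

In the answer's code, the handler could then become $('.detail').on('keyup', debounce(function () { dataString = 'name=' + encodeURIComponent($(this).val()); submitForm(); }, 50)), which reads the field value when the timer fires rather than on every keystroke.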

Push data to page without checking periodically for it?

4 votes

Is there any way you can push data to a page rather than checking for it periodically?

Obviously you can check for it periodically with ajax, but is there any way you can force the page to reload when a php script is executed?

Theoretically, you can improve an Ajax request's speed by having a table just for signalling when the Ajax function is supposed to execute (update a value in the table when the Ajax function should retrieve new data from the database). But this still requires a sizable amount of memory and a MySQL connection, as well as some waiting time while the query executes, even when there isn't an update and you don't want to run the Ajax function that retrieves database data.

Is there any way to either make this even more efficient than querying a database and checking the table that stores the 'if updated' data OR tell the ajax function to execute from another page?

I guess Node.js or HTML5 WebSockets could be a viable solution as well?

Or you could store 'if updated' data in a text file? Any suggestions are welcome.

You're basically talking about notifying the client (i.e. browser) of server-side events. It really comes down to two things:

  1. What web server are you using? (are you limited to a particular language?)
  2. What browsers do you need to support?

Your best option is to use WebSockets; anything else is a hack. Still, many "hacks" work just fine. I suggest you try Comet or AJAX long-polling.

There are projects, such as Atmosphere, that provide a solution suited to the web server you are using and automatically pick the best option for the user's browser.

If you aren't limited by browsers and can pick your web stack, I suggest using Socket.IO + Node.js. That's just my preference right now; WebSockets are still in their infancy, and things will get interesting once they develop further. Sometimes my entire application isn't suited to Node.js, so I offload only the data operations to it.
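AJAX long-polling comes down to the server holding a request open until there is something new, so nothing has to query the database on a timer. Here is a minimal JavaScript sketch of that bookkeeping, assuming a single server process. LongPollHub, wait and publish are hypothetical names, and the HTTP wiring is left out: a request handler would call wait() with the last version the browser saw, and whatever code writes new data would call publish().

```javascript
// Hypothetical sketch of the server side of long-polling.
class LongPollHub {
  constructor() {
    this.version = 0;   // incremented on every publish
    this.latest = null; // most recent payload
    this.waiting = [];  // callbacks of requests parked until the next publish
  }

  // `respond` receives { version, data }. If the client is already behind,
  // answer immediately; otherwise park the request.
  wait(sinceVersion, respond) {
    if (this.version > sinceVersion) {
      respond({ version: this.version, data: this.latest });
    } else {
      this.waiting.push(respond);
    }
  }

  // Answer every parked request at once with the new data.
  publish(data) {
    this.version += 1;
    this.latest = data;
    const parked = this.waiting;
    this.waiting = [];
    for (const respond of parked) respond({ version: this.version, data });
  }
}
```

A real server would also give each parked request a timeout after which it is answered with no data, so the browser can simply reconnect.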

Good luck.

jQuery $.post Syntax

4 votes

I am using PhpStorm to edit a file. This code breaks the page:

$("#delete_all_button").click(function(){
    var oTT = TableTools.fnGetInstance( 'pickup_list_all' );
    var selectedRows = oTT.fnGetSelectedData();

    var first = selectedRows[0][selectedRows.length-1];
    $.post("delete.php", {'claimID': first}, function(data){
        console.log(data);
    });
});

Specifically, the colon between claimID and first. When I hover over the red squiggly line underneath the colon, the editor tells me "} expected". When I try to load the page, I get no error in the console and the page is just white.

Another important thing to note is that when I copy previously working post code from another file which has no errors in the code to this file, the errors appear.

What could be the problem? Libraries?

I have imported jquery with the following line:

<script type="text/javascript" charset="utf-8" src="js/jquery-1.7.2.min.js"></script>

Thanks!

The data parameter of jQuery.post() expects "A map or string that is sent to the server with the request." You could try changing the second parameter to JSON.stringify({'claimID': first}), or wrapping it in square brackets to make it an array: [{'claimID': first}].

How do I get the value after hash (#) from a URL using jquery

4 votes

Example:

www.site.com/index.php#hello

I want to put the value "hello" in a variable

var type= ....

using jquery

No need for jQuery:

var type = window.location.hash.slice(1);
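If the URL is a string rather than the current page's address, the standard URL class does the same job. A small sketch; getHashValue is a hypothetical name, the input must be an absolute URL (with a scheme, unlike the www.site.com example), and percent-encoded characters in the fragment are decoded.

```javascript
// Return the part after "#" for an absolute URL string, decoded.
function getHashValue(url) {
  const hash = new URL(url).hash; // e.g. "#hello", or "" if there is none
  const raw = hash.slice(1);      // drop the leading "#"
  try {
    return decodeURIComponent(raw); // "caf%C3%A9" -> "café"
  } catch (e) {
    return raw; // malformed escape such as a lone "%": return it undecoded
  }
}
```

For the example above, getHashValue("http://www.site.com/index.php#hello") returns "hello".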