Best php questions in March 2012

57 votes

If I try this:

$l = 0;

echo $l + ++$l;
echo PHP_EOL;
echo $l;

I get this output:

2
1

DEMO: http://codepad.org/ncVuJtJu

Why is that?

I expect to get this as an output:

1
1

My reasoning:

$l = 0;  // l === 0

echo $l + ++$l; // (0) + (0+1) === 1
echo PHP_EOL;
echo $l; // l === 1

But why isn't that the output???

All the answers explaining why you get 2 and not 1 are actually wrong. According to the PHP documentation, mixing + and ++ in this manner is undefined behavior, so you could get either 1 or 2. Switching to a different version of PHP may change the result you get, and it would be just as valid.

See example 1, which says:

// mixing ++ and + produces undefined behavior
$a = 1;
echo ++$a + $a++; // may print 4 or 5

Notes:

  1. Operator precedence does not determine the order of evaluation. Operator precedence only determines that the expression $l + ++$l is parsed as $l + (++$l), but doesn't determine if the left or right operand of the + operator is evaluated first. If the left operand is evaluated first, the result would be 0+1, and if the right operand is evaluated first, the result would be 1+1.

  2. Operator associativity also does not determine order of evaluation. That the + operator has left associativity only determines that $a+$b+$c is evaluated as ($a+$b)+$c. It does not determine in what order a single operator's operands are evaluated.

Also relevant: On this bug report regarding another expression with undefined results, a PHP developer says: "We make no guarantee about the order of evaluation [...], just as C doesn't. Can you point to any place on the documentation where it's stated that the first operand is evaluated first?"
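To sidestep the ambiguity altogether, the read and the increment can be split into separate statements, so the result no longer depends on which operand PHP happens to evaluate first. A minimal sketch ($left and $right are illustrative names, not part of the question):

```php
<?php
$l = 0;

// Read the left operand in its own statement, so its value is fixed
// before the increment happens.
$left = $l;      // 0
$right = ++$l;   // $l becomes 1, $right is 1

echo $left + $right; // prints 1
echo PHP_EOL;
echo $l;             // prints 1
```

Written this way, the output matches the reasoning in the question on any PHP version.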

Behaviour of is_callable on '/'

26 votes

My colleague and I have encountered some rather odd behaviour. Our environments are Ubuntu 11.10, PHP 5.3.6-13ubuntu3.6 with Suhosin-Patch, and Windows 7 PHP 5.3.5.

On our machines, the following code runs as one would expect:

<?php
function t() { }
var_dump(is_callable('/'));

With the output:

bool(false)

On one of our servers, CentOS release 5.7 (Final), PHP 5.3.8, the same code produces:

bool(true)

Without the t() function, is_callable performs as expected. Note that is_function behaves the same as is_callable in these tests.

Does anyone have any idea what could be causing this?

Edit:

It seems to happen only when a function named t is present; with any other name, like b or c, the output is as expected.

Edit - testing with more characters:

<?php
function t() { }
foreach(str_split('/abcdefghijkmnopqrstuvwxyz023456789ABCDEFGHIJKLMNOPQRSTUVXYZ!@#$%^&*()-_+=`~;:[]{}\\|\'"?.>,<') as $character) {
    if (is_callable($character)) var_dump($character, is_callable($character));
}

Outputs the following on the server:

string(1) "/"
bool(true)
string(1) "t"
bool(true)
string(1) "T"
bool(true)
string(1) "_" // gettext
bool(true)
string(1) ":" // With the t() function undefined, this remains callable on the server
bool(true)

On our environments, the output is as expected:

string(1) "t"
bool(true)
string(1) "T"
bool(true)

Edit - more information on cbuckley's comment:

<?php 
ini_set('display_errors', 1);
error_reporting(E_ALL);
function t() { }
$v = '/'; $v();

Produces output: Call to undefined function /()

As a workaround, you could try this:

$name = '/';
$actual = null;
if (is_callable($name, false, $actual) && $name === $actual) {
    // Method is actually callable
}
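The same check can be wrapped in a small helper. This is only a sketch: the function name isExactFunctionName is made up for illustration, and the extra function_exists() test is an added assumption about what "actually callable" should mean for a plain function name.

```php
<?php
// Hypothetical helper: true only when $name is a string naming a defined
// function and is_callable() reports that same name back.
function isExactFunctionName($name)
{
    $actual = null;

    return is_string($name)
        && is_callable($name, false, $actual)
        && $name === $actual
        && function_exists($name);
}

function t() { }

var_dump(isExactFunctionName('t')); // bool(true)
var_dump(isExactFunctionName('/'));
```

On a build that behaves normally the second call should report false; whether it also does so on the affected server is something you would have to verify there.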

glob() can't find file names with multibyte characters on Windows?

18 votes

I'm writing a file manager and need to scan directories and deal with renaming files that may have multibyte characters. I'm working on it locally on Windows/Apache PHP 5.3.8, with the following file names in a directory:

  • filename.jpg
  • имяфайла.jpg
  • file件name.jpg
  • פילענאַמע.jpg
  • 文件名.jpg

Testing on a live UNIX server worked fine. Testing locally on Windows using glob('./path/*') returns only the first one, filename.jpg.

Using scandir(), the correct number of files is returned at least, but I get names like ?????????.jpg (note: those are regular question marks, not the � character).

I'll end up needing to write a "search" feature to search recursively through the entire tree for filenames matching a pattern or with a certain file extension, and I assumed glob() would be the right tool for that, rather than scan all the files and do the pattern matching and array building in the application code. I'm open to alternate suggestions if need be.

Assuming this was a common problem, I immediately searched Google and Stack Overflow and found nothing even related. Is this a Windows issue? PHP shortcoming? What's the solution: is there anything I can do?

Addendum: Not sure how related this is, but file_exists() is also returning FALSE for these files, passing in the full absolute path (using Notepad++, the php file itself is UTF-8 encoding no BOM). I'm certain the path is correct, as neighboring files without multibyte characters return TRUE.

EDIT: glob() can find a file named filename-äöü.jpg. Previously in my .htaccess file, I had AddDefaultCharset utf-8, which I didn't consider before. filename-äöü.jpg was printing as filename-���.jpg. The only effect removing that htaccess line seemed to have was now that file name prints normally.

I've deleted the .htaccess file completely, and this is my actual test script in its entirety (I changed a couple of file names from the original post):

print_r(scandir('./uploads/')); 
print_r(glob('./uploads/*'));

Output locally on Windows:

Array
(
    [0] => .
    [1] => ..
    [2] => ??? ?????.jpg
    [3] => ???.jpg
    [4] => ?????????.jpg
    [5] => filename-äöü.jpg
    [6] => filename.jpg
    [7] => test?test.jpg
)
Array
(
    [0] => ./uploads/filename-äöü.jpg
    [1] => ./uploads/filename.jpg
)

Output on remote UNIX server:

Array
(
    [0] => .
    [1] => ..
    [2] => filename-äöü.jpg
    [3] => filename.jpg
    [4] => test이test.jpg
    [5] => имя файла.jpg
    [6] => פילענאַמע.jpg
    [7] => 文件名.jpg
)
Array
(
    [0] => ./uploads/filename-äöü.jpg
    [1] => ./uploads/filename.jpg
    [2] => ./uploads/test이test.jpg
    [3] => ./uploads/имя файла.jpg
    [4] => ./uploads/פילענאַמע.jpg
    [5] => ./uploads/文件名.jpg
)

Since this is a different server, regardless of platform - configuration could be different so I'm not sure what to think, and I can't fully pin it on Windows yet (could be my PHP installation, ini settings, or Apache config). Any ideas?

It looks like the glob() function depends on how your copy of PHP was built and whether it was compiled with a Unicode-aware Win32 API (I don't believe the standard build is).

Cf. http://www.rooftopsolutions.nl/blog/filesystem-encoding-and-php

Excerpt from comments on the article:

Philippe Verdy 2010-09-26 8:53 am

The output from your PHP installation on Windows is easy to explain : you installed the wrong version of PHP, and used a version not compiled to use the Unicode version of the Win32 API. For this reason, the filesystem calls used by PHP will use the legacy "ANSI" API and so the C/C++ libraries linked with this version of PHP will first try to convert your UTF-8-encoded PHP string into the local "ANSI" codepage selected in the running environment (see the CHCP command before starting PHP from a command line window).

Your version of Windows is MOST PROBABLY NOT responsible of this weird thing. Actually, this is YOUR version of PHP which is not compiled correctly, and that uses the legacy ANSI version of the Win32 API (for compatibility with the legacy 16-bit versions of Windows 95/98 whose filesystem support in the kernel actually had no direct support for Unicode, but used an internal conversion layer to convert Unicode to the local ANSI codepage before using the actual ANSI version of the API).

Recompile PHP using the compiler option to use the UNICODE version of the Win32 API (which should be the default today, and anyway always the default for PHP installed on a server that will NEVER be Windows 95 or Windows 98...)

Then Windows will be able to store UTF-16 encoded filenames (including on FAT32 volumes, even if, on these volumes, it will also generate an aliased short name in 8.3 format using the filesystem's default codepage, something that can be avoided in NTFS volumes).

All that you describe are problems of PHP (incorrect porting to Windows, or incorrect system version identification at runtime) : reread the README files coming with PHP sources explaining the compilation flags. I really think that the makefile on Windows should be able to configure and autodetect if it really needs to use ONLY the ANSI version of the API. If you are compiling it for a server, make sure that the Configure script will effectively detect the full support of the UNICODE version of the Win32 API and will use it when compiling PHP and when selecting the runtime libraries to link.

I use PHP on Windows, correctly compiled, and I absolutely DON'T know the problems you cite in your article.

Let's forget now forever these non-UNICODE versions of the Win32 API (which are using inconsistently the local ANSI codepage for the Windows graphical UI, and the OEM codepage for the filesystem APIs, the DOS/BIOS-compatible APIs, the Console APIs) : these non-Unicode versions of the APIs are even MUCH slower and more costly than the Unicode versions of the APIs, because they are actually translating the codepage to Unicode before using the core Unicode APIs (the situation on Windows NT-based kernels is exactly the reverse from the situation on versions of Windows based on a virtual DOS extender, such as Windows 95/98/ME).

When you don't use the native version of the API, your API call will pass through a thunking layer that will transcode the strings between Unicode and one of the legacy ANSI or CHCP-selected OEM codepages, or the OEM codepage hinted on the filesystem: this requires additional temporary memory allocation within the non-native version of the Win32 API. This takes additional time to convert things before doing the actual work by calling the native API.

In summary: the PHP binary you install on Windows MUST be different depending on if you compiled it for Windows 95/98/SE (or the old Win16s emulation layer for Windows 3.x, which had very minimal support of UTF-8, only to support the Unicode subsets of Unicode used by the ANSI and OEM codepages selected when starting Windows from a DOS extender) or if it was compiled for any other version of Windows based on the NT kernel.

The best proof that this is a problem of PHP and not Windows, is that your weird results will NOT occur in other languages like C#, Javascript, VB, Perl, Ruby... PHP has a very bad history in tracking versions (and too many historical source code quirks and wrong assumptions that should be disabled today, and an inconsistent library that has inherited all those quirks initially made in old versions of PHP for old versions of Windows that are even no longer officially supported, by Microsoft or even by PHP itself !).

In other words : RTFM ! Or download and install a binary version of PHP for Windows precompiled with the correct settings : I really think that PHP should distribute Windows binaries already compiled by default for the Unicode version of the Win32 API, and using the Unicode version of the C/C++ libraries : internally the PHP code will convert its UTF-8 strings to UTF-16 before calling the Win32 API, and back from UTF-16 to UTF-8 when retrieving Win32 results, instead of converting PHP's internal UTF-8 strings back/to the local OEM codepage (for the filesystem calls) or the local ANSI codepage (for all other Win32 APIs, including the registry or process).
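For the recursive search mentioned in the question, SPL's directory iterators are one alternative to glob(). The sketch below filters by file extension; findByExtension is a hypothetical helper name, and on a PHP build that uses the ANSI file APIs it would presumably suffer from the same file name mangling, so it is not a fix for the encoding problem itself.

```php
<?php
// Hypothetical helper: walk $root recursively and collect the paths of
// files whose extension matches $extension (case-insensitive).
function findByExtension($root, $extension)
{
    $matches = array();

    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($iterator as $file) {
        // Each $file is an SplFileInfo object.
        if ($file->isFile()
            && strtolower($file->getExtension()) === strtolower($extension)) {
            $matches[] = $file->getPathname();
        }
    }

    sort($matches);

    return $matches;
}
```

Calling findByExtension('./uploads', 'jpg') would then return a sorted list of matching paths, built without any pattern matching in your own loop beyond the extension comparison.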

Node.js vs PHP processing speed

15 votes

I've been looking into node.js recently and wanted to see a true comparison of processing speed for PHP vs Node.js. In most of the comparisons I had seen, Node trounced Apache/PHP set ups handily. However all of the tests were small 'hello worlds' that would not accurately reflect any webpage's markup.

So I decided to create a basic HTML page with 10,000 hello world paragraph elements. In these tests Node with Cluster was beaten to a pulp by PHP on Nginx utilizing PHP-FPM. So I'm curious if I am misusing Node somehow or if Node is really just this bad at processing power.

Note that my results were equivalent outputting "Hello world\n" with text/plain as the HTML, but I only included the HTML as it's closer to the use case I was investigating.

My testing box:

  • Core i7-2600 Intel CPU (has 8 threads with 4 cores)
  • 8GB DDR3 RAM
  • Fedora 16 64bit
  • Node.js v0.6.13
  • Nginx v1.0.13
  • PHP v5.3.10 (with PHP-FPM)

My test scripts:

Node.js script

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork workers.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('death', function (worker) {
    console.log('worker ' + worker.pid + ' died');
  });
}
else {
  // Worker processes have an HTTP server.
  http.Server(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/html'});

    res.write('<html>\n<head>\n<title>Speed test</title>\n</head>\n<body>\n');
    for (var i = 0; i < 10000; i++) {
      res.write('<p>Hello world</p>\n');
    }

    res.end('</body>\n</html>');
  }).listen(80);
}

This script is adapted from Node.js' documentation at http://nodejs.org/docs/latest/api/cluster.html

PHP script

<?php

echo "<html>\n<head>\n<title>Speed test</title>\n</head>\n<body>\n";

for ($i = 0; $i < 10000; $i++) {
  echo "<p>Hello world</p>\n";
}

echo "</body>\n</html>";

My results

Node.js

$ ab -n 500 -c 20 http://speedtest.dev/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking speedtest.dev (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        
Server Hostname:        speedtest.dev
Server Port:            80

Document Path:          /
Document Length:        190070 bytes

Concurrency Level:      20
Time taken for tests:   14.603 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      95066500 bytes
HTML transferred:       95035000 bytes
Requests per second:    34.24 [#/sec] (mean)
Time per request:       584.123 [ms] (mean)
Time per request:       29.206 [ms] (mean, across all concurrent requests)
Transfer rate:          6357.45 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       2
Processing:    94  547 405.4    424    2516
Waiting:        0  331 399.3    216    2284
Total:         95  547 405.4    424    2516

Percentage of the requests served within a certain time (ms)
  50%    424
  66%    607
  75%    733
  80%    813
  90%   1084
  95%   1325
  98%   1843
  99%   2062
 100%   2516 (longest request)

PHP/Nginx

$ ab -n 500 -c 20 http://speedtest.dev/test.php
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking speedtest.dev (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        nginx/1.0.13
Server Hostname:        speedtest.dev
Server Port:            80

Document Path:          /test.php
Document Length:        190070 bytes

Concurrency Level:      20
Time taken for tests:   0.130 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      95109000 bytes
HTML transferred:       95035000 bytes
Requests per second:    3849.11 [#/sec] (mean)
Time per request:       5.196 [ms] (mean)
Time per request:       0.260 [ms] (mean, across all concurrent requests)
Transfer rate:          715010.65 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       1
Processing:     3    5   0.7      5       7
Waiting:        1    4   0.7      4       7
Total:          3    5   0.7      5       7

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      5
  75%      5
  80%      6
  90%      6
  95%      6
  98%      6
  99%      6
 100%      7 (longest request)

Additional details

Again what I'm looking for is to find out if I'm doing something wrong with Node.js or if it is really just that slow compared to PHP on Nginx with FPM.

I certainly think Node has a real niche that it could fit well; however, these test results (which I really hope I made a mistake with, as I like the idea of Node) lead me to believe that it is a horrible choice for even a modest processing load when compared to PHP (let alone JVM or various other fast solutions).

As a final note, I also tried running an Apache Bench test against node with $ ab -n 20 -c 20 http://speedtest.dev/ and consistently received a total test time of greater than 0.900 seconds.

I was suspicious of the 10k+ res.write() calls and suggested replacing them with a single temporary string variable. Each of those calls is comparatively expensive, since the http object is not JavaScript but C++, so every call has to cross from JavaScript into native code. My guess was apparently right.

Buffered res.write()

http.Server(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/html'});

  var buffer = '<html>\n<head>\n<title>Speed test</title>\n</head>\n<body>\n';
  for (var i = 0; i < 10000; i++) {
    buffer += '<p>Hello world</p>\n';
  }

  buffer += '</body>\n</html>';

  res.end(buffer);
}).listen(80);

Final result

$ ab -n 500 -c 20 http://speedtest.dev/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking speedtest.dev (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        
Server Hostname:        speedtest.dev
Server Port:            80

Document Path:          /
Document Length:        190070 bytes

Concurrency Level:      20
Time taken for tests:   0.389 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      95066500 bytes
HTML transferred:       95035000 bytes
Requests per second:    1283.91 [#/sec] (mean)
Time per request:       15.577 [ms] (mean)
Time per request:       0.779 [ms] (mean, across all concurrent requests)
Transfer rate:          238392.49 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0       2
Processing:     1   15  45.3      5     254
Waiting:        1    8  30.7      3     252
Total:          1   15  45.4      5     256

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      7
  75%      8
  80%     10
  90%     13
  95%     25
  98%    233
  99%    254
 100%    256 (longest request)

PHP inheritance, parent functions using child variables

14 votes

While reviewing some PHP code I've discovered a strange thing. Here is the simple example illustration of it:

File A.php:

<?php
class A{
    public function methodA(){
        echo $this->B;
    }
}
?>

File B.php:

<?php
    class B extends A{
        public $B = "It's working!";
    }
?>

File test.php:

<?php
    require_once("A.php");
    require_once("B.php");
    $b = new B();
    $b->methodA();
?>

Running test.php prints out "It's working!", but the question is: why is it working? :) Is this a feature or a bug? Method methodA in class A can also call methods that are in class B, which should not work in OOP.

You're only instantiating class B. Ignore A for the moment, and pretend that methodA() is part of class B.

When class B extends A, it inherits all of A's methods. $this->B isn't resolved until the code actually runs, and at that point $this is an instance of B. Therefore no error occurs, because $this->B exists on that object.
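A small experiment makes the run-time lookup visible. In this sketch methodA() returns a string instead of echoing, and the property_exists() guard is an addition so that calling the method on a plain A object doesn't trigger an undefined-property notice:

```php
<?php
class A
{
    public function methodA()
    {
        // $this is whatever object the method was called on; the property
        // is looked up on that object when this line runs.
        if (property_exists($this, 'B')) {
            return get_class($this) . ': ' . $this->B;
        }

        return get_class($this) . ': no property B';
    }
}

class B extends A
{
    public $B = "It's working!";
}

$b = new B();
echo $b->methodA(), PHP_EOL; // B: It's working!

$a = new A();
echo $a->methodA(), PHP_EOL; // A: no property B
```

The same method body gives different results depending on the concrete class of the object it runs on.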

How can I automatically test my site for SQL injection attacks, using either a script or program?

12 votes

I've searched and found a good discussion here on SO, but it is several years old.

What programs are there, or is there a simple script I can run, to find the SQL injection holes in the URLs in my entire site?

Preferably, I'd like to run a script (PHP) or program that crawls my site, bouncing from link to link, attempting to find holes, and upon discovery, stores that URL so I have a list of URLs I need to fix.

Does this exist?

Yes and no. First, I'll preface this by saying I'm not just posting links: I have done security audits professionally using all of these tools, not as a developer on a project but as an external resource. Note that SQL Server injection generally differs from MySQL injection as well.

Free tools like paros proxy [crawls] (previously mentioned),

burpsuite (previously mentioned [crawls] but active attacks requires pro): http://portswigger.net/burp/

sqlninja (sqlserver only) http://sqlninja.sourceforge.net/

google rat proxy: [crawls] http://code.google.com/p/ratproxy/

websecurify: [crawls] http://www.websecurify.com/

wapiti: [crawls but takes work to set up - can be used specifically for sqli with spider] http://wapiti.sourceforge.net/

nikto: [crawls but not for sqli...]

are great! They can help you identify problems but take a great deal of human analysis due to large amounts of false positives. Commercial tools are available like:

NTOSpider (one of the best [crawls!]) : http://www.ntobjectives.com/software/ntospider

are very expensive but talking to a rep will get you a free copy for a period of time (which I have done with them). They make sorting through results faster by providing validation links in the reports but you STILL need a trained eye and analysis as I have found false positives.

Ultimately the correct answer to this question is: You can use tools to help you identify if there are security (sqli) vulnerabilities but only a trained eye using the tools can validate them. Further only a proper code review and analysis can identify vulnerabilities that an app (even a very good one) may miss.

Tools can help but you need human time and analysis to do this correctly. Proxies and request manglers are the real tools for hitting the app with injection and are done with careful intention of trained testers or those with a curious mind.

Is using a for-loop on submitted POST data in PHP safe?

11 votes

I'm always a worry-wart about security in my PHP applications, and I just (potentially) thought of a way a hacker could kill my script. Currently my application takes form data and submits it as an array to a PHP script via AJAX, then loops through this array.

foreach($_POST['form_data'] as $field => $value){
   //Do something here.
}

However, what if a hacker were to forge an AJAX request, and repeatedly submit the 'form_data' array with 100000000000 random elements? The loop would have to iterate through each element, possibly causing a DoS (or at least slow down service), correct?

I'm not entirely educated here, so I may have some incorrect assumptions. Thanks for any input!

This will not be an issue: PHP limits the maximum number of POST vars using the max_input_vars directive, which defaults to 1000 variables.

This limit is actually enforced to prevent a much more serious type of DOS attack than the one you are thinking about (really, iterating a few thousand array elements is like nothing), namely hash table collision based attacks (often referred to as HashDOS). For more info on that issue see my article Supercolliding a PHP array.
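If you still want a belt-and-braces check in the script itself, you can read the configured limit and refuse oversized arrays before looping. A rough sketch; isWithinLimit and the fallback value of 1000 are assumptions for illustration, not something PHP requires:

```php
<?php
// Hypothetical guard: only iterate arrays with at most $maxElements entries.
function isWithinLimit(array $data, $maxElements)
{
    return count($data) <= $maxElements;
}

// ini_get() returns false when the directive is unknown (older PHP builds),
// which casts to 0, so fall back to 1000 in that case.
$limit = (int) ini_get('max_input_vars');
if ($limit <= 0) {
    $limit = 1000;
}

$formData = isset($_POST['form_data']) && is_array($_POST['form_data'])
    ? $_POST['form_data']
    : array();

if (isWithinLimit($formData, $limit)) {
    foreach ($formData as $field => $value) {
        // Do something here.
    }
}
```

Note that count() here only looks at the top level of the array; nested arrays would need their own check.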

Dependency injection: should I inject everything or use a service locator for some objects?

10 votes

I'm currently refactoring my Zend Framework based PHP library from using a service locator to (constructor) dependency injection (DI). I feel that it improves my code a lot, but I'm not sure if I should inject all dependencies. A service locator seems easier for dependencies which are used a lot and are unspecific. I have the following dependencies which I still access using a service locator:

  1. A Zend_Translate object (I need to translate messages everywhere).
  2. A Zend_Locale object (stores the current language)
  3. A Zend_Config object (a lot of things are configurable by ini-file)
  4. Instances of utility classes (for array and string manipulation)

If I injected these dependencies, they'd clutter my constructors and distract from the specific dependencies. For testing, I can just set up these dependencies in my service locator before running the tests. The pragmatist in me says I'm doing just fine, but the purist says I should go all the way with DI.

Would you recommend DI for these types of objects or not?

When it comes to the concern about cluttering the constructors, it's most likely a code smell that the classes are violating the Single Responsibility Principle. Constructor Injection is very beneficial here because it makes this much more obvious.

Some people also worry about injecting dependencies which are only rarely used, but that's not a problem either. When it comes to creating object graphs, performance is rarely a problem, and even if it is, the Virtual Proxy pattern can fix it.

In short, there's no reason to ever use a Service Locator. There's always a better alternative that involves proper inversion of control.
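As an illustration of constructor injection for a cross-cutting dependency like translation, here is a minimal sketch. The Translator interface, ArrayTranslator, and OrderMailer are hypothetical names, not Zend Framework classes; in the question's setup a Zend_Translate object would play the translator's role.

```php
<?php
interface Translator
{
    public function translate($message);
}

// Simple stand-in implementation backed by an array of messages.
class ArrayTranslator implements Translator
{
    private $messages;

    public function __construct(array $messages)
    {
        $this->messages = $messages;
    }

    public function translate($message)
    {
        // Fall back to the untranslated message when no entry exists.
        return isset($this->messages[$message]) ? $this->messages[$message] : $message;
    }
}

// The dependency is visible in the constructor signature instead of being
// pulled from a global locator inside the method.
class OrderMailer
{
    private $translator;

    public function __construct(Translator $translator)
    {
        $this->translator = $translator;
    }

    public function subject()
    {
        return $this->translator->translate('Your order has shipped');
    }
}

$mailer = new OrderMailer(new ArrayTranslator(array(
    'Your order has shipped' => 'Ihre Bestellung wurde versandt',
)));

echo $mailer->subject(), PHP_EOL; // Ihre Bestellung wurde versandt
```

In a test, the same class can be given a different Translator implementation without touching any global state.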

Negation of Hex in PHP, funny behavior

9 votes

Got some weird behavior I was wondering if someone could clear up for me.

Check it out

$hex = 0x80008000;

print_r(decbin(intval($hex)) . '<br/>');
print_r(decbin($hex));

Outputs

10000000000000001000000000000000
10000000000000001000000000000000

As expected.

But

$hex = 0x80008000;

print_r(decbin(~intval($hex)) . '<br/>');
print_r(decbin(~$hex));

Outputs

1111111111111110111111111111111
1111111111111111111111111111111

Why is the middle bit not switching when $hex is negated?

Gonna give a shot to my own question here.

Yes this is a 32-bit / 64-bit difference.

In 32-bit systems, a float type has to take up two memory spaces to get the required 64 bits. PHP uses double-precision (see http://en.wikipedia.org/wiki/Floating_point#IEEE_754:_floating_point_in_modern_computers).

The $hex evaluates to a float type. Intval and decbin functions convert this into an int type (1st example above)

In the 2nd example we are using the not bitwise operator BEFORE we use decbin. This flips the bits in the two-memory space double-precision float first, and then is converted to int second. Giving us something different than what we expected.

Indeed, if we put the negate inside of the intval() like so:

$hex = 0x80008000;

print_r(decbin(intval(~$hex)) . '<br/>');
print_r(decbin(~$hex));

We get

1111111111111111111111111111111
1111111111111111111111111111111

As output.

I'm not good enough to prove this with math yet (which can be figured out with the help of this article http://en.wikipedia.org/wiki/Double_precision). But maybe when I have time later -_-

I think it's very important to learn how numbers are represented in computers so we can understand anomalies like this and not call them bugs.
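One way to get a predictable 32-bit result is to mask after negating. This sketch assumes a 64-bit PHP build, where 0x80008000 and 0xFFFFFFFF are both plain integers; on a 32-bit build they are floats and the masking would not behave the same way:

```php
<?php
$hex = 0x80008000;

if (PHP_INT_SIZE >= 8) {
    // Flip all bits, then keep only the low 32 bits.
    $negated = ~$hex & 0xFFFFFFFF;

    // %032b pads to 32 binary digits so the leading zero stays visible.
    echo sprintf('%032b', $negated), PHP_EOL; // 01111111111111110111111111111111
} else {
    echo 'This sketch needs 64-bit integers.', PHP_EOL;
}
```

With the padding in place, both the top bit and the middle bit can be seen to have flipped from 1 to 0.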

Is the PDO Library faster than the native MySQL Functions?

9 votes

I have read several questions regarding this but I fear they may be out of date as newer versions of the PDO libraries have been released since these questions were answered.

I have written a MySQL class that builds queries and escapes parameters, and then returns results based on the query. Currently this class is using the built-in mysql functions.

I am well aware of the advantages of using the PDO library, e.g. it is compatible with other databases, stored procedures are easier to execute, etc. However, what I would like to know is simply: is using the PDO library faster than using the built-in mysql functions?

I have just written the equivalent class for MsSQL, so rewriting it to work with all databases would not take me long at all. Is it worth it or is the PDO library slower?

I have found PDO in many situations/projects to be even faster than the more native modules.
This is mainly because many patterns/building blocks in a "PDO-application" require less PHP script-driven code, so more code is executed in the compiled extension, and there is a speed penalty when doing things in the script. Simple, synthetic tests without data and error handling often do not cover this part, which is why (amongst other problems, e.g. measuring inaccuracies) I think "10000x SELECT x FROM foo took 10ms longer" conclusions are missing the point more often than not.
I can't provide you with solid benchmarks, and the outcome depends on how the surrounding application handles the data, but even synthetic tests usually show differences so negligible that you had better spend your time on optimizing your queries, the MySQL server, the network, ... instead of worrying about PDO's raw performance. Let alone security and error handling ...
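To illustrate the point about less script-side code: with PDO, binding a parameter and fetching a whole result column are each a single call into the extension. The sketch below uses an in-memory SQLite database purely so that it is self-contained (an assumption; the question is about MySQL, where mainly the DSN and credentials would differ), and the users table and fetchNamesByPrefix helper are made up for the example:

```php
<?php
// Hypothetical helper: the driver handles the parameter, so no manual
// escaping is needed in the script.
function fetchNamesByPrefix(PDO $pdo, $prefix)
{
    $statement = $pdo->prepare(
        'SELECT name FROM users WHERE name LIKE :pattern ORDER BY name'
    );
    $statement->execute(array(':pattern' => $prefix . '%'));

    // One call returns the whole first column as a flat array.
    return $statement->fetchAll(PDO::FETCH_COLUMN);
}

// Only run the demo when the SQLite PDO driver is installed.
if (in_array('sqlite', PDO::getAvailableDrivers())) {
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE users (name TEXT)');
    $pdo->exec("INSERT INTO users (name) VALUES ('alice'), ('alan'), ('bob')");

    print_r(fetchNamesByPrefix($pdo, 'al'));
}
```

The equivalent with the old mysql functions would need explicit escaping and a fetch loop written in PHP, which is the kind of script-level work the answer is referring to.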

How to correctly fork an open-source library?

9 votes

I'd like to fork an open-source php-library.

It has its own license, in which is written:

You are permitted to use, copy, modify, and distribute the Software and its documentation, with or without modification, for any purpose, provided that the following conditions are met:

And there are some conditions about providing copy of original license agreement, adding copyrights in every source file, etc.

I want to add new features to this library, written under the GPL. Should the whole new product then be under the GPL? Should I include both the GPL and the 'old' license agreements? And should I keep both license copyright notices in every source file?

What was the original license agreement? Your wording makes it unclear whether the original part was GPL or you want to add GPL code to it.

If the original license was GPL, then your new software must also be GPL. There is no way around it unless you get the permission from the author or all authors - if there is more than one. You can still sell your product if it is under GPL, but note that the buyer may 'resell' it with whatever price they find appropriate, including free, as long as license conditions are met. GPL is not a problem when building a website or software that is specific to a client, as long as you are fine with giving the client the rights to modify and republish the software.

But if you want to add GPL stuff to a non-GPL project, then consider using the LGPL license instead. The LGPL allows you to release the component itself under a GPL-like license while not requiring the other software to be GPL or LGPL in return.

System design for matching closest registered store based on zip code?

8 votes

I have the following problem.

I am creating a restaurant delivery system.

So restaurants choose the zip codes they want to deliver to. In Boston, for example, they might choose either all of Boston or Back Bay (a specific area of Boston with several zip codes ...).

Basically, the restaurant confirms the areas they are willing to serve by ticking boxes that are described as follows:

- Cambridge (ZIP CODE)
- Boston (all of Boston)
--- Back Bay (covers zip codes: 02...., 02.., 02..)
--- North Boston (covers zip codes: 02145, 021..., 02..., 02..)

Users type in their zipcodes, and I match them to the areas that Restaurants specified.

What is the best way to design such a system? I don't think I am going in the right direction...

Is this only for Boston, or would it be global? Will you be looking at exact zip code matches? What if someone enters a zip code you don't have but that is within the delivery range? I would recommend using longitude/latitude lookups.

This might be a good place to start: https://developers.google.com/maps/articles/phpsqlsearch

BTW: I'm looking to do something very similar and will most likely use the article referenced above :) thanks for helping me too.
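
The longitude / latitude approach can be sketched roughly as follows. This is only a minimal illustration, assuming each restaurant stores a location and a delivery radius and that the customer's zip code has already been geocoded. The function names, the array keys (lat, lon, radius_km), the restaurant names and the coordinates are all made up for the example.

```php
<?php
// Great-circle distance in kilometres between two points (haversine formula).
function distanceKm(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $earthRadiusKm = 6371.0;
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;

    return 2 * $earthRadiusKm * asin(min(1.0, sqrt($a)));
}

// Keep only the restaurants whose delivery radius covers the customer.
function restaurantsInRange(array $restaurants, float $lat, float $lon): array
{
    return array_values(array_filter(
        $restaurants,
        function (array $r) use ($lat, $lon) {
            return distanceKm($r['lat'], $r['lon'], $lat, $lon) <= $r['radius_km'];
        }
    ));
}

// Hypothetical data: two restaurants and a customer in downtown Boston.
$restaurants = [
    ['name' => 'Back Bay Pizza',  'lat' => 42.3503, 'lon' => -71.0810, 'radius_km' => 3],
    ['name' => 'Cambridge Curry', 'lat' => 42.3736, 'lon' => -71.1097, 'radius_km' => 2],
];
$nearby = restaurantsInRange($restaurants, 42.3601, -71.0589);
```

With many restaurants you would probably push the distance filter into the SQL query, as the linked article does, rather than loop in PHP.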

What OCR options exist beyond Tesseract?

7 votes

I've used Tesseract a bit and its results leave much to be desired. I'm currently detecting very small images (35x15, without a border; I have tried adding one with ImageMagick, with no OCR advantage). They contain 2 to 5 characters in a pretty reliable font; however, the characters are variable enough that simply using an image size checksum or the like is not going to work.

I actually have tried: http://www.free-ocr.co.uk/ and surprisingly it has 100% accuracy. The problem I have with utilizing it is that I cannot rely on another outside service's reliability for this particular use case. I need to be able to control uptime to a higher degree.

What options exist for OCR besides sticking with Tesseract or doing a complete custom training of it? Also, it would be VERY helpful if this were compatible with Heroku style hosting (at least where I can compile the bins and shove them over).

I have successfully used GOCR in the past for small image OCR. I would say accuracy was around 85%, after getting the grayscale options set properly, on fairly regular fonts. It fails miserably when the fonts get complicated and has trouble with multiline layouts.

Also have a look at Ocropus, which is maintained by Google. It's related to Tesseract, but from what I understand, its OCR engine is different. With just the default models included, it achieves near 99% accuracy on high-quality images, handles layout pretty well and provides HTML output with information concerning formatting and lines. However, in my experience, its accuracy is very low when the image quality is not good enough. That being said, training is relatively simple and you might want to give it a try.

Both of them are easily callable from the command line. GOCR usage is very straightforward; just type gocr -h and you should have all the information you need. Ocropus is a bit more tricky; here's a usage example, in Ruby:

require 'fileutils'
tmp = 'directory'
file = 'file.png'

`ocropus book2pages #{tmp}/out #{file}`
`ocropus pages2lines #{tmp}/out`
`ocropus lines2fsts #{tmp}/out`
`ocropus buildhtml #{tmp}/out > #{tmp}/output.html`

text = File.read("#{tmp}/output.html")
FileUtils.rm_rf(tmp)

Retrieving a specific row depending on a date variable?

6 votes

I have 7 columns which contain closing times, one for each day of the week. (They are named VENUE_CLOSE_T_MO, VENUE_CLOSE_T_TU... etc.)

How would I, for example choose one of those columns depending on a date variable ($somevariable) which contains a specific date?

For example, if the date variable was Sunday, March 18 22:00, it would choose column VENUE_CLOSE_T_SU.

Thanks for the help everyone!

EDIT (Solution given by TEEZ that solved the issue)

My date variable is $start.

And this is the code:

$day_name=strtoupper(date('D',$start));
$day_name=substr($day_name,0,2);
$selectcolumn='VENUE_CLOSE_T_'.$day_name;

So in this case $selectcolumn = VENUE_CLOSE_T_SU

And the echo is then this:

$row[$selectcolumn]

Thanks for all your help again Teez!

First, get the day name from the variable ($somevariable):

$day_name=strtoupper(date('D',$somevariable));

Then build a query like the one below to get the column name for the day in $somevariable:

select concat('VENUE_CLOSE_T_',left('$day_name',2)) as datecolumnname  from tablename

EDIT:

OR

You don't need to do this in the query if you are selecting all the columns anyway. Just add these lines to the PHP code where you print the data on the web page, under the date column:

$day_name=strtoupper(date('D',$somevariable));
$day_name=substr($day_name,0,2);
$selectcolumn='VENUE_CLOSE_T_'.$day_name;
echo $row[$selectcolumn];
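
The same idea can be written as a small helper with an explicit lookup table, so that only known column names can ever be produced. This is a sketch, not the original poster's code. The function name is made up, and the suffixes WE, TH, FR and SA are an assumption extrapolated from the MO, TU and SU columns mentioned in the question.

```php
<?php
// Map a date to the matching closing-time column name.
// Assumption: the seven columns end in MO, TU, WE, TH, FR, SA, SU.
function closingColumnFor(DateTimeInterface $date): string
{
    $suffixes = [
        'Mon' => 'MO', 'Tue' => 'TU', 'Wed' => 'WE', 'Thu' => 'TH',
        'Fri' => 'FR', 'Sat' => 'SA', 'Sun' => 'SU',
    ];

    // format('D') always returns one of the seven English abbreviations above.
    return 'VENUE_CLOSE_T_' . $suffixes[$date->format('D')];
}

// Sunday, March 18 22:00 (the year 2012 is assumed here).
$column = closingColumnFor(new DateTimeImmutable('2012-03-18 22:00'));
```

The result can then be used as the array key, e.g. $row[$column], exactly as in the snippet above.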

What is the best method to make sure two people don't edit the same row on my web app?

6 votes

I have a PHP/jQuery/AJAX/MySQL app built for managing databases. I want to implement the ability to prevent multiple users from editing the same database row at the same time.

  1. What is this called?
  2. Do I use a token system, where whoever has the token can edit the row until they release the token?
  3. Do I use a "last edit date/time", comparing the time you loaded the HTML form with the time in the database, and warn you if the database has the more recent edit?
  4. Do I lock the row using database functions?

I'm just not sure which is best. Assume between 10 and 15 concurrent users.

There are two general approaches: optimistic and pessimistic locking.

Optimistic locking is generally much easier to implement in a web-based environment because it is fundamentally stateless. It scales much better as well. The downside is that it assumes that your users generally won't be trying to edit the same set of rows at the same time. For most applications, that's a very reasonable assumption, but you'd have to verify that your application isn't one of the outliers where users would regularly be stepping on each other's toes. In optimistic locking, you would have some sort of last_modified_timestamp column that you would SELECT when a user fetched the data and then use in the WHERE clause when you go to update the data, i.e.

UPDATE table_name
   SET col1 = <<new value>>,
       col2 = <<new values>>,
       last_modified_timestamp = <<new timestamp>>
 WHERE primary_key = <<key column>>
   AND last_modified_timestamp = <<last modified timestamp you originally queried>>

If that updates 1 row, you know you were successful. Otherwise, if it updates 0 rows, you know that someone else has modified the data in the interim and you can take some action (generally showing the user the new data and asking them if they want to overwrite but you can adopt other conflict resolution approaches).
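
In PHP, the check described above might look roughly like this. It is a sketch under several assumptions: PDO is used with an in-memory SQLite database purely so the example is self-contained (the pdo_sqlite extension must be available), and the table name venues, the name column and the function name are invented for illustration.

```php
<?php
// Returns true if the row was updated, false if somebody else changed it
// after $loadedTimestamp was read (i.e. the UPDATE matched 0 rows).
function updateIfUnchanged(PDO $db, int $id, string $newName, string $loadedTimestamp): bool
{
    $stmt = $db->prepare(
        'UPDATE venues
            SET name = :name,
                last_modified_timestamp = :new_timestamp
          WHERE id = :id
            AND last_modified_timestamp = :loaded_timestamp'
    );
    $stmt->execute([
        ':name'             => $newName,
        ':new_timestamp'    => sprintf('%.6F', microtime(true)),
        ':id'               => $id,
        ':loaded_timestamp' => $loadedTimestamp,
    ]);

    return $stmt->rowCount() === 1;
}

// Demo: two users load the same row, then both try to save.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE venues (id INTEGER PRIMARY KEY, name TEXT, last_modified_timestamp TEXT)');
$db->exec("INSERT INTO venues VALUES (1, 'Old name', '2012-03-18 22:00:00')");

$loaded = '2012-03-18 22:00:00';                            // value both users fetched
$first  = updateIfUnchanged($db, 1, 'Alice edit', $loaded); // succeeds
$second = updateIfUnchanged($db, 1, 'Bob edit', $loaded);   // stale timestamp, rejected
```

When the function returns false, the application would typically reload the row and show the user the conflicting values.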

Pessimistic locking is more challenging to implement, particularly in a web-based application where users can close their browser without logging out, or may start editing some data and go to lunch before hitting Submit. It makes it harder to scale and generally makes the application more difficult to administer. It's really only worth considering if users will regularly try to update the same rows, or if updating a row takes a user so long that it's worth letting them know up front that someone else has locked the row.

Dynamically creating date periods using MySQL

5 votes

I'm trying to get the Grapes count for the dates March 1 - 3.

You will notice that on March 2 there are no grapes inserted.

Is it possible to write a query for March 1, 2 and 3 that shows a count of 0 for March 2?

My current result only shows dates where there are grapes.

Here is the MySQL query:

SELECT  `fruitDate` ,  `fruitName` , COUNT( * ) 
FROM  `tbl_fruits` 
WHERE  `fruitName` =  "Grapes"
GROUP BY  `fruitDate`

UPDATE 2:

Using this query:

SELECT f.fruitDate, f.fruitName, f1.count FROM tbl_fruits f
    LEFT JOIN (SELECT fruitDate, COUNT(*) as count from tbl_fruits d WHERE d.fruitName='Grapes' GROUP BY d.fruitDate) as f1 ON (f.fruitDate = f1.fruitDate) 
    GROUP BY f.fruitDate

I got this result, but it's displaying a different fruit. Is something wrong with my query?

Remember there is a dynamic (and a bit ugly) solution for creating a date range that does not require creating a table:

select aDate from (
  select @maxDate - interval (a.a+(10*b.a)+(100*c.a)+(1000*d.a)) day aDate from
  (select 0 as a union all select 1 union all select 2 union all select 3
   union all select 4 union all select 5 union all select 6 union all
   select 7 union all select 8 union all select 9) a, /*10 day range*/
  (select 0 as a union all select 1 union all select 2 union all select 3
   union all select 4 union all select 5 union all select 6 union all
   select 7 union all select 8 union all select 9) b, /*100 day range*/
  (select 0 as a union all select 1 union all select 2 union all select 3
   union all select 4 union all select 5 union all select 6 union all
   select 7 union all select 8 union all select 9) c, /*1000 day range*/
  (select 0 as a union all select 1 union all select 2 union all select 3
   union all select 4 union all select 5 union all select 6 union all
   select 7 union all select 8 union all select 9) d, /*10000 day range*/
  (select @minDate := '2001-01-01', @maxDate := '2002-02-02') e
) f
where aDate between @minDate and @maxDate

Depending on the length of the date range, you can reduce the number of dynamically generated rows (10000 days means over 27 years of records, each representing one day) by removing derived tables (d, c, b and a) and removing the corresponding terms from the formula at the top. Setting the @minDate and @maxDate variables lets you specify the dates between which you want to filter the results.

Edit:

I see you're still looking for a solution. Try this:

select c.date, f.fruitName, count(f.fruitName = 'Grapes')
from tbl_calendar c
left join tbl_fruits f
on c.date = f.fruitDate and f.fruitName = 'Grapes'
group by c.date, f.fruitName

If you also want to filter the extra dates from the created table, use this query:

select c.date, f.fruitName, count(f.fruitName = 'Grapes')
from tbl_calendar c
left join tbl_fruits f
on c.date = f.fruitDate and f.fruitName = 'Grapes'
group by c.date, f.fruitName
having c.date between
  (select min(fruitDate) from tbl_fruits) and
  (select max(fruitDate) from tbl_fruits)
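
If a calendar table is not an option, the gaps can also be filled on the PHP side after running the original GROUP BY query. Note that this is a different technique from the SQL above, shown only as a sketch. The function name is made up, and the counts in the example are invented because the original screenshots are not available.

```php
<?php
// Given counts keyed by 'Y-m-d' date, return one entry per day in the
// inclusive range [$from, $to], using 0 for days that have no row.
function fillMissingDates(array $countsByDate, string $from, string $to): array
{
    $start = new DateTimeImmutable($from);
    // DatePeriod excludes the end date, so step one day past it.
    $end = (new DateTimeImmutable($to))->modify('+1 day');

    $filled = [];
    foreach (new DatePeriod($start, new DateInterval('P1D'), $end) as $day) {
        $key = $day->format('Y-m-d');
        $filled[$key] = $countsByDate[$key] ?? 0;
    }

    return $filled;
}

// Hypothetical query result: grapes on March 1 and March 3 only.
$counts = ['2012-03-01' => 3, '2012-03-03' => 2];
$filled = fillMissingDates($counts, '2012-03-01', '2012-03-03');
```

This keeps the SQL simple at the cost of a little post-processing, which is usually fine for short date ranges.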

Breaking out of a loop when a condition occurs, and avoiding the usage of its preset db value

5 votes

Assume a student takes 6 courses in a semester. All those courses have course units (int), and depending on the score in each course there are points:

 a score >=70 will have a point of 5

 <70 and >=60 will have a point of 4

and so on. For each course, the unit and the point are multiplied together, and the products are summed down the column. When the score of a course is not found, the grade is 'AR'. What I want is for the loops to omit the occurrences of AR, i.e. not add the course unit of a course having a grade of 'AR'. But when I run the queries below, the units are still added to the total course units.

Query4 is used to generate some rows of course_unit and Score

  $query4 = mysql_query("SELECT  c.course_unit, m.score
  FROM    maintable AS m
  INNER JOIN students AS s ON
  m.matric_no = s.matric_no
  INNER JOIN courses AS c ON
  m.course_code = c.course_code
  WHERE m.matric_no = '".$matric_no."'
  AND m.level = '".$level."'")
  or die (mysql_error());

Query3 is used for the summation of the course_units

 $query3 = mysql_query("SELECT  SUM(c.course_unit) AS 'TOTAL'
 FROM    maintable AS m
 INNER JOIN students AS s ON
 m.matric_no = s.matric_no
 INNER JOIN courses AS c ON
 m.course_code = c.course_code
 WHERE m.matric_no = '".$matric_no."'
 AND m.level = '".$level."'")
 or die (mysql_error());

Grades in Respect to Score

 while ($row8 = mysql_fetch_assoc($query8)) {
     if ($row8['score'] >= 70) {
         $grade = 'A';
     } elseif ($row8['score'] >= 60) {
         $grade = 'B';
     } elseif ($row8['score'] >= 50) {
         $grade = 'C';
     } elseif ($row8['score'] >= 45) {
         $grade = 'D';
     } elseif ($row8['score'] >= 40) {
         $grade = 'E';
     } elseif ($row8['score'] >= 0 && $row8['score'] < 40) {
         $grade = 'F';
     } else {
         $grade = 'AR';
     }
 }

Calculation of the Grade Point

      $grade_point = 0;
      while ($row4 = mysql_fetch_assoc($query4)) {
         if ($row4['score'] >= 70) {
            $score = 5;
          }
          elseif ($row4['score'] >= 60) {
             $score = 4;
          }elseif ($row4['score'] >= 50) {
             $score = 3;
          }elseif ($row4['score'] >= 45) {
             $score = 2;
          }elseif($row4['score'] >= 40) {
             $score = 1;
          }elseif($row4['score'] >= 0 AND $row4['score'] < 40) {
             $score = 0;
          }else{
             $score = 0;
          } 

          $grade_point += $score * $row4['course_unit'];

      }

I have added

  if ( $grade == 'AR' )
  {
       continue;
  }

But the calculations are still the same. It adds the course_unit value of any course having

$grade == 'AR' .

I'll be most delighted with your answers. Thanks very much.

UPDATE

I have been able to solve the grade point part by adding

     elseif($row4['score'] >= 0 AND $row4['score'] < 40) {
             $score = 0;
          }else{
             $score = 0;
          }

This sets the score to zero both for a score between 0 and 39 and for the default score of <0 (i.e. AR). But the courses having a grade of AR and a score of -1 still contribute their respective course_unit values to the total.

I think this problem is caused by the fact that the course_unit values are preloaded from the database. Any help?

Courses Table Structure
=================

course_id
course_code
course_title
course_unit

I'll be most delighted with your answers. Thank you in anticipation.

Is it as simple as adding "AND NOT 'AR'" to your SELECT SUM statement?

Or... if your DB values are coming in as AR, why can't you use PHP is_int() in your loop? That would allow you to still assign 0 for F, and just skip over any non integer values being sent from your DB.
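
One way to apply the "skip it in the loop" idea is to compute the unit total and the grade points in the same PHP loop, so that an AR course is left out of both. This is a sketch only: it assumes, as the question states, that an AR course has a score of -1, and it uses is_numeric() rather than is_int() because values fetched with the mysql_* functions arrive as strings. The function names and the sample rows are made up.

```php
<?php
// Points for a score, following the thresholds in the question.
function pointsForScore(int $score): int
{
    if ($score >= 70) return 5;
    if ($score >= 60) return 4;
    if ($score >= 50) return 3;
    if ($score >= 45) return 2;
    if ($score >= 40) return 1;
    return 0; // 0 to 39 is an F
}

// Sum units and grade points, ignoring AR rows (missing or negative score).
function summarize(array $rows): array
{
    $totalUnits = 0;
    $gradePoints = 0;
    foreach ($rows as $row) {
        if (!is_numeric($row['score']) || $row['score'] < 0) {
            continue; // AR: contributes neither units nor points
        }
        $units = (int) $row['course_unit'];
        $totalUnits += $units;
        $gradePoints += pointsForScore((int) $row['score']) * $units;
    }

    return ['total_units' => $totalUnits, 'grade_points' => $gradePoints];
}

// Hypothetical rows shaped like the result of $query4.
$summary = summarize([
    ['course_unit' => '3', 'score' => '75'],
    ['course_unit' => '2', 'score' => '-1'], // AR
    ['course_unit' => '4', 'score' => '38'],
    ['course_unit' => '3', 'score' => '62'],
]);
```

Alternatively, the same filter could go into the SUM query itself, for example with an extra condition such as AND m.score >= 0, assuming the score column holds -1 for AR.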

Detecting emails in a text

5 votes

I'm trying to create a function that translates every occurrence of a plain text email address in a given string into its htmlized version.

Let's say I have the following code, where htmlizeEmails is the function I'm looking for:

$str = "Send me an email to bob@example.com.";
echo htmlizeEmails($str); // Echoes "Send me an email to <a href="mailto:bob@example.com">bob@example.com</a>."

If possible, I'd like this function to use the filter_var function to check if the email is valid.

Does anyone know how to do this? Thanks!

Edit:

Thanks for the answers, I used Shocker's regex to match potential email addresses and then, only if the filter_var validates it, it gets replaced.

function htmlizeEmails($text)
{
    // Collect every substring that looks like an email address.
    preg_match_all('/([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})/', $text, $potentialEmails, PREG_SET_ORDER);

    $potentialEmailsCount = count($potentialEmails);
    for ($i = 0; $i < $potentialEmailsCount; $i++) {
        // Only replace candidates that filter_var accepts as valid.
        if (filter_var($potentialEmails[$i][0], FILTER_VALIDATE_EMAIL)) {
            $text = str_replace($potentialEmails[$i][0], '<a href="mailto:' . $potentialEmails[$i][0] .'">' . $potentialEmails[$i][0] .'</a>', $text);
        }
    }

    return $text;
}

$str = preg_replace('/([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})/', '<a href="mailto:$1">$1</a>', $str);

where ([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}) is the regular expression used for detecting an email address (this is a general example, email addresses may be more complicated than this and not all addresses may be covered, but finding the perfect regex for emails is up to you)
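
The regex match and the filter_var check can also be combined in a single pass with preg_replace_callback, which avoids replacing the same address twice when it appears more than once in the text. This is a sketch reusing the pattern above; the function name linkEmails is made up.

```php
<?php
// Wrap every valid-looking email address in a mailto: link.
function linkEmails(string $text): string
{
    $pattern = '/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}/';

    return preg_replace_callback($pattern, function (array $match): string {
        $email = $match[0];
        // Leave the text untouched if filter_var rejects the candidate.
        if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            return $email;
        }
        $safe = htmlspecialchars($email, ENT_QUOTES, 'UTF-8');

        return '<a href="mailto:' . $safe . '">' . $safe . '</a>';
    }, $text);
}

$html = linkEmails('Send me an email to bob@example.com.');
```

As noted above, the pattern is deliberately simple, so some unusual but valid addresses will not be matched at all.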

Securely send a Plain Text password?

5 votes

I'm working on an application for iOS which will have the user fill out their password. The password will then be posted to a PHP page on my site using either POST or GET. (It must be plaintext because it is used in a script.)

Besides HTTPS, is there any way to secure the password? Encrypt it in Obj-C and then decrypt it in PHP?

NOTE: The username is not sent... only the password is posted to the server.

EDIT: To clarify, David Stratton is correct... I'm trying to prevent malicious sniffers in public locations from simply reading clear text passwords as they are posted to the server.

Challenge response outline

Let's assume you have a one-way hash function abc (in practice, use a cryptographic hash such as SHA-256; md5 and sha1 are now considered too weak).

The password you store in your database is abc(password + salt) (store the salt separately).

The server generates a random value, challenge, sends it to the client (with the salt), and calculates the expected response: abc(challenge + abc(password + salt)).

The client then calculates abc(user_password + salt) and applies the challenge to get abc(challenge + abc(user_password + salt)). That is sent to the server, and the server can easily verify its validity.

This is secure because:

  • The password is never sent in plaintext, or stored in plaintext
  • The hash value that is sent changes every time (mitigates replay attack)
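
The exchange outlined above can be simulated in a few lines of PHP. This is a sketch, not a vetted protocol implementation: SHA-256 stands in for abc (an assumption), random_bytes() requires PHP 7 or later, and both sides are run in one script here, whereas in reality the client part would run on the device.

```php
<?php
// One-way hash standing in for "abc" in the outline above.
function abc(string $data): string
{
    return hash('sha256', $data);
}

// What the client computes from the typed password, the salt and the challenge.
function clientResponse(string $password, string $salt, string $challenge): string
{
    return abc($challenge . abc($password . $salt));
}

// Registration: the server stores only the salt and the salted hash.
$salt   = bin2hex(random_bytes(16));
$stored = abc('s3cret' . $salt);

// Login: the server issues a fresh challenge and precomputes the expected answer.
$challenge = bin2hex(random_bytes(16));
$expected  = abc($challenge . $stored);

// Client side: answer the challenge; server side: compare in constant time.
$response = clientResponse('s3cret', $salt, $challenge);
$ok = hash_equals($expected, $response);
```

Because the challenge is regenerated for every login, a captured response is useless for the next attempt. HTTPS is still needed to protect everything else on the connection.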

There are some issues:

How do you know what salt to send? Well, I've never really found a solution for this, but using a deterministic algorithm to turn a username into a salt solves this problem. If the algorithm isn't deterministic, an attacker could potentially figure out which usernames exist and which do not. This does require you to have a username though. Alternatively you could just have a static salt, but I don't know enough about cryptography to assess the quality of that implementation.

How does Facebook update content when somebody has posted something

4 votes

I'm really interested in how Facebook loads new content only when someone else has posted something. The only thing I can think of is using something like the code below to constantly update the page without reloading it.

setInterval(ajax_stuff, 1000);

I was watching the console, and indeed the request occurs and new content is added to the page.

I'd like to understand how this is done. It would be really awesome if I could use this on a project. Calling setInterval every second consumes a lot of resources; making a request only when it's needed would be the best way to do things. Specifically, I want to use it on this project:

https://github.com/anchetaWern/ChatRo

It's basically just a chat box; currently it still uses setInterval(). I want to update the content only when someone else in the chat session has actually entered something.

I cannot speak directly to how Facebook does this, but in general, you should be looking at WebSockets.

WebSockets allow the JavaScript on your page to maintain an open connection with a server whereby you can push data out in near-realtime to all of the clients connected to the server.

Take a look at http://pusher.com

Also, google Web Sockets.