Best php questions in August 2010

Best methods to parse HTML with PHP

77 votes

I'm working on a system that requires the parsing of HTML documents under PHP.

My question is simply this:

What's the best method of parsing content for relevant information?

When I parse a site I don't want random content; I want to find relevant content such as blocks of text, images, links, etc. But obviously I don't want header links or footer links.

So is there anything you can advise me to look at? Tips and tricks are also welcome :)

I prefer using one of the native XML extensions, like

If you prefer a 3rd party lib, I'd suggest not to use SimpleHtmlDom, but a lib that actually uses DOM/libxml underneath instead of String Parsing:

You can use the above for parsing HTML5, but there can be quirks due to the markup HTML5 allows. So for HTML5 you want to consider using a dedicated parser, like

Or use a WebService like

If you want to spend some money, have a look at

Last and least recommended, you can extract data from HTML with Regular Expressions. In general using Regular Expressions on HTML is discouraged. The snippets you will usually find on the web to match markup are brittle. In most cases they are only working for a very particular piece of HTML. Once the markup changes, the Regex fails.

You can write more reliable parsers, but writing a complete and reliable custom parser with Regular Expressions is a waste of time when the aforementioned libraries already exist and do a much better and likely faster job on this.
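As a rough illustration of the DOM-based approach recommended above, here is a minimal sketch using PHP's DOM extension with XPath to pull links and images out of one region of a page while skipping header and footer links. The markup and the "content" id are made up for the example; real pages will need their own XPath expressions.

```php
<?php
// Hypothetical markup: only the "content" block holds what we care about.
$html = '<html><body>'
      . '<div id="header"><a href="/home">Home</a></div>'
      . '<div id="content"><p>Read the <a href="/article">article</a>.</p>'
      . '<img src="/pic.png" alt="A picture"></div>'
      . '<div id="footer"><a href="/imprint">Imprint</a></div>'
      . '</body></html>';

// Real-world HTML is rarely valid, so collect libxml warnings quietly.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Collect link targets inside the content block only.
$links = array();
foreach ($xpath->query('//div[@id="content"]//a/@href') as $attribute) {
    $links[] = $attribute->nodeValue;
}

// Collect image sources inside the content block only.
$images = array();
foreach ($xpath->query('//div[@id="content"]//img/@src') as $attribute) {
    $images[] = $attribute->nodeValue;
}

print_r($links);
print_r($images);
```

Restricting the XPath query to a container is what keeps header and footer links out of the result; deciding which container is "relevant" is still up to you.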

See

How to translate between programming languages

41 votes

I am setting out to do a side project that has the goal of translating code from one programming language to another. The languages I am starting with are PHP and Python (Python to PHP should be easier to start with), but ideally I would be able to add other languages with (relative) ease. The plan is:

  • This is geared towards web development. The original and target code will be sitting on top of frameworks (which I will also have to write). These frameworks will embrace an MVC design pattern and follow strict coding conventions. This should make translation somewhat easier.

  • I am also looking at IOC and dependency injection, as they might make the translation process easier and less error prone.

  • I'll make use of Python's parser module, which lets me fiddle with the Abstract Syntax Tree. Apparently the closest I can get with PHP is token_get_all(), which is a start.

  • From then on I can build the AST, symbol tables and control flow.
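For reference, this is roughly what token_get_all() hands back. It is only a sketch of the raw material, not a parser: the one-line source string is made up, and turning the flat token list into an AST is the part that still has to be written.

```php
<?php
// A tiny, made-up PHP snippet to tokenize.
$source = '<?php echo 1 + 2;';

$names = array();
foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        // Array tokens are (id, text, line); token_name() gives the symbolic name.
        $names[] = token_name($token[0]);
    } else {
        // Single-character tokens such as "+" or ";" come back as plain strings.
        $names[] = $token;
    }
}

print_r($names);
```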

Then I believe I can start outputting code. I don't need a perfect translation. I'll still have to review the generated code and fix problems. Ideally the translator should flag problematic translations.

Before you ask "What the hell is the point of this?" The answer is... It'll be an interesting learning experience. If you have any insights on how to make this less daunting, please let me know.


EDIT:

I am more interested in knowing what kinds of patterns I could enforce on the code to make it easier to translate (ie: IoC, SOA ?) the code than how to do the translation.

I've been building tools (DMS Software Reengineering Toolkit) to do this kind of thing since 1995, supported by a strong team of computer scientists. It provides generic parsing, AST building, symbol tables, control and data flow analysis, application of translation rules, regeneration of source text with comments, etc., all parameterized by explicit definitions of computer languages.

The amount of machinery you need to do this well is vast, and then you need reliable parsers for languages with unreliable definitions (PHP is a perfect example of this).

There's nothing wrong with you thinking about it or attempting it, but I think you'll find this a much bigger task than you expect. We have some 100 man-years invested in just DMS, and another 6-12 months in each "reliable" language definition (including the one we painfully built for PHP), much more for nasty languages such as C++. It will be a "hell of a learning experience"; it has been for us. (You might find the technical Papers section at the above website interesting to jump start that learning).

People often attempt to build some kind of generalized machinery by starting with some piece of technology with which they are familiar, that does a part of the job. (Python ASTs are a great example). The good news is that part of the job is done. The bad news is that machinery has a zillion assumptions built into it, most of which you won't discover until you try to wrestle it into doing something else. At that point you find out the machinery is wired to do what it originally does, and will really, really resist your attempt to make it do something else. (I suspect trying to get the Python AST to model PHP is going to be a lot of fun).

The reason I started to build DMS originally was to build foundations that had very few such assumptions built in. It has some that give us headaches. So far, no black holes. (The hardest part of my job over the last 15 years is to try to prevent such assumptions from creeping in).

Lots of folks also make the mistake of assuming that if they can parse (and perhaps get an AST), they are well on the way to doing something complicated. One of the hard lessons is that you need symbol tables and flow analysis to do good program analysis or transformation. ASTs are necessary but not sufficient. This is the reason that Aho&Ullman's compiler book doesn't stop at chapter 2. (The OP has this right in that he is planning to build additional machinery beyond the AST).

The remark about "I don't need a perfect translation" is troublesome. What weak translators do is convert the "easy" 80% of the code, leaving the hard 20% to do by hand. If the applications you intend to convert are pretty small, well, then that 20% is OK. If you attempt to convert 100K SLOC, then 20% is 20,000 original lines of code that are hard to translate, understand and modify, in the context of another 80,000 lines of program you don't understand. That takes a huge amount of effort. At the million line level, this is simply impossible in practice.

What you have to shoot for to translate large-scale systems is high nineties percentage conversion rates, or it is likely that you can't complete the manual part of the translation activity.

I consider our tools to be extremely good (but then, I'm pretty biased). And it is still very hard to build a good translator. The difference is that with this much machinery, we succeed considerably more often than we fail.

Long-term learning for a "self-made" PHP developer

30 votes



As some of us have created their public site without an academic background, I ask this question mostly to professionals:

  • Enhance oneself: Non-professionals may encounter a barrier: it's easy to learn the basics (vars, loops, basic SQL manipulation), but there are no "long term" tutorials that really give deep knowledge. I know there is no best path, but what are the steps to become able to develop big(ger) projects alone? Something like: from spaghetti to MVC and OOP. Clicking links gives me knowledge, but not a good "framework".

  • Learn how to learn: Is there a way to learn how to use existing frameworks and pieces of code? Besides reading the help files, how do professionals proceed? I don't want to copy-paste code anymore.

Extra question:

  • Career: Nowadays (not back in the 90's or the early 2000's), can a non-professional consider a developer career ?

Thanks for your replies.

EDIT:

Two great books that really helped me:

Answering the questions in the order of appearance:

  • Learn some higher level design concepts to be able to see a big system clearly.
  • read some docs, use some code, dive deep.
  • Object oriented design concepts
  • Didn't find a perfect one yet :)

edit:

Enhance oneself: You don't need long-term tutorials. You have an abundance of professional material, and you can start with the books in the answers to the question here. I warmly recommend the first one (Code Complete). You can find good professional material on any other programming subject that interests you. The question is not what but when you'll find time for it all.

Learning how to learn is different for everyone. Find what you enjoy the most and get on with it. Read, code and do both as much as you can.

Career: I think a non-professional can sometimes be more professional than some of the professionals I've met. If you enjoy it and find the challenge interesting you can do great things, you can make machines work wonders, and act gracefully and become a professional by attitude and not by some paper on the wall.

Since you are interested in building a public web project, here you can find some excellent answers

How is click-fraud detected?

21 votes

Which methods do Google (and other PPC companies) use to prevent click fraud?

Here's What Google Does: http://googleblog.blogspot.com/2008/03/using-data-to-help-prevent-fraud.html

Google Adwords Official Answer: https://adwords.google.com/support/aw/bin/answer.py?hl=en-uk&topic=10625&answer=6114

Google 3 part system for invalid click detection

Other Methods: There are many such as:

1) If there are multiple requests from the same IP.

2) Forensic analysis of advertisers' web server log files:
This analysis of the advertiser's web server data requires an in-depth look at the source and behavior of the traffic. As industry standard log files are used for the analysis, the data is verifiable by advertising networks. The problem with this approach is that it relies on the honesty of the middlemen in identifying fraud.

3) Third-party corroboration:
Third parties offer web-based solutions that might involve placement of single-pixel images or Javascript on the advertiser's web pages and suitable tagging of the ads. The visitor may be presented with a cookie. Visitor information is then collected in a third-party data store and made available for download. The better offerings make it easy to highlight suspicious clicks, and they show the reasons for such a conclusion. Since an advertiser's log files can be tampered with, their accompaniment with corroborating data from a third party forms a more convincing body of evidence to present to the advertising network. However, the problem with third-party solutions is that such solutions see only part of the traffic of the entire network. Hence, they can be less likely to identify patterns that span several advertisers. In addition, due to the limited amount of traffic they receive when compared to middlemen, they can be overly or less aggressive when judging traffic to be fraud.

4) Then, there are numerous Pay Per click fraud detection software: One example: http://www.whosclickingwho.com/
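To make the first heuristic (multiple requests from the same IP) concrete, here is a deliberately naive sketch. The click-log format, the function name and the threshold are all made up for illustration; this is not how Google or any ad network is known to do it, and a real system would combine many more signals.

```php
<?php
// Count clicks per IP and flag addresses above a hypothetical threshold.
function flag_repeated_ips(array $clicks, $maxClicksPerIp)
{
    $counts = array();
    foreach ($clicks as $click) {
        $ip = $click['ip'];
        $counts[$ip] = isset($counts[$ip]) ? $counts[$ip] + 1 : 1;
    }

    $flagged = array();
    foreach ($counts as $ip => $count) {
        if ($count > $maxClicksPerIp) {
            $flagged[] = $ip;
        }
    }
    return $flagged;
}

// Made-up log entries, assumed to cover a single time window already.
$clicks = array(
    array('ip' => '203.0.113.5',  'ad' => 'a1'),
    array('ip' => '203.0.113.5',  'ad' => 'a1'),
    array('ip' => '198.51.100.7', 'ad' => 'a1'),
    array('ip' => '203.0.113.5',  'ad' => 'a1'),
);

$flagged = flag_repeated_ips($clicks, 2);
print_r($flagged);
```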

Php Destructors

18 votes

Please give me some real life examples when you had to use __destruct in your classes.

Ok, since my last answer apparently didn't hit the mark, let me try this again. There are plenty of resources and examples on the internet for this topic. Do a bit of searching and browse other frameworks' code and you'll see some pretty good examples...

Don't forget that just because PHP will close resources on termination for you doesn't mean that it's bad to explicitly close them when you no longer need them (or good to not close them)... It depends on the use case (is it being used right up to the end, or is there one call early on and then not needed again for the rest of execution)...

Now, we know that __destruct is called when the object is destroyed. Logically, what happens if the object is destroyed? Well, it means it's no longer available. So if it has resources open, doesn't it make sense to close those resources as it's being destroyed? Sure, in the average web page, the page is going to terminate shortly after, so letting PHP close them usually isn't terrible. However, what happens if for some reason the script is long-running? Then you have a resource leak. So why not just close everything when you no longer need it (or considering the scope of the destructor, when it's no longer available)?

Here are some examples in real world frameworks:

  1. Lithium's lithium\net\Socket class
  2. Kohana's Memcached Driver
  3. Joomla's FTP Implementation
  4. Zend Framework's SMTP Mail Transport Class
  5. CodeIgniter's TTemplate Class
  6. A Tidy Filter Helper for Cake
  7. A Google-Groups Thread about using Destructors For the Symfony Session Class

The interesting thing is that Kohana keeps track of the tags, so that it can delete by "namespace" later (instead of just clearing the cache). So it uses the destructor to flush those changes to the hard storage.

The CodeIgniter class also does something interesting in that it adds debugging output to the output stream in the destructor. I'm not saying this is good, but it's an example of yet another use...

I personally use destructors whenever I have long running processes on my master controller. In the constructor, I check for a pid file. If that file exists (and its process is still running), I throw an exception. If not, I create a file with the current process's ID. Then, in the destructor I remove that file. So it's more about cleaning up after itself than just freeing resources...
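The pid-file pattern just described could look something like the sketch below. The class name and file path are made up, and the "is it still running" check assumes the posix extension is available; without it the sketch plays safe and treats any existing pid file as live.

```php
<?php
class PidFileLock
{
    private $path;
    private $owned = false;

    public function __construct($path)
    {
        $this->path = $path;
        if (is_file($path)) {
            $pid = (int) file_get_contents($path);
            if ($this->isRunning($pid)) {
                throw new RuntimeException("Already running with PID $pid");
            }
        }
        // No live owner found: claim the pid file for this process.
        file_put_contents($path, (string) getmypid());
        $this->owned = true;
    }

    private function isRunning($pid)
    {
        if ($pid <= 0) {
            return false;
        }
        if (function_exists('posix_kill')) {
            // Signal 0 only probes the process. Note that this also returns
            // false for a live process we have no permission to signal.
            return posix_kill($pid, 0);
        }
        // Assumption: without the posix extension, treat the file as live.
        return true;
    }

    public function __destruct()
    {
        // Remove the file only if this object created it.
        if ($this->owned && is_file($this->path)) {
            unlink($this->path);
        }
    }
}

$path = sys_get_temp_dir() . '/pidlock_demo_' . getmypid() . '.pid';
$lock = new PidFileLock($path);
$existsWhileLocked = is_file($path);
unset($lock); // destructor runs here and removes the file
$existsAfterUnset = is_file($path);
```

One caveat worth keeping in mind: destructors do not run if the process is killed outright, which is why the constructor checks whether the recorded process is still alive instead of trusting the file alone.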

When (if ever) is eval NOT evil?

14 votes

I've heard in many places that PHP's eval function is often not the answer. In light of PHP 5.3's LSB and closures we're running out of reasons to depend on eval or create_function.

Are there any conceivable cases where eval is the best (only?) answer in PHP 5.3?

This question is not about whether eval is evil in general, as it obviously is not.

Summary of Answers:

  • Evaluating numerical expressions (or other "safe" subsets of PHP)
  • Unit testing
  • Interactive PHP "shell"
  • Deserialization of trusted var_export
  • Some template languages
  • Creating backdoors for administrators and/or hackers
  • Compatibility with < PHP 5.3
  • Checking syntax (possibly not safe)

Eric Lippert sums eval up over three blog posts. It's a very interesting read.

As far as I'm aware, the following are some of the only reasons eval is used.

For example, when you are building up complex mathematical expressions based on user input, or when you are serializing object state to a string so that it can be stored or transmitted, and reconstituted later.
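As a small sketch of the "serialize state to a string and reconstitute it later" case, the snippet below round-trips an array through var_export() and eval(). The data is made up, and this is only reasonable when the exported string is fully trusted (for instance, a cache file your own code wrote); never eval anything derived from user input this way.

```php
<?php
// Some made-up state we want to store as PHP source text.
$state = array('id' => 7, 'tags' => array('a', 'b'), 'active' => true);

// var_export() with true returns valid PHP code for the value.
$code = var_export($state, true);

// Trusted input only: evaluate the exported code to rebuild the value.
$restored = eval('return ' . $code . ';');

var_dump($restored === $state);
```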

In MVC, where do you draw the line between a controller and model?

13 votes

I've seen code written where almost all non-route related code is passed to a model. I have also seen code where all database persistence is handled by a model, but non-DB processing is handled by the controller.

Which is the better approach?

The line between controller and model is actually quite clear.

Model is your application's heart. It contains the business/domain logic required to solve the problem your application was written for. The Model is usually layered into several other layers, e.g. persistence, services, domain, etc. It's a common misconception that the Model is just the database, as much as it is a common misconception that the database should be an ActiveRecord.

The controller (and the view) are part of the presentation layer. A controller's sole responsibility is to receive and handle user input directed towards your application and delegate this to the appropriate parts in the model. Nothing more. It should not handle complex application flow or code of your problem domain. You want controllers to be skinny and models fat with logic. The Model should not know about either C or V and you should be able to swap out V and C for a different presentation layer without having to touch your M.
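A minimal sketch of what "skinny controller, fat model" can look like in practice follows. All class names, the request array shape and the discount rule are hypothetical, chosen only to show where each kind of code lives.

```php
<?php
// Model layer: owns the business rule (a made-up premium discount).
class OrderService
{
    public function totalFor(array $prices, $isPremiumCustomer)
    {
        $total = array_sum($prices);
        return $isPremiumCustomer ? $total * 0.9 : $total;
    }
}

// Presentation layer: reads input, delegates to the model, returns view data.
class OrderController
{
    private $orders;

    public function __construct(OrderService $orders)
    {
        $this->orders = $orders;
    }

    public function totalAction(array $request)
    {
        $prices = isset($request['prices'])
            ? array_map('floatval', $request['prices'])
            : array();
        $premium = !empty($request['premium']);

        // No domain logic here: the controller only translates and delegates.
        return array('total' => $this->orders->totalFor($prices, $premium));
    }
}

$controller = new OrderController(new OrderService());
$viewData = $controller->totalAction(array('prices' => array('10', '20'), 'premium' => '1'));
```

Because OrderService knows nothing about requests or views, the same rule could be reused from a command-line script or a different front end without changes.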

See MVC Excerpt in Patterns of Enterprise Application Architecture

Loop through an array of data and print an 'incrementing' letter

12 votes

I need to loop through an array of data and print an 'incrementing' letter for each array value. I know I can do this:

$array = array(11, 33, 44, 98, 1, 3, 2, 9, 66, 21, 45); // array to loop through
$letters = array('a', 'b', 'c', ...); // array of letters to access
$i = 0;
foreach($array as $value) {
    echo $letters[$i++] . " - $value";
}

It seems that there should be a better way than having to create an alphabet array. Any suggestions?

Note - My loop will never get through the entire alphabet, so I'm not concerned about running out of letters.

$letters = range('a','z');
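Put together with the loop from the question, that might look like the sketch below (the array is shortened for the example). The second loop shows an alternative that relies on PHP's string increment, where 'a'++ yields 'b', so no letter array is needed at all.

```php
<?php
$array = array(11, 33, 44, 98); // array to loop through

// Variant 1: index into range('a', 'z').
$letters = range('a', 'z');
$lines = array();
foreach (array_values($array) as $i => $value) {
    $lines[] = $letters[$i] . " - $value";
}

// Variant 2: increment a letter directly.
$letter = 'a';
$alt = array();
foreach ($array as $value) {
    $alt[] = $letter++ . " - $value";
}

echo implode("\n", $lines), "\n";
```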

Many hash iterations: append salt every time?

12 votes

I have used unsalted md5/sha1 for long time, but as this method isn't really secure (and is getting even less secure as time goes by) I decided to switch to a salted sha512. Furthermore I want to slow the generation of the hash down by using many iterations (e.g. 100).

My question is whether I should append the salt on every iteration or only once at the beginning. Here are the two possible codes:

Append every time:

    // some nice big salt
    $salt = hash($algorithm, $salt);

    // apply $algorithm $runs times for slowdown
    while ($runs--) {
        $string = hash($algorithm, $string . $salt, $raw);
    }

    return $string;

Append once:

    // add some nice big salt
    $string .= hash($algorithm, $salt);

    // apply $algorithm $runs times for slowdown
    while ($runs--) {
        $string = hash($algorithm, $string, $raw);
    }

    return $string;

I first wanted to use the second version (append once) but then found some scripts appending the salt every time.

So, I wonder whether adding it every time adds some strength to the hash. For example, would it be possible that an attacker found some clever way to create a 100timesSha512 function which were way faster than simply executing sha512 100 times?

In short: Yes. Go with the first example... The hash function can lose entropy if fed back to itself without adding the original data (I can't seem to find a reference now, I'll keep looking).

And for the record, I am in support of hashing multiple times.

A hash that takes 500 ms to generate is not too slow for your server (considering that generating hashes is typically not done on the vast majority of requests). However, a hash that takes that long will significantly increase the time it takes to generate a rainbow table...

Yes, it does expose a DOS vulnerability, but it also prevents brute force attacks (or at least makes them prohibitively slow). There is absolutely a tradeoff, but to some the benefits exceed the risks...
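For illustration, the "append every time" variant can be wrapped up as a small function. This is a simplified sketch of the first example from the question (it skips the initial hashing of the salt), and the function name, parameters and default of 100 runs are assumptions, not a vetted scheme.

```php
<?php
// Re-hash $runs times, mixing the salt back in on every round.
function stretched_hash($password, $salt, $runs = 100, $algorithm = 'sha512')
{
    $hash = $password;
    for ($i = 0; $i < $runs; $i++) {
        $hash = hash($algorithm, $hash . $salt);
    }
    return $hash;
}

$a = stretched_hash('secret', 'salt-one');
$b = stretched_hash('secret', 'salt-two');

echo strlen($a), "\n"; // hex-encoded SHA-512 digest length
```

If you can, prefer a standardized construction such as PBKDF2 over a hand-rolled loop; on PHP 5.5 and later it is available as the built-in hash_pbkdf2().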

A reference (more like an overview) to the entire process: Key Strengthening

As for the degenerating collisions, the only source I could find so far is this discussion...

And some more discussion on the topic:

  1. HEKS Proposal
  2. SecurityFocus blog on hashing
  3. A paper on Oracle's Password Hashing Algorithms

And a few more links:

  1. PBKDF2 on Wikipedia
  2. PBKDF2 Standard
  3. An email thread that's applicable
  4. Just Hashing Is Far From Enough Blog Post

There are tons of results. If you want more, Google hash stretching... There's tons of good information out there...

What is the safest way of passing arguments from server-side PHP to client-side JavaScript

12 votes

In my application I rely heavily on JavaScript to enhance the user interface, but all of the data comes from a database and is processed by PHP. By default I use 'echo' statements to substitute the required values "just in time" like so:

var myVariable = <?php echo $myVariableInPHP ?>

This, however, does not strike me as very elegant and I am concerned about stability and maintainability of such code.

Do I have any alternatives here?

For server-side, I am using the Symfony 1.4 PHP framework.

Thanks,

My favorite way is :

<?php

$var = array(
  'prop1' => 'value1',
  'prop2' => 'value2',
  // ...
);

?>
<script type="text/javascript">
   var varNameSpace = <?php echo json_encode($var); ?>;

   alert( varNameSpace.prop1 ); // -> 'value1'
</script>

Using json_encode() ensures that the values passed to JavaScript are escaped and well formatted. Using a common variable container also avoids cluttering the global namespace (window).
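If the data can contain markup, one optional hardening step is to pass the JSON_HEX_* flags (available since PHP 5.3) so that characters such as "<" are emitted as \u escapes inside the script block. The sketch below uses a made-up value to show the effect; whether you need this depends on where the output ends up.

```php
<?php
$var = array(
    'prop1' => 'value1',
    // Made-up hostile value that tries to close the script element.
    'note'  => '</script><script>alert("x")</script>',
);

// Escape <, >, &, ' and " as \uXXXX sequences in the JSON output.
$json = json_encode($var, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

echo '<script type="text/javascript">var varNameSpace = ' . $json . ';</script>', "\n";
```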

PHP Framework Overhead

11 votes

There are tons of PHP frameworks out there; some are pretty decent, others seem bloated and unnecessary. After watching Rasmus Lerdorf's presentation on PHP performance at Digg, I'm somewhat more concerned about the performance of the frameworks that I choose for building my applications with.

Two of the most popular frameworks that I'm aware of are CodeIgniter and CakePHP. From what I understand, CakePHP is a terrible resource hog. What about CodeIgniter? I hear that Zend Framework isn't all that slim, either.

Are there other (more performant) frameworks that I should be interested in? Would it be better to simply not use a framework at all? What considerations should I make towards choosing a PHP framework?

Using a framework or not using a framework means you're making a choice between

  1. Default Application Performance under load

  2. Speed/Stability of Development

If you decide not to use a framework, you still need to do the things a framework would do. You're just coding them yourself in raw PHP, or developing your own framework that can remain lightweight since it only has to do what you want it to do, and not what the world wants it to do. You will get better performance, but you'll spend more time developing and debugging that code that a framework handles for you automatically.

What a framework buys you is speed in development time. You don't have to write out long complicated SQL queries, or debug someone else's long complicated SQL queries. You just need to create a table and instantiate a model. You don't need to decide where you're going to escape your SQL parameters, because the framework defines where that happens. You don't need to get into huge political fights over where the business logic vs. presentation logic goes, because the framework defines this. A framework removes the need to have a system developer on your team, or frees you from having to think about/waste time on system development. You can get to coding your application faster and get measurable, visible results sooner.

Here's another way to think of it. PHP Frameworks are slower than PHP, but PHP itself is slower than C. Why not write your application directly in C?

There's no right answer here, it's one of those software engineering/development questions that's a matter of what your current situation demands. The default choice of the industry these days is to use a framework, because if you don't your competitors will release an application that has slower PHP processing than yours, but hits the market three months earlier.

Finally, one last thing to consider from that talk. Rasmus said that most of the time the perceived performance of your application is determined by the frontend: both the JavaScript code and how the browser caches the requests it makes back to your server. PHP is an awful, horrible language that's rarely the bottleneck. When it is the bottleneck, you can usually make a few adjustments (opcode cache, focused refactoring) that will remove it.

Maximum length for MD5 encryption

11 votes

What is the maximum length of a string that can be MD5-encrypted, or is there no limit? And if so, what will be the maximum length of the MD5-encrypted value?

MD5 processes an arbitrary-length message into a fixed-length output of 128 bits, typically represented as a sequence of 32 hexadecimal digits.
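A quick way to see this for yourself is sketched below: the input length varies from empty to a million characters, while the output stays at 32 hex digits (or 16 bytes in raw mode).

```php
<?php
$empty = md5('');                          // empty input
$long  = md5(str_repeat('x', 1000000));    // one million characters of input
$raw   = md5('hello', true);               // raw binary digest instead of hex

echo strlen($empty), ' ', strlen($long), ' ', strlen($raw), "\n";
```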

exec always returns -1

11 votes

I'm using PHP 5.2.9 on a production server, and it seems that the exec() function behaves in a "non-standard" way.

If I run exec("ls", $output, $return_var) then $output will contain the list of files in the current folder as expected, but $return_var will be set to -1 instead of 0. I'm using $return_var to determine whether the command finished successfully, and on every other server tested this works as expected :)

Anyone ever hit a situation like this?


edit:

<?php
$command = "asd";

$t1 = time();

$output = Array();
$result = -5;
$r = exec($command, $output, $result);
$t2 = time();

echo "<pre>";
var_export(Array(
    'command'=>$command,
    'result'=>$result,
    'output'=>implode("\n", $output),
    'r'=>$r,
    't2-t1'=>$t2-$t1,
));
echo "</pre>";

Whatever command I put in $command, $result will always be -1, even for nonexistent commands... this is very weird

Assuming the system returning $result == -1 is Unix-like (I don't know how Windows would behave with the same code)

The PHP (5.2.9) exec() function does not call the C exec() primitive (which returns -1 if it could not replace/execute the process, which is not the case here). Instead it calls popen(), which creates a pipe, performs a fork() and executes a shell with your command. The return_value, -1, is not the direct result from a C primitive, but rather is built by PHP internally, depending on the way your command was processed. In other words, the "ls" command may have executed fine, while PHP, for instance, could not properly close the pipe.

Looking at the C code, in ext/standard/exec.c, there could be two reasons why the return code is -1, both triggered by an error; the first one happens right after the popen() call

  fp = VCWD_POPEN(cmd_p, "r");

  if (!fp) {
       php_error_docref(NULL TSRMLS_CC, E_WARNING, "Unable to fork [%s]", cmd);
       goto err;
  }
  // ...
  err:

  pclose_return = -1;
  goto done;

However in this case, you wouldn't see the result, and the log would show an error.

Later, the return_value is set via the line

  pclose_return = php_stream_close(stream);

Looking at _php_stream_free() (php_stream_close() is a macro replaced with _php_stream_free()), the most likely candidate that could return -1 is

  ret = stream->ops->close(stream, preserve_handle ? 0 : 1 TSRMLS_CC);

This in turn indirectly calls the C primitive pclose(). According to the manual:

The pclose() function returns -1 if wait4(2) returns an error, or some other error is detected.

There seems to be an error detected during the closing of the pipe, one that does not prevent the resulting data from being set. To find the reason rigorously, one needs to check the operating system setup and logs, and the PHP configuration and compilation parameters.

I would recommend

  • to apply patches for your OS, and maybe update to a more recent version (if applicable),
  • to update PHP to 5.3.3 (latest as of now) since the PHP exec() code changed significantly.

Be aware that there were changes related to the PHP suhosin module in version 5.3 that enhance security by default when running PHP files.
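Until the root cause is found, calling code may want to treat -1 as "status unknown" and not as an ordinary failure. The helper below is only a sketch of that idea; the function name and the 'ok' / 'failed' / 'unknown' labels are made up, and it assumes exec() is enabled and a Unix-like shell is available.

```php
<?php
// Run a command and classify the exit status reported by exec().
function run_command($command)
{
    $output = array();
    $status = null;
    exec($command, $output, $status);

    if ($status === -1) {
        // PHP could not determine the real exit status (e.g. pclose() failed).
        $state = 'unknown';
    } elseif ($status === 0) {
        $state = 'ok';
    } else {
        $state = 'failed';
    }

    return array('state' => $state, 'status' => $status, 'output' => $output);
}

// "exit 3" makes the shell itself terminate with status 3.
$result = run_command('exit 3');
print_r($result);
```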

Exotic names for methods, constants, variables and fields - Bug or Feature?

10 votes

after some confusion in the comments to

I thought I'd make this into a question. According to the PHP manual, a valid class name should match against [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*. But apparently, this is not enforced, nor does it apply for anything else:

define('π', pi());
var_dump(π);

class ␀ {
    private $␀ = TRUE;
    public function ␀()
    {
        return $this->␀;
    }
}

$␀ = new ␀;
var_dump($␀ );
var_dump($␀->␀());

works fine (even though my IDE cannot show ␀). Can some erudite person clear this up for me? Can we use any Unicode? And if so, since when? Not that I would actually want to use anything but A-Za-z_ but I'm curious.

Clarification: I am not after a Regex to validate class names, nor do I know if PHP internally uses the Regex it suggests in the manual. The thing that confused me (and apparently the other guys in the linked question) is why things like $☂ = 1 can be used in PHP at all. PHP6 was supposed to be the Unicode release but PHP6 is in hiatus. But if there is no Unicode support, why can I do this then?

This question starts to mention class names in the title, but then goes on to an example that includes exotic names for methods, constants, variables, and fields. There are actually different rules for these. Let's start with the case insensitive ones.

Case-insensitive identifiers (class and function/method names)

The general guideline here would be to use only printable ASCII characters. The reason is that these identifiers are normalized to their lowercase version; however, this conversion is locale-dependent. Consider the following PHP file, encoded in ISO-8859-1:

<?php
function func_á() { echo "worked"; }
func_Á();

Will this script work? Maybe. It depends on what tolower(193) will return, which is locale-dependent:

$ LANG=en_US.iso88591 php a.php
worked
$ LANG=en_US.utf8 php a.php

Fatal error: Call to undefined function func_Á() in /home/glopes/a.php on line 3

Therefore, it's not a good idea to use non-ASCII characters. However, even ASCII characters may give trouble in some locales. See this discussion. It's likely that this will be fixed in the future by doing a locale-independent lowercasing that only works with ASCII characters.

In conclusion, if we use multi-byte encodings for these case-insensitive identifiers, we're looking for trouble. It's not just that we can't take advantage of the case insensitivity. We might actually run into unexpected collisions because all the bytes that compose a multi-byte character are individually turned into lowercase using locale rules. It's possible that two different multi-byte characters map to the same modified byte stream representation after applying the locale lowercase rules to each of the bytes.

Case-sensitive identifiers (variables, constants, fields)

The problem is less serious here, since these identifiers are case sensitive. However, they are just interpreted as bytestreams. This means that if we use Unicode, we must consistently use the same byte representation; we can't mix UTF-8 and UTF-16; we also can't use BOMs.

In fact, we must stick to UTF-8. Outside of the ASCII range, UTF-8 uses lead bytes from 0xc0 to 0xfd and the trail bytes are in the range 0x80 to 0xbf, which are in the allowed range per the manual. Now let's say we use the character "Ġ" in a UTF-16BE encoded file. This will translate to 0x01 0x20, so the second byte will be interpreted as a space.
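The byte-level argument can be checked against the pattern from the manual. The sketch below runs the regex, deliberately without the /u modifier so that it matches bytes, against "π" in UTF-8 and "Ġ" in UTF-16BE. It only tests the documented pattern; as noted in the question, that is not necessarily how PHP's own scanner is implemented.

```php
<?php
// Identifier pattern as given in the PHP manual, anchored for a full match.
$pattern = '/^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$/';

$piUtf8   = "\xCF\x80"; // "π" (U+03C0) encoded as UTF-8
$gUtf16be = "\x01\x20"; // "Ġ" (U+0120) encoded as UTF-16BE

// Both UTF-8 bytes fall in \x7f-\xff, so the name is accepted.
$utf8Ok = preg_match($pattern, $piUtf8);

// 0x01 is outside the allowed range and 0x20 is a space, so this fails.
$utf16Ok = preg_match($pattern, $gUtf16be);

var_dump($utf8Ok, $utf16Ok);
```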

Having multi-byte characters being read as if they were single-byte characters is, of course, no Unicode support at all. PHP does have some multi-byte support in the form of the compilation switch "--enable-zend-multibyte". This allows you to declare the encoding of the script:

<?php
declare(encoding='ISO-8859-1');
// code here
?>

It will also handle BOMs, which are used to auto-detect the encoding and do not become part of the output. There are, however, a few downsides:

  • Performance hit, both memory and CPU. It stores a representation of the script in an internal multi-byte encoding, which takes more space (and it also seems to store in memory the original version) and it also spends some CPU converting the encoding.
  • Multi-byte support is usually not compiled in, so it's less tested (more bugs).
  • Portability issues between installations that have the support compiled in and those that don't.
  • Refers only to the parsing stage; does not solve the problem outlined for case-insensitive identifiers.

Finally, there is the problem of lack of normalization – the same character may be represented with different Unicode code points (independently of the encoding). This may lead to some very difficult to track bugs.

Detect how much stress the mysql database is currently under with PHP

8 votes

I am developing a large web app and want it to alter itself dependent on a factor that relates to the stress the database is currently under.

I am not sure what would be most accurate/effective/easiest. I am considering maybe the number of current connections, or server response time, or CPU usage?

What would be best suited and possible?

Thanks

Interesting question. What you REALLY want is a way for PHP to ask the mySQL server two questions:

server, are you using almost all your cpu capacity?

server, are you using almost all your disk IO capacity?

Based on the answers, I suppose you want to simplify the work your PHP web app does ... perhaps by eliminating some kind of search capability, or caching some data more aggressively.

If you have a shell to your (linux or bsd) mysql server, your two questions can be answered by eyeballing the output from these two commands.

   sar -u 1 10     # the %idle column tells you about unused cpu cycles
   sar -d 1 10     # the %util column tells you which disks are busy and how busy.

But, there's no sweet little query which fetches this data from mySQL to your app.

Edit: one possibility is to write a little Perl hack or other simple program that runs on your server, connects to the local database, and once every so often (once a minute, maybe) determines %idle and %util, and updates a little one-row table in your database. You could, without too much trouble, also add stuff like how full your disks are to this table, if you care. Then your PHP app can query this table. This is an ideal use of the MEMORY storage engine. At any rate, keep it simple: you don't want your monitoring to weigh down your server.

A second-best trick, that you CAN do from your client.

Issue the command SHOW FULL PROCESSLIST, count the number of rows (mySQL processes) for which the Command is "Query", and if you have a lot of them consider it to be a high workload.

You might also add up the Time values for the processes which have status Query, and use a high value of that time as a threshold.

EDIT: if you're running on a mySQL 5.1 or later server (the information_schema.PROCESSLIST table is not available before that), and your server account has access to the mySQL-furnished information_schema, you can use a query directly to get the process data I mentioned:

SELECT COUNT(*) - 1 AS QUERYCOUNT, SUM(P.TIME) AS QUERYTIME
FROM information_schema.PROCESSLIST P
WHERE P.COMMAND = 'Query'

COUNT(*) - 1: because the above query itself counts as a query.

You will need to fiddle with the threshold values to make this work right in production.
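A minimal sketch of the client-side check, assuming the rows have already been fetched from SHOW FULL PROCESSLIST into arrays with 'Command' and 'Time' keys. The function name and the default thresholds are placeholders I made up, not recommendations.

```php
<?php
/**
 * Decide whether the database looks busy, given rows as returned by
 * SHOW FULL PROCESSLIST (each row needs 'Command' and 'Time' keys).
 * The default thresholds are hypothetical and need tuning.
 */
function looksOverloaded(array $processRows, $maxQueries = 20, $maxTotalTime = 60)
{
    $queryCount = 0;
    $totalTime  = 0;
    foreach ($processRows as $row) {
        if ($row['Command'] === 'Query') {
            $queryCount++;
            $totalTime += (int) $row['Time'];
        }
    }
    // Discount the process-list statement itself, which shows up as a Query.
    $queryCount = max(0, $queryCount - 1);

    return $queryCount > $maxQueries || $totalTime > $maxTotalTime;
}

// Hand-written sample rows standing in for a real result set.
$sampleRows = array(
    array('Command' => 'Query', 'Time' => '0'),   // the process-list query
    array('Command' => 'Sleep', 'Time' => '120'), // idle connection, ignored
    array('Command' => 'Query', 'Time' => '5'),
);
var_dump(looksOverloaded($sampleRows));
```

Passing the rows in as a plain array keeps the decision logic separate from whichever database API the app uses to fetch them.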

It's a good idea to have your PHP web app shed load when the data base server can't keep up. Still, a better idea is to identify your long-running queries and optimize them.

How bad is using SELECT MAX(id) in MYSQL instead of mysql_insert_id() in PHP?

8 votes

Background: I'm working on a system where the developers seem to be using a function which executes a MYSQL query like "SELECT MAX(id) AS id FROM TABLE" whenever they need to get the id of the LAST inserted row (the table having an auto_increment column).

I know this is a horrible practice (because concurrent requests will mess the records), and I'm trying to communicate that to the non-tech / management team, to which their response is...

"Oh okay, we'll only face this problem when we have 
(a) a lot of users, or 
(b) it'll only happen when two people try doing something
    at _exactly_ the same time"

I don't disagree with either point, and think we'll run into this problem much sooner than we plan. However, I'm trying to calculate (or figure out a way to estimate) how many users should be using the system before we start seeing messed up links.

Any mathematical insights into that? Again, I KNOW it's a horrible practice, I just want to understand the variables in this situation...


Update: Thanks for the comments folks - we're moving in the right direction and getting the code fixed!

The point is not whether potential bad situations are likely. The point is whether they are possible. As long as there's a non-trivial probability of the issue occurring, and the issue is known, it should be avoided.

It's not like we're talking about changing a one line function call into a 5000 line monster to deal with a remotely possible edge case. We're talking about actually shortening the call to a more readable, and more correct usage.

I kind of agree with @Mark Baker that there is some performance consideration, but since id is a primary key, the MAX query will be very quick. Sure, the LAST_INSERT_ID() will be faster (since it's just reading from a session variable), but only by a trivial amount.

And you don't need a lot of users for this to occur. All you need is a lot of concurrent requests (not even that many). If the time between the start of the insert and the start of the select is 50 milliseconds (assuming a transaction safe DB engine), then you only need 20 requests per second to start hitting an issue with this consistently. The point is that the window for error is non-trivial. If you say 20 requests per second (which in reality is not a lot), and assuming that the average person visits one page per minute, you're only talking 1200 users. And that's for it to happen regularly. It could happen once with only 2 users.
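The back-of-the-envelope numbers in the paragraph above can be reproduced with a few lines. This is only a sketch of that arithmetic; the variable names are mine, and the 50 ms window and one page view per minute are the assumptions already stated in the text.

```php
<?php
// Figures taken from the paragraph above; adjust for your own system.
$windowMs           = 50; // gap between the INSERT and the SELECT MAX(id)
$pageViewsPerMinute = 1;  // assumed activity of an average user

// One request per window keeps the race window permanently occupied.
$requestsPerSecond = 1000 / $windowMs;                         // 20
$requestsPerMinute = $requestsPerSecond * 60;                  // 1200
$usersNeeded       = $requestsPerMinute / $pageViewsPerMinute; // 1200

echo $usersNeeded, " users\n";
```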

And right from the MySQL documentation on the subject:

You can generate sequences without calling LAST_INSERT_ID(), but the utility of 
using the function this way is that the ID value is maintained in the server as 
the last automatically generated value. It is multi-user safe because multiple 
clients can issue the UPDATE statement and get their own sequence value with the
SELECT statement (or mysql_insert_id()), without affecting or being affected by 
other clients that generate their own sequence values.

Packaging a PHP-MySQL app to allow easy install

7 votes

I've developed an open source application in php and mysql. I'd like to give it to the end user to install on their computer and use from their browser without me having to host it for them. But the end users are non-developers so they're unlikely to have what it takes to run the application (php-apache local environment like a developer would) and I don't have the time right now to invest in learning the Windows or Mac SDKs to make a real windows or Mac application. Also most of those interested in it are friends or their friends.

The solution I'm considering is to package apache/mysql/php with the php app itself, and have the installer install them so the app could run from inside the www/htdocs folder. It's like an app that comes with its own server to run it.

  • Has anyone done this sort of thing before?
  • Do I need to build apache/php/mysql from source on windows to do this, or can I somehow use existing windows binaries and have my installer just install them and position my app in the right location?
  • I'm guessing that launching or closing the application could be done through starting/stopping apache, so how would I implement a start/stop to tie into the apache start/stop.
  • Any help or ideas on this would be appreciated.

You should include the zip of XAMPP with the files you need for the app preloaded in the htdocs folder. You can have the users extract it to the root of their C drive and include some sort of README or instructions on how to start up Apache and MySQL. XAMPP includes a convenient little control panel for this purpose.

edit:

I personally use XAMPP all the time when I am traveling and can't be connected to my server for active development. It works wonderfully and is contained all in one folder. It also doesn't require any installation; you just unzip the package. One caveat: installing to anywhere but C:\xampp is annoying.

Building a long query and have a lot of if statements - is there a more elegant way?

7 votes

I have to build a query based on certain conditions. Is there a better way of doing it than the way I have done below? It works fine but I can see it getting out of hand fairly quickly if there were more conditions since I check if any previous conditions had been met every time I check a new one.

    $sql = "SELECT DISTINCT fkRespondentID FROM tblRespondentDayTime";

    if (!empty($day) || !empty($time) || !empty($sportID)) {

        $sql .= " WHERE";

        if (!empty($day)) {
            $sql .= " fldDay='$day'";
        }

        if (!empty($time)) {
            if (!empty($day)) {
                $sql .= " AND";
            }
            $sql .= " fldTime='$time'";
        }

        if (!empty($sportID)) {
            if (!empty($day) || !empty($time)) {
                $sql .= " AND";
            }
            $sql .= " fkRespondentID IN (SELECT fkRespondentID FROM tblRespondentSport WHERE fkSportID='$sportID')";
        }

    }

I would use the old "WHERE 1=1" trick; add this as the first condition, and then you can assume the "AND" condition on each statement that follows.
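A sketch of how the question's code might look with this trick. The function name is hypothetical, and the ? placeholders (meant for a prepared statement) are my addition; the question interpolates the values directly into the string.

```php
<?php
/**
 * Build the respondent query with the "WHERE 1=1" trick.
 * Returns the SQL text plus the values for its ? placeholders, so the
 * caller can hand both to a prepared statement.
 */
function buildRespondentQuery($day, $time, $sportID)
{
    $sql    = "SELECT DISTINCT fkRespondentID FROM tblRespondentDayTime WHERE 1=1";
    $params = array();

    // Every condition can now start with AND, whatever came before it.
    if (!empty($day)) {
        $sql .= " AND fldDay = ?";
        $params[] = $day;
    }
    if (!empty($time)) {
        $sql .= " AND fldTime = ?";
        $params[] = $time;
    }
    if (!empty($sportID)) {
        $sql .= " AND fkRespondentID IN"
              . " (SELECT fkRespondentID FROM tblRespondentSport WHERE fkSportID = ?)";
        $params[] = $sportID;
    }

    return array($sql, $params);
}

list($sql, $params) = buildRespondentQuery('Monday', '', 3);
echo $sql, "\n";
```

Adding a further condition is then one more independent if block, with no need to check which earlier conditions were met.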

unixODBC Freetds PHP Problem

7 votes

I am using Debian. I have unixODBC installed as well as FreeTDS. I am using PHP.

I have read several How-Tos and am stuck on a problem.

I tested FreeTDS by using tsql and it works.

I tested unixODBC by using isql and it works.

When I created a script in PHP and tried to access a database I get the following errors.

Fatal error: Call to undefined function odbc_connect()

I have found multiple php.ini files. Which is the one that Apache2 uses? Is there something in there that needs to be set?

Is there some setting that I missed that was not in the How-Tos?

All help is greatly appreciated.

You may need to provide environment variables to point to the location of your ODBC configuration files:

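<?php
// Tell FreeTDS and unixODBC where their configuration files live.
putenv("FREETDSCONF=/etc/freetds/freetds.conf");
// Note: unixODBC documents ODBCSYSINI as the directory that holds
// odbcinst.ini, so "/etc" may be the value you need here.
putenv("ODBCSYSINI=/etc/odbcinst.ini");
putenv("ODBCINI=/etc/odbc.ini");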

This works for me to connect to several ODBC databases. (Your config files might be somewhere else)
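Separately, the fatal error quoted in the question means that odbc_connect() is not defined at all, which usually points to the ODBC extension not being loaded in the PHP that Apache runs. The sketch below (the $report array is just an illustrative name) reports whether the extension is loaded and which php.ini files are in use.

```php
<?php
// Run this through the web server (not the CLI), since Apache and the
// command line often load different php.ini files.
$report = array(
    'odbc_loaded'  => extension_loaded('odbc'),
    'odbc_connect' => function_exists('odbc_connect'),
    'loaded_ini'   => php_ini_loaded_file(),   // path of the php.ini in use, or false
    'scanned_inis' => php_ini_scanned_files(), // additional .ini files, or false
);

var_dump($report);
```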

Is there any way to prevent AJAX pages from being viewed alone in a browser?

6 votes

For example, when I want to update a part of my page with AJAX I would normally make the appropriate call to getPost.php which would return the markup to be inserted into my page. Is there any way to prevent a user from accessing this page directly (eg: example.com/getPost.php with the appropriate GET or POST arguments) and getting only part of the page since this should be used with AJAX as part of a whole, not alone?

I don't think permissions can be set on the file since it's the client requesting the page, but is there a way to do this by passing an extra argument that can serve as a check digit of sorts?

You could take a look at the request headers and enforce that a header must be set for AJAX requests (often people use X-Requested-With with a value like XMLHttpRequest). Be aware that this header won't be set unless you set it yourself when you make your AJAX request (or use a JavaScript library that does it automatically). However, there is no way to guarantee that someone won't add that header on their own if they want to.

The X-Requested-With header value can be found in $_SERVER['HTTP_X_REQUESTED_WITH'].
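A minimal sketch of such a check. The function name is illustrative, and taking the server array as a parameter is my choice so the check can be exercised without a web server. As noted above, a client can forge the header, so this only filters out casual direct visits.

```php
<?php
/**
 * Return true when the request carries X-Requested-With: XMLHttpRequest.
 * $server is normally $_SERVER.
 */
function isAjaxRequest(array $server)
{
    return isset($server['HTTP_X_REQUESTED_WITH'])
        && strtolower($server['HTTP_X_REQUESTED_WITH']) === 'xmlhttprequest';
}

// In getPost.php one might reject direct visits like this:
// if (!isAjaxRequest($_SERVER)) { header('HTTP/1.1 403 Forbidden'); exit; }

var_dump(isAjaxRequest(array('HTTP_X_REQUESTED_WITH' => 'XMLHttpRequest'))); // bool(true)
var_dump(isAjaxRequest(array()));                                            // bool(false)
```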