## How can I compare two sets of 1000 numbers against each other?

I must check approximately 1000 numbers against 1000 other numbers.

I loaded both and compared them server-side:

``````
foreach( $numbers1 as $n1 ) {
    foreach( $numbers2 as $n2 ) {
        if( $n1 == $n2 ) {
            doBla();
        }
    }
}
``````

This took a long time, so I tried doing the same comparison client-side, loading the numbers into two hidden `div` elements and comparing them with JavaScript. The page still takes 45 seconds to load that way.

I do not need to load the numbers that are not the same.

Is there a faster algorithm? I am thinking of comparing them database-side and just loading the error numbers, then doing an Ajax call for the remaining non-error numbers. But is a MySQL database fast enough?

Sort the lists first. Then you can walk up both lists from the start, comparing as you go.

The loop would look something like this:

``````
// Note: the default Array.sort() compares elements as strings,
// so pass a numeric comparator.
function numerically(a, b) { return a - b; }

var A = getFirstArray().sort(numerically), B = getSecondArray().sort(numerically);

var i = 0, j = 0;
while (i < A.length && j < B.length) {
    if (A[i] === B[j]) {
        doBla(A[i]);
        i++; j++;
    }
    else if (A[i] < B[j]) {
        i++;
    }
    else
        j++;
}
``````

(That's JavaScript; you could do it server-side too, but I don't know PHP.)

Edit — just to be fair to all the hashtable fans (whom I respect of course), it's pretty easy to do that in JavaScript:

``````
var map = {};
for (var i = 0; i < B.length; ++i) map[B[i]] = true; // Assume integers.
for (var i = 0; i < A.length; ++i) if (map[A[i]]) doBla(A[i]);
``````

Or if the numbers are or might be floats:

``````
var map = {};
for (var i = 0; i < B.length; ++i) map['' + B[i]] = true; // Stringify to get stable keys for floats.
for (var i = 0; i < A.length; ++i) if (map['' + A[i]]) doBla(A[i]);
``````

Since numbers are pretty cheap to hash (even in JavaScript, converting to a string before hashing is surprisingly cheap), this would be pretty fast.
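On engines that support ES2015, the same hash-lookup idea can lean on the built-in `Set` (a sketch; `doBla` here is just whatever per-match callback you need):

```javascript
// Build a Set from one list, then probe it with the other: O(n + m) total,
// instead of the O(n * m) nested loops.
function intersect(A, B, doBla) {
    var set = new Set(B);
    for (var i = 0; i < A.length; i++) {
        if (set.has(A[i])) doBla(A[i]);
    }
}

// Example usage: collect the matches into an array.
var matches = [];
intersect([1, 5, 9, 12], [12, 3, 5, 7], function (n) { matches.push(n); });
// matches is now [5, 12]
```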

## How does Facebook achieve good performance?

I was told that Facebook was built using only PHP and MySQL, so how can Facebook's performance be so good?

Note: This needs to be updated.

1. Facebook uses HipHop, which converts PHP into C++ code, which is then compiled into machine code that is much more efficient than interpreted PHP.

2. Facebook has data distributed across many, many servers. For example, they also use Hadoop clusters for some of their data storage.

## How do search engines find relevant content?

How does Google find relevant content when it's parsing the web?

Let's say, for instance, Google uses the PHP native DOM library to parse content. What methods could it use to find the most relevant content on a web page?

My thought is that it would find all paragraphs, order them by length, and then, from possible search strings and query params, work out the percentage of relevance for each paragraph.

Let's say we had this URL:

``````
http://domain.tld/posts/stackoverflow-dominates-the-world-wide-web.html
``````

Now from that URL I would work out that the HTML file name is highly relevant, so I would see how closely that string compares with all the paragraphs on the page!
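That slug-versus-paragraph comparison could be sketched like this (JavaScript for illustration; the helper names and the length cutoff are my own assumptions, and real engines use far more signals):

```javascript
// Pull the keywords out of the URL's file name.
function slugWords(url) {
    var slug = url.split("/").pop().replace(/\.html?$/, "");
    // Keep only words longer than 3 characters, as a crude stop-word filter.
    return slug.split("-").filter(function (w) { return w.length > 3; });
}

// Score a paragraph by the fraction of slug words it contains.
function scoreParagraph(paragraph, words) {
    var text = paragraph.toLowerCase();
    var hits = words.filter(function (w) { return text.indexOf(w) !== -1; });
    return hits.length / words.length; // 0..1 relevance share
}

var words = slugWords("http://domain.tld/posts/stackoverflow-dominates-the-world-wide-web.html");
// words: ["stackoverflow", "dominates", "world", "wide"]
var score = scoreParagraph("Stack Overflow dominates the web today.", words);
// score: 0.25 (only "dominates" matches literally)
```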

A really good example of this is Facebook Share: when you share a page, Facebook's bot quickly crawls the link and brings back images, content, etc.

I was thinking that some sort of scoring method would be best, to work out the percentage of relevancy depending on surrounding elements and metadata.

Are there any books or information on best practices for content parsing that cover how to get the best content from a site, any algorithms that might be discussed, or any in-depth replies?

Some ideas that I have in mind are:

• Find all paragraphs and order by plain text length
• Somehow find the Width and Height of `div` containers and order by (W+H) - @Benoit
• Check meta keywords, title, description and check relevancy within the paragraphs
• Find all image tags and order by largest, and length of nodes away from main paragraph
• Check for object data, such as videos and count the nodes from the largest paragraph / content div
• Work out resemblances from previous pages parsed

The reason why I need this information:

I'm building a website where webmasters send us links and then we list their pages, but I want the webmaster to submit a link, then I go and crawl that page finding the following information.

• An image (if applicable)
• A paragraph of fewer than 255 characters from the best slice of text
• Keywords that would be used for our search engine (Stack Overflow style)
• Meta keywords, description, all images, change-log (for moderation and administration purposes)

I hope you can understand that this is not for a search engine, but the way search engines tackle content discovery is in the same context as what I need it for.

This is a very general question but a very nice topic! Definitely upvoted :) However I am not satisfied with the answers provided so far, so I decided to write a rather lengthy answer on this.

The reason I am not satisfied is that the answers are basically all true (I especially like kovshenin's answer (+1), which is very graph-theory related...), but they are all either too specific on certain factors or too general.

It's like asking how to bake a cake and you get the following answers:

• You make a cake and you put it in the oven.
• You definitely need sugar in it!
• What is a cake?
• The cake is a lie!

You won't be satisfied, because you want to know what makes a good cake. And of course there are a lot of recipes.

Of course Google is the most important player, but, depending on the use case, a search engine might include very different factors or weight them differently.

For example, a search engine for discovering new independent music artists may penalize artist websites with lots of inbound external links.

A mainstream search engine will probably do the exact opposite to provide you with "relevant results".

There are (as already said) over 200 factors that are published by Google, so webmasters know how to optimize their websites. There are very likely many more that the public is not aware of (in Google's case).

But under the very broad and abstract term of SEO optimization, you can generally break the important ones into two groups:

1. How well does the answer match the question? Or: how well does the page's content match the search terms?

2. How popular/good is the answer? Or: what's the PageRank?

In both cases the important thing is that I am not talking about whole websites or domains, I am talking about single pages with a unique URL.

It's also important to note that PageRank doesn't represent all factors, only the ones that Google categorizes as popularity. And by "good" I mean other factors that have nothing to do with popularity.

In case of Google the official statement is that they want to give relevant results to the user. Meaning that all algorithms will be optimized towards what the user wants.

So after this long introduction (glad you are still with me...) I will give you a list of factors that I consider to be very important (at the moment):

Category 1 (how well does the answer match the question?)

You will notice that a lot comes down to the structure of the document!

• The page primarily deals with the exact question.

Meaning: the question's words appear in the page's title text or in heading or leading paragraphs. The same goes for the position of these keywords: the earlier in the page, the better. Repetition helps as well (as long as it's not overdone, which goes under the name of keyword stuffing).

• The whole website deals with the topic (keywords appear in the domain/subdomain)

Category 2 (how important/popular is the page?)

You will notice that not all factors point towards this exact goal. Some are included (especially by Google) just to give a boost to pages that... well... just deserved/earned it.

• Content is king!

The existence of unique content that can't be found elsewhere on the web (or only rarely) gives a boost. This is mostly measured by unordered combinations of words on a website that are generally used very little (important words). But there are much more sophisticated methods as well.
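That "rarely used words count more" idea is essentially inverse document frequency; here is a toy version (my own simplification, nothing like Google's actual weighting):

```javascript
// Toy IDF: a word that appears in few documents gets a high weight,
// a word that appears everywhere gets weight 0.
function idfWeights(documents) {
    var df = {}; // document frequency per word
    documents.forEach(function (doc) {
        var seen = {};
        doc.toLowerCase().split(/\W+/).forEach(function (w) {
            if (w && !seen[w]) { seen[w] = true; df[w] = (df[w] || 0) + 1; }
        });
    });
    var weights = {};
    for (var w in df) weights[w] = Math.log(documents.length / df[w]);
    return weights;
}

var weights = idfWeights([
    "the cat sat on the mat",
    "the dog sat on the log",
    "quantum chromodynamics of the vacuum"
]);
// "the" appears in every document, so its weight is log(3/3) = 0;
// "quantum" appears in only one, so it weighs log(3/1) ≈ 1.1.
```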

• Recency - newer is better

• Historical change (how often the page has been updated in the past; change is good)

• Incoming links (PageRank)

If a page links to another page, the link is worth more if the linking page itself has a high PageRank.

• Link diversity

Basically, links from different root domains count, but other factors play a role too, like how geographically separated the web servers of the linking sites are (according to their IP addresses).

• Trust Rank

For example, if big, trusted, established sites with editorial content link to you, you get trust rank. That's why a link from The New York Times is worth much more than one from some strange new website, even if its PageRank is higher!

• Domain trust

Your whole website gives a boost to your content if your domain is trusted. Different factors count here: of course, links from trusted sites to your domain, but it can even do you good to be in the same datacenter as important websites.

If websites that can be resolved to a topic link to you, and the query can be resolved to this topic as well, it's good.

• Distribution of incoming links over time

If you earned a lot of links in a short period of time, this will do you good at that time and in the near future afterwards, but not so much later on. If you earn links slowly and steadily, it will do you good for content that is "timeless".

A link from a `.gov` domain is worth a lot.

• User click behaviour

What's the click-through rate of your search result?

• Time spent on site

Google Analytics tracking, etc. It's also tracked whether the user clicks back or clicks another result after opening yours.

• Collected user data

Votes, ratings, etc., references in Gmail, etc.

Now I will introduce a third category; one or two points from above would actually go into this category, but I hadn't thought of that earlier... The category is:

**How important/good is your website in general?**

All your pages will be ranked up a bit depending on the quality of your website.

Factors include:

• Good site architecture (easy to navigate, structured, sitemaps, etc...)

• How established it is (long-existing domains are worth more).

• Hoster information (what other websites are hosted near you?)

• Search frequency of your exact name.

Last, but not least, I want to say that a lot of these factors can be enriched by semantic technology, and new ones can be introduced.

For example, someone may search for Titanic while you have a website about icebergs... that can be set into correlation, which may be reflected in the results.

Newly introduced semantic identifiers matter too. For example, OWL tags may have a huge impact in the future.

For example, a blog about the movie Titanic could put a marker on its page saying that it's the same content as the Wikipedia article about the same movie.

This kind of linking is currently under heavy development and establishment, and nobody knows how it will be used.

Maybe duplicate content will be filtered, and only the most important copy of the same content displayed? Or maybe the other way around: you get presented with a lot of pages that match your query, even if they don't contain your keywords?

Google even applies factors with different weights depending on the topic of your search query!

## Best way to implement Single-Sign-On with all major providers?

I already did a lot of research on this topic and have implemented a lot of solutions myself.

Including OpenID, Facebook Connect (using the old REST API and the new Graph OAuth 2.0 API), Sign in with Twitter (which has been upgraded to fully qualified OpenID by now, as far as I know), and so on...

But what I'm still missing is the perfect all-in-one solution.

During my research I stumbled upon some interesting projects.

But I don't want to rely on an external provider and I would like a free solution as well, so I am not limited in implementation.

I have also seen developers implementing one service after another, dutifully following the providers' instructions and setting up models and database tables for everything.

Of course this will work, but it is a ton of work, and it always needs development and changes in your application, etc.

What I am looking for is an abstraction layer that takes all the services out there to one standard that can be integrated in my website. Once a new service appears I only want to add one model that deals with the abstraction of that specific provider so I can seamlessly integrate it into my application.

Or better, find an already existing solution that I can just download.

Ideally this abstraction service would be hosted independently from my application so it can be used for several applications and be upgraded independently.

The last of the 3 solutions above looks promising in concept: everything is just ported to a synthetic OpenID, and the website just has to implement OpenID.

After a while I found Django socialauth, a Python-based authentication system for the Django web framework. It looks like it operates as described above, and I think this is the same login system that Stack Overflow uses (or at least some modified fork of it...).

I downloaded it and tried to set it up to see whether it could work as a standalone solution, but I had no luck, as I am not so into Python either.

I would love a PHP based solution.

So after this long text my question precisely is:

• How would you implement SSO? Any better idea than porting everything and having OpenID as the basis?
• What are the pros and cons of that?
• Do you know any already existing solutions? Preferably open source.

I hope this question is not too subjective, thanks in advance.

Update: I concluded that building a proxy/wrapper for Facebook, porting it to OpenID so it becomes an OpenID endpoint/provider, would be the best option. So that's exactly what I did.

I added the bounty to get feedback/discussion on it. Maybe my approach is not as good as I currently think it is!

Almost every major provider is an OpenID provider/endpoint, including Google, Yahoo, and AOL.

Some of them require the user to specify the username to construct the OpenID endpoint. Some of them (the ones mentioned above) have discovery URLs where the user id is automatically returned, so the user only has to click. (I would be glad if someone could explain the technical background.)

However, the only pain in the ass is Facebook, because they have their Facebook Connect, where they use an adapted version of OAuth for authentication.

Now, what I did for my project is set up an OpenID provider that authenticates the user with the credentials of my Facebook application - so the user gets connected to my application - and returns a user id that looks like:

``````
http://my-facebook-openid-proxy-subdomain.mydomain.com/?id=facebook-user-id
``````

I also configured it to fetch the email address and name and return them as AX attributes.

So my website just has to implement OpenID and I am fine :)

I built it upon the classes you can find here: http://gitorious.org/lightopenid

In my index.php file I just call it like this:

``````
<?php
require 'LightOpenIDProvider.php';
require 'FacebookProvider.php';

$op = new FacebookProvider; // instantiate the provider defined below
$op->baseurl = 'http://fbopenid.2xfun.com'; // needs to be allowed by facebook
$op->server();
?>
``````

and the source code of FacebookProvider.php follows:

``````
<?php
class FacebookProvider extends LightOpenIDProvider
{
    public $appid = "";
    public $appsecret = "";
    public $baseurl = "";

    // I have really no idea what this is for. Just copied it from the example.
    public $select_id = true;

    function __construct() {

        // no trailing slash, as it will be concatenated with the
        // request URI, which has a leading slash
        $this->baseurl = rtrim($this->baseurl, '/');

        parent::__construct();

        # If we use select_id, we must disable it for identity pages,
        # so that an RP can discover it and get proper data (i.e. without select_id)
        if(isset($_GET['id'])) {
            // I have really no idea what happens here. Works with or without! Just copied it from the example.
            $this->select_id = false;
        }
    }

    function setup($identity, $realm, $assoc_handle, $attributes)
    {
        // here we should check the requested attributes and adjust the scope param accordingly
        // for now I just hardcoded email
        $attributes = base64_encode(serialize($attributes));

        // build the OAuth dialog URL and send the user off to Facebook
        $url = "https://graph.facebook.com/oauth/authorize";
        $url .= "?client_id=" . $this->appid;
        $url .= "&redirect_uri=";
        $redirecturl = urlencode($this->baseurl . $_SERVER['REQUEST_URI'] . '&attributes=' . $attributes);
        $url .= $redirecturl;
        $url .= "&display=popup";
        $url .= "&scope=email";
        header("Location: " . $url);
        exit();
    }

    function checkid($realm, &$attributes)
    {
        // try authenticating
        $code = isset($_GET["code"]) ? $_GET["code"] : false;
        if(!$code) {
            // user has not authenticated yet, so return false and setup() will redirect him to Facebook
            return false;
        }

        // we have the code parameter set, so it looks like the user authenticated;
        // exchange the code for an access token
        $url = "https://graph.facebook.com/oauth/access_token";
        $url .= "?client_id=" . $this->appid;
        $url .= "&redirect_uri=";
        $redirecturl = ($this->baseurl . $_SERVER['REQUEST_URI']);
        $redirecturl = strstr($redirecturl, '&code', true);
        $redirecturl = urlencode($redirecturl);
        $url .= $redirecturl;
        $url .= "&client_secret=" . $this->appsecret;
        $url .= "&code=" . $code;
        $data = $this->get_data($url);

        parse_str($data, $data);

        $token = $data['access_token'];

        // fetch the user's profile with the token
        $data = $this->get_data("https://graph.facebook.com/me?access_token=" . urlencode($token));
        $data = json_decode($data);

        $id = $data->id;
        $email = $data->email;
        $attribute_map = array(
            'namePerson/friendly' => 'name', // we should parse the facebook link to get the nickname
            'contact/email' => 'email',
        );

        if($id > 0) {

            $requested_attributes = unserialize(base64_decode($_GET["attributes"]));

            // let's be nice and return everything we can
            $requested_attributes = array_merge($requested_attributes['required'], $requested_attributes['optional']);
            $attributes = array();
            foreach($requested_attributes as $requested_attribute) {
                if(!isset($data->{$attribute_map[$requested_attribute]})) {
                    continue; // unknown attribute
                }
                $attributes[$requested_attribute] = $data->{$attribute_map[$requested_attribute]};
            }

            // yeah, authenticated!
            return $this->serverLocation . '?id=' . $id;
        }
        die('login failed'); // die so we don't retry bouncing back to Facebook
        return false;
    }

    function get_data($url) {
        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }
}
``````

It's just a first working version (quick and dirty). Some dynamic stuff is hardcoded to my needs. It should show how it can be done, and that it can be done. I am happy if someone picks it up and improves it, or rewrites it, or whatever :)

Well, I consider this question answered, but I added a bounty just to get discussion. I would like to know what you think of my solution.

I will award the bounty to the best answer/comment beside this one.

## Why don't PHP attributes allow functions?

I'm pretty new to PHP, but I've been programming in similar languages for years. I was flummoxed by the following:

``````
class Foo {
    public $path = array(
        realpath(".")
    );
}
``````

It produced a syntax error: `Parse error: syntax error, unexpected '(', expecting ')' in test.php on line 5` which is the `realpath` call.

But this works fine:

``````
$path = array(
    realpath(".")
);
``````

After banging my head against this for a while, I was told you can't call functions in an attribute default; you have to do it in `__construct`. My question is: why?! Is this a "feature" or sloppy implementation? What's the rationale?

The compiler code suggests that this is by design, though I don't know what the official reasoning behind that is. I'm also not sure how much effort it would take to reliably implement this functionality, but there are definitely some limitations in the way that things are currently done.

Though my knowledge of the PHP compiler isn't extensive, I'm going to try to illustrate what I believe goes on, so that you can see where the issue is. Your code sample makes a good candidate for this process, so we'll be using it:

``````
class Foo {
    public $path = array(
        realpath(".")
    );
}
``````

As you're well aware, this causes a syntax error. This is a result of the PHP grammar, which makes the following relevant definition:

``````
class_variable_declaration:
    //...
    | T_VARIABLE '=' static_scalar //...
;
``````

So, when defining the values of variables such as `$path`, the expected value must match the definition of a static scalar. This is somewhat of a misnomer, given that the definition of a static scalar also includes array types whose values are also static scalars:

``````
static_scalar: /* compile-time evaluated scalars */
    //...
    | T_ARRAY '(' static_array_pair_list ')' // ...
    //...
;
``````

Let's assume for a second that the grammar was different, and the noted line in the class variable declaration rule looked something more like the following, which would match your code sample (despite breaking otherwise valid assignments):

``````
class_variable_declaration:
    //...
    | T_VARIABLE '=' T_ARRAY '(' array_pair_list ')' // ...
;
``````

After recompiling PHP, the sample script would no longer fail with that syntax error. Instead, it would fail with the compile-time error "Invalid binding type". Since the code is now valid based on the grammar, this indicates that there actually is something specific in the design of the compiler that's causing trouble. To figure out what that is, let's revert to the original grammar for a moment and imagine that the code sample had a valid assignment of `$path = array( 2 );`.

Using the grammar as a guide, it's possible to walk through the actions invoked in the compiler code when parsing this code sample. I've left some less important parts out, but the process looks something like this:

``````
// ...
// Begins the class declaration
zend_do_begin_class_declaration(znode, "Foo", znode);
// Set some modifiers on the current znode...
// ...
// Create the array
array_init(znode);
// Add the value we specified
zend_do_add_static_array_element(znode, 2);
// Declare the property as a member of the class
zend_do_declare_property('$path', znode);
// End the class declaration
zend_do_end_class_declaration(znode, "Foo");
// ...
zend_do_early_binding();
// ...
zend_do_end_compilation();
``````

While the compiler does a lot in these various methods, it's important to note a few things.

1. A call to `zend_do_begin_class_declaration()` results in a call to `get_next_op()`. This means that it adds a new opcode to the current opcode array.
2. `array_init()` and `zend_do_add_static_array_element()` do not generate new opcodes. Instead, the array is immediately created and added to the current class' properties table. Method declarations work in a similar way, via a special case in `zend_do_begin_function_declaration()`.
3. `zend_do_early_binding()` consumes the last opcode on the current opcode array, checking for one of the following types before setting it to a NOP:
• ZEND_DECLARE_FUNCTION
• ZEND_DECLARE_CLASS
• ZEND_DECLARE_INHERITED_CLASS
• ZEND_VERIFY_ABSTRACT_CLASS

Note that in the last case, if the opcode type is not one of the expected types, an error is thrown: the "Invalid binding type" error. From this, we can tell that allowing non-static values to be assigned somehow causes the last opcode to be something other than expected. So, what happens when we use a non-static array with the modified grammar?

Instead of calling `array_init()`, the compiler prepares the arguments and calls `zend_do_init_array()`. This in turn calls `get_next_op()` and adds a new INIT_ARRAY opcode, producing something like the following:

``````
DECLARE_CLASS   'Foo'
SEND_VAL        '.'
DO_FCALL        'realpath'
INIT_ARRAY
``````

Herein lies the root of the problem. By adding these opcodes, `zend_do_early_binding()` gets an unexpected input and throws an exception. As the process of early binding class and function definitions seems fairly integral to the PHP compilation process, it can't just be ignored (though the DECLARE_CLASS production/consumption is kind of messy). Likewise, it's not practical to try and evaluate these additional opcodes inline (you can't be sure that a given function or class has been resolved yet), so there's no way to avoid generating the opcodes.

A potential solution would be to build a new opcode array that was scoped to the class variable declaration, similar to how method definitions are handled. The problem with doing that is deciding when to evaluate such a run-once sequence. Would it be done when the file containing the class is loaded, when the property is first accessed, or when an object of that type is constructed?

As you've pointed out, other dynamic languages have found a way to handle this scenario, so it's not impossible to make that decision and get it to work. From what I can tell though, doing so in the case of PHP wouldn't be a one-line fix, and the language designers seem to have decided that it wasn't something worth including at this point.

## What are some pitfalls/tips one could give for developing a web service?

I'm looking to develop a web service (API) in PHP to offer customers an easier way to integrate with our platform. There are workflow calls that will be validated with user/pass, as well as some reporting options.

Sorry that I can't post more details or code on the subject. I have never developed a web service, but I have experience using them via SOAP.

I would also need to offer the state or status of a workflow, and I think REST would be the best choice here, but I'm still looking for opinions on that.

For reporting, I would like to offer different formats such as XML and Excel/CSV. Is there any reason I would pick one over the other?

What are some of the pitfalls I should look out for?

What are some gems anyone could offer?

Thanks in advance to any help as this is very important for me to understand.

UPDATE #1:

• What would be the most secure method?
• What is the most flexible (platform-independent) method?

UPDATE #2: A little bit about the data flow: each user has credentials to use the API, and no data is shared between users. Usage is: submit a request, the request is processed, and a return is given; no updates. (Think Google: a search request is made and results are given, except in my case only one result is given.) I don't know if this is needed, so it's an FYI.

``````
Always handle errors and exceptions.
``````

Problems will always make their presence felt in the application/API, either at the start or through further development. Don't leave this as a final task, and make it clear when an error occurs, with well-documented response messages.

Also, if your service will handle many requests, and for the same resource id (independent of the user) the same resource is returned, be sure to cache the information. And this is not only for performance reasons, but also for the cases when errors stack up. This way you can at least serve something (possibly useful; more context is required to be explicit) to the client.
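A minimal sketch of that "serve the cached copy when the backend fails" idea (JavaScript for illustration; `fetchResource` is a hypothetical backend call, and the same shape works in PHP with APC or memcached):

```javascript
// Wrap a fetch function: cache successful lookups, and fall back to the
// last good value when the backend throws.
function makeCachedFetcher(fetchResource) {
    var cache = {};
    return function (id) {
        try {
            var fresh = fetchResource(id); // may throw on backend errors
            cache[id] = fresh;
            return fresh;
        } catch (e) {
            if (id in cache) return cache[id]; // stale, but better than an error page
            throw e; // nothing cached: surface the error to the client
        }
    };
}

// Example with a backend that fails after the first call:
var calls = 0;
var fetcher = makeCachedFetcher(function (id) {
    calls++;
    if (calls > 1) throw new Error("backend down");
    return "resource-" + id;
});
var first = fetcher(42);  // "resource-42", fetched and cached
var second = fetcher(42); // backend throws, cached copy is returned
```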

## How can I return a 404 for the default route of a Zend Framework action?

I have a controller and action which I'm accessing through a custom URL. The original route is still accessible, though, at the default location:

``````
zend.com/controller/action
``````

How can I change this to simulate a "Page not found" when the user tries to access this URL? Is it possible?

If the action handler is used to respond to both URLs, you would first have to detect which URL is being requested (using `$this->_request->getRequestUri()`). If the default URL is detected, I think the easiest way to create a "page not found" would be to use

``````
$this->_redirect("/path/to/simulated/404/page")
``````

and set up a controller and action to respond to it.

This won't actually send an HTTP 404, though. To do that, I think you would have to raise an exception within your action handler. I don't know what the official "zendy" way of doing this is, but this seems to work:

``````
throw new Zend_Controller_Action_Exception('Not Found', 404);
``````

## In regards to for(), why use i++ rather than ++i?

Perhaps it doesn't matter to the compiler once it optimizes, but in C/C++, I see most people make a for loop in the form of:

``````
for (i = 0; i < arr.length; i++)
``````

where the incrementing is done with the postfix `++`. I get the difference between the two forms: `i++` returns the current value of `i`, but then adds 1 to `i` on the quiet; `++i` first adds 1 to `i` and returns the new value (1 more than `i` was).
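The difference is easy to demonstrate in any C-family language (JavaScript here):

```javascript
var i = 5;
var a = i++; // postfix: a gets the old value, then i becomes 6

var j = 5;
var b = ++j; // prefix: j becomes 6 first, then b gets the new value

// a === 5, b === 6, and both counters end up at 6.
```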

I would think that `i++` takes a little more work, since a previous value needs to be stored in addition to the next value: push `*(&i)` to the stack (or load it to a register); increment `*(&i)`. Versus `++i`: increment `*(&i)`; then use `*(&i)` as needed.

(I get that the "Increment *(&i)" operation may involve a register load, depending on CPU design. In which case, i++ would need either another register or a stack push.)

Anyway, at what point, and why, did i++ become more fashionable?

I'm inclined to believe azheglov: it's a pedagogic thing, and since most of us do C/C++ on a Windows or *nix system where the compilers are of high quality, nobody gets hurt.

If you're using a low-quality compiler or an interpreted environment, you may need to be sensitive to this. Certainly, if you're doing advanced C++, device-driver, or embedded work, hopefully you're well-seasoned enough for this to be no big deal at all. (Do dogs have Buddha-nature? Who really needs to know?)

My theory (why i++ is more fashionable) is that when people learn C (or C++) they eventually learn to code iterations like this:

``````
while( *p++ ) {
    ...
}
``````

Note that the postfix form is important here (using the prefix form would create an off-by-one type of bug).

When the time comes to write a `for` loop where `++i` or `i++` doesn't really matter, it may feel more natural to use the postfix form.

ADDED: What I wrote above applies to primitive types, really. When coding something with primitive types, you tend to do things quickly and do what comes naturally. That's the important caveat that I need to attach to my theory.

If `++` is an overloaded operator on a C++ class (the possibility Rich K. suggested in the comments) then of course you need to code loops involving such classes with extreme care as opposed to doing simple things that come naturally.

## Preventing form resubmission

Two pages: page one contains an HTML form; page two contains the code that handles the form's data.

The form on page one is submitted and the browser is redirected to page two. Page two handles the data. Now, if page two is refreshed, an alert titled "Confirm Form Resubmission" pops up.

How do I prevent that?

There are 2 approaches people used to take here:

Method 1: Use AJAX + Redirect

This way you post your form in the background to page two using jQuery or something similar, while the user still sees page one displayed. Upon successful posting, you redirect the browser to page two.

Method 2: Post + Redirect to self

This is a common technique on forums. The form on page one posts the data to page two; page two processes the data, does what needs to be done, and then issues an HTTP redirect to itself. This way the last "action" the browser remembers is a simple GET on page two, so the form is not resubmitted upon F5.

## How unique is PHP's __autoload()?

PHP's `__autoload()` (documentation) is pretty interesting to me. Here's how it works:

• You try to use a class, like `new Toast_Mitten()`(footnote1)
• The class hasn't been loaded into memory. PHP pulls back its fist to sock you with an error.
• It pauses. "Wait," it says. "There's an `__autoload()` function defined." It runs it.
• In that function, you have somehow mapped the string `Toast_Mitten` to `classes/toast_mitten.php` and told it to require that file. It does.
• Now the class is in memory and your program keeps running.

Memory benefit: you only load the classes you need. Terseness benefit: you can stop including so many files everywhere and just include your autoloader.

Things get particularly interesting if

1) Your `__autoload()` has an automatic way of determining the file path and name from the class name. For instance, maybe all your classes are in `classes/` and `Toast_Mitten` will be in `classes/toast_mitten.php`. Or maybe you name classes like `Animal_Mammal_Weasel`, which will be in `classes/animal/mammal/animal_mammal_weasel.php`.

2) You use a factory method to get instances of your class.

``````\$Mitten = Mitten::factory('toast');
``````

The Mitten::factory method can say to itself, "let's see, do I have a subclass called `Toast_Mitten()`? If so, I'll return that; if not, I'll just return a generic instance of myself - a standard mitten. Oh, look! `__autoload()` tells me there is a special class for toast. OK, here's an instance!"

Therefore, you can start out using a generic mitten throughout your code, and when the day comes that you need special behavior for toast, you just create that class and bam! - your code is using it.
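The automatic name-to-path mapping described in point 1 is only a few lines of code. Here is a hypothetical helper (in Python for illustration) implementing the nested `Animal_Mammal_Weasel` scheme; the function name and `classes/` base directory are assumptions:

```python
def class_file(class_name, base="classes"):
    """Map 'Animal_Mammal_Weasel' to
    'classes/animal/mammal/animal_mammal_weasel.php': every name part
    except the last becomes a directory, the full name is the filename."""
    parts = class_name.lower().split("_")
    return "/".join([base] + parts[:-1] + ["_".join(parts)]) + ".php"
```

An `__autoload()` implementation would compute this path and `require` it.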

My question is twofold:

• (Fact) Do other languages have similar constructs? I see that Ruby has an autoload, but it seems that you have to specify in a given script which classes you expect to use it on.
• (Opinion) Is this too magical? If your favorite language doesn't do this, do you think, "hey nifty, we should have that" or "man I'm glad Language X isn't that sloppy?"

1 My apologies to non-native English speakers. This is a small joke. There is no such thing as a "toast mitten," as far as I know. If there were, it would be a mitten for picking up hot toast. Perhaps you have toast mittens in your own country?

Both Ruby and PHP get it from AUTOLOAD in Perl.

Note that the AutoLoader module is a set of helpers for common tasks using the AUTOLOAD functionality.

## Ensure a PHP script is only ever run as a cron job?

How can I ensure a user can not run a PHP script and that it is only ever run as part of a cron job?

You can set an environment variable in your crontab. A line like `IS_CRON=1` can be placed at the beginning of your crontab; then check in your PHP program for `getenv("IS_CRON") == "1"`.

Of course, you should also use file permissions, as they're not so easily bypassed. If this is run as part of root's cron, `chown root:root yourscript.php` and `chmod 700 yourscript.php`.

As ircmaxell says, it'd be better to run as a user other than root assuming you don't need root permissions for what you're doing. I was just taking a guess about your setup.
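The crontab-variable check can be sketched like this (in Python for illustration; the `IS_CRON` name is the one suggested above, and the guard function name is hypothetical):

```python
import os
import sys

def ensure_cron():
    """Bail out unless the IS_CRON marker set at the top of the crontab
    is present in the environment."""
    if os.environ.get("IS_CRON") != "1":
        sys.exit("This script may only be run from cron.")
```

Note that any user who can read the crontab can replicate the variable, which is why the file-permission approach above is the stronger of the two.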

## What aspect to develop first in a website?

I've just started learning different languages in web development, and I believe the best way to improve is to think of an idea and develop it. (Feel free to correct me if I'm wrong.)

My question is: which aspect do I develop first? If I am looking to build a simple script which is styled and consists of HTML, PHP and CSS, which part of the website do I design first?

I was thinking HTML > PHP > CSS; however, once I complete the CSS I would have to edit all the tags to include the CSS classes/IDs. Is that the way it's done?

PS - Also, I started off web development by learning the basics as well as a few intermediate aspects of all three of the above-mentioned languages, and now I'm looking to design/clone scripts I've come across to get better at them. Is this approach correct?

Thanks.

The order doesn't really matter.

Deal with the data and the UI separately (following the MVC pattern will help you do this), and modify each of them as needed.

You'll probably find that the optimal solution is to work on a feature at a time rather than doing the backend of all features and then the frontend of all features (or vice versa).

Within each feature, start on whichever end you have the strongest vision of and let it inform your development of the other.

## PHP: object is NULL right after creation

We have very strange errors occasionally popping up in our PHP logs: `Trying to get property of non-object.`

This exact error seems to be caused by the access to the member `\$shortName` in the following if statement:

``````class MyLocaleWrapper extends SomeOtherClass {
…
    protected static \$system = NULL;
    public static function getSystemLocale() {
        if (self::\$system === NULL) {
            self::\$system = new self();
            debug(self::\$system);
            self::\$system->rfcName = SYSTEM_LOCALE_RFCNAME;
            self::\$system->shortName = strtolower(Locale::getRegion(self::\$system->rfcName));
            if (self::\$system->shortName == '') {
                self::\$system->shortName = strtolower(self::\$system->rfcName);
            }
…

# in another file:
class SomeOtherClass {
…
    public function __construct() {
        # Some documentation about features that have been
        # removed from the constructor, but no real code in here.
        return NULL;
    }
…

# in yet another file:
MyLocaleWrapper::getSystemLocale();
``````

If I dump `self::\$system` into a log file, I see that it is `NULL` - right after being constructed with the keyword `new`.

The most interesting part is that this file is included in each and every request to our page, so it gets executed ~ 10 times per second. But occasionally it just fails without anyone touching the code (or even the server).

Has anyone else ever experienced such behavior in PHP?

We finally found out we were running into PHP bug #50027. After setting the php.ini setting `zend.enable_gc` to false, the error vanished.

## Why is locking such a mess in PHP?

A SO user asked a question to which the answer effectively was "use a locking mechanism".

While researching my answer, I discovered that there seems to be no simple, inter-process-reliable locking mechanism in PHP. flock() has a big fat warning:

On some operating systems flock() is implemented at the process level. When using a multithreaded server API like ISAPI you may not be able to rely on flock() to protect files against other PHP scripts running in parallel threads of the same server instance!

The discussion in this question delves into the issue pretty deeply, but comes up only with rather complex solutions: Using a RAM disk, or Memcache.

The only thing that looks halfway good is MySQL's `GET_LOCK()`.

So my question is: Is this really the way it is? Is there really no simple, straightforward, cross-platform safe locking system in PHP? One that is atomic, and will release the lock if the owner process dies, and doesn't need huge setup efforts?

Disagree with Wernight's answer. Yes, the web stuff is very relevant, but the limiting factor is how the OS behaves.

On all the OSes supported by PHP there are only two choices for file locking - blocking or non-blocking. Ultimately PHP must use the OS file-locking mechanism to avoid conflicts with non-PHP code accessing the same files. If you use blocking locks, then the PHP script may be blocked indefinitely waiting for the lock to be released - not a good scenario for a web application. OTOH if you make a non-blocking lock call and it fails - what do you do next - do you just wait a random amount of time and let all your PHP scripts try to grab the lock?

The only practical way to solve the problem is with a queued lock request which times out - but AFAIK there's no OS natively providing that facility. I've written such code myself - for a dedicated webserver, so there was no problem with allowing other programs access; however, I expect it may be possible to extend this to a system-wide mandatory locking scheme using inotify.
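A lock request that queues with a timeout can be approximated by retrying a non-blocking lock until a deadline. A rough Unix-only sketch (in Python for illustration; the function name and polling interval are arbitrary, and `flock()` here has the same per-process caveats quoted above):

```python
import fcntl
import time

def acquire_with_timeout(path, timeout_s=5.0, poll_s=0.05):
    """Retry a non-blocking flock() until it succeeds or the timeout
    expires. The kernel drops the lock automatically if the holder dies."""
    f = open(path, "a")
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return f  # keep this object open; closing it releases the lock
        except BlockingIOError:
            if time.monotonic() >= deadline:
                f.close()
                raise TimeoutError("could not acquire lock on " + path)
            time.sleep(poll_s)
```

Polling is not a true queue (there is no fairness guarantee), which is exactly the gap described above.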

## Picking the nearest value from an array reflecting ranges

I have an array that reflects rebate percentages depending on the number of items ordered:

``````\$rebates = array(
1 => 0,
3 => 10,
5 => 25,
10 => 35);
``````

meaning that for one or two items, you get no rebate; for 3+ items you get 10%, for 5+ items 25%, for 10+ items 35%, and so on.

Is there an elegant, one-line way to get the correct rebate percentage for an arbitrary number of items, say `7`?

Obviously, this can be solved using a simple loop: that's not what I'm looking for. I'm interested in whether there is a core array function (or another function I don't know of) that can do this more elegantly.

I'm going to award the accepted answer a bounty of 200, but apparently, I have to wait 24 hours until I can do that. The question is solved.

Here's another one, again not short at all.

``````\$percent = \$rebates[max(array_intersect(array_keys(\$rebates),range(0,\$items)))];
``````

The idea is basically to get the highest key (`max`) which is somewhere between `0` and `\$items`.
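For comparison, the same "highest key not exceeding the count" idea reads naturally in other languages too; a hypothetical Python equivalent:

```python
rebates = {1: 0, 3: 10, 5: 25, 10: 35}

def rebate_percent(items):
    """Rebate for an order: the value at the highest threshold <= items."""
    return rebates[max(k for k in rebates if k <= items)]
```

Unlike the `range()` version above, this iterates only over the rebate thresholds, so it stays cheap however large the item count gets.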

## What is wrong with this MySQL Query?

It's 12:30am and I have been coding for 9 hours straight. I really need to get this project done, but MySQL is messing with my deadline. Could you examine this snippet for me and see if you can find out what is wrong?

PHP/MySQL Query

``````\$q = \$this->db->query("SELECT * FROM bans WHERE ip='".\$ip."'");
``````

Keeps returning the following error...

MYSQL Error [Oct 6th, 2010 11:31pm CDT]
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '* FROM bans WHERE ip='206.53.90.231'' at line 1 (1064)

I do not see anything wrong with the query. I've even tried different methods of including the variable \$ip, but to no avail.

EDIT:
Just to add in here, the ip column in my database is a varchar(255).

EDIT 2:
Here is the whole affected code. Keep in mind that this is all in a class. If I'm missing something, let me know.

Line from another Function

``````if(\$this->isBanned(\$_SERVER['REMOTE_ADDR'])===true) { return json_encode(array('error'=>'You are banned from this ShoutBox.')); }
``````

Affected Function

``````function isBanned(\$ip) {
    \$q = \$this->db->query("SELECT * FROM bans WHERE ip='".\$ip."'");
    \$num = \$this->db->affected_rows;
    if (\$num > 0) {
        \$row = \$this->db->fetch_array(\$q);
        if ((\$row['expires'] < time()) && (\$row['expires'] !== 0)) {
            \$this->unbanUser(\$ip, 'internal');
            return false;
        }
        return true;
    }
    return false;
}
``````

unbanUser function

``````function unbanUser(\$ip, \$t = 'box') {
    \$q = \$this->db->query("SELECT * FROM bans WHERE ip='".\$ip."'");
    \$num = \$this->db->affected_rows;
    if (\$num > 0) {
        \$q = \$this->db->query("DELETE * FROM bans WHERE ip='".\$ip."'");
        return ((\$t == 'box') ? json_encode(array('status' => 'removed')) : true);
    } else {
        return ((\$t == 'box') ? json_encode(array('error' => 'Unable to locate the user.')) : true);
    }
}
``````

It is your `DELETE` statement which is causing the error.

Remove the `*` after the `DELETE` and it should be fine.
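To illustrate (using Python's built-in sqlite3 here for convenience; MySQL rejects the starred form with the same kind of syntax error):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bans (ip TEXT)")
conn.execute("INSERT INTO bans VALUES ('206.53.90.231')")

# DELETE takes no column list; "DELETE * FROM ..." is a syntax error.
conn.execute("DELETE FROM bans WHERE ip = ?", ("206.53.90.231",))
remaining = conn.execute("SELECT COUNT(*) FROM bans").fetchone()[0]
```

Note also that the `?` placeholder binds the IP as data rather than interpolating it into the SQL string, which the string-concatenated queries above do not.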

## Need help understanding MySQL injection

SQL injection refers to the act of someone inserting a MySQL statement to be run on your database without your knowledge. Injection usually occurs when you ask a user for input, like their name, and instead of a name they give you a MySQL statement that you will unknowingly run on your database.

I read the whole article, but I still have some major issues understanding what it is and how it can be done.

In the first example, what will they actually see?

As far as I understood, if I actually echo \$name, they will see all the names, because it will always "be true". Am I correct?

The other thing I don't understand is whether the MySQL injection problem is solved with mysql_real_escape_string(); there has to be more to it.

What I really don't get is: if mysql_real_escape_string() is made to solve that issue, why isn't this done automatically? I mean, is there a reason you have to add mysql_real_escape_string() every time? Are there cases when you shouldn't use it, and is that why they don't make it automatic?

I hope the question is clear enough; maybe my lack of understanding of the topic makes the question confusing, so please ask for any clarification if necessary!

MySQL won't escape automatically, because you build the query string yourself. For example:

``````\$query = 'SELECT * FROM users WHERE name="' . \$name . '"';
``````

You just pass the raw string stored in \$query, which is open to SQL injection. For example, if \$name is `something" OR "1=1`, your query string ends up being:

``````\$query = 'SELECT * FROM users WHERE name="something" OR "1=1"';
``````

That would return every user from the users table, which is why you need to escape values. However, if you use PDO, the escaping is done for you when you use the binding functionality. It's a two-step process: prepare the query, then "bind" the data/variables to the placeholders when executing. In PDO, it would look something like this:

``````\$query = 'SELECT * FROM users WHERE name = :name'; // placeholders are not quoted
\$stmt = \$pdo->prepare(\$query);
\$stmt->execute(array('name' => 'something'));
``````

Then, things are automatically escaped for you.
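The same principle can be demonstrated with Python's built-in sqlite3 module, which supports the identical named-placeholder style; note how the malicious input from the example above simply fails to match anything:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# The malicious input is bound as data, so it cannot alter the query.
evil = 'something" OR "1=1'
hits = conn.execute("SELECT * FROM users WHERE name = :name",
                    {"name": evil}).fetchall()

# A legitimate lookup works the same way.
alice = conn.execute("SELECT * FROM users WHERE name = :name",
                     {"name": "alice"}).fetchall()
```

Because the value never becomes part of the SQL text, there is nothing to escape and nothing for an attacker to break out of.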

## How to efficiently find the closest locations near a given location

I'm sorry for the vague question title... it's going to take a bit of explaining...

Basically I'm making a script where a load of businesses are loaded into a MySQL database with a latitude and longitude. Then I am supplying that script with a latitude and longitude (of the end user), and the script has to calculate the distance from the supplied lat/long to EACH of the entries it gets from the database and order them from nearest to furthest.

I only realistically need about 10 or 20 "nearest" results, but I can't think of any way to do this other than to get all the results from the database, run the function on each of them, and then sort the array.

This is what I have already:

``````<?php

function getDistance(\$point1, \$point2){

\$pi          = 3.1415926;
\$radius      = 6371;     // Earth's radius in km; the result is in these units.
\$deg_per_rad = 57.29578; // Degrees per radian.

\$distance = (\$radius * \$pi * sqrt(
(\$point1['lat'] - \$point2['lat'])
* (\$point1['lat'] - \$point2['lat'])
+ cos(\$point1['lat'] / \$deg_per_rad)  // Shrink longitude differences away from the equator.
* (\$point1['long'] - \$point2['long'])
* (\$point1['long'] - \$point2['long'])
) / 180);

\$distance = round(\$distance,1);
return \$distance;  // Returned using the units used for \$radius.
}

include("../includes/application_top.php");

\$lat = (is_numeric(\$_GET['lat'])) ? \$_GET['lat'] : 0;
\$long = (is_numeric(\$_GET['long'])) ? \$_GET['long'] : 0;

\$startPoint = array("lat"=>\$lat,"long"=>\$long);

\$sql = "SELECT * FROM mellow_listings WHERE active=1";
\$result = mysql_query(\$sql);

while(\$row = mysql_fetch_array(\$result)){
\$thedistance = getDistance(\$startPoint,array("lat"=>\$row['lat'],"long"=>\$row['long']));
\$data[] = array('id' => \$row['id'],
'name' => \$row['name'],
'description' => \$row['description'],
'lat' => \$row['lat'],
'long' => \$row['long'],
'county' => \$row['county'],
'postcode' => strtoupper(\$row['postcode']),
'phone' => \$row['phone'],
'email' => \$row['email'],
'web' => \$row['web'],
'distance' => \$thedistance);
}

\$url .= "q=Off+licence";    // query
\$url .= "&v=1.0";           // version number
\$url .= "&rsz=8";           // number of results
\$url .= "&key=ABQIAAAAtG"
."Pcon1WB3b0oiqER"
."FZ-TRQgsWYVg721Z"
."IDPMPlc4-CwM9Xt"
."FBSTZxHDVqCffQ2"
."W6Lr4bm1_zXeYoQ"; // api key
\$url .= "&sll=".\$lat.",".\$long;

// sendRequest
// note how referer is set manually
\$ch = curl_init();
curl_setopt(\$ch, CURLOPT_URL, \$url);
curl_setopt(\$ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt(\$ch, CURLOPT_REFERER, /* url */);
\$body = curl_exec(\$ch);
curl_close(\$ch);

// now, process the JSON string
\$json = json_decode(\$body, true);

foreach(\$json['responseData']['results'] as \$array){

\$thedistance = getDistance(\$startPoint,array("lat"=>\$array['lat'],"long"=>\$array['lng']));
\$data[] = array('id' => '999',
'name' => \$array['title'],
'description' => '',
'lat' => \$array['lat'],
'long' => \$array['lng'],
'county' => \$array['region'],
'postcode' => '',
'phone' => \$array['phoneNumbers'][0],
'email' => '',
'web' => \$array['url'],
'distance' => \$thedistance);

}

// sort the array
foreach (\$data as \$key => \$row) {
\$id[\$key] = \$row['id'];
\$distance[\$key] = \$row['distance'];
}

array_multisort(\$distance, SORT_ASC, \$data);

echo '<?xml version="1.0" encoding="UTF-8"?>'."\n";
echo '<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">'."\n";
echo '<plist version="1.0">'."\n";
echo '<array>'."\n";

for(\$i = 0; isset(\$distance[\$i]); \$i++){
//echo \$data[\$i]['id']." -> ".\$distance[\$i]."<br />";
echo '<dict>'."\n";
foreach(\$data[\$i] as \$key => \$val){
echo '<key><![CDATA['.\$key.']]></key>'."\n";
echo '<string><![CDATA['.htmlspecialchars_decode(\$val, ENT_QUOTES).']]></string>'."\n";
}
echo '</dict>'."\n";
}

echo '</array>'."\n";
echo '</plist>'."\n";
?>
``````

Now, this runs fast enough with only two or three businesses in the database, but I'm currently loading 5k businesses into the database and I'm worried that it's going to be incredibly slow running this for EACH entry. What do you think?

It's not the kind of data I could cache either, as the likelihood of two users having the same lat/long is liable to be incredibly rare, so caching wouldn't help.

Thanks for any help and any suggestions. They're all much appreciated.

Option 1: Do the calculation on the database by switching to a database that supports geospatial queries.

Option 2: Do the calculation on the database: you're using MySQL, so the following stored procedure should help

``````CREATE FUNCTION distance (latA double, lonA double, latB double, lonB double)
RETURNS double DETERMINISTIC
BEGIN
-- Convert the degree arguments to radians before applying the haversine formula.
SET @RlatA = RADIANS(latA);
SET @RlatB = RADIANS(latB);
SET @deltaLat = @RlatA - @RlatB;
SET @deltaLon = RADIANS(lonA) - RADIANS(lonB);
SET @d = SIN(@deltaLat/2) * SIN(@deltaLat/2) +
COS(@RlatA) * COS(@RlatB) * SIN(@deltaLon/2)*SIN(@deltaLon/2);
RETURN 2 * ASIN(SQRT(@d)) * 6371.01; -- mean Earth radius in km
END//
``````
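For reference, the same haversine formula as a plain function (in Python for illustration, with the same mean Earth radius of 6371.01 km):

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat_a, lon_a, lat_b, lon_b):
    """Great-circle distance in kilometres between two points in degrees."""
    lat_a, lon_a, lat_b, lon_b = map(radians, (lat_a, lon_a, lat_b, lon_b))
    d_lat = lat_a - lat_b
    d_lon = lon_a - lon_b
    h = sin(d_lat / 2) ** 2 + cos(lat_a) * cos(lat_b) * sin(d_lon / 2) ** 2
    return 2 * asin(sqrt(h)) * 6371.01  # mean Earth radius in km
```

Unlike the flat-Earth approximation in the question's `getDistance()`, this stays accurate over long distances.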

EDIT

If you have an index on latitude and longitude in your database, you can reduce the number of rows for which the distance must be calculated by working out an initial bounding box in PHP (\$minLat, \$maxLat, \$minLong and \$maxLong), and limiting the rows to a subset of your entries based on that (WHERE latitude BETWEEN \$minLat AND \$maxLat AND longitude BETWEEN \$minLong AND \$maxLong). Then MySQL only needs to execute the distance calculation for that subset of rows.

FURTHER EDIT (as an explanation for the previous edit)

If you're simply using the SQL statement provided by Jonathon (or a stored procedure to calculate the distance), then SQL still has to look through every record in your database and calculate the distance for every record before it can decide whether to return that row or discard it.

Because the calculation is relatively slow to execute, it would be better if you could reduce the set of rows that need to be calculated, eliminating rows that will clearly fall outside of the required distance, so that we're only executing the expensive calculation for a smaller number of rows.

If you consider that what you're doing is basically drawing a circle on a map, centred on your initial point and with a radius of the distance, then the formula simply identifies which rows fall within that circle... but it still has to check every single row.

Using a bounding box is like drawing a square on the map first, with the left, right, top and bottom edges at the appropriate distance from our centre point. Our circle will then be drawn within that box, with the northernmost, easternmost, southernmost and westernmost points of the circle touching the borders of the box. Some rows will fall outside that box, so SQL doesn't even bother trying to calculate the distance for those rows. It only calculates the distance for the rows that fall within the bounding box, to see if they fall within the circle as well.

Within PHP, we can use a very simple calculation that works out the minimum and maximum latitude and longitude based on our distance, then set those values in the WHERE clause of your SQL statement. This is effectively our box, and anything that falls outside of that is automatically discarded without any need to actually calculate its distance.
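That pre-filtering box takes only a few lines to compute. A sketch (in Python for illustration; a simple spherical approximation that is not valid near the poles or across the 180th meridian):

```python
from math import cos, degrees, radians

def bounding_box(lat, lon, dist_km, earth_radius_km=6371.01):
    """Min/max latitude and longitude of a square that encloses the circle
    of radius dist_km around (lat, lon), all in degrees."""
    d_lat = degrees(dist_km / earth_radius_km)
    # Lines of longitude converge toward the poles, so widen accordingly.
    d_lon = degrees(dist_km / (earth_radius_km * cos(radians(lat))))
    return lat - d_lat, lat + d_lat, lon - d_lon, lon + d_lon
```

The four returned values are what would go into the WHERE ... BETWEEN clauses described above.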

There's a good explanation of this (with PHP code) on the Movable Type website that should be essential reading for anybody planning to do any GeoPositioning work in PHP.

## Can I save the credit card secret code in the database?

I need to save the credit card numbers and secret codes of users in the database in plain text (with the users' consent, obviously) for automatic operations made from the server.

Are there any problems with this?

What do I need to be aware of?

Most credit card processing agreements that I have seen do not allow you to store the code from the back of the card (the CVV).

There are other security implications of storing plain-text credit card numbers, but storing the code is usually specifically disallowed by your agreement (and by the PCI DSS). You will need to read yours to see what you are actually allowed to store.

As for storing the credit card number, that is also usually a very bad idea. If your database is compromised, you will be held liable and it could cost you a lot of money.

Unless you have a very good reason to store the credit card number and have a very good team working on security, I would not recommend storing any credit card data.