Best questions in August 2010

A function that returns only true

154 votes

During a code review I performed today for my colleague, I noticed a function that was defined as returning a boolean value, but in practice it returned only true. In case of failure, this function threw an exception. I pointed it out and advised changing the return type to void (the code is written in C++). I had no doubt that this was wrong and was sure he had just overlooked it. To my complete surprise, the programmer told me that it was intentional, for forward compatibility; he said: "What if we decide later that the function should return false and not throw an exception? The code then will just neglect the return values at all." After that we had a pretty heated dispute that ended with both sides holding their positions - he refused to change the function.

To complete the picture, the purpose of the function is to initialize a networking module. A failure in this case is critical - the application halts if the module cannot initialize. But it isn't about returning false or throwing an exception. The man argued that even if the function didn't throw and always succeeded, he would have given it a boolean return value just for that forward compatibility.

Also, I should add, that we don't throw exceptions to return error codes from functions, but use them as a sign of a completely catastrophic failure. That is, there are no try-catch blocks except for the main function. Exceptions in our case are just an alternative to calling exit(errno). From this point of view my colleague was correct - his function always succeeds or completely shuts down the process.

I still believe that I am right and that such a function is bad and even dangerous coding style, but it made me think: maybe it's just me? Maybe I'm too stuck in my personal dogma, disconnected from reality. What do you think? Is such a function an error, a weird but acceptable style, or a matter of personal preference?

I'd follow (what I consider) the heart of YAGNI — rather than trying to plan for imaginary eventualities, figure out what it's currently required to do, and write it to do that as cleanly (and well in general) as possible.
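As a minimal sketch of what that looks like for the function in the question: the name initNetworking and the available parameter are hypothetical stand-ins, not the asker's code. The function returns void and reports the only failure mode it has today by throwing.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the real networking setup. `available`
// simulates whether the module can come up at all.
void initNetworking(bool available) {
    if (!available) {
        // Failure is fatal in this design, so it is reported by throwing,
        // not through a return value that nobody would check.
        throw std::runtime_error("networking module failed to initialize");
    }
    // ... the real initialization would go here ...
}
```

If a recoverable failure ever becomes a real requirement, changing the signature then makes the compiler point at every call site that has to start handling the result, whereas a bool that has always been ignored would stay silently ignored.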

What to do about a 11000 lines C++ source file?

149 votes

So we have this huge (is 11000 lines huge?) mainmodule.cpp source file in our project and every time I have to touch it I cringe :-)

As this file is so central and large, it keeps accumulating more and more code and I can't think of a good way to make it actually start to shrink.

The file is used and actively changed in several (> 10) maintenance versions of our product and so it is really hard to refactor it. If I were to "simply" split it up, say for a start, into 3 files, then merging back changes from maintenance versions will become a nightmare. And also if you split up a file with such a long and rich history, tracking and checking old changes in the SCC history suddenly becomes a lot harder.

The file basically contains the "main class" (main internal work dispatching and coordination) of our program, so every time a feature is added, it also affects this file and every time it grows. :-(

What would you do in this situation? Any ideas on how to move new features to a separate source file without messing up the SCC workflow?

(Note on the tools: We use C++ with Visual Studio; We use AccuRev as SCC but I think the type of SCC doesn't really matter here; We use Araxis Merge to do actual comparison and merging of files)

  1. Find some code in the file which is relatively stable (not changing fast, and doesn't vary much between branches) and could stand as an independent unit. Move this into its own file, and for that matter into its own class, in all branches. Because it's stable, this won't cause (many) "awkward" merges that have to be applied to a different file from the one they were originally made on, when you merge the change from one branch to another. Repeat.

  2. Find some code in the file which basically only applies to a small number of branches, and could stand alone. Doesn't matter whether it's changing fast or not, because of the small number of branches. Move this into its own classes and files. Repeat.

So, we've got rid of the code that's the same everywhere, and the code that's specific to certain branches.

This leaves you with a nucleus of badly-managed code - it's needed everywhere, but it's different in every branch (and/or it changes constantly so that some branches are running behind others), and yet it's in a single file that you're unsuccessfully trying to merge between branches. Stop doing that. Branch the file permanently, perhaps by renaming it in each branch. It's not "main" any more, it's "main for configuration X". OK, so you lose the ability to apply the same change to multiple branches by merging, but this is in any case the core of code where merging doesn't work very well. If you're having to manually manage the merges anyway to deal with conflicts, then it's no loss to manually apply them independently on each branch.

I think you're wrong to say that the kind of SCC doesn't matter, because for example git's merging abilities are probably better than the merge tool you're using. So the core problem, "merging is difficult" occurs at different times for different SCCs. However, you're unlikely to be able to change SCCs, so the issue is probably irrelevant.

93 votes

This is not only a question, this is also a call for help.

Since I started my career as a programmer, I always tried to learn from my mistakes. I worked hard to learn best-practices and while I don't consider myself a C++ expert, I still believe I'm not a beginner either.

I was recently hired into a company for C++ development. There I was told that my way to work was "against the rules" and that I would have to change my mind.

Here are the topics on which I disagree with my hierarchy (their words):

  • "You should not use separate header files for your different classes. One big header file is both easier to read and faster to compile."
  • "Trying to use different headers is counter-productive : use the same super-set of headers everywhere, and enforce the use #pragma hdrstop to hasten compilation"
  • "You may not use Boost or any other library that uses nested directories to organize its files. Our build-machine doesn't work with nested directories. Moreover, you don't need Boost to create great software."

One might think I'm somehow exaggerating things, but the sad truth is that I'm not. Those are their actual words.

I believe that having separate files enhances maintainability and code correctness, and can speed up compilation through the use of the proper includes.

Have you been in a similar situation? What should I do? I feel like it's actually impossible for me to work that way and day after day, my frustration grows.

You have a problem with authority, Mr. Anderson. You believe you are special, that somehow the rules do not apply to you. Obviously, you are mistaken.

Lol mate, you're not an idealist. Your example shows that you just cannot bend and go along with the company policy, regardless of who is right or wrong. Just change jobs if you cannot bear it.

You seem to know what the Matrix is. So it's time to take the red pill now and leave it. Change jobs, dude.

Cheers.

Best methods to parse HTML with PHP

77 votes

I'm working on a system that requires the parsing of HTML documents under PHP.

My question is simply this:

What's the best method of parsing content for relevant information?

When I parse a site I don't want random content I want to find relevant content such as blocks of text, images, links etc. But obviously I don't want header links or footer links.

So is there anything you can advise me to look at? Tips / tricks are also welcome :)

I prefer using one of the native XML extensions, like

If you prefer a 3rd party lib, I'd suggest not to use SimpleHtmlDom, but a lib that actually uses DOM/libxml underneath instead of String Parsing:

You can use the above for parsing HTML5, but there can be quirks due to the markup HTML5 allows. So for HTML5 you want to consider using a dedicated parser, like

Or use a WebService like

If you want to spend some money, have a look at

Last and least recommended, you can extract data from HTML with Regular Expressions. In general using Regular Expressions on HTML is discouraged. The snippets you will usually find on the web to match markup are brittle. In most cases they are only working for a very particular piece of HTML. Once the markup changes, the Regex fails.

You can write more reliable parsers, but writing a complete and reliable custom parser with Regular Expressions is a waste of time when the aforementioned libraries already exist and do a much better and likely faster job on this.

See

49 votes

The goal

Today's Code Golf challenge is to create a regex parser in as few characters as possible.

The syntax

No, I'm not asking you to match Perl-style regular expressions. There's already a very reliable interpreter for those, after all! :-)

Here's all you need to know about regex syntax for this challenge:

  • A term is defined as a single literal character, or a regular expression within grouping parentheses ().
  • The * (asterisk) character represents a Kleene star operation on the previous TERM. This means zero or more of the previous term, concatenated together.
  • The + (plus) character represents a convenient shortcut: a+ is equivalent to aa*, meaning one or more of the previous term.
  • The ? (question mark) character represents zero or one of the previous term.
  • The | (pipe) character represents an alternation, meaning that the REGULAR EXPRESSIONS on either side can be used in the match.
  • All other characters are assumed to be literal. You may assume that all other characters are within [0-9A-Za-z] (i.e., all English alphanumerics).

Or, put another way: */+/? have highest precedence, then concatenation, then alternation. Since alternation has lower precedence than concatenation, its use within a regex without parentheses causes it to be bound to the full regex on each side. * and + and ?, on the other hand, would just apply to the immediately preceding term.

The challenge

Your challenge is to write a program that will compile or interpret a regular expression (as defined above) and then test a number of strings against it.

I'm leaving input up to you. My recommendation would be that the regex should probably come first, and then any number of strings to be tested against it; but if you want to make it last, that's fine. If you want to put everything in command-line arguments or into stdin, or the regex in command-line and the strings in stdin, or whatever, that's fine. Just show a usage example or two.

Output should be true or false, one per line, to reflect whether or not the regex matches.

Notes:

  • I shouldn't need to say this... but don't use any regex libraries in your language! You need to compile or interpret the pattern yourself. (Edit: You may use regex if it's required for splitting or joining strings. You just can't use it to directly solve the problem, e.g., converting the input regex into a language regex and using that.)
  • The regular expression must COMPLETELY match the input string for this challenge. (Equivalently, if you're familiar with Perl-like regex, assume that start- and end-of-string anchoring is in place for all matches)
  • For this challenge, all of the special characters ()*+?| are not expected to occur literally. If one comes up in the input, it is safe to assume that no pattern can match the string in question.
  • Input strings to test should be evaluated in a case-sensitive manner.
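For readers who want an ungolfed reference point, the grammar and rules above can be sketched as a recursive-descent parser plus a backtracking matcher. This is only one possible approach under stated assumptions: the names Node, Parser and matches are made up for this sketch, the pattern is assumed to be well-formed, and no attempt is made to keep it short or fast.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node;
using NodePtr = std::shared_ptr<Node>;

// kind: 'c' literal, '.' concatenation, '|' alternation, or '*', '+', '?'.
struct Node {
    char kind;
    char ch;                    // only meaningful when kind == 'c'
    std::vector<NodePtr> kids;  // operands
};

static NodePtr make(char kind, char ch, std::vector<NodePtr> kids) {
    return std::make_shared<Node>(Node{kind, ch, std::move(kids)});
}

// Recursive-descent parser: one method per precedence level.
struct Parser {
    const std::string& src;
    std::size_t pos;
    explicit Parser(const std::string& s) : src(s), pos(0) {}

    NodePtr alternation() {  // concat ('|' concat)*
        NodePtr left = concat();
        while (pos < src.size() && src[pos] == '|') {
            ++pos;
            NodePtr right = concat();
            left = make('|', 0, {left, right});
        }
        return left;
    }
    NodePtr concat() {  // repeat*, stopping at '|', ')' or end of pattern
        NodePtr seq = make('.', 0, {});
        while (pos < src.size() && src[pos] != '|' && src[pos] != ')')
            seq->kids.push_back(repeat());
        return seq;
    }
    NodePtr repeat() {  // term followed by any run of '*', '+', '?'
        NodePtr t = term();
        while (pos < src.size() &&
               (src[pos] == '*' || src[pos] == '+' || src[pos] == '?')) {
            char op = src[pos++];
            t = make(op, 0, {t});
        }
        return t;
    }
    NodePtr term() {  // literal | '(' alternation ')'
        if (src[pos] != '(') return make('c', src[pos++], {});
        ++pos;
        NodePtr inner = alternation();
        ++pos;  // skip ')'; the pattern is assumed to be well-formed
        return inner;
    }
};

// A continuation: "can the rest of the pattern match from index i?"
using Cont = std::function<bool(std::size_t)>;

static bool run(const NodePtr& n, const std::string& s, std::size_t i, const Cont& k);

// Zero or more repetitions of body. Each extra round must consume input,
// which keeps patterns such as (a*)* from recursing forever.
static bool star(const NodePtr& body, const std::string& s, std::size_t i, const Cont& k) {
    return k(i) || run(body, s, i, [&](std::size_t j) {
        return j > i && star(body, s, j, k);
    });
}

// Match the children of a concatenation node one after another.
static bool seq(const Node& n, std::size_t idx, const std::string& s,
                std::size_t i, const Cont& k) {
    if (idx == n.kids.size()) return k(i);
    return run(n.kids[idx], s, i,
               [&](std::size_t j) { return seq(n, idx + 1, s, j, k); });
}

static bool run(const NodePtr& n, const std::string& s, std::size_t i, const Cont& k) {
    switch (n->kind) {
    case 'c': return i < s.size() && s[i] == n->ch && k(i + 1);
    case '|': return run(n->kids[0], s, i, k) || run(n->kids[1], s, i, k);
    case '?': return run(n->kids[0], s, i, k) || k(i);
    case '*': return star(n->kids[0], s, i, k);
    case '+':  // a+ is equivalent to aa*
        return run(n->kids[0], s, i,
                   [&](std::size_t j) { return star(n->kids[0], s, j, k); });
    default:  return seq(*n, 0, s, i, k);
    }
}

// True when `pattern` matches the whole of `text` (implicit anchoring).
bool matches(const std::string& pattern, const std::string& text) {
    Parser p(pattern);
    NodePtr ast = p.alternation();
    if (p.pos != pattern.size()) return false;  // e.g. an unbalanced ')'
    return run(ast, text, 0,
               [&](std::size_t end) { return end == text.size(); });
}
```

Calling matches("ab*a", "abba") corresponds to one line of the expected output: the first argument plays the role of the regex and the second is one test string. Backtracking can take exponential time on adversarial patterns, which is one reason to build an automaton instead.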

The examples

For the examples, I'm assuming everything is done in command-line arguments, regex first. (As I said above, input is up to you.) myregex here represents your invocation of the program.

> myregex easy easy Easy hard
true
false
false

> myregex ab*a aa abba abab b
true
true
false
false

> myregex 0*1|10 1 10 0110 00001
true
true
false
true

> myregex 0*(1|1+0) 1 10 0110 00001
true
true
true
true

> myregex a?b+|(a+b|b+a?)+ abb babab aaa aabba a b
true
true
false
true
false
true

NOTE: Sorry, forgot to make community wiki! :-(

GolfScript - 254 chars

n%([]:B:$:_"()"@*{:I"()*+|?"[{}/]?[{[[0B$,+:B))\;)]_]+}{B)):ß;:B;qß(:ß;}{8q}{[[0ß0$,)]]+}:8{[[0B-1=:ß)]]+:$q}{ß>$ß<\([0+$,+]\++}:q{[[I$,:ß)]]+}]=~:$}/;{n+[0]:3\{:c;;3:1_:3;{,}{)[$=]_*2/{~\.{c={3|:3}*;}{;.1|1,\:1,<{+0}*;}if}/}/;}/;1$,?)"true""false"if n}%

Somewhat straightforwardly, the first loop converts the regex into an NFA, which the second loop runs.

Sun Aug 22 00:58:24 EST 2010 (271→266) changed variable names to remove spaces
Sun Aug 22 01:07:11 EST 2010 (266→265) made [] a variable
Sun Aug 22 07:05:50 EST 2010 (265→259) made null state transitions inline
Sun Aug 22 07:19:21 EST 2010 (259→256) final state made implicit
Mon Feb 7 19:24:19 EST 2011 (256→254) using "()""str"*

$ echo "ab*a aa abba abab b"|tr " " "\n"|golfscript regex.gs
true
true
false
false

$ echo "0*1|10 1 10 0110 00001"|tr " " "\n"|golfscript regex.gs
true
true
false
true

$ echo "0*(1|1+0) 1 10 0110 00001"|tr " " "\n"|golfscript regex.gs
true
true
true
true

$ echo "a?b+|(a+b|b+a?)+ abb babab aaa aabba a b"|tr " " "\n"|golfscript regex.gs
true
true
false
true
false
true

$ echo "((A|B|C)+(a|(bbbbb)|bb|c)+)+ ABCABCaccabbbbbaACBbbb ABCABCaccabbbbbaACBbbbb"|tr " " "\n"|golfscript regex.gs
false
true

Am I undermining the efficiency of StringBuilder?

45 votes

I've started using StringBuilder in preference to straight concatenation, but it seems like it's missing a crucial method. So, I implemented it myself, as an extension:

public static void Append(this StringBuilder stringBuilder, params string[] args)
{
    foreach (string arg in args)
        stringBuilder.Append(arg);
}

This turns the following mess:

StringBuilder sb = new StringBuilder();
...
sb.Append(SettingNode);
sb.Append(KeyAttribute);
sb.Append(setting.Name);

Into this:

sb.Append(SettingNode, KeyAttribute, setting.Name);

I could use sb.AppendFormat("{0}{1}{2}",..., but this seems even less preferred, and still harder to read. Is my extension a good method, or does it somehow undermine the benefits of StringBuilder? I'm not trying to prematurely optimize anything, as my method is more about readability than speed, but I'd also like to know I'm not shooting myself in the foot.

I see no problem with your extension. If it works for you it's all good.

I myself prefer:

sb.Append(SettingNode)
  .Append(KeyAttribute)
  .Append(setting.Name);

How to translate between programming languages

41 votes

I am setting out to do a side project that has the goal of translating code from one programming language to another. The languages I am starting with are PHP and Python (Python to PHP should be easier to start with), but ideally I would be able to add other languages with (relative) ease. The plan is:

  • This is geared towards web development. The original and target code will be sitting on top of frameworks (which I will also have to write). These frameworks will embrace an MVC design pattern and follow strict coding conventions. This should make translation somewhat easier.

  • I am also looking at IOC and dependency injection, as they might make the translation process easier and less error prone.

  • I'll make use of Python's parser module, which lets me fiddle with the Abstract Syntax Tree. Apparently the closest I can get with PHP is token_get_all(), which is a start.

  • From then on I can build the AST, symbol tables and control flow.

Then I believe I can start outputting code. I don't need a perfect translation. I'll still have to review the generated code and fix problems. Ideally the translator should flag problematic translations.

Before you ask "What the hell is the point of this?" The answer is... It'll be an interesting learning experience. If you have any insights on how to make this less daunting, please let me know.


EDIT:

I am more interested in knowing what kinds of patterns I could enforce on the code to make it easier to translate (ie: IoC, SOA ?) the code than how to do the translation.

I've been building tools (DMS Software Reengineering Toolkit) to do this kind of thing since 1995, supported by a strong team of computer scientists. It provides generic parsing, AST building, symbol tables, control and data flow analysis, application of translation rules, regeneration of source text with comments, etc., all parameterized by explicit definitions of computer languages.

The amount of machinery you need to do this well is vast, and then you need reliable parsers for languages with unreliable definitions (PHP is a perfect example of this).

There's nothing wrong with you thinking about it or attempting it, but I think you'll find this a much bigger task than you expect. We have some 100 man-years invested in just DMS, and another 6-12 months in each "reliable" language definition (including the one we painfully built for PHP), much more for nasty languages such as C++. It will be a "hell of a learning experience"; it has been for us. (You might find the technical Papers section at the above website interesting to jump start that learning).

People often attempt to build some kind of generalized machinery by starting with some piece of technology with which they are familiar, that does a part of the job. (Python ASTs are a great example.) The good news is that part of the job is done. The bad news is that the machinery has a zillion assumptions built into it, most of which you won't discover until you try to wrestle it into doing something else. At that point you find out the machinery is wired to do what it originally does, and will really, really resist your attempt to make it do something else. (I suspect trying to get the Python AST to model PHP is going to be a lot of fun.)

The reason I started to build DMS originally was to build foundations that had very few such assumptions built in. It has some that give us headaches. So far, no black holes. (The hardest part of my job over the last 15 years is to try to prevent such assumptions from creeping in).

Lots of folks also make the mistake of assuming that if they can parse (and perhaps get an AST), they are well on the way to doing something complicated. One of the hard lessons is that you need symbol tables and flow analysis to do good program analysis or transformation. ASTs are necessary but not sufficient. This is the reason that Aho&Ullman's compiler book doesn't stop at chapter 2. (The OP has this right in that he is planning to build additional machinery beyond the AST).

The remark about "I don't need a perfect translation" is troublesome. What weak translators do is convert the "easy" 80% of the code, leaving the hard 20% to do by hand. If the applications you intend to convert are pretty small, well, then that 20% is OK. If you attempt to convert 100K SLOC, then 20% is 20,000 original lines of code that are hard to translate, understand, and modify in the context of another 80,000 lines of program you don't understand. That takes a huge amount of effort. At the million-line level, this is simply impossible in practice.

What you have to shoot for to translate large-scale systems is high nineties percentage conversion rates, or it is likely that you can't complete the manual part of the translation activity.

I consider our tools to be extremely good (but then, I'm pretty biased). And it is still very hard to build a good translator. The difference is that with this much machinery, we succeed considerably more often than we fail.

bool operator ++ and --

40 votes

Today while writing some Visual C++ code I came across something which surprised me. It seems C++ supports ++ (increment) for bool, but not -- (decrement). Is this just a random decision, or is there some reason behind it?

This compiles:

  static HMODULE hMod = NULL;
  static bool once = false;
  if (!once++)
    hMod = LoadLibrary("xxx");

This does not:

  static HMODULE hMod = NULL;
  static bool once = true;
  if (once--)
    hMod = LoadLibrary("xxx");

It comes from the history of using integer values as booleans.

If x is an int, but I am using it as a boolean as per if(x)... then incrementing will mean that whatever its truth value before the operation, it will have a truth-value of true after it (barring overflow).

However, it's not possible to predict the result of -- given knowledge only of the truth value of x, as it could result in false (if the integral value is 1) or true (if the integral value is anything else - notably this includes 0 [false] and 2 or more [true]).

So as a short-hand ++ worked, and -- didn't.

++ is allowed on bools for compatibility with this, but its use is deprecated in the standard.
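The asymmetry can be checked with plain ints standing in for the old-style flags. This is a small sketch with made-up helper names, and it deliberately avoids ++ on an actual bool, since that deprecated form may not compile with a newer compiler.

```cpp
// An int used as a boolean flag: zero is false, anything else is true.
// Assumes the flag starts non-negative and nowhere near overflow.

// Truth value of the flag after one increment.
bool truthAfterIncrement(int flag) {
    ++flag;
    return flag != 0;
}

// Truth value of the flag after one decrement.
bool truthAfterDecrement(int flag) {
    --flag;
    return flag != 0;
}
```

For every non-negative starting value the first helper yields true, so ++ works as "set to true". The second yields false only when the flag was exactly 1, and yields true when it was 0, so -- cannot be read as "set to false".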


Edit: This assumes that I only use x as a boolean, meaning that overflow can't happen until I've done ++ often enough to cause an overflow on its own. Even with char as the type used and CHAR_BIT something low like 5, that's 32 times before this stops working (that's still argument enough for it being a bad practice; I'm not defending the practice, just explaining why it works). For a 32-bit int we would of course have to use ++ 2^32 times before this is an issue. With -- though, it will only result in false if I started with a value of 1 for true, or started with 0 and used ++ precisely once before.

This is different if we start with a value that is just a few below 0. Indeed, in such a case we might want ++ to result in the false value eventually such as in:

int x = -5;
while(++x)
  doSomething(x);

However, this example is treating x as an int everywhere except the conditional, so it's equivalent to:

int x = -5;
while(++x != 0)
  doSomething(x);

Which is different to only using x as a boolean.

Is there "magic" in the STL?

40 votes

Let me start with explaining what I mean with "magic". I will use two examples from Java:

  1. Every class inherits (directly or indirectly) the Object class.
  2. Operator overloading is not supported by Java but the + operator is defined for String objects.

This means that it is impossible to make an implementation of the Object and String classes in pure(*) Java. Now this is what I mean with "magic": to make an implementation of these classes, you will need some special support from the compiler.

What I always liked about C++ is that, as far as I know, there is no such "magic" going on in the STL, i.e. it is possible to implement the STL in pure C++.

Now my question is: is this true? Or are there parts of the STL that cannot be implemented in pure C++ and need some "magic"/special compiler support?


(*) With "pure" I mean without using any class libraries.

In other words, has anything been done to the compiler to allow for a 'special case' the STL needed to work?

No.

It was all implemented as 'pure' C++ code, using the magic of templates.

There has been some work done to compilers to improve the STL (I'm thinking about various optimisations) but otherwise, no, you could write the entire STL if you really wanted. Some people did - STLPort is an implementation that didn't have the backing of any compiler manufacturer.
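To give a flavour of what "pure C++ with templates" means here, this is a sketch of a linear search in the style of the STL's find algorithm. The name my_find is made up, and this is not how any particular library implementation writes it.

```cpp
// Works with any iterator type that supports !=, ++ and *, and any value
// type comparable with ==. Nothing here needs special compiler support.
template <typename InputIt, typename T>
InputIt my_find(InputIt first, InputIt last, const T& value) {
    for (; first != last; ++first) {
        if (*first == value) {
            return first;  // iterator to the first match
        }
    }
    return last;  // conventional "not found" result
}
```

Because raw pointers satisfy the same requirements as iterators, the same template searches a plain array: with int a[] = {3, 1, 4}, the call my_find(a, a + 3, 4) returns a + 2.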

Why is <br> a tag rather than an HTML entity?

39 votes

Why indeed? Wouldn't something like &br; be more appropriate?

An HTML entity reference is, depending on the HTML version, either an SGML entity or an XML entity (HTML inherits entities from the underlying technology). Entities are a way of inserting chunks of content defined elsewhere into the document.

All HTML entities are single-character entities, and are hence basically the same as character references (technically they are different to character references, but as there are no multi-character entities defined, the distinction has no impact on HTML).

When an HTML processor sees, for example, &mdash; it replaces that entity reference with the appropriate entity, based on the section in the DTD that says:

<!ENTITY mdash   CDATA "&#8212;" -- em dash, U+2014 ISOpub -->

So it replaces the entity reference with the entity &#8212; which is in turn a character reference that gets replaced by the character (U+2014). In reality unless you are doing this with a general-purpose XML or SGML processor that doesn't understand HTML directly, this will really be done in one step.

Now, what would we replace your hypothetical &br; with to cause a line-break to happen? We can't do so with a newline character, or even the lesser known U+2028 LINE SEPARATOR (which semantically in plain text has the same meaning as <br/> in HTML), because they are whitespace characters which are not significant in most HTML code, which is something that you should be grateful for as writing HTML would be much harder if we couldn't format for readability within the source code.

What we need is not an entity, but a way to indicate semantically that the rendered content contains a line-break at this point. We also need to not indicate anything else (we can already indicate a line-break by beginning or ending a block element, but that's not what we want). The only reasonable way to do so is to have an element that means exactly that, and so we have the <br/> element, with its related tag being put into the source code.

Why does C++ support memberwise assignment of arrays within structs, but not generally?

36 votes

I understand that memberwise assignment of arrays is not supported, such that the following will not work:

int num1[3] = {1,2,3};
int num2[3];
num2 = num1; // "error: invalid array assignment"

I just accepted this as fact, figuring that the aim of the language is to provide an open-ended framework, and let the user decide how to implement something such as the copying of an array.

However, the following does work:

struct myStruct {int num[3];};
myStruct struct1={{1,2,3}};
myStruct struct2;
struct2 = struct1;

The array num[3] is member-wise assigned from its instance in struct1, into its instance in struct2.

Why is member-wise assignment of arrays supported for structs, but not in general?

edit: Roger Pate's comment in the thread std::string in struct - Copy/assignment issues? seems to point in the general direction of the answer, but I don't know enough to confirm it myself.

edit 2: Many excellent responses. I choose Luther Blissett's because I was mostly wondering about the philosophical or historical rationale behind the behavior, but James McNellis's reference to the related spec documentation was useful as well.

Here's my take on it:

The Development of the C Language offers some insight in the evolution of the array type in C:

I'll try to outline the array thing:

C's forerunners B and BCPL had no distinct array type, a declaration like:

auto V[10] (B)
or 
let V = vec 10 (BCPL)

would declare V to be a (untyped) pointer which is initialized to point to an unused region of 10 "words" of memory. B already used * for pointer dereferencing and had the [] short hand notation, *(V+i) meant V[i], just as in C/C++ today. However, V is not an array, it is still a pointer which has to point to some memory. This caused trouble when Dennis Ritchie tried to extend B with struct types. He wanted arrays to be part of the structs, like in C today:

struct {
    int inumber;
    char name[14];
};

But with the B/BCPL concept of arrays as pointers, this would have required the name field to contain a pointer which had to be initialized at runtime to a memory region of 14 bytes within the struct. The initialization/layout problem was eventually solved by giving arrays a special treatment: The compiler would track the location of arrays in structures, on the stack etc. without actually requiring the pointer to the data to materialize, except in expressions which involve the arrays. This treatment allowed almost all B code to still run and is the source of the "arrays convert to pointer if you look at them" rule. It is a compatibility hack, which turned out to be very handy, because it allowed arrays of open size etc.

And here's my guess why arrays can't be assigned: Since arrays were pointers in B, you could simply write:

auto V[10];
V=V+5;

to rebase an "array". This was now meaningless, because the base of an array variable was not an lvalue anymore. So this assignment was disallowed, which helped to catch the few programs that did this rebasing on declared arrays. And then this notion stuck: As arrays were never designed to be first-class citizens of the C type system, they were mostly treated as special beasts which become pointers if you use them. And from a certain point of view (which ignores that C arrays are a botched hack), disallowing array assignment still makes some sense: An open array or an array function parameter is treated as a pointer without size information. The compiler doesn't have the information to generate an array assignment for them, and the pointer assignment was required for compatibility reasons. Introducing array assignment for the declared arrays would have introduced bugs through spurious assignments (is a=b a pointer assignment or an elementwise copy?) and other trouble (how do you pass an array by value?) without actually solving a problem - just make everything explicit with memcpy!

/* Example of how array assignment would make things even weirder in C/C++,
   if we don't want to break existing code.
   It's actually better to leave things as they are...
*/
typedef int vec[3];

void f(vec a, vec b) 
{
    vec x,y; 
    a=b; // pointer assignment
    x=y; // NEW! element-wise assignment
    a=x; // pointer assignment
    x=a; // NEW! element-wise assignment
}

This didn't change when a revision of C in 1978 added struct assignment ( http://cm.bell-labs.com/cm/cs/who/dmr/cchanges.pdf ). Even though records were distinct types in C, it was not possible to assign them in early K&R C. You had to copy them member-wise with memcpy and you could pass only pointers to them as function parameters. Assignment (and parameter passing) was now simply defined as the memcpy of the struct's raw memory, and since this couldn't break existing code it was readily adopted. As an unintended side effect, this implicitly introduced some kind of array assignment, but this happened somewhere inside a structure, so it couldn't really introduce problems with the way arrays were used.
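The two behaviours can be put side by side in a short sketch. The names Wrapped, copyRaw and sumAfterStructCopy are made up for the illustration: struct assignment carries the embedded array along, while a bare array has to be copied explicitly.

```cpp
#include <cstring>

struct Wrapped {
    int num[3];
};

// Bare arrays cannot be assigned (`dst = src` is ill-formed), so the copy
// is spelled out with memcpy. Taking references keeps the size in the type.
void copyRaw(int (&dst)[3], const int (&src)[3]) {
    std::memcpy(dst, src, sizeof dst);
}

// Struct assignment copies the whole object, including the array member.
int sumAfterStructCopy() {
    Wrapped a = {{1, 2, 3}};
    Wrapped b;
    b = a;
    return b.num[0] + b.num[1] + b.num[2];
}
```

The reference parameters in copyRaw are a C++ convenience. With the plain `int dst[3]` parameter form the array would decay to a pointer and `sizeof dst` would give the pointer's size instead of the array's.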

Is it possible for a thread to Deadlock itself?

35 votes

Is it technically possible for a thread in Java to deadlock itself?

I was asked this at an interview a while back and responded that it wasn't possible, but the interviewer told me that it is. Unfortunately I wasn't able to get his method for achieving this deadlock.

This got me thinking, and the only situation I can think of where this could happen is an RMI server process containing a method that calls itself. The line of code that calls the method is placed in a synchronized block.

Is that even possible or was the interviewer incorrect?

The source code I was thinking about was along these lines (where testDeadlock is running in an RMI server process)

public boolean testDeadlock () throws RemoteException {
    synchronized (this) {
        //Call testDeadlock via RMI loopback            
    }
}

The JVM only keeps track of the local thread that has the monitor. If the calling class makes an external call back in on itself, the incoming call causes the original thread to deadlock itself.

You should be able to run this code to illustrate the idea

import java.rmi.*;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.*;

public class DeadlockThreadExample {

    public static interface DeadlockClass extends Remote {
        public void execute() throws RemoteException;
    }

    public static class DeadlockClassImpl extends UnicastRemoteObject implements DeadlockClass {
        private Object lock = new Object();

        public DeadlockClassImpl() throws RemoteException {
            super();
        }

        public void execute() throws RemoteException {
            try {
                System.out.println("execute()::start");

                synchronized (lock) {
                    System.out.println("execute()::Entered Lock");
                    DeadlockClass deadlockClass = (DeadlockClass) Naming.lookup("rmi://localhost/DeadlockClass");
                    deadlockClass.execute();
                }
                System.out.println("execute()::Exited Lock");
            } catch (NotBoundException e) {
                System.out.println(e.getMessage());
            } catch (java.net.MalformedURLException e) {
                System.out.println(e.getMessage());
            }
            System.out.println("execute()::end");
        }
    }

    public static void main(String[] args) throws Exception {
        LocateRegistry.createRegistry(Registry.REGISTRY_PORT);
        DeadlockClassImpl deadlockClassImpl = new DeadlockClassImpl();
        Naming.rebind("DeadlockClass", deadlockClassImpl);
        DeadlockClass deadlockClass = (DeadlockClass) Naming.lookup("rmi://localhost/DeadlockClass");
        deadlockClass.execute();
        System.exit(0);
    }
}

The output from the program looks like

execute()::start
execute()::Entered Lock
execute()::start

Additionally, the thread dump also shows the following

"main" prio=6 tid=0x00037fb8 nid=0xb80 runnable [0x0007f000..0x0007fc3c]
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
    - locked <0x02fdc568> (a java.io.BufferedInputStream)
    at java.io.DataInputStream.readByte(DataInputStream.java:241)


"RMI TCP Connection(4)-172.17.23.165" daemon prio=6 tid=0x0ad83d30 nid=0x1590 waiting for monitor entry [0x0b3cf000..0x0b3cfce8]
    at DeadlockThreadExample$DeadlockClassImpl.execute(DeadlockThreadExample.java:24)
    - waiting to lock <0x0300a848> (a java.lang.Object)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)


"RMI TCP Connection(2)-172.17.23.165" daemon prio=6 tid=0x0ad74008 nid=0x15f0 runnable [0x0b24f000..0x0b24fbe8] 
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
    - locked <0x02ffb6d8> (a java.io.BufferedInputStream)
    at java.io.DataInputStream.readByte(DataInputStream.java:241)

which indicates that the thread has indeed managed to lock itself
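
The same effect can be reproduced without RMI. The following is only a sketch, not the code from the post: LoopbackDeadlockSketch, reenter and callBackViaOtherThread are made-up names, and a single-thread executor stands in for the RMI connection thread. It illustrates that Java monitors are reentrant (a plain nested call on one thread does not block) and that the blocking only appears once the nested call is served by a different thread. A timeout is used so the demo terminates; with a plain get() it would hang just like the RMI example.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class LoopbackDeadlockSketch {
    private final Object lock = new Object();
    // Hypothetical stand-in for the RMI connection thread serving the loopback call.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Monitors are reentrant: the owning thread may enter the same lock again.
    boolean reenter() {
        synchronized (lock) {
            synchronized (lock) {
                return true;
            }
        }
    }

    // Holds the lock while waiting for another thread that needs the same lock.
    // Returns false if the nested call did not finish within the timeout.
    boolean callBackViaOtherThread(long timeoutMillis) {
        synchronized (lock) {
            Future<Boolean> nested = worker.submit(() -> {
                synchronized (lock) {
                    return true;
                }
            });
            try {
                return nested.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                return false; // the worker is still waiting for our monitor
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Lets the worker finish (it gets the lock once the caller has left it).
    void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        LoopbackDeadlockSketch sketch = new LoopbackDeadlockSketch();
        System.out.println("same thread re-enters: " + sketch.reenter());
        System.out.println("nested call finished: " + sketch.callBackViaOtherThread(200));
        sketch.shutdown();
    }
}
```

Whether you call this "a thread deadlocking itself" is a matter of wording: strictly speaking two threads are involved, one holding the monitor and one waiting for it.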

Looking for a question that combines the understanding of few web technologies

33 votes

I am teaching a web development course at a CS department. I have written most of the final test by now; each question focuses on a specific feature or a specific technology.

I wonder if you can think of or recommend a question that combines knowledge of a few technologies.

The course mostly covers: HTML, CSS, JS, HTTP, Servlets, JSP and JDBC. (as well as AJAX, ORM, basic security issues like SQL-Injection and XSS, HTML5, REST APIs)

EDIT: I will super appreciate questions with answers :-) thanks!

I'll give the bounty to the question with the highest rank, so please vote! I honestly like most of the questions here, thank you all :-)

Explain the relationship of the DOM to each of the following technologies: HTML, CSS, JavaScript.

The goal here is for the answer to make clear the student understands that HTML generates a DOM structure, CSS affects how that structure is rendered, and JavaScript affects how that structure is modified. If you understand how it all ties back into the DOM, all client-side coding becomes straightforward.

What's the graceful way of handling out of memory situations in C/C++?

33 votes

I'm writing a caching app that consumes large amounts of memory.

Hopefully, I'll manage my memory well enough, but I'm just thinking about what to do if I do run out of memory.

If a call to allocate even a simple object fails, is it likely that even a syslog call will also fail?

EDIT: Ok perhaps I should clarify the question. If malloc or new returns a NULL or 0L value then it essentially means the call failed and it can't give you the memory for some reason. So, what would be the sensible thing to do in that case?

EDIT2: I've just realised that new will throw an exception rather than return NULL. This could be caught at a higher level so I can perhaps gracefully exit further up. At that point, it may even be possible to recover depending on how much memory is freed. At the least, I should by that point hopefully be able to log something. So while I have seen code that checks the value of a pointer after new, it is unnecessary. In C, however, you should check the return value of malloc.

Well, if you are in a case where there is a failure to allocate memory, you're going to get a std::bad_alloc exception. The exception causes the stack of your program to be unwound. In all likelihood, the inner loops of your application logic are not going to be handling out of memory conditions, only higher levels of your application should be doing that. Because the stack is getting unwound, a significant chunk of memory is going to be free'd -- which in fact should be almost all the memory used by your program.

The one exception to this is when you ask for a very large (several hundred MB, for example) chunk of memory which cannot be satisfied. When this happens though, there's usually enough smaller chunks of memory remaining which will allow you to gracefully handle the failure.

Stack unwinding is your friend ;)

EDIT: Just realized that the question was also tagged with C -- if that is the case, then you should be having your functions free their internal structures manually when out of memory conditions are found; not to do so is a memory leak.

EDIT2: Example:

#include <iostream>
#include <vector>

void DoStuff()
{
    std::vector<int> data;
    //insert a whole crapload of stuff into data here.
    //Assume std::vector::push_back does the actual throwing
    //i.e. data.resize(SOME_LARGE_VALUE_HERE);
}

int main()
{
    try
    {
        DoStuff();
        return 0;
    }
    catch (const std::bad_alloc& ex)
    {   //Observe that the local variable `data` no longer exists here.
        std::cerr << "Oops. Looks like you need to use a 64 bit system (or "
                     "get a bigger hard disk) for that calculation!";
        return -1;
    }
}

EDIT3: Okay, according to commenters there are systems out there which do not follow the standard in this regard. On the other hand, on such systems, you're going to be SOL in any case, so I don't see why they merit discussion. But if you are on such a platform, it is something to keep in mind.

Mime type for WOFF fonts?

31 votes

What mime type should WOFF fonts be served as?

I am serving truetype (ttf) fonts as font/truetype and opentype (otf) as font/opentype, but I cannot find the correct format for WOFF fonts.

I have tried font/woff, font/webopen, and font/webopentype, but Chrome still complains:

"Resource interpreted as font but transferred with MIME type application/octet-stream."

Anybody know?

In January it was announced that in the meantime, Chromium will recognize "application/x-font-woff" as the mime-type for WOFF. I know this change is now in Chrome beta and if not in stable yet, it shouldn't be too far away.

Long-term learning for a "self-made" PHP developer

30 votes



As some of us have created their public site without an academic background, I ask this question mostly to professionals:

  • Enhance oneself: Non-professionals may encounter a barrier: it's easy to learn the basics (vars, loops, basic SQL manipulation), but there are no "long term" tutorials which really give deep knowledge. I know there is no best path, but what are the steps to become able to develop big(ger) projects alone? Something like: from spaghetti to MVC and OOP. Clicking links gives me knowledge, but not a good "framework".

  • Learn how to learn: Is there a way to learn how to use existing frameworks and pieces of code? Besides reading the help files, how do professionals proceed? I don't want to copy-paste code anymore.

Extra question:

  • Career: Nowadays (not back in the 90's or the early 2000's), can a non-professional consider a developer career ?

Thanks for your replies.

EDIT:

Two great books that really helped me:

Answering the questions in the order of appearance:

  • Learn some higher level design concepts to be able to see a big system clearly.
  • read some docs, use some code, dive deep.
  • Object oriented design concepts
  • Didn't find a perfect one yet :)

edit:

Enhance oneself: You don't need long-term tutorials. There is an abundance of professional material, and you can start with the books in the answers to the question here. I warmly recommend the first one (Code Complete). You can find good professional material on any other programming subject that interests you. The question is not what, but when you'll find time for it all.

Learning how to learn is different for each one. Find what you enjoy the most and get on with it. Read, code and do both as much as you can.

Career: I think a non-professional can sometimes be more professional than some of the professionals I've met. If you enjoy it and find the challenge interesting you can do great things: you can make machines work wonders, act gracefully, and become a professional by attitude and not by some paper on the wall.

Since you are interested in building a public web project, here you can find some excellent answers

How the StringBuilder class is implemented? Does it internally create new string objects each time we append?

30 votes

How the StringBuilder class is implemented? Does it internally create new string objects each time we append?

In .NET 2.0 it uses the String class internally. String is only immutable outside of the System namespace, so StringBuilder can do that.

In .NET 4.0 StringBuilder was changed to use char[].

In 2.0 StringBuilder looked like this

public sealed class StringBuilder : ISerializable
{
    // Fields
    private const string CapacityField = "Capacity";
    internal const int DefaultCapacity = 0x10;
    internal IntPtr m_currentThread;
    internal int m_MaxCapacity;
    internal volatile string m_StringValue; // HERE ----------------------
    private const string MaxCapacityField = "m_MaxCapacity";
    private const string StringValueField = "m_StringValue";
    private const string ThreadIDField = "m_currentThread";

But in 4.0 it looks like this:

public sealed class StringBuilder : ISerializable
{
    // Fields
    private const string CapacityField = "Capacity";
    internal const int DefaultCapacity = 0x10;
    internal char[] m_ChunkChars; // HERE --------------------------------
    internal int m_ChunkLength;
    internal int m_ChunkOffset;
    internal StringBuilder m_ChunkPrevious;
    internal int m_MaxCapacity;
    private const string MaxCapacityField = "m_MaxCapacity";
    internal const int MaxChunkSize = 0x1f40;
    private const string StringValueField = "m_StringValue";
    private const string ThreadIDField = "m_currentThread";

So evidently it was changed from using a string to using a char[].

EDIT: Updated answer to reflect changes in .NET 4 (that I only just discovered).

Programming for Android as a blind person.

29 votes

I have a friend who is quite a capable programmer, especially considering that he is blind. Now he would like to start developing for Android. But the problem I see him running into is that there appear to be no accessibility features for the Android emulator. Ideally he would be able to have his computer read the contents of the Android emulation screen to him. However, at least from what I've seen, the contents of the Android screen and the buttons that can be used to manipulate the emulated Android device are all invisible to a screen reader.

Does anyone know of a workaround for this?


UPDATE: I found what looks like a promising resource here. It's a Text-to-Speech library for Android developed by T. V. Raman of Google. I'm still looking for more information from the community though.

A long thread on this can be found at http://www.freelists.org/post/programmingblind/Is-Android-Programming-Accessible. What I've gathered from it is that accessibility can be enabled with little to no sighted help. When I tried enabling TalkBack it made the emulator unusably slow, although this was over a year ago, so maybe things have gotten better. I'm a blind programmer and know Eclipse is accessible with JAWS, so he should be able to program with either an IDE or the command line and a text editor. I haven't researched this, but if the emulator is slow, another option might be to run an x86 build of Android in VMware Player. A screen reader written by Google employees can be found at http://google-opensource.blogspot.com/2009/10/talkback-open-source-screenreader-for.html and one written by someone else can be found at http://spielproject.info/

Design Patterns web based applications

29 votes

I am designing a simple web based application. I am new to this web based domain. I need your advice regarding design patterns, such as how responsibility should be distributed among Servlets, the criteria for creating a new Servlet, etc.

I have a few entities on my home page, and corresponding to each one of them we have a few options like add, edit and delete. Earlier I was using one Servlet per option, such as Servlet1 for add entity1, Servlet2 for edit entity1 and so on, and in this way we ended up having a large number of servlets.

Now we are changing our design. My question is how exactly you choose the responsibility of a servlet. Should we have one Servlet per entity, which will process all its options and forward the request to the service layer? Or should we have one servlet for the whole page, which will process the whole page request and then forward it to the corresponding service layer? Also, should the request object be forwarded to the service layer or not?

Please guide us in choosing the best design. Any pointers to good design pattern material are also welcome.

A reasonably decent web application consists of a mix of design patterns. I'll mention only the most important ones.


Model View Controller pattern

The core (architectural) design pattern you'd like to use is the Model-View-Controller pattern. The Controller is to be represented by a Servlet which (in)directly creates/uses a specific Model and View based on the request. The Model is to be represented by Javabean classes. This is often further dividable in Business Model which contains the actions (behaviour) and Data Model which contains the data (information). The View is to be represented by JSP files which have direct access to the (Data) Model by EL (Expression Language).

Then there are variations based on how actions and events are handled. The popular ones are:

  • Request (action) based MVC: this is the simplest to implement. The (Business) Model works directly with HttpServletRequest and HttpServletResponse objects. You have to gather, convert and validate the request parameters (mostly) yourself. The View can be represented by plain vanilla HTML/CSS/JS and it does not maintain state across requests. This is how, among others, Spring MVC, Struts and Stripes work.

  • Component based MVC: this is harder to implement, but you end up with a simpler model and view wherein all the "raw" Servlet API is abstracted completely away. You shouldn't need to gather, convert and validate the request parameters yourself. The Controller does this task and sets the gathered, converted and validated request parameters in the Model. All you need to do is define action methods which work directly with the model properties. The View is represented by "components" in the form of JSP taglibs or XML elements which in turn generate HTML/CSS/JS. The state of the View for the subsequent requests is maintained in the session. This is particularly helpful for server-side conversion, validation and value change events. This is how, among others, JSF, Wicket and Play! work.

As a side note, I warmly recommend picking an existing framework rather than reinventing your own. Learning an existing and well-developed framework takes less time in the long term than developing and maintaining a robust framework yourself. From the mentioned ones I personally recommend JSF 2.0.

In the below detailed explanation I'll restrict myself to request based MVC since that's easier to implement.


Front Controller pattern (Mediator pattern)

First, the Controller part should implement the Front Controller pattern (which is a specialized kind of Mediator pattern). It should consist of only a single servlet which provides a centralized entry point for all requests. It should create the Model based on information available from the request, such as the pathinfo or servletpath, the method and/or specific parameters. The Business Model is called Action in the below HttpServlet example.

protected void service(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    try {
        Action action = ActionFactory.getAction(request);
        String view = action.execute(request, response);
        if (view.equals(request.getPathInfo().substring(1))) {
            request.getRequestDispatcher("/WEB-INF/" + view + ".jsp").forward(request, response);
        } else {
            response.sendRedirect(view); // We'd like to fire redirect in case of a view change as result of the action (PRG pattern).
        }
    } catch (Exception e) {
        throw new ServletException("Executing action failed.", e);
    }
}

Executing the action should return some identifier to locate the view. Simplest would be to use it as filename of the JSP. Map this servlet on a specific url-pattern in web.xml, e.g. /pages/*, *.do or even just *.html.

In case of prefix-patterns such as /pages/* you could then invoke URLs like http://example.com/pages/register, http://example.com/pages/login, etc and provide /WEB-INF/register.jsp, /WEB-INF/login.jsp with the appropriate GET and POST actions. The parts register, login, etc are then available by request.getPathInfo() as in the above example.

When you're using suffix-patterns like *.do, *.html, etc, you could invoke URLs like http://example.com/register.do, http://example.com/login.do, etc and you should change the code examples in this answer (also the ActionFactory) to extract the register and login parts by request.getServletPath() instead.
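
To make the two mapping styles concrete, here is a minimal sketch of the extraction step. ActionNames and its two methods are hypothetical helpers, not part of the Servlet API; they take the strings that request.getPathInfo() and request.getServletPath() would return for the URLs above, so the example runs without a servlet container.

```java
final class ActionNames {
    private ActionNames() {}

    // Prefix mapping such as /pages/*: getPathInfo() yields "/register".
    static String fromPathInfo(String pathInfo) {
        return pathInfo.substring(1);
    }

    // Suffix mapping such as *.do: getServletPath() yields "/register.do".
    static String fromServletPath(String servletPath) {
        int dot = servletPath.lastIndexOf('.');
        int end = dot >= 0 ? dot : servletPath.length();
        return servletPath.substring(1, end);
    }

    public static void main(String[] args) {
        System.out.println(fromPathInfo("/register"));
        System.out.println(fromServletPath("/login.do"));
    }
}
```

With a suffix mapping getPathInfo() returns null, which is why the front controller has to switch to getServletPath() in that case.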


Strategy pattern

The Action should follow the Strategy pattern. It needs to be defined as an abstract/interface type which should do the work based on the passed-in arguments of the abstract method (this is the difference with the Command pattern, wherein the abstract/interface type should do the work based on the arguments which have been passed in during the creation of the implementation).

public interface Action {
    public String execute(HttpServletRequest request, HttpServletResponse response) throws Exception;
}

You may want to make the Exception more specific with a custom exception like ActionException. It's just a basic kickoff example, the rest is all up to you.
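
The Strategy/Command distinction mentioned above can be illustrated with a small sketch. All names here (StrategyVersusCommand, GreetStrategy, GreetCommand) are invented for the illustration, and a plain Map stands in for the request so that it runs without the Servlet API.

```java
import java.util.Collections;
import java.util.Map;

final class StrategyVersusCommand {
    // Strategy style: one shared instance, the input arrives with each call.
    interface GreetStrategy {
        String execute(Map<String, String> params);
    }

    // Command style: the input is captured at creation, execute() takes none.
    interface GreetCommand {
        String execute();
    }

    static final GreetStrategy STRATEGY = params -> "Hello, " + params.get("name");

    static GreetCommand commandFor(Map<String, String> params) {
        String name = params.get("name");
        return () -> "Hello, " + name;
    }

    public static void main(String[] args) {
        Map<String, String> params = Collections.singletonMap("name", "Ann");
        System.out.println(STRATEGY.execute(params));
        System.out.println(commandFor(params).execute());
    }
}
```

The practical consequence for the front controller is that a Strategy-style Action can be created once and shared by all requests, as long as it keeps no per-request state in fields.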

Here's an example of a LoginAction which (as its name says) logs in the user. The User itself is in turn a Data Model. The View is aware of the presence of the User.

public class LoginAction implements Action {
    public String execute(HttpServletRequest request, HttpServletResponse response) throws Exception {
        String username = request.getParameter("username");
        String password = request.getParameter("password");
        User user = userDAO.find(username, password);
        if (user != null) {
            request.getSession().setAttribute("user", user); // Login user.
            return "home"; // Redirect to home page.
        } else {
            request.setAttribute("error", "Unknown username/password. Please retry."); // Store error message in request scope.
            return "login"; // Go back to redisplay login form with error.
        }
    }
}

Abstract Factory pattern

The ActionFactory should follow the Abstract Factory pattern. Basically, it should provide a creational method which returns an abstract/interface type. In this case, it should return an implementation of the Action interface based on the information provided by the request. For example, the method and pathinfo (the pathinfo is the part after the context and servlet path in the request URL, excluding the query string).

public static Action getAction(HttpServletRequest request) {
    return actions.get(request.getMethod() + request.getPathInfo());
}

The actions in turn should be some static/applicationwide Map<String, Action> which holds all known actions. It's up to you how to fill this map. Hardcoding:

actions.put("POST/register", new RegisterAction());
actions.put("POST/login", new LoginAction());
actions.put("GET/logout", new LogoutAction());
// ...

Or configurable based on a properties/XML configuration file in the classpath: (pseudo)

for (Entry entry : configuration) {
    actions.put(entry.getKey(), Class.forName(entry.getValue()).newInstance());
}
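
The configuration-driven variant could look roughly like this in runnable form. This is only a sketch under assumptions: ConfiguredActions, its simplified Action interface (no request/response arguments) and RegisterAction are hypothetical, and the configuration is a java.util.Properties object mapping keys such as POST/register to class names, each class having a no-argument constructor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

final class ConfiguredActions {
    // Simplified stand-in for the Action interface (no Servlet API here).
    interface Action {
        String execute();
    }

    static final class RegisterAction implements Action {
        public String execute() {
            return "register";
        }
    }

    // Builds the map from entries like "POST/register" -> class name.
    static Map<String, Action> load(Properties configuration) {
        Map<String, Action> actions = new HashMap<>();
        for (String key : configuration.stringPropertyNames()) {
            String className = configuration.getProperty(key);
            try {
                Object instance = Class.forName(className)
                        .getDeclaredConstructor()
                        .newInstance();
                actions.put(key, (Action) instance);
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new IllegalStateException("Cannot create action for " + key, e);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Properties configuration = new Properties();
        configuration.setProperty("POST/register", RegisterAction.class.getName());
        System.out.println(load(configuration).keySet());
    }
}
```

Failing fast at startup on a misconfigured class name is a design choice of this sketch; you may prefer to log and skip the entry instead.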

Or dynamically based on a scan in the classpath for classes implementing a certain interface and/or annotation: (pseudo)

for (ClassFile classFile : classpath) {
    if (classFile.isInstanceOf(Action.class)) {
       actions.put(classFile.getAnnotation("mapping"), classFile.newInstance());
    }
}

Keep in mind to create a "do nothing" Action for the case that there's no mapping. Let it, for example, directly return request.getPathInfo().substring(1).
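
Putting the hardcoded map and the "do nothing" fallback together, a minimal sketch could look like this. The Request class is a hypothetical stand-in carrying only the method and pathinfo that the factory reads from HttpServletRequest, and the registered actions are placeholders that just return a view name.

```java
import java.util.HashMap;
import java.util.Map;

final class ActionFactorySketch {
    // Hypothetical stand-in for the two request properties the factory reads.
    static final class Request {
        final String method;
        final String pathInfo;

        Request(String method, String pathInfo) {
            this.method = method;
            this.pathInfo = pathInfo;
        }
    }

    interface Action {
        String execute(Request request);
    }

    private static final Map<String, Action> ACTIONS = new HashMap<>();

    // "Do nothing" action: just show the view named by the path.
    private static final Action DEFAULT = request -> request.pathInfo.substring(1);

    static {
        ACTIONS.put("POST/login", request -> "home");
        ACTIONS.put("GET/logout", request -> "login");
    }

    // Falls back to DEFAULT when no action is mapped for method + pathinfo.
    static Action getAction(Request request) {
        return ACTIONS.getOrDefault(request.method + request.pathInfo, DEFAULT);
    }

    public static void main(String[] args) {
        Request get = new Request("GET", "/register");
        System.out.println(getAction(get).execute(get));
    }
}
```

With this fallback a plain GET on /pages/register simply ends up at register.jsp without any action class having to exist for it.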


Other patterns

Those were the important patterns so far.

To get a step further, you could use the Facade pattern to create a Context class which in turn wraps the request and response objects and offers several convenience methods delegating to the request and response objects and pass that as argument into the Action#execute() method instead. This adds an extra abstract layer to hide the raw Servlet API away. You should then basically end up with zero import javax.servlet.* declarations in every Action implementation. In JSF terms, this is what the FacesContext and ExternalContext classes are doing.
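
As a rough idea of what such a Context could look like, here is a sketch. Everything in it is an assumption made for illustration: the Context class and its method names are invented, plain maps stand in for the request parameters and the session, and the login check is a placeholder rather than a real user lookup.

```java
import java.util.HashMap;
import java.util.Map;

final class ActionContextSketch {
    // Hypothetical facade: the only type an Action would need to see.
    static final class Context {
        private final Map<String, String> parameters;        // stands in for request parameters
        private final Map<String, Object> requestAttributes = new HashMap<>();
        private final Map<String, Object> sessionAttributes; // stands in for the HttpSession

        Context(Map<String, String> parameters, Map<String, Object> sessionAttributes) {
            this.parameters = parameters;
            this.sessionAttributes = sessionAttributes;
        }

        String param(String name) {
            return parameters.get(name);
        }

        void setError(String message) {
            requestAttributes.put("error", message);
        }

        Object requestAttribute(String name) {
            return requestAttributes.get(name);
        }

        void login(Object user) {
            sessionAttributes.put("user", user);
        }
    }

    interface Action {
        String execute(Context context);
    }

    // Placeholder login: accepts any non-empty username (no real lookup).
    static final Action LOGIN = context -> {
        String username = context.param("username");
        if (username != null && !username.isEmpty()) {
            context.login(username);
            return "home";
        }
        context.setError("Unknown username/password. Please retry.");
        return "login";
    };
}
```

Note that the Action no longer mentions any request or response type, which also makes it easy to unit test with plain maps.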

Then there's the State pattern for the case that you'd like to add an extra abstraction layer to split the tasks of gathering the request parameters, converting them, validating them, updating the model values and executing the actions. In JSF terms, this is what the LifeCycle is doing.

Then there's the Composite pattern for the case that you'd like to create a component based view which can be attached to the model and whose behaviour depends on the state of the request based lifecycle. In JSF terms, this is what the UIComponent represents.

This way you can evolve bit by bit towards a component based framework.


Why shouldn't I use immutable POJOs instead of JavaBeans?

28 votes

I have implemented a few Java applications now, only desktop applications so far. I prefer to use immutable objects for passing the data around in the application instead of using objects with mutators (setters and getters), also called JavaBeans.

But in the Java world, it seems to be much more common to use JavaBeans, and I can't understand why I should use them instead. Personally, I think the code looks better if it only deals with immutable objects instead of mutating the state all the time.

Immutable objects are also recommended in Item 15: Minimize mutability, Effective Java 2ed.

If I have an object Person implemented as a JavaBean it would look like:

public class Person {
    private String name;
    private Place birthPlace;

    public Person() {}

    public void setName(String name) {
        this.name = name;
    }

    public void setBirthPlace(Place birthPlace) {
        this.birthPlace = birthPlace;
    }

    public String getName() {
        return name;
    }

    public Place getBirthPlace() {
        return birthPlace;
    }
}

And the same Person implemented as an immutable object:

public class Person {
    private final String name;
    private final Place birthPlace;

    public Person(String name, Place birthPlace) {
        this.name = name;
        this.birthPlace = birthPlace;
    }

    public String getName() {
        return name;
    }

    public Place getBirthPlace() {
        return birthPlace;
    }
}

Or closer to a struct in C:

public class Person {
    public final String name;
    public final Place birthPlace;

    public Person(String name, Place birthPlace) {
        this.name = name;
        this.birthPlace = birthPlace;
    }
}

I could also have getters in the immutable object to hide the implementation details. But since I only use it as a struct I prefer to skip the "getters", and keep it simple.

Simply, I don't understand why it's better to use JavaBeans, or if I can and should keep going with my immutable POJOs?

Many of the Java libraries seem to have better support for JavaBeans, but maybe support for immutable POJOs will become more common over time?

Prefer JavaBeans When

  • you have to interact with environments that expect them
  • you have lots of properties for which it would be inconvenient to do all initialization on instantiation
  • you have state that is expensive or impossible to copy for some reason but requires mutation
  • you think at some point you may have to change how properties are accessed (e.g. moving from stored to calculated values, access authorization, etc.)
  • you want to conform to coding standards that mindlessly insist it is somehow more "object-oriented" to use JavaBeans
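
To illustrate the point about changing how properties are accessed, here is a small sketch. PersonBean and its firstName/lastName split are hypothetical and not taken from the question; the idea is that name may once have been a stored field and is now calculated, while callers keep calling getName().

```java
final class PersonBean {
    private String firstName = "";
    private String lastName = "";

    public PersonBean() {}

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    // Possibly a stored field in an earlier version; now calculated.
    // Code that calls getName() does not need to change.
    public String getName() {
        return (firstName + " " + lastName).trim();
    }

    public static void main(String[] args) {
        PersonBean person = new PersonBean();
        person.setFirstName("Ada");
        person.setLastName("Lovelace");
        System.out.println(person.getName());
    }
}
```

With a public final field instead of a getter, the same change would have forced every caller to be edited.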

Prefer Immutable POJOs When

  • you have a small number of simple properties
  • you do not have to interact with environments assuming JavaBean conventions
  • it is easy (or at the very least possible) to copy state when cloning your object
  • you don't ever plan on cloning the object at all
  • you're pretty sure that you don't ever have to modify how properties are accessed as above
  • you don't mind listening to whining (or sneering) about how your code isn't sufficiently "object-oriented"
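
When the state is easy to copy, "changing" an immutable object just means creating a modified copy. A minimal sketch, assuming a hypothetical ImmutablePerson with a withName method (not from the question) and a String birth place instead of the Place type to keep it self-contained:

```java
final class ImmutablePerson {
    public final String name;
    public final String birthPlace; // String instead of Place, for brevity

    ImmutablePerson(String name, String birthPlace) {
        this.name = name;
        this.birthPlace = birthPlace;
    }

    // Returns a modified copy; the original instance is left untouched.
    ImmutablePerson withName(String newName) {
        return new ImmutablePerson(newName, birthPlace);
    }

    public static void main(String[] args) {
        ImmutablePerson ann = new ImmutablePerson("Ann", "Oslo");
        ImmutablePerson bea = ann.withName("Bea");
        System.out.println(ann.name + " / " + bea.name + " from " + bea.birthPlace);
    }
}
```

This stays pleasant with a small number of simple properties; with many properties the constructor and the copy methods grow quickly, which is where the JavaBean style starts to look attractive.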