Best java questions in October 2010

How to determine if a number is positive or negative in Java?

36 votes

I was asked this question in an Amazon Chennai (India) interview: determine whether a number is positive or negative. The rules were that we should not use conditional operators such as < and >, built-in Java functions (like substring, indexOf, charAt, and startsWith), regex, or APIs. I did some homework on this and the code is given below, but it only works for the integer type. They asked me to write generic code that works for float, double, and long.

 // This might not be better way!!

 System.out.println(((number >> 31) & 1) == 1 ? "-ve number" : "+ve number");

Any ideas from your side? Thanks.

Updates:

OMG!!! Looking at the answers, I think the interviewer thought I was up to something.

Thank you all for providing excellent answers (especially Nanda, Itzwarky, Nabb and Strilanc) and for spending your precious time.

The integer cases are easy. The double case is trickier, until you remember about infinities.

Note: If you consider the double constants "part of the api", you can replace them with overflowing expressions like 1E308 * 2.

int sign(int i) {
    if (i == 0) return 0;
    if (i >> 31 != 0) return -1;
    return +1;
}
int sign(long i) {
    if (i == 0) return 0;
    if (i >> 63 != 0) return -1;
    return +1;
}
public static int sign(double f) {
    if (f != f) throw new IllegalArgumentException("NaN");
    if (f == 0) return 0;
    f *= Double.POSITIVE_INFINITY;
    if (f == Double.POSITIVE_INFINITY) return +1;
    if (f == Double.NEGATIVE_INFINITY) return -1;

    //this should never be reached, but I've been wrong before...
    throw new IllegalArgumentException("Unfathomed double");
}
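As a hedged sketch of the note about overflowing expressions, the double version can be written without the Double constants, and a float can simply be widened to double. The class name SignDemo and the INF constant are made up for this illustration; the logic is the same as the answer's double overload.

```java
public class SignDemo {
    // Overflowing constant expression standing in for Double.POSITIVE_INFINITY.
    static final double INF = 1E308 * 2;

    static int sign(double f) {
        if (f != f) throw new IllegalArgumentException("NaN"); // only NaN differs from itself
        if (f == 0) return 0;                                  // covers +0.0 and -0.0
        f *= INF;                                              // any non-zero value becomes +/- infinity
        if (f == INF) return +1;
        if (f == -INF) return -1;
        throw new IllegalArgumentException("Unfathomed double");
    }

    // Widening a float to double is exact, so the double version can be reused.
    static int sign(float f) {
        return sign((double) f);
    }

    public static void main(String[] args) {
        System.out.println(sign(-2.5));
        System.out.println(sign(1e-300));
        System.out.println(sign(0.0));
        System.out.println(sign(-3.0f));
    }
}
```

Like the original, this still relies on == and !=, which the stated rules do not rule out.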

27 votes

I have the following loop:

for (byte i = 0 ; i < 128; i++) {
    System.out.println(i + 1 + " " + name);
}

When I execute my program it prints all numbers from -128 to 127 in an infinite loop. Why does this happen?

byte is a 1-byte type, so it can only hold values from -128 to 127, and the condition i < 128 is always true. When you add 1 to 127 it overflows and becomes -128, and so on in an infinite loop.
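A small sketch of the wrap-around, plus one possible fix (using an int counter, which is my suggestion rather than something from the answer). The class and method names are made up for the illustration.

```java
public class ByteOverflow {
    // Incrementing a byte past 127 wraps around to -128.
    static byte increment(byte b) {
        b++;
        return b;
    }

    // With an int counter the condition i < 128 can become false, so the loop ends.
    static int boundedIterations() {
        int count = 0;
        for (int i = 0; i < 128; i++) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(increment((byte) 127));
        System.out.println(boundedIterations());
    }
}
```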

Can you programmatically detect white noise?

22 votes

The Dell Streak has been discovered to have an FM radio which has very crude controls. 'Scanning' is unavailable by default, so my question is does anyone know how, using Java on Android, one might 'listen' to the FM radio as we iterate up through the frequency range detecting white noise (or a good signal) so as to act much like a normal radio's seek function?

I have done some practical work in this specific area, and I would recommend (if you have a little time for it) trying a little experimentation before resorting to FFTs. The PCM stream can be interpreted in very complex and subtle ways (as in high-quality filtering and resampling), but for many purposes it can also be treated simply as the path of a wiggly line.

White noise is unpredictable shaking of the line, which is nevertheless quite constant in intensity (RMS, absolute mean...). Acoustic content is recurrent wiggling and occasional surprises (jumps, leaps) :]

Non-noise like content of a signal may be estimated by performing quick calculations on a running window of the pcm stream.

For example, noise will strongly tend to have a higher value for the absolute integral of its derivative, than non-noise. I think that is the academic way of saying this:

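double sumd0 = 0, sumd1 = 0;
for (int n = 1; n < pcm.length; n++) {
    sumd0 += Math.abs(pcm[n]);              // absolute integral of the signal
    sumd1 += Math.abs(pcm[n] - pcm[n - 1]); // absolute integral of its derivative
}

double wNoiseRatio = 0.8; // roughly 0.8? quite easily discovered, a bit tricky to calculate

if ((sumd1 / sumd0) < wNoiseRatio) {
    /* not like noise */
}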
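A minimal runnable sketch of the derivative-ratio test described above, tried on synthetic data. The class name, the Gaussian noise generator and the sine "tone" are assumptions made for the demonstration; real PCM from a radio will behave less cleanly, and the 0.8 threshold is the answer's rough figure, not a derived constant.

```java
import java.util.Random;

public class NoiseRatio {
    // Sum of absolute first differences divided by sum of absolute sample values.
    static double derivativeRatio(double[] pcm) {
        double sumd0 = 0, sumd1 = 0;
        for (int n = 1; n < pcm.length; n++) {
            sumd0 += Math.abs(pcm[n]);
            sumd1 += Math.abs(pcm[n] - pcm[n - 1]);
        }
        return sumd1 / sumd0;
    }

    // Synthetic white noise: independent Gaussian samples (an assumption for this demo).
    static double[] whiteNoise(int length, long seed) {
        Random rnd = new Random(seed);
        double[] out = new double[length];
        for (int i = 0; i < length; i++) {
            out[i] = rnd.nextGaussian();
        }
        return out;
    }

    // Synthetic "acoustic" content: a pure tone with the given period in samples.
    static double[] tone(int length, double samplesPerCycle) {
        double[] out = new double[length];
        for (int i = 0; i < length; i++) {
            out[i] = Math.sin(2 * Math.PI * i / samplesPerCycle);
        }
        return out;
    }

    public static void main(String[] args) {
        double wNoiseRatio = 0.8; // the answer's rough threshold
        double noise = derivativeRatio(whiteNoise(10_000, 42L));
        double signal = derivativeRatio(tone(10_000, 100.0));
        System.out.println("noise ratio " + noise + " looks like noise: " + (noise >= wNoiseRatio));
        System.out.println("tone ratio  " + signal + " looks like noise: " + (signal >= wNoiseRatio));
    }
}
```

In this sketch the low-frequency tone lands far below the threshold and the noise lands above it; high-frequency content would push the ratio up, which is the weakness the answer goes on to mention.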

Also, the running absolute average over ~16 to ~30 samples will tend to vary less for white noise than for an acoustic signal:

loop(n+24 to n.length-16)
{ runAbsAve1 += abs(pcm[n]) - abs(pcm[n-24]); }

loop(n+24+16 to n.length)
{ runAbsAve2 += abs(pcm[n]) - abs(pcm[n-24]); }

unusualDif= 5; //a factor. tighter values for longer measures.

if(abs(runAbsAve1-runAbsAve2)>(runAbsAve1+runAbsAve2)/(2*unusualDif))
{ /*not like noise*/ }

This relies on white noise being non-sporadic over a span large enough to average out its entropy. Acoustic content is sporadic (localised power) and recurrent (repetitive power). This simple test reacts to acoustic content with lower frequencies and could be drowned out by high-frequency content. There are simple-to-apply lowpass filters which could help (and no doubt other adaptations).

Also, the root mean square can be divided by the mean absolute value, giving another ratio which should be particular to white noise, though I can't figure out what it is right now. The ratio will also differ for the signal's derivatives.

I think of these as simple formulaic signatures of noise. I'm sure there are more. Sorry not to be more specific; this is fuzzy and imprecise advice, but so is performing simple tests on the output of an FFT. For a better explanation and more ideas, perhaps check out statistical and stochastic(?) measurements of entropy and randomness on Wikipedia etc.

Migrating Java to Scala

22 votes

What are the most important points to be aware of, and the workarounds, when gradually migrating an existing Java codebase to Scala, with a (potentially very long) intermediate phase where both languages are in use?

The sort of things I'm thinking about are:

  • different collection hierarchies
  • Java constructs that Scala can't handle well
  • Scala constructs that are impractical to use in Java
  • build tools
  • compilation order
  • immutability support in frameworks
  • etc.

Scala doesn't like:

  • inner Java classes
  • static methods and variables (especially in super classes)
  • raw types

Java doesn't like:

  • Scala objects and traits
  • closures
  • actors (except Scarlett Johansson and Akka Actors since they have a Java API)
  • implicits, especially Manifests
  • advanced type constructs (higher kinded types, structural types, abstract type vars)

What is an example (in code) of a O(n!) function? It should take appropriate number of operations to run in reference to n; that is, I'm asking about time complexity.

There you go. This is probably the most trivial example of a function that runs in O(n!) time (where n is the argument to the function):

void nFac(int n) {
  for(int i=0; i<n; i++) {
    nFac(n-1);
  }
}
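To check the claim, here is a sketch that counts how many times the recursion bottoms out. The counter and the class name are additions for illustration only. The number of leaf calls satisfies L(n) = n * L(n-1) with L(0) = 1, which is exactly n!; the total number of calls is within a constant factor of that.

```java
public class FactorialTime {
    static long leafCalls = 0;

    // Same shape as nFac above, with a counter added for the n == 0 calls.
    static void nFac(int n) {
        if (n == 0) {
            leafCalls++;
        }
        for (int i = 0; i < n; i++) {
            nFac(n - 1);
        }
    }

    static long countLeafCalls(int n) {
        leafCalls = 0;
        nFac(n);
        return leafCalls;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n++) {
            System.out.println(n + " -> " + countLeafCalls(n));
        }
    }
}
```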

Obsolete Java Optimization Tips

21 votes

There are a number of performance tips made obsolete by the Java compiler and especially by profile-guided optimization. For example, these platform-provided optimizations can drastically (according to sources) reduce the cost of virtual function calls. The VM is also capable of method inlining, loop unrolling etc.

What other performance optimization techniques have you come across that are still being applied but have actually been made obsolete by the optimization mechanisms found in more modern JVMs?

The final modifier on methods and method parameters doesn't help with the performance at all.

Also, the Java HotSpot wiki gives a good overview of the optimizations used by HotSpot and how to efficiently use them in Java code.

Why do Strings start with a "" in Java?

20 votes

Possible Duplicate:
Why does “abcd”.StartsWith(“”) return true?

Whilst debugging through some code I found that a particular piece of my validation was using the .startsWith() method on the String class to check if a String started with a blank character.

Consider the following:

public static void main(String args[])
{

    String s = "Hello";
    if (s.startsWith(""))
    {
        System.out.println("It does");
    }

}

It prints out "It does".

My question is, why do Strings start off with a blank character? I'm presuming that under the hood Strings are essentially character arrays, but in this case I would have thought the first character would be H.

Can anyone explain please?

"" is an empty string containing no characters. There is no "empty character", unless you mean a space or the null character, neither of which is an empty string.

You can think of a string as starting with an infinite number of empty strings, just like you can think of a number as starting with an infinite number of leading zeros without any change to the meaning.

1 = ...00001
"foo" = ... + "" + "" + "" + "foo"

Strings also end with an infinite number of empty strings (as do decimal numbers with zeros):

1 = 001.000000...
"foo" = "foo" + "" + "" + "" + ...
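The same idea can be seen directly with a few String calls; this is only an illustrative sketch and the class name is made up.

```java
public class EmptyPrefix {
    public static void main(String[] args) {
        String s = "Hello";
        System.out.println(s.startsWith(""));              // true: "" is a prefix of every string
        System.out.println(s.endsWith(""));                // true: and a suffix of every string
        System.out.println(s.indexOf(""));                 // 0: it is "found" at the very start
        System.out.println("".length());                   // 0: it contains no characters at all
        System.out.println(("" + "" + s + "").equals(s));  // true: concatenating "" changes nothing
    }
}
```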

Booleans, conditional operators and autoboxing

20 votes

Why does this throw an NPE:

public static void main(String[] args) throws Exception {
    Boolean b = true ? returnsNull() : false; // NPE on this line.
    System.out.println(b);
}

public static Boolean returnsNull() {
    return null;
}

while this doesn't

public static void main(String[] args) throws Exception {
    Boolean b = true ? null : false;
    System.out.println(b); // null
}

?

The solution, by the way, is to replace false with Boolean.FALSE, which avoids null being unboxed to boolean (which isn't possible). But that isn't the question. The question is why? Are there any references in the JLS which confirm this behaviour, especially for the 2nd case?

The difference is static typing of the expressions at compile time:

E1: `true ? returnsNull() : false` - boolean (auto-unboxing 2nd op to boolean)

E2: `true ? null : false` - Boolean (autoboxing of 3rd param to Boolean)

See Java Language Specification, section 15.25 Conditional Operator ? :

  • For E1, this clause applies:

    If one of the second and third operands is of type boolean and the type of the other is of type Boolean, then the type of the conditional expression is boolean.

    The compiler inserts auto-unboxing code to the 2nd operand (return value of returnsNull()) to make it type boolean. This of course causes the NPE from the null returned at run-time.

  • For E2, no specific typing clause applies (go read 'em!), so the final "otherwise" clause applies:

    Otherwise, the second and third operands are of types S1 and S2 respectively. Let T1 be the type that results from applying boxing conversion to S1, and let T2 be the type that results from applying boxing conversion to S2. The type of the conditional expression is the result of applying capture conversion (§5.1.10) to lub(T1, T2) (§15.12.2.7).

    The compiler inserts auto-boxing code for the 3rd operand (false). The 2nd operand is already Boolean, so no auto-boxing NPE.
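A compact sketch putting E1, E2 and the Boolean.FALSE workaround from the question side by side. The class and helper method names are made up for the illustration.

```java
public class ConditionalBoxing {
    static Boolean returnsNull() {
        return null;
    }

    // E1: the expression has type boolean, so the null result is unboxed and throws.
    static boolean e1Throws() {
        try {
            Boolean b = true ? returnsNull() : false;
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // E2: the expression has type Boolean; false is boxed and null passes through.
    static Boolean e2() {
        return true ? null : false;
    }

    // Workaround: both operands are Boolean, so nothing is unboxed.
    static Boolean fixed() {
        return true ? returnsNull() : Boolean.FALSE;
    }

    public static void main(String[] args) {
        System.out.println("E1 throws NPE: " + e1Throws());
        System.out.println("E2 result: " + e2());
        System.out.println("fixed result: " + fixed());
    }
}
```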


This question needs a similar type analysis:

http://stackoverflow.com/questions/2615498/java-conditional-operator-result-type

Java source refactoring of 7000 references

17 votes

I have a huge problem. My supervisors want me to change a piece of Java code which can affect the whole project.

I need to change the signature of the void log(String) method, which should take two more arguments (Class c, String methodName). Eclipse found 7000 references to that method, and if I change it, the whole project will go down. It will take me weeks to fix all of that manually.

I need a refactoring tool that can help me do this fast. (I believe it can be done fast with an appropriate refactoring tool.)
I am using Eclipse. As far as I can see, the refactoring plugin of Eclipse is going to be a pain.
Any suggestions/help?

Great, I can copy a previous answer of mine and I just need to edit a tiny little bit:


I think what you need to do is use a source code parser like javaparser to do this.

For every java source file, parse it to a CompilationUnit, create a Visitor, probably using ModifierVisitorAdapter as base class, and override (at least) visit(MethodCallExpr, arg). Then write the changed CompilationUnit to a new File and do a diff afterwards.

I would advise against changing the original source file, but creating a shadow file tree may be a good idea (e.g. old file: src/main/java/com/mycompany/MyClass.java, new file: src/main/refactored/com/mycompany/MyClass.java; that way you can diff the entire directories).

How does "final int i" work inside of a Java for loop?

17 votes

I was surprised to see that the following Java code snippet compiled and ran:

for(final int i : listOfNumbers) {
     System.out.println(i);
}

where listOfNumbers is an array of integers.

I thought final declarations got assigned only once. Is the compiler creating an Integer object and changing what it references?

Imagine that shorthand looks a lot like this:

for (Iterator<Integer> iter = listOfNumbers.iterator(); iter.hasNext(); )
{
    final int i = iter.next();
    {
        System.out.println(i);
    }
}
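Since listOfNumbers in the question is an array rather than an Iterable, a closer hand-written equivalent uses an index. Either way the point is the same: a fresh final variable is declared on each iteration, so nothing is ever reassigned. This is a sketch; the class, method and index names are made up.

```java
public class FinalLoopVar {
    // The enhanced for loop from the question.
    static int sumForEach(int[] listOfNumbers) {
        int total = 0;
        for (final int i : listOfNumbers) {
            total += i;
        }
        return total;
    }

    // Roughly what it stands for when iterating over an array.
    static int sumExpanded(int[] listOfNumbers) {
        int total = 0;
        for (int idx = 0; idx < listOfNumbers.length; idx++) {
            final int i = listOfNumbers[idx]; // new variable each time round, assigned exactly once
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3};
        System.out.println(sumForEach(numbers));
        System.out.println(sumExpanded(numbers));
    }
}
```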

What factors led to the development of C# in spite of Java

16 votes

I wasn't around when all this was happening. But when I look at it now, I see Java appeared in 1995 whereas C# appeared in 2001. And the history section for C# on Wikipedia says only the following.

James Gosling, who created the Java programming language in 1994, and Bill Joy, a co-founder of Sun Microsystems, the originator of Java, called C# an "imitation" of Java; Gosling further claimed that "[C# is] sort of Java with reliability, productivity and security deleted." Klaus Kreft and Angelika Langer (authors of a C++ streams book) stated in a blog post that "Java and C# are almost identical programming languages. Boring repetition that lacks innovation."

I was left wondering how these two programming languages, almost "imitations" (back then), came to be. Why do you think C# came to be when the world already had Java?

For those of you who see subjectivity, consider the question as

What technological factors and necessities led to the development of C#? Were some of the key points not already covered by Java?

PS: I'm aware they both differ now at certain key points, as Jon Skeet points out. But they didn't seem to back when C# came out.

Microsoft was originally on the Java bandwagon with their own implementation/JVM. However, they fell out with Sun over licensing. I think Microsoft were trying to put extensions into the Java API, which was a big no-no for Sun, who wanted a common standard (and you have to applaud Sun for this at least: having a 100% common API was a big part of Java's cross-platform credentials).

So Microsoft decided to write their own C-based language with an API/VM for it (C# and .NET/CLR). .NET was wider than just C#, as it also supports VB and C++ (plus a few others, I'd guess).

Initially C# was a poor cousin of Java; the collection library was (and to some degree still is) pretty poor. However, with the advent of features such as first-class functions, lambdas/closures and LINQ, C# is moving ahead of Java, and personally I think C# now shades Java in productivity for a lot of simple stuff using LINQ.

Java is trying to catch up now, with 7/8 supporting closures, but I have my doubts whether they will come up with something as good/simple as C#'s syntax. They should take a leaf out of MS's book and just nick the good stuff!

Why do people say that Java is more scalable than python?

16 votes

I've seen this argument in a few places, and recently I saw it again in a reddit post. This is by no means a flame against either of these two languages. I am just puzzled why Python has this bad reputation for not being scalable.
I'm a Python guy and now I'm getting started with Java, and I just want to understand what makes Java so scalable and whether the Python setup that I have in mind is a good way to scale large Python apps.

Now back to my idea of scaling a Python app. Let's say you code it using Django. Django runs its apps in fastcgi mode. So what if you have a front Nginx server and, behind it, as many other servers as needed, each running your Django app in fastcgi mode? The front Nginx server will then load balance between your backend Django fastcgi servers. Django also supports multiple databases, so you could write to one master DB and then read from many slaves, again for load balancing. Throw a memcached server into this mix and there you go, you have scalability. Don't you?

Is this a viable setup? What does Java do better? How do you scale a Java app?

Scalability is a very overloaded term these days. The comments probably refer to in-process vertical scalability.

Python has a global interpreter lock (GIL) that severely limits its ability to scale up to many threads. It releases it when calling native code (reacquiring it when the native returns), but this still requires careful design when trying to write scalable software in Python.

Are there any provable real-world languages? (scala?)

16 votes

I was taught about formal systems at university, but I was disappointed that they didn't seem to be used in the real world.

I like the idea of being able to know that some code (object, function, whatever) works, not by testing, but by proof.

I'm sure we're all familiar with the parallels that don't exist between physical engineering and software engineering (steel behaves predictably, software can do anything - who knows!), and I would love to know if there are any languages that can be used in the real world (is asking for a web framework too much to ask?)

I've heard interesting things about the testability of functional languages like scala.

As software engineers what options do we have?

Yes, there are languages designed for writing provably correct software. Some are even used in industry. SPARK Ada is probably the most prominent example. I've talked to a few people at Praxis Critical Systems Limited who used it for code running on Boeings (for engine monitoring) and it seems quite nice. (Here is a great summary / description of the language.) This language and its accompanying proof system use the second technique described below (it doesn't even allow dynamic memory allocation).


My impression and experience is that there are two techniques for writing correct software:

  • Technique 1: Write the software in a language you're comfortable with (C, C++ or Java for instance). Take a formal specification of that language, and prove your program correct.

    If your ambition is to be 100% correct (which is most often a requirement in the automobile / aerospace industry) you'd be spending little time programming, and more time proving.

  • Technique 2: Write the software in a slightly more awkward language (some subset of Ada or a tweaked version of OCaml for instance) and write the correctness proof along the way. Programming and proving go hand in hand (the Curry-Howard correspondence even equates them completely!)

    In these scenarios you'll always end up with a correct program. (A bug will be guaranteed to be rooted in the specification.) You'll be likely to spend more time on programming but on the other hand you're proving it correct along the way.

Note that both approaches hinge on having a formal specification at hand (how else would you tell what is correct / incorrect behavior), and formally defined semantics for the language (how else would you be able to tell what the actual behavior of your program is).

Here are a few more examples of formal approaches. Whether it's "real-world" or not depends on who you ask :-)

I know of only one "provably correct" web-application language: Ur. An Ur program that "goes through the compiler" is guaranteed not to:

  • Suffer from any kinds of code-injection attacks
  • Return invalid HTML
  • Contain dead intra-application links
  • Have mismatches between HTML forms and the fields expected by their handlers
  • Include client-side code that makes incorrect assumptions about the "AJAX"-style services that the remote web server provides
  • Attempt invalid SQL queries
  • Use improper marshaling or unmarshaling in communication with SQL databases or between browsers and web servers

Is there an advantage to running JRuby if you don't know any Java?

14 votes

I've heard great things about JRuby and I know you can run it without knowing any Java. My development skills are strong; Java is just not one of the tools I know. It's a massive tool with a myriad of accompanying tools such as Maven/Ant/JUnit etc.

Is it worth moving my current Rails applications to JRuby for performance reasons alone? Perhaps if I pick up some basic Java alongside, there can be some added benefits that aren't obvious, such as better debugging/performance optimization tools?

Would love some advice on this one.

I think you pretty much nailed it:

JRuby is just yet another Ruby execution engine, just like MRI, YARV, IronRuby, Rubinius, MacRuby, MagLev, SmallRuby, Ruby.NET, XRuby, RubyGoLightly, tinyrb, HotRuby, BlueRuby, Red Sun and all the others.

The main differences are:

  • portability: for example, YARV is only officially supported on x86 32 Bit Linux. It is not supported on OSX or Windows or 64 Bit Linux. Rubinius only works on Unix, not on Windows. JRuby OTOH runs everywhere: desktops, servers, phones, App Engine, you name it. It runs on the Oracle JDK, OpenJDK, IBM J9, Apple SoyLatte, RedHat IcedTea and Oracle JRockit JVMs (and probably a couple of others I forgot about) and also on the Dalvik VM. It runs on Windows, Linux, OSX, Solaris, several BSDs, other proprietary and open Unices, OpenVMS and several mainframe OSs, Android and Google App Engine. In fact, on Windows, JRuby passes more RubySpec tests than "Ruby" (meaning MRI or YARV) itself!

  • extensibility: Ruby programs running on JRuby can use any arbitrary Java library. Through JRuby-FFI, they can also use any arbitrary C library. And with the new C extension support in JRuby 1.6, they can even use a large subset of MRI and YARV C extensions, like Mongrel for example. (And note that "Java" or "C" library does not actually mean written in those languages, it only means with a Java or C API. They could be written in Scala or Clojure or C++ or Haskell.)

  • tooling: whenever someone writes a new tool for YARV or MRI (like e.g. memprof), it turns out that JRuby already had a tool 5 years ago which does the same thing, only better. The Java ecosystem has some of the best tools for "runtime behavior comprehension" (which is a term I just made up, by which I mean much more than just simple profiling, I mean tools for deeply understanding what exactly your program does at runtime, what its performance characteristics are, where the bottlenecks are, where the memory is going, and most importantly why all of that is happening) and visualization available on the market, and pretty much all of those work with JRuby, at least to some extent.

  • deployment: assuming that your target system already has a JVM installed, deploying a JRuby app (and I'm not just talking about Rails, I also mean desktop, mobile, other kinds of servers) is literally just copying one JAR (or WAR) and a double-click.

  • performance: JRuby has much higher startup overhead. In return you get much higher throughput. In practice, this means that deploying a Rails app to JRuby is a good idea, as is running your integration tests, but for developer unit tests and scripts, MRI, YARV or Rubinius are better choices. Note that many Rails developers simply develop and unit test on MRI and integration test and deploy on JRuby. There's no need to choose a single execution engine for everything.

  • concurrency: JRuby runs Ruby threads concurrently. This means two things: if your locking is correct, your program will run faster, and if your locking is incorrect, your program will break. (Unfortunately, neither MRI nor YARV nor Rubinius run threads concurrently, so there's still some broken multithreaded Ruby code out there that doesn't know it's broken, because obviously concurrency bugs can only show up if there's actual concurrency.)

  • platforms (this is somewhat related to portability): there are some amazing Java platforms out there, e.g. the Azul JCA with 768 GiBytes of RAM and 864 CPU cores specifically designed for memory-safe, pointer-safe, garbage-collected, object-oriented languages. Android. Google App Engine. All of those run JRuby.

Concurrent programming techniques, pros, cons

14 votes

There are at least three well-known approaches for creating concurrent applications:

  1. Multithreading and memory synchronization through locking (.NET, Java). Software Transactional Memory is another approach to synchronization.

  2. Asynchronous message passing (Erlang).

I would like to learn if there are other approaches and discuss various pros and cons of these approaches applied to large distributed applications. My main focus is on simplifying life of the programmer.

For example, in my opinion, using multiple threads is easy when there are no dependencies between them, which is pretty rare. In all other cases thread synchronization code becomes quite cumbersome and hard to debug and reason about.

I'd strongly recommend looking at this presentation by Rich Hickey. It describes an approach to building high performance, concurrent applications which I would argue is distinct from lock-based or message-passing designs.

Basically it emphasises:

  • Lock free, multi-threaded concurrent applications
  • Immutable persistent data structures
  • Changes in state handled by Software Transactional Memory

And talks about how these principles influenced the design of the Clojure language.

Asserting order of synchronization in Java

13 votes

In highly concurrent systems, it can be difficult to be confident that your usage of locks is correct. Specifically, deadlocks can result if locks are acquired in an order that was not expected while being acquired in the proper order in another thread.

There are tools (e.g. Coverity) which can do static analysis on a code base and look for "unusual" locking orders. I'd like to explore other options to meet my needs.

Are there any light-weight* tools for instrumenting Java code which can detect cases where locks are being acquired in an order other than expected? I am okay with explicitly calling out locking orders via comments / annotations.

Free and/or open-source solutions preferred. Please also comment if there are non-instrumentation approaches to this problem.

* For my purposes, light-weight means...

  • If it is instrumentation, I can still run my program with the same ballpark performance. 30-50% degradation is acceptable, I suppose.
  • I don't have to spend half the day interacting with the tool just to get an "okay" out of it. Ideally I should only notice that I'm using it when there's a problem.
  • If it is instrumentation, it should be easy to disable for production environments.
  • It shouldn't clutter my code at every synchronized statement. As previously mentioned, I'm okay with explicitly commenting/annotating the objects or classes of objects which get locked with relative orderings.

I have not used AspectJ, so I cannot vouch for how easy it is to use. I have used ASM to create a custom code profiler; this was about 2 days' work. The effort to instrument synchronization should be similar. AspectJ should be quicker and easier once you are up to speed with aspects.

I have implemented a deadlock-detecting trace for our C++-based server. Here is how I did it:

  • Whenever acquiring or releasing a lock I traced:
    • <time> <tid> <lockid> <acquiring|releasing> <location in code>
  • This extra trace affected performance quite drastically and was not usable in production.
  • So when a possible deadlock was discovered in production I used the log file to figure out what was happening around the deadlock. Then reproduced this functionality in a test environment with my tracing turned on.
  • Then I ran a script on the log file to see if deadlock was possible and how. I used an awk script, using this algorithm:
    • Foreach line
      • if acquiring
        • add lockid to list of current locks for this thread
        • add each pair of locks in this list to a set of lock pairs for this thread, e.g. for a list of Lock A -> Lock B -> Lock C generate the pairs (Lock A, Lock B), (Lock A, Lock C), (Lock B, Lock C)
      • if releasing
        • remove current lockid from tail of list for this thread
    • For each lock pair search all other threads for the reverse lock pairs, each match is a potential deadlock so print the pairs and threads affected
    • Instead of making the algorithm smarter, I then desk-checked the lock acquisitions to see if it was a real deadlock.
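The log-analysis steps above can be sketched in Java roughly as follows. This is not the author's awk script: the class name, the method name and the simplified line format "<tid> <lockid> <acquiring|releasing>" (dropping the time and code location fields) are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LockOrderCheck {
    // Reports lock pairs that two different threads acquired in opposite orders.
    static List<String> findPotentialDeadlocks(List<String> traceLines) {
        Map<String, List<String>> held = new HashMap<>();            // tid -> locks currently held, in order
        Map<String, Set<List<String>>> pairs = new LinkedHashMap<>(); // tid -> (earlier, later) pairs seen

        for (String line : traceLines) {
            String[] fields = line.trim().split("\\s+");
            String tid = fields[0], lock = fields[1], action = fields[2];
            List<String> current = held.computeIfAbsent(tid, k -> new ArrayList<>());
            if (action.equals("acquiring")) {
                Set<List<String>> seen = pairs.computeIfAbsent(tid, k -> new LinkedHashSet<>());
                for (String earlier : current) {
                    if (!earlier.equals(lock)) {                     // ignore re-entrant acquisition
                        seen.add(Arrays.asList(earlier, lock));
                    }
                }
                current.add(lock);
            } else {
                int idx = current.lastIndexOf(lock);                 // remove from the tail of the list
                if (idx >= 0) {
                    current.remove(idx);
                }
            }
        }

        List<String> reports = new ArrayList<>();
        for (Map.Entry<String, Set<List<String>>> a : pairs.entrySet()) {
            for (List<String> pair : a.getValue()) {
                List<String> reversed = Arrays.asList(pair.get(1), pair.get(0));
                for (Map.Entry<String, Set<List<String>>> b : pairs.entrySet()) {
                    // the compareTo check visits each pair of threads only once
                    if (a.getKey().compareTo(b.getKey()) < 0 && b.getValue().contains(reversed)) {
                        reports.add("thread " + a.getKey() + " takes " + pair
                                + ", thread " + b.getKey() + " takes " + reversed);
                    }
                }
            }
        }
        return reports;
    }

    public static void main(String[] args) {
        List<String> trace = Arrays.asList(
                "1 A acquiring", "1 B acquiring", "1 B releasing", "1 A releasing",
                "2 B acquiring", "2 A acquiring", "2 A releasing", "2 B releasing");
        findPotentialDeadlocks(trace).forEach(System.out::println);
    }
}
```

As in the original approach, every match is only a potential deadlock and still needs to be checked by hand.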

I did this after failing to find the cause of a deadlock for a number of days, it took a few more days to implement and a few hours to find the deadlock.

If you are considering this approach in Java things to consider are:

  • Do you only use synchronized to protect your critical sections? Are you using the classes in java.util.concurrent? (These might require special handling/instrumentation.)
  • How easy is it to print the code location with aspects/ASM? I used __FILE__ and __LINE__ in C++. ASM will give you the class name, method name and signature.
  • You cannot instrument the locks used to protect your tracing/logging.
  • You can streamline your instrumentation if you use a log file per thread and thread local storage for the file object.
  • How do you uniquely identify objects you synchronize on? Maybe toString() and System.identityHashCode() would be enough, but might require more. I used the address of the object in C++.

Java I/O vs. Java new I/O (NIO) with Linux NPTL

12 votes

My webservers use the usual Java I/O with a thread-per-connection mechanism. Nowadays they are being brought to their knees by the increased number of users (long-polling connections). However, the connections are mostly idle. While this can be solved by adding more webservers, I have been trying to do some research on the NIO implementation.

I got a mixed impression about it. I have read about benchmarks where regular I/O with the new NPTL library in Linux outperforms NIO.

What is the real life experience of configuring and using the latest NPTL for Linux with Java I/O? Is there any increased performance?

And on a larger scope question:

What is the maximum number of I/O and blocking threads (that we configure in the Tomcat thread pool) in a standard server class machine (Dell with a quad-core processor) we expect to perform normally (with Linux NPTL library?). What's the impact if the threadpool gets really big, say more than 1000 threads?

Any references and pointers will be very much appreciated.

A provocative blog posting: "Avoid NIO, get better throughput." Paul Tyma's (2008) blog claims ~5000 threads without any trouble; I've heard folks claim more:

  1. With NPTL on, Sun and Blackwidow JVM 1.4.2 scaled easily to 5000+ threads. Blocking model was consistently 25-35% faster than using NIO selectors. Lot of techniques suggested by EmberIO folks were employed - using multiple selectors, doing multiple (2) reads if the first read returned EAGAIN equivalent in Java. Yet we couldn't beat the plain thread per connection model with Linux NPTL.

I think the key here is to measure the overhead and performance, and make the move to non-blocking I/O only when you know you need to and can demonstrate an improvement. The additional effort to write and maintain non-blocking code should be factored in to your decision. My take is, if your application can be cleanly expressed using synchronous/blocking I/O, DO THAT. If your application is amenable to non-blocking I/O and you won't just be re-inventing blocking I/O badly in application-space, CONSIDER moving to nio based on measured performance needs. I'm amazed when I poke around the google results for this how few of the resources actually cite any (recent) numbers!

Also, see Paul Tyma's presentation slides: The old way is new again. Based on his work at Google, concrete numbers suggest that synchronous threaded I/O is quite scalable on Linux, and he considers "NIO is faster" a myth that was true for a while, but no longer. Some good additional commentary here on Comet Daily. He cites the following (anecdotal, still no solid link to benchmarks, etc...) result on NPTL:

In tests, NPTL succeeded in starting 100,000 threads on a IA-32 in two seconds. In comparison, this test under a kernel without NPTL would have taken around 15 minutes

If you really are running into scalability problems, you may want to tune the thread stack size using -XX:ThreadStackSize. Since you mention Tomcat, see here.
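As a rough illustration of the stack-size idea, here is a minimal sketch of a pool whose threads request a smaller stack. It uses the four-argument Thread constructor, whose stackSize argument is only a hint that the JVM is free to ignore; the JVM-wide default is what the command-line flag controls. The class name SmallStackPool and the 512 KB figure are assumptions made up for this example, not recommendations, and Tomcat's own pool is configured differently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: a fixed pool whose worker threads ask for a small stack.
final class SmallStackPool {
    // Assumed value for illustration only; measure before choosing one.
    static final long STACK_BYTES = 512 * 1024;

    static ExecutorService newPool(int threads) {
        AtomicInteger ids = new AtomicInteger();
        // Thread(ThreadGroup, Runnable, String, long): the last argument is
        // the requested stack size in bytes (a suggestion to the JVM).
        ThreadFactory factory = r ->
                new Thread(null, r, "worker-" + ids.incrementAndGet(), STACK_BYTES);
        return Executors.newFixedThreadPool(threads, factory);
    }

    // Runs `tasks` trivial jobs on `threads` workers and returns how many completed.
    static int runTasks(int threads, int tasks) {
        ExecutorService pool = newPool(threads);
        AtomicInteger done = new AtomicInteger();
        for (int t = 0; t < tasks; t++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return done.get();
    }
}
```

Calling SmallStackPool.runTasks(8, 1000) should report all 1000 tasks completed; whether the smaller stack actually saves memory depends on the platform, so it is worth measuring.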

Finally, if you're bound and determined to use non-blocking I/O, make every effort to build on an existing framework by people who know what they're doing. I've wasted far too much of my own time trying to get an intricate non-blocking I/O solution right (for the wrong reasons).

See also related on SO.

In regards to for(), why use i++ rather than ++i?

10 votes

Perhaps it doesn't matter to the compiler once it optimizes, but in C/C++, I see most people make a for loop in the form of:

for (i = 0; i < arr.length; i++)

where the incrementing is done with the postfix ++. I get the difference between the two forms. i++ returns the current value of i, but then adds 1 to i on the quiet. ++i first adds 1 to i, and returns the new value (being 1 more than i was).
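To make the difference concrete, here is a small sketch in Java (the language of this digest; the operators behave the same way on C/C++ ints). The class and method names are made up for the example.

```java
// Hypothetical demo class: shows the value each form yields and the
// value left in the variable afterwards.
final class IncrementDemo {
    // Returns {value of the expression i++, value of i afterwards}.
    static int[] postfix(int start) {
        int i = start;
        int seen = i++; // seen receives the old value, then i is incremented
        return new int[] { seen, i };
    }

    // Returns {value of the expression ++i, value of i afterwards}.
    static int[] prefix(int start) {
        int i = start;
        int seen = ++i; // i is incremented first, seen receives the new value
        return new int[] { seen, i };
    }

    // In a for loop the expression's value is discarded, so both forms
    // run the body the same number of times.
    static int countIterations(int n, boolean usePrefix) {
        int count = 0;
        if (usePrefix) {
            for (int i = 0; i < n; ++i) count++;
        } else {
            for (int i = 0; i < n; i++) count++;
        }
        return count;
    }
}
```

Starting from 5, postfix yields 5 and prefix yields 6, while the variable ends at 6 either way. That is why the choice is invisible in the increment clause of a for loop.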

I would think that i++ takes a little more work, since a previous value needs to be stored in addition to a next value: Push *(&i) to stack (or load to register); increment *(&i). Versus ++i: Increment *(&i); then use *(&i) as needed.

(I get that the "Increment *(&i)" operation may involve a register load, depending on CPU design. In which case, i++ would need either another register or a stack push.)

Anyway, at what point, and why, did i++ become more fashionable?


I'm inclined to believe azheglov: It's a pedagogic thing, and since most of us do C/C++ on a Windows or *nix system where the compilers are of high quality, nobody gets hurt.

If you're using a low-quality compiler or an interpreted environment, you may need to be sensitive to this. Certainly, if you're doing advanced C++ or device driver or embedded work, hopefully you're seasoned enough for this not to be a big deal at all. (Do dogs have Buddha-nature? Who really needs to know?)

My theory (why i++ is more fashionable) is that when people learn C (or C++) they eventually learn to code iterations like this:

while( *p++ ) {
    ...
}

Note that the postfix form is important here (using the prefix form would create an off-by-one bug).

When the time comes to write a for loop where ++i or i++ doesn't really matter, it may feel more natural to use the postfix form.

ADDED: What I wrote above applies to primitive types, really. When coding something with primitive types, you tend to do things quickly and do what comes naturally. That's the important caveat that I need to attach to my theory.

If ++ is an overloaded operator on a C++ class (the possibility Rich K. suggested in the comments) then of course you need to code loops involving such classes with extreme care as opposed to doing simple things that come naturally.

How can pattern search be made faster?

9 votes

I am working with an incrementally growing file of about 1 GB and I want to search it for a particular pattern. Currently I am using Java regular expressions. Do you have any idea how I can do this faster?

Basically, what you need is a state machine that can process a stream, with the stream bound to the file. Each time the file grows, you read what has been appended to it (like the Linux tail command, which writes the lines added to the file to standard output).

If you need to stop and restart your analyser, you can either store the start position somewhere (it may depend on the window you need for your pattern matching) and resume from there, or restart from scratch.

That is for the "increasing file" part of the problem.
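The read-what-was-appended idea with a stored position could be sketched roughly as follows. This is only an illustration under stated assumptions: the class IncrementalScanner and its methods are invented for this example, the file is assumed to be UTF-8 text that only ever grows, patterns are assumed not to span lines, and every line is assumed to be shorter than the 1 MB chunk size.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scanner: remembers how far it has read and, on each poll,
// matches a regex against the complete lines appended since then.
final class IncrementalScanner {
    private static final int MAX_CHUNK = 1 << 20; // read at most 1 MB per poll

    private final String path;
    private final Pattern pattern;
    private long offset; // first byte not yet consumed; persist this to resume later

    IncrementalScanner(String path, String regex, long startOffset) {
        this.path = path;
        this.pattern = Pattern.compile(regex);
        this.offset = startOffset;
    }

    long offset() {
        return offset;
    }

    List<String> poll() {
        List<String> hits = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long available = file.length() - offset;
            if (available <= 0) return hits; // nothing new yet
            byte[] chunk = new byte[(int) Math.min(available, MAX_CHUNK)];
            file.seek(offset);
            file.readFully(chunk);
            // Consume only up to the last newline so a half-written line
            // is re-read on the next poll.
            int end = chunk.length;
            while (end > 0 && chunk[end - 1] != '\n') end--;
            if (end == 0) return hits; // no complete line yet
            Matcher m = pattern.matcher(new String(chunk, 0, end, StandardCharsets.UTF_8));
            while (m.find()) hits.add(m.group());
            offset += end;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    // Usage example: write, poll, append, poll again.
    static List<String> demo() {
        try {
            Path tmp = Files.createTempFile("grow", ".log");
            Files.write(tmp, "a 2010-10-01 b\npartial 2010-".getBytes(StandardCharsets.UTF_8));
            IncrementalScanner scanner =
                    new IncrementalScanner(tmp.toString(), "\\d{4}-\\d{2}-\\d{2}", 0);
            List<String> all = new ArrayList<>(scanner.poll()); // first line only
            Files.write(tmp, "10-02 done\n".getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.APPEND);
            all.addAll(scanner.poll()); // now the second line is complete
            Files.delete(tmp);
            return all;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Saving the value of offset() between runs is what allows the analyser to be stopped and resumed without rescanning the whole file.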

As for the best way to process the content, it depends on what you really need: what kind of data you have and what pattern you want to apply. Regular expressions may well be the best solution: flexible, fast and relatively convenient.

From my understanding, Lucene would be good if you wanted to do document search over natural language content. It would be a poor choice for matching all dates or all lines with a specific property. Also, Lucene first builds an index of the document, which takes time, so it would only pay off for really heavy processing.

How do I add custom data and fields to the Contacts screen in Android?

8 votes

I'm trying to add a custom data field and MIME type to the Contacts screen. Is there a way to do this such that when a user views a contact with my data saved on it, my field appears there? This is something I've seen other apps do--how do the Facebook, Twitter, Last.fm, etc. apps add their status information to contacts, for example? I can't seem to figure it out from the Contacts API documentation.

This guide gives a very good description on how to do what you asked: http://www.c99.org/2010/01/23/writing-an-android-sync-provider-part-1