Best Java questions in March 2012

Different results with Java's digest versus external utilities

125 votes

I have written a simple Java class to generate the hash values of the Windows Calculator file. I am using Windows 7 Professional with SP1. I have tried Java 6.0.29 and Java 7.0.03. Can someone tell me why I am getting different hash values from Java versus (many!) external utilities and/or websites? All the external tools agree with each other; only Java returns different results.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.zip.CRC32;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Checksum 
{
    private static int size = 65536;
    private static File calc = new File("C:/Windows/system32/calc.exe");

    /*
        C:\Windows\System32\calc.exe (verified via several different utilities)
        ----------------------------
        CRC-32b = 8D8F5F8E
        MD5     = 60B7C0FEAD45F2066E5B805A91F4F0FC
        SHA-1   = 9018A7D6CDBE859A430E8794E73381F77C840BE0
        SHA-256 = 80C10EE5F21F92F89CBC293A59D2FD4C01C7958AACAD15642558DB700943FA22
        SHA-384 = 551186C804C17B4CCDA07FD5FE83A32B48B4D173DAC3262F16489029894FC008A501B50AB9B53158B429031B043043D2
        SHA-512 = 68B9F9C00FC64DF946684CE81A72A2624F0FC07E07C0C8B3DB2FAE8C9C0415BD1B4A03AD7FFA96985AF0CC5E0410F6C5E29A30200EFFF21AB4B01369A3C59B58


        Results from this class
        -----------------------
        CRC-32  = 967E5DDE
        MD5     = 10E4A1D2132CCB5C6759F038CDB6F3C9
        SHA-1   = 42D36EEB2140441B48287B7CD30B38105986D68F
        SHA-256 = C6A91CBA00BF87CDB064C49ADAAC82255CBEC6FDD48FD21F9B3B96ABF019916B    
    */    

    public static void main(String[] args)throws Exception {
        Map<String, String> hashes = getFileHash(calc);
        for (Map.Entry<String, String> entry : hashes.entrySet()) {
            System.out.println(String.format("%-7s = %s", entry.getKey(), entry.getValue()));
        }
    }

    private static Map<String, String> getFileHash(File file) throws NoSuchAlgorithmException, IOException {
        Map<String, String> results = new LinkedHashMap<String, String>();

        if (file != null && file.exists()) {
            CRC32 crc32 = new CRC32();
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");

            FileInputStream fis = new FileInputStream(file);
            byte data[] = new byte[size];
            int len = 0;
            while ((len = fis.read(data)) != -1) {
                crc32.update(data, 0, len);
                md5.update(data, 0, len);
                sha1.update(data, 0, len);
                sha256.update(data, 0, len);
            }
            fis.close();

            results.put("CRC-32", toHex(crc32.getValue()));
            results.put(md5.getAlgorithm(), toHex(md5.digest()));
            results.put(sha1.getAlgorithm(), toHex(sha1.digest()));
            results.put(sha256.getAlgorithm(), toHex(sha256.digest()));
        }
        return results;
    }

    private static String toHex(byte[] bytes) {
        String result = "";
        if (bytes != null) {
            StringBuilder sb = new StringBuilder(bytes.length * 2);
            for (byte element : bytes) {
                if ((element & 0xff) < 0x10) {
                    sb.append("0");
                }
                sb.append(Long.toString(element & 0xff, 16));
            }
            result = sb.toString().toUpperCase();
        }
        return result;
    }

    private static String toHex(long value) {
        return Long.toHexString(value).toUpperCase();
    }

}

Thanks for any advice.

Mike V.

Got it. The Windows file system is behaving differently depending on the architecture of your process. This article explains it all - in particular:

But what about 32-bit applications that have the system path hard coded and is running in a 64-bit Windows? How can they find the new SysWOW64 folder without changes in the program code, you might think. The answer is that the emulator redirects calls to System32 folder to the SysWOW64 folder transparently so even if the folder is hard coded to the System32 folder (like C:\Windows\System32), the emulator will make sure that the SysWOW64 folder is used instead. So same source code, that uses the System32 folder, can be compiled to both 32-bit and 64-bit program code without any changes.

Try copying calc.exe to somewhere else... then run the same tools again. You'll get the same results as Java. Something about the Windows file system is giving different data to the tools than it's giving to Java... I'm sure it's something to do with it being in the Windows directory, and thus probably handled "differently".

Furthermore, I've reproduced it in C#... and found out that it depends on the architecture of the process you're running. So here's a sample program:

using System;
using System.IO;
using System.Security.Cryptography;

class Test
{
    static void Main()
    {
        using (var md5 = MD5.Create())
        {
            string path = "c:/Windows/System32/Calc.exe";
            var bytes = md5.ComputeHash(File.ReadAllBytes(path));
            Console.WriteLine(BitConverter.ToString(bytes));
        }
    }
}

And here's a console session (minus chatter from the compiler):

c:\users\jon\Test>csc /platform:x86 Test.cs    

c:\users\jon\Test>test
60-B7-C0-FE-AD-45-F2-06-6E-5B-80-5A-91-F4-F0-FC

c:\users\jon\Test>csc /platform:x64 Test.cs

c:\users\jon\Test>test
10-E4-A1-D2-13-2C-CB-5C-67-59-F0-38-CD-B6-F3-C9
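
A Java-side check along the same lines, as a minimal sketch only. The class name ArchAwareMd5 and its helpers are hypothetical, and sun.arch.data.model is a HotSpot-specific property that may be missing on other JVMs. The idea is to print the architecture the JVM reports and then hash whatever path is passed on the command line, so the same path can be compared under a 32-bit and a 64-bit JVM.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ArchAwareMd5 {

    // Hex-encoded MD5 of a byte array (upper case, like the question's output).
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md5.digest(data)) {
                sb.append(String.format("%02X", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Reads the whole file into memory; fine for a small executable.
    static String md5Hex(Path file) throws IOException {
        return md5Hex(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        // Report which kind of process is doing the reading.
        System.out.println("os.arch             = " + System.getProperty("os.arch"));
        System.out.println("sun.arch.data.model = " + System.getProperty("sun.arch.data.model"));
        if (args.length > 0) {
            System.out.println("MD5                 = " + md5Hex(Paths.get(args[0])));
        }
    }
}
```

If the explanation above is right, running this against C:/Windows/System32/calc.exe under a 32-bit JVM and then under a 64-bit JVM on the same machine should print two different hashes, because the 32-bit process is redirected to SysWOW64.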

Why does Math.round(0.49999999999999994) return 1

121 votes

In the following program you can see that each value slightly less than a half-integer (10.5, 9.5, ...) is rounded down, except for the one just below 0.5.

for (int i = 10; i >= 0; i--) {
    long l = Double.doubleToLongBits(i + 0.5);
    double x;
    do {
        x = Double.longBitsToDouble(l);
        System.out.println(x + " rounded is " + Math.round(x));
        l--;
    } while (Math.round(x) > i);
}

prints

10.5 rounded is 11
10.499999999999998 rounded is 10
9.5 rounded is 10
9.499999999999998 rounded is 9
8.5 rounded is 9
8.499999999999998 rounded is 8
7.5 rounded is 8
7.499999999999999 rounded is 7
6.5 rounded is 7
6.499999999999999 rounded is 6
5.5 rounded is 6
5.499999999999999 rounded is 5
4.5 rounded is 5
4.499999999999999 rounded is 4
3.5 rounded is 4
3.4999999999999996 rounded is 3
2.5 rounded is 3
2.4999999999999996 rounded is 2
1.5 rounded is 2
1.4999999999999998 rounded is 1
0.5 rounded is 1
0.49999999999999994 rounded is 1
0.4999999999999999 rounded is 0

I am using Java 6 update 31.

According to the Java 6 docs, round(x) is implemented as floor(x+0.5) [1]. But 0.5+0.49999999999999994 is exactly 1 in double precision:

public static void main(String args[]) {
    double a = 0.5;
    double b = 0.49999999999999994;
    System.out.printf("%016x\n", Double.doubleToLongBits(a));
    System.out.printf("%016x\n", Double.doubleToLongBits(b));
    System.out.printf("%016x\n", Double.doubleToLongBits(a+b));
    System.out.printf("%016x\n", Double.doubleToLongBits(1.0));
}

gives:

3fe0000000000000
3fdfffffffffffff
3ff0000000000000
3ff0000000000000

This is because 0.49999999999999994 has a smaller exponent than 0.5, so when they're added, its mantissa is shifted, and the ULP gets bigger.


1. At least, this is the definition given in the Java 6 docs. It's not given in the Java 7 docs, which may explain why people are seeing different behaviour when they run in Java 7. UPDATE: According to Simon Nickerson's answer, it's a known bug, so this almost certainly explains the difference in the docs and the observed behaviour between versions.
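
To make the effect concrete, here is a small sketch that compares the floor(x+0.5) definition quoted above with a variant that never adds 0.5 to x. The class and method names are hypothetical, the variant is only one possible way to avoid the rounding error (it is not claimed to be what any JDK does), and it ignores NaN and values outside the long range.

```java
final class RoundHalfDemo {

    // The definition quoted from the Java 6 docs: floor(x + 0.5).
    static long floorPlusHalf(double x) {
        return (long) Math.floor(x + 0.5);
    }

    // Hypothetical alternative: compare the fractional part with 0.5
    // instead of adding 0.5 to x, so the sum cannot round up to 1.0.
    static long roundHalfUp(double x) {
        double floor = Math.floor(x);
        return (long) ((x - floor >= 0.5) ? floor + 1 : floor);
    }

    public static void main(String[] args) {
        double x = 0.49999999999999994; // largest double below 0.5
        System.out.println(floorPlusHalf(x)); // prints 1
        System.out.println(roundHalfUp(x));   // prints 0
        System.out.println(roundHalfUp(0.5)); // prints 1
    }
}
```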

List<Map<String, String>> vs List<? extends Map<String, String>>

78 votes

Just wondering if there is any difference between this:

List<Map<String, String>>

and this:

List<? extends Map<String, String>>

If there is no difference, what is the benefit of using ? extends? I am kinda confused.

The difference is that, for example, a

List<HashMap<String,String>>

is a

List<? extends Map<String,String>>

but not a

List<Map<String,String>>

So:

void withWilds( List<? extends Map<String,String>> foo ){}
void noWilds( List<Map<String,String>> foo ){}

void main( String[] args ){
    List<HashMap<String,String>> myMap = new ArrayList<HashMap<String,String>>();

    withWilds( myMap ); // Works
    noWilds( myMap ); // Compiler error
}

You would think a List of HashMaps should be a List of Maps, but there's a good reason why it isn't:

Suppose you could do:

List<HashMap<String,String>> hashMaps = new ArrayList<HashMap<String,String>>();

List<Map<String,String>> maps = hashMaps; // Won't compile,
                                          // but imagine that it could

Map<String,String> aMap = Collections.singletonMap("foo","bar"); // Not a HashMap

maps.add( aMap ); // Perfectly legal (adding a Map to a List of Maps)

// But maps and hashMaps are the same object, so this should be the same as

hashMaps.add( aMap ); // Should be illegal (aMap is not a HashMap)

So this is why a List of HashMaps shouldn't be a List of Maps.
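
The other side of the trade-off can be sketched as follows: a method taking the wildcard type accepts a List of HashMaps, and in exchange the compiler refuses to let it add elements. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WildcardDemo {

    // Read-only use: works for a List of any kind of Map<String,String>.
    static int totalEntries(List<? extends Map<String, String>> maps) {
        int total = 0;
        for (Map<String, String> m : maps) {
            total += m.size();
        }
        // maps.add(new HashMap<String, String>());
        // ^ would not compile: the exact element type is unknown here,
        //   which is what keeps the caller's List<HashMap<...>> safe.
        return total;
    }

    public static void main(String[] args) {
        List<HashMap<String, String>> hashMaps = new ArrayList<HashMap<String, String>>();
        HashMap<String, String> first = new HashMap<String, String>();
        first.put("foo", "bar");
        hashMaps.add(first);
        hashMaps.add(new HashMap<String, String>());
        System.out.println(totalEntries(hashMaps)); // prints 1
    }
}
```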

Why inner class can override final method?

53 votes

I wondered if it makes sense to declare a private method as final as well, and I thought it doesn't make sense. But I imagined there's an exclusive situation and wrote the code to figure it out:

public class Boom {

    private void touchMe() {
        System.out.println("super::I am not overridable!");
    }

    private class Inner extends Boom {

        private void touchMe() {
            super.touchMe();
            System.out.println("sub::You suck! I overrided you!");
        }
    }

    public static void main(String... args) {
        Boom boom = new Boom();
        Boom.Inner inner = boom.new Inner();
        inner.touchMe();
    }
}

It compiled and worked. "I should make touchMe() final" I thought and did it:

public class Boom {

    private final void touchMe() {
        System.out.println("super::I am not overridable!");
    }

    private class Inner extends Boom {

        private void touchMe() {
            super.touchMe();
            System.out.println("sub::You suck! I overrided you!");
        }
    }

    public static void main(String... args) {
        Boom boom = new Boom();
        Boom.Inner inner = boom.new Inner();
        inner.touchMe();
    }
}

and it also works and tells me

chicout@chicout-linlap:~$ java Boom
super::I am not overridable!
sub::You suck! I overrided you!

why?

Private methods can not be overridden (private methods are not inherited!) In fact, it makes no difference if you declare a private method final or not.

The two methods you have declared, Boom.touchMe and Boom.Inner.touchMe are two completely separate methods which just happen to share the same identifier. The fact that super.touchMe refers to a different method than touchMe, is just because Boom.Inner.touchMe shadows Boom.touchMe (and not because it overrides it).

This can be demonstrated in a number of ways:

  • As you discovered yourself, if you change the methods to be public, the compiler will complain because you are suddenly trying to override a final method.

  • If you keep the methods private and add the @Override annotation, the compiler will complain.

  • As alpian points out, if you cast the Boom.Inner object to a Boom object (((Boom) inner).touchMe()) the Boom.touchMe is called (if it indeed was overridden, the cast wouldn't matter).
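
The last point can be checked with a small self-contained sketch. The names below are hypothetical; nested classes are used only so that the private methods are reachable from one place.

```java
final class PrivateShadowDemo {

    static class Base {
        private String who() { return "Base"; }

        // A call to a private method is bound to Base.who() at compile time.
        String describe() { return who(); }
    }

    static class Derived extends Base {
        // Same name, but a separate method: it does not override Base.who().
        private String who() { return "Derived"; }
    }

    static String viaInheritedMethod() { return new Derived().describe(); }

    static String viaCast() {
        Derived d = new Derived();
        return ((Base) d).who();
    }

    static String viaDerivedType() { return new Derived().who(); }

    public static void main(String[] args) {
        System.out.println(viaInheritedMethod()); // prints Base
        System.out.println(viaCast());            // prints Base
        System.out.println(viaDerivedType());     // prints Derived
    }
}
```

If who() were overridden, all three calls would print Derived.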

Seeking clarification on apparent contradictions regarding weakly typed languages

52 votes

I think I understand strong typing, but every time I look for examples for what is weak typing I end up finding examples of programming languages that simply coerce/convert types automatically.

For instance, this article named Typing: Strong vs. Weak, Static vs. Dynamic says that Python is strongly typed because you get an exception if you try the following:

Python

1 + "1"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'str'

However, such a thing is possible in Java and in C#, and we do not consider them weakly typed just for that.

Java

  int a = 10;
  String b = "b";
  String result = a + b;
  System.out.println(result);

C#

int a = 10;
string b = "b";
string c = a + b;
Console.WriteLine(c);

In another article, named Weakly Type Languages, the author says that Perl is weakly typed simply because I can concatenate a string to a number and vice versa without any explicit conversion.

Perl

$a=10;
$b="a";
$c=$a.$b;
print $c; #10a

So the same example makes Perl weakly typed, but not Java and C#?

Gee, this is confusing.

The authors seem to imply that a language that prevents the application of certain operations on values of different types is strongly typed and the contrary means weakly typed.

Therefore, at some point I felt prompted to believe that a language that provides a lot of automatic conversions or coercions between types (as Perl does) may end up being considered weakly typed, whereas other languages that provide only a few conversions may end up being considered strongly typed.

I am inclined to believe, though, that I must be wrong in this interpretation; I just do not know why or how to explain it.

So, my questions are:

  • What does it really mean for a language to be truly weakly typed?
  • Could you mention any good examples of weak typing that are not related to automatic conversion/automatic coercion done by the language?
  • Can a language be weakly typed and strongly typed at the same time?

Thanks in advance for any references, use cases or examples that can lead me in the right direction.

What does it really mean for a language to be "weakly typed"?

It means "this language uses a type system that I find distasteful". A "strongly typed" language by contrast is a language with a type system that I find pleasant.

The terms are essentially meaningless and you should avoid them. Wikipedia lists eleven different meanings for "strongly typed", several of which are contradictory. This indicates that the odds of confusion being created are high in any conversation involving the term "strongly typed" or "weakly typed".

All that you can really say with any certainty is that a "strongly typed" language under discussion has some additional restriction in the type system, either at runtime or compile time, that a "weakly typed" language under discussion lacks. What that restriction might be cannot be determined without further context.

Instead of using "strongly typed" and "weakly typed", you should describe in detail what kind of type safety you mean. For example, C# is a statically typed language and a type safe language and a memory safe language, for the most part. C# allows all three of those forms of "strong" typing to be violated. The cast operator violates static typing; it says to the compiler "I know more about the runtime type of this expression than you do". If the developer is wrong, then the runtime will throw an exception in order to protect type safety. If the developer wishes to break type safety or memory safety, they can do so by turning off the type safety system by making an "unsafe" block. In an unsafe block you can use pointer magic to treat an int as a float (violating type safety) or to write to memory you do not own. (Violating memory safety.)
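
Java behaves in a comparable way for casts, so the point about the cast operator can be sketched in Java rather than C# (this is an analogy under that assumption, not a statement about C# itself; the names are hypothetical). The compiler accepts the cast on the developer's word, and the runtime rejects it if the word was wrong.

```java
final class CastDemo {

    // Returns true if the runtime rejected the cast to Integer.
    static boolean isCastRejected(Object value) {
        try {
            // Compiles for any Object: "I know more about the runtime type than you do".
            Integer number = (Integer) value;
            return false;
        } catch (ClassCastException e) {
            // The runtime check protects type safety when the claim was wrong.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isCastRejected(10));    // prints false
        System.out.println(isCastRejected("ten")); // prints true
    }
}
```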

C# imposes type restrictions that are checked at both compile-time and at runtime, thereby making it a "strongly typed" language compared to languages that do less compile-time checking or less runtime checking. C# also allows you to in special circumstances do an end-run around those restrictions, making it a "weakly typed" language compared with languages which do not allow you to do such an end-run.

Which is it really? It is impossible to say; it depends on the point of view of the speaker and their attitude towards the various language features.

What makes reference comparison (==) work for some strings in Java?

35 votes

I have the following lines of code to compare Strings. str1 is not equal to str2, which is understandable since == compares object references. But then why is s1 equal to s2?

String s1 = "abc";
String s2 = "abc";

String str1 = new String("abc");
String str2 = new String("abc");

if (s1==s2)
    System.out.println("s1==s2");           
else
    System.out.println("s1!=s2");

if (str1==str2)
    System.out.println("str1==str2");           
else
    System.out.println("str1!=str2");

if (s1==str1)
    System.out.println("str1==s1");         
else
    System.out.println("str1!=s1");

Output:

  s1==s2
  str1!=str2
  str1!=s1 

The string constant pool will essentially cache all string literals so they're the same object underneath, which is why you see the output you do for s1==s2. It's essentially an optimisation in the VM to avoid creating a new string object each time a literal is declared, which could get very expensive very quickly! With your str1==str2 example, you're explicitly telling the VM to create new string objects, hence why it's false.

As an aside, calling the intern() method on any string will add it to the constant pool if an equal string is not already there, and return the pooled String. It's not necessarily a good idea to do this, however, unless you're sure you're dealing with strings that will definitely be used as constants; otherwise you may end up creating hard-to-track-down memory leaks.
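
A short sketch of the intern() behaviour described above (the class name and helper are made up for the example):

```java
final class InternDemo {

    // Reference comparison, kept in a helper so the intent is explicit.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "abc";
        String copy = new String("abc");

        System.out.println(sameObject(literal, copy));          // prints false
        System.out.println(sameObject(literal, copy.intern())); // prints true
        System.out.println(literal.equals(copy));               // prints true
    }
}
```

For everyday comparisons equals() remains the right tool; the reference results above only show where the objects live.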

How to count the number of 1's a number will have in binary?

29 votes

Possible Duplicate:
Best algorithm to count the number of set bits in a 32-bit integer?

How do I count the number of 1's a number will have in binary?

So let's say I have the number 45, which is equal to 101101 in binary and has 4 1's in it. What's the most efficient way to write an algorithm to do this?

Instead of writing an algorithm to do this, it's best to use the built-in function Integer.bitCount().

What makes this especially efficient is that the JVM can treat it as an intrinsic, i.e. recognise the call and replace the whole thing with a single machine code instruction on a platform that supports it, e.g. Intel/AMD.

To demonstrate how effective this optimisation is:

public static void main(String... args) {
    perfTestIntrinsic();

    perfTestACopy();
}

private static void perfTestIntrinsic() {
    long start = System.nanoTime();
    long countBits = 0;
    for (int i = 0; i < Integer.MAX_VALUE; i++)
        countBits += Integer.bitCount(i);
    long time = System.nanoTime() - start;
    System.out.printf("Intrinsic: Each bit count took %.1f ns, countBits=%d%n", (double) time / Integer.MAX_VALUE, countBits);
}

private static void perfTestACopy() {
    long start2 = System.nanoTime();
    long countBits2 = 0;
    for (int i = 0; i < Integer.MAX_VALUE; i++)
        countBits2 += myBitCount(i);
    long time2 = System.nanoTime() - start2;
    System.out.printf("Copy of same code: Each bit count took %.1f ns, countBits=%d%n", (double) time2 / Integer.MAX_VALUE, countBits2);
}

// Copied from Integer.bitCount()
public static int myBitCount(int i) {
    // HD, Figure 5-2
    i = i - ((i >>> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
    i = (i + (i >>> 4)) & 0x0f0f0f0f;
    i = i + (i >>> 8);
    i = i + (i >>> 16);
    return i & 0x3f;
}

prints

Intrinsic: Each bit count took 0.4 ns, countBits=33285996513
Copy of same code: Each bit count took 2.4 ns, countBits=33285996513

Each bit count using the intrinsic version and loop takes just 0.4 nanoseconds on average. Using a copy of the same code takes 6x longer (and gets the same result).
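
For the number from the question, a quick correctness check might look like this. The hand-written loop (clearing the lowest set bit each round) is included only for comparison; it is not the code from Integer.bitCount(), and the names are hypothetical.

```java
final class BitCountDemo {

    // Clears the lowest set bit on each iteration, so it loops once per 1 bit.
    static int countBitsLoop(int value) {
        int count = 0;
        while (value != 0) {
            value &= value - 1;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(45)); // prints 101101
        System.out.println(Integer.bitCount(45));       // prints 4
        System.out.println(countBitsLoop(45));          // prints 4
    }
}
```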

Why are strings immutable in many programming languages?

20 votes

Possible Duplicate:
Why can't strings be mutable in Java and .NET?
Why .NET String is immutable?

Several languages have chosen this design, such as C#, Java, C++, and Python. If it is intended to save memory or gain efficiency for operations like compare, what effect does it have on concatenation and other modifying operations?

Immutable types are a good thing generally:

  • They work better for concurrency (you don't need to lock something that can't change!)
  • They reduce errors: mutable objects are vulnerable to being changed when you don't expect it which can introduce all kinds of strange bugs ("action at a distance")
  • They can be safely shared (i.e. multiple references to the same object) which can reduce memory consumption and improve cache utilisation.
  • Sharing also makes copying a very cheap O(1) operation when it would be O(n) if you have to take a defensive copy of a mutable object. This is a big deal because copying is an incredibly common operation (e.g. whenever you want to pass parameters around....)

As a result, it's a pretty reasonable language design choice to make strings immutable.

Some languages (particularly functional languages like Haskell and Clojure) go even further and make pretty much everything immutable. This enlightening video is very much worth a look if you are interested in the benefits of immutability.

There are a couple of minor downsides for immutable types:

  • Operations that create a changed string like concatenation are more expensive because you need to construct new objects. Typically the cost is O(n+m) for concatenating two immutable Strings, though it can go as low as O(log (m+n)) if you use a tree-based string data structure like a Rope. Plus you can always use special tools like Java's StringBuilder if you really need to concatenate Strings efficiently.
  • A small change on a large string can result in the need to construct a completely new copy of the large String, which obviously increases memory consumption. Note however that this isn't usually a big issue in garbage-collected languages since the old copy will get garbage collected pretty quickly if you don't keep a reference to it.

Overall though, the advantages of immutability vastly outweigh the minor disadvantages. Even if you are only interested in performance, the concurrency advantages and cheapness of copying will in general make immutable strings much more performant than mutable ones with locking and defensive copying.
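
The concatenation cost and the StringBuilder workaround mentioned above can be illustrated with a sketch like the following (hypothetical names). Both methods return the same text; the first builds a new String on every iteration, the second appends into one mutable buffer and creates a single String at the end.

```java
final class ConcatDemo {

    // Each += allocates a new String and copies everything built so far.
    static String joinNaive(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // One growing buffer, one final String.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"im", "mut", "able"};
        System.out.println(joinNaive(parts));       // prints immutable
        System.out.println(joinWithBuilder(parts)); // prints immutable
    }
}
```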

JVM does not work as expected with JNI C++ code containing a class named "Node"

20 votes

Some teammates and I have been unable to understand why the following snippet of code does not give the correct output when using JVM versions 1.6u23 through 1.6u31 (the latest as of this posting). This code snippet represents a simplification of a larger problem:

UPDATE: Modified the example slightly to put focus on the issue that "virtual_function()" does not seem to get called.

UPDATE: Simplified the example even more based on comments to-date.

NodeTester.cpp:

#include <iostream>
#include <jni.h>

class Node {
  public:
    Node () :m_counter(0) {}
    virtual ~Node () {}

    virtual void virtual_function () {
      m_counter += 10;
    }

    void non_virtual_function () {
      m_counter += 1;
    }

    int get_counter () {
      return m_counter;
    }

  private:
    int m_counter;

};

extern "C" {
  JNIEXPORT void JNICALL Java_NodeTester_testNode (JNIEnv *jni_env_rptr, 
                                                   jclass java_class) {
    Node *node_rptr = new Node();
    node_rptr->non_virtual_function();
    node_rptr->virtual_function();

    std::cout << node_rptr->get_counter() << std::endl;

    delete node_rptr;
  }
}

NodeTester.java:

public class NodeTester {
  public static native void testNode ();

  static {
    System.loadLibrary("nodetester");
  }

  public static final void main (String[] args) {
    NodeTester.testNode();
  }
}

expected output:

11

actual output with JVM 1.6u23 through 1.6u31:

1

It seems like the JVM is incorrectly constructing the "Node" object within JNI; although it's possible that this code has something incorrect about its use of JNI. When the class "Node" gets more functionality added to it (e.g. more attributes, additional virtual and non-virtual operations), we can cause a segmentation fault, rather than just incorrect output. We're compiling the cpp code into a RedHat linux 64-bit shared object library using g++, and running the java code with the 64-bit Server VM. Note that on JVMs 1.6u20 through 1.6u22, this produces expected output. I haven't tried any earlier versions.

We've decided to put a bounty on this question! Here's more information on what we already know:

  • JVMs 1.6u22 (and prior) produce expected results
  • Renaming "Node" or putting it in a namespace produces expected results
  • Allocating a "Node" object on the stack instead of the heap in the JNI function produces expected results
  • There are no issues with non-virtual components of the class "Node"

Unfortunately for us, none of these items lead to viable solutions - the "larger problem" I alluded to was that we're dealing with a large, existing code base with a C++ class named "Node", which we need to access via JNI. We also tried several g++ and javac compiler options, and several JVM options, to no avail (although if someone stumbles on one that actually yields expected results, this would be an acceptable solution).

Ok, this is not a perfect answer, but if we have nothing better, the following may help. As explained a bit in the other comments, the crux of the problem lies in two distinct C++ classes both named Node in the global namespace, one from OpenJDK or SunJDK 1.6u23 and up on RedHat Linux (at least) and another from another library, both of which need to have their symbols shared with other libraries. To get our symbols loaded before the JDK, we may set the LD_PRELOAD environment variable, by calling e.g.:

LD_PRELOAD=libTheNodeTester.so java ...

But this may crash the JDK, if it actually starts using our symbols as if they were the ones from its own libraries...

The most efficient way to search for an array of strings in another string

16 votes

I have a large array of strings that looks something like this: String temp[] = new String[200000].

I have another String, let's call it bigtext. What I need to do is iterate through each entry of temp, checking to see if that entry is found in bigtext and then do some work based on it. So, the skeletal code looks something like this:

for (int x = 0; x < temp.length; x++) {
  if (bigtext.indexOf(temp[x]) > -1) {
    // do some stuff
  }
}

Because there are so many entries in temp and there are many instances of bigtext as well, I want to do this in the most efficient way. I am wondering if what I've outlined is the most efficient way to perform this search or if there are better ways to do this.

Thanks,

Elliott

I think you're looking for an algorithm like Rabin-Karp or Aho–Corasick, which are designed to search a text for a large number of substrings at once.
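
As a rough illustration of the second suggestion, here is a compact Aho–Corasick sketch. It is a teaching version with hypothetical names and HashMap-based nodes, not a tuned implementation: the automaton is built once from all patterns, and each text is then scanned in a single pass.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

final class AhoCorasickSketch {

    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        final List<String> outputs = new ArrayList<>(); // patterns ending here
        Node fail;                                      // longest proper suffix state
    }

    private final Node root = new Node();

    AhoCorasickSketch(List<String> patterns) {
        // 1. Build a trie of all patterns.
        for (String p : patterns) {
            if (p.isEmpty()) continue;
            Node n = root;
            for (char c : p.toCharArray()) {
                n = n.next.computeIfAbsent(c, k -> new Node());
            }
            n.outputs.add(p);
        }
        // 2. Breadth-first pass to set failure links and inherit outputs.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node cur = queue.remove();
            for (Map.Entry<Character, Node> e : cur.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = cur.fail;
                while (f != root && !f.next.containsKey(c)) f = f.fail;
                Node target = f.next.get(c);
                child.fail = (target != null) ? target : root;
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    // Returns every pattern that occurs somewhere in text.
    Set<String> findAll(String text) {
        Set<String> found = new LinkedHashSet<>();
        Node state = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (state != root && !state.next.containsKey(c)) state = state.fail;
            Node nxt = state.next.get(c);
            state = (nxt != null) ? nxt : root;
            found.addAll(state.outputs);
        }
        return found;
    }
}
```

In the question's terms, the automaton would be built once from temp and findAll would then be called for each bigtext, instead of running indexOf 200000 times per text.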

Android vibration intensity and damage

14 votes

I realize this is not strictly a code question, but I guess it belongs here anyway. If not, my apologies in advance.

Since there's no built-in way to change the vibration intensity for the droid in code, I'm using a kind of PWM control (switching the vibrator on and off at high frequency gives me a kind of control over vibration intensity). Right now I'm using a 20ms period (for example, with a 50% duty cycle the vibrator is on for 10ms and off for 10ms, and it kind of feels like half power).

My question is, can this kind of control damage the vibrator motor?

Thanks!

I'm no engineer, but we're in luck because there is one sitting next to me. Apparently a component's life span depends partly on how often its state changes and partly on how long it is in use. So yes, doing what you're talking about will stress the device in one way, by making something go from 0% to 100% and back again very rapidly, but relieve some stress by only having it on half the time. Overall, what you're describing shouldn't do any harm that would shorten the Android's life span, as long as this pattern isn't intended to run for very long. I would definitely suggest getting in touch with someone who knows the mechanical part of the device more intimately, because every device is different and general knowledge doesn't always translate into spot-on specific knowledge.
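
For reference, the on/off timing from the question can be computed as in the sketch below. It is plain Java with hypothetical names. It assumes the pattern is handed to Android's Vibrator.vibrate(long[] pattern, int repeat), whose array starts with an initial delay followed by alternating on and off durations; the question does not say which API is actually used.

```java
import java.util.Arrays;

final class PwmPattern {

    // Builds {initialDelay, onMs, offMs} for one PWM period.
    static long[] pattern(long periodMs, double dutyCycle) {
        if (periodMs <= 0 || dutyCycle < 0.0 || dutyCycle > 1.0) {
            throw new IllegalArgumentException("period must be > 0 and duty cycle in [0, 1]");
        }
        long onMs = Math.round(periodMs * dutyCycle);
        long offMs = periodMs - onMs;
        return new long[] {0L, onMs, offMs};
    }

    public static void main(String[] args) {
        // 20 ms period at 50% duty cycle, as in the question.
        System.out.println(Arrays.toString(pattern(20, 0.5))); // prints [0, 10, 10]
        // On a device this array could be passed as: vibrator.vibrate(pattern, 0)
        // (assumption: repeat index 0 loops the pattern until cancel() is called).
    }
}
```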

How to implement a temporal table using JPA?

13 votes

I would like to know how to implement temporal tables in JPA 2 with EclipseLink. By temporal I mean tables that define a validity period.
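
To pin down what "validity period" means here, a plain-Java sketch with no JPA annotations may help. The class name, the field names and the half-open interval convention are all assumptions made for illustration and are not part of the model below.

```java
import java.time.LocalDate;

final class ValidityPeriod {

    private final LocalDate validFrom; // inclusive
    private final LocalDate validTo;   // exclusive; null means "still valid"

    ValidityPeriod(LocalDate validFrom, LocalDate validTo) {
        if (validFrom == null) {
            throw new IllegalArgumentException("validFrom is required");
        }
        if (validTo != null && !validTo.isAfter(validFrom)) {
            throw new IllegalArgumentException("validTo must be after validFrom");
        }
        this.validFrom = validFrom;
        this.validTo = validTo;
    }

    // True if the row carrying this period is the valid version on the given date.
    boolean contains(LocalDate date) {
        return !date.isBefore(validFrom) && (validTo == null || date.isBefore(validTo));
    }

    public static void main(String[] args) {
        ValidityPeriod season = new ValidityPeriod(LocalDate.of(2011, 8, 1), LocalDate.of(2012, 6, 1));
        System.out.println(season.contains(LocalDate.of(2012, 3, 15))); // prints true
        System.out.println(season.contains(LocalDate.of(2012, 6, 1)));  // prints false
    }
}
```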

One problem that I'm facing is that referencing tables can no longer have foreign key constraints to the referenced (temporal) tables, because the primary keys of those tables now include the validity period.

  • How would I map the relationships of my entities?
  • Would that mean that my entities can no longer have a relationship to those valid-time entities?
  • Should I now take responsibility for initializing those relationships manually, in some kind of Service or specialized DAO?

The only thing I've found is a framework called DAO Fusion which deals with this.

  • Are there any other ways to solve this?
  • Could you provide an example or resources about this topic (JPA with temporal databases)?

Here is a fictional example of a data model and its classes. It starts as a simple model that doesn't have to deal with temporal aspects:

1st Scenario: Non Temporal Model

Data Model: [diagram: Non Temporal Data Model]

Team:

@Entity
public class Team implements Serializable {

    private Long id;
    private String name;
    private Integer wins = 0;
    private Integer losses = 0;
    private Integer draws = 0;
    private List<Player> players = new ArrayList<Player>();

    public Team() {

    }

    public Team(String name) {
        this.name = name;
    }


    @Id
    @GeneratedValue(strategy=GenerationType.SEQUENCE, generator="SEQTEAMID")
    @SequenceGenerator(name="SEQTEAMID", sequenceName="SEQTEAMID", allocationSize=1)
    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    @Column(unique=true, nullable=false)
    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Integer getWins() {
        return wins;
    }

    public void setWins(Integer wins) {
        this.wins = wins;
    }

    public Integer getLosses() {
        return losses;
    }

    public void setLosses(Integer losses) {
        this.losses = losses;
    }

    public Integer getDraws() {
        return draws;
    }

    public void setDraws(Integer draws) {
        this.draws = draws;
    }

    @OneToMany(mappedBy="team", cascade=CascadeType.ALL)
    public List<Player> getPlayers() {
        return players;
    }

    public void setPlayers(List<Player> players) {
        this.players = players;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Team other = (Team) obj;
        if (name == null) {
            if (other.name != null)
                return false;
        } else if (!name.equals(other.name))
            return false;
        return true;
    }


}

Player:

@Entity
@Table(uniqueConstraints={@UniqueConstraint(columnNames={"team_id","number"})})
public class Player implements Serializable {

    private Long id;
    private Team team;
    private Integer number;
    private String name;

    public Player() {

    }

    public Player(Team team, Integer number) {
        this.team = team;
        this.number = number;
    }

    @Id
    @GeneratedValue(strategy=GenerationType.SEQUENCE, generator="SEQPLAYERID")
    @SequenceGenerator(name="SEQPLAYERID", sequenceName="SEQPLAYERID", allocationSize=1)
    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    @ManyToOne
    @JoinColumn(nullable=false)
    public Team getTeam() {
        return team;
    }

    public void setTeam(Team team) {
        this.team = team;
    }

    @Column(nullable=false)
    public Integer getNumber() {
        return number;
    }

    public void setNumber(Integer number) {
        this.number = number;
    }

    @Column(unique=true, nullable=false)
    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((number == null) ? 0 : number.hashCode());
        result = prime * result + ((team == null) ? 0 : team.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Player other = (Player) obj;
        if (number == null) {
            if (other.number != null)
                return false;
        } else if (!number.equals(other.number))
            return false;
        if (team == null) {
            if (other.team != null)
                return false;
        } else if (!team.equals(other.team))
            return false;
        return true;
    }


}

Test class:

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration({"/META-INF/application-context-root.xml"})
@Transactional
public class TestingDao {

    @PersistenceContext
    private EntityManager entityManager;
    private Team team;

    @Before
    public void setUp() {
        team = new Team();
        team.setName("The Goods");
        team.setLosses(0);
        team.setWins(0);
        team.setDraws(0);

        Player player = new Player();
        player.setTeam(team);
        player.setNumber(1);
        player.setName("Alfredo");
        team.getPlayers().add(player);

        player = new Player();
        player.setTeam(team);
        player.setNumber(2);
        player.setName("Jorge");
        team.getPlayers().add(player);

        entityManager.persist(team);
        entityManager.flush();
    }

    @Test
    public void testPersistence() {
        String strQuery = "select t from Team t where t.name = :name";
        TypedQuery<Team> query = entityManager.createQuery(strQuery, Team.class);
        query.setParameter("name", team.getName());
        Team persistedTeam = query.getSingleResult();
        assertEquals(2, persistedTeam.getPlayers().size()); 

        //Change the player number
        Player p = null;
        for (Player player : persistedTeam.getPlayers()) {
            if (player.getName().equals("Alfredo")) {
                p = player;
                break;
            }
        }
        p.setNumber(10);        
    }


}

Now you are asked to keep a history of how Team and Player looked at a given point in time, so you need to add a validity period to each table that has to be tracked. Let's add these temporal columns, starting with just Player.

2nd Scenario: Temporal Model

Data Model: Temporal Data Model

As you can see we had to drop the primary key and define another one that includes the dates (period). Also we had to drop the unique constraints because now they can be repeated in the table. Now the table can contain the current entries and also the history.

Things get pretty ugly if we also have to make Team temporal: in that case we would need to drop the foreign key constraint from the Player table to Team. The problem is how to model that in Java and JPA.

Note that ID is a surrogate key. But now the surrogate keys have to include the date, because otherwise the table could not store more than one "version" of the same entity along the timeline.

I am very interested in this topic. I have been working for several years on applications that use these patterns; in our case the idea came from a German diploma thesis.

I didn't know the "DAO Fusion" framework; it provides interesting information and links, thanks for pointing it out. The pattern page and the aspects page are especially good!

To your questions: no, I cannot point to other sites, examples or frameworks. I am afraid that you have to either use the DAO Fusion framework or implement this functionality yourself. You have to decide which kind of functionality you really need. In terms of the "DAO Fusion" framework: do you need both "valid temporal" and "record temporal"? Record temporal states when the change was applied to your database (usually used for auditing); valid temporal states when the change occurred in real life or is valid in real life (used by the application), which might differ from record temporal. In most cases one dimension is sufficient and the second dimension is not needed.

Anyway, temporal functionality has impacts on your database. As you stated: "which now their primary keys include the validity period". So how do you model the identity of an entity? I prefer the usage of surrogate keys. In that case this means:

  • one id for the entity
  • one id for the object in the database (the row)
  • the temporal columns

The primary key for the table is the object id. Each entity has one or more (1-n) entries in a table, identified by the object id. Linking between tables is based on the entity id. Since the temporal entries multiply the amount of data, standard relationships don't work. A standard 1-n relationship might become a x*1-y*n relationship.
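The split between a row id and an entity id can be sketched in plain Java without JPA. This is only an illustration under assumptions: the class names, the field names and the use of LocalDate.MAX as the "open" upper bound are hypothetical and not taken from DAO Fusion. A change closes the open version and appends a new row, so existing row ids never change while the entity id stays the same across versions.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** One row of a temporal table (all names here are hypothetical). */
class ResidenceRow {
    final long rowId;          // primary key of this row
    final long entityId;       // identifies the residence across all of its versions
    final String city;
    final LocalDate validFrom; // inclusive
    LocalDate validTo;         // exclusive; LocalDate.MAX means "still open"

    ResidenceRow(long rowId, long entityId, String city, LocalDate validFrom, LocalDate validTo) {
        this.rowId = rowId;
        this.entityId = entityId;
        this.city = city;
        this.validFrom = validFrom;
        this.validTo = validTo;
    }
}

/** In-memory stand-in for the table. */
class ResidenceTable {
    private final List<ResidenceRow> rows = new ArrayList<>();
    private long nextRowId = 1;

    /** Closes the open version of the entity (if any) and appends a new version. */
    ResidenceRow change(long entityId, String newCity, LocalDate changeDate) {
        for (ResidenceRow row : rows) {
            if (row.entityId == entityId && row.validTo.equals(LocalDate.MAX)) {
                row.validTo = changeDate; // the old row keeps its rowId
            }
        }
        ResidenceRow added = new ResidenceRow(nextRowId++, entityId, newCity, changeDate, LocalDate.MAX);
        rows.add(added);
        return added;
    }

    /** All versions of one entity, in insertion order. */
    List<ResidenceRow> versionsOf(long entityId) {
        List<ResidenceRow> result = new ArrayList<>();
        for (ResidenceRow row : rows) {
            if (row.entityId == entityId) {
                result.add(row);
            }
        }
        return result;
    }
}
```

Other tables would then reference entityId rather than rowId, which is why a plain foreign key constraint no longer fits.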

How do you solve this? The standard approach would be to introduce a mapping table, but this is not a natural approach. Just for editing one table (e.g. a residence change occurs) you would also have to update or insert into the mapping table, which is strange for every programmer.

The other approach is not to use a mapping table. In this case you cannot use referential integrity and foreign keys; each table stands on its own, and the linking from one table to the others must be implemented manually rather than with JPA functionality.

The functionality of initializing database objects should be within the objects (as in the DAO Fusion framework). I would not put it in a service. Whether you put it into a DAO or use the Active Record pattern is up to you.

I am aware that my answer doesn't provide you with a "ready to use" framework. You are in a very complicated area; in my experience, resources for this usage scenario are very hard to find. Thanks for your question! Anyway, I hope that I helped you with your design.

In this answer you will find the reference book "Developing Time-Oriented Database Applications in SQL", see http://stackoverflow.com/a/800516/734687

Update: Example

  • Question: Let's say that I have a PERSON table which has a surrogate key, a field named "id". Every referencing table at this point will have that "ID" as a foreign key constraint. If I add temporal columns now, I have to change the primary key to "id+from_date+to_date". Before changing the primary key I would first have to drop every foreign key constraint from every referencing table to the referenced table (Person). Am I right? I believe that's what you mean by the surrogate key. ID is a generated key that could be generated by a sequence. The business key of the Person table is the SSN.
  • Answer: Not exactly. SSN would be a natural key, which I do not use for object identity. Also "id+from_date+to_date" would be a composite key, which I would also avoid. If you look at the example you would have two tables, person and residence, and for our example say we have a 1-n relationship with a foreign key in residence. Now we add temporal fields to each table. Yes, we drop every foreign key constraint. Person will get 2 IDs: one ID to identify the row (call it ROW_ID) and one ID to identify the person itself (call it ENTITY_ID), with an index on that ID. The same goes for the residence. Of course your approach would work too, but in that case you would have operations which change the ROW_ID (when you close a time interval), which I would avoid.

To extend the example implemented with the assumptions above (2 tables, 1-n):

  • a query to show all entries in the database (all validity information and record - aka technical - information included):

    SELECT * FROM Person p, Residence r
    WHERE p.ENTITY_ID = r.FK_ENTITY_ID_PERSON          -- JOIN
  • a query to hide the record - aka technical - information. This shows all the validity changes of the entities.

    SELECT * FROM Person p, Residence r
    WHERE p.ENTITY_ID = r.FK_ENTITY_ID_PERSON AND
    p.recordTo=[infinity] AND r.recordTo=[infinity]    -- only current technical state
  • a query to show the actual values.

    SELECT * FROM Person p, Residence r
    WHERE p.ENTITY_ID = r.FK_ENTITY_ID_PERSON AND
    p.recordTo=[infinity] AND r.recordTo=[infinity] AND
    p.validFrom <= [now] AND p.validTo > [now] AND        -- only current valid state person
    r.validFrom <= [now] AND r.validTo > [now]            -- only current valid state residence

As you can see I never use the ROW_ID. Replace [now] with a timestamp to go back in time.
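The predicate of the last query can also be written out in Java, for example to filter rows that were loaded without the temporal conditions. The following is a minimal sketch for a single table, assuming Instant.MAX stands in for [infinity]; the class and field names are made up for illustration. For the Person/Residence join, the same check would be applied to both sides.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** One version of an entity with valid-time and record-time columns (hypothetical names). */
class TemporalRow {
    final long entityId;
    final String value;
    final Instant validFrom; // inclusive
    final Instant validTo;   // exclusive
    final Instant recordTo;  // Instant.MAX plays the role of [infinity]

    TemporalRow(long entityId, String value, Instant validFrom, Instant validTo, Instant recordTo) {
        this.entityId = entityId;
        this.value = value;
        this.validFrom = validFrom;
        this.validTo = validTo;
        this.recordTo = recordTo;
    }
}

class AsOfQuery {
    /** Keeps the rows of one entity that are technically current and valid at the given instant. */
    static List<TemporalRow> asOf(List<TemporalRow> rows, long entityId, Instant now) {
        List<TemporalRow> result = new ArrayList<>();
        for (TemporalRow row : rows) {
            boolean currentRecord = row.recordTo.equals(Instant.MAX);
            // validFrom <= now AND validTo > now, i.e. the half-open interval [validFrom, validTo)
            boolean validNow = !row.validFrom.isAfter(now) && row.validTo.isAfter(now);
            if (row.entityId == entityId && currentRecord && validNow) {
                result.add(row);
            }
        }
        return result;
    }
}
```

Passing an earlier instant instead of the current time gives the state as it was valid back then, which corresponds to replacing [now] in the query.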

Update to reflect your update
I would recommend the following data model:

Introduce a "PlaysInTeam" table:

  • ID
  • ID Team (foreign key to team)
  • ID Player (foreign key to player)
  • ValidFrom
  • ValidTo

When you list the players of a team, you have to query with the date for which the relationship is valid; that date has to be in [ValidFrom, ValidTo).

For making team temporal I have two approaches:

Approach 1: Introduce a "Season" table which models a validity for a season

  • ID
  • Season name (eg. Summer 2011)
  • From (maybe not necessary, because every one knows when the season is)
  • To (maybe not necessary, because every one knows when the season is)

Split the team table. You will have fields which belong to the team and which are not time relevant (name, address, ...) and fields which are time relevant for a season (win, loss, ...). In that case I would use Team and TeamInSeason. PlaysInTeam could link to TeamInSeason instead of Team (this has to be considered; I would let it point to Team).

TeamInSeason

  • ID
  • ID Team
  • ID Season
  • Win
  • Loss
  • ...

Approach 2: Do not model the season explicitly. Split the team table. You will have fields which belong to the team and which are not time relevant (name, address, ...) and fields which are time relevant (win, loss, ...). In that case I would use Team and TeamInterval. TeamInterval would have fields "from" and "to" for the interval. PlaysInTeam could link to TeamInterval instead of Team (I would leave it pointing to Team).

TeamInterval

  • ID
  • ID Team
  • From
  • To
  • Win
  • Loss
  • ...

In both approaches: if you do not need a separate team table for fields that are not time relevant, do not split.

Java UDP hole punching example - connecting through firewall

13 votes

Let's say I have two computers.

They know each other's public and private IPs via ice4j.

One client is listening and the other one is sending some string.

I'd like to see this happen via UDP hole punching:

Let A be the client requesting the connection

Let B be the client that is responding to the request

Let S be the ice4j STUN server that they contact to initiate the connection
--
A sends a connection request to S

S responds with B's IP and port info, and sends A's IP and port info to B

A sends a UDP packet to B, which B's router firewall drops but it still
punches a hole in A's own firewall where B can connect

B sends a UDP packet to A, that both punches a hole in their own firewall,
and reaches A through the hole that they punched in their own firewall

A and B can now communicate through their established connection without 
the help of S

Could any one post examples of how to gather External IP and Port of each client by using server S (it can be STUN server or any other server that would do this, even your own)?

It would be nice if you accounted for double NAT as well.

CONCLUSION:

You can use STUN to discover the IP and Port but you have to write your own code that would send the IP:Port to your server via keepalive technique.

Once one client identifies the other via unique ID on the server it will be provided with the other's client IP:port info to UDP hole punch the data it needs to send and receive.

This example is in C#, not in Java, but the concepts of NAT traversal are language-agnostic.

See Michael Lidgren's network library which has NAT traversal built in.

Link: http://code.google.com/p/lidgren-network-gen3/ Specific C# File Dealing with NAT Traversal: http://code.google.com/p/lidgren-network-gen3/source/browse/trunk/Lidgren.Network/NetNatIntroduction.cs

The process you've posted is correct. It will work, for only 3 out of 4 general types of NAT devices (I say general because NAT behavior isn't really standardized): Full-Cone NATs, Restricted-Cone NATs, and Port-Restricted-Cone NATs. NAT traversal will not work with Symmetric NATs, which are found mostly in corporate networks for enhanced security. If one party uses a Symmetric NAT and the other party doesn't, it's still possible to traverse the NAT but it requires more guesswork. A Symmetric NAT to Symmetric NAT traversal is extremely difficult - you can read a paper about it here.

But really, the process you've described works exactly. I've implemented it for my own remote screen sharing program (also in C#, unfortunately). Just make sure you've disabled Windows firewall (if you're using Windows) and third-party firewalls. But yes, I can happily confirm that it will work.

Clarifying the Process of NAT Traversal

I'm writing this update to clarify the process of NAT traversal for you and future readers. Hopefully, this can be a clear summary of the history and the process.

Some Reference Sources: http://think-like-a-computer.com/2011/09/16/types-of-nat/, and http://en.wikipedia.org/wiki/Network_address_translation, http://en.wikipedia.org/wiki/IPv4, http://en.wikipedia.org/wiki/IPv4_address_exhaustion.

IPv4 addresses, with the capacity to uniquely name approximately 4.3 billion computers, have run out. Smart people foresaw this problem, and, among other reasons, invented routers to combat IPv4 address exhaustion, by assigning a network of computers connected to itself 1 shared IP address.

There are LAN IPs. And then there are WAN IPs. LAN IPs are Local Area Network IPs which uniquely identify computers in a local network, say the desktops, laptops, printers, and smartphones connected to a home router. WAN IPs uniquely identify computers outside of the local area network in a wide area network - commonly taken to mean The Internet. So these routers assign a group of computers 1 WAN IP. Each computer still has its own LAN IP. LAN IPs are what you see when you type ipconfig in your Command Prompt and get IPv4 Address . . . . . . . . 192.168.1.101. WAN IPs are what you see when you connect to cmyip.com and get 128.120.196.204.

Just as the radio spectrum is bought out, so entire IP ranges are bought out and reserved as well by agencies and organizations, as well as port numbers. The short message is, again, that we don't have any more IPv4 addresses to spare.

What does this have to do with NAT traversal? Well, since routers were invented, direct connections (end-to-end connectivity) have been somewhat ... impossible, without a few hacks. If you have a network of 2 computers (Computer A and Computer B) both sharing the WAN IP of 128.120.196.204, to which computer does a connection go? I'm talking about an external computer (say google.com) initiating a connection to 128.120.196.204. The answer is: nobody knows, and neither does the router, which is why the router drops the connection. If Computer A initiates a connection to, say, google.com, then that's a different story. The router then remembers that Computer A with LAN IP 192.168.1.101 initiated a connection to 74.125.227.64 (google.com). As Computer A's request packet leaves the router, the router actually re-writes LAN IP 192.168.1.101 to the router's WAN IP of 128.120.196.204. So, when google.com receives Computer A's request packet, it sees the sender IP that the router re-wrote, not the LAN IP of Computer A (google.com sees 128.120.196.204 as the IP to reply to). When google.com finally replies, the packet reaches the router, the router remembers (it has a state table) that it was expecting a reply from google.com, and it appropriately forwards the packet to Computer A.

In other words, your router has no problem when you initiate the connection - your router will remember to forward the replying packet back to your computer (through that whole process described above). But when an external server initiates a connection to you, the router can't know which computer the connection was meant for, since Computer A and Computer B both share the WAN IP of 128.120.196.204 ... unless there's a clear rule that instructs the router to forward all packets originally going to destination port X to Computer A, destination port Y. This is known as port-forwarding. Unfortunately, if you're thinking of using port-forwarding for your networking applications, it's not practical, as your users may not understand how to enable it, and may be reluctant to enable it if they think it's a security risk. UPnP simply refers to the technology that allows you to programmatically enable port-forwarding. Unfortunately, if you're thinking of using UPnP to port-forward your networking applications, it's not practical either, as UPnP is not always available, and when it is, it may not be turned on by default.

So what's the solution then? The solution is to either proxy your entire traffic over your own computer (which you have carefully pre-configured to be globally reachable), or to come up with a way to beat the system. The first solution is (I believe) called TURN, and magically solves all connectivity issues at the price of providing a farm of servers with the available bandwidth. The second solution is called NAT traversal, and it's what we'll be exploring next.

Earlier, I described the process of an external server (say google.com) initiating a connection to 128.120.196.204. I said that, without the router having specific rules to understand which computer to forward google's connection request to, the router would simply drop the connection. This was a generalized scenario, and is not accurate because there are different types of NATs. (Note: A router is the actual physical device that you can drop on the floor. NAT (Network Address Translation) is a software process programmed into the router which helps save IPv4 addresses like trees). So, depending on which NAT the router employs, connection scenarios vary. A router may even combine NAT processes.

There are four types of NATs with standardized behavior: Full-Cone NATs, Restricted-Cone NATs, Port-Restricted-Cone NATs, and Symmetric NATs. Aside from these types, there can be other types of NATs with non-standardized behavior, but it's rarer.

Note: I'm not really too familiar with NATs...it seems like there are many ways of looking at routers, and information on the internet is very spread out on this topic. Classifying NATs by full, restricted, and port-restricted cones has been somewhat deprecated, says Wikipedia? There's something called static and dynamic NATs...just a bunch of various concepts that I can't reconcile together. Nevertheless, the following model worked for my own application. You can find out more about NATs by reading the links below and above and throughout this post. I can't post more about them because I don't really understand much about them.

Hoping for some network gurus to correct/add input, so that we can all learn more about this mysterious process.

To answer your question about gathering the external IP and Port of each client:

The headers of all UDP packets are structured the same, with one source IP and one source port. UDP packet headers do not contain an "internal" source IP and an "external" source IP. UDP packet headers only contain one source IP. If you want to get an "internal" and "external" source IP, you need to actually send the internal source IP as part of your payload. But it doesn't sound like you need an internal source IP and port. It sounds like you only need an external IP and port, as your question stated. That means the solution is simply to read the source IP and port off the packet, like the fields they are.

Two scenarios below (they don't really explain anything else):

LAN Communication

Computer A has a LAN IP of 192.168.1.101. Computer B has a LAN IP of 192.168.1.102. Computer A sends a packet, from port 3000, to Computer B at port 6000. The source IP on the UDP packet will be 192.168.1.101. And that will be the only IP. "External" has no context here, because the network is purely a local area network. In this example, a wide area network (like the Internet) doesn't exist. About ports though, because I'm unsure about NATs, I'm not sure if the port inscribed on the packet will be 3000. The NAT device may re-write the packet's port from 3000 to something random like 49826. Either way, you should use whatever port inscribed on the packet to reply - it's what you're supposed to use to reply. So in this example of LAN communication, you need send only one IP - the LAN IP, because that's all that matters. You don't have to worry about the port - the router takes care of that for you. When you receive the packet, you gather the only IP and port simply by reading it off the packet.

WAN Communication

Computer A has a LAN IP, again, of 192.168.1.101. Computer B has a LAN IP, again, of 192.168.1.102. Both Computer A and Computer B will share a WAN IP of 128.120.196.204. Server S is a server, a globally reachable computer on, let's say, an Amazon EC2 server, with a WAN IP of 1.1.1.1. Server S may have a LAN IP, but it's irrelevant. Computer B is irrelevant too.

Computer A sends a packet, from port 3000, to Server S. On the way out of the router, the packet's source LAN IP from Computer A gets re-written to the WAN IP of the router. The router also re-writes the source port of 3000 to 32981. What does Server S see, in terms of the external IP and port? Server S sees 128.120.196.204 as the IP, not 192.168.1.101, and Server S sees 32981 as the port, not 3000. Although these aren't the original IP and port Computer A used to send the packet, these are the correct IP and port to reply to. When you receive the packet, you can only know the WAN IP and rewritten port. If that's what you want (you were asking for just the external IP and port), then you're set. Otherwise, if you also wanted the internal IP of the sender, you would need to have transmitted that as normal data separate from your header.

Code:

As stated above (below To answer your question about gathering the external IP), to gather the External IP and Port of each client, you simply read them off the packet. Each datagram sent always has the source IP and source port of the sender; you don't even need a fancy custom protocol because these two fields are always included - every single UDP packet must, by definition, have these two fields.

// Java language
// Assumes `socket` is an already-bound java.net.DatagramSocket
// Buffer for receiving incoming data
byte[] inboundDatagramBuffer = new byte[1024];
DatagramPacket inboundDatagram = new DatagramPacket(inboundDatagramBuffer, inboundDatagramBuffer.length);
// Actually receive the datagram (blocks until one arrives);
// this must happen before the source fields are read
socket.receive(inboundDatagram);
// Source IP address, filled in by receive()
InetAddress sourceAddress = inboundDatagram.getAddress();
// Source port, filled in by receive()
int sourcePort = inboundDatagram.getPort();

Note that getAddress() and getPort() mean different things depending on direction: on a datagram you are about to send, they return the destination you set with setAddress() and setPort(); on a datagram that receive() has filled in, they return the sender's IP and port. So read them only after receive() has returned, and on the server use them as the destination of the reply to the client. Please elaborate if this (getAddress() and getPort() not returning the source IP and port you expect) is your actual roadblock. This assumes the server is a "standard" UDP server (not a STUN server).

Further Update:

I read your update about "how to use STUN to take the IP and port from one client and give it to the other"? A STUN server isn't designed to exchange endpoints or perform NAT traversal. A STUN server is designed to tell you your public IP, public port, and type of NAT device (whether it's a Full-Cone NAT, Restricted-Cone NAT, or Port-Restricted Cone NAT). I'd call the middleman server responsible for exchanging endpoints and performing the actual NAT traversal the "introducer". In my personal project, I don't actually need to use STUN to perform NAT traversing. My "introducer" (the middleman server that introduces clients A and B) is a standard server listening for UDP datagrams. As both clients A and B register themselves with the introducer, the introducer reads off their public IP and port and private IP (in case they're on a LAN). The public IP is read off the datagram header, like for all standard UDP datagrams. The private IP is written as part of the datagram payload, and the introducer just reads it as part of the payload. So, about STUN's usefulness, you don't need to rely on STUN to get the public IP and public port of each of your clients - any connected socket can tell you this. I'd say STUN is useful only for determining what type of NAT device your client is under so that you know whether to perform NAT traversal (if the NAT device type is Full-Cone, Restricted, or Port-Restricted), or to perform all-out TURN traffic proxying (if the NAT device type is Symmetric).
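To make the introducer idea concrete, here is a minimal Java sketch of the registration and exchange step. It is an illustration under assumptions, not the code of my own project: the class name, the helper methods and the "publicIp:publicPort|privateIp" reply format are all made up. Each client sends its private IP as the payload; the introducer reads the public IP and port off the datagram itself and tells each client about the other.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class Introducer {
    /** Waits for two registrations, then sends each client the other's observed endpoint. */
    static void introducePair(DatagramSocket server) throws IOException {
        DatagramPacket first = receive(server);
        DatagramPacket second = receive(server);
        send(server, describe(second), first.getAddress(), first.getPort());
        send(server, describe(first), second.getAddress(), second.getPort());
    }

    /** Client side: registers by sending the private IP as the payload. */
    static void register(DatagramSocket client, String privateIp, InetAddress host, int port)
            throws IOException {
        send(client, privateIp, host, port);
    }

    /** Client side: reads the introducer's reply as text. */
    static String readText(DatagramSocket socket) throws IOException {
        DatagramPacket packet = receive(socket);
        return payload(packet);
    }

    /** Formats "publicIp:publicPort|privateIp" (a hypothetical format). */
    static String describe(DatagramPacket registration) {
        // Address and port come from the datagram itself; the private IP comes from the payload
        return registration.getAddress().getHostAddress() + ":" + registration.getPort()
                + "|" + payload(registration);
    }

    private static String payload(DatagramPacket packet) {
        return new String(packet.getData(), packet.getOffset(), packet.getLength(),
                StandardCharsets.UTF_8);
    }

    private static DatagramPacket receive(DatagramSocket socket) throws IOException {
        byte[] buffer = new byte[512];
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        socket.receive(packet); // blocks until a datagram arrives or the socket times out
        return packet;
    }

    private static void send(DatagramSocket socket, String text, InetAddress host, int port)
            throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        socket.send(new DatagramPacket(bytes, bytes.length, host, port));
    }
}
```

Once both clients have the other's endpoint, they would start sending datagrams to it directly, which is the hole punching step from the question. A real introducer would also need client IDs, retries and keepalives, since UDP gives no delivery guarantee.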

Please elaborate on your roadblock: if you want advice on best practices for designing an application messaging protocol, and advice on reading the fields off received messages in an orderly and systematic fashion (based on the comment you posted below), could you share your current method?

Java Switch Statement - Is "or"/"and" possible?

13 votes

I implemented a font system that finds out which letter to use via char switch statements. There are only capital letters in my font image. I need to make it so that, for example, 'a' and 'A' both have the same output. Instead of having 2x the amount of cases, could it be something like the following:

char c;

switch(c){
case 'a' & 'A': /*get the 'A' image*/; break;
case 'b' & 'B': /*get the 'B' image*/; break;
...
case 'z' & 'Z': /*get the 'Z' image*/; break;
}

Is this possible in java?

You can use switch-case fall through by omitting the break; statement.

char c = /* whatever */;

switch(c){
    case 'a':
    case 'A':
        //get the 'A' image;
        break;
    case 'b':
    case 'B':
        //get the 'B' image;
        break;
    ...
    case 'z':
    case 'Z':
        //get the 'Z' image;
        break;
}

...or you could just normalize to lower case or upper case before switching.

char c = Character.toUpperCase(/* whatever */);

switch(c){
    case 'A':
        //get the 'A' image;
        break;
    case 'B':
        //get the 'B' image;
        break;
    ...
    case 'Z':
        //get the 'Z' image;
        break;
}
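If the glyphs sit in the font image in alphabetical order (an assumption; the question does not say how the image is laid out), the switch can be avoided altogether by computing an index from the character. This is a different technique from fall-through, shown here only as a sketch with hypothetical names:

```java
class GlyphLookup {
    /** Returns 0 for 'a'/'A' through 25 for 'z'/'Z', or -1 for any other character. */
    static int glyphIndex(char c) {
        if (c >= 'a' && c <= 'z') {
            return c - 'a';
        }
        if (c >= 'A' && c <= 'Z') {
            return c - 'A';
        }
        return -1; // no glyph for this character
    }
}
```

The returned index could then be multiplied by the glyph width to find the position of the letter in the image.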

Memory consumption issues of a Java program

12 votes

I have a Java program that runs on my Ubuntu 10.04 machine and, without any user interaction, repeatedly queries a MySQL database and then constructs img- and txt-files according to the data read from the DB. It makes tens of thousands of queries and creates tens of thousands of files.

After some hours of running, the available memory on my machine including swap space is totally used up. I haven't started other programs and the processes running in the background don't consume much memory and don't really grow in consumption.

To find out what is allocating so much memory I wanted to analyse a heap dump, so I started the process with -Xms64m -Xmx128m -XX:+HeapDumpOnOutOfMemoryError.

To my surprise, the situation was the same as before, after some hours the program was allocating all of the swap which is way beyond the given max of 128m.

Another run debugged with VisualVM showed that the heap allocation never is beyond the max of 128m - when the allocated memory is approximating the max, a big part of it is released again (I assume by the garbage collector).

So, it cannot be a problem of a steadily growing heap.

When the memory is all used up:

free shows the following:

             total       used       free     shared    buffers     cached
Mem:       2060180    2004860      55320          0        848    1042908
-/+ buffers/cache:     961104    1099076
Swap:      3227640    3227640          0

top shows the following:

USER    VIRT    RES     SHR     COMMAND
[my_id] 504m    171m    4520    java
[my_id] 371m    162m    4368    java

(by far the two "biggest" processes and the only java processes running)

My first question is:

  • How can I find out on the OS level (e.g. with command line tools) what is allocating so much memory? top / htop hasn't helped me. In case of many, many tiny processes of the same type eating up the memory: is there a way to intelligently sum up similar processes? (I know that is probably off topic as it is a Linux/Ubuntu question, but my main problem may still be Java-related)

My old questions were:

  • Why isn't the memory consumption of my program given in the top output?
  • How can I find out what is allocating so much memory?
  • If the heap isn't the problem, is the only "allocating factor" the stack? (the stack shouldn't be a problem as there is no deep "method call depth")
  • What about external resources as DB connections?

As there was no activity after the day I asked the question (until March 23) and as I still couldn't find the cause for the memory consumption I "solved" the problem pragmatically.

The program causing the problem is basically a repetition of a "task" (i.e. querying a DB and then creating files). It is relatively easy to parameterize the program so that a certain subset of tasks is executed and not all of them.

So now I repeatedly run my program from a shell script, in each process executing only a set of tasks (parameterized through arguments). In the end, all tasks are being executed, but as a single process only processes a subset of tasks there are no memory issues any more.
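The splitting itself is simple arithmetic. A minimal sketch in Java, assuming the tasks can be addressed by a zero-based index (the class and method names are hypothetical and not taken from the actual program); each range would be handed to a separate run of the program as arguments:

```java
import java.util.ArrayList;
import java.util.List;

class TaskBatches {
    /** Splits task indices 0..totalTasks-1 into consecutive {start, end} ranges, end exclusive. */
    static List<int[]> split(int totalTasks, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < totalTasks; start += batchSize) {
            int end = Math.min(start + batchSize, totalTasks);
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }
}
```

Because every range runs in its own JVM process, whatever memory a run accumulates is returned to the OS when that process exits.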

For me that is a sufficient solution. If you have a similar problem and your program has a batch-like execution structure this may be a pragmatic approach.

When I find the time I will look into the new suggestions hopefully identifying the root cause (thanks for the help!).

Why is the difference in declaration of generic Lists?

12 votes

I want to declare two Lists: First is a list of Integers. I declare it as:

  List<Integer> ints= Arrays.asList(1,2,3);

It works fine.

Second is a list of Objects. I declare it as:

  List<Object> objs= Arrays.asList(1,2.13,"three");

But it gives an error in Eclipse as soon as I write it. The error is:

  Multiple markers at this line
- Type mismatch: cannot convert from List<Object&Comparable<?>&Serializable> to 
 List<Object>
- Type safety: A generic array of Object&Comparable<?>&Serializable is created for
       a varargs parameter

If I instead write

  List<Object> objs = Arrays.<Object>asList(1,2.13,"three");

It works fine.

I am not able to figure out the reason.

Look at this post on stackoverflow.

15.12.2.7 Inferring Type Arguments Based on Actual Arguments

A supertype constraint T :> X implies that the solution is one of supertypes of X. Given several such constraints on T, we can intersect the sets of supertypes implied by each of the constraints, since the type parameter must be a member of all of them. We can then choose the most specific type that is in the intersection

The most specific type common to String, Double and Integer is the intersection of the interfaces Comparable and Serializable. So when you write

Arrays.asList(1,2.13,"three"); 

the compiler infers T to be a type that implements Comparable<?> and Serializable. It is then as if you were writing

List<Object> objs = List<T extends Comparable<?>, Serializable>

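When you provide Object explicitly, no inference is made. Because generics are invariant, a List of the inferred type is not a List<Object>, which is why the assignment is rejected.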
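A minimal sketch of the explicit form, assuming a hypothetical class name TypeWitnessDemo. It relies only on the explicit type argument, so it should compile on the Java 6 and 7 compilers discussed here as well as on later ones (from Java 8 on, the compiler also takes the assignment target into account, so the original declaration is accepted there too).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; only Arrays.asList comes from the question.
class TypeWitnessDemo {
    // The explicit type argument fixes T to Object, so nothing is inferred
    // from the mixed Integer/Double/String arguments.
    static List<Object> mixed() {
        return Arrays.<Object>asList(1, 2.13, "three");
    }

    public static void main(String[] args) {
        // A wildcard list accepts whatever T the compiler infers.
        List<?> anything = Arrays.asList(1, 2.13, "three");
        System.out.println(mixed());          // prints [1, 2.13, three]
        System.out.println(anything.size());  // prints 3
    }
}
```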

How can I get array of elements, including missing elements, using XPath in XSLT?

10 votes

Given the following XML-compliant HTML:

<div>
 <a>a1</a>
 <b>b1</b>
</div>

<div>
 <b>b2</b>
</div>

<div>
 <a>a3</a>
 <b>b3</b>
 <c>c3</c>
</div>

doing //a will return:

[a1,a3]

The problem with the above is that the data from the third column is now in second place: when an a element is not found, it is skipped completely.

How can I write an XPath expression that returns all a elements like this:

[a1, null, a3]

The same goes for //c. I wonder if it's possible to get

[null, null, c3]

UPDATE: consider another scenario where there is no common parent <div>.

<h1>heading1</h1>
 <a>a1</a>
 <b>b1</b>


<h1>heading2</h1>
 <b>b2</b>


<h1>heading3</h1>
 <a>a3</a>
 <b>b3</b>
 <c>c3</c>

UPDATE: I am now able to use XSLT as well.

There is no null value in XPath. There's a semi-related question here which also explains this: http://www.velocityreviews.com/forums/t686805-xpath-query-to-return-null-values.html

Realistically, you've got three options:

  1. Don't use XPath at all.
  2. Use this: //a | //div[not(a)], which returns the div element if there is no a within it, and have your Java code treat any div that is returned as 'no a element present'. Depending on the context, this may even allow you to output something more useful if required, as you'll have access to the entire contents of the div, for example an error 'no a element found in div (some identifier)'.
  3. Preprocess your XML with an XSLT that inserts a elements in any div element that does not already have one with a suitable default.
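Option 2 could look roughly like this on the Java side. It is a sketch under assumptions: the class name MissingElements, the helper column and the wrapping root element are made up for illustration, and it relies on the JDK's built-in XPath implementation returning the union in document order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: one entry per div, null where the element is missing.
class MissingElements {
    // The question's markup, wrapped in an assumed root element.
    static final String XML =
        "<root>"
        + "<div><a>a1</a><b>b1</b></div>"
        + "<div><b>b2</b></div>"
        + "<div><a>a3</a><b>b3</b><c>c3</c></div>"
        + "</root>";

    static List<String> column(String xml, String name) {
        // Select the wanted elements plus every div that lacks one.
        String expr = "//" + name + " | //div[not(" + name + ")]";
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                // A returned div stands for "no such element in this div".
                values.add("div".equals(node.getNodeName()) ? null : node.getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(column(XML, "a")); // prints [a1, null, a3]
        System.out.println(column(XML, "c")); // prints [null, null, c3]
    }
}
```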

Your second case is a little tricky, and to be honest, I'd actually recommend not using XPath for it at all, but it can be done:

//a | //h1[not(following-sibling::a) or generate-id(.) != generate-id(following-sibling::a[1]/preceding-sibling::h1[1])]

This will match any a elements, or any h1 elements where no following a element exists before the next h1 element, or the end of the document. As Dimitre pointed out though, this only works if you're using it from within XSLT, as generate-id is an XSLT function.

If you're not using it from within XSLT, you can use this rather contrived formula:

//a | //h1[not(following-sibling::a) or count(. | preceding-sibling::h1) != count(following-sibling::a[1]/preceding-sibling::h1)]

It works by matching h1 elements where the count of itself and all preceding h1 elements is not the same as the count of all h1 elements preceding the next a. There may be a more efficient way of doing it in XPath, but if it's going to get any more contrived than that, I'd definitely recommend not using XPath at all.

error installing java on ubuntu 10 64bit

9 votes

EDIT

I added this note to explain why I keep this question here. I added "Android" as a keyword and I'd like to know whether someone else has tried to download the code and how it is possible to work around this problem. I fear that if I ask the Ubuntu community they will suggest that I use OpenJDK, but the question is: has anyone used that SDK to build the Android code?

ORIGINAL

Some time ago I downloaded the Android source code on Ubuntu 10 64bit. I had problems, but in the end I managed to get everything working. Now I'm trying to do it again on a fresh install of the same Ubuntu version, but I'm having a problem.

Although I followed the instructions here, I keep getting the error:

Package sun-java6-jdk is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or is only available from another source
E: Package sun-java6-jdk has no installation candidate

Googling gives you a lot of results that all give you the same solution:

sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"   
sudo apt-get update
sudo apt-get install sun-java6-jdk

I did it but it didn't work.

I'm running Ubuntu on a VM under VMWare.

I have also tried to add another source:

sudo add-apt-repository "deb-src http://archive.canonical.com/ubuntu lucid partner"

but it didn't help.

Maybe the answer is here:

Answer in SuperUser

but it is weird that there isn't any mention of it on the Android portal.

Ubuntu 10.04 Lucid Lynx has Java 6 packages available, but you first have to enable the partner repository before you can download them. To do so, edit the file /etc/apt/sources.list with the command:

gksudo gedit /etc/apt/sources.list

and uncomment these lines:

deb http://archive.canonical.com/ubuntu lucid partner
deb-src http://archive.canonical.com/ubuntu lucid partner

then you can update repositories and install Java 6 packages with:

sudo apt-get update
sudo apt-get install sun-java6-jdk

You don't need to add third-party repositories.

Which Windows GUI frameworks are currently worth learning?

8 votes

I'm planning to write a Windows app to help myself with some exploratory testing tasks (note taking, data generation, defect logging) and I've got stuck at the early stage of choosing a framework/language. My sole experience is with web development and from what I can see, WinForms, WPF, Silverlight, Swing etc are all simultaneously obsolete and thriving depending on who you ask.

While my main aim is to create the app, obviously I'd like to learn something useful while doing so rather than picking up skills with something that's never going to be seen on a project at work. Which Java or C# frameworks would people recommend learning?

Native Applications

For employment: Well, nowadays most companies (at least most companies in Oman and the UAE, where I live) are slowly migrating to the cloud. However, there are still some opportunities for native app development. The framework most in demand nowadays is (no, not WPF) Windows Forms!

Why plain old Windows Forms instead of the awesome WPF? One reason: legacy apps. Nowadays most companies only start small-scale GUI application projects, mainly business applications. For those, WPF is expensive: the companies already have a workforce experienced in Windows Forms and a lot of legacy code, whereas WPF would require a new code base, which is pretty risky. So the best thing to keep you employed is Windows Forms.

For new projects: However, if by 'worth learning' you mean new, ambitious and glamorous, then WPF may be the best choice for you. It depends on what your requirements are, really.

The Cloud

Now, for the cloud: Java FX and Silverlight are currently neck and neck. Java FX may have an edge since it supports a greater number of platforms, but Silverlight has all the power and resources of Microsoft behind it and is ideal for Windows Phone development.

Comparison

For comparison, here's what you get with each toolkit:

Windows Presentation Foundation:

  • The power and resources of Microsoft
  • Ideal for creating new Desktop Applications
  • Eye candy
  • Awesome API
  • XAML, best way to separate design from logic
  • Create Apps for the Cloud (but they only work on Windows with .NET though)
  • Windows Phone can run a subset of WPF

Windows Forms:

  • Used to possess the power and resources of Microsoft, now WPF has that
  • Ideal for maintaining legacy applications
  • A well-trained workforce, if you're an entrepreneur
  • Pretty mature API
  • Supports more platforms than WPF (through Mono)

Java FX:

  • Create Apps for the Cloud
  • Backed by Oracle
  • Pretty nice API
  • Cross-platform, runs on most PCs; smart phones are a problem.

Silverlight:

  • Create Apps for the Cloud
  • Backed by Microsoft
  • Pretty awesome API
  • XAML
  • Cross-platform, runs on Mac and PC, runs on Windows Phone.

GTK#:

  • Cross-platform, runs on most PCs, runs on no smart phone.
  • Backed by the Open-Source world
  • Endorsed by Mono
  • Ideal for creating Apps for Gnome.

Swing:

  • Cross-platform, runs on most PCs; smart phones are a problem.
  • Pretty mature
  • Ideal for creating 2D games, using Java2D

Conclusion

As you say:

While my main aim is to create the app, obviously I'd like to learn something useful while doing so rather than picking up skills with something that's never going to be seen on a project at work.

Well, the frameworks you are most likely to see at work (if you don't work for mainstream companies like Microsoft, Oracle, Google etc.) are Windows Forms and WPF. At least that's what most companies use here. So those are what I recommend. JavaFX and Silverlight also look like they have potential and may be used in the near future.

Print a carriage return in java on windows on console

6 votes

On my OS X machine, the following line gives me a nice and easy way to track the state of my loops:

for (int index = 0; index < 100; index++)
    for (int subIndex = index; subIndex < 100; subIndex++)
        System.out.print("\r" + index + "/" + subIndex + "       ");

But when I try to run the same thing on Windows, it prints newlines instead of carriage returns. How can I achieve the same simple method of tracking progress on Windows?

I ran the following statement from the command prompt and it worked:

System.out.println("This is Java"+'\r'+"That");

It gives me this output:

That is Java

That means the carriage return works as expected: "That" overwrites the first four characters of the line.

Note: I ran it on Windows 7 with JDK 7, using plain Notepad as the editor.

The problem lies with Eclipse: its console treats \r as a newline character and prints

This is Java
That

as output
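For reference, the progress loop from the question can be written as a small self-contained program to try from a real console window instead of the Eclipse console. This is a sketch: the class name ProgressLine and the helper render are made up for illustration, and fixed-width formatting replaces the trailing spaces used in the question so that a shorter line never leaves stale characters behind.

```java
// Hypothetical demo class; run it from cmd.exe or a terminal, not from Eclipse.
class ProgressLine {
    // Builds one status line; the leading \r moves the cursor back to column 0.
    static String render(int index, int subIndex) {
        return String.format("\r%3d/%3d", index, subIndex);
    }

    public static void main(String[] args) {
        for (int index = 0; index < 100; index++) {
            for (int subIndex = index; subIndex < 100; subIndex++) {
                // print, not println: the line is overwritten on the next pass.
                System.out.print(render(index, subIndex));
            }
        }
        // Finish with a newline so the shell prompt starts on a fresh line.
        System.out.println();
    }
}
```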