I have a "simple" 4 class example that reliably shows unexpected behavior from java synchronization on multiple machines. As you can read below, given the contract of the java sychronized keyword, Broke Synchronization should never be printed from the class TestBuffer.
Here are the 4 classes that will reproduce the issue (at least for me). I'm not interested in how to fix this broken example, but rather why it breaks in the first place.
And here is the output I get when I run it:
java -cp . SyncTest
Before Adding
Creating a TestBuffer
Before Remove
Broke Synchronization
1365192
Broke Synchronization
1365193
Broke Synchronization
1365194
Broke Synchronization
1365195
Broke Synchronization
1365196
Done
UPDATE: @Gray has the simplest example that breaks thus far. His example can be found here: Strange JRC Race Condition
Based on the feedback I've gotten from others, it looks like the issue may occur on Java 64-bit 1.6.0_20-1.6.0_31 (unsure about newer 1.6.0's) on Windows and OSX. Nobody has been able to reproduce the issue on Java 7. It may also require a multi-core machine to reproduce the issue.
ORIGINAL QUESTION:
I have a class which provides the following methods:
- remove - Removes the given item from the list
- getBuffer - Iterates over all the items in the list
I've reduced the problem down to the 2 functions below, both of which are in the same object and they're both synchronized. Unless I am mistaken, "Broke Synchronization" should never be printed because insideGetBuffer should always be set back to false before remove can be entered. However, in my application it is printing "Broke Synchronization" when I have 1 thread calling remove repeatedly while the other calls getBuffer repeatedly. The symptom is that I get a ConcurrentModificationException.
See Also:
Very strange race condition which looks like a JRE issue
I think you are indeed looking at a JVM bug in the OSR. Using the simplified program from @Gray (slight modifications to print an error message) and some options to mess with/print JIT compilation, you can see what is going on with the JIT. And, you can use some options to control that to a degree that can suppress the issue, which lends a lot of evidence to this being a JVM bug.
Running as:
java -XX:+PrintCompilation -XX:CompileThreshold=10000 phil.StrangeRaceConditionTest
you can get an error condition (like others about 80% of the runs) and the compilation print somewhat like:
68 1 java.lang.String::hashCode (64 bytes)
97 2 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (553 bytes)
104 3 java.math.BigInteger::mulAdd (81 bytes)
106 4 java.math.BigInteger::multiplyToLen (219 bytes)
111 5 java.math.BigInteger::addOne (77 bytes)
113 6 java.math.BigInteger::squareToLen (172 bytes)
114 7 java.math.BigInteger::primitiveLeftShift (79 bytes)
116 1% java.math.BigInteger::multiplyToLen @ 138 (219 bytes)
121 8 java.math.BigInteger::montReduce (99 bytes)
126 9 sun.security.provider.SHA::implCompress (491 bytes)
138 10 java.lang.String::charAt (33 bytes)
139 11 java.util.ArrayList::ensureCapacity (58 bytes)
139 12 java.util.ArrayList::add (29 bytes)
139 2% phil.StrangeRaceConditionTest$Buffer::<init> @ 38 (62 bytes)
158 13 java.util.HashMap::indexFor (6 bytes)
159 14 java.util.HashMap::hash (23 bytes)
159 15 java.util.HashMap::get (79 bytes)
159 16 java.lang.Integer::valueOf (32 bytes)
168 17 s phil.StrangeRaceConditionTest::getBuffer (66 bytes)
168 18 s phil.StrangeRaceConditionTest::remove (10 bytes)
171 19 s phil.StrangeRaceConditionTest$Buffer::remove (34 bytes)
172 3% phil.StrangeRaceConditionTest::strangeRaceConditionTest @ 36 (76 bytes)
ERRORS //my little change
219 15 made not entrant java.util.HashMap::get (79 bytes)
There are three OSR replacements (the ones with the % annotation on the compile ID). My guess is that it is the third one, which is the loop calling remove(), that is responsible for the error. This can be excluded from JIT via a .hotspot_compiler file located in the working directory with the following contents:
exclude phil/StrangeRaceConditionTest strangeRaceConditionTest
When you run the program again, you get this output:
CompilerOracle: exclude phil/StrangeRaceConditionTest.strangeRaceConditionTest
73 1 java.lang.String::hashCode (64 bytes)
104 2 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (553 bytes)
110 3 java.math.BigInteger::mulAdd (81 bytes)
113 4 java.math.BigInteger::multiplyToLen (219 bytes)
118 5 java.math.BigInteger::addOne (77 bytes)
120 6 java.math.BigInteger::squareToLen (172 bytes)
121 7 java.math.BigInteger::primitiveLeftShift (79 bytes)
123 1% java.math.BigInteger::multiplyToLen @ 138 (219 bytes)
128 8 java.math.BigInteger::montReduce (99 bytes)
133 9 sun.security.provider.SHA::implCompress (491 bytes)
145 10 java.lang.String::charAt (33 bytes)
145 11 java.util.ArrayList::ensureCapacity (58 bytes)
146 12 java.util.ArrayList::add (29 bytes)
146 2% phil.StrangeRaceConditionTest$Buffer::<init> @ 38 (62 bytes)
165 13 java.util.HashMap::indexFor (6 bytes)
165 14 java.util.HashMap::hash (23 bytes)
165 15 java.util.HashMap::get (79 bytes)
166 16 java.lang.Integer::valueOf (32 bytes)
174 17 s phil.StrangeRaceConditionTest::getBuffer (66 bytes)
174 18 s phil.StrangeRaceConditionTest::remove (10 bytes)
### Excluding compile: phil.StrangeRaceConditionTest::strangeRaceConditionTest
177 19 s phil.StrangeRaceConditionTest$Buffer::remove (34 bytes)
324 15 made not entrant java.util.HashMap::get (79 bytes)
and the problem does not appear (at least not in the repeated attempts that I've made).
Also, if you change the JVM options a bit, you can cause the problem to go away. Using either of the following I cannot get the problem to appear.
java -XX:+PrintCompilation -XX:CompileThreshold=100000 phil.StrangeRaceConditionTest
java -XX:+PrintCompilation -XX:FreqInlineSize=1 phil.StrangeRaceConditionTest
Interestingly, the compilation output for both of these still show the OSR for the remove loop. My guess (and it is a big one) is that delaying the JIT via the compilation threshold or changing the FreqInlineSize cause changes to the OSR processing in these cases that bypass a bug that you are otherwise hitting.
See here for info on the JVM options.
See here and here for information on the output of -XX:+PrintCompilation and how to mess with what the JIT does.