Often I have to debug crashing C++ programs on Windows where I can reproduce the crash, but it is hard to determine what sequence of instructions in the code caused the crash (e.g. another thread overwriting memory of the crashing thread). Even a call stack does not help in that case. Usually I resort to narrowing down the crash cause by commenting out sections of the source code, but this is very tedious.
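To make the bug class concrete, here is a minimal sketch of one thread overwriting memory that another part of the program considers its own. All names (`Arena`, `buggy_writer`, `victim_region_intact`, `run_overwrite_scenario`) are hypothetical and only for illustration. The overwrite deliberately stays inside one array so that the sketch has no undefined behavior, and the writer thread is joined before the check so the outcome is deterministic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>

// Hypothetical layout: one shared arena, split into two regions by convention.
// Region A is bytes [0, kRegionSize), region B is bytes [kRegionSize, end).
constexpr std::size_t kRegionSize = 16;
using Arena = std::array<unsigned char, 2 * kRegionSize>;

// Buggy writer: meant to fill only region A, but uses a wrong length and
// also overwrites the first bytes of region B.
void buggy_writer(Arena& arena) {
    const std::size_t wrong_length = kRegionSize + 4;  // 4 bytes too many
    std::memset(arena.data(), 0xFF, wrong_length);
}

// Victim check: region B is expected to still hold the pattern 0xAB.
bool victim_region_intact(const Arena& arena) {
    for (std::size_t i = kRegionSize; i < arena.size(); ++i) {
        if (arena[i] != 0xAB) return false;
    }
    return true;
}

// Runs the scenario; returns true if the victim finds its data corrupted.
bool run_overwrite_scenario() {
    Arena arena{};
    std::memset(arena.data() + kRegionSize, 0xAB, kRegionSize);  // victim's data
    assert(victim_region_intact(arena));  // sanity check before the writer runs

    std::thread writer([&arena] { buggy_writer(arena); });
    writer.join();  // join first so the result does not depend on timing

    return !victim_region_intact(arena);
}
```

In a real program the victim would crash somewhere inside its own code, and its call stack would contain no trace of `buggy_writer`, which is exactly why the call stack alone does not help.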
Does anyone know of a tool for Windows that can report or replay the last few source code lines or machine code instructions executed in all threads immediately before a crash? I.e. something like the reverse debugging capability of gdb or something like Mutek's BugTrapper (which is no longer available). I am looking for a released and stable tool (I am aware of SoftwareVerify's 'Bug Validator' and Hex-Rays' IDA Pro 6.3 Trace Replayer, both of which are still in closed beta programs).
What I already tried were the WinDbg trace commands
ta @$ra, but both commands have the disadvantage that they stop automatically after a few seconds. I require trace commands that run until the crash happens, and that trace all threads of the running program.
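For comparison, the kind of information I am after can be approximated by hand with an in-process trace buffer that remembers the last few source locations each thread passed through. This is only a sketch of that manual technique, not a substitute for instruction-level tracing: the class `TraceRing` and the macro `TRACE_HERE` are hypothetical names, one buffer per thread is assumed (so no locking is done), and the trace points have to be added to the source code manually.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical fixed-size trace buffer. One instance per thread is assumed,
// so record() does no locking.
class TraceRing {
public:
    static constexpr std::size_t kCapacity = 8;

    // Store a source location; the oldest entry is overwritten when full.
    void record(const char* file, int line) {
        assert(file != nullptr);
        Entry& e = entries_[next_ % kCapacity];
        e.file = file;
        e.line = line;
        ++next_;
    }

    // Return the retained entries, oldest first, formatted as "file:line".
    std::vector<std::string> dump() const {
        std::size_t count = next_;
        if (count > kCapacity) count = kCapacity;

        std::vector<std::string> out;
        for (std::size_t i = next_ - count; i < next_; ++i) {
            const Entry& e = entries_[i % kCapacity];
            out.push_back(std::string(e.file) + ":" + std::to_string(e.line));
        }
        return out;
    }

private:
    struct Entry {
        const char* file = "";
        int line = 0;
    };
    std::array<Entry, kCapacity> entries_{};
    std::size_t next_ = 0;  // total number of record() calls so far
};

// Hypothetical convenience macro for placing a trace point.
#define TRACE_HERE(ring) (ring).record(__FILE__, __LINE__)
```

A crash handler or a debugger session could then call `dump()` on each thread's buffer to see where that thread was shortly before the crash. The obvious limitation is that only locations with an explicit `TRACE_HERE` show up.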
NOTE: I am not looking for a debug tool designed to fix a particular problem, like gflags, pageheap, Memory Validator, Purify, etc. I am looking for a released and stable tool to trace or replay at the instruction level.
I found a solution: "replay debugging" using VMware Workstation and Visual Studio 2010. Setting it up takes a lot of time, but you are rewarded with a Visual Studio C++ debugger that can debug backwards in time. Here is a video that demonstrates how replay debugging works: http://blogs.vmware.com/workstation/2010/01/replay-debugging-try-it-today.html.
A drawback of the solution is that VMware seems to have discontinued replay debugging in the latest VMware versions. Furthermore, only certain processor types seem to support replaying. I have not found any comprehensive list of supported processors. I tested the replay features on three of my PCs: replaying did not work on a Core i7 200, but it worked on a Core2 6700 and on a Core2 Q9650.
I really hope that VMware reconsiders and introduces replay debugging again in future VMware Workstation versions, because this really adds a new dimension to debugging.
For those of you who are interested, here is a description of how to set up an environment for replay debugging:
In the description below, "local debugging" means that Visual Studio and VMware are installed on the same PC. "Remote debugging" means that Visual Studio and VMware are installed on different PCs.
Install Visual Studio 2010 with SP1 on the host system.
Make sure Visual Studio has been configured to use Microsoft's symbol servers. (Under "Tools | Options | Debugging | Symbols").
On the host system, install "Debugging Tools for Windows".
Install VMware Workstation 7.1. (Version 8.0 no longer contains the replay debugging feature). This will also install a plug-in into Visual Studio.
Install a virtual machine (VM) on VMware with Windows XP SP3.
If the application under test is a debug build, install the Visual Studio debug DLLs on the VM. (See http://msdn.microsoft.com/en-us/library/dd293568.aspx for instructions on how to do that, but use a "Debug" configuration instead of "Release").
Copy "gflags.exe" from the host's "Debugging Tools for Windows" directory to the VM, run gflags.exe on the VM, select "Disable paging of kernel stacks" on the "System Registry" tab and press OK. Reboot the VM.
Copy all EXE and DLL files of the application under test to the VM and make sure that you can start the application and reproduce the problem.
Shut down the VM and create a snapshot (via context menu item "Take Snapshot" in VMware Workstation).
(Only for remote debugging:) Start the following command on the Visual Studio PC and enter an arbitrary passcode:
"C:\Program Files\VMware\VMware Workstation\Visual Studio Integrated Debugger\dclProxy.exe" hostname
Replace hostname with the name of the PC. (The quotes are needed because the path contains spaces.)
(Only for remote debugging:) Create a recording manually for the VM. I.e. log in to the VM's operating system, start the recording (via context menu "Record"), run the application under test and perform the actions necessary to reproduce the problem. Then stop and save the recording.
Start Visual Studio and go to "VMware | Options | Replay Debugging in VM | General", and set the following values:
- "Local or Remote" must be set to "Local" for local debugging or to "Remote" for remote debugging.
- "Virtual Machine" must be set to the path to the VM's .vmx file.
- "Remote Machine Passcode" must be set to the passcode you used above (only for remote debugging).
- "Recording to Replay" must be set to a recording name that you previously created with VMware.
- "Host Executable Search Path" must be set to a directory in which you save DLLs which are required by the application under test and which are needed by Visual Studio to display correct stack traces.
Go to "VMware | Options | Replay Debugging in VM | Pre-Record Event", and set the following values:
- "Base Snapshot for Recording": name of snapshot created previously.
(For local debugging:) In Visual Studio, select "VMware | Create Recording for Replay"; this restarts the VM. Log in to the VM, run the application under test and perform the actions necessary to reproduce the problem. Then stop and save the recording.
Select "VMware | Start Replay Debugging". VMware now automatically restarts the VM and the application under test and replays the recorded actions. Wait until the application crashes; the Visual Studio debugger then automatically becomes active.
In the Visual Studio debugger, set a breakpoint at a location that you think the application passed through before the crash. Then select "VMware | Reverse Continue". The debugger now runs backwards to the breakpoint. This operation can take some time because the VM is restarted from a snapshot and replayed until your breakpoint is reached. (You can speed up this operation by adding a snapshot a few seconds before the crash happens when you record the scenario. You can also add snapshots during replay debugging.)
Once VMware has replayed the VM to your breakpoint, you can use "Step Over" and "Step Into" to step forward from your breakpoint, i.e. you replay the recorded history of events, until you reach a point where you can identify the reason why your application crashed.
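To illustrate why "Reverse Continue" can be slow and why a late snapshot helps, here is a toy model of the idea. It is only my mental model under simplifying assumptions, not VMware's actual implementation: the recording is reduced to a list of program-counter values, the last entry is taken to be the crash, and the names `ReverseResult` and `reverse_continue` are hypothetical. "Running backwards" is modeled as replaying forward from a snapshot and remembering the last breakpoint hit before the crash.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of the toy "reverse continue" operation.
struct ReverseResult {
    std::ptrdiff_t hit_index;    // index of the last breakpoint hit, or -1 if none
    std::size_t steps_replayed;  // number of recorded events that were re-executed
};

// Toy model: `recording` holds program-counter values in execution order and
// its last entry is where the crash happened. Replays forward from
// `snapshot_index` and reports the last position before the crash at which
// `breakpoint_pc` was executed.
ReverseResult reverse_continue(const std::vector<int>& recording,
                               std::size_t snapshot_index,
                               int breakpoint_pc) {
    assert(!recording.empty() && snapshot_index < recording.size());

    ReverseResult result{-1, 0};
    const std::size_t crash_index = recording.size() - 1;
    for (std::size_t i = snapshot_index; i < crash_index; ++i) {
        ++result.steps_replayed;
        if (recording[i] == breakpoint_pc) {
            result.hit_index = static_cast<std::ptrdiff_t>(i);
        }
    }
    return result;
}
```

With a recording of `{10, 20, 30, 20, 40, 50, 99}` and a breakpoint at `20`, starting from snapshot index 0 replays six events, while starting from a snapshot at index 3 replays only three and finds the same hit. In this model that is the whole benefit of taking a snapshot shortly before the crash.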
Further information: http://www.replaydebugging.com/