Best Oracle questions in November 2011

update x set y = null takes a long time

10 votes

At work, I have a large table (some 3 million rows, like 40-50 columns). I sometimes need to empty some of the columns and fill them with new data. What I did not expect is that

UPDATE table1 SET y = null

takes much more time than filling the column with data that is generated, for example, in the SQL query from other columns of the same table or queried from other tables in a subquery. It does not matter if I go through all table rows at once (like in the update query above) or if I use a cursor to go through the table row by row (using the PK). It does not matter if I use the large table at work or if I create a small test table and fill it with a few hundred thousand test rows. Setting the column to null always takes way longer (throughout the tests, I saw factors of 2 to 10) than updating the column with some dynamic data (which is different for each row).

What's the reason for this? What does Oracle do when setting a column to null? Or what is my error in reasoning?

Thanks for your help!

P.S.: I am using Oracle 11gR2, and found these results using both PL/SQL Developer and Oracle SQL Developer.

Summary

I think updating to null is slower because Oracle (incorrectly) tries to take advantage of the way it stores nulls, causing it to frequently re-organize the rows in the block ("heap block compress"), creating a lot of extra UNDO and REDO.

What's so special about null?

From the Oracle Database Concepts:

"Nulls are stored in the database if they fall between columns with data values. In these cases they require 1 byte to store the length of the column (zero).

Trailing nulls in a row require no storage because a new row header signals that the remaining columns in the previous row are null. For example, if the last three columns of a table are null, no information is stored for those columns. In tables with many columns, the columns more likely to contain nulls should be defined last to conserve disk space."
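
To make the quoted rule concrete, here is a minimal Java sketch of the arithmetic. It is a simplified model and not Oracle's actual row format: it ignores the row header and assumes every non-null value costs one length byte plus its data. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of the rule quoted above; it ignores the row header
// and assumes every non-null value needs a single length byte.
class NullStorageSketch {

    // columnLengths: data length in bytes per column, or null for a NULL column.
    static int rowDataBytes(List<Integer> columnLengths) {
        // Find the last non-null column; everything after it is a trailing null.
        int lastNonNull = -1;
        for (int i = 0; i < columnLengths.size(); i++) {
            if (columnLengths.get(i) != null) {
                lastNonNull = i;
            }
        }
        int bytes = 0;
        for (int i = 0; i <= lastNonNull; i++) {
            Integer len = columnLengths.get(i);
            // Interior null: 1 length byte only. Value: 1 length byte plus data.
            bytes += (len == null) ? 1 : 1 + len;
        }
        return bytes;
    }

    public static void main(String[] args) {
        // Same two 2-byte values, nulls in the middle versus at the end.
        System.out.println(rowDataBytes(Arrays.asList(2, null, null, 2))); // prints 8
        System.out.println(rowDataBytes(Arrays.asList(2, 2, null, null))); // prints 6
    }
}
```

In this model, setting the last non-null column of a row to null shrinks the stored row, which is the kind of size change the summary above suspects triggers the block re-organization.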

Test

Benchmarking updates is very difficult because the true cost of an update cannot be measured just from the update statement. For example, log switches will not happen with every update, and delayed block cleanout will happen later. To accurately test an update, there should be multiple runs, objects should be recreated for each run, and the high and low values should be discarded.
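
The "multiple runs, discard the high and low values" advice can be sketched in a few lines of Java. This is only an illustration of the averaging step, separate from the test script in this answer; the class name and the sample timings are hypothetical.

```java
import java.util.Arrays;

class TrimmedTiming {

    // Mean of the samples after dropping the single highest and lowest value.
    static double trimmedMean(long[] elapsedMillis) {
        if (elapsedMillis.length < 3) {
            throw new IllegalArgumentException("need at least 3 runs");
        }
        long[] sorted = elapsedMillis.clone();
        Arrays.sort(sorted);
        long sum = 0;
        // Skip index 0 (lowest) and the last index (highest).
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return (double) sum / (sorted.length - 2);
    }

    public static void main(String[] args) {
        // Hypothetical timings in milliseconds, not measurements from this test.
        long[] runs = {1200, 1250, 1230, 4100, 900};
        System.out.println(trimmedMean(runs));
    }
}
```

Dropping the extremes keeps a single log switch or a cold cache from dominating the average.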

For simplicity the script below does not throw out high and low results, and only tests a table with a single column. But the problem still occurs regardless of the number of columns, their data, and which column is updated.

I used the RunStats utility from http://www.oracle-developer.net/utilities.php to compare the resource consumption of updating-to-a-value with updating-to-a-null.

create table test1(col1 number);

BEGIN
    dbms_output.enable(1000000);

    runstats_pkg.rs_start;

    for i in 1 .. 10 loop
        execute immediate 'drop table test1 purge';
        execute immediate 'create table test1 (col1 number)';
        execute immediate 'insert /*+ append */ into test1 select 1 col1
            from dual connect by level <= 100000';
        commit;
        execute immediate 'update test1 set col1 = 1';
        commit;
    end loop;

    runstats_pkg.rs_pause;
    runstats_pkg.rs_resume;

    for i in 1 .. 10 loop
        execute immediate 'drop table test1 purge';
        execute immediate 'create table test1 (col1 number)';
        execute immediate 'insert /*+ append */ into test1 select 1 col1
            from dual connect by level <= 100000';
        commit;
        execute immediate 'update test1 set col1 = null';
        commit;
    end loop;

    runstats_pkg.rs_stop();
END;
/

Result

There are dozens of differences; these are the four I think are most relevant:

Type  Name                                 Run1         Run2         Diff
----- ---------------------------- ------------ ------------ ------------
TIMER elapsed time (hsecs)                1,269        4,738        3,469
STAT  heap block compress                     1        2,028        2,027
STAT  undo change vector size        55,855,008  181,387,456  125,532,448
STAT  redo size                     133,260,596  581,641,084  448,380,488
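
For a quick sense of scale, the Run2/Run1 ratios can be worked out from the table. The Java sketch below only does that division on the figures shown above; the class and method names are made up for illustration.

```java
class RunRatios {

    // How many times larger the second run's figure is than the first run's.
    static double ratio(long run1, long run2) {
        return (double) run2 / run1;
    }

    public static void main(String[] args) {
        // Figures copied from the RunStats output above.
        System.out.printf("elapsed time: %.2fx%n", ratio(1_269L, 4_738L));
        System.out.printf("undo size:    %.2fx%n", ratio(55_855_008L, 181_387_456L));
        System.out.printf("redo size:    %.2fx%n", ratio(133_260_596L, 581_641_084L));
    }
}
```

That works out to roughly 3.7 times the elapsed time, 3.2 times the undo, and 4.4 times the redo, which sits inside the factor of 2 to 10 reported in the question.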

Solutions?

The only possible solution I can think of is to enable table compression. The trailing-null storage trick doesn't happen for compressed tables. So even though the "heap block compress" number gets even higher for Run2, from 2,028 to 23,208, I guess it doesn't actually do anything. The redo, undo, and elapsed time of the two runs are almost identical with table compression enabled.

However, there are lots of potential downsides to table compression. Updating to a null will run much faster, but every other update will run at least slightly slower.

How to execute sql file from java

6 votes

I have an Oracle SQL script with several queries and tables, and I want to run that script from my Java program when the program starts, to ensure that everything is in the right place. I found some code to run the script, but it doesn't work for some reason. Can anyone provide samples that I can follow?

This is what I found:

try {
    String line;
    Process p = Runtime.getRuntime().exec("psql -U sas -d oracle -h @localhost -f Lab_05_Tables.sql");
    BufferedReader input = new BufferedReader(new InputStreamReader(p.getInputStream()));

    while ((line = input.readLine()) != null) {
        System.out.println(line);
    }
    input.close();
}
catch (Exception err) {
    err.printStackTrace();
}

But it doesn't work.

Error

java.io.IOException: Cannot run program "psql": CreateProcess error=2, The system cannot find the file specified

If you have the resources, it would be much better to port the SQL statements from your script into the Java program itself.

See Java JDBC tutorial.
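
As a rough idea of what that could look like, here is a minimal sketch that reads the script text, splits it into statements, and runs each one over JDBC. The class and method names are made up for illustration, and the split on `;` is deliberately naive. Note also that psql is the PostgreSQL command-line client, so it would not be the right tool for an Oracle script even if Windows could find it.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class SqlScriptRunner {

    // Naive split on ';'. Breaks on PL/SQL blocks and on semicolons inside
    // string literals or comments, so it only suits simple DDL/DML scripts.
    static List<String> splitStatements(String script) {
        List<String> statements = new ArrayList<>();
        for (String part : script.split(";")) {
            String sql = part.trim();
            if (!sql.isEmpty()) {
                statements.add(sql);
            }
        }
        return statements;
    }

    // Executes each statement in order on an already opened connection.
    static void runScript(Connection connection, String script) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            for (String sql : splitStatements(script)) {
                statement.execute(sql);
            }
        }
    }
}
```

The connection would come from `DriverManager.getConnection` with your own JDBC URL, user, and password, which the question does not give, and the script text could be read from `Lab_05_Tables.sql` with `java.nio.file.Files`. If the script contains PL/SQL blocks or SQL*Plus commands, this simple splitter will not be enough.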

Performance difference between VARCHAR2 to NUMBER

4 votes

If I query my Oracle database using a unique index on multiple columns, will there be any performance difference if I change one of the columns from VARCHAR2 to NUMBER?
If there is, is it significant?

(It's varchar2 because I need '0' at the beginning but I can change it in the presentation layer in my app)

Yes, it should be quicker to use NUMBER. Whether it gives you a significant increase will depend on your data, indexes and queries. If you're having performance problems, this is unlikely to be the magic fix.
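
If the column does become a NUMBER, the leading zeros mentioned in the question can be restored when the value is displayed. A minimal Java sketch, assuming a fixed display width (the width of 5 and the names are hypothetical, since the question does not say how long the code is):

```java
import java.util.Locale;

class LeadingZeroSketch {

    // Pads a numeric code with leading zeros for display only.
    static String display(long code, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", code);
    }

    public static void main(String[] args) {
        System.out.println(display(42, 5)); // prints 00042
    }
}
```

This only works if the number of leading zeros can be derived from a fixed width. If values such as '007' and '07' must stay distinct, the column has to remain a string.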