Best oracle questions in November 2010

Do any databases allow multiple indexes on the same table to be created simultaneously?

10 votes

I'm pretty sure this can't be done in Oracle, but I'd love to be proved wrong...

Say I have a huge table in with lots of columns and I want to create indexes on a dozen or so columns. Using Oracle, I'd fire off several sequential create index statements and go off and boil the kettle.

Each create index needs to scan through every row in the table to form the index.

i.e. 10 indexes = 10 full scans.

You'd think an obvious optimisation would be to scan the table once and index the 10 columns at the same time. Wouldn't you?

create indexes on mytable (
    ix_mytable_cola (cola),
    ix_mytable_colb (colb),
    ix_mytable_colc (colc)
);

So obvious that there must be a great reason why it's not there.

Any ideas?

I could fire off each create index simultaneously in separate sessions and hope the database buffer cache saved the day, but seems like a long shot.

EDIT

I didn't get a definitive answer so I asked the same question on Oracle-L:

http://www.freelists.org/post/oracle-l/Creating-multiple-indexes

General consensus was that it isn't available but would perhaps be a useful feature. Most useful response was from David Aldridge who suggested that if the create index statements were all kicked off at the same time then Oracle would 'do the right thing'.

The answer is no for Oracle and according to my research it is also no for DB2. I doubt any others have that feature.

Is UPDATE = DELETE(marked as) + INSERT?

10 votes

This is SQL Server question but I would appreciate the answers form other DBMS contexts properly identified.

The answer by Seth Lynch to my question in MSDN forum:

tells:

"When data is updated it is not over written - the original row is marked as deleted and a new row is inserted"

Is it correct statement? Can you give references supporting this in docs?
How can it be verified?

Related discussions:

Update: Not long time ago I believed that dirty reads permitted in READ UNCOMMITTED transaction isolation level (or, what is the same in SQL Server, through WITH(NOLOCK) hint) permitted reading (from other transactions) uncommitted (or committed, if not yet changed) values but not partly-changed (partly updated, partly deleted or partly inserted.

RESUME': putting it short, that phrase is generally and for most cases incorrect (while it states categorically about rather uncommon cases in SQL Server)

According to Kalen Delaney, in her book Inside Microsoft SQL Server 2005: The Storage Engine, SQL Server 2005 (and now 2008) can update a row by either using an insert/delete or in place, by just changing one column's value. Here's a quick summary of what she says on p. 306-311 of the book.

The normal behavior in SQL Server 2005/2008 is to update a row in place. The row stays in the same location on the page and only the affected bytes are changed. An example of this would be updating the value in an integer column that's not part of the culstered index.

A row may updated with an insert/delete when it's size changes and it no longer fits on the original page. This could happen when you change the value in a varchar column and make it longer. It also happens when the clustered index column is changing and the row needs to move because of its position in the index (because rows are ordered by the clustered key). An example of this would be changing someone's last name from "Smith" to "Jones" in a table with a clustered index on last name.

Why doesn't Oracle raise "ORA-00918: column ambiguously defined" for this query?

9 votes

I've just come across a strange behaviour in Oracle where I would expect ORA-00918 to be raised, but isn't. Take this query, for example.

SELECT *
FROM USER_TABLES TAB
JOIN USER_TRIGGERS TRG ON TRG.TABLE_NAME = TAB.TABLE_NAME
WHERE STATUS = 'DISABLED'

This query is notionally looking for the details of tables with disabled triggers, but please note that this is not the problem I'm trying to solve. The problem is not unique to this query, the data dictionary, views or tables; as far as I can tell it applies to any set of tables or views (from the two or three I've tried).

Anyway, try to run this query and you get ORA-00918 because both USER_TABLES and USER_TRIGGERS have a column called STATUS so to get the query to run the WHERE clause needs to be changed to TRG.STATUS. Ok, cool, but try instead joining another table.

SELECT *
FROM USER_TABLES TAB
JOIN USER_TRIGGERS TRG ON TRG.TABLE_NAME = TAB.TABLE_NAME
JOIN USER_CONSTRAINTS CON ON CON.TABLE_NAME = TAB.TABLE_NAME
WHERE STATUS = 'DISABLED'

This query, without qualifying which STATUS column you mean, magically works! Never mind the semantics or what the query returns, there is no error. USER_CONSTRAINTS even has a column called STATUS too, so how come it doesn't know what to do when there are two columns to choose from but it's okay with even more ambiguity?

This is all on 10.2.0.3.0 by the way, and in essence ORA-00918 stops being raised if you have more than two tables in your query. If this is an Oracle bug, does anyone know when it was fixed and so which Oracle version is likely to cause cowboy queries to blow up if our database is upgraded?

Update

Thanks to BQ for demonstrating the bug is fixed in 11.2.0.1.0. Bounty for anyone that can show it fixed in an earlier version!

Can't say when it was fixed, but here's my results:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> SELECT *
  2  FROM USER_TABLES TAB
  3  JOIN USER_TRIGGERS TRG ON TRG.TABLE_NAME = TAB.TABLE_NAME
  4  WHERE STATUS = 'DISABLED';
WHERE STATUS = 'DISABLED'
      *
ERROR at line 4:
ORA-00918: column ambiguously defined

SQL> ed
Wrote file afiedt.buf

  1  SELECT *
  2  FROM USER_TABLES TAB
  3  JOIN USER_TRIGGERS TRG ON TRG.TABLE_NAME = TAB.TABLE_NAME
  4  JOIN USER_CONSTRAINTS CON ON CON.TABLE_NAME = TAB.TABLE_NAME
  5* WHERE STATUS = 'DISABLED'
SQL> /
WHERE STATUS = 'DISABLED'
      *
ERROR at line 5:
ORA-00918: column ambiguously defined

Do I need to create indexes on foreign keys?

8 votes

I have a table A and a table B. A has a foreign key to B on B's primary key, B_ID.

For some reason (I know there are legitimate reasons) it is not using an index when I join these two tables on the key.

Do I need to separately create an index on A.B_ID or should the existence of a foreign key provide that?

The foreign key constraint alone does not provide the index - one must (and should) be created.

Handling Revisions within Oracle

8 votes

I have a table say:

CREATE TABLE "DataNode" (
   "ID" NUMBER(7,0),
   "TYPE" NUMBER(7,0),
   "NAME" VARCHAR2(100),
   "STATUS" NUMBER(7,0),
   "REVISION" NUMBER(4,0),
   "MODIFIEDAT" DATE
 );

 CREATE TABLE "DataNode_Revisions" (
   "ID" NUMBER(7,0),
   "NODEID" NUMBER(7,0),
   "TYPE" NUMBER(7,0),
   "NAME" VARCHAR2(100),
   "STATUS" NUMBER(7,0),
   "REVISION" NUMBER(4,0),
   "MODIFIEDAT" DATE
 ) COMPRESS;

So I have these two tables. I do all my reads from "DataNode" and when a change occurs I write out the current entry to "DataNode_Revisions" and then modify my existing "DataNode" record. Makes sense?

Is this the best way to go about it? I can already tell I am going to run into problems when the Schema changes. I am not seeing a better alternative but if there is one please let me know! I assume keeping this all in one table would result in massive performance losses woudl it not? I mean I would be more then quadrupling the number of records and there is already quite a few. I think Drupal stores node revisions like this, and I am curious how they do not suffer performance problems from it.

"DataNode" is constantly being read by a lot of users. However, very few writes ever occur. "DataNode_Revisions" is only read from on occasion. I am just worried about maintaining so many tables. "DataNode" is one of ~25 tables very similar to this one.

Whether there will be any performance implications from storing the old rows in the DataNode table depends on how the DataNode rows are accessed. If the reads are all single-row lookups for the current row, the number of rows in the table is relatively immaterial-- it's not going to take any more work to find the current row for a particular ID than it would to get the row for that ID from the current DataNode table (I'm assuming here that ID is the key for the table). On the other hand, if you have a number of queries that are doing table scans of the DataNode table, then quadrupling the number of rows will increase the time required to run those queries.

If you want to go down the path of putting the historical rows in the DataNode table, you would probably want to add an EXPIRATION_DATE column that is NULL for the current row and populated for the expired rows. You could then create a function-based index based on the EXPIRATION_DATE that would have data for only the current rows, i.e.

CREATE INDEX idx_current_ids
    ON DataNode( (CASE WHEN expiration_date IS NULL THEN id ELSE null END) );

which would be used in a query like

SELECT *
  FROM DataNode
 WHERE (CASE WHEN expiration_date IS NULL THEN id ELSE null END) = <<some id>>

Obviously, you'd probably want to create a view that has this condition rather than rewriting it every time you need the current row, i.e.

CREATE VIEW CurrentDataNode
AS
SELECT (CASE WHEN expiration_date IS NULL THEN id ELSE null END) id,
       type,
       name,
       status
  FROM DataNode;

SELECT *
  FROM CurrentDataNode
 WHERE id = <<some value>>