Best sql questions in April 2011

Joins are for lazy people?

76 votes

I recently had a discussion with another developer who claimed to me that JOINs (SQL) are useless. This is technically true but he added that using joins is less efficient than making several requests and link tables in the code (C# or Java).

For him joins are for lazy people that don't care about performance. Is this true? Should we avoid using joins?

No, we should avoid developers who hold such incredibly wrong opinions.

In many cases, a database join is several orders of magnitude faster than anything done via the client, because it avoids DB roundtrips, and the DB can use indexes to perform the join.

Off the top of my head, I can't even imagine a single scenario where a correctly used join would be slower than the equivalent client-side operation.

Edit: There are some rare cases where custom client code can do things more efficiently than a straightforward DB join (see comment by meriton). But this is very much the exception.

Need help to optimize MySQL query

16 votes

I have 6 tables:

CREATE TABLE IF NOT EXISTS `sbpr_groups` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `active` tinyint(1) DEFAULT '0',
  `dnd` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=32 ;

CREATE TABLE IF NOT EXISTS `sbpr_newsletter` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `created_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `from` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `mail` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `subject` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `body` text COLLATE utf8_unicode_ci,
  `attach1` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `attach2` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `attach3` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=14;

CREATE TABLE IF NOT EXISTS `sbpr_news_groups` (
  `newsletter_id` int(11) NOT NULL,
  `groups` int(11) NOT NULL,
  KEY `fk_sbpr_news_groups_sbpr_newsletter1` (`newsletter_id`),
  KEY `fk_sbpr_news_groups_sbpr_groups1` (`groups`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `sbpr_news_recs` (
  `newsletter_id` int(11) NOT NULL,
  `recipients` int(11) NOT NULL,
  KEY `fk_sbpr_news_recs_sbpr_newsletter1` (`newsletter_id`),
  KEY `fk_sbpr_news_recs_sbpr_recipients1` (`recipients`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `sbpr_recipients` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `mail` varchar(160) DEFAULT NULL,
  `date_reg` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `active` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=3008 ;

CREATE TABLE IF NOT EXISTS `sbpr_rec_groups` (
  `rec_id` int(11) NOT NULL,
  `group` int(11) NOT NULL,
  KEY `fk_sbpr_rec_groups_sbpr_recipients` (`rec_id`),
  KEY `fk_sbpr_rec_groups_sbpr_groups1` (`group`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

With this foregin keys:

ALTER TABLE `sbpr_news_groups`
  ADD CONSTRAINT `fk_sbpr_news_groups_sbpr_groups1` 
    FOREIGN KEY (`groups`) REFERENCES `sbpr_groups` (`id`) 
    ON DELETE CASCADE ON UPDATE NO ACTION,
  ADD CONSTRAINT `fk_sbpr_news_groups_sbpr_newsletter1` 
    FOREIGN KEY (`newsletter_id`) REFERENCES `sbpr_newsletter` (`id`) 
    ON DELETE CASCADE ON UPDATE NO ACTION;

ALTER TABLE `sbpr_news_recs`
  ADD CONSTRAINT `fk_sbpr_news_recs_sbpr_newsletter1` 
    FOREIGN KEY (`newsletter_id`) REFERENCES `sbpr_newsletter` (`id`) 
    ON DELETE CASCADE ON UPDATE NO ACTION,
  ADD CONSTRAINT `fk_sbpr_news_recs_sbpr_recipients1` 
    FOREIGN KEY (`recipients`) REFERENCES `sbpr_recipients` (`id`) 
    ON DELETE CASCADE ON UPDATE NO ACTION;

ALTER TABLE `sbpr_rec_groups`
  ADD CONSTRAINT `fk_sbpr_rec_groups_sbpr_groups1` 
    FOREIGN KEY (`group`) REFERENCES `sbpr_groups` (`id`) 
    ON DELETE CASCADE ON UPDATE NO ACTION,
  ADD CONSTRAINT `fk_sbpr_rec_groups_sbpr_recipients` 
    FOREIGN KEY (`rec_id`) REFERENCES `sbpr_recipients` (`id`) 
    ON DELETE CASCADE ON UPDATE NO ACTION;

Visual structure of tables: enter image description here

I want to select all rows from sbpr_newsletter table, and add to each of these lines the number of rows from sbpr_recipients whose id prescribed in sbpr_news_recs or prescribed in sbpr_rec_groups depence on FKs.

Ex. I whant to select count of all recipients of curent newsletter wich are in sbpr_news_recs or exists in group wich are in sbpr_rec_groups plus count of active recipients.

I have workinq SQL:

SELECT d.id,  d.subject , d.created_date,
    (SELECT count(*) FROM sbpr_recipients r
      LEFT JOIN sbpr_news_recs nr ON nr.recipients = r.id
      LEFT JOIN sbpr_rec_groups g ON g.rec_id = r.id
      LEFT JOIN sbpr_news_groups ng ON ng.groups = g.group
      WHERE nr.newsletter_id = d.id OR ng.newsletter_id = d.id) AS repicients,

    (SELECT count(*) FROM sbpr_recipients r
      LEFT JOIN sbpr_news_recs nr ON nr.recipients = r.id
      LEFT JOIN sbpr_rec_groups g ON g.rec_id = r.id
      LEFT JOIN sbpr_news_groups ng ON ng.groups = g.group
      WHERE (nr.newsletter_id = d.id OR ng.newsletter_id = d.id) 
      AND r.active = 1) AS active_repicients
FROM sbpr_newsletter d
ORDER BY d.id ASC, d.id

Explain of this sql: enter image description here

Question: How can I optimize my sql?

Just approach to optimize, two SELECT queries are transfered into JOIN clause -

SELECT d.id
     , d.subject
     , d.created_date
     , count(if(nr_newsletter_id is not null or ng_newsletter_id is not null, 1, null)) repicients
     , count(if((nr_newsletter_id is not null or ng_newsletter_id is not null) and t.active = 1, 1, null)) active_repicients
FROM
  sbpr_newsletter d
LEFT JOIN (
  SELECT nr.newsletter_id nr_newsletter_id
       , ng.newsletter_id ng_newsletter_id
       , r.active
  FROM
    sbpr_recipients r
  LEFT JOIN sbpr_news_recs nr
    ON nr.recipients = r.id
  LEFT JOIN sbpr_rec_groups g
    ON g.rec_id = r.id
  LEFT JOIN sbpr_news_groups ng
    ON ng.groups = g.group
  ) t
ON nr_newsletter_id = d.id OR ng_newsletter_id = d.id
GROUP BY
  d.id;

I rewrited your query a little, it is not tested, but try it.

Hibernate authentication without passwords stored in plain text

12 votes

My goal is to authenticate to the database using a JDBC/Hibernate in a secure manner, without storing passwords in plain text. Code examples appreciated. I'm already using waffle to authenticate the user so if there was some way to use the credentials that waffle obtained from the user, and forward those to the DB, that would be good.

Two questions:

  1. What is the recommended way to do multi hop authentication (the client, web server, and database are all different machines) with tomcat/hibernate/spring on web server, a sql database, and obviously client browser?
  2. I would also settle for a way to use a single user account to do authentication, as long as that user account's information was not stored in plain text anywhere. The user account will need both read/write privileges on the DB.

I found some useful information about connecting to SQL Server in this thread. However, I'm expecting that Tomcat will be running under the default account which is like, Local System or something. As far as I know, that account cannot be used to do windows authentication to the database.

My solution:

I did end up using the approach mentioned in the above thread. Instead of running the Tomcat service as Local System it is now running as a user. That user has permission to access the database. My hibernate configuration file is configured as follows:

    <property name="hibernate.connection.url">
jdbc:sqlserver://system:port;databaseName=myDb;integratedSecurity=true;
</property>

To those who provided responses

I appreciate everyone's help and I will try out some of the techniques mentioned in the thread. My issue with some of the responses is that they require symmetric encryption which requires a secret key. Keeping the key secret is almost the exact same problem as storing the password in plain text.

Okay, let's take a look at the problem. You want to have the authentication information made available but not hardcoded anywhere in code or in file system. What I would suggest:

  • require the administrator of the application to specify the authentication information upon application startup either via jmx or via a webpage that does not require any database connection.
  • Add a servlet filter to limit access until database authentication information is entered.

This solution does require some extending spring context loading so that it waits until the authentication information is specified (via entry page).

Audit Tables: Maintaining Referential Integrity - Good or Bad

12 votes

We are planning on introducing simple Audit Trail in our database using triggers and separate history table for each table that requires auditing.

For example consider table StudentScore, it has few foreign keys (eg. StudentID, CourseID) linking it to corresponding parent tables (Student & Course).

Table StudentScore (
    StudentScoreID, -- PK
    StudentID ref Student(StudentID),  -- FK to Student
    CourseID ref Course(CourseID),   -- FK to Course
)

If StudentScore requires auditing, we are planning to create audit table StudentScoreHistory -

Table StudentScoreHistory (
    StudentScoreHistoryID, -- PK
    StudentScoreID,
    StudentID,
    CourseID,
    AuditActionCode,
    AuditDateTime,
    AuditActionUserID
)

If any row in StudentScore is modified we'll move old row to StudentScoreHistory.

One of the points raised during design discussion was to make StudentID and CourseID in StudentHistory table a FK, to maintain referential integrity. Argument made in favour of this was as we always mostly do a soft (logical Boolean flag) delete rather than hard delete, its good to maintain referential integrity to ensure we do not have any orphan ids in audit table.

Table StudentScoreHistory (
    StudentScoreHistoryID, -- PK
    StudentScoreID,
    StudentID ref Student(StudentID), -- FK to Student
    CourseID ref Course(CourseID), -- FK to Course
    AuditActionCode,
    AuditDateTime,
    AuditActionUserID
)

This seems to be a bit odd design to me. I do agree with @Jonathan Leffler's comment that audit record should not stop the deletion of parent data. Instead, if this is required, should be handled via foreign keys in main table and not in audit table. I want to get your opinion, to make sure I'm not missing some value in extending foreign keys to audit tables.

Now my question is: Is it a good design to have these foreign keys in History tables?

Any details on key arguments (e.x. performance, best practice, design flexibility etc) would be highly appreciated.

For benefit of anyone looking for specific purpose and our environment:

Purpose:

  1. Maintain critical data history
  2. Allow auditing of user activity with support to recreate scenario
  3. To limited extent allow roll-back of user activity

Environment:

  • Transactional database
  • Not every table requires auditing
  • Uses soft-delete to the extent possible, specifically for static/reference data
  • Few highly transactional tables do use hard deletes

When discussing auditing, I would go back to the purpose behind it. It isn't really a backup but rather a history of what has been. For example, for StudentScore, you would want to be sure not to lose the fact that the student originally had a 65% when they now have a 95%. This audit trail would allow you to walk back through the changes to see what happened and who did it. From this, you could identify what a particular user did to abuse the system. In some ways this could be a type of backup since you could roll back these changes to their previous states without rolling back entire tables.

With this in mind (if my assumptions about what you are using this for are correct), the only place you would want a FK/PK relationship is between the history table and its "live" counterpart. Your audit (history) table should not refer to any other table because it is no longer a part of that system. Instead, it is simply a record of what has happened in one table. Period. The only referential integrity you might want to consider is between the history table and the live table (thus the possible FK/PK relationship). If you allow records to be deleted from the live table, don't include the FK in the history table. Then the history table could include deleted records (which is what you want if you allow deletions).

Don't get confused with relational integrity in the main database with this history table. The history tables are all stand-alone. They only serve as a history of one table (not a set of tables).

The relating of two history tables together is possible and even more advanced relations between the live and history tables together (Students and Courses with both live and history, for example) so you can deal with even the possibility that a student was deleted (shudder) since the record would still be in the history table. The only issue here would be if you don't keep the history for a particular table, in which case you are choosing to lose that data (if you allow deletes).

Interesting SQL puzzle

10 votes

Without loops or cursors, how do you take a list of date intervals and turn them into a string of 1s and 0s such that:

  • each bit represents each day from min(all the dates) to max(all the dates)
  • the bit is 1 if that day falls inside any of the date intervals
  • the bit is 0 if that day does not fall in any of the intervals

So for example, if the intervals were:

  • 1/1/2011 to 1/2/2011
  • 1/4/2011 to 1/5/2011

Then the SQL you write should output 11011. Here is a setup script you could use:

declare @TimeSpan table
(
    start datetime
    ,finish datetime
)

-- this is a good data set, with overlapping and non-overlapping time spans
insert into @TimeSpan values ('02/02/2010', '02/02/2010')
insert into @TimeSpan values ('02/03/2010', '02/03/2010')
insert into @TimeSpan values ('02/04/2010', '02/05/2010')
insert into @TimeSpan values ('02/05/2010', '02/06/2010')
insert into @TimeSpan values ('02/07/2010', '02/09/2010')
insert into @TimeSpan values ('02/08/2010', '02/08/2010')
insert into @TimeSpan values ('02/08/2010', '02/10/2010')
insert into @TimeSpan values ('02/14/2010', '02/16/2010')

-- for this set of data, the output string would be 111111111000111

DECLARE @Result VARCHAR(MAX), @start DATETIME

SELECT @start= MIN(start) ,
       @Result =REPLICATE('0',1+DATEDIFF(DAY,MIN(start),MAX(finish)))
FROM @TimeSpan

SELECT @Result = STUFF(@Result,
                       DATEDIFF(DAY,@start,start)+1,
                       DATEDIFF(DAY,start,finish)+1,
                       REPLICATE('1',1+DATEDIFF(DAY,start,finish)))
FROM @TimeSpan 

SELECT @Result       

Finding strings that differ with at most one letter from a given string in SAS with PROC SQL

10 votes

First some context. I am using proc sql in SAS, and need to fetch all the entries in a data set (with a couple of million entries) that have variable "Name" equal to (let's say) "Massachusetts". Of course, since the data was once manually entered by humans, close to all conceivable spelling errors occur ("Amssachusetts", "Kassachusetts" etc.).

I have found that few entries get more than two characters wrong, so the code

Name like "__ssachusetts" OR Name like "_a_sachusetts" OR ... OR Name like "Massachuset__"

would select the entries I am looking for. However, I am hoping that there must be a more convenient way to write

Name that differs by at most 2 characters from "Massachusetts";

Is there? Or is there some other strategy for fetching these entries? I tried searching both stackoverflow and the web but was unsuccesful. I am also a relative beginner with both SQL and SAS.

Some additional information: The database is not in English (and the actual string is not "Massachusetts") so using SOUNDEX is not really feasible (if it ever were).

Thanks in advance.

(Edit: Improved the title)

SAS has built-in functions COMPGED and COMPLEV to compute distances between strings. Here is an example that shows how to select just those with a Levenshtein edit distance of less than or equal to 2.

data typo;
input name $20.;
datalines;
massachusetts
masachusets
mssachusetts
nassachusets
nassachussets
massachusett
;

proc sql;
  select name from typo
  where complev(name, "massachusetts") <= 2;
quit;

Is there any good dynamic SQL builder library in Java?

9 votes

Hello.

Anyone knows some good SQL builder library for Java like Squiggle (not maintained anymore it seems). Preferably, a project in active development.

Preferably with syntax like Zend_Db_Select, something that will allow to make a query like

String query = db.select().from('products').order('product_id');

Querydsl and jooq are two popular choices.

Note: I work for the company behind Querydsl.

Count number of rows that are not within 10 seconds of each other

7 votes

Hello all,

I track web visitors. I store the IP address as well as the timestamp of the visit.

ip_address    time_stamp
180.2.79.3  1301654105
180.2.79.3  1301654106
180.2.79.3  1301654354
180.2.79.3  1301654356
180.2.79.3  1301654358
180.2.79.3  1301654366
180.2.79.3  1301654368
180.2.79.3  1301654422

I have a query to get total tracks:

SELECT COUNT(*) AS tracks FROM tracking

However, I now want to disregard visits from users that have visited multiple times within 10 seconds of each visit. Since I don't consider this another visit, its still part of the first visit.

When the ip_address is the same, check timestamp and only count those rows that are 10 seconds away from each other.

I am having difficulty in putting this into a SQL query form, I would appreciate any help on this!

Let me start with this table. I'll use ordinary timestamps so we can easily see what's going on.

180.2.79.3   2011-01-01 08:00:00
180.2.79.3   2011-01-01 08:00:09
180.2.79.3   2011-01-01 08:00:20
180.2.79.3   2011-01-01 08:00:23
180.2.79.3   2011-01-01 08:00:25
180.2.79.3   2011-01-01 08:00:40
180.2.79.4   2011-01-01 08:00:00
180.2.79.4   2011-01-01 08:00:13
180.2.79.4   2011-01-01 08:00:23
180.2.79.4   2011-01-01 08:00:25
180.2.79.4   2011-01-01 08:00:27
180.2.79.4   2011-01-01 08:00:29
180.2.79.4   2011-01-01 08:00:50

If I understand you correctly, you want to count these like this.

180.2.79.3   3
180.2.79.4   3

You can do that for each ip_address by selecting the maximum timestamp that is both

  • greater than the current row's timestamp, and
  • less than or equal to 10 seconds greater than the current row's timestamp.

Taking these two criteria together will introduce some nulls, which turn out to be really useful.

select ip_address, 
       t_s.time_stamp, 
       (select max(t.time_stamp) 
        from t_s t 
        where t.ip_address = t_s.ip_address 
          and t.time_stamp > t_s.time_stamp
          and t.time_stamp - t_s.time_stamp <= interval '10' second) next_page
from t_s 
group by ip_address, t_s.time_stamp
order by ip_address, t_s.time_stamp;

ip_address   time_stamp            next_page
180.2.79.3   2011-01-01 08:00:00   2011-01-01 08:00:09
180.2.79.3   2011-01-01 08:00:09   <null>
180.2.79.3   2011-01-01 08:00:20   2011-01-01 08:00:25
180.2.79.3   2011-01-01 08:00:23   2011-01-01 08:00:25
180.2.79.3   2011-01-01 08:00:25   <null>
180.2.79.3   2011-01-01 08:00:40   <null>
180.2.79.4   2011-01-01 08:00:00   <null>
180.2.79.4   2011-01-01 08:00:13   2011-01-01 08:00:23
180.2.79.4   2011-01-01 08:00:23   2011-01-01 08:00:29
180.2.79.4   2011-01-01 08:00:25   2011-01-01 08:00:29
180.2.79.4   2011-01-01 08:00:27   2011-01-01 08:00:29
180.2.79.4   2011-01-01 08:00:29   <null>
180.2.79.4   2011-01-01 08:00:50   <null>

The timestamp that marks the end of a visit has a null for its own next_page. That's because no timestamp is less than or equal to time_stamp + 10 seconds for that row.

To get a count, I'd probably create a view and count the nulls.

select ip_address, count(*)
from t_s_visits 
where next_page is null
group by ip_address

180.2.79.3   3
180.2.79.4   3

Not equal <> != operator in T-SQL on NULL

7 votes

Could someone please explain the following behavior in SQL?

SELECT * FROM MyTable WHERE MyColumn != NULL (0 Results)
SELECT * FROM MyTable WHERE MyColumn <> NULL (0 Results)
SELECT * FROM MyTable WHERE MyColumn IS NOT NULL (568 Results)

<> is Standard SQL-92, != is it's equivalent. Both evaluate for values, which NULL is not -- NULL is a placeholder to say there is the absence of a value.

Which is why you can only use IS NULL/IS NOT NULL to evaluate for such situations.

And it's not specific to SQL Server. All standards-compliant SQL dialects work the same way.

How do I group on continuous ranges

7 votes

I know some basic sql, but this one is beyond me. I have looked high and low but no dice. I need a view of the following data, I can do this in the application layer code. But unfortunately for this particular one, the code must be put in the data layer.

I am using T-SQL.

Table

Date      Crew       DayType
01-02-11  John Doe  SEA  
02-02-11  John Doe  SEA  
03-02-11  John Doe  SEA  
04-02-11  John Doe  HOME  
05-02-11  John Doe  HOME  
06-02-11  John Doe  SEA 

I need a view like this

DateFrom  DateTo    Name      DayType
01-02-11  03-02-11  John Doe  SEA
04-02-11  05-02-11  John Doe  HOME
06-02-11  06-02-11  John Doe  SEA

Unfortunately the base table is required for application layer to be in the format show. Is this possible to do in query?

Thanks

Luke

WITH    q AS
        (
        SELECT  *,
                ROW_NUMBER() OVER (PARTITION BY crew, dayType ORDER BY [date]) AS rnd,
                ROW_NUMBER() OVER (PARTITION BY crew ORDER BY [date]) AS rn
        FROM    mytable
        )
SELECT  MIN([date]), MAX([date]), crew AS name, dayType
FROM    q
GROUP BY
        crew, dayType, rnd - rn

This article may be of interest to you:

How to select part of a Timestamp in a SQL Query

7 votes

Hi all,

In the DB I am working with, I want to select only the year from a specific TimeStamp field. In particular, I'm hoping to select the unique years from this database column.

For instance, if all of the timestamps in the field "ModifyTimeStamp" are either from the year 2002 or the year 2006, I would like returned simply a result of '2002' and '2006'. If this is impossible, I'd be content with getting a result of a bunch of '2002's mixed with '2006's and would parse it later.

All I've been able to get working so far is "Select ModifyTimeStamp from Table" - all my attempts to parse have failed. I started reading about the extract command for SQL, but I believe it's only for PostgreSQL.

Any advice greatly appreciated!

Edit: Got the answer, thanks a lot datagod and Marc. The code I was looking for ended up being:

"Select distinct YEAR(ModifyTimeStamp) from Table"

You don't specify which RDBMS (database server) you're using, but most databases do have date handling functions built-in:

  • MySQL/SQL Server:

    select YEAR(modifytimestamp) from yourtable
    
  • Access/SQL Server:

    select DatePart(year, modifytimestamp) from yourtable
    
  • Oracle:

    select TO_CHAR(modifytimestamp, 'YYYY') from yourtable
    

Database: How to distinguish "one-or-more" and "zero-or-more" relationship?

6 votes

Design tools allow distinguish between "one TO zero-or-more" and "one TO one-or-more" relationships. I could imagine how to implement "one TO zero-or-more" relation:


CHILD_TABLE

(pk) chid_id
(fk) parent_id (required)


PARENT_TABLE

(pk) parent_id


How to implement "one TO one-or-more" relationship? How to say that parent requires at least one child? Or is "one-or-more" usually implemented as "zero-or-more" ?

Implementing a 1 to 1-or-more relationship is done through the business logic, or at least using a transaction. You cannot insert into both tables simultaneously; you insert into one table at a time. Therefore, you need to insert the parent before you can insert the child and there is no built-in way for the database to enforce the required logic.

If you wrap the two INSERT statements inside a transaction, then you guarantee if the child insert fails, the parent insert will be rolled back. However, it is up to the business logic to make sure a parent is not inserted without a child.

SQL Server 2005 Transactional Replication Fails to Publish Stored Procedure Containing an Index Create

6 votes

I've experienced a bizarre problem with a SQL Server 2005 Transactional Publication. The issue is this: If the publication contains an article that is a stored procedure that contains a create index statement, then there is an error thrown when attempting to replicate the schema of the stored procedure to a subscriber.

The behavior is very odd, because even if the create index statement is commented out, it still gives the exception, and it will only work if it is removed altogether.

Here is the exact error that's being returned:

Command attempted: GRANT EXECUTE ON [dbo].[usp_Test] TO [CompanyDatabase_access]

(Transaction sequence number: 0x00000170000008B9000500000000, Command ID: 5)

Error messages: Cannot find the object 'usp_Test', because it does not exist or you do not have permission. (Source: MSSQLServer, Error number: 15151) Get help: http://help/15151 Cannot find the object 'usp_Test', because it does not exist or you do not have permission. (Source: MSSQLServer, Error number: 15151) Get help: http://help/15151

The error is accurate, because when I check on the subscriber, the stored procedure wasn't created as expected... but that was the purpose of the publication...

Additionally, I can create the stored procedure manually on the subscriber, but when I generate a snapshot, it deletes the existing stored procedure and then still returns this error message.

And here's a sample publication that creates this issue.

The stored procedure:

USE [CompanyDatabase]
GO

CREATE PROCEDURE [dbo].[usp_Test]

AS

CREATE TABLE #TempTable(ID INT)
CREATE NONCLUSTERED INDEX [IX_TempTable] ON [dbo].[#TempTable](ID)
SELECT 'Test'
GO

GRANT EXECUTE ON [dbo].[usp_Test] TO [CompanyDatabase_access]
GO

The publication script:

-- Adding the transactional publication
use [CompanyDatabase]
exec sp_addpublication 
    @publication = N'Replication Test', 
    @description = N'Publication of database ''CompanyDatabase''.', 
    @sync_method = N'concurrent', 
    @retention = 0, 
    @allow_push = N'true', 
    @allow_pull = N'true', 
    @allow_anonymous = N'false', 
    @enabled_for_internet = N'false', 
    @snapshot_in_defaultfolder = N'true', 
    @compress_snapshot = N'false', 
    @ftp_port = 21, 
    @ftp_login = N'anonymous', 
    @allow_subscription_copy = N'false', 
    @add_to_active_directory = N'false', 
    @repl_freq = N'continuous', 
    @status = N'active', @independent_agent = N'true', 
    @immediate_sync = N'false', 
    @allow_sync_tran = N'false', 
    @autogen_sync_procs = N'false', 
    @allow_queued_tran = N'false', 
    @allow_dts = N'false', 
    @replicate_ddl = 1, 
    @allow_initialize_from_backup = N'false', 
    @enabled_for_p2p = N'false', 
    @enabled_for_het_sub = N'false'
GO

-- Adding the transactional articles
use [CompanyDatabase]
exec sp_addarticle 
    @publication = N'Replication Test', 
    @article = N'usp_Test', 
    @source_owner = N'dbo', 
    @source_object = N'usp_Test', 
    @type = N'proc schema only', 
    @description = N'', 
    @creation_script = N'', 
    @pre_creation_cmd = N'drop', 
    @schema_option = 0x0000000048000001, 
    @destination_table = N'usp_Test', 
    @destination_owner = N'dbo', 
    @status = 16
GO

-- Adding the transactional subscriptions
use [CompanyDatabase]
exec sp_addsubscription 
    @publication = N'Replication Test', 
    @subscriber = N'OtherDatabaseServer', 
    @destination_db = N'CompanyDatabase', 
    @subscription_type = N'Pull', 
    @sync_type = N'automatic', 
    @article = N'all', 
    @update_mode = N'read only', 
    @subscriber_type = 0
GO

The subscription script:

/****** Begin: Script to be run at Subscriber ******/
use [CompanyDatabase]
exec sp_addpullsubscription 
    @publisher = N'DatabaseServer', 
    @publication = N'Replication Test', 
    @publisher_db = N'CompanyDatabase', 
    @independent_agent = N'True', 
    @subscription_type = N'pull', 
    @description = N'', 
    @update_mode = N'read only', 
    @immediate_sync = 0

exec sp_addpullsubscription_agent 
    @publisher = N'DatabaseServer', 
    @publisher_db = N'CompanyDatabase', 
    @publication = N'Replication Test', 
    @distributor = N'DatabaseServer', 
    @distributor_security_mode = 1, 
    @distributor_login = N'', 
    @distributor_password = N'', 
    @enabled_for_syncmgr = N'False', 
    @frequency_type = 64, 
    @frequency_interval = 0, 
    @frequency_relative_interval = 0, 
    @frequency_recurrence_factor = 0, 
    @frequency_subday = 0, 
    @frequency_subday_interval = 0, 
    @active_start_time_of_day = 0, 
    @active_end_time_of_day = 235959, 
    @active_start_date = 0, 
    @active_end_date = 0, 
    @alt_snapshot_folder = N'', 
    @working_directory = N'', 
    @use_ftp = N'False', 
    @job_login = null, 
    @job_password = null, 
    @publication_type = 0
GO
/****** End: Script to be run at Subscriber ******/

Again, the odd thing is that the publication will still contain the same error if the create index statement is commented out, but it will work if it is removed altogether.

For now, I've just removed all stored procedures that contain these create index statements from the publication, but I would like to have them replicated to the subscribers so that any DDL updates to the procedures will be automatically reflected on the subscribers.

-- EDIT --

Looking in the snapshot directory, the .sch file for usp_Test contains the exact same code block I previously posted for the stored procedure... based on the error returned, it seems like the snapshot agent decides not to run the CREATE PROCEDURE command if it contains a create index, but then continues on and tries to run the GRANT EXECUTE command, which causes the error.

Also, my exact version of SQL Server is:

Microsoft SQL Server 2005 - 9.00.5254.00 (2005 + SP4 Cumulative Update 1)

-- END EDIT --

My question is, why is this happening? Is there an issue with the configuration of my publication or subscription? As anyone else experienced anything like this? Where would I start in troubleshooting this issue?

-- UPDATE --

I've been talking to Hilary Cotter on technet... and still no luck. If I remove the GRANT EXECUTE permission on the procedure, then it creates successfully with the CREATE INDEX. So it will work with GRANT EXECUTE OR CREATE INDEX, but not both. Hilary suggested that it might be some type of spam appliance in my domain that was preventing the snapshot from being transferred correctly when it contained both of those keywords, but if I manually copy the .sch file to the subscriber and validate that it contains the expected commands, I still get the same issue.

With the following code: the stored procedure in the snapshot will failed to be applied:

CREATE NONCLUSTERED INDEX [IX_TempTable] ON [dbo].[#TempTable](ID)

But, changing the syntax slightly causes the stored procedure to create without issue:

ALTER TABLE dbo.#TempTable ADD CONSTRAINT IX_TempTable UNIQUE NONCLUSTERED ( ID )

I can't explain it, and after spending literally hours on this glitch, I'm ready to just stop looking for an explanation and settle for this workaround.

Database Normalization

6 votes

I have recently broken up a very large database table into smaller, manageable tables and for the most part I am satisfied with my work and I feel that the data is properly normalized.

But there is one exception to this. The tables in question are from a products database that stores information about (you guessed it) products that the company sells. I have separated much of the information into two tables: ProductBase and ProductBasePackaging.

These tables hold the umbrella of information that is relevant to a base part number rather than an individual product (there are multiple products to each base number).

ProductBase contains rather general information such as MarketingCopy, Keywords etc. and also information on construction i.e. material, components etc.

And ProductBasePackaging of course holds data about the packaging.

Now that I am writing the application for data manipulation, I am beginning to second guess myself. It seems like I have just made it harder on myself now that I have to keep track of multiple tables that use the same key (the base part number). Or am I right to have separated them as such and maybe taken it a step further and separated the construction into it's own table as well?

I am pretty well versed in using sql but this is the first time I have ever had to actually design a database structure, let alone restructure a large existing database. So basically what I am asking is should I have multiple tables with the same key that are separated by type of data or keep things together in the single table where I can reference everything I need from one table using the same key?

Sorry I know that was a lot to read, I hope it made sense, and thanks to all those who make it through!

Normalization might look like a pain in the a** right now - but trust me, in the long run, you'll be glad you did it! Non-normalized "flat" tables with everything but the kitchen sink in them will become very unmanageable over time, data inconsistencies will creep in, and before you know it, you have a huge steaming pile of crap - errrg - data that doesn't make any sense anymore!

Yes, joining tables can be a bit of work - but especially for displaying data, you should definitely check out views which can help you write those JOINs once and then just use them as "virtual tables" that hold everything again.

Database normalization - up to roughly 3NF - is a good thing (TM) for sure! I would always recommend doing it, and then maybe at that point introduce back some limited de-normalization where performance needs might require it - but only in a very controlled way, and with your full understanding and knowledge that you are in fact denormalizing something again.

Parser for Oracle SQL

6 votes

For my current project I need a SQL parser that parses Oracle SQL statements. Currently I've been using jsqlparser, which worked well for simple queries. But when specific functions occur (e.g. cast() or (+)) the parser fails.

Can anyone suggest a parser that is fully compliant to Oracle SQL?

Best, Will

Have you considered General SQL Parser? I don't have any experience with it myself but browsing their website it has potential. Personally I have rolled my own built on the parser in Eclipse Data Tools Platform (sorry I can't share, it's proprietary), but now I will have to evaluate the one I linked above because it claims to have more coverage of Oracle SQL than my parser does.

Double insert in MS Access?

4 votes

I'm somehow getting a double insert; every time I submit the form, it sends two records to the database. I can't figure out what's going on. This is a general idea of what my code looks like:

Protected Sub btnSubmit_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles btnSubmit.Click

'Data collection'

'Define Connection'
    Dim myConn As New OleDbConnection
    myConn.ConnectionString = adsGrandmaster.ConnectionString
    myConn.Open()

'Insert command'
    Dim myIns1 As New OleDbCommand("INSERT INTO tableGrandmaster (date_received, prefix, course_number, title, new, changed, inactivate, end_date, credits, description, hours_lecture, hours_lec_lab, hours_lab, hours_total, related_instruction, repeat, challengeable, in_catalog, in_printed_schedule, core_course, core_name, program_elective, program_name, prereqs, coreqs, recommended, green_course, code, dept_code, division_code, changing_depts, acti_code, grading, general_ed, writing, social_science, math, information_literacy, arts_letters, science_computer, speech_comm, cultural_literacy, date_curriculum_approval, date_state_sent, date_state_approval, date_created) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", myConn)

'Insert parameters'

'Execute command'
    myIns1.ExecuteNonQuery()

'Close connection'
    myConn.Close()

Update:

The last little piece of my .aspx.vb file:

'Execute command'
    myIns1.ExecuteNonQuery()

    Label1.Text = "Grandmaster submitted."

    'Close connection'
    myConn.Close()

End Sub

Protected Sub btnBack_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles btnBack.Click
    Response.Redirect("./index.htm")
End Sub
End Class

If I place my breakpoint at or before myIns1.ExecuteNonQuery(), nothing inserts. If I place it after myIns1.ExecuteNonQuery(), it inserts once. If I place it after "End Sub" (under myConn.Close()), it inserts twice.

Make sure that the aspx:button declaration is not wired up to the onclick event on the code-in-front page as well as the code behind page.

Django: 'unique_together' and 'blank=True'

4 votes

I have a Django model which looks like this:

class MyModel(models.Model):
    parent = models.ForeignKey(ParentModel)
    name   = models.CharField(blank=True, max_length=200)
    ... other fields ...

    class Meta:
        unique_together = ("name", "parent")

This works as expected; If there is the same 'name' more than once in the same 'parent' then i get an error: "MyModel with this Name and Parent already exists."

However, i also get an error when i save more than one 'MyModel' with the same 'parent' but with the 'name' field blank, but this should be allowed. So basically i dont want to get the above error when the 'name' field is blank. Is that possible somehow?

Using unique_together, you're telling Django that you don't want any two MyModel instances with the same parent and name attributes -- which applies even when name is an empty string.

This is enforced at the database level using the unique attribute on the appropriate database columns. So to make any exceptions to this behavior, you'll have to avoid using unique_together in your model.

Instead, you can get what you want by overriding the save method on the model and enforcing the unique restraint there. When you try to save an instance of your model, your code can check to see if there are any existing instances that have the same parent and name combination, and refuse to save the instance if there are. But you can also allow the instance to be saved if the name is an empty string. A basic version of this might look like this:

class MyModel(models.Model):
    ...

    def save(self, *args, **kwargs):

        if self.name != '':
            conflicting_instance = MyModel.objects.filter(parent=self.parent, \
                                                          name=self.name)
            if self.id:
                # This instance has already been saved. So we need to filter out
                # this instance from our results.
                conflicting_instance = conflicting_instance.exclude(pk=self.id)

            if conflicting_instance.exists():
                raise Exception('MyModel with this name and parent already exists.')

        super(MyModel, self).save(*args, **kwargs)

Hope that helps.

versioning each field vs history date field?

4 votes

Which do you recommend and why?

I have a few tables, when i make a change to the data... it should go to a history table (audit) with a effective date.

The other solution is versioning each field to insert a new row when making changes to the data?

Which is the best method for the invoice information? Item name and price is always change

These are slowly changing dimensions, type 2 and type 4, appropriately.

Both methods are valid and may be more appropriate for your needs, depending on your model and query requirements.

Basically, type 2 (versioning) is more appropriate when you need to query historical values as often as the current one, while type 4 (history table) is more suited when you are querying the current value more often and there are more queries (more queries to develop I mean) against the most recent value.

Is it ok to store presentation data in the database?

4 votes

I am to displaying rows from a database (SQL SERVER 2005) table in a webpage. These rows contain a statusID (foreign key) that is further defined by Status table (eg. id, name, modifiedDate).

The various statuses should display differently (lets say they simply have a different background color).

I am using php to query the database, and build the web page table. As I build the table I am going to apply a css class to an element based on the status of that row.

I have at least 2 options to do this:

  1. Define code logic in the php to handle it, and if the statuses are changed in the database, I will have to change the code.

  2. Store the "class" in the database and simply apply the class that has been stored.

The latter option seems better to me, but I am unsure if embedding presentational data into the database is a bad design choice. This will be the foundation on which I create several intranet utilities, and I'd like to start off on the right foot.

There's nothing wrong with storing any data in the database, including presentation data. If it helps you produce effective results, while writing less code, then it is a good practice. The thing you need to make sure of is that you don't mix your presentation logic with your database logic.

You can ensure that these concerns are separated by encapsulating the data for your presentation layer in the properties of an elementInfo object (for example).

Since it is a CSS class that you are talking about, this presentation data should be kept separate from the business data. So, while it is okay to store both presentation data and business data in the database, it is not acceptable to store them in the same table.

Update re: comment No, you should not add a PresentationClassRecord's Id as an FK on the business object. I made a sample of an approach to the db below. I called the DummyTable your business objects, and the rest follows specification. The most relevant part is the StatusPresentationAssignmentTable

 -----------------------------------------------
 DummyTable
 -----------------------------------------------
 Id      Name     SomeOtherDataField    StatusId
 PK int  varchar  int                   FK int

 -----------------------------------------------
 StatusTable
 -----------------------------------------------
 Id      Name     ModifiedDate
 PK int  varchar  datetime

 -----------------------------------------------
 PresentationTable
 -----------------------------------------------
 Id      PresentationType    Value 
 PK int  varchar            
 sample data:
 43      CssClass            prettyBackground

 -----------------------------------------------
 StatusPresentationAssignmentTable
 -----------------------------------------------
 StatusId    PresentationId 
 FK int      FK int

Now with two simple join clauses you can get the presentation data and it is completely decoupled from your business data. Your script could do something like check if the Status of the Dummy has any presentation assignments. If it does then it looks at the PresentationType, gets the appropriate function to apply the presentation-data to the presentation, and executes it. (You would need to have a function for each PresentationType that knows how to handle the value - something that could be encapsulated by something like function applyPresentationValue(presentationElement, presentationType, presentationValue) that calls a different function applyCssClass(presentationElement, value) if the presentationType == "CssClass" ).