Best sql questions in October 2011

Is varchar(128) better than varchar(100)

10 votes

Quick question. Does it matter from the point of storing data if I will use decimal field limits or hexadecimal (say 16,32,64 instead of 10,20,50)?

I ask because I wonder if this will have anything to do with clusters on HDD?

Thanks!

VARCHAR(128) is better than VARCHAR(100) if you need to store strings longer than 100 bytes.

Otherwise, there is very little to choose between them; you should choose the one that better fits the maximum length of the data you might need to store. You won't be able to measure the performance difference between them. All else apart, the DBMS probably only stores the data you send, so if your average string is, say, 16 bytes, it will only use 16 (or, more likely, 17 - allowing 1 byte for storing the length) bytes on disk. The bigger size might affect the calculation of how many rows can fit on a page - detrimentally. So choosing the smallest size that is adequate makes sense - waste not, want not.

So, in summary, there is precious little difference between the two in terms of performance or disk usage, and aligning to convenient binary boundaries doesn't really make a difference.

Oracle hierarchical query on non-hierarchical data

9 votes

I hava data in an Oracle table that is organized as a graph that can contain cycles (see example).

Data sample

My goal is to get all nodes that are children of node X. Let's say 2. My expected result is then: 3, 4, 5, 6, 8

I know that I can design a query like this:

with T (parent, child) as
(
                  select 1 parent, 2 child from dual
        union all select 1 parent, 8 child from dual
        union all select 2 parent, 3 child from dual
        union all select 2 parent, 4 child from dual
        union all select 2 parent, 8 child from dual
        union all select 3 parent, 4 child from dual
        union all select 3 parent, 6 child from dual
        union all select 4 parent, 5 child from dual
        union all select 5 parent, 8 child from dual
        union all select 6 parent, 5 child from dual
        union all select 7 parent, 3 child from dual
        union all select 7 parent, 5 child from dual
        union all select 8 parent, 6 child from dual
)
SELECT child, sys_connect_by_path(child,'/')
   FROM T
  START WITH parent = 2
CONNECT BY NOCYCLE PRIOR child = PARENT;

The problem with such a query is that it will go through all possible paths until they cycle, and there are way too many of them in my actual data. The result consists of many duplicates – Here it is:

child | sys_connect_by_path (for information)
3     | /3
4     | /3/4
5     | /3/4/5
8     | /3/4/5/8
6     | /3/4/5/8/6
6     | /3/6
5     | /3/6/5
8     | /3/6/5/8
4     | /4
5     | /4/5
8     | /4/5/8
6     | /4/5/8/6
8     | /8
6     | /8/6
5     | /8/6/5

My actual data is much more complex. the cost of execution of such a query is so huge that my TEMP tablespace, which was autoextendable, reached 10Gb (originally 500 Mb) and my database actually broke because of disk full.

I tried to design the query like this (recursive WITH clause) :

WITH T(parent, child) AS (/* same as above */)
   , descendants(node) AS
( SELECT 2 node FROM dual
  UNION ALL
  (
  SELECT child
    FROM T
   INNER JOIN descendants D
      ON T.parent = D.node
   MINUS SELECT node FROM descendants
  )
)
SELECT * FROM descendants

The problem that I encounter is:

  • with Oracle 10g, this is not implemented (ORA-32033: unsupported column aliasing, and some customers use Oracle 9 or 10),
  • with Oracle 11g, I get ORA-32041: UNION ALL operation in recursive WITH clause must have only two branches. If I remove the MINUS clause I will get cycles (ORA-32044: cycle detected while executing recursive WITH query).

How would you query my original data to get those nodes 3, 4, 5, 6, 8 efficiently? PL/SQL solutions are also welcome.

Thank you.

What is your expected maximum depth to reach any child node?

If it's relatively small, you could loop down, while checking for nodes you have already visited, in a manner something like this...

(Note, I'm not an Oracle expert so this is closer to pseudo code with a little real SQL mixed in)

CREATE TABLE myMap (parent INT, child INT);

INSERT INTO myTable SELECT NULL, 2 FROM DUAL;

WHILE (SQL%ROWCOUNT > 0)
LOOP

  INSERT INTO
    myMap
  SELECT DISTINCT
    dataMap.parent,
    dataMap.child
  FROM
    myMap
  INNER JOIN
    dataMap
      ON myMap.child = dataMap.parent
  WHERE
    NOT EXISTS (SELECT * FROM myMap WHERE parent = dataMap.parent)

END LOOP;

Depending on performance, you may also want a depth field in myMap; optimising the join so as to only join on the most recent nodes. This would imply two indexes; one for the JOIN (depth) and one for the NOT EXISTS (parent).

EDIT

Added the DISTINCT key word, to avoid the following case...
- Node 2 maps to 3 and 4
- Nodes 3 and 4 both map to node 5
- All children of node 5 would now be processed twice

GROUP BY, or many other options, can be used to cater for this instead of DISTINCT. It's just that the NOT EXISTS on it's own is not sufficient.

need assistance with sql injection

8 votes

First, I'm not trying to hack or do anything illegal. Thought I let you guys know. I have a client that want's me to do some modifications on his system, when I was looking at it I notice that NOTHING was escaped. I'm not joking, nothing is being escaped. I explained to him that it's insecure to have a system like that. He then proceeds to tell me that he's had his system like this for few years and nothing has happened. I need to show him that his system is not safe, but I really don't know to do perform an sql injection. Here's a few queries that use $_GET and are not escaped.

SELECT *,DATE_FORMAT(joined,'%M %d, %Y') as \"Joined\" FROM `members` WHERE `name` LIKE '".$ltr."%' ORDER BY points DESC LIMIT $page,50

Here's another one:

SELECT * FROM groups WHERE id=$thisladder[grid]

The only thing that I see that "might" clean the $_GET is this function:

if (!ini_get('register_globals')) {
   $superglobals = array($_SERVER, $_ENV,
       $_FILES, $_COOKIE, $_POST, $_GET);
   if (isset($_SESSION)) {
       array_unshift($superglobals, $_SESSION);
   }
   foreach ($superglobals as $superglobal) {
       extract($superglobal, EXTR_SKIP);
   }
}

It's possible that the function above may be sanitizing the variables. And yes, the system also uses register globals, which is also bad.

I also made a backup, just in case.

Can't say it better than http://xkcd.com/327/.

But then again, as Marc B says, forget SQL injection, register_globals is much, much worse. Never thought I'd actually see it emulated, just in case it's off.

Explicitly specify sort order for mysql query?

8 votes

If I have:

ID | Title
1  | Shirt
2  | CD
3  | Cap
4  | Mp3
5  | Badge

If I want to sort by this order: 4, 2, 5, 3,1. Is there a way to do an sql query where you explicitly specify this? Something like:

select * from TABLE order by ID(4,2,5,3,1) ??

Actually, you were surprisingly close. It's a simple as:

select * from TABLE order by field(ID,4,2,5,3,1)

Self documenting SQL

7 votes

I have used Doxygen to self document code in languages such as PHP. But is does not support SQL. Is there an equilivant program to produce documentation from SQL?

Did you take a look at IzzySoft HyperSQL?

What's wrong with the Foreign Key Constraint in this table

7 votes

MySQL 5.1.59 throws an error error with this create table:

CREATE  TABLE IF NOT EXISTS `genre` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `abv` CHAR(3) CHARACTER SET 'latin1' COLLATE 'latin1_bin' NULL DEFAULT NULL ,
  `name` VARCHAR(80) NOT NULL DEFAULT '' ,
  `parent_id` INT NULL DEFAULT NULL ,
  PRIMARY KEY (`id`) ,
  INDEX `fk_genre_genre1` (`parent_id` ASC) ,
  CONSTRAINT `fk_genre_genre1`
    FOREIGN KEY (`parent_id` )
    REFERENCES `genre` (`id` )
    ON DELETE SET NULL
    ON UPDATE CASCADE)
ENGINE = InnoDB;

Which was generated by MySQLWorkbench 5.2.33.

The error massage is:

ERROR 1005 (HY000) at line __: Can't create table 'mydb.genre' (errno: 150)

What's wrong with this create table?

The manual says:

If MySQL reports an error number 1005 from a CREATE TABLE statement, and the error message refers to error 150, table creation failed because a foreign key constraint was not correctly formed.

It also says that foreign key references to the same table are allowed:

InnoDB supports foreign key references within a table. In these cases, “child table records” really refers to dependent records within the same table.

The relationship I want is a non-identifying parent-child to represent a hierarchy of genres and sub-genres. A genre doesn't have to have a parent, hence parent_id is nullable.

It may be relevant that MySQLWorkbench set the following:

SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0;
SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0;
SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='TRADITIONAL';

Your column id is int unsigned; your column parent_id is int. Those don't match. The solution is to change parent_id to be int unsigned as well.

If you run the SHOW ENGINE InnoDB STATUS I put in a comment, you see this:

11005 17:18:38 Error in foreign key constraint of table test/genre:

    FOREIGN KEY (`parent_id` )
    REFERENCES `genre` (`id` )
    ON DELETE SET NULL
    ON UPDATE CASCADE)
ENGINE = InnoDB:
Cannot find an index in the referenced table where the
referenced columns appear as the first columns, or column types
in the table and the referenced table do not match for constraint.
Note that the internal storage type of ENUM and SET changed in
tables created with >= InnoDB-4.1.12, and such columns in old tables
cannot be referenced by such columns in new tables.
See http://dev.mysql.com/doc/refman/5.1/en/innodb-foreign-key-constraints.html
for correct foreign key definition.

Note the "column types in the table and the referenced table do not match" part.

SQL vs NOSQL: Which to use for this schema?

7 votes

I've got an upcoming project and I can't decide whether to stick with SQL or switch over to NoSQL. It's basically a reporting system with the main interface being reporting on the data entered in by users.

Here's the schema I've got mapped out:

enter image description here

Because this schema is so nested, I started thinking about NoSQL. With SQL, I'm afraid I'm going to have a crap-ton of joins to get to the bottom of the tree (the Record model).

My concerns, though, are two-fold:

  1. I'm only just starting to get into NoSQL and I'm worried my knowledge may limit me because of the tight timeframe.
  2. Although creating data at the bottom of the tree will probably be relatively simple, I'm worried that it may be hard to report on without getting into some heavy map/reduce stuff (that I have zero experience with)

My question: Given my concerns, do you think this schema -- because of how deeply nested it is -- lends itself more to NoSQL? If so, do you think the reporting on the "records" will be difficult?

I realize that it may be difficult to answer these questions without more info, so please let me know what other info may be helpful in coming up with an answer.

Thanks in advance for your help!

Just my opinion:

I Stared at diagram for approx 3 sec, this is clearly relational. Benefits of an RDBMS heavily outweigh a NoSQL solution here. Why would you want to use NoSQL? Are there 100,000+ records (may a million plus)? You need microsecond/millisecond performance?

NoSQL, as I understand, is not because you don't like lots of joins. It's because big systems for hierarchical data don't suit every situation. This suit this perfectly, however.

How to get the difference in years from two different dates?

7 votes

I want to get the difference in years from two different dates using MySQL database.

for example:

  • 2011-07-20 - 2011-07-18 => 0 year
  • 2011-07-20 - 2010-07-20 => 1 year
  • 2011-06-15 - 2008-04-11 => 2 3 years
  • 2011-06-11 - 2001-10-11 => 9 years

How about the SQL syntax? Is there any built in function from MySQL to produce the result?

Here's the expression that also caters for leap years:

YEAR(date1) - YEAR(date2) - (DATE_FORMAT(date1, '%m%d') < DATE_FORMAT(date2, '%m%d'))

This works because the expression (DATE_FORMAT(date1, '%m%d') < DATE_FORMAT(date2, '%m%d')) is true if date1 is "earlier in the year" than date2 and because in mysql, true = 1 and false = 0, so the adjustment is simply a matter of subtracting the "truth" of the comparison.

This gives the correct values for your test cases, except for test #3 - I think it should be "3" to be consistent with test #1:

create table so7749639 (date1 date, date2 date);
insert into so7749639 values
('2011-07-20', '2011-07-18'),
('2011-07-20', '2010-07-20'),
('2011-06-15', '2008-04-11'),
('2011-06-11', '2001-10-11'),
('2007-07-20', '2004-07-20');
select date1, date2,
YEAR(date1) - YEAR(date2)
    - (DATE_FORMAT(date1, '%m%d') < DATE_FORMAT(date2, '%m%d')) as diff_years
from so7749639;

Output:

+------------+------------+------------+
| date1      | date2      | diff_years |
+------------+------------+------------+
| 2011-07-20 | 2011-07-18 |          0 |
| 2011-07-20 | 2010-07-20 |          1 |
| 2011-06-15 | 2008-04-11 |          3 |
| 2011-06-11 | 2001-10-11 |          9 |
| 2007-07-20 | 2004-07-20 |          3 |
+------------+------------+------------+

Does DBIx::Class do unions?

7 votes

I haven't found a way to do unions with DBIx::Class other than using a view and writing out the SQL manually. This seems strange to me. I feel like there should be some way to union two ResultSets without a lot of extra work because set addition and subtraction are such a core part of SQL. Is there an easier way to do unions? If not, why not?

DBIx::Class::Helper::ResultSet::SetOperations

my $rs1 = $rs->search({ foo => 'bar' });  
my $rs2 = $rs->search({ baz => 'biff' });  
for ($rs1->union($rs2)->all) { ... }

Converting Varchar to NVarchar?

6 votes

I know when converting from nvarchar to varchar, some data will be lost/changed.

However, are there any risks of data changing when converting a varchar data type to an nvarchar?

nvarchar stores unicode characters which are twice the size of varchar characters. So as long as your nvarchar is atleast twice the length of your vchar this will not be a problem.

Converting the other way could still be achieved providing you have not used any characters outside the ASCII character range (i.e. you do not have Unicode characters)

In short, make sure your nvarchar lengths are twice the size of your largest varchar value and then make the change

Getting all consecutive rows differing by certain value?

6 votes

I am trying to get my head around doing this as it involves comparison of consecutive rows. I am trying to group values that differ by a certain number. For instance, let us say I have this table:

CREATE TABLE #TEMP (A int, B int)

-- Sample table
INSERT INTO #TEMP VALUES 
(3,1), 
(3,2), 
(3,3),
(3,4),
(5,1),
(6,1),
(7,2),
(8,3),
(8,4),
(8,5),
(8,6)

SELECT * FROM #TEMP

DROP TABLE #TEMP

And let us say I have to group all values that differ by 1 having the same value for A. Then I am trying to get an output like this:

A B GroupNo
3 1 1
3 2 1
3 3 1
3 4 1
5 1 2
6 1 3
7 2 4
8 3 5
8 4 5
8 5 5
8 6 5

(3,1) (3,2) (3,3) (3,4) and (8,3) (8,4) (8,5) (8,6) have been put into the same group because they differ by a value 1. I will first show my attempt:

CREATE TABLE #TEMP (A int, B int)

-- Sample table
INSERT INTO #TEMP VALUES 
(3,1), (3,2), (3,3), (3,4), (5,1), (6,1), (7,2),
(8,3), (8,4), (8,5), (8,6)

-- Assign row numbers and perform a left join
-- so that we can compare consecutive rows
SELECT ROW_NUMBER() OVER (ORDER BY A ASC) ID, * 
INTO #TEMP2
FROM #TEMP

;WITH CTE AS
(
    SELECT X.A XA, X.B XB, Y.A YA, Y.B YB
    FROM #TEMP2 X
    LEFT JOIN #TEMP2 Y
    ON X.ID = Y.ID - 1
    WHERE X.A = Y.A AND
    X.B = Y.B - 1
)
SELECT XA, XB
INTO #GROUPS
FROM CTE
UNION 
SELECT YA, YB
FROM CTE
ORDER BY XA ASC 

-- Finally assign group numbers
SELECT X.XA, X.XB, Y.GID
FROM #GROUPS X
INNER JOIN
(SELECT XA, ROW_NUMBER() OVER (ORDER BY XA ASC) GID
    FROM #GROUPS Y
    GROUP BY XA
) Y
ON X.XA = Y.XA

DROP TABLE #TEMP
DROP TABLE #TEMP2
DROP TABLE #GROUPS

I will be doing this on a large table (about 30 million rows) so I was hoping there is a better way of doing this for arbitrary values (for instance, not just differing by 1, but it could be 2 or 3 which I will incorporate later into a procedure). Any suggestions on whether my approach is bug-free and if it can be improved?

declare @Diff int = 1

;with C as
(
  select A, 
         B,
         row_number() over(partition by A order by B) as rn
  from #TEMP
),
R as
(
  select C.A,
         C.B,
         1 as G,
         C.rn
  from C
  where C.rn = 1
  union all
  select C.A,
         C.B,
         G + case when C.B-R.B <= @Diff 
               then 0
               else 1
             end,
         C.rn
  from C
    inner join R
       on R.rn + 1 = C.rn and
          R.A = C.A       
)
select A,
       B,
       dense_rank() over(order by A, G) as G
from R
order by A, G

better way of using a single parameter multiple times in c#

6 votes

I'm new in using prepared statements for querying data from the database and I'm experiencing problems implementing parameters for c# specifically OracleParameters.

So let's say I have the following SQL:

    string sql = "select * from table1 t1, table2 t2 where t1.columnX = @parm and t2.columnY = @parm"

And the code is this:

    OracleCommand cmd = new OracleCommand(sql, conn);
    cmd.Parameters.Add(new OracleParameter("@parm", strParm));

The problem is when the cmd gets executed t1.columnX gets the value of strParm but when t2.columnY is just about to get the value of strParm, it throws an "ORA-01008: not all variables bound" exception.

It seems to me that the parameter gets to be substituted only once even though that parameter is seen somewhere else in the sql.

One solution I tried and works for me is this:

    OracleCommand cmd = new OracleCommand(sql, conn);
    cmd.Parameters.Add(new OracleParameter("@parm", strParm));
    cmd.Parameters.Add(new OracleParameter("@parm", strParm));

Another solution is this:

    OracleCommand cmd = new OracleCommand(sql, conn);
    cmd.Parameters.Add(new OracleParameter("@parm1", strParm));
    cmd.Parameters.Add(new OracleParameter("@parm2", strParm));

and the modified sql is this:

    string sql = "select * from table1 t1, table2 t2 where t1.columnX = @parm1 and t2.columnY = @parm2" 

The question is, is there a better way of doing this so that I don't have to add another parameter with the same value.

Note: I've just simplified the query above to show that @parm is used in several parts of the query. In real life this query has used the same parameter several times and its a pain to have to add more than one parameter of the same name and value.

I've had the same issue before and IIRC this solved it:

cmd.BindByName = true;

EDIT: I've just re-checked and this does allow you to set the parameter value once even though the parameter may be specified multiple times in the query. The only other thing that I'm doing differently to you is that I specify the parameter names with a leading :, for example :param.

How to query count over 2 columns

6 votes

I'm trying to write a sql query that shows how often two teams have played against each other.

Id | Team1 | Team2 | Date 
1  |   A   |   B   | 25/5/11
2  |   B   |   A   | 26/5/11
3  |   A   |   C   | 27/5/11
4  |   C   |   B   | 28/5/11
5  |   A   |   B   | 28/5/11

result should be:

A vs B => 3
A vs C => 1
C vs B => 1 

Counting A-B and B-A as different is an easy query. But I can't get them to be counted together.

Any suggestions?

SELECT Team1, Team2, SUM(num) FROM (
   SELECT Team1, Team2, COUNT(*) num
   FROM table_name
   GROUP BY Team1, Team2
  UNION ALL
   SELECT Team2, Team1, COUNT(*) num
   FROM table_name
   GROUP BY Team2, Team1
) combined
WHERE Team1 < Team2
GROUP BY Team1, Team2

Edit: Updated to reverse teams when necessary.

Note: This will run a lot faster than the versions using CASE you have been given in other answers because it will make full use of indexes.

Edit2: Moved the where to be even faster with indexes.

Oracle: Indexing a subset of rows of a table

5 votes

I have a table which has active an inactive entries, active = 1 for active and active = 0 for inactive.

I have a variety of indexes on this table, but I only need the indexes maintained for active entries, as the application only queries against active data. Inactive data needs to be kept because it can become active again, but this is generally only done with bulk updates, which wouldn't use an index anyway.

I'm noticing indexing the inactive entries (of there are increasingly more than active entries) takes quite a bit of space.

Is there a way in Oracle (10g) to do something like this:

create index an_idx on tab (active, col1, col2, ... , coln) where active = 1?

Previous attempt:

I tried using a function based index to set the first column to null when active = 0 like so:

create index an_idx on tab (decode(active, 1, 1, null), col1, col2, ... , coln)

But Oracle still seems to index the inactive columns in this case.

Partition the table by ACTIVE, create local indexes, and make the indexes for the inactive partitions UNUSABLE. This will eliminate the time spent on indexing inactive data.

create table tab(active number, col1 number, col2 number, col3 number)
    partition by list(active) 
    (partition tab_active values(1), partition tab_inactive values(0));

create index tab_index1 on tab(col1) local;

alter index tab_index1 modify partition tab_inactive unusable;

But there are some potential downsides to this approach:

  • Not all types of indexes can be unusable.
  • It's not normal to have unusable objects in the database. People will likely complain about it or assume it's a bug and rebuild it.
  • Some operations, such as truncate, will automatically make the indexes usable again.

SQL: eliminate the selection of redundant records that occur within X minutes

4 votes

DB in use is Firebird 2.1,in case not familiar with here is select statement sql ref:
http://ibexpert.net/ibe/index.php?n=Doc.DataRetrieval
functions ref: http://www.firebirdsql.org/file/documentation/reference_manuals/reference_material/html/langrefupd21.html

i will be happy with any sql slang [i'll convert it].

table schema:

CREATE TABLE EVENT_MASTER (
EVENT_ID                BIGINT NOT NULL,
EVENT_TIME              BIGINT NOT NULL,
DATA_F1                 VARCHAR(40),
DATA_F2                 VARCHAR(40),
PRIMARY KEY (EVENT_ID)
);

the bad news is EVENT_TIME stored as seconds elapsed since the Epoch.

data samples:

"EVENT_ID","EVENT_TIME","DATA_F1","DATA_F2"
25327,1297824698,"8604","A"
25328,1297824770,"8604","I"
25329,1297824773,"8604","A"
25330,1297824793,"8604","A"
25331,1297824809,"8604","1"
25332,1297824811,"8604","GREY"
25333,1297824812,"8604","A"
25334,1297824825,"8604","GREY"
25335,1297824831,"8604","A"
25336,1297824833,"8604","GREY"
25337,1297824838,"8604","A"
25338,1297824840,"8604","1"
25339,1297824850,"8604","A"
25340,1297824864,"8604","A"
25341,1297824875,"8804","GREY" //notice DATA_F1 is different
25342,1297824876,"8604","G"
25343,1297824877,"8604","A"
25344,1297824880,"8604","GREY"
25345,1297824895,"8604","1"
25346,1297824899,"8604","A"
25347,1297824918,"8604","GREY"
25348,1297824930,"8604","YELLOW"
25349,1297824939,"8604","GREY"
25350,1297824940,"8604",""
25351,1297824944,"8604","A"
25352,1297824945,"8604","1"
25353,1297824954,"8604","B"
25354,1297824958,"8604",""
25355,1297824964,"8604","1"
25356,1297824966,"8604","GREY"
25357,1297824974,"8604","1"
25358,1297824981,"8604","GREY"
25359,1297824983,"8604",""
25360,1297824998,"8604","GREY"
25361,1297825003,"8604","2"
25362,1297825009,"8604","G"
25363,1297825018,"8604","GREY"
25364,1297825026,"8604","F"
25365,1297825045,"8604","GREY"
25366,1297825046,"8604","1"

expected output:
distinct "DATA_F1","DATA_F2" rows within X minutes according to EVENT_TIME: like:

25341,1297824875,"8804","GREY"
25327,1297824698,"8604","A"
25328,1297824770,"8604","I"
25332,1297824811,"8604","GREY"
25348,1297824930,"8604","YELLOW"
..etc

Requirements: eliminate the selection of redundant records that occur within 5 minutes [range calcs according to EVENT_TIME column].

last i'm trying follow this pattern:

SELECT * FROM EVENT_MASTER inner join (
SELECT distinct  DATA_F1, DATA_F2 FROM EVENT_MASTER where /*the hard stuff that i need help with: (EVENT_TIME difference within X minutes)*/
) as RemovedDup ON /*EVENT_MASTER.EVENT_ID = problem is i cant select RemovedDup ID otherwise distinct becomes useless!!*/

please help ASAP.

thanks,

EDIT

adding output based on Andrei K. answer:

25331,1297824809,"8604","1"
25327,1297824698,"8604","A"
25342,1297824876,"8604","G"
25332,1297824811,"8604","GREY"
25328,1297824770,"8604","I"
25341,1297824875,"8804","GREY"
25350,1297824940,"8604",""
25352,1297824945,"8604","1" /*bug: time still within 300 seconds, this same as first record*/
25361,1297825003,"8604","2"
25351,1297824944,"8604","A"
25353,1297824954,"8604","B"
25364,1297825026,"8604","F"
25362,1297825009,"8604","G"
25347,1297824918,"8604","GREY"
25372,1297825087,"8604","ORANGE"
25348,1297824930,"8604","YELLOW"
25382,1297825216,"8604","1"
25387,1297825270,"8604","B"
25394,1297825355,"8604","BLUE"
25381,1297825211,"8604","GREY"

EDIT 2: Russell query output: not good output and its very very slow.

1297824698,"8604","A"
1297824770,"8604","I"
1297824809,"8604","1"
1297824811,"8604","GREY"
1297824825,"8604","GREY"
1297824840,"8604","1"
1297824875,"8804","GREY"
1297824876,"8604","G"
1297824880,"8604","GREY"
1297824918,"8604","GREY"
1297824930,"8604","YELLOW"
1297824939,"8604","GREY"
1297824940,"8604",""
1297824945,"8604","1"
1297824954,"8604","B"
1297824964,"8604","1"
1297824998,"8604","GREY"
1297825003,"8604","2"
1297825018,"8604","GREY"
1297825026,"8604","F"
1297825045,"8604","GREY"
1297825046,"8604","1"
1297825063,"8604","1"
1297825079,"8604","GREY"
1297825087,"8604","ORANGE"
1297825094,"8604","GREY"
1297825100,"8604","1"
1297825133,"8604","GREY"
1297825176,"8604","GREY"
1297825216,"8604","1"

EDIT 3:

based on Russell request here is: all rows WHERE DATA_F1 = '8604' AND DATA_F2 = 'GREY'

25332,1297824811,"8604","GREY"
25334,1297824825,"8604","GREY"
25336,1297824833,"8604","GREY"
25344,1297824880,"8604","GREY"
25347,1297824918,"8604","GREY"
25349,1297824939,"8604","GREY"
25356,1297824966,"8604","GREY"
25358,1297824981,"8604","GREY"
25360,1297824998,"8604","GREY"
25363,1297825018,"8604","GREY"
25365,1297825045,"8604","GREY"
25367,1297825059,"8604","GREY"
25371,1297825079,"8604","GREY"
25373,1297825094,"8604","GREY"
25376,1297825116,"8604","GREY"
25378,1297825133,"8604","GREY"
25380,1297825176,"8604","GREY"
25381,1297825211,"8604","GREY"
25384,1297825234,"8604","GREY"
25389,1297825286,"8604","GREY"
25390,1297825314,"8604","GREY"
25391,1297825323,"8604","GREY"
25393,1297825343,"8604","GREY"
25396,1297825370,"8604","GREY"
25397,1297825387,"8604","GREY"
25399,1297825416,"8604","GREY"
25401,1297825436,"8604","GREY"
25402,1297825445,"8604","GREY"
25404,1297825454,"8604","GREY"
50282,1299137344,"8604","GREY"
380151,1309849420,"8604","GREY"

As of this moment [Oct 11,2011 5:00 AM GMT] no absolute correct answer posted, and Andrei K. answer still the best try among others. so sql experts please help me find the solution otherwise i'll start to think that sql is not capable of handling the question requirements! Is it??

remark: event_time is not unique so multiple events can occur at the same second.

You'd need a very nasty recursive query to accomplish this in purely "functional" way. I don't profess to be clever enough to construct such a query, let alone make it performant.

On the other hand, allowing side-effects (i.e. temporary table) significantly simplifies things. You should even be able to make it fairly quick by adding appropriate indexes on the temporary table (not shown here). Here is the actual SQL:

CREATE GLOBAL TEMPORARY TABLE EVENT_MASTER_TMP (
    EVENT_ID                BIGINT NOT NULL,
    EVENT_TIME              BIGINT NOT NULL,
    DATA_F1                 VARCHAR(40),
    DATA_F2                 VARCHAR(40),
    PRIMARY KEY (EVENT_ID)
);

INSERT INTO EVENT_MASTER_TMP
SELECT * FROM
    (SELECT * FROM EVENT_MASTER ORDER BY EVENT_TIME) E
WHERE
    NOT EXISTS (
        SELECT *
        FROM EVENT_MASTER_TMP T
        WHERE
            E.DATA_F1 = T.DATA_F1
            AND E.DATA_F2 = T.DATA_F2
            AND E.EVENT_TIME - T.EVENT_TIME <= 5*60
    );

SELECT * FROM EVENT_MASTER_TMP;

In plain English:

  • Go through events from older to newer,
  • for each event, check if it is redundant relative to some row that is already in the temporary table
  • and if not, insert it in the temporary table, so it can be used as a criteria for the remaining events.

Executing this on your test data yields:

25327   1297824698  8604    A
25328   1297824770  8604    I
25331   1297824809  8604    1
25332   1297824811  8604    GREY
25341   1297824875  8804    GREY
25342   1297824876  8604    G
25348   1297824930  8604    YELLOW
25350   1297824940  8604    
25353   1297824954  8604    B
25361   1297825003  8604    2
25364   1297825026  8604    F

Lowering the time threshold from 5*60 to, say, 233, yield this:

25327   1297824698  8604    A
25328   1297824770  8604    I
25331   1297824809  8604    1
25332   1297824811  8604    GREY
25341   1297824875  8804    GREY
25342   1297824876  8604    G
25348   1297824930  8604    YELLOW
25350   1297824940  8604    
25351   1297824944  8604    A       <-- 246s difference
25353   1297824954  8604    B
25361   1297825003  8604    2
25364   1297825026  8604    F
25365   1297825045  8604    GREY    <-- 234s difference
25366   1297825046  8604    1       <-- 237s difference

Is it a good idea to create schema types to organize the relationships of tables

4 votes

Is it a good idea to create a schema type to separate the table relationships. I guess when you are browsing the tables in SSMS you will see them group together by schema type. But is it worth the trouble? Anyone with experience with this in real world scenarios?

enter image description here

I've generally found that to be more of a hassle than any help it's provided. What do you do with tables that are relevant to multiple areas? What happens when a table seems to belong to one area but later migrates to another area of the application? Do you change its schema and refactor all of your code?

I have used multiple schemata to make delineations when there is a VERY clear boundary between objects, but usually not something like what you have in your diagram. One example is objects which are used just for DBA support. I might put those into their own schema if they aren't used by the actual application itself.

Remove values in comma separated list from database

4 votes

I have a table in my MySQL database called 'children'. In that table is a row called 'wishes' (a comma separated list of the child's wishlist items). I need to be able to update that list so that it only removes one value. i.e. the list = Size 12 regular jeans, Surfboard, Red Sox Baseball Cap; I want to remove Surfboard.

My query right now looks like this

$select = mysql_query('SELECT * FROM children WHERE caseNumber="'.$caseNum.'" LIMIT 1 ');
    $row = mysql_fetch_array($select);

    foreach ($wish as $w) {
        $allWishes = $row['wishes'];
        $newWishes = str_replace($w, '', $allWishes);
        $update = mysql_query("UPDATE children SET wishes='$newWishes' WHERE caseNum='".$caseNum."'");
}

But the UPDATE query isn't removing anything. How can I do what I need?

Using these user-defined REGEXP_REPLACE() functions, you may be able to replace it with an empty string:

UPDATE children SET wishes = REGEXP_REPLACE(wishes, '(,(\s)?)?Surfboard', '') WHERE caseNum='whatever';

Unfortunately, you cannot just use plain old REPLACE() because you don't know where in the string 'Surfboard' appears. In fact, the regex above would probably need additional tweaking if 'Surfboard' occurs at the beginning or end.

Perhaps you could trim off leading and trailing commas left over like this:

UPDATE children SET wishes = TRIM(BOTH ',' FROM REGEXP_REPLACE(wishes, '(,(\s)?)?Surfboard', '')) WHERE caseNum='whatever';

So what's going on here? The regex removes 'Surfboard' plus an optional comma & space before it. Then the surrounding TRIM() function eliminates a possible leading comma in case 'Surfboard' occurred at the beginning of the string. That could probably be handled by the regex as well, but frankly, I'm too tired to puzzle it out.

Note, I've never used these myself and cannot vouch for their effectiveness or robustness, but it is a place to start. And, as others are mentioning in the comments, you really should have these in a normalized wishlist table, rather than as a comma-separated string.

Update

Thinking about this more, I'm more partial to just forcing the use of built-in REPLACE() and then cleaning out the extra comma where you may get two commas in a row. This is looking for two commas side by side, as though there had been no spaces separating your original list items. If the items had been separated by commas and spaces, change ',,' to ', ,' in the outer REPLACE() call.

UPDATE children SET wishes = TRIM(BOTH ',' FROM REPLACE(REPLACE(wishes, 'Surfboard', ''), ',,', ',')) WHERE caseNum='whatever';