DB in use is Firebird 2.1,in case not familiar with here is select statement sql ref:
http://ibexpert.net/ibe/index.php?n=Doc.DataRetrieval
functions ref:
http://www.firebirdsql.org/file/documentation/reference_manuals/reference_material/html/langrefupd21.html
i will be happy with any sql slang [i'll convert it].
table schema:
CREATE TABLE EVENT_MASTER (
EVENT_ID BIGINT NOT NULL,
EVENT_TIME BIGINT NOT NULL,
DATA_F1 VARCHAR(40),
DATA_F2 VARCHAR(40),
PRIMARY KEY (EVENT_ID)
);
the bad news is EVENT_TIME stored as seconds elapsed since the Epoch.
data samples:
"EVENT_ID","EVENT_TIME","DATA_F1","DATA_F2"
25327,1297824698,"8604","A"
25328,1297824770,"8604","I"
25329,1297824773,"8604","A"
25330,1297824793,"8604","A"
25331,1297824809,"8604","1"
25332,1297824811,"8604","GREY"
25333,1297824812,"8604","A"
25334,1297824825,"8604","GREY"
25335,1297824831,"8604","A"
25336,1297824833,"8604","GREY"
25337,1297824838,"8604","A"
25338,1297824840,"8604","1"
25339,1297824850,"8604","A"
25340,1297824864,"8604","A"
25341,1297824875,"8804","GREY" //notice DATA_F1 is different
25342,1297824876,"8604","G"
25343,1297824877,"8604","A"
25344,1297824880,"8604","GREY"
25345,1297824895,"8604","1"
25346,1297824899,"8604","A"
25347,1297824918,"8604","GREY"
25348,1297824930,"8604","YELLOW"
25349,1297824939,"8604","GREY"
25350,1297824940,"8604",""
25351,1297824944,"8604","A"
25352,1297824945,"8604","1"
25353,1297824954,"8604","B"
25354,1297824958,"8604",""
25355,1297824964,"8604","1"
25356,1297824966,"8604","GREY"
25357,1297824974,"8604","1"
25358,1297824981,"8604","GREY"
25359,1297824983,"8604",""
25360,1297824998,"8604","GREY"
25361,1297825003,"8604","2"
25362,1297825009,"8604","G"
25363,1297825018,"8604","GREY"
25364,1297825026,"8604","F"
25365,1297825045,"8604","GREY"
25366,1297825046,"8604","1"
expected output:
distinct "DATA_F1","DATA_F2" rows within X minutes according to EVENT_TIME:
like:
25341,1297824875,"8804","GREY"
25327,1297824698,"8604","A"
25328,1297824770,"8604","I"
25332,1297824811,"8604","GREY"
25348,1297824930,"8604","YELLOW"
..etc
Requirements:
eliminate the selection of redundant records that occur within 5 minutes [range calcs according to EVENT_TIME column].
last i'm trying follow this pattern:
SELECT * FROM EVENT_MASTER inner join (
SELECT distinct DATA_F1, DATA_F2 FROM EVENT_MASTER where /*the hard stuff that i need help with: (EVENT_TIME difference within X minutes)*/
) as RemovedDup ON /*EVENT_MASTER.EVENT_ID = problem is i cant select RemovedDup ID otherwise distinct becomes useless!!*/
please help ASAP.
thanks,
EDIT
adding output based on Andrei K. answer:
25331,1297824809,"8604","1"
25327,1297824698,"8604","A"
25342,1297824876,"8604","G"
25332,1297824811,"8604","GREY"
25328,1297824770,"8604","I"
25341,1297824875,"8804","GREY"
25350,1297824940,"8604",""
25352,1297824945,"8604","1" /*bug: time still within 300 seconds, this same as first record*/
25361,1297825003,"8604","2"
25351,1297824944,"8604","A"
25353,1297824954,"8604","B"
25364,1297825026,"8604","F"
25362,1297825009,"8604","G"
25347,1297824918,"8604","GREY"
25372,1297825087,"8604","ORANGE"
25348,1297824930,"8604","YELLOW"
25382,1297825216,"8604","1"
25387,1297825270,"8604","B"
25394,1297825355,"8604","BLUE"
25381,1297825211,"8604","GREY"
EDIT 2:
Russell query output: not good output and its very very slow.
1297824698,"8604","A"
1297824770,"8604","I"
1297824809,"8604","1"
1297824811,"8604","GREY"
1297824825,"8604","GREY"
1297824840,"8604","1"
1297824875,"8804","GREY"
1297824876,"8604","G"
1297824880,"8604","GREY"
1297824918,"8604","GREY"
1297824930,"8604","YELLOW"
1297824939,"8604","GREY"
1297824940,"8604",""
1297824945,"8604","1"
1297824954,"8604","B"
1297824964,"8604","1"
1297824998,"8604","GREY"
1297825003,"8604","2"
1297825018,"8604","GREY"
1297825026,"8604","F"
1297825045,"8604","GREY"
1297825046,"8604","1"
1297825063,"8604","1"
1297825079,"8604","GREY"
1297825087,"8604","ORANGE"
1297825094,"8604","GREY"
1297825100,"8604","1"
1297825133,"8604","GREY"
1297825176,"8604","GREY"
1297825216,"8604","1"
EDIT 3:
based on Russell request here is: all rows WHERE DATA_F1 = '8604' AND DATA_F2 = 'GREY'
25332,1297824811,"8604","GREY"
25334,1297824825,"8604","GREY"
25336,1297824833,"8604","GREY"
25344,1297824880,"8604","GREY"
25347,1297824918,"8604","GREY"
25349,1297824939,"8604","GREY"
25356,1297824966,"8604","GREY"
25358,1297824981,"8604","GREY"
25360,1297824998,"8604","GREY"
25363,1297825018,"8604","GREY"
25365,1297825045,"8604","GREY"
25367,1297825059,"8604","GREY"
25371,1297825079,"8604","GREY"
25373,1297825094,"8604","GREY"
25376,1297825116,"8604","GREY"
25378,1297825133,"8604","GREY"
25380,1297825176,"8604","GREY"
25381,1297825211,"8604","GREY"
25384,1297825234,"8604","GREY"
25389,1297825286,"8604","GREY"
25390,1297825314,"8604","GREY"
25391,1297825323,"8604","GREY"
25393,1297825343,"8604","GREY"
25396,1297825370,"8604","GREY"
25397,1297825387,"8604","GREY"
25399,1297825416,"8604","GREY"
25401,1297825436,"8604","GREY"
25402,1297825445,"8604","GREY"
25404,1297825454,"8604","GREY"
50282,1299137344,"8604","GREY"
380151,1309849420,"8604","GREY"
As of this moment [Oct 11,2011 5:00 AM GMT] no absolute correct answer posted, and Andrei K. answer still the best try among others. so sql experts please help me find the solution otherwise i'll start to think that sql is not capable of handling the question requirements! Is it??
remark: event_time is not unique so multiple events can occur at the same second.
You'd need a very nasty recursive query to accomplish this in purely "functional" way. I don't profess to be clever enough to construct such a query, let alone make it performant.
On the other hand, allowing side-effects (i.e. temporary table) significantly simplifies things. You should even be able to make it fairly quick by adding appropriate indexes on the temporary table (not shown here). Here is the actual SQL:
CREATE GLOBAL TEMPORARY TABLE EVENT_MASTER_TMP (
EVENT_ID BIGINT NOT NULL,
EVENT_TIME BIGINT NOT NULL,
DATA_F1 VARCHAR(40),
DATA_F2 VARCHAR(40),
PRIMARY KEY (EVENT_ID)
);
INSERT INTO EVENT_MASTER_TMP
SELECT * FROM
(SELECT * FROM EVENT_MASTER ORDER BY EVENT_TIME) E
WHERE
NOT EXISTS (
SELECT *
FROM EVENT_MASTER_TMP T
WHERE
E.DATA_F1 = T.DATA_F1
AND E.DATA_F2 = T.DATA_F2
AND E.EVENT_TIME - T.EVENT_TIME <= 5*60
);
SELECT * FROM EVENT_MASTER_TMP;
In plain English:
- Go through events from older to newer,
- for each event, check if it is redundant relative to some row that is already in the temporary table
- and if not, insert it in the temporary table, so it can be used as a criteria for the remaining events.
Executing this on your test data yields:
25327 1297824698 8604 A
25328 1297824770 8604 I
25331 1297824809 8604 1
25332 1297824811 8604 GREY
25341 1297824875 8804 GREY
25342 1297824876 8604 G
25348 1297824930 8604 YELLOW
25350 1297824940 8604
25353 1297824954 8604 B
25361 1297825003 8604 2
25364 1297825026 8604 F
Lowering the time threshold from 5*60 to, say, 233, yield this:
25327 1297824698 8604 A
25328 1297824770 8604 I
25331 1297824809 8604 1
25332 1297824811 8604 GREY
25341 1297824875 8804 GREY
25342 1297824876 8604 G
25348 1297824930 8604 YELLOW
25350 1297824940 8604
25351 1297824944 8604 A <-- 246s difference
25353 1297824954 8604 B
25361 1297825003 8604 2
25364 1297825026 8604 F
25365 1297825045 8604 GREY <-- 234s difference
25366 1297825046 8604 1 <-- 237s difference