# SQL for Date Spanned Data

I was recently presented with a sql challenge. Although I came up with something that works, I'm not convinced it is the best solution. This is a very long post that is mostly question, but possibly a helpful example. The details are below.  
  
I was given a table with the following columns  
ID -- Primary Key  
DIVISION -- Identifier of a Division  
START\_DATE -- Date when the Division starts  
END\_DATE -- Date when the Division ends  
  
If you want to try this yourself, I added all the create statements and sample data creation to the end of this note.  
  
The data has an implicit concept of a "span." A Division may have many spans. The start date of a new span is typically one day after the end date of the previous span. In this case a Division has a contiguous span. Spans never overlap; that is, a single division never has a row with a start date between the start and end date of another row of the same division (and never has an end date between the start and end of another row of the same division).  
  
Below is some sample data:  
  
Id, Division, Start\_date, End\_date  
  

> 10 1 09-JUN-2009 10-JUN-2009  
> 11 1 11-JUN-2009 12-JUN-2009  
> 12 1 13-JUN-2009 14-JUN-2009 -- Note the following row is not contiguous  
> 14 1 17-JUN-2009 18-JUN-2009  
> 15 1 19-JUN-2009 20-JUN-2009

  
  
If this were the total data set then Division 1 would have 5 spans, but only two contiguous spans (09 - 14 Jun and 17 - 20 Jun).  
  
The challenge was to create a sql statement to only return contiguous spans by division. I immediately thought of a CONNECT BY but the challenge was to determine the START WITH condition.  
  
\[Update: In response to this post Alex Nuijten came up with a better method than any described below. Take a look here:  
[http://nuijten.blogspot.com/2009/06/analytic-function-finding-gaps.html](http://nuijten.blogspot.com/2009/06/analytic-function-finding-gaps.html) \]  
I came up with two ways to do this. I'm still not certain if there is a better solution. The first is far better and uses some SQL analytics to determine the START WITH condition. If you were to use a WHERE clause, it should go in the WITH section as identified. My example is below.  
  

> with span\_info as (select s.division, s.start\_date, s.end\_date  
> , (s.start\_date -lag(end\_date, 1) over (partition by s.division order by start\_date)) new\_span  
> \-- null or > 1 indicates a new contiguous span  
> , (lead(start\_date, 1) over (partition by s.division order by start\_date) - s.end\_date) end\_span  
> \-- the end\_span is not needed, just left for example  
> \-- null or > 1 indicates the end of a contiguous span  
> from spantest s  
> \-- a WHERE clause would go HERE, not in the main SELECT below  
> order by 1, 2 )  
> select division  
> , connect\_by\_root start\_date span\_start -- start of a contiguous span  
> \-- , start\_date -- start of the last span of a contiguous span  
> , end\_date -- end of the contiguous span  
> \-- , new\_span, end\_span, connect\_by\_isleaf  
> from span\_info  
> where connect\_by\_isleaf = 1  
> \-- gets only "leaf" recodes, i.e. the end of a contiguous span  
> start with nvl(new\_span,1000) > 1  
> \-- gets beginning of a contiguous span  
> connect by  
> prior end\_date + 1 = start\_date  
> \-- connects spans, the 1 here and in the "start with" can be changed together  
> and prior division = division  
> \-- ensures spans are for the same division  
> order by 1, 2

  
My second idea is to use an exists command to determine if the the first there exists a row with an end\_date equal to start\_date - 1 for the division. I can write the statement, but I'm not sure how to get the combination of the DIVISION and the START\_DATE in to the START WITH. I could do some concatenation, but I don't like that at all. The code is below, but DO NOT RUN THE QUERY; it is very slow.  
  
\-- \*\*\*\*\*\*\*\*\*\*\*\* DO NOT RUN THIS QUERY  

> select division  
> , connect\_by\_root start\_date span\_start -- start of a contiguous span  
> \-- , start\_date -- start of the last span of a contiguous span  
> , end\_date -- end of the contiguous span  
> \-- , new\_span, end\_span, connect\_by\_isleaf  
> from spantest  
> where connect\_by\_isleaf = 1  
> \-- gets only "leaf" recodes, i.e. the end of a contiguous span  
> start with to\_char(start\_date,'yyyymmdd') ||'~' || division in  
> (select to\_char(st2.start\_date,'yyyymmdd') ||'~' || st2.division  
> from spantest st2  
> where not exists  
> (select 1 from spantest st3  
> where st3.division = st2.division  
> and st3.end\_date = st2.start\_date + 1  
> and st3.id != st2.id)  
> )  
> \-- gets beginning of a contiguous span  
> connect by  
> prior end\_date + 1 = start\_date  
> \-- connects spans, the 1 here and in the "start with" can be changed together  
> and prior division = division  
> \-- ensures spans are for the same division  
> order by 1, 2

  
  
This is a lot of code and no-one other than me may ever read it. If you do, though, and have other ideas, I would love to hear them.  
  
\-- \*\*\*\*\*\*\*\*\* Create Statements and sample data  
  
CREATE TABLE "SPANTEST"  
( "ID" NUMBER NOT NULL ENABLE,  
"DIVISION" NUMBER NOT NULL ENABLE,  
"START\_DATE" DATE NOT NULL ENABLE,  
"END\_DATE" DATE NOT NULL ENABLE,  
CONSTRAINT "SPANTEST\_PK" PRIMARY KEY ("ID") ENABLE  
)  
/  
  
CREATE INDEX "SPANTEST\_IDX3" ON "SPANTEST" ("END\_DATE")  
/  
CREATE INDEX "SPANTEST\_IDX2" ON "SPANTEST" ("START\_DATE")  
/  
CREATE INDEX "SPANTEST\_IDX1" ON "SPANTEST" ("DIVISION")  
/  
  
CREATE SEQUENCE "SPANTEST\_SEQ" MINVALUE 1 MAXVALUE 999999999999999999999999999 INCREMENT BY 1 START WITH 1441 CACHE 20 NOORDER NOCYCLE  
/  
  
CREATE OR REPLACE TRIGGER "BI\_SPANTEST"  
before insert on "SPANTEST"  
for each row  
begin  
if :NEW."ID" is null then  
select "SPANTEST\_SEQ".nextval into :NEW."ID" from dual;  
end if;  
end;  
  
/  
ALTER TRIGGER "BI\_SPANTEST" ENABLE  
/  
  
  
\-- create sample data for 10 Divisions  
begin  
for i in 1..20 loop  
insert into spantest (division, start\_date, end\_date)  
select 1, trunc(sysdate + (rownum\*2 - 1)),trunc(sysdate + (rownum\*2)) from all\_tables;  
end loop;  
end;  
/  
  
\-- create span gaps on the 15th of each month  
delete from spantest where  
to\_char(start\_date,'dd') = '15'  
/