Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
AskSQLTeam
Ask SQLTeam Question
0 Posts |
Posted - 2005-09-22 : 08:22:49
|
| Greg Cook writes "HI,i have an issue that i hope you can help with.I have two tables. which i need to join together, its a 1 : N relationship. egTable1=======IDField1Field2Table 2=======ID,Table1IDTimeStamp,FieldXFieldYTable one can store say a Person, there is 1 row per Person.Table 2 stores events for each person, there could be multiple rows per person.What i want to do is a select where i return one row of data for a Person and i return the Last event that person performed ( based on the TimeStamp )So in in general i do an inner join from table1 to table2 where table1.ID = table2.table1ID, The problem is this join would return multiple rows for each person. One way to fix this is to do the followingSelect tb1.Field1, tbl2.Field2, tbl1.ID, tbl2.FieldX, tbl2.FieldY From tbl1 inner join ( select ID, FieldY, FieldY, Table1ID from Table2 inner Join ( Select Table1ID, Max(TimeStamp) from Table2 Group BY Table1ID) MaxID on Table2.Table1ID = MaxID.TableID and Table2.TimeStamp = MaxID.TimeStamp) as tb2 on tbl1.ID = tb2.Table1IDThe code above is rough and might not work 100%, but essentially im doing a sub select to get the Max TimeStamp for each ID. I Then Join back to the same table based on the Time stamp and the Table1ID so that i can select the additional fields i want ( couldnt return these fields in the group by ).This should return 1 row of data to table 2 which has a Max TimeStamp for each person as well as the other fields that table holds. This is then joined to the Table1 and gives me my final result of one row per Person that has the Last Event that person did.Is there another way to do this.. this seems very messy and required self joins. Any advice on how to fix this would be appreciatedThanks" |
|
|
DonAtWork
Master Smack Fu Yak Hacker
2167 Posts |
Posted - 2005-09-22 : 11:20:25
|
Give this a shotcreate table dbo.tbl1 ([ID] int IDENTITY(1,1) PRIMARY KEY CLUSTERED, Field1 varchar(10) , Field2 varchar(10))create table dbo.tbl2 ([ID] int IDENTITY(1,1) PRIMARY KEY CLUSTERED, Table1ID int , [TimeStamp] timestamp, FieldX varchar(10), FieldY varchar(10))insert into tbl1 (Field1,Field2) values ('foo','Bar') insert into tbl1 (Field1,Field2) values ('blah','up') insert into tbl1 (Field1,Field2) values ('say','snoogie') insert into tbl1 (Field1,Field2) values ('what','yes') insert into tbl1 (Field1,Field2) values ('tool','no') insert into tbl2 (Table1ID,FieldX,FieldY) values (1,'Yo','Gah')insert into tbl2 (Table1ID,FieldX,FieldY) values (1,'Blea','nah')insert into tbl2 (Table1ID,FieldX,FieldY) values (1,'Last','Insert')insert into tbl2 (Table1ID,FieldX,FieldY) values (2,'Yo','Gah')insert into tbl2 (Table1ID,FieldX,FieldY) values (2,'Last','Insert')insert into tbl2 (Table1ID,FieldX,FieldY) values (3,'Yo','Gah')insert into tbl2 (Table1ID,FieldX,FieldY) values (3,'Yo','Gah')insert into tbl2 (Table1ID,FieldX,FieldY) values (3,'Last','Insert')Select tbl1.Field1, tbl1.Field2, tbl1.ID, tbl2.FieldX, tbl2.FieldY , tbl2.[TimeStamp]From tbl1 inner join tbl2 on tbl1.ID = tbl2.Table1IDwhere tbl2.[TimeStamp] in (select [TimeStamp] from ( select table1id, max([TimeStamp]) AS [TimeStamp] from tbl2 group by table1id) G) drop table tbl1drop table tbl2EDIT: remove extraneous select Help us help YOU!Read this blog entry for more details: http://weblogs.sqlteam.com/brettk/archive/2005/05/25.aspx*need more coffee*SELECT * FROM Users WHERE CLUE > 0(0 row(s) affected) |
 |
|
|
Xerxes
Aged Yak Warrior
666 Posts |
Posted - 2005-09-22 : 12:36:48
|
Don, This is a terrific solution ! But I'm just being nosy.....yet I have two questions: 1.) Why use Primary Key Clustered? 2.) Why Create an ID when the Primary Key could be the Name?Nice work!Semper fi, XERXES, USMC(Ret.)------------------------------------------------------The Marine Corps taught me everything but SQL! |
 |
|
|
DonAtWork
Master Smack Fu Yak Hacker
2167 Posts |
Posted - 2005-09-22 : 14:27:32
|
1) habit 2) just trying to construct the tables from how the question asker described them.Help us help YOU!Read this blog entry for more details: http://weblogs.sqlteam.com/brettk/archive/2005/05/25.aspx*need more coffee*SELECT * FROM Users WHERE CLUE > 0(0 row(s) affected) |
 |
|
|
Xerxes
Aged Yak Warrior
666 Posts |
Posted - 2005-09-22 : 14:41:46
|
quote: Originally posted by DonAtWork 1) habit 2) just trying to construct the tables from how the question asker described them.Help us help YOU!Read this blog entry for more details: http://weblogs.sqlteam.com/brettk/archive/2005/05/25.aspx*need more coffee*SELECT * FROM Users WHERE CLUE > 0(0 row(s) affected)
That's what I thought !Semper fi, XERXES, USMC(Ret.)------------------------------------------------------The Marine Corps taught me everything but SQL! |
 |
|
|
DonAtWork
Master Smack Fu Yak Hacker
2167 Posts |
|
|
annbythebay
Starting Member
1 Post |
Posted - 2007-10-31 : 16:53:27
|
| This is extremely helpful, I've been looking for a way to pull just one row of principal information per account, but needed two fields: first and last name so the min function wouldn't work. e.g. given these two principalsAnn ZackXerxes Browneither one would do, but using min(firstname) and min(lastname) yields Ann Brown!Your code lets me do exactly what I've been trying to do! Thanks! |
 |
|
|
arajderkar
Starting Member
2 Posts |
Posted - 2007-11-01 : 10:26:36
|
| Hi Don, I read all the posts on this page. Your work is great but I think the query does not gaurantee to return the expected result.you have used timestamp from table 2, which is not a unique field. Thi might result in more than one records for a single person. (a persons second most latest timestamp could be equal to other persons latest one). RegardsAks |
 |
|
|
Reporter
Starting Member
48 Posts |
Posted - 2007-11-01 : 11:36:57
|
| I have the same problemIt's my solution. However it's also far form perfection.select v2.* from(select v1.Table1ID,v1.TimeStamp,ID=max(v2.ID) from(select Table1ID,TimeStamp=max(TimeStamp) from table2 group by Table1ID) as v1 inner jointable2 v2 on v1.Table1ID=v2.Table1ID and v1.TimeStamp=v2.TimeStampgroup by v1.Table1ID,v1.TimeStamp) as v1 inner jointable2 v2 on v1.Table1ID=v2.Table1ID and v1.TimeStamp=v2.TimeStamp and v1.ID=v2.IDThe number of nest is defined by primary keys of table2.But it's work 100%. And it's not depended on the additional (unkeys) fields. |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2007-11-01 : 13:56:51
|
To build on Dons sample, you may (or may not) be able to gain some performance by changing the IN clause to an INNER JOIN:SELECT tbl1.Field1, tbl1.Field2, tbl1.ID, tbl2.FieldX, tbl2.FieldY , tbl2.[TimeStamp]FROM tbl1 INNER JOIN tbl2 ON tbl1.ID = tbl2.Table1IDINNER JOIN ( SELECT table1id, MAX([TimeStamp]) AS [TimeStamp] FROM tbl2 GROUP BY table1id ) AS Temp ON tbl2.Table1ID = Temp.Table1ID AND tbl2.TimeStamp = Temp.TimeStamp |
 |
|
|
arajderkar
Starting Member
2 Posts |
Posted - 2007-11-02 : 03:50:06
|
I think Lamprey has provided with a better solution and will give a 100% gaurantee of the work. It will also be faster than using IN . But I would like to suggest some changes in his code.SELECT tbl1.Field1, tbl1.Field2, tbl1.ID, tbl2.FieldX, tbl2.FieldY , tbl2.[TimeStamp]FROM tbl1 INNER JOIN ( SELECT table1Id, MAX([TimeStamp]) AS [TimeStamp] FROM tbl2 GROUP BY table1id ) AS tbl2 ON tbl1.ID = tbl2.Table1ID I think Lamprey did use the second table twice which could be done using one join itself. Please let me know if I am wrong or if I missed something. |
 |
|
|
Reporter
Starting Member
48 Posts |
Posted - 2007-11-02 : 04:44:36
|
quote: Originally posted by arajderkar
SELECT tbl1.Field1, tbl1.Field2, tbl1.ID, tbl2.FieldX, tbl2.FieldY , tbl2.[TimeStamp]FROM tbl1 INNER JOIN ( SELECT table1Id, MAX([TimeStamp]) AS [TimeStamp] FROM tbl2 GROUP BY table1id ) AS tbl2 ON tbl1.ID = tbl2.Table1ID
There is a error in this code - there are no fields tbl2.FieldX and tbl2.FieldY at the table named as tbl2 |
 |
|
|
Reporter
Starting Member
48 Posts |
Posted - 2007-11-02 : 07:10:38
|
| Let's simplify this problem. There is no doubt that it's very easy to join two suggested table.Let's choose second table only and formalize the problem like this.declare @t table (pk1 int,pk2 int,Start datetime,f1 char(10),f2 char(10))-- where pk1,pk2 primary key and the number of other fields is not significantinsert @tselect 1,1,convert(datetime,'01.01.2007'),'f11','f21' union allselect 1,2,convert(datetime,'02.01.2007'),'f12','f22' union allselect 1,3,convert(datetime,'03.01.2007'),'f13','f23' union allselect 2,1,convert(datetime,'04.01.2007'),'f14','f24' union allselect 2,2,convert(datetime,'05.01.2007'),'f15','f25' union allselect 2,3,convert(datetime,'06.01.2007'),'f16','f26' union allselect 3,1,convert(datetime,'07.01.2007'),'f17','f27' union allselect 3,2,convert(datetime,'09.01.2007'),'f18','f28' union allselect 3,3,convert(datetime,'09.01.2007'),'f19','f29'what is wantedpk1 pk2 Start f1 f2 ----------- ----------- --------------- ---------- ---------- 1 3 2007-03-01 f13 f23 2 3 2007-06-01 f16 f26 3 3 2007-09-01 f19 f29here is my methodselect v2.* from(select v1.pk1,pk2=max(v2.pk2),v1.Start from(select pk1,Start=max(Start) from @t group by pk1) as v1inner join @t v2 on v1.pk1=v2.pk1 and v1.Start=v2.Start group by v1.pk1,v1.Start) as v1inner join @t v2 on v1.pk1=v2.pk1 and v1.pk2=v2.pk2Is there any progressiver way to obtain it? |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2007-11-02 : 14:25:36
|
Reporter, I was not 100% clear on teh requirement. Is the goal to get the "newest" row for each pk1 based on Start and then on pk2 or just by pk2? I asssume it is by Start then pk2, but the data supports using just pk2.Here are a couple of methods that work under different scenarios:-- Using just pk2SELECT T.pk1, T.pk2, T.Start, T.f1, T.f2FROM @T AS TINNER JOIN ( SELECT pk1, MAX(pk2) AS pk2 FROM @T AS T1 GROUP BY pk1 ) AS Temp ON T.pk1 = Temp.pk1 AND T.pk2 = Temp.pk2--Using Start then MAXSELECT T.pk1, MAX(T.pk2) AS pk2, MAX(T.Start) AS Start, MAX(T.f1) AS f1, MAX(T.f2) AS f2FROM @T AS TINNER JOIN ( SELECT pk1, MAX(Start) AS Start FROM @T AS T1 GROUP BY pk1 ) AS Temp ON T.pk1 = Temp.pk1 AND T.Start = Temp.StartGROUP BY T.pk1-- For Grins - The 2005 Way - Using Start then pk2SELECT Temp.pk1, Temp.pk2, Temp.Start, Temp.f1, Temp.f2FROM ( SELECT T.pk1, T.pk2, T.Start, T.f1, T.f2, ROW_NUMBER() OVER (PARTITION BY t.pk1 ORDER BY t.Start DESC, t.pk2 DESC) AS RowNum FROM @T AS T ) AS TempWHERE RowNum = 1 |
 |
|
|
Reporter
Starting Member
48 Posts |
Posted - 2007-11-06 : 05:42:35
|
| pk1 is the person identifier and the main grouping condition (PersonID)pk2 is the event for each personpk1+pk2 is the primary keyI should like to obtain one "newest" row for each person (pk1) based on Start.And if there are equal datetime points then based on pk2 also.Here is the example for person=1pk1, pk2, Start, f1, ... , fN1, 7, 01.01.2007, yes, ... , 1111, 3, 15.01.2007, no, ... , 2221, 5, 20.01.2007, yes, ... , 1211, 4, 07.01.2007, yes, ... , 2121, 2, 20.01.2007, no, ... , 221Expected result1, 5, 20.01.2007, yes, ... , 121An aggregate function using such as MAX(T.f1) and MAX(T.f2) are not very good solution.At the common case I can get all of them by select * from @t where pk1=1 and pk2=5In order to obtain the primary key (1,5) I use the next queryselect v1.pk1,pk2=max(v2.pk2),v1.Start from(select pk1,Start=max(Start) from @t group by pk1) as v1 inner join @t v2 on v1.pk1=v2.pk2 and v1.Start=v2.Startgroup by v1.pk1,v1.StartIs this only one possible method for the Server 2000? |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-11-06 : 07:07:12
|
No.-- prepare sample datacreate table #t (pk1 int, pk2 int, start datetime, f1 char(10), f2 char(10))set dateformat dmyinsert #tselect 1, 1, '01.01.2007', 'f11', 'f21' union allselect 1, 2, '02.01.2007', 'f12', 'f22' union allselect 1, 3, '03.01.2007', 'f13', 'f23' union allselect 2, 1, '04.01.2007', 'f14', 'f24' union allselect 2, 2, '05.01.2007', 'f15', 'f25' union allselect 2, 3, '06.01.2007', 'f16', 'f26' union allselect 3, 1, '07.01.2007', 'f17', 'f27' union allselect 3, 2, '09.01.2007', 'f18', 'f28' union allselect 3, 3, '09.01.2007', 'f19', 'f29'-- Reporter / 65% of batch / duration 1, reads 15select v2.*from ( select v1.pk1, pk2 = max(v2.pk2), v1.start from ( select pk1, start = max(start) from #t group by pk1 ) as v1 inner join #t v2 on v1.pk1 = v2.pk1 and v1.start = v2.start group by v1.pk1, v1.start ) as v1inner join #t v2 on v1.pk1 = v2.pk1 and v1.pk2 = v2.pk2-- Peso / 11% of batch / duration 0, reads 30select t1.pk1, t1.pk2, t1.start, t1.f1, t1.f2from #t as t1where exists (select * from #t as t2 where t2.pk1 = t1.pk1 group by t2.pk1 having convert(varchar, t1.start, 120) + cast(t1.pk2 as varchar) = max(convert(varchar, t2.start, 120) + cast(t2.pk2 as varchar)))-- SQL Server 2005 / 24% of batch / duration 0, reads 3select pk1, pk2, start, f1, f2from ( select pk1, pk2, start, f1, f2, row_number() over (partition by pk1 order by start desc, pk2 desc) as recid from #t ) as dwhere recid = 1-- clean updrop table #t-- Reporter-- Scan count 3, logical reads 5, physical reads 0---- Peso-- Scan count 10, logical reads 10, physical reads 0---- SQL Server 2005-- Scan count 1, logical reads 1, physical reads 0 E 12°55'05.25"N 56°04'39.16" |
 |
|
|
Reporter
Starting Member
48 Posts |
Posted - 2007-11-06 : 09:51:29
|
| Hi, Peso!select t1.pk1, t1.pk2, t1.start, t1.f1, t1.f2from #t as t1where exists (select * from #t as t2 where t2.pk1 = t1.pk1 group by t2.pk1 having convert(varchar, t1.start, 120) + cast(t1.pk2 as varchar) = max(convert(varchar, t2.start, 120) + cast(t2.pk2 as varchar)))That's good idea. This is almost what I am looking for.Here is the same with some modificationsselect t1.pk1, t1.pk2, t1.start, t1.f1, t1.f2from #t as t1where exists (select * from #t as t2 where t2.pk1 = t1.pk1 group by t2.pk1 having convert(float,t1.start)+t1.pk2=max(convert(float,t2.start)+t2.pk2))But unfortunately it's failed for datetime data type.For example, it does not work correctly under next conditionsinsert #tselect 3, 2, '09.01.2007 20:20:20.099', 'f18', 'f28' union allselect 3, 3, '09.01.2007 20:20:20.033', 'f19', 'f29'Result3 3 2007-01-09 20:20:20.033 f19 f29 instead3 2 2007-01-09 20:20:20.099 f18 f28 PS. Nevertheless, I like this query. Thank you very much. |
 |
|
|
twocents
Starting Member
1 Post |
Posted - 2007-11-26 : 16:09:29
|
| /* dummy-up some dataIF OBJECT_ID('tempdb..#Table1') IS NOT NULL DROP TABLE #Table1IF OBJECT_ID('tempdb..#Table2') IS NOT NULL DROP TABLE #Table2CREATE TABLE #Table1 ( ID int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED, Field1 varchar(50) NOT NULL, Field2 varchar(50) NOT NULL )CREATE TABLE #Table2 ( ID int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED, Table1ID int NOT NULL, "TimeStamp" datetime NOT NULL, FieldX varchar(50) NOT NULL, FieldY varchar(50) NOT NULL,-- UNIQUE (Table1ID, "TimeStamp"), -- see queries below FOREIGN KEY (Table1ID) REFERENCES #Table1 (ID) -- will be skipped anyway, but... )INSERT #Table1 (Field1, Field2)SELECT 'Alpha', 'One'UNION ALL SELECT 'Beta', 'Two'UNION ALL SELECT 'Charlie', 'Three'UNION ALL SELECT 'Delta', 'Four'UNION ALL SELECT 'Echo', 'Five'DECLARE @i intSET @i = 0WHILE @i < 5 BEGIN INSERT #Table2 (Table1ID, "TimeStamp", FieldX, FieldY) SELECT t1.ID, DATEADD(s, @i, GETDATE()), 'Event' + CAST(@i AS varchar(1)), 'E' + CAST(@i AS varchar(1)) + CAST(t1.ID AS varchar(1)) FROM #Table1 t1 SET @i = @i + 1END*/-- this is the best execution plan we've found (for our live data/indexing), but-- it requires #Table2: UNIQUE (Table1ID, "TimeStamp")-- or multiple rows are possible for a single #Table1 row-- however, if the unique constraint is actually enforced in this example-- it causes an additional key lookup (go figure)SELECT t1.ID, t1.Field1, t1.Field2, t2.TimeStamp, t2.FieldX, t2.FieldYFROM #Table1 t1 LEFT JOIN #Table2 t2 ON t1.ID = t2.Table1ID AND t2.TimeStamp = ( SELECT MAX(t.TimeStamp) FROM #Table2 t WHERE t.Table1ID = t1.ID -- subquery within "ON" clause may use "outer" table )-- this is the best execution plan we've found which always works-- if the unique constraint is actually enforced in this example,-- its query plan is comparable to the previous "optimal" query planSELECT t1.ID, t1.Field1, t1.Field2, t2.TimeStamp, t2.FieldX, t2.FieldYFROM #Table1 t1 LEFT JOIN #Table2 t2 ON t2.ID = ( SELECT TOP (1) t.ID FROM #Table2 t WHERE t.Table1ID = t1.ID ORDER BY t.TimeStamp, t.ID DESC ) |
 |
|
|
|
|
|
|
|