Join 2 tables in 1:N Relationship, using Max.

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

Transact-SQL (2000)

Join 2 tables in 1:N Relationship, using Max.

Author

Topic

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2005-09-22 : 08:22:49

Greg Cook writes "HI,

i have an issue that i hope you can help with.

I have two tables. which i need to join together, its a 1 : N relationship. eg

Table1
=======
ID
Field1
Field2

Table 2
=======
ID,
Table1ID
TimeStamp,
FieldX
FieldY

Table one can store say a Person, there is 1 row per Person.
Table 2 stores events for each person, there could be multiple rows per person.

What i want to do is a select where i return one row of data for a Person and i return the Last event that person performed ( based on the TimeStamp )

So in in general i do an inner join from table1 to table2 where table1.ID = table2.table1ID, The problem is this join would return multiple rows for each person. One way to fix this is to do the following

Select tb1.Field1, tbl2.Field2, tbl1.ID, tbl2.FieldX, tbl2.FieldY From tbl1 inner join ( select ID, FieldY, FieldY, Table1ID from Table2 inner Join ( Select Table1ID, Max(TimeStamp) from Table2 Group BY Table1ID) MaxID on Table2.Table1ID = MaxID.TableID and Table2.TimeStamp = MaxID.TimeStamp) as tb2 on tbl1.ID = tb2.Table1ID

The code above is rough and might not work 100%, but essentially im doing a sub select to get the Max TimeStamp for each ID. I Then Join back to the same table based on the Time stamp and the Table1ID so that i can select the additional fields i want ( couldnt return these fields in the group by ).

This should return 1 row of data to table 2 which has a Max TimeStamp for each person as well as the other fields that table holds. This is then joined to the Table1 and gives me my final result of one row per Person that has the Last Event that person did.

Is there another way to do this.. this seems very messy and required self joins. Any advice on how to fix this would be appreciated

Thanks"

DonAtWork
Master Smack Fu Yak Hacker

2167 Posts

Posted - 2005-09-22 : 11:20:25

Give this a shot


create table dbo.tbl1 ([ID] int IDENTITY(1,1) PRIMARY KEY CLUSTERED, Field1 varchar(10)
	, Field2 varchar(10))

create table dbo.tbl2 ([ID] int IDENTITY(1,1) PRIMARY KEY CLUSTERED, Table1ID int
	, [TimeStamp] timestamp, FieldX varchar(10), FieldY varchar(10))


insert into tbl1 (Field1,Field2) values ('foo','Bar') 
insert into tbl1 (Field1,Field2) values ('blah','up') 
insert into tbl1 (Field1,Field2) values ('say','snoogie') 
insert into tbl1 (Field1,Field2) values ('what','yes') 
insert into tbl1 (Field1,Field2) values ('tool','no') 


insert into tbl2 (Table1ID,FieldX,FieldY) values (1,'Yo','Gah')
insert into tbl2 (Table1ID,FieldX,FieldY) values (1,'Blea','nah')
insert into tbl2 (Table1ID,FieldX,FieldY) values (1,'Last','Insert')
insert into tbl2 (Table1ID,FieldX,FieldY) values (2,'Yo','Gah')
insert into tbl2 (Table1ID,FieldX,FieldY) values (2,'Last','Insert')
insert into tbl2 (Table1ID,FieldX,FieldY) values (3,'Yo','Gah')
insert into tbl2 (Table1ID,FieldX,FieldY) values (3,'Yo','Gah')
insert into tbl2 (Table1ID,FieldX,FieldY) values (3,'Last','Insert')

Select  tbl1.Field1, tbl1.Field2, tbl1.ID, tbl2.FieldX, tbl2.FieldY , tbl2.[TimeStamp]
From tbl1 inner join tbl2 
on tbl1.ID = tbl2.Table1ID
where tbl2.[TimeStamp] in 
(select [TimeStamp] from
	( select table1id, max([TimeStamp]) AS [TimeStamp] from tbl2 group by table1id) G
) 

drop table tbl1
drop table tbl2

EDIT: remove extraneous select

Help us help YOU!
Read this blog entry for more details: http://weblogs.sqlteam.com/brettk/archive/2005/05/25.aspx

*need more coffee*
SELECT * FROM Users WHERE CLUE > 0
(0 row(s) affected)

Xerxes
Aged Yak Warrior

666 Posts

Posted - 2005-09-22 : 12:36:48

Don,

This is a terrific solution

! But I'm just being nosy.....yet I have two questions:

1.) Why use Primary Key Clustered?

2.) Why Create an ID when the Primary Key could be the Name?

Nice work!

Semper fi,
XERXES, USMC(Ret.)
------------------------------------------------------
The Marine Corps taught me everything but SQL!

DonAtWork
Master Smack Fu Yak Hacker

2167 Posts

Posted - 2005-09-22 : 14:27:32

1) habit

2) just trying to construct the tables from how the question asker described them.

Help us help YOU!
Read this blog entry for more details: http://weblogs.sqlteam.com/brettk/archive/2005/05/25.aspx

*need more coffee*
SELECT * FROM Users WHERE CLUE > 0
(0 row(s) affected)

Xerxes
Aged Yak Warrior

666 Posts

Posted - 2005-09-22 : 14:41:46

quote:
Originally posted by DonAtWork

1) habit
2) just trying to construct the tables from how the question asker described them.

Help us help YOU!
Read this blog entry for more details: http://weblogs.sqlteam.com/brettk/archive/2005/05/25.aspx

*need more coffee*
SELECT * FROM Users WHERE CLUE > 0
(0 row(s) affected)

That's what I thought

!

Semper fi,
XERXES, USMC(Ret.)
------------------------------------------------------
The Marine Corps taught me everything but SQL!

DonAtWork
Master Smack Fu Yak Hacker

2167 Posts

Posted - 2005-09-23 : 09:02:47

Nosey Jarheads.

Help us help YOU!
Read this blog entry for more details: http://weblogs.sqlteam.com/brettk/archive/2005/05/25.aspx

*need more coffee*
SELECT * FROM Users WHERE CLUE > 0
(0 row(s) affected)

annbythebay
Starting Member

1 Post

Posted - 2007-10-31 : 16:53:27

This is extremely helpful, I've been looking for a way to pull just one row of principal information per account, but needed two fields: first and last name so the min function wouldn't work.

e.g. given these two principals
Ann Zack
Xerxes Brown

either one would do, but using min(firstname) and min(lastname) yields Ann Brown!

Your code lets me do exactly what I've been trying to do! Thanks!

arajderkar
Starting Member

2 Posts

Posted - 2007-11-01 : 10:26:36

Hi Don,
I read all the posts on this page. Your work is great but I think the query does not gaurantee to return the expected result.
you have used timestamp from table 2, which is not a unique field. Thi might result in more than one records for a single person. (a persons second most latest timestamp could be equal to other persons latest one).

Regards
Aks

Reporter
Starting Member

48 Posts

Posted - 2007-11-01 : 11:36:57

I have the same problem

It's my solution. However it's also far form perfection.

select v2.* from
(select v1.Table1ID,v1.TimeStamp,ID=max(v2.ID) from
(select Table1ID,TimeStamp=max(TimeStamp) from table2 group by Table1ID) as v1 inner join
table2 v2 on v1.Table1ID=v2.Table1ID and v1.TimeStamp=v2.TimeStamp
group by v1.Table1ID,v1.TimeStamp) as v1 inner join
table2 v2 on v1.Table1ID=v2.Table1ID and v1.TimeStamp=v2.TimeStamp and v1.ID=v2.ID

The number of nest is defined by primary keys of table2.
But it's work 100%. And it's not depended on the additional (unkeys) fields.

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2007-11-01 : 13:56:51

To build on Dons sample, you may (or may not) be able to gain some performance by changing the IN clause to an INNER JOIN:

SELECT  
	tbl1.Field1, 
	tbl1.Field2, 
	tbl1.ID, 
	tbl2.FieldX, 
	tbl2.FieldY , 
	tbl2.[TimeStamp]
FROM 
	tbl1 
INNER JOIN 
	tbl2 
	ON tbl1.ID = tbl2.Table1ID
INNER JOIN
	(
		SELECT 
			table1id, 
			MAX([TimeStamp]) AS [TimeStamp] 
		FROM 
			tbl2 
		GROUP BY 
			table1id
	) AS Temp
	ON tbl2.Table1ID = Temp.Table1ID
	AND tbl2.TimeStamp = Temp.TimeStamp

arajderkar
Starting Member

2 Posts

Posted - 2007-11-02 : 03:50:06

I think Lamprey has provided with a better solution and will give a 100% gaurantee of the work. It will also be faster than using IN . But I would like to suggest some changes in his code.


SELECT  
	tbl1.Field1, 
	tbl1.Field2, 
	tbl1.ID, 
	tbl2.FieldX, 
	tbl2.FieldY , 
	tbl2.[TimeStamp]
FROM 
	tbl1 
INNER JOIN 
	(
		SELECT 
			table1Id, 
			MAX([TimeStamp]) AS [TimeStamp] 
		FROM 
			tbl2 
		GROUP BY 
			table1id
	) AS tbl2
	ON tbl1.ID = tbl2.Table1ID

I think Lamprey did use the second table twice which could be done using one join itself.
Please let me know if I am wrong or if I missed something.

Reporter
Starting Member

48 Posts

Posted - 2007-11-02 : 04:44:36

quote:

Originally posted by arajderkar


SELECT  
	tbl1.Field1, 
	tbl1.Field2, 
	tbl1.ID, 
	tbl2.FieldX, 
	tbl2.FieldY , 
	tbl2.[TimeStamp]
FROM 
	tbl1 
INNER JOIN 
	(
		SELECT 
			table1Id, 
			MAX([TimeStamp]) AS [TimeStamp] 
		FROM 
			tbl2 
		GROUP BY 
			table1id
	) AS tbl2
	ON tbl1.ID = tbl2.Table1ID

There is a error in this code - there are no fields tbl2.FieldX and tbl2.FieldY at the table named as tbl2

Reporter
Starting Member

48 Posts

Posted - 2007-11-02 : 07:10:38

Let's simplify this problem.
There is no doubt that it's very easy to join two suggested table.
Let's choose second table only and formalize the problem like this.

declare @t table (pk1 int,pk2 int,Start datetime,f1 char(10),f2 char(10))

-- where pk1,pk2 primary key and the number of other fields is not significant

insert @t
select 1,1,convert(datetime,'01.01.2007'),'f11','f21' union all
select 1,2,convert(datetime,'02.01.2007'),'f12','f22' union all
select 1,3,convert(datetime,'03.01.2007'),'f13','f23' union all
select 2,1,convert(datetime,'04.01.2007'),'f14','f24' union all
select 2,2,convert(datetime,'05.01.2007'),'f15','f25' union all
select 2,3,convert(datetime,'06.01.2007'),'f16','f26' union all
select 3,1,convert(datetime,'07.01.2007'),'f17','f27' union all
select 3,2,convert(datetime,'09.01.2007'),'f18','f28' union all
select 3,3,convert(datetime,'09.01.2007'),'f19','f29'

what is wanted

pk1 pk2 Start f1 f2
----------- ----------- --------------- ---------- ----------
1 3 2007-03-01 f13 f23
2 3 2007-06-01 f16 f26
3 3 2007-09-01 f19 f29

here is my method

select v2.* from
(select v1.pk1,pk2=max(v2.pk2),v1.Start from
(select pk1,Start=max(Start) from @t group by pk1) as v1
inner join @t v2 on v1.pk1=v2.pk1 and v1.Start=v2.Start group by v1.pk1,v1.Start) as v1
inner join @t v2 on v1.pk1=v2.pk1 and v1.pk2=v2.pk2

Is there any progressiver way to obtain it?

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2007-11-02 : 14:25:36

Reporter, I was not 100% clear on teh requirement. Is the goal to get the "newest" row for each pk1 based on Start and then on pk2 or just by pk2? I asssume it is by Start then pk2, but the data supports using just pk2.

Here are a couple of methods that work under different scenarios:

-- Using just pk2
SELECT
	T.pk1,
	T.pk2,
	T.Start,
	T.f1,
	T.f2
FROM 
	@T AS T
INNER JOIN
	(
		SELECT pk1, MAX(pk2) AS pk2
		FROM @T AS T1
		GROUP BY pk1
	) AS Temp
	ON T.pk1 = Temp.pk1
	AND T.pk2 = Temp.pk2


--Using Start then MAX
SELECT
	T.pk1,
	MAX(T.pk2) AS pk2,
	MAX(T.Start) AS Start,
	MAX(T.f1) AS f1,
	MAX(T.f2) AS f2
FROM 
	@T AS T
INNER JOIN
	(
		SELECT pk1, MAX(Start) AS Start
		FROM @T AS T1
		GROUP BY pk1
	) AS Temp
	ON T.pk1 = Temp.pk1
	AND T.Start = Temp.Start
GROUP BY
	T.pk1

-- For Grins - The 2005 Way - Using Start then pk2
SELECT
	Temp.pk1,
	Temp.pk2,
	Temp.Start,
	Temp.f1,
	Temp.f2
FROM
	(
		SELECT
			T.pk1,
			T.pk2,
			T.Start,
			T.f1,
			T.f2,
			ROW_NUMBER() OVER (PARTITION BY t.pk1 ORDER BY t.Start DESC, t.pk2 DESC) AS RowNum
		FROM 
			@T AS T
	) AS Temp
WHERE RowNum = 1

Reporter
Starting Member

48 Posts

Posted - 2007-11-06 : 05:42:35

pk1 is the person identifier and the main grouping condition (PersonID)

pk2 is the event for each person

pk1+pk2 is the primary key

I should like to obtain one "newest" row for each person (pk1) based on Start.
And if there are equal datetime points then based on pk2 also.

Here is the example for person=1

pk1, pk2, Start, f1, ... , fN

1, 7, 01.01.2007, yes, ... , 111
1, 3, 15.01.2007, no, ... , 222
1, 5, 20.01.2007, yes, ... , 121
1, 4, 07.01.2007, yes, ... , 212
1, 2, 20.01.2007, no, ... , 221

Expected result

1, 5, 20.01.2007, yes, ... , 121

An aggregate function using such as MAX(T.f1) and MAX(T.f2) are not very good solution.
At the common case I can get all of them by select * from @t where pk1=1 and pk2=5

In order to obtain the primary key (1,5) I use the next query

select v1.pk1,pk2=max(v2.pk2),v1.Start from

(select pk1,Start=max(Start) from @t group by pk1) as v1

inner join @t v2 on v1.pk1=v2.pk2 and v1.Start=v2.Start

group by v1.pk1,v1.Start

Is this only one possible method for the Server 2000?

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-11-06 : 07:07:12

No.

-- prepare sample data
create table #t (pk1 int, pk2 int, start datetime, f1 char(10), f2 char(10))

set dateformat dmy

insert	#t
select	1, 1, '01.01.2007', 'f11', 'f21' union all
select	1, 2, '02.01.2007', 'f12', 'f22' union all
select	1, 3, '03.01.2007', 'f13', 'f23' union all
select	2, 1, '04.01.2007', 'f14', 'f24' union all
select	2, 2, '05.01.2007', 'f15', 'f25' union all
select	2, 3, '06.01.2007', 'f16', 'f26' union all
select	3, 1, '07.01.2007', 'f17', 'f27' union all
select	3, 2, '09.01.2007', 'f18', 'f28' union all
select	3, 3, '09.01.2007', 'f19', 'f29'

-- Reporter / 65% of batch / duration 1, reads 15
select		v2.*
from		(
			select		v1.pk1,
					pk2 = max(v2.pk2),
					v1.start
			from		(
						select		pk1,
								start = max(start)
						from		#t
						group by	pk1
					) as v1
			inner join	#t v2 on v1.pk1 = v2.pk1 and v1.start = v2.start
			group by	v1.pk1,
					v1.start
		) as v1
inner join	#t v2 on v1.pk1 = v2.pk1 and v1.pk2 = v2.pk2

-- Peso / 11% of batch / duration 0, reads 30
select	t1.pk1,
	t1.pk2,
	t1.start,
	t1.f1,
	t1.f2
from	#t as t1
where	exists (select * from #t as t2 where t2.pk1 = t1.pk1 group by t2.pk1 having convert(varchar, t1.start, 120) + cast(t1.pk2 as varchar) = max(convert(varchar, t2.start, 120) + cast(t2.pk2 as varchar)))

-- SQL Server 2005 / 24% of batch / duration 0, reads 3
select	pk1,
	pk2,
	start,
	f1,
	f2
from	(
		select	pk1,
			pk2,
			start,
			f1,
			f2,
			row_number() over (partition by pk1 order by start desc, pk2 desc) as recid
		from	#t
	) as d
where	recid = 1

-- clean up
drop table #t

-- Reporter
-- Scan count 3, logical reads 5, physical reads 0
--
-- Peso
-- Scan count 10, logical reads 10, physical reads 0
--
-- SQL Server 2005
-- Scan count 1, logical reads 1, physical reads 0

E 12°55'05.25"
N 56°04'39.16"

Reporter
Starting Member

48 Posts

Posted - 2007-11-06 : 09:51:29

Hi, Peso!

select t1.pk1,
t1.pk2,
t1.start,
t1.f1,
t1.f2
from #t as t1
where exists (select * from #t as t2 where t2.pk1 = t1.pk1 group by t2.pk1 having
convert(varchar, t1.start, 120) + cast(t1.pk2 as varchar) = max(convert(varchar, t2.start, 120) + cast(t2.pk2 as varchar)))

That's good idea. This is almost what I am looking for.

Here is the same with some modifications

select t1.pk1,
t1.pk2,
t1.start,
t1.f1,
t1.f2
from #t as t1
where exists (select * from #t as t2 where t2.pk1 = t1.pk1 group by t2.pk1 having
convert(float,t1.start)+t1.pk2=max(convert(float,t2.start)+t2.pk2))

But unfortunately it's failed for datetime data type.
For example, it does not work correctly under next conditions

insert #t
select 3, 2, '09.01.2007 20:20:20.099', 'f18', 'f28' union all
select 3, 3, '09.01.2007 20:20:20.033', 'f19', 'f29'

Result

3 3 2007-01-09 20:20:20.033 f19 f29

instead

3 2 2007-01-09 20:20:20.099 f18 f28

PS. Nevertheless, I like this query. Thank you very much.

twocents
Starting Member

1 Post

Posted - 2007-11-26 : 16:09:29

/* dummy-up some data
IF OBJECT_ID('tempdb..#Table1') IS NOT NULL
DROP TABLE #Table1

IF OBJECT_ID('tempdb..#Table2') IS NOT NULL
DROP TABLE #Table2

CREATE TABLE #Table1
(
ID int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Field1 varchar(50) NOT NULL,
Field2 varchar(50) NOT NULL
)

CREATE TABLE #Table2
(
ID int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Table1ID int NOT NULL,
"TimeStamp" datetime NOT NULL,
FieldX varchar(50) NOT NULL,
FieldY varchar(50) NOT NULL,
-- UNIQUE (Table1ID, "TimeStamp"), -- see queries below
FOREIGN KEY (Table1ID) REFERENCES #Table1 (ID) -- will be skipped anyway, but...
)

INSERT #Table1 (Field1, Field2)
SELECT 'Alpha', 'One'
UNION ALL SELECT 'Beta', 'Two'
UNION ALL SELECT 'Charlie', 'Three'
UNION ALL SELECT 'Delta', 'Four'
UNION ALL SELECT 'Echo', 'Five'

DECLARE @i int

SET @i = 0

WHILE @i < 5 BEGIN
INSERT #Table2 (Table1ID, "TimeStamp", FieldX, FieldY)
SELECT t1.ID, DATEADD(s, @i, GETDATE()), 'Event' + CAST(@i AS varchar(1)),
'E' + CAST(@i AS varchar(1)) + CAST(t1.ID AS varchar(1))
FROM #Table1 t1

SET @i = @i + 1
END
*/

-- this is the best execution plan we've found (for our live data/indexing), but
-- it requires #Table2: UNIQUE (Table1ID, "TimeStamp")
-- or multiple rows are possible for a single #Table1 row
-- however, if the unique constraint is actually enforced in this example
-- it causes an additional key lookup (go figure)
SELECT t1.ID, t1.Field1, t1.Field2, t2.TimeStamp, t2.FieldX, t2.FieldY
FROM #Table1 t1
LEFT JOIN #Table2 t2 ON t1.ID = t2.Table1ID AND t2.TimeStamp =
(
SELECT MAX(t.TimeStamp)
FROM #Table2 t
WHERE t.Table1ID = t1.ID -- subquery within "ON" clause may use "outer" table
)

-- this is the best execution plan we've found which always works
-- if the unique constraint is actually enforced in this example,
-- its query plan is comparable to the previous "optimal" query plan
SELECT t1.ID, t1.Field1, t1.Field2, t2.TimeStamp, t2.FieldX, t2.FieldY
FROM #Table1 t1
LEFT JOIN #Table2 t2 ON t2.ID =
(
SELECT TOP (1) t.ID
FROM #Table2 t
WHERE t.Table1ID = t1.ID
ORDER BY t.TimeStamp, t.ID DESC
)

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources