Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Can't get Distinct to Work As I Want

Author  Topic 

Mr Fett
Starting Member

28 Posts

Posted - 2010-05-18 : 06:53:12
Hi all,

This is a problem that's puzzled me for some time and I've always resorted to nasty workarounds to try and get it done. The post seems long but its a very simple SQL statement (I've just tried to be as informative as possible). Basically I have two tables:

JOBS
====
jobID
jobName

JOB LOG
=======
logID
jobID
entryTime


I want to retrieve the 10 most recently updated jobs (as in the 10 jobs with an entry in their job log, ordered by entryTime ). Unfortunately if a job has two or more recent entries in it, it features multiple times in my results:

SELECT TOP 10 JOB_jobs.jobID, JOB_jobs.jobName, JOB_jobLog.entryTime
FROM JOB_jobs INNER JOIN JOB_jobLog ON JOB_jobs.jobID = JOB_jobLog.jobID
ORDER BY JOB_jobLog.entryTime DESC


Output:

10123 Tech Support 2010-05-17 19:03:00
10123 Tech Support 2010-05-13 16:19:00
10123 Tech Support 2010-05-13 14:36:00
10210 Dolphin Discovery Support 2010-05-13 14:08:00
10210 Dolphin Discovery Support 2010-05-13 14:07:00
10210 Dolphin Discovery Support 2010-05-13 14:06:00
10213 Midocean Support 2010-05-12 19:27:00

I tried just leaving the 'entryTime' out of the select list and using DISTINCT:

SELECT DISTINCT TOP 10 JOB_jobs.jobID, JOB_jobs.jobName
FROM JOB_jobs INNER JOIN JOB_jobLog ON JOB_jobs.jobID = JOB_jobLog.jobID
ORDER BY JOB_jobLog.entryTime DESC


But I get an error:

ORDER BY items must appear in the select list if SELECT DISTINCT is specified.

This is the crux of the problem and one of the limitations I've never really understood about DISTINCT - all the fields you can search by have to be in the select list, vastly limiting its power. Am I missing something obvious? I'm self taught and it happens a lot!

Thanks in advance for any help

Bob

P.S. I'm running SQL Server 2000 SP4 on Windows Server 2003

Ifor
Aged Yak Warrior

700 Posts

Posted - 2010-05-18 : 09:19:25
[code]
SELECT TOP 10 J.jobID, J.jobName, D.entryTime
FROM JOB_jobs J
JOIN
(
SELECT L.jobID, MAX(L.entryTime) AS entryTime
FROM JOB_jobLog L
GROUP BY L.jobID
) D
ON J.jobID = D.jobID
ORDER BY entryTime DESC
[/code]
Go to Top of Page

tony7671
Starting Member

1 Post

Posted - 2010-05-18 : 09:45:20
Not sure the previous example does exactly what you need.

You maybe need an aggregate query, since what you're really doing is and implicit "MAX(entryTime)":

SELECT TOP 10 J.jobID, J.jobName,
MAX(D.entryTime) AS latest_entryTime
FROM JOB_jobs J
JOIN JOB_jobLog ON JOB_jobs.jobID = JOB_jobLog.jobID
GROUP BY J.jobID, J.jobName
ORDER BY MAX(D.entryTime) DESC
quote:

This is the crux of the problem and one of the limitations I've never really understood about DISTINCT - all the fields you can search by have to be in the select list, vastly limiting its power. Am I missing something obvious? I'm self taught and it happens a lot!


You're not really searching by the field - you can have WHERE conditions that use fields that are not in the select list (or in this case the "GROUP BY"). You are trying to order by a field not in the select list though, so effectively it's telling you that it doesn't know what you mean since it doesn't know which of the potentially different entry times for each job ID you want it to order by. The aggregate query spells this out.

In a way, "SELECT DISTINCT" is a subset of what you can do with an aggregate query - e.g.
   SELECT DISTINCT X FROM TABLE_A
is pretty much functionally equivalent to:
   SELECT X FROM TABLE_A GROUP BY X
but you can go a lot further with the latter!
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-05-18 : 14:33:29
quote:
Originally posted by tony7671

Not sure the previous example does exactly what you need.

You maybe need an aggregate query, since what you're really doing is and implicit "MAX(entryTime)":

SELECT TOP 10 J.jobID, J.jobName,
MAX(D.entryTime) AS latest_entryTime
FROM JOB_jobs J
JOIN JOB_jobLog ON JOB_jobs.jobID = JOB_jobLog.jobID
GROUP BY J.jobID, J.jobName
ORDER BY MAX(D.entryTime) DESC
quote:

This is the crux of the problem and one of the limitations I've never really understood about DISTINCT - all the fields you can search by have to be in the select list, vastly limiting its power. Am I missing something obvious? I'm self taught and it happens a lot!


You're not really searching by the field - you can have WHERE conditions that use fields that are not in the select list (or in this case the "GROUP BY"). You are trying to order by a field not in the select list though, so effectively it's telling you that it doesn't know what you mean since it doesn't know which of the potentially different entry times for each job ID you want it to order by. The aggregate query spells this out.

In a way, "SELECT DISTINCT" is a subset of what you can do with an aggregate query - e.g.
   SELECT DISTINCT X FROM TABLE_A
is pretty much functionally equivalent to:
   SELECT X FROM TABLE_A GROUP BY X
but you can go a lot further with the latter!



isnt this query giving same result as previous suggestion?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

Berend
Starting Member

2 Posts

Posted - 2010-05-19 : 04:48:48

Hi,

The way to solve your problem is realizing the identifying column is the JobName and not the Id or Timestamp

I'v made an example selecting the top 10 users with the latest modification per person this way

Btw the solution provided does not work in a replicated enviroment because highest Id and the newest entry don't have to be the same.

But in normal situations this will do:

SELECT   idtable.id 
, idtable.modifyuser
, datetable.modifydate
FROM
(SELECT TOP 10
gbr.modifyuser
, Max(gbr.modifydate) AS modifydate
FROM
tblgebruikers gbr
WHERE
gbr.modifyuser IS NOT NULL
GROUP BY gbr.modifyuser
) AS datetable
INNER JOIN
(SELECT TOP 10
gbr.modifyuser
, Max(gbr.gebruikerid) AS id
FROM
tblgebruikers gbr
WHERE gbr.modifyuser IS NOT NULL
GROUP BY modifyuser
) idtable
ON idtable.modifyuser = datetable.modifyuser
ORDER BY datetable.modifydate DESC



This way (Job)Names won't appear twice as it avoids grouping on Id and date in one go

Have fun!
Go to Top of Page
   

- Advertisement -