Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Help eliminate dups from correlated subquery

Author  Topic 

Jusvistin
Yak Posting Veteran

81 Posts

Posted - 2003-08-12 : 04:01:27
Hello,

This correlated subquery is supposed to return the country with the highest population in each region, but is returning multiple rows if more than one country in a region has the same population (like zero).

What's the best way to only return one of the countries if they both have the same population ?

Table CIA has 3 columns,
Region = continent
name = country name
population = you guessed it


select region, name, population
from cia a
where population =
(select max(population)
from cia b
where a.region = b.region)




Thanks,
Kevin

Frank Kalis
Constraint Violating Yak Guru

413 Posts

Posted - 2003-08-12 : 04:05:15
what about using a SELECT TOP 1 MAX(blabla...) ? if it doesn't really matter which one is returned

Cheers,
Frank
Go to Top of Page

Jusvistin
Yak Posting Veteran

81 Posts

Posted - 2003-08-12 : 04:31:03
Nope, that won't work.

The subquery always returns a single value. In this case the problem value is population = 0. Multiple records from the outer query are matching this value of zero, hence I'm getting multiple rows back.


Thanks,
Kevin

Go to Top of Page

Frank Kalis
Constraint Violating Yak Guru

413 Posts

Posted - 2003-08-12 : 04:44:11
So you want to return one single value from

select region, name, population
from cia a...?

Sorry, for asking, but in which region is there a population of 0 ?
If there's electricity and internet access I might consider relocating there :-)


Cheers,
Frank
Go to Top of Page

mr_mist
Grunnio

1870 Posts

Posted - 2003-08-12 : 04:45:51
What is your criteria for choosing which country to return in these cases?

SELECT
region, min ([name]), population
from cia a
where population =
(select max(population)
from cia b
where a.region = b.region)
group by region, population

for example, gives you the one with the earliest name...

-------
Moo. :)
Go to Top of Page

mr_mist
Grunnio

1870 Posts

Posted - 2003-08-12 : 04:49:06
quote:
Originally posted by a5xo3z1

what about using a SELECT TOP 1 MAX(blabla...) ? if it doesn't really matter which one is returned





Just for info, you do not need TOP 1 when you have a MAX or other similar agregate function...

-------
Moo. :)
Go to Top of Page

Jusvistin
Yak Posting Veteran

81 Posts

Posted - 2003-08-12 : 05:02:18
quote:
Originally posted by a5xo3z1

So you want to return one single value from

select region, name, population
from cia a...?

Sorry, for asking, but in which region is there a population of 0 ?
If there's electricity and internet access I might consider relocating there :-)


Cheers,
Frank



Well, in this test database the region is "Antarctic Region", still want to relocate ? LOL


Thanks,
Kevin
Go to Top of Page

Jusvistin
Yak Posting Veteran

81 Posts

Posted - 2003-08-12 : 05:07:26
quote:
Originally posted by mr_mist

What is your criteria for choosing which country to return in these cases?

SELECT
region, min ([name]), population
from cia a
where population =
(select max(population)
from cia b
where a.region = b.region)
group by region, population

for example, gives you the one with the earliest name...

-------
Moo. :)



Works great !

Group by still confuses me when it's used like this to exclude records. I guess I should consult something that will explain (using little pictures) how Group By achieves it's magic.


Thanks,
Kevin
Go to Top of Page

Frank Kalis
Constraint Violating Yak Guru

413 Posts

Posted - 2003-08-12 : 05:08:39
well, in this case I'll opt for an additional central heating

ROTFL

Cheers,
Frank
Go to Top of Page

Frank Kalis
Constraint Violating Yak Guru

413 Posts

Posted - 2003-08-12 : 05:11:10
quote:

Just for info, you do not need TOP 1 when you have a MAX or other similar agregate function...


yup, you're right.
I'll get another cup of coffee, before posting other misleading information

Cheers,
Frank
Go to Top of Page

Jusvistin
Yak Posting Veteran

81 Posts

Posted - 2003-08-12 : 05:19:34
Ok,

Now my curiosity is perked....

I MUST know how this statement is logicaly processed. HELP !

Assume this is what the table data looks like:

Region Name Population
Antartic e 0
Antartic c 0
Antartic b 0
Antartic d 0
Antartic a 0

select region, min(name), population
from cia a
where population =
(select max(population)
from cia b
where a.region = b.region)

group by region, population
order by region, population


My assumption on how this is processed:

1) The first row is retrieved (country e)
2) The inner select is run, which returns the value of 0
3) Here's where I get lost - the min(name) bit loses me. Is the record for country 'e' selected here, then filtered out later ?


Kevin
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2003-08-12 : 05:45:42
If there are a small number -- 200 say -- countries in CIA, it's quite likely to sort (a copy of) the table by region and population (desc), scan the result with the Segment operation and for each region encountered, and take the top 1 with ties on population, then re-sort it on region and population (asc) and finally stream aggregate the minimum country name. i.e. not do any joins at all.
The Segment operation isn't documented.
Go to Top of Page

mr_mist
Grunnio

1870 Posts

Posted - 2003-08-12 : 05:48:26
The group by batches stuff together by region and population. The min statement means that within each batch only the lowest value for name is returned, so only one record per batch.

How sql server decides to do this I do not know. Someone else here with a better knowledge of the internals can probably help with that. My guess is that it is decided at query compile time which is the best method. read what Arnold said. :)

-------
Moo. :)
Go to Top of Page

Jusvistin
Yak Posting Veteran

81 Posts

Posted - 2003-08-12 : 07:12:03
Arnold,

You're going a little too fast for me, I am still a newbie

Are you saying that the min(name) is determined before the record is selected ?

I don't suppose you've got a flash movie that shows how this works with dancing rows and such ?


Kevin

Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-08-12 : 07:47:58
quote:
I MUST know how this statement is logicaly processed. HELP !
I think you're getting too hung up on HOW it works. As mentioned in another thread, SQL is declarative: you tell it WHAT you want, and it gets it for you. HOW it does it really isn't important.

You can dump iodine crystals in the atmosphere, or you can watch Indians dance on the ground. As long as it rains, what difference does the method make?

I'm not trying to be vague or to discourage your inquisitiveness, but delving into the actual steps of how SQL works will only cloud your mind with unnecessary details, and keep you from using it to its fullest.
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2003-08-12 : 08:00:53
Here's another way to look at it (and solve the problem), which I think lends itself to being understood easier because it is a nice step by step process:

Step1: Get the Largest population for each region:

SELECTt region, max(Population) as MaxPop
FROM CIA
GROUP BY region

Step 2: Using the results from Step 1, return all the names in each region that have this max population:

SELECT cia.*
FROM
cia
INNER JOIN
(SQL from step 1) a
ON a.region = cia.region and a.Maxpop = cia.Population

The inner join is acting as a filter in this case. Notice you still have that same problem of multiple names listed per region if they have the same population.


Step 3:

Using the results from step 2, for each region, take the min of the name:

SELECT region, min(name) as Name, Population
FROM
(SQL from step 2) a
GROUP BY Region, Population

Hopefully that will make a little more sense ... sometimes it's more clear to handle filters in the WHERE clause with a correlated sub-query, but other times I feel a JOIN is easier to understand and really lends itself to a more "step by step" understanding of how to work out a problem.

- Jeff
Go to Top of Page

Jusvistin
Yak Posting Veteran

81 Posts

Posted - 2003-08-12 : 10:14:23
quote:
Originally posted by robvolk

quote:
I MUST know how this statement is logicaly processed. HELP !
I think you're getting too hung up on HOW it works. As mentioned in another thread, SQL is declarative: you tell it WHAT you want, and it gets it for you. HOW it does it really isn't important.

You can dump iodine crystals in the atmosphere, or you can watch Indians dance on the ground. As long as it rains, what difference does the method make?

I'm not trying to be vague or to discourage your inquisitiveness, but delving into the actual steps of how SQL works will only cloud your mind with unnecessary details, and keep you from using it to its fullest.



Rob,

I probably wasn't too clear in my explanation.

I'm not actually interested in understanding how SQL Server handles the request internally - I'm just looking to understand how the request works logically.

Until I understand how the request is handled logically, it's really difficult for me to know how to understand why the statements work, and how to best use them.

Believe me, I have no interest in delving into the deep recesses of how a RDMS works it's magic, life is too short.

So - at the risk of offending anyone, are all the records initially selected, and then the record with the min(name) selected ?


Thanks,
Kevin

Go to Top of Page

mr_mist
Grunnio

1870 Posts

Posted - 2003-08-12 : 10:18:44
I'm not sure that you can seperate the logic from the process.

I mean if it does not matter how SQL does it, then surely you can choose whatever logical process works in your head to describe it?

-------
Moo. :)
Go to Top of Page

Jusvistin
Yak Posting Veteran

81 Posts

Posted - 2003-08-12 : 11:11:47
Ok,

I withdraw the question !

I have probably been focusing on the min(name) part, rather than the GROUP BY part.

Why most of the other statements do what they do seems pretty clear, GROUP BY and how it relates to the aggregate condition in the SELECT statement are not nearly as clear to me.

I'll go investigate this some more.


Thanks,
Kevin
Go to Top of Page
   

- Advertisement -