complicated query - please HELP !

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

Old Forums

CLOSED - General SQL Server

complicated query - please HELP !

Author

Topic

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2002-06-17 : 13:48:33

Kien writes "Dear SQL team (you will become my savour if you help me answer this brain teaser =)
ARRG..it's been bugging me for a while now !=(

NOTE: sql server 2000, microsoft 2000

my ultimate goal is to have a table similar to the following:


City        Longitude    Latitute     Query
London      10           11           London Oxford Bristol
Oxford      10           12           Oxford London Bristol 
Bristol     11           10           Bristol London Oxford
London      20           20           London Toronto Barrie 
Toronto     21           21           Toronto London Barrie  
Barrie      20           21           Barrie Toronto London
....etc

--

Currently I have a massive table (1 million plus entries) of cities, longitude and latitute. I'm trying to develop a column called "Query" that displays the names of all the cities that are near the city that is in question (say within 1 degree of longitude and latitude).

In otherwords, each field under the "Query" column needs to be able to search through the entire table, looking for neibouring cities to the city that is in question

I need to be able to accomlish this automatically, since my table of cities is huge =(

Notice that for London (20,20), it displays Toronto and Barrie, while London (10,11) displays Oxford and Bristol.

If you have any suggestions, examples, leads...your input would be MUCH APPRECIATED ! =)

Thanks again
Kien"

LarsG
Constraint Violating Yak Guru

284 Posts

Posted - 2002-06-17 : 14:43:23

You could use a function to get adjacent cities. You may have to change som criteria. The code is untested but the general idea should be ok. With the table size there may be some performance issues. Which columns are indexed?

create function calcCities(@longitude int,@latitude int)
returns varchar(400)
as
declare @s varchar(400)

select @s = coalesce(@s + ' ','') + @city
from cities
where longitude between @longitude - 1 and @longitude + 1
and latitude between @latitude - 1 and @latitude + 1
return @s

go

select city,longitude,latitude, calcCities(longitude,latitude)
from cities
where cities in ('London','Bristol')

efelito
Constraint Violating Yak Guru

478 Posts

Posted - 2002-06-17 : 16:05:37

This should do what you are looking for. Unfortunately I was unable to find a set based solution for this problem.

 
DECLARE @CityList varchar(1000)
DECLARE @city varchar(20)
DECLARE @longitude int
DECLARE @latitude int

DECLARE city_cursor CURSOR FOR
	SELECT	city, 
		longitude, 
		latitude
	FROM	CityLocation
OPEN city_cursor
FETCH NEXT FROM city_cursor INTO @city, @longitude, @latitude

WHILE @@FETCH_STATUS = 0
BEGIN
	SELECT 	@CityList = COALESCE(@CityList + ' ', '') + isnull(querycity, '')
	FROM	(SELECT t1.city, 
			t1.longitude, 
			t1.latitude, 
			t2.city AS querycity
		 FROM	CityLocation t1
		 INNER JOIN CityLocation t2 ON (abs(t1.longitude - t2.longitude) <= 1 and abs(t1.latitude - t2.latitude) <= 1)) AS t3 
	WHERE 	city = @city
	AND	longitude = @longitude
	AND	latitude = @latitude

	UPDATE	CityLocation
		SET 	query = @CityList
		WHERE 	city = @city
		AND	longitude = @longitude
		AND	latitude = @latitude
	
	SET @CityList = Null

	FETCH NEXT FROM city_cursor INTO @city, @longitude, @latitude
END
CLOSE city_cursor
DEALLOCATE city_cursor

Jeff Banschbach
Consultant, MCDBA

Edited by - efelito on 06/17/2002 16:07:01

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-06-17 : 17:03:21

Capt'n Set-Based to the res-queue . . .

First, lets set the stage(I wish ppl would provide this stuff when they post a question!!) ...


create table kien (
    pkey int identity(1,1) not null primary key,
    city varchar(20) not null,
    longitude int not null,
    latitude int not null,
    query varchar(2000) null )
insert kien(city,longitude,latitude)
select 'London',10,11 union
select 'Oxford',10,12 union
select 'Bristol',11,10 union
select 'London',20,20 union
select 'Toronto',21,21 union
select 'Barrie',20,21
go

Now, here is the set based code . . . You better beef up your TempBD if you've got millions of records!!!


--some variable we'll need 
declare @query varchar(2000), @lastpkey int
select @query = '', @lastpkey = -1

--create a temp table to work with
select
    k.pkey,
    k2.city as neighbor,
    space(2000) as query
into
    #worktable
from
    kien k
    inner join kien k2
        on ( abs(k.longitude - k2.longitude) <= 1 and
             abs(k.latitude  - k2.latitude) <= 1)
order by
    k.pkey, k2.city

-- You may or may not want to index tempdb . . . I'll let you play with that
create index idx_worktable on #worktable(pkey,neighbor)

-- update the temp table building the query column
update
    #worktable
set
    @query = query = case
                         when @lastpkey <> pkey then neighbor
                         else @query + ' ' + neighbor
                     end,
    @lastpkey = pkey

-- now update the final table from the temp table
update
    kien
set
    query = (select max(query) from #worktable where pkey = k.pkey)
from
    kien k

-- clean up
drop table #worktable

Class dismissed

<O>

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-06-17 : 17:08:45

Oh yeah, your sample data shows Oxford in the Query column for Bristol and your requirements say "within 1 degree of longitude and latitude". . . what gives?

<O>

efelito
Constraint Violating Yak Guru

478 Posts

Posted - 2002-06-18 : 10:09:19

Thanks for the set based solution capt'n. I really did try to resist the dark side on this one and come up with something like that. I was close, but I always forget about the set @var = col = value trick. Coming from an application dev background is helpful in db admin in a lot of ways, but I think it's actually harmful when it comes to creating set based solutions.

Jeff Banschbach
Consultant, MCDBA

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-06-18 : 10:52:18

quote:

(say within 1 degree of longitude and latitude).

So for places on the equator, that's a patch 222 km E-W by about the same N-S (a little less, but not much). But at 60°N, the patch is only half as wide. That's not exactly fair!

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-06-18 : 10:58:37

Well yeah, but most equatorial territory is jungle, desert, or mountainous, really not that heavily populated.

OK, OK, Rio. You got me...

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-06-18 : 11:03:07

While I'm throwing spanners...

abs(k.longitude - k2.longitude) <= 1 and abs(k.latitude - k2.latitude) <= 1)

isn't right either: think of a map of the Pacific.

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-06-18 : 18:10:37

Hey people! We haven't solved this one and it's interesting!!

For the time being let's pretend I didn't make those snide remarks about the world being spheroidal, and imagine that we are dealing with a flat, rectangular area.
First let's create some cities.


CREATE TABLE Cities (
  id int IDENTITY(1,1) PRIMARY KEY,
  long numeric(6,3) NOT NULL,
  lat numeric(5,2) NOT NULL
)
GO

INSERT INTO Cities (long, lat)
SELECT RAND(CAST(NEWID() AS binary(4))) * 360 - 180,
  RAND(CAST(NEWID() AS binary(4))) * 180 - 90
FROM Numbers
WHERE n < 5000
GO

Numbers is the usual single column (n) of integers table.
Although these cities have "latitude" and "longitude" with degreeoid values, they're really just on a 360×180 rectangle.
(and apologies for using "long" as a column name, but it's not a SQL keyword

)


SELECT C1.id AS id1, C2.id AS id2
FROM Cities AS C1
INNER JOIN Cities AS C2
  ON ABS(C1.long - C2.long) <= 1.0 AND ABS(C1.lat - C2.lat) <= 1.0
  AND C1.id <> C2.id

The trivial case where C1.id = C2.id is omitted, but it makes little difference to performance -- either way it sucks.

It's that join condition: there's no way SQL Server can do anything except nested loop scans.
Also note the number of rows: we have only 5000 rows and already have about 1540 neighbours -- how's that going to grow when there's 10^6 rows?

Here's a possible way out of the problem: use a grid to cut down the numbers


-- just saving my fingers here
CREATE VIEW CitiesSq
AS
SELECT id, long, lat,
  CAST(FLOOR(long) AS smallint) AS longsq,
  CAST(FLOOR(lat) AS smallint) AS latsq
FROM Cities
GO

-- Caution: Ugly code ahead!
SELECT id1, id2
FROM (
    SELECT C1.id AS id1, C1.long AS long1, C1.lat AS lat1,
      C2.id AS id2, C2.long AS long2, C2.lat AS lat2
    FROM CitiesSq AS C1
    INNER JOIN CitiesSq AS C2
      ON C1.longsq = C2.longsq AND C1.latsq = C2.latsq
      AND C1.id <> C2.id
  UNION ALL
    SELECT C1.id, C1.long, C1.lat, C2.id, C2.long, C2.lat
    FROM CitiesSq AS C1
    INNER JOIN CitiesSq AS C2
      ON C1.longsq = C2.longsq AND C1.latsq = C2.latsq + 1
  UNION ALL
    SELECT C1.id, C1.long, C1.lat, C2.id, C2.long, C2.lat
    FROM CitiesSq AS C1
    INNER JOIN CitiesSq AS C2
      ON C1.longsq = C2.longsq AND C1.latsq = C2.latsq - 1
  UNION ALL
    SELECT C1.id, C1.long, C1.lat, C2.id, C2.long, C2.lat
    FROM CitiesSq AS C1
    INNER JOIN CitiesSq AS C2
      ON C1.longsq = C2.longsq + 1 AND C1.latsq = C2.latsq
  UNION ALL
    SELECT C1.id, C1.long, C1.lat, C2.id, C2.long, C2.lat
    FROM CitiesSq AS C1
    INNER JOIN CitiesSq AS C2
      ON C1.longsq = C2.longsq + 1 AND C1.latsq = C2.latsq + 1
  UNION ALL
    SELECT C1.id, C1.long, C1.lat, C2.id, C2.long, C2.lat
    FROM CitiesSq AS C1
    INNER JOIN CitiesSq AS C2
      ON C1.longsq = C2.longsq + 1 AND C1.latsq = C2.latsq - 1
  UNION ALL
    SELECT C1.id, C1.long, C1.lat, C2.id, C2.long, C2.lat
    FROM CitiesSq AS C1
    INNER JOIN CitiesSq AS C2
      ON C1.longsq = C2.longsq - 1 AND C1.latsq = C2.latsq
  UNION ALL
    SELECT C1.id, C1.long, C1.lat, C2.id, C2.long, C2.lat
    FROM CitiesSq AS C1
    INNER JOIN CitiesSq AS C2
      ON C1.longsq = C2.longsq - 1 AND C1.latsq = C2.latsq + 1
  UNION ALL
    SELECT C1.id, C1.long, C1.lat, C2.id, C2.long, C2.lat
    FROM CitiesSq AS C1
    INNER JOIN CitiesSq AS C2
      ON C1.longsq = C2.longsq - 1 AND C1.latsq = C2.latsq - 1
  ) AS NeighbouringSquare
WHERE ABS(long1 - long2) <= 1.0 AND ABS(lat1 - lat2) <= 1.0

I've tried this as far as 100000 cities, and it scales ok, but they are uniform random distributed -- I would guess that real cities will have more clumping and that this would slow things down. With the population scheme above, the number of neighbours appears to be 6.17863E-05 * n^2 or thereabouts, which would give around 61.7 million neighbours for 1 million cities. The number of neighbours per city looks like a ~~LogNormal~~ Poisson distribution to me (I am not a statistician), but it's getting late and I have no idea what (if anything) that means. Anyway, you may want to think over whether 1 degree either way is really what you want.

Edited by - Arnold Fribble on 06/19/2002 02:14:46

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2002-06-19 : 09:56:21

What's the distinction between Page's ABS(long1 - long2) <= 1.0 AND ABS(lat1 - lat2) <= 1.0 and your union all query? I notice in one case you've included the city1.id != city2.id, but generally the individual branches of the union's seem redundant.

setBasedIsTheTruepath
<O>

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-06-19 : 12:53:55

I didn't make it explicit, but my big post was assuming SQL Server 2000. I haven't tested with earlier versions, so I don't know how the behaviour compares.

Ok, in isolation, finding the Manhattan distance using

ABS(C1.long - C2.long) <= 1.0 AND ABS(C1.lat - C2.lat) <= 1.0

appears to be faster as a filter than

C1.long BETWEEN C2.long - 1 AND C2.lat + 1
AND C1.lat BETWEEN C2.lat - 1 AND C2.lat + 1

However, the former cannot utilize indexes: a join using that condition has no option but to scan the whole table for suitable C2 rows for every C1 row. Which unless all cities are in a 1 by 1 square (and so all pairs match), is not very efficient, especially when there are over 1 million rows in the table.

If you create a covering index on Cities:

CREATE INDEX LongLat ON Cities (long, lat, id)

and try the BETWEEN form, you'll find it does use the index:


SELECT C1.id AS id1, C2.id AS id2
FROM Cities AS C1
INNER JOIN Cities AS C2
  ON C1.long BETWEEN C2.long - 1.0 AND C2.long + 1.0
  AND C1.lat BETWEEN C2.lat - 1.0 AND C2.lat + 1.0
  AND C1.id <> C2.id

It still has to use a loop because the join is using the BETWEEN, but at least you get a seek on the inner loop. But because both the longitude and latitude have BETWEENs, the SEEK condition cannot do all the work. What it does is return the rows with values that fall in this area:


+------------------------------------------+
|                   22                     |
|                   22                     |
|                   222                    |
|                   212                    |  ^
|                   222                    |  |
|                    22                    |  |
|                    22                    | Lat
|                    22                    |
+------------------------------------------+
   Long -->

1 represents the value of C1.long and C1.lat in the outer loop
2 represents the possible values of C2.long and C2.lat returned by the SEEK.
The WHERE part then filters out the rows that don't fulfill the full join condition.

Pretty neat, but there's still a lot of cities returned by the SEEK that can't possibly be within the distance. Which is where the grid comes in...
Cities are neighbours if the distance between them is no more than 1 'degree'. So we assign each city to a square 1×1 'degree' -- the view CitiesSq. Now each neighbouring cities C2 of a city C1 must either be in one of the 8 neighbouring squares or in the same square. The advantage of this is that we can do equijoins on each case and (a) use a hash join and reduce the number of table scans (b) eliminate far more impossible pairs.
The UNION ALL is just testing each of C1's neighbouring square to find C2. The C1.id <> C2.id condition is only required when C1 and C2 are in the same square.


+------------------------------------------+
|                                          |
|                                          |
|                   222                    |
|                   212                    |  ^
|                   222                    |  |
|                                          |  |
|                                          | Lat
|                                          |
+------------------------------------------+
   Long -->

In practice, I found it rather difficult to get a good query plan without explicitly enumerating the neighbouring squares, hence the ugly UNION query. With a bit of head bashing, I got it to do something similar with one hash join (though I guess you're trading hash table size for number of scans), but it's arguably even uglier, and without the join hint it's a disaster!


SELECT C1.id AS id1, C2.id AS id2
FROM (
    SELECT id, long, lat,
      (CAST(FLOOR(long) AS int))*180+CAST(FLOOR(lat) AS int) AS longlat
    FROM Cities
  ) AS C1
INNER HASH JOIN (
    SELECT id, long, lat,
      (CAST(FLOOR(long) AS int)+xd)*180+CAST(FLOOR(lat) AS int)+yd AS longlat
    FROM Cities
    CROSS JOIN (SELECT -1 AS xd UNION SELECT 0 UNION SELECT 1) XDelta
    CROSS JOIN (SELECT -1 AS yd UNION SELECT 0 UNION SELECT 1) YDelta
  ) AS C2
  ON C1.longlat = C2.longlat
WHERE ABS(C1.long - C2.long) <= 1.0 AND ABS(C1.lat - C2.lat) <= 1.0
  AND C1.id <> C2.id

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-06-19 : 15:07:19

Frib, thanks for your thoughtful approach to this. I need to spend a bit more time with your post to fully grasp it.

For Kien(or anyone), in case this isn't/wasnt' clear, Mr.Frib's model (and resulting rowset) is a better way of defining the #worktable temp table from my post. You would still use the two update statements I outlined to build your final 'Query' column from the #worktable populated by Frib's rowset. (correct me if I am wrong)

I can't imagine it would be better, but I'd love to hear how an iterative method of building the 'Query' column stacks up against the set based method we have/are come/coming up with...

<O>

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-06-19 : 16:56:08

quote:

Frib, thanks for your thoughtful approach to this. I need to spend a bit more time with your post to fully grasp it.

It's spookily close to the first few pages of Chapter 26 (Range Searching) in Robert Sedgewick's Algorithms (least it is in my ancient '84 edition with examples in nasty-looking Pascal). I didn't get on to 2D Trees because I couldn't see any way of doing that sensibly in this context.

quote:

For Kien(or anyone), in case this isn't/wasnt' clear, Mr.Frib's model (and resulting rowset) is a better way of defining the #worktable temp table from my post. You would still use the two update statements I outlined to build your final 'Query' column from the #worktable populated by Frib's rowset. (correct me if I am wrong)

Yes, ~~I couldn't be arsed~~ you seemed to have that part sorted out.

quote:

I can't imagine it would be better, but I'd love to hear how an iterative method of building the 'Query' column stacks up against the set based method we have/are come/coming up with...

Don't know about building the Query column, but building the test data with a set-based mechanism is a good deal faster than any while loop.

kien
Starting Member

27 Posts

Posted - 2002-07-11 : 17:20:55

Thank you to all the SQL Gurus who responded to this problem. Your help was much appreciated

It's been nearly a month since i posted this problem. I finally have a chance to work on this problem now.

For starters, i'm trying out efelito's code.
I have one quick question (not sure how quickly the solution will take though):

instead of searching within 1 degree of lat/longitude, how do search for within 100km radius given the code that i'm working with.
(i know it has to do with the earths radius which is 6367km)

Thanks again for your input.

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-07-11 : 18:16:55

There's a T-SQL translation of a formula for the Great Circle distance in this thread:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=12572
I didn't test it extensively, but I think it works ok.
But you'd probably want apply something faster to get a rough answer first. Obviously I'd recommend a grid-based method like I presented, since it seemed to work fastest for me, but you'd need to consider how high-latitude areas and the 'seam' down the pacific are handled. Perhaps chop it into 4 pieces: 2 polar caps, a wide strip around the eqator and the seam. The UNION to get the results back together could be a bit slow, though.

It's late again

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-07-12 : 07:36:35

Ok, I've been having a play around with what I outlined last night.

Here's what I've been using as test data -- as I said previously, this isn't particularly realistic because the distribution of cites around the world is not uniform, but at least the extra ASIN term stops the poles being overpopulated here:


INSERT INTO Cities (long, lat)
SELECT RAND(CAST(NEWID() AS binary(4))) * 360 - 180,
  DEGREES(ASIN(RAND(CAST(NEWID() AS binary(4))) * 2.0 - 1.0))
FROM Numbers
WHERE n < 100000
GO

I suggested splitting the globe into 4 parts. On reflection, perhaps 8 is better: an additional pair of rings around the Northern and Southern latitudes from 60 to 75 to account for the smaller degree distances, in addition, the Pacific seam becomes 3 parts. That's what I've used here. As long as you're careful to make sure the pieces don't overlap, you can stick the pieces back together with a UNION ALL which isn't costly if it's using concatenation.
The latitude-within-1-degree check, the non-reflexive check and the final (hopefully!) distance check have been factored out, for clarity -- I don't think it makes much difference to performance either way. The estimate of 61 million rows for a uniform distribution of 1 million cities still seems to be about right. I'm getting run times of ~20 minutes on my desktop machine for 200000, so budget for a few hours running 1 million -- the run time is pretty much quadratic from what I can see.


SELECT C1id, C2id, C1long, C1lat, C2long, C2lat,
  dbo.GCDist(RADIANS(C1lat), RADIANS(C1long), RADIANS(C2lat), RADIANS(C2long)) AS distance
FROM (
    -- Equatorial band: latitude (-60, +60), longitude (-178, +178), 2 degree longd grid
    SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat
    FROM (
        SELECT id, long, lat,
          (CAST(FLOOR(long/2.0) AS int))*180+CAST(FLOOR(lat) AS int) AS longlat
        FROM Cities
        WHERE long > -178 AND long < 178 AND lat > -60 AND lat < 60
      ) AS C1
    INNER HASH JOIN (
        SELECT id, long, lat,
          (CAST(FLOOR(long/2.0) AS int)+xd)*180+CAST(FLOOR(lat) AS int)+yd AS longlat
        FROM Cities
        CROSS JOIN (SELECT -1 AS xd UNION SELECT 0 UNION SELECT 1) XDelta
        CROSS JOIN (SELECT -1 AS yd UNION SELECT 0 UNION SELECT 1) YDelta
        WHERE lat > -61 AND lat < 61
      ) AS C2
      ON C1.longlat = C2.longlat
    WHERE 
      ABS(C1.long - C2.long) <= 2.0
    
    UNION ALL
    
    -- Northern band: latitude [+60, +75), longitude (-176, +176), 4 degree longd grid
    SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat
    FROM (
        SELECT id, long, lat,
          (CAST(FLOOR(long/4.0) AS int))*180+CAST(FLOOR(lat) AS int) AS longlat
        FROM Cities
        WHERE long > -176 AND long < 176 AND lat >= 60 AND lat < 75
      ) AS C1
    INNER HASH JOIN (
        SELECT id, long, lat,
          (CAST(FLOOR(long/4.0) AS int)+xd)*180+CAST(FLOOR(lat) AS int)+yd AS longlat
        FROM Cities
        CROSS JOIN (SELECT -1 AS xd UNION SELECT 0 UNION SELECT 1) XDelta
        CROSS JOIN (SELECT -1 AS yd UNION SELECT 0 UNION SELECT 1) YDelta
        WHERE lat >= 59 AND lat < 76
      ) AS C2
      ON C1.longlat = C2.longlat
    WHERE 
      ABS(C1.long - C2.long) <= 4.0
    
    UNION ALL
    
    -- Southern band: latitude [-60, -75), longitude (-176, +176), 4 degree longd grid
    SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat
    FROM (
        SELECT id, long, lat,
          (CAST(FLOOR(long/4.0) AS int))*180+CAST(FLOOR(lat) AS int) AS longlat
        FROM Cities
        WHERE long > -176 AND long < 176 AND lat <= -60 AND lat > -75
      ) AS C1
    INNER HASH JOIN (
        SELECT id, long, lat,
          (CAST(FLOOR(long/4.0) AS int)+xd)*180+CAST(FLOOR(lat) AS int)+yd AS longlat
        FROM Cities
        CROSS JOIN (SELECT -1 AS xd UNION SELECT 0 UNION SELECT 1) XDelta
        CROSS JOIN (SELECT -1 AS yd UNION SELECT 0 UNION SELECT 1) YDelta
        WHERE lat <= -59 AND lat > -76
      ) AS C2
      ON C1.longlat = C2.longlat
    WHERE 
      ABS(C1.long - C2.long) <= 4.0
    
    UNION ALL
    
    -- North polar cap: latitude >=+75, no grid
    SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat
    FROM Cities C1
    INNER JOIN Cities AS C2
      ON C1.lat >= 75 AND C2.lat >= 74
    
    UNION ALL
    
    -- Southern polar cap: latitude <=-75, no grid
    SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat
    FROM Cities C1
    INNER JOIN Cities AS C2
      ON C1.lat <= -75 AND C2.lat <= -74
    
    UNION ALL
    
    -- Pacific seam Equatorial: latitude (-60, +60), longitude <=-178 or >=178, no grid
    SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat
    FROM Cities AS C1
    INNER JOIN Cities AS C2
      ON (C1.long <= -178 OR C1.long >= 178) AND C1.lat > -60 AND C1.lat < 60
      AND (C2.long <= -176 OR C2.long >= 176) AND C2.lat > -61 AND C2.lat < 61

    UNION ALL

    -- Pacific seam Northern: latitude [+60, +75), longitude <=-176 or >=176, no grid
    SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat
    FROM Cities AS C1
    INNER JOIN Cities AS C2
      ON (C1.long <= -176 OR C1.long >= 176) AND C1.lat >= 60 AND C1.lat < 75
      AND (C2.long <= -172 OR C2.long >= 172) AND C2.lat >= 59 AND C2.lat < 76

    UNION ALL

    -- Pacific seam Southern: latitude [-60, -75), longitude <=-176 or >=176, no grid
    SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat
    FROM Cities AS C1
    INNER JOIN Cities AS C2
      ON (C1.long <= -176 OR C1.long >= 176) AND C1.lat <= -60 AND C1.lat > -75
      AND (C2.long <= -172 OR C2.long >= 172) AND C2.lat <= -59 AND C2.lat > -76
  ) a
WHERE C1id <> C2id AND ABS(C1lat - C2lat) <= 1.0
  AND dbo.GCDist(RADIANS(C1lat), RADIANS(C1long), RADIANS(C2lat), RADIANS(C2long)) <= 100

Edit: had Pacific seam wrong (or more wrong!) before.

Edited by - Arnold Fribble on 07/12/2002 08:44:40

kien
Starting Member

27 Posts

Posted - 2002-07-12 : 15:45:17

Hi Mr. Fribble

I'm not sure if I'm missing something from your code, but where did you concatenate all the city names?

Besides the user function GCDist, is there any other things i should add?

thanks again.

kien
Starting Member

27 Posts

Posted - 2002-07-12 : 16:16:07

Please excuse my previous message.

What I meant was...
I'm not quite sure when Page47 says "You would still use the two update statements I outlined to build your final 'Query' column from the #worktable populated by Frib's rowset"

Forgive my ignorance, but this simple task doesn't seem very straight forward to me. Where exactly do i replace what with what.

Further more, how do I declare the #worktable (ie what variables?)
same as the "table kien"?:

pkey int identity(1,1) not null primary key,
city varchar(20) not null,
longitude int not null,
latitude int not null,
query varchar(2000) null

Sincerely

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-07-12 : 16:39:08

Essentially my query replaces the SELECT INTO commented "create a temp table to work with" in Page's code. At least, it would if it was a SELECT INTO, had an ORDER BY and we'd used the same table definitions

Page's stuff following that would remain -- the UPDATE with an assign to concatenate together the lists of cities and the final copy back into the original table.

I didn't do all that, partly because it was already covered, but mostly because I was more concerned about how you self-join a 1 million row table on a not-obviously sargable condition without it taking a couple of years to get a result.

BTW, how many cities do you really have? 1 million seems an awful lot: if there are 6 billion people in the world, then your smallest cities would have to contain fewer than 6000 people!

kien
Starting Member

27 Posts

Posted - 2002-07-15 : 11:09:15

Hi Mr. Fribble & Page47

The table actually consists of city/village/town, that's why there are so many records.

I've tried out Page47's code on 1000 records...wow SO much faster than the cursor solution =)
how come one does not have to declare #worktable...i find this very odd!

okay here is my major problem trying to "merge" both of your codes. For starters, I do not yet fully understand Frib's code...but here's what I think i should do. Please correct me if i'm wrong -which i think i am probably guilty off cause it doesn't run

If the following makes no sense at, it is because i'm very confused with all the different table column names =(

THANK YOU !!!!!
-----
table definition: Cities [id, city, long, lat, query]

--some variable we'll need
declare @query varchar(2000), @lastpkey int
select @query = '', @lastpkey = -1

--create a temp table to work with
SELECT C1id, C2id, C1long, C1lat, C2long, C2lat,
dbo.GCDist(RADIANS(C1lat), RADIANS(C1long), RADIANS(C2lat), RADIANS(C2long)) AS distance, Cities ??.city as neighbor, space(2000) as query
into
#worktable
FROM (
-- Equatorial band: latitude (-60, +60), longitude (-178, +178), 2 degree longd grid
SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat
FROM (
SELECT id, long, lat, city,
(CAST(FLOOR(long/2.0) AS int))*180+CAST(FLOOR(lat) AS int) AS longlat
FROM Cities
--Frib's code here...
WHERE C1id <> C2id AND ABS(C1lat - C2lat) <= 1.0
AND dbo.GCDist(RADIANS(C1lat), RADIANS(C1long), RADIANS(C2lat), RADIANS(C2long)) <= 100

order by
C1.id, C2.city ????

-- You may or may not want to index tempdb . . . I'll let you play with that
create index idx_worktable on #worktable(pkey,neighbor)

-- update the temp table building the query column
update
#worktable
set
@query = query = case
when @lastpkey <> id then neighbor
else @query + ' ' + neighbor
end,
@lastpkey = id

-- now update the final table from the temp table
update
Cities
set
query = (select max(query) from #worktable where id = C1.id)
from
Cities C1

-- clean up
drop table #worktable

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources