Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
AskSQLTeam
Ask SQLTeam Question
0 Posts |
Posted - 2002-06-17 : 13:48:33
|
Kien writes "Dear SQL team (you will become my savour if you help me answer this brain teaser =)ARRG..it's been bugging me for a while now !=(NOTE: sql server 2000, microsoft 2000my ultimate goal is to have a table similar to the following:City Longitude Latitute QueryLondon 10 11 London Oxford BristolOxford 10 12 Oxford London Bristol Bristol 11 10 Bristol London OxfordLondon 20 20 London Toronto Barrie Toronto 21 21 Toronto London Barrie Barrie 20 21 Barrie Toronto London....etc --Currently I have a massive table (1 million plus entries) of cities, longitude and latitute. I'm trying to develop a column called "Query" that displays the names of all the cities that are near the city that is in question (say within 1 degree of longitude and latitude).In otherwords, each field under the "Query" column needs to be able to search through the entire table, looking for neibouring cities to the city that is in questionI need to be able to accomlish this automatically, since my table of cities is huge =(Notice that for London (20,20), it displays Toronto and Barrie, while London (10,11) displays Oxford and Bristol.If you have any suggestions, examples, leads...your input would be MUCH APPRECIATED ! =)Thanks againKien" |
|
LarsG
Constraint Violating Yak Guru
284 Posts |
Posted - 2002-06-17 : 14:43:23
|
You could use a function to get adjacent cities. You may have to change som criteria. The code is untested but the general idea should be ok. With the table size there may be some performance issues. Which columns are indexed? create function calcCities(@longitude int,@latitude int)returns varchar(400)asdeclare @s varchar(400)select @s = coalesce(@s + ' ','') + @city from cities where longitude between @longitude - 1 and @longitude + 1 and latitude between @latitude - 1 and @latitude + 1 return @sgoselect city,longitude,latitude, calcCities(longitude,latitude) from cities where cities in ('London','Bristol') |
 |
|
efelito
Constraint Violating Yak Guru
478 Posts |
Posted - 2002-06-17 : 16:05:37
|
This should do what you are looking for. Unfortunately I was unable to find a set based solution for this problem. DECLARE @CityList varchar(1000)DECLARE @city varchar(20)DECLARE @longitude intDECLARE @latitude intDECLARE city_cursor CURSOR FOR SELECT city, longitude, latitude FROM CityLocationOPEN city_cursorFETCH NEXT FROM city_cursor INTO @city, @longitude, @latitudeWHILE @@FETCH_STATUS = 0BEGIN SELECT @CityList = COALESCE(@CityList + ' ', '') + isnull(querycity, '') FROM (SELECT t1.city, t1.longitude, t1.latitude, t2.city AS querycity FROM CityLocation t1 INNER JOIN CityLocation t2 ON (abs(t1.longitude - t2.longitude) <= 1 and abs(t1.latitude - t2.latitude) <= 1)) AS t3 WHERE city = @city AND longitude = @longitude AND latitude = @latitude UPDATE CityLocation SET query = @CityList WHERE city = @city AND longitude = @longitude AND latitude = @latitude SET @CityList = Null FETCH NEXT FROM city_cursor INTO @city, @longitude, @latitudeENDCLOSE city_cursorDEALLOCATE city_cursor Jeff BanschbachConsultant, MCDBAEdited by - efelito on 06/17/2002 16:07:01 |
 |
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-06-17 : 17:03:21
|
Capt'n Set-Based to the res-queue . . .First, lets set the stage(I wish ppl would provide this stuff when they post a question!!) ...create table kien ( pkey int identity(1,1) not null primary key, city varchar(20) not null, longitude int not null, latitude int not null, query varchar(2000) null )insert kien(city,longitude,latitude)select 'London',10,11 unionselect 'Oxford',10,12 unionselect 'Bristol',11,10 unionselect 'London',20,20 unionselect 'Toronto',21,21 unionselect 'Barrie',20,21go Now, here is the set based code . . . You better beef up your TempBD if you've got millions of records!!!--some variable we'll need declare @query varchar(2000), @lastpkey intselect @query = '', @lastpkey = -1--create a temp table to work withselect k.pkey, k2.city as neighbor, space(2000) as queryinto #worktablefrom kien k inner join kien k2 on ( abs(k.longitude - k2.longitude) <= 1 and abs(k.latitude - k2.latitude) <= 1)order by k.pkey, k2.city-- You may or may not want to index tempdb . . . I'll let you play with thatcreate index idx_worktable on #worktable(pkey,neighbor)-- update the temp table building the query columnupdate #worktableset @query = query = case when @lastpkey <> pkey then neighbor else @query + ' ' + neighbor end, @lastpkey = pkey-- now update the final table from the temp tableupdate kienset query = (select max(query) from #worktable where pkey = k.pkey)from kien k-- clean updrop table #worktable Class dismissed <O> |
 |
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-06-17 : 17:08:45
|
Oh yeah, your sample data shows Oxford in the Query column for Bristol and your requirements say "within 1 degree of longitude and latitude". . . what gives?<O> |
 |
|
efelito
Constraint Violating Yak Guru
478 Posts |
Posted - 2002-06-18 : 10:09:19
|
Thanks for the set based solution capt'n. I really did try to resist the dark side on this one and come up with something like that. I was close, but I always forget about the set @var = col = value trick. Coming from an application dev background is helpful in db admin in a lot of ways, but I think it's actually harmful when it comes to creating set based solutions.Jeff BanschbachConsultant, MCDBA |
 |
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-06-18 : 10:52:18
|
quote: (say within 1 degree of longitude and latitude).
So for places on the equator, that's a patch 222 km E-W by about the same N-S (a little less, but not much). But at 60°N, the patch is only half as wide. That's not exactly fair! |
 |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2002-06-18 : 10:58:37
|
Well yeah, but most equatorial territory is jungle, desert, or mountainous, really not that heavily populated. OK, OK, Rio. You got me... |
 |
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-06-18 : 11:03:07
|
While I'm throwing spanners...abs(k.longitude - k2.longitude) <= 1 and abs(k.latitude - k2.latitude) <= 1)isn't right either: think of a map of the Pacific. |
 |
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-06-18 : 18:10:37
|
Hey people! We haven't solved this one and it's interesting!! For the time being let's pretend I didn't make those snide remarks about the world being spheroidal, and imagine that we are dealing with a flat, rectangular area. First let's create some cities.CREATE TABLE Cities ( id int IDENTITY(1,1) PRIMARY KEY, long numeric(6,3) NOT NULL, lat numeric(5,2) NOT NULL)GOINSERT INTO Cities (long, lat)SELECT RAND(CAST(NEWID() AS binary(4))) * 360 - 180, RAND(CAST(NEWID() AS binary(4))) * 180 - 90FROM NumbersWHERE n < 5000GO Numbers is the usual single column (n) of integers table.Although these cities have "latitude" and "longitude" with degreeoid values, they're really just on a 360×180 rectangle.(and apologies for using "long" as a column name, but it's not a SQL keyword )SELECT C1.id AS id1, C2.id AS id2FROM Cities AS C1INNER JOIN Cities AS C2 ON ABS(C1.long - C2.long) <= 1.0 AND ABS(C1.lat - C2.lat) <= 1.0 AND C1.id <> C2.id The trivial case where C1.id = C2.id is omitted, but it makes little difference to performance -- either way it sucks. It's that join condition: there's no way SQL Server can do anything except nested loop scans.Also note the number of rows: we have only 5000 rows and already have about 1540 neighbours -- how's that going to grow when there's 10^6 rows?Here's a possible way out of the problem: use a grid to cut down the numbers-- just saving my fingers hereCREATE VIEW CitiesSqASSELECT id, long, lat, CAST(FLOOR(long) AS smallint) AS longsq, CAST(FLOOR(lat) AS smallint) AS latsqFROM CitiesGO-- Caution: Ugly code ahead!SELECT id1, id2FROM ( SELECT C1.id AS id1, C1.long AS long1, C1.lat AS lat1, C2.id AS id2, C2.long AS long2, C2.lat AS lat2 FROM CitiesSq AS C1 INNER JOIN CitiesSq AS C2 ON C1.longsq = C2.longsq AND C1.latsq = C2.latsq AND C1.id <> C2.id UNION ALL SELECT C1.id, C1.long, C1.lat, C2.id, C2.long, C2.lat FROM CitiesSq AS C1 INNER JOIN CitiesSq AS C2 ON C1.longsq = C2.longsq AND C1.latsq = C2.latsq + 1 UNION ALL SELECT C1.id, C1.long, C1.lat, C2.id, C2.long, C2.lat FROM CitiesSq AS C1 INNER JOIN CitiesSq AS C2 ON C1.longsq = C2.longsq AND C1.latsq = C2.latsq - 1 UNION ALL SELECT C1.id, C1.long, C1.lat, C2.id, C2.long, C2.lat FROM CitiesSq AS C1 INNER JOIN CitiesSq AS C2 ON C1.longsq = C2.longsq + 1 AND C1.latsq = C2.latsq UNION ALL SELECT C1.id, C1.long, C1.lat, C2.id, C2.long, C2.lat FROM CitiesSq AS C1 INNER JOIN CitiesSq AS C2 ON C1.longsq = C2.longsq + 1 AND C1.latsq = C2.latsq + 1 UNION ALL SELECT C1.id, C1.long, C1.lat, C2.id, C2.long, C2.lat FROM CitiesSq AS C1 INNER JOIN CitiesSq AS C2 ON C1.longsq = C2.longsq + 1 AND C1.latsq = C2.latsq - 1 UNION ALL SELECT C1.id, C1.long, C1.lat, C2.id, C2.long, C2.lat FROM CitiesSq AS C1 INNER JOIN CitiesSq AS C2 ON C1.longsq = C2.longsq - 1 AND C1.latsq = C2.latsq UNION ALL SELECT C1.id, C1.long, C1.lat, C2.id, C2.long, C2.lat FROM CitiesSq AS C1 INNER JOIN CitiesSq AS C2 ON C1.longsq = C2.longsq - 1 AND C1.latsq = C2.latsq + 1 UNION ALL SELECT C1.id, C1.long, C1.lat, C2.id, C2.long, C2.lat FROM CitiesSq AS C1 INNER JOIN CitiesSq AS C2 ON C1.longsq = C2.longsq - 1 AND C1.latsq = C2.latsq - 1 ) AS NeighbouringSquareWHERE ABS(long1 - long2) <= 1.0 AND ABS(lat1 - lat2) <= 1.0 I've tried this as far as 100000 cities, and it scales ok, but they are uniform random distributed -- I would guess that real cities will have more clumping and that this would slow things down. With the population scheme above, the number of neighbours appears to be 6.17863E-05 * n^2 or thereabouts, which would give around 61.7 million neighbours for 1 million cities. The number of neighbours per city looks like a LogNormal Poisson distribution to me (I am not a statistician), but it's getting late and I have no idea what (if anything) that means. Anyway, you may want to think over whether 1 degree either way is really what you want.Edited by - Arnold Fribble on 06/19/2002 02:14:46 |
 |
|
setbasedisthetruepath
Used SQL Salesman
992 Posts |
Posted - 2002-06-19 : 09:56:21
|
What's the distinction between Page's ABS(long1 - long2) <= 1.0 AND ABS(lat1 - lat2) <= 1.0 and your union all query? I notice in one case you've included the city1.id != city2.id, but generally the individual branches of the union's seem redundant.setBasedIsTheTruepath<O> |
 |
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-06-19 : 12:53:55
|
I didn't make it explicit, but my big post was assuming SQL Server 2000. I haven't tested with earlier versions, so I don't know how the behaviour compares.Ok, in isolation, finding the Manhattan distance usingABS(C1.long - C2.long) <= 1.0 AND ABS(C1.lat - C2.lat) <= 1.0appears to be faster as a filter thanC1.long BETWEEN C2.long - 1 AND C2.lat + 1 AND C1.lat BETWEEN C2.lat - 1 AND C2.lat + 1However, the former cannot utilize indexes: a join using that condition has no option but to scan the whole table for suitable C2 rows for every C1 row. Which unless all cities are in a 1 by 1 square (and so all pairs match), is not very efficient, especially when there are over 1 million rows in the table.If you create a covering index on Cities:CREATE INDEX LongLat ON Cities (long, lat, id)and try the BETWEEN form, you'll find it does use the index:SELECT C1.id AS id1, C2.id AS id2FROM Cities AS C1INNER JOIN Cities AS C2 ON C1.long BETWEEN C2.long - 1.0 AND C2.long + 1.0 AND C1.lat BETWEEN C2.lat - 1.0 AND C2.lat + 1.0 AND C1.id <> C2.id It still has to use a loop because the join is using the BETWEEN, but at least you get a seek on the inner loop. But because both the longitude and latitude have BETWEENs, the SEEK condition cannot do all the work. What it does is return the rows with values that fall in this area:+------------------------------------------+| 22 || 22 || 222 || 212 | ^| 222 | || 22 | || 22 | Lat| 22 |+------------------------------------------+ Long --> 1 represents the value of C1.long and C1.lat in the outer loop2 represents the possible values of C2.long and C2.lat returned by the SEEK.The WHERE part then filters out the rows that don't fulfill the full join condition.Pretty neat, but there's still a lot of cities returned by the SEEK that can't possibly be within the distance. Which is where the grid comes in...Cities are neighbours if the distance between them is no more than 1 'degree'. So we assign each city to a square 1×1 'degree' -- the view CitiesSq. Now each neighbouring cities C2 of a city C1 must either be in one of the 8 neighbouring squares or in the same square. The advantage of this is that we can do equijoins on each case and (a) use a hash join and reduce the number of table scans (b) eliminate far more impossible pairs.The UNION ALL is just testing each of C1's neighbouring square to find C2. The C1.id <> C2.id condition is only required when C1 and C2 are in the same square.+------------------------------------------+| || || 222 || 212 | ^| 222 | || | || | Lat| |+------------------------------------------+ Long --> In practice, I found it rather difficult to get a good query plan without explicitly enumerating the neighbouring squares, hence the ugly UNION query. With a bit of head bashing, I got it to do something similar with one hash join (though I guess you're trading hash table size for number of scans), but it's arguably even uglier, and without the join hint it's a disaster!SELECT C1.id AS id1, C2.id AS id2FROM ( SELECT id, long, lat, (CAST(FLOOR(long) AS int))*180+CAST(FLOOR(lat) AS int) AS longlat FROM Cities ) AS C1INNER HASH JOIN ( SELECT id, long, lat, (CAST(FLOOR(long) AS int)+xd)*180+CAST(FLOOR(lat) AS int)+yd AS longlat FROM Cities CROSS JOIN (SELECT -1 AS xd UNION SELECT 0 UNION SELECT 1) XDelta CROSS JOIN (SELECT -1 AS yd UNION SELECT 0 UNION SELECT 1) YDelta ) AS C2 ON C1.longlat = C2.longlatWHERE ABS(C1.long - C2.long) <= 1.0 AND ABS(C1.lat - C2.lat) <= 1.0 AND C1.id <> C2.id |
 |
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-06-19 : 15:07:19
|
Frib, thanks for your thoughtful approach to this. I need to spend a bit more time with your post to fully grasp it.For Kien(or anyone), in case this isn't/wasnt' clear, Mr.Frib's model (and resulting rowset) is a better way of defining the #worktable temp table from my post. You would still use the two update statements I outlined to build your final 'Query' column from the #worktable populated by Frib's rowset. (correct me if I am wrong)I can't imagine it would be better, but I'd love to hear how an iterative method of building the 'Query' column stacks up against the set based method we have/are come/coming up with...<O> |
 |
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-06-19 : 16:56:08
|
quote: Frib, thanks for your thoughtful approach to this. I need to spend a bit more time with your post to fully grasp it.
It's spookily close to the first few pages of Chapter 26 (Range Searching) in Robert Sedgewick's Algorithms (least it is in my ancient '84 edition with examples in nasty-looking Pascal). I didn't get on to 2D Trees because I couldn't see any way of doing that sensibly in this context.quote: For Kien(or anyone), in case this isn't/wasnt' clear, Mr.Frib's model (and resulting rowset) is a better way of defining the #worktable temp table from my post. You would still use the two update statements I outlined to build your final 'Query' column from the #worktable populated by Frib's rowset. (correct me if I am wrong)
Yes, I couldn't be arsed you seemed to have that part sorted out.quote: I can't imagine it would be better, but I'd love to hear how an iterative method of building the 'Query' column stacks up against the set based method we have/are come/coming up with...
Don't know about building the Query column, but building the test data with a set-based mechanism is a good deal faster than any while loop. |
 |
|
kien
Starting Member
27 Posts |
Posted - 2002-07-11 : 17:20:55
|
Thank you to all the SQL Gurus who responded to this problem. Your help was much appreciated It's been nearly a month since i posted this problem. I finally have a chance to work on this problem now.For starters, i'm trying out efelito's code.I have one quick question (not sure how quickly the solution will take though):instead of searching within 1 degree of lat/longitude, how do search for within 100km radius given the code that i'm working with.(i know it has to do with the earths radius which is 6367km)Thanks again for your input. |
 |
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-07-11 : 18:16:55
|
There's a T-SQL translation of a formula for the Great Circle distance in this thread:http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=12572I didn't test it extensively, but I think it works ok.But you'd probably want apply something faster to get a rough answer first. Obviously I'd recommend a grid-based method like I presented, since it seemed to work fastest for me, but you'd need to consider how high-latitude areas and the 'seam' down the pacific are handled. Perhaps chop it into 4 pieces: 2 polar caps, a wide strip around the eqator and the seam. The UNION to get the results back together could be a bit slow, though.It's late again |
 |
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-07-12 : 07:36:35
|
Ok, I've been having a play around with what I outlined last night.Here's what I've been using as test data -- as I said previously, this isn't particularly realistic because the distribution of cites around the world is not uniform, but at least the extra ASIN term stops the poles being overpopulated here:INSERT INTO Cities (long, lat)SELECT RAND(CAST(NEWID() AS binary(4))) * 360 - 180, DEGREES(ASIN(RAND(CAST(NEWID() AS binary(4))) * 2.0 - 1.0))FROM NumbersWHERE n < 100000GO I suggested splitting the globe into 4 parts. On reflection, perhaps 8 is better: an additional pair of rings around the Northern and Southern latitudes from 60 to 75 to account for the smaller degree distances, in addition, the Pacific seam becomes 3 parts. That's what I've used here. As long as you're careful to make sure the pieces don't overlap, you can stick the pieces back together with a UNION ALL which isn't costly if it's using concatenation.The latitude-within-1-degree check, the non-reflexive check and the final (hopefully!) distance check have been factored out, for clarity -- I don't think it makes much difference to performance either way. The estimate of 61 million rows for a uniform distribution of 1 million cities still seems to be about right. I'm getting run times of ~20 minutes on my desktop machine for 200000, so budget for a few hours running 1 million -- the run time is pretty much quadratic from what I can see.SELECT C1id, C2id, C1long, C1lat, C2long, C2lat, dbo.GCDist(RADIANS(C1lat), RADIANS(C1long), RADIANS(C2lat), RADIANS(C2long)) AS distanceFROM ( -- Equatorial band: latitude (-60, +60), longitude (-178, +178), 2 degree longd grid SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat FROM ( SELECT id, long, lat, (CAST(FLOOR(long/2.0) AS int))*180+CAST(FLOOR(lat) AS int) AS longlat FROM Cities WHERE long > -178 AND long < 178 AND lat > -60 AND lat < 60 ) AS C1 INNER HASH JOIN ( SELECT id, long, lat, (CAST(FLOOR(long/2.0) AS int)+xd)*180+CAST(FLOOR(lat) AS int)+yd AS longlat FROM Cities CROSS JOIN (SELECT -1 AS xd UNION SELECT 0 UNION SELECT 1) XDelta CROSS JOIN (SELECT -1 AS yd UNION SELECT 0 UNION SELECT 1) YDelta WHERE lat > -61 AND lat < 61 ) AS C2 ON C1.longlat = C2.longlat WHERE ABS(C1.long - C2.long) <= 2.0 UNION ALL -- Northern band: latitude [+60, +75), longitude (-176, +176), 4 degree longd grid SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat FROM ( SELECT id, long, lat, (CAST(FLOOR(long/4.0) AS int))*180+CAST(FLOOR(lat) AS int) AS longlat FROM Cities WHERE long > -176 AND long < 176 AND lat >= 60 AND lat < 75 ) AS C1 INNER HASH JOIN ( SELECT id, long, lat, (CAST(FLOOR(long/4.0) AS int)+xd)*180+CAST(FLOOR(lat) AS int)+yd AS longlat FROM Cities CROSS JOIN (SELECT -1 AS xd UNION SELECT 0 UNION SELECT 1) XDelta CROSS JOIN (SELECT -1 AS yd UNION SELECT 0 UNION SELECT 1) YDelta WHERE lat >= 59 AND lat < 76 ) AS C2 ON C1.longlat = C2.longlat WHERE ABS(C1.long - C2.long) <= 4.0 UNION ALL -- Southern band: latitude [-60, -75), longitude (-176, +176), 4 degree longd grid SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat FROM ( SELECT id, long, lat, (CAST(FLOOR(long/4.0) AS int))*180+CAST(FLOOR(lat) AS int) AS longlat FROM Cities WHERE long > -176 AND long < 176 AND lat <= -60 AND lat > -75 ) AS C1 INNER HASH JOIN ( SELECT id, long, lat, (CAST(FLOOR(long/4.0) AS int)+xd)*180+CAST(FLOOR(lat) AS int)+yd AS longlat FROM Cities CROSS JOIN (SELECT -1 AS xd UNION SELECT 0 UNION SELECT 1) XDelta CROSS JOIN (SELECT -1 AS yd UNION SELECT 0 UNION SELECT 1) YDelta WHERE lat <= -59 AND lat > -76 ) AS C2 ON C1.longlat = C2.longlat WHERE ABS(C1.long - C2.long) <= 4.0 UNION ALL -- North polar cap: latitude >=+75, no grid SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat FROM Cities C1 INNER JOIN Cities AS C2 ON C1.lat >= 75 AND C2.lat >= 74 UNION ALL -- Southern polar cap: latitude <=-75, no grid SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat FROM Cities C1 INNER JOIN Cities AS C2 ON C1.lat <= -75 AND C2.lat <= -74 UNION ALL -- Pacific seam Equatorial: latitude (-60, +60), longitude <=-178 or >=178, no grid SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat FROM Cities AS C1 INNER JOIN Cities AS C2 ON (C1.long <= -178 OR C1.long >= 178) AND C1.lat > -60 AND C1.lat < 60 AND (C2.long <= -176 OR C2.long >= 176) AND C2.lat > -61 AND C2.lat < 61 UNION ALL -- Pacific seam Northern: latitude [+60, +75), longitude <=-176 or >=176, no grid SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat FROM Cities AS C1 INNER JOIN Cities AS C2 ON (C1.long <= -176 OR C1.long >= 176) AND C1.lat >= 60 AND C1.lat < 75 AND (C2.long <= -172 OR C2.long >= 172) AND C2.lat >= 59 AND C2.lat < 76 UNION ALL -- Pacific seam Southern: latitude [-60, -75), longitude <=-176 or >=176, no grid SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat FROM Cities AS C1 INNER JOIN Cities AS C2 ON (C1.long <= -176 OR C1.long >= 176) AND C1.lat <= -60 AND C1.lat > -75 AND (C2.long <= -172 OR C2.long >= 172) AND C2.lat <= -59 AND C2.lat > -76 ) aWHERE C1id <> C2id AND ABS(C1lat - C2lat) <= 1.0 AND dbo.GCDist(RADIANS(C1lat), RADIANS(C1long), RADIANS(C2lat), RADIANS(C2long)) <= 100 Edit: had Pacific seam wrong (or more wrong!) before.Edited by - Arnold Fribble on 07/12/2002 08:44:40 |
 |
|
kien
Starting Member
27 Posts |
Posted - 2002-07-12 : 15:45:17
|
Hi Mr. FribbleI'm not sure if I'm missing something from your code, but where did you concatenate all the city names?Besides the user function GCDist, is there any other things i should add?thanks again. |
 |
|
kien
Starting Member
27 Posts |
Posted - 2002-07-12 : 16:16:07
|
Please excuse my previous message.What I meant was...I'm not quite sure when Page47 says "You would still use the two update statements I outlined to build your final 'Query' column from the #worktable populated by Frib's rowset"Forgive my ignorance, but this simple task doesn't seem very straight forward to me. Where exactly do i replace what with what.Further more, how do I declare the #worktable (ie what variables?)same as the "table kien"?: pkey int identity(1,1) not null primary key, city varchar(20) not null, longitude int not null, latitude int not null, query varchar(2000) null Sincerely |
 |
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-07-12 : 16:39:08
|
Essentially my query replaces the SELECT INTO commented "create a temp table to work with" in Page's code. At least, it would if it was a SELECT INTO, had an ORDER BY and we'd used the same table definitions Page's stuff following that would remain -- the UPDATE with an assign to concatenate together the lists of cities and the final copy back into the original table.I didn't do all that, partly because it was already covered, but mostly because I was more concerned about how you self-join a 1 million row table on a not-obviously sargable condition without it taking a couple of years to get a result.BTW, how many cities do you really have? 1 million seems an awful lot: if there are 6 billion people in the world, then your smallest cities would have to contain fewer than 6000 people! |
 |
|
kien
Starting Member
27 Posts |
Posted - 2002-07-15 : 11:09:15
|
Hi Mr. Fribble & Page47The table actually consists of city/village/town, that's why there are so many records.I've tried out Page47's code on 1000 records...wow SO much faster than the cursor solution =)how come one does not have to declare #worktable...i find this very odd!okay here is my major problem trying to "merge" both of your codes. For starters, I do not yet fully understand Frib's code...but here's what I think i should do. Please correct me if i'm wrong -which i think i am probably guilty off cause it doesn't run If the following makes no sense at, it is because i'm very confused with all the different table column names =(THANK YOU !!!!!-----table definition: Cities [id, city, long, lat, query]--some variable we'll need declare @query varchar(2000), @lastpkey intselect @query = '', @lastpkey = -1--create a temp table to work withSELECT C1id, C2id, C1long, C1lat, C2long, C2lat, dbo.GCDist(RADIANS(C1lat), RADIANS(C1long), RADIANS(C2lat), RADIANS(C2long)) AS distance, Cities ??.city as neighbor, space(2000) as query into #worktableFROM ( -- Equatorial band: latitude (-60, +60), longitude (-178, +178), 2 degree longd grid SELECT C1.id AS C1id, C2.id AS C2id, C1.long AS C1long, C1.lat AS C1lat, C2.long AS C2long, C2.lat AS C2lat FROM ( SELECT id, long, lat, city, (CAST(FLOOR(long/2.0) AS int))*180+CAST(FLOOR(lat) AS int) AS longlat FROM Cities--Frib's code here...WHERE C1id <> C2id AND ABS(C1lat - C2lat) <= 1.0 AND dbo.GCDist(RADIANS(C1lat), RADIANS(C1long), RADIANS(C2lat), RADIANS(C2long)) <= 100order by C1.id, C2.city ????-- You may or may not want to index tempdb . . . I'll let you play with thatcreate index idx_worktable on #worktable(pkey,neighbor)-- update the temp table building the query columnupdate #worktableset @query = query = case when @lastpkey <> id then neighbor else @query + ' ' + neighbor end, @lastpkey = id-- now update the final table from the temp tableupdate Citiesset query = (select max(query) from #worktable where id = C1.id)from Cities C1-- clean updrop table #worktable |
 |
|
Next Page
|
|
|
|
|