Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
RickD
Slow But Sure Yak Herding Master
3608 Posts |
Posted - 2006-06-13 : 12:50:24
|
Just come back from two weeks holiday, brains not entirely in gear and been asked to improve the following SQL. Can anyone please help?I basically need to find out how far each postcode is from another. This is currently a view called by a couple of other procedures and the limiting field is the HowFarMiles, this is chosen by the user from a drop down and can be 20, 50 or 100.SELECT FromPC = PCFind.sPostCode, FromX = PCFind.liX, FromY = PCFind.liY, ToPC = PCHave.sPostCode, ToX = PCHave.liX, ToY = PCHave.liY, HowFarMiles = CAST((SQRT(SQUARE(@FromX - @ToX) + SQUARE(@FromY - @ToY)) / 1609.347218) AS INT)FROM Postcodes PCHave WITH (NOLOCK) CROSS JOIN Postcodes PCFind WITH (NOLOCK) |
|
|
Kristen
Test
22859 Posts |
Posted - 2006-06-13 : 13:52:04
|
I always think that that approach is overly CPU intensive!My approach is to work out the min/max Lat/Long for a square based on the centre position.Then query for records within that square - which is just a simple:WHERE @MinLat <= ObjectLat AND ObjectLat <= @MaxLat AND @MinLong <= ObjectLong AND ObjectLong <= @MaxLong and then to apply all the heavy-duty SQRT CPU stuff to the ones that make it through that filter.Obviously you need an index on your ObjectLat and ObjectLong columns for the above WHERE clause to be super-speedy!Edit Fixed typo Kristen |
 |
|
|
RickD
Slow But Sure Yak Herding Master
3608 Posts |
Posted - 2006-06-14 : 05:26:24
|
Thanks Kristen. I actually convinced them to add an inner join in the end as it only ever tries to match against our companies table..Brain, gear, engage.. |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-06-15 : 13:21:31
|
quote: Originally posted by RickD
SELECT FromPC = PCFind.sPostCode, FromX = PCFind.liX, FromY = PCFind.liY, ToPC = PCHave.sPostCode, ToX = PCHave.liX, ToY = PCHave.liY, HowFarMiles = CAST((SQRT(SQUARE(@FromX - @ToX) + SQUARE(@FromY - @ToY)) / 1609.347218) AS INT)FROM Postcodes PCHave WITH (NOLOCK) CROSS JOIN Postcodes PCFind WITH (NOLOCK)
You are doing a cross join (cartesian product, all possible combinations) which means you are calulating all distances twice.A -> B is equal the distance from B -> A. Also you are calculating A -> A. Therefore, add a where to your querySELECT FromPC = PCFind.sPostCode, FromX = PCFind.liX, FromY = PCFind.liY, ToPC = PCHave.sPostCode, ToX = PCHave.liX, ToY = PCHave.liY, HowFarMiles = CAST((SQRT(SQUARE(@FromX - @ToX) + SQUARE(@FromY - @ToY)) / 1609.347218) AS INT)FROM Postcodes PCHave WITH (NOLOCK) CROSS JOIN Postcodes PCFind WITH (NOLOCK)WHERE PCHave.sPostCode < PCFind.sPostCode Peter LarssonHelsingborg, Sweden |
 |
|
|
RickD
Slow But Sure Yak Herding Master
3608 Posts |
Posted - 2006-06-16 : 11:59:15
|
Sorry, but why would you want that where clause? Not sure on Swedish postcodes, but UK ones usually look something like WC1 1ES Which would be somewhere in west London near the city..Outside London, they are usually initials of the nearest big town/city, but are always 6-7 characters..As I said earlier, I didn't write this, it was dumped on me by developers wondering why their queries were taking so long.. I got them to agree to add an inner join to our companies table and made it much quicker.. Developers are the bain of my life at the moment, they should all be taught properly or taken out and shot.. |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-06-16 : 18:18:27
|
quote: Originally posted by RickD Sorry, but why would you want that where clause?As I said earlier, I didn't write this, it was dumped on me by developers wondering why their queries were taking so long.
Your query produces n^2 records where n denotes number of postal codes in your table. Having 30,000 postal codes in your table results in 900,000,000 records returned with your query above! 900 million records.And I know there is a total about 1,500,000 postal codes in Britain which potentially could produce 2,250,000,000,000 combinations. This is 2,250 billion records.And since the distance from WC1 1ES to WC4 2AD is the same as the distance from WC4 2AD to WC1 1ES, so why would you like to return the same distance twice? And also, why would you like to return the distance from WC1 1ES to WC1 1ES? And back again from WC1 1ES to WC1 1ES?Putting the where clause in the query makes it return less than half the query. With above example of 30,000 postal codes, the query now "only" returns (n^2 - n) / 2 records which is 449,985,000 records, almost 450 million records.That is less than half the number of records returned, which makes the query go twice as fast than before. This is doable even for a INNER JOIN query.Another technique for accomplishing this is to make a lookup table. How often does postal codes change? Insert into lookup table all combinations once for all.Peter LarssonHelsingborg, Sweden |
 |
|
|
RickD
Slow But Sure Yak Herding Master
3608 Posts |
Posted - 2006-06-19 : 05:37:29
|
| That still doesn't explain why you added a where clause with a < than on the postcode? I am just trying to understand why you think this would work?I have tried it and I am missing quite a few results. As I said at the beginning of this thread, this is set up as a view, so the procedure which calls this had a where clause to limit it to a postcode and a certain distance.As I stated a few posts ago, I have limited the result set by adding an inner join to our companies table into the view..Thanks for your input anyway.. |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-06-19 : 12:01:35
|
quote: Originally posted by RickD That still doesn't explain why you added a where clause with a < than on the postcode?
Ok. I will try to explain it better this time. Even if you put an "limit distance" WHERE clause on your view, you will still calculate the "round trip". The PCHave.sPostCode < PCFind.sPostCode filter out those records. See this cartesian product table for {A, B, C} and {A, B, C}. A, B and C are the postal codes used.Have Find---- ---- A A A B A C B A B B B C C A C B C C The table has 9 rows (3x3). Since there is actually no need for retreiving A -> A or B -> B or C -> C, we could use PCHave.sPostCode <> PCFind.sPostCode and the table would look like thisHave Find---- ---- A A <- Deleted since Have = Find A B A C B A B B <- Deleted since Have = Find B C C A C B C C <- Deleted since Have = Find And finished as thisHave Find---- ---- A B A C B A B C C A C B Now we only have (3x2) rows, a cut-off with 33%!Calculating the round trip, A -> B and the same distance B -> A, is also really not needed, so how can we get rid of those? By simple putting PCHave.sPostCode < PCFind.sPostCode to a WHERE clause! See the table what happens when doing soHave Find---- ---- A A <- Deleted since Have = Find A B A C B A <- Deleted since Have > Find B B <- Deleted since Have = Find B C C A <- Deleted since Have > Find C B <- Deleted since Have > Find C C <- Deleted since Have = Find And after PCHave.sPostCode < PCFind.sPostCode there is only 3 rows left of the original 9, a cut off with 67%!Have Find---- ---- A B A C B C So even if you put your WHERE clause (limitation of distance) to the view, this method will at least further bring the number of rows returned down to half for the subset, by reducing duplicate distances/round trips such as A -> B and B -> A!As you can see from this tablePostal codes Gain %------------ ------ 10 55,00% 25 52,00% 50 51,00% 100 50,50% 200 50,25% 300 50,17% 400 50,13% 500 50,10% 1000 50,05% 10000 50,01% 100000 50,00% 1000000 50,00% The gain approximates at 50% when reaching infinity.Peter LarssonHelsingborg, Sweden |
 |
|
|
RickD
Slow But Sure Yak Herding Master
3608 Posts |
Posted - 2006-06-19 : 13:00:48
|
| But it still doesn't explain why when this is run with your where clause, there are postcodes missing out of the result set.. As Have is always set to a postcode, I never get a round trip anyway..The only thing I can think of is due to the differences between mainland european postcodes, which are usually a letter and then numbers and the Uk postcode..Nevermind anyway, as stated earlier, the problem has been solved by joining to our companies table.. |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-06-19 : 13:19:38
|
| Strange. Do you have an example of postcodes missing when putting < to the query?Peter LarssonHelsingborg, Sweden |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2006-06-20 : 02:50:22
|
| I think you're arguing from different premises: one of you seems to be getting all the pairs of rows that are within a distance of each other, the other wants the nearest n rows from a given point. Clearly eliminating pairs that are the reverse of another pair is going to be the wrong thing to do for the second problem!The problem with getting the nearest n points is that you have to know roughly how far to look for any given point.For points-near-to-a-single-point problem, you'll probably get good enough performance from Kristen's suggestion, assuming your average point density doesn't swamp your maximum search distance.If you're trying to find nearby pairs, you probably need to grid the points on the maximum search distance, then restrict your join to points in the same or neighbouring grid square.If you want a really heavyweight solution, you could look at the stuff Jim Gray et al have done: there's a whole load of stuff in the SQL Server 2005 samples under 'spatial'. |
 |
|
|
RickD
Slow But Sure Yak Herding Master
3608 Posts |
Posted - 2006-06-20 : 05:20:28
|
| Thanks Arnold, I think that clears up the confusion.. |
 |
|
|
|
|
|
|
|