Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
deadc0de
Starting Member
3 Posts |
Posted - 2003-07-04 : 14:01:39
|
Hey Folks, I've got a problem formulating a select query that I think boils down to an issue with the usage of GROUP BY. Here's the situation, I've got two tables (english_table and altlang_table) composed of the same columns, containing slightly different data:--------------------------drop table english_table,altlang_table;CREATE TABLE [english_table] ( [ID] [int] NOT NULL , [symX] [float] NULL , [symY] [float] NULL , [Name] [nvarchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [Level11x] [float] NULL , [Level11y] [float] NULL , PRIMARY KEY CLUSTERED ( [ID] ) ON [PRIMARY] ) ON [PRIMARY];INSERT INTO [english_table]([ID],[symX], [symY], [Name], [Level11x], [Level11y])VALUES(1,-158.017776,21.458059,'Wahiawa',-158.02388,21.45734);INSERT INTO [english_table]([ID],[symX], [symY], [Name], [Level11x], [Level11y])VALUES(2,-157.858337,21.30694,'Honolulu',0.0,0.0);INSERT INTO [english_table]([ID],[symX], [symY], [Name], [Level11x], [Level11y])VALUES(3,-157.858337,21.30694,'HONOLULU',-157.863754,21.30529);INSERT INTO [english_table]([ID],[symX], [symY], [Name], [Level11x], [Level11y])VALUES(4,-123.121223,49.247131,'Vancouver',-123.12867,49.246784);CREATE TABLE [altlang_table] ( [ID] [int] NOT NULL , [symX] [float] NULL , [symY] [float] NULL , [Name] [nvarchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [Level11x] [float] NULL , [Level11y] [float] NULL , PRIMARY KEY CLUSTERED ( [ID] ) ON [PRIMARY] ) ON [PRIMARY];INSERT INTO [altlang_table]([ID],[symX], [symY], [Name], [Level11x], [Level11y])VALUES(1,-157.858337,21.30694,'Honolulu',0.0,0.0);INSERT INTO [altlang_table]([ID],[symX], [symY], [Name], [Level11x], [Level11y])VALUES(2,-157.858337,21.30694,'HONOLULU',-157.8629,21.305498);INSERT INTO [altlang_table]([ID],[symX], [symY], [Name], [Level11x], [Level11y])VALUES(3,-149.900284,61.21806,'Anchorage',-149.908112,61.217361);INSERT INTO [altlang_table]([ID],[symX], [symY], [Name], [Level11x], [Level11y])VALUES(4,-123.121223,49.247131,'Vancouver',-123.129356,49.24646); --------------------------One of these tables contains data in english, one contains data in an alternate language (cannot depend on correspondence on name values). My goal is to get a list of corresponding records that exist in both the english and altlang tables. How do I tell if a correspondence exists? If the distance (squared) calculated between the english sym(X,Y) and altlang sym(X,Y) is within a certain threshold. Alright, so far, things are good, now, the next issue is that the dist squared calculation does not guarantee a 1-1 correspondence, so if one element in the english_table corresponds to more than one element in the altlang_table using the DistSquared(sym), I want to reduce it to mapping to only the single element in the altlang_table which has the minimum DistSquared(Level11) between the english and altlang records.For the sample data listed above, I want the following output:EnglishID, CorrespondingAltLangID2,13,24,4My initial implementation was:--------------------------SELECT english_table.ID as EnglishID,altlang_table.ID as AltLangID,MIN(POWER(english_table.Level11x - altlang_table.Level11x,2) + POWER(english_table.Level11y - altlang_table.Level11y,2)) as LevelDeltaFROM english_table, altlang_tableWHERE (POWER(english_table.symX - altlang_table.symX,2) + POWER(english_table.symY - altlang_table.symY,2)) < 0.001GROUP BY english_table.ID; --------------------------to which MS SQL promptly rebuffed:quote: Column 'altlang_table.ID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Alright, not sure why it insists on having altlang_table.ID in either the aggregate function or group by clause... so I'll try sticking it in the group by clause (although it doesn't seem correct). So now I try:--------------------------SELECT english_table.ID as EnglishID,altlang_table.ID as AltLangID,MIN(POWER(english_table.Level11x - altlang_table.Level11x,2) + POWER(english_table.Level11y - altlang_table.Level11y,2)) as LevelDeltaFROM english_table, altlang_tableWHERE (POWER(english_table.symX - altlang_table.symX,2) + POWER(english_table.symY - altlang_table.symY,2)) < 0.001GROUP BY english_table.ID,altlang_table.ID; --------------------------but that doesn't give me the output I want. If I try this:--------------------------SELECT english_table.ID as EnglishID,MIN(POWER(english_table.Level11x - altlang_table.Level11x,2) + POWER(english_table.Level11y - altlang_table.Level11y,2)) as LevelDeltaFROM english_table, altlang_tableWHERE (POWER(english_table.symX - altlang_table.symX,2) + POWER(english_table.symY - altlang_table.symY,2)) < 0.001GROUP BY english_table.ID; --------------------------the result set is close to what I expect (3 results, correct set of EnglishIDs, and correct seeming LevelDelta values), but no AltLangID in the output... How can I make the correlation?So, what am I doing wrong?Sorry for the essay length post, and thanks in advance for any assistance.Edited by - deadc0de on 07/04/2003 14:11:21 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-07-04 : 15:15:58
|
Try to enjoy this:SELECT english_table.ID as EnglishID, altLang_table.ID as altLangID,cast(english_table.name as char(12)) as EngName,cast(altlang_table.name as char(12)) as altLangNameFROM english_table, altlang_tableWHERE(POWER(english_table.symX - altlang_table.symX, 2) + POWER(english_table.symY - altlang_table.symY, 2)) < 0.001and(POWER(english_table.Level11x - altlang_table.Level11x, 2) + POWER(english_table.Level11y - altlang_table.Level11y,2)) < 0.00001EnglishID altLangID EngName altLangName ----------- ----------- ------------ ------------ 2 1 Honolulu Honolulu 3 2 HONOLULU HONOLULU 4 4 Vancouver Vancouver - Vit |
 |
|
|
deadc0de
Starting Member
3 Posts |
Posted - 2003-07-04 : 15:38:10
|
| Hey Stoad, Thanks for the reply, but I'm not convinced the query you provide will necessarily solve my problem for all scenarios. What if more than one set of records from the two tables have both really close sym and really close Label11 values (closer than the 0.00001 threshold you set)? I would want to return the pairing/correlation which is closest, where-as your method would return them all. Your approach may work 90% of the time for me, but surely there's another way.Thanks. |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-07-04 : 16:01:42
|
| Let me think it over... but I believe that you mustdefine the second threshold.- Vit |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-07-04 : 16:49:05
|
| Look at this (and remember, because of float data typeof your fields you can't apply '=' sign):SELECT e.ID as EnglishID, a.ID as altLangIDFROM english_table e, altlang_table aWHEREPOWER(e.symX - a.symX, 2) + POWER(e.symY - a.symY, 2) < 0.001andabs(POWER(e.Level11x - a.Level11x,2)+POWER(e.Level11y - a.Level11y,2)-(select min(POWER(ee.Level11x - aa.Level11x, 2)+POWER(ee.Level11y - aa.Level11y, 2)) FROMenglish_table ee, altlang_table aawhere ee.symX = e.symX and ee.symY = e.symY)) < 0.000001- Vit |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-07-04 : 18:29:22
|
| Catch my ultimate version:select EID, AID, DIST, LEV from(SELECT e.ID EID, a.ID AID,POWER(e.symX - a.symX,2)+POWER(e.symY - a.symY,2) as DIST,POWER(e.Level11x - a.Level11x,2)+POWER(e.Level11y - a.Level11y,2) as LEVFROM english_table as e, altlang_table as awhere POWER(e.symX - a.symX,2) + POWER(e.symY - a.symY,2) < 0.001) as b,(SELECT ee.ID,min(POWER(ee.Level11x - aa.Level11x,2)+POWER(ee.Level11y - aa.Level11y,2)) as MINLEVFROM english_table as ee, altlang_table as aa where POWER(ee.symX - aa.symX, 2) + POWER(ee.symY - aa.symY, 2) < 0.001group by ee.ID) as cwhere b.EID=c.ID and round(LEV,12)=round(MINLEV,12)Without rounding comparison of two floats may be inaccurate.- VitEdited by - Stoad on 07/04/2003 18:34:18 |
 |
|
|
deadc0de
Starting Member
3 Posts |
Posted - 2003-07-04 : 19:08:31
|
| Hey Stoad, That last one looks like it should work. I'm experimenting with it on my full dataset right now... Thanks for the help so far. |
 |
|
|
|
|
|
|
|