Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
esowell
Starting Member
4 Posts |
Posted - 2003-11-09 : 08:33:10
|
| Here's my issue, which I've queried about in various places and with various people, though with no success. The only way that I can describe the phenomenon that I'm seeing is to say that it seems that Sql Server is ignoring some of my characters. Here's what I'm seeing. I run a sql statement like thisSET @lex_form_id = (SELECT lex_form_id FROM morph_lex_form WHERE lex_form LIKE @lex_form)where @lex_form_id is an INT and column lex_form and variable @lex_form are NVARCHAR. If I doSET @lex_form_id = (SELECT lex_form_id FROM morph_lex_form WHERE lex_form = @lex_form)I get the same issue. Okay, so that's my sql. Here's the problem. When I try to match small words that have lots of diacritical markings it doesn't work. For example, "apo" and "epi" give me problems. Say that the variable @lex_form is the word "apo". If I run the statement and "epi" is in the table morph_lex_form, they match. And here's why I think so. Both of the vowels in both words have diacriticals over them. If it ignores those characters, then all you're matching is "p" and "p", which obviously match.In connection with that, when I do an order by on the table it is also clear that the vowels that have diacritical marks are being ignored.I do have the collation set on that column to Greek, as you can see here: [lex_form] [nvarchar] (50) COLLATE Greek_CS_AS NOT NULLAnd I have the Greek settings for xp professional turned on as well.Any idea how to solve this problem? Thanks.esowell |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2003-11-09 : 10:46:36
|
| try WHERE lex_form = @lex_form collate Greek_CS_ASwhat are the ascii codes for the o and i that you are having problems with.==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
esowell
Starting Member
4 Posts |
Posted - 2003-11-09 : 23:43:10
|
| Thanks for the hint, but unfortunately it didn't work. The characters for "a" and "o" are transliterations of the Greek characters alpha and omicron. In the real situation the alpha and omicrons would both have diacriticals. I am also using the Greek extended set of characters for some of the characters, and the regular Greek set for the others, the code charts for which can be seen here.Regular - http://www.unicode.org/charts/PDF/U0370.pdfExtended - http://www.unicode.org/charts/PDF/U1F00.pdfThe two words are make up of the following characters, which can all be found at the address above:"apo"1F00, 03c0, 1f79"epi"1f10, 03c0, 1f77Does that help you diagnose? |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2003-11-10 : 05:26:02
|
Greek_CS_AS doesn't collate Unicode characters in the Greek Extended range properly.CREATE VIEW Greek_chars ASSELECT n, NCHAR(n) COLLATE Greek_CS_AS AS cFROM NumbersWHERE (n BETWEEN 0x0374 AND 0x03FB OR n BETWEEN 0x1F00 AND 0x1FFE) AND n NOT IN (0x0376, 0x0377, 0x0378, 0x0379, 0x037B, 0x037C, 0x037D, 0x037F, 0x0380, 0x0381, 0x0382, 0x0383, 0x038B, 0x038D, 0x03A2, 0x03CF, 0x1F16, 0x1F17, 0x1F1E, 0x1F1F, 0x1F46, 0x1F47, 0x1F4E, 0x1F4F, 0x1F58, 0x1F5A, 0x1F5C, 0x1F5E, 0x1F58, 0x1F7E, 0x1F7F, 0x1FB5, 0x1FC5, 0x1FD4, 0x1FD5, 0x1FDC, 0x1FF0, 0x1FF1, 0x1FF5)GOSELECT CAST(UNICODE(A.c) AS binary(2)), A.c, CAST(UNICODE(B.c) AS binary(2)), ctFROM Greek_chars AS AINNER JOIN ( SELECT c, COUNT(*) AS ct FROM Greek_chars GROUP BY c ) AS BON A.c = B.cORDER BY UNICODE(A.c), UNICODE(B.c) Edit: Hmm, Given that Unicode comparisons are done with a Windows API call, I wonder whether this is operating-system dependent. Anyone running Windows Server 2003 want to give this a try? |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-11-10 : 05:26:33
|
| How does this work:WHERE lex_form = NCHAR(7936) + NCHAR(960) + NCHAR(8057) -- this is "apo"??? |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-11-10 : 05:29:51
|
| 31 seconds... Not so bad. |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-11-10 : 07:06:26
|
| Is it silly workaround:WHERE cast(lex_form as varbinary(800)) = cast(@lex_form as varbinary(800))??? |
 |
|
|
esowell
Starting Member
4 Posts |
Posted - 2003-11-11 : 14:58:58
|
| Thanks for your responses guys. I'm not sure if your suggestion will work or not, Stoad. I'll try it later.As for your surely time-intensive response Mr. Fribble, I did get that working but I wasn't sure exactly how to apply that to my query to solve my problem. I'm interested in your thoughts on that.But, for the time being, I've got the functionality I need. When I originally set the collation for the field to the Greek collation (GREEK_CS_AS) I did so through a sql statement rather than through enterprise manager. Now, I was poking around in another table that had Greek characters but still had the default Latin collation. I decided to change it to Greek. I didn't remember the sql statement to change the collation, so I decided to open up the table in design view to set the collation. When I opened up the dialog I went first to the sql collation combo box and was surprised to find the GREEK_CS_AS collation missing, which I thought was a little odd. But, in the spirit of adventure, I decided to try using a Windows collation, chose the one for Greek, and set it on Binary sort. Suddenly, my queries worked! Then I checked the old table that I had previously set using a sql query and saw that it too was set to the Windows Greek collation, except that it was set to Dictionary sort rather than binary sort. Then I saw what the problem was. Binary is just zero's and one's, so of course that would work. Comparisons of that kind should work with no problem. But, if the collation is set to dictionary sort, it doesn't sort by the binary value but by the internal dictionary that Sql Server uses to deal with case sensitivity and accents in other languages. So when you say that GREEK_CS_AS doesn't support the Greek extended set you mean that the dictionary that Sql Server uses doesn't support that set. Ahhh!!!!!! Problem solved. Of course, this means for the time being that I can't use the dictionary and thus get case and accent insensitivity when I want it, which is a feature I will need at some point. But, for the moment, what I was trying to do before works great because of the nature of the words I was dealing with. So, a temporary fix, though I'll need to deal with being able to use dictionary sort at some point. So, any more explanation would be greatly appreciated. |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-11-11 : 16:02:58
|
| How comparisons for strict equality can be affected by collations?It's above my head...The whole idea of the Unicode looks rotten in that. |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2003-11-12 : 06:03:23
|
| If = didn't use the same collation as > and < then indexes wouldn't work.What the Unicode Consortium think about collation can be found here:http://www.unicode.org/reports/tr10 |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2003-11-12 : 06:17:31
|
| To respond to esowell, I think using GREEK_BIN collation would be preferable to converting the strings to binary -- it's less noisy.Sorry for not saying so, but my code was really just to illustrate that there is a problem with collation for all of the Greek Extended range of Unicode characters in SQL Server 2000 (at least, running on Windows 2000 and XP).I'm afraid I don't have any good suggestions for how you can get sensible dictionary collation for strings containing Greek Extended characters. |
 |
|
|
esowell
Starting Member
4 Posts |
Posted - 2003-11-12 : 06:30:16
|
| Well, thanks for your help anyway. I suppose I'll just have to find some way of dealing with case and accent sensitivity some other way. Do you know if a dictionary could be created that could then be used with the Greek collation? If this is possible, is it a super huge project that I wouldn't want to get into? |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-11-12 : 08:26:45
|
| Funny reading on that link :) Plus, I think that there are only few of those (e.g.) greeks who know how exactly their words should be sorted...1.3 Contextual SensitivityBeyond the concept of levels, there are additional complications in certain languages, whereby the comparison is context sensitive: it depends on more than just single characters compared directly against one another.First are contractions, where two (or more) characters sort as if they were a single base character. In the table below, CH acts like a character after C. Second are expansions, where a single character sorts as if it were two (or more) characters in sorting. In the table below, an OE ligature sorts as if it were O + E. Both of these can be combined: that is, two (or more) characters may sort as if they were a different sequence of two (or more) characters. In the example below, for Japanese, a length mark sorts like the vowel of the previous syllable: as an A after KA and as an I after KI.H < Z, butCH > CZThere are some further oddities in the ways that languages work. Normally, all differences in sorting are assessed going from the start to the end of the string. If all of the base characters are the same, the first accent difference determines the final order. In row 1 of the example below, the first accent difference is on the o, so that is what determines the order. In French and a few other languages, however, it is the last accent difference that determines the order, as in row 2. |
 |
|
|
|
|
|
|
|