| Author |
Topic |
|
Teroman
Posting Yak Master
115 Posts |
Posted - 2002-12-19 : 07:01:19
|
| Hi,
I'm running a simple update statement in Query Analyzer, and Ĉ just gets converted to a C. This stops queries from running properly against this field.
Any ideas how to get this to go in properly?
Thanks in advance
Edited by - teroman on 12/19/2002 07:08:50 |
|
|
cez
Starting Member
37 Posts |
Posted - 2002-12-19 : 07:06:48
|
quote: Hi, I'm running a simple update statement in Query Analyzer, and Ĉ just gets converted to a C. This stops queries from running properly against this field. Any ideas how to get this to go in properly? Thanks in advance
Use NCHAR for Unicode. Hope it helps. |
 |
|
|
Teroman
Posting Yak Master
115 Posts |
Posted - 2002-12-19 : 07:13:08
|
| That C isn't a double-byte character, so that doesn't make a difference (but I have tried it to check). The following code just produces an upper-case C for me:

    set nocount on
    create table #t(x char(1))
    insert into #t values('Ĉ')
    select * from #t
    drop table #t
    go
    create table #t(x nchar(1))
    insert into #t values('Ĉ')
    select * from #t
    drop table #t

This is annoying, especially considering these forums run on SQL Server, and they are happily storing the Ĉ characters. I'm on SQL 7, if that makes a difference. |
 |
|
|
verronep
Starting Member
15 Posts |
Posted - 2002-12-19 : 07:27:56
|
| You need to declare the column(s) that hold these characters as nchar (or nvarchar), since the data is Unicode. However, that is not enough on its own. When using a Unicode column data type, you also need to put an N in front of the string literal to tell SQL Server to treat the value as Unicode. So what you have should really be something like this:

    create table #t(x nchar(1))
    insert into #t values(N'Ĉ')
    select * from #t
    drop table #t

HTH |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-12-19 : 08:03:15
|
I don't think the character Latin Capital Letter C With Circumflex (U+0108) is present in any SQL Server supported 8-bit character set. As far as I can see, it's in ISO-8859-3, but that doesn't have a corresponding supported Windows code page or SQL Server collation. It's not a common character; I can only find reference to its use in Esperanto!

If you take the example of the more common Latin Capital Letter C With Caron (U+010C), then this is in ISO-8859-2, which maps onto Windows CP1250.

This demonstrates how collations make a difference in SQL Server 2000. As you can see, the use of a COLLATE specifier on a literal (non-Unicode) string does not change the allowable characters: Č still gets down-converted to C. The CHAR function also appears to be working in CP1252*.

* or is it the character set of the server/database collation?

    CREATE TABLE #t (i int IDENTITY(1,1), x char(1) COLLATE Latin1_General_CI_AS NOT NULL)
    INSERT INTO #t VALUES('Č')
    INSERT INTO #t VALUES('Č' COLLATE Czech_CI_AS)
    INSERT INTO #t VALUES(CHAR(0xC8)) -- In CP1250, Latin Capital Letter C With Caron is at 0xC8
    INSERT INTO #t VALUES(N'Č')
    INSERT INTO #t VALUES(NCHAR(0x10C))
    SELECT * FROM #t ORDER BY i
    DROP TABLE #t
    GO
    CREATE TABLE #t (i int IDENTITY(1,1), x char(1) COLLATE Czech_CI_AS NOT NULL)
    INSERT INTO #t VALUES('Č')
    INSERT INTO #t VALUES('Č' COLLATE Czech_CI_AS)
    INSERT INTO #t VALUES(CHAR(0xC8)) -- In CP1250, Latin Capital Letter C With Caron is at 0xC8
    INSERT INTO #t VALUES(N'Č')
    INSERT INTO #t VALUES(NCHAR(0x10C))
    SELECT * FROM #t ORDER BY i
    DROP TABLE #t

Result:

    i           x
    ----------- ----
    1           C
    2           C
    3           È
    4           C
    5           C

    i           x
    ----------- ----
    1           C
    2           C
    3           E
    4           Č
    5           Č

Edited by - Arnold Fribble on 12/19/2002 08:05:15 |
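The code-page claims in this post can be checked without a SQL Server instance. What follows is a minimal sketch in Python, not part of the original thread, and it assumes Python's built-in `cp1252`, `cp1250` and `iso8859_3` codecs match the Windows and ISO character sets discussed here. The helper name `fits` is made up for illustration.

```python
# Hypothetical helper (not from the thread): can `ch` be stored as a single
# byte in the given 8-bit character set?
def fits(ch: str, codec: str) -> bool:
    try:
        return len(ch.encode(codec)) == 1
    except UnicodeEncodeError:
        return False

C_CIRCUMFLEX = "\u0108"  # Latin Capital Letter C With Circumflex
C_CARON = "\u010c"       # Latin Capital Letter C With Caron

# Report, per character set, whether each letter has a single-byte encoding.
for name in ("cp1252", "cp1250", "iso8859_3"):
    print(name, fits(C_CIRCUMFLEX, name), fits(C_CARON, name))

# The same byte value, 0xC8, maps to different letters in the two
# Windows code pages, which is why CHAR(0xC8) depends on the code page in use.
byte = bytes([0xC8])
in_cp1250 = byte.decode("cp1250")
in_cp1252 = byte.decode("cp1252")
print(in_cp1250, in_cp1252)
```

If the codecs behave as assumed, C-circumflex fits only in ISO-8859-3, while C-caron fits in CP1250 but not CP1252, which lines up with the down-conversion to a plain C seen in the Latin1 column.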
 |
|
|
Teroman
Posting Yak Master
115 Posts |
Posted - 2002-12-19 : 08:46:03
|
| Thanks for the advice, Arnold. I knew the darned character must be a single byte because it came out of a byte array in .NET. This came up because I am encrypting a password, and the character appeared in the resulting hash. Oh well, I guess I'll just have to think up something to work around it and quietly curse MS under my breath ;)
thanks again
col |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-12-19 : 09:06:49
|
You're not inadvertently interpreting some byte sequence as UTF-8 encoded Unicode, are you? If you had the byte sequence, uh, 0xC4 0x88, which would look like "Äˆ" if interpreted as ISO-8859-1 (aka Latin 1, Windows CP1252*), and tried to, say, shove that into a UTF-8 encoded HTML document, then it would look like C-circumflex.
* except that CP1252 has some extra characters with code points between 128 and 159. 0x88 is between 0x80 and 0x9F, so it doesn't have a graphic character in ISO Latin 1.
Edited by - Arnold Fribble on 12/19/2002 09:32:00 |
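The effect described here is easy to reproduce. This is a small Python sketch, not from the thread, that decodes the same two bytes, 0xC4 0x88, under the three encodings mentioned; the variable names are illustrative only.

```python
raw = bytes([0xC4, 0x88])

as_utf8 = raw.decode("utf-8")      # one character: U+0108 (C-circumflex)
as_cp1252 = raw.decode("cp1252")   # two characters: U+00C4, U+02C6
as_latin1 = raw.decode("latin-1")  # two characters: U+00C4, U+0088 (a control code)

# Print the code points each interpretation produces.
for label, text in (("utf-8", as_utf8), ("cp1252", as_cp1252), ("latin-1", as_latin1)):
    print(label, [f"U+{ord(c):04X}" for c in text])
```

So a "single" Ĉ on screen can really be two unrelated bytes underneath, depending on which encoding the viewer applies.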
 |
|
|
Teroman
Posting Yak Master
115 Posts |
Posted - 2002-12-19 : 09:11:48
|
| Damn, you're right! I think I should be using ASCII; maybe System.Text.Encoding.ASCII is the class for me!
cheers
<edit> System.Text.ASCIIEncoding, I mean </edit>
Edited by - teroman on 12/19/2002 09:15:55 |
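One caveat on the ASCII idea: a hash is arbitrary bytes, and ASCII only covers values below 0x80, so treating hash output as ASCII text cannot represent every byte. A common alternative is to store the hash as hex or Base64 text, which is plain ASCII and fits an ordinary char/varchar column. The sketch below is in Python rather than .NET and is not taken from the thread; the password value and the choice of SHA-256 are placeholders for illustration only.

```python
import base64
import hashlib

password = "example-password"  # placeholder value, not from the thread
digest = hashlib.sha256(password.encode("utf-8")).digest()  # 32 raw bytes

hex_text = digest.hex()                              # 64 characters, 0-9 and a-f
b64_text = base64.b64encode(digest).decode("ascii")  # 44 characters

# Both text forms convert back to exactly the original bytes.
round_trip_hex = bytes.fromhex(hex_text)
round_trip_b64 = base64.b64decode(b64_text)

print(hex_text)
print(b64_text)
```

Either text form avoids the code-page question entirely, because every character it uses exists in all the character sets discussed above.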
 |
|
|
|