Store the Ĉ character in a char field

Teroman
Posting Yak Master

115 Posts

Posted - 2002-12-19 : 07:01:19

Hi,

I'm running a simple update statement in Query Analyser, and Ĉ just gets converted to a C.

This stops queries running properly against this field

Any ideas how to get this to go in properly?

Thanks in advance

Edited by - teroman on 12/19/2002 07:08:50

cez
Starting Member

37 Posts

Posted - 2002-12-19 : 07:06:48

quote:

Hi,

I'm running a simple update statement in Query Analyser, and Ĉ just gets converted to a C.

This stops queries running properly against this field

Any ideas how to get this to go in properly?

Thanks in advance

Use NCHAR for UNICODE.
Hope it helps

Teroman
Posting Yak Master

115 Posts

Posted - 2002-12-19 : 07:13:08

That C isn't a double-byte character, so that doesn't make a difference (but I have tried it to check).

The following code just produces an upper-case C for me:

set nocount on

create table #t(x char(1))
insert into #t values('Ĉ')
select * from #t
drop table #t

go

create table #t(x nchar(1))
insert into #t values('Ĉ')
select * from #t
drop table #t

This is annoying, especially considering these forums run on SQL Server and are happily storing the Ĉ characters.

I'm on SQL 7, if that makes a difference.

verronep
Starting Member

15 Posts

Posted - 2002-12-19 : 07:27:56

You need to declare the column(s) that hold these characters as nchar (or nvarchar), because the data is Unicode.

However, that is not enough on its own. When using a Unicode column data type, you also need to put an N in front of the string literal to tell SQL Server to treat the value as Unicode.

So what you have should really be something like this:

create table #t(x nchar(1))
insert into #t values(N'Ĉ')
select * from #t
drop table #t

HTH

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-12-19 : 08:03:15

I don't think the character Latin Capital Letter C With Circumflex (U+0108) is present in any 8-bit character set that SQL Server supports. As far as I can see, it's in ISO-8859-3, but that doesn't have a corresponding supported Windows codepage or SQL Server collation.
It's not a common character; I can only find references to its use in Esperanto!

If you take the example of the more common Latin Capital Letter C With Caron (U+010C), then this is in ISO-8859-2 which maps onto Windows CP 1250.
This demonstrates how collations make a difference in SQL Server 2000. As you can see, the use of a COLLATE specifier on a literal (non-unicode) string does not change the allowable characters: Č still gets down-converted to C. The CHAR function also appears to be working in CP1252*.

* or is it the character set of the server/database collation?


CREATE TABLE #t (i int IDENTITY(1,1), x char(1) COLLATE Latin1_General_CI_AS NOT NULL)

INSERT INTO #t VALUES('Č')
INSERT INTO #t VALUES('Č' COLLATE Czech_CI_AS)
INSERT INTO #t VALUES(CHAR(0xC8)) -- In CP1250, Latin Capital Letter C With Caron is at 0xC8
INSERT INTO #t VALUES(N'Č')
INSERT INTO #t VALUES(NCHAR(0x10C))

SELECT * FROM #t
ORDER BY i

DROP TABLE #t

GO

CREATE TABLE #t (i int IDENTITY(1,1), x char(1) COLLATE Czech_CI_AS NOT NULL)

INSERT INTO #t VALUES('Č')
INSERT INTO #t VALUES('Č' COLLATE Czech_CI_AS)
INSERT INTO #t VALUES(CHAR(0xC8)) -- In CP1250, Latin Capital Letter C With Caron is at 0xC8
INSERT INTO #t VALUES(N'Č')
INSERT INTO #t VALUES(NCHAR(0x10C))

SELECT * FROM #t
ORDER BY i

DROP TABLE #t

Result:


i           x    
----------- ---- 
1           C
2           C
3           È
4           C
5           C

i           x    
----------- ---- 
1           C
2           C
3           E
4           Č
5           Č
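
The code page mappings behind these results can also be checked outside SQL Server. What follows is only a minimal Python sketch, not something from the thread, and it assumes Python's built-in `iso8859_3`, `cp1250` and `cp1252` codecs match the code pages discussed here. The helper name `can_encode` is made up for illustration.

```python
# Minimal sketch: which 8-bit code pages can represent each character?
# Assumes Python's built-in codecs match the code pages discussed above.

def can_encode(ch, codec):
    """Return True if the character has a byte in the given 8-bit code page."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

C_CIRCUMFLEX = "\u0108"  # Latin Capital Letter C With Circumflex
C_CARON = "\u010c"       # Latin Capital Letter C With Caron

# C-circumflex exists in ISO-8859-3 but in neither Windows code page.
circumflex_support = {
    cp: can_encode(C_CIRCUMFLEX, cp)
    for cp in ("iso8859_3", "cp1250", "cp1252")
}

# C-caron sits at 0xC8 in CP1250; the same byte read as CP1252 is E-grave.
caron_byte = C_CARON.encode("cp1250")
same_byte_in_cp1252 = caron_byte.decode("cp1252")

print(circumflex_support)
print(caron_byte, same_byte_in_cp1252)
```

This is consistent with row 3 of the first result set: byte 0xC8 read as CP1252 is È rather than Č.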

Edited by - Arnold Fribble on 12/19/2002 08:05:15

Teroman
Posting Yak Master

115 Posts

Posted - 2002-12-19 : 08:46:03

Thanks for the advice Arnold.

I knew the darned character must be a single byte because it came out of a byte array in .NET

This came up because I am encrypting a password, and the character appeared in the resulting hash.

Oh well, I guess I'll just have to think up something to work around it and quietly curse MS under my breath ;)

thanks again

col

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-12-19 : 09:06:49

You're not inadvertently interpreting some byte sequence as UTF-8 encoded Unicode, are you? If you had the sequence, uh, 0xC4 0x88, which would look like "Äˆ" if interpreted as Windows CP1252*, and tried to, say, shove that into a UTF-8 encoded HTML document, then it would look like C-circumflex.

* Not ISO-8859-1 (aka Latin 1): CP1252 has some extra characters with codepoints between 128 and 159.
0x88 is between 0x80 and 0x9F, so it doesn't have a graphic character in ISO Latin 1.
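
To make the byte arithmetic concrete, here is a small Python sketch (an illustration only, assuming Python's standard `utf-8`, `cp1252` and `latin-1` codecs) that decodes the same two bytes three ways.

```python
# Minimal sketch: one byte pair, three interpretations.
raw = bytes([0xC4, 0x88])

as_utf8 = raw.decode("utf-8")      # one character: U+0108, C-circumflex
as_cp1252 = raw.decode("cp1252")   # two characters: U+00C4 and U+02C6
as_latin1 = raw.decode("latin-1")  # U+00C4 plus the C1 control U+0088

print(repr(as_utf8), repr(as_cp1252), repr(as_latin1))
```

The UTF-8 reading yields a single character, while the CP1252 reading yields the two-character sequence quoted above.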

Edited by - Arnold Fribble on 12/19/2002 09:32:00

Teroman
Posting Yak Master

115 Posts

Posted - 2002-12-19 : 09:11:48

Damn, you're right!

I think I should be using ASCII; maybe System.Text.Encoding.ASCII is the class for me!

cheers

<edit> System.Text.ASCIIEncoding i mean </edit>

Edited by - teroman on 12/19/2002 09:15:55
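
A note of caution on the ASCII idea: hash output is arbitrary bytes, and ASCII only covers 0x00 to 0x7F, so decoding a hash as ASCII generally loses information. One common alternative, offered here as an assumption rather than anything the thread settled on, is to store the bytes as hex or Base64 text, which fits in a plain char column. A minimal Python sketch follows; the input string and the choice of SHA-256 are hypothetical, since the thread does not say which hash or cipher was used.

```python
import base64
import hashlib

# Hypothetical input and hash; the thread does not name the algorithm used.
digest = hashlib.sha256(b"example password").digest()

# ASCII decoding is lossy for arbitrary bytes: in Python, with
# errors="replace", each byte above 0x7F becomes U+FFFD.
lossy = digest.decode("ascii", errors="replace")

# Hex and Base64 use only ASCII characters and can be reversed exactly.
as_hex = digest.hex()
as_b64 = base64.b64encode(digest).decode("ascii")

print(len(digest), len(as_hex), len(as_b64))
```

Both text forms round-trip back to the original bytes, so comparisons against the stored value stay exact.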
