Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 Data Corruption Issues
 Unexpected special characters in column values

Author  Topic 

shinjith
Starting Member

1 Post

Posted - 2009-05-09 : 07:53:45
Hello Guys,

I have a problem with entering values into database.

The lefthand side values I entered into db ... but in db its appearing like those corresponding values in right hand side

Intel Core™ -> Intel CoreGäó
FP202W 20” -> FP202W 20GÇ¥
kijken van familiefoto’s -> kijken van familiefotoGÇÖs
benötigen -> ben+¦ti
Ladegerät -> Ladeger+ñt,

This is an urgent problem i need to solve... so any valid information is helpful....

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2009-05-09 : 10:35:55
what's the datatype of field? are you using unicode type?
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2009-05-09 : 11:50:20
What I'm seeing on the right-hand side of your posting is consistent with taking utf-8 encoded text, treating the individual octets as characters encoded in DOS code page 437, and then downconverting those characters to ones in Windows code page 1252. The downconversion part is consistent with what SQL Server (2005, anyway) would do with those characters.

What I can't tell you is where the misinterpretation and downconversion is taking place. You'd need to provide more information about how the data is getting into that column and what the column's data type is.
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2009-05-09 : 13:53:56
To demonstrate what I mean, here's some SQL. The database engine in SQL Server can't produce utf-8 itself, so I've written a little function to take an nvarchar(max) string and return a varbinary(max) containing the octets (bytes) of the utf-8 encoding. It's not supposed to be production quality, but it's good enough for your strings.

So we take your original strings, encode them as utf-8, misinterpret the utf-8 bytes as code page 437 and finally downconvert into code page 1252. So the first column in the output is what you had on the left hand side, and the last column is what you had on the right hand side.

The round_tripped_through_utf8 column in the output was just to give me some assurance that the conversion function worked well enough!

CREATE FUNCTION dbo.nvarchar_to_utf8_varbinary(@s nvarchar(max))
RETURNS varbinary(max)
BEGIN
DECLARE @i int, @l int, @c int, @res varbinary(max)
SET @res = 0x
SET @i = 0
SET @l = DATALENGTH(@s) / 2
WHILE @i < @l
BEGIN
SET @i = @i + 1
SET @c = UNICODE(SUBSTRING(@s, @i, 1))
-- just explode if we find anything that looks like a UTF-16 surrogate:
IF @c BETWEEN 0xD800 AND 0xDFFF
SET @res = @res + (SELECT 1/0)
SET @res = @res +
CASE
WHEN @c < 0x0080 THEN CAST(@c AS varbinary(1))
WHEN @c < 0x0800 THEN CAST(0xC080 | ((@c & 0xFC0) * 0x0004) | (@c & 0x003F) AS varbinary(2))
ELSE CAST(0xE08080 | ((@c & 0xF000) * 0x0010) | ((@c & 0x0FC0) * 0x0004) | (@c & 0x003F) AS varbinary(3))
END
END
RETURN @res
END
GO

SELECT *,
ISNULL((SELECT CAST(NULL AS varchar(max)) COLLATE Latin1_General_BIN),
utf8_octets_interpreted_as_cp437) AS cp437_downcast_to_cp1252
FROM (
SELECT *,
CAST(CAST(utf8_encoded_in_varbinary AS xml) AS nvarchar(max)) AS round_tripped_through_utf8,
ISNULL((SELECT CAST(NULL AS varchar(max)) COLLATE SQL_Latin1_General_Cp437_BIN),
utf8_encoded_in_varbinary) AS utf8_octets_interpreted_as_cp437
FROM (
SELECT original_string, dbo.nvarchar_to_utf8_varbinary(original_string) AS utf8_encoded_in_varbinary
FROM (
SELECT N'Intel Core™' AS original_string
UNION ALL SELECT N'FP202W 20”'
UNION ALL SELECT N'kijken van familiefoto’s'
UNION ALL SELECT N'benötigen'
UNION ALL SELECT N'Ladegerät'
) AS A
) AS A
) AS A
GO
Go to Top of Page
   

- Advertisement -