Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
donpolix
Yak Posting Veteran
97 Posts |
Posted - 2005-06-09 : 07:28:27
|
| If I have a field with type nvarchar(4000), I want to query this field so that if the data contains any double byte characters, it will return 'NON ENGLISH CHARS'. for example: fieldname = MessageFieldMessageField--------------asdfq werrtu im äµ�¬äŽz2xcv asdf12 ‹X . helloI want to return it this way:MessageField--------------asdfq wer'NON ENGLISH CHARS'z2xcv asdf'NON ENGLISH CHARS'Appreciate any help on this. Thanks!Donn Policarpio |
|
|
donpolix
Yak Posting Veteran
97 Posts |
Posted - 2005-06-09 : 07:40:52
|
| any help.Donn Policarpio |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2005-06-09 : 08:22:30
|
| Assuming that you don't stray outside the Unicode baseplane, all characters in an nvarchar column are stored as two bytes. (It's probably not a good idea to expect SQL Server 2000 to work well with characters outside the baseplane, since nvarchar was based on Unicode 2.0 and that didn't have any.)So presumably what you meant was characters that would require more than 1 byte if stored as UTF-8. In that case, it's just a question of finding characters c where UNICODE(c) >= 128.Can't think of any quick ways of doing that. Two slow ways are:1. write a UTF that returns whether an nvarchar value contains any characters with a Unicode codepoint >= 128.2. use a tally table that replaces the WHILE loop in the UTF and do the same check in the query itself.Neither is very appealing! |
 |
|
|
donpolix
Yak Posting Veteran
97 Posts |
Posted - 2005-06-09 : 20:55:26
|
quote: Originally posted by Arnold Fribble Assuming that you don't stray outside the Unicode baseplane, all characters in an nvarchar column are stored as two bytes. (It's probably not a good idea to expect SQL Server 2000 to work well with characters outside the baseplane, since nvarchar was based on Unicode 2.0 and that didn't have any.)So presumably what you meant was characters that would require more than 1 byte if stored as UTF-8. In that case, it's just a question of finding characters c where UNICODE(c) >= 128.Can't think of any quick ways of doing that. Two slow ways are:1. write a UTF that returns whether an nvarchar value contains any characters with a Unicode codepoint >= 128.2. use a tally table that replaces the WHILE loop in the UTF and do the same check in the query itself.Neither is very appealing!
Thanks Arnold!>128 it is. I actually wrote a UDF to scan each characters, however, this is painful since I am about to go through a bunch of text (as in a body of an email)..But since, if anybody else might also think of going with this crunching approach (like what you suggested), I will be fine with it. I guess I have no other choice =(.Donn Policarpio |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2005-06-10 : 03:37:26
|
I'm being an idiot! You only want to know whether they contain characters where UNICODE(c) >= 128, you don't need to know what the characters are. Consequently, this will work:CASE WHEN MessageField COLLATE Latin1_General_BIN LIKE N'%[' + NCHAR(128) + '-' + NCHAR(65535) + N']%'THEN N'''NON ENGLISH CHARS'''ELSE MessageFieldEND |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-06-10 : 03:49:37
|
>> I'm being an idiot!Gee, that must be by Your standard.What does that make the rest of us, imbecils ? rockmoose |
 |
|
|
|
|
|
|
|