UNICODE File Attribute - SQL Server Forums



SamC
White Water Yakist

3467 Posts

Posted - 2004-05-23 : 14:34:44

Does anyone know a way to inspect a file's properties in Windows Explorer to see whether it was saved in UNICODE (not ANSI) format?

It would be a great help if there were a way to change all files in a folder to UNICODE format without opening each file individually and saving it back with the UNICODE attribute. Maybe using copy with the target file attribute set to UNICODE?

Sam

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-05-24 : 03:33:56

I think the only way that programs like Notepad decide on character encoding is by looking for a Unicode Byte Order Mark (BOM) at the start of the file.

If it starts with FE FF then it's UTF-16 BE, "Unicode Big Endian".

If it starts with FF FE then it's UTF-16 LE, "Unicode".

There's also some heuristic used by Notepad for detecting UTF-16 LE files without a BOM, but not for UTF-16 BE.

If it starts with EF BB BF then it's UTF-8 with a BOM, "UTF-8".

There's also some heuristic used by Notepad for detecting UTF-8 files without a BOM.

Otherwise it's assumed to be whatever the current locale's 8-bit character set is, "ANSI".
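If you'd rather check from a script than open each file, the BOM test above can be sketched in a few lines of Python. This is only a rough sketch: the names `sniff_bom` and `sniff_file` are made up for illustration, and it only looks at the BOM, so it won't reproduce Notepad's heuristics for files without one.

```python
import codecs

# BOM signatures listed in the post above, paired with a readable label.
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),  # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return a label for the BOM at the start of data, or None if there is none.

    None means "no BOM": the file may be ANSI, or BOM-less UTF-8/UTF-16.
    Note that a UTF-32 LE BOM (FF FE 00 00) also starts with FF FE; this
    sketch does not try to tell the two apart.
    """
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    return None


def sniff_file(path):
    """Read only the first few bytes of a file and report its BOM, if any."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))
```

For example, `sniff_file("script.sql")` on a file saved from Query Analyzer as "Unicode" should report UTF-16 LE (the file name here is just a placeholder).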

SamC
White Water Yakist

3467 Posts

Posted - 2004-05-24 : 07:56:51

Arnold,

Thanks. I suppose opening each file in Query Analyzer and saving it as "UNICODE" is the answer.

I think Dreamweaver calls it something else (UTF-8).

Sam

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-05-24 : 08:23:07

Well, UTF-8 and UTF-16 are both character encodings for Unicode: they encode the same character set, but use different bit patterns to do so. Specifically, UTF-16 encodes characters in the Basic Multilingual Plane (up to U+FFFF) as 2 bytes, and all other characters as 4 bytes (a surrogate pair). UTF-8 encodes ASCII characters as 1 byte and other characters as 2, 3 or 4 bytes.
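To make the byte counts concrete, here is a small Python sketch that encodes a few sample characters both ways. The helper name `encoded_lengths` and the choice of sample characters are mine, just for illustration.

```python
def encoded_lengths(ch: str):
    """Return (UTF-8 byte count, UTF-16 byte count) for a single character.

    "utf-16-le" is used so that no BOM is added to the count.
    """
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


samples = [
    ("A", "U+0041, ASCII"),
    ("\u00e9", "U+00E9, e with acute accent"),
    ("\u20ac", "U+20AC, euro sign"),
    ("\U0001F600", "U+1F600, outside the BMP"),
]

for ch, description in samples:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"{description}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

The four samples come out as 1/2, 2/2, 3/2 and 4/4 bytes (UTF-8/UTF-16), matching the description above.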

Query Analyzer's "Unicode" option in Save As... saves as UTF-16 LE with a BOM. Query Analyzer doesn't appear to be able to recognize UTF-8 files as such, so if a file has a UTF-8 BOM, the BOM will show up as text (as "ï»¿") and any non-ASCII characters will be displayed incorrectly.
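You can reproduce that effect in Python by decoding UTF-8 bytes as if they were an 8-bit "ANSI" code page. This sketch assumes the ANSI code page is Windows-1252 (`cp1252`), which is typical for Western European locales but not universal; the function name is hypothetical.

```python
import codecs


def read_utf8_as_ansi(text: str, ansi_codepage: str = "cp1252") -> str:
    """Encode text as UTF-8 with a BOM, then misread the bytes as an ANSI code page.

    This mimics what a UTF-8-unaware editor would display.
    """
    utf8_bytes = codecs.BOM_UTF8 + text.encode("utf-8")
    return utf8_bytes.decode(ansi_codepage)


# The BOM bytes EF BB BF become three visible characters, and the
# two-byte UTF-8 sequence for e-acute (C3 A9) becomes two characters.
print(read_utf8_as_ansi("caf\u00e9"))
```

ASCII characters survive unchanged because UTF-8 and Windows-1252 agree on them; only the BOM and the non-ASCII characters are mangled.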

http://www.unicode.org
http://www-106.ibm.com/developerworks/library/utfencodingforms
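As for the original question of converting a whole folder without opening each file, something along these lines might do it. Treat it as a sketch under assumptions, not a tested tool: the function names, the `*.sql` pattern and the `cp1252` default for BOM-less files are all my own choices, and it overwrites files in place, so try it on a copy of the folder first.

```python
import codecs
from pathlib import Path


def to_utf16_le_with_bom(path, source_encoding="cp1252"):
    """Rewrite one file as UTF-16 LE with a BOM.

    Returns False (and leaves the file alone) if it already starts with a
    UTF-16 BOM. Files with a UTF-8 BOM are decoded as UTF-8; anything else
    is ASSUMED to be in source_encoding.
    """
    path = Path(path)
    raw = path.read_bytes()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return False
    if raw.startswith(codecs.BOM_UTF8):
        text = raw[len(codecs.BOM_UTF8):].decode("utf-8")
    else:
        text = raw.decode(source_encoding)
    # Working on bytes throughout keeps the original line endings intact.
    path.write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
    return True


def convert_folder(folder, pattern="*.sql", source_encoding="cp1252"):
    """Convert every matching file in folder; return the names that were rewritten."""
    converted = []
    for p in sorted(Path(folder).glob(pattern)):
        if to_utf16_le_with_bom(p, source_encoding):
            converted.append(p.name)
    return converted
```

Running `convert_folder` a second time on the same folder should change nothing, since the files then already carry a UTF-16 BOM. A file containing bytes that are invalid in the assumed source encoding will raise `UnicodeDecodeError` rather than being silently corrupted.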
