
 UNICODE File Attribute


SamC
White Water Yakist

3467 Posts

Posted - 2004-05-23 : 14:34:44
Does anyone know a way to inspect a file property (using Windows Explorer) to see whether the file was saved in UNICODE rather than ANSI format?

It would be a great help if there were a way to change all files in a folder to UNICODE format without opening each file individually and saving it back with the UNICODE attribute. Maybe using copy with the target file attribute set to UNICODE?

Sam

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-05-24 : 03:33:56
I think the only way that programs like Notepad decide on a character encoding is by looking for a Unicode Byte Order Mark (BOM) at the start of the file. The encoding is stored in the file's content, not as a file attribute.

  • If it starts with FE FF then it's UTF-16 BE, which Notepad calls "Unicode Big Endian"

  • If it starts with FF FE then it's UTF-16 LE, which Notepad calls "Unicode"

  • There's also some heuristic used by Notepad for detecting UTF-16 LE files without a BOM, but not for UTF-16 BE

  • If it starts with EF BB BF then it's UTF-8 with a BOM, which Notepad calls "UTF-8"

  • There's also some heuristic used by Notepad for detecting UTF-8 files without a BOM

  • Otherwise it's assumed to be whatever the current locale's 8-bit character set is, "ANSI".
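
The BOM checks in the list above can be sketched in a few lines of Python. This is only a rough illustration, not what Notepad actually does: the function names `sniff_bom` and `sniff_file` are made up for this example, and the BOM-less heuristics are not reproduced, so a file without a BOM is simply reported as ANSI.

```python
import codecs

# BOM signatures from the list above, paired with Notepad's names for them.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),                   # EF BB BF
    (codecs.BOM_UTF16_BE, "Unicode Big Endian"),  # FE FF
    (codecs.BOM_UTF16_LE, "Unicode"),             # FF FE
]


def sniff_bom(data: bytes) -> str:
    """Name the encoding implied by a leading BOM, if there is one."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # No BOM: this sketch does not attempt Notepad's heuristics.
    return "ANSI (no BOM found)"


def sniff_file(path: str) -> str:
    """Read only the first three bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(3))
```

Calling `sniff_file` on each file in a folder would give a quick report of which files carry a BOM, which is about as close to "inspecting a property" as this gets.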



SamC
White Water Yakist

3467 Posts

Posted - 2004-05-24 : 07:56:51
Arnold,

Thanks. I suppose opening each file in Query Analyzer and saving it as "UNICODE" is the answer.

Dreamweaver calls it something else (UTF-8) I think.

Sam

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-05-24 : 08:23:07
Well, UTF-8 and UTF-16 are both character encodings for Unicode: they can encode the same character set, but use different bit patterns to do so. Specifically, UTF-16 encodes characters in the Basic Multilingual Plane (up to U+FFFF) as 2 bytes, and all other characters as 4 bytes. UTF-8 encodes ASCII characters as 1 byte and other characters as 2, 3 or 4 bytes.
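
To make those byte counts concrete, here is a small Python sketch. The helper name `encoded_sizes` and the four sample characters are just picked for illustration; the encoder `utf-16-le` is used so that no BOM is added to the count.

```python
def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


# One ASCII letter, two other BMP characters, and one character above U+FFFF.
for ch in ["A", "\u00e9", "\u20ac", "\U0001d11e"]:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

The last character (U+1D11E) lies outside the Basic Multilingual Plane, so it takes 4 bytes in both encodings.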

Query Analyzer's "Unicode" option in Save As... saves as UTF-16 LE, with a BOM. Query Analyzer doesn't appear to be able to recognize UTF-8 files as such, so if they have a BOM, it will appear as literal characters ("ï»¿" in a Western European code page) and any non-ASCII characters will go wrong.
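
If opening every file by hand is too tedious, a script can do the same re-save in bulk. What follows is only a sketch under some assumptions: the function names `to_utf16_le` and `convert_folder` are invented here, the "ANSI" files are assumed to be Windows-1252 (`cp1252`), and the `*.sql` pattern is just a guess at what the folder holds. It overwrites files in place, so try it on a copy of the folder first.

```python
import codecs
from pathlib import Path


def to_utf16_le(path, ansi_codec="cp1252"):
    """Rewrite one file as UTF-16 LE with a BOM; return True if it changed."""
    raw = path.read_bytes()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return False  # already UTF-16 with a BOM, leave it alone
    if raw.startswith(codecs.BOM_UTF8):
        # UTF-8 with a BOM: strip the BOM and decode as UTF-8.
        text = raw[len(codecs.BOM_UTF8):].decode("utf-8")
    else:
        # Assumption: no BOM means the "ANSI" code page named by ansi_codec.
        # Bytes that are undefined in that code page raise UnicodeDecodeError.
        text = raw.decode(ansi_codec)
    path.write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
    return True


def convert_folder(folder, pattern="*.sql"):
    """Convert every matching file in folder; return the names that changed."""
    return [p.name for p in sorted(Path(folder).glob(pattern)) if to_utf16_le(p)]
```

Running `convert_folder` a second time on the same folder should return an empty list, because every file then starts with the FF FE BOM and is skipped.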

http://www.unicode.org
http://www-106.ibm.com/developerworks/library/utfencodingforms
   
