Author |
Topic |
SamC
White Water Yakist
3467 Posts |
Posted - 2004-05-23 : 14:34:44
|
Does anyone know a way to inspect a file's properties (using Windows Explorer) to see whether it was saved in UNICODE rather than ANSI format?

It would be a great help if there were a way to change all the files in a folder to UNICODE format without opening each file individually and saving it back with the UNICODE attribute. Maybe using copy with the target file attribute set to UNICODE?

Sam |
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2004-05-24 : 03:33:56
|
I think the main way that programs like Notepad decide on character encoding is by looking for a Unicode Byte Order Mark (BOM) at the start of the file.
- If it starts with FE FF then it's UTF-16 BE "Unicode Big Endian"
- If it starts with FF FE then it's UTF-16 LE "Unicode"
- There's also some heuristic used by Notepad for detecting UTF-16 LE files without a BOM, but not for UTF-16 BE
- If it starts with EF BB BF then it's UTF-8 with a BOM "UTF-8"
- There's also some heuristic used by Notepad for detecting UTF-8 files without a BOM
- Otherwise it's assumed to be whatever the current locale's 8-bit character set is, "ANSI".
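
The BOM checks in the list above can be sketched in a few lines of Python. This is only an illustration of the byte-signature test, not what Notepad actually runs: the function names `sniff_bom` and `sniff_file` are made up for this example, and the sketch does not attempt Notepad's BOM-less heuristics or UTF-32.

```python
import codecs

# BOM signatures from the list above, paired with the names Notepad shows.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),                    # EF BB BF
    (codecs.BOM_UTF16_BE, "Unicode Big Endian"),   # FE FF
    (codecs.BOM_UTF16_LE, "Unicode"),              # FF FE
]

def sniff_bom(data: bytes) -> str:
    """Return the encoding implied by a leading BOM, or "ANSI" if none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # No BOM: assume the current locale's 8-bit character set.
    return "ANSI"

def sniff_file(path) -> str:
    """Read only the first three bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(3))
```

Calling `sniff_file` on each file in a folder gives a quick report of which files already carry a Unicode BOM, without opening them in an editor.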
|
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2004-05-24 : 07:56:51
|
Arnold,

Thanks. I suppose opening each file and saving as "UNICODE" is the answer (Query Analyzer).

Dreamweaver calls it something else (UTF-8), I think.

Sam |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2004-05-24 : 08:23:07
|
Well, UTF-8 and UTF-16 are both character encodings for Unicode: they can encode the same character set, but use different bit patterns to do so. Specifically, UTF-16 encodes characters in the base plane (up to U+FFFF) as 2 bytes, and other characters as 4 bytes. UTF-8 encodes ASCII characters as 1 byte and other characters as 2, 3 or 4 bytes.

Query Analyzer's "Unicode" option in Save As... saves as UTF-16 LE, with a BOM. Query Analyzer doesn't appear to be able to recognize UTF-8 files as such, so if they have a BOM, it will appear (as "ï»¿") and any non-ASCII characters will go wrong.

http://www.unicode.org
http://www-106.ibm.com/developerworks/library/utfencodingforms |
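
For the original question of converting a whole folder, here is a rough Python sketch that rewrites files in the format described above for Query Analyzer's "Unicode" option (UTF-16 LE with a BOM). Several things here are assumptions rather than anything stated in this thread: the helper names, the `*.sql` pattern, and above all that BOM-less files are in code page 1252. Change `ansi_codec` to match the locale the files were actually saved in, and try it on a copy of the folder first.

```python
import codecs
from pathlib import Path

def to_utf16le_with_bom(raw: bytes, ansi_codec: str = "cp1252") -> bytes:
    """Re-encode file contents as UTF-16 LE with a leading BOM."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw  # already UTF-16 with a BOM: leave untouched
    if raw.startswith(codecs.BOM_UTF8):
        # Drop the UTF-8 BOM, then decode the rest as UTF-8.
        text = raw[len(codecs.BOM_UTF8):].decode("utf-8")
    else:
        # No BOM: assume the 8-bit "ANSI" code page (an assumption).
        text = raw.decode(ansi_codec)
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

def convert_folder(folder, pattern="*.sql", ansi_codec="cp1252"):
    """Convert every matching file in the folder in place."""
    for path in Path(folder).glob(pattern):
        raw = path.read_bytes()
        converted = to_utf16le_with_bom(raw, ansi_codec)
        if converted != raw:
            path.write_bytes(converted)
```

Working on raw bytes rather than text-mode files keeps the existing line endings as they are. A file whose bytes are not valid in the assumed code page raises `UnicodeDecodeError` instead of being silently mangled.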