Code?
There isn't really any code - but these are related:
NCharDet
HtmlEncode
NileGlobal
Enjoy.
About Charsets & Unicode
ASCII
First, there was ASCII - the American Standard Code for Information Interchange -
using an enormous 7 bits to map 128 (count them!) characters, including
control chars, the English alphabet and some other bits and pieces.
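As a rough illustration of that 7-bit range, the Python sketch below counts the control and printable characters and checks whether a string stays inside ASCII. The helper name is_ascii is made up for this example and is not taken from any of the related projects.

```python
# Sketch only: is_ascii is a hypothetical helper written for this example.

# ASCII code points run from 0 to 127, which is everything 7 bits can hold.
ASCII_LIMIT = 2 ** 7  # 128

# Control characters are 0-31 plus DEL (127); the remaining codes are printable.
control_codes = [c for c in range(ASCII_LIMIT) if c < 32 or c == 127]
printable_chars = [chr(c) for c in range(32, 127)]


def is_ascii(text: str) -> bool:
    """Return True if every character in text has a code point below 128."""
    return all(ord(ch) < ASCII_LIMIT for ch in text)


if __name__ == "__main__":
    print(len(control_codes), len(printable_chars))
    # "caf\u00e9" is "cafe" with an acute accent on the final letter.
    print(is_ascii("Hello"), is_ascii("caf\u00e9"))
```

The accented letter fails the check because its code point (233) needs an eighth bit, which is exactly the gap the extended codepages tried to fill.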
Extended codepages
When presented with the problem of storing and displaying other languages
on early computers, simply extending the underlying representation to
9, 10 or more bits to cope with more characters
really wasn't as simple as it might sound. The
underlying architecture (hardware, compilers, etc.) was focussed on
shuffling 8 bits around, and existing software
was already using ASCII, EBCDIC or something of the sort.
Various different encodings were proposed.
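One practical consequence of having many 8-bit codepages is that a byte above 127 has no single meaning: it depends on which codepage the reader assumes. The sketch below decodes the same byte under three codepages that ship with Python. The byte value and the three codepages are arbitrary choices for illustration, and decode_in_each is a made-up helper name.

```python
import unicodedata

# Sketch only: the codec names are Python's standard ones; the selection is arbitrary.
CODEPAGES = ("latin-1", "cp1251", "iso8859_7")


def decode_in_each(raw: bytes) -> dict:
    """Decode the same bytes under each codepage and return the results by name."""
    return {name: raw.decode(name) for name in CODEPAGES}


if __name__ == "__main__":
    # 0xE9 lies in the upper half of the byte range, where codepages disagree.
    for name, ch in decode_in_each(b"\xe9").items():
        # Print the Unicode character name so the output stays plain ASCII.
        print(name, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Bytes below 128 decode identically in all three, since each of these codepages keeps ASCII in its lower half.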
Unicode
The 'ultimate code-page', Unicode defines an enormous set of code-points
to represent (almost) every character used in any language! These code-points
can be represented in different encoding forms built on 7-bit, 8-bit, 16-bit or 32-bit units (UTF-7, UTF-8, UTF-16, UTF-32).
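The difference between a code-point and its encoded form is easier to see with a few concrete characters. The sketch below reports how many bytes one character occupies in UTF-8, UTF-16 and UTF-32. The function name encoded_sizes and the sample characters are assumptions made for this example.

```python
# Sketch only: encoded_sizes is a hypothetical helper written for this example.

# The "-le" variants are used so that no byte-order mark is added to the count.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes ch occupies in each encoding form."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


if __name__ == "__main__":
    # Latin capital A, e with acute, the euro sign, and an emoji outside the BMP.
    for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

The code-point stays the same in every case. Only the byte sequence changes, and UTF-8 and UTF-16 use more bytes as the code-point value grows while UTF-32 always uses four.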
Using Unicode
Development environments like Microsoft .NET and Java now use
Unicode internally to represent a 'character', making the development
of multilingual applications significantly easier. However, you still
need to be aware of how you display characters to the user,
particularly on the web.
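One common way to keep web output safe whatever charset the page is finally served in is to escape the HTML markup characters and write everything outside ASCII as a numeric character reference. The sketch below does this with Python's standard library. The function name to_ascii_html is invented for this example and is not the API of the HtmlEncode project listed on this page.

```python
import html


def to_ascii_html(text: str) -> str:
    """Escape HTML markup characters and turn non-ASCII characters into &#NNN; references."""
    # html.escape handles &, <, > and (by default) both quote characters.
    escaped = html.escape(text)
    # xmlcharrefreplace rewrites anything the ASCII codec cannot encode as &#NNN;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    # "caf\u00e9" is "cafe" with an acute accent on the final letter.
    print(to_ascii_html("caf\u00e9 <b>"))
```

The result is pure ASCII, so a browser shows the same text even if it guesses the page's charset wrongly. The trade-off is larger output for text that is mostly non-ASCII.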
Useful links
Unicode.org
kinda obvious...
i18nguy.com on Unicode
more references than you can poke a stick at...
character code issues
A tutorial on character code issues by Jukka "Yucca" Korpela.
Unicode Case Mapping
Discussion of how Mozilla handles ToUpper and ToLower case conversions for Unicode data.