Character encoding in web pages:
1. Encoding conversion (to Unicode)
(The code below was collected from the web.)
JavaScript version
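<script>
// Convert each UTF-16 code unit of the string to a \uXXXX escape.
var test = "你好abc";
var str = "";
for (var i = 0; i < test.length; i++) {
  var temp = test.charCodeAt(i).toString(16);
  // Left-pad the hex value with zeros to four digits.
  str += "\\u" + new Array(5 - temp.length).join("0") + temp;
}
document.write(str); // \u4f60\u597d\u0061\u0062\u0063
</script>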
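The same conversion can be wrapped in a reusable function and paired with its inverse. This is a minimal sketch, not code from the original sources: the names toUnicodeEscapes and fromUnicodeEscapes are made up for illustration, and it assumes an environment that supports String.prototype.padStart.

```javascript
// Hypothetical helper names; not part of the original post.
// Encode every UTF-16 code unit as a \uXXXX escape.
function toUnicodeEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += "\\u" + text.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

// Decode \uXXXX escapes back into characters; any other text is left as-is.
function fromUnicodeEscapes(escaped) {
  return escaped.replace(/\\u([0-9a-fA-F]{4})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

console.log(toUnicodeEscapes("你好abc")); // \u4f60\u597d\u0061\u0062\u0063
console.log(fromUnicodeEscapes("\\u4f60\\u597d\\u0061\\u0062\\u0063")); // 你好abc
```

Because both functions work on UTF-16 code units, a character outside the Basic Multilingual Plane becomes two escapes (a surrogate pair) and still round-trips.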
VBScript version
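' Convert each character of str1 to a \uXXXX escape.
Function Unicode(str1)
    Dim str, temp, i
    str = ""
    For i = 1 To Len(str1)
        temp = Hex(AscW(Mid(str1, i, 1)))
        ' Left-pad the hex value with zeros to four digits.
        If Len(temp) < 4 Then temp = Right("0000" & temp, 4)
        str = str & "\u" & temp
    Next
    Unicode = str
End Function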
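' Replace every non-ASCII character with a numeric character reference (&#N;).
Function htmlentities(str)
    Dim i, char, code
    For i = 1 To Len(str)
        char = Mid(str, i, 1)
        code = AscW(char)
        ' AscW returns a signed 16-bit value, so map negative results back to 0..65535.
        If code < 0 Then code = code + 65536
        If code > 127 Then
            htmlentities = htmlentities & "&#" & code & ";"
        Else
            htmlentities = htmlentities & char
        End If
    Next
End Function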
ColdFusion version
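function nochaoscode(str)
{
    var new_str = "";
    var i = 1;
    // Replace every non-ASCII character with a numeric character reference (&#N;).
    for (i = 1; i lte len(str); i = i + 1) {
        if (asc(mid(str, i, 1)) lt 128) {
            new_str = new_str & mid(str, i, 1);
        } else {
            // "##" is an escaped "#" inside a ColdFusion string.
            new_str = new_str & "&##" & asc(mid(str, i, 1)) & ";";
        }
    }
    return new_str;
}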
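For comparison, the VBScript and ColdFusion entity functions above have a short JavaScript counterpart. This is a sketch under stated assumptions rather than code from the original sources: toNumericEntities is a hypothetical name, and the code relies on ES2015 features (for...of, codePointAt).

```javascript
// Hypothetical helper; mirrors the htmlentities/nochaoscode functions above.
function toNumericEntities(text) {
  let out = "";
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const ch of text) {
    const code = ch.codePointAt(0);
    // ASCII passes through; everything else becomes a decimal reference.
    out += code > 127 ? "&#" + code + ";" : ch;
  }
  return out;
}

console.log(toNumericEntities("你好abc")); // &#20320;&#22909;abc
```

Iterating by code point rather than by UTF-16 code unit means a character outside the Basic Multilingual Plane yields one reference instead of two surrogate halves.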
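Appendix:
In PHP, the mb_convert_encoding function from the mbstring extension performs this conversion in both directions. For example:
mb_convert_encoding("你好", "HTML-ENTITIES", "gb2312"); // output: &#20320;&#22909;
mb_convert_encoding("&#20320;&#22909;", "gb2312", "HTML-ENTITIES"); // output: 你好
To convert an entire page, add these three lines at the top of the PHP file:
mb_internal_encoding("gb2312"); // gb2312 here is your site's original encoding
mb_http_output("HTML-ENTITIES");
ob_start('mb_output_handler');
If the mbstring extension is not enabled, see these two articles: "How to display web pages correctly under any character set" and "How to display web pages correctly under any character set (continued)".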
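The reverse direction (numeric references back to text) can also be sketched in JavaScript. The function name fromNumericEntities is hypothetical, and the sketch only handles numeric references, decimal or hexadecimal; named entities are left untouched.

```javascript
// Hypothetical helper: decodes decimal (&#20320;) and hexadecimal (&#x4f60;) references.
function fromNumericEntities(html) {
  return html.replace(/&#(x[0-9a-fA-F]+|[0-9]+);/gi, (match, body) => {
    const isHex = body[0] === "x" || body[0] === "X";
    const code = isHex ? parseInt(body.slice(1), 16) : parseInt(body, 10);
    // Leave out-of-range values untouched instead of throwing a RangeError.
    return code <= 0x10ffff ? String.fromCodePoint(code) : match;
  });
}

console.log(fromNumericEntities("&#20320;&#22909;abc")); // 你好abc
```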
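2. HTML entities
HTML 4.01 supports the ISO 8859-1 (Latin-1) character set.
Tip: entity names are case-sensitive.
Note: the same symbol can be referenced either by entity name or by entity number. Entity names are easier to remember, but not every browser is guaranteed to recognize them; entity numbers do not have that problem, but they are much harder to remember.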
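One way to get the readability of names while emitting the more widely recognized numbers is to rewrite named references into numeric ones before output. The sketch below is illustrative only: ENTITY_NUMBERS and namedToNumeric are hypothetical names, and the map holds just a handful of entries. The lookup is case-sensitive, so &Agrave; and &agrave; map to different numbers and an unknown spelling such as &AGRAVE; is left alone.

```javascript
// Small illustrative subset of the name-to-number mapping.
const ENTITY_NUMBERS = {
  quot: 34, amp: 38, lt: 60, gt: 62, nbsp: 160, copy: 169,
  Agrave: 192, agrave: 224, euro: 8364,
};

// Replace known named references with numeric ones; unknown names are kept.
function namedToNumeric(html) {
  return html.replace(/&([A-Za-z][A-Za-z0-9]*);/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(ENTITY_NUMBERS, name)
      ? "&#" + ENTITY_NUMBERS[name] + ";"
      : match
  );
}

console.log(namedToNumeric("&copy; &Agrave; &agrave; &AGRAVE;"));
// &#169; &#192; &#224; &AGRAVE;
```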
Entity names for some ASCII characters
Display | Description | Entity name | Entity number
" | quotation mark | &quot; | &#34;
' | apostrophe | &apos; (does not work in IE) | &#39;
& | ampersand | &amp; | &#38;
< | less-than | &lt; | &#60;
> | greater-than | &gt; | &#62;
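These five characters are the ones that usually need escaping before text is placed into HTML. A minimal sketch follows; escapeHtml and ESCAPES are hypothetical names. It uses the numeric form &#39; for the apostrophe, since the table notes that &apos; does not work in IE.

```javascript
// Map each special character to its entity reference.
const ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;", // numeric form, because &apos; is not recognized by IE
};

// Escape the five characters in a single pass over the text.
function escapeHtml(text) {
  return text.replace(/[&<>"']/g, (ch) => ESCAPES[ch]);
}

console.log(escapeHtml(`<a href="x">Tom & Jerry's</a>`));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;
```

Replacing all five characters in one pass avoids double-escaping the ampersands that the other replacements introduce.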
ISO 8859-1 symbol entities
Display | Description | Entity name | Entity number
(space) | non-breaking space | &nbsp; | &#160;
¡ | inverted exclamation mark | &iexcl; | &#161;
¤ | currency | &curren; | &#164;
¢ | cent | &cent; | &#162;
£ | pound | &pound; | &#163;
¥ | yen | &yen; | &#165;
¦ | broken vertical bar | &brvbar; | &#166;
§ | section | &sect; | &#167;
¨ | spacing diaeresis | &uml; | &#168;
© | copyright | &copy; | &#169;
ª | feminine ordinal indicator | &ordf; | &#170;
« | angle quotation mark (left) | &laquo; | &#171;
¬ | negation | &not; | &#172;
(invisible) | soft hyphen | &shy; | &#173;
® | registered trademark | &reg; | &#174;
™ | trademark | &trade; | &#8482;
¯ | spacing macron | &macr; | &#175;
° | degree | &deg; | &#176;
± | plus-or-minus | &plusmn; | &#177;
² | superscript 2 | &sup2; | &#178;
³ | superscript 3 | &sup3; | &#179;
´ | spacing acute | &acute; | &#180;
µ | micro | &micro; | &#181;
¶ | paragraph | &para; | &#182;
· | middle dot | &middot; | &#183;
¸ | spacing cedilla | &cedil; | &#184;
¹ | superscript 1 | &sup1; | &#185;
º | masculine ordinal indicator | &ordm; | &#186;
» | angle quotation mark (right) | &raquo; | &#187;
¼ | fraction 1/4 | &frac14; | &#188;
½ | fraction 1/2 | &frac12; | &#189;
¾ | fraction 3/4 | &frac34; | &#190;
¿ | inverted question mark | &iquest; | &#191;
× | multiplication | &times; | &#215;
÷ | division | &divide; | &#247;
ISO 8859-1 character entities
Display | Description | Entity name | Entity number
À | capital a, grave accent | &Agrave; | &#192;
Á | capital a, acute accent | &Aacute; | &#193;
 | capital a, circumflex accent | &Acirc; | &#194;
à | capital a, tilde | &Atilde; | &#195;
Ä | capital a, umlaut mark | &Auml; | &#196;
Å | capital a, ring | &Aring; | &#197;
Æ | capital ae | &AElig; | &#198;
Ç | capital c, cedilla | &Ccedil; | &#199;
È | capital e, grave accent | &Egrave; | &#200;
É | capital e, acute accent | &Eacute; | &#201;
Ê | capital e, circumflex accent | &Ecirc; | &#202;
Ë | capital e, umlaut mark | &Euml; | &#203;
Ì | capital i, grave accent | &Igrave; | &#204;
Í | capital i, acute accent | &Iacute; | &#205;
Î | capital i, circumflex accent | &Icirc; | &#206;
Ï | capital i, umlaut mark | &Iuml; | &#207;
Ð | capital eth, Icelandic | &ETH; | &#208;
Ñ | capital n, tilde | &Ntilde; | &#209;
Ò | capital o, grave accent | &Ograve; | &#210;
Ó | capital o, acute accent | &Oacute; | &#211;
Ô | capital o, circumflex accent | &Ocirc; | &#212;
Õ | capital o, tilde | &Otilde; | &#213;
Ö | capital o, umlaut mark | &Ouml; | &#214;
Ø | capital o, slash | &Oslash; | &#216;
Ù | capital u, grave accent | &Ugrave; | &#217;
Ú | capital u, acute accent | &Uacute; | &#218;
Û | capital u, circumflex accent | &Ucirc; | &#219;
Ü | capital u, umlaut mark | &Uuml; | &#220;
Ý | capital y, acute accent | &Yacute; | &#221;
Þ | capital THORN, Icelandic | &THORN; | &#222;
ß | small sharp s, German | &szlig; | &#223;
à | small a, grave accent | &agrave; | &#224;
á | small a, acute accent | &aacute; | &#225;
â | small a, circumflex accent | &acirc; | &#226;
ã | small a, tilde | &atilde; | &#227;
ä | small a, umlaut mark | &auml; | &#228;
å | small a, ring | &aring; | &#229;
æ | small ae | &aelig; | &#230;
ç | small c, cedilla | &ccedil; | &#231;
è | small e, grave accent | &egrave; | &#232;
é | small e, acute accent | &eacute; | &#233;
ê | small e, circumflex accent | &ecirc; | &#234;
ë | small e, umlaut mark | &euml; | &#235;
ì | small i, grave accent | &igrave; | &#236;
í | small i, acute accent | &iacute; | &#237;
î | small i, circumflex accent | &icirc; | &#238;
ï | small i, umlaut mark | &iuml; | &#239;
ð | small eth, Icelandic | &eth; | &#240;
ñ | small n, tilde | &ntilde; | &#241;
ò | small o, grave accent | &ograve; | &#242;
ó | small o, acute accent | &oacute; | &#243;
ô | small o, circumflex accent | &ocirc; | &#244;
õ | small o, tilde | &otilde; | &#245;
ö | small o, umlaut mark | &ouml; | &#246;
ø | small o, slash | &oslash; | &#248;
ù | small u, grave accent | &ugrave; | &#249;
ú | small u, acute accent | &uacute; | &#250;
û | small u, circumflex accent | &ucirc; | &#251;
ü | small u, umlaut mark | &uuml; | &#252;
ý | small y, acute accent | &yacute; | &#253;
þ | small thorn, Icelandic | &thorn; | &#254;
ÿ | small y, umlaut mark | &yuml; | &#255;
Other entities supported by HTML
Display | Description | Entity name | Entity number
Œ | capital ligature OE | &OElig; | &#338;
œ | small ligature oe | &oelig; | &#339;
Š | capital S with caron | &Scaron; | &#352;
š | small S with caron | &scaron; | &#353;
Ÿ | capital Y with diaeresis | &Yuml; | &#376;
ˆ | modifier letter circumflex accent | &circ; | &#710;
˜ | small tilde | &tilde; | &#732;
(space) | en space | &ensp; | &#8194;
(space) | em space | &emsp; | &#8195;
(space) | thin space | &thinsp; | &#8201;
(invisible) | zero width non-joiner | &zwnj; | &#8204;
(invisible) | zero width joiner | &zwj; | &#8205;
(invisible) | left-to-right mark | &lrm; | &#8206;
(invisible) | right-to-left mark | &rlm; | &#8207;
– | en dash | &ndash; | &#8211;
— | em dash | &mdash; | &#8212;
‘ | left single quotation mark | &lsquo; | &#8216;
’ | right single quotation mark | &rsquo; | &#8217;
‚ | single low-9 quotation mark | &sbquo; | &#8218;
“ | left double quotation mark | &ldquo; | &#8220;
” | right double quotation mark | &rdquo; | &#8221;
„ | double low-9 quotation mark | &bdquo; | &#8222;
† | dagger | &dagger; | &#8224;
‡ | double dagger | &Dagger; | &#8225;
… | horizontal ellipsis | &hellip; | &#8230;
‰ | per mille | &permil; | &#8240;
‹ | single left-pointing angle quotation | &lsaquo; | &#8249;
› | single right-pointing angle quotation | &rsaquo; | &#8250;
€ | euro | &euro; | &#8364;
References:
http://www./?p=72 http://www./forum/read.php?tid=258