HTML Encoding Conversion and a Partial List of HTML Entities

 Ralf_Jones 2006-06-22

Character encoding in web pages:

1. Encoding conversion (to Unicode)

(The code below was collected from the web.)

 

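JavaScript version

<script>
// Convert every UTF-16 code unit of the string to a \uXXXX escape
var test = "你好abc";
var str = "";
for (var i = 0; i < test.length; i++) {
  var temp = test.charCodeAt(i).toString(16);
  // Left-pad the hex value with zeros to four digits
  str += "\\u" + new Array(5 - temp.length).join("0") + temp;
}
document.write(str); // \u4f60\u597d\u0061\u0062\u0063
</script>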
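The conversion above only goes one way. As a rough sketch, the same encoding can be paired with a decoder that turns \uXXXX escapes back into text. The function names toUnicodeEscapes and fromUnicodeEscapes are made up for this illustration, and the code assumes an environment that provides String.prototype.padStart (ES2017 or later).

```javascript
// Hypothetical helper names; they do not appear in the original article.

// Encode each UTF-16 code unit as a \uXXXX escape (same idea as the script above).
function toUnicodeEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    // charCodeAt returns one UTF-16 code unit in the range 0..0xFFFF
    out += "\\u" + text.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

// Decode every \uXXXX escape back to the code unit it names.
function fromUnicodeEscapes(escaped) {
  return escaped.replace(/\\u([0-9a-fA-F]{4})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

const escaped = toUnicodeEscapes("你好abc");
console.log(escaped);                     // \u4f60\u597d\u0061\u0062\u0063
console.log(fromUnicodeEscapes(escaped)); // 你好abc
```

Because both functions work on UTF-16 code units, a character outside the Basic Multilingual Plane is written as two escapes (a surrogate pair) and is restored correctly when decoded.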


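VBScript version

' Convert every character of str1 to a \uXXXX escape
Function Unicode(str1)
    Dim str, temp, i
    str = ""
    For i = 1 To Len(str1)
        temp = Hex(AscW(Mid(str1, i, 1)))
        ' Left-pad the hex value with zeros to four digits
        If Len(temp) < 4 Then temp = Right("0000" & temp, 4)
        str = str & "\u" & temp
    Next
    Unicode = str
End Function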


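' Replace every non-ASCII character with a numeric entity (&#N;)
Function htmlentities(str)
    Dim i, char, code
    For i = 1 To Len(str)
        char = Mid(str, i, 1)
        code = AscW(char)
        ' AscW returns a signed 16-bit value; map negatives back to 32768..65535
        If code < 0 Then code = code + 65536
        If code > 127 Then
            htmlentities = htmlentities & "&#" & code & ";"
        Else
            htmlentities = htmlentities & char
        End If
    Next
End Function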
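For comparison, here is a minimal JavaScript sketch of the same idea: leave ASCII alone and replace everything else with a decimal numeric entity. The name toNumericEntities is hypothetical. Unlike the VBScript function, this version walks the string by code point, so a character above U+FFFF becomes one entity instead of two.

```javascript
// Hypothetical helper, not taken from the original article.
function toNumericEntities(text) {
  let out = "";
  // for...of iterates by code point, so a surrogate pair is handled
  // as a single character rather than two separate code units
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    out += codePoint > 127 ? "&#" + codePoint + ";" : ch;
  }
  return out;
}

console.log(toNumericEntities("你好abc")); // &#20320;&#22909;abc
console.log(toNumericEntities("😀"));      // &#128512;
```

Note that this sketch does not escape &, < or >, which is the same behaviour as the functions above; markup-significant ASCII characters need separate handling.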

 

ColdFusion version

function nochaoscode(str)
{
    var new_str = "";
    var i = 0;
    for (i = 1; i lte len(str); i = i + 1) {
        if (asc(mid(str, i, 1)) lt 128) {
            new_str = new_str & mid(str, i, 1);
        } else {
            // "##" is an escaped "#", so this emits &#N;
            new_str = new_str & "&##" & asc(mid(str, i, 1)) & ";";
        }
    }
    return new_str;
}

 


 

Appendix:

In PHP, the mbstring function mb_convert_encoding can perform this conversion in both directions. For example:


mb_convert_encoding("你好", "HTML-ENTITIES", "gb2312");    // output: &#20320;&#22909;
mb_convert_encoding("&#20320;&#22909;", "gb2312", "HTML-ENTITIES");    // output: 你好
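The reverse direction (numeric entities back to characters) can also be sketched in JavaScript. This is an illustration only, not a port of mb_convert_encoding: the name decodeNumericEntities is hypothetical, and named entities such as &amp;copy; are deliberately left untouched.

```javascript
// Hypothetical helper, not part of mbstring or the original article.
function decodeNumericEntities(html) {
  // Matches decimal (&#20320;) and hexadecimal (&#x4f60;) references
  return html.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range references as they are instead of throwing
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericEntities("&#20320;&#22909; &#x4f60;")); // 你好 你
```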

 

To convert an entire page, add these three lines at the top of the PHP file:

 

mb_internal_encoding("gb2312");  // gb2312 here is the site's original encoding
mb_http_output("HTML-ENTITIES");
ob_start('mb_output_handler');


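If the mbstring extension is not enabled, see these two articles:
How to display web pages correctly under any character set
How to display web pages correctly under any character set (continued)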


 

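2. HTML entities

HTML 4.01 supports the ISO 8859-1 (Latin-1) character set.

Tip: Entity names are case-sensitive.

Note: The same symbol can be referenced either by its entity name or by its entity number. An entity name is easier to remember, but not every browser is guaranteed to recognize it. An entity number has no such problem, but it is hard to remember.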
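To make the two styles concrete, the sketch below escapes the same text once with entity names and once with entity numbers. The NAMED table and both function names are assumptions made for this example, and the table covers only four characters.

```javascript
// Small illustrative subset; the names below are hypothetical.
const NAMED = { '"': "&quot;", "&": "&amp;", "<": "&lt;", ">": "&gt;" };

// Escape using entity names (easier to read)
function escapeWithNames(text) {
  return text.replace(/["&<>]/g, (ch) => NAMED[ch]);
}

// Escape using entity numbers (derived directly from the character code)
function escapeWithNumbers(text) {
  return text.replace(/["&<>]/g, (ch) => "&#" + ch.charCodeAt(0) + ";");
}

console.log(escapeWithNames('a < b & "c"'));   // a &lt; b &amp; &quot;c&quot;
console.log(escapeWithNumbers('a < b & "c"')); // a &#60; b &#38; &#34;c&#34;
```

Both outputs render as the same text in a browser; the numeric form needs no lookup table because the number is just the character's code.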


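Entity names for some ASCII characters

Display | Description | Entity name | Entity number
" | quotation mark | &quot; | &#34;
' | apostrophe | &apos; (not supported in IE) | &#39;
& | ampersand | &amp; | &#38;
< | less-than | &lt; | &#60;
> | greater-than | &gt; | &#62;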

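ISO 8859-1 symbol entities

Display | Description | Entity name | Entity number
  | non-breaking space | &nbsp; | &#160;
¡ | inverted exclamation mark | &iexcl; | &#161;
¤ | currency | &curren; | &#164;
¢ | cent | &cent; | &#162;
£ | pound | &pound; | &#163;
¥ | yen | &yen; | &#165;
¦ | broken vertical bar | &brvbar; | &#166;
§ | section | &sect; | &#167;
¨ | spacing diaeresis | &uml; | &#168;
© | copyright | &copy; | &#169;
ª | feminine ordinal indicator | &ordf; | &#170;
« | angle quotation mark (left) | &laquo; | &#171;
¬ | negation | &not; | &#172;
  | soft hyphen | &shy; | &#173;
® | registered trademark | &reg; | &#174;
™ | trademark | &trade; | &#8482;
¯ | spacing macron | &macr; | &#175;
° | degree | &deg; | &#176;
± | plus-or-minus | &plusmn; | &#177;
² | superscript 2 | &sup2; | &#178;
³ | superscript 3 | &sup3; | &#179;
´ | spacing acute | &acute; | &#180;
µ | micro | &micro; | &#181;
¶ | paragraph | &para; | &#182;
· | middle dot | &middot; | &#183;
¸ | spacing cedilla | &cedil; | &#184;
¹ | superscript 1 | &sup1; | &#185;
º | masculine ordinal indicator | &ordm; | &#186;
» | angle quotation mark (right) | &raquo; | &#187;
¼ | fraction 1/4 | &frac14; | &#188;
½ | fraction 1/2 | &frac12; | &#189;
¾ | fraction 3/4 | &frac34; | &#190;
¿ | inverted question mark | &iquest; | &#191;
× | multiplication | &times; | &#215;
÷ | division | &divide; | &#247;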

ISO 8859-1 character entities

Display | Description | Entity name | Entity number
À | capital a, grave accent | &Agrave; | &#192;
Á | capital a, acute accent | &Aacute; | &#193;
Â | capital a, circumflex accent | &Acirc; | &#194;
Ã | capital a, tilde | &Atilde; | &#195;
Ä | capital a, umlaut mark | &Auml; | &#196;
Å | capital a, ring | &Aring; | &#197;
Æ | capital ae | &AElig; | &#198;
Ç | capital c, cedilla | &Ccedil; | &#199;
È | capital e, grave accent | &Egrave; | &#200;
É | capital e, acute accent | &Eacute; | &#201;
Ê | capital e, circumflex accent | &Ecirc; | &#202;
Ë | capital e, umlaut mark | &Euml; | &#203;
Ì | capital i, grave accent | &Igrave; | &#204;
Í | capital i, acute accent | &Iacute; | &#205;
Î | capital i, circumflex accent | &Icirc; | &#206;
Ï | capital i, umlaut mark | &Iuml; | &#207;
Ð | capital eth, Icelandic | &ETH; | &#208;
Ñ | capital n, tilde | &Ntilde; | &#209;
Ò | capital o, grave accent | &Ograve; | &#210;
Ó | capital o, acute accent | &Oacute; | &#211;
Ô | capital o, circumflex accent | &Ocirc; | &#212;
Õ | capital o, tilde | &Otilde; | &#213;
Ö | capital o, umlaut mark | &Ouml; | &#214;
Ø | capital o, slash | &Oslash; | &#216;
Ù | capital u, grave accent | &Ugrave; | &#217;
Ú | capital u, acute accent | &Uacute; | &#218;
Û | capital u, circumflex accent | &Ucirc; | &#219;
Ü | capital u, umlaut mark | &Uuml; | &#220;
Ý | capital y, acute accent | &Yacute; | &#221;
Þ | capital THORN, Icelandic | &THORN; | &#222;
ß | small sharp s, German | &szlig; | &#223;
à | small a, grave accent | &agrave; | &#224;
á | small a, acute accent | &aacute; | &#225;
â | small a, circumflex accent | &acirc; | &#226;
ã | small a, tilde | &atilde; | &#227;
ä | small a, umlaut mark | &auml; | &#228;
å | small a, ring | &aring; | &#229;
æ | small ae | &aelig; | &#230;
ç | small c, cedilla | &ccedil; | &#231;
è | small e, grave accent | &egrave; | &#232;
é | small e, acute accent | &eacute; | &#233;
ê | small e, circumflex accent | &ecirc; | &#234;
ë | small e, umlaut mark | &euml; | &#235;
ì | small i, grave accent | &igrave; | &#236;
í | small i, acute accent | &iacute; | &#237;
î | small i, circumflex accent | &icirc; | &#238;
ï | small i, umlaut mark | &iuml; | &#239;
ð | small eth, Icelandic | &eth; | &#240;
ñ | small n, tilde | &ntilde; | &#241;
ò | small o, grave accent | &ograve; | &#242;
ó | small o, acute accent | &oacute; | &#243;
ô | small o, circumflex accent | &ocirc; | &#244;
õ | small o, tilde | &otilde; | &#245;
ö | small o, umlaut mark | &ouml; | &#246;
ø | small o, slash | &oslash; | &#248;
ù | small u, grave accent | &ugrave; | &#249;
ú | small u, acute accent | &uacute; | &#250;
û | small u, circumflex accent | &ucirc; | &#251;
ü | small u, umlaut mark | &uuml; | &#252;
ý | small y, acute accent | &yacute; | &#253;
þ | small thorn, Icelandic | &thorn; | &#254;
ÿ | small y, umlaut mark | &yuml; | &#255;

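Other entities supported by HTML

Display | Description | Entity name | Entity number
Œ | capital ligature OE | &OElig; | &#338;
œ | small ligature oe | &oelig; | &#339;
Š | capital S with caron | &Scaron; | &#352;
š | small s with caron | &scaron; | &#353;
Ÿ | capital Y with diaeresis | &Yuml; | &#376;
ˆ | modifier letter circumflex accent | &circ; | &#710;
˜ | small tilde | &tilde; | &#732;
  | en space | &ensp; | &#8194;
  | em space | &emsp; | &#8195;
  | thin space | &thinsp; | &#8201;
  | zero width non-joiner | &zwnj; | &#8204;
  | zero width joiner | &zwj; | &#8205;
  | left-to-right mark | &lrm; | &#8206;
  | right-to-left mark | &rlm; | &#8207;
– | en dash | &ndash; | &#8211;
— | em dash | &mdash; | &#8212;
‘ | left single quotation mark | &lsquo; | &#8216;
’ | right single quotation mark | &rsquo; | &#8217;
‚ | single low-9 quotation mark | &sbquo; | &#8218;
“ | left double quotation mark | &ldquo; | &#8220;
” | right double quotation mark | &rdquo; | &#8221;
„ | double low-9 quotation mark | &bdquo; | &#8222;
† | dagger | &dagger; | &#8224;
‡ | double dagger | &Dagger; | &#8225;
… | horizontal ellipsis | &hellip; | &#8230;
‰ | per mille | &permil; | &#8240;
‹ | single left-pointing angle quotation | &lsaquo; | &#8249;
› | single right-pointing angle quotation | &rsaquo; | &#8250;
€ | euro | &euro; | &#8364;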

 


References:

http://www./?p=72
http://www./forum/read.php?tid=258
