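Problem: garbled characters appear when redirecting between different websites, and also when passing parameters between pages that use different encodings. Two points need to be clear first:
The URL encoding rules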
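OK: whatever platform a website runs on, the URL encoding rules are the same. So when garbled characters appear during a redirect between websites, the cause is that the sites use different character encodings. The fix is: URL-decode -> convert the string's character encoding -> URL-encode.

1. The PHP solution

PHP functions for converting string encodings

iconv()
Description: string iconv ( string in_charset, string out_charset, string str )
Note: besides naming the target encoding, the second parameter can also carry one of two suffixes, //TRANSLIT or //IGNORE. Per the PHP manual, //TRANSLIT substitutes a similar-looking character when a character cannot be represented in the target encoding, and //IGNORE silently drops it.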
Example: $str = iconv("UTF-8", "GB2312//TRANSLIT", $str);

mb_convert_encoding()
Description: string mb_convert_encoding ( string str, string to-encoding [, mixed from-encoding] )
Note: the mbstring extension must be enabled.

Difference between the two: mb_convert_encoding can detect the source encoding from the content (pass "auto" or a list of candidate encodings as from-encoding). It is the more capable function, but it is said to run considerably slower than iconv.

Summary: use iconv in general, and fall back to mb_convert_encoding only when the original encoding cannot be determined.

URL encoding and decoding functions

urlencode(): string urlencode ( string str )
urldecode(): string urldecode ( string str )

Example:

<?php
$url = $_GET['url'];
// URL-decode the incoming value and escape HTML special characters
$url = htmlspecialchars(urldecode($url));
// Convert the decoded string from UTF-8 to GB2312
$keyword = iconv("UTF-8", "GB2312//TRANSLIT", $url);
// URL-encode the converted string
$keyword = urlencode($keyword);
header("Location: " . $url);
?>

2. The JavaScript solution

When parameters are passed in a URL, they often include Chinese names or URL addresses, and conversion errors can occur when the backend processes them. If the sending page uses GB2312 and the receiving page uses UTF-8, the received parameter may differ from the original. A URL encoded on the server with a urlEncode function also differs from one encoded in the browser with JavaScript's encodeURI function.

Encoding methods in JavaScript:

escape()
English description, MSDN JScript Reference: The escape method returns a string value (in Unicode format) that contains the contents of [the argument]. All spaces, punctuation, accented characters, and any other non-ASCII characters are replaced with %xx encoding, where xx is equivalent to the hexadecimal number representing the character. For example, a space is returned as "%20."

encodeURI()
Encodes a URI string as UTF-8 and converts it to escape-format (percent-encoded) text. Besides letters and digits, the characters this method does not encode are: ; , / ? : @ & = + $ # - _ . ! ~ * ' ( )
English description, MSDN JScript Reference: The encodeURI method returns an encoded URI. If you pass the result to decodeURI, the original string is returned. The encodeURI method does not encode the following characters: ":", "/", ";", and "?". Use encodeURIComponent to encode these characters.
Mozilla Developer Core Javascript Guide: Encodes a Uniform Resource Identifier (URI) by replacing each instance of certain characters by one, two, or three escape sequences representing the UTF-8 encoding of the character.

encodeURIComponent()
Encodes a URI string as UTF-8 and converts it to escape-format (percent-encoded) text. Compared with encodeURI(), this method encodes more characters, such as /. So if the string contains several parts of a URI, do not encode it with this method, because once the / characters are encoded the URL is no longer valid. Besides letters and digits, the characters this method does not encode are: - _ . ! ~ * ' ( )
English description, MSDN JScript Reference: The encodeURIComponent method returns an encoded URI. If you pass the result to decodeURIComponent, the original string is returned. Because the encodeURIComponent method encodes all characters, be careful if the string represents a path such as /folder1/folder2/default.html. The slash characters will be encoded and will not be valid if sent as a request to a web server. Use the encodeURI method if the string contains more than a single URI component.
Mozilla Developer Core Javascript Guide: Encodes a Uniform Resource Identifier (URI) component by replacing each instance of certain characters by one, two, or three escape sequences representing the UTF-8 encoding of the character.

In addition, encodeURI and encodeURIComponent were introduced in JavaScript 1.5, whereas escape has existed since JavaScript 1.0.

English note: The escape() method does not encode the + character which is interpreted as a space on the server side as well as generated by forms with spaces in their fields. Due to this shortcoming, you should avoid use of escape() whenever possible. The best alternative is usually encodeURIComponent(). Use of the encodeURI() method is a bit more specialized than escape() in that it encodes for URIs as opposed to the querystring, which is part of a URL. Use this method when you need to encode a string to be used for any resource that uses URIs and needs certain characters to remain un-encoded. Note that this method does not encode the ' character, as it is a valid character within URIs. Lastly, the encodeURIComponent() method should be used in most cases when encoding a single component of a URI. This method will encode certain chars that would normally be recognized as special chars for URIs so that many components may be included. Note that this method does not encode the ' character, as it is a valid character within URIs.

3. The JSP / Servlet solution
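In a Servlet that passes parameters, you normally set the encoding used for receiving parameters and for sending the response, with these two statements:
request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("utf-8");
Most people think of using these, but where the two statements go is easily overlooked. The correct way is to put request.setCharacterEncoding("UTF-8"); and response.setCharacterEncoding("utf-8"); before
PrintWriter out = response.getWriter();
because once the out object has been initialized, setting the encoding no longer has any effect. The encoding must therefore be set before the out object is initialized.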
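
On the Java side, the decode -> convert -> re-encode flow described at the start of this article can be sketched with the standard java.net.URLDecoder and java.net.URLEncoder classes. This is only a minimal sketch and not code from the original article: the class name UrlRecodeSketch and the helper recode are hypothetical, and it assumes Java 10 or later (for the Charset overloads) and a JDK that ships the GB2312 charset.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
class UrlRecodeSketch {

    // URL-decode a value using the sender's charset, then URL-encode the
    // resulting string again using the receiver's charset.
    static String recode(String encoded, String fromCharset, String toCharset) {
        String text = URLDecoder.decode(encoded, Charset.forName(fromCharset));
        return URLEncoder.encode(text, Charset.forName(toCharset));
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the two-character string 中文, written with
        // Unicode escapes so the source file encoding does not matter.
        String word = "\u4e2d\u6587";

        // The same text yields different percent-encoded bytes per charset.
        String asUtf8 = URLEncoder.encode(word, Charset.forName("UTF-8"));
        String asGb2312 = URLEncoder.encode(word, Charset.forName("GB2312"));
        System.out.println(asUtf8);   // %E4%B8%AD%E6%96%87
        System.out.println(asGb2312); // %D6%D0%CE%C4

        // Decoding with the wrong charset does not give back the original text.
        String garbled = URLDecoder.decode(asUtf8, Charset.forName("GB2312"));
        System.out.println(garbled.equals(word)); // false

        // Decoding with the sender's charset and re-encoding for the receiver
        // produces the value a GB2312 page expects.
        System.out.println(recode(asUtf8, "UTF-8", "GB2312")); // %D6%D0%CE%C4
    }
}
```

The point of the sketch is that the percent-encoding step itself is identical on both sides. Only the byte encoding underneath differs, which is why the value has to be decoded with the sender's charset before it is encoded again for the receiver.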