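Application development frequently involves talking to a server over HTTP or HTTPS. A common requirement in that exchange is to URL-encode parts of the URL so that Chinese text and special characters are transmitted correctly.
1. URLEncode: If a parameter value in a request sent to the server contains special characters, it must be URL-encoded so that the server can parse it correctly; otherwise the request may fail. URL encoding replaces such characters with percent-escape sequences; for example, & becomes %26.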
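The URLEncoder class, available since JDK 1.0, encodes a String into the "%xx" form. Java 1.2 added the URLDecoder class, which decodes Strings in that form. The original single-argument encode method always used the platform's default encoding, so it could produce different results on different systems; Java 1.4 deprecated it in favor of an overload that takes the encoding name as a second argument.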
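To illustrate why the encoding argument matters, here is a minimal sketch that encodes the same text with two different charsets. The class name CharsetMattersDemo and the helper encodeWith are hypothetical, and the sample text "caf\u00e9" (the word "cafe" with an acute accent on the e) is an assumption chosen because both charsets are guaranteed to exist on every JVM.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class CharsetMattersDemo {
    // Hypothetical helper: encodes text with an explicitly named charset.
    static String encodeWith(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        // The same character becomes two bytes in UTF-8 but one byte in ISO-8859-1.
        System.out.println(encodeWith(text, "UTF-8"));      // caf%C3%A9
        System.out.println(encodeWith(text, "ISO-8859-1")); // caf%E9
    }
}
```

Client and server have to agree on the charset; otherwise the server decodes the bytes into the wrong characters.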
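Note in particular that this method also encodes punctuation such as "/", "&", "=" and ":", and converts the space character " " into "+". It makes no attempt to work out how these characters are being used within a URL. As a result, you have to encode a URL piece by piece rather than passing the whole URL to the method at once. This matters because programs that talk to a server through GET usually build a query string, and the keywords in such a query often contain Chinese or special characters.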
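The piece-by-piece approach can be sketched for a query string as follows. The class QueryStringSketch, the helpers enc and buildQuery, the parameter names and the example.com address are all hypothetical; "\u4e2d\u56fd" is the two-character sample string 中国 written with Unicode escapes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryStringSketch {
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Encodes every name and value on its own, then joins the pairs with '&'.
    // The '&' and '=' separators are added afterwards, so they stay unencoded.
    static String buildQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "a&b=c d");
        params.put("kw", "\u4e2d\u56fd");
        // prints http://example.com/search?q=a%26b%3Dc+d&kw=%E4%B8%AD%E5%9B%BD
        System.out.println("http://example.com/search?" + buildQuery(params));
    }
}
```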
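URLEncode, rules for encoding a URL:
(1) Letters and digits are left unchanged (as are "-", "_", "." and "*"); Chinese characters are encoded.
(2) A space becomes "+".
(3) Every other character is converted to bytes in the chosen charset, and each byte is written as "%xx", where xx is the byte's value in hexadecimal.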
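The rules above can be checked with a short sketch. The class name EncodeRulesDemo and the helper enc are hypothetical; "\u534e" is the single Chinese character 华 written as a Unicode escape.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EncodeRulesDemo {
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(enc("abcXYZ019")); // abcXYZ019 (letters and digits unchanged)
        System.out.println(enc("-_.*"));      // -_.*      (the four safe punctuation marks)
        System.out.println(enc("a b"));       // a+b       (space becomes '+')
        System.out.println(enc("&=:/?"));     // %26%3D%3A%2F%3F
        System.out.println(enc("+"));         // %2B       (a literal plus is escaped)
        System.out.println(enc("\u534e"));    // %E5%8D%8E (three UTF-8 bytes)
    }
}
```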
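2. URLDecode: The URLDecoder class, the counterpart of URLEncoder, has two static methods. They decode Strings encoded in the x-www-form-urlencoded format; that is, they convert every plus sign (+) to a space and every %xx sequence to the corresponding character.
URLDecode, notes on decoding a URL:
(1) When decoding on a web page, Request.QueryString() already decodes automatically, so there is no need to call URLDecode again.
(2) When the call comes from elsewhere, for example an Android client calling a .NET WebService, the decoding step has to be done explicitly.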
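The one difference between decoding and encoding is that decoding only converts the %xx hexadecimal sequences (the encoded Chinese characters), plus any "+" into a space, and leaves the forward slash /, English letters and digits untouched. There is therefore no need to split() the path on / before decoding; the whole path can be passed in directly.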
For example: String decode = URLDecoder.decode("http://www.dbank.com/documents/%E5%9B%BE%E7%89%87/%E5%8D%8E%E4%BB%94.jpg", "UTF-8");
// prints http://www.dbank.com/documents/图片/华仔.jpg
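Two decoding details are easy to overlook: a raw "+" is turned into a space, and a malformed escape sequence is rejected with an IllegalArgumentException. The following sketch shows both; the class name DecodePitfallsDemo and the helpers dec and isMalformed are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DecodePitfallsDemo {
    static String dec(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Returns true when URLDecoder rejects the input as a malformed escape.
    static boolean isMalformed(String s) {
        try {
            dec(s);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(dec("a+b"));          // a b  ('+' decodes to a space)
        System.out.println(dec("a%2Bb"));        // a+b  (an escaped plus survives)
        System.out.println(dec("a%20b"));        // a b
        System.out.println(isMalformed("100%")); // true (incomplete trailing escape)
        System.out.println(isMalformed("%zz"));  // true (not hexadecimal digits)
    }
}
```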
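II. Usage scenario:
Suppose we want to download an image whose path is http://www.dbank.com/documents/图片/华仔.jpg. Passing this string straight to new URL(path) does not give a reliable network connection: java.net.URL does not escape anything itself, so the Chinese characters would be sent unencoded. The path therefore has to be encoded first.
The charset is usually GBK or UTF-8. URL encoding is not the same thing as GBK or UTF-8 encoding, though: it does not transform every character, but uses UTF-8 or GBK only to encode the characters that are not letters or digits.
Note when encoding: do not pass the whole URL to URLEncoder's encode method, because it also encodes special characters such as the forward slash / and the colon :. The scheme and domain part needs no encoding, so take only the path part, /图片/华仔.jpg, split it with split(), and encode each segment separately. The algorithm is as follows: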
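// Encodes each path segment separately so that the "/" separators survive.
// Pass only the path part (for example "/图片/华仔.jpg"), not the scheme and host.
public static String formatUrl(String url) throws UnsupportedEncodingException {
    String[] dir = url.split("/");
    StringBuilder tempPath = new StringBuilder();
    for (int i = 0; i < dir.length; i++) {
        // Put the separator between segments instead of trimming it afterwards,
        // which would fail on an input such as "/" that splits into no segments.
        if (i > 0) {
            tempPath.append("/");
        }
        tempPath.append(URLEncoder.encode(dir[i], "UTF-8"));
    }
    return tempPath.toString();
}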
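Once the encoded path is appended to the unencoded scheme and host, the final result is, for example, "http://www.dbank.com/documents/%E5%9B%BE%E7%89%87/%E5%8D%8E%E4%BB%94.jpg". This string can now be passed to new URL(url).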
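One detail the per-segment approach leaves open is the space character: URLEncoder writes it as "+", which is a form-encoding convention, whereas inside a URL path a space is normally written as "%20". The sketch below is one possible variant under that assumption; the class PathSegmentEncoder and its methods are hypothetical, and the Chinese segments are written as Unicode escapes (图片 is "\u56fe\u7247", 华仔 is "\u534e\u4ed4").

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class PathSegmentEncoder {
    // Encodes one segment; every '+' in the output stems from a space,
    // because URLEncoder escapes a literal plus sign as %2B.
    static String encodeSegment(String segment) {
        try {
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Splits on '/', encodes each segment, and joins again.
    // The limit -1 keeps a trailing empty segment, so "/x/" stays "/x/".
    static String encodePath(String path) {
        String[] segments = path.split("/", -1);
        for (int i = 0; i < segments.length; i++) {
            segments[i] = encodeSegment(segments[i]);
        }
        return String.join("/", segments);
    }

    public static void main(String[] args) {
        String prefix = "http://www.dbank.com"; // scheme and host stay unencoded
        String path = "/documents/\u56fe\u7247/\u534e\u4ed4.jpg";
        // prints http://www.dbank.com/documents/%E5%9B%BE%E7%89%87/%E5%8D%8E%E4%BB%94.jpg
        System.out.println(prefix + encodePath(path));
    }
}
```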
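Something to think about: why can a URL containing Chinese characters be requested directly in a browser? Because the browser checks the address internally and URL-encodes whatever it is given before sending the request, so a URL containing Chinese characters works.
Summary:
For a request made through a URL object to work reliably, the address should meet two conditions: (1) it contains no unencoded Chinese characters; (2) it follows the URL format, i.e. http://xx.com/a/b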
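As an alternative to encoding by hand, the JDK's java.net.URI class can produce an address that meets both conditions. This is a different technique from the URLEncoder approach described in this article: the multi-argument URI constructor quotes characters that are illegal in each component, and toASCIIString() percent-encodes the remaining non-ASCII characters as UTF-8. The class UriBuildSketch and the helper toAsciiUrl are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriBuildSketch {
    // Builds an ASCII-only URL string from separate, unencoded components.
    static String toAsciiUrl(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "\u56fe\u7247" and "\u534e\u4ed4" are the two Chinese path segments.
        String url = toAsciiUrl("http", "www.dbank.com",
                "/documents/\u56fe\u7247/\u534e\u4ed4.jpg");
        // prints http://www.dbank.com/documents/%E5%9B%BE%E7%89%87/%E5%8D%8E%E4%BB%94.jpg
        System.out.println(url);
    }
}
```

Because the components are passed separately, the "/" and ":" separators are never at risk of being escaped, and a space in the path comes out as %20 rather than "+".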
III. Usage in detail:
public class URLDecoder extends Object
public class URLEncoder extends Object
// Encode:
String encode = URLEncoder.encode("中国", "UTF-8");
// Decode:
String decode = URLDecoder.decode(encode, "UTF-8");
When these two statements run one after the other, the results are:
encode: %E4%B8%AD%E5%9B%BD
decode: 中国
Simple wrappers around the encode and decode methods:
public static String toURLEncoded(String paramString) {
    if (paramString == null || paramString.equals("")) {
        Log.d("toURLEncoded error:" + paramString);
        return "";
    }
    try {
        // Encode the String directly. Round-tripping it through
        // new String(paramString.getBytes(), "UTF-8") would corrupt non-ASCII
        // text whenever the platform default charset is not UTF-8.
        return URLEncoder.encode(paramString, "UTF-8");
    } catch (Exception localException) {
        Log.e("toURLEncoded error:" + paramString, localException);
    }
    return "";
}
public static String toURLDecoded(String paramString) {
    if (paramString == null || paramString.equals("")) {
        Log.d("toURLDecoded error:" + paramString);
        return "";
    }
    try {
        // Decode the String directly; re-reading its bytes with a fixed
        // charset first would corrupt it on platforms whose default charset
        // is not UTF-8.
        return URLDecoder.decode(paramString, "UTF-8");
    } catch (Exception localException) {
        // Also reached for malformed input such as a dangling "%".
        Log.e("toURLDecoded error:" + paramString, localException);
    }
    return "";
}
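On Java 10 or later (an assumption about the runtime; the article itself targets the older String-based API), URLEncoder and URLDecoder also offer overloads that take a Charset. They throw no checked exception, so the wrappers can shrink considerably. The class name CharsetOverloadSketch is hypothetical, and logging is left out.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class CharsetOverloadSketch {
    // Returns "" for null or empty input, mirroring the wrappers above.
    static String toURLEncoded(String s) {
        if (s == null || s.isEmpty()) {
            return "";
        }
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Malformed escapes still raise IllegalArgumentException here.
    static String toURLDecoded(String s) {
        if (s == null || s.isEmpty()) {
            return "";
        }
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = toURLEncoded("\u4e2d\u56fd");
        System.out.println(encoded); // %E4%B8%AD%E5%9B%BD
        System.out.println(toURLDecoded(encoded).equals("\u4e2d\u56fd")); // true
    }
}
```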
IV. Source implementation:
Source of the URLEncoder.encode(String s, String enc) and URLDecoder.decode() methods in the JDK:
public static String encode(String s, String enc)
    throws UnsupportedEncodingException {
    boolean needToChange = false;
    StringBuffer out = new StringBuffer(s.length());
    Charset charset;
    CharArrayWriter charArrayWriter = new CharArrayWriter();
    if (enc == null)
        throw new NullPointerException("charsetName");
    try {
        charset = Charset.forName(enc);
    } catch (IllegalCharsetNameException e) {
        throw new UnsupportedEncodingException(enc);
    } catch (UnsupportedCharsetException e) {
        throw new UnsupportedEncodingException(enc);
    }
    for (int i = 0; i < s.length();) {
        int c = (int) s.charAt(i);
        //System.out.println("Examining character: " + c);
        if (dontNeedEncoding.get(c)) {
            if (c == ' ') {
                c = '+';
                needToChange = true;
            }
            //System.out.println("Storing: " + c);
            out.append((char)c);
            i++;
        } else {
            // convert to external encoding before hex conversion
            do {
                charArrayWriter.write(c);
                /*
                 * If this character represents the start of a Unicode
                 * surrogate pair, then pass in two characters. It's not
                 * clear what should be done if bytes reserved in the
                 * surrogate pairs range occur outside of a legal
                 * surrogate pair. For now, just treat it as if it were
                 * any other character.
                 */
                if (c >= 0xD800 && c <= 0xDBFF) {
                    if ((i+1) < s.length()) {
                        int d = (int) s.charAt(i+1);
                        if (d >= 0xDC00 && d <= 0xDFFF) {
                            charArrayWriter.write(d);
                            i++;
                        }
                    }
                }
                i++;
            } while (i < s.length() && !dontNeedEncoding.get((c = (int) s.charAt(i))));
            charArrayWriter.flush();
            String str = new String(charArrayWriter.toCharArray());
            byte[] ba = str.getBytes(charset);
            for (int j = 0; j < ba.length; j++) {
                out.append('%');
                char ch = Character.forDigit((ba[j] >> 4) & 0xF, 16);
                // converting to use uppercase letter as part of
                // the hex value if ch is a letter.
                if (Character.isLetter(ch)) {
                    ch -= caseDiff;
                }
                out.append(ch);
                ch = Character.forDigit(ba[j] & 0xF, 16);
                if (Character.isLetter(ch)) {
                    ch -= caseDiff;
                }
                out.append(ch);
            }
            charArrayWriter.reset();
            needToChange = true;
        }
    }
    return (needToChange? out.toString() : s);
}
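The innermost step of the method, turning each byte into "%" followed by two uppercase hexadecimal digits, can be isolated in a small sketch. This is a simplified illustration, not the JDK code: the class HexEscapeSketch and its methods are hypothetical, it escapes every byte without consulting a safe-character set, and it uses Character.toUpperCase where the JDK subtracts caseDiff.

```java
import java.nio.charset.StandardCharsets;

class HexEscapeSketch {
    // Formats one byte as "%XX": high nibble first, then low nibble.
    static String escapeByte(byte b) {
        char hi = Character.toUpperCase(Character.forDigit((b >> 4) & 0xF, 16));
        char lo = Character.toUpperCase(Character.forDigit(b & 0xF, 16));
        return "%" + hi + lo;
    }

    // Escapes every UTF-8 byte of the input, including safe characters.
    static String escapeAll(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            out.append(escapeByte(b));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeByte((byte) 0xE4)); // %E4
        System.out.println(escapeAll("\u4e2d"));     // %E4%B8%AD
    }
}
```

The mask with 0xF matters because Java bytes are signed: for 0xE4 the shift alone would yield a negative number, and the mask keeps only the four bits of interest.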
Here dontNeedEncoding is initialised in a static initializer block, as follows:
static BitSet dontNeedEncoding;
static final int caseDiff = ('a' - 'A');
static String dfltEncName = null;
static {
    /* The list of characters that are not encoded has been
     * determined as follows:
     *
     * RFC 2396 states:
     * -----
     * Data characters that are allowed in a URI but do not have a
     * reserved purpose are called unreserved. These include upper
     * and lower case letters, decimal digits, and a limited set of
     * punctuation marks and symbols.
     *
     * unreserved = alphanum | mark
     *
     * mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
     *
     * Unreserved characters can be escaped without changing the
     * semantics of the URI, but this should not be done unless the
     * URI is being used in a context that does not allow the
     * unescaped character to appear.
     * -----
     *
     * It appears that both Netscape and Internet Explorer escape
     * all special characters from this list with the exception
     * of "-", "_", ".", "*". While it is not clear why they are
     * escaping the other characters, perhaps it is safest to
     * assume that there might be contexts in which the others
     * are unsafe if not escaped. Therefore, we will use the same
     * list. It is also noteworthy that this is consistent with
     * O'Reilly's "HTML: The Definitive Guide" (page 164).
     *
     * As a last note, Internet Explorer does not encode the "@"
     * character which is clearly not unreserved according to the
     * RFC. We are being consistent with the RFC in this matter,
     * as is Netscape.
     *
     */
    dontNeedEncoding = new BitSet(256);
    int i;
    for (i = 'a'; i <= 'z'; i++) {
        dontNeedEncoding.set(i);
    }
    for (i = 'A'; i <= 'Z'; i++) {
        dontNeedEncoding.set(i);
    }
    for (i = '0'; i <= '9'; i++) {
        dontNeedEncoding.set(i);
    }
    dontNeedEncoding.set(' '); /* encoding a space to a + is done
                                * in the encode() method */
    dontNeedEncoding.set('-');
    dontNeedEncoding.set('_');
    dontNeedEncoding.set('.');
    dontNeedEncoding.set('*');
    dfltEncName = AccessController.doPrivileged(
        new GetPropertyAction("file.encoding")
    );
}
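The effect of the dontNeedEncoding set can be observed from the outside without reading the source. The following sketch probes every ASCII character and collects those that URLEncoder returns unchanged; the class name SafeCharsProbe and its methods are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SafeCharsProbe {
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Collects, in ASCII order, every character that encode() leaves as-is.
    // The space is not among them: it is in the BitSet but is rewritten to '+'.
    static String unchangedAscii() {
        StringBuilder sb = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (enc(s).equals(s)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints *-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
        System.out.println(unchangedAscii());
    }
}
```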