Day13_02_Encryption and Decryption in Java: URL Encoding

一一哥Sun  Published: 2019-06-27 22:05:08

02_Encryption and Decryption in Java: URL Encoding (URLEncode and URLDecode)

I. Introduction:

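During application development we frequently talk to servers over HTTP or HTTPS. A common requirement is to URL-encode parts of the request so that Chinese text and special characters are transmitted correctly.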

1. URLEncode:

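If a parameter value in a request sent to the server contains special characters, it must be URL-encoded so that the server can parse it correctly; otherwise the request may be misinterpreted. URL encoding replaces special characters with escape sequences; for example, & becomes %26.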

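Java provides the class URLEncoder (available since JDK 1.0) to encode a String into the "%xx" form, and Java 1.2 added the class URLDecoder, which decodes a String from that form. The original single-argument encode(String) method always used the platform's default encoding, so it could produce different results on different systems. In Java 1.4 it was deprecated in favor of encode(String s, String enc), which takes the charset name explicitly.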
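To make the charset dependence concrete, here is a minimal sketch that encodes the same one-character string with two explicitly named charsets. The class name CharsetMattersDemo and the helper encodeWith are made up for illustration; the string uses a \u escape so the result does not depend on the source file's encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class CharsetMattersDemo {
    // Hypothetical helper: encodes text with an explicitly named charset and
    // rethrows the checked exception as an unchecked one to keep the demo short.
    static String encodeWith(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // the single character e-acute
        // Same input, different charsets, different escapes.
        System.out.println(encodeWith(text, "UTF-8"));      // %C3%A9
        System.out.println(encodeWith(text, "ISO-8859-1")); // %E9
    }
}
```

The same effect occurs with Chinese text under UTF-8 versus GBK, which is why naming the charset explicitly is the safer habit.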

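Note in particular that this method also encodes the symbols "/", "&", "=" and ":", and converts a space " " to "+". It makes no attempt to work out how these characters are being used within a URL. As a result, you have to encode a URL piece by piece rather than passing the whole URL to the method at once. This matters because we often build query strings to talk to server-side programs over GET, and such queries frequently contain Chinese text or special characters.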
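One way to encode piece by piece is sketched below, under the assumption that the parameters arrive as a map. The class QueryStringDemo, its helpers enc and buildQuery, and the example.com address are hypothetical names used only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class QueryStringDemo {
    // Hypothetical helper: UTF-8 encode one piece (a key or a value).
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Encodes each key and value separately, then joins them with the
    // literal separators "=" and "&", which therefore stay unencoded.
    static String buildQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "a&b=c");
        params.put("name", "\u4e2d\u56fd"); // the two characters for "China"
        System.out.println(buildQuery(params)); // q=a%26b%3Dc&name=%E4%B8%AD%E5%9B%BD

        // Encoding a whole URL in one call destroys its structure:
        System.out.println(enc("http://example.com/a?b=c")); // http%3A%2F%2Fexample.com%2Fa%3Fb%3Dc
    }
}
```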

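URLEncode, encoding a URL:

 1. ASCII letters and digits are left unchanged, as are "-", "_", "." and "*"; Chinese characters are encoded.
 2. A space becomes "+".
 3. Every other character is converted to bytes in the chosen charset, and each byte is written as "%xx", where xx is its hexadecimal value.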
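The rules above can be checked with a few small inputs. This is only an illustrative sketch; the class name EncodeRulesDemo and the helper enc are made up here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EncodeRulesDemo {
    // Hypothetical helper: UTF-8 encode without a checked exception.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(enc("abc123")); // abc123  (letters and digits unchanged)
        System.out.println(enc("-_.*"));   // -_.*    (the four unreserved marks)
        System.out.println(enc("a b"));    // a+b     (space becomes +)
        System.out.println(enc("a+b"));    // a%2Bb   (a literal + is escaped)
        System.out.println(enc("\u4e2d")); // %E4%B8%AD (three UTF-8 bytes)
    }
}
```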

2. URLDecode:

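Corresponding to URLEncoder, the URLDecoder class has two static methods: decode(String s), which is deprecated, and decode(String s, String enc). They decode strings in the application/x-www-form-urlencoded format: every plus sign (+) becomes a space, and every %xx sequence becomes the corresponding character.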
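A short sketch of that decoding behavior follows. The class name DecodeRulesDemo and the helper dec are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class DecodeRulesDemo {
    // Hypothetical helper: UTF-8 decode without a checked exception.
    static String dec(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dec("a+b%26c")); // a b&c  (+ becomes space, %26 becomes &)
        System.out.println(dec("1%2B1"));   // 1+1    (an escaped plus stays a plus)
        // %E4%B8%AD%E5%9B%BD decodes to the two characters U+4E2D U+56FD
        System.out.println(dec("%E4%B8%AD%E5%9B%BD").equals("\u4e2d\u56fd")); // true
    }
}
```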

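URLDecode, decoding a URL:

1. When decoding in a web page, Request.QueryString() already decodes the value for you, so there is no need to call URLDecode again.

2. When the call comes from somewhere else, for example an Android client calling a .NET WebService, you need to decode once yourself.

The one difference between decoding and encoding is that decoding only touches %xx hexadecimal escapes (such as encoded Chinese characters) and "+" signs; forward slashes /, English letters and digits are left as they are. So there is no need to split() the path on / before decoding; you can pass the whole path in directly.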

For example:
String decode = URLDecoder.decode("http://www.dbank.com/documents/%E5%9B%BE%E7%89%87/%E5%8D%8E%E4%BB%94.jpg","UTF-8");  
// prints: http://www.dbank.com/documents/图片/华仔.jpg
II. Use cases:

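Suppose we want to download an image whose path is http://www.dbank.com/documents/图片/华仔.jpg. Passing this string straight to new URL(path) will generally not give a working connection, because a URL containing raw Chinese characters should not be requested as-is from a program. So we have to encode it first.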

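We usually encode with GBK or UTF-8. URL encoding differs from plain GBK or UTF-8 encoding in that it does not encode every character: letters and digits (and a few punctuation marks) pass through unchanged, and only the remaining characters are converted to bytes using UTF-8 or GBK and then percent-escaped.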

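A caution when encoding: do not pass the whole URL to URLEncoder's encode method, because it also encodes special characters such as the forward slash / and the colon :. The scheme and domain do not need encoding, so take out just the path portion, /图片/华仔.jpg, split it on / with split(), and encode each piece separately, as follows: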

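import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Pass only the path portion (e.g. "/图片/华仔.jpg"), not the scheme or host.
public static String formatUrl(String path) throws UnsupportedEncodingException {
    // limit -1 keeps trailing empty segments, so a trailing "/" survives
    String[] dir = path.split("/", -1);
    StringBuilder tempPath = new StringBuilder();
    for (int i = 0; i < dir.length; i++) {
        if (i > 0) {
            tempPath.append("/");
        }
        // URLEncoder writes a space as "+"; inside a URL path it must be "%20"
        tempPath.append(URLEncoder.encode(dir[i], "UTF-8").replace("+", "%20"));
    }
    return tempPath.toString();
}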

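The final result, after putting the scheme and host back in front of the encoded path, is for example "http://www.dbank.com/documents/%E5%9B%BE%E7%89%87/%E5%8D%8E%E4%BB%94.jpg". This string can now be passed to new URL(url).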
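The whole flow (keep the scheme and host, encode only the path segments, then join) might look like the sketch below. The class PathEncodeDemo and the helper encodePath are hypothetical names, and the Chinese path is written with \u escapes so the example does not depend on the source file's encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

final class PathEncodeDemo {
    // Hypothetical helper: percent-encodes each path segment separately so
    // that the "/" separators are preserved.
    static String encodePath(String path) {
        String[] segments = path.split("/", -1);
        StringBuilder out = new StringBuilder();
        try {
            for (int i = 0; i < segments.length; i++) {
                if (i > 0) {
                    out.append('/');
                }
                // "+" means space only in form data; a path needs "%20"
                out.append(URLEncoder.encode(segments[i], "UTF-8").replace("+", "%20"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "/documents/图片/华仔.jpg" written with Unicode escapes
        String path = "/documents/\u56fe\u7247/\u534e\u4ed4.jpg";
        String url = "http://www.dbank.com" + encodePath(path);
        System.out.println(url);
        // http://www.dbank.com/documents/%E5%9B%BE%E7%89%87/%E5%8D%8E%E4%BB%94.jpg

        // The result is plain ASCII and parses as a URI.
        System.out.println(URI.create(url).getHost()); // www.dbank.com
    }
}
```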

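Question: why can we request a URL containing Chinese characters directly in a browser? Because the browser checks the address internally and converts any URL to its URL-encoded form before sending it, so addresses containing Chinese work.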
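A program can get similar behavior from the standard library instead of splitting the path by hand. The sketch below uses java.net.URI, which is a different technique from the URLEncoder approach in this article: the multi-argument constructor quotes illegal characters, and toASCIIString() percent-encodes non-ASCII characters as UTF-8. The class name UriAsciiDemo and the helper toAsciiUrl are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriAsciiDemo {
    // Hypothetical helper: builds a URI from its parts and returns an
    // ASCII-only string with non-ASCII characters percent-encoded.
    static String toAsciiUrl(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + e.getInput(), e);
        }
    }

    public static void main(String[] args) {
        // "/documents/图片/华仔.jpg" written with Unicode escapes
        String path = "/documents/\u56fe\u7247/\u534e\u4ed4.jpg";
        System.out.println(toAsciiUrl("http", "www.dbank.com", path));
        // http://www.dbank.com/documents/%E5%9B%BE%E7%89%87/%E5%8D%8E%E4%BB%94.jpg
    }
}
```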

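Summary:

For a URL object to be created and requested successfully, the address should satisfy two conditions: 1. it contains no unencoded Chinese characters; 2. it follows the URL format, i.e. http://xx.com/a/b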

III. Usage:
public class URLDecoder extends Object
public class URLEncoder extends Object

// Encode:
String  encode  =  URLEncoder.encode("中国","UTF-8");   
// Decode:
String  decode  =  URLDecoder.decode(encode,"UTF-8");   
    
 Run one after the other, these two statements give:   
 encode:   %E4%B8%AD%E5%9B%BD     
 decode:   中国  
  
A simple wrapper around the encode and decode methods:
// Log is assumed to be a project-specific logging helper; it is not shown in the original.
public static String toURLEncoded(String paramString) {
    if (paramString == null || paramString.isEmpty()) {
        Log.d("toURLEncoded error:" + paramString);
        return "";
    }
    try {
        // Encode the String directly. Round-tripping it through getBytes()
        // (platform default charset) can corrupt non-ASCII text.
        return URLEncoder.encode(paramString, "UTF-8");
    } catch (Exception localException) {
        Log.e("toURLEncoded error:" + paramString, localException);
    }
    return "";
}
public static String toURLDecoded(String paramString) {
    if (paramString == null || paramString.isEmpty()) {
        Log.d("toURLDecoded error:" + paramString);
        return "";
    }
    try {
        // Decode the String directly, for the same reason as above.
        return URLDecoder.decode(paramString, "UTF-8");
    } catch (Exception localException) {
        Log.e("toURLDecoded error:" + paramString, localException);
    }
    return "";
}
IV. Source implementation:

Source code of URLEncoder.encode(String s, String enc) and URLDecoder.decode() in the JDK:

public static String encode(String s, String enc)    
  throws UnsupportedEncodingException {
        boolean needToChange = false;        
        StringBuffer out = new StringBuffer(s.length());
        Charset charset;
        CharArrayWriter charArrayWriter = new CharArrayWriter();

        if (enc == null)
            throw new NullPointerException("charsetName");

        try {
            charset = Charset.forName(enc);
        } catch (IllegalCharsetNameException e) {
            throw new UnsupportedEncodingException(enc);
        } catch (UnsupportedCharsetException e) {
            throw new UnsupportedEncodingException(enc);
        }

        for (int i = 0; i < s.length();) {
            int c = (int) s.charAt(i);
            //System.out.println("Examining character: " + c);
            if (dontNeedEncoding.get(c)) {
                if (c == ' ') {
                    c = '+';
                    needToChange = true;
                }
                //System.out.println("Storing: " + c);
                out.append((char)c);
                i++;
            } else {
                // convert to external encoding before hex conversion
                do {
                    charArrayWriter.write(c);
                    /*
                     * If this character represents the start of a Unicode
                     * surrogate pair, then pass in two characters. It's not
                     * clear what should be done if a bytes reserved in the
                     * surrogate pairs range occurs outside of a legal
                     * surrogate pair. For now, just treat it as if it were
                     * any other character.
                     */
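                    if (c >= 0xD800 && c <= 0xDBFF) {
                        if ((i + 1) < s.length()) {
                            int d = (int) s.charAt(i + 1);
                            if (d >= 0xDC00 && d <= 0xDFFF) {
                                charArrayWriter.write(d);
                                i++;
                            }
                        }
                    }
                    i++;
                } while (i < s.length() && !dontNeedEncoding.get((c = (int) s.charAt(i))));

                charArrayWriter.flush();
                String str = new String(charArrayWriter.toCharArray());
                byte[] ba = str.getBytes(charset);
                for (int j = 0; j < ba.length; j++) {
                    out.append('%');
                    char ch = Character.forDigit((ba[j] >> 4) & 0xF, 16);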
                    // converting to use uppercase letter as part of
                    // the hex value if ch is a letter.
                    if (Character.isLetter(ch)) {
                        ch -= caseDiff;
                    }
                    out.append(ch);
                    ch = Character.forDigit(ba[j] & 0xF, 16);
                    if (Character.isLetter(ch)) {
                        ch -= caseDiff;
                    }
                    out.append(ch);
                }
                charArrayWriter.reset();
                needToChange = true;
            }
        }

        return (needToChange? out.toString() : s);
    }
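The hex-conversion step in the middle of encode() can be isolated into a small sketch. This is not the JDK code: unlike URLEncoder it escapes every byte, including letters and digits, and it uses Character.toUpperCase instead of the caseDiff subtraction. The class name PercentHexDemo and the method percentEncodeBytes are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class PercentHexDemo {
    // Hypothetical helper: converts the text to UTF-8 bytes and writes each
    // byte as "%" plus two uppercase hex digits (high nibble first).
    static String percentEncodeBytes(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder(bytes.length * 3);
        for (byte b : bytes) {
            out.append('%');
            out.append(Character.toUpperCase(Character.forDigit((b >> 4) & 0xF, 16)));
            out.append(Character.toUpperCase(Character.forDigit(b & 0xF, 16)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncodeBytes("A"));      // %41
        System.out.println(percentEncodeBytes("\u4e2d")); // %E4%B8%AD
        // A surrogate pair (U+1F600) is one code point and yields four bytes.
        System.out.println(percentEncodeBytes("\uD83D\uDE00")); // %F0%9F%98%80
    }
}
```

The surrogate-pair case shows why the JDK loop passes both halves of a pair to the charset encoder together: encoded separately, neither half is a valid character on its own.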

Here dontNeedEncoding is initialized in a static initializer block, as follows:

     static BitSet dontNeedEncoding;
    static final int caseDiff = ('a' - 'A');
    static String dfltEncName = null;

    static {

        /* The list of characters that are not encoded has been
         * determined as follows:
         *
         * RFC 2396 states:
         * -----
         * Data characters that are allowed in a URI but do not have a
         * reserved purpose are called unreserved.  These include upper
         * and lower case letters, decimal digits, and a limited set of
         * punctuation marks and symbols.
         *
         * unreserved  = alphanum | mark
         *
         * mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
         *
         * Unreserved characters can be escaped without changing the
         * semantics of the URI, but this should not be done unless the
         * URI is being used in a context that does not allow the
         * unescaped character to appear.
         * -----
         *
         * It appears that both Netscape and Internet Explorer escape
         * all special characters from this list with the exception
         * of "-", "_", ".", "*". While it is not clear why they are
         * escaping the other characters, perhaps it is safest to
         * assume that there might be contexts in which the others
         * are unsafe if not escaped. Therefore, we will use the same
         * list. It is also noteworthy that this is consistent with
         * O'Reilly's "HTML: The Definitive Guide" (page 164).
         *
         * As a last note, Intenet Explorer does not encode the "@"
         * character which is clearly not unreserved according to the
         * RFC. We are being consistent with the RFC in this matter,
         * as is Netscape.
         *
         */

        dontNeedEncoding = new BitSet(256);
        int i;
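        for (i = 'a'; i <= 'z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = 'A'; i <= 'Z'; i++) {
            dontNeedEncoding.set(i);
        }
        for (i = '0'; i <= '9'; i++) {
            dontNeedEncoding.set(i);
        }
        dontNeedEncoding.set(' '); /* encoding a space to a + is done
                                    * in the encode() method */
        dontNeedEncoding.set('-');
        dontNeedEncoding.set('_');
        dontNeedEncoding.set('.');
        dontNeedEncoding.set('*');

        dfltEncName = AccessController.doPrivileged(
            new GetPropertyAction("file.encoding")
        );
    }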