URL Encoding and Best Practices
URL encoding, also known as percent-encoding, represents a byte in a URL as a % sign followed by two hexadecimal digits, for example %20 for a space.
Unreserved Characters
The following characters are unreserved characters:
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~'
URI producers should not percent-encode unreserved characters. Any other character that appears as data should first be converted to UTF-8, and each resulting byte should then be percent-encoded.
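Python's urllib.parse.quote (Python 3.7 and later) treats exactly this set as always safe, so it can be used to check the list. A minimal sketch, passing safe="" because quote otherwise exempts / by default:

```python
import string
from urllib.parse import quote

# RFC 3986 unreserved set: letters, digits, '-', '_', '.', '~'
UNRESERVED = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_.~"

# With safe="" nothing outside the unreserved set is exempt from encoding
assert quote(UNRESERVED, safe="") == UNRESERVED

# Every other printable ASCII character gets percent-encoded
others = [c for c in string.printable if c not in UNRESERVED]
assert all(quote(c, safe="").startswith("%") for c in others)

print(quote(" /?#", safe=""))  # %20%2F%3F%23
```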
Encoding Rules
- Unreserved characters remain unchanged
- All other characters are first converted to UTF-8
- Each UTF-8 byte is then written as % followed by its two hexadecimal digits
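The three rules translate almost line for line into code. The following is a from-scratch sketch; percent_encode is a hypothetical helper name, not a library function. It writes uppercase hex digits, which RFC 3986 recommends for consistency:

```python
# Hypothetical helper: a from-scratch percent-encoder following the three rules.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~"
)


def percent_encode(text: str) -> str:
    out = []
    for ch in text:
        if ch in UNRESERVED:
            # Rule 1: unreserved characters are kept unchanged
            out.append(ch)
        else:
            # Rule 2: convert the character to its UTF-8 bytes
            # Rule 3: write each byte as '%' plus two uppercase hex digits
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(percent_encode("山月"))     # %E5%B1%B1%E6%9C%88
print(percent_encode("a b/c~"))  # a%20b%2Fc~
```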
Conversion Process
For example, for the Chinese characters 山月:
- The UTF-8 encoding of 山月 is E5 B1 B1 E6 9C 88 (see Unicode To UTF-8).
- Prefix each UTF-8 byte with a percent sign: %E5%B1%B1%E6%9C%88
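The same two steps can be traced in Python, showing that each of these characters takes three UTF-8 bytes. This sketch assumes Python 3.8 or later for the separator argument of bytes.hex:

```python
text = "山月"

# Step 1: each character becomes three UTF-8 bytes
for ch in text:
    print(ch, f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" ").upper())
# 山 U+5C71 E5 B1 B1
# 月 U+6708 E6 9C 88

# Step 2: prefix every byte of the whole string with '%'
encoded = "".join(f"%{b:02X}" for b in text.encode("utf-8"))
print(encoded)  # %E5%B1%B1%E6%9C%88
```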
API
Note that APIs in different languages handle reserved characters such as !, ( and ? differently.
JavaScript
// => '%E5%B1%B1%E6%9C%88'
encodeURIComponent('山月')
// => '山月'
decodeURIComponent('%E5%B1%B1%E6%9C%88')
// => '(!'
encodeURIComponent('(!')
Python
from urllib.parse import quote, unquote
# => '%E5%B1%B1%E6%9C%88'
quote('山月')
# => '山月'
unquote('%E5%B1%B1%E6%9C%88')
# => '%3F%21'
quote('?!')
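The two APIs above disagree on which characters are safe: encodeURIComponent leaves A-Z a-z 0-9 - _ . ! ~ * ' ( ) unencoded, while Python's quote keeps only the unreserved characters plus whatever is passed in safe, which defaults to '/'. A sketch, assuming you want Python output that matches encodeURIComponent for well-formed strings; encode_uri_component is a hypothetical helper name:

```python
from urllib.parse import quote

# quote() keeps '/' by default, which suits whole paths
print(quote("a/b c"))           # a/b%20c
# safe="" also encodes '/', as needed for a single path segment or query value
print(quote("a/b c", safe=""))  # a%2Fb%20c


def encode_uri_component(s: str) -> str:
    """Hypothetical helper approximating JavaScript's encodeURIComponent."""
    # encodeURIComponent additionally leaves ! * ' ( ) unencoded
    return quote(s, safe="!*'()")


print(encode_uri_component("(!"))    # (!
print(encode_uri_component("?!"))    # %3F!
print(encode_uri_component("山月"))  # %E5%B1%B1%E6%9C%88
```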
URL Encoding Best Practices
- Always encode user input: Don't assume user input contains only safe characters.
- Use the correct encoding function: Encoding functions in different languages differ in subtle ways, so choose the one that suits your needs. In JavaScript, for example, encodeURI is meant for a whole URL and leaves delimiters such as /, ? and # intact, while encodeURIComponent is meant for a single component, such as a query value, and encodes them.
- Pay attention to encoding scope: Some characters (like /) have different meanings in different parts of a URL, so decide whether to encode them based on context.
- Avoid double encoding: Encoding a value that is already encoded turns each % into %25 (so %20 becomes %2520); encode each value exactly once.
- Consider internationalization: Ensure your application can correctly handle various languages and character sets.
- Test edge cases: Test inputs containing various special characters and non-ASCII characters.
- Follow RFC standards: RFC 3986 defines URI syntax, including the reserved and unreserved character sets and percent-encoding.
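The scope and double-encoding points can be illustrated with Python's urllib.parse. In this sketch the /files/ prefix and the sample values are made up for illustration:

```python
from urllib.parse import quote, unquote, urlencode

# Encoding scope: '/' separates path segments, so encode it inside a segment value
segment = "reports/2024"
path = "/files/" + quote(segment, safe="")
print(path)  # /files/reports%2F2024

# Query strings: urlencode() encodes each key and value
# (it uses quote_plus by default, so spaces become '+')
query = urlencode({"q": "山月 & co", "page": 2})
print(query)  # q=%E5%B1%B1%E6%9C%88+%26+co&page=2

# Double encoding: encoding an already-encoded value turns '%' into '%25'
once = quote("a b", safe="")
twice = quote(once, safe="")
print(once, twice)     # a%20b a%2520b
print(unquote(twice))  # a%20b  (one decode no longer restores the original)
```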
Related Tools
- Unicode To UTF-8: Understand how characters are encoded in UTF-8 before URL encoding
- HTML Entity Encoding: Another encoding method commonly used in web development
- Base64 Encoding: Another common encoding method for binary data
Connection to UTF-8
URL encoding is closely related to UTF-8 encoding. When URL encoding non-ASCII characters:
- First step: Convert the character to UTF-8 bytes
- Second step: Apply percent encoding to each UTF-8 byte
Understanding UTF-8 explains why URL encoding produces the results it does. For example, the Chinese character 山 becomes %E5%B1%B1 because:
- 山 in UTF-8 is the bytes E5 B1 B1
- Each byte gets a % prefix: %E5%B1%B1
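Decoding simply runs the two steps in reverse. This sketch uses a shortcut that only works when every character in the input is percent-encoded, and also shows what happens when the bytes are decoded with the wrong character set:

```python
from urllib.parse import unquote

encoded = "%E5%B1%B1"

# Undo step 2: drop the '%' prefixes and read the hex digits back as bytes
# (shortcut: valid only because every character here is percent-encoded)
raw = bytes.fromhex(encoded.replace("%", ""))
print(raw)                    # b'\xe5\xb1\xb1'

# Undo step 1: decode the bytes as UTF-8
print(raw.decode("utf-8"))    # 山

# Decoding the same bytes with the wrong charset produces mojibake
print(raw.decode("latin-1"))  # å±±

# The standard library performs both steps in one call (UTF-8 by default)
print(unquote(encoded))       # 山
```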
Conclusion
URL encoding is an indispensable part of web development. Properly understanding and applying URL encoding can help you build more robust and secure applications. By following the best practices mentioned in this article, you can avoid many common issues related to URL encoding and improve the reliability and user experience of your applications.