2019-12-02

Uri.ToString() automatically decodes URL-encoded characters

Uri.ToString() has been discussed quite a lot, and its output has not always been consistent.
It can decode URL-encoded characters in the URL, which is especially troublesome in the query part.

See the article by Anders Abel that discusses the differences between framework 4.0 and framework 4.5: https://coding.abel.nu/2014/10/beware-of-uri-tostring/

As this post shows, more changes have been introduced in the newer frameworks.

I have done some testing by setting different targetFramework in the httpRuntime element of web.config.

The tests were performed by creating a URI with a query parameter, iterating over all 256 single-byte values (%00-%FF, that is, ASCII plus the extended range) and calling ToString() on each URI.

for (var i = 0; i < 256; i++)
{
    var enc = $"%{i:X2}";                                // %00 .. %FF
    var uri = new Uri($"https://some.site/?key={enc}");
    var result = uri.ToString();                         // compare with the original URL to see whether enc was decoded
}
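
The loop above does not show how the results were compared. The following is a minimal sketch of one way to do it, not the exact harness used for these tests: it records every byte value whose encoding is not preserved by ToString() and collapses the values into ranges. The names DecodeProbe, FindDecoded and CollapseRanges are hypothetical, and the output depends on the framework the code runs on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DecodeProbe
{
    // Returns the byte values whose percent-encoding is not preserved by Uri.ToString().
    public static List<int> FindDecoded()
    {
        var decoded = new List<int>();
        for (var i = 0; i < 256; i++)
        {
            var enc = $"%{i:X2}";
            var uri = new Uri($"https://some.site/?key={enc}");
            // If the output no longer ends with the encoded form, it was changed.
            if (!uri.ToString().EndsWith(enc, StringComparison.Ordinal))
                decoded.Add(i);
        }
        return decoded;
    }

    // Collapses byte values into a list such as "%20,%22,%30-%32".
    public static string CollapseRanges(IEnumerable<int> values)
    {
        var parts = new List<string>();
        var sorted = values.Distinct().OrderBy(v => v).ToList();
        for (var i = 0; i < sorted.Count; i++)
        {
            var start = sorted[i];
            // Extend the range while the next value is consecutive.
            while (i + 1 < sorted.Count && sorted[i + 1] == sorted[i] + 1) i++;
            var end = sorted[i];
            parts.Add(start == end ? $"%{start:X2}" : $"%{start:X2}-%{end:X2}");
        }
        return string.Join(",", parts);
    }

    public static void Main()
    {
        Console.WriteLine(CollapseRanges(FindDecoded()));
    }
}
```

Note that adjacent values are merged, so a list produced this way can group characters differently from the lists in the summary further down.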

The tests show that some encodings are decoded while others are kept. The following table shows the differences between the frameworks: x marks characters that are decoded, and a blank means that the encoding is kept even after ToString().



Character       Hex      4.0    4.5 - 4.7.1    4.7.2 - 4.8
Control         00-1F    x
(space)         20       x      x              x
!               21       x      x
"               22       x      x              x
#               23
$               24       x
%               25
&               26       x
'()*+,          27-2C    x      x
-               2D       x      x              x
.               2E       x      x              x
/               2F       x      x              x
0-9             30-39    x      x              x
:               3A       x      x
;               3B       x
<               3C       x      x              x
=               3D       x
>               3E       x      x              x
?               3F       x
@               40       x
A-Z             41-5A    x      x              x
[               5B       x      x
\               5C       x
]               5D       x      x
^_`a-z{|}~      5E-7E    x      x              x
DEL             7F
extended        80-FF
decoded                  126    83             75

The columns are the targetFramework values. The last row, decoded, lists the number of characters that are decoded (of the 256 possible values).

Summary

ToString always decodes
%20,%22,%2D,%2E,%2F,%30-%39,%3C,%3E,%41-%5A,%5E-%7E
(space)"-./0-9<>A-Z^_`a-z{|}~

I'm not sure that I would like %20 (whitespace) to be decoded, but that's perhaps a personal preference. Of course, it improves readability, but for programming use I'm not a fan.

4.5-4.7.1 also decodes
%21,%27-%2C,%3A,%5B,%5D 
!'()*+,:[]

4.0 also decodes
%00-%1F,%24,%26,%3B,%3D,%3F,%40,%5C
Control characters $&;=?@\

4.0 is problematic: since an encoded & and = (%26, %3D) inside a value will be decoded, a query string embedded in a parameter value is no longer distinguishable from the outer query string.

Example

https://some.site/?key=key%3Dvalue
will be decoded to
https://some.site/?key=key=value
which will mess up the whole meaning
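
To make the problem concrete, here is a small sketch that parses a query string before and after such decoding. It is only an illustration: QueryDemo and ParseQuery are hypothetical names, the parser is deliberately simplistic, and the second embedded parameter (%26other%3D1) is an addition to the example above that shows the effect of a decoded &.

```csharp
using System;
using System.Collections.Generic;

public static class QueryDemo
{
    // Simplistic parser: split on '&', then on the first '=', then unescape name and value.
    public static Dictionary<string, string> ParseQuery(string query)
    {
        var result = new Dictionary<string, string>();
        foreach (var pair in query.TrimStart('?').Split('&'))
        {
            if (pair.Length == 0) continue;
            var idx = pair.IndexOf('=');
            var name = idx < 0 ? pair : pair.Substring(0, idx);
            var value = idx < 0 ? "" : pair.Substring(idx + 1);
            result[Uri.UnescapeDataString(name)] = Uri.UnescapeDataString(value);
        }
        return result;
    }

    public static void Main()
    {
        // Encoding kept: one parameter whose value is itself a query string.
        var kept = ParseQuery("?key=key%3Dvalue%26other%3D1");
        Console.WriteLine(kept.Count);   // 1
        Console.WriteLine(kept["key"]);  // key=value&other=1

        // Encoding lost (%26 and %3D decoded, as the table lists for 4.0):
        // the embedded value is split into two separate parameters.
        var lost = ParseQuery("?key=key=value&other=1");
        Console.WriteLine(lost.Count);   // 2
        Console.WriteLine(lost["key"]);  // key=value
    }
}
```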

Workaround

If you would like to avoid having encoded characters in your URLs decoded, you can use the following extension method.
It uses AbsoluteUri when dealing with an absolute URI and OriginalString for relative URIs.

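public static string ToUrl(this Uri uri)
{
    if (uri == null) return null;
    // AbsoluteUri is only available for absolute URIs; it throws for relative ones.
    if (uri.IsAbsoluteUri) return uri.AbsoluteUri;
    // For relative URIs, return the string the Uri was created from.
    return uri.OriginalString;
}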
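
A short usage sketch, assuming the extension method lives in a static class. UriExtensions, Program and the /page path are hypothetical, and the method is repeated here only so that the example is self-contained.

```csharp
using System;

public static class UriExtensions
{
    public static string ToUrl(this Uri uri)
    {
        if (uri == null) return null;
        if (uri.IsAbsoluteUri) return uri.AbsoluteUri;
        return uri.OriginalString;
    }
}

public static class Program
{
    public static void Main()
    {
        var absolute = new Uri("https://some.site/?key=key%3Dvalue");
        var relative = new Uri("/page?key=key%3Dvalue", UriKind.Relative);

        // The encoded = in the value is kept in both cases.
        Console.WriteLine(absolute.ToUrl()); // https://some.site/?key=key%3Dvalue
        Console.WriteLine(relative.ToUrl()); // /page?key=key%3Dvalue
    }
}
```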

Conclusion


  1. Don't use target framework 4.0; switch to at least 4.5.
  2. If you want to avoid decoding, don't use Uri.ToString(). It will always decode some encoded characters (even if newer frameworks are better at leaving alone the characters that might get you in trouble).
