character encoding - Decode a string with an unknown encoding received from a web browser


Inside a web application I am processing requests for URLs like

    http://example.com/<website-base-url> 

I log the raw parameter of each request to a UTF-8 database column and to the filesystem. For a few Chinese domains, the website-base-url parameter arrives like this:

    %c3%83%c2%a3%c3%82%c2%a5%c3%83%c2%a2%c3%82%c2%a4%c3%83%c2%a2%c3%82%c2%a7%c3%83%c2%a3%c3%82%c2%a5%c3%83%c2%a2%c3%82%c2%a4%c3%83%c2%a2%c3%82%c2%b4%c3%83%c2%a3%c3%82%c2%a8%c3%83%c2%a2%c3%82%c2%b4%c3%83%c2%a2%c3%82%c2%b4.cn  

Decoding it with urldecode returns

    ã¥â¤â§ã¥â¤â´ã¨â´â´.cn 

This does not seem to be the domain name the user wanted to request.

I have tried URL decoding, base64, UTF-8, and combinations of them, without success.

Any suggestions on how to decode the given parameter to UTF-8?

URL percent-encoding encodes raw bytes. It does not give any hint about the actual encoding of the text. If you do not know which encoding these bytes represent, you can only guess.
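To make the point concrete, here is a minimal sketch that looks at the bytes behind the first few escapes of the logged parameter. The variable names are illustrative only, and the sample is a shortened copy of the value from the question.

```php
<?php
// Shortened sample: the first eight escapes of the logged parameter.
$raw = '%c3%83%c2%a3%c3%82%c2%a5';

// urldecode() turns every %XX into exactly one byte and nothing more.
$bytes = urldecode($raw);

// Inspect the bytes themselves rather than echoing them, because echo
// shows whatever the terminal's charset makes of them.
echo strlen($bytes), ' bytes: ', bin2hex($bytes), "\n";
// prints "8 bytes: c383c2a3c382c2a5"
```

Nothing in those eight bytes says whether they are meant as UTF-8, GB18030 or something else; that information has to come from somewhere else.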

    php > $d = urldecode('%c3%83%c2%a3%c3%82%c2%a5%c3%83%c2%a2%c3%82%c2%a4%c3%83%c2%a2%c3%82%c2%a7%c3%83%c2%a3%c3%82%c2%a5%c3%83%c2%a2%c3%82%c2%a4%c3%83%c2%a2%c3%82%c2%b4%c3%83%c2%a3%c3%82%c2%a8%c3%83%c2%a2%c3%82%c2%b4%c3%83%c2%a2%c3%82%c2%b4.cn');
    php > echo $d;
    ã¥â¤â§ã¥â¤â´ã¨â´â´.cn
    php > echo iconv('big5', 'utf-8', $d);
    php > echo iconv('shift-jis', 'utf-8', $d);
    テδ」テつ・テδ「テつ、テδ「テつァテδ」テつ・テδ「テつ、テδ「テつエテδ」テつィテδ「テつエテδ「テつエ.cn
    php > echo iconv('gb18030', 'utf-8', $d);
    脙拢脗楼脙垄脗陇脙垄脗搂脙拢脗楼脙垄脗陇脙垄脗麓脙拢脗篓脙垄脗麓脙垄脗麓.cn
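The same trial-and-error can be wrapped in a small loop. This is only a sketch: `guess_decodings` is a hypothetical helper name, the candidate list is whatever you consider plausible, and which encoding names iconv accepts depends on the iconv implementation PHP was built against.

```php
<?php
/**
 * Try to read $bytes in each candidate encoding and convert it to UTF-8.
 * Returns a map of encoding name => converted string, or null where
 * iconv() rejects the input for that encoding.
 */
function guess_decodings(string $bytes, array $candidates): array
{
    $results = [];
    foreach ($candidates as $encoding) {
        // iconv() returns false and raises a notice on illegal sequences;
        // the notice is suppressed because failure is an expected outcome.
        $converted = @iconv($encoding, 'UTF-8', $bytes);
        $results[$encoding] = ($converted === false) ? null : $converted;
    }
    return $results;
}

$d = urldecode('%c3%83%c2%a3%c3%82%c2%a5'); // shortened sample
foreach (guess_decodings($d, ['big5', 'shift-jis', 'gb18030']) as $enc => $text) {
    echo str_pad($enc, 10), $text === null ? '(not valid in this encoding)' : $text, "\n";
}
```

A conversion that succeeds only shows the bytes are legal in that encoding, not that the guess is right; a human still has to judge whether the output reads as sensible text.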

GB18030 seems to be the best candidate, but the decoded string looks a bit too repetitive to be useful Chinese.
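The repetition is itself a hint. Byte pairs such as c3 83 and c3 82 recurring in front of every other byte are what text typically looks like after UTF-8 bytes have been misread as ISO-8859-1 and encoded as UTF-8 again, possibly more than once. The sketch below peels such layers off for as long as the result stays valid UTF-8. It assumes the mbstring extension, `peel_double_encoding` is a hypothetical helper name, and whether this is what actually happened to the logged values is a guess that the logs do not confirm.

```php
<?php
/**
 * Undo repeated "UTF-8 read as ISO-8859-1, then re-encoded as UTF-8" damage.
 * Stops as soon as another round would no longer yield valid UTF-8.
 * Hypothetical helper; requires the mbstring extension.
 */
function peel_double_encoding(string $bytes, int $maxRounds = 5): string
{
    for ($i = 0; $i < $maxRounds; $i++) {
        // Only code points up to U+00FF can have come from ISO-8859-1 bytes.
        if (!mb_check_encoding($bytes, 'UTF-8')
            || preg_match('/[^\x{00}-\x{FF}]/u', $bytes) === 1) {
            break;
        }
        // Map each code point back to the single byte it was misread from.
        $candidate = mb_convert_encoding($bytes, 'ISO-8859-1', 'UTF-8');
        // Pure ASCII does not change; invalid UTF-8 means one round too many.
        if ($candidate === $bytes || !mb_check_encoding($candidate, 'UTF-8')) {
            break;
        }
        $bytes = $candidate;
    }
    return $bytes;
}

$d = urldecode('%c3%83%c2%a3%c3%82%c2%a5%c3%83%c2%a2%c3%82%c2%a4%c3%83%c2%a2%c3%82%c2%a7%c3%83%c2%a3%c3%82%c2%a5%c3%83%c2%a2%c3%82%c2%a4%c3%83%c2%a2%c3%82%c2%b4%c3%83%c2%a3%c3%82%c2%a8%c3%83%c2%a2%c3%82%c2%b4%c3%83%c2%a2%c3%82%c2%b4.cn');

// One layer comes off cleanly; a second round would produce invalid UTF-8.
$once = peel_double_encoding($d);
echo $once, "\n";

// Further assumption: the name was lowercased between two rounds of damage
// (domain names usually are), turning "Ã"/"Â" into "ã"/"â". Uppercasing
// reverses that, after which more layers can be peeled.
$more = peel_double_encoding(mb_strtoupper($once, 'UTF-8'));
echo mb_strtolower($more, 'UTF-8'), "\n";
```

On the logged value the first call stops at ã¥â¤â§ã¥â¤â´ã¨â´â´.cn, and the second step, under the lowercasing assumption, ends at 大头贴.cn, which at least looks like a plausible domain. Treat that as a candidate to verify, not as proof of what the client originally sent.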

