在抓取某网站数据,结果在数据包中发现了一串编码的数据:"......u65b0u6d6au5faeu535a......", 这其实是中文被unicode编码后了的数据,想解码出中文来。
解决方案:
方案A(稳定版+推荐):
<span>function</span> replace_unicode_escape_sequence(<span>$match</span><span>) {
</span><span>return</span> mb_convert_encoding(<span>pack</span>('H*', <span>$match</span>[1]), 'UTF-8', 'UCS-2BE'<span>);
}
</span><span>$name</span> = 'u65b0u6d6au5faeu535a'<span>;
</span><span>$str</span> = <span>preg_replace_callback</span>('/\\u([0-9a-f]{4})/i', 'replace_unicode_escape_sequence', <span>$name</span><span>);
</span><span>echo</span> <span>$str</span>; <span>//</span><span>输出: 新浪微博
// www.jbxue.com 脚本学堂
//咱将上述方案A给封装起来~~~(方案A稳定版+升级+推荐)</span>
<span>class</span><span> Helper_Tool
{
</span><span>static</span> <span>function</span> unicodeDecode(<span>$data</span><span>)
{
</span><span>function</span> replace_unicode_escape_sequence(<span>$match</span><span>) {
</span><span>return</span> mb_convert_encoding(<span>pack</span>('H*', <span>$match</span>[1]), 'UTF-8', 'UCS-2BE'<span>);
}
</span><span>$rs</span> = <span>preg_replace_callback</span>('/\\u([0-9a-f]{4})/i', 'replace_unicode_escape_sequence', <span>$data</span><span>);
</span><span>return</span> <span>$rs</span><span>;
}
}
</span><span>//</span><span>调用</span>
<span>$name</span> = 'u65b0u6d6au5faeu535a'<span>;
</span><span>$data</span> = Helper_Tool::unicodeDecode(<span>$name</span>); <span>//</span><span>输出新浪微博</span>小贴士:多翻翻国外的php教程,很有帮助哦。
方案B(次推荐):
<?<span>php
</span><span>function</span> unicodeDecode(<span>$name</span><span>){
</span><span>$json</span> = '{"str":"'.<span>$name</span>.'"}'<span>;
</span><span>$arr</span> = json_decode(<span>$json</span>,<span>true</span><span>);
</span><span>if</span>(<span>empty</span>(<span>$arr</span>)) <span>return</span> ''<span>;
</span><span>return</span> <span>$arr</span>['str'<span>];
} // www.jbxue.com
</span><span>$name</span> = 'u65b0u6d6au5faeu535a'<span>;
</span><span>echo</span> unicodeDecode(<span>$name</span>); <span>//</span><span>输出: 新浪微博 </span>对于方案B, 注意事项, 在好友 XAR (猛戳XAR博客) 的技术支持下,总结出要处理的字符串(即传递给函数unicodeDecode的参数$name的内容中一定不能包含单引号,否则就会导致解析失败, 所以有必要的话可以借助 str_replace()函数将非法字符格式化为合格字符)
立即学习“PHP免费学习笔记(深入)”;
每个人都需要一台速度更快、更稳定的 PC。随着时间的推移,垃圾文件、旧注册表数据和不必要的后台进程会占用资源并降低性能。幸运的是,许多工具可以让 Windows 保持平稳运行。
Copyright 2014-2025 https://www.php.cn/ All Rights Reserved | php.cn | 湘ICP备2023035733号