近日遇到一个神奇的字“弢(tao)”。
具体的过程是这样的:
<span>1</span> <span>$list</span> = <span>explode</span>('|', 'abc弢|bc'<span>);
</span><span>2</span> <span>var_dump</span>(<span>$list</span>);取得这个分割的结果。
和想象不同,结果居然是这样:
<span>array</span>(3<span>) {
[</span>0]=>
<span>string</span>(4) "<span>abc?
[1]=>
string(0) </span>""<span>
[2]=>
string(2) </span>"bc"<span>
}</span>出现了乱码,而且莫名其妙的出现了一个空元素。
究其原因,原来这个字“弢”的gbk编码是8f7c,而|的ASCII是7c,这样explode就把弢的第二ASCII作为|切割了。
既然是双字节的问题,我们用mbstring解决好了。
可惜,php并没有mb_explode这种函数,找了找,找到一个mb_split。
<span>array</span> mb_split ( <span>string</span> <span>$pattern</span> , <span>string</span> <span>$string</span> [, int <span>$limit</span> = -1 ] )
没有声明编码的地方。仔细一看,他是通过mb_regex_encoding声明编码的。
于是写出以下的代码:
<span>1</span> mb_regex_encoding('gbk'<span>);
</span><span>2</span> <span>$list</span> = mb_split('\|', 'abc弢|bc'<span>);
</span><span>3</span> <span>var_dump</span>(<span>$list</span>);结果php报错,mb_regex_encoding不认识gbk,囧。
那就使用它认识的:
<span>1</span> mb_regex_encoding('gb2312'<span>);
</span><span>2</span> <span>$list</span> = mb_split('\|', 'abc弢|bc'<span>);
</span><span>3</span> <span>var_dump</span>(<span>$list</span>);结果:
<span>array</span>(3<span>) {
[</span>0]=>
<span>string</span>(4) "<span>abc?
[1]=>
string(0) </span>""<span>
[2]=>
string(2) </span>"bc"<span>
}</span>发现,这种方法并没有什么用处。、
至于原因?“弢”这个字居然不在GB2312的编码集里面!!!!!但是有这个字的编码集(GBK, GB18030)这个函数都不支持!!!!!
既然这个不好用,也许万能的正则表达式是ok的。于是得到以下代码:
<span>1</span> <span>var_dump</span>(<span>preg_match_all</span>('/([^\|])*/', 'abc弢|bc', <span>$matches</span><span>));
</span><span>2</span> <span>var_dump</span>(<span>$matches</span>);结果:
int(2<span>)
</span><span>array</span>(2<span>) {
[</span>0]=>
<span>array</span>(2<span>) {
[</span>0]=>
<span>string</span>(4) "<span>abc?
[1]=>
string(2) </span>"bc"<span>
}
[1]=>
array(2) {
[0]=>
string(1) </span>"?<span>
[</span>1]=>
<span>string</span>(1) "c"<span>
}
}</span>好吧,我想多了。
现在研究一下,如何用正则描述这个场景。
参考一下,鸟哥大神的博客:分割GBK中文遭遇乱码的解决。遗憾的是,正则能力比较low的我,还是想不出来合适的正则表达式(如果有想出这个正则表达式的大神们,希望可以告诉我)。
没办法,思来想去,只好用substr了:
<span> 1</span> <span>function</span> mb_explode(<span>$delimiter</span>, <span>$string</span>, <span>$encoding</span> = <span>null</span><span>){
</span><span> 2</span> <span>$list</span> = <span>array</span><span>();
</span><span> 3</span> <span>is_null</span>(<span>$encoding</span>) && <span>$encoding</span> =<span> mb_internal_encoding();
</span><span> 4</span> <span>$len</span> = mb_strlen(<span>$delimiter</span>, <span>$encoding</span><span>);
</span><span> 5</span> <span>while</span>(<span>false</span> !== (<span>$idx</span> = mb_strpos(<span>$string</span>, <span>$delimiter</span>, 0, <span>$encoding</span><span>))){
</span><span> 6</span> <span>$list</span>[] = mb_substr(<span>$string</span>, 0, <span>$idx</span>, <span>$encoding</span><span>);
</span><span> 7</span> <span>$string</span> = mb_substr(<span>$string</span>, <span>$idx</span> + <span>$len</span>, <span>null</span>, <span>$encoding</span><span>);
</span><span> 8</span> <span> }
</span><span> 9</span> <span>$list</span>[] = <span>$string</span><span>;
</span><span>10</span> <span>return</span> <span>$list</span><span>;
</span><span>11</span> } 测试代码:
<span>1</span> <span>$a</span> = 'abc弢|bc'<span>;
</span><span>2</span>
<span>3</span> <span>var_dump</span>(mb_explode('|', <span>$a</span>, 'gbk'<span>));
</span><span>4</span> <span>var_dump</span>(mb_explode('bc', <span>$a</span>, 'gbk'<span>));
</span><span>5</span> <span>var_dump</span>(mb_explode('弢', <span>$a</span>, 'gbk'));结果:
<span>array</span>(2<span>) {
[</span>0]=>
<span>string</span>(5) "abc弢"<span>
[</span>1]=>
<span>string</span>(2) "bc"<span>
}
</span><span>array</span>(3<span>) {
[</span>0]=>
<span>string</span>(1) "a"<span>
[</span>1]=>
<span>string</span>(3) "弢|"<span>
[</span>2]=>
<span>string</span>(0) ""<span>
}
</span><span>array</span>(2<span>) {
[</span>0]=>
<span>string</span>(3) "abc"<span>
[</span>1]=>
<span>string</span>(3) "|bc"<span>
}</span>这样就可以得到正确的结果了。
每个人都需要一台速度更快、更稳定的 PC。随着时间的推移,垃圾文件、旧注册表数据和不必要的后台进程会占用资源并降低性能。幸运的是,许多工具可以让 Windows 保持平稳运行。
C++高性能并发应用_C++如何开发性能关键应用
Java AI集成Deep Java Library_Java怎么集成AI模型部署
Golang后端API开发_Golang如何高效开发后端和API
Python异步并发改进_Python异步编程有哪些新改进
C++系统编程内存管理_C++系统编程怎么与Rust竞争内存安全
Java GraalVM原生镜像构建_Java怎么用GraalVM构建高效原生镜像
Python FastAPI异步API开发_Python怎么用FastAPI构建异步API
C++现代C++20/23/26特性_现代C++有哪些新标准特性如modules和coroutines
Copyright 2014-2025 https://www.php.cn/ All Rights Reserved | php.cn | 湘ICP备2023035733号