
不管是日常业务数据处理中,还是数据库的导入导出,都可能遇到需要处理大量数据的插入。插入的方式和数据库引擎都会对插入速度造成影响,这篇文章旨在从理论和实践上对各种方法进行分析和比较,方便以后应用中插入方法的选择。
插入分析
MySQL中插入一个记录需要的时间由下列因素组成,其中的数字表示大约比例:
如果我们每插入一条都执行一个SQL语句,那么我们需要执行除了连接和关闭之外的所有步骤N次,这样是非常耗时的,优化的方式有一下几种:
每种方式执行的性能如下。
Innodb引擎
InnoDB 给 MySQL 提供了具有事务(commit)、回滚(rollback)和崩溃修复能力(crash recovery capabilities)的事务安全(transaction-safe (ACID compliant))型表。InnoDB 提供了行锁(locking on row level)以及外键约束(FOREIGN KEY constraints)。
InnoDB 的设计目标是处理大容量数据库系统,它的 CPU 利用率是其它基于磁盘的关系数据库引擎所不能比的。在技术上,InnoDB 是一套放在 MySQL 后台的完整数据库系统,InnoDB 在主内存中建立其专用的缓冲池用于高速缓冲数据和索引。
测试环境
Macbook Air 12mid apache2.2.26 php5.5.10 mysql5.6.16
总数100W条数据
插入完后数据库大小38.6MB(无索引),46.8(有索引)
网趣购物系统静态版支持网站一键静态生成,采用动态进度条模式生成静态,生成过程更加清晰明确,商品管理上增加淘宝数据包导入功能,与淘宝数据同步更新!采用领先的AJAX+XML相融技术,速度更快更高效!系统进行了大量的实用性更新,如优化核心算法、增加商品图片批量上传、谷歌地图浏览插入等,静态版独特的生成算法技术使静态生成过程可随意掌控,从而可以大大减轻服务器的负担,结合多种强大的SEO优化方式于一体,使
0
MyIASM引擎
MyISAM 是MySQL缺省存贮引擎。设计简单,支持全文搜索。
测试环境
Macbook Air 12mid apache2.2.26 php5.5.10 mysql5.6.16
总数100W条数据
插入完后数据库大小19.1MB(无索引),38.6(有索引)
总结
我测试的数据量不是很大,不过可以大概了解这几种插入方式对于速度的影响,最快的必然是Load Data方式。这种方式相对比较麻烦,因为涉及到了写文件,但是可以兼顾内存和速度。
测试代码
<ol class="dp-j"><li class="alt"><span><span><?php </span></span></li><li><span>$dsn = <span class="string">'mysql:host=localhost;dbname=test'</span><span>; </span></span></li><li class="alt"><span>$db = <span class="keyword">new</span><span> PDO($dsn,</span><span class="string">'root'</span><span>,</span><span class="string">''</span><span>,array(PDO::ATTR_PERSISTENT => </span><span class="keyword">true</span><span>)); </span></span></li><li><span><span class="comment">//删除上次的插入数据</span><span> </span></span></li><li class="alt"><span>$db->query(<span class="string">'delete from `test`'</span><span>); </span></span></li><li><span><span class="comment">//开始计时</span><span> </span></span></li><li class="alt"><span>$start_time = time(); </span></li><li><span>$sum = <span class="number">1000000</span><span>; </span></span></li><li class="alt"><span><span class="comment">// 测试选项</span><span> </span></span></li><li><span>$num = <span class="number">1</span><span>; </span></span></li><li class="alt"><span> </span></li><li><span><span class="keyword">if</span><span> ($num == </span><span class="number">1</span><span>){ </span></span></li><li class="alt"><span><span class="comment">// 单条插入</span><span> </span></span></li><li><span><span class="keyword">for</span><span>($i = </span><span class="number">0</span><span>; $i < $sum; $i++){ </span></span></li><li class="alt"><span>$db->query(<span class="string">"insert into `test` (`id`,`name`) values ($i,'tsetssdf')"</span><span>); </span></span></li><li><span>} </span></li><li class="alt"><span>} elseif ($num == <span class="number">2</span><span>) { </span></span></li><li><span><span class="comment">// 批量插入,为了不超过max_allowed_packet,选择每10万插入一次</span><span> </span></span></li><li class="alt"><span><span class="keyword">for</span><span> ($i = </span><span class="number">0</span><span>; $i < $sum; $i++) { </span></span></li><li><span><span class="keyword">if</span><span> ($i == $sum - </span><span class="number">1</span><span>) { </span><span class="comment">//最后一次</span><span> </span></span></li><li class="alt"><span><span class="keyword">if</span><span> ($i%</span><span class="number">100000</span><span> == </span><span class="number">0</span><span>){ </span></span></li><li><span>$values = <span class="string">"($i, 'testtest')"</span><span>; </span></span></li><li class="alt"><span>$db->query(<span class="string">"insert into `test` (`id`, `name`) values $values"</span><span>); </span></span></li><li><span>} <span class="keyword">else</span><span> { </span></span></li><li class="alt"><span>$values .= <span class="string">",($i, 'testtest')"</span><span>; </span></span></li><li><span>$db->query(<span class="string">"insert into `test` (`id`, `name`) values $values"</span><span>); </span></span></li><li class="alt"><span>} </span></li><li><span><span class="keyword">break</span><span>; </span></span></li><li class="alt"><span>} </span></li><li><span><span class="keyword">if</span><span> ($i%</span><span class="number">100000</span><span> == </span><span class="number">0</span><span>) { </span><span class="comment">//平常只有在这个情况下才插入</span><span> </span></span></li><li class="alt"><span><span class="keyword">if</span><span> ($i == </span><span class="number">0</span><span>){ </span></span></li><li><span>$values = <span class="string">"($i, 'testtest')"</span><span>; </span></span></li><li class="alt"><span>} <span class="keyword">else</span><span> { </span></span></li><li><span>$db->query(<span class="string">"insert into `test` (`id`, `name`) values $values"</span><span>); </span></span></li><li class="alt"><span>$values = <span class="string">"($i, 'testtest')"</span><span>; </span></span></li><li><span>} </span></li><li class="alt"><span>} <span class="keyword">else</span><span> { </span></span></li><li><span>$values .= <span class="string">",($i, 'testtest')"</span><span>; </span></span></li><li class="alt"><span>} </span></li><li><span>} </span></li><li class="alt"><span>} elseif ($num == <span class="number">3</span><span>) { </span></span></li><li><span><span class="comment">// 事务插入</span><span> </span></span></li><li class="alt"><span>$db->beginTransaction(); </span></li><li><span><span class="keyword">for</span><span>($i = </span><span class="number">0</span><span>; $i < $sum; $i++){ </span></span></li><li class="alt"><span>$db->query(<span class="string">"insert into `test` (`id`,`name`) values ($i,'tsetssdf')"</span><span>); </span></span></li><li><span>} </span></li><li class="alt"><span>$db->commit(); </span></li><li><span>} elseif ($num == <span class="number">4</span><span>) { </span></span></li><li class="alt"><span><span class="comment">// 文件load data</span><span> </span></span></li><li><span>$filename = dirname(__FILE__).<span class="string">'/test.sql'</span><span>; </span></span></li><li class="alt"><span>$fp = fopen($filename, <span class="string">'w'</span><span>); </span></span></li><li><span><span class="keyword">for</span><span>($i = </span><span class="number">0</span><span>; $i < $sum; $i++){ </span></span></li><li class="alt"><span>fputs($fp, <span class="string">"$i,'testtest'\r\n"</span><span>); </span></span></li><li><span>} </span></li><li class="alt"><span>$db->exec(<span class="string">"load data infile '$filename' into table test fields terminated by ','"</span><span>); </span></span></li><li><span>} </span></li><li class="alt"><span> </span></li><li><span>$end_time = time(); </span></li><li class="alt"><span>echo <span class="string">"总耗时"</span><span>, ($end_time - $start_time), </span><span class="string">"秒\n"</span><span>; </span></span></li><li><span>echo <span class="string">"峰值内存"</span><span>, round(memory_get_peak_usage()/</span><span class="number">1000</span><span>), </span><span class="string">"KB\n"</span><span>; </span></span></li><li class="alt"><span> </span></li><li><span>?> </span></li></ol>以上就是MySQL大量数据插入各种方法性能分析与比较,希望能帮到你。
博文出处:http://yansu.org/2014/04/16/insert-large-number-of-data-in-mysql.html
Copyright 2014-2025 https://www.php.cn/ All Rights Reserved | php.cn | 湘ICP备2023035733号