Advanced PHP Array Merging: Collecting and Consolidating Duplicate Values by a Shared Key

心靈之曲

Published: 2025-10-01 15:31:00

Source: php中文网

Original article

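This tutorial explains how to merge two PHP arrays on a shared key when the source array contains several records with the same key value. It loops over the target array, uses array_column and array_keys to find every matching record in the source array, and attaches the collected field values to the corresponding target record as a sub-array. It also presents a faster alternative based on a prebuilt lookup table.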

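Problem Description and Goal

In day-to-day development we often need to combine data from different sources. A common case is two arrays of records linked by a shared identifier (for example, epid). One array may contain several records with the same identifier, and we want to collect a particular field (for example, hash) from all of those records and attach the collected values, as a sub-array, to the record with the same identifier in the other array.

For example, take the following two arrays:

Source array (Array 1): a list of epid/hash pairs in which an epid may repeat.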

$sourceArray = [
    ["epid" => "123", "hash" => "xxxxxxA"],
    ["epid" => "456", "hash" => "xxxxxxB"],
    ["epid" => "789", "hash" => "xxxxxxC"],
    ["epid" => "123", "hash" => "xxxxxxD"],
    ["epid" => "123", "hash" => "xxxxxxE"],
];

Target array (Array 2): a list of epid/name pairs in which each epid is normally unique.

$targetArray = [
    ["epid" => "123", "name" => "This is a title"],
    ["epid" => "456", "name" => "This is a title"],
    ["epid" => "789", "name" => "This is a title"]
];

The goal is to collect every hash in $sourceArray whose epid matches a record in $targetArray, and to add those values to that record as a hash array. The expected result is:

[
    ["epid" => "123", "name" => "This is a title", "hash" => ["xxxxxxA", "xxxxxxD", "xxxxxxE"]],
    ["epid" => "456", "name" => "This is a title", "hash" => ["xxxxxxB"]],
    ["epid" => "789", "name" => "This is a title", "hash" => ["xxxxxxC"]]
]

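Core Solution

The basic idea is to iterate over the target array and, for each element, find every record in the source array with a matching epid, extract the hash value of each match, and gather those values into a new hash field on the target element.

PHP Implementation

The following PHP code implements this logic: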

<?php
$sourceArray = [
    ["epid" => "123", "hash" => "xxxxxxA"],
    ["epid" => "456", "hash" => "xxxxxxB"],
    ["epid" => "789", "hash" => "xxxxxxC"],
    ["epid" => "123", "hash" => "xxxxxxD"],
    ["epid" => "123", "hash" => "xxxxxxE"],
];

$targetArray = [
    ["epid" => "123", "name" => "This is a title"],
    ["epid" => "456", "name" => "This is a title"],
    ["epid" => "789", "name" => "This is a title"]
];

// Iterate over the target array and merge in the matching data
foreach ($targetArray as $index => $item) {
    // 1. Extract the 'epid' column from the source array
    // 2. Find every key (index) in that column whose value matches the current item's 'epid'
    $matchingKeys = array_keys(array_column($sourceArray, 'epid'), $item["epid"]);

    // Initialize the 'hash' field as an empty array so values can be appended to it
    $targetArray[$index]["hash"] = [];

    // Append the 'hash' value of each matching source record to the item's 'hash' array
    foreach ($matchingKeys as $key) {
        $targetArray[$index]["hash"][] = $sourceArray[$key]["hash"];
    }
}

// Print the merged result
echo "<pre>";
print_r($targetArray);
echo "</pre>";

?>

Code Walkthrough


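foreach ($targetArray as $index => $item): We start by iterating over $targetArray. Keeping $index lets us write back to the original element in $targetArray, because $item is only a copy of it.

array_column($sourceArray, 'epid'): This function extracts a single column from a multidimensional array. Here it pulls every epid out of $sourceArray into a flat list: ["123", "456", "789", "123", "123"].

array_keys(array_column($sourceArray, 'epid'), $item["epid"]): When given a search value, array_keys() returns the keys of all elements equal to that value. Applied to the column returned by array_column(), it yields the indexes in $sourceArray whose epid matches the current $item["epid"]. For example, when $item["epid"] is "123", $matchingKeys is [0, 3, 4]. This relies on $sourceArray being a sequentially indexed list, because array_column() reindexes its result from 0.

$targetArray[$index]["hash"] = [];: Before collecting any hash values, we create an empty hash array on the current $targetArray item. This guarantees that the hash field exists (as an empty array) even when nothing matches, which keeps the data structure consistent.

foreach ($matchingKeys as $key): Iterates over the matching indexes found in the previous step.

$targetArray[$index]["hash"][] = $sourceArray[$key]["hash"];: Appends the hash value at index $key of $sourceArray to the hash array of the current $targetArray item. The [] syntax appends an element to the end of an array.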
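
To make the intermediate values concrete, here is a minimal sketch that runs the lookup for a single epid in isolation. The variable names $epids and $hashes are introduced here for illustration only. The sketch also passes true as the third argument of array_keys() to request strict comparison, which the tutorial's code does not do; by default array_keys() compares loosely, so an integer 123 would also match the string "123".

```php
<?php
// Sample data mirroring the tutorial's source array.
$sourceArray = [
    ["epid" => "123", "hash" => "xxxxxxA"],
    ["epid" => "456", "hash" => "xxxxxxB"],
    ["epid" => "789", "hash" => "xxxxxxC"],
    ["epid" => "123", "hash" => "xxxxxxD"],
    ["epid" => "123", "hash" => "xxxxxxE"],
];

// Step 1: pull the 'epid' column into a flat, zero-indexed list.
$epids = array_column($sourceArray, 'epid');

// Step 2: find every index whose value is exactly the string "123".
// The third argument (true) switches array_keys() to strict (===) comparison.
$matchingKeys = array_keys($epids, "123", true);

// Step 3: collect the hash stored at each matching index.
$hashes = [];
foreach ($matchingKeys as $key) {
    $hashes[] = $sourceArray[$key]["hash"];
}

print_r($matchingKeys); // indexes 0, 3 and 4
print_r($hashes);       // xxxxxxA, xxxxxxD, xxxxxxE
```

Whether strict comparison is appropriate depends on the data: if one array stores epid as integers and the other as strings, the loose default may be what you want.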

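Optimization and Caveats
The solution above works well and is easy to follow for small and medium-sized datasets. For very large datasets, however, and especially when both $sourceArray and $targetArray are large, calling array_column and array_keys on every iteration of the loop can become a performance problem, because each call scans the whole of $sourceArray.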
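
A smaller, intermediate improvement is possible without changing the algorithm: the epid column is the same on every iteration, so it can be extracted once before the loop. The sketch below reuses the sample data; $epids is a name introduced here. Note that array_keys() still scans the full column for each target record, so the overall cost remains proportional to M * N.

```php
<?php
$sourceArray = [
    ["epid" => "123", "hash" => "xxxxxxA"],
    ["epid" => "456", "hash" => "xxxxxxB"],
    ["epid" => "789", "hash" => "xxxxxxC"],
    ["epid" => "123", "hash" => "xxxxxxD"],
    ["epid" => "123", "hash" => "xxxxxxE"],
];

$targetArray = [
    ["epid" => "123", "name" => "This is a title"],
    ["epid" => "456", "name" => "This is a title"],
    ["epid" => "789", "name" => "This is a title"],
];

// Extract the 'epid' column once instead of once per target record.
$epids = array_column($sourceArray, 'epid');

foreach ($targetArray as $index => $item) {
    $targetArray[$index]["hash"] = [];

    // array_keys() still walks all of $epids on every iteration.
    foreach (array_keys($epids, $item["epid"]) as $key) {
        $targetArray[$index]["hash"][] = $sourceArray[$key]["hash"];
    }
}

print_r($targetArray);
```

This removes one of the two per-iteration scans but does not change the growth rate.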
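Performance Optimization: Prebuilding a Lookup Table
To improve efficiency, we can process $sourceArray once up front and build a lookup table keyed by epid. Then, while iterating over $targetArray, each lookup of the hash values takes constant time on average (O(1)) instead of linear time (O(N)).

<?php
$sourceArray = [
    ["epid" => "123", "hash" => "xxxxxxA"],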
    ["epid" => "456", "hash" => "xxxxxxB"],
    ["epid" => "789", "hash" => "xxxxxxC"],
    ["epid" => "123", "hash" => "xxxxxxD"],
    ["epid" => "123", "hash" => "xxxxxxE"],
];

$targetArray = [
    ["epid" => "123", "name" => "This is a title"],
    ["epid" => "456", "name" => "This is a title"],
    ["epid" => "789", "name" => "This is a title"]
];

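// Optimized approach: prebuild a lookup table keyed by epid
$hashLookup = [];
foreach ($sourceArray as $item) {
    // Collect the hash values of all records sharing an epid into one sub-array
    $hashLookup[$item['epid']][] = $item['hash'];
}

// Iterate over the target array and merge using the lookup table
foreach ($targetArray as $index => $item) {
    $epid = $item['epid'];
    if (isset($hashLookup[$epid])) {
        // The epid exists in the lookup table, so assign its hash list directly
        $targetArray[$index]['hash'] = $hashLookup[$epid];
    } else {
        // No matching hash values, so use an empty array
        $targetArray[$index]['hash'] = [];
    }
}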

// Print the merged result
echo "<pre>";
print_r($targetArray);
echo "</pre>";

?>

Walkthrough of the Optimized Code


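Building the lookup table ($hashLookup):
First, we traverse $sourceArray once.
For each element, we use its epid as the key and append its hash value to the corresponding array in $hashLookup. If the epid has not been seen before, PHP creates the new array automatically.
For example, $hashLookup becomes: [
    "123" => ["xxxxxxA", "xxxxxxD", "xxxxxxE"],
    "456" => ["xxxxxxB"],
    "789" => ["xxxxxxC"]
]
(PHP stores numeric-string keys such as "123" as integers internally; lookups by the string form still work.)
This step has time complexity O(N), where N is the number of elements in $sourceArray.

Traversing the target array and merging:
Next, we traverse $targetArray.
For each target item's epid, isset($hashLookup[$epid]) checks whether the lookup table holds a hash list for it.
If it does, we assign $hashLookup[$epid] directly to $targetArray[$index]['hash'].
If it does not, we set $targetArray[$index]['hash'] to an empty array.
This step has time complexity O(M), where M is the number of elements in $targetArray. Each individual lookup is O(1) on average.

With this optimization, the total time complexity drops from O(M * N) to O(N + M), which makes a substantial difference on large datasets.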
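
The two passes can also be packaged as a reusable function. The following is a minimal sketch, not part of the original tutorial: the function name attachGroupedValues and its parameters are hypothetical, and it assumes PHP 7.0 or later (for scalar type declarations and the ?? operator) and that every row contains the named fields.

```php
<?php
/**
 * Hypothetical helper: for each target row, attach the list of $valueField
 * values from all source rows that share the same $keyField value.
 */
function attachGroupedValues(array $target, array $source, string $keyField, string $valueField): array
{
    // Pass 1: group the source values by key. O(N).
    $lookup = [];
    foreach ($source as $row) {
        $lookup[$row[$keyField]][] = $row[$valueField];
    }

    // Pass 2: attach each group to its target row, or [] if none exists. O(M).
    foreach ($target as $index => $row) {
        $target[$index][$valueField] = $lookup[$row[$keyField]] ?? [];
    }

    return $target;
}

$sourceArray = [
    ["epid" => "123", "hash" => "xxxxxxA"],
    ["epid" => "456", "hash" => "xxxxxxB"],
    ["epid" => "789", "hash" => "xxxxxxC"],
    ["epid" => "123", "hash" => "xxxxxxD"],
    ["epid" => "123", "hash" => "xxxxxxE"],
];

$targetArray = [
    ["epid" => "123", "name" => "This is a title"],
    ["epid" => "456", "name" => "This is a title"],
    ["epid" => "789", "name" => "This is a title"],
];

$merged = attachGroupedValues($targetArray, $sourceArray, "epid", "hash");
print_r($merged);
```

Returning a new array rather than modifying the argument by reference is a design choice made for this sketch; it keeps the caller's $targetArray unchanged.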
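Data Integrity and Default Values
If an epid in $targetArray has no matching hash in $sourceArray, both versions still leave the record with a hash field holding an empty array []: the first version initializes the field before collecting matches, and the optimized version assigns [] explicitly in its else branch. This is good practice, because it keeps the final data structure uniform and avoids errors in later processing caused by a missing field.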
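
A quick way to check the unmatched case is to run the first approach against a target record whose epid does not occur in the source array. In this sketch "999" is a hypothetical epid chosen for that purpose, and the data is trimmed to the minimum needed.

```php
<?php
$sourceArray = [
    ["epid" => "123", "hash" => "xxxxxxA"],
];

// "999" is a hypothetical epid with no counterpart in $sourceArray.
$targetArray = [
    ["epid" => "123", "name" => "This is a title"],
    ["epid" => "999", "name" => "This is a title"],
];

foreach ($targetArray as $index => $item) {
    // array_keys() returns an empty list when nothing matches.
    $matchingKeys = array_keys(array_column($sourceArray, 'epid'), $item["epid"]);

    $targetArray[$index]["hash"] = [];
    foreach ($matchingKeys as $key) {
        $targetArray[$index]["hash"][] = $sourceArray[$key]["hash"];
    }
}

// Because 'hash' always exists, consumers can count or iterate without isset().
foreach ($targetArray as $row) {
    echo $row["epid"], ": ", count($row["hash"]), " hash(es)\n";
}
```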
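Summary
This tutorial presented two ways to merge arrays of records in PHP while handling repeated key values. The first uses array_column and array_keys inside the loop to find and aggregate the data; it is concise, easy to follow, and suitable for small and medium-sized datasets. The second prebuilds a lookup table, which reduces the time complexity from O(M * N) to O(N + M) and makes it the better choice for large datasets. In practice, choose the strategy that fits your data volume and performance requirements, and keep the resulting data structure uniform.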