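In data processing, we often need to bring together related information that is scattered across several lists. For example, you might have a main data list plus several auxiliary lists that hold supplementary or "original" values for certain fields in the main data. The goal is to merge this auxiliary information into the main data items by matching on a shared key such as name or address.
Suppose we have the following three datasets: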
listA = [
    {
        "name": "name sample 1",
        "original_name": "original name sample 1",
    },
    {
        "name": "name sample 2",
        "original_name": "original name sample 2",
    },
]

listB = [
    {
        "address": "address sample 1",
        "original_address": "original address sample 1",
    },
    {
        "address": "address sample 2",
        "original_address": "original address sample 2",
    },
]

dataList = [
    {
        "id": "1",
        "created_at": "date 1",
        "name": "name sample 1",
        "address": "address sample 1",
    },
    {
        "id": "2",
        "created_at": "date 2",
        "name": "name sample 2",
        "address": "address sample 2",
    },
]
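The expected result, finalList, adds the original_name and original_address fields to each dictionary in dataList. These values come from listA and listB respectively and are found by matching on the name and address keys.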
finalList = [
    {
        "id": "1",
        "created_at": "date 1",
        "name": "name sample 1",
        "original_name": "original name sample 1",
        "address": "address sample 1",
        "original_address": "original address sample 1",
    },
    {
        "id": "2",
        "created_at": "date 2",
        "name": "name sample 2",
        "original_name": "original name sample 2",
        "address": "address sample 2",
        "original_address": "original address sample 2",
    },
]
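An intuitive approach is to use nested loops. First, copy dataList so the original data is not modified. Then iterate over every entry in listA and listB, look for the matching items in finalList, and update them when a match is found.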
from copy import deepcopy

listA = [
    {"name": "name sample 1", "original_name": "original name sample 1"},
    {"name": "name sample 2", "original_name": "original name sample 2"},
]
listB = [
    {"address": "address sample 1", "original_address": "original address sample 1"},
    {"address": "address sample 2", "original_address": "original address sample 2"},
]
dataList = [
    {"id": "1", "created_at": "date 1", "name": "name sample 1", "address": "address sample 1"},
    {"id": "2", "created_at": "date 2", "name": "name sample 2", "address": "address sample 2"},
]

# Use deepcopy so that the original dataList is not affected
finalList = deepcopy(dataList)

# Concatenate listA and listB so both are handled in one loop.
# This scans finalList once per entry, which is inefficient but easy to follow.
for entry in listA + listB:
    if "name" in entry:  # entry comes from listA
        for data_item in finalList:
            # .get() avoids a KeyError when an item has no 'name' field
            if data_item.get('name') == entry['name']:
                data_item['original_name'] = entry['original_name']
    elif "address" in entry:  # entry comes from listB
        for data_item in finalList:
            if data_item.get('address') == entry['address']:
                data_item['original_address'] = entry['original_address']

print("Original dataList:", dataList)
print("Merged finalList:", finalList)
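Code walkthrough: deepcopy(dataList) gives finalList its own copies of the dictionaries, so updating finalList leaves dataList unchanged. The outer loop walks through listA + listB and uses the presence of a name or address key to tell which list an entry came from. The inner loop scans finalList and writes original_name or original_address into every item whose key matches; .get() returns None instead of raising KeyError when an item lacks that field.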
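To illustrate why the code above calls deepcopy instead of making a plain list copy, here is a minimal sketch that compares the two. The variable names (records, shallow, deep) are illustrative only and do not appear in the original example.

```python
from copy import deepcopy

records = [{"id": "1", "name": "name sample 1"}]

# list.copy() creates a new list, but its elements are the same dict objects.
shallow = records.copy()
shallow[0]["original_name"] = "original name sample 1"
shallow_leaked = "original_name" in records[0]  # True: the original was modified

# Start over and use deepcopy, which also copies the nested dicts.
records = [{"id": "1", "name": "name sample 1"}]
deep = deepcopy(records)
deep[0]["original_name"] = "original name sample 1"
deep_leaked = "original_name" in records[0]  # False: the original is untouched

print(shallow_leaked, deep_leaked)  # True False
```

If the dictionaries only hold flat values such as strings, copying each dictionary individually (for example with dict(item)) would also protect the original; deepcopy is the safer choice when values may themselves be lists or dictionaries.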
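Note: This approach is easy to understand and implement when the data is small, but its time complexity is high. If dataList has N elements, listA has M, and listB has P, then filling in original_name takes M * N comparisons and filling in original_address takes P * N. The total time complexity is O((M+P)*N), which is inefficient for large datasets.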
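The (M+P)*N figure can be checked empirically. The sketch below is an instrumented variant of the nested-loop merge that counts key comparisons instead of updating anything; the function name count_comparisons and the generated sample data are assumptions made for illustration.

```python
def count_comparisons(list_a, list_b, data_list):
    """Count the key comparisons the nested-loop merge performs."""
    comparisons = 0
    matches = 0
    for entry in list_a + list_b:
        key = "name" if "name" in entry else "address"
        for item in data_list:
            comparisons += 1
            if item.get(key) == entry[key]:
                matches += 1
    return comparisons, matches


m, p, n = 100, 50, 100  # sizes of listA, listB and dataList
list_a = [{"name": f"name {i}", "original_name": f"original name {i}"} for i in range(m)]
list_b = [{"address": f"address {i}", "original_address": f"original address {i}"} for i in range(p)]
data_list = [{"id": str(i), "name": f"name {i}", "address": f"address {i}"} for i in range(n)]

comparisons, matches = count_comparisons(list_a, list_b, data_list)
print(comparisons)  # (100 + 50) * 100 = 15000
print(matches)      # 100 name matches + 50 address matches = 150
```

Only 150 of the 15,000 comparisons find a match; the rest is wasted scanning, and that overhead is what grows quickly with larger inputs.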
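To improve efficiency, especially when listA, listB, or dataList is large, we can take advantage of the O(1) average lookup time of hash tables. The core idea is to preprocess listA and listB into dictionaries (hash tables) keyed by name and address, so that the corresponding original_name and original_address can be looked up quickly.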
from copy import deepcopy

listA = [
    {"name": "name sample 1", "original_name": "original name sample 1"},
    {"name": "name sample 2", "original_name": "original name sample 2"},
]
listB = [
    {"address": "address sample 1", "original_address": "original address sample 1"},
    {"address": "address sample 2", "original_address": "original address sample 2"},
]
dataList = [
    {"id": "1", "created_at": "date 1", "name": "name sample 1", "address": "address sample 1"},
    {"id": "2", "created_at": "date 2", "name": "name sample 2", "address": "address sample 2"},
]

# 1. Preprocess listA and listB into dictionaries for fast lookup
name_map = {item['name']: item['original_name'] for item in listA}
address_map = {item['address']: item['original_address'] for item in listB}

# 2. Create finalList as a copy of dataList
finalList = deepcopy(dataList)

# 3. Walk through finalList and add the new fields from the lookup maps
for data_item in finalList:
    # Look up and add original_name
    name_key = data_item.get('name')
    if name_key in name_map:
        data_item['original_name'] = name_map[name_key]
    # else: optionally handle the no-match case, e.g. set a default value or skip

    # Look up and add original_address
    address_key = data_item.get('address')
    if address_key in address_map:
        data_item['original_address'] = address_map[address_key]
    # else: optionally handle the no-match case

print("Original dataList:", dataList)
print("Optimized merged finalList:", finalList)
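Code walkthrough: name_map and address_map are built with dictionary comprehensions in a single pass over listA and listB. The main loop then visits each item in finalList once, looks up its name and address in the two maps, and adds the new fields only when a match exists.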
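Performance advantage: The total time complexity of this optimized approach is O(M + P + N), far better than the O((M+P)*N) of the nested loops, and the gain is significant for large datasets.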
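One way to observe the difference is to time both approaches on generated data. The following is a rough sketch, not a rigorous benchmark: the function names merge_nested and merge_with_maps, the data size, and the repeat count are all assumptions chosen for illustration, and the measured times will vary by machine.

```python
import timeit
from copy import deepcopy


def merge_nested(list_a, list_b, data_list):
    """Nested-loop merge: scans the whole result list for every entry."""
    final_list = deepcopy(data_list)
    for entry in list_a + list_b:
        if "name" in entry:
            key, field = "name", "original_name"
        else:
            key, field = "address", "original_address"
        for item in final_list:
            if item.get(key) == entry[key]:
                item[field] = entry[field]
    return final_list


def merge_with_maps(list_a, list_b, data_list):
    """Dictionary-based merge: one pass to build the maps, one pass to merge."""
    name_map = {e["name"]: e["original_name"] for e in list_a}
    address_map = {e["address"]: e["original_address"] for e in list_b}
    final_list = deepcopy(data_list)
    for item in final_list:
        name = item.get("name")
        if name in name_map:
            item["original_name"] = name_map[name]
        address = item.get("address")
        if address in address_map:
            item["original_address"] = address_map[address]
    return final_list


size = 500  # hypothetical size; increase it to widen the gap
list_a = [{"name": f"name {i}", "original_name": f"original name {i}"} for i in range(size)]
list_b = [{"address": f"address {i}", "original_address": f"original address {i}"} for i in range(size)]
data_list = [{"id": str(i), "name": f"name {i}", "address": f"address {i}"} for i in range(size)]

# Both approaches should produce the same merged data.
assert merge_nested(list_a, list_b, data_list) == merge_with_maps(list_a, list_b, data_list)

nested_time = timeit.timeit(lambda: merge_nested(list_a, list_b, data_list), number=3)
map_time = timeit.timeit(lambda: merge_with_maps(list_a, list_b, data_list), number=3)
print(f"nested loops: {nested_time:.4f}s, dictionary lookup: {map_time:.4f}s")
```

Both functions include the deepcopy cost, so the comparison isolates the difference in lookup strategy.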
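In practice, beyond the two approaches above, a few additional factors need to be considered:
Key uniqueness: if the same name or address appears more than once in listA or listB, the dictionary comprehension silently keeps only the last value.
Missing data: an item in dataList that has no match gets no new field unless you set a default value explicitly.
Memory consumption: the lookup dictionaries and the deep copy of dataList both require additional memory.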
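The key-uniqueness and missing-data concerns can be handled explicitly. The sketch below shows one possible way, assuming that duplicate keys should be treated as an error and that unmatched items should receive None; the helper name build_lookup and both of those policies are illustrative assumptions, not part of the original example.

```python
from collections import Counter


def build_lookup(records, key_field, value_field):
    """Build {key: value} from records, raising if any key is duplicated."""
    counts = Counter(r[key_field] for r in records if key_field in r)
    duplicates = [key for key, count in counts.items() if count > 1]
    if duplicates:
        raise ValueError(f"duplicate {key_field!r} values: {duplicates}")
    return {r[key_field]: r.get(value_field) for r in records if key_field in r}


list_a = [{"name": "name sample 1", "original_name": "original name sample 1"}]
data_list = [
    {"id": "1", "name": "name sample 1"},
    {"id": "2", "name": "name sample 3"},  # no matching entry in list_a
    {"id": "3"},                           # no name field at all
]

name_map = build_lookup(list_a, "name", "original_name")

# dict.get() returns None for unmatched or missing names, so every
# merged item ends up with an original_name field.
final_list = [
    {**item, "original_name": name_map.get(item.get("name"))}
    for item in data_list
]
print(final_list)
```

Whether to raise, keep the first value, or keep the last value on duplicates depends on the data; the point is to make that choice deliberately rather than rely on the comprehension's default behavior.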
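This tutorial covered two main ways to merge a list of dictionaries with data from other lists by matching on key values in Python: nested loops, which are simple and intuitive but cost O((M+P)*N), and dictionary preprocessing, which builds lookup tables first and costs O(M + P + N).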
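In real-world development, the dictionary preprocessing approach is recommended because it performs better on large datasets. Also be sure to consider key uniqueness, handling of missing data, and memory consumption in order to build a robust and efficient data processing pipeline.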
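As a closing sketch, the dictionary preprocessing approach can be wrapped in a small reusable function that accepts any number of auxiliary lists. The function name enrich and its (records, key_field, value_field) source format are hypothetical. It copies each item with dict(item) rather than deepcopy, which assumes the values are flat (for example strings) and trades some safety for lower memory use.

```python
def enrich(data_list, sources):
    """Return new dicts with fields added from each (records, key_field, value_field) source."""
    # Build one lookup table per source up front.
    lookups = [
        (key_field, value_field, {r[key_field]: r[value_field] for r in records})
        for records, key_field, value_field in sources
    ]
    result = []
    for item in data_list:
        merged = dict(item)  # shallow copy: the input dict is left unmodified
        for key_field, value_field, lookup in lookups:
            key = item.get(key_field)
            if key in lookup:
                merged[value_field] = lookup[key]
        result.append(merged)
    return result


listA = [
    {"name": "name sample 1", "original_name": "original name sample 1"},
    {"name": "name sample 2", "original_name": "original name sample 2"},
]
listB = [
    {"address": "address sample 1", "original_address": "original address sample 1"},
    {"address": "address sample 2", "original_address": "original address sample 2"},
]
dataList = [
    {"id": "1", "created_at": "date 1", "name": "name sample 1", "address": "address sample 1"},
    {"id": "2", "created_at": "date 2", "name": "name sample 2", "address": "address sample 2"},
]

finalList = enrich(
    dataList,
    [(listA, "name", "original_name"), (listB, "address", "original_address")],
)
print(finalList)
```

Adding a third auxiliary list then only requires one more tuple in the sources argument.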