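In data processing and integration work, we often need to merge information scattered across several data sources into one unified structure. When the data is a list of dictionaries, matching records on a specific key and extending them with extra fields is a common operation. This tutorial shows how to do that efficiently in Python.
Suppose we have the following three lists, each containing a series of dictionaries: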
listA = [
    {"name": "name sample 1", "original_name": "original name sample 1"},
    {"name": "name sample 2", "original_name": "original name sample 2"},
    # ... more data
]

listB = [
    {"address": "address sample 1", "original_address": "original address sample 1"},
    {"address": "address sample 2", "original_address": "original address sample 2"},
    # ... more data
]

dataList = [
    {"id": "1", "created_at": "date 1", "name": "name sample 1", "address": "address sample 1"},
    {"id": "2", "created_at": "date 2", "name": "name sample 2", "address": "address sample 2"},
    # ... more data
]
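Our goal is to create a new list, finalList, that is based on the contents of dataList and extended by the following rules: if an entry's name matches the name of an entry in listA, add that entry's original_name; if an entry's address matches the address of an entry in listB, add that entry's original_address.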
The expected structure of finalList is:
finalList = [
    {
        "id": "1",
        "created_at": "date 1",
        "name": "name sample 1",
        "original_name": "original name sample 1",
        "address": "address sample 1",
        "original_address": "original address sample 1",
    },
    # ...
]
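The first approach, a direct merge with nested loops, is intuitive and easy to follow, and it suits small data sets. The idea is to walk through every entry of the source lists and, for each one, scan the target list for matching records to update.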
from copy import deepcopy

listA = [
    {"name": "name sample 1", "original_name": "original name sample 1"},
    {"name": "name sample 2", "original_name": "original name sample 2"},
]

listB = [
    {"address": "address sample 1", "original_address": "original address sample 1"},
    {"address": "address sample 2", "original_address": "original address sample 2"},
]

dataList = [
    {"id": "1", "created_at": "date 1", "name": "name sample 1", "address": "address sample 1"},
    {"id": "2", "created_at": "date 2", "name": "name sample 2", "address": "address sample 2"},
]

# 1. Deep-copy dataList so the original data is not modified
finalList = deepcopy(dataList)

# 2. Iterate over every entry in listA and listB
for entry in listA + listB:
    # 3. Decide which field to match on from the keys present in the entry
    if "name" in entry:
        # 4. Scan finalList for records with a matching name
        for data_item in finalList:
            if data_item.get("name") == entry["name"]:
                data_item["original_name"] = entry["original_name"]
                # If each name occurs only once in dataList, the inner loop can stop here
                # break
    elif "address" in entry:
        # 5. Scan finalList for records with a matching address
        for data_item in finalList:
            if data_item.get("address") == entry["address"]:
                data_item["original_address"] = entry["original_address"]
                # If each address occurs only once in dataList, the inner loop can stop here
                # break

print("--- Original dataList ---")
print(dataList)
print("\n--- Merged finalList ---")
print(finalList)
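The copy step matters because the loops update records in place. The following is a minimal sketch of what happens with and without a copy; the names data_list, alias, copied and shallow are illustrative only and are not part of the tutorial's code.

```python
from copy import deepcopy

# Hypothetical sample record, shaped like the dataList entries.
data_list = [{"id": "1", "name": "name sample 1"}]

# Plain assignment creates an alias: both names refer to the same list and dicts.
alias = data_list
alias[0]["original_name"] = "original name sample 1"
alias_leaked = "original_name" in data_list[0]  # the source record was modified

# Start again from a fresh record and update a deep copy instead.
data_list = [{"id": "1", "name": "name sample 1"}]
copied = deepcopy(data_list)
copied[0]["original_name"] = "original name sample 1"
copy_leaked = "original_name" in data_list[0]  # the source record is untouched

# For flat dictionaries (no nested lists or dicts), copying each record
# with dict() isolates the records as well.
shallow = [dict(item) for item in data_list]
shallow[0]["original_name"] = "original name sample 1"
shallow_leaked = "original_name" in data_list[0]

print(alias_leaked, copy_leaked, shallow_leaked)  # True False False
```

If the records might contain nested containers, deepcopy remains the safer choice; the per-record dict() copy is only equivalent when every value is immutable, as in the sample data.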
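To improve performance on larger data sets, we can use a hash table (a Python dictionary), which offers O(1) average-time lookups. The idea of this second approach is to convert listA and listB into lookup dictionaries first, so that a single pass over dataList completes the extension. The nested-loop version performs roughly (len(listA) + len(listB)) × len(dataList) comparisons, while this version needs roughly len(listA) + len(listB) + len(dataList) steps.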
from copy import deepcopy

listA = [
    {"name": "name sample 1", "original_name": "original name sample 1"},
    {"name": "name sample 2", "original_name": "original name sample 2"},
]

listB = [
    {"address": "address sample 1", "original_address": "original address sample 1"},
    {"address": "address sample 2", "original_address": "original address sample 2"},
]

dataList = [
    {"id": "1", "created_at": "date 1", "name": "name sample 1", "address": "address sample 1"},
    {"id": "2", "created_at": "date 2", "name": "name sample 2", "address": "address sample 2"},
]

# 1. Build the lookup dictionaries
name_map = {item["name"]: item["original_name"] for item in listA}
address_map = {item["address"]: item["original_address"] for item in listB}

# 2. Deep-copy dataList so the original data is not modified
finalList = deepcopy(dataList)

# 3. Walk through finalList once and update each record from the lookup dictionaries
for data_item in finalList:
    # Look up and add original_name
    name_key = data_item.get("name")
    if name_key in name_map:
        data_item["original_name"] = name_map[name_key]

    # Look up and add original_address
    address_key = data_item.get("address")
    if address_key in address_map:
        data_item["original_address"] = address_map[address_key]

print("--- Original dataList ---")
print(dataList)
print("\n--- Merged finalList (optimized) ---")
print(finalList)
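When more than two source lists are involved, the lookup-dictionary pattern can be wrapped in a small helper. The sketch below is one possible generalization, not part of the original tutorial: the function name enrich and the (source_list, match_key, extra_key) tuple layout are assumptions made for illustration, and the shortened sample values are hypothetical.

```python
from copy import deepcopy


def enrich(records, sources):
    """Return a copy of records with extra fields pulled from the source lists.

    sources is a list of (source_list, match_key, extra_key) tuples.
    """
    # Build one lookup dictionary per source; if a key repeats, the last entry wins.
    lookups = [
        (match_key, extra_key, {row[match_key]: row[extra_key] for row in source})
        for source, match_key, extra_key in sources
    ]
    result = deepcopy(records)  # leave the caller's data untouched
    for record in result:
        for match_key, extra_key, lookup in lookups:
            value = record.get(match_key)
            if value in lookup:
                record[extra_key] = lookup[value]
    return result


# Hypothetical, shortened sample data.
list_a = [{"name": "n1", "original_name": "on1"}]
list_b = [{"address": "a1", "original_address": "oa1"}]
data_list = [
    {"id": "1", "name": "n1", "address": "a1"},
    {"id": "2", "name": "n2", "address": "a1"},
]

final_list = enrich(
    data_list,
    [(list_a, "name", "original_name"), (list_b, "address", "original_address")],
)
print(final_list)
```

In this sample, the second record has no match in list_a, so it gains only original_address; records without a match are simply left as they were.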
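This tutorial covered two ways to merge and extend list-of-dictionary data in Python: a direct merge based on nested loops, and an optimized merge based on hash maps.
In practice, choose between them according to your data size, your performance requirements, and what you can assume about key uniqueness. The hash-map approach is usually the better default because it scales and performs better.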
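The key-uniqueness assumption can be checked before building a lookup dictionary, because a dictionary comprehension silently keeps only the last value for a repeated key. The helper below is a minimal sketch; the name duplicate_keys and the sample rows are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return the values of `key` that occur more than once in rows."""
    counts = Counter(row[key] for row in rows)
    return [value for value, count in counts.items() if count > 1]


# Hypothetical source list in which the name "x" appears twice.
list_a = [
    {"name": "x", "original_name": "first"},
    {"name": "y", "original_name": "only"},
    {"name": "x", "original_name": "second"},
]

repeated = duplicate_keys(list_a, "name")
name_map = {row["name"]: row["original_name"] for row in list_a}

print(repeated)        # ['x']
print(name_map["x"])   # second (the later entry overwrote the earlier one)
```

If the check reports duplicates, decide explicitly which entry should win, or clean the source data, before merging.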