
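This tutorial explains how to use the Java DOM parser to process XML files that contain multiple levels of nested, cross-referenced data. It first corrects a common misconception about the global search performed by getElementsByTagName, and shows how to scope a lookup to a parent node for precise results. It then shows how to use Java objects and Map structures to aggregate data from different XML nodes and print it per record, linked by reference IDs. This makes complex XML data easier to manage and display.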
Understanding the XML Structure and Java DOM Parsing Basics
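Complex XML data is often spread across different levels, with pieces that reference each other. For this kind of data, Java's Document Object Model (DOM) parser is a common and effective tool. A DOM parser loads the entire XML document into memory and represents it as a tree, which you can then traverse to read and modify the data.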
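Here is a minimal sketch of that tree model. It parses a small XML string and walks the resulting DOM tree recursively, printing one indented line per element. The class name DomTreeWalk and the input string are illustrative and do not come from the original article.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomTreeWalk {
    // Depth-first traversal: appends each element's name, indented two spaces per level
    static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    // Parses an XML string into an in-memory Document and returns its element outline
    static String outline(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(outline("<a><b><c/></b><d/></a>"));
    }
}
```

Run as-is, it prints a on the first line. Below it, b and d appear indented one level, and c appears indented two levels under b.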
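Consider an employee XML document with three main sections (employee_list, position_details, and employee_info) that are linked to each other through ref attributes.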
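Each employee element in employee_list carries an ID attribute, plus firstname, lastname, and age child elements. It also contains a position-skill element and a detail-ref element. The ref attributes of these two elements point to:
- a position in position_details, with role, skill_name, and experience;
- a detail in employee_info, with username, residence, yearOfBirth, and phone.
Each position and detail is identified by its own ID attribute. The sample employee's values are Andrei, Rus, and 23. The linked position's values are Junior Developer, Java, and 1. The linked detail's values are AndreiR, Timisoara, 1999, and 0.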
The goal is to parse this data and print all of the related information for each employee in a single, uniform format.
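The later snippets need concrete input to run against. The following sketch embeds a hypothetical reconstruction of such a document as a string and parses it from memory instead of from a file. The element names follow the code in this tutorial. The following are all assumptions and not part of any original file:
- the root name employees;
- the ID values 1, P1, and D1;
- the class name SampleXml.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical reconstruction: the root name and all ID values are assumptions
class SampleXml {
    static final String XML =
          "<employees>"
        + "  <employee_list>"
        + "    <employee ID=\"1\">"
        + "      <firstname>Andrei</firstname><lastname>Rus</lastname><age>23</age>"
        + "      <position-skill ref=\"P1\"/>"
        + "      <detail-ref ref=\"D1\"/>"
        + "    </employee>"
        + "  </employee_list>"
        + "  <position_details>"
        + "    <position ID=\"P1\">"
        + "      <role>Junior Developer</role><skill_name>Java</skill_name><experience>1</experience>"
        + "    </position>"
        + "  </position_details>"
        + "  <employee_info>"
        + "    <detail ID=\"D1\">"
        + "      <username>AndreiR</username><residence>Timisoara</residence>"
        + "      <yearOfBirth>1999</yearOfBirth><phone>0</phone>"
        + "    </detail>"
        + "  </employee_info>"
        + "</employees>";

    // Parses the embedded string rather than a file on disk
    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse();
        System.out.println("Root: " + doc.getDocumentElement().getNodeName());
        System.out.println("employee elements: " + doc.getElementsByTagName("employee").getLength());
        System.out.println("role: " + doc.getElementsByTagName("role").item(0).getTextContent());
    }
}
```

getElementsByTagName matches tag names exactly, so "employee" does not also match "employee_list".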
Initializing the DOM Parser
Parsing XML with DOM requires a few standard initialization steps:
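- Create a DocumentBuilderFactory instance: the factory that produces DocumentBuilder objects.
- Create a DocumentBuilder instance: the object that parses the XML document.
- Parse the XML file: this produces a Document object that represents the DOM tree of the entire XML document.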
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.ArrayList;
import java.util.List;
public class XmlParserTutorial {
    public static void main(String[] args) {
        try {
            File xmlDoc = new File("employees.xml"); // make sure the XML file is in the project root, or adjust the path
            DocumentBuilderFactory dbFact = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuild = dbFact.newDocumentBuilder();
            Document doc = dBuild.parse(xmlDoc);
            // Optional: normalize the document, merging adjacent text nodes
            doc.getDocumentElement().normalize();
            System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
            System.out.println("-----------------------------------------------------------------------------");
            // The parsing logic from the following sections goes here
            // ...
        } catch (Exception e) {
            e.printStackTrace(); // print the stack trace to aid debugging
        }
    }
}
Precise node lookup: avoiding the global-search pitfall of getElementsByTagName
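Document.getElementsByTagName(tagName) searches the entire document, starting from and including the root element. It returns every element with a matching tag name, in document order. This can produce unexpected results. For example, the root element itself may have the same tag name as a child element, or same-named elements may appear at different levels of the document. In either case, the global search returns extra or unintended nodes.
To avoid this, call getElementsByTagName on a more specific parent element. Element.getElementsByTagName searches only the descendants of that element, not the element itself, so the search is limited to the relevant section. Note that it still matches descendants at any depth, not only direct children.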
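The following sketch checks these scoping rules on a small hypothetical document. In it, the root element is itself named employee, and one employee is nested inside another under a made-up manager element. The sketch compares three lookups:
- the document-wide search;
- the search scoped to employee_list;
- a hypothetical childElements helper that keeps only direct children, for cases where nested matches must be excluded.

The class name ScopeDemo and the document are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ScopeDemo {
    // Hypothetical document: the root is also named "employee", and employee 9
    // is nested inside employee 1 rather than being a direct child of employee_list
    static final String XML =
          "<employee>"
        + "<employee_list>"
        + "<employee ID=\"1\"><manager><employee ID=\"9\"/></manager></employee>"
        + "<employee ID=\"2\"/>"
        + "</employee_list>"
        + "</employee>";

    // Hypothetical helper: returns only the direct element children with the given tag
    static List<Element> childElements(Element parent, String tag) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                result.add((Element) n);
            }
        }
        return result;
    }

    static int[] counts() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        Element list = (Element) doc.getElementsByTagName("employee_list").item(0);
        return new int[] {
            doc.getElementsByTagName("employee").getLength(),  // whole document, root included: 4
            list.getElementsByTagName("employee").getLength(), // all descendants of employee_list: 3
            childElements(list, "employee").size()             // direct children only: 2
        };
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(counts())); // [4, 3, 2]
    }
}
```

The scoped call drops the root match but still includes the nested employee 9. A direct-children filter like childElements is only needed when such nesting can occur.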
Incorrect example (may also match the root element or same-named elements in other sections):
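// Suppose the root element is also named "employee", or "employee" tags appear elsewhere
NodeList nList = doc.getElementsByTagName("employee"); // may return more nodes than expected
Correct approach: limit the search scope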
For the employee_list section, first locate the employee_list element, then search for employee elements beneath it:
// Locate the employee_list node
NodeList employeeListNodes = doc.getElementsByTagName("employee_list");
Element employeeListElement = (Element) employeeListNodes.item(0); // assumes exactly one employee_list (item(0) is null if there is none)
// Search for employee nodes under employeeListElement
NodeList employeeNodes = employeeListElement.getElementsByTagName("employee");
System.out.println("Total employees found: " + employeeNodes.getLength());
Apply the same strategy to position_details and employee_info:
// Locate the position_details node, then search for position elements beneath it
NodeList positionDetailsNodes = doc.getElementsByTagName("position_details");
Element positionDetailsElement = (Element) positionDetailsNodes.item(0);
NodeList positionNodes = positionDetailsElement.getElementsByTagName("position");
System.out.println("Total positions found: " + positionNodes.getLength());
// Locate the employee_info node, then search for detail elements beneath it
NodeList employeeInfoNodes = doc.getElementsByTagName("employee_info");
Element employeeInfoElement = (Element) employeeInfoNodes.item(0);
NodeList detailNodes = employeeInfoElement.getElementsByTagName("detail");
System.out.println("Total details found: " + detailNodes.getLength());
Data aggregation: building linked Java objects
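To print the output grouped by employee, merge the related data from the different XML sections into a single Java object per employee. This takes three steps:
- Define a data model (POJO): create a Java class that represents one complete employee record.
- Pre-parse the auxiliary data: load position_details and employee_info into Maps so that entries can be looked up quickly by ID.
- Iterate over the primary data and link it: for each employee in employee_list, use its ref attributes to look up the related data in the Maps and fill in the record.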
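Without the DOM details, steps 2 and 3 form a hash join. The auxiliary data is indexed by ID once, then the primary list is traversed in a single pass with one map lookup per reference. This avoids rescanning the auxiliary section for every employee. The following sketch uses plain strings instead of DOM elements. The class name RefJoinSketch is hypothetical, as are the position ID P1 and the second employee Jane Doe, whose reference P9 deliberately matches nothing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RefJoinSketch {
    // Step 2 stand-in: an ID -> role index built once (hypothetical data)
    static Map<String, String> roleIndex() {
        Map<String, String> roles = new HashMap<>();
        roles.put("P1", "Junior Developer");
        return roles;
    }

    // Step 3: one pass over the primary list, one O(1) average-case lookup per reference.
    // Each entry is a {name, positionRef} pair; an unknown ref yields a placeholder instead of null.
    static List<String> join(List<String[]> employees, Map<String, String> roles) {
        List<String> out = new ArrayList<>();
        for (String[] emp : employees) {
            out.add(emp[0] + " -> " + roles.getOrDefault(emp[1], "(no matching position)"));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> employees = new ArrayList<>();
        employees.add(new String[] {"Andrei Rus", "P1"});
        employees.add(new String[] {"Jane Doe", "P9"}); // hypothetical dangling reference
        for (String line : join(employees, roleIndex())) {
            System.out.println(line);
        }
    }
}
```

The DOM version in this tutorial checks for null after Map.get instead of using getOrDefault. In that version, a dangling reference simply leaves the corresponding fields of the record unset.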
1. Define the EmployeeRecord data model
// EmployeeRecord.java
class EmployeeRecord {
    private String id;
    private String firstname;
    private String lastname;
    private String age;
    private String role;
    private String skillName;
    private String experience;
    private String username;
    private String residence;
    private String yearOfBirth;
    private String phone;

    // Constructor, getters and setters
    public EmployeeRecord() {}

    public void setId(String id) { this.id = id; }
    public String getId() { return id; }
    public void setFirstname(String firstname) { this.firstname = firstname; }
    public String getFirstname() { return firstname; }
    public void setLastname(String lastname) { this.lastname = lastname; }
    public String getLastname() { return lastname; }
    public void setAge(String age) { this.age = age; }
    public String getAge() { return age; }
    public void setRole(String role) { this.role = role; }
    public String getRole() { return role; }
    public void setSkillName(String skillName) { this.skillName = skillName; }
    public String getSkillName() { return skillName; }
    public void setExperience(String experience) { this.experience = experience; }
    public String getExperience() { return experience; }
    public void setUsername(String username) { this.username = username; }
    public String getUsername() { return username; }
    public void setResidence(String residence) { this.residence = residence; }
    public String getResidence() { return residence; }
    public void setYearOfBirth(String yearOfBirth) { this.yearOfBirth = yearOfBirth; }
    public String getYearOfBirth() { return yearOfBirth; }
    public void setPhone(String phone) { this.phone = phone; }
    public String getPhone() { return phone; }

    @Override
    public String toString() {
        return "Person ID: " + id + "\n" +
               "First Name: " + firstname + "\n" +
               "Last Name: " + lastname + "\n" +
               "Age: " + age + "\n" +
               "Role: " + role + "\n" +
               "Skill Name: " + skillName + "\n" +
               "Experience: " + experience + "\n" +
               "Username: " + username + "\n" +
               "Residence: " + residence + "\n" +
               "Year of Birth: " + yearOfBirth + "\n" +
               "Phone: " + phone + "\n" +
               "--------------------------------------------------------------------------";
    }
}
2. Pre-parse position_details and employee_info into Maps
In the main method of XmlParserTutorial, parse the two auxiliary sections before processing employee_list:
// ... (DOM initialization code from above) ...

// Map of position details, keyed by position ID
Map<String, Element> positionDetailsMap = new HashMap<>();
NodeList positionDetailsNodes = doc.getElementsByTagName("position_details");
if (positionDetailsNodes.getLength() > 0) {
    Element positionDetailsElement = (Element) positionDetailsNodes.item(0);
    NodeList positionNodes = positionDetailsElement.getElementsByTagName("position");
    for (int i = 0; i < positionNodes.getLength(); i++) {
        Node node = positionNodes.item(i);
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element positionElement = (Element) node;
            positionDetailsMap.put(positionElement.getAttribute("ID"), positionElement);
        }
    }
}

// Map of additional employee information, keyed by detail ID
Map<String, Element> employeeInfoMap = new HashMap<>();
NodeList employeeInfoNodes = doc.getElementsByTagName("employee_info");
if (employeeInfoNodes.getLength() > 0) {
    Element employeeInfoElement = (Element) employeeInfoNodes.item(0);
    NodeList detailNodes = employeeInfoElement.getElementsByTagName("detail");
    for (int i = 0; i < detailNodes.getLength(); i++) {
        Node node = detailNodes.item(i);
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element detailElement = (Element) node;
            employeeInfoMap.put(detailElement.getAttribute("ID"), detailElement);
        }
    }
}

// ... (code that parses employee_list follows) ...
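The two loops that build these Maps differ only in their tag names, so they could be folded into one helper. This is a sketch under assumptions: the helper name indexById, its parameters, and the class SectionIndexer are hypothetical. Unlike HashMap.put, which silently keeps the last element when two share an ID, this helper uses putIfAbsent and rejects duplicates.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SectionIndexer {
    // Hypothetical helper: maps each <childTag> under the first <sectionTag> by its idAttr value.
    // Returns an empty map when the section is absent; throws on duplicate IDs.
    static Map<String, Element> indexById(Document doc, String sectionTag, String childTag, String idAttr) {
        Map<String, Element> index = new HashMap<>();
        NodeList sections = doc.getElementsByTagName(sectionTag);
        if (sections.getLength() == 0) {
            return index;
        }
        // getElementsByTagName only returns Element nodes, so these casts are safe
        NodeList children = ((Element) sections.item(0)).getElementsByTagName(childTag);
        for (int i = 0; i < children.getLength(); i++) {
            Element child = (Element) children.item(i);
            String id = child.getAttribute(idAttr);
            // putIfAbsent returns the existing value when the ID is already present
            if (index.putIfAbsent(id, child) != null) {
                throw new IllegalStateException(
                        "Duplicate " + idAttr + " '" + id + "' in <" + sectionTag + ">");
            }
        }
        return index;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><position_details>"
                + "<position ID=\"P1\"><role>Junior Developer</role></position>"
                + "</position_details></root>");
        Map<String, Element> positions = indexById(doc, "position_details", "position", "ID");
        System.out.println(positions.keySet()); // [P1]
    }
}
```

With such a helper, the two Maps above would be built with these calls:
- indexById(doc, "position_details", "position", "ID")
- indexById(doc, "employee_info", "detail", "ID")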
3. Iterate over employee_list and aggregate the data
Now iterate over each employee in employee_list, and use the Maps built above to look up and attach the related data.
// List holding every fully assembled employee record
List<EmployeeRecord> allEmployeeRecords = new ArrayList<>();
NodeList employeeListNodes = doc.getElementsByTagName("employee_list");
if (employeeListNodes.getLength() > 0) {
    Element employeeListElement = (Element) employeeListNodes.item(0);
    NodeList employeeNodes = employeeListElement.getElementsByTagName("employee");
    for (int i = 0; i < employeeNodes.getLength(); i++) {
        Node node = employeeNodes.item(i);
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element employeeElement = (Element) node;
            EmployeeRecord record = new EmployeeRecord();
            // Data stored directly in employee_list
            record.setId(employeeElement.getAttribute("ID"));
            record.setFirstname(getTagValue("firstname", employeeElement));
            record.setLastname(getTagValue("lastname", employeeElement));
            record.setAge(getTagValue("age", employeeElement));
            // Read the position-skill ref and look up the linked position
            NodeList positionSkillList = employeeElement.getElementsByTagName("position-skill");
            if (positionSkillList.getLength() > 0) {
                String positionSkillRef = ((Element) positionSkillList.item(0)).getAttribute("ref");
                Element positionElement = positionDetailsMap.get(positionSkillRef);
                if (positionElement != null) {
                    record.setRole(getTagValue("role", positionElement));
                    record.setSkillName(getTagValue("skill_name", positionElement));
                    record.setExperience(getTagValue("experience", positionElement));
                }
            }
            // Read the detail-ref ref and look up the linked employee_info detail
            NodeList detailRefList = employeeElement.getElementsByTagName("detail-ref");
            if (detailRefList.getLength() > 0) {
                String detailRef = ((Element) detailRefList.item(0)).getAttribute("ref");
                Element detailElement = employeeInfoMap.get(detailRef);
                if (detailElement != null) {
                    record.setUsername(getTagValue("username", detailElement));
                    record.setResidence(getTagValue("residence", detailElement));
                    record.setYearOfBirth(getTagValue("yearOfBirth", detailElement));
                    record.setPhone(getTagValue("phone", detailElement));
                }
            }
            allEmployeeRecords.add(record);
        }
    }
}

// Print every aggregated employee record
System.out.println("\n=============================================================================================");
System.out.println("Aggregated Employee Records:");
System.out.println("=============================================================================================");
for (EmployeeRecord record : allEmployeeRecords) {
    System.out.println(record);
}

// Helper method, declared in XmlParserTutorial outside main:
// safely returns the text content of the first matching descendant tag
private static String getTagValue(String tagName, Element element) {
    NodeList nodeList = element.getElementsByTagName(tagName);
    if (nodeList != null && nodeList.getLength() > 0) {
        Node node = nodeList.item(0);
        if (node != null && node.getNodeType() == Node.ELEMENT_NODE) {
            return node.getTextContent();
        }
    }
    return ""; // empty string when the tag is not found
}
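The aggregation step reads the ref attribute of the position-skill and detail-ref elements. DOM offers two common ways to do that, and they behave differently when the attribute is missing:
- Element.getAttribute returns an empty string.
- getAttributes().getNamedItem returns null, so chaining .getNodeValue() onto it throws a NullPointerException.

The following sketch demonstrates both on a hypothetical employee whose detail-ref has no ref attribute. The class name RefReading is illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RefReading {
    // Hypothetical input: detail-ref deliberately lacks a ref attribute
    static final String XML =
        "<employee><position-skill ref=\"P1\"/><detail-ref/></employee>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    // First element with the given tag (assumes it exists)
    static Element first(Document doc, String tag) {
        return (Element) doc.getElementsByTagName(tag).item(0);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse();
        Element positionSkill = first(doc, "position-skill");
        Element detailRef = first(doc, "detail-ref");

        // getAttribute never returns null: "" means the attribute is absent
        System.out.println("[" + positionSkill.getAttribute("ref") + "]"); // [P1]
        System.out.println("[" + detailRef.getAttribute("ref") + "]");     // []

        // getNamedItem returns null here, so .getNodeValue() on it would throw
        Node missing = detailRef.getAttributes().getNamedItem("ref");
        System.out.println(missing == null);                              // true
        System.out.println(detailRef.hasAttribute("ref"));                // false
    }
}
```

With getAttribute, a missing ref becomes a lookup for the key "" in the Map. That lookup returns null, so the existing null check already handles the case.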
Complete example code
Combine all of the snippets above into a single XmlParserTutorial.java file, and make sure the EmployeeRecord class is in the same package or otherwise accessible.
// XmlParserTutorial.java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class XmlParserTutorial {
    public static void main(String[] args) {
        try {
            File xmlDoc = new File("employees.xml"); // make sure the XML file is in the project root, or adjust the path
            DocumentBuilderFactory dbFact = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuild = dbFact.newDocumentBuilder();
            Document doc = dBuild.parse(xmlDoc);
            doc.getDocumentElement().normalize(); // normalize the XML document
            System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
            System.out.println("-----------------------------------------------------------------------------");
            // 1. Pre-parse position_details into a Map keyed by position ID
            Map<String, Element> positionDetailsMap = new HashMap<>();
            NodeList positionDetailsNodes = doc.getElementsByTagName("position_details");
            if (positionDetailsNodes.getLength() > 0) {
                Element positionDetailsElement = (Element) positionDetailsNodes.item(0);
                NodeList positionNodes = positionDetailsElement.getElementsByTagName("position");
                for (int i = 0; i < positionNodes.getLength(); i++) {
                    Node node = positionNodes.item(i);
                    if (node.getNodeType() == Node.ELEMENT_NODE) {
                        Element positionElement = (Element) node;
                        positionDetailsMap.put(positionElement.getAttribute("ID"), positionElement);
                    }
                }
            }
            // 2. Pre-parse employee_info into a Map keyed by detail ID
            Map<String, Element> employeeInfoMap = new HashMap<>();
            NodeList employeeInfoNodes = doc.getElementsByTagName("employee_info");
            if (employeeInfoNodes.getLength() > 0) {
                Element employeeInfoElement = (Element) employeeInfoNodes.item(0);
                NodeList detailNodes = employeeInfoElement.getElementsByTagName("detail");
                for (int i = 0; i < detailNodes.getLength(); i++) {
                    Node node = detailNodes.item(i);
                    if (node.getNodeType() == Node.ELEMENT_NODE) {
                        Element detailElement = (Element) node;
                        employeeInfoMap.put(detailElement.getAttribute("ID"), detailElement);
                    }
                }
            }
            // 3. Iterate over employee_list and aggregate the data
            List<EmployeeRecord> allEmployeeRecords = new ArrayList<>();
            NodeList employeeListNodes = doc.getElementsByTagName("employee_list");
            if (employeeListNodes.getLength() > 0) {
                Element employeeListElement = (Element) employeeListNodes.item(0);
                NodeList employeeNodes = employeeListElement.getElementsByTagName("employee");
                for (int i = 0; i < employeeNodes.getLength(); i++) {
                    Node node = employeeNodes.item(i);
                    if (node.getNodeType() == Node.ELEMENT_NODE) {
                        Element employeeElement = (Element) node;
                        EmployeeRecord record = new EmployeeRecord();
                        // Data stored directly in employee_list
                        record.setId(employeeElement.getAttribute("ID"));
                        record.setFirstname(getTagValue("firstname", employeeElement));
                        record.setLastname(getTagValue("lastname", employeeElement));
                        record.setAge(getTagValue("age", employeeElement));
                        // Read the position-skill ref and look up the linked position
                        NodeList positionSkillList = employeeElement.getElementsByTagName("position-skill");
                        if (positionSkillList.getLength() > 0) {
                            String positionSkillRef = ((Element) positionSkillList.item(0)).getAttribute("ref");
                            Element positionElement = positionDetailsMap.get(positionSkillRef);
                            if (positionElement != null) {
                                record.setRole(getTagValue("role", positionElement));
                                record.setSkillName(getTagValue("skill_name", positionElement));
                                record.setExperience(getTagValue("experience", positionElement));
                            }
                        }
                        // Read the detail-ref ref and look up the linked employee_info detail
                        NodeList detailRefList = employeeElement.getElementsByTagName("detail-ref");
                        if (detailRefList.getLength() > 0) {
                            String detailRef = ((Element) detailRefList.item(0)).getAttribute("ref");
                            Element detailElement = employeeInfoMap.get(detailRef);
                            if (detailElement != null) {
                                record.setUsername(getTagValue("username", detailElement));
                                record.setResidence(getTagValue("residence", detailElement));
                                record.setYearOfBirth(getTagValue("yearOfBirth", detailElement));
                                record.setPhone(getTagValue("phone", detailElement));
                            }
                        }
                        allEmployeeRecords.add(record);
                    }
                }
            }
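            // Print every aggregated employee record
            System.out.println("\n=============================================================================================");
            System.out.println("Aggregated Employee Records:");
            System.out.println("=============================================================================================");
            for (EmployeeRecord record : allEmployeeRecords) {
                System.out.println(record);
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }

    // Helper method: safely returns the text content of the first matching descendant tag
    private static String getTagValue(String tagName, Element element) {
        NodeList nodeList = element.getElementsByTagName(tagName);
        if (nodeList != null && nodeList.getLength() > 0) {
            Node node = nodeList.item(0);
            if (node != null && node.getNodeType() == Node.ELEMENT_NODE) {
                return node.getTextContent();
            }
        }
        return ""; // empty string when the tag is not found
    }
}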