
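This tutorial explains how to use the Java DOM parser to process multi-level XML files. It first points out the problems that the document-wide search performed by `getElementsByTagName` can cause, and shows how to restrict the lookup to a specific parent node. It then introduces an object-oriented data model: POJO classes store and link the parsed data, so the multi-level XML data can be output in a structured, grouped form that is logically clear and easy to manage.
In modern software development, XML is widely used as a general-purpose data exchange format. When an XML file has a complex structure with several levels of nesting and cross-referenced data, parsing it and extracting the required information becomes an important task. Using an employee information XML file as the example, this tutorial shows how to handle such multi-level structures with the Java DOM (Document Object Model) parser and how to organize and output the resulting data.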
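Understanding the Multi-Level XML Structure
We use an employee XML file as the example. It contains three main categories: the employee list (employee_list), position details (position_details), and additional employee information (employee_info). Its element values are: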
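employee_list (employee: firstname, lastname, age)
    Andrei, Rus, 23
    Ion, Popescu, 25
    Georgiana, Domide, 33
position_details (position: role, skill_name, experience)
    Junior Developer, Java, 1
    Developer, Python, 3
    Senior Developer, C, 5
employee_info (detail: username, residence, yearOfBirth, phone)
    AndreiR, Timisoara, 1999, 0
    IonP, Timisoara, 1997, 0
    GeorgianaD, Arad, 1989, 0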
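The root element of this XML file wraps these three categories. Each employee, position, and detail element carries an ID attribute, and each employee refers to its position and detail entries through the ref attributes of its position-skill and detail-ref child elements.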
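Java DOM Parser Basics
The Java DOM parser loads the entire XML document into memory and builds a tree structure (the DOM tree), which lets developers access and manipulate the XML data by navigating that tree. The main classes and interfaces involved are: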
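- DocumentBuilderFactory: creates DocumentBuilder instances.
- DocumentBuilder: parses an XML file and produces a Document object.
- Document: represents the whole XML document and is the root of the DOM tree.
- Node: the basic unit of the DOM tree; it can be an element, an attribute, text, and so on.
- Element: a subtype of Node that represents an XML element.
- NodeList: an ordered collection of Node objects.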
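To show how these types fit together, here is a minimal sketch that parses a small XML string and walks the result. The class name DomBasicsSketch, the helper parse, and the two-item document are assumptions made for illustration; they are not part of the tutorial's employee file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomBasicsSketch {

    // Parses an XML string into a DOM Document; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical two-item document, used only to exercise the API.
        String xml = "<items><item id=\"a\">first</item><item id=\"b\">second</item></items>";
        Document doc = parse(xml);

        // Document -> root Element
        Element root = doc.getDocumentElement();
        System.out.println("Root: " + root.getNodeName());

        // Element -> NodeList of matching descendants -> individual Elements
        NodeList items = root.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            System.out.println(item.getAttribute("id") + " -> " + item.getTextContent());
        }
    }
}
```

Parsing from a string keeps the sketch self-contained; for a file on disk, DocumentBuilder.parse also accepts a java.io.File.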
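The Document-Wide Search of getElementsByTagName
When using Document.getElementsByTagName(String name), pay particular attention to its document-wide search behavior. The method returns every element in the entire XML document whose tag name matches, in document order, not just the direct children of a particular node.
For example, calling doc.getElementsByTagName("employee") directly on the file above returns every employee element anywhere in the document, not only those under employee_list. If the same tag name also appears in another part of the document, those elements end up in the same result, and the counts and output no longer correspond to a single category.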
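The difference between the two search scopes can be seen in a small, self-contained sketch. The document below is hypothetical (a company element with staff and contractors sections that both contain person elements), and the class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class GlobalVsScopedSketch {

    // Hypothetical document: <person> appears under both <staff> and <contractors>.
    static final String XML =
            "<company>"
          + "<staff><person>Ann</person><person>Bob</person></staff>"
          + "<contractors><person>Cy</person></contractors>"
          + "</company>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Document-wide search: counts matching elements anywhere in the document.
    static int countInDocument(Document doc, String tag) {
        return doc.getElementsByTagName(tag).getLength();
    }

    // Scoped search: counts matching descendants of the first <parentTag> element only.
    static int countUnder(Document doc, String parentTag, String tag) {
        NodeList parents = doc.getElementsByTagName(parentTag);
        if (parents.getLength() == 0) {
            return 0;
        }
        Element parent = (Element) parents.item(0);
        return parent.getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(countInDocument(doc, "person"));      // 3
        System.out.println(countUnder(doc, "staff", "person"));  // 2
    }
}
```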
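Scoping the Element Lookup
To avoid the problems of a document-wide search and to look up elements only under a specific parent, first obtain the parent element and then call getElementsByTagName on it. Note that Element.getElementsByTagName still matches all descendants of that element, not only its direct children.
The following code parses the three categories employee_list, position_details, and employee_info in this way: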
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;

public class XmlParser {
    public static void main(String[] args) {
        try {
            File xmlFile = new File("employees.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(xmlFile);
            doc.getDocumentElement().normalize(); // Normalize the tree: merge adjacent text nodes
            System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
            System.out.println("-----------------------------------------------------------------------------");
            // Parse the employee_list category
            System.out.println("Parsing Employee List:");
            NodeList employeeListNodes = doc.getElementsByTagName("employee_list");
            if (employeeListNodes.getLength() > 0) {
                Element employeeListElement = (Element) employeeListNodes.item(0);
                // Search only below employee_list, not in the whole document
                NodeList employees = employeeListElement.getElementsByTagName("employee");
                System.out.println("Total Employees found: " + employees.getLength());
                for (int i = 0; i < employees.getLength(); i++) {
                    Node employeeNode = employees.item(i);
                    if (employeeNode.getNodeType() == Node.ELEMENT_NODE) {
                        Element employeeElement = (Element) employeeNode;
                        System.out.println(" ID: " + employeeElement.getAttribute("ID"));
                        System.out.println(" First Name: " + getTagValue("firstname", employeeElement));
                        System.out.println(" Last Name: " + getTagValue("lastname", employeeElement));
                        System.out.println(" Age: " + getTagValue("age", employeeElement));
                        System.out.println(" Position Skill Ref: " + getAttributeValue("position-skill", "ref", employeeElement));
                        System.out.println(" Detail Ref: " + getAttributeValue("detail-ref", "ref", employeeElement));
                        System.out.println("--------------------------------------------------------------------------");
                    }
                }
            }
            // Parse the position_details category
            System.out.println("\nParsing Position Details:");
            NodeList positionDetailsNodes = doc.getElementsByTagName("position_details");
            if (positionDetailsNodes.getLength() > 0) {
                Element positionDetailsElement = (Element) positionDetailsNodes.item(0);
                NodeList positions = positionDetailsElement.getElementsByTagName("position");
                System.out.println("Total Positions found: " + positions.getLength());
                for (int i = 0; i < positions.getLength(); i++) {
                    Node positionNode = positions.item(i);
                    if (positionNode.getNodeType() == Node.ELEMENT_NODE) {
                        Element positionElement = (Element) positionNode;
                        System.out.println(" ID: " + positionElement.getAttribute("ID"));
                        System.out.println(" Role: " + getTagValue("role", positionElement));
                        System.out.println(" Skill Name: " + getTagValue("skill_name", positionElement));
                        System.out.println(" Experience: " + getTagValue("experience", positionElement));
                        System.out.println("--------------------------------------------------------------------------");
                    }
                }
            }
            // Parse the employee_info category
            System.out.println("\nParsing Employee Info:");
            NodeList employeeInfoNodes = doc.getElementsByTagName("employee_info");
            if (employeeInfoNodes.getLength() > 0) {
                Element employeeInfoElement = (Element) employeeInfoNodes.item(0);
                NodeList details = employeeInfoElement.getElementsByTagName("detail");
                System.out.println("Total Details found: " + details.getLength());
                for (int i = 0; i < details.getLength(); i++) {
                    Node detailNode = details.item(i);
                    if (detailNode.getNodeType() == Node.ELEMENT_NODE) {
                        Element detailElement = (Element) detailNode;
                        System.out.println(" ID: " + detailElement.getAttribute("ID"));
                        System.out.println(" Username: " + getTagValue("username", detailElement));
                        System.out.println(" Residence: " + getTagValue("residence", detailElement));
                        System.out.println(" Year of Birth: " + getTagValue("yearOfBirth", detailElement));
                        System.out.println(" Phone: " + getTagValue("phone", detailElement));
                        System.out.println("--------------------------------------------------------------------------");
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
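    // Helper: returns the text content of the first descendant with the given tag name, or "" if none exists
    private static String getTagValue(String tagName, Element element) {
        NodeList nodeList = element.getElementsByTagName(tagName);
        if (nodeList != null && nodeList.getLength() > 0) {
            Node node = nodeList.item(0);
            if (node != null && node.getNodeType() == Node.ELEMENT_NODE) {
                return node.getTextContent();
            }
        }
        return "";
    }

    // Helper: returns an attribute value of the first descendant with the given tag name, or "" if none exists
    private static String getAttributeValue(String tagName, String attrName, Element element) {
        NodeList nodeList = element.getElementsByTagName(tagName);
        if (nodeList != null && nodeList.getLength() > 0) {
            Node node = nodeList.item(0);
            if (node != null && node.getNodeType() == Node.ELEMENT_NODE) {
                Element subElement = (Element) node;
                return subElement.getAttribute(attrName);
            }
        }
        return "";
    }
}
In the code above, we first obtain the employee_list element via doc.getElementsByTagName("employee_list") and then call getElementsByTagName("employee") on that element, so the search is confined to its descendants. The position_details and employee_info categories are handled the same way.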
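Because Element.getElementsByTagName matches descendants at any depth, a scoped call can still pick up nested elements that share a tag name. When only direct children are wanted, one option is to iterate over getChildNodes() and filter by node type and name. The sketch below illustrates this with a hypothetical helper named childElements and a made-up document; it is not part of the original parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DirectChildrenSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Hypothetical helper: returns only the direct child elements of parent with the given tag name.
    static List<Element> childElements(Element parent, String tagName) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip text and comment nodes; keep elements whose name matches.
            if (child.getNodeType() == Node.ELEMENT_NODE && tagName.equals(child.getNodeName())) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up document in which an <item> is nested inside another <item>.
        String xml = "<group><item>outer<sub><item>nested</item></sub></item><item>second</item></group>";
        Element root = parse(xml).getDocumentElement();
        System.out.println(root.getElementsByTagName("item").getLength()); // 3 (all descendants)
        System.out.println(childElements(root, "item").size());            // 2 (direct children only)
    }
}
```

For the employee file in this tutorial the descendant search is sufficient, since each category is looked up under its own parent; the helper matters mainly when the same tag name can recur at several depths.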
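Structured Data Storage and Grouped Output
Printing the parsed data directly is simple, but for multi-level, cross-referenced data a more robust approach is to map it to Java objects (POJOs, Plain Old Java Objects), which makes later processing and consistent output easier. We can define a POJO class for each of Employee, PositionDetail, and EmployeeInfo.
1. Define the POJO classes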
// Employee.java
public class Employee {
    private String id;
    private String firstname;
    private String lastname;
    private int age;
    private String positionSkillRef;
    private String detailRef;

    // Constructor
    public Employee(String id, String firstname, String lastname, int age, String positionSkillRef, String detailRef) {
        this.id = id;
        this.firstname = firstname;
        this.lastname = lastname;
        this.age = age;
        this.positionSkillRef = positionSkillRef;
        this.detailRef = detailRef;
    }

    // Getters
    public String getId() { return id; }
    public String getFirstname() { return firstname; }
    public String getLastname() { return lastname; }
    public int getAge() { return age; }
    public String getPositionSkillRef() { return positionSkillRef; }
    public String getDetailRef() { return detailRef; }

    // Setters (optional; omit them if the objects should be immutable after parsing)
    public void setId(String id) { this.id = id; }
    public void setFirstname(String firstname) { this.firstname = firstname; }
    public void setLastname(String lastname) { this.lastname = lastname; }
    public void setAge(int age) { this.age = age; }
    public void setPositionSkillRef(String positionSkillRef) { this.positionSkillRef = positionSkillRef; }
    public void setDetailRef(String detailRef) { this.detailRef = detailRef; }

    @Override
    public String toString() {
        return "Employee{" +
                "id='" + id + '\'' +
                ", firstname='" + firstname + '\'' +
                ", lastname='" + lastname + '\'' +
                ", age=" + age +
                ", positionSkillRef='" + positionSkillRef + '\'' +
                ", detailRef='" + detailRef + '\'' +
                '}';
    }
}
// PositionDetail.java
public class PositionDetail {
    private String id;
    private String role;
    private String skillName;
    private int experience;

    // Constructor
    public PositionDetail(String id, String role, String skillName, int experience) {
        this.id = id;
        this.role = role;
        this.skillName = skillName;
        this.experience = experience;
    }

    // Getters
    public String getId() { return id; }
    public String getRole() { return role; }
    public String getSkillName() { return skillName; }
    public int getExperience() { return experience; }

    // Setters
    public void setId(String id) { this.id = id; }
    public void setRole(String role) { this.role = role; }
    public void setSkillName(String skillName) { this.skillName = skillName; }
    public void setExperience(int experience) { this.experience = experience; }

    @Override
    public String toString() {
        return "PositionDetail{" +
                "id='" + id + '\'' +
                ", role='" + role + '\'' +
                ", skillName='" + skillName + '\'' +
                ", experience=" + experience +
                '}';
    }
}
// EmployeeInfo.java
public class EmployeeInfo {
    private String id;
    private String username;
    private String residence;
    private int yearOfBirth;
    private String phone;

    // Constructor
    public EmployeeInfo(String id, String username, String residence, int yearOfBirth, String phone) {
        this.id = id;
        this.username = username;
        this.residence = residence;
        this.yearOfBirth = yearOfBirth;
        this.phone = phone;
    }

    // Getters
    public String getId() { return id; }
    public String getUsername() { return username; }
    public String getResidence() { return residence; }
    public int getYearOfBirth() { return yearOfBirth; }
    public String getPhone() { return phone; }

    // Setters
    public void setId(String id) { this.id = id; }
    public void setUsername(String username) { this.username = username; }
    public void setResidence(String residence) { this.residence = residence; }
    public void setYearOfBirth(int yearOfBirth) { this.yearOfBirth = yearOfBirth; }
    public void setPhone(String phone) { this.phone = phone; }

    @Override
    public String toString() {
        return "EmployeeInfo{" +
                "id='" + id + '\'' +
                ", username='" + username + '\'' +
                ", residence='" + residence + '\'' +
                ", yearOfBirth=" + yearOfBirth +
                ", phone='" + phone + '\'' +
                '}';
    }
}
2. Parse the data and populate the POJO collections
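Next, modify the parsing logic so that it stores the data in collections instead of printing it. The employees go into a List, while PositionDetail and EmployeeInfo objects are kept in Maps keyed by their IDs, which makes it easy to resolve related data through the ref attributes.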
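The lookup idea can be sketched in isolation, without any XML parsing. In the example below, the Person class, the describe method, and the IDs p1 and p2 are hypothetical stand-ins chosen for illustration; the tutorial's real POJOs and ID values would take their place.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RefLookupSketch {

    // Minimal stand-in for an employee: a name plus a reference to a position ID.
    static final class Person {
        final String name;
        final String positionRef;

        Person(String name, String positionRef) {
            this.name = name;
            this.positionRef = positionRef;
        }
    }

    // Resolves each person's reference against the map and builds one output line per person.
    static List<String> describe(List<Person> people, Map<String, String> rolesById) {
        List<String> lines = new ArrayList<>();
        for (Person p : people) {
            // A dangling reference falls back to a placeholder instead of returning null.
            String role = rolesById.getOrDefault(p.positionRef, "(unknown position)");
            lines.add(p.name + ": " + role);
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> rolesById = new HashMap<>();
        rolesById.put("p1", "Developer");
        rolesById.put("p2", "Senior Developer");

        List<Person> people = new ArrayList<>();
        people.add(new Person("Ann", "p2"));
        people.add(new Person("Bob", "p1"));

        for (String line : describe(people, rolesById)) {
            System.out.println(line);
        }
    }
}
```

Keying the map by ID makes each lookup a single get call, and the list preserves the order in which the employees were parsed.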
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AdvancedXmlParser {
    public static void main(String[] args) {
        List<Employee> employees = new ArrayList<>();
        Map<String, PositionDetail> positionDetailsMap = new HashMap<>();
        Map<String, EmployeeInfo> employeeInfoMap = new HashMap<>();
        try {
            File xmlFile = new File("employees.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(xmlFile);
            doc.getDocumentElement().normalize();
            // 1. Parse employee_list and fill the employees list
            NodeList employeeListNodes = doc.getElementsByTagName("employee_list");
            if (employeeListNodes.getLength() > 0) {
                Element employeeListElement = (Element) employeeListNodes.item(0);
                NodeList employeeNodes = employeeListElement.getElementsByTagName("employee");
                for (int i = 0; i < employeeNodes.getLength(); i++) {
                    Node employeeNode = employeeNodes.item(i);
                    if (employeeNode.getNodeType() == Node.ELEMENT_NODE) {
                        Element employeeElement = (Element) employeeNode;
                        String id = employeeElement.getAttribute("ID");
                        String firstname = getTagValue("firstname", employeeElement);
                        String lastname = getTagValue("lastname", employeeElement);
                        // Note: parseInt throws NumberFormatException if the age element is missing or not numeric
                        int age = Integer.parseInt(getTagValue("age", employeeElement));
                        String positionSkillRef = getAttributeValue("position-skill", "ref", employeeElement);
                        String detailRef = getAttributeValue("detail-ref", "ref", employeeElement);
                        employees.add(new Employee(id, firstname, lastname, age, positionSkillRef, detailRef));
                    }
                }
            }
            // 2. Parse position_details and fill positionDetailsMap, keyed by ID
            NodeList positionDetailsNodes = doc.getElementsByTagName("position_details");
            if (positionDetailsNodes.getLength() > 0) {
                Element positionDetailsElement = (Element) positionDetailsNodes.item(0);
                NodeList positionNodes = positionDetailsElement.getElementsByTagName("position");
                for (int i = 0; i < positionNodes.getLength(); i++) {
                    Node positionNode = positionNodes.item(i);
                    if (positionNode.getNodeType() == Node.ELEMENT_NODE) {
                        Element positionElement = (Element) positionNode;
                        String id = positionElement.getAttribute("ID");
                        String role = getTagValue("role", positionElement);
                        String skillName = getTagValue("skill_name", positionElement);
                        int experience = Integer.parseInt(getTagValue("experience", positionElement));
                        positionDetailsMap.put(id, new PositionDetail(id, role, skillName, experience));
                    }
                }
            }
            // 3. Parse employee_info and fill employeeInfoMap, keyed by ID
            NodeList employeeInfoNodes = doc.getElementsByTagName("employee_info");
            if (employeeInfoNodes.getLength() > 0) {
                Element employeeInfoElement = (Element) employeeInfoNodes.item(0);
                NodeList detailNodes = employeeInfoElement.getElementsByTagName("detail");
                for (int i = 0; i < detailNodes.getLength(); i++) {
                    Node detailNode = detailNodes.item(i);
                    if (detailNode.getNodeType() == Node.ELEMENT_NODE) {
                        Element detailElement =










