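TFHpple is a third-party library for parsing HTML. In my experience it works reasonably well, although the project has to be configured before you can use it.
Configuration
1. Link libxml2.tbd (Build Phases > Link Binary With Libraries).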
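2. Set the header search path: in Build Settings, add $(SDKROOT)/usr/include/libxml2 to Header Search Paths so the compiler can find the libxml2 headers that TFHpple includes.
Usage
The example below parses the chapter list of the Analects (论语) on this page: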
http://so.gushiwen.org/guwen/book_2.aspx
1. Create a TFHpple object. Here data is the NSData returned by the website:
TFHpple *htmlParser = [[TFHpple alloc] initWithHTMLData:data];
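Step 1 assumes data is already in hand. As a minimal sketch of where it might come from, the hypothetical helper LoadHTMLParser below (not part of TFHpple) downloads a page with NSURLSession and builds the parser from the response body. It assumes ARC and that TFHpple.h is on the include path, and it does not check HTTP status codes.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper (not part of TFHpple): downloads the page at url and
// passes a TFHpple parser built from the response body to completion.
// completion runs on NSURLSession's background delegate queue, not the main
// thread, so dispatch to the main queue before touching any UI.
// HTTP status codes are not checked in this sketch.
static void LoadHTMLParser(NSURL *url, void (^completion)(TFHpple *parser, NSError *error)) {
    NSURLSessionDataTask *task = [[NSURLSession sharedSession]
        dataTaskWithURL:url
      completionHandler:^(NSData *data, NSURLResponse *response, NSError *error) {
          if (data == nil) {
              // Transport failure: there is no body to parse
              completion(nil, error);
              return;
          }
          // data is the raw HTML the server returned, which is what
          // initWithHTMLData: expects
          TFHpple *htmlParser = [[TFHpple alloc] initWithHTMLData:data];
          completion(htmlParser, nil);
      }];
    [task resume];
}
```

For the example page you would pass [NSURL URLWithString:@"http://so.gushiwen.org/guwen/book_2.aspx"]. Because that URL is plain http, an iOS app will most likely need an App Transport Security exception in Info.plist before the request is allowed.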
2. Call searchWithXPathQuery: to extract the nodes you need (consult any XPath tutorial for the query syntax):
NSArray *elements = [htmlParser searchWithXPathQuery:@"//div[@class='shileft']/div[@class='bookcont']"];
This gives us the data for the Analects (论语): the div with class bookcont inside the div with class shileft.
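Selecting the bookcont div is usually only a first step; what you typically want is the chapter links inside it. The sketch below rests on assumptions: the hypothetical helper ChapterLinks uses the XPath //div[@class='bookcont']//a (my choice, not the query from step 2), reads each link's text via content and its href via objectForKey:, and resolves the site-relative hrefs against the page URL.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: returns one dictionary per chapter link, holding the
// link text under @"title" and the absolute link URL under @"url".
static NSArray<NSDictionary<NSString *, NSString *> *> *ChapterLinks(NSData *data, NSURL *pageURL) {
    TFHpple *htmlParser = [[TFHpple alloc] initWithHTMLData:data];
    // Select the <a> elements below the bookcont div rather than the div itself
    NSArray *anchors = [htmlParser searchWithXPathQuery:@"//div[@class='bookcont']//a"];
    NSMutableArray *links = [NSMutableArray arrayWithCapacity:anchors.count];
    for (TFHppleElement *anchor in anchors) {
        NSString *href = [anchor objectForKey:@"href"]; // e.g. /guwen/bookv_19.aspx
        NSString *title = anchor.content;               // e.g. 学而篇
        // The hrefs are site-relative, so resolve them against the page URL
        NSURL *absolute = href ? [NSURL URLWithString:href relativeToURL:pageURL] : nil;
        if (title == nil || absolute == nil) {
            continue; // skip links without text or with an unusable href
        }
        [links addObject:@{ @"title": title, @"url": absolute.absoluteString }];
    }
    return links;
}
```

Called with the page data and [NSURL URLWithString:@"http://so.gushiwen.org/guwen/book_2.aspx"], the first entry should map 学而篇 to http://so.gushiwen.org/guwen/bookv_19.aspx.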
3. Get an element and inspect it:
TFHppleElement *element = [elements firstObject]; // first match, or nil if nothing matched
A TFHppleElement has a number of properties; each is briefly described below.
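When a query can match more than one element, you would normally loop over the results instead of taking a single one. The hypothetical DumpElements below sketches such a loop: it logs the properties described next for every match and returns the tag names it saw, which makes the loop easy to check.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: logs the TFHppleElement properties for every search
// result and returns their tag names so the loop is easy to check.
static NSArray<NSString *> *DumpElements(NSArray *elements) {
    NSMutableArray<NSString *> *tagNames = [NSMutableArray array];
    for (NSUInteger i = 0; i < elements.count; i++) {
        TFHppleElement *element = [elements objectAtIndex:i];
        NSLog(@"raw: %@", element.raw);               // HTML, markup included
        NSLog(@"content: %@", element.content);       // text only
        NSLog(@"tagName: %@", element.tagName);       // e.g. div
        NSLog(@"attributes: %@", element.attributes); // e.g. { class = bookcont; }
        NSLog(@"children: %lu", (unsigned long)element.children.count);
        NSLog(@"firstChild: %@", element.firstChild);
        if (element.tagName != nil) {
            [tagNames addObject:element.tagName];
        }
    }
    return tagNames;
}
```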
1. @property (nonatomic, copy, readonly) NSString *raw
raw is the element's HTML source, markup included:
<div class="bookcont"> <ul> <span><a href="/guwen/bookv_19.aspx">学而篇</a></span> <span><a href="/guwen/bookv_20.aspx">为政篇</a></span> <span><a href="/guwen/bookv_21.aspx">八佾篇</a></span> <span><a href="/guwen/bookv_22.aspx">里仁篇</a></span> <span><a href="/guwen/bookv_23.aspx">公冶长篇</a></span> <span><a href="/guwen/bookv_24.aspx">雍也篇</a></span> <span><a href="/guwen/bookv_25.aspx">述而篇</a></span> <span><a href="/guwen/bookv_26.aspx">泰伯篇</a></span> <span><a href="/guwen/bookv_27.aspx">子罕篇</a></span> <span><a href="/guwen/bookv_28.aspx">乡党篇</a></span> <span><a href="/guwen/bookv_29.aspx">先进篇</a></span> <span><a href="/guwen/bookv_30.aspx">颜渊篇</a></span> <span><a href="/guwen/bookv_31.aspx">子路篇</a></span> <span><a href="/guwen/bookv_32.aspx">宪问篇</a></span> <span><a href="/guwen/bookv_33.aspx">卫灵公篇</a></span> <span><a href="/guwen/bookv_34.aspx">季氏篇</a></span> <span><a href="/guwen/bookv_35.aspx">阳货篇</a></span> <span><a href="/guwen/bookv_36.aspx">微子篇</a></span> <span><a href="/guwen/bookv_37.aspx">子张篇</a></span> <span><a href="/guwen/bookv_38.aspx">尧曰篇</a></span> </ul> </div>
2. content is the element's text, with the HTML markup stripped:
学而篇 为政篇 八佾篇 里仁篇 公冶长篇 雍也篇 述而篇 泰伯篇 子罕篇 乡党篇 先进篇 颜渊篇 子路篇 宪问篇 卫灵公篇 季氏篇 阳货篇 微子篇 子张篇 尧曰篇
3. tagName is the name of the HTML tag. Here the output is simply:
div
4. attributes is a dictionary mapping the element's attribute names to their values. Here it holds a single entry:
class = bookcont;
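5. children is an array of the element's child nodes. Note that the whitespace between tags appears as text nodes, and that NSLog prints the Chinese titles as \U escapes (for example, \U5b66\U800c\U7bc7 is 学而篇):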
( "{
nodeContent = "\n ";
nodeName = text;
}", "{
nodeChildArray = (
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_19.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U5b66\U800c\U7bc7";
nodeName = text;
}
);
nodeContent = "\U5b66\U800c\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_19.aspx\">\U5b66\U800c\U7bc7</a>";
}
);
nodeContent = "\U5b66\U800c\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_19.aspx\">\U5b66\U800c\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_20.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U4e3a\U653f\U7bc7";
nodeName = text;
}
);
nodeContent = "\U4e3a\U653f\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_20.aspx\">\U4e3a\U653f\U7bc7</a>";
}
);
nodeContent = "\U4e3a\U653f\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_20.aspx\">\U4e3a\U653f\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_21.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U516b\U4f7e\U7bc7";
nodeName = text;
}
);
nodeContent = "\U516b\U4f7e\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_21.aspx\">\U516b\U4f7e\U7bc7</a>";
}
);
nodeContent = "\U516b\U4f7e\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_21.aspx\">\U516b\U4f7e\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_22.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U91cc\U4ec1\U7bc7";
nodeName = text;
}
);
nodeContent = "\U91cc\U4ec1\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_22.aspx\">\U91cc\U4ec1\U7bc7</a>";
}
);
nodeContent = "\U91cc\U4ec1\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_22.aspx\">\U91cc\U4ec1\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_23.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U516c\U51b6\U957f\U7bc7";
nodeName = text;
}
);
nodeContent = "\U516c\U51b6\U957f\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_23.aspx\">\U516c\U51b6\U957f\U7bc7</a>";
}
);
nodeContent = "\U516c\U51b6\U957f\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_23.aspx\">\U516c\U51b6\U957f\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_24.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U96cd\U4e5f\U7bc7";
nodeName = text;
}
);
nodeContent = "\U96cd\U4e5f\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_24.aspx\">\U96cd\U4e5f\U7bc7</a>";
}
);
nodeContent = "\U96cd\U4e5f\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_24.aspx\">\U96cd\U4e5f\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_25.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U8ff0\U800c\U7bc7";
nodeName = text;
}
);
nodeContent = "\U8ff0\U800c\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_25.aspx\">\U8ff0\U800c\U7bc7</a>";
}
);
nodeContent = "\U8ff0\U800c\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_25.aspx\">\U8ff0\U800c\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_26.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U6cf0\U4f2f\U7bc7";
nodeName = text;
}
);
nodeContent = "\U6cf0\U4f2f\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_26.aspx\">\U6cf0\U4f2f\U7bc7</a>";
}
);
nodeContent = "\U6cf0\U4f2f\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_26.aspx\">\U6cf0\U4f2f\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_27.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U5b50\U7f55\U7bc7";
nodeName = text;
}
);
nodeContent = "\U5b50\U7f55\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_27.aspx\">\U5b50\U7f55\U7bc7</a>";
}
);
nodeContent = "\U5b50\U7f55\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_27.aspx\">\U5b50\U7f55\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_28.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U4e61\U515a\U7bc7";
nodeName = text;
}
);
nodeContent = "\U4e61\U515a\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_28.aspx\">\U4e61\U515a\U7bc7</a>";
}
);
nodeContent = "\U4e61\U515a\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_28.aspx\">\U4e61\U515a\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_29.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U5148\U8fdb\U7bc7";
nodeName = text;
}
);
nodeContent = "\U5148\U8fdb\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_29.aspx\">\U5148\U8fdb\U7bc7</a>";
}
);
nodeContent = "\U5148\U8fdb\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_29.aspx\">\U5148\U8fdb\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_30.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U989c\U6e0a\U7bc7";
nodeName = text;
}
);
nodeContent = "\U989c\U6e0a\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_30.aspx\">\U989c\U6e0a\U7bc7</a>";
}
);
nodeContent = "\U989c\U6e0a\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_30.aspx\">\U989c\U6e0a\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_31.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U5b50\U8def\U7bc7";
nodeName = text;
}
);
nodeContent = "\U5b50\U8def\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_31.aspx\">\U5b50\U8def\U7bc7</a>";
}
);
nodeContent = "\U5b50\U8def\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_31.aspx\">\U5b50\U8def\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_32.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U5baa\U95ee\U7bc7";
nodeName = text;
}
);
nodeContent = "\U5baa\U95ee\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_32.aspx\">\U5baa\U95ee\U7bc7</a>";
}
);
nodeContent = "\U5baa\U95ee\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_32.aspx\">\U5baa\U95ee\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_33.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U536b\U7075\U516c\U7bc7";
nodeName = text;
}
);
nodeContent = "\U536b\U7075\U516c\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_33.aspx\">\U536b\U7075\U516c\U7bc7</a>";
}
);
nodeContent = "\U536b\U7075\U516c\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_33.aspx\">\U536b\U7075\U516c\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_34.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U5b63\U6c0f\U7bc7";
nodeName = text;
}
);
nodeContent = "\U5b63\U6c0f\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_34.aspx\">\U5b63\U6c0f\U7bc7</a>";
}
);
nodeContent = "\U5b63\U6c0f\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_34.aspx\">\U5b63\U6c0f\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_35.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U9633\U8d27\U7bc7";
nodeName = text;
}
);
nodeContent = "\U9633\U8d27\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_35.aspx\">\U9633\U8d27\U7bc7</a>";
}
);
nodeContent = "\U9633\U8d27\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_35.aspx\">\U9633\U8d27\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_36.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U5fae\U5b50\U7bc7";
nodeName = text;
}
);
nodeContent = "\U5fae\U5b50\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_36.aspx\">\U5fae\U5b50\U7bc7</a>";
}
);
nodeContent = "\U5fae\U5b50\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_36.aspx\">\U5fae\U5b50\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_37.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U5b50\U5f20\U7bc7";
nodeName = text;
}
);
nodeContent = "\U5b50\U5f20\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_37.aspx\">\U5b50\U5f20\U7bc7</a>";
}
);
nodeContent = "\U5b50\U5f20\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_37.aspx\">\U5b50\U5f20\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
},
{
nodeChildArray = (
{
nodeAttributeArray = (
{
attributeName = href;
nodeContent = "/guwen/bookv_38.aspx";
}
);
nodeChildArray = (
{
nodeContent = "\U5c27\U66f0\U7bc7";
nodeName = text;
}
);
nodeContent = "\U5c27\U66f0\U7bc7";
nodeName = a;
raw = "<a href=\"/guwen/bookv_38.aspx\">\U5c27\U66f0\U7bc7</a>";
}
);
nodeContent = "\U5c27\U66f0\U7bc7";
nodeName = span;
raw = "<span><a href=\"/guwen/bookv_38.aspx\">\U5c27\U66f0\U7bc7</a></span>";
},
{
nodeContent = "\n \n ";
nodeName = text;
}
);
nodeContent = "\n \n \U5b66\U800c\U7bc7\n \n \U4e3a\U653f\U7bc7\n \n \U516b\U4f7e\U7bc7\n \n \U91cc\U4ec1\U7bc7\n \n \U516c\U51b6\U957f\U7bc7\n \n \U96cd\U4e5f\U7bc7\n \n \U8ff0\U800c\U7bc7\n \n \U6cf0\U4f2f\U7bc7\n \n \U5b50\U7f55\U7bc7\n \n \U4e61\U515a\U7bc7\n \n \U5148\U8fdb\U7bc7\n \n \U989c\U6e0a\U7bc7\n \n \U5b50\U8def\U7bc7\n \n \U5baa\U95ee\U7bc7\n \n \U536b\U7075\U516c\U7bc7\n \n \U5b63\U6c0f\U7bc7\n \n \U9633\U8d27\U7bc7\n \n \U5fae\U5b50\U7bc7\n \n \U5b50\U5f20\U7bc7\n \n \U5c27\U66f0\U7bc7\n \n ";
nodeName = ul;
raw = "<ul> \n \n <span><a href=\"/guwen/bookv_19.aspx\">\U5b66\U800c\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_20.aspx\">\U4e3a\U653f\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_21.aspx\">\U516b\U4f7e\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_22.aspx\">\U91cc\U4ec1\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_23.aspx\">\U516c\U51b6\U957f\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_24.aspx\">\U96cd\U4e5f\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_25.aspx\">\U8ff0\U800c\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_26.aspx\">\U6cf0\U4f2f\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_27.aspx\">\U5b50\U7f55\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_28.aspx\">\U4e61\U515a\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_29.aspx\">\U5148\U8fdb\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_30.aspx\">\U989c\U6e0a\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_31.aspx\">\U5b50\U8def\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_32.aspx\">\U5baa\U95ee\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_33.aspx\">\U536b\U7075\U516c\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_34.aspx\">\U5b63\U6c0f\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_35.aspx\">\U9633\U8d27\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_36.aspx\">\U5fae\U5b50\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_37.aspx\">\U5b50\U5f20\U7bc7</a></span> \n \n <span><a href=\"/guwen/bookv_38.aspx\">\U5c27\U66f0\U7bc7</a></span> \n \n </ul>";
}", "{
nodeContent = "\n ";
nodeName = text;
}")
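Because of those whitespace text nodes, walking children by hand means skipping them. The hypothetical TitlesFromChildren below is one way to do it, descending div > ul > span > a with isTextNode, childrenWithTagName: and firstChildWithTagName:. It assumes the page keeps exactly the nesting shown in the dump above.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: collects the chapter titles by walking
// div > ul > span > a through the children arrays.
static NSArray<NSString *> *TitlesFromChildren(TFHppleElement *bookcont) {
    NSMutableArray<NSString *> *titles = [NSMutableArray array];
    for (TFHppleElement *child in bookcont.children) {
        // Skip the "\n " whitespace text nodes and anything that is not the <ul>
        if ([child isTextNode] || ![child.tagName isEqualToString:@"ul"]) {
            continue;
        }
        // childrenWithTagName: keeps only the <span> children of the <ul>
        for (TFHppleElement *span in [child childrenWithTagName:@"span"]) {
            TFHppleElement *link = [span firstChildWithTagName:@"a"];
            NSString *title = link.content;
            if (title != nil) {
                [titles addObject:title];
            }
        }
    }
    return titles;
}
```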
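6. firstChild is the element's first child node. For this div it is the whitespace text node in front of the <ul>: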
{ nodeContent = "\n "; nodeName = text; }
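Since firstChild can land on whitespace like this, a helper that returns the first element child is often more useful. FirstElementChild below is a hypothetical sketch, not part of TFHpple; for the bookcont div it should return the <ul>.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: like firstChild, but skips text nodes such as the
// whitespace node that firstChild returns for the bookcont div.
static TFHppleElement *FirstElementChild(TFHppleElement *element) {
    for (TFHppleElement *child in element.children) {
        if (![child isTextNode]) {
            return child;
        }
    }
    return nil; // the element has no non-text children
}
```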
All of these properties reflect the HTML structure; in practice we usually read the content property and then process the resulting NSString.
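As an example of that post-processing, the content of the bookcont div (or of its ul) is just the chapter titles separated by whitespace. The hypothetical TitlesFromContent below splits such a string into an array using Foundation alone; it assumes no title contains whitespace of its own.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits a content string such as
// "\n \n 学而篇\n \n 为政篇\n \n " into @[@"学而篇", @"为政篇"].
static NSArray<NSString *> *TitlesFromContent(NSString *content) {
    NSArray<NSString *> *pieces = [content componentsSeparatedByCharactersInSet:
                                   [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    NSMutableArray<NSString *> *titles = [NSMutableArray array];
    for (NSString *piece in pieces) {
        // Runs of whitespace produce empty strings between separators; drop them
        if (piece.length > 0) {
            [titles addObject:piece];
        }
    }
    return titles;
}
```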
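That is how we extract the data we want and turn it into a usable form. TFHppleElement is an important class, but covering its full usage is beyond the scope of this article.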
Copyright 2014-2025 https://www.php.cn/ All Rights Reserved | php.cn | Hunan ICP No. 2023035733