An Introductory Tutorial on Python's HTMLParser

Posted by a visitor on 2024-05-16 22:38:03 · Category: HTML


HTMLParser.HTMLParser()

The HTMLParser module contains the class HTMLParser. The class is not very useful by itself, because it does nothing when a parse event occurs. To use it, you implement a subclass and write methods that handle the events you are interested in.

The HTMLParser module defines the class HTMLParser, which can serve as the basis for parsing HTML and XHTML. Unlike the parser in htmllib, this parser is not based on sgmllib.

A simple HTMLParser usage example produces the following output:

Encountered a start tag: html
Encountered a start tag: head
Encountered a start tag: title
Encountered some data : Test
Encountered an end tag : title
Encountered an end tag : head
Encountered a start tag: body
Encountered a start tag: h1
Encountered some data : Parse me!
Encountered an end tag : h1
Encountered an end tag : body
Encountered an end tag : html

If it is important to keep track of the structural position of the current event within the document, you will need to maintain a data structure with this information. If you are certain that the document you are processing is well-formed XHTML, a stack suffices.

Matching HTML tags with a stack follows the same idea as bracket matching; if it is unfamiliar, see the bracket-matching material in a data structures textbook (source: 《数据结构》, Data Structures).

Output of the run:

/html/head/title >> Advice
/html/body/p >> The
/html/body/p/a >> IETF admonishes:
/html/body/p/a/i >> Be strict in what you
/html/body/p/a/i/b >> send
/html/body/p/a/i >> .

If the data is less well-formed, you need a more complex stack. We can define a new object that removes the most recent start tag corresponding to an end tag, and that also keeps unclosed tags such as <p> from nesting inside one another. You can do much more along these lines for a particular application; the TagStack here is a good starting point.

A brief note on the pop method, since it confused me at first as a Python beginner: pop first reverses lst and then calls self.lst.index(tag). Note that index() returns the position of the first item that matches, so on the reversed list it yields the position of the most recent start tag that matches the end tag.
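For the simple usage example described above, here is a minimal sketch of such a subclass. It is written for Python 3, where Python 2's HTMLParser module is named html.parser, and is modeled on the example in the Python documentation; the class name EventPrinter and the events list are illustrative additions, not part of the original article. Feeding it the document below prints the twelve "Encountered ..." lines listed earlier.

```python
from html.parser import HTMLParser  # Python 3 name of Python 2's HTMLParser module


class EventPrinter(HTMLParser):
    """Subclass that reacts to parse events; the base class's handlers do nothing."""

    def __init__(self):
        super().__init__()
        self.events = []  # (kind, value) pairs, kept for inspection after parsing

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))
        print("Encountered a start tag:", tag)

    def handle_endtag(self, tag):
        self.events.append(("end", tag))
        print("Encountered an end tag :", tag)

    def handle_data(self, data):
        self.events.append(("data", data))
        print("Encountered some data :", data)


parser = EventPrinter()
parser.feed("<html><head><title>Test</title></head>"
            "<body><h1>Parse me!</h1></body></html>")
parser.close()
```

Only the three handlers you override react to events; everything else the parser reports (comments, doctype, and so on) is silently ignored by the base class.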

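For the well-formed XHTML case, the stack approach can be sketched as follows: push on every start tag, pop on every end tag, and record the joined stack as a path whenever text appears. The sample document is an assumption, written so that it yields the path/text pairs shown in the run above; stripping text, skipping whitespace-only text, and raising ValueError on a mismatched end tag are choices made for this sketch.

```python
from html.parser import HTMLParser


class PathTracker(HTMLParser):
    """Track the current tag path with a plain stack (assumes well-formed XHTML)."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, outermost first
        self.lines = []  # "path >> text" records, in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)  # push on every start tag

    def handle_endtag(self, tag):
        # Like bracket matching: the end tag must close the innermost open tag.
        if not self.stack or self.stack[-1] != tag:
            raise ValueError(f"unexpected end tag </{tag}>")
        self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace between tags
            self.lines.append("/" + "/".join(self.stack) + " >> " + text)


doc = ("<html><head><title>Advice</title></head>"
       "<body><p>The <a href='http://ietf.org'>IETF admonishes: "
       "<i>Be strict in what you <b>send</b>.</i></a></p></body></html>")

tracker = PathTracker()
tracker.feed(doc)
tracker.close()
for line in tracker.lines:
    print(line)
```

A plain list works here because, in well-formed input, every end tag closes exactly the element on top of the stack.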

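For less well-formed input, here is a hypothetical sketch of a TagStack along the lines described above; it is not the article's own listing. append() keeps unclosed paragraph tags from nesting (treating only p as paragraph-level is an assumption), and pop() reverses the list, uses index() to find the first match, which is the most recently opened matching tag, deletes it, and reverses back. Raising ValueError for a tag that is not on the stack, and ignoring such stray end tags in the parser, are also assumptions.

```python
from html.parser import HTMLParser

# Tags treated as paragraph-level in this sketch (an assumption; adjust as needed).
PARAGRAPH_TAGS = ("p",)


class TagStack:
    """A forgiving tag stack for HTML that is not well-formed (hypothetical sketch)."""

    def __init__(self):
        self.lst = []  # open tags, oldest first

    def append(self, tag):
        # A new paragraph-level tag implicitly closes any unclosed one,
        # so unclosed <p> tags do not pile up as nested entries.
        if tag in PARAGRAPH_TAGS:
            self.lst = [t for t in self.lst if t not in PARAGRAPH_TAGS]
        self.lst.append(tag)

    def pop(self, tag):
        # Remove the most recent start tag matching `tag`, not just the last item.
        self.lst.reverse()             # newest tag is now at index 0
        try:
            pos = self.lst.index(tag)  # index() returns the FIRST match,
        except ValueError:             # i.e. the most recently opened one
            self.lst.reverse()         # restore order before giving up
            raise ValueError(f"tag <{tag}> not on stack")
        del self.lst[pos]
        self.lst.reverse()             # back to oldest-first order

    def path(self):
        return "/" + "/".join(self.lst)


class LoosePathTracker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = TagStack()
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        try:
            self.stack.pop(tag)
        except ValueError:
            pass  # stray end tag with no matching start tag: ignore it

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.lines.append(self.stack.path() + " >> " + text)


# Unclosed <p>, misnested <b>/<i>, and a stray </span>.
doc = "<body><p>One<p>Two <b><i>bold</b> italic</i></span></body>"
tracker = LoosePathTracker()
tracker.feed(doc)
tracker.close()
for line in tracker.lines:
    print(line)
```

Here </b> removes the b entry even though i sits above it on the stack, which is exactly what the reverse-then-index pop provides; a plain list.pop() would have removed i instead.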
Article URL: https://pptw.com/jishu/661572.html
