jsoup 1.8.3 released, HTML parser

 飞鹰飞龙飞天 2015-08-05

Published by oschina on 2015-08-03

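jsoup 1.8.3 has been released. The main improvements in this version are: performance gains when parsing large HTML files; automatic switching to the XML parser when fetching XML documents; and important bug fixes.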

Changes:

Improvements

  • Performance improvement on parsing larger HTML pages: around 1.7x faster on Android KitKat, and about 1.3x faster on Android Lollipop. Improvements largely from re-ordering the HtmlTreeBuilder methods based on analysis of various websites; also from further memory reduction for nodes with no children, and other tweaks.

  • When fetching XML URLs, automatically switch to the XML parser instead of the HTML parser.

  • Improved support for boolean attributes in HTML5.

  • When serialising XML, ensure that '<' characters in attributes are escaped, per spec. Not required in HTML.
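The difference between the two parsers mentioned in the list above can be seen without any network access. The following is a minimal sketch, assuming jsoup is on the classpath; the class name `XmlParserSketch`, its helper methods, and the sample feed are made up for illustration and are not part of the release notes.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

// Hypothetical demo class: contrasts jsoup's XML parser with its HTML parser.
class XmlParserSketch {

    // Parse the input with the XML parser; no html/head/body scaffolding is added.
    static Document parseXml(String xml) {
        return Jsoup.parse(xml, "", Parser.xmlParser());
    }

    // Parse the same input with the default HTML parser for comparison.
    static Document parseAsHtml(String xml) {
        return Jsoup.parse(xml);
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        String xml = "<feed><entry id=\"1\"><title>jsoup 1.8.3</title></entry></feed>";

        Document asXml = parseXml(xml);
        Document asHtml = parseAsHtml(xml);

        // CSS selectors work on the XML tree as well.
        System.out.println(asXml.select("entry > title").text());

        // The XML tree has no <body>; the HTML tree always gets one.
        System.out.println("XML tree has body:  " + !asXml.select("body").isEmpty());
        System.out.println("HTML tree has body: " + !asHtml.select("body").isEmpty());
    }
}
```

When fetching over the network, the parser can also be chosen explicitly, for example with `Jsoup.connect(url).parser(Parser.xmlParser()).get()`; the release note describes this switch happening automatically for XML URLs.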

Bug fixes

  • Fixed an issue in Element.elementSiblingIndex() (and related methods) where sibling elements with the same content would incorrectly have the same sibling index.

  • Fixed an issue where unexpected elements in a badly nested table could be moved to the wrong location in the document.

  • Fixed an issue where a table nested within a TH cell would parse to an incorrect tree.

  • When serializing a document using the XHTML encoding entities, if the character set did not support &nbsp; chars (such as Shift_JIS), the character would be skipped. For visibility, will now always output &#xa0; (the hex character reference for a non-breaking space) when using XHTML encoding entities (as &nbsp; is not defined), regardless of the output character set.

  • Fixed an issue when resolving URLs, where if the absolute URL had no path, the relative URL was not normalized correctly.

  • Fixed an issue where connections that were redirected to a relative URL did not have the same normalization rules as a URL read from Node.absUrl(String).
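Two of the fixes above touch APIs that are easy to exercise directly: `Element.elementSiblingIndex()` and `Node.absUrl(String)`. The following is a minimal sketch, assuming jsoup 1.8.3 or later is on the classpath; the class `BugFixSketch`, its helper methods, and the sample markup and URLs are hypothetical and only illustrate the behaviour the notes describe.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for two of the fixed behaviours.
class BugFixSketch {

    // Collect the sibling index of every element matched by the CSS query.
    static List<Integer> siblingIndexes(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        List<Integer> indexes = new ArrayList<Integer>();
        for (Element el : doc.select(cssQuery)) {
            indexes.add(el.elementSiblingIndex());
        }
        return indexes;
    }

    // Resolve an href against a base URI by reading it back through absUrl.
    static String resolve(String baseUri, String href) {
        Document doc = Jsoup.parse("<a>link</a>", baseUri);
        Element a = doc.select("a").first();
        a.attr("href", href);
        return a.absUrl("href");
    }

    public static void main(String[] args) {
        // Siblings with identical content should still get distinct indexes.
        String html = "<ul><li>same</li><li>same</li><li>same</li></ul>";
        System.out.println(siblingIndexes(html, "li"));

        // The base URI here has no path component.
        System.out.println(resolve("http://example.com", "/one/two"));
    }
}
```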

This site uses jsoup to parse HTML.

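jsoup is an HTML parser for Java that can parse a URL or a string of HTML directly. It provides a very convenient API for extracting and manipulating data using DOM traversal, CSS selectors, and jQuery-like methods.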
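To make the description concrete, here is a minimal sketch of extracting data both with CSS selectors and with DOM-style methods. It assumes jsoup is on the classpath; the class `ExtractLinksSketch`, the sample markup, and the helper names are invented for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: the same document queried two ways.
class ExtractLinksSketch {

    // Sample page invented for this sketch.
    static final String HTML =
        "<html><head><title>News</title></head><body>"
        + "<div id=\"list\">"
        + "<a class=\"item\" href=\"/news/1\">First</a>"
        + "<a class=\"item\" href=\"/news/2\">Second</a>"
        + "<a href=\"/about\">About</a>"
        + "</div></body></html>";

    // CSS selector style: text of every a.item inside div#list.
    static List<String> itemTitles(String html) {
        Document doc = Jsoup.parse(html);
        Elements items = doc.select("div#list a.item");
        List<String> titles = new ArrayList<String>();
        for (Element a : items) {
            titles.add(a.text());
        }
        return titles;
    }

    // DOM style: look up the container by id, then count its <a> descendants.
    static int linkCount(String html) {
        Document doc = Jsoup.parse(html);
        return doc.getElementById("list").getElementsByTag("a").size();
    }

    public static void main(String[] args) {
        System.out.println(Jsoup.parse(HTML).title());
        System.out.println(itemTitles(HTML));
        System.out.println(linkCount(HTML));
    }
}
```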

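The main features of jsoup are:

  1. Parse HTML from a URL, a file, or a string;

  2. Find and extract data using DOM traversal or CSS selectors;

  3. Manipulate HTML elements, attributes, and text.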
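The third feature in the list above, manipulation, might look like the following. This is a minimal sketch, assuming jsoup is on the classpath; the class `ManipulateSketch`, the `rewrite` helper, and the sample fragment are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class: edits attributes, classes, and text of a fragment.
class ManipulateSketch {

    // Mark every link as nofollow, tag it with a class, and replace the heading text.
    static String rewrite(String fragment) {
        Document doc = Jsoup.parseBodyFragment(fragment);

        for (Element a : doc.select("a[href]")) {
            a.attr("rel", "nofollow");   // set an attribute
            a.addClass("external");      // add a CSS class
        }

        Element heading = doc.select("h1").first();
        if (heading != null) {
            heading.text("jsoup 1.8.3 released");   // replace the text content
        }

        // Serialize only the body content back to HTML.
        return doc.body().html();
    }

    public static void main(String[] args) {
        // Sample fragment invented for this sketch.
        String fragment = "<h1>Old title</h1><p><a href=\"http://example.com\">link</a></p>";
        System.out.println(rewrite(fragment));
    }
}
```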

jsoup is released under the MIT license, so it can safely be used in commercial projects.

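Unless marked as a repost, articles on this site are original or compiled by this site.
Reposting in any form is welcome, but please credit the source and respect the work of others in building the open source community.
When reposting, please credit: 开源中国社区 (OSChina) [http://www.oschina.net]
Article title: jsoup 1.8.3 released, HTML parser
Article URL: http://www.oschina.net/news/64802/jsoup-1-8-3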