分享

Jsoup选择器规则

 pengx 2012-03-21

Selector syntaxA selector is a chain of simple selectors, seperated by combinators. Selectors are case insensitive (including against elements, attributes, and attribute values).The universal selector (*) is implicit when no element selector is supplied (i.e. *.header and .header is equivalent). div:not(.logo) finds all divs that do not have the "logo" class




* any element *
tag elements with the given tag name div
ns|E elements of type E in the namespace ns fb|name finds <fb:name> elements
#id elements with attribute ID of "id" div#wrap, #logo
.class elements with a class name of "class" div.left, .result
[attr] elements with an attribute named "attr" (with any value) a[href], [title]
[^attrPrefix] elements with an attribute name starting with "attrPrefix". Use to find elements with HTML5 datasets [^data-], div[^data-]
[attr=val] elements with an attribute named "attr", and value equal to "val" img[width=500], a[rel=nofollow]
[attr^=valPrefix] elements with an attribute named "attr", and value starting with "valPrefix" a[href^=http:]
[attr$=valSuffix] elements with an attribute named "attr", and value ending with "valSuffix" img[src$=.png]
[attr*=valContaining] elements with an attribute named "attr", and value containing "valContaining" a[href*=/search/]
[attr~=regex] elements with an attribute named "attr", and value matching the regular expression img[src~=(?i)\\.(png|jpe?g)]

The above may be combined in any order div.header[title]

Combinators
E F an F element descended from an E element div a, .logo h1
E > F an F direct child of E ol > li
E + F an F element immediately preceded by sibling E li + li, div.head + div
E ~ F an F element preceded by sibling E h1 ~ p
E, F, G all matching elements E, F, or G a[href], div, h3

Pseudo selectors
:lt(n) elements whose sibling index is less than n td:lt(3) finds the first 2 cells of each row
:gt(n) elements whose sibling index is greater than n td:gt(1) finds cells after skipping the first two
:eq(n) elements whose sibling index is equal to n td:eq(0) finds the first cell of each row
:has(selector) elements that contains at least one element matching the selector div:has(p) finds divs that contain p elements
:not(selector) elements that do not match the selector. See also Elements.not(String)
:contains(text) elements that contains the specified text. The search is case insensitive. The text may appear in the found element, or any of its descendants. p:contains(jsoup) finds p elements containing the text "jsoup".
:matches(regex) elements whose text matches the specified regular expression. The text may appear in the found element, or any of its descendants. td:matches(\\d+) finds table cells containing digits. div:matches((?i)login) finds divs containing the text, case insensitively.
:containsOwn(text) elements that directly contains the specified text. The search is case insensitive. The text must appear in the found element, not any of its descendants. p:containsOwn(jsoup) finds p elements with own text "jsoup".
:matchesOwn(regex) elements whose own text matches the specified regular expression. The text must appear in the found element, not any of its descendants. td:matchesOwn(\\d+) finds table cells directly containing digits. div:matchesOwn((?i)login) finds divs containing the text, case insensitively.

The above may be combined in any order and with other selectors .light:contains(name):eq(0)

    本站是提供个人知识管理的网络存储空间,所有内容均由用户发布,不代表本站观点。请注意甄别内容中的联系方式、诱导购买等信息,谨防诈骗。如发现有害或侵权内容,请点击一键举报。
    转藏 分享 献花(0

    0条评论

    发表

    请遵守用户 评论公约

    类似文章 更多