
Using Playwright

wenxuefeng360, posted 2022-07-07 from Sichuan

1. Basic usage

  1. Synchronous mode
from playwright.sync_api import sync_playwright

url = 'https://www.baidu.com'

with sync_playwright() as p:
    for browser_type in [p.chromium, p.firefox, p.webkit]:
        browser = browser_type.launch(headless=False)
        page = browser.new_page()
        page.goto(url)
        page.screenshot(path=f'sync-{browser_type.name}.png')
        print(page.title())
        browser.close()
  2. Asynchronous mode
import asyncio
from playwright.async_api import async_playwright

url = 'https://www.baidu.com'

async def main():
    async with async_playwright() as p:
        for browser_type in [p.chromium, p.firefox, p.webkit]:
            browser = await browser_type.launch()
            page = await browser.new_page()
            await page.goto(url)
            await page.screenshot(path=f'async-{browser_type.name}.png')
            print(await page.title())
            await browser.close()
asyncio.run(main())
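Because the asynchronous API does not block, the loop above does not have to visit the browsers one after another. The sketch below, assuming Playwright and its three browsers are installed, runs one task per browser with asyncio.gather. run_per_browser and screenshot_all are hypothetical helper names for illustration, not part of Playwright.

```python
import asyncio


async def run_per_browser(names, worker):
    """Hypothetical helper: run worker(name) for every browser name concurrently.

    asyncio.gather returns the results in the same order as names.
    """
    return await asyncio.gather(*(worker(name) for name in names))


async def screenshot_all(url, names=("chromium", "firefox", "webkit")):
    """Hypothetical helper: screenshot url in several browsers at once."""
    # Imported lazily so this module loads even without Playwright installed
    from playwright.async_api import async_playwright

    async with async_playwright() as p:

        async def visit(name):
            # p.chromium / p.firefox / p.webkit are the BrowserType objects
            browser = await getattr(p, name).launch()
            try:
                page = await browser.new_page()
                await page.goto(url)
                await page.screenshot(path=f"async-{name}.png")
                return await page.title()
            finally:
                # Close the browser even if navigation fails
                await browser.close()

        return await run_per_browser(names, visit)
```

For example, asyncio.run(screenshot_all("https://www.baidu.com")) would return the three page titles in the order chromium, firefox, webkit. Keeping the gather step in its own function makes it easy to check without launching a browser.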

2. Code generation

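Playwright can record the actions you perform in a browser and generate the corresponding code automatically. This is done with the codegen command: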

# Show the options of the codegen command
playwright codegen --help

# Example: launch Firefox and write the recorded actions to script.py
playwright codegen -o script.py -b firefox https://www.baidu.com
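If you would rather start the recorder from a Python script, a minimal sketch is to build the same command line and pass it to subprocess. It uses only the -o and -b options shown above and assumes the playwright CLI is on PATH; build_codegen_cmd and run_codegen are hypothetical helper names.

```python
import subprocess


def build_codegen_cmd(url, output=None, browser=None):
    """Hypothetical helper: argument list for `playwright codegen`."""
    cmd = ["playwright", "codegen"]
    if output:
        cmd += ["-o", output]   # write the recorded script to this file
    if browser:
        cmd += ["-b", browser]  # e.g. chromium, firefox or webkit
    cmd.append(url)
    return cmd


def run_codegen(url, output=None, browser=None):
    """Hypothetical helper: start the recorder and wait for it to finish."""
    # Blocks until the recording session ends (the browser window is closed)
    return subprocess.run(build_codegen_cmd(url, output, browser), check=True)
```

run_codegen("https://www.baidu.com", "script.py", "firefox") is equivalent to the command line above. Passing a list instead of a single string avoids shell quoting problems with URLs.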

3. Selectors

  1. Text selectors
page.click("text=Log in")
  2. CSS selectors
page.click("button")
page.click("#nav-bar .contact-us-item")
page.click("[data-test=login-button]")
page.click("[aria-label='Sign in']")
  3. XPath
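# The xpath= prefix selects the XPath engine explicitly; a selector starting with // or .. is also treated as XPath automatically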
page.click("xpath=//button")
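To make the prefixes concrete, the sketch below approximates how Playwright decides which engine handles a single selector string: an explicit engine= prefix wins, a string starting with // or .. is treated as XPath, a quoted string as text, and anything else as CSS. selector_engine is a hypothetical helper for illustration only; it ignores chained >> selectors and newer locator syntax, and Playwright's real parser is more thorough.

```python
import re

# Explicit "engine=" prefix: an engine name made of word chars and a few symbols
_ENGINE_PREFIX = re.compile(r"^([\w+:*-]+)=")


def selector_engine(selector: str) -> str:
    """Hypothetical helper: guess the engine for a single selector clause."""
    s = selector.strip()
    match = _ENGINE_PREFIX.match(s)
    if match:
        return match.group(1)          # e.g. "text=Log in" -> "text"
    if s.startswith("//") or s.startswith(".."):
        return "xpath"                 # "//button" is detected as XPath
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "\"'":
        return "text"                  # '"Log in"' is treated as a text selector
    return "css"                       # everything else is CSS
```

Note that "[data-test=login-button]" still counts as CSS: the text before "=" starts with "[", so it is not an engine name.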

4. Event listeners

The page object provides an on() method for listening to events that happen on the page, such as close, console, load, request and response.
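As a sketch of listening to several of these events at once, the class below records console messages and requests in memory and notes when the page closes. EventLog is a hypothetical helper; it relies only on page.on(event, callback) and on the ConsoleMessage.type/.text and Request.method/.url properties of the Python API.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Hypothetical helper that keeps a record of a few page events."""

    console: list = field(default_factory=list)   # (type, text) pairs
    requests: list = field(default_factory=list)  # (method, url) pairs
    closed: bool = False

    def on_console(self, msg):
        # ConsoleMessage.type and .text are properties in the Python API
        self.console.append((msg.type, msg.text))

    def on_request(self, request):
        self.requests.append((request.method, request.url))

    def on_close(self, page):
        # The "close" event passes the Page that was closed
        self.closed = True

    def attach(self, page):
        """Register all callbacks with page.on(event, callback)."""
        page.on("console", self.on_console)
        page.on("request", self.on_request)
        page.on("close", self.on_close)
```

Calling log.attach(page) before page.goto(...) registers all three callbacks; afterwards log.console and log.requests hold what was observed.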

For data loaded via Ajax, it does not matter if the request carries encrypted parameters, because what we capture is the final response.

from playwright.sync_api import Playwright, sync_playwright


# def on_response(response):
#     """
#     Print the status and URL of every response (what the browser's Network panel shows)
#     """
#     print(f'Status {response.status}: {response.url}')


def on_response(response):
    """
    Intercept the Ajax requests in on_response and read their responses directly.
    """
    if "api/movie/" in response.url and response.status == 200:
        print(response.json())


def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    page = browser.new_page()
    # Listen for response events and use on_response as the callback
    page.on('response', on_response)
    page.goto("https://spa6./")
    page.wait_for_load_state("networkidle")
    page.close()
    browser.close()


with sync_playwright() as playwright:
    run(playwright)
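Printing inside the callback is fine for exploration, but usually you want the data afterwards. A minimal sketch, assuming the same "api/movie/" URL filter as on_response above: a callable object collects the matching JSON bodies into a list. JsonResponseCollector and collect_json are hypothetical names, and the target URL is supplied by the caller.

```python
class JsonResponseCollector:
    """Hypothetical callable for page.on("response", ...) that keeps JSON bodies."""

    def __init__(self, url_fragment="api/movie/"):
        self.url_fragment = url_fragment
        self.items = []

    def matches(self, response):
        # Same filter as on_response above: URL fragment plus HTTP 200
        return self.url_fragment in response.url and response.status == 200

    def __call__(self, response):
        if self.matches(response):
            self.items.append(response.json())


def collect_json(url, url_fragment="api/movie/"):
    """Hypothetical helper: load url and return the matching JSON bodies."""
    # Imported lazily so the collector can be used without Playwright installed
    from playwright.sync_api import sync_playwright

    collector = JsonResponseCollector(url_fragment)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("response", collector)
        page.goto(url)
        page.wait_for_load_state("networkidle")
        browser.close()
    return collector.items
```

A class with __call__ keeps the results together with the filter settings, so no global list is needed.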

5. Common methods

  1. Get the page source: page.content()

  2. Click an element: page.click(selector, **kwargs); see the official documentation for the options

  3. Enter text: page.fill(selector, value, **kwargs)

  4. Get a node attribute: page.get_attribute(selector, name, **kwargs)

    # Returns the attribute of the first matching node only
    href = page.get_attribute("a.name", "href")
    
  5. Get multiple nodes: page.query_selector_all(selector)

    1. Node attribute: element.get_attribute(name)
    2. Node text: element.text_content()
    elements = page.query_selector_all("a.name")
    for element in elements:
        href = element.get_attribute("href")
        text = element.text_content()
    
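  6. Get a single node: page.query_selector(selector)

    # query_selector() returns None when nothing matches, so check before use
    element = page.query_selector("a.name")
    if element is not None:
        href = element.get_attribute("href")
        text = element.text_content()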
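The query_selector_all() loop in item 5 can be wrapped into a function that returns plain data. In this sketch extract_links and scrape_links are hypothetical helpers, "a.name" is just the example selector used above, and scrape_links assumes Playwright with Chromium is installed.

```python
def extract_links(elements):
    """Hypothetical helper: turn element handles into plain dicts."""
    links = []
    for element in elements:
        text = element.text_content()  # may be None
        links.append({
            "href": element.get_attribute("href"),
            "text": text.strip() if text is not None else None,
        })
    return links


def scrape_links(url, selector="a.name"):
    """Hypothetical helper: load url and extract every matching link."""
    # Imported lazily so extract_links can be used without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        # Element handles are only valid while the page is open,
        # so convert them to plain data before closing the browser
        links = extract_links(page.query_selector_all(selector))
        browser.close()
    return links
```

Because extract_links returns ordinary dicts, the results stay usable after the browser has been closed.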
    
