Foreword

Hi everyone! This is 魔王. Development environment used for this post: Python 3.8, with the PyCharm 2021.2 editor.
Analysis

Step one: find where the data comes from. In the browser's developer tools you can see that both the channel's video list and each video's play info are returned by POST requests to Weibo's component API (https://www.weibo.com/tv/api/component).
Code implementation:
1. Send the request
2. Get the data
3. Parse the data
4. Save the data
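Before looking at the full script, the parsing step (step 3) can be sketched offline against a hypothetical payload shaped like the real Weibo TV response. The field names mirror the ones the script uses; the sample values here are made up for illustration:

```python
def parse_playinfo(json_dict):
    """Step 3: pull the first video URL and a prefixed title out of the response."""
    info = json_dict['data']['Component_Play_Playinfo']
    dict_urls = info['urls']  # a dict mapping quality labels to protocol-relative URLs
    video_url = 'https:' + dict_urls[list(dict_urls.keys())[0]]
    title = str(info['media_id']) + info['title']  # prefix with media_id to avoid name clashes
    return title, video_url


# Hypothetical sample response, shaped like the API payload:
sample = {
    'data': {
        'Component_Play_Playinfo': {
            'urls': {'高清 1080P': '//example.com/demo.mp4'},
            'title': 'demo',
            'media_id': 123,
        }
    }
}
print(parse_playinfo(sample))  # ('123demo', 'https://example.com/demo.mp4')
```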
Code

```python
import requests  # send requests: pip install requests

headers = {
    # Replace this cookie with your own, copied from a logged-in browser session.
    'cookie': 'SUB=_2AkMWuiaof8NxqwJRmfEcxW7kZYV1zQHEieKg5tdzJRMxHRl-yT8XqmlbtRB6PToIR8vzOUazMyBaDx1yoAhoGvmhBh2R; SUBP=0033WrSXqPxfM72-Ws9jqgMF55529P9D9WFhP5UbeyRGEMWCEO66rKKN; SINAGLOBAL=4378435525987.705.1642506657635; UOR=,,www.baidu.com; YF-V-WEIBO-G0=35846f552801987f8c1e8f7cec0e2230; _s_tentry=www.baidu.com; Apache=4202086709610.053.1651127548346; ULV=1651127548462:5:1:1:4202086709610.053.1651127548346:1647671293014; XSRF-TOKEN=-zQTQde7oNPbwv2z7IZNWn7x; WBPSESS=5Gh1MjbHbWED7wnbzL0HessirGvmylijYYvflqusiD9GEsQ6rqnU_tJ77BAIaB7ziYAGd2bn8bjGxvzctVcMOww-G_WpuVuFa86yECy9FyzCc1G6phFPW88j0AwEPWrz',
    'origin': 'https://www.weibo.com',
    'page-referer': '/tv/show/1034:4762666296868953',
    'referer': 'https://www.weibo.com/tv/show/1034:4762666296868953?mid=4762667594547707',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'x-xsrf-token': '-zQTQde7oNPbwv2z7IZNWn7x',
}


def get_next(next_cursor):
    # Build the form data for the channel list; pass next_cursor for pages after the first.
    list_data = {
        'data': '{"Component_Channel_Subchannel":{"cid":"4379160563414139"}}'
    }
    if next_cursor != '':
        list_data = {
            'data': '{"Component_Channel_Subchannel":{"next_cursor":' + str(next_cursor) + ',"cid":"4379160563414139"}}'
        }
    list_url = 'https://www.weibo.com/tv/api/component?page=/tv/channel/4379160563414111/4379160563414139'
    list_json = requests.post(list_url, headers=headers, data=list_data).json()
    if list_json['data']['Component_Channel_Subchannel'] is None:
        return 0  # no more pages
    data_list = list_json['data']['Component_Channel_Subchannel']['list']
    next_cursor = list_json['data']['Component_Channel_Subchannel']['next_cursor']
    for dat_ in data_list:
        oid = dat_['oid']
        data = {
            'data': '{"Component_Play_Playinfo":{"oid":"' + oid + '"}}'
        }
        url = 'https://www.weibo.com/tv/api/component?page=/tv/show/' + oid
        # 1. Send the request
        response = requests.post(url=url, data=data, headers=headers)
        # 2. Get the data
        json_dict = response.json()
        # 3. Parse the data
        try:
            dict_urls = json_dict['data']['Component_Play_Playinfo']['urls']
            video_url = 'https:' + dict_urls[list(dict_urls.keys())[0]]
            title = json_dict['data']['Component_Play_Playinfo']['title']
            media_id = json_dict['data']['Component_Play_Playinfo']['media_id']
            title = str(media_id) + title
            print(title, video_url)
            # 4. Save the data
            # video_data = requests.get(video_url).content
            # with open(f'video/{title}.mp4', mode='wb') as f:
            #     f.write(video_data)
        except (KeyError, TypeError):
            print('Video violated the rules and has been taken down!')
    # Recurse into the next page of the channel list.
    get_next(next_cursor)


get_next('')
```
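One caveat about the commented-out save step: titles scraped from the page can contain characters that are illegal in filenames on Windows (`:`, `?`, `|`, and so on), so writing `f'video/{title}.mp4'` directly can fail. A small sketch of one way to handle this; the `safe_filename` helper is my addition, not part of the original script:

```python
import re


def safe_filename(title):
    # Replace characters that Windows forbids in filenames with underscores.
    # Hypothetical helper, added for illustration.
    return re.sub(r'[\\/:*?"<>|\r\n]', '_', title).strip()


# Usage in the save step (assumes video_url, headers, title from the script above).
# Streaming the download keeps large videos out of memory:
# with requests.get(video_url, headers=headers, stream=True) as r:
#     with open(f'video/{safe_filename(title)}.mp4', mode='wb') as f:
#         for chunk in r.iter_content(chunk_size=1024 * 64):
#             f.write(chunk)

print(safe_filename('a:b?c|d'))  # a_b_c_d
```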
Closing

That's all for this article! If you have more suggestions or questions, leave a comment or message me. Let's keep at it together (ง ·_·)ง If you liked this, follow me, or give the article a like, bookmark, or comment!