[Original] AI Office Automation: Using Kimi to Automatically Split a PDF by Chapter into Multiple docx Documents

AIGC部落, published 2024-05-13 in Guangdong


Suppose you have a very long PDF document and want to split it by chapter into smaller documents.

You can enter the following prompt in Kimi Chat:

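You are a Python programming expert. Your task is to write a Python script that splits a PDF document. The steps are as follows:

Open the folder: D:\chatgpt图书\图书1,

Read the PDF document in that folder: Porter L. Learn AI-assisted Python Programming;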

This PDF document has 11 chapters in total. The relevant chapters are the following four, chapters 6, 7, 8, and 9:

Chapter 6, Testing and prompt engineering, pages 99-123

Chapter 7, Problem decomposition, pages 124-157

Chapter 8, Debugging and better understanding your code, pages 158-181

Chapter 9, Automating tedious tasks, pages 182-215

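Use the PyPDF2 library to split chapters 7, 8, and 9 into 3 separate PDF documents, each named after its chapter title, and save them in the folder: D:\chatgpt图书\图书分拆;

Then use the python-docx library to convert the PDF documents to docx format;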

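Notes:

Print relevant information at every step;

Include error handling and debugging information, which helps locate problems;

The PdfFileReader class was deprecated and removed in PyPDF2 3.0.0; use the PdfReader class instead.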
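A note on the page ranges in this prompt: they are 1-based and inclusive, while Python sequences such as PyPDF2's `reader.pages` are indexed from 0. The conversion can be sketched as below, assuming the ranges refer to page positions in the PDF file rather than printed page numbers. `page_indices` is a hypothetical helper written for illustration, not part of PyPDF2, and the total of 300 pages in the usage line is an assumed value.

```python
def page_indices(start_page, end_page, num_pages):
    """Convert a 1-based inclusive page range to 0-based indices.

    Pages beyond the end of the document are dropped, so the result
    may be shorter than the requested range (or empty).
    """
    if start_page < 1 or end_page < start_page:
        raise ValueError(f"Invalid page range: {start_page}-{end_page}")
    last_page = min(end_page, num_pages)
    return list(range(start_page - 1, last_page))


# Chapter 7 from the prompt; num_pages=300 is an assumed document length
chapter7 = page_indices(124, 157, num_pages=300)
print(len(chapter7), chapter7[0], chapter7[-1])  # 34 123 156
```

Keeping this arithmetic in one small function makes the off-by-one handling easy to check separately from the PDF handling itself.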

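This is the code the Kimi assistant generated. It uses pdf2docx for the conversion step rather than the python-docx named in the prompt, since python-docx cannot read PDF files: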

import os

from PyPDF2 import PdfReader, PdfWriter
from pdf2docx import Converter

# Folder paths and the source file name
pdf_folder_path = "D:\\chatgpt图书\\图书1"
output_folder_path = "D:\\chatgpt图书\\图书分拆"
pdf_filename = "Porter L. Learn AI-assisted Python Programming.pdf"

# Page range of each chapter (1-based, inclusive)
chapters = {
    "Problem decomposition": (124, 157),
    "Debugging and better understanding your code": (158, 181),
    "Automating tedious tasks": (182, 215)
}

# Stop with an error message if the input folder does not exist
if not os.path.exists(pdf_folder_path):
    raise SystemExit(f"Error: The folder {pdf_folder_path} does not exist.")

# Create the output folder if it does not exist yet
os.makedirs(output_folder_path, exist_ok=True)

# Build the path to the PDF and stop if the file is missing
pdf_path = os.path.join(pdf_folder_path, pdf_filename)
if not os.path.exists(pdf_path):
    raise SystemExit(f"Error: The file {pdf_filename} does not exist in the folder {pdf_folder_path}.")

# Read the PDF document; PdfReader replaces the removed PdfFileReader
pdf_reader = PdfReader(pdf_path)
num_pages = len(pdf_reader.pages)
print(f"The PDF document has {num_pages} pages.")

# Split the PDF by chapter and convert each chapter to docx
for chapter_title, (start_page, end_page) in chapters.items():
    try:
        # Create a PDF writer for this chapter
        pdf_writer = PdfWriter()

        # Add the pages in the chapter's range to the writer
        for page_num in range(start_page, end_page + 1):
            if page_num <= num_pages:
                # pdf_reader.pages is 0-indexed, the page ranges are 1-based
                pdf_writer.add_page(pdf_reader.pages[page_num - 1])
            else:
                print(f"Warning: Page {page_num} does not exist in the document.")

        # Set the document metadata
        pdf_writer.add_metadata({
            '/Title': chapter_title,
            '/Author': 'Porter L.'
        })

        # Save the chapter PDF; the with block closes the file before conversion
        output_pdf_path = os.path.join(output_folder_path, f"{chapter_title}.pdf")
        with open(output_pdf_path, 'wb') as output_file:
            pdf_writer.write(output_file)
        print(f"Chapter '{chapter_title}' has been successfully saved as {output_pdf_path}")

        # Convert the chapter PDF to docx with pdf2docx
        docx_path = os.path.splitext(output_pdf_path)[0] + ".docx"
        cv = Converter(output_pdf_path)
        try:
            cv.convert(docx_path, start=0, end=None)
        finally:
            cv.close()
        print(f"Chapter '{chapter_title}' has been successfully converted to docx at {docx_path}")
    except Exception as e:
        print(f"An error occurred while processing chapter '{chapter_title}': {e}")

print("PDF splitting and conversion process completed.")
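The script uses each chapter title directly as a file name. That works for the three titles here, but a title containing characters such as `:` or `?` would not be a valid file name on Windows. A minimal sketch of one way to guard against that follows; `safe_filename` is a hypothetical helper and is not part of the generated script.

```python
import re

# Characters that Windows does not allow in file names
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*]')


def safe_filename(title, replacement="_"):
    """Replace characters that are invalid in Windows file names."""
    cleaned = _INVALID_CHARS.sub(replacement, title)
    # Windows also rejects names that end with a space or a dot
    cleaned = cleaned.strip(" .")
    return cleaned or "untitled"


print(safe_filename("Problem decomposition"))
print(safe_filename("Chapter 7: Problem decomposition?"))  # Chapter 7_ Problem decomposition_
```

In the script, the output path would then be built from `safe_filename(chapter_title)` instead of the raw title.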

Run the program, and the document is split successfully.
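To confirm the result without opening the folder by hand, one option is to check that every expected PDF and docx file exists. The sketch below assumes the output files are named after the chapter titles, as in the script; `missing_outputs` is a hypothetical helper added for illustration.

```python
from pathlib import Path


def missing_outputs(output_folder, chapter_titles, extensions=(".pdf", ".docx")):
    """Return the expected output files that are not present in output_folder."""
    folder = Path(output_folder)
    expected = [
        folder / f"{title}{ext}"
        for title in chapter_titles
        for ext in extensions
    ]
    return [path for path in expected if not path.is_file()]


chapter_titles = [
    "Problem decomposition",
    "Debugging and better understanding your code",
    "Automating tedious tasks",
]
missing = missing_outputs(r"D:\chatgpt图书\图书分拆", chapter_titles)
for path in missing:
    print(f"Missing: {path.name}")
print(f"{len(missing)} of {2 * len(chapter_titles)} expected files are missing.")
```

If the split and conversion both succeeded, the list is empty and the last line reports 0 missing files.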
