Reading a PDF with Python and exporting it to TXT or HTML

This post shows how to read a PDF with Python and export its contents as plain text or HTML, using pdfminer.six.

Article link: https://www.cnblogs.com/tujia/p/16670374.html

1. Install pdfminer.six

pip install pdfminer.six

2. Read the PDF from code

from io import StringIO
from pdfminer.layout import LAParams
from pdfminer.high_level import extract_text_to_fp


output_string = StringIO()

with open('test.pdf', 'rb') as fin:
    # Export as plain text
    # extract_text_to_fp(fin, output_string)
    # Export as HTML (codec=None because output_string is a text stream)
    extract_text_to_fp(fin, output_string, laparams=LAParams(), output_type='html', codec=None)


with open('test.html', 'w', encoding='utf-8') as f:
    f.write(output_string.getvalue().strip())
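
The script above switches between the two export modes by commenting lines in and out. As a minimal sketch, assuming you would rather pick the mode from the output file name, the same two calls can be wrapped in a function. The names `output_type_for` and `export_pdf` are hypothetical helpers introduced here for illustration and are not part of pdfminer.six.

```python
from io import StringIO
from pathlib import Path


def output_type_for(out_path):
    """Hypothetical helper: choose 'html' or 'text' from the target file extension."""
    suffix = Path(out_path).suffix.lower()
    return "html" if suffix in {".html", ".htm"} else "text"


def export_pdf(pdf_path, out_path):
    """Hypothetical wrapper around the two extract_text_to_fp calls shown above."""
    # Imported lazily so output_type_for stays usable without pdfminer.six installed.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    buffer = StringIO()
    with open(pdf_path, "rb") as fin:
        if output_type_for(out_path) == "html":
            # codec=None because buffer is a text stream, not a binary one.
            extract_text_to_fp(fin, buffer, laparams=LAParams(),
                               output_type="html", codec=None)
        else:
            # Plain-text export, same call as the commented-out line above.
            extract_text_to_fp(fin, buffer)

    # Write the collected output as UTF-8, trimming surrounding whitespace.
    Path(out_path).write_text(buffer.getvalue().strip(), encoding="utf-8")
```

With this in place, `export_pdf('test.pdf', 'test.txt')` would write plain text and `export_pdf('test.pdf', 'test.html')` would write HTML, so the output file always carries an extension that matches its content.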

Official documentation:

https://pdfminersix.readthedocs.io/en/latest/tutorial/highlevel.html

https://pdfminersix.readthedocs.io/en/latest/reference/highlevel.html

3. Read the PDF with the command-line script

https://pdfminersix.readthedocs.io/en/latest/tutorial/commandline.html

https://pdfminersix.readthedocs.io/en/latest/reference/commandline.html

Notes: omitted; see the documentation linked above.
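
For readers who want to drive the command-line route from Python, here is a rough sketch that calls the `pdf2txt.py` script shipped with pdfminer.six through `subprocess`. It assumes `pdf2txt.py` is on your `PATH` and directly executable, which may not hold on every platform (on Windows you may need to run it through the Python interpreter). The helpers `build_pdf2txt_command` and `run_pdf2txt` are hypothetical names used only for this example.

```python
import subprocess


def build_pdf2txt_command(pdf_path, out_path, output_type="text"):
    """Hypothetical helper: assemble a pdf2txt.py command line as an argument list."""
    return ["pdf2txt.py", "--output_type", output_type, "--outfile", out_path, pdf_path]


def run_pdf2txt(pdf_path, out_path, output_type="text"):
    """Run pdf2txt.py; assumes the script is installed and on PATH."""
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(build_pdf2txt_command(pdf_path, out_path, output_type), check=True)
```

For example, `run_pdf2txt('test.pdf', 'test.html', 'html')` would correspond to the HTML export from section 2. Passing the arguments as a list, rather than a single shell string, avoids quoting problems with file names that contain spaces.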

The end.
