Batch-converting Word, Excel, and PowerPoint files to PDF

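Two days ago I ran into a project-archiving requirement: the archive insisted that every Word, Excel, and PowerPoint file have a matching PDF version.

The project contains several thousand Word, Excel, and PowerPoint files stored under many different paths, so converting them by hand would clearly take far too much time and effort.

So I tried calling Microsoft Office through its COM interface to do the batch conversion automatically. Python, MATLAB, PHP, and other languages can all call COM interfaces; this post uses Python (with the comtypes package) as the example.

Here is the source code first.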

# office.py
# Each function takes an already-running Office application object
# (created with comtypes.client.CreateObject) so that one instance can be
# reused for every file. The PDF is written next to the source file.
import comtypes.client
import os


def PPTtoPDF(powerpoint, inputFileName, formatType=32):
    """Save a presentation as PDF; formatType 32 is PowerPoint's PDF format."""
    filename, file_extension = os.path.splitext(inputFileName)
    outputFileName = filename + ".pdf"
    deck = powerpoint.Presentations.Open(inputFileName)
    try:
        deck.SaveAs(outputFileName, formatType)
    finally:
        deck.Close()  # close the file even if the save fails


def WordtoPDF(word, inputFileName, formatType=17):
    """Save a document as PDF; formatType 17 is Word's PDF format."""
    filename, file_extension = os.path.splitext(inputFileName)
    outputFileName = filename + ".pdf"
    doc = word.Documents.Open(inputFileName)
    try:
        doc.SaveAs(outputFileName, formatType)
    finally:
        doc.Close()  # close the file even if the save fails


def ExceltoPDF(excel, inputFileName, formatType=0):
    """Export a workbook as PDF; type 0 selects PDF in ExportAsFixedFormat."""
    filename, file_extension = os.path.splitext(inputFileName)
    outputFileName = filename + ".pdf"
    book = excel.Workbooks.Open(inputFileName)
    try:
        # Arguments: type, output file, quality (0 = standard),
        # include document properties, ignore print areas.
        book.ExportAsFixedFormat(formatType, outputFileName, 0, True, True)
    finally:
        book.Close()  # close the file even if the export fails

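The design of the Microsoft Office COM interface is actually rather odd. The collection that holds the open files is called Documents in Word, Workbooks in Excel, and Presentations in PowerPoint; I don't know why they weren't all simply named Files.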

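Another oddity is the format argument of the SaveAs method: saving as PDF takes formatType=32 (ppSaveAsPDF) in PowerPoint but formatType=17 (wdFormatPDF) in Word, and I don't know why those aren't the same either. Stranger still, Excel's SaveAs offers no PDF option at all, which left me baffled. PDF output in Excel goes through the ExportAsFixedFormat method instead, where a type of 0 selects PDF.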
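
One way to keep these scattered magic numbers in one place is a small lookup table. The following is only a sketch: the constant names, the PDF_FORMATS table, and the plan_conversion helper are hypothetical names introduced here for illustration and are not part of office.py. The numeric values are the ones used above. The helper also returns an absolute output path, on the assumption that Office may resolve relative paths against its own working directory rather than Python's.

```python
import os

# Numeric values of the Office enumeration members used in office.py.
WD_FORMAT_PDF = 17   # Word: wdFormatPDF
PP_SAVE_AS_PDF = 32  # PowerPoint: ppSaveAsPDF
XL_TYPE_PDF = 0      # Excel: xlTypePDF (for ExportAsFixedFormat)

# Hypothetical lookup table: extension -> (application, format code).
PDF_FORMATS = {
    ".doc": ("word", WD_FORMAT_PDF),
    ".docx": ("word", WD_FORMAT_PDF),
    ".xls": ("excel", XL_TYPE_PDF),
    ".xlsx": ("excel", XL_TYPE_PDF),
    ".ppt": ("powerpoint", PP_SAVE_AS_PDF),
    ".pptx": ("powerpoint", PP_SAVE_AS_PDF),
}


def plan_conversion(path):
    """Return (application, format_code, pdf_path), or None for non-Office files."""
    base, ext = os.path.splitext(path)
    entry = PDF_FORMATS.get(ext.lower())  # match .DOCX as well as .docx
    if entry is None:
        return None
    app, format_code = entry
    return app, format_code, os.path.abspath(base) + ".pdf"
```

A caller could then pick the right converter from the first element of the tuple instead of repeating the extension checks.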

With the conversion functions in place, a short driver script is all that remains:

import os,csv
import comtypes.client
from  office import WordtoPDF,PPTtoPDF,ExceltoPDF
# Start the COM servers (one instance of each Office application)
word  = comtypes.client.CreateObject("Word.Application")
excel = comtypes.client.CreateObject("Excel.Application")
ppt   = comtypes.client.CreateObject("Powerpoint.Application")

word.Visible = 1
excel.Visible = 1
ppt.Visible = 1
# Root directory to scan
workdir = r"D:\0.tem\zdzx\课题2017ZX05049006归档文件-20210811"
# Log file: one row per file (full path, file name, 1 = converted / 0 = failed)
f = open('convert_log.csv','w',encoding='utf-8',newline="")
csv_write = csv.writer(f)

for root, dirs, files in os.walk(workdir):    
    for name in files:
        fileadd = os.path.join(root,name)
        filename, file_extension = os.path.splitext(fileadd)
        if not os.path.exists(filename+'.pdf'):
            if file_extension in ('.doc', '.docx', '.xls', '.xlsx', '.ppt', '.pptx'):
                try:
                    if file_extension in ('.doc', '.docx'):
                        WordtoPDF(word,fileadd)
                    elif file_extension in ('.xls', '.xlsx'):
                        ExceltoPDF(excel,fileadd)
                    else:
                        PPTtoPDF(ppt,fileadd)
                    csv_write.writerow([fileadd,name,1])
                except Exception:
                    # Log the failure and move on to the next file
                    csv_write.writerow([fileadd,name,0])

f.close()
word.Quit()
excel.Quit()
ppt.Quit()
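
The file-selection part of the script can also be pulled out into a standalone generator, which makes it possible to check what would be converted without starting Office. This is a minimal sketch, not the code from the repository: find_pending_files and OFFICE_EXTENSIONS are hypothetical names. It additionally assumes two things the script does not handle: extensions may appear in upper case, and files whose names begin with ~$ are Office's temporary lock files and should be skipped.

```python
import os

OFFICE_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def find_pending_files(workdir):
    """Yield Office files under workdir that have no PDF next to them yet."""
    for root, _dirs, files in os.walk(workdir):
        for name in files:
            if name.startswith("~$"):
                continue  # assumed to be an Office lock file, not a document
            base, ext = os.path.splitext(name)
            if ext.lower() not in OFFICE_EXTENSIONS:
                continue  # not a Word, Excel, or PowerPoint file
            if os.path.exists(os.path.join(root, base + ".pdf")):
                continue  # already converted on an earlier run
            yield os.path.join(root, name)
```

Running `for path in find_pending_files(workdir): print(path)` before the real conversion gives a quick preview of the remaining work.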

The project's GitHub address, mainly for my own records:

rename/covert at master · gouff/rename (github.com)

Finally, here is where to look up the classes and methods of the Office COM interface:

Office Visual Basic for Applications (VBA) reference | Microsoft Docs
