Converting doc to docx (Java-Python)

PyJava老鸟 / 2024-07-09

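This article implements doc-to-docx conversion with the help of Python. I looked into open-source tools and libraries first, but their conversion results were not satisfactory, so I chose Python.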

1. /resources/convert.py (place the .py file under resources)

import argparse

from doc2docx import convert  # third-party package: pip install doc2docx


def convert_doc_to_docx(doc_file_path, docx_file_path):
    """Convert one .doc file to .docx by delegating to doc2docx."""
    convert(doc_file_path, docx_file_path)


if __name__ == "__main__":
    # Both paths are positional arguments, so the Java side can pass them in order.
    parser = argparse.ArgumentParser(description='Convert a .doc file to .docx')
    parser.add_argument('input', help='Input .doc file path')
    parser.add_argument('output', help='Output .docx file path')
    args = parser.parse_args()

    convert_doc_to_docx(args.input, args.output)

2. Java code: installPythonPackage

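    private static void installPythonPackage() {
        // Run pip through the same "python" executable that later runs convert.py,
        // so the package lands in the interpreter that will import it.
        ProcessBuilder pb = new ProcessBuilder("python", "-m", "pip", "install", "doc2docx");
        // Forward pip's output to this process's console; an unread output pipe
        // can fill up and block the child process.
        pb.inheritIO();

        try {
            Process process = pb.start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                System.out.println("Package installation failed with exit code: " + exitCode);
            } else {
                System.out.println("Package installed successfully.");
            }
        } catch (IOException e) {
            System.out.println("An error occurred during package installation:");
            e.printStackTrace();
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
            System.out.println("Package installation was interrupted.");
        }
    }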
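Running the install on every start costs time even when the package is already present. The following is a minimal sketch of a pre-check, assuming Java 9 or later and a `python` executable on the PATH. `PipCheck` and its methods are hypothetical names that are not part of the original utility; the sketch relies on `pip show` returning a non-zero exit code when the package is missing.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical helper (not in the original Doc2DocxUtil). */
final class PipCheck {

    /** Builds the argument list for "python -m pip show PACKAGE". */
    static List<String> showCommand(String python, String packageName) {
        return List.of(python, "-m", "pip", "show", packageName);
    }

    /**
     * Runs the command and reports whether it exited with code 0.
     * A missing executable or an interruption counts as "false".
     */
    static boolean succeeds(List<String> command) {
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    // Discard the output so the child never blocks on a full pipe.
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            return p.waitFor() == 0;
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** True when pip reports the package as installed for the given interpreter. */
    static boolean isInstalled(String python, String packageName) {
        return succeeds(showCommand(python, packageName));
    }
}
```

With this in place, installPythonPackage could return early when `PipCheck.isInstalled("python", "doc2docx")` is true.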

3. Java code: convertDocToDocx

    // Needs java.io.IOException, java.io.InputStream and java.nio.file.{Files, Path, StandardCopyOption}.
    public static void convertDocToDocx(String docFilePath, String docxFilePath) throws IOException, InterruptedException {
        // Create a temporary file to hold the script
        Path temp = Files.createTempFile("script", ".py");
        try {
            // Open the bundled script from the classpath and copy it into the temporary file
            try (InputStream in = Doc2DocxUtil.class.getClassLoader().getResourceAsStream("convert.py")) {
                if (in == null) {
                    throw new IllegalArgumentException("Script file not found");
                }
                Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
            }

            ProcessBuilder pb = new ProcessBuilder("python", temp.toAbsolutePath().toString(), docFilePath, docxFilePath);
            // Forward the script's output to the console so the child never blocks on a full pipe
            pb.inheritIO();
            Process p = pb.start();
            int exitCode = p.waitFor();
            if (exitCode != 0) {
                throw new RuntimeException("Python script execution failed with exit code " + exitCode);
            }
        } finally {
            // Delete the temporary script as soon as the conversion finishes
            Files.deleteIfExists(temp);
        }
    }
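The conversion waits for the Python process with no upper bound, so a stuck conversion would block the calling thread indefinitely. Below is a rough sketch of a runner with a timeout that also captures the child's output for error messages. `TimedRunner`, `Result` and `run` are hypothetical names, not part of the original code, and decoding the output as UTF-8 is an assumption about the child process.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not in the original Doc2DocxUtil). */
final class TimedRunner {

    /** Exit code plus everything the child wrote to stdout and stderr. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    static Result run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        // Send the output to a file: no reader thread is needed and the pipe cannot fill up.
        Path log = Files.createTempFile("process", ".log");
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(log.toFile())
                    .start();
            if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                // Kill the child and wait until it is really gone before reporting.
                p.destroyForcibly();
                p.waitFor();
                throw new IOException("Timed out after " + timeoutSeconds + " s: " + command);
            }
            // Assumption: the child writes UTF-8; undecodable bytes are replaced, not rejected.
            String output = new String(Files.readAllBytes(log), StandardCharsets.UTF_8);
            return new Result(p.exitValue(), output);
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

In convertDocToDocx, the command list would be the same four arguments passed to ProcessBuilder, and a non-zero `exitCode` could include `output` in the exception message.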

4. Add a static initializer block to the Doc2DocxUtil class

    static{
        installPythonPackage();
    }
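A static block runs as soon as the class is loaded, so pip is invoked even if no conversion is ever requested. One possible alternative, sketched here under the assumption that deferring the install to the first conversion is acceptable, is a small run-once guard. `RunOnce` is a hypothetical helper and not part of the original code.

```java
/** Hypothetical helper (not in the original Doc2DocxUtil). */
final class RunOnce {
    private final Runnable action;
    private boolean done;

    RunOnce(Runnable action) {
        this.action = action;
    }

    /**
     * Runs the action on the first call only. The method is synchronized, so
     * concurrent callers wait until the first run has finished. If the action
     * throws, "done" stays false and the next call tries again.
     */
    synchronized void ensure() {
        if (!done) {
            action.run();
            done = true;
        }
    }
}
```

A field such as `private static final RunOnce INSTALL = new RunOnce(Doc2DocxUtil::installPythonPackage);` together with a call to `INSTALL.ensure()` at the top of convertDocToDocx would then replace the static block.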

5. main method

    public static void main(String[] args) throws Exception {
        String docFilePath = "D:\\yy\\xxx.doc";
        String docxFilePath = "D:\\yy\\xxx.docx";

        convertDocToDocx(docFilePath, docxFilePath);
    }
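The main method spells out both paths by hand. When the output should simply sit next to the input, the target path can be derived instead. This is a small sketch; `DocxPaths.docxPathFor` is a hypothetical helper that is not part of the original utility.

```java
import java.nio.file.Path;
import java.util.Locale;

/** Hypothetical helper (not in the original Doc2DocxUtil). */
final class DocxPaths {

    /** Returns the sibling path with the ".doc" extension replaced by ".docx". */
    static Path docxPathFor(Path doc) {
        Path fileName = doc.getFileName();
        if (fileName == null) {
            throw new IllegalArgumentException("Path has no file name: " + doc);
        }
        String name = fileName.toString();
        // Compare case-insensitively so that "REPORT.DOC" is accepted as well.
        if (!name.toLowerCase(Locale.ROOT).endsWith(".doc")) {
            throw new IllegalArgumentException("Not a .doc file: " + doc);
        }
        String base = name.substring(0, name.length() - ".doc".length());
        // resolveSibling keeps the parent directory of the input file.
        return doc.resolveSibling(base + ".docx");
    }
}
```

The result of `DocxPaths.docxPathFor(Paths.get(docFilePath)).toString()` could then be passed as the second argument of convertDocToDocx.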