
How to convert docx/odt to pdf/html with Java? | Angelo's Blog

 libZMT 2016-12-19
 

How to convert docx/odt to pdf/html with Java? This question comes up all the time on forums such as Stack Overflow, so I decided to write an article that lists the open source Java frameworks that can do it.

Here are some paid products that convert docx/odt to pdf/html:

To be honest, I have not tried those solutions because they are not free, so I will not cover them in this article.

Here are some open source products that convert docx/odt to pdf/html:

  • JODConverter : JODConverter automates conversions between office document formats using OpenOffice.org or LibreOffice. Supported formats include OpenDocument, PDF, RTF, HTML, Word, Excel, PowerPoint, and Flash. It can be used as a Java library, a command line tool, or a web application.
  • docx4j: docx4j is a Java library for creating and manipulating Microsoft Open XML (Word docx, Powerpoint pptx, and Excel xlsx) files. It is similar to Microsoft’s OpenXML SDK, but for Java. docx4j uses JAXB to create the in-memory object representation.
  • XDocReport which provides:

Here are the criteria that I think are important for a converter:

  • best renderer: the converter must not lose formatting information.
  • fast: the converter must be as fast as possible.
  • memory-efficient, to avoid OutOfMemory problems.
  • streaming: use InputStream/OutputStream instead of File. Streaming avoids some problems (the hard disk is not used, so no write permission on the disk is needed).
  • easy to install: no need to install OpenOffice/LibreOffice or MS Word on the server to run the converter.

In this article I will introduce these 3 Java converter frameworks and compare them, giving the pros and cons of each. I will try to be as frank as possible, because I am one of the XDocReport developers.

If you want to quickly compare the conversion results, performance, etc. of docx4j and XDocReport, you can play with our live demo, which provides a JAX-RS REST converter service.

Sorry for my English!

Before you start reading this article, I would like to apologize for my bad English. I don’t want to say "XDocReport is the best" and I don’t want to offend the JODConverter, docx4j, or FOP developers. The goal of this article is to introduce these 3 converter frameworks and share what I know about converting odt and docx to PDF.

Download

You can download samples of docx/odt converters explained in this article :

How to manage PDF with Java?

Here are the 3 best-known Java PDF libraries:

  • Apache FOP: Apache FOP (Formatting Objects Processor) is a print formatter driven by XSL formatting objects (XSL-FO) and an output independent formatter. It is a Java application that reads a formatting object (FO) tree and renders the resulting pages to a specified output. Output formats currently supported include PDF, PS, PCL, AFP, XML (area tree representation), Print, AWT and PNG, and to a lesser extent, RTF and TXT. The primary output target is PDF.
  • Apache PDFBox: The Apache PDFBox library is an open source Java tool for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. Apache PDFBox also includes several command line utilities. Apache PDFBox is published under the Apache License v2.0
  • iText: iText is a library that allows you to create and manipulate PDF documents. It enables developers looking to enhance web- and other applications with dynamic PDF document generation and/or manipulation.

    With iText, there are 2 versions:

    • 2.1.x (the last release being 2.1.7), which is under the MPL/LGPL licenses.
    • 5.x, which is under the AGPL license.

How to convert docx/odt to pdf/html with Java?

For information, docx and odt files are zip archives composed of:

  • several XML entries, such as word/document.xml (docx) or content.xml (odt), which describes the content of the document in XML, styles.xml, which describes the styles used, etc.
  • binary data for images.
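
To see that structure for yourself, the entries of such an archive can be listed with the JDK alone. The following is a minimal sketch, not code from the sample projects: the class name ZipEntryLister and the fakeDocx helper are hypothetical, and fakeDocx only mimics the entry layout of a docx (it is not a valid Word document).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipEntryLister {

    /** Returns the entry names of a zip-based document (docx or odt) read from a stream. */
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Builds a tiny in-memory zip with docx-like entry names (hypothetical test data). */
    static byte[] fakeDocx() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"word/document.xml", "word/styles.xml"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<xml/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // To inspect a real file, pass a FileInputStream on your docx or odt instead.
        System.out.println(listEntries(new ByteArrayInputStream(fakeDocx())));
    }
}
```

Running it against a real docx or odt should show the XML entries and image entries described above.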

To compare the performance of the JODConverter, docx4j, and XDocReport converters, tests must follow 2 rules:

  • logs must be disabled so that the time spent generating logs is ignored (e.g. docx4j generates a lot of logs, which degrades performance).
  • the docx/odt must be converted to html/pdf twice, so that the initialization time of the converter framework is ignored (e.g. the connection to LibreOffice with JODConverter, the loading of JAXB classes by docx4j, etc). To compare our converter frameworks, we will convert the docx twice and keep the last elapsed time.
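
The "convert twice, keep the last time" rule can be factored into a small helper. This is only a sketch under my own naming (WarmTimer and lastElapsedMillis are hypothetical, not part of any of the three frameworks), and the workload in main is a placeholder for a real conversion call.

```java
final class WarmTimer {

    /**
     * Runs the task the given number of times and returns the elapsed time of the
     * last run in milliseconds, so that one-off initialization cost is left out.
     */
    static long lastElapsedMillis(Runnable task, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        long elapsed = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed = (System.nanoTime() - start) / 1_000_000;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace it with a call to the converter under test.
        Runnable fakeConversion = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        };
        System.out.println("Warm run took " + lastElapsedMillis(fakeConversion, 2) + "ms");
    }
}
```

Two runs match the rule used in this article; a more careful benchmark would use more runs and a dedicated tool, but that is outside the scope of this comparison.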

To compare the quality of the conversion, each sample converter project includes several docx files designed with tables (borders, row/column spans), headers/footers, images, etc. In this article we will just study the simple HelloWorld.docx:

But you can run the other docx files of each Java Eclipse project to see the result of the html and pdf conversion.

JODConverter with docx

To test and use JODConverter, you need to install OpenOffice or LibreOffice. In my case I have installed LibreOffice 3.5 on Windows.

The org.samples.docxconverters.jodconverter Eclipse project, which you can download here, is a sample docx converter built with JODConverter. This project contains:

  • a docx folder which contains several docx files to convert. Those docx files come from the XDocReport Git repository, where we use them to test our converter.
  • pdf and html folders where the docx files will be converted.
  • a lib folder with JODConverter and its dependency JARs.

Download JARs

To download JODConverter JARs, download the zip jodconverter-core-3.0-beta-4-dist.zip, unzip it and copy paste the lib folder of the zip to your Eclipse Java project. Add those JARs in your classpath.

My test was done with LibreOffice 3.5 and the official distribution doesn’t work with LibreOffice 3.5 (see issue 103).
To fix this problem, I have replaced the official JARs jodconverter-core-3.0-beta-4.jar with jodconverter-core-3.0-beta-4-jahia2.jar.

HTML converter

Here is the JODConverter Java code which converts "docx/HelloWorld.docx" to "html/HelloWorld.html" twice:

package org.samples.docxconverters.jodconverter.html;
import java.io.File;
import org.artofsolving.jodconverter.OfficeDocumentConverter;
import org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration;
import org.artofsolving.jodconverter.office.OfficeManager;
public class HelloWorldToHTML {
    public static void main(String[] args) {
        // 1) Start LibreOffice in headless mode.
        OfficeManager officeManager = null;
        try {
            officeManager = new DefaultOfficeManagerConfiguration()
                    .setOfficeHome(new File("C:/Program Files/LibreOffice 3.5"))
                    .buildOfficeManager();
            officeManager.start();
            // 2) Create JODConverter converter
            OfficeDocumentConverter converter = new OfficeDocumentConverter(
                    officeManager);
            // 3) Create HTML
            createHTML(converter);
            createHTML(converter);
        } finally {
            // 4) Stop LibreOffice in headless mode.
            if (officeManager != null) {
                officeManager.stop();
            }
        }
    }
    private static void createHTML(OfficeDocumentConverter converter) {
        try {
            long start = System.currentTimeMillis();
            converter.convert(new File("docx/HelloWorld.docx"), new File(
                    "html/HelloWorld.html"));
            System.err.println("Generate html/HelloWorld.html with "
                    + (System.currentTimeMillis() - start) + "ms");
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}

Note that the code uses java.io.File for the docx input and the html output, because JODConverter cannot work with streams.
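
If the calling code only has streams, one possible workaround is to spool them through temporary files around a file-only converter. The sketch below is an assumption of mine, not a JODConverter feature: TempFileBridge and the FileConverter interface are hypothetical names, and the demo uses a fake converter that simply copies bytes. It still needs a writable temp directory, so it does not remove the write-permission constraint.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class TempFileBridge {

    /** Hypothetical stand-in for a converter that only accepts files. */
    interface FileConverter {
        void convert(File input, File output) throws IOException;
    }

    /** Spools the input to a temp file, converts file-to-file, then streams the result out. */
    static void convert(FileConverter converter, InputStream in, OutputStream out,
            String inputSuffix, String outputSuffix) throws IOException {
        Path input = Files.createTempFile("convert-in-", inputSuffix);
        Path output = Files.createTempFile("convert-out-", outputSuffix);
        try {
            Files.copy(in, input, StandardCopyOption.REPLACE_EXISTING);
            converter.convert(input.toFile(), output.toFile());
            Files.copy(output, out);
        } finally {
            // Always clean up, even when the conversion fails.
            Files.deleteIfExists(input);
            Files.deleteIfExists(output);
        }
    }

    public static void main(String[] args) throws IOException {
        // Fake converter for the demo: copies the input bytes unchanged.
        FileConverter copy = (src, dst) ->
                Files.copy(src.toPath(), dst.toPath(), StandardCopyOption.REPLACE_EXISTING);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        convert(copy, new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)),
                out, ".docx", ".pdf");
        System.out.println(out.size() + " bytes written to the output stream");
    }
}
```

With JODConverter, the lambda would presumably delegate to converter.convert(src, dst) on the OfficeDocumentConverter shown above; the suffix parameters exist so that the temporary files keep meaningful extensions.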

After running this class, you will see a few JODConverter logs on the console, followed by the elapsed time of each conversion:

Generate html/HelloWorld.html with 12109ms
Generate html/HelloWorld.html with 391ms

JODConverter converts a simple HelloWorld.docx to HTML in 391ms. The quality of the conversion is perfect.

Note that, in my case, connecting to LibreOffice takes a long time (5219ms), and so does disconnecting.

PDF converter

Here is the JODConverter Java code which converts "docx/HelloWorld.docx" to "pdf/HelloWorld.pdf" twice:

package org.samples.docxconverters.jodconverter.pdf;
import java.io.File;
import org.artofsolving.jodconverter.OfficeDocumentConverter;
import org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration;
import org.artofsolving.jodconverter.office.OfficeManager;
public class HelloWorldToPDF {
    public static void main(String[] args) {
        // 1) Start LibreOffice in headless mode.
        OfficeManager officeManager = null;
        try {
            officeManager = new DefaultOfficeManagerConfiguration()
                    .setOfficeHome(new File("C:/Program Files/LibreOffice 3.5"))
                    .buildOfficeManager();
            officeManager.start();
            // 2) Create JODConverter converter
            OfficeDocumentConverter converter = new OfficeDocumentConverter(
                    officeManager);
            // 3) Create PDF
            createPDF(converter);
            createPDF(converter);
        } finally {
            // 4) Stop LibreOffice in headless mode.
            if (officeManager != null) {
                officeManager.stop();
            }
        }
    }
    private static void createPDF(OfficeDocumentConverter converter) {
        try {
            long start = System.currentTimeMillis();
            converter.convert(new File("docx/HelloWorld.docx"), new File(
                    "pdf/HelloWorld.pdf"));
            System.err.println("Generate pdf/HelloWorld.pdf with "
                    + (System.currentTimeMillis() - start) + "ms");
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}

After running this class, you will see a few JODConverter logs on the console, followed by the elapsed time of each conversion:

Generate pdf/HelloWorld.pdf with 3172ms
Generate pdf/HelloWorld.pdf with 468ms

JODConverter converts a simple HelloWorld.docx to PDF in 468ms. The quality of the conversion is perfect.

docx4j

docx4j provides several docx converters:

  • docx to HTML converter.
  • docx to PDF converter based on XSL-FO and FOP.

The org.samples.docxconverters.docx4j Eclipse project, which you can download here, is a sample docx converter built with docx4j. This project contains:

  • a docx folder which contains several docx files to convert. Those docx files come from the XDocReport Git repository, where we use them to test our converter.
  • pdf and html folders where the docx files will be converted.
  • a lib folder with docx4j and its dependency JARs.

For docx4j, logs must be disabled because it generates a lot of logs, which degrades performance. To do that:

  • create src/docx4j.properties like this:
    docx4j.Log4j.Configurator.disabled=true
  • create src/log4j.properties like this:
    log4j.rootLogger=ERROR

Download

with maven

To download docx4j and its dependency JARs, the best way is to use maven with this pom:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.samples.docxconverters.docx4j</groupId>
    <artifactId>org.samples.docxconverters.docx4j</artifactId>
    <packaging>pom</packaging>
    <version>0.0.1-SNAPSHOT</version>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <version>2.1</version>
                <executions>
                    <execution>
                        <id>copy-dependencies</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                        <configuration>
                            <outputDirectory>lib</outputDirectory>
                            <overWriteReleases>true</overWriteReleases>
                            <overWriteSnapshots>true</overWriteSnapshots>
                            <overWriteIfNewer>true</overWriteIfNewer>
                            <excludeTypes>libd</excludeTypes>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
            <groupId>org.docx4j</groupId>
            <artifactId>docx4j</artifactId>
            <version>2.8.1</version>
        </dependency>
    </dependencies>
</project>

Then you can run:

mvn process-resources

and it will download the required JARs and copy them to the lib folder.

without maven

Go to the docx4j downloads page.

HTML converter

Here is the docx4j Java code which converts "docx/HelloWorld.docx" to "html/HelloWorld.html" twice:

package org.samples.docxconverters.docx4j.html;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.transform.stream.StreamResult;
import org.docx4j.convert.out.html.AbstractHtmlExporter;
import org.docx4j.convert.out.html.AbstractHtmlExporter.HtmlSettings;
import org.docx4j.convert.out.html.HtmlExporterNG2;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
public class HelloWorldToHTML {
    public static void main(String[] args) {
        createHTML();
        createHTML();
    }
    private static void createHTML() {
        try {
            long start = System.currentTimeMillis();
            // 1) Load DOCX into WordprocessingMLPackage
            InputStream is = new FileInputStream(new File(
                    "docx/HelloWorld.docx"));
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
                    .load(is);
            // 2) Prepare HTML settings
            HtmlSettings htmlSettings = new HtmlSettings();
            // 3) Convert WordprocessingMLPackage to HTML
            OutputStream out = new FileOutputStream(new File(
                    "html/HelloWorld.html"));
            AbstractHtmlExporter exporter = new HtmlExporterNG2();
            StreamResult result = new StreamResult(out);
            exporter.html(wordMLPackage, result, htmlSettings);
            System.err.println("Generate html/HelloWorld.html with "
                    + (System.currentTimeMillis() - start) + "ms");
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}

Note that the code uses InputStream/OutputStream (streaming) for the docx input and the html output.

After running this class, you will see the elapsed time of each conversion on the console:

Generate html/HelloWorld.html with 5109ms
Generate html/HelloWorld.html with 47ms

docx4j converts a simple HelloWorld.docx to HTML in 47ms. The quality of the conversion is perfect.

PDF converter

Here is the docx4j Java code which converts "docx/HelloWorld.docx" to "pdf/HelloWorld.pdf" twice:

package org.samples.docxconverters.docx4j.pdf;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.docx4j.convert.out.pdf.PdfConversion;
import org.docx4j.convert.out.pdf.viaXSLFO.PdfSettings;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
public class HelloWorld2PDF {
    public static void main(String[] args) {
        createPDF();
        createPDF();
    }
    private static void createPDF() {
        try {
            long start = System.currentTimeMillis();
            // 1) Load DOCX into WordprocessingMLPackage
            InputStream is = new FileInputStream(new File(
                    "docx/HelloWorld.docx"));
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
                    .load(is);
            // 2) Prepare Pdf settings
            PdfSettings pdfSettings = new PdfSettings();
            // 3) Convert WordprocessingMLPackage to Pdf
            OutputStream out = new FileOutputStream(new File(
                    "pdf/HelloWorld.pdf"));
            PdfConversion converter = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(
                    wordMLPackage);
            converter.output(out, pdfSettings);
            System.err.println("Generate pdf/HelloWorld.pdf with "
                    + (System.currentTimeMillis() - start) + "ms");
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}

Note that the code uses InputStream/OutputStream (streaming) for the docx input and the pdf output.

After running this class, you will see the elapsed time of each conversion on the console:

Generate pdf/HelloWorld.pdf with 16156ms
Generate pdf/HelloWorld.pdf with 219ms

docx4j converts a simple HelloWorld.docx to PDF in 219ms. The quality of the conversion is perfect.

XDocReport (Apache POI XWPF)

XDocReport provides docx converters based on Apache POI XWPF:

Note that this converter works only with the docx format, not with the doc format. If you wish to convert the doc format, please see the official converter of Apache POI.

The basic idea with XDocReport (Apache POI XWPF) is to

  1. load the docx with the Apache POI XWPF XWPFDocument.
  2. loop over each XWPF Java structure (XWPFParagraph, XWPFTable, etc) of the loaded XWPFDocument to
    • generate HTML with SAX for the html converter.
    • generate PDF with iText structures (Paragraph, table, etc).
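
To give a feel for this "walk the document structures and emit output" idea, here is a toy sketch that uses only the JDK. It is not how XDocReport is implemented: XDocReport walks the Apache POI XWPF object model, whereas this example (hypothetical class ToyDocxToHtml, Java 9+ for readAllBytes) reads word/document.xml directly with SAX and only handles paragraphs (w:p) and text runs (w:t). The sampleDocx helper builds a minimal docx-like zip for the demo, not a complete Word document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class ToyDocxToHtml {

    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Reads word/document.xml from a docx stream and emits one <p> per w:p element. */
    static String convert(InputStream docx) throws Exception {
        StringBuilder html = new StringBuilder();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    parse(zip.readAllBytes(), html); // reads the current entry only
                    break;
                }
            }
        }
        return html.toString();
    }

    private static void parse(byte[] xml, StringBuilder html) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new ByteArrayInputStream(xml), new DefaultHandler() {
            private boolean inText;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (W_NS.equals(uri) && "p".equals(local)) html.append("<p>");
                if (W_NS.equals(uri) && "t".equals(local)) inText = true;
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (W_NS.equals(uri) && "t".equals(local)) inText = false;
                if (W_NS.equals(uri) && "p".equals(local)) html.append("</p>");
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inText) html.append(escape(new String(ch, start, length)));
            }
        });
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds a minimal docx-like zip holding one paragraph with two runs (demo data). */
    static byte[] sampleDocx() throws IOException {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> World</w:t></w:r></w:p>"
                + "</w:body></w:document>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(convert(new ByteArrayInputStream(sampleDocx())));
        // prints: <p>Hello World</p>
    }
}
```

A real converter also has to resolve styles, tables, images, headers and footers, which is where most of the work (and most of the rendering differences between frameworks) comes from.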

The org.samples.docxconverters.xdocreport Eclipse project, which you can download here, is a sample docx converter built with XDocReport (Apache POI XWPF). This project contains:

  • a docx folder which contains several docx files to convert. Those docx files come from the XDocReport Git repository, where we use them to test our converter.
  • pdf and html folders where the docx files will be converted.
  • a lib folder with XDocReport and its dependency JARs.

Download

with maven

To download XDocReport (Apache POI XWPF) and its dependency JARs, the best way is to use maven with this pom:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.samples.docxconverters.xdocreport</groupId>
    <artifactId>org.samples.docxconverters.xdocreport</artifactId>
    <packaging>pom</packaging>
    <version>0.0.1-SNAPSHOT</version>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <version>2.1</version>
                <executions>
                    <execution>
                        <id>copy-dependencies</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                        <configuration>
                            <outputDirectory>lib</outputDirectory>
                            <overWriteReleases>true</overWriteReleases>
                            <overWriteSnapshots>true</overWriteSnapshots>
                            <overWriteIfNewer>true</overWriteIfNewer>
                            <excludeTypes>libd</excludeTypes>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
            <groupId>fr.opensagres.xdocreport</groupId>
            <artifactId>org.apache.poi.xwpf.converter.xhtml</artifactId>
            <version>1.0.0</version>
        </dependency>
        <dependency>
            <groupId>fr.opensagres.xdocreport</groupId>
            <artifactId>org.apache.poi.xwpf.converter.pdf</artifactId>
            <version>1.0.0</version>
        </dependency>
    </dependencies>
</project>

Then you can run:

mvn process-resources

and it will download the required JARs and copy them to the lib folder.

without maven

You can download docx.converters-xxx-sample.zip, which contains the required JARs.

HTML converter

Here is the XDocReport (Apache POI XWPF) Java code which converts "docx/HelloWorld.docx" to "html/HelloWorld.html" twice:

package org.samples.docxconverters.xdocreport.html;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class HelloWorldToHTML {
    public static void main(String[] args) {
        createHTML();
        createHTML();
    }
    private static void createHTML() {
        try {
            long start = System.currentTimeMillis();
            // 1) Load DOCX into XWPFDocument
            InputStream is = new FileInputStream(new File(
                    "docx/HelloWorld.docx"));
            XWPFDocument document = new XWPFDocument(is);
            // 2) Prepare Html options
            XHTMLOptions options = XHTMLOptions.create();
            // 3) Convert XWPFDocument to HTML
            OutputStream out = new FileOutputStream(new File(
                    "html/HelloWorld.html"));
            XHTMLConverter.getInstance().convert(document, out, options);
            System.err.println("Generate html/HelloWorld.html with "
                    + (System.currentTimeMillis() - start) + "ms");
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}

Note that the code uses InputStream/OutputStream (streaming) for the docx input and the html output.

After running this class, you will see the elapsed time of each conversion on the console:

Generate html/HelloWorld.html with 828ms
Generate html/HelloWorld.html with 32ms

XDocReport (Apache POI XWPF) converts a simple HelloWorld.docx to HTML in 32ms. The quality of the conversion is perfect.

PDF converter

Here is the XDocReport (Apache POI XWPF) Java code which converts "docx/HelloWorld.docx" to "pdf/HelloWorld.pdf" twice:

package org.samples.docxconverters.xdocreport.pdf;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class HelloWorldToPDF {
    public static void main(String[] args) {
        createPDF();
        createPDF();
    }
    private static void createPDF() {
        try {
            long start = System.currentTimeMillis();
            // 1) Load DOCX into XWPFDocument
            InputStream is = new FileInputStream(new File(
                    "docx/HelloWorld.docx"));
            XWPFDocument document = new XWPFDocument(is);
            // 2) Prepare Pdf options
            PdfOptions options = PdfOptions.create();
            // 3) Convert XWPFDocument to Pdf
            OutputStream out = new FileOutputStream(new File(
                    "pdf/HelloWorld.pdf"));
            PdfConverter.getInstance().convert(document, out, options);
             
            System.err.println("Generate pdf/HelloWorld.pdf with "
                    + (System.currentTimeMillis() - start) + "ms");
             
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}

Note that the code uses InputStream/OutputStream (streaming) for the docx input and the pdf output.

After running this class, you will see the elapsed time of each conversion on the console:

Generate pdf/HelloWorld.pdf with 1375ms
Generate pdf/HelloWorld.pdf with 63ms

XDocReport (Apache POI XWPF) converts a simple HelloWorld.docx to PDF in 63ms. The quality of the conversion is perfect.

Comparison

docx converters with HelloWorld

| Framework | Fast for HTML? | Fast for PDF? | Less memory intensive? | Streaming? | Easy to install? |
|---|---|---|---|---|---|
| JODConverter | 391ms | 468ms | Don’t know how to test that | No | No, because it requires installation of OpenOffice/LibreOffice |
| docx4j | 47ms | 219ms | OutOfMemory when memory is not enough | Yes | Yes |
| XDocReport | 32ms | 63ms | Yes | Yes | Yes |
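
To put these timings in proportion, the ratios against the fastest converter can be computed with a few lines. This is just arithmetic on the HelloWorld numbers above (the class name SpeedRatios is hypothetical); with these figures JODConverter comes out roughly 12 times slower for HTML and 7 times slower for PDF than XDocReport, and docx4j roughly 1.5 and 3.5 times slower.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class SpeedRatios {

    /** How many times slower millis is than baselineMillis. */
    static double ratio(long millis, long baselineMillis) {
        return (double) millis / baselineMillis;
    }

    public static void main(String[] args) {
        // Warm-run timings in ms for HelloWorld.docx: {HTML, PDF}.
        Map<String, long[]> timings = new LinkedHashMap<>();
        timings.put("JODConverter", new long[] {391, 468});
        timings.put("docx4j", new long[] {47, 219});
        timings.put("XDocReport", new long[] {32, 63});

        long[] base = timings.get("XDocReport");
        timings.forEach((name, t) -> System.out.printf(Locale.ROOT,
                "%-12s HTML x%.1f  PDF x%.1f%n",
                name, ratio(t[0], base[0]), ratio(t[1], base[1])));
    }
}
```

Keep in mind that these ratios come from a single trivial document on one machine, so they only indicate an order of magnitude.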

What about complex docx conversion?

At this point we have seen how to convert a simple docx to html and pdf with JODConverter, docx4j and XDocReport (Apache POI XWPF).

But a docx can be more complex, with tables, paragraphs, headers/footers, images, etc. To compare the results of html and pdf conversion, you can run the other *ToHTML and *ToPDF classes included in the 3 projects.

It’s difficult to tell which converter framework is the best: it depends on the content of the docx:

  • JODConverter doesn’t handle table borders correctly (see FormattingTests.docx).
  • docx4j has problems with table borders styled for the first/last row (see AdvancedTable.docx).
  • XDocReport has problems with bulleted lists (see Resume.docx).

If you have a problem with XDocReport, please create an issue with your docx or odt attached and explain your problem.

Conclusion

JODConverter

The pro of JODConverter is that it is based on OpenOffice/LibreOffice, which is powerful software for writing and converting documents. The quality of the conversion is very good.

However, in my case with LibreOffice 3.5, I had several problems converting docx to pdf with table borders (see FormattingTests.docx) and row/column spans (see TableWithRowsColsSpanToPDF).

The cons of JODConverter are:

  • you must install OpenOffice/LibreOffice on the server.
  • you must have write permission on the server, because it doesn’t support streaming.
  • performance is not very good.

docx4j

docx4j is a great library for managing docx files (merging several docx, comparing them, etc). It provides several implementations for conversion to PDF, such as FOP and iText, which support streaming and are easy to install (no need to install MS Word or OpenOffice/LibreOffice). But the iText version is not official and does not render well. Conversion with FOP renders well.

The cons of the docx converter with FOP are:

  • FOP is slower than iText.
  • with FOP you can run into OutOfMemory problems. In our case we needed to add memory to the Tomcat server on CloudBees for our live demo to avoid OutOfMemory.
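
One rough way to get a number for memory use is to read the peak usage of the JVM heap pools around a conversion. The sketch below relies only on the standard java.lang.management API; the class name HeapPeakProbe is hypothetical, and the figure is an estimate (the peaks of the different pools are summed even though they may not occur at the same moment, and garbage collection timing affects them), so treat it as an indication rather than a measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

final class HeapPeakProbe {

    /** Runs the task and returns the summed peak usage (bytes) of all heap pools during the run. */
    static long peakHeapBytes(Runnable task) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            pool.resetPeakUsage(); // the peak restarts from the pool's current usage
        }
        task.run();
        long peak = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getPeakUsage();
            if (pool.getType() == MemoryType.HEAP && usage != null) {
                peak += usage.getUsed();
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        // Placeholder workload that allocates about 16 MB; replace it with a conversion call.
        long peak = peakHeapBytes(() -> {
            byte[] buffer = new byte[16 * 1024 * 1024];
            buffer[0] = 1;
        });
        System.out.println("Peak heap during task: about " + peak / (1024 * 1024) + " MB");
    }
}
```

Comparing this figure for the same docx across converters, in separate JVM runs, would give a first idea of which one is more likely to hit OutOfMemory.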

Moreover, it uses XSLT (for XSL-FO). I’m not a big fan of XSL: debugging XSL is very hard (debugging Java is easy). I think docx4j should switch to the iText conversion implementation instead of FOP. FOP is very powerful, but in the case of an odt/docx converter I think it’s better to use iText.

I think FOP should provide the capability to create PDF with a Java model like iText does, and not only with FO. I posted my suggestion on the FOP forum but have not received any answer.

XDocReport

Since XDocReport 1.0.0, the odt/docx converters have improved a lot. They are fast, use less memory, support streaming and are easy to install (no need to install MS Word or OpenOffice/LibreOffice).

The renderer handles a lot of cases, but it’s not magic and there are still problems, for example with table borders and bulleted/numbered list shapes, which are not handled.

The HTML converters need further improvement (handling table borders, bulleted/numbered lists, etc).

Note that XDocReport provides REST converter services (you can use XDocReport (Apache POI XWPF) or docx4j). With this feature you can use those converters from any technology, like PHP, C#, Python, etc, by developing a REST client.

What about odt converters?

To convert odt to PDF or HTML, you can use:
