Automating document conversion in Linux using JODConverter/OOo
Problem: Need to automatically convert an existing Microsoft Word document to a PDF on the fly.
My solution is to use OpenOffice.org and JODConverter, and to call the JODConverter web service from PHP. I searched for quite a while for other ways to do this, and the alternatives were either much harder to set up or produced far worse-looking PDFs. Of course, you can convert a lot more than just DOC -> PDF. At last glance, the JODConverter homepage listed the following conversions:
* DOC to PDF, DOC to ODT, DOC to RTF
* XLS to PDF, XLS to ODS, XLS to CSV
* PPT to PDF, PPT to ODP, PPT to SWF
* ODT to PDF, ODT to DOC, ODT to RTF
* ODS to PDF, ODS to XLS, ODS to CSV
* ODP to PDF, ODP to PPT, ODP to SWF
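JODConverter's web service picks the conversion by MIME type: the PHP client in step 5 names the input format in the Content-Type header and the desired output in Accept. Here is a minimal shell sketch of an extension-to-MIME lookup for the formats listed above. `mime_for_ext` is a hypothetical helper, and the type strings are the commonly registered ones rather than values copied from JODConverter, so check them against the format list of the version you deploy.

```shell
#!/bin/bash
# Hypothetical helper: map a file extension to the MIME type to send in the
# Content-Type / Accept headers of a JODConverter web-service request.
# Assumption: these are the common registered types; verify them against
# the JODConverter version you actually run.
mime_for_ext() {
  case "${1,,}" in                 # lower-case the extension (bash 4+)
    doc) echo "application/msword" ;;
    xls) echo "application/vnd.ms-excel" ;;
    ppt) echo "application/vnd.ms-powerpoint" ;;
    odt) echo "application/vnd.oasis.opendocument.text" ;;
    ods) echo "application/vnd.oasis.opendocument.spreadsheet" ;;
    odp) echo "application/vnd.oasis.opendocument.presentation" ;;
    pdf) echo "application/pdf" ;;
    rtf) echo "text/rtf" ;;
    csv) echo "text/csv" ;;
    swf) echo "application/x-shockwave-flash" ;;
    *)   return 1 ;;               # unknown extension: no output, non-zero status
  esac
}
```

For example, `mime_for_ext doc` prints `application/msword`, while `mime_for_ext xyz` prints nothing and returns a non-zero status.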
In any case, let’s see how to do this in 5 steps:
1. Remove all traces of whatever OpenOffice.org installations you may already have on your system. In my case, this was as simple as
[vic@ares:~$] sudo yum remove 'openoffice.org*'
2. Download and install the newest version of OOo from http://openoffice.org
[vic@ares:~$] wget http://spout.ussg.indiana.edu/openoffice/stable/2.4.0/OOo_2.4.0_LinuxIntel_install_wJRE_en-US.tar.gz
[...]
[vic@ares:~$] tar zxf OOo_2.4.0_LinuxIntel_install_wJRE_en-US.tar.gz
[vic@ares:~$] ls
[...]
[vic@ares:~$] cd OOH680_m12_native_packed-1_en-US.9286/
[vic@ares:~/OOH680_m12_native_packed-1_en-US.9286$] sudo ./setup
Checksumming…
Extracting …
93128 blocks
Done.
Using /var/tmp/install_21191/usr/java/jre1.6.0_04/bin/java
java version "1.6.0_04"
Java(TM) SE Runtime Environment (build 1.6.0_04-b12)
Java HotSpot(TM) Server VM (build 10.0-b19, mixed mode)
Running installer /var/tmp/install_21191/usr/java/jre1.6.0_04/bin/java -DHOME=/home/vic -DJRE_FILE=jre-6u4-linux-i586.rpm -jar JavaSetup.jar
System locale: en_US
Root privileges
OS: Linux
Mode: installation
If you have an X display available, the installer will open a graphical wizard at this point and walk you through the installation. If you choose a custom installation, make sure that 'Headless application support' is enabled.
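Before wiring the install into a boot script, it can be worth confirming that the soffice launcher landed where the init script expects it. A minimal sketch, assuming the /opt/openoffice.org2.4 prefix used for OOo_HOME in this post; `check_ooo_install` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: confirm that an OOo install directory contains an
# executable program/soffice launcher.
# Assumption: default prefix matches OOo_HOME in this post's init script.
check_ooo_install() {
  local ooo_home="${1:-/opt/openoffice.org2.4}"
  local soffice="$ooo_home/program/soffice"
  if [ -x "$soffice" ]; then
    echo "found $soffice"
    return 0
  fi
  echo "no executable soffice under $ooo_home/program" >&2
  return 1
}
```

Run it with no argument to check the default prefix, or pass another directory if the installer put OOo elsewhere.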
3. Write a short init script with chkconfig headers so the headless server starts automatically at boot time.
#!/bin/bash
# openoffice.org headless server script
#
# chkconfig: 2345 80 30
# description: headless openoffice server script
# processname: openoffice
OOo_HOME=/opt/openoffice.org2.4
SOFFICE_PATH=$OOo_HOME/program/soffice
# marker file only: it records that we started the server, it holds no PID
PIDFILE=$OOo_HOME/openoffice-server.pid
case "$1" in
start)
  if [ -f "$PIDFILE" ]; then
    echo "OpenOffice headless server has already started."
    exit
  fi
  echo "Starting OpenOffice headless server"
  # quote -accept so the shell does not treat the ';' as command separators,
  # and put the redirections before the '&' so they apply to soffice
  "$SOFFICE_PATH" -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard > /dev/null 2>&1 &
  touch "$PIDFILE"
  ;;
stop)
  if [ -f "$PIDFILE" ]; then
    echo "Stopping OpenOffice headless server."
    # soffice is a wrapper script; soffice.bin is the actual office process
    killall -9 soffice soffice.bin
    rm -f "$PIDFILE"
    exit
  fi
  echo "OpenOffice headless server is not running, foo."
  exit
  ;;
*)
  echo "Usage: $0 {start|stop}"
  exit 1
esac
exit 0
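The script above only touches an empty marker file as its PIDFILE, and its start branch depends on getting the background-and-redirect syntax right. Below is a minimal sketch of a helper that detaches a command with its input and output disconnected and prints the PID, which could be written into PIDFILE instead of an empty file. `start_detached` is a hypothetical name, not part of OOo or chkconfig.

```shell
#!/bin/bash
# Hypothetical helper: launch a command in the background, detached from the
# caller's stdin/stdout/stderr, and print its PID.
# The redirections must come *before* the trailing '&'. Written as
# "cmd & > /dev/null 2>&1", bash backgrounds cmd with its output still
# attached and then runs the redirection as a separate, empty command.
start_detached() {
  "$@" < /dev/null > /dev/null 2>&1 &
  echo "$!"
}

# Possible use in the start) branch (soffice is a wrapper script, so the PID
# may belong to the wrapper rather than to soffice.bin):
#   start_detached "$SOFFICE_PATH" -headless \
#     "-accept=socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard > "$PIDFILE"
```

Because the command's output goes to /dev/null, `$(start_detached ...)` returns immediately with just the PID instead of waiting for the background process to exit.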
Save it as /etc/init.d/openoffice and make it executable (sudo chmod 755 /etc/init.d/openoffice). Now we need to add it to chkconfig.
[vic@ares:/etc/init.d$] sudo chkconfig --add openoffice
[vic@ares:/etc/init.d$] sudo chkconfig --list openoffice
openoffice 0:off 1:off 2:on 3:on 4:on 5:on 6:off
[vic@ares:/etc/init.d$] sudo service openoffice start
Starting OpenOffice headless server
[vic@ares:/etc/init.d$]
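To double-check that chkconfig registered the runlevels from the `# chkconfig: 2345 80 30` header, here is a small sketch that extracts the enabled runlevels from one line of `chkconfig --list` output. `enabled_runlevels` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: print the runlevels marked "on" in one line of
# `chkconfig --list` output, separated by spaces.
enabled_runlevels() {
  local field levels=()
  for field in $1; do                    # word-split the line on spaces/tabs
    case "$field" in
      [0-6]:on) levels+=("${field%%:*}") ;;   # keep the digit before ':on'
    esac
  done
  echo "${levels[*]}"
}
```

Usage: `enabled_runlevels "$(chkconfig --list openoffice)"`. On the listing above it prints `2 3 4 5`, matching the header.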
Feel free to grep through ps aux to confirm that the soffice process did indeed start.
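Grepping ps only shows that the process exists. The converter also needs the UNO socket to be accepting connections. A minimal sketch that probes a port with bash's built-in /dev/tcp redirection, assuming 127.0.0.1 and port 8100 (JODConverter's default) and a bash built with network redirections, as most distribution builds are. `port_open` and `wait_for_port` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helper: succeed if something accepts TCP connections on
# host:port. Uses bash's /dev/tcp redirection, so no nc or telnet is needed.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Hypothetical helper: poll once per second until the port opens or the
# timeout (in seconds, default 30) expires.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-30} i
  for ((i = 0; i < timeout; i++)); do
    port_open "$host" "$port" && return 0
    sleep 1
  done
  return 1
}
```

Usage: `wait_for_port 127.0.0.1 8100 30 && echo "soffice is accepting connections"`. The same check with port 8080 works for the Tomcat instance used by the PHP client in step 5.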
4. Read "Usage as a Web Application" on Art of Solving, then download jodconverter-tomcat-2.2.1.zip and extract it somewhere. Make sure that either JAVA_HOME or JRE_HOME is set, and then run jodconverter/bin/startup.sh. If you realllllly feel like it, you can create another chkconfig script for this. Since I'm lazy, I just added it to my previous OpenOffice headless server script like so:
#!/bin/bash
# openoffice.org headless server script
#
# chkconfig: 2345 80 30
# description: headless openoffice server script
# processname: openoffice
#
# Author: Vic Vijayakumar
#
OOo_HOME=/opt/openoffice.org2.4
SOFFICE_PATH=$OOo_HOME/program/soffice
PIDFILE=$OOo_HOME/openoffice-server.pid
JOD_HOME=/wwwroot/apps/jodconverter
# Tomcat's startup.sh needs JAVA_HOME or JRE_HOME. Boot-time scripts may run
# with a minimal environment, so export one of them here if it is not set globally.
case "$1" in
start)
  if [ -f "$PIDFILE" ]; then
    echo "OpenOffice headless server has already started."
    exit
  fi
  echo "Starting OpenOffice headless server"
  # redirections go before the '&' so they apply to the backgrounded command
  "$SOFFICE_PATH" -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard > /dev/null 2>&1 &
  "$JOD_HOME/bin/startup.sh" > /dev/null 2>&1 &
  touch "$PIDFILE"
  ;;
stop)
  if [ -f "$PIDFILE" ]; then
    echo "Stopping OpenOffice headless server."
    killall -9 soffice soffice.bin
    "$JOD_HOME/bin/catalina.sh" stop > /dev/null 2>&1
    rm -f "$PIDFILE"
    exit
  fi
  echo "OpenOffice headless server is not running, foo."
  exit
  ;;
*)
  echo "Usage: $0 {start|stop}"
  exit 1
esac
exit 0
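Because startup.sh stops with an error when neither JAVA_HOME nor JRE_HOME is defined, and a boot-time environment may not carry your login shell's variables, here is a quick sketch of a check that at least one of them points at a directory with an executable bin/java. `find_java_home` is a hypothetical helper, and it does not reproduce Tomcat's own precedence between the two variables.

```shell
#!/bin/bash
# Hypothetical helper: print the first of JAVA_HOME / JRE_HOME that contains
# an executable bin/java; fail with a hint if neither does.
# Note: this does not mirror Tomcat's precedence rules between the two.
find_java_home() {
  local dir
  for dir in "$JAVA_HOME" "$JRE_HOME"; do
    if [ -n "$dir" ] && [ -x "$dir/bin/java" ]; then
      echo "$dir"
      return 0
    fi
  done
  echo "set JAVA_HOME or JRE_HOME before running startup.sh" >&2
  return 1
}
```

Calling `find_java_home || exit 1` near the top of the start) branch would make a missing Java setting fail loudly instead of leaving Tomcat silently down.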
5. Time to write the code to connect to the jodconverter service and convert our documents for us. Here’s my code:
<?php
// prepare the file download (in my case, these actually come from a database record)
$file = '/path/to/input/file';

// load PEAR's HTTP_Request package, then define a small client for the JODConverter web service
require_once 'HTTP/Request.php';

class Converter {
    var $url = 'http://localhost:8080/converter/service';

    // POST the raw document: Content-Type names the input format, Accept the output format.
    // Returns the converted bytes, or false if the request or the conversion failed.
    function convert($input, $input_file_type, $output_file_type) {
        $request = new HTTP_Request($this->url);
        $request->setMethod("POST");
        $request->addHeader("Content-Type", $input_file_type);
        $request->addHeader("Accept", $output_file_type);
        $request->setBody($input);
        $result = $request->sendRequest();
        if (is_a($result, 'PEAR_Error') || $request->getResponseCode() != 200) {
            return false;
        }
        return $request->getResponseBody();
    }
}

// do whatever else we need to do to make the magic happen
$converter = new Converter();
$input_file_type = 'application/msword'; // Word .doc; use 'application/vnd.oasis.opendocument.text' for .odt
$output_file_type = 'application/pdf';
// e.g. input.doc -> /path/to/input.pdf
$output_file = '/path/to/' . pathinfo($file, PATHINFO_FILENAME) . '.pdf';
$output = $converter->convert(file_get_contents($file), $input_file_type, $output_file_type);
if ($output === false) {
    die('Document conversion failed.');
}
file_put_contents($output_file, $output);
// and now replace the file variable with what we just created
$file = $output_file;
$download_filename = basename($file);
// required for IE, otherwise Content-Disposition is ignored
if (ini_get('zlib.output_compression')) ini_set('zlib.output_compression', 'Off');
// and now pipe the file out to the customer
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT"); // some day in the past
header("Pragma: public");
header("Expires: 0");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header("Cache-Control: private", false);
header('Content-Description: File Transfer');
header("Content-Type: application/octet-stream");
header("Content-Transfer-Encoding: Binary");
header("Content-Length: " . filesize($file));
header('Content-Disposition: attachment; filename="' . $download_filename . '"');
set_time_limit(0);
readfile($file);
exit;
?>
Modify as you need, of course.
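To test the service without PHP, you can send the same request from the shell. Here is a minimal sketch, assuming curl is installed and the http://localhost:8080/converter/service URL from the Converter class. `jod_convert` and `output_name` are hypothetical helpers, and the JOD_URL override only exists to make the URL easy to change.

```shell
#!/bin/bash
# Command-line counterpart of the Converter class: POST the raw document,
# naming the input format in Content-Type and the wanted output in Accept.
# Assumptions: curl is installed; service URL taken from the PHP code.
JOD_URL=${JOD_URL:-http://localhost:8080/converter/service}

# Hypothetical helper: /path/to/report.doc + pdf -> report.pdf
output_name() {
  local base=${1##*/}          # strip the directory part
  echo "${base%.*}.$2"         # replace the last extension
}

# Hypothetical helper: jod_convert INPUT OUTPUT INPUT_MIME OUTPUT_MIME
# --fail makes curl return non-zero on HTTP errors instead of saving the error page.
jod_convert() {
  curl --silent --show-error --fail \
       -H "Content-Type: $3" -H "Accept: $4" \
       --data-binary "@$1" -o "$2" "$JOD_URL"
}
```

Usage: `jod_convert /path/to/report.doc "$(output_name /path/to/report.doc pdf)" application/msword application/pdf` writes report.pdf to the current directory.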
Reference: JODConverter Online Guide
"Automating document conversion in Linux using JODConverter/OOo" was published on sudo make me a sandwich, 05.30.08 / 1am.
- Category:
- Linux, PHP, Programming
- Tags:
- conversion, document, java, jodconverter, Linux, open office, openoffice.org, PHP
