Sunday, February 05, 2017

Deploying EpubCheck 4.0.x as a Web-based EPUB validator


(10 Sept '17): Epubcheck-web updated on GitHub

I'm very pleased to report that Jason Darwin has merged some small changes I submitted (to the Ant .war build file) and made his project up-to-date, ready to use with the latest version of epubcheck.jar.

https://github.com/jcdarwin/epubcheck-web


Whilst manually creating an EPUB file for Thursday’s Lotus, I wrote some PHP code to do the ‘heavy lifting’, particularly for assembling the final package. I have recently continued work on the automated support with a view to creating a general-purpose web-based system. An initial aim is to reach the stage where I can upload a book authored in MS Word (saved as filtered HTML) and use the service to generate a valid EPUB file that I can then manually tweak. Ideally, the output will be good enough to publish straightaway. As part of the process, the assembled EPUB needs to be validated. Previously, I had run this step separately at the command line using epubcheck.jar, as made available by the EpubCheck project, but now I needed to set it up as a web application providing a basic web service, which is the main focus of this post.
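For reference, the command-line step mentioned above amounts to passing the EPUB to the jar with java -jar. Here is a minimal sketch that wraps it in a shell function; the function name validate_epub and the JAVA and EPUBCHECK_JAR variables are my own inventions for illustration, and the default assumes epubcheck.jar sits in the current directory.

```shell
#!/bin/sh
# Sketch only: wraps the basic EpubCheck command-line invocation.
# validate_epub, JAVA and EPUBCHECK_JAR are hypothetical names for this example.
# Defaults: the java found on PATH and an epubcheck.jar in the current directory.
validate_epub() {
  "${JAVA:-java}" -jar "${EPUBCHECK_JAR:-epubcheck.jar}" "$1"
}
```

With the function loaded, checking a (hypothetical) file would be:
$ validate_epub mybook.epub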

EpubCheck is a Java application; the GitHub repository shows the current release at 4.0.2. The project home page states, "EpubCheck can be run as a standalone command-line tool or used as a Java library", and the wiki provides some guidance on usage in a variety of contexts, including some GUIs, but makes no explicit mention of web usage. However, a tantalising hint is found in the distribution README file in the source code (epubcheck/src/main/assembly/README-dist.txt), which mentions, "EpubCheck can be run as a standalone command-line tool, installed as a web application or used as a library." (The emphasis is my own.)

My experience of Java is limited to coding very elementary programs and deploying a few web application archive (.war) files, so realistically I need a web application archive (or sources ready to build). Whilst EpubCheck doesn’t include these, fortunately Jason Darwin has addressed this very problem. A few years ago he wrote about the procedure on his blog, in a post entitled Creating a WAR file for epubcheck. So it could be done. However, those instructions are for epubcheck up to version 3 and were written when the repository was using Subversion on Google Code (from which it has since moved to GitHub). Accordingly, he then set up a GitHub project, epubcheck-web, for version 4. By following the instructions I eventually got it working on my laptop after a few tweaks. Conveniently, my development environment is also a Mac with the Homebrew package manager, and I am running the Oracle-supplied JDK, currently 1.8.

There are two separate build processes:
  1. epubcheck.jar - the standalone validator, built using Maven (alternatively, this can be downloaded ready-built from the IDPF site).
  2. epubcheck.war - the web application archive, built using Ant for deploying in a servlet container (I installed Tomcat locally).
For the epubcheck.jar build, one of the unit tests failed:
remote_Test(com.adobe.epubcheck.test.single_file_Test)  
  Time elapsed: 0.238 sec  <<< FAILURE!
  junit.framework.AssertionFailedError: Missing message
  at junit.framework.Assert.fail(Assert.java:50)
This concerns the processing of single files that are not EPUBs. After seeing what the test was attempting, I skipped it, trusting that it wasn’t an issue, by specifying an exclude in pom.xml. The compilation then completed successfully. Having built epubcheck.jar, I turned to the second build process: the web interface that invokes the validator. Again, I generally followed the instructions, though to actually download the sources I used:
$ git clone https://github.com/jcdarwin/epubcheck-web.git
I found that the key to getting it working was to ensure that epubcheck.jar is included in the right place. I simply copied the file to the webapp’s lib/ folder and then referenced it in build-war.xml, alongside the other .jar files:
<path id="epubcheckServlet.classpath">
...
  <fileset dir="${epubcheck.web.includelibs}">
    <include name="Saxon-*.jar" />
    <include name="epubcheck.jar" />
  </fileset>
I also commented out any reference to building epubcheck.jar, which I think is superfluous as far as building the web interface is concerned. Then I proceeded to build with Ant and copy over the .war, as instructed. Tomcat duly deployed the webapp, which presents a minimalist web form.

When I first supplied an EPUB file to validate, I got blank output. Wondering why, I began to suspect the Java classpath used by Tomcat. After reading Understanding The Tomcat Classpath - Common Problems And How To Fix Them, I examined WEB-INF/classes and WEB-INF/lib more closely and realised that epubcheck.jar was missing! It was this that prompted me to add the file to the epubcheck-web src/lib/ folder and rebuild. The resulting .war file duly grew by about 1MB. On redeploying the app and applying it to my EPUB file, I got a reassuring pause for processing before the output came through as expected.
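A quick way to confirm that the jar really is bundled, without redeploying, is to list the archive's contents and look under WEB-INF/lib. The following is a minimal sketch rather than part of the epubcheck-web instructions: it assumes unzip is installed and that the built archive is called epubcheck.war in the current directory, and the helper name has_lib is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: reads an archive listing on stdin and succeeds if a
# line ends with WEB-INF/lib/<jar name>.
has_lib() {
  grep -q "WEB-INF/lib/$1\$"
}

# Assumed name and location of the built archive; adjust to wherever Ant wrote it.
war="epubcheck.war"

# Only inspect the archive if it exists, so the script is harmless elsewhere.
if [ -f "$war" ]; then
  if unzip -l "$war" | has_lib epubcheck.jar; then
    echo "epubcheck.jar is bundled in $war"
  else
    echo "epubcheck.jar is missing from $war"
  fi
fi
```

Since a .war file is a zip archive, the JDK's jar tool (jar tf epubcheck.war) would produce a listing that has_lib can read in the same way.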

Now it was ready for use as a basic web service for my PHP-based system, which I'll describe in the next post.
