
Article

Keywords:
jbig2enc; JBIG2; PDF size optimization; compression; DML; digital signature; JB2; DjVu; pdfsign; DML-CZ; EuDML; pdfsizeopt.py; Google; JB2 algorithm
Summary:
This paper describes several innovative PDF document enhancements and tools that can be used when building a digital library. The main result is the PDF re-compression tool pdfJbIm, built on the jbig2enc encoder. On average, pdfJbIm reduces the size of the original bitonal PDFs by one third. We also describe modifications to the jbig2enc encoder that increase the compression ratio further. Combined with pdfsizeopt.py by Péter Szabó, these tools decreased PDF storage size enough to significantly reduce the transmission needs of a digital library. We report the storage savings achieved on the Czech Digital Mathematics Library DML-CZ: the PDF corpus was reduced to 43% of its original size. Finally, we describe pdfsign, a tool for stamping PDF documents with digital signatures in batch.
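pdfJbIm re-encodes bitonal page images with jbig2enc. A core idea of JBIG2 text-region coding is to store each repeated glyph shape once in a symbol dictionary and then record only where each shape occurs on the page. The sketch below illustrates that idea in pure Python. It is not the pdfJbIm or jbig2enc code: the function names are hypothetical, components are found by 8-connected flood fill, and only *identical* bitmaps are merged, whereas production encoders also group similar shapes and then entropy-code the dictionary and placements.

```python
from collections import deque


def connected_components(image):
    """Return the 8-connected components of black (1) pixels as lists of (row, col)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1 and not seen[r][c]:
                # Breadth-first flood fill starting from an unvisited black pixel.
                queue = deque([(r, c)])
                seen[r][c] = True
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and image[ny][nx] == 1 and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                components.append(pixels)
    return components


def build_symbol_dictionary(image):
    """Hypothetical helper: merge components with identical bitmaps.

    Returns (symbols, placements), where symbols holds each unique cropped
    bitmap once and placements lists (top, left, symbol_index) per component.
    """
    symbols = []     # unique bitmaps, each stored only once
    index_of = {}    # bitmap -> its position in `symbols`
    placements = []  # where every component appears on the page
    for pixels in connected_components(image):
        top = min(y for y, _ in pixels)
        left = min(x for _, x in pixels)
        height = max(y for y, _ in pixels) - top + 1
        width = max(x for _, x in pixels) - left + 1
        # Crop the component to its bounding box.
        grid = [[0] * width for _ in range(height)]
        for y, x in pixels:
            grid[y - top][x - left] = 1
        key = tuple(tuple(row) for row in grid)
        if key not in index_of:
            index_of[key] = len(symbols)
            symbols.append(key)
        placements.append((top, left, index_of[key]))
    return symbols, placements


if __name__ == "__main__":
    # Two identical 2x2 "glyphs" and one vertical bar.
    page = [
        "1100110001",
        "1100110001",
        "0000000001",
    ]
    image = [[int(ch) for ch in row] for row in page]
    symbols, placements = build_symbol_dictionary(image)
    print(len(symbols), "unique symbols,", len(placements), "placements")
```

On this toy page three components collapse to two dictionary entries; on real scanned text, where each letter recurs many times per page, that sharing is what makes symbol coding pay off.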
References:
1. Bartošek, M., Lhoták, M., Rákosník, J., Sojka, P., Šárfy, M.: DML-CZ: The Objectives and the First Steps. In: Borwein, J., Rocha, E.M., Rodrigues, J.F. (eds.) CMDE 2006: Communicating Mathematics in the Digital Era, pp. 69–79. A. K. Peters, MA, USA (2008) MR 2590568
2. Bloomberg, D.: Leptonica. [online] (2010), [cit. 2010-04-25], http://www.leptonica.com/jbig2.html
3. Bočák, P.: Digitálne podpisované PDF dokumenty (Bachelor thesis written in Czech, Digital signatures of PDF documents). Masaryk University, Faculty of Informatics (advisor Petr Sojka), Brno, Czech Republic (2008)
4. Bottou, L., Haffner, P., Howard, P.G., Simard, P., Bengio, Y., Le Cun, Y.: High Quality Document Image Compression with DjVu. Journal of Electronic Imaging 7(3), 410–425 (1998), http://leon.bottou.org/papers/bottou-98
5. Lowagie, B.: iText PDF. [online] (2009), http://www.itextpdf.com/
6. JBIG Committee: 14492 FCD. ISO/IEC JTC 1/SC 29/WG 1 (1999), http://www.jpeg.org/public/fcd14492.pdf
7. The Apache Software Foundation: Apache PDFBox – Java PDF Library. [online] (2010), http://pdfbox.apache.org/
8. Hatlapatka, R.: JBIG2 komprese (Bachelor thesis written in Czech, JBIG2 compression). Masaryk University, Faculty of Informatics (advisor Petr Sojka), Brno, Czech Republic (2010)
9. Hatlapatka, R.: PDF Recompression using JBIG2. [online] (2010), http://nlp.fi.muni.cz/projekty/eudml/pdfRecompression/
10. Hatlapatka, R.: Source codes of pdfJbIm. [online] (2010), http://code.google.com/p/pdfrecompressor/
11. Howard, P.: Text image compression using soft pattern matching. Computer Journal 40(2/3), 146–156 (1997)
12. ISO/IEC JTC1/SC29/WG1: JBIG Maui Meeting Press Release. (December 1999), http://www.jpeg.org/public/mauijbig.pdf
13. Langley, A.: Homepage of jbig2enc encoder. [online], http://github.com/agl/jbig2enc
14. Sylwestrzak, W., Borbinha, J., Bouche, T., Nowiński, A., Sojka, P.: EuDML—Towards the European Digital Mathematics Library. In: Sojka, P. (ed.) Proceedings of DML 2010. Masaryk University Press, Paris, France (Jul 2010)
15. Adobe Systems Incorporated: PDF Reference, pp. 90–100. Adobe Systems Incorporated, sixth edn. (2006), http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf
16. Szabó, P.: Optimizing PDF output size of TeX documents. TUGboat 30(3), 112–130 (2009), [cit. 2010-04-26], http://code.google.com/p/pdfsizeopt/
17. International Telecommunication Union: ITU-T Recommendation T.88 (2000), http://www.itu.int/rec/T-REC-T.88-200002-I/en