With a growing community of researchers working on the recognition, parsing and digital exploitation of mathematical formulae, a need has arisen for a set of samples or benchmarks that can be used to compare, evaluate and help develop different implementations and algorithms. Such a benchmark set would have to cover a wide range of mathematics, carry enough information to allow searching for specific samples, and be accessible to the whole community. In this paper, we propose an on-line system and repository where researchers may upload samples of mathematics in various formats, such as scanned images, images rendered directly from born-digital documents, or born-digital document extracts. The system will support community tagging of these samples with attributes describing their syntactic structure, semantic origin, image quality and source. Each sample in the database can then be searched for by any of its associated attributes, and users can download sets of sorted or random formulae to meet their own requirements. Associated with the system will be freely downloadable tools to assist in extracting and clipping mathematical samples from various kinds of documents to prepare them for uploading. Additionally, the system will allow users to annotate each sample with their own files in LaTeX, MathML, OpenMath and other formats. The intention is that these annotation files will correspond either to the recognition results of the users' own systems on the samples or to manually constructed results. We believe that this facility will help to build a community-verified ground truth set, available to anyone accessing the system.
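To make the proposed data model concrete, the following is a minimal sketch of how a tagged sample, attribute search and random download might be represented. It is an illustration only, not the system's implementation: the `Sample` record, the tag names (`structure`, `origin`, `quality`, `source`) and the helper functions `search` and `random_subset` are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One uploaded formula sample (hypothetical record layout)."""
    sample_id: str
    file_format: str  # e.g. "scanned-image", "rendered-image", "born-digital"
    tags: dict = field(default_factory=dict)         # attribute name -> value
    annotations: dict = field(default_factory=dict)  # format name -> annotation text


def search(samples, **criteria):
    """Return the samples whose tags match every requested attribute value."""
    return [s for s in samples
            if all(s.tags.get(key) == value for key, value in criteria.items())]


def random_subset(samples, n, seed=None):
    """Draw up to n samples at random; passing a seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(samples, min(n, len(samples)))


# A tiny in-memory stand-in for the repository (made-up illustrative entries).
repository = [
    Sample("s1", "scanned-image",
           tags={"structure": "fraction", "origin": "analysis",
                 "quality": "low", "source": "journal"},
           annotations={"latex": r"\frac{a}{b}"}),
    Sample("s2", "born-digital",
           tags={"structure": "matrix", "origin": "algebra",
                 "quality": "high", "source": "textbook"}),
    Sample("s3", "rendered-image",
           tags={"structure": "fraction", "origin": "algebra",
                 "quality": "high", "source": "textbook"}),
]

fractions = search(repository, structure="fraction")
print([s.sample_id for s in fractions])
```

Keeping tags and annotations as open dictionaries reflects the community-driven nature of the proposal: new attributes or annotation formats could be added without changing the record layout.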