LaTeXify

TL;DR

We set out to develop a local, open-source model that converts images of mathematical formulas into LaTeX code using a state-of-the-art encoder-decoder architecture. Our best results came from combining a custom ConvNeXt encoder with a GPT decoder and a reduced embedding size, which achieved 65% accuracy and significantly lower perplexity than the other configurations we tried. However, the model currently struggles to generalize, likely because of issues with image padding.

Abstract

Text recognition has become an important task over the past few years, and formula recognition, i.e., the translation of complex mathematical symbols from images into LaTeX code, is a significant subset of it. The solutions that exist are costly, require proprietary software, or do not use state-of-the-art architectures. This project aims to design a model that achieves state-of-the-art performance and that users can run locally. To this end, we implemented several such encoders and decoders, evaluated them experimentally, and found their performance to be reasonably good. In this report, we present our approach, describe our experimental setup, analyze the results, and discuss future work to improve the model.

Citation

@misc{latexify2024,
  author = {Alejandro Hinojosa Canada and Pablo San Francisco and Yuxiang Qiu},
  title = {LaTeXify},
  year = {2024},
  url = {https://yuxqiu.github.io/assets/pdf/writtings/2024/latexify.pdf},
  note = {Preprint}
}