Benjamin Minixhofer

b{lastname}@gmail.com

Hi there! I am a PhD student at the Language Technology Lab of the University of Cambridge and an intern at Google DeepMind. I do research in Natural Language Processing. Right now, I am especially interested in multilinguality, tokenization and language emergence. I am also interested in Rust as a language for writing fast, correct research code. I obtained a BSc in Artificial Intelligence from Johannes Kepler University Linz in 2023. Previously, I interned at Cohere, H2O.ai and Huawei Noah’s Ark Lab in London. I started out by being active on Kaggle.

Selected Publications

  1. Zero-Shot Tokenizer Transfer
    Benjamin Minixhofer, Edoardo Maria Ponti, and Ivan Vulić
    arXiv preprint arXiv:2405.07883, 2024
  2. CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models
    Benjamin Minixhofer, Jonas Pfeiffer, and Ivan Vulić
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023
  3. Where’s the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation
    Benjamin Minixhofer, Jonas Pfeiffer, and Ivan Vulić
    In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2023
  4. WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models
    Benjamin Minixhofer, Fabian Paischer, and Navid Rekabsaz
    In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jul 2022