Fixing Rogue Memorization in Many-to-One Multilingual Translators of Extremely-Low-Resource Languages by Rephrasing Training Samples. Paulo Rodrigo Cavalin, Pedro Domingues, et al. 2024. NAACL 2024.
Quantifying the Ethical Dilemma of Using Culturally Toxic Training Data in AI Tools for Indigenous Languages. Pedro Domingues, Claudio Santos Pinhanez, et al. 2024. LREC-COLING 2024.
Training Large Language Encoders with the Curated Carolina Corpus. Guilherme Lamartine Mello, Paulo Rodrigo Cavalin, et al. 2024. PROPOR 2024.