Training Large Language Encoders with the Curated Carolina Corpus. Guilherme Lamartine Mello, Paulo Rodrigo Cavalin, et al. 2024. PROPOR 2024.