Fabiana Fournier, Lior Limonad, et al. "Towards a Benchmark for Causal Business Process Reasoning with LLMs." BPM 2024.
Oscar Sainz, Iker García-Ferrero, et al. "Data Contamination Report from the 2024 CONDA Shared Task." ACL 2024.
Tomas Bueno Momcilovic, Beat Buesser, et al. "Towards Assurance of LLM Adversarial Robustness using Ontology-Driven Argumentation." xAI 2024.
Yuya Jeremy Ong, Jay Pankaj Gala, et al. "Exploring Vulnerabilities in LLMs: A Red Teaming Approach to Evaluate Social Bias." IEEE CISOSE 2024.
Leshem Choshen, Ariel Gera, et al. "Navigating the Modern Evaluation Landscape: Considerations in Benchmarks and Frameworks for Large Language Models (LLMs)." LREC-COLING 2024.
Yu-Lin Tsai, Chia-Yi Hsu, et al. "Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models?" ICLR 2024.
Turgay Caglar, Sirine Belhaj, et al. "Can LLMs Fix Issues with Reasoning Models? Towards More Likely Models for AI Planning." AAAI 2024.
Claudio Santos Pinhanez, Paulo Rodrigo Cavalin, et al. "Human Evaluation of the Usefulness of Fine-Tuned English Translators for the Guarani Mbya and Nheengatu Indigenous Languages." PROPOR 2024.