Better Bias Benchmarking of Language Models via Multi-factor Analysis. Hannah Powers, Ioana Baldini Soares, et al. 2024. NeurIPS 2024.
SocialStigmaQA Spanish and Japanese - Towards Multicultural Adaptation of Social Bias Benchmarks. Clara Higuera Cabañes, Ryo Iwaki, et al. 2024. NeurIPS 2024.
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents. Ivoline Ngong, Swanand Ravindra Kadhe, et al. 2024. NeurIPS 2024.
Learning to Optimize Molecules with a Chemical Language Model. Jerret Ross, Samuel Hoffman, et al. 2024. NeurIPS 2024.
Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs. Megh Thakkar, Yash More, et al. 2024. NeurIPS 2024.
Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications. Bo Wen, Xin Zhang. 2024. NeurIPS 2024.
MemReasoner: A Memory-augmented LLM Architecture for Multi-hop Reasoning. Irene Ko, Sihui Dai, et al. 2024. NeurIPS 2024.
Memorization to Generalization: The Emergence of Diffusion Models from Associative Memory. Bao Pham, Gabriel Raya, et al. 2024. NeurIPS 2024.
Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI. Ambrish Rawat, Stefan Schoepf, et al. 2024. NeurIPS 2024.