Better Bias Benchmarking of Language Models via Multi-factor Analysis. Hannah Powers, Ioana Baldini Soares, et al. 2024. NeurIPS 2024.
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents. Ivoline Ngong, Swanand Ravindra Kadhe, et al. 2024. NeurIPS 2024.
Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods. Dennis Wei, Inkit Padhi, et al. 2024. NeurIPS 2024.
Consistency-based Black-box Uncertainty Quantification for Text-to-SQL. Debarun Bhattacharjya, Balaji Ganesan, et al. 2024. NeurIPS 2024.
Towards Unbiased Evaluation of Time-series Anomaly Detector. Debarpan Bhattacharyya, Sumanta Mukherjee, et al. 2024. NeurIPS 2024.
Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI. Ambrish Rawat, Stefan Schoepf, et al. 2024. NeurIPS 2024.