Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In. Itay Nakash, George Kour, et al. NAACL 2025.
Exploring Straightforward Methods for Automatic Conversational Red-Teaming. George Kour, Naama Zwerdling, et al. NAACL 2025.
A Novel Metric for Measuring the Robustness of Large Language Models in Non-adversarial Scenarios. Samuel Ackerman, Ella Rabinovich, et al. EMNLP 2024.
Unveiling Safety Vulnerabilities of Large Language Models. George Kour, Marcel Zalmanovici, et al. EMNLP 2023.
Predicting Question-Answering Performance of Large Language Models through Semantic Consistency. Ella Rabinovich, Samuel Ackerman, et al. EMNLP 2023.
Text Augmentation Using Dataset Reconstruction for Low-Resource Classification. Adir Rahamim, Guy Uziel, et al. ACL 2023.
Measuring the Measuring Tools: An Automatic Evaluation of Semantic Metrics for Text Corpora. George Kour, Samuel Ackerman, et al. EMNLP 2022.