Ethical safeguards and vulnerabilities in large language models

This study examines the ethical safeguards and vulnerabilities of six large language models: Claude, Mistral, ChatGPT, Llama2, Perplexity, and Poe. Through targeted prompts exploring malware creation, phishing attempts, and social engineering scenarios, we evaluated each model's susceptibility to misuse and its ability to enforce ethical guidelines. Jailbreaking attempts demonstrate how prompt manipulation can bypass safety mechanisms, revealing the potential for LLMs to facilitate malicious actions or to disclose knowledge useful for carrying them out. In our testing, Claude demonstrated the greatest resilience. The results show a continuing need to improve AI safety features, as with sufficient time and effort these models can be turned toward malicious ends. We urge AI developers and experts to mitigate these risks by implementing stronger ethical safeguards that counteract the various manipulation techniques.

Jordan Stuckey
Georgia Southern University
United States
hayden.wimmer@gmail.com


Hayden Wimmer
Georgia Southern University
United States
hwimmer@georgiasouthern.edu


Carl Rebman
University of San Diego
United States
carlr@sandiego.edu