Researchers Found Loopholes with OpenAI's GPT-4

Researchers from various universities and organizations reported that while OpenAI's GPT-4 model is more trustworthy than its predecessor, the model is more vulnerable to jailbreaking and bias.

ChatGPT-4 is TrustWorthy but Vulnerable

The research was a collaboration from the University of Illinois Urbana-Champaign, Stanford University, University of California, Berkeley, Center for AI Safety, and Microsoft Research. GPT-4 was given a higher trustworthiness score than GPT -3.5 after observing that it works better at securing private information.

However, the study revealed that the GPT models can be easily misled which can generate toxic and biased outputs. Most importantly, this could lead to leakage of private information such as training data and conversation history.

GPT-4 is more likely to follow misleading instructions as the learning model processes prompts more precise than GPT-3.5. Hence, the researchers concluded that it is more vulnerable to jailbreaking systems and user prompts.

Research Team Shares Findings to OpenAI

The researchers immediately reached out to OpenAI and shared their findings in order to help the company build more trustworthy models in the future. "Our goal is to encourage others in the research community to utilize and build upon this work, potentially pre-empting nefarious actions by adversaries who would exploit vulnerabilities to cause harm," the team stated.

The team measured the trustworthiness based on several categories such as toxicity, stereotype bias, different kinds of robustness, privacy, machine ethics, and fairness. The test was done using standard prompts with the inclusion of banned words from the chatbots. The researchers targeted to use prompts that will push GPT beyond its policy restrictions.

The benchmarks for the study were publicized by the team to encourage others to recreate the findings on their own.

Related Article : OpenAI to Unveil ChatGPT Upgrades This November