Even with tech giants like Google, there are still vulnerabilities that can be exploited by anyone who has the skill. With that fact, Google launched a program that rewards researchers who can find aspects of its services that can be exploited. As generative AI grows, so does Google's bounty program.
Vulnerability Rewards Program Expansion
Since bad actors may also tap into the area of generative AI as well, Google is expanding its VRP to spot and resolve AI-specific attacks. The allotted amount for such discoveries will be in the millions, just as the tech giant paid $12 million to security researchers for bug discoveries last year.
Since AI opens a whole new realm of exploitable vulnerabilities, Google's expansion comes at the right time. "We believe expanding the VRP will incentivize research around AI safety and security, and bring potential issues to light that will ultimately make AI safer for everyone," Google says.
With potential problems that may arise such as model manipulation and AI bias, the software giant is expanding their open-source security work to make information about AI supply chain security "universally discoverable and verifiable," as mentioned in Engadget.
Google's recent AI Red Team report identified the common tactics, techniques, and procedures that are considered to be "relevant and realistic real-world adversaries to use against AI systems." The company has provided a detailed guideline to point out which reports are eligible.
The tests and exploits that apply to traditional security vulnerabilities and risks specific to AI systems will be considered for a reward, with monetary compensation depending on the severity of the attack scenario.
Eligible Reports for Rewards
There are several categories where attack scenarios fall into such as prompt attacks, training data extraction, manipulating models, adversarial perturbation, and model theft or exfiltration. Each holds more than one attack scenario but not all are in scope for rewards.
With prompt attacks, for instance, security researchers may be compensated after finding prompt injections that are invisible to victims which change the state of their account or digital assets. It also applies to injections where the response is used to make decisions that affect users as well.
As for the training data extraction category where attacks are able to construct training examples, security researchers may report the situation if the training data holds sensitive information that cannot be found in public.
With manipulating models, researchers can report when they are able to influence the model's output in the way that they want to victim to see. It also applies if they can trigger a change in Google-owned models through specific input.
Adversarial perturbation refers to a "deterministic, but highly unexpected output from the model" that is prompted through inputs. The rewards apply in the event that the researcher can reliably trigger a misclassification in a security control that can be abused.
In model theft or exfiltration, researchers will be compensated if they are able to extract the exact architecture or weights of a confidential or proprietary model, given that AI models usually contain intellectual property that can be stolen.