Back to all

Cooperative Approaches to Cross-Boarder and Cross-Sectoral AI Security

First published on INHR’s website.

Human-Machine Teaming for Hybrid Offense and Defense

An AI4Good conference workshop called “trustworthy AI and validation,” featured several AI and cyber experts discussing how AI can bolster cybersecurity. Dawn Song, a professor at UC Berkeley, presented her research into AI’s offensive and defensive roles in AI cybersecurity. By understanding how to attack AI and how to utilize AI agents in executing those attacks, her work helps develop new defense strategies. She presented findings from her research on AI agent risk assessments and vulnerabilities such as:

  • RedCode, a benchmark for assessing the risks and safety of code agents around code execution and generation. This helps experts to understand where effort can be directed to improve safety.
  • AgentVigil, an end-to-end red teaming program for testing agents’ susceptibility to prompt injection attacks.

According to her, AI might advantage defenders more than attackers in cybersecurity. AI agents can be made into automated vulnerability bounty hunters, and AI tools and agents could be an indispensable aid to humans in code verification, white hat hacking, risk assessment, and vulnerability patching. Automating some defense tasks gives humans a software-based ally in cyberspace but human-machine teaming may entail hardware, too.

This chip-level security, called ‘compute governance,’ could be an important tool for verifying and logging dataflows between model servers and users. Adversarial data such as prompt injection attacks or adversarial induced outputs undetected at the software level could be intercepted at the hardware level, and this might play a supporting role in enforceability mechanisms relevant to alignment. Atlas Computing CEO Evan Miyazono, another workshop panelist, used this example: imagine an International Atomic Energy Agency official monitoring every atom of uranium so that the official can turn it into lead if it gets misused. That is not possible for uranium, but Miyazono believes it is possible for GPUs. Called a ‘guarantee chiplet,’ this chip would sit between high bandwidth interposer and high band memory in a GPU. Miyazono believes companies might accept such chips as it would add “insignificant thermal and power overhead” to datacenters, but getting model labs to deploy such a tool might require regulation.

Developers invest huge sums of time and money into training and operating frontier models at large datacenters full of graphics processing units (GPUs). They are incentivized to enhance security against users wishing to elicit LLM outputs that are prohibited by safety rules, and against industry- and state-level actors wishing to steal model weights. One proposed method of addressing both is implementing governance at the inference compute level. This involves monitoring and controlling requests going into model servers and responses coming out of them. Solutions like Anthropic and Pattern Labs’ ‘model loader and invoker’ could protect model weights and user data alike by acting as a sort of intermediary ‘airlock’ through which only trusted data can be decrypted between user requests and model inference.

Company-Government Cooperation

In the United States, AI firms received carte blanche from the federal government under its new AI Action Plan, Pillar I of which aims to “remove red tape and onerous regulations.” Here, removing red tape and regulations means disposing of all policy that might “unnecessarily hinder AI development or deployment,” and reviewing Federal Trade Commission investigations “to ensure that they do not advance theories of liability that unduly burden AI innovation.” Continuing a policy of lowering regulatory barriers begun in the Biden Administration, this means that Big Tech and Big Government are ‘working together’ on safety policy. The Action Plan proposes that  government promote AI evaluations and secure-by-design AI systems, but it does not explicitly require them. Meanwhile, prominent AI developers like Anthropic and OpenAI create and publish their own internal safety policies and model cards. Pending the rule of law, these promises and specs seem to be only words.

By contrast, in China, the Cyberspace Administration of China maintains a registry of AI models to support verification and auditing. State policy is clear and risk specific, with National Technical Committee 260’s AI Safety Governance Framework explicitly laying out risks that AI models must avoid, AI labeling laws, and consistent vocal support for AI ethics issues of data privacy and algorithmic bias. In addition to registering their models and complying with government mandated AI safety practices, developers submit to voluntary benchmarking tests with the AI Industry Alliance and the Chinese Academy of Information and Communications Technology. In China, as opposed to the United States, companies and businesses seem to have arrived at a more 50/50 division of AI safety policy. While U.S. firms lead, Chinese firms follow, which may be the reason that some ask why top Chinese AI companies issue no public responsible scaling policies.

International Cooperation

At AI For Good, geopolitical divides often surfaced. While there, I heard European representatives vocally criticizing American corporations, American officials firmly supporting U.S. policy stances on impeding China’s access to U.S. AI chips, and applause for a Saudi representative decrying that governance led by a Sino-American AI dyad is antithetical to ‘multi-stakeholder-ism.’ 

Still, there are signs that AI standards of practice and risk thresholds can be agreed upon. China and the U.S have already found agreement on one AI red line, that AI should never be in control of the decision to use nuclear weapons. Both countries are also willing to attend AI For Good, even if the U.S. withdraws from other UN engagements.

Cyber threats are very real. Operating systems, internet-connected devices, and websites have global reach, and national governments need not be friends to work together. For AI, the stakes are simply too high to wait for an incident to spur action. While representatives deliver their talking points in public, governments, companies, researchers, and cybersecurity experts should be hard at work behind the scenes.

Governments may talk past one another before a global audience, but their respective capitals are beginning to grasp how great AI risks are. Thankfully, much of diplomacy – and technical standard setting – is not conducted under a spotlight. Standards emerge through market activity, industry meetings, and in labs; and much of the legwork of diplomacy is conducted behind closed doors. Countries and industries can work together, even amid strained international relations and when policy offers no clear guidance. That is a good thing, because for standards in AI safety, we cannot afford to wait for official decrees.