If AI models get even better, their unchecked use will begin to pose serious dangers to society.

Most people agree it’d be great if countries could settle on rules to prevent AI misuse/accidents & avoid an arms race.

But how could rules on AI actually be enforced?
I wrote a paper on my current best guess for how to do it.

Paper thread: arxiv.org/abs/2303.11341

That said, verifying compliance internationally is a messy business. The devil is in the details, and the details aren’t on Twitter.
See the paper for all my caveats!

Ok, why are AI laws hard?
The problem is that NN weights are digital goods: once created, they can be trivially copied, fine-tuned, and deployed.
Usually, that’s wonderful: it’ll mean cheap OSS models that create broad benefits.
But if a model can be used to do harm, this untraceability of use is a nightmare.
Cutting-edge models can be run on a few gaming GPUs (10^8 sold), and this may get even cheaper over time.

Once a model has been acquired, there is no way a country could prevent a criminal from using it, let alone stop a foreign company/government from doing so. arxiv.org/abs/2303.06865
There are two ways we can respond:
1) Building societal defenses (eg. models to spot and fix code vulnerabilities before a cyberattacker finds them; detecting deepfakes w/ ML).
We need these societal-hardening measures ASAP. But building them takes time.

How do we buy time?
This brings us to our focus:
2) Placing limits on when and how models get trained.
If a harmful model isn’t built, it can’t be deployed. And the training process for cutting-edge AI models is much easier for govts to spot and oversee, because it’s *expensive*.
Training models like GPT-4 or PaLM currently requires thousands of specialized chips running for months.

Moore’s Law + algorithmic innovation will make that cheaper over time, but the cutting edge (against which we’ve had the least time to build defenses) will likely stay expensive.
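
For intuition, here’s a rough back-of-envelope. (Every number is my own illustrative assumption, not a figure from the paper.)

```python
# Rough estimate of training time for a frontier-scale run.
# All numbers are assumptions for illustration only.
TOTAL_TRAINING_FLOP = 2e25    # assumed total compute for a GPT-4-class run
PEAK_FLOP_PER_CHIP = 3.12e14  # A100 peak BF16 throughput (~312 TFLOP/s)
UTILIZATION = 0.4             # assumed fraction of peak actually achieved
NUM_CHIPS = 10_000            # assumed cluster size

seconds = TOTAL_TRAINING_FLOP / (NUM_CHIPS * PEAK_FLOP_PER_CHIP * UTILIZATION)
print(f"~{seconds / 86_400:.0f} days on {NUM_CHIPS:,} chips")  # -> ~185 days
```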
Also, cutting-edge model training is done on specialized data-center chips (like TPUs or A100s) that are almost all bought by companies/governments.
So we can exempt consumer devices. That’s important, since regulating access to personal computers is totally unacceptable.
We can leverage the highly-concentrated chip supply chain to get a pretty good sense of who owns enough specialized chips to train cutting-edge AI models.

But does that help us?
There’s no easily-detectable difference between “use GPUs to train small socially-beneficial AI models” and “use GPUs to build big potentially-dangerous AI models”.
They’re the same devices, connected slightly differently and running different code.
But unless we find a way to distinguish between safe and dangerous AI training, govts are likely to resort to messier regulations that stifle both.
(In some sense, this might’ve already started; see the China chip export controls.)
[Sidebar: what rules might we actually want to enforce?
My paper stays neutral on this. Some ideas:
1) Limits on a model’s performance on dangerous-use/weaponization benchmarks
2) Requiring use of safety best practices to prevent accidents
3) Creator liability for downstream use]
What if we just pass some laws, and don’t worry too hard about enforcement?

That’d be a HUGE step.

& there are a lot of common-sense measures to improve enforcement, like improving cloud know-your-customer (KYC) checks and auditing AI developers/cloud operators.

But long-term, that won’t be enough.
State-backed actors & a black market may break laws covertly, if they can get away with it.

AI models will be major, economy-shaping tools. If we want binding multilateral laws/agreements, we need good verification.
The paper examines a particular system for international verification of rules on AI training.

It asks: what can a Prover (eg a company) do to prove to a Verifier (eg a govt) that every AI training run in the Prover’s data-centers followed the rules?
At its core, the system focuses on ML training chips, and how to make them log sufficient info so that an inspector could retroactively detect any large training run.
There are 3 constraints.
1st, it needs to preserve (to the extent possible) the secrecy of the Prover’s data, code, and model weights.
2nd, it needs to be hard to circumvent, even for state actors.
3rd, it needs to be workable for today’s AI training setups.
The proposed system has 3 parts:

1) Using verified firmware on AI chips to occasionally save snapshots of NN weights-in-memory 💾
2) Proving what training run would produce those weights 🏋️
3) Making sure the Prover isn't secretly using extra chips & hiding them
Each of these parts is hard to do, and real-world implementation would require us to solve gnarly open technical problems.

But having real laws on AI is one of the most important problems of our time, and failure isn’t an option. So let’s see if we can figure it out!

Diving in:
1) During large-scale NN training, model weights are sharded across chips. The firmware should occasionally save these weight shards, & log a hash of them to NVRAM (or send it to an approved server).
(A few challenges here, esp. hardware security; check paper for deeper dive.)
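
A minimal sketch of what that per-chip logging could look like (my own illustration, not the paper’s spec; the function and log are made up):

```python
# Hypothetical sketch of the per-chip logging in (1): hash the weight shard
# resident in this chip's memory and append the digest to an append-only log
# (standing in for NVRAM or an approved logging server).
import hashlib
import numpy as np

def snapshot_weight_shard(weight_shard: np.ndarray, log: list, step: int) -> str:
    """Hash this chip's weight shard and record (step, digest)."""
    digest = hashlib.sha256(weight_shard.tobytes()).hexdigest()
    log.append((step, digest))
    return digest

# Example: a chip holding a 1M-parameter shard snapshots it at step 10,000.
log: list = []
shard = np.random.default_rng(0).standard_normal(1_000_000, dtype=np.float32)
snapshot_weight_shard(shard, log, step=10_000)
```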
2) Ask the Prover to provide (hashes of) training data, hyperparams, & model checkpoints. Then Prover & Verifier jointly check that training would lead to the weight-snapshot from (1).
We suggest how to do this ~securely, w/o data leaks. (Open Q is if we can do this cheaply!)
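
Here’s a toy sketch of that check (my own heavily-simplified illustration: a linear model, plain SGD, and bitwise-deterministic training, unlike real large-scale runs):

```python
# Toy version of the check in (2): the Verifier replays a training segment from
# the Prover's reported checkpoint + data + hyperparams, and checks that it
# reproduces the logged weight-snapshot hash. Real systems would need
# reproducible kernels (or tolerance-based checks) and privacy-preserving protocols.
import hashlib
import numpy as np

def hash_weights(w: np.ndarray) -> str:
    return hashlib.sha256(w.tobytes()).hexdigest()

def replay_segment(w, batches, lr):
    """Replay SGD on a linear model from the reported checkpoint."""
    for x, y in batches:
        grad = 2 * x.T @ (x @ w - y) / len(y)  # MSE gradient
        w = w - lr * grad
    return w

def verify_segment(reported_ckpt, batches, lr, logged_hash) -> bool:
    return hash_weights(replay_segment(reported_ckpt, batches, lr)) == logged_hash
```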
2 ctd) If the reported info checks out, we can then check the training data, hparams, and model performance to see whether the training run followed the rules.
If it didn’t, that’s proof the law/agreement was violated, and triggers penalties.
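
Purely for illustration, once the reported info is verified, the rule check itself could be fairly mechanical (thresholds and rules below are made up; the paper leaves the actual rules open, see the sidebar above):

```python
# Made-up example rules, just to show the shape of the check once the reported
# training info has been verified.
def check_training_run(total_flop: float,
                       weaponization_benchmark_score: float,
                       used_safety_best_practices: bool) -> list:
    violations = []
    if total_flop > 1e25:                    # hypothetical compute cap
        violations.append("exceeded training-compute cap")
    if weaponization_benchmark_score > 0.5:  # hypothetical capability limit
        violations.append("exceeded dangerous-capability benchmark limit")
    if not used_safety_best_practices:
        violations.append("skipped required safety best practices")
    return violations  # non-empty -> proof of violation, triggering penalties
```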
[“What penalties?” was a Q I left open. Domestically, the justice system. Internationally, it could be chip embargoes, or the threat of mutual defection (your rivals building and misusing AI systems in turn), the same basis on which the START treaties worked.]
3) For chip-monitoring to be effective, we need to know where large concentrations of ML chips are. But in the West, that tracking seems likely to happen independently, given its importance for enforcing export controls.
This is nice and all, but it’s only going to happen if each of the relevant stakeholders *wants* to make it work. In the last section, I describe the benefits to each of them (the public, AI cos, chip/cloud cos, and govts/mils).
I also list several biz/policy actions that are independently useful near-term, and that’d set us up to turn on such a system later if we needed to.
To pick one: AI co’s can protect against model-stealing by buying security-enabled hardware.
Inevitably, such a system would be implemented iteratively, piece by piece: first by a few companies voluntarily, then expanding over time.
Even if only some participate, unilateral confidence-building measures often have strategic value.
The paper isn’t a solution itself, but a framework. It leaves many open problems, & if some can’t be fixed, we’ll have to do something else.
My hope is that someone reads it, thinks it’s bad, and then counter-proposes something better!
To all ML researchers, this paper is also a warning. If you think “the AI policy people will just set up laws against AI misuse”... no one knows how to do that (yet).
We need serious AI people on it. I needed to be a CS PhD to write this.
Someday, international rules on AI are going to be very important. When that day comes, we need to have solutions ready.
If you’re a policy or ML person, and interested in this, let’s talk (DMs open!). Your success is our success.
P.S. In case someone thinks this is anti-OSS, I want to clarify: sharing model weights is awesome once society can protect itself against bad actors misusing those weights.

These rules are how we keep AI co’s accountable in the meantime.
Here’s the paper again: arxiv.org/abs/2303.11341
