I can imagine this concept becoming relevant one day. But it seems sufficiently improbable that it isn't worth thinking about until we run out of urgent things to think about. Reasons it seems improbable:
- It would be shocking if people were willing to take such a massive efficacy hit for the sake of safety. This seems to require either "a very well-coordinated group takes over the world" or "the world becomes very well-coordinated," as well as "all reasonable approaches to AI control fail."
- It doesn't look like this makes the problem much easier. It's hard for me to imagine a capability regime where you can more or less solve AI control, but then run into trouble as soon as the AI starts thinking about people. That seems like a super scary bug indicating something deeply wrong, which will probably bite you one way or another. (I would assume that this is the MIRI view.)