Harnessing Collective Intelligence for Dynamic AI Governance

MICHAEL ZARGHAM, JULES HEDGES

Institutional Problem or Research Question

Describe what the open institutional problem or research question you’ve identified is, what features make it challenging, and how people deal with it currently.

As artificial intelligence (AI) plays an increasingly critical role in modern society, effective governance of AI systems becomes paramount. This research proposal aims to develop a general theory of institutional learning and adaptive feedback mechanisms for governing transformer-based AI systems. By synthesising the expertise of control theorist Michael Zargham and applied category theorist Jules Hedges, we will investigate the design, operation, and governance of emerging institutions, including large language models, drawing on the developing field of categorical cybernetics.

Traditional AI governance approaches have been static, assuming that human preferences and goals remain constant across contexts and over time. This assumption does not hold: both preferences and contexts change. AI systems also have an incomplete view of the world and cannot access certain facts, such as the internal preferences of humans. Designing feedback mechanisms that allow the AI system to express variety and to be updated over time is therefore essential.

Proposed Solution

Describe what your proposed solution is and how it makes use of AI. If there’s a hypothesis you’re testing, what is it? What makes this approach particularly tractable? How would you implement your solution?

Objectives:

  1. Develop a theoretical framework for designing feedback mechanisms that empower humans to provide meaningful input in updating transformer-based AI models.

  2. Investigate input-output stable feedback mechanisms and their implications for desirable properties, such as minority preference representation (a minimal illustration of input-output stability follows this list).

  3. Create accessible software models that allow institution designers to experiment with the dynamics of institutional change without needing to understand the technical tools behind them.
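To make objective 2 concrete, the following is a minimal, purely illustrative sketch of what input-output stability means for a feedback mechanism: bounded human feedback should never produce unbounded configuration updates. The scalar update rule, the gain values, and the alternating feedback signal are assumptions made for illustration only, not part of the proposed framework.

```python
# Illustrative sketch of input-output stability for a feedback mechanism:
# bounded aggregated human feedback should yield bounded configuration updates.

def peak_response(gain: float, horizon: int = 200) -> float:
    """Run the scalar update x[t+1] = x[t] + gain * (u[t] - x[t]) driven by a
    bounded (alternating) feedback signal u[t]; return the largest |x[t]| seen."""
    x, peak = 0.0, 0.0
    for t in range(horizon):
        u = 1.0 if t % 2 == 0 else -1.0      # aggregated feedback, always in [-1, 1]
        x = x + gain * (u - x)               # configuration update step
        peak = max(peak, abs(x))
    return peak

if __name__ == "__main__":
    for gain in (0.5, 1.0, 1.9, 2.5):
        print(f"gain={gain}: peak |configuration| = {peak_response(gain):.2f}")
    # Gains in (0, 2) keep the response bounded; at gain=2.5 the configuration
    # diverges even though the feedback never leaves [-1, 1].
```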

Methodology:

We will consider a population of humans impacted by the decisions of an AI within a specific context. The AI is a transformer model characterised by a certain configuration. Our goal is to design a mechanism which allows humans to aggregate their observations into a feedback process that updates the model's configuration.
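A minimal structural sketch of this loop follows, assuming (purely for illustration) that the model's configuration can be represented by a single parameter and that human observations arrive as scored signals; the names Observation, GovernedModel, record, aggregate, and apply_update are placeholders rather than an existing RLHF API.

```python
# Structural sketch of the loop: a population logs observations, a mechanism
# aggregates them, and the aggregate drives an update to the model's configuration.

from dataclasses import dataclass, field
from statistics import mean

@dataclass
class Observation:
    user_id: str
    score: float                      # one human's assessment of the AI's behaviour

@dataclass
class GovernedModel:
    configuration: float              # stand-in for a transformer configuration
    log: list = field(default_factory=list)

    def record(self, obs: Observation) -> None:
        """Humans impacted by the AI log observations into a shared buffer."""
        self.log.append(obs)

    def aggregate(self) -> float:
        """Collapse individual observations into a single feedback signal."""
        return mean(o.score for o in self.log) if self.log else 0.0

    def apply_update(self, gain: float = 0.1) -> None:
        """Move the configuration in the direction of aggregated feedback."""
        self.configuration += gain * self.aggregate()
        self.log.clear()

model = GovernedModel(configuration=0.0)
model.record(Observation("alice", +1.0))
model.record(Observation("bob", -0.5))
model.apply_update()
print(model.configuration)            # 0.1 * mean(+1.0, -0.5) = 0.025
```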

The proposed design space spans a wide range of time scales; this is an expanded form of Reinforcement Learning from Human Feedback (RLHF) with a focus on varying the scope and time scale of model configuration updates, including but not limited to the following (a sketch of these trigger policies follows the list):

  1. Updating the model's configuration every time a user logs an observation, which may invite gaming or cause thrashing.

  2. Periodically updating the model's configuration based on accumulated observations, which may cause lags or waste computation when no update is needed.

  3. More nuanced event-based models that determine when updates are merited, conserving computational resources at the cost of introducing additional subjective policies regarding ‘when updates are merited’.
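The three cadences above can be viewed as interchangeable trigger policies sitting in front of the same update step. A small sketch of that framing follows; the thresholds and period lengths are illustrative assumptions, not proposed values.

```python
# The three cadences as interchangeable trigger policies: each answers "should
# the configuration be updated now?" given the current step and the number of
# observations accumulated since the last update.

from typing import Callable

TriggerPolicy = Callable[[int, int], bool]   # (step, pending_observations) -> update?

def per_observation(step: int, pending: int) -> bool:
    """1. Update whenever anything has been logged (risks gaming and thrashing)."""
    return pending > 0

def periodic(period: int = 24) -> TriggerPolicy:
    """2. Update on a fixed schedule, needed or not (risks lag and wasted computation)."""
    return lambda step, pending: step % period == 0

def event_based(threshold: int = 50) -> TriggerPolicy:
    """3. Update once enough evidence accumulates; the threshold itself is the
    additional, subjective 'when updates are merited' policy."""
    return lambda step, pending: pending >= threshold
```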

Using control theory, categorical cybernetics, and compositional game theory, we will develop a theoretical framework and software models for designing feedback mechanisms. This work is best understood as updating core cybernetics concepts with recent developments in mathematics and computer science while applying that theory to the governance of AI systems.
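For readers unfamiliar with categorical cybernetics, the central gadget is a bidirectional process (a lens): a forward pass that produces behaviour and a backward pass that carries feedback, with systems composed by chaining forward passes one way and feedback the other. The sketch below is an informal rendering of that idea in plain Python, not the API of any existing open-games or cybernetics library; the concrete functions and numbers are invented for illustration.

```python
# Informal rendering of the lens picture: forward passes chain one way,
# feedback is translated back the other way.

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Lens:
    forward: Callable[[Any], Any]            # input -> output
    backward: Callable[[Any, Any], Any]      # (input, feedback on output) -> feedback on input

def compose(inner: Lens, outer: Lens) -> Lens:
    """Run `inner` then `outer` forwards; translate feedback back through
    `outer` then `inner`."""
    return Lens(
        forward=lambda x: outer.forward(inner.forward(x)),
        backward=lambda x, fb: inner.backward(x, outer.backward(inner.forward(x), fb)),
    )

# Placeholder instances: an "AI system" stage and a "human evaluation" stage.
model = Lens(
    forward=lambda config: 2.0 * config,                # configuration -> behaviour
    backward=lambda config, grad: config + 0.1 * grad,  # feedback -> updated configuration
)
evaluation = Lens(
    forward=lambda behaviour: behaviour - 1.0,           # behaviour -> observed outcome
    backward=lambda behaviour, pref: pref,               # preference -> behaviour-level feedback
)

loop = compose(model, evaluation)
print(loop.forward(0.5))        # composite forward pass: 2*0.5 - 1.0 = 0.0
print(loop.backward(0.5, 1.0))  # one unit of human feedback -> configuration 0.6
```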

Expected Outcomes:

  1. A comprehensive theoretical framework for designing and evaluating stable dynamic feedback mechanisms in AI governance.

  2. Accessible software models that simplify institution design for researchers and practitioners, abstracting away the complexity of the underlying theories.

  3. Empirical results and insights on input-output stable feedback mechanisms and their implications for minority preference representation in AI governance.

Method of Evaluation

Describe how you will know if your solution works, ideally at both a small and large scale. What resources and stakeholders would you require to implement and test your solution?

Evaluating, applying, and iterating on this theoretical framework in practice involves the following:

  1. A comprehensive theoretical framework will be evaluated first on its formal correctness, established via peer-reviewed publication of key results proven about the framework.

  2. The formal framework’s usefulness will be determined primarily by our ability to achieve expected outcome 2 (building software that abstracts away the formal framework) while giving designers and practitioners the benefits of the formally proven framework properties.

  3. We can also use the conceptual framework and software tools for rigorous mathematical design and computational testing of feedback mechanisms.

  4. Further success of the framework and associated software will be determined by its uptake by other researchers and institutional designers working on mechanisms to regulate AI systems programmatically.

  5. Finally, the ultimate success criterion would involve seeing mechanisms designed and evaluated using this framework put into production with regular AI models “in the wild.” Ideally this would demonstrate empirically that the associated feedback mechanisms produce stable systems with desirable properties, including but not limited to supporting minority interests and adapting over time to context drift or exogenous shocks.

Risks and Additional Context

What are the biggest risks associated with this project? If someone is strongly opposed to your solution or if it is tried and fails, why do you think that was? Is there any additional context worth bearing in mind?

The largest sources of risk in this project are its broad scope and the relative illegibility of feedback control compared with expert systems. On scope, critics will argue that attempting to develop a formal mathematical framework puts the cart before the horse. Indeed, the most common approach to developing mechanisms for regulating AI systems is best understood as an expert-systems approach. Expert systems use explicit rules and rely on feed-forward control: they take human logic, encode it in software, then apply it and hope for the best.

Despite expert systems’ historically poor performance in regulating complex systems, humans trust them for interpretability reasons. Feedback control, in contrast, calls for mechanisms that provably produce desirable properties in the resulting closed-loop systems. While these mechanisms and the methods for designing them are more complex, well-designed feedback control systems essentially “cancel out” the underlying complexity, resulting in simpler system-level behaviour. Success criterion 5 above indicates a desire to demonstrate this phenomenon in the regulation of a production AI system. We cannot guarantee that humans will trust or engage with a mechanism they do not fully understand, but observations of physical public infrastructure (e.g. power grids) suggest that if a system serves people’s needs safely and reliably, they will likely trust it. (That is, trust flows from what the system does rather than how it does it.)
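The contrast can be made concrete with a toy simulation: a feed-forward controller applies a pre-computed input and ignores what actually happens, while a simple feedback (integral) controller corrects based on the observed error and absorbs an unforeseen disturbance. The plant, gains, and disturbance below are invented for illustration only.

```python
# Toy contrast between feed-forward ("expert system") and feedback regulation.
# The point is only that the feedback loop absorbs an unforeseen shock.

def final_error(make_controller, steps: int = 40, target: float = 1.0) -> float:
    y, controller = 0.0, make_controller()
    for t in range(steps):
        disturbance = 0.5 if t >= 10 else 0.0   # unforeseen shift in context
        u = controller(target, y)               # controller chooses an input
        y = u + disturbance                     # plant responds
    return abs(target - y)

def feed_forward():
    # Encode the "correct" input up front and never look at the output.
    return lambda target, y: target

def feedback():
    # Simple integral feedback: accumulate observed error and correct for it.
    state = {"correction": 0.0}
    def controller(target, y):
        state["correction"] += 0.5 * (target - y)
        return target + state["correction"]
    return controller

print("feed-forward final error:", final_error(feed_forward))  # stuck near 0.5
print("feedback     final error:", final_error(feedback))      # shrinks toward 0
```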

We recognise that this approach requires a large investment of time and effort. The broad scope required to accomplish this goal is, in our opinion, this proposal’s largest shortcoming. Fortunately, existing work from both authors provides a promising foundation. Furthermore, experience with research funded by the robotics, aerospace, and defence industries has taught us that some challenges are complex enough to merit such an approach; similarly large investments produced practical advances in control systems engineering.

Conclusion and Next Steps

Outline the next steps of the project and a roadmap for future work. What are your biggest areas of uncertainty?

This research project seeks to address the need for more dynamic and adaptable AI governance mechanisms by designing feedback systems that harness collective intelligence. We hope that by developing a theoretical framework and practical software tools, we can enable institution designers to navigate the complexities of AI governance in a rapidly changing world. Our interdisciplinary approach, combining expertise from control theory and applied category theory, can contribute to the effective governance of AI systems and improve alignment with evolving human values and preferences.

See related hackmd notes here
