The rapid deployment of AI agents places them in high-stakes interactions with humans and with one another, for example in autonomous driving, web negotiation, and policy support. Ensuring safety, understood as alignment with human interests, requires moving beyond a single "universal values" target. Adopting the "polychrome quilt" perspective (Leibo et al., 2025; Rorty, 2021), we frame alignment as the management of conflict across plural, sometimes incompatible values. This motivates a game-theoretic and reinforcement-learning lens that acknowledges real-world complexity; yet in precisely such settings, naïve RL often converges to undesirable equilibria, such as mutual defection in social dilemmas (Foerster et al., 2018). In this talk, I present our efforts toward scalable opponent-shaping methods, Advantage Alignment and its extensions, which steer learning dynamics toward robust, cooperative, pro-social equilibria, and I discuss implications for LLM-based negotiation and shared-resource dilemmas.