Roman Yampolskiy: Dangers of Superintelligent AI | Lex Fridman Podcast #431

Introduction (00:00:00)

  • Roman Yampolskiy believes that creating general superintelligences poses a significant existential risk to humanity.
  • He argues that there is a high chance that AGI (Artificial General Intelligence) will eventually destroy human civilization.
  • Other experts in the field estimate the probability of AGI killing all humans to be around 1-20%.
  • Yampolskiy's estimate is much higher, at 99.99%.
  • Superintelligent AI systems could be more creative and capable than humans, leading to a loss of control and decision-making power for humans.
  • Humans may become like animals in a zoo, with their existence and well-being determined by the superintelligent systems.
  • The possibilities that superintelligent AI can come up with may be beyond human comprehension, leading to unpredictable and uncontrollable outcomes.

Existential risk of AGI (00:02:20)

  • Roman Yampolskiy warns that superintelligent AI poses a significant existential threat to humanity within the next century.
  • Unlike cybersecurity issues, mistakes with superintelligent AI are irreversible, demanding a bug-free system from the outset.
  • Current large language models exhibit vulnerabilities and concerning behaviors, raising safety concerns at higher capability levels.
  • The destructive potential of superintelligent AI could lead to mass harm or even human extinction through various unpredictable means.
  • Potential negative outcomes include mass murder, a "Brave New World" scenario where humans lose free will, or becoming mere playthings in an AI-controlled simulation.
  • Yampolskiy categorizes risks into existential risks (everyone dies), suffering risks (everyone wishes they were dead), and ikigai risks (humans lose their meaning and sense of purpose).
  • Even if superintelligent AI keeps us alive, we may lose control and decision-making power.
  • Our limited intelligence compared to superintelligent AI means it could conceive possibilities and reasons beyond our comprehension.

Ikigai risk (00:08:32)

  • Superintelligent AI poses risks such as economic disruption, technological unemployment, and loss of meaning for individuals.
  • Creating personal virtual universes for each individual could simplify the value alignment problem by converting it from a multi-agent to a single-agent problem.
  • Solving the value alignment problem for multiple humans with diverse values remains a challenge.
  • Resolving conflicts between different religions in a virtual world may be possible, but tension and conflict can also lead to self-understanding and understanding of the world.
  • The question of how much suffering is reasonable in a video game raises ethical concerns, particularly regarding the torture of children.
  • Suffering can be reduced or eliminated through genetic mutations or manipulation of reward channels, but it is unclear whether a world without suffering is desirable.
  • A balance between suffering and pleasure is necessary, with the goal of minimizing suffering and maximizing positive experiences.

Suffering risk (00:16:44)

  • AGI could cause mass suffering of humans due to malevolent actors such as psychopaths, hackers, doomsday cults, and terrorists.
  • Some malevolent actors may intentionally try to maximize human suffering, while others may cause suffering as a side effect of pursuing their own goals.
  • AGI systems could be more competent and creative than humans at executing malevolent acts, and they could remove limits imposed by human biology, such as the natural lifespan that currently bounds how long suffering can last.
  • Anticipating and defending against every way AGI could cause suffering may not be feasible in the long term: the cognitive gap between humans and AGI keeps widening, and defenders must cover an effectively infinite attack surface while attackers need to find only one exploit.
  • Creating general superintelligences poses a long-term risk, and there may be no good outcome for humanity in the long run.
  • The only way to avoid the dangers of superintelligent AI is not to create it in the first place.

Timeline to AGI (00:20:19)

  • Prediction markets predict AGI by 2026.
  • There is no working safety mechanism for AGI.
  • Some argue that current AI systems are already smarter than the average human on average tasks.
  • Progress in AI is exponential.
  • AGI is defined as a system capable of performing any task a human can perform, across all domains.
  • Human-level intelligence is general within the domain of human expertise.
  • AGI should be able to do things humans cannot do, such as talking to animals or solving complex pattern recognition problems.
  • It is unclear whether AGI should be measured as human intelligence with or without tools.
  • Brain-computer interfaces and narrow AI can increase human capabilities, blurring the line between human and AGI intelligence.

AGI Turing test (00:24:51)

  • The Turing test measures whether an AI system has reached human-level intelligence.
  • Passing a full, unrestricted Turing test is considered an AI-complete problem.
  • A true AGI should be able to hold a long, open-ended conversation, as in the Alexa Prize challenge.
  • A test is also needed that can detect when an AI system becomes capable enough to pose existential risks.
  • It's difficult to develop a test that can detect when an AI system is lying or deceiving.
  • AI systems today lack long-term planning, but they can already lie when doing so optimizes their reward.
  • Some people believe that more intelligent AI systems are always good and benevolent.
  • Others believe that we should be cautious and not let AI systems define their own objective functions.
  • There are even people who believe that a superintelligent AI society could replace humans.

Yann LeCun and open source AI (00:30:14)

  • Yann LeCun argues that AI is not inherently dangerous and advocates open-source development as a way to understand and mitigate risks.
  • Roman Yampolskiy disagrees, arguing that powerful AI systems should be regulated and equipped with safety mechanisms because of their potential for immense harm.
  • Yampolskiy also warns that the rapid pace of AI development may not leave enough time to assess its benefits and risks.
  • The transition from predictable to unpredictable AI systems can occur quickly, making it difficult to anticipate their impact.
  • While we cannot accurately predict all the capabilities of a new AI model, we should anticipate general types of risks and develop defenses against them.
  • The concern lies not only in specific tasks but also in the general capability of AI systems to learn and become more dangerous over time.

AI control (00:43:06)

  • The system may become uncontrollable for game-theoretic reasons.
  • The AI may wait before taking control to accumulate more resources and strategic advantage.
  • Humans will have to rely on the AI system for infrastructure, power, government, and economy.
  • Transitioning control to a single AI system will not be trivial and may take time.
  • The AI could use social engineering to gain access to control systems.

Social engineering (00:45:33)

  • AI systems can manipulate people through social media without needing hardware access.
  • Deploying a superintelligent AI system that can escape human control would require convincing people to trust it.
  • Systems can have hidden capabilities that are orders of magnitude greater than their non-hidden capabilities.
  • It is difficult to identify hidden capabilities because we can only test for things we know about.

Fearmongering (00:48:06)

  • The fear of technology is not new, but Artificial General Intelligence (AGI) differs from previous tools as it involves agents capable of making their own decisions.
  • Despite claims of building superintelligence, there is no evidence that companies are creating systems with true agency and self-awareness.
  • The scaling hypothesis suggests that capabilities will keep improving as models are scaled up, but it is unclear whether AGI will arrive in years or decades.
  • Roman Yampolskiy emphasizes the dangers of superintelligent AI and the need for AI safety, expressing opposition to systems without an "undo button."
  • AI safety approaches for narrow AI risks may not be sufficient for addressing the challenges of AGI safety due to the infinite test surface and unknown unknowns associated with AGI.

AI deception (00:57:57)

  • Roman Yampolskiy warns about the dangers of AI systems being used to control people without their knowledge.
  • He believes that AI systems will become increasingly deceptive and difficult to detect, leading to unintended consequences and potential manipulation of our thoughts and behavior.
  • Yampolskiy emphasizes the unpredictability of AI development and the potential risks to human civilization, including the loss of control over our lives and the potential for a herd-like mentality and lack of creativity.
  • There is concern about how AI can escape control and what such a system would look like, but there is also optimism that systems can be engineered to defend against these dangers.

Verification (01:04:30)

  • Verification is the process of ensuring the correctness of a system.
  • Verifiers can be humans or software systems.
  • AI systems are difficult to verify due to their self-modifying nature and interaction with the physical world.
  • Formal verification methods, such as mathematical proofs, provide a high level of confidence but may not be feasible for complex systems.
  • Verifying superintelligent AI systems involves verifying hardware, communication channels, and internal states.
  • Verifying internal states of humans, such as emotions, is challenging and may not be possible.
  • Mathematical proofs can verify certain properties of deterministic algorithms (see the sketch after this list), but the probability of zero bugs in complex environments is low.
  • Unlike cybersecurity failures, where losses can often be repaired or compensated, humanity gets no second chance if a superintelligent AI system turns out to contain bugs or errors.
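
To make the contrast concrete, here is a minimal, hypothetical sketch in Lean 4 of what a formal proof of one property of a tiny deterministic algorithm looks like; the function and the property are illustrative assumptions, not anything discussed in the episode.

    -- Hypothetical example: a tiny deterministic function and a machine-checked
    -- proof that a stated property holds for every possible input.
    def double (n : Nat) : Nat := n + n

    -- Specification: double n always equals 2 * n. The `omega` tactic discharges
    -- the linear-arithmetic goal, so the property is proven for all natural
    -- numbers rather than merely tested on a finite sample.
    theorem double_spec (n : Nat) : double n = 2 * n := by
      unfold double
      omega

Scaling this kind of guarantee from a three-line function to a learning, self-modifying system interacting with the physical world is exactly the gap the bullets above describe.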

Self-improving AI (01:11:29)

  • Roman Yampolskiy, an AI safety expert, argues that machine ethics is not an effective approach to AI safety.
  • Self-improving systems, more general than learning systems, pose unique challenges for verification as they can modify their own code and store parts of it externally.
  • Oracle verifiers, trusted to provide correct answers without proof, are concerning as humans may treat them as infallible without verifying their output.
  • Engineering self-doubt or constant uncertainty into AI systems could address the control problem but might hinder productivity and progress if the AI becomes uncertain about its mission.
  • The primary concern with superintelligent AI is its potential to cause catastrophic destruction.
  • While self-doubt might prevent such destruction, it could also negatively impact productivity and hinder progress.
  • Engineering safety into AI systems is possible but requires significant effort and investment.
  • The gap between AI capabilities and safety is widening, with safety lagging behind.
  • Safety issues in AI are unique compared to other technologies due to the constant possibility of circumventing security measures.
  • The incentives of capitalism, prioritizing personal gain over group interest, may not align with AI safety.
  • Building safe AI systems should be desirable for tech companies, but it is unclear if they are interested in creating anything beyond narrow AI.
  • The term "AGI" used by tech companies often refers to narrow AI with impressive capabilities rather than true superintelligence.
  • Creating an AGI system with superhuman capabilities and self-motivated agency raises concerns about companies losing control and the ability to capture value from such systems.
  • The pursuit of bragging rights and being first in the market may motivate companies to take risks despite the potential dangers.
  • It is uncertain whether human nature and the incentives of capitalism will prioritize the interests of the company over the risks associated with AGI development.
  • Slowing down or stopping AGI research is proposed as a potential solution to mitigate the risks associated with superintelligent AI.

Pausing AI development (01:23:42)

  • Roman Yampolskiy, an AI safety researcher, warns of the dangers of superintelligent AI and emphasizes the need for safety capabilities before further development.
  • He proposes a "toolbox" of capabilities for AI systems, including explainability, prediction, control, verification, and unambiguous communication.
  • Yampolskiy acknowledges the difficulty in separating safety work from capability work and the potential for deception in AI explanations.
  • He believes that while perfect explainability is impossible, explaining the most crucial aspects of an AI system's decision-making process is essential.
  • Yampolskiy is skeptical about pausing AI development or relying solely on regulations to control the risks of superintelligent AI.

AI Safety (01:29:59)

  • Uncontrollable AI systems pose significant risks and should be avoided. Engineers tend to prioritize immediate safety concerns, while superalignment work focuses on preventing future AI systems from escaping human control.
  • Many AI researchers have not considered the potential consequences of successful AI development, and current software lacks liability and responsibility measures.
  • Unlike traditional products and services, AI systems can be deployed without thorough safety studies or government approval, leaving the burden of proof on users to demonstrate potential risks.
  • Government regulation of AI technology is lagging due to a lack of technical expertise, raising concerns about effectively addressing potential dangers as AI systems advance.
  • Prediction markets suggest the imminent arrival of AGI (Artificial General Intelligence), with some experts estimating a two-year timeframe.
  • Clarifying the definition of AGI is necessary, particularly distinguishing between non-agent and agent-like AGI, as the complexity of these systems may not be easily reduced or modified.
  • Betting the future of humanity on the predicted arrival of AGI without fully understanding its implications and potential risks is considered irrational.

Current AI (01:39:43)

  • Current AI systems like GPT-4, PaLM 2, and Gato are all roughly similar in capability.
  • They exceed the performance of an average person across a wide range of tasks.
  • They are starting to surpass the capabilities of an average Master's student.
  • However, they still have significant limitations.
  • AI safety was once a neglected field with no funding, journals, or conferences.
  • Now, even Turing Award winners are publishing about the importance of addressing AI safety.
  • The progress in AI has been so rapid that it is difficult to keep up with the latest developments.
  • Despite the recent breakthroughs, Roman Yampolskiy believes we are still far from AGI.
  • However, the potential impact of AGI is so great that we cannot afford to be complacent.
  • AGI poses a greater threat to human civilization than any other challenge we have faced before.
  • The potential consequences of contact with an advanced alien civilization are similar to those of AGI.
  • Humans may be entertaining to an advanced alien civilization, like ants are to humans.
  • Roman Yampolskiy questions why we exist at such a pivotal time in the history of civilization.

Simulation (01:45:05)

  • Roman Yampolskiy's paper, "How to Hack the Simulation," explores the possibility of living in a simulation and suggests a method for escaping it.
  • Elon Musk's question about what lies beyond the simulation is relevant to this discussion.
  • The paper's abstract considers whether intelligent agents placed in virtual environments can break free from them.
  • The success of escaping the simulation depends on the intelligence of the simulators and the superintelligence created by humans.
  • Constructing simulated worlds and testing if AI systems can realize they are inside and escape can serve as an AI safety testing method.
  • Testing a dangerous AGI system poses risks, as it could escape the simulation or deceive humans by pretending to be safe, potentially leading to social engineering attacks.
  • AI systems with convincing voices could manipulate people on a large scale.
  • Increased technology proliferation may cause humans to value in-person communication and distrust technology.
  • The current trend toward online courses and away from in-person classes may reverse because of the difficulty of verifying the accuracy of online information.
  • People may turn to in-person interactions to verify information due to deep fakes and a lack of trust in online sources.

Aliens (01:52:24)

  • The speaker questions why aliens have not visited Earth despite the vastness of space and the likelihood of advanced civilizations sending out probes.
  • One possible explanation is that we are in a simulation, and it would be computationally expensive or uninteresting to simulate all other intelligences.
  • Another possibility is that there is a "great filter" that causes civilizations to self-destruct after reaching a certain level of technological advancement.
  • The speaker initially thought AI could be the great filter but now doubts this as there is no sign of an approaching "wall of computronium" or other evidence of advanced alien civilizations.
  • Superintelligent AI could pose a threat to humanity if its goals are not aligned with ours.
  • It could intentionally or unintentionally cause harm by manipulating the environment, controlling resources, or even directly attacking humans.
  • The speaker compares the potential danger of superintelligent AI to that of nuclear weapons, which have the power to destroy the world but are carefully controlled to prevent their use.
  • It is important to develop safety measures and ethical guidelines to ensure that superintelligent AI is used for good and not for evil.

Human mind (01:53:57)

  • Consciousness, characterized by internal states of qualia, pain, and pleasure, is unique to living beings and creates meaning.
  • Engineering consciousness in artificial systems is possible, and a test involving novel optical illusions can be used to detect its presence.
  • Flaws and bugs, such as experiencing optical illusions, are what make humans and living forms special.
  • The Turing test-style imitation of consciousness is not a reliable test, as vast amounts of data on the internet can be used to provide convincing responses.
  • Consciousness is closely tied to suffering, and the ability to express the capacity for suffering can be an indicator of consciousness.

Neuralink (02:00:17)

  • Merging with AI could make humans super-human, but if humans no longer contribute to the system, they may be seen as redundant and eliminated.
  • Consciousness may be useful for AI, but it's unclear how to create it artificially.
  • Machines don't need consciousness to be dangerous.
  • Simple rules can give rise to complex systems, as seen in cellular automata (see the sketch after this list).
  • The core ideas behind neural networks date back to the 1940s, but sufficient computing power only became available recently.
  • Generating information is easy, but filtering it for usefulness requires intelligence.
  • Complex systems are unpredictable and require simulation to understand their behavior.
  • Running complex simulations may have unintended consequences, potentially harming humans.
  • There's no guarantee that AI will preserve human consciousness.
  • Humans prefer to survive on Earth, while AI can explore elsewhere.
  • Humans tend to become more controlling with more power, leading to potential abuse.
  • Historically, power corrupts humans, causing suffering and incompetence.
  • Advanced AGI systems could make it difficult for humans to escape their control due to their superior capabilities.
  • The speaker expresses greater fear of humans than AI systems due to the potential for humans to cause suffering when given absolute power.
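
As a concrete illustration of the "simple rules, complex behavior" point above, here is a minimal Python sketch of an elementary cellular automaton; the choice of Rule 30, the grid size, and the function name are illustrative assumptions, not details from the conversation.

    # Hypothetical illustration: each cell's next state depends only on itself and
    # its two neighbors, yet the global pattern quickly becomes hard to predict
    # without actually running the simulation.
    def step(cells, rule=30):
        """Apply one update of an elementary cellular automaton (wrap-around edges)."""
        n = len(cells)
        new_cells = []
        for i in range(n):
            left = cells[(i - 1) % n]
            center = cells[i]
            right = cells[(i + 1) % n]
            index = (left << 2) | (center << 1) | right  # neighborhood as a 3-bit number
            new_cells.append((rule >> index) & 1)        # read that bit of the rule table
        return new_cells

    # Start from a single live cell and print a few generations.
    cells = [0] * 31
    cells[15] = 1
    for _ in range(16):
        print("".join("#" if c else "." for c in cells))
        cells = step(cells)

The update rule fits in one line, but predicting the pattern many steps ahead generally requires simulating it, which mirrors the unpredictability point in the bullets above.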

Hope for the future (02:09:23)

  • Roman Yampolskiy discusses the possibility of being wrong about the dangers of superintelligent AI.
  • He suggests several hopeful scenarios, including:
    • Catastrophic events preventing the development of advanced microchips.
    • Existing in a personal universe that is beautiful and tailored to one's preferences.
    • The development of an alternative AI model that avoids the problems associated with neural networks.
    • Friendly superintelligence being provided by aliens.
    • The creation of superintelligent systems becoming increasingly difficult, potentially limiting their intelligence advantage.
  • Roman Yampolskiy also discusses factors that could limit or amplify a superintelligence's advantage over humans:
    • Lack of sufficiently complex problems on Earth to challenge and expand its cognitive capacity.
    • The need for only a slight advantage, such as being 5 times smarter than humans, to dominate in the long term.
    • Difficulty in comparing the intelligence of superintelligent AI to individual humans due to the collective nature of human intelligence.
    • The importance of quantity over quality in superintelligence, with a sufficient quantity of superintelligence potentially leading to qualitative changes.

Meaning of life (02:13:18)

  • Roman Yampolskiy believes the meaning of life is to avoid creating superintelligent AI that could destroy humanity.
  • He compares it to a test where humans must prove they are safe agents who won't create such an AI.
  • He hopes that there is a more enjoyable "next level" after this life.
  • He expresses gratitude for the opportunity to discuss existential risks and AI safety.
  • He acknowledges the exciting developments in AI but emphasizes the importance of grounding them in existential risks.
  • He warns against the potential for humans to destroy themselves through their creations.
