Yann Lecun: Meta AI, Open Source, Limits of LLMs, AGI & the Future of AI | Lex Fridman Podcast #416


Introduction (00:00:00)

  • Yann LeCun believes the concentration of power in proprietary AI systems is a significant danger.
  • He argues that keeping AI systems locked away due to security concerns would lead to a future where a small number of companies control the information diet.
  • LeCun believes people are fundamentally good and that open-source AI can empower the goodness in humans.
  • He criticizes those in the AI community who warn about the existential threat of AGI, arguing that AGI will be created but will not escape human control or harm humanity.

Limits of LLMs (00:02:18)

  • Yann LeCun believes autoregressive large language models (LLMs) lack essential characteristics of intelligent behavior, such as understanding the physical world, persistent memory, reasoning, and planning.
  • Despite vast text data training, LLMs fall short compared to a four-year-old child's knowledge gained through sensory input.
  • LeCun argues that most knowledge comes from observation and interaction with the real world, not just through language.
  • The AI community is divided on whether embodiment is necessary for true intelligence, with some philosophers and researchers arguing for its essentiality.
  • LLMs have shown impressive capabilities but lack the ability to learn certain skills as efficiently as humans, such as driving or household chores.
  • LeCun suggests that LLMs may struggle to construct a world model that encompasses both visual and conceptual information, leading to difficulties in understanding and interacting with the physical world.
  • Current approaches to combining visual and textual data in LLMs are seen as "hacks" that do not provide a true end-to-end understanding of the world.
  • LLMs are trained to predict the next word in a text sequence, limiting their ability to reason about the physical world and engage in common-sense reasoning.
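
As a concrete illustration of the training and generation objective described in the bullets above, here is a toy autoregressive sampler; the corpus, the bigram count table, and the function names are invented for the example, with a simple count model standing in for a neural LLM:

```python
# A toy stand-in for an autoregressive LLM: a bigram count table over a tiny corpus.
# Real models use deep networks, but the generation loop has the same shape:
# look at the text so far, predict a distribution over the next token, sample, repeat.
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Training": count which word follows which.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(prompt: str, max_tokens: int = 8) -> str:
    tokens = prompt.split()
    for _ in range(max_tokens):
        dist = follows.get(tokens[-1])
        if not dist:                                   # nothing ever followed this token
            break
        words, counts = zip(*dist.items())
        tokens.append(random.choices(words, weights=counts)[0])  # sample the next token
    return " ".join(tokens)

print(generate("the"))
# Nothing in this loop ever touches images, physics, or actions: the model's entire
# "world" is the statistics of the text it was trained on.
```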

Bilingualism and thinking (00:13:54)

  • Thinking is relatively independent of the language in which we speak.
  • There is a higher level of abstraction that maps onto language.
  • Abstract representations exist for various types of thinking, including mathematical concepts and physical actions.
  • Unlike humans, large language models (LLMs) don't plan their answers; they instinctively generate one word after another.
  • LLMs retrieve knowledge and generate responses without thoroughly thinking about them.
  • To the extent that an LLM has an internal world model, the sophistication of that model determines the depth and complexity of the text it can generate.
  • Whether LLMs actually possess such an internal world model is a crucial, and contested, assumption.

Video prediction (00:17:46)

  • Yann LeCun discusses the challenges of building a complete world model using generative models.
  • Generative models struggle to predict distributions over high-dimensional continuous spaces like videos due to the vast amount of information and intricate details present.
  • Predicting missing parts of images or videos from corrupted versions has been largely unsuccessful, unlike the success of this approach for text-based language models.
  • The difficulty lies in forming a good representation of an image or video that captures all the necessary information for accurate predictions.
  • Joint embedding, which involves training the system to jointly embed images and text, leads to better generic features for images.
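
A minimal sketch of the joint-embedding idea in the last bullet, in the spirit of CLIP-style training and assuming PyTorch; all dimensions and the random "data" are placeholders. The point is that the loss is computed between embeddings, not by reconstructing pixels:

```python
# Joint embedding (sketch, not any production system): two encoders map images and
# text into one space, and matching pairs are trained to agree via a contrastive loss
# computed on embeddings rather than on pixels.
import torch
import torch.nn.functional as F

img_enc = torch.nn.Linear(512, 128)   # stand-in image encoder
txt_enc = torch.nn.Linear(300, 128)   # stand-in text encoder
opt = torch.optim.Adam(list(img_enc.parameters()) + list(txt_enc.parameters()), lr=1e-3)

images = torch.randn(32, 512)         # fake image features for a batch of paired examples
texts = torch.randn(32, 300)          # fake text features for the same examples

for step in range(100):
    zi = F.normalize(img_enc(images), dim=-1)
    zt = F.normalize(txt_enc(texts), dim=-1)
    logits = zi @ zt.t() / 0.07                    # similarity of every image to every text
    targets = torch.arange(32)                     # the i-th image matches the i-th text
    loss = (F.cross_entropy(logits, targets) +     # symmetric contrastive loss
            F.cross_entropy(logits.t(), targets)) / 2
    opt.zero_grad(); loss.backward(); opt.step()
```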

JEPA (Joint-Embedding Predictive Architecture) (00:25:07)

  • JEPA takes the full image and its corrupted version, runs them through encoders, and trains a predictor to predict the representation of the full input from the corrupted one.
  • Contrastive learning, one method for training joint embedding architectures, involves showing pairs of images that are the same or different, pulling the representations of matching pairs together and pushing the representations of different images apart.
  • Non-contrastive methods, developed in recent years, do not require negative contrastive samples and rely on other tricks to prevent the system from collapsing.
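
A minimal sketch of the JEPA training step just described, assuming PyTorch; sizes and data are toy placeholders, and the anti-collapse machinery (e.g. an EMA target encoder or a regularizer) is deliberately omitted:

```python
# JEPA-style training step (sketch): encode the full input and a masked copy, then
# train a predictor to map the masked representation onto the full one. The loss
# lives in representation space, not pixel space.
import torch
import torch.nn.functional as F

encoder = torch.nn.Linear(784, 64)          # stand-in for a deep encoder
predictor = torch.nn.Linear(64, 64)         # predicts the full-input representation
opt = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

x = torch.randn(32, 784)                    # a batch of flattened "images" (fake data)
mask = (torch.rand_like(x) > 0.5).float()   # corruption: hide about half of each input

for step in range(100):
    with torch.no_grad():
        target = encoder(x)                 # representation of the full input (stop-gradient)
    pred = predictor(encoder(x * mask))     # representation predicted from the corrupted input
    loss = F.mse_loss(pred, target)         # prediction error in representation space
    opt.zero_grad(); loss.backward(); opt.step()
    # Real systems (e.g. I-JEPA) use an EMA target encoder and other measures so that
    # the representations do not collapse to a constant.
```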

JEPA vs LLMs (00:28:15)

  • Joint embedding predictive architectures (JEPA) aim to extract abstract representations of inputs, focusing on predictable information while eliminating unpredictable details.
  • Unlike language models and vision systems that generate the original input, JEPA simplifies the prediction process by targeting abstract representations.
  • JEPA enables the system to learn abstract representations of the world, preserving predictable information and discarding unpredictable noise.
  • Yann LeCun discusses the limitations of current large language models (LLMs) and emphasizes the importance of focusing on how machines learn about the world before combining language with AI systems.
  • LeCun introduces the concept of "joint embedding predictive architecture" as a potential approach to learning common sense and understanding the world like animals do.
  • Non-contrastive learning techniques, such as distillation and methods like BYOL, VICReg, I-JEPA, and DINO, help prevent the system from ignoring the input and collapsing, and have shown promising results in learning visual representations.
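
A minimal sketch of the distillation-style, non-contrastive recipe named above (in the spirit of BYOL/DINO), assuming PyTorch; the networks and data are placeholders, and real systems add extra components to avoid collapse:

```python
# Distillation-style, non-contrastive training (sketch): a student matches a teacher
# that is only updated as a slow moving average of the student, with no negative pairs.
import copy
import torch
import torch.nn.functional as F

student = torch.nn.Linear(128, 32)
teacher = copy.deepcopy(student)            # teacher starts as a copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)                 # the teacher is never trained by backprop
opt = torch.optim.SGD(student.parameters(), lr=0.05)

x = torch.randn(64, 128)                    # a batch of inputs (fake data)

for step in range(200):
    view1 = x + 0.1 * torch.randn_like(x)   # two augmented "views" of the same inputs
    view2 = x + 0.1 * torch.randn_like(x)
    loss = F.mse_loss(F.normalize(student(view1), dim=-1),
                      F.normalize(teacher(view2), dim=-1))
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                   # exponential-moving-average teacher update
        for ps, pt in zip(student.parameters(), teacher.parameters()):
            pt.mul_(0.99).add_(0.01 * ps)
# Real methods add a predictor head (BYOL) or centering/sharpening (DINO) on top of
# this skeleton; without them this simplified version can still collapse.
```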

DINO and I-JEPA (00:37:31)

  • DINO and I-JEPA are two techniques developed at FAIR to predict the representation of a good image from a corrupted one.
  • DINO relies on image-specific corruptions, such as geometric transformations and blurring, so it needs to know its input is an image.
  • I-JEPA does not need to know that the input is an image; it only masks parts of it.

V-JEPA (00:38:51)

  • Yann LeCun introduces V-JEPA, a video representation system that learns good representations of videos and can identify physically impossible events.
  • LeCun proposes a modified version of V-JEPA that can predict future video states based on past frames and actions, creating an internal world model.
  • This internal world model enables planning and optimal control, allowing the system to predict action outcomes and plan action sequences to achieve specific objectives.
  • LeCun highlights the significance of planning and optimal control in AI systems, drawing parallels to classical model predictive control techniques.
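
A minimal sketch of planning with a learned world model as described above, in the spirit of model predictive control and assuming PyTorch; the world model here is an untrained toy network, and every dimension is a placeholder. The optimization adjusts the action sequence, not the model:

```python
# Planning with a learned world model (sketch of model predictive control): given a
# model f(state, action) -> next state, optimize an action sequence so the predicted
# final state lands near a goal.
import torch

world_model = torch.nn.Sequential(torch.nn.Linear(4 + 2, 32), torch.nn.Tanh(),
                                  torch.nn.Linear(32, 4))     # (state, action) -> next state

state0 = torch.zeros(4)                           # current observed state
goal = torch.tensor([1.0, 0.0, 0.0, 0.0])         # desired state
actions = torch.zeros(5, 2, requires_grad=True)   # a 5-step action plan to be optimized
opt = torch.optim.Adam([actions], lr=0.1)

for step in range(200):                           # optimize the plan, not the model
    state = state0
    for a in actions:                             # roll the plan forward "in imagination"
        state = world_model(torch.cat([state, a]))
    loss = ((state - goal) ** 2).sum()            # distance of the imagined outcome from the goal
    opt.zero_grad(); loss.backward(); opt.step()

print(actions.detach())                           # the planned action sequence
```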

Hierarchical planning (00:44:22)

  • Hierarchical planning is essential for complex actions, but current AI systems lack the ability to learn and use multiple levels of representation for effective hierarchical planning.
  • Large language models (LLMs) can handle high-level reasoning tasks but may be limited to scenarios they've been trained on and struggle with novel situations.
  • LLMs lack the high-bandwidth experience of the physical world, limiting their ability to provide detailed instructions for physical tasks.
  • Yann LeCun discusses the potential role of joint embedding spaces in enabling interaction with physical reality in robotics.
  • Most human plans are learned through observation and training rather than invented from scratch.
  • LLMs can perform high-level planning tasks, but these high-level plans still need to be linked to low-level actions, for example through the kind of joint embedding spaces discussed above.
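
Purely to illustrate what hierarchical decomposition of a plan looks like, here is a toy, hand-written refinement table in the spirit of the New-York-to-Paris example from the conversation; the open problem LeCun points to is learning these levels and their representations rather than writing them by hand:

```python
# A hand-written refinement table, only to show the *shape* of hierarchical planning:
# a high-level task is expanded recursively until only primitive actions remain.
# Every task name here is invented for the example.
refinements = {
    "go from NYC office to Paris": ["go to JFK airport", "fly JFK to CDG", "go from CDG to hotel"],
    "go to JFK airport": ["go down to the street", "hail a taxi", "ride to JFK"],
    "go down to the street": ["stand up", "walk to the elevator", "take elevator to lobby"],
}

def plan(task: str) -> list[str]:
    """Expand a task into primitive actions by recursively applying refinements."""
    if task not in refinements:             # no known decomposition: treat as a primitive action
        return [task]
    steps = []
    for subtask in refinements[task]:
        steps.extend(plan(subtask))
    return steps

for action in plan("go from NYC office to Paris"):
    print(action)
```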

Autoregressive LLMs (00:50:40)

  • Yann LeCun attributes the success of autoregressive LLMs to self-supervised learning, which has significantly advanced natural language processing.
  • Despite their fluency, LeCun cautions that LLMs do not possess human-like intelligence and that the Turing test is an inadequate measure of intelligence.
  • LeCun emphasizes the importance of learning representations and highlights the success of self-supervised learning in various tasks, including multilingual translation, content moderation, and speech recognition.
  • He suggests abandoning the idea of generative AI for achieving human-level AI and instead focusing on joint embedding representations.
  • LLMs differ from the common-sense reasoning humans use, and a combination of joint embedding and language-based approaches may be necessary for tasks like understanding complex scenarios.
  • LeCun stresses the need for low-level common sense knowledge for high-level reasoning and argues that LLMs lack the sensory data and rich experiences humans have, which is crucial for understanding the underlying reality of the world.
  • He emphasizes the importance of sensory data and early life experiences in acquiring knowledge about the world, which current AI systems lack.

AI hallucination (01:06:06)

  • Large language models (LLMs) are prone to hallucinations: fluent answers that are factually wrong or nonsensical.
  • Hallucinations stem from the autoregressive prediction used by LLMs: every generated token carries some probability of error, and those errors compound as the answer grows.
  • Under a simple independence assumption, the probability that an answer stays sensible shrinks roughly exponentially with the number of tokens (see the worked example after this list).
  • Fine-tuning LLMs with a diverse set of questions can mitigate hallucinations for common prompts, but it's challenging to cover the vast space of possible prompts.
  • LLMs can be "jailbroken" by using prompts that are significantly different from the training data, such as random character sequences or substituting words with foreign language equivalents.
  • The long tail of possible prompts that humans can generate poses a challenge for fine-tuning LLMs to handle all scenarios.
  • LLMs essentially function as giant lookup tables, which limits their ability to reason and think critically.
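
A back-of-the-envelope version of the error-compounding argument above, with numbers chosen purely for illustration:

```latex
% Toy error-compounding model: assume each generated token independently goes
% "off the rails" with some small probability e.
P(\text{answer still sensible after } n \text{ tokens}) = (1 - e)^{n} \approx e^{-en}
% Illustrative numbers: e = 0.01 and n = 500 give 0.99^{500} \approx 0.007, so even a
% small per-token error rate makes long answers overwhelmingly likely to drift.
```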

Reasoning in AI (01:11:30)

  • Yann LeCun highlights the limitations of large language models (LLMs) in terms of efficient reasoning and planning capabilities.
  • Unlike humans, LLMs allocate a constant amount of computation to each token, regardless of the complexity of the question, lacking the ability to plan and reason effectively.
  • LeCun proposes building systems with persistent long-term memory and reasoning mechanisms on top of well-constructed world models to address these limitations.
  • Future dialog systems will think about and plan their answers through optimization before converting them into text, inferring latent variables in an abstract representation space rather than predicting tokens autoregressively.
  • Reasoning deeply requires training an energy-based model that can determine whether a given answer is compatible with the input, using contrastive or non-contrastive methods.
  • Training can operate on good abstract representations of the input and output (representations of ideas rather than tokens); at inference time, latent variables are optimized to minimize the energy, and the resulting low-energy configuration represents a good answer (see the sketch after this list).
  • Preventing collapse, i.e. ensuring the energy stays high for incompatible input-output pairs, is crucial; LLMs do this implicitly by minimizing cross-entropy during training.
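
A minimal sketch of "answering by optimization" in an energy-based setup as described above, assuming PyTorch; the energy network is an untrained toy, and turning the optimized latent back into text (as well as training the energy function without collapse) is exactly the open problem the bullets mention:

```python
# Inference as optimization in an energy-based setup (sketch): an energy function
# E(prompt, answer) scores compatibility, and an abstract "answer" representation is
# found by gradient descent on that energy instead of by sampling tokens one at a time.
import torch

energy = torch.nn.Sequential(torch.nn.Linear(16 + 16, 32), torch.nn.Tanh(),
                             torch.nn.Linear(32, 1))      # E(prompt, answer) -> scalar

prompt_repr = torch.randn(16)                             # abstract representation of the question
answer_repr = torch.zeros(16, requires_grad=True)         # latent answer to be optimized
opt = torch.optim.Adam([answer_repr], lr=0.05)

for step in range(300):                                   # "thinking" = minimizing the energy
    e = energy(torch.cat([prompt_repr, answer_repr]))
    opt.zero_grad(); e.backward(); opt.step()

# answer_repr now sits in a low-energy (high-compatibility) region for this prompt;
# a separate decoder would be needed to turn it into text.
```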

Reinforcement learning (01:29:02)

  • Yann LeCun recommends abandoning:
    • Generative models in favor of joint embedding architectures.
    • Autoregressive generation.
    • Probabilistic models in favor of energy-based models.
    • Contrastive methods in favor of regularized methods.
  • LeCun suggests minimizing the use of reinforcement learning (RL) due to its inefficiency in terms of samples.
  • RL should be used only when planning fails to yield the predicted outcome, in order to adjust the world model or the critic.
  • Reinforcement learning with human feedback (RLHF) works well because of human feedback, not necessarily because of RL.
  • RLHF can be used to train a reward model that estimates the extent to which an answer is good.
  • LeCun believes it would be more efficient to use the learned reward model for planning, scoring candidate answers at inference time, than to fine-tune the system's parameters with it.
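
A minimal sketch of the contrast drawn above: keep the reward model learned from human feedback as a separate scorer and use it at inference time to choose among candidate answers, rather than folding it into the generator's weights. The reward function and candidate answers below are invented placeholders:

```python
# Using a reward model at inference time (sketch): generate several candidate answers,
# score them with the reward model, and return the best one, instead of distilling the
# reward into the generator's weights by fine-tuning.
from typing import Callable, List

def best_of_n(candidates: List[str], reward_model: Callable[[str], float]) -> str:
    """Pick the candidate the reward model estimates to be best."""
    return max(candidates, key=reward_model)

# Toy stand-ins: a real system would sample candidates from an LLM and use a trained
# reward model; this "reward" just prefers longer answers and penalizes a wrong one.
candidates = ["Paris.", "The capital of France is Paris.", "I think it might be Lyon."]
toy_reward = lambda ans: len(ans) - 10 * ("Lyon" in ans)

print(best_of_n(candidates, toy_reward))   # -> "The capital of France is Paris."
```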

Woke AI (01:34:10)

  • Yann LeCun criticizes Google's Gemini 1.5 for its biased and censored image generation, citing examples such as altering historical figures' appearances and refusing to generate images related to Tiananmen Square.
  • LeCun argues that creating an unbiased AI system is impossible due to the subjective nature of bias and diverse opinions among individuals.
  • He proposes open-sourcing large language models (LLMs) to promote diversity, prevent a monopoly of AI knowledge, and ensure a wide range of perspectives.
  • Open-source AI platforms are crucial to prevent the control of digital content by a few companies and to preserve local culture, values, and languages.
  • Open-source platforms enable the development of diverse AI systems in terms of political opinions, language, culture, value systems, and technical abilities.
  • Specialized AI systems built on open-source platforms can address specific needs, such as providing medical information in local languages or answering questions about a company's internal information.
  • Companies like Meta and Google should minimize fine-tuning steps after building foundation pre-trained models to promote open-source AI development.

Open source (01:43:48)

  • Meta's open-source strategy is based on the belief that it can still derive revenue from its technology despite distributing base models in open source.
  • Meta's business model for its AI services is either through ads or through business customers.
  • Releasing open-source models allows others to build applications on top of them, which can potentially be acquired by Meta if they prove useful to its customers.
  • Open-sourcing the base models accelerates progress by attracting a wide community of developers and businesses who can contribute to their improvement.

AI and ideology (01:47:26)

  • Criticism of Gemini AI is due to its perceived ideological lean, which some argue is a result of the political affiliations of tech people.
  • Yann LeCun believes the issue lies with the acceptability or political leanings of the customer base rather than the political leanings of the engineers.
  • Companies avoid offending people by ensuring their products are safe, but this can lead to overdoing it or failing to satisfy everyone.
  • Achieving unbiased systems that are perceived as such by everyone is impossible due to differing perspectives and factual inaccuracies.
  • Diversity in every possible way is the only solution to address these challenges.

Marc Andreessen (01:49:58)

  • Yann LeCun discusses the challenges faced by big tech companies in deploying generative AI products due to internal activism, legal concerns, and public scrutiny.
  • Open-source development is seen as a better approach to mitigate these challenges, allowing for diversity and customization of models.
  • Guardrails and ethical considerations are necessary for open-source AI systems to ensure they are safe and non-toxic.
  • Studies suggest that large language models (LLMs) do not make it meaningfully easier to produce harmful content than traditional search engines do.
  • LLMs have limited utility in aiding the creation of bioweapons or chemical weapons.
  • Access to information through search engines and libraries is sufficient for designing or building such weapons, and LLMs do not provide significant assistance.
  • Creating bioweapons or chemical weapons requires specialized knowledge and expertise beyond the instructions provided by LLMs.
  • Biologists have emphasized the complexity and challenges of lab work involved in creating bioweapons, highlighting the limitations of LLMs in this area.

Llama 3 (01:57:56)

  • Yann LeCun is enthusiastic about the potential of large language models (LLMs) like Llama 3, believing that future versions will gain capabilities such as planning, understanding the world, and reasoning.
  • He emphasizes the importance of research breakthroughs in training systems from video and world models to achieve human-level intelligence.
  • LeCun highlights the significance of collaboration between researchers from various institutions, including DeepMind, UC Berkeley, and NYU, in advancing AI research.
  • While acknowledging the scale and compute power involved in training LLMs, LeCun's primary interest lies in the underlying ideas, theory, and software behind AI.
  • He sees potential for progress through architectural innovation, more efficient implementation of popular architectures like Transformers and CNNs, and the exploration of new principles and fabrication technologies.
  • LeCun believes that building Artificial General Intelligence (AGI) might require hardware innovation to reduce power consumption, as GPUs currently consume much more power than the human brain.

AGI (02:04:20)

  • AGI is not coming soon; it will arrive through a gradual process.
  • Developing systems that can learn from videos and have large associative memories will take time.
  • Creating systems that can reason, plan, and learn hierarchical representations will likely take a decade or more.
  • There are many unforeseen problems that need to be solved before AGI can be achieved.
  • People have been declaring AGI "just around the corner" for decades and have been systematically wrong.
  • This recurring optimism is not just a matter of Moravec's paradox (underestimating the parts of intelligence that seem easy); it also comes from treating intelligence as a linear quantity that can be measured with a single number, which it is not.
  • Intelligence is a collection of skills and the ability to acquire new skills efficiently.
  • The set of skills that an intelligent entity possesses is high-dimensional, making it difficult to compare the intelligence of different entities.

AI doomers (02:08:48)

  • Yann LeCun argues against the belief that the emergence of superintelligent AI will lead to catastrophic scenarios, proposing a gradual development with safety measures and guardrails.
  • He dismisses the notion that intelligent AI systems will inherently seek to dominate or eliminate humans, emphasizing careful design and objective-driven optimization.
  • AI safety can be achieved through designing better, more useful, and controllable AI systems.
  • Unlike nuclear weapons, AI development will likely progress gradually, allowing for iterative responses and countermeasures.
  • The psychology of AI doomers stems from a natural fear of new technologies and their potential societal impact, leading to concerns about cultural threats, job security, and the future.
  • Concerns about big tech companies arise due to the potential power and control they may have over advanced AI technology, leading to fears of abuse and exploitation of vulnerable individuals in society.
  • Open-source platforms are advocated as a means to address these concerns, ensuring that AI technology is not centralized and controlled by a few powerful entities.

Joscha Bach (02:24:38)

  • Yann LeCun expresses concern about AI overlords communicating with corporate jargon and condescending tones.
  • He emphasizes the importance of open-source platforms to ensure diverse AI systems that represent various cultures, opinions, languages, and value systems.
  • LeCun warns against the concentration of power through proprietary AI systems, as it poses a greater danger than other AI-related concerns.
  • He advocates for diversity in AI systems to preserve a variety of ideas, beliefs, and political opinions, which is crucial for democracy.
  • LeCun questions whether humans can be trusted to build AI systems that are beneficial to humanity, highlighting the potential risks of proprietary systems controlled by a few companies.
  • He suggests that even if rogue countries develop AI systems with malicious intent, they would still need to overcome the defenses of diverse AI systems, leading to potentially humorous situations.

Humanoid robots (02:28:51)

  • Yann LeCun discusses the future of robotics and the potential for millions of humanoid robots in the coming decade.
  • The main challenge in robotics, captured by Moravec's paradox (tasks that seem easy for humans, like perception and manipulation, are hard for machines), is developing systems that can understand the world and plan actions.
  • Current approaches, such as those used by Boston Dynamics, involve handcrafted dynamical models and careful planning, but they are limited in their ability to handle complex tasks.
  • Progress in robotics depends on developing strong world models that can train themselves to understand the world.
  • Embodied AI research, which involves using physical robots to interact with the real world, is important for exploring the philosophical and psychological aspects of human-robot relationships.
  • Promising research areas for PhD students interested in AI include training world models through observation without relying solely on large datasets, developing planning algorithms for systems that operate in non-physical environments, and investigating hierarchical planning.
  • LeCun emphasizes the need for AI systems to learn hierarchical representations of action plans, similar to how humans plan complex tasks.
  • He envisions a future where robots can autonomously complete complex tasks like traveling from New York to Paris or performing household chores.

Hope for the future (02:38:00)

  • Yann LeCun believes AI has the potential to amplify human intelligence and bring about significant societal changes, similar to the impact of the printing press during the Enlightenment.
  • He argues against banning or regulating AI, as it would hinder progress and prevent humanity from reaping its benefits. Instead, he suggests managing risks through ethical guidelines and responsible use.
  • LeCun predicts a gradual shift in professions due to AI, with potential job opportunities in the metaverse.
  • He expresses optimism about human nature and believes open-source AI can empower the goodness in people.
  • LeCun is a prominent advocate for open-source AI in both research and models and is known for his unique and captivating way of speaking about AI.
