Stanford CS25: V4 I Overview of Transformers

25 Apr 2024

Course Overview

The course aims to discuss the latest advancements in AI, Transformers, and large language models, featuring talks from top researchers and experts in the field.
Speakers will introduce themselves and share their research interests, which span various domains such as robotics, agents, reinforcement learning, NLP, multimodal work, psychology, cognitive science, HCI, visual language models, image editors for accessibility, and interpretability research.
The course will cover the evolution of Transformers, from their initial development to their widespread use in domains beyond NLP, including vision, protein folding, and video models.

Challenges in NLP

NLP is hard in part because text is discrete: data augmentation is difficult, words carry precise meanings, and long contexts bring heavy memory requirements.
Earlier NLP models had weaknesses such as short context length, linear reasoning, and lack of context adaptation.

History of NLP

A brief history of NLP is presented, starting with early chatbots like ELIZA in 1966, which used rule-based pattern matching to simulate conversation.
Word embeddings, which are vector representations of words, were developed to capture deeper meanings and semantic relationships between words.
Different types of word embeddings capture different kinds of context: word2vec learns from local context (the words near a target word), while GloVe draws on global co-occurrence statistics.
These word embeddings enable various NLP tasks such as question answering, text summarization, sentence completion, and machine translation.
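As a rough illustration of how embeddings capture semantic relationships, the sketch below runs the familiar "king - man + woman" analogy with cosine similarity. The three-dimensional vectors are hand-picked assumptions for illustration only; real word2vec or GloVe vectors are learned from data and have far more dimensions.

```python
import math

# Hypothetical 3-dimensional embeddings chosen by hand for illustration;
# real word2vec or GloVe vectors are learned and much higher-dimensional.
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in
              zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])]
    candidates = (w for w in EMBEDDINGS if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine_similarity(target, EMBEDDINGS[w]))

print(analogy("king", "man", "woman"))  # prints "queen" for these toy vectors
```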

Transformers

Sequence-to-sequence models, commonly used for translation tasks, were inefficient and less effective because they process tokens sequentially and rely on a hidden context vector to carry the whole input.
This led to the development of attention mechanisms and Transformers, which revolutionized NLP and enabled parallelization and improved efficiency.
Attention in Transformers involves assigning weights to different parts of an input sequence, allowing the model to focus on relevant information.
Attention relies on queries, keys, and values to determine the relevance of input elements.
Multi-head attention uses multiple attention mechanisms with different randomly initialized parameters to capture diverse aspects of the input.
Transformers use both self-attention (within a single sequence) and cross-attention (between different sequences) for tasks like machine translation.
Transformers address limitations of recurrent neural networks (RNNs) such as long-range dependency issues and sequential processing, enabling efficient parallelization and effective language representation.
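The query, key, and value description above can be sketched as scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. This is a minimal pure-Python sketch for a single head; the learned projection matrices that produce the queries, keys, and values in a real Transformer are omitted, and the example inputs are made up.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for one head.

    Returns (outputs, weights), where weights[i][j] is how much
    query i attends to key j, and each row of weights sums to 1.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Relevance of every key to this query, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Made-up example: one query that matches the first key more closely.
outputs, weights = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
```

In self-attention the queries, keys, and values all come from the same sequence; in cross-attention the queries come from one sequence and the keys and values from another. Multi-head attention would run several such heads with different projections and concatenate their outputs.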

Large Language Models (LLMs)

Large language models are scaled-up Transformers trained on vast amounts of text data, exhibiting emergent abilities as they grow in size.
Emergent abilities are unpredictable improvements in model performance that occur at specific thresholds during scaling, characterized by sudden spikes in accuracy on various tasks.
Scaling is a significant factor in the emergence of abilities in large language models (LLMs), but it is not the only factor.
Smaller models can also exhibit emergent abilities with new architectures, higher-quality data, and improved training procedures.
Reinforcement learning from human feedback (RLHF) fine-tunes LLMs using human feedback on their outputs, typically through a reward model trained on human preference judgments.
Recent advancements include Direct Preference Optimization (DPO), a faster algorithm that trains directly on pairs of preferred and non-preferred responses, without a separate reward model.
Notable models such as GPT-4, Gemini, and Chinchilla have demonstrated impressive performance in various tasks.
There is a growing focus on human alignment, interaction, and ethical concerns as more people gain access to these models.
Applications of LLMs extend beyond text generation, including audio, music, neuroscience, biology, and diffusion models for text-to-video generation.
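To make the DPO point above concrete, here is a minimal sketch of the DPO loss for a single preference pair: -log sigmoid(beta * [(log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x))]). The log-probabilities and beta = 0.1 below are illustrative assumptions; in practice they come from the model being trained and a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (preferred, non-preferred) response pair.

    Inputs are summed token log-probabilities of each response under
    the policy being trained and under a frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)), written in a numerically stable form.
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# Hypothetical log-probabilities: the policy has shifted toward the
# preferred response relative to the reference, so the loss drops
# below log(2), its value when policy and reference agree.
baseline = dpo_loss(-12.0, -12.0, -12.0, -12.0)
improved = dpo_loss(-10.0, -14.0, -12.0, -12.0)
```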

Future Advancements in LLMs

Future advancements may involve reducing computational complexity, enhancing human controllability, adaptive learning, multi-sensory embodiment, infinite memory, self-improvement, complete autonomy, emotional intelligence, and ethical reasoning.
Real-world applications of LLMs are already emerging, such as ChatGPT, Whisper, and advancements in music, image, and video generation.
Embodying LLMs in the real world, such as in games like Minecraft, is an exciting area of exploration.
Large language models (LLMs) have made significant advancements, enabling applications such as physical helpers, medical diagnosis, and more.
LLMs differ from humans in learning methods; humans learn more efficiently and may have innate knowledge, multimodal grounding, and active social learning.
Smaller open-source models, memory augmentation, and personalization are emerging trends in LLM research.
Challenges include knowledge frozen in time, lack of end-to-end memory, and the debate on whether LLMs truly learn or memorize.
Pre-training data synthesis and continual learning are potential avenues to bridge the gap between current models and artificial general intelligence (AGI).
Interpreting and understanding LLMs' inner workings is crucial for improving, controlling, and aligning them with human values and safety.

Techniques for Improving LLMs

Model editing involves modifying specific nodes in a model without retraining it, allowing for targeted factual association updates.
Mixture of experts, as reportedly used in models like GPT-4 and Gemini, involves multiple expert models working together to solve a problem, with ongoing research on how best to define, initialize, and connect the experts.
Continual learning includes self-improvement and self-reflection, with models exhibiting the ability to iteratively refine and improve their output through multiple layers of self-reflection.
The hallucination problem, where models generate incorrect or nonsensical text, can be addressed through internal fact verification, confidence scores, model calibration, and retrieval from a knowledge store.
Chain of Thought reasoning forces models to reason through their ideas step by step, leading to improved accuracy for larger language models.
Challenges for smaller models in Chain of Thought reasoning include missing semantic understanding, weaker arithmetic abilities, and logical loopholes.
Generalizing Chain of Thought reasoning to allow for multiple reasoning paths and Socratic questioning can improve its effectiveness for smaller models.
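One common way to use multiple reasoning paths is to sample several chains of thought and take a majority vote over their final answers, often called self-consistency. Whether this is exactly the generalization the lecture had in mind is an assumption; the sketch below only shows the voting step, with hypothetical final answers standing in for model outputs.

```python
from collections import Counter

def majority_answer(final_answers):
    """Return the most common final answer and the share of paths that agree."""
    counts = Counter(a.strip().lower() for a in final_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(final_answers)

# Hypothetical final answers extracted from five sampled reasoning paths.
sampled = ["18", "18", "20", "18 ", "17"]
answer, agreement = majority_answer(sampled)
print(answer, agreement)
```

The agreement share can double as a crude confidence signal: low agreement across paths suggests the question deserves another look.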

Transition from Language Models to AI Agents

The transition from language models to AI agents involves considerations such as actions, long-term memory, and communication, with the hypothesis that humans will communicate with AI using language while AI operates machines.
Agents are more powerful than single calls to large foundation models.
Agents require memory, large context windows, personalization, actions, internet access, and tool use.
Agents can be used to automate tasks, such as taking online driving tests.
Human-like agents can leverage existing technology infrastructure and act as digital extensions of users.
Agents can be taught by recording and learning from user actions.
The levels of autonomy for agents range from L0 (human in full control) to L5 (no human intervention).
Two main approaches to building agents are API control and direct interaction with the computer.
Direct interaction with the computer allows agents to control websites and perform tasks without the need for APIs.
Agents are like CPUs that take input tokens in natural language, process them, and output transformed tokens.
Memory in AI is like a disk, where data is stored in embeddings and retrieved using retrieval models.
Personalization involves forming a long-lived user memory to learn user preferences and adapt to them.
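The "memory as a disk" idea, where items are stored as embeddings and fetched by a retrieval step, can be sketched as follows. The `embed` function here is a stand-in (a bag-of-words count vector) and the `Memory` class is a hypothetical name; a real agent would use a learned embedding model and a vector index.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedding: bag-of-words counts (real systems use a learned model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class Memory:
    """Hypothetical long-lived user memory: store text, retrieve by similarity."""

    def __init__(self):
        self.items = []  # list of (text, embedding) pairs

    def store(self, text):
        self.items.append((text, embed(text)))

    def retrieve(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]

memory = Memory()
memory.store("user prefers window seats on flights")
memory.store("user is allergic to peanuts")
memory.store("user likes jazz music")
print(memory.retrieve("which seats does the user prefer on flights"))
```

Retrieved items would then be placed in the model's context so the agent can adapt to the stored preferences.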

Multi-Agent Autonomous Systems

Multi-agent autonomous systems involve multiple agents communicating and collaborating to perform tasks.
Challenges in multi-agent systems include information exchange and miscommunication due to the lossy nature of natural language.
Multi-agent systems can be thought of as a hierarchy of agents, similar to a human organization, where a manager agent coordinates worker agents to complete tasks.
Communication is a challenge in multi-agent systems, requiring robust protocols to minimize miscommunication and ensure reliability.
Verifying the completion of tasks and handling potential failures are important considerations in designing multi-agent systems.
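The manager and worker hierarchy, together with task verification and failure handling, might look like the sketch below. All names are hypothetical, and the workers are plain Python functions standing in for model-backed agents; the deliberately flaky worker shows how a verification step catches a bad result and triggers a retry.

```python
def run_with_verification(task, worker, verify, max_attempts=3):
    """Ask a worker to do a task, check the result, and retry on failure."""
    for attempt in range(1, max_attempts + 1):
        result = worker(task)
        if verify(task, result):
            return {"task": task, "result": result, "attempts": attempt, "ok": True}
    return {"task": task, "result": None, "attempts": max_attempts, "ok": False}

def manager(subtasks, workers, verify):
    """Dispatch each (role, task) pair to the matching worker and collect reports."""
    return [run_with_verification(task, workers[role], verify)
            for role, task in subtasks]

def make_flaky_adder(failures):
    """Worker stub that returns a wrong sum for its first `failures` calls."""
    state = {"calls": 0}
    def worker(task):
        state["calls"] += 1
        a, b = task
        return a + b + (1 if state["calls"] <= failures else 0)
    return worker

def verify_sum(task, result):
    """Independent check that the reported sum is correct."""
    return result == task[0] + task[1]

reports = manager(
    subtasks=[("adder", (2, 3)), ("adder", (10, 5))],
    workers={"adder": make_flaky_adder(failures=1)},
    verify=verify_sum,
)
```

Passing structured tasks and results between agents, rather than free-form text, is one way to reduce the miscommunication that lossy natural language can introduce.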

Challenges and Considerations for Deploying Agents

Autonomous agents face challenges related to reliability due to the stochastic nature of AI models, leading to potential errors and deviations from expected behavior.
Testing, benchmarking, and observability are crucial for deploying agents in real-world scenarios, especially when dealing with sensitive tasks like financial transactions.
The "LM operating system" analogy compares language models (LMs) to computer chips, with context length acting as RAM, embeddings as a file system, and various tools and peripherals representing different modalities and capabilities.
Future advancements in AI suggest the possibility of "neural computers" where users interact with a chat interface that delegates tasks to agents, providing a seamless and efficient user experience.
Key challenges for deploying agents in real-world applications include error correction, security, user permissions, sandboxing, and ensuring stability and safety in sensitive scenarios.
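As a small illustration of user permissions and observability, the sketch below gates every agent-proposed action behind an allowlist, asks for explicit confirmation on sensitive actions, and records each decision in an audit log. The action names, handlers, and confirmation callback are all hypothetical; real sandboxing needs far more than this.

```python
# Hypothetical action names; a real deployment would define its own policy.
ALLOWED_ACTIONS = {"read_page", "fill_form", "submit_payment"}
SENSITIVE_ACTIONS = {"submit_payment"}

def execute_action(action, args, handlers, confirm, audit_log):
    """Run an agent-proposed action only if it passes permission checks."""
    if action not in ALLOWED_ACTIONS:
        audit_log.append((action, "blocked"))
        return None
    if action in SENSITIVE_ACTIONS and not confirm(action, args):
        audit_log.append((action, "denied"))
        return None
    result = handlers[action](**args)
    audit_log.append((action, "executed"))
    return result

handlers = {
    "read_page": lambda url: f"contents of {url}",
    "submit_payment": lambda amount: f"paid {amount}",
}
deny_all = lambda action, args: False  # stand-in for asking the user

log = []
execute_action("delete_files", {}, handlers, deny_all, log)
execute_action("submit_payment", {"amount": 20}, handlers, deny_all, log)
page = execute_action("read_page", {"url": "example.com"}, handlers, deny_all, log)
```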

Next Week's Talk

Next week, Jason Wei and Hyung Won Chung from OpenAI will give a talk on cutting-edge research involving large language models.
Jason Wei was the first author of several works discussed in the course, such as Chain of Thought reasoning and emergent abilities.
The talk will be in person, and those enrolled in the course are encouraged to attend.
Those not enrolled in the course can still audit the lectures, which will be held on Thursdays at the same time each week.
Notifications about the lectures will be sent via email, Canvas, and Discord.