Beyond the AI Hype: What will happen, and what needs to happen, to make AI useful?
Having created a couple of Google's highest-rated internal courses on LLMs, covering everything from RLHF to prompt engineering, I've seen the excitement and the challenges firsthand. These takes are based on teaching that material and on reviewing the last eight years of research. As of 2023, my main takeaway is this: we need to move beyond the theoretical and focus on the practical. That means grappling with the proliferation of models, the need for transparency, and the very real question of how to make these powerful tools truly useful.
Note: these takes might get outdated (I'm writing this in 2023), so take them with a grain of salt.
The LLM Landscape: A Proliferation of Models
The sheer number of LLMs being developed is becoming overwhelming. We have BERT, T5, MUM, PaLM, LaMDA, GPT-1 through 4, and countless others. Each model has its own strengths and weaknesses, its own training data, and its own intended use cases. This proliferation presents a challenge for both developers and end-users.
How is a small business supposed to choose the right LLM for their needs? How can they keep track of the latest advancements and understand the implications of each new model? The current naming conventions (often acronyms or sequential numbers) don't help. Clearer communication and standardized benchmarks are essential. We need a way to navigate this complex landscape, along with better marketing that spells out clear usage scenarios.
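To make the "standardized benchmarks with clear usage scenarios" point concrete, here is a minimal sketch of scenario-based model selection. The model names, scenario labels, and the scoring callable are all hypothetical placeholders, not real products or APIs; the only point is that the comparison should be driven by the scenarios a business actually cares about.

```python
# A minimal, hypothetical sketch of scenario-based model selection.
# The model names and scenario labels used with it are placeholders.

from typing import Callable, Dict, List

def pick_model(
    models: List[str],
    scenarios: List[str],
    score: Callable[[str, str], float],  # e.g. benchmark score of `model` on `scenario`
) -> str:
    """Return the model with the best average score across the given scenarios."""
    averages: Dict[str, float] = {
        m: sum(score(m, s) for s in scenarios) / len(scenarios) for m in models
    }
    return max(averages, key=averages.get)

# Usage: plug in whatever standardized evaluation harness you trust, e.g.
# best = pick_model(["model_a", "model_b"], ["support_replies", "faq"], my_eval_fn)
```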
Pathways and the Google Mission: A Two-Sided Coin
Google's Pathways vision resonates deeply with the first part of the company's mission: to organize the world's information and make it universally accessible and useful. Pathways aims to "enable a single AI system to generalize across thousands or millions of tasks, to understand different types of data, and to do so with remarkable efficiency." This is a powerful vision, and it aligns perfectly with the drive towards increasingly capable and versatile AI systems.
However, the second part of Google's mission – making information universally accessible and useful – is where the role of business organizations and non-research teams becomes paramount. It's not enough to build a powerful AI; we must ensure that its capabilities are harnessed in ways that benefit users and society. This requires a deep understanding of user needs, business processes, and ethical considerations.
The Growing Need for Independent Audits
The increasingly private nature of LLM research raises concerns about transparency and accountability. As companies like OpenAI become less forthcoming about the details of their models (as evidenced by the GPT-4 technical report, which was more marketing than substance), the need for independent auditing grows.
Imagine a future where a trusted third-party organization, rather than the model creators themselves, evaluates the accuracy, bias, and safety of LLMs. This could provide a much-needed layer of objectivity and build public trust. This is not just a theoretical concern; it's a potential business opportunity, a chance to create a new standard for LLM evaluation.
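At minimum, such an audit could run a model through black-box access against held-out test suites the vendor never sees, and publish reproducible results. The sketch below is one assumption of how that could look: a generic `generate(prompt)` callable and a simple exact-match accuracy metric, both illustrative rather than a proposed standard.

```python
# Minimal sketch of an external audit harness: the auditor only needs
# black-box access to the model (a generate() callable) plus a held-out
# evaluation set the vendor never sees. Everything here is illustrative.

from typing import Callable, List, Tuple

def audit_accuracy(
    generate: Callable[[str], str],       # black-box model access
    eval_set: List[Tuple[str, str]],      # (prompt, expected_answer) pairs
) -> float:
    """Exact-match accuracy on a held-out set; a real audit would also run
    its own suites for bias, safety, and robustness."""
    correct = 0
    for prompt, expected in eval_set:
        answer = generate(prompt).strip().lower()
        correct += int(answer == expected.strip().lower())
    return correct / len(eval_set)

# A published report could then include the metric, an eval-set version hash,
# and the model version string, so the result is reproducible by others.
```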
Beyond the Conversational Interface: The Value of Visuals
There's a tendency to assume that every interaction with AI should be mediated by a natural language interface. But is this always the best approach? Sometimes, a well-designed visual interface is more efficient and intuitive. Think about video games: would a text-based interface really be superior to a controller and a screen?
The real opportunity may lie in empowering non-technical users to create their own solutions without having to write complex code. Imagine a world where anyone can generate custom analytics dashboards, automate repetitive tasks, or build simple applications, all through a user-friendly interface that leverages the power of LLMs behind the scenes. This is about democratizing access to technology, not just about creating chatbots.
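One way this could look in practice: the LLM translates a plain-English request into a small, machine-checkable spec (say, a JSON chart description), and a conventional visual interface renders it. The sketch below is an assumption about such a flow, not a description of any existing product; `call_llm` is a placeholder for whatever model API is actually used.

```python
# Hypothetical flow: natural-language request -> constrained JSON chart spec
# -> validation -> rendering by an ordinary dashboard UI. The LLM stays
# behind the scenes; the user never writes code or sees a prompt.

import json

ALLOWED_CHART_TYPES = {"bar", "line", "pie"}

SPEC_PROMPT = (
    "Convert the user's request into JSON with keys "
    "'chart_type', 'metric', and 'group_by'. Request: {request}"
)

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real model API here")  # placeholder

def request_to_chart_spec(user_request: str) -> dict:
    """Turn a plain-English request into a validated chart specification."""
    raw = call_llm(SPEC_PROMPT.format(request=user_request))
    spec = json.loads(raw)
    if spec.get("chart_type") not in ALLOWED_CHART_TYPES:
        raise ValueError(f"unsupported chart type: {spec.get('chart_type')}")
    return spec  # handed off to a conventional dashboard renderer

# e.g. request_to_chart_spec("monthly revenue by region as a bar chart")
# might yield {"chart_type": "bar", "metric": "revenue", "group_by": "region"}
```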
Code Generation: The Low-Hanging Fruit?
Code generation might be the area where LLMs have the most immediate and significant impact. Unlike natural language, code has a clear structure and a well-defined set of rules. Training data for code generation is often of high quality (think of well-maintained open-source projects on GitHub). And, crucially, code can be automatically tested for correctness.
This contrasts sharply with the challenges of training LLMs on natural language data, which is often noisy, biased, and full of "SEO spam" and other low-quality content. The "garbage in, garbage out" principle applies here, and code is, generally speaking, less "garbage" than the average webpage.
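Because code can be executed, a generation pipeline can also filter its own output: sample several completions, run each against unit tests, and keep only the ones that pass. Here is a minimal sketch of that filter, assuming a generic candidate-sampling function and ordinary Python asserts as tests; it is not tied to any particular model.

```python
# Minimal sketch: use executability as an automatic correctness filter.
# The candidate-sampling step is assumed to exist elsewhere; the tests are
# ordinary Python asserts.

from typing import List

def passes_tests(source: str, test_snippet: str) -> bool:
    """Execute candidate code plus its tests in a fresh namespace.
    (A real system would sandbox this; exec() on untrusted output is unsafe.)"""
    namespace: dict = {}
    try:
        exec(source, namespace)          # define the candidate function(s)
        exec(test_snippet, namespace)    # run the asserts against them
        return True
    except Exception:
        return False

def filter_candidates(candidates: List[str], test_snippet: str) -> List[str]:
    """Keep only candidates whose tests pass, feedback natural-language data can't give us."""
    return [c for c in candidates if passes_tests(c, test_snippet)]

# tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
# good = filter_candidates(sampled_completions, tests)
```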
RLHF: A Better Fit for Code?
Finally, a hunch: Reinforcement Learning from Human Feedback (RLHF), while demonstrably effective for improving the helpfulness, honesty, and harmlessness of language models, might be even more powerful when applied to code generation. The feedback loop for code is tighter and more objective. A compiler provides immediate and unambiguous feedback on whether the generated code is valid. This could lead to faster and more effective training.
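To illustrate the hunch, here is a sketch of what a partly objective reward signal for code could look like: parsing and test results supply the unambiguous part, while a learned preference model (as in standard RLHF) still covers style and readability. The helper names and weights are assumptions for illustration, not a description of how any production system is trained.

```python
# Sketch of a partly objective reward for code, as might sit inside an
# RLHF-style loop. run_tests() and preference_score() are stand-ins for a
# test runner and a learned human-preference model; weights are arbitrary.

def compile_ok(source: str) -> bool:
    """Cheap syntactic check: does the candidate even parse?"""
    try:
        compile(source, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

def code_reward(source: str, run_tests, preference_score) -> float:
    """Combine unambiguous signals (parse/tests) with a learned preference score."""
    if not compile_ok(source):
        return -1.0                         # immediate, unambiguous negative feedback
    tests_passed = run_tests(source)        # fraction of unit tests passing, 0..1
    style = preference_score(source)        # learned reward-model output, 0..1
    return 0.6 * tests_passed + 0.4 * style # illustrative weighting only
```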
Conclusion: Charting a Course for Responsible Innovation
The LLM revolution is underway, and the potential benefits are immense. But we must proceed with caution, mindful of the challenges and responsibilities that come with such powerful technology. We need to prioritize transparency, accountability, and user-centric design. We need to move beyond the hype and focus on building tools that are not just impressive, but truly useful and beneficial to society. The journey is just beginning, and it's up to us to chart a course for responsible innovation.