AI researchers and startups are continually adapting and developing new techniques and methodologies in response to advancements in the field, emerging challenges, and a better understanding of the capabilities and limitations of AI models. These shifts are evident across various stages of AI development, from data acquisition and model training to evaluation and deployment.
One significant evolving area is the approach to model architectures. The rise of foundation models, particularly large language models (LLMs) and diffusion models, has fundamentally changed how researchers approach many tasks. Instead of training specialized models from scratch for each problem, there is a growing trend towards leveraging these powerful pre-trained models and adapting them through techniques like fine-tuning or prompt engineering. This shift is noted by researchers working on ScreenAI, who are now focusing on how best to utilize advances in LLMs for screen understanding. Startups like AI21 Labs are also innovating in this space, developing novel architectures like Jamba, which combines Mamba and Transformer elements. The Transformer architecture itself emerged from an evolving approach within Google Research, moving beyond the RNNs and CNNs then dominant in translation research.
The understanding and handling of data have also seen significant evolution. There is a clear move towards prioritizing data quality over sheer volume, although defining and measuring data quality remains a challenge. Researchers increasingly recognize that the quality of training data directly impacts the performance and reliability of AI models. This realization has led to more creative, and often secretive, data acquisition strategies, including collaborations with domain experts such as hospitals, licensing agreements, and leveraging expert networks. Furthermore, synthetic data generation has emerged as a powerful technique, particularly with advances in generative models. LLMs are being used to generate synthetic datasets for training smaller, more efficient inference models and for augmenting existing datasets, especially in areas like 3D modeling and robotics where real-world data can be expensive or scarce. This marks a shift from relying primarily on large, scraped datasets towards a more targeted, and potentially synthetic, data-centric approach. We take a deeper dive into this later.
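To make the synthetic-data idea concrete, here is a minimal sketch of using an LLM to synthesize labeled examples for a smaller downstream classifier. The `llm_generate` function is a hypothetical stand-in for a real generative-model API call, stubbed with canned text so the example runs self-contained; the labels and prompts are illustrative, not from any specific system.

```python
# Sketch: synthesizing labeled training data with an LLM.
# `llm_generate` is a hypothetical stub standing in for a real model call.

def llm_generate(prompt: str) -> str:
    # Stub: a real implementation would call a generative model API.
    canned = {
        "positive": "The checkout flow was fast and painless.",
        "negative": "The app crashed twice before I could log in.",
    }
    for label, text in canned.items():
        if label in prompt:
            return text
    return ""

def synthesize_examples(labels, n_per_label):
    """Build a small synthetic dataset of (text, label) pairs."""
    dataset = []
    for label in labels:
        prompt = (
            f"Write a short {label} product review. "
            "Return only the review text."
        )
        for _ in range(n_per_label):
            dataset.append((llm_generate(prompt), label))
    return dataset

data = synthesize_examples(["positive", "negative"], n_per_label=2)
for text, label in data:
    print(label, "->", text)
```

In practice the generated pairs would feed the training loop of a smaller, cheaper inference model, with deduplication and quality filtering applied before use.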
Evaluation methodologies are another area undergoing substantial evolution. There is growing awareness that current metrics and benchmarks often fail to reflect real-world performance and user needs. Researchers are manually reviewing model responses, datasets, and human rater labels, which highlights the limits of automated evaluation in many complex scenarios; attempts to fine-tune models for automated evaluation often fall short, even in mature fields like image compression. This has led to a push for more nuanced, user-centric evaluation frameworks that incorporate qualitative assessments alongside quantitative metrics, particularly when judging the outputs of generative models and understanding user experiences. Establishing new benchmarks and gaining community buy-in is itself an evolving methodology, often involving public datasets to ensure credibility. Startups and enterprises, for their part, are finding it difficult to evaluate models and measure the ROI of AI adoption, leading to a more cautious, experimental approach to large-scale deployments. And with the definition of data quality for LLMs still ambiguous, qualitative analysis and expert review are becoming increasingly crucial.
Prompt engineering and in-context learning have become increasingly sophisticated techniques, especially with the advent of LLMs. Researchers are developing novel prompting strategies, such as Chain-of-Thought and Chain-of-Table, to elicit more complex reasoning and problem-solving capabilities from these models, even with limited computational resources per token. This represents a shift from solely relying on model scale to also focusing on how to effectively interact with and guide these models through carefully crafted prompts.
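The core of Chain-of-Thought prompting is a small change to the prompt itself: an instruction to produce intermediate reasoning before the final answer. A minimal sketch, with an illustrative word problem (the wording of the instruction is an assumption, not a canonical template):

```python
# Sketch: plain prompting vs. Chain-of-Thought prompting.
# The only difference is an instruction eliciting step-by-step reasoning.

def plain_prompt(question: str) -> str:
    return f"Q: {question}\nA:"

def chain_of_thought_prompt(question: str) -> str:
    return (
        f"Q: {question}\n"
        "A: Let's think step by step, then give the final answer "
        "on its own line prefixed with 'Answer:'."
    )

question = "A train travels 60 km in 40 minutes. What is its speed in km/h?"
print(plain_prompt(question))
print(chain_of_thought_prompt(question))
```

Either string would be sent to an LLM; the second tends to elicit the intermediate arithmetic (60 km in 40 min is 1.5 km/min, hence 90 km/h) rather than a bare guess.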
The field is also witnessing increasing integration of multimodality. Projects like ScreenAI, which aims to understand user interfaces and infographics, exemplify the trend of building AI systems that can process and reason across different data types, such as vision and language. This move towards multimodal models reflects a more holistic approach to AI, aiming to create systems that perceive and interact with the world in a more human-like way; it is almost like a brain gaining a pair of eyes, or ears, and so forth.
For specific applications, the evolution of foundation models is directly influencing research methodologies. In code generation and repair, for instance, advances in LLMs have made tasks previously considered "toy problems" seem much more feasible. However, researchers are also learning when LLMs are not the appropriate solution, such as simple categorization tasks where smaller, more efficient models may be more suitable. A paper on AI-powered software vulnerability patching demonstrates a pragmatic approach: it leverages existing pre-trained models and focuses on the critical challenge of validating the generated fixes, without resorting to expensive large foundation models.
In information retrieval and question answering, researchers are exploring techniques like Semi-Extractive Question Answering (SEMQA) to bridge the gap between purely extractive and abstractive methods. Prompt rewriting is also being investigated as a way to improve the relevance of retrieved documents by learning to omit less useful elements from user queries. These novel techniques demonstrate an evolving understanding of how to best leverage language models for information access.
From a startup perspective, there is an evolving focus on identifying practical, high-ROI use cases for AI, particularly those that drive efficiency and automation within enterprises. Startups like Orby.ai are automating user workflows by observing on-screen actions and generating code, highlighting a shift towards applying AI to tangible business problems. The advice AI practitioners give to business leaders emphasizes starting with tangible projects that demonstrate clear value and ROI, often customer support or workflow automation.
The increasing awareness of security risks in the AI software supply chain is also driving the adoption of new methodologies. Organizations are beginning to apply established software supply chain security practices, such as provenance tracking and integrity checks, to AI models and datasets. This reflects an evolving understanding that AI assets need to be managed and secured with the same rigor as traditional software artifacts.
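One of the simplest such practices carried over from software supply chains is integrity checking via content digests. A minimal sketch, assuming a hypothetical local artifact `model.bin` whose SHA-256 digest is recorded at publish time and verified before loading:

```python
# Sketch: a basic supply-chain integrity check applied to an AI artifact.
# The digest would normally be stored in a signed provenance record.
import hashlib
from pathlib import Path

def digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks to handle large weights."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: Path, expected: str) -> bool:
    """True if the artifact on disk matches the recorded digest."""
    return digest(path) == expected

model = Path("model.bin")
model.write_bytes(b"fake model weights")  # stand-in artifact for the sketch
recorded = digest(model)  # captured at publish time
print(verify(model, recorded))
```

Real provenance frameworks go further (signing the digest, recording who built the artifact and from what inputs), but the verify-before-load pattern is the same.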
Finally, the perspective of artists and creative practitioners on AI is also evolving. They are increasingly viewing AI models not just as tools but as creative materials that can be manipulated, reinterpreted, and incorporated into their artistic processes. This "craft approach" involves actively engaging with the algorithms' behaviors and pushing the boundaries of their affordances, signifying a shift in how AI is perceived and utilized in creative fields.
Evolving Methods in AI Research and Deployments
Reliance on Purely Augmented Datasets (vs. Synthetic Data): Synthetic data is increasingly seen as superior for certain tasks, particularly in image generation.
Manual and Intuition-Driven Evaluations (vs. User-Centric Evals): User-centric evaluation methodologies are becoming more prevalent.
Fine-Tuning as the Primary Knowledge Infusion Method (vs. RAG): RAG is often more efficient and effective for tasks requiring up-to-date or specific knowledge.
Fixed Sets of Labels in Vision Tasks (vs. Open World Perception): LLMs are enabling "Open World" perception, moving beyond predefined classes.
Developing Separate Models for Different Languages in Recommender Systems (vs. Transfer Learning with LLMs): LLMs facilitate transfer learning, reducing the need for separate models.
Monolithic Service Deployments (vs. Microservices): Microservices are better suited to the flexibility and scalability demands of modern applications, especially in low-latency scenarios.
Novel Techniques Researchers Are Using
LLMs for Synthetic Data Generation: Employing the generative power of LLMs to create training data for other AI models, particularly smaller, specialized ones.
Neural Radiance Fields (NeRFs) and Beyond: including dynamic content, Nerfies, NeRF-W and def-NeRF, Factor Fields, 3D Gaussian Splatting (3DGS), Neural Fields Beyond Conventional Cameras, and NeRFCodec.
Optimization by Prompting (OPRO): Leverages LLMs to iteratively generate prompts that improve AI system performance on specific benchmarks. This "meta-learning" approach guides LLMs to create better instructions for other algorithms.
Retrieval-Augmented Generation (RAG): Enhances LLMs by allowing them to search and retrieve information from a corpus of documents, generating more accurate and contextually relevant responses. RAG often outperforms fine-tuning, is more cost-effective, and is valuable for privacy and information recency.
Multi-Modality Embedding Alignment: Trains AI systems to understand and relate information across different modalities (text, images, audio, etc.) by aligning their embeddings in a shared space.
Mixture of Experts (MoE): An ensemble technique combining multiple machine learning models for improved performance and efficiency, commonly used in top-performing ML systems.
QLoRA (Quantized Low-Rank Adaptation): Combines quantization of the base model with low-rank adapters to make language model fine-tuning far more cost-effective.
Graph Networks for Material Science: Applying graph-based AI models to predict and discover new, stable crystal structures.
AI-Powered Code Analysis and Modification: Using AI models to analyze code, prioritize issues, and automatically generate patches for security vulnerabilities.
Explainable AI (XAI) Techniques: Developing methods to make AI decision-making more transparent, with frameworks and tools like the Language Interpretability Tool and What-If Tool.
Deep Learning, Reinforcement Learning (RL), and Federated Learning (FL) in Networks: Applying these paradigms for network management tasks like mobility, scalability, energy efficiency, privacy, and reliability.
Linear RNNs/State Space Models (SSMs): These are emerging as strong contenders to Transformers, especially for long sequences. They address the quadratic complexity of Transformers' attention mechanism. Key examples are Mamba, Griffin, Jamba, H3, S4, Hyena Hierarchy, StripedHyena-7B, RWKV and Megalodon.
Test-Time Training (TTT) Models: This approach replaces the Transformer's "hidden state" with an internal machine-learning model. This allows processing much larger amounts of data without a corresponding growth in model size, potentially exceeding the capabilities of current Transformer-based models.
Hybrid Approaches: Combining the strengths of different architectures, such as Transformer-CNN hybrids and Transformer-RNN hybrids.
Scaling Rectified Flow Transformers: For high-resolution image synthesis.
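To illustrate the RAG technique from the list above, here is a minimal sketch of the retrieve-then-prompt loop. Retrieval here is a toy bag-of-words overlap score; a real system would use a vector index, and the assembled prompt would be sent to an LLM. The corpus sentences are illustrative.

```python
# Sketch: minimal retrieval-augmented generation (RAG).
# Retrieval is toy token overlap; production systems use embeddings.

def tokenize(text: str) -> set[str]:
    return set(text.lower().split())

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most tokens with the query."""
    scored = sorted(
        corpus,
        key=lambda doc: len(tokenize(doc) & tokenize(query)),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Assemble retrieved context and the question into one LLM prompt."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

corpus = [
    "Jamba combines Mamba state-space layers with Transformer blocks.",
    "3D Gaussian Splatting renders scenes from learned point primitives.",
    "QLoRA fine-tunes quantized language models with low-rank adapters.",
]
print(build_prompt("What does Jamba combine?", corpus))
```

Because the model answers from retrieved text rather than parametric memory, the corpus can be updated or access-controlled without retraining, which is the recency and privacy advantage noted above.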
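The Mixture of Experts entry above can also be sketched in a few lines: a gate scores the experts, and only the top-k are evaluated, with their outputs combined by renormalized gate weights. The scalar experts and hand-set gate scores below are toy stand-ins; real MoE layers use learned neural experts and a learned gating network.

```python
# Sketch: Mixture-of-Experts routing with a softmax gate.
# Only the top-k experts run, which is the source of MoE's efficiency.
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe(x, experts, gate_scores, top_k=2):
    """Combine the top-k experts, weighted by renormalized gate weights."""
    weights = softmax(gate_scores)
    top = sorted(range(len(experts)), key=lambda i: weights[i], reverse=True)[:top_k]
    norm = sum(weights[i] for i in top)
    return sum(weights[i] / norm * experts[i](x) for i in top)

experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: x * x]
print(moe(3.0, experts, gate_scores=[1.0, 0.5, -1.0]))
```

With these gate scores the third expert is skipped entirely, and the output is a weighted blend of the two selected experts' results, always lying between them.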
Conclusion
AI research and startups are characterized by a dynamic, evolving landscape of techniques and methodologies. These shifts are driven by advances in foundation models, a deeper understanding of the critical role of data, ongoing challenges in evaluation, the development of sophisticated prompting strategies, the integration of multimodality, a focus on practical applications, new approaches to information retrieval and question answering, the increasing importance of qualitative evaluation, growing awareness of security concerns, and the evolving perspectives of creative practitioners.