Ray Aviary: Open Source Multi-LLM Serving Platform
In a world where language models have revolutionized natural language processing, a team of passionate engineers from Anyscale has unveiled an open source project named “Aviary”. The project is designed to manage and serve a variety of open source Large Language Models (LLMs), addressing the complexities and challenges of deploying LLMs effectively for specific use cases.
The Need for Aviary:
Large Language Models have gained immense popularity for their ability to generate human-like text and perform a wide range of NLP tasks. However, deploying these models in production environments has proved challenging. The Anyscale team, avid supporters of open source LLMs, identified the need for a platform that simplifies the deployment process while providing crucial insight into model performance and efficiency.
The Challenge of Model Deployment:
Developing a user-friendly interface was the easy part of this ambitious project. The real challenge lay in the complexities of the back end. Deploying an LLM in production requires more than downloading a pre-trained model: DeepSpeed optimizations, correctness checks, stop tokens, and prompt templates all play a vital role in achieving good performance. These factors significantly influence a model’s quality, cost, and latency in a production setting.
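To make one of those factors concrete, here is a minimal sketch of stop-token handling: without truncating generation at a stop sequence, a model can ramble past the intended answer. The function and stop strings below are illustrative, not Aviary’s actual implementation:

```python
def apply_stop_tokens(text: str, stop_sequences: list[str]) -> str:
    """Truncate generated text at the earliest stop sequence, if any."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)  # keep only text before the first stop
    return text[:cut]

# Generation that ran past the intended answer gets trimmed cleanly.
out = apply_stop_tokens("Paris is the capital.### End extra rambling", ["### End"])
```

Each model family uses different stop markers, which is one reason per-model configuration matters for output quality.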
The Aviary Interface:
Aviary features a user-friendly interface that lets users experiment with multiple open source LLMs effortlessly. The platform currently supports around 10 different LLMs, including MosaicML models in both instruct-fine-tuned and chat-fine-tuned variants. Users can submit a prompt and see how different LLMs respond to the same query, allowing a quick side-by-side comparison of their performance.
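Programmatically, this kind of comparison comes down to sending the same prompt to each backend model. A minimal sketch of building such requests, assuming a hypothetical JSON request shape and example model IDs (the real Aviary API routes and identifiers may differ):

```python
import json

# Hypothetical request shape and model IDs -- the actual Aviary HTTP API
# may differ; this only illustrates the side-by-side comparison pattern.
def build_query(model: str, prompt: str) -> bytes:
    """Serialize a single-model completion request body as JSON."""
    return json.dumps({"model": model, "prompt": prompt}).encode()

models = ["mosaicml/mpt-7b-instruct", "mosaicml/mpt-7b-chat"]
prompt = "Explain batching in one sentence."

# One request body per model, all carrying the identical prompt,
# ready to POST to the serving endpoint for comparison.
bodies = {m: build_query(m, prompt) for m in models}
```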
Model Evaluation and Performance Metrics:
The Aviary interface not only presents the generated text but also provides valuable statistics such as processing time and token count, enabling users to assess LLM performance effectively. Moreover, Anyscale’s internal users have been actively voting on and evaluating the models, producing a win-ratio metric that reflects each LLM’s quality. Users have shown a preference for certain models, indicating their strength in specific use cases.
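The win-ratio metric described above can be computed from pairwise votes. A minimal sketch with invented vote data and example model names (not Aviary’s actual evaluation code):

```python
from collections import Counter

# Illustrative head-to-head votes: each tuple is (winner, loser) for one
# pairwise comparison. Model names and results are made up.
votes = [
    ("mpt-7b-chat", "lightgpt"),
    ("mpt-7b-chat", "stablelm-7b"),
    ("lightgpt", "stablelm-7b"),
    ("mpt-7b-chat", "lightgpt"),
]

wins = Counter(winner for winner, _ in votes)
appearances = Counter()
for winner, loser in votes:
    appearances[winner] += 1
    appearances[loser] += 1

# Win ratio: fraction of matchups a model won.
win_ratio = {m: wins[m] / appearances[m] for m in appearances}
```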
Performance Factors: Cost and Latency:
Besides evaluating LLM quality, Aviary also considers performance factors like cost and latency. While certain models may deliver high-quality outputs, they may come at a higher cost and increased latency. For example, Amazon’s LightGPT provides fast answers at a higher cost, while MosaicML’s StoryWriter produces longer, more comprehensive outputs, making it more expensive per article.
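As a back-of-the-envelope illustration of that tradeoff, per-article cost scales with both the per-token price and how many tokens a model generates. All prices and token counts below are made up for illustration:

```python
# Made-up per-token prices (USD per 1k tokens) and per-article output sizes.
price_per_1k_tokens = {"lightgpt": 0.0008, "mpt-7b-storywriter": 0.0012}
tokens_generated = {"lightgpt": 150, "mpt-7b-storywriter": 900}

# Cost per article = tokens generated * price per token.
cost = {
    m: tokens_generated[m] * price_per_1k_tokens[m] / 1000
    for m in price_per_1k_tokens
}
# A longer, richer answer costs more per article even at a similar rate.
```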
Aviary’s Solution: Ray Serve:
The backbone of Aviary is Ray Serve, a powerful system capable of efficiently serving LLMs despite their massive size and complexity. Ray Serve can handle high loads, automatically scaling and distributing the models across multiple machines with GPUs. Careful use of batching ensures optimal performance, making Ray Serve an ideal choice for serving large language models.
Making Aviary a Community-driven Project:
The Anyscale team envisions Aviary as a community-driven platform, open to contributions from LLM enthusiasts worldwide. The team has open-sourced the entire project, inviting developers to add new evaluation algorithms and support for additional LLMs. As new models emerge, Aviary aims to incorporate them swiftly and make them readily accessible through its intuitive interface.
Getting Started with Aviary:
To facilitate user contributions, the team has made adding new models to Aviary a straightforward process. The platform’s YAML-based configuration files let users specify deployment settings, GPU requirements, and other parameters with ease. Aviary’s command-line interface (CLI) offers the flexibility to experiment locally or deploy as a company-wide service, making it suitable for businesses and individual researchers alike.
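A model configuration file might look roughly like the following; the field names below are illustrative guesses at the schema, not the project’s exact format:

```yaml
# Hypothetical Aviary model config -- field names are illustrative.
deployment_config:
  autoscaling_config:
    min_replicas: 1
    max_replicas: 8
  ray_actor_options:
    num_gpus: 1
model_config:
  model_id: mosaicml/mpt-7b-instruct
  max_input_words: 800
  generation:
    max_new_tokens: 256
    stopping_sequences: ["### Response:", "### End"]
```

Adding a new model then amounts to writing one such file with the model’s ID, resource requirements, and generation settings.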
Aviary on Anyscale:
As Aviary aims to simplify LLM serving, the team also offers managed hosting for Aviary on the Anyscale platform. Leveraging this service provides additional benefits such as spot-instance support with fallback to on-demand instances, zero-downtime upgrades, and scale-to-zero functionality for cost optimization.
Call to Action:
The Anyscale team is actively seeking testers and contributors to help refine and expand Aviary’s capabilities. Whether you represent a business interested in a centralized LLM repository or a researcher keen on evaluating different LLM characteristics, the team welcomes your involvement. Aviary’s release signals an exciting future, with the potential to unlock new possibilities for the open source LLM community.
The Bottom Line:
With Aviary, Anyscale has introduced a game-changing platform that empowers users to harness the power of open source Large Language Models effectively. The seamless integration of Ray Serve, a user-friendly interface, and a community-driven approach make Aviary a promising step toward democratizing LLM deployment and evaluation. As Aviary evolves with contributions from the community, it is poised to reshape the world of natural language processing and unlock the full potential of open source LLMs.