Running a Question-Answering System on Ray Serve at Deepset

In this article, we will explore how Deepset, a company based in Germany, runs a question-answering system using Ray Serve. Deepset started out by providing natural language processing (NLP) professional services to various companies, which gave them insight into customer needs and pain points. Building on this experience, they developed Haystack, an open-source framework for creating NLP pipelines, including training, fine-tuning, and evaluating models. Deepset also offers a SaaS platform, Deepset Cloud, built on top of Haystack, which lets users manage the entire NLP workflow, from uploading documents to monitoring the production system.

What is Question Answering?

Question answering is the task of finding specific answers to questions within a large dataset or collection of documents. Traditional search systems rely on keyword matching, whereas question answering lets users query the system in natural language. For example, given a collection of documents on a topic, you can ask “Who played Jon Snow in Game of Thrones?” and receive a precise answer, with the relevant passage highlighted in the source document.

Challenges with Traditional Search Systems

Question answering works well on the open web, where search engines like Google can answer questions from vast amounts of public data. It becomes much harder for private documents such as PDFs, Word files, and other company-specific data: the system has to find answers in these documents without compromising privacy by sending them to external services like Google.

The Role of Word Embeddings in NLP

Word embeddings are essential in natural language processing and question answering systems. These are numerical representations of words obtained through large machine learning models exposed to vast text corpora, such as the entire internet. Word embeddings allow semantic understanding of words, capturing their relationships and similarities in a high-dimensional space. This enables machines to understand the meaning behind words and perform tasks like finding similar words, calculating analogies, and more.
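To make this concrete, here is a minimal sketch of exploring pretrained word embeddings with the gensim library; the model name `glove-wiki-gigaword-50` is just one of several small pretrained vector sets it can download, chosen here for illustration.

```python
# A small sketch of exploring pretrained word embeddings with gensim.
# "glove-wiki-gigaword-50" is one of several small pretrained vector sets
# that gensim can download; any similar model works.
import gensim.downloader as api

# Download a small set of pretrained GloVe word vectors (~66 MB).
vectors = api.load("glove-wiki-gigaword-50")

# Nearest neighbours in embedding space capture semantic similarity.
print(vectors.most_similar("question", topn=3))

# The classic analogy: king - man + woman ≈ queen.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```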

Components of a Question Answering System

A question-answering system involves several components (a minimal code sketch follows this list):

1. Database: stores the documents and their corresponding word embeddings.

2. Indexing Pipeline: pre-processes documents, extracts text, generates word embeddings, and writes them to the database. High throughput and GPU support are crucial in this step.

3. Query Pipeline: takes a user query, converts it to a word embedding, finds relevant documents in the database, and highlights the answer within the document.
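As a rough sketch of how these pieces fit together, the example below wires up a small extractive QA pipeline using Haystack 1.x-style APIs; the in-memory document store, the model names, and the exact imports are illustrative choices and may differ between Haystack versions.

```python
# A rough sketch of an extractive QA pipeline using Haystack 1.x-style APIs.
# Model names, the in-memory document store, and exact imports are
# illustrative and may differ between Haystack versions.
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline

# 1. Database: stores documents and their embeddings.
document_store = InMemoryDocumentStore(embedding_dim=384)

# 2. Indexing: write documents and compute embeddings (GPU support matters here).
document_store.write_documents(
    [{"content": "Kit Harington played Jon Snow in Game of Thrones."}]
)
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
)
document_store.update_embeddings(retriever)

# 3. Query: embed the question, retrieve candidates, extract the answer span.
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)
result = pipeline.run(query="Who played Jon Snow in Game of Thrones?")
print(result["answers"][0].answer)
```

In production, the indexing and query paths run as separate pipelines so that bulk document ingestion does not compete with user queries for resources.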

Ray Serve for Scaling the System

Deepset leverages Ray Serve to scale their question-answering system effectively. Ray Serve lets developers deploy scalable, auto-scaling, and fault-tolerant services with little effort, and Deepset uses it to deploy both the indexing and the query pipelines. The Python interface provided by Ray Serve makes deployment and auto-scaling straightforward, without requiring extensive Kubernetes expertise.
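As a hedged illustration of what this looks like in code (using the Ray 2.x Serve API), the sketch below wraps a query pipeline in a deployment; `build_query_pipeline()` is a hypothetical helper standing in for the code that constructs the actual Haystack pipeline, and the resource numbers are examples, not Deepset's real configuration.

```python
# A minimal sketch of serving a query pipeline with Ray Serve (Ray 2.x API).
# build_query_pipeline() is a hypothetical helper standing in for the code
# that builds the actual Haystack pipeline; resource numbers are examples.
from ray import serve
from starlette.requests import Request


@serve.deployment(ray_actor_options={"num_gpus": 1})
class QueryPipelineDeployment:
    def __init__(self):
        # Each replica loads its models once and then serves requests.
        self.pipeline = build_query_pipeline()  # hypothetical helper

    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        return self.pipeline.run(query=payload["query"])


app = QueryPipelineDeployment.bind()
# serve.run(app)  # deploys the pipeline behind an HTTP endpoint
```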

Challenges with Ray Serve

While Ray Serve offers numerous benefits, Deepset also encountered some challenges when running their question-answering system on it:

1. Scaling Peaks: The system saw significant load peaks, for example when handling an initial three million document indexing requests from a single customer. Scaling up fast enough to meet these demands was a challenge.

2. Auto-Downscaling: Auto-downscaling did not always work as expected, leaving worker nodes idle; the cluster did not scale down effectively even when the load decreased (see the autoscaling sketch after this list).

3. Head Node Dependency: Restarting the cluster often meant waiting for the head node to recover, leading to downtime.
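The knobs that govern this behavior live in Ray Serve's autoscaling configuration. The sketch below shows the relevant parameters using Ray 2.x-style names; exact parameter names, defaults, and sensible values vary by version and workload, so treat the numbers as placeholders.

```python
# A sketch of the Ray Serve autoscaling knobs relevant to these challenges
# (Ray 2.x-style parameter names; exact names, defaults, and sensible
# values vary by version and workload).
from ray import serve


@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,
        "max_replicas": 20,
        # Roughly how many in-flight requests each replica should handle
        # before the autoscaler adds replicas (absorbing indexing peaks).
        "target_num_ongoing_requests_per_replica": 2,
        # How long to wait before scaling up or down; a long downscale
        # delay can leave worker nodes idle after the load drops.
        "upscale_delay_s": 30,
        "downscale_delay_s": 600,
    },
)
class IndexingPipeline:
    async def __call__(self, request):
        # Pre-process a document and write its embeddings to the database.
        ...
```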

Future Directions and Improvements

Deepset plans to address these challenges and is exploring several improvements to their question-answering system:

1. Improving Indexing Architecture: Optimizing the indexing pipeline for higher throughput, for example by using batching or Ray Core.

2. Deployment Graphs: Using Ray Serve's deployment graphs to split the monolithic pipeline into individual nodes, enabling finer-grained scaling and resource sharing across customers (a sketch follows this list).

3. Multiple Base Images: Supporting multiple versions of their framework Haystack for different customers, to accommodate version-specific requirements.
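As a rough sketch of what such a split could look like, the example below composes separate Ray Serve deployments (Ray 2.x composition, assuming the newer DeploymentHandle semantics where `.remote()` results can be awaited directly); the `Retriever` and `Reader` classes are illustrative stand-ins, not Deepset's actual code.

```python
# A rough sketch of splitting the monolithic pipeline into separate Ray Serve
# deployments that scale independently (Ray 2.x composition, assuming the
# newer DeploymentHandle semantics where .remote() results can be awaited).
# Retriever and Reader here are illustrative stand-ins, not Deepset's code.
from ray import serve
from starlette.requests import Request


@serve.deployment(autoscaling_config={"min_replicas": 1, "max_replicas": 8})
class Retriever:
    async def retrieve(self, query: str) -> list:
        # Embed the query and fetch candidate documents from the database.
        return []


@serve.deployment(ray_actor_options={"num_gpus": 1})
class Reader:
    async def extract(self, query: str, documents: list) -> dict:
        # Run the reader model to highlight the answer span.
        return {"answer": None, "documents": documents}


@serve.deployment
class QueryGraph:
    def __init__(self, retriever, reader):
        # Bound deployments passed to the constructor become handles at runtime.
        self.retriever = retriever
        self.reader = reader

    async def __call__(self, request: Request) -> dict:
        query = (await request.json())["query"]
        documents = await self.retriever.retrieve.remote(query)
        return await self.reader.extract.remote(query, documents)


app = QueryGraph.bind(Retriever.bind(), Reader.bind())
# serve.run(app)  # each node of the graph scales on its own
```

Because each node is its own deployment, the lightweight retriever and the GPU-heavy reader can scale independently, and individual nodes can potentially be shared across customers.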

Learning Ray

Here are two highly recommended options for learning and mastering Ray to make your MLOps journey smooth.

Conclusion

Deepset’s experience with Ray Serve demonstrates the power and versatility of Ray as a tool for scaling and deploying complex NLP systems. Despite some challenges, Ray Serve provides a strong foundation for Deepset’s question-answering platform, and the team plans to further improve and optimize the system. Ray Serve’s ease of use, auto-scaling capabilities, and flexible deployment options make it a valuable tool for building and deploying high-performance NLP applications at scale.
