abacusai.document_retriever =========================== .. py:module:: abacusai.document_retriever Classes ------- .. autoapisummary:: abacusai.document_retriever.DocumentRetriever Module Contents --------------- .. py:class:: DocumentRetriever(client, name=None, documentRetrieverId=None, createdAt=None, featureGroupId=None, featureGroupName=None, indexingRequired=None, latestDocumentRetrieverVersion={}, documentRetrieverConfig={}) Bases: :py:obj:`abacusai.return_class.AbstractApiClass` A vector store that stores embeddings for a list of document trunks. :param client: An authenticated API Client instance :type client: ApiClient :param name: The name of the document retriever. :type name: str :param documentRetrieverId: The unique identifier of the vector store. :type documentRetrieverId: str :param createdAt: When the vector store was created. :type createdAt: str :param featureGroupId: The feature group id associated with the document retriever. :type featureGroupId: str :param featureGroupName: The feature group name associated with the document retriever. :type featureGroupName: str :param indexingRequired: Whether the document retriever is required to be indexed due to changes in underlying data. :type indexingRequired: bool :param latestDocumentRetrieverVersion: The latest version of vector store. :type latestDocumentRetrieverVersion: DocumentRetrieverVersion :param documentRetrieverConfig: The config for vector store creation. :type documentRetrieverConfig: VectorStoreConfig .. py:attribute:: name :value: None .. py:attribute:: document_retriever_id :value: None .. py:attribute:: created_at :value: None .. py:attribute:: feature_group_id :value: None .. py:attribute:: feature_group_name :value: None .. py:attribute:: indexing_required :value: None .. py:attribute:: latest_document_retriever_version .. py:attribute:: document_retriever_config .. py:attribute:: deprecated_keys .. py:method:: __repr__() .. py:method:: to_dict() Get a dict representation of the parameters in this class :returns: The dict value representation of the class parameters :rtype: dict .. py:method:: rename(name) Updates an existing document retriever. :param name: The name to update the document retriever with. :type name: str :returns: The updated document retriever. :rtype: DocumentRetriever .. py:method:: create_version(feature_group_id = None, document_retriever_config = None) Creates a document retriever version from the latest version of the feature group that the document retriever associated with. :param feature_group_id: The ID of the feature group to update the document retriever with. :type feature_group_id: str :param document_retriever_config: The configuration, including chunk_size and chunk_overlap_fraction, for document retrieval. :type document_retriever_config: VectorStoreConfig :returns: The newly created document retriever version. :rtype: DocumentRetrieverVersion .. py:method:: refresh() Calls describe and refreshes the current object's fields :returns: The current object :rtype: DocumentRetriever .. py:method:: describe() Describe a Document Retriever. :param document_retriever_id: A unique string identifier associated with the document retriever. :type document_retriever_id: str :returns: The document retriever object. :rtype: DocumentRetriever .. py:method:: list_versions(limit = 100, start_after_version = None) List all the document retriever versions with a given ID. :param limit: The number of vector store versions to retrieve. The maximum value is 100. :type limit: int :param start_after_version: An offset parameter to exclude all document retriever versions up to this specified one. :type start_after_version: str :returns: All the document retriever versions associated with the document retriever. :rtype: list[DocumentRetrieverVersion] .. py:method:: get_document_snippet(document_id, start_word_index = None, end_word_index = None) Get a snippet from documents in the document retriever. :param document_id: The ID of the document to retrieve the snippet from. :type document_id: str :param start_word_index: If provided, will start the snippet at the index (of words in the document) specified. :type start_word_index: int :param end_word_index: If provided, will end the snippet at the index of (of words in the document) specified. :type end_word_index: int :returns: The documentation snippet found from the document retriever. :rtype: DocumentRetrieverLookupResult .. py:method:: restart() Restart the document retriever if it is stopped or has failed. This will start the deployment of the document retriever, but will not wait for it to be ready. You need to call wait_until_ready to wait until the deployment is ready. :param document_retriever_id: A unique string identifier associated with the document retriever. :type document_retriever_id: str .. py:method:: wait_until_ready(timeout = 3600) A waiting call until document retriever is ready. It restarts the document retriever if it is stopped. :param timeout: The waiting time given to the call to finish, if it doesn't finish by the allocated time, the call is said to be timed out. Default value given is 3600 seconds. :type timeout: int .. py:method:: wait_until_deployment_ready(timeout = 3600) A waiting call until the document retriever deployment is ready to serve. :param timeout: The waiting time given to the call to finish, if it doesn't finish by the allocated time, the call is said to be timed out. Default value given is 3600 seconds. :type timeout: int .. py:method:: get_status() Gets the status of the document retriever. It represents indexing status until indexing isn't complete, and deployment status after indexing is complete. :returns: A string describing the status of a document retriever (pending, indexing, complete, active, etc.). :rtype: str .. py:method:: get_deployment_status() Gets the deployment status of the document retriever. :returns: A string describing the deployment status of document retriever (pending, deploying, active, etc.). :rtype: str .. py:method:: get_matching_documents(query, filters = None, limit = None, result_columns = None, max_words = None, num_retrieval_margin_words = None, max_words_per_chunk = None, score_multiplier_column = None, min_score = None, required_phrases = None, filter_clause = None, crowding_limits = None, include_text_search = False) Lookup document retrievers and return the matching documents from the document retriever deployed with given query. Original documents are split into chunks and stored in the document retriever. This lookup function will return the relevant chunks from the document retriever. The returned chunks could be expanded to include more words from the original documents and merged if they are overlapping, and permitted by the settings provided. The returned chunks are sorted by relevance. :param query: The query to search for. :type query: str :param filters: A dictionary mapping column names to a list of values to restrict the retrieved search results. :type filters: dict :param limit: If provided, will limit the number of results to the value specified. :type limit: int :param result_columns: If provided, will limit the column properties present in each result to those specified in this list. :type result_columns: list :param max_words: If provided, will limit the total number of words in the results to the value specified. :type max_words: int :param num_retrieval_margin_words: If provided, will add this number of words from left and right of the returned chunks. :type num_retrieval_margin_words: int :param max_words_per_chunk: If provided, will limit the number of words in each chunk to the value specified. If the value provided is smaller than the actual size of chunk on disk, which is determined during document retriever creation, the actual size of chunk will be used. I.e, chunks looked up from document retrievers will not be split into smaller chunks during lookup due to this setting. :type max_words_per_chunk: int :param score_multiplier_column: If provided, will use the values in this column to modify the relevance score of the returned chunks. Values in this column must be numeric. :type score_multiplier_column: str :param min_score: If provided, will filter out the results with score lower than the value specified. :type min_score: float :param required_phrases: If provided, each result will have at least one of the phrases. :type required_phrases: list :param filter_clause: If provided, filter the results of the query using this sql where clause. :type filter_clause: str :param crowding_limits: A dictionary mapping metadata columns to the maximum number of results per unique value of the column. This is used to ensure diversity of metadata attribute values in the results. If a particular attribute value has already reached its maximum count, further results with that same attribute value will be excluded from the final result set. :type crowding_limits: dict :param include_text_search: If true, combine the ranking of results from a BM25 text search over the documents with the vector search using reciprocal rank fusion. It leverages both lexical and semantic matching for better overall results. It's particularly valuable in professional, technical, or specialized fields where both precision in terminology and understanding of context are important. :type include_text_search: bool :returns: The relevant documentation results found from the document retriever. :rtype: list[DocumentRetrieverLookupResult]