abacusai.document_data
======================

.. py:module:: abacusai.document_data


Classes
-------

.. autoapisummary::

   abacusai.document_data.DocumentData


Module Contents
---------------

.. py:class:: DocumentData(client, docId=None, mimeType=None, pageCount=None, totalPageCount=None, extractedText=None, embeddedText=None, pages=None, tokens=None, metadata=None, pageMarkdown=None, extractedPageText=None, augmentedPageText=None)

   Bases: :py:obj:`abacusai.return_class.AbstractApiClass`


   Data extracted from a docstore document.

   :param client: An authenticated API Client instance
   :type client: ApiClient
   :param docId: Unique Docstore string identifier for the document.
   :type docId: str
   :param mimeType: The mime type of the document.
   :type mimeType: str
   :param pageCount: The number of pages for which data is available. This is generally the same as the total number of pages, but may be smaller if only selected pages were processed.
   :type pageCount: int
   :param totalPageCount: The total number of pages in the document.
   :type totalPageCount: int
   :param extractedText: The extracted text in the document obtained from OCR.
   :type extractedText: str
   :param embeddedText: The embedded text in the document. Only available for digital documents.
   :type embeddedText: str
   :param pages: List of embedded text for each page in the document. Only available for digital documents.
   :type pages: list
   :param tokens: List of extracted tokens in the document obtained from OCR.
   :type tokens: list
   :param metadata: List of metadata for each page in the document.
   :type metadata: list
   :param pageMarkdown: List of markdown text for each page in the document.
   :type pageMarkdown: list
   :param extractedPageText: List of extracted text for each page in the document obtained from OCR. Available when the ``return_extracted_page_text`` parameter is set to True in the document data retrieval API.
   :type extractedPageText: list
   :param augmentedPageText: List of OCR-extracted text for each page in the document, augmented with the links embedded in the document.
   :type augmentedPageText: list
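
   The fields above suggest two common checks: whether only a subset of pages was processed, and which text source to read. The sketch below is illustrative only. It uses a hypothetical ``SimpleNamespace`` stand-in rather than a real ``DocumentData`` instance, and the helper names ``is_partial`` and ``preferred_text`` are assumptions, not part of the library.

   .. code-block:: python

      from types import SimpleNamespace


      def is_partial(doc):
          """Return True when data covers fewer pages than the document has."""
          if doc.page_count is None or doc.total_page_count is None:
              return False
          return doc.page_count < doc.total_page_count


      def preferred_text(doc):
          """Prefer embedded text (digital documents); fall back to OCR text."""
          return doc.embedded_text or doc.extracted_text or ""


      # Hypothetical stand-in mirroring the documented attribute names.
      doc = SimpleNamespace(
          page_count=2,
          total_page_count=5,
          embedded_text=None,
          extracted_text="Scanned invoice text",
      )

      print(is_partial(doc))      # True
      print(preferred_text(doc))  # Scanned invoice text

   Treating ``None`` counts as "not partial" is a design choice for this sketch; callers that need stricter behavior could raise an error instead.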


   .. py:attribute:: doc_id
      :value: None


   .. py:attribute:: mime_type
      :value: None


   .. py:attribute:: page_count
      :value: None


   .. py:attribute:: total_page_count
      :value: None


   .. py:attribute:: extracted_text
      :value: None


   .. py:attribute:: embedded_text
      :value: None


   .. py:attribute:: pages
      :value: None


   .. py:attribute:: tokens
      :value: None


   .. py:attribute:: metadata
      :value: None


   .. py:attribute:: page_markdown
      :value: None


   .. py:attribute:: extracted_page_text
      :value: None


   .. py:attribute:: augmented_page_text
      :value: None
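
   The per-page attributes (``pages`` and ``extracted_page_text``) can be combined page by page. The following is a minimal sketch that assumes each list entry is a plain string and that both lists share the same page order; neither assumption is confirmed by this reference. ``page_texts`` is a hypothetical helper, and the ``SimpleNamespace`` object stands in for a real ``DocumentData`` instance.

   .. code-block:: python

      from types import SimpleNamespace


      def page_texts(doc):
          """Yield (page_index, text), preferring embedded text over OCR text."""
          embedded = doc.pages or []
          ocr = doc.extracted_page_text or []
          for i in range(max(len(embedded), len(ocr))):
              # Use the embedded text when the page has a non-empty entry.
              text = embedded[i] if i < len(embedded) and embedded[i] else None
              # Otherwise fall back to the OCR text for the same page.
              if text is None and i < len(ocr):
                  text = ocr[i]
              yield i, text or ""


      # Hypothetical stand-in: page 1 has no embedded text, so OCR is used.
      doc = SimpleNamespace(
          pages=["Page one", ""],
          extracted_page_text=["Page 0ne", "Page two"],
      )
      result = list(page_texts(doc))
      print(result)  # [(0, 'Page one'), (1, 'Page two')]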


   .. py:attribute:: deprecated_keys


   .. py:method:: __repr__()


   .. py:method:: to_dict()

      Get a dict representation of the parameters in this class.

      :returns: The dict value representation of the class parameters
      :rtype: dict
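
      A dict like the one returned here can be serialized for logging or storage. The sketch below assumes the dict uses snake_case keys matching the attribute names and contains only JSON-serializable values; the sample ``data`` dict and the ``to_json`` helper are hypothetical and not part of the library.

      .. code-block:: python

         import json


         def to_json(data):
             """Serialize a to_dict()-style mapping, omitting unset (None) fields."""
             compact = {key: value for key, value in data.items() if value is not None}
             return json.dumps(compact, sort_keys=True)


         # Hypothetical sample; the key names are assumed, not taken from the API.
         data = {
             "doc_id": "doc-123",
             "mime_type": "application/pdf",
             "page_count": 3,
             "tokens": None,
         }
         print(to_json(data))
         # {"doc_id": "doc-123", "mime_type": "application/pdf", "page_count": 3}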