abacusai.page_data

Classes

PageData

Data extracted from a docstore page.

Module Contents

class abacusai.page_data.PageData(client, docId=None, page=None, height=None, width=None, pageCount=None, pageText=None, pageTokenStartOffset=None, tokenCount=None, tokens=None, extractedText=None, rotationAngle=None, pageMarkdown=None, embeddedText=None)

Bases: abacusai.return_class.AbstractApiClass

Data extracted from a docstore page.

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • docId (str) – Unique Docstore string identifier for the document.

  • page (int) – The page number. Starts from 0.

  • height (int) – The height of the page in pixels.

  • width (int) – The width of the page in pixels.

  • pageCount (int) – The total number of pages in the document.

  • pageText (str) – The text extracted from the page.

  • pageTokenStartOffset (int) – The offset of the first token in the page.

  • tokenCount (int) – The number of tokens in the page.

  • tokens (list) – The tokens in the page.

  • extractedText (str) – The text extracted from the page by OCR.

  • rotationAngle (float) – The detected rotation angle of the page in degrees, measured from the original orientation. Positive values indicate clockwise rotation and negative values indicate anti-clockwise rotation.

  • pageMarkdown (str) – The markdown text for the page.

  • embeddedText (str) – The embedded text in the page. Only available for digital documents.
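
Several of these parameters are meant to be read together: pageTokenStartOffset and tokenCount locate a page's tokens, and rotationAngle tells how to undo a detected rotation. The following is a minimal sketch, not library code. PageSketch is a hypothetical stand-in that holds a few of the documented fields, and the sketch assumes that pageTokenStartOffset is an index into a document-wide token sequence, which the description above does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class PageSketch:
    """Hypothetical stand-in for a few PageData fields (not the abacusai class)."""
    page: int                      # zero-based page number
    page_token_start_offset: int   # assumed: document-level index of the page's first token
    token_count: int               # number of tokens on the page
    rotation_angle: float = 0.0    # degrees; positive means clockwise


def token_range(p: PageSketch) -> range:
    """Document-level token indices covered by the page, under the assumption above."""
    return range(p.page_token_start_offset, p.page_token_start_offset + p.token_count)


def correction_angle(p: PageSketch) -> float:
    """Clockwise-positive angle that would undo the detected rotation."""
    return -p.rotation_angle


# Illustrative values only; they do not come from a real document.
second_page = PageSketch(page=1, page_token_start_offset=120, token_count=80, rotation_angle=90.0)
print(list(token_range(second_page))[:3])
print(correction_angle(second_page))
```

If the offset turns out to be relative to something other than the whole document, only token_range needs to change.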

doc_id
page
height
width
page_count
page_text
page_token_start_offset
token_count
tokens
extracted_text
rotation_angle
page_markdown
embedded_text
deprecated_keys
__repr__()
to_dict()

Get a dict representation of the parameters in this class.

Returns:

The dict value representation of the class parameters

Return type:

dict
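
A dict representation is convenient for logging or JSON serialization. The sketch below uses a hypothetical stand-in, PageDictSketch, rather than the real class. It assumes that to_dict() returns one key per attribute, using the snake_case names listed above; whether the real method uses exactly these keys, or omits unset values, should be checked against the installed library.

```python
import json


class PageDictSketch:
    """Hypothetical stand-in, not the abacusai class."""

    def __init__(self, docId=None, page=None, pageText=None):
        # camelCase constructor arguments are stored as snake_case attributes,
        # mirroring the parameter and attribute names documented above.
        self.doc_id = docId
        self.page = page
        self.page_text = pageText

    def to_dict(self):
        # Assumed behavior: one key per attribute.
        return {"doc_id": self.doc_id, "page": self.page, "page_text": self.page_text}


# "doc-123" is a made-up identifier used only for illustration.
page = PageDictSketch(docId="doc-123", page=0, pageText="Hello")
as_json = json.dumps(page.to_dict(), sort_keys=True)
print(as_json)
```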