abacusai.page_data
Classes
PageData | Data extracted from a docstore page.
Module Contents
- class abacusai.page_data.PageData(client, docId=None, page=None, height=None, width=None, pageCount=None, pageText=None, pageTokenStartOffset=None, tokenCount=None, tokens=None, extractedText=None, rotationAngle=None, pageMarkdown=None, embeddedText=None)
Bases: abacusai.return_class.AbstractApiClass
Data extracted from a docstore page.
- Parameters:
client (ApiClient) – An authenticated API Client instance
docId (str) – Unique Docstore string identifier for the document.
page (int) – The page number. Starts from 0.
height (int) – The height of the page in pixels.
width (int) – The width of the page in pixels.
pageCount (int) – The total number of pages in the document.
pageText (str) – The text extracted from the page.
pageTokenStartOffset (int) – The offset of the first token in the page.
tokenCount (int) – The number of tokens in the page.
tokens (list) – The tokens in the page.
extractedText (str) – The text extracted from the page by OCR.
rotationAngle (float) – The detected rotation angle of the page in degrees. Positive values indicate clockwise and negative values indicate anti-clockwise rotation from the original orientation.
pageMarkdown (str) – The markdown text for the page.
embeddedText (str) – The embedded text in the page. Only available for digital documents.
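The conventions in the parameter descriptions above (zero-based page numbers, a per-page token offset, and a signed rotation angle) can be illustrated with a small sketch. It does not use the abacusai package: `PageInfo` and the three helper functions are hypothetical names introduced only for this illustration. The token range calculation assumes that `pageTokenStartOffset` indexes into a document-wide token sequence, which the descriptions suggest but do not state outright.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PageInfo:
    """Hypothetical stand-in carrying a few of the documented PageData fields."""
    page: int                      # page number, starting from 0
    page_count: int                # total number of pages in the document
    page_token_start_offset: int   # offset of the first token in the page
    token_count: int               # number of tokens in the page
    rotation_angle: Optional[float] = None  # degrees; positive means clockwise


def display_page_label(info: PageInfo) -> str:
    # Page numbers start from 0, so add 1 for a human-readable label.
    return f"Page {info.page + 1} of {info.page_count}"


def token_range(info: PageInfo) -> Tuple[int, int]:
    # Assumption: the offset counts tokens from the start of the document,
    # so this page covers the half-open range [start, start + token_count).
    start = info.page_token_start_offset
    return start, start + info.token_count


def correction_angle(info: PageInfo) -> float:
    # Positive values indicate clockwise rotation from the original
    # orientation, so negating the angle gives the rotation that undoes it.
    if info.rotation_angle is None:
        return 0.0
    return -info.rotation_angle
```

For example, a `PageInfo` with `page=0` and `page_count=3` is labelled "Page 1 of 3", and a detected `rotation_angle` of 90.0 yields a correction of -90.0 degrees (90 degrees anti-clockwise).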
- doc_id
- page
- height
- width
- page_count
- page_text
- page_token_start_offset
- token_count
- tokens
- extracted_text
- rotation_angle
- page_markdown
- embedded_text
- deprecated_keys
- __repr__()
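The attribute names listed above are the snake_case forms of the camelCase constructor parameters (for example, `docId` corresponds to `doc_id`). The following sketch reproduces that correspondence with a hypothetical helper, `camel_to_snake`. It only illustrates the naming pattern visible in this listing and is not a description of how the library itself performs the mapping.

```python
import re

# Constructor parameter names as documented for PageData (excluding client).
PARAMS = [
    "docId", "page", "height", "width", "pageCount", "pageText",
    "pageTokenStartOffset", "tokenCount", "tokens", "extractedText",
    "rotationAngle", "pageMarkdown", "embeddedText",
]


def camel_to_snake(name: str) -> str:
    """Hypothetical helper: insert '_' before each interior capital, then lowercase."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


# Maps each constructor parameter to the attribute name it corresponds to.
ATTRIBUTE_FOR_PARAM = {param: camel_to_snake(param) for param in PARAMS}
```

Under this convention, `ATTRIBUTE_FOR_PARAM["pageTokenStartOffset"]` is `"page_token_start_offset"`, and single-word names such as `page` and `tokens` are unchanged.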