abacusai.api_client_utils

Attributes

INVALID_PANDAS_COLUMN_NAME_CHARACTERS

Classes

StreamingHandler

str(object='') -> str

StreamType

Generic enumeration.

DocstoreUtils

Utility class for loading docstore data.

Functions

clean_column_name(column)

avro_to_pandas_dtype(avro_type)

_get_spark_incompatible_columns(df)

get_non_nullable_type(types)

get_object_from_context(client, context, ...)

load_as_pandas_from_avro_fd(fd)

load_as_pandas_from_avro_files(files, download_method)

validate_workflow_node_inputs(nodes_info, ...)

run(nodes, primary_start_node, graph_info[, ...])

evaluate_edge_condition(source, target, details, ...)

execute_python_source(python_expression, variables)

process_node_response(node_response)

try_abacus_internal_copy(src_suffix, dst_local[, ...])

Returns True if the file was copied, False otherwise.

Module Contents

abacusai.api_client_utils.INVALID_PANDAS_COLUMN_NAME_CHARACTERS = '[^A-Za-z0-9_]'
abacusai.api_client_utils.clean_column_name(column)
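
The pattern in INVALID_PANDAS_COLUMN_NAME_CHARACTERS matches every character other than an ASCII letter, digit, or underscore. The following is a minimal sketch, assuming clean_column_name substitutes those characters. The replacement character and the helper name sketch_clean_column_name are assumptions for illustration, not the library's implementation.

```python
import re

# Value copied from the module attribute documented above.
INVALID_PANDAS_COLUMN_NAME_CHARACTERS = '[^A-Za-z0-9_]'


def sketch_clean_column_name(column: str, replacement: str = '_') -> str:
    """Hypothetical stand-in: replace every disallowed character."""
    return re.sub(INVALID_PANDAS_COLUMN_NAME_CHARACTERS, replacement, column)


# Spaces, parentheses, and the dollar sign all match the pattern.
cleaned = sketch_clean_column_name('unit price ($)')
print(cleaned)
```

A name that already consists only of letters, digits, and underscores passes through unchanged.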
abacusai.api_client_utils.avro_to_pandas_dtype(avro_type)
abacusai.api_client_utils._get_spark_incompatible_columns(df)
abacusai.api_client_utils.get_non_nullable_type(types)
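
In Avro schemas, a nullable field is conventionally declared as a union that includes "null", for example ["null", "string"]. The following is a minimal sketch, assuming get_non_nullable_type selects the non-null member of such a union. The helper first_non_null_type is hypothetical and may differ from what the library does.

```python
from typing import Any, List, Optional


def first_non_null_type(types: List[Any]) -> Optional[Any]:
    """Hypothetical helper: return the first union member that is not 'null'."""
    for avro_type in types:
        if avro_type != 'null':
            return avro_type
    # The union was empty or contained only 'null'.
    return None


print(first_non_null_type(['null', 'string']))
```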
class abacusai.api_client_utils.StreamingHandler

Bases: str

str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'.

classmethod process_streaming_data(value, context, section_key, data_type, is_transient)
abacusai.api_client_utils.get_object_from_context(client, context, variable_name, return_type)
abacusai.api_client_utils.load_as_pandas_from_avro_fd(fd)
Parameters:

fd (IO)

abacusai.api_client_utils.load_as_pandas_from_avro_files(files, download_method, max_workers=10)
Parameters:
  • files (List[str])

  • download_method (Callable)

  • max_workers (int)
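
The max_workers parameter suggests that the files are fetched concurrently. The following is a minimal sketch of that pattern, assuming download_method takes one file identifier and returns that file's bytes. This contract and the helper download_all are assumptions. Judging by its name, the real function also decodes the Avro data into a pandas DataFrame, which is omitted here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def download_all(
    files: List[str],
    download_method: Callable[[str], bytes],
    max_workers: int = 10,
) -> List[bytes]:
    """Hypothetical helper: fetch every file using a bounded thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(download_method, files))


def fake_download(name: str) -> bytes:
    """Stand-in for a real download callable; returns the name as bytes."""
    return name.encode('utf-8')


payloads = download_all(['part-0.avro', 'part-1.avro'], fake_download, max_workers=2)
print(len(payloads))
```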

abacusai.api_client_utils.validate_workflow_node_inputs(nodes_info, agent_workflow_node_id, keyword_arguments, sample_user_inputs, filtered_workflow_vars)
Parameters:
  • keyword_arguments (dict)

  • sample_user_inputs (dict)

  • filtered_workflow_vars (dict)

abacusai.api_client_utils.run(nodes, primary_start_node, graph_info, sample_user_inputs=None, agent_workflow_node_id=None, workflow_vars={}, topological_dfs_stack=[])
Parameters:
  • nodes (List[dict])

  • primary_start_node (str)

  • graph_info (dict)

  • sample_user_inputs (dict)

  • agent_workflow_node_id (str)

  • workflow_vars (dict)

  • topological_dfs_stack (List)
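
The signature above declares workflow_vars={} and topological_dfs_stack=[] as defaults. In Python, default values are created once, when the function is defined, so a mutable default is shared by every call that omits the argument. Whether run mutates these objects is not stated here. The sketch below uses a hypothetical function, record_visit, only to illustrate the language behaviour and why passing fresh objects explicitly is the safer calling convention.

```python
def record_visit(node, stack=[]):
    """Hypothetical function with a mutable default, for illustration only."""
    stack.append(node)
    return stack


first = record_visit('a')
second = record_visit('b')  # reuses the same default list as the first call

# Passing a fresh list avoids state leaking between calls.
fresh = record_visit('c', stack=[])

print(first is second, second, fresh)
```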

abacusai.api_client_utils.evaluate_edge_condition(source, target, details, workflow_vars)
abacusai.api_client_utils.execute_python_source(python_expression, variables)
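
The following is a minimal sketch, assuming execute_python_source evaluates a Python expression string against a dictionary of variable names. The helper evaluate_expression is hypothetical, and the library's actual restrictions and error handling are not documented on this page.

```python
from typing import Any, Dict


def evaluate_expression(python_expression: str, variables: Dict[str, Any]) -> Any:
    """Hypothetical helper: evaluate an expression with the given variables."""
    # eval runs arbitrary code; emptying __builtins__ is NOT a real sandbox,
    # so only use this pattern with trusted expressions.
    return eval(python_expression, {'__builtins__': {}}, dict(variables))


result = evaluate_expression(
    'score > 0.5 and label == "ok"',
    {'score': 0.9, 'label': 'ok'},
)
print(result)
```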
abacusai.api_client_utils.process_node_response(node_response)
class abacusai.api_client_utils.StreamType

Bases: enum.Enum

Generic enumeration.

Derive from this class to define new enumerations.

MESSAGE = 'message'
SECTION_OUTPUT = 'section_output'
SEGMENT = 'segment'
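
Because StreamType derives from enum.Enum, members can be looked up by value and compared by identity. The sketch below redefines the enum locally, using the values listed above, so that it runs without the package installed. In real code, import StreamType from abacusai.api_client_utils instead.

```python
from enum import Enum


class StreamType(Enum):
    """Local mirror of the documented members, for illustration only."""
    MESSAGE = 'message'
    SECTION_OUTPUT = 'section_output'
    SEGMENT = 'segment'


# Look up a member from its string value.
kind = StreamType('segment')

# Iterating an Enum yields members in definition order.
names = [member.name for member in StreamType]

print(kind, names)
```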
class abacusai.api_client_utils.DocstoreUtils

Utility class for loading docstore data. Needs to be updated if docstore formats change.

DOC_ID = 'doc_id'
PREDICTION_PREFIX = 'prediction'
FIRST_PAGE = 'first_page'
LAST_PAGE = 'last_page'
PAGE_TEXT = 'page_text'
PAGES = 'pages'
CONTENT = 'content'
TOKENS = 'tokens'
PAGES_ZIP_METADATA = 'pages_zip_metadata'
PAGE_DATA = 'page_data'
HEIGHT = 'height'
WIDTH = 'width'
METADATA = 'metadata'
PAGE = 'page'
BLOCK = 'block'
LINE = 'line'
EXTRACTED_TEXT = 'extracted_text'
EMBEDDED_TEXT = 'embedded_text'
PAGE_MARKDOWN = 'page_markdown'
PAGE_LLM_OCR = 'page_llm_ocr'
PAGE_TABLE_TEXT = 'page_table_text'
MARKDOWN_FEATURES = 'markdown_features'
MULTI_MODE_OCR_TEXT = 'multi_mode_ocr_text'
DOCUMENT_PROCESSING_CONFIG = 'document_processing_config'
DOCUMENT_PROCESSING_VERSION = 'document_processing_version'
static get_archive_id(doc_id)
Parameters:

doc_id (str)

static get_page_id(doc_id, page)
Parameters:
static get_content_hash(doc_id)
Parameters:

doc_id (str)

classmethod get_pandas_pages_df(df, feature_group_version, doc_id_column, document_column, get_docstore_resource_bytes, get_document_processing_result_infos, max_workers=10)
Parameters:
  • feature_group_version (str)

  • doc_id_column (str)

  • document_column (str)

  • get_docstore_resource_bytes (Callable[Ellipsis, bytes])

  • get_document_processing_result_infos (Callable)

  • max_workers (int)

classmethod get_pandas_documents_df(df, feature_group_version, doc_id_column, document_column, get_docstore_resource_bytes, get_document_processing_result_infos, max_workers=10)
Parameters:
  • feature_group_version (str)

  • doc_id_column (str)

  • document_column (str)

  • get_docstore_resource_bytes (Callable)

  • get_document_processing_result_infos (Callable)

  • max_workers (int)

abacusai.api_client_utils.try_abacus_internal_copy(src_suffix, dst_local, raise_exception=True)

Returns True if the file was copied, False otherwise.