abacusai.dataset_version

Classes

DatasetVersion

A specific version of a dataset

Module Contents

class abacusai.dataset_version.DatasetVersion(client, datasetVersion=None, status=None, datasetId=None, size=None, rowCount=None, fileInspectMetadata=None, createdAt=None, error=None, incrementalQueriedAt=None, uploadId=None, mergeFileSchemas=None, databaseConnectorConfig=None, applicationConnectorConfig=None, invalidRecords=None)

Bases: abacusai.return_class.AbstractApiClass

A specific version of a dataset

Parameters:
  • client (ApiClient) – An authenticated API Client instance.

  • datasetVersion (str) – The unique identifier of the dataset version.

  • status (str) – The current status of the dataset version.

  • datasetId (str) – A reference to the Dataset this dataset version belongs to.

  • size (int) – The size of the file, in bytes.

  • rowCount (int) – Number of rows in the dataset version.

  • fileInspectMetadata (dict) – Metadata produced by inspecting the file, for example the detected delimiter for CSV files.

  • createdAt (str) – The timestamp this dataset version was created.

  • error (str) – If status is FAILED, this field contains the error message.

  • incrementalQueriedAt (str) – If the dataset version comes from an incremental dataset, the last value of the timestamp column at the time the dataset version was created.

  • uploadId (str) – If the dataset version is being uploaded, this is the reference to the Upload.

  • mergeFileSchemas (bool) – Whether the merge file schemas policy is enabled.

  • databaseConnectorConfig (dict) – The database connector query used to retrieve data for this version.

  • applicationConnectorConfig (dict) – The application connector used to retrieve data for this version.

  • invalidRecords (str) – Invalid records in the dataset version.

dataset_version = None
status = None
dataset_id = None
size = None
row_count = None
file_inspect_metadata = None
created_at = None
error = None
incremental_queried_at = None
upload_id = None
merge_file_schemas = None
database_connector_config = None
application_connector_config = None
invalid_records = None
deprecated_keys
__repr__()
to_dict()

Get a dict representation of the parameters in this class.

Returns:

The dict value representation of the class parameters

Return type:

dict
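The constructor accepts camelCase keyword arguments (for example `rowCount`), while the instance exposes snake_case attributes (`row_count`). The sketch below mimics that mapping with a hypothetical stand-in class, `VersionRecord`, so it runs without the `abacusai` package; it is not the library's implementation, and omitting `None` values from `to_dict()` is an assumption of this sketch.

```python
import re
from typing import Any, Dict


def camel_to_snake(name: str) -> str:
    """Convert a camelCase name such as 'rowCount' to 'row_count'."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


class VersionRecord:
    """Hypothetical stand-in for DatasetVersion, for illustration only."""

    def __init__(self, **camel_kwargs: Any) -> None:
        # Store every camelCase keyword under its snake_case attribute name.
        for key, value in camel_kwargs.items():
            setattr(self, camel_to_snake(key), value)

    def to_dict(self) -> Dict[str, Any]:
        # Assumption of this sketch: attributes left as None are omitted.
        return {k: v for k, v in vars(self).items() if v is not None}


record = VersionRecord(datasetVersion="dv_123", rowCount=1000, uploadId=None)
print(record.row_count)  # 1000
print(record.to_dict())  # {'dataset_version': 'dv_123', 'row_count': 1000}
```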

get_metrics(selected_columns=None, include_charts=False, include_statistics=True)

Get metrics for a specific dataset version.

Parameters:
  • selected_columns (List) – A list of columns to order first.

  • include_charts (bool) – A flag indicating whether charts should be included in the response. Defaults to False.

  • include_statistics (bool) – A flag indicating whether statistics should be included in the response. Defaults to True.

Returns:

The metrics for the specified Dataset version.

Return type:

DataMetrics
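A typical call lists particular columns first and turns charts on while keeping statistics. The helper `fetch_column_metrics` and the `FakeVersion` stub are hypothetical and exist only so the sketch runs offline; with the SDK you would pass a `DatasetVersion` obtained through an authenticated `ApiClient`, and the column names are placeholders.

```python
from typing import Any, List, Optional


def fetch_column_metrics(version: Any, columns: Optional[List[str]] = None,
                         with_charts: bool = False) -> Any:
    """Call get_metrics() with the keyword arguments documented above."""
    return version.get_metrics(
        selected_columns=columns,    # columns to order first
        include_charts=with_charts,  # charts are off by default
        include_statistics=True,     # statistics are on by default
    )


class FakeVersion:
    """Hypothetical stub that echoes back the arguments it receives."""

    def get_metrics(self, selected_columns=None, include_charts=False,
                    include_statistics=True):
        return {"selected_columns": selected_columns,
                "include_charts": include_charts,
                "include_statistics": include_statistics}


print(fetch_column_metrics(FakeVersion(), ["price", "quantity"], with_charts=True))
```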

refresh()

Calls describe() and refreshes the current object’s fields.

Returns:

The current object

Return type:

DatasetVersion
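The refresh() contract, fetch a fresh description and copy its fields onto the existing object, can be modeled offline. `SnapshotVersion` is a hypothetical class whose describe() returns the next scripted snapshot instead of calling the API; the status strings are placeholders, and the library's actual implementation may differ.

```python
from typing import Any, Dict, List


class SnapshotVersion:
    """Hypothetical object that mimics the describe()/refresh() contract."""

    def __init__(self, snapshots: List[Dict[str, Any]]) -> None:
        self._snapshots = iter(snapshots)  # successive server-side states
        self.status = None
        self.row_count = None

    def describe(self) -> "SnapshotVersion":
        # Stand-in for the API call: build a new object from the next snapshot.
        fresh = SnapshotVersion([])
        fresh.__dict__.update(next(self._snapshots))
        return fresh

    def refresh(self) -> "SnapshotVersion":
        # Copy the public fields of a fresh description onto this object.
        fresh = self.describe()
        for key, value in vars(fresh).items():
            if not key.startswith("_"):
                setattr(self, key, value)
        return self


version = SnapshotVersion([{"status": "IMPORTING"},
                           {"status": "COMPLETE", "row_count": 42}])
version.refresh()
print(version.status)               # IMPORTING
print(version.refresh().row_count)  # 42
```

Returning `self` keeps the object identity stable, so other references to the same version see the updated fields.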

describe()

Retrieves a full description of the specified dataset version, including its ID, name, source type, and other attributes.

Parameters:

dataset_version (str) – Unique string identifier associated with the dataset version.

Returns:

The dataset version.

Return type:

DatasetVersion

delete()

Deletes the specified dataset version from the organization.

Parameters:

dataset_version (str) – String identifier of the dataset version to delete.
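Because a failed version carries its error in the `error` field, a cleanup pass can combine get_status() and delete(). `delete_failed_versions` and `StubVersion` are hypothetical; the sketch assumes the failed status is the string "FAILED" (compared case-insensitively to be safe) and does not handle errors raised by the delete call.

```python
from typing import Iterable, List


class StubVersion:
    """Hypothetical stand-in exposing get_status() and delete()."""

    def __init__(self, dataset_version: str, status: str) -> None:
        self.dataset_version = dataset_version
        self.status = status
        self.deleted = False

    def get_status(self) -> str:
        return self.status

    def delete(self) -> None:
        # The real method removes the version from the organization.
        self.deleted = True


def delete_failed_versions(versions: Iterable[StubVersion]) -> List[str]:
    """Delete every version whose status is FAILED and return the deleted IDs."""
    removed = []
    for version in versions:
        if version.get_status().upper() == "FAILED":
            version.delete()
            removed.append(version.dataset_version)
    return removed


versions = [StubVersion("dv_1", "COMPLETE"), StubVersion("dv_2", "FAILED")]
print(delete_failed_versions(versions))  # ['dv_2']
```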

get_logs()

Retrieves the dataset import logs.

Parameters:

dataset_version (str) – The unique version ID of the dataset version.

Returns:

The logs for the specified dataset version.

Return type:

DatasetVersionLogs

wait_for_import(timeout=900)

Waits until the dataset version has been imported.

Parameters:

timeout (int) – The maximum time, in seconds, to wait for the call to finish. If it does not finish within this time, the call times out.
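Conceptually, waiting for an import is a poll loop: check the status until the version leaves its in-progress states or the timeout expires. The generic `wait_while` helper below is a sketch of that pattern, not the SDK's code; the status names, the 15-second polling delay, raising `TimeoutError`, and treating `timeout=None` as "no limit" are all assumptions. A fake clock is injected so the demo finishes instantly.

```python
import time
from typing import Callable, Optional, Set


def wait_while(get_status: Callable[[], str], waiting_states: Set[str],
               timeout: Optional[float] = 900, delay: float = 15,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until it leaves waiting_states and return the final status.

    Raises TimeoutError once more than `timeout` seconds have elapsed;
    timeout=None disables the limit (an assumption of this sketch).
    """
    start = clock()
    status = get_status()
    while status in waiting_states:
        if timeout is not None and clock() - start > timeout:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(delay)
        status = get_status()
    return status


# Demo with a scripted status sequence and a fake clock, so no real waiting happens.
fake_now = [0.0]


def fake_sleep(seconds: float) -> None:
    fake_now[0] += seconds


statuses = iter(["PENDING", "IMPORTING", "IMPORTING", "COMPLETE"])
final = wait_while(lambda: next(statuses), {"PENDING", "IMPORTING"},
                   clock=lambda: fake_now[0], sleep=fake_sleep)
print(final)  # COMPLETE
```

With the SDK itself, the documented call is simply `dataset_version.wait_for_import(timeout=900)`; injecting `clock` and `sleep` here only makes the loop testable without real delays.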

wait_for_inspection(timeout=None)

Waits until the dataset version has been completely inspected.

Parameters:

timeout (int) – The maximum time, in seconds, to wait for the call to finish. If it does not finish within this time, the call times out.

get_status()

Gets the status of the dataset version.

Returns:

A string describing the status of a dataset version (importing, inspecting, complete, etc.).

Return type:

str
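One way to act on the returned string is to map it to a next step. The `next_step` function is a hypothetical sketch: this page mentions importing, inspecting, complete, and FAILED, but the exact spelling and the full set of status values are assumptions, so anything unrecognized falls through to "unknown".

```python
def next_step(status: str) -> str:
    """Suggest what to do next for a dataset version status (hypothetical mapping)."""
    normalized = status.strip().upper()
    if normalized in {"IMPORTING", "INSPECTING"}:
        return "wait"         # e.g. call wait_for_import() or wait_for_inspection()
    if normalized == "COMPLETE":
        return "ready"        # processing has finished
    if normalized == "FAILED":
        return "check error"  # the error field is populated in this state
    return "unknown"          # any status this sketch does not recognize


print(next_step("inspecting"))  # wait
```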