abacusai.upload

Classes

Upload

An upload reference for uploading file parts.

Module Contents

class abacusai.upload.Upload(client, uploadId=None, datasetUploadId=None, status=None, datasetId=None, datasetVersion=None, modelId=None, modelVersion=None, batchPredictionId=None, parts=None, createdAt=None)

Bases: abacusai.return_class.AbstractApiClass

An upload reference for uploading file parts.

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • uploadId (str) – The unique ID generated when the upload process of the full large file in smaller parts is initiated.

  • datasetUploadId (str) – Same as uploadId. Kept for backwards compatibility.

  • status (str) – The current status of the upload.

  • datasetId (str) – A reference to the dataset this upload is adding data to.

  • datasetVersion (str) – A reference to the dataset version the upload is adding data to.

  • modelId (str) – A reference to the model the upload is creating a version for.

  • modelVersion (str) – A reference to the model version the upload is creating.

  • batchPredictionId (str) – A reference to the batch prediction the upload is creating.

  • parts (list[dict]) – A list containing the order of the file parts that have been uploaded.

  • createdAt (str) – The timestamp at which the upload was created.

upload_id
dataset_upload_id
status
dataset_id
dataset_version
model_id
model_version
batch_prediction_id
parts
created_at
deprecated_keys
__repr__()
to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

cancel()

Cancels an upload.

Parameters:

upload_id (str) – A unique string identifier for the upload.

part(part_number, part_data)

Uploads one part of a large dataset file from your bucket to our system. Our system currently supports parts of up to 5 GB and full files of up to 5 TB. Each part must be at least 5 MB in size, unless it is the last part in the sequence of parts for the full file.

Parameters:
  • part_number (int) – The 1-indexed number denoting the position of the file part in the sequence of parts for the full file.

  • part_data (io.TextIOBase) – The multipart/form-data for the current part of the full file.

Returns:

An UploadPart object that encapsulates the hash and the ETag of the uploaded part.

Return type:

UploadPart
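The size limits described for part() can be checked before any data is sent. The following is a minimal sketch, not part of the abacusai package: plan_parts and the three constants are hypothetical names, and treating 1 MB as 1024 * 1024 bytes is an assumption, since the reference does not say whether the limits are binary or decimal.

```python
# Hypothetical helper; not part of the abacusai package.
# Assumption: MB/GB/TB are binary units (powers of 1024).
MIN_PART_BYTES = 5 * 1024**2   # minimum size of every part except the last
MAX_PART_BYTES = 5 * 1024**3   # maximum size of a single part
MAX_FILE_BYTES = 5 * 1024**4   # maximum size of the full file


def plan_parts(file_size, part_size):
    """Return [(part_number, size_in_bytes), ...] or raise ValueError."""
    if file_size <= 0 or file_size > MAX_FILE_BYTES:
        raise ValueError("file size must be between 1 byte and 5 TB")
    if part_size <= 0 or part_size > MAX_PART_BYTES:
        raise ValueError("part size must be between 1 byte and 5 GB")
    part_count = -(-file_size // part_size)  # integer ceiling division
    if part_count > 1 and part_size < MIN_PART_BYTES:
        raise ValueError("non-final parts must be at least 5 MB")
    parts = []
    for index in range(part_count):
        # The last part carries whatever remains and may be smaller.
        size = min(part_size, file_size - index * part_size)
        parts.append((index + 1, size))  # part numbers are 1-indexed
    return parts
```

For a 25 MB file split into 10 MB parts, this yields three parts, with the last one holding the remaining 5 MB.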

mark_complete()

Marks an upload process as complete.

Parameters:

upload_id (str) – A unique string identifier for the upload process.

Returns:

The upload object associated with the process, containing details of the file.

Return type:

Upload
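The manual flow implied by part() and mark_complete() can be sketched as follows. This is an illustration under stated assumptions rather than the library's own code: upload_in_parts and RecordingUpload are hypothetical names, RecordingUpload is a local stand-in that only records calls, and wrapping each chunk in io.BytesIO is an assumption about what part_data accepts.

```python
import io


def upload_in_parts(upload, data, part_size):
    """Send `data` (bytes) through upload.part() in order, then mark the upload complete."""
    part_number = 0
    for offset in range(0, len(data), part_size):
        part_number += 1  # part numbers are 1-indexed
        chunk = data[offset:offset + part_size]
        # Assumption: a BytesIO wrapper is an acceptable part_data value.
        upload.part(part_number, io.BytesIO(chunk))
    return upload.mark_complete()


class RecordingUpload:
    """Hypothetical stand-in for an Upload object; records calls for a local dry run."""

    def __init__(self):
        self.calls = []

    def part(self, part_number, part_data):
        self.calls.append((part_number, part_data.read()))

    def mark_complete(self):
        self.calls.append("mark_complete")
        return self


if __name__ == "__main__":
    fake = RecordingUpload()
    upload_in_parts(fake, b"abcdefghij", part_size=4)
    print(fake.calls)
```

With a real Upload object, the same function would be called with that object in place of the stand-in, using a part size that respects the limits documented for part().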

refresh()

Calls describe and refreshes the current object’s fields

Returns:

The current object

Return type:

Upload

describe()

Retrieves the current upload status (complete or inspecting) and the list of file parts uploaded for a specified dataset upload.

Parameters:

upload_id (str) – The unique ID associated with the file uploaded or being uploaded in parts.

Returns:

Details associated with the large dataset file uploaded in parts.

Return type:

Upload

upload_part(upload_args)

Uploads a file part.

Returns:

An UploadPart object that encapsulates the hash and the ETag of the uploaded part.

Return type:

UploadPart

upload_file(file, threads=10, chunksize=1024 * 1024 * 10, wait_timeout=600)

Uploads the file in the specified chunk size using the specified number of workers.

Parameters:
  • file (IOBase) – A BytesIO or StringIO object to upload to Abacus.AI.

  • threads (int) – The max number of workers to use while uploading the file

  • chunksize (int) – The number of bytes to use for each chunk while uploading the file. Defaults to 10 MB

  • wait_timeout (int) – The max number of seconds to wait for the file parts to be joined on Abacus.AI. Defaults to 600.

Returns:

The upload file object.

Return type:

Upload
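The chunk-and-worker pattern that upload_file describes can be sketched as below. This is not the library's implementation: iter_chunks, upload_chunks, and the send_part callback are hypothetical names, and the sketch only shows how a file-like object might be cut into 1-indexed chunks and handed to a pool of workers.

```python
import io
from concurrent.futures import ThreadPoolExecutor


def iter_chunks(file, chunksize):
    """Yield (part_number, chunk) pairs read sequentially from a file-like object."""
    part_number = 0
    while True:
        chunk = file.read(chunksize)
        if not chunk:  # empty read means end of file
            break
        part_number += 1  # part numbers are 1-indexed
        yield part_number, chunk


def upload_chunks(file, send_part, threads=10, chunksize=1024 * 1024 * 10):
    """Submit each chunk to send_part(part_number, chunk) on a thread pool.

    Results are returned in part order. Note that this simple version
    queues every chunk before waiting, so it holds the whole file in memory.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(send_part, number, chunk)
                   for number, chunk in iter_chunks(file, chunksize)]
        return [future.result() for future in futures]


if __name__ == "__main__":
    data = io.BytesIO(b"a" * 25)
    # send_part here just reports what it was given.
    print(upload_chunks(data, lambda n, c: (n, len(c)), threads=2, chunksize=10))
```

Reading happens on the calling thread, so chunks are numbered in file order even though the workers may finish out of order.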

_yield_upload_part(file, chunksize)
wait_for_join(timeout=600)

Waits until the uploaded parts are joined.

Parameters:

timeout (int) – The maximum number of seconds to wait for the call to finish. If it does not finish within this time, the call times out. Defaults to 600.
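A wait with a timeout is commonly built as a polling loop. The sketch below shows that general pattern only; it is not how wait_for_join is implemented. The names wait_until, get_status, is_finished, and poll_interval are hypothetical, and the caller decides which status values count as finished.

```python
import time


def wait_until(get_status, is_finished, timeout=600, poll_interval=5.0):
    """Poll get_status() until is_finished(status) is true or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if is_finished(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        # Sleep for the poll interval, but never past the deadline.
        remaining = max(0.0, deadline - time.monotonic())
        time.sleep(min(poll_interval, remaining))
```

Using time.monotonic() instead of time.time() keeps the deadline stable if the system clock is adjusted while waiting.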

get_status()

Gets the status of the upload.

Returns:

A string describing the status of the upload (pending, complete, etc.).

Return type:

str