abacusai.pipeline

Classes

Pipeline

A Pipeline For Steps.

Module Contents

class abacusai.pipeline.Pipeline(client, pipelineName=None, pipelineId=None, createdAt=None, notebookId=None, cron=None, nextRunTime=None, isProd=None, warning=None, createdBy=None, steps={}, pipelineReferences={}, latestPipelineVersion={}, codeSource={}, pipelineVariableMappings={})

Bases: abacusai.return_class.AbstractApiClass

A Pipeline For Steps.

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • pipelineName (str) – The name of the pipeline this step is a part of.

  • pipelineId (str) – The reference to the pipeline this step belongs to.

  • createdAt (str) – The date and time which the pipeline was created.

  • notebookId (str) – The reference to the notebook this pipeline belongs to.

  • cron (str) – A cron-style string that describes when this refresh policy is to be executed in UTC

  • nextRunTime (str) – The next time this pipeline will be run.

  • isProd (bool) – Whether this pipeline is a production pipeline.

  • warning (str) – Warning message for possible errors that might occur if the pipeline is run.

  • createdBy (str) – The email of the user who created the pipeline

  • steps (PipelineStep) – A list of the pipeline steps attached to the pipeline.

  • pipelineReferences (PipelineReference) – A list of references from the pipeline to other objects

  • latestPipelineVersion (PipelineVersion) – The latest version of the pipeline.

  • codeSource (CodeSource) – information on the source code

  • pipelineVariableMappings (PythonFunctionArgument) – A description of the function variables into the pipeline.

pipeline_name = None
pipeline_id = None
created_at = None
notebook_id = None
cron = None
next_run_time = None
is_prod = None
warning = None
created_by = None
steps
pipeline_references
latest_pipeline_version
code_source
pipeline_variable_mappings
deprecated_keys
__repr__()
to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

refresh()

Calls describe and refreshes the current object’s fields

Returns:

The current object

Return type:

Pipeline

describe()

Describes a given pipeline.

Parameters:

pipeline_id (str) – The ID of the pipeline to describe.

Returns:

An object describing a Pipeline

Return type:

Pipeline

update(project_id=None, pipeline_variable_mappings=None, cron=None, is_prod=None)

Updates a pipeline for executing multiple steps.

Parameters:
  • project_id (str) – A unique string identifier for the pipeline.

  • pipeline_variable_mappings (List) – List of Python function arguments for the pipeline.

  • cron (str) – A cron-like string specifying the frequency of the scheduled pipeline runs.

  • is_prod (bool) – Whether the pipeline is a production pipeline or not.

Returns:

An object that describes a Pipeline.

Return type:

Pipeline

rename(pipeline_name)

Renames a pipeline.

Parameters:

pipeline_name (str) – The new name of the pipeline.

Returns:

An object that describes a Pipeline.

Return type:

Pipeline

delete()

Deletes a pipeline.

Parameters:

pipeline_id (str) – The ID of the pipeline to delete.

list_versions(limit=200)

Lists the pipeline versions for a specified pipeline

Parameters:

limit (int) – The maximum number of pipeline versions to return.

Returns:

A list of pipeline versions.

Return type:

list[PipelineVersion]

run(pipeline_variable_mappings=None)

Runs a specified pipeline with the arguments provided.

Parameters:

pipeline_variable_mappings (List) – List of Python function arguments for the pipeline.

Returns:

The object describing the pipeline

Return type:

PipelineVersion

create_step(step_name, function_name=None, source_code=None, step_input_mappings=None, output_variable_mappings=None, step_dependencies=None, package_requirements=None, cpu_size=None, memory=None, timeout=None)

Creates a step in a given pipeline.

Parameters:
  • step_name (str) – The name of the step.

  • function_name (str) – The name of the Python function.

  • source_code (str) – Contents of a valid Python source code file. The source code should contain the transform feature group functions. A list of allowed imports and system libraries for each language is specified in the user functions documentation section.

  • step_input_mappings (List) – List of Python function arguments.

  • output_variable_mappings (List) – List of Python function outputs.

  • step_dependencies (list) – List of step names this step depends on.

  • package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].

  • cpu_size (str) – Size of the CPU for the step function.

  • memory (int) – Memory (in GB) for the step function.

  • timeout (int) – Timeout for the step in minutes, default is 300 minutes.

Returns:

Object describing the pipeline.

Return type:

Pipeline

describe_step_by_name(step_name)

Describes a pipeline step by the step name.

Parameters:

step_name (str) – The name of the step.

Returns:

An object describing the pipeline step.

Return type:

PipelineStep

unset_refresh_schedule()

Deletes the refresh schedule for a given pipeline.

Parameters:

pipeline_id (str) – The id of the pipeline.

Returns:

Object describing the pipeline.

Return type:

Pipeline

pause_refresh_schedule()

Pauses the refresh schedule for a given pipeline.

Parameters:

pipeline_id (str) – The id of the pipeline.

Returns:

Object describing the pipeline.

Return type:

Pipeline

resume_refresh_schedule()

Resumes the refresh schedule for a given pipeline.

Parameters:

pipeline_id (str) – The id of the pipeline.

Returns:

Object describing the pipeline.

Return type:

Pipeline

create_step_from_function(step_name, function, step_input_mappings=None, output_variable_mappings=None, step_dependencies=None, package_requirements=None, cpu_size=None, memory=None)

Creates a step in the pipeline from a python function.

Parameters:
  • step_name (str) – The name of the step.

  • function (callable) – The python function.

  • step_input_mappings (List[PythonFunctionArguments]) – List of Python function arguments.

  • output_variable_mappings (List[OutputVariableMapping]) – List of Python function ouputs.

  • step_dependencies (List[str]) – List of step names this step depends on.

  • package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].

  • cpu_size (str) – Size of the CPU for the step function.

  • memory (int) – Memory (in GB) for the step function.

wait_for_pipeline(timeout=1200)

A waiting call until all the stages of the latest pipeline version is completed.

Parameters:

timeout (int) – The waiting time given to the call to finish, if it doesn’t finish by the allocated time, the call is said to be timed out.

get_status()

Gets the status of the pipeline version.

Returns:

A string describing the status of a pipeline version (pending, running, complete, etc.).

Return type:

str