abacusai.feature_group_version

Classes

FeatureGroupVersion

A materialized version of a feature group

Module Contents

class abacusai.feature_group_version.FeatureGroupVersion(client, featureGroupVersion=None, featureGroupId=None, sql=None, sourceTables=None, sourceDatasetVersions=None, createdAt=None, status=None, error=None, deployable=None, cpuSize=None, memory=None, useOriginalCsvNames=None, pythonFunctionBindings=None, indexingConfigWarningMsg=None, materializationStartedAt=None, materializationCompletedAt=None, columns=None, templateBindings=None, features={}, pointInTimeGroups={}, codeSource={}, annotationConfig={}, indexingConfig={})

Bases: abacusai.return_class.AbstractApiClass

A materialized version of a feature group

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • featureGroupVersion (str) – The unique identifier for this materialized version of feature group.

  • featureGroupId (str) – The unique identifier of the feature group this version belongs to.

  • sql (str) – The SQL definition creating this feature group.

  • sourceTables (list[str]) – The source tables for this feature group.

  • sourceDatasetVersions (list[str]) – The dataset version ids for this feature group version.

  • createdAt (str) – The timestamp at which the feature group version was created.

  • status (str) – The current status of the feature group version.

  • error (str) – Relevant error if the status is FAILED.

  • deployable (bool) – Whether the feature group is deployable.

  • cpuSize (str) – CPU size specified for the Python feature group.

  • memory (int) – Memory in GB specified for the Python feature group.

  • useOriginalCsvNames (bool) – If true, the feature group will use the original column names in the source dataset.

  • pythonFunctionBindings (list) – Config specifying variable names, types, and values to use when resolving a Python feature group.

  • indexingConfigWarningMsg (str) – The warning message related to indexing keys.

  • materializationStartedAt (str) – The timestamp at which the feature group materialization started.

  • materializationCompletedAt (str) – The timestamp at which the feature group materialization completed.

  • columns (list[feature]) – List of resolved columns.

  • templateBindings (list) – Template variable bindings used for resolving the template.

  • features (Feature) – List of features.

  • pointInTimeGroups (PointInTimeGroup) – List of point-in-time groups.

  • codeSource (CodeSource) – If this is a Python feature group, information about its source code.

  • annotationConfig (AnnotationConfig) – The annotations config for the feature group.

  • indexingConfig (IndexingConfig) – The indexing config for the feature group.

feature_group_version
feature_group_id
sql
source_tables
source_dataset_versions
created_at
status
error
deployable
cpu_size
memory
use_original_csv_names
python_function_bindings
indexing_config_warning_msg
materialization_started_at
materialization_completed_at
columns
template_bindings
features
point_in_time_groups
code_source
annotation_config
indexing_config
deprecated_keys
__repr__()
to_dict()

Get a dict representation of the parameters in this class.

Returns:

The dict value representation of the class parameters

Return type:

dict

create_snapshot_feature_group(table_name)

Creates a Snapshot Feature Group corresponding to a specific Feature Group version.

Parameters:

table_name (str) – Name for the newly created Snapshot Feature Group table. Can be up to 120 characters long and can only contain alphanumeric characters and underscores.

Returns:

Feature Group corresponding to the newly created Snapshot.

Return type:

FeatureGroup
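
The naming rule for table_name can be checked locally before the call is made. The following is a minimal sketch, not part of the library: the helper name is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits and that an empty name is not allowed.

```python
import re

# Rule taken from the table_name description: at most 120 characters,
# alphanumeric characters and underscores only.
# Assumptions: ASCII letters/digits only, and at least one character.
_TABLE_NAME_RE = re.compile(r"[A-Za-z0-9_]{1,120}")


def is_valid_snapshot_table_name(table_name: str) -> bool:
    """Hypothetical helper: check a snapshot table name before calling the API."""
    return _TABLE_NAME_RE.fullmatch(table_name) is not None


# Intended use, where `version` is a FeatureGroupVersion obtained elsewhere:
#     if is_valid_snapshot_table_name(name):
#         snapshot = version.create_snapshot_feature_group(name)
```

The service remains the authority on which names it accepts; a local check only catches obvious mistakes early.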

export_to_file_connector(location, export_file_format, overwrite=False)

Exports the feature group version to a file connector.

Parameters:
  • location (str) – Cloud file location to export to.

  • export_file_format (str) – Enum string specifying the file format to export to.

  • overwrite (bool) – If true and a file exists at this location, this process will overwrite the file.

Returns:

The FeatureGroupExport instance.

Return type:

FeatureGroupExport

export_to_database_connector(database_connector_id, object_name, write_mode, database_feature_mapping, id_column=None, additional_id_columns=None)

Exports the feature group version to a database connector.

Parameters:
  • database_connector_id (str) – Unique string identifier for the Database Connector to export to.

  • object_name (str) – Name of the database object to write to.

  • write_mode (str) – Enum string indicating whether to use INSERT or UPSERT.

  • database_feature_mapping (dict) – Key/value pair JSON object of “database connector column” -> “feature name” pairs.

  • id_column (str) – Required if write_mode is UPSERT. Indicates which database column should be used as the lookup key.

  • additional_id_columns (list) – For database connectors which support it, additional ID columns to use as a complex key for upserting.

Returns:

The FeatureGroupExport instance.

Return type:

FeatureGroupExport
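
The dependency between write_mode and id_column is easy to miss, so it can help to assemble and check the arguments before exporting. This is a sketch under stated assumptions: the helper is hypothetical, and the enum strings are assumed to be the upper-case "INSERT" and "UPSERT" as written in the parameter description.

```python
from typing import Any, Dict, List, Optional


def build_database_export_args(
    database_connector_id: str,
    object_name: str,
    write_mode: str,
    database_feature_mapping: Dict[str, str],
    id_column: Optional[str] = None,
    additional_id_columns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate and collect export keyword arguments."""
    # Assumed enum spellings; the text above only names INSERT and UPSERT.
    if write_mode not in ("INSERT", "UPSERT"):
        raise ValueError(f"write_mode must be INSERT or UPSERT, got {write_mode!r}")
    # Documented rule: id_column is required when write_mode is UPSERT.
    if write_mode == "UPSERT" and not id_column:
        raise ValueError("id_column is required when write_mode is UPSERT")
    return {
        "database_connector_id": database_connector_id,
        "object_name": object_name,
        "write_mode": write_mode,
        # Keys are database connector columns, values are feature names.
        "database_feature_mapping": database_feature_mapping,
        "id_column": id_column,
        "additional_id_columns": additional_id_columns,
    }


# Intended use, where `version` is a FeatureGroupVersion obtained elsewhere:
#     export = version.export_to_database_connector(**build_database_export_args(...))
```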

export_to_console(export_file_format)

Exports the feature group version to the console.

Parameters:

export_file_format (str) – File format to export to.

Returns:

The FeatureGroupExport instance.

Return type:

FeatureGroupExport

delete()

Deletes a Feature Group Version.

Parameters:

feature_group_version (str) – String identifier for the feature group version to be removed. The method takes no arguments; the identifier is taken from this object.

get_materialization_logs(stdout=False, stderr=False)

Returns logs for a materialized feature group version.

Parameters:
  • stdout (bool) – Set to True to get info logs.

  • stderr (bool) – Set to True to get error logs.

Returns:

A function logs object.

Return type:

FunctionLogs

refresh()

Calls describe and refreshes the current object’s fields.

Returns:

The current object

Return type:

FeatureGroupVersion

describe()

Describe a feature group version.

Parameters:

feature_group_version (str) – The unique identifier associated with the feature group version. The method takes no arguments; the identifier is taken from this object.

Returns:

The feature group version.

Return type:

FeatureGroupVersion

get_metrics(selected_columns=None, include_charts=False, include_statistics=True)

Get metrics for a specific feature group version.

Parameters:
  • selected_columns (List) – A list of columns to order first.

  • include_charts (bool) – A flag indicating whether charts should be included in the response. Default is false.

  • include_statistics (bool) – A flag indicating whether statistics should be included in the response. Default is true.

Returns:

The metrics for the specified feature group version.

Return type:

DataMetrics

get_logs()

Retrieves the feature group materialization logs.

Parameters:

feature_group_version (str) – The unique version ID of the feature group version. The method takes no arguments; the identifier is taken from this object.

Returns:

The logs for the specified feature group version.

Return type:

FeatureGroupVersionLogs

wait_for_results(timeout=3600)

A waiting call until the feature group version is materialized.

Parameters:

timeout (int) – The maximum time to wait for the call to finish. If it does not finish within the allotted time, the call times out.

wait_for_materialization(timeout=3600)

A waiting call until the feature group version is materialized.

Parameters:

timeout (int) – The maximum time to wait for the call to finish. If it does not finish within the allotted time, the call times out.

get_status()

Gets the status of the feature group version.

Returns:

A string describing the status of a feature group version (pending, complete, etc.).

Return type:

str
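
The waiting calls above can be pictured as a polling loop around get_status(). The sketch below illustrates that general pattern only and is not the library's implementation. The helper name, the poll interval, the terminal status strings, and the treatment of timeout as seconds are all assumptions.

```python
import time
from typing import Callable, Iterable


def poll_until_terminal(
    get_status: Callable[[], str],
    timeout: float = 3600,
    poll_interval: float = 5.0,
    terminal_statuses: Iterable[str] = ("COMPLETE", "FAILED"),
) -> str:
    """Hypothetical helper: call get_status() until a terminal status appears.

    The terminal status strings are assumptions; the text above only mentions
    "pending, complete, etc." and a FAILED status. `timeout` is in seconds here.
    """
    terminal = {s.upper() for s in terminal_statuses}
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.upper() in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status is still {status!r} after {timeout} seconds")
        time.sleep(poll_interval)


# Intended use, where `version` is a FeatureGroupVersion obtained elsewhere:
#     final_status = poll_until_terminal(version.get_status, timeout=600)
```

In most cases calling wait_for_materialization directly is simpler; a hand-written loop is mainly useful when custom logging or a different poll interval is needed.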

_download_avro_file(file_part, tmp_dir, part_index)
load_as_pandas(max_workers=10)

Loads the feature group version into a pandas dataframe.

Parameters:

max_workers (int) – The number of threads.

Returns:

A pandas dataframe displaying the data in the feature group version.

Return type:

DataFrame
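
Loading data usually only makes sense once materialization has succeeded. The following sketch chains the documented calls wait_for_materialization, refresh, and load_as_pandas. The helper itself is hypothetical, and the success status string "COMPLETE" is an assumption not confirmed by this page.

```python
from typing import Any

SUCCESS_STATUS = "COMPLETE"  # assumed status string, not confirmed by this page


def load_when_materialized(version: Any, timeout: int = 3600, max_workers: int = 10) -> Any:
    """Hypothetical helper: wait, re-check the status, then load the data.

    `version` is expected to behave like a FeatureGroupVersion.
    """
    version.wait_for_materialization(timeout=timeout)
    version.refresh()  # re-read the object's fields, including status and error
    if str(version.status).upper() != SUCCESS_STATUS:
        raise RuntimeError(
            f"materialization ended with status {version.status!r}: {version.error}"
        )
    return version.load_as_pandas(max_workers=max_workers)
```

Raising on a non-success status surfaces the error field immediately instead of letting the load call fail later for a less obvious reason.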

load_as_pandas_documents(doc_id_column, document_column, max_workers=10)

Loads a feature group with documents data into a pandas dataframe.

Parameters:
  • doc_id_column (str) – The name of the feature / column containing the document ID.

  • document_column (str) – The name of the feature / column that contains either the document data itself or page infos with paths to remotely stored documents. This column will be replaced with the extracted document data.

  • max_workers (int) – The number of threads.

Returns:

A pandas dataframe containing the extracted document data.

Return type:

DataFrame