abacusai.feature_group_version
Classes
A materialized version of a feature group
Module Contents
- class abacusai.feature_group_version.FeatureGroupVersion(client, featureGroupVersion=None, featureGroupId=None, sql=None, sourceTables=None, sourceDatasetVersions=None, createdAt=None, status=None, error=None, deployable=None, cpuSize=None, memory=None, useOriginalCsvNames=None, pythonFunctionBindings=None, indexingConfigWarningMsg=None, materializationStartedAt=None, materializationCompletedAt=None, columns=None, templateBindings=None, features={}, pointInTimeGroups={}, codeSource={}, annotationConfig={}, indexingConfig={})
Bases:
abacusai.return_class.AbstractApiClass
A materialized version of a feature group
- Parameters:
client (ApiClient) – An authenticated API Client instance
featureGroupVersion (str) – The unique identifier for this materialized version of feature group.
featureGroupId (str) – The unique identifier of the feature group this version belongs to.
sql (str) – The SQL definition creating this feature group.
sourceTables (list[str]) – The source tables for this feature group.
sourceDatasetVersions (list[str]) – The dataset version ids for this feature group version.
createdAt (str) – The timestamp at which the feature group version was created.
status (str) – The current status of the feature group version.
error (str) – Relevant error if the status is FAILED.
deployable (bool) – Whether the feature group is deployable.
cpuSize (str) – CPU size specified for the Python feature group.
memory (int) – Memory in GB specified for the python feature group.
useOriginalCsvNames (bool) – If true, the feature group will use the original column names in the source dataset.
pythonFunctionBindings (list) – Config specifying variable names, types, and values to use when resolving a Python feature group.
indexingConfigWarningMsg (str) – The warning message related to indexing keys.
materializationStartedAt (str) – The timestamp at which the feature group materialization started.
materializationCompletedAt (str) – The timestamp at which the feature group materialization completed.
columns (list[feature]) – List of resolved columns.
templateBindings (list) – Template variable bindings used for resolving the template.
features (Feature) – List of features.
pointInTimeGroups (PointInTimeGroup) – List of point-in-time groups.
codeSource (CodeSource) – For a Python feature group, information on the source code.
annotationConfig (AnnotationConfig) – The annotations config for the feature group.
indexingConfig (IndexingConfig) – The indexing config for the feature group.
- feature_group_version
- feature_group_id
- sql
- source_tables
- source_dataset_versions
- created_at
- status
- error
- deployable
- cpu_size
- memory
- use_original_csv_names
- python_function_bindings
- indexing_config_warning_msg
- materialization_started_at
- materialization_completed_at
- columns
- template_bindings
- features
- point_in_time_groups
- code_source
- annotation_config
- indexing_config
- deprecated_keys
- __repr__()
- to_dict()
Gets a dict representation of the parameters in this class.
- Returns:
The dict representation of the class parameters.
- Return type:
dict
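The constructor parameters listed above are camelCase, while the attributes on the instance are snake_case. The naming correspondence can be sketched as below. `camel_to_snake` is a hypothetical helper written for illustration; it is not part of the abacusai package, and the library's own conversion may differ.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase name such as 'featureGroupId' to 'feature_group_id'."""
    # Insert an underscore before each uppercase letter that is not at the
    # start of the string, then lowercase the whole result.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


# A few constructor keys taken from the signature above.
constructor_keys = [
    "featureGroupVersion",
    "sourceDatasetVersions",
    "cpuSize",
    "useOriginalCsvNames",
    "indexingConfigWarningMsg",
]
attribute_names = [camel_to_snake(key) for key in constructor_keys]
```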
- create_snapshot_feature_group(table_name)
Creates a Snapshot Feature Group corresponding to a specific Feature Group version.
- Parameters:
table_name (str) – Name for the newly created Snapshot Feature Group table. Can be up to 120 characters long and can only contain alphanumeric characters and underscores.
- Returns:
Feature Group corresponding to the newly created Snapshot.
- Return type:
FeatureGroup
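The `table_name` constraint (at most 120 characters, only alphanumeric characters and underscores) can be checked locally before calling `create_snapshot_feature_group`. The sketch below assumes "alphanumeric" means ASCII letters and digits; `is_valid_snapshot_table_name` is a hypothetical helper, and the server may apply further rules that are not documented here.

```python
import re

# Assumption: ASCII letters, digits and underscores only, 1 to 120 characters.
_TABLE_NAME_RE = re.compile(r"[A-Za-z0-9_]{1,120}")


def is_valid_snapshot_table_name(table_name: str) -> bool:
    """Return True if table_name satisfies the documented constraints."""
    return _TABLE_NAME_RE.fullmatch(table_name) is not None
```

A caller might guard the API call with `if is_valid_snapshot_table_name(name): version.create_snapshot_feature_group(name)`.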
- export_to_file_connector(location, export_file_format, overwrite=False)
Export Feature group to File Connector.
- Parameters:
- Returns:
The FeatureGroupExport instance.
- Return type:
FeatureGroupExport
- export_to_database_connector(database_connector_id, object_name, write_mode, database_feature_mapping, id_column=None, additional_id_columns=None)
Export Feature group to Database Connector.
- Parameters:
database_connector_id (str) – Unique string identifier for the Database Connector to export to.
object_name (str) – Name of the database object to write to.
write_mode (str) – Enum string indicating whether to use INSERT or UPSERT.
database_feature_mapping (dict) – Key/value pair JSON object of “database connector column” -> “feature name” pairs.
id_column (str) – Required if write_mode is UPSERT. Indicates which database column should be used as the lookup key.
additional_id_columns (list) – For database connectors which support it, additional ID columns to use as a complex key for upserting.
- Returns:
The FeatureGroupExport instance.
- Return type:
FeatureGroupExport
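The parameter rules for `export_to_database_connector` (in particular, that `id_column` is required when `write_mode` is UPSERT) can be enforced before the call. The following is a minimal sketch; `build_database_export_kwargs` is a hypothetical helper, and the connector ID, object name and column names in the usage note are placeholders.

```python
from typing import Any, Dict, List, Optional


def build_database_export_kwargs(
    database_connector_id: str,
    object_name: str,
    write_mode: str,
    database_feature_mapping: Dict[str, str],
    id_column: Optional[str] = None,
    additional_id_columns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Validate the documented rules and return keyword arguments for the export call."""
    mode = write_mode.upper()
    if mode not in ("INSERT", "UPSERT"):
        raise ValueError("write_mode must be INSERT or UPSERT")
    # The reference states that id_column is required for UPSERT.
    if mode == "UPSERT" and not id_column:
        raise ValueError("id_column is required when write_mode is UPSERT")

    kwargs: Dict[str, Any] = {
        "database_connector_id": database_connector_id,
        "object_name": object_name,
        "write_mode": mode,
        # Mapping of database connector column -> feature name.
        "database_feature_mapping": dict(database_feature_mapping),
    }
    if id_column is not None:
        kwargs["id_column"] = id_column
    if additional_id_columns:
        kwargs["additional_id_columns"] = list(additional_id_columns)
    return kwargs
```

The result could then be passed as `version.export_to_database_connector(**kwargs)`, where `version` is a `FeatureGroupVersion` instance.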
- export_to_console(export_file_format)
Export Feature group to console.
- Parameters:
export_file_format (str) – File format to export to.
- Returns:
The FeatureGroupExport instance.
- Return type:
FeatureGroupExport
- delete()
Deletes a Feature Group Version.
- Parameters:
feature_group_version (str) – String identifier for the feature group version to be removed.
- get_materialization_logs(stdout=False, stderr=False)
Returns logs for a materialized feature group version.
- Parameters:
- Returns:
A function logs object.
- Return type:
- refresh()
Calls describe and refreshes the current object’s fields.
- Returns:
The current object
- Return type:
FeatureGroupVersion
- describe()
Describe a feature group version.
- Parameters:
feature_group_version (str) – The unique identifier associated with the feature group version.
- Returns:
The feature group version.
- Return type:
FeatureGroupVersion
- get_metrics(selected_columns=None, include_charts=False, include_statistics=True)
Get metrics for a specific feature group version.
- Parameters:
- Returns:
The metrics for the specified feature group version.
- Return type:
- get_logs()
Retrieves the feature group materialization logs.
- Parameters:
feature_group_version (str) – The unique version ID of the feature group version.
- Returns:
The logs for the specified feature group version.
- Return type:
- wait_for_results(timeout=3600)
Blocks until the feature group version is materialized.
- Parameters:
timeout (int) – The maximum time to wait for the call to finish. If it does not finish within the allotted time, the call times out.
- wait_for_materialization(timeout=3600)
Blocks until the feature group version is materialized.
- Parameters:
timeout (int) – The maximum time to wait for the call to finish. If it does not finish within the allotted time, the call times out.
- get_status()
Gets the status of the feature group version.
- Returns:
A string describing the status of a feature group version (pending, complete, etc.).
- Return type:
str
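The waiting calls above amount to polling `get_status()` until the version leaves an in-progress state or a timeout elapses. The loop below is a sketch of that idea, not the library's implementation. The in-progress status names, the poll interval and the `wait_until_done` helper are all assumptions; only `get_status()` comes from this reference.

```python
import time
from typing import Callable, Iterable


def wait_until_done(
    version,
    timeout: float = 3600,
    poll_interval: float = 5.0,
    in_progress: Iterable[str] = ("PENDING", "GENERATING"),  # assumed names
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll version.get_status() until it is no longer in progress.

    Returns the final status string, or raises TimeoutError once the
    deadline has passed while the status is still in progress.
    """
    in_progress = set(in_progress)
    deadline = clock() + timeout
    while True:
        status = version.get_status()
        if status not in in_progress:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status} after {timeout} seconds")
        sleep(poll_interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.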
- _download_avro_file(file_part, tmp_dir, part_index)
- load_as_pandas(max_workers=10)
Loads the feature group version into a pandas dataframe.
- Parameters:
max_workers (int) – The number of threads.
- Returns:
A pandas dataframe displaying the data in the feature group version.
- Return type:
DataFrame
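A typical sequence is to wait for materialization, refresh the object, check for failure, and only then load the data. The sketch below combines the methods documented on this page. `load_when_ready` is a hypothetical wrapper, and it assumes that the `status` and `error` attributes are current after `refresh()`.

```python
def load_when_ready(version, timeout: int = 3600, max_workers: int = 10):
    """Wait for materialization, then load the version into a pandas DataFrame.

    `version` is expected to behave like a FeatureGroupVersion instance.
    """
    # Block until materialization finishes or the timeout is reached.
    version.wait_for_materialization(timeout=timeout)
    # Re-fetch fields so that status and error reflect the latest state.
    version.refresh()
    # The reference states that `error` is populated when the status is FAILED.
    if version.status == "FAILED":
        raise RuntimeError(f"Materialization failed: {version.error}")
    return version.load_as_pandas(max_workers=max_workers)
```

Raising on a failed status keeps a load attempt from running against a version that has no materialized data.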
- load_as_pandas_documents(doc_id_column, document_column, max_workers=10)
Loads a feature group with documents data into a pandas dataframe.
- Parameters:
doc_id_column (str) – The name of the feature / column containing the document ID.
document_column (str) – The name of the feature / column which either contains the document data itself or page infos with path to remotely stored documents. This column will be replaced with the extracted document data.
max_workers (int) – The number of threads.
- Returns:
A pandas dataframe containing the extracted document data.
- Return type:
DataFrame