abacusai.feature_group

Classes

FeatureGroup

A feature group.

Module Contents

class abacusai.feature_group.FeatureGroup(client, featureGroupId=None, modificationLock=None, name=None, featureGroupSourceType=None, tableName=None, sql=None, datasetId=None, functionSourceCode=None, functionName=None, sourceTables=None, createdAt=None, description=None, sqlError=None, latestVersionOutdated=None, referencedFeatureGroups=None, tags=None, primaryKey=None, updateTimestampKey=None, lookupKeys=None, streamingEnabled=None, incremental=None, mergeConfig=None, samplingConfig=None, cpuSize=None, memory=None, streamingReady=None, featureTags=None, moduleName=None, templateBindings=None, featureExpression=None, useOriginalCsvNames=None, pythonFunctionBindings=None, pythonFunctionName=None, useGpu=None, versionLimit=None, exportOnMaterialization=None, features={}, duplicateFeatures={}, pointInTimeGroups={}, annotationConfig={}, concatenationConfig={}, indexingConfig={}, codeSource={}, featureGroupTemplate={}, explanation={}, refreshSchedules={}, exportConnectorConfig={}, latestFeatureGroupVersion={}, operatorConfig={})

Bases: abacusai.return_class.AbstractApiClass

A feature group.

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • featureGroupId (str) – Unique identifier for this feature group.

  • modificationLock (bool) – Whether the feature group is locked against changes.

  • name (str)

  • featureGroupSourceType (str) – The source type of the feature group

  • tableName (str) – Unique table name of this feature group.

  • sql (str) – SQL definition creating this feature group.

  • datasetId (str) – Dataset ID the feature group is sourced from.

  • functionSourceCode (str) – Source definition creating this feature group.

  • functionName (str) – Function name to execute from the source code.

  • sourceTables (list[str]) – Source tables for this feature group.

  • createdAt (str) – Timestamp at which the feature group was created.

  • description (str) – Description of the feature group.

  • sqlError (str) – Error message with this feature group.

  • latestVersionOutdated (bool) – Whether the latest materialized feature group version is outdated.

  • referencedFeatureGroups (list[str]) – Feature groups this feature group is used in.

  • tags (list[str]) – Tags added to this feature group.

  • primaryKey (str) – Primary index feature.

  • updateTimestampKey (str) – Primary timestamp feature.

  • lookupKeys (list[str]) – Additional indexed features for this feature group.

  • streamingEnabled (bool) – If true, the feature group can have data streamed to it.

  • incremental (bool) – If feature group corresponds to an incremental dataset.

  • mergeConfig (dict) – Merge configuration settings for the feature group.

  • samplingConfig (dict) – Sampling configuration for the feature group.

  • cpuSize (str) – CPU size specified for the Python feature group.

  • memory (int) – Memory in GB specified for the Python feature group.

  • streamingReady (bool) – If true, the feature group is ready to receive streaming data.

  • featureTags (dict) – Tags for features in this feature group

  • moduleName (str) – Path to the file with the feature group function.

  • templateBindings (dict) – Config specifying variable names and values to use when resolving a feature group template.

  • featureExpression (str) – If the dataset feature group has custom features, the SQL select expression creating those features.

  • useOriginalCsvNames (bool) – If true, the feature group will use the original column names in the source dataset.

  • pythonFunctionBindings (dict) – Config specifying variable names, types, and values to use when resolving a Python feature group.

  • pythonFunctionName (str) – Name of the Python function the feature group was built from.

  • useGpu (bool) – Whether this feature group uses a GPU.

  • versionLimit (int) – Version limit for the feature group.

  • exportOnMaterialization (bool) – Whether to export the feature group on materialization.

  • features (Feature) – List of resolved features.

  • duplicateFeatures (Feature) – List of duplicate features.

  • pointInTimeGroups (PointInTimeGroup) – List of Point In Time Groups.

  • annotationConfig (AnnotationConfig) – Annotation config for this feature group.

  • latestFeatureGroupVersion (FeatureGroupVersion) – Latest feature group version.

  • concatenationConfig (ConcatenationConfig) – Feature group ID whose data will be concatenated into this feature group.

  • indexingConfig (IndexingConfig) – Indexing config for the feature group for feature store

  • codeSource (CodeSource) – If a Python feature group, information on the source code.

  • featureGroupTemplate (FeatureGroupTemplate) – FeatureGroupTemplate to use when this feature group is attached to a template.

  • explanation (NaturalLanguageExplanation) – Natural language explanation of the feature group

  • refreshSchedules (RefreshSchedule) – List of schedules that determine when the next version of the feature group will be created.

  • exportConnectorConfig (FeatureGroupRefreshExportConfig) – The export config (file connector or database connector information) for feature group exports.

  • operatorConfig (OperatorConfig) – Operator configuration settings for the feature group.

feature_group_id
modification_lock
name
feature_group_source_type
table_name
sql
dataset_id
function_source_code
function_name
source_tables
created_at
description
sql_error
latest_version_outdated
referenced_feature_groups
tags
primary_key
update_timestamp_key
lookup_keys
streaming_enabled
incremental
merge_config
sampling_config
cpu_size
memory
streaming_ready
feature_tags
module_name
template_bindings
feature_expression
use_original_csv_names
python_function_bindings
python_function_name
use_gpu
version_limit
export_on_materialization
features
duplicate_features
point_in_time_groups
annotation_config
concatenation_config
indexing_config
code_source
feature_group_template
explanation
refresh_schedules
export_connector_config
latest_feature_group_version
operator_config
deprecated_keys
__repr__()
to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

add_to_project(project_id, feature_group_type='CUSTOM_TABLE')

Adds a feature group to a project.

Parameters:
  • project_id (str) – The unique ID associated with the project.

  • feature_group_type (str) – The feature group type of the feature group, based on the use case under which the feature group is being created.

set_project_config(project_id, project_config=None)

Sets a feature group’s project config

Parameters:
  • project_id (str) – Unique string identifier for the project.

  • project_config (ProjectFeatureGroupConfig) – Feature group’s project configuration.

get_project_config(project_id)

Gets a feature group’s project config

Parameters:

project_id (str) – Unique string identifier for the project.

Returns:

The feature group’s project configuration.

Return type:

ProjectConfig

remove_from_project(project_id)

Removes a feature group from a project.

Parameters:

project_id (str) – The unique ID associated with the project.

set_type(project_id, feature_group_type='CUSTOM_TABLE')

Update the feature group type in a project. The feature group must already be added to the project.

Parameters:
  • project_id (str) – Unique identifier associated with the project.

  • feature_group_type (str) – The feature group type to set the feature group as.

describe_annotation(feature_name=None, doc_id=None, feature_group_row_identifier=None)

Get the latest annotation entry for a given feature group, feature, and document.

Parameters:
  • feature_name (str) – The name of the feature the annotation is on.

  • doc_id (str) – The ID of the primary document the annotation is on. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.

  • feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the feature group’s primary / identifier key value. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.

Returns:

The latest annotation entry for the given feature group, feature, document, and/or annotation key value.

Return type:

AnnotationEntry

verify_and_describe_annotation(feature_name=None, doc_id=None, feature_group_row_identifier=None)

Get the latest annotation entry for a given feature group, feature, and document along with verification information.

Parameters:
  • feature_name (str) – The name of the feature the annotation is on.

  • doc_id (str) – The ID of the primary document the annotation is on. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.

  • feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the feature group’s primary / identifier key value. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.

Returns:

The latest annotation entry for the given feature group, feature, document, and/or annotation key value. Includes the verification information.

Return type:

AnnotationEntry

update_annotation_status(feature_name, status, doc_id=None, feature_group_row_identifier=None, save_metadata=False)

Update the status of an annotation entry.

Parameters:
  • feature_name (str) – The name of the feature the annotation is on.

  • status (str) – The new status of the annotation. Must be one of the following: ‘TODO’, ‘IN_PROGRESS’, ‘DONE’.

  • doc_id (str) – The ID of the primary document the annotation is on. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.

  • feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the feature group’s primary / identifier key value. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.

  • save_metadata (bool) – If True, save the metadata for the annotation entry.

Returns:

The updated annotation entry.

Return type:

AnnotationEntry
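The two constraints stated above (a fixed set of statuses, and at least one of doc_id or feature_group_row_identifier) can be checked on the client before making the call. The following is a minimal sketch, assuming feature_group is a FeatureGroup obtained elsewhere. The helper name mark_annotation and the choice to raise ValueError are illustrative and not part of the library.

```python
VALID_STATUSES = {"TODO", "IN_PROGRESS", "DONE"}  # values listed for `status` above


def mark_annotation(feature_group, feature_name, status,
                    doc_id=None, feature_group_row_identifier=None):
    """Hypothetical helper: validate the arguments, then update the annotation status."""
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}, got {status!r}")
    if doc_id is None and feature_group_row_identifier is None:
        raise ValueError("provide doc_id or feature_group_row_identifier")
    # Delegates to the documented method and returns its result unchanged.
    return feature_group.update_annotation_status(
        feature_name,
        status,
        doc_id=doc_id,
        feature_group_row_identifier=feature_group_row_identifier,
    )
```

Failing early this way avoids a round trip for requests that cannot identify an annotation.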

get_document_to_annotate(project_id, feature_name, feature_group_row_identifier=None, get_previous=False)

Get an available document that needs to be annotated for an annotation feature group.

Parameters:
  • project_id (str) – The ID of the project that the annotation is associated with.

  • feature_name (str) – The name of the feature the annotation is on.

  • feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the primary key value. If provided, fetch the immediate next (or previous) available document.

  • get_previous (bool) – If True, get the previous document instead of the next document. Applicable if feature_group_row_identifier is provided.

Returns:

The document to annotate.

Return type:

AnnotationDocument

get_annotations_status(feature_name=None, check_for_materialization=False)

Get the status of the annotations for a given feature group and feature.

Parameters:
  • feature_name (str) – The name of the feature the annotation is on.

  • check_for_materialization (bool) – If True, check if the feature group needs to be materialized before using for annotations.

Returns:

The status of the annotations for the given feature group and feature.

Return type:

AnnotationsStatus

import_annotation_labels(file, annotation_type)

Imports annotation labels from a CSV file. All valid values in the file will be imported as labels (including header row if present).

Parameters:
  • file (io.TextIOBase) – The file to import. Must be a CSV file.

  • annotation_type (str) – The type of the annotation.

Returns:

The annotation config for the feature group.

Return type:

AnnotationConfig
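Because the file argument is typed as io.TextIOBase, an in-memory text buffer can stand in for a file on disk. The sketch below builds such a buffer with one label per row and no header row, since a header would be imported as a label. The one-label-per-row layout and the helper name labels_to_csv_file are assumptions made for illustration.

```python
import csv
import io


def labels_to_csv_file(labels):
    """Build an in-memory CSV text file with one label per row and no header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for label in labels:
        writer.writerow([label])
    buffer.seek(0)  # rewind so the consumer reads from the start
    return buffer
```

The returned object could then be passed as the file argument, for example feature_group.import_annotation_labels(labels_to_csv_file(labels), annotation_type), with annotation_type chosen by the caller.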

create_sampling(table_name, sampling_config, description=None)

Creates a new Feature Group defined as a sample of rows from another Feature Group.

For efficiency, sampling is approximate unless otherwise specified (e.g., the number of rows may vary slightly from what was requested).

Parameters:
  • table_name (str) – The unique name to be given to this sampling Feature Group. Can be up to 120 characters long and can only contain alphanumeric characters and underscores.

  • sampling_config (SamplingConfig) – Dictionary defining the sampling method and its parameters.

  • description (str) – A human-readable description of this Feature Group.

Returns:

The created Feature Group.

Return type:

FeatureGroup
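The table_name rule above (at most 120 characters, alphanumeric characters and underscores only) is easy to check before sending a request. This sketch assumes "alphanumeric" means ASCII letters and digits and that an empty name is not allowed. The helper name is_valid_table_name is illustrative.

```python
import re

# Up to 120 characters; ASCII letters, digits, and underscores only (assumed reading).
_TABLE_NAME_RE = re.compile(r"[A-Za-z0-9_]{1,120}")


def is_valid_table_name(table_name):
    """Return True if `table_name` satisfies the documented naming rule."""
    return _TABLE_NAME_RE.fullmatch(table_name) is not None
```

The same rule is stated for the table_name parameters of the nested feature methods, so the check can be reused there.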

set_sampling_config(sampling_config)

Set a FeatureGroup’s sampling to the config values provided, so that the rows the FeatureGroup returns will be a sample of those it would otherwise have returned.

Parameters:

sampling_config (SamplingConfig) – A JSON string object specifying the sampling method and parameters specific to that sampling method. An empty sampling_config indicates no sampling.

Returns:

The updated FeatureGroup.

Return type:

FeatureGroup

set_merge_config(merge_config)

Set a MergeFeatureGroup’s merge config to the values provided, so that the feature group only returns a bounded range of an incremental dataset.

Parameters:

merge_config (MergeConfig) – JSON object string specifying the merge rule. An empty merge_config will default to only including the latest dataset version.

Returns:

The updated FeatureGroup.

Return type:

FeatureGroup

set_operator_config(operator_config)

Set an OperatorFeatureGroup’s operator config to the values provided.

Parameters:

operator_config (OperatorConfig) – A dictionary object specifying the pre-defined operations.

Returns:

The updated FeatureGroup.

Return type:

FeatureGroup

set_schema(schema)

Creates a new schema and points the feature group to the new feature group schema ID.

Parameters:

schema (list) – JSON string containing an array of objects with ‘name’ and ‘dataType’ properties.
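The schema parameter is typed as a list, while its description mentions a JSON string, so the sketch below builds the list of objects and also shows its serialized form. The helper name build_schema, the column names, and the data type strings are placeholders, not values confirmed by this reference.

```python
import json


def build_schema(column_types):
    """Turn {"column": "TYPE"} into a list of objects with 'name' and 'dataType' keys."""
    return [{"name": name, "dataType": data_type} for name, data_type in column_types.items()]


# Column names and type strings here are placeholders for illustration only.
schema = build_schema({"user_id": "STRING", "amount": "FLOAT"})
schema_json = json.dumps(schema)  # serialized form, in case a JSON string is expected
```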

get_schema(project_id=None)

Returns a schema for a given FeatureGroup in a project.

Parameters:

project_id (str) – The unique ID associated with the project.

Returns:

A list of objects for each column in the specified feature group.

Return type:

list[Feature]

create_feature(name, select_expression)

Creates a new feature in a Feature Group from a SQL select statement.

Parameters:
  • name (str) – The name of the feature to add.

  • select_expression (str) – SQL SELECT expression to create the feature.

Returns:

A Feature Group object with the newly added feature.

Return type:

FeatureGroup
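Since create_feature returns a FeatureGroup that includes the new feature, several features can be added in sequence by carrying the returned object forward. A minimal sketch, assuming feature_group is an existing FeatureGroup. The helper name add_features is hypothetical.

```python
def add_features(feature_group, select_expressions):
    """Hypothetical helper: add one feature per (name, SQL select expression) pair."""
    for name, select_expression in select_expressions.items():
        # create_feature is documented to return a FeatureGroup with the new feature,
        # so the returned object is carried into the next iteration.
        feature_group = feature_group.create_feature(name, select_expression)
    return feature_group
```

A call might look like add_features(feature_group, {"order_total": "price * quantity"}), where the feature and column names are made up for the example.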

add_tag(tag)

Adds a tag to the feature group

Parameters:

tag (str) – The tag to add to the feature group.

remove_tag(tag)

Removes a tag from the specified feature group.

Parameters:

tag (str) – The tag to remove from the feature group.
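add_tag and remove_tag each handle one tag at a time, so bringing a feature group to a desired set of tags means computing the difference against its current tags attribute. The sketch below does that. The helper name sync_tags is hypothetical, and it assumes the tags attribute is up to date (calling refresh() first may be advisable).

```python
def sync_tags(feature_group, desired_tags):
    """Hypothetical helper: add and remove tags so the group ends up with `desired_tags`."""
    current = set(feature_group.tags or [])
    desired = set(desired_tags)
    to_add = sorted(desired - current)
    to_remove = sorted(current - desired)
    for tag in to_add:
        feature_group.add_tag(tag)
    for tag in to_remove:
        feature_group.remove_tag(tag)
    # Report what was changed so the caller can log or verify it.
    return to_add, to_remove
```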

add_annotatable_feature(name, annotation_type)

Add an annotatable feature in a Feature Group

Parameters:
  • name (str) – The name of the feature to add.

  • annotation_type (str) – The type of annotation to set.

Returns:

The feature group after the feature has been set

Return type:

FeatureGroup

set_feature_as_annotatable_feature(feature_name, annotation_type, feature_group_row_identifier_feature=None, doc_id_feature=None)

Sets an existing feature as an annotatable feature (Feature that can be annotated).

Parameters:
  • feature_name (str) – The name of the feature to set as annotatable.

  • annotation_type (str) – The type of annotation label to add.

  • feature_group_row_identifier_feature (str) – The key value of the feature group row the annotation is on (cast to string) and uniquely identifies the feature group row. At least one of the doc_id or key value must be provided so that the correct annotation can be identified.

  • doc_id_feature (str) – The name of the document ID feature.

Returns:

A feature group object with the newly added annotatable feature.

Return type:

FeatureGroup

set_annotation_status_feature(feature_name)

Sets a feature as the annotation status feature for a feature group.

Parameters:

feature_name (str) – The name of the feature to set as the annotation status feature.

Returns:

The updated feature group.

Return type:

FeatureGroup

unset_feature_as_annotatable_feature(feature_name)

Unsets a feature as annotatable

Parameters:

feature_name (str) – The name of the feature to unset.

Returns:

The feature group after unsetting the feature

Return type:

FeatureGroup

add_annotation_label(label_name, annotation_type, label_definition=None)

Adds an annotation label

Parameters:
  • label_name (str) – The name of the label.

  • annotation_type (str) – The type of the annotation to set.

  • label_definition (str) – The definition of the label.

Returns:

The feature group after adding the annotation label

Return type:

FeatureGroup

remove_annotation_label(label_name)

Removes an annotation label

Parameters:

label_name (str) – The name of the label to remove.

Returns:

The feature group after removing the annotation label

Return type:

FeatureGroup

add_feature_tag(feature, tag)

Adds a tag on a feature

Parameters:
  • feature (str) – The feature to set the tag on.

  • tag (str) – The tag to set on the feature.

remove_feature_tag(feature, tag)

Removes a tag from a feature

Parameters:
  • feature (str) – The feature to remove the tag from.

  • tag (str) – The tag to remove.

create_nested_feature(nested_feature_name, table_name, using_clause, where_clause=None, order_clause=None)

Creates a new nested feature in a feature group from a SQL statement.

Parameters:
  • nested_feature_name (str) – The name of the feature.

  • table_name (str) – The table name of the feature group to nest. Can be up to 120 characters long and can only contain alphanumeric characters and underscores.

  • using_clause (str) – The SQL join column or logic to join the nested table with the parent.

  • where_clause (str) – A SQL WHERE statement to filter the nested rows.

  • order_clause (str) – A SQL clause to order the nested rows.

Returns:

A feature group object with the newly added nested feature.

Return type:

FeatureGroup

update_nested_feature(nested_feature_name, table_name=None, using_clause=None, where_clause=None, order_clause=None, new_nested_feature_name=None)

Updates a previously existing nested feature in a feature group.

Parameters:
  • nested_feature_name (str) – The name of the feature to be updated.

  • table_name (str) – The name of the table. Can be up to 120 characters long and can only contain alphanumeric characters and underscores.

  • using_clause (str) – The SQL join column or logic to join the nested table with the parent.

  • where_clause (str) – An SQL WHERE statement to filter the nested rows.

  • order_clause (str) – An SQL clause to order the nested rows.

  • new_nested_feature_name (str) – New name for the nested feature.

Returns:

A feature group object with the updated nested feature.

Return type:

FeatureGroup

delete_nested_feature(nested_feature_name)

Delete a nested feature.

Parameters:

nested_feature_name (str) – The name of the feature to be deleted.

Returns:

A feature group object without the specified nested feature.

Return type:

FeatureGroup

create_point_in_time_feature(feature_name, history_table_name, aggregation_keys, timestamp_key, historical_timestamp_key, expression, lookback_window_seconds=None, lookback_window_lag_seconds=0, lookback_count=None, lookback_until_position=0)

Creates a new point in time feature in a feature group using another historical feature group, window spec, and aggregate expression.

We use the aggregation keys and either the lookbackWindowSeconds or the lookbackCount values to perform the window aggregation for every row in the current feature group.

If the window is specified in seconds, then all rows in the history table which match the aggregation keys and have historicalTimeFeature greater than or equal to lookbackStartCount and less than the value of the current row’s timeFeature are considered. An optional lookbackWindowLagSeconds (positive or negative) can be used to offset the current value of the timeFeature. If this value is negative, we will look at the future rows in the history table, so care must be taken to ensure that these rows are available in the online context when we are performing a lookup on this feature group.

If the window is specified in counts, then we order the historical table rows by time and consider rows from the window where the rank order is greater than or equal to lookbackCount, including the row just prior to the current one. The lag is specified in terms of positions using lookbackUntilPosition.

Parameters:
  • feature_name (str) – The name of the feature to create.

  • history_table_name (str) – The table name of the history table.

  • aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.

  • timestamp_key (str) – Name of feature which contains the timestamp value for the point in time feature.

  • historical_timestamp_key (str) – Name of feature which contains the historical timestamp.

  • expression (str) – SQL aggregate expression which can convert a sequence of rows into a scalar value.

  • lookback_window_seconds (float) – If window is specified in terms of time, number of seconds in the past from the current time for start of the window.

  • lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.

  • lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row).

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.

Returns:

A feature group object with the newly added point-in-time feature.

Return type:

FeatureGroup
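The time-based window described above can be illustrated locally in plain Python. This is a sketch of one reading of the semantics, not the service's implementation: the window is taken as the half-open interval [end - window, end), where end is the current row's timestamp minus the lag. The row layout (key, ts, amount) and the function name rows_in_time_window are made up for the example.

```python
def rows_in_time_window(history_rows, key, current_time,
                        lookback_window_seconds, lookback_window_lag_seconds=0.0):
    """Return history rows for `key` inside [end - window, end), end = current_time - lag."""
    window_end = current_time - lookback_window_lag_seconds
    window_start = window_end - lookback_window_seconds
    return [
        row for row in history_rows
        if row["key"] == key and window_start <= row["ts"] < window_end
    ]


history = [
    {"key": "u1", "ts": 100.0, "amount": 5.0},  # older than the 60 second window: excluded
    {"key": "u1", "ts": 160.0, "amount": 7.0},
    {"key": "u1", "ts": 200.0, "amount": 9.0},  # same timestamp as the current row: excluded
    {"key": "u2", "ts": 150.0, "amount": 1.0},  # different aggregation key: excluded
]
window = rows_in_time_window(history, "u1", current_time=200.0, lookback_window_seconds=60.0)
total = sum(row["amount"] for row in window)  # stands in for an aggregate such as SUM(amount)
```

Under this reading, a negative lag moves the end of the window past the current timestamp, which is how "future" rows in the history table come into scope.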

update_point_in_time_feature(feature_name, history_table_name=None, aggregation_keys=None, timestamp_key=None, historical_timestamp_key=None, expression=None, lookback_window_seconds=None, lookback_window_lag_seconds=None, lookback_count=None, lookback_until_position=None, new_feature_name=None)

Updates an existing Point-in-Time (PiT) feature in a feature group. See createPointInTimeFeature for detailed semantics.

Parameters:
  • feature_name (str) – The name of the feature.

  • history_table_name (str) – The table name of the history table. If not specified, we use the current table to do a self join.

  • aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.

  • timestamp_key (str) – Name of the feature which contains the timestamp value for the PiT feature.

  • historical_timestamp_key (str) – Name of the feature which contains the historical timestamp.

  • expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.

  • lookback_window_seconds (float) – If the window is specified in terms of time, the number of seconds in the past from the current time for the start of the window.

  • lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of the window. If it is negative, we are looking at the “future” rows in the history table.

  • lookback_count (int) – If the window is specified in terms of count, the start position of the window (0 is the current row).

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of the window by that many rows. If it is negative, we are looking at those many “future” rows in the history table.

  • new_feature_name (str) – New name for the PiT feature.

Returns:

A feature group object with the updated point-in-time feature.

Return type:

FeatureGroup

create_point_in_time_group(group_name, window_key, aggregation_keys, history_table_name=None, history_window_key=None, history_aggregation_keys=None, lookback_window=None, lookback_window_lag=0, lookback_count=None, lookback_until_position=0)

Create a Point-in-Time Group

Parameters:
  • group_name (str) – The name of the point in time group.

  • window_key (str) – Name of feature to use for ordering the rows on the source table.

  • aggregation_keys (list) – List of keys to use on the source table for the window aggregation.

  • history_table_name (str) – The table to use for aggregating, if not provided, the source table will be used.

  • history_window_key (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table will be used.

  • history_aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table will be used. Must be the same length and order as the source table’s aggregationKeys.

  • lookback_window (float) – Number of seconds in the past from the current time for the start of the window. If 0, the lookback will include all rows.

  • lookback_window_lag (float) – Optional lag to offset the closest point for the window. If it is positive, the start of the window is delayed. If it is negative, “future” rows in the history table are used.

  • lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row).

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, the start of the window is delayed by that many rows. If it is negative, those many “future” rows in the history table are used.

Returns:

The feature group after the point in time group has been created.

Return type:

FeatureGroup
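The rule for history_aggregation_keys (fall back to the source table's keys when omitted, otherwise match them in length and order) can be checked before the call. The sketch below covers the fallback and the length check only, since order cannot be verified mechanically. The helper name resolve_history_aggregation_keys is hypothetical.

```python
def resolve_history_aggregation_keys(aggregation_keys, history_aggregation_keys=None):
    """Hypothetical helper: apply the documented fallback and check the key counts match."""
    if history_aggregation_keys is None:
        # Documented fallback: the source table's aggregation keys are used.
        return list(aggregation_keys)
    if len(history_aggregation_keys) != len(aggregation_keys):
        raise ValueError(
            "history_aggregation_keys must have the same length as aggregation_keys"
        )
    return list(history_aggregation_keys)
```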

generate_point_in_time_features(group_name, columns, window_functions, prefix=None)

Generates and adds PIT features given the selected columns to aggregate over, and the operations to include.

Parameters:
  • group_name (str) – Name of the point-in-time group.

  • columns (list) – List of columns to generate point-in-time features for.

  • window_functions (list) – List of window functions to operate on.

  • prefix (str) – Prefix for generated features, defaults to group name

Returns:

Feature group object with newly added point-in-time features.

Return type:

FeatureGroup

update_point_in_time_group(group_name, window_key=None, aggregation_keys=None, history_table_name=None, history_window_key=None, history_aggregation_keys=None, lookback_window=None, lookback_window_lag=None, lookback_count=None, lookback_until_position=None)

Update Point-in-Time Group

Parameters:
  • group_name (str) – The name of the point-in-time group.

  • window_key (str) – Name of feature which contains the timestamp value for the point-in-time feature.

  • aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.

  • history_table_name (str) – The table to use for aggregating, if not provided, the source table will be used.

  • history_window_key (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table will be used.

  • history_aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table will be used. Must be the same length and order as the source table’s aggregationKeys.

  • lookback_window (float) – Number of seconds in the past from the current time for the start of the window.

  • lookback_window_lag (float) – Optional lag to offset the closest point for the window. If it is positive, the start of the window is delayed. If it is negative, future rows in the history table are looked at.

  • lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row).

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, the start of the window is delayed by that many rows. If it is negative, those many future rows in the history table are looked at.

Returns:

The feature group after the update has been applied.

Return type:

FeatureGroup

delete_point_in_time_group(group_name)

Delete point in time group

Parameters:

group_name (str) – The name of the point in time group.

Returns:

The feature group after the point in time group has been deleted.

Return type:

FeatureGroup

create_point_in_time_group_feature(group_name, name, expression)

Create point in time group feature

Parameters:
  • group_name (str) – The name of the point-in-time group.

  • name (str) – The name of the feature to add to the point-in-time group.

  • expression (str) – A SQL aggregate expression which can convert a sequence of rows into a scalar value.

Returns:

The feature group after the update has been applied.

Return type:

FeatureGroup

update_point_in_time_group_feature(group_name, name, expression)

Update a feature’s SQL expression in a point in time group

Parameters:
  • group_name (str) – The name of the point-in-time group.

  • name (str) – The name of the feature to update in the point-in-time group.

  • expression (str) – SQL aggregate expression which can convert a sequence of rows into a scalar value.

Returns:

The feature group after the update has been applied.

Return type:

FeatureGroup

set_feature_type(feature, feature_type, project_id=None)

Set the type of a feature in a feature group. Specify the feature group ID, feature name, and feature type, and the method will return the new column with the changes reflected.

Parameters:
  • feature (str) – The name of the feature.

  • feature_type (str) – The machine learning type of the data in the feature.

  • project_id (str) – Optional unique ID associated with the project.

Returns:

The feature group after the data_type is applied.

Return type:

Schema

concatenate_data(source_feature_group_id, merge_type='UNION', replace_until_timestamp=None, skip_materialize=False)

Concatenates data from one Feature Group to another. Feature Groups can be merged if their schemas are compatible, they have the special updateTimestampKey column, and (if set) the primaryKey column. The second operand in the concatenate operation will be appended to the first operand (merge target).

Parameters:
  • source_feature_group_id (str) – The Feature Group to concatenate with the destination Feature Group.

  • merge_type (str) – UNION or INTERSECTION.

  • replace_until_timestamp (int) – The UNIX timestamp to specify the point until which we will replace data from the source Feature Group.

  • skip_materialize (bool) – If True, will not materialize the concatenated Feature Group.
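The keyword arguments for concatenate_data can be assembled and validated ahead of time. The sketch below checks merge_type against the two listed values and converts a timezone-aware datetime to an integer UNIX timestamp. It assumes the timestamp is expressed in seconds, which this reference does not state. The helper names to_unix_seconds and concatenation_arguments are hypothetical.

```python
from datetime import datetime, timezone

MERGE_TYPES = ("UNION", "INTERSECTION")  # the two values listed for merge_type


def to_unix_seconds(moment):
    """Convert a timezone-aware datetime to an integer UNIX timestamp (seconds assumed)."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time ambiguity")
    return int(moment.timestamp())


def concatenation_arguments(merge_type="UNION", replace_until=None):
    """Hypothetical helper: validate and assemble keyword arguments for concatenate_data."""
    if merge_type not in MERGE_TYPES:
        raise ValueError(f"merge_type must be one of {MERGE_TYPES}, got {merge_type!r}")
    kwargs = {"merge_type": merge_type}
    if replace_until is not None:
        kwargs["replace_until_timestamp"] = to_unix_seconds(replace_until)
    return kwargs


kwargs = concatenation_arguments("UNION", datetime(2024, 1, 1, tzinfo=timezone.utc))
```

The result could then be passed through as destination.concatenate_data(source_feature_group_id, **kwargs), where destination is the merge target.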

remove_concatenation_config()

Removes the concatenation config on a destination feature group.

Parameters:

feature_group_id (str) – Unique identifier of the destination feature group to remove the concatenation configuration from. Not passed as an argument; the method takes none and operates on this feature group.

refresh()

Calls describe and refreshes the current object’s fields

Returns:

The current object

Return type:

FeatureGroup

describe()

Describe a Feature Group.

Parameters:

feature_group_id (str) – A unique string identifier associated with the feature group. Not passed as an argument; the method takes none and operates on this feature group.

Returns:

The feature group object.

Return type:

FeatureGroup

set_indexing_config(primary_key=None, update_timestamp_key=None, lookup_keys=None)

Sets various attributes of the feature group used for primary key, deployment lookups and streaming updates.

Parameters:
  • primary_key (str) – Name of the feature which defines the primary key of the feature group.

  • update_timestamp_key (str) – Name of the feature which defines the update timestamp of the feature group. Used in concatenation and primary key deduplication.

  • lookup_keys (list) – List of feature names which can be used in the lookup API to restrict the computation to a set of dataset rows. These feature names have to correspond to underlying dataset columns.
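The sketch below shows one way the indexing parameters might be applied to a streaming feature group and then waited on. It relies only on set_indexing_config and wait_for_streaming_ready as documented in this module; the helper name configure_streaming_index is hypothetical, and whether all three keys may be combined on one feature group is not stated in this reference.

```python
def configure_streaming_index(feature_group, primary_key=None,
                              update_timestamp_key=None, lookup_keys=None,
                              timeout=600):
    """Hypothetical helper: apply an indexing config, then wait until it is usable."""
    feature_group.set_indexing_config(
        primary_key=primary_key,
        update_timestamp_key=update_timestamp_key,
        # Pass a plain list, or None when no lookup keys are given.
        lookup_keys=list(lookup_keys) if lookup_keys else None,
    )
    # wait_for_streaming_ready is documented as waiting for the indexing
    # config to be applied for streaming.
    feature_group.wait_for_streaming_ready(timeout=timeout)
    return feature_group
```

A call might look like configure_streaming_index(fg, primary_key="user_id", update_timestamp_key="updated_at"), where both feature names are placeholders.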

update(description=None)

Modify an existing Feature Group.

Parameters:

description (str) – Description of the Feature Group.

Returns:

Updated Feature Group object.

Return type:

FeatureGroup

detach_from_template()

Update a feature group to detach it from a template.

Parameters:

feature_group_id (str) – Unique string identifier associated with the feature group.

Returns:

The updated feature group.

Return type:

FeatureGroup

update_template_bindings(template_bindings=None)

Update the feature group template bindings for a template feature group.

Parameters:

template_bindings (list) – Values in these bindings override values set in the template.

Returns:

Updated feature group.

Return type:

FeatureGroup

update_python_function_bindings(python_function_bindings)

Updates an existing Feature Group’s Python function bindings from a user-provided Python Function. If a list of feature groups is supplied within the Python function bindings, the materialized feature groups for those input feature groups are passed to the function as DataFrame arguments (Pandas in the case of Python).

Parameters:

python_function_bindings (List) – List of python function arguments.

update_python_function(python_function_name, python_function_bindings=None, cpu_size=None, memory=None, use_gpu=None, use_original_csv_names=None)

Updates an existing Feature Group’s Python function from a user-provided Python Function. If a list of feature groups is supplied within the Python function bindings, the materialized feature groups for those input feature groups are passed to the function as DataFrame arguments (Pandas in the case of Python).

Parameters:
  • python_function_name (str) – The name of the python function to be associated with the feature group.

  • python_function_bindings (List) – List of python function arguments.

  • cpu_size (CPUSize) – Size of the CPU for the feature group python function.

  • memory (MemorySize) – Memory (in GB) for the feature group python function.

  • use_gpu (bool) – Whether the feature group needs a GPU. If not set, it defaults to CPU.

  • use_original_csv_names (bool) – If enabled, it uses the original column names for input feature groups from CSV datasets.

update_sql_definition(sql)

Updates the SQL statement for a feature group.

Parameters:

sql (str) – The input SQL statement for the feature group.

Returns:

The updated feature group.

Return type:

FeatureGroup

update_dataset_feature_expression(feature_expression)

Updates the SQL feature expression for a Dataset FeatureGroup’s custom features

Parameters:

feature_expression (str) – The input SQL statement for the feature group.

Returns:

The updated feature group.

Return type:

FeatureGroup

update_version_limit(version_limit)

Updates the version limit for the feature group.

Parameters:

version_limit (int) – The maximum number of versions permitted for the feature group. Once this limit is exceeded, the oldest versions will be purged in a First-In-First-Out (FIFO) order.

Returns:

The updated feature group.

Return type:

FeatureGroup
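To make the FIFO purging rule concrete, here is a purely local illustration of which version IDs would survive under a given limit. It does not call the service and is not the service's implementation; the function name and the version IDs are made up for the example.

```python
from collections import deque


def simulate_version_limit(version_ids, version_limit):
    """Local illustration only: return (kept, purged) IDs under a FIFO limit."""
    kept = deque()
    purged = []
    for version_id in version_ids:  # oldest first
        kept.append(version_id)
        # Once the limit is exceeded, the oldest version is dropped first.
        while len(kept) > version_limit:
            purged.append(kept.popleft())
    return list(kept), purged


kept, purged = simulate_version_limit(["v1", "v2", "v3", "v4", "v5"], version_limit=3)
print(kept)    # ['v3', 'v4', 'v5']
print(purged)  # ['v1', 'v2']
```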

update_feature(name, select_expression=None, new_name=None)

Modifies an existing feature in a feature group.

Parameters:
  • name (str) – Name of the feature to be updated.

  • select_expression (str) – SQL statement for modifying the feature.

  • new_name (str) – New name of the feature.

Returns:

Updated feature group object.

Return type:

FeatureGroup

list_exports()

Lists all of the feature group exports for the feature group

Parameters:

feature_group_id (str) – Unique identifier of the feature group

Returns:

List of feature group exports

Return type:

list[FeatureGroupExport]

set_modifier_lock(locked=True)

Lock a feature group to prevent modification.

Parameters:

locked (bool) – Whether to lock the feature group against modification: True locks it, False unlocks it.

list_modifiers()

List the users who can modify a given feature group.

Parameters:

feature_group_id (str) – Unique string identifier of the feature group.

Returns:

Information about the modification lock status and groups/organizations added to the feature group.

Return type:

ModificationLockInfo

add_user_to_modifiers(email)

Adds a user to a feature group’s modifiers list.

Parameters:

email (str) – The email address of the user to be added.

add_organization_group_to_modifiers(organization_group_id)

Adds an OrganizationGroup to a feature group’s modifiers list.

Parameters:

organization_group_id (str) – Unique string identifier of the organization group.

remove_user_from_modifiers(email)

Removes a user from a specified feature group.

Parameters:

email (str) – The email address of the user to be removed.

remove_organization_group_from_modifiers(organization_group_id)

Removes an OrganizationGroup from a feature group modifiers list

Parameters:

organization_group_id (str) – The unique ID associated with the organization group.

delete_feature(name)

Removes a feature from the feature group.

Parameters:

name (str) – Name of the feature to be deleted.

Returns:

Updated feature group object.

Return type:

FeatureGroup

delete()

Deletes a Feature Group.

Parameters:

feature_group_id (str) – Unique string identifier for the feature group to be removed.

create_version(variable_bindings=None)

Creates a snapshot for a specified feature group. Triggers materialization of the feature group. The new version of the feature group is created after it has materialized.

Parameters:

variable_bindings (dict) – Dictionary defining variable bindings that override parent feature group values.

Returns:

A feature group version.

Return type:

FeatureGroupVersion
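Since create_version triggers materialization and the version only exists once that completes, a caller will often want to block until it is done. A minimal sketch, using only create_version and wait_for_materialization from this module (the helper name snapshot_and_wait is hypothetical):

```python
def snapshot_and_wait(feature_group, variable_bindings=None, timeout=7200):
    """Hypothetical helper: request a new version, then wait for materialization."""
    version = feature_group.create_version(variable_bindings=variable_bindings)
    # Blocks until the feature group is materialized or the timeout elapses.
    feature_group.wait_for_materialization(timeout=timeout)
    # Note: `version` was returned before materialization finished, so its
    # fields may describe the state at creation time.
    return version
```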

list_versions(limit=100, start_after_version=None)

Retrieves a list of all feature group versions for the specified feature group.

Parameters:
  • limit (int) – The maximum length of the returned versions.

  • start_after_version (str) – Results will start after this version.

Returns:

A list of feature group versions.

Return type:

list[FeatureGroupVersion]
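The limit and start_after_version parameters suggest cursor-style paging. The generator below is a sketch of that pattern under two assumptions that this reference does not confirm: that each returned FeatureGroupVersion exposes its ID as a feature_group_version attribute, and that a page shorter than the limit means no more results remain. The name iter_versions is hypothetical.

```python
def iter_versions(feature_group, page_size=100):
    """Hypothetical helper: yield every version by paging through list_versions."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    cursor = None
    while True:
        page = feature_group.list_versions(limit=page_size,
                                           start_after_version=cursor)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            # Assumption: a short page means the listing is exhausted.
            return
        # Assumption: the version ID lives on `feature_group_version`.
        cursor = page[-1].feature_group_version
```

Because it is a generator, callers can stop early (for example, after finding a specific version) without fetching the remaining pages.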

set_export_connector_config(feature_group_export_config=None)

Sets the export config for the given feature group.

Parameters:

feature_group_export_config (FeatureGroupExportConfig) – The export config to be set for the given feature group.

set_export_on_materialization(enable)

Can be used to enable or disable exporting feature group data to the export connector associated with the feature group.

Parameters:

enable (bool) – If true, will enable exporting feature group to the connector. If false, will disable.

create_template(name, template_sql, template_variables, description=None, template_bindings=None, should_attach_feature_group_to_template=False)

Create a feature group template.

Parameters:
  • name (str) – User-friendly name for this feature group template.

  • template_sql (str) – The template SQL that will be resolved by applying values from the template variables to generate SQL for a feature group.

  • template_variables (list) – The template variables for resolving the template.

  • description (str) – Description of this feature group template.

  • template_bindings (list) – If the feature group will be attached to the newly created template, set these variable bindings on that feature group.

  • should_attach_feature_group_to_template (bool) – Set to True to convert the feature group to a template feature group and attach it to the newly created template.

Returns:

The created feature group template.

Return type:

FeatureGroupTemplate

suggest_template_for()

Suggest values for a feature group template, based on a feature group.

Parameters:

feature_group_id (str) – Unique identifier associated with the feature group to use for suggesting values to use in the template.

Returns:

The suggested feature group template.

Return type:

FeatureGroupTemplate

get_recent_streamed_data()

Returns recently streamed data to a streaming feature group.

Parameters:

feature_group_id (str) – Unique string identifier associated with the feature group.

append_data(streaming_token, data)

Appends new data into the feature group for a given lookup key recordId.

Parameters:
  • streaming_token (str) – The streaming token for authenticating requests.

  • data (dict) – The data to record as a JSON object.

append_multiple_data(streaming_token, data)

Appends new data into the feature group for a given lookup key recordId.

Parameters:
  • streaming_token (str) – Streaming token for authenticating requests.

  • data (list) – Data to record, as a list of JSON objects.
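When there are many records to stream, it can be convenient to send them through append_multiple_data in chunks. The sketch below does that; the helper name append_in_batches and the default batch size of 100 are assumptions, since this reference does not state a maximum payload size.

```python
def append_in_batches(feature_group, streaming_token, records, batch_size=100):
    """Hypothetical helper: send records via append_multiple_data in fixed-size chunks."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    records = list(records)  # accept any iterable of JSON-like dicts
    sent = 0
    for start in range(0, len(records), batch_size):
        chunk = records[start:start + batch_size]
        feature_group.append_multiple_data(streaming_token=streaming_token,
                                           data=chunk)
        sent += len(chunk)
    return sent  # number of records handed to the API
```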

upsert_data(data, streaming_token=None, blobs=None)

Updates the data in the feature group for a given lookup key record ID if the record ID is found; otherwise, inserts the data as a new record.

Parameters:
  • data (dict) – The data to record, in JSON format.

  • streaming_token (str) – Optional streaming token for authenticating requests if upserting to streaming FG.

  • blobs (None) – A dictionary of binary data used to populate file fields in the data to upsert to the streaming FG.

Returns:

The feature group row that was upserted.

Return type:

FeatureGroupRow

delete_data(primary_key)

Deletes a row from the feature group given the primary key

Parameters:

primary_key (str) – The primary key value for which to delete the feature group row

get_data(primary_key=None, num_rows=None)

Gets the feature group rows for online updatable feature groups.

If primary_key is set, the row corresponding to that primary key is returned. If num_rows is set, at most num_rows of the most recently updated rows are returned.

Parameters:
  • primary_key (str) – The primary key value for which to retrieve the feature group row (only for online feature groups).

  • num_rows (int) – Maximum number of rows to return from the feature group

Returns:

A list of feature group rows.

Return type:

list[FeatureGroupRow]
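Because get_data always returns a list, a single-row lookup needs a small amount of unwrapping. The two helpers below sketch both documented modes; their names are hypothetical, and returning None for a missing key is a design choice of this example rather than documented behavior.

```python
def get_row_or_none(feature_group, primary_key):
    """Hypothetical helper: return the row for primary_key, or None if absent."""
    rows = feature_group.get_data(primary_key=primary_key)
    return rows[0] if rows else None


def latest_rows(feature_group, num_rows=10):
    """Hypothetical helper: return up to num_rows of the most recently updated rows."""
    return list(feature_group.get_data(num_rows=num_rows))
```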

get_natural_language_explanation(feature_group_version=None, model_id=None)

Returns the saved natural language explanation of an artifact with the given ID. The artifact can be a Feature Group, a Feature Group Version, or a Model.

Parameters:
  • feature_group_version (str) – A unique string identifier associated with the Feature Group Version.

  • model_id (str) – A unique string identifier associated with the Model.

Returns:

The object containing natural language explanation(s) as field(s).

Return type:

NaturalLanguageExplanation

generate_natural_language_explanation(feature_group_version=None, model_id=None)

Generates a natural language explanation of an artifact with the given ID. The artifact can be a Feature Group, a Feature Group Version, or a Model.

Parameters:
  • feature_group_version (str) – A unique string identifier associated with the Feature Group Version.

  • model_id (str) – A unique string identifier associated with the Model.

Returns:

The object containing natural language explanation(s) as field(s).

Return type:

NaturalLanguageExplanation

wait_for_dataset(timeout=7200)

A waiting call until the feature group’s dataset, if any, is ready for use.

Parameters:

timeout (int) – Maximum time, in seconds, to wait for the call to finish. If the call does not finish within this time, it times out. The default is 7200 seconds.

wait_for_upload(timeout=7200)

Waits for a feature group created from a dataframe to be ready for materialization and version creation.

Parameters:

timeout (int) – Maximum time, in seconds, to wait for the call to finish. If the call does not finish within this time, it times out. The default is 7200 seconds.

wait_for_materialization(timeout=7200)

A waiting call until feature group is materialized.

Parameters:

timeout (int) – Maximum time, in seconds, to wait for the call to finish. If the call does not finish within this time, it times out. The default is 7200 seconds.

wait_for_streaming_ready(timeout=600)

Waits for the feature group indexing config to be applied for streaming

Parameters:

timeout (int) – Maximum time, in seconds, to wait for the call to finish. If the call does not finish within this time, it times out. The default is 600 seconds.

get_status(streaming_status=False)

Gets the status of the feature group.

Returns:

A string describing the status of a feature group (pending, complete, etc.).

Return type:

str

Parameters:

streaming_status (bool)

load_as_pandas()

Loads the feature group into a pandas DataFrame.

Returns:

A pandas dataframe with annotations and text_snippet columns.

Return type:

DataFrame

load_as_pandas_documents(doc_id_column, document_column)

Loads a feature group with documents data into a pandas dataframe.

Parameters:
  • doc_id_column (str) – The name of the feature / column containing the document ID.

  • document_column (str) – The name of the feature / column which either contains the document data itself or page infos with paths to remotely stored documents. This column will be replaced with the extracted document data.

Returns:

A pandas dataframe containing the extracted document data.

Return type:

DataFrame

describe_dataset()

Displays the dataset attached to a feature group.

Returns:

A dataset object with all the relevant information about the dataset.

Return type:

Dataset

materialize()

Materializes the feature group’s latest changes as of the time of the API call. Materialization is skipped if nothing has changed since the current latest version.

Returns:

A feature group object with the latest changes materialized.

Return type:

FeatureGroup