abacusai.feature_group
Classes
FeatureGroup – A feature group.
Module Contents
- class abacusai.feature_group.FeatureGroup(client, featureGroupId=None, modificationLock=None, name=None, featureGroupSourceType=None, tableName=None, sql=None, datasetId=None, functionSourceCode=None, functionName=None, sourceTables=None, createdAt=None, description=None, sqlError=None, latestVersionOutdated=None, referencedFeatureGroups=None, tags=None, primaryKey=None, updateTimestampKey=None, lookupKeys=None, streamingEnabled=None, incremental=None, mergeConfig=None, samplingConfig=None, cpuSize=None, memory=None, streamingReady=None, featureTags=None, moduleName=None, templateBindings=None, featureExpression=None, useOriginalCsvNames=None, pythonFunctionBindings=None, pythonFunctionName=None, useGpu=None, versionLimit=None, exportOnMaterialization=None, features={}, duplicateFeatures={}, pointInTimeGroups={}, annotationConfig={}, concatenationConfig={}, indexingConfig={}, codeSource={}, featureGroupTemplate={}, explanation={}, refreshSchedules={}, exportConnectorConfig={}, latestFeatureGroupVersion={}, operatorConfig={})
Bases:
abacusai.return_class.AbstractApiClass
A feature group.
- Parameters:
client (ApiClient) – An authenticated API Client instance
featureGroupId (str) – Unique identifier for this feature group.
modificationLock (bool) – If feature group is locked against a change or not.
name (str)
featureGroupSourceType (str) – The source type of the feature group
tableName (str) – Unique table name of this feature group.
sql (str) – SQL definition creating this feature group.
datasetId (str) – Dataset ID the feature group is sourced from.
functionSourceCode (str) – Source definition creating this feature group.
functionName (str) – Function name to execute from the source code.
sourceTables (list[str]) – Source tables for this feature group.
createdAt (str) – Timestamp at which the feature group was created.
description (str) – Description of the feature group.
sqlError (str) – Error message with this feature group.
latestVersionOutdated (bool) – Is latest materialized feature group version outdated.
referencedFeatureGroups (list[str]) – Feature groups this feature group is used in.
primaryKey (str) – Primary index feature.
updateTimestampKey (str) – Primary timestamp feature.
lookupKeys (list[str]) – Additional indexed features for this feature group.
streamingEnabled (bool) – If true, the feature group can have data streamed to it.
incremental (bool) – If feature group corresponds to an incremental dataset.
mergeConfig (dict) – Merge configuration settings for the feature group.
samplingConfig (dict) – Sampling configuration for the feature group.
cpuSize (str) – CPU size specified for the Python feature group.
memory (int) – Memory in GB specified for the Python feature group.
streamingReady (bool) – If true, the feature group is ready to receive streaming data.
featureTags (dict) – Tags for features in this feature group
moduleName (str) – Path to the file with the feature group function.
templateBindings (dict) – Config specifying variable names and values to use when resolving a feature group template.
featureExpression (str) – If the dataset feature group has custom features, the SQL select expression creating those features.
useOriginalCsvNames (bool) – If true, the feature group will use the original column names in the source dataset.
pythonFunctionBindings (dict) – Config specifying variable names, types, and values to use when resolving a Python feature group.
pythonFunctionName (str) – Name of the Python function the feature group was built from.
useGpu (bool) – Whether this feature group uses a GPU.
versionLimit (int) – Version limit for the feature group.
exportOnMaterialization (bool) – Whether to export the feature group on materialization.
features (Feature) – List of resolved features.
duplicateFeatures (Feature) – List of duplicate features.
pointInTimeGroups (PointInTimeGroup) – List of Point In Time Groups.
annotationConfig (AnnotationConfig) – Annotation config for this feature group.
latestFeatureGroupVersion (FeatureGroupVersion) – Latest feature group version.
concatenationConfig (ConcatenationConfig) – Feature group ID whose data will be concatenated into this feature group.
indexingConfig (IndexingConfig) – Indexing config for the feature group for feature store
codeSource (CodeSource) – If a Python feature group, information on the source code.
featureGroupTemplate (FeatureGroupTemplate) – FeatureGroupTemplate to use when this feature group is attached to a template.
explanation (NaturalLanguageExplanation) – Natural language explanation of the feature group
refreshSchedules (RefreshSchedule) – List of schedules that determine when the next version of the feature group will be created.
exportConnectorConfig (FeatureGroupRefreshExportConfig) – The export config (file connector or database connector information) for feature group exports.
operatorConfig (OperatorConfig) – Operator configuration settings for the feature group.
- feature_group_id = None
- modification_lock = None
- name = None
- feature_group_source_type = None
- table_name = None
- sql = None
- dataset_id = None
- function_source_code = None
- function_name = None
- source_tables = None
- created_at = None
- description = None
- sql_error = None
- latest_version_outdated = None
- referenced_feature_groups = None
- tags = None
- primary_key = None
- update_timestamp_key = None
- lookup_keys = None
- streaming_enabled = None
- incremental = None
- merge_config = None
- sampling_config = None
- cpu_size = None
- memory = None
- streaming_ready = None
- feature_tags = None
- module_name = None
- template_bindings = None
- feature_expression = None
- use_original_csv_names = None
- python_function_bindings = None
- python_function_name = None
- use_gpu = None
- version_limit = None
- export_on_materialization = None
- features
- duplicate_features
- point_in_time_groups
- annotation_config
- concatenation_config
- indexing_config
- code_source
- feature_group_template
- explanation
- refresh_schedules
- export_connector_config
- latest_feature_group_version
- operator_config
- deprecated_keys
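The constructor arguments are camelCase while the attributes listed above are snake_case. A minimal sketch of that naming correspondence follows; `camel_to_snake` is a hypothetical helper written for illustration and is not part of abacusai.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase name to snake_case (illustrative helper, not part of abacusai)."""
    # Insert an underscore before every uppercase letter except at the start.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


# A few constructor argument names taken from the class signature.
constructor_args = [
    "featureGroupId",
    "modificationLock",
    "useOriginalCsvNames",
    "latestFeatureGroupVersion",
]
attribute_names = [camel_to_snake(arg) for arg in constructor_args]
print(attribute_names)
```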
- __repr__()
- to_dict()
Get a dict representation of the parameters in this class
- Returns:
The dict value representation of the class parameters
- Return type:
- add_to_project(project_id, feature_group_type='CUSTOM_TABLE')
Adds a feature group to a project.
- set_project_config(project_id, project_config=None)
Sets a feature group’s project config
- Parameters:
project_id (str) – Unique string identifier for the project.
project_config (ProjectFeatureGroupConfig) – Feature group’s project configuration.
- get_project_config(project_id)
Gets a feature group’s project config
- Parameters:
project_id (str) – Unique string identifier for the project.
- Returns:
The feature group’s project configuration.
- Return type:
- remove_from_project(project_id)
Removes a feature group from a project.
- Parameters:
project_id (str) – The unique ID associated with the project.
- set_type(project_id, feature_group_type='CUSTOM_TABLE')
Update the feature group type in a project. The feature group must already be added to the project.
- describe_annotation(feature_name=None, doc_id=None, feature_group_row_identifier=None)
Get the latest annotation entry for a given feature group, feature, and document.
- Parameters:
feature_name (str) – The name of the feature the annotation is on.
doc_id (str) – The ID of the primary document the annotation is on. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.
feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the feature group’s primary / identifier key value. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.
- Returns:
The latest annotation entry for the given feature group, feature, document, and/or annotation key value.
- Return type:
- verify_and_describe_annotation(feature_name=None, doc_id=None, feature_group_row_identifier=None)
Get the latest annotation entry for a given feature group, feature, and document along with verification information.
- Parameters:
feature_name (str) – The name of the feature the annotation is on.
doc_id (str) – The ID of the primary document the annotation is on. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.
feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the feature group’s primary / identifier key value. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.
- Returns:
The latest annotation entry for the given feature group, feature, document, and/or annotation key value. Includes the verification information.
- Return type:
- update_annotation_status(feature_name, status, doc_id=None, feature_group_row_identifier=None, save_metadata=False)
Update the status of an annotation entry.
- Parameters:
feature_name (str) – The name of the feature the annotation is on.
status (str) – The new status of the annotation. Must be one of the following: ‘TODO’, ‘IN_PROGRESS’, ‘DONE’.
doc_id (str) – The ID of the primary document the annotation is on. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.
feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the feature group’s primary / identifier key value. At least one of the doc_id or feature_group_row_identifier must be provided in order to identify the correct annotation.
save_metadata (bool) – If True, save the metadata for the annotation entry.
- Returns:
The updated annotation entry.
- Return type:
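The two constraints documented for update_annotation_status (an allowed status value, and at least one of doc_id or feature_group_row_identifier) can be checked locally before calling the API. This is a sketch only; `annotation_status_problems` is a hypothetical helper, and the server may apply further checks.

```python
from typing import List, Optional

# Status values listed in the parameter description above.
ALLOWED_STATUSES = ("TODO", "IN_PROGRESS", "DONE")


def annotation_status_problems(
    status: str,
    doc_id: Optional[str] = None,
    feature_group_row_identifier: Optional[str] = None,
) -> List[str]:
    """Return a list of problems with the arguments (empty if none were found)."""
    problems = []
    if status not in ALLOWED_STATUSES:
        problems.append(f"status must be one of {ALLOWED_STATUSES}, got {status!r}")
    if doc_id is None and feature_group_row_identifier is None:
        problems.append("provide doc_id or feature_group_row_identifier")
    return problems


print(annotation_status_problems("DONE", doc_id="doc-42"))
print(annotation_status_problems("FINISHED"))
```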
- get_document_to_annotate(project_id, feature_name, feature_group_row_identifier=None, get_previous=False)
Get an available document that needs to be annotated for an annotation feature group.
- Parameters:
project_id (str) – The ID of the project that the annotation is associated with.
feature_name (str) – The name of the feature the annotation is on.
feature_group_row_identifier (str) – The key value of the feature group row the annotation is on (cast to string). Usually the primary key value. If provided, fetch the immediate next (or previous) available document.
get_previous (bool) – If True, get the previous document instead of the next document. Applicable if feature_group_row_identifier is provided.
- Returns:
The document to annotate.
- Return type:
- get_annotations_status(feature_name=None, check_for_materialization=False)
Get the status of the annotations for a given feature group and feature.
- Parameters:
- Returns:
The status of the annotations for the given feature group and feature.
- Return type:
- import_annotation_labels(file, annotation_type)
Imports annotation labels from a CSV file. All valid values in the file will be imported as labels (including the header row, if present).
- Parameters:
file (io.TextIOBase) – The file to import. Must be a csv file.
annotation_type (str) – The type of the annotation.
- Returns:
The annotation config for the feature group.
- Return type:
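Because the file argument is an io.TextIOBase, an in-memory io.StringIO can stand in for a file opened in text mode. The sketch below prepares such an object and previews the values it contains. Treating every non-empty cell as a candidate label is an assumption, since the text above does not define which values count as valid, and the sample labels are made up.

```python
import csv
import io

# Hypothetical CSV content; note that the header row would be imported as a label too.
csv_text = "label\npositive\nnegative\n"
label_file = io.StringIO(csv_text)

# Preview the values on a separate reader so label_file stays at position 0.
preview = [
    cell.strip()
    for row in csv.reader(io.StringIO(csv_text))
    for cell in row
    if cell.strip()
]
print(preview)

# label_file could then be passed as the `file` argument of import_annotation_labels.
```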
- create_sampling(table_name, sampling_config, description=None)
Creates a new Feature Group defined as a sample of rows from another Feature Group.
For efficiency, sampling is approximate unless otherwise specified (e.g., the number of rows may vary slightly from what was requested).
- Parameters:
table_name (str) – The unique name to be given to this sampling Feature Group. Can be up to 120 characters long and can only contain alphanumeric characters and underscores.
sampling_config (SamplingConfig) – Dictionary defining the sampling method and its parameters.
description (str) – A human-readable description of this Feature Group.
- Returns:
The created Feature Group.
- Return type:
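The table_name rule stated above (up to 120 characters, alphanumeric characters and underscores only) can be pre-checked locally. A minimal sketch with a hypothetical helper; whether the service imposes further rules, such as restrictions on the first character, is not stated here.

```python
import re

# 1 to 120 characters, each a letter, digit, or underscore.
TABLE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,120}")


def is_valid_table_name(table_name: str) -> bool:
    """Check a candidate table name against the documented length and character rule."""
    return TABLE_NAME_PATTERN.fullmatch(table_name) is not None


print(is_valid_table_name("orders_sample_10pct"))
print(is_valid_table_name("orders-sample"))
```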
- set_sampling_config(sampling_config)
Set a FeatureGroup’s sampling to the config values provided, so that the rows the FeatureGroup returns will be a sample of those it would otherwise have returned.
- Parameters:
sampling_config (SamplingConfig) – A JSON string object specifying the sampling method and parameters specific to that sampling method. An empty sampling_config indicates no sampling.
- Returns:
The updated FeatureGroup.
- Return type:
- set_merge_config(merge_config)
Set a MergeFeatureGroup’s merge config to the values provided, so that the feature group only returns a bounded range of an incremental dataset.
- Parameters:
merge_config (MergeConfig) – JSON object string specifying the merge rule. An empty merge_config will default to only including the latest dataset version.
- Returns:
The updated FeatureGroup.
- Return type:
- set_operator_config(operator_config)
Set an OperatorFeatureGroup’s operator config to the values provided.
- Parameters:
operator_config (OperatorConfig) – A dictionary object specifying the pre-defined operations.
- Returns:
The updated FeatureGroup.
- Return type:
- set_schema(schema)
Creates a new schema and points the feature group to the new feature group schema ID.
- Parameters:
schema (list) – JSON string containing an array of objects with ‘name’ and ‘dataType’ properties.
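A schema in the shape described above is an array of objects with 'name' and 'dataType' properties. The sketch below builds one as a Python list and also serializes it to JSON. The column names and the data type strings ("STRING", "TIMESTAMP") are placeholders chosen for illustration, not a list of values the service accepts.

```python
import json


def build_schema(columns):
    """Build a list of {'name', 'dataType'} objects from (name, data_type) pairs."""
    return [{"name": name, "dataType": data_type} for name, data_type in columns]


# Placeholder column names and data types.
schema = build_schema([("user_id", "STRING"), ("signup_ts", "TIMESTAMP")])
schema_json = json.dumps(schema)
print(schema_json)
```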
- get_schema(project_id=None)
Returns a schema for a given FeatureGroup in a project.
- create_feature(name, select_expression)
Creates a new feature in a Feature Group from a SQL select statement.
- Parameters:
- Returns:
A Feature Group object with the newly added feature.
- Return type:
- add_tag(tag)
Adds a tag to the feature group
- Parameters:
tag (str) – The tag to add to the feature group.
- remove_tag(tag)
Removes a tag from the specified feature group.
- Parameters:
tag (str) – The tag to remove from the feature group.
- add_annotatable_feature(name, annotation_type)
Add an annotatable feature in a Feature Group
- Parameters:
- Returns:
The feature group after the feature has been set
- Return type:
- set_feature_as_annotatable_feature(feature_name, annotation_type, feature_group_row_identifier_feature=None, doc_id_feature=None)
Sets an existing feature as an annotatable feature (Feature that can be annotated).
- Parameters:
feature_name (str) – The name of the feature to set as annotatable.
annotation_type (str) – The type of annotation label to add.
feature_group_row_identifier_feature (str) – The key value of the feature group row the annotation is on (cast to string) and uniquely identifies the feature group row. At least one of the doc_id or key value must be provided so that the correct annotation can be identified.
doc_id_feature (str) – The name of the document ID feature.
- Returns:
A feature group object with the newly added annotatable feature.
- Return type:
- set_annotation_status_feature(feature_name)
Sets a feature as the annotation status feature for a feature group.
- Parameters:
feature_name (str) – The name of the feature to set as the annotation status feature.
- Returns:
The updated feature group.
- Return type:
- unset_feature_as_annotatable_feature(feature_name)
Unsets a feature as annotatable
- Parameters:
feature_name (str) – The name of the feature to unset.
- Returns:
The feature group after unsetting the feature
- Return type:
- add_annotation_label(label_name, annotation_type, label_definition=None)
Adds an annotation label
- Parameters:
- Returns:
The feature group after adding the annotation label
- Return type:
- remove_annotation_label(label_name)
Removes an annotation label
- Parameters:
label_name (str) – The name of the label to remove.
- Returns:
The feature group after removing the annotation label
- Return type:
- add_feature_tag(feature, tag)
Adds a tag on a feature
- remove_feature_tag(feature, tag)
Removes a tag from a feature
- create_nested_feature(nested_feature_name, table_name, using_clause, where_clause=None, order_clause=None)
Creates a new nested feature in a feature group from a SQL statement.
- Parameters:
nested_feature_name (str) – The name of the feature.
table_name (str) – The table name of the feature group to nest. Can be up to 120 characters long and can only contain alphanumeric characters and underscores.
using_clause (str) – The SQL join column or logic to join the nested table with the parent.
where_clause (str) – A SQL WHERE statement to filter the nested rows.
order_clause (str) – A SQL clause to order the nested rows.
- Returns:
A feature group object with the newly added nested feature.
- Return type:
- update_nested_feature(nested_feature_name, table_name=None, using_clause=None, where_clause=None, order_clause=None, new_nested_feature_name=None)
Updates a previously existing nested feature in a feature group.
- Parameters:
nested_feature_name (str) – The name of the feature to be updated.
table_name (str) – The name of the table. Can be up to 120 characters long and can only contain alphanumeric characters and underscores.
using_clause (str) – The SQL join column or logic to join the nested table with the parent.
where_clause (str) – An SQL WHERE statement to filter the nested rows.
order_clause (str) – An SQL clause to order the nested rows.
new_nested_feature_name (str) – New name for the nested feature.
- Returns:
A feature group object with the updated nested feature.
- Return type:
- delete_nested_feature(nested_feature_name)
Delete a nested feature.
- Parameters:
nested_feature_name (str) – The name of the feature to be deleted.
- Returns:
A feature group object without the specified nested feature.
- Return type:
- create_point_in_time_feature(feature_name, history_table_name, aggregation_keys, timestamp_key, historical_timestamp_key, expression, lookback_window_seconds=None, lookback_window_lag_seconds=0, lookback_count=None, lookback_until_position=0)
Creates a new point in time feature in a feature group using another historical feature group, window spec, and aggregate expression.
We use the aggregation keys and either the lookbackWindowSeconds or the lookbackCount values to perform the window aggregation for every row in the current feature group.
If the window is specified in seconds, all rows in the history table that match the aggregation keys and have a historicalTimeFeature greater than or equal to lookbackStartCount and less than the current row’s timeFeature are considered. An optional lookbackWindowLagSeconds (positive or negative) can be used to offset the current value of the timeFeature. If this value is negative, future rows in the history table are included, so care must be taken to ensure that these rows are available in the online context when a lookup is performed on this feature group. If the window is specified in counts, the history table rows are ordered by time, and the window covers rows whose rank order is greater than or equal to lookbackCount, up to and including the row just prior to the current one. In that case the lag is specified in terms of positions using lookbackUntilPosition.
- Parameters:
feature_name (str) – The name of the feature to create.
history_table_name (str) – The table name of the history table.
aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.
timestamp_key (str) – Name of feature which contains the timestamp value for the point in time feature.
historical_timestamp_key (str) – Name of feature which contains the historical timestamp.
expression (str) – SQL aggregate expression which can convert a sequence of rows into a scalar value.
lookback_window_seconds (float) – If window is specified in terms of time, number of seconds in the past from the current time for start of the window.
lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window. If it is negative, we are looking at the “future” rows in the history table.
lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row).
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of window by that many rows. If it is negative, we are looking at that many “future” rows in the history table.
- Returns:
A feature group object with the newly added point in time feature.
- Return type:
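The time-based window described for create_point_in_time_feature can be pictured with a small local sketch. It is one reading of the description, not the service's implementation: for each current row, history rows with matching aggregation keys are kept when their timestamp falls in the half-open interval [t - lag - lookback, t - lag), and an aggregate function stands in for the SQL expression. The function name and the sample data are hypothetical.

```python
def point_in_time_aggregate(
    current_rows,
    history_rows,
    aggregation_keys,
    timestamp_key,
    historical_timestamp_key,
    aggregate,
    lookback_window_seconds,
    lookback_window_lag_seconds=0,
):
    """Aggregate matching history rows inside a time window for every current row."""
    results = []
    for row in current_rows:
        # The lag offsets the current timestamp; a negative lag reaches into the future.
        window_end = row[timestamp_key] - lookback_window_lag_seconds
        window_start = window_end - lookback_window_seconds
        window = [
            h
            for h in history_rows
            if all(h[key] == row[key] for key in aggregation_keys)
            and window_start <= h[historical_timestamp_key] < window_end
        ]
        results.append(aggregate(window))
    return results


history = [
    {"user": "a", "ts": 100, "amount": 5},
    {"user": "a", "ts": 150, "amount": 7},
    {"user": "a", "ts": 190, "amount": 1},
    {"user": "b", "ts": 160, "amount": 100},
]
current = [{"user": "a", "ts": 200}]


def total_amount(rows):
    return sum(r["amount"] for r in rows)


no_lag = point_in_time_aggregate(current, history, ["user"], "ts", "ts", total_amount, 60)
with_lag = point_in_time_aggregate(current, history, ["user"], "ts", "ts", total_amount, 60, 20)
print(no_lag, with_lag)
```

With a 60-second lookback and no lag, the window for the row at ts=200 is [140, 200), so the rows at 150 and 190 contribute. With a lag of 20 the window shifts to [120, 180) and only the row at 150 remains.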
- update_point_in_time_feature(feature_name, history_table_name=None, aggregation_keys=None, timestamp_key=None, historical_timestamp_key=None, expression=None, lookback_window_seconds=None, lookback_window_lag_seconds=None, lookback_count=None, lookback_until_position=None, new_feature_name=None)
Updates an existing Point-in-Time (PiT) feature in a feature group. See createPointInTimeFeature for detailed semantics.
- Parameters:
feature_name (str) – The name of the feature.
history_table_name (str) – The table name of the history table. If not specified, we use the current table to do a self join.
aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.
timestamp_key (str) – Name of the feature which contains the timestamp value for the PiT feature.
historical_timestamp_key (str) – Name of the feature which contains the historical timestamp.
expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.
lookback_window_seconds (float) – If the window is specified in terms of time, the number of seconds in the past from the current time for the start of the window.
lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of the window. If it is negative, we are looking at the “future” rows in the history table.
lookback_count (int) – If the window is specified in terms of count, the start position of the window (0 is the current row).
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of the window by that many rows. If it is negative, we are looking at that many “future” rows in the history table.
new_feature_name (str) – New name for the PiT feature.
- Returns:
A feature group object with the updated point in time feature.
- Return type:
- create_point_in_time_group(group_name, window_key, aggregation_keys, history_table_name=None, history_window_key=None, history_aggregation_keys=None, lookback_window=None, lookback_window_lag=0, lookback_count=None, lookback_until_position=0)
Create a Point-in-Time Group
- Parameters:
group_name (str) – The name of the point in time group.
window_key (str) – Name of feature to use for ordering the rows on the source table.
aggregation_keys (list) – List of keys to use on the source table for the window aggregation.
history_table_name (str) – The table to use for aggregating. If not provided, the source table will be used.
history_window_key (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table will be used.
history_aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table will be used. Must be the same length and order as the source table’s aggregationKeys.
lookback_window (float) – Number of seconds in the past from the current time for the start of the window. If 0, the lookback will include all rows.
lookback_window_lag (float) – Optional lag to offset the closest point for the window. If it is positive, the start of the window is delayed. If it is negative, “future” rows in the history table are used.
lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row).
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, the start of the window is delayed by that many rows. If it is negative, that many “future” rows in the history table are used.
- Returns:
The feature group after the point in time group has been created.
- Return type:
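The fallback rules for the history_* arguments of create_point_in_time_group can be sketched as follows: unset values fall back to the source table's settings, and history_aggregation_keys must pair up one-to-one with aggregation_keys. `resolve_history_settings` and the table and column names are hypothetical, and the real resolution happens on the service side.

```python
def resolve_history_settings(
    source_table_name,
    window_key,
    aggregation_keys,
    history_table_name=None,
    history_window_key=None,
    history_aggregation_keys=None,
):
    """Apply the documented defaults and pair source keys with history keys."""
    if history_aggregation_keys is None:
        history_aggregation_keys = list(aggregation_keys)
    elif len(history_aggregation_keys) != len(aggregation_keys):
        raise ValueError(
            "history_aggregation_keys must match aggregation_keys in length and order"
        )
    return {
        "history_table_name": history_table_name or source_table_name,
        "history_window_key": history_window_key or window_key,
        # Keys are matched by position, which is why the order matters.
        "key_pairs": list(zip(aggregation_keys, history_aggregation_keys)),
    }


self_join = resolve_history_settings("events", "event_ts", ["user_id"])
cross_join = resolve_history_settings(
    "events",
    "event_ts",
    ["user_id", "store_id"],
    history_table_name="purchases",
    history_window_key="purchase_ts",
    history_aggregation_keys=["buyer_id", "shop_id"],
)
print(self_join)
print(cross_join)
```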
- generate_point_in_time_features(group_name, columns, window_functions, prefix=None)
Generates and adds PIT features given the selected columns to aggregate over, and the operations to include.
- Parameters:
- Returns:
Feature group object with newly added point-in-time features.
- Return type:
- update_point_in_time_group(group_name, window_key=None, aggregation_keys=None, history_table_name=None, history_window_key=None, history_aggregation_keys=None, lookback_window=None, lookback_window_lag=None, lookback_count=None, lookback_until_position=None)
Update Point-in-Time Group
- Parameters:
group_name (str) – The name of the point-in-time group.
window_key (str) – Name of feature which contains the timestamp value for the point-in-time feature.
aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.
history_table_name (str) – The table to use for aggregating. If not provided, the source table will be used.
history_window_key (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table will be used.
history_aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table will be used. Must be the same length and order as the source table’s aggregationKeys.
lookback_window (float) – Number of seconds in the past from the current time for the start of the window.
lookback_window_lag (float) – Optional lag to offset the closest point for the window. If it is positive, the start of the window is delayed. If it is negative, future rows in the history table are looked at.
lookback_count (int) – If window is specified in terms of count, the start position of the window (0 is the current row).
lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, the start of the window is delayed by that many rows. If it is negative, that many future rows in the history table are looked at.
- Returns:
The feature group after the update has been applied.
- Return type:
- delete_point_in_time_group(group_name)
Delete point in time group
- Parameters:
group_name (str) – The name of the point in time group.
- Returns:
The feature group after the point in time group has been deleted.
- Return type:
- create_point_in_time_group_feature(group_name, name, expression)
Create point in time group feature
- Parameters:
- Returns:
The feature group after the update has been applied.
- Return type:
- update_point_in_time_group_feature(group_name, name, expression)
Update a feature’s SQL expression in a point in time group
- Parameters:
- Returns:
The feature group after the update has been applied.
- Return type:
- set_feature_type(feature, feature_type, project_id=None)
Set the type of a feature in a feature group. Specify the feature group ID, feature name, and feature type, and the method will return the new column with the changes reflected.
- concatenate_data(source_feature_group_id, merge_type='UNION', replace_until_timestamp=None, skip_materialize=False)
Concatenates data from one Feature Group to another. Feature Groups can be merged if their schemas are compatible, they have the special updateTimestampKey column, and (if set) the primaryKey column. The second operand in the concatenate operation will be appended to the first operand (merge target).
- Parameters:
source_feature_group_id (str) – The Feature Group to concatenate with the destination Feature Group.
merge_type (str) – UNION or INTERSECTION.
replace_until_timestamp (int) – The UNIX timestamp to specify the point until which we will replace data from the source Feature Group.
skip_materialize (bool) – If True, will not materialize the concatenated Feature Group.
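Since replace_until_timestamp is an integer UNIX timestamp, a timezone-aware datetime can be converted before the call. The sketch below assumes the value is expressed in seconds, which the text above does not state explicitly. `to_unix_timestamp` is a hypothetical helper, and the source ID is a placeholder.

```python
from datetime import datetime, timezone


def to_unix_timestamp(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the UNIX epoch."""
    if moment.tzinfo is None:
        # Naive datetimes would be interpreted in local time, which is easy to get wrong.
        raise ValueError("pass a timezone-aware datetime")
    return int(moment.timestamp())


replace_until = to_unix_timestamp(datetime(2024, 1, 1, tzinfo=timezone.utc))

# Keyword arguments in the shape of the concatenate_data signature above.
concatenate_kwargs = {
    "source_feature_group_id": "<source-feature-group-id>",  # placeholder
    "merge_type": "UNION",
    "replace_until_timestamp": replace_until,
    "skip_materialize": False,
}
print(concatenate_kwargs)
```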
- remove_concatenation_config()
Removes the concatenation config on a destination feature group.
- Parameters:
feature_group_id (str) – Unique identifier of the destination feature group to remove the concatenation configuration from.
- refresh()
Calls describe and refreshes the current object’s fields
- Returns:
The current object
- Return type:
- describe()
Describe a Feature Group.
- Parameters:
feature_group_id (str) – A unique string identifier associated with the feature group.
- Returns:
The feature group object.
- Return type:
- set_indexing_config(primary_key=None, update_timestamp_key=None, lookup_keys=None)
Sets various attributes of the feature group used for primary key, deployment lookups and streaming updates.
- Parameters:
primary_key (str) – Name of the feature which defines the primary key of the feature group.
update_timestamp_key (str) – Name of the feature which defines the update timestamp of the feature group. Used in concatenation and primary key deduplication.
lookup_keys (list) – List of feature names which can be used in the lookup API to restrict the computation to a set of dataset rows. These feature names have to correspond to underlying dataset columns.
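The update timestamp key is described above as being used for primary key deduplication. One plausible reading, sketched locally, is that the row with the latest update timestamp wins for each primary key value. This is an assumption about the behavior rather than a statement of how the service resolves duplicates, and the function and sample data are hypothetical.

```python
def deduplicate_by_primary_key(rows, primary_key, update_timestamp_key):
    """Keep one row per primary key value, preferring the latest update timestamp."""
    latest = {}
    for row in rows:
        key = row[primary_key]
        kept = latest.get(key)
        if kept is None or row[update_timestamp_key] >= kept[update_timestamp_key]:
            latest[key] = row
    # Dicts preserve insertion order, so keys appear in first-seen order.
    return list(latest.values())


rows = [
    {"id": 1, "updated_at": 10, "value": "old"},
    {"id": 2, "updated_at": 5, "value": "only"},
    {"id": 1, "updated_at": 20, "value": "new"},
]
deduplicated = deduplicate_by_primary_key(rows, "id", "updated_at")
print(deduplicated)
```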
- update(description=None)
Modify an existing Feature Group.
- Parameters:
description (str) – Description of the Feature Group.
- Returns:
Updated Feature Group object.
- Return type:
- detach_from_template()
Update a feature group to detach it from a template.
- Parameters:
feature_group_id (str) – Unique string identifier associated with the feature group.
- Returns:
The updated feature group.
- Return type:
- update_template_bindings(template_bindings=None)
Update the feature group template bindings for a template feature group.
- Parameters:
template_bindings (list) – Values in these bindings override values set in the template.
- Returns:
Updated feature group.
- Return type:
- update_python_function_bindings(python_function_bindings)
Updates an existing Feature Group’s Python function bindings from a user-provided Python Function. If a list of feature groups is supplied within the Python function bindings, we will provide DataFrames (Pandas in the case of Python) with the materialized feature groups for those input feature groups as arguments to the function.
- Parameters:
python_function_bindings (List) – List of python function arguments.
- update_python_function(python_function_name, python_function_bindings=None, cpu_size=None, memory=None, use_gpu=None, use_original_csv_names=None)
Updates an existing Feature Group’s Python function from a user-provided Python Function. If a list of feature groups is supplied within the Python function bindings, we will provide DataFrames (Pandas in the case of Python) with the materialized feature groups for those input feature groups as arguments to the function.
- Parameters:
python_function_name (str) – The name of the python function to be associated with the feature group.
python_function_bindings (List) – List of python function arguments.
cpu_size (CPUSize) – Size of the CPU for the feature group python function.
memory (MemorySize) – Memory (in GB) for the feature group python function.
use_gpu (bool) – Whether the feature group needs a GPU. If not, it defaults to CPU.
use_original_csv_names (bool) – If enabled, it uses the original column names for input feature groups from CSV datasets.
- update_sql_definition(sql)
Updates the SQL statement for a feature group.
- Parameters:
sql (str) – The input SQL statement for the feature group.
- Returns:
The updated feature group.
- Return type:
- update_dataset_feature_expression(feature_expression)
Updates the SQL feature expression for a Dataset FeatureGroup’s custom features
- Parameters:
feature_expression (str) – The input SQL statement for the feature group.
- Returns:
The updated feature group.
- Return type:
- update_version_limit(version_limit)
Updates the version limit for the feature group.
- Parameters:
version_limit (int) – The maximum number of versions permitted for the feature group. Once this limit is exceeded, the oldest versions will be purged in a First-In-First-Out (FIFO) order.
- Returns:
The updated feature group.
- Return type:
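The FIFO purging described for version_limit can be pictured with a small toy model. This is only a sketch of the stated rule (oldest versions are dropped once the limit is exceeded), not how the service implements it; the helper name record_version and the version labels are hypothetical.

```python
from collections import deque


def record_version(versions, new_version, version_limit):
    """Append new_version, then purge the oldest entries beyond version_limit.

    Returns the purged versions, oldest first. Hypothetical helper, for
    illustration only.
    """
    versions.append(new_version)
    purged = []
    while len(versions) > version_limit:
        # The oldest version sits at the left end of the deque.
        purged.append(versions.popleft())
    return purged


versions = deque()
for label in ["v1", "v2", "v3", "v4"]:
    purged = record_version(versions, label, version_limit=3)
```

After the fourth version is recorded with a limit of 3, the first version is the one that gets purged.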
- update_feature(name, select_expression=None, new_name=None)
Modifies an existing feature in a feature group.
- Parameters:
- Returns:
Updated feature group object.
- Return type:
- list_exports()
Lists all exports of the feature group.
- Parameters:
feature_group_id (str) – Unique identifier of the feature group
- Returns:
List of feature group exports
- Return type:
- set_modifier_lock(locked=True)
Lock a feature group to prevent modification.
- Parameters:
locked (bool) – True to lock the feature group against modification, False to allow modification again.
- list_modifiers()
List the users who can modify a given feature group.
- Parameters:
feature_group_id (str) – Unique string identifier of the feature group.
- Returns:
Information about the modification lock status and groups/organizations added to the feature group.
- Return type:
- add_user_to_modifiers(email)
Adds a user to the feature group’s modifiers list.
- Parameters:
email (str) – The email address of the user to be added.
- add_organization_group_to_modifiers(organization_group_id)
Adds an OrganizationGroup to the feature group’s modifiers list.
- Parameters:
organization_group_id (str) – Unique string identifier of the organization group.
- remove_user_from_modifiers(email)
Removes a user from the feature group’s modifiers list.
- Parameters:
email (str) – The email address of the user to be removed.
- remove_organization_group_from_modifiers(organization_group_id)
Removes an OrganizationGroup from the feature group’s modifiers list.
- Parameters:
organization_group_id (str) – The unique ID associated with the organization group.
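The modifier methods above are typically used together. The following is a minimal sketch, assuming feature_group is a FeatureGroup-like object exposing set_modifier_lock, add_user_to_modifiers, and list_modifiers as documented here; the wrapper name restrict_modifiers is hypothetical and not part of the library.

```python
def restrict_modifiers(feature_group, emails):
    """Lock a feature group and allow only the given users to modify it.

    `feature_group` is assumed to expose the methods documented in this
    section; this wrapper itself is a hypothetical convenience.
    """
    # Lock first so that modification is restricted to the modifiers list.
    feature_group.set_modifier_lock(locked=True)
    for email in emails:
        feature_group.add_user_to_modifiers(email)
    # Return the lock status and modifiers as reported by the service.
    return feature_group.list_modifiers()
```

Because the wrapper only relies on method names, it can be exercised against a stand-in object before being pointed at a real feature group.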
- delete_feature(name)
Removes a feature from the feature group.
- Parameters:
name (str) – Name of the feature to be deleted.
- Returns:
Updated feature group object.
- Return type:
- delete()
Deletes a Feature Group.
- Parameters:
feature_group_id (str) – Unique string identifier for the feature group to be removed.
- create_version(variable_bindings=None)
Creates a snapshot for a specified feature group. Triggers materialization of the feature group. The new version of the feature group is created after it has materialized.
- Parameters:
variable_bindings (dict) – Dictionary defining variable bindings that override parent feature group values.
- Returns:
A feature group version.
- Return type:
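Since create_version only triggers materialization, a caller usually waits before using the new version. Below is a hedged sketch of that sequence, assuming feature_group is a FeatureGroup-like object with the update_sql_definition, create_version, and wait_for_materialization methods documented in this section; the function name redefine_and_snapshot is hypothetical.

```python
def redefine_and_snapshot(feature_group, sql, timeout=7200):
    """Update the SQL definition, trigger a new version, and wait for it.

    Hypothetical helper; it only chains methods documented in this section.
    """
    feature_group.update_sql_definition(sql)
    # create_version() triggers materialization and returns the new version.
    version = feature_group.create_version()
    # Block until the feature group has materialized (or the wait times out).
    feature_group.wait_for_materialization(timeout=timeout)
    return version
```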
- list_versions(limit=100, start_after_version=None)
Retrieves a list of all feature group versions for the specified feature group.
- Parameters:
- Returns:
A list of feature group versions.
- Return type:
- set_export_connector_config(feature_group_export_config=None)
Sets the export connector config for the given feature group.
- Parameters:
feature_group_export_config (FeatureGroupExportConfig) – The export config to be set for the given feature group.
- set_export_on_materialization(enable)
Can be used to enable or disable exporting feature group data to the export connector associated with the feature group.
- Parameters:
enable (bool) – If true, will enable exporting feature group to the connector. If false, will disable.
- create_template(name, template_sql, template_variables, description=None, template_bindings=None, should_attach_feature_group_to_template=False)
Create a feature group template.
- Parameters:
name (str) – User-friendly name for this feature group template.
template_sql (str) – The template SQL that will be resolved by applying values from the template variables to generate SQL for a feature group.
template_variables (list) – The template variables for resolving the template.
description (str) – Description of this feature group template.
template_bindings (list) – If the feature group will be attached to the newly created template, set these variable bindings on that feature group.
should_attach_feature_group_to_template (bool) – Set to True to convert the feature group to a template feature group and attach it to the newly created template.
- Returns:
The created feature group template.
- Return type:
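To make the relationship between template SQL, template variables, and bindings concrete, here is a toy resolver. Everything about it is an assumption for illustration: the `$name` placeholder syntax comes from Python's string.Template rather than from the service, and the dictionary shape with "name" and "value" keys is hypothetical. It only mirrors the documented idea that variables fill in the SQL and bindings override the template's values.

```python
from string import Template


def resolve_template_sql(template_sql, template_variables, template_bindings=None):
    """Resolve placeholders in template_sql (illustrative only).

    Variables supply default values; bindings, if given, override them.
    """
    values = {var["name"]: var["value"] for var in template_variables}
    for binding in template_bindings or []:
        values[binding["name"]] = binding["value"]
    # substitute() raises KeyError if a placeholder has no value.
    return Template(template_sql).substitute(values)


sql = resolve_template_sql(
    "SELECT * FROM $table WHERE amount > $min_amount",
    template_variables=[
        {"name": "table", "value": "orders"},
        {"name": "min_amount", "value": 100},
    ],
    template_bindings=[{"name": "min_amount", "value": 250}],
)
```

Here the binding for min_amount replaces the template's default of 100, while table keeps its template value.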
- suggest_template_for()
Suggests values for a feature group template, based on a feature group.
- Parameters:
feature_group_id (str) – Unique identifier associated with the feature group to use for suggesting values to use in the template.
- Returns:
The suggested feature group template.
- Return type:
- get_recent_streamed_data()
Returns recently streamed data to a streaming feature group.
- Parameters:
feature_group_id (str) – Unique string identifier associated with the feature group.
- append_data(streaming_token, data)
Appends new data into the feature group for a given lookup key recordId.
- append_multiple_data(streaming_token, data)
Appends new data into the feature group for a given lookup key recordId.
- upsert_data(data, streaming_token=None, blobs=None)
Updates the feature group row for a given lookup key record ID if the record ID is found; otherwise, inserts the data as a new row.
- Parameters:
- Returns:
The feature group row that was upserted.
- Return type:
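The update-or-insert rule can be illustrated with an in-memory dictionary keyed by record ID. This is a sketch of the semantics only, not the service's implementation; the helper upsert_row, the user_id lookup key, and the choice to merge fields on update (rather than replace the whole row) are all assumptions.

```python
def upsert_row(rows, lookup_key, data):
    """Update the row whose record ID matches, or insert a new one.

    `rows` maps record ID -> row dict. Returns (action, row). Hypothetical
    helper; merging fields on update is an assumption of this sketch.
    """
    record_id = data[lookup_key]
    if record_id in rows:
        rows[record_id].update(data)
        action = "updated"
    else:
        rows[record_id] = dict(data)
        action = "inserted"
    return action, rows[record_id]


rows = {}
first_action, _ = upsert_row(rows, "user_id", {"user_id": "u1", "plan": "free"})
second_action, row = upsert_row(rows, "user_id", {"user_id": "u1", "plan": "pro"})
```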
- delete_data(primary_key)
Deletes a row from the feature group, given its primary key.
- Parameters:
primary_key (str) – The primary key value for which to delete the feature group row
- get_data(primary_key=None, num_rows=None)
Gets the feature group rows for online updatable feature groups.
If primary_key is set, the row corresponding to that primary key is returned. If num_rows is set, at most num_rows of the most recently updated rows are returned.
- Parameters:
- Returns:
A list of feature group rows.
- Return type:
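The two lookup modes can be sketched over a plain list of row dictionaries. The field names id and updated_at are hypothetical, and the source does not say what happens when both or neither argument is given; this sketch arbitrarily lets primary_key take precedence and returns all rows, newest first, when neither is set.

```python
def get_rows(rows, primary_key=None, num_rows=None):
    """Return rows by primary key, or the most recently updated rows.

    Illustrative only: `id` and `updated_at` are assumed field names.
    """
    if primary_key is not None:
        return [row for row in rows if row["id"] == primary_key]
    latest_first = sorted(rows, key=lambda row: row["updated_at"], reverse=True)
    return latest_first if num_rows is None else latest_first[:num_rows]


rows = [
    {"id": "a", "updated_at": 1},
    {"id": "b", "updated_at": 3},
    {"id": "c", "updated_at": 2},
]
by_key = get_rows(rows, primary_key="c")
latest_two = get_rows(rows, num_rows=2)
```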
- get_natural_language_explanation(feature_group_version=None, model_id=None)
Returns the saved natural language explanation of the artifact with the given ID. The artifact can be a Feature Group, a Feature Group Version, or a Model.
- Parameters:
- Returns:
The object containing natural language explanation(s) as field(s).
- Return type:
- generate_natural_language_explanation(feature_group_version=None, model_id=None)
Generates a natural language explanation of the artifact with the given ID. The artifact can be a Feature Group, a Feature Group Version, or a Model.
- Parameters:
- Returns:
The object containing natural language explanation(s) as field(s).
- Return type:
- wait_for_dataset(timeout=7200)
Waits until the feature group’s dataset, if any, is ready for use.
- Parameters:
timeout (int) – Maximum time to wait, in seconds, before the call times out. Defaults to 7200 seconds.
- wait_for_upload(timeout=7200)
Waits for a feature group created from a dataframe to be ready for materialization and version creation.
- Parameters:
timeout (int) – Maximum time to wait, in seconds, before the call times out. Defaults to 7200 seconds.
- wait_for_materialization(timeout=7200)
Waits until the feature group is materialized.
- Parameters:
timeout (int) – Maximum time to wait, in seconds, before the call times out. Defaults to 7200 seconds.
- wait_for_streaming_ready(timeout=600)
Waits for the feature group indexing config to be applied for streaming.
- Parameters:
timeout (int) – Maximum time to wait, in seconds, before the call times out. Defaults to 600 seconds.
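The wait_for_* methods all share the same shape: poll until a condition holds or the timeout elapses. A generic sketch of that pattern follows. It is not the library's implementation; the helper wait_until, its poll_interval parameter, and the choice to raise TimeoutError on expiry are assumptions.

```python
import time


def wait_until(is_ready, timeout=7200, poll_interval=5.0):
    """Poll is_ready() until it returns True or `timeout` seconds elapse.

    Hypothetical helper illustrating the polling pattern; raising
    TimeoutError on expiry is an assumption of this sketch.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not ready after {timeout} seconds")
        time.sleep(poll_interval)
```

Using time.monotonic() keeps the deadline unaffected by system clock adjustments during a long wait.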
- get_status(streaming_status=False)
Gets the status of the feature group.
- load_as_pandas()
Loads the feature group into a pandas DataFrame.
- Returns:
A pandas dataframe with annotations and text_snippet columns.
- Return type:
DataFrame
- load_as_pandas_documents(doc_id_column='doc_id', document_column='page_infos')
Loads a feature group with documents data into a pandas dataframe.
- Parameters:
doc_id_column (str) – The name of the feature / column containing the document ID.
document_column (str) – The name of the feature / column which either contains the document data itself or page infos with path to remotely stored documents. This column will be replaced with the extracted document data.
- Returns:
A pandas dataframe containing the extracted document data.
- Return type:
DataFrame
- describe_dataset()
Displays the dataset attached to a feature group.
- Returns:
A dataset object with all the relevant information about the dataset.
- Return type:
- materialize()
Materializes the feature group’s latest changes as of the time of the API call. Materialization is skipped if nothing has changed since the current latest version.
- Returns:
A feature group object with the latest changes materialized.
- Return type: