abacusai.api_class.feature_group
Classes
SamplingConfig | An abstract class for the sampling config of a feature group.
NSamplingConfig | The number of distinct values of the key columns to include in the sample, or the number of rows if key columns are not specified.
PercentSamplingConfig | The fraction of distinct values of the feature group to include in the sample.
_SamplingConfigFactory | Helper class that provides a standard way to create an ABC using inheritance.
MergeConfig | An abstract class for the merge config of a feature group.
LastNMergeConfig | Merge LAST N chunks/versions of an incremental dataset.
TimeWindowMergeConfig | Merge rows within a given time window of the most recent timestamp.
_MergeConfigFactory | Helper class that provides a standard way to create an ABC using inheritance.
OperatorConfig | Configuration for a template Feature Group Operation.
UnpivotConfig | Unpivot Columns in a FeatureGroup.
MarkdownConfig | Transform an input column to a markdown column.
CrawlerTransformConfig | Transform an input column of URLs to HTML text.
ExtractDocumentDataConfig | Extracts data from documents.
DataGenerationConfig | Generate synthetic data using a model for finetuning an LLM.
UnionTransformConfig | Takes the union of the current feature group with one or more selected feature groups of the same type.
_OperatorConfigFactory | A class to select and return the correct type of Operator Config based on a serialized OperatorConfig instance.
Module Contents
- class abacusai.api_class.feature_group.SamplingConfig
Bases:
abacusai.api_class.abstract.ApiClass
An abstract class for the sampling config of a feature group
- sampling_method: abacusai.api_class.enums.SamplingMethodType
- classmethod _get_builder()
- __post_init__()
- class abacusai.api_class.feature_group.NSamplingConfig
Bases:
SamplingConfig
The number of distinct values of the key columns to include in the sample, or the number of rows if key columns are not specified.
- Parameters:
- __post_init__()
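The sampling rule described for NSamplingConfig can be sketched in plain Python. This is only an illustration of the stated semantics, not the library's implementation: the function `sample_rows` and its parameters `sample_count`, `key_columns`, and `seed` are hypothetical names chosen for the example.

```python
import random


def sample_rows(rows, sample_count, key_columns=None, seed=0):
    """Illustrative N-sampling over a list of dict rows (hypothetical helper).

    With key columns: pick `sample_count` distinct key values and keep every
    row that carries one of them. Without key columns: pick `sample_count` rows.
    """
    rng = random.Random(seed)
    if not key_columns:
        return rng.sample(rows, min(sample_count, len(rows)))
    # Collect the distinct key tuples, sorted so the draw is reproducible.
    keys = sorted({tuple(row[c] for c in key_columns) for row in rows})
    chosen = set(rng.sample(keys, min(sample_count, len(keys))))
    return [row for row in rows if tuple(row[c] for c in key_columns) in chosen]


rows = [
    {"user": "a", "event": 1},
    {"user": "a", "event": 2},
    {"user": "b", "event": 3},
    {"user": "c", "event": 4},
]
by_key = sample_rows(rows, sample_count=2, key_columns=["user"])
by_row = sample_rows(rows, sample_count=2)
```

In the keyed case the result can contain more than `sample_count` rows, because every row belonging to a selected key value is retained.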
- class abacusai.api_class.feature_group.PercentSamplingConfig
Bases:
SamplingConfig
The fraction of distinct values of the feature group to include in the sample.
- Parameters:
- __post_init__()
- class abacusai.api_class.feature_group._SamplingConfigFactory
Bases:
abacusai.api_class.abstract._ApiClassFactory
Helper class that provides a standard way to create an ABC using inheritance.
- config_class_key = 'sampling_method'
- config_abstract_class
- config_class_map
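The three attributes above suggest a dispatch pattern: read the value stored under `config_class_key` and look up the concrete class in `config_class_map`. The sketch below shows that pattern with stand-in classes. Everything in it is an assumption made for illustration: the `Sketch*` classes, the `from_dict` method, the field names, and the string keys are not taken from the abacusai source.

```python
from dataclasses import dataclass


# Hypothetical stand-ins; these are NOT the abacusai classes.
@dataclass
class SketchSamplingConfig:
    sampling_method: str


@dataclass
class SketchNSampling(SketchSamplingConfig):
    sample_count: int = 0


@dataclass
class SketchPercentSampling(SketchSamplingConfig):
    sample_percent: float = 0.0


class SketchSamplingFactory:
    # Mirrors the attribute names listed above; the values are assumed.
    config_class_key = "sampling_method"
    config_abstract_class = SketchSamplingConfig
    config_class_map = {
        "N_SAMPLING": SketchNSampling,
        "PERCENT_SAMPLING": SketchPercentSampling,
    }

    @classmethod
    def from_dict(cls, payload: dict) -> SketchSamplingConfig:
        """Pick the concrete class from the discriminator key and build it."""
        key = payload[cls.config_class_key]
        config_class = cls.config_class_map.get(key)
        if config_class is None:
            raise ValueError(f"Unknown {cls.config_class_key}: {key!r}")
        return config_class(**payload)


config = SketchSamplingFactory.from_dict(
    {"sampling_method": "N_SAMPLING", "sample_count": 100}
)
```

Keeping the discriminator key and the class map as class attributes means a new config type only needs one extra map entry.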
- class abacusai.api_class.feature_group.MergeConfig
Bases:
abacusai.api_class.abstract.ApiClass
An abstract class for the merge config of a feature group
- merge_mode: abacusai.api_class.enums.MergeMode
- classmethod _get_builder()
- __post_init__()
- class abacusai.api_class.feature_group.LastNMergeConfig
Bases:
MergeConfig
Merge LAST N chunks/versions of an incremental dataset.
- Parameters:
- __post_init__()
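As a rough illustration of what merging the last N versions could mean, the sketch below concatenates the rows of the most recent versions of an incremental dataset. The helper `merge_last_n`, the parameter `num_versions`, and the oldest-to-newest ordering of `versions` are assumptions for the example, not documented behavior.

```python
def merge_last_n(versions, num_versions):
    """Concatenate the rows of the last `num_versions` versions (hypothetical helper).

    `versions` is assumed to be ordered from oldest to newest, each entry
    being the list of rows added in that version.
    """
    if num_versions <= 0:
        # Guard: versions[-0:] would otherwise return every version.
        return []
    merged = []
    for version_rows in versions[-num_versions:]:
        merged.extend(version_rows)
    return merged


versions = [[{"v": 1}], [{"v": 2}], [{"v": 3}]]
latest_two = merge_last_n(versions, num_versions=2)
```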
- class abacusai.api_class.feature_group.TimeWindowMergeConfig
Bases:
MergeConfig
Merge rows within a given time window of the most recent timestamp.
- Parameters:
- __post_init__()
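The time-window rule can be sketched as a filter relative to the newest timestamp. This is a minimal illustration under stated assumptions: the helper `merge_time_window`, its parameters `timestamp_column` and `window_ms`, millisecond integer timestamps, and an inclusive cutoff are all choices made for the example.

```python
def merge_time_window(rows, timestamp_column, window_ms):
    """Keep rows no older than `window_ms` before the newest timestamp.

    Hypothetical helper: timestamps are assumed to be integers in
    milliseconds, and the cutoff is assumed to be inclusive.
    """
    if not rows:
        return []
    latest = max(row[timestamp_column] for row in rows)
    cutoff = latest - window_ms
    return [row for row in rows if row[timestamp_column] >= cutoff]


rows = [
    {"id": 1, "ts": 1_000},
    {"id": 2, "ts": 5_000},
    {"id": 3, "ts": 9_000},
    {"id": 4, "ts": 10_000},
]
recent = merge_time_window(rows, timestamp_column="ts", window_ms=2_000)
```

The window is anchored to the most recent timestamp in the data, not to the current wall-clock time, which matches the wording of the class description.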
- class abacusai.api_class.feature_group._MergeConfigFactory
Bases:
abacusai.api_class.abstract._ApiClassFactory
Helper class that provides a standard way to create an ABC using inheritance.
- config_class_key = 'merge_mode'
- config_abstract_class
- config_class_map
- class abacusai.api_class.feature_group.OperatorConfig
Bases:
abacusai.api_class.abstract.ApiClass
Configuration for a template Feature Group Operation
- operator_type: abacusai.api_class.enums.OperatorType
- classmethod _get_builder()
- __post_init__()
- class abacusai.api_class.feature_group.UnpivotConfig
Bases:
OperatorConfig
Unpivot Columns in a FeatureGroup.
- Parameters:
columns (List[str]) – Which columns to unpivot.
index_column (str) – Name of new column containing the unpivoted column names as its values
value_column (str) – Name of new column containing the row values that were unpivoted.
exclude (bool) – If True, the unpivoted columns are all the columns EXCEPT the ones in the columns argument. Default is False.
- __post_init__()
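To make the four parameters concrete, here is a small sketch of an unpivot over a list of dict rows. It only illustrates the documented meaning of `columns`, `index_column`, `value_column`, and `exclude`; the function `unpivot` itself is a hypothetical helper, not the operator's implementation.

```python
def unpivot(rows, columns, index_column, value_column, exclude=False):
    """Turn selected columns into (name, value) pairs, one output row per pair."""
    out = []
    for row in rows:
        if exclude:
            # Unpivot every column EXCEPT the ones listed in `columns`.
            targets = [c for c in row if c not in columns]
        else:
            targets = [c for c in columns if c in row]
        # Columns that are not unpivoted are carried over unchanged.
        kept = {k: v for k, v in row.items() if k not in targets}
        for col in targets:
            out.append({**kept, index_column: col, value_column: row[col]})
    return out


rows = [{"id": 1, "q1": 10, "q2": 20}]
long_rows = unpivot(rows, ["q1", "q2"], index_column="quarter", value_column="sales")
same_rows = unpivot(rows, ["id"], index_column="quarter", value_column="sales", exclude=True)
```

With `exclude=True` the caller names the columns to keep, which is convenient when the set of value columns is large or changes over time.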
- class abacusai.api_class.feature_group.MarkdownConfig
Bases:
OperatorConfig
Transform an input column to a markdown column.
- Parameters:
input_column (str) – Name of input column to transform.
output_column (str) – Name of output column to store transformed data.
input_column_type (MarkdownOperatorInputType) – Type of input column to transform.
- input_column_type: abacusai.api_class.enums.MarkdownOperatorInputType
- __post_init__()
- class abacusai.api_class.feature_group.CrawlerTransformConfig
Bases:
OperatorConfig
Transform an input column of URLs to HTML text.
- Parameters:
input_column (str) – Name of input column to transform.
output_column (str) – Name of output column to store transformed data.
depth_column (str) – Increasing depth explores more links, capturing more content
disable_host_restriction (bool) – If True, will not restrict crawling to the same host.
honour_website_rules (bool) – If True, will respect robots.txt rules.
user_agent (str) – If provided, will use this user agent instead of randomly selecting one.
- __post_init__()
- class abacusai.api_class.feature_group.ExtractDocumentDataConfig
Bases:
OperatorConfig
Extracts data from documents.
- Parameters:
doc_id_column (str) – Name of input document ID column.
document_column (str) – Name of the input document column which contains the page infos. This column will be transformed to include the document processing config in the output feature group.
document_processing_config (DocumentProcessingConfig) – Document processing configuration.
- document_processing_config: abacusai.api_class.dataset.DocumentProcessingConfig
- __post_init__()
- class abacusai.api_class.feature_group.DataGenerationConfig
Bases:
OperatorConfig
Generate synthetic data using a model for finetuning an LLM.
- Parameters:
prompt_col (str) – Name of the input prompt column.
completion_col (str) – Name of the output completion column.
description_col (str) – Name of the description column.
id_col (str) – Name of the identifier column.
generation_instructions (str) – Instructions for the data generation model.
temperature (float) – Sampling temperature for the model.
fewshot_examples (int) – Number of fewshot examples used to prompt the model.
concurrency (int) – Number of concurrent processes.
examples_per_target (int) – Number of examples per target.
subset_size (Optional[int]) – Size of the subset to use for generation.
verify_response (bool) – Whether to verify the response.
token_budget (int) – Token budget for generation.
oversample (bool) – Whether to oversample the data.
documentation_char_limit (int) – Character limit for documentation.
frequency_penalty (float) – Penalty for frequency of token appearance.
model (str) – Model to use for data generation.
seed (Optional[int]) – Seed for random number generation.
- __post_init__()
- class abacusai.api_class.feature_group.UnionTransformConfig
Bases:
OperatorConfig
Takes the union of the current feature group with one or more selected feature groups of the same type.
- Parameters:
- __post_init__()
- class abacusai.api_class.feature_group._OperatorConfigFactory
Bases:
abacusai.api_class.abstract._ApiClassFactory
A class to select and return the correct type of Operator Config based on a serialized OperatorConfig instance.
- config_abstract_class
- config_class_key = 'operator_type'
- config_class_map