abacusai.api_class.feature_group
================================

.. py:module:: abacusai.api_class.feature_group


Classes
-------

.. autoapisummary::

   abacusai.api_class.feature_group.SamplingConfig
   abacusai.api_class.feature_group.NSamplingConfig
   abacusai.api_class.feature_group.PercentSamplingConfig
   abacusai.api_class.feature_group._SamplingConfigFactory
   abacusai.api_class.feature_group.MergeConfig
   abacusai.api_class.feature_group.LastNMergeConfig
   abacusai.api_class.feature_group.TimeWindowMergeConfig
   abacusai.api_class.feature_group._MergeConfigFactory
   abacusai.api_class.feature_group.OperatorConfig
   abacusai.api_class.feature_group.UnpivotConfig
   abacusai.api_class.feature_group.MarkdownConfig
   abacusai.api_class.feature_group.CrawlerTransformConfig
   abacusai.api_class.feature_group.ExtractDocumentDataConfig
   abacusai.api_class.feature_group.DataGenerationConfig
   abacusai.api_class.feature_group.UnionTransformConfig
   abacusai.api_class.feature_group._OperatorConfigFactory


Module Contents
---------------

.. py:class:: SamplingConfig

   Bases: :py:obj:`abacusai.api_class.abstract.ApiClass`


   An abstract class for the sampling config of a feature group


   .. py:attribute:: sampling_method
      :type:  abacusai.api_class.enums.SamplingMethodType
      :value: None


   .. py:method:: _get_builder()
      :classmethod:


   .. py:method:: __post_init__()


.. py:class:: NSamplingConfig

   Bases: :py:obj:`SamplingConfig`


   The number of distinct values of the key columns to include in the sample, or number of rows if key columns not specified.

   :param sample_count: The number of rows to include in the sample
   :type sample_count: int
   :param key_columns: The feature(s) to use as the key(s) when sampling
   :type key_columns: List[str]


   .. py:attribute:: sample_count
      :type:  int


   .. py:attribute:: key_columns
      :type:  List[str]
      :value: []


   .. py:method:: __post_init__()


.. py:class:: PercentSamplingConfig

   Bases: :py:obj:`SamplingConfig`


   The fraction of distinct values of the feature group to include in the sample.

   :param sample_percent: The percentage of the rows to sample
   :type sample_percent: float
   :param key_columns: The feature(s) to use as the key(s) when sampling
   :type key_columns: List[str]


   .. py:attribute:: sample_percent
      :type:  float


   .. py:attribute:: key_columns
      :type:  List[str]
      :value: []


   .. py:method:: __post_init__()


.. py:class:: _SamplingConfigFactory

   Bases: :py:obj:`abacusai.api_class.abstract._ApiClassFactory`


   Helper class that provides a standard way to create an ABC using
   inheritance.


   .. py:attribute:: config_class_key
      :value: 'sampling_method'


   .. py:attribute:: config_abstract_class


   .. py:attribute:: config_class_map


.. py:class:: MergeConfig

   Bases: :py:obj:`abacusai.api_class.abstract.ApiClass`


   An abstract class for the merge config of a feature group


   .. py:attribute:: merge_mode
      :type:  abacusai.api_class.enums.MergeMode
      :value: None


   .. py:method:: _get_builder()
      :classmethod:


   .. py:method:: __post_init__()


.. py:class:: LastNMergeConfig

   Bases: :py:obj:`MergeConfig`


   Merge LAST N chunks/versions of an incremental dataset.

   :param num_versions: The number of versions to merge. num_versions == 0 means merge all versions.
   :type num_versions: int
   :param include_version_timestamp_column: If set, include a column with the creation timestamp of source FG versions.
   :type include_version_timestamp_column: bool


   .. py:attribute:: num_versions
      :type:  int


   .. py:attribute:: include_version_timestamp_column
      :type:  bool
      :value: None


   .. py:method:: __post_init__()


.. py:class:: TimeWindowMergeConfig

   Bases: :py:obj:`MergeConfig`


   Merge rows within a given timewindow of the most recent timestamp

   :param feature_name: Time based column to index on
   :type feature_name: str
   :param time_window_size_ms: Range of merged rows will be [MAX_TIME - time_window_size_ms, MAX_TIME]
   :type time_window_size_ms: int
   :param include_version_timestamp_column: If set, include a column with the creation timestamp of source FG versions.
   :type include_version_timestamp_column: bool


   .. py:attribute:: feature_name
      :type:  str


   .. py:attribute:: time_window_size_ms
      :type:  int


   .. py:attribute:: include_version_timestamp_column
      :type:  bool
      :value: None


   .. py:method:: __post_init__()


.. py:class:: _MergeConfigFactory

   Bases: :py:obj:`abacusai.api_class.abstract._ApiClassFactory`


   Helper class that provides a standard way to create an ABC using
   inheritance.


   .. py:attribute:: config_class_key
      :value: 'merge_mode'


   .. py:attribute:: config_abstract_class


   .. py:attribute:: config_class_map


.. py:class:: OperatorConfig

   Bases: :py:obj:`abacusai.api_class.abstract.ApiClass`


   Configuration for a template Feature Group Operation


   .. py:attribute:: operator_type
      :type:  abacusai.api_class.enums.OperatorType
      :value: None


   .. py:method:: _get_builder()
      :classmethod:


   .. py:method:: __post_init__()


.. py:class:: UnpivotConfig

   Bases: :py:obj:`OperatorConfig`


   Unpivot Columns in a FeatureGroup.

   :param columns: Which columns to unpivot.
   :type columns: List[str]
   :param index_column: Name of new column containing the unpivoted column names as its values
   :type index_column: str
   :param value_column: Name of new column containing the row values that were unpivoted.
   :type value_column: str
   :param exclude: If True, the unpivoted columns are all the columns EXCEPT the ones in the columns argument. Default is False.
   :type exclude: bool


   .. py:attribute:: columns
      :type:  List[str]
      :value: None


   .. py:attribute:: index_column
      :type:  str
      :value: None


   .. py:attribute:: value_column
      :type:  str
      :value: None


   .. py:attribute:: exclude
      :type:  bool
      :value: None


   .. py:method:: __post_init__()


.. py:class:: MarkdownConfig

   Bases: :py:obj:`OperatorConfig`


   Transform a input column to a markdown column.

   :param input_column: Name of input column to transform.
   :type input_column: str
   :param output_column: Name of output column to store transformed data.
   :type output_column: str
   :param input_column_type: Type of input column to transform.
   :type input_column_type: MarkdownOperatorInputType


   .. py:attribute:: input_column
      :type:  str
      :value: None


   .. py:attribute:: output_column
      :type:  str
      :value: None


   .. py:attribute:: input_column_type
      :type:  abacusai.api_class.enums.MarkdownOperatorInputType
      :value: None


   .. py:method:: __post_init__()


.. py:class:: CrawlerTransformConfig

   Bases: :py:obj:`OperatorConfig`


   Transform a input column of urls to html text

   :param input_column: Name of input column to transform.
   :type input_column: str
   :param output_column: Name of output column to store transformed data.
   :type output_column: str
   :param depth_column: Increasing depth explores more links, capturing more content
   :type depth_column: str
   :param disable_host_restriction: If True, will not restrict crawling to the same host.
   :type disable_host_restriction: bool
   :param honour_website_rules: If True, will respect robots.txt rules.
   :type honour_website_rules: bool
   :param user_agent: If provided, will use this user agent instead of randomly selecting one.
   :type user_agent: str


   .. py:attribute:: input_column
      :type:  str
      :value: None


   .. py:attribute:: output_column
      :type:  str
      :value: None


   .. py:attribute:: depth_column
      :type:  str
      :value: None


   .. py:attribute:: input_column_type
      :type:  str
      :value: None


   .. py:attribute:: crawl_depth
      :type:  int
      :value: None


   .. py:attribute:: disable_host_restriction
      :type:  bool
      :value: None


   .. py:attribute:: honour_website_rules
      :type:  bool
      :value: None


   .. py:attribute:: user_agent
      :type:  str
      :value: None


   .. py:method:: __post_init__()


.. py:class:: ExtractDocumentDataConfig

   Bases: :py:obj:`OperatorConfig`


   Extracts data from documents.

   :param doc_id_column: Name of input document ID column.
   :type doc_id_column: str
   :param document_column: Name of the input document column which contains the page infos. This column will be transformed to include the document processing config in the output feature group.
   :type document_column: str
   :param document_processing_config: Document processing configuration.
   :type document_processing_config: DocumentProcessingConfig


   .. py:attribute:: doc_id_column
      :type:  str
      :value: None


   .. py:attribute:: document_column
      :type:  str
      :value: None


   .. py:attribute:: document_processing_config
      :type:  abacusai.api_class.dataset.DocumentProcessingConfig
      :value: None


   .. py:method:: __post_init__()


.. py:class:: DataGenerationConfig

   Bases: :py:obj:`OperatorConfig`


   Generate synthetic data using a model for finetuning an LLM.

   :param prompt_col: Name of the input prompt column.
   :type prompt_col: str
   :param completion_col: Name of the output completion column.
   :type completion_col: str
   :param description_col: Name of the description column.
   :type description_col: str
   :param id_col: Name of the identifier column.
   :type id_col: str
   :param generation_instructions: Instructions for the data generation model.
   :type generation_instructions: str
   :param temperature: Sampling temperature for the model.
   :type temperature: float
   :param fewshot_examples: Number of fewshot examples used to prompt the model.
   :type fewshot_examples: int
   :param concurrency: Number of concurrent processes.
   :type concurrency: int
   :param examples_per_target: Number of examples per target.
   :type examples_per_target: int
   :param subset_size: Size of the subset to use for generation.
   :type subset_size: Optional[int]
   :param verify_response: Whether to verify the response.
   :type verify_response: bool
   :param token_budget: Token budget for generation.
   :type token_budget: int
   :param oversample: Whether to oversample the data.
   :type oversample: bool
   :param documentation_char_limit: Character limit for documentation.
   :type documentation_char_limit: int
   :param frequency_penalty: Penalty for frequency of token appearance.
   :type frequency_penalty: float
   :param model: Model to use for data generation.
   :type model: str
   :param seed: Seed for random number generation.
   :type seed: Optional[int]


   .. py:attribute:: prompt_col
      :type:  str
      :value: None


   .. py:attribute:: completion_col
      :type:  str
      :value: None


   .. py:attribute:: description_col
      :type:  str
      :value: None


   .. py:attribute:: id_col
      :type:  str
      :value: None


   .. py:attribute:: generation_instructions
      :type:  str
      :value: None


   .. py:attribute:: temperature
      :type:  float
      :value: None


   .. py:attribute:: fewshot_examples
      :type:  int
      :value: None


   .. py:attribute:: concurrency
      :type:  int
      :value: None


   .. py:attribute:: examples_per_target
      :type:  int
      :value: None


   .. py:attribute:: subset_size
      :type:  int
      :value: None


   .. py:attribute:: verify_response
      :type:  bool
      :value: None


   .. py:attribute:: token_budget
      :type:  int
      :value: None


   .. py:attribute:: oversample
      :type:  bool
      :value: None


   .. py:attribute:: documentation_char_limit
      :type:  int
      :value: None


   .. py:attribute:: frequency_penalty
      :type:  float
      :value: None


   .. py:attribute:: model
      :type:  str
      :value: None


   .. py:attribute:: seed
      :type:  int
      :value: None


   .. py:method:: __post_init__()


.. py:class:: UnionTransformConfig

   Bases: :py:obj:`OperatorConfig`


   Takes Union of current feature group with 1 or more selected feature groups of same type.

   :param feature_group_ids: List of feature group IDs to union with source FG.
   :type feature_group_ids: List[str]
   :param drop_non_intersecting_columns: If true, will drop columns that are not present in all feature groups. If false fills missing columns with nulls.
   :type drop_non_intersecting_columns: bool


   .. py:attribute:: feature_group_ids
      :type:  List[str]
      :value: None


   .. py:attribute:: drop_non_intersecting_columns
      :type:  bool
      :value: False


   .. py:method:: __post_init__()


.. py:class:: _OperatorConfigFactory

   Bases: :py:obj:`abacusai.api_class.abstract._ApiClassFactory`


   A class to select and return the the correct type of Operator Config based on a serialized OperatorConfig instance.


   .. py:attribute:: config_abstract_class


   .. py:attribute:: config_class_key
      :value: 'operator_type'


   .. py:attribute:: config_class_map