abacusai.model

Classes

Model

A model

Module Contents

class abacusai.model.Model(client, name=None, modelId=None, modelConfigType=None, modelPredictionConfig=None, createdAt=None, projectId=None, shared=None, sharedAt=None, trainFunctionName=None, predictFunctionName=None, predictManyFunctionName=None, initializeFunctionName=None, trainingInputTables=None, sourceCode=None, cpuSize=None, memory=None, trainingFeatureGroupIds=None, algorithmModelConfigs=None, trainingVectorStoreVersions=None, documentRetrievers=None, documentRetrieverIds=None, isPythonModel=None, defaultAlgorithm=None, customAlgorithmConfigs=None, restrictedAlgorithms=None, useGpu=None, notebookId=None, trainingRequired=None, location={}, refreshSchedules={}, codeSource={}, databaseConnector={}, dataLlmFeatureGroups={}, latestModelVersion={}, modelConfig={})

Bases: abacusai.return_class.AbstractApiClass

A model

Parameters:
  • client (ApiClient) – An authenticated API Client instance

  • name (str) – The user-friendly name for the model.

  • modelId (str) – The unique identifier of the model.

  • modelConfigType (str) – Name of the TrainingConfig class of the model_config.

  • modelPredictionConfig (dict) – The prediction config options for the model.

  • createdAt (str) – Date and time at which the model was created.

  • projectId (str) – The project this model belongs to.

  • shared (bool) – Whether the model is shared to the Abacus.AI model showcase.

  • sharedAt (str) – The date and time at which the model was shared to the model showcase.

  • trainFunctionName (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.

  • predictFunctionName (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.

  • predictManyFunctionName (str) – Name of the function found in the source code that will be executed to run batch predictions through the model.

  • initializeFunctionName (str) – Name of the function found in the source code that will be executed to initialize the trained model before it is used to make predictions.

  • trainingInputTables (list) – List of feature groups that are supplied to the train function as parameters. Each parameter is a materialized DataFrame (same type as the function's return value).

  • sourceCode (str) – Python code used to make the model.

  • cpuSize (str) – CPU size specified for the Python model training.

  • memory (int) – Memory in GB specified for the python model training.

  • trainingFeatureGroupIds (list of unique string identifiers) – The unique identifiers of the feature groups used as inputs to train this model.

  • algorithmModelConfigs (list[dict]) – List of algorithm specific training configs.

  • trainingVectorStoreVersions (list) – The vector store version IDs used as inputs during training to create this ModelVersion.

  • documentRetrievers (list) – List of document retrievers used to create this model.

  • documentRetrieverIds (list) – List of document retriever IDs used to create this model.

  • isPythonModel (bool) – Whether this model is handled as a Python model.

  • defaultAlgorithm (str) – If set, this algorithm will always be used when deploying the model, regardless of the model metrics.

  • customAlgorithmConfigs (dict) – User-defined configs for each of the user-defined custom algorithms.

  • restrictedAlgorithms (dict) – User-selected algorithms to train.

  • useGpu (bool) – Whether this model uses a GPU.

  • notebookId (str) – The notebook associated with this model.

  • trainingRequired (bool) – If training is required to keep the model up-to-date.

  • latestModelVersion (ModelVersion) – The latest model version.

  • location (ModelLocation) – Location information for models that are imported.

  • refreshSchedules (RefreshSchedule) – List of refresh schedules that indicate when the next model version will be trained.

  • codeSource (CodeSource) – Information on the source code, if this is a Python model.

  • databaseConnector (DatabaseConnector) – Database connector used by the model.

  • dataLlmFeatureGroups (FeatureGroup) – List of feature groups used by the model for queries.

  • modelConfig (TrainingConfig) – The training config options used to train this model.

name
model_id
model_config_type
model_prediction_config
created_at
project_id
shared
shared_at
train_function_name
predict_function_name
predict_many_function_name
initialize_function_name
training_input_tables
source_code
cpu_size
memory
training_feature_group_ids
algorithm_model_configs
training_vector_store_versions
document_retrievers
document_retriever_ids
is_python_model
default_algorithm
custom_algorithm_configs
restricted_algorithms
use_gpu
notebook_id
training_required
location
refresh_schedules
code_source
database_connector
data_llm_feature_groups
latest_model_version
model_config
deprecated_keys
__repr__()
to_dict()

Get a dict representation of the parameters in this class

Returns:

The dict value representation of the class parameters

Return type:

dict

describe_train_test_data_split_feature_group()

Get the train and test data split for a trained model by its unique identifier. This is only supported for models with custom algorithms.

Parameters:

model_id (str) – The unique ID of the model. By default, the latest model version will be returned if no version is specified.

Returns:

The feature group containing the training data and fold information.

Return type:

FeatureGroup

refresh()

Calls describe and refreshes the current object’s fields

Returns:

The current object

Return type:

Model

describe()

Retrieves a full description of the specified model.

Parameters:

model_id (str) – Unique string identifier associated with the model.

Returns:

Description of the model.

Return type:

Model

rename(name)

Renames a model

Parameters:

name (str) – The new name to assign to the model.

update_python(function_source_code=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, training_input_tables=None, cpu_size=None, memory=None, package_requirements=None, use_gpu=None, is_thread_safe=None, training_config=None)

Updates an existing Python Model using user-provided Python code. If a list of input feature groups is supplied, they will be provided as arguments to the train and predict functions with the materialized feature groups for those input feature groups.

This method expects functionSourceCode to be a valid language source file which contains the functions named trainFunctionName and predictFunctionName. trainFunctionName returns the ModelVersion that is the result of training the model using trainFunctionName. predictFunctionName has no well-defined return type, as it returns the prediction made by the predictFunctionName, which can be anything.

Parameters:
  • function_source_code (str) – Contents of a valid Python source code file. The source code should contain the functions named trainFunctionName and predictFunctionName. A list of allowed import and system libraries for each language is specified in the user functions documentation section.

  • train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.

  • predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the source code that will be executed to run batch predictions through the model. It is not executed when this function is run.

  • initialize_function_name (str) – Name of the function found in the source code to initialize the trained model before using it to make predictions using the model.

  • training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each parameter is a materialized DataFrame (same type as the function's return value).

  • cpu_size (str) – Size of the CPU for the model training function.

  • memory (int) – Memory (in GB) for the model training function.

  • package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].

  • use_gpu (bool) – Whether this model needs a GPU.

  • is_thread_safe (bool) – Whether this model is thread safe.

  • training_config (TrainingConfig) – The training config used to train this model.

Returns:

The updated model.

Return type:

Model
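The following is a minimal sketch of how update_python might be called, not a definitive recipe. It assumes `model` is a Model instance obtained elsewhere. The bodies and signatures of the `train` and `predict` functions inside `SOURCE`, and the helpers `check_functions_defined` and `push_source`, are hypothetical; they only illustrate that the names passed as train_function_name and predict_function_name must exist in function_source_code.

```python
import ast

# Hypothetical source file contents; the function signatures are assumptions.
SOURCE = '''
def train(training_df):
    # Return whatever object represents the trained model.
    return {"mean": training_df["y"].mean()}

def predict(model, query):
    return {"prediction": model["mean"]}
'''


def check_functions_defined(source, names):
    """Return the entries of `names` that are not top-level functions in `source`."""
    tree = ast.parse(source)
    defined = {node.name for node in tree.body if isinstance(node, ast.FunctionDef)}
    return [name for name in names if name not in defined]


def push_source(model, source, train_name, predict_name):
    """Validate the source locally, then call Model.update_python."""
    missing = check_functions_defined(source, [train_name, predict_name])
    if missing:
        raise ValueError(f"functions missing from source: {missing}")
    return model.update_python(
        function_source_code=source,
        train_function_name=train_name,
        predict_function_name=predict_name,
    )
```

Checking the names locally before the call is a design choice of this sketch: it catches a misspelled function name without a round trip to the service.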

update_python_zip(train_function_name=None, predict_function_name=None, predict_many_function_name=None, train_module_name=None, predict_module_name=None, training_input_tables=None, cpu_size=None, memory=None, package_requirements=None, use_gpu=None)

Updates an existing Python Model using a provided zip file. If a list of input feature groups is supplied, they will be provided as arguments to the train and predict functions with the materialized feature groups for those input feature groups.

This method expects trainModuleName and predictModuleName to be valid language source files which contain the functions named trainFunctionName and predictFunctionName, respectively. trainFunctionName returns the ModelVersion that is the result of training the model using trainFunctionName, and predictFunctionName has no well-defined return type, as it returns the prediction made by the predictFunctionName, which can be anything.

Parameters:
  • train_function_name (str) – Name of the function found in the train module that will be executed to train the model. It is not executed when this function is run.

  • predict_function_name (str) – Name of the function found in the predict module that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the predict module that will be executed to run batch predictions through the model. It is not executed when this function is run.

  • train_module_name (str) – Full path of the module that contains the train function from the root of the zip.

  • predict_module_name (str) – Full path of the module that contains the predict function from the root of the zip.

  • training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each parameter is a materialized DataFrame (same type as the function's return value).

  • cpu_size (str) – Size of the CPU for the model training function.

  • memory (int) – Memory (in GB) for the model training function.

  • package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].

  • use_gpu (bool) – Whether this model needs a GPU.

Returns:

The updated model.

Return type:

Upload

update_python_git(application_connector_id=None, branch_name=None, python_root=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, train_module_name=None, predict_module_name=None, training_input_tables=None, cpu_size=None, memory=None, use_gpu=None)

Updates an existing Python model using an existing Git application connector. If a list of input feature groups is supplied, they will be provided as arguments to the train and predict functions with the materialized feature groups for those input feature groups.

This method expects trainModuleName and predictModuleName to be valid language source files which contain the functions named trainFunctionName and predictFunctionName, respectively. trainFunctionName returns the ModelVersion that is the result of training the model using trainFunctionName, and predictFunctionName has no well-defined return type, as it returns the prediction made by the predictFunctionName, which can be anything.

Parameters:
  • application_connector_id (str) – The unique ID associated with the Git application connector.

  • branch_name (str) – Name of the branch in the Git repository to be used for training.

  • python_root (str) – Path from the top level of the Git repository to the directory containing the Python source code. If not provided, the default is the root of the Git repository.

  • train_function_name (str) – Name of the function found in the train module that will be executed to train the model. It is not executed when this function is run.

  • predict_function_name (str) – Name of the function found in the predict module that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the predict module that will be executed to run batch predictions through the model. It is not executed when this function is run.

  • train_module_name (str) – Full path of the module that contains the train function from the root of the zip.

  • predict_module_name (str) – Full path of the module that contains the predict function from the root of the zip.

  • training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each parameter is a materialized DataFrame (same type as the function's return value).

  • cpu_size (str) – Size of the CPU for the model training function.

  • memory (int) – Memory (in GB) for the model training function.

  • use_gpu (bool) – Whether this model needs a GPU.

Returns:

The updated model.

Return type:

Model

set_training_config(training_config, feature_group_ids=None)

Edits the default model training config

Parameters:
  • training_config (TrainingConfig) – The training config used to train this model.

  • feature_group_ids (List) – The list of feature groups used as input to the model.

Returns:

The model object corresponding to the updated training config.

Return type:

Model

set_prediction_params(prediction_config)

Sets the model prediction config for the model

Parameters:

prediction_config (dict) – Prediction configuration for the model.

Returns:

Model object after the prediction configuration is applied.

Return type:

Model

get_metrics(model_version=None, return_graphs=False, validation=False)

Retrieves metrics for all the algorithms trained in this model version.

If only the model’s unique identifier (model_id) is specified, the latest trained version of the model (model_version) is used.

Parameters:
  • model_version (str) – Version of the model.

  • return_graphs (bool) – If true, will return the information used for the graphs on the model metrics page such as PR Curve per label.

  • validation (bool) – If true, will return the validation metrics instead of the test metrics.

Returns:

An object containing the model metrics and explanations for what each metric means.

Return type:

ModelMetrics

list_versions(limit=100, start_after_version=None)

Retrieves a list of versions for a given model.

Parameters:
  • limit (int) – Maximum length of the list of all model versions.

  • start_after_version (str) – Unique string identifier of the version after which the list starts.

Returns:

An array of model versions.

Return type:

list[ModelVersion]
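The limit and start_after_version parameters suggest cursor-style paging. The sketch below shows one way to walk every version under that reading. The helper name `iter_versions` is hypothetical, `model` is assumed to be a Model instance, and the code assumes each returned ModelVersion exposes its identifier as a `model_version` attribute; check that attribute against the ModelVersion reference before relying on it.

```python
def iter_versions(model, page_size=100):
    """Yield every version of `model`, fetching up to `page_size` per call."""
    cursor = None
    while True:
        page = model.list_versions(limit=page_size, start_after_version=cursor)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            # A short page means there is nothing left to fetch.
            return
        # Assumption: each ModelVersion exposes its ID as `.model_version`.
        cursor = page[-1].model_version
```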

retrain(deployment_ids=None, feature_group_ids=None, custom_algorithms=None, builtin_algorithms=None, custom_algorithm_configs=None, cpu_size=None, memory=None, training_config=None, algorithm_training_configs=None)

Retrains the specified model, with an option to choose the deployments to which the retraining will be deployed.

Parameters:
  • deployment_ids (List) – List of unique string identifiers of deployments to automatically deploy to.

  • feature_group_ids (List) – List of feature group IDs provided by the user to train the model on.

  • custom_algorithms (list) – List of user-defined algorithms to train. If not set, will honor the runs from the last time and applicable new custom algorithms.

  • builtin_algorithms (list) – List of algorithm names or algorithm IDs of Abacus.AI built-in algorithms to train. If not set, will honor the runs from the last time and applicable new built-in algorithms.

  • custom_algorithm_configs (dict) – User-defined training configs for each custom algorithm.

  • cpu_size (str) – Size of the CPU for the user-defined algorithms during training.

  • memory (int) – Memory (in GB) for the user-defined algorithms during training.

  • training_config (TrainingConfig) – The training config used to train this model.

  • algorithm_training_configs (list) – List of algorithm-specific training configs that will be part of the model training AutoML run.

Returns:

The model that is being retrained.

Return type:

Model

delete()

Deletes the specified model and all its versions. Models which are currently used in deployments cannot be deleted.

Parameters:

model_id (str) – Unique string identifier of the model to delete.

set_default_algorithm(algorithm=None, data_cluster_type=None)

Sets the model’s algorithm to default for all new deployments.

Parameters:
  • algorithm (str) – Algorithm to pin in the model.

  • data_cluster_type (str) – Data cluster type to set the lead model for.

list_artifacts_exports(limit=25)

List all the model artifacts exports.

Parameters:

limit (int) – Maximum length of the list of all exports.

Returns:

List of model artifacts exports.

Return type:

list[ModelArtifactsExport]

get_training_types_for_deployment(model_version=None, algorithm=None)

Returns types of models that can be deployed for a given model instance ID.

Parameters:
  • model_version (str) – The unique ID associated with the model version to deploy.

  • algorithm (str) – The unique ID associated with the algorithm to deploy.

Returns:

Model training types for deployment.

Return type:

ModelTrainingTypeForDeployment

update_agent(function_source_code=None, agent_function_name=None, memory=None, package_requirements=None, description=None, enable_binary_input=None, agent_input_schema=None, agent_output_schema=None, workflow_graph=None, agent_interface=None, included_modules=None, org_level_connectors=None, user_level_connectors=None, initialize_function_name=None, initialize_function_code=None)

Updates an existing AI Agent. A new version of the agent will be created and published.

Parameters:
  • memory (int) – Memory (in GB) for the agent.

  • package_requirements (list) – A list of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].

  • description (str) – A description of the agent, including its purpose and instructions.

  • workflow_graph (WorkflowGraph) – The workflow graph for the agent.

  • agent_interface (AgentInterface) – The interface that the agent will be deployed with.

  • included_modules (List) – A list of user created custom modules to include in the agent’s environment.

  • org_level_connectors (List) – A list of org level connector ids to be used by the agent.

  • user_level_connectors (Dict) – A dictionary mapping ApplicationConnectorType keys to lists of OAuth scopes. Each key represents a specific user level application connector, while the value is a list of scopes that define the permissions granted to the application.

  • initialize_function_name (str) – The name of the function to be used for initialization.

  • initialize_function_code (str) – The function code to be used for initialization.

  • function_source_code (str)

  • agent_function_name (str)

  • enable_binary_input (bool)

  • agent_input_schema (dict)

  • agent_output_schema (dict)

Returns:

The updated agent.

Return type:

Agent

wait_for_training(timeout=None)

Blocks until the model is trained.

Parameters:

timeout (int) – The maximum time the call is given to finish. If it does not finish within the allotted time, the call times out.

wait_for_evaluation(timeout=None)

Blocks until the model is completely evaluated.

Parameters:

timeout (int) – The maximum time the call is given to finish. If it does not finish within the allotted time, the call times out.

wait_for_publish(timeout=None)

Blocks until the agent is published.

Parameters:

timeout (int) – The maximum time the call is given to finish. If it does not finish within the allotted time, the call times out.

wait_for_full_automl(timeout=None)

Blocks until the full AutoML cycle is completed.

Parameters:

timeout (int) – The maximum time the call is given to finish. If it does not finish within the allotted time, the call times out.

get_status(get_automl_status=False)

Gets the status of the model training.

Returns:

A string describing the status of a model training (pending, complete, etc.).

Return type:

str

Parameters:

get_automl_status (bool)
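As a rough illustration of how retrain, wait_for_training, and get_status could be combined, the sketch below starts a retrain, blocks until training finishes, and then reads the status. The helper name `retrain_and_wait` is hypothetical and `model` is assumed to be a Model instance. How a timeout surfaces (for example, whether an exception is raised) is not specified here, so the sketch does not handle it.

```python
def retrain_and_wait(model, timeout=None, **retrain_kwargs):
    """Start a retrain, wait for training to finish, and return the status."""
    # retrain() returns the model that is being retrained.
    retraining = model.retrain(**retrain_kwargs)
    # Wait on the returned object, passing the caller's timeout through.
    retraining.wait_for_training(timeout=timeout)
    return retraining.get_status()
```

Any keyword accepted by retrain, such as feature_group_ids or training_config, can be forwarded through `retrain_kwargs`.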

create_refresh_policy(cron)

Creates a refresh policy for the model.

Parameters:

cron (str) – A cron-style string to set the refresh time.

Returns:

The refresh policy object.

Return type:

RefreshPolicy
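A small sketch of building the cron argument follows. It assumes the standard five-field cron layout (minute, hour, day of month, month, day of week); the exact dialect and time zone the service accepts are not stated here and should be confirmed. The helpers `daily_cron` and `schedule_daily_refresh` are hypothetical, and `model` is assumed to be a Model instance.

```python
def daily_cron(hour, minute=0):
    """Build a five-field cron expression that fires once a day (assumed format)."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    return f"{minute} {hour} * * *"


def schedule_daily_refresh(model, hour, minute=0):
    """Create a refresh policy on `model` from the cron string built above."""
    return model.create_refresh_policy(daily_cron(hour, minute))
```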

list_refresh_policies()

Gets the refresh policies in a list.

Returns:

A list of refresh policy objects.

Return type:

List[RefreshPolicy]

get_train_test_feature_group_as_pandas()

Gets the model's train/test data split feature group as a pandas DataFrame.

Returns:

A pandas DataFrame of the training data, including the fold column.

Return type:

pandas.DataFrame