abacusai.project

Classes

Project

A project is a container that holds datasets, models, and deployments.

Module Contents

class abacusai.project.Project(client, projectId=None, name=None, useCase=None, problemType=None, createdAt=None, tags=None)

Bases: abacusai.return_class.AbstractApiClass

A project is a container that holds datasets, models, and deployments.

Parameters:

client (ApiClient) – An authenticated API Client instance
projectId (str) – The ID of the project.
name (str) – The name of the project.
useCase (str) – The use case associated with the project.
problemType (str) – The problem type associated with the project.
createdAt (str) – The date and time when the project was created.
tags (list[str]) – List of tags associated with the project.

project_id = None

name = None

use_case = None

problem_type = None

created_at = None

tags = None

deprecated_keys

__repr__()

to_dict()

Get a dict representation of the parameters in this class

Returns:: The dict value representation of the class parameters
Return type:: dict

refresh()

Calls describe and refreshes the current object’s fields

Returns:: The current object
Return type:: Project

describe()

Returns a description of a project.

Parameters:: None. The project ID is taken from this object’s project_id attribute.
Returns:: The description of the project.
Return type:: Project

rename(name)

This method renames a project after it is created.

Parameters:: name (str) – The new name for the project.

delete(force_delete=False)

Delete a specified project from your organization.

This method deletes the project, its associated trained models, and deployments. The datasets attached to the specified project remain available for use with other projects in the organization.

Unless force_delete is True, this method will not delete a project that contains active deployments. Stop all active deployments before deleting the project.

Note: Deleted projects, models, and deployments cannot be recovered.

Parameters:: force_delete (bool) – If True, the project will be deleted even if it has active deployments.
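A minimal sketch of a pre-delete check, assuming only the documented list_deployments() and delete(force_delete=...) methods. The helper name delete_project_safely is hypothetical and not part of the SDK. Because the sketch does not inspect deployment status, it conservatively treats every listed deployment as potentially active.

```python
def delete_project_safely(project, force_delete=False):
    """Delete a project, refusing if it still lists any deployment.

    Hypothetical helper, not part of the abacusai SDK. `project` only needs the
    documented list_deployments() and delete(force_delete=...) methods.
    """
    # Assumption: every listed deployment is treated as potentially active.
    deployments = project.list_deployments()
    if deployments and not force_delete:
        raise RuntimeError(
            f"project still has {len(deployments)} deployment(s); "
            "stop them first or pass force_delete=True"
        )
    # With force_delete=True, the documented behavior deletes the project
    # even if it has active deployments.
    project.delete(force_delete=force_delete)
```

Called as delete_project_safely(project), it raises before anything is deleted; passing force_delete=True defers to the documented force behavior.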

add_tags(tags)

Adds one or more tags to the project.

Parameters:: tags (list) – The tags to add to the project.

remove_tags(tags)

Removes one or more tags from the project.

Parameters:: tags (list) – The tags to remove from the project.
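Because add_tags and remove_tags each take a list, keeping a project’s tags in sync with a desired set reduces to two set differences. The sketch below assumes the documented tags attribute reflects the current tags after refresh(); sync_tags is a hypothetical helper, not an SDK method.

```python
def sync_tags(project, desired):
    """Make the project's tags equal to `desired` (hypothetical helper).

    Assumes project.tags holds the current tag list (possibly None) once
    refresh() has reloaded the object's fields.
    """
    project.refresh()
    current = set(project.tags or [])
    wanted = set(desired)
    to_add = sorted(wanted - current)     # tags missing from the project
    to_remove = sorted(current - wanted)  # tags that should no longer be there
    if to_add:
        project.add_tags(to_add)
    if to_remove:
        project.remove_tags(to_remove)
    return to_add, to_remove
```

Calling only the methods that have work to do avoids sending empty lists to the API.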

set_feature_mapping(feature_group_id, feature_name, feature_mapping=None, nested_column_name=None)

Set a column’s feature mapping. If the column mapping is single-use and already set in another column in this feature group, this call will first remove the other column’s mapping and move it to this column.

Parameters:

feature_group_id (str) – The unique ID associated with the feature group.
feature_name (str) – The name of the feature.
feature_mapping (str) – The mapping of the feature in the feature group.
nested_column_name (str) – The name of the nested column if the input feature is part of a nested feature group for the given feature_group_id.

Returns:

A list of objects that describes the resulting feature group’s schema after the feature’s featureMapping is set.

Return type:

list[Feature]
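When several columns need mappings, one approach is to call set_feature_mapping once per column. This is a sketch: apply_feature_mappings is a hypothetical helper, and valid mapping values depend on the project’s use case, so none are hard-coded here. Because a single-use mapping moves away from any column that already holds it, the dictionary order matters and a later entry wins.

```python
def apply_feature_mappings(project, feature_group_id, mappings):
    """Set one feature mapping per column, in dictionary order (hypothetical helper).

    `mappings` maps feature names to feature mapping values. Returns the
    schema (list of Feature objects) from the last set_feature_mapping call.
    """
    schema = None
    for feature_name, feature_mapping in mappings.items():
        schema = project.set_feature_mapping(
            feature_group_id, feature_name, feature_mapping=feature_mapping
        )
    return schema
```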

validate(feature_group_ids=None)

Validates that the specified project has all required feature group types for its use case and that all required feature columns are set.

Parameters:: feature_group_ids (List) – The list of feature group IDs to validate.
Returns:: The project validation. If the specified project is missing required columns or feature groups, the response includes an array of objects for each missing required feature group and the missing required features in each feature group.
Return type:: ProjectValidation

infer_feature_mappings(feature_group_id)

Infer the feature mappings for the feature group in the project based on the problem type.

Parameters:: feature_group_id (str) – The unique ID associated with the feature group.
Returns:: A dict that contains the inferred feature mappings.
Return type:: InferredFeatureMappings

describe_feature_group(feature_group_id)

Describe a feature group associated with a project

Parameters:: feature_group_id (str) – The unique ID associated with the feature group.
Returns:: The project feature group object.
Return type:: ProjectFeatureGroup

list_feature_groups(filter_feature_group_use=None, limit=100, start_after_id=None)

List all the feature groups associated with a project

Parameters:

filter_feature_group_use (str) – Optional filter on feature group use. When given, only feature groups in this project with that use are returned. Possible values are: ‘USER_CREATED’, ‘BATCH_PREDICTION_OUTPUT’.
limit (int) – The maximum number of feature groups to be retrieved.
start_after_id (str) – An offset parameter to exclude all feature groups up to a specified ID.

Returns:

All the Feature Groups in a project.

Return type:

list[ProjectFeatureGroup]
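list_feature_groups pages results with limit and start_after_id. A generic loop that follows the cursor until a short page comes back could look like the sketch below. It assumes each returned ProjectFeatureGroup exposes its ID as feature_group_id; iter_all is a hypothetical helper, and the same pattern should apply to other methods in this class that accept limit and start_after_id.

```python
def iter_all(list_page, id_attr, page_size=100):
    """Yield every item from a limit/start_after_id paginated list method.

    `list_page` is called as list_page(limit=..., start_after_id=...), e.g.
    project.list_feature_groups. `id_attr` names the attribute holding each
    item's ID (assumed to be "feature_group_id" for ProjectFeatureGroup).
    """
    start_after_id = None
    while True:
        page = list_page(limit=page_size, start_after_id=start_after_id)
        yield from page
        # A page shorter than the limit means there is nothing left to fetch.
        if len(page) < page_size:
            return
        start_after_id = getattr(page[-1], id_attr)
```

For example: for fg in iter_all(project.list_feature_groups, "feature_group_id"): print(fg).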

list_feature_group_templates(limit=100, start_after_id=None, should_include_all_system_templates=False)

List feature group templates for feature groups associated with the project.

Parameters:

limit (int) – Maximum number of templates to be retrieved.
start_after_id (str) – Offset parameter to exclude all templates up to the specified feature group template ID.
should_include_all_system_templates (bool) – If True, will include built-in templates.

Returns:

The feature group templates for feature groups associated with the project.

Return type:

list[FeatureGroupTemplate]

get_training_config_options(feature_group_ids=None, for_retrain=False, current_training_config=None)

Retrieves the full initial description of the model training configuration options available for the specified project. The available configuration options are determined by the use case associated with the project. Refer to the Use Case Documentation for more information on use cases and use-case-specific configuration options.

Parameters:

feature_group_ids (List) – The feature group IDs to be used for training.
for_retrain (bool) – Whether the training config options are used for retraining.
current_training_config (TrainingConfig) – The current state of the training config, with some options already set, used to compute updated options after a refresh. Defaults to None.

Returns:

An array of options that can be specified when training a model in this project.

Return type:

list[TrainingConfigOptions]

create_train_test_data_split_feature_group(training_config, feature_group_ids)

Creates a feature group containing the train and test data split without training a model. Only supported for models with custom algorithms.

Parameters:

training_config (TrainingConfig) – The training config used to influence how the split is calculated.
feature_group_ids (List) – List of feature group IDs provided by the user, including the required one for data split and others to influence how to split.

Returns:

The feature group containing the training data and folds information.

Return type:

FeatureGroup

train_model(name=None, training_config=None, feature_group_ids=None, refresh_schedule=None, custom_algorithms=None, custom_algorithms_only=False, custom_algorithm_configs=None, builtin_algorithms=None, cpu_size=None, memory=None, algorithm_training_configs=None)

Create a new model and start its training in the given project.

Parameters:

name (str) – The name of the model. Defaults to “<Project Name> Model”.
training_config (TrainingConfig) – The training config used to train this model.
feature_group_ids (List) – List of feature group IDs provided by the user to train the model on.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically retrain the created model.
custom_algorithms (list) – List of user-defined algorithms to train. If not set, the default enabled custom algorithms will be used.
custom_algorithms_only (bool) – Whether to only run custom algorithms.
custom_algorithm_configs (dict) – Configs for each user-defined algorithm; key is the algorithm name, value is the config serialized to JSON.
builtin_algorithms (list) – List of algorithm names or algorithm IDs of the builtin algorithms provided by Abacus.AI to train. If not set, all applicable builtin algorithms will be used.
cpu_size (str) – Size of the CPU for the user-defined algorithms during training.
memory (int) – Memory (in GB) for the user-defined algorithms during training.
algorithm_training_configs (list) – List of algorithm-specific training configs that will be part of the model training AutoML run.

Returns:

The new model which is being trained.

Return type:

Model
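refresh_schedule takes a cron-style string interpreted in UTC, so a schedule written in local time needs converting first. The sketch below assumes the standard five-field cron format (minute, hour, day of month, month, day of week) and builds a daily schedule. daily_cron_utc is a hypothetical helper, and because it uses a fixed UTC offset, daylight-saving changes are not handled.

```python
from datetime import datetime, timedelta, timezone


def daily_cron_utc(hour, minute, utc_offset_hours):
    """Return a five-field cron string for a daily run at a local wall-clock time.

    The local time is converted to UTC using a fixed offset (no DST handling).
    The calendar date is arbitrary; only the resulting hour and minute are used.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime(2000, 1, 1, hour, minute, tzinfo=local_tz)
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"
```

For example, project.train_model(refresh_schedule=daily_cron_utc(9, 30, -5)) would request a daily retrain at 09:30 in a UTC-5 zone, which is 14:30 UTC.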

create_model_from_python(function_source_code, train_function_name, training_input_tables, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, name=None, cpu_size=None, memory=None, training_config=None, exclusive_run=False, package_requirements=None, use_gpu=False, is_thread_safe=None)

Initializes a new Model from user-provided Python code. If a list of input feature groups is supplied, their materialized data is passed as arguments to the train and predict functions.

This method expects function_source_code to be a valid Python source file that contains the functions named by train_function_name and predict_function_name. The train function returns the ModelVersion that results from training the model. The predict function has no well-defined return type, because it returns whatever prediction the model makes.

Parameters:

function_source_code (str) – Contents of a valid Python source code file. The source code should contain the functions named by train_function_name and predict_function_name. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.
training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each parameter is a materialized DataFrame (the same type as the function’s return value).
predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) – Name of the function found in the source code that will be executed for batch prediction of the model. It is not executed when this function is run.
initialize_function_name (str) – Name of the function found in the source code to initialize the trained model before it is used to make predictions.
name (str) – The name you want your model to have. Defaults to “<Project Name> Model”.
cpu_size (str) – Size of the CPU for the model training function.
memory (int) – Memory (in GB) for the model training function.
training_config (TrainingConfig) – Training configuration.
exclusive_run (bool) – Whether this model runs exclusively or alongside other Abacus.AI algorithms.
package_requirements (list) – List of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’]
use_gpu (bool) – Whether this model needs a GPU.
is_thread_safe (bool) – Whether this model is thread-safe.

Returns:

The new model, which has not been trained.

Return type:

Model
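Since train_function_name and predict_function_name only name functions inside function_source_code, a typo surfaces late. A local pre-flight check with Python’s ast module can catch it before the call. missing_functions is a hypothetical helper; it checks only that top-level functions with those names exist, not their signatures or the allowed-import list.

```python
import ast


def missing_functions(function_source_code, *function_names):
    """Return the given names that are not top-level functions in the source.

    None entries (e.g. an unset predict_many_function_name) are skipped.
    Raises SyntaxError if the source does not parse.
    """
    tree = ast.parse(function_source_code)
    defined = {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    return [name for name in function_names if name and name not in defined]
```

An empty result means every named function is defined; a non-empty list can be reported before create_model_from_python is called.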

list_models()

Retrieves the list of models in the specified project.

Parameters:: None. The project ID is taken from this object’s project_id attribute.
Returns:: A list of models.
Return type:: list[Model]

get_custom_train_function_info(feature_group_names_for_training=None, training_data_parameter_name_override=None, training_config=None, custom_algorithm_config=None)

Returns information about how to call the custom train function.

Parameters:

feature_group_names_for_training (list) – A list of feature group table names to be used for training.
training_data_parameter_name_override (dict) – Override from feature group type to parameter name in the train function.
training_config (TrainingConfig) – Training config for the options supported by the Abacus.AI platform.
custom_algorithm_config (dict) – User-defined config that can be serialized by JSON.

Returns:

Information about how to call the customer-provided train function.

Return type:

CustomTrainFunctionInfo

create_model_monitor(prediction_feature_group_id, training_feature_group_id=None, name=None, refresh_schedule=None, target_value=None, target_value_bias=None, target_value_performance=None, feature_mappings=None, model_id=None, training_feature_mappings=None, feature_group_base_monitor_config=None, feature_group_comparison_monitor_config=None, exclude_interactive_performance_analysis=True, exclude_bias_analysis=None, exclude_performance_analysis=None, exclude_feature_drift_analysis=None, exclude_data_integrity_analysis=None)

Runs a model monitor for the specified project.

Parameters:

prediction_feature_group_id (str) – The unique ID of the prediction data feature group.
training_feature_group_id (str) – The unique ID of the training data feature group.
name (str) – The name you want your model monitor to have. Defaults to “<Project Name> Model Monitor”.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically rerun the created model monitor.
target_value (str) – A target positive value for the label, used to compute bias and PR/AUC for the performance page.
target_value_bias (str) – A target positive value for the label to compute bias.
target_value_performance (str) – A target positive value for the label to compute the PR curve/AUC for the performance page.
feature_mappings (dict) – A JSON map to override features for prediction_feature_group, where keys are column names and the values are feature data use types.
model_id (str) – The unique ID of the model.
training_feature_mappings (dict) – A JSON map to override features for training_feature_group, where keys are column names and the values are feature data use types.
feature_group_base_monitor_config (dict) – Selection strategy for the base feature group, with the feature group version if selected.
feature_group_comparison_monitor_config (dict) – Selection strategy for the comparison feature group, with the feature group version if selected.
exclude_interactive_performance_analysis (bool) – Whether to exclude interactive performance analysis. Defaults to True if not provided.
exclude_bias_analysis (bool) – Whether to exclude bias analysis in the model monitor. By default, bias analysis is included.
exclude_performance_analysis (bool) – Whether to exclude performance analysis in the model monitor. By default, performance analysis is included.
exclude_feature_drift_analysis (bool) – Whether to exclude feature drift analysis in the model monitor. By default, feature drift analysis is included.
exclude_data_integrity_analysis (bool) – Whether to exclude data integrity analysis in the model monitor. By default, data integrity analysis is included.

Returns:

The new model monitor that was created.

Return type:

ModelMonitor
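The four exclude_*_analysis flags are easier to reason about as a single set of analyses to keep. The hypothetical helper below turns such a set into the documented keyword arguments and passes every flag explicitly rather than relying on defaults.

```python
# Analysis names derived from the documented exclude_<name>_analysis parameters.
ANALYSES = ("bias", "performance", "feature_drift", "data_integrity")


def monitor_analysis_kwargs(include):
    """Build exclude_*_analysis keyword arguments for create_model_monitor.

    Every analysis not named in `include` is excluded explicitly.
    """
    unknown = set(include) - set(ANALYSES)
    if unknown:
        raise ValueError(f"unknown analyses: {sorted(unknown)}")
    return {f"exclude_{name}_analysis": name not in include for name in ANALYSES}
```

For example, with placeholder IDs: project.create_model_monitor(prediction_fg_id, training_feature_group_id=training_fg_id, **monitor_analysis_kwargs({"feature_drift", "data_integrity"})).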

list_model_monitors(limit=None)

Retrieves the list of model monitors in the specified project.

Parameters:: limit (int) – Maximum number of model monitors to return. If not set, an internal limit applies.
Returns:: A list of model monitors.
Return type:: list[ModelMonitor]

create_vision_drift_monitor(prediction_feature_group_id, training_feature_group_id, name, feature_mappings, training_feature_mappings, target_value_performance=None, refresh_schedule=None)

Runs a vision drift monitor for the specified project.

Parameters:

prediction_feature_group_id (str) – Unique string identifier of the prediction data feature group.
training_feature_group_id (str) – Unique string identifier of the training data feature group.
name (str) – The name you want your model monitor to have. Defaults to “<Project Name> Model Monitor”.
feature_mappings (dict) – A JSON map to override features for prediction_feature_group, where keys are column names and the values are feature data use types.
training_feature_mappings (dict) – A JSON map to override features for training_feature_group, where keys are column names and the values are feature data use types.
target_value_performance (str) – A target positive value for the label to compute the precision-recall curve/area under curve for the performance page.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically rerun the created vision drift monitor.

Returns:

The new model monitor that was created.

Return type:

ModelMonitor

create_nlp_drift_monitor(prediction_feature_group_id, training_feature_group_id, name, feature_mappings, training_feature_mappings, target_value_performance=None, refresh_schedule=None)

Runs an NLP drift monitor for the specified project.

Parameters:

prediction_feature_group_id (str) – Unique string identifier of the prediction data feature group.
training_feature_group_id (str) – Unique string identifier of the training data feature group.
name (str) – The name you want your model monitor to have. Defaults to “<Project Name> Model Monitor”.
feature_mappings (dict) – A JSON map to override features for prediction_feature_group, where keys are column names and the values are feature data use types.
training_feature_mappings (dict) – A JSON map to override features for training_feature_group, where keys are column names and the values are feature data use types.
target_value_performance (str) – A target positive value for the label to compute the precision-recall curve/area under curve for the performance page.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically rerun the created NLP drift monitor.

Returns:

The new model monitor that was created.

Return type:

ModelMonitor

create_forecasting_monitor(name, prediction_feature_group_id, training_feature_group_id, training_forecast_config, prediction_forecast_config, forecast_frequency, refresh_schedule=None)

Runs a forecasting monitor for the specified project.

Parameters:

name (str) – The name you want your model monitor to have. Defaults to “<Project Name> Model Monitor”.
prediction_feature_group_id (str) – Unique string identifier of the prediction data feature group.
training_feature_group_id (str) – Unique string identifier of the training data feature group.
training_forecast_config (ForecastingMonitorConfig) – The configuration for the training data.
prediction_forecast_config (ForecastingMonitorConfig) – The configuration for the prediction data.
forecast_frequency (str) – The frequency of the forecast. Defaults to the frequency of the prediction data.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically rerun the created forecasting monitor.

Returns:

The new model monitor that was created.

Return type:

ModelMonitor

create_eda(feature_group_id, name, refresh_schedule=None, include_collinearity=False, include_data_consistency=False, collinearity_keys=None, primary_keys=None, data_consistency_test_config=None, data_consistency_reference_config=None, feature_mappings=None, forecast_frequency=None)

Run an Exploratory Data Analysis (EDA) for the specified project.

Parameters:

feature_group_id (str) – The unique ID of the prediction data feature group.
name (str) – The name you want your EDA to have. Defaults to “<Project Name> EDA”.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically rerun the created EDA.
include_collinearity (bool) – Set to True if the EDA type is collinearity.
include_data_consistency (bool) – Set to True if the EDA type is data consistency.
collinearity_keys (list) – List of features to use for collinearity.
primary_keys (list) – List of features that correspond to the primary keys (for Data Consistency analysis) or item IDs (for Forecasting analysis) of the given feature group.
data_consistency_test_config (dict) – Test feature group version selection strategy for Data Consistency EDA type.
data_consistency_reference_config (dict) – Reference feature group version selection strategy for Data Consistency EDA type.
feature_mappings (dict) – A JSON map to override features for the given feature_group, where keys are column names and the values are feature data use types. In forecasting, this is used to set the timestamp column and target value.
forecast_frequency (str) – The frequency of the data. One of HOURLY, DAILY, WEEKLY, MONTHLY, QUARTERLY, or YEARLY.

Returns:

The new EDA object that was created.

Return type:

Eda
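forecast_frequency accepts one of the six values listed in the parameter description. A small normalizer can reject anything else before the request is sent. It assumes the service expects the upper-case spellings shown there; normalize_forecast_frequency is a hypothetical helper.

```python
# Values taken from the forecast_frequency parameter description.
FORECAST_FREQUENCIES = ("HOURLY", "DAILY", "WEEKLY", "MONTHLY", "QUARTERLY", "YEARLY")


def normalize_forecast_frequency(value):
    """Trim, upper-case, and validate a forecast_frequency value for create_eda."""
    freq = value.strip().upper()
    if freq not in FORECAST_FREQUENCIES:
        raise ValueError(
            f"forecast_frequency must be one of {', '.join(FORECAST_FREQUENCIES)}; "
            f"got {value!r}"
        )
    return freq
```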

list_eda()

Retrieves the list of Exploratory Data Analysis (EDA) in the specified project.

Parameters:: None. The project ID is taken from this object’s project_id attribute.
Returns:: List of EDA objects.
Return type:: list[Eda]

list_holdout_analysis(model_id=None)

List holdout analyses for a project. Optionally, filter by model.

Parameters:: model_id (str) – (optional) ID of the model to filter by
Returns:: The holdout analyses
Return type:: list[HoldoutAnalysis]

create_monitor_alert(alert_name, condition_config, action_config, model_monitor_id=None, realtime_monitor_id=None)

Creates a monitor alert for the given conditions and monitor. An alert can be created for either a model monitor or a real-time monitor.

Parameters:

alert_name (str) – Name of the alert.
condition_config (AlertConditionConfig) – Condition to run the actions for the alert.
action_config (AlertActionConfig) – Configuration for the action of the alert.
model_monitor_id (str) – Unique string identifier for the model monitor created under the project.
realtime_monitor_id (str) – Unique string identifier for the real-time monitor for the deployment created under the project.

Returns:

Object describing the monitor alert.

Return type:

MonitorAlert
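An alert targets either a model monitor or a real-time monitor. Assuming exactly one of model_monitor_id and realtime_monitor_id should be set, a guard like the following builds the matching keyword argument; alert_target_kwargs is a hypothetical helper.

```python
def alert_target_kwargs(model_monitor_id=None, realtime_monitor_id=None):
    """Return the monitor-ID keyword argument for create_monitor_alert.

    Assumption: exactly one of the two IDs must be provided.
    """
    # Both None or both set is rejected.
    if (model_monitor_id is None) == (realtime_monitor_id is None):
        raise ValueError("pass exactly one of model_monitor_id or realtime_monitor_id")
    if model_monitor_id is not None:
        return {"model_monitor_id": model_monitor_id}
    return {"realtime_monitor_id": realtime_monitor_id}
```

The result can be splatted into the call, e.g. project.create_monitor_alert(alert_name, condition_config, action_config, **alert_target_kwargs(model_monitor_id=monitor_id)).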

list_prediction_operators()

List all the prediction operators inside a project.

Parameters:: None. The project ID is taken from this object’s project_id attribute.
Returns:: A list of prediction operator objects.
Return type:: list[PredictionOperator]

create_deployment_token(name=None)

Creates a deployment token for the specified project.

Deployment tokens are used to authenticate requests to the prediction APIs and are scoped to the project level.

Parameters:: name (str) – The name of the deployment token.
Returns:: The deployment token.
Return type:: DeploymentAuthToken

list_deployments()

Retrieves a list of all deployments in the specified project.

Parameters:: None. The project ID is taken from this object’s project_id attribute.
Returns:: An array of deployments.
Return type:: list[Deployment]

list_deployment_tokens()

Retrieves a list of all deployment tokens associated with the specified project.

Parameters:: None. The project ID is taken from this object’s project_id attribute.
Returns:: A list of deployment tokens.
Return type:: list[DeploymentAuthToken]

list_realtime_monitors()

List the real-time monitors associated with the project.

Parameters:: None. The project ID is taken from this object’s project_id attribute.
Returns:: An array of real-time monitors.
Return type:: list[RealtimeMonitor]

list_refresh_policies(dataset_ids=[], feature_group_id=None, model_ids=[], deployment_ids=[], batch_prediction_ids=[], model_monitor_ids=[], notebook_ids=[])

List the refresh policies for the organization. If no filters are specified, all refresh policies are returned.

Parameters:

dataset_ids (List) – List of Dataset IDs to filter by.
feature_group_id (str) – ID of the feature group whose attached refresh policies should be returned.
model_ids (List) – List of Model IDs to filter by.
deployment_ids (List) – List of Deployment IDs to filter by.
batch_prediction_ids (List) – List of Batch Prediction IDs to filter by.
model_monitor_ids (List) – List of Model Monitor IDs to filter by.
notebook_ids (List) – List of Notebook IDs to filter by.

Returns:

List of all refresh policies in the organization.

Return type:

list[RefreshPolicy]

list_batch_predictions(limit=None)

Retrieves a list of batch predictions in the project.

Parameters:: limit (int) – Maximum number of batch predictions to return. If not set, an internal limit applies.
Returns:: List of batch prediction jobs.
Return type:: list[BatchPrediction]

list_pipelines()

Lists the pipelines for the project.

Parameters:: None. The project ID is taken from this object’s project_id attribute.
Returns:: A list of pipelines.
Return type:: list[Pipeline]

create_graph_dashboard(name, python_function_ids=None)

Creates a graph dashboard from the selected Python plot functions.

Parameters:

name (str) – The name of the dashboard.
python_function_ids (List) – A list of unique string identifiers for the Python functions to be used in the graph dashboard.

Returns:

An object describing the graph dashboard.

Return type:

GraphDashboard

list_graph_dashboards()

Lists the graph dashboards for a project

Parameters:: None. The project ID is taken from this object’s project_id attribute.
Returns:: A list of graph dashboards.
Return type:: list[GraphDashboard]

list_builtin_algorithms(feature_group_ids, training_config=None)

Returns the list of built-in algorithms applicable to the given input data and training config.

Parameters:

feature_group_ids (List) – List of feature group IDs specifying input data.
training_config (TrainingConfig) – The training config to be used for model training.

Returns:

List of applicable builtin algorithms.

Return type:

list[Algorithm]

create_chat_session(name=None)

Creates a chat session with Data Science Co-pilot.

Parameters:: name (str) – The name of the chat session. Defaults to the project name.
Returns:: The chat session with Data Science Co-pilot
Return type:: ChatSession

create_agent(function_source_code=None, agent_function_name=None, name=None, memory=None, package_requirements=[], description=None, enable_binary_input=False, evaluation_feature_group_id=None, agent_input_schema=None, agent_output_schema=None, workflow_graph=None, agent_interface=AgentInterface.DEFAULT, included_modules=None, org_level_connectors=None, user_level_connectors=None, initialize_function_name=None, initialize_function_code=None)

Creates a new AI agent using the given agent workflow graph definition.

Parameters:

name (str) – The name you want your agent to have, defaults to “<Project Name> Agent”.
memory (int) – Overrides the default memory allocation (in GB) for the agent.
package_requirements (list) – A list of package requirement strings. For example: [‘numpy==1.2.3’, ‘pandas>=1.4.0’].
description (str) – A description of the agent, including its purpose and instructions.
evaluation_feature_group_id (str) – The ID of the feature group to use for evaluation.
workflow_graph (WorkflowGraph) – The workflow graph for the agent.
agent_interface (AgentInterface) – The interface that the agent will be deployed with.
included_modules (List) – A list of user created custom modules to include in the agent’s environment.
org_level_connectors (List) – A list of organization-level connector IDs to be used by the agent.
user_level_connectors (Dict) – A dictionary mapping ApplicationConnectorType keys to lists of OAuth scopes. Each key represents a specific user level application connector, while the value is a list of scopes that define the permissions granted to the application.
initialize_function_name (str) – The name of the function to be used for initialization.
initialize_function_code (str) – The function code to be used for initialization.
function_source_code (str)
agent_function_name (str)
enable_binary_input (bool)
agent_input_schema (dict)
agent_output_schema (dict)

Returns:

The new agent.

Return type:

Agent

generate_agent_code(prompt, fast_mode=None)

Generates the code for defining an AI Agent

Parameters:

prompt (str) – A natural language prompt that describes the agent specification: what the agent will do, what inputs it expects, and what outputs it produces.
fast_mode (bool) – If True, runs a faster but slightly less accurate code generation pipeline

list_agents()

Retrieves the list of agents in the specified project.

Parameters:: None. The project ID is taken from this object’s project_id attribute.
Returns:: A list of agents in the project.
Return type:: list[Agent]

create_document_retriever(name, feature_group_id, document_retriever_config=None)

Creates a document retriever that stores embeddings for document chunks in a feature group.

Document columns in the feature group are broken into chunks. When there are multiple document columns, chunks from all columns are combined to form a single chunk.

Parameters:

name (str) – The name of the Document Retriever. Can be up to 120 characters long and can only contain alphanumeric characters and underscores.
feature_group_id (str) – The ID of the feature group that the Document Retriever is associated with.
document_retriever_config (VectorStoreConfig) – The configuration, including chunk_size and chunk_overlap_fraction, for document retrieval.

Returns:

The newly created document retriever.

Return type:

DocumentRetriever
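The naming rule for create_document_retriever (at most 120 characters, alphanumeric characters and underscores only) can be checked locally before the call. The sketch assumes “alphanumeric” means ASCII letters and digits; both helper names are hypothetical. An empty input sanitizes to an empty string, which is still invalid.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
_VALID_NAME = re.compile(r"[A-Za-z0-9_]{1,120}")


def is_valid_retriever_name(name):
    """Check the documented name rule: 1-120 alphanumerics or underscores."""
    return _VALID_NAME.fullmatch(name) is not None


def to_retriever_name(text):
    """Replace disallowed characters with underscores and truncate to 120."""
    return re.sub(r"[^A-Za-z0-9_]", "_", text)[:120]
```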

list_document_retrievers(limit=100, start_after_id=None)

List all the document retrievers.

Parameters:

limit (int) – The number of document retrievers to return.
start_after_id (str) – An offset parameter to exclude all document retrievers up to this specified ID.

Returns:

All the document retrievers in the organization associated with the specified project.

Return type:

list[DocumentRetriever]

create_model_from_functions(train_function, predict_function=None, training_input_tables=None, predict_many_function=None, initialize_function=None, cpu_size=None, memory=None, training_config=None, exclusive_run=False)

Creates a model from Python functions.

Parameters:

train_function (callable) – The function used to train the model.
predict_function (callable) – The function used to make predictions.
training_input_tables (list) – The input tables to be used for training the model. Defaults to None.
predict_many_function (callable) – Prediction function for batch input.
cpu_size (str) – Size of the CPU for the model training function.
memory (int) – Memory (in GB) for the model training function.
initialize_function (callable)
training_config (dict)
exclusive_run (bool)

Returns:

The model object.

Return type:

Model