Workflows

Workflows are processes designed to manipulate large volumes of data for data cleaning, visualisation or machine learning purposes.

On the TrueState platform, workflows combine assets (data and machine learning models) with actions (things we do to those models and data) to build high-impact AI and machine learning solutions.

Assets

Assets include models and data.

Data: Currently only tabular data is supported on the TrueState platform. Data can be uploaded via the CLI or loaded via a data integration action (see below). Note there is a 100 MB size limit for direct file uploads.
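As a sketch, you can guard against the 100 MB limit before uploading. Only the size limit comes from this page; the "truestate datasets upload" command name below is an assumption for illustration, not documented CLI syntax:

    # Hypothetical pre-upload check. The 100 MB limit is from this page;
    # the CLI invocation below is an assumed command name.
    import os
    import subprocess

    MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # 100 MB direct-upload limit

    def upload_csv(path: str) -> None:
        size = os.path.getsize(path)
        if size > MAX_UPLOAD_BYTES:
            # Over the limit: route the file through a data integration instead.
            raise ValueError(f"{path} exceeds the 100 MB direct-upload limit")
        subprocess.run(["truestate", "datasets", "upload", path], check=True)

    upload_csv("customers.csv")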

Actions

  1. Data integrations
  2. Data Transforms
  3. Model training
  4. Model usage
  5. Batch flow execution
  6. Explainable AI

Data integrations

Data integrations load data from external sources into the TrueState platform.

Currently the following data integrations are supported:

  • S3: Load CSV files from an Amazon Web Services S3 bucket.
  • GCS: Load CSV files from a Google Cloud Platform GCS bucket.
  • Blob storage: Load CSV files from an Azure Blob Storage bucket.
  • Salesforce: Extract data with a SOQL query from Salesforce (see the example after this list).
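To illustrate the Salesforce integration, here is the shape of a SOQL query it might run; the object and field names are assumptions, not a prescribed schema:

    # Illustrative SOQL only: the Account fields and the 30-day window are
    # made-up examples of what you might extract.
    soql = """
    SELECT Id, Name, Industry, AnnualRevenue
    FROM Account
    WHERE LastModifiedDate >= LAST_N_DAYS:30
    """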

Data Transforms

Data transforms allow you to build petabyte-scale data processing pipelines that reshape, filter and combine large tabular datasets in preparation for machine learning, data visualisation or RAG use-cases. To accommodate different preferences in processing tabular data, multiple languages / packages are supported.
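This page does not enumerate the supported packages, so as a neutral illustration, here is the kind of filter / combine / reshape step a transform performs, sketched in pandas (file and column names are assumptions):

    # Sketch of a typical transform: filter, join and aggregate tabular data.
    import pandas as pd

    orders = pd.read_csv("orders.csv")
    customers = pd.read_csv("customers.csv")

    # Keep recent orders, enrich with customer attributes, then aggregate
    # per customer into a feature table ready for model training.
    recent = orders[orders["order_date"] >= "2024-01-01"]
    joined = recent.merge(customers, on="customer_id", how="left")
    features = joined.groupby("customer_id").agg(
        total_spend=("amount", "sum"),
        order_count=("order_id", "count"),
    )
    features.to_csv("customer_features.csv")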

Model training

The model training step enables you to train a machine learning model. The following model types are supported:

  • Text classifier (RoBERTa Large): For classic text classification (e.g. sentiment analysis).
  • Text preference / reward model (RoBERTa Large): Trains a model to score text data according to expert preferences based on "chosen / rejected" pairs.
  • Tabular classifier (XGBoost): Trains a tree-based model to predict classes (e.g. likely churn vs not likely churn); see the sketch after this list.
  • Tabular regressor (XGBoost): Trains a tree-based model to predict a numerical value (e.g. sales volume).
  • LLM fine-tuning (Llama 3.2 3B): Implements supervised fine-tuning via Low-Rank Adaptation (LoRA).
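As a sketch of what the tabular classifier step trains, here is a plain XGBoost model on the feature table from the transform example; this uses the underlying library directly, not the platform's training API, and the column names are assumptions:

    # Tree-based churn classifier, analogous to the "Tabular classifier (XGBoost)" step.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    df = pd.read_csv("customer_features.csv")
    X = df[["total_spend", "order_count"]]  # assumed feature columns
    y = df["churned"]                       # assumed binary label

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    model = XGBClassifier(n_estimators=200, max_depth=4)
    model.fit(X_train, y_train)
    print("holdout accuracy:", model.score(X_test, y_test))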

Model usage

Model usage (aka inference) allows you to apply a model to your data to generate new data.

The models that can be used are:

  • Text classifiers (must be trained via model training)
  • Text preference models / reward models (must be trained via model training)
  • Tabular classifier (must be trained via model training)
  • Tabular regressor (must be trained via model training)
  • Universal classification: uses a universal classifier (aka an NLI model) to classify whether a user-defined hypothesis is valid given a supplied text field (see the example after this list).
  • Hierarchy classification: uses a universal classifier to classify a text record into a MECE hierarchy of classes without needing to label data.
  • Tagging: uses a universal classifier to apply any tags whose criteria match a text record.
  • Embedding: uses an embedding model to make text data searchable.
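To show what universal classification does under the hood, here is zero-shot classification with an off-the-shelf NLI model via Hugging Face transformers; the platform's own model and interface may differ:

    # Zero-shot (NLI-based) classification: each candidate label becomes a
    # hypothesis that is tested against the supplied text.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    result = classifier(
        "The delivery arrived two weeks late and the box was damaged.",
        candidate_labels=["shipping problem", "product quality", "billing issue"],
    )
    print(result["labels"][0])  # highest-scoring hypothesis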

Batch flow execution

In some instances, it is advantageous to apply agentic AI solutions across large datasets (e.g. researching information on a large number of companies). In such scenarios, you can execute a large number of live-flows via the "Run batch flows" action.
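Conceptually, batch flow execution fans one flow out over many records. The sketch below is purely illustrative: run_flow stands in for a live-flow invocation, which in practice is configured through the "Run batch flows" action rather than called from code:

    # Hypothetical fan-out of an agentic flow over a company list.
    import csv
    from concurrent.futures import ThreadPoolExecutor

    def run_flow(company_name: str) -> str:
        # Placeholder for one live-flow run (e.g. researching a company).
        return f"researched {company_name}"

    with open("companies.csv") as f:
        names = [row["name"] for row in csv.DictReader(f)]

    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(run_flow, names))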

Explainable AI

For tabular models (tabular classifier and tabular regressor), you can glean insights into how the model makes its predictions through SHAP visualisations.

The TrueState platform supports beeswarm and dependence plots (see the SHAP documentation for more information on interpreting models with SHAP).
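For reference, equivalent plots can be produced directly with the shap library; the platform renders these for you, so this is only a sketch of how such visualisations are built (dataset and columns are assumptions carried over from the earlier examples):

    # Beeswarm (global importance) and dependence plot for one feature.
    import pandas as pd
    import shap
    import xgboost

    df = pd.read_csv("customer_features.csv")
    X, y = df[["total_spend", "order_count"]], df["churned"]

    model = xgboost.XGBClassifier().fit(X, y)
    explainer = shap.Explainer(model, X)
    shap_values = explainer(X)

    shap.plots.beeswarm(shap_values)                   # global feature importance
    shap.plots.scatter(shap_values[:, "total_spend"])  # dependence plot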