Pipelines are automated workflows that let you string together a series of data operations—like importing data, transforming it with SQL or Python, training machine-learning models, and running inferences—and then output new datasets or models in a reproducible, versioned way. In the TrueState platform, you build pipelines by dragging and connecting nodes (datasets, action steps and models) in a visual editor, specifying inputs, outputs and configurations for each step. When you run a pipeline, it executes each job in order on our managed infrastructure, tracks artifacts (datasets and models) immutably, and provides logs and metrics so you can debug, monitor and iterate quickly—making it easy to build reliable, end-to-end data and ML workflows without writing orchestration code.

Creating and Running Pipelines

A new pipeline can be created from the Pipelines page, accessible from the menu. A pipeline will typically begin with a dataset: click the ‘Add nodes’ button on the left and drag a dataset node onto the canvas. Set the dataset by changing the name field on the node. Dragging in an action node from the left lets you apply an operation to the dataset; join the dataset to the action node by dragging the handle below the dataset node. The documentation below details how to use each action node. To save the pipeline, click the ‘Save’ button in the bottom right; to run it, click the ‘Run’ button in the bottom right.

Pipeline AI Assistance

The TrueState AI Assistant is able to create and modify pipelines. This is the preferred way to get started with a pipeline: describe what you want the pipeline to do, and the bot will create it for you. It can also be very useful as a debugging tool; if you provide the error in the chat, it will make the required modifications to the pipeline to resolve it. A pipeline must be added to the chat context for the bot to be able to modify it. Click the + icon above the chat and select the pipeline to add it to the context.

Dataset Nodes

[Image: Pipeline dataset node on canvas]

A dataset node represents an input dataset to an action node, or an output dataset containing the results of an action node. If the dataset exists, it will be populated with a preview. If the dataset does not yet exist, it will remain empty and be populated when it is generated by the pipeline.

Model Nodes

A model node stores the model created by a training step. This model can then be utilised by an inference step. The model node is plug-and-play: there is no code required to use the models.

Action Nodes

An Action Node represents an operation that is performed on some data, producing either a new dataset, or a trained model.

Integration (Import)

Refer to the Loading Data guide: Importing from external platforms.

Data Transforms

[Image: Data transform node with SQL editor]

A Data Transform is a SQL operation that is performed on some data. Data transform nodes let you take one or more existing datasets, apply a custom SQL query to reshape, filter, join or enrich the data, and then write the results into one or more new, immutable output datasets. In the TrueState visual pipeline editor, you connect input dataset nodes into a transform node, open the built-in SQL editor (which can autogenerate a starter query based on your schema), write a CREATE OR REPLACE TABLE statement prefaced with a simple comment explaining the logic, and specify the names of the outputs. When the pipeline runs, each transform step executes in order on BigQuery, producing clean, versioned datasets that downstream steps—like model training, embeddings or exports—can reliably consume. If you are not comfortable writing SQL syntax, the TrueState AI assistant can be used to write these queries. Providing the bot with a text description of what the transformation should do will allow it to write the SQL code for you.
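As a minimal sketch, a transform statement might look like the following; the table and column names (outdoor_products, in_stock and so on) are hypothetical and would be replaced with your own:

    -- Keep only in-stock products and standardise the category text
    CREATE OR REPLACE TABLE outdoor_products_clean AS
    SELECT
      product_id,
      product_name,
      LOWER(TRIM(category)) AS category,
      price
    FROM outdoor_products
    WHERE in_stock = TRUE;

The leading comment follows the convention noted above: a one-line explanation of the logic before the CREATE OR REPLACE TABLE statement.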

Regression and Classification Model Training

[Image: Regression training node configuration]

Regression training nodes let you build predictive models that estimate a continuous target variable—like revenue, temperature or demand—based on one or more feature columns. To use the node, you drag your preprocessed training dataset onto the canvas, connect it to a “Tabular Regressor Training” node, and then specify: (1) which columns are your input features; (2) the output target column you want to predict; (3) an optional set of hyperparameters (with sensible defaults ready to go); and (4) a name for the resulting model artifact. When you run the pipeline, TrueState executes the training step on managed infrastructure, produces an immutable model file, and tracks performance metrics so you can compare runs and iterate.

Classification training nodes follow the same pattern but predict categorical outcomes—such as “churn” vs. “no churn” or product category labels—instead of numeric values. You connect your labeled dataset to a “Tabular Classifier Training” node, pick your feature columns and the target label column, review or tweak the classifier’s hyperparameters, and choose a name for your new classification model. Once trained, the model artifact is available for inference and SHAP-based explainability steps elsewhere in your pipeline. The TrueState AI assistant is useful for selecting and experimenting with hyperparameters. It knows appropriate settings and where performance gains can be squeezed out.
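One common way to produce the preprocessed training dataset is an upstream data transform that splits the data deterministically. The sketch below assumes hypothetical names (sales_features, row_id); it is an illustration, not a required step:

    -- Hypothetical upstream transform: deterministic 80/20 train/holdout split
    CREATE OR REPLACE TABLE sales_train AS
    SELECT *
    FROM sales_features
    WHERE MOD(ABS(FARM_FINGERPRINT(CAST(row_id AS STRING))), 10) < 8;

    -- The remaining ~20% becomes the holdout set for inference and evaluation
    CREATE OR REPLACE TABLE sales_holdout AS
    SELECT *
    FROM sales_features
    WHERE MOD(ABS(FARM_FINGERPRINT(CAST(row_id AS STRING))), 10) >= 8;

Hashing a stable key rather than using RAND() keeps the split reproducible across pipeline runs.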

Regression and Classification Model Inference

[Image: Regression inference node configuration]

Regression inference nodes let you apply a trained regression model to new data in order to generate continuous predictions—such as forecasting sales, pricing estimates or risk scores—for each record. To use the node, you first connect your prepared dataset (e.g., a test or holdout set) and the previously trained model artifact to a “Tabular Regressor Inference” node. Then you specify which feature columns should feed into the model and name the output column that will store the predicted values. When you run the pipeline, TrueState executes the inference step and produces a new dataset with your original columns plus the model’s numeric predictions, ready for downstream analysis or reporting.

Classification inference nodes work similarly but output categorical predictions—like customer segments, fraud/no-fraud flags or product category assignments—instead of numbers. You attach your input dataset and a trained classification model to a “Tabular Classifier Inference” node, select the same feature columns used during training, and give a name to the column that will hold the predicted labels. Once the pipeline runs, the node generates a dataset enriched with predicted class labels and probability scores, which you can use for performance evaluation, business decisioning or further processing such as SHAP explainability. The AI assistant can be helpful for writing evaluation data transforms over the inference output. A typical pattern is to perform the inference, then create an evaluation dataset with accuracy, precision, recall, and any other helpful evaluation metrics. Just ask the bot to create this evaluation and it knows what to do.
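A sketch of such an evaluation transform for a binary classifier is below; the table and column names (churn_predictions, actual_label, predicted_label) and the ‘churn’ label are assumptions for illustration:

    -- Hypothetical evaluation transform for a binary churn classifier
    CREATE OR REPLACE TABLE churn_predictions_eval AS
    SELECT
      COUNTIF(predicted_label = actual_label) / COUNT(*) AS accuracy,
      SAFE_DIVIDE(
        COUNTIF(predicted_label = 'churn' AND actual_label = 'churn'),
        COUNTIF(predicted_label = 'churn')
      ) AS precision,
      SAFE_DIVIDE(
        COUNTIF(predicted_label = 'churn' AND actual_label = 'churn'),
        COUNTIF(actual_label = 'churn')
      ) AS recall
    FROM churn_predictions;

SAFE_DIVIDE avoids a divide-by-zero error when the model never predicts (or the data never contains) the positive class.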

LLM Inference

You can apply an LLM prompt to process each row in a dataset. The LLM generates a response for every row, and the output is stored in the column you specify under Output Column Name. Use the {column_name} syntax in your prompt to reference values from dataset columns. During the pipeline run, {column_name} will be replaced by the actual text from that column.

Example: Categorizing Products

In the example below, we use LLM Inference to classify products into categories (sleeping bag, tent, backpack) based on their description column.
Parameter                                       Value
Input Dataset (outdoor_products)                Contains product details including product_id, product_name, and description.
LLM Configuration                               The prompt instructs the model to classify each product into one of the valid categories using the product description.
Output Dataset (outdoor_products_categorized)   The results include a new column, product_category, which stores the LLM’s classification for each product.
[Image: LLM inference node with prompt settings]
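A prompt for this example might look like the sketch below; the wording is illustrative, and {description} refers to the input dataset’s description column:

    Classify the following product into exactly one of these categories:
    sleeping bag, tent, backpack.

    Product description: {description}

    Respond with only the category name.

Constraining the response format (“only the category name”) keeps the output column clean for downstream transforms.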

Apply Embeddings

Embedding nodes let you transform unstructured text into numerical vectors so you can power semantic search, clustering or downstream machine-learning tasks. To use the node, you drag your source dataset onto the canvas, connect it to an “Apply Embeddings” node, and then specify the name of the text column to embed and a name for the new output dataset. When you run the pipeline, TrueState applies a pretrained embedding model to each row’s text, adds a new vector column to your output dataset, and tags it as embedding-searchable—making it easy to perform similarity queries or integrate with dashboards and automations without writing any custom embedding code.
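As a downstream illustration only: if the vector column were exposed to data transforms as an ARRAY&lt;FLOAT64&gt;, a BigQuery transform could compute pairwise cosine similarity with ML.DISTANCE. Every name here (support_tickets_embedded, ticket_id, text_embedding) is an assumption:

    -- Assumed schema: text_embedding is an ARRAY<FLOAT64> produced by the embeddings node
    CREATE OR REPLACE TABLE ticket_similarity AS
    SELECT
      a.ticket_id AS ticket_a,
      b.ticket_id AS ticket_b,
      1 - ML.DISTANCE(a.text_embedding, b.text_embedding, 'COSINE') AS cosine_similarity
    FROM support_tickets_embedded AS a
    JOIN support_tickets_embedded AS b
      ON a.ticket_id < b.ticket_id;

Note that a full self-join is quadratic in the row count, so in practice you would filter to a candidate subset first; for most similarity-search use cases the built-in embedding-searchable tagging described above is the simpler route.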

Running Pipelines

The ‘Run’ button in the bottom right of the canvas can be used to run all of the jobs in sequence. To run a specific subset of jobs, use the ‘Run Subset’ button in the header to select a start and finish job. To run a specific node, or all jobs below a specific node, use the run icons to the right of the node.

[Image: Run controls on pipeline canvas]

Logs

After you click ‘Run’, the pipeline will begin working on our dedicated infrastructure. As it works, updates will be posted back to the display as logs.

Node States

[Image: Pipeline node state legend]

After clicking run, nodes will be set to one of 4 states:
State   Meaning
Black   Will not run
Blue    Pending or Running
Green   Success
Red     Failed
These states will be updated as the job is executed. The active logs can be seen in the node’s Logs tab or in the logs view.

Viewing Logs

[Image: Viewing pipeline logs in logs UI]

Logs can be viewed either on the node itself, or in the logs UI. To access the logs UI, click the ‘Logs’ button in the top right. Each run will be shown with a timestamp, and each job will include the configuration and the output. The logs can be added to the chat for the assistant to help with any debugging and required modifications. The logs can also be viewed from the node itself by selecting the ‘Logs’ tab on the node. These error logs can be added directly to the chat with the ‘+’ icon.

Debugging

Adding the error to the chat allows the TrueState AI Assistant to understand what has gone wrong in the pipeline execution. The assistant can then make any required inspections of the data using the query tool, and then resolve the error by modifying the pipeline directly. Common debugging issues include the following (the sketch after this list shows the kind of inspection query that can surface the first two):
  • Empty columns
  • Incorrect column names
  • SQL syntax and type issues
  • BigQuery SQL dialect issues
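A minimal inspection query of this kind, reusing the hypothetical outdoor_products dataset and description column from the earlier example:

    -- Count rows whose description is missing or empty
    SELECT
      COUNTIF(description IS NULL OR TRIM(description) = '') AS empty_descriptions,
      COUNT(*) AS total_rows
    FROM outdoor_products;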