Text is one of the richest but most challenging types of business data. It lives in emails, support tickets, surveys, reviews, contracts, and more. To extract meaning from it at scale, you need the right mix of automation, structure, and interpretability.

TrueState supports a powerful set of text analytics tools, all accessible directly within the Pipeline canvas. These tools let you enrich, classify, tag, and summarise unstructured text—without writing code or managing external models.

This guide explains each of the supported methods, when to use them, and how to configure them inside a pipeline.

Supported methods:

  • Calling automations from pipelines
  • High-volume LLM inference
  • Text classification
  • Universal classification
  • Tagging (criteria-based)
  • Hierarchy classification

1. Calling automations from pipelines

For advanced or multi-step enrichment, you can call a full automation from within a pipeline.

Using the Automation step, you can run any automation once per record. Each input is passed through the automation, and the result is appended to the dataset.

This allows you to combine:

  • Large Language Model (LLM) chains
  • Web scraping
  • Conditional logic
  • External APIs

For more on how to build these, see the Automations guide.

Use cases:

  • Summarising documents using GPT-4 or Claude
  • Enriching lead records with web-sourced company metadata
  • Running multi-hop reasoning on product reviews

Use Automations when you need control, orchestration, or hybrid workflows. If you only need high-throughput enrichment, prefer LLM inference.
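Conceptually, the Automation step applies a once-per-record contract: each row is passed through the automation, and the result is appended to the dataset. The sketch below illustrates that contract in Python; the function names (`run_automation`, `apply_automation`) are hypothetical stand-ins, not TrueState APIs.

```python
# Sketch of the Automation step's per-record contract.
# `run_automation` is a placeholder for a full automation
# (LLM chain, web scraping, external API calls, etc.).

def run_automation(record: dict) -> dict:
    """Toy automation: derive a summary field from the notes column."""
    text = record["notes"]
    return {"summary": text[:40]}

def apply_automation(dataset: list[dict]) -> list[dict]:
    """Run the automation once per record and append its output."""
    return [{**row, **run_automation(row)} for row in dataset]

rows = [{"id": 1, "notes": "Customer asked for a refund after delivery delay."}]
enriched = apply_automation(rows)
```

The key property is that the automation's output columns are merged into each row, so downstream pipeline steps can use them.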


2. High-volume LLM inference

The LLM Inference step uses small, efficient models to enrich text quickly and cost-effectively. It’s ideal when you need structured output across thousands of records.

Capabilities include:

  • Summarising product descriptions
  • Extracting entities or topics
  • Rewriting or simplifying text
  • Light classification or tone detection

Use cases:

  • Creating short-form summaries for a UI
  • Extracting country and company mentions from survey responses
  • Rephrasing raw notes into business-ready summaries

This step is optimised for performance and throughput—not deep reasoning or chain-of-thought logic.
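The step's behaviour can be pictured as a prompt template applied per row, with structured output written back to a new column. In this sketch, `call_small_llm` is a toy stand-in for the hosted model the step actually uses; the template and column names are illustrative only.

```python
# Minimal sketch of high-volume, per-row LLM enrichment.
# `call_small_llm` stands in for an efficient hosted model.

def call_small_llm(prompt: str) -> str:
    # Placeholder: a real step would call a hosted model here.
    # This toy version just returns the first sentence of the text.
    return prompt.split("TEXT: ", 1)[1].split(".")[0]

TEMPLATE = "Summarise in one line. TEXT: {text}"

def enrich(rows: list[dict], column: str = "text", output: str = "summary") -> list[dict]:
    """Apply the prompt template to each row and store the result."""
    for row in rows:
        row[output] = call_small_llm(TEMPLATE.format(text=row[column]))
    return rows

out = enrich([{"text": "Battery lasts two days. Screen is dim outdoors."}])
```

Because each row is an independent call with a fixed template, this pattern parallelises well, which is what makes the step suitable for thousands of records.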


3. Text classification

Text Classification allows you to assign a label to each record by choosing from a predefined set of categories. It uses pre-trained or fine-tuned models behind the scenes to select the most appropriate label from your list.

Unlike Universal Classification or Tagging, this approach doesn’t require logic statements—it simply learns the mapping from text to labels based on examples or embeddings.

Use cases:

  • Categorising feedback into predefined themes (e.g., UI, Pricing, Support)
  • Assigning sentiment categories: Positive, Neutral, Negative
  • Labelling intent in form submissions or queries


Configuration:

  • You provide a list of possible labels (e.g., “Bug”, “Feature Request”, “General Inquiry”)
  • The model selects the best match for each row

Use Text Classification when you already have a clear list of categories and don’t need explanation logic or flexible tagging.
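The selection logic amounts to scoring the input against each label and keeping the best match. The sketch below uses a toy keyword-overlap score in place of the embeddings or fine-tuned models the step actually uses; the label list mirrors the configuration example above.

```python
# Illustrative best-match label selection. The keyword score is a toy
# stand-in for the embedding similarity used behind the scenes.

LABELS = ["Bug", "Feature Request", "General Inquiry"]

KEYWORDS = {
    "Bug": {"crash", "error", "broken"},
    "Feature Request": {"add", "wish", "support"},
    "General Inquiry": {"how", "what", "when"},
}

def score(text: str, label: str) -> int:
    """Count how many of the label's cue words appear in the text."""
    words = set(text.lower().split())
    return len(words & KEYWORDS[label])

def classify(text: str) -> str:
    """Return the single highest-scoring label for the text."""
    return max(LABELS, key=lambda lbl: score(text, lbl))
```

Note that exactly one label is always returned, which is the defining contrast with Tagging below.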


4. Universal classification

Universal classification uses a Natural Language Inference (NLI) model to classify a text input based on a statement you provide.

You define a set of labels, each paired with a statement. The model determines whether that statement is entailed by the input. If so, the corresponding label is applied.

Example:

  • Input: “The customer asked for a refund after receiving a broken item.”
  • Statement: “This message is a complaint.”
  • → Entailed → Assign label: "Complaint"

Use cases:

  • Intent detection in messages or tickets
  • Filtering for eligibility criteria in open-ended responses
  • Auto-labelling short texts for downstream filtering

Write statements as plain-English factual assertions. Avoid ambiguous or compound phrasing.


5. Tagging (criteria-based)

Tagging is a multi-label abstraction of universal classification. Instead of assigning just one label, you define a set of tags, each with one or more criteria statements. If any statement is entailed by the input, the tag is applied.

Multiple tags can be assigned per row. They are returned as a |-separated string.

Example output: "Complaint|Urgent|Refund"

Use cases:

  • Flagging multiple concerns in a support transcript
  • Annotating user feedback with multiple themes
  • Extracting overlapping topics from interviews

How to define: Each tag is defined as a tag name paired with a statement:

  TagName | Statement

Multiple criteria statements can be attached to the same tag.

Use tagging when you want broad annotation across multiple dimensions. For single-label classification, use universal classification instead.
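The any-criterion-matches rule and the pipe-separated output can be sketched as follows. The substring check is a toy stand-in for the NLI entailment test, and the tag/cue definitions are illustrative.

```python
# Sketch of multi-label tagging with pipe-separated output.
# A tag applies if ANY of its criteria match (toy substring check
# standing in for NLI entailment).

TAGS = {
    "Complaint": ["broken", "refund"],
    "Urgent": ["immediately", "asap"],
    "Refund": ["refund"],
}

def tag_row(text: str) -> str:
    """Return all matching tags for a row, joined with '|'."""
    text = text.lower()
    matched = [tag for tag, cues in TAGS.items()
               if any(cue in text for cue in cues)]
    return "|".join(matched)
```

For the input "Item arrived broken, please refund immediately", this yields "Complaint|Urgent|Refund" — the same shape as the example output above.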


6. Hierarchy classification

Hierarchy classification is for structured, multi-level label selection. You define a hierarchy of labels grouped by level; at each level, the model scores the entailment statements of the peer labels and selects the highest-scoring one.

This approach ensures mutually exclusive decisions at each level of a hierarchy.

Key rule: Labels at each level must be MECE (Mutually Exclusive, Collectively Exhaustive).

Example structure:

Level 1

  • Product Feedback: “This message is about the product.”
  • Support Request: “This message is asking for help.”
  • General Comment: “This message is general commentary.”

Level 2 (under Product Feedback)

  • Pricing Concern: “The message discusses the product’s pricing.”
  • Feature Request: “The message asks for a new product feature.”

If a record is classified as “Product Feedback” at Level 1, it will be evaluated against the Level 2 options. Among any peer group, only the highest-scoring label is selected.

Use cases:

  • Classifying tickets into department → topic → subtopic
  • Routing forms through a business process hierarchy
  • Multi-level content categorisation

Use hierarchy classification when your label set is nested or tree-structured. Ensure each group at a level has no overlaps in definition.
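The level-by-level walk through the example structure above can be sketched as a loop: score only the siblings under the chosen parent, keep the top scorer, and descend. Scoring here is a toy keyword count standing in for NLI entailment, and the hierarchy mirrors the example labels.

```python
# Sketch of level-wise hierarchy classification. At each level only the
# siblings under the chosen parent are scored; the top scorer wins.
# Keyword counting is a toy stand-in for NLI entailment.

HIERARCHY = {
    None: {  # Level 1 (roots)
        "Product Feedback": ["product", "pricing", "feature"],
        "Support Request": ["help", "support"],
        "General Comment": ["just", "thanks"],
    },
    "Product Feedback": {  # Level 2, under Product Feedback
        "Pricing Concern": ["pricing", "price", "cost"],
        "Feature Request": ["feature", "add"],
    },
}

def score(text: str, cues: list[str]) -> int:
    return sum(cue in text.lower() for cue in cues)

def classify_path(text: str) -> list[str]:
    """Descend the hierarchy, picking the highest-scoring peer per level."""
    path, node = [], None
    while node in HIERARCHY:
        children = HIERARCHY[node]
        best = max(children, key=lambda lbl: score(text, children[lbl]))
        if score(text, children[best]) == 0:
            break  # nothing entailed at this level; stop descending
        path.append(best)
        node = best
    return path
```

Because only one peer survives per level, the MECE requirement matters: overlapping definitions within a level would make the selection arbitrary.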


Choosing the right text analytics method

Goal | Recommended step | Notes
Flexible, high-quality enrichment | Automation step | Use for chains, scraping, or external APIs
Fast enrichment at scale | LLM Inference step | Best for summarisation and extraction
Simple multi-class prediction | Text Classification | Use when categories are known and unambiguous
Logic-based single-label classification | Universal Classification | Uses NLI to match statements
Multi-label annotation | Tagging step | Flexible tagging using pipe-separated output
Tree-based classification | Hierarchy Classification | Supports nested taxonomies with MECE label logic

Glossary

  • Automation step – Executes a full automation workflow for each row in a dataset.
  • LLM Inference step – Applies fast, high-throughput language models to text.
  • Text Classification – Assigns a best-match label from a list without needing logic statements.
  • Universal Classification – Uses NLI to assign a single label based on entailed logic.
  • Tagging – Assigns multiple tags using matching statements and pipe-separated outputs.
  • Hierarchy Classification – Selects labels across multiple levels with MECE structure.
  • MECE – A classification principle: Mutually Exclusive, Collectively Exhaustive.

Best practices

  • Write clear, concise, and specific statements for classification
  • Use Text Classification when labels are stable and fixed
  • Don’t overload tagging steps—group by theme when possible
  • Validate performance on a small batch before scaling
  • Use Automations for advanced logic, but monitor cost and latency

Next steps

  • Go to the Pipeline section in TrueState
  • Upload a dataset with one or more text columns
  • Add the appropriate text analytics node to your pipeline
  • Configure using natural language, label lists, or tag templates
  • Combine with downstream classification, enrichment, or reporting steps