Integrating data
Link mission-critical data into your AI-driven analytics engine
Integrating external data is the first step in powering AI-driven analytics. TrueState makes it simple to connect to key storage systems, databases, and business applications, or to upload structured datasets, bringing data directly into a pipeline.
Once connected, your data becomes accessible across the platform—enabling dashboards, transformations, AI agents, and enriched analytics workflows.
This guide explains how to bring data into the platform, what sources are supported, how to manage credentials securely, and how to upload datasets manually when needed.
Ways to bring in data
Data can be integrated into TrueState using two primary methods:
- Integration nodes – connect directly to external systems such as databases, cloud storage, or applications
- CSV uploads – import structured flat files manually via the platform
All data sources feed into the Pipeline canvas, where you can combine, transform, and enrich them using AI-driven tools.
You can mix both methods in the same pipeline for fast, multi-source integration.
Uploading CSV files
CSV files are a simple and effective way to get started with structured data. They’re especially useful for working with exports from other systems or manual data inputs.
Upload guidelines:
- Files must be formatted as clean tables
- The first row must be a header (column names)
- No metadata, empty rows, or extra formatting above the header
- Column types should be consistent across rows
- Remove Excel-specific formatting, summary rows, or totals
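If you want to sanity-check a file before uploading, a short script can catch most of these issues. Below is a minimal sketch using pandas (not part of TrueState); the file name is a placeholder.

```python
import csv
import pandas as pd

path = "export.csv"  # placeholder file name

# Read the raw header row so duplicate names are not auto-renamed by pandas.
with open(path, newline="") as f:
    header = next(csv.reader(f))

duplicates = sorted({name for name in header if header.count(name) > 1})
blanks = [name for name in header if not name.strip()]

df = pd.read_csv(path)

# Fully empty rows usually indicate leftover spreadsheet formatting.
empty_rows = int(df.isna().all(axis=1).sum())

# Columns read as "object" often contain mixed types (e.g., numbers and text).
mixed_type_candidates = [c for c in df.columns if df[c].dtype == "object"]

print("Duplicate headers:", duplicates)
print("Blank headers:", blanks)
print("Completely empty rows:", empty_rows)
print("Columns to review for mixed types:", mixed_type_candidates)
```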
Date orientation:
For time-based data, we recommend vertical orientation—where each row represents a different date. This ensures smooth parsing, summarisation, and time-series analysis.
Not recommended (horizontal orientation):
| Metric  | Jan 2023 | Feb 2023 | Mar 2023 |
|---------|----------|----------|----------|
| Revenue | 10000    | 11000    | 10500    |
Recommended (vertical orientation):
| Date       | Metric  | Value |
|------------|---------|-------|
| 2023-01-01 | Revenue | 10000 |
| 2023-02-01 | Revenue | 11000 |
| 2023-03-01 | Revenue | 10500 |
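If an export arrives in horizontal orientation, it can usually be reshaped before upload with a single unpivot step. The sketch below uses pandas to convert the wide example above into the recommended vertical layout; the output file name is a placeholder.

```python
import pandas as pd

# Wide table: one row per metric, one column per month (as in the example above).
wide = pd.DataFrame({
    "Metric": ["Revenue"],
    "Jan 2023": [10000],
    "Feb 2023": [11000],
    "Mar 2023": [10500],
})

# Unpivot the month columns into (Date, Value) pairs.
long = wide.melt(id_vars="Metric", var_name="Date", value_name="Value")

# Parse the month labels into real dates and match the recommended column order.
long["Date"] = pd.to_datetime(long["Date"], format="%b %Y")
long = long[["Date", "Metric", "Value"]].sort_values("Date")

long.to_csv("revenue_vertical.csv", index=False)
print(long)
```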
Use the Upload CSV option from the Datasets panel or add a CSV Upload node directly in the Pipeline canvas.
Very large files may take longer to process. Ensure all columns have unique and clearly labelled headers to avoid ingest errors.
Connecting to external systems
Use Integration nodes to bring in data from supported systems and cloud platforms. Each Integration node pulls data using credentials stored as secrets, which are securely managed through the platform.
Supported sources:
1. AWS S3
- Access: Access Key ID and Secret
- Format: CSV, JSON, Parquet
- Use: Load data lakes, logs, and cloud pipeline exports
2. Google Cloud Storage (GCS)
- Access: Service Account (JSON key)
- Format: CSV, JSON, Parquet
- Use: Bring in structured exports, training sets, and backup files
3. Azure Blob Storage
- Access: Storage Account Name and Key
- Format: CSV, JSON, Parquet
- Use: Import telemetry, reports, and cloud exports
4. Salesforce (SOQL)
- Access: OAuth via Connected App
- Format: Query result (SOQL)
- Use: Pull leads, opportunities, accounts, and custom object data
5. SharePoint (.pdf ingestion)
- Access: OAuth client credentials
- Format: PDF
- Use: Ingest policies, contracts, and documents for LLM processing
PDFs from SharePoint are processed using multimodal LLMs for advanced document analysis. This incurs higher per-document costs, so use it only where a cost of a few dollars per file is justified by the business impact.
6. SQL Server
- Access: Plain JSON secret with username and password
- Format: Table or SQL query result
- Use: Load internal ERP records, finance tables, or operational data
All credentials are securely encrypted and reusable across multiple pipelines.
Managing credentials (Secrets)
All external data sources require credentials stored securely as secrets. Secrets can be added in two places:
- From the Secrets section (via main navigation)
- Directly inside the Integration node configuration in the Pipeline canvas
Secrets are encrypted, versioned, and scoped to your organisation. Updating a secret does not break dependent pipelines.
Use consistent naming (e.g., gcs-prod, sql-finance, salesforce-sandbox) for clarity and reuse.
Credential setup by source
AWS S3
- Generate an Access Key ID and Secret Access Key via AWS IAM
- Go to Secrets → Create New → AWS S3
- Paste the credentials and test the connection
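Before saving the secret, you can optionally confirm the key pair works from your own environment. A minimal sketch using boto3 (this is not a TrueState API; the key values, region, and bucket name are placeholders):

```python
import boto3

# Placeholders: substitute your own key pair, region, and bucket.
s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIA...",
    aws_secret_access_key="...",
    region_name="us-east-1",
)

# Listing a few objects is enough to prove the credentials and bucket access work.
response = s3.list_objects_v2(Bucket="my-data-bucket", MaxKeys=5)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```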
Google Cloud Storage (GCS)
- Create a Service Account with access to your bucket
- Download the JSON key
- Go to Secrets → Create New → GCS
- Upload the key file and verify access
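As with S3, you can verify the key file outside the platform first. A minimal sketch using the google-cloud-storage client library; the key path and bucket name are placeholders:

```python
from google.cloud import storage

# Placeholders: path to the downloaded JSON key and your bucket name.
client = storage.Client.from_service_account_json("service-account-key.json")

# Listing a few blobs confirms the service account can read the bucket.
for blob in client.list_blobs("my-data-bucket", max_results=5):
    print(blob.name, blob.size)
```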
Azure Blob Storage
- In Azure Portal, retrieve your Storage Account Name and Access Key
- Go to Secrets → Create New → Azure Blob
- Paste in the credentials and test
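To check the account name and key before saving them, a quick test with the azure-storage-blob library works; the account, key, and container names below are placeholders:

```python
from itertools import islice
from azure.storage.blob import BlobServiceClient

account_name = "mystorageaccount"  # placeholder
account_key = "..."                # placeholder

service = BlobServiceClient(
    account_url=f"https://{account_name}.blob.core.windows.net",
    credential=account_key,
)

# Listing a handful of blobs confirms the account name and key are valid.
container = service.get_container_client("exports")  # placeholder container
for blob in islice(container.list_blobs(), 5):
    print(blob.name, blob.size)
```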
Salesforce
- Create a Connected App with API access
- Obtain the Consumer Key and Secret
- Go to Secrets → Create New → Salesforce or use the Integration node to add it inline
- Paste the credentials and complete OAuth authorisation
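As a rough connectivity check outside the platform, you can exchange the Connected App credentials for a token and run a small SOQL query against the standard Salesforce REST endpoints. The sketch below assumes the Connected App has the OAuth client credentials flow enabled; the domain, API version, and query are placeholders.

```python
import requests

# Placeholders: your My Domain URL and the Connected App's Consumer Key/Secret.
domain = "https://yourcompany.my.salesforce.com"
token_resp = requests.post(
    f"{domain}/services/oauth2/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "CONSUMER_KEY",
        "client_secret": "CONSUMER_SECRET",
    },
)
token_resp.raise_for_status()
access_token = token_resp.json()["access_token"]

# A small SOQL query confirms API access works end to end.
query = "SELECT Id, Name, StageName FROM Opportunity LIMIT 5"
result = requests.get(
    f"{domain}/services/data/v59.0/query",
    headers={"Authorization": f"Bearer {access_token}"},
    params={"q": query},
)
result.raise_for_status()
for record in result.json()["records"]:
    print(record["Id"], record["Name"], record["StageName"])
```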
SharePoint
- Register an app in Azure Active Directory
- Retrieve the Client ID, Tenant ID, and Client Secret
- Go to Secrets → Create New → SharePoint or add inline in a node
- Authenticate using the credentials provided
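To confirm the app registration before saving the secret, you can request a token with the client credentials and list a document library via Microsoft Graph. This is a sketch only: it assumes the app has been granted Graph application permissions for SharePoint, and the site ID is a placeholder.

```python
import msal
import requests

# Placeholders: values from the Azure AD app registration.
tenant_id = "TENANT_ID"
client_id = "CLIENT_ID"
client_secret = "CLIENT_SECRET"

app = msal.ConfidentialClientApplication(
    client_id,
    authority=f"https://login.microsoftonline.com/{tenant_id}",
    client_credential=client_secret,
)
token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
if "access_token" not in token:
    raise SystemExit(token.get("error_description", "Token request failed"))

# Listing the root of a site's default document library confirms access.
site_id = "SITE_ID"  # placeholder; resolve via the Graph /sites endpoint
resp = requests.get(
    f"https://graph.microsoft.com/v1.0/sites/{site_id}/drive/root/children",
    headers={"Authorization": f"Bearer {token['access_token']}"},
)
resp.raise_for_status()
for item in resp.json()["value"]:
    print(item["name"])
```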
SQL Server
- Gather:
  - Host (e.g., sql.company.com:1433)
  - Database name
  - Username and password
- Create a plain JSON secret:
  - Go to Secrets → Create New → Plain-JSON or use the Integration node directly
- Enter the host, port, and database in the Integration node; attach the secret to authenticate
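The sketch below shows an assumed shape for the plain JSON secret (the exact key names TrueState expects may differ) and a quick connectivity check with pyodbc; the host, database, and query are placeholders.

```python
import json
import pyodbc

# Assumed shape of the plain JSON secret (key names may differ in your workspace).
secret = {"username": "svc_truestate", "password": "********"}
print(json.dumps(secret, indent=2))

# Placeholder connection details; Encrypt/TrustServerCertificate depend on your server setup.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=sql.company.com,1433;"
    "DATABASE=finance;"
    f"UID={secret['username']};PWD={secret['password']};"
    "Encrypt=yes;TrustServerCertificate=yes;"
)
cursor = conn.cursor()
cursor.execute("SELECT TOP 5 name FROM sys.tables")
for row in cursor.fetchall():
    print(row[0])
conn.close()
```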
Next steps
- Upload a clean CSV or configure an external connection
- Go to the Pipeline section
- Add an Integration node to your pipeline
- Clean your data with the help of the Pipeline agent (see our data cleaning guide for more information)
- Connect the output to dashboards, models, or agents