Text Scoring

Text scoring is the process of assigning a score to a piece of text. In the TrueState platform, text scoring models follow the preference learning paradigm, where a model is trained on "chosen" and "rejected" example pairs. The model learns to assign a higher score to chosen examples and a lower score to rejected examples.
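The idea of scoring chosen examples above rejected ones can be sketched with a pairwise loss. The source does not state which loss TrueState uses, so the Bradley-Terry style formulation below, common in preference learning, is an assumption, and `pairwise_preference_loss` is a hypothetical helper rather than part of the TrueState SDK.

```python
import math


def pairwise_preference_loss(chosen_score: float, rejected_score: float) -> float:
    """Return -log(sigmoid(chosen_score - rejected_score)).

    Assumption: a Bradley-Terry style objective. The loss is small when the
    chosen example outscores the rejected one and grows as the order flips.
    """
    margin = chosen_score - rejected_score
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() is only ever called on a non-positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical scores for one (chosen, rejected) pair.
well_ordered = pairwise_preference_loss(chosen_score=2.0, rejected_score=-1.0)
mis_ordered = pairwise_preference_loss(chosen_score=-1.0, rejected_score=2.0)
print(f"well ordered pair: {well_ordered:.4f}")
print(f"mis-ordered pair:  {mis_ordered:.4f}")
```

Minimising a loss of this shape over many pairs pushes the score of each chosen example above that of its rejected counterpart. Only the difference between the two scores matters, not their absolute values.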

Text scoring models are often used as reward models when fine-tuning text generation models with reinforcement learning from human feedback (RLHF). They can also be used for other tasks, such as preference ranking and data quality scoring.
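As a rough illustration of preference ranking and data quality scoring, the sketch below sorts texts by score and keeps those above a cutoff. It assumes the scores are available as a plain list of floats aligned with the texts; the actual output format of a TrueState scoring model is not described here, and `rank_by_score`, `filter_by_quality`, and the example values are hypothetical.

```python
from typing import List, Sequence, Tuple


def _check_aligned(texts: Sequence[str], scores: Sequence[float]) -> None:
    """Raise if the two sequences cannot be paired one-to-one."""
    if len(texts) != len(scores):
        raise ValueError("texts and scores must have the same length")


def rank_by_score(texts: Sequence[str], scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Preference ranking: return (text, score) pairs, highest score first."""
    _check_aligned(texts, scores)
    return sorted(zip(texts, scores), key=lambda pair: pair[1], reverse=True)


def filter_by_quality(texts: Sequence[str], scores: Sequence[float], threshold: float) -> List[str]:
    """Data quality scoring: keep texts whose score is at least the threshold."""
    _check_aligned(texts, scores)
    return [text for text, score in zip(texts, scores) if score >= threshold]


# Hypothetical texts and scores, standing in for real model output.
candidates = ["Thanks, that solved it.", "A clear, step-by-step answer.", "asdf qwerty"]
scores = [0.2, 1.7, -0.4]

ranked = rank_by_score(candidates, scores)
kept = filter_by_quality(candidates, scores, threshold=0.0)
print(ranked[0][0])
print(kept)
```

The threshold is a choice that depends on the task. Because preference-trained scores are only meaningful relative to one another, a cutoff is usually picked by inspecting scored samples rather than fixed in advance.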

Currently, all text scoring models use RoBERTa as the base model.

Train a text scoring model

from truestate import datasets, models

# get the dataset
dataset = datasets.get(name="my-dataset")

# create the model
model = models.TextClassifier(
    name="my-model",
)

# train the model on the dataset
model.train(
    dataset=dataset,
    input_column="text",
    target_column="label",
)

Apply a text scoring model to new data

from truestate import models, datasets

# get the model
model = models.get(name="my-model")

# get the dataset
dataset = datasets.get(name="my-dataset")

# apply the model to the dataset
predictions = model.inference(
    dataset=dataset,
    input_column="text",
)