hok@lemmy.dbzer0.com (OP) to LocalLLaMA@sh.itjust.works • Can you fine-tune on localized steering of an LLM? (English)
11 · 1 year ago
Sorry, I really don't care to continue talking about the difference between supervised and unsupervised learning. It's a pattern that describes how you are doing ML; it's not a property of a dataset (you wouldn't call Dataset A "unsupervised"). Read the Wikipedia articles for more details.
hok@lemmy.dbzer0.com (OP)
1 · 1 year ago
No, in that case there's no labelling required. That would be unsupervised learning.
https://en.wikipedia.org/wiki/Unsupervised_learning
Conceptually, unsupervised learning divides into the aspects of data, training, algorithm, and downstream applications. Typically, the dataset is harvested cheaply “in the wild”, such as massive text corpus obtained by web crawling, with only minor filtering (such as Common Crawl). This compares favorably to supervised learning, where the dataset (such as the ImageNet1000) is typically constructed manually, which is much more expensive.
hok@lemmy.dbzer0.com (OP)
1 · 1 year ago
Ground truth labels are just prescriptive labels that we recognize as being true. The main thing that distinguishes unsupervised from supervised learning is that in unsupervised learning, what is "good" is learned from the unstructured data itself. In supervised learning, what is "good" is learned from some external input, like "good" human-provided examples.
hok@lemmy.dbzer0.com (OP)
1 · 1 year ago
No, it's unsupervised. In pre-training, the text data isn't structured at all: it's books, documents, and online sources, all put together.
Supervised learning uses data with “ground truth” labels.
hok@lemmy.dbzer0.com (OP)
1 · 1 year ago
This pre-training was done by Meta. It's what Llama-3.1-405B is (in contrast to Llama-3.1-405B-Instruct). https://huggingface.co/meta-llama/Llama-3.1-405B
Training Data
Overview: Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples.
hok@lemmy.dbzer0.com (OP)
1 · 1 year ago
Unsupervised training happens during the pre-training phase, when you dump in all kinds of quality documents and the model learns the relationships between tokens.
hok@lemmy.dbzer0.com (OP)
1 · 1 year ago
The article you linked to uses SFT (supervised fine-tuning, a specific training technique) as its alignment strategy. There are other ways to fine-tune a model.
I guess I'm wondering whether you can train on these partial responses without needing the rest of the output or the stop token, or whether you need full examples, as the article hints.
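For what it's worth, a hedged sketch of one way this could work: if you construct the labels yourself, nothing forces the training span to reach a stop token. This assumes the common convention (used by Hugging Face Transformers, among others) that a label of -100 is ignored by the cross-entropy loss; the token IDs are illustrative only.

```python
# Sketch: fine-tuning on a *partial* response by masking everything
# except the span you want to train on. A label of -100 is ignored by
# the loss under the usual Hugging Face convention. Toy token IDs.

IGNORE = -100

def make_labels(token_ids, train_start, train_end):
    """Only positions in [train_start, train_end) contribute to the loss."""
    return [
        tok if train_start <= i < train_end else IGNORE
        for i, tok in enumerate(token_ids)
    ]

# prompt tokens followed by a partial assistant output, cut off
# before any stop token
sequence = [1, 2, 3, 10, 11, 12, 13]
labels = make_labels(sequence, train_start=3, train_end=7)
# labels == [-100, -100, -100, 10, 11, 12, 13]
# The loss covers whatever span you choose; the sequence never needs
# to end with a stop token.
```

Whether a given SFT trainer exposes this cleanly varies, but the underlying loss masking is the same mechanism trainers already use to avoid training on the prompt.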
hok@lemmy.dbzer0.com (OP)
1 · 1 year ago
Can SFT be used on partial generations? What I mean by a "steer" is a correction to only a portion of the model's output, not even its end.
For example, a “bad” partial output might be:
<assistant> Here are four examples: 1. High-quality example 1 2. Low-quality example 2

and the "steer" might be:

<assistant> Here are four examples: 1. High-quality example 1 2. High-quality example 2

but the full response will eventually be:

<assistant> Here are four examples: 1. High-quality example 1 2. High-quality example 2 3. High-quality example 3 4. High-quality example 4

The corrections don't include the full output.

Thanks for your answer. To be clear, what I'm looking for is a kind of masked fine-tuning: I want to "steer" a particular output instead of providing complete examples, which are costly to create.
The steering would be something like this:
What I would like to do is train the model on these corrections, where many corrections might be part of the same overall generation. Conceptually, each correction should have some training value. I don't know much about masking, but what I mean is that I don't want to train on a few tens or hundreds of (incomplete) samples, but rather on thousands of (masked) "steers" that correct the course of the rest of the sample's generated text.
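A minimal sketch of that idea, again assuming the -100 label-masking convention from Hugging Face Transformers and purely illustrative token IDs: each "steer" becomes its own training example whose loss covers only the corrected span, so many steers can come from a single generation.

```python
# Sketch of the "masked steer" idea: one long generation yields several
# local corrections, each turned into a training example whose loss
# covers only the corrected tokens (-100 = ignored by the loss, as in
# Hugging Face Transformers). All token IDs are illustrative.

IGNORE = -100

def steer_example(context_ids, corrected_ids):
    """Context (everything before the correction) is masked out; only
    the corrected span is trained on."""
    input_ids = list(context_ids) + list(corrected_ids)
    labels = [IGNORE] * len(context_ids) + list(corrected_ids)
    return input_ids, labels

generation = [5, 6, 7, 8, 9, 10]

# Two steers against the same generation, at different points:
steers = [
    (generation[:2], [70, 71]),  # correction replacing tokens 2-3
    (generation[:4], [90]),      # later correction replacing token 4
]
batch = [steer_example(ctx, fix) for ctx, fix in steers]
# Each element pairs input_ids with labels that are IGNORE everywhere
# except the steered span, so thousands of partial corrections can be
# batched without full completions or stop tokens.
```

This is only a sketch of the data layout, not a full training loop, but it shows why each correction carries its own gradient signal even though no example is a complete generation.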