"Would it be fair to summari..."

https://arbital.com/p/2rt

by Eric Rogstad Mar 23 2016


Would it be fair to summarize the idea of a conservative concept boundary as a classifier that avoids false positives while remaining simple?


Comments

Eliezer Yudkowsky

Well, the purpose is to avoid the AGI classifying potential goal fulfillments in a way that, from the user's perspective, is a "false positive". The reason we have to spend a lot of time thinking about really, really good ways to have the AGI not guess positive labels on things we wouldn't label as positive is that the training data we present to the AI may be ambiguous in some way, or many ways, we don't know about. That means the AI does not actually have the information to figure out what we meant by looking for the simplest way to classify the training cases; instead, it has to stick to things that are very, very similar to the positively labeled training instances to minimize the probability of screwing up.

I'm pushing back a little on this "classifier that avoids false positives" description because that's what every classifier is in some sense intended to do; you have to be specific about how, or about what approach you're taking, in order to say something that means more than just "classifier that is a good classifier".
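
One way to picture the distinction: instead of learning the simplest boundary that separates the training labels, a conservative classifier says "positive" only for cases very similar to some positively labeled training instance. Here is a minimal sketch of that idea in Python; the nearest-positive distance rule and the `radius` parameter are illustrative assumptions, not a method proposed in this thread.

```python
# A deliberately crude sketch of a conservative concept boundary:
# label a new point positive only if it is very close to some
# positively labeled training instance, rather than whenever it falls
# on the "simple" side of a learned separating boundary.
import numpy as np

def conservative_classify(x, positive_examples, radius):
    """Return True only if x is within `radius` of a known positive.

    positive_examples: (n, d) array of positively labeled points.
    radius: how far from the positives we are willing to generalize;
            smaller radius = fewer false positives, more false negatives.
    """
    distances = np.linalg.norm(positive_examples - x, axis=1)
    return bool(distances.min() <= radius)

# Toy usage: points near the origin were labeled positive in training.
positives = np.array([[0.0, 0.0], [0.1, 0.2], [-0.1, 0.1]])
print(conservative_classify(np.array([0.05, 0.05]), positives, 0.3))  # True
print(conservative_classify(np.array([2.0, 2.0]), positives, 0.3))    # False
```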

Eric Rogstad

I'm pushing back a little on this "classifier that avoids false positives" description because that's what every classifier is in some sense intended to do

Well, presumably there's a trade-off between avoiding false positives and avoiding false negatives, and you want a classifier that tries really hard to avoid false positives, as I understand it.
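
For concreteness, the standard generic way to make that trade is to raise a probabilistic classifier's decision threshold so that it only answers "positive" when it is very sure. Everything in this sketch (the synthetic data, the logistic regression, the 0.95 threshold) is an assumption made for illustration, not something specified in the thread.

```python
# Generic precision-favoring trick: keep the classifier, raise its
# decision threshold, and accept more false negatives in exchange for
# fewer false positives. All data and numbers here are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 0.5, (50, 2))    # synthetic positive class
X_neg = rng.normal(-1.0, 0.5, (50, 2))   # synthetic negative class
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 50 + [0] * 50)

clf = LogisticRegression().fit(X, y)

x_new = np.array([[0.2, 0.2]])        # an ambiguous point between the classes
p = clf.predict_proba(x_new)[0, 1]    # model's probability of "positive"

print(p >= 0.5)   # the usual default threshold
print(p >= 0.95)  # conservative threshold: only say "positive" when very sure
```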

Eric Rogstad

Suppose there are existing generic techniques for developing classifiers that prioritize avoiding false positives over avoiding false negatives -- would you not expect them to find a "conservative concept boundary" by default?
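
One way to make the question concrete, as an illustration rather than an answer from the thread: a threshold-raised classifier of the kind sketched above still extrapolates its simple boundary into regions far from any training data, whereas the nearest-positive rule does not. The model and numbers below are again illustrative assumptions.

```python
# A threshold-raised linear classifier still extrapolates its simple
# boundary: a point far beyond all the training data can get a confident
# "positive", which a boundary that hugs the positive examples would not.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 0.5, (50, 2)),    # positives around (1, 1)
               rng.normal(-1.0, 0.5, (50, 2))])  # negatives around (-1, -1)
y = np.array([1] * 50 + [0] * 50)
clf = LogisticRegression().fit(X, y)

x_far = np.array([[100.0, 100.0]])  # nothing like any training point
# Prints True: the linear boundary extrapolates confidently, even with
# the conservative 0.95 threshold from the previous sketch.
print(clf.predict_proba(x_far)[0, 1] >= 0.95)
# The nearest-positive rule sketched earlier would return False here.
```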