MIT6036L03F19c

So if you want to think of this as a classifier, we have to pick a threshold to make it into one. If you have to place a bet on whether something is positive or negative, then you should predict +1 when sigma of (theta transpose x plus theta naught) is greater than 0.5. I talk in the notes a little bit about how in some cases you might want to set that threshold differently. If the risk of predicting a negative when it should be positive is bad, then maybe you want to reduce the threshold.

So here's a question for you: if I want to predict positive when sigma of this stuff is bigger than 0.5, what's the condition on the inner stuff? What has to be true of the stuff inside for that inequality to be true? It has to be positive, right? So this is the same as when theta transpose x plus theta naught is bigger than 0.

So it is our same old hypothesis class. There was a nice Piazza question about that. It is our same hypothesis class; we're just setting things up differently, for two reasons. One is to make the optimization easier, and the other is that it's actually useful: there's more information when you get these values out. Sometimes in applications it's useful to get this continuous quantity that goes between 0 and 1, and you might take advantage of that. You might say, oh, if I'm very sure it's positive I'll do something, if I'm very sure it's negative I'll do something, but if I'm right near 0.5 then maybe I should not place a bet yet; maybe I should get more information or do something else. So sometimes there's useful information in the fact that the sigma value varies.

Okay, good. So what does this look like in two dimensions? Here's a two-dimensional problem.
Right, our x's have two dimensions, so this is the first dimension of the data and this is the second dimension of the data. Well, okay, actually we've never really looked too much at the one-dimensional case, so we'll do one dimension and then two dimensions.

Okay, so one dimension. Let's come over here. In one dimension, imagine our data is like this: negative, negative, positive, positive. It's just in one dimension, but I'm going to make a little bit of a cheat here: I'm also going to draw the z axis. Or actually, I'm going to draw sigma, the output of the classifier. If we have a nice logistic classifier in one dimension, then one way to see what's going on is to plot the output value as a function of x. The function I'm plotting here is sigma of (theta x plus theta naught). So for some value of theta and theta naught, I'll get an output curve that looks like that. The separator is still wherever this crosses 0.5. So if this is 0.5, then that's our separator. We still have a linear separator, which is a point when it's in one dimension, but the sigma value actually has this variation.

Now, depending on the values of theta and theta naught, we can slide it over, so maybe it goes over like that. We can flip it over so that it goes up on the left side. We can make it more or less steep. That all depends on the theta values. Okay, so that's what it looks like.

In two dimensions it's kind of hard to see because I don't have the third dimension. But for this data, if you found a good hypothesis, a good linear logistic classifier, the analogous sort of figure is that our separator might be this line. What that means is that the sigma function crosses 0.5 at this line. So what is that? Think of sigma just as we did here: x was our data and sigma was this extra dimension, right? It was the output function of
the data. In two dimensions, you think of sigma as growing out of the board. In the negative part it should be kind of like zero, so it should be kind of flat, and then it's going to come out and be like one over here. So it's like taking that thing and sweeping it. It's hard for me to illustrate that; there's a little picture in the notes. So this is some kind of surface over the x1, x2 plane. But always, where it crosses your threshold, it will still induce a linear separator back in your original space. That's an important thing to get.

All right, so that's kind of what this looks like. Actually, this isn't the loss function; this is the prediction. Sigma is the prediction. Now we need a loss function.

Okay, so the hypothesis class is going to be these linear logistic classifiers. They take a parameter vector theta, which is in d dimensions, same as before, and a theta naught. That's going to be our hypothesis class, and now we need to derive a loss function.
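
The threshold argument from this segment can be checked numerically. Below is a minimal Python sketch, not taken from the lecture: the function names (sigmoid, predict_prob, classify, classify_linear, separator_1d) and all parameter values are illustrative assumptions. It relies on two standard facts about the logistic function: it is increasing with sigma(0) = 0.5, and more generally sigma(z) > t exactly when z > log(t / (1 - t)), so any threshold strictly between 0 and 1 still gives a linear separator.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)); fine for moderate z (no overflow guard)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_prob(theta, theta_0, x):
    """sigma(theta^T x + theta_0), with theta and x given as lists of floats."""
    z = sum(t * xi for t, xi in zip(theta, x)) + theta_0
    return sigmoid(z)

def classify(theta, theta_0, x, threshold=0.5):
    """Predict +1 when the sigmoid output exceeds the threshold, else -1."""
    return 1 if predict_prob(theta, theta_0, x) > threshold else -1

def classify_linear(theta, theta_0, x, threshold=0.5):
    """Same rule written as a linear test on z (assumes 0 < threshold < 1).

    The offset log(t / (1 - t)) is 0 for t = 0.5, which recovers
    theta^T x + theta_0 > 0. The two rules agree away from the boundary;
    exactly at the boundary, floating-point rounding can differ.
    """
    z = sum(t * xi for t, xi in zip(theta, x)) + theta_0
    offset = math.log(threshold / (1.0 - threshold))
    return 1 if z > offset else -1

def separator_1d(theta, theta_0):
    """In one dimension, sigma(theta * x + theta_0) = 0.5 at x = -theta_0 / theta."""
    return -theta_0 / theta

if __name__ == "__main__":
    # Hypothetical 1-D data laid out as in the lecture: negative, negative, positive, positive.
    xs = [-2.0, -1.0, 1.0, 2.0]
    theta, theta_0 = [3.0], -1.5  # illustrative values, not from the lecture
    print("separator at x =", separator_1d(theta[0], theta_0))
    for x in xs:
        print(x, round(predict_prob(theta, theta_0, [x]), 4), classify(theta, theta_0, [x]))
    # Negating both parameters flips the curve so it rises on the left.
    print("flipped:", [classify([-3.0], 1.5, [x]) for x in xs])
    # Lowering the threshold moves the separator but keeps it linear.
    print("x = 0.3 at threshold 0.2:", classify(theta, theta_0, [0.3], threshold=0.2))
```

In this sketch, scaling theta and theta_0 by the same positive factor leaves the 1-D separator where it is and only makes the curve steeper, while changing theta_0 alone slides it left or right. In two dimensions the same functions apply with two-element lists, and the points where the output equals 0.5 are those on the line theta^T x + theta_0 = 0.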
