Okay, hi everybody and welcome to Lesson 14.
The numbers are getting up pretty high now, huh? We had a lesson last time talking about
calculus and how we implement the chain rule in neural network training in an efficient way
called backpropagation. I just wanted to point out that one excellent student, Kaushik Sinha,
has produced a very nice explanation of the code that we looked at last time and I've linked
to it. So it's got the math and then the code. The code's slightly different to what I had, but
it's basically the same thing, some minor changes. And it might be helpful to kind of link between
the math and the code to see what's going on. So you'll find that in the Lesson 13
resources. But I thought I'd just quickly try to explain it as well. So maybe I could try to
copy this and just explain what's going on here with this code. So the basic idea is that we
have a neural network that is calculating, well, a neural network and a loss function that
together they calculate a loss. So let's imagine, let’s just call the loss
function, we'll call it L. And the loss function is being applied to the
output of the neural network. So the neural network function we'll call n. And that takes two
things, a bunch of weights and a bunch of inputs. The loss function also requires the targets, but
I'm just going to ignore that for now because it's not really part of what we actually care
about. And what we're interested in knowing is if we want to be able to update the weights,
let's say this is just a single-layer thing, to keep it simple. If we want to be able to update
the weights, we need to know how does the loss change if we change the weights, if we
change one weight at a time, if you like. So how would we calculate that? Well, what we
could do is we could rewrite our loss function by saying, well, let's call capital N the result
of the neural network applied to the weights and the inputs. And that way we can now
rewrite the loss function to say L equals, big L equals, little l, the loss function
applied to the output of the neural network. And so maybe you can see where this is going.
We can now say, okay, the derivative of the loss with respect to the weights is going
to be equal to the derivative of the loss with respect to the outputs
of that neural network layer times, this is the chain rule, the derivative
of the outputs of that neural network layer with respect to the weights. (I'm going to get my
notation consistent, since these are not scalars.) So you can see we can get rid of those and we
end up with the change in loss with respect to the weights. And so we can just say this is
the chain rule. This is what the chain rule is. So the change in the loss with respect
to the output of the neural network, well, we did the forward pass here and
then we took here, this here is where we calculated the derivative of the loss with
respect to the output of the neural network, which came out from here and ended up in diff. So
there it is. So out.g contains this derivative. So then to calculate, let's actually do one
more. We could also say the change in the loss with respect to the inputs, we can do
the same thing with the chain rule times… And so this time we have the
inputs. So here you can see that is this line of code. So that is the change
in the loss with respect to the inputs. That's what input.g means. And it's equal to the
change in the loss with respect to the output. So that's what out.g means. Times… It's
actually matrix times, because we're doing matrix calculus, times this derivative, and
since this is a linear layer we were looking at, this derivative is simply the weights themselves.
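
Written out in symbols, the relationships described here look roughly like the following. This is only a sketch with notation assumed for illustration: it takes a single linear layer N = xW + b with the inputs x stored as rows, and treats out.g as the derivative of the loss with respect to N. The weight gradient, which comes up next, follows the same pattern.

```latex
% Loss as a composition: N is the layer output, \ell is the loss function
L = \ell(N), \qquad N = n(w, x) = xW + b

% Chain rule: the intermediate N is what "cancels"
\frac{\partial L}{\partial w} = \frac{\partial L}{\partial N}\,\frac{\partial N}{\partial w}

% For a linear layer, writing \texttt{out.g} for \partial L / \partial N:
\texttt{inp.g} = \frac{\partial L}{\partial x} = \frac{\partial L}{\partial N}\, W^{\top}
\qquad
\texttt{w.g} = \frac{\partial L}{\partial W} = x^{\top}\, \frac{\partial L}{\partial N}
```
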
And then we have exactly the same thing for w.g, which is the change in the loss,
the derivative of the loss with respect to the weights. And so again, you've got the same
thing. You've got your out.g, and remember we actually showed how we can simplify this into
a matrix product with a transpose as well. So that's how what's happening in
our code maps to the math. So hopefully that's useful, but as I say, do check
out this really nice resource, which has a lot more detail if you're interested in digging
deeper. The other thing I'd say is, some people have mentioned that they actually
didn't study this at high school, which is fine. We've provided resources on the forum
recommending how to learn the basics of derivatives and the chain rule. And so in
particular, I would recommend 3Blue1Brown's Essence of Calculus series and also Khan
Academy. It's not particularly difficult to learn. It'll only take you a few hours and
then this will make a lot more sense. Or if you did it at high school,
but you've forgotten it, same deal. So don't worry if you found this difficult because
you had forgotten, or had never learned, the basic derivative and chain rule stuff. That's
something that you can pick up now and I would recommend doing so. Okay. So what we then did last
time, which is actually pretty exciting, is we got to a point where we had successfully created
a training loop, which did these four steps. So and the nice thing is that every
single thing here is something that we have implemented from scratch. Now, we didn't
always use our implemented from scratch versions. There's no particular reason to, when we've
re-implemented something that already exists, let's use the version that exists. But every
single thing here, well, I guess not argmax, but that's trivially easy to implement. Every
single thing here, we have implemented ourselves and we successfully trained an MNIST model to
recognize handwritten digits with 96% accuracy. So I think that's super neat. I mean,
this is not a great metric. It's only looking at the training set, in
particular it's only looking at one batch of the training set. Since last time, I've just
refactored a little bit. I've pulled out this report function, which is now just running at
the end of each epoch. And it's just printing out the loss and the accuracy. Just something
I wanted to mention here is hopefully you've seen f-strings before. They're a really helpful
part of Python that lets you pop a variable or an expression inside curly braces in a string and
it'll evaluate it. You might not have seen this colon thing. This is called a format specifier.
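
As a small illustration of a format specifier, here is a sketch with made-up values (the variable names and numbers are assumptions, not the ones from the notebook):

```python
# Hypothetical values standing in for what a report function might receive
loss, acc = 0.123456, 0.9613

# ":.2f" is the format specifier: fixed-point notation with two decimal places
line = f"{loss:.2f}, {acc:.2f}"
print(line)  # 0.12, 0.96

# Any expression can go inside the braces, and other specifiers exist too
pct = f"{acc * 100:.1f}%"
print(pct)  # 96.1%
```
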
And with a format specifier, you can change how things are printed in an f-string. So this is
how I'm printing it to two decimal places. This says: a floating point number called loss, printed
to two decimal places here, followed by a comma. So I'm not going to show you how to use those
other than to say, yeah, Python f-strings and format specifiers are really helpful. And so if
you haven't used them before, do go look them up, in a tutorial or the documentation, because they're
definitely something that you'll probably find useful to know about. Okay. So let's
just rerun all those lines of code. If you're wondering how I just reran
all the cells above where I was, there's a cell here. There's Run All Above.
And it's so helpful that I always make sure there's a keyboard shortcut for that. So you
can see here, I've added a keyboard shortcut QA. So if I type QA, it runs all cells above. If
I type QB, it runs all cells below. And so yeah, stuff that you do a lot, make sure you've got
keyboard shortcuts for them. You don't want to be fiddling around, moving around your mouse
everywhere. You want it to be as easy as thinking. So this is really exciting. We've successfully
built and trained a neural network model from scratch and it works okay. It's a bit clunky.
There's a lot of code. There's features we're missing. So let's start refactoring it. And
so refactoring is all about making it so we have to write less code to do the same work.
And so we're now going to, I'm going to show you something that's part of PyTorch and
then I'm going to show you how to build it. And then you'll see why this is really useful. So
PyTorch has a sub module called nn, torch.nn. And in there, there's something called the Module
class. Now we don't normally use it this way, but I just want to show you how it works. We
can create an instance of it in the usual way we create instances of classes, and then we
can assign things to attributes of that module. So for example, let's assign a linear
layer to it. And if we now print out that, you'll see it says, oh, this is a
module containing something called foo, which is a linear layer. But here's something
quite tricky. This module, we can say, show me all of the named children
of that module. And it says, oh, there's one called foo and it's a linear layer.
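
Here is roughly what that looks like as code, as a minimal sketch. The names m1 and foo follow the description above, and the layer size of 3 inputs and 4 outputs is an assumption chosen to match the four by three weights mentioned next.

```python
from torch import nn

m1 = nn.Module()
m1.foo = nn.Linear(3, 4)   # 3 inputs -> 4 outputs

# named_children() returns a generator, so wrap it in list() to see its contents
children = list(m1.named_children())
print(children)

# The layer's weight and bias were registered as parameters of m1 automatically
shapes = [tuple(p.shape) for p in m1.parameters()]
print(shapes)              # [(4, 3), (4,)]
```
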
And we can say, oh, show me all of the parameters of this module. And it says, oh, okay,
sure. There's two of them. There's this four by three tensor, that's the weights. And
there's this four long vector, that's the biases. And so somehow just by creating this module
and assigning this to it, it's automatically tracked what's in this module and what are its
parameters. That's pretty neat. So we're going to see both how and why it does that. I'm just going
to point out, by the way, why did I add list here? If I just said m1.named_children(), it just
prints out generator object, which is not very helpful. And that's because this is a kind of
iterator called a generator. And it's something which is going to only produce the contents
of this when I actually do something with it, such as list them out. So just popping a list
around a generator is one way to run the generator and get its output. So that's a little trick
when you want to look inside a generator. Okay. So now, as I said, we don't normally use
it this way. What we normally do is we create our own class. So for example, we'll create
our own multi-layer perceptron and we inherit it. We inherit from nn.Module. And so then in
dunder init, this is the thing that constructs an object of the class. This is the special magic
method that does that. We'll say, okay, well, how many inputs are there to this multi-layer
perceptron? How many hidden activations and how many output activations are there? So it'd just
be one hidden layer. And then here we can do just like we did up here, where we assigned things as
attributes, we can do that in this constructor. So we'll create an l1 attribute, which is a
linear layer from number in to number hidden. l2 is a linear layer from number hidden to
number out, and we'll also create a ReLU. And so, when we call that module, we
can take the input that we get and run the linear layer and then run the ReLU and
then run the l2. And so I can create one of these, as you see, and I can have a look and see like,
oh, here's the attribute l1. And there it is, like I had, and I can say, print out the model and
the model knows all the stuff that's in it. And I can go through each of the named children and
print out the name and the layer. Now, of course, if you remember, although you can use dunder call,
we actually showed how we can refactor things using forward such that it would automatically
kind of do the things necessary to make all the automatic gradient stuff work correctly. And
so in practice, we're actually not going to do dunder call, we would do forward. So this is an
example of creating a custom PyTorch module. And the key thing to recognize is that it knows
what are all the attributes you added to it. And it also knows what are all the parameters. So
if I go through the parameters and print out their shapes, you can see I've got my first linear
layer's weights, my first linear layer's biases, my second linear layer's weights, and my second
linear layer's biases. And this 50
is because we set nh, the number of hidden, to 50. So why is that interesting? Well, because
now I don't have to write all this anymore going through layers and having to make
sure that they've all been put into a list. We've just been able to add them as
attributes and they're automatically going to appear as parameters. So we
can just say, go through each parameter and update it based on the gradient
and the learning rate. And furthermore, you can actually just go model.zero_grad()
and it'll zero out all of the gradients. So that's really made our code quite a lot nicer
and quite a lot more flexible, which is cool. So let's check that this still works. There we go. So just to clarify, if I called
report() on this before I ran it, as you would expect, the accuracy is about 8%, well, about
10%, a bit less, and the loss is pretty high. And so after I run fit() on this model,
the accuracy goes up and the loss goes down. So basically all of this is
exactly the same as before. The only thing I've changed is these two lines
of code. So that's a really useful refactoring. So how on earth did this happen? How did it know
what the parameters and layers are automatically? It used a trick called dunder setattr, and
we're going to create our own nn.Module now. So if there was no such thing as
nn.Module, here's how we'd build it. And so let's actually build it and also add some
things to it. So in dunder init, we would have to create a dictionary for our named children. This
is going to contain a dictionary of all of the layers. Okay. So just like before,
we'll create a couple of linear layers, right? And then what we're going to do is we're
going to define this special magic thing that Python has called dunder setattr. And this is
called automatically by Python, if you have it, every time you set an attribute such as here or
here. And it's going to be passed the name of the attribute, the key, and the value is the actual
thing on the right hand side of the equal sign. Now, generally speaking, things that start with an
underscore we use for private stuff. So we check that it doesn't start with an underscore. And if
it doesn't start with an underscore, setattr will put this value into the modules dictionary
with this key and then call the normal Python setattr to make sure it
actually does the attribute setting. So super() is how you call whatever
is in the super class, the base class. So another useful thing to know about is how does
it do this nifty thing where you can just type the name and it kind of lists out all this information
about it. That's a special thing called dunder repr. So here dunder repr will just have it return
a stringified version of the modules dictionary. And then here we've got parameters(). How did
parameters work? So how did this thing work? Well, we can go through each of those modules,
go through each value. So the values of the modules is all the actual layers and then go
through each of the parameters in each module and yield p. So that's going to create an
iterator, if you remember when we looked at iterators, over all the parameters. So let's
try it. So we can create one of these modules and if we just like before loop
through its parameters, there they are. Now I'll just mention something that's optional,
kind of like advanced Python that a lot of people don't know about, which is there's no need to
loop through a list or a generator, or I guess I should say loop through an iterator, and yield. There's
actually a shortcut, which is you can just say: yield from and then give it the iterator.
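
Here's a rough, self-contained sketch of the kind of module being built here, with the loop-and-yield version and the yield from version side by side. FakeLinear is a made-up stand-in for a real layer, and parameters_loop is a hypothetical name added only for comparison.

```python
class FakeLinear:
    """Hypothetical stand-in for a linear layer: it just holds two named 'parameters'."""
    def __init__(self, n_in, n_out):
        self.w = f"weights {n_in}x{n_out}"
        self.b = f"bias {n_out}"

    def parameters(self):
        yield self.w
        yield self.b

class MyModule:
    def __init__(self, n_in, nh, n_out):
        self._modules = {}                 # starts with "_", so it is not registered as a layer
        self.l1 = FakeLinear(n_in, nh)     # each assignment goes through __setattr__ below
        self.l2 = FakeLinear(nh, n_out)

    def __setattr__(self, k, v):
        # Register public attributes as children, then do the normal attribute set
        if not k.startswith("_"):
            self._modules[k] = v
        super().__setattr__(k, v)

    def __repr__(self):
        return f"{self._modules}"

    def parameters_loop(self):
        # Explicit version: loop and yield one at a time
        for layer in self._modules.values():
            for p in layer.parameters():
                yield p

    def parameters(self):
        # Same result, delegating the inner loop with `yield from`
        for layer in self._modules.values():
            yield from layer.parameters()

mdl = MyModule(784, 50, 10)
params = list(mdl.parameters())
print(params)
```
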
And so with that, we can get this all down to one line of code and it'll do exactly
the same thing. So that's basically saying yield one at a time, everything in here, that's
what yield from does. So there's a cool little advanced Python thing, totally optional, but if
you're interested, I think it can be kind of neat. So we've now learned how to create our own
implementation of nn.Module and therefore we are now allowed to use PyTorch's
nn.Module. So that's good news. So using the PyTorch nn.Module, how
would we create the model that we started with, which is where we had this self.layers? Because we
want to somehow register all of these all at once. That's not going to happen
based on the code we just wrote. So to do that, let's have a look. We can, so
let's make a list of the layers we want. And so we'll create again a subclass of nn.Module. Make
sure you call the superclass's dunder init first, and we'll just store the list of layers and
then to tell PyTorch about all those layers, we basically have to loop through them and call add_module() and say what the name of
the module is and what the module is. And again, probably should have used
forward here in the first place. And you can see this has now done exactly the same
thing. Okay. So if you've used a sequential model before, you'll see, or you can see that we're
on the path to creating a sequential model. Okay. So Ganesh has asked an interesting
question, which is what on earth is super calling, because we actually don't have a base
class. And in fact, we don't even need the parentheses here. That's
because if you don't put any parentheses, or if you put empty parentheses, it's actually
a shortcut for writing object there. And so Python has stuff in object, which does, you know,
all the normal objecty things like storing your attributes so that you can get them
back later. So that's what's happening there. Okay. So this is a little bit awkward, to
have to store the list and then enumerate and call add_module(). So now that we've implemented
that from scratch, we can use PyTorch's version, which is they've just got something called
ModuleList that just does that for you. Okay. So if you use ModuleList and pass it
a list of layers, it will just go ahead and register all those modules for you. So
here's something called SequentialModel. This is just like nn.Sequential now. So if I create it
passing in the layers, there you go. You can see there's my model containing my
module list with my layers. And so, I don't know why I never used
forward for these things. It's silly. I guess it doesn't matter
terribly in this stage, but anywho. Okay. So, call fit() and there we go. Okay. So in forward
here, I just go through each layer and I set the result of that equal to calling that layer on the
previous result and then return it at the end. Now there's another way of doing
this, which I think is kind of fun. It's not, like, shorter or anything at this stage. I just
wanted to show an example of something that you see quite a lot in machine learning code, which
is the use of reduce(). This implementation here is exactly the same as this thing here. So let me explain how it works. What reduce
does. Reduce is a very common kind of, like, fundamental computer science concept: reductions.
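
To make that concrete, here is a toy sketch comparing an explicit loop with reduce(). The three lambdas are hypothetical stand-ins for layers, and forward_loop and forward_reduce are illustrative names rather than the ones in the notebook.

```python
from functools import reduce

# Hypothetical stand-ins for layers: any callables that take a value and return one
layers = [lambda v: v + 1, lambda v: v * 10, lambda v: v - 3]

def forward_loop(x):
    for l in layers:
        x = l(x)
    return x

def forward_reduce(x):
    # reduce(function, sequence, initial): the function receives the running
    # result and the next item, here the value so far and the next layer
    return reduce(lambda val, layer: layer(val), layers, x)

print(forward_loop(2), forward_reduce(2))  # 27 27
```
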
This is something that does a reduction, and what a reduction is, is it's something that says: start
with some initial value, the third parameter. So we're going to start with x, the thing we're being
passed, and then loop through a sequence. So loop through each of our layers and then for each
layer call some function. Here is our function. And the first time around, the function is going
to get passed the initial value and the first thing in your
list. So x and your first layer. So it's just going to call the layer function on
x. The second time around it takes the output of that and passes it in as the first parameter and
passes in the second layer. So then the second time this goes through, it's going to be calling
the second layer on the result of the first layer and so forth. And that's what a reduction
is. And so you might see reduce(); you'll certainly see it talked about quite a lot
in papers and books, and you might sometimes also see it in code. It's a very general concept. And
so here's how you can implement a sequential model using reduce(). So there's no explicit loop there,
although the loop is still happening internally. So now that we've reimplemented sequential, we
can just go ahead and use PyTorch's version. So there's nn.Sequential. We can pass in our
layers and we can fit, not surprisingly. We can see the model. So yeah, it looks very
similar to the one we built ourselves. All right. So this thing of
looping through parameters and updating our parameters based on gradients
and a learning rate, and then zeroing them is very common. So common that there is something
that does that all for us. And that's called an optimizer. It's the stuff in optim. So let's
create our own optimizer. And as you can see, it's just going to do the two things we just saw.
It's going to go through each of the parameters and update them using the gradient and the
learning rate. And there's also zero grad, which will go through each parameter
and set their gradients to zero. If you use .data, it's just a way of avoiding
having to say torch.no_grad, basically. So in optimizer, we're going to pass it
the parameters that we want to optimize, and we're going to pass it the learning rate.
And we're just going to store them away. And since the parameters might be a generator,
we'll call list() to turn them into a list. So we're going to create our optimizer, pass
it in the model.parameters(), which have been automatically constructed for us by nn.Module.
And so here's our new loop. Now we don't have to do any of the stuff manually. We can just
say opt.step(). So that's going to call this. And opt.zero_grad(). And
that's going to call this. There it is. So we've now built our own SGD
optimizer from scratch. So I think this is really interesting, right? These things which
seem like they must be big and complicated, once we have this nice structure in place,
an SGD optimizer doesn't take much code at all. And so it's all very transparent,
simple, clear. If you're having trouble using complex library code that you've found
elsewhere, this can be a really good approach: to actually just go all the way back, remove
as many of these abstractions as you can and run everything by hand to see exactly what's
going on. It can be really freeing to see that you can do all this. Anyway, since PyTorch
has this for us in torch.optim, it's got an optim.SGD(). And just like our version, you pass
in the parameters and you pass in the learning rate. So you really see it is just the same.
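
One way to check that claim is to take a single step with a hand-written optimizer and with optim.SGD from the same starting point. This is a sketch under assumptions: the tiny nn.Linear model, the random data, the mean squared error loss, and the one_step helper are all made up for the comparison.

```python
import torch
from torch import nn, optim

class Optimizer:
    """Plain SGD, along the lines described above."""
    def __init__(self, params, lr=0.5):
        self.params, self.lr = list(params), lr   # list() in case params is a generator

    def step(self):
        with torch.no_grad():                     # don't track the update itself
            for p in self.params:
                p -= p.grad * self.lr

    def zero_grad(self):
        for p in self.params:
            p.grad.data.zero_()

def one_step(make_opt):
    torch.manual_seed(0)                          # same initial weights and data each time
    model = nn.Linear(3, 2)
    x, y = torch.randn(8, 3), torch.randn(8, 2)
    opt = make_opt(model.parameters())
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad()
    return [p.detach().clone() for p in model.parameters()]

ours = one_step(lambda ps: Optimizer(ps, lr=0.5))
theirs = one_step(lambda ps: optim.SGD(ps, lr=0.5))
same = all(torch.allclose(a, b) for a, b in zip(ours, theirs))
print(same)
```
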
So let's define something called get_model(). That's going to return the model, the sequential
model and the optimizer for it. So if we go model, opt equals get_model(), and then we can call
the loss function to see where it's starting. And so then we can write our training loop
again. Go through each epoch, go through each starting point for our batches, grab the slice,
slice into our X and Y in the training set, calculate our predictions, calculate our loss,
do the backward pass, do the optimizer step, do the zero gradient and print out how you're
going at the end of each one. And there we go. All right. So let's keep making this
simpler. There's still too much code. So one thing we could do is we could replace
these lines of code with one line of code by using something we'll call the Dataset class. So the
Dataset class is just something that we're going to pass in our independent and dependent variable.
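
A minimal sketch of such a class might look like this. The toy lists are hypothetical stand-ins for the training tensors; anything that supports len() and indexing would work the same way.

```python
class Dataset:
    def __init__(self, x, y):
        self.x, self.y = x, y            # independent and dependent variables

    def __len__(self):
        return len(self.x)               # makes len(ds) work

    def __getitem__(self, i):
        return self.x[i], self.y[i]      # makes ds[i] and ds[a:b] work

# Hypothetical toy data: 10 "images" of 4 pixels each, and 10 labels
x_train = [[float(i)] * 4 for i in range(10)]
y_train = [i % 3 for i in range(10)]

train_ds = Dataset(x_train, y_train)
xb, yb = train_ds[0:5]                   # a slice returns a tuple of (x slice, y slice)
print(len(train_ds), len(xb), yb)        # 10 5 [0, 1, 2, 0, 1]
```
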
We'll store them away as self.x and self.y. And if you define dunder len,
then that's the thing that allows the len function to work. So the length of the Dataset will just
be the length of the independent variables. And then dunder getitem is the thing that
will be called automatically anytime you use square brackets in Python. So that's just
going to call this function, passing in the indices that you want. So when
we grab some items from our Dataset, we're going to return a tuple of the x values and
the y values. So then we'll be able to do this. So let's create a Dataset using
this tiny little three-line class. It's going to be a Dataset containing the
x and y training, and then create another Dataset containing the x and y valid. And those
two datasets we'll call train_ds and valid_ds. So let's check: the length of those datasets should
be the same as the length of the x's, and they are. And so now we can do exactly what
we hoped we could do. We can say xb comma yb equals train_ds and pass in some slice. So that's going to give us back our… Check the
shapes are correct. It should be 5 by 28 by 28, sorry, 5 by 28 times 28, and the y should just be
five. And so here they are, the x's and the y's. So that's nice. We've created a Dataset from
scratch. And again, it's not complicated at all. And if you look at the actual PyTorch
source code, this is basically all Dataset does. So let's try it. We call get_model(). And so now
we've replaced our dataset line with this one and per usual, it still runs. And so this is what
I do when I'm writing code is I try to, like, always make sure that my starting code works as
I refactor. And so you can see all the steps. And so somebody reading my code can then see
exactly like, why am I building everything I'm building? How does it all fit in? See that
it still works. And I can also keep it clear in my own head. So I think this is a really
nice way of implementing libraries as well. All right. So now we're going to
replace these two lines of code with this one line of code. So we're going
to create something called a DataLoader and a DataLoader is something that's just going to
do this. Okay. So we need to create an iterator. So an iterator is a class that has a dunder
iter method. When you say “for in” in Python, behind the scenes, it's actually calling dunder
iter to get a special object, which it can then loop through using yield. So it's basically
getting this thing that you can iterate through using yield. So a DataLoader is something
that's going to have a Dataset and a batch size, because we're going to go through the
batches and grab one batch at a time. So we have to store away the Dataset and the batch
size. And so when you, when we call the for loop, it's going to call dunder iter. We're going to
want to do exactly what we saw before, go through the range, just like we did before, and then
yield that bit of the data set. And that's all. So that's a DataLoader. So we can now create
a train DataLoader and a valid DataLoader from our train Dataset and valid Dataset. And so
now, if you remember, the way you can get one thing out of an iterator, so you don't
need to use a for loop, is you can just say iter, and that will also call dunder iter. And next will
just grab one value from it. So here we will run this and you can see we've now just confirmed
xb is 50 by 784 and yb, there it is. And then we can check what it looks like. So
let's grab the first element of our X batch, make it 28 by 28. And there it is. So now that
we've got a DataLoader, again, we can grab our model and we can simplify our fit function to just
go for xb, yb in train_dl. So this is getting nice and small, don't you think? And it still works
the same way. Okay. So this is really cool. And now that it's nice and concise,
we can start adding features to it. So one feature I think we should add is that
our training set, each time we go through it, it should be in a different order. It should
be randomized, the order. So instead of always just going through these indexes in order, we
want some way to say, go use random indexes. So the way we can do that is create a class
called Sampler. And what sampler is going to do, I'll show you, is if we create a sampler
without shuffle, without randomizing it, it's going to simply return all
the numbers from zero up to n in order and it'll be an iterator. See, this
is dunder iter. But if I do want it shuffled, then it will randomly shuffle them. So here you
can see I've created a sampler without shuffle. So if I then make an iterator from that and print
a few things from the iterator, you can see it's just printing out the indexes it's going to
want. Or I can do exactly the same thing as we learned earlier in the course using islice. We
can grab the first five. So here's the first five things from a sampler when it's not shuffled.
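
Here is a minimal sketch of a sampler along those lines. The ten-item list is a hypothetical stand-in for a dataset; only its length is used.

```python
import random
from itertools import islice

class Sampler:
    def __init__(self, ds, shuffle=False):
        self.n, self.shuffle = len(ds), shuffle

    def __iter__(self):
        idxs = list(range(self.n))
        if self.shuffle:
            random.shuffle(idxs)          # in-place random permutation
        return iter(idxs)

ds = list("abcdefghij")                   # hypothetical stand-in for a dataset of 10 items

ordered = list(islice(Sampler(ds), 5))
print(ordered)                            # [0, 1, 2, 3, 4]

shuffled = list(Sampler(ds, shuffle=True))
print(shuffled)                           # the same 10 indexes in a random order
```
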
So as you can see, these are just indexes. So we could add shuffle equals true. And now
that's going to call random.shuffle(), which just randomly permutes them. And now if I do the same
thing, I've got random indexes of my source data. So why is that useful? Well, what we could now
do is create something called a BatchSampler. And what the BatchSampler is going to do is it's
going to basically do this islice thing for us. So we're going to say, okay, pass in a sampler.
So that's something that generates indices and pass in a batch size. And remember, we've
looked at chunking before. It's going to chunk that iterator by that batch size. And so if I now say, all right, please
take our sampler and create batches of 4. As you can see here, it's creating batches
of four indices at a time. So rather than just looping through them in order, I
can now loop through this BatchSampler. So we're going to change our data loader
so that now it's going to take some BatchSampler. And it's going to loop through the
BatchSampler. That's going to give us indices. And then we're going to get the Dataset item
for every index in that batch. So that's going to give us a list. And then we
have to stack all of the x’s and all of the y’s together into tensors. So I've created
something here called collate function. And we're going to default that to this little
function here, which is going to grab our batch, pull out the x’s and y’s separately,
and then stack them up into tensors. So this is called our collate function.
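
The pieces described so far can be sketched together like this. It is a plain-Python approximation under assumptions: lists stand in for tensors, so the collate function just regroups values instead of stacking them, islice does the chunking, and a plain range stands in for an unshuffled sampler.

```python
from itertools import islice

class BatchSampler:
    def __init__(self, sampler, bs):
        self.sampler, self.bs = sampler, bs

    def __iter__(self):
        it = iter(self.sampler)
        # Keep taking bs indexes at a time until the sampler runs out
        while batch := list(islice(it, self.bs)):
            yield batch

def collate(batch):
    # batch is a list of (x, y) pairs; zip(*batch) regroups it into all x's and all y's.
    # With tensors, each group would then be stacked into a single tensor.
    xs, ys = zip(*batch)
    return list(xs), list(ys)

class DataLoader:
    def __init__(self, ds, batch_sampler, collate_fn=collate):
        self.ds, self.batch_sampler, self.collate_fn = ds, batch_sampler, collate_fn

    def __iter__(self):
        for idxs in self.batch_sampler:
            yield self.collate_fn([self.ds[i] for i in idxs])

# Hypothetical toy dataset: a list of (x, y) pairs, indexed like a Dataset
ds = [(i * 1.0, i % 2) for i in range(10)]
dl = DataLoader(ds, BatchSampler(range(len(ds)), bs=4))
batches = list(dl)
print(batches[0])    # ([0.0, 1.0, 2.0, 3.0], [0, 1, 0, 1])
print(len(batches))  # 3
```
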
Okay. So if you put all that together, we can create a training sampler, which is a batch
sampler over the training set with shuffle true. A validation sampler will be a batch sampler
over the validation set with shuffle false. And so then we can pass that
into this DataLoader class, the training data set and the training sampler
and the collate function, which we don't really need because we're just using the default
one. So I guess we can just get rid of that. And so now here we go. We can do
exactly the same thing as before xb, yb, next, iter. And this time we use the valid
DataLoader, check the shapes. And this is how PyTorch's actual DataLoader works. These are
all the pieces they have. They have samplers, they have batch samplers, they have
a collation function and they have DataLoaders. So remember that what I want you
to be doing for your homework is experimenting with these carefully to see
exactly what each thing's taking in. Okay. So Piotr is asking on the chat, what is this
collate thing doing? Okay. So collate function, it defaults to collate. What does it do? Well,
let's see, let's go through each of these steps. Okay. So we need, so we've got a batch
sampler, so let's do just the valid sampler. Okay. So the batch sampler, here it is. So we're going to go through each
thing in the batch sampler. So let's just grab one thing from the batch sampler. Okay. So the
output of the batch sampler will be next. It's okay. So here's what the batch sampler
contains. All right. Just the first 50 indices, not surprisingly, because this is our
validation sampler. If we did a training sampler, that would be randomized. There they
are. Okay. And what we then do is we go self.dataset[i] for i in b.
So let's copy that. Copy, paste. And so rather than self.dataset[i],
we'll just say valid_ds[i]. Oh, and it's not i in b, it's i
in o, that's what we called it. Oh, and we did it for training. Sorry. Training.
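
The same step-through can be tried on toy data. In this sketch the string items and the index list are made up, and the names o and p just follow the ones used here.

```python
# Hypothetical toy dataset: each item is an (x, y) pair
train_ds = [("x0", "y0"), ("x1", "y1"), ("x2", "y2"), ("x3", "y3")]
o = [2, 0, 3]                       # one batch of indexes from a batch sampler

p = [train_ds[i] for i in o]        # list of (x, y) tuples
print(p[0])                         # ('x2', 'y2')

xs, ys = zip(*p)                    # "transpose": all the x's together, all the y's together
print(xs)                           # ('x2', 'x0', 'x3')
print(ys)                           # ('y2', 'y0', 'y3')
```
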
Okay. So what it's created here is a list of tuples of tensors, I think. Let's have a
look. So let's call this p, whatever. So p[0], okay, is a tuple. It's got
the x and the y, the independent and dependent variables. So that's not what we want. What we want is something that
we can loop through. We want to get batches. So what the collate function
is going to do is it's going to take all of our x's and all
of our y’s and collate them into two tensors, one tensor of x’s and one tensor of y’s. So the
way it does that is it first of all calls zip(). So zip is a very, very commonly used Python
function. It's got nothing to do with the compression program zip, but instead what it does
is it effectively allows us to transpose things so that now, as you can see, we've got all of the
second elements or index 1 elements all together and all of the index 0 elements together. And
so then we can stack those all up together and that gives us our y’s for our batch. So that's
what collate does. So the collate function is used an awful lot in PyTorch, increasingly nowadays
where Hugging Face stuff uses it a lot. And so we'll be using it a lot as well. And basically
it's a thing that allows us to customize how the data that we get back from our Dataset, once
it's been kind of generating a list of things from the Dataset, how do we put it together
into a bunch of things that our model can take as inputs? Because that's really what we want
here. So that's what the collation function does. Oh, this is the wrong way around. Like so. This is something that I do so often that
fastcore has a quick little shortcut for it, just called store_attr, store attributes. And so if you
just put that in your dunder init, then you just need one line of code and it does exactly the same
thing. So there's a little shortcut as you see. And so you'll see that quite a bit. All
right. Let's have a seven minute break and see you back here very soon. And we're going
to look at a multi-processing DataLoader, and then we'll have nearly finished
this notebook. All right. See you soon. All right. Let's keep going. So we've seen how
to create a DataLoader and sampling from it. The PyTorch DataLoader works exactly like this,
but it uses a lot more code because it implements multi-processing. And so multi-processing means
that the actual, this thing here, that code, can be run in multiple processes. They can be run
in parallel for multiple items. So this code, for example, might be opening up a JPEG, rotating it,
flipping it, et cetera. Right? So because remember this is just calling the dunder getitem for a
Dataset. So that could be doing a lot of work for each item and we're doing it for every item in the
batch. So we'd love to do those all in parallel. So I'll show you a very quick and dirty
way that basically does the job. So Python has a multi-processing library. It doesn't
work particularly well with PyTorch tensors. So PyTorch has created an exact re-implementation of
it. So it's identical API-wise, but it does work well with tensors. So this is basically where I've
grabbed the multiprocessing. So this is not quite cheating because multiprocessing is in
the standard library and this is API equivalent. So I'm going to say, we're allowed to do that.
So as we've discussed, you know, when we call square brackets on a class, it's actually
identical to calling the dunder getitem function on the object. So you can see here, if
we say, give me items 3, 6, 8, and 1, it's the same as calling dunder
getitem passing in 3, 6, 8, and 1. Now why does this matter? Well, I'll show you why.
It matters because we're going to be able to use map and I'll explain why we want to use map in
a moment. Map is a really important concept. You might've heard of map-reduce. So we've already
talked about reductions and what those are. Maps are kind of the other key piece. Map is something
which takes a sequence and calls a function on every element of that sequence. So imagine we had
a couple of batches of indices, 3 and 6 and 8 and 1. Then we're going to call dunder getitem
on each of those batches. So that's what map does. Map calls this function on every element
of the sequence. And so that's going to give us the same stuff as this, but now
batched into two batches. Now why do we want to do that? Because multiprocessing has something called
Pool where you can tell it how many workers do you want to run, how many processes you want to run.
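
(A small sketch of the two ideas above: square brackets are the same call as dunder getitem, and map applies that call to each batch of indices. The Dataset class here is a plain-Python stand-in, assumed for illustration; with tensors the list indexing comes for free.)

```python
class Dataset:
    def __init__(self, x, y): self.x, self.y = x, y
    def __len__(self): return len(self.x)
    def __getitem__(self, i):
        # Accept either a single index or a list of indices
        if isinstance(i, list):
            return [self.x[j] for j in i], [self.y[j] for j in i]
        return self.x[i], self.y[i]

ds = Dataset(x=[i * 10 for i in range(10)], y=[i % 2 for i in range(10)])

a = ds[[3, 6, 8, 1]]               # square brackets...
b = ds.__getitem__([3, 6, 8, 1])   # ...are the same call as dunder getitem

# map calls the function once per element of the sequence, here once per batch
batches = [[3, 6], [8, 1]]
out = list(map(ds.__getitem__, batches))
```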
And it then has a map which works just like the normal Python map, but it runs this function
in parallel over the items from this iterator. So this is how we can create a multiprocessing
DataLoader. So here we're creating our DataLoader. And again, we don't actually need to pass in the
collate function because we're using the default one. So if we say n_workers equals 2 and then
create that, if we say next, see how it's taking a moment and it took a moment because it was
firing off those two workers in the background. So the first batch actually comes out more slowly.
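
(A rough sketch of the shape of such a DataLoader. Note the swap: a thread pool from concurrent.futures stands in for the process Pool described above, purely so the sketch runs anywhere without pickling concerns; its map has the same "like the normal map, but concurrent" behaviour. The class name ParallelDataLoader and the toy Dataset are assumptions, not the notebook's code.)

```python
from concurrent.futures import ThreadPoolExecutor

class Dataset:
    def __init__(self, x, y): self.x, self.y = x, y
    def __len__(self): return len(self.x)
    def __getitem__(self, i):
        # Accept either a single index or a list of indices
        if isinstance(i, list):
            return [self.x[j] for j in i], [self.y[j] for j in i]
        return self.x[i], self.y[i]

class ParallelDataLoader:
    def __init__(self, ds, batches, n_workers=1):
        self.ds, self.batches, self.n_workers = ds, batches, n_workers
    def __iter__(self):
        # ex.map works like the builtin map, but runs the calls concurrently
        # and still yields the results in the original order
        with ThreadPoolExecutor(max_workers=self.n_workers) as ex:
            yield from ex.map(self.ds.__getitem__, self.batches)

ds = Dataset(x=[i * 10 for i in range(10)], y=[i % 2 for i in range(10)])
batches = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
dl = ParallelDataLoader(ds, batches, n_workers=2)
xb, yb = next(iter(dl))
```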
But the reason that we would use a multiprocessing DataLoader is if this is doing a lot of work, we
want it to run in parallel. And even though the first item might come out a bit slower,
once those processes are fired up, it's going to be faster to run. So this is a really
simplified multiprocessing DataLoader. Because this needs to be super, super efficient,
PyTorch has lots more code than this to make it much more efficient. But the idea is this,
and this is actually a perfectly good way of experimenting or building your own DataLoader
to make things work exactly how you want. So now that we've re-implemented all this from
PyTorch, let's just grab PyTorch’s. As you can see, they're exactly the same DataLoader. They
don't have one thing called sampler that you pass shuffle to. They have two separate classes
called SequentialSampler and RandomSampler. I don't know why they do it that way. It's a
little bit more work to me, but same idea. And they've got BatchSampler. And so it's exactly the
same idea. The training sampler is a BatchSampler with a RandomSampler. The validation sampler
is a BatchSampler with a SequentialSampler. Pass them in batch sizes. And so we can now pass
those samplers to the DataLoader. This is now the PyTorch’s DataLoader. And just like ours, it
also takes a collate function. And it works. Cool. So that's, as you can see, it's doing exactly
the same stuff that ours is doing with exactly the same API. And it's got some shortcuts, as I'm
sure you've noticed when you've used DataLoaders. So for example, calling batch sampler is going
to be very, very common. So you can actually just pass the batch size directly to a DataLoader, and
it will then auto-create the batch samplers for you. So you don't have to pass in BatchSampler at
all. Instead you can just say sampler, and it will automatically wrap that in the batch sampler
for you. So it does exactly the same thing. And in fact, because it's so common to create
a RandomSampler or a SequentialSampler for a Dataset, you don't have to do that manually.
You can just pass in shuffle equals true or shuffle equals false to the DataLoader. And
that does, again, exactly the same thing. There it is. Now something that is very
interesting is that, when you think about it, the batch sampler and the collation function
are things which are taking the result of the sampler, looping through them,
and then collating them together. But what we could do is, actually, because our Datasets know how to grab multiple
indices at once, we can actually just use the BatchSampler as a sampler. We don't
actually have to loop through them and collate them because they're basically instantly,
they come pre-collated. So this is a trick which actually Hugging Face stuff can use as well, and
we'll be seeing it again. So this is an important thing to understand is how come we can pass a
BatchSampler to sampler and what's it doing? And so rather than trying to look through the
PyTorch code, I suggest going back to our non-multi-processing pure Python code to see
exactly how that would work. Because it's a really nifty trick for things that you can grab multiple
things from at once and it can save a whole lot of time. It can make your code a lot faster. Okay.
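
(A pure-Python sketch of why that trick works, under the assumption that the dataset can be indexed with a whole list of indices. Looping over the indices and collating with zip gives the same result as handing the dataset the list directly, so the loop and the collate step can be skipped. The class and data here are illustrative stand-ins.)

```python
def collate(b):
    # zip(*b) transposes a list of (x, y) pairs into (all the x's, all the y's)
    xs, ys = zip(*b)
    return list(xs), list(ys)

class Dataset:
    def __init__(self, x, y): self.x, self.y = x, y
    def __len__(self): return len(self.x)
    def __getitem__(self, i):
        if isinstance(i, list):
            return [self.x[j] for j in i], [self.y[j] for j in i]
        return self.x[i], self.y[i]

ds = Dataset(x=[i * 10 for i in range(10)], y=[i % 2 for i in range(10)])
idxs = [3, 6, 8, 1]

looped = collate([ds[i] for i in idxs])   # one item at a time, then collate
direct = ds[idxs]                         # whole batch at once, already collated
```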
So now that we've got all that nicely implemented, we should now add a validation set. And there's
not really too much to talk about here. We'll just take our fit function, and this is
exactly the same code that we had before. And then we're just going to add something
which goes through the validation set and gets the predictions and sums up the losses
and accuracies and from time to time prints out the loss and accuracy. And so get_dls(), we will
implement by using the PyTorch DataLoader now. And so now our whole process will be get_dls()
passing in the training and validation dataset. Notice that for our validation DataLoader, I'm
doubling the batch size because it doesn't have to do back propagation. So it should use about half
as much memory so I can use a bigger batch size. Get our model and then call this
fit. And now it's printing out the loss and accuracy on the validation set.
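
(A hedged sketch of the two pieces described here, using PyTorch's DataLoader: a get_dls that doubles the batch size for validation, and a validation pass that averages loss and accuracy over the whole set rather than just the last batch. The validate helper, the random data and the tiny linear model are assumptions made up for the example.)

```python
import torch
from torch import nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def get_dls(train_ds, valid_ds, bs, **kwargs):
    # Validation does no backpropagation, so it can afford twice the batch size
    return (DataLoader(train_ds, batch_size=bs, shuffle=True, **kwargs),
            DataLoader(valid_ds, batch_size=bs * 2, **kwargs))

def validate(model, loss_func, valid_dl):
    model.eval()
    tot_loss, tot_acc, count = 0., 0., 0
    with torch.no_grad():
        for xb, yb in valid_dl:
            pred = model(xb)
            n = len(xb)
            count += n
            # Weight each batch by its size so a short last batch is handled correctly
            tot_loss += loss_func(pred, yb).item() * n
            tot_acc += (pred.argmax(dim=1) == yb).float().mean().item() * n
    return tot_loss / count, tot_acc / count

torch.manual_seed(0)
train_ds = TensorDataset(torch.randn(200, 8), torch.randint(0, 3, (200,)))
valid_ds = TensorDataset(torch.randn(100, 8), torch.randint(0, 3, (100,)))
train_dl, valid_dl = get_dls(train_ds, valid_ds, bs=32)

model = nn.Linear(8, 3)
loss, acc = validate(model, F.cross_entropy, valid_dl)
```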
So finally we actually know how we're doing, which is that we're getting 97% accuracy on the
validation set. And that's on the whole thing, not just on the last batch. So that's cool. We've now
implemented a proper, working, sensible training loop. It's still, you know, a bit more code
than I would like, but it's not bad. And every line of code in there and every line of code it's
calling is all stuff that we have built ourselves, re-implemented ourselves. So we know exactly
what's going on and that means it's going to be much easier for us to create anything we can think
of. We don't have to rely on other people's code. So hopefully you're as excited about that as I
am. Cause it really opens up a whole world for us. So one thing that we're going to want to be able
to do now that we've got a training loop is to grab data. And there's a really fantastic library
of datasets available on Hugging Face nowadays. And so let's look at how we use those datasets
now that we know how to bring things into data loaders and stuff so that now we can use
the entire world of Hugging Face datasets with our code. So we're going to,
so you need to pip install datasets. And once you've pip installed datasets, you'll be able
to say from datasets import, and you can import a few things. I've just got these two things now,
load_dataset, load_dataset_builder. And we're going to look at a dataset called Fashion-MNIST.
And so the way things tend to work with Hugging Face is there's something called the Hugging
Face hub, which has models and it has datasets amongst other things. And generally you'll give
them a name and you can then say, in this case, load a dataset builder for Fashion-MNIST. Now a
dataset builder is just basically something which has some metadata about this dataset. So the
dataset builder has a .info and the .info has a .description. And here's a description of this.
And as you can see, again, we've got 28 by 28 grayscale. So it's going to be very familiar
to us because it's just like MNIST. And again, we've got 10 categories. And again, we've got
60,000 training examples. And again, we've got 10,000 test examples. So this is cool. So as it
says, it's a direct drop-in replacement for MNIST. And so the dataset builder also will tell
us what's in this dataset. And so Hugging Face stuff generally uses dictionaries rather
than tuples. So there's going to be an image of type Image, and there's going to be a label of
type ClassLabel. There's 10 classes and these are the names of the classes. So it's quite nice that
in Hugging Face datasets, you know, we can kind of get this information directly. It also tells us
if there are some recommended training and test splits, we can find those out as well. So this is the size
of the training split and the number of examples. So now that we're ready to start playing with
it, we can load the dataset. Okay, so this is the difference between load_dataset_builder() versus
load_dataset(). So this will actually download it, cache it, and here it is. And it creates a dataset
dictionary. So a dataset dictionary, if you've used fast.ai, is basically just like what we call
the datasets class. They call the DatasetDict class. So it's a dictionary that contains in
this case, a train and a test item, and those are datasets. These datasets are very much like the
datasets that we created in the previous notebook. So we can now grab the training and test items
from that dictionary and just pop them into variables. And so we can now have a look at the
0 index thing in training. And just like we were promised, it contains an image and a label.
So as you can see, we're not getting tuples anymore. We're getting dictionaries containing
the x and the y, in this case, image and label. So I'm going to get pretty bored writing image and
label in strings all the time. So I'm just going to store them as x and y. So x is going to be the
string ‘image’ and y will be the string ‘label’. I guess the other way I could have done that would have been to say x comma y equals
that. That would probably be a bit neater because it's coming straight from the
features. And if you iterate into a dictionary, you get back its keys. That's why that works.
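
(A tiny sketch of that unpacking. A plain dict stands in for the dataset's features object, and the values are placeholders; the only thing being shown is that iterating a dictionary yields its keys.)

```python
# Stand-in for the features: the keys are what matter, the values are placeholders
features = {'image': 'Image(...)', 'label': 'ClassLabel(...)'}

# Unpacking iterates the dictionary, and iterating a dictionary yields its keys
x, y = features

row = {'image': '<PIL image>', 'label': 9}   # shaped like one item of the dataset
img, lbl = row[x], row[y]
```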
So anyway, I've done it manually here, which is a bit sad, but there you go. Okay. So
we can now grab from train zero, which we've already seen. We can grab the x, i.e.
the image, and there it is. There's the image. We could grab the first five images
and the first five labels, for example. And there they are. Now we already
know what the names of the classes are. So we could now see what these map to by grabbing
those features. So there they are. So this is a special Hugging Face class, which most libraries
have something including fast.ai that works like this. There's something called int to string
{int2str}, which is going to take these and convert them to these. So if I call it on our y
batch, you'll see we've got, first is ‘ankle boot’ and there that is indeed an ankle boot. Now we're
going to have a couple of t-shirts and a dress. Okay. So how do we use this to train a model?
Well, we're going to need a DataLoader and we want a DataLoader that for now we're going to do just
like we've done it before. It's going to return, well, actually we're going to do something
a bit different. We're going to have, our collate function is actually going to return
a dictionary. Actually, this is pretty common for Hugging Face stuff. And PyTorch
doesn't mind if you, it's happy for you to return a dictionary from a collation
function. So rather than returning a tuple of the stacked up. Hopefully this looks very
familiar. This looks a lot like the thing that goes through the Dataset for each
one and stacks them up just like we did in the previous notebook. So that's what we're
doing. We're doing all in one step here in our collate function. And then again, exactly the
same thing. Go through our batch, grab the y and this is just stacking them up with the
integers so we don't have to call stack. And so we're now going to have the
image and label bits in our dictionary. So if we create our DataLoader
using that collation function, grab one batch. So we can go batch
x dot shape is a 16 by 1 by 28 by 28 and our y of the batch here, here it is. So the
thing to notice here is that we haven't done any transforms or anything or written our own
Dataset class or anything. We're actually putting all the work directly in the collation
functions. This is like a really nice way to skip all of the kind of abstractions
of your framework, if you want to, is you can just do all of your work in
collate functions. So it's going to pass you each item. So you're going to get the batch
directly. You just go through each item. And so here we're saying, okay, grab the x key
from that dictionary, convert it to a tensor and then do that for everything in the batch
and then stack them all together. So this is, yeah, this is like, can be quite a nice way to
do things if you want to do things just very manually without having to think too
much about, you know, a framework, particularly if you're doing really
custom stuff, this can be quite helpful. Having said that, Hugging Face datasets
absolutely lets you avoid doing everything in collate function, which, if we want
to create really simple applications, that's where we're going to eventually want
to head. So we can do this using a transform instead. And so the way we do that is we create
a function. You've got to take our batch. It's going to replace the x in our batch with the
tensor version of each of those PIL images. And I'm not even stacking them or anything.
And then we're going to return that batch. And so Hugging Face datasets has something
called with_transform(), and that's going to take your dataset, your Hugging Face
dataset, and it's going to apply this function to every element. And it doesn't run at all now,
it's going to basically, when, when it, behind the scenes, when it calls dunder getitem, it will
call this function on the fly. So, in other words, this could have data augmentation, which can
be random or whatever, because it's going to be rerun every time you grab an item, it's not cached
or anything like that. So other than that, this dataset has exactly the API, same API as any other
dataset. It has a length, it has a dunder getitem, so you can pass it to a DataLoader. And so PyTorch
already knows how to collate dictionaries of tensors. So we've got a dictionary of tensors
now. So that means we don't need a collate function anymore. I can create a DataLoader from
this without a collate function, as you can see. And so this is giving exactly
the same thing as before, but without having to create a custom collate
function. Now, even this is a bit more code than I want, having to return this seems a bit
silly. But the reason I had to do this is because Hugging Face datasets expects the with_transform
function to return the new version of the data. So I wanted to be able to write
it like this, transform in place, and just say the change I want to make and
have it automatically return that. So if I create this function, it's exactly the same
as the previous one, but doesn't have return. How would I turn this into something
which does return the result? So here's an interesting trick. We could take
that function, pass it to another function to create a new function, which is the, a version
of this inplace function that returns the result. And the way I do that is by creating a
function called inplace. It takes a function, it returns a function. The function it
returns is one that calls my original function and then returns the result. So this is the
function. This is a function generating function. And it's modifying an inplace function to become
a function that returns the new version of that data. And so this is a function. This function is
passed to this function, which returns a function. And here it is. So here's the version that Hugging
Face will be able to use. So I can now pass that to with_transform() and it
does exactly the same thing. So this is very, very common in Python. It's so
common that this line of code can be entirely removed and replaced with this little token. If
you have a function and put @ at the start, you can then put that before a function. And what it
says is take this whole function, pass it to this function and replace it with the result. So this
is exactly the same as the combination of this and this. And when we do it this way, this kind
of little syntax sugar is called a decorator. Okay. So there's nothing magic about decorators.
It's literally, literally identical to this. Oh, I guess the only difference is we don't end up with
this unnecessary intermediate underscore version, but the result is exactly the same. And therefore
I can create a transformed Dataset by using this. And there we go. It's all working fine. Yeah, so I mean, none of this is particularly
necessary, but what we're doing is we're just kind of like seeing, you know, the
pieces that we can, we can put in place to make this stuff as easy as possible and
that we don't have to think about things too much. All right. Now with all this, we can
basically make things pretty automatic. And the way we can make things pretty automatic is
we're going to use a cool thing in Python called itemgetter(). And itemgetter is
a function that returns a function. So hopefully you're getting used to this idea now.
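
(For reference, a small sketch of itemgetter from the standard library's operator module, including the point made just below that anything with a dunder getitem will do. The class name D is an assumption for the example.)

```python
from operator import itemgetter

ig = itemgetter('a', 'c')      # a function that fetches the 'a' and 'c' items

d = dict(a=1, b=2, c=3)
vals = ig(d)                   # a tuple of d['a'] and d['c']

class D:
    # Not a dict at all, but it answers square-bracket lookups
    def __getitem__(self, k):
        return 1 if k == 'a' else 2 if k == 'b' else 3

vals2 = ig(D())
```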
This creates a function that gets the a and c items from a dictionary or something that looks
like a dictionary. So here's a dictionary. It contains keys a, b, and c. So this function will
take a dictionary and return the a and c values. And as you can see, it has done exactly
that. I’ll explain why this is useful in a moment. I just wanted to briefly mention
what did I mean when I said something that looks like a dictionary? I mean, this is a
dictionary. Okay. That looks like a dictionary. But Python doesn't care about what type things
actually are. It only cares about what they look like. And remember that when we call something
with square brackets, when we index into something, behind the scenes it's just calling
dunder getitem. So we could create our own class. And its dunder getitem, gets the key. And it's
just going to manually return 1 if k equals a or 2 if k equals b or 3 otherwise. And look, that
class also works just fine with an itemgetter. The reason this is interesting
is because a lot of people write Python as if it's like C++ or Java or
something. They write as if it's this kind of statically typed thing. But I really wanted to
point out that it's an extremely dynamic language and there's a lot more flexibility than you might
have realized. Anyway, that's a little aside. So what we can do is think about a batch for
example where we've got these two dictionaries. Okay. So PyTorch comes with a default
collation function called, not surprisingly, default_collate. So that's part of PyTorch. And
what default_collate() does with dictionaries is it simply takes the matching keys and then
grabs their values and stacks them together. And so that's why if I call default_collate, a
is now 1, 3, b is now 2, 4. That's actually what happened before when we created this DataLoader is
it used the default collation function, which does that. It also works on things that are tuples, not
dictionaries, which is what most of you would have used before. And what we can do therefore is we
could create something called collate_dict(), which is something which
is going to take a Dataset and it's going to create an itemgetter function for
the features in that Dataset, which in this case is ‘image’ and ‘label’. So this is a function
which will get the ‘image’ and ‘label’ items. And so we're now going to return a function
and that function is simply going to call our itemgetter() on default_collate(). And
what this is going to do is it's going to take a dictionary and collate it into a tuple
just like we did up here. So if we run that, so we're now going to call DataLoader on our
transform dataset, passing in, and remember, this is a function that returns a function.
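
(A hedged sketch of the idea. One difference from what is described above, flagged as an assumption: this collate_dict takes the list of keys directly rather than reading them from a dataset's features, so that it runs on a plain list of dictionaries. The fake items stand in for the transformed dataset.)

```python
from operator import itemgetter
import torch
from torch.utils.data import DataLoader
from torch.utils.data.dataloader import default_collate

# default_collate on dictionaries: matching keys are gathered and stacked
batch = [dict(a=1, b=2), dict(a=3, b=4)]
collated = default_collate(batch)

def collate_dict(keys):
    get = itemgetter(*keys)
    # Collate into a dictionary first, then pull the keys out as a tuple
    def _f(b): return get(default_collate(b))
    return _f

# Fake items shaped like the transformed dataset: a 1x28x28 tensor and an int label
items = [{'image': torch.zeros(1, 28, 28), 'label': i % 10} for i in range(32)]
dl = DataLoader(items, batch_size=16, collate_fn=collate_dict(['image', 'label']))
xb, yb = next(iter(dl))        # a tuple now, not a dictionary
```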
So it's a collation function for this Dataset and there it is. So now this looks a lot like
what we had in our previous notebook. This is not returning a dictionary, but it's returning
a tuple. So this is a really important idea for, particularly, for working with Hugging Face
datasets is that they tend to do things with dictionaries and most other things in the PyTorch
world tend to work with tuples. So you can just use this now to convert anything that takes, that
returns dictionaries into something that provides tuples by passing it as a collation function
to your DataLoader. So remember, you know, the thing you want to be doing this week is doing
things like import pdb, pdb.set_trace(), right? Put breakpoints, step through, see exactly
what's happening, you know, not just here, but also even more importantly, doing it inside the
innermost, inner function. So then you can see, what did I do wrong there? Oh,
did I? Set underscore trace. So then we can see exactly
what's going on. Put out b. List the code. And I could step into it. And
look, I'm now inside the default_collate function, which is inside PyTorch. And so I
can now see exactly how that works. There it all is. So it's going to go
through and this code is going to look very familiar because we've implemented all this
ourselves. Because it's being careful to like it works for lots of different types of things,
dictionaries, NumPy arrays, so on and so forth. So the first thing I wanted to do, oh,
actually, something I do want to mention here, this is so useful, we want to be able
to use it in all of our notebooks. So rather than copying and pasting this every
time, it would be really nice to create a Python module that contains this definition. So we've
created a library called nbdev. It's really a whole system called nbdev, which does exactly
that. It creates modules you can use from your notebooks. And the way you do it is you use this
special thing we call comment directives, which is hash pipe. And then hash pipe export. So you put
this at the top of a cell and it says do something special for this cell. What this does is it says
put this into a Python module for me, please. Export it to a Python module. What Python
module is it going to put it in? Well, if you go all the way to the top, you tell it what
default export module to create. So it's going to create a module called datasets. So what I do at
the very end of this module is I've got this line that says import nbdev, nbdev.nbdev_export().
And what that's going to do for me is create a library, a Python library. It's going to have
a datasets.py in it. And we'll see everything that we exported. Here it is. collate_dict
will appear in this for me. And so what that means is now in the future, in my notebooks,
I will be able to import collate_dict from my datasets. Now you might wonder, well, how
does it know to call it miniai? What's miniai? Well, in nbdev, you create a settings.ini file
where you say what the name of your library is. So we're going to be using this quite a lot
now because we're getting to the point where we're starting to implement stuff that didn't
exist before. So previously most of the stuff, or pretty much all the stuff we've created, I've
said like, oh, that already exists in PyTorch. So we don't need it. We just use PyTorch’s. But
we're now getting to a point where we're starting to create stuff that doesn't exist anywhere. We've
created it ourselves. And so therefore we want to be able to use it again. So during the rest of
this course, we're going to be building together a library called miniai. That's going to be our
framework, our version of something like fastai. Maybe it's something like what fastai 3 will end
up being. We'll see. So that's what's going on here. So we're going to be using, once I
start using miniai, I'll show you exactly how to install this, but that's what this
export is. And so you might've noticed I also had an export on this in place thing. And
I also had it on my necessary import statements. Okay. We want to be able to see what this dataset
looks like. So I thought it now is a good time to talk a bit about plotting because knowing how
to visualize things well is really important. And again, the idea is we, we're not allowed
to use fastai's plotting library. So we've got to learn how to do everything ourselves. So
here's the basic way to plot an image using matplotlib. So we can create a batch, grab the
x part of it, grab the very first thing in that. And imshow() means show an image. And
here it is. There's our ankle boot. So let's start to think about what stuff we
might create, which we can export to make this a bit easier. So let's create something
called show_image(), which basically does imshow(), but we're going
to do a few extra things. We will make sure that it's in the correct
axis order. We will make sure it's not on CUDA, that it's on the CPU. If it's not a NumPy
array, we'll convert it to a NumPy array. We'll be able to pass in an existing axis,
which we'll talk about soon. If we want to, we'll be able to set a title if we want to.
And also, this thing here removes all this ugly 05 blah blah blah axis because we're
showing an image. We don't want any of that. So if we try that, you can see, there we go. We've
also been able to say what size we want the image. There it all is. Now here's something
interesting. When I say help, the help shows the things that I implemented,
but it also shows a whole lot more things. How did that magic thing happen? And you
can see they work because here's figsize, which I didn't add. Oh, sorry. I did add.
Well, okay. That's a bad example. Anyway, these other ones all work as well. So how did
that happen? Well, the trick is that I added **kwargs here and **kwargs says, grab, you
can pass it as many or any other arguments as you like that aren't listed. And they'll
all be put into a dictionary with this name. And then, when I call imshow() I pass that entire
dictionary ** here means “as separate arguments”. And that's how come it works. And then
how come does it know, how come it knows what help to provide? The reason why is that
fastcore has a special thing called delegates, which is a decorator. So now you know
what a decorator is and you tell it, what is it that you're going to be passing kwargs
to? I'm going to be passing it to imshow(), and then it automatically creates the documentation
correctly to show you what kwargs can do. So this is a really helpful way of being able to
kind of extend existing functions like imshow and still get all of their functionality and
all of their documentation and add your own. So delegates is one of the most useful
things we have in fastcore, in my opinion. So we're going to export that. So now we can use
show_image() anytime we want, which is nice. Something that's really helpful to
know about matplotlib is how to create subplots. So for example, what happens if you
want to plot two images next to each other? So in matplotlib subplots creates multiple
plots and you pass it number of rows and the number of columns. So this here has,
as you see, one row and two columns. And it returns axes. Now what it calls axes
is what it refers to as the individual plots. So if we now call show_image() on
the first image, passing in axs[0], it's going to get that here, right? Then we
call ax.imshow(). That means put the image on this subplot. They don't call it a subplot,
unfortunately, they call it an axis, put it on this axis. So that's how come we're able to
show an image, one image on the first axis, and then show a second image on the second axis by
which we mean subplot. And there's our two images. So that's pretty handy. So I've decided to add
some additional functionality to subplots. So therefore I use delegates on subplots() because
I'm adding functionality to it. And I'm going to be taking kwargs and passing it through to
subplots(). And the main thing I wanted to do is to automatically create an appropriate figure
size by just finding out, you tell us what image size you want. And I also want to be able to
add a title for the whole set of subplots. And so there it is. And then I also want
to show you that it'll automatically, if we want to, create documentation for us
as well, for our library. And here is the documentation. So as you can see here, for the
stuff I've added, it's telling me exactly what each of these parameters are, their type,
their defaults, and information about each one. And that information is automatically coming from
these little comments. We call these docments. This is all automatic stuff done by fastcore and
nbdev. And so you might've noticed when you look at fastai library documentation, it always has
all this info. So that's why. You don't actually have to call show_doc(), it automatically added to
your documentation for you. I'm just showing you here what it's going to end up looking like. And
you can see that it's worked with delegates. It's put all the extra stuff from delegates in here
as well. And here they are all listed out here as well. So anyway, subplots. So let's create
a 3 by 3 set of plots and we'll grab the first eight images. And so now we can go through each
of the subplots. Now it returns it as a 3 by 3, basically a list of 3 lists of 3 items. So
I flattened them all out into a single list. So we'll go through each of those subplots and
go through each image and show each image on each axis. And so here's a quick way to quickly
show them all. As you can see, it's a little bit ugly here, so we'll keep on adding more useful
plotting functionality. So here's something that, again, calls our subplots() and delegates to it.
But we're going to be able to say, for example, how many subplots do we want? And it'll
automatically calculate the rows and the columns. And it's going to remove the axes for any ones
that we're not actually using. And so here we got that. So that's what get_grid()'s going to let us
do. So we're getting quite close. And so, finally, why don't we just create a single thing called
show_images() that's going to get our grid. And it's going to go through our images optionally
with a list of titles and show each one. And we can use that here. You can see we have
successfully got all of our labeled images. And so we, yeah, I think all this stuff for the
plotting is pretty useful. So as you might've noticed, they were all exported. So in our
datasets.py, we've got our get_grid(), we've got our subplots, we've got our show_images().
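As a rough illustration of the row-and-column arithmetic that get_grid() is described as doing, here is a sketch in plain Python. The names grid_shape and unused_cells are hypothetical and the rounding rules are assumptions; this is not the miniai implementation, which also creates the matplotlib figure and axes.

```python
import math

def grid_shape(n, nrows=None, ncols=None):
    """Pick a (rows, cols) layout with room for n plots (assumed rounding rules)."""
    if nrows is None and ncols is None:
        nrows = max(1, int(math.sqrt(n)))  # start from a roughly square grid
    if nrows is None:
        nrows = math.ceil(n / ncols)
    if ncols is None:
        ncols = math.ceil(n / nrows)
    return nrows, ncols

def unused_cells(n, nrows, ncols):
    """Flat indices of grid cells that hold no image, whose axes could be hidden."""
    return list(range(n, nrows * ncols))

rows, cols = grid_shape(8, nrows=3)
# A nested rows-by-cols grid, flattened into one list as in the lesson.
grid = [[(r, c) for c in range(cols)] for r in range(rows)]
flat = [cell for row in grid for cell in row]
print(rows, cols, unused_cells(8, rows, cols), len(flat))
```

With eight images on a 3 by 3 grid, only the last cell is left over, which is the one whose axis would be removed.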
So that's going to make life easier for us now, since we have to create everything from
scratch, we have created all of those things. So as I mentioned at the very end,
we have this one line of code to run. And so just to show you, if I remove miniai/datasets.py, so
it's all empty. And then I run this line of code. And now it's back, as you can see, and it
tells you it's auto generated. All right. So we are nearly at the point where we can build
our learner. And once we've built our learner, we're going to be able to really dive deep into
training and studying models. So we've kind of got, nearly got all of our infrastructure in
place. Before we do, there are some pieces of Python, which not everybody knows, and some
computer science concepts that I want to talk about.
So that's what 06_foundations is about. So this whole section is
just going to talk about some stuff in Python that you may not have come across before. Or maybe
it's a review for some of you as well. And it's all stuff we're going to be using basically in the
next notebook. So that's why I wanted to cover it. So we're going to be creating a learner class.
So a learner class is going to be a very general purpose training loop, which we can get to do
anything that we want it to do. And we're going to be creating things called callbacks to make
that happen. And so therefore we're going to just spend a few moments talking about what are
callbacks, how are they used in computer science, how are they implemented, look at some examples.
They come up a lot. The most common place that you see callbacks in software is for GUI
events. So for events from some graphical user interface. So the main graphical user interface
library in Jupyter Notebooks is called ipywidgets. And we can create a widget like a button, like
so. And when we display it, it shows me a button. And at the moment it doesn't
do anything if I click on it. What we can do though, is we can
add an on_click() callback to it, which is to say,
we're going to pass it a function, which is called when you click it. So let's
define that function. So I'm going to say w.on_click(f) is going to assign the f function
to the on click callback. Now if I click this, there you go, it's doing it. Now what does that
mean? Well, a callback is simply a callable that you've provided. So remember a callable is a more
general version of a function. So in this case, it is a function that you've provided that will
be called back to when something happens. So in this case, the something that's happening is
that they're clicking a button. So this is how we are defining and using a callback as a GUI event.
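ipywidgets needs a running notebook front end, so here is a sketch of the same mechanism in plain Python. FakeButton is a hypothetical stand-in written for this note, not the ipywidgets API; the only thing taken from the lesson is the idea of registering a function with on_click() and having it called back on a click.

```python
class FakeButton:
    """Hypothetical stand-in for a GUI button that stores click callbacks."""

    def __init__(self, description):
        self.description = description
        self._click_handlers = []

    def on_click(self, handler):
        # Register a callable to be called back when the button is clicked.
        self._click_handlers.append(handler)

    def click(self):
        # Simulate a user click: call every registered callback with the button.
        for handler in self._click_handlers:
            handler(self)

clicks = []

def f(button):
    clicks.append(button.description)
    print("hi")

w = FakeButton("Click me")
w.click()      # no callback registered yet, so nothing happens
w.on_click(f)  # register f as the on-click callback
w.click()      # now f is called back
```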
So basically everything in ipywidgets, if you want to create your own graphical user interfaces
for Jupyter, you can do it with ipywidgets and by using these callbacks. So these particular
kinds of callbacks are called events, but it's just a callback. All right, so that's somebody
else's callback. Let's create our own callback. So let's say we've got some very slow calculation.
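A sketch of what such a slow calculation might look like. The delay parameter is an assumption added here so the pause can be shortened; the lesson's version simply sleeps for a second after each step.

```python
from time import sleep

def slow_calculation(delay=1.0):
    """Add up the squares of 0..4, pausing after each step to mimic slow work."""
    res = 0
    for i in range(5):
        res += i * i
        sleep(delay)
    return res

print(slow_calculation(delay=0.01))  # short delay so the sketch runs quickly
```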
And so it takes a very long time to add up the numbers zero to four squared because
we sleep for a second after each one. So let's run our slow calculations. Still
running. Oh, how's it going? Come on, finish our calculation. There we go. The answer
is 30. Now, a slow calculation like that is a bit like training a model, which is also a slow calculation.
It would be nice to do things like, I don't know, print out the loss from time to time
or show a progress bar or whatever. So generally for those kinds of things, we would
like to define a callback that is called at the end of each epoch or batch or every few seconds or
something like that. So here's how we can modify our slow calculation routine such that you can
optionally pass it a callback. And so all of this code is the same, except we've added this
one line of code that says, if there's a callback, then call it and pass in where we're up to. So
then we could create our callback function. So just like we created a callback
function f(), let's create a show_progress() callback function. That's going to tell us how far
we've got. So now if we call slow_calculation() passing in our callback, you can see it's going
to call this function at the end of each step. So here we've created our own callback. So
there's nothing special about a callback. It doesn't require its own like syntax. It's not
a new concept. It's just an idea, really, which is the idea of passing in a function, which some
other function will call at particular times, such as at the end of a step or such as when you click
a button. So that's what we mean by callbacks. We don't have to define
the function ahead of time. We could define the function at
the same time that we call the slow calculation by using lambda. So as we've
discussed before, lambda just defines a function, but it doesn't give it a name. So here's a
function that takes one parameter and prints out exactly the same thing as before. So here's
the same way of doing it, but using a lambda. We could make it more sophisticated
now. And rather than always saying, “Awesome! We finished epoch…”, whatever, we
could let you pass in an exclamation and we print that out. And so in this case, we
could now have our lambda call that function. And so one of the things that
we can do now is to, again, we can create a function that returns a function. And so we could create a make_show_progress
function where you pass in the exclamation. We could then create, and there's no need to give
it a name actually, let's just return it directly. We can return a function that
calls show_progress() with that exclamation. So here we are passing in "Nice". And that's exactly the same
as doing something like what we've done before. We could say, instead of using a lambda,
we can create an inner function like this. So here's now a function that returns a
function. This does exactly the same thing. Okay. So one way with the lambda,
one way without a lambda. One of the reasons I wanted to show you
that is that we can do exactly the
same thing using partial. So with partial, it's going to do exactly the same thing
as this kind of make_show_progress(). It's going to call show_progress() and
pass "OK I guess". So this is again, an example of a function returning a function.
And so this is a function that calls show progress passing in this as the first parameter.
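The variations discussed here can be put side by side in one sketch. The function names follow the ones mentioned in the lesson, but the bodies are reconstructed for illustration, and the delay parameter (shortened from the lesson's one-second sleep) is an assumption.

```python
from functools import partial
from time import sleep

def slow_calculation(cb=None, delay=0.01):
    """Sum the squares of 0..4, calling cb(step) after each step if cb is given."""
    res = 0
    for i in range(5):
        res += i * i
        sleep(delay)  # the lesson sleeps for a full second here
        if cb:
            cb(i)
    return res

def show_progress(exclamation, epoch):
    print(f"{exclamation}! We've finished epoch {epoch}!")

def make_show_progress(exclamation):
    # A function that returns a function: _inner remembers `exclamation`.
    def _inner(epoch):
        show_progress(exclamation, epoch)
    return _inner

# Three ways to build a one-argument callback from the two-argument show_progress:
slow_calculation(lambda epoch: show_progress("Awesome", epoch))  # lambda
slow_calculation(make_show_progress("Nice"))                     # closure
slow_calculation(partial(show_progress, "OK I guess"))           # functools.partial
```

In each case slow_calculation() only ever sees a callable that takes the step number; how the exclamation got baked in is invisible to it.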
And again, it does exactly the same thing. Okay. So we tend to use partial a lot. So that's certainly something worth spending
time practicing. Now as we've discussed, Python doesn't care about types in particular.
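Since nothing checks the type of the callback, any object that can be called will do. Here is a minimal sketch of a class-based callback; the class name, the message() helper, and the shortened delay are assumptions made for illustration rather than the lesson's exact code.

```python
from time import sleep

def slow_calculation(cb=None, delay=0.01):
    """Sum the squares of 0..4, calling cb(step) after each step if cb is given."""
    res = 0
    for i in range(5):
        res += i * i
        sleep(delay)
        if cb:
            cb(i)
    return res

class ProgressShowingCallback:
    def __init__(self, exclamation="Awesome"):
        # State lives on the instance instead of in a closure or a partial.
        self.exclamation = exclamation

    def message(self, epoch):
        return f"{self.exclamation}! We've finished epoch {epoch}!"

    def __call__(self, epoch):
        # Defining __call__ makes instances callable, just like a function.
        print(self.message(epoch))

cb = ProgressShowingCallback("Just super")
slow_calculation(cb)
```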
And there's nothing about any of this that requires cb to be a function. It just has to
be a callable. A callable is something that you can call. And so as we've discussed, another way
of creating a callable is defining dunder call. So here's a class and this is going to work
exactly the same as our make_show_progress() thing, but now as a class. So there's a dunder init,
which stores the exclamation and a dunder call, which prints. And so now we're creating an object
which is callable and does exactly the same thing. Okay. So these are all fundamental ideas that
I want you to get really comfortable with. The idea of dunder call, dunder things in general,
partials, classes, because they come up all the time in PyTorch code and in the code we'll be
writing and, in fact, pretty much all frameworks. So it's really important to feel comfortable
with them. And remember you don't have to rely on the resources we're providing. If there are
certain things here that are very new to you, Google around for some tutorials or ask for
help on the forums, finding things and so forth. And then I'm just going to briefly
go back over something I've mentioned before, which is *args and **kwargs,
because again, they come up a lot. I just wanted to show you how they work. So if
we create a function that has *args and **kwargs, nothing else, and I'm just going to
have this function just print them. Now I'm going to call the function. I'm going
to pass 3. I'm going to pass "a" and I'm going to pass thing1="hello". Now these first two are passed, what
we would say, by position. We haven't got a blah equals. They're just stuck there. Things that are
passed by position are placed in *args, if you have one, it doesn't have to be called args. You
can call this anything you like, but in the star bit. And so you can see here that args is a tuple
containing the positionally passed arguments. And then kwargs is a dictionary containing the named
arguments. So that is all that *args and **kwargs do. And as I say, there's nothing special about
these names. I'll call this a, I'll call this b. Okay. And it'll do exactly the same
thing. Okay. So this comes up a lot. And so it's important to remember that this
is literally all that they're doing. And then, on the other hand, let's say we had
a function g() which takes a, b, and c, and
we'll just print them directly: a, b, c. Okay. We can also, rather than just using them as
parameters, we can also use them when calling something. So let's say I create something called
args, again, it doesn't have to be called args, which contains [1, 2]. And I create
something called kwargs that contains a dictionary containing {'c': 3}. I can then call g()
and I can pass in *args comma **kwargs. And that's going to take this 1, 2,
and pass them as individual arguments, positionally. And it's going to take the {'c':
3} and pass that as a named argument, c equals 3. And there it is. Okay. So there are two
linked but different ways that use * and **. Okay. Now here's a slightly different way
of doing callbacks, which I really like. In this case, I'm now passing in
a callback that's not callable, but instead it's going to have a method called
before_calc and another method called after_calc. So now my callback is going to be a class
containing a before_calc and an after_calc method. And so if I run that, you can see, there
it goes. Okay. And so this is printing before and after every step by calling before_calc() and
after_calc(). So callback actually doesn't have to be a callable. It doesn't have to be a function. A
callback could be something that contains methods. So we could have a version of this,
which actually, as you can see here, it's going to pass in to after_calc(), both
the epoch number and the value it's up to, but by using *args and **kwargs, I can just
safely ignore them if I don't want them. Right. So it's just going to chew them up and
not complain. If I didn't have those here, it wouldn't work. See, because
it got passed in val equals and there's nothing here looking for
val equals. And it doesn't like that. So this is one good use of *args and **kwargs
is to eat up arguments you don't want. Or we could use the arguments. So let's
actually use epoch and val and print them out. And there it is. So this is a more sophisticated
callback that's giving us status as we go. Skip this bit because we don't really care about
that. Okay. So finally, let's just review this idea of dunder, which we've mentioned before,
but just to really nail this home, anything that looks like this, underscore underscore something
underscore underscore, is special. And basically it could be that Python has defined that
special thing or PyTorch has defined that special thing or NumPy has defined that special thing, but
they're special. These are called dunder methods. And some of them are defined as part of the
Python data model. And so if you go to the Python documentation, it'll tell you about these various
different— here's __repr__, which we used earlier. Here's __init__ that we used earlier. So
they're all here. PyTorch has some of its own, NumPy has some of its own. So for example, if
Python sees plus (+), what it actually does is it calls dunder add. So if we want to create
something that's not very good at adding things, it actually always adds 0.01 to it. Then I can say SloppyAdder(1) + SloppyAdder(2)
equals 3.01. So “+” here is actually calling dunder add. So if you're not familiar with
these, click on this data model link and read about these specific one, two, three, four,
five, six, seven, eight, nine, ten, eleven methods, because we'll be using all of these
in the course. So I'll try to revise them when we can, but I'm generally going to assume that
you know these. A particularly interesting one is getattr. We've seen setattr already. getattr
is just the opposite. Take a look at this. Here's a class. It just contains two attributes,
a and b, that are set to 1 and 2. So I'll create an object of that class, a, and a.b equals 2, because I
set b to 2. Okay. Now when you say a.b, that's just syntax sugar basically, in Python. What it's
actually calling behind the scenes is getattr. It calls getattr on the object. And so this
one here is the same as getattr(a, 'b').
And this can kind of be fun because you could call getattr on a, and then pass
either 'b' or 'a' randomly. How's that for crazy? So if I run this, 2, 1, 1, 1, 2, as you can see,
it's random. So yeah, Python is such a dynamic language. You can even set it up so you literally
don't know what attributes are going to be called. Now getattr, behind the scenes, can end up
calling something called dunder getattr. By default, the ordinary lookup comes from the object base
class, and dunder getattr is an optional fallback. So here's something just like a, it's got a and b defined, but I've also got dunder getattr
defined. And so dunder getattr, it's only called for stuff that hasn't been defined yet, and it'll
pass in the key or the name of the attribute. So generally speaking, if the first character
is an underscore, it's going to be private or special. So I'm just going to raise an
AttributeError. Otherwise I'm going to steal it and return f'Hello from {k}'. So if I
go b.a, that's defined. So it gives me 1. If I go b.foo, that's not defined. So it calls dunder getattr and
I get back hello from foo. And so, this gets used a lot in both fastai code and also Hugging Face
code to often make it more convenient to access things. So that's, yeah, that's how the getattr
function and the dunder getattr method work. Okay. So I went over that pretty quickly. Since
I know for quite a few folks, this will be all review, but I know for folks who haven't seen
any of this, this is a lot to cover. So I'm hoping that you'll kind of go back over this,
revise it slowly, experiment with it and look up some additional resources and ask on the forum
and stuff for anything that's not clear. Remember, everybody has parts of the course that are really
easy for them and parts of the course that are completely unfamiliar for them. And so
if this particular part of the course is completely unfamiliar to you, it's not because
this is harder or going to be more difficult or whatever. It just so happens that this is
a bit that you're less familiar with, or maybe the stuff about calculus in the last lesson was
a bit that you're less familiar with. There isn't really anything particularly in the course that's
more difficult than other parts. It's just, you know, based on whether you happen to have
that background. And so, yeah, if you spend a few hours studying and practicing, you know,
you'll be able to pick up these things. And yeah, so don't stress if there are things that you don't
get right away. Just take the time. And if you, yeah, if you do get lost, please ask because
people are very keen to help. If you've tried asking on the forum, hopefully you've
noticed that people are really keen to help. All right. So, I think this has been a pretty
successful lesson. We've got to a point where we've got a pretty nicely optimized training
loop. We understand exactly what DataLoaders and Datasets do. We've got an optimizer. We've
been playing with Hugging Face datasets. And we've got those working really smoothly. So we
really feel like we're in a pretty good position to write our generic learner training loop and
then we can start building and experimenting with lots of models. So I look forward to seeing you
next time and doing that together. Okay. Bye.