Lesson 14: Deep Learning Foundations to Stable Diffusion ...

Okay, hi everybody and welcome to Lesson 14.

The numbers are getting up pretty high now, huh? We had a lesson last time talking about

calculus and how we implement the chain rule in neural network training in an efficient way

called backpropagation. I just wanted to point out that one excellent student, Kaushik Sinha,

has produced a very nice explanation of the code that we looked at last time and I've linked

to it. So it's got the math and then the code. The code's slightly different to what I had, but

it's basically the same thing, some minor changes. And it might be helpful to kind of link between

the math and the code to see what's going on. So you'll find that in the Lesson 13

resources. But I thought I'd just quickly try to explain it as well. So maybe I could try to

copy this and just explain what's going on here with this code. So the basic idea is that we

have a neural network that is calculating, well, a neural network and a loss function that

together they calculate a loss. So let's imagine, let’s just call the loss

function, we'll call it L. And the loss function is being applied to the

output of the neural network. So the neural network function we'll call n. And that takes two

things, a bunch of weights and a bunch of inputs. The loss function also requires the targets, but

I'm just going to ignore that for now because it's not really part of what we actually care

about. And what we're interested in knowing is if we want to be able to update the weights,

let's say this is just a single layer, to keep things simple. If we want to be able to update

the weights, we need to know how does the loss change if we change the weights, if we

change one weight at a time, if you like. So how would we calculate that? Well, what we

could do is we could rewrite our loss function by saying, well, let's call capital N the result

of the neural network applied to the weights and the inputs. And that way we can now

rewrite the loss function to say L equals, big L equals, little l, the loss function

applied to the output of the neural network. And so maybe you can see where this is going.

We can now say, okay, the derivative of the loss with respect to the weights is going

to be equal to the derivative of the loss with respect to the outputs

of that neural network layer times, this is the chain rule, the derivative

of the outputs of that neural network layer. I'm going to get my notation consistent since

these are not scalar with respect to the weights. So you can see we can get rid of those and we

end up with the change in loss with respect to the weights. And so we can just say this is

a chain rule. This is what the chain rule is. So the change in the loss with respect

to the output of the neural network, well, we did the forward pass here and

then we took here, this here is where we calculated the derivative of the loss with

respect to the output of the neural network, which came out from here and ended up in diff. So

there it is. So out.g contains this derivative. So then to calculate, let's actually do one

more. We could also say the change in the loss with respect to the inputs, we can do

the same thing with the chain rule times… And so this time we have the

inputs. So here you can see that is this line of code. So that is the change

in the loss with respect to the inputs. That's what input.g means. And it's equal to the

change in the loss with respect to the output. So that's what out.g means. Times… It's

actually matrix times, because we're doing matrix calculus, times this derivative, and

since this is a linear layer we were looking at, this derivative is simply the weights themselves.
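Written out, the chain-rule step described so far looks roughly like this. It is a sketch that assumes a plain linear layer of the form N = xW + b with one input per row; the symbols x, W and b are labels picked here for illustration, and the comments only restate what the transcript says about out.g and input.g.

```latex
% Loss as a composition of the loss function and the network
L = \ell(N), \qquad N = n(w, x)

% Chain rule with respect to the weights and with respect to the inputs
\frac{\partial L}{\partial w} = \frac{\partial L}{\partial N}\,\frac{\partial N}{\partial w},
\qquad
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial N}\,\frac{\partial N}{\partial x}

% Assumed linear layer N = xW + b: the derivative with respect to x is given by the weights,
% so in matrix form the gradient stored on the input is out.g times the transposed weights
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial N}\, W^{\top}
```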

And then we have exactly the same thing for w.g, which is the change in the loss,

the derivative of the loss with respect to the weights. And so again, you've got the same

thing. You've got your out.g, and remember we actually showed how we can simplify this into

also a matrix product with a transpose as well. So that's how what's happening in

our code is mapping to the math. So hopefully that's useful, but as I say, do check

out this really nice resource, which has a lot more detail if you're interested in digging

deeper. The other thing I'd say is that some people have mentioned that they actually

didn't study this at high school, which is fine. We've provided resources on the forum

recommending how to learn the basics of derivatives and the chain rule. And so in

particular, I would recommend 3Blue1Brown's essence of calculus series and also Khan

Academy. It's not particularly difficult to learn. It'll only take you a few hours and

then this will make a lot more sense. Or if you did it at high school,

but you've forgotten it, same deal. So don't worry if you found this difficult because

you had forgotten, or had never learned, the basic derivative and chain rule stuff. That's

something that you can pick up now and I would recommend doing so. Okay. So what we then did last

time, which is actually pretty exciting, is we got to a point where we had successfully created

a training loop, which did these four steps. And the nice thing is that every

single thing here is something that we have implemented from scratch. Now, we didn't

always use our implemented from scratch versions. There's no particular reason to, when we've

re-implemented something that already exists, let's use the version that exists. But every

single thing here, well, I guess not argmax, but that's trivially easy to implement. Every

single thing here, we have implemented ourselves and we successfully trained an MNIST model to

recognize handwritten digits with 96% accuracy. So I think that's super neat. I mean,

this is not a great metric. It's only looking at the training set, in

particular it's only looking at one batch of the training set. Since last time, I've just

refactored a little bit. I've pulled out this report function, which is now just running at

the end of each epoch. And it's just printing out the loss and the accuracy. Just something

I wanted to mention here is hopefully you've seen f-strings before. They're a really helpful

part of Python that lets you pop a variable or an expression inside curly braces in a string and

it'll evaluate it. You might not have seen this colon thing. This is called a format specifier.
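As a small illustration of f-strings and the colon format specifier, here is a sketch with made-up loss and accuracy values; the variable names are only for this example and are not taken from the notebook's report function.

```python
loss, acc = 0.123456, 0.9612  # made-up values for illustration

# Plain f-string: the expression inside the braces is evaluated and inserted.
print(f"loss is {loss}")

# ':.2f' is a format specifier: fixed-point notation with two decimal places.
report_line = f"{loss:.2f}, {acc:.2f}"
print(report_line)
```

The part after the colon only changes how the value is displayed; the variable itself is untouched.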

And with a format specifier, you can change how things are printed in an f-string. So this is

how I'm printing it to two decimal places. This says a two decimal places floating point number

called loss printed out here, followed by a comma. So I'm not going to show you how to use those

other than to say, yeah, Python f-strings and format specifiers are really helpful. And so if

you haven't used them before, do go look them up, in a tutorial or the documentation, because they're

definitely something that you'll probably find useful to know about. Okay. So let's

just rerun all those lines of code. If you're wondering how I just reran

all the cells above where I was, there's a cell here. There's Run All Above.

And it's so helpful that I always make sure there's a keyboard shortcut for that. So you

can see here, I've added a keyboard shortcut QA. So if I type QA, it runs all cells above. If

I type QB, it runs all cells below. And so yeah, stuff that you do a lot, make sure you've got

keyboard shortcuts for them. You don't want to be fiddling around, moving around your mouse

everywhere. You want it to be as easy as thinking. So this is really exciting. We've successfully

built and trained a neural network model from scratch and it works okay. It's a bit clunky.

There's a lot of code. There's features we're missing. So let's start refactoring it. And

so refactoring is all about making it so we have to write less code to do the same work.

And so we're now going to, I'm going to show you something that's part of PyTorch and

then I'm going to show you how to build it. And then you'll see why this is really useful. So

PyTorch has a sub module called nn, torch.nn. And in there, there's something called the Module

class. Now we don't normally use it this way, but I just want to show you how it works. We

can create an instance of it in the usual way where we create instances of classes, and then we

can assign things to attributes of that module. So for example, let's assign a linear

layer to it. And if we now print out that, you'll see it says, oh, this is a

module containing something called foo, which is a linear layer. But here's something

quite tricky. This module, we can say, show me all of the named children

of that module. And it says, oh, there's one called foo and it's a linear layer.
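The experiment being described can be sketched like this. The layer sizes (3 inputs, 4 outputs) are inferred from the four-by-three weight tensor and four-long bias vector mentioned in the lesson, and the variable names other than m1 and foo are chosen here for illustration.

```python
import torch.nn as nn

m1 = nn.Module()          # a bare Module instance (not how it is normally used)
m1.foo = nn.Linear(3, 4)  # assigning a layer as an attribute registers it

# named_children() returns a generator, so wrap it in list() to see the contents.
children = list(m1.named_children())
child_names = [name for name, _ in children]

# parameters() is also a generator; collect the shape of each parameter tensor.
param_shapes = [tuple(p.shape) for p in m1.parameters()]

print(m1)
print(child_names)
print(param_shapes)
```

Nothing was told to the module explicitly about foo; the attribute assignment alone was enough for it to show up as a child and for its weights and biases to show up as parameters.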

And we can say, oh, show me all of the parameters of this module. And it says, oh, okay,

sure. There's two of them. There's this four by three tensor, that's the weights. And

there's this four long vector, that's the biases. And so somehow just by creating this module

and assigning this to it, it's automatically tracked what's in this module and what are its

parameters. That's pretty neat. So we're going to see both how and why it does that. I'm just going

to point out, by the way, why did I add list here? If I just said m1.named_children(), it just

prints out generator object, which is not very helpful. And that's because this is a kind of

iterator called a generator. And it's something which is going to only produce the contents

of this when I actually do something with it, such as list them out. So just popping a list

around a generator is one way to run the generator and get its output. So that's a little trick

when you want to look inside a generator. Okay. So now, as I said, we don't normally use

it this way. What we normally do is we create our own class. So for example, we'll create

our own multi-layer perceptron and we inherit it. We inherit from nn.Module. And so then in

dunder init, this is the thing that constructs an object of the class. This is the special magic

method that does that. We'll say, okay, well, how many inputs are there to this multi-layer

perceptron? How many hidden activations and how many output activations are there? So it'd just

be one hidden layer. And then here we can do just like we did up here, where we assigned things as

attributes, we can do that in this constructor. So we'll create an l1 attribute, which is a

linear layer from number in to number hidden. l2 is a linear layer from number hidden to

number out, and we'll also create a ReLU. And so, when we call that module, we

can take the input that we get and run the linear layer and then run the ReLU and

then run the l2. And so I can create one of these, as you see, and I can have a look and see like,

oh, here's the attribute l1. And there it is, like I had, and I can say, print out the model and

the model knows all the stuff that's in it. And I can go through each of the named children and

print out the name and the layer. Now, of course, if you remember, although you can use dunder call,

we actually showed how we can refactor things using forward such that it would automatically

kind of do the things necessary to make all the automatic gradient stuff work correctly. And

so in practice, we're actually not going to do dunder call, we would do forward. So this is an

example of creating a custom PyTorch module. And the key thing to recognize is that it knows

what are all the attributes you added to it. And it also knows what are all the parameters. So

if I go through the parameters and print out their shapes, you can see I've got my

first linear layer's weights, my

first linear layer's biases, second linear layer's weights, second linear layer's biases. And this 50

is because we set nh, the number of hidden, to 50. So why is that interesting? Well, because

now I don't have to write all this anymore going through layers and having to make

sure that they've all been put into a list. We've just been able to add them as

attributes and they're automatically going to appear as parameters. So we

can just say, go through each parameter and update it based on the gradient

and the learning rate. And furthermore, you can actually just go model.zero_grad()

and it'll zero out all of the gradients. So that's really made our code quite a lot nicer

and quite a lot more flexible, which is cool. So let's check that this still works. There we go. So just to clarify, if I called

report() on this before I ran it, as you would expect, the accuracy is about 8%, well, about

10%, a bit less, and the loss is pretty high. And so after I run this fit(), this model,

the accuracy goes up and the loss goes down. So basically it's all of this

exactly the same as before. The only thing I've changed are these two lines

of code. So that's a really useful refactoring. So how on earth did this happen? How did it know

what the parameters and layers are automatically? It used a trick called dunder setattr, and

we're going to create our own nn.Module now. So if there was no such thing as

nn.Module, here's how we'd build it. And so let's actually build it and also add some

things to it. So in dunder init, we would have to create a dictionary for our named children. This

is going to contain a dictionary of all of the layers. Okay. So just like before,

we'll create a couple of linear layers, right? And then what we're going to do is we're

going to define this special magic thing that Python has called dunder setattr. And this is

called automatically by Python, if you have it, every time you set an attribute such as here or

here. And it's going to be passed the name of the attribute, the key, and the value is the actual

thing on the right hand side of the equal sign. Now, generally speaking, things that start with an

underscore we use for private stuff. So we check that it doesn't start with an underscore. And if

it doesn't start with an underscore, setattr will put this value into the modules dictionary

with this key and then call the normal Python setattr to make sure it

just actually does the attribute setting. So super() is how you call whatever

is in the super class, the base class. So another useful thing to know about is how does

it do this nifty thing where you can just type the name and it kind of lists out all this information

about it. That's a special thing called dunder repr. So here dunder repr will just have it return

a stringified version of the modules dictionary. And then here we've got parameters(). How did

parameters work? So how did this thing work? Well, we can go through each of those modules,

go through each value. So the values of the modules is all the actual layers and then go

through each of the parameters in each module and yield p. So that's going to create an

iterator, if you remember when we looked at iterators for all the parameters. So let's

try it. So we can create one of these modules and if we just like before loop

through its parameters, there they are. Now I'll just mention something that's optional,

kind of like advanced Python that a lot of people don't know about, which is there's no need to

loop through a list or a generator or I guess say loop through an iterator and yield. There's

actually a shortcut, which is you can just say: yield from and then give it the iterator.
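Putting the pieces described so far together, a from-scratch module might look roughly like the sketch below. TinyLayer is a hypothetical stand-in for a real layer so the example runs without PyTorch, and parameters_short is a name invented here just to show the yield from variant next to the explicit loop.

```python
class TinyLayer:
    """Hypothetical stand-in for a layer; its 'parameters' are just labelled strings."""
    def __init__(self, name):
        self.name = name

    def parameters(self):
        return iter([f"{self.name}.weight", f"{self.name}.bias"])

    def __repr__(self):
        return f"TinyLayer({self.name!r})"


class MyModule:
    def __init__(self):
        self._modules = {}         # leading underscore, so __setattr__ does not register it
        self.l1 = TinyLayer("l1")  # each assignment goes through __setattr__ below
        self.l2 = TinyLayer("l2")

    def __setattr__(self, k, v):
        # Register anything that is not private, then do the normal attribute setting.
        if not k.startswith("_"):
            self._modules[k] = v
        super().__setattr__(k, v)

    def __repr__(self):
        return f"{self._modules}"

    def parameters(self):
        # Explicit version: loop over every layer, then over every parameter in it.
        for layer in self._modules.values():
            for p in layer.parameters():
                yield p

    def parameters_short(self):
        # Same behaviour using yield from.
        for layer in self._modules.values():
            yield from layer.parameters()


mdl = MyModule()
print(mdl)
print(list(mdl.parameters()))
```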

And so with that, we can get this all down to one line of code and it'll do exactly

the same thing. So that's basically saying yield one at a time, everything in here, that's

what yield from does. So there's a cool little advanced Python thing, totally optional, but if

you're interested, I think it can be kind of neat. So we've now learned how to create our own

implementation of nn.Module and therefore we are now allowed to use PyTorch's

nn.Module. So that's good news. So how would we do using the PyTorch nn.Module, how

would we create the model that we started with, which is where we had this self.layers? Because we

want to somehow register all of these all at once. That's not going to happen

based on the code we just wrote. So to do that, let's have a look. We can, so

let's make a list of the layers we want. And so we'll create again a subclass of nn.Module. Make

sure you call the superclass's dunder init first, and we'll just store the list of layers and

then to tell PyTorch about all those layers, we basically have to loop through them and call add_module() and say what the name of

the module is and what the module is. And again, probably should have used

forward here in the first place. And you can see this has now done exactly the same

thing. Okay. So if you've used a sequential model before, you'll see, or you can see that we're

on the path to creating a sequential model. Okay. So Ganesh has asked an interesting

question, which is what on earth is super calling because we actually, in fact, we

don't even need the parentheses here. We actually don't have a base class. That's

because if you don't put any parentheses or if you put empty parentheses, it's actually

a shortcut for writing that. And so Python has stuff in object, which does, you know,

all the normal objecty things like storing your attributes so that you can get them

back later. So that's what's happening there. Okay. So it's a little bit awkward to

have to store the list and then enumerate and call add_module(). So now that we've implemented

that from scratch, we can use PyTorch's version, which is they've just got something called

ModuleList that just does that for you. Okay. So if you use ModuleList and pass it

a list of layers, it will just go ahead and register all those modules for you. So

here's something called SequentialModel. This is just like nn.Sequential now. So if I create it

passing in the layers, there you go. You can see there's my model containing my

module list with my layers. And so, I don't know why I never used

forward for these things. It's silly. I guess it doesn't matter

terribly in this stage, but anywho. Okay. So, call fit() and there we go. Okay. So in forward

here, I just go through each layer and I set the result of that equal to calling that layer on the

previous result and then return it at the end. Now there's another way of doing

this, which I think is kind of fun. It's not, like, shorter or anything at this stage. I just

wanted to show an example of something that you see quite a lot in machine learning code, which

is the use of reduce(). This implementation here is exactly the same as this thing here. So let me explain how it works. What reduce

does. Reduce is a very common kind of, like, fundamental computer science concept: reductions.
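To make the comparison concrete, here is a sketch of the same forward pass written as a loop and as a reduction. The three lambdas are hypothetical stand-ins for layers (any callables would do), and forward_loop and forward_reduce are names made up for this example.

```python
from functools import reduce

# Hypothetical stand-ins for layers: any callables taking and returning a value.
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

def forward_loop(x):
    # Explicit loop: feed the result of each layer into the next one.
    for layer in layers:
        x = layer(x)
    return x

def forward_reduce(x):
    # reduce(function, sequence, initial): the function receives the running
    # value and the next layer, and returns the new running value.
    return reduce(lambda val, layer: layer(val), layers, x)

print(forward_loop(5), forward_reduce(5))
```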

This is something that does a reduction and what a reduction is, is it's something that says, start

with, the third parameter, some initial value. So we're going to start with x, the thing we're being

passed, and then loop through a sequence. So loop through each of our layers and then for each

layer call some function. Here is our function. And the function is going to get passed, first

time around, it’ll be passed the initial value and the first thing in your

list. So your first layer and x. So it's just going to call the layer function on

x. The second time around it takes the output of that and passes it in as the first parameter and

passes in the second layer. So then the second time this goes through, it's going to be calling

the second layer on the result of the first layer and so forth. And that's what a reduction

is. And so you'll certainly see reduce() talked about quite a lot

in papers and books, and you might sometimes also see it in code. It's a very general concept. And

so here's how you can implement a sequential model using reduce(). So there's no explicit loop there,

although the loop is still happening internally. So now that we've reimplemented sequential, we

can just go ahead and use PyTorch's version. So there's nn.Sequential. We can pass in our

layers and we can fit, not surprisingly. We can see the model. So yeah, it looks very

similar to the one we built ourselves. All right. So this thing of

looping through parameters and updating our parameters based on gradients

and a learning rate, and then zeroing them is very common. So common that there is something

that does that all for us. And that's called an optimizer. It's the stuff in optim. So let's

create our own optimizer. And as you can see, it's just going to do the two things we just saw.

It's going to go through each of the parameters and update them using the gradient and the

learning rate. And there's also zero grad, which will go through each parameter

and set their gradients to zero. If you use .data, it's just a way of avoiding

having to say torch.no_grad, basically. So in optimizer, we're going to pass it

the parameters that we want to optimize, and we're going to pass it the learning rate.

And we're just going to store them away. And since the parameters might be a generator,

we'll call list() to turn them into a list. So we're going to create our optimizer, pass

it in the model.parameters(), which have been automatically constructed for us by nn.Module.
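An optimizer along the lines just described could be sketched as follows. The tiny linear model, the random data, the mean-squared-error loss and the learning rate are all assumptions made here so the example is self-contained; only the step and zero_grad behaviour comes from the lesson.

```python
import torch
from torch import nn

class Optimizer:
    def __init__(self, params, lr=0.5):
        # params may be a generator, so turn it into a list before storing it.
        self.params, self.lr = list(params), lr

    def step(self):
        # Update each parameter using its gradient and the learning rate.
        with torch.no_grad():
            for p in self.params:
                p -= p.grad * self.lr

    def zero_grad(self):
        # Reset every gradient to zero, ready for the next batch.
        for p in self.params:
            p.grad.data.zero_()

torch.manual_seed(0)
model = nn.Linear(3, 1)                       # assumed toy model
opt = Optimizer(model.parameters(), lr=0.05)  # assumed learning rate

x, y = torch.randn(8, 3), torch.randn(8, 1)   # made-up data
loss_before = ((model(x) - y) ** 2).mean()
loss_before.backward()
opt.step()
opt.zero_grad()
loss_after = ((model(x) - y) ** 2).mean()
print(loss_before.item(), loss_after.item())
```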

And so here's our new loop. Now we don't have to do any of the stuff manually. We can just

say opt.step(). So that's going to call this. And opt.zero_grad(). And

that's going to call this. There it is. So we've now built our own SGD

optimizer from scratch. So I think this is really interesting, right? These things which

seem like they must be big and complicated, once we have this nice structure in place,

an SGD optimizer doesn't take much code at all. And so it's all very transparent,

simple, clear. If you're having trouble using complex library code that you've found

elsewhere, this can be a really good approach, is to actually just go all the way back, remove

as many of these abstractions as you can and run everything by hand to see exactly what's

going on. It can be really freeing to see that you can do all this. Anyway, PyTorch

has this for us in torch.optim: it's got an optim.SGD(). And just like our version, you pass

in the parameters and you pass in the learning rate. So you really see it is just the same.

So let's define something called get_model(). That's going to return the model, the sequential

model and the optimizer for it. So if we go model, opt equals get_model(), and then we can call

the loss function to see where it's starting. And so then we can write our training loop

again. Go through each epoch, go through each starting point for our batches, grab the slice,

slice into our X and Y in the training set, calculate our predictions, calculate our loss,

do the backward pass, do the optimizer step, do the zero gradient and print out how you're

going at the end of each one. And there we go. All right. So let's keep making this

simpler. There's still too much code. So one thing we could do is we could replace

these lines of code with one line of code by using something we'll call the Dataset class. So the

Dataset class is just something that we're going to pass in our independent and dependent variable.
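A minimal sketch of such a Dataset class is shown here. Plain Python lists with made-up values stand in for the training tensors so it runs without PyTorch; slicing behaves the same way for the purposes of this example.

```python
class Dataset:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __len__(self):
        # Lets len(ds) work.
        return len(self.x)

    def __getitem__(self, i):
        # Called for ds[i] or ds[a:b]; returns the matching x and y together.
        return self.x[i], self.y[i]

# Made-up data: four tiny "images" and their labels.
x_train = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5], [0.6, 0.7]]
y_train = [5, 0, 4, 1]

train_ds = Dataset(x_train, y_train)
xb, yb = train_ds[0:2]
print(len(train_ds), xb, yb)
```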

We'll store them away as self.x and self.y. We'll have something. So if you define dunder len,

then that's the thing that allows the len function to work. So the length of the Dataset will just

be the length of the independent variables. And then dunder getitem is the thing that

will be called automatically anytime you use square brackets in Python. So that just

is going to call this function passing in the indices that you want. So when

we grab some items from our Dataset, we're going to return a tuple of the x values and

the y values. So then we'll be able to do this. So let's create a Dataset using

this tiny little three-line class. It's going to be a Dataset containing the

x and y training, and then create another Dataset containing the x and y valid. And those

two datasets we'll call train_ds and valid_ds. So let's check the length of those datasets should

be the same as the length of the x’s and they are. And so now we can do exactly what

we hoped we could do. We can say xb comma yb equals train_ds and pass in some slice. So that's going to give us back our… Check the

shapes are correct. It should be 5 by 28 by 28, 5 by 28 times 28 and the y should just be

five. And so here they are the x’s and the y’s. So that's nice. We've created a Dataset from

scratch. And again, it's not complicated at all. And if you look at the actual PyTorch

source code, this is basically all Datasets do. So let's try it. We call get_model(). And so now

we've replaced our dataset line with this one and per usual, it still runs. And so this is what

I do when I'm writing code is I try to, like, always make sure that my starting code works as

I refactor. And so you can see all the steps. And so somebody reading my code can then see

exactly like, why am I building everything I'm building? How does it all fit in? See that

it still works. And I can also keep it clear in my own head. So I think this is a really

nice way of implementing libraries as well. All right. So now we're going to

replace these two lines of code with this one line of code. So we're going

to create something called a DataLoader and a DataLoader is something that's just going to

do this. Okay. So we need to create an iterator. So an iterator is a class that has a dunder

iter method. When you say “for in” in Python, behind the scenes, it's actually calling dunder

iter to get a special object, an iterator, which it can then step through one item at a time. So it's basically

getting this thing that you can iterate through, and using yield is an easy way to produce one. So a DataLoader is something

that's going to have a Dataset and a batch size, because we're going to go through the

batches and grab one batch at a time. So we have to store away the Dataset and the batch

size. And so when you, when we call the for loop, it's going to call dunder iter. We're going to

want to do exactly what we saw before, go through the range, just like we did before, and then

yield that bit of the data set. And that's all. So that's a DataLoader. So we can now create

a train DataLoader and a valid DataLoader from our train Dataset and valid Dataset. And so

now, if you remember, the way you can grab one thing out of an iterator, so you don't

need to use a for loop, is you can just say iter, and that will also call dunder iter, and next will

just grab one value from it. So here we will run this and you can see we've now just confirmed

that xb is 50 by 784 and yb, there it is. And then we can check what it looks like. So

let's grab the first element of our X batch, make it 28 by 28. And there it is. So now that

we've got a DataLoader, again, we can grab our model and we can simplify our fit function to just

go for xb, yb in train_dl. So this is getting nice and small, don't you think? And it still works

the same way. Okay. So this is really cool. And now that it's nice and concise,

we can start adding features to it. So one feature I think we should add is that

our training set, each time we go through it, it should be in a different order. It should

be randomized, the order. So instead of always just going through these indexes in order, we

want some way to say, go use random indexes. So the way we can do that is create a class

called Sampler. And what sampler is going to do, I'll show you, is if we create a sampler

without shuffle, without randomizing it, it's going to simply return all

the numbers from zero up to n in order and it'll be an iterator. See, this

is dunder iter. But if I do want it shuffled, then it will randomly shuffle them. So here you

can see I've created a sampler without shuffle. So if I then make an iterator from that and print

a few things from the iterator, you can see it's just printing out the indexes it's going to

want. Or I can do exactly the same thing as we learned earlier in the course using islice. We

can grab the first five. So here's the first five things from a sampler when it's not shuffled.
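A sampler of this kind, with and without shuffling, might be sketched as below. The stand-in dataset is just a list, since only its length matters here, and the variable names are chosen for this example.

```python
import random
from itertools import islice

class Sampler:
    def __init__(self, ds, shuffle=False):
        self.n, self.shuffle = len(ds), shuffle

    def __iter__(self):
        # Produce every index once, optionally in random order.
        res = list(range(self.n))
        if self.shuffle:
            random.shuffle(res)
        return iter(res)

ds = list(range(100, 110))   # any object with a length works as a stand-in dataset

ordered = Sampler(ds)
first_five = list(islice(ordered, 5))   # islice calls iter() on the sampler for us

shuffled = list(Sampler(ds, shuffle=True))
print(first_five)
print(shuffled)
```

Note that the sampler yields indexes into the dataset, not the data itself.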

So as you can see, these are just indexes. So we could add shuffle equals true. And now

that's going to call random.shuffle(), which just randomly permutes them. And now if I do the same

thing, I've got random indexes of my source data. So why is that useful? Well, what we could now

do is create something called a BatchSampler. And what the BatchSampler is going to do is it's

going to basically do this islice thing for us. So we're going to say, okay, pass in a sampler.

So that's something that generates indices and pass in a batch size. And remember, we've

looked at chunking before. It's going to chunk that iterator by that batch size. And so if I now say, all right, please

take our sampler and create batches of 4. As you can see here, it's creating batches

of four indices at a time. So rather than just looping through them in order, I

can now loop through this BatchSampler. So we're going to change our data loader

so that now it's going to take some BatchSampler. And it's going to loop through the

BatchSampler. That's going to give us indices. And then we're going to get that Dataset item

from that batch for everything in that batch. So that's going to give us a list. And then we

have to stack all of the x’s and all of the y’s together into tensors. So I've created

something here called collate function. And we're going to default that to this little

function here, which is going to grab our batch, pull out the x’s and y’s separately,

and then stack them up into tensors. So this is called our collate function.
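The batch sampler, collate function and DataLoader just described fit together roughly as in this sketch. Several things here are assumptions: chunking is done with itertools.islice rather than whatever helper the notebook uses, range() stands in for an unshuffled sampler, and collate returns plain lists where the lesson's version stacks the values into tensors.

```python
from itertools import islice

class BatchSampler:
    def __init__(self, sampler, bs):
        self.sampler, self.bs = sampler, bs

    def __iter__(self):
        # Chunk the stream of indexes into lists of at most bs items.
        it = iter(self.sampler)
        while True:
            batch = list(islice(it, self.bs))
            if not batch:
                return
            yield batch

def collate(b):
    # b is a list of (x, y) tuples; zip(*b) regroups it into all x's and all y's.
    xs, ys = zip(*b)
    return list(xs), list(ys)   # lists stand in for stacked tensors

class DataLoader:
    def __init__(self, ds, batch_sampler, collate_fn=collate):
        self.ds, self.batch_sampler, self.collate_fn = ds, batch_sampler, collate_fn

    def __iter__(self):
        # For each batch of indexes, fetch the items and collate them.
        for idxs in self.batch_sampler:
            yield self.collate_fn([self.ds[i] for i in idxs])

# Made-up dataset: indexing it returns one (x, y) tuple.
ds = [([i, i + 0.5], i % 3) for i in range(10)]
batch_sampler = BatchSampler(range(len(ds)), bs=4)
index_batches = list(batch_sampler)

dl = DataLoader(ds, batch_sampler)
xb, yb = next(iter(dl))
print(index_batches)
print(xb, yb)
```

Swapping in a different collate_fn is the hook for changing how individual items are combined into a batch.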

Okay. So if you put all that together, we can create a training sampler, which is a batch

sampler over the training set with shuffle true. A validation sampler will be a batch sampler

over the validation set with shuffle false. And so then we can pass that

into this DataLoader class, the training data set and the training sampler

and the collate function, which we don't really need because we're just using the default

one. So I guess we can just get rid of that. And so now here we go. We can do

exactly the same thing as before xb, yb, next, iter. And this time we use the valid

DataLoader, check the shapes. And this is how PyTorch's actual DataLoader works. This is the,

this is all the pieces they have. They have samplers, they have batch samplers, they have

a collation function and they have DataLoaders. So remember that what I want you

to be doing for your homework is experimenting with these carefully to see

exactly what each thing's taking in. Okay. So Piotr is asking on the chat, what is this

collate thing doing? Okay. So collate function, it defaults to collate. What does it do? Well,

let's see, let's go through each of these steps. Okay. So we need, so we've got a batch

sampler, so let's do just the valid sampler. Okay. So the batch sampler, here it is. So we're going to go through each

thing in the batch sampler. So let's just grab one thing from the batch sampler. Okay. So the

output of the batch sampler will be next. It's okay. So here's what the batch sampler

contains. All right. Just the first 50 indices, not surprisingly, because this is our

validation sampler. If we did a training sampler, that would be randomized. There they

are. Okay. And what we then do is we go self.dataset[i] for i in b.

So let's copy that. Copy, paste. And so rather than self.dataset[i],

we'll just say valid_ds[i]. Oh, and it's not i in b, it's i

in o, that's what we called it. Oh, and we did it for training. Sorry. Training.

Okay. So what it's created here is a list of tuples of tensors, I think. Let's have a

look. So let's have a look. So let's call this p, whatever. So p[0], okay, is a tuple. It's got

the x and the y, the independent and dependent variables. So that's not what we want. What we want is something that

we can loop through. We want to get batches. So what the collate function is going to do

is it's going to take all of our x’s and all

of our y’s and collate them into two tensors, one tensor of x’s and one tensor of y’s. So the

way it does that is it first of all calls zip(). So zip is a very, very commonly used Python

function. It's got nothing to do with the compression program zip, but instead what it does

is it effectively allows us to transpose things so that now, as you can see, we've got all of the

second elements or index 1 elements all together and all of the index 0 elements together. And

so then we can stack those all up together and that gives us our y’s for our batch. So that's

what collate does. So the collate function is used an awful lot in PyTorch, increasingly nowadays

where Hugging Face stuff uses it a lot. And so we'll be using it a lot as well. And basically

it's a thing that allows us to customize how the data that we get back from our Dataset, once

it's been kind of generating a list of things from the Dataset, how do we put it together

into a bunch of things that our model can take as inputs? Because that's really what we want

here. So that's what the collation function does. Oh, this is the wrong way around. Like so. This is something that I do so often that

fastcore has a quick little shortcut for it, just called store_attr, store attributes. And so if you

just put that in your dunder init, then you just need one line of code and it does exactly the same

thing. So there's a little shortcut as you see. And so you'll see that quite a bit. All

right. Let's have a seven minute break and see you back here very soon. And we're going

to look at a multi-processing DataLoader, and then we'll have nearly finished

this notebook. All right. See you soon. All right. Let's keep going. So we've seen how

to create a DataLoader and sampling from it. The PyTorch DataLoader works exactly like this,

but it uses a lot more code because it implements multi-processing. And so multi-processing means

that the actual, this thing here, that code, can be run in multiple processes. They can be run

in parallel for multiple items. So this code, for example, might be opening up a JPEG, rotating it,

flipping it, et cetera. Right? So because remember this is just calling the dunder getitem for a

Dataset. So that could be doing a lot of work for each item and we're doing it for every item in the

batch. So we'd love to do those all in parallel. So I'll show you a very quick and dirty

way that basically does the job. So Python has a multi-processing library. It doesn't

work particularly well with PyTorch tensors. So PyTorch has created an exact re-implementation of

it. So it's identical API wise, but it does work well with tensors. So this is basically what

we've grabbed multiprocessing from. So this is not quite cheating because multiprocessing is in

the standard library and this is API equivalent. So I'm going to say, we're allowed to do that.

So as we've discussed, you know, when we use square brackets on an object, it's actually

identical to calling the dunder getitem method on that object. So you can see here, if

we say, give me items 3, 6, 8, and 1, it's the same as calling dunder

getitem passing in 3, 6, 8, and 1. Now why does this matter? Well, I'll show you why.
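
As a small illustration of that equivalence, here is a minimal sketch with a toy class. `ToyDataset` is a hypothetical stand-in for the lesson's Dataset, and it holds a plain list, so the list-of-indices case is handled by hand; the tensors used in the lesson support that kind of indexing directly.

```python
class ToyDataset:
    """Hypothetical dataset wrapping a plain list."""
    def __init__(self, items): self.items = items
    def __getitem__(self, idx):
        # Accept either a single index or a list of indices
        if isinstance(idx, list): return [self.items[i] for i in idx]
        return self.items[idx]

ds = ToyDataset(list("abcdefghij"))

a = ds[[3, 6, 8, 1]]              # square-bracket syntax
b = ds.__getitem__([3, 6, 8, 1])  # the method Python calls behind the scenes
```

Both `a` and `b` come out as the same list of items.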

It matters because we're going to be able to use map and I'll explain why we want to use map in

a moment. Map is a really important concept. You might've heard of map-reduce. So we've already

talked about reductions and what those are. Maps are kind of the other key piece. Map is something

which takes a sequence and calls a function on every element of that sequence. So imagine we had

a couple of batches of indices, 3 and 6 and 8 and 1. Then we're going to call dunder getitem

on each of those batches. So that's what map does. Map calls this function on every element

of the sequence. And so that's going to give us the same stuff as this, but now

batched into two batches. Now why do we want to do that? Because multiprocessing has something called

Pool where you can tell it how many workers do you want to run, how many processes you want to run.
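
A minimal sketch of that idea, using Python's built-in `map` on two batches of indices. `ToyDataset` here is a hypothetical stand-in holding a plain list. `multiprocessing.Pool.map` is called the same way, with a function and an iterable, but spreads the calls across worker processes; it is left out so the sketch stays single-process.

```python
class ToyDataset:
    """Hypothetical dataset wrapping a plain list."""
    def __init__(self, items): self.items = items
    def __getitem__(self, idx):
        if isinstance(idx, list): return [self.items[i] for i in idx]
        return self.items[idx]

ds = ToyDataset(list("abcdefghij"))

# Two batches of indices
idx_batches = [[3, 6], [8, 1]]

# map calls ds.__getitem__ once per batch of indices
batches = list(map(ds.__getitem__, idx_batches))
```

The result has one entry per batch of indices, each holding the items for that batch.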

And it then has a map which works just like the normal Python map, but it runs this function

in parallel over the items from this iterator. So this is how we can create a multiprocessing

DataLoader. So here we're creating our DataLoader. And again, we don't actually need to pass in the

collate function because we're using the default one. So if we say n_workers equals 2 and then

create that, if we say next, see how it's taking a moment and it took a moment because it was

firing off those two workers in the background. So the first batch actually comes out more slowly.

But the reason that we would use a multiprocessing DataLoader is if this is doing a lot of work, we

want it to run in parallel. And even though the first item might come out a bit slower,

once those processes are fired up, it's going to be faster to run. So this is a really

simplified multiprocessing DataLoader. Because this needs to be super, super efficient,

PyTorch has lots more code than this to make it much more efficient. But the idea is this,

and this is actually a perfectly good way of experimenting or building your own DataLoader

to make things work exactly how you want. So now that we've re-implemented all this from

PyTorch, let's just grab PyTorch’s. As you can see, they're exactly the same DataLoader. They

don't have one thing called sampler that you pass shuffle to. They have two separate classes

called SequentialSampler and RandomSampler. I don't know why they do it that way. It's a

little bit more work to me, but same idea. And they've got BatchSampler. And so it's exactly the

same idea. The training sampler is a BatchSampler with a RandomSampler. The validation sampler

is a BatchSampler with a SequentialSampler. Pass them in batch sizes. And so we can now pass

those samplers to the DataLoader. This is now PyTorch’s DataLoader. And just like ours, it

also takes a collate function. And it works. Cool. So that's, as you can see, it's doing exactly

the same stuff that ours is doing with exactly the same API. And it's got some shortcuts, as I'm

sure you've noticed when you've used DataLoaders. So for example, calling batch sampler is going

to be very, very common. So you can actually just pass the batch size directly to a DataLoader, and

it will then auto-create the batch samplers for you. So you don't have to pass in BatchSampler at

all. Instead you can just say sampler, and it will automatically wrap that in the batch sampler

for you. So it does exactly the same thing. And in fact, because it's so common to create

a RandomSampler or a SequentialSampler for a Dataset, you don't have to do that manually.
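
A minimal sketch of the shortcuts being described, assuming PyTorch is installed. The tiny `TensorDataset` is made up for illustration, and sequential sampling is used throughout so the three loaders can be compared directly; passing `shuffle=True` would correspond to the `RandomSampler` case.

```python
import torch
from torch.utils.data import BatchSampler, DataLoader, SequentialSampler, TensorDataset

# Made-up data: 10 items, x has 2 features, y is the item number
ds = TensorDataset(torch.arange(20.).view(10, 2), torch.arange(10))
bs = 4

# 1. Spelled out: a BatchSampler wrapping a SequentialSampler
explicit = DataLoader(ds, batch_sampler=BatchSampler(SequentialSampler(ds), bs, drop_last=False))
# 2. Pass a sampler plus a batch size; the batch sampler is created for us
with_sampler = DataLoader(ds, batch_size=bs, sampler=SequentialSampler(ds))
# 3. Pass only batch_size and shuffle; shuffle=False gives sequential order
shortcut = DataLoader(ds, batch_size=bs, shuffle=False)

# Collect the y's of every batch from each loader
ys = [[yb.tolist() for _, yb in dl] for dl in (explicit, with_sampler, shortcut)]
```

All three loaders produce the same batches here, which is the sense in which the shorter forms do exactly the same thing.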

You can just pass in shuffle equals true or shuffle equals false to the DataLoader. And

that does, again, exactly the same thing. There it is. Now something that is very

interesting is that, when you think about it, the batch sampler and the collation function

are things which are taking the result of the sampler, looping through them,

and then collating them together. But what we could do is, actually, because our Datasets know how to grab multiple

indices at once, we can actually just use the BatchSampler as a sampler. We don't

actually have to loop through them and collate them because they're basically instantly,

they come pre-collated. So this is a trick which actually Hugging Face stuff can use as well, and

we'll be seeing it again. So this is an important thing to understand is how come we can pass a

BatchSampler to sampler and what's it doing? And so rather than trying to look through the

PyTorch code, I suggest going back to our non-multi-processing pure Python code to see

exactly how that would work. Because it's a really nifty trick for things that you can grab multiple

things from at once and it can save a whole lot of time. It can make your code a lot faster. Okay.
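
Following the suggestion to reason about this in plain Python, here is a minimal sketch. `ToyDataset`, `batch_indices`, and `simple_loader` are hypothetical names, and lists stand in for tensors. Because the dataset accepts a whole list of indices at once, each "index" handed out by the batch sampler already produces a complete batch, so there is no per-item loop and no collation step.

```python
class ToyDataset:
    """Hypothetical dataset that can return many items in one call."""
    def __init__(self, xs, ys): self.xs, self.ys = xs, ys
    def __len__(self): return len(self.xs)
    def __getitem__(self, idx):
        if isinstance(idx, list):
            return [self.xs[i] for i in idx], [self.ys[i] for i in idx]
        return self.xs[idx], self.ys[idx]

def batch_indices(n, bs):
    """Stand-in for a batch sampler: yields lists of consecutive indices."""
    for start in range(0, n, bs):
        yield list(range(start, min(start + bs, n)))

def simple_loader(ds, sampler):
    # Each thing the sampler yields is passed straight to the dataset;
    # when that thing is a list of indices, the result is already a batch
    for idx in sampler:
        yield ds[idx]

ds = ToyDataset(xs=[i * 10 for i in range(5)], ys=list(range(5)))
batches = list(simple_loader(ds, batch_indices(len(ds), bs=2)))
```

Each element of `batches` is an `(xs, ys)` pair that came back pre-collated from a single `__getitem__` call.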

So now that we've got all that nicely implemented, we should now add a validation set. And there's

not really too much to talk about here. We'll just take our fit function, and this is

exactly the same code that we had before. And then we're just going to add something

which goes through the validation set and gets the predictions and sums up the losses

and accuracies and from time to time prints out the loss and accuracy. And so get_dls(), we will

implement by using the PyTorch DataLoader now. And so now our whole process will be get_dls()

passing in the training and validation dataset. Notice that for our validation DataLoader, I'm

doubling the batch size because it doesn't have to do back propagation. So it should use about half

as much memory so I can use a bigger batch size. Get our model and then call this

fit. And now it's printing out the loss and accuracy on the validation set.

So finally we actually know how we're doing, which is that we're getting 97% accuracy on the

validation set. And that's on the whole thing, not just on the last batch. So that's cool. We've now

implemented a proper, working, sensible training loop. It's still, you know, a bit more code

than I would like, but it's not bad. And every line of code in there and every line of code it's

calling is all stuff that we have built ourselves, re-implemented ourselves. So we know exactly

what's going on and that means it's going to be much easier for us to create anything we can think

of. We don't have to rely on other people's code. So hopefully you're as excited about that as I

am. Cause it really opens up a whole world for us. So one thing that we're going to want to be able

to do now that we've got a training loop is to grab data. And there's a really fantastic library

of datasets available on Hugging Face nowadays. And so let's look at how we use those datasets

now that we know how to bring things into data loaders and stuff so that now we can use

the entire world of Hugging Face datasets with our code. So

you need to pip install datasets. And once you've pip installed datasets, you'll be able

to say from datasets import, and you can import a few things. I've just imported these two things for now,

load_dataset, load_dataset_builder. And we're going to look at a dataset called Fashion-MNIST.

And so the way things tend to work with Hugging Face is there's something called the Hugging

Face hub, which has models and it has datasets amongst other things. And generally you'll give

them a name and you can then say, in this case, load a dataset builder for Fashion-MNIST. Now a

dataset builder is just basically something which has some metadata about this dataset. So the

dataset builder has a .info and the .info has a .description. And here's a description of this.

And as you can see, again, we've got 28 by 28 grayscale. So it's going to be very familiar

to us because it's just like MNIST. And again, we've got 10 categories. And again, we've got

60,000 training examples. And again, we've got 10,000 test examples. So this is cool. So as it

says, it's a direct drop-in replacement for MNIST. And so the dataset builder also will tell

us what's in this dataset. And so Hugging Face stuff generally uses dictionaries rather

than tuples. So there's going to be an image of type Image, and there's going to be a label of

type ClassLabel. There's 10 classes and these are the names of the classes. So it's quite nice that

in Hugging Face datasets, you know, we can kind of get this information directly. It also tells us

if there are some recommended training and test splits, we can find out those as well. So this is the size

of the training split and the number of examples. So now that we're ready to start playing with

it, we can load the dataset. Okay, so this is the difference between load_dataset_builder() versus

load_dataset(). So this will actually download it, cache it, and here it is. And it creates a dataset

dictionary. So a dataset dictionary, if you've used fast.ai, is basically just like what we call

the datasets class. They call it the DatasetDict class. So it's a dictionary that contains in

this case, a train and a test item, and those are datasets. These datasets are very much like the

datasets that we created in the previous notebook. So we can now grab the training and test items

from that dictionary and just pop them into variables. And so we can now have a look at the

0 index thing in training. And just like we were promised, it contains an image and a label.

So as you can see, we're not getting tuples anymore. We're getting dictionaries containing

the x and the y, in this case, image and label. So I'm going to get pretty bored writing image and

label in strings all the time. So I'm just going to store them as x and y. So x is going to be the

string ‘image’ and y will be the string ‘label’. I guess the other way I could have done that would have been to say x comma y equals

that. That would probably be a bit neater because it's coming straight from the

features. And if you iterate into a dictionary, you get back its keys. That's why that works.
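
A minimal sketch of why that unpacking works. A plain dict stands in here for the dataset's features mapping, and its values are placeholders rather than real Hugging Face feature objects.

```python
# Placeholder values; only the keys matter for this illustration
features = {"image": "an Image feature", "label": "a ClassLabel feature"}

# Iterating over a dictionary yields its keys, in insertion order,
# so tuple unpacking assigns the two key names
x, y = features
```

After this, `x` is the string `'image'` and `y` is the string `'label'`, the same values as assigning them by hand.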

So anyway, I've done it manually here, which is a bit sad, but there you go. Okay. So

we can now take train zero, which we've already seen, and grab the x, i.e.

the image, and there it is. There's the image. We could grab the first five images

and the first five labels, for example. And there they are. Now we already

know what the names of the classes are. So we could now see what these map to by grabbing

those features. So there they are. So this is a special Hugging Face class, and most libraries,

including fast.ai, have something that works like this. There's something called int to string,

int2str, which is going to take these and convert them to these. So if I call it on our y

batch, you'll see we've got, first is ‘ankle boot’ and there that is indeed an ankle boot. Now we're

going to have a couple of t-shirts and a dress. Okay. So how do we use this to train a model?

Well, we're going to need a DataLoader and we want a DataLoader that for now we're going to do just

like we've done it before. It's going to return, well, actually we're going to do something

a bit different. We're going to have, our collate function is actually going to return

a dictionary. Actually, this is pretty common for Hugging Face stuff. And PyTorch

doesn't mind; it's happy for you to return a dictionary from a collation

function. So rather than returning a tuple of the stacked-up tensors, it returns a dictionary. Hopefully this looks very

familiar. This looks a lot like the thing that goes through the Dataset for each

one and stacks them up just like we did in the previous notebook. So that's what we're

doing. We're doing all in one step here in our collate function. And then again, exactly the

same thing. Go through our batch, grab the y, and because these are just

integers, making a tensor from them stacks them up, so we don't have to call stack. And so we're now going to have the

image and label bits in our dictionary. So if we create our DataLoader

using that collation function, grab one batch. So we can go batch

x dot shape is a 16 by 1 by 28 by 28 and our y of the batch here, here it is. So the

thing to notice here is that we haven't done any transforms or anything or written our own

Dataset class or anything. We're actually putting all the work directly in the collation

functions. This is like a really nice way to skip all of the kind of abstractions

of your framework, if you want to, is you can just do all of your work in

collate functions. So it's going to pass you each item. So you're going to get the batch

directly. You just go through each item. And so here we're saying, okay, grab the x key

from that dictionary, convert it to a tensor and then do that for everything in the batch

and then stack them all together. So this is, yeah, this is like, can be quite a nice way to

do things if you want to do things just very manually without having to think too

much about, you know, a framework, particularly if you're doing really

custom stuff, this can be quite helpful. Having said that, Hugging Face datasets

absolutely lets you avoid doing everything in collate function, which, if we want

to create really simple applications, that's where we're going to eventually want

to head. So we can do this using a transform instead. And so the way we do that is we create

a function. It's going to take our batch. It's going to replace the x in our batch with the

tensor version of each of those PIL images. And I'm not even stacking them or anything.

And then we're going to return that batch. And so Hugging Face datasets has something

called with_transform(), and that's going to take your dataset, your Hugging Face

dataset, and it's going to apply this function to every element. And it doesn't run at all now;

behind the scenes, when it calls dunder getitem, it will

call this function on the fly. So, in other words, this could have data augmentation, which can

be random or whatever, because it's going to be rerun every time you grab an item, it's not cached

or anything like that. So other than that, this dataset has exactly the same API as any other

dataset. It has a length, it has a dunder getitem, so you can pass it to a DataLoader. And so PyTorch

already knows how to collate dictionaries of tensors. So we've got a dictionary of tensors

now. So that means we don't need a collate function anymore. I can create a DataLoader from

this without a collate function, as you can see. And so this has given exactly

the same thing as before, but without having to create a custom collate

function. Now, even this is a bit more code than I want, having to return this seems a bit

silly. But the reason I had to do this is because Hugging Face datasets expects the with_transform

function to return the new version of the data. So I wanted to be able to write

it like this, transform in place, and just say the change I want to make and

have it automatically return that. So if I create this function, it's exactly the same

as the previous one, but doesn't have return. How would I turn this into something

which does return the result? So here's an interesting trick. We could take

that function, pass it to another function to create a new function, which is the, a version

of this inplace function that returns the result. And the way I do that is by creating a

function called inplace. It takes a function, it returns a function. The function it

returns is one that calls my original function and then returns the result. So this is the

function. This is a function generating function. And it's modifying an inplace function to become

a function that returns the new version of that data. And so this is a function. This function is

passed to this function, which returns a function. And here it is. So here's the version that Hugging

Face will be able to use. So I can now pass that to with_transform() and it

does exactly the same thing. So this is very, very common in Python. It's so

common that this line of code can be entirely removed and replaced with this little token. If

you have a function and put @ at the start, you can then put that before a function. And what it

says is take this whole function, pass it to this function and replace it with the result. So this

is exactly the same as the combination of this and this. And when we do it this way, this kind

of little syntax sugar is called a decorator. Okay. So there's nothing magic about decorators.

It's literally, literally identical to this. Oh, I guess the only difference is we don't end up with

this unnecessary intermediate underscore version, but the result is exactly the same. And therefore

I can create a transformed Dataset by using this. And there we go. It's all working fine. Yeah, so I mean, none of this is particularly

necessary, but what we're doing is we're just kind of like seeing, you know, the

pieces that we can, we can put in place to make this stuff as easy as possible and

that we don't have to think about things too much. All right. Now with all this, we can

basically make things pretty automatic. And the way we can make things pretty automatic is

we're going to use a cool thing in Python called itemgetter(). And itemgetter is

a function that returns a function. So hopefully you're getting used to this idea now.
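
A minimal sketch of `itemgetter` from the standard library's `operator` module. It builds a function that pulls the named keys out of anything supporting square-bracket indexing, so it works on a real dictionary and also on a small class that only defines dunder getitem. The class name `D` is made up for illustration.

```python
from operator import itemgetter

# A function that fetches the 'a' and 'c' items from whatever it is given
ig = itemgetter("a", "c")

d = {"a": 1, "b": 2, "c": 3}
from_dict = ig(d)

class D:
    """Not a dict, but indexable with square brackets."""
    def __getitem__(self, k):
        return 1 if k == "a" else 2 if k == "b" else 3

from_obj = ig(D())
```

Both calls return the tuple `(1, 3)`, because `itemgetter` only relies on indexing, not on the object's type.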

This creates a function that gets the a and c items from a dictionary or something that looks

like a dictionary. So here's a dictionary. It contains keys a, b, and c. So this function will

take a dictionary and return the a and c values. And as you can see, it has done exactly

that. I’ll explain why this is useful in a moment. I just wanted to briefly mention

what did I mean when I said something that looks like a dictionary? I mean, this is a

dictionary. Okay. That looks like a dictionary. But Python doesn't care about what type things

actually are. It only cares about what they look like. And remember that when we call something

with square brackets, when we index into something, behind the scenes it's just calling

dunder getitem. So we could create our own class. And its dunder getitem, gets the key. And it's

just going to manually return 1 if k equals a or 2 if k equals b or 3 otherwise. And look, that

class also works just fine with an itemgetter. The reason this is interesting

is because a lot of people write Python as if it's like C++ or Java or

something. They write as if it's this kind of statically typed thing. But I really wanted to

point out that it's an extremely dynamic language and there's a lot more flexibility than you might

have realized. Anyway, that's a little aside. So what we can do is think about a batch for

example where we've got these two dictionaries. Okay. So PyTorch comes with a default

collation function called, not surprisingly, default_collate. So that's part of PyTorch. And

what default_collate() does with dictionaries is it simply takes the matching keys and then

grabs their values and stacks them together. And so that's why if I call default_collate, a

is now 1, 3, b is now 2, 4. That's actually what happened before when we created this DataLoader is

it used the default collation function, which does that. It also works on things that are tuples, not

dictionaries, which is what most of you would have used before. And what we can do therefore is we

could create something called collate_dict(), which is something which

is going to take a Dataset and it's going to create an itemgetter function for

the features in that Dataset, which in this case is ‘image’ and ‘label’. So this is a function

which will get the ‘image’ and ‘label’ items. And so we're now going to return a function

and that function is simply going to call our itemgetter() on the result of default_collate(). And

what this is going to do is it's going to take a dictionary and collate it into a tuple

just like we did up here. So if we run that, so we're now going to call DataLoader on our

transform dataset, passing in, and remember, this is a function that returns a function.
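
A minimal pure-Python sketch of this dictionary-to-tuple idea. Two things here are assumptions made so it runs without PyTorch: `simple_collate` is a hypothetical stand-in for `default_collate` that groups values by key into lists instead of stacking tensors, and this `collate_dict` takes the key names directly rather than reading them from a Dataset's features.

```python
from operator import itemgetter

def simple_collate(batch):
    """Stand-in for default_collate on dicts: gather the values of matching keys."""
    return {k: [item[k] for item in batch] for k in batch[0]}

def collate_dict(keys):
    """Return a collation function that yields a tuple ordered by `keys`."""
    get = itemgetter(*keys)
    def _f(b):
        # collate the list of dicts into one dict, then pull the keys out as a tuple
        return get(simple_collate(b))
    return _f

# A batch as a dictionary-style dataset would produce it: one dict per item
batch = [{"image": "img0", "label": 9}, {"image": "img1", "label": 0}]

collate = collate_dict(("image", "label"))
xb, yb = collate(batch)
```

The returned function can be unpacked as `xb, yb`, which is the tuple shape most PyTorch-style training loops expect.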

So it's a collation function for this Dataset and there it is. So now this looks a lot like

what we had in our previous notebook. This is not returning a dictionary, but it's returning

a tuple. So this is a really important idea for, particularly, for working with Hugging Face

datasets is that they tend to do things with dictionaries and most other things in the PyTorch

world tend to work with tuples. So you can just use this now to convert anything that

returns dictionaries into something that provides tuples by passing it as a collation function

to your DataLoader. So remember, you know, the thing you want to be doing this week is doing

things like import pdb, pdb.set_trace(), right? Put breakpoints, step through, see exactly

what's happening, you know, not just here, but also even more importantly, doing it inside the

innermost, inner function. So then you can see, what did I do wrong there? Oh,

did I? Set underscore trace. So then we can see exactly

what's going on. Print out b. List the code. And I could step into it. And

look, I'm now inside the default_collate function, which is inside PyTorch. And so I

can now see exactly how that works. There it all is. So it's going to go

through and this code is going to look very familiar because we've implemented all this

ourselves. It's just being careful to work for lots of different types of things,

dictionaries, NumPy arrays, so on and so forth. So the first thing I wanted to do, oh,

actually, something I do want to mention here, this is so useful, we want to be able

to use it in all of our notebooks. So rather than copying and pasting this every

time, it would be really nice to create a Python module that contains this definition. So we've

created a library called nbdev. It's really a whole system called nbdev, which does exactly

that. It creates modules you can use from your notebooks. And the way you do it is you use this

special thing we call comment directives, which is hash pipe. And then hash pipe export. So you put

this at the top of a cell and it says do something special for this cell. What this does is it says

put this into a Python module for me, please. Export it to a Python module. What Python

module is it going to put it in? Well, if you go all the way to the top, you tell it what

default export module to create. So it's going to create a module called datasets. So what I do at

the very end of this module is I've got this line that says import nbdev, nbdev.nbdev_export().

And what that's going to do for me is create a library, a Python library. It's going to have

a datasets.py in it. And we'll see everything that we exported. Here it is. collate_dict

will appear in this for me. And so what that means is now in the future, in my notebooks,

I will be able to import collate_dict from my datasets. Now you might wonder, well, how

does it know to call it miniai? What's miniai? Well, in nbdev, you create a settings.ini file

where you say what the name of your library is. So we're going to be using this quite a lot

now because we're getting to the point where we're starting to implement stuff that didn't

exist before. So previously most of the stuff, or pretty much all the stuff we've created, I've

said like, oh, that already exists in PyTorch. So we don't need it. We just use PyTorch’s. But

we're now getting to a point where we're starting to create stuff that doesn't exist anywhere. We've

created it ourselves. And so therefore we want to be able to use it again. So during the rest of

this course, we're going to be building together a library called miniai. That's going to be our

framework, our version of something like fastai. Maybe it's something like what fastai 3 will end

up being. We'll see. So that's what's going on here. So we're going to be using, once I

start using miniai, I'll show you exactly how to install this, but that's what this

export is. And so you might've noticed I also had an export on this inplace thing. And

I also had it on my necessary import statements. Okay. We want to be able to see what this dataset

looks like. So I thought it now is a good time to talk a bit about plotting because knowing how

to visualize things well is really important. And again, the idea is we, we're not allowed

to use fastai's plotting library. So we've got to learn how to do everything ourselves. So

here's the basic way to plot an image using matplotlib. So we can create a batch, grab the

x part of it, grab the very first thing in that. And imshow() means show an image. And

here it is. There's our ankle boot. So let's start to think about what stuff we

might create, which we can export to make this a bit easier. So let's create something

called show_image(), which basically does imshow(), but we're going

to do a few extra things. We will make sure that it's in the correct

axis order. We will make sure it's not on CUDA, that it's on the CPU. If it's not a NumPy

array, we'll convert it to a NumPy array. We'll be able to pass in an existing axis,

which we'll talk about soon. If we want to, we'll be able to set a title if we want to.

And also, this thing here removes all this ugly 05 blah blah blah axis because we're

showing an image. We don't want any of that. So if we try that, you can see, there we go. We've

also been able to say what size we want the image. There it all is. Now here's something

interesting. When I say help, the help shows the things that I implemented,

but it also shows a whole lot more things. How did that magic thing happen? And you

can see they work because here's figsize, which I didn't add. Oh, sorry. I did add.

Well, okay. That's a bad example. Anyway, these other ones all work as well. So how did

that happen? Well, the trick is that I added **kwargs here, and **kwargs says you

can pass it as many other arguments as you like that aren't listed. And they'll

all be put into a dictionary with this name. And then, when I call imshow(), I pass that entire

dictionary. ** here means “as separate arguments”. And that's how come it works. And then

how come it knows what help to provide? The reason why is that

fastcore has a special thing called delegates, which is a decorator. So now you know

what a decorator is and you tell it, what is it that you're going to be passing kwargs

to? I'm going to be passing it to imshow(), and then it automatically creates the documentation

correctly to show you what kwargs can do. So this is a really helpful way of being able to

kind of extend existing functions like imshow and still get all of their functionality and

all of their documentation and add your own. So delegates is one of the most useful

things we have in fastcore, in my opinion. So we're going to export that. So now we can use

show_image() anytime we want, which is nice. Something that's really helpful to

know about matplotlib is how to create subplots. So for example, what happens if you

want to plot two images next to each other? So in matplotlib subplots creates multiple

plots and you pass it number of rows and the number of columns. So this here has,

as you see, one row and two columns. And it returns axes. Now what it calls axes

is what it refers to as the individual plots. So if we now call show_image() on

the first image, passing in axs[0], it's going to get that here, right? Then we

call ax.imshow(). That means put the image on this subplot. They don't call it a subplot,

unfortunately, they call it an axis, put it on this axis. So that's how come we're able to

show an image, one image on the first axis, and then show a second image on the second axis by

which we mean subplot. And there's our two images. So that's pretty handy. So I've decided to add

some additional functionality to subplots. So therefore I use delegates on subplots() because

I'm adding functionality to it. And I'm going to be taking kwargs and passing it through to

subplots(). And the main thing I wanted to do is to automatically create an appropriate figure

size by just finding out, you tell us what image size you want. And I also want to be able to

add a title for the whole set of subplots. And so there it is. And then I also want

to show you that it'll automatically, if we want to, create documentation for us

as well, for our library. And here is the documentation. So as you can see here, for the

stuff I've added, it's telling me exactly what each of these parameters are, their type,

their defaults, and information about each one. And that information is automatically coming from

these little comments. We call these docments. This is all automatic stuff done by fastcore and

nbdev. And so you might've noticed when you look at fastai library documentation, it always has

all this info. So that's why. You don't actually have to call show_doc(), it's automatically added to

your documentation for you. I'm just showing you here what it's going to end up looking like. And

you can see that it's worked with delegates. It's put all the extra stuff from delegates in here

as well. And here they are all listed out here as well. So anyway, subplots. So let's create

a 3 by 3 set of plots and we'll grab the first eight images. And so now we can go through each

of the subplots. Now it returns it as a 3 by 3, basically a list of 3 lists of 3 items. So

I flattened them all out into a single list. So we'll go through each of those subplots and

go through each image and show each image on each axis. And so here's a quick way to quickly

show them all. As you can see, it's a little bit ugly here, so we'll keep on adding more useful

plotting functionality. So here's something that, again, calls our subplots and delegates to it.

But we're going to be able to say, for example, how many subplots do we want? And it'll

automatically calculate the rows and the columns. And it's going to remove the axes for any ones

that we're not actually using. And so here we got that. So that's what get_grid()'s going to let us

do. So we're getting quite close. And so, finally, why don't we just create a single thing called

show_images() that's going to get our grid. And it's going to go through our images optionally

with a list of titles and show each one. And we can use that here. You can see we have

successfully got all of our labeled images. And so we, yeah, I think all this stuff for the

plotting is pretty useful. So as you might've noticed, they were all exported. So in our

datasets.py, we've got our get_grid(), we've got our subplots, we've got our show_images().
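As a rough illustration of the row and column calculation that get_grid() is described as doing, here is a minimal sketch. The helper names grid_shape and unused_axes and the "roughly square" layout rule are assumptions made for illustration, not the miniai implementation. The sketch only works out the grid shape and how many trailing subplots would be left blank, and does no plotting.

```python
import math

def grid_shape(n, nrows=None, ncols=None):
    """Pick a (rows, cols) grid with room for n plots (hypothetical helper)."""
    if nrows is None and ncols is None:
        nrows = max(1, math.isqrt(n))   # assumption: aim for a roughly square layout
    if ncols is None:
        ncols = math.ceil(n / nrows)    # enough columns to fit everything
    elif nrows is None:
        nrows = math.ceil(n / ncols)    # enough rows to fit everything
    return nrows, ncols

def unused_axes(n, nrows, ncols):
    """Number of trailing subplots that stay blank and could have their axes hidden."""
    return nrows * ncols - n

rows, cols = grid_shape(8)
print(rows, cols, unused_axes(8, rows, cols))   # 2 4 0
```

With seven images the same call gives a 2 by 4 grid with one blank subplot, which is the case where removing the unused axes matters.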

So that's going to make life easier for us now, since we have to create everything from

scratch, we have created all of those things. So as I mentioned at the very end,

we have this one line of code to run. And so just to show you, if I remove miniai/datasets.py, so

it's all empty. And then I run this line of code. And now it's back, as you can see, and it

tells you it's auto generated. All right. So we are nearly at the point where we can build

our learner. And once we've built our learner, we're going to be able to really dive deep into

training and studying models. So we've kind of got, nearly got all of our infrastructure in

place. Before we do, there's some pieces of Python, which not everybody knows, and I want

to kind of talk about and kind of computer science concepts I want to talk about.

So that's what 06_foundations is about. So this whole section is

just going to talk about some stuff in Python that you may not have come across before. Or maybe

it's a review for some of you as well. And it's all stuff we're going to be using basically in the

next notebook. So that's why I wanted to cover it. So we're going to be creating a learner class.

So a learner class is going to be a very general purpose training loop, which we can get to do

anything that we want it to do. And we're going to be creating things called callbacks to make

that happen. And so therefore we're going to just spend a few moments talking about what are

callbacks, how are they used in computer science, how are they implemented, look at some examples.

They come up a lot. The most common place that you see callbacks in software is for GUI

events. So for events from some graphical user interface. So the main graphical user interface

library in Jupyter Notebooks is called ipywidgets. And we can create a widget like a button, like

so. And when we display it, it shows me a button. And at the moment it doesn't

do anything if I click on it. What we can do though, is we can

add an on_click() callback to it, which is something where

we're going to pass it a function, which is called when you click it. So let's

define that function. So I'm going to say w.on_click(f) is going to assign the f function

to the on click callback. Now if I click this, there you go, it's doing it. Now what does that

mean? Well, a callback is simply a callable that you've provided. So remember a callable is a more

general version of a function. So in this case, it is a function that you've provided that will

be called back to when something happens. So in this case, there's something that's happening is

that they're clicking a button. So this is how we are defining and using a callback as a GUI event.
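To make the "called back when something happens" idea concrete without a notebook front end, here is a minimal sketch using a stand-in class. ToyButton and its click() method are hypothetical and written only for this illustration. They are not the ipywidgets API, although registration is spelled on_click(f) as in the lesson.

```python
class ToyButton:
    """Stand-in for a GUI button; NOT ipywidgets, just the same idea."""
    def __init__(self, description):
        self.description = description
        self._handlers = []            # callbacks registered so far

    def on_click(self, callback):
        """Remember a callable to be called back on every click."""
        self._handlers.append(callback)

    def click(self):
        """Simulate the user clicking: call each registered callback in turn."""
        for handler in self._handlers:
            handler(self)

clicks = []

def f(button):
    # The callback receives the object that fired the event.
    clicks.append(f"clicked {button.description}")

w = ToyButton("Click me")
w.on_click(f)    # register only; nothing runs yet
w.click()        # the "event" happens, so f is called back
print(clicks)    # ['clicked Click me']
```

The point of the sketch is that the code that owns the event (the button) decides when f runs; the caller only hands the function over.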

So basically everything in ipywidgets, if you want to create your own graphical user interfaces

for Jupyter, you can do it with ipywidgets and by using these callbacks. So these particular

kinds of callbacks are called events, but it's just a callback. All right, so that's somebody

else's callback. Let's create our own callback. So let's say we've got some very slow calculation.

And so it takes a very long time to add up the squares of the numbers zero to four, because

we sleep for a second after each one. So let's run our slow calculations. Still

running. Oh, how's it going? Come on, finish our calculation. There we go. The answer

is 30. Now for a slow calculation like that, such as training a model, it's a slow calculation.
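The slow calculation described here can be sketched as follows. The delay parameter is an addition for this sketch so that it can be run quickly; the lesson's version is described as sleeping for one second after each step.

```python
from time import sleep

def slow_calculation(delay=1.0):
    """Add up i*i for i in 0..4, pausing after each step."""
    res = 0
    for i in range(5):
        res += i * i
        sleep(delay)   # stands in for real work, such as a training step
    return res

print(slow_calculation(delay=0.01))   # 0 + 1 + 4 + 9 + 16 = 30
```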

It would be nice to do things like, I don't know, print out the loss from time to time

or show a progress bar or whatever. So generally for those kinds of things, we would

like to define a callback that is called at the end of each epoch or batch or every few seconds or

something like that. So here's how we can modify our slow calculation routine such that you can

optionally pass it a callback. And so all of this code is the same, except we've added this

one line of code that says, if there's a callback, then call it and pass in where we're up to. So

then we could create our callback function. So this is just like we created a callback

function f(), let's create a show_progress() callback function. That's going to tell us how far

we've got. So now if we call slow_calculation() passing in our callback, you can see it's going

to call this function at the end of each step. So here we've created our own callback. So

there's nothing special about a callback. It doesn't require its own like syntax. It's not

a new concept. It's just an idea, really, which is the idea of passing in a function, which some

other function will call at particular times, such as at the end of a step or such as when you click

a button. So that's what we mean by callbacks. We don't have to define

the function ahead of time. We could define the function at

the same time that we call the slow calculation by using Lambda. So as we've

discussed before, Lambda just defines a function, but it doesn't give it a name. So here's a

function that takes one parameter and prints out exactly the same thing as before. So here's

the same way as doing it, but using a Lambda. We could make it more sophisticated

now. And rather than always saying, “Awesome! We finished epoch…”, whatever, we

could have let you pass in an exclamation and we print that out. And so in this case, we

could now have our Lambda call that function. And so one of the things that

we can do now is to, again, we can create a function that returns a function. And so we could create a make_show_progress

function where you pass in the exclamation. We could then create, and there's no need to give

it a name actually, let's just return it directly. We can return a function that

calls that exclamation. So here we are passing in nice. And that's exactly the same

as doing something like what we've done before. We could say, instead of using a Lambda,

we can create an inner function like this. So here's now a function that returns a

function. This does exactly the same thing. Okay. So one way with the Lambda,

one way without a Lambda. One of the reasons I wanted to show you

that is that we can do exactly the

same thing using partial. So with partial, it's going to do exactly the same thing

as this kind of make_show_progress(). It's going to call show_progress() and

pass "OK I guess". So this is, again, an example of a function returning a function.

And so this is a function that calls show progress passing in this as the first parameter.
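The three equivalent ways of building the callback discussed above (a lambda, a function that returns a function, and partial) can be put side by side in a minimal sketch. The exact message text is an assumption, and the one-second sleep is left out so the example runs instantly.

```python
from functools import partial

def slow_calculation(cb=None):
    res = 0
    for i in range(5):
        res += i * i
        if cb: cb(i)            # the one added line: call back with where we're up to
    return res

def show_progress(exclamation, epoch):
    # Message wording is assumed for illustration.
    print(f"{exclamation}! We've finished epoch {epoch}!")

def make_show_progress(exclamation):
    """Return a one-argument function that remembers `exclamation`."""
    def _inner(epoch):
        show_progress(exclamation, epoch)
    return _inner

# 1. A lambda that fills in the first argument
slow_calculation(lambda epoch: show_progress("Nice", epoch))
# 2. A function that returns a function (a closure)
slow_calculation(make_show_progress("Nice"))
# 3. partial: fixes "OK I guess" as the first positional argument
slow_calculation(partial(show_progress, "OK I guess"))
```

In all three cases slow_calculation() receives something it can call with a single argument, the step number, which is all it needs to know about the callback.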

And again, it does exactly the same thing. Okay. So we tend to use partial a lot. So that's certainly something worth spending

time practicing. Now as we've discussed, Python doesn't care about types in particular.

And there's nothing about any of this that requires cb to be a function. It just has to

be a callable. A callable is something that you can call. And so as we've discussed, another way

of creating a callable is defining dunder call. So here's a class and this is going to work

exactly the same as our make show progress thing, but now as a class. So there's a dunder init,

which stores the exclamation and a dunder call, that prints. And so now we're creating an object

which is callable and does exactly the same thing. Okay. So these are all fundamental ideas that

I want you to get really comfortable with. The idea of dunder call, dunder things in general,

partials, classes, because they come up all the time in PyTorch code and in the code we'll be

writing and, in fact, pretty much all frameworks. So it's really important to feel comfortable

with them. And remember you don't have to rely on the resources we're providing. If there are

certain things here that are very new to you, Google around for some tutorials or ask for

help on the forums, finding things and so forth. And then I'm just going to briefly

re-cover something I've mentioned before, which is *args and **kwargs,

because again, they come up a lot. I just wanted to show you how they work. So if

we create a function that has *args and **kwargs, nothing else, and I'm just going to

have this function just print them. Now I'm going to call the function. I'm going

to pass 3. I'm going to pass "a" and I'm going to pass thing1="hello". Now the first two are passed, what

we would say, by position. We haven't got a blah equals. They're just stuck there. Things that are

passed by position are placed in *args, if you have one, it doesn't have to be called args. You

can call this anything you like, but in the star bit. And so you can see here that args is a tuple

containing the positionally passed arguments. And then kwargs is a dictionary containing the named

arguments. So that is all that *args and **kwargs do. And as I say, there's nothing special about

these names. I'll call this a, I'll call this b. Okay. And it'll do exactly the same

thing. Okay. So this comes up a lot. And so it's important to remember that this

is literally all that they're doing. And then, on the other hand, let's say we had

a function which takes a couple of, okay, let's try that, print a, actually,

we'll just print them directly a, b, c. Okay. We can also, rather than just using them as

parameters, we can also use them when calling something. So let's say I create something called

args, again, it doesn't have to be called args, which contains [1, 2]. And I create

something called kwargs that contains a dictionary, {‘c’: 3}. I can then call g()

and I can pass in *args comma **kwargs. And that's going to take this 1, 2,

and pass them as individual arguments, positionally. And it's going to take the {‘c’:

3} and pass that as a named argument, c equals 3. And there it is. Okay. So there are two

linked but different ways that use * and **. Okay. Now here's a slightly different way

of doing callbacks, which I really like. In this case, I'm now passing in

a callback that's not callable, but instead it's going to have a method called

before_calc and another method called after_calc. And I'm, so now my callback is going to be a class

containing a before_calc and an after_calc method. And so if I run that, you can see, there

it goes. Okay. And so this is printing before and after every step by calling before_calc() and

after_calc(). So callback actually doesn't have to be a callable. It doesn't have to be a function. A

callback could be something that contains methods. So we could have a version of this,

which actually, as you can see here, it's going to pass in to after_calc(), both

the epoch number and the value it's up to, but by using *args and **kwargs, I can just

safely ignore them if I don't want them. Right. So it's just going to chew them up and

not complain. If I didn't have those here, it won't work. See, because

it got passed in val equals and there's nothing here looking for

val equals. And it doesn't like that. So this is one good use of *args and **kwargs

is to eat up arguments you don't want. Or we could use the arguments. So let's

actually use epoch and val and print them out. And there it is. So this is a more sophisticated

callback that's giving us status as we go. Skip this bit because we don't really care about

that. Okay. So finally, let's just review this idea of dunder, which we've mentioned before,

but just to really nail this home, anything that looks like this, underscore underscore something

underscore underscore, is special. And basically it could be that Python has defined that

special thing or PyTorch has defined that special thing or NumPy has defined that special thing, but

they're special. These are called dunder methods. And some of them are defined as part of the

Python data model. And so if you go to the Python documentation, it'll tell you about these various

different— here's __repr__, which we used earlier. Here's __init__ that we used earlier. So

they're all here. PyTorch has some of its own, NumPy has some of its own. So for example, if

Python sees plus (+), what it actually does is it calls dunder add. So if we want to create

something that's not very good at adding things, it actually always adds 0.01 to it. Then I can say SloppyAdder(1) + SloppyAdder(2)

equals 3.01. So “+” here is actually calling dunder add. So if you're not familiar with

these, click on this data model link and read about these specific one, two, three, four,

five, six, seven, eight, nine, ten, eleven methods, because we'll be using all of these

in the course. So I'll try to revise them when we can, but I'm generally going to assume that

you know these. A particularly interesting one is getattr. We've seen setattr already. getattr

is just the opposite. Take a look at this. Here's a class. It just contains two attributes,

a and b, that are set to 1 and 2. So I'll create an object of that class. a.b equals 2, because I

set b to 2. Okay. Now when you say a.b, that's just syntax sugar basically, in Python. What it's

actually calling behind the scenes is getattr. It calls getattr on the object. And so this

one here is the same as getattr(a, ‘b’), which hopefully, oh, actually that'll be, yeah,

so it calls getattr(a, ‘b’). And this can kind of be fun because you could call getattr a, and then

either ‘b’ or ‘a’ randomly. How's that for crazy? So if I run this, 2, 1, 1, 1, 2, as you can see,

it's random. So yeah, Python is such a dynamic language. You can even set it up so you literally

don't know what attributes are going to be called. Now getattr, behind the scenes, is actually

calling something called dunder getattr. And by default, it'll use the version in the object base

class. So here's something just like a, it's got a and b defined, but I've also got dunder getattr

defined. And so dunder getattr, it's only called for stuff that hasn't been defined yet, and it'll

pass in the key or the name of the attribute. So generally speaking, if the first character

is an underscore, it's going to be private or special. So I'm just going to raise an

AttributeError. Otherwise I'm going to steal it and return f'Hello from {k}'. So if I

go b.a, that's defined. So it gives me 1. If I go b.foo, that's not defined. So it calls __getattr__ and

I get back hello from foo. And so, this gets used a lot in both fastai code and also Hugging Face

code to often make it more convenient to access things. So that's, yeah, that's how the getattr

function and the dunder getattr method work. Okay. So I went over that pretty quickly. Since

I know for quite a few folks, this will be all review, but I know for folks who haven't seen

any of this, this is a lot to cover. So I'm hoping that you'll kind of go back over this,

revise it slowly, experiment with it and look up some additional resources and ask on the forum

and stuff for anything that's not clear. Remember, everybody has parts of the course that are really

easy for them and parts of the course that are completely unfamiliar for them. And so

if this particular part of the course is completely unfamiliar to you, it's not because

this is harder or going to be more difficult or whatever. It just so happens that this is

a bit that you're less familiar with, or maybe the stuff about calculus in the last lesson was

a bit that you're less familiar with. There isn't really anything particularly in the course that's

more difficult than other parts. It's just that, you know, based on whether you happen to have

that background. And so, yeah, if you spend a few hours studying and practicing, you know,

you'll be able to pick up these things. And yeah, so don't stress if there are things that you don't

get right away. Just take the time. And if you, yeah, if you do get lost, please ask because

people are very keen to help. If you've tried asking on the forum, hopefully you've

noticed that people are really keen to help. All right. So, I think this has been a pretty

successful lesson. We've got to a point where we've got a pretty nicely optimized training

loop. We understand exactly what DataLoaders and Datasets do. We've got an optimizer. We've

been playing with Hugging Face datasets. And we've got those working really smoothly. So we

really feel like we're in a pretty good position to write our generic learner training loop and

then we can start building and experimenting with lots of models. So look forward to seeing you

next time to do that together. Okay. Bye.
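For review before the learner is built, the method-based callback pattern from this lesson, together with the *args and **kwargs behaviour it relies on, can be sketched as below. The class names, the show() helper and the message wording are assumptions for illustration. Only the before_calc and after_calc hook names, and the fact that after_calc receives the epoch plus a named val argument, come from the lesson.

```python
def show(*args, **kwargs):
    """Positional arguments land in a tuple, named ones in a dict."""
    return args, kwargs

print(show(3, "a", thing1="hello"))   # ((3, 'a'), {'thing1': 'hello'})

def slow_calculation(cb=None):
    res = 0
    for i in range(5):
        if cb: cb.before_calc(i)
        res += i * i
        if cb: cb.after_calc(i, val=res)
    return res

class QuietCallback:
    """Uses *args/**kwargs to swallow whatever the loop passes in."""
    def __init__(self):
        self.events = []
    def before_calc(self, *args, **kwargs):
        self.events.append("before")
    def after_calc(self, *args, **kwargs):
        self.events.append("after")

class PrintStatusCallback:
    """Uses the arguments it cares about; **kwargs absorbs any extras."""
    def before_calc(self, epoch, **kwargs):
        print(f"About to start: {epoch}")
    def after_calc(self, epoch, val, **kwargs):
        print(f"After {epoch}: {val}")

slow_calculation(PrintStatusCallback())
```

Without the **kwargs in QuietCallback.after_calc, the call after_calc(i, val=res) would raise a TypeError, which is the failure shown in the lesson when the catch-all parameters are removed.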
