Lesson 14: Deep Learning Foundations to Stable Diffusion

Jeremy Howard, 15,318 words

Full Transcript

Okay, hi everybody and welcome to Lesson 14. 

The numbers are getting up pretty high now, huh? We had a lesson last time talking about 

calculus and how we implement the chain rule in neural network training in an efficient way 

called backpropagation. I just wanted to point out that one excellent student, Kaushik Sinha, 

has produced a very nice explanation of the code that we looked at last time and I've linked 

to it. So it's got the math and then the code. The code's slightly different to what I had, but 

it's basically the same thing, some minor changes. And it might be helpful to kind of link between 

the math and the code to see what's going on. So you'll find that in the Lesson 13 

resources. But I thought I'd just quickly try to explain it as well. So maybe I could try to 

copy this and just explain what's going on here with this code. So the basic idea is that we

have a neural network that is calculating, well, a neural network and a loss function that 

together calculate a loss. So let's imagine, let's just call the loss

function, we'll call it L. And the loss function is being applied to the 

output of the neural network. So the neural network function we'll call n. And that takes two 

things, a bunch of weights and a bunch of inputs. The loss function also requires the targets, but 

I'm just going to ignore that for now because it's not really part of what we actually care 

about. And what we're interested in knowing is if we want to be able to update the weights, 

let's say this is just a single layer, to keep it simple. If we want to be able to update

the weights, we need to know how does the loss change if we change the weights, if we 

change one weight at a time, if you like. So how would we calculate that? Well, what we 

could do is we could rewrite our loss function by saying, well, let's call capital N the result 

of the neural network applied to the weights and the inputs. And that way we can now 

rewrite the loss function to say big L equals little l, the loss function

applied to the output of the neural network. And so maybe you can see where this is going. 
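In symbols, using the names just introduced (little l for the loss function, n for the network, capital N for its output and big L for the loss), the chain-rule step described next can be written roughly as follows; the partial-derivative notation is an editorial choice, since these quantities are not scalars:

```latex
N = n(w, x), \qquad L = l(N)

\frac{\partial L}{\partial w} = \frac{\partial L}{\partial N}\,\frac{\partial N}{\partial w},
\qquad
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial N}\,\frac{\partial N}{\partial x}
```

In the code, out.g holds the shared factor, the derivative of the loss with respect to N.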

We can now say, okay, the derivative of the loss with respect to the weights is going 

to be equal to the derivative of the loss with respect to the outputs 

of that neural network layer times (this is the chain rule) the derivative

of the outputs of that neural network layer with respect to the weights. I'm going to make my notation

consistent, since these are not scalars. So you can see we can get rid of those and we

end up with the change in loss with respect to the weights. And so we can just say this is 

a chain rule. This is what the chain rule is. So the change in the loss with respect 

to the output of the neural network: well, we did the forward pass here, and

then this here is where we calculated the derivative of the loss with

respect to the output of the neural network, which came out from here and ended up in diff. So 

there it is. So out.g contains this derivative. So then to calculate, let's actually do one 

more. We could also say the change in the loss with respect to the inputs, we can do 

the same thing with the chain rule times… And so this time we have the 

inputs. So here you can see that is this line of code. So that is the change 

in the loss with respect to the inputs. That's what input.g means. And it's equal to the 

change in the loss with respect to the output. So that's what out.g means. Times… It's 

actually a matrix multiplication, because we're doing matrix calculus, with this derivative, and

since this is a linear layer we were looking at, this derivative is simply the weights themselves. 
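Here's a small sketch of that mapping from math to code for a linear layer, checked against PyTorch's autograd. It assumes the weights are stored as (n_in, n_out) so the layer computes x @ w + b; the names lin and lin_grad and the toy mean-of-squares loss are illustrative assumptions, not necessarily the notebook's exact code.

```python
import torch

def lin(x, w, b):
    # Linear layer: x is (batch, n_in), w is (n_in, n_out), b is (n_out,)
    return x @ w + b

def lin_grad(inp, out_g, w):
    # Chain rule for a linear layer, given out_g = dL/d(out)
    inp_g = out_g @ w.T   # dL/d(inp): matrix product with the transposed weights
    w_g = inp.T @ out_g   # dL/d(w): matrix product with the transposed inputs
    b_g = out_g.sum(0)    # dL/d(b): sum over the batch
    return inp_g, w_g, b_g

torch.manual_seed(0)
x = torch.randn(8, 5, requires_grad=True)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

out = lin(x, w, b)
loss = (out ** 2).mean()  # toy loss standing in for the real one
loss.backward()           # autograd computes the reference gradients

# For mean(out**2), dL/d(out) = 2 * out / number_of_elements
out_g = 2 * out.detach() / out.numel()
inp_g, w_g, b_g = lin_grad(x.detach(), out_g, w.detach())

print(torch.allclose(inp_g, x.grad, atol=1e-6),
      torch.allclose(w_g, w.grad, atol=1e-6),
      torch.allclose(b_g, b.grad, atol=1e-6))  # True True True
```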

And then we have exactly the same thing for w.g, which is the change in the loss, 

the derivative of the loss with respect to the weights. And so again, you've got the same 

thing. You've got your out.g, and remember we actually showed how we can simplify this into 

a matrix product with a transpose as well. So that's how what's happening in

our code maps to the math. So hopefully that's useful, but as I say, do check

out this really nice resource, which has a lot more detail if you're interested in digging 

deeper. The other thing I'd say is if you, some people have mentioned that they actually 

didn't study this at high school, which is fine. We've provided resources on the forum for 

recommending how to learn the basics of derivatives and the chain rule. And so in 

particular, I would recommend 3Blue1Brown's Essence of Calculus series and also Khan

Academy. It's not particularly difficult to learn. It'll only take you a few hours and 

then you can, this will make a lot more sense. Or if you did it at high school, 

but you've forgotten it, same deal. So don't worry if you found this difficult because 

you had forgotten the, or had never learned the basic derivative and chain rule stuff. That's 

something that you can pick up now and I would recommend doing so. Okay. So what we then did last 

time, which is actually pretty exciting, is we got to a point where we had successfully created 

a training loop, which did these four steps. And the nice thing is that every

single thing here is something that we have implemented from scratch. Now, we didn't 

always use our implemented-from-scratch versions. There's no particular reason to: when we've

re-implemented something that already exists, we might as well use the version that exists. But every

single thing here, well, I guess not argmax, but that's trivially easy to implement. Every 

single thing here, we have implemented ourselves, and we successfully trained an MNIST model to

recognize handwritten digits with 96% accuracy. So I think that's super neat. Now, I mean,

this is not a great metric. It's only looking at the training set, in

particular it's only looking at one batch of the training set. Since last time, I've just 

refactored a little bit. I've pulled out this report function, which is now just running at 

the end of each epoch. And it's just printing out the loss and the accuracy. Just something 

I wanted to mention here is hopefully you've seen f-strings before. They're a really helpful 

part of Python that lets you pop a variable or an expression inside curly braces in a string and 

it'll evaluate it. You might not have seen this colon thing. This is called a format specifier. 
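A quick illustration of f-strings with format specifiers like the one in the report function. The loss and accuracy values are made up, and the percentage specifier is just one more option from Python's format mini-language, not necessarily something the notebook uses.

```python
loss, accuracy = 0.123456, 0.9612  # made-up numbers, only for illustration

# Inside {}, anything after the colon is a format specifier:
# ".2f" means a floating-point number with two decimal places
line = f"{loss:.2f}, {accuracy:.2f}"
print(line)  # 0.12, 0.96

# Any expression can go in the braces, and ".1%" formats a fraction as a percentage
print(f"double the loss: {loss * 2:.3f}")  # double the loss: 0.247
print(f"accuracy: {accuracy:.1%}")         # accuracy: 96.1%
```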

And with a format specifier, you can change how things are printed in an f-string. So this is 

how I'm printing it to two decimal places. This says: a floating-point number with two decimal places,

called loss, printed out here, followed by a comma. So I'm not going to go into detail on those

other than to say, yeah, Python f-strings and format specifiers are really helpful. And so if

you haven't used them before, do go look them up in a tutorial or the documentation, because they're

something that you'll probably find useful to know about. Okay. So let's

just rerun all those lines of code. If you're wondering how I just reran 

all the cells above where I was, there's a Cell menu here, and in it there's Run All Above.

And it's so helpful that I always make sure there's a keyboard shortcut for that. So you 

can see here, I've added a keyboard shortcut QA. So if I type QA, it runs all cells above. If 

I type QB, it runs all cells below. And so yeah, stuff that you do a lot, make sure you've got 

keyboard shortcuts for them. You don't want to be fiddling around, moving around your mouse 

everywhere. You want it to be as easy as thinking. So this is really exciting. We've successfully 

built and trained a neural network model from scratch and it works okay. It's a bit clunky. 

There's a lot of code. There's features we're missing. So let's start refactoring it. And 

so refactoring is all about making it so we have to write less code to do the same work. 

And so we're now going to, I'm going to show you something that's part of PyTorch and 

then I'm going to show you how to build it. And then you'll see why this is really useful. So 

PyTorch has a submodule called nn, torch.nn. And in there, there's something called the Module

class. Now we don't normally use it this way, but I just want to show you how it works. We

can create an instance of it in the usual way we create instances of classes, and then we

can assign things to attributes of that module. So for example, let's assign a linear

layer to it. And if we now print out that, you'll see it says, oh, this is a 

module containing something called foo, which is a linear layer. But here's something 

quite tricky. This module, we can say, show me all of the named children 

of that module. And it says, oh, there's one called foo and it's a linear layer. 
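A minimal sketch of what's being typed here. The names m1 and foo follow the narration; the 3-input, 4-output sizes are inferred from the four-by-three weights mentioned just below, so treat them as an assumption. It also shows the list() trick for looking inside the generators that named_children() and parameters() return.

```python
import torch
from torch import nn

m1 = nn.Module()
m1.foo = nn.Linear(3, 4)  # assigning a Module as an attribute registers it as a child

print(m1)  # a Module containing (foo): Linear(in_features=3, out_features=4, bias=True)

# named_children() and parameters() return generators, so wrap them in list() to see inside
children = list(m1.named_children())
params = list(m1.parameters())
print(children)                   # [('foo', Linear(in_features=3, out_features=4, bias=True))]
print([p.shape for p in params])  # [torch.Size([4, 3]), torch.Size([4])]
```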

And we can say, oh, show me all of the parameters of this module. And it says, oh, okay, 

sure. There's two of them. There's this four by three tensor, that's the weights. And 

there's this four long vector, that's the biases. And so somehow just by creating this module 

and assigning this to it, it's automatically tracked what's in this module and what are its 

parameters. That's pretty neat. So we're going to see both how and why it does that. I'm just going 

to point out, by the way, why did I add list here? If I just said m1.named_children(), it just 

prints out a generator object, which is not very helpful. And that's because this is a kind of

iterator called a generator. And it's something which is going to only produce the contents 

of this when I actually do something with it, such as list them out. So just popping a list 

around a generator is one way to run the generator and get its output. So that's a little trick 

when you want to look inside a generator. Okay. So now, as I said, we don't normally use 

it this way. What we normally do is we create our own class. So for example, we'll create 

our own multi-layer perceptron class, which inherits from nn.Module. And so then in

dunder init, this is the thing that constructs an object of the class. This is the special magic 

method that does that. We'll say, okay, well, how many inputs are there to this multi-layer 

perceptron? How many hidden activations and how many output activations are there? So it'd just 

be one hidden layer. And then here we can do just like we did up here, where we assigned things as 

attributes, we can do that in this constructor. So we'll create an l1 attribute, which is a 

linear layer from number in to number hidden. l2 is a linear layer from number hidden to 

number out, and we'll also create a ReLU. And so, when we call that module, we 

can take the input that we get and run the linear layer and then run the ReLU and 

then run the l2. And so I can create one of these, as you see, and I can have a look and see like, 

oh, here's the attribute l1. And there it is, like I had, and I can say, print out the model and 

the model knows all the stuff that's in it. And I can go through each of the named children and 

print out the name and the layer. Now, of course, if you remember, although you can use dunder call, 

we actually showed how we can refactor things using forward such that it would automatically 

kind of do the things necessary to make all the automatic gradient stuff work correctly. And 

so in practice, we're actually not going to define dunder call; we'd define forward instead. So this is an

example of creating a custom PyTorch module. And the key thing to recognize is that it knows 

what are all the attributes you added to it. And it also knows what are all the parameters. So 

if I go through the parameters and print out their shapes, you can see I've got my first linear

layer's weights, my first linear layer's biases, my second linear layer's weights and my

second linear layer's biases. And this 50

is because we set nh, the number of hidden activations, to 50. So why is that interesting? Well, because

now I don't have to write all this anymore going through layers and having to make 

sure that they've all been put into a list. We've just been able to add them as 

attributes and they're automatically going to appear as parameters. So we 

can just say, go through each parameter and update it based on the gradient 

and the learning rate. And furthermore, you can actually just go model.zero_grad() 

and it'll zero out all of the gradients. So that's really made our code quite a lot nicer 

and quite a lot more flexible, which is cool. So let's check that this still works. There we go. So just to clarify, if I called 

report() on this before I ran it, as you would expect, the accuracy is about 8%, well, about 

10%, a bit less, and the loss is pretty high. And so after I run this fit(), this model, 

the accuracy goes up and the loss goes down. So basically all of this is

exactly the same as before. The only thing I've changed is these two lines

of code. So that's a really useful refactoring. So how on earth did this happen? How did it know 

what the parameters and layers are automatically? It used a trick called dunder setattr, and 

we're going to create our own nn.Module now. So if there was no such thing as 

nn.Module, here's how we'd build it. And so let's actually build it and also add some 

things to it. So in dunder init, we would have to create a dictionary for our named children. This 

is going to contain a dictionary of all of the layers. Okay. So just like before,

we'll create a couple of linear layers, right? And then what we're going to do is we're 

going to define this special magic thing that Python has called dunder setattr. And this is 

called automatically by Python, if you have it, every time you set an attribute such as here or 

here. And it's going to be passed the name of the attribute, the key, and the value is the actual 

thing on the right hand side of the equal sign. Now, generally speaking, things that start with an 

underscore we use for private stuff. So we check that it doesn't start with an underscore. And if 

it doesn't start with an underscore, setattr will put this value into the modules dictionary 

with this key and then call the normal Python setattr to make sure it

actually does the attribute setting. super() is how you call whatever

is in the superclass, the base class. So another useful thing to know about is how does

it do this nifty thing where you can just type the name and it kind of lists out all this information 

about it. That's a special thing called dunder repr. So here dunder repr will just have it return 

a stringified version of the modules dictionary. And then here we've got parameters(). How did 

parameters work? So how did this thing work? Well, we can go through each of those modules, 

go through each value. So the values of the modules is all the actual layers and then go 

through each of the parameters in each module and yield p. So that's going to create an 

iterator over all the parameters, if you remember when we looked at iterators. So let's

try it. So we can create one of these modules, and if we loop, just like before,

through its parameters, there they are. Now I'll just mention something that's optional, 

kind of like advanced Python that a lot of people don't know about, which is there's no need to 

loop through a list or a generator, or I guess I should say an iterator, and yield each item. There's

actually a shortcut: you can just say yield from and then give it the iterator.
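Here's a minimal sketch of the from-scratch module described above: dunder setattr records public attributes, dunder repr prints them, and parameters uses yield from. The class name MyModule and the 784/50/10 sizes are assumptions for illustration; the child layers are PyTorch's nn.Linear only because they already have a parameters method.

```python
import torch
from torch import nn

class MyModule:
    "A from-scratch stand-in for nn.Module that tracks child layers."
    def __init__(self, n_in, nh, n_out):
        self._modules = {}             # starts with "_", so it is not recorded as a child
        self.l1 = nn.Linear(n_in, nh)  # routed through __setattr__ below
        self.l2 = nn.Linear(nh, n_out)

    def __setattr__(self, k, v):
        # Record every public attribute in the modules dictionary...
        if not k.startswith("_"):
            self._modules[k] = v
        # ...then let the base class (object) do the normal attribute setting
        super().__setattr__(k, v)

    def __repr__(self):
        return f"{self._modules}"

    def parameters(self):
        # yield from hands out every item of each child's parameter iterator, one at a time
        for layer in self._modules.values():
            yield from layer.parameters()

mdl = MyModule(784, 50, 10)
print(mdl)
print([tuple(p.shape) for p in mdl.parameters()])  # [(50, 784), (50,), (10, 50), (10,)]
```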

And so with that, we can get this all down to one line of code and it'll do exactly 

the same thing. So that's basically saying yield one at a time, everything in here, that's 

what yield from does. So there's a cool little advanced Python thing, totally optional, but if 

you're interested, I think it can be kind of neat. So we've now learned how to create our own 

implementation of nn.Module and therefore we are now allowed to use PyTorch's 

nn.Module. So that's good news. So how would we do using the PyTorch nn.Module, how 

would we create the model that we started with, which is where we had this self.layers? Because we 

want to somehow register all of these all at once. That's not going to happen 

based on the code we just wrote. So to do that, let's have a look. We can, so 

let's make a list of the layers we want. And so we'll create again a subclass of nn.Module. Make 

sure you call the superclass's dunder init first, and we'll just store the list of layers and

then to tell PyTorch about all those layers, we basically have to loop through them and call add_module() and say what the name of 

the module is and what the module is. And again, probably should have used 

forward here in the first place. And you can see this has now done exactly the same 

thing. Okay. So if you've used a sequential model before, you can see that we're

on the path to creating a sequential model. Okay. So Ganesh has asked an interesting 

question, which is what on earth is super calling, because we actually, in fact, we

don't even need the parentheses here: we actually don't have a base class. Well, that's

because if you don't put any parentheses, or if you put empty parentheses, it's actually

a shortcut for inheriting from object. And Python has stuff in object, which does, you know,

all the normal objecty things like storing your attributes so that you can get them 

back later. So that's what's happening there. Okay. So it's a little bit awkward to

have to store the list and then enumerate and call add_module(). So now that we've implemented 

that from scratch, we can use PyTorch's version, which is they've just got something called 

ModuleList that just does that for you. Okay. So if you use ModuleList and pass it 

a list of layers, it will just go ahead and register all those modules for you. So

here's something called SequentialModel. This is just like nn.Sequential now. So if I create it 

passing in the layers, there you go. You can see there's my model containing my 

module list with my layers. And so, I don't know why I never used 

forward for these things. It's silly. I guess it doesn't matter 

terribly at this stage, but anywho. Okay. So, call fit() and there we go. Okay. So in forward

here, I just go through each layer and I set the result of that equal to calling that layer on the

previous result, and then return it at the end. Now there's another way of doing

this, which I think is kind of fun. It's not, like, shorter or anything at this stage. I just 

wanted to show an example of something that you see quite a lot in machine learning code, which 

is the use of reduce(). This implementation here is exactly the same as this thing here. So let me explain how reduce

works. It's based on a very common, fundamental computer science concept: reductions.
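A minimal sketch of the sequential model being discussed, registered with nn.ModuleList, next to the same forward pass written with functools.reduce. The layer sizes and the helper name forward_with_reduce are assumptions for illustration.

```python
from functools import reduce
import torch
from torch import nn

# reduce(f, seq, init) computes f(...f(f(init, seq[0]), seq[1])..., seq[-1])
print(reduce(lambda acc, n: acc * n, [2, 3, 4], 1))  # 24

class SequentialModel(nn.Module):
    def __init__(self, layers):
        super().__init__()                   # call the superclass __init__ first
        self.layers = nn.ModuleList(layers)  # registers every layer as a child module

    def forward(self, x):
        # Explicit loop: each layer is called on the previous layer's output
        for layer in self.layers:
            x = layer(x)
        return x

def forward_with_reduce(layers, x):
    # Start from x, then feed the running value into each layer in turn
    return reduce(lambda val, layer: layer(val), layers, x)

layers = [nn.Linear(784, 50), nn.ReLU(), nn.Linear(50, 10)]
model = SequentialModel(layers)
xb = torch.randn(64, 784)

out_loop = model(xb)
out_reduce = forward_with_reduce(model.layers, xb)
print(out_loop.shape, torch.equal(out_loop, out_reduce))  # torch.Size([64, 10]) True
```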

This is something that does a reduction, and what a reduction is, is something that says: start

with some initial value (that's the third parameter). So we're going to start with x, the thing being

passed in, and then loop through a sequence. So loop through each of our layers and then for each

layer call some function. Here is our function. And the function is going to get passed, first 

time around, it’ll be passed the initial value and the first thing in your 

list. So your first layer and x. So it's just going to call the layer function on 

x. The second time around it takes the output of that and passes it in as the first parameter and 

passes in the second layer. So then the second time this goes through, it's going to be calling 

the second layer on the result of the first layer and so forth. And that's what a reduction 

is. And so you'll certainly see reduce() talked about quite a lot

in papers and books, and you might sometimes also see it in code. It's a very general concept. And 

so here's how you can implement a sequential model using reduce(). So there's no explicit loop there, 

although the loop is still happening internally. So now that we've reimplemented sequential, we 

can just go ahead and use PyTorch's version. So there's nn.Sequential. We can pass in our 

layers and we can fit, not surprisingly. We can see the model. So yeah, it looks very 

similar to the one we built ourselves. All right. So this thing of 

looping through parameters and updating our parameters based on gradients 

and a learning rate, and then zeroing them is very common. So common that there is something 

that does that all for us. And that's called an optimizer. It's the stuff in optim. So let's 

create our own optimizer. And as you can see, it's just going to do the two things we just saw. 

It's going to go through each of the parameters and update them using the gradient and the 

learning rate. And there's also zero grad, which will go through each parameter 

and set their gradients to zero. If you use .data, it's just a way of avoiding 

having to say torch.no_grad, basically. So in optimizer, we're going to pass it 

the parameters that we want to optimize, and we're going to pass it the learning rate. 

And we're just going to store them away. And since the parameters might be a generator, 

we'll call list() to turn them into a list. So we're going to create our optimizer, pass 

it in the model.parameters(), which have been automatically constructed for us by nn.Module. 
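A sketch of the optimizer just described, under a few assumptions: the class name Optimizer, the default learning rate of 0.5 and the tiny random-data model are illustrative, not necessarily the notebook's exact code.

```python
import torch
from torch import nn

class Optimizer:
    "A from-scratch SGD optimizer (sketch)."
    def __init__(self, params, lr=0.5):
        self.params = list(params)  # params may be a generator, so store them as a list
        self.lr = lr

    def step(self):
        with torch.no_grad():       # parameter updates shouldn't be tracked by autograd
            for p in self.params:
                p -= p.grad * self.lr

    def zero_grad(self):
        for p in self.params:
            p.grad.data.zero_()     # .data sidesteps autograd tracking, like torch.no_grad

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
opt = Optimizer(model.parameters(), lr=0.1)
xb, yb = torch.randn(32, 4), torch.randn(32, 1)

before = [p.detach().clone() for p in opt.params]
loss = ((model(xb) - yb) ** 2).mean()
loss.backward()
opt.step()
opt.zero_grad()

changed = any(not torch.equal(b, p) for b, p in zip(before, opt.params))
print(changed)  # True
```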

And so here's our new loop. Now we don't have to do any of the stuff manually. We can just 

say opt.step(). So that's going to call this. And opt.zero_grad(). And 

that's going to call this. There it is. So we've now built our own SGD 

optimizer from scratch. So I think this is really interesting, right? These things which 

seem like they must be big and complicated, once we have this nice structure in place, 

an SGD optimizer doesn't take much code at all. And so it's all very transparent, 

simple, clear. If you're having trouble using complex library code that you've found 

elsewhere, a really good approach can be to actually just go all the way back, remove

as many of these abstractions as you can and run everything by hand to see exactly what's 

going on. It can be really freeing to see that you can do all this. Anyway, since PyTorch 

has this for us in torch.optim, it's got an optim.SGD(). And just like our version, you pass 

in the parameters and you pass in the learning rate. So you can really see it's just the same.
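For comparison, here's roughly what the same training steps look like with PyTorch's own optim.SGD. The model shape, the random stand-in batch and the learning rate of 0.1 are assumptions for illustration.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
# Hypothetical stand-ins: a small MNIST-shaped model and one random batch
model = nn.Sequential(nn.Linear(784, 50), nn.ReLU(), nn.Linear(50, 10))
opt = optim.SGD(model.parameters(), lr=0.1)  # same arguments as the from-scratch version
xb = torch.rand(64, 784)          # stand-in for 64 flattened 28x28 images
yb = torch.randint(0, 10, (64,))  # stand-in digit labels
loss_func = nn.functional.cross_entropy

for step in range(3):
    loss = loss_func(model(xb), yb)
    loss.backward()  # fill p.grad for every parameter
    opt.step()       # plain SGD: p -= lr * p.grad
    opt.zero_grad()  # newer PyTorch versions set grads to None rather than zero
    print(step, loss.item())
```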

So let's define something called get_model(). That's going to return the model, the sequential 

model and the optimizer for it. So if we go model, opt equals get_model(), and then we can call 

the loss function to see where it's starting. And so then we can write our training loop 

again. Go through each epoch, go through each starting point for our batches, grab the slice, 

slice into our X and Y in the training set, calculate our predictions, calculate our loss, 

do the backward pass, do the optimizer step, do the zero gradient and print out how you're 

going at the end of each one. And there we go. All right. So let's keep making this 

simpler. There's still too much code. So one thing we could do is we could replace 

these lines of code with one line of code by using something we'll call the Dataset class. So the 

Dataset class is just something that we're going to pass our independent and dependent variables to.
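A minimal sketch of the Dataset class being described, with random stand-in tensors shaped like flattened MNIST (the notebook uses the real images and labels).

```python
import torch

class Dataset:
    "Wraps independent (x) and dependent (y) variables so they can be indexed together."
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __len__(self):
        # Called by len(ds)
        return len(self.x)

    def __getitem__(self, i):
        # Called by ds[i]; i can be an integer, a slice, or a tensor of indices
        return self.x[i], self.y[i]

# Stand-in data shaped like flattened 28x28 images and their digit labels
x_train, y_train = torch.randn(100, 28 * 28), torch.randint(0, 10, (100,))
train_ds = Dataset(x_train, y_train)

xb, yb = train_ds[0:5]
print(len(train_ds), xb.shape, yb.shape)  # 100 torch.Size([5, 784]) torch.Size([5])
```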

We'll store them away as self.x and self.y. If you define dunder len,

then that's the thing that allows the len function to work. So the length of the Dataset will just 

be the length of the independent variables. And then dunder getitem is the thing that 

will be called automatically anytime you use square brackets in Python. So that just 

is going to call this function passing in the indices that you want. So when 

we grab some items from our Dataset, we're going to return a tuple of the x values and 

the y values. So then we'll be able to do this. So let's create a Dataset using 

this tiny little three-line class. It's going to be a Dataset containing the

x and y training, and then create another Dataset containing the x and y valid. And those 

two datasets we'll call train_ds and valid_ds. So let's check: the lengths of those datasets should

be the same as the lengths of the x's, and they are. And so now we can do exactly what

we hoped we could do. We can say xb comma yb equals train_ds and pass in some slice. That's going to give us back our x's and y's. Let's check the

shapes are correct. The x's should be 5 by 28 times 28, that is 5 by 784, and the y's should just be

5. And so here they are, the x's and the y's. So that's nice. We've created a Dataset from

scratch. And again, it's not complicated at all. And if you look at the actual PyTorch 

source code, this is basically all Dataset does. So let's try it. We call get_model(). And so now

we've replaced our dataset line with this one and per usual, it still runs. And so this is what 

I do when I'm writing code is I try to, like, always make sure that my starting code works as 

I refactor. And so you can see all the steps. And so somebody reading my code can then see 

exactly like, why am I building everything I'm building? How does it all fit in? See that 

it still works. And I can also keep it clear in my own head. So I think this is a really 

nice way of implementing libraries as well. All right. So now we're going to 

replace these two lines of code with this one line of code. So we're going 

to create something called a DataLoader and a DataLoader is something that's just going to 

do this. Okay. So we need to create an iterable, which is a class that has a dunder

iter method. When you say "for ... in" in Python, behind the scenes it's actually calling dunder

iter to get an iterator, which it can then loop through. If dunder iter uses yield, then calling it

gives you a generator, which is exactly such an iterator. So a DataLoader is something

that's going to have a Dataset and a batch size, because we're going to go through the 

batches and grab one batch at a time. So we have to store away the Dataset and the batch 

size. And so when you, when we call the for loop, it's going to call dunder iter. We're going to 

want to do exactly what we saw before, go through the range, just like we did before, and then 

yield that bit of the data set. And that's all. So that's a DataLoader. So we can now create 

a train DataLoader and a valid DataLoader from our train Dataset and valid Dataset. And so 

now we can, if you remember, grab one thing out of an iterator without needing to use a for

loop: you can just say iter, which will also call dunder iter, and then next will

just grab one value from it. So here we will run this, and you can see we've now just confirmed

that xb is 50 by 784, and there's yb. And then we can check what it looks like. So

let's grab the first element of our X batch, make it 28 by 28. And there it is. So now that 

we've got a DataLoader, again, we can grab our model and we can simplify our fit function to just 

go for xb, yb in train_dl. So this is getting nice and small, don't you think? And it still works 

the same way. Okay. So this is really cool. And now that it's nice and concise, 

we can start adding features to it. So one feature I think we should add is that 

our training set, each time we go through it, it should be in a different order. It should 

be randomized, the order. So instead of always just going through these indexes in order, we 

want some way to say, go use random indexes. So the way we can do that is create a class 

called Sampler. And what sampler is going to do, I'll show you, is if we create a sampler 

without shuffle, without randomizing it, it's going to simply return all 

the numbers from zero up to n in order and it'll be an iterator. See, this 

is dunder iter. But if I do want it shuffled, then it will randomly shuffle them. So here you 

can see I've created a sampler without shuffle. So if I then make an iterator from that and print 

a few things from the iterator, you can see it's just printing out the indexes it's going to 

want. Or I can do exactly the same thing as we learned earlier in the course using islice. We 

can grab the first five. So here's the first five things from a sampler when it's not shuffled. 
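A minimal sketch of a Sampler like the one being described, using itertools.islice to peek at the first five indices. Passing a plain list as the stand-in dataset is an assumption; anything with a length works here.

```python
import random
from itertools import islice

class Sampler:
    "Yields indices 0..n-1, optionally in a random order (sketch)."
    def __init__(self, ds, shuffle=False):
        self.n, self.shuffle = len(ds), shuffle

    def __iter__(self):
        res = list(range(self.n))
        if self.shuffle:
            random.shuffle(res)  # shuffles the list in place
        return iter(res)

data = list(range(10))  # stand-in dataset: only its length matters
print(list(islice(Sampler(data), 5)))                # [0, 1, 2, 3, 4]
print(list(islice(Sampler(data, shuffle=True), 5)))  # five random indices
```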

So as you can see, these are just indexes. So we could add shuffle equals true. And now 

that's going to call random.shuffle(), which just randomly permutes them. And now if I do the same

thing, I've got random indexes of my source data. So why is that useful? Well, what we could now 

do is create something called a BatchSampler. And what the BatchSampler is going to do is it's 

going to basically do this islice thing for us. So we're going to say, okay, pass in a sampler. 

So that's something that generates indices and pass in a batch size. And remember, we've 

looked at chunking before. It's going to chunk that iterator by that batch size. And so if I now say, all right, please 

take our sampler and create batches of 4. As you can see here, it's creating batches 

of four indices at a time. So rather than just looping through them in order, I 

can now loop through this BatchSampler. So we're going to change our data loader 

so that now it's going to take some BatchSampler. And it's going to loop through the 

BatchSampler. That's going to give us indices. And then we're going to get that Dataset item 

from that batch for everything in that batch. So that's going to give us a list. And then we 

have to stack all of the x’s and all of the y’s together into tensors. So I've created 

something here called a collate function. And we're going to default that to this little

function here, which is going to grab our batch, pull out the x’s and y’s separately, 

and then stack them up into tensors. So this is called our collate function. 
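Putting those pieces together, here's a minimal sketch of a BatchSampler, a collate function and a DataLoader built on top of them. The chunked helper is a small stand-in for the chunking utility used in the course, and a list of (x, y) tuples stands in for the real Dataset; both are assumptions for illustration.

```python
import random
from itertools import islice
import torch

class Sampler:
    def __init__(self, ds, shuffle=False):
        self.n, self.shuffle = len(ds), shuffle
    def __iter__(self):
        res = list(range(self.n))
        if self.shuffle:
            random.shuffle(res)
        return iter(res)

def chunked(it, size):
    # Hypothetical helper: split an iterator into lists of `size` items (the last may be shorter)
    it = iter(it)
    while chunk := list(islice(it, size)):
        yield chunk

class BatchSampler:
    def __init__(self, sampler, bs):
        self.sampler, self.bs = sampler, bs
    def __iter__(self):
        yield from chunked(iter(self.sampler), self.bs)

def collate(b):
    # b is a list of (x, y) tuples; zip(*b) regroups it into all the x's and all the y's
    xs, ys = zip(*b)
    return torch.stack(xs), torch.stack(ys)

class DataLoader:
    def __init__(self, ds, batch_sampler, collate_fn=collate):
        self.ds, self.batch_sampler, self.collate_fn = ds, batch_sampler, collate_fn
    def __iter__(self):
        # For each batch of indices, fetch those items and collate them into tensors
        yield from (self.collate_fn([self.ds[i] for i in b]) for b in self.batch_sampler)

ds = [(torch.randn(784), torch.tensor(i % 10)) for i in range(100)]  # stand-in dataset
train_dl = DataLoader(ds, BatchSampler(Sampler(ds, shuffle=True), bs=32))
xb, yb = next(iter(train_dl))
print(xb.shape, yb.shape)  # torch.Size([32, 784]) torch.Size([32])
```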

Okay. So if you put all that together, we can create a training sampler, which is a batch 

sampler over the training set with shuffle true. A validation sampler will be a batch sampler 

over the validation set with shuffle false. And so then we can pass that 

into this DataLoader class, the training data set and the training sampler 

and the collate function, which we don't really need because it's, we're just using the default 

one. So I guess we can just get rid of that. And so now here we go. We can do 

exactly the same thing as before, using next and iter to grab xb, yb. And this time we use the valid

DataLoader, check the shapes. And this is how PyTorch's actual DataLoader works. These are

all the pieces they have. They have samplers, they have batch samplers, they have

a collation function and they have DataLoaders. So remember that what I want you 

to be doing for your homework is experimenting with these carefully to see 

exactly what each thing's taking in. Okay. So Piotr is asking on the chat, what is this 

collate thing doing? Okay. So collate function, it defaults to collate. What does it do? Well, 

let's see, let's go through each of these steps. Okay. So we need, so we've got a batch 

sampler, so let's do just the valid sampler. Okay. So the batch sampler, here it is. So we're going to go through each 

thing in the batch sampler. So let's just grab one thing from the batch sampler. Okay. So the 

output of the batch sampler will be next of iter of it. Okay. So here's what the batch sampler

contains. All right. Just the first 50 indices, not surprisingly, because this is our

validation sampler. If we did a training sampler, that would be randomized. There they 

are. Okay. And what we then do is we go self.dataset[i] for i in b. 

So let's copy that. Copy, paste. And so rather than self.dataset[i], 

we'll just say valid_ds[i]. Oh, and it's not i in b, it's i

in o, that's what we called it. Oh, and we did it for training. Sorry. Training.

Okay. So what it's created here is a list of tuples of tensors, I think. Let's have a

look. So let's call this p, or whatever. So p[0]. Okay, it's a tuple. It's got

the x and the y, the independent and dependent variables. So that's not what we want. What we want is something that

we can loop through. We want to get batches. So what the collate function is going to do

is take all of our x’s and all

of our y’s and collate them into two tensors, one tensor of x’s and one tensor of y’s. So the 

way it does that is it first of all calls zip(). So zip is a very, very commonly used Python 

function. It's got nothing to do with the compression program zip, but instead what it does 

is it effectively allows us to transpose things so that now, as you can see, we've got all of the 

second elements or index 1 elements all together and all of the index 0 elements together. And 

so then we can stack those all up together and that gives us our y’s for our batch. So that's 

what collate does. The collate function is used an awful lot in PyTorch, increasingly so nowadays

because Hugging Face stuff uses it a lot. And so we'll be using it a lot as well. And basically

it's a thing that allows us to customize what we do with the data that we get back from our Dataset: once

we've generated a list of things from the Dataset, how do we put it together

into a bunch of things that our model can take as inputs? Because that's really what we want 

here. So that's what the collation function does. Oh, this is the wrong way around. Like so. Storing every argument passed to dunder init as an attribute is something that I do so often that

fastcore has a quick little shortcut for it, just called store_attr, store attributes. And so if you 

just put that in your dunder init, then you just need one line of code and it does exactly the same 

thing. So there's a little shortcut as you see. And so you'll see that quite a bit. All 

right. Let's have a seven minute break and see you back here very soon. And we're going 

to look at a multi-processing DataLoader, and then we'll have nearly finished 

this notebook. All right. See you soon. All right. Let's keep going. So we've seen how 

to create a DataLoader and sample from it. The PyTorch DataLoader works exactly like this,

but it uses a lot more code because it implements multi-processing. And so multi-processing means

that this code here, the code that grabs each item, can be run in multiple processes. They can be run

in parallel for multiple items. So this code, for example, might be opening up a JPEG, rotating it, 

flipping it, et cetera, because remember this is just calling the dunder getitem for a

Dataset. So that could be doing a lot of work for each item and we're doing it for every item in the 

batch. So we'd love to do those all in parallel. So I'll show you a very quick and dirty 

way that basically does the job. So Python has a multi-processing library. It doesn't 

work particularly well with PyTorch tensors. So PyTorch has created an exact re-implementation of 

it. So it's identical API-wise, but it does work well with tensors. So that's what

we've imported as multiprocessing here. This is not quite cheating, because multiprocessing is in

the standard library and this is API-equivalent. So I'm going to say we're allowed to do that.

So as we've discussed, you know, when we call square brackets on an object, it's actually

identical to calling the dunder getitem function on the object. So you can see here, if 

we say, give me items 3, 6, 8, and 1, it's the same as calling dunder 

getitem passing in 3, 6, 8, and 1. Now why does this matter? Well, I'll show you why. 
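
A small runnable sketch of that equivalence. In the notebook the Dataset wraps tensors, which already accept a list of indices; this stdlib-only version uses a hypothetical `ListDataset` that handles the list case explicitly.

```python
class ListDataset:
    """Toy dataset whose __getitem__ accepts one index or a list of indices."""
    def __init__(self, xs, ys):
        self.xs, self.ys = xs, ys

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, i):
        if isinstance(i, list):
            # A list of indices returns a pre-collated pair: (list of xs, list of ys).
            return [self.xs[j] for j in i], [self.ys[j] for j in i]
        return self.xs[i], self.ys[i]

ds = ListDataset([f"img{i}" for i in range(10)], list(range(10)))
# Square brackets are just syntax for calling __getitem__.
print(ds[[3, 6, 8, 1]])              # (['img3', 'img6', 'img8', 'img1'], [3, 6, 8, 1])
print(ds.__getitem__([3, 6, 8, 1]))  # same result
```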

It matters because we're going to be able to use map and I'll explain why we want to use map in 

a moment. Map is a really important concept. You might've heard of map-reduce. So we've already 

talked about reductions and what those are. Maps are kind of the other key piece. Map is something 

which takes a sequence and calls a function on every element of that sequence. So imagine we had 

a couple of batches of indices, 3 and 6 and 8 and 1. Then we're going to call dunder getitem 

on each of those batches. So that's what map does. Map calls this function on every element 

of the sequence. And so that's going to give us the same stuff as before, but now

batched into two batches. Now why do we want to do that? Because multiprocessing has something called 

Pool where you can tell it how many workers do you want to run, how many processes you want to run. 

And it then has a map which works just like the normal Python map, but it runs this function 

in parallel over the items from this iterator. So this is how we can create a multiprocessing 

DataLoader. So here we're creating our DataLoader. And again, we don't actually need to pass in the 

collate function because we're using the default one. So if we say n_workers equals 2 and then

create that and call next, see how it's taking a moment? It took a moment because it was

firing off those two workers in the background. So the first batch actually comes out more slowly. 
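
A runnable sketch of the map-then-pool idea. The lesson uses PyTorch's drop-in replacement for `multiprocessing`; this version swaps in the standard library's `multiprocessing.pool.ThreadPool`, which offers the same `map` interface but uses threads, so it runs anywhere without pickling concerns. `SquareDataset` and `PoolDataLoader` are hypothetical names, and squaring stands in for expensive per-item work.

```python
from multiprocessing.pool import ThreadPool

class SquareDataset:
    """Stand-in dataset: imagine __getitem__ decoding and augmenting JPEGs."""
    def __getitem__(self, idxs):
        return [i * i for i in idxs]

ds = SquareDataset()
batches = [[3, 6], [8, 1]]  # two batches of indices

# map calls ds.__getitem__ on each element of the sequence, i.e. on each batch.
serial = list(map(ds.__getitem__, batches))
print(serial)  # [[9, 36], [64, 1]]

class PoolDataLoader:
    """Fetches batches in parallel; pool.map has the same interface as map."""
    def __init__(self, ds, batchs, n_workers=2):
        self.ds, self.batchs, self.n_workers = ds, batchs, n_workers

    def __iter__(self):
        with ThreadPool(self.n_workers) as pool:
            # Results come back in the same order as the input batches.
            yield from pool.map(self.ds.__getitem__, self.batchs)

print(list(PoolDataLoader(ds, batches)))  # [[9, 36], [64, 1]]
```

For CPU-heavy pure-Python work, processes (as in the lesson) rather than threads are what give real parallelism; the structure of the loader is the same either way.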

But the reason that we would use a multiprocessing DataLoader is if this is doing a lot of work, we 

want it to run in parallel. And even though the first item might come out a bit slower, 

once those processes are fired up, it's going to be faster to run. So this is a really 

simplified multiprocessing DataLoader. Because this needs to be super, super efficient, 

PyTorch has lots more code than this to make it much more efficient. But the idea is this, 

and this is actually a perfectly good way of experimenting or building your own DataLoader 

to make things work exactly how you want. So now that we've re-implemented all this from 

PyTorch, let's just grab PyTorch’s. As you can see, it works exactly the same way. They

don't have one thing called sampler that you pass shuffle to. They have two separate classes 

called SequentialSampler and RandomSampler. I don't know why they do it that way. It's a 

little bit more work to me, but same idea. And they've got BatchSampler. And so it's exactly the 

same idea. The training sampler is a BatchSampler with a RandomSampler. The validation sampler 

is a BatchSampler with a SequentialSampler, and we pass in the batch sizes. And so we can now pass

those samplers to the DataLoader. This is now PyTorch’s DataLoader. And just like ours, it

also takes a collate function. And it works. Cool. So as you can see, it's doing exactly

the same stuff that ours is doing with exactly the same API. And it's got some shortcuts, as I'm 

sure you've noticed when you've used DataLoaders. So for example, using a batch sampler is going

to be very, very common. So you can actually just pass the batch size directly to a DataLoader, and 

it will then auto-create the batch samplers for you. So you don't have to pass in BatchSampler at 

all. Instead you can just say sampler, and it will automatically wrap that in the batch sampler 

for you. So it does exactly the same thing. And in fact, because it's so common to create 

a RandomSampler or a SequentialSampler for a Dataset, you don't have to do that manually. 

You can just pass in shuffle equals true or shuffle equals false to the DataLoader. And 

that does, again, exactly the same thing. There it is. Now something that is very 

interesting is that, when you think about it, the batch sampler and the collation function 

are things which are taking the result of the sampler, looping through them, 

and then collating them together. But what we could do is, actually, because our Datasets know how to grab multiple 

indices at once, we can actually just use the BatchSampler as a sampler. We don't 

actually have to loop through them and collate them, because they basically come

pre-collated. So this is a trick which Hugging Face stuff can use as well, and

we'll be seeing it again. So an important thing to understand is how come we can pass a

BatchSampler as the sampler, and what it's doing. And so rather than trying to look through the

PyTorch code, I suggest going back to our non-multi-processing pure Python code to see 

exactly how that would work, because it's a really nifty trick for things that you can grab multiple

things from at once and it can save a whole lot of time. It can make your code a lot faster. Okay. 
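
Following the suggestion to study this in pure Python, here is a stdlib-only sketch of why a batch sampler can act as the sampler. In PyTorch terms it corresponds to handing a BatchSampler to the DataLoader's `sampler` argument, as in the lesson; `ColumnDataset`, `batches_of`, and `load` are hypothetical helpers for illustration.

```python
class ColumnDataset:
    """Toy dataset stored as columns, so a list of indices is fetched in one step."""
    def __init__(self, xs, ys):
        self.xs, self.ys = xs, ys

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, idxs):
        # Receives a whole batch of indices and returns it already collated.
        return [self.xs[i] for i in idxs], [self.ys[i] for i in idxs]

def batches_of(n, bs):
    """Plays the role of a sequential BatchSampler over range(n)."""
    idxs = list(range(n))
    return [idxs[i:i + bs] for i in range(0, n, bs)]

def load(ds, sampler, collate_fn=lambda b: b):
    # Each "index" from the sampler is really a batch of indices, so a single
    # __getitem__ call yields a whole batch and collate has nothing left to do.
    for idx in sampler:
        yield collate_fn(ds[idx])

ds = ColumnDataset(list("abcdefg"), list(range(7)))
print(list(load(ds, batches_of(len(ds), 3))))
# [(['a', 'b', 'c'], [0, 1, 2]), (['d', 'e', 'f'], [3, 4, 5]), (['g'], [6])]
```

The saving comes from replacing one `__getitem__` call per item, plus a collate step, with one call per batch.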

So now that we've got all that nicely implemented, we should now add a validation set. And there's 

not really too much to talk about here. We'll just take our fit function, and this is 

exactly the same code that we had before. And then we're just going to add something 

which goes through the validation set and gets the predictions and sums up the losses 

and accuracies and from time to time prints out the loss and accuracy. And so get_dls(), we will 

implement by using the PyTorch DataLoader now. And so now our whole process will be get_dls() 

passing in the training and validation dataset. Notice that for our validation DataLoader, I'm 

doubling the batch size because it doesn't have to do back propagation. So it should use about half 

as much memory so I can use a bigger batch size. Get our model and then call this 

fit. And now it's printing out the loss and accuracy on the validation set. 
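
One detail in summing up validation losses and accuracies is that each batch's loss is a mean, and the final batch may be smaller. This hypothetical `aggregate` helper shows one way to get whole-set averages by weighting each batch by its size; it is a sketch of the idea, not a transcription of the notebook's fit function, and the numbers are made up.

```python
def aggregate(batch_results):
    """Turn per-batch (mean_loss, mean_accuracy, batch_size) triples into
    whole-set averages, weighting each batch by its size."""
    tot_loss = tot_acc = count = 0.0
    for loss, acc, n in batch_results:
        tot_loss += loss * n  # per-batch mean times size = per-batch sum
        tot_acc += acc * n
        count += n
    return tot_loss / count, tot_acc / count

# Two full batches and a small final batch (illustrative numbers only).
results = [(0.10, 0.97, 100), (0.30, 0.90, 100), (0.50, 0.50, 2)]
loss, acc = aggregate(results)
print(f"loss {loss:.4f}, accuracy {acc:.4f}")  # loss 0.2030, accuracy 0.9307
```

A plain average of the three per-batch accuracies would give 0.79, letting a 2-item batch count as much as a 100-item one.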

So finally we actually know how we're doing, which is that we're getting 97% accuracy on the 

validation set. And that's on the whole thing, not just on the last batch. So that's cool. We've now 

implemented a proper, working, sensible training loop. It's still, you know, a bit more code 

than I would like, but it's not bad. And every line of code in there and every line of code it's 

calling is all stuff that we have built ourselves, re-implemented ourselves. So we know exactly 

what's going on and that means it's going to be much easier for us to create anything we can think 

of. We don't have to rely on other people's code. So hopefully you're as excited about that as I 

am, because it really opens up a whole world for us. So one thing that we're going to want to be able

to do now that we've got a training loop is to grab data. And there's a really fantastic library 

of datasets available on Hugging Face nowadays. And so let's look at how we use those datasets 

now that we know how to bring things into data loaders and stuff so that now we can use 

the entire world of Hugging Face datasets with our code. So first,

you need to pip install datasets. And once you've installed it, you'll be able

to say from datasets import, and you can import a few things. I'm just importing these two things now,

load_dataset, load_dataset_builder. And we're going to look at a dataset called Fashion-MNIST. 

And so the way things tend to work with Hugging Face is there's something called the Hugging 

Face hub, which has models and it has datasets amongst other things. And generally you'll give 

them a name and you can then say, in this case, load a dataset builder for Fashion-MNIST. Now a 

dataset builder is just basically something which has some metadata about this dataset. So the 

dataset builder has a .info and the .info has a .description. And here's a description of this. 

And as you can see, again, we've got 28 by 28 grayscale. So it's going to be very familiar 

to us because it's just like MNIST. And again, we've got 10 categories. And again, we've got 

60,000 training examples. And again, we've got 10,000 test examples. So this is cool. So as it 

says, it's a direct drop-in replacement for MNIST. And so the dataset builder also will tell 

us what's in this dataset. And so Hugging Face stuff generally uses dictionaries rather 

than tuples. So there's going to be an image of type Image, and there's going to be a label of 

type ClassLabel. There are 10 classes, and these are the names of the classes. So it's quite nice that

in Hugging Face datasets, you know, we can kind of get this information directly. It also tells us 

if there are recommended train and test splits, so we can find out those as well. So this is the size

of the training split and the number of examples. So now that we're ready to start playing with 

it, we can load the dataset. Okay, so this is the difference between load_dataset_builder() versus 

load_dataset(). So this will actually download it, cache it, and here it is. And it creates a dataset 

dictionary. So a dataset dictionary, if you've used fast.ai, is basically just like what we call 

the Datasets class; they call it the DatasetDict class. So it's a dictionary that contains in

this case, a train and a test item, and those are datasets. These datasets are very much like the 

datasets that we created in the previous notebook. So we can now grab the training and test items 

from that dictionary and just pop them into variables. And so we can now have a look at the 

0 index thing in training. And just like we were promised, it contains an image and a label. 

So as you can see, we're not getting tuples anymore. We're getting dictionaries containing 

the x and the y, in this case, image and label. So I'm going to get pretty bored writing image and 

label in strings all the time. So I'm just going to store them as x and y. So x is going to be the 

string ‘image’ and y will be the string ‘label’. I guess the other way I could have done that would have been to say x comma y equals 

that. That would probably be a bit neater because it's coming straight from the 

features. And if you iterate over a dictionary, you get back its keys. That's why that works.
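
A quick stdlib check of that behaviour. The `features` dict below is a placeholder stand-in for the Hugging Face features mapping; only its keys matter here.

```python
# Placeholder stand-in for the dataset's features mapping.
features = {"image": "Image()", "label": "ClassLabel()"}

# Iterating over a dict yields its keys (in insertion order since Python 3.7),
# so unpacking it into two names gives the two column names.
x, y = features
print(x, y)  # image label

# One row, shaped the way a Hugging Face dataset returns it: a dict, not a tuple.
row = {"image": "<PIL image>", "label": 9}
print(row[x], row[y])  # <PIL image> 9
```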

So anyway, I've done it manually here, which is a bit sad, but there you go. Okay. So 

we can now grab train[0], which we've already seen. We can grab the x, i.e.

the image, and there it is. There's the image. We could grab the first five images 

and the first five labels, for example. And there they are. Now we already 

know what the names of the classes are. So we could now see what these map to by grabbing 

those features. So there they are. This is a special Hugging Face class; most libraries,

including fast.ai, have something that works like this. There's a method called int2str (int to string),

which is going to take these and convert them to these. So if I call it on our y

batch, you'll see the first is ‘ankle boot’, and that is indeed an ankle boot. Then we've

got a couple of t-shirts and a dress. Okay. So how do we use this to train a model?
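
The answer developed in the walk-through is a collate function that returns a dictionary. Here is a stdlib-only sketch of that shape; `collate_rows` is a hypothetical name, and plain lists stand in for the tensor conversion and stacking the lesson performs.

```python
def collate_rows(b, x="image", y="label"):
    """Collate a list of dict rows into ONE dict whose values are whole batches.

    In the lesson each image is converted to a tensor and the results stacked;
    plain lists stand in for that here so the sketch needs no PyTorch.
    """
    return {x: [o[x] for o in b], y: [o[y] for o in b]}

rows = [{"image": [[0, 1]], "label": 9}, {"image": [[2, 3]], "label": 0}]
batch = collate_rows(rows)
print(batch)  # {'image': [[[0, 1]], [[2, 3]]], 'label': [9, 0]}
```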

Well, we're going to need a DataLoader and we want a DataLoader that for now we're going to do just 

like we've done it before. It's going to return, well, actually we're going to do something 

a bit different. Our collate function is actually going to return

a dictionary. This is pretty common for Hugging Face stuff, and PyTorch

is happy for you to return a dictionary from a collation

function. So rather than returning a tuple of the stacked-up tensors, it returns a dictionary. Hopefully this looks very

familiar. This looks a lot like the thing that goes through the Dataset for each 

one and stacks them up just like we did in the previous notebook. So that's what we're 

doing. We're doing it all in one step here in our collate function. And then again, exactly the

same thing: go through our batch and grab the y. These are just integers, so we can turn them

straight into a tensor without calling stack. And so we're now going to have the

image and label bits in our dictionary. So if we create our DataLoader 

using that collation function and grab one batch, we can see that batch[x].shape

is 16 by 1 by 28 by 28, and here's the y of the batch. So the

thing to notice here is that we haven't done any transforms or anything or written our own 

Dataset class or anything. We're actually putting all the work directly in the collation

function. This is a really nice way to skip all of the abstractions

of your framework, if you want to: you can just do all of your work in

collate functions. It's going to pass you the batch

directly, and you just go through each item. And so here we're saying, okay, grab the x key

from that dictionary, convert it to a tensor and then do that for everything in the batch 

and then stack them all together. So this can be quite a nice way to

do things if you want to work very manually without having to think too

much about a framework. Particularly if you're doing really

custom stuff, this can be quite helpful. Having said that, Hugging Face datasets

absolutely lets you avoid doing everything in the collate function, and if we want

to create really simple applications, that's where we're going to eventually want 

to head. So we can do this using a transform instead. And so the way we do that is we create 

a function. It's going to take our batch and replace the x in our batch with the

tensor version of each of those PIL images. And I'm not even stacking them or anything. 

And then we're going to return that batch. And so Hugging Face datasets has something 

called with_transform(), and that's going to take your dataset, your Hugging Face 

dataset, and it's going to apply this function to every element. It doesn't run anything now;

instead, behind the scenes, whenever dunder getitem is called, it will

call this function on the fly. So, in other words, this could have data augmentation, which can 

be random or whatever, because it's going to be rerun every time you grab an item; it's not cached

or anything like that. Other than that, this dataset has exactly the same API as any other

dataset. It has a length, it has a dunder getitem, so you can pass it to a DataLoader. And so PyTorch 

already knows how to collate dictionaries of tensors. So we've got a dictionary of tensors 

now. So that means we don't need a collate function anymore. I can create a DataLoader from 

this without a collate function, as you can see. And so this gives exactly

the same thing as before, but without having to create a custom collate 

function. Now, even this is a bit more code than I want, having to return this seems a bit 

silly. But the reason I had to do this is because Hugging Face datasets expects the with_transform 

function to return the new version of the data. So I wanted to be able to write 

it like this, transform in place, and just say the change I want to make and 

have it automatically return that. So if I create this function, it's exactly the same 

as the previous one, but doesn't have a return. How would I turn this into something

which does return the result? So here's an interesting trick. We could take 

that function and pass it to another function to create a new function, which is a version

of this inplace function that returns the result. And the way I do that is by creating a 

function called inplace. It takes a function, it returns a function. The function it 

returns is one that calls my original function and then returns the result. So this is the 

function. This is a function generating function. And it's modifying an inplace function to become 

a function that returns the new version of that data. And so this is a function. This function is 

passed to this function, which returns a function. And here it is. So here's the version that Hugging 

Face will be able to use. So I can now pass that to with_transform() and it 

does exactly the same thing. So this is very, very common in Python. It's so 

common that this line of code can be entirely removed and replaced with this little token. If

you take a function's name and put @ at the start, you can then put that on the line before another function definition. And what it

says is take this whole function, pass it to this function and replace it with the result. So this 

is exactly the same as the combination of this and this. And when we do it this way, this kind 

of syntactic sugar is called a decorator. Okay. So there's nothing magic about decorators.
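
A runnable sketch of the `inplace` idea. The name `inplace` comes from the lesson and the body follows the description above (call the original function, then return its argument); `lengths_inplace` and `lengths2` are hypothetical example transforms standing in for the tensor conversion.

```python
def inplace(f):
    """Turn a function that modifies its argument in place into one that returns it."""
    def _f(b):
        f(b)      # run the original, which returns None
        return b  # hand back the (now modified) argument
    return _f

def lengths_inplace(b):
    # Example in-place transform: replace each image with its length.
    b["image"] = [len(img) for img in b["image"]]

# Without decorator syntax: pass the function in, get a new function back.
lengths = inplace(lengths_inplace)

# With decorator syntax: the same thing, minus the intermediate name.
@inplace
def lengths2(b):
    b["image"] = [len(img) for img in b["image"]]

batch = {"image": ["ab", "cde"], "label": [1, 2]}
print(lengths(batch))  # {'image': [2, 3], 'label': [1, 2]}
```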

It's literally, literally identical to this. Oh, I guess the only difference is we don't end up with 

this unnecessary intermediate underscore version, but the result is exactly the same. And therefore 

I can create a transformed Dataset by using this. And there we go. It's all working fine. Yeah, so I mean, none of this is particularly 

necessary, but what we're doing is seeing the

pieces that we can put in place to make this stuff as easy as possible, so

that we don't have to think about things too much. All right. Now with all this, we can

basically make things pretty automatic. And the way we can make things pretty automatic is 

we're going to use a cool thing in Python called itemgetter(). And itemgetter is 

a function that returns a function. So hopefully you're getting used to this idea now. 
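
Here is a runnable version of the two itemgetter examples discussed here: a real dict, and a class that only implements `__getitem__` (the class name `FakeDict` is hypothetical).

```python
from operator import itemgetter

get_ac = itemgetter("a", "c")  # a function that fetches the 'a' and 'c' items
d = {"a": 1, "b": 2, "c": 3}
print(get_ac(d))  # (1, 3)

class FakeDict:
    """Not a dict at all: it just implements __getitem__, which is all itemgetter uses."""
    def __getitem__(self, k):
        return 1 if k == "a" else 2 if k == "b" else 3

print(get_ac(FakeDict()))  # (1, 3)
```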

This creates a function that gets the a and c items from a dictionary or something that looks 

like a dictionary. So here's a dictionary. It contains keys a, b, and c. So this function will 

take a dictionary and return the a and c values. And as you can see, it has done exactly 

that. I’ll explain why this is useful in a moment. I just wanted to briefly mention 

what I meant when I said something that looks like a dictionary. I mean, this is a

dictionary. Okay. That looks like a dictionary. But Python doesn't care about what type things 

actually are. It only cares about what they look like. And remember that when we call something 

with square brackets, when we index into something, behind the scenes it's just calling 

dunder getitem. So we could create our own class. And its dunder getitem gets the key, and it's

just going to manually return 1 if k equals a or 2 if k equals b or 3 otherwise. And look, that 

class also works just fine with an itemgetter. The reason this is interesting 

is because a lot of people write Python as if it's like C++ or Java or 

something. They write as if it's this kind of statically typed thing. But I really wanted to 

point out that it's an extremely dynamic language and there's a lot more flexibility than you might 

have realized. Anyway, that's a little aside. So what we can do is think about a batch for 

example where we've got these two dictionaries. Okay. So PyTorch comes with a default 

collation function called, not surprisingly, default_collate. So that's part of PyTorch. And

what default_collate() does with dictionaries is it simply takes the matching keys and then 

grabs their values and stacks them together. And so that's why if I call default_collate, a 

is now 1, 3, b is now 2, 4. That's actually what happened before when we created this DataLoader:

it used the default collation function, which does that. It also works on things that are tuples, not 

dictionaries, which is what most of you would have used before. And what we can do therefore is we 

could create something called collate_dict(), which is something which 

is going to take a Dataset and it's going to create an itemgetter function for

the features in that Dataset, which in this case is ‘image’ and ‘label’. So this is a function 

which will get the ‘image’ and ‘label’ items. And so we're now going to return a function 

and that function is simply going to call our itemgetter() on the result of default_collate(). And

what this is going to do is take a list of dictionaries and collate it into a tuple

just like we did up here. So if we run that, we're now going to call DataLoader on our

transformed dataset, passing in collate_dict of that dataset, and remember, this is a function that returns a function.
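
A stdlib-only sketch of the collate_dict idea. In the lesson, `collate_dict` takes the dataset and builds its itemgetter from `ds.features`. Since there's no Hugging Face dataset here, this version takes the key names directly, and `simple_default_collate` is a hypothetical stand-in for PyTorch's `default_collate` that only handles dicts and gathers values into lists instead of stacking tensors.

```python
from operator import itemgetter

def simple_default_collate(b):
    """Toy stand-in for default_collate on a list of dicts: gather matching keys."""
    return {k: [o[k] for o in b] for k in b[0]}

print(simple_default_collate([{"a": 1, "b": 2}, {"a": 3, "b": 4}]))
# {'a': [1, 3], 'b': [2, 4]}

def collate_dict(keys):
    """Return a collate function that collates dict rows, then pulls the values
    out in `keys` order, so the DataLoader yields tuples instead of dicts."""
    get = itemgetter(*keys)
    def _f(b):
        return get(simple_default_collate(b))
    return _f

collate = collate_dict(["image", "label"])
xb, yb = collate([{"image": "img0", "label": 9}, {"image": "img1", "label": 0}])
print(xb, yb)  # ['img0', 'img1'] [9, 0]
```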

So it's a collation function for this Dataset and there it is. So now this looks a lot like 

what we had in our previous notebook. This is not returning a dictionary, but it's returning 

a tuple. This is a really important idea, particularly for working with Hugging Face

datasets: they tend to do things with dictionaries, while most other things in the PyTorch

world tend to work with tuples. So you can now use this to convert anything that

returns dictionaries into something that provides tuples by passing it as a collation function 

to your DataLoader. So remember, you know, the thing you want to be doing this week is doing

things like import pdb, pdb.set_trace(), right? Put breakpoints, step through, see exactly 

what's happening, you know, not just here, but also even more importantly, doing it inside the 

innermost function. What did I do wrong there? Oh,

it's set_trace. So then we can see exactly

what's going on. Print out b. List the code. And I could step into it. And

look, I'm now inside the default_collate function, which is inside PyTorch. And so I 

can now see exactly how that works. There it all is. So it's going to go 

through and this code is going to look very familiar because we've implemented all this 

ourselves. It's just being careful to make sure it works for lots of different types of things:

dictionaries, NumPy arrays, and so on and so forth. So the first thing I wanted to do, oh,

actually, something I do want to mention here: this is so useful that we want to be able

to use it in all of our notebooks. So rather than copying and pasting this every 

time, it would be really nice to create a Python module that contains this definition. So we've 

created a library called nbdev. It's really a whole system called nbdev, which does exactly 

that. It creates modules you can use from your notebooks. And the way you do it is you use this 

special thing we call comment directives, which start with hash pipe (#|); here it's #| export. So you put

this at the top of a cell and it says do something special for this cell. What this does is it says 

put this into a Python module for me, please. Export it to a Python module. What Python 

module is it going to put it in? Well, if you go all the way to the top, you tell it what 

default export module to create. So it's going to create a module called datasets. And at

the very end of this notebook, I've got this line that says import nbdev; nbdev.nbdev_export().

And what that's going to do for me is create a library, a Python library. It's going to have 

a datasets.py in it. And we'll see everything that we exported. Here it is. collate_dict 

will appear in this for me. And so what that means is now in the future, in my notebooks, 

I will be able to import collate_dict from my datasets. Now you might wonder, well, how 

does it know to call it miniai? What's miniai? Well, in nbdev, you create a settings.ini file 

where you say what the name of your library is. So we're going to be using this quite a lot 

now because we're getting to the point where we're starting to implement stuff that didn't 

exist before. So previously most of the stuff, or pretty much all the stuff we've created, I've 

said like, oh, that already exists in PyTorch. So we don't need it. We just use PyTorch’s. But 

we're now getting to a point where we're starting to create stuff that doesn't exist anywhere. We've 

created it ourselves. And so therefore we want to be able to use it again. So during the rest of 

this course, we're going to be building together a library called miniai. That's going to be our

framework, our version of something like fastai. Maybe it's something like what fastai 3 will end 

up being. We'll see. So that's what's going on here. So we're going to be using, once I 

start using miniai, I'll show you exactly how to install this, but that's what this 

export is. And so you might've noticed I also had an export on this inplace function. And

I also had it on my necessary import statements. Okay. We want to be able to see what this dataset 

looks like. So I thought now is a good time to talk a bit about plotting because knowing how

to visualize things well is really important. And again, the idea is we're not allowed

to use fastai's plotting library. So we've got to learn how to do everything ourselves. So 

here's the basic way to plot an image using matplotlib. So we can create a batch, grab the 

x part of it, grab the very first thing in that. And imshow() means show an image. And 

here it is. There's our ankle boot. So let's start to think about what stuff we 

might create, which we can export to make this a bit easier. So let's create something 

called show_image(), which basically does imshow(), but we're going 

to do a few extra things. We will make sure that it's in the correct 

axis order. We will make sure it's on the CPU rather than on CUDA. If it's not a NumPy

array, we'll convert it to a NumPy array. We'll be able to pass in an existing axis, 

which we'll talk about soon. And we'll be able to set a title if we want to.

And also, this thing here removes all the ugly axis ticks and labels (0, 5, and so on) because we're

showing an image, and we don't want any of that. So if we try that, you can see, there we go. We've

also been able to say what size we want the image to be. There it all is. Now here's something

interesting. When I say help, the help shows the things that I implemented, 

but it also shows a whole lot more things. How did that magic thing happen? And you 

can see they work because here's figsize, which I didn't add. Oh, sorry. I did add. 

Well, okay. That's a bad example. Anyway, these other ones all work as well. So how did 

that happen? Well, the trick is that I added **kwargs here, and **kwargs says you

can pass as many other keyword arguments as you like that aren't listed, and they'll

all be put into a dictionary with this name. And then, when I call imshow(), I pass that entire

dictionary, and ** here means “pass these as separate arguments”. And that's how come it works. And then

how come it knows what help to provide? The reason why is that

fastcore has a special thing called delegates, which is a decorator. So now you know 

what a decorator is. You tell it what you're going to be passing kwargs

to, in this case imshow(), and then it automatically creates the documentation

correctly to show you what kwargs can do. So this is a really helpful way of being able to 

kind of extend existing functions like imshow and still get all of their functionality and 

all of their documentation and add your own. So delegates is one of the most useful 

things we have in fastcore, in my opinion. So we're going to export that. So now we can use 

show_image() anytime we want, which is nice. Something that's really helpful to 
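
A stdlib-only sketch of both tricks: `**kwargs` collecting extra keyword arguments and passing them on, plus a toy decorator that rewrites the wrapper's signature so `help` shows the forwarded parameters. `delegates_to` is a hypothetical illustration of the idea behind fastcore's `delegates`, not its actual code, and `render` stands in for `imshow`.

```python
import inspect

def render(img, cmap="gray", interpolation=None, alpha=1.0):
    """Stand-in for a wrapped function such as imshow: reports what it received."""
    return {"img": img, "cmap": cmap, "interpolation": interpolation, "alpha": alpha}

def delegates_to(to):
    """Replace **kwargs in the signature with `to`'s keyword arguments that have defaults."""
    def decorator(f):
        sig = inspect.signature(f)
        params = [p for p in sig.parameters.values() if p.kind != p.VAR_KEYWORD]
        have = {p.name for p in params}
        extra = [p.replace(kind=p.KEYWORD_ONLY)
                 for p in inspect.signature(to).parameters.values()
                 if p.default is not p.empty and p.name not in have]
        f.__signature__ = sig.replace(parameters=params + extra)
        return f
    return decorator

@delegates_to(render)
def show_img(img, title=None, **kwargs):
    # **kwargs collects extra keyword arguments into a dict;
    # **kwargs in the call below spreads them back out as separate arguments.
    out = render(img, **kwargs)
    out["title"] = title
    return out

print(inspect.signature(show_img))
# (img, title=None, *, cmap='gray', interpolation=None, alpha=1.0)
print(show_img("boot", title="ankle boot", alpha=0.5)["alpha"])  # 0.5
```

Only the advertised signature changes; the function still accepts `**kwargs` at call time.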

know about matplotlib is how to create subplots. So for example, what happens if you 

want to plot two images next to each other? In matplotlib, subplots() creates multiple

plots, and you pass it the number of rows and the number of columns. So this here has,

as you see, one row and two columns. And it returns axes. What matplotlib calls axes

are the individual plots. So if we now call show_image() on

the first image, passing in axs[0], it's going to get that here, right? Then we 

call ax.imshow(). That means put the image on this subplot. They don't call it a subplot, 

unfortunately, they call it an axis, put it on this axis. So that's how come we're able to 

show an image, one image on the first axis, and then show a second image on the second axis by 

which we mean subplot. And there's our two images. So that's pretty handy. So I've decided to add 

some additional functionality to subplots, so I use delegates on subplots() because

I'm adding functionality to it, and I'm going to be taking kwargs and passing them through to

subplots(). The main thing I wanted to do is to automatically create an appropriate figure

size: you just tell it what image size you want. I also want to be able to

add a title for the whole set of subplots. And so there it is. And then I also want

to show you that, if we want to, it'll automatically create documentation for us

as well, for our library. And here is the documentation. As you can see here, for the

stuff I've added, it's telling me exactly what each of these parameters is, its type,

its default, and information about each one. And that information is automatically coming from

these little comments. We call these docments. This is all automatic stuff done by fastcore and

nbdev. So you might've noticed that the fastai library documentation always has

all this info; that's why. You don't actually have to call show_doc(); it's automatically added to

your documentation for you. I'm just showing you here what it's going to end up looking like. And

you can see that it's worked with delegates: it's put all the extra stuff from delegates in here

as well, and here they are, all listed out. So anyway, subplots. Let's create

a 3 by 3 set of plots, and we'll grab the first eight images. And so now we can go through each

of the subplots. It returns them as a 3 by 3 array, basically a list of 3 lists of 3 items, so

I flatten them all out into a single list. Then we go through each of those subplots and

each image, and show each image on its axis. So here's a quick way to

show them all. As you can see, it's a little bit ugly here, so we'll keep on adding more useful

plotting functionality. So here's something that, again, delegates to our subplots.
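Both show_image() and this function rely on the same two-part trick: forward `**kwargs` to the wrapped function, then let a decorator rewrite the signature so `help()` and the docs can list what those kwargs accept. fastcore's `delegates` handles more cases than this, so treat the following as a minimal sketch of the idea, not its implementation. `my_delegates`, `draw`, and `draw_titled` are hypothetical names.

```python
import inspect

def my_delegates(to):
    "Hypothetical, simplified stand-in for fastcore's `delegates` decorator."
    def decorator(f):
        sig = inspect.signature(f)
        # Keep the wrapper's own parameters, minus its catch-all **kwargs
        params = {name: p for name, p in sig.parameters.items()
                  if p.kind != inspect.Parameter.VAR_KEYWORD}
        # Advertise the target's defaulted parameters as keyword-only arguments
        for name, p in inspect.signature(to).parameters.items():
            if p.default is not inspect.Parameter.empty and name not in params:
                params[name] = p.replace(kind=inspect.Parameter.KEYWORD_ONLY)
        # inspect.signature() (and therefore help()) honours __signature__
        f.__signature__ = sig.replace(parameters=list(params.values()))
        return f
    return decorator

def draw(x, color='black', alpha=1.0):
    "Stand-in for an existing plotting function such as imshow()."
    return f'{x} in {color} (alpha={alpha})'

@my_delegates(draw)
def draw_titled(x, title='', **kwargs):
    # Anything we don't handle ourselves is passed straight through to draw()
    return f'{title}: ' + draw(x, **kwargs)

print(inspect.signature(draw_titled))  # (x, title='', *, color='black', alpha=1.0)
print(draw_titled('cat', title='Pet', color='red'))  # Pet: cat in red (alpha=1.0)
```

The wrapper's runtime behaviour is unchanged; only the advertised signature differs, so the documentation benefit costs nothing when the function is actually called.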

But we're going to be able to say, for example, how many subplots we want, and it'll

automatically calculate the rows and the columns. It's also going to remove the axes for any

that we're not actually using. And so here we've got that. So that's what get_grid() is going to let us

do. So we're getting quite close. And so, finally, why don't we just create a single thing called

show_images() that's going to get our grid, and it's going to go through our images, optionally

with a list of titles, and show each one. And we can use that here. You can see we have

successfully got all of our labeled images. I think all this stuff for the

plotting is pretty useful. As you might've noticed, it was all exported, so in our

datasets.py we've got our get_grid(), we've got our subplots(), and we've got our show_images().
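The transcript doesn't spell out the rule get_grid() uses to choose rows and columns, so here's one plausible rule as a hypothetical helper, `grid_shape()` (not the miniai code): about √n rows, then enough columns to fit every image. The trailing cells are the ones whose axes would be switched off.

```python
import math

def grid_shape(n, nrows=None, ncols=None):
    "Hypothetical helper: choose (nrows, ncols) so that n plots fit."
    if nrows is None and ncols is None:
        nrows = max(1, int(math.sqrt(n)))  # start from a roughly square grid
    if ncols is None:
        ncols = math.ceil(n / nrows)       # enough columns for every plot
    elif nrows is None:
        nrows = math.ceil(n / ncols)       # enough rows for every plot
    return nrows, ncols

def unused_cells(n, nrows, ncols):
    "Flattened indices of grid cells that would get no image."
    return list(range(n, nrows * ncols))

print(grid_shape(8))           # (2, 4)
print(grid_shape(8, nrows=3))  # (3, 3)
print(unused_cells(8, 3, 3))   # [8]
print(grid_shape(10))          # (3, 4)
```

With matplotlib you would pass the result to `plt.subplots(nrows, ncols)` and call `set_axis_off()` on `axs.flat[i]` for each unused index.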

So that's going to make life easier for us now. Since we have to create everything from

scratch, we've created all of those things ourselves. And as I mentioned, at the very end

we have this one line of code to run. Just to show you, if I empty out miniai/datasets.py so

it's all empty, and then I run this line of code, now it's back, as you can see, and it

tells you it's auto-generated. All right. So we are nearly at the point where we can build

our learner. And once we've built our learner, we're going to be able to really dive deep into

training and studying models. So we've nearly got all of our infrastructure in

place. Before we do, there are some pieces of Python which not everybody knows that I want

to talk about, and some computer science concepts as well.

So that's what 06_foundations is about. This whole section is just going to

talk about some stuff in Python that you may not have come across before, or maybe

it's a review for some of you. And it's all stuff we're going to be using in the

next notebook, which is why I wanted to cover it. So we're going to be creating a learner class.

A learner class is going to be a very general-purpose training loop, which we can get to do

anything that we want it to do, and we're going to be creating things called callbacks to make

that happen. So we're going to spend a few moments talking about what

callbacks are, how they're used in computer science, and how they're implemented, and look at some examples.

They come up a lot. The most common place that you see callbacks in software is for GUI

events, that is, events from some graphical user interface. The main graphical user interface

library in Jupyter notebooks is called ipywidgets, and we can create a widget like a button, like

so. When we display it, it shows me a button, and at the moment it doesn't

do anything if I click on it. What we can do, though, is

add an on_click() callback to it, which is a function:

we're going to pass it a function which is called when you click it. So let's

define that function. Saying w.on_click(f) is going to assign the f function

to the on-click callback. Now if I click this, there you go, it's doing it. Now what does that

mean? Well, a callback is simply a callable that you've provided. Remember, a callable is a more

general version of a function. So in this case, it's a function that you've provided that will

be called back when something happens, and in this case the thing that's happening is

that someone's clicking a button. So this is how we define and use a callback as a GUI event.
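ipywidgets needs a live notebook front end, so here's a plain-Python stand-in for the same idea. `Clicker` is a hypothetical class, not part of ipywidgets: `on_click()` just stores the callable, and a simulated `click()` calls each stored callable back, passing the button in (ipywidgets' own `Button.on_click` handlers also receive the clicked button).

```python
class Clicker:
    "Hypothetical stand-in for a GUI button; not the ipywidgets API."
    def __init__(self, description):
        self.description = description
        self._handlers = []  # callables registered via on_click()

    def on_click(self, callback):
        "Register `callback`; it will be called back with this button as its argument."
        self._handlers.append(callback)

    def click(self):
        "Simulate a user click by calling every registered callback."
        for handler in self._handlers:
            handler(self)

clicks = []
def f(button):
    clicks.append(button.description)
    print('hi!')

w = Clicker('Click me')
w.on_click(f)  # nothing is called yet; we've only handed f over
w.click()      # now f is "called back" and prints hi!
```

Registering and calling are deliberately separate steps: the code that owns the event decides when the callback runs, not the code that supplied it.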

So basically, if you want to create your own graphical user interfaces

for Jupyter, you can do it with ipywidgets by using these callbacks. These particular

kinds of callbacks are called events, but they're just callbacks. All right, so that's somebody

else's callback. Let's create our own callback. Let's say we've got some very slow calculation.

It takes a very long time to add up the squares of the numbers zero to four, because

we sleep for a second after each one. So let's run our slow calculation. Still

running. Oh, how's it going? Come on, finish our calculation. There we go. The answer

is 30 (0 + 1 + 4 + 9 + 16). Training a model is a slow calculation just like that.
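A minimal sketch of this slow calculation, already including the optional `cb` hook that gets added next. The `delay` parameter is an assumption added here so the example can run without waiting; the lesson's version always sleeps for a second.

```python
from time import sleep

def slow_calculation(cb=None, delay=1.0):
    "Sum the squares of 0..4, pausing after each step."
    res = 0
    for i in range(5):
        res += i * i
        sleep(delay)
        if cb is not None:
            cb(i)  # the one new line: report where we're up to, if asked
    return res

def show_progress(epoch):
    print(f"Awesome! We've finished epoch {epoch}!")

print(slow_calculation(delay=0))                 # 30
print(slow_calculation(show_progress, delay=0))  # five progress lines, then 30
```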

It would be nice to do things like, I don't know, print out the loss from time to time

or show a progress bar or whatever. So generally, for those kinds of things, we'd

like to define a callback that is called at the end of each epoch or batch, or every few seconds, or

something like that. So here's how we can modify our slow calculation routine so that you can

optionally pass it a callback. All of this code is the same, except we've added this

one line of code that says: if there's a callback, then call it and pass in where we're up to. So

then we can create our callback function. Just like we created the callback

function f() earlier, let's create a show_progress() callback function that's going to tell us how far

we've got. So now if we call slow_calculation() passing in our callback, you can see it's going

to call this function at the end of each step. So here we've created our own callback. So

there's nothing special about a callback. It doesn't require its own special syntax. It's not

a new concept. It's just an idea, really: the idea of passing in a function which some

other function will call at particular times, such as at the end of a step or when you click

a button. So that's what we mean by callbacks. We don't have to define

the function ahead of time. We could define the function at

the same time that we call the slow calculation, by using lambda. As we've

discussed before, lambda just defines a function, but it doesn't give it a name. So here's a

function that takes one parameter and prints out exactly the same thing as before. So here's

the same way of doing it, but using a lambda. We could make it more sophisticated

now. Rather than always saying, “Awesome! We finished epoch…”, whatever, we

could let you pass in an exclamation, and we print that out. And so in this case, we

could now have our lambda call that function. And one of the things that

we can do now is, again, create a function that returns a function. So we could create a make_show_progress()

function where you pass in the exclamation. There's no need to give

the returned function a name, actually; we can just return a lambda directly. We return a function that

calls show_progress() with that exclamation. So here we are passing in “Nice!”, and that's exactly the same

as what we've done before. Instead of using a lambda,

we can also create an inner function like this. So here's now a function that returns a

function, and this does exactly the same thing. So that's one way with a lambda and

one way without a lambda. One of the reasons I wanted to show you

that is so I can show you that we can do exactly the

same thing using partial. With partial, it's going to do exactly the same thing

as this kind of make_show_progress(). It's going to call show_progress() and

pass in “OK, I guess” as the exclamation. So this is, again, an example of a function returning a function.

And so this is a function that calls show_progress(), passing in this as the first parameter.
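Here are the equivalent ways of building a callback that carries an extra argument, side by side: a lambda at the call site, a function that returns a lambda, a function that returns a named inner function (both closures over `exclamation`), and `functools.partial`. The messages and the `slow_calculation()` body are illustrative reconstructions, not the notebook's exact code; `show_progress()` returns its message as well as printing it so the variants can be compared.

```python
from functools import partial

def slow_calculation(cb=None):
    "Sum of squares of 0..4 (sleep left out); calls cb(i) after each step."
    res = 0
    for i in range(5):
        res += i * i
        if cb is not None:
            cb(i)
    return res

def show_progress(exclamation, epoch):
    msg = f"{exclamation} We've finished epoch {epoch}!"
    print(msg)
    return msg

# 1. A lambda written at the call site
slow_calculation(lambda epoch: show_progress("Ok I guess.", epoch))

# 2a. A function that returns a function (an anonymous lambda)
def make_show_progress(exclamation):
    return lambda epoch: show_progress(exclamation, epoch)

# 2b. The same, but returning a named inner function
def make_show_progress_inner(exclamation):
    def _inner(epoch):
        return show_progress(exclamation, epoch)
    return _inner

slow_calculation(make_show_progress("Nice!"))
slow_calculation(make_show_progress_inner("Nice!"))

# 3. functools.partial pre-fills the first argument for us
slow_calculation(partial(show_progress, "OK, I guess."))
```

All of these produce a one-argument callable, which is the only thing `slow_calculation()` relies on; `partial` is simply the shortest way to write it.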

And again, it does exactly the same thing. We tend to use partial a lot, so that's certainly something worth spending

time practicing. Now, as we've discussed, Python doesn't particularly care about types.
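Because `cb` only has to be callable, an instance of a class that defines `__call__` works in place of a function. A sketch, using the hypothetical class name `ProgressShowingCallback`:

```python
class ProgressShowingCallback:
    "A callable object: __init__ stores the exclamation, __call__ does the work."
    def __init__(self, exclamation="Awesome"):
        self.exclamation = exclamation

    def __call__(self, epoch):
        msg = f"{self.exclamation}! We've finished epoch {epoch}!"
        print(msg)
        return msg

def slow_calculation(cb=None):
    "Sum of squares of 0..4 (no sleep), calling cb(i) after each step."
    res = 0
    for i in range(5):
        res += i * i
        if cb is not None:
            cb(i)  # a function, lambda, partial, or any object with __call__
    return res

cb = ProgressShowingCallback("Just super")
print(callable(cb))  # True
slow_calculation(cb)
```

A class is handy when the callback needs to keep state between calls, since it can store anything on `self`.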

There's nothing about any of this that requires cb to be a function. It just has to

be a callable, and a callable is something that you can call. As we've discussed, another way

of creating a callable is defining dunder call (__call__). So here's a class, and this is going to work

exactly the same as our make_show_progress() thing, but now as a class. There's a dunder init,

which stores the exclamation, and a dunder call, which prints. And so now we're creating an object

which is callable and does exactly the same thing. So these are all fundamental ideas that

I want you to get really comfortable with: the idea of dunder call, dunder things in general,

partials, and classes, because they come up all the time in PyTorch code, in the code we'll be

writing, and, in fact, in pretty much all frameworks. So it's really important to feel comfortable

with them. And remember, you don't have to rely only on the resources we're providing. If there are

certain things here that are very new to you, Google around for some tutorials, or ask for

help on the forums. And then I'm just going to briefly

recap something I've mentioned before, which is *args and **kwargs,

because again, they come up a lot, and I want to show you how they work. So if

we create a function that has *args and **kwargs, nothing else, I'm just going to

have this function print them. Now I'm going to call the function. I'm going

to pass 3, I'm going to pass “a”, and I'm going to pass thing1=“hello”. The first two are passed, as

we would say, by position: they haven't got a blah= in front of them; they're just stuck there. Things that are

passed by position are placed in *args, if you have one. It doesn't have to be called args; you

can call it anything you like, as long as it's the starred bit. And so you can see here that args is a tuple

containing the positionally passed arguments, and kwargs is a dictionary containing the named

arguments. So that is all that *args and **kwargs do. And as I say, there's nothing special about

these names. I'll call this a, I'll call this b, and it'll do exactly the same

thing. So this comes up a lot, and it's important to remember that this

is literally all that they're doing. Then, on the other hand, let's say we had

a function g() which takes a, b, and c,

and we'll just print them directly. Rather than just using * and ** in

parameters, we can also use them when calling something. So let's say I create something called

args (again, it doesn't have to be called args) which contains [1, 2]. And I create

something called kwargs that contains a dictionary, {'c': 3}. I can then call g()

and pass in *args, **kwargs. That's going to take this 1, 2

and pass them as individual arguments, positionally. And it's going to take the {'c':

3} and pass that as a named argument, c=3. And there it is. So these are two

linked but different ways to use * and **: when defining a function and when calling one. Now here's a slightly different way

of doing callbacks, which I really like. In this case, I'm now passing in

a callback that's not callable; instead it's going to have a method called

before_calc and another method called after_calc. So now my callback is going to be a class

containing a before_calc and an after_calc method. And so if I run that, you can see, there

it goes. It's printing before and after every step by calling before_calc() and

after_calc(). So a callback actually doesn't have to be a callable. It doesn't have to be a function. A

callback could be something that contains methods. So we could have a version of this

which, as you can see here, is going to pass into after_calc() both

the epoch number and the value it's up to, but by using *args and **kwargs, I can just

safely ignore them if I don't want them. It's just going to chew them up and

not complain. If I didn't have those here, it wouldn't work, because

it got passed in val= and there's nothing here looking for

val=, and it doesn't like that. So one good use of *args and **kwargs

is to eat up arguments you don't want. Or we could use the arguments. So let's

actually use epoch and val and print them out. And there it is. So this is a more sophisticated

callback that's giving us status as we go. I'll skip this bit because we don't really care about

that. So finally, let's just review this idea of dunder, which we've mentioned before,

but just to really nail this home: anything that looks like this, underscore underscore something

underscore underscore, is special. Basically it means that Python has to find that

special thing, or PyTorch has to find that special thing, or NumPy has to find that special thing;

they're special. These are called dunder methods, and some of them are defined as part of the

Python data model. If you go to the Python documentation, it'll tell you about these various

different ones: here's __repr__, which we used earlier, and here's __init__, which we used earlier. So

they're all here. PyTorch has some of its own, and NumPy has some of its own. So for example, if

Python sees plus (+), what it actually does is call dunder add (__add__). So say we want to create

something that's not very good at adding things: it always adds an extra 0.01. Then SloppyAdder(1) + SloppyAdder(2)

equals 3.01. So “+” here is actually calling dunder add. If you're not familiar with

these, click on this data model link and read about these eleven specific

methods, because we'll be using all of these

in the course. I'll try to revisit them when we can, but I'm generally going to assume that

you know these. A particularly interesting one is getattr. We've seen setattr already; getattr

is just the opposite. Take a look at this. Here's a class that just contains two attributes,

a and b, set to 1 and 2. I'll create an object of that class; a.b equals 2, because I

set b to 2. Now when you say a.b, that's basically just syntactic sugar in Python. What it's

actually calling behind the scenes is getattr: it calls getattr on the object. So this

one here is the same as getattr(a, 'b').

And this can be kind of fun, because you could call getattr on a, and then

pass either 'b' or 'a' randomly. How's that for crazy? So if I run this: 2, 1, 1, 1, 2. As you can see,

it's random. Python is such a dynamic language that you can even set it up so you literally

don't know what attributes are going to be accessed. Now getattr, behind the scenes, goes through

Python's normal attribute lookup, which by default is the version in the object base

class, and if that lookup fails, it falls back to dunder getattr (__getattr__). So here's something just like a:

it's got a and b defined, but I've also got dunder getattr

defined. Dunder getattr is only called for stuff that hasn't been defined, and it gets

passed the key, that is, the name of the attribute. Generally speaking, if the first character

is an underscore, it's going to be private or special, so I just raise an

AttributeError. Otherwise I'm going to steal it and return f'Hello from {k}'. So if I

go b.a, that's defined, so it gives me 1. If I go b.foo, that's not defined, so it calls __getattr__ and

I get back “Hello from foo”. This gets used a lot in both fastai code and Hugging Face

code, often to make it more convenient to access things. So that's how the getattr

function and the dunder getattr method work. So I went over that pretty quickly, since

I know for quite a few folks this will all be review, but for folks who haven't seen

any of this, it's a lot to cover. So I'm hoping that you'll go back over this,

revise it slowly, experiment with it, look up some additional resources, and ask on the forum

for anything that's not clear. Remember, everybody has parts of the course that are really

easy for them and parts of the course that are completely unfamiliar to them. And so

if this particular part of the course is completely unfamiliar to you, it's not because

it's harder. It just so happens that this is

a bit that you're less familiar with, just as maybe the stuff about calculus in the last lesson was

a bit that you were less familiar with. There isn't really anything in the course that's

more difficult than other parts; it just depends on whether you happen to have

that background. So if you spend a few hours studying and practicing,

you'll be able to pick up these things. So don't stress if there are things that you don't

get right away; just take the time. And if you do get lost, please ask, because

people are very keen to help. If you've tried asking on the forum, hopefully you've

noticed that people are really keen to help. All right. I think this has been a pretty

successful lesson. We've got to a point where we've got a pretty nicely optimized training

loop. We understand exactly what DataLoaders and Datasets do. We've got an optimizer. We've

been playing with Hugging Face datasets, and we've got those working really smoothly. So we

really feel like we're in a pretty good position to write our generic learner training loop, and

then we can start building and experimenting with lots of models. So I look forward to seeing you

next time to do that together. Okay. Bye.
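The Python refresher in the second half of this lesson covers several small ideas in quick succession. As a compact recap, here's a sketch that exercises each one: `*args`/`**kwargs` in a definition versus in a call, callback objects with `before_calc`/`after_calc` methods, `__add__`, and `__getattr__`. Apart from `SloppyAdder`, which the lesson names, the class names are illustrative and the bodies are reconstructions, not the notebook's exact code.

```python
import random

# *args/**kwargs in a definition: collect extra positional/named arguments
def f(*args, **kwargs):
    print(f"args: {args}; kwargs: {kwargs}")
    return args, kwargs

f(3, 'a', thing1="hello")  # args: (3, 'a'); kwargs: {'thing1': 'hello'}

# * and ** in a call: spread a sequence/dict out into separate arguments
def g(a, b, c):
    return a, b, c

args, kwargs = [1, 2], {'c': 3}
print(g(*args, **kwargs))  # (1, 2, 3), the same as g(1, 2, c=3)

# Callback objects with methods instead of a single callable
class PrintStepCallback:
    # *args/**kwargs swallow the epoch and val= arguments we don't need
    def before_calc(self, *args, **kwargs): print("About to start")
    def after_calc(self, *args, **kwargs): print("Done step")

class PrintStatusCallback:
    def before_calc(self, epoch, **kwargs): print(f"About to start: {epoch}")
    def after_calc(self, epoch, val, **kwargs): print(f"After {epoch}: {val}")

def slow_calculation(cb=None):
    res = 0
    for i in range(5):
        if cb is not None: cb.before_calc(i)
        res += i * i
        if cb is not None: cb.after_calc(i, val=res)
    return res

slow_calculation(PrintStatusCallback())

# `+` calls __add__
class SloppyAdder:
    def __init__(self, o): self.o = o
    def __add__(self, b): return SloppyAdder(self.o + b.o + 0.01)  # always a bit off
    def __repr__(self): return str(self.o)

print(SloppyAdder(1) + SloppyAdder(2))  # 3.01

# __getattr__ only runs when normal attribute lookup fails
class B:
    a, b = 1, 2
    def __getattr__(self, k):
        if k.startswith('_'): raise AttributeError(k)  # leave private/special names alone
        return f'Hello from {k}'

b = B()
print(b.a, getattr(b, 'b'), b.foo)      # 1 2 Hello from foo
print(getattr(b, random.choice('ab')))  # 1 or 2, chosen at random
```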
