Hello everyone. This is Phil from statisticsmentor.com. In this video, I'm going to give you a gentle introduction to crosstabs, otherwise known as crosstabulation, otherwise known as the chi-square test.
Now, it's a test, so we've got to know what the null and the alternative (or experimental) hypotheses are. The null is that there is no association between two attributes. Another way of saying it is that there is no relationship between two attributes. And yet another way of saying it is that the attributes, say A and B, are independent. So the words independent, no association and no relationship mean the same thing. The alternative, or experimental, hypothesis is that there is a relationship, or that the two attributes are dependent, or that there is an association, depending on the words you used for the null.
The idea of association is similar to correlation. Recall that with correlation we're also looking to see if there is a relationship, but it's for quantitative variables. Crosstabs is for qualitative variables. By qualitative we mean that the variable is either nominal or ordinal. In the basic crosstabs you're seeing whether there's a relationship or no relationship between two qualitative variables. These two qualitative variables may be both nominal, both ordinal, or a mix: one ordinal, one nominal. Crosstabs is quite a common method when you're dealing with survey-type data.
Now I've got an example here. I'm trying to see if there's a relationship between smoking and lung cancer. Suppose I have no idea whether these variables are quantitative or qualitative. What kind of clue can I get? Well, if you go to Variable View, smoke is whether the person smokes, yes or no. If we click on the values there, it's coded smoker or non-smoker. If it's coded something like this, then that means it's qualitative. In this case, it's nominal. Then cancer.
It's coded again, yes. So again, this tells us it is qualitative. So if we're looking to see if there's a relationship between smoking and cancer, then correlation is not the appropriate method; the test of association, this crosstabs analysis, is. We've got another two variables here which we're not going to use today, but just to show you: for age you can see the values are not coded, there's nothing there, so that tells us that age is most likely quantitative. Those numbers actually mean something; they're not codes. I've only done the first few there for you. So if age was involved, then crosstabs would not be appropriate. But since our hypothesis is testing whether there's a relationship between smoking and cancer, and both of those are qualitative, crosstabs is the way.
Okay, this being an introduction, let's take things real slow. Let's first of all bring up the counts. To do that, go to Analyze, Descriptive Statistics, and then Crosstabs. We've got rows and columns. It doesn't really matter which one you put where. Put one in rows, put one in columns. I put smoking in rows and cancer in columns, and then just say OK. That should give us a table of the counts.
All right. There are two tables. The first one is not of interest. It's just telling us that we've got a sample of 100 people and there is no missing data. The next one is the crosstabs table. Reading it, we can see that smoking is split into two categories, smoker and non-smoker, and cancer is split into got cancer and did not get cancer. Then we've got totals: the row totals and the column totals. So if we read this table, this number here, 30, says that 30 people were smokers and had lung cancer. This number five here says that five people were smokers and did not get lung cancer. Hope you get the idea. Let's go down to this one. What does that say?
That says that five people were non-smokers and got lung cancer. And this one here, 60, says that 60 people were non-smokers and did not get lung cancer. We know that there are 100 people. Out of those 100 people, how many altogether got lung cancer? 35, because you add up the column of people who got lung cancer: 30 + 5 is 35. That's what this total is doing. It's totting up the column there. I really hate this double-click-to-activate thing that keeps popping up in SPSS. Very annoying. Right, how many people did not get lung cancer? 65. Similarly, how many people in my sample were smokers? For smokers, go along the row: 30 + 5 is 35. And how many people out of my 100 were non-smokers? 65: this row here adds up, 5 + 60 is 65. And then if you look at the sum of the totals here, that's 100, and the sum of the totals there comes to 100, which is as it should be.
Just looking at this gives us an idea that smokers tended to have lung cancer and non-smokers tended not to. But because the total numbers of smokers and non-smokers are different, it's actually easier to make comparisons in percentage terms. To do it in percentage terms, go to Analyze again, Descriptive Statistics, Crosstabs, and now we click on Cells. Under Percentages, let's get Row, Column and Total, and OK.
Now it's a slightly bigger table, but you can see it's the same as the first one because you've still got the numbers there. Look: lung cancer, smokers, 30, five, 60. Those numbers are still there, but now we've got percentages. If we read along where it says 100 here, that's out of all people who smoke. Of all people who smoke, 85.7% get lung cancer, and 14.3% do not get lung cancer. Those two percentages add up to 100. For non-smokers, look, the percentage within smoke again comes to 100. So how is that 100 split up between lung cancer and no lung cancer? Well, of non-smokers, 7.7% have lung cancer.
That's five out of 65, which is 7.7%, and 92.3% of non-smokers do not get lung cancer. That's 60 out of 65, which is 92.3%. So now you can see that smokers have a higher chance of getting lung cancer than non-smokers. For non-smokers the pattern is reversed: most do not get lung cancer. So when we ask whether there is a relationship between A and B, by relationship we mean we're looking for a pattern. In this case it appears from our data that those people who do not smoke have less of a chance of getting lung cancer than those who do.
Now, our analysis doesn't stop there. We haven't actually done any crosstabs analysis at all yet, have we? Because we have not done a test. Well, now we're ready for the test. What we've done so far is look to see if there's a relationship just by looking at the counts and the percentages. But this apparent pattern that we've noted could just be due to sampling error. So we need to test formally whether such a relationship holds, yes or no. If it doesn't, then the pattern that we've just looked at is just due to randomness in the data. It's not actually there. That is why we need to do the test.
So let's go ahead and do it. It's pretty much the same buttons again: Analyze, Descriptive Statistics, Crosstabs. This time we click on Statistics and we request the chi-square there, then say OK. Now you see the first two tables are the same as what we've seen previously. There's an additional table, and this is the chi-square test table. Recall what the null is; you can shout it out and tell me. Yes, the null is that there is no relationship between smoking and lung cancer. We read off the top row here and forget all the other ones.
Pearson chi-square: the test statistic is 60.874 with one degree of freedom, but to be honest those numbers don't interest us in themselves. The test statistic and the degrees of freedom are ultimately there to get the p-value, which comes under what they call asymptotic significance (two-sided). Right there it shows zero. Now recall that if the p-value is low, we reject the null, and the ditty to remember is: if p is low, the null must go. Now, p is low, lower than 0.05. Indeed, it's lower even than 0.01. So what do we do? It's low, so the null must go, i.e. we reject the null in favor of the alternative. So we conclude that there is evidence to support a relationship between smoking and lung cancer.
Once we've done that, we can report it. We can write it like this in a report: there is very strong evidence of a relationship between smoking and death due to lung cancer, open bracket, and then write down the chi-square, comma, the degrees of freedom, comma, the p-value less than 0.01, close bracket. It's really tiny. Or you could replace that with 0.05 if you are more comfortable.
Okay, so let's just look back at what we've done. This chi-square test is to test whether there's an association between two qualitative variables, which may be nominal, ordinal, or a mix of ordinal and nominal. We run the chi-square test, we look at the Pearson chi-square, and the null is that there is no relationship between the two things. If we do not reject the null, then the analysis stops and we say there is no evidence of a relationship between A and B. Full stop. Done. But if we reject the null, there is evidence of a relationship, and then we report what that relationship is by going to the table of counts and reporting a pattern, like we did earlier. So the only time the table comes into play is when we find there is evidence of a relationship.
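For anyone following along without SPSS, the row percentages and the Pearson chi-square read off above can be reproduced by hand. What follows is a minimal sketch in plain Python, not the SPSS procedure itself. The variable names are made up for illustration, and the p-value shortcut using `math.erfc` is only valid because this 2x2 table has one degree of freedom.

```python
import math

# Observed counts read off the crosstab in the video.
# Rows: smoker, non-smoker. Columns: lung cancer, no lung cancer.
observed = [[30, 5],
            [5, 60]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Row percentages: each count as a share of its own row total.
row_pct = [[100 * count / total for count in row]
           for row, total in zip(observed, row_totals)]

# Expected counts if the null is true: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: df == 1, where the chi-square tail probability
# equals erfc(sqrt(chi_square / 2)). Other tables need a general routine.
p_value = math.erfc(math.sqrt(chi_square / 2))

print([[round(p, 1) for p in row] for row in row_pct])
print(round(chi_square, 3), df, p_value < 0.01)
```

The rounded row percentages come out as 85.7 and 14.3 for smokers and 7.7 and 92.3 for non-smokers, and the statistic rounds to 60.874 with one degree of freedom, matching the SPSS output described above.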
Now, basically we're finished, but I want to conclude by pointing out a few things, because this has been a very gentle introduction. There are obviously lots of things we've missed out, but for a first pass through this topic what we've done is sufficient.
I get students saying that because the p-value here is very, very tiny, basically less than 0.01, there is a very strong relationship between smoking and death due to lung cancer. Now why is that not correct? It's not correct because a lower p-value does not mean a stronger relationship. This is only a test to see whether there's evidence of a relationship or no evidence and, if there is evidence, how strong that evidence is. I think that's where students get caught out, especially if their first language is not English. So if the p-value is very low, it does not mean that there is a very strong relationship. It means that there is very strong evidence of a relationship. Let's say that again: the lower the p-value, the stronger the evidence of a relationship. It does not mean the stronger the relationship. That's the first thing.
The second thing: look under this chi-square test table. It says here zero cells (0%) have an expected count of less than five. Now, you will occasionally encounter, especially if you've got some categories with few observations, that some of the expected cell counts are less than five. If that's the case, this chi-square approximation does not work so well, and how you get around that is by collapsing groups. You might have heard about collapsing groups or combining groups. If you're still in the dark there, let's just run this thing again but with expected cell values and you'll see what I mean.
So we'll go to Analyze again, Descriptive Statistics, Crosstabs, and in Cells we want Expected counts. Now look at this table of counts that we've seen previously. Under each cell we've now also got an expected count. These expected counts are the theoretical numbers you would expect to see in the cell if the null is true. So where this number 30 is, it says we observed 30 people who were smokers and had lung cancer, but if we expect there to be no relationship, i.e. if we expect the null to be true, we'd expect there to be around 12.3 people who are smokers and have lung cancer. I say around because obviously you cannot get 0.3 of a person. So, for each cell, the closer the count is to the expected count, the more likely it is that you will not reject the null, i.e. that there is no evidence of a relationship, because the expected counts are generated assuming that the null is true, i.e. that there is no relationship.
Now if we look around here: 12.3, 22.8, 22.8, 42.3. All those expected counts I've just read out exceed five, so we have no problem. But I'm saying that if we have tables where some of the expected cell counts are less than five, then we would have to think about collapsing cells if we can, because we can't always do it.
Okay, next thing, going back to the strength of the relationship. We know the p-value cannot give you the strength of the relationship. But if we find that there is a relationship, we can report it by doing what I did here, very simply, with the percentages. If you want to be more sophisticated, you would use something called the odds and the odds ratio, which you can get from this table. Again, I'm not going to do that in this video.
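The expected counts and the less-than-five check described above can be worked out by hand. Here is a minimal sketch in plain Python under the same 2x2 table. The odds and odds ratio at the end are not computed in the video; they are included only as an illustration of how they could be read off the counts, and all variable names are hypothetical.

```python
# Observed counts from the video's crosstab.
# Rows: smoker, non-smoker. Columns: lung cancer, no lung cancer.
observed = [[30, 5],
            [5, 60]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for a cell if the null is true:
# (its row total * its column total) / overall sample size.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# How many cells have an expected count below five?
cells_below_five = sum(1 for row in expected for e in row if e < 5)

# Illustration only (not done in the video): odds of lung cancer
# within each row, and the odds ratio as the cross-product ratio.
odds_smoker = observed[0][0] / observed[0][1]
odds_non_smoker = observed[1][0] / observed[1][1]
odds_ratio = (observed[0][0] * observed[1][1]) / (observed[0][1] * observed[1][0])

print(expected)
print(cells_below_five)
print(odds_smoker, odds_ratio)
```

The unrounded expected counts are 12.25, 22.75, 22.75 and 42.25, which SPSS displays as 12.3, 22.8, 22.8 and 42.3. No cell falls below five, so the chi-square approximation is not a concern for this table. In this sample the odds of lung cancer are 6 to 1 for smokers, and the odds ratio works out to 72.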
The odds and the odds ratio give you an idea of the strength of the relationship. The other thing you can do is report one of a whole set of statistics that describe the strength of association. If we go back to Descriptive Statistics and Crosstabs and click on Statistics here, can you see Nominal and then Ordinal? Under each heading you've got a whole load of boxes. These are basically options, not to test, sorry, but to indicate the strength of association, whether it's weak, medium or strong, between your two qualitative variables. Well, this is just supposed to be a gentle intro, but I've said quite a bit towards the end.
Finally, based on this test it may appear that there is strong evidence of a relationship between smoking and lung cancer. However, as always with statistics, it's not as simple as that, because although there appears to be a relationship from the analysis we have done, this relationship could disappear when we introduce one or more other variables into the analysis, i.e. if we condition on one or more other variables. Sometimes this is called the third variable problem. In correlation, the analogous thing is the partial correlation. You know, when you're looking at the correlation between A and B, when you introduce C the correlation between A and B could disappear, because there's no real direct relationship between A and B. There just appears to be one because each of A and B is related to C. It's similar here, but again, that would be a bit more advanced than what we're doing today. So let's stop there.
So in this video, just know that crosstabs is to test for a relationship between two qualitative variables, and that we use the chi-square test to do that. The null is that there is no relationship between two things, A and B. The alternative is that there is a relationship between A and B. We look at the chi-square value and its p-value. If we reject the null, it means that there is evidence of a relationship. Then we go into the table of counts, report the percentages and look for a pattern there. Right, well, I hope that has been useful. Thanks for watching.