Sticky Stat Wickets
     
Archived From the First Internet Gallery of Statistics Jokes

OUR SERIOUS STAT BUSINESS

ILLUMINATING DISCUSSIONS OF CONTROVERSIAL TOPICS

DEC 2007........DEGREES OF FREEDOM

Degrees of Freedom is a very slippery concept that always seems on the verge of being mastered only to slither through the fingers of the beginning student. I will attempt to give it a very simple conceptualization and then present a working rule of thumb for counting the degrees of freedom in a variety of situations. In its simplest form, the degrees of freedom (df-value) of a situation is the number of variables that you are allowed to vary freely without restriction. Thus, if I tell you X1 is a variable and you are perfectly free to assign it any number in our number system, you would have 1 df (not an infinity as you might think). Remember, it is the number of variables, not the number of values you can give a variable. OK, now suppose I expand the situation slightly. If I now give you the variables X1, X2, X3, and X4 and again tell you that you can assign each of the four variables any number in our number system, you have now increased your df-value to 4. Get the idea? Now one more example will set this basic definition in stone (you can pick marble if you so desire). Suppose I again give you the same four variables of the previous example and again offer you the opportunity to assign each variable any value to your heart's content, but there is one catch: the mean of the four numbers must be 8 when you end up. So you go on your merry way and give X1 the value 10, X2 the value 5, X3 the value 8, and without hesitation you go with the value 12 for X4. But whoa, the mean of your four numbers is 8.75, not 8! Now you suddenly realize you can't just give that 4th variable any value. You must give it a value that makes the sum of the four values 32 so that when you divide by 4 you get 8. Oh my gosh, you are locked into the value 9 for X4! YOU HAVE LOST A DEGREE OF FREEDOM and really only have df = 3 in this situation. In other words, giving you the opportunity to assign four variables any value but forcing the mean to be 8 is tantamount to losing a df. Knowledge of the mean counts as a restriction and subtracts one from the total df.
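To make that bookkeeping concrete, here is a minimal Python sketch of the last example (my illustration, not part of the original column): three values are chosen freely, and the fourth is forced by the restriction that the mean must be 8.

    # Hypothetical illustration of losing a degree of freedom to a known mean.
    free_values = [10, 5, 8]                  # X1, X2, X3 chosen freely
    required_mean = 8
    n = 4

    # X4 is NOT free: it must bring the total to n * required_mean = 32.
    x4 = n * required_mean - sum(free_values)
    print(x4)                                 # 9 -- the only value that keeps the mean at 8

    df = n - 1                                # knowledge of the mean costs one degree of freedom
    print(df)                                 # 3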

Now I shall move to a more generalized definition of degrees of freedom. The degrees of freedom of a statistic is the number of observations minus the number of necessary auxiliary values, which are themselves based on the observations. This is kind of a nasty statement and somewhat flaky, but don't panic. The rule works for 95% of the situations, and that isn't bad statistically, is it? In the last example the variables are the observations and the auxiliary value is the mean (note it is based on the four observed values), and therefore df = 4 - 1 = 3. More specifically, the estimated standard error of the mean, and therefore the t-statistic here, would have df = 3 and would test a hypothesis about a population mean.

Finally, one more example using this better rule and I shall close up shop for the month. When related pairs are present and your concern is the correlation coefficient r, what is the df for that situation? If I give you the correlated pairs (X1, Y1), (X2, Y2), (X3, Y3), (X4, Y4) and again allow you to assign the correlated values (a wee bit tricky, huh?), the number of observations is 4, not 8, because a related pair counts as one observation (now don't be rigid and not allow this). Also, in computing the sample correlation coefficient r there are two auxiliary values: the slope of the best-fitting straight line to the scatter plot of data and the Y-intercept. In other words, in a situation such as a t-test about the correlation coefficient r, the degrees of freedom is df = 4 - 2 = 2, or in general df = N - 2, where N is the number of correlated pairs and 2 is the number of auxiliary values. With this definition you must expand your notion of an observation and be cautious about new auxiliary values. Next month I will use this new rule to explain the df for the standard errors of some test statistics.
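As a small aside of mine (not in the original column), the df = N - 2 rule for r is exactly what standard software uses; the usual significance test for r, the one scipy's pearsonr reports a p-value for, is equivalent to a t-test with N - 2 degrees of freedom. A sketch with four made-up pairs:

    from scipy import stats

    pairs = [(10, 8), (8, 6), (5, 4), (12, 12)]   # four hypothetical correlated (X, Y) pairs
    X, Y = zip(*pairs)

    r, p = stats.pearsonr(X, Y)    # sample correlation and its two-tailed p-value
    N = len(pairs)                 # observations = number of PAIRS, not 2*N separate scores
    df = N - 2                     # two auxiliary values: the slope and the Y-intercept
    print(round(r, 3), df)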

HAPPY HOLIDAYS


JAN 2008........COUNTING DEGREES OF FREEDOM FOR A TEST STATISTIC

Now that we all know something about the concept of degrees of freedom (df), I will show you how it plays a critical role in many statistical hypothesis tests. As you will soon see, the df-value is usually a function of the sample size N. Many test statistics like t, F, and Chi-Square have distinct df-values associated with them that must be determined in a given situation (in the case of F, it even has TWO distinct df-values...can you imagine that!). The df-value(s) is then entered into a table in the appendix of your book along with the level of significance to determine what critical value is needed to declare statistical significance. I shall use the dear old Student t given to the right to illustrate how you count this df-value for a few tests. The key with any t-test is to count the df-value of the ESTIMATED standard error, which is the denominator of the ratio given to the right (indicated by the tilde sign). This then becomes the df-value for the test itself. Recall that a standard error is nothing more than a highly specialized standard deviation of the sampling distribution of the test statistic. Think of this formula as a generic template for ANY t-test, with T standing for the test statistic in any given situation. In words, this formula is saying: to calculate a t-ratio, take the observed value of the test statistic T, subtract the hypothesized population mean of the test statistic T, and then divide this result by the ESTIMATED standard error of the test statistic T. Notice the emphasis on "ESTIMATED". If this were the EXACT standard error (no tilde), the statistic would become the well-known standard normal z-test. Remember again, for a test to be considered a t-test, it must be capable of being placed in the general format of this formula and the denominator must be an ESTIMATED standard error. As you might guess, the t-ratio at first glance could easily be mistaken for a z-ratio except for that little wiggle above the standard error indicating ESTIMATION. In fact, the distribution curves of both t and z are very similar (bell-shaped) except the t curve has more area in the tails as a function of the df-value. The greater the df-value, the more alike the two curves become. Don't tell anyone this, but a standard normal z-curve is really a special case of a t-curve with df = infinity. Now isn't that the cat's meow!
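The template referred to as "given to the right" was a figure and is not reproduced in this text-only archive; based on the description above, it presumably reads along these lines (my reconstruction):

    t = (T - μT) / est. SE(T),   with df equal to the df of the estimated standard error

where T is the observed test statistic, μT is its hypothesized population mean, and est. SE(T), the quantity carrying the tilde, is the ESTIMATED standard error of T.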

I shall now turn to the most basic t-test of them all, displayed to the right: the t for testing a hypothesis about a single population mean. Recall that basically degrees of freedom is the number of variables that are allowed to vary freely without restriction. In hypothesis testing we usually work with a random sample (or samples) of scores of some sort. Here think of each score in the sample as a variable that is capable of taking on any value. Thus, each score becomes an observation and the total number of observations is the sample size N. Now the only necessary auxiliary value in this case is the sample mean. Hence, invoking the principle from last month that the df-value of a statistic is the number of observations minus the number of necessary auxiliary values, the df-value of the estimated standard error in the denominator of this ratio is N-1, which becomes the df-value for this basic test. This ratio, for example, might be used to test the null hypothesis that a population mean of IQ scores is 100, which would be substituted on the right in the numerator. Of course, the sample mean and standard deviation s would be calculated from the data and plugged in also.
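As a quick cross-check of mine (not part of the original column), scipy's one-sample t-test reproduces this counting for the IQ example; the data below are made up, and note that the t-ratio comes out the same whichever definition of s your textbook uses, since the two conventions give identical estimated standard errors.

    from scipy import stats

    iq = [102, 98, 110, 95, 105, 99, 108, 101]              # hypothetical sample of N = 8 IQ scores
    t_stat, p_value = stats.ttest_1samp(iq, popmean=100)    # H0: population mean IQ = 100
    df = len(iq) - 1                                         # observations minus one auxiliary value (the mean)
    print(round(t_stat, 3), df, round(p_value, 3))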

Several interesting observations are in order. Last month we determined that the sample standard deviation s had df = N-1, which is the same df as the estimated standard error of the mean in this test. Also, if you suddenly go brain dead and forget the df-value for this test, it is staring you right in the face in the denominator of the formula. This is not by chance but occurs quite frequently with t-tests. Pretty nifty huh?

OK, since things are going so smoothly, I next want to discuss the most widely used t-test in the literature...the so-called independent samples t. It is used to test the hypothesis that there is no difference in the means of two distinct populations (i.e., the hypothesized difference of 0 is plugged into the right side of the numerator). The formula to the right admittedly looks a little scary, but again it is nothing more than an iteration of the basic t template with the difference in the sample means serving as the test statistic. Here the observations are the scores in both samples and the auxiliary values are the two separate sample means. Thus, by our rule the df-value for the estimated standard error and the test is N1+N2-2. Now that is pretty slick! The ingredients you need to calculate this t are the two sample means and the two sample variances and of course the two sample sizes. Again popping out like a neon light from the formula is the df-value from the denominator to jog your memory. This test finds many applications. When you have two separate random samples of scores, as in Experimental and Control groups or two different treatment groups, and you desire to test the significance of the difference in the two means, this t becomes the star of your stat world.
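A minimal sketch of mine, with made-up scores, of this test in Python; scipy's classic pooled-variance version is the one described above, and its df is N1 + N2 - 2.

    from scipy import stats

    experimental = [12, 15, 11, 14, 16, 13]    # hypothetical treatment-group scores (N1 = 6)
    control      = [10, 9, 12, 11, 8]          # hypothetical control-group scores (N2 = 5)

    t_stat, p_value = stats.ttest_ind(experimental, control)   # pooled-variance t (the default)
    df = len(experimental) + len(control) - 2                   # N1 + N2 - 2 = 9
    print(round(t_stat, 3), df, round(p_value, 3))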

Another relatively important t-test is presented that tests the hypothesis that there is no difference in the means of two correlated populations. To conduct this test, we must use the framework of the correlated pairs of (X1, X2) scores which was discussed last month. Fortunately, in this situation you are allowed to compute a difference score (D) for each related pair in the sample and subsequently work with the sample D's from then on out. In essence you have reverted back to the simple t-test with D's taking the place of X's. Thank God for little favors. Without this simple move, you must use an alternate method which requires that you compute the correlation coefficient and treat the X1's and X2's separately. Believe me, unless you do this on a computer it is a statistician's nightmare and requires three times the work. Returning to the main problem of getting the df-value here: an observation becomes a D-value, of which we have N, and we have one auxiliary value, which is the mean D. Thus the df for the estimated standard error is N-1, which becomes the df-value for the test. Beware of something with the calculation of t. You are working with a sample of D's, so each difference must be computed in the same order, and you will probably end up with positive and negative D's which must be accounted for. The sample mean D and the sample standard deviation of D, along with 0 for the hypothesized value of the population mean D, are substituted in the formula and the value of t rolls out. This test is employed when you have a pre-test and post-test situation for a number of subjects or when you have subjects that are matched on another variable prior to administering two treatments. A common mistake with this test is to treat the X1's and the X2's as independent samples, use N+N-2 or 2N-2 as the df-value (too large), and employ the independent samples t-test above. This would be a positively biased test and result in too many Type I errors.
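Again a hedged sketch of mine with invented pre/post scores; scipy's related-samples test is literally the simple t on the D's, with df = N - 1.

    from scipy import stats

    pre  = [20, 18, 25, 22, 19, 24]       # hypothetical pre-test scores
    post = [23, 19, 27, 25, 18, 28]       # post-test scores for the SAME six subjects

    D = [b - a for a, b in zip(pre, post)]         # one signed D per related pair
    t_stat, p_value = stats.ttest_rel(post, pre)   # equivalent to a one-sample t on the D's
    df = len(D) - 1                                # N - 1 = 5, NOT 2N - 2
    print(round(t_stat, 3), df, round(p_value, 3))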

One Last Critical Note: The above test involves correlated or related pairs and then obtains D-scores and the mean D. This t-test has df = N-1 and tests a hypothesis about a population mean difference. Shown below is one last t-test (for now anyway), the t for testing the hypothesis that a population correlation coefficient is zero. It also involves correlated or related pairs and the sample correlation r, but employs a t-test with df = N-2, as we explained in an earlier Sticky Wicket. Many folks get the values N-1 of the former and N-2 of the latter confused since they both involve related pairs of scores, but remember there are two distinctly different hypothesis tests being performed here. Now you know the rest of the story!
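The formula "shown below" was a figure and did not survive in this archive; the usual form of that test (stated here from memory, not copied from the original) is

    t = r·√(N - 2) / √(1 - r²),   df = N - 2,

where r is the sample correlation and N is the number of related pairs.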

Well, that concludes my ramblings for January. I hope you are realizing that statistics has many recurring themes. Certainly the principle for counting degrees of freedom is one of them. You all should now be experts in counting degrees of freedom, at least when you perform William Sealy Gosset's celebrated t-test.


FEB 2008........ N VS. N-1

What, you say! You have to be kidding. You are making an issue of the number of scores in the sample and ONE LESS THAN THE NUMBER OF SCORES? How can that make a penny's worth of difference except in situations where the sample size is extremely small? How in the world can this be classified as a Sticky Wicket?

Well, I understand where you are coming from in the moderate to large sample situation, but these two quantities have caused students of statistics more problems and confusion than a barrel of monkeys, particularly when the students have used several textbooks in a course or in different statistics courses. The crux of the issue, which an author generally makes no mention of, is that the sample variance and standard deviation can be defined two different ways. This in turn makes subsequent formulas such as estimated standard errors (or error variances) look "seemingly" different depending on the definition, when in reality the formulas are equivalent. The reason I feel that this issue requires discussion is that, to my knowledge, I have not seen a good explanation of this problem in any statistics textbook, and it will save you questioning whether there are typos on many pages of the book. Let us then examine each definition, see where it takes us, and talk about the positives and negatives of each choice. Here you are going to see an issue that many statisticians are split on. If I were to guess, I would say that the statistical community is about 50/50 on this one!

Now look at the two methods that are labeled (A) and (B) to the right. One thing both methods have in common is the sum of the squared deviations of the scores about the mean (∑x²). Three Cheers! In other words, statisticians pretty much agree that in most situations, in order to measure how variable a set of scores is, you first must take into account each and every score in the sample. That is, you find out how far each score is above or below the mean of the sample (a deviation score). Then you square each of these deviation scores and sum the squared deviations. This is the direct or "brute force" method of computing this quantity and it involves far too many messy decimals. It is far easier to make this computation with only the raw scores and not fuss with the mean. You get ∑X and ∑X² and employ STEP ONE of the World Famous Three Step Method (i.e., ∑x² = ∑X² - (∑X)²/N). See Step One WFTSM for an example of this calculation.
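Here is a minimal Python sketch (my illustration, any small sample will do) showing that the raw-score shortcut of STEP ONE lands on exactly the same number as the brute-force deviation method:

    scores = [10, 8, 5, 12, 4]
    N = len(scores)
    mean = sum(scores) / N

    brute_force = sum((x - mean) ** 2 for x in scores)                 # squared deviations about the mean
    step_one    = sum(x ** 2 for x in scores) - sum(scores) ** 2 / N   # STEP ONE: raw scores only

    print(brute_force, step_one)    # identical: 44.8 and 44.8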

Now the two methods part company. In (A) we divide ∑x² by the sample size N and this produces the sample variance s². Since this index is in squared units, if we want an index in the original score units we extract the square root and have the sample standard deviation s. Division by N in this process makes sense logically because then we are able to state that the sample variance is the average squared deviation of the scores in the sample about the mean. This just shouts that it is measuring variation, and it also just feels like a meaningful way of getting at the spread of a set of scores. Also, it is valid when you have an N of 1, since the variance and standard deviation would both be 0, which upon reflection is exactly what they should be.

Turning to method (B), we divide ∑x² by N-1 to get s² and then take the square root if desired to obtain s. However, the N-1 just seems nonintuitive. You cannot now neatly interpret the sample variance as an average, and the two formulas seem to lose their logical appeal. In addition, if N is 1 then the variance and standard deviation are both undefined because you are dividing by 0. Why then would anyone employ (B) to define the sample variance and standard deviation? I am going to whisper this, but there is one slight advantage of (B). The reality of the matter is that with (B) you really have calculated an unbiased estimate of the population variance, and very close to the same for the population standard deviation. So some authors feel that this method bypasses the sample index and moves directly to the population estimate. Thus, when authors label s² = ∑x²/(N-1) and the subsequent square root as the sample variance and sample standard deviation, they are really somewhat disingenuous in doing so.

I will illustrate the confusion that the two definitions can create when you are reading different books. If the author uses (A), the estimated error variance of the mean is given by s²/(N-1), whereas if the author prefers (B) the same estimated error variance of the mean is s²/N...two seemingly different results! But wait, two different definitions have been used for s². REALLY THE TWO RESULTS ARE IDENTICAL! To show this, take the former result and substitute (A) for s²; we have ∑x²/[N(N-1)]. Now take the latter result and substitute (B) for s²; we have ∑x²/[(N-1)N]...precisely identical results. Sooo...(what Steve Jobs would utter) what does all this mean? THE FIRST THING A PERSON SHOULD CHECK UPON OPENING A STATISTICS TEXTBOOK IS TO SEE WHAT STANCE THE AUTHOR TAKES ON THE N VS. N-1 ISSUE IN DEFINING THE SAMPLE VARIANCE AND STANDARD DEVIATION. My opinion favors division by N, but about half of the textbooks use division by N-1, so be prepared to make adjustments in your thinking. Statisticians end up at the same place on this one but sure create some illusions along the way. Thanks for reading my blurb and see you next month.
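A numerical check of that identity (my sketch, arbitrary data), showing that a book using (A) and a book using (B) report the very same estimated error variance of the mean:

    import numpy as np

    scores = np.array([10, 8, 5, 12, 4])
    N = len(scores)
    ss = np.sum((scores - scores.mean()) ** 2)   # the sum of squared deviations both camps agree on

    var_A = ss / N                # definition (A)
    var_B = ss / (N - 1)          # definition (B)

    print(round(var_A / (N - 1), 2))   # book using (A): s^2/(N-1) = 2.24
    print(round(var_B / N, 2))         # book using (B): s^2/N     = 2.24  ...identical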


MAR & APR 2008........THE DEMISE OF THE CONFIDENCE INTERVAL

In inferential statistics, there have been two primary methodologies for gaining knowledge about population parameters. However, hypothesis testing has become the dominant force over confidence intervals throughout the latter half of the 20th century and into the 21st century. In fact, in most disciplines, testing null hypotheses has become the exclusive method of choice in almost all of the research literature. The current textbooks have very little to say about confidence intervals. If they do, it is in the form of a token short discussion or footnote. What has happened to a procedure that once was favored by mathematical statisticians and had an entire chapter devoted to it? Let us take a look at this procedure and see what difficulties have caused it to fall out of favor.

We will present a simple example of calculating upper and lower limits of a 95% confidence interval for a population mean μ. The figure at the right displays a standard normal curve of z-scores with two examples of useful percentiles that would be needed to obtain a 95% confidence interval. The first is called z.025 = -1.96 and by definition is the point on the z-scale such that 2.5% (.025) of the area falls below it (remember the total area under this curve is 1, so areas correspond to probabilities). Now at the upper end we have z.975 = +1.96, or the point on the z-scale such that 97.5% (.975) of the area falls below it (the upper blue area is therefore .025). The -1.96 and +1.96 come from the standard normal curve table and were perhaps memorized by some of you. Also, the middle white area (called Δ or the confidence coefficient) then becomes 95% or .95. Note that in building a confidence interval, Δ is selected first and the tail areas are always equal. Some other commonly used percentiles that may be dear to your heart from the tables are z.005 = -2.58 and z.995 = +2.58 with a middle area of 99% or Δ = .99, and z.05 = -1.64 and z.95 = +1.64 with a middle area of 90% or Δ = .90. Great memories, huh? Now returning to the pictured example: if a random z is drawn from this distribution, the probability that the z will fall between -1.96 and +1.96 is .95, or mathematically, P(-1.96 ≤ z ≤ +1.96) = .95.

Next, moving to another 3-Step Procedure displayed to the right (notice never 2, never 4, always 3 steps for nice psychological closure): draw a random sample of size N from a population with known σ. Then convert the sample mean to a z in the previous probability statement and get statement (1) for a result, namely P(-1.96 ≤ (X̄ - μ)/(σ/√N) ≤ +1.96) = .95. Then, solving this three-way inequality with some simple algebra and getting μ smack dab in the middle by itself and everything else on the ends, we arrive exactly where we want to be with statement (2): P(X̄ - 1.96σ/√N ≤ μ ≤ X̄ + 1.96σ/√N) = .95. These end expressions are indeed the formulas for the lower and upper limits of a 95% confidence interval for μ. They are pulled out and stated for emphasis in statements (3): LL = X̄ - 1.96σ/√N and UL = X̄ + 1.96σ/√N. To cement these formulas in our minds, let's do a simple example. Suppose we have a population of IQ scores with an unknown μ and σ = 16. We want to generate a 95% confidence interval for the population mean μ. If a random sample of N = 64 is drawn and X̄ is computed to be 98.7, we substitute into statements (3):

    LL = 98.7 - 1.96(16/√64) = 98.7 - 1.96(2) = 98.7 - 3.92 = 94.78
    UL = 98.7 + 1.96(16/√64) = 98.7 + 1.96(2) = 98.7 + 3.92 = 102.62
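For readers who like to check such arithmetic in software, a short sketch of mine with scipy reproduces the same limits (the exact z is 1.95996..., hence the match to two decimals):

    from scipy import stats

    sigma, N, xbar = 16, 64, 98.7
    se = sigma / N ** 0.5                        # exact standard error of the mean: 2.0
    ll, ul = stats.norm.interval(0.95, loc=xbar, scale=se)
    print(round(ll, 2), round(ul, 2))            # 94.78 102.62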

Now the fun begins, folks, when we try to interpret these results. But you say, "This is a snap. We simply say the probability that the population mean μ is between 94.78 and 102.62 is .95." But wait, I hate to inform you that the population μ is a fixed parameter, and either it is between 94.78 and 102.62 ahead of time, in which case the probability is one, or it is not between 94.78 and 102.62 ahead of time, in which case the probability is zero. Keep in mind probabilities refer to random variables, and the mean μ is a fixed constant even though we don't know what it is. In other words, we cannot associate a probability with any single pair of limits. This seems like a minor problem, but to many experts it is a real deterrent to using confidence intervals. Now we could replicate the experiment and obtain several sets of limits. Would this add any information? Certainly it would, but each pair of limits would be subject to the same criticism. But if I did collect an infinity of limits from N's of 64, 95% of the limits would contain the true value of μ. This is a true statement, but many would deem this fact essentially useless.

The big advantage of a hypothesis test where an H0 is tested against a two-tailed alternative is that you do end up with an observed test statistic that has a probability associated with it when you reject or retain the null hypothesis. This method appears to appeal to many researchers even though we all know one hypothesis test does not prove anything. It is my speculation that the language itself of hypothesis testing has a certain degree of strength and finality associated with it. Expressions such as "Reject H0: μ1 - μ2 = 0 at the .05 level of significance and accept the alternative H1: μ1 > μ2" have a ring of authority linked to them. Recall also, thanks to Neyman and Pearson, we have our dear old friends Type I Error, Type II Error, and the Power of the test. It is indeed sad that the confidence interval approach has no such counterparts. In addition, the terminology of reject or retain H0 seems to mesh with complex ANOVAs where multiple comparisons are performed following a significant overall test. For these reasons, and perhaps others that I have overlooked, hypothesis testing currently is the KING of the HILL with statisticians.

I would like to give you one advantage for the lonely confidence interval before I close shop. Assume the limits of the previous example where Δ = .95. If another reader reads these results and desires to hypothesis test instead, the results can be predicted very easily. Remember, confidence intervals by nature are two-tailed and must be compared with a two-tailed hypothesis test. If the reader wants to test H0: μ = 100 with .05 as the level of significance against a two-sided alternative, retention of H0: μ = 100 would be predicted because 100 is contained between the limits of 94.78 and 102.62. If the reader desires to test H0: μ = 104 against a two-sided alternative, rejection of H0: μ = 104 would be predicted and acceptance of H1: μ < 104 would be supported, since 104 is above both limits of 94.78 and 102.62. This may be continued on and on. Thus, the reader may very quickly and easily test any null hypothesis that his heart desires with the single set of data and limits given. Mathematicians have always thought this was pretty neat. However, it has not caught on in other disciplines, and this interpretation has not helped the cause for confidence intervals.
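That reader's shortcut fits in a couple of lines of code; here is a sketch of mine (the function name is made up) that predicts the two-tailed decision at the matching significance level:

    def two_tailed_decision(mu0, lower, upper):
        """Predict the two-tailed test outcome for H0: mu = mu0 from a confidence interval."""
        return "retain H0" if lower <= mu0 <= upper else "reject H0"

    print(two_tailed_decision(100, 94.78, 102.62))   # retain H0: mu = 100
    print(two_tailed_decision(104, 94.78, 102.62))   # reject H0: mu = 104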

Thus, we conclude our cases for both methodologies of inference. I must admit I also favor hypothesis testing but who knows where we will be in ten years. Maybe we will turn to Tukey's Exploratory Data Analysis and refine sampling procedures to such a point where we do not even have to use inferential statistics. Now that would be a monumental advance. Meanwhile, thanks again for reading this presentation and HAPPY INFERRING!


MAY & JUNE 2008........A LEAN AND MEAN BASIC STATISTICS COURSE

In this month's Sticky Wicket we shall discuss one of my pet peeves in the area of statistics education. Just how many, and what, topics should be covered in the basic applied statistics course at the undergraduate level? This has been a troubling problem among the experts throughout my entire career, but I have not shifted my position one iota in the last 30 years. I do not subscribe to the so-called comprehensive or "waterfront" course where you try to survey most of the statistical techniques and touch upon almost every imaginable topic. This is next to worthless. A good beginning course will teach the small set of recurring statistical themes that are invariant over a wide variety of fields such as psychology, biological science, political science, and yes, even art and music. Statistics is statistics. There are only slight nuances between different fields, with certain applications employed more often in some fields than in others (e.g., multiple regression in economics). Surprisingly, in the basic course, there are a FEW critical topics and skills that are fundamental and require mastery with a ton of practice. In this plan FEW is MORE! This type of basic course has the advantage of giving the student confidence and a solid footing in a core of topics that pop up over and over again in statistical analysis. Think of this course as the surface of a ball. You are flying a plane above the surface and darting up and down, in and out erratically, at varying heights. You want to land and enjoy the commonality of the surface (WOW, what a stimulating analogy).

Now let us look at this small glob of critical topics and skills that should be the focus of the course. I will present these in a sequential fashion but there is some flexibility in how they are ordered:

TOPICS FOR A BASIC APPLIED STATISTICS COURSE

(1) Collecting and Organizing Data.

(2) Picturing Distributions of Scores through Polygons, Histograms, Stem-and-Leaf Displays, and Box-and-Whisker Plots.

(3) Describing the Central Tendency of Distributions (Mean, Median, and Mode) and Examining Skewness and Kurtosis.

(4) Variability - What Makes the Whole Field of Statistics Tick. The Most Important Skill of All--- Applying WFTSM Which is The World Famous Three Step Method Used to Calculate the Standard Deviation. (The Golden Key is Step One, which is at the top of the heap as far as important formulas in Statistics go)

(5) Interpreting a Score's Location in a Distribution - Percentiles and Standard Scores (Primarily z-Scores)

(6) The Normal Curve and Reading Out Probabilities from Under the Curve.

(7) Simple Hypothesis Testing with the z-Test using LFFSM, which is the Locally Famous Five Step Method, another Critical Skill Almost as Important as WFTSM. It is imperative that you have knowledge of three important aspects of the Sampling Distribution of the test statistic: Form, Mean, and Standard Error. If these three have not been at least estimated by mathematical statisticians for the statistic, all bets are off and a hypothesis test cannot be employed with this particular statistic. Fortunately, the common sampling distributions discussed here have been thoroughly worked out and described.

(8) The t-Test and Reading the Table. Coverage of the Related and Independent Samples Tests.

(9) Correlation and the Importance of Step One's Cousin (the Sum of Products of the Deviation Scores) in the Calculation of the Correlation Coefficient.

(10) Simple Regression Analysis with One Predictor Variable.

Well, there you have my Ten Super Topics that give a student the solid underpinnings of statistical thought and allow him to easily move into more advanced areas. But wait, you say, there are so many topics being left out, such as One- and Two-Way Analysis of Variance, Confidence Intervals, the Chi-Square Statistic, the Power of the Test, Non-parametric Statistics, Follow-up Tests in ANOVA, and on and on. No doubt these are important, but not core in the sense of lower-level themes. If you made time for some or all of these more advanced topics, the course would evolve into a hodgepodge of techniques with only the surface being scratched on each one. Precisely what you don't want at the basic level. You want depth in the above TEN topics. After all, there are entire courses devoted to Analysis of Variance and Covariance, called Experimental Design, and also semesters directed at Nonparametric techniques. There is a time and a place for these courses, but don't muddle the beginning student's mind with the whole ball of wax in one semester. Allow the student to have some fun and ensure that he walks away with a good impression of the statistics field. Thanks for your attention.


JULY & AUGUST 2008........STEP ONE'S COUSIN AND HOW IT SPAWNS COVARIANCE AND CORRELATION

We have repeatedly praised in these pages the World Famous Three Step Method (WFTSM) and its gold-studded STEP ONE, ∑x² = ∑X² - (∑X)²/N, as perhaps the single most important computing sequence in basic statistics (see STEP ONE AND WFTSM for an example of this calculation). For a small set of scores, this procedure is really cool. If a scientific calculator is employed, the student enters each score one at a time in the calculator and the N, ∑X, and ∑X² are stored in separate memories of the calculator. Then when all the scores are entered, the student can punch a single key, and the standard deviation s will pop up on the display. You even have a choice of division by N or N-1 depending on your definition of s. Now that has to be so SLICK! What is going on here is the calculator is just moving through WFTSM when the last key is pushed. Just so you don't get too big a high on this recent news: if you are faced with large sets of data and many groups, with ANOVAs and other procedures to carry out, the mainframe or a smaller computer is probably your best choice. Programs such as SPSS and SAS are then very useful. However, we now want to show you how STEP ONE can logically lead you into another critical computational routine.

Consider the score format that has been visited before and that we have termed related or correlated pairs. That is, we are given (X1, Y1), (X2, Y2),...(Xi, Yi),...(XN, YN). Now your dear little $15 calculator can still easily handle the task of entering the X member of the pair with one key and the Y member of the pair with another key until all pairs are entered. Then we can retrieve the following descriptive indices by pushing 4 different keys: X̄, Ȳ, sX, and sY. Now that is a pretty impressive array of indices. But recall that each of these pertains to either the separate X scores or the separate Y scores. We have no information on how high (or low or intermediate) the X score is relative to its mean compared with how high (or low or intermediate) the paired Y is relative to its mean. Putting this in very crude language, do the pairs of scores tend to be high together, low together, and intermediate together, or a completely different pattern such as the pairs being high and low together or low and high together? I hope you can see that the 4 basic indices do not touch on this type of "togetherness or covariability" relationship. Let us try out a calculation that may get at what we want...the sum of the products of the X and Y deviations about their respective means, or in formula form:
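(The formula itself was a figure in the original and is not reproduced here; from the description, it is presumably ∑xy = ∑(X - X̄)(Y - Ȳ), the sum over the N pairs of each X deviation times its paired Y deviation.)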

This calculation can either be a positive number or a negative number, unlike ∑x² and ∑y², which are ALWAYS POSITIVE! So this has great promise for doing what we want it to. However, this is usually referred to as a "thinking" formula because it allows you to see exactly how to calculate it directly, but a direct calculation often is a very messy creature. Here we generally get decimals for both means, then we must subtract a decimal from each raw score for both the X's and Y's resulting in signed decimals, next we must find the products of these decimals again paying close attention to the signs, and finally, sweating profusely, we add up the whole batch of signed decimal products to arrive at the final ∑xy. Whew!!!

Fortunately, we are blessed with a neat computational formula, displayed to the right, where the ingredients are stored in memories in the calculator as you enter the pairs of scores (the proof will be omitted). Some calculators will (some won't) allow you to push still another key so that ∑xy appears upon the display. If not, you can still pull out from the memories the sum of raw score products and the sums of the raw scores (i.e., ∑XY, ∑X, and ∑Y) and finish the simple calculation on the right. Remember that in this formula most raw scores will be whole numbers, so this formula will be comparatively easy if done separately by hand on the calculator. Oh, I must mention we have finally arrived at what I call "STEP ONE'S COUSIN" because the procedure is so "analogously similar" (neat expression, huh?) to "STEP ONE". In other words, the right-hand side of the ∑x² formula involves the sum of raw squares and the square of the raw sum, whereas the right-hand side of the ∑xy formula involves the sum of raw products and the product of the raw sums. Hope you can see how similar they are! If STEP ONE is gold-studded then STEP ONE'S COUSIN must rate silver-studded!!!

Now we present two widely publicized formulas that are just tiny steps away from STEP ONE'S COUSIN and will give the measures of "togetherness" that we want for the X and Y pairs. Examine the formulas that are labeled (A) and (B) below:

To obtain result (A), we simply divide STEP ONE'S COUSIN by N, the number of pairs of scores, and this produces the widely known Covariance of the X and Y pairs. In simple language, the Covariance is the mean product of the deviations of the X and Y scores about their respective means. Some authors refer to this as the mean cross product of the deviation scores. Recall that ∑xy/N can either be positive or negative and can range between negative infinity and positive infinity. If your calculator is top of the line, it possibly has a button that will recall this result. But don't count on it. The Covariance is a very crucial index when you have a multiple number of variables. For example, with 4 variables we would arrange the 4 Variances down the main diagonal of a matrix with the 6 possible Covariances located in the off-diagonal positions of the matrix. A wealth of information is contained in this 4x4 variance-covariance matrix, with the Variance of each individual variable and the Covariance of all possible pairs of variables being displayed. Matrix algebra becomes the mode of operation when you delve into multivariate analysis.

Now in result (B) we move one more tiny step and divide the Covariance of X and Y by the product of the standard deviations of X and Y. Putting it in statistical language, we are simply standardizing the covariance with this maneuver. Lo and behold, the result may surprise you. We have now arrived at one of the most celebrated statistics ever employed...the Pearson Product-Moment Correlation Coefficient. This index, of course, behaves very well: a full range of values between -1 and +1 may occur, including the value of 0 as a possibility. A high positive index such as +.90 or +.80 would suggest that high X's occur very frequently with high Y's, intermediate X's occur often with intermediate Y's, and low X's tend to be paired with low Y's. An inverse or negative correlation such as -.85 or -.90 would suggest low X's being paired with high Y's and high X's being paired with low Y's. A 0 index suggests no correspondence whatsoever. That is, given a high or low X value, it is impossible to predict where the Y will be. In terms of a scientific calculator, a high-end unit will almost always give you a button that will crank out the (B) result after all pairs are entered.

Finally, we shall calculate an example to show you how things work but will use a reasonably small set of pairs so you can use any type of calculator including a basic $5 unit. Please realize you will have to make three passes at the data if you use an el cheapo unit but it is still doable. Here are the paired data or the (Xi, Yi)'s which you may think of as pretest posttest scores for 10 individuals:

(10, 8) (8, 6) (5, 4) (12, 12) (4, 5) (3, 5) (14, 9) (12, 8) (6, 8) (12, 10)

After all the pairs of scores are entered, we recall from the calculator memories the following basic calculations and statistical indices:

∑X = 86, ∑X² = 878, ∑Y = 75, ∑Y² = 619, ∑XY = 717, X̄ = 8.6, Ȳ = 7.5, sX = 3.72, sY = 2.38 (WFTSM steps will be omitted and only the last new formulas will be shown.)

Now for the three new calculations of this entire blurb, substituting from the above:

STEP ONE'S COUSIN
∑xy = ∑XY - (∑X)(∑Y)/N
        = 717 - (86)(75)/10
        = 717 - 645
        = 72

(A) COVARIANCE
COVXY = ∑xy / N
              =72 / 10
              =7.2

(B) CORRELATION COEFFICIENT
r = COVXY / (sXsY)
  =7.2 / [(3.72)(2.38)]
  =7.2 / 8.85
  =.814   This r of .814 would indicate there is a fairly strong tendency for high X's to be paired with high Y's and low X's to be paired with low Y's.

With this example we now finish our presentation of three very useful formulas that grew out of moving from a single set of scores representing a single variable to a format of pairs of scores representing two different variables. We have seen that statistical indices are important for each set of scores separately, but now we also need indices that measure the so-called "togetherness" relationships between the two variables. This need has brought into play STEP ONE'S COUSIN and, sequentially, the Covariance and the Correlation Coefficient. This wicket has indeed become a little more sticky. In fact, a point of confusion is that these three formulas take on many different forms, and in each case an equivalent form may look nothing like the original symbolically. These formulas require much study and practice to develop depth of understanding. I will leave you with a neat little exercise just for fun. The formula for the correlation coefficient r can be thought of as STEP ONE'S COUSIN divided by the product of the square roots of two different STEP ONES! See if you can verify this goofy statement in your mind. Thanks for reading this somewhat rambling presentation and please tune in again.
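As a closing check of mine (not part of the original column), the worked example above can be reproduced in a few lines of Python, which also verifies the "goofy statement" about r being STEP ONE'S COUSIN over the square roots of two STEP ONES:

    import numpy as np

    X = np.array([10, 8, 5, 12, 4, 3, 14, 12, 6, 12])
    Y = np.array([ 8, 6, 4, 12, 5, 5,  9,  8, 8, 10])
    N = len(X)

    cousin = np.sum(X * Y) - X.sum() * Y.sum() / N      # STEP ONE'S COUSIN: 72.0
    cov_xy = cousin / N                                  # Covariance: 7.2
    step_x = np.sum(X ** 2) - X.sum() ** 2 / N           # STEP ONE for X: 138.4
    step_y = np.sum(Y ** 2) - Y.sum() ** 2 / N           # STEP ONE for Y: 56.5

    r = cousin / np.sqrt(step_x * step_y)                # the "goofy statement" in action
    print(round(cov_xy, 2), round(r, 3))                 # 7.2 0.814
    print(round(np.corrcoef(X, Y)[0, 1], 3))             # 0.814 (numpy cross-check)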


SEPTEMBER & OCTOBER 2008........ WHY VARIATION IS THE KEY TO STATISTICAL METHODOLOGY AND ONE MORE FUN INDEX

During my career as a statistics professor I have had one question that has repeatedly been directed my way by students, academicians, and professional people...WHY DO WE NEED STATISTICS? My answer has evolved over the years from the simplistic (we need it to summarize data and make inferences about large populations) to a more research-oriented reason (we need it to intelligently read and interpret inferences from the scholarly literature in our own particular disciplines). Somehow I have not felt entirely comfortable with these reasons because they appear to be effects of statistical acumen rather than a primary need for statistics itself. I have finally, after extensive hand-wringing, come up with a more basic reason for the very existence of statistics...Statistics is needed because individuals and objects VARY on traits and characteristics. There it is...I finally hit the nail on the head without smashing my finger. Statistics is all about variation and what to do with it. Imagine a world where all people were the same height, same weight, same intelligence, same hair color, and on and on, or every car were the same color (black according to Henry Ford), same body, same engine horsepower, same number of doors, same dashboard and accessories, etc. Now think about this horrendous scenario for every trait of every individual and every characteristic of every object. Could we even survive in a boring, convoluted world like this?? I think not. So that is why, when you progress through several statistics courses, much time is devoted, sometimes unknowingly, to variation and how it is applied to the particular procedures that are presented to you. Yes, variability RULES and our world will always need it to be appropriately understood.

Now I want to take you on a stroll through some of the statistical indices that are closely tied in with variation in the first several courses. You will be surprised at the sheer number of these; some are very familiar and others may surprise you as being associated with variation. Remember that all measures of variability should represent the spread or clustering of a set of scores and should be capable of being viewed as a distance on a score scale. I will present the group of indices generally from the simplest to the most complex and make parenthetical comments as needed:

R = H - L   RANGE  (The highest minus the lowest score. Supplementary index since only 2 scores involved.)

Q = (Q3 - Q1)/2   SEMI-INTERQUARTILE RANGE  (The distance between the 75th percentile or 3rd quartile and the 25th percentile or 1st quartile, divided by 2. Also can be thought of as the mean distance that Q3 and Q1 are from Q2, the median. Q is a good variability partner for the median as central tendency when a distribution is skewed to any extent.)

MD = ∑|X - X̄| / N   MEAN DEVIATION  (The mean absolute distance that each score is from the mean. A very intuitive index that makes a lot of sense. The only problem is it is difficult to calculate because of the absolute value signs, particularly for moderate to large samples. No STEP ONE-like algorithm is available for this one.)

s² = ∑x²/N   SAMPLE VARIANCE  (Remember this is just the mean of the squares of the deviations of the scores about the mean, or ∑x²/N, or in its simplest form...STEP ONE divided by N. This index is very useful because it employs every score in the sample and avoids the absolute value signs by squaring the deviations.)

s = √(∑x²/N)   SAMPLE STANDARD DEVIATION  (The square root of the above sample variance s². The most widely used descriptive index of variability and a perfect partner when the sample mean is used as the measure of central tendency. Its main features are that again it depends on each and every score value and also may be interpreted in terms of the original score scale.)

s/√(N-1)   ESTIMATED STANDARD ERROR OF THE MEAN  (Remember a standard error is nothing more than a glorified standard deviation. This is the estimated SE of the sampling distribution of the sample mean, with s defined by division by N as in the February column, and as such it is employed in inferential statistics or hypothesis testing. If you substitute many other test statistics for the sample mean, you can usually find stated SE formulas and use them in your hypothesis test. Some of these formulas are quite complex.)

MSW = SSW/(N-k)  MEAN SQUARE WITHIN  (This is employed in a one-way ANOVA with k groups of scores and nj scores in each group. It is an extension of ∑x² where the deviation scores are taken about the respective group mean and then pooled together across all k groups. Looks suspiciously like variability again, this time WITHIN the groups. N-k is the df-value for the sum of squares within.)

MSB = SSB/(k-1)  MEAN SQUARE BETWEEN  (This also is employed in a one-way ANOVA. It is another extension of ∑x², but the deviations are those of the group means about the overall mean M. Again, this has the appearance of BETWEEN-group variation, with k-1 as the df-value for the sum of squares between. Now, as some of you know, you form an F-ratio with MSB/MSW. This tests the significance of the differences in all the k group means in one big shot. WOW!)

Λ = |W| / |T|   WILKS' LAMBDA  (This is a multivariate statistic that tests the significance of the differences of several population centroids of multivariate normal distributions. The bars in the numerator and denominator are determinants of the WITHIN GROUPS MATRIX W and the TOTAL GROUPS MATRIX T respectively. These matrices are variance-covariance matrices for the multiple dependent variables. Interestingly, a small value of Wilks' Lambda is desirable for significance. This Lambda is usually employed in rather complex functions to actually run the test.)
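To tie the simpler entries in this list together, here is a small sketch of mine (note that the quartile values depend on the interpolation convention your software uses, and s is defined by division by N as in the February column):

    import numpy as np

    scores = np.array([2, 4, 5, 5, 8])                    # any small sample
    N = len(scores)

    R  = scores.max() - scores.min()                                    # range
    Q  = (np.percentile(scores, 75) - np.percentile(scores, 25)) / 2    # semi-interquartile range
    MD = np.mean(np.abs(scores - scores.mean()))                        # mean deviation
    s2 = np.mean((scores - scores.mean()) ** 2)                         # sample variance (division by N)
    s  = np.sqrt(s2)                                                    # sample standard deviation
    se = s / np.sqrt(N - 1)                                             # estimated SE of the mean

    print(R, Q, MD, round(s2, 2), round(s, 2), round(se, 2))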

OK, from this list you can appreciate the statement that variation makes the statistical world go around. Keep in mind the list is not exhaustive, but I would be exhausted if I continued this process without presenting something new and sort of exciting.

To finish off this sticky wicket for the month, I want to propose to you one last intuitive and compelling index of variation. To my knowledge, this index has never appeared in any behavioral science statistics textbook, nor has it been used in any published study in this field. I am less certain of its use in economics, business, and other disciplines, but at best its employment would be rare. To set the stage, consider a set of N scores: X1, X2, X3,..., XN. Now take all the possible differences between each and every score: X1 - X2, X1 - X3,..., X1 - XN, X2 - X3, X2 - X4,..., X2 - XN,..., XN-1 - XN. It should strike you with little thought that a possible measure of the spread of a set of scores may involve simply looking at each and every pair-wise difference. This is a simple concept but very powerful. Let's give it a try. One slight modification we will make is to take the square of each difference to solve the problem of negative and positive differences. Now I want you to direct your attention to formula (A) on the right. Notice the big surprise on the right side of the equation: dear old STEP ONE multiplied by an N in front of it, staring right at you. Are you kidding me? Absolutely amazing. I have omitted the proof of this formula, but believe me it is as solid as a rock. Next it seems reasonable to find the mean of all these squared differences. That would involve determining the total number of differences present in a set of N scores. This would be the combinations of N things taken 2 at a time, which is N! / [2!(N-2)!] = N(N-1)/2 (recall "!" is the factorial function in mathematics). So dividing the left and right-hand sides of (A) by N(N-1)/2, we arrive at what we will term MSAD (Mean Square of All Differences). Thus, the simplest formula is MSAD = 2∑x²/(N-1).
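(Formula (A) itself was a figure "on the right" and is missing from this archive; from the description, it presumably states that the sum of the squared differences over all N(N-1)/2 distinct pairs equals N·∑x², so that MSAD = N·∑x² divided by N(N-1)/2, which reduces to 2∑x²/(N-1).)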

To give you a feel for this "new" index, I will use a small set of scores and compute it directly first and then use STEP ONE.
Consider the following set of 5 X-scores: 2, 4, 5, 5, 8
Using (A) for the direct calculation, the 10 squared differences are:
(2 - 4)² + (2 - 5)² + (2 - 5)² + (2 - 8)² + (4 - 5)² + (4 - 5)² + (4 - 8)² + (5 - 5)² + (5 - 8)² + (5 - 8)² = 4 + 9 + 9 + 36 + 1 + 1 + 16 + 0 + 9 + 9 = 94
Finally, MSAD = 94/10 = 9.4. This was not so bad here, but imagine 20 scores with (20)(19)/2 = 190 differences. That would be a bear.
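For anyone who would rather not grind out those differences by hand, a short Python sketch of mine does the direct calculation; the STEP ONE shortcut that follows gives the same answer.

    from itertools import combinations

    scores = [2, 4, 5, 5, 8]
    N = len(scores)

    squared_diffs = [(a - b) ** 2 for a, b in combinations(scores, 2)]   # all N(N-1)/2 = 10 pairs
    print(sum(squared_diffs))                       # 94
    print(sum(squared_diffs) / (N * (N - 1) / 2))   # MSAD = 9.4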
Now using STEP ONE on the right of (A) above we have:
∑X = 24, ∑X² = 134 and ∑x² = 134 - (24)²/5 = 134 - 576/5 = 134 - 115.2 = 18.8. Finally, MSAD = 2∑x²/(N-1) = (2)(18.8)/4 = 9.4. Now I would like to make several comments about this index. First, it is intuitively appealing and can easily be computed with a simple function of the STEP ONE algorithm. Second, as with the standard deviation, should we take the square root of MSAD as a descriptive index? Also, mathematical statisticians would have to develop the sampling distribution of this statistic and its standard error before it could be employed in inferential statistics. I would certainly like to hear from my loyal readers about any reactions you have to MSAD. Thank you for reading this rather lengthy blurb.


Thank You For Visiting Sticky Stat Wickets
This Demonstrates That the Field of Statistics Is Not Cut and Dried But Still an Evolving Science With Controversies


Now Back to the Fun and Humorous Side of Statistics:

Visit the Best Collection of Annotated Stat Jokes in the World With Over 200 entries. First Internet Gallery of Statistics Jokes

Read About the Fun Activities That Can Be Introduced in a Statistics Classroom. Archives of Statistics Fun

Also, If You Want Information About the Author Who Created This Set of Pages, Check the Home Page of Gary C. Ramseyer.

Copyright ©1997-2011 Ramo Productions. All Rights Reserved.