
Micro-evolution

STATISTICS IN INTRODUCTORY BIOLOGY

This supplement is excerpted with permission from the BIOL 1010 lab manual (Bishop et al., 2012). References to BIOL 1010 and BIOL 1011 also apply to BIOL 1020 and BIOL 1021, respectively.

Bishop T, Gass G, Van Dommelen J. 2012. Appendix E: Statistics in Introductory Biology. In: Biology 1010 Laboratory Manual. Halifax (NS): Dalhousie University.

In virtually every published primary research article in science, the Results section contains the results of a number of statistical tests performed on the data collected by the researchers. Scientists use statistics to demonstrate mathematically that their results are meaningful (for example, that plants treated with Fertilizer A grew larger than plants treated with Fertilizer B). A biologist might weigh the plants in the two groups and find that the average weights calculated for the two groups were different. The researcher would not stop there. The next step is to find out whether the difference observed between the two groups is a real, or significant, one (Fertilizer A really does promote plant growth better than Fertilizer B), rather than just an accident of chance (the plants chosen for measurement in treatment A just happened to be heavier than those chosen in treatment B, even though the fertilizer caused no real difference in weight).

Another biologist might have used a hypothesis to predict the frequency of a particular phenotype in the offspring of a cross between two plants. When he or she actually performs that cross by breeding the plants together, do the observed frequencies match what was predicted? If they don't match exactly, are they close enough, or do the expected and observed frequencies differ significantly?
In this Appendix, you will learn the basic statistical techniques that you may need to use in your BIOL 1010 and 1011 laboratory activities. More advanced biology classes use more advanced statistical techniques, but many of these techniques are based on the same concepts you will use in your labs this year. There are three sections to this Appendix:

I. Basic descriptive statistics: mean and standard deviation
II. Statistical tests: the chi-square test
III. Standard error and 95% confidence intervals

I. BASIC DESCRIPTIVE STATISTICS: MEAN AND STANDARD DEVIATION

When a group of measurements is taken, we often want to characterize that group using descriptive statistics. For example, what was the middle or average weight of a plant in that group? How much did individual plants in that group tend to differ in weight from one another?

A common measure of the middle or average value used in biology is the mean. You have likely calculated means in secondary school math. The mean is found by adding up all of the observed values and then dividing by the number of observed values. The number of observations, or data points, is referred to as n. The Greek letter sigma (Σ) indicates that you should sum whatever comes immediately after it. The procedure for finding the mean can be written like this:

$$\text{mean} = \frac{\sum \text{observed values}}{n}$$

In spreadsheet programs such as Microsoft Excel or Google Docs Spreadsheets, you can calculate the mean using the “=AVERAGE” formula.

The standard deviation describes the variability of the data set, meaning how much the values tend to differ from the mean. Together, the mean and the standard deviation tell you about the distribution of your observations: the value they cluster around, and how narrow or wide that cluster is. The larger the differences between the observations and the mean, the larger the standard deviation.
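The following minimal sketch shows how the mean and the standard deviation are calculated. It is written in Python, whereas the manual itself uses spreadsheet formulas. The sketch computes the mean as defined above. It also computes the sample standard deviation, which divides by n − 1. That is the version returned by the spreadsheet “=STDEV” formula. The function names `mean` and `sample_sd` are hypothetical, and the plant weights are invented for illustration. They are not lab data.

```python
import math
import statistics

def mean(values):
    """Sum of the observed values divided by the number of observations, n."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation, dividing by n - 1 (as spreadsheet STDEV does)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    m = mean(values)
    # Square each difference from the mean so that negative and positive differences don't cancel
    squared_diffs = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_diffs) / (n - 1))

# Hypothetical plant weights in grams, invented for illustration (not lab data)
weights = [12.0, 15.0, 11.0, 14.0, 13.0]

print(mean(weights))                  # 13.0
print(round(sample_sd(weights), 3))   # 1.581

# Cross-check against Python's standard library, whose stdev also divides by n - 1
assert math.isclose(mean(weights), statistics.mean(weights))
assert math.isclose(sample_sd(weights), statistics.stdev(weights))
```

Dividing by n − 1 rather than n is the convention for a sample. Python's `statistics.pstdev` divides by n instead, so it would give a slightly smaller value (about 1.414 for these weights).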
Finding the standard deviation of a sample is more complex than finding the mean. Some values will fall below the mean and give a negative difference. Others will fall above it and give a positive difference. If these differences were simply added, they would cancel out. The differences are therefore first squared so that they are all positive. The squared differences are then summed and divided by n − 1. Finally, the square root is taken, which puts the result back in the original units of measurement. The formula for this procedure is:

$$\text{standard deviation} = \sqrt{\frac{\sum(\text{observed value} - \text{sample mean})^2}{n - 1}}$$

You can use the “=STDEV” formula in spreadsheet software to calculate the standard deviation.

In BIOL 1010, you will need to know what the mean and standard deviation tell you about your set of data. BIOL 1011 builds on this knowledge. There, you will learn how to use n and the standard deviation to calculate a related value called the standard error. When you graph your data, the standard error lets you quickly assess whether the means of two groups are likely to be significantly different, as in the Fertilizer A and B example described above.

II. STATISTICAL TESTS: THE CHI-SQUARE TEST

In both BIOL 1010 and 1011, you will carry out and interpret a statistical test called the chi-square (χ²) test of goodness of fit. This test lets you test hypotheses by comparing your predictions to your observations, as in the plant cross example described above. On the next page, you will find complete instructions for carrying out and interpreting the chi-square test.

This test is just one of a very wide range of statistical tests used in science. If you take upper-year courses in biology, you will likely encounter many different statistical tests. However, these tests tend to share some common features:

• The purpose of the test is to help you decide whether or not to reject some hypothesis.
  The hypothesis itself will differ depending on the study being performed and the statistical test being used. At the end of the test, however, you should be able to say whether or not the hypothesis should be rejected. Notice that we do not say that the hypothesis is “supported” or “proven”, only that we fail to reject it.
• At the end of the mathematical operations involved in the test, you will have computed what is called a test statistic. In the chi-square test, the test statistic is the χ² value. For each category, divide the squared difference between the observed (o) and expected (e) values by the expected value. Then add these quantities over all categories:

$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

  Other types of tests, such as the Student’s t-test or the Mann-Whitney U test, have their own test statistics, which are arrived at by their own procedures.
• Each test also requires that you find the number of degrees of freedom, which is related to the number of categories being studied. Together, the test statistic and the degrees of freedom allow you to interpret the results of your test.
• Once the test statistic and degrees of freedom have been calculated, you use them to consult statistical tables specific to each statistical test (on paper or in computer databases). In your chi-square test, the degrees of freedom tell you which row of the table to look in.

BIOL 1020 Lab Assignment: Microevolution

Start by re-saving this file as follows:
• lab_surname_labtitle.rtf, substituting your own surname.
• Remember to convert to PDF after you have finished entering your answers and before submitting for grading.

Type your responses to the questions below where indicated.
• Remember to save your work frequently.

Data Analysis and Interpretation

1. Use your data to estimate the allele frequencies at the longhair locus for the cats in each city of your chosen pair. Use the warm-up exercises in the online content as a guide and show your work clearly.
Enter your results in Table 2 (which goes with Question 8 in this document). (2 marks)

(a) City # 1 [replace with city name]
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

(b) City # 2 [replace with city name]
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

2. What percentage of the cat population in each city is heterozygous at the longhair locus? Use the warm-up exercises in the online content as a guide and show your work clearly. (2 marks)

(a) City # 1 [replace with city name]
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

(b) City # 2 [replace with city name]
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

3. Use your data to calculate the allele frequencies at the spotting locus for the cats in each city of your chosen pair. Exclude unknowns from your totals. Use the warm-up exercises in the online content as a guide and show your work clearly. (2 marks)

(a) City # 1 [replace with city name]
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

(b) City # 2 [replace with city name]
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

4. Use your answers from the previous question to calculate the NUMBER of cats with each genotype that would be expected in your sample if the population were in Hardy-Weinberg equilibrium with respect to the spotting locus.
• Use the warm-up exercises in the online content as a guide and show your work clearly.
• Add your genotype numbers to the appropriate ‘Expected #’ columns in Tables 1a and 1b. (2 marks)

(a) City # 1 [replace with city name]
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

(b) City # 2 [replace with city name]
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

5. From your data sheet, add your observed number of cats for each genotype at the spotting locus to the ‘Observed #’ columns in Tables 1a and 1b. Do the ‘Expected’ genotype numbers match the genotype numbers that you actually observed in your samples?
Probably not. But are the differences STATISTICALLY significant? If they are not, we cannot rule out that the differences are due to chance alone, and we have no evidence that they represent a meaningful deviation from the equilibrium numbers. If the differences ARE statistically significant, we conclude that they are unlikely to be due to chance alone and that some other factor probably accounts for them.

We can use the chi-squared test of goodness of fit to determine whether the observed spotting genotype numbers in your data differ significantly from those expected under equilibrium conditions. In this case, the null hypothesis is that there is no difference between the observed spotting genotype numbers and those expected under Hardy-Weinberg equilibrium.

Complete Tables 1a and 1b to obtain a test statistic for each city, and answer the questions that accompany them. (5 marks)

Table 1a. Calculation of chi-squared test statistic for three genotypes in cats at shelters in _____________________________________________ [fill in the name of the first city in your pair].

| Genotype (class) | Observed # (o) | Expected # (e) | (o − e) | (o − e)² | (o − e)²/e |
|---|---|---|---|---|---|
| SS | | | | | |
| Ss | | | | | |
| ss | | | | | |
| Total | | | | | χ² = |

You will have to determine the number of degrees of freedom before you proceed. For a Hardy-Weinberg test, the degrees of freedom are calculated slightly differently from the df = k − 1 used in other genetics problems. Use the formula df = k − r, where k = the number of classes (genotypes) and r = the number of alleles at the locus. One degree of freedom is lost because the expected numbers must add up to the same total as the observed numbers. Further degrees of freedom are lost because the allele frequencies used to calculate the expected numbers were estimated from the same data. With r alleles, r − 1 frequencies are estimated, so df = k − 1 − (r − 1) = k − r.

(a) How many degrees of freedom are there for Table 1a?
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

(b) What p-value did you obtain for the test statistic in Table 1a (refer to Table 3 near the end of this document)? Give a range if appropriate.
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

(c) Should you reject or fail to reject the null hypothesis? With reference to your p-value, justify your decision.
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

Table 1b. Calculation of chi-squared test statistic for three genotypes in cats at shelters in _____________________________________________ [fill in the name of the second city in your pair].

| Genotype (class) | Observed # (o) | Expected # (e) | (o − e) | (o − e)² | (o − e)²/e |
|---|---|---|---|---|---|
| SS | | | | | |
| Ss | | | | | |
| ss | | | | | |
| Total | | | | | χ² = |

(d) How many degrees of freedom are there for Table 1b?
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

(e) What p-value did you obtain for the test statistic in Table 1b? Give a range if appropriate.
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

(f) Should you reject or fail to reject the null hypothesis? With reference to your p-value, justify your decision.
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

6. With respect to the spotting locus, is microevolution occurring in the cat population in either city in your pair? State your evidence. (1 mark)
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

7. List the five assumptions of the Hardy-Weinberg principle. If microevolution is occurring with respect to the spotting locus, which of the assumptions do you think could be violated? Explain your answer. (If your data indicate that microevolution at the spotting locus is NOT occurring, pretend for a moment that it is, and answer the same question.) (1 mark)
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

8. Table 2 below summarizes your data and calculations. A different test (the chi-squared test of independence) would be required to determine whether any differences between the cities are statistically significant, but that is beyond the scope of this lab. For our purposes, we’ll consider the differences to be significant. Propose a hypothesis (explanation) for why the cat data differ between cities.
You may propose a general hypothesis (i.e., one that might apply to any or all of the items in Table 2), or a hypothesis specific to a particular item in Table 2. (Hint: the cities weren’t paired at random!) (1 mark)

Table 2. Summary of observations and calculations based on data collected from photos of cats at shelters in two North American cities.

| City | Longhair f(L) | Longhair f(l) | Spotting f(SS) | Spotting f(Ss) | Spotting f(ss) | HWE for spotting (yes/no) |
|---|---|---|---|---|---|---|
| [enter name of first city] | | | | | | |
| [enter name of second city] | | | | | | |

RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

9. Name one potential drawback to the data collection procedure and speculate about its potential impact on your data. (1 mark)
RESPONSE:
PLEASE LEAVE THE SPACE BELOW EMPTY FOR TA COMMENTS

Table 3. Critical χ² values

| Degrees of freedom | P = 0.95 | P = 0.80 | P = 0.50 | P = 0.20 | P = 0.05 | P = 0.01 | P = 0.005 |
|---|---|---|---|---|---|---|---|
| 1 | 0.004 | 0.064 | 0.455 | 1.642 | 3.841 | 6.635 | 7.879 |
| 2 | 0.103 | 0.446 | 1.386 | 3.219 | 5.991 | 9.210 | 10.597 |
| 3 | 0.352 | 1.005 | 2.366 | 4.642 | 7.815 | 11.345 | 12.838 |
| 4 | 0.711 | 1.649 | 3.357 | 5.989 | 9.488 | 13.277 | 14.860 |
| 5 | 1.145 | 2.343 | 4.351 | 7.289 | 11.070 | 15.086 | 16.750 |
| 6 | 1.635 | 3.070 | 5.348 | 8.558 | 12.592 | 16.812 | 18.548 |
| 7 | 2.167 | 3.822 | 6.346 | 9.803 | 14.067 | 18.475 | 20.278 |
| 8 | 2.733 | 4.594 | 7.344 | 11.030 | 15.507 | 20.090 | 21.955 |

Non-significant / Significant: test statistics smaller than the value in the P = 0.05 column are non-significant. Test statistics larger than that value are significant.

Using the table of critical χ² values:
1. Locate the row containing the appropriate degrees of freedom.
2. Find where your chi-squared test statistic fits within the range of numbers in that row. It may fall outside the range, beyond the left or right end of the row.
3. Note the probability values (p-values) at the top of the columns on either side of your test statistic. Determine which p-values your test statistic lies between, or whether the p-value is off the scale (greater than 0.95 or less than 0.005).
4. By statistical convention, a p-value of less than 0.05 (p < 0.05) is considered significant. It means the following: if the null hypothesis were true, and the difference between what you observed and what you expected were due to chance alone, a difference at least this large would occur less than 5% of the time. Because such a result is unlikely under chance alone, the difference between the observed and expected values is attributed to some factor other than chance, and you reject the null hypothesis.
5. If the p-value is greater than or equal to 0.05 (p ≥ 0.05), a difference at least as large as yours would occur by chance alone 5% of the time or more. In that case, you fail to reject the null hypothesis that the difference between what you expected and what you observed is due to chance alone.

Lab Assignment Survey Questions

We’re interested in your feedback! Please visit the Lab Assignments Survey, via the ‘Proctor and Other Surveys’ page in the Course Menu of the class site, to enter your responses to the questions below. This survey is anonymous.

1. Approximately how long did it take you to complete this assignment?
• less than an hour
• 1-2 hours
• 2-3 hours
• 3-4 hours
• more than 4 hours
2. Was this a fair amount of time, considering the particulars of the assignment?
• Yes, it was a fair amount of time.
• No, the assignment could have been more comprehensive.
• No, it took too much time.
3. How would you rate the level of difficulty of the assignment?
• Easy
• Challenging, but manageable
• Too challenging
4. How would you rate the learning value of the assignment?
• The assignment helped my learning.
• The assignment did not help my learning.
5. Do you have any additional comments or feedback about this assignment?

Start by re-saving this file as follows:
• lab_surname_microevolutiondata.xlsx, substituting your own surname.
• Remember to convert to (or save as) PDF before submitting.

Table 4. Data sheet for recording selected phenotypes of cats in shelters in __________________________ [replace the blank with the first city of your pair].
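| Phenotype | E.g. 1 | E.g. 2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cat name | Tina T | Oximo | | | | | | | | | | | | | | | | | | | | | |
| short hair (L_) | 1 | 0 | | | | | | | | | | | | | | | | | | | | | |
| long hair (ll) | 0 | 1 | | | | | | | | | | | | | | | | | | | | | |
| 100% white (W_) | 0 | 1 | | | | | | | | | | | | | | | | | | | | | |
| <100% white (ww) | 1 | 0 | | | | | | | | | | | | | | | | | | | | | |
| >50% white spotting (SS) | 1 | 0 | | | | | | | | | | | | | | | | | | | | | |
| <50% white spotting (Ss) | 0 | 0 | | | | | | | | | | | | | | | | | | | | | |
| 0% white (ss) | 0 | 0 | | | | | | | | | | | | | | | | | | | | | |
| unknown at spotting locus | 0 | 1 | | | | | | | | | | | | | | | | | | | | | |

Table 5. Data sheet for recording selected phenotypes of cats in shelters in __________________________ [replace the blank with the second city of your pair].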
