Epidemiology and statistics
Answer all questions. Please show your working for calculations.
Formatting your assignment:
• Submit your document in PDF format. This format locks the information and preserves the formatting, graphs and SPSS output, which is particularly important if you are using a Mac computer;
• Please number every page and include your SID in the header of every page (not your name);
• Clearly number the answer to each question; only include your answer (not the question);
• Use 11 point or 12 point font and 1.5 times spacing (not double spacing);
• Format figures and tables appropriately according to the guidelines given in this course. Any hand drawn graphs should be scanned and pasted into your assignment;
• Do not submit SPSS data files.
Section 1
A cross-sectional study was conducted to examine the association between regular physical exercise and Low-Density Lipoprotein (LDL) cholesterol levels. LDL is a type of fat and a major constituent of blood cholesterol. It is also called “bad cholesterol” and is a significant risk factor for heart attack and stroke. To lower the blood LDL level, physicians normally recommend a low-saturated fat diet with regular exercise. Data on a random sample of 1000 participants are posted on Moodle in an Excel file, LDL.xlsx, for your assignment. In the dataset, LDL cholesterol is expressed in mg/dL (milligrams/decilitre) and physical exercise is a binary variable (0 = does not do regular physical exercise, 1 = does do regular physical exercise).
NOTES
• Sometimes there is not a single way of handling data to answer a question. When you are asked to make your own decision in this assignment you should state the reasons for the decision you have made.
• To describe the distribution of a variable, you need to report the appropriate measures of central tendency and variation.
• Tables and graphs should be labelled appropriately.
1. Examine the LDL cholesterol variable in the dataset:
a. Describe the distribution of LDL cholesterol and include appropriate plots and descriptive statistics.
Are there any outliers? If so, deal with them appropriately, describe what you have done and the reasons for the decisions you have made.
b. Examine the distribution (i.e. test for normality) of LDL cholesterol in the dataset you created in part a after dealing with any outlier(s). Please use appropriate statistics and plots to describe the distribution.
Note: please use the final dataset you have created in this section after dealing with the outliers as required, to carry out all the subsequent analyses.
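One common way to screen for outliers in Question 1a is Tukey's 1.5 × IQR rule. The sketch below is illustrative only: the values are made up (they are not from LDL.xlsx), the function name `iqr_fences` is hypothetical, and SPSS may compute quartiles slightly differently from Python's `statistics.quantiles`. Whether a flagged value is a data-entry error or a plausible extreme remains a judgement that you need to justify.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up LDL values in mg/dL (NOT the assignment data);
# 900 mimics a data-entry error.
ldl = [95, 102, 110, 118, 121, 125, 130, 134, 140, 147, 155, 900]

lower, upper = iqr_fences(ldl)
outliers = [x for x in ldl if x < lower or x > upper]
print(f"fences: {lower:.1f} to {upper:.1f}; flagged: {outliers}")
```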
2. National guidelines divide blood LDL cholesterol levels into five categories according to clinical significance, as shown below:
LDL cholesterol level     LDL cholesterol category
Less than 100 mg/dL       Optimal
100–129 mg/dL             Near optimal/above optimal
130–159 mg/dL             Borderline high
160–189 mg/dL             High
190 mg/dL and above       Very high
Create a new variable named ‘LDL_level’ according to the categories shown in the table above.
Report the frequency distribution of LDL_level in a table.
[Note that you do not need to show the SPSS steps you followed to generate the variable LDL_level.]
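The recoding in Question 2 can be sketched outside SPSS as follows. This is a minimal illustration, not the required method: the sample values are made up, the function name `ldl_level` is hypothetical, and the handling of non-integer values (for example, 129.5 mg/dL is placed in the 100–129 category) is an assumption, because the guideline table lists whole numbers only.

```python
from bisect import bisect_right
from collections import Counter

# Lower bounds (mg/dL) of each category after "Optimal". Assumption: a value
# between two listed ranges (e.g. 129.5) belongs to the lower category.
CUTS = [100, 130, 160, 190]
LABELS = [
    "Optimal",
    "Near optimal/above optimal",
    "Borderline high",
    "High",
    "Very high",
]

def ldl_level(ldl_mg_dl):
    """Map an LDL value in mg/dL to its category label."""
    return LABELS[bisect_right(CUTS, ldl_mg_dl)]

# Made-up values, NOT the assignment data.
sample = [88, 100, 129, 130, 159.9, 160, 189, 190, 240]
freq = Counter(ldl_level(x) for x in sample)

# Frequency table: count and percentage per category.
for label in LABELS:
    n = freq[label]
    print(f"{label:<28} {n:>3} {100 * n / len(sample):5.1f}%")
```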
3. If you randomly selected a person from the population that the sample was drawn from,
what is the probability that the person will have:
a. Very high LDL?
b. Borderline high LDL?
Please show your working for calculations and graphically display the probability as the area under the normal curve for each of these two LDL levels. Use the sample mean and standard deviation from the dataset you created in Question 1 to calculate the probabilities.
[Note: you can draw this graph either by hand or using the drawing tools available in MS Word. You will be assessed only for the correctness of the graph not for its artistic quality. If drawn by hand you can scan the drawing and attach it to your assignment.]
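The calculation in Question 3 amounts to converting each cut-point to a z-score and reading an area from the normal curve. The sketch below uses placeholder values for the mean and standard deviation (they are assumptions, not results from LDL.xlsx) and treats LDL as continuous, so "Borderline high" is taken as the area from 130 up to 160 mg/dL.

```python
from statistics import NormalDist

# Placeholder summary statistics (assumptions, NOT results from LDL.xlsx).
mean_ldl = 130.0  # mg/dL
sd_ldl = 30.0     # mg/dL
dist = NormalDist(mu=mean_ldl, sigma=sd_ldl)

# Very high: P(X >= 190) is the upper-tail area beyond z = (190 - mean) / sd.
z_very_high = (190 - mean_ldl) / sd_ldl
p_very_high = 1 - dist.cdf(190)

# Borderline high: P(130 <= X < 160) is the area between the two cut-points.
z_low = (130 - mean_ldl) / sd_ldl
z_high = (160 - mean_ldl) / sd_ldl
p_borderline = dist.cdf(160) - dist.cdf(130)

print(f"Very high:       z = {z_very_high:.2f}, P = {p_very_high:.4f}")
print(f"Borderline high: z from {z_low:.2f} to {z_high:.2f}, P = {p_borderline:.4f}")
```

With your own mean and standard deviation substituted, the shaded regions to draw are the right tail beyond 190 mg/dL and the band between 130 and 160 mg/dL.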
4. Write the research question for this study in PICO format.
5. State the null and alternative hypotheses.
6. What statistical test would you conduct to examine the hypotheses you have stated in Question 5?
7. Generate and report error bars of LDL cholesterol by physical exercise category of the participants. Interpret the error bars in terms of the statistical significance of the difference between the two groups.
8. Conduct the statistical test you have mentioned in Question 6 using SPSS. Report the relevant SPSS output from the test and interpret the appropriate statistics.
9. What assumptions have you made for the test you conducted in Question 8? Check all the assumptions using appropriate graphs and test statistics.
10. Write a brief conclusion summarising and interpreting your results.
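For Questions 7 and 8, the quantities behind 95% CI error bars and a two-group comparison of means can be sketched as below. Several things here are assumptions: the data are simulated (not the assignment data), the choice of an independent-samples comparison is only one possible answer to Question 6, the 1.96 multiplier is a large-sample approximation, and Welch's unequal-variance statistic is used for illustration. SPSS reports the exact t-based intervals and p-values.

```python
import math
import random
import statistics

def mean_ci95(values):
    """Mean with a large-sample 95% CI: mean ± 1.96 × SD / sqrt(n)."""
    m = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - 1.96 * se, m + 1.96 * se

def welch_t(a, b):
    """Welch's t statistic for two independent groups (unequal variances allowed)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se_diff = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se_diff

# Simulated LDL values in mg/dL (NOT the assignment data).
random.seed(1)
no_exercise = [random.gauss(140, 30) for _ in range(500)]
exercise = [random.gauss(125, 30) for _ in range(500)]

for name, group in [("no exercise", no_exercise), ("exercise", exercise)]:
    m, lo, hi = mean_ci95(group)
    print(f"{name:<12} mean = {m:.1f}, 95% CI {lo:.1f} to {hi:.1f}")
print(f"Welch t = {welch_t(no_exercise, exercise):.2f}")
```

Non-overlapping 95% CI error bars suggest a statistically significant difference, but overlapping bars do not by themselves rule one out, which is why the formal test is still needed.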
Section 2
Part A
Stroke is the third leading cause of death in Australia. A stroke is often referred to as a cerebrovascular accident that affects blood flow to the brain.
A research team was interested in studying the distribution and determinants of stroke. The first study the research team conducted was to investigate the distribution of stroke in Australia. Data are presented in Table 1:
Table 1: Age-specific number of stroke cases and population size in Australia, 2004
Age group (years)   Total cases of stroke   Total population
45-54               239                     2,713,959
55-64               468                     2,025,247
65-74               1,470                   1,353,800
75-85               4,629                   906,159
>85                 6,036                   286,464
Total               12,842                  7,285,629
1. Calculate the crude rate of stroke for the total population and the age-specific rate for each age group.
2. Calculate the 95% confidence interval (CI) of the stroke rate for the total population and for the 55-64 age group. Interpret the meaning of the 95% CIs you have calculated.
3. Graph the age-specific rates for stroke using an appropriate graph.
4. Use Table 1, the graph and the rates you have calculated to describe the distribution of stroke in Australia. Include relevant data from the table and graph to illustrate the points you make.
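The arithmetic behind Questions 1 and 2 of Part A can be sketched as follows, using the counts in Table 1. The confidence interval formula is an assumption: it treats the number of cases as a Poisson count, so SE(rate) = sqrt(cases) / population. Your course notes may specify a different formula (for example, one based on a binomial proportion), in which case use that one. The function names are hypothetical.

```python
import math

# Table 1 counts: age group -> (stroke cases, population).
TABLE_1 = {
    "45-54": (239, 2_713_959),
    "55-64": (468, 2_025_247),
    "65-74": (1_470, 1_353_800),
    "75-85": (4_629, 906_159),
    ">85": (6_036, 286_464),
}

def rate_per_100k(cases, population):
    """Cases per 100,000 population."""
    return cases / population * 100_000

def rate_ci95(cases, population):
    """95% CI per 100,000, assuming Poisson counts: SE = sqrt(cases) / population."""
    rate = rate_per_100k(cases, population)
    margin = 1.96 * math.sqrt(cases) / population * 100_000
    return rate - margin, rate + margin

total_cases = sum(c for c, _ in TABLE_1.values())
total_pop = sum(p for _, p in TABLE_1.values())

for group, (cases, pop) in TABLE_1.items():
    print(f"{group:>6}: {rate_per_100k(cases, pop):8.1f} per 100,000")

low, high = rate_ci95(total_cases, total_pop)
print(f" Total: {rate_per_100k(total_cases, total_pop):8.1f} per 100,000 "
      f"(95% CI {low:.1f} to {high:.1f})")
```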
Part B
Read the Extract, which is a highly edited version of a published paper. Use the information provided in the Extract to answer the following questions.
1. In your own words, write the research question in this study (use PICO format).
2. What type of study has been used to answer the research question? What would be the ideal study design for this type of research question? Provide reasons for your answers.
3. a. In your own words, briefly describe the study participants and the source population.
b. Are there any issues of concern about selection bias in this study? Provide reasons for your answer.
4. a. What was the main study factor? Are there any concerns with measurement error of the study factor? Provide reasons for your answers.
b. What was the outcome factor? Are there any concerns with measurement error of the outcome factor? Provide reasons for your answers.
5. In your own words, briefly summarise the main results of the study shown in Table 2 of the Extract.
6. How would the selection and measurement issues you described in Questions 3 and 4 affect the internal validity of this study? To which populations could the results of this study be generalised? Provide reasons for your answers.
7. The researchers reported that among the 216 cases of FDE and 395 controls, 120 cases and 253 controls had high grade actinic skin damage from sun exposure.
a. Create a 2 x 2 table to display this information.
b. Calculate the crude (unmatched) Odds Ratio and 95% confidence interval of the association between high grade actinic skin damage and FDE in this study.
Interpret the values you have calculated.
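The calculation in Question 7 can be sketched as below, using only the counts given in the question. The unexposed counts are obtained by subtraction, and the confidence interval uses the log-based (Woolf) method, which is an assumption: check which method your course notes specify. The function name is hypothetical.

```python
import math

def odds_ratio_ci95(a, b, c, d):
    """Crude OR and 95% CI (log method).

    a: exposed cases, b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper

cases_total, controls_total = 216, 395
cases_exposed, controls_exposed = 120, 253  # high grade actinic damage
cases_unexposed = cases_total - cases_exposed
controls_unexposed = controls_total - controls_exposed

odds_ratio, lower, upper = odds_ratio_ci95(
    cases_exposed, controls_exposed, cases_unexposed, controls_unexposed
)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```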
ABSTRACT
Objectives: To examine whether past and recent sun exposure and vitamin D status (serum 25-hydroxyvitamin D [25(OH)D] levels) are associated with a risk of first demyelinating events (FDEs) and to evaluate the contribution of these factors to the latitudinal gradient in FDE incidence in Australia.
Methods: This was a multicenter study. Cases (n = 216) were aged 18–59 years with an FDE and resident within one of 4 Australian centres (from latitudes 27°S to 43°S), from November 1, 2003, to December 31, 2006. Controls (n = 395) were matched to cases on age, sex, and study region, without CNS demyelination. Exposures measured included self-reported sun exposure by life stage, objective measures of skin phenotype and actinic damage, and vitamin D status.
Conclusions: Sun exposure and vitamin D status may have roles in the risk of CNS demyelination. Both will need to be evaluated in clinical trials for multiple sclerosis prevention.
GLOSSARY
AOR = adjusted odds ratio; CI = confidence interval;
CNS = central nervous system; FDE = first demyelinating event;
MS = multiple sclerosis; OR = odds ratio;
UV = ultraviolet.
INTRODUCTION
The etiology of multiple sclerosis (MS), an autoimmune disease of the central nervous system (CNS), is uncertain. Positive latitude gradients in the occurrence of MS and its common precursor, first demyelinating events (FDEs), along with findings from individual-level epidemiologic studies, indicate that low sun exposure may increase risk. However, most such studies have involved prevalent MS cases.
Two prospective epidemiologic studies found that higher vitamin D intake or serum levels were associated with reduced MS risk. In animal studies of MS, high dose supplementation with the active form of vitamin D (1,25[OH]D), or ultraviolet (UV) exposure without significant increase in vitamin D status suppressed the development of disease. Previous human studies have not measured past sun exposure before the onset of MS and thus have been unable to differentiate etiologic roles of sun exposure and vitamin D status, although this could have important implications for preventive interventions for MS.
We present the results of a large, epidemiologic study that has been able to address these issues and examine whether past and recent sun exposure and vitamin D status are associated with a risk of FDEs in incident cases. We also examined the extent to which the latitudinal gradient in the incidence of FDEs across Australia can be attributed to recent and past sun exposure.
METHODS
We conducted a multicenter study in 4 regions of Australia: Brisbane City (latitude 27° South), Newcastle City and surrounds (33° South), Geelong City and the Western Districts of Victoria (37° South), and the island of Tasmania (43° South). Participants were aged 18–59 years and resident within a study region between November 1, 2003, and December 31, 2006.
Participants. Cases had an incident first clinical diagnosis of classic FDEs (defined as a single, first, episode of demyelination). Cases were notified to the study by medical specialists and a study neurologist confirmed the date and symptomatology of the FDE and conducted a full neurologic examination. Case clinical information was reviewed annually by the study neurologist group. We aimed to recruit all incident cases within each study region from November 1, 2003–December 31, 2006.
Controls without evidence of demyelinating conditions were randomly selected from the Australian Electoral Roll (compulsory registration for citizens: ≥18 years) and matched to cases on age (within 2 years), sex, and study region.
Measurements. Questionnaire data. Sun exposure measurements included time in the sun during weekends and holidays (leisure time) in summer and winter for different periods of life (6–10 y, 11–15 y, 16–20 y, and last 3 y) and a calendar, noting, for each year of life, the location of residence, school/occupation, and leisure time in the sun in summer and winter from age 6 years. The validity and reliability of these measures have been previously reported. For example, the test-retest reliability (with an 11-week interval between tests) for recall of childhood/adolescent sun exposure (e.g., time in the sun in summer) and recall of recent adult sun exposure (time in the sun in summer, last 3 years) was high. Recall of total lifetime sun exposure (assessed from the calendar data) was significantly associated with the actinic damage score (p < 0.01).
Other relevant data included self-reported propensity to tan or burn; freckles as a teenager; smoking history (total years smoking as a continuous variable); highest education level (3 categories, see Table 1); usual physical activity (scored and categorized into 3 levels according to the International Physical Activity Questionnaire); a food frequency questionnaire; and use of vitamin D–containing supplements in the last year (assumed to contain 400 IU if supplement not named or dose supplied).
Examination data. Research officers noted the natural skin and eye color (with reference to standardized color photographs) and ethnicity (Caucasian, Asian, African, Australian Aboriginal or Torres Strait Islander, Other), and undertook a nevi count on the left arm. Skin reflectance on the buttock (non-sun-exposed site) was measured using a hand-held spectrophotometer (Minolta CM-2500D) to estimate cutaneous melanin density. Silicone rubber impressions (casts) of the skin on the back of both hands were made as previously described. Casts were photographed and graded on a scale from 1 to 6 representing minimal to severe actinic skin damage.
Vitamin D. Participants provided a blood sample. Serum aliquots (1 mL) were stored at −80°C and analyzed at study completion for 25-hydroxyvitamin D [25(OH)D] concentration, using liquid chromatography dual mass spectrometry. There was high inter-batch agreement for duplicate samples.
Ambient UV over the life course. Average daily ambient UV exposure for every month of life for each participant was estimated using the latitude and longitude of residence (assigned using GIS software) and data from the TOMS satellite. The leisure-time UV dose was calculated as follows: (ambient UV × proportion of day in the sun), summed over the relevant period, from 6 years of age.
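The leisure-time UV dose formula described above is a sum of products. A minimal sketch follows, with hypothetical inputs: the Extract does not give the length of each period or the units of the ambient UV values at this step, so the numbers and the function name `leisure_time_uv_dose` are illustrative assumptions only.

```python
def leisure_time_uv_dose(periods):
    """Sum (ambient UV × proportion of day in the sun) over all periods."""
    return sum(ambient_uv * proportion for ambient_uv, proportion in periods)

# Hypothetical inputs, one tuple per period from age 6 onwards:
# (ambient UV for the period, proportion of the day spent in the sun).
periods = [(120.0, 0.25), (90.0, 0.10), (150.0, 0.50)]

dose = leisure_time_uv_dose(periods)
print(f"Leisure-time UV dose: {dose:.1f}")
```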
Statistical analysis. Odds ratios (ORs) and 95% confidence intervals (95% CI) were calculated using logistic regression. Adjusted ORs (AORs) include adjustment for physical activity, smoking, and past history of infectious mononucleosis, with additional adjustment where noted. Statistical significance was defined as p < 0.05. Participants with missing data on factors of interest were excluded from those specific analyses.
To examine whether any UV-related factors could “account” for the observed latitudinal variation in FDE incidence, we compared values for Tasmania (highest incidence region) and Brisbane (lowest incidence region) and calculated the increase in incidence. All analyses were undertaken using Stata for Windows (version 9.2; StataCorp LP, College Station, TX).
Standard protocol approvals, registrations, and patient consents. The Ausimmune Study was approved by 9 regional Human Research Ethics Committees. All participants gave written informed consent.
RESULTS
Of 248 cases notified to the study, 14 (5.6%) were ruled ineligible and 18 (7.7%) refused to participate, leaving 216 participating eligible FDE cases (participation rate = 92%). Of 755 controls initially identified, 634 were successfully contacted (84%), and 395 participated in the study (62% of those contacted). Each control was specifically matched to an eligible FDE case.
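The recruitment percentages above can be reproduced with simple arithmetic, which is useful when judging selection bias. In the sketch below the denominators are inferred from the reported percentages (for example, 7.7% matches 18 refusals out of 234 eligible cases), and the overall control participation figure is a derived value that the Extract does not state.

```python
notified, ineligible, refused = 248, 14, 18
eligible = notified - ineligible          # cases eligible for the study
cases = eligible - refused                # participating cases

controls_identified = 755
controls_contacted = 634
controls_participated = 395

def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

print(f"Cases ineligible:        {pct(ineligible, notified)}% of notified")
print(f"Cases refused:           {pct(refused, eligible)}% of eligible")
print(f"Case participation:      {pct(cases, eligible)}% of eligible")
print(f"Controls contacted:      {pct(controls_contacted, controls_identified)}% of identified")
print(f"Controls participating:  {pct(controls_participated, controls_contacted)}% of contacted")
# Derived, not reported in the Extract: participation among all controls identified.
print(f"Controls overall:        {pct(controls_participated, controls_identified)}% of identified")
```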
Participant characteristics are presented in Table 1. FDE cases were similar in educational level, but fairer skinned than matched controls, by both self-report and spectrophotometric measurement of buttock melanin density (OR = 0.83; 95% CI 0.73–0.94). Accounting for this, FDE cases were also more likely to have a larger number of nevi on the left arm [p (trend) = 0.001].
Sun exposure. High coherence was observed across the markers of sun exposure. For example, among controls, higher recent time in the sun (hours) predicted vitamin D level (p < 0.001) and leisure-time UV dose (6 years to current age) predicted the actinic skin damage grade (p = 0.005).
In univariate analyses, higher self-reported time in the sun in the 3 years prior to interview (OR = 0.84; 95% CI 0.72–0.99), and increasing leisure-time UV dose (6 years to current age) (OR = 0.70; 95% CI 0.53–0.94), were associated with reduced FDE risk and this finding was not altered by adjustment for use of sun protection (data not shown). Sunburn history was not associated with FDE risk (e.g., past history of blistering sunburn, OR = 1.36; 95% CI 0.91–2.01). Multivariate logistic regression results are shown in Table 2.
Vitamin D status. FDE cases had lower 25(OH)D levels than matched controls (mean [SD] nmol/L cases 75.1 [31.9]; controls 80.4 [31.4]) and FDE risk decreased with increasing 25(OH)D level (Table 2).
The lower serum 25(OH)D levels among FDE cases occurred despite higher vitamin D supplement use. At interview, 34.3% of FDE cases and 26.6% of matched controls were taking a vitamin D–containing supplement (p = 0.05). Of these participants, 16.7% of cases compared to 7.8% of controls (p = 0.07) commenced the supplement within the 2 months following the case’s first episode (compared to 2.8% of cases and 5.8% of controls, p = 0.34, commencing within the 2 months prior to the episode).
Additional analyses. The median time lag from the first event to the study interview was 147.5 days (IQR 77.5–220), and, reassuringly, time lag did not influence 25(OH)D levels within the FDE group (p = 0.66). Furthermore, the association between 25(OH)D level and FDE risk also did not vary by the time lag.
Contribution of sun exposure and vitamin D status to the latitude gradient in FDE incidence. In the study regions, age- and sex-standardized FDE incidence varied from 2.1 (95% CI 1.6–2.6) at 27°S to 8.7 (95% CI 6.6–10.7) at 43°S (per 100,000 per year during November 1, 2003, to December 31, 2006), a 4-fold increase. Among study participants, the sun-related indices accounted for only part (a 32.4% increase in incidence) of the observed latitudinal FDE incidence gradient across these Australian regions.
Table 1: Characteristics of study participants
Characteristic                              Cases           Controls
Age, y, mean (SD)ᵇ                          37.46 (9.41)    38.61 (9.32)
Sex, n (%) males                            50 (23.2)       89 (22.5)
Education, n (%)
  Year 10 or less                           52 (24.1)       123 (31.1)
  Year 12 or TAFE                           111 (51.4)      165 (41.8)
  University                                51 (23.6)       104 (26.3)
Ethnicity, n (%)ᶜ
  White                                     208 (96.3)      370 (93.7)
  Other                                     6 (2.8)         24 (6.1)
Total years smoked, median (IQR)ᵈ           6.9 (0–19)      2.0 (0–14.7)
Physical activity, n (%)ᵉ
  Low                                       30 (14.2)       61 (16.3)
  Moderate                                  82 (38.9)       156 (41.6)
  High                                      99 (46.9)       158 (42.1)
Past infectious mononucleosis, n (%)
  Yes                                       56 (25.9)       66 (16.7)
  Noᶠ                                       157 (72.7)      325 (82.3)
Buttock melanin density, median (IQR)       1.2 (0.7–2.0)   1.4 (0.8–2.7)
Freckling as a teenager, n (%)
  No freckles                               57 (26.4)       108 (27.3)
  Few freckles                              99 (45.8)       169 (42.8)
  Some freckles                             44 (20.4)       86 (21.8)
  Many freckles                             14 (6.5)        32 (8.1)
End of summer tan, n (%)
  Dark                                      44 (20.4)       83 (21.0)
  Medium                                    85 (39.4)       155 (39.2)
  Light                                     50 (23.2)       107 (27.1)
  No tan                                    35 (16.2)       49 (12.4)
Reaction to 1 hour of summer sun, n (%)
  Burn then peel                            97 (44.9)       153 (38.7)
  Burn then tan                             83 (38.4)       164 (41.5)
  Tan only                                  34 (15.7)       77 (19.5)
Abbreviations: FDE = first demyelinating event; IQR = interquartile range; TAFE = Technical and Further Education
a. Numbers not adding to totals represent missing data, generally <1%.
b. Age at interview.
c. Assessed by the study nurse at interview; other includes Asian, Aboriginal and Torres Strait Islander and African.
d. Total years smoked, continuous variable.
e. Physical activity was scored and categorised according to the International Physical Activity Questionnaire.
f. Includes “Don’t know” responses to questions.
Table 2: Adjustedᵃ Odds Ratios and 95% Confidence Intervals for sun exposure and vitamin D exposure and FDE
Exposure                                                                AORᵃ    95% CI
Leisure time UV dose 6 years–current age (per 1,000 kJ/m² increase)     0.70    (0.53–0.94)
Actinic skin damage score (>3 vs ≤3)                                    0.42    (0.26–0.70)
UV dose preceding the FDE                                               0.58    (0.30–1.14)
25(OH)D levels (per 50 nmol/L increase)                                 0.68    (0.48–0.98)
Abbreviations: AOR = adjusted odds ratio; CI = confidence interval; FDE = first demyelinating event; UV = ultraviolet.
a. Adjusted for total years smoked, history of infectious mononucleosis, physical activity, and buttock melanin density.

