Generalized Linear Model (GLM)
A dataset is a collection of n values or scores from a population or sample of interest. When performing descriptive analysis of a dataset, it is crucial to consider its measures of central tendency and variability, because these can affect which inferential statistical tests or methodologies are applicable to the dataset. Measures of central tendency identify the typical or central value of the dataset, while measures of variability describe how widely the values are spread around it. The primary descriptive measures are:
Mean - the arithmetic average of the dataset
Median - the middle value of the dataset after sorting the data
Mode - the most frequent value in the dataset
Range - a measure of variability, calculated as the maximum value minus the minimum value (max - min)
Standard Deviation - a measure of variability describing how tightly the data is clustered around the mean
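Of the measures listed above, mode and range are the two that base R does not return directly in the form defined here, so a short sketch may help. The vector `x` holds hypothetical values chosen for illustration (it is not the DBP sample), and `stat_mode` is an assumed helper name, not a base R function.

```r
# Hypothetical readings for illustration only; NOT the DBP sample from the text.
x <- c(70, 74, 78, 78, 82, 86, 90, 98)

# Statistical mode: tabulate the values and keep the most frequent one(s).
# stat_mode is a hypothetical helper; base R's mode() returns the storage
# mode of an object (e.g. "numeric"), not the most frequent value.
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}

# Range as a single number: range() returns c(min, max), so take the difference.
value_range <- diff(range(x))

print(stat_mode(x))   # most frequent value(s); ties return several values
print(value_range)    # max - min
```

If several values tie for the highest frequency, this helper returns all of them rather than picking one arbitrarily.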
Example
We have a dataset containing a sample of diastolic blood pressure (DBP) measurements from 20 subjects.
Descriptive Analysis
Before conducting any numerical analysis of a dataset, it is best practice to do some descriptive or qualitative analysis of the data. This helps us to identify and understand any peculiarities and nuances of the data, such as skew or outliers. This would normally include a frequency histogram. Suppose we wish to have a histogram illustrating:
a bar chart
the mean
the median
the probability density
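A plot with these four elements could be sketched in base R graphics roughly as follows. The vector `x` is hypothetical illustration data, not the DBP sample, and the colours, line types, and labels are arbitrary choices.

```r
# Hypothetical readings for illustration only; NOT the DBP sample from the text.
x <- c(70, 74, 78, 78, 82, 86, 90, 98)

# freq = FALSE scales the bar heights to densities, so the bars and the
# kernel density curve share the same vertical axis.
h <- hist(x, freq = FALSE, col = "lightgrey",
          main = "Histogram (hypothetical data)", xlab = "DBP")

abline(v = mean(x), col = "red", lwd = 2)               # mean
abline(v = median(x), col = "blue", lwd = 2, lty = 2)   # median
lines(density(x), lwd = 2)                              # kernel density estimate

legend("topright", legend = c("Mean", "Median", "Density"),
       col = c("red", "blue", "black"), lty = c(1, 2, 1), lwd = 2)
```

With `freq = FALSE` the bar areas sum to 1, which is what allows the density curve to be overlaid on the same scale.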
Histogram
Mean
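The mean is the arithmetic average of the data: the sum of all observations divided by the number of observations n. It is given by the formula:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$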
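Written out, the arithmetic mean is the sum of the observations divided by their count. A minimal R sketch, using a hypothetical vector `x` rather than the DBP sample, compares the manual calculation with the built-in `mean` function:

```r
# Hypothetical readings for illustration only; NOT the DBP sample from the text.
x <- c(70, 74, 78, 78, 82, 86, 90, 98)

# Manual calculation: sum of the observations divided by their count.
manual_mean <- sum(x) / length(x)

# The built-in function should agree with the manual calculation.
print(manual_mean)
print(isTRUE(all.equal(manual_mean, mean(x))))
```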
Median
The median is the value that has half the values in the dataset below it and half above it.
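Let $x_{(k)}$ denote the k-th smallest value after sorting the n observations.
If the data contains an odd number of observations, the median is the middle sorted value, given by the formula:
$$\text{Median} = x_{\left(\frac{n+1}{2}\right)}$$
If the data has an even number of observations, the median is the average of the two middle sorted values, given by the formula:
$$\text{Median} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$$
There are 20 observations (data points), so we use the second formula: the median is the average of the 10th and 11th sorted values.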
The median of the diastolic blood pressure sample is 82.
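The odd and even cases can be sketched in R as below. The helper name `manual_median` and the vector `x` are assumptions made for illustration; `x` is not the DBP sample. In practice the built-in `median` function handles both cases.

```r
# Hypothetical, unsorted readings for illustration only; NOT the DBP sample.
x <- c(90, 78, 70, 98, 82, 74, 86, 78)

# manual_median is a hypothetical helper that mirrors the two cases:
# odd n  -> the single middle value of the sorted data
# even n -> the average of the two middle values of the sorted data
manual_median <- function(v) {
  s <- sort(v)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2
  }
}

print(manual_median(x))                 # even case: n = 8
print(manual_median(x) == median(x))    # agrees with the built-in function
```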
Standard Deviation
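The standard deviation is given by the following formulas, the first for a sample of n observations with mean $\bar{x}$ and the second for a population of N values with mean $\mu$:
$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$
We are dealing with a sample, so we must use the first formula, whose n - 1 denominator corrects for the fact that we do not know the true population mean and have to estimate it from the sample. Rather than calculating the standard deviation manually from the formula, we will use R's standard deviation function, sd.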
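The sample and population versions of the standard deviation differ only in the denominator (n - 1 versus n), and R's `sd` uses the n - 1 version. A small sketch on a hypothetical vector `x` (not the DBP sample) shows the manual calculation next to the built-in function:

```r
# Hypothetical readings for illustration only; NOT the DBP sample from the text.
x <- c(70, 74, 78, 78, 82, 86, 90, 98)
n <- length(x)

# Sum of squared deviations from the sample mean.
ss <- sum((x - mean(x))^2)

sample_sd     <- sqrt(ss / (n - 1))  # sample formula (n - 1 denominator)
population_sd <- sqrt(ss / n)        # population formula (n denominator)

print(sample_sd)
print(population_sd)
print(isTRUE(all.equal(sample_sd, sd(x))))  # sd() matches the sample formula
```

The sample value is always slightly larger than the population value for the same data, and the gap shrinks as n grows.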
The standard deviation of the DBP sample is 6.6618.
Summary
The descriptive statistics for this dataset can be summarized as:
Mean = 83.2
Median = 82
Mode = 80
Std deviation = 6.6618