Generalized Linear Model (GLM)

A dataset is a collection of n values or scores from a population or sample of interest. When performing descriptive analysis of a dataset, it is crucial to consider its measures of central tendency, because they can affect whether particular inferential statistical tests or methodologies are applicable to the data. In essence, the measures of central tendency identify the typical or central value of the dataset. The primary measures of central tendency (mean, median and mode), together with two common measures of variability (range and standard deviation), are:

Mean - the arithmetic average of the dataset
Median - the middle value of the dataset after sorting the data
Mode - the most frequent value in the dataset
Range - a measure of variability in the dataset, calculated as the maximum value minus the minimum value
Standard Deviation - a measure of how tightly clustered or widely spread the data are around the mean
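
As a minimal sketch of how these five quantities can be computed in R, the example below uses a small made-up vector x. The values are hypothetical and are not the DBP sample discussed in this article. Base R has no built-in function for the statistical mode, so the helper lines that tabulate the values are one possible approach, and they return only the first mode if several values tie.

```r
# Hypothetical values for illustration only; NOT the article's DBP sample.
x <- c(70, 72, 75, 75, 78, 80, 80, 80, 85, 90)

x_mean   <- mean(x)            # arithmetic average
x_median <- median(x)          # middle value of the sorted data

# Base R's mode() returns the storage type, not the statistical mode,
# so tabulate the values and take the most frequent one (first one on ties).
counts <- table(x)
x_mode <- as.numeric(names(counts)[which.max(counts)])

x_range <- max(x) - min(x)     # range expressed as a single number
x_sd    <- sd(x)               # sample standard deviation (n - 1 denominator)

print(c(mean = x_mean, median = x_median, mode = x_mode,
        range = x_range, sd = x_sd))
```

Note that R's own range() function returns the minimum and maximum as a pair rather than their difference, which is why the sketch subtracts them explicitly.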

Example

We have a dataset containing a sample of diastolic blood pressure (DBP) from 20 subjects.

Descriptive Analysis

Before conducting any numerical analysis of a dataset, it is best practice to do some descriptive or qualitative analysis of the data. This helps us to identify and understand any peculiarities and nuances of the data. It would normally include a frequency histogram. Suppose we wish to have a histogram illustrating:

  • a bar chart

  • the mean

  • the median

  • the probability density

Histogram
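
A histogram with the four elements listed above could be drawn in base R roughly as follows. This is a sketch under stated assumptions: the vector x holds hypothetical values rather than the article's DBP sample, the probability density is approximated with a kernel density estimate from density(), and the colours and line styles are arbitrary choices.

```r
# Hypothetical values for illustration only; NOT the article's DBP sample.
x <- c(70, 72, 75, 75, 78, 80, 80, 80, 85, 90)

# freq = FALSE draws the bars on the density scale, so that the
# density curve added below is on the same vertical scale.
h <- hist(x, freq = FALSE, main = "Histogram of x", xlab = "x")

abline(v = mean(x),   col = "red",  lwd = 2)            # mean
abline(v = median(x), col = "blue", lwd = 2, lty = 2)   # median
lines(density(x), lwd = 2)                              # kernel density estimate

legend("topright", legend = c("Mean", "Median", "Density"),
       col = c("red", "blue", "black"), lty = c(1, 2, 1), lwd = 2)
```

If a normal curve is preferred to a kernel estimate, the lines() call could be swapped for curve(dnorm(x, mean(x), sd(x)), add = TRUE) after storing the mean and standard deviation, although that assumes the data are roughly normal.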

Mean

The mean is the arithmetic average of the data. It is given by the formula:
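
In standard notation, writing the n observations as x_1, ..., x_n (symbols assumed here for illustration), the usual definition of the arithmetic mean is:

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```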

Median

The median is the value that has half the values in the dataset below it and half above it.

If the data contains an odd number of observations, the median is given by the formula:
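
Assuming the notation x_(k) for the k-th smallest observation after sorting (a convention introduced here, not taken from the original), the standard expression for odd n is:

```latex
\tilde{x} = x_{\left(\frac{n+1}{2}\right)}
```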

If the data has an even number of observations, the median is given by the formula:
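
With the same assumed notation, where x_(k) is the k-th smallest observation after sorting, the standard expression for even n averages the two middle values:

```latex
\tilde{x} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}
```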

There are 20 observations (data points) so we use the second formula.
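
With 20 observations, the even-n rule averages the 10th and 11th sorted values. The sketch below applies the odd and even rules by hand in R and can be compared against the built-in median function. The vector x is hypothetical (10 made-up values, not the DBP sample), so it exercises the even branch but will not reproduce the article's result.

```r
# Hypothetical values for illustration only; NOT the article's DBP sample.
x <- c(70, 72, 75, 75, 78, 80, 80, 80, 85, 90)

xs <- sort(x)        # the median is defined on the sorted data
n  <- length(xs)

if (n %% 2 == 1) {
  # odd n: the single middle value
  med <- xs[(n + 1) / 2]
} else {
  # even n: the average of the two middle values
  med <- (xs[n / 2] + xs[n / 2 + 1]) / 2
}

print(med)
print(median(x))     # built-in function, for comparison
```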

The median of the diastolic blood pressure sample is 82.

Standard Deviation

The standard deviation is given by the following formula.
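
The text that follows distinguishes a sample formula from a second one, which suggests the standard pair of definitions. Assuming the usual notation (s and x-bar for a sample of size n, sigma and mu for a population of size N), these are, with the sample form first:

```latex
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
\qquad
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(x_i - \mu\right)^2}
```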

We are dealing with a sample, so we must use the first formula, which divides by n - 1 rather than n to correct for the fact that we do not know the true population mean and have to estimate it from the sample. Rather than calculating the standard deviation manually from the formula, we will use the R standard deviation function sd.
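
A short sketch of that call is shown below, together with a manual calculation to confirm that sd uses the n - 1 denominator. The vector x is hypothetical and stands in for the DBP sample, whose individual values are not listed here.

```r
# Hypothetical values for illustration only; NOT the article's DBP sample.
x <- c(70, 72, 75, 75, 78, 80, 80, 80, 85, 90)
n <- length(x)

s_builtin <- sd(x)                                  # R's sample standard deviation
s_manual  <- sqrt(sum((x - mean(x))^2) / (n - 1))   # same thing, written out

# Population-style version (divides by n), shown only for contrast.
sigma_n <- sqrt(sum((x - mean(x))^2) / n)

print(all.equal(s_builtin, s_manual))
print(c(sample = s_builtin, divide_by_n = sigma_n))
```

The version that divides by n is always slightly smaller than the sample version, and the gap shrinks as n grows.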

The standard deviation of the DBP sample is 6.6618.

Summary

The descriptive statistics for this dataset can be summarized as:

Mean = 83.2
Median = 82
Mode = 80
Std deviation = 6.6618