Statistical Reasoning with R: Exam 1 Notes for Printing

Used to be printed on paper

Sep 26, 2023

Causal Inference

Specific causal question (SCQ)

SCQ four components
- intervention ("what is the impact of xxx", x)
- event & purpose ("of xxx", x improves y)
- who ("for xxx", group_A, a group)
- alternative or the control ("relative to xxx", x_control)
- Example output: What is the impact of x on improving y for group_A relative to x_control?
- Define y: continuous variable; each individual of group_A has a y

Hypothesis

The researchers hypothesize that x will improve y for group_A.

Potential Outcomes of One Treatment (total count = 2)

What would the outcome of y be if an individual of group_A was subjected to x_control?
What would the outcome of y be if the same individual of group_A was subjected to x?

Average Factual Outcome (for treatment effect x)

What is the average y after individuals in group_A are subjected to x?

Average missing counterfactual (for treatment effect x)

What would have been the average y if the individuals in group_A had been subjected to x_control instead of x, with all else remaining the same?
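A minimal sketch of the two averages above, using made-up numbers. The vectors `y_control`, `y_treated`, and `got_x`, and the effect of 2, are assumptions for illustration only; in real data each individual reveals just one of the two potential outcomes.

```r
# Hypothetical potential outcomes for six individuals of group_A.
y_control <- c(10, 12, 9, 11, 13, 8)  # y if subjected to x_control
y_treated <- y_control + 2            # y if subjected to x (assumed effect of 2)
got_x     <- c(1, 0, 1, 0, 1, 0)      # 1 = actually subjected to x

# What we would observe in practice: one potential outcome per individual.
observed_y <- ifelse(got_x == 1, y_treated, y_control)

# Average factual outcome: average y for those who received x.
avg_factual <- mean(observed_y[got_x == 1])

# Average missing counterfactual: average y the same individuals would
# have had under x_control (never observed in real data).
avg_missing_cf <- mean(y_control[got_x == 1])

avg_factual - avg_missing_cf  # average effect of x for those who received it
```

Because the missing counterfactual is never observed, it has to be estimated from other individuals, which is where randomization comes in.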

Randomization

Randomization ensures the average difference in the outcome y between x and x_control is due only to the treatment, because the two groups are on average identical to each other in all pretreatment characteristics.
Ensures internal validity
May lack external validity: the conclusion may generalize only to the sample used in this experiment.
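A small simulation sketch of why randomization works. Everything here is assumed for illustration (the sample size, the `baseline` covariate, and a true effect of 3); it is not data from the course.

```r
set.seed(42)
n <- 10000

baseline  <- rnorm(n, mean = 50, sd = 10)  # hypothetical pretreatment characteristic
y_control <- baseline + rnorm(n)           # potential outcome under x_control
y_treated <- y_control + 3                 # potential outcome under x (assumed effect = 3)

# Random assignment: a coin flip decides who gets x.
got_x <- rbinom(n, size = 1, prob = 0.5)
observed_y <- ifelse(got_x == 1, y_treated, y_control)

# Balance check: the two groups should look alike before treatment.
balance <- mean(baseline[got_x == 1]) - mean(baseline[got_x == 0])

# Difference in means: estimates the average treatment effect.
diff_means <- mean(observed_y[got_x == 1]) - mean(observed_y[got_x == 0])

balance     # should be close to 0
diff_means  # should be close to the assumed effect of 3
```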

Observational

Not randomized
Estimates the average missing counterfactual (MCF) using individuals of group_A who received x_control, but cannot guarantee the estimate is unbiased
To be unbiased: no other features systematically differ between those who were subjected to x and those who were subjected to x_control

Confounders, covariate

Systematic difference
Two conditions for a confounder
- related to or predicts the outcome y, and is not observed
- differs at baseline between the x group and the x_control group
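A sketch of how a confounder biases an observational comparison. The variable `confounder`, the assignment probabilities, and the true effect of 2 are all assumptions made up for this illustration.

```r
set.seed(7)
n <- 10000

# Hypothetical baseline covariate that predicts y.
confounder <- rbinom(n, size = 1, prob = 0.5)

# Not randomized: those with confounder = 1 are more likely to receive x.
got_x <- rbinom(n, size = 1, prob = ifelse(confounder == 1, 0.8, 0.2))

# Outcome depends on both the treatment (assumed effect = 2) and the confounder.
y <- 2 * got_x + 5 * confounder + rnorm(n)

# Condition 2: the groups differ at baseline.
baseline_gap <- mean(confounder[got_x == 1]) - mean(confounder[got_x == 0])

# Naive difference in means mixes the treatment effect with the confounder.
naive <- mean(y[got_x == 1]) - mean(y[got_x == 0])

baseline_gap  # far from 0, so the groups are not comparable
naive         # noticeably larger than the assumed effect of 2
```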

Univariate Summary Statistics And Figures

For each variable, we want to find out what values it takes and the frequency of each value or each range of values: central tendency, spread, shape, notable features.

Variables to identify

continuous (use mean and median): numeric variable
discrete
- dichotomous (use mean)
- categorical (use mode for central tendency)
- ordinal (use median): categorical with meaningful order
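A quick sketch of the matching measure of central tendency for each variable type, in base R. All the vectors are made-up examples.

```r
# Dichotomous: the mean of a 0/1 variable is the proportion of 1s.
dichotomous <- c(1, 0, 1, 1, 0)
prop_ones <- mean(dichotomous)

# Categorical: the mode is the most frequent category.
categorical <- c("red", "blue", "red", "green", "red")
counts <- table(categorical)
mode_value <- names(counts)[which.max(counts)]

# Ordinal: take the median of the ordered level positions.
ordinal <- factor(c("low", "high", "medium", "low", "medium"),
                  levels = c("low", "medium", "high"), ordered = TRUE)
median_level <- levels(ordinal)[median(as.integer(ordinal))]

# Continuous: the mean is pulled by the outlier, the median is not.
continuous <- c(1, 2, 3, 4, 100)
mean_value <- mean(continuous)
median_value <- median(continuous)
```

The ordinal example uses an odd number of observations so the median lands on a single level.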

Quantiles:

median: the middle-most value, or the average of the two middle-most values when the number of observations is even
mean: more sensitive to outliers than median
mode: the most frequently appearing value
first quartile (lower quartile): the value below which the first 25% of the data fall
second quartile: the value below which the first 50% of the data fall, which is the median
third quartile (upper quartile): the value below which the first 75% of the data fall
interquartile range (IQR): the difference between the upper and lower quartiles; measures the spread
root mean square: the square root of the mean of the squared values; measures absolute magnitude (e.g., of a change in proportion)
standard deviation: measures the spread; on average, the values of y for those subjected to x are approx. sd() away from their mean.
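A sketch of these summaries in base R on a made-up vector `y`. Note that `quantile()` interpolates between values by default, so other software may report slightly different quartiles, and `sd()` divides by n - 1.

```r
y <- c(2, 4, 4, 5, 7, 9, 10, 12)  # hypothetical data

med <- median(y)  # even count: average of the two middle values, (5 + 7) / 2

# Quartiles: 25%, 50% (the median), and 75%.
q <- quantile(y, probs = c(0.25, 0.5, 0.75))

# Interquartile range: upper quartile minus lower quartile.
iqr <- IQR(y)

# Root mean square: square root of the mean of the squared values.
rms <- sqrt(mean(y^2))

# Standard deviation, computed by hand and with sd().
manual_sd <- sqrt(sum((y - mean(y))^2) / (length(y) - 1))
builtin_sd <- sd(y)
```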

Plots

Bar plots: Summarize the distribution of a dichotomous var or a char. var with multiple categories; each bar shows the proportion (0.3, 0.4, etc.) of observations in that category.
Histogram: (For numeric vals) First, discretize by creating bins; second, calculate the density of each bin; third, use density as the height of the bin.
Box plot: shows distribution of numeric values; best to show variables side-by-side; visualizes median, upper q., lower q., and IQR together
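A sketch of the numbers behind each plot, using made-up vectors. The histogram part follows the three steps above; the bin edges in `breaks` are an arbitrary choice for illustration.

```r
# Bar plot heights: proportion of observations in each category.
answers <- c("yes", "no", "yes", "yes", "no")  # hypothetical char. var
bar_heights <- prop.table(table(answers))

# Histogram: bin the values, then density = proportion / bin width.
y <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 9)           # hypothetical numeric var
h <- hist(y, breaks = c(0, 2, 4, 6, 10), plot = FALSE)
manual_density <- h$counts / (length(y) * diff(h$breaks))
total_area <- sum(h$density * diff(h$breaks))  # bar areas add up to 1

# Box plot: whisker ends, lower quartile, median, upper quartile, and outliers.
bs <- boxplot.stats(y)

# Draw the figures only in an interactive session.
if (interactive()) {
  barplot(bar_heights)
  hist(y, breaks = c(0, 2, 4, 6, 10), freq = FALSE)
  boxplot(y)
}
```

Using density rather than counts as the bar height keeps bins of different widths comparable.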

Bivariate Relationships

Scatter Plot: Bivariate only; shows the relationship between two continuous variables.

Types:

dichotomous vs. categorical: grouped bar plot
dichotomous vs. continuous: two bar plots, overlaid or side-by-side histograms
categorical vs. continuous: two box plots side-by-side
continuous vs. continuous: scatter
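A sketch of each pairing in base R. The data frame `d` and its columns (`treated`, `region`, `age`, `y`) are hypothetical and made up for this example.

```r
d <- data.frame(
  treated = c(1, 0, 1, 0, 1, 0, 1, 0),                      # dichotomous
  region  = c("north", "north", "south", "south",
              "north", "south", "south", "north"),          # categorical
  age     = c(23, 35, 41, 29, 52, 47, 38, 31),              # continuous
  y       = c(12, 9, 15, 10, 18, 13, 14, 8)                 # continuous
)

# dichotomous vs. categorical: counts behind a grouped bar plot.
tab <- table(treated = d$treated, region = d$region)
props <- prop.table(tab, margin = 1)  # proportions within each treated group

# dichotomous vs. continuous: average y in each group.
group_means <- tapply(d$y, d$treated, mean)

# categorical vs. continuous: box plot statistics per category.
bx <- boxplot(y ~ region, data = d, plot = FALSE)
medians <- bx$stats[3, ]  # third row holds the medians

# Draw the figures only in an interactive session.
if (interactive()) {
  barplot(tab, beside = TRUE, legend.text = TRUE)  # grouped bar plot
  boxplot(y ~ region, data = d)                    # side-by-side box plots
  plot(d$age, d$y)                                 # scatter plot
}
```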