Statistical Reasoning with R: Exam 1 Notes for Printing

Used to be printed on paper


Causal Inference

Specific causal question (SCQ)

  • Four components of an SCQ:
    • intervention ("what is the impact of xxx"; x)
    • event & purpose ("on improving xxx"; x improves y)
    • who ("for xxx"; group_A, a group)
    • alternative or control ("relative to xxx"; x_control)
  • Example: What is the impact of x on improving y for group_A relative to x_control?
  • Define y: a continuous variable; each individual in group_A has a value of y.
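A minimal Python sketch of filling in the SCQ template from its four components. The helper name `build_scq` is hypothetical, and the arguments are just the placeholder names used in these notes.

```python
def build_scq(intervention: str, outcome: str, who: str, control: str) -> str:
    """Fill the template: impact of <intervention> on improving <outcome> for <who> relative to <control>."""
    return (
        f"What is the impact of {intervention} on improving {outcome} "
        f"for {who} relative to {control}?"
    )


# Placeholder components from the notes; substitute the real study's terms.
question = build_scq("x", "y", "group_A", "x_control")
print(question)
```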

Hypothesis

The researchers hypothesize that x will improve y for group_A.

Potential Outcomes of One Treatment (two per individual)

  • What would the value of y be if an individual in group_A were subjected to x_control?
  • What would the value of y be if the same individual were subjected to x?

Average Factual Outcome (for treatment x)

What is the average y among individuals in group_A after they are subjected to x?

Average Missing Counterfactual (for treatment x)

What would the average y have been for the individuals in group_A if they had been subjected to x_control instead of x, with all else remaining the same?
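The following small Python sketch shows why the counterfactual is "missing". It uses made-up data in which, unrealistically, both potential outcomes are known for each person. The numbers and field names are hypothetical. In a real study only the outcome under the condition actually received is ever observed.

```python
from statistics import mean

# Hypothetical potential outcomes for five members of group_A.
# "y_x" is y under x, "y_ctrl" is y under x_control; real data reveal only one per person.
people = [
    {"treated": True, "y_x": 7.0, "y_ctrl": 5.0},
    {"treated": True, "y_x": 6.0, "y_ctrl": 6.0},
    {"treated": True, "y_x": 8.0, "y_ctrl": 5.0},
    {"treated": False, "y_x": 5.0, "y_ctrl": 4.0},
    {"treated": False, "y_x": 6.0, "y_ctrl": 5.0},
]
treated = [p for p in people if p["treated"]]

# Average factual outcome: observable, since these people actually received x.
avg_factual = mean(p["y_x"] for p in treated)
# Average missing counterfactual: the same people under x_control (never observed in practice).
avg_missing_cf = mean(p["y_ctrl"] for p in treated)
# Their difference is the average effect of x for those who received it.
effect_on_treated = avg_factual - avg_missing_cf

print(avg_factual, avg_missing_cf, effect_on_treated)
```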

Randomization

  • Randomization ensures that the average difference in y between the x group and the x_control group is due only to the treatment, because the two groups are, on average, identical in all other pretreatment characteristics.
  • Ensures internal validity.
  • May lack external validity: the conclusion may not generalize beyond the participants and setting of this experiment.
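The following is a quick Python simulation of why randomization works. All values are assumed for illustration: a hypothetical baseline covariate drives y, x adds a true effect of 2.0, and a coin flip assigns x. Assignment ignores the baseline. As a result, the baseline means of the two groups should nearly match, and the simple difference in mean y should land close to the true effect.

```python
import random
from statistics import mean


def simulate_randomized(n=10_000, true_effect=2.0, seed=1):
    """Hypothetical experiment: a baseline covariate drives y; x adds true_effect."""
    rng = random.Random(seed)
    baseline = [rng.gauss(50, 10) for _ in range(n)]  # pretreatment characteristic
    treated = [rng.random() < 0.5 for _ in range(n)]  # coin-flip assignment to x
    # True/False acts as 1/0, so only treated units get the effect.
    y = [b + true_effect * t + rng.gauss(0, 5) for b, t in zip(baseline, treated)]

    base_x = mean(b for b, t in zip(baseline, treated) if t)
    base_ctrl = mean(b for b, t in zip(baseline, treated) if not t)
    y_x = mean(v for v, t in zip(y, treated) if t)
    y_ctrl = mean(v for v, t in zip(y, treated) if not t)
    return base_x - base_ctrl, y_x - y_ctrl


balance_gap, estimate = simulate_randomized()
print(f"baseline gap: {balance_gap:.2f}, estimated effect: {estimate:.2f}")
```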


Observational Studies

  • Not randomized.
  • The average missing counterfactual (MCF) is estimated using the members of group_A who received x_control, but the estimate is not guaranteed to be unbiased.
  • For the estimate to be unbiased, the x group and the x_control group must not differ systematically in any feature other than the treatment.
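The next sketch uses the same kind of hypothetical Python simulation, but without randomization. Here, people with a higher baseline are assumed to choose x more often (80% vs. 20%). Using the x_control group as a stand-in for the missing counterfactual then mixes the baseline difference into the estimate. The naive difference in means therefore overshoots the assumed true effect of 2.0.

```python
import random
from statistics import mean


def simulate_observational(n=10_000, true_effect=2.0, seed=2):
    """Hypothetical observational data: people with a higher baseline choose x more often."""
    rng = random.Random(seed)
    baseline = [rng.gauss(50, 10) for _ in range(n)]
    # Self-selection instead of a coin flip: the chance of taking x depends on baseline.
    treated = [rng.random() < (0.8 if b > 50 else 0.2) for b in baseline]
    y = [b + true_effect * t + rng.gauss(0, 5) for b, t in zip(baseline, treated)]

    # Naive estimate: x_control takers stand in for the missing counterfactual.
    y_x = mean(v for v, t in zip(y, treated) if t)
    y_ctrl = mean(v for v, t in zip(y, treated) if not t)
    return y_x - y_ctrl


naive = simulate_observational()
print(f"assumed true effect: 2.00, naive estimate: {naive:.2f}")
```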


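Confounders and Covariates

  • A confounder is a pretreatment covariate that creates a systematic difference between the x group and the x_control group.
  • Two conditions for a confounder:
    • it is related to (predicts) the outcome y, and it is often not observed
    • its baseline value differs between the x group and the x_control group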
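The following Python sketch checks both conditions for one candidate covariate. It computes the gap in baseline means between the groups and the correlation between the covariate and y within the x_control group, so the treatment itself does not drive the association. Using a correlation as the "predicts y" check is an assumption of this sketch. The data and function names are hypothetical.

```python
from statistics import mean, pstdev


def pearson_r(a, b):
    """Pearson correlation from population moments (assumes neither list is constant)."""
    ma, mb = mean(a), mean(b)
    cov = mean((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / (pstdev(a) * pstdev(b))


def confounder_check(covariate, treated, y):
    """Return (baseline_gap, r_with_outcome) for a candidate confounder.

    baseline_gap: mean covariate in the x group minus the x_control group (condition 2).
    r_with_outcome: correlation of the covariate with y among x_control units (condition 1).
    """
    cov_x = [c for c, t in zip(covariate, treated) if t]
    cov_ctrl = [c for c, t in zip(covariate, treated) if not t]
    y_ctrl = [v for v, t in zip(y, treated) if not t]
    return mean(cov_x) - mean(cov_ctrl), pearson_r(cov_ctrl, y_ctrl)


# Hypothetical data: a prior score, treatment status, and outcome y.
prior = [40, 45, 50, 55, 60, 65]
took_x = [False, False, False, True, True, True]
y = [41, 46, 52, 58, 63, 67]
gap, r = confounder_check(prior, took_x, y)
print(f"baseline gap: {gap}, correlation with y among controls: {r:.3f}")
```

A large gap together with a strong correlation suggests the covariate meets both conditions.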

Univariate Summary Statistics and Figures

Goal: find out what values a variable takes and how often each value (or range of values) occurs, described by its central tendency, spread, shape, and notable features.

Variable types to identify

  • continuous: a numeric variable (use mean and median)
  • discrete
    • dichotomous (use mean; when coded 0/1, the mean is the proportion of 1s)
    • categorical (use mode for central tendency)
    • ordinal (use median): categorical with a meaningful order
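The following Python sketch pairs each variable type with the summary suggested above. The column names and values are hypothetical. In R, the corresponding calls would be mean() and median(). R has no built-in function for the statistical mode, so a frequency table via table() is the usual route.

```python
from statistics import mean, median, mode

# Hypothetical columns, one per variable type.
passed = [1, 0, 1, 1, 0]                          # dichotomous, coded 1 = yes, 0 = no
major = ["Econ", "Stats", "Econ", "CS", "Econ"]   # categorical
satisfaction = [1, 3, 2, 2, 4]                    # ordinal: 1 = low ... 4 = high
height_cm = [160.0, 172.5, 168.0, 181.2, 175.3]   # continuous

share_passed = mean(passed)              # mean of a 0/1 variable = proportion of 1s
top_major = mode(major)                  # mode: most frequent category
mid_satisfaction = median(satisfaction)  # median uses the order of the codes, not their spacing
height_mean, height_median = mean(height_cm), median(height_cm)

print(share_passed, top_major, mid_satisfaction, height_mean, height_median)
```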


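Summary statistics and quantiles:

  • median: the middle value of the sorted data, or the average of the two middle values when n is even
  • mean: the sum of the values divided by n; more sensitive to outliers than the median
  • mode: the most frequently occurring value
  • first quartile (lower quartile): the value below which 25% of the data fall
  • second quartile: the value below which 50% of the data fall, i.e., the median
  • third quartile (upper quartile): the value below which 75% of the data fall
  • interquartile range (IQR): upper quartile minus lower quartile; measures the spread of the middle 50% of the data
  • root mean square (RMS): the square root of the mean of the squared values; measures the typical absolute magnitude of a variable (e.g., a change in proportion), regardless of sign
  • standard deviation: approximately the RMS of the deviations from the mean (R's sd() divides by n - 1 rather than n); interpretation: a value of y (e.g., among those subjected to x) is typically about sd() away from its mean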
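The following Python sketch computes each statistic above on a small hypothetical dataset with one outlier. The outlier pulls the mean (about 16.7) far above the median (3). In R, the same quantities come from mean(), median(), quantile(), IQR(), and sd(). Here Python's `statistics.quantiles(..., method="inclusive")` is used because its interpolation should agree with R's default `quantile()` type.

```python
import math
from statistics import mean, median, mode, quantiles, stdev

data = [1, 2, 2, 3, 4, 5, 100]  # hypothetical values with one large outlier

center_mean, center_median, most_common = mean(data), median(data), mode(data)

# method="inclusive" interpolates linearly between order statistics, like R's default quantile().
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

rms = math.sqrt(mean(v ** 2 for v in data))                      # RMS of the raw values
rms_dev = math.sqrt(mean((v - center_mean) ** 2 for v in data))  # RMS of deviations from the mean
sd = stdev(data)  # divides by n - 1 like R's sd(); equals rms_dev * sqrt(n / (n - 1))

print(center_mean, center_median, most_common)
print(q1, q2, q3, iqr)
print(rms, rms_dev, sd)
```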

Plots

  • Bar plot: summarizes the distribution of a dichotomous variable or a character (categorical) variable with multiple categories, using the proportion in each category (e.g., 0.3, 0.4) as the bar height.
  • Histogram (for numeric values): first, discretize the variable by creating bins; second, calculate the density of each bin (the proportion of observations in the bin divided by the bin width); third, use the density as the height of the bin's bar.
  • Box plot: shows the distribution of a numeric variable; best for comparing variables side by side; visualizes the median, upper quartile, lower quartile, and IQR together.
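Here is a Python sketch of the three histogram steps: bin, compute density, use density as bar height. The bins are right-closed, with the first bin also including its left edge, which is meant to mirror the defaults of R's hist(). The data and the function name are hypothetical. With unequal bin widths, raw counts would give the wide bin the same height as the first bin, while density halves it, so the total bar area stays at 1.

```python
def histogram_density(values, breaks):
    """Return one density per bin: (share of observations in the bin) / (bin width).

    Bins are right-closed (lo, hi]; the first bin also includes its left edge.
    Assumes `breaks` is sorted and covers every value.
    """
    n = len(values)
    densities = []
    for i, (lo, hi) in enumerate(zip(breaks, breaks[1:])):
        if i == 0:
            count = sum(lo <= v <= hi for v in values)
        else:
            count = sum(lo < v <= hi for v in values)
        densities.append(count / n / (hi - lo))
    return densities


values = [1, 2, 2, 3, 4, 5, 7, 8]  # hypothetical numeric data
breaks = [0, 2, 4, 8]              # deliberately unequal bin widths
dens = histogram_density(values, breaks)
widths = [hi - lo for lo, hi in zip(breaks, breaks[1:])]
total_area = sum(d * w for d, w in zip(dens, widths))  # bar areas add up to 1
print(dens, total_area)
```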

Bivariate Relationships

  • Scatter plot: bivariate only; shows the relationship between two continuous variables.

Plot types by pair of variables:

  • dichotomous vs. categorical: grouped bar plot
  • dichotomous vs. continuous: two box plots, or two histograms overlaid or side by side
  • categorical vs. continuous: side-by-side box plots, one per category
  • continuous vs. continuous: scatter plot
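For the grouped bar plot, the bar heights are the shares of each category within each level of the dichotomous variable. The following Python sketch computes those shares. It is meant to be comparable to R's `prop.table(table(group, category), margin = 1)`. The function name and data are hypothetical.

```python
from collections import Counter


def grouped_proportions(group, category):
    """Share of each category within each level of `group` (the grouped bar heights).

    Category combinations that never occur get a share of 0.
    """
    totals = Counter(group)
    pair_counts = Counter(zip(group, category))
    levels = sorted(set(category))
    return {g: {c: pair_counts[(g, c)] / totals[g] for c in levels} for g in sorted(totals)}


# Hypothetical data: dichotomous treatment status vs. a categorical rating.
status = ["x", "x", "x", "x_control", "x_control"]
rating = ["high", "high", "low", "low", "medium"]
props = grouped_proportions(status, rating)
print(props)
```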



Yi Yang

My research interests include end-to-end encrypted systems, encryption, and information security.