diff --git a/docs/source/glossary.rst b/docs/source/glossary.rst index c6aedbb2..6f31530d 100644 --- a/docs/source/glossary.rst +++ b/docs/source/glossary.rst @@ -53,6 +53,9 @@ Glossary One-group posttest-only design A design where a single group is exposed to a treatment and assessed on an outcome measure. There is no pretest measure or comparison group. + Parallel trends assumption + An assumption made in difference-in-differences designs that, in the absence of the treatment, the outcome variable would have followed the same trend over time in the treatment and control groups. + Panel data Time series data collected on multiple units where the same units are observed at each time point. diff --git a/docs/source/index.rst b/docs/source/index.rst index 4e3cc9c8..3e312037 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -141,6 +141,7 @@ Documentation outline :caption: Knowledge Base design_notation.md + quasi_dags.ipynb glossary.rst .. toctree:: diff --git a/docs/source/quasi_dags.ipynb b/docs/source/quasi_dags.ipynb new file mode 100644 index 00000000..dff18e53 --- /dev/null +++ b/docs/source/quasi_dags.ipynb @@ -0,0 +1,461 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Causal DAGs for Quasi-Experiments" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This page provides an overview of causal Directed Acyclic Graphs (DAGs) for some of the most common quasi-experiments. It draws on a paper by {cite:t}`steiner2017graphical` and the books by {cite:t}`cunningham2021causal` and {cite:t}`huntington2021effect`; readers are encouraged to consult these sources for more details."
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "tags": [ + "remove-input" + ] + }, + "outputs": [], + "source": [ + "import daft\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "tags": [ + "remove-input" + ] + }, + "outputs": [], + "source": [ + "ff = \"times new roman\"\n", + "plt.rcParams[\"font.family\"] = ff\n", + "\n", + "GRID_UNIT = 2.0\n", + "DPI = 200\n", + "NODE_EC = \"none\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before we look at randomized controlled trials (RCTs) and quasi-experiments, let's first consider the concept of confounding. Confounding occurs when a variable (or set of variables) causally influences both the treatment and the outcome; it is very common in observational studies. Confounding can lead to biased estimates of the treatment effect (the causal effect $Z \\rightarrow Y$). The following causal DAG illustrates the concept. Note that the confounder is written in bold as a vector because there may be multiple confounding variables, $\\mathbf{X} = (x_1, x_2, x_3)$."
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "tags": [ + "remove-input" + ] + }, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAATMAAAEMCAYAAACodFEmAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/H5lhTAAAACXBIWXMAAB7CAAAewgFu0HU+AAAQAElEQVR4nO3de2jV9R/H8ddxaqV4g5kXnJVGOZfzsnlJXIlZm5akKaywpHQ6/8myzMgmJBGZVkoUkbJRIKWpZavpzMVW2siQ8oLTLlNzapaZyJzT3b6/P/p58Ox6pmfne877+3zAwHPOd/Ku1tPP+5zt6HMcxxEARLl2bg8AAKFAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMAJhAzACYQMwAmEDMcM0uXbqkQYMGyefzBXw8/fTTzX5ecXGxYmJiGnxeQUFBmCaHRT7HcRy3h0D0Ki4uVkpKiurq6vz3+Xw+ffvtt0pJSWlw/eXLlzVs2DAdPnw44P6MjAytXbu2zeeFXZzMcF3Gjh2rBQsWBNznOI7mzJmjysrKBtcvW7asQcj69eunN998s03nhH2czHDdLl68qMTERJWWlgbcv2jRIq1cudJ/+6efftLo0aNVU1MTcF1eXp4mT54clllhFzFDSBQVFWnChAm6+sspJiZGxcXFGjVqlGpqapScnKx9+/YFfN6sWbP00UcfhXtcGMSaiZAYP3685s+fH3BfbW2tZs+eraqqKi1fvrxByHr37q3Vq1eHcUpYxskMIXPhwgXddddd+uOPPwLunzlzpjZu3KiqqqqA+z/77DNNmzYtnCPCMGKGkNqxY4ceeOCBFq9LT0/X+vXrwzARvIKYIeQyMjKUnZ3d5OM9e/bUwYMH1bNnzzBOBeuIGULu/PnzSkhI0MmTJxt9fP369UpPTw/zVLCOFwAQct26ddOECRMafaxz585BraFAaxEzhNx3332ndevWNfpYRUWFFi5cGOaJ4AWsmQipyspKJSYm6vfff2/2um3btiktLS1MU8ELOJkhpLKyshqErH379g2umzdvnsrLy8M1FjyAmCFkdu/e3eCbYH0+n7788kvFx8cH3F9WVqYXXnghjNPBOmKGkKiqqtLs2bMD3j1DkjIzM5WWlqacnBy1axf45bZmzRoVFhaGc0wYRswQEsuWLVNJSUnAfXFxcVqxYoUkacyYMQ2e+HccRxkZGbp48WLY5oRdvACA6/bzzz/7f5j8alu3btWkSZP8tysrKzV06FD99ttvAdc988wz/Iwmrhsxw3WpqanRyJEjtXfv3oD7m3o3jF27dumee+4JeHeNdu3aaefOnRo7dmxbjwvDWDNxXV5//fUGIevVq5dWrVrV6PXjxo1r8LbadXV1mjNnji5dutRWY8IDOJnhmpWUlGj48OEN3g1j06ZNmj59epOfV1FRocTERB05ciTg/hdffFHLly9vk1lhHzEDYAJrJgATiBkAE4gZABOIGQATiBkAE4gZABOIGQATiBkAE4gZABOIGQATiBkAE4gZABOIGQATiBkAE4gZABOIGQATiBkAE4gZABOIGQATiBkAE4gZABO
IGQATiBkAE4gZJEmnT59WZWWl22O02tGjR8Vf/QqJmOH/+vTpo06dOunjjz92e5SgOI6jadOmacCAAZo6darb4yACEDNo3rx5/l/PnDlT//77r4vTBOett97Sli1bJEm5ubnav3+/uwPBdT6HM7qnlZaW6vbbb/ffvvfee1VUVOTeQK3g8/kCbtfV1TW4D97Byczjrg6ZJBUWFro0SeuVlJQE3Gbd9DZi5mFXr5eStHv37qg62cTHxys9Pd1/m3XT21gzPSqa18v6WDchcTLzrGheL+tj3YREzDwp2tfL+lg3IbFmeo6l9bI+1k1v42TmMZbWy/pYN72NmHmItfWyPtZNb2PN9AjL62V9rJvexMnMIyyvl/WxbnoTMfMA6+tlfayb3sSaaZyX1sv6WDe9hZOZcV5aL+tj3fQWYmaY19bL+lg3vYU10ygvr5f1sW56Ayczo7y8XtbHuukNxMwgr6+X9bFuegNrpjGsl01j3bSNk5kxrJdNY920jZgZwnrZPNZN21gzjWC9DB7rpk2czIxgvQwe66ZNxMwA1svWYd20iTUzyrFeXjvWTVs4mUU51strx7ppCzGLYqyX14d10xbWzCjFehk6rJs2cDKLUqyXocO6aQMxi0Ksl6HFumkDa2YEO3PmjMrKyjRixAj/fayXbae5ddNxHH399ddKTU11YzQEgZNZBMvOztaoUaO0dOlSVVVVSWK9bEtNrZunTp3SlClTlJaWxoktkjmISDU1Nc6tt97qSHIkOUOGDHFSUlL8tyU5u3fvdntMc9LT0wP+HWdlZTndu3f3354/f77bI6IJrJkRKi8vTw899FCTj7Netp3mnn/s3LmzTp06pa5du4ZxIgSDNTNCvf/++80+/vbbb4dpEm9xHEcvv/xyk49XVFRo3bp1YZwIweJkFoGOHj2qgQMHqrn/NDExMVqyZImysrLUsWPHME5n16lTp5SZmamvvvqq2esSEhJ04MABXkGOMJzMItCaNWuaDZkk1dbW6tVXX1VycrJOnz4dpsnsKigoUEJCQoshk6SDBw9q165dYZgKrUHMIszly5eVnZ0d9PWpqanq1atXG07kDSNHjtSgQYOCvr6lpwEQfsQswmzevFlnzpwJ6tpFixZpxYoVrDsh0K1bN+Xn52vMmDFBXb9p0yb99ddfbTwVWoOYRZhg/8QnZKHXmqBVV1crJycnDFMhWLwAEEEOHDigxMTEFq8jZG3r/PnzSktL0w8//NDsdf3799eRI0cUExMTpsnQHE5mESSYUxkha3vBntCOHz+ubdu2hWkqtISTWYQoLy9X3759deHChSavIWThFcwJbdKkSdq6dWsYp0JTOJlFiHXr1hGyCBPMCS0/P19Hjx4N41RoCjGLAI7jNLtiEjL3tBQ0x3H0wQcfhHkqNIY1MwLs2rVLKSkpjT5GyCJDcytnbGysTpw4oRtuuMGFyXAFJ7MI0NSpjJBFjuZOaP/88482bdrkwlS4Giczl/3999/q16+fqqurA+4nZJGpqRPa2LFj9f3337s0FSROZq7LyckhZFGkqRNacXExb9zoMmLmotra2gZPHhOyyNdU0Ph5TXcRMxf9+uuvOnnypP82IYsejQWtsLCwxXc7QdvhOTOX7d+/X0899ZQmTJhAyKLQlefQ7rzzTq1atUo9evRweyTPImYRoKamRjExMYQsStXU1Kh9+/Zuj+F5xAyACTxnBsAEYgbABGIGwARiBsAEYgbABGIGwARiBsAEYgbABGIGwARiBsAEYgbABGIGwARiBsAEYgbABGIGwARiBsAEYlbPlbeuvtaP1atXu/2PgChy7tw5denSRT6fT3FxcaqpqWnxc2prazVp0iT/19wnn3wShkkjHzGrZ8+ePdf1+UOGDAnRJPCCHj16aO7cuZKkEydOBPWXCT///PPKz8+XJGVlZemxxx5r0xmjBW+bXc+RI0d08eLFoK4tLy9Xenq6ysrKJEkjRozQzp071alTp7YcEcaUlZV
p4MCBqq6u1ujRoxv8BcNXW7t2rebNmydJmj59ujZu3MjfHXGFg2tSWVnpjB8/3pHkSHLi4+OdM2fOuD0WotSsWbP8X0vFxcWNXlNYWOh06NDBkeQMHz7cqaioCPOUkY018xpUV1drxowZKioqkiTddtttKigoUGxsrLuDIWotXrzYf8Jq7HnX0tJSzZgxQ9XV1erdu7dyc3PZAOohZq1UV1enJ554Qnl5eZKkvn37qqCgQH379nV5MkSzhIQETZ48WZK0efNmHT9+3P/Y+fPnNWXKFJ09e1Y33nijvvjiC/Xr18+tUSMWMWulzMxMbdiwQZIUGxurHTt2aMCAAS5PBQsWL14s6b9XK999913/rx999FEdOnRIkpSTk6NRo0a5NmNEc3vPjSbPPfec/3mNrl27Onv27HF7JBgzZswYR5LTvXt358KFC86CBQv8X3NLly51e7yIxquZQVq2bJleeeUVSVKnTp20fft2jRs3zt2hPOjZZ59VbGyskpKSlJSUpJtvvtntkULq888/1yOPPCJJuu+++/TNN99I4pXLYBCzIKxevVoLFy6UJHXs2FG5ublKTU11eSpvGjRokH755Rf/7bi4OH/YLATOcRzFx8cH/DPyLT/BIWYtyMnJUUZGhhzHUUxMjDZs2KDp06e7PZZn1Y9ZY6I9cNnZ2crIyJAk9enTRz/++CNP+AehvdsDRLJPP/1Uc+fOleM48vl8ys7OJmRRoKysTGVlZdqyZYv/vmgK3MCBA/2/zszMJGTBcu3ZugiXl5fn/wZFSc4777zj9kjNeumll/yz8hHcR1xcnDN16lRnx44dbv/nC7Bq1Sr/jFu2bHF7nKhBzBpRVFTk3HTTTf4vqNdee83tkVrkdhii+SPSXiV88skn/bMdO3bM7XGiBt9nVs+ePXs0ZcoUVVZWSvrve3+WLFni8lRoS+3aRdb/Bnv37pX03w+h33LLLe4OE0V4zuwqBw8eVFpamsrLyyVJ8+fP1xtvvOHyVMHJz8/3v4xv2cqVK6/79/D5fLrjjjuUnJyspKQkPfjggyGYLDSqq6tVUlIiSRo6dKjL00QXYvZ/paWluv/++3X27FlJ0syZM/Xee++5PFXwUlNTPfHtIrm5uS2+mnm1+uFKSkrS8OHD1aVLlzac8todOnRIVVVVkqRhw4a5O0yUIWaSTp48qYkTJ+rPP/+UJD388MP68MMPI279QPOiLVyNubJiSsSstTwfs3PnzmnixIk6duyYpP++j2np0qU6fPhw0L9HXFycunXr1kYTojEWwtWYffv2+X9NzFrH8zHbvn17QLgOHz6s5OTkVv0excXFuvvuu0M9GhqRlZWl/v37mwhXY66czDp27KjBgwe7O0yU8XzMDhw4cF2fHxMTw5+gYfT444+7PUKbunIyGzx4sDp06ODyNNGFH2cCYALPcAMwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEwgZgBMIGYATCBmAEw4X95pcLi6dHvaAAAAABJRU5ErkJggg==", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "pgm = daft.PGM(dpi=DPI, grid_unit=GRID_UNIT, node_ec=NODE_EC)\n", + "\n", + "pgm.add_node(\"z\", \"$Z$\", 1, 0)\n", + "pgm.add_node(\"x\", \"$\\mathbf{X}$\", 1.5, 0.75)\n", + "pgm.add_node(\"y\", \"$Y$\", 2, 0)\n", + "\n", + "pgm.add_edge(\"z\", \"y\")\n", + "pgm.add_edge(\"x\", \"y\")\n", + "pgm.add_edge(\"x\", \"z\")\n", + "\n", + "pgm.render();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One way to tell that our estimate of the causal relationship $Z \\rightarrow Y$ may be biased is the presence of a backdoor path, $Z \\leftarrow \\mathbf{X} \\rightarrow Y$. This type of path is known as a \"fork\". Because $\\mathbf{X}$ is a common cause of $Z$ and $Y$, any observed statistical relationship between $Z$ and $Y$ may be due to the confounding effect of $\\mathbf{X}$.\n", + "\n", + "Backdoor paths are problematic because they introduce _statistical associations_ between variables that do not reflect the true causal relationships, potentially leading to biased causal estimates. For example, if we ran a regression of the form `y ~ z` and observed an effect of $Z$ on $Y$, we would have no way of knowing whether this represents a true causal impact of $Z$ on $Y$ or is due to the confounding effect of $\\mathbf{X}$.\n", + "\n", + "One approach is to \"close the backdoor path\" by conditioning on the confounding variables. In practice, this could involve including the confounders $\\mathbf{X}$ as covariates in a regression model such as `y ~ z + x₁ + x₂ + x₃`. Without going into the details of why, the coefficient for $Z$ would now be an unbiased estimate of the _causal_ effect $Z \\rightarrow Y$.\n", + "\n", + "However, unless we are very sure that we have accurate measures of _all_ confounding variables (maybe there is an $x_4$ that we don't know about or couldn't measure), it is still possible that our estimate of the causal effect is biased."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This leads us to Randomized Controlled Trials (RCTs) which are considered the gold standard for estimating causal effects. One reason for this is that we (as experimenters) intervene in the system by assigning units to treatment by {term}`random assignment`. Because of this intervention, any causal influence of the confounders upon the treatment $\\mathbf{X} \\rightarrow Z$ is broken - treamtent is now soley determined by the randomisation process, $R \\rightarrow T$. The following causal DAG illustrates the structure of an RCT." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "tags": [ + "remove-input" + ] + }, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAdEAAAEMCAYAAACbY4xqAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/H5lhTAAAACXBIWXMAAB7CAAAewgFu0HU+AAASY0lEQVR4nO3daWxUZR+G8XvaQkqxNBjQglQUEkJZCrVIkRRCAKWFICoIGgQUyuIHVEAwKiTywYhgAiEaXllliYJFLGBZ0ipFSCOGsNi0rAWkRSQGCUKn0O28HwiTblOmD505s1y/ZJLpzBnzN9P24nl6ZsZhWZYlAADQZGF2DwAAQKAiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAAsqdO3fUvXt3ORyOWpfZs2c3+ri8vDyFh4fXe1xOTo6PJkcwcliWZdk9BAA0RV5engYNGqTq6mrXbQ6HQwcPHtSgQYPqHX/37l317dtXp0+frnV7enq61qxZ4/V5EbxYiQIIOAMHDtQ777xT6zbLsjRt2jSVlZXVO37x4sX1AtqpUyd98cUXXp0TwY+VKICA5HQ6lZCQoKKiolq3v//++1q2bJnr62PHjik5OVmVlZW1jsvKytLIkSN9MiuCFxEFELByc3M1dOhQ1fw1Fh4erry8PPXv31+VlZXq16+fTp48WetxkydP1saNG309LoIQ27kAAtaQIUM0a9asWrdVVVVp6tSpKi8v15IlS+oFNDY2VitWrPDhlAhmrEQBBLTbt2+rV69e+vPPP2vdPnHiRGVkZKi8vLzW7Tt27NDLL7/syxERxIgogICXnZ2tF1544YHHTZgwQVu3bvXBRAgVRBRAUEhPT9e6devc3t++
fXsVFBSoffv2PpwKwY6IAggKN2/eVM+ePXXlypUG79+6dasmTJjg46kQ7DixCEBQiImJ0dChQxu8r3Xr1h5t9wJNRUQBBIVff/1VW7ZsafC+0tJSzZkzx8cTIRSwnQsg4JWVlSkhIUHnz59v9Li9e/cqNTXVR1MhFLASBRDwFi5cWC+gERER9Y6bMWOGbt265auxEAKIKICAduTIkXpvnuBwOLR7927Fx8fXur24uFjz58/34XQIdkQUQMAqLy/X1KlTa32aiyTNnDlTqampWr9+vcLCav+aW716tQ4cOODLMRHEiCiAgLV48WIVFhbWui0uLk5Lly6VJA0YMKDeCUWWZSk9PV1Op9NncyJ4cWIRgIB0/Phx15vM17Rnzx6lpaW5vi4rK1OfPn107ty5Wse9++67vIcuHhoRBRBwKisr9eyzz+rEiRO1bnf36SyHDx/W4MGDa33aS1hYmA4dOqSBAwd6e1wEMbZzAQSczz77rF5AH3/8cS1fvrzB41NSUjR79uxat1VXV2vatGm6c+eOt8ZECGAlCiCgFBYWKjExsd6ns2zfvl1jx451+7jS0lIlJCTowoULtW7/4IMPtGTJEq/MiuBHRAEAMMR2LgAAhogoAACGiCgAAIaIKAAAhogoAACGiCgAAIaIKAAAhogoAACGiCgAAIaIKAAAhogoAACGiCgAAIaIKAAAhogoAACGiCgAAIaIKAAAhogoAACGiCgAAIaIKAAAhogoAACGiCgAAIaIKAAAhogoAACGiCgANIFlWbp48aLdY8BPEFEA8NCJEycUFhamLl26yOl02j0O/IDDsizL7iEAIBA4HA7X9fDwcFVWVto4DfwBK1EA8NDq1atd16uqqrRp0yYbp4E/YCUKAE1QczUqSaWlpYqKirJpGtiNlSgANMF///1X6+s2bdrYNAn8AREFgCaIjo5mWxcubOcCgAG2dSGxEgUAI2zrQiKiAGCEbV1IbOcCwENhWze0sRIFgIfAtm5oI6IA8BDY1g1tbOcCQDNgWzc0sRIFgGbAtm5oIqIA0AzY1g1NbOcCQDNiWze0sBIFgGbEtm5oIaIA0IzY1g0tbOcCgBewrRsaWIkCgBewrRsaiCgAeAHbuqGB7VwA8CK2dYMbK1EA8CK2dYMbEQUAL2JbN7ixnQsAPsC2bnBiJQoAPsC2bnAiogDgA2zrBie2cwHAh9jWDS6sRAHAh9jWDS5EFAB8iG3d4MJ2LgDYgG3d4MBKFABswLZucCCiAGADT7Z1S0pKVFBQ4OvR0ARs5wKAjRra1m3VqpU2bNigOXPmKDU1Vdu2bbNpOjwIEQUAG926daveVm5aWpr27t0rSYqIiFBxcbFiY2PtGA8PwHYuANio7rauJFdAJamyslJr16719VjwECtRALBZSUmJ4uLi3N4fFxenCxcuKCIiwodTwROsRAHAJpZlacOGDerVq1ejxxUXFysrK8tHU6EpiCgA2KCiokIvvviipk6dqps3bz7w+FWrVvlgKjQVEQUAG7Ro0ULjx4+vd3auO/v371dRUZGXp0JTEVEAsMmkSZO0ceNGj0P6v//9z8sToak4sQgAbLZ582ZNmTJFD/p1/Oijj6qkpEStWrXy0WR4EFaiAGAzT1ek//77rzIyMnw0FTxBRAHAD3gaUk4w8i9s5wKAH/Fka/fYsWNKTEz04VRwh5UoAPgRT1akrEb9BytRAPBDja1Io6Ki9NdffykmJsaGyVATK1EA8EONrUidTme9j02DPViJAoAfc7cijY+PV0FBgcevMYV3sBIFAD/mbkV66tQpHTx40KapcB8RBQA/5y6knGBkP7ZzASBA1N3a5QO77cdKFAACRN0VaWVlpQ4dOmTzVKGNiAJAALkf0s6dOysrK0uvvvqq3SOFNLZzASAAVVZWKiIiwu4xQh4RBQDAENu5AAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqIAABgi
ogAAGCKiAAAYIqIAABgiogAAGCKiAAAYIqKNWLVqlRwOR4OX1q1bq3v37po1a5ZOnTpl96gIcEuXLnX7vebJZcWKFXb/LyCA3LhxQ9HR0XI4HIqLi1NlZeUDH1NVVaW0tDTX99x3333ng0n9HxFtxIkTJ9ze53Q6debMGX399ddKTEzUtm3bfDcYgs7Ro0cf6vG9e/dupkkQCtq2bavp06dLkkpKSrR9+/YHPmbevHnat2+fJGnhwoV6/fXXvTpjoHBYlmXZPYS/Sk5O1u+//66YmBgdPnzYdXt5ebmKioq0YsUK5eXlSZJatWqlc+fO6YknnrBrXASwCxcuyOl0enTsrVu3NGHCBBUXF0uSnnnmGR06dEhRUVHeHBFBpri4WF27dlVFRYWSk5P122+/uT12zZo1mjFjhiRp7NixysjIkMPh8NWo/s1Cg6qqqqyoqChLkpWSkuL2mAEDBliSLEnWsmXLfDwlQk1ZWZk1ZMgQ1/dcfHy89c8//9g9FgLU5MmTXd9LeXl5DR5z4MABq0WLFpYkKzEx0SotLfXxlP6N7Vw3zpw541oZJCQkNHhMWFiY3n77bdfXBQUFPpkNoamiokLjxo1Tbm6uJOnpp59WTk6O2rVrZ+9gCFgLFixwrSgb+rt6UVGRxo0bp4qKCsXGxmrXrl3seNRBRN2o+ffQxv7e1LlzZ9d1T/44D5iorq7WpEmTlJWVJUnq2LGjcnJy1LFjR5snQyDr2bOnRo4cKUn64YcfdPnyZdd9N2/e1OjRo3X9+nVFRkZq586d6tSpk12j+i0i6kbNiLpbiUrStWvXXNeffvppb46EEDZz5kzXyWvt2rVTdna2unTpYvNUCAYLFiyQdO/s2y+//NJ1/bXXXnO98mD9+vXq37+/bTP6MyLqRs2I9urVy+1xmZmZrutjxozx4kQIVfPmzdPatWslSW3atNG+ffvUo0cPm6dCsBg8eLAGDBgg6d4JRKWlpZo7d67rTNxFixZxJm4jODvXjdjYWF27dk1PPfWULl682OAxmZmZGjt2rKqrqzVu3DhlZGT4eMrQ895776ldu3ZKSkpSUlKSHnvsMbtH8qrFixfrk08+kSRFRUVp//79SklJsXeoEJOTk6PMzEzX91yPHj0UERFh91jN6scff9Qrr7wiSRo2bJh+/vlnSZyJ6wki2oC///5bHTp0kCSNHj1au3btct139+5dnT17Vhs2bNDKlStVVVWllJQU7dmzR9HR0XaNHDK6d++uM2fOuL6Oi4tz/XILtrCuWLFCc+bMkSS1bNlSu3bt0ogRI2yeKvRs2rRJU6ZMcX0dGRmpPn36qF+/fkETVsuyFB8fX+tni5dOeSZwn3UvOn78uOv67t273f4rLCkpSVOnTtWMGTMC+gcokBUXF6u4uLjWtnowhHX9+vWaO3euJCk8PFzffvstAfUTd+7c0ZEjR3TkyBHXbYEeVofDofnz5ys9PV2S1KFDB+3cuZOAeiAwnmEfa+ydimq6ffu20tLSAuYHJVQEeli///57TZ8+XZZlyeFwaN26dRo7dqzdY6ERwRDWrl27uq7PnDmTM3E9ZeeLVP3V+PHjXS9Azs3NtfLz8638/HzryJEj1ubNm63ExETX/YMGDbJ7XMuyLOvDDz90zcTFs0tcXJz10ksvWdnZ2XY/fS5ZWVmuF7ZLslauXGn3SG45nU7bn8NAu0RGRlrJycnW7Nmz7X766lm+fLlrzszMTLvHCRhEtAHdunWzJFnt2rVr8P6ysjKrZ8+erm+4o0eP+njC+uz+5RDIl0WLFtn99FmWZVm5ublWq1atXHN9+umndo/UqBs3btj+3AXqJSIiwu6nr54333zTNd+lS5fsHidg8BKXOpxOp86fPy9JSkxMbPCYyMhILVy40PX1li1bfDIbvCMszP4fg6NHj2r06NEqKyuTdO+1ex999JHNUyGU3P8zVtu2bWu9iQwa55+b8zY6efKkqqurJUl9+/Z1e9yYMWP0yCOP
6Pbt29qxY4eWL1/uowkbtm/fPtdp6cFs2bJlD/3fcDgc6tatm+tvVaNGjWqGycwVFBQoNTVVt27dkiTNmjVLn3/+ua0zeaJ169aaP3++3WN4XXZ2tsfnSTQmJibG9bfRfv36PfxgzaiiokKFhYWSpD59+tg8TWAhonXU/GFxtxKV7n1qy/Dhw5WZmanLly/rjz/+aPSdjbxtxIgRIXH25q5du2qdhv8gdYOZlJSkxMREv3k5UlFRkZ5//nldv35dkjRx4kR99dVXNk/lmRYtWmjp0qV2j+F1dV/i4omawbwfzS5duvjt6y1PnTql8vJySY0vHlAfEa2jZkQf9M00atQo1xmgu3fvtjWi8P9g1nXlyhUNHz5cV69elXRvd+Obb77xi+1leC7QgtmQpvzeQ21EtI7730xRUVHq1q1bo8eOHDlSDodDlmXpp59+0scff+yDCSEFXjDrunHjhoYPH65Lly5JuvcmEosWLdLp06c9/m/ExcUpJibGSxOiIcEQzIacPHnSdZ2INg3vWFRDdXW1oqOj5XQ6H/ghtfclJSXp2LFjCgsL09WrV/32tYfBYsuWLXryyScDKpgN2bp160O/H2leXp6ee+65ZpoI7uTn56uwsDBogtmQYcOG6ZdfflHLli11+/ZttWjRwu6RAgYr0RrOnj3r+gxRT/81NmrUKB07dkzV1dXKysrSW2+95cUJ8cYbb9g9QrPIz89/qMeHh4ezYvCR3r17N/pxiMHg/kq0R48eBLSJWIkCAGCIMxgAADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADBERAEAMEREAQAwREQBADD0f1uMy4xvJGA6AAAAAElFTkSuQmCC", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "pgm = daft.PGM(dpi=DPI, grid_unit=GRID_UNIT, node_ec=NODE_EC)\n", + "\n", + "pgm.add_node(\"r\", \"$R$\", 0, 0)\n", + "pgm.add_node(\"z\", \"$Z$\", 1, 0)\n", + "pgm.add_node(\"x\", \"$\\mathbf{X}$\", 1.5, 0.75)\n", + "pgm.add_node(\"y\", \"$Y$\", 2, 0)\n", + "\n", + "pgm.add_edge(\"r\", \"z\")\n", + "pgm.add_edge(\"z\", \"y\")\n", + "pgm.add_edge(\"x\", \"y\")\n", + "\n", + "pgm.render();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The new variable $R$ represents the random assignment of units to the treatment group. Because $R$ is now the only cause of $Z$, there is no backdoor path between $Z$ and $Y$, which means that the treatment effect $Z \\rightarrow Y$ can be estimated without bias." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Instrumental Variables\n", + "\n", + "In quasi-experiments we cannot randomly assign units to treatment groups, so confounders $\\mathbf{X}$ may still influence treatment assignment. In the instrumental variable (IV) approach, the causal effect $Z \\rightarrow Y$ is identifiable if we have an IV that causally influences the treatment $Z$ but affects the outcome $Y$ only through $Z$."
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "tags": [ + "remove-input" + ] + }, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAdEAAAEMCAYAAACbY4xqAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/H5lhTAAAACXBIWXMAAB7CAAAewgFu0HU+AAAVMElEQVR4nO3deWwU9f/H8de0RWsVEBRFpRqKolAoVrEgEaN4FFAjChENCkZBPIJGEa/QKBo8UCMqoilQFK2iKJ5osQgIpoIh0qrlUFGwiLdISg96zfcPfp0f23P3w+7O7szzkTTZnZ3dvMzWvvi897O7lm3btgAAQMgS3A4AAEC8okQBADBEiQIAYIgSBQDAECUKAIAhShQAAEOUKAAAhihRAAAMUaIAABiiRAEAMESJAgBgiBIFAMAQJQoAgCFKFAAAQ5QoAACGKFEAAAxRogAAGKJEAQAwRIkCAGCIEgUAwBAlCgCAIUoUAABDlCgAAIYoUQAADFGiAAAYokQBADBEiQIAYIgSBQDAECUKAIAhShQAAEOUKAAAhihRAAAMUaIAABiiRAEAMESJAgBgiBIFAMAQJQoAgCFKFAAAQ5QoAACGKFEAAAxRogAAGKJEAQAwRIkCAGCIEgUAwBAlCgCAIUoUAABDlCgAAIYoUQAADFGiAAAYokQBADBEiQIAYIgSBQDAECUKAIAhShQAAEOUKAAAhihRAAAMUaIAABiiRAEAMESJAgBgiBIFAMAQJQogrlRXV+u0006TZVkBP1OmTGnzfkVFRUpMTGx2vxUrVkQpObzIsm3bdjsEAISiqKhIQ4cOVUNDg3PMsix9/vnnGjp0aLPz9+3bp9NPP11btmwJOD5x4kTNmzcv4nnhXaxEAcSdIUOG6Pbbbw84Ztu2brzxRlVVVTU7f8aMGc0KtEePHnrqqacimhPex0oUQFyqrKxURkaGtm3bFnD87rvv1pNPPulc//rrrzVo0CDV1dUFnLds2TKNHDkyKlnhXZQogLi1evVqDRs2TAf+GUtMTFRRUZGysrJUV1engQMHqqSkJOB+48eP1yuvvBLtuPAgxrkA4tZ5552nm2++OeBYfX29brjhBtXU1Ojxxx9vVqDdu3fX7Nmzo5gSXsZKFEBc27t3r/r166cdO3YEHB83bpyWLFmimpqagONLly7VFVdcEc2I8DBKFEDcKyws1MUXX9zueWPHjtXixYujkAh+QYkC8ISJEydqwYIFrd7erVs3lZaWqlu3blFMBa+jRAF4wp49e5Senq5ff/21xdsXL16ssWPHRjkVvI6NRQA8oXPnzho2bFiLtx1++OFBjXuBUFGiADxhzZo1eu2111q8raKiQnfeeWeUE8EPGOcCiHtVVVXKyMjQjz/+2OZ5n3zyiYYPHx6lVPADVqIA4t706dObFWhSUlKz82666SaVl5dHKxZ8gBIFENfWr1/f7MMTLMvShx9+qD59+gQcLysr07Rp06KYDl5HiQKIWzU1NbrhhhsCvs1FkiZPnqzhw4crLy9PCQmBf+Zyc3O1atWqaMaEh1GiAOLWjBkztGnTpoBjqampmjVrliRp8ODBzTYU2batiRMnqrKyMmo54V1sLAIQlzZu3Oh8yPyBPv74Y40YMcK5XlVVpQEDBuiHH34IOO+OO+7gM3Rx0ChRAHGnrq5OZ511loqLiwOOt/btLF988YXOPffcgG97SUhI0Nq1azVkyJBIx4WHMc4FEHcee+yxZgV67LHH6plnnmnx/HPOOUdTpkwJONbQ0KAbb7xR1dXVkYoJH2AlCiCubNq0SZmZmc2+neXtt9/W6NGjW71fRUWFMjIy9NNPPwUcv/fee/X
4449HJCu8jxIFAMAQ41wAAAxRogAAGKJEAQAwRIkCAGCIEgUAwBAlCgCAIUoUAABDlCgAAIYoUQAADFGiAAAYokQBADBEiQIAYIgSBQDAECUKAIAhShQAAEOUKAAAhihRAAAMUaIAABiiRAEAMESJAgBgiBIFAMAQJQoAgCFKFAAAQ5QoANf8/PPPbkcImW3bcZkbkUGJAnBFfn6+0tLS1LlzZ1VXV7sdJyjFxcVKSEhQWlqaKisr3Y6DGGDZtm27HQKAv1RVVSklJcW5fsopp+j77793MVFwLMtyLicmJqqurs7FNIgFrEQBRN2RRx4ZcH3Dhg3uBAlRbm6uc7m+vl6LFi1yMQ1iAStRAFGVn5+va6+91rk+d+5c3XLLLS4mCs2Bq1FJqqioCFhVw18oUQBR03SMK+3fqBNPysvL1alTJ+c6Y11/Y5wLIGqajnH37NnjTpCD0LFjR8a6cLASBRAV8T7GbYqxLiRKFEAUeGGM2xRjXUiMcwFEgRfGuE0x1oXEShRAhHltjNsUY11/o0QBRIwXx7hNMdb1N8a5ACLGi2Pcphjr+hsrUQAR4fUxblOMdf2JEgUQdn4Y4zbFWNefGOcCCDs/jHGbYqzrT6xEAYSV38a4TTHW9RdKFEDY+HGM2xRjXX9hnAsgbPw4xm2Ksa6/sBIFEBZ+H+M2xVjXHyhRAAeNMW5zjHX9gXEugIPGGLc5xrr+wEoUwEFhjNs2xrreRokCMMYYt32Mdb2NcS4AY4xx28dY19tYiQIwwhg3NIx1vYkSBRAyxrihY6zrTYxzAYSsS5cuAdcZ47aPsa43sRIFEBLGuAeHsa63UKIAgsYY9+Ax1vUWxrkAgsYY9+Ax1vUWVqIAgsIYN7wY63oDJQqgXYxxw4+xrjcwzgUQYOXKlaqpqQk4xhg3/IIZ6+7cuVOlpaXRjoYQUKIAHHv37tWoUaOUlZWl4uJiSfvHuPv27XPOmTt3bsAKCuYmTZoUcH3ChAmqrKyUbdvKy8tTenq6Hn74YZfSIRiMcwE4cnNzNXnyZElSUlKS7r33Xs2cOTPgHP5khFfTsa4kjRgxQp988omk/c9DWVmZunfv7kY8tIOVKABJ+8tx7ty5zvW6urpmBcoYN/yajnUlOQUq7X8e5s+fH+1YCBIlCkCStG7dOpWUlLR6e0ZGhg477LAoJvKPESNGtHl7bm4um45iFCUKQJICVqEt+eabb5SVldVm0SI0tm1r4cKF6tevX5vnlZWVadmyZVFKhVDwmigA/f333zrhhBOa7cptSVJSkp599lndeuutUUjmXbW1tbryyiv10UcfBXV+dna2CgoKIpwKoWIlCkALFy4MqkAlqXv37rr44osjnMj7OnTooKuuuqrZhy60Zvny5dq2bVuEUyFUlCjgcw0NDXrppZeCOrdHjx5atWqVTj755Ain8ofrrrtOr7zyStBFGuzzhOhhnAv4XEFBQbsbWyQKNJJeffVVTZgwod23D3Xt2lU7d+5kg1cMYSUK+Fx7G4okCjTSgl2R/vvvv1qyZEmUUiEYrEQBH9uxY4fS0tLU0NDQ6jkUaPQEsyIdPHiwvvzyyyimQltYiQI+lpubS4HGkGBWpOvWrdPGjRujmAptoUQBn6qpqWnzk3AoUHcEU6QvvvhiFBOhLZQo4FPvvvuu/vzzzxZvo0Dd1V6R5ufn8xGMMYISBXyqtQ1FFGhsaKtIKysrm31tGtzBxiLAh0pLS1v8qDkKNPa0ttmoT58+Ki0tDfo9pogMVqKAD7X0mhoFGptaW5Fu3rxZn3/+uUup0IgSBXxm7969zUaBFGhsa61I2WDkPkoU8JnXX39d5eXlznUKND60VKRLly7V77//7mIqUKKAz6xcudK5TIHGl6ZFWldXp7Vr17qcyt/YWAT4TG1trWbOnKlFixbp008/pUDj0KuvvqqcnBzNnTtXI0eOdDuOr1GigE/V1dUpKSnJ7RgwxPMXGyhRAAAM8ZooAACGKFEAAAxRogA
AGKJEAQAwRIkCAGCIEgUAwBAlCgCAIUoUAABDlCgAAIYoUQAADFGiAAAYokQBADBEiQIAYIgSBQDAECUKAIAhShQAAEOUqKQXX3xRlmXJsizl5+c7x88//3xZlqUuXbqE/Jjjx493HvPtt98OZ1x40KxZs5zfF5Of2bNnu/2fgDiye/dudezYUZZlKTU1VXV1de3ep76+XiNGjHB+5954440oJI19lKik4uJi5/Lpp5/uXO7Xr58k6b///tOuXbuCfrySkhKnjIcMGaIxY8aEJSe8a8OGDQd1//79+4cpCfygS5cumjRpkiRp586dQf1Df+rUqSooKJAkTZ8+Xddcc01EM8YLy7Zt2+0Qbhs0aJC++uorJScna+/evUpMTJQkvfTSS7rlllskSStWrNAFF1wQ1ONlZ2fr008/lSR9+eWXGjx4cGSCwzN++uknVVZWBnVueXm5xo4dq7KyMknSGWecobVr1yolJSWSEeExZWVl6tWrl2prazVo0CCtW7eu1XPnzZunm266SZI0evRoLVmyRJZlRStqbLN9rr6+3k5JSbEl2QMHDgy4bc2aNbYkW5L93HPPBfV4hYWFzn3Gjh0bicjwsaqqKvu8885zfsf69Olj//XXX27HQpwaP36887tUVFTU4jmrVq2yO3ToYEuyMzMz7YqKiiinjG2+H+du3brVWQEcOMqVpPT0dOfypk2b2n0s27Z1zz33SJIOPfRQPfbYY+ELCt+rra3VmDFjtHr1aklSz549tWLFCh199NHuBkPcuueee5wVZUuvq2/btk1jxoxRbW2tunfvrg8++ICJRxO+L9EDXw/NzMwMuK1r16467rjjJEmbN29u97Hy8/O1ceNGSdLtt9+unj17hi8ofK2hoUHXXXedli1bJkk6/vjjtWLFCh1//PEuJ0M8S09P18iRIyVJ77zzjn755Rfntj179uiyyy7TP//8o+TkZL3//vvq0aOHW1FjFiXayqaiRo2r0fZWovv27VNOTo4k6aijjtIDDzwQtozA5MmT9eabb0qSjj76aBUWFiotLc3lVPCCxulZfX295syZ41y++uqrncVDXl6esrKyXMsYyyjR/ytRy7KUkZHR7PbGEv3rr7/0999/t/o4c+bM0fbt2yVJDz74oI488shwR4VPTZ06VfPnz5ckderUSQUFBerbt6/LqeAV5557rrP5cd68eaqoqNBdd93l7MTNyclhJ25b3H5R1m3HHnusLck+5ZRTWrw9NzfXeeF9zZo1LZ6ze/duu2vXrrYku3fv3nZNTU0kI/vaHXfcYT/yyCP2xx9/bP/xxx9ux4m4hx56yPn9S0lJsdeuXet2JN8pLCy0b7vtNjsvL88uKSmxa2tr3Y4UdkuXLnV+zy644ALn8ujRo+2Ghga348W0JLfKOxb8/vvv+uOPPyS1PMqV/v+9otL+ke7QoUObnfPoo4/q33//lSQ98cQT6tChQ/jDQpJUUFCgrVu3OtdTU1N15plnBvwcc8wxLiYMn9mzZ+uhhx6SJB1yyCFaunSpzjnnHHdD+dCuXbv0wgsvONeTk5M1YMAADRw40Pmd69u3r5KS4vfP6ahRo3Tqqadq69at+uyzzyTtf+vUokWLeCtLO+L3WQ+Dxk1AUvNNRY0O3KHb0uaisrIyPf/885L2j0VGjRoV3pBoU1lZmcrKyvTee+85x7xQrHl5ebrrrrskSYmJiXr99deVnZ3tcipIUnV1tdavX6/169c7x+K9WC3L0rRp0zRx4kRJ0nHHHaf333+fnbhBiI9nOELa21Qk7X8NqkePHtq5c2eLm4umT5+u6upqWZalp59+OkJJEYp4L9a33npLkyZNkm3bsixLCxYs0OjRo92OhTZ4oVh79erlXJ48eTI7cYPl9jzZTVdddZUz+9+1a1er52VnZ9uS7BNOOCHgeElJiZ2QkGBLsseNGxfpuG26//77nf8WfoL7SU1NtUeNGmUXFha6+twdaNmyZc4b2xXCh3y4obK
y0vXnMN5+kpOT7UGDBtlTpkxx++lr5plnnnFyvvfee27HiRu+LtHevXvbkuxjjjmmzfOmTp3q/HLt2bPHOd5YrsnJyfaOHTsiHbdNbv9xiOefnJwcV5+7RqtXr7YPO+wwJ9fMmTPdjtSm3bt3u/7cxetPUlKS209fM9dff72Tb/v27W7HiRu+fYtLZWWlfvzxR0mtj3IbtfS66MqVK7V8+XJJ0p133qkTTzwxMkERcQkJ7v9vsGHDBl122WWqqqqStP+9e7zXGNHU+PJWly5ddNJJJ7kbJo7E5nA+CkpKStTQ0CCp9U1FjZp+/F9WVpamTZsmSerWrZvuu+++yAUNUkFBgbOrzsuefPLJg34My7LUu3dv57WqSy65JAzJzJWWlmr48OEqLy+XJN1888164oknXM0UjMMPP9z5/8DLCgsLA/ZPmOrcubPz2ujAgQMPPlgY1dbWOns+BgwY4HKa+OLbEg1mU1Gjvn37yrIs2batzZs364033tDXX38tSZoxY4Y6deoUwaTByc7O9sXuzQ8++CDgLS7taVqYZ555pjIzM9WxY8cIpgzetm3bdNFFF+mff/6RJI0bNy7g7RSxrEOHDpo1a5bbMSJu0aJFmjBhQkj3ObAwG0szLS0tZt8usnnzZtXU1Ehq/+8hAlGiav+X5ogjjtBJJ52k7du3q7i42PnuvT59+jjfyQf3xXphNvXrr7/qwgsv1G+//SZJuvzyy/Xyyy/HxHgZwYu3wmxJKH8PEcj3JZqSkqLevXu3e356erq2b9+uwsJC59isWbNidru618VbYTa1e/duXXjhhc5HRZ522mnKycnRli1bgn6M1NRUde7cOUIJ0RIvFGZLSkpKnMuUaGh82QANDQ367rvvJEn9+/cP6l/+6enpzjdoSNKwYcN06aWXRiwjWjZ9+nSdeOKJcVWYLVm+fHlAYW7ZsiXk18mKiop09tlnhzsamsjMzNTixYs9U5gtaVxUHHLIIXwuc4h8WaLff/99q98h2poDNxclJCToqaeeikQ0tOPaa691O0JYfPvttwd1/8TERFYMUdK/f3/179/f7RgR1bgS7du3Lx9bGiLLtm3b7RAAAMQjdjAAAGCIEgUAwBAlCgCAIUoUAABDlCgAAIYoUQAADFGiAAAYokQBADBEiQIAYIgSBQDAECUKAIAhShQAAEOUKAAAhihRAAAMUaIAABiiRAEAMESJAgBgiBIFAMAQJQoAgCFKFAAAQ5QoAACGKFEAAAxRogAAGKJEAQAwRIkCAGCIEgUAwBAlCgCAIUoUAABDlCgAAIYoUQAADFGiAAAYokQBADBEiQIAYIgSBQDAECUKAIAhShQAAEOUKAAAhihRAAAMUaIAABiiRAEAMESJAgBgiBIFAMAQJQoAgCFKFAAAQ5QoAACGKFEAAAxRogAAGKJEAQAwRIkCAGCIEgUAwBAlCgCAIUoUAABDlCgAAIYoUQAADFGiAAAYokQBADBEiQIAYIgSBQDAECUKAIAhShQAAEOUKAAAhihRAAAMUaIAABiiRAEAMESJAgBgiBIFAMAQJQoAgKH/AcCXPLig+oBWAAAAAElFTkSuQmCC", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "pgm = daft.PGM(dpi=DPI, grid_unit=GRID_UNIT, node_ec=NODE_EC)\n", + "\n", + "pgm.add_node(\"iv\", \"$IV$\", 0, 0)\n", + "pgm.add_node(\"z\", \"$Z$\", 1, 0)\n", + "pgm.add_node(\"y\", \"$Y$\", 2, 0)\n", + "pgm.add_node(\"x\", \"$\\mathbf{X}$\", 1.5, 0.75)\n", + "pgm.add_edge(\"iv\", \"z\")\n", + "pgm.add_edge(\"x\", \"z\")\n", + "pgm.add_edge(\"x\", \"y\")\n", + "pgm.add_edge(\"z\", \"y\")\n", + "\n", + "pgm.render();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's try to build some intuition for why having the $IV$ helps:\n", + "* $\\mathbf{X}$ is a confounder because it influences both $Z$ and $Y$.\n", + "* The $IV$ helps overcome this confounding because it is not influenced by $\\mathbf{X}$.\n", + "* Any association between the $IV$ and $Y$ must therefore be through the treatment $Z$.\n", + "* This means that the $IV$ can be used to estimate the causal effect $Z \\rightarrow Y$ without being confounded by $\\mathbf{X}$. Informally, the $IV$ causes some variation in the treatment $Z$ that is not due to $\\mathbf{X}$, and this variation can be used to estimate the causal effect $Z \\rightarrow Y$." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Readers are referred to {cite:t}`steiner2017graphical,cunningham2021causal` or {cite:t}`huntington2021effect` for a more in-depth discussion of the IV approach from the causal DAG perspective." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Interrupted Time Series\n", + "\n", + "A causal DAG for the interrupted time series quasi-experiment is given in Chapter 17 of {cite:t}`huntington2021effect`, where such designs are labelled [Event Studies](https://theeffectbook.net/ch-EventStudies.html). These studies suit situations where an intervention is made at a given point in time, at which the units move from untreated to treated. 
Typically, we consider situations with a reasonably large number of observations over time. Here is the causal DAG; note that $\\text{time}$ represents everything that changes over time, such as the time index and any time-varying predictor variables." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "tags": [ + "remove-input" + ] + }, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "pgm = daft.PGM(dpi=DPI, grid_unit=GRID_UNIT, node_ec=NODE_EC)\n", + "\n", + "pgm.add_node(\"a\", \"after\\ntreatment\", -1, 0)\n", + "pgm.add_node(\"z\", \"$Z$\", 0, 0)\n", + "pgm.add_node(\"y\", \"$Y$\", 1, 0)\n", + "pgm.add_node(\"t\", \"time\", 0, 1)\n", + "\n", + "pgm.add_edge(\"a\", \"z\")\n", + "pgm.add_edge(\"t\", \"a\")\n", + "pgm.add_edge(\"t\", \"y\")\n", + "pgm.add_edge(\"z\", \"y\")\n", + "\n", + "pgm.render();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What we want to understand is the causal effect of the treatment upon the outcome, $Z \\rightarrow Y$. But there is a backdoor path between $Z$ and $Y$ that makes this hard: $Z \\leftarrow \\text{after treatment} \\leftarrow \\text{time} \\rightarrow Y$.\n", + "\n", + ":::{note}\n", + "Below is an attempt to explain one way to deal with this, though it is a bit of a brain-twister and can take some time to get your head around. Thanks to Nick Huntington-Klein for some clarification in [this twitter thread](https://twitter.com/inferencelab/status/1783882438063661374).\n", + ":::\n", + "\n", + "One approach we can use is:\n", + "1. We want to close the backdoor path, and one way to do this is to split the dataset into two parts: pre-treatment and post-treatment. By fitting a model only to the pre-treatment data, we remove all variation in $\\text{after treatment}$ (all values are $0$), so there is no longer any variation in $Z$ caused by $\\text{time}$. This closes the backdoor path, so a model fitted to these data (e.g. $Y_{\\text{pre}} \\sim f(\\text{time}_{\\text{pre}})$) will not be biased by it.\n", + "2. 
However, our goal is to estimate the causal effect of the treatment $Z \\rightarrow Y$, but we have just removed all variation in $Z$, and $Z$ does not appear in the model $Y_{\\text{pre}} \\sim f(\\text{time}_{\\text{pre}})$, so our work is not done. One way to deal with this is to use the model to predict what would have happened in the post-treatment period if no treatment had been given. If we assume that the pre-treatment relationship between $\\text{time}$ and $Y$ would have continued unchanged in the absence of treatment, then this prediction is an unbiased estimate of the counterfactual. In the post-treatment data, $Z = 1$ for every observation, so the observed outcomes $Y_\\text{post}$ reflect the effect of the treatment, and the backdoor path is again closed because $\\text{after treatment} = 1$ throughout. The final comparison (subtraction) between the observed post-treatment data and the counterfactual estimate gives us the estimated treatment effect $Z \\rightarrow Y$." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Difference in Differences\n", + "\n", + "Difference-in-differences studies compare the change in outcomes over time between a treatment group and a control group. The causal DAG for this design is given in Chapter 18 of {cite:t}`huntington2021effect`:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "tags": [ + "remove-input" + ] + }, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "pgm = daft.PGM(dpi=DPI, grid_unit=GRID_UNIT, node_ec=NODE_EC)\n", + "\n", + "pgm.add_node(\"z\", \"$Z$\", 0, 0)\n", + "pgm.add_node(\"y\", \"$Y$\", 1, 0)\n", + "pgm.add_node(\"t\", \"time\", 0, 1)\n", + "pgm.add_node(\"g\", \"group\", 1, 1)\n", + "pgm.add_edge(\"t\", \"z\")\n", + "pgm.add_edge(\"t\", \"y\")\n", + "pgm.add_edge(\"g\", \"z\")\n", + "pgm.add_edge(\"g\", \"y\")\n", + "pgm.add_edge(\"z\", \"y\")\n", + "pgm.render();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + ":::{note}\n", + "For our explanation below, we assume the simplest case: a two-group, two-period design, the so-called \"classical\" 2$\\times$2 difference-in-differences design.\n", + ":::\n", + "\n", + "Our goal is to estimate the causal effect of the treatment on the outcome, $Z \\rightarrow Y$, but now we have _two_ backdoor paths:\n", + "1. $Z \\leftarrow \\text{time} \\rightarrow Y$\n", + "2. $Z \\leftarrow \\text{group} \\rightarrow Y$\n", + "\n", + "In the regression setting, both $\\text{time}$ and $\\text{group}$ are binary variables. Treatment is given to the treatment group ($\\text{group}=1$) at $\\text{time}=1$.\n", + "\n", + "The causal effect of the treatment on the outcome is typically estimated by fitting a regression model of the form `y ~ time + group + time:group`. The coefficient on the interaction term `time:group` captures the causal effect $Z \\rightarrow Y$.\n", + "\n", + "Note that the interaction term $\\text{time} \\times \\text{group}$ encodes the values of $Z$, which, as noted above, equals 1 only for the treatment group at time 1. 
So another way to think about including the interaction term is that we are simply regressing $Y$ on $Z$ while conditioning on $\\text{time}$ and $\\text{group}$, which closes both backdoor paths. In this way we use all the observed data ($Z$, $\\text{time}$, $\\text{group}$, $Y$) to estimate the causal effect of $Z \\rightarrow Y$.\n", + "\n", + ":::{warning}\n", + "An unbiased estimate depends critically on the {term}`parallel trends assumption`. That is, we assume that, in the absence of treatment, the outcomes of the treatment and control groups would have followed the same trend over time (i.e. changed by the same amount). This is a strong assumption and should be carefully considered when interpreting the results of a difference-in-differences study. In the classical 2$\\times$2 design, this assumption cannot be assessed empirically because there is only one pre-treatment time point, so it is important to consider its plausibility in the context of the particular application.\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Synthetic Control\n", + "\n", + ":::{warning}\n", + "While many texts cover the synthetic control method, they typically do not provide a treatment based on causal DAGs. This section is therefore pending; we hope to update it soon.\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Regression Discontinuity\n", + "\n", + "The regression discontinuity design is similar to the interrupted time series design, but rather than being given at a specific point in time, treatment is assigned based on a cutoff value $\\lambda$ of some running variable $RV$. The running variable could be a test score, age, spatial location, etc. It may also influence the outcome directly, $RV \\rightarrow Y$, and it may be associated with a set of variables $\\mathbf{X}$ that influence the outcome, $RV - - - - \\mathbf{X} \\rightarrow Y$."
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "tags": [ + "remove-input" + ] + }, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "pgm = daft.PGM(dpi=DPI, grid_unit=GRID_UNIT, node_ec=NODE_EC)\n", + "\n", + "# data generating graph\n", + "pgm.add_node(\"a\", \"$RV$\", 0, 1)\n", + "pgm.add_node(\"z\", \"$Z$\", 0, 0)\n", + "pgm.add_node(\"x\", r\"$\\mathbf{X}$\", 1, 1)\n", + "pgm.add_node(\"y\", \"$Y$\", 1, 0)\n", + "pgm.add_edge(\"a\", \"z\")\n", + "pgm.add_edge(\"a\", \"y\")\n", + "pgm.add_edge(\n", + " \"a\",\n", + " \"x\",\n", + " plot_params={\"ec\": \"grey\", \"lw\": 1.5, \"ls\": \":\", \"head_length\": 0, \"head_width\": 0},\n", + ")\n", + "pgm.add_edge(\"z\", \"y\")\n", + "pgm.add_edge(\"x\", \"y\")\n", + "pgm.add_text(0, 1.3, \"Data generating graph\")\n", + "\n", + "# limiting graph\n", + "x_offset = 2\n", + "pgm.add_node(\"a2\", r\"$RV \\rightarrow \\lambda$\", 0 + x_offset, 1)\n", + "pgm.add_node(\"z2\", \"$Z$\", 0 + x_offset, 0)\n", + "pgm.add_node(\"x2\", r\"$\\mathbf{X}$\", 1 + x_offset, 1)\n", + "pgm.add_node(\"y2\", \"$Y$\", 1 + x_offset, 0)\n", + "pgm.add_edge(\"a2\", \"z2\")\n", + "pgm.add_edge(\"z2\", \"y2\")\n", + "pgm.add_edge(\"x2\", \"y2\")\n", + "pgm.add_text(x_offset, 1.3, \"Limiting graph\")\n", + "\n", + "pgm.render();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can see from the data-generating graph (left) that $RV$ is a confounder, as it influences both the treatment $Z$ and the outcome $Y$.\n", + "\n", + "If we tried to identify the causal effect of $Z \\rightarrow Y$ by conditioning on the running variable ($RV=rv$), we would eliminate any variation in $Z$ or $Y$ caused by $RV$. However, because $Z$ is constant for any given value of $RV$, the $Z \\rightarrow Y$ path would disappear too, and we could not estimate the causal effect.\n", + "\n", + "Instead, the causal effect of $Z \\rightarrow Y$ is identified using a limiting graph (right). 
The $RV$ node is replaced by the subset of the data where $RV$ is close to the cutoff value $\\lambda$, hence the name \"limiting graph\" and the symbol $RV \\rightarrow \\lambda$.\n", + "\n", + "In the limit, this eliminates variation in the running variable and so breaks the $RV \\rightarrow Y$ path, as well as the association between $RV$ and $\\mathbf{X}$. The causal effect of $Z \\rightarrow Y$ can then be estimated by comparing the outcomes of units just above and just below the cutoff value $\\lambda$.\n", + "\n", + "Readers are referred to {cite:t}`steiner2017graphical` and [Chapter 6](https://mixtape.scunning.com/06-regression_discontinuity) of {cite:t}`cunningham2021causal`, who discuss limiting graphs in more detail. Chapter 20 of {cite:t}`huntington2021effect` also covers regression discontinuity designs, but presents a simplified (and, in his own words, non-kosher) causal DAG." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## References\n", + ":::{bibliography}\n", + ":filter: docname in docnames\n", + ":::" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "CausalPy", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/source/references.bib b/docs/source/references.bib index 08034e5a..93acf8a8 100644 --- a/docs/source/references.bib +++ b/docs/source/references.bib @@ -76,3 +76,28 @@ @book{shadish_cook_cambell_2002 year={2002}, publisher={Houghton Mifflin Boston, MA} } + +@article{steiner2017graphical, + title={Graphical models for quasi-experimental designs}, + author={Steiner, Peter M and Kim, Yongnam and Hall, Courtney E and Su, Dan}, + journal={Sociological Methods \& Research}, + volume={46}, + number={2}, + pages={155--188}, + year={2017}, + publisher={SAGE
Publications Sage CA: Los Angeles, CA} +} + +@book{cunningham2021causal, + title={Causal Inference: The Mixtape}, + author={Cunningham, Scott}, + year={2021}, + publisher={Yale University Press} +} + +@book{huntington2021effect, + title={The Effect: An Introduction to Research Design and Causality}, + author={Huntington-Klein, Nick}, + year={2021}, + publisher={Chapman and Hall/CRC} +} diff --git a/pyproject.toml b/pyproject.toml index abd63362..eb8d61df 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -52,6 +52,7 @@ dependencies = [ dev = ["pathlib", "pre-commit", "twine", "interrogate"] docs = [ "ipykernel", + "daft", "linkify-it-py", "myst-nb", "pathlib",