
Commit a3f13fa

fixing \ref and \eqref
1 parent 7e65e96 commit a3f13fa

11 files changed: +113 -110 lines

Ch03-linreg-lab.Rmd

Lines changed: 2 additions & 2 deletions
@@ -343,7 +343,7 @@ As mentioned above, there is an existing function to add a line to a plot --- `a
 
 
 Next we examine some diagnostic plots, several of which were discussed
-in Section~\ref{Ch3:problems.sec}.
+in Section 3.3.3.
 We can find the fitted values and residuals
 of the fit as attributes of the `results` object.
 Various influence measures describing the regression model
@@ -440,7 +440,7 @@ We can access the individual components of `results` by name
 and
 `np.sqrt(results.scale)` gives us the RSE.
 
-Variance inflation factors (section~\ref{Ch3:problems.sec}) are sometimes useful
+Variance inflation factors (section 3.3.3) are sometimes useful
 to assess the effect of collinearity in the model matrix of a regression model.
 We will compute the VIFs in our multiple regression fit, and use the opportunity to introduce the idea of *list comprehension*.

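As an aside for readers following the lab: the VIF computation this hunk points to can be sketched with a list comprehension roughly as below. This is a minimal sketch, not the lab's own code chunk; the toy design matrix and its column names are made up for illustration.

```python
# Minimal sketch: VIFs via a list comprehension (toy data, illustrative column names).
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor as VIF

rng = np.random.default_rng(0)
X = pd.DataFrame({'intercept': np.ones(100),
                  'x1': rng.normal(size=100),
                  'x2': rng.normal(size=100)})

# one VIF per non-intercept column of the model matrix
vals = [VIF(X.values, i) for i in range(1, X.shape[1])]
print(pd.DataFrame({'vif': vals}, index=X.columns[1:]))
```
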
Ch04-classification-lab.Rmd

Lines changed: 11 additions & 11 deletions
@@ -405,7 +405,7 @@ lda.fit(X_train, L_train)
 
 ```
 Here we have used the list comprehensions introduced
-in Section~\ref{Ch3-linreg-lab:multivariate-goodness-of-fit}. Looking at our first line above, we see that the right-hand side is a list
+in Section 3.6.4. Looking at our first line above, we see that the right-hand side is a list
 of length two. This is because the code `for M in [X_train, X_test]` iterates over a list
 of length two. While here we loop over a list,
 the list comprehension method works when looping over any iterable object.
@@ -454,7 +454,7 @@ lda.scalings_
 
 ```
 
-These values provide the linear combination of `Lag1` and `Lag2` that are used to form the LDA decision rule. In other words, these are the multipliers of the elements of $X=x$ in (\ref{Ch4:bayes.multi}).
+These values provide the linear combination of `Lag1` and `Lag2` that are used to form the LDA decision rule. In other words, these are the multipliers of the elements of $X=x$ in (4.24).
 If $-0.64\times `Lag1` - 0.51 \times `Lag2` $ is large, then the LDA classifier will predict a market increase, and if it is small, then the LDA classifier will predict a market decline.
 
 ```{python}
@@ -463,7 +463,7 @@ lda_pred = lda.predict(X_test)
 ```
 
 As we observed in our comparison of classification methods
-(Section~\ref{Ch4:comparison.sec}), the LDA and logistic
+(Section 4.5), the LDA and logistic
 regression predictions are almost identical.
 
 ```{python}
@@ -522,7 +522,7 @@ The LDA classifier above is the first classifier from the
 `sklearn` library. We will use several other objects
 from this library. The objects
 follow a common structure that simplifies tasks such as cross-validation,
-which we will see in Chapter~\ref{Ch5:resample}. Specifically,
+which we will see in Chapter 5. Specifically,
 the methods first create a generic classifier without
 referring to any data. This classifier is then fit
 to data with the `fit()` method and predictions are
@@ -875,7 +875,7 @@ This is double the rate that one would obtain from random guessing.
 The number of neighbors in KNN is referred to as a *tuning parameter*, also referred to as a *hyperparameter*.
 We do not know *a priori* what value to use. It is therefore of interest
 to see how the classifier performs on test data as we vary these
-parameters. This can be achieved with a `for` loop, described in Section~\ref{Ch2-statlearn-lab:for-loops}.
+parameters. This can be achieved with a `for` loop, described in Section 2.3.8.
 Here we use a for loop to look at the accuracy of our classifier in the group predicted to purchase
 insurance as we vary the number of neighbors from 1 to 5:
 
@@ -902,7 +902,7 @@ As a comparison, we can also fit a logistic regression model to the
 data. This can also be done
 with `sklearn`, though by default it fits
 something like the *ridge regression* version
-of logistic regression, which we introduce in Chapter~\ref{Ch6:varselect}. This can
+of logistic regression, which we introduce in Chapter 6. This can
 be modified by appropriately setting the argument `C` below. Its default
 value is 1 but by setting it to a very large number, the algorithm converges to the same solution as the usual (unregularized)
 logistic regression estimator discussed above.
@@ -946,7 +946,7 @@ confusion_table(logit_labels, y_test)
 
 ```
 ## Linear and Poisson Regression on the Bikeshare Data
-Here we fit linear and Poisson regression models to the `Bikeshare` data, as described in Section~\ref{Ch4:sec:pois}.
+Here we fit linear and Poisson regression models to the `Bikeshare` data, as described in Section 4.6.
 The response `bikers` measures the number of bike rentals per hour
 in Washington, DC in the period 2010--2012.
 
@@ -987,7 +987,7 @@ variables constant, there are on average about 7 more riders in
 February than in January. Similarly there are about 16.5 more riders
 in March than in January.
 
-The results seen in Section~\ref{sec:bikeshare.linear}
+The results seen in Section 4.6.1
 used a slightly different coding of the variables `hr` and `mnth`, as follows:
 
 ```{python}
@@ -1041,7 +1041,7 @@ np.allclose(M_lm.fittedvalues, M2_lm.fittedvalues)
 ```
 
 
-To reproduce the left-hand side of Figure~\ref{Ch4:bikeshare}
+To reproduce the left-hand side of Figure 4.13
 we must first obtain the coefficient estimates associated with
 `mnth`. The coefficients for January through November can be obtained
 directly from the `M2_lm` object. The coefficient for December
@@ -1081,7 +1081,7 @@ ax_month.set_ylabel('Coefficient', fontsize=20);
 
 ```
 
-Reproducing the right-hand plot in Figure~\ref{Ch4:bikeshare} follows a similar process.
+Reproducing the right-hand plot in Figure 4.13 follows a similar process.
 
 ```{python}
 coef_hr = S2[S2.index.str.contains('hr')]['coef']
@@ -1116,7 +1116,7 @@ M_pois = sm.GLM(Y, X2, family=sm.families.Poisson()).fit()
 
 ```
 
-We can plot the coefficients associated with `mnth` and `hr`, in order to reproduce Figure~\ref{Ch4:bikeshare.pois}. We first complete these coefficients as before.
+We can plot the coefficients associated with `mnth` and `hr`, in order to reproduce Figure 4.15. We first complete these coefficients as before.
 
 ```{python}
 S_pois = summarize(M_pois)

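The `for` loop over the number of neighbors described in the KNN hunk above can be sketched roughly as follows. This is a hedged sketch on synthetic data, not the lab's chunk; the lab works with its insurance data set and different variable names.

```python
# Minimal sketch: vary K in KNN and report accuracy within the group predicted 'Yes'.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(800, 5)), rng.normal(size=(200, 5))
y_train = rng.choice(['No', 'Yes'], size=800, p=[0.9, 0.1])
y_test = rng.choice(['No', 'Yes'], size=200, p=[0.9, 0.1])

for K in range(1, 6):
    pred = KNeighborsClassifier(n_neighbors=K).fit(X_train, y_train).predict(X_test)
    pos = pred == 'Yes'                                 # predicted purchasers
    hits = np.sum((pred == 'Yes') & (y_test == 'Yes'))
    rate = hits / max(pos.sum(), 1)                     # guard against an empty group
    print(f'K={K}: predicted Yes: {pos.sum():3d}, accuracy among them: {rate:.1%}')
```
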
Ch05-resample-lab.Rmd

Lines changed: 12 additions & 12 deletions
@@ -237,7 +237,7 @@ for i, d in enumerate(range(1,6)):
 cv_error
 
 ```
-As in Figure~\ref{Ch5:cvplot}, we see a sharp drop in the estimated test MSE between the linear and
+As in Figure 5.4, we see a sharp drop in the estimated test MSE between the linear and
 quadratic fits, but then no clear improvement from using higher-degree polynomials.
 
 Above we introduced the `outer()` method of the `np.power()`
@@ -278,7 +278,7 @@ cv_error
 Notice that the computation time is much shorter than that of LOOCV.
 (In principle, the computation time for LOOCV for a least squares
 linear model should be faster than for $k$-fold CV, due to the
-availability of the formula~(\ref{Ch5:eq:LOOCVform}) for LOOCV;
+availability of the formula~(5.2) for LOOCV;
 however, the generic `cross_validate()` function does not make
 use of this formula.) We still see little evidence that using cubic
 or higher-degree polynomial terms leads to a lower test error than simply
@@ -325,7 +325,7 @@ incurred by picking different random folds.
 
 ## The Bootstrap
 We illustrate the use of the bootstrap in the simple example
-{of Section~\ref{Ch5:sec:bootstrap},} as well as on an example involving
+{of Section 5.2,} as well as on an example involving
 estimating the accuracy of the linear regression model on the `Auto`
 data set.
 ### Estimating the Accuracy of a Statistic of Interest
@@ -340,8 +340,8 @@ in a dataframe.
 To illustrate the bootstrap, we
 start with a simple example.
 The `Portfolio` data set in the `ISLP` package is described
-in Section~\ref{Ch5:sec:bootstrap}. The goal is to estimate the
-sampling variance of the parameter $\alpha$ given in formula~(\ref{Ch5:min.var}). We will
+in Section 5.2. The goal is to estimate the
+sampling variance of the parameter $\alpha$ given in formula~(5.7). We will
 create a function
 `alpha_func()`, which takes as input a dataframe `D` assumed
 to have columns `X` and `Y`, as well as a
@@ -360,7 +360,7 @@ def alpha_func(D, idx):
 ```
 This function returns an estimate for $\alpha$
 based on applying the minimum
-variance formula (\ref{Ch5:min.var}) to the observations indexed by
+variance formula (5.7) to the observations indexed by
 the argument `idx`. For instance, the following command
 estimates $\alpha$ using all 100 observations.
 
@@ -430,7 +430,7 @@ intercept and slope terms for the linear regression model that uses
 `horsepower` to predict `mpg` in the `Auto` data set. We
 will compare the estimates obtained using the bootstrap to those
 obtained using the formulas for ${\rm SE}(\hat{\beta}_0)$ and
-${\rm SE}(\hat{\beta}_1)$ described in Section~\ref{Ch3:secoefsec}.
+${\rm SE}(\hat{\beta}_1)$ described in Section 3.1.2.
 
 To use our `boot_SE()` function, we must write a function (its
 first argument)
@@ -499,7 +499,7 @@ This indicates that the bootstrap estimate for ${\rm SE}(\hat{\beta}_0)$ is
 0.85, and that the bootstrap
 estimate for ${\rm SE}(\hat{\beta}_1)$ is
 0.0074. As discussed in
-Section~\ref{Ch3:secoefsec}, standard formulas can be used to compute
+Section 3.1.2, standard formulas can be used to compute
 the standard errors for the regression coefficients in a linear
 model. These can be obtained using the `summarize()` function
 from `ISLP.sm`.
@@ -513,21 +513,21 @@ model_se
 
 
 The standard error estimates for $\hat{\beta}_0$ and $\hat{\beta}_1$
-obtained using the formulas from Section~\ref{Ch3:secoefsec} are
+obtained using the formulas from Section 3.1.2 are
 0.717 for the
 intercept and
 0.006 for the
 slope. Interestingly, these are somewhat different from the estimates
 obtained using the bootstrap. Does this indicate a problem with the
 bootstrap? In fact, it suggests the opposite. Recall that the
 standard formulas given in
-{Equation~\ref{Ch3:se.eqn} on page~\pageref{Ch3:se.eqn}}
+{Equation 3.8 on page~\pageref{Ch3:se.eqn}}
 rely on certain assumptions. For example,
 they depend on the unknown parameter $\sigma^2$, the noise
 variance. We then estimate $\sigma^2$ using the RSS. Now although the
 formulas for the standard errors do not rely on the linear model being
 correct, the estimate for $\sigma^2$ does. We see
-{in Figure~\ref{Ch3:polyplot} on page~\pageref{Ch3:polyplot}} that there is
+{in Figure 3.8 on page~\pageref{Ch3:polyplot}} that there is
 a non-linear relationship in the data, and so the residuals from a
 linear fit will be inflated, and so will $\hat{\sigma}^2$. Secondly,
 the standard formulas assume (somewhat unrealistically) that the $x_i$
@@ -540,7 +540,7 @@ the results from `sm.OLS`.
 Below we compute the bootstrap standard error estimates and the
 standard linear regression estimates that result from fitting the
 quadratic model to the data. Since this model provides a good fit to
-the data (Figure~\ref{Ch3:polyplot}), there is now a better
+the data (Figure 3.8), there is now a better
 correspondence between the bootstrap estimates and the standard
 estimates of ${\rm SE}(\hat{\beta}_0)$, ${\rm SE}(\hat{\beta}_1)$ and
 ${\rm SE}(\hat{\beta}_2)$.

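The `alpha_func()` discussed in the bootstrap hunks above can be sketched roughly as follows, applying the minimum-variance formula $\alpha = (\sigma^2_Y - \sigma_{XY})/(\sigma^2_X + \sigma^2_Y - 2\sigma_{XY})$ to the rows selected by `idx`. This is a sketch on a synthetic stand-in for the `Portfolio` data, not the lab's own chunk.

```python
# Minimal sketch of an alpha_func(): minimum-variance weight from the X/Y covariance.
import numpy as np
import pandas as pd

def alpha_func(D, idx):
    cov_ = np.cov(D[['X', 'Y']].loc[idx], rowvar=False)
    return ((cov_[1, 1] - cov_[0, 1]) /
            (cov_[0, 0] + cov_[1, 1] - 2 * cov_[0, 1]))

rng = np.random.default_rng(0)
Portfolio = pd.DataFrame(                      # synthetic stand-in, 100 rows
    rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.25]], size=100),
    columns=['X', 'Y'])
print(alpha_func(Portfolio, range(100)))       # estimate using all 100 observations
```
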
Ch06-varselect-lab.Rmd

Lines changed: 7 additions & 7 deletions
@@ -89,7 +89,7 @@ Hitters.shape
 ```
 
 
-We first choose the best model using forward selection based on $C_p$ (\ref{Ch6:eq:cp}). This score
+We first choose the best model using forward selection based on $C_p$ (6.2). This score
 is not built in as a metric to `sklearn`. We therefore define a function to compute it ourselves, and use
 it as a scorer. By default, `sklearn` tries to maximize a score, hence
 our scoring function computes the negative $C_p$ statistic.
@@ -114,7 +114,7 @@ sigma2 = OLS(Y,X).fit().scale
 
 ```
 
-The function `sklearn_selected()` expects a scorer with just three arguments --- the last three in the definition of `nCp()` above. We use the function `partial()` first seen in Section~\ref{Ch5-resample-lab:the-bootstrap} to freeze the first argument with our estimate of $\sigma^2$.
+The function `sklearn_selected()` expects a scorer with just three arguments --- the last three in the definition of `nCp()` above. We use the function `partial()` first seen in Section 5.3.3 to freeze the first argument with our estimate of $\sigma^2$.
 
 ```{python}
 neg_Cp = partial(nCp, sigma2)
@@ -366,7 +366,7 @@ Since we
 standardize first, in order to find coefficient
 estimates on the original scale, we must *unstandardize*
 the coefficient estimates. The parameter
-$\lambda$ in (\ref{Ch6:ridge}) and (\ref{Ch6:LASSO}) is called `alphas` in `sklearn`. In order to
+$\lambda$ in (6.5) and (6.7) is called `alphas` in `sklearn`. In order to
 be consistent with the rest of this chapter, we use `lambdas`
 rather than `alphas` in what follows. {At the time of publication, ridge fits like the one in code chunk [22] issue unwarranted convergence warning messages; we expect these to disappear as this package matures.}
 
@@ -643,7 +643,7 @@ not perform variable selection!
 ### Evaluating Test Error of Cross-Validated Ridge
 Choosing $\lambda$ using cross-validation provides a single regression
 estimator, similar to fitting a linear regression model as we saw in
-Chapter~\ref{Ch3:linreg}. It is therefore reasonable to estimate what its test error
+Chapter 3. It is therefore reasonable to estimate what its test error
 is. We run into a problem here in that cross-validation will have
 *touched* all of its data in choosing $\lambda$, hence we have no
 further data to estimate test error. A compromise is to do an initial
@@ -779,11 +779,11 @@ Principal components regression (PCR) can be performed using
 `PCA()` from the `sklearn.decomposition`
 module. We now apply PCR to the `Hitters` data, in order to
 predict `Salary`. Again, ensure that the missing values have
-been removed from the data, as described in Section~\ref{Ch6-varselect-lab:lab-1-subset-selection-methods}.
+been removed from the data, as described in Section 6.5.1.
 
 We use `LinearRegression()` to fit the regression model
 here. Note that it fits an intercept by default, unlike
-the `OLS()` function seen earlier in Section~\ref{Ch6-varselect-lab:lab-1-subset-selection-methods}.
+the `OLS()` function seen earlier in Section 6.5.1.
 
 ```{python}
 pca = PCA(n_components=2)
@@ -867,7 +867,7 @@ cv_null = skm.cross_validate(linreg,
 The `explained_variance_ratio_`
 attribute of our `PCA` object provides the *percentage of variance explained* in the predictors and in the response using
 different numbers of components. This concept is discussed in greater
-detail in Section~\ref{Ch10:sec:pca}.
+detail in Section 12.2.
 
 ```{python}
 pipe.named_steps['pca'].explained_variance_ratio_

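The negative $C_p$ scorer and the `partial()` trick referred to in the first two hunks above can be sketched roughly as follows, assuming $C_p = (\mathrm{RSS} + 2\,d\,\hat\sigma^2)/n$. The signature of `nCp()` here and the synthetic data are illustrative; the lab's own definitions may differ in detail.

```python
# Minimal sketch: a negative-Cp scorer (sklearn maximizes scores) plus partial().
import numpy as np
from functools import partial
from sklearn.linear_model import LinearRegression

def nCp(sigma2, estimator, X, Y):
    """Negative Cp statistic for a fitted regression estimator."""
    n, d = X.shape
    RSS = np.sum((Y - estimator.predict(X)) ** 2)
    return -(RSS + 2 * d * sigma2) / n

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Y = X @ np.array([1.0, 0.0, -2.0]) + rng.normal(size=100)
fit = LinearRegression().fit(X, Y)
sigma2 = np.sum((Y - fit.predict(X)) ** 2) / (100 - 3 - 1)  # rough noise-variance estimate
neg_Cp = partial(nCp, sigma2)   # freeze sigma2, leaving a three-argument scorer
print(neg_Cp(fit, X, Y))
```
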
Ch07-nonlin-lab.Rmd

Lines changed: 12 additions & 12 deletions
@@ -58,7 +58,7 @@ from ISLP.pygam import (approx_lam,
 ```
 
 ## Polynomial Regression and Step Functions
-We start by demonstrating how Figure~\ref{Ch7:fig:poly} can be reproduced.
+We start by demonstrating how Figure 7.1 can be reproduced.
 Let's begin by loading the data.
 
 ```{python}
@@ -70,7 +70,7 @@ age = Wage['age']
 
 Throughout most of this lab, our response is `Wage['wage']`, which
 we have stored as `y` above.
-As in Section~\ref{Ch3-linreg-lab:non-linear-transformations-of-the-predictors}, we will use the `poly()` function to create a model matrix
+As in Section 3.6.6, we will use the `poly()` function to create a model matrix
 that will fit a $4$th degree polynomial in `age`.
 
 ```{python}
@@ -84,7 +84,7 @@ summarize(M)
 This polynomial is constructed using the function `poly()`,
 which creates
 a special *transformer* `Poly()` (using `sklearn` terminology
-for feature transformations such as `PCA()` seen in Section \ref{Ch6-varselect-lab:principal-components-regression}) which
+for feature transformations such as `PCA()` seen in Section 6.5.3) which
 allows for easy evaluation of the polynomial at new data points. Here `poly()` is referred to as a *helper* function, and sets up the transformation; `Poly()` is the actual workhorse that computes the transformation. See also
 the
 discussion of transformations on
@@ -151,7 +151,7 @@ def plot_wage_fit(age_df,
 We include an argument `alpha` to `ax.scatter()`
 to add some transparency to the points. This provides a visual indication
 of density. Notice the use of the `zip()` function in the
-`for` loop above (see Section~\ref{Ch2-statlearn-lab:for-loops}).
+`for` loop above (see Section 2.3.8).
 We have three lines to plot, each with different colors and line
 types. Here `zip()` conveniently bundles these together as
 iterators in the loop. {In `Python`{} speak, an "iterator" is an object with a finite number of values, that can be iterated on, as in a loop.}
@@ -254,7 +254,7 @@ anova_lm(*[sm.OLS(y, X_).fit() for X_ in XEs])
 
 
 As an alternative to using hypothesis tests and ANOVA, we could choose
-the polynomial degree using cross-validation, as discussed in Chapter~\ref{Ch5:resample}.
+the polynomial degree using cross-validation, as discussed in Chapter 5.
 
 Next we consider the task of predicting whether an individual earns
 more than $250,000 per year. We proceed much as before, except
@@ -313,7 +313,7 @@ value do not cover each other up. This type of plot is often called a
 *rug plot*.
 
 In order to fit a step function, as discussed in
-Section~\ref{Ch7:sec:scolstep-function}, we first use the `pd.qcut()`
+Section 7.2, we first use the `pd.qcut()`
 function to discretize `age` based on quantiles. Then we use `pd.get_dummies()` to create the
 columns of the model matrix for this categorical variable. Note that this function will
 include *all* columns for a given categorical, rather than the usual approach which drops one
@@ -345,7 +345,7 @@ evaluation functions are in the `scipy.interpolate` package;
 we have simply wrapped them as transforms
 similar to `Poly()` and `PCA()`.
 
-In Section~\ref{Ch7:sec:scolr-splin}, we saw
+In Section 7.4, we saw
 that regression splines can be fit by constructing an appropriate
 matrix of basis functions. The `BSpline()` function generates the
 entire matrix of basis functions for splines with the specified set of
@@ -360,7 +360,7 @@ bs_age.shape
 ```
 This results in a seven-column matrix, which is what is expected for a cubic-spline basis with 3 interior knots.
 We can form this same matrix using the `bs()` object,
-which facilitates adding this to a model-matrix builder (as in `poly()` versus its workhorse `Poly()`) described in Section~\ref{Ch7-nonlin-lab:polynomial-regression-and-step-functions}.
+which facilitates adding this to a model-matrix builder (as in `poly()` versus its workhorse `Poly()`) described in Section 7.8.1.
 
 We now fit a cubic spline model to the `Wage` data.
 
@@ -469,7 +469,7 @@ of a model matrix with a particular smoothing operation:
 `s` for smoothing spline; `l` for linear, and `f` for factor or categorical variables.
 The argument `0` passed to `s` below indicates that this smoother will
 apply to the first column of a feature matrix. Below, we pass it a
-matrix with a single column: `X_age`. The argument `lam` is the penalty parameter $\lambda$ as discussed in Section~\ref{Ch7:sec5.2}.
+matrix with a single column: `X_age`. The argument `lam` is the penalty parameter $\lambda$ as discussed in Section 7.5.2.
 
 ```{python}
 X_age = np.asarray(age).reshape((-1,1))
@@ -559,7 +559,7 @@ The strength of generalized additive models lies in their ability to fit multiva
 
 We now fit a GAM by hand to predict
 `wage` using natural spline functions of `year` and `age`,
-treating `education` as a qualitative predictor, as in (\ref{Ch7:nsmod}).
+treating `education` as a qualitative predictor, as in (7.16).
 Since this is just a big linear regression model
 using an appropriate choice of basis functions, we can simply do this
 using the `sm.OLS()` function.
@@ -642,9 +642,9 @@ ax.set_title('Partial dependence of year on wage', fontsize=20);
 
 ```
 
-We now fit the model (\ref{Ch7:nsmod}) using smoothing splines rather
+We now fit the model (7.16) using smoothing splines rather
 than natural splines. All of the
-terms in (\ref{Ch7:nsmod}) are fit simultaneously, taking each other
+terms in (7.16) are fit simultaneously, taking each other
 into account to explain the response. The `pygam` package only works with matrices, so we must convert
 the categorical series `education` to its array representation, which can be found
 with the `cat.codes` attribute of `education`. As `year` only has 7 unique values, we

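The step-function construction mentioned in the `pd.qcut()` hunk above can be sketched roughly as follows; the data are synthetic stand-ins for the `Wage` data, and the bin count is arbitrary.

```python
# Minimal sketch: discretize age with pd.qcut(), build indicators with pd.get_dummies(),
# and regress on all indicator columns (so no intercept is added).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
age = pd.Series(rng.integers(18, 80, size=300), name='age')
wage = pd.Series(50 + 0.5 * age + rng.normal(scale=10, size=300), name='wage')

cut_age = pd.qcut(age, 4)                        # four bins at the quantiles of age
X_step = pd.get_dummies(cut_age).astype(float)   # keeps *all* indicator columns
X_step.columns = X_step.columns.astype(str)      # readable coefficient names
print(sm.OLS(wage, X_step).fit().params)         # one fitted mean wage per age bin
```
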