{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Principal Component Regression vs Partial Least Squares Regression\n\nAn example of ``scikit-learn``, copied from\n[scikit-learn.org](https://scikit-learn.org/stable/auto_examples/cross_decomposition/plot_pcr_vs_pls.html#sphx-glr-auto-examples-cross-decomposition-plot-pcr-vs-pls-py).\n\nThis example compares [Principal Component Regression](https://en.wikipedia.org/wiki/Principal_component_regression) (PCR) and\n[Partial Least Squares Regression](https://en.wikipedia.org/wiki/Partial_least_squares_regression) (PLS) on a\ntoy dataset. Our goal is to illustrate how PLS can outperform PCR when the\ntarget is strongly correlated with some directions in the data that have a\nlow variance.\n\nPCR is a regressor composed of two steps: first,\n:class:`~sklearn.decomposition.PCA` is applied to the training data, possibly\nperforming dimensionality reduction; then, a regressor (e.g. a linear\nregressor) is trained on the transformed samples. In\n:class:`~sklearn.decomposition.PCA`, the transformation is purely\nunsupervised, meaning that no information about the targets is used. As a\nresult, PCR may perform poorly in some datasets where the target is strongly\ncorrelated with *directions* that have low variance. Indeed, the\ndimensionality reduction of PCA projects the data into a lower dimensional\nspace where the variance of the projected data is greedily maximized along\neach axis. Despite them having the most predictive power on the target, the\ndirections with a lower variance will be dropped, and the final regressor\nwill not be able to leverage them.\n\nPLS is both a transformer and a regressor, and it is quite similar to PCR: it\nalso applies a dimensionality reduction to the samples before applying a\nlinear regressor to the transformed data. The main difference with PCR is\nthat the PLS transformation is supervised. Therefore, as we will see in this\nexample, it does not suffer from the issue we just mentioned.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The data\n\nWe start by creating a simple dataset with two features. Before we even dive\ninto PCR and PLS, we fit a PCA estimator to display the two principal\ncomponents of this dataset, i.e. 
the two directions that explain the most\nvariance in the data.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.decomposition import PCA\n\nrng = np.random.RandomState(0)\nn_samples = 500\ncov = [[3, 3], [3, 4]]\nX = rng.multivariate_normal(mean=[0, 0], cov=cov, size=n_samples)\npca = PCA(n_components=2).fit(X)\n\n\nplt.scatter(X[:, 0], X[:, 1], alpha=0.3, label=\"samples\")\nfor i, (comp, var) in enumerate(zip(pca.components_, pca.explained_variance_)):\n comp = comp * var # scale component by its variance explanation power\n plt.plot(\n [0, comp[0]],\n [0, comp[1]],\n label=f\"Component {i}\",\n linewidth=5,\n color=f\"C{i + 2}\",\n )\nplt.gca().set(\n aspect=\"equal\",\n title=\"2-dimensional dataset with principal components\",\n xlabel=\"first feature\",\n ylabel=\"second feature\",\n)\nplt.legend()\nplt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the purpose of this example, we now define the target `y` such that it is\nstrongly correlated with a direction that has a small variance. To this end,\nwe will project `X` onto the second component, and add some noise to it.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "y = X.dot(pca.components_[1]) + rng.normal(size=n_samples) / 2\n\nfig, axes = plt.subplots(1, 2, figsize=(10, 3))\n\naxes[0].scatter(X.dot(pca.components_[0]), y, alpha=0.3)\naxes[0].set(xlabel=\"Projected data onto first PCA component\", ylabel=\"y\")\naxes[1].scatter(X.dot(pca.components_[1]), y, alpha=0.3)\naxes[1].set(xlabel=\"Projected data onto second PCA component\", ylabel=\"y\")\nplt.tight_layout()\nplt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Projection on one component and predictive power\n\nWe now create two regressors: PCR and PLS, and for our illustration purposes\nwe set the number of components to 1. Before feeding the data to the PCA step\nof PCR, we first standardize it, as recommended by good practice. The PLS\nestimator has built-in scaling capabilities.\n\nFor both models, we plot the projected data onto the first component against\nthe target. 
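Conceptually, the PCR pipeline built in the next cell chains three steps; as a rough,\nhand-rolled sketch (for illustration only, reusing the `X` and `y` defined above;\nvariable names here are ad hoc), the same data flow could be written as:\n\n```python\nfrom sklearn.decomposition import PCA\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn.preprocessing import StandardScaler\n\n# Illustration only: roughly what the PCR pipeline does internally.\nX_scaled = StandardScaler().fit_transform(X)  # standardize both features\nX_scores = PCA(n_components=1).fit_transform(X_scaled)  # keep only the first component\nreg = LinearRegression().fit(X_scores, y)  # regress the target on the 1-d scores\n```\n\nThe pipeline below performs exactly these steps, while PLS computes an analogous\n(but supervised) one-dimensional projection. 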
In both cases, this projected data is what the regressors will\nuse as training data.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.decomposition import PCA\nfrom sklearn.cross_decomposition import PLSRegression\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, random_state=rng)\n\npcr = make_pipeline(StandardScaler(), PCA(n_components=1), LinearRegression())\npcr.fit(X_train, y_train)\npca = pcr.named_steps[\"pca\"] # retrieve the PCA step of the pipeline\n\npls = PLSRegression(n_components=1)\npls.fit(X_train, y_train)\n\nfig, axes = plt.subplots(1, 2, figsize=(10, 3))\naxes[0].scatter(pca.transform(X_test), y_test, alpha=0.3, label=\"ground truth\")\naxes[0].scatter(\n pca.transform(X_test), pcr.predict(X_test), alpha=0.3, label=\"predictions\"\n)\naxes[0].set(\n xlabel=\"Projected data onto first PCA component\", ylabel=\"y\", title=\"PCR / PCA\"\n)\naxes[0].legend()\naxes[1].scatter(pls.transform(X_test), y_test, alpha=0.3, label=\"ground truth\")\naxes[1].scatter(\n pls.transform(X_test), pls.predict(X_test), alpha=0.3, label=\"predictions\"\n)\naxes[1].set(xlabel=\"Projected data onto first PLS component\", ylabel=\"y\", title=\"PLS\")\naxes[1].legend()\nplt.tight_layout()\nplt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, the unsupervised PCA transformation of PCR has dropped the\nsecond component, i.e. the direction with the lowest variance, despite\nit being the most predictive direction. This is because PCA is a completely\nunsupervised transformation, and results in the projected data having a low\npredictive power on the target.\n\nOn the other hand, the PLS regressor manages to capture the effect of the\ndirection with the lowest variance, thanks to its use of target information\nduring the transformation: it can recognize that this direction is actually\nthe most predictive. We note that the first PLS component is negatively\ncorrelated with the target, which comes from the fact that the signs of\neigenvectors are arbitrary.\n\nWe also print the R-squared scores of both estimators, which further confirms\nthat PLS is a better alternative than PCR in this case. 
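As a reminder, the `score` method used here returns the coefficient of determination\n\n$$R^2 = 1 - \\frac{\\sum_i (y_i - \\hat{y}_i)^2}{\\sum_i (y_i - \\bar{y})^2},$$\n\nwhere $\\bar{y}$ is the mean of the test targets, so a score of zero corresponds to\nalways predicting that mean. 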
A negative R-squared\nindicates that PCR performs worse than a regressor that would simply predict\nthe mean of the target.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(f\"PCR r-squared {pcr.score(X_test, y_test):.3f}\")\nprint(f\"PLS r-squared {pls.score(X_test, y_test):.3f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a final remark, we note that PCR with 2 components performs as well as\nPLS: this is because, in this case, PCR was able to leverage the second\ncomponent, which has the most predictive power on the target.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "pca_2 = make_pipeline(PCA(n_components=2), LinearRegression())\npca_2.fit(X_train, y_train)\nprint(f\"PCR r-squared with 2 components {pca_2.score(X_test, y_test):.3f}\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 0 }