{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Problem\n",
    "\n",
    "Is to identify products at risk of backorder before the event occurs so that business has time to react. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is a Backorder?\n",
    "Backorders are products that are temporarily out of stock, but a customer is permitted to place an order against future inventory. \n",
    "A backorder generally indicates that customer demand for a product or service exceeds a company’s capacity to supply it. Back orders are both good and bad. Strong demand can drive back orders, but so can suboptimal planning. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data description\n",
    "\n",
    "Data file contains the historical data for the 8 weeks prior to the week we are trying to predict. The data was taken as weekly snapshots at the start of each week. Columns are defined as follows:\n",
    "\n",
    "    sku - Random ID for the product\n",
    "\n",
    "    national_inv - Current inventory level for the part\n",
    "\n",
    "    lead_time - Transit time for product (if available)\n",
    "\n",
    "    in_transit_qty - Amount of product in transit from source\n",
    "\n",
    "    forecast_3_month - Forecast sales for the next 3 months\n",
    "\n",
    "    forecast_6_month - Forecast sales for the next 6 months\n",
    "\n",
    "    forecast_9_month - Forecast sales for the next 9 months\n",
    "\n",
    "    sales_1_month - Sales quantity for the prior 1 month time period\n",
    "\n",
    "    sales_3_month - Sales quantity for the prior 3 month time period\n",
    "\n",
    "    sales_6_month - Sales quantity for the prior 6 month time period\n",
    "\n",
    "    sales_9_month - Sales quantity for the prior 9 month time period\n",
    "\n",
    "    min_bank - Minimum recommend amount to stock\n",
    "\n",
    "    potential_issue - Source issue for part identified\n",
    "\n",
    "    pieces_past_due - Parts overdue from source\n",
    "\n",
    "    perf_6_month_avg - Source performance for prior 6 month period\n",
    "\n",
    "    perf_12_month_avg - Source performance for prior 12 month period\n",
    "\n",
    "    local_bo_qty - Amount of stock orders overdue\n",
    "\n",
    "    deck_risk - Part risk flag\n",
    "\n",
    "    oe_constraint - Part risk flag\n",
    "\n",
    "    ppap_risk - Part risk flag\n",
    "\n",
    "    stop_auto_buy - Part risk flag\n",
    "\n",
    "    rev_stop - Part risk flag\n",
    "\n",
    "    went_on_backorder - Product actually went on backorder. This is the target value.\n",
    "    \n",
    "         Yes or 1 : Product backordered\n",
    "\n",
    "         No or 0  : Product not backordered"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading the required libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
    "\n",
    "from sklearn.impute import SimpleImputer\n",
    "\n",
    "from sklearn.svm import SVC\n",
    "\n",
    "from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score, f1_score\n",
    "\n",
    "from sklearn.model_selection import GridSearchCV"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "df=pd.read_csv(\"BackOrders.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>sku</th>\n",
       "      <th>national_inv</th>\n",
       "      <th>lead_time</th>\n",
       "      <th>in_transit_qty</th>\n",
       "      <th>forecast_3_month</th>\n",
       "      <th>forecast_6_month</th>\n",
       "      <th>forecast_9_month</th>\n",
       "      <th>sales_1_month</th>\n",
       "      <th>sales_3_month</th>\n",
       "      <th>sales_6_month</th>\n",
       "      <th>...</th>\n",
       "      <th>pieces_past_due</th>\n",
       "      <th>perf_6_month_avg</th>\n",
       "      <th>perf_12_month_avg</th>\n",
       "      <th>local_bo_qty</th>\n",
       "      <th>deck_risk</th>\n",
       "      <th>oe_constraint</th>\n",
       "      <th>ppap_risk</th>\n",
       "      <th>stop_auto_buy</th>\n",
       "      <th>rev_stop</th>\n",
       "      <th>went_on_backorder</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1888279</td>\n",
       "      <td>117</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>15</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>-99.00</td>\n",
       "      <td>-99.00</td>\n",
       "      <td>0</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1870557</td>\n",
       "      <td>7</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.50</td>\n",
       "      <td>0.28</td>\n",
       "      <td>0</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1475481</td>\n",
       "      <td>258</td>\n",
       "      <td>15.0</td>\n",
       "      <td>10</td>\n",
       "      <td>10</td>\n",
       "      <td>77</td>\n",
       "      <td>184</td>\n",
       "      <td>46</td>\n",
       "      <td>132</td>\n",
       "      <td>256</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.54</td>\n",
       "      <td>0.70</td>\n",
       "      <td>0</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1758220</td>\n",
       "      <td>46</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.75</td>\n",
       "      <td>0.90</td>\n",
       "      <td>0</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1360312</td>\n",
       "      <td>2</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>10</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.97</td>\n",
       "      <td>0.92</td>\n",
       "      <td>0</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 23 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       sku  national_inv  lead_time  in_transit_qty  forecast_3_month  \\\n",
       "0  1888279           117        NaN               0                 0   \n",
       "1  1870557             7        2.0               0                 0   \n",
       "2  1475481           258       15.0              10                10   \n",
       "3  1758220            46        2.0               0                 0   \n",
       "4  1360312             2        2.0               0                 4   \n",
       "\n",
       "   forecast_6_month  forecast_9_month  sales_1_month  sales_3_month  \\\n",
       "0                 0                 0              0              0   \n",
       "1                 0                 0              0              0   \n",
       "2                77               184             46            132   \n",
       "3                 0                 0              1              2   \n",
       "4                 6                10              2              2   \n",
       "\n",
       "   sales_6_month  ...  pieces_past_due  perf_6_month_avg perf_12_month_avg  \\\n",
       "0             15  ...                0            -99.00            -99.00   \n",
       "1              0  ...                0              0.50              0.28   \n",
       "2            256  ...                0              0.54              0.70   \n",
       "3              6  ...                0              0.75              0.90   \n",
       "4              5  ...                0              0.97              0.92   \n",
       "\n",
       "   local_bo_qty  deck_risk  oe_constraint  ppap_risk stop_auto_buy rev_stop  \\\n",
       "0             0         No             No        Yes           Yes       No   \n",
       "1             0        Yes             No         No           Yes       No   \n",
       "2             0         No             No         No           Yes       No   \n",
       "3             0        Yes             No         No           Yes       No   \n",
       "4             0         No             No         No           Yes       No   \n",
       "\n",
       "  went_on_backorder  \n",
       "0                No  \n",
       "1                No  \n",
       "2                No  \n",
       "3                No  \n",
       "4                No  \n",
       "\n",
       "[5 rows x 23 columns]"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>sku</th>\n",
       "      <th>national_inv</th>\n",
       "      <th>lead_time</th>\n",
       "      <th>in_transit_qty</th>\n",
       "      <th>forecast_3_month</th>\n",
       "      <th>forecast_6_month</th>\n",
       "      <th>forecast_9_month</th>\n",
       "      <th>sales_1_month</th>\n",
       "      <th>sales_3_month</th>\n",
       "      <th>sales_6_month</th>\n",
       "      <th>...</th>\n",
       "      <th>pieces_past_due</th>\n",
       "      <th>perf_6_month_avg</th>\n",
       "      <th>perf_12_month_avg</th>\n",
       "      <th>local_bo_qty</th>\n",
       "      <th>deck_risk</th>\n",
       "      <th>oe_constraint</th>\n",
       "      <th>ppap_risk</th>\n",
       "      <th>stop_auto_buy</th>\n",
       "      <th>rev_stop</th>\n",
       "      <th>went_on_backorder</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>61584</th>\n",
       "      <td>1397275</td>\n",
       "      <td>6</td>\n",
       "      <td>8.0</td>\n",
       "      <td>0</td>\n",
       "      <td>24</td>\n",
       "      <td>24</td>\n",
       "      <td>24</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "      <td>9</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.98</td>\n",
       "      <td>0.98</td>\n",
       "      <td>0</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>61585</th>\n",
       "      <td>3072139</td>\n",
       "      <td>130</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0</td>\n",
       "      <td>40</td>\n",
       "      <td>80</td>\n",
       "      <td>140</td>\n",
       "      <td>18</td>\n",
       "      <td>108</td>\n",
       "      <td>230</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.51</td>\n",
       "      <td>0.28</td>\n",
       "      <td>0</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>61586</th>\n",
       "      <td>1909363</td>\n",
       "      <td>135</td>\n",
       "      <td>9.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>10</td>\n",
       "      <td>40</td>\n",
       "      <td>65</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1.00</td>\n",
       "      <td>0.99</td>\n",
       "      <td>0</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>61587</th>\n",
       "      <td>1845783</td>\n",
       "      <td>63</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>452</td>\n",
       "      <td>1715</td>\n",
       "      <td>3425</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>-99.00</td>\n",
       "      <td>-99.00</td>\n",
       "      <td>1</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>61588</th>\n",
       "      <td>1200539</td>\n",
       "      <td>0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "      <td>8</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0.79</td>\n",
       "      <td>0.78</td>\n",
       "      <td>0</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 23 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           sku  national_inv  lead_time  in_transit_qty  forecast_3_month  \\\n",
       "61584  1397275             6        8.0               0                24   \n",
       "61585  3072139           130        2.0               0                40   \n",
       "61586  1909363           135        9.0               0                 0   \n",
       "61587  1845783            63        NaN               0                 0   \n",
       "61588  1200539             0        2.0               0                 8   \n",
       "\n",
       "       forecast_6_month  forecast_9_month  sales_1_month  sales_3_month  \\\n",
       "61584                24                24              0              7   \n",
       "61585                80               140             18            108   \n",
       "61586                 0                 0             10             40   \n",
       "61587                 0                 0            452           1715   \n",
       "61588                 8                 8              0              1   \n",
       "\n",
       "       sales_6_month  ...  pieces_past_due  perf_6_month_avg  \\\n",
       "61584              9  ...                0              0.98   \n",
       "61585            230  ...                0              0.51   \n",
       "61586             65  ...                0              1.00   \n",
       "61587           3425  ...                0            -99.00   \n",
       "61588              1  ...                0              0.79   \n",
       "\n",
       "      perf_12_month_avg  local_bo_qty  deck_risk  oe_constraint  ppap_risk  \\\n",
       "61584              0.98             0         No             No         No   \n",
       "61585              0.28             0         No             No         No   \n",
       "61586              0.99             0         No             No        Yes   \n",
       "61587            -99.00             1         No             No         No   \n",
       "61588              0.78             0        Yes             No         No   \n",
       "\n",
       "      stop_auto_buy rev_stop went_on_backorder  \n",
       "61584           Yes       No                No  \n",
       "61585           Yes       No                No  \n",
       "61586           Yes       No                No  \n",
       "61587            No       No               Yes  \n",
       "61588           Yes       No               Yes  \n",
       "\n",
       "[5 rows x 23 columns]"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(61589, 23)"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sku                    int64\n",
       "national_inv           int64\n",
       "lead_time            float64\n",
       "in_transit_qty         int64\n",
       "forecast_3_month       int64\n",
       "forecast_6_month       int64\n",
       "forecast_9_month       int64\n",
       "sales_1_month          int64\n",
       "sales_3_month          int64\n",
       "sales_6_month          int64\n",
       "sales_9_month          int64\n",
       "min_bank               int64\n",
       "potential_issue       object\n",
       "pieces_past_due        int64\n",
       "perf_6_month_avg     float64\n",
       "perf_12_month_avg    float64\n",
       "local_bo_qty           int64\n",
       "deck_risk             object\n",
       "oe_constraint         object\n",
       "ppap_risk             object\n",
       "stop_auto_buy         object\n",
       "rev_stop              object\n",
       "went_on_backorder     object\n",
       "dtype: object"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>sku</th>\n",
       "      <th>national_inv</th>\n",
       "      <th>lead_time</th>\n",
       "      <th>in_transit_qty</th>\n",
       "      <th>forecast_3_month</th>\n",
       "      <th>forecast_6_month</th>\n",
       "      <th>forecast_9_month</th>\n",
       "      <th>sales_1_month</th>\n",
       "      <th>sales_3_month</th>\n",
       "      <th>sales_6_month</th>\n",
       "      <th>sales_9_month</th>\n",
       "      <th>min_bank</th>\n",
       "      <th>pieces_past_due</th>\n",
       "      <th>perf_6_month_avg</th>\n",
       "      <th>perf_12_month_avg</th>\n",
       "      <th>local_bo_qty</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>6.158900e+04</td>\n",
       "      <td>61589.000000</td>\n",
       "      <td>58186.000000</td>\n",
       "      <td>61589.000000</td>\n",
       "      <td>6.158900e+04</td>\n",
       "      <td>6.158900e+04</td>\n",
       "      <td>6.158900e+04</td>\n",
       "      <td>61589.000000</td>\n",
       "      <td>61589.000000</td>\n",
       "      <td>6.158900e+04</td>\n",
       "      <td>6.158900e+04</td>\n",
       "      <td>61589.000000</td>\n",
       "      <td>61589.000000</td>\n",
       "      <td>61589.000000</td>\n",
       "      <td>61589.000000</td>\n",
       "      <td>61589.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>2.037188e+06</td>\n",
       "      <td>287.721882</td>\n",
       "      <td>7.559619</td>\n",
       "      <td>30.192843</td>\n",
       "      <td>1.692728e+02</td>\n",
       "      <td>3.150413e+02</td>\n",
       "      <td>4.535760e+02</td>\n",
       "      <td>44.742957</td>\n",
       "      <td>150.732631</td>\n",
       "      <td>2.835465e+02</td>\n",
       "      <td>4.196427e+02</td>\n",
       "      <td>43.087256</td>\n",
       "      <td>1.605400</td>\n",
       "      <td>-6.264182</td>\n",
       "      <td>-5.863664</td>\n",
       "      <td>1.205361</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>6.564178e+05</td>\n",
       "      <td>4233.906931</td>\n",
       "      <td>6.498952</td>\n",
       "      <td>792.869253</td>\n",
       "      <td>5.286742e+03</td>\n",
       "      <td>9.774362e+03</td>\n",
       "      <td>1.420201e+04</td>\n",
       "      <td>1373.805831</td>\n",
       "      <td>5224.959649</td>\n",
       "      <td>8.872270e+03</td>\n",
       "      <td>1.269858e+04</td>\n",
       "      <td>959.614135</td>\n",
       "      <td>42.309229</td>\n",
       "      <td>25.537906</td>\n",
       "      <td>24.844514</td>\n",
       "      <td>29.981155</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>1.068628e+06</td>\n",
       "      <td>-2999.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>-99.000000</td>\n",
       "      <td>-99.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>1.498574e+06</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.620000</td>\n",
       "      <td>0.640000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>1.898033e+06</td>\n",
       "      <td>10.000000</td>\n",
       "      <td>8.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>4.000000e+00</td>\n",
       "      <td>6.000000e+00</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.820000</td>\n",
       "      <td>0.800000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>2.314826e+06</td>\n",
       "      <td>57.000000</td>\n",
       "      <td>8.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.200000e+01</td>\n",
       "      <td>2.500000e+01</td>\n",
       "      <td>3.600000e+01</td>\n",
       "      <td>6.000000</td>\n",
       "      <td>17.000000</td>\n",
       "      <td>3.400000e+01</td>\n",
       "      <td>5.100000e+01</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.960000</td>\n",
       "      <td>0.950000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>3.284895e+06</td>\n",
       "      <td>673445.000000</td>\n",
       "      <td>52.000000</td>\n",
       "      <td>170976.000000</td>\n",
       "      <td>1.126656e+06</td>\n",
       "      <td>2.094336e+06</td>\n",
       "      <td>3.062016e+06</td>\n",
       "      <td>295197.000000</td>\n",
       "      <td>934593.000000</td>\n",
       "      <td>1.799099e+06</td>\n",
       "      <td>2.631590e+06</td>\n",
       "      <td>192978.000000</td>\n",
       "      <td>7392.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2999.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                sku   national_inv     lead_time  in_transit_qty  \\\n",
       "count  6.158900e+04   61589.000000  58186.000000    61589.000000   \n",
       "mean   2.037188e+06     287.721882      7.559619       30.192843   \n",
       "std    6.564178e+05    4233.906931      6.498952      792.869253   \n",
       "min    1.068628e+06   -2999.000000      0.000000        0.000000   \n",
       "25%    1.498574e+06       3.000000      4.000000        0.000000   \n",
       "50%    1.898033e+06      10.000000      8.000000        0.000000   \n",
       "75%    2.314826e+06      57.000000      8.000000        0.000000   \n",
       "max    3.284895e+06  673445.000000     52.000000   170976.000000   \n",
       "\n",
       "       forecast_3_month  forecast_6_month  forecast_9_month  sales_1_month  \\\n",
       "count      6.158900e+04      6.158900e+04      6.158900e+04   61589.000000   \n",
       "mean       1.692728e+02      3.150413e+02      4.535760e+02      44.742957   \n",
       "std        5.286742e+03      9.774362e+03      1.420201e+04    1373.805831   \n",
       "min        0.000000e+00      0.000000e+00      0.000000e+00       0.000000   \n",
       "25%        0.000000e+00      0.000000e+00      0.000000e+00       0.000000   \n",
       "50%        0.000000e+00      0.000000e+00      0.000000e+00       0.000000   \n",
       "75%        1.200000e+01      2.500000e+01      3.600000e+01       6.000000   \n",
       "max        1.126656e+06      2.094336e+06      3.062016e+06  295197.000000   \n",
       "\n",
       "       sales_3_month  sales_6_month  sales_9_month       min_bank  \\\n",
       "count   61589.000000   6.158900e+04   6.158900e+04   61589.000000   \n",
       "mean      150.732631   2.835465e+02   4.196427e+02      43.087256   \n",
       "std      5224.959649   8.872270e+03   1.269858e+04     959.614135   \n",
       "min         0.000000   0.000000e+00   0.000000e+00       0.000000   \n",
       "25%         0.000000   0.000000e+00   0.000000e+00       0.000000   \n",
       "50%         2.000000   4.000000e+00   6.000000e+00       0.000000   \n",
       "75%        17.000000   3.400000e+01   5.100000e+01       3.000000   \n",
       "max    934593.000000   1.799099e+06   2.631590e+06  192978.000000   \n",
       "\n",
       "       pieces_past_due  perf_6_month_avg  perf_12_month_avg  local_bo_qty  \n",
       "count     61589.000000      61589.000000       61589.000000  61589.000000  \n",
       "mean          1.605400         -6.264182          -5.863664      1.205361  \n",
       "std          42.309229         25.537906          24.844514     29.981155  \n",
       "min           0.000000        -99.000000         -99.000000      0.000000  \n",
       "25%           0.000000          0.620000           0.640000      0.000000  \n",
       "50%           0.000000          0.820000           0.800000      0.000000  \n",
       "75%           0.000000          0.960000           0.950000      0.000000  \n",
       "max        7392.000000          1.000000           1.000000   2999.000000  "
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 61589 entries, 0 to 61588\n",
      "Data columns (total 23 columns):\n",
      " #   Column             Non-Null Count  Dtype  \n",
      "---  ------             --------------  -----  \n",
      " 0   sku                61589 non-null  int64  \n",
      " 1   national_inv       61589 non-null  int64  \n",
      " 2   lead_time          58186 non-null  float64\n",
      " 3   in_transit_qty     61589 non-null  int64  \n",
      " 4   forecast_3_month   61589 non-null  int64  \n",
      " 5   forecast_6_month   61589 non-null  int64  \n",
      " 6   forecast_9_month   61589 non-null  int64  \n",
      " 7   sales_1_month      61589 non-null  int64  \n",
      " 8   sales_3_month      61589 non-null  int64  \n",
      " 9   sales_6_month      61589 non-null  int64  \n",
      " 10  sales_9_month      61589 non-null  int64  \n",
      " 11  min_bank           61589 non-null  int64  \n",
      " 12  potential_issue    61589 non-null  object \n",
      " 13  pieces_past_due    61589 non-null  int64  \n",
      " 14  perf_6_month_avg   61589 non-null  float64\n",
      " 15  perf_12_month_avg  61589 non-null  float64\n",
      " 16  local_bo_qty       61589 non-null  int64  \n",
      " 17  deck_risk          61589 non-null  object \n",
      " 18  oe_constraint      61589 non-null  object \n",
      " 19  ppap_risk          61589 non-null  object \n",
      " 20  stop_auto_buy      61589 non-null  object \n",
      " 21  rev_stop           61589 non-null  object \n",
      " 22  went_on_backorder  61589 non-null  object \n",
      "dtypes: float64(3), int64(13), object(7)\n",
      "memory usage: 10.8+ MB\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sku                  61589\n",
       "national_inv          2916\n",
       "lead_time               28\n",
       "in_transit_qty         908\n",
       "forecast_3_month      1623\n",
       "forecast_6_month      2195\n",
       "forecast_9_month      2664\n",
       "sales_1_month         1092\n",
       "sales_3_month         1928\n",
       "sales_6_month         2679\n",
       "sales_9_month         3220\n",
       "min_bank              1098\n",
       "potential_issue          2\n",
       "pieces_past_due        190\n",
       "perf_6_month_avg       102\n",
       "perf_12_month_avg      102\n",
       "local_bo_qty           201\n",
       "deck_risk                2\n",
       "oe_constraint            2\n",
       "ppap_risk                2\n",
       "stop_auto_buy            2\n",
       "rev_stop                 2\n",
       "went_on_backorder        2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.nunique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sku                     0\n",
       "national_inv            0\n",
       "lead_time            3403\n",
       "in_transit_qty          0\n",
       "forecast_3_month        0\n",
       "forecast_6_month        0\n",
       "forecast_9_month        0\n",
       "sales_1_month           0\n",
       "sales_3_month           0\n",
       "sales_6_month           0\n",
       "sales_9_month           0\n",
       "min_bank                0\n",
       "potential_issue         0\n",
       "pieces_past_due         0\n",
       "perf_6_month_avg        0\n",
       "perf_12_month_avg       0\n",
       "local_bo_qty            0\n",
       "deck_risk               0\n",
       "oe_constraint           0\n",
       "ppap_risk               0\n",
       "stop_auto_buy           0\n",
       "rev_stop                0\n",
       "went_on_backorder       0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.isna().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "No     50296\n",
       "Yes    11293\n",
       "Name: went_on_backorder, dtype: int64"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.went_on_backorder .value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "data=df.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "newdf=df.dropna()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(58186, 23)"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "newdf.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sku                  0\n",
       "national_inv         0\n",
       "lead_time            0\n",
       "in_transit_qty       0\n",
       "forecast_3_month     0\n",
       "forecast_6_month     0\n",
       "forecast_9_month     0\n",
       "sales_1_month        0\n",
       "sales_3_month        0\n",
       "sales_6_month        0\n",
       "sales_9_month        0\n",
       "min_bank             0\n",
       "potential_issue      0\n",
       "pieces_past_due      0\n",
       "perf_6_month_avg     0\n",
       "perf_12_month_avg    0\n",
       "local_bo_qty         0\n",
       "deck_risk            0\n",
       "oe_constraint        0\n",
       "ppap_risk            0\n",
       "stop_auto_buy        0\n",
       "rev_stop             0\n",
       "went_on_backorder    0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "newdf.isna().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sku                    int64\n",
       "national_inv           int64\n",
       "lead_time            float64\n",
       "in_transit_qty         int64\n",
       "forecast_3_month       int64\n",
       "forecast_6_month       int64\n",
       "forecast_9_month       int64\n",
       "sales_1_month          int64\n",
       "sales_3_month          int64\n",
       "sales_6_month          int64\n",
       "sales_9_month          int64\n",
       "min_bank               int64\n",
       "potential_issue       object\n",
       "pieces_past_due        int64\n",
       "perf_6_month_avg     float64\n",
       "perf_12_month_avg    float64\n",
       "local_bo_qty           int64\n",
       "deck_risk             object\n",
       "oe_constraint         object\n",
       "ppap_risk             object\n",
       "stop_auto_buy         object\n",
       "rev_stop              object\n",
       "went_on_backorder     object\n",
       "dtype: object"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "newdf.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_cols=[\"national_inv\",\"lead_time\",\"in_transit_qty\",\"forecast_3_month\",\"forecast_6_month\",\"forecast_9_month\",\"sales_1_month\"\n",
    "         ,\"sales_3_month\",\"sales_6_month\",\"sales_9_month\",\"min_bank\",\"local_bo_qty\"]\n",
    "cat_cols=[\"sku\",\"potential_issue\",\"deck_risk\",\"oe_constraint\",\"ppap_risk\",\"stop_auto_buy\",\"rev_stop\",\"went_on_backorder\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "newdf[cat_cols] = newdf[cat_cols].astype('category')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sku                  category\n",
       "national_inv            int64\n",
       "lead_time             float64\n",
       "in_transit_qty          int64\n",
       "forecast_3_month        int64\n",
       "forecast_6_month        int64\n",
       "forecast_9_month        int64\n",
       "sales_1_month           int64\n",
       "sales_3_month           int64\n",
       "sales_6_month           int64\n",
       "sales_9_month           int64\n",
       "min_bank                int64\n",
       "potential_issue      category\n",
       "pieces_past_due         int64\n",
       "perf_6_month_avg      float64\n",
       "perf_12_month_avg     float64\n",
       "local_bo_qty            int64\n",
       "deck_risk            category\n",
       "oe_constraint        category\n",
       "ppap_risk            category\n",
       "stop_auto_buy        category\n",
       "rev_stop             category\n",
       "went_on_backorder    category\n",
       "dtype: object"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "newdf.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "newdf.drop([\"sku\"],axis=1,inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "national_inv            int64\n",
       "lead_time             float64\n",
       "in_transit_qty          int64\n",
       "forecast_3_month        int64\n",
       "forecast_6_month        int64\n",
       "forecast_9_month        int64\n",
       "sales_1_month           int64\n",
       "sales_3_month           int64\n",
       "sales_6_month           int64\n",
       "sales_9_month           int64\n",
       "min_bank                int64\n",
       "potential_issue      category\n",
       "pieces_past_due         int64\n",
       "perf_6_month_avg      float64\n",
       "perf_12_month_avg     float64\n",
       "local_bo_qty            int64\n",
       "deck_risk            category\n",
       "oe_constraint        category\n",
       "ppap_risk            category\n",
       "stop_auto_buy        category\n",
       "rev_stop             category\n",
       "went_on_backorder    category\n",
       "dtype: object"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "newdf.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "X= newdf.drop([\"went_on_backorder\"], axis = 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [],
   "source": [
    "y=newdf[\"went_on_backorder\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(58186, 21) (58186,)\n"
     ]
    }
   ],
   "source": [
    "print(X.shape, y.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 123, stratify=y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(40730, 21)\n",
      "(17456, 21)\n",
      "(40730,)\n",
      "(17456,)\n"
     ]
    }
   ],
   "source": [
    "print(X_train.shape)\n",
    "print(X_test.shape)\n",
    "print(y_train.shape)\n",
    "print(y_test.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "No     0.81149\n",
       "Yes    0.18851\n",
       "Name: went_on_backorder, dtype: float64"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_train.value_counts(True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "No     0.811469\n",
       "Yes    0.188531\n",
       "Name: went_on_backorder, dtype: float64"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_test.value_counts(True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import LabelEncoder\n",
    "le=LabelEncoder()\n",
    "le.fit(y_train)\n",
    "y_train=le.transform(y_train)\n",
    "y_test=le.transform(y_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    81.14903\n",
       "1    18.85097\n",
       "dtype: float64"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.value_counts(y_train)/y_train.size*100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [],
   "source": [
    "cat_attr=X_train.select_dtypes(include=[\"category\"]).columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import OneHotEncoder"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [],
   "source": [
    "enc= OneHotEncoder(drop=\"first\")\n",
    "enc.fit(X_train[cat_attr])\n",
    "\n",
    "X_train_ohe=enc.transform(X_train[cat_attr]).toarray()\n",
    "\n",
    "X_test_ohe=enc.transform(X_test[cat_attr]).toarray()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "OneHotEncoder(drop='first')"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "enc.fit(X_test[cat_attr])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [],
   "source": [
    "## standardzing\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "StandardScaler()"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "scaler = StandardScaler()\n",
    "scaler.fit(X_train[num_cols])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train_std = scaler.transform(X_train[num_cols])\n",
    "X_test_std = scaler.transform(X_test[num_cols])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(40730, 12)\n",
      "(17456, 12)\n"
     ]
    }
   ],
   "source": [
    "print(X_train_std.shape)\n",
    "print(X_test_std.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train_con = np.concatenate([X_train_std, X_train_ohe], axis=1)\n",
    "X_test_con = np.concatenate([X_test_std, X_test_ohe], axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(40730, 18)\n",
      "(17456, 18)\n"
     ]
    }
   ],
   "source": [
    "print(X_train_con.shape)\n",
    "print(X_test_con.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.ensemble import RandomForestClassifier\n",
    "clf1=RandomForestClassifier()\n",
    "\n",
    "clf1.fit(X_train_con,y_train)\n",
    "\n",
    "train_preds=clf1.predict(X_train_con)\n",
    "test_preds=clf1.predict(X_test_con)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score, f1_score\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [],
   "source": [
    "def evaluate_model(act, pred):\n",
    "    print(\"Confusion Matrix \\n\", confusion_matrix(act, pred))\n",
    "    print(\"Accurcay : \", accuracy_score(act, pred))\n",
    "    print(\"Recall   : \", recall_score(act, pred))\n",
    "    print(\"Precision: \", precision_score(act, pred))\n",
    "    print(\"F1_score : \", f1_score(act, pred))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "---train---\n",
      "Confusion Matrix \n",
      " [[32907   145]\n",
      " [  233  7445]]\n",
      "Accurcay :  0.9907193714706605\n",
      "Recall   :  0.969653555613441\n",
      "Precision:  0.9808959156785244\n",
      "F1_score :  0.9752423369138066\n",
      "---train---\n",
      "Confusion Matrix \n",
      " [[13529   636]\n",
      " [  699  2592]]\n",
      "Accurcay :  0.9235219981668195\n",
      "Recall   :  0.7876025524156791\n",
      "Precision:  0.8029739776951673\n",
      "F1_score :  0.7952139898757479\n"
     ]
    }
   ],
   "source": [
    "print(\"---train---\")\n",
    "evaluate_model(y_train,train_preds)\n",
    "    \n",
    "print(\"---train---\")\n",
    "evaluate_model(y_test,test_preds)   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [],
   "source": [
    "from imblearn.over_sampling import SMOTE\n",
    "smote=SMOTE(random_state=123)\n",
    "X_train_sm,y_train_sm=smote.fit_resample(X_train_con,y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {},
   "outputs": [],
   "source": [
    "clf2=RandomForestClassifier()\n",
    "clf2.fit(X_train_sm,y_train_sm)\n",
    "\n",
    "train_pred_sm=clf2.predict(X_train_sm)\n",
    "test_pred_sm=clf2.predict(X_test_con)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "---train---\n",
      "Confusion Matrix \n",
      " [[32765   287]\n",
      " [  282 32770]]\n",
      "Accurcay :  0.991392351446206\n",
      "Recall   :  0.9914679898342007\n",
      "Precision:  0.9913180264391808\n",
      "F1_score :  0.9913930024656249\n",
      "---train---\n",
      "Confusion Matrix \n",
      " [[13241   924]\n",
      " [  501  2790]]\n",
      "Accurcay :  0.9183661778185152\n",
      "Recall   :  0.8477666362807658\n",
      "Precision:  0.7512116316639742\n",
      "F1_score :  0.7965738758029979\n"
     ]
    }
   ],
   "source": [
    "print(\"---train---\")\n",
    "evaluate_model(y_train_sm,train_pred_sm)\n",
    "    \n",
    "print(\"---train---\")\n",
    "evaluate_model(y_test,test_pred_sm) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [],
   "source": [
    "param_grid={\"n_estimators\":[50,100],\n",
    "           \"max_depth\":[1,5],\n",
    "            \"max_features\":[3,5],\n",
    "            \"min_samples_leaf\":[1,2,3]\n",
    "           }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "metadata": {},
   "outputs": [],
   "source": [
    "clf3=RandomForestClassifier()\n",
    "from sklearn.model_selection import GridSearchCV\n",
    "clf_grid=GridSearchCV(clf3,param_grid,cv=2)\n",
    "\n",
    "clf_grid.fit(X_train_sm,y_train_sm)\n",
    "\n",
    "train_pred_gs=clf_grid.predict(X_train_sm)\n",
    "test_pred_gs=clf_grid.predict(X_test_con)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "---train---\n",
      "Confusion Matrix \n",
      " [[27859  5193]\n",
      " [ 3482 29570]]\n",
      "Accurcay :  0.8687673968292388\n",
      "Recall   :  0.8946508532010166\n",
      "Precision:  0.8506170353536806\n",
      "F1_score :  0.8720784487207845\n",
      "---train---\n",
      "Confusion Matrix \n",
      " [[11906  2259]\n",
      " [  528  2763]]\n",
      "Accurcay :  0.8403414298808433\n",
      "Recall   :  0.8395624430264357\n",
      "Precision:  0.5501792114695341\n",
      "F1_score :  0.664741970407795\n"
     ]
    }
   ],
   "source": [
    "print(\"---train---\")\n",
    "evaluate_model(y_train_sm,train_pred_gs)\n",
    "    \n",
    "print(\"---train---\")\n",
    "evaluate_model(y_test,test_pred_gs) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataframe={\n",
    "    \n",
    "    \"Accuracy\":[0.9906702676160078,0.99140,0.86],\n",
    "    \"Recall\" :[0.9716071893722323,0.9915285005445964,0.8930775747307274],\n",
    "    \"precission\":[0.9912885662431942,0.9912885662431942,0.8531953637598636],\n",
    "   \"f1_score\":[0.991408518877057, 0.991408518877057, 0.8726810448048012]\n",
    "    }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [],
   "source": [
    "df0=pd.DataFrame(dataframe)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataframe2={\n",
    "    \"Accuracy\":[0.9237511457378552,0.9177360219981668,0.843263],\n",
    "    \"Recall\":[0.7885141294439381,0.8447280461865694,0.838954],\n",
    "    \"precission\":[0.8034055727554179,0.7503373819163293,0.5558687],\n",
    "   \"f1_score\":[0.7958902008894342,0.7947398513436249,0.668684]\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [],
   "source": [
    "df01=pd.DataFrame(dataframe2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Accuracy</th>\n",
       "      <th>Recall</th>\n",
       "      <th>precission</th>\n",
       "      <th>f1_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.990670</td>\n",
       "      <td>0.971607</td>\n",
       "      <td>0.991289</td>\n",
       "      <td>0.991409</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.991400</td>\n",
       "      <td>0.991529</td>\n",
       "      <td>0.991289</td>\n",
       "      <td>0.991409</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.860000</td>\n",
       "      <td>0.893078</td>\n",
       "      <td>0.853195</td>\n",
       "      <td>0.872681</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.923751</td>\n",
       "      <td>0.788514</td>\n",
       "      <td>0.803406</td>\n",
       "      <td>0.795890</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.917736</td>\n",
       "      <td>0.844728</td>\n",
       "      <td>0.750337</td>\n",
       "      <td>0.794740</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.843263</td>\n",
       "      <td>0.838954</td>\n",
       "      <td>0.555869</td>\n",
       "      <td>0.668684</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Accuracy    Recall  precission  f1_score\n",
       "0  0.990670  0.971607    0.991289  0.991409\n",
       "1  0.991400  0.991529    0.991289  0.991409\n",
       "2  0.860000  0.893078    0.853195  0.872681\n",
       "0  0.923751  0.788514    0.803406  0.795890\n",
       "1  0.917736  0.844728    0.750337  0.794740\n",
       "2  0.843263  0.838954    0.555869  0.668684"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "frames=[df0,df01]\n",
    "result=pd.concat(frames)\n",
    "display(result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                        Accuracy    Recall  precission  f1_score\n",
      "RandomForestClassifier   0.99067  0.971607    0.991289  0.991409\n",
      "smote                    0.99140  0.991529    0.991289  0.991409\n",
      "GridSearchCV             0.86000  0.893078    0.853195  0.872681\n"
     ]
    }
   ],
   "source": [
    "print(pd.DataFrame(dataframe,index=[\"RandomForestClassifier\",\"smote\",\"GridSearchCV\"],))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                        Accuracy    Recall  precission  f1_score\n",
      "RandomForestClassifier  0.923751  0.788514    0.803406  0.795890\n",
      "smote                   0.917736  0.844728    0.750337  0.794740\n",
      "GridSearchCV            0.843263  0.838954    0.555869  0.668684\n"
     ]
    }
   ],
   "source": [
    "print(pd.DataFrame(dataframe2,index=[\"RandomForestClassifier\",\"smote\",\"GridSearchCV\"]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [],
   "source": [
    "pef_columns=[\"model name\",\"training accuracy\",\"train precision\",\"train recall\",\"test accuracy\",\"test precision\",\"test recall\"]\n",
    "performance_comparision=pd.DataFrame(columns=pef_columns)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [],
   "source": [
    "def add_to_perform_compare_df(df,model_name,train_actual,train_predict,test_actual,test_predict):\n",
    "    from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score\n",
    "    \n",
    "    train_accuracy=accuracy_score(train_actual,train_predict)\n",
    "    test_accuracy=accuracy_score(test_actual,test_predict)\n",
    "    \n",
    "    train_recall=recall_score(train_actual,train_predict)\n",
    "    test_recall=recall_score(test_actual,test_predict)\n",
    "    \n",
    "    train_precision=precision_score(train_actual,train_predict)\n",
    "    test_precision=precision_score(test_actual,test_predict)\n",
    "    \n",
    "    df=df.append(pd.Series([model_name,train_accuracy,train_precision,train_recall,test_accuracy,test_precision,test_recall],index=df.columns),\n",
    "                ignore_index=True)\n",
    "    return df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {},
   "outputs": [],
   "source": [
    "performance_comparision=add_to_perform_compare_df(performance_comparision,\"Random Forest\",y_train_sm,train_pred,y_test,test_pred)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [],
   "source": [
    "performance_comparision=add_to_perform_compare_df(performance_comparision,\"Upsampling.SMOTE\",y_train_sm,train_pred_sm,y_test,test_pred_sm)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 132,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>model name</th>\n",
       "      <th>training accuracy</th>\n",
       "      <th>train precision</th>\n",
       "      <th>train recall</th>\n",
       "      <th>test accuracy</th>\n",
       "      <th>test precision</th>\n",
       "      <th>test recall</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>model name</td>\n",
       "      <td>training accuracy</td>\n",
       "      <td>train precision</td>\n",
       "      <td>train recall</td>\n",
       "      <td>test accuracy</td>\n",
       "      <td>test precision</td>\n",
       "      <td>test recall</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>model name</td>\n",
       "      <td>training accuracy</td>\n",
       "      <td>train precision</td>\n",
       "      <td>train recall</td>\n",
       "      <td>test accuracy</td>\n",
       "      <td>test precision</td>\n",
       "      <td>test recall</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Random Forest</td>\n",
       "      <td>0.872988</td>\n",
       "      <td>0.858039</td>\n",
       "      <td>0.893864</td>\n",
       "      <td>0.847273</td>\n",
       "      <td>0.563867</td>\n",
       "      <td>0.838347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Upsampling.SMOTE</td>\n",
       "      <td>0.872988</td>\n",
       "      <td>0.858039</td>\n",
       "      <td>0.893864</td>\n",
       "      <td>0.847273</td>\n",
       "      <td>0.563867</td>\n",
       "      <td>0.838347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Hyper_para_RF_SMOTE</td>\n",
       "      <td>0.872988</td>\n",
       "      <td>0.858039</td>\n",
       "      <td>0.893864</td>\n",
       "      <td>0.847273</td>\n",
       "      <td>0.563867</td>\n",
       "      <td>0.838347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Upsampling.SMOTE</td>\n",
       "      <td>0.872988</td>\n",
       "      <td>0.858039</td>\n",
       "      <td>0.893864</td>\n",
       "      <td>0.847273</td>\n",
       "      <td>0.563867</td>\n",
       "      <td>0.838347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Upsampling.SMOTE</td>\n",
       "      <td>0.991392</td>\n",
       "      <td>0.991318</td>\n",
       "      <td>0.991468</td>\n",
       "      <td>0.918366</td>\n",
       "      <td>0.751212</td>\n",
       "      <td>0.847767</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Hyper_para_RF_SMOTE</td>\n",
       "      <td>0.868767</td>\n",
       "      <td>0.850617</td>\n",
       "      <td>0.894651</td>\n",
       "      <td>0.840341</td>\n",
       "      <td>0.550179</td>\n",
       "      <td>0.839562</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Gradientboost_rf_smote</td>\n",
       "      <td>0.900172</td>\n",
       "      <td>0.888409</td>\n",
       "      <td>0.915315</td>\n",
       "      <td>0.873568</td>\n",
       "      <td>0.620284</td>\n",
       "      <td>0.849286</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               model name  training accuracy  train precision  train recall  \\\n",
       "0              model name  training accuracy  train precision  train recall   \n",
       "1              model name  training accuracy  train precision  train recall   \n",
       "2           Random Forest           0.872988         0.858039      0.893864   \n",
       "3        Upsampling.SMOTE           0.872988         0.858039      0.893864   \n",
       "4     Hyper_para_RF_SMOTE           0.872988         0.858039      0.893864   \n",
       "5        Upsampling.SMOTE           0.872988         0.858039      0.893864   \n",
       "6        Upsampling.SMOTE           0.991392         0.991318      0.991468   \n",
       "7     Hyper_para_RF_SMOTE           0.868767         0.850617      0.894651   \n",
       "8  Gradientboost_rf_smote           0.900172         0.888409      0.915315   \n",
       "\n",
       "   test accuracy  test precision  test recall  \n",
       "0  test accuracy  test precision  test recall  \n",
       "1  test accuracy  test precision  test recall  \n",
       "2       0.847273        0.563867     0.838347  \n",
       "3       0.847273        0.563867     0.838347  \n",
       "4       0.847273        0.563867     0.838347  \n",
       "5       0.847273        0.563867     0.838347  \n",
       "6       0.918366        0.751212     0.847767  \n",
       "7       0.840341        0.550179     0.839562  \n",
       "8       0.873568        0.620284     0.849286  "
      ]
     },
     "execution_count": 132,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "performance_comparision"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "metadata": {},
   "outputs": [],
   "source": [
    "performance_comparision=add_to_perform_compare_df(performance_comparision,\"Hyper_para_RF_SMOTE\",y_train_sm,train_pred_gs,y_test,test_pred_gs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(40730,)"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_train.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(66104,)"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_pred.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "GradientBoostingClassifier()"
      ]
     },
     "execution_count": 129,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.ensemble import GradientBoostingClassifier\n",
    "gbc=GradientBoostingClassifier()\n",
    "gbc.fit(X_train_sm,y_train_sm)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_pred_gbc=gbc.predict(X_train_sm)\n",
    "test_pred_gbc=gbc.predict(X_test_con)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "metadata": {},
   "outputs": [],
   "source": [
    "performance_comparision=add_to_perform_compare_df(performance_comparision,\"Gradientboost_rf_smote\",y_train_sm,train_pred_gbc,y_test,test_pred_gbc)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 137,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>model name</th>\n",
       "      <th>training accuracy</th>\n",
       "      <th>train precision</th>\n",
       "      <th>train recall</th>\n",
       "      <th>test accuracy</th>\n",
       "      <th>test precision</th>\n",
       "      <th>test recall</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>model name</td>\n",
       "      <td>training accuracy</td>\n",
       "      <td>train precision</td>\n",
       "      <td>train recall</td>\n",
       "      <td>test accuracy</td>\n",
       "      <td>test precision</td>\n",
       "      <td>test recall</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>model name</td>\n",
       "      <td>training accuracy</td>\n",
       "      <td>train precision</td>\n",
       "      <td>train recall</td>\n",
       "      <td>test accuracy</td>\n",
       "      <td>test precision</td>\n",
       "      <td>test recall</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Random Forest</td>\n",
       "      <td>0.872988</td>\n",
       "      <td>0.858039</td>\n",
       "      <td>0.893864</td>\n",
       "      <td>0.847273</td>\n",
       "      <td>0.563867</td>\n",
       "      <td>0.838347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Upsampling.SMOTE</td>\n",
       "      <td>0.872988</td>\n",
       "      <td>0.858039</td>\n",
       "      <td>0.893864</td>\n",
       "      <td>0.847273</td>\n",
       "      <td>0.563867</td>\n",
       "      <td>0.838347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Hyper_para_RF_SMOTE</td>\n",
       "      <td>0.872988</td>\n",
       "      <td>0.858039</td>\n",
       "      <td>0.893864</td>\n",
       "      <td>0.847273</td>\n",
       "      <td>0.563867</td>\n",
       "      <td>0.838347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Upsampling.SMOTE</td>\n",
       "      <td>0.872988</td>\n",
       "      <td>0.858039</td>\n",
       "      <td>0.893864</td>\n",
       "      <td>0.847273</td>\n",
       "      <td>0.563867</td>\n",
       "      <td>0.838347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Upsampling.SMOTE</td>\n",
       "      <td>0.991392</td>\n",
       "      <td>0.991318</td>\n",
       "      <td>0.991468</td>\n",
       "      <td>0.918366</td>\n",
       "      <td>0.751212</td>\n",
       "      <td>0.847767</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Hyper_para_RF_SMOTE</td>\n",
       "      <td>0.868767</td>\n",
       "      <td>0.850617</td>\n",
       "      <td>0.894651</td>\n",
       "      <td>0.840341</td>\n",
       "      <td>0.550179</td>\n",
       "      <td>0.839562</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Gradientboost_rf_smote</td>\n",
       "      <td>0.900172</td>\n",
       "      <td>0.888409</td>\n",
       "      <td>0.915315</td>\n",
       "      <td>0.873568</td>\n",
       "      <td>0.620284</td>\n",
       "      <td>0.849286</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>xgbboosting_rf_smote</td>\n",
       "      <td>0.951092</td>\n",
       "      <td>0.942432</td>\n",
       "      <td>0.96088</td>\n",
       "      <td>0.902326</td>\n",
       "      <td>0.71068</td>\n",
       "      <td>0.812823</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               model name  training accuracy  train precision  train recall  \\\n",
       "0              model name  training accuracy  train precision  train recall   \n",
       "1              model name  training accuracy  train precision  train recall   \n",
       "2           Random Forest           0.872988         0.858039      0.893864   \n",
       "3        Upsampling.SMOTE           0.872988         0.858039      0.893864   \n",
       "4     Hyper_para_RF_SMOTE           0.872988         0.858039      0.893864   \n",
       "5        Upsampling.SMOTE           0.872988         0.858039      0.893864   \n",
       "6        Upsampling.SMOTE           0.991392         0.991318      0.991468   \n",
       "7     Hyper_para_RF_SMOTE           0.868767         0.850617      0.894651   \n",
       "8  Gradientboost_rf_smote           0.900172         0.888409      0.915315   \n",
       "9    xgbboosting_rf_smote           0.951092         0.942432       0.96088   \n",
       "\n",
       "   test accuracy  test precision  test recall  \n",
       "0  test accuracy  test precision  test recall  \n",
       "1  test accuracy  test precision  test recall  \n",
       "2       0.847273        0.563867     0.838347  \n",
       "3       0.847273        0.563867     0.838347  \n",
       "4       0.847273        0.563867     0.838347  \n",
       "5       0.847273        0.563867     0.838347  \n",
       "6       0.918366        0.751212     0.847767  \n",
       "7       0.840341        0.550179     0.839562  \n",
       "8       0.873568        0.620284     0.849286  \n",
       "9       0.902326         0.71068     0.812823  "
      ]
     },
     "execution_count": 137,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "performance_comparision"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 134,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[16:03:30] WARNING: ../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n",
       "              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n",
       "              importance_type='gain', interaction_constraints='',\n",
       "              learning_rate=0.300000012, max_delta_step=0, max_depth=6,\n",
       "              min_child_weight=1, missing=nan, monotone_constraints='()',\n",
       "              n_estimators=100, n_jobs=16, num_parallel_tree=1, random_state=0,\n",
       "              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,\n",
       "              tree_method='exact', validate_parameters=1, verbosity=None)"
      ]
     },
     "execution_count": 134,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from xgboost import XGBClassifier\n",
    "xgb=XGBClassifier()\n",
    "xgb.fit(X_train_sm,y_train_sm)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 135,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_pred_xgb=xgb.predict(X_train_sm)\n",
    "test_pred_xgb=xgb.predict(X_test_con)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 136,
   "metadata": {},
   "outputs": [],
   "source": [
    "performance_comparision=add_to_perform_compare_df(performance_comparision,\"xgbboosting_rf_smote\",y_train_sm,train_pred_xgb,y_test,test_pred_xgb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 138,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>model name</th>\n",
       "      <th>training accuracy</th>\n",
       "      <th>train precision</th>\n",
       "      <th>train recall</th>\n",
       "      <th>test accuracy</th>\n",
       "      <th>test precision</th>\n",
       "      <th>test recall</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>model name</td>\n",
       "      <td>training accuracy</td>\n",
       "      <td>train precision</td>\n",
       "      <td>train recall</td>\n",
       "      <td>test accuracy</td>\n",
       "      <td>test precision</td>\n",
       "      <td>test recall</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>model name</td>\n",
       "      <td>training accuracy</td>\n",
       "      <td>train precision</td>\n",
       "      <td>train recall</td>\n",
       "      <td>test accuracy</td>\n",
       "      <td>test precision</td>\n",
       "      <td>test recall</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Random Forest</td>\n",
       "      <td>0.872988</td>\n",
       "      <td>0.858039</td>\n",
       "      <td>0.893864</td>\n",
       "      <td>0.847273</td>\n",
       "      <td>0.563867</td>\n",
       "      <td>0.838347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Upsampling.SMOTE</td>\n",
       "      <td>0.872988</td>\n",
       "      <td>0.858039</td>\n",
       "      <td>0.893864</td>\n",
       "      <td>0.847273</td>\n",
       "      <td>0.563867</td>\n",
       "      <td>0.838347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Hyper_para_RF_SMOTE</td>\n",
       "      <td>0.872988</td>\n",
       "      <td>0.858039</td>\n",
       "      <td>0.893864</td>\n",
       "      <td>0.847273</td>\n",
       "      <td>0.563867</td>\n",
       "      <td>0.838347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Upsampling.SMOTE</td>\n",
       "      <td>0.872988</td>\n",
       "      <td>0.858039</td>\n",
       "      <td>0.893864</td>\n",
       "      <td>0.847273</td>\n",
       "      <td>0.563867</td>\n",
       "      <td>0.838347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Upsampling.SMOTE</td>\n",
       "      <td>0.991392</td>\n",
       "      <td>0.991318</td>\n",
       "      <td>0.991468</td>\n",
       "      <td>0.918366</td>\n",
       "      <td>0.751212</td>\n",
       "      <td>0.847767</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Hyper_para_RF_SMOTE</td>\n",
       "      <td>0.868767</td>\n",
       "      <td>0.850617</td>\n",
       "      <td>0.894651</td>\n",
       "      <td>0.840341</td>\n",
       "      <td>0.550179</td>\n",
       "      <td>0.839562</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Gradientboost_rf_smote</td>\n",
       "      <td>0.900172</td>\n",
       "      <td>0.888409</td>\n",
       "      <td>0.915315</td>\n",
       "      <td>0.873568</td>\n",
       "      <td>0.620284</td>\n",
       "      <td>0.849286</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>xgbboosting_rf_smote</td>\n",
       "      <td>0.951092</td>\n",
       "      <td>0.942432</td>\n",
       "      <td>0.96088</td>\n",
       "      <td>0.902326</td>\n",
       "      <td>0.71068</td>\n",
       "      <td>0.812823</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               model name  training accuracy  train precision  train recall  \\\n",
       "0              model name  training accuracy  train precision  train recall   \n",
       "1              model name  training accuracy  train precision  train recall   \n",
       "2           Random Forest           0.872988         0.858039      0.893864   \n",
       "3        Upsampling.SMOTE           0.872988         0.858039      0.893864   \n",
       "4     Hyper_para_RF_SMOTE           0.872988         0.858039      0.893864   \n",
       "5        Upsampling.SMOTE           0.872988         0.858039      0.893864   \n",
       "6        Upsampling.SMOTE           0.991392         0.991318      0.991468   \n",
       "7     Hyper_para_RF_SMOTE           0.868767         0.850617      0.894651   \n",
       "8  Gradientboost_rf_smote           0.900172         0.888409      0.915315   \n",
       "9    xgbboosting_rf_smote           0.951092         0.942432       0.96088   \n",
       "\n",
       "   test accuracy  test precision  test recall  \n",
       "0  test accuracy  test precision  test recall  \n",
       "1  test accuracy  test precision  test recall  \n",
       "2       0.847273        0.563867     0.838347  \n",
       "3       0.847273        0.563867     0.838347  \n",
       "4       0.847273        0.563867     0.838347  \n",
       "5       0.847273        0.563867     0.838347  \n",
       "6       0.918366        0.751212     0.847767  \n",
       "7       0.840341        0.550179     0.839562  \n",
       "8       0.873568        0.620284     0.849286  \n",
       "9       0.902326         0.71068     0.812823  "
      ]
     },
     "execution_count": 138,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "performance_comparision"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}