{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Problem\n", "\n", "The goal is to identify products at risk of going on backorder before that happens, so that the business has time to react." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## What is a Backorder?\n", "Backorders are products that are temporarily out of stock, but for which a customer is permitted to place an order against future inventory.\n", "A backorder generally indicates that customer demand for a product or service exceeds a company’s capacity to supply it. Backorders are both good and bad: strong demand can drive them, but so can suboptimal planning." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Data description\n", "\n", "The data file contains historical data for the 8 weeks prior to the week we are trying to predict. The data was taken as weekly snapshots at the start of each week.\n",
 "Columns are defined as follows:\n", "\n",
 " sku - Random ID for the product\n", "\n",
 " national_inv - Current inventory level for the part\n", "\n",
 " lead_time - Transit time for product (if available)\n", "\n",
 " in_transit_qty - Amount of product in transit from source\n", "\n",
 " forecast_3_month - Forecast sales for the next 3 months\n", "\n",
 " forecast_6_month - Forecast sales for the next 6 months\n", "\n",
 " forecast_9_month - Forecast sales for the next 9 months\n", "\n",
 " sales_1_month - Sales quantity for the prior 1 month time period\n", "\n",
 " sales_3_month - Sales quantity for the prior 3 month time period\n", "\n",
 " sales_6_month - Sales quantity for the prior 6 month time period\n", "\n",
 " sales_9_month - Sales quantity for the prior 9 month time period\n", "\n",
 " min_bank - Minimum recommended amount to stock\n", "\n",
 " potential_issue - Source issue for part identified\n", "\n",
 " pieces_past_due - Parts overdue from source\n", "\n",
 " perf_6_month_avg - Source performance for prior 6 month period\n", "\n",
 " perf_12_month_avg - Source performance for prior 12 month period\n", "\n",
 " local_bo_qty - Amount of stock orders overdue\n", "\n",
 " deck_risk - Part risk flag\n", "\n",
 " oe_constraint - Part risk flag\n", "\n",
 " ppap_risk - Part risk flag\n", "\n",
 " stop_auto_buy - Part risk flag\n", "\n",
 " rev_stop - Part risk flag\n", "\n",
 " went_on_backorder - Product actually went on backorder. 
This is the target value.\n", " \n", " Yes or 1 : Product backordered\n", "\n", " No or 0 : Product not backordered" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Loading the required libraries" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "import os\n", "import numpy as np\n", "import pandas as pd\n", "\n", "\n", "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", "\n", "from sklearn.impute import SimpleImputer\n", "\n", "from sklearn.svm import SVC\n", "\n", "from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score, f1_score\n", "\n", "from sklearn.model_selection import GridSearchCV" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "df=pd.read_csv(\"BackOrders.csv\")" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 "  <thead>\n",
 "    <tr style=\"text-align: right;\"><th></th><th>sku</th><th>national_inv</th><th>lead_time</th><th>in_transit_qty</th><th>forecast_3_month</th><th>forecast_6_month</th><th>forecast_9_month</th><th>sales_1_month</th><th>sales_3_month</th><th>sales_6_month</th><th>...</th><th>pieces_past_due</th><th>perf_6_month_avg</th><th>perf_12_month_avg</th><th>local_bo_qty</th><th>deck_risk</th><th>oe_constraint</th><th>ppap_risk</th><th>stop_auto_buy</th><th>rev_stop</th><th>went_on_backorder</th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>0</th><td>1888279</td><td>117</td><td>NaN</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>15</td><td>...</td><td>0</td><td>-99.00</td><td>-99.00</td><td>0</td><td>No</td><td>No</td><td>Yes</td><td>Yes</td><td>No</td><td>No</td></tr>\n",
 "    <tr><th>1</th><td>1870557</td><td>7</td><td>2.0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0.50</td><td>0.28</td><td>0</td><td>Yes</td><td>No</td><td>No</td><td>Yes</td><td>No</td><td>No</td></tr>\n",
 "    <tr><th>2</th><td>1475481</td><td>258</td><td>15.0</td><td>10</td><td>10</td><td>77</td><td>184</td><td>46</td><td>132</td><td>256</td><td>...</td><td>0</td><td>0.54</td><td>0.70</td><td>0</td><td>No</td><td>No</td><td>No</td><td>Yes</td><td>No</td><td>No</td></tr>\n",
 "    <tr><th>3</th><td>1758220</td><td>46</td><td>2.0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>2</td><td>6</td><td>...</td><td>0</td><td>0.75</td><td>0.90</td><td>0</td><td>Yes</td><td>No</td><td>No</td><td>Yes</td><td>No</td><td>No</td></tr>\n",
 "    <tr><th>4</th><td>1360312</td><td>2</td><td>2.0</td><td>0</td><td>4</td><td>6</td><td>10</td><td>2</td><td>2</td><td>5</td><td>...</td><td>0</td><td>0.97</td><td>0.92</td><td>0</td><td>No</td><td>No</td><td>No</td><td>Yes</td><td>No</td><td>No</td></tr>\n",
 "  </tbody>\n",
 "</table>\n",
 "<p>5 rows × 23 columns</p>\n",
 "</div>
" ], "text/plain": [ " sku national_inv lead_time in_transit_qty forecast_3_month \\\n", "0 1888279 117 NaN 0 0 \n", "1 1870557 7 2.0 0 0 \n", "2 1475481 258 15.0 10 10 \n", "3 1758220 46 2.0 0 0 \n", "4 1360312 2 2.0 0 4 \n", "\n", " forecast_6_month forecast_9_month sales_1_month sales_3_month \\\n", "0 0 0 0 0 \n", "1 0 0 0 0 \n", "2 77 184 46 132 \n", "3 0 0 1 2 \n", "4 6 10 2 2 \n", "\n", " sales_6_month ... pieces_past_due perf_6_month_avg perf_12_month_avg \\\n", "0 15 ... 0 -99.00 -99.00 \n", "1 0 ... 0 0.50 0.28 \n", "2 256 ... 0 0.54 0.70 \n", "3 6 ... 0 0.75 0.90 \n", "4 5 ... 0 0.97 0.92 \n", "\n", " local_bo_qty deck_risk oe_constraint ppap_risk stop_auto_buy rev_stop \\\n", "0 0 No No Yes Yes No \n", "1 0 Yes No No Yes No \n", "2 0 No No No Yes No \n", "3 0 Yes No No Yes No \n", "4 0 No No No Yes No \n", "\n", " went_on_backorder \n", "0 No \n", "1 No \n", "2 No \n", "3 No \n", "4 No \n", "\n", "[5 rows x 23 columns]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 "  <thead>\n",
 "    <tr style=\"text-align: right;\"><th></th><th>sku</th><th>national_inv</th><th>lead_time</th><th>in_transit_qty</th><th>forecast_3_month</th><th>forecast_6_month</th><th>forecast_9_month</th><th>sales_1_month</th><th>sales_3_month</th><th>sales_6_month</th><th>...</th><th>pieces_past_due</th><th>perf_6_month_avg</th><th>perf_12_month_avg</th><th>local_bo_qty</th><th>deck_risk</th><th>oe_constraint</th><th>ppap_risk</th><th>stop_auto_buy</th><th>rev_stop</th><th>went_on_backorder</th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>61584</th><td>1397275</td><td>6</td><td>8.0</td><td>0</td><td>24</td><td>24</td><td>24</td><td>0</td><td>7</td><td>9</td><td>...</td><td>0</td><td>0.98</td><td>0.98</td><td>0</td><td>No</td><td>No</td><td>No</td><td>Yes</td><td>No</td><td>No</td></tr>\n",
 "    <tr><th>61585</th><td>3072139</td><td>130</td><td>2.0</td><td>0</td><td>40</td><td>80</td><td>140</td><td>18</td><td>108</td><td>230</td><td>...</td><td>0</td><td>0.51</td><td>0.28</td><td>0</td><td>No</td><td>No</td><td>No</td><td>Yes</td><td>No</td><td>No</td></tr>\n",
 "    <tr><th>61586</th><td>1909363</td><td>135</td><td>9.0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>10</td><td>40</td><td>65</td><td>...</td><td>0</td><td>1.00</td><td>0.99</td><td>0</td><td>No</td><td>No</td><td>Yes</td><td>Yes</td><td>No</td><td>No</td></tr>\n",
 "    <tr><th>61587</th><td>1845783</td><td>63</td><td>NaN</td><td>0</td><td>0</td><td>0</td><td>0</td><td>452</td><td>1715</td><td>3425</td><td>...</td><td>0</td><td>-99.00</td><td>-99.00</td><td>1</td><td>No</td><td>No</td><td>No</td><td>No</td><td>No</td><td>Yes</td></tr>\n",
 "    <tr><th>61588</th><td>1200539</td><td>0</td><td>2.0</td><td>0</td><td>8</td><td>8</td><td>8</td><td>0</td><td>1</td><td>1</td><td>...</td><td>0</td><td>0.79</td><td>0.78</td><td>0</td><td>Yes</td><td>No</td><td>No</td><td>Yes</td><td>No</td><td>Yes</td></tr>\n",
 "  </tbody>\n",
 "</table>\n",
 "<p>5 rows × 23 columns</p>\n",
 "</div>
" ], "text/plain": [ " sku national_inv lead_time in_transit_qty forecast_3_month \\\n", "61584 1397275 6 8.0 0 24 \n", "61585 3072139 130 2.0 0 40 \n", "61586 1909363 135 9.0 0 0 \n", "61587 1845783 63 NaN 0 0 \n", "61588 1200539 0 2.0 0 8 \n", "\n", " forecast_6_month forecast_9_month sales_1_month sales_3_month \\\n", "61584 24 24 0 7 \n", "61585 80 140 18 108 \n", "61586 0 0 10 40 \n", "61587 0 0 452 1715 \n", "61588 8 8 0 1 \n", "\n", " sales_6_month ... pieces_past_due perf_6_month_avg \\\n", "61584 9 ... 0 0.98 \n", "61585 230 ... 0 0.51 \n", "61586 65 ... 0 1.00 \n", "61587 3425 ... 0 -99.00 \n", "61588 1 ... 0 0.79 \n", "\n", " perf_12_month_avg local_bo_qty deck_risk oe_constraint ppap_risk \\\n", "61584 0.98 0 No No No \n", "61585 0.28 0 No No No \n", "61586 0.99 0 No No Yes \n", "61587 -99.00 1 No No No \n", "61588 0.78 0 Yes No No \n", "\n", " stop_auto_buy rev_stop went_on_backorder \n", "61584 Yes No No \n", "61585 Yes No No \n", "61586 Yes No No \n", "61587 No No Yes \n", "61588 Yes No Yes \n", "\n", "[5 rows x 23 columns]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.tail()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(61589, 23)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sku int64\n", "national_inv int64\n", "lead_time float64\n", "in_transit_qty int64\n", "forecast_3_month int64\n", "forecast_6_month int64\n", "forecast_9_month int64\n", "sales_1_month int64\n", "sales_3_month int64\n", "sales_6_month int64\n", "sales_9_month int64\n", "min_bank int64\n", "potential_issue object\n", "pieces_past_due int64\n", "perf_6_month_avg float64\n", "perf_12_month_avg float64\n", "local_bo_qty int64\n", "deck_risk object\n", "oe_constraint object\n", "ppap_risk object\n", 
"stop_auto_buy object\n", "rev_stop object\n", "went_on_backorder object\n", "dtype: object" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 "  <thead>\n",
 "    <tr style=\"text-align: right;\"><th></th><th>sku</th><th>national_inv</th><th>lead_time</th><th>in_transit_qty</th><th>forecast_3_month</th><th>forecast_6_month</th><th>forecast_9_month</th><th>sales_1_month</th><th>sales_3_month</th><th>sales_6_month</th><th>sales_9_month</th><th>min_bank</th><th>pieces_past_due</th><th>perf_6_month_avg</th><th>perf_12_month_avg</th><th>local_bo_qty</th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>count</th><td>6.158900e+04</td><td>61589.000000</td><td>58186.000000</td><td>61589.000000</td><td>6.158900e+04</td><td>6.158900e+04</td><td>6.158900e+04</td><td>61589.000000</td><td>61589.000000</td><td>6.158900e+04</td><td>6.158900e+04</td><td>61589.000000</td><td>61589.000000</td><td>61589.000000</td><td>61589.000000</td><td>61589.000000</td></tr>\n",
 "    <tr><th>mean</th><td>2.037188e+06</td><td>287.721882</td><td>7.559619</td><td>30.192843</td><td>1.692728e+02</td><td>3.150413e+02</td><td>4.535760e+02</td><td>44.742957</td><td>150.732631</td><td>2.835465e+02</td><td>4.196427e+02</td><td>43.087256</td><td>1.605400</td><td>-6.264182</td><td>-5.863664</td><td>1.205361</td></tr>\n",
 "    <tr><th>std</th><td>6.564178e+05</td><td>4233.906931</td><td>6.498952</td><td>792.869253</td><td>5.286742e+03</td><td>9.774362e+03</td><td>1.420201e+04</td><td>1373.805831</td><td>5224.959649</td><td>8.872270e+03</td><td>1.269858e+04</td><td>959.614135</td><td>42.309229</td><td>25.537906</td><td>24.844514</td><td>29.981155</td></tr>\n",
 "    <tr><th>min</th><td>1.068628e+06</td><td>-2999.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000e+00</td><td>0.000000e+00</td><td>0.000000e+00</td><td>0.000000</td><td>0.000000</td><td>0.000000e+00</td><td>0.000000e+00</td><td>0.000000</td><td>0.000000</td><td>-99.000000</td><td>-99.000000</td><td>0.000000</td></tr>\n",
 "    <tr><th>25%</th><td>1.498574e+06</td><td>3.000000</td><td>4.000000</td><td>0.000000</td><td>0.000000e+00</td><td>0.000000e+00</td><td>0.000000e+00</td><td>0.000000</td><td>0.000000</td><td>0.000000e+00</td><td>0.000000e+00</td><td>0.000000</td><td>0.000000</td><td>0.620000</td><td>0.640000</td><td>0.000000</td></tr>\n",
 "    <tr><th>50%</th><td>1.898033e+06</td><td>10.000000</td><td>8.000000</td><td>0.000000</td><td>0.000000e+00</td><td>0.000000e+00</td><td>0.000000e+00</td><td>0.000000</td><td>2.000000</td><td>4.000000e+00</td><td>6.000000e+00</td><td>0.000000</td><td>0.000000</td><td>0.820000</td><td>0.800000</td><td>0.000000</td></tr>\n",
 "    <tr><th>75%</th><td>2.314826e+06</td><td>57.000000</td><td>8.000000</td><td>0.000000</td><td>1.200000e+01</td><td>2.500000e+01</td><td>3.600000e+01</td><td>6.000000</td><td>17.000000</td><td>3.400000e+01</td><td>5.100000e+01</td><td>3.000000</td><td>0.000000</td><td>0.960000</td><td>0.950000</td><td>0.000000</td></tr>\n",
 "    <tr><th>max</th><td>3.284895e+06</td><td>673445.000000</td><td>52.000000</td><td>170976.000000</td><td>1.126656e+06</td><td>2.094336e+06</td><td>3.062016e+06</td><td>295197.000000</td><td>934593.000000</td><td>1.799099e+06</td><td>2.631590e+06</td><td>192978.000000</td><td>7392.000000</td><td>1.000000</td><td>1.000000</td><td>2999.000000</td></tr>\n",
 "  </tbody>\n",
 "</table>\n",
 "</div>
" ], "text/plain": [ " sku national_inv lead_time in_transit_qty \\\n", "count 6.158900e+04 61589.000000 58186.000000 61589.000000 \n", "mean 2.037188e+06 287.721882 7.559619 30.192843 \n", "std 6.564178e+05 4233.906931 6.498952 792.869253 \n", "min 1.068628e+06 -2999.000000 0.000000 0.000000 \n", "25% 1.498574e+06 3.000000 4.000000 0.000000 \n", "50% 1.898033e+06 10.000000 8.000000 0.000000 \n", "75% 2.314826e+06 57.000000 8.000000 0.000000 \n", "max 3.284895e+06 673445.000000 52.000000 170976.000000 \n", "\n", " forecast_3_month forecast_6_month forecast_9_month sales_1_month \\\n", "count 6.158900e+04 6.158900e+04 6.158900e+04 61589.000000 \n", "mean 1.692728e+02 3.150413e+02 4.535760e+02 44.742957 \n", "std 5.286742e+03 9.774362e+03 1.420201e+04 1373.805831 \n", "min 0.000000e+00 0.000000e+00 0.000000e+00 0.000000 \n", "25% 0.000000e+00 0.000000e+00 0.000000e+00 0.000000 \n", "50% 0.000000e+00 0.000000e+00 0.000000e+00 0.000000 \n", "75% 1.200000e+01 2.500000e+01 3.600000e+01 6.000000 \n", "max 1.126656e+06 2.094336e+06 3.062016e+06 295197.000000 \n", "\n", " sales_3_month sales_6_month sales_9_month min_bank \\\n", "count 61589.000000 6.158900e+04 6.158900e+04 61589.000000 \n", "mean 150.732631 2.835465e+02 4.196427e+02 43.087256 \n", "std 5224.959649 8.872270e+03 1.269858e+04 959.614135 \n", "min 0.000000 0.000000e+00 0.000000e+00 0.000000 \n", "25% 0.000000 0.000000e+00 0.000000e+00 0.000000 \n", "50% 2.000000 4.000000e+00 6.000000e+00 0.000000 \n", "75% 17.000000 3.400000e+01 5.100000e+01 3.000000 \n", "max 934593.000000 1.799099e+06 2.631590e+06 192978.000000 \n", "\n", " pieces_past_due perf_6_month_avg perf_12_month_avg local_bo_qty \n", "count 61589.000000 61589.000000 61589.000000 61589.000000 \n", "mean 1.605400 -6.264182 -5.863664 1.205361 \n", "std 42.309229 25.537906 24.844514 29.981155 \n", "min 0.000000 -99.000000 -99.000000 0.000000 \n", "25% 0.000000 0.620000 0.640000 0.000000 \n", "50% 0.000000 0.820000 0.800000 0.000000 \n", "75% 0.000000 
0.960000 0.950000 0.000000 \n", "max 7392.000000 1.000000 1.000000 2999.000000 " ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 61589 entries, 0 to 61588\n", "Data columns (total 23 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 sku 61589 non-null int64 \n", " 1 national_inv 61589 non-null int64 \n", " 2 lead_time 58186 non-null float64\n", " 3 in_transit_qty 61589 non-null int64 \n", " 4 forecast_3_month 61589 non-null int64 \n", " 5 forecast_6_month 61589 non-null int64 \n", " 6 forecast_9_month 61589 non-null int64 \n", " 7 sales_1_month 61589 non-null int64 \n", " 8 sales_3_month 61589 non-null int64 \n", " 9 sales_6_month 61589 non-null int64 \n", " 10 sales_9_month 61589 non-null int64 \n", " 11 min_bank 61589 non-null int64 \n", " 12 potential_issue 61589 non-null object \n", " 13 pieces_past_due 61589 non-null int64 \n", " 14 perf_6_month_avg 61589 non-null float64\n", " 15 perf_12_month_avg 61589 non-null float64\n", " 16 local_bo_qty 61589 non-null int64 \n", " 17 deck_risk 61589 non-null object \n", " 18 oe_constraint 61589 non-null object \n", " 19 ppap_risk 61589 non-null object \n", " 20 stop_auto_buy 61589 non-null object \n", " 21 rev_stop 61589 non-null object \n", " 22 went_on_backorder 61589 non-null object \n", "dtypes: float64(3), int64(13), object(7)\n", "memory usage: 10.8+ MB\n" ] } ], "source": [ "df.info()" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sku 61589\n", "national_inv 2916\n", "lead_time 28\n", "in_transit_qty 908\n", "forecast_3_month 1623\n", "forecast_6_month 2195\n", "forecast_9_month 2664\n", "sales_1_month 1092\n", "sales_3_month 1928\n", "sales_6_month 2679\n", "sales_9_month 3220\n", "min_bank 
1098\n", "potential_issue 2\n", "pieces_past_due 190\n", "perf_6_month_avg 102\n", "perf_12_month_avg 102\n", "local_bo_qty 201\n", "deck_risk 2\n", "oe_constraint 2\n", "ppap_risk 2\n", "stop_auto_buy 2\n", "rev_stop 2\n", "went_on_backorder 2\n", "dtype: int64" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.nunique()" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sku 0\n", "national_inv 0\n", "lead_time 3403\n", "in_transit_qty 0\n", "forecast_3_month 0\n", "forecast_6_month 0\n", "forecast_9_month 0\n", "sales_1_month 0\n", "sales_3_month 0\n", "sales_6_month 0\n", "sales_9_month 0\n", "min_bank 0\n", "potential_issue 0\n", "pieces_past_due 0\n", "perf_6_month_avg 0\n", "perf_12_month_avg 0\n", "local_bo_qty 0\n", "deck_risk 0\n", "oe_constraint 0\n", "ppap_risk 0\n", "stop_auto_buy 0\n", "rev_stop 0\n", "went_on_backorder 0\n", "dtype: int64" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.isna().sum()" ] },
{ "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "No 50296\n", "Yes 11293\n", "Name: went_on_backorder, dtype: int64" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.went_on_backorder.value_counts()" ] },
{ "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "data=df.copy()" ] },
{ "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "# Only lead_time has missing values; drop those rows and take a copy\n", "# so that later column assignments act on an independent frame\n", "newdf = df.dropna().copy()" ] },
{ "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(58186, 23)" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "newdf.shape" ] },
{ "cell_type": "code", "execution_count": 39, 
"metadata": {}, "outputs": [ { "data": { "text/plain": [ "sku 0\n", "national_inv 0\n", "lead_time 0\n", "in_transit_qty 0\n", "forecast_3_month 0\n", "forecast_6_month 0\n", "forecast_9_month 0\n", "sales_1_month 0\n", "sales_3_month 0\n", "sales_6_month 0\n", "sales_9_month 0\n", "min_bank 0\n", "potential_issue 0\n", "pieces_past_due 0\n", "perf_6_month_avg 0\n", "perf_12_month_avg 0\n", "local_bo_qty 0\n", "deck_risk 0\n", "oe_constraint 0\n", "ppap_risk 0\n", "stop_auto_buy 0\n", "rev_stop 0\n", "went_on_backorder 0\n", "dtype: int64" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "newdf.isna().sum()" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sku int64\n", "national_inv int64\n", "lead_time float64\n", "in_transit_qty int64\n", "forecast_3_month int64\n", "forecast_6_month int64\n", "forecast_9_month int64\n", "sales_1_month int64\n", "sales_3_month int64\n", "sales_6_month int64\n", "sales_9_month int64\n", "min_bank int64\n", "potential_issue object\n", "pieces_past_due int64\n", "perf_6_month_avg float64\n", "perf_12_month_avg float64\n", "local_bo_qty int64\n", "deck_risk object\n", "oe_constraint object\n", "ppap_risk object\n", "stop_auto_buy object\n", "rev_stop object\n", "went_on_backorder object\n", "dtype: object" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "newdf.dtypes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "num_cols=[\"national_inv\",\"lead_time\",\"in_transit_qty\",\"forecast_3_month\",\"forecast_6_month\",\"forecast_9_month\",\"sales_1_month\"\n", " ,\"sales_3_month\",\"sales_6_month\",\"sales_9_month\",\"min_bank\",\"local_bo_qty\"]\n", 
"cat_cols=[\"sku\",\"potential_issue\",\"deck_risk\",\"oe_constraint\",\"ppap_risk\",\"stop_auto_buy\",\"rev_stop\",\"went_on_backorder\"]" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "newdf[cat_cols] = newdf[cat_cols].astype('category')" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sku category\n", "national_inv int64\n", "lead_time float64\n", "in_transit_qty int64\n", "forecast_3_month int64\n", "forecast_6_month int64\n", "forecast_9_month int64\n", "sales_1_month int64\n", "sales_3_month int64\n", "sales_6_month int64\n", "sales_9_month int64\n", "min_bank int64\n", "potential_issue category\n", "pieces_past_due int64\n", "perf_6_month_avg float64\n", "perf_12_month_avg float64\n", "local_bo_qty int64\n", "deck_risk category\n", "oe_constraint category\n", "ppap_risk category\n", "stop_auto_buy category\n", "rev_stop category\n", "went_on_backorder category\n", "dtype: object" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "newdf.dtypes" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "newdf.drop([\"sku\"],axis=1,inplace=True)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "national_inv int64\n", "lead_time float64\n", "in_transit_qty int64\n", "forecast_3_month int64\n", "forecast_6_month int64\n", "forecast_9_month int64\n", "sales_1_month int64\n", "sales_3_month int64\n", "sales_6_month int64\n", "sales_9_month int64\n", "min_bank int64\n", "potential_issue category\n", "pieces_past_due int64\n", "perf_6_month_avg float64\n", "perf_12_month_avg float64\n", "local_bo_qty int64\n", "deck_risk category\n", "oe_constraint category\n", "ppap_risk category\n", "stop_auto_buy category\n", "rev_stop category\n", "went_on_backorder category\n", "dtype: object" ] }, "execution_count": 45, "metadata": 
{}, "output_type": "execute_result" } ], "source": [ "newdf.dtypes" ] },
{ "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "# Features: every column except the target\n", "X = newdf.drop([\"went_on_backorder\"], axis=1)" ] },
{ "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "y = newdf[\"went_on_backorder\"]" ] },
{ "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(58186, 21) (58186,)\n" ] } ], "source": [ "print(X.shape, y.shape)" ] },
{ "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "# 70/30 split, stratified on the target so both splits keep the same class ratio\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123, stratify=y)" ] },
{ "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(40730, 21)\n", "(17456, 21)\n", "(40730,)\n", "(17456,)\n" ] } ], "source": [ "print(X_train.shape)\n", "print(X_test.shape)\n", "print(y_train.shape)\n", "print(y_test.shape)" ] },
{ "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "No 0.81149\n", "Yes 0.18851\n", "Name: went_on_backorder, dtype: float64" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_train.value_counts(normalize=True)" ] },
{ "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "No 0.811469\n", "Yes 0.188531\n", "Name: went_on_backorder, dtype: float64" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_test.value_counts(normalize=True)" ] },
{ "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import LabelEncoder\n", "\n", "# Encode the target as integers (No -> 0, Yes -> 1); fit on the training labels only\n", "le = LabelEncoder()\n", "le.fit(y_train)\n", "y_train = le.transform(y_train)\n", "y_test = le.transform(y_test)" ] },
{ "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 81.14903\n", "1 18.85097\n", "dtype: float64" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.value_counts(y_train)/y_train.size*100" ] },
{ "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "cat_attr=X_train.select_dtypes(include=[\"category\"]).columns" ] },
{ "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import OneHotEncoder" ] },
{ "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ "# Fit the encoder on the training split only, then apply it to both splits\n", "enc = OneHotEncoder(drop=\"first\")\n", "enc.fit(X_train[cat_attr])\n", "\n", "X_train_ohe = enc.transform(X_train[cat_attr]).toarray()\n", "X_test_ohe = enc.transform(X_test[cat_attr]).toarray()" ] },
{ "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "# Standardize the numeric features (the scaler is fit on the training split only)\n" ] },
{ "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "StandardScaler()" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaler = StandardScaler()\n", "scaler.fit(X_train[num_cols])" ] },
{ "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "X_train_std = scaler.transform(X_train[num_cols])\n", "X_test_std = scaler.transform(X_test[num_cols])" ] },
{ "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(40730, 12)\n", "(17456, 12)\n" ] } ], "source": [ "print(X_train_std.shape)\n", "print(X_test_std.shape)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [], "source": [ "# Combine the scaled numeric features with the one-hot encoded categorical features\n", "X_train_con = np.concatenate([X_train_std, X_train_ohe], axis=1)\n", "X_test_con = np.concatenate([X_test_std, X_test_ohe], axis=1)" ] },
{ "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(40730, 18)\n", "(17456, 18)\n" ] } ], "source": [ "print(X_train_con.shape)\n", "print(X_test_con.shape)" ] },
{ "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [], "source": [ "from sklearn.ensemble import RandomForestClassifier\n", "\n", "# Baseline: random forest with default hyperparameters on the imbalanced training data\n", "clf1 = RandomForestClassifier()\n", "clf1.fit(X_train_con, y_train)\n", "\n", "train_preds = clf1.predict(X_train_con)\n", "test_preds = clf1.predict(X_test_con)" ] },
{ "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score, f1_score\n" ] },
{ "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [], "source": [ "def evaluate_model(act, pred):\n", "    # Print the confusion matrix and the metrics for the positive class (1 = backordered)\n", "    print(\"Confusion Matrix \\n\", confusion_matrix(act, pred))\n", "    print(\"Accuracy : \", accuracy_score(act, pred))\n", "    print(\"Recall : \", recall_score(act, pred))\n", "    print(\"Precision: \", precision_score(act, pred))\n", "    print(\"F1_score : \", f1_score(act, pred))" ] },
{ "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "---train---\n", "Confusion Matrix \n", " [[32907 145]\n", " [ 233 7445]]\n", "Accuracy : 0.9907193714706605\n", "Recall : 0.969653555613441\n", "Precision: 0.9808959156785244\n", "F1_score : 0.9752423369138066\n", "---test---\n", "Confusion Matrix \n", " [[13529 636]\n", " [ 699 2592]]\n", "Accuracy : 0.9235219981668195\n", "Recall : 0.7876025524156791\n", "Precision: 0.8029739776951673\n", "F1_score : 0.7952139898757479\n" ] } ], "source": [ "print(\"---train---\")\n", "evaluate_model(y_train, train_preds)\n", "\n", "print(\"---test---\")\n", "evaluate_model(y_test, test_preds)" ] },
{ "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "from imblearn.over_sampling import SMOTE\n", "\n", "# Oversample the minority class on the training split only; the test split is left untouched\n", "smote = SMOTE(random_state=123)\n", "X_train_sm, y_train_sm = smote.fit_resample(X_train_con, y_train)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "code", "execution_count": 121, "metadata": {}, "outputs": [], "source": [ "# Same default random forest, now trained on the SMOTE-resampled data\n", "clf2 = RandomForestClassifier()\n", "clf2.fit(X_train_sm, y_train_sm)\n", "\n", "train_pred_sm = clf2.predict(X_train_sm)\n", "test_pred_sm = clf2.predict(X_test_con)" ] },
{ "cell_type": "code", "execution_count": 122, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "---train---\n", "Confusion Matrix \n", " [[32765 287]\n", " [ 282 32770]]\n", "Accuracy : 0.991392351446206\n", "Recall : 0.9914679898342007\n", "Precision: 0.9913180264391808\n", "F1_score : 0.9913930024656249\n", "---test---\n", "Confusion Matrix \n", " [[13241 924]\n", " [ 501 2790]]\n", "Accuracy : 0.9183661778185152\n", "Recall : 0.8477666362807658\n", "Precision: 0.7512116316639742\n", "F1_score : 0.7965738758029979\n" ] } ], "source": [ "print(\"---train---\")\n", "evaluate_model(y_train_sm, train_pred_sm)\n", "\n", "print(\"---test---\")\n", "evaluate_model(y_test, test_pred_sm)" ] },
{ "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [], "source": [ "param_grid = {\"n_estimators\": [50, 100],\n", "              \"max_depth\": [1, 5],\n", "              \"max_features\": [3, 5],\n", "              \"min_samples_leaf\": [1, 2, 3]\n", "              }" ] },
{ "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [], "source": [ "clf3 = RandomForestClassifier()\n", "from sklearn.model_selection import GridSearchCV\n", "\n", "# 2-fold grid search over param_grid; the best estimator is refit on the full resampled training set\n", "clf_grid = GridSearchCV(clf3, param_grid, cv=2)\n", "clf_grid.fit(X_train_sm, y_train_sm)\n", "\n", "train_pred_gs = clf_grid.predict(X_train_sm)\n", "test_pred_gs = clf_grid.predict(X_test_con)" ] },
{ "cell_type": "code", "execution_count": 126, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "---train---\n", "Confusion Matrix \n", " [[27859 5193]\n", " [ 3482 29570]]\n", "Accuracy : 0.8687673968292388\n", "Recall : 0.8946508532010166\n", "Precision: 0.8506170353536806\n", "F1_score : 0.8720784487207845\n", "---test---\n", "Confusion Matrix \n", " [[11906 2259]\n", " [ 528 2763]]\n", "Accuracy : 0.8403414298808433\n", "Recall : 0.8395624430264357\n", "Precision: 0.5501792114695341\n", "F1_score : 0.664741970407795\n" ] } ], "source": [ "print(\"---train---\")\n", "evaluate_model(y_train_sm, train_pred_gs)\n", "\n", "print(\"---test---\")\n", "evaluate_model(y_test, test_pred_gs)" ] },
{ "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [], "source": [ "# Training-set metrics entered by hand, in the order: RandomForestClassifier, SMOTE, GridSearchCV\n", "dataframe={\n", "    \"Accuracy\":[0.9906702676160078,0.99140,0.86],\n", "    \"Recall\" :[0.9716071893722323,0.9915285005445964,0.8930775747307274],\n", "    \"precission\":[0.9912885662431942,0.9912885662431942,0.8531953637598636],\n", "    \"f1_score\":[0.991408518877057, 0.991408518877057, 0.8726810448048012]\n", "    }" ] },
{ "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [], "source": [ "df0=pd.DataFrame(dataframe)" ] },
{ "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [], "source": [ "# Test-set metrics entered by hand, in the same model order\n", "dataframe2={\n", "    \"Accuracy\":[0.9237511457378552,0.9177360219981668,0.843263],\n", "    \"Recall\":[0.7885141294439381,0.8447280461865694,0.838954],\n", "    \"precission\":[0.8034055727554179,0.7503373819163293,0.5558687],\n", "    \"f1_score\":[0.7958902008894342,0.7947398513436249,0.668684]\n", "}" ] },
{ "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [], "source": [ "df01=pd.DataFrame(dataframe2)" ] },
{ 
"cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\"><th></th><th>Accuracy</th><th>Recall</th><th>precission</th><th>f1_score</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>0.990670</td><td>0.971607</td><td>0.991289</td><td>0.991409</td></tr>\n",
" <tr><th>1</th><td>0.991400</td><td>0.991529</td><td>0.991289</td><td>0.991409</td></tr>\n",
" <tr><th>2</th><td>0.860000</td><td>0.893078</td><td>0.853195</td><td>0.872681</td></tr>\n",
" <tr><th>0</th><td>0.923751</td><td>0.788514</td><td>0.803406</td><td>0.795890</td></tr>\n",
" <tr><th>1</th><td>0.917736</td><td>0.844728</td><td>0.750337</td><td>0.794740</td></tr>\n",
" <tr><th>2</th><td>0.843263</td><td>0.838954</td><td>0.555869</td><td>0.668684</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Accuracy Recall precission f1_score\n", "0 0.990670 0.971607 0.991289 0.991409\n", "1 0.991400 0.991529 0.991289 0.991409\n", "2 0.860000 0.893078 0.853195 0.872681\n", "0 0.923751 0.788514 0.803406 0.795890\n", "1 0.917736 0.844728 0.750337 0.794740\n", "2 0.843263 0.838954 0.555869 0.668684" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "frames=[df0,df01]\n", "result=pd.concat(frames)\n", "display(result)" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Accuracy Recall precission f1_score\n", "RandomForestClassifier 0.99067 0.971607 0.991289 0.991409\n", "smote 0.99140 0.991529 0.991289 0.991409\n", "GridSearchCV 0.86000 0.893078 0.853195 0.872681\n" ] } ], "source": [ "print(pd.DataFrame(dataframe,index=[\"RandomForestClassifier\",\"smote\",\"GridSearchCV\"],))" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Accuracy Recall precission f1_score\n", "RandomForestClassifier 0.923751 0.788514 0.803406 0.795890\n", "smote 0.917736 0.844728 0.750337 0.794740\n", "GridSearchCV 0.843263 0.838954 0.555869 0.668684\n" ] } ], "source": [ "print(pd.DataFrame(dataframe2,index=[\"RandomForestClassifier\",\"smote\",\"GridSearchCV\"]))" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [], "source": [ "pef_columns=[\"model name\",\"training accuracy\",\"train precision\",\"train recall\",\"test accuracy\",\"test precision\",\"test recall\"]\n", "performance_comparision=pd.DataFrame(columns=pef_columns)" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [], "source": [ "def add_to_perform_compare_df(df,model_name,train_actual,train_predict,test_actual,test_predict):\n", " from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score\n", " \n", " 
train_accuracy=accuracy_score(train_actual,train_predict)\n", " test_accuracy=accuracy_score(test_actual,test_predict)\n", " \n", " train_recall=recall_score(train_actual,train_predict)\n", " test_recall=recall_score(test_actual,test_predict)\n", " \n", " train_precision=precision_score(train_actual,train_predict)\n", " test_precision=precision_score(test_actual,test_predict)\n", " \n", " # DataFrame.append was removed in pandas 2.0; add the new row at the next integer label instead\n", " df.loc[len(df)]=[model_name,train_accuracy,train_precision,train_recall,test_accuracy,test_precision,test_recall]\n", " return df" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [], "source": [ "# The baseline forest was fit on the original split, so score it against y_train rather than y_train_sm\n", "performance_comparision=add_to_perform_compare_df(performance_comparision,\"Random Forest\",y_train,train_preds,y_test,test_preds)" ] }, { "cell_type": "code", "execution_count": 123, "metadata": {}, "outputs": [], "source": [ "performance_comparision=add_to_perform_compare_df(performance_comparision,\"Upsampling.SMOTE\",y_train_sm,train_pred_sm,y_test,test_pred_sm)" ] }, { "cell_type": "code", "execution_count": 132, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\"><th></th><th>model name</th><th>training accuracy</th><th>train precision</th><th>train recall</th><th>test accuracy</th><th>test precision</th><th>test recall</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>model name</td><td>training accuracy</td><td>train precision</td><td>train recall</td><td>test accuracy</td><td>test precision</td><td>test recall</td></tr>\n",
" <tr><th>1</th><td>model name</td><td>training accuracy</td><td>train precision</td><td>train recall</td><td>test accuracy</td><td>test precision</td><td>test recall</td></tr>\n",
" <tr><th>2</th><td>Random Forest</td><td>0.872988</td><td>0.858039</td><td>0.893864</td><td>0.847273</td><td>0.563867</td><td>0.838347</td></tr>\n",
" <tr><th>3</th><td>Upsampling.SMOTE</td><td>0.872988</td><td>0.858039</td><td>0.893864</td><td>0.847273</td><td>0.563867</td><td>0.838347</td></tr>\n",
" <tr><th>4</th><td>Hyper_para_RF_SMOTE</td><td>0.872988</td><td>0.858039</td><td>0.893864</td><td>0.847273</td><td>0.563867</td><td>0.838347</td></tr>\n",
" <tr><th>5</th><td>Upsampling.SMOTE</td><td>0.872988</td><td>0.858039</td><td>0.893864</td><td>0.847273</td><td>0.563867</td><td>0.838347</td></tr>\n",
" <tr><th>6</th><td>Upsampling.SMOTE</td><td>0.991392</td><td>0.991318</td><td>0.991468</td><td>0.918366</td><td>0.751212</td><td>0.847767</td></tr>\n",
" <tr><th>7</th><td>Hyper_para_RF_SMOTE</td><td>0.868767</td><td>0.850617</td><td>0.894651</td><td>0.840341</td><td>0.550179</td><td>0.839562</td></tr>\n",
" <tr><th>8</th><td>Gradientboost_rf_smote</td><td>0.900172</td><td>0.888409</td><td>0.915315</td><td>0.873568</td><td>0.620284</td><td>0.849286</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " model name training accuracy train precision train recall \\\n", "0 model name training accuracy train precision train recall \n", "1 model name training accuracy train precision train recall \n", "2 Random Forest 0.872988 0.858039 0.893864 \n", "3 Upsampling.SMOTE 0.872988 0.858039 0.893864 \n", "4 Hyper_para_RF_SMOTE 0.872988 0.858039 0.893864 \n", "5 Upsampling.SMOTE 0.872988 0.858039 0.893864 \n", "6 Upsampling.SMOTE 0.991392 0.991318 0.991468 \n", "7 Hyper_para_RF_SMOTE 0.868767 0.850617 0.894651 \n", "8 Gradientboost_rf_smote 0.900172 0.888409 0.915315 \n", "\n", " test accuracy test precision test recall \n", "0 test accuracy test precision test recall \n", "1 test accuracy test precision test recall \n", "2 0.847273 0.563867 0.838347 \n", "3 0.847273 0.563867 0.838347 \n", "4 0.847273 0.563867 0.838347 \n", "5 0.847273 0.563867 0.838347 \n", "6 0.918366 0.751212 0.847767 \n", "7 0.840341 0.550179 0.839562 \n", "8 0.873568 0.620284 0.849286 " ] }, "execution_count": 132, "metadata": {}, "output_type": "execute_result" } ], "source": [ "performance_comparision" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 127, "metadata": {}, "outputs": [], "source": [ "performance_comparision=add_to_perform_compare_df(performance_comparision,\"Hyper_para_RF_SMOTE\",y_train_sm,train_pred_gs,y_test,test_pred_gs)" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(40730,)" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_train.shape" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(66104,)" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_pred.shape" ] }, { "cell_type": "code", "execution_count": 129, "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "GradientBoostingClassifier()" ] }, "execution_count": 129, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.ensemble import GradientBoostingClassifier\n", "gbc=GradientBoostingClassifier()\n", "gbc.fit(X_train_sm,y_train_sm)" ] }, { "cell_type": "code", "execution_count": 130, "metadata": {}, "outputs": [], "source": [ "train_pred_gbc=gbc.predict(X_train_sm)\n", "test_pred_gbc=gbc.predict(X_test_con)" ] }, { "cell_type": "code", "execution_count": 131, "metadata": {}, "outputs": [], "source": [ "performance_comparision=add_to_perform_compare_df(performance_comparision,\"Gradientboost_rf_smote\",y_train_sm,train_pred_gbc,y_test,test_pred_gbc)" ] }, { "cell_type": "code", "execution_count": 137, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\"><th></th><th>model name</th><th>training accuracy</th><th>train precision</th><th>train recall</th><th>test accuracy</th><th>test precision</th><th>test recall</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>model name</td><td>training accuracy</td><td>train precision</td><td>train recall</td><td>test accuracy</td><td>test precision</td><td>test recall</td></tr>\n",
" <tr><th>1</th><td>model name</td><td>training accuracy</td><td>train precision</td><td>train recall</td><td>test accuracy</td><td>test precision</td><td>test recall</td></tr>\n",
" <tr><th>2</th><td>Random Forest</td><td>0.872988</td><td>0.858039</td><td>0.893864</td><td>0.847273</td><td>0.563867</td><td>0.838347</td></tr>\n",
" <tr><th>3</th><td>Upsampling.SMOTE</td><td>0.872988</td><td>0.858039</td><td>0.893864</td><td>0.847273</td><td>0.563867</td><td>0.838347</td></tr>\n",
" <tr><th>4</th><td>Hyper_para_RF_SMOTE</td><td>0.872988</td><td>0.858039</td><td>0.893864</td><td>0.847273</td><td>0.563867</td><td>0.838347</td></tr>\n",
" <tr><th>5</th><td>Upsampling.SMOTE</td><td>0.872988</td><td>0.858039</td><td>0.893864</td><td>0.847273</td><td>0.563867</td><td>0.838347</td></tr>\n",
" <tr><th>6</th><td>Upsampling.SMOTE</td><td>0.991392</td><td>0.991318</td><td>0.991468</td><td>0.918366</td><td>0.751212</td><td>0.847767</td></tr>\n",
" <tr><th>7</th><td>Hyper_para_RF_SMOTE</td><td>0.868767</td><td>0.850617</td><td>0.894651</td><td>0.840341</td><td>0.550179</td><td>0.839562</td></tr>\n",
" <tr><th>8</th><td>Gradientboost_rf_smote</td><td>0.900172</td><td>0.888409</td><td>0.915315</td><td>0.873568</td><td>0.620284</td><td>0.849286</td></tr>\n",
" <tr><th>9</th><td>xgbboosting_rf_smote</td><td>0.951092</td><td>0.942432</td><td>0.96088</td><td>0.902326</td><td>0.71068</td><td>0.812823</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " model name training accuracy train precision train recall \\\n", "0 model name training accuracy train precision train recall \n", "1 model name training accuracy train precision train recall \n", "2 Random Forest 0.872988 0.858039 0.893864 \n", "3 Upsampling.SMOTE 0.872988 0.858039 0.893864 \n", "4 Hyper_para_RF_SMOTE 0.872988 0.858039 0.893864 \n", "5 Upsampling.SMOTE 0.872988 0.858039 0.893864 \n", "6 Upsampling.SMOTE 0.991392 0.991318 0.991468 \n", "7 Hyper_para_RF_SMOTE 0.868767 0.850617 0.894651 \n", "8 Gradientboost_rf_smote 0.900172 0.888409 0.915315 \n", "9 xgbboosting_rf_smote 0.951092 0.942432 0.96088 \n", "\n", " test accuracy test precision test recall \n", "0 test accuracy test precision test recall \n", "1 test accuracy test precision test recall \n", "2 0.847273 0.563867 0.838347 \n", "3 0.847273 0.563867 0.838347 \n", "4 0.847273 0.563867 0.838347 \n", "5 0.847273 0.563867 0.838347 \n", "6 0.918366 0.751212 0.847767 \n", "7 0.840341 0.550179 0.839562 \n", "8 0.873568 0.620284 0.849286 \n", "9 0.902326 0.71068 0.812823 " ] }, "execution_count": 137, "metadata": {}, "output_type": "execute_result" } ], "source": [ "performance_comparision" ] }, { "cell_type": "code", "execution_count": 134, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[16:03:30] WARNING: ../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. 
Explicitly set eval_metric if you'd like to restore the old behavior.\n" ] }, { "data": { "text/plain": [ "XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n", " colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n", " importance_type='gain', interaction_constraints='',\n", " learning_rate=0.300000012, max_delta_step=0, max_depth=6,\n", " min_child_weight=1, missing=nan, monotone_constraints='()',\n", " n_estimators=100, n_jobs=16, num_parallel_tree=1, random_state=0,\n", " reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,\n", " tree_method='exact', validate_parameters=1, verbosity=None)" ] }, "execution_count": 134, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from xgboost import XGBClassifier\n", "xgb=XGBClassifier()\n", "xgb.fit(X_train_sm,y_train_sm)" ] }, { "cell_type": "code", "execution_count": 135, "metadata": {}, "outputs": [], "source": [ "train_pred_xgb=xgb.predict(X_train_sm)\n", "test_pred_xgb=xgb.predict(X_test_con)" ] }, { "cell_type": "code", "execution_count": 136, "metadata": {}, "outputs": [], "source": [ "performance_comparision=add_to_perform_compare_df(performance_comparision,\"xgbboosting_rf_smote\",y_train_sm,train_pred_xgb,y_test,test_pred_xgb)" ] }, { "cell_type": "code", "execution_count": 138, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\"><th></th><th>model name</th><th>training accuracy</th><th>train precision</th><th>train recall</th><th>test accuracy</th><th>test precision</th><th>test recall</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>model name</td><td>training accuracy</td><td>train precision</td><td>train recall</td><td>test accuracy</td><td>test precision</td><td>test recall</td></tr>\n",
" <tr><th>1</th><td>model name</td><td>training accuracy</td><td>train precision</td><td>train recall</td><td>test accuracy</td><td>test precision</td><td>test recall</td></tr>\n",
" <tr><th>2</th><td>Random Forest</td><td>0.872988</td><td>0.858039</td><td>0.893864</td><td>0.847273</td><td>0.563867</td><td>0.838347</td></tr>\n",
" <tr><th>3</th><td>Upsampling.SMOTE</td><td>0.872988</td><td>0.858039</td><td>0.893864</td><td>0.847273</td><td>0.563867</td><td>0.838347</td></tr>\n",
" <tr><th>4</th><td>Hyper_para_RF_SMOTE</td><td>0.872988</td><td>0.858039</td><td>0.893864</td><td>0.847273</td><td>0.563867</td><td>0.838347</td></tr>\n",
" <tr><th>5</th><td>Upsampling.SMOTE</td><td>0.872988</td><td>0.858039</td><td>0.893864</td><td>0.847273</td><td>0.563867</td><td>0.838347</td></tr>\n",
" <tr><th>6</th><td>Upsampling.SMOTE</td><td>0.991392</td><td>0.991318</td><td>0.991468</td><td>0.918366</td><td>0.751212</td><td>0.847767</td></tr>\n",
" <tr><th>7</th><td>Hyper_para_RF_SMOTE</td><td>0.868767</td><td>0.850617</td><td>0.894651</td><td>0.840341</td><td>0.550179</td><td>0.839562</td></tr>\n",
" <tr><th>8</th><td>Gradientboost_rf_smote</td><td>0.900172</td><td>0.888409</td><td>0.915315</td><td>0.873568</td><td>0.620284</td><td>0.849286</td></tr>\n",
" <tr><th>9</th><td>xgbboosting_rf_smote</td><td>0.951092</td><td>0.942432</td><td>0.96088</td><td>0.902326</td><td>0.71068</td><td>0.812823</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " model name training accuracy train precision train recall \\\n", "0 model name training accuracy train precision train recall \n", "1 model name training accuracy train precision train recall \n", "2 Random Forest 0.872988 0.858039 0.893864 \n", "3 Upsampling.SMOTE 0.872988 0.858039 0.893864 \n", "4 Hyper_para_RF_SMOTE 0.872988 0.858039 0.893864 \n", "5 Upsampling.SMOTE 0.872988 0.858039 0.893864 \n", "6 Upsampling.SMOTE 0.991392 0.991318 0.991468 \n", "7 Hyper_para_RF_SMOTE 0.868767 0.850617 0.894651 \n", "8 Gradientboost_rf_smote 0.900172 0.888409 0.915315 \n", "9 xgbboosting_rf_smote 0.951092 0.942432 0.96088 \n", "\n", " test accuracy test precision test recall \n", "0 test accuracy test precision test recall \n", "1 test accuracy test precision test recall \n", "2 0.847273 0.563867 0.838347 \n", "3 0.847273 0.563867 0.838347 \n", "4 0.847273 0.563867 0.838347 \n", "5 0.847273 0.563867 0.838347 \n", "6 0.918366 0.751212 0.847767 \n", "7 0.840341 0.550179 0.839562 \n", "8 0.873568 0.620284 0.849286 \n", "9 0.902326 0.71068 0.812823 " ] }, "execution_count": 138, "metadata": {}, "output_type": "execute_result" } ], "source": [ "performance_comparision" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.5" } }, "nbformat": 4, "nbformat_minor": 4 }