# ARIMA

This section gives a quick introduction to ARIMA, a very popular statistical method for time series forecasting. ARIMA stands for AutoRegressive Integrated Moving Average. ARIMA models work on the following assumptions and parameters:

• The data provided as input must be a univariate series, since ARIMA uses the past values of the series to predict its future values.
• The AR term defines the number of lagged values of the series used to predict the future values. The parameter ‘p’ in ARIMA represents the AR term. The PACF plot is used to identify a suitable ‘p’ value.
• The MA term defines the number of past forecast errors used to predict the future values. The parameter ‘q’ in ARIMA represents the MA term. The ACF plot is used to identify a suitable ‘q’ value.
• The order of differencing specifies how many times the differencing operation is applied to the series to make it stationary. The parameter ‘d’ represents this order. Tests such as ADF and KPSS can be used to determine whether the series is stationary and so help in identifying the ‘d’ value.
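
To make the roles of ‘d’ and ‘q’ concrete, here is a minimal sketch using only NumPy. It simulates a series that needs one round of differencing and has one MA term, then compares the sample ACF before and after differencing. The helper `sample_acf`, the coefficient 0.8, and the series length are illustrative assumptions, not part of the article's dataset.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
n, theta = 20_000, 0.8                 # illustrative values (assumptions)
eps = rng.normal(size=n + 1)
ma1 = eps[1:] + theta * eps[:-1]       # MA(1) process: stationary
y = np.cumsum(ma1)                     # integrate once -> ARIMA(0, 1, 1)

dy = np.diff(y)                        # one round of differencing (d = 1)
acf_level = sample_acf(y, 3)
acf_diff = sample_acf(dy, 3)

print("ACF of y      :", np.round(acf_level, 3))
print("ACF of diff(y):", np.round(acf_diff, 3))
```

The ACF of the raw series decays very slowly, which hints at non-stationarity. After one difference, the ACF shows a clear spike at lag 1 and values close to zero beyond it, which is the pattern that suggests q = 1.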

# Auto ARIMA

When fitting an ARIMA model, we need to make the series stationary and determine the values of the parameters p, d, and q that optimise a chosen metric. There are many ways to do this, yet finding the correct parametrization of an ARIMA model can be a tedious process that requires statistical expertise and time. In this article, we hope to overcome this issue by running a grid search with auto ARIMA in Python, which automatically selects the combination of (p, d, q) that scores best on an information criterion (AIC by default in pmdarima).
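
The idea behind such a search can be sketched in a few lines. The example below is a deliberately simplified stand-in, not pmdarima's actual procedure: it only considers pure AR(p) models, fits them by ordinary least squares, and picks the order with the lowest AIC. The helper `ar_aic`, the simulated AR(2) series, and the search range are all assumptions made for illustration.

```python
import numpy as np

def ar_aic(y, p, max_p):
    """Fit AR(p) by least squares and return a Gaussian AIC.

    Every candidate is scored on the same targets y[max_p:],
    so the AIC values are comparable across orders.
    """
    y = np.asarray(y, dtype=float)
    target = y[max_p:]
    # Design matrix: intercept plus lags 1..p
    columns = [np.ones(len(target))]
    columns += [y[max_p - k: len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = np.mean(resid ** 2)
    n_params = p + 2                   # lags + intercept + noise variance
    return len(target) * np.log(sigma2) + 2 * n_params

# Simulated AR(2) series (illustrative coefficients)
rng = np.random.default_rng(42)
n = 2000
noise = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.4 * y[t - 2] + noise[t]

MAX_P = 5
scores = {p: ar_aic(y, p, MAX_P) for p in range(MAX_P + 1)}
best_p = min(scores, key=scores.get)

for p, aic in scores.items():
    print(f"AR({p}): AIC = {aic:.1f}")
print("Selected order:", best_p)
```

Orders below the true one score clearly worse, while orders above it tend to score about the same, with only the 2-point-per-parameter penalty counting against them. An automated search applies the same comparison across many more candidate models.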

# Our Goals

The goal is to train an ARIMA model with optimal parameters and use it to forecast the closing price of the stock on the test data.
Okay, let’s get started!

## Step 1. Importing Required Libraries

```python
import warnings
warnings.filterwarnings('ignore')

# Data manipulation and treatment
import numpy as np
import pandas as pd
import datetime as dt
from datetime import timedelta

# Plotting and visualizations
import matplotlib.pyplot as plt
import seaborn as sns
!pip install plotly
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Scikit-learn for modeling
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error, mean_squared_log_error

# Statistics
import statsmodels.api as sm
from statsmodels.tsa.api import Holt, SimpleExpSmoothing, ExponentialSmoothing
from pmdarima import auto_arima
```

## Step 2. Loading the Dataset

The dataset consists of stock market data for Apple Inc. and can be downloaded from Yahoo Finance. It covers the stock price of Apple Inc. from 2019-01-02 to 2020-10-27. We use the closing price for this analysis.

```python
df = pd.read_csv('AAPL.csv', sep=",")
df.head()

# Parse the dates and use them as the index
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
df.index = df['Date']
df = df.drop('Date', axis=1)
df.head()
```

## Step 3. Visualizing the Data

```python
fig = px.line(y=df.Close, x=df.index)
fig.update_layout(title_text='Stock Prices of APPLE', font=dict(size=12),
                  xaxis_title_text="Date", yaxis_title_text="Close")
fig.show()
```

## Step 4. Testing for Stationarity

```python
# Checking stationarity: Dickey-Fuller test
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    # Determine rolling statistics over a 4-observation window
    rolmean = timeseries.rolling(4).mean()
    rolstd = timeseries.rolling(4).std()

    # Plot rolling statistics
    plt.plot(timeseries, color='blue', label='Original')
    plt.plot(rolmean, color='red', label='Rolling Mean')
    plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)

    # Perform the Dickey-Fuller test
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used',
                                             'Number of Observations Used'])
    # dftest[4] holds the dictionary of critical values
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)

    if dfoutput['p-value'] < 0.05:
        print('result : time series is stationary')
    else:
        print('result : time series is not stationary')

from matplotlib import rcParams
rcParams['figure.figsize'] = 20, 10
test_stationarity(df['Close'])
```
Next, we decompose the closing price to check for trend and seasonality:

```python
# Checking trend and seasonality
from statsmodels.tsa.seasonal import seasonal_decompose

# `period` replaces the `freq` argument used by older statsmodels releases
decomposition = seasonal_decompose(df['Close'], period=30)
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

plt.subplot(411)
plt.plot(df['Close'], label='Original')
plt.legend(loc='best')
plt.subplot(412)
plt.plot(trend, label='Trend')
plt.legend(loc='best')
plt.subplot(413)
plt.plot(seasonal, label='Seasonality')
plt.legend(loc='best')
plt.subplot(414)
plt.plot(residual, label='Residuals')
plt.legend(loc='best')
plt.tight_layout()
plt.show()
```
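
To see roughly what an additive decomposition computes, here is a simplified sketch in pandas. It is an approximation for illustration only and not the statsmodels implementation: the trend is a centered moving average over one period (an odd period is assumed), the seasonal component is the average detrended value at each position in the cycle, and the residual is what remains. The function `additive_decompose` and the synthetic weekly series are assumptions.

```python
import numpy as np
import pandas as pd

def additive_decompose(series, period):
    """Simplified additive decomposition: series = trend + seasonal + residual."""
    # Trend: centered moving average over one full period (odd period assumed)
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend
    # Seasonal: mean detrended value at each position in the cycle
    position = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()  # center on zero
    seasonal = pd.Series(seasonal_means.loc[position].to_numpy(), index=series.index)
    residual = series - trend - seasonal
    return trend, seasonal, residual

# Synthetic series: linear trend + 7-step cycle + small noise (made-up data)
rng = np.random.default_rng(1)
t = np.arange(140)
series = pd.Series(0.5 * t + 3 * np.sin(2 * np.pi * t / 7)
                   + rng.normal(scale=0.2, size=len(t)))

trend, seasonal, residual = additive_decompose(series, period=7)
print(seasonal.iloc[:7].round(2).tolist())
```

The first and last few trend values are NaN because the centered window is incomplete there, which is also why the edges of the trend and residual panels are typically empty in decomposition plots.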

## Step 5. Fitting the Auto ARIMA Model

Now we create an auto ARIMA model and train it on the closing price of the stock in the training data. First, let us split the data into a training set and a test set.

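```python
# Chronological 80/20 split: the first 80% of the rows are used for training
split = int(df.shape[0] * 0.80)
model_train = df.iloc[:split]
valid = df.iloc[split:]
y_pred = valid.copy()
```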
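
The split above keeps the rows in time order rather than shuffling them, so the model never sees observations from the period it is asked to forecast. A small sketch of the same idea as a reusable helper follows; the function name `chronological_split` and the placeholder prices are assumptions used only to show the resulting shapes.

```python
import numpy as np
import pandas as pd

def chronological_split(frame, train_fraction=0.8):
    """Split a time-indexed frame into an earlier and a later part."""
    frame = frame.sort_index()
    cut = int(len(frame) * train_fraction)
    return frame.iloc[:cut], frame.iloc[cut:]

# Placeholder data: 10 business days of made-up prices
idx = pd.bdate_range("2020-01-01", periods=10)
demo = pd.DataFrame({"Close": np.linspace(100.0, 109.0, 10)}, index=idx)

train, later = chronological_split(demo)
print(len(train), len(later))
print(train.index.max() < later.index.min())
```
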
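Then we run the auto ARIMA search on the training data:

```python
model_scores_r2 = []
model_scores_mse = []
model_scores_rmse = []
model_scores_mae = []
model_scores_rmsle = []

# stepwise=False searches over the (p, q) combinations instead of using the stepwise shortcut
model_arima = auto_arima(model_train["Close"], trace=True, error_action='ignore',
                         start_p=1, start_q=1, max_p=3, max_q=3,
                         suppress_warnings=True, stepwise=False, seasonal=False)
# auto_arima already returns a fitted model; this call refits it on the same data
model_arima.fit(model_train["Close"])
```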

## Step 6. Forecasting the Data

We now use the model trained in the previous step to forecast the closing price on the test data.

```python
prediction_arima = model_arima.predict(len(valid))
# Assign by position: newer pmdarima versions may return a Series whose index differs from valid's dates
y_pred["ARIMA Model Prediction"] = np.asarray(prediction_arima)

r2_arima = r2_score(y_pred["Close"], y_pred["ARIMA Model Prediction"])
mse_arima = mean_squared_error(y_pred["Close"], y_pred["ARIMA Model Prediction"])
rmse_arima = np.sqrt(mean_squared_error(y_pred["Close"], y_pred["ARIMA Model Prediction"]))
mae_arima = mean_absolute_error(y_pred["Close"], y_pred["ARIMA Model Prediction"])
rmsle_arima = np.sqrt(mean_squared_log_error(y_pred["Close"], y_pred["ARIMA Model Prediction"]))

model_scores_r2.append(r2_arima)
model_scores_mse.append(mse_arima)
model_scores_rmse.append(rmse_arima)
model_scores_mae.append(mae_arima)
model_scores_rmsle.append(rmsle_arima)

print("R Square Score ARIMA: ", r2_arima)
print("Mean Square Error ARIMA: ", mse_arima)
print("Root Mean Square Error ARIMA: ", rmse_arima)
print("Mean Absolute Error ARIMA: ", mae_arima)
print("Root Mean Squared Logarithmic Error ARIMA: ", rmsle_arima)
```
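
For readers who want to see what these scores measure, the sketch below recomputes them with plain NumPy on three made-up values. The function `forecast_metrics` and the numbers are illustrative assumptions; the formulas follow the standard definitions that the scikit-learn functions above implement.

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """Compute R^2, MSE, RMSE, MAE and RMSLE from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)                         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation
    log_err = np.log1p(y_pred) - np.log1p(y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": mse,
        "rmse": np.sqrt(mse),
        "mae": np.mean(np.abs(err)),
        "rmsle": np.sqrt(np.mean(log_err ** 2)),
    }

# Made-up actual and predicted values, chosen so the arithmetic is easy to follow
scores = forecast_metrics([100.0, 102.0, 104.0], [101.0, 101.0, 106.0])
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Here the errors are -1, 1 and -2, so the MSE is (1 + 1 + 4) / 3 = 2 and R² is 1 - 6/8 = 0.25. Note that R² turns negative whenever a forecast does worse than simply predicting the mean of the actual values, which is not unusual for multi-step forecasts.
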
Next, we plot the training data, the validation data, and the forecast together:

```python
fig = go.Figure()
fig.add_trace(go.Scatter(x=model_train.index, y=model_train["Close"],
                         mode='lines', name="Train Data for Stock Prices"))
fig.add_trace(go.Scatter(x=valid.index, y=valid["Close"],
                         mode='lines', name="Validation Data for Stock Prices"))
fig.add_trace(go.Scatter(x=valid.index, y=y_pred["ARIMA Model Prediction"],
                         mode='lines', name="Prediction for Stock Prices"))
fig.update_layout(title="ARIMA", xaxis_title="Date", yaxis_title="Close",
                  legend=dict(x=0, y=1, traceorder="normal"), font=dict(size=12))
fig.show()
```
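Finally, we forecast the 13 days that follow the last date in the dataset:

```python
ARIMA_model_new_date = []
ARIMA_model_new_prediction = []
for i in range(1, 14):
    ARIMA_model_new_date.append(df.index[-1] + timedelta(days=i))
    # The model was trained up to the end of model_train, so day i after the
    # dataset is forecast step len(valid) + i; take the last value by position
    ARIMA_model_new_prediction.append(np.asarray(model_arima.predict(len(valid) + i))[-1])

pd.set_option('display.float_format', lambda x: '%.6f' % x)
model_predictions = pd.DataFrame(list(zip(ARIMA_model_new_date, ARIMA_model_new_prediction)),
                                 columns=["Dates", "ARIMA Model Prediction"])
model_predictions
```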
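
One detail worth noting: adding calendar days to the last date produces weekend dates, on which the stock does not trade. If you prefer forecast dates that fall on weekdays only, pandas can generate them directly. This is a minimal sketch under the assumption that skipping weekends is enough; it does not account for exchange holidays.

```python
import pandas as pd

# Last date in the dataset, as stated in the article
last_date = pd.Timestamp("2020-10-27")

# 13 business days (Monday to Friday) following the last observation
future_dates = pd.bdate_range(start=last_date + pd.Timedelta(days=1), periods=13)

print(future_dates[0].date(), "to", future_dates[-1].date())
```
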

# Conclusion

In this article, we learned how to use the Auto ARIMA model. This approach comes in handy if you would like the model itself to generate the p, d, and q values. With the basic ARIMA model, we need to perform differencing and plot ACF and PACF graphs to determine these values, which is time-consuming. However, it is still advisable to work through the statistical techniques and implement the basic ARIMA model in order to understand the intuition behind p, d, and q.
