Stock Price Prediction using Auto-ARIMA

ARIMA

In this section we give a quick introduction to ARIMA. ARIMA is a popular statistical method for time series forecasting; it stands for Auto-Regressive Integrated Moving Average. ARIMA models rest on the following assumptions:

  • The data provided as input must be a univariate series, since ARIMA uses past values of the series to predict its future values.
  • The AR term defines the number of past observations (lags of the series) used to predict the future values. The parameter ‘p’ in ARIMA represents the AR order, and the PACF plot helps identify the correct ‘p’ value.
  • The MA term defines the number of past forecast errors used to predict the future values. The parameter ‘q’ in ARIMA represents the MA order, and the ACF plot helps identify the correct ‘q’ value.
  • The order of differencing ‘d’ specifies the number of times the differencing operation is applied to the series to make it stationary. Tests such as ADF and KPSS can be used to check whether the series is stationary and help identify the ‘d’ value. A short sketch after this list shows how these plots and tests are used in practice.
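
As a quick illustration of how these plots and tests are used in practice, here is a small helper (a sketch added for illustration, not part of the original walkthrough) that draws the ACF and PACF of a univariate series and runs the ADF test:

# Sketch: inspect ACF/PACF to choose q and p, and run the ADF test to judge d.
# 'series' is assumed to be any univariate pandas Series.
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller

def inspect_series(series, lags=30):
    fig, axes = plt.subplots(1, 2, figsize=(14, 4))
    plot_acf(series, lags=lags, ax=axes[0])    # helps pick the MA order 'q'
    plot_pacf(series, lags=lags, ax=axes[1])   # helps pick the AR order 'p'
    plt.show()
    adf_stat, p_value = adfuller(series)[:2]   # small p-value suggests stationarity
    print('ADF statistic: %.3f, p-value: %.3f' % (adf_stat, p_value))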

Auto ARIMA

When fitting an ARIMA model, we need to make the series stationary and determine the values of the parameters p, d and q that optimise a certain metric. There are many ways to do this, yet correctly parametrizing an ARIMA model can be a tedious process that requires statistical expertise and time. In this article, we sidestep that issue by using auto ARIMA in Python, which runs a grid search over candidate orders and automatically selects the best combination of (p, d, q) according to an information criterion (AIC by default).
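
To make the idea concrete, the snippet below is a simplified sketch of what auto ARIMA automates: a brute-force search over small (p, d, q) ranges that keeps the fit with the lowest AIC. The parameter ranges and the helper name are illustrative, not taken from the article's code.

# Simplified illustration of the grid search that auto_arima automates.
# (Sketch only; parameter ranges are illustrative.)
import itertools
import statsmodels.api as sm

def manual_arima_search(series, max_p=3, max_d=2, max_q=3):
    best_aic, best_order = float('inf'), None
    for p, d, q in itertools.product(range(max_p + 1), range(max_d + 1), range(max_q + 1)):
        try:
            fit = sm.tsa.ARIMA(series, order=(p, d, q)).fit()
            if fit.aic < best_aic:
                best_aic, best_order = fit.aic, (p, d, q)
        except Exception:
            continue  # some combinations fail to converge; skip them
    return best_order, best_aic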

Our Goals

The goal is to train an ARIMA model with optimal parameters that will forecast the closing price of the stock on the held-out validation data.
Okay, let’s get started!

Step 1. Importing Required Libraries

import warnings
warnings.filterwarnings('ignore')
#Data Manipulation and Treatment
import numpy as np
import pandas as pd
import datetime as dt
from datetime import timedelta
#Plotting and Visualizations
import matplotlib.pyplot as plt
import seaborn as sns
!pip install plotly
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
#Scikit-Learn for Modeling
from sklearn.metrics import mean_squared_error,r2_score, mean_absolute_error,mean_squared_log_error
#Statistics
import statsmodels.api as sm
from statsmodels.tsa.api import Holt,SimpleExpSmoothing,ExponentialSmoothing
from pmdarima import auto_arima
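
Note that pmdarima does not ship with every Python distribution. If the last import fails, install the package the same way plotly is installed above:

!pip install pmdarima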

Step 2. Loading the Dataset

The dataset consists of stock market data of Apple Inc. and it can be downloaded from Yahoo Finance. The data covers the stock price of Apple Inc. from 2019-01-02 to 2020-10-27. We use the closing price for this analysis.

# Load the CSV exported from Yahoo Finance
df = pd.read_csv('AAPL.csv', sep=",")
df.head()
# Parse the Date column and set it as the index
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
df.index = df['Date']
df = df.drop('Date', axis=1)
df.head()
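
If you prefer to pull the data programmatically instead of downloading the CSV by hand, a package such as yfinance can fetch the same series. This is an optional alternative, not part of the original code, and it assumes yfinance is installed (pip install yfinance):

# Optional alternative to the manual CSV download (assumes yfinance is installed)
import yfinance as yf

df = yf.download('AAPL', start='2019-01-02', end='2020-10-28')  # Date becomes the index
# Depending on the yfinance version, the columns may come back as a MultiIndex;
# if so, flatten them with: df.columns = df.columns.get_level_values(0)
df.head()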

Step 3. Visualizing the Data

fig = px.line(y=df.Close, x=df.index)
fig.update_layout(title_text='Stock Prices of APPLE',font=dict(size=12),
xaxis_title_text="Date", yaxis_title_text="Close")
fig.show()

Step 4. Testing for Stationarity

# Checking Stationarity - Dickey-Fuller Test
from statsmodels.tsa.stattools import adfuller
def test_stationarity(timeseries):

    # Determining rolling statistics (4-observation window)
    rolmean = timeseries.rolling(4).mean()
    rolstd = timeseries.rolling(4).std()

    # Plot rolling statistics
    orig = plt.plot(timeseries, color='blue', label='Original')
    mean = plt.plot(rolmean, color='red', label='Rolling Mean')
    std = plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)

    # Perform Dickey-Fuller test
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)

    if dfoutput['p-value'] < 0.05:
        print('result : time series is stationary')
    else:
        print('result : time series is not stationary')

from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 20, 10
test_stationarity(df['Close'])
# Checking Trend and Seasonality
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(df['Close'], period=30)  # 'period' replaces the deprecated 'freq' argument
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid
plt.subplot(411)
plt.plot(df['Close'], label='Original')
plt.legend(loc='best')
plt.subplot(412)
plt.plot(trend, label='Trend')
plt.legend(loc='best')
plt.subplot(413)
plt.plot(seasonal,label='Seasonality')
plt.legend(loc='best')
plt.subplot(414)
plt.plot(residual, label='Residuals')
plt.legend(loc='best')
plt.tight_layout()
plt.show()
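
The assumptions listed earlier also mentioned the KPSS test. As a cross-check (not part of the original walkthrough), it can be run on the same series; note that its hypotheses are reversed relative to ADF, so for KPSS a small p-value suggests the series is not stationary:

# Optional KPSS cross-check (hypotheses are reversed compared with the ADF test)
from statsmodels.tsa.stattools import kpss

kpss_stat, kpss_p, kpss_lags, kpss_crit = kpss(df['Close'], regression='c', nlags='auto')
print('KPSS Statistic:', kpss_stat)
print('p-value:', kpss_p)
if kpss_p < 0.05:
    print('result : time series is not stationary')
else:
    print('result : time series is stationary')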

Step 5. Fitting the Auto ARIMA Model

Now we will create an auto ARIMA model and train it on the closing price of the stock from the training data. First, let us split the data into a training and a validation set.

# 80/20 chronological split into training and validation sets
model_train = df.iloc[:int(df.shape[0]*0.80)]
valid = df.iloc[int(df.shape[0]*0.80):]
y_pred = valid.copy()
# Containers for the evaluation metrics
model_scores_r2 = []
model_scores_mse = []
model_scores_rmse = []
model_scores_mae = []
model_scores_rmsle = []
# stepwise=False forces a full grid search over p and q (up to 3 each);
# d is chosen automatically via a unit-root test
model_arima = auto_arima(model_train["Close"], trace=True, error_action='ignore',
                         start_p=1, start_q=1, max_p=3, max_q=3,
                         suppress_warnings=True, stepwise=False, seasonal=False)
model_arima.fit(model_train["Close"])
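
Once the search finishes, it can be useful to confirm which order auto_arima settled on. The summary method on the fitted model (standard in pmdarima, though not shown in the original article) prints the chosen (p, d, q) along with the coefficient estimates and AIC:

# Inspect the selected order, coefficients and AIC of the fitted model
print(model_arima.summary())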

Step 6. Forecasting the Data

We now use the model trained in the previous step to forecast the closing price on the validation data.

prediction_arima=model_arima.predict(len(valid))
y_pred["ARIMA Model Prediction"]=prediction_arima
r2_arima= r2_score(y_pred["Close"],y_pred["ARIMA Model Prediction"])
mse_arima= mean_squared_error(y_pred["Close"],y_pred["ARIMA Model Prediction"])
rmse_arima=np.sqrt(mean_squared_error(y_pred["Close"],y_pred["ARIMA Model Prediction"]))
mae_arima=mean_absolute_error(y_pred["Close"],y_pred["ARIMA Model Prediction"])
rmsle_arima = np.sqrt(mean_squared_log_error(y_pred["Close"],y_pred["ARIMA Model Prediction"]))
model_scores_r2.append(r2_arima)
model_scores_mse.append(mse_arima)
model_scores_rmse.append(rmse_arima)
model_scores_mae.append(mae_arima)
model_scores_rmsle.append(rmsle_arima)
print("R Square Score ARIMA: ",r2_arima)
print("Mean Square Error ARIMA: ",mse_arima)
print("Root Mean Square Error ARIMA: ",rmse_arima)
print("Mean Absoulute Error ARIMA: ",mae_arima)
print("Root Mean Squared Logarithmic Error ARIMA: ", rmsle_arima)
fig = go.Figure()
fig.add_trace(go.Scatter(x=model_train.index, y=model_train["Close"], mode='lines', name="Train Data for Stock Prices"))
fig.add_trace(go.Scatter(x=valid.index, y=valid["Close"], mode='lines', name="Validation Data for Stock Prices"))
fig.add_trace(go.Scatter(x=valid.index, y=y_pred["ARIMA Model Prediction"], mode='lines', name="Prediction for Stock Prices"))
fig.update_layout(title="ARIMA", xaxis_title="Date", yaxis_title="Close",
                  legend=dict(x=0, y=1, traceorder="normal"), font=dict(size=12))
fig.show()
ARIMA_model_new_date = []
ARIMA_model_new_prediction = []
# Forecast 13 additional calendar days beyond the last date in the data
for i in range(1, 14):
    ARIMA_model_new_date.append(df.index[-1] + timedelta(days=i))
    ARIMA_model_new_prediction.append(model_arima.predict(len(valid) + i)[-1])

pd.set_option('display.float_format', lambda x: '%.6f' % x)
model_predictions = pd.DataFrame(zip(ARIMA_model_new_date, ARIMA_model_new_prediction), columns=["Dates", "ARIMA Model Prediction"])
model_predictions
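
As a side note on the loop above: because the forecasts of a fitted ARIMA model are deterministic, calling predict once for the full horizon and keeping the last 13 values gives the same numbers without repeated calls. A minimal sketch, reusing the objects defined earlier (future_horizon and the _alt names are introduced here for illustration only):

# Equivalent single-call version of the 13-day forecast loop (sketch)
future_horizon = 13
full_forecast = np.asarray(model_arima.predict(len(valid) + future_horizon))
future_dates = [df.index[-1] + timedelta(days=i) for i in range(1, future_horizon + 1)]
model_predictions_alt = pd.DataFrame({"Dates": future_dates,
                                      "ARIMA Model Prediction": full_forecast[-future_horizon:]})
model_predictions_alt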

Conclusion

In this article, we learned how to use the Auto ARIMA model. This approach comes in handy when you want the model itself to select the p, d, and q values. With a basic ARIMA model, we would need to perform the differencing and plot the ACF and PACF graphs to determine these values, which is time-consuming. Even so, it is advisable to work through the statistical techniques and implement a basic ARIMA model at least once to understand the intuition behind p, d, and q.
