Can We Reliably Predict the Price Movement of Bitcoin?

Bitcoin is a speculative instrument (I hesitate to call it an asset outright, as the financial community has not settled on calling it one) that was created in 2009. It belongs to a family of speculative instruments called cryptocurrencies, which share many properties: transactions are recorded on a blockchain database, the instrument has no physical counterpart, etc.

In its early days, Bitcoin (henceforth BTC) was an obscure name, rarely mentioned in the mainstream. Since 2015, however, BTC has become far more popular as an investment vehicle, many became rich off of it, and many more have since tried to predict BTC's movements in hopes of doing the same. Yet after the pre-pandemic peak on Dec 17, 2017, BTC became less attractive to the common investor, as it smelled like a speculative trap. Because BTC does not produce anything in the real economy, it is not backed by any economic phenomenon, and in the pre-pandemic era it seemed to follow a trajectory similar to the Tulip Mania of 1630s Netherlands.

Since the pandemic began, however, we have seen massive BTC price inflation, and more people are once again interested in using BTC as an investment vehicle to become rich. In this notebook, we explore the properties of BTC (namely, whether we can predict prices or their movement well enough to generate trading profits) and eventually try to build a predictive network that can assist us in becoming rich as well.

This notebook also serves as a fantastic introduction to the data science pipeline, and particularly to its applications in a finance setting, as defined by:

  1. Data Curation
  2. Data Management and Representation
  3. Exploratory Data Analysis (and Hypothesis Testing)
  4. Machine Learning
  5. Generating Insights

It is therefore up to you, the reader, to decide what to take away from this notebook. It serves a dual purpose: exploring the nature of BTC and introducing you to the overall data science pipeline.

Set Up

In this notebook, we will be using the following libraries. Because TensorFlow is typically not pre-installed on most machines, I have included code below that, when executed, will install TensorFlow into the current Jupyter kernel.
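A minimal sketch of that install cell, using the `%pip` magic (which installs into the environment backing the current kernel):

```python
# Install TensorFlow into the environment backing the current Jupyter kernel.
# %pip is preferred over a bare !pip, which may target a different Python.
%pip install tensorflow
```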

Importing the Libraries

pandas Pandas is a data storage and manipulation library. Data is stored in 2-D dataframes, which are similar to Excel sheets. For us, pandas dataframes store the Date, Open Price, High Price, Low Price, Close Price, and Volume data.

numpy Numpy is a scientific computing library. Numpy stores data in n-dimensional arrays, which are very similar to MATLAB's matrices/vectors. As a scientific computing library, Numpy optimizes computation speed. For us, the numpy scientific computing stack is ideal for training any ML model, since a lot of computation is bound to occur.

matplotlib Matplotlib is a plotting library. For us, we use it to plot any data that we need to observe visually.

sklearn Scikit-learn is a popular machine learning library that contains, for us, a large number of pre-designed models whose hyperparameters we can tune and deploy with ease.

tensorflow TensorFlow is a popular deep learning library that contains various functions and modules we can use to design deep networks with relative ease.

datetime Datetime is a library that allows us to convert string dates into date objects, which have greater functionality (e.g., we can directly compare two datetime objects to determine which date is earlier).

statsmodels Statsmodels is a popular statistical library whose OLS functionality we use to generate linear regressions.
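Below is a sketch of the corresponding import cell, assuming the conventional aliases (pd, np, plt, tf, sm):

```python
import pandas as pd                  # dataframes for the price/volume data
import numpy as np                   # n-dimensional arrays and fast math
import matplotlib.pyplot as plt      # plotting
import sklearn                       # pre-designed ML models
import tensorflow as tf              # deep learning
from datetime import datetime        # string dates -> date objects
import statsmodels.api as sm         # OLS linear regressions
```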

Data Collection and Curation

Nature of Our Datasets

Bitcoin data is relatively sparse. Some datasets cover 2014-2017, some 2020-2021, but generally speaking there isn't one dataset that encompasses all of the Bitcoin price/volume data we need. Hence, I used 2 overlapping datasets to build a richer aggregate dataset going all the way back to 2013.

  1. Dataset 1: BTC-USD.csv Our first dataset (dataset1), stored in BTC-USD.csv, contains the price and volume data of Bitcoin between April 10, 2020 and May 15, 2021.

BTC-USD.csv can be retrieved from https://finance.yahoo.com/quote/BTC-USD/history?period1=1410825600&period2=1621123200&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true

  2. Dataset 2: BTCUSD_day.csv Our second dataset (dataset2), stored in BTCUSD_day.csv, contains price and volume data of Bitcoin between April 28, 2013 and April 10, 2020.

BTCUSD_day.csv can be retrieved from https://www.kaggle.com/prasoonkottarathil/btcinusd?select=BTCUSD_day.csv

Loading the Data
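A sketch of the loading step, assuming both CSV files sit next to the notebook:

```python
# Load both CSVs into pandas DataFrames.
dataset1 = pd.read_csv("BTC-USD.csv")      # Apr 10, 2020 - May 15, 2021
dataset2 = pd.read_csv("BTCUSD_day.csv")   # Apr 28, 2013 - Apr 10, 2020

# Peek at the first few rows of each to confirm the load worked.
print(dataset1.head())
print(dataset2.head())
```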

Cleaning the Data

Although our data is relatively well curated with Open, High, Low, and Close prices along with volume data, there are still lingering issues with our datasets.

In this section, we clean up the data such that we can then explore it with ease and eventually train a model to predict prices. This section is broken down into multiple components, one per issue, covered in the subsections below.

Issue 1: Volume Mismatch

As we can see above,

dataset1 consists of the following columns: Date, Open, High, Low, Close, Adjusted Close, and Volume

while,

dataset2 consists of the following columns: Date, Symbol, Open, High, Low, Close, Volume BTC, Volume USD

There are 2 volume columns present in dataset2, while only 1 is present in dataset1. This is naturally a problem, as any predictive model we build will need a singular definition of volume. Hence, we need to "standardize" which volume is the correct volume.

If we look closer, we can see that Volume as defined in dataset1 corresponds to Volume USD as defined in dataset2. Hence, we can clean up this data by removing the BTC volume in dataset2, as sketched below.
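A sketch of that cleanup, assuming the column names listed above; we also rename Volume USD to Volume so the two datasets agree:

```python
# Drop the BTC-denominated volume and align dataset2's volume column
# name with dataset1's.
dataset2 = dataset2.drop(columns=["Volume BTC"])
dataset2 = dataset2.rename(columns={"Volume USD": "Volume"})
```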

Issue 2: Extraneous Columns and Reverse Order

There are 2 primary issues with dataset2: it contains an extraneous Symbol column, and its rows run in reverse chronological order (latest to oldest).

The only issue with dataset1 is an extraneous Adj Close column, which we need to remove. We address the dataset2 issues first:
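A sketch of the dataset2 fixes:

```python
# Drop the extraneous Symbol column, then reverse the rows so they run
# oldest-to-latest, resetting the index to 0..N-1 afterwards.
dataset2 = dataset2.drop(columns=["Symbol"])
dataset2 = dataset2.iloc[::-1].reset_index(drop=True)
```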

As we can now see, the extraneous Symbol column has been removed and we have reset dataset2's order to oldest-to-latest.

Below, we now drop the Adj Close column from dataset1
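A sketch of that drop, assuming the Yahoo Finance column is named "Adj Close":

```python
# Remove the extraneous adjusted-close column from dataset1.
dataset1 = dataset1.drop(columns=["Adj Close"])
```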

Issue 3: Last Row Dataset 2

As we can see below, we have 2 copies of the data from April 10th, 2020. Because dataset2's April 10th, 2020 row has Volume = 0, we will remove that row, as a volume of 0 there is illogical.

NOTE: Volume = 0 is possible for any trading instrument; however, it is highly unlikely here, given that at the time people were regularly trading BTC. The Volume = 0 in dataset2 is therefore most likely corrupted data, so we go forward with only dataset1's April 10th, 2020 row.
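A sketch of the removal, using the row index reported in the text (1646):

```python
# Drop dataset2's corrupted April 10, 2020 row (Volume = 0); we keep
# dataset1's copy of that date instead.
dataset2 = dataset2.drop(index=1646)
```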

As you can see above, we have dropped index 1646, which corresponded to the April 10th, 2020 row of dataset2.

Issue 4: Merging the Two Datasets

As we want to eventually train a full-fledged model, we need to merge our datasets such that we have a single dataset which we can pass into a model.

Having only 1 dataset is also useful for initial observations of the data's distribution and will help us greatly in our Exploratory Data Analysis.
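A sketch of the merge; the combined frame is called df here, an assumed name used in the rest of the sketches:

```python
# Stack dataset2 (2013-2020) on top of dataset1 (2020-2021) so the rows
# run oldest-to-latest, re-labelling the index from 0 upwards.
df = pd.concat([dataset2, dataset1], ignore_index=True)
```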

Issue 5: Ensuring Type Compatibility

We want to ensure that our Open, High, Low, Close prices and Volume are all numerical values. As shown below, they are all of type numpy.float64.
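A sketch of the check:

```python
# Inspect each column's dtype; prices and Volume should be float64.
print(df.dtypes)
```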

A key fact above is that the Date column currently holds object-typed data, most likely strings. We confirm this via:
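```python
# Confirm the underlying Python type of a Date entry.
print(type(df["Date"].iloc[0]))  # expected: <class 'str'>
```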

Given this, we want to convert these string values into datetime objects. Datetime objects let us access the year, month, and day as individual fields, which will save us a lot of headache later: in the current form, we would have to parse the string every time we want one of those fields, whereas if we preprocess now, the day, month, or year is directly available down the line. Hence, we will convert these string-based dates to datetime objects.
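A sketch of the conversion:

```python
# Convert the string dates to datetime objects; year/month/day then
# become directly accessible, e.g. via df["Date"].dt.month.
df["Date"] = pd.to_datetime(df["Date"])
```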

Issue 6: Missing Data

Arguably the most important of all the issues.

We know that there is bound to be missing data in our data set. Rarely in data science do we get perfectly well-curated data, as shown by the last 5 issues that we corrected. Hence, we need to explore the missing data, how it is represented, and ask critically why the data is missing.

The key questions we need to ask regarding the missing data are:

  1. What form does the missing data take?
  2. How much of the data is missing?
  3. Which columns are affected by the missing data?
  4. Are there any underlying correlations in the missing data? (Classifying our missing data as MAR, MCAR, or MNAR)

Querying the Missing Data

Let's answer the first 3 questions.

To answer any of these questions, we need to query this missing data, and to query it, we need to know what form it takes. A hint we got in earlier sections is that our missing data may have taken 2 forms:

  1. Entries recorded as NaN (across all columns of a row)
  2. Volume entries recorded as 0

Given these properties, we can check where the data is NaN and where the data is 0:
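A sketch of both queries:

```python
# Rows where any column is NaN.
print(df[df.isna().any(axis=1)])

# Rows where Volume is recorded as 0.
print(df[df["Volume"] == 0])
```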

We can see above that the data is NaN for 5 dates across all columns, and that Volume is 0 for 3 dates. We also see that wherever 0 stands in for missing data, only the volume is missing.

Hence, we can answer our first 3 questions: the missing data takes 2 forms (NaN and 0), 5 dates are missing entirely (NaN across all columns), and 3 dates are missing only their Volume (recorded as 0).

Hypothesizing Why the Data is Missing

Let's answer our last question now: are there any underlying correlations in the missing data? (Classifying our missing data as MAR, MCAR, or MNAR.)

The NaN Missing Data

When speculating why this trading data is missing, the immediate suspect is that the missing data is linked to specific holidays.

We can observe this via the string of missing trading data, replaced by NaN, between October 9th, 2020 and October 13th, 2020. October 12th, 2020, one of the missing entries, was Columbus Day, and many markets treated the entire stretch from the preceding Friday through the holiday Monday as Columbus Day weekend. Hence, there seems to be a strong correlation, and in fact a plausible causation: the missing data in that interval is due to Columbus Day weekend leading most traders to take a vacation, hence the lack of trading data.

This reasoning is not uncommon for trading data, and often not uncommon for time series data in general. An unrelated example: the New York Stock Exchange has operating hours of 9:30 am to 4:00 pm. Hence, if we were analysing minute-by-minute price data of a specific stock over a full day, we would get quite a few rows with missing data, as the stock market is not actively trading at, for example, 3:00 am. This missingness is unrelated to the actual observed variables; it is driven by unobserved variables.

Going off this hypothesis, it seems as though the NaN missing data is uncorrelated with the observed trading data, as price and volume are not causing the missingness. Hence, we classify this missing data as Missing Completely at Random (MCAR), since the probability that data is missing is uncorrelated with the values in the data set. We therefore cannot model this missingness and can mostly ignore it.

The 0 Missing Data

It is unclear exactly why this data is missing. Cross-referencing other data sources shows that trading volume was in fact present on the 3 missing days: 08/26/2017, 08/12/2018, and 10/16/2018.

We can hypothesize that this is simply corrupt data, as there are only 3 occurrences of Volume = 0. Although this is a simplistic hypothesis for why this data is missing, it may be impossible to determine whether the missingness follows a non-random distribution, as there are very few data points with missing volume values.

Hence, we can classify this volume data as Missing Completely at Random (MCAR) as well.

Dealing with the Missing Data

Now that we know we can classify all the missing data as MCAR, we can determine what to do with it. There are 3 typical approaches to dealing with missing data:

  1. Remove the missing data
  2. Impute the missing data (replace the missing data)
  3. Encode the missing data (tell our model to ignore certain components of the missing data)

Dealing with the NaN Data

Because the NaN rows are missing every column, we can directly remove them: imputation or encoding is likely impossible, since no variable in those rows has an observed value from which we could impute. The sketch below shows the removal.
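```python
# Drop the 5 all-NaN rows and re-label the index so it stays contiguous.
df = df.dropna().reset_index(drop=True)
```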

Dealing with the 0 Data

We can recover the missing volume data from online databases. As such, we will use Yahoo Finance and manually impute the data from a manual online query, rather than downloading another file just to fill in 3 values.
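A sketch of the manual imputation. The volume figures below are placeholders, not real values: substitute the numbers read off the Yahoo Finance query before running.

```python
# Hypothetical placeholder figures; replace each with the volume
# reported by Yahoo Finance for that date.
manual_volumes = {
    "2017-08-26": 1.0,  # placeholder, NOT the real volume
    "2018-08-12": 1.0,  # placeholder, NOT the real volume
    "2018-10-16": 1.0,  # placeholder, NOT the real volume
}
for date_str, vol in manual_volumes.items():
    df.loc[df["Date"] == pd.Timestamp(date_str), "Volume"] = vol
```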

Full Definition of the Data

Since we have cleaned up the data, lets now give an aggregate definition of the data which we can refer back to moving forward.

  1. Date: Represents the date on which this data is recorded
  2. Open: Represents the starting/opening price when trading began on a specific day
  3. High: Represents the highest price recorded during trading on a specific trading day
  4. Low: Represents the lowest price recorded during trading on a specific trading day
  5. Close: Represents the ending/closing price when trading ended on a specific day
  6. Volume: Represents the aggregate value of BTC (measured in USD) that was traded on a specific day

Exploratory Data Analysis

Let's now try to observe the underlying data distribution. This will provide us insights into what type of ML system to use, the potential biases, and how to approach prediction overall.

Visualizing Evolution of BTC Prices

Let's first look at the general progression of BTC prices over time. Since there are 4 price measurements (Open, High, Low, Close), we will plot all 4 as individual lines.
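A sketch of that plot, assuming the merged frame df from earlier:

```python
# Plot all four price series against Date on one set of axes.
fig, ax = plt.subplots(figsize=(12, 6))
for col in ["Open", "High", "Low", "Close"]:
    ax.plot(df["Date"], df[col], label=col)
ax.set_xlabel("Date")
ax.set_ylabel("Price (USD)")
ax.set_title("BTC prices over time")
ax.legend()
plt.show()
```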

A key fact we see here, which might seem obvious to the average trader, is that these prices are heavily correlated with each other. This makes sense: the Open price builds on the prior day's Close price, and any High and Low prices achieved on a given day depend on that day's Open price, as we are not going to see massive stochasticity directly in the price data.

Although this may seem like an obvious fact, it is always wise to ensure that any correlations you expect in your data actually manifest in EDA.

Exploring Both Volume and Price

As we can see above, all 4 price metrics roughly match each other. Hence, we can choose a single representative price metric for all 4. We will choose the Close price as our representative metric, although you could choose any of them.

Hence, let's now look at a graph with both volume and price plotted, to get a fuller view of the underlying trends.
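A sketch of such a plot; since price and volume live on very different scales, volume gets its own y-axis via twinx:

```python
# Close price on the left axis, Volume on a second right-hand axis.
fig, ax_price = plt.subplots(figsize=(12, 6))
ax_price.plot(df["Date"], df["Close"], color="tab:blue")
ax_price.set_xlabel("Date")
ax_price.set_ylabel("Close price (USD)", color="tab:blue")

ax_vol = ax_price.twinx()  # second y-axis sharing the same x-axis
ax_vol.plot(df["Date"], df["Volume"], color="tab:orange")
ax_vol.set_ylabel("Volume (USD)", color="tab:orange")
plt.show()
```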

The key thing to note here is that although there was a price spike in 2018, a lot of the upward price movement in BTC is strongly correlated with the increased trading activity seen in BTC over the last year (since early 2020). As we are aware, part of this is likely due to the COVID-19 pandemic, as the volume trend began around March-May of 2020 per the graph.

Let's explore this relationship between Volume and Price by splitting the data set into 2: one part before March 11th, 2020 (the day the WHO declared COVID-19 a pandemic) and one after. We do this because trading activity clearly picked up during COVID quarantine, which is an exogenous variable within our data. By splitting the data set, we can analyze the underlying distribution of volume and price without the COVID-19 exogenous variable confounding our statistical exploration.
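A sketch of the split (the names pre_pandemic and post_pandemic are assumptions used in the later sketches):

```python
# Split at March 11, 2020, the WHO pandemic declaration date.
pandemic_start = pd.Timestamp("2020-03-11")
pre_pandemic = df[df["Date"] < pandemic_start]
post_pandemic = df[df["Date"] >= pandemic_start]
```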

Observing Volume vs. Close Price BEFORE WHO Declaration of COVID-19 to be a Pandemic

The easiest way to observe a relationship between 2 variables is to estimate a linear regression between them. Hence, we will try that, as sketched below.
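A sketch of the regression with statsmodels OLS, overlaying the fitted line on the raw scatter:

```python
import statsmodels.api as sm

# Regress Close on Volume over the pre-pandemic data.
X = sm.add_constant(pre_pandemic["Volume"])  # adds the intercept term
model = sm.OLS(pre_pandemic["Close"], X).fit()

# Scatter the raw points and draw the fitted line on top.
plt.scatter(pre_pandemic["Volume"], pre_pandemic["Close"], s=5)
xs = pre_pandemic["Volume"].sort_values()
plt.plot(xs, model.params["const"] + model.params["Volume"] * xs, color="red")
plt.xlabel("Volume (USD)")
plt.ylabel("Close price (USD)")
plt.show()
```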

NOTE: The linear regression shown above is clearly deviating from a best fit line due to a few outliers.

The common instinct among many statisticians seeing this would be to remove the outliers and re-estimate the linear regression. However, we will not do this, because these outliers are important to the price and volume movement of BTC and warrant further study. We know from the volume data that BTC was relatively illiquid (meaning it was not easily bought and sold) up until the COVID-19 pandemic, and hence these volume spikes have certain causal factors we need to study.

Let's instead try to plot this as a time series graph.

NOTE: the volume spike between 2018-07 and 2019-01

It seems as though that volume spike is correlated with a drop-off in BTC price. Here we can see that there seems to be some underlying correlation between Volume and Price.

Generating the Volume vs. Close Price Regression AFTER WHO Declaration of COVID-19 to be a Pandemic

As earlier, let's look at the relationship between Volume and Price through the lens of a regression first.

NOTE: This regression above is clearly deviating from a best fit line due to a few outliers.

As earlier, we will not re-estimate the linear regression by removing the outliers, since these deviations warrant study due to their magnitude. As before, we will instead plot a time series graph of volume and price.

NOTE: The large volume spike near the 2021-03 timestamp

This graph seems to suggest that the pre-pandemic correlation between volume and price continues into the pandemic world. The large spike in trading activity was immediately followed by a rally in BTC price. This further suggests that the existence of a volume-price correlation is time-invariant: throughout all the time periods examined, a correlation between volume and price exists.

Hypothesis Testing of Price and Volume Relationship

Let's conduct hypothesis testing to cement the relationship between Price and Volume and to measure its statistical significance.

Our null hypothesis is that volume has no effect on price; our alternative hypothesis is that volume does have an effect on price.
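A sketch of the test for both phases, reading the Volume coefficient's p-value off the fitted OLS model:

```python
import statsmodels.api as sm

# Fit Close ~ Volume per phase; the p-value is the probability of a
# coefficient at least this extreme if Volume truly had no effect.
for name, data in [("pre-pandemic", pre_pandemic),
                   ("pandemic", post_pandemic)]:
    X = sm.add_constant(data["Volume"])
    result = sm.OLS(data["Close"], X).fit()
    print(name, "Volume p-value:", result.pvalues["Volume"])
```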

As shown above, since the p-value for the Volume parameter is (effectively) 0, it is nigh impossible for this sample of Volume-Price pairs to arise under the assumption that Volume has no effect on Price. Hence, we conclude that in the pre-pandemic phase, volume had an apparent, statistically significant effect on price, and we reject the null hypothesis in favour of the alternative.

Similarly, as shown above, the p-value for the Volume parameter is again (effectively) 0, so the same reasoning applies: in the pandemic phase, volume had an apparent, statistically significant effect on price, and we again reject the null hypothesis in favour of the alternative.

Observing the Relationship Between Price Change and Volume

Another key variable we want to observe is the daily percentage change in price from open to close. This variable has an important statistical implication: if it is stationary about a 0 mean, then BTC evolves in a cyclic fashion; in contrast, if it is non-stationary, then BTC is a stochastic variable that cannot be predicted by determining where we are in the cycle.

Hence, let's visualize the daily %Δ-in-price data.
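A sketch of that computation and plot; the column name PctChange is an assumption:

```python
# Daily percentage change from Open to Close.
df["PctChange"] = (df["Close"] - df["Open"]) / df["Open"] * 100

plt.figure(figsize=(12, 6))
plt.plot(df["Date"], df["PctChange"])
plt.axhline(0, color="black", linewidth=0.8)  # zero-mean reference line
plt.xlabel("Date")
plt.ylabel("Daily % change (Open to Close)")
plt.show()
```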