This past week I explored the Statistics and Machine Learning Toolbox in Matlab and applied basic statistics concepts I am learning at AACC. For example, I wrote a program that performs linear regression: it calculates Pearson's product-moment correlation coefficient and fits a least-squares line. I spent a lot of time looking at correlations in data sets. There is a strong positive correlation between the price of the OPEC crude oil basket and the exchange rate of the Venezuelan bolívar. There was also a strong positive correlation between the exchange rates of the Singapore dollar and the Danish krone from 2017 to April 2019.
In addition, I wrote a program that performs hypothesis testing. I compared hourly sea level measurements from Venice, Italy, split into two periods: roughly 1983-1998 and 1999-2015. Matlab processed over 300,000 data points in seconds. The test showed a statistically significant difference between the two periods. This was the result I expected, and I would have been worried if the test had found no statistically significant difference. The benefit of confirming a known conclusion is that it gives some confidence the program is working properly.
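The correlation and least-squares calculations described above reduce to a few centered sums: r = Sxy / sqrt(Sxx * Syy), slope b1 = Sxy / Sxx, and intercept b0 = ȳ - b1 * x̄. The sketch below shows those formulas in plain Python rather than reproducing my Matlab program; the function names and the toy data are made up for illustration. In Matlab, `corrcoef(x, y)` and `polyfit(x, y, 1)` give the same quantities.

```python
import math


def _centered_sums(x, y):
    """Return the means and the centered sums Sxx, Syy, Sxy."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return mean_x, mean_y, sxx, syy, sxy


def pearson_r(x, y):
    """Pearson's product-moment correlation: Sxy / sqrt(Sxx * Syy)."""
    _, _, sxx, syy, sxy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(x, y):
    """Slope and intercept of the least-squares line y_hat = b0 + b1 * x."""
    mean_x, mean_y, sxx, _, sxy = _centered_sums(x, y)
    slope = sxy / sxx                    # b1 = Sxy / Sxx
    intercept = mean_y - slope * mean_x  # the line passes through (x_bar, y_bar)
    return slope, intercept


if __name__ == "__main__":
    # Made-up toy data, not the oil-price or exchange-rate series.
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    r = pearson_r(x, y)
    b1, b0 = least_squares_line(x, y)
    print(f"r = {r:.4f}, slope = {b1:.4f}, intercept = {b0:.4f}")
    # prints: r = 0.7746, slope = 0.6000, intercept = 2.2000
```

Note that `polyfit(x, y, 1)` in Matlab returns the coefficients highest power first, so the slope comes before the intercept.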
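Matlab offers many different types of hypothesis tests, for example two-sample t-tests (`ttest2`), Wilcoxon rank sum tests (`ranksum`), and two-sample Kolmogorov-Smirnov tests (`kstest2`).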
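The specific test used for the Venice comparison isn't named above, so this sketch assumes a two-sided comparison of means that does not assume equal variances (Welch's statistic). As a simplification, it compares the statistic to a standard normal distribution instead of a t distribution, which is only reasonable for large samples like the 300,000-point data set. The function name and the synthetic data are hypothetical. In Matlab, a comparable call would be `[h, p] = ttest2(x, y, 'Vartype', 'unequal')`.

```python
import math
import random
import statistics


def welch_large_sample_test(a, b, alpha=0.05):
    """Two-sided test of H0: mean(a) == mean(b), unequal variances allowed.

    Computes Welch's statistic (difference in means over its standard error)
    and, as a large-sample approximation, gets the p-value from the standard
    normal rather than a t distribution. Returns (statistic, p_value, reject_h0).
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least 2 observations")
    # Standard error of the difference in means, using sample variances.
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    stat = (statistics.mean(a) - statistics.mean(b)) / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value, p_value < alpha


if __name__ == "__main__":
    # Synthetic stand-ins for two measurement periods (NOT the Venice data).
    rng = random.Random(0)
    period_1 = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    period_2 = [rng.gauss(0.2, 1.0) for _ in range(5000)]
    stat, p, reject = welch_large_sample_test(period_1, period_2)
    print(f"statistic = {stat:.2f}, p = {p:.3g}, reject H0: {reject}")
```

This kind of test also assumes independent observations, which consecutive hourly readings are not, so its p-value is best read as a rough check rather than an exact probability.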
The documentation for the Statistics and Machine Learning Toolbox is thorough. I also spent a lot of time this past week browsing data sets on Kaggle and Quandl. Next, I would like to explore the toolbox's nonlinear regression functions.