Machine learning can classically be summarized with two methodologies: supervised and unsupervised learning. In supervised learning, the correct answers are annotated ahead of time and the algorithm tries to fit a decision space based on those answers. In unsupervised learning, algorithms try to group like examples together, inferring similarities via distance or similarity metrics. These learning types allow us to explore data and categorize it in a meaningful way, predicting where new data will fit into our models.

Scikit-Learn is a powerful machine learning library implemented in Python on top of the numeric and scientific computing libraries NumPy, SciPy, and matplotlib, enabling fast analysis of small to medium-sized data sets. It is open source, commercially usable, and contains many modern machine learning algorithms for classification, regression, clustering, feature extraction, and optimization. For these reasons, Scikit-Learn is often the first tool in a Data Scientist’s toolkit for machine learning of incoming data sets.

The purpose of this course is to serve as an introduction to Machine Learning with Scikit-Learn. In particular, we will structure our machine learning models as though we were producing a data product: an actionable model that can be used in larger programs or algorithms, rather than simply a research or investigation methodology. By the end of the course, students should understand the differences between regression, classification, and clustering methodologies and the data each requires. They should also be able to deploy models into applications or data products, collect feedback from those deployments, and use it to retrain and reinforce existing models.

Enrollment in this course is restricted. Students must submit an application and be accepted into the Certificate in Data Science in order to register for this course.
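
To make the supervised and unsupervised distinction in the course overview concrete, here is a minimal sketch in Python. It is only an illustration, not course material: the Iris data set, the choice of logistic regression and k-means, and all variable names are assumptions made for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative data set (an assumption for this sketch): 150 flowers, 3 species.
X, y = load_iris(return_X_y=True)

# Supervised learning: the correct answers (y) are known ahead of time,
# and the model fits a decision space based on them.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)
accuracy = classifier.score(X_test, y_test)  # fraction of held-out labels predicted correctly

# Unsupervised learning: no labels are given; k-means groups similar
# examples together using Euclidean distance.
clusterer = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = clusterer.fit_predict(X)

print(f"held-out accuracy: {accuracy:.2f}")
print(f"cluster assigned to the first five samples: {cluster_ids[:5]}")
```

Both estimators share the same `fit` and predict-style interface, which is part of what makes a trained model straightforward to embed in a larger program.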

