Data Science is the study of the generalizable extraction of knowledge from data. Being a data scientist requires an integrated skill set spanning mathematics, statistics, machine learning, databases and other branches of computer science along with a good understanding of the craft of problem formulation to engineer effective solutions. This course will introduce students to this rapidly growing field and equip them with some of its basic principles and tools as well as its general mind-set. Students will learn concepts, techniques and tools they need to deal with various facets of data science practice, including data collection and integration, exploratory data analysis, predictive modelling, descriptive modelling, data product creation, evaluation, and effective communication. The focus in the treatment of these topics will be on breadth, rather than depth, and emphasis will be placed on integration and synthesis of concepts and their application to solving problems.
- Describe what Data Science is and the skill sets needed to be a data scientist.
- Explain in basic terms what statistical inference means, identify probability distributions commonly used as foundations for statistical modelling, and fit a model to data.
- Use Python to carry out basic data analytics and machine learning applications.
- Describe the basics of AI and some machine learning algorithms.
- Understand data analytics and data visualization tools, with special emphasis on Hadoop.
The course is suitable for upper-level undergraduate students in computer science, mathematics, or any science stream.
Students are expected to have basic knowledge of mathematics and programming logic. If you are interested in taking the course but are not sure whether you have the right background, talk to the instructors. You may still be allowed to take the course if you are willing to put in the extra effort to fill in any gaps.
The course consists of:
- Problem Solving
- Assignment/Case Study submissions
Topics and course outline:
1. Introduction to Data Science
- What is Data Science?
- Data Science Components
- Data Mining and Data Warehousing
- Big Data and Data Science hype
- The Data Science Process
- Data Science and Ethical Issues
- Discussions on privacy, security, ethics
- A look back at Data Science
- Data Science job roles in the present era
- Next-generation data scientists
- Tools for Data Science
2. Probability and Statistics for Data Science
- Population and samples
- Statistical Inference: modelling, probability distributions, fitting a model
- Descriptive Statistics
- Probability Distributions
- Inferential Statistics through hypothesis tests
- Permutation and Randomization Tests
- Correlation and Regression
3. Python for Data Science
- Introduction to Python in Data Science
- Environment setup / working with Google Colab
- Basic data types
- Data structures: arrays, strings, lists, dictionaries.
- Data operations: slicing and string operations.
- Iteration and conditional constructs.
- Overview of Python libraries
- Familiarization with important Python libraries: NumPy, pandas, Matplotlib.
- Exploratory analysis in Python.
4. Artificial Intelligence and Machine Learning
- Introduction to Artificial Neural Networks
- Biological Neurons vs Artificial Neurons
- Learning theory in ML
- Supervised Learning (Classification): KNN and SVM
- Unsupervised Learning (Clustering): K-means
- Reinforcement Learning
- Introduction to Deep neural networks
- DNN types
5. Introduction to Data Analytics and Data Visualization Tools
- Introduction to Big Data
- Applications of Big Data
- Data Analytics (DA)
- Steps in Data Analytics
- Challenges for Big Data Analytics
- Ideas and tools for Data Visualization, with special reference to Hadoop.
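To give a flavour of the hands-on work in topics 2 and 3, here is a minimal sketch of a two-sided permutation test for a difference in means, written with the Python standard library only. It is an illustration, not official course material. The function name `permutation_test`, its parameters, and the two small score lists are assumptions made up for this example.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and an estimated p-value.
    All names here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled data and split it into two pseudo-groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Count shuffles at least as extreme as the observed difference.
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up illustrative scores, not course data.
scores_a = [82, 90, 77, 85, 88, 93]
scores_b = [70, 75, 80, 68, 74, 79]
observed_diff, p = permutation_test(scores_a, scores_b)
print(f"observed difference = {observed_diff:.2f}, p = {p:.4f}")
```

A small p-value suggests that a difference this large would rarely arise if the group labels were assigned at random. The same idea can be expressed more compactly with NumPy once that library has been introduced.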