BigData Science/Analytics + Project
Algorithmica
Plot No. 134, 4th Floor, Beside Bata Showroom, Vinayak Nagar, Gachibowli, Hyderabad - 72

Data Science With Python And Spark

  • Secure Multi-mode Payment
  • Cashback with each Transaction
  • Lowest Price Guarantee
  • 100% Moneyback Guarantee
Price: NA

Course at a Glance
  • 90 hours of Classroom Teaching
  • Study Content (Online)
  • 20 Hours Project
  • 20 Hours Lab Work
  • English Language
  • Online Doubt Support
About Course

The course aims to develop both the math and programming skills required of a data scientist. It gives participants insight into data analysis problems that arise in business verticals and into solving those problems using statistical and machine learning approaches. The course also focuses on understanding the fundamental math underlying those models. It is a practical, research-oriented course rather than a developer-oriented one. It focuses on the 6 most common data analysis problems that arise in most business verticals: Classification, Regression, Recommender Systems, Clustering, Association Analysis and Outlier Detection.

Objectives

Upon successful completion of the Data Science/Analytics course, participants will be able to:

  • Understand how statistical data analysis techniques are used in business decision making, and apply them
  • Understand and apply machine learning techniques in business data analysis
  • Solve a data analysis use case on their own, from inception to deployment
  • Apply algorithms to build machine intelligence
About Institute

Algorithmica, founded in 2008, is a world-class corporate training company that focuses on improving and expanding the engineering skills of developers and on enhancing the quality of the software they develop. Since 2008, Algorithmica has been helping IT professionals get better at what they do by providing an extensive range of training services on emerging technologies. Always pushing the envelope, Algorithmica constantly explores new fields of knowledge as well as new training methodologies to better serve clients. The company is led by a team of IIT alumni with decades of accumulated experience in software development, architectural design and project management. The team has provided authentic, comprehensive and high-quality training services to a good number of companies in recent years, ranging from small start-ups to large enterprises. We pride ourselves on standing by our commitment to help IT professionals get to the next level, on being in tune with our customers' actual needs, and on always delivering what we promise, all while having fun doing it.

Instructors
ThimmaReddy Maramreddy
ThimmaReddy MaramReddy is the founder and the strategist for Algorithm...
Course Structure

1. Introduction to Data Science/Analytics

  • Why do companies care about Data Scientists/Analysts?
  • Data Analytics: OLAP vs Data Mining
  • What is Data Science? Why Data Science?
  • Data driven product engineering
  • Skill-set of Data Scientist and How to become a Data Scientist?
  • Who is hiring? Career Opportunities

2. Data Analysis Problems/Use Cases in Business

  • Predictive Analytics Problems: Classification, Regression, Recommenders
  • Descriptive Analytics Problems: Frequent Pattern Mining, Clustering, Outlier Detection
  • Types of Data: Structured, Time-Series, Text, Image, Voice and Video data
  • Business Verticals: Retail, Banking, Financial, Social, Web, Medical, Scientific, Logistics, Real Estate

3. Tools for Data Science/Analytics

  • Data Life Cycle for Analysis
  • Technologies for Data Science/Analytics
  • Single Machine Analytic Platforms: R, Python
  • Distributed Analytical Platforms: Hadoop, Spark, H2O
  • Datasets for doing data science/analytics

4. Mastering R/Python Language

  • IDE for R/Python
  • basic data structures
  • basic features
  • advanced features
  • packages required for data science in R/Python
  • Lab Session

5. Linear Algebra for Data Scientists

  • Ideas that need Linear Algebra
  • Vector Algebra
    • ideas that map to vectors
    • understanding vector operations
    • understanding linear independence
    • applications of dot product
    • Lab Session
  • Matrix Algebra
    • ideas that map to matrices
    • understanding matrix operations
    • understanding determinant
    • understanding eigen-values and eigen-vectors
    • understanding inverse
    • understanding rank
    • understanding positive definite & semi-definiteness
    • concept of basis
      • basis, orthogonal and orthonormal basis
      • understanding basis change
    • understanding factorization
      • Spectral factorization
      • Eigen factorization
      • SVD factorization
      • (Optional) LU factorization
      • (Optional) QR factorization
    • applications of matrices
      • image processing
      • solving systems of equations
      • modelling discrete systems
    • Lab Session

6. Statistics for Data Scientists

  • Ideas that need statistics
  • Descriptive stats for single variable
    • mean, median, mode, quantiles, percentiles
    • standard deviation, variance
    • MAD, IQR
  • Descriptive stats for two variables
    • covariance
    • correlation
    • chi-squared Analysis
  • Hypothesis Testing
  • Inferential Statistics
  • Lab Session

7. Probability for Data Scientists

  • Ideas that need probabilistic analysis
  • Basic Probability, Conditional Probability
  • Bayes Rule/Reasoning
  • MAP vs MLE Reasoning
  • Mapping Random process to Random variable
  • Properties of Random variables
    • expectation
    • variance
    • entropy and cross-entropy
    • covariance and correlation
  • Estimating probability of Random variable
  • Understanding standard random processes
  • Probability Distributions: Normal, Gamma, Poisson, Dirichlet, Bernoulli, Binomial, Power law, Log-normal, Multinomial
  • Parameter Estimation in Distributions: MAP and MLE approaches
  • Lab Session

8. Calculus for Data Scientists

  • Ideas that need calculus
  • Rate of change
  • Concept of limit
  • Concept of derivative
  • Partial derivatives & gradient
  • Significance of gradient
  • Concept of integration
  • Applications of calculus
  • Lab Session

9. Optimization Theory for Data Scientists

  • Ideas with optimization requirement
  • Modelling ML problems with optimization requirements
  • Solving unconstrained optimization problems
  • Solving optimization problems with linear constraints
  • Gradient descent ideas
    • gradient descent, steepest descent ideas
    • batch gradient descent
    • stochastic gradient descent
  • Lab Session
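
As an illustration of the batch gradient descent idea listed above, here is a minimal sketch in plain Python. It is not taken from the course material: the toy data, the learning rate, the epoch count and the function name `batch_gradient_descent` are all assumptions chosen for the example. It fits a line y = w*x + b by minimizing mean squared error, using the full dataset for every update.

```python
def batch_gradient_descent(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by minimizing mean squared error.

    Every update uses the gradient computed over the whole dataset,
    which is what makes this the "batch" variant.
    lr and epochs are illustrative values, not tuned recommendations.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of the mean squared error w.r.t. w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical example data)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = batch_gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))  # w should be close to 2, b close to 1
```

A stochastic variant would update `w` and `b` after each individual (x, y) pair instead of after a full pass over the data.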

10. Classification Problem

  • What is classification?
  • Classification Examples in Business Verticals
  • Solution strategies for classification
    • Finding pattern and Fixed Pattern Approach
    • Limitations of Fixed Pattern Approach
    • Machine Learning Approaches for classification
      • KNN, Decision Trees, SVM, Naive Bayes
      • Logistic Regression, Neural Network, Ensembles
  • How do you handle overfitting?
  • Evaluation Metrics for Classification Algorithms
    • Confusion Matrix, Accuracy, Error Rate
    • Precision, Recall and F-Score
    • ROC curve, AUC
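
To make the evaluation metrics above concrete, the following is a small sketch in plain Python, assuming a binary problem where label 1 is the positive class. The labels, predictions and helper names (`confusion_counts`, `precision_recall_f1`) are hypothetical and only serve the illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif t == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 score, returning 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical ground truth and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
error_rate = 1 - accuracy
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, tn, accuracy, precision, recall, f1)
```

Accuracy alone can be misleading on imbalanced classes, which is one reason precision, recall and the F-score are usually reported alongside it.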

11. Regression Problem

  • What is Regression?
  • Regression Examples in Business Verticals
  • Solution strategies for Regression
    • Finding pattern and Fixed Pattern Approach
    • Limitations of Fixed Pattern Approach
    • Machine Learning Approaches for regression
      • KNN, Linear Regression, Ridge and Lasso Regression
      • Decision Trees, SVM, Neural Network, Ensembles
  • How do you handle overfitting?
  • Evaluation Metrics for Regression Algorithms
    • RMSE (Root Mean Squared Error)
    • Mean Absolute Deviation (MAD)

12. Recommendation Problem

  • What is a Recommendation System?
    • Top-N Recommender
    • Rating Prediction
  • Recommendations in Business Verticals
  • Solution strategies for Recommender System
    • Content based Recommenders
    • Limitations of Content based recommenders
    • Machine Learning Approaches for Recommenders
      • User-User KNN model, Item-Item KNN model
      • Factorization or latent factor model
    • Hybrid Recommenders
  • How do you handle overfitting?
  • Evaluation Metrics for Recommendation Algorithms
    • Top-N Recommender: Accuracy, Error Rate
    • Rating Prediction: RMSE

13. Frequent Pattern Mining

  • What is Frequent Pattern Mining?
  • Frequent Pattern Mining in Business Verticals
  • Solution strategies for Frequent Pattern Mining
    • Finding pattern and Fixed Pattern Approach
    • Limitations of Fixed Pattern Approach
    • Machine Learning Approaches for Frequent Pattern Mining
      • Apriori, Eclat, FP-Growth
  • Evaluation Metrics for Frequent Pattern Mining
    • Support, Confidence, Lift
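
The three metrics above can be computed directly from a list of transactions. The sketch below is an illustration in plain Python, not course material; the basket data and function names are assumptions. Support is the fraction of transactions containing an itemset, confidence of a rule A -> B is support(A and B) / support(A), and lift divides that confidence by support(B).

```python
# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule A -> B."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of A -> B relative to the baseline support of B."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
print(lift({"bread"}, {"milk"}, transactions))
```

A lift below 1 suggests the two itemsets co-occur less often than they would if they were independent; a lift above 1 suggests a positive association.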

14. Clustering Problem

  • What is Clustering?
  • Clustering Examples in Business Verticals
  • Solution strategies for Clustering
    • Finding pattern and Fixed Pattern Approach
    • Limitations of Fixed Pattern Approach
    • Machine Learning Approaches for Clustering
      • Iterative based K-Means & K-Medoid Approaches
      • Hierarchical Agglomerative Approaches
      • Density based DBSCAN Approach
  • Evaluation Metrics for Clustering
    • Cohesion, Coupling Metrics
    • Correlation Metric

15. Outlier Problem

  • What are Outliers?
  • Outlier Examples in Business Verticals
  • Solution strategies for Outlier Detection
    • Finding pattern and Fixed Pattern Approach
    • Limitations of Fixed Pattern Approach
    • Machine Learning Approaches for Outliers
      • Probabilistic Approach, KNN Approach
      • Density based LOF Approach, Cluster Based Approach

16. Overview of Machine Learning Algorithms

  • What is Machine Learning?
  • Pipeline for ML Algorithms
  • Pipeline Stages: Data Collection, Data Preparation, Feature Engineering, Model Building, Model Evaluation and Model Deployment
  • Supervised, Unsupervised and Semi-supervised ML Algorithms

17. Data Collection Techniques

  • Collecting data from Excel/csv/tsv files
  • Collecting data from databases
  • Collecting data from services
  • Collecting data via scraping
  • Lab Session

18. Data Preparation Techniques

  • Structured Data Preparation
    • Data Type Conversion
      • Category to Numeric Conversion
      • Numeric to Category Conversion
    • Data Normalization: 0-1, Z-Score
    • Handling Skewed Data: Box-Cox Idea
    • Handling Missing Data
  • Text Data Preparation
    • Normalizing Text
    • Stop word Removal
    • Whitespace Removal
    • Stemming
    • Building Document Term Matrix
  • Image Data Preparation
    • Converting to gray scale
    • Pixel Value Normalization
    • Building Pixel Intensity Matrix
  • (Optional) Voice Data Preparation
  • (Optional) Video Data Preparation
  • Lab Session
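
As a small illustration of the two normalization schemes named above (0-1 scaling and Z-score), here is a sketch using only the Python standard library. The sample values and function names are assumptions for the example, and the Z-score here uses the population standard deviation, which is one of two common conventions.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center values at 0 and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column
ages = [20, 30, 40, 50, 60]
scaled = min_max_scale(ages)      # [0.0, 0.25, 0.5, 0.75, 1.0]
standardized = z_score(ages)      # mean 0, standard deviation 1
print(scaled)
print(standardized)
```

In practice the minimum, maximum, mean and standard deviation would be estimated on the training data only and then reused to transform validation and test data.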

19. EDA (Numerical + Graphical) and Feature Engineering

  • Exploring Individual Features
  • Exploring Bi-Feature Relationships
  • Exploring Multi-feature Relationships
  • Feature/Dimension Reduction: PCA
    • Intuition behind PCA
    • Covariance & Correlation
    • Relating PCA to Covariance/Correlation
    • Intuition to math
    • Applications of PCA: Dimensionality Reduction, Image Compression
  • (Optional) Automatic Feature Extraction via Deep Learning
  • Lab Session

20. Classification and Regression: KNN Model

  • Intuitive idea of KNN classification
  • KNN learning
  • Limitations of KNN
  • KNN Regression
  • Applying KNN and parameter tuning
  • Pros and Cons of the Model
  • Lab Session
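
The intuitive idea of KNN classification can be sketched in a few lines of plain Python. This is an illustration only, not the course's implementation: it assumes Euclidean distance and a simple majority vote, and the toy points, labels and the function name `knn_predict` are hypothetical. It requires Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of query by majority vote among its k nearest
    training points, using Euclidean distance."""
    # Pair each training point's distance to the query with its label
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # most_common(1) returns a list holding the single (label, count) pair
    return Counter(nearest_labels).most_common(1)[0][0]


# Two well-separated toy clusters
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (2, 2)))  # nearest neighbours are all "a"
print(knn_predict(points, labels, (6, 5)))  # nearest neighbours are all "b"
```

The choice of `k` is the main tuning parameter: a small `k` follows the training data closely, while a larger `k` smooths the decision boundary. For KNN regression, the vote would be replaced by an average of the neighbours' target values.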

21. Classification and Regression: Decision Tree Model

  • Intuitive Idea of Decision Tree for classification
  • Decision Tree Learning
  • Approaches for tree learning: Entropy, Information Gain, Information Gain Ratio, Gini Index, Misclassification Error
  • How to control over-fitting in tree learning?
  • Comparing ID3, CART, C4.5
  • Decision Trees for Regression
  • Applying Decision Tree and parameter tuning
  • Pros and Cons of the Model
  • Lab Session

22. Classification and Regression: Naive Bayes Model

  • Intuitive idea of Naive Bayes classification
  • Math of Naive Bayes Model
  • Naive Bayes learning
  • Limitations of Naive Bayes Learning
  • Smoothing in Naive Bayes Learning
  • Applying Naive Bayes model and parameter tuning
  • Pros and Cons of the Model
  • Lab Session

23. Classification: Logistic Regression

  • Intuitive Idea of Logistic Regression
  • Math of Logistic Regression
  • Logistic Regression Learning
  • Applying Logistic Regression and parameter tuning
  • Pros and Cons of the Model
  • Lab Session

24. Classification and Regression: SVM Model

  • Intuitive Idea of SVM classification
  • Transforming SVM idea to Math
  • Hard-margin SVM Learning
  • Limitations of Hard-margin SVM Learning
  • Soft-margin SVM Learning
  • Limitations of Soft-margin SVM Learning
  • Kernel SVM Learning
  • Generalizing SVM to multi-classes
  • SVM Regression
  • Applying SVM and parameter tuning
  • Pros and Cons of the Model
  • Lab Session

25. Classification and Regression: Neural Network Model

  • Intuitive idea of Neural Network
  • Perceptron model for classification and regression
  • Perceptron Learning
  • Limitations of Perceptron model
  • Multi-layer FF NN model for classification and regression
  • ML-FF-NN Learning with backpropagation
  • Applying ML-FF-NN and parameter tuning
  • Pros and Cons of the Model
  • Lab Session

26. Classification and Regression: Ensemble Model

  • Intuitive Idea of Ensemble for classification
  • Understanding Weak Learners
  • Approaches for Ensemble learning: Boosting, Bagging and Randomization
  • Bagging Idea in depth and why it works?
  • Bagged Tree Model Learning
  • Boosting Idea in depth and why it works?
  • Boosting variations: AdaBoost & GradientBoost
  • Boosted Tree Model Learning
  • Ensembles for Regression
  • Applying Bagging and Boosting and parameter tuning
  • Pros and Cons of the Model
  • Lab Session

27. Recommenders: Content based Recommendation

  • Building user/item profiles
  • Recommendation Algorithm based on content
  • Applying the Algorithm and tuning
  • Pros and Cons of the Model
  • Lab Session

28. Recommenders: User-User KNN Model

  • Building user/user similarity matrix from rating matrix
  • Recommendation Algorithm based on user-user similarity matrix
  • Applying the Algorithm and tuning
  • Pros and Cons of the Model
  • Lab Session

29. Recommenders: Item-Item KNN Model

  • Building Item/Item similarity matrix from rating matrix
  • Recommendation Algorithm based on item-item similarity matrix
  • Applying the Algorithm and tuning
  • Pros and Cons of the Model
  • Lab Session

30. Recommenders: Latent Factor Model

  • Building factors of rating matrix
  • Recommendation Algorithm based on matrix factors
  • Applying the Algorithm and tuning
  • Pros and Cons of the Model
  • Lab Session

31. Clustering: Iterative Models

  • Intuitive Idea of Iterative Model
  • K-Means & K-Medoid Models
  • Applying the Algorithm and tuning
  • Pros and Cons of the Model
  • Lab Session

32. Clustering: Hierarchical Models

  • Intuitive Idea of Hierarchical Model
  • Agglomerative Models: Single, Complete, Average Link
  • Agglomerative Models: Centroid, Custom Link
  • Applying the Algorithm and tuning
  • Pros and Cons of the Model
  • Lab Session

33. Clustering: Density Models

  • Intuitive Idea of Density Model
  • DBSCAN Model
  • Applying the Algorithm and tuning
  • Pros and Cons of the Model
  • Lab Session

34. Outliers: Probabilistic Model

  • Intuitive Idea
  • Probabilistic Model
  • Applying the Algorithm and tuning
  • Pros and Cons of the Model
  • Lab Session

35. Outliers: KNN Model

  • Intuitive Idea
  • KNN Model
  • Applying the Algorithm and tuning
  • Pros and Cons of the Model
  • Lab Session

36. Outliers: Density Model

  • Intuitive Idea
  • LOF Model
  • Applying the Algorithm and tuning
  • Pros and Cons of the Model
  • Lab Session

37. Association Analysis: Apriori Model

  • Intuitive Idea
  • Apriori Model
  • Applying the Algorithm and tuning
  • Pros and Cons of the Model
  • Lab Session

38. Distributed/Big Data Analytics

  • Analytics at Scale
  • Platforms for Distributed Analytics: Hadoop, Spark, H2O
  • Lab Session

39. (Optional) Data Visualization

  • Need for data visualization in practice
  • D3 basics + Lab Session

40. Project (4-day Hackathon)

  • Hackathon (Day 1)
  • Hackathon (Day 2)
  • Hackathon (Day 3)
  • Hackathon (Day 4)

For Whom

Developers at all levels, BI professionals, Data Warehousing professionals, Team Leads, Analytics Managers and Business Managers.

Prerequisites

Nothing but passion for and interest in data engineering.

Center Address
Algorithmica Center
Center Details
  • AC Classroom: Yes
  • Power Backup: Yes
  • Lift: Yes
  • Purified Water: Yes
  • Four Wheeler Parking: Yes
  • Two Wheeler Parking: Yes
  • Hostel Support: No
  • Girls Wash Room: Yes
  • Female Staff: Yes
  • Fire Alarm System: Yes
  • Fire Extinguishers: Yes
  • Manned Security Building: Yes
  • Security Cams Facility: Yes
Academic Profile
  • Hours: 90
  • Online Query Support: Yes
  • Online Tests: Yes
  • Telephonic Query Support: No
  • Video Classes: No
  • Study Content: Yes
  • Class Hand Outs: Yes
Call us on India +91 9618192305
