10-605: Machine Learning with Large Datasets

Category | Difficulty (Out of 5)
Homework – Programming | 4
Homework – Written | 4
Exams | 4

10-605 is a very popular course in the Machine Learning Department at Carnegie Mellon University. It is a good fit for those who want to understand machine learning at large scale, and it offers both theoretical and programming experience. Topics are covered at moderate depth, but students are pointed to resources for going deeper. The topics are listed below:

Topics Covered

  1. Distributed Computing, Spark
  2. Visualization, PCA
  3. Nonlinear Dimensionality Reduction
  4. Distributed Linear Regression
  5. Kernel Approximations
  6. Logistic Regression, Hashing
  7. Randomized Algorithms
  8. Cloud Computing & Services
  9. Deep Learning – Autodiff
  10. DL Frameworks & Hardware
  11. Large-Scale Optimization
  12. Optimization for DL
  13. Parallel/Distributed DL
  14. Hyperparameter Tuning
  15. Inference, Model Compression
  16. Neural Architecture Search
  17. Federated Learning

Class Structure

  1. Lectures on the topics above
     • Importance & Significance
     • Theoretical Derivations
     • Algorithms
     • Applications
  2. Guest Lectures
  3. Homeworks
     • Programming (Spark, TensorFlow)
     • Written

Homeworks

The homework component of the course is divided into written and programming parts. The programming parts usually involve tools like PySpark on Databricks, giving students hands-on experience with machine learning at scale. After the midterm exam, the programming parts switch to tools like TensorFlow for deep learning tasks. The written component usually covers proofs, derivations, and other concepts taught in class. Students are expected to know linear algebra. One of the homework assignments also lets students explore the AWS cloud computing platform and use its Elastic MapReduce (EMR) service for distributed machine learning.

Exams

The course has two exams, held around the midterm and end-of-term weeks. The exams mainly draw on the lecture slides, guest lectures, and homework. Any student who has worked through this material should find it easy to prepare. The format is simple: a combination of multiple-choice questions (Select One, Select All That Apply, True/False) and short-paragraph-answer questions.

Tips and Tricks to do Well

  • Attend lectures and recitations. You can also review the recordings later on your own time.
  • Refer to the external resources pointed out during the lectures.
  • Start homeworks early and begin preparing for the exams well in advance.
  • Follow up on Piazza for any clarifications or doubts. You can also attend office hours (OH) for help.

Written by CMU students, for CMU students.