It turned out to be a wonderful Fall Semester at Georgia Tech through their Online Master of Science in Computer Science (OMSCS) program. As of 2015, OMSCS is a unique MOOC (Massive Open Online Course) partnership with Udacity. The most promising aspect of OMSCS is its accessible, affordable, and challenging curriculum. I personally applied to the program to dive deeper into machine learning, which was difficult to do on my own with no additional stakes. Coursera is a great place to start building a solid data science and analytics foundation; the courses range from free to super affordable.
Now, for what you came for: details about Machine Learning at Georgia Tech (CS 7641). I wrote this post fresh from taking this course. CS 7641 is unofficially rated the 3rd most difficult class among all OMSCS offerings and ranked #2 in expected weekly hours, at about 20 hours a week. Still, many students and I believe that plentiful intellectual rewards await upon completion of the course. Don’t take my word for it; read through comments by other students. Also, note there are opportunities for “extra credit,” from problem sets to peer review participation.
There are things you can do to calm the stress of this required OMSCS class. There are no spoilers here, so the machine learning journey is still your own to create, even with a few extra tips. May the (learning) force be with you.
- Trust in yourself. Believe in your ability to get the math, programming, and theory. It may not stick the first 3 times, but keep at it. I suggest only considering this class after your first semester in the program, because by then you’ll have some success in the program to point back to. There is nothing like a good pep talk to keep yourself from dropping the course. The professors and teaching assistants will tell you the same thing about sticking with the materials. There are enough assignments and tests in the curriculum to pull yourself up from a less-than-ideal start.
- Previous Machine Learning knowledge required. At least some exposure to common machine learning concepts will make CS 7641 less stressful and more focused on mastery. This is an old principle for any learning scenario: repetition helps you learn. It is especially true for Machine Learning, since you’ll be taking a dense survey of the entire field. It will ease some frustrations if you can run through any one of the following. These concepts will be covered again in CS 7641 in greater detail.
- Be open to any tool or language. I started this course thinking I would run everything in Python machine learning packages. That turned out to be untrue, and most students used more than one language in the course. Being able to read, understand, and implement open source machine learning packages in R, Python, or Java will save your butt.
- Contribute and help out your fellow (struggling) students. Pretty much everyone is learning together. If you’re having a hard time, then other people are too. Google Groups and Piazza are integral to your success, so comment, gripe, and share as much as possible.
Finally, more concretely, here are a few high-level insights for each assignment. The implementation can be tedious and frustrating, which is the norm in machine learning. In practice, the same stressors apply: time, dirty data, inefficient algorithms or databases. Note that things change quickly: I took the course in Spring 2015, and much may have changed between then and the time you are reading this post.
- Data sets: There is no extra glory for complicated, hard-to-figure-out data sets. They will only add to your frustration. Don’t get too hung up on this part of the exercise. I suggest picking one data set that is more complicated (don’t go nuts!) and one that is very straightforward (like almost tooooo easy).
- Supervised Learning: Focus on completing the implementation and on showing the different options of each model listed in the task. Graphs and charts matter the most in each assignment, especially since they let graders quickly view the results of models and inputs. If you have taken other machine learning courses, supervised learning was probably covered more extensively there. The advantage is that the materials and code packages will not be new to you. The disadvantage is that old habits and code may have to be refactored. The most complete packages for this assignment are listed below; don’t be afraid to use multiple languages and resources:
- (1) Java Weka software, which can be driven from code through a wrapper and/or used manually through its user interface. This was my choice for implementation. I used a Jython wrapper in Python.
- (2) R packages caret, knn, kernlab, ada.
- Python will work well except for neural network and decision tree implementations, unless you want to manually tweak the source code.
- Orange is another GUI tool that seemed promising, but it wasn’t able to work through more complicated results without freezing my laptop.
- Optimization and Randomization: Java ABAGAIL is an open source package referenced and written by one of the course contributors and creators. This area was unknown territory for me and was not covered in the Coursera and General Assembly Data Science courses. I took the path of least resistance for this assignment and used the most complete package referenced by the professor. My main problems were my Java proficiency and setting up Java on my Mac.
- Unsupervised Learning: This assignment is where I discovered my love for R, including the packages fastICA, FactoMineR, nFactors, and psych. Each package I used provided robust visualization tools, which were highly appreciated during this assignment.
- Markov Decision Processes (MDPs): Java BURLAP is another open source tool for implementing and evaluating MDPs. Also available is MDPtoolbox for Python and R.
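For readers who have never touched an MDP before, a tiny example may make the last assignment feel less abstract. The sketch below runs value iteration, one standard way to solve an MDP, in plain Python. Everything in it is an assumption made up for illustration: the four-state chain world, the reward of 1 for reaching the last state, the discount of 0.9, and the function names. It does not use or imitate the BURLAP or MDPtoolbox APIs, and it is not taken from the course materials.

```python
# Toy MDP (hypothetical, for illustration only): states 0..3 in a row.
# State 3 is terminal. Actions: 0 = move left, 1 = move right.
# Moves are deterministic; entering state 3 pays a reward of 1.
GAMMA = 0.9          # discount factor (assumed value)
N_STATES = 4
ACTIONS = (0, 1)
TERMINAL = 3


def step(state, action):
    """Return (next_state, reward) for taking `action` in `state`."""
    if state == TERMINAL:
        return state, 0.0
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def q_value(values, state, action):
    """One-step lookahead: immediate reward plus discounted next-state value."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tol=1e-9):
    """Sweep over the states until no value changes by more than `tol`."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue  # the terminal state keeps a value of 0
            best = max(q_value(values, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick the best action for each non-terminal state."""
    return [
        max(ACTIONS, key=lambda a: q_value(values, s, a))
        for s in range(N_STATES)
        if s != TERMINAL
    ]


if __name__ == "__main__":
    v = value_iteration()
    print("state values:", [round(x, 3) for x in v])
    print("greedy policy (0 = left, 1 = right):", greedy_policy(v))
```

In this toy world the values grow as you get closer to the rewarding state, and the greedy policy is to always move right. The packages above do the same kind of computation on much larger problems and add the tooling you will want for comparing algorithms.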
Photo credit: Manuel Barroso Parejo on Unsplash (https://unsplash.com/@lute3d)