Coursera: Machine Learning- Andrew NG(Week 8)Quiz - Unsupervised Learning


 These solutions are for reference only.

It is recommended that you solve the assignments and quizzes yourself honestly; only then does completing the course make sense. If you cannot figure out some part, you can refer to these solutions.

Make sure you understand each solution; don't just copy-paste it.

The correct options below are the ones followed by an explanation (they were originally highlighted in green).

----------------------------------------------------------------------------------------------

▸ Unsupervised Learning:

  1. For which of the following tasks might K-means clustering be a suitable algorithm?
    Select all that apply.

    •  Given a set of news articles from many different news websites, find out what are the main topics covered.
       K-means can cluster the articles, and then we can inspect the clusters or use other methods to infer what topic each cluster represents.

    •  Given historical weather records, predict if tomorrow’s weather will be sunny or rainy.

    •  From the user usage patterns on a website, figure out what different groups of users exist.
       We can cluster the users with K-means to find different, distinct groups.

    •  Given many emails, you want to determine if they are Spam or Non-Spam emails.

    •  Given a database of information about your users, automatically group them into different market segments.
       You can use K-means to cluster the database entries, and each cluster will correspond to a different market segment.

    •  Given sales data from a large number of products in a supermarket, figure out which products tend to form coherent groups (say are frequently purchased together) and thus should be put on the same shelf.
       If you cluster the sales data with K-means, each cluster should correspond to a coherent group of items.

    •  Given sales data from a large number of products in a supermarket, estimate future sales for each of these products.




  2. Suppose we have three cluster centroids μ1, μ2, and μ3. Furthermore, we have a
    training example x(1). After a cluster assignment step, what will c(1) be?

    •  c(1) = 1
        x(1) is closest to μ1, so c(1) = 1.
       (Calculate the Euclidean distance from x(1) to each centroid and choose the smallest one.)

    •  c(1) is not assigned

    •  c(1) = 2

    •  c(1) = 3

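As an illustration of the cluster assignment step, here is a minimal NumPy sketch. The centroid and example values are hypothetical, chosen only for illustration, since the quiz's actual numbers are not reproduced above:

```python
import numpy as np

def assign_cluster(x, centroids):
    """Return the 1-based index of the centroid closest to x."""
    # Squared Euclidean distance from x to every centroid.
    dists = np.sum((centroids - x) ** 2, axis=1)
    # The cluster assignment is the index of the smallest distance.
    return int(np.argmin(dists)) + 1

# Hypothetical values for illustration only:
centroids = np.array([[1.0, 2.0], [-3.0, 0.0], [4.0, 2.0]])
x1 = np.array([-1.0, 2.0])
print(assign_cluster(x1, centroids))  # prints 1: x1 is closest to the first centroid
```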

  3. K-means is an iterative algorithm, and two of the following steps are repeatedly carried out in its inner loop. Which two?

    •  Move the cluster centroids, where the centroids μk are updated.
       The cluster centroid update is the second step of the K-means loop.

    •  The cluster assignment step, where the parameters c(i) are updated.
       This is the correct first step of the K-means loop.

    •  Using the elbow method to choose K.

    •  Feature scaling, to ensure each feature is on a comparable scale to the others.

    •  The cluster centroid assignment step, where each cluster centroid μk is assigned (by setting c(i)) to the closest training example x(i).

    •  Move each cluster centroid μk, by setting it to be equal to the closest training example x(i).

    •  Test on the cross-validation set.

    •  Randomly initialize the cluster centroids.

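The two inner-loop steps above can be sketched together as follows. This is a minimal NumPy illustration, not the course's Octave code, and it runs a fixed number of iterations instead of testing for convergence:

```python
import numpy as np

def kmeans(X, K, n_iters=10, seed=0):
    """Minimal K-means sketch: alternate the two inner-loop steps."""
    rng = np.random.default_rng(seed)
    # Initialize by picking K distinct training examples (the recommended method).
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        # Step 1 -- cluster assignment: c[i] = index of the closest centroid.
        c = np.argmin(((X[:, None] - mu[None]) ** 2).sum(-1), axis=1)
        # Step 2 -- move centroids: each mu[k] becomes the mean of its points.
        for k in range(K):
            if np.any(c == k):  # guard against empty clusters
                mu[k] = X[c == k].mean(axis=0)
    return c, mu
```

A fuller version would stop once the assignments no longer change rather than looping a fixed number of times.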


  4. Suppose you have an unlabeled dataset {x(1), …, x(m)}. You run K-means with 50 different random initializations, and obtain 50 different clusterings of the data.

    What is the recommended way for choosing which one of these 50 clusterings to use?

    •  Use the elbow method.

    •  Plot the data and the cluster centroids, and pick the clustering that gives the most “coherent” cluster centroids.

    •  Manually examine the clusterings, and pick the best one.

    •  Compute the distortion function J(c(1), …, c(m), μ1, …, μK), and pick the one that minimizes this.
       A lower value for the distortion function implies a better clustering, so you should choose the clustering with the smallest value for the distortion function.

    •  The only way to do so is if we also have labels  for our data.

    •  Always pick the final (50th) clustering found, since by that time it is more likely to have converged to a good solution.

    •  The answer is ambiguous, and there is no good way of choosing.

    •  For each of the clusterings, compute (1/m) Σi ‖x(i) − μc(i)‖², and pick the one that minimizes this.
       This function is the distortion function J. Since a lower value for the distortion function implies a better clustering, you should choose the clustering with the smallest value for the distortion function.

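The distortion computation above can be sketched in NumPy as follows (the variable names and the tiny worked example are illustrative):

```python
import numpy as np

def distortion(X, c, mu):
    """J(c(1),...,c(m), mu_1,...,mu_K) = (1/m) * sum_i ||x(i) - mu_c(i)||^2."""
    # mu[c] picks, for every example, the centroid it is assigned to.
    return float(np.mean(np.sum((X - mu[c]) ** 2, axis=1)))

# Tiny worked example: two points assigned to one centroid at their midpoint.
X = np.array([[0.0, 0.0], [2.0, 0.0]])
c = np.array([0, 0])
mu = np.array([[1.0, 0.0]])
print(distortion(X, c, mu))  # prints 1.0: each point is at squared distance 1
```

Given 50 runs, each yielding some assignment/centroid pair (c, mu), you would keep the pair with the smallest distortion(X, c, mu).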


  5. Which of the following statements are true? Select all that apply.

    •  On every iteration of K-means, the cost function J(c(1), …, c(m), μ1, …, μK) (the distortion function) should either stay the same or decrease; in particular, it should not increase.
       Both the cluster assignment and cluster update steps decrease (or leave unchanged) the cost/distortion function, so it should never increase after an iteration of K-means.

    •  A good way to initialize K-means is to select K (distinct) examples from the training set and set the cluster centroids equal to these selected examples.
       This is the recommended method of initialization.

    •  K-Means will always give the same results regardless of the initialization of the centroids.

    •  Once an example has been assigned to a particular centroid, it will never be reassigned to another different centroid.

    •  For some datasets, the “right” or “correct” value of K (the number of clusters) can be ambiguous, and hard even for a human expert looking carefully at the data to decide.
       In many datasets, different choices of K will give different clusterings which appear quite reasonable. With no labels on the data, we cannot say one is better than the other.

    •  The standard way of initializing K-means is setting the centroids μk to be equal to a vector of zeros.

    •  If we are worried about K-means getting stuck in bad local optima, one way to ameliorate (reduce) this problem is if we try using multiple random initializations.
       Since each run of K-means is independent, multiple runs can find different optima, and some should avoid bad local optima.

    •  Since K-Means is an unsupervised learning algorithm, it cannot overfit the data, and thus it is always better to have as large a number of clusters as is computationally feasible.

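The first statement (J never increases across iterations) can be checked empirically. The sketch below records the distortion after every iteration of a toy K-means run on synthetic data; the data and iteration count are arbitrary choices for illustration:

```python
import numpy as np

def kmeans_with_history(X, K, n_iters=10, seed=0):
    """Run K-means and record the distortion J after each iteration."""
    rng = np.random.default_rng(seed)
    # Recommended initialization: K distinct training examples as centroids.
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    history = []
    for _ in range(n_iters):
        # Cluster assignment step: c[i] = index of the closest centroid.
        c = np.argmin(((X[:, None] - mu[None]) ** 2).sum(-1), axis=1)
        # Move-centroid step: each centroid becomes the mean of its points.
        for k in range(K):
            if np.any(c == k):
                mu[k] = X[c == k].mean(axis=0)
        history.append(float(np.mean(np.sum((X - mu[c]) ** 2, axis=1))))
    return history

# Synthetic data; each recorded J should be <= the previous one.
J_history = kmeans_with_history(np.random.default_rng(1).normal(size=(30, 2)), K=3)
```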