Coursera: Machine Learning-Andrew NG(Week 3) Quiz - Logistic Regression - codemummy |online technical computer science platform.

These solutions are for reference only.

try to solve on your own

but if you get stuck in between than you can refer these solutions

there are different set of questions ,

we have provided the variations in particular question at the end.

read questions carefully before marking

-----------------------------------------------------------------------------------------

  Our estimate for P(y=0|x;\theta)P(y=0∣x;θ) is 0.3.   

  Our estimate for P(y=1|x;\theta)P(y=1∣x;θ) is 0.7.   

  Our estimate for P(y=0|x;\theta)P(y=0∣x;θ) is 0.7.   

  Our estimate for P(y=1|x;\theta)P(y=1∣x;θ) is 0.3. 

EXPLANATION:

Our estimate for P(y=1|x;θ) is 0.7 =>because =>hθ(x) = 0.7

Our estimate for P(y=0|x;θ) is 0.3 =>because => P(y=0|x;θ) = 1 - P(y = 1| x; θ); the former is 1 - 0.7= 0.3

J(\theta)J(θ) will be a convex function, so gradient descent should converge to the global minimum.    

  Adding polynomial features (e.g., instead using h_\theta(x) = g(\theta_0 + \theta_1x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2)hθ​(x)=g(θ0​+θ1​x1​+θ2​x2​+θ3​x12​+θ4​x1​x2​+θ5​x22​) ) could increase how well we can fit the training data.    

  The positive and negative examples cannot be separated using a straight line. So, gradient descent will fail to converge.    

  Because the positive and negative examples cannot be separated using a straight line, linear regression will perform as well as logistic regression on this data.   

EXPLANATION:

J(θ) will be a convex function, so gradient descent should converge to the global minimum.(true)
=>fact

Adding polynomial features (e.g., instead using hθ(x) = g(θ0 + θ1x1 + θ2x2 + θ3x2 + θ4x1x2 + θ5x2 )) could increase how well we can fit the training data (true)
=>Adding new features can only improve the fit on the training set: since setting θ3 = θ4 = θ5 = 0 makes the hypothesis the same as the original one, gradient descent will use those features (by making the corresponding non-zero) only if doing so improves the training set fit

other statements that can occur in this question:

At the optimal value of θ (e.g., found by fminunc), we will have J(θ) ≥ 0. (true)

   \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m{ \left(\frac{1}{1 + e^{-\theta^T x^{(i)}}} - y^{(i)}\right) x_j^{(i)}} θj​:=θj​−αm1​∑i=1m​(1+e−θTx(i)1​−y(i))xj(i)​ (simultaneously update for all jj).    

   \theta := \theta - \alpha \frac{1}{m} \sum_{i=1}^m{ \left(\theta^T x - y^{(i)}\right) x^{(i)}} θ:=θ−αm1​∑i=1m​(θTx−y(i))x(i).   

  \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m{ (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}} θj​:=θj​−αm1​∑i=1m​(hθ​(x(i))−y(i))xj(i)​ (simultaneously update for all jj).    

  \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m{ (h_\theta(x^{(i)}) - y^{(i)}) x^{(i)}} θj​:=θj​−αm1​∑i=1m​(hθ​(x(i))−y(i))x(i) (simultaneously update for all jj).    

variation to 3rd question is provided at the end.

  For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum).  This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc).  

  Linear regression always works well for classification if you classify by using a threshold on the prediction made by linear regression.

  The cost function J(\theta)J(θ) for logistic regression trained with m \geq 1m≥1 examples is always greater than or equal to zero.   

   The sigmoid function g(z) = \frac{1}{1 + e^{-z}}g(z)=1+e−z1​ is never greater than one ( >1 >1).   

EXPLANATION:

The cost function J(θ) for logistic regression trained with examples is always greater than or equal to zero.(true)
=>The cost for any example x(i) is always ≥ 0 since it is the negative log of a quantity less than one. The cost function J(θ) is a summation over the cost for each eample, so the cost function itself must be greater than or equal to zero.

The sigmoid function is never greater than one(true)
=>fact

other statements that can occur in this question:

The one-vs-all technique allows you to use logistic regression for problems in which each y(i)comes from a fixed, discrete set of values. (true)
=>If each y(i) is one of k different values, we can give a label to each y(i)belongs{1,2,....,k} and use one-vs-all as described in the lecture.

Figure:

Figure:

Figure:

Figure:

EXPLANATION:

In this figure, we transition from negative to positive when x1 goes from left of 6 to right of 6 which is true for the given values of θ.

$h_{θ} (x) = g (θ_{0} + θ_{1} x_{1} + θ_{2} x_{2})$ .

$where θ_{0} = - 6, θ_{1} = 0, θ_{2} = 1$ .

-----------------------------------------------------------------------------

variations in 5 th question:

Figure:

Figure:

Figure:

Figure:

variations in 3 th question:

   \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m{ \left(\theta^T x - y^{(i)}\right) x_j^{(i)}} θj​:=θj​−αm1​∑i=1m​(θTx−y(i))xj(i)​  (simultaneously update for all jj).  

   \theta := \theta - \alpha \frac{1}{m} \sum_{i=1}^m{ \left(\frac{1}{1 + e^{-\theta^T x^{(i)}}} - y^{(i)}\right) x^{(i)}} θ:=θ−αm1​∑i=1m​(1+e−θTx(i)1​−y(i))x(i).    

  \theta := \theta - \alpha \frac{1}{m} \sum_{i=1}^m{ (h_\theta(x^{(i)}) - y^{(i)}) x^{(i)}} θ:=θ−αm1​∑i=1m​(hθ​(x(i))−y(i))x(i).    

   \theta := \theta - \alpha \frac{1}{m} \sum_{i=1}^m{ \left(\theta^T x - y^{(i)}\right) x^{(i)}} θ:=θ−αm1​∑i=1m​(θTx−y(i))x(i).   

---------------------------------------------------------------------------------

reference : coursera

Coursera: Machine Learning-Andrew NG(Week 3) Quiz - Logistic Regression

Logistic Regression

Top Coding Questions

Quantitative Aptitude

Popular Posts

Tags