Coursera: Machine Learning - Andrew Ng (Week 1) Quiz - Linear Regression with One Variable - codemummy | online technical computer science platform

These solutions are for reference only.
Try to solve the questions on your own first;
if you get stuck, you can refer to these solutions.

There are different sets of questions;
the variations of particular questions are provided at the end.
Read each question carefully before marking your answer.

--------------------------------------------------------------------

Linear Regression with One Variable

TOTAL POINTS 5
Question 1
Consider the problem of predicting how well a student does in her second year of college/university, given how well she did in her first year.
Specifically, let x be equal to the number of "A" grades (including A-, A and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of y, which we define as the number of "A" grades they get in their second year (sophomore year).
Here each row is one training example. Recall that in linear regression, our hypothesis is $h_\theta(x) = \theta_0 + \theta_1x$ , and we use $m$ to denote the number of training examples.
For the training set given above (note that this training set may also be referenced in other questions in this quiz), what is the value of $m$ ? In the box below, please enter your answer (which should be a number between 0 and 10).
1 point
Question 2
Consider the following training set of $m=4$ training examples:
   x       y
   1       0.5
   2       1
   4       2
   0       0
Consider the linear regression model $h_\theta(x) = \theta_0 + \theta_1x$ . What are the values of $\theta_0$ and $\theta_1$ that you would expect to obtain upon running gradient descent on this model? (Linear regression will be able to fit this data perfectly.)
1 point
$\theta_0 = 0.5, \theta_1 = 0.5$
$\theta_0 = 0 , \theta_1 = 0.5$
$\theta_0 = 1, \theta_1 = 1$
$\theta_0 = 1, \theta_1 = 0.5$
$\theta_0 = 0.5, \theta_1 = 0$

explanation:
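As J(θ0,θ1)=0, we have y = hθ(x) = θ0 + θ1x for every training example. Using any two rows of the table, solve for θ0 and θ1: the row (0, 0) gives θ0 = 0, and the row (1, 0.5) then gives θ1 = 0.5.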
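
The two-point calculation above can be checked with a few lines of Python. This is only a sketch: the helper name `line_through` is made up for illustration, and it assumes, as the question states, that a straight line fits the data perfectly.

```python
def line_through(p, q):
    """Return (theta0, theta1) for the straight line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    theta1 = (y2 - y1) / (x2 - x1)   # slope
    theta0 = y1 - theta1 * x1        # intercept
    return theta0, theta1

# Training set from Question 2 as (x, y) pairs.
training_set = [(1, 0.5), (2, 1), (4, 2), (0, 0)]

# Any two rows determine the line; here we use the first two.
theta0, theta1 = line_through(training_set[0], training_set[1])

# Confirm that the same line passes through every training example.
fits_all = all(abs(theta0 + theta1 * x - y) < 1e-12 for x, y in training_set)
print(theta0, theta1, fits_all)
```

Because the line through two of the rows also passes through the other two, the cost is zero at these parameter values.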

Question 3
Suppose we set $\theta_0 = 0$, $\theta_1 = 1.5$. What is $h_\theta(2)$ ?

explanation:
Setting x = 2, we have hθ(x) = θ0 + θ1x = 0 + (1.5)(2) = 3

Question 4
Let $f$ be some function so that
$f(\theta_0, \theta_1)$ outputs a number. For this problem,
$f$ is some arbitrary/unknown smooth function (not necessarily the
cost function of linear regression, so $f$ may have local optima).
Suppose we use gradient descent to try to minimize $f(\theta_0, \theta_1)$
as a function of $\theta_0$ and $\theta_1$ . Which of the
following statements are true? (Check all that apply.)
1 point
If $\theta_0$ and $\theta_1$ are initialized at the global minimum, then one iteration will not change their values.
Setting the learning rate $\alpha$ to be very small is not harmful, and can only speed up the convergence of gradient descent.
If the first few iterations of gradient descent cause $f(\theta_0, \theta_1)$ to increase rather than decrease, then the most likely cause is that we have set the learning rate $\alpha$ to too large a value.
No matter how $\theta_0$ and $\theta_1$ are initialized, so long as $\alpha$ is sufficiently small, we can safely expect gradient descent to converge to the same solution.

explanation:
True If θ0 and θ1 are initialized at a local minimum, then one iteration will not change their values. At a local minimum (and therefore also at the global minimum), the derivative (gradient) is zero, so gradient descent will not change the parameters.
False Setting the learning rate to be very small is not harmful, and can only speed up the convergence of gradient descent. If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, so this would actually slow down (rather than speed up) the convergence of the algorithm.
True If the first few iterations of gradient descent cause f(θ0,θ1) to increase rather than decrease, then the most likely cause is that we have set the learning rate to too large a value. If alpha were small enough, then gradient descent should always successfully take a tiny step downhill and decrease f(θ0,θ1) at least a little bit. If gradient descent instead increases the objective value, that means alpha is too large (or you have a bug in your code!).
False No matter how θ0 and θ1 are initialized, so long as the learning rate is sufficiently small, we can safely expect gradient descent to converge to the same solution. This is not true: depending on the initial condition, gradient descent may end up at different local optima.
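
The four answers above can be illustrated with a small experiment. The function below is an assumption chosen only for illustration (it is not from the quiz): f(θ0,θ1) = (θ0² − 1)² + θ1², which has two minima, at (1, 0) and (−1, 0). The helper names are likewise hypothetical.

```python
def f(t0, t1):
    # Illustrative non-convex function with minima at (1, 0) and (-1, 0).
    return (t0 ** 2 - 1) ** 2 + t1 ** 2

def grad_f(t0, t1):
    # Partial derivatives of f with respect to t0 and t1.
    return 4 * t0 * (t0 ** 2 - 1), 2 * t1

def gradient_descent(t0, t1, alpha, iterations):
    for _ in range(iterations):
        g0, g1 = grad_f(t0, t1)
        # Simultaneous update of both parameters.
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1
    return t0, t1

# Starting at a minimum: the gradient is zero, so nothing changes.
at_minimum = gradient_descent(1.0, 0.0, alpha=0.05, iterations=1)

# Same small learning rate, different starting points: different minima.
right = gradient_descent(0.5, 1.0, alpha=0.05, iterations=2000)
left = gradient_descent(-0.5, 1.0, alpha=0.05, iterations=2000)

# A learning rate that is too large: one step increases f.
overshoot = gradient_descent(0.5, 1.0, alpha=1.0, iterations=1)

print(at_minimum, right, left)
print(f(0.5, 1.0), f(*overshoot))
```

With the small learning rate, the run started at θ0 = 0.5 settles near (1, 0) and the run started at θ0 = −0.5 settles near (−1, 0), while the single step with α = 1 raises the value of f instead of lowering it.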

Other options that can appear in question 4:
True If the learning rate is too small, then gradient descent may take a very long time to converge. If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, and therefore can take a long time to converge.
False Even if the learning rate α is very large, every iteration of gradient descent will decrease the value of f(θ0,θ1). If the learning rate is too large, one step of gradient descent can vastly "overshoot" and actually increase the value of f(θ0,θ1).
False If θ0 and θ1 are initialized so that θ0=θ1, then by symmetry (because we do simultaneous updates to the two parameters), after one iteration of gradient descent, we will still have θ0=θ1. The updates to θ0 and θ1 are different (even though we're doing simultaneous updates), so there's no particular reason for them to be the same after one iteration of gradient descent.
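
The last point can be seen in a single gradient step. The sketch below assumes, purely for illustration, that f is the linear regression cost on the Question 2 training set (the quiz question itself is about an arbitrary f). The function name `gradient_step` is hypothetical.

```python
def gradient_step(theta0, theta1, data, alpha):
    """One simultaneous gradient descent update for the squared-error cost."""
    m = len(data)
    # Prediction error for each example, paired with its x value.
    errors = [(theta0 + theta1 * x - y, x) for x, y in data]
    grad0 = sum(e for e, _ in errors) / m        # dJ/dtheta0
    grad1 = sum(e * x for e, x in errors) / m    # dJ/dtheta1
    return theta0 - alpha * grad0, theta1 - alpha * grad1

data = [(1, 0.5), (2, 1), (4, 2), (0, 0)]

# Start with theta0 == theta1 and take one step.
theta0, theta1 = gradient_step(1.0, 1.0, data, alpha=0.1)
print(theta0, theta1)
```

The two partial derivatives differ (the second one weights each error by x), so the parameters move by different amounts and are no longer equal after the step.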

Question 5
Suppose that for some training set we have found $\theta_0$ and $\theta_1$ such that $J(\theta_0, \theta_1) = 0$. Which of the statements below must then be true?
Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line.
Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.
For this to be true, we must have $y^{(i)} = 0$ for every value of $i = 1, 2, \ldots, m$ .
For this to be true, we must have $\theta_0 = 0$ and $\theta_1 = 0$ so that $h_\theta(x) = 0$ .
explanation:

False For this to be true, we must have y(i)=0 for every value of i=1,2,…,m. So long as all of our training examples lie on a straight line, we will be able to find θ0 and θ1 so that J(θ0,θ1)=0. It is not necessary that y(i)=0 for all our examples.
False Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum. The cost function J of linear regression is convex (bowl-shaped), so it has no local minima other than the global minimum.
False For this to be true, we must have θ0=0 and θ1=0 so that hθ(x)=0. If J(θ0,θ1)=0, that means the line defined by the equation "y = θ0 + θ1x" perfectly fits all of our data. There's no particular reason to expect that the values of θ0 and θ1 that achieve this are both 0 (unless y(i)=0 for all of our training examples).
True Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line. J(θ0,θ1)=0 means every prediction error is zero, so every training example lies on the line y = θ0 + θ1x.

Other options that can appear in question 5:

False We can perfectly predict the value of y even for new examples that we have not yet seen (e.g., we can perfectly predict prices of even new houses that we have not yet seen). A perfect fit on the training set does not guarantee anything about examples outside it.
False This is not possible: by the definition of J(θ0,θ1), it is not possible for there to exist θ0 and θ1 so that J(θ0,θ1)=0. J is a sum of squared errors, so it is zero whenever every error is zero, which happens when all training examples lie on one line.
True For these values of θ0 and θ1 that satisfy J(θ0,θ1)=0, we have that hθ(x(i))=y(i) for every training example (x(i),y(i)).
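
As a concrete check of these answers, the sketch below evaluates the cost on the Question 2 training set, where neither the y values nor the fitted parameters are all zero. The function name `cost` is hypothetical; the formula is the squared-error cost with the 1/(2m) factor used in this quiz.

```python
def cost(theta0, theta1, data):
    """Squared-error cost J(theta0, theta1) = (1 / 2m) * sum of squared errors."""
    m = len(data)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in data) / (2 * m)

# Training set from Question 2: the y values are not all zero.
data = [(1, 0.5), (2, 1), (4, 2), (0, 0)]

j_fit = cost(0.0, 0.5, data)    # parameters of the perfectly fitting line
j_zero = cost(0.0, 0.0, data)   # theta0 = theta1 = 0 for comparison

# With J = 0, every prediction matches its target exactly.
all_match = all(0.0 + 0.5 * x == y for x, y in data)
print(j_fit, j_zero, all_match)
```

Here J is zero at θ0 = 0, θ1 = 0.5 even though θ1 is not zero and most y values are not zero, while setting both parameters to zero gives a positive cost.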

------------------------------------------------------
variations in 2nd question:

2. For this question, assume that we are using the training set from Q1.
Recall that our definition of the cost function was $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$ .
What is $J(0,1)$ ? In the box below,
please enter your answer (simplify fractions to decimals when entering your answer, and use '.' as the decimal delimiter, e.g., 1.5).
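
As a worked step, substituting $\theta_0 = 0$ and $\theta_1 = 1$ into the hypothesis gives $h_\theta(x) = x$, so the cost reduces to a sum of squared differences between each $x^{(i)}$ and $y^{(i)}$. The expansion below uses only the definition above; the numeric answer still depends on the Q1 training set.

$$
J(0, 1) = \frac{1}{2m} \sum_{i=1}^{m} \left( 0 + 1 \cdot x^{(i)} - y^{(i)} \right)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left( x^{(i)} - y^{(i)} \right)^2
$$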

Question 2
Many substances that can burn (such as gasoline and alcohol) have a chemical structure based on carbon atoms; for this reason they are called hydrocarbons. A chemist wants to understand how the number of carbon atoms in a molecule affects how much energy is released when that molecule combusts (meaning that it is burned). The chemist obtains the dataset below. In the column on the right, “kJ/mol” is the unit measuring the amount of energy released.
You would like to use linear regression ( $h_{\theta}(x) = \theta_0 + \theta_1 x$ ) to estimate the amount of energy released (y) as a function of the number of carbon atoms (x). Which of the following do you think will be the values you obtain for $\theta_0$ and $\theta_1$ ? You should be able to select the right answer without actually implementing linear regression.
$\theta_0 = -569.6, \theta_1 = -530.9$
$\theta_0 = -1780.0, \theta_1 = 530.9$
$\theta_0 = -1780.0, \theta_1 = -530.9$
$\theta_0 = -569.6, \theta_1 = 530.9$

variations in 3rd question:

3. Suppose we set θ0=−1,θ1=0.5. What is hθ(4)?

explanation:
Setting x = 4, we have hθ(x)=θ0+θ1x = -1 + (0.5)(4) = 1

3. Suppose we set $\theta_0 = -2$, $\theta_1 = 0.5$ in the linear regression hypothesis from Q1. What is $h_\theta(6)$ ?

explanation:
Setting x = 6, we have hθ(x)=θ0+θ1x = -2 + (0.5)(6) = 1
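
The substitutions in these explanations can be reproduced with a one-line hypothesis function. This is a minimal sketch; the function name `h` simply mirrors the notation hθ(x) and is not part of the quiz.

```python
def h(theta0, theta1, x):
    """Linear regression hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

first = h(-1, 0.5, 4)    # theta0 = -1, theta1 = 0.5, x = 4
second = h(-2, 0.5, 6)   # theta0 = -2, theta1 = 0.5, x = 6
print(first, second)
```

Both variations evaluate to 1, matching the explanations above.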

variations in 4th question:

Point P (The global minimum of plot 2) corresponds to point C of Plot 1.
If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach at or near point A, as the value of cost function $J(\theta_0,\theta_1)$ is maximum at point A.
If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach at or near point C, as the value of cost function $J(\theta_0,\theta_1)$ is minimum at point C.
Point P (the global minimum of plot 2) corresponds to point A of Plot 1.
If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach at or near point A, as the value of cost function $J(\theta_0,\theta_1)$ is minimum at A.

reference : coursera
