Coursera: Machine Learning - Andrew Ng (Week 11) Quiz - Application - Photo OCR
These solutions are for reference only.
It is recommended that you solve the assignment and quiz honestly by yourself; only then does completing the course make sense.
If you cannot figure out some part of it, you can refer to these solutions.
Make sure you understand the solution.
Do not just copy and paste it.
The correct answers are the options followed by an explanation.
----------------------------------------------------------------------------------------------
▸ Application - Photo OCR :
- Suppose you are running a sliding window detector to find text in images. Your input images are 1000x1000 pixels. You will run your sliding windows detector at two scales, 10x10 and 20x20 (i.e., you will run your classifier on lots of 10x10 patches to decide if they contain text or not; and also on lots of 20x20 patches), and you will “step” your detector by 2 pixels each time. About how many times will you end up running your classifier on a single 1000x1000 test set image?
- 250,000
- 500,000
With a stride of 2, you will run your classifier approximately 500 times for each dimension. Since you run the classifier twice (at two scales), you will run it 2 * 500 * 500 = 500,000 times.
- 1,000,000
- 100,000
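The estimate in the explanation above can be checked with a short calculation. This is a minimal sketch, assuming windows must fit entirely inside the image (no padding); the function names are made up for illustration and are not part of the course material.

```python
def window_positions(image_size: int, window_size: int, stride: int) -> int:
    """Number of positions a window can take along one dimension,
    assuming the window must lie fully inside the image."""
    return (image_size - window_size) // stride + 1


def total_classifier_runs(image_size: int, window_sizes: list, stride: int) -> int:
    """Total classifier evaluations over a square image for all window scales."""
    return sum(window_positions(image_size, w, stride) ** 2 for w in window_sizes)


# Exact count under the no-padding assumption.
exact = total_classifier_runs(1000, [10, 20], 2)

# Rough estimate used in the explanation: about 500 positions per dimension, two scales.
approx = 2 * (1000 // 2) ** 2

print(exact, approx)
```

The exact count is slightly below the rough estimate because windows cannot start too close to the right and bottom edges, but 500,000 is still the closest option.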
- Suppose that you just joined a product team that has been developing a machine learning application, using m = 1,000 training examples. You discover that you have the option of hiring additional personnel to help collect and label data. You estimate that you would have to pay each of the labellers $10 per hour, and that each labeller can label 4 examples per minute. About how much will it cost to hire labellers to label 10,000 new training examples?
- $400
One labeller can label 4 × 60 = 240 examples in one hour. Labelling 10,000 examples therefore takes 10,000/240 ≈ 42 hours. At $10 an hour, this costs about $417, so $400 is the closest option.
- $600
- $10,000
- $250
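The labelling cost arithmetic can be written out as a small helper. This is only a sketch; the function name and parameter names are hypothetical, and it assumes a single constant labelling rate with no overhead.

```python
def labelling_cost(num_examples: int, examples_per_minute: float, hourly_rate: float) -> float:
    """Cost of labelling num_examples at a constant rate, paid by the hour."""
    examples_per_hour = examples_per_minute * 60
    hours_needed = num_examples / examples_per_hour
    return hours_needed * hourly_rate


# Values taken from the question: 10,000 examples, 4 per minute, $10 per hour.
cost = labelling_cost(10_000, 4, 10.0)
print(f"${cost:.2f}")
```

The result is a little above $400, which is why $400 is the closest of the listed options.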
- What are the benefits of performing a ceiling analysis? Check all that apply.
- If we have a low-performing component, the ceiling analysis can tell us if that component has a high bias problem or a high variance problem.
- A ceiling analysis helps us to decide what is the most promising learning algorithm (e.g., logistic regression vs. a neural network vs. an SVM) to apply to a specific component of a machine learning pipeline.
- It gives us information about which components, if improved, are most likely to have a significant impact on the performance of the final system.
The ceiling analysis gives us this information by comparing the baseline overall system performance with the performance obtained when each component's output is replaced by ground truth.
- It can help indicate that certain components of a system might not be worth a significant amount of work improving, because even if it had perfect performance its impact on the overall system may be small.
An unpromising component will have little effect on overall performance when it is replaced with ground truth.
- It is a way of providing additional training data to the algorithm.
- It helps us decide on allocation of resources in terms of which component in a machine learning pipeline to spend more effort on.
The ceiling analysis reveals which parts of the pipeline have the most room to improve the performance of the overall system.
- Suppose you are building an object classifier, that takes as input an image, and recognizes that image as either containing a car (y = 1) or not (y = 0). For example, here are a positive example and a negative example:
After carefully analyzing the performance of your algorithm, you conclude that you need more positive (y = 1) training examples. Which of the following might be a good way to get additional positive examples?
- Mirror your training images across the vertical axis (so that a left-facing car now becomes a right-facing one).
A mirrored example is different from the original but equally likely to occur, so mirroring is a good way to generate new data.
- Take a few images from your training set, and add random, Gaussian noise to every pixel.
- Take a training example and set a random subset of its pixels to 0 to generate a new example.
- Select two car images and average them to make a third example.
- Apply translations, distortions, and rotations to the images already in your training set.
These geometric distortions are likely to occur in real-world images, so they are a good way to generate additional data.
- Make two copies of each image in the training set; this immediately doubles your training set size.
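The two useful augmentations above (mirroring and translation) can be sketched on a toy image. This assumes an image stored as a list of pixel rows; the helper names are hypothetical, and a real pipeline would more likely use an image library.

```python
from typing import List

Image = List[List[int]]  # rows of pixel intensities (an assumed toy representation)


def mirror_across_vertical_axis(image: Image) -> Image:
    """Flip each row left to right, so a left-facing car becomes right-facing."""
    return [row[::-1] for row in image]


def translate_right(image: Image, shift: int, fill: int = 0) -> Image:
    """Shift the image right by `shift` pixels, padding the left edge with `fill`."""
    if shift < 0:
        raise ValueError("shift must be non-negative")
    width = len(image[0])
    shift = min(shift, width)
    return [[fill] * shift + row[: width - shift] for row in image]


toy_image = [[1, 2, 3],
             [4, 5, 6]]

mirrored = mirror_across_vertical_axis(toy_image)
shifted = translate_right(toy_image, 1)

# Each transformed copy keeps the original label (y = 1 for a car image).
augmented_set = [toy_image, mirrored, shifted]
```

Both transformations produce images that differ from the original yet could plausibly appear in real data, unlike exact copies or per-pixel noise.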
- Suppose you have a PhotoOCR system, where you have the following pipeline:
You have decided to perform a ceiling analysis on this system, and find the following:
Which of the following statements are true?
- There is a large gain in performance possible in improving the character recognition system.
Plugging in ground truth character recognition gives an 18% improvement over running the character recognition system on ground truth character segmentation. Thus there is a good deal of room for overall improvement by improving character recognition.
- Performing the ceiling analysis shown here requires that we have ground-truth labels for the text detection, character segmentation and the character recognition systems.
At each step, we provide the system with the ground-truth output of the previous step in the pipeline. This requires ground truth for every step of the pipeline.
- The potential benefit to having a significantly improved text detection system is small, and thus it may not be worth significant effort trying to improve it.
Plugging in ground truth text detection improved the overall system by only 2%, so it is not a good candidate for development effort.
- The least promising component to work on is the character recognition system, since it is already obtaining 100% accuracy.
- The most promising component to work on is the text detection system, since it has the lowest performance (72%) and thus the biggest potential gain.
- We should dedicate significant effort to collecting additional training data for the text detection system.
- If the text detection system was trained using gradient descent, running gradient descent for more iterations is unlikely to help much.
Plugging in ground truth text detection improved the overall system by only 2%, so even if you could improve text detection performance with more gradient descent iterations, this would have minimal impact on the overall system performance.
- If we conclude that the character recognition’s errors are mostly due to the character recognition system having high variance, then it may be worth significant effort obtaining additional training data for character recognition.
Since the biggest improvement comes from character recognition ground truth, we would like to improve the performance of that system. If the character recognition system has high variance, additional data will improve its performance.
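The reasoning behind these answers can be sketched as a small ceiling-analysis calculation. The accuracies below are assumptions inferred from the explanations (72% with ground-truth text detection, a 2-point gain over the baseline, an 18-point gain from character recognition, and 100% at the end); the function name is hypothetical.

```python
def ceiling_gains(baseline: int, stages: list) -> dict:
    """Gain in overall accuracy (percentage points) from replacing each
    pipeline component, in order, with ground truth."""
    gains = {}
    previous = baseline
    for name, accuracy in stages:
        gains[name] = accuracy - previous
        previous = accuracy
    return gains


# Overall accuracy after each component is cumulatively replaced by ground truth.
# These values are inferred from the explanations, not copied from a table.
stages = [
    ("text detection", 72),
    ("character segmentation", 82),
    ("character recognition", 100),
]

gains = ceiling_gains(70, stages)
most_promising = max(gains, key=gains.get)
print(gains, most_promising)
```

Under these assumed numbers, character recognition offers the largest gain and text detection the smallest, which matches the statements marked as true.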