a more readable / general overview from SAS: https://blogs.sas.com/content/subconsciousmusings/2017/04/12/machine-learning-algorithm-use/
multi-class classification:
just use one-vs-all (aka one-vs-rest)
train a model for each class where the negative labels are all the other classes
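a minimal one-vs-rest sketch (assuming scikit-learn; the iris dataset and logistic regression base estimator are just illustrative):

    # one binary model per class: that class is positive, all others negative;
    # predict with whichever per-class model is most confident
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier

    X, y = load_iris(return_X_y=True)  # 3 classes -> 3 binary models
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
    print(clf.predict(X[:5]))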
regression:
linear vs logistic: linear regression predicts a continuous value; logistic regression predicts a probability used to make a categorical (usually binary) prediction
finding coefficients for standard linear equation: y = a + bx
for logistic, we run the linear output through the logistic function (an s-shaped curve) to convert it into a 0 - 1 value, then apply a threshold rule to get a binary label
like >0.5 then yes else no
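a minimal sketch of the squash-then-threshold idea (numpy only; the coefficients are made up):

    import numpy as np

    def logistic(z):
        # s-shaped curve mapping any real number into (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    a, b = -1.0, 2.0                 # illustrative coefficients for y = a + bx
    x = np.array([-2.0, 0.0, 0.5, 3.0])
    p = logistic(a + b * x)          # probabilities in (0, 1)
    labels = (p > 0.5).astype(int)   # the >0.5-then-yes-else-no rule
    print(p, labels)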
decision trees:
start with the most informative feature, split based on that, then continue splitting the training set on successive features (each branch contains the points sharing the same value/range for that feature) until you get to leaf nodes (which are the output, e.g., the yes/no decision)
prone to overfitting (high variance), so some ways to account for that (compared in the sketch after this list) are:
bagging:
short for bootstrap aggregating: construct multiple tree models on bootstrap samples (drawn with replacement) of the input, then for each new data point, average all the predictions (or take a majority vote for classification)
random forests:
like bagging, but introduce extra randomness: at each split, consider only a random subset of features instead of always selecting the optimal split over all features, which decorrelates the trees
boosting (think gradient boosted decision trees / GBDT):
combine weak learners
each successive learner attempts to correct for the errors of the previous one
place more weight on data points where the prev learner went wrong
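a minimal comparison sketch (assuming scikit-learn; the synthetic dataset and hyperparameters are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                                  RandomForestClassifier)
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)
    models = {
        "single tree": DecisionTreeClassifier(random_state=0),
        "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                     random_state=0),
        "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "boosting (GBDT)": GradientBoostingClassifier(random_state=0),
    }
    for name, model in models.items():
        # the ensembles usually beat the single high-variance tree
        print(name, cross_val_score(model, X, y, cv=5).mean())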
collaborative filtering:
based on assumption that people like things similar to other things they like, and things that are liked by other people with similar taste
types of filtering:
User-item filtering takes a particular user, finds users who are similar to that user based on similarity of ratings, and recommends items that those similar users liked. In contrast, item-item filtering takes an item, finds users who liked that item, and finds other items that those users or similar users also liked. It takes items and outputs other items as recommendations. https://towardsdatascience.com/various-implementations-of-collaborative-filtering-100385c6dfe0
User-Item Collaborative Filtering: “Users who are similar to you also liked …”
Item-Item Collaborative Filtering: “Users who liked this item also liked …”
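a minimal item-item sketch (numpy only; the ratings matrix is made up, and cosine similarity is just one common similarity choice):

    import numpy as np

    # toy user x item ratings matrix (0 = unrated)
    R = np.array([[5, 4, 0, 1],
                  [4, 5, 1, 0],
                  [1, 0, 5, 4],
                  [0, 1, 4, 5]], dtype=float)

    # item-item cosine similarity: compare the rating columns
    norms = np.linalg.norm(R, axis=0)
    sim = (R.T @ R) / np.outer(norms, norms)

    np.fill_diagonal(sim, -1)        # ignore self-similarity
    # "users who liked item 0 also liked ..." -> most similar items first
    print(np.argsort(sim[0])[::-1])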
SVM or support vector machines:
basically trying to find the hyperplane that best separates the input into separate classes (with the kernel trick, this can correspond to a non-linear boundary in the original feature space). do this by maximizing the margin, i.e., the distance from the hyperplane to the closest data points (the support vectors)
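a minimal sketch (assuming scikit-learn; the blob dataset is illustrative):

    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=100, centers=2, random_state=0)
    # kernel="linear" finds a flat separating hyperplane; kernel="rbf"
    # would allow a non-linear boundary in the original feature space
    clf = SVC(kernel="linear", C=1.0).fit(X, y)
    # the support vectors are the points closest to the hyperplane
    print(clf.support_vectors_)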
k-means clustering:
not to be confused w k nearest neighbors, which is quite different https://www.wikiwand.com/en/K-nearest_neighbors_algorithm
algorithm:
iterative, start with k randomly assigned centroid points, then iterate by:
for each point, assign to the closest centroid
for each centroid, update/move to the middle (mean) of the points assigned to it
stop iterating when no more point assignments or centroid movements need to be made
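a direct numpy transcription of that loop (the initialization scheme and convergence check are just one simple choice):

    import numpy as np

    def kmeans(X, k, n_iters=100, seed=0):
        rng = np.random.default_rng(seed)
        # start with k randomly chosen points as the centroids
        centroids = X[rng.choice(len(X), k, replace=False)]
        for _ in range(n_iters):
            # assign each point to its closest centroid
            dists = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
            labels = dists.argmin(axis=1)
            # move each centroid to the mean of its assigned points
            new_centroids = np.array([X[labels == i].mean(axis=0)
                                      for i in range(k)])
            if np.allclose(new_centroids, centroids):   # nothing moved: done
                break
            centroids = new_centroids
        return labels, centroids

    X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
    labels, centroids = kmeans(X, k=2)
    print(centroids)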
Feature engineering:
cross-validation
1-hot encoding
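a minimal 1-hot encoding sketch (assuming pandas; the column and categories are made up):

    import pandas as pd

    df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
    # each category becomes its own 0/1 indicator column
    print(pd.get_dummies(df, columns=["color"]))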
Performance metrics / evaluation:
confusion matrix:
actual vs predicted https://rasbt.github.io/mlxtend/user_guide/evaluate/confusion_matrix_files/confusion_matrix_1.png
          Predicted
           P    N
Actual P:  TP   FN
       N:  FP   TN
recall, sensitivity, or true positive rate = TP / (TP + FN)
of all actual positive labels, how many did we successfully predict?
precision = TP / (TP + FP)
of all predicted positive labels, how many were actually positive?
false positive rate = FP / (FP + TN)
specificity or true negative rate = TN / (TN + FP)
F1 is the harmonic mean of precision and recall
2TP / (2TP + FP + FN)
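plugging made-up counts into those formulas:

    # hypothetical confusion-matrix counts
    TP, FN, FP, TN = 40, 10, 5, 45

    recall = TP / (TP + FN)             # 0.8
    precision = TP / (TP + FP)          # ~0.889
    fpr = FP / (FP + TN)                # 0.1
    specificity = TN / (TN + FP)        # 0.9
    f1 = 2 * TP / (2 * TP + FP + FN)    # ~0.842, harmonic mean of P and R
    print(recall, precision, fpr, specificity, f1)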
ROC curve is true positive rate (recall or sensitivity) vs false positive rate
we want the curve hugging the top left: high true positive rate with near-zero false positives
sensitivity as a function of fallout (another name for FPR)
ROC curve is graph of recall/sensitivity vs (1 - specificity), which is the false positive rate
in other words, graph of true positives vs false positives
so area under the curve measures how well the positive probabilities are separated from negative
intuitively, measures the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example
0.5 = random predictor
better for skewed or imbalanced classes, where it will reward discriminative classifiers over representative ones (eg punishes a model that only outputs ham for a problem where 90% is ham)
also it's a measure of the entire classifier and not dependent on the specific threshold you choose
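a minimal sketch (assuming scikit-learn; the labels and scores are made up):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_true = np.array([0, 0, 1, 1, 1, 0])
    scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])

    # AUC depends only on how well positives are ranked above negatives,
    # not on any particular decision threshold
    print(roc_auc_score(y_true, scores))
    # any monotonic rescaling of the scores gives the same AUC
    print(roc_auc_score(y_true, scores * 10 - 3))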
Learning to rank:
used for ranking a list of items, eg search engine results or FB newsfeed
differs from traditional ML classification/regression bc prediction is on more than a single instance at a time: https://www.quora.com/What-is-the-intuitive-explanation-of-Learning-to-Rank-and-algorithms-like-RankNet-LambdaRank-and-LambdaMART-In-what-types-of-data-variables-can-these-techniques-be-used-What-are-their-strengths-and-limitations/answer/Nikhil-Dandekar
Traditional ML solves a prediction problem (classification or regression) on a single instance at a time. E.g. if you are doing spam detection on email, you will look at all the features associated with that email and classify it as spam or not. The aim of traditional ML is to come up with a class (spam or no-spam) or a single numerical score for that instance.
LTR solves a ranking problem on a list of items. The aim of LTR is to come up with optimal ordering of those items. As such, LTR doesn't care much about the exact score that each item gets, but cares more about the relative ordering among all the items.
3 main approaches: pointwise, pairwise, and listwise
RankNet, LambdaRank and LambdaMART all transform ranking into a pairwise classification or regression problem. That means you look at pairs of items at a time, come up with the optimal ordering for that pair of items, and then use it to come up with the final ranking for all the results.
a common loss function is minimizing the # of inversions: pairs that the model ranks in the wrong relative order
Normalized Discounted Cumulative Gain (NDCG) is a measure of ranking quality: sum each item's relevance discounted by the log of its position (DCG), then normalize by the DCG of the ideal ordering, so a perfect ranking scores 1.0
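a minimal sketch of one standard formulation (linear rel / log2(position + 1) gains; an exponential-gain variant also exists):

    import numpy as np

    def dcg(relevances):
        # discount each item's relevance by log2 of its (1-based) position
        positions = np.arange(2, len(relevances) + 2)   # log2(2) = 1 for rank 1
        return np.sum(np.asarray(relevances) / np.log2(positions))

    def ndcg(relevances):
        ideal = sorted(relevances, reverse=True)        # best possible ordering
        return dcg(relevances) / dcg(ideal)

    # made-up relevance grades, in the order the model ranked the items
    print(ndcg([3, 2, 3, 0, 1, 2]))                     # 1.0 only if perfectly sorted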
Recommender systems survey http://www.sciencedirect.com/science/article/pii/S0950705113001044
Towards the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions http://www.stanford.edu/class/ee378b/papers/adomavicius-recsys.pdf
A survey of collaborative filtering techniques http://dl.acm.org/citation.cfm?id=1722966
The YouTube video recommendation system http://dl.acm.org/citation.cfm?id=1864770