Hey MLM Family,
I have been on both sides of the "Data Science Interview Table".
So I thought I would put together the 50 most frequently asked data science interview questions, along with their answers, in 5 parts.
Ten question-answer pairs per week, so that it won't be a burden to read or remember them.
Please let me know your honest thoughts about it in the comment box.
So, without taking much time, let's get started!
Q1. What is the difference between Artificial Intelligence & Machine Learning?
Machine Learning is about designing and developing algorithms that learn their behaviour from empirical data (data that is verifiable by observation or experience rather than theory or pure logic).
Artificial Intelligence is the broader field: in addition to machine learning, it also covers other aspects like knowledge representation, natural language processing, planning, robotics etc.
Q2. What is ‘overfitting’ in Machine Learning?
When the model fits the training data almost perfectly, so that the training accuracy is close to 100%, but it performs poorly on test data, we say that the model has overfitted.
This usually happens because the model has too many parameters relative to the number of training examples.
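If it helps to see this in numbers, here is a minimal sketch. It uses a 1-nearest-neighbour classifier, which I picked only because it memorises its training set; the toy data, the 20% label noise and all function names are assumptions made up for this illustration.

```python
import random

def make_data(rng, n, noise=0.2):
    """Points in [0, 1]; the true label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label  # inject label noise
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    _, nearest_label = min(train, key=lambda point: abs(point[0] - x))
    return nearest_label

def accuracy(train, data):
    """Fraction of points in `data` that the 1-NN model built on `train` labels correctly."""
    return sum(predict_1nn(train, x) == y for x, y in data) / len(data)

rng = random.Random(0)
train = make_data(rng, 200)
test = make_data(rng, 200)

train_acc = accuracy(train, train)  # every point is its own nearest neighbour
test_acc = accuracy(train, test)    # the memorised noise does not transfer
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The gap between the two printed numbers is the symptom described above: a perfect score on the training data and a clearly lower one on data the model has not seen.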
Q3. How can you avoid overfitting?
The simplest way to reduce overfitting is to use a lot of data; overfitting tends to happen when you try to learn from a relatively small dataset.
But sometimes you only have a small dataset and you are forced to come up with a model based on it.
In such a scenario, you can use a technique known as cross-validation. In this method the dataset is split into k parts (folds); the model is trained on k-1 of them and tested on the remaining one, and this is repeated so that every fold is used for testing exactly once. Averaging the test scores tells you how well the model generalises, so you can pick the model or settings that overfit the least.
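One common form is k-fold cross-validation, where the testing section rotates through the dataset so every point gets tested once. Here is a rough sketch in plain Python; the helper names (`k_fold_indices`, `cross_validate`), the toy data and the midpoint-threshold model are all hypothetical and only there to make the example runnable.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, predict, k=5):
    """Train on k-1 folds, test on the held-out fold, and repeat for every fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train_idx = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct = sum(predict(model, xs[i]) == ys[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores

def fit_threshold(xs, ys):
    """Toy model: the midpoint between the mean of class 0 and the mean of class 1."""
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    ones = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def predict_threshold(threshold, x):
    return 1 if x > threshold else 0

# Toy, well-separated data: class 0 lives in [0, 0.9], class 1 in [2, 2.9].
xs = [i / 10 for i in range(10)] + [2 + i / 10 for i in range(10)]
ys = [0] * 10 + [1] * 10

scores = cross_validate(xs, ys, fit_threshold, predict_threshold, k=5)
print("accuracy per fold:", scores)
print("mean accuracy:", sum(scores) / len(scores))
```

On real data the per-fold scores will differ from each other, and their spread is a useful hint about how stable the model is.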
Q4. Why do we place ‘Naive’ in the Naive Bayes algorithm?
This is a little tricky question that was asked to me in an interview, and I also ask it many times when I take interviews. So, the answer is:
Naive Bayes is called naive because it assumes that every input variable is independent of the others, given the class. This is a strong assumption and unrealistic for real data; however, the technique is very effective on a large range of complex problems.
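To show where the independence assumption enters, here is a small sketch of a categorical Naive Bayes classifier. The per-feature probabilities are simply multiplied together (added in log space), which is exactly the "naive" step. The weather-style data, the add-one smoothing and the function names are my own assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count classes and, per (class, feature position), how often each value occurs."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            feature_counts[(label, j)][value] += 1
            values[j].add(value)
    return class_counts, feature_counts, values

def predict(model, row):
    """Pick the class with the highest log prior + sum of per-feature log likelihoods."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for j, value in enumerate(row):
            # Naive step: each feature contributes independently of the others.
            # Add-one (Laplace) smoothing avoids log(0) for unseen values.
            numerator = feature_counts[(label, j)][value] + 1
            denominator = count + len(values[j])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up data: (outlook, temperature) -> play outside?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "hot"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes", "no", "yes"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("rainy", "cool")))  # yes
print(predict(model, ("sunny", "hot")))   # no
```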
Q5. What are the two methods used for calibration in supervised machine learning?
Two methods used to get well-calibrated probabilities in supervised learning are
1. Platt Calibration (also called Platt scaling)
2. Isotonic Regression
Both methods are designed for binary classification.
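The idea behind the first method is to fit a sigmoid that maps a classifier's raw scores to probabilities. The sketch below does that with plain gradient descent on the log loss, which is a simplification I chose for readability and not necessarily how a library would fit it; the scores, labels, learning rate and function names are all made up for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the mean log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            error = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += error * s
            grad_b += error
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Hypothetical raw classifier scores (sorted) and the true binary labels.
scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_platt(scores, labels)
calibrated = [sigmoid(a * s + b) for s in scores]
for s, p in zip(scores, calibrated):
    print(f"score {s:+.1f} -> probability {p:.2f}")
```

Isotonic regression follows the same workflow but replaces the sigmoid with a non-decreasing step function, so it can adapt to more shapes at the cost of needing more data.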
Q6. What is perceptron in Machine learning?
The perceptron is a building block of an Artificial Neural Network (ANN). In simple words, the perceptron is a single-layer neural network that helps in binary classification.
It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.
It comes in the category of supervised learning.
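Here is a minimal sketch of the classic perceptron learning rule, trained on the logical AND function (a linearly separable toy problem I picked for the example). The function names, learning rate and epoch count are assumptions for illustration.

```python
def predict(weights, bias, x):
    """Linear predictor: weighted sum of the features plus bias, thresholded at 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Nudge the weights towards each misclassified example (supervised learning)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            update = lr * (y - predict(weights, bias, x))  # 0 when the prediction is right
            weights = [w + update * xi for w, xi in zip(weights, x)]
            bias += update
    return weights, bias

# Logical AND: only (1, 1) belongs to class 1.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print("weights:", weights, "bias:", bias)
print("predictions:", predictions)
```

Because a single perceptron can only draw one straight line (or hyperplane), it would not converge on something like XOR, which is not linearly separable.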
Q7. What are the two components of a Bayesian logic program?
A Bayesian logic program consists of two components:
The first component is a logical one; it consists of a set of Bayesian clauses, which capture the qualitative structure of the domain.
The second component is a quantitative one; it encodes the quantitative information about the domain.
Q8. What are Bayesian Networks (BN)?
A Bayesian Network is a graphical model that represents the probabilistic relationships among a set of variables.
For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.
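As a rough sketch of that disease-and-symptom example, here is the smallest possible network: one Disease node pointing to one Symptom node. The probabilities are invented purely for illustration and are not real medical statistics; the names are hypothetical too.

```python
# Illustrative numbers only.
P_DISEASE = 0.01                            # prior P(disease)
P_SYMPTOM_GIVEN = {True: 0.9, False: 0.05}  # P(symptom | disease present?)

def joint(disease, symptom):
    """P(disease, symptom), factorised along the edge Disease -> Symptom."""
    p_d = P_DISEASE if disease else 1 - P_DISEASE
    p_s = P_SYMPTOM_GIVEN[disease]
    return p_d * (p_s if symptom else 1 - p_s)

def posterior_disease(symptom=True):
    """P(disease | symptom), obtained by summing the joint over the disease variable."""
    evidence = joint(True, symptom) + joint(False, symptom)
    return joint(True, symptom) / evidence

p = posterior_disease(symptom=True)
print(f"P(disease | symptom) = {p:.3f}")
```

Even with a symptom that shows up in 90% of sick patients, the posterior stays modest here because the disease itself is rare in these made-up numbers. Larger networks work the same way, just with more nodes and more terms in the sum.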
Q9. What is ensemble learning and why is it used?
To solve a particular computational problem, multiple models such as classifiers are strategically generated and combined. This process is known as ensemble learning.
Ensemble learning is used to improve classification, prediction, function approximation etc.
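A tiny sketch of one way to combine models is majority voting. The three rule-based "classifiers" below are hypothetical and deliberately imperfect, each wrong on different points, so you can see the vote fixing their individual mistakes on this toy data.

```python
from collections import Counter

# Toy data: the true label is 1 when x > 3.
samples = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 0, 1, 1, 1]

# Three weak, made-up classifiers that err on different inputs.
def clf_a(x):
    return int(x > 2)        # wrong on x = 3

def clf_b(x):
    return int(x > 4)        # wrong on x = 4

def clf_c(x):
    return int(x % 2 == 0)   # wrong on x = 2 and x = 5

def majority_vote(classifiers, x):
    """Return the label predicted by most of the classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

def accuracy(predict):
    return sum(predict(x) == y for x, y in zip(samples, labels)) / len(samples)

ensemble = [clf_a, clf_b, clf_c]
individual_acc = [accuracy(clf) for clf in ensemble]
ensemble_acc = accuracy(lambda x: majority_vote(ensemble, x))
print("individual accuracies:", individual_acc)
print("ensemble accuracy:", ensemble_acc)
```

The vote only helps because the members make different mistakes; combining three copies of the same classifier would change nothing.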
Q10. What is the bias-variance decomposition of classification error in ensemble methods?
The expected error of a learning algorithm can be decomposed into bias and variance.
The bias term measures how closely the average classifier produced by the learning algorithm matches the target function.
The variance measures how much the learning algorithm’s prediction fluctuates for different training sets.
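To make the two terms concrete, here is a small simulation sketch. Note that it uses a regression setting with squared error instead of classification error, because the decomposition is simplest to estimate there; the target function, noise level, the two toy predictors and all names are assumptions for illustration. It draws many training sets, records each model's prediction at one point, and estimates squared bias and variance from those predictions.

```python
import random
from statistics import mean, pvariance

def target(x):
    """The true function the models are trying to learn."""
    return x * x

def sample_training_set(rng, n=20, noise=0.3):
    """Draw n noisy observations of the target on [0, 1]."""
    xs = [rng.random() for _ in range(n)]
    ys = [target(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def predict_mean(xs, ys, x0):
    """Very rigid model: ignore x0 and always predict the average y."""
    return mean(ys)

def predict_nearest(xs, ys, x0):
    """Very flexible model: copy the y of the training point closest to x0."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]

def bias_and_variance(predictor, x0=0.9, n_sets=500, seed=0):
    """Estimate squared bias and variance of the prediction at x0 over many training sets."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_sets):
        xs, ys = sample_training_set(rng)
        predictions.append(predictor(xs, ys, x0))
    bias = mean(predictions) - target(x0)
    return bias ** 2, pvariance(predictions)

mean_bias_sq, mean_var = bias_and_variance(predict_mean)
nn_bias_sq, nn_var = bias_and_variance(predict_nearest)
print(f"mean model:       bias^2 = {mean_bias_sq:.4f}, variance = {mean_var:.4f}")
print(f"nearest neighbour: bias^2 = {nn_bias_sq:.4f}, variance = {nn_var:.4f}")
```

The rigid model should show high bias and low variance, and the flexible one the opposite. Ensemble methods try to get a better mix of the two, for example by averaging many high-variance models.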