
What are the top 8 interview questions for Machine Learning & Data Science

Hello World! Today we will go through the top interview questions that are always at the back of the mind of any ML/DS recruiter.

If you have read my previous blogs about basic machine learning, congratulations. If not, it's never too late: you can go and read them after this blog.

Let's start learning.

1. What are the common types of probability distributions? Explain any one of them.

The common probability distributions used to analyze data are:

  1. Normal Distribution

  2. Poisson Distribution

  3. Bernoulli Distribution

  4. Uniform Distribution

  5. Exponential Distribution

  6. Binomial Distribution

Explanation of Normal Distribution:

The normal distribution is also called the Gaussian distribution, after Carl Friedrich Gauss.

The normal distribution has certain characteristics. Every normal distribution has all of them, although having them is not by itself enough to make a distribution normal.

a) The data is distributed in a bell shape, i.e. low at first, then high, then low again, and the curve is symmetric about the line x=μ.

b) The total area under the bell-shaped curve is always 1.

c) From points (a) and (b), we can conclude that exactly half of the area is to the left of the center and half of it is to the right.

d) The mean, median, and mode coincide.

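The PDF (Probability Density Function) of a random variable X following a normal distribution with mean μ and standard deviation σ is given by:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

And the graph of a normal distribution looks like:

[Figure: bell-shaped curve of the normal distribution, centered at x=μ]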
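The characteristics listed above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (`normal_pdf`, `area_under_pdf`) and the values μ=5, σ=2 are assumptions made for illustration, and the area is only approximated with the trapezoidal rule over μ ± 8σ.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def area_under_pdf(mu, sigma, lo, hi, steps=10_000):
    """Approximate the area under the curve on [lo, hi] with the trapezoidal rule."""
    width = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * width, mu, sigma)
    return total * width

mu, sigma = 5.0, 2.0  # illustrative values, not from the post

# (b) total area is 1, (c) half of it lies to the left of the mean
total_area = area_under_pdf(mu, sigma, mu - 8 * sigma, mu + 8 * sigma)
left_area = area_under_pdf(mu, sigma, mu - 8 * sigma, mu)

# (a) symmetry: the density is the same at equal distances from the mean
left_value = normal_pdf(mu - 1.5, mu, sigma)
right_value = normal_pdf(mu + 1.5, mu, sigma)

print(f"total area ~ {total_area:.4f}")
print(f"area left of the mean ~ {left_area:.4f}")
print(f"f(mu - 1.5) = {left_value:.6f}, f(mu + 1.5) = {right_value:.6f}")
```

The density is also highest at x=μ, which is where the mean, median, and mode sit.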

2. Distinguish between Supervised Learning and Unsupervised Learning.

Supervised Learning

Supervised machine learning is a technique where the algorithm learns under supervision: the training data comes along with labels (the expected outputs), and the machine is trained to predict those labels.

Unsupervised Learning

Unsupervised machine learning is where we have unlabeled training data. The system tries to learn without any supervision. In this case we can use, for example, clustering techniques to determine which group each data point belongs to.
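To make the difference concrete, here is a small sketch on made-up one-dimensional data. The supervised part uses a nearest-centroid classifier and the unsupervised part uses k-means with k=2; both are just convenient stand-ins chosen for illustration, and the data, labels, and function names are assumptions.

```python
# Toy 1-D data (hypothetical): two groups of points, around 1.0 and around 5.0.
points = [0.8, 1.0, 1.3, 4.7, 5.0, 5.4]
labels = ["small", "small", "small", "large", "large", "large"]  # used only by the supervised part

def mean(values):
    return sum(values) / len(values)

# Supervised: labels are given, so we learn one centroid per label.
def fit_nearest_centroid(xs, ys):
    return {label: mean([x for x, y in zip(xs, ys) if y == label]) for label in set(ys)}

def predict_label(centroids, x):
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Unsupervised: no labels, so k-means (k=2) has to discover the groups itself.
def kmeans_1d(xs, iterations=10):
    centers = [min(xs), max(xs)]  # simple deterministic initialization
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[nearest].append(x)
        # keep the old center if a cluster happens to be empty
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

centroids = fit_nearest_centroid(points, labels)
cluster_centers = kmeans_1d(points)

print(predict_label(centroids, 1.1))   # a named class, learned from the labels
print(predict_label(centroids, 4.9))
print(cluster_centers)                 # two group centers, found without labels
```

Note that the supervised model returns a named class, while the clustering only returns anonymous groups; giving those groups a meaning is left to us.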

3. Explain Bias and Variance.


Bias is formally defined as the expectation of the predicted value minus the actual value. Since that difference is the error, bias is the expected error. Think of it this way: if the bias is too high, the model is wrong on average, which means it is underfitting the data. If the bias is too low, the model follows the training data very closely, which usually comes with high variance and a risk of overfitting.

Formula: Bias = E[Y(predicted) - Y(actual)]


Variance is the measure of variability in the results predicted by our model. To put it simply, variance measures how much the predictions change when we change the training dataset. Think of it this way: when the model does not perform as well on independent unseen data or a validation set as it did on the training dataset, there is a possibility that the model has high variance. It tells you how scattered the predictions are around their own average across different training sets. A model with high variance has fitted the noise and irrelevant details of its training data, which causes overfitting: it is very flexible, but it makes wrong predictions for new data points because it has tuned itself to the data points of the training set.
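Both quantities can be estimated by simulation: train the same kind of model on many freshly drawn datasets and look at its predictions at one fixed point. The sketch below is only an illustration under stated assumptions: the true function (x squared), the noise level, the test point x0=0.9, and the two toy models (a constant predictor and a 1-nearest-neighbour predictor) are all made up for the example.

```python
import random

def true_function(x):
    return x ** 2

def sample_dataset(rng, n=20, noise=0.2):
    """Draw n noisy observations of the true function on [0, 1]."""
    xs = [rng.random() for _ in range(n)]
    ys = [true_function(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def predict_constant(xs, ys, x0):
    """Very simple model: ignore x0 and always predict the mean of y."""
    return sum(ys) / len(ys)

def predict_nearest(xs, ys, x0):
    """Very flexible model: copy the y of the closest training point."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]

def bias_and_variance(predict, x0, trials=500, seed=0):
    """Retrain on many fresh datasets and summarize the predictions at x0."""
    rng = random.Random(seed)
    preds = [predict(*sample_dataset(rng), x0) for _ in range(trials)]
    avg = sum(preds) / trials
    bias = avg - true_function(x0)                          # E[Y(predicted)] - Y(actual)
    variance = sum((p - avg) ** 2 for p in preds) / trials  # spread around the average prediction
    return bias, variance

const_bias, const_var = bias_and_variance(predict_constant, x0=0.9)
nn_bias, nn_var = bias_and_variance(predict_nearest, x0=0.9)

print(f"constant model: bias={const_bias:+.3f}, variance={const_var:.4f}")
print(f"nearest model:  bias={nn_bias:+.3f}, variance={nn_var:.4f}")
```

In this setup the constant model should show the larger bias (it underfits), while the nearest-neighbour model should show the larger variance (it copies the noise of whichever point happens to be closest).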

4. What is the trade-off between Bias and Variance?

The Bias and Variance trade-off is finding the right balance between bias and variance of the model so that the prediction of the model can be optimized.

It is basically a way to make sure that the model is neither over-fitted nor under-fitted in any case.

Let's understand it with the bull's eye diagram:

If the model is too simple and has very few features, it may suffer from high bias and low variance.

If the model is too complex and has a large number of features, it may suffer from high variance and low bias.

The goal of the trade-off is a good balance between the two. Ideally, low bias and low variance is the success mantra for any machine learning model we train, but in practice lowering one tends to raise the other.
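For squared error, this balance can be written down exactly. The decomposition below is a standard result that is not part of the original post. It assumes the data follows y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and that \hat{f} is the model obtained from a randomly drawn training set, so the expectations are taken over training sets and noise:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The first term is the square of the bias from question 3, the second is the variance, and the third cannot be reduced by any model, which is why we can only trade the first two against each other.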

5. What is Instance-Based Learning and Model-Based Learning?

Instance-Based Learning

The most basic form of learning is simply memorizing the data.

For example, if this technique is applied to a spam filter, it would just flag all the emails that are identical to emails that have already been flagged by users. It is not the worst solution, but it is certainly not the best either.

Model-Based Learning

The other way to generalize is to build a model from the example data and then use that model to predict future instances. For example, we can fit different models to the data, such as a linear model, a logistic model and so on.
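The two approaches can be put side by side on the same data. This is a rough sketch with made-up numbers (hours studied versus exam score): the instance-based predictor is a 1-nearest-neighbour lookup, which answers with the most similar stored example rather than only identical ones, and the model-based predictor is a straight line fitted by least squares. Both are assumptions chosen for illustration.

```python
# Toy training data (hypothetical): hours studied -> exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 58.0, 61.0, 70.0, 74.0]

# Instance-based: keep the examples and answer with the most similar one.
def predict_instance_based(x):
    nearest = min(range(len(hours)), key=lambda i: abs(hours[i] - x))
    return scores[nearest]

# Model-based: fit score = slope * hours + intercept by least squares,
# after which only the two parameters are needed to predict.
def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / spread
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(hours, scores)

def predict_model_based(x):
    return slope * x + intercept

for x in (2.4, 8.0):
    print(x, predict_instance_based(x), round(predict_model_based(x), 2))
```

For an input far outside the training range, such as 8 hours, the instance-based predictor can only repeat the score of its closest stored example, while the fitted line extrapolates the trend.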

6. What is the difference between Batch Learning and Online Learning?

Batch Learning

Batch learning cannot learn incrementally. This means the machine must be trained using all the available data. That is tedious and takes a lot of time and compute resources, so it is generally done offline.

If we train the system on all the available data and it then runs in production with only that fixed knowledge, the model is not robust: for many jobs the data keeps changing, so a model that is not updated along with the data becomes outdated.

Online Learning

Online learning is the sweet solution to this problem of batch learning. Yeah, you guessed it correctly (if you guessed). It feeds the data to the model incrementally, either one instance at a time or in small groups called mini-batches. This reduces time and compute consumption and, most importantly, new data is fed to the model as it arrives, which saves us from retraining the model from scratch.
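The incremental idea can be sketched as a model that exposes an update step and is fed one mini-batch at a time. Everything here is an assumption made for illustration: the class `OnlineLinearModel`, its `partial_fit` method, the learning rate, and the simulated stream of noisy samples from y = 2x + 1.

```python
import random

class OnlineLinearModel:
    """y ~ w * x + b, updated one mini-batch at a time with gradient steps."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def partial_fit(self, batch):
        """Update the parameters from one mini-batch of (x, y) pairs."""
        n = len(batch)
        # gradients of the mean squared error on this batch only
        grad_w = sum(2 * (self.w * x + self.b - y) * x for x, y in batch) / n
        grad_b = sum(2 * (self.w * x + self.b - y) for x, y in batch) / n
        self.w -= self.learning_rate * grad_w
        self.b -= self.learning_rate * grad_b

def data_stream(n_batches, batch_size=8, seed=0):
    """Hypothetical stream: noisy samples of y = 2x + 1 arriving in mini-batches."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        yield [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.05)) for x in xs]

model = OnlineLinearModel()
for batch in data_stream(n_batches=2000):
    model.partial_fit(batch)  # each batch is used once and then discarded

print(f"w ~ {model.w:.2f}, b ~ {model.b:.2f}")
```

The model never sees the whole dataset at once and never restarts training; it only keeps its current parameters and nudges them as each new batch arrives.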

7. Explain Gradient Descent.

Gradient descent is an optimization algorithm that helps to find the minimum of a function.

We start at a random point on the function and repeatedly move in the direction of the negative gradient to reach a local/global minimum of that function; that is why it is called descent.

In machine learning, the function we minimize is the cost function (error function). By taking small steps towards its minimum we reduce the error of our model and, in turn, usually gain accuracy.
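The procedure fits in a few lines. This is a minimal sketch on an assumed one-dimensional cost function, (x - 3) squared, whose minimum is known to be at x = 3; the starting point is fixed instead of random so the run is reproducible, and the learning rate and step count are illustrative choices.

```python
def cost(x):
    """Example cost function with its minimum at x = 3."""
    return (x - 3.0) ** 2

def gradient(x):
    """Derivative of the cost function above."""
    return 2.0 * (x - 3.0)

def gradient_descent(start, learning_rate=0.1, steps=100):
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)  # small step against the gradient
    return x

x_min = gradient_descent(start=-4.0)
print(f"x ~ {x_min:.4f}, cost ~ {cost(x_min):.6f}")
```

The learning rate controls the step size: too small and the descent is slow, too large (above 1.0 for this particular function) and the steps overshoot the minimum and diverge.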

8. Explain the Recurrent Neural Network(RNN).

A recurrent neural network (RNN) is a type of neural network designed for sequential data, which makes it well suited to time-series problems. A well-known simple form of it, the Elman network, was introduced by J. L. Elman. As the name suggests, the network contains a recurrent connection (a loop). For example, when we feed an input to the model it produces an output, and that output (the hidden state) is fed back in together with the next input. This cycle goes on as shown in Fig. Unrolled over time, it can be better understood as a series of connected feedforward neural networks that share the same weights.
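The feedback loop can be seen in a forward pass of a single recurrent unit. This is a bare-bones sketch, not a trainable network: the function name `rnn_forward`, the scalar weights `w_x`, `w_h`, `b`, and the tanh activation are assumptions picked for illustration.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit recurrent cell over a sequence and return every hidden state."""
    h = 0.0  # initial hidden state
    states = []
    for x in inputs:
        # The same weights are reused at every step; h carries the past forward.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

early_signal = rnn_forward([1.0, 0.0, 0.0])
late_signal = rnn_forward([0.0, 0.0, 1.0])

print(early_signal)
print(late_signal)
```

In the first sequence the later hidden states stay non-zero even though the later inputs are zero, because the previous state is fed back in at each step. The same three values in a different order give a different result, which is what lets an RNN take the order of a sequence into account.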
