Source: Instagram dinosandcomics

Hi there. You may not know who I am, but I’ve spent some time this week in deliberate contemplation, so here’s a more personal article for a change. Some background about myself: I studied Mathematics and Economics in my undergraduate years. After graduation, I joined the government in a data anonymisation role. After a year, I realised there was so much more that I wanted to learn about using data for change, and I wanted a different pace at work, so I spent the next 3 years working at a fast-paced online grocery company as a product…

Features in a dataset often need to be preprocessed before they are used in machine learning models to get better model performance, which is why most of a data scientist’s time is spent on data cleaning and preparation. I can think of 2 main reasons why we would want to do feature transformation:

  1. Improves model performance
  • How? The variability of a feature due to extreme values can be reduced
    Extreme values lead to extreme estimates when modelling. A dataset whose features have very different magnitudes poses a problem for some machine learning algorithms, as the feature with the larger magnitude might be given a higher weight…
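
The two ideas above can be sketched in a few lines: a log transform to dampen extreme values, and standardization to put features on a comparable scale. This is only an illustration under assumptions. The feature names and values are made up, the helper functions are hypothetical, and only the Python standard library is used.

```python
import math
import statistics

def log_transform(values):
    """Compress large values with log(1 + x); assumes non-negative inputs."""
    return [math.log1p(v) for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical features with very different magnitudes.
income = [30_000, 42_000, 55_000, 61_000, 1_200_000]  # one extreme value
age = [23, 35, 41, 52, 64]

income_log = log_transform(income)
income_scaled = standardize(income_log)
age_scaled = standardize(age)

# The spread between the largest and smallest value shrinks after the log transform.
raw_ratio = max(income) / min(income)
log_ratio = max(income_log) / min(income_log)
print(f"max/min ratio before: {raw_ratio:.1f}, after log: {log_ratio:.2f}")
```

After these steps both features are centred on zero with unit spread, so neither dominates a distance-based or gradient-based model purely because of its units.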

In machine learning, cross-validation is often performed to select the best hyper-parameters for a model. Once the hyper-parameters are selected, the model is retrained on the combined training and validation sets before being evaluated on the test set. A general workflow for cross-validation looks something like this:

Source: Scikit-learn

While cross-validation is often used for hyper-parameter tuning, it is also worth doing when we are not tuning hyper-parameters. For simplicity, we leave out hyper-parameter tuning here, but cross-validation still needs to be performed if we are comparing performance across different models.

Cross-validation is done by resampling the dataset…
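
As a rough illustration of comparing models with cross-validation and no hyper-parameter tuning, here is a minimal k-fold sketch. It is not the Scikit-learn workflow referenced above. The data is synthetic, the two "models" (a mean baseline and a least-squares line) and all function names are assumptions chosen to keep the example dependency-free.

```python
import random
import statistics

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices once and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Simple least-squares line y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def cross_val_mse(fit, xs, ys, k=5):
    """Average held-out mean squared error across k folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - model(xs[i])) ** 2 for i in held_out]
        scores.append(statistics.fmean(errors))
    return statistics.fmean(scores)

# Made-up data: a noisy linear relationship.
rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 3) for x in xs]

mse_mean = cross_val_mse(fit_mean, xs, ys)
mse_line = cross_val_mse(fit_line, xs, ys)
print(f"baseline MSE: {mse_mean:.2f}, linear MSE: {mse_line:.2f}")
```

Both models are scored on the same folds (the shuffle uses a fixed seed), so the difference in average error reflects the models rather than a lucky split.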

Not all sites provide a developer API for easy access to their data. Even if they do, the API may not contain all the information you need. Web scraping knowledge allows data analysts to supplement their datasets with data extracted from websites, further enriching the analysis. Data scientists could also use the additional data for feature engineering to improve the performance of their models.

This post will not share in detail how to inspect the HTML to extract the information that you need from a site. …

The purpose of A/B testing is to find out if the treatment group performs significantly better than the control group on a certain success metric (e.g. conversion rate). How can we tell if it is “significantly” better? The null hypothesis of our statistical test is that there is no difference between the treatment and control groups, and we would like to find enough evidence to reject it, that is, to show that the two groups are statistically different from each other. …
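
One common way to check this for conversion rates is a two-proportion z-test. The sketch below is an assumption about which test fits, not necessarily the one the article goes on to use. The counts, the function name, and the 0.05 significance level are all made up for illustration.

```python
import math

def two_proportion_z_test(conv_control, n_control, conv_treatment, n_treatment):
    """One-sided z-test: is the treatment conversion rate higher than control?"""
    p_control = conv_control / n_control
    p_treatment = conv_treatment / n_treatment
    # Pooled rate under the null hypothesis of no difference.
    p_pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    std_err = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_control + 1 / n_treatment))
    z = (p_treatment - p_control) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Made-up counts for illustration.
z, p_value = two_proportion_z_test(conv_control=200, n_control=2000,
                                   conv_treatment=250, n_treatment=2000)
alpha = 0.05  # assumed significance level
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject null: {p_value < alpha}")
```

If the p-value falls below the chosen significance level, we reject the null hypothesis of no difference. A large p-value does not show the groups are the same, only that the data is not enough to tell them apart.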


I write to understand.
