The Secret Moves of the Data Science Ninja: Feature Engineering

Nov 2, 2014

If you want to get better results from your next data analysis project, chances are you would benefit from allocating a little less time to running and interpreting predictive algorithms and a little more time to feature engineering.

If you aren’t sure what I mean by that, no need to be embarrassed. It’s not a widely used term, even in the data science community. I hadn’t heard it myself until maybe 18 months ago. But it turns out I’ve been a practitioner of feature engineering pretty much my whole working life – I just didn’t know what to call it!

For a great introduction to feature engineering, check out this post by Jason Brownlee.

In a nutshell, feature engineering is the process of transforming the original input data into new variables – features – that are easier to interpret and/or have more explanatory power than the original input data. No new data is being created in this process because the features are all implicit in the original dataset, but well-designed features provide a much better starting point for further analysis.

Feature engineering is done by a human being (with machine assistance, of course), not by an automatic algorithm. The human brings context and understanding of the problem to bear in selecting good features.
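
To make the idea concrete, here is a minimal sketch in Python. The records, field names (timestamp, distance_km, duration_min) and the engineer_features helper are all hypothetical, invented purely for illustration. The point is only that every derived feature is computed from values already present in the raw row, so no new data is created.

```python
from datetime import datetime

# Hypothetical raw records: field names and values are made up for illustration.
raw_rows = [
    {"timestamp": "2014-11-01 23:15:00", "distance_km": 12.0, "duration_min": 30.0},
    {"timestamp": "2014-11-03 08:05:00", "distance_km": 12.0, "duration_min": 48.0},
]


def engineer_features(row):
    """Return a copy of the row with derived features added (hypothetical helper)."""
    ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
    features = dict(row)
    # Time-of-day and weekend flags are implicit in the raw timestamp.
    features["hour"] = ts.hour
    features["is_weekend"] = ts.weekday() >= 5  # Monday is 0, Saturday is 5
    # Average speed combines two raw columns into one more interpretable number.
    features["speed_kmh"] = row["distance_km"] * 60.0 / row["duration_min"]
    return features


engineered = [engineer_features(r) for r in raw_rows]
for row in engineered:
    print(row["hour"], row["is_weekend"], row["speed_kmh"])
```

Which features are worth deriving is exactly the judgment call described above: someone who knows the problem might suspect that weekend trips or slow trips behave differently, and encode that suspicion as a column.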

Some of the likely benefits of using a dataset with well-engineered features:

  • Machine learning algorithms are more likely to find good predictive models
  • Predictive models are easier to interpret (less “black box”)
  • Simpler techniques (like visualization) can reveal underlying structure in the dataset without the need for predictive modelling

Sounds too good to be true? Perhaps I can convince you with a worked example.

 

Posted in: Data Science
