Building a Data Science Portfolio

As someone getting started in Data Science and Tech generally, I know what it feels like to be overwhelmed with everything there is to learn, and that's why I'm glad I came across Emily Robinson and Jacqueline Nolis's book, “Build a Career in Data Science”.

I'm even more glad I get to share what I have learnt so far with people who might sometimes feel the same way.

I just finished chapter 4 of the book and I've learnt really interesting things.

I'm always looking for resources to improve my learning. Apart from watching videos on YouTube, taking courses and reading blog articles, I also collect books on everything Data Science.

That's how I came across this book. As the name implies, it guides you on how to build a career in Data Science (duh). If you're a newbie in Data Science like me, you should check it out too.

The preceding chapters covered what Data Science is, Data Science companies, and getting the skills. The chapter I just read covers how to build your Data Science portfolio.

Having a portfolio is really important as it gives you a chance to display your skillset and show employers that you can really do the job.

There are two parts to a Data Science portfolio:

A GitHub repository: This is where you host the code for any project you have worked on.
A blog: This is where you show off your communication skills and the non-code part of your Data Science work.

As a Data Scientist, you'll often have to communicate your results to people who are not in the Data Science field, so you'll need to translate the entire Data Science process into business language or layman's terms.

A Data Science project typically starts with a dataset (any dataset that you find interesting) and a question to ask about it.

It's important to note that a project doesn't always start with a dataset. It could also start with a question, maybe something that's been on your mind for some time.

A typical Data Science project goes like this:

Dataset and question: Start with a dataset and find a question, or start with a question and find a dataset.
Analysis: Try to answer the question you have with the data.
Blog post: Write a blog post about the entire process. What did you do? What didn't you do? What challenges did you face?
GitHub: Store your code publicly with documentation.

It's not enough to just commit your code to GitHub. It's also important that you fill out a GitHub README for your project, and do it well. Here are some questions you'll have to answer in your README:

What is the project about?
What data does it use?
What question are you answering?
What was the output: a model, a machine learning system, a dashboard, or a report?
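
To make this concrete, here is a minimal sketch in Python that turns answers to those four questions into README text. The function name `render_readme` and the placeholder answers are my own, made up for illustration. The book doesn't prescribe a template, so treat this as one possible layout.

```python
def render_readme(title, about, data, question, output):
    """Build README text that answers the four questions in order.

    Hypothetical helper for illustration, not a standard template.
    """
    sections = [
        ("What is the project about?", about),
        ("What data does it use?", data),
        ("What question are you answering?", question),
        ("What was the output?", output),
    ]
    lines = [f"# {title}", ""]
    for heading, answer in sections:
        # One heading per question, followed by its answer and a blank line.
        lines += [f"## {heading}", answer, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder answers; replace them with details from your own project.
    print(
        render_readme(
            title="My Project",
            about="A short description of the project.",
            data="Where the data came from and what it contains.",
            question="The question the analysis tries to answer.",
            output="A report summarising the findings.",
        )
    )
```

You could save the printed text as `README.md` in the root of your repository and edit it from there.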

When filling the repository, you shouldn't put all your code together without arranging it properly, because it will be hard for anyone to read.

It's important that you divide it into sections:

The process you took to get the Data.
The cleaning process.
Exploring the Data.
Your final analysis.
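
Here is a rough sketch of what that division could look like in a single Python script, with one function per section. The tiny dataset, the column names and the function names are all made up for illustration. In a real project the data would come from a file or a download, and each section might live in its own file or notebook.

```python
import csv
import io
from statistics import mean

# Made-up sample data standing in for a real file or download.
RAW_CSV = """city,temp_c
Lagos,31
Abuja,
Lagos,29
Abuja,27
"""


def get_data(raw_text):
    """1. Getting the data: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def clean_data(rows):
    """2. Cleaning: drop rows with a missing temperature, convert the rest to float."""
    return [
        {"city": r["city"], "temp_c": float(r["temp_c"])}
        for r in rows
        if r["temp_c"]
    ]


def explore_data(rows):
    """3. Exploring: count how many readings each city has."""
    counts = {}
    for r in rows:
        counts[r["city"]] = counts.get(r["city"], 0) + 1
    return counts


def analyze(rows):
    """4. Final analysis: mean temperature per city."""
    by_city = {}
    for r in rows:
        by_city.setdefault(r["city"], []).append(r["temp_c"])
    return {city: mean(temps) for city, temps in by_city.items()}


if __name__ == "__main__":
    rows = clean_data(get_data(RAW_CSV))
    print("Readings per city:", explore_data(rows))
    print("Mean temperature per city:", analyze(rows))
```

The point is not the specific functions but that a reader can find each stage quickly and follow the project from raw data to result.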

When setting up a Data Science blog, there are tons of things you could talk about. A good approach is to treat your audience as yourself when you were first starting out, and write about what you wish you had answers to back then.

Your Data Science blog posts could be in the form of:

- Code-heavy tutorials: This is the type of blog post where you show your readers how to do things like web scraping, exploratory data analysis, etc.

- Theory-heavy tutorials: This is the type of blog post where you teach your readers something, probably something you found out or learnt recently, like what a probability mass function or a cumulative distribution function is.

- A fun project you did: Talk to your readers about a project you did and take them through the steps.

- Write about your experience: You can tell your readers about a bootcamp you participated in recently, an online course you took, a summit you attended, etc.
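
As an example of the kind of snippet a theory-heavy tutorial might include, here is a small sketch of a probability mass function and a cumulative distribution function. I picked the binomial distribution (counting successes in repeated coin flips) purely for illustration, and the function names are my own.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k): the PMF summed from 0 successes up to k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


if __name__ == "__main__":
    # Three flips of a fair coin: how likely is each number of heads?
    for heads in range(4):
        print(heads, binomial_pmf(heads, 3, 0.5), binomial_cdf(heads, 3, 0.5))
```

The PMF gives the probability of one exact outcome, while the CDF adds those probabilities up, so it reaches 1 at the largest possible outcome.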

When setting up your blog, you could either make your own website or use a platform like Medium or Hashnode, whichever you prefer.

How often you post on your blog is up to you. It doesn't really matter if you post every week or every month. The point is that you're doing it and you're learning.

You get to structure your thoughts, and just like when you're teaching someone in person, it helps you realize when you don’t know something as well as you thought you did.

I really enjoyed reading the book because it answered a lot of questions I had about building a portfolio. Now I'm sure that I'm on the right path and doing the right thing.

David Robinson, a Data Scientist, made a point in the question and answer section of the chapter that stuck with me. He talked about how easy it is to get overwhelmed with keeping up in the industry, and how tempting it is as a beginner to start worrying about learning really advanced stuff like deep learning.

According to him, here are the things you should be concerned with when starting out in Data Science:

Getting good at transforming and visualizing data.
Knowing how to program with a wide variety of packages.
Knowing how to use statistical techniques like hypothesis tests, regression, and classification.

Keep learning and practicing, and give yourself breaks so you can recharge. I hope you learnt something valuable from this blog post.

Do not forget to react, comment and follow. To your success! 🚀