Classifying Song Genres

(image credits: Akshay Gupta, instagram: odds_are_awed)

Music streaming platforms are now part of everyday life, and most of us listen to our favourite songs on them. To give personalized recommendations, these platforms classify songs in various ways. This can be done by analyzing the audio itself or by other methods.

In my DataCamp project, I worked with data compiled by a research group known as The Echo Nest. Besides a track_id column that identifies each song, it has many musical features: acousticness, danceability, energy, instrumentalness, liveness, speechiness, tempo and valence. These features are recorded on a scale of -1 to 1.

These features are used to classify songs as “hip-hop” or “rock”. A second file holds general information about the songs. Only its Genre column is needed, and it serves as the target variable.

After importing the data from both files and merging them into a single data frame, I checked the correlations between the features and plotted them as a heat-map.
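The merge and correlation check can be sketched roughly as below. This is only an illustration, not the project's actual code: the two data frames are synthetic stand-ins for the real files, only a few of the features are included, and the column names `genre` and `title` are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the two files (the real project loads them from disk).
rng = np.random.default_rng(0)
n = 200
features = pd.DataFrame({
    "track_id": np.arange(n),
    "acousticness": rng.random(n),
    "danceability": rng.random(n),
    "energy": rng.random(n),
    "tempo": rng.uniform(60, 200, n),
})
tracks = pd.DataFrame({
    "track_id": np.arange(n),
    "genre": rng.choice(["Rock", "Hip-Hop"], size=n),  # hypothetical column name
    "title": [f"song {i}" for i in range(n)],          # extra info we do not need
})

# Keep only the key and the target column from the second file, then merge on the key.
merged = features.merge(tracks[["track_id", "genre"]], on="track_id")

# Pairwise Pearson correlations between the features.
# track_id is an identifier and genre is the target, so both are left out.
corr = merged.drop(columns=["track_id", "genre"]).corr()
print(corr.round(2))
```

The resulting `corr` matrix is what a heat-map would display, for instance by passing it to `seaborn.heatmap`.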

I found no strong correlations between the features, so no feature could simply be dropped as redundant. To reduce the number of features and help the model perform well, I used the most common dimensionality reduction technique: Principal Component Analysis (PCA).

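PCA rotates the data onto the axes of highest variance and shows how much each resulting component contributes to the total variance in the data. It works on the features alone and does not look at the genre labels.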

To decide how many components to keep, I plotted n_components on the x-axis and the cumulative explained variance on the y-axis. To explain 90% of the variance, I selected n_components as 6.
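A minimal sketch of how the number of components could be picked from the cumulative explained variance. The feature matrix here is random data standing in for the eight audio features, so the number it prints will not match the 6 found in the project. Standardizing the features before PCA is my assumption for this sketch, since PCA is sensitive to feature scale.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random stand-in for the 8 audio features (not the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))

# Assumption: put every feature on the same scale before PCA.
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA with all components and accumulate the explained variance ratios.
pca = PCA().fit(X_scaled)
cum_var = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches the threshold.
threshold = 0.90
n_components = int(np.argmax(cum_var >= threshold)) + 1
print(n_components, cum_var.round(3))
```

Plotting `cum_var` against the component count gives the curve described above, and `n_components` marks the point where it first crosses the 90% line.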

In the data-set, there are more entries for Rock than for Hip-Hop. To keep the model from being biased towards the larger class, I reduced the Rock entries to the number of Hip-Hop entries and performed the final PCA.
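One way to balance the two classes is to downsample the larger one. The sketch below assumes random sampling with a fixed seed and uses a made-up data frame with a hypothetical `genre` column. The class sizes are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up, imbalanced data: 800 Rock rows and 200 Hip-Hop rows.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "energy": rng.random(1000),
    "genre": ["Rock"] * 800 + ["Hip-Hop"] * 200,
})

# Keep every Hip-Hop row and draw the same number of Rock rows at random.
hiphop = df[df["genre"] == "Hip-Hop"]
rock = df[df["genre"] == "Rock"].sample(n=len(hiphop), random_state=10)

# Stack the two subsets back into one balanced data frame.
balanced = pd.concat([rock, hiphop]).reset_index(drop=True)
print(balanced["genre"].value_counts())
```

Downsampling throws away some Rock rows, but it keeps the model from scoring well just by predicting the majority class.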

Finally, I trained two models, a Decision Tree and a Logistic Regression, and evaluated their predictions on the test set with a classification report.
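The training and evaluation step might look like the sketch below. The data is synthetic (six random columns standing in for the PCA components, with one column shifted so the classes can be told apart), and the split settings and random seeds are assumptions, so the printed numbers say nothing about the real results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, balanced stand-in for the 6 PCA components and the genre labels.
rng = np.random.default_rng(0)
n = 400
y = np.array(["Rock"] * (n // 2) + ["Hip-Hop"] * (n // 2))
X = rng.normal(size=(n, 6))
X[y == "Rock", 0] += 2.0  # shift one component so the classes are separable

# Hold out a test set, keeping the class proportions the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=10, stratify=y
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=10),
    "Logistic Regression": LogisticRegression(random_state=10),
}

reports, scores = {}, {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Precision, recall and F1 per genre on the held-out test set.
    reports[name] = classification_report(y_test, model.predict(X_test))
    scores[name] = model.score(X_test, y_test)  # plain accuracy
    print(name)
    print(reports[name])
```

Comparing the two reports side by side shows whether one model favours a genre more than the other does.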

I hope you enjoyed reading about this project. Thanks to DataCamp for the opportunity to work on this interesting project. It was a great learning experience for me.

Find the code and datasets in my GitHub repository.

Find my LinkedIn profile here.

Manan Jhaveri
