Classifying Song Genres

(Image credits: Akshay Gupta, Instagram: odds_are_awed)

With the boom in technology, music streaming platforms have become part of everyday life, and most of us listen to our favourite songs on them. To give personalized recommendations, these platforms classify songs in various ways, for example by analyzing the audio itself, among other methods.

In my DataCamp project, I worked on data compiled by a research group known as The Echo Nest. The file contains a track_id for each song along with several musical features: acousticness, danceability, energy, instrumentalness, liveness, speechiness, tempo and valence. Most of these features are scores on a scale from 0 to 1, while tempo is measured in beats per minute.

These features are used to classify songs as “hip-hop” or “rock”. A second file holds general information about each song; only its genre column is needed from it, and that column serves as the target variable.
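A minimal sketch of combining the two files into features and a target, assuming both tables share the track_id key. The tiny inline tables below are made-up stand-ins for the real files, and the column names "genre" and "title" are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the two files (made-up values, not the real data):
# audio features keyed by track_id, and general track info containing the genre.
features = pd.DataFrame({
    "track_id": [2, 3, 5, 10],
    "acousticness": [0.42, 0.37, 0.04, 0.95],
    "danceability": [0.68, 0.53, 0.55, 0.31],
    "energy": [0.63, 0.82, 0.90, 0.12],
    "tempo": [166.0, 126.9, 100.3, 81.6],
})
tracks = pd.DataFrame({
    "track_id": [2, 3, 5, 10],
    "title": ["Song A", "Song B", "Song C", "Song D"],
    "genre": ["Hip-Hop", "Hip-Hop", "Rock", "Rock"],
})

# Keep only the genre column (plus the join key) from the general-info file,
# then join the two tables on track_id.
merged = features.merge(tracks[["track_id", "genre"]], on="track_id", how="inner")

X = merged.drop(columns=["track_id", "genre"])  # features used for classification
y = merged["genre"]                             # target variable
print(X.shape, y.tolist())
```

An inner join keeps only tracks that appear in both files, so every row ends up with both features and a label.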

After importing both files and merging them into a single data frame, I checked how strongly the features are correlated with one another and plotted the correlation matrix as a heat-map.
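The correlation check can be sketched as follows. The data here is a synthetic placeholder of independent random values, not the Echo Nest data, and the 0.7 cut-off for a "strong" correlation is a hypothetical choice.

```python
import numpy as np
import pandas as pd

cols = ["acousticness", "danceability", "energy", "instrumentalness",
        "liveness", "speechiness", "tempo", "valence"]

# Synthetic placeholder for the merged feature table (independent random values,
# not the Echo Nest data), just to show the mechanics.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(size=(200, len(cols))), columns=cols)

corr = df.corr()  # pairwise Pearson correlation matrix; this is what the heat-map shows

# Hypothetical cut-off for calling a correlation "strong".
STRONG = 0.7
strong_pairs = [
    (a, b, round(corr.loc[a, b], 2))
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if abs(corr.loc[a, b]) >= STRONG
]
print(strong_pairs or "no strongly correlated feature pairs")
```

To draw the heat-map itself, the `corr` frame can be passed to a plotting function such as matplotlib's `imshow`.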

No strong correlations were found between the features, so none of them could simply be dropped as redundant. To reduce the number of features anyway and help the model perform well, I applied the most common dimensionality reduction technique, Principal Component Analysis (PCA).

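PCA transforms the features into new axes, the principal components, each of which is a weighted combination of the original features. The components are ordered by how much of the total variance in the data they capture. PCA is unsupervised: it looks only at the features, not at the class labels, so the weights describe the variance in the data rather than the separation between classes.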
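To make "weighted combination ordered by variance" concrete, here is a minimal from-scratch PCA with NumPy, using an eigendecomposition of the covariance matrix. The post does not say whether the features were standardized first; this sketch assumes they are, because PCA is sensitive to scale and tempo (in BPM) would otherwise dominate. The data is synthetic.

```python
import numpy as np

def pca(X, n_components):
    """Minimal PCA via eigendecomposition of the covariance matrix."""
    # Standardize each column so that features in different units
    # (e.g. tempo in BPM vs. 0-1 scores) contribute comparably.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1]        # reorder so the largest variance comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()          # share of total variance per component
    weights = eigvecs[:, :n_components].T    # row k = feature weights of component k
    scores = Z @ eigvecs[:, :n_components]   # samples expressed in component coordinates
    return scores, weights, ratio

# Synthetic placeholder data: 4 features, the last one nearly a copy of the first.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=300)

scores, weights, ratio = pca(X, n_components=2)
print(np.round(ratio, 3))        # first component absorbs the shared variance
print(np.round(weights[0], 2))   # its weights load mainly on features 0 and 3
```

Because features 0 and 3 carry almost the same information, PCA folds them into a single component, which is exactly how it reduces dimensionality without discarding much variance.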

To decide how many components to keep, I plotted the number of components (n_components) on the x-axis against the cumulative explained variance on the y-axis. Six components are needed to explain 90% of the variance, so I set n_components to 6.
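The selection rule can be sketched with scikit-learn: fit PCA with all components, take the cumulative sum of `explained_variance_ratio_`, and pick the first count that reaches 90%. The synthetic data (8 features driven by 3 hidden factors) is a placeholder, so the number it picks will not be the post's 6.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder: 8 features driven by 3 hidden factors plus noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 8))
X = latent @ mixing + 0.5 * rng.normal(size=(500, 8))

Z = StandardScaler().fit_transform(X)
pca_full = PCA().fit(Z)  # keep all components to inspect the full variance curve
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
# For the plot: plt.plot(range(1, len(cumulative) + 1), cumulative)

# Smallest number of components whose cumulative explained variance reaches 90%.
n_components = int(np.argmax(cumulative >= 0.90)) + 1
print(n_components, round(float(cumulative[n_components - 1]), 3))
```

scikit-learn can also make this choice itself: passing a float between 0 and 1, e.g. `PCA(n_components=0.90, svd_solver="full")`, keeps just enough components to explain more than that fraction of the variance.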

The dataset contains more Rock entries than Hip-Hop entries. To keep the model from being biased towards the majority class, I reduced the number of Rock entries to match the number of Hip-Hop entries and then performed the final PCA on the balanced data.
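A minimal sketch of this balancing step, assuming the Rock rows to keep are chosen at random (random undersampling). The small frame and the "genre" column name are hypothetical.

```python
import pandas as pd

# Hypothetical imbalanced frame: 6 Rock rows, 2 Hip-Hop rows.
df = pd.DataFrame({
    "energy": [0.9, 0.8, 0.7, 0.95, 0.6, 0.85, 0.5, 0.4],
    "genre": ["Rock"] * 6 + ["Hip-Hop"] * 2,
})

hiphop = df[df["genre"] == "Hip-Hop"]
# Randomly keep only as many Rock rows as there are Hip-Hop rows.
rock = df[df["genre"] == "Rock"].sample(n=len(hiphop), random_state=10)

balanced = pd.concat([rock, hiphop])
print(balanced["genre"].value_counts())
```

Since the class mix changes, the scaler and PCA would then be fitted again on `balanced` rather than on the original data.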

Finally, I trained two models, a Decision Tree and a Logistic Regression classifier, and used them to predict the genres in the test set. I compared their performance using a classification report.
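Here is a sketch of the training and evaluation step with scikit-learn. The data comes from `make_classification` as a synthetic stand-in for the balanced features, and wrapping scaling, PCA (6 components) and the classifier in one pipeline is an assumption on my part, not necessarily how the original project was structured.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder for the balanced feature matrix: 8 features, 2 classes.
X, y = make_classification(n_samples=600, n_features=8, n_informative=5, random_state=10)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=10, stratify=y
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=10),
    "Logistic Regression": LogisticRegression(random_state=10),
}
reports = {}
for name, model in models.items():
    # Scale, reduce to 6 principal components, then classify.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=6), model)
    pipe.fit(X_train, y_train)  # everything is fitted on the training split only
    reports[name] = classification_report(
        y_test, pipe.predict(X_test), target_names=["Hip-Hop", "Rock"]
    )
    print(name)
    print(reports[name])
```

Fitting the scaler and PCA inside the pipeline means the test set never influences the preprocessing, so the report reflects performance on genuinely unseen data.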

I hope you enjoyed reading about this project. Thanks to DataCamp for the opportunity to work on this interesting project; it was a great learning experience for me.

Find the code and datasets in my GitHub repository.

Find my LinkedIn profile here.

Comments

  1. Nice research work and execution, really liked it. This is informative.

  2. Great work, too much depth in research work. Enjoyed reading it 👍