We selected a dataset that was noise-free and of high quality, so our audio clips did not need much
pre-processing and we went straight to extracting features from our .wav files. We
extracted a total of 42 features using Python. The features are: ZCR, Energy, Entropy of Energy,
Spectral Centroid, Spectral Spread, Spectral Entropy, Spectral Flux, Spectral Roll-off, MFCCs, Chroma Vectors,
and Chroma Deviation. After that, we normalized these features and obtained a processed .csv file. We then tested
combinations of these features, observed the effect of each on classifier accuracy, and chose the best combination
of features. The code written for this extraction is
here
The code for classification by KNNs is
here
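As a rough illustration of the extraction and normalization steps described above, the sketch below computes two of the listed time-domain features (ZCR and Energy) for each 0.5 s frame and then normalizes every feature column. It is a minimal sketch, not the project's actual code: the function names, the choice of z-score normalization, the non-overlapping frames, and the synthetic test signal are all assumptions made for illustration.

```python
import math
import statistics


def split_frames(samples, sample_rate, frame_seconds=0.5):
    """Cut a list of samples into non-overlapping frames (trailing partial frame dropped)."""
    size = int(sample_rate * frame_seconds)
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def extract_features(samples, sample_rate, frame_seconds=0.5):
    """Return one [zcr, energy] row per frame."""
    return [[zero_crossing_rate(f), energy(f)]
            for f in split_frames(samples, sample_rate, frame_seconds)]


def normalize_columns(rows):
    """Z-score each feature column (assumed scheme; constant columns map to 0)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]


# Synthetic stand-in for a .wav clip (assumption):
# 1 s of a quiet 100 Hz tone followed by 1 s of a loud 400 Hz tone.
sample_rate = 8000
signal = [0.1 * math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(sample_rate)]
signal += [0.9 * math.sin(2 * math.pi * 400 * n / sample_rate) for n in range(sample_rate)]

features = extract_features(signal, sample_rate)   # one row per 0.5 s frame
normalized = normalize_columns(features)           # rows ready to write to a .csv file
```

With real clips, the samples would come from the .wav files instead of the synthetic signal, and the remaining spectral, MFCC, and chroma features would be appended as further columns before normalization.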
Feature Combinations and Outputs
| Feature combination | Frame size (s) | Classifier | Accuracy |
|---|---|---|---|
| All extracted features | 0.5 | SVM | 0.46 |
| Removing Spectral Centroid and Spread | 0.5 | SVM | 0.48 |
| Removing some of MFCCs | 0.5 | SVM | 0.32 |
| Removing Chroma Vectors* | 0.5 | SVM | 0.76 |
| Removing Chroma Vectors along with Chroma Deviation* | 0.5 | SVM | 0.82 |
| Removing Chroma Vectors along with Chroma Deviation** | 0.5 | SVM | 1.00 |
*We split the dataset 80:20 into train and test sets.
New test data and same trained classifier
**We shuffled the whole train and test data before obtaining this result.
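The comparison procedure above (shuffle, split 80:20, then score a classifier on different feature subsets) can be sketched as follows. This is a hedged illustration under stated assumptions, not the code behind the table: a small pure-Python k-nearest-neighbour classifier stands in for the SVM reported in the table so that the example needs no external libraries, the data is synthetic, and every function name and the subset labels are hypothetical.

```python
import random
from collections import Counter


def shuffled_split(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle rows and labels together, then split them 80:20 (train:test) by default."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in test_idx], [labels[i] for i in test_idx])


def select_columns(rows, keep):
    """Keep only the feature columns whose indices are listed in keep."""
    return [[row[i] for i in keep] for row in rows]


def knn_predict(train_x, train_y, point, k=3):
    """Majority label among the k training rows closest to point (squared Euclidean)."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], point)),
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, p, k) == y for p, y in zip(test_x, test_y))
    return hits / len(test_y)


# Synthetic data (assumption): columns 0 and 1 separate the two classes,
# column 2 is large-scale noise that carries no class information.
rng = random.Random(1)
rows, labels = [], []
for label in (0, 1):
    for _ in range(40):
        rows.append([label + rng.uniform(-0.2, 0.2),
                     label + rng.uniform(-0.2, 0.2),
                     rng.uniform(-50.0, 50.0)])
        labels.append(label)

train_x, train_y, test_x, test_y = shuffled_split(rows, labels)

# Hypothetical feature subsets, mirroring the "remove some features" rows of the table
subsets = {"all features": [0, 1, 2], "without noisy column": [0, 1]}
results = {
    name: accuracy(select_columns(train_x, keep), train_y,
                   select_columns(test_x, keep), test_y)
    for name, keep in subsets.items()
}
for name, score in results.items():
    print(f"{name}: {score:.2f}")
```

The same loop extends to any number of feature subsets; swapping in a different classifier only requires replacing `knn_predict`.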