"Nature alone is antique and the oldest art a mushroom." ~ Thomas Carlyle

From over 50,000 species of mushrooms in North America alone, how would you classify a mushroom as edible or poisonous? Have you ever wondered whether the mushroom you eat is healthy for you? We did exploratory data analysis on the dataset in Python to bust those myths, and then trained several models to classify the mushrooms into edible and poisonous.

The dataset used in this project is mushrooms.csv, which contains 8124 instances of mushrooms with 23 features such as cap-shape, cap-surface, cap-color, bruises, and odor. The target is binary, categorical, and balanced; the original "unknown" class was combined with the poisonous one.

Data Exploration and Processing

The pandas read_csv() function imports a CSV file (in our case, mushrooms.csv) into a DataFrame. The .info() method gives a concise summary of the DataFrame, and the .unique() method gives the unique occurrences in the 'class' column of the dataset. From the df.describe() method, we saw that our columns are of 'object' datatype, so we will have to change the type to 'category' and encode the values numerically before modeling.

A split violin plot compares the distribution of each characteristic for edible and poisonous mushrooms:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("mushrooms.csv")

# Reshape to long format: one row per (class, characteristic, value) triple
df_div = pd.melt(df, "class", var_name="Characteristics")
df_no_class = df.drop(["class"], axis=1)

fig, ax = plt.subplots(figsize=(16, 6))  # figure size chosen for readability
p = sns.violinplot(ax=ax, x="Characteristics", y="value", hue="class",
                   split=True, data=df_div, inner='quartile', palette='Set1')
p.set_xticklabels(rotation=90, labels=list(df_no_class.columns))
# plt.savefig("violinplot.png", format='png', dpi=500, bbox_inches='tight')
```

A correlation heatmap (computed once the features are encoded numerically; see the sketch below) shows how the variables relate to each other:

```python
sns.heatmap(df.corr(), linewidths=.1, cmap="Purples", annot=True, annot_kws={"size": 7})
# plt.savefig("corr.png", format='png', dpi=400, bbox_inches='tight')
```

A variable that correlates weakly with the others can still be one of the most important for classification, so gill-color is worth a closer look. Grouping by gill-color and taking the mean of the encoded class shows which gill colors are most associated with poisonous mushrooms:

```python
df[['class', 'gill-color']].groupby(['gill-color'], as_index=False).mean().sort_values(by='class', ascending=False)
```
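The encoding and train/test-split code is not reproduced in this section, but the models that follow rely on X_train, X_test, y_train, and y_test. Here is a minimal sketch of that step, assuming a per-column LabelEncoder and an 80/20 split; the encoder choice, split ratio, and random_state are assumptions, not necessarily the project's exact settings.

```python
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# df is the DataFrame loaded above with pd.read_csv("mushrooms.csv").
# Every column is categorical, so map each one to integer codes.
# (A per-column LabelEncoder is one simple choice; the project's exact encoding may differ.)
encoded = df.apply(LabelEncoder().fit_transform)

X = encoded.drop(["class"], axis=1)
y = encoded["class"]  # after encoding, 'e' (edible) -> 0 and 'p' (poisonous) -> 1

# The 80/20 split and random_state are illustrative assumptions.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

Once the data is numeric, the correlation heatmap and the group-by mean of class above also become meaningful.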
Classification Models

We trained six classifiers on the encoded data. For each model we printed the test accuracy and a classification report (the Decision Tree, Logistic Regression, KNN, SVM, Naive Bayes, and Random Forest reports, along with the best KNN value and its test accuracy, are shown in the outputs).

Decision Tree Classifier:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.metrics import classification_report, confusion_matrix

dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)
y_pred_dt = dt.predict(X_test)

# Export the fitted tree to Graphviz DOT format for visualization
# (the remaining export_graphviz arguments were truncated in the original)
dot_data = export_graphviz(dt, out_file=None)

# Plot feature importances, sorted from least to most important
feature_importance = dt.feature_importances_
sorted_idx = np.argsort(feature_importance)
plt.barh(range(len(sorted_idx)), feature_importance[sorted_idx], align='center', color="red")

print("Test Accuracy: {}%".format(round(dt.score(X_test, y_test)*100, 2)))
print("Decision Tree Classifier report: \n\n", classification_report(y_test, y_pred_dt))
```

Logistic Regression Classifier:

```python
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(solver="lbfgs", max_iter=500)
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)

print("Test Accuracy: {}%".format(round(lr.score(X_test, y_test)*100, 2)))
print("Logistic Regression Classifier report: \n\n", classification_report(y_test, y_pred_lr))
```

KNN Classifier (we tested a range of k values and kept the best-performing one, whose value and test accuracy were reported in the output):

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()  # n_neighbors left at its default here; the best k was chosen by testing a range of values
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)

print("KNN Classifier report: \n\n", classification_report(y_test, y_pred_knn))
cm = confusion_matrix(y_test, y_pred_knn)
```

SVM Classifier:

```python
from sklearn.svm import SVC

svm = SVC()  # the original SVM hyperparameters are not shown; defaults used here
svm.fit(X_train, y_train)
y_pred_svm = svm.predict(X_test)

print("Test Accuracy: {}%".format(round(svm.score(X_test, y_test)*100, 2)))
print("SVM Classifier report: \n\n", classification_report(y_test, y_pred_svm))
cm = confusion_matrix(y_test, y_pred_svm)
```

Naive Bayes Classifier:

```python
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred_nb = nb.predict(X_test)

print("Test Accuracy: {}%".format(round(nb.score(X_test, y_test)*100, 2)))
print("Naive Bayes Classifier report: \n\n", classification_report(y_test, y_pred_nb))
```

Random Forest Classifier:

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)

print("Test Accuracy: {}%".format(round(rf.score(X_test, y_test)*100, 2)))
print("Random Forest Classifier report: \n\n", classification_report(y_test, y_pred_rf))
```

Finally, we predicted some of the X_test results and matched them with the true labels in y_test. We can now eat healthy mushrooms!!
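The code for that final check is not shown above, so here is a minimal sketch, assuming the Random Forest model fitted earlier; the choice of model and the number of rows displayed are illustrative.

```python
import pandas as pd

# Compare a handful of predictions with the true labels from y_test.
# rf is used here only as an example; any of the fitted models above would work.
comparison = pd.DataFrame({
    "predicted": rf.predict(X_test)[:10],
    "actual": y_test.values[:10],
})
print(comparison)
print("Matched:", (comparison["predicted"] == comparison["actual"]).sum(), "of", len(comparison))
```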