EvoCC: An Open-Source Classification-Based Nature-Inspired Optimization Clustering Framework in Python

EvoCC is a free, open-source, cross-platform framework implemented in Python that combines clustering, classification, and evolutionary computation. It optimizes classification by first clustering the data with a process tuned by evolutionary optimization techniques, then training a separate classification model for each resulting cluster. The framework includes well-known and recent nature-inspired metaheuristic optimization algorithms, well-known datasets, a range of fitness functions and distance measures, and several widely used classifiers. The aim is to give practitioners and researchers a user-friendly, customizable implementation of classification-based nature-inspired optimization clustering algorithms that both experienced and inexperienced users can apply to classification tasks in different domains. The current implementation includes eleven classification algorithms and five evaluation measures. It also builds on the EvoCluster framework, which provides ten metaheuristic optimizers, thirty datasets, five objective functions, more than twenty distance measures, and ten different ways of detecting the number of clusters (the $k$ value).
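The cluster-then-classify idea described above can be sketched with scikit-learn alone. This is a minimal illustration, not EvoCC's actual API: `fit_cluster_classifiers` and `predict_cluster_classifiers` are hypothetical helper names, KMeans stands in for the evolutionary clustering step, and the Iris dataset and decision tree are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_cluster_classifiers(X, y, n_clusters=3, seed=0):
    """Cluster X, then train one classifier per cluster (hypothetical helper)."""
    # Assumption: KMeans replaces the evolutionary clustering step here.
    clusterer = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    models = {}
    for c in range(n_clusters):
        mask = clusterer.labels_ == c
        # Each cluster gets its own model, trained only on its members.
        models[c] = DecisionTreeClassifier(random_state=seed).fit(X[mask], y[mask])
    return clusterer, models


def predict_cluster_classifiers(clusterer, models, X):
    """Route each sample to the classifier of its nearest cluster."""
    assigned = clusterer.predict(X)
    y_pred = np.empty(len(X), dtype=int)
    for c, model in models.items():
        mask = assigned == c
        if mask.any():
            y_pred[mask] = model.predict(X[mask])
    return y_pred


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
clusterer, models = fit_cluster_classifiers(X_train, y_train)
y_pred = predict_cluster_classifiers(clusterer, models, X_test)
print("accuracy:", (y_pred == y_test).mean())
```

In a framework like the one described, the clustering step would instead be driven by a metaheuristic optimizer and a chosen objective function, and the per-cluster classifier would be selectable.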

Team Members:

Anh T. Dang,
Raneem Qaddoura,
Hossam Faris,
Ibrahim Aljarah,
and Pedro Castillo


Features:

  • Ten nature-inspired metaheuristic optimizers are implemented (SSA, PSO, GA, BAT, FFA, GWO, WOA, MVO, MFO, and CS).
  • Eight classifiers from scikit-learn (SVM, linear SVM, SGD, KNN, Naive Bayes, Decision Tree, Multi-Layer Perceptron, and AdaBoost)
  • Five objective functions (SSE, TWCV, SC, DB, and DI)
  • Thirty datasets obtained from scikit-learn, UCI, the School of Computing at the University of Eastern Finland, ELKI, KEEL, and Naftali Harris's blog
  • Fifteen evaluation measures (SSE, Purity, Entropy, HS, CS, VM, AMI, ARI, Fmeasure, TWCV, SC, Accuracy, DI, DB, and Standard Deviation)
  • More than twenty distance measures
  • Ten different ways of detecting the k value
  • Fast array manipulation using NumPy
  • Matrix support using SciPy
  • Simple and efficient tools for prediction using scikit-learn
  • File data analysis and manipulation using pandas
  • Plotting and visualization using Matplotlib
  • More optimizers, objective functions, datasets, and evaluation measures are coming soon.
  • The source code is available on GitHub.
  • A published paper describing the framework is available.
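To make the role of an objective function such as SSE concrete, the sketch below scores a candidate set of centroids and improves it with a toy search loop. It uses only NumPy and is not taken from EvoCC or EvoCluster: the function names are hypothetical, the synthetic data is made up for illustration, and the accept-if-better random perturbation is a simplified stand-in for a real metaheuristic such as PSO or GWO.

```python
import numpy as np


def assign_labels(X, centroids):
    """Assign each point to its nearest centroid (Euclidean distance)."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def sse(X, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return float((dists.min(axis=1) ** 2).sum())


def random_search(X, k, iterations=200, step=0.1, seed=0):
    """Toy stand-in for a metaheuristic: keep a perturbation only if SSE drops."""
    rng = np.random.default_rng(seed)
    # A candidate solution is a (k, n_features) array of centroid coordinates.
    best = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    best_fit = sse(X, best)
    for _ in range(iterations):
        candidate = best + rng.normal(scale=step, size=best.shape)
        fit = sse(X, candidate)
        if fit < best_fit:
            best, best_fit = candidate, fit
    return best, best_fit


# Synthetic two-blob data, for illustration only.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
centroids, fitness = random_search(X, k=2)
labels = assign_labels(X, centroids)
print("SSE:", round(fitness, 3))
```

Encoding a solution as a flat set of centroid coordinates is a common choice because any population-based optimizer can then treat clustering as ordinary continuous minimization; swapping `sse` for another objective changes what "good clustering" means without touching the search loop.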
