Jx-WFST : Wrapper Feature Selection Toolbox



"Toward Talent Scientist: Sharing and Learning Together" --- Jingwei Too



Introduction

  • This toolbox offers 13 wrapper feature selection methods ( PSO, GA, GWO, HHO, BA, WOA, etc. ), and it is simple and easy to implement
  • Demo_PSO provides an example of how to apply PSO to a benchmark dataset
  • The source code of each method is written based on its pseudocode and original paper

Usage

The main function jfs performs the feature selection. You can switch algorithms by changing the pso in from FS.pso import jfs to another abbreviation from the table below

  • If you wish to use particle swarm optimization ( PSO ), write
from FS.pso import jfs
  • If you want to use differential evolution ( DE ), write
from FS.de import jfs

Input

  • feat : feature vector matrix ( Instances x Features )
  • label : label matrix ( Instances x 1 )
  • opts : parameter settings
    • N : number of solutions / population size ( for all methods )
    • T : maximum number of iterations ( for all methods )
    • k : k-value of the k-nearest neighbors ( KNN ) classifier ( for all methods )
    • fold : dictionary holding the train & validation data ( keys 'xt', 'yt', 'xv', 'yv', as in the examples below )

Output

  • Acc : accuracy of the validation model
  • fmdl : feature selection model ( a dictionary containing several results )
    • sf : indices of the selected features
    • nf : number of selected features
    • c : convergence curve
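
A minimal end-to-end sketch tying the inputs and outputs together ( here fold is the train & validation split built exactly as in the full examples below ):

from FS.pso import jfs   # or FS.de, FS.ga, ... -- only the import changes

opts = {'k': 5, 'fold': fold, 'N': 10, 'T': 100}   # common keys for all methods
fmdl = jfs(feat, label, opts)   # feat: ( Instances x Features ), label: ( Instances x 1 )
print(fmdl['sf'])               # indices of the selected features
print(fmdl['nf'])               # number of selected features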

Example 1 : Particle Swarm Optimization ( PSO )

import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from FS.pso import jfs   # change this to switch algorithm 
import matplotlib.pyplot as plt


# load data
data  = pd.read_csv('ionosphere.csv')
data  = data.values
feat  = np.asarray(data[:, 0:-1])   # feature vector
label = np.asarray(data[:, -1])     # label vector

# split data into train & validation (70 -- 30)
xtrain, xtest, ytrain, ytest = train_test_split(feat, label, test_size=0.3, stratify=label)
fold = {'xt':xtrain, 'yt':ytrain, 'xv':xtest, 'yv':ytest}

# parameter
k    = 5     # k-value in KNN
N    = 10    # number of particles
T    = 100   # maximum number of iterations
w    = 0.9
c1   = 2
c2   = 2
opts = {'k':k, 'fold':fold, 'N':N, 'T':T, 'w':w, 'c1':c1, 'c2':c2}

# perform feature selection
fmdl = jfs(feat, label, opts)
sf   = fmdl['sf']

# model with selected features
num_train = np.size(xtrain, 0)
num_valid = np.size(xtest, 0)
x_train   = xtrain[:, sf]
y_train   = ytrain.reshape(num_train)  # flatten labels to 1-D for scikit-learn
x_valid   = xtest[:, sf]
y_valid   = ytest.reshape(num_valid)   # flatten labels to 1-D for scikit-learn

mdl       = KNeighborsClassifier(n_neighbors=k)
mdl.fit(x_train, y_train)

# accuracy
y_pred    = mdl.predict(x_valid)
Acc       = np.sum(y_valid == y_pred)  / num_valid
print("Accuracy:", 100 * Acc)

# number of selected features
num_feat = fmdl['nf']
print("Feature Size:", num_feat)

# plot convergence
curve   = fmdl['c']
curve   = curve.reshape(np.size(curve, 1))    # curve is stored as ( 1 x T ); flatten to 1-D
x       = np.arange(0, opts['T'], 1.0) + 1.0  # iteration numbers 1..T

fig, ax = plt.subplots()
ax.plot(x, curve, 'o-')
ax.set_xlabel('Number of Iterations')
ax.set_ylabel('Fitness')
ax.set_title('PSO')
ax.grid()
plt.show()

Example 2 : Genetic Algorithm ( GA )

import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from FS.ga import jfs   # change this to switch algorithm 
import matplotlib.pyplot as plt


# load data
data  = pd.read_csv('ionosphere.csv')
data  = data.values
feat  = np.asarray(data[:, 0:-1])
label = np.asarray(data[:, -1])

# split data into train & validation (70 -- 30)
xtrain, xtest, ytrain, ytest = train_test_split(feat, label, test_size=0.3, stratify=label)
fold = {'xt':xtrain, 'yt':ytrain, 'xv':xtest, 'yv':ytest}

# parameter
k    = 5     # k-value in KNN
N    = 10    # number of chromosomes
T    = 100   # maximum number of generations
CR   = 0.8
MR   = 0.01
opts = {'k':k, 'fold':fold, 'N':N, 'T':T, 'CR':CR, 'MR':MR}

# perform feature selection
fmdl = jfs(feat, label, opts)
sf   = fmdl['sf']

# model with selected features
num_train = np.size(xtrain, 0)
num_valid = np.size(xtest, 0)
x_train   = xtrain[:, sf]
y_train   = ytrain.reshape(num_train)  # flatten labels to 1-D for scikit-learn
x_valid   = xtest[:, sf]
y_valid   = ytest.reshape(num_valid)   # flatten labels to 1-D for scikit-learn

mdl       = KNeighborsClassifier(n_neighbors=k)
mdl.fit(x_train, y_train)

# accuracy
y_pred    = mdl.predict(x_valid)
Acc       = np.sum(y_valid == y_pred)  / num_valid
print("Accuracy:", 100 * Acc)

# number of selected features
num_feat = fmdl['nf']
print("Feature Size:", num_feat)

# plot convergence
curve   = fmdl['c']
curve   = curve.reshape(np.size(curve, 1))    # curve is stored as ( 1 x T ); flatten to 1-D
x       = np.arange(0, opts['T'], 1.0) + 1.0  # iteration numbers 1..T

fig, ax = plt.subplots()
ax.plot(x, curve, 'o-')
ax.set_xlabel('Number of Iterations')
ax.set_ylabel('Fitness')
ax.set_title('GA')
ax.grid()
plt.show()

Requirement

  • Python 3
  • NumPy
  • Pandas
  • scikit-learn
  • Matplotlib
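
One common way to install the dependencies ( assuming pip is available ):

pip install numpy pandas scikit-learn matplotlib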

List of available wrapper feature selection methods

  • Note that the methods are altered so that they can be used for feature selection tasks
  • The extra parameters are the parameter(s) other than the population size and the maximum number of iterations
  • Click on the name of a method to view how to set its extra parameter(s)
  • Use opts to set the specific parameters ( see the sketch after the table )
  • If you do not set the extra parameters, the algorithm will use its default settings
No. | Abbreviation | Name                          | Year | Extra Parameters
13  | hho          | Harris Hawks Optimization     | 2019 | No
12  | ssa          | Salp Swarm Algorithm          | 2017 | No
11  | woa          | Whale Optimization Algorithm  | 2016 | Yes
10  | sca          | Sine Cosine Algorithm         | 2016 | Yes
09  | ja           | Jaya Algorithm                | 2016 | No
08  | gwo          | Grey Wolf Optimizer           | 2014 | No
07  | fpa          | Flower Pollination Algorithm  | 2012 | Yes
06  | ba           | Bat Algorithm                 | 2010 | Yes
05  | fa           | Firefly Algorithm             | 2010 | Yes
04  | cs           | Cuckoo Search Algorithm       | 2009 | Yes
03  | de           | Differential Evolution        | 1997 | Yes
02  | pso          | Particle Swarm Optimization   | 1995 | Yes
01  | ga           | Genetic Algorithm             | -    | Yes
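
For methods with extra parameters, the extra keys go into the same opts dictionary as the common keys. A minimal sketch for the Whale Optimization Algorithm ( WOA ); the key name 'b' ( the spiral shape constant ) is an assumption here, so check the method's parameter page for the actual key and default:

from FS.woa import jfs

opts = {'k': 5, 'fold': fold, 'N': 10, 'T': 100,
        'b': 1}   # assumed key for WOA's spiral constant ( extra parameter )
fmdl = jfs(feat, label, opts)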