AmPEP - Sequence-based Identification of Antimicrobial Peptides using Distribution Patterns of Amino Acid Properties

AMP or non-AMP? This is the question every drug designer wants answered for the peptide sequences in their hands. Genomes have been screened extensively for AMPs in experiments; however, the process is lengthy, expensive, and in many cases ends in failure. What if your computer analyzed the genome first and generated potential AMP sequences for your further action?

About AmPEP

Antimicrobial peptides (AMPs) are promising candidates in the fight against multidrug-resistant pathogens due to their broad range of activities and low toxicity. However, identification of AMPs through wet-lab experiments is still expensive and time-consuming. AmPEP is an accurate computational method for AMP prediction using the random forest algorithm. The prediction model is based on the distribution patterns of amino acid properties along the sequence:

Fig 1: Encoding a peptide sequence into distribution patterns of
7 types and 3 classes of physicochemical properties.
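
To make the encoding in Fig 1 more concrete, here is a minimal Python sketch of a distribution descriptor for a single property. It assumes the three-class hydrophobicity grouping commonly used in the CTD descriptor family and five positional markers per class (first, 25%, 50%, 75%, and last residue of that class). The function name, the dictionary keys, and the rounding rule are illustrative assumptions; this is not the AmPEP or propy implementation.

```python
# Assumed three-class grouping for hydrophobicity (common CTD convention).
HYDROPHOBICITY_CLASSES = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}


def distribution_descriptor(sequence, classes=HYDROPHOBICITY_CLASSES):
    """Return, for each class, where its residues sit along the sequence.

    For every class, five values are computed: the position (as a percentage
    of the sequence length) of the first residue of that class and of the
    residues marking 25%, 50%, 75%, and 100% of its occurrences.
    A class that never occurs yields 0.0 for all five values.
    """
    seq = sequence.upper()
    length = len(seq)
    features = {}
    for name, letters in classes.items():
        # 1-based positions of all residues that belong to this class
        positions = [i + 1 for i, aa in enumerate(seq) if aa in letters]
        count = len(positions)
        for pct in (0, 25, 50, 75, 100):
            key = f"{name}_{pct:03d}"
            if count == 0:
                features[key] = 0.0
                continue
            # Index of the occurrence marking this percentile (at least the first)
            k = max(1, count * pct // 100)
            features[key] = 100.0 * positions[k - 1] / length
    return features


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the output shape
    for key, value in distribution_descriptor("KKLLGG").items():
        print(key, round(value, 2))
```

Under these assumptions, one property yields 3 classes x 5 positions = 15 values, so seven properties would give 105 values, which is consistent with the full feature count quoted on this page.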

Using our large and diverse collection of AMP/non-AMP data (3268/166791 sequences), we evaluated 19 random forest classifiers with different positive:negative data ratios by 10-fold cross-validation. Our optimal model, AmPEP with a 1:3 data ratio, achieved a very high accuracy of 96%, an MCC of 0.9, an AUC-ROC of 0.99, and a Kappa statistic of 0.9. Descriptor analysis by Pearson correlation coefficients of AMP/non-AMP distributions revealed that reduced feature sets (from the full set of 105 features down to a minimal set of 23) can achieve comparable performance in all aspects except for some reduction in precision. Furthermore, AmPEP outperforms existing methods with respect to accuracy, MCC, and AUC-ROC when tested on the benchmark datasets.
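
For readers less familiar with the reported metrics, the sketch below shows how accuracy, MCC (Matthews correlation coefficient), and Cohen's Kappa can be derived from a binary confusion matrix. The function name and the example counts are hypothetical and chosen only for illustration; they are not results from the AmPEP evaluation.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, MCC, and Cohen's Kappa from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    # Matthews correlation coefficient; defined as 0 when a margin is empty
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's Kappa: observed agreement corrected for chance agreement
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0

    return {"accuracy": accuracy, "mcc": mcc, "kappa": kappa}


if __name__ == "__main__":
    # Hypothetical counts with a 1:3 positive:negative ratio (100 AMP, 300 non-AMP)
    print(binary_metrics(tp=90, fp=10, tn=290, fn=10))
```

Unlike accuracy, MCC and Kappa account for class imbalance, which may be why they are reported alongside accuracy for a dataset with far more non-AMP than AMP sequences.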

Fig 2: The simple yet powerful prediction model based on
the random forest algorithm (implemented in MATLAB).

Data Availability

Our collection of AMP and non-AMP datasets for model assessment:

AMP (3268 sequences): Unique data collected from APD3, CAMPR3, and LAMP. All non-natural amino acids were removed.
non-AMP (166791 sequences): Generated from UniProt sequences without annotation of AMP, membrane, toxic, secretory, defensive, antibiotic, anticancer, antiviral, or antifungal.
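
Because the non-AMP set is much larger than the AMP set, a training set with a fixed positive:negative ratio has to draw only part of the negatives. The sketch below shows one way this could be done, by random sampling without replacement. The function name, the seed handling, and the sampling strategy are assumptions for illustration; this page does not state how the sampling was actually performed.

```python
import random


def subsample_negatives(positives, negatives, neg_per_pos=3, seed=0):
    """Keep all positives and draw neg_per_pos negatives per positive.

    Sampling is random and without replacement (an assumption of this
    sketch). If there are not enough negatives, all of them are returned.
    """
    negatives = list(negatives)
    target = min(len(negatives), neg_per_pos * len(positives))
    rng = random.Random(seed)  # fixed seed for reproducible draws
    return list(positives), rng.sample(negatives, target)


if __name__ == "__main__":
    # Placeholder identifiers standing in for the 3268 AMP and 166791 non-AMP sequences
    amp = [f"amp_{i}" for i in range(3268)]
    non_amp = [f"non_amp_{i}" for i in range(166791)]
    pos, neg = subsample_negatives(amp, non_amp, neg_per_pos=3)
    print(len(pos), len(neg))
```

With the dataset sizes listed above, a 1:3 ratio corresponds to 3268 AMP and 3 x 3268 = 9804 non-AMP sequences.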

Benchmark datasets from Xiao et al. (iAMP-2L) for methods comparison can be downloaded from here.


Sequence encoding was done using the propy 1.0 package. The classifier was implemented in MATLAB using the TreeBagger function of the Statistics and Machine Learning Toolbox.

We have reimplemented the MATLAB source code so that it is easy to share and run on your own machine; it is now available on SourceForge.


Please cite our paper if you use AmPEP or the datasets.

Pratiti Bhadra, Jielu Yan, Jinyan Li, Simon Fong, and Shirley W. I. Siu*
AmPEP: Sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest.
(Scientific Reports, major revision)

Contact Us

Developers: Pratiti Bhadra pratiti.bhadra_[at]_gmail_[dot]_com
            Jielu Yan mb55463_[at]_connect_[dot]_umac_[dot]_mo
Project P.I.: Shirley W. I. Siu shirley_siu_[at]_umac_[dot]_mo
(please remove all underscores)