Here we present NEPdb, a database containing more than 17,000 validated human immunogenic and non-immunogenic neoepitope entries with human leukocyte antigen (HLA) and T cell information, curated from the published literature.
NEPdb also provides pan-cancer predicted neoepitopes derived from common cancer somatic mutations, generated with NetMHCpan 4.0 and HLAthena.
| Data content | HLA-Ⅰ data | HLA-Ⅱ data | Total data |
|---|---|---|---|
| Entry (Total) | 12239 | 5310 | 17549 |
| Entry (Positive) | 155 | 18 | 173 |
| Entry (Negative) | 12084 | 5292 | 17376 |
| Tumor type | 22 | 11 | 23 |
| HLA allele | 60 | 35 | 95 |
| Gene | 2063 | 811 | 2068 |
| Protein sequence | 2332 | 895 | 2337 |
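
The entry counts in the table above can be cross-checked with a few lines of Python. This is only a minimal sketch: the numbers are copied from the table, while the dictionary layout and the `positive_rate` helper are illustrative names that are not part of NEPdb.

```python
# Entry counts copied from the table above; the structure is illustrative.
validated = {
    "HLA-I": {"total": 12239, "positive": 155, "negative": 12084},
    "HLA-II": {"total": 5310, "positive": 18, "negative": 5292},
}


def positive_rate(counts):
    """Fraction of entries that are immunogenic (positive)."""
    return counts["positive"] / counts["total"]


for hla_class, counts in validated.items():
    # Positive and negative entries should partition the class total.
    assert counts["positive"] + counts["negative"] == counts["total"]
    print(f"{hla_class}: {positive_rate(counts):.2%} positive")

# Sum the two classes to reproduce the "Total data" column.
overall = {
    key: sum(counts[key] for counts in validated.values())
    for key in ("total", "positive", "negative")
}
print(f"Overall: {positive_rate(overall):.2%} positive of {overall['total']} entries")
```

The check highlights how imbalanced the validated data are: roughly 1% of all entries are positive. The Tumor type, Gene, and Protein sequence totals are smaller than the sum of the two class columns, which would be consistent with items shared between HLA-Ⅰ and HLA-Ⅱ entries.
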

| Name | Count |
|---|---|
| Cancer gene | 683 |
| Non-synonymous mutation | 16745 |
| Neopeptide | 516036 |
| HLA class Ⅰ allele | 95 |
| Total prediction | 49023420 |
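
The total in the last row appears to be the product of the neopeptide count and the number of HLA class Ⅰ alleles, which would mean that every neopeptide was scored against every allele. That reading is an inference from the counts rather than something stated here, and the variable names below are illustrative.

```python
# Counts copied from the table above.
neopeptides = 516036
hla_class_i_alleles = 95

# Assumption: each neopeptide is paired with each HLA class I allele once.
total_predictions = neopeptides * hla_class_i_alleles
print(total_predictions)  # 49023420
```
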

Overall performance of nine HLA class Ⅰ prediction algorithms (immunogenic data from NEPdb)

Nine commonly used peptide-MHC binding prediction algorithms were each evaluated on the positive samples from our Validated Neopeptide Dataset.
NetMHCcons 1.1, NetMHCpan 4.0, and HLAthena outperformed the other six algorithms under this criterion.
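
The exact evaluation criterion is not spelled out here. One plausible way to compare predictors on positive samples alone is sensitivity: the fraction of validated immunogenic peptide-HLA pairs that a tool calls a binder. The sketch below assumes a percentile-rank cutoff of 2.0, and it uses made-up scores for two hypothetical tools (`tool_A`, `tool_B`); none of these values come from NEPdb.

```python
from typing import Dict, List


def sensitivity(percent_ranks: List[float], threshold: float = 2.0) -> float:
    """Fraction of positive samples whose predicted %rank is <= threshold.

    The 2.0 cutoff is an assumed binder threshold, not NEPdb's criterion.
    """
    if not percent_ranks:
        raise ValueError("no predictions supplied")
    hits = sum(1 for rank in percent_ranks if rank <= threshold)
    return hits / len(percent_ranks)


# Made-up %rank values for five positive peptide-HLA pairs (illustration only).
toy_predictions: Dict[str, List[float]] = {
    "tool_A": [0.1, 0.8, 1.5, 3.0, 0.4],
    "tool_B": [0.5, 2.5, 4.0, 1.9, 6.0],
}

# Rank the tools from highest to lowest sensitivity.
ranking = sorted(
    ((name, sensitivity(ranks)) for name, ranks in toy_predictions.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Because only positive samples are scored, a metric like this rewards recall and says nothing about false positives, so it is best read alongside an evaluation that also includes the negative entries.
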