# BioMedR

## Introduction

BioMedR is an R package, distributed on CRAN, for generating various molecular representations of chemicals, proteins, DNAs/RNAs and their interactions. See `vignette('BioMedR')` for the comprehensive user guide.

## Installation

To install the BioMedR package in R, simply type:

    install.packages('BioMedR')

Several dependencies of the BioMedR package may require system-level libraries. Check the manuals of the corresponding packages for detailed installation guides.

## Features

1) BioMedR implements and integrates state-of-the-art protein sequence descriptors and molecular descriptors/fingerprints in R. For protein sequences, the BioMedR package can:

  * Calculate six protein descriptor groups composed of fourteen types of commonly used structural and physicochemical descriptors that include 9920 descriptors.

  * Calculate six types of generalized scales-based descriptors derived by various dimensionality reduction methods for proteochemometric (PCM) modeling.

  * Compute parallelized pairwise similarities, derived from protein sequence alignment and Gene Ontology (GO) semantic similarity measures, within a list of proteins.
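
As a rough illustration of what a simple protein sequence descriptor looks like, the sketch below computes the amino acid composition (the fraction of each of the 20 standard residues) in base R. The function name `aa_composition` and the toy sequence are hypothetical and are not part of BioMedR; see the vignette for the package's own descriptor functions.

```r
# Minimal sketch in base R (not BioMedR's implementation).
# Amino acid composition: fraction of each of the 20 standard residues.
aa_composition <- function(protein) {
  aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]
  residues <- strsplit(toupper(protein), "")[[1]]
  # Residues outside the 20 standard letters are not counted,
  # but they still contribute to the sequence length.
  counts <- table(factor(residues, levels = aa))
  setNames(as.numeric(counts) / length(residues), aa)
}

comp <- aa_composition("MKTAYIAKQR")  # toy sequence, not a real protein
```

The result is a named numeric vector of length 20, so sequences of different lengths map to descriptors of the same size.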

2) For small molecules, the BioMedR package can:

  * Calculate 307 molecular descriptors (2D/3D), including constitutional, topological, geometrical, and electronic descriptors, etc.

  * Calculate more than ten types of molecular fingerprints, including E-state fingerprints, MACCS keys, etc., and parallelized chemical similarity search.
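
Fingerprint-based similarity search is commonly built on the Tanimoto coefficient between two binary fingerprints. The sketch below shows that calculation in base R; the function name `tanimoto` and the 8-bit toy fingerprints are hypothetical, and whether BioMedR uses this exact measure by default is an assumption to check against the vignette.

```r
# Minimal sketch in base R (not BioMedR's implementation).
# Tanimoto coefficient: shared "on" bits divided by bits set in either vector.
tanimoto <- function(a, b) {
  stopifnot(length(a) == length(b))
  a <- as.logical(a)
  b <- as.logical(b)
  n_union <- sum(a | b)
  if (n_union == 0) return(0)  # convention chosen here for two empty fingerprints
  sum(a & b) / n_union
}

# Toy fingerprints; real ones (e.g. MACCS keys) have many more bits.
fp1 <- c(1, 0, 1, 1, 0, 0, 1, 0)
fp2 <- c(1, 1, 1, 0, 0, 0, 1, 0)
sim <- tanimoto(fp1, fp2)  # 3 shared bits / 5 bits in the union = 0.6
```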
 

3) For DNA/RNA molecules, the BioMedR package can:

  * Calculate three nucleic acid composition features describing the local sequence information by means of k-mers (subsequences of DNA sequences).

  * Calculate six autocorrelation features describing the level of correlation between two oligonucleotides along a DNA sequence in terms of their specific physicochemical properties.
 
  * Calculate two pseudo nucleotide composition features, which can be used to represent a DNA sequence with a discrete model or vector yet still keep considerable sequence order information, particularly the global or long-range sequence order information, via the physicochemical properties of its constituent oligonucleotides.

  * Compute parallelized pairwise similarities, derived from fingerprints and maximum common substructure search, within a list of small molecules.
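
The k-mer composition mentioned above can be sketched in a few lines of base R: slide a window of length k along the sequence and count each possible subsequence. The function name `kmer_counts` and the toy sequence are hypothetical and are not BioMedR's API.

```r
# Minimal sketch in base R (not BioMedR's implementation).
# Count every overlapping k-mer in a DNA sequence.
kmer_counts <- function(dna, k = 2) {
  stopifnot(nchar(dna) >= k)
  bases <- c("A", "C", "G", "T")
  # Enumerate all 4^k possible k-mers in lexicographic order.
  grid <- expand.grid(rep(list(bases), k), stringsAsFactors = FALSE)
  all_kmers <- apply(grid, 1, function(x) paste(rev(x), collapse = ""))
  # Extract the overlapping windows of length k.
  starts <- seq_len(nchar(dna) - k + 1)
  observed <- substring(dna, starts, starts + k - 1)
  counts <- table(factor(observed, levels = all_kmers))
  setNames(as.integer(counts), all_kmers)
}

counts <- kmer_counts("ACGTACGT", k = 2)  # toy sequence
```

Dividing `counts` by `sum(counts)` gives relative frequencies, which makes sequences of different lengths comparable.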

4) BioMedR provides nine kinds of descriptor classes for proteochemometric (PCM) modeling, derived by principal component analysis, factor analysis, and so on.
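
To make the idea of scales-based descriptors concrete, the sketch below derives per-residue scales from a property matrix with principal component analysis and maps a sequence onto them. The property matrix holds random placeholder numbers, not real physicochemical data, and all variable names are hypothetical; this only illustrates the general technique, not BioMedR's own functions.

```r
# Minimal sketch in base R (not BioMedR's implementation).
set.seed(1)
aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# Toy property matrix: 20 residues x 5 made-up properties
# (random numbers standing in for real physicochemical values).
props <- matrix(rnorm(20 * 5), nrow = 20,
                dimnames = list(aa, paste0("prop", 1:5)))

# PCA on the centered and scaled properties; keep the first 3 components
# as the per-residue "scales".
pca <- prcomp(props, center = TRUE, scale. = TRUE)
scales <- pca$x[, 1:3]

# Replace each residue of a toy sequence by its scale values.
residues <- strsplit("MKTAYIAKQR", "")[[1]]
seq_scales <- scales[residues, , drop = FALSE]  # 10 residues x 3 scales
```

The resulting matrix has one row per residue, so a further step (for example an autocovariance transform) is usually needed to obtain a fixed-length descriptor.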

5) By combining various types of descriptors for drugs, proteins and DNA/RNA in different ways, interaction descriptors representing protein-protein, compound-compound, DNA-DNA, compound-DNA, compound-protein and DNA-protein interactions can be conveniently generated with BioMedR, including:

  * Two types of compound-protein interaction (CPI) descriptors
  * Two types of compound-DNA interaction (CDI) descriptors 
  * Two types of DNA-protein interaction (DPI) descriptors
  * Three types of protein-protein interaction (PPI) descriptors
  * Three types of compound-compound interaction (CCI) descriptors
  * Three types of DNA-DNA interaction (DDI) descriptors
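
Two common ways of turning a pair of descriptor vectors into one interaction descriptor are concatenation and the tensor (outer) product. The sketch below shows both in base R with toy values; whether these correspond exactly to the types listed above is an assumption to verify in the vignette, and all variable names are hypothetical.

```r
# Minimal sketch in base R (not BioMedR's implementation).
compound <- c(c1 = 0.5, c2 = 1.0, c3 = 2.0)  # toy compound descriptor
protein  <- c(p1 = 2.0, p2 = 4.0)            # toy protein descriptor

# Concatenation: length = length(compound) + length(protein).
combined <- c(compound, protein)

# Tensor product: every compound feature times every protein feature,
# flattened column by column; length = length(compound) * length(protein).
tensor <- as.vector(outer(compound, protein))
```

Concatenation keeps the descriptor short, while the tensor product encodes every pairwise feature combination at the cost of a much longer vector.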

6) Several useful auxiliary utilities are also shipped with BioMedR:

  * Parallelized molecule and protein sequence retrieval from several online databases, such as PubChem, ChEMBL, KEGG, DrugBank, UniProt, RCSB PDB, GenBank, etc.

  * Loading molecules stored in SMILES/SDF files and loading protein sequences from FASTA/PDB files

  * Molecular file format conversion
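
For readers unfamiliar with the FASTA format, the sketch below parses FASTA-formatted lines into a named character vector using base R only. The function name `parse_fasta` and the two toy records are hypothetical; BioMedR ships its own loaders, which are documented in the vignette.

```r
# Minimal sketch in base R (not BioMedR's implementation).
# Each record starts with a ">" header line followed by sequence lines.
parse_fasta <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]  # drop empty lines
  is_header <- startsWith(lines, ">")
  stopifnot(length(lines) > 0, is_header[1])
  # Assign every line to the record opened by the most recent header.
  record <- factor(cumsum(is_header), levels = seq_len(sum(is_header)))
  seqs <- tapply(lines[!is_header], record[!is_header], paste, collapse = "")
  setNames(as.character(seqs), sub("^>", "", lines[is_header]))
}

# Toy input; in practice the lines would come from readLines() on a file.
fasta <- c(">seq1 toy", "MKTAY", "IAKQR", ">seq2 toy", "ACDEF")
seqs <- parse_fasta(fasta)
```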

The computed protein sequence descriptors, molecular descriptors/fingerprints, interaction descriptors and pairwise similarities are widely used in research fields relevant to drug discovery, primarily bioinformatics, chemoinformatics, proteochemometrics and chemogenomics.

## Links

  * CRAN Page: https://cran.r-project.org/web/packages/BioMedR/index.html

  * Track Devel: https://github.com/wind22zhu/BioMedR

  * Report Bugs: https://github.com/wind22zhu/BioMedR/issues

## Contact

The BioMedR package is developed by the Computational Biology and Drug Design Group, Central South University, China.
  
  * Minfeng Zhu <wind2zhu@163.com> 

  * Dong Jie <biomed@csu.edu.cn>

  * Dongsheng Cao <oriental-cds@163.com>

