MCScanx: Multiple Collinearity Scan toolkit

About

MCScan is an algorithm that scans multiple genomes or subgenomes to identify putative homologous chromosomal regions and then aligns these regions using genes as anchors. The MCScanX toolkit implements an adjusted MCScan algorithm for detecting synteny and collinearity, and extends the software with 15 utility programs for display and further analysis. Compared with MCScan version 0.8, MCScanX has the following new features:

1) Simplified usage: The main program (MCScanX) takes only an m8-format BLASTP file and a simplified "gff" file as inputs.

2) More flexible options: The user can set the maximum number of gaps allowed between adjacent anchors, and outputs can be restricted to intra-species or inter-species analysis.

3) Better display: Alignments of multiple collinear regions against reference chromosomes are output as HTML files, which can be viewed through a web browser. Duplication depth at each locus is shown in the first column. Tandem genes are marked in red.

4) Adjusted MCScan algorithm: Distances between genes can be calculated as differences in gene ranks, in addition to base positions, to mitigate the effects of different gene densities among species. A simplified multi-alignment procedure allows MCScanX to generate results for 8 angiosperm genomes in about 2 hours (see Program runtime).

5) Incorporation of 15 analysis tools: These tools facilitate further exploration of evolutionary questions based on the identified synteny and collinearity, at both the genome and gene family levels.
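
The workflow described in the About paragraph and in items 1, 2 and 4 (read a BLASTP m8 file and a simplified gff, measure distances in gene ranks, and chain anchors subject to a maximum gap) can be sketched in Python as below. This is a simplified illustration, not MCScanX's own code: the four-column tab-separated gff layout (chromosome, gene, start, end) is an assumption here (check the Manual for the exact format), only same-orientation collinearity on one chromosome pair is handled, and the function names, scoring scheme (match_score, gap_penalty) and default values are hypothetical.

```python
from collections import defaultdict


def gene_ranks(gff_lines):
    """Map gene -> (chromosome, rank) from simplified gff rows laid out as
    chromosome<TAB>gene<TAB>start<TAB>end (assumed layout)."""
    by_chr = defaultdict(list)
    for line in gff_lines:
        if not line.strip():
            continue
        chrom, gene, start, _end = line.rstrip("\n").split("\t")[:4]
        by_chr[chrom].append((int(start), gene))
    ranks = {}
    for chrom, genes in by_chr.items():
        # A gene's rank is its 0-based order along the chromosome by start position
        for rank, (_, gene) in enumerate(sorted(genes)):
            ranks[gene] = (chrom, rank)
    return ranks


def anchor_points(m8_lines, ranks, chrom_a, chrom_b):
    """Convert 12-column m8 BLASTP rows (query, subject, ...) into
    (rank on chrom_a, rank on chrom_b) points; all other hits are skipped."""
    pts = set()
    for line in m8_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue
        a, b = ranks.get(fields[0]), ranks.get(fields[1])
        if a is not None and b is not None and a[0] == chrom_a and b[0] == chrom_b:
            pts.add((a[1], b[1]))
    return sorted(pts)


def best_collinear_chain(anchors, max_gaps=2, match_score=50, gap_penalty=1):
    """Dynamic-programming chain of anchors with increasing ranks on both
    chromosomes, skipping at most max_gaps genes between consecutive anchors.
    Scoring parameters and defaults are hypothetical, chosen for toy data."""
    pts = sorted(set(anchors))
    if not pts:
        return []
    score = [match_score] * len(pts)   # best chain score ending at pts[j]
    prev = [-1] * len(pts)             # back-pointer for traceback
    for j, (qj, sj) in enumerate(pts):
        for i in range(j):
            qi, si = pts[i]
            dq, ds = qj - qi - 1, sj - si - 1   # genes skipped on each side
            if qj > qi and sj > si and dq <= max_gaps and ds <= max_gaps:
                cand = score[i] + match_score - gap_penalty * max(dq, ds)
                if cand > score[j]:
                    score[j], prev[j] = cand, i
    # Trace back from the highest-scoring end point
    j = max(range(len(pts)), key=score.__getitem__)
    chain = []
    while j != -1:
        chain.append(pts[j])
        j = prev[j]
    return chain[::-1]


if __name__ == "__main__":
    # Toy data (hypothetical genes): a2 and a3 are ~48 kb apart but adjacent in rank
    gff = ["x1\ta1\t100\t900", "x1\ta2\t1500\t2300",
           "x1\ta3\t50000\t51000", "x1\ta4\t52000\t53000",
           "y1\tb1\t200\t800", "y1\tb2\t1200\t2000",
           "y1\tb3\t3000\t3900", "y1\tb4\t4100\t4800"]
    rest = "\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-80\t500"  # m8 columns 3-12
    hits = [q + "\t" + s + rest for q, s in
            [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4"), ("a4", "b1")]]
    ranks = gene_ranks(gff)
    pts = anchor_points(hits, ranks, "x1", "y1")
    print(best_collinear_chain(pts))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```

Because distances are measured in gene ranks, the large base-pair gap between a2 and a3 does not break the chain, while the stray hit (a4, b1) is left out because it does not continue the diagonal.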

3-28-2013: MCScanX-transposed has been published in Bioinformatics. In addition to analyzing transposed gene duplications, MCScanX-transposed can generate duplicated gene pairs of different modes, including WGD (whole-genome duplication), tandem, proximal and transposed, for a genome. duplicate_gene_classifier users may use MCScanX-transposed to generate duplicated gene pairs.

11-13-2012: An MCScanX-derived software package named MCScanX-transposed, which is designed to analyze transposed gene duplications, is now available.

08-05-2012: There was a logic error in duplicate_gene_classifier that reduced the numbers of tandem and proximal duplicates; this error has now been corrected. Please re-download the MCScanX package if you previously used duplicate_gene_classifier.

03-20-2012: A downstream analysis tool named family_tree_plotter_chr has been incorporated into the MCScanX package. family_tree_plotter_chr displays a gene family tree on which collinear and tandem gene pairs are connected by red and blue curves, respectively, and each node (gene) of the tree is linked to its position on chromosomes whose synteny is also shown. Its aim is to relate gene family evolution to genome evolution.

Citation: Wang Y, Tang H, DeBarry JD, Tan X, Li J, Wang X, Lee TH, Jin H, Marler B, Guo H, Kissinger JC, Paterson AH. (2012) MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res, 40(7): e49.

For help please contact Dr. Yupeng Wang: wyp1125@gmail.com

Download

MCScanX.zip

Installation

MCScanX should be run from the command line on Mac OS (via X11) or Linux. On Mac OS, Xcode (http://developer.apple.com/xcode/) should be installed before installing the MCScanX package. On Linux, the Java SE Development Kit (JDK) and "libpng" should be installed first. To install MCScanX, unpack the package and type "make".

Documentation

Documents: Manual; MCScanX structure

Examples
Program name | Species | Computation time (minutes)
MCScanX | Arabidopsis (at) | <1
MCScanX | Arabidopsis (at) and grape (vv) | <1
MCScanX_h | Arabidopsis (at), soybean (gm), poplar (pt) and grape (vv) | ~2
duplicate_gene_classifier | Arabidopsis (at) | <1
detect_collinear_tandem_arrays | Arabidopsis (at) and grape (vv) | <1
dissect_multiple_alignments | Arabidopsis (at) and grape (vv) | <1
dot_plotter, dual_synteny_plotter, circle_plotter and bar_plotter | Rice (os) and sorghum (sb) | <1
add_ka_and_ks_to_collinearity | Arabidopsis (at) | ~5
group_collinear_genes | Arabidopsis (at) | <1
detect_collinearity_within_gene_families | Arabidopsis (at) | <1
origin_enrichment_analysis | Arabidopsis (at) | <1
family_circle_plotter | Arabidopsis (at) | <1
family_tree_plotter | Arabidopsis (at) | <1

Program runtime

The computation time of MCScanX increases exponentially with the number of genomes involved. MCScanX is quite efficient for up to 10 genomes. The following computation times were measured on a Linux system running on a single Intel(R) Xeon(R) CPU (2.33 GHz).

Number of angiosperm genomes | Computation time
1 | <1 min
2 | <1 min
4 | ~2 min
8 | ~2 hours
10 | ~12 hours
15 | ~117 hours
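
Because the runtime table gives only a handful of measured points, a rough estimate for an intermediate number of genomes can be obtained by interpolating the logarithm of the runtime linearly between neighbouring rows. The sketch below is an illustration under that assumption only: the values are read off the table (~2 hours taken as 120 minutes, and so on), the two "<1 min" rows are left out because they are only upper bounds, the function name is hypothetical, and any estimate applies only to comparable data on similar hardware.

```python
import bisect
import math

# Runtimes in minutes, read off the table: "~2 min" -> 2, "~2 hours" -> 120,
# "~12 hours" -> 720, "~117 hours" -> 7020. The "<1 min" rows are omitted
# because they are only upper bounds.
RUNTIME_MIN = {4: 2, 8: 120, 10: 12 * 60, 15: 117 * 60}


def estimate_runtime(n_genomes):
    """Rough runtime estimate in minutes, interpolating log(runtime) linearly
    between the measured genome counts. Returns None outside 4..15, where the
    table gives no basis for an estimate (no extrapolation)."""
    if n_genomes in RUNTIME_MIN:
        return float(RUNTIME_MIN[n_genomes])
    xs = sorted(RUNTIME_MIN)
    if not xs[0] < n_genomes < xs[-1]:
        return None
    i = bisect.bisect_left(xs, n_genomes)   # index of first measured count above n
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = math.log(RUNTIME_MIN[x0]), math.log(RUNTIME_MIN[x1])
    frac = (n_genomes - x0) / (x1 - x0)
    return math.exp(y0 + frac * (y1 - y0))


if __name__ == "__main__":
    for n in (6, 9, 12):
        print(n, "genomes: about", round(estimate_runtime(n)), "minutes")
```

Interpolating on a log scale follows the roughly multiplicative growth in the table; a straight-line interpolation in minutes would, for example, place 9 genomes exactly halfway between 2 and 12 hours.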
Yupeng Wang | Plant Genome Mapping Laboratory | University of Georgia