ArrayMining - Online Microarray Data Mining
Ensemble and Consensus Analysis Methods for Gene Expression Data
Contact
The project and the website are maintained by Enrico Glaab, School of Computer Science, Nottingham University, UK (webmaster@arraymining.net).
Graph Layout Algorithm - Kamada-Kawai
Short description: The Kamada-Kawai algorithm is a stochastic, force-based graph layout algorithm. The method attempts to reduce the number of edge crossings and to maintain the "total balance of the graph" (measured by the sum of squared differences between the "ideal" distance and the actual distance over all pairs of vertices). The following default parameter values are used here: number of iterations (1000), standard deviation of position change proposals (size(graph)/4), initial temperature (10), cooling exponent (0.99), vertex attraction constant (size(graph)^2).
References:
- Tomihisa Kamada, Satoru Kawai. "An Algorithm for Drawing General Undirected Graphs". Information Processing Letters, 31:7-15, 1988.
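The "total balance" quantity described above can be sketched in a few lines. This is a simplified illustration, not the implementation used by the web service: it assumes the "ideal" distance between two vertices is their hop count in the graph, it follows the unweighted wording of the description (the published algorithm additionally weights each pair), and the function names are hypothetical.

```python
from collections import deque
import math

def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def layout_stress(adjacency, positions):
    """Sum over vertex pairs of (actual distance - ideal distance)^2.

    Assumption: the ideal distance is the hop count between two vertices;
    pairs in different components are skipped.
    """
    vertices = sorted(adjacency)
    total = 0.0
    for i, u in enumerate(vertices):
        ideal = shortest_path_lengths(adjacency, u)
        for v in vertices[i + 1:]:
            if v in ideal:
                actual = math.dist(positions[u], positions[v])
                total += (actual - ideal[v]) ** 2
    return total

# A path 0 - 1 - 2 drawn as a straight line versus folded onto itself.
path = {0: [1], 1: [0, 2], 2: [1]}
straight = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0)}
folded = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 0.0)}
print(layout_stress(path, straight))  # 0.0
print(layout_stress(path, folded))    # 4.0
```

A layout algorithm of this family searches for vertex positions that make this value small; the straight drawing of the path is perfectly balanced, while the folded one is penalised because vertices 0 and 2 coincide.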
Graph Layout Algorithm - Fruchterman-Reingold
Short description: The graph layout algorithm by Fruchterman and Reingold is a force-directed method for general undirected graphs. Vertices are modelled as physical objects with a repelling force between them, while edges are represented by springs that exert attractive forces between the corresponding vertices (proportional to the edge weights). The energy of the system is minimized using simulated annealing until a local minimum is reached, which corresponds to a more readable graph layout. The parameters and their default values are: number of iterations (500), maximum change (size(graph)), area (size(graph)^2), cooling exponent (3), cancellation radius (area*size(graph)).
References:
- T. Fruchterman, E. Reingold. "Graph Drawing by Force-directed Placement". Software: Practice and Experience, 21(11), 1991.
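The interplay of repulsion, spring attraction and cooling can be illustrated with a minimal sketch. It is not the implementation behind the web service: it ignores edge weights and the cancellation radius, uses a linear cooling schedule and a deterministic starting layout, and the function name and default values are assumptions chosen for illustration. The force formulas (repulsion k^2/d, attraction d^2/k) follow the original paper.

```python
import math

def fruchterman_reingold(n, edges, iterations=50, area=1.0):
    """Lay out vertices 0..n-1 (n >= 1); returns a list of [x, y] positions."""
    k = math.sqrt(area / n)  # ideal distance between neighbouring vertices
    # Deterministic start on a small circle (assumption; random starts are common).
    pos = [[0.1 * math.cos(2 * math.pi * i / n),
            0.1 * math.sin(2 * math.pi * i / n)] for i in range(n)]
    start_temperature = math.sqrt(area) / 10
    for step in range(iterations):
        # Linear cooling: the largest allowed move shrinks every iteration.
        temperature = start_temperature * (1 - step / iterations)
        disp = [[0.0, 0.0] for _ in range(n)]
        # Repulsive force k^2 / d between every pair of vertices.
        for i in range(n):
            for j in range(i + 1, n):
                dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
                d = max(math.hypot(dx, dy), 1e-9)
                force = k * k / d
                disp[i][0] += dx / d * force
                disp[i][1] += dy / d * force
                disp[j][0] -= dx / d * force
                disp[j][1] -= dy / d * force
        # Attractive force d^2 / k along every edge.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            force = d * d / k
            disp[u][0] -= dx / d * force
            disp[u][1] -= dy / d * force
            disp[v][0] += dx / d * force
            disp[v][1] += dy / d * force
        # Move each vertex along its net force, capped by the temperature.
        for i in range(n):
            length = max(math.hypot(disp[i][0], disp[i][1]), 1e-9)
            move = min(length, temperature)
            pos[i][0] += disp[i][0] / length * move
            pos[i][1] += disp[i][1] / length * move
    return pos

# Two connected vertices settle close to the ideal distance k = sqrt(area / n).
layout = fruchterman_reingold(2, [(0, 1)])
```

For two connected vertices the two forces cancel at distance k, so the final layout places them roughly that far apart; the shrinking temperature damps the oscillation around this equilibrium.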
Graph Layout Algorithm - DrL
Short description: The DrL layout algorithm by Shawn Martin et al. is a fast graph visualization method targeted specifically toward large and complex networks. The recursive multi-level force-directed algorithm is based on the VxOrd ordination method, which places nodes in clusters on a 2D plane such that the sum of a repulsive and an attractive force is minimized (see VxInsight program). The algorithm passes through different phases before reaching an equilibrium, including a liquid phase, an expansion phase, a damping phase, a cool-down phase and a simmer phase.
References:
- Shawn Martin. http://www.cs.sandia.gov/~smartin/software.html.
Graph Layout Algorithm - Circle
Short description: The circle approach is one of the simplest graph layouts, placing all nodes equidistantly on a unit circle. This layout can be useful to obtain a quick qualitative overview of the connectivity of the network.
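As a small illustration of the circle layout, the sketch below places n nodes at equal angular steps on the unit circle. The function name is hypothetical and the starting angle (node 0 at angle zero) is an arbitrary choice.

```python
import math

def circle_layout(n_nodes):
    """Return (x, y) positions for n_nodes placed at equal angles on the unit circle."""
    return [
        (math.cos(2 * math.pi * i / n_nodes), math.sin(2 * math.pi * i / n_nodes))
        for i in range(n_nodes)
    ]

# Four nodes end up (up to rounding) at (1, 0), (0, 1), (-1, 0) and (0, -1).
positions = circle_layout(4)
```

Because the positions ignore the edges entirely, the layout is cheap to compute and edge density is easy to judge by eye, but it says nothing about cluster structure.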
Graph Layout Algorithm - Graphopt
Short description: The Graphopt layout approach by Michael Schmuhl is a force-directed method that simulates attracting forces (springs) and repelling forces (equal electric charges) between the nodes to compute an optimal placement corresponding to the system's equilibrium. In contrast to methods based on simulated annealing, however, the simulation is not guaranteed to reach this stable equilibrium.
References:
- Michael Schmuhl. http://www.schmuhl.org/graphopt.
Graph Layout Algorithm - Singular Value Decomposition
Short description: The SVD method computes a graph layout based on a Singular Value Decomposition of the network's adjacency matrix. The method is applied separately to each graph component, and the individual components are then merged into a common layout.
Golub et al. (1999) Leukemia data set
Short description: Analysis of patients with acute lymphoblastic leukemia (ALL, 1) or acute myeloid leukemia (AML, 0).
Sample types: ALL, AML
No. of genes: 7129
No. of samples: 72 (class 0: 25, class 1: 47)
Normalization: VSN (Huber et al., 2002)
References:
- Golub et al., Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring, Science (1999), 531-537
- Huber et al., Variance stabilization applied to microarray data calibration and to the quantification of differential expression, Bioinformatics (2002) 18 Suppl.1 96-104
van't Veer et al. (2002) Breast cancer data set
Short description: Samples from breast cancer patients were subdivided into a "good prognosis" (0) and a "poor prognosis" (1) group depending on the occurrence of distant metastases within 5 years. The data set was pre-processed as described in the original paper and was obtained from the R package "DENMARKLAB" (Fridlyand and Yang, 2004).
Sample types: good prognosis, poor prognosis
No. of genes: 4348 (pre-processed)
No. of samples: 97 (class 0: 51, class 1: 46)
Normalization: see reference (van't Veer et al., 2002)
References:
- van't Veer et al., Gene expression profiling predicts clinical outcome of breast cancer, Nature (2002), 415, p. 530-536
- Fridlyand, J. and Yang, J.Y.H. (2004) Advanced microarray data analysis: class discovery and class prediction (http://genome.cbs.dtu.dk/courses/norfa2004/Extras/DENMARKLAB.zip)
Yeoh et al. (2002) Leukemia multi-class data set
Short description: A multi-class data set for the prediction of the disease subtype in pediatric acute lymphoblastic leukemia (ALL).
No. of samples: 327
Normalization: VSN (Huber et al., 2002)
References:
- Yeoh et al. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell. March 2002. 1: 133-143
- Huber et al., Variance stabilization applied to microarray data calibration and to the quantification of differential expression, Bioinformatics (2002) 18 Suppl.1 96-104
Alon et al. (1999) Colon cancer data set
Short description: Analysis of colon cancer tissues (1) and normal colon tissues (0).
Sample types: tumour, healthy
No. of genes: 2000
No. of samples: 62 (class 1: 40, class 0: 22)
Normalization: VSN (Huber et al., 2002)
References:
- U. Alon, N. Barkai, D. Notterman, K. Gish, S. Ybarra, D. Mack, and A. Levine, “Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays,” Proceedings of the National Academy of Sciences (1999), vol. 96, pp. 6745–6750
- Huber et al., Variance stabilization applied to microarray data calibration and to the quantification of differential expression, Bioinformatics (2002) 18 Suppl.1 96-104
Singh et al. (2002) Prostate cancer data set
Short description: Analysis of prostate cancer tissues (1) and normal tissues (0).
Sample types: tumour, healthy
No. of genes: 2135 (pre-processed)
No. of samples: 102 (class 1: 52, class 0: 50)
Normalization: GeneChip RMA (GCRMA)
References:
- D. Singh, P.G. Febbo, K. Ross, D.G. Jackson, J. Manola, C. Ladd, P. Tamayo, A.A. Renshaw, A.V. D’Amico, J.P. Richie, et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2): pp. 203–209, 2002
- Z. Wu and R.A. Irizarry. Stochastic Models Inspired by Hybridization Theory for Short Oligonucleotide Arrays. Journal of Computational Biology, 12(6): pp. 882–893, 2005
Shipp et al. (2002) B-Cell Lymphoma data set
Short description: Analysis of Diffuse Large B-Cell lymphoma samples (1) and follicular B-Cell lymphoma samples (0).
Sample types: DLBCL, follicular
No. of genes: 2647 (pre-processed)
No. of samples: 77 (class 1: 58, class 0: 19)
Normalization: VSN (Huber et al., 2002)
References:
- M.A. Shipp, K.N. Ross, P. Tamayo, A.P. Weng, J.L. Kutok, R.C.T. Aguiar, M. Gaasenbeek, M. Angelo, M. Reich, G.S. Pinkus, et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medicine, 8(1): pp. 68–74, 2002
- Huber et al., Variance stabilization applied to microarray data calibration and to the quantification of differential expression, Bioinformatics (2002) 18 Suppl.1 96-104
Shin et al. (2007) T-Cell Lymphoma data set
Short description: Analysis of cutaneous T-Cell lymphoma (CTCL) samples from lesional skin biopsies. Samples are divided into lower-stage (stages IA and IB, 0) and higher-stage (stages IIB and III, 1) CTCL.
Sample types: lower_stage, higher_stage
No. of genes: 2922 (pre-processed)
No. of samples: 63 (class 1: 20, class 0: 43)
Normalization: VSN (Huber et al., 2002)
References:
- J. Shin, S. Monti, D. J. Aires, M. Duvic, T. Golub, D. A. Jones and T. S. Kupper, Lesional gene expression profiling in cutaneous T-cell lymphoma reveals natural clusters associated with disease outcome. Blood, 110(8): pp. 3015, 2007
- Huber et al., Variance stabilization applied to microarray data calibration and to the quantification of differential expression, Bioinformatics (2002) 18 Suppl.1 96-104
Armstrong et al. (2002) Leukemia data set
Short description: Comparison of three classes of leukemia samples: acute lymphoblastic leukemia (ALL, 0), acute myelogenous leukemia (AML, 1) and ALL with mixed-lineage leukemia gene translocation (MLL, 2).
Sample types: ALL, AML, MLL
No. of genes: 8560 (pre-processed)
No. of samples: 72 (class 0: 24, class 1: 28, class 2: 20)
Normalization: VSN (Huber et al., 2002)
References:
- S.A. Armstrong, J.E. Staunton, L.B. Silverman, R. Pieters, M.L. den Boer, M.D. Minden, S.E. Sallan, E.S. Lander, T.R. Golub, S.J. Korsmeyer; MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics, 30(1): pp. 41–47, 2002
- Huber et al., Variance stabilization applied to microarray data calibration and to the quantification of differential expression, Bioinformatics (2002) 18 Suppl.1 96-104
SVM - Support Vector Machine
Short description: Support vector machines (SVMs) are among the most popular methods in microarray sample classification. The SVM classifier differs from other learning algorithms in that it selects the separating hyperplane with the maximum distance to the closest samples (the maximum margin hyperplane). Extensions to the linear SVM such as the "soft margin" and the "kernel trick" allow the classifier to deal with mislabelled samples and to separate non-linear data. We use the linear kernel C-SVM from the e1071 package, which is a wrapper for the well-known LIBSVM library. Parameter optimization is performed via grid search in a nested cross-validation routine.
References:
- Dimitriadou, E. and Hornik, K. and Leisch, F. and Meyer, D. and Weingessel, A. and Leisch, M.F. (2005), Misc functions of the department of statistics (e1071), TU Wien
- C.-C. Chang and C.-J. Lin (2001), LIBSVM: a library for support vector machines, http://www.csie.ntu.edu.tw/~cjlin/libsvm
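The nested cross-validation routine mentioned above can be sketched as follows. This is an illustration of the control flow only, under stated assumptions: a simple k-nearest-neighbour classifier stands in for the SVM so that the example needs no external library, the tuned parameter is the neighbour count rather than the SVM cost parameter, and all function names, fold counts and the toy data are hypothetical.

```python
def fold_indices(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(start, n, k)) for start in range(k)]

def train_knn(X_train, y_train, k):
    """Stand-in classifier (k nearest neighbours); returns a predict function."""
    def predict(x):
        by_distance = sorted(
            range(len(X_train)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)),
        )
        votes = [y_train[i] for i in by_distance[:k]]
        return max(sorted(set(votes)), key=votes.count)
    return predict

def n_correct(train, X, y, fit_idx, eval_idx, param):
    """Train on fit_idx with one parameter value, count hits on eval_idx."""
    predict = train([X[i] for i in fit_idx], [y[i] for i in fit_idx], param)
    return sum(predict(X[i]) == y[i] for i in eval_idx)

def nested_cv(X, y, grid, train, outer_k=3, inner_k=2):
    """Mean outer-fold accuracy; parameters are tuned on inner folds only."""
    accuracies = []
    for test_idx in fold_indices(len(X), outer_k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]

        def inner_score(param):
            # Inner loop: score one grid value using outer-training samples only.
            total = 0
            for val_pos in fold_indices(len(train_idx), inner_k):
                val_idx = [train_idx[p] for p in val_pos]
                fit_idx = [i for i in train_idx if i not in val_idx]
                total += n_correct(train, X, y, fit_idx, val_idx, param)
            return total

        best_param = max(grid, key=inner_score)  # grid search
        hits = n_correct(train, X, y, train_idx, test_idx, best_param)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)

# Toy data: two well-separated one-dimensional classes.
X = [[v] for v in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5)]
y = [0] * 6 + [1] * 6
print(nested_cv(X, y, grid=[1, 3], train=train_knn))  # 1.0
```

The point of the nesting is that the samples of an outer test fold never influence the choice of the parameter, so the reported accuracy is not biased by the tuning step. Swapping in an SVM only requires a different `train` callback and a grid of cost values.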
Arraymining.net - Newsletter
Stay informed about updates and new features on our website by joining our newsletter. Your email address remains strictly confidential and will only be used to inform you about major updates of our web service (at most one email per month). You can unsubscribe at any time by clicking the unsubscribe link at the bottom of our emails.
SUMCOV
Please note: The SUMCOV approach pre-filters the gene/protein identifiers in a dataset automatically by retaining genes/proteins with large sums of covariances over all pairs of genes/proteins in the covariance matrix. Combining this procedure with gene co-expression network construction is not guaranteed to yield a feasible solution for every problem instance. Therefore, if the algorithm does not terminate successfully, the procedure reverts to classical variance filtering.
Gene Co-Expression Network Analysis
This module constructs a weighted gene co-expression network from the input data of a microarray study and applies a simple topological network analysis to it. Click Help for instructions.
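To make the idea of a weighted co-expression network concrete, here is a minimal sketch. The page does not state how edge weights are computed, so the choices below are assumptions: the weight of an edge is the absolute Pearson correlation between two expression profiles, edges below a hard threshold are dropped, and the weighted degree of each gene serves as one simple topological measure. All names and the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long profiles (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else 0.0

def coexpression_network(expression, threshold=0.8):
    """Map gene pairs to |correlation| for all pairs at or above the threshold."""
    genes = sorted(expression)
    edges = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            weight = abs(pearson(expression[g], expression[h]))
            if weight >= threshold:
                edges[(g, h)] = weight
    return edges

def weighted_degree(edges):
    """Sum of incident edge weights per gene (a simple topological measure)."""
    strength = {}
    for (g, h), weight in edges.items():
        strength[g] = strength.get(g, 0.0) + weight
        strength[h] = strength.get(h, 0.0) + weight
    return strength

# Toy expression profiles over four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated with geneA
    "geneD": [1.0, -1.0, 1.0, -1.0],  # only weakly related to the others
}
edges = coexpression_network(expression)
strength = weighted_degree(edges)
```

In this toy example geneA, geneB and geneC form a fully connected triangle, while geneD has no edge above the threshold and therefore does not appear in the degree table.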