Help
Introduction
The Cancer Network Galaxy (TCNG) is a database of cancer gene networks estimated from publicly available cancer gene expression data.
You can search for or browse networks with a web browser.
Search
Network and gene search
You can search for networks and genes by entering keywords in the search box. Currently, the searched fields are the array set names and array set descriptions associated with networks, and the gene names and gene descriptions of genes.
If you enter more than one word, the query is treated as an "AND" search; that is, it finds networks and genes that include all of the keywords you entered. To perform an "OR" search, put the keyword "OR" between words.
Currently, you cannot combine the AND and OR keywords in a single query.
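The AND/OR rules above can be sketched in a few lines of Python. This is only a local model of the behavior described here, not the server's implementation; the function names parse_query and matches are hypothetical, and case-insensitive substring matching is an assumption.

```python
def parse_query(query):
    """Split a search-box query into (mode, terms), mode being "AND" or "OR"."""
    tokens = query.split()
    if "OR" not in tokens:
        # Plain words: every word must match (AND search).
        return "AND", tokens
    # OR search: tokens must alternate term, OR, term, OR, term ...
    terms = tokens[0::2]
    separators = tokens[1::2]
    if len(tokens) % 2 == 0 or "OR" in terms or any(s != "OR" for s in separators):
        raise ValueError("AND and OR cannot be combined in one query")
    return "OR", terms


def matches(query, text):
    """Return True if text satisfies the query (assumed case-insensitive)."""
    mode, terms = parse_query(query)
    haystack = text.lower()
    hits = [term.lower() in haystack for term in terms]
    return all(hits) if mode == "AND" else any(hits)
```

For example, matches("lung OR colon", "colon tumor") is true, while matches("lung colon", "colon tumor") is false because the AND search requires both words.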
Edge search
Use the keyword "target:edge" to perform an edge search. Note that edge search is very slow. In an edge search, you can also use the keywords "pa:x" and "ch:x", where x is any keyword, to restrict the search by parent and by child, respectively.
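As a small illustration, an edge-search query string could be assembled like this. The helper edge_query is hypothetical, and it assumes the keywords are typed into the same search box separated by spaces; the gene symbols in the usage note are only placeholders.

```python
def edge_query(parent=None, child=None):
    """Build an edge-search query from optional parent/child keywords."""
    parts = ["target:edge"]
    if parent:
        parts.append(f"pa:{parent}")   # keyword for the parent (source) node
    if child:
        parts.append(f"ch:{child}")    # keyword for the child (sink) node
    return " ".join(parts)
```

Calling edge_query("TP53", "MDM2") yields the string target:edge pa:TP53 ch:MDM2, which can then be pasted into the search box.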
XML and JSON interface
Some pages support XML and JSON output. Add "&fmt=xml" or "&fmt=json" to the HTTP URI. Please do not send too many requests from your scripts or software: limit them to 1 request per 3 seconds, that is, wait at least 3 seconds between accesses.
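A script that respects the request limit might look like the following sketch. It uses only the Python standard library; the names with_format, Throttle and fetch_all are hypothetical, as is the example.org URL in the usage note, and which TCNG pages actually accept the fmt parameter is not specified here.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def with_format(url, fmt):
    """Return url with fmt=xml or fmt=json set in its query string."""
    if fmt not in ("xml", "json"):
        raise ValueError("fmt must be 'xml' or 'json'")
    parts = urlsplit(url)
    # Drop any existing fmt parameter, then append the requested one.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "fmt"]
    query.append(("fmt", fmt))
    return urlunsplit(parts._replace(query=urlencode(query)))


class Throttle:
    """Block so that consecutive wait() calls return at least min_interval apart."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, fmt="json", throttle=None):
    """Yield the raw response body of each URL, one request per 3 seconds."""
    throttle = throttle or Throttle()
    for url in urls:
        throttle.wait()
        with urlopen(with_format(url, fmt)) as response:
            yield response.read()
```

For instance, with_format("http://example.org/search?q=TP53", "json") returns "http://example.org/search?q=TP53&fmt=json". The clock and sleep arguments of Throttle exist only so the waiting logic can be checked without real delays.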
Terminology
A
- array set
- An array set is defined as a set of DNA microarrays. The network estimation methods require many microarrays as input data for estimating gene-to-gene regulatory relationships.
B
- Bayesian network
- A model of gene networks. It can model causal relationships between random variables. In gene network estimation, the expression of each gene is regarded as a random variable in the network. The networks in TCNG are estimated with the Bayesian network model.
- betweenness
- An index of the centrality of a node. It indicates how close a particular node is to the center of a network.
- BS.Prob (edge attribute)
- Bootstrap probability calculated by SiGN-BN HC+Bootstrap. It is the frequency with which the edge was estimated during the bootstrap iterations. The value ranges from 0 to 1. With the default setting, an edge with BS.Prob greater than 0.05 is regarded as estimated. You can consider this value as the confidence of the estimated edge. It represents neither the accuracy nor the strength of the edge. See the original publication for the detailed definitions.
- BS.Direction (edge attribute)
- Frequency with which the edge was estimated in this direction, calculated by SiGN-BN HC+Bootstrap. In the bootstrap method, edges in both directions (that is, A -> B and B -> A) are regarded as a single edge, and the direction of the edge is determined by comparing the frequencies of the two directions among all the bootstrap iterations that estimated this relationship. The direction with the greater frequency is adopted as the estimated edge. Thus, the value ranges from 0.5 to 1.0.
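The relationship between BS.Prob and BS.Direction can be illustrated with a short sketch. This is one reading of the descriptions above, not the SiGN-BN implementation; see the original publication for the exact definitions. The function name and its arguments are hypothetical, and the sketch assumes that each bootstrap iteration contains the edge in at most one direction.

```python
def bootstrap_edge_stats(n_ab, n_ba, n_iterations):
    """Summarize one gene pair from bootstrap counts.

    n_ab / n_ba: number of iterations that estimated A -> B / B -> A.
    n_iterations: total number of bootstrap iterations.
    """
    total = n_ab + n_ba
    if total > n_iterations:
        raise ValueError("counts exceed the number of iterations")
    if total == 0:
        return None  # the relationship was never estimated
    # Both directions are counted as one edge for the bootstrap probability.
    bs_prob = total / n_iterations
    # The more frequent direction is adopted, so the ratio is at least 0.5.
    if n_ab >= n_ba:
        direction, bs_direction = "A->B", n_ab / total
    else:
        direction, bs_direction = "B->A", n_ba / total
    return {"direction": direction, "BS.Prob": bs_prob, "BS.Direction": bs_direction}
```

With 100 iterations, of which 30 estimated A -> B and 10 estimated B -> A, this gives BS.Prob = 0.4 and BS.Direction = 0.75 for the edge A -> B.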
C
- child
- If two nodes are connected by a directed edge (arrow), the sink node is called the child, child node, or child gene.
- CSML
- Cell System Markup Language. An XML format used in Cell Illustrator Online. You can download networks in CSML from each network page.
D
- dataset
- A dataset is defined as a pair of a gene set and an array set. The network estimation is performed for a dataset to estimate a network.
- degree
- The degree of a node is the number of edges connected to it. It is the sum of the number of its parents and the number of its children.
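As an illustration of the definition of degree, the following sketch counts parents, children and degree for every node in a list of directed edges. The function name and the (parent, child) tuple layout are assumptions made for the example, not a TCNG data format.

```python
from collections import Counter


def degrees(edges):
    """Map each node to its parent count, child count and degree.

    edges: iterable of (parent, child) pairs, one per directed edge.
    """
    n_children = Counter()
    n_parents = Counter()
    for parent, child in edges:
        n_children[parent] += 1   # the source node gains a child
        n_parents[child] += 1     # the sink node gains a parent
    nodes = set(n_children) | set(n_parents)
    return {
        node: {
            "parents": n_parents[node],
            "children": n_children[node],
            "degree": n_parents[node] + n_children[node],
        }
        for node in nodes
    }
```

For the edges A -> B, A -> C and B -> C, node A has 0 parents and 2 children, node B has 1 of each, and node C has 2 parents and 0 children, so every node has degree 2.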
E
- edge
- An edge in a network is a directed arrow connecting two nodes (genes). It indicates that two genes are estimated to be related in terms of gene expression; more simply, if an edge runs from gene A to gene B, then A seems to regulate B. Note that the edges in the DB are estimated by a statistical method.
F
- Freq (edge attribute)
- Edge attribute calculated by SiGN-BN NNSR. It represents the frequency with which the edge was estimated during the iterations of the NNSR algorithm. The value ranges from 0 to 1. With the default setting, an edge with Freq greater than 0.2 is regarded as estimated. You can consider this value as the confidence of the estimated edge. It represents neither the accuracy nor the strength of the edge. See the original publication for the detailed definitions.
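The default thresholds quoted in this glossary (BS.Prob greater than 0.05, Freq greater than 0.2) can be applied to downloaded edges along these lines. The dictionary layout of an edge record and the function name are assumptions for illustration; the actual field names in the XML or JSON output may differ.

```python
# Default cut-offs quoted in the glossary entries for each edge attribute.
DEFAULT_THRESHOLDS = {"BS.Prob": 0.05, "Freq": 0.2}


def estimated_edges(edges, attribute, threshold=None):
    """Keep edges whose attribute value is strictly greater than the threshold.

    edges: list of dicts such as {"parent": "A", "child": "B", "Freq": 0.4}.
    attribute: "BS.Prob" or "Freq".
    threshold: overrides the default cut-off when given.
    """
    if threshold is None:
        threshold = DEFAULT_THRESHOLDS[attribute]
    return [edge for edge in edges if edge[attribute] > threshold]
```

Passing a higher threshold keeps only the edges with greater confidence; as noted above, the value says nothing about the accuracy or strength of an edge.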
G
- gene set
- A gene set is defined as the set of genes (nodes) used for a single gene network estimation.
H
- hub
- A hub, hub gene, or hub node is a node that has many child nodes in a network. Hub genes are very important because they are considered to be master regulators in a network.
N
- network
- A network or a gene network is a set of nodes and directed edges. A network is estimated from a dataset by a network estimation method. The network name is made from a combination of a gene set name, an array set name and the estimation method.
- node
- A node is a point connected by edges in a network. It generally corresponds to a single probe on a DNA array. Thus, if a probe matches more than one gene, the node consists of all the matched genes. In that case, in TCNG, the node name is made from the matched gene symbols concatenated with triple slashes (///). Two nodes are connected by a directed edge.
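A node name that covers several genes can be split back into its gene symbols as in the sketch below. The function name is hypothetical, the symbols in the usage note are placeholders, and stripping whitespace around the slashes is a precaution, since it is not stated here whether spaces surround them.

```python
def genes_in_node(node_name):
    """Return the gene symbols that make up a TCNG node name."""
    # Multi-gene nodes join their symbols with triple slashes (///).
    return [symbol.strip() for symbol in node_name.split("///") if symbol.strip()]
```

For example, genes_in_node("GENE1///GENE2") returns ["GENE1", "GENE2"], and a single-gene node name is returned as a one-element list.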
P
- parent
- If two nodes are connected by a directed edge (arrow), the source node is called the parent, parent node, or parent gene.
S
- SiGN
- SiGN is a collection of gene network estimation software developed at the Laboratory of DNA Information Analysis and the Laboratory of Sequence Analysis, Human Genome Center, Institute of Medical Science, The University of Tokyo; and the Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo. Visit the SiGN web site for more details.
- SiGN-BN
- SiGN-BN is the gene network estimation software in SiGN that uses Bayesian networks. Visit the SiGN-BN web site for more details.