IPA Network Generation

zdloy 2010-08-19

What are the steps in the Network Generation Algorithm?

1. The user designates molecules of interest on the Create Analysis page before running the analysis. Molecules of interest which interact with other molecules in the Ingenuity Knowledge Base are identified as Network Eligible molecules. Network Eligible molecules serve as "seeds" for generating networks.

2. Network Eligible molecules are combined into networks that maximize their specific connectivity, which is their interconnectedness with each other relative to all molecules they are connected to in the Ingenuity Knowledge Base.

3. Additional molecules from the Ingenuity Knowledge Base are used to specifically connect two or more smaller networks by merging them into a larger one. Networks are limited to 35 molecules each to keep them to a usable size.

4. Networks are scored based on the number of Network Eligible molecules they contain. The higher the score, the lower the probability of finding the observed number of Network Eligible molecules in a given network by random chance.

For more details, see the IPA Network Generation Algorithm whitepaper.

How does the Network Generation Algorithm work if I have uploaded a list of molecules without expression values?

IPA considers all Network Eligible molecules on your list to be of equal importance when generating networks for molecule lists. Network Eligible molecules are uploaded molecules that have interactions with other molecules in the Ingenuity Knowledge Base. See "What are the steps in the Network Generation Algorithm?" above.

Are all relationships displayed in a network for a particular set of molecules?

Networks show relevant relationships as specified by the Analysis Components settings in the Create Analysis page when the analysis was run. This generally does not include all relationships for every Network Eligible molecule. Some interactions present in the Ingenuity Knowledge Base are not used in the network generation process. However, for your molecules of interest, you may often find these additional relationships helpful and biologically important. There are several ways to view additional relationships:

1. You can view the full complement of direct and indirect interactions for a molecule by double-clicking on it to see its Node View summary and then clicking on Neighborhood Explorer. See the Mutant Information section of the Node View for interactions involving functionally mutant forms of the molecule.

2. You may add interactions to a network by clicking the Build button and using the Grow (to add new molecules) or Connect tools after selecting molecules of interest.

3. You may select multiple networks of interest in the Networks tab and then click the Merge Networks button to combine them into one network. The merged network adds and highlights all interactions between molecules in different local networks.

How does IPA use my expression values in the Network Generation Algorithm?

If you set a cutoff value in the Create Analysis page, IPA compares the expression values of your molecules to that cutoff to identify the Network Eligible molecules. Molecule expression values are also used, along with specific connectivity, to prioritize the addition of molecules that are not Network Eligible into networks.
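As a rough illustration of the cutoff step, the sketch below filters a dataset down to Network Eligible molecules. The function name, the dictionary layout, the molecule names, and the absolute-value comparison are all assumptions made for illustration; this document does not state how IPA applies the comparison internally.

```python
def network_eligible(expression, cutoff, kb_interactors):
    """Return molecules that pass the cutoff and have knowledge-base interactions.

    expression     -- hypothetical dict: molecule name -> expression value
    cutoff         -- threshold chosen by the user
    kb_interactors -- hypothetical set of molecules with at least one interaction
    """
    # Assumption: a molecule passes when the magnitude of its value meets the cutoff.
    return {
        name
        for name, value in expression.items()
        if abs(value) >= cutoff and name in kb_interactors
    }


# Toy data, not taken from any real analysis.
dataset = {"TP53": 2.4, "MYC": -1.8, "GAPDH": 0.1, "ORPHAN1": 3.0}
interactors = {"TP53", "MYC", "GAPDH"}
eligible = network_eligible(dataset, cutoff=1.5, kb_interactors=interactors)
print(sorted(eligible))
```

Here ORPHAN1 passes the cutoff but is dropped because it has no interactions, and GAPDH has interactions but falls below the cutoff.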

What is the "Score" for a network, how is it calculated, and how should I interpret this?

The score is a numerical value used to rank networks according to their degree of relevance to the Network Eligible molecules in your dataset. The score takes into account the number of Network Eligible molecules in the network and its size, as well as the total number of Network Eligible molecules analyzed and the total number of molecules in the Ingenuity Knowledge Base that could potentially be included in networks. In the Networks view, networks are ordered according to their score, with the highest scoring network displayed at the top of the page.

The network Score is based on the hypergeometric distribution and is calculated with the right-tailed Fisher's Exact Test.

For example, suppose that a network of 35 molecules has a Fisher's Exact Test result of 1 x 10^-6. The network's Score = -log10(Fisher's Exact Test result) = 6. This can be interpreted as, "There is a 1 in a million chance of getting a network containing at least the same number of Network Eligible molecules by chance when randomly picking 35 molecules that can be in networks from the Ingenuity Knowledge Base."
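The calculation described above can be sketched with a right-tailed hypergeometric sum. This is a minimal sketch and not IPA's implementation: the function names and the mapping of the four quantities onto k, n, K and N are assumptions based on the description of what the score takes into account.

```python
from math import comb, log10


def right_tailed_fisher(k, n, K, N):
    """P(X >= k) when drawing n molecules at random from a pool of N.

    k -- Network Eligible molecules observed in the network
    n -- network size (at most 35 in IPA)
    K -- total Network Eligible molecules analyzed
    N -- total molecules that could be included in networks
    """
    upper = min(n, K)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def network_score(k, n, K, N):
    """Score = -log10 of the right-tailed p-value."""
    return -log10(right_tailed_fisher(k, n, K, N))


# Toy numbers chosen so the arithmetic is easy to follow by hand:
# 2 of 2 drawn molecules are eligible, with 3 eligible in a pool of 10.
p = right_tailed_fisher(2, 2, 3, 10)   # 3 / 45 = 1/15
print(p, network_score(2, 2, 3, 10))
```

With real inputs the pool size N is much larger, so the p-value for a network rich in Network Eligible molecules becomes very small and the score correspondingly high.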

The score is not an indication of the quality or biological relevance of the network; it simply calculates the approximate "fit" between each network and your Network Eligible molecules.

How do "hub" molecules affect the Network Generation Algorithm?

The network algorithm optimizes for specific connectivity, so when a "hub" molecule is included in a network, it connects a higher fraction of Network Eligible molecules relative to all molecules the "hub" molecule is connected to. Biologically, many such "hub" molecules exist in multiple protein complexes, and this is represented in networks as many molecules connected to the "hub" molecule rather than as many protein complex nodes.

If you see a "hub" molecule that is not a Network Eligible molecule and believe it is unlikely to be present or active in your biological context, we recommend flagging the molecule as Absent in your input file, which will cause the algorithm to exclude it. Additionally, "hub" molecules often have many indirect molecular signaling effects, so running analyses with direct interactions only also reduces the likelihood of hub molecules appearing, because they are then included only where they directly physically interact with Network Eligible molecules.

One way to see the specific connectivity significance of a hub molecule is to add it to a new MyPathway. Use the Grow function with the same relationship types (e.g. direct and indirect interactions or direct only) and node types (all molecules or exclude chemicals) as in the analysis to get the total number of molecules. Then overlay the expression values from the analysis to determine the number of nodes that are Network Eligible molecules.
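The manual check described above amounts to comparing two counts: how many molecules the hub is connected to, and how many of those are Network Eligible. The sketch below shows that arithmetic on toy data. The function name, the adjacency dictionary, and the molecule names are hypothetical, and the simple ratio is only an illustration of the idea of specific connectivity, not IPA's exact measure.

```python
def hub_specificity(hub, neighbors, eligible):
    """Return (eligible neighbors, total neighbors, fraction) for one molecule.

    neighbors -- hypothetical dict: molecule -> set of connected molecules,
                 standing in for the result of the Grow step
    eligible  -- set of Network Eligible molecules from the analysis
    """
    grown = neighbors.get(hub, set())
    hits = grown & eligible
    fraction = len(hits) / len(grown) if grown else 0.0
    return len(hits), len(grown), fraction


# Toy data: HUB1 touches many molecules, SPEC1 only a few.
kb = {
    "HUB1": {"A", "B", "C", "D", "E", "F", "G", "H"},
    "SPEC1": {"A", "B", "X"},
}
eligible_set = {"A", "B", "C"}

for name in ("HUB1", "SPEC1"):
    print(name, hub_specificity(name, kb, eligible_set))
```

In this toy case HUB1 reaches more Network Eligible molecules in absolute terms (3 versus 2), but SPEC1 has the higher fraction (2 of 3 versus 3 of 8), which is the kind of comparison the overlay step lets you make.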

Why do I get high network scores when I submit a random list of molecules into IPA?

The purpose of IPA's network generation algorithm is to find networks of highly connected Network Eligible molecules. If you submit molecules chosen at random, IPA will still do its best to bring as many Network Eligible molecules into a single network as possible, because it assumes that these molecules have some interest to you. If the number of Network Eligible molecules is large, resulting networks can often receive a high score because of the generally high interconnectivity of molecules in the Ingenuity Knowledge Base.

When looking at the List of Networks generated for a random list of molecules, the highest-scoring network typically has a lower score than the highest-scoring network for a non-random list. Additionally, the distribution of network scores (e.g. when comparing sorted order) is typically lower and falls off more sharply for random networks than for actual data (see the algorithm whitepaper).

It's important to keep the biological context in mind when evaluating the networks since the goal of the algorithm is to come up with the best hypotheses it can about how the Network Eligible molecules may be interacting biologically, using the principle of specific connectivity. The algorithm assumes there are some biological commonalities and attempts to identify and highlight them for you. The score and the processes associated with each network are intended to help you identify the most striking and relevant networks given your biological context.

The network score is not intended to prove that a particular network represents what is happening in a biological system. It only indicates that the network is rare relative to all the hypotheses the algorithm could come up with.

Functional Analysis Algorithm

To aid the user in assessing the biological relevance of the network to the sample, the application also calculates the probabilistic fit between the networks and a list of biological functions stored in the Ingenuity Knowledge Base. This enables the user to correlate the molecules that form a given network to their involvement in a specific biological function, e.g. apoptosis.

The diagram below illustrates the flow of information in IPA: