Commit a6ecbe4a authored by ClovisG

removed prot struct pred

parent 847c3678
......@@ -441,55 +441,8 @@ for i,record in enumerate(Bio.SeqIO.parse("path/to/my/file.fasta",&quo
<p>You can retrieve pre-computed multiple sequence alignments of known protein families (in the <strong>Alignments</strong> tab in the menu on the left). These protein sequence alignments can be used for many purposes, and in particular for protein structure prediction (see following section).</p>
</div>
</div>
<div id="protein-structure-prediction" class="section level2">
<h2>Protein structure prediction</h2>
<div id="predicting-contact-map-from-multiple-sequence-alignment-msa" class="section level3">
<h3>Predicting contact map from multiple sequence alignment (MSA)</h3>
<p>There are 20 standard amino acids. They differ in physicochemical properties such as size, charge, and aromaticity.</p>
<p>The 3D structure of a protein, and therefore the interactions between the amino acids of its sequence, determines its function in the organism. Amino acids that face each other in the 3D structure interact; we say they are in <em>contact</em>. The structure of the protein is stable mainly due to electrostatic interactions between amino acids in contact.</p>
<p>Therefore, if one amino acid mutates, the amino acids in contact with it have to compensate for the mutation: e.g. if two amino acids X and Y in contact are positively and negatively charged respectively, and X mutates to a negatively charged amino acid, then Y has to mutate to a positively charged amino acid to preserve the interaction.</p>
<p>By aligning multiple related protein sequences and computing the mutual information between pairs of positions (columns) of the alignment, one can infer the likely contacts. The mutual information is defined as follows:</p>
<center>
<span class="math inline">\(MI(i,j) = \sum\limits_{a,b} f_{i,j,a,b} \log_2\frac{f_{i,j,a,b}}{f_{i,a} \cdot f_{j,b}}\)</span>
</center>
<p>Where <span class="math inline">\(f_{i,j,a,b}\)</span> is the frequency of the event “having amino acid <span class="math inline">\(a\)</span> at position <span class="math inline">\(i\)</span> and amino acid <span class="math inline">\(b\)</span> at position <span class="math inline">\(j\)</span>”, and <span class="math inline">\(f_{i,a}\)</span> and <span class="math inline">\(f_{j,b}\)</span> are the corresponding marginals. The sum runs over all pairs of amino acids <span class="math inline">\((a,b)\)</span>. If there are many changes at position <span class="math inline">\(i\)</span> and those changes are correlated with changes at position <span class="math inline">\(j\)</span>, then the mutual information will be high. If positions <span class="math inline">\(i\)</span> and <span class="math inline">\(j\)</span> are independent, then the joint frequency equals the product of the marginals (<span class="math inline">\(f_{i,j,a,b} = f_{i,a} \cdot f_{j,b}\)</span>) and the mutual information is 0.</p>
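<p>The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name <code>mutual_information</code> is hypothetical, each column is assumed to be given as a string with one character per sequence of the MSA, and gap characters are simply treated as one more symbol.</p>

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    col_i and col_j are equal-length strings: character s of each string
    is the residue of sequence s at that position (assumed input format).
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_i)
    joint = Counter(zip(col_i, col_j))  # counts of pairs (a, b)
    marg_i = Counter(col_i)             # counts of a at position i
    marg_j = Counter(col_j)             # counts of b at position j
    mi = 0.0
    # Pairs that never occur have frequency 0 and contribute nothing.
    for (a, b), count in joint.items():
        f_ab = count / n
        f_a = marg_i[a] / n
        f_b = marg_j[b] / n
        mi += f_ab * math.log2(f_ab / (f_a * f_b))
    return mi
```

<p>For instance, <code>mutual_information("AACC", "DDEE")</code> gives 1 bit (the two columns always change together), whereas <code>mutual_information("AACC", "DEDE")</code> gives 0 (the columns are independent).</p>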
<p>A simple approach, when there are enough sequences in the MSA, is to assume that:</p>
<center>
<span class="math inline">\(MI(i,j)&gt;\tau \Rightarrow \text{contact between } i \text{ and } j\)</span>
</center>
<p>where <span class="math inline">\(\tau\)</span> is a threshold that has to be tuned.</p>
<p>Because some columns tend to yield too many predicted contacts (many correlations arise from inherited mutations rather than from compensatory effects), the following usual correction is applied:</p>
<center>
<span class="math inline">\(MI&#39;(i,j) = MI(i,j) - \frac{1}{N}\sum\limits_{k}(MI(k,j)+MI(i,k))\)</span>
</center>
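<p>As a rough sketch, the correction can be applied to a precomputed matrix of mutual information values. The function name <code>corrected_mi</code> is hypothetical, and two assumptions are made that the formula does not spell out: <span class="math inline">\(N\)</span> is taken to be the number of alignment columns, and the sum runs over every column <span class="math inline">\(k\)</span>, including <span class="math inline">\(i\)</span> and <span class="math inline">\(j\)</span> themselves.</p>

```python
def corrected_mi(mi):
    """Apply MI'(i,j) = MI(i,j) - (1/N) * sum_k (MI(k,j) + MI(i,k)).

    mi is a square matrix given as a list of lists of floats.
    Assumption: N is the number of columns and k ranges over all of them.
    """
    n = len(mi)
    # sum_k MI(i,k) for each row i
    row_sums = [sum(row) for row in mi]
    # sum_k MI(k,j) for each column j
    col_sums = [sum(mi[k][j] for k in range(n)) for j in range(n)]
    return [
        [mi[i][j] - (col_sums[j] + row_sums[i]) / n for j in range(n)]
        for i in range(n)
    ]
```

<p>The corrected values can then be compared with the threshold <span class="math inline">\(\tau\)</span> in place of the raw mutual information.</p>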
</div>
<div id="ft-comar" class="section level3">
<h3>FT-COMAR</h3>
<p>This software generates a 3D structure from a contact map.</p>
<p>You can download it here: <a href="https://clovisg.github.io/teaching/protein-structure-prediction/ft-comar.tgz">FT-COMAR</a>.</p>
<p>The contact map must be in the following format:</p>
<pre><code>133
1
0
0
0
0
1
0
...</code></pre>
<p>Where the first line is the number <span class="math inline">\(N\)</span> of amino acids in the sequence, and each of the following <span class="math inline">\(N^2\)</span> lines contains a 1 if there is a <strong>contact</strong> between amino acid <span class="math inline">\(i\)</span> and amino acid <span class="math inline">\(j\)</span>, or a 0 if there is <strong>no contact</strong>. The lines follow the lexicographic order of the pairs: (1,1), (1,2),…, (1,N), (2,1), (2,2),…, (2,N), …, (N,N).</p>
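<p>A file in this format could be produced along the following lines. This is only a sketch: the function name <code>format_contact_map</code> is hypothetical, contacts are assumed to be given as 1-based pairs, and each contact is assumed to be symmetric. Whether the diagonal entries (i,i) should be 1 is left to the caller; the sample above starts with a 1 for the pair (1,1).</p>

```python
def format_contact_map(n, contacts):
    """Return the contact-map text: N, then N*N lines of 0 or 1.

    contacts is an iterable of 1-based (i, j) pairs. Each pair is mirrored
    so that (i, j) and (j, i) are both written as 1 (assumed symmetry).
    """
    valid = range(1, n + 1)
    symmetric = set()
    for i, j in contacts:
        if i not in valid or j not in valid:
            raise ValueError("contact indices must lie between 1 and N")
        symmetric.add((i, j))
        symmetric.add((j, i))
    lines = [str(n)]
    # Lexicographic order: (1,1), (1,2), ..., (1,N), (2,1), ..., (N,N)
    for i in valid:
        for j in valid:
            lines.append("1" if (i, j) in symmetric else "0")
    return "\n".join(lines) + "\n"
```

<p>The returned text can then be written to a file such as <code>my_contacts.lst</code>, for example with <code>open("my_contacts.lst", "w").write(format_contact_map(n, contacts))</code>.</p>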
<p>Here is an example of how to obtain the 3D structure (in PDB format) associated with the contacts listed in <code>my_contacts.lst</code>:</p>
<pre><code>path/to/FT-COMAR my_contacts.lst 9 0 test.pdb</code></pre>
<p>If you want to visualize the resulting 3D structure, you can use RasMol by calling:</p>
<p><code>rasmol test.pdb</code></p>
</div>
</div>
 
 
</div>
<script>
 
// add bootstrap table styles to pandoc tables
......