Python Packages: Pandas and BioPython

Pandas

Python has many packages that allow you to manipulate data in different ways. For example, Pandas lets you work with datasets using a dataframe structure similar to the one in the R language.

Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive.

Pandas is well suited for many different kinds of data:

  • Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet

  • Ordered and unordered (not necessarily fixed-frequency) time series data.

  • Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels

  • Any other form of observational / statistical data sets. The data need not be labeled at all to be placed into a pandas data structure.

Today we are going to learn basic commands for loading data from a CSV file, getting general information about the dataset, and filtering by column.

Importing the Pandas package

To work with any package in Python, you first have to install it through your package manager (Conda, in this workshop) and then import it. For this workshop, Pandas is already installed in the conda environment.

The following code imports the Pandas package under the conventional alias pd:

import pandas as pd

How to load a CSV into Pandas

df = pd.read_csv('../python_lesson/combined_tidy_vcf.csv')

Note: The path to the dataset may be different on your machine than in this example.
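
To try read_csv without the workshop file, a minimal sketch can read CSV text from memory instead. The three rows and the names csv_text and small_df are made up for illustration; only the column names are borrowed from the dataset.

```python
import io
import pandas as pd

# Hypothetical miniature of the workshop CSV (values invented for illustration).
csv_text = """sample_id,CHROM,POS,REF,ALT
S1,chr1,100,T,G
S1,chr1,250,G,T
S2,chr1,400,CTT,CTTT
"""

# read_csv accepts a file path or any file-like object, so StringIO works too.
small_df = pd.read_csv(io.StringIO(csv_text))

print(small_df.shape)          # (number of rows, number of columns)
print(list(small_df.columns))  # column names taken from the header line
```

With a real file, the only change is passing the path string instead of the StringIO object.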

Getting information about the dataset

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 801 entries, 0 to 800
Data columns (total 29 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   sample_id      801 non-null    object 
 1   CHROM          801 non-null    object 
 2   POS            801 non-null    int64  
 3   ID             0 non-null      float64
 4   REF            801 non-null    object 
 5   ALT            801 non-null    object 
 6   QUAL           801 non-null    float64
 7   FILTER         0 non-null      float64
 8   INDEL          801 non-null    bool   
 9   IDV            101 non-null    float64
 10  IMF            101 non-null    float64
 11  DP             801 non-null    int64  
 12  VDB            801 non-null    float64
 13  RPB            28 non-null     float64
 14  MQB            28 non-null     float64
 15  BQB            28 non-null     float64
 16  MQSB           753 non-null    float64
 17  SGB            801 non-null    float64
 18  MQ0F           801 non-null    float64
 19  ICB            0 non-null      float64
 20  HOB            0 non-null      float64
 21  AC             801 non-null    int64  
 22  AN             801 non-null    int64  
 23  DP4            801 non-null    object 
 24  MQ             801 non-null    int64  
 25  Indiv          801 non-null    object 
 26  gt_PL          801 non-null    object 
 27  gt_GT          801 non-null    int64  
 28  gt_GT_alleles  801 non-null    object 
dtypes: bool(1), float64(14), int64(6), object(8)
memory usage: 176.1+ KB

Getting the first rows of the dataset

df.head()
sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV ... ICB HOB AC AN DP4 MQ Indiv gt_PL gt_GT gt_GT_alleles
0 SRR2584863 CP000819.1 9972 NaN T G 91.0 NaN False NaN ... NaN NaN 1 1 0,0,0,4 60 /home/dcuser/dc_workshop/results/bam/SRR258486... 121,0 1 G
1 SRR2584863 CP000819.1 263235 NaN G T 85.0 NaN False NaN ... NaN NaN 1 1 0,1,0,5 33 /home/dcuser/dc_workshop/results/bam/SRR258486... 112,0 1 T
2 SRR2584863 CP000819.1 281923 NaN G T 217.0 NaN False NaN ... NaN NaN 1 1 0,0,4,5 60 /home/dcuser/dc_workshop/results/bam/SRR258486... 247,0 1 T
3 SRR2584863 CP000819.1 433359 NaN CTTTTTTT CTTTTTTTT 64.0 NaN True 12.0 ... NaN NaN 1 1 0,1,3,8 60 /home/dcuser/dc_workshop/results/bam/SRR258486... 91,0 1 CTTTTTTTT
4 SRR2584863 CP000819.1 473901 NaN CCGC CCGCGC 228.0 NaN True 9.0 ... NaN NaN 1 1 1,0,2,7 60 /home/dcuser/dc_workshop/results/bam/SRR258486... 255,0 1 CCGCGC

5 rows × 29 columns

Getting the last rows of the dataset

df.tail()
sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV ... ICB HOB AC AN DP4 MQ Indiv gt_PL gt_GT gt_GT_alleles
796 SRR2589044 CP000819.1 3481820 NaN A G 225.0 NaN False NaN ... NaN NaN 1 1 0,0,4,8 60 /home/dcuser/dc_workshop/results/bam/SRR258904... 255,0 1 G
797 SRR2589044 CP000819.1 3893550 NaN AG AGG 101.0 NaN True 4.0 ... NaN NaN 1 1 0,0,3,1 52 /home/dcuser/dc_workshop/results/bam/SRR258904... 131,0 1 AGG
798 SRR2589044 CP000819.1 3901455 NaN A AC 70.0 NaN True 3.0 ... NaN NaN 1 1 0,0,3,0 60 /home/dcuser/dc_workshop/results/bam/SRR258904... 100,0 1 AC
799 SRR2589044 CP000819.1 4100183 NaN A G 177.0 NaN False NaN ... NaN NaN 1 1 0,0,3,5 60 /home/dcuser/dc_workshop/results/bam/SRR258904... 207,0 1 G
800 SRR2589044 CP000819.1 4431393 NaN TGG T 225.0 NaN True 10.0 ... NaN NaN 1 1 0,0,4,6 60 /home/dcuser/dc_workshop/results/bam/SRR258904... 255,0 1 T

5 rows × 29 columns

Getting the name of the columns

df.columns
Index(['sample_id', 'CHROM', 'POS', 'ID', 'REF', 'ALT', 'QUAL', 'FILTER',
       'INDEL', 'IDV', 'IMF', 'DP', 'VDB', 'RPB', 'MQB', 'BQB', 'MQSB', 'SGB',
       'MQ0F', 'ICB', 'HOB', 'AC', 'AN', 'DP4', 'MQ', 'Indiv', 'gt_PL',
       'gt_GT', 'gt_GT_alleles'],
      dtype='object')

Identifying Null values

df.isnull()
sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV ... ICB HOB AC AN DP4 MQ Indiv gt_PL gt_GT gt_GT_alleles
0 False False False True False False False True False True ... True True False False False False False False False False
1 False False False True False False False True False True ... True True False False False False False False False False
2 False False False True False False False True False True ... True True False False False False False False False False
3 False False False True False False False True False False ... True True False False False False False False False False
4 False False False True False False False True False False ... True True False False False False False False False False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
796 False False False True False False False True False True ... True True False False False False False False False False
797 False False False True False False False True False False ... True True False False False False False False False False
798 False False False True False False False True False False ... True True False False False False False False False False
799 False False False True False False False True False True ... True True False False False False False False False False
800 False False False True False False False True False False ... True True False False False False False False False False

801 rows × 29 columns

Identifying Null values and counting them

df.isnull().sum()
sample_id          0
CHROM              0
POS                0
ID               801
REF                0
ALT                0
QUAL               0
FILTER           801
INDEL              0
IDV              700
IMF              700
DP                 0
VDB                0
RPB              773
MQB              773
BQB              773
MQSB              48
SGB                0
MQ0F               0
ICB              801
HOB              801
AC                 0
AN                 0
DP4                0
MQ                 0
Indiv              0
gt_PL              0
gt_GT              0
gt_GT_alleles      0
dtype: int64
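
As a small check of how isnull().sum() arrives at counts like these, the sketch below builds a three-row frame by hand. The frame toy and its values are hypothetical; the column names only echo the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: ID is entirely missing, IDV is missing in one row.
toy = pd.DataFrame({
    "POS": [100, 250, 400],
    "ID": [np.nan, np.nan, np.nan],
    "IDV": [np.nan, 12.0, 9.0],
})

# isnull() gives True/False per cell; sum() counts the True values per column.
null_counts = toy.isnull().sum()
print(null_counts)

# Columns whose null count equals the number of rows carry no data at all.
empty_cols = list(null_counts[null_counts == len(toy)].index)
print(empty_cols)
```

In the workshop dataset the same comparison would flag the columns whose count is 801, such as ID and FILTER.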

Filtering a column

In this example we are going to filter the dataset to keep only the rows whose alternate allele (ALT) sequence is at least 10 characters long. To do this, we first compute the length of each value in the ALT column using apply.

df['ALT'].apply(len)
0      1
1      1
2      1
3      9
4      6
      ..
796    1
797    3
798    2
799    1
800    1
Name: ALT, Length: 801, dtype: int64

This vector of lengths can then be compared to a minimum length to generate a boolean vector.

min_len = 10
df['ALT'].apply(len) >= min_len
0      False
1      False
2      False
3      False
4      False
       ...  
796    False
797    False
798    False
799    False
800    False
Name: ALT, Length: 801, dtype: bool

Using this boolean vector, we can select only the rows that meet the condition with the loc indexer.

df.loc[df['ALT'].apply(len) >= min_len]
sample_id CHROM POS ID REF ALT QUAL FILTER INDEL IDV ... ICB HOB AC AN DP4 MQ Indiv gt_PL gt_GT gt_GT_alleles
8 SRR2584863 CP000819.1 2103887 NaN ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCC... 56.0000 NaN True 2.0 ... NaN NaN 1 1 0,1,1,1 60 /home/dcuser/dc_workshop/results/bam/SRR258486... 111,28 1 ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCC...
60 SRR2584866 CP000819.1 197681 NaN CTTTTTTTT CTTTTTTTTT 50.0000 NaN True 13.0 ... NaN NaN 1 1 0,0,9,4 60 /home/dcuser/dc_workshop/results/bam/SRR258486... 80,0 1 CTTTTTTTTT
72 SRR2584866 CP000819.1 294206 NaN TGGGGGGG TGGGGGGGGG 100.0000 NaN True 8.0 ... NaN NaN 1 1 0,2,4,4 60 /home/dcuser/dc_workshop/results/bam/SRR258486... 151,24 1 TGGGGGGGGG
78 SRR2584866 CP000819.1 352369 NaN AGGGGGGGG AGGGGGGGGG 36.2447 NaN True 16.0 ... NaN NaN 1 1 1,1,12,3 60 /home/dcuser/dc_workshop/results/bam/SRR258486... 63,0 1 AGGGGGGGGG
197 SRR2584866 CP000819.1 1021771 NaN GTTTTTTTT GTTTTTTTTTT 51.0000 NaN True 5.0 ... NaN NaN 1 1 0,1,1,4 60 /home/dcuser/dc_workshop/results/bam/SRR258486... 85,7 1 GTTTTTTTTTT
214 SRR2584866 CP000819.1 1112067 NaN TCCCCCCC TCCCCCCCCC 46.4187 NaN True 4.0 ... NaN NaN 1 1 0,0,3,1 49 /home/dcuser/dc_workshop/results/bam/SRR258486... 78,2 1 TCCCCCCCCC
237 SRR2584866 CP000819.1 1172377 NaN TCCCCCC TCCCCCCCCC 225.0000 NaN True 8.0 ... NaN NaN 1 1 0,0,5,3 60 /home/dcuser/dc_workshop/results/bam/SRR258486... 255,0 1 TCCCCCCCCC
315 SRR2584866 CP000819.1 1699806 NaN CATGATGATGATGAT CATGATGATGAT 153.0000 NaN True 6.0 ... NaN NaN 1 1 1,2,4,2 60 /home/dcuser/dc_workshop/results/bam/SRR258486... 255,75 1 CATGATGATGAT
402 SRR2584866 CP000819.1 2247096 NaN TCCCCCCC TCCCCCCCCCC 17.8332 NaN True 2.0 ... NaN NaN 1 1 1,0,0,2 60 /home/dcuser/dc_workshop/results/bam/SRR258486... 70,25 1 TCCCCCCCCCC
493 SRR2584866 CP000819.1 2693113 NaN GTTCTTCTTCTTC GTTCTTCTTC 213.0000 NaN True 7.0 ... NaN NaN 1 1 0,1,4,3 60 /home/dcuser/dc_workshop/results/bam/SRR258486... 255,15 1 GTTCTTCTTC
516 SRR2584866 CP000819.1 2788797 NaN CAAAAAAAAA CAAAAAAAAAA 59.0000 NaN True 14.0 ... NaN NaN 1 1 0,0,6,8 60 /home/dcuser/dc_workshop/results/bam/SRR258486... 89,0 1 CAAAAAAAAAA
548 SRR2584866 CP000819.1 3052305 NaN AGCGCGCGCGCG AGCGCGCGCG 228.0000 NaN True 9.0 ... NaN NaN 1 1 0,1,3,5 60 /home/dcuser/dc_workshop/results/bam/SRR258486... 255,0 1 AGCGCGCGCG
722 SRR2584866 CP000819.1 4022380 NaN GTTTTTTTT GTTTTTTTTT 11.1284 NaN True 9.0 ... NaN NaN 1 1 2,2,3,4 60 /home/dcuser/dc_workshop/results/bam/SRR258486... 39,1 1 GTTTTTTTTT
794 SRR2589044 CP000819.1 2103887 NaN ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG 225.0000 NaN True 10.0 ... NaN NaN 1 1 0,0,1,9 60 /home/dcuser/dc_workshop/results/bam/SRR258904... 255,0 1 ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG

14 rows × 29 columns
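
The same filter can also be written with the vectorized string accessor .str.len() instead of apply(len). The sketch below uses a hypothetical four-row frame so the result is easy to verify by eye; the names toy, mask and long_alt are introduced only for this example.

```python
import pandas as pd

# Hypothetical ALT values with lengths 1, 10, 6 and 11.
toy = pd.DataFrame({
    "POS": [100, 250, 400, 900],
    "ALT": ["G", "C" + "T" * 9, "CCGCGC", "G" + "T" * 10],
})

min_len = 10

# Boolean mask: True where the ALT string has at least min_len characters.
mask = toy["ALT"].str.len() >= min_len

# loc keeps only the rows where the mask is True.
long_alt = toy.loc[mask]
print(long_alt)
```

Storing the mask in a variable first makes it easy to inspect it, or to combine it with other conditions using & and |.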

BioPython

BioPython is another widely used Python package; like Pandas, it is installed separately rather than shipped with Python itself. Its features include parsers for various bioinformatics file formats (BLAST, Clustalw, FASTA, GenBank, …), access to online services (NCBI, Expasy, …), interfaces to common and not-so-common programs (Clustalw, DSSP, MSMS, …), a standard sequence class, various clustering modules, a KD tree data structure, and documentation.

Importing the BioPython package

For this exercise we are going to import two BioPython (or simply Bio) submodules, Seq and pairwise2.

from Bio import Seq, pairwise2

The Seq module provides the Seq object, which behaves much like a Python string and adds sequence-specific methods such as complement and reverse_complement.

The pairwise2 module has functions that can be used to compare sequences. Since BioPython 1.80 it is marked as deprecated in favor of Bio.Align.PairwiseAligner.

Gathering sequences from the dataset we loaded using pandas

Select one of the rows found by the filter and extract the reference and alternate sequences.

row = 8
ref = Seq.Seq(df.loc[row]['REF'])
alt = Seq.Seq(df.loc[row]['ALT'])

print(ref)
print(alt)
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
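
The loc indexer also accepts the row label and the column name together, which fetches a single cell in one step. A minimal sketch on a hypothetical frame follows; the index labels 8 and 60 and the short sequences are invented, and are chosen only to show that loc looks up labels rather than positions.

```python
import pandas as pd

# Hypothetical two-row frame whose index labels are 8 and 60 (not 0 and 1).
toy = pd.DataFrame(
    {"REF": ["ACAG", "CTTT"], "ALT": ["ACAGCCAG", "CTTTT"]},
    index=[8, 60],
)

row = 8

# One lookup with (row label, column name) instead of two chained lookups.
ref_str = toy.loc[row, "REF"]
alt_str = toy.loc[row, "ALT"]
print(ref_str, alt_str)
```

Both spellings return the same value when reading; the single-step form is the one usually recommended when assigning to a cell.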

Generating alignments between the reference and the alternate sequence

We can generate alignments between the reference and alternate sequences using the pairwise2.align.globalxx function.

alignments = pairwise2.align.globalxx(ref,alt)
print(len(alignments))
56
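
In globalxx, the two x's mean that a match scores 1, a mismatch scores 0, and gaps carry no penalty. Under that scheme the best possible score equals the length of the longest common subsequence of the two sequences, and many different gap placements can tie for it, which is why 56 alignments are returned here. The pure-Python sketch below illustrates the score only; the function lcs_length is written for this example and is not part of BioPython.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)          # scores for the previous row of the table
    for ch_a in a:
        curr = [0]                     # column 0: empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)              # extend a match
            else:
                curr.append(max(prev[j], curr[j - 1]))    # skip a character for free
        prev = curr
    return prev[-1]

# The same sequences printed above, written compactly.
ref = "ACAG" + "CCAG" * 7    # 32 characters
alt = "ACAG" + "CCAG" * 13   # 56 characters

score = lcs_length(ref, alt)
print(len(ref), len(alt), score)
```

Because the reference is a prefix of the alternate sequence, every one of its 32 characters can be matched, so no alignment can score higher than 32.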

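To visualize these alignments, pairwise2 provides the format_alignment function, which needs to receive multiple parameters. Thankfully, each alignment is a tuple holding exactly those values, so it can be unpacked into the function call with the * operator.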

for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG------------------------
||||||||||||||||||||||||||||||||                        
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCA----G--------------------
|||||||||||||||||||||||||||||||    |                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGCCAGCCAGCC----AG--------------------
||||||||||||||||||||||||||||||    ||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGCCAGCCAGC---C-AG--------------------
|||||||||||||||||||||||||||||   | ||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGCCAGCCAG-C--C-AG--------------------
|||||||||||||||||||||||||||| |  | ||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGCCAGCCAGC----CAG--------------------
|||||||||||||||||||||||||||||    |||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGCCAGCCAG-C---CAG--------------------
|||||||||||||||||||||||||||| |   |||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGCCAGCCAG----CCAG--------------------
||||||||||||||||||||||||||||    ||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGCCAGCCA----GCCAG--------------------
|||||||||||||||||||||||||||    |||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGCCAGCC----AGCCAG--------------------
||||||||||||||||||||||||||    ||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGCCAGC---C-AGCCAG--------------------
|||||||||||||||||||||||||   | ||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGCCAG-C--C-AGCCAG--------------------
|||||||||||||||||||||||| |  | ||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGCCAGC----CAGCCAG--------------------
|||||||||||||||||||||||||    |||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGCCAG-C---CAGCCAG--------------------
|||||||||||||||||||||||| |   |||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGCCAG----CCAGCCAG--------------------
||||||||||||||||||||||||    ||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGCCA----GCCAGCCAG--------------------
|||||||||||||||||||||||    |||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGCC----AGCCAGCCAG--------------------
||||||||||||||||||||||    ||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGC---C-AGCCAGCCAG--------------------
|||||||||||||||||||||   | ||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAG-C--C-AGCCAGCCAG--------------------
|||||||||||||||||||| |  | ||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAGC----CAGCCAGCCAG--------------------
|||||||||||||||||||||    |||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAG-C---CAGCCAGCCAG--------------------
|||||||||||||||||||| |   |||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCAG----CCAGCCAGCCAG--------------------
||||||||||||||||||||    ||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCCA----GCCAGCCAGCCAG--------------------
|||||||||||||||||||    |||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGCC----AGCCAGCCAGCCAG--------------------
||||||||||||||||||    ||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGC---C-AGCCAGCCAGCCAG--------------------
|||||||||||||||||   | ||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAG-C--C-AGCCAGCCAGCCAG--------------------
|||||||||||||||| |  | ||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAGC----CAGCCAGCCAGCCAG--------------------
|||||||||||||||||    |||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAG-C---CAGCCAGCCAGCCAG--------------------
|||||||||||||||| |   |||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCAG----CCAGCCAGCCAGCCAG--------------------
||||||||||||||||    ||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCCA----GCCAGCCAGCCAGCCAG--------------------
|||||||||||||||    |||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGCC----AGCCAGCCAGCCAGCCAG--------------------
||||||||||||||    ||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGC---C-AGCCAGCCAGCCAGCCAG--------------------
|||||||||||||   | ||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAG-C--C-AGCCAGCCAGCCAGCCAG--------------------
|||||||||||| |  | ||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAGC----CAGCCAGCCAGCCAGCCAG--------------------
|||||||||||||    |||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAG-C---CAGCCAGCCAGCCAGCCAG--------------------
|||||||||||| |   |||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCAG----CCAGCCAGCCAGCCAGCCAG--------------------
||||||||||||    ||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCCA----GCCAGCCAGCCAGCCAGCCAG--------------------
|||||||||||    |||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGCC----AGCCAGCCAGCCAGCCAGCCAG--------------------
||||||||||    ||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGC---C-AGCCAGCCAGCCAGCCAGCCAG--------------------
|||||||||   | ||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAG-C--C-AGCCAGCCAGCCAGCCAGCCAG--------------------
|||||||| |  | ||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAGC----CAGCCAGCCAGCCAGCCAGCCAG--------------------
|||||||||    |||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAG-C---CAGCCAGCCAGCCAGCCAGCCAG--------------------
|||||||| |   |||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCAG----CCAGCCAGCCAGCCAGCCAGCCAG--------------------
||||||||    ||||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCCA----GCCAGCCAGCCAGCCAGCCAGCCAG--------------------
|||||||    |||||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGCC----AGCCAGCCAGCCAGCCAGCCAGCCAG--------------------
||||||    ||||||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGC---C-AGCCAGCCAGCCAGCCAGCCAGCCAG--------------------
|||||   | ||||||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAG-C--C-AGCCAGCCAGCCAGCCAGCCAGCCAG--------------------
|||| |  | ||||||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAGC----CAGCCAGCCAGCCAGCCAGCCAGCCAG--------------------
|||||    |||||||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAG-C---CAGCCAGCCAGCCAGCCAGCCAGCCAG--------------------
|||| |   |||||||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACAG----CCAGCCAGCCAGCCAGCCAGCCAGCCAG--------------------
||||    ||||||||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

ACA----GCCAGCCAGCCAGCCAGCCAGCCAGCCAG--------------------
|||    |||||||||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

AC----AGCCAGCCAGCCAGCCAGCCAGCCAGCCAG--------------------
||    ||||||||||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

A---C-AGCCAGCCAGCCAGCCAGCCAGCCAGCCAG--------------------
|   | ||||||||||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

--A-C-AGCCAGCCAGCCAGCCAGCCAGCCAGCCAG--------------------
  | | ||||||||||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

A----CAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG--------------------
|    |||||||||||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32

--A--CAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG--------------------
  |  |||||||||||||||||||||||||||||||                    
ACAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAG
  Score=32
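
The *alignment syntax used in the loop that printed these alignments is ordinary Python argument unpacking. Each pairwise2 alignment is a five-field tuple (the two aligned sequences, the score, and the begin and end positions), and the * spreads those fields over the function's parameters. The sketch below shows the mechanism with a hypothetical stand-in function named describe and a hand-made tuple; neither comes from BioPython.

```python
def describe(seq_a, seq_b, score, begin, end):
    """Hypothetical stand-in for a function that takes five separate arguments."""
    return f"{seq_a} vs {seq_b}: score={score}, span={begin}-{end}"

# A hand-made tuple with the same five-field layout (values invented).
alignment = ("ACAG----", "ACAGCCAG", 4, 0, 8)

# These two calls are equivalent; the * spreads the tuple over the parameters.
unpacked = describe(*alignment)
explicit = describe(alignment[0], alignment[1], alignment[2],
                    alignment[3], alignment[4])

print(unpacked)
print(unpacked == explicit)
```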