[Bioperl-l] problem to fit genomic coordinates
Kevin Brown
Kevin.M.Brown at asu.edu
Wed Mar 25 13:30:12 EDT 2009
Please keep all replies on list.
Using SimpleAlign removes the problem of deciding which index to increment and reduces the number of loop iterations you will have to perform. Judging by your sample data, many IDs in the second file share the same location, and the first file contains overlapping features. So you will still need to decide which item you really want (e.g. CDS vs. exon).
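The two observations above (many IDs share one location, and several feature types can overlap the same read) can be sketched in plain Perl. This is only an illustration under stated assumptions: the helper names `group_by_location` and `preferred_type` and the ranking in `%rank` are hypothetical and not part of BioPerl, and the preference order (CDS before UTR before exon) is just one possible choice.

```perl
use strict;
use warnings;

# Hypothetical preference order: the lowest rank wins when features overlap.
my %rank = (CDS => 0, five_prime_UTR => 1, exon => 2);

# Collapse reads that share start/end/strand so each location is
# annotated only once. Each read is [id, start, end, strand].
sub group_by_location {
    my (@reads) = @_;
    my %ids_at;
    push @{ $ids_at{ join ':', @{$_}[1 .. 3] } }, $_->[0] for @reads;
    return \%ids_at;
}

# Pick the preferred feature type from a list of overlapping type names.
# Unranked types sort last; ties are broken alphabetically.
sub preferred_type {
    my (@types) = @_;
    my ($best) = sort { ($rank{$a} // 99) <=> ($rank{$b} // 99) or $a cmp $b } @types;
    return $best;
}

my $groups = group_by_location(
    [acc_2807380 => 3832, 3848, '+'],
    [acc_3105138 => 3832, 3848, '+'],
    [acc_3091017 => 3794, 3811, '-'],
);
printf "%s: %s\n", $_, join(',', @{ $groups->{$_} }) for sort keys %$groups;
print preferred_type('exon', 'CDS'), "\n";
```

Grouping first means the lookup against file1 runs once per distinct location rather than once per ID.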
________________________________
From: Laurent MANCHON [mailto:lmanchon at univ-montp2.fr]
Sent: Wednesday, March 25, 2009 9:44 AM
To: Kevin Brown
Subject: Re: [Bioperl-l] problem to fit genomic coordinates
Okay, but I don't think this method makes it easy.
The files are already sorted on the coordinate columns, so maybe another logical
method exists that does not use the BioPerl libraries, for example a while loop,
something like:
$i = $j = 1;
# $idx = number of lines in file1
# $cpt = number of lines in file2
while ($i <= $idx && $j <= $cpt) {
    # compare the current elements
    # increment either $i or $j, depending on which segment comes before the other
}
The difficulty is deciding when to increment $i or $j inside the loop.
Laurent --
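One way to settle the "which index to increment" question is a sweep over both lists sorted by start coordinate, keeping a small list of features that are still "open". The sketch below is not code from this thread. It assumes closed intervals and records of the form [name, start, end], and the function name `overlaps_by_sweep` is hypothetical. Because features in file1 can overlap each other (exon vs. CDS), it keeps a list of active features instead of a single current one.

```perl
use strict;
use warnings;

# For each read, list the features that overlap it.
# Both arguments are array refs of [name, start, end] records.
sub overlaps_by_sweep {
    my ($features, $reads) = @_;
    my @f = sort { $a->[1] <=> $b->[1] } @$features;   # sort by start
    my @r = sort { $a->[1] <=> $b->[1] } @$reads;      # sort by start
    my $i = 0;     # next feature not yet pulled into the active list
    my @active;    # features that may still overlap the current or later reads
    my @hits;
    for my $read (@r) {
        my ($id, $rs, $re) = @$read;
        # Pull in every feature that starts at or before this read's end.
        push @active, $f[$i++] while $i < @f && $f[$i][1] <= $re;
        # Drop features that end before this read starts; later reads
        # start no earlier, so these can never match again.
        @active = grep { $_->[2] >= $rs } @active;
        # An earlier, longer read may have pulled in features that start
        # after this read ends, so check the start coordinate as well.
        my @names = map { $_->[0] } grep { $_->[1] <= $re } @active;
        push @hits, [$id, $rs, $re, @names];
    }
    return \@hits;
}

my $demo = overlaps_by_sweep(
    [ [five_prime_UTR => 3631, 3759], [CDS => 3760, 3913], [exon => 3631, 3913] ],
    [ [acc_362358 => 3640, 3656], [acc_3279485 => 3793, 3813] ],
);
print join("\t", @$_), "\n" for @$demo;
```

After the two sorts, each feature enters the active list once and leaves it once, so the work stays close to linear as long as few features overlap any one position.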
Kevin Brown wrote:
Read in first file and create a Bio::SimpleAlign object
Then use the slice method to find the features that are between the
start/end values of your second file
=head2 slice
 Title    : slice
 Usage    : $aln2 = $aln->slice(20,30)
 Function : Creates a slice from the alignment inclusive of start and
            end columns, and the first column in the alignment is
            denoted 1.
            Sequences with no residues in the slice are excluded from
            the new alignment and a warning is printed. Slice beyond the
            length of the sequence does not do padding.
 Returns  : A Bio::SimpleAlign object
 Args     : Positive integer for start column, positive integer for end
            column, optional boolean which if true will keep gap-only
            columns in the newly created slice. Example:
            $aln2 = $aln->slice(20,30,1)
=cut
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
Laurent MANCHON
Sent: Wednesday, March 25, 2009 7:57 AM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] problem to fit genomic coordinates
This is my problem:
how can I match ranges of genomic coordinates stored in two
distinct files?
The first file (file1.txt) is my annotation file, in this format:
regulatory_region 3455 3463
regulatory_region 3535 3544
regulatory_region 3601 3608
transcriptional_cis_regulatory_region 3622 3630
five_prime_UTR 3631 3759
CDS 3760 3913
exon 3631 3913
CDS 3996 4276
exon 3996 4276
CDS 4486 4605
exon 4486 4605
CDS 4706 5095
exon 4706 5095
CDS 5174 5326
exon 5174 5326
....
....
The second file (file2.txt) is my experimental file, in this format:
acc_2765773 3222 3239 -
acc_2842543 3222 3239 -
acc_2842544 3222 3239 -
acc_442945 3222 3239 -
acc_442946 3222 3239 -
acc_4873 3222 3239 -
acc_53956 3222 3239 -
acc_562588 3222 3239 -
acc_807114 3222 3239 -
acc_84146 3222 3239 -
acc_2419732 3268 3285 +
acc_3041065 3565 3583 +
acc_362358 3640 3656 -
acc_3279485 3793 3813 +
acc_3091017 3794 3811 -
acc_2807380 3832 3848 +
acc_3105138 3832 3848 +
acc_3105139 3832 3848 +
acc_3105140 3832 3848 +
acc_3116450 3832 3848 +
acc_86708 3832 3848 +
acc_1987802 3922 3938 -
acc_1679660 4113 4129 +
acc_891489 4113 4129 +
acc_2829973 4299 4318 +
....
....
Number of lines in file1.txt: ~150,000
Number of lines in file2.txt: ~800,000
So, how can I annotate my file2 using the genomic coordinates stored in
file1? Do I need to compare each range in file2 with each
range in file1: 800,000 x 150,000 combinations (a quadratic
analysis)?
I'm looking for a fast method to do that, something that scales
linearly with the size of the files.
Thank you so much if you have ideas to help me.
Laurent --
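Another way to avoid the all-against-all comparison, independent of sort order, is to index the annotation ranges by fixed-width bins and compare each experimental range only with the features in the bins it touches. The following is a minimal sketch under stated assumptions: the bin width `$BIN`, the helper names `build_bins` and `lookup`, and the [type, start, end] record layout are all hypothetical and not taken from BioPerl.

```perl
use strict;
use warnings;

my $BIN = 1000;    # hypothetical bin width in bases; tune to typical feature length

# Index features ([type, start, end]) under every bin they touch.
sub build_bins {
    my (@features) = @_;
    my %bins;
    for my $f (@features) {
        push @{ $bins{$_} }, $f for int($f->[1] / $BIN) .. int($f->[2] / $BIN);
    }
    return \%bins;
}

# Return the types of all features overlapping the closed range [$start, $end].
sub lookup {
    my ($bins, $start, $end) = @_;
    my (%seen, @hits);
    for my $bin (int($start / $BIN) .. int($end / $BIN)) {
        for my $f (@{ $bins->{$bin} || [] }) {
            next if $seen{$f}++;    # a feature spanning several bins is checked once
            push @hits, $f->[0] if $f->[1] <= $end && $f->[2] >= $start;
        }
    }
    return @hits;
}

my $bins = build_bins(
    [five_prime_UTR => 3631, 3759],
    [CDS            => 3760, 3913],
    [exon           => 3631, 3913],
    [CDS            => 3996, 4276],
    [exon           => 3996, 4276],
);
print join(',', lookup($bins, 3640, 3656)), "\n";
print join(',', lookup($bins, 4113, 4129)), "\n";
```

Each lookup then costs roughly the number of features in a handful of bins, so the total work grows with the size of file2 rather than with the product of the two file sizes.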
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l