[Bioperl-l] WGS, WGS_SCAFLD support added for GenBank files
Chris Fields
cjfields at uiuc.edu
Fri Mar 10 08:48:40 EST 2006
The second sequence was built by bioperl, so postprocess_data isn't
working as expected. Yesterday I committed a change to NCBIHelper in CVS
that fixes this by retrieving the sequence directly from NCBI in
'fasta' format.
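
For anyone following along, here is a rough Python sketch of what
building a scaffold from its pieces involves. It is not the BioPerl
code. It assumes the record carries a GenBank-style CONTIG
join(...) line. The accessions ctgA.1 and ctgB.1 and the function names
are made up for illustration. It shows the three things that went wrong
in the second sequence: gaps must be expanded to Ns, the 1-based
inclusive end coordinate must not drop the last base, and
complement(...) pieces must be reverse-complemented. Only numeric
gap(N) elements are handled.

```python
import re

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse-complement a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


# One join() element: gap(N), ACC:start..end, or complement(ACC:start..end)
_ELEMENT = re.compile(
    r"gap\((?P<gap>\d+)\)"
    r"|(?P<comp>complement\()?(?P<acc>[\w.]+):(?P<start>\d+)\.\.(?P<end>\d+)\)?"
)


def assemble(contig_line, contigs):
    """Build a scaffold from a join(...) string and a dict of contig sequences."""
    body = contig_line.strip()
    if body.startswith("join(") and body.endswith(")"):
        body = body[5:-1]
    parts = []
    for m in _ELEMENT.finditer(body):
        if m.group("gap") is not None:
            # A gap of N bp becomes a run of N 'N' characters.
            parts.append("N" * int(m.group("gap")))
            continue
        start, end = int(m.group("start")), int(m.group("end"))
        # Coordinates are 1-based and inclusive, so the slice is [start-1:end].
        piece = contigs[m.group("acc")][start - 1:end]
        if m.group("comp"):
            piece = revcomp(piece)
        parts.append(piece)
    return "".join(parts)


# Hypothetical contigs, for illustration only.
contigs = {"ctgA.1": "ACGTACGT", "ctgB.1": "AAACCC"}
scaffold = assemble("join(ctgA.1:1..4,gap(5),complement(ctgB.1:2..6))", contigs)
print(scaffold)
```

Fetching the finished FASTA from NCBI sidesteps all of this, which is
why the fix goes that way.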
Chris
On Mar 9, 2006, at 8:38 PM, Brian Osborne wrote:
> Chris,
>
> Below...
>
....
>>
>>> CH398085 Oryza sativa (indica cultivar-group) chromosome 1
>>> scaffold000005
>> genomic scaffold, whole genome shotgun sequence (from NCBI)
>> ....
>> TTAGGTGGTTTTATAACTTTAGACTTTGGGAATTTTCATATCACCTGGACACTATGGAAT
>> TGTTGGATGATGGTGGAATTGGACATACACCTCTCTTCCTCTTTCAAAACCCCTAAAACC
>> TGTTTTCGGTGGGGTTTGGGTGCATGCCAGTTGTGGGAAGTAGCACCCCGGGCACTATAA
>> GGATTAAGCTCAGGCCTCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>> NNNNNNNNNCTGAGTACTGTGGTTGTACTCATTCTTGCTCAATCTTTTCCCCCTTCAGTA
>> AGAGAAGATTTGGAGAAGAAGTCTTAGGTGGAGTCCTGGCTTATACCCCAGTTGAGCGCC
>> TGTGAAGATGGAGCCGTAGGCCCGCTAGTCCGCTGCTGTTTATTTTTGATTGTCAGGCCT
>> TAAGTGCCTTTGTAATAATGTAAATATTATCGATATAATAAAGATGTGTCTTTTATATCA
>> TGTTTGTGTGGTGTACCCCGGCTTTTCCTGGGACGGGGATTAATACACTAGCGTTCGGGA
>> AAAGGCAATTTTCCCGGTCGCGACAGAACTTGTAATTCTCTAGCACTAGAATGACATATC
>> CTTTGGATTGTGCACCAATGCCACGCGAAAACCCATGGTGCCAAAACTAGGGGTGGAAAA
>> ACCTCCGAGACCTCCTCCGAAGAGGCAGGTGACAGGTAAGGCGGAGGAACCCGAGATGCA
>> TAAGGAAAATCCAGTGCCGGAAGTGCCACCGGAGATTGCAGTGCCGGAGGTGCCCATGGA
>> GATTGTAGTGCCGTTGTCCCAATGGAGATTACAGTGGCAGAACCAGAGGTGCAAATTGTG
>> GCATCAGTCGGGACATATATAGAAGAAGTAGTACGATTGGAATGGGACGGTACAGAGCCA
>> GAAATATTTGAAGACCCTTCTCCTGCGAAAGACCCCGAGGTGCAAGAAACCCCGGTCCCT
>> GAGAAGGCCACTGACAATTCTAAGGTGCCTAAAGTGCTTATGAGCCACGACTCCAAGTCT
>> AAAGATGAGAACAATGAGAAGTTCATGGGCTAACCATCTTCAGAGGGGGTAAGGAACGTG
>> CCAAACTCAGAGATGATGACCCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>> NNNNNNNNNNNNNTACTTGTTGCAATAATCTTGCTCCGGAGTAAGTGGTTATAGGATGCA
>> AGTACAATAACTAGTTGTAGACAAAGTCAATGACGATACGGAGAAGAATAAGCGCAATGT
>>
>>
>>
>>> CH398085 Oryza sativa (indica cultivar-group) chromosome 1
>>> scaffold000005
>> genomic scaffold, whole genome shotgun sequence (bioperl's version)
>> ....
>> TTAGGTGGTTTTATAACTTTAGACTTTGGGAATTTTCATATCACCTGGACACTATGGAAT
>> TGTTGGATGATGGTGGAATTGGACATACACCTCTCTTCCTCTTTCAAAACCCCTAAAACC
>> TGTTTTCGGTGGGGTTTGGGTGCATGCCAGTTGTGGGAAGTAGCACCCCGGGCACTATAA
>> GGATTAAGCTCAGGCCTC <----no gap, missing base
>>
>> CTGAGTACTGTGGTTGTACTCATTCTTGCTCAATCTTTTCCC
>> CCTTCAGTAAGAGAAGATTTGGAGAAGAAGTCTTAGGTGGAGTCCTGGCTTATACCCCAG
>> TTGAGCGCCTGTGAAGATGGAGCCGTAGGCCCGCTAGTCCGCTGCTGTTTATTTTTGATT
>> GTCAGGCCTTAAGTGCCTTTGTAATAATGTAAATATTATCGATATAATAAAGATGTGTCT
>> TTTATATCATGTTTGTGTGGTGTACCCCGGCTTTTCCTGGGACGGGGATTAATACACTAG
>> CGTTCGGGAAAAGGCAATTTTCCCGGTCGCGACAGAACTTGTAATTCTCTAGCACTAGAA
>> TGACATATCCTTTGGATTGTGCACCAATGCCACGCGAAAACCCATGGTGCCAAAACTAGG
>> GGTGGAAAAACCTCCGAGACCTCCTCCGAAGAGGCAGGTGACAGGTAAGGCGGAGGAACC
>> CGAGATGCATAAGGAAAATCCAGTGCCGGAAGTGCCACCGGAGATTGCAGTGCCGGAGGT
>> GCCCATGGAGATTGTAGTGCCGTTGTCCCAATGGAGATTACAGTGGCAGAACCAGAGGTG
>> CAAATTGTGGCATCAGTCGGGACATATATAGAAGAAGTAGTACGATTGGAATGGGACGGT
>> ACAGAGCCAGAAATATTTGAAGACCCTTCTCCTGCGAAAGACCCCGAGGTGCAAGAAACC
>> CCGGTCCCTGAGAAGGCCACTGACAATTCTAAGGTGCCTAAAGTGCTTATGAGCCACGAC
>> TCCAAGTCTAAAGATGAGAACAATGAGAAGTTCATGGGCTAACCATCTTCAGAGGGGGTA
>> AGGAACGTGCCAAACTCAGAGATGATGACCC <---- no gap, missing base
>>
>> GATGGTGGGTTAGCCTGCCTAGCTAGTTC <---- should be revcomp
>> GAAGCGGCACTCCTTTTAATTATTTGATATTAGATCATTTTTTAATATTTGTGTTTTTAC
>> AAGTACCGCGAGGTACAACCTCATGGACAGGAACAACGCTTTTTTGCAACATATATTTTA
>> TACGAAATCTATGCTTTCTGTAAAGTTAAAGCACACTAAATCTAAAGCTTAATATACAAC
>> CATGCCACATCATCACCCACTAGCAATAATTATATATTTAATCTCATACAAGCATACAAA
>
> Here's the sequence from NCBI:
>
> 1621 ttaggtggtt ttataacttt agactttggg aattttcata tcacctggac
> actatggaat
> 1681 tgttggatga tggtggaatt ggacatacac ctctcttcct ctttcaaaac
> ccctaaaacc
> 1741 tgttttcggt ggggtttggg tgcatgccag ttgtgggaag tagcaccccg
> ggcactataa
> 1801 ggattaagct caggcctct
> [gap 50 bp]
> 1870 c tgagtactgt ggttgtactc attcttgctc aatcttttcc
> cccttcagta
> 1921 agagaagatt tggagaagaa gtcttaggtg gagtcctggc ttatacccca
> gttgagcgcc
> 1981 tgtgaagatg gagccgtagg cccgctagtc cgctgctgtt tatttttgat
> tgtcaggcct
> 2041 taagtgcctt tgtaataatg taaatattat cgatataata aagatgtgtc
> ttttatatca
> 2101 tgtttgtgtg gtgtaccccg gcttttcctg ggacggggat taatacacta
> gcgttcggga
> 2161 aaaggcaatt ttcccggtcg cgacagaact tgtaattctc tagcactaga
> atgacatatc
> 2221 ctttggattg tgcaccaatg ccacgcgaaa acccatggtg ccaaaactag
> gggtggaaaa
> 2281 acctccgaga cctcctccga agaggcaggt gacaggtaag gcggaggaac
> ccgagatgca
> 2341 taaggaaaat ccagtgccgg aagtgccacc ggagattgca gtgccggagg
> tgcccatgga
> 2401 gattgtagtg ccgttgtccc aatggagatt acagtggcag aaccagaggt
> gcaaattgtg
> 2461 gcatcagtcg ggacatatat agaagaagta gtacgattgg aatgggacgg
> tacagagcca
> 2521 gaaatatttg aagacccttc tcctgcgaaa gaccccgagg tgcaagaaac
> cccggtccct
> 2581 gagaaggcca ctgacaattc taaggtgcct aaagtgctta tgagccacga
> ctccaagtct
> 2641 aaagatgaga acaatgagaa gttcatgggc taaccatctt cagagggggt
> aaggaacgtg
> 2701 ccaaactcag agatgatgac ccc
> [gap 50 bp]
> 2774 tacttgt tgcaataatc ttgctccgga gtaagtggtt
> ataggatgca
> 2821 agtacaataa ctagttgtag acaaagtcaa tgacgatacg gagaagaata
> agcgcaatgt
> 2881 cagaccagct tgttataatc cagtaacagt aagtaaactc cgtaccgttc
> gtttttttca
> 2941 ttcattttaa ttattgtccg ttgcaggctt gcagcagtca catgagtgcg
> tataaatgca
> 3001 ccgatttcaa gcccggtgct attaatcaat agattcttct tcactgtggt
> tcgacaaaca
> 3061 atgaaactag tataactata gtataactag gtgattcctc acgctttccc
> gtgctttgtt
> 3121 gtaaaattta ctaagaaatt ctcaatatgt tttttttaca atcaaactag
> gattacgaag
>
> It agrees with the 1st sequence, not the second sequence.
>
> Brian O.
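
A quick way to confirm where the two versions part ways is to look for
the first mismatching position. This is a minimal Python sketch; the
function name is made up, and the two strings are just the stretch
around the first gap copied from the sequences quoted above, with the
50 bp gap written out as Ns.

```python
def first_difference(a, b):
    """Return the 0-based index where a and b first differ, or None if equal."""
    for i, (x, y) in enumerate(zip(a.upper(), b.upper())):
        if x != y:
            return i
    # No mismatch in the shared prefix: they differ only if lengths differ.
    return None if len(a) == len(b) else min(len(a), len(b))


# Stretch around the first gap, as quoted in this thread.
ncbi = "GGATTAAGCTCAGGCCTCT" + "N" * 50 + "CTGAGTACTG"
bioperl = "GGATTAAGCTCAGGCCTC" + "CTGAGTACTG"

idx = first_difference(ncbi, bioperl)
print(idx, ncbi[idx], bioperl[idx])
```

The mismatch falls on the last base before the gap, which is the
"missing base" flagged in the second sequence.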
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign