[Bioperl-l] Concatenating Bacterial Genome Sequence

michael watson (IAH-C) michael.watson at bbsrc.ac.uk
Wed Nov 5 06:50:50 EST 2003


Hi

First of all, apologies for posting to both lists at once. I realise a lot of people will get this e-mail twice, but I believe this question is relevant to both lists.

Those of you on the ensembl list will be familiar with my (successful!) attempts to put the Salmonella genome into an ensembl (well, actually, an otter) database - the parse_pathogen script, by and large, worked very well and I have a (mostly) functional website.

The problem comes from the fact that the EMBL entries for the bacterial genomes I am interested in consist of many different sequences, each representing a segment of the genome.  parse_pathogen handles this by creating a new ensembl "chromosome" for each segment.  Of course these bacterial genomes are circular and constant, so splitting them up into chromosomes doesn't make too much sense.  I can get away with it most of the time with typhi CT18, which is in 20 pieces, and typhi Ty2, which is in 16 pieces, but typhimurium LT2 is in 220 pieces.  If I want to pose the question "Are these two genes adjacent on the genome?", normally a very simple task using ensembl, I will have to jump through hoops: figuring out whether the genes are at the ends of segments and, if so, which segments are adjacent and whether the genes are adjacent on the genome but on two different segments...

So what would be really great, and this is where bioperl (maybe) comes in, is something that takes the EMBL entry for the S.typhimurium genome, which is actually 220 EMBL sequences, and creates a single EMBL sequence entry for the whole genome, with all the features updated so that their locations are relative to the start of the whole genome, and not just to the start of the segment they are on.   Has anyone done this, and would they care to share?  If not, any comments on how difficult/easy this might be using Bioperl would be welcome.
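
To make the request concrete, here is a minimal sketch of the coordinate arithmetic involved. It is plain Python rather than Bioperl, and it does no EMBL parsing at all: the Feature and Segment classes and the concatenate function are hypothetical names invented for illustration. It assumes the segments are supplied in genome order, in the same orientation, and without overlaps, and that no feature spans a segment boundary.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Feature:
    """A hypothetical feature record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: int


@dataclass
class Segment:
    """A hypothetical stand-in for one EMBL entry (one genome segment)."""
    accession: str
    sequence: str
    features: List[Feature]


def concatenate(segments: List[Segment]) -> Tuple[str, List[Feature]]:
    """Join segments in the order given and shift each feature by the
    total length of the segments that precede its own segment.

    Assumes the segments are already in genome order, share one
    orientation, and do not overlap.
    """
    parts: List[str] = []
    merged: List[Feature] = []
    offset = 0  # number of bases already placed before the current segment
    for seg in segments:
        for feat in seg.features:
            merged.append(
                Feature(feat.name, feat.start + offset, feat.end + offset, feat.strand)
            )
        parts.append(seg.sequence)
        offset += len(seg.sequence)
    return "".join(parts), merged


# Made-up example data: two short segments with one feature each.
seg1 = Segment("SEG1", "ACGTACGTAC", [Feature("geneA", 2, 5, 1)])
seg2 = Segment("SEG2", "GGGTTT", [Feature("geneB", 1, 3, -1)])

genome, features = concatenate([seg1, seg2])
for feat in features:
    print(feat.name, feat.start, feat.end, feat.strand)
```

With every feature on one coordinate system, the adjacency question reduces to sorting the features by start position and comparing neighbours, plus one wrap-around comparison between the last and first features because the genome is circular.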

Regards

Mick

