[Bioperl-l] GEO SOFT Parser?

Gong Wuming gongwuming at hotmail.com
Sun May 30 22:47:26 EDT 2004


Hi Tex,
I asked the same question here a few days ago but got no response. That is 
a bit surprising, because I thought it would be a relatively common problem.
At first I planned to write a module for parsing the SOFT format under 
Bio::Expression::MicroarrayIO::, but I found that difficult because many 
important base classes in Bioperl-Microarray are not implemented yet, 
especially those for expression data. So I wrote a simple Perl script that 
reads the information in a SOFT file into a data structure. The code is 
below.

-----------------------------------
#! /usr/bin/perl
use strict;
use warnings;
my $hash = {};
my $DATA = {}; # Hash ref that collects all parsed records
my ($last_domain, $this_domain, $last_mark, $this_mark);

# Reading file line by line.
while (<>){
  chomp;

  $this_mark = substr($_, 0, 1); # Get line marker: '^', '!' or '#'

  if ($this_mark =~ /\^|\!/){ # If the line is headed by '^' or '!'.
    my @attr;

    # Extract the key-value pair ("key = value"); the limit of 2 keeps any
    # later " = " inside the value intact.
    my ($key, $value) = split (/\s+=\s+/, substr($_, 1), 2);
    ($this_domain, @attr) = split ("_", $key);
    my $attribute = join ('_', @attr) || 'id';

    if ($this_mark eq '^' and $last_domain) {
      my %attribute = %$hash;
      push (@{$DATA->{$last_domain}}, \%attribute);
      $hash = {};
    }
    $hash->{$attribute} = $value;
  }elsif ($this_mark eq '#'){
    my ($field, $desc) = /^#(.+?)\s+=\s+(.+)$/;
    my ($description, $src) = (split (/;*\s+.+?:\s+/, $desc))[1, 2];
    push (@{$DATA->{'data'}}, {'field'=>$field, 'description'=>$description,
                               'src'=>$src, 'value'=>[]});
  }else{ # Data field.
    next if /^ID_REF/;
    my $i = 0;
    # Append each tab-separated value to its column, in column order.
    push (@{$DATA->{'data'}->[$i++]->{'value'}}, $_) for split (/\t/);
  }
  $last_domain = $this_domain;
  $last_mark = $this_mark;
}
-------------------------------------------------------------
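As a small illustration of the key handling in the script, the sketch below isolates that one step in a helper called split_soft_key. The helper name is made up for this example (it is not part of the script or of Bioperl), and the sample lines are invented rather than copied from a real GEO file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one '^' or '!' line the same way the script does.
# Returns (domain, attribute, value).
sub split_soft_key {
    my ($line) = @_;
    # Drop the leading marker, then split "key = value" once.
    my ($key, $value) = split /\s+=\s+/, substr($line, 1), 2;
    # The text before the first '_' is the domain; the rest names the attribute.
    my ($domain, @attr) = split /_/, $key;
    # A bare key such as "SUBSET" has no attribute part, so it is stored as 'id'.
    my $attribute = join('_', @attr) || 'id';
    return ($domain, $attribute, $value);
}

# Made-up sample lines, for illustration only.
for my $line ('^SUBSET = GDS000_1', '!subset_sample_id = GSM1,GSM2') {
    my ($domain, $attribute, $value) = split_soft_key($line);
    print "$domain / $attribute / $value\n";
}
```

Note that the domain keeps whatever case the line uses, so '^SUBSET' and '!subset_...' give 'SUBSET' and 'subset' respectively.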
The results are stored in a data structure like this:

$DATA = {
  'database'=>{
    'name'=>
    'institute'=>
    'web_link'=>
    'email'=>
    'ref'=>
  }
  'dataset'=>{
    'id'=>
    'completeness'=>
    'description'=>
    'experiment_type'=>
    'maximum_probes'=>
    'order'=>
    'organism'=>
    'platform'=>
    'reference_series'=>
    'title'=>
    'total_samples'=>
    'update_date'=>
    'value_type'=>
  }
  'subset'=>[
    {
      'id'=>
      'description'=>
      'type'=>
      'sample'=>[]
    }
  ]
  'data'=>[
    {
      field => 
      description=>
      src=>
      value=>[]
    }
  ]
}
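
Because the 'data' entries hold the table column by column (one 'value' list per field), getting back one record per probe takes a small transposition. The sketch below shows one way to do it, assuming the structure has the shape outlined above. The helper name rows_from_columns and the sample values are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn the column-wise 'data' list into row-wise records.
# Each returned row is a hash ref keyed by field name.
sub rows_from_columns {
    my ($data) = @_;
    my @columns = @{ $data->{'data'} || [] };
    return () unless @columns;
    # Use the longest column to decide how many rows there are.
    my $n_rows = 0;
    for my $col (@columns) {
        my $n = scalar @{ $col->{'value'} };
        $n_rows = $n if $n > $n_rows;
    }
    my @rows;
    for my $i (0 .. $n_rows - 1) {
        my %row;
        $row{ $_->{'field'} } = $_->{'value'}[$i] for @columns;
        push @rows, \%row;
    }
    return @rows;
}

# Tiny hand-made structure in the shape shown above (values are invented).
my $DATA = {
    'data' => [
        { 'field' => 'ID_REF',     'value' => [ 'probe_a', 'probe_b' ] },
        { 'field' => 'IDENTIFIER', 'value' => [ 'gene_a',  'gene_b'  ] },
        { 'field' => 'GSM1',       'value' => [ '10.5',    '3.2'     ] },
    ],
};

for my $row (rows_from_columns($DATA)) {
    print join("\t", map { $row->{$_} } qw(ID_REF IDENTIFIER GSM1)), "\n";
}
```
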
Wuming Gong
-- 
College of Life Science, 
Wuhan University, China.



