Annotated genes and genome data from the FlyBase Chado Annotation Database are available through searches, maps and reports at Genome Annotations & Sequences.
Notes
Drosophila melanogaster genome annotation release 3.2.0 date 03162004
DATA CONTENTS
Feature counts in release 3.2 (r320, March 2004)
compared with release 3.1 (r310d, Dec 2003; r310g, Spring 2003)
Feature r320 r310d r310g
-----------------------------------------------------------------
BAC 949 949 --
CDS 18746 18109 18122
DNA_motif 5 0 0
EST 304257 302509 --
aberration_junction 87 0 0
cDNA_clone 10204 10197 --
enhancer 27 0 0
gene 13473 13369 13377
insertion_site 424 0 0
mRNA 18810 18153 18122
mature_peptide 8 0 0
ncRNA 65 95 60
oligonucleotide 193813 193168 --
point_mutation 476 0 0
polyA_site 101 0 0
processed_transcript 16748 14677 --
protein 233812 211135 --
protein_binding_site 85 0 0
pseudogene 39 17 19
rRNA 85 0 0
region 28 0 0
regulatory_region 136 0 0
repeat_region 3390 3021 --
rescue_fragment 135 0 0
segment 437 437 437
sequence_variant 225 0 0
signal_peptide 1 0 0
snRNA 28 28 28
snoRNA 28 28 28
so 16244 14334 0
tRNA 288 288 288
transcription_start_site 16997 16832 --
transposable_element 1567 1571 1508
transposable_element_inserti.. 4566 4346 --
-----------------------------------------------------------------
-- data unavailable for this feature
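To see how counts changed between releases, the table can be read into a small lookup and differenced per feature. The sketch below is a hypothetical Python helper, not a FlyBase tool: it transcribes only a few rows, and it treats '--' as missing rather than zero, since the legend marks it as data unavailable.

```python
# A few rows transcribed from the feature-count table; "--" = data unavailable.
COLUMNS = ("r320", "r310d", "r310g")
ROWS = """\
CDS 18746 18109 18122
EST 304257 302509 --
gene 13473 13369 13377
mRNA 18810 18153 18122
ncRNA 65 95 60
pseudogene 39 17 19
transposable_element 1567 1571 1508
"""


def parse_counts(text):
    """Return {feature: (r320, r310d, r310g)}, with None where the table shows '--'."""
    table = {}
    for line in text.splitlines():
        feature, *cells = line.split()
        table[feature] = tuple(None if cell == "--" else int(cell) for cell in cells)
    return table


def change(table, feature, new="r320", old="r310d"):
    """Count in release `new` minus count in release `old`; None if either is unavailable."""
    row = table[feature]
    a, b = row[COLUMNS.index(new)], row[COLUMNS.index(old)]
    return None if a is None or b is None else a - b


counts = parse_counts(ROWS)
for feature in counts:
    print(f"{feature:22s} {change(counts, feature)}")
```

For example, gene goes from 13369 to 13473 (+104) and ncRNA from 95 to 65 (-30) between r310d and r320.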
Data are taken from the Postgres Chado database, release 3.2.0, dated 03162004.
A copy (Mar 16 2004) is at
ftp://flybase.net/genomes/Drosophila_melanogaster/dmel_r3.2.0_03162004/pgsql/chado_r3.2_19.gz
WEB FUNCTIONS
Data updates, with some software changes, for:
-- Gene annotation reports - updated and extended symbols, synonyms, IDs and
annotation notes; other features added.
-- Genome maps (gbrowse) - new feature types added
-- Sequence reports - new features (mature peptides, signal peptides, etc.)
See http://flybase.net/annot/
SYMBOLS and IDS
Symbols and IDs for annotations in this release have been updated
to correspond closely with the gene data.
The transcript and translation/CDS symbols and IDs used by FlyBase
have changed somewhat over the last year. An annotation has an ID
of the form CG00000 (with a corresponding FBan00000, which is being
de-emphasized). Its mRNA and CDS carry -Rx and -Px suffixes respectively,
where the letter 'x' runs through as many variants as are found
(-RA, -RB, ...).
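A minimal Python sketch of this naming scheme, assuming variant codes are one or more capital letters and that an mRNA and its CDS share the same letter (CG8094-RA with CG8094-PA). The helper names split_id and cds_for_mrna are hypothetical.

```python
import re

# Assumed shape of transcript/translation IDs: <base>-R<variant> or <base>-P<variant>,
# where <variant> is one or more capital letters (A, B, C, ...).
ID_PATTERN = re.compile(r"(?P<base>.+)-(?P<kind>[RP])(?P<variant>[A-Z]+)")


def split_id(feature_id):
    """Split 'CG8094-RA' into ('CG8094', 'R', 'A'); return None if there is no -Rx/-Px suffix."""
    m = ID_PATTERN.fullmatch(feature_id)
    return (m["base"], m["kind"], m["variant"]) if m else None


def cds_for_mrna(mrna_id):
    """Return the -Px CDS ID paired with an -Rx mRNA ID (same variant letter assumed)."""
    parts = split_id(mrna_id)
    if parts is None or parts[1] != "R":
        raise ValueError(f"not an mRNA ID: {mrna_id!r}")
    base, _, variant = parts
    return f"{base}-P{variant}"


print(split_id("CG8094-RA"))      # ('CG8094', 'R', 'A')
print(cds_for_mrna("CG8094-RB"))  # CG8094-PB
```

The greedy base match also copes with gene names that contain hyphens, so Hex-C-PA splits into Hex-C, P and A.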
In release 3.2, the standard CG00000 symbols for gene annotations
have been replaced with the accepted gene name (where available); thus
CG8094, CG8094-RA and CG8094-PA become gene 'Hex-C', Hex-C-RA and Hex-C-PA.
The CG8094 ID is supported as a more computable alternative to the
symbolic name, but it will be less visible than the more consistent and
memorable gene names. The data files are still somewhat inconsistent about
when to use 'Hex-C-PA' and when to use CG8094-PA.
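Because both forms appear in the data files, a two-way translation can help when matching records. This is a hypothetical sketch: GENE_NAMES stands in for a real CG-ID-to-gene-name table, and only the CG8094/Hex-C pair is taken from these notes.

```python
# Hypothetical lookup table (CG annotation ID -> accepted gene name). Only the
# CG8094/Hex-C pair comes from these notes; real pairs would come from the release data.
GENE_NAMES = {"CG8094": "Hex-C"}


def to_symbol(feature_id, names=GENE_NAMES):
    """'CG8094-RA' -> 'Hex-C-RA'; IDs without an accepted gene name are returned unchanged."""
    base, sep, suffix = feature_id.partition("-")  # CG IDs themselves contain no '-'
    return names.get(base, base) + sep + suffix


def to_cg(symbol, names=GENE_NAMES):
    """'Hex-C-PA' -> 'CG8094-PA', recovering the more computable CG-based ID."""
    by_name = {name: cg for cg, name in names.items()}
    # Gene names can contain '-', so try the longest known name first.
    for name in sorted(by_name, key=len, reverse=True):
        if symbol == name or symbol.startswith(name + "-"):
            return by_name[name] + symbol[len(name):]
    return symbol


print(to_symbol("CG8094-PA"))  # Hex-C-PA
print(to_cg("Hex-C-RA"))       # CG8094-RA
```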
BULK FILE SET
See ftp://flybase.net/genomes/Drosophila_melanogaster/current/
blast/ - updated NCBI BLAST database set for transcripts, translations and transposons
dna/ - dna in fasta and/or raw format, one file per chromosome arm; unchanged from release 3 data.
fasta/ - dna and protein data per chromosome and feature type
feats-all/ - intermediate files of all feature locations in tabular format
gff/ - GFF v2 standard feature files per chromosome
gnomap/ - Gnomap standard feature files per chromosome (these drive the genome map views)
pgsql/ - Postgres Chado database dump, source of most of these files
srs/ - SRS search indices
fbobs/ - Acode format annotation object data files for web services
xml-chado/ - Chado format XML database output of genes, dna and other features, per scaffold
xml-game/ - GAME format XML database output of genes, dna and other features, per scaffold
Bulk files compared to those of release 3.1:
whole_genome_* -- create by concatenating the per-chromosome file sets
heterochromatin_* and (2h,3h,Xh,Yh,U) -- 'heterosomes', to be added
euchromatin_* -- create by concatenating the per-chromosome file sets, excluding the 'heterosomes'
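The concatenation recipes above can be scripted. A sketch in Python, assuming uncompressed per-arm files named by the $org_$chr_$feature_$release.$format pattern described in these notes; the output names euchromatin_<feature>.<fmt> and whole_genome_<feature>.<fmt>, and the function names, are hypothetical.

```python
from pathlib import Path

EUCHROMATIC_ARMS = ["2L", "2R", "3L", "3R", "X", "4"]
HETEROSOMES = ["2h", "3h", "Xh", "Yh", "U"]  # 'heterosomes', not yet in r3.2


def catenate(parts, out_path):
    """Concatenate uncompressed text files like `cat`, making sure each part ends with a newline."""
    with open(out_path, "wb") as out:
        for part in parts:
            data = Path(part).read_bytes()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(b"\n")  # keep the next file's first record on its own line


def build_sets(directory, feature, release, fmt="fasta"):
    """Write euchromatin_<feature>.<fmt> and whole_genome_<feature>.<fmt> from per-arm files.

    Per-arm inputs follow dmel_<chr>_<feature>_<release>.<fmt>; arms with no file are skipped.
    """
    d = Path(directory)

    def present(arms):
        paths = (d / f"dmel_{arm}_{feature}_{release}.{fmt}" for arm in arms)
        return [p for p in paths if p.exists()]

    euchromatin = present(EUCHROMATIC_ARMS)
    catenate(euchromatin, d / f"euchromatin_{feature}.{fmt}")
    catenate(euchromatin + present(HETEROSOMES), d / f"whole_genome_{feature}.{fmt}")
```

Once heterochromatin files are released, the same call picks them up for the whole-genome set without code changes. Gzipped inputs would need decompressing first, because the newline guard only makes sense for plain text.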
Per-chromosome set (shown for arm 2L; old name == new name or how to build it):
2L_3_UTR, 2L_5_UTR == dmel_2L_three_prime_UTR, dmel_2L_five_prime_UTR
2L_CDS == dmel_2L_CDS
2L_annotation == catenate dmel_2L_gene with (tRNA,miscRNA,transposon) set
2L_annotation_extend5000 == dmel_2L_gene_extended5000, minus (tRNA,miscRNA,transposon) set
2L_annotation_extend2000 .. not planned
2L_annotation_extend500 .. not planned
2L_exon .. not planned
2L_genomic == dmel_2L_chromosome (chromosome arm dna, same as rel3.1)
2L_genomic_scaffolds == dmel_2L_scaffolds (segment dna, same as rel3.1)
2L_intron .. not planned
2L_masked_genomic .. not planned
2L_noncoding-gene == catenate (tRNA,miscRNA,transposon,pseudogene)
2L_protein-coding-gene == dmel_2L_gene
2L_splice_site .. not planned
2L_tRNA == dmel_2L_tRNA
2L_transcript == dmel_2L_transcript
2L_translation == dmel_2L_translation (curated translations)
2L_transposable_element == dmel_2L_transposon
2L_unique_intergenic .. not planned
2L_unique_intron .. not planned
Not in the past release: dmel_2L_miscRNA, dmel_2L_pseudogene
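When updating scripts written against release 3.1, the one-to-one renames in this list can be kept in a lookup table. The sketch below transcribes the list for any arm. OLD_TO_NEW, NOT_PLANNED and new_name are hypothetical names. Composite entries such as 2L_annotation, 2L_annotation_extend5000 and 2L_noncoding-gene are deliberately left out, because they have to be built by catenating several files.

```python
# Release 3.1 per-arm file suffix -> release 3.2 dmel_<arm>_ suffix (one-to-one cases only).
OLD_TO_NEW = {
    "3_UTR": "three_prime_UTR",
    "5_UTR": "five_prime_UTR",
    "CDS": "CDS",
    "genomic": "chromosome",
    "genomic_scaffolds": "scaffolds",
    "protein-coding-gene": "gene",
    "tRNA": "tRNA",
    "transcript": "transcript",
    "translation": "translation",
    "transposable_element": "transposon",
}

# Release 3.1 files marked "not planned" for release 3.2.
NOT_PLANNED = {
    "annotation_extend2000", "annotation_extend500", "exon", "intron",
    "masked_genomic", "splice_site", "unique_intergenic", "unique_intron",
}


def new_name(old):
    """'2L_3_UTR' -> 'dmel_2L_three_prime_UTR'; None if no r3.2 equivalent is planned."""
    arm, _, suffix = old.partition("_")
    if suffix in NOT_PLANNED:
        return None
    if suffix not in OLD_TO_NEW:
        # e.g. 2L_annotation: must be assembled by catenating several r3.2 files
        raise KeyError(f"{old!r} has no one-to-one release 3.2 file")
    return f"dmel_{arm}_{OLD_TO_NEW[suffix]}"
```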
File name format:
$org_$chr_$feature_$release.$format
$org in (dmel)
$chr in (2L 2R 3L 3R X 4), (2h 3h Xh Yh U)
$feature in (
gene, mRNA, CDS, CDS-translation,
transposon/transposable_element, pseudogene,
tRNA, miscRNA (= ncRNA, snRNA, snoRNA, rRNA),
gene-extended5000,
chromosome-arm,
scaffold
)
$release in (
r3.1.0g (gadfly, summer 2003)
r3.1.0d (chado r3.1.0_12182003)
r3.2.0a (chado r3.2.0_12052003)
r3.2.0c (chado r3.2.0_03162004)
)
$format in (
.fasta(.gz)
.gff(.gz)
.chado.xml(.gz)
.game.xml(.gz)
)
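A sketch of building and parsing names in this format. Feature names such as transposable_element contain an underscore, so the parser anchors on the known chromosome, release and format values instead of splitting on '_'. The value lists are copied from above. make_name and parse_name are hypothetical helpers.

```python
import re

# Allowed values, copied from the file name format above.
CHROMOSOMES = ["2L", "2R", "3L", "3R", "X", "4", "2h", "3h", "Xh", "Yh", "U"]
RELEASES = ["r3.1.0g", "r3.1.0d", "r3.2.0a", "r3.2.0c"]
FORMATS = ["fasta", "gff", "chado.xml", "game.xml"]


def _alternatives(values):
    """Regex alternation that matches any of the literal values."""
    return "|".join(re.escape(v) for v in values)


# $feature is matched loosely (.+) between the fixed chromosome and release fields,
# so underscores inside it (transposable_element) are handled.
FILE_RE = re.compile(
    rf"(?P<org>dmel)_(?P<chr>{_alternatives(CHROMOSOMES)})_(?P<feature>.+)"
    rf"_(?P<release>{_alternatives(RELEASES)})"
    rf"\.(?P<format>{_alternatives(FORMATS)})(?P<gz>\.gz)?"
)


def make_name(chrom, feature, release, fmt, gz=False):
    """Assemble $org_$chr_$feature_$release.$format, optionally with .gz."""
    name = f"dmel_{chrom}_{feature}_{release}.{fmt}"
    return name + ".gz" if gz else name


def parse_name(name):
    """Split a file name back into its fields; raises ValueError if it does not fit."""
    m = FILE_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a $org_$chr_$feature_$release.$format name: {name!r}")
    fields = m.groupdict()
    fields["gz"] = fields["gz"] is not None
    return fields


print(parse_name("dmel_2L_transposable_element_r3.2.0c.gff.gz"))
```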
ANNOTATION RELEASE 3.1 HOLD-OVERS
ftp://flybase.net/genomes/Drosophila_melanogaster/dmel_RELEASE3-1/
Annotations_and_Evidence/ GFF/ blastdb/
FASTA/ README
Annotations_and_Evidence/
------
>>> euchromatic scaffolds, updated in the r3.2 release
AE002603.xml.gz
..
AE003847.xml.gz
>>> heterochromatin and centromere scaffolds - no r3.2 equivalent yet
AABU01000058.xml.gz
..
AABU01002775.xml.gz
2L_wgs3_centromere_extension.xml.gz
2R_wgs3_centromere_extension.xml.gz
3L_wgs3_centromere_extension.xml.gz
3R_wgs3_centromere_extension.xml.gz
X_wgs3_centromere_extensionB.xml.gz
linked_1.xml.gz
linked_2.xml.gz
linked_3.xml.gz
linked_4.xml.gz
linked_5.xml.gz
linked_6.xml.gz
linked_7.xml.gz
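When mixing r3.1 hold-overs with r3.2 data, the file name shows which group a file belongs to. Below is a hypothetical helper. It assumes the prefixes in this listing cover every file: AE for euchromatic scaffolds, and AABU, *_centromere_extension and linked_ for the heterochromatin and centromere set.

```python
def classify_r31_file(filename):
    """Group an r3.1 Annotations_and_Evidence file name as in the listing above.

    Assumes the prefixes seen in the listing are the whole story.
    """
    if not filename.endswith(".xml.gz"):
        return "unknown"
    if filename.startswith("AE"):
        return "euchromatic"      # updated in r3.2: prefer the r3.2 files
    if filename.startswith(("AABU", "linked_")) or "_centromere_extension" in filename:
        return "heterochromatic"  # no r3.2 equivalent yet: keep using r3.1
    return "unknown"


for name in ["AE003650.xml.gz", "AABU01000058.xml.gz",
             "X_wgs3_centromere_extensionB.xml.gz", "linked_3.xml.gz"]:
    print(name, classify_r31_file(name))
```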
FASTA
-------------
Heterochromatin sections are not yet available for r3.2:
2h, 3h, Xh, Yh, U (heterochromatin, unclassified)
Block dna (fasta) sections are identical for r3.2:
scaffolds, genomic, masked_genomic
CHADO DATABASE LOOKUP SERVICE
SERVICE URL
http://flybase.net/apollo-cgi/chado2apollo.cgi
Information and software at
http://bugbane.bio.indiana.edu:7092/apollo/
EXAMPLES
http://flybase.net/apollo-cgi/chado2apollo.cgi?scaffold=AE003650
http://flybase.net/apollo-cgi/chado2apollo.cgi?gene=cact
http://flybase.net/apollo-cgi/chado2apollo.cgi?range=2L:300000-310000
http://flybase.net/apollo-cgi/chado2apollo.cgi?band=34A
This service supports the Apollo genome browser/editor,
returning GAME XML gene and genome objects in response
to basic queries by 'gene' name/ID, 'scaffold' section,
genomic base 'range' or cytological 'band'.
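Queries are plain URL parameters, as in the examples above. Below is a minimal Python sketch that builds such URLs and checks the range syntax. lookup_url and fetch_game_xml are hypothetical names. The fetch assumes the service still answers as described here. Its long timeout reflects the 5-10 minute XML generation times noted for gene-region queries.

```python
import re
from urllib.parse import urlencode
from urllib.request import urlopen

SERVICE_URL = "http://flybase.net/apollo-cgi/chado2apollo.cgi"
QUERY_KINDS = ("gene", "scaffold", "range", "band")


def lookup_url(kind, value):
    """Build a chado2apollo query URL for one of the four documented query kinds."""
    if kind not in QUERY_KINDS:
        raise ValueError(f"query kind must be one of {QUERY_KINDS}, not {kind!r}")
    if kind == "range":
        m = re.fullmatch(r"(\w+):(\d+)-(\d+)", value)  # e.g. 2L:300000-310000
        if m is None or int(m.group(2)) > int(m.group(3)):
            raise ValueError(f"range should look like 2L:300000-310000, not {value!r}")
    # Keep ':' unescaped so the URL matches the documented examples.
    return f"{SERVICE_URL}?{urlencode({kind: value}, safe=':')}"


def fetch_game_xml(url, timeout=900):
    """Download the GAME XML reply; generation can be slow, hence the long timeout."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


print(lookup_url("range", "2L:300000-310000"))
```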
It currently works well for scaffold-sized chunks of data, which use
pre-generated XML, but it is very slow (5-10 minutes) at generating XML
for gene-region queries. For now, the default operation returns
pre-generated scaffolds for any query. We will work to improve this.
Send comments to us at
flybase-help AT morgan.harvard.edu