Drosophila melanogaster genome annotation
release 3.2.0 date 03162004

DATA CONTENTS

Feature counts in release 3.2 (r320, March 2004) compared to
release 3.1 (Dec 2003, r310d and Spring 2003, r310g)

Feature                              r320    r310d    r310g
-----------------------------------------------------------------
BAC                                   949      949       --
CDS                                 18746    18109    18122
DNA_motif                               5        0        0
EST                                304257   302509       --
aberration_junction                    87        0        0
cDNA_clone                          10204    10197       --
enhancer                               27        0        0
gene                                13473    13369    13377
insertion_site                        424        0        0
mRNA                                18810    18153    18122
mature_peptide                          8        0        0
ncRNA                                  65       95       60
oligonucleotide                    193813   193168       --
point_mutation                        476        0        0
polyA_site                            101        0        0
processed_transcript                16748    14677       --
protein                            233812   211135       --
protein_binding_site                   85        0        0
pseudogene                             39       17       19
rRNA                                   85        0        0
region                                 28        0        0
regulatory_region                     136        0        0
repeat_region                        3390     3021       --
rescue_fragment                       135        0        0
segment                               437      437      437
sequence_variant                      225        0        0
signal_peptide                          1        0        0
snRNA                                  28       28       28
snoRNA                                 28       28       28
so                                  16244    14334        0
tRNA                                  288      288      288
transcription_start_site            16997    16832       --
transposable_element                 1567     1571     1508
transposable_element_inserti..       4566     4346       --
-----------------------------------------------------------------
-- data unavailable for this feature

Data are taken from the Postgres Chado database,
release 3.2.0 date 03162004. Copy at
ftp://flybase.net/genomes/Drosophila_melanogaster/dmel_r3.2.0_03162004/pgsql/chado_r3.2_19.gz,
Mar 16 2004

WEB FUNCTIONS

Updates to data, with some software changes, for
-- Gene annotation reports - updated and extended symbols, synonyms,
   IDs, annotation notes. Other features added.
-- Genome maps (gbrowse) - added new feature types
-- Sequence reports - new features (mature/signal peptides, etc.)

See http://flybase.net/annot/

SYMBOLS and IDS

Symbols and IDs for annotations in this release have been updated to
correspond closely with gene data. The transcript and translation/CDS
symbols and IDs for FlyBase have changed somewhat over the last year.
An annotation has an ID of the form CG00000 (with a corresponding
FBan00000, which is being de-emphasized). Its mRNA and CDS have -Rx
and -Px suffixes respectively, where the letter 'x' extends to as many
variants as are found. In release 3.2, the standard symbols for gene
annotation CG00000 have been replaced with the accepted gene name
(where available); thus CG8094, CG8094-RA, CG8094-PA become
gene 'Hex-C', Hex-C-RA, Hex-C-PA. The CG8094 ID is supported as a more
computable alternative to this symbolic name, but will be less visible
than the more consistent and memorable gene names. There is still some
uncertainty in the data files about when to use 'Hex-C-PA' or
'CG8094-PA'.

BULK FILE SET

See ftp://flybase.net/genomes/Drosophila_melanogaster/current/

blast/      - updated NCBI blast database set for transcripts,
              translations and transposons
dna/        - contains dna in fasta and/or raw format files per
              chromosome-arm; no change from release 3 data.
fasta/      - dna and protein data per chromosome and feature type
feats-all/  - intermediate files of all feature locations in
              tabular format
gff/        - GFF v2 standard feature files per chromosome
gnomap/     - Gnomap standard feature files per chromosome
              (drive genome map views)
pgsql/      - Postgres Chado database dump, source of most of
              these files
srs/        - SRS search indices
fbobs/      - Acode format annotation object data files for
              web services
xml-chado/  - Chado format XML database output of genes, dna and
              other features, per scaffold
xml-game/   - GAME format XML database output of genes, dna and
              other features, per scaffold

Bulk files compared to those of release 3.1:

whole_genome_*     -- create by catenating each chr file set
heterochromatin_*
 and (2h,3h,Xh,Yh,U) -- 'heterosomes' to be added
euchromatin_*      -- create by catenating each chr file set,
                      excluding 'heterosomes'

per chromosome set
    2L_3_UTR, 2L_5_UTR  == dmel_2L_three_prime_UTR,
                           dmel_2L_five_prime_UTR
    2L_CDS              == dmel_2L_CDS
    2L_annotation       == catenate dmel_2L_gene with
                           (tRNA,miscRNA,transposon) set
    2L_annotation_extend5000 == dmel_2L_gene_extended5000,
                           minus (tRNA,miscRNA,transposon) set
    2L_annotation_extend2000 .. not planned
    2L_annotation_extend500  .. not planned
    2L_exon                  .. not planned
    2L_genomic          == dmel_2L_chromosome
                           (chromosome arm dna, same as rel3.1)
    2L_genomic_scaffolds == dmel_2L_scaffolds
                           (segment dna, same as rel3.1)
    2L_intron                .. not planned
    2L_masked_genomic        .. not planned
    2L_noncoding-gene   == catenate (tRNA,miscRNA,transposon,pseudogene)
    2L_protein-coding-gene == dmel_2L_gene
    2L_splice_site           .. not planned
    2L_tRNA             == dmel_2L_tRNA
    2L_transcript       == dmel_2L_transcript
    2L_translation      == dmel_2L_translation (curated translations)
    2L_transposable_element == dmel_2L_transposon
    2L_unique_intergenic     .. not planned
    2L_unique_intron         .. not planned

Not in past release:
    dmel_2L_miscRNA
    dmel_2L_pseudogene

File name format:
    $org_$chr_$feature_$release.$format

    $org in (dmel)
    $chr in (2L 2R 3L 3R X 4), (2h 3h Xh Yh U)
    $feature in (
        gene, mRNA, CDS, CDS-translation,
        transposon/transposable_element, pseudogene, tRNA,
        miscRNA=ncRNA,snRNA,snoRNA,rRNA
        gene-extended5000
        chromosome-arm
        scaffold
    )
    $release in (
        r3.1.0g (gadfly, summer 2003)
        r3.1.0d (chado r3.1.0_12182003)
        r3.2.0a (chado r3.2.0_12052003)
        r3.2.0c (chado r3.2.0_03162004)
    )
    $format in ( .fasta(.gz) .gff(.gz) .chado.xml(.gz) .game.xml(.gz) )

ANNOTATION RELEASE 3.1 HOLD-OVERS

ftp://flybase.net/genomes/Drosophila_melanogaster/dmel_RELEASE3-1/
    Annotations_and_Evidence/
    GFF/
    blastdb/
    FASTA/
    README

Annotations_and_Evidence/
------
>>> euchromatic scaffolds, updated in r3.2 release
    AE002603.xml.gz .. AE003847.xml.gz

>>> heterochromatin and centromere scaffolds - no r3.2 equivalent yet
    AABU01000058.xml.gz ..
    AABU01002775.xml.gz
    2L_wgs3_centromere_extension.xml.gz
    2R_wgs3_centromere_extension.xml.gz
    3L_wgs3_centromere_extension.xml.gz
    3R_wgs3_centromere_extension.xml.gz
    X_wgs3_centromere_extensionB.xml.gz
    linked_1.xml.gz
    linked_2.xml.gz
    linked_3.xml.gz
    linked_4.xml.gz
    linked_5.xml.gz
    linked_6.xml.gz
    linked_7.xml.gz

FASTA
-------------
Heterochromatin sections are not yet available for r3.2
    2h, 3h, Xh, Yh, U (heterochromatin, unclassified)

Block dna (fasta) sections are identical for r3.2
    scaffolds, genomic, masked_genomic

CHADO DATABASE LOOKUP SERVICE

SERVICE URL
    http://flybase.net/apollo-cgi/chado2apollo.cgi

Information and software at
    http://bugbane.bio.indiana.edu:7092/apollo/

EXAMPLES
    http://flybase.net/apollo-cgi/chado2apollo.cgi?scaffold=AE003650
    http://flybase.net/apollo-cgi/chado2apollo.cgi?gene=cact
    http://flybase.net/apollo-cgi/chado2apollo.cgi?range=2L:300000-310000
    http://flybase.net/apollo-cgi/chado2apollo.cgi?band=34A

This service supports the Apollo genome browser/editor, returning
GAME XML gene and genome objects in response to basic queries by
'gene' name/ID, 'scaffold' section, genome base 'range' or
cytological 'band'.

It currently works well for scaffold chunks of data, using
pre-generated XML, but it is very slow (5 - 10 minutes) at generating
XML for the gene region queries. The default operation now returns
pre-generated scaffolds for any query. We will work to improve this.
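
The lookup queries above all follow one pattern: the service URL plus
a single key=value pair, where the key is one of scaffold, gene, range
or band. A minimal Python sketch of building such URLs follows. The
function name lookup_url and the QUERY_KEYS tuple are hypothetical
names chosen for illustration, not part of the service or of Apollo.
The sketch only constructs the URL; it does not contact the server,
since gene region queries may take minutes to return.

```python
from urllib.parse import urlencode

# Service URL as given in this README.
SERVICE_URL = "http://flybase.net/apollo-cgi/chado2apollo.cgi"

# Query keys shown in the EXAMPLES section (hypothetical constant name).
QUERY_KEYS = ("scaffold", "gene", "range", "band")


def lookup_url(key, value, base=SERVICE_URL):
    """Build a lookup URL of the form <base>?<key>=<value>.

    Hypothetical helper: it only assembles the URL string and assumes
    the service accepts exactly one of the four documented keys.
    """
    if key not in QUERY_KEYS:
        raise ValueError("unsupported query key: %r" % (key,))
    # Keep ':' unescaped so ranges look like the README examples.
    return base + "?" + urlencode({key: value}, safe=":")


if __name__ == "__main__":
    print(lookup_url("scaffold", "AE003650"))
    print(lookup_url("gene", "cact"))
    print(lookup_url("range", "2L:300000-310000"))
    print(lookup_url("band", "34A"))
```

Run as a script, this prints the four example URLs listed above. A
client fetching the result would likely want a generous timeout for
gene and range queries.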