Access any flat file from NCBI (the NCBI home page is http://www.ncbi.nlm.nih.gov ). Decode all the information given in the accessed file:
• What does the first line indicate?
• What is the nature of the sequence?
• Identify the version.
• Is the data you have accessed a coding sequence or an open reading frame? What are the start and stop codons?
• Does it have untranslated regions?
• Has it been linked to the protein database? If so, how many amino acids does the protein have, and what is its accession number?
• Is the information published?
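The questions above can be worked through by hand, but a small script makes the layout of a flat file easier to see. The following is only a minimal sketch: TOY_RECORD is an invented record in GenBank flat-file layout (its accession, protein_id, and sequence are hypothetical, not a real NCBI entry), and summarize is an illustrative helper name. It assumes a simple "start..end" CDS location and would need extending for joined or complemented locations.

```python
import re

# Invented toy record in GenBank flat-file layout; NOT a real NCBI entry.
TOY_RECORD = """\
LOCUS       TOY0001                   15 bp    mRNA    linear   SYN 01-JAN-2000
DEFINITION  Invented toy record for illustration.
ACCESSION   TOY0001
VERSION     TOY0001.1
FEATURES             Location/Qualifiers
     CDS             1..15
                     /protein_id="TOYP0001.1"
                     /translation="MAGK"
ORIGIN
        1 atggctggta aataa
//
"""


def summarize(record: str) -> dict:
    """Pull out the fields the exercise asks about from flat-file text."""
    lines = record.splitlines()
    info = {"first_line": lines[0]}  # the LOCUS line

    # Top-level keywords start in column 1, e.g. 'VERSION     TOY0001.1'.
    for line in lines:
        if line.startswith("VERSION"):
            info["version"] = line.split()[1]

    # Assumption: only simple 'start..end' CDS locations are handled.
    cds = re.search(r"^\s+CDS\s+(\d+)\.\.(\d+)", record, re.MULTILINE)
    protein_id = re.search(r'/protein_id="([^"]+)"', record)
    translation = re.search(r'/translation="([^"]+)"', record, re.DOTALL)

    # Sequence lines follow ORIGIN; drop position numbers and spaces.
    origin = record.split("ORIGIN", 1)[1].split("//", 1)[0]
    seq = re.sub(r"[^a-z]", "", origin.lower())

    if cds:
        start, end = int(cds.group(1)), int(cds.group(2))
        info["cds"] = (start, end)
        info["start_codon"] = seq[start - 1:start + 2]
        info["stop_codon"] = seq[end - 3:end]
        # Bases outside the CDS on either side are untranslated regions.
        info["has_utr"] = start > 1 or end < len(seq)
    if protein_id:
        info["protein_id"] = protein_id.group(1)
    if translation:
        # Long translations wrap across lines, so strip whitespace first.
        info["aa_count"] = len(re.sub(r"\s", "", translation.group(1)))
    return info


if __name__ == "__main__":
    for key, value in summarize(TOY_RECORD).items():
        print(f"{key}: {value}")
```

Whether the record is published is read from the REFERENCE block (JOURNAL and PUBMED lines), which the toy record omits.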
The NCBI Reference Sequence (RefSeq) project provides sequence records and related information for numerous organisms, and serves as a baseline for medical, functional, and comparative studies.
The distinct accession number format, which begins with two characters followed by an underscore (e.g., NP_), is the most distinguishing feature of a RefSeq record. An underscore is never included in an INSDC accession number.
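The underscore rule described above is easy to check mechanically. The sketch below is an illustration only: the function names looks_like_refseq and split_version are hypothetical, and the pattern is an assumption that tests just the shape of the accession (two uppercase letters, an underscore, then letters or digits, with an optional ".version" suffix). It does not confirm that the record exists at NCBI.

```python
import re
from typing import Optional, Tuple

# Assumed shape of a RefSeq accession: two uppercase letters, an underscore,
# then letters/digits, optionally followed by ".version" (e.g. NM_002084.3).
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_[A-Z0-9]+(\.\d+)?$")


def looks_like_refseq(accession: str) -> bool:
    """Return True if the accession has the RefSeq 'XX_' prefix shape."""
    return REFSEQ_PATTERN.match(accession.strip()) is not None


def split_version(accession: str) -> Tuple[str, Optional[int]]:
    """Split 'NM_002084.3' into ('NM_002084', 3); version is None if absent."""
    base, dot, version = accession.strip().partition(".")
    return base, int(version) if dot else None


if __name__ == "__main__":
    for acc in ["NM_002084.3", "U49845.1"]:
        kind = "RefSeq-style" if looks_like_refseq(acc) else "no underscore prefix"
        print(acc, "->", kind, split_version(acc))
```

An accession without the underscore, such as an INSDC one, fails the first check but can still be split into accession and version by the second helper.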
NCBI creates and updates RefSeq records from sequence data available through the INSDC.
RefSeq records are not intended to reflect the historical 'first sequenced' record of a gene, although this is frequently the case for genes with very limited sequence data. Until a RefSeq record has been fully reviewed, PROVISIONAL records may be automatically revised to use a longer INSDC source nucleotide sequence when one becomes available.
All INSDC submissions used to create a RefSeq record are listed in the COMMENT field of the flat file record.
The GPX3 gene (GeneID 2878) encodes a protein that contains the amino acid selenocysteine. Selenocysteine is encoded by the codon 'tga', which is normally read as a stop codon. The COMMENT block of NM_002084.3 displays the RefSeq Attribute 'protein contains selenocysteine', and the translation exception qualifier on the CDS feature specifies the position of the stop codon that encodes selenocysteine (which appears as a 'U' in the amino acid sequence). For transcripts and proteins in the Bilateria group, the position of the selenocysteine codon or amino acid residue is also annotated as a misc_feature or Site feature.
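The effect of a translation exception can be illustrated with a toy translator. This is a sketch under stated assumptions, not the GPX3 record itself: the 15-base sequence and the exception position are invented, the codon table covers only the codons in that toy sequence, and translate is a hypothetical helper. In a flat file the exception is written on the CDS feature as a /transl_except qualifier giving a position range and an amino acid (Sec for selenocysteine); here it is reduced to a mapping from the 1-based start of the codon to a one-letter residue.

```python
from typing import Dict, Optional

# Partial codon table: only the codons used in the invented sequence below.
CODON_TABLE = {"atg": "M", "gct": "A", "ggt": "G", "tga": "*", "taa": "*"}


def translate(cds: str, transl_except: Optional[Dict[int, str]] = None) -> str:
    """Translate a CDS, stopping at the first stop codon.

    transl_except maps the 1-based start position of a codon to the residue
    that should be used there instead of the codon-table reading.
    """
    transl_except = transl_except or {}
    protein = []
    for i in range(0, len(cds) - 2, 3):
        position = i + 1  # flat-file coordinates are 1-based
        if position in transl_except:
            protein.append(transl_except[position])
            continue
        residue = CODON_TABLE[cds[i:i + 3].lower()]
        if residue == "*":
            break  # ordinary stop codon ends translation
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    toy_cds = "atggcttgaggttaa"  # invented: atg gct tga ggt taa
    print(translate(toy_cds))            # 'tga' read as stop
    print(translate(toy_cds, {7: "U"}))  # 'tga' at 7..9 read as selenocysteine
```

Without the exception, translation halts at the internal 'tga'; with it, the codon is read as 'U' and translation continues to the true stop codon.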
RefSeq information is maintained, curated, and published, and it is in the public domain.