Wednesday, July 14, 2010

Base Pairs



The human genome, as well as the genomes of many other species (including Neanderthals!), can now be browsed online. I looked for the FOXP2 gene, which is linked to some language production and comprehension disorders in humans, and found it (image above shows it on the lower part of chromosome 7). It is mind-boggling to think that this is possible; when I was an undergraduate the study of human evolution was 99% fossils and 1% genetics: today it's 50/50.

In my browsing I found that the FOXP2 mRNA record shown below is 2,594 bases long (the gene itself, introns included, is far longer); I copied the sequence, pasted it into Word, deleted the line counters, did a character count, and wow...2,594. Below are the GenBank record and the base sequence for this FOXP2 transcript, followed by a summary. Note that genes are simply sequences of bases; they direct the assembly of twenty kinds of amino acids into the proteins that make up all life forms. There are four kinds of bases: adenine (a), guanine (g), cytosine (c) and thymine (t).
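The copy, paste, delete-the-counters, count-the-characters routine described above can also be done in a few lines of Python. This is only a rough sketch: the names SAMPLE, clean_sequence and count_bases are made up for illustration, and the sample is just the first two lines of the sequence, written the way GenBank shows them, with a position counter at the start of each line.

```python
import re
from collections import Counter

# First two sequence lines of the record, as GenBank displays them:
# a position counter, then six groups of ten bases.
SAMPLE = """\
        1 gcttgaacct tgtcacccct cacgtgcaca ccaaagacat accctagtga ttaaatgctg
       61 atttgtgtac gatgtccacg gacgccaaaa caatcacaga gctgcttgat tgttttaatt
"""


def clean_sequence(text):
    """Drop counters, spaces and line breaks, keeping only a, c, g and t."""
    return re.sub(r"[^acgt]", "", text.lower())


def count_bases(text):
    """Return the total number of bases and a per-base tally."""
    seq = clean_sequence(text)
    return len(seq), Counter(seq)


total, per_base = count_bases(SAMPLE)
print(total)      # 120: two lines of 60 bases each
print(per_base)   # tally of a, c, g and t
```

Feed it only the sequence lines (everything between Begin and End below); header text such as author names also contains the letters a, c, g and t and would inflate the count.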

LOCUS AY144615 2594 bp mRNA linear PRI 02-NOV-2002
DEFINITION Homo sapiens brain forkhead/winged helix transcription factor FOXP2
isoform mRNA, complete cds; alternatively spliced.
ACCESSION AY144615
VERSION AY144615.1 GI:24496248
KEYWORDS .
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 2594)
AUTHORS Guo,J.H., Chen,L. and Yu,L.
TITLE Direct Submission
JOURNAL Submitted (27-AUG-2002) School of Life Sciences, Fudan University,
Institute of Genetics, Handan RD, 220, Shanghai 200433, China
FEATURES Location/Qualifiers
source 1..2594
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/tissue_type="brain"
CDS 370..2592
/note="alternatively spliced"
/codon_start=1
/product="forkhead/winged helix transcription factor FOXP2
isoform"
/protein_id="AAN60016.1"
/db_xref="GI:24496249"
/translation="MMQESATETISNSSMNQNGMSTLSSQLDAGSRDGRSSGDTSSEV
STVELLHLQQQQALQAARQLLLQQQTSGLKSPKSSDKQRPLQELLPETKLCICGHSSG
DGHPHNTFAVPVSVAMMTPQVITPQQMQQILQQQVLSPQQLQALLQQQQAVMLQQQQL
QEFYKKQQEQLHLQLLQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQHP
GKQAKEQQQQQQQQQQLAAQQLVFQQQLLQMQQLQQQQHLLSLQRQGLISIPPGQAAL
PVQSLPQAGLSPAEIQQLWKEVTGVHSMEDNGIKHGGLDLTTNNSSSTTSSNTSKASP
PITHHSIVNGQSSVLSARRDSSSHEETGASHTLYGHGVCKWPGCESICEDFGQFLKHL
NNEHALDDRSTAQCRVQMQVVQQLEIQLSKERERLQAMMTHLHMRPSEPKPSPKPLNL
VSSVTMSKNMLETSPQSLPQTPTTPTAPVTPITQGPSVITPASVPNVGAIRRRHSDKY
NIPMSSEIAPNYEFYKNADVRPPFTYATLIRQAIMESSDRQLTLNEIYSWFTRTFAYF
RRNAATWKNAVRHNLSLHKCFVRVENVKGAVWTVDEVEYQKRRSQKITGSPTLVKNIP
TSLGYGAALNASLQAALAESSLPLLSNPGLINNASSGLLQAVHEDLNGSLDHIDSNGN
SSPGCSPQPHIHSIHVKEEPVIAEDEDCPMSLVTTANHSPELEDDREIEEEPLSEDLE"


Begin


gcttgaacct tgtcacccct cacgtgcaca ccaaagacat accctagtga ttaaatgctg
atttgtgtac gatgtccacg gacgccaaaa caatcacaga gctgcttgat tgttttaatt
atccagcaca aaatgccatc agtctgggac gtgatcgggc agaggtgtac tcacagtagt
gtaaatactg ctgtaaatag tgtctgatgg tggcttgaca gtgagctagc ttctgagttt
tcccttcttt ttatactgtt ttctgtgctg gcttttttga atcttcctaa tttttcatct
ctttaacaaa ctcctatgaa gttgaaaccg ggaagtttgc tctaacattt ccagagaagg
tattaagtca tgatgcagga atctgcgaca gagacaataa gcaacagttc aatgaatcaa
aatggaatga gcactctaag cagccaatta gatgctggca gcagagatgg aagatcaagt
ggtgacacca gctctgaagt aagcacagta gaactgctac atctgcaaca acagcaggct
ctccaggcag caagacaact tcttttacag cagcaaacaa gtggattgaa atctcctaag
agcagtgata aacagagacc actgcaggaa ttgcttccag aaacaaaatt atgtatctgt
ggccactctt ctggtgatgg gcatcctcac aacacatttg cagtgcctgt gtcagtggcc
atgatgactc cccaggtgat cacccctcag caaatgcagc agatccttca gcaacaagtc
ctgtctcctc agcagctaca agcccttctc caacaacagc aggctgtcat gctgcagcag
caacaactac aagagtttta caagaaacag caagagcagt tacatcttca gcttttgcag
cagcagcagc aacagcagca gcagcaacaa cagcagcaac aacagcagca gcaacaacaa
caacaacagc agcaacaaca gcagcagcag cagcaacagc agcagcagca gcaacagcat
cctggaaagc aagcgaaaga gcagcagcag cagcagcagc agcaacagca attggcagcc
cagcagcttg tcttccagca gcagcttctc cagatgcaac aactccagca gcagcagcat
ctgctcagcc ttcagcgtca gggactcatc tccattccac ctggccaggc agcacttcct
gtccaatcgc tgcctcaagc tggcttaagt cctgctgaga ttcagcagtt atggaaagaa
gtgactggag ttcacagtat ggaagacaat ggcattaaac atggagggct agacctcact
actaacaatt cctcctcgac tacctcctcc aacacttcca aagcatcacc accaataact
catcattcca tagtgaatgg acagtcttca gttctaagtg caagacgaga cagctcgtca
catgaggaga ctggggcctc tcacactctc tatggccatg gagtttgcaa atggccaggc
tgtgaaagca tttgtgaaga ttttggacag tttttaaagc accttaacaa tgaacacgca
ttggatgacc gaagcactgc tcagtgtcga gtgcaaatgc aggtggtgca acagttagaa
atacagcttt ctaaagaacg cgaacgtctt caagcaatga tgacccactt gcacatgcga
ccctcagagc ccaaaccatc tcccaaacct ctaaatctgg tgtctagtgt caccatgtcg
aagaatatgt tggagacatc cccacagagc ttacctcaaa cccctaccac accaacggcc
ccagtcaccc cgattaccca gggaccctca gtaatcaccc cagccagtgt gcccaatgtg
ggagccatac gaaggcgaca ttcagacaaa tacaacattc ccatgtcatc agaaattgcc
ccaaactatg aattttataa aaatgcagat gtcagacctc catttactta tgcaactctc
ataaggcagg ctatcatgga gtcatctgac aggcagttaa cacttaatga aatttacagc
tggtttacac ggacatttgc ttacttcagg cgtaatgcag caacttggaa gaatgcagta
cgtcataatc ttagcctgca caagtgtttt gttcgagtag aaaatgttaa aggagcagta
tggactgtgg atgaagtaga ataccagaag cgaaggtcac aaaagataac aggaagtcca
accttagtaa aaaatatacc taccagttta ggctatggag cagctcttaa tgccagtttg
caggctgcct tggcagagag cagtttacct ttgctaagta atcctggact gataaataat
gcatccagtg gcctactgca ggccgtccac gaagacctca atggttctct ggatcacatt
gacagcaatg gaaacagtag tccgggctgc tcacctcagc cgcacataca ttcaatccac
gtcaaggaag agccagtgat tgcagaggat gaagactgcc caatgtcctt agtgacaaca
gctaatcaca gtccagaatt agaagacgac agagagattg aagaagagcc tttatctgaa
gatctggaat gaga

End
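The "CDS 370..2592" line in the record says which stretch of the 2,594 bases actually codes for protein. As a hedged sanity check, the sketch below does the arithmetic on those coordinates and translates the first and last few codons, using the standard genetic code. The helper names (translate, slice_bases) are invented for this example, and only two lines of the sequence are copied in: the seventh (bases 361 to 420) and the last (bases 2581 to 2594).

```python
from itertools import product

# Standard genetic code, with codons listed in t, c, a, g order; "*" is a stop.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string three bases at a time, ignoring spaces."""
    seq = dna.replace(" ", "").lower()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def slice_bases(line, line_start, first, last):
    """Bases first..last (1-based, inclusive) of a line starting at line_start."""
    seq = line.replace(" ", "")
    return seq[first - line_start:last - line_start + 1]


# Coordinates from the record's "CDS 370..2592" line (1-based, inclusive).
CDS_START, CDS_END = 370, 2592
cds_length = CDS_END - CDS_START + 1
codons, leftover = divmod(cds_length, 3)
print(cds_length, codons, leftover)   # 2223 741 0

# Seventh sequence line (bases 361-420) and the final line (bases 2581-2594).
LINE_361 = "tattaagtca tgatgcagga atctgcgaca gagacaataa gcaacagttc aatgaatcaa"
LINE_2581 = "gatctggaat gaga"

cds_head = translate(slice_bases(LINE_361, 361, CDS_START, 420))
cds_tail = translate(slice_bases(LINE_2581, 2581, 2581, CDS_END))
print(cds_head)   # MMQESATETISNSSMNQ
print(cds_tail)   # DLE*
```

The coding region divides evenly into 741 codons; the last one (tga) is a stop, which leaves 740 amino acids. The translated start and end agree with the first and last letters of the /translation field in the record.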

"This gene encodes a member of the forkhead/winged-helix (FOX) family of transcription factors. It is expressed in fetal and adult brain as well as in several other organs such as the lung and gut. The protein product contains a FOX DNA-binding domain and a large polyglutamine tract and is an evolutionarily conserved transcription factor, which may bind directly to approximately 300 to 400 gene promoters in the human genome to regulate the expression of a variety of genes. This gene is required for proper development of speech and language regions of the brain during embryogenesis, and may be involved in a variety of biological pathways and cascades that may ultimately influence language development. Mutations in this gene cause speech-language disorder 1 (SPCH1), also known as autosomal dominant speech and language disorder with orofacial dyspraxia. Multiple alternative transcripts encoding different isoforms have been identified in this gene."

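The "large polyglutamine tract" mentioned in the summary is easy to spot in the sequence as the long stretch of cag and caa, both of which code for glutamine (Q). A minimal sketch, assuming the four sequence lines covering bases 841 to 1080 are copied in as shown; REGION and glutamine_runs are names chosen just for this example. Base 841 sits on a codon boundary, since 841 - 370 = 471 is divisible by 3.

```python
import re

# Bases 841-1080 of the sequence (four lines of 60), starting on a codon
# boundary of the coding region that begins at base 370.
REGION = """
caacaactac aagagtttta caagaaacag caagagcagt tacatcttca gcttttgcag
cagcagcagc aacagcagca gcagcaacaa cagcagcaac aacagcagca gcaacaacaa
caacaacagc agcaacaaca gcagcagcag cagcaacagc agcagcagca gcaacagcat
cctggaaagc aagcgaaaga gcagcagcag cagcagcagc agcaacagca attggcagcc
"""

# Both of these codons encode glutamine in the standard genetic code.
GLUTAMINE_CODONS = {"cag", "caa"}


def glutamine_runs(dna):
    """Return the lengths of consecutive runs of glutamine codons, in order."""
    seq = re.sub(r"\s+", "", dna.lower())
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    runs, current = [], 0
    for codon in codons:
        if codon in GLUTAMINE_CODONS:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


runs = glutamine_runs(REGION)
print(max(runs))   # 40
```

The longest run, 40 glutamines in a row, corresponds to the long string of Q's in the /translation field; a second, shorter run of 10 follows a few codons later.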