Learning has occurred
Barmymoo:
Paul, that does make more sense, but now I realise that I don't really understand what a genome is and how you can store one. Are you storing descriptions of them, as data on a computer? Presumably you're not somehow stuffing little tubs of DNA samples into a computer shell.
pwhodges:
@LTK:
Much computer storage is in binary, though, especially with huge datasets; if the code were stored in ASCII, then it would compress dramatically using common algorithms, but I know from an article on the subject I've just read that gzip compresses a conventionally stored genome by only about 35%.
@May:
The genome is the actual sequence of bases (A, T, C, G) in all the DNA of the individual; i.e. the definition of all the genes on all the chromosomes (I don't know if mitochondrial DNA is included). Essentially it is the complete chemical formula of the whole lot. The term "sequencing" DNA is used, because the process is determining that sequence.
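To illustrate the binary-versus-ASCII point: with only four bases, each one needs just 2 bits, so four bases fit in a byte that ASCII would spend on a single letter. The following is only a minimal sketch; the mapping A=0, C=1, G=2, T=3 and the function names `pack` and `unpack` are made up for illustration and are not taken from any real file format.

```python
# Hypothetical 2-bit encoding; this particular mapping is an assumption.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def pack(seq: str) -> bytes:
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so unpacking stays uniform.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)

def unpack(data: bytes, n_bases: int) -> str:
    """Recover the first n_bases bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])

seq = "GATTACAGATTACA"
packed = pack(seq)
print(len(seq.encode("ascii")), "bytes as ASCII,", len(packed), "bytes packed")
```

Data packed this tightly has little obvious redundancy left, which may be part of why a general-purpose compressor gains comparatively little on it.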
LTK:
The article compares the genome to Windows XP when it is configured and installed, but a better comparison would be with the installation CD for Windows XP.
Barmymoo:
Right, I think I understand up to my capacity for understanding stuff like this! I struggle with molecular biology and really any form of science I can't see, because I don't have the tools for thinking about it. But we briefly covered DNA at university in a single one-hour lecture, so I at least recognise the words...
Aimless:
--- Quote from: pwhodges on 13 Jan 2014, 05:28 ---Actually, I don't know where that article got the 750MB for the human genome - I'm just working on pricing up a computer system based on a requirement of 250GB per genome (we're looking at 10,000 genomes, so 2.5 petabytes).
--- End quote ---
While the average human genome may indeed, by some measure, come to ca. 750MB, the hardware will be handling the extremely large datasets needed for sequencing, analysing, aligning and backing up millions of overlapping snippets of DNA. Taken as a whole, these account for many copies of several (in the case of pharmaceutical research, perhaps hundreds?) variants, rather than single straightforward genomes.
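A back-of-envelope sketch of how the two figures could both be reasonable. It assumes a round 3 billion bases per genome, 2 bits per base for the finished sequence, and for the raw reads an illustrative 30x coverage at 2 bytes per sequenced base (one character for the base, one for its quality score, as in FASTQ-style text files, ignoring headers and compression). The coverage value and the function names are assumptions for illustration, not numbers from the thread or the article.

```python
GENOME_BASES = 3_000_000_000  # round figure for one human genome (assumption)

MB = 10**6
GB = 10**9
PB = 10**15

def packed_size_bytes(n_bases: int, bits_per_base: int = 2) -> int:
    """Size of a finished sequence stored at a fixed number of bits per base."""
    return n_bases * bits_per_base // 8

def raw_reads_bytes(n_bases: int, coverage: int, bytes_per_base: int = 2) -> int:
    """Rough size of the overlapping reads needed to cover the genome `coverage` times."""
    return n_bases * coverage * bytes_per_base

finished_mb = packed_size_bytes(GENOME_BASES) / MB        # 750.0
reads_gb = raw_reads_bytes(GENOME_BASES, coverage=30) / GB  # 180.0
total_pb = 250 * GB * 10_000 / PB                          # 2.5

print(f"finished genome at 2 bits/base: {finished_mb} MB")
print(f"raw reads at 30x coverage:      {reads_gb} GB")
print(f"10,000 genomes at 250 GB each:  {total_pb} PB")
```

On these assumptions the finished sequence comes to 750 MB, while the raw reads alone land in the same order of magnitude as the 250GB per genome being budgeted, before any alignment files or backups are counted.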
Hardware-gobblers:
http://en.wikipedia.org/wiki/Shotgun_sequencing#Coverage
http://en.wikipedia.org/wiki/Sequence_assembly
Data requirements:
http://www.avadis-ngs.com/support/ngs-data-storage-requirements
Overview of one method:
http://en.wikibooks.org/wiki/Next_Generation_Sequencing_%28NGS%29/De_novo_assembly#Comparing_datasets
Horrible gargantuan files:
http://en.wikipedia.org/wiki/SAMtools