

ACCESSIBLE SYSTEMS SOFTWARE

Summary: With the rapidly expanding availability of data from personal genomes, exomes and transcriptomes, medical researchers will frequently need to test whether observed genomic variants are novel or known. This task requires downloading and handling large and diverse datasets from a variety of sources, and processing them with bioinformatics tools and pipelines. Alternatively, researchers can upload their data to online tools, which may conflict with privacy requirements. We present here Kaviar, a tool that greatly simplifies the assessment of novel variants. Kaviar includes: (i) an integrated and growing database of genomic variation from diverse sources, including over 55 million variants from personal genomes, family genomes, transcriptomes, SNV databases and population surveys; and (ii) software for querying the database efficiently.

Availability: Kaviar is programmed in Perl and offered free of charge as Open Source Software. Kaviar may be used online as a programmatic web service or downloaded for local use. The database itself is also provided.

Supplementary Information: Supplementary data are available at Bioinformatics online.

The advent of personalized systems medicine (Auffray et al., 2009) will be predicated on the availability of precise genomic information for each patient. This information will be gleaned by genotyping known variants, by exome and transcriptome sequencing, and through whole-genome sequencing. A fraction of the personal variants observed are false-positive artifacts of random sequencing error or cell-line mutations, which may confound the search for disease-causing mutations. When family genomes are available, over half of these errors may be identified by inheritance state analysis (Roach et al., 2010). Variants frequently observed in other personal genomes are less likely to be random artifacts. Given the list of personal variations, one of the first analytical tasks is thus to determine, for each variant, whether it has already been observed in humans. Depending on the type and extent of sequencing done, a physician or medical researcher may need to test anywhere from just a few to a very large number of observed genome variants, most of which are single nucleotide variants (SNVs).

dbSNP is a freely available, periodically updated general catalog of genome variation (Sherry et al., 2001). The recent explosion of genomic sequencing projects has identified vast numbers of variants that have not yet been incorporated into dbSNP, a trend expected to increase as genomic sequencing becomes more economical. Researchers therefore need to compare observed SNVs with data from several different sources. This comparison shows whether each variant is novel or known and gives the population frequencies of its alleles. Access to these datasets is offered via various web interfaces or as voluminous downloadable files. These files come in a variety of formats and may specify coordinates relative to different genome versions, so they require significant processing.
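The core query described here can be sketched in a few lines of Python. Each observed SNV is tested against a merged catalog, and the sketch records how many sources report it. This is a minimal illustration, not Kaviar's Perl code, database format or web-service interface. The `Variant` record, the in-memory dictionary index and the function names `build_catalog` and `classify_variants` are hypothetical. The sketch also assumes that every input already uses the same genome version and 1-based coordinates; converting between genome versions is out of scope.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical SNV record; all inputs are assumed to share one reference build."""
    chrom: str
    pos: int  # 1-based position (assumption)
    ref: str
    alt: str


def build_catalog(sources: Dict[str, Iterable[Variant]]) -> Dict[Variant, Set[str]]:
    """Merge several variant sources into one index: variant -> names of sources reporting it."""
    catalog: Dict[Variant, Set[str]] = defaultdict(set)
    for source_name, variants in sources.items():
        for v in variants:
            catalog[v].add(source_name)
    return catalog


def classify_variants(observed: Iterable[Variant],
                      catalog: Dict[Variant, Set[str]]) -> List[Tuple[Variant, str, int]]:
    """Label each observed SNV 'known' or 'novel' and count the sources that contain it."""
    results = []
    for v in observed:
        # .get() avoids inserting empty entries into the defaultdict
        n_sources = len(catalog.get(v, ()))
        results.append((v, "known" if n_sources else "novel", n_sources))
    return results


if __name__ == "__main__":
    # Toy, made-up sources purely for illustration
    sources = {
        "dbSNP": [Variant("1", 1000, "A", "G")],
        "genomeX": [Variant("1", 1000, "A", "G"), Variant("2", 500, "C", "T")],
    }
    catalog = build_catalog(sources)
    observed = [Variant("1", 1000, "A", "G"), Variant("2", 500, "C", "T"),
                Variant("3", 42, "G", "A")]
    for v, status, n in classify_variants(observed, catalog):
        print(v.chrom, v.pos, status, n)
    # prints:
    # 1 1000 known 2
    # 2 500 known 1
    # 3 42 novel 0
```

The sketch keeps the set of reporting sources rather than a single known/novel flag. This shows how often a variant has been seen elsewhere, which the text suggests makes a random artifact less likely. A catalog of tens of millions of variants would likely need an indexed, on-disk store rather than a Python dictionary.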

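Under much simpler assumptions, the inheritance-based error detection mentioned above can be illustrated with a per-site Mendelian-consistency check on a parent-child trio. This is a substitute technique, not the inheritance state analysis of Roach et al. (2010), which analyzes inheritance states across a family's genomes. The genotype representation and the names `is_mendelian_consistent` and `flag_suspect_calls` are hypothetical. The check assumes autosomal sites at which all three genotypes were called on the same reference.

```python
from itertools import product
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]  # unordered diploid genotype, e.g. ("A", "G")


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """Return True if the child's genotype can be formed from one maternal and one paternal allele.

    Hypothetical helper for illustration; assumes an autosomal site.
    """
    target = sorted(child)
    return any(sorted((m, f)) == target for m, f in product(mother, father))


def flag_suspect_calls(trio_calls: Dict[str, Tuple[Genotype, Genotype, Genotype]]) -> List[str]:
    """Return site IDs whose (child, mother, father) genotypes violate Mendelian inheritance.

    Site IDs are hypothetical labels; flagged sites are candidate sequencing errors
    (or, less often, true de novo mutations).
    """
    return [site for site, (c, m, f) in trio_calls.items()
            if not is_mendelian_consistent(c, m, f)]


if __name__ == "__main__":
    calls = {
        "site1": (("A", "G"), ("A", "A"), ("G", "G")),  # consistent: A from mother, G from father
        "site2": (("G", "G"), ("A", "A"), ("A", "G")),  # mother cannot transmit G
    }
    print(flag_suspect_calls(calls))  # prints ['site2']
```

A flagged site is only a candidate error. A true de novo mutation in the child produces the same inconsistency. An error can also go unnoticed when the miscalled genotype still happens to be compatible with both parents.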