genotype              package:genetics              R Documentation

_G_e_n_o_t_y_p_e _o_r _H_a_p_l_o_t_y_p_e _O_b_j_e_c_t_s.

_D_e_s_c_r_i_p_t_i_o_n:

     'genotype' creates a genotype object.

     'haplotype' creates a haplotype object.

     'is.genotype' returns 'TRUE' if 'x' is of class 'genotype'.

     'is.haplotype' returns 'TRUE' if 'x' is of class 'haplotype'.

     'as.genotype' attempts to coerce its argument into an object of
     class 'genotype'.

     'as.genotype.allele.count' converts allele counts (0,1,2) into
     genotype pairs ("A/A", "A/B", "B/B").

     'as.haplotype' attempts to coerce its argument into an object of
     class 'haplotype'.

     'nallele' returns the number of alleles in an object of class
     'genotype'.

_U_s_a_g_e:

       genotype(a1, a2=NULL, alleles=NULL, sep="/", remove.spaces=TRUE,
                reorder = c("yes", "no", "default", "ascii", "freq"),
                allow.partial.missing=FALSE, locus=NULL,
                genotypeOrder=NULL)

       haplotype(a1, a2=NULL, alleles=NULL, sep="/", remove.spaces=TRUE,
                 reorder="no", allow.partial.missing=FALSE, locus=NULL,
                 genotypeOrder=NULL)

       is.genotype(x)

       is.haplotype(x)

       as.genotype(x, ...)

       as.genotype.allele.count(x, alleles=c("A","B"), ... )

       as.haplotype(x, ...)

       ## S3 method for class 'genotype':
       print(x, ...)

       nallele(x)

_A_r_g_u_m_e_n_t_s:

       x: either an object of class 'genotype' or 'haplotype' or an
          object to be converted to class 'genotype' or 'haplotype'.

   a1,a2: vector(s) or matrix containing two alleles for each
          individual. See details, below.

 alleles: names (and order if 'reorder="yes"') of possible alleles.

     sep: character separator or column number used to divide alleles
          when 'a1' is a vector of strings where each string holds both
          alleles. See below for details.

remove.spaces: logical indicating whether spaces and tabs will be
          removed from 'a1' and 'a2' before processing.

 reorder: how should alleles within an individual be reordered. If
          'reorder="no"', use the order specified by the alleles
          parameter.  If 'reorder="freq"' or 'reorder="yes"', sort
          alleles within each individual by observed frequency.  If
          'reorder="ascii"', reorder alleles in ASCII order
          (alphabetical, with all upper case before lower case). The
          default value for 'genotype' is '"freq"'.  The default value
          for 'haplotype' is '"no"'. 

allow.partial.missing: logical indicating whether one allele is
          permitted to be missing.  When set to 'FALSE' both alleles
          are set to 'NA' when either is missing.

   locus: object of class locus, gene, or marker, holding information
          about the source of this genotype.

genotypeOrder: character vector of genotype/haplotype names, given in
          the order that other functions should use when sorting
          genotypes/haplotypes.

     ...: optional arguments

_D_e_t_a_i_l_s:

     Genotype objects hold information on which gene or marker alleles
     were observed for different individuals.  For each individual, two
     alleles are recorded.

     The genotype class considers the stored alleles to be unordered,
     i.e., "C/T" is equivalent to "T/C".  The haplotype class considers
     the order of the alleles to be significant so that "C/T" is
     distinct from "T/C".

     When calling 'genotype' or 'haplotype':


        *  If only 'a1' is provided and is a character vector, it is
           assumed that each element encodes both alleles. In this
           case, if 'sep' is a character string, 'a1' is assumed to be
           coded as "Allele1<sep>Allele2".  If 'sep' is a numeric
           value, it is assumed that character locations '1:sep'
           contain allele 1 and that remaining locations contain allele
           2.

        *  If 'a1' is a matrix, it is assumed that column 1 contains
           allele 1 and column 2 contains allele 2.

        *  If 'a1' and 'a2' are both provided, each is assumed to
           contain one allele value so that the genotype for an
           individual is obtained by 'paste(a1,a2,sep="/")'.
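
     The input forms above can be illustrated in base R. This is only
     a sketch of the parsing rules as described here, not the
     package's implementation; the helper name 'split_alleles' is
     hypothetical and does not exist in 'genetics'.

```r
# Hypothetical helper (not part of 'genetics'): split one string per
# individual into two alleles, following the rules described above.
split_alleles <- function(a1, sep = "/") {
  if (is.character(sep)) {
    # Character separator: each element is "Allele1<sep>Allele2"
    parts  <- strsplit(a1, sep, fixed = TRUE)
    first  <- vapply(parts, function(p) p[1], character(1))
    second <- vapply(parts, function(p) p[2], character(1))
  } else {
    # Numeric separator: characters 1:sep hold allele 1, the rest allele 2
    first  <- substr(a1, 1, sep)
    second <- substr(a1, sep + 1, nchar(a1))
  }
  # Column 1 is allele 1 and column 2 is allele 2 (the matrix form)
  cbind(a1 = first, a2 = second)
}

by_string <- split_alleles(c("C/C", "C/T", "T/T"), sep = "/")
by_column <- split_alleles(c("CC", "CT", "TT"), sep = 1)

# Two separate allele vectors are combined with paste(), as stated above
pasted <- paste(by_string[, "a1"], by_string[, "a2"], sep = "/")
```

     Both calls yield the same two-column allele matrix, and pasting
     the columns back together recovers the "Allele1/Allele2" coding.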


     If 'remove.spaces' is 'TRUE' (the default), any whitespace contained
     in 'a1' and 'a2' is removed when the genotypes are created.  If
     whitespace is used as the separator (e.g., "C C", "C T", ...), be
     sure to set 'remove.spaces' to 'FALSE'.

     When the alleles are explicitly specified using the 'alleles'
     argument, all potential alleles not present in the list will be
     converted to 'NA'.
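
     The effect of restricting the allele set can be sketched in base
     R. The variable names are illustrative assumptions, and the
     snippet only mimics the rule stated above rather than calling
     'genotype' itself.

```r
# Illustration only (base R): alleles outside the allowed set become NA,
# mirroring the behaviour described for the 'alleles' argument.
allowed  <- c("C", "T")
observed <- c("C", "T", "G", "C")
observed[!(observed %in% allowed)] <- NA
observed
```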

     NOTE: 'genotype' assumes that the order of the alleles is not
     important (e.g., "A/C" == "C/A").  Use class 'haplotype' if order
     is significant.

     If 'genotypeOrder=NULL' (the default setting), then
     'expectedGenotypes' is used to get standard sorting order. Only
     unique values in 'genotypeOrder' are used, which in turn means
     that the first occurrence prevails. When 'genotypeOrder' lists
     some genotype names but not all that appear in the data, the rest
     (those in the data and the possible combinations based on allele
     variants) are automatically appended to 'genotypeOrder'.
     This puts "missing" genotype names at the end of the sort order.
     This feature is especially useful when there are many allele
     variants, particularly for haplotypes. See examples.
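
     The completion rule can be sketched in base R as follows. This is
     a minimal illustration under the description above, not the
     package's code; 'user_order' and 'all_names' are hypothetical
     variables standing in for the supplied order and the full set of
     possible names.

```r
# Hypothetical sketch of the completion rule (not the package's code)
user_order <- c("D/I", "D/I", "I/I")         # the duplicate is dropped
all_names  <- c("D/D", "D/I", "I/D", "I/I")  # possible names for alleles D, I

# unique() keeps the first occurrence, so user-supplied names come first
# and the remaining names are appended at the end.
full_order <- unique(c(user_order, all_names))
full_order
```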

_V_a_l_u_e:

     The genotype class extends "factor" and haplotype extends
     genotype. Both classes have the following attributes: 

  levels: character vector of possible genotype/haplotype values,
          coded by 'paste( allele1, "/", allele2, sep="")'.

allele.names: character vector of possible alleles. For a SNP, these
          might be c("A","T").  For a variable-length dinucleotide
          repeat, these might be c("136","138","140","148").

allele.map: matrix encoding how the factor levels correspond to
          alleles.  See the source code to 'allele.genotype()' for how
          to extract allele values using this matrix.  Better yet, just
          use 'allele.genotype()'.

genotypeOrder: character, genotype/haplotype names in a defined order
          that can be used for sorting in various functions. Note that
          this slot stores both ordered and unordered genotypes, i.e.,
          "A/B" and "B/A".

_A_u_t_h_o_r(_s):

     Gregory R. Warnes warnes@bst.rochester.edu and Friedrich Leisch.

_S_e_e _A_l_s_o:

     'HWE.test', 'allele', 'homozygote', 'heterozygote', 'carrier',
     'summary.genotype', 'allele.count', 'sort.genotype',
     'genotypeOrder', 'locus', 'gene', 'marker', and '%in%' for the
     default '%in%' method.

_E_x_a_m_p_l_e_s:

     # several examples of genotype data in different formats
     example.data   <- c("D/D","D/I","D/D","I/I","D/D",
                         "D/D","D/D","D/D","I/I","")
     g1  <- genotype(example.data)
     g1

     example.data2  <- c("C-C","C-T","C-C","T-T","C-C",
                         "C-C","C-C","C-C","T-T","")
     g2  <- genotype(example.data2,sep="-")
     g2

     example.nosep  <- c("DD", "DI", "DD", "II", "DD",
                         "DD", "DD", "DD", "II", "")
     g3  <- genotype(example.nosep,sep="")
     g3

     example.a1 <- c("D",  "D",  "D",  "I",  "D",  "D",  "D",  "D",  "I",  "")
     example.a2 <- c("D",  "I",  "D",  "I",  "D",  "D",  "D",  "D",  "I",  "")
     g4  <- genotype(example.a1,example.a2)
     g4

     example.mat <- cbind(a1=example.a1, a2=example.a2)
     g5  <- genotype(example.mat)
     g5

     example.data5  <- c("D   /   D","D   /   I","D   /   D","I   /   I",
                         "D   /   D","D   /   D","D   /   D","D   /   D",
                         "I   /   I","")
     g5  <- genotype(example.data5,remove.spaces=TRUE)
     g5

     # show how genotype and haplotype differ
     data1 <- c("C/C", "C/T", "T/C")
     data2 <- c("C/C", "T/C", "T/C")

     test1  <- genotype( data1 )
     test2  <- genotype( data2 )

     test3  <-  haplotype( data1 )
     test4  <-  haplotype( data2 )

     test1==test2
     test3==test4

     test1=="C/T"
     test1=="T/C"

     test3=="C/T"
     test3=="T/C"

     ## also
     test1 %in% test2
     test1 %in% test4
     test3 %in% test4

     test1 %in% "C/T"
     test1 %in% "T/C"

     test3 %in% "C/T"
     test3 %in% "T/C"

     ## "Messy" example

     m3  <-  c("D D/\t   D D","D\tD/   I",  "D D/   D D","I/   I",
               "D D/   D D","D D/   D D","D D/   D D","D D/   D D",
               "I/   I","/   ","/I")

     genotype(m3)
     summary(genotype(m3))

     m4  <-  c("D D","D I","D D","I I",
               "D D","D D","D D","D D",
               "I I","   ","  I")

     genotype(m4,sep=1)
     genotype(m4,sep=" ",remove.spaces=FALSE)
     summary(genotype(m4,sep=" ",remove.spaces=FALSE))

     m5  <-  c("DD","DI","DD","II",
               "DD","DD","DD","DD",
               "II","   "," I")
     genotype(m5,sep=1)
     haplotype(m5,sep=1,remove.spaces=FALSE)

     g5  <- genotype(m5,sep="")
     h5  <- haplotype(m5,sep="")

     heterozygote(g5)
     homozygote(g5)
     carrier(g5,"D")

     g5[9:10]  <- haplotype(m4,sep=" ",remove.spaces=FALSE)[1:2]
     g5

     g5[9:10]
     allele(g5[9:10],1)
     allele(g5,1)[9:10]

     # drop unused alleles
     g5[9:10,drop=TRUE]
     h5[9:10,drop=TRUE]

     # Convert allele.counts into genotype

     x <- c(0,1,2,1,1,2,NA,1,2,1,2,2,2)
     g <- as.genotype.allele.count(x, alleles=c("C","T") )
     g

     # Use of genotypeOrder
     example.data   <- c("D/D","D/I","I/D","I/I","D/D",
                         "D/D","D/I","I/D","I/I","")
     summary(genotype(example.data))
     genotypeOrder(genotype(example.data))

     summary(genotype(example.data, genotypeOrder=c("D/D", "I/I", "D/I")))
     summary(genotype(example.data, genotypeOrder=c(              "D/I")))
     summary(haplotype(example.data, genotypeOrder=c(             "I/D", "D/I")))
     example.data <- genotype(example.data)
     genotypeOrder(example.data) <- c("D/D", "I/I", "D/I")
     genotypeOrder(example.data)

