groupGenotype            package:genetics            R Documentation

_G_r_o_u_p _g_e_n_o_t_y_p_e _v_a_l_u_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     'groupGenotype' groups genotype or haplotype values according to
     the grouping (mapping) information given in 'map'.

_U_s_a_g_e:

     groupGenotype(x, map, haplotype=FALSE, factor=TRUE, levels=NULL, verbose=FALSE)

_A_r_g_u_m_e_n_t_s:

       x: object of class 'genotype' or 'haplotype'

     map: list, mapping information, see details and examples

haplotype: logical; should values in 'map' be treated as haplotypes
          (TRUE) or as genotypes (FALSE)? See Details.

  factor: logical; should the output be a factor or a character vector?

  levels: character, optional vector of level names if factor is
          produced ('factor=TRUE'); the default is to use the sort
          order of the group names in 'map'

 verbose: logical; print genotype names that match entries in 'map',
          mainly used for debugging

_D_e_t_a_i_l_s:

     The examples show how 'map' can be constructed. These are the
     main points to be aware of:

        *  names of list components are used as new group names

        *  list components hold the genotype names for each group

        *  genotype names can be given directly, e.g. "A/B", or
           abbreviated, e.g. "A/*" or even "*/*", where "*" matches any
           possible allele (but see the inspection steps below)

        *  all genotype names that are not specified can be captured
           with ".else" (note the dot!)

        *  genotype names that were not specified (and ".else" was not
           used) are changed to 'NA'

     'map' is inspected before the genotypes are grouped. Inspection
     performs the following steps:

        *  ".else" must be the last entry (if it is not, it is moved
           there) so that it matches everything not matched by earlier
           entries

        *  specifications such as "A/*", "*/A", or "*/*" are expanded
           to all possible genotypes based on the alleles of 'x'; when
           'haplotype=FALSE', "A/*" and "*/A" match the same genotypes

        *  since "*" and ".else" can produce duplicates across the
           whole map, duplicates are removed sequentially (the first
           occurrence is kept)
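The three inspection steps can be sketched in base R as follows. 'expandMap()' is a hypothetical helper written only for illustration; it is not the genetics package's internal code, and it assumes the allele names of 'x' are passed in as a plain character vector. In genotype mode it simply lists both allele orders, so that "A/B" and "B/A" are treated alike.

```r
## Hypothetical sketch of the 'map' inspection steps (not the genetics code).
## 'alleles' is assumed to hold the allele names of 'x'.
expandMap <- function(map, alleles, haplotype = FALSE) {
  ## every ordered allele pair: "A/A", "B/A", ...
  allPairs <- as.vector(outer(alleles, alleles, paste, sep = "/"))

  ## Step 1: move components containing ".else" to the end
  isElse <- vapply(map, function(v) any(v == ".else"), logical(1))
  map <- c(map[!isElse], map[isElse])

  ## Step 2: expand "*" (and ".else") into concrete allele pairs
  expandOne <- function(spec) {
    if (spec == ".else") return(allPairs)
    parts <- strsplit(spec, "/", fixed = TRUE)[[1]]
    a1 <- if (parts[1] == "*") alleles else parts[1]
    a2 <- if (parts[2] == "*") alleles else parts[2]
    out <- as.vector(outer(a1, a2, paste, sep = "/"))
    ## genotypes are unordered: "A/*" also covers "*/A", "A/B" covers "B/A"
    if (!haplotype) out <- c(out, as.vector(outer(a2, a1, paste, sep = "/")))
    unique(out)
  }
  map <- lapply(map, function(v) unique(unlist(lapply(v, expandOne))))

  ## Step 3: drop duplicates sequentially, keeping the first occurrence
  seen <- character(0)
  for (i in seq_along(map)) {
    map[[i]] <- setdiff(map[[i]], seen)
    seen <- c(seen, map[[i]])
  }
  map
}

map <- list(homoG = c("A/A", "B/B", "C/C", "D/D"),
            "heteroA*" = c("A/B", "A/C", "A/D"),
            "heteroB*" = "B/*",
            heteroRest = ".else")
expandMap(map, c("A", "B", "C", "D"))[["heteroB*"]]
## [1] "B/C" "B/D" "C/B" "D/B"
expandMap(map, c("A", "B", "C", "D"), haplotype = TRUE)[["heteroB*"]]
## [1] "B/A" "B/C" "B/D"
```

The expanded groups agree with the genotype/haplotype comparison in the Examples: "C/B" lands in "heteroB*" for genotypes but not for haplotypes.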

     Using ".else" or "*/*" at the end of the map produces the same
     result, due to removing duplicates sequentially.
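A small base-R illustration shows why the position of a catch-all matters and why ".else" and a trailing "*/*" end up equivalent. It assumes both catch-alls have already been expanded to every ordered allele pair; 'dedupSequential()' is a hypothetical helper, not part of genetics.

```r
## Hypothetical illustration (not genetics code): a catch-all group that has
## been expanded to every allele pair keeps only what earlier groups left over.
alleles  <- c("A", "B", "C", "D")
allPairs <- as.vector(outer(alleles, alleles, paste, sep = "/"))  # ".else" or "*/*"
withA    <- unique(c(paste("A", alleles, sep = "/"),              # "A/*" ...
                     paste(alleles, "A", sep = "/")))             # ... and "*/A"

## Remove duplicates sequentially, keeping the first occurrence
dedupSequential <- function(groups) {
  seen <- character(0)
  for (i in seq_along(groups)) {
    groups[[i]] <- setdiff(groups[[i]], seen)
    seen <- c(seen, groups[[i]])
  }
  groups
}

lengths(dedupSequential(list(withA = withA, rest = allPairs)))
## withA  rest
##     7     9
lengths(dedupSequential(list(rest = allPairs, withA = withA)))
##  rest withA
##    16     0
```

Placed last, the catch-all receives only the 9 pair names not claimed by "withA"; placed first, it would swallow all 16 and leave "withA" empty.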

_V_a_l_u_e:

     A factor (if 'factor=TRUE') or a character vector of group names,
     one per element of 'x'; elements that match no entry in 'map' are
     'NA'.
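The final lookup step, including the 'factor' and 'levels' behaviour, can be sketched in base R. 'applyMap()' is a hypothetical helper for illustration only; it assumes 'map' has already been expanded and de-duplicated (no "*" or ".else" left) and that 'x' is a plain character vector rather than a 'genotype' object.

```r
## Hypothetical sketch (not genetics code) of the final lookup step.
## 'map' is assumed to be already expanded and de-duplicated.
applyMap <- function(x, map, factor = TRUE, levels = NULL) {
  ## Invert the map into a lookup vector: genotype name -> group name
  lookup <- setNames(rep(names(map), lengths(map)),
                     unlist(map, use.names = FALSE))
  out <- unname(lookup[x])  # names missing from the map give NA
  if (factor) {
    ## default levels: sorted group names, as described under 'levels'
    if (is.null(levels)) levels <- sort(names(map))
    out <- factor(out, levels = levels)
  }
  out
}

x   <- c("A/A", "A/B", "B/A", "C/D")
map <- list(homo = c("A/A", "B/B"), hetAB = c("A/B", "B/A"))
applyMap(x, map, factor = FALSE)
## [1] "homo"  "hetAB" "hetAB" NA
```

With 'factor=TRUE' the same call returns a factor whose levels are "hetAB" and "homo", in sorted order.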

_A_u_t_h_o_r(_s):

     Gregor Gorjanc

_S_e_e _A_l_s_o:

     'genotype', 'haplotype', 'factor', and 'levels'

_E_x_a_m_p_l_e_s:

     ## --- Setup ---

     x <- c("A/A", "A/B", "B/A", "A/C", "C/A", "A/D", "D/A",
            "B/B", "B/C", "C/B", "B/D", "D/B",
            "C/C", "C/D", "D/C",
            "D/D")
     g <- genotype(x, reorder="yes")
     ## "A/A" "A/B" "A/B" "A/C" "A/C" "A/D" "A/D" "B/B" "B/C" "B/C" "B/D" "B/D"
     ## "C/C" "C/D" "C/D" "D/D"

     h <- haplotype(x)
     ## "A/A" "A/B" "B/A" "A/C" "C/A" "A/D" "D/A" "B/B" "B/C" "C/B" "B/D" "D/B"
     ## "C/C" "C/D" "D/C" "D/D"

     ## --- Use of "A/A", "A/*" and ".else" ---

     map <- list("homoG"=c("A/A", "B/B", "C/C", "D/D"),
                 "heteroA*"=c("A/B", "A/C", "A/D"),
                 "heteroB*"=c("B/*"),
                 "heteroRest"=".else")

     (tmpG <- groupGenotype(x=g, map=map, factor=FALSE))
     (tmpH <- groupGenotype(x=h, map=map, factor=FALSE, haplotype=TRUE))

     ## Show difference between genotype and haplotype treatment
     cbind(as.character(h), gen=tmpG, hap=tmpH, diff=!(tmpG == tmpH))
     ##              gen          hap          diff
     ##  [1,] "A/A" "homoG"      "homoG"      "FALSE"
     ##  [2,] "A/B" "heteroA*"   "heteroA*"   "FALSE"
     ##  [3,] "B/A" "heteroA*"   "heteroB*"   "TRUE"
     ##  [4,] "A/C" "heteroA*"   "heteroA*"   "FALSE"
     ##  [5,] "C/A" "heteroA*"   "heteroRest" "TRUE"
     ##  [6,] "A/D" "heteroA*"   "heteroA*"   "FALSE"
     ##  [7,] "D/A" "heteroA*"   "heteroRest" "TRUE"
     ##  [8,] "B/B" "homoG"      "homoG"      "FALSE"
     ##  [9,] "B/C" "heteroB*"   "heteroB*"   "FALSE"
     ## [10,] "C/B" "heteroB*"   "heteroRest" "TRUE"
     ## [11,] "B/D" "heteroB*"   "heteroB*"   "FALSE"
     ## [12,] "D/B" "heteroB*"   "heteroRest" "TRUE"
     ## [13,] "C/C" "homoG"      "homoG"      "FALSE"
     ## [14,] "C/D" "heteroRest" "heteroRest" "FALSE"
     ## [15,] "D/C" "heteroRest" "heteroRest" "FALSE"
     ## [16,] "D/D" "homoG"      "homoG"      "FALSE"

     map <- list("withA"="A/*", "rest"=".else")
     groupGenotype(x=g, map=map, factor=FALSE)
     ##  [1] "withA" "withA" "withA" "withA" "withA" "withA" "withA" "rest"  "rest"
     ## [10] "rest"  "rest"  "rest"  "rest"  "rest"  "rest"  "rest"

     groupGenotype(x=h, map=map, factor=FALSE, haplotype=TRUE)
     ##  [1] "withA" "withA" "rest"  "withA" "rest"  "withA" "rest"  "rest"  "rest"
     ## [10] "rest"  "rest"  "rest"  "rest"  "rest"  "rest"  "rest"

     ## --- Use of "*/*" ---

     map <- list("withA"="A/*", withB="*/*")
     groupGenotype(x=g, map=map, factor=FALSE)
     ##  [1] "withA" "withA" "withA" "withA" "withA" "withA" "withA" "withB" "withB"
     ## [10] "withB" "withB" "withB" "withB" "withB" "withB" "withB"

     ## --- Unspecified genotypes produce NA ---

     map <- list("withA"="A/*", withB="B/*")
     groupGenotype(x=g, map=map, factor=FALSE)
     ##  [1] "withA" "withA" "withA" "withA" "withA" "withA" "withA" "withB" "withB"
     ## [10] "withB" "withB" "withB" NA      NA      NA      NA

     groupGenotype(x=h, map=map, factor=FALSE, haplotype=TRUE)
     ##  [1] "withA" "withA" "withB" "withA" NA      "withA" NA      "withB" "withB"
     ## [10] NA      "withB" NA      NA      NA      NA      NA

