Starting in version 2.0-0,
OneMap can also deal with inbred-based populations, that is, populations that have homozygous parental lines in the genealogy (F2s, backcrosses, and RILs). As a consequence, linkage phases do not need to be estimated. Since version 2.3.0, phases are estimated for F2 populations to properly generate progeny haplotypes not only the recombination fraction.
In this vignette, we explain how to proceed with the analysis in an F2 population. The same procedure can be used for backcrosses and RILs as well, and therefore users should not have any difficulty in analyzing their data. However, there are a number of differences from genetic mapping in outcrossing species; please read the proper vignette.
If you are not familiar with
R, we recommend first the reading of vignette Introduction to R. You do not need to be an expert in R to build your linkage map, but some concepts are necessary and will help you through the process.
There is a GitHub
OneMap version which is constantly improved, we strongly recommend all users to try this version. In
augusto-garcia/onemap GitHub page you can find instructions to install the package from GitHub and also more fancy tutorials.
For F2s, backcrosses, and RILs, two input formats are accepted. The user can choose between the standard
OneMap file format or the same raw file used by
MAPMAKER/EXP (Lander et al., 1987). Therefore, one should have no difficulty in using data sets already available for
MAPMAKER/EXP when deciding to try
Both types of raw files can contain phenotypic information, but this will not be used during map construction, that requires only genotypic information (made available by molecular markers).
MAPMAKER/EXP raw file, combined with the map file produced by
OneMap, can be readily used for QTL mapping using
R/qtl (Broman et al., 2008) or
QTL Cartographer (Wang et al., 2010), among others.
Here, we briefly present how to set up this data file. For more detailed information see the
MAPMAKER/EXP manual (Lincon et al., 1993), available here.
The first line of your data file should be:
data type xxxx
xxxx is one of the following data types:
||RIL, produced by selfing|
||RIL, produced by sib mating|
The second line should contain the number of individuals in the progeny, the number of markers, and the number of quantitative traits. So, for example, a valid line would be
10 5 2
for a data set with 10 individuals (yes, very small, but this is just an example), 5 markers, and 2 traits evaluated.
Then, the genotype information is included for each marker. The character
* indicates the beginning of information of a marker, followed by the marker name. For instance, here is an example of such a file for an F2 population with 10 individuals, 5 markers, and 2 quantitative traits:
data type f2 intercross 10 5 2 *M1 A B H H A - B A A B *M2 C - C C C - - C C A *M3 D B D D - - B D D B *M4 C C C - A C C A A C *M5 C C C C C C C C C C *weight 10.2 - 9.4 11.3 11.9 8.9 - 11.2 7.8 8.1 *length 1.7 2.1 - 1.8 2.0 1.0 - 1.7 1.0 1.1
The codification for genotypes is the following:
||homozygous for allele A (from parent 1 - AA)|
||homozygous for allele B (from parent 2 - BB)|
||heterozygous carrying both alleles (AB)|
||Not homozygous for allele A (Not AA)|
||Not homozygous for allele B (Not BB)|
||Missing data for the individual at this marker|
symbols option (not included in this example), used in
MAPMAKER/EXP files, is also accepted (please, see its manual for details).
The quantitative trait data should come after the genotypic data and has a similar format, except the trait values for each individual must be separated by at least one space, a tab, or a line break. A dash (
-) indicates missing data.
This file must be saved in plain text format using a simple text editor such as notepad on Microsoft Windows. Historically,
MAPMAKER/EXP uses the
.raw extension for this file; however, you can use any other extensions, such as
If you want to see more examples about this file type, open
mapmaker_example_f2.raw, both available with
OneMap and saved in the directory
extdata on your computer, in the folder where you installed
system.file(package="onemap") to see where it is located on your computer).
Now, let us load
To save your project anytime, type:
if you are using Windows, otherwise, adapt the code. Notice that you need to specify where to save and the name of the file. You can also use the toolbar, of course.
OneMap data file has few differences compared to MAPMAKER/EXP format. As MAPMAKER/EXP format, the input
OneMap file is a text file, where the first line indicates the cross-type and the second line provides information about the number of individuals, the number of markers, and the number of quantitative traits. Here, the format also supports keeping physical markers location information. The followed numbers indicate the presence/absence (1/0) of chromosome and position information and the presence/absence(1/0) of phenotypic data.
The third line contains sample IDs. Then, the genotype information is included separately for each marker. The character
* indicates the beginning of information input for a new marker, followed by the marker name. Next, there is a code indicating the marker type according to:
||Dominant marker for allele B|
||Dominant marker for allele A|
||Marker for backcross|
||Marker for ril self/sib cross|
Finally, after each marker type, comes the genotype data for the segregating population. Missing data are indicated with the character
- (minus sign) and an empty space separates the information for each individual. Positions and phenotype information, if present, follows genotypic data with a similar structure. Details are found with the help of function
Here is an example of such file for 10 individuals, 5 markers (the two zeros in the second line indicate that there is no chromosome information, physical position information), and two phenotypic data, respectively). It is very similar to a MAPMAKER/EXP file, but has additional information about the cross_type.
data type f2 intercross 10 5 0 0 2 I1 I2 I3 I4 I5 I6 I7 I8 I9 I10 *M1 A.H.B ab a - ab b a ab - ab b *M2 A.H.B a - ab ab - b a - a ab *M3 C.A c a a c c - a c a c *M4 A.H.B ab b - ab a b ab b - a *M5 D.B b b d - b d b b b d *fen1 10.3 11.2 11.1 - 9.8 8.9 11.0 10.7 - 10.1 *fen2 42 49 - 45 51 42 28 32 38 40
In case you have physical chromosome and position information:
data type f2 intercross 10 5 1 1 2 I1 I2 I3 I4 I5 I6 I7 I8 I9 I10 *M1 A.H.B ab a - ab b a ab - ab b *M2 A.H.B a - ab ab - b a - a ab *M3 C.A c a a c c - a c a c *M4 A.H.B ab b - ab a b ab b - a *M5 D.B b b d - b d b b b d *CHROM 1 1 1 2 2 *POS 2391 3812 5281 1823 3848 *fen1 10.3 11.2 11.1 - 9.8 8.9 11.0 10.7 - 10.1 *fen2 42 49 - 45 51 42 28 32 38 40
The input file must be saved in text format, with extensions like
.raw. It is a good idea to open the text files called
onemap_example_riself (available in
OneMap and saved in the directory you installed it) to see how this file should be. You can see where
OneMap is installed using the command
Once you created your data file with raw data, you can use
read_mapmaker to import it to
<- read_mapmaker(dir="C:/workingdirectory", mapmaker_example_f2 file="your_data_file.raw")
The first argument is the directory where the input file is located, modify it accordingly. The second one is the data file name.
In this example, an object named
mapmaker_example_f2.raw was created. Notice that if you leave the argument
dir blank, the file will be read from your current working directory. To set a working directory, see Introduction to R (Importing and Exporting Data).
<- read_mapmaker(file= system.file("extdata/mapmaker_example_f2.raw", mapmaker_example_f2 package = "onemap"))
For this example, we will use a simulated data set from an F2 population which is distributed along with
OneMap. Because this particular data set is distributed along with the package, you can load it by typing:
To see what this data set is about, type:
mapmaker_example_f2#> This is an object of class 'onemap' #> Type of cross: f2 #> No. individuals: 200 #> No. markers: 66 #> CHROM information: no #> POS information: no #> Percent genotyped: 85 #> #> Segregation types: #> AA : AB : BB --> 36 #> Not AA : AA --> 15 #> Not BB : BB --> 15 #> #> No. traits: 1 #> Missing trait values: #> Trait_1: 0
As you can see, the data consists of a sample of 200 individuals genotyped for 66 markers (36 co-dominant (
BB), 15 dominant in one parent (
Not AA or
AA) and 15 dominant in the other parent (
Not BB or
BB) with 15% of missing data. You can also see that there is phenotypic information for one trait in the data set, that can be used for QTL mapping.
The same procedure is made for the
OneMap raw file, but, instead of using the function
read_mapmaker we use
read_onemap to read the
<- read_onemap(dir="C:/workingdirectory", onemap_example_f2 inputfile = "your_data_file.raw")
In this example, an object named
onemap_example_f2.raw was created. The data set containing the same markers and individuals of the
mapmaker_example_f2.raw file. Would be a good idea to open these two files in a text editor and compare them to better understand the differences between the two kinds of input files. We can read the
<- read_onemap(inputfile= system.file("extdata/onemap_example_f2.raw", onemap_example_f2 package = "onemap"))
Or, because this particular data are available together with the
To see what this data set is about, type:
onemap_example_f2#> This is an object of class 'onemap' #> Type of cross: f2 #> No. individuals: 200 #> No. markers: 66 #> CHROM information: no #> POS information: no #> Percent genotyped: 85 #> #> Segregation types: #> AA : AB : BB --> 36 #> Not AA : AA --> 15 #> Not BB : BB --> 15 #> #> No. traits: 1 #> Missing trait values: #> Trait_1: 0
As you can see, the mean difference in the output object is that the
read_onemap function keeps chromosome and position information. Because the objects
onemap_example_f2 are practically the same, from now we will use only
If you are working with biallelic markers, as SNPs and indels (only codominant markers A.H.B), in VCF (Variant Call Format) files, you can import information from
onemap_read_vcfR you can convert VCF file directly to
onemap_read_vcfR function keeps chromosome and position information for each marker at the end of the raw file.
We will use the same example file
vcf_example_f2.vcf to show how it works.
Here we use the the
vcfR package internally to help this conversion. The
vcfR authors mentioned in their tutorials that RAM memory use is an important consideration when using the package. Depending of your dataset, the object created can be huge and occupy a lot of memory.
You can use
onemap_read_vcfR function to convert the VCF file to
onemap object. The parameters used are the
vcf with the VCF file path, the identification of each parent (here, you must define only one sample for each parent) and the cross type.
<- onemap_read_vcfR(vcf = system.file("extdata/vcf_example_f2.vcf.gz", package = "onemap"), vcf_example_f2 parent1 = "P1", parent2 = "P2", cross = "f2 intercross")
Depending on your dataset, this function can take some time to run.
NOTE: From version 2.0.6 to 2.1.1005,
OneMap had the
vcf2raw function to convert
.raw. Now, this function is defunct, but it can be replaced by a combination of
write_onemap_raw functions. See Exporting .raw file from onemap object session to further information about
Before building your linkage map, you should take a look at your data set. First, notice that by reading the raw data into
OneMap, an object of classes
f2 was produced:
class(onemap_example_f2) #>  "onemap" "f2" class(vcf_example_f2) #> f2 intercross #> "onemap" "f2"
In fact, functions
read_onemap will produce objects of classes
f2, according to the information in the data file for inbred-based populations. Therefore, you can use
OneMap’s version of function
plot to produce a graphic with information about the raw data. It will automatically recognize the class of the object and produce the graphic. To see it in action, try:
The graphic is self-explanatory. If you want to save it, see the help for function
This graphic shows that missing data is somehow randomly distributed; also, the proportion of dominant markers is relatively high for this data set. In
OneMap’s notation, codominant markers are classified as of B type; dominant ones, by C type (for details about this notation, see the vignette for outcrossing species). You can see the number of loci within each type using function
So, as shown before, the object
onemap_example_f2 has 36 codominant markers and 30 dominant ones and the
vcf_example_f2has only codominant markers.
create_depth_profile generates dispersion graphics with x and y axis representing, respectively, the reference and alternative allele depths. The function is only available for biallelic markers in VCF files with allele counts information. Each dot represents a genotype for
mks markers and
inds individuals. If both arguments receive
NULL, all markers and individuals are considered. Dots are colored according to the genotypes present in the
OneMap object (
GTfrom = onemap) or in the VCF file (
GTfrom = vcf). An rds file is generated with the data in the graphic (
alpha argument controls the transparency of the color of each dot. Control this parameter is a good idea when having a big amount of markers and individuals. The
y_lim control the axis scale limits, by default, it uses the maximum value of the counts.
Here is an example of the simulated dataset.
<- onemap_read_vcfR(vcf = system.file("extdata/vcf_example_f2.vcf.gz", package="onemap"), simu_f2_obj cross = "f2 intercross", parent1 = "P1", parent2 = "P2") create_depths_profile(onemap.obj = simu_f2_obj, vcfR.object = system.file("extdata/vcf_example_f2.vcf.gz", package="onemap"), parent1 = "P1", parent2 = "P2", vcf.par = "AD", recovering = FALSE, mks = NULL, inds = NULL, GTfrom = "vcf", alpha = 0.1, rds.file = "depths_f2.rds")
Selecting genotypes from
vcf (GTfrom = “vcf”) the colors are separated by VCF genotypes:
0/0 homozygote for the reference allele,
0/1 heterozygote, and
1/1 homozygote for the alternative allele. Depending on your VCF, you can also have phased genotypes, which are presented by pipe (|) instead of the bar (/).
If you have more than one dataset of markers, all from the same cross-type, you can use the function
combine_onemap to merge them into only one
In our example, we have two
mapmaker_example_f2) with 66 markers and 200 individuals
vcf_example_f2with 25 biallelic markers and 192 individuals.
combine_function recognizes the correspondent individuals by the ID, thus, it is important to define exactly the same IDs to respective individuals in both
raw files. Compared with the first file, the second file does not have markers information for 8 individuals. The
combine_onemap will complete that information with NA.
In our examples, we have only genotypic information, but the function can also merge the phenotypic information if it exists.
<- combine_onemap(onemap_example_f2, vcf_example_f2) comb_example comb_example#> This is an object of class 'onemap' #> Type of cross: f2 #> No. individuals: 200 #> No. markers: 91 #> CHROM information: yes #> POS information: yes #> Percent genotyped: 84 #> #> Segregation types: #> AA : AB : BB --> 61 #> Not AA : AA --> 15 #> Not BB : BB --> 15 #> #> No. traits: 1 #> Missing trait values: #> Trait_1: 0
The function arguments are the names of the
OneMap objects you want to combine.
Plotting markers genotypes from the outputted
OneMap object, we can see that there are more missing data
- (black vertical lines) for some individuals because they were missing in the second file.
It is possible that there are redundant markers in your dataset, especially when dealing with too many markers. Redundant markers have the same genotypic information that others markers, usually because didn’t happen recombination events between each other. They will not increase information on the map but will increase computational effort during the map building. Therefore, it is a good practice to remove them to build the map and, once the map is already built, they can be added again.
First, we use the function
find_bins to group the markers into bins according to their genotypic information. In other words, markers with the same genotypic information will be in the same bin.
<- find_bins(comb_example, exact = FALSE) bins bins#> This is an object of class 'onemap_bin' #> No. individuals: 200 #> No. markers in original dataset: 91 #> No. of bins found: 90 #> Average of markers per bin: 1.011111 #> Type of search performed: non exact
The first argument is the
OneMap object and the
exact argument specifies if only markers with exact same information will be at the same bin. Using
FALSE at this second argument, missing data will not be considered and the marker with the lowest amount of missing data will be the representative marker on the bin.
Our example dataset has only two redundant markers. We can create a new
OneMap object without them, using the
create_data_bins function. This function keeps only the most representative marker of each bin from the
<- create_data_bins(comb_example, bins) bins_example bins_example#> This is an object of class 'onemap' #> Type of cross: f2 #> No. individuals: 200 #> No. markers: 90 #> CHROM information: yes #> POS information: yes #> Percent genotyped: 84 #> #> Segregation types: #> AA : AB : BB --> 60 #> Not AA : AA --> 15 #> Not BB : BB --> 15 #> #> No. traits: 1 #> Missing trait values: #> Trait_1: 0
The arguments for
create_data_bins function are the
OneMap object and the object created by
onemap_read_vcfR generates new
OneMap objects without use a input
.raw file. Also, the function
combine_onemap manipulates the information of the original
.raw file and creates a new data set. In both cases, you do not have an input file
.raw that contains the same information as your current onemap object If you want to create a new input file with the data set you are working on after using these functions, you can use the function
write_onemap_raw(bins_example, file.name = "new_dataset.raw", cross="f2 intercross")
new_dataset.raw will be generated in your working directory. In our example, it contains markers from
vcf_example_f2 data sets.
Now, it should be interesting to see if markers are segregating following what is expected by Mendel’s law. You first need to use function
test_segregation using as argument an object of class
<- test_segregation(bins_example, simulate.p.value = FALSE)f2_test
This will produce an object of class
class(f2_test) #>  "onemap_segreg_test"
You cannot see the results if you simply type the object name; use
OneMap’s version of the
(Nothing is shown!)
print(f2_test) #> Marker H0 Chi-square p-value % genot. #> 1 M1 1:2:1 0.206896552 0.901722662 87.0 #> 2 M2 3:1 0.292682927 0.588506355 82.0 #> 3 M3 3:1 0.124031008 0.724703003 86.0 #> 4 M4 3:1 0.292682927 0.588506355 82.0 #> 5 M5 3:1 0.159763314 0.689374522 84.5 #> 6 M6 3:1 0.620689655 0.430791121 87.0 #> 7 M7 1:2:1 3.185185185 0.203397600 81.0 #> 8 M8 3:1 0.165644172 0.684012345 81.5 #> 9 M9 3:1 0.126984127 0.721579725 84.0 #> 10 M10 1:2:1 1.721893491 0.422761645 84.5 #> 11 M11 3:1 1.390946502 0.238245323 81.0 #> 12 M12 1:2:1 1.678160920 0.432107681 87.0 #> 13 M13 3:1 0.154285714 0.694472964 87.5 #> 14 M14 1:2:1 5.337209302 0.069348924 86.0 #> 15 M15 3:1 0.403292181 0.525393917 81.0 #> 16 M16 1:2:1 2.036144578 0.361290733 83.0 #> 17 M17 1:2:1 1.000000000 0.606530660 81.0 #> 18 M18 1:2:1 3.413793103 0.181427972 87.0 #> 19 M19 3:1 0.279069767 0.597311573 86.0 #> 20 M20 3:1 4.462626263 0.034644194 82.5 #> 21 M21 1:2:1 4.892857143 0.086602329 84.0 #> 22 M22 1:2:1 2.862068966 0.239061489 87.0 #> 23 M23 3:1 0.008032129 0.928587488 83.0 #> 24 M24 3:1 8.649325626 0.003271824 86.5 #> 25 M25 1:2:1 0.686746988 0.709373215 83.0 #> 26 M26 1:2:1 7.022598870 0.029858091 88.5 #> 27 M27 1:2:1 0.108433735 0.947226662 83.0 #> 28 M28 1:2:1 3.319526627 0.190183989 84.5 #> 29 M29 1:2:1 0.161676647 0.922342801 83.5 #> 30 M30 1:2:1 0.607142857 0.738177160 84.0 #> 31 M31 3:1 0.007843137 0.929430415 85.0 #> 32 M32 3:1 0.016949153 0.896416968 88.5 #> 33 M33 1:2:1 0.640718563 0.725888192 83.5 #> 34 M34 3:1 0.235867446 0.627206926 85.5 #> 35 M35 3:1 0.606741573 0.436017310 89.0 #> 36 M36 3:1 0.272727273 0.601508134 88.0 #> 37 M37 1:2:1 3.169491525 0.204999905 88.5 #> 38 M38 1:2:1 3.055555556 0.217017393 90.0 #> 39 M39 3:1 0.163636364 0.685830434 82.5 #> 40 M40 1:2:1 4.872093023 0.087506123 86.0 #> 41 M41 1:2:1 4.253012048 0.119253235 83.0 #> 42 M42 3:1 0.124031008 0.724703003 86.0 #> 43 M43 1:2:1 0.927272727 0.628992237 82.5 #> 44 M44 1:2:1 2.000000000 0.367879441 86.0 #> 45 M45 3:1 0.008032129 0.928587488 83.0 #> 46 M46 1:2:1 0.740112994 0.690695307 88.5 #> 47 M47 1:2:1 0.758620690 0.684333200 87.0 #> 48 M48 1:2:1 2.122807018 0.345969898 85.5 #> 49 M49 3:1 1.190476190 0.275233524 87.5 #> 50 M50 3:1 1.068686869 0.301242240 82.5 #> 51 M51 3:1 0.031746032 0.858586201 84.0 #> 52 M52 1:2:1 2.684848485 0.261211660 82.5 #> 53 M53 3:1 0.017142857 0.895830102 87.5 #> 54 M54 1:2:1 0.772455090 0.679615865 83.5 #> 55 M55 1:2:1 0.655172414 0.720661163 87.0 #> 56 M56 1:2:1 2.310734463 0.314941859 88.5 #> 57 M57 1:2:1 6.159763314 0.045964696 84.5 #> 58 M58 3:1 2.050847458 0.152121494 88.5 #> 59 M59 3:1 0.074074074 0.785494747 81.0 #> 60 M60 3:1 0.050505051 0.822186767 82.5 #> 61 M61 1:2:1 0.655172414 0.720661163 87.0 #> 62 M62 1:2:1 1.047337278 0.592343463 84.5 #> 63 M63 1:2:1 2.147928994 0.341651353 84.5 #> 64 M64 3:1 0.238658777 0.625176499 84.5 #> 65 M65 1:2:1 4.571428571 0.101701392 84.0 #> 66 M66 1:2:1 0.452513966 0.797513128 89.5 #> 67 SNP1 1:2:1 1.278787879 0.527612092 82.5 #> 68 SNP2 1:2:1 1.559523810 0.458515169 84.0 #> 69 SNP3 1:2:1 0.503105590 0.777592404 80.5 #> 70 SNP5 1:2:1 4.450000000 0.108067419 80.0 #> 71 SNP6 1:2:1 2.641975309 0.266871595 81.0 #> 72 SNP7 1:2:1 3.745341615 0.153712576 80.5 #> 73 SNP8 1:2:1 4.164705882 0.124636604 85.0 #> 74 SNP9 1:2:1 2.578313253 0.275503037 83.0 #> 75 SNP10 1:2:1 3.359281437 0.186440949 83.5 #> 76 SNP11 1:2:1 2.395061728 0.301938820 81.0 #> 77 SNP12 1:2:1 4.731707317 0.093869134 82.0 #> 78 SNP13 1:2:1 2.963855422 0.227199291 83.0 #> 79 SNP14 1:2:1 7.123456790 0.028389714 81.0 #> 80 SNP15 1:2:1 3.109090909 0.211285400 82.5 #> 81 SNP16 1:2:1 2.751552795 0.252643368 80.5 #> 82 SNP17 1:2:1 5.850299401 0.053656659 83.5 #> 83 SNP18 1:2:1 3.108433735 0.211354837 83.0 #> 84 SNP19 1:2:1 3.335403727 0.188680181 80.5 #> 85 SNP20 1:2:1 3.867469880 0.144607090 83.0 #> 86 SNP21 1:2:1 3.238095238 0.198087264 84.0 #> 87 SNP22 1:2:1 4.455621302 0.107764105 84.5 #> 88 SNP23 1:2:1 2.847058824 0.240862412 85.0 #> 89 SNP24 1:2:1 1.778443114 0.410975549 83.5 #> 90 SNP26 1:2:1 1.981595092 0.371280460 81.5
This shows the results of the Chi-square test for the expected Mendelian segregation pattern of each marker locus. This depends of course on the marker type, because codominant markers can show heterozygous genotypes. The appropriate null hypothesis is selected by the function. The proportion of individuals genotyped is also shown.
To declare statistical significance, remember that you should consider that multiple tests are being performed. To guide you in the analysis, function
Bonferroni_alpha shows the alpha value that should be considered for this number of loci if applying Bonferroni’s correction with global alpha of 0.05:
Bonferroni_alpha(f2_test) #>  0.0005555556
You can subset object
f2_test to see which markers are distorted under Bonferroni’s criterion, but it is easier to see the proportion of markers that are distorted by drawing a graphic using
OneMap’s version of the function
plot for objects of class
The graphic is self-explanatory: p-values were transformed by using
-log10(p-values) for better visualization. A vertical line shows the threshold for tests if Bonferroni’s correction is applied. Significant and non-significant tests are identified. In this particular example, no test was statistically significant, so none will be discarded.
Please, remember that Bonferroni’s correction is conservative, and also that discarding marker data might not be a good approach to your analysis. This graphic is just to suggest a criterion, so use it with caution.
You can see a list of markers with non-distorted segregation using function
select_segreg(f2_test) #>  "M1" "M2" "M3" "M4" "M5" "M6" "M7" "M8" "M9" #>  "M10" "M11" "M12" "M13" "M14" "M15" "M16" "M17" "M18" #>  "M19" "M20" "M21" "M22" "M23" "M24" "M25" "M26" "M27" #>  "M28" "M29" "M30" "M31" "M32" "M33" "M34" "M35" "M36" #>  "M37" "M38" "M39" "M40" "M41" "M42" "M43" "M44" "M45" #>  "M46" "M47" "M48" "M49" "M50" "M51" "M52" "M53" "M54" #>  "M55" "M56" "M57" "M58" "M59" "M60" "M61" "M62" "M63" #>  "M64" "M65" "M66" "SNP1" "SNP2" "SNP3" "SNP5" "SNP6" "SNP7" #>  "SNP8" "SNP9" "SNP10" "SNP11" "SNP12" "SNP13" "SNP14" "SNP15" "SNP16" #>  "SNP17" "SNP18" "SNP19" "SNP20" "SNP21" "SNP22" "SNP23" "SNP24" "SNP26"
To get a list of distorted ones (none in this example):
select_segreg(f2_test, distorted = TRUE) #> character(0)
It is not recommended, but you can define a different threshold value by changing the
threshold argument of the function
For the next steps will be useful to know the numbers of each marker with segregation distortion, so then you can keep those out of your map building analysis. These numbers refer to the lines where markers are located on the data file.
To access the corresponding number for of these markers you can change the
<- select_segreg(f2_test, distorted = FALSE, numbers = TRUE) #to show the markers numbers without segregation distortion no_dist no_dist#>  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 #>  26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 #>  51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 #>  76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 <- select_segreg(f2_test, distorted = TRUE, numbers = TRUE) #to show the markers numbers with segregation distortion dist dist#> integer(0)
After visualizing raw data and checking for segregation distortion, let us now estimate recombination fractions between all pairs of markers (two-point tests). This is necessary to allow us to test which markers are linked. At this point, you should pay no attention if markers show segregation distortion or not, that is, simply use all of them.
<- rf_2pts(input.obj = bins_example)twopts_f2
There are two optional arguments in function
max.rf which indicate the minimum LOD Score and the maximum recombination fraction to declare linkage (they default to 3.0 and 0.5, respectively).
The default for the recombination fraction is easy to understand, because if
max.rf < 0.5 we could state that markers are linked. The LOD Score is the statistic used to evaluate the significance of the test for
max.rf = 0.50. This needs to take into consideration the number of tests performed, which of course depends on the number of markers. Function
suggest_lod can help users to find an initial value to use for their linkage test. For this example:
<- suggest_lod(bins_example)) (LOD_sug #>  4.145858
Thus, one should consider using
LOD = 4.145858 for the tests. Please, notice that this is just a guide, not a value to take without any further consideration. For now, we will keep the default values, but later will show that results do not change in our example by using
LOD = 3 or
LOD = 4.145858.
If you want to see the results for a single pair of markers, say
print(twopts_f2, c("M12", "M42")) #> Results of the 2-point analysis for markers: M12 and M42 #> Criteria: LOD = 3 , Maximum recombination fraction = 0.5 #> #> rf LOD #> CC 0.6842077 2.29369 #> CR 0.3157923 2.29369 #> RC 0.6842077 2.29369 #> RR 0.3157923 2.29369
Since version 2.3.0, we estimate phases for F2 populations too, so here you can see the recombination fractions and LOD values for each possible phase. For RILs and backcross, you will obtain only one value.
This was possible because
OneMap has a version of the
class(twopts_f2) #>  "rf_2pts" "f2"
However, objects of this type are too complex to print if you do not specify a pair of markers:
print(twopts_f2) #> This is an object of class 'rf_2pts' #> #> Criteria: LOD = 3 , Maximum recombination fraction = 0.5 #> #> This object is too complex to print #> Type 'print(object, c(mrk1=marker, mrk2=marker))' to see #> the analysis for two markers #> mrk1 and mrk2 can be the names or numbers of both markers
In this example we follow two different strategies:
Using only recombinations information.
Using the recombinations and also the reference genome information, once our example has
POS information for some of the markers.
First, we will apply the strategy using only recombinations information. In the second part of this tutorial, we show a way to use also reference genome information.
OneMap has two functions to perform the markers grouping. The first presented here is the
<- group(mark_all_f2) LGs_f2 #> Selecting markers: #> group 1 #> ............................................................ #> ............................. LGs_f2#> This is an object of class 'group' #> It was generated from the object "mark_all_f2" #> #> Criteria used to assign markers to groups: #> LOD = 3 , Maximum recombination fraction = 0.5 #> #> No. markers: 90 #> No. groups: 1 #> No. linked markers: 90 #> No. unlinked markers: 0 #> #> Printing groups: #> Group 1 : 90 markers #> M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30 M31 M32 M33 M34 M35 M36 M37 M38 M39 M40 M41 M42 M43 M44 M45 M46 M47 M48 M49 M50 M51 M52 M53 M54 M55 M56 M57 M58 M59 M60 M61 M62 M63 M64 M65 M66 SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12 SNP13 SNP14 SNP15 SNP16 SNP17 SNP18 SNP19 SNP20 SNP21 SNP22 SNP23 SNP24 SNP26
This will show the linkage groups which are formed with criteria defined by
LOD. These criteria are applied as thresholds when assigning markers to linkage groups. If not modified, the same values used for the object
twopts (from two-point analysis) will be maintained (so,
LOD = 3.0 and
max.rf = 0.5 in this example).
Users can easily change the default values. For example, using LOD suggested by
suggest_lod (rounded up):
<- group(mark_all_f2, LOD = LOD_sug, max.rf = 0.5)) (LGs_f2 #> Selecting markers: #> group 1 #> ............................................................ #> ............................. #> This is an object of class 'group' #> It was generated from the object "mark_all_f2" #> #> Criteria used to assign markers to groups: #> LOD = 4.145858 , Maximum recombination fraction = 0.5 #> #> No. markers: 90 #> No. groups: 1 #> No. linked markers: 90 #> No. unlinked markers: 0 #> #> Printing groups: #> Group 1 : 90 markers #> M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30 M31 M32 M33 M34 M35 M36 M37 M38 M39 M40 M41 M42 M43 M44 M45 M46 M47 M48 M49 M50 M51 M52 M53 M54 M55 M56 M57 M58 M59 M60 M61 M62 M63 M64 M65 M66 SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12 SNP13 SNP14 SNP15 SNP16 SNP17 SNP18 SNP19 SNP20 SNP21 SNP22 SNP23 SNP24 SNP26
In our case, nothing different happens. (The parentheses above are just to avoid typing
LGs_f2 in a new row to have the object printed).
We can see that the markers were assigned to only one linkage group. This division isn’t so trustful, mostly if you are also working with dominant markers.
Another option is the
group_upgma function. It is an adapted version of MAPpoly grouping function.
<- group_upgma(mark_all_f2, expected.groups = 5, inter = F) LGs_upgma plot(LGs_upgma)
You can define the expected number of groups in the
expected.groups argument and check how the markers are split in the plotted dendrogram. Using argument
inter=TRUE you can change interactively the number of groups defined by the red squares in the graphic.
If you have reference genome information for some of your markers, you can separate the groups using it and the
group_seq function (further information in
Using the recombinations and the reference genome information) to add the markers without reference genome information. We can also confirm the separation of linkage groups in the ordering procedure.
Notice the class of object
class(LGs_f2) #>  "group" class(LGs_upgma) #>  "group.upgma"
It is possible to plot the recombination fraction matrix and LOD Scores based on a color scale using the function
rf_graph_table. This matrix can be useful to make some diagnostics about the map.
Ordered markers are presented on both axes. Hotter the colors lower the recombination fraction between markers related to each cell. Good map orders have hot colors in the diagonal and it gradually turns to blue at the superior left and inferior right corners. This pattern means that markers that have low recombination fractions are really positioned closely on the map.
Let’s see the maps we have until now:
rf_graph_table(LG1_rec_f2) rf_graph_table(LG1_ug_f2) rf_graph_table(LG1_mds_f2)
NOTE: Check the GitHub vignette version to visualize all the graphics.
With default arguments, the graphic cells represent the recombination fractions. If you change the argument to
graph.LOD = TRUE, LOD score values are plotted. The color scale varies from red (small distances and big LODs) to dark blue. You can also change the number of colors from the
rainbow palette with argument
n.colors, add/remove graphic main and axis title (
lab.xy), and shows marker numbers, instead of names in the axis (
We can see that any of the algorithms gave an optimal ordering, there are many red cells out of the diagonal, the color pattern is broke in all graphics, but we need to do an effort to find which one of the algorithms gave the best result. Then, we continue the analysis by doing hand manipulations.
record approaches are the ones that gave results more close to the expected pattern. We can also see that probably there is more than one group, because of the big gaps. Let’s see the map generated by the
ug algorithm with more details using the interactive mode of the
rf_graph_table(LG1_ug_f2, inter = TRUE, html.file = "test.html")
An interactive version of the graphic will pop up (not shown here) in your internet browser end generated an HTML file in your work directory. Hover the mouse cursor over the cell corresponding to two markers, you can see some useful information about them.
11 seem to form a separated group (we will call this group LG1). Markers from
55 will be our LG2. Markers from
50 show strong evidence that constitutes another separated group, including markers (LG3).
Let’s separate them and order again using the same method:
# New LG1 will be this separated group <- which(LG1_ug_f2$seq.num == 11) # Find position of marker 11 pos11 <- LG1_ug_f2$seq.num[1:pos11] # From marker 89 to 11 mksLG1 # LG2 <- which(LG1_ug_f2$seq.num == 23) pos23 <- which(LG1_ug_f2$seq.num == 55) pos55 <- LG1_ug_f2$seq.num[pos23:pos55] mksLG2 # LG3 <- which(LG1_ug_f2$seq.num == 39) pos39 <- LG1_ug_f2$seq.num[pos39:length(LG1_ug_f2$seq.num)] # use the position to find the 39 marker and take all the markers from there to the end of sequence mksLG3 # Ordering again LG1 <- make_seq(twopts_f2, mksLG1) LG1 <- ug(LG1, hmm = F) LG1_ug2_f2 rf_graph_table(LG1_ug2_f2) # Now it is better
# Ordering LG2 <- make_seq(twopts_f2, mksLG2) LG2 <- ug(LG2, hmm = F) LG2_ug_f2 rf_graph_table(LG2_ug_f2)
# Ordering LG3 <- make_seq(twopts_f2, mksLG3) LG3 <- ug(LG3, hmm = F) LG3_ug_f2 rf_graph_table(LG3_ug_f2)
Besides the groups now is better separated, the order is still not the best we can do. With fewer markers in each group, we can apply a multipoint strategy to better order those markers, starting with group 1.
In our example, we have reference genome chromosome and position information for some of the markers, here we will exemplify one method of using this information to help build the genetic map.
CHROM information in the input file, you can identify markers belonging to some chromosome using the function
make_seq with the
rf_2pts object. For example, assign the string
"1" for the second argument to get chromosome 1 makers. The output sequence will be automatically ordered by
<- make_seq(twopts_f2, "1") CHR1 CHR1#> #> Number of markers: 9 #> Markers in the sequence: #> SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 #> #> Parameters not estimated. <- make_seq(twopts_f2, "2") CHR2 <- make_seq(twopts_f2, "3")CHR3
CHROM information we have three defined linkage groups, now we can try to group the markers without chromosome information to them using recombination information. For this, we can use the function
<- group_seq(input.2pts = twopts_f2, seqs = "CHROM", unlink.mks = mark_all_f2, CHR_mks repeated = FALSE) #> Selecting markers: #> group 1 #> ............................................................ #> .............. #> Selecting markers: #> group 1 #> ............................................................ #> .................. #> Selecting markers: #> group 1 #> ............................................................ #> ....... #> There are one or more markers that grouped in more than one sequence
The function works as the function
group but considering pre-existing sequences. Setting
seqs argument with the string
"CHROM", it will consider the pre-existing sequences according to
CHROM information. You can also indicate other pre-existing sequences if it makes sense for your study. For that, you should inform a list with objects of class
sequences, as the example:
<- group_seq(input.2pts = twopts_f2, seqs = list(CHR1=CHR1, CHR2=CHR2, CHR3=CHR3), CHR_mks unlink.mks = mark_all_f2, repeated = FALSE)
In this case, the command had the same effect as the previous, because we indicate chromosome sequences, but others sequences can be used.
unlink.mks argument receives an object of class
sequence, this defines which markers will be tested to group with the sequences in
seqs. In our example, we will indicate only the markers with no segregation distortion, using the sequence
mark_no_dist. It is also possible to use the string
"all" to test all the remaining markers at the
In some cases, the same marker can group into more than one sequence, those markers will be considered
repeated. We can choose if we want to remove or not (
FALSE/TRUE) them of the output sequences, with the argument
rm.repeated. Anyway, their numbers will be informed at the list
repeated in the output object.
In the example case, there are no repeated markers. However, if they exist, it could indicate that their groups actually constitute the same group. Also, genotyping errors can generate repeated markers. Anyway, they deserve better investigations.
We can access detailed information about the results just by printing:
CHR_mks#> This is an object of class 'group_seq' #> Criteria used to assign markers to groups: #> LOD = , Maximum recombination fraction = #> #> No. markers in input sequences: #> CHR1 : 9 markers #> CHR2 : 13 markers #> CHR3 : 2 markers #> #> No. unlinked input markers: 66 markers #> #> No. markers in output sequences: #> CHR1 : 9 markers #> CHR2 : 13 markers #> CHR3 : 2 markers #> No. unlinked: 0 markers #> No. repeated: 66 markers #> #> Printing output sequences: #> Group CHR1 : 9 markers #> SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 #> #> Group CHR2 : 13 markers #> SNP11 SNP12 SNP13 SNP14 SNP15 SNP16 SNP17 SNP18 SNP19 SNP20 SNP21 SNP22 SNP23 #> #> Group CHR3 : 2 markers #> SNP24 SNP26 #> #> Unlinked markers: 0 markers #> #> #> Repeated markers: 66 markers #> #> Group CHR1 : 66 markers #> M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30 M31 M32 M33 M34 M35 M36 M37 M38 M39 M40 M41 M42 M43 M44 M45 M46 M47 M48 M49 M50 M51 M52 M53 M54 M55 M56 M57 M58 M59 M60 M61 M62 M63 M64 M65 M66 #> #> Group CHR2 : 66 markers #> M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30 M31 M32 M33 M34 M35 M36 M37 M38 M39 M40 M41 M42 M43 M44 M45 M46 M47 M48 M49 M50 M51 M52 M53 M54 M55 M56 M57 M58 M59 M60 M61 M62 M63 M64 M65 M66 #> #> Group CHR3 : 66 markers #> M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30 M31 M32 M33 M34 M35 M36 M37 M38 M39 M40 M41 M42 M43 M44 M45 M46 M47 M48 M49 M50 M51 M52 M53 M54 M55 M56 M57 M58 M59 M60 M61 M62 M63 M64 M65 M66
Also, we can access the numbers of repeated markers with:
$repeated CHR_mks#> $CHR1 #>  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 #>  26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 #>  51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 #> #> $CHR2 #>  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 #>  26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 #>  51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 #> #> $CHR3 #>  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 #>  26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 #>  51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
In the same way, we can access the output sequences:
$sequences$CHR1 CHR_mks#> #> Number of markers: 9 #> Markers in the sequence: #> SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 #> #> Parameters not estimated. # or $sequences[] CHR_mks#> #> Number of markers: 9 #> Markers in the sequence: #> SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 #> #> Parameters not estimated.
For this function, optional arguments are
max.rf, which define thresholds to be used when assigning markers to linkage groups. If none is provided (default), criteria previously defined for the object
rf_2pts are used.
In our example, all markers without genomic information grouped with all chromosomes, then this approach did not give us more information about grouping and from here we could follow the same strategy did in Ordering markers within linkage groups.
If you have some information about the order of the markers, for example, from a reference genome or previously published paper, you can define a sequence of those markers in a specific order (using the function
make_seq) and then use the function
map to estimate the final genetic map (based on multi-point analysis). For example, let us assume that we know that the following markers are ordered in this sequence:
M22. In this case, select them from the two-point analysis, and use function
<- make_seq(twopts_f2, c(47, 38, 59, 16, 62, 21, 20, 48, 22))LG3seq_f2
Warning: If you find an error message like:
Error in as_mapper(.f, ...) : argument ".f" is missing, with no default
It’s because the
map function has a very common name and you can have in your environment another function with the same name. In the case of the presented error, R is using the
map function from
purrr package instead of
OneMap, to solve this, simply specify that you want the
OneMap function with
:: command from
library(stringr) <- onemap::map(LG3seq_f2)) (LG3seq_f2_map #> #> Printing map: #> #> Markers Position Parent 1 Parent 2 #> #> 47 M47 0.00 a | | b a | | b #> 38 M38 14.69 a | | b a | | b #> 59 M59 24.45 a | | o o | | o #> 16 M16 38.96 a | | b a | | b #> 62 M62 49.22 a | | b a | | b #> 21 M21 56.19 a | | b a | | b #> 20 M20 65.10 o | | o a | | o #> 48 M48 73.14 a | | b a | | b #> 22 M22 81.80 a | | b a | | b #> #> 9 markers log-likelihood: -1004.624
NOTE: (new!) If your map population is F2 intercross and your sequence has many markers (more than 60), we suggest to speed up
map using BatchMap parallelization approach. See section
Speed up analysis with parallelization for more information.
This is a subset of the first linkage group. When used this way, the
map function searches for the best combination of phases between markers and prints the results.
To see the correspondence between marker names and numbers, use function
marker_type(LG3seq_f2_map) #> Marker Marker.name Type #> 1 47 M47 A.H.B #> 2 38 M38 A.H.B #> 3 59 M59 C.A #> 4 16 M16 A.H.B #> 5 62 M62 A.H.B #> 6 21 M21 A.H.B #> 7 20 M20 D.B #> 8 48 M48 A.H.B #> 9 22 M22 A.H.B
If one needs to add or drop markers from a predefined sequence, functions
drop_marker can be used. For example, to add markers
M50 at the end of
<- add_marker(LG3seq_f2_map, c(18, 56, 50))) (LG3seq_f2_map #> #> Number of markers: 12 #> Markers in the sequence: #> M47 M38 M59 M16 M62 M21 M20 M48 M22 M18 M56 M50 #> #> Parameters not estimated.
<- drop_marker(LG3seq_f2_map, c(59, 21))) (LG3seq_f2_map #> #> Number of markers: 10 #> Markers in the sequence: #> M47 M38 M16 M62 M20 M48 M22 M18 M56 M50 #> #> Parameters not estimated.
Once all linkage groups were obtained using both strategies, we can draw a map for each strategy using the function
draw_map. Since version 2.1.1007,
OneMap has a new version of
draw_map2. The new function draws elegant linkage groups and presents new arguments to personalize your draw.
If you prefer the old function, we also keep it. Follow examples of how to use both of them.
We can draw a genetic map for all linkage groups using the function
draw_map. First, we have to create a list of ordered linkage groups:
<- list(LG1_f2_final, LG2_f2_final, LG3_f2_final)maps_list
Then use function
draw_map for this list:
draw_map(maps_list, names = TRUE, grid = TRUE, cex.mrk = 0.7)
We also can draw a map for a specific linkage group:
draw_map(LG1_f2_final, names = TRUE, grid = TRUE, cex.mrk = 0.7)
draw_map draws a very simple graphic representation of the genetic map. More recently, we developed a new version called
draw_map2 that draws a more sophisticated figure. Furthermore, once the distances and the linkage phases are estimated, other map figures can be drawn by the user with any appropriate software.
The same figures did with
draw_map can be done with the
draw_map2 function. But it has different capacities and arguments. Here are some examples, but you can find more options on the help page
Let’s draw all three groups built:
draw_map2(LG1_f2_final, LG2_f2_final, LG3_f2_final main = "Only linkage information", group.names = c("LG1", "LG2", "LG3"), output = "map.eps")
NOTE: Check the GitHub vignette version to visualize the graphic.
You can include all
sequence objects referring to the groups as the first arguments. The
main argument defines the main title of the draw and
group.names define the names of each group. If no
output file and file extension is defined, the draw will be generated at your working directory as
map.eps. The eps extension is only the default option but there are others that can be used. You can have access to a list of them on the help page.
We also can draw a map for a specific linkage group:
draw_map2(LG1_f2_final, col.group = "#58A4B0", col.mark = "#335C81", output = "map_LG1.pdf")
NOTE: Check the GitHub vignette version to visualize the graphic.
You can also change the default colors using the
tag you can highlight some markers at the figure according to your specific purpose.
Warning: Only available for outcrossing and f2 intercross populations.
As already mentioned,
OneMap uses HMM multipoint approach to estimate genetic distances, a very robust method, but it can take time to run if you have many markers. In 2017, Schiffthaler et. al release an
OneMap fork with modifications in CRAN and in GitHub with the possibility of parallelizing the HMM chain dividing markers in batches and use different cores for each phase. Their approach speeds up our HMM and keeps the genetic distances estimation quality. It allows dividing the job into a maximum of four cores according to the four possible phases for outcrossing and f2 mapping populations. We add this parallelized approach to the functions:
ug. For better efficiency it is important that batches are composed of 50 markers or more, therefore, this approach is only recommended for linkage groups with many markers.
The parallelization is here available for all types of operational systems, however, we suggest setting argument
FORK if you are not using Windows system. It will improve the procedure speed.
Here we will show an example of how to use the BatchMap approach in some functions that requires HMM. For this, we simulated a dataset with a group with 300 markers (we don’t want this vignette to take too much time to run, but usually maps with markers from high-throughput technologies result in larger groups). Before start, you can see the time spent for each approach (See also Session Info) in this example:
|Without parallelization (h)||With parallelization (h)|
<- read_onemap(system.file("extdata/simParall_f2.raw", package = "onemap")) # dataset available only in onemap github version simParallel plot(simParallel)
# Calculates two-points recombination fractions <- rf_2pts(simParallel) twopts <- make_seq(twopts, "all") seq_all # There are no redundant markers find_bins(simParallel) # There are no distorted markers print(test_segregation(simParallel)) # Not shown
To prepare the data with defined bach size we use function
pick_batch_sizes. It selects a batch size that splits the data into even groups. Argument
size defines the batch size next to which an optimum size will be searched.
overlap defines the number of markers that overlap between the present batch and next. This is used because pre-defined phases at these overlap markers in the present batch are used to start the HMM in the next batch. The
around argument defines how much the function can vary around the defined number in
size to search for the optimum batch size.
Some aspects should be considered to define these arguments because if the batch size were set too high, there would be less gain in execution time. If the overlap size would be too small, phases would be incorrectly estimated and large gaps would appear in the map, inflating its size. In practice, these values will depend on many factors such as population size, marker quality, and species. BatchMap authors recommended trying several configurations on a subset of data and select the best performing one.
<- pick_batch_sizes(input.seq = seq_all, batch_size size = 80, overlap = 30, around = 10) batch_size
To use the parallelized approach you just need to include the arguments when using the functions:
# Without parallelization <- rcd(input.seq = seq_all) rcd_map # With parallelization <- rcd(input.seq = seq_all, rcd_map_par phase_cores = 4, size = batch_size, overlap = 30)
# Without parallelization <- record(input.seq = seq_all) record_map # With parallelization <- record(input.seq = seq_all, record_map_par phase_cores = 4, size = batch_size, overlap = 30)
# Without parallelization <- ug(input.seq = seq_all) ug_map # With parallelization <- ug(input.seq = seq_all, ug_map_par phase_cores = 4, size = batch_size, overlap = 30)
# Without parallelization ok <- mds_onemap(input.seq = seq_all) map_mds # With parallelization <- mds_onemap(input.seq = seq_all, map_mds_par phase_cores = 4, size = batch_size, overlap = 30)
Because we simulate this dataset we know the correct order. We can use
map_overlapping_batches to estimate genetic distance in this case. This is equivalent to
map, but with the parallelized process.
map, using argument
rm_unlinked = TRUE the function will return a vector with marker numbers without the problematic marker. To repeat the analysis removing automatically all problematic markers use
# Without parallelization <- map_avoid_unlinked(input.seq = seq_all) batch_map # With parallelization <- map_avoid_unlinked(input.seq = seq_all, batch_map_par size = batch_size, phase_cores = 4, overlap = 30)
As you can see in the above maps, heuristic ordering algorithms do not return an optimal order result, mainly if you don’t have many individuals in your population. Because of the erroneous order, generated map sizes are not close to the simulated size (100 cM) and their heatmaps don’t present the expected color pattern. Two of them get close to the color pattern, they are the ug and the MDS method. They present good global ordering but not local. If you have a reference genome, you can use its position information to rearrange local ordering. # Export estimated progeny haplotypes (new!)
progeny_haplotypes generates a data.frame with progeny phased haplotypes estimated by
OneMap HMM. For progeny, the HMM results in probabilities for each possible genotype, then the generated data.frame contains all possible genotypes. If
most_likely = TRUE the most likely genotype receives 1 and the rest 0 (if there are two most likely both receive 0.5), if
most_likely = FALSE genotypes probabilities will be according to the HMM results. You can choose which individual to be evaluated in
ind. The data.frame is composed of the information: individual (ind) and group (grp) ID, position in centimorgan (pos), progeny homologs (homologs), and from each parent the allele came (parents).
<- progeny_haplotypes(LG1_f2_final, most_likely = TRUE, ind = 2, group_names = "LG1_final")) (progeny_haplot #> ind marker grp pos prob progeny.homologs parents allele #> 1 IND2 SNP24 LG1_final 0.000000 1 H1 P1 H1_P1 #> 2 IND2 M33 LG1_final 5.532529 1 H1 P1 H1_P1 #> 3 IND2 SNP26 LG1_final 6.972824 1 H1 P1 H1_P1 #> 4 IND2 M37 LG1_final 13.329682 1 H1 P1 H1_P1 #> 5 IND2 M8 LG1_final 19.150158 1 H1 P1 H1_P1 #> 6 IND2 M51 LG1_final 30.869235 0 H1 P1 H1_P1 #> 7 IND2 M5 LG1_final 34.616980 0 H1 P1 H1_P1 #> 8 IND2 M25 LG1_final 37.239805 0 H1 P1 H1_P1 #> 9 IND2 M61 LG1_final 42.791710 0 H1 P1 H1_P1 #> 10 IND2 M66 LG1_final 44.474420 0 H1 P1 H1_P1 #> 11 IND2 M54 LG1_final 51.417445 0 H1 P1 H1_P1 #> 12 IND2 M45 LG1_final 53.927475 0 H1 P1 H1_P1 #> 13 IND2 M32 LG1_final 58.953438 0 H1 P1 H1_P1 #> 14 IND2 M43 LG1_final 63.283207 0 H1 P1 H1_P1 #> 15 IND2 M11 LG1_final 66.798033 0 H1 P1 H1_P1 #> 16 IND2 M2 LG1_final 69.577310 0 H1 P1 H1_P1 #> 17 IND2 M10 LG1_final 77.018309 0 H1 P1 H1_P1 #> 18 IND2 M41 LG1_final 85.500234 0 H1 P1 H1_P1 #> 19 IND2 SNP24 LG1_final 0.000000 0 H1 P2 H1_P2 #> 20 IND2 M33 LG1_final 5.532529 0 H1 P2 H1_P2 #> 21 IND2 SNP26 LG1_final 6.972824 0 H1 P2 H1_P2 #> 22 IND2 M37 LG1_final 13.329682 0 H1 P2 H1_P2 #> 23 IND2 M8 LG1_final 19.150158 0 H1 P2 H1_P2 #> 24 IND2 M51 LG1_final 30.869235 1 H1 P2 H1_P2 #> 25 IND2 M5 LG1_final 34.616980 1 H1 P2 H1_P2 #> 26 IND2 M25 LG1_final 37.239805 1 H1 P2 H1_P2 #> 27 IND2 M61 LG1_final 42.791710 1 H1 P2 H1_P2 #> 28 IND2 M66 LG1_final 44.474420 1 H1 P2 H1_P2 #> 29 IND2 M54 LG1_final 51.417445 1 H1 P2 H1_P2 #> 30 IND2 M45 LG1_final 53.927475 1 H1 P2 H1_P2 #> 31 IND2 M32 LG1_final 58.953438 1 H1 P2 H1_P2 #> 32 IND2 M43 LG1_final 63.283207 1 H1 P2 H1_P2 #> 33 IND2 M11 LG1_final 66.798033 1 H1 P2 H1_P2 #> 34 IND2 M2 LG1_final 69.577310 1 H1 P2 H1_P2 #> 35 IND2 M10 LG1_final 77.018309 1 H1 P2 H1_P2 #> 36 IND2 M41 LG1_final 85.500234 1 H1 P2 H1_P2 #> 37 IND2 SNP24 LG1_final 0.000000 0 H2 P1 H2_P1 #> 38 IND2 M33 LG1_final 5.532529 0 H2 P1 H2_P1 #> 39 IND2 SNP26 LG1_final 6.972824 0 H2 P1 H2_P1 #> 40 IND2 M37 LG1_final 13.329682 0 H2 P1 H2_P1 #> 41 IND2 M8 LG1_final 19.150158 0 H2 P1 H2_P1 #> 42 IND2 M51 LG1_final 30.869235 0 H2 P1 H2_P1 #> 43 IND2 M5 LG1_final 34.616980 0 H2 P1 H2_P1 #> 44 IND2 M25 LG1_final 37.239805 0 H2 P1 H2_P1 #> 45 IND2 M61 LG1_final 42.791710 0 H2 P1 H2_P1 #> 46 IND2 M66 LG1_final 44.474420 0 H2 P1 H2_P1 #> 47 IND2 M54 LG1_final 51.417445 0 H2 P1 H2_P1 #> 48 IND2 M45 LG1_final 53.927475 0 H2 P1 H2_P1 #> 49 IND2 M32 LG1_final 58.953438 0 H2 P1 H2_P1 #> 50 IND2 M43 LG1_final 63.283207 0 H2 P1 H2_P1 #> 51 IND2 M11 LG1_final 66.798033 0 H2 P1 H2_P1 #> 52 IND2 M2 LG1_final 69.577310 0 H2 P1 H2_P1 #> 53 IND2 M10 LG1_final 77.018309 0 H2 P1 H2_P1 #> 54 IND2 M41 LG1_final 85.500234 0 H2 P1 H2_P1 #> 55 IND2 SNP24 LG1_final 0.000000 1 H2 P2 H2_P2 #> 56 IND2 M33 LG1_final 5.532529 1 H2 P2 H2_P2 #> 57 IND2 SNP26 LG1_final 6.972824 1 H2 P2 H2_P2 #> 58 IND2 M37 LG1_final 13.329682 1 H2 P2 H2_P2 #> 59 IND2 M8 LG1_final 19.150158 1 H2 P2 H2_P2 #> 60 IND2 M51 LG1_final 30.869235 1 H2 P2 H2_P2 #> 61 IND2 M5 LG1_final 34.616980 1 H2 P2 H2_P2 #> 62 IND2 M25 LG1_final 37.239805 1 H2 P2 H2_P2 #> 63 IND2 M61 LG1_final 42.791710 1 H2 P2 H2_P2 #> 64 IND2 M66 LG1_final 44.474420 1 H2 P2 H2_P2 #> 65 IND2 M54 LG1_final 51.417445 1 H2 P2 H2_P2 #> 66 IND2 M45 LG1_final 53.927475 1 H2 P2 H2_P2 #> 67 IND2 M32 LG1_final 58.953438 1 H2 P2 H2_P2 #> 68 IND2 M43 LG1_final 63.283207 1 H2 P2 H2_P2 #> 69 IND2 M11 LG1_final 66.798033 1 H2 P2 H2_P2 #> 70 IND2 M2 LG1_final 69.577310 1 H2 P2 H2_P2 #> 71 IND2 M10 LG1_final 77.018309 1 H2 P2 H2_P2 #> 72 IND2 M41 LG1_final 85.500234 1 H2 P2 H2_P2
You can also have a view of progeny estimated haplotypes using
plot. It shows which markers came from each parent’s homologs.
position argument defines if haplotypes will be plotted by homologs (
stack) or alleles (
split option is a good way to better view the likelihoods of each allele.
plot(progeny_haplot, position = "stack")
plot(progeny_haplot, position = "split")
At this point, it should be clear that any potential
OneMap user must have some knowledge about genetic mapping and also the R language, because the analysis is not done with only one mouse click. In the future, perhaps a graphical interface will be made available to make this software is a lot easier to use.
We do hope that
OneMap is useful to researchers interested in genetic mapping in outcrossing or inbred-based populations. Any suggestions and critics are welcome.
sessionInfo() #> R version 4.1.0 (2021-05-18) #> Platform: x86_64-w64-mingw32/x64 (64-bit) #> Running under: Windows 10 x64 (build 22000) #> #> Matrix products: default #> #> locale: #>  LC_COLLATE=C #>  LC_CTYPE=English_United States.1252 #>  LC_MONETARY=English_United States.1252 #>  LC_NUMERIC=C #>  LC_TIME=English_United States.1252 #> #> attached base packages: #>  stats graphics grDevices utils datasets methods base #> #> other attached packages: #>  stringr_1.4.0 onemap_2.8.2 #> #> loaded via a namespace (and not attached): #>  minqa_1.2.4 colorspace_2.0-2 ggsignif_0.6.3 #>  ellipsis_0.3.2 class_7.3-19 htmlTable_2.3.0 #>  base64enc_0.1-3 rstudioapi_0.13 proxy_0.4-26 #>  mice_3.14.0 farver_2.1.0 ggpubr_0.4.0 #>  fansi_0.5.0 codetools_0.2-18 splines_4.1.0 #>  doParallel_1.0.16 knitr_1.36 Formula_1.2-4 #>  polynom_1.4-0 jsonlite_1.7.2 nloptr_22.214.171.124 #>  broom_0.7.10 cluster_2.1.2 png_0.1-7 #>  compiler_4.1.0 httr_1.4.2 backports_1.2.1 #>  assertthat_0.2.1 Matrix_1.3-3 fastmap_1.1.0 #>  lazyeval_0.2.2 htmltools_0.5.2 tools_4.1.0 #>  gtable_0.3.0 glue_1.4.2 rebus.base_0.0-3 #>  reshape2_1.4.4 dplyr_1.0.7 Rcpp_1.0.7 #>  carData_3.0-4 jquerylib_0.1.4 vctrs_0.3.8 #>  ape_5.5 gdata_2.18.0 nlme_3.1-152 #>  iterators_1.0.13 pinfsc50_1.2.0 xfun_0.24 #>  rebus.datetimes_0.0-1 lme4_1.1-27.1 lifecycle_1.0.1 #>  rebus.numbers_0.0-1 weights_1.0.4 gtools_3.9.2 #>  rstatix_0.7.0 dendextend_1.15.2 princurve_2.1.6 #>  candisc_0.8-6 MASS_7.3-54 scales_1.1.1 #>  heplots_1.3-9 parallel_4.1.0 smacof_2.1-3 #>  RColorBrewer_1.1-2 yaml_2.2.1 gridExtra_2.3 #>  ggplot2_3.3.5 sass_0.4.0 rpart_4.1-15 #>  latticeExtra_0.6-29 stringi_1.6.2 highr_0.9 #>  foreach_1.5.1 plotrix_3.8-2 permute_0.9-5 #>  e1071_1.7-9 checkmate_2.0.0 boot_1.3-28 #>  rlang_0.4.11 pkgconfig_2.0.3 evaluate_0.14 #>  lattice_0.20-44 purrr_0.3.4 labeling_0.4.2 #>  htmlwidgets_1.5.4 tidyselect_1.1.1 plyr_1.8.6 #>  magrittr_2.0.1 R6_2.5.1 generics_0.1.1 #>  nnls_1.4 Hmisc_4.6-0 DBI_1.1.1 #>  mgcv_1.8-35 pillar_1.6.4 foreign_0.8-81 #>  withr_2.4.3 rebus_0.1-3 survival_3.2-11 #>  abind_1.4-5 nnet_7.3-16 rebus.unicode_0.0-2 #>  tibble_3.1.2 crayon_1.4.2 car_3.0-12 #>  wordcloud_2.6 utf8_1.2.1 ellipse_0.4.2 #>  plotly_4.10.0 vcfR_1.12.0 rmarkdown_2.11 #>  viridis_0.6.2 jpeg_0.1-9 grid_4.1.0 #>  data.table_1.14.0 vegan_2.5-7 digest_0.6.27 #>  tidyr_1.1.3 munsell_0.5.0 viridisLite_0.4.0 #>  bslib_0.3.1
Adler, J. R in a Nutshell A Desktop Quick Reference, 2009.
Broman, K. W., Wu, H., Churchill, G., Sen, S., Yandell, B. qtl: Tools for analyzing QTL experiments R package version 1.09-43, 2008.
Buetow, K. H., Chakravarti, A. Multipoint gene mapping using seriation. I. General methods. American Journal of Human Genetics 41, 180-188, 1987.
Doerge, R.W. Constructing genetic maps by rapid chain delineation. Journal of Agricultural Genomics 2, 1996.
Garcia, A.A.F., Kido, E.A., Meza, A.N., Souza, H.M.B., Pinto, L.R., Pastina, M.M., Leite, C.S., Silva, J.A.G., Ulian, E.C., Figueira, A. and Souza, A.P. Development of an integrated genetic map of a sugarcane Saccharum spp. commercial cross, based on a maximum-likelihood approach for estimation of linkage and linkage phases. Theoretical and Applied Genetics 112, 298-314, 2006.
Haldane, J. B. S. The combination of linkage values and the calculation of distance between the loci of linked factors. Journal of Genetics 8, 299-309, 1919.
Jiang, C. and Zeng, Z.-B. Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101, 47-58, 1997.
Kosambi, D. D. The estimation of map distance from recombination values. Annuaire of Eugenetics 12, 172-175, 1944.
Lander, E. S. and Green, P. Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. USA 84, 2363-2367, 1987.
Lander, E.S., Green, P., Abrahanson, J., Barlow, A., Daly, M.J., Lincoln, S.E. and Newburg, L. MAPMAKER, An interactive computing package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1, 174-181, 1987.
Lincoln, S. E., Daly, M. J. and Lander, E. S. Constructing genetic linkage maps with MAPMAKER/EXP Version 3.0: a tutorial and reference manual. A Whitehead Institute for Biomedical Research Technical Report 1993.
Matloff, N. The Art of R Programming. 2011. 1st ed. San Francisco, CA: No Starch Press, Inc., 404 pages.
Margarido, G. R. A., Souza, A.P. and Garcia, A. A. F. OneMap: software for genetic mapping in outcrossing species. Hereditas 144, 78-79, 2007.
Mollinari, M., Margarido, G. R. A., Vencovsky, R. and Garcia, A. A. F. Evaluation of algorithms used to order markers on genetics maps. Heredity 103, 494-502, 2009.
Oliveira, K.M., Pinto, L.R., Marconi, T.G., Margarido, G.R.A., Pastina, M.M., Teixeira, L.H.M., Figueira, A.M., Ulian, E.C., Garcia, A.A.F., Souza, A.P. Functional genetic linkage map on EST-markers for a sugarcane (Saccharum spp.) commercial cross. Molecular Breeding 20, 189-208, 2007.
Oliveira, E. J., Vieira, M. L. C., Garcia, A. A. F., Munhoz, C. F.,Margarido, G. R.A., Consoli, L., Matta, F. P., Moraes, M. C., Zucchi, M. I., and Fungaro,M. H. P. An Integrated Molecular Map of Yellow Passion Fruit Based on Simultaneous Maximum-likelihood Estimation of Linkage and Linkage Phases J. Amer. Soc. Hort. Sci. 133, 35-41, 2008.
Tan, Y., Fu, Y. A novel method for estimating linkage maps. Genetics 173, 2383-2390, 2006.
Van Os H, Stam P, Visser R.G.F., Van Eck H.J. RECORD: a novel method for ordering loci on a genetic linkage map. Theor Appl Genet 112, 30-40, 2005.
Voorrips, R.E. MapChart: software for the graphical presentation of linkage maps and QTLs. Journal of Heredity 93, 77-78, 2002.
Wang S., Basten, C. J. and Zeng Z.-B. Windows QTL Cartographer 2.5. Department of Statistics, North Carolina State University, Raleigh, NC, 2010. https://brcwebportal.cos.ncsu.edu/qtlcart/
Wu, R., Ma, C.X., Painter, I. and Zeng, Z.-B. Simultaneous maximum likelihood estimation of linkage and linkage phases in outcrossing species. Theoretical Population Biology 61, 349-363, 2002a.
Wu, R., Ma, C.-X., Wu, S. S. and Zeng, Z.-B. Linkage mapping of sex-specific differences. Genetical Research 79, 85-96, 2002b.