About Me Goals and Personality






My name is Dr. Natascha Ivy-Israël and I recently graduated with high honors from Auburn University with a PhD in Wildlife Science. I am a firm believer in hard work, compassion and perseverance. My current goal is to find a job as a data scientist where I can learn from others, expand my knowledge base and skill set, and apply my deep research skills and innate curiosity. To learn more about me, please visit my Facebook profile and send me a friend request. If you want an insight into my personality, please call me for an interview. You may also click on the infographic for a summary of the results from an online personality profile test taken at talentoday.com. I scored as organized, ambitious, patient and determined.

Personality Radar

Education Formal Academic Training

I believe education is a life-long endeavor. I learn best from reading, practice and self-study, but I appreciate a good teacher. I enjoy learning anything new and I always excel in my academic pursuits. Education is my passion and I love spending time in the classroom interacting with others.

Graduated 8/2019
Overall GPA 4.0
PhD
  • PhD in Wildlife Sciences
  • Full Ride Academic Scholarship (Graduate Teaching Assistantship)
  • Graduated Summa Cum Laude
  • Download Transcript
Graduated 12/2013
Overall GPA 4.0
M.S.
  • Master of Science in Wildlife, Aquatic, and Wildlands Science and Management
  • Full Ride Academic Scholarship (Research Assistantship)
  • Graduated Summa Cum Laude
  • Biology Student Organization Secretary, 2012-2013
  • Download Transcript
Graduated 5/2011
Overall GPA 3.89
B.S.
  • Bachelor of Science in Biology
  • Magna Cum Laude
  • Fine Arts Minor
  • Phi Beta Kappa, Beta Beta Beta, Golden Key Honor Society
  • Download Transcript
Graduated 5/2007
Overall GPA 3.71
H.S. Diploma
  • Captain of the girls' volleyball team and of the coed volleyball team
  • Participated in cross country running and track & field
  • Played violin in orchestra
  • Participated in several school plays
  • Advanced mathematics, biology, chemistry
  • Download Transcript

Certifications Industry skills and training


Research Doctoral

My current research focuses on the influence of the Major Histocompatibility Complex on the reproductive success of breeding pairs in white-tailed deer (Odocoileus virginianus).

Abstract

Major histocompatibility complex (MHC) gene products can influence sexual selection through their impact on the vertebrate immune system. Individuals with greater MHC diversity are generally believed to have more effective immune systems, thereby allowing these individuals to allocate more resources towards growth and reproduction. However, maximum MHC diversity may be too costly for the individual, suggesting that maximum diversity is not always optimal. This research examined how MHC diversity, measured as pairwise allelic distances between two unlinked MHC type II loci (exon 2 for the classical antigen-binding protein MHC-DRB, exon 2 for the accessory protein MHC-DOB) influenced morphology (Chapter 2), annual reproductive success (Chapter 3), and pre- and post-copulatory selection (Chapter 4) in an enclosed white-tailed deer (Odocoileus virginianus) population in Alabama. To generate these allelic distances, we first sequenced the second exons of MHC-DRB and MHC-DOB on the MiSeq platform (Chapter 1). Since studies conducted with domestic ruminants found a unique MHC II gene structure in which MHC-DRB and MHC-DOB were separated by a recombination hotspot due to an ancestral chromosomal inversion, we also assessed the degree of linkage between these loci in white-tailed deer.

Downloads

  • Click here to download the original full proposal in Word format
  • Click here to download a poster on one of my dissertation chapters in PowerPoint format

Masters Research

My master's thesis examined water quality as affected by golden algae in the Pecos River basin in Texas and New Mexico.

Publication

Golden alga presence and abundance are inversely related to salinity in a high-salinity river ecosystem, Pecos River, USA

Natascha M.D. Israël, Matthew M. VanLandeghem, Shawn Denny, John Ingle, Reynaldo Patiño

Journal Homepage:
www.elsevier.com/locate/hal

Direct Link:
https://www.sciencedirect.com/science/article/pii/S1568988314001140

Download

  • Click here to download the article (pdf, 800K)
  • Click to download the entire thesis (pdf, 2MB)

Other Research

The School for Field Studies

Coursework:
  • Rainforest Ecology
  • Principles of Forest Management
  • Environmental Policy and Socioeconomic Values
Independent Directed Research:

How can we attract spiders to tropical restoration sites?

TCNJ Worm Lab

Duties:
  • Preparation and seeding of Nematode Growth Media (NGM) Plates, Luria-Bertani (LB) Plates, LB-Antibiotic Plates, etc.
  • Subject selection, preparation, dissection, etc.
  • Prepared reagents such as buffers, solutions, etc.

Elderkin Conservation Genetics Lab

Duties:
  • Mitochondrial DNA sequencing of freshwater mussels
  • Performed Polymerase Chain Reactions (PCR)
  • Trained to perform Gel Electrophoresis

Code Samples Practical Application

Formatting Data for Statistical Analysis


This program uses Python to extract pairwise allelic distances from distance matrices for two MHC II loci (DRB, DOB) and writes them to a CSV file for statistical analysis and comparison. Each matrix contains the number of nucleotide and amino acid differences between all characterized alleles, for DRB and DOB separately. The matrices were initially generated using Geneious. The alleles at these two loci for each individual were determined using bioinformatics in the first chapter of my dissertation. For an example of how this was achieved, see the Bash code sample. You can also check out my GitHub account for other Python code samples.

                                            
separator = ','

filename = "/mnt/c/Users/Natas/Desktop/Chapter4_distances.csv"
w = open(filename, "w")

#loading drb distances
filename = "/mnt/c/Users/Natas/Desktop/drb_distances.csv"
f = open(filename, "r")


drb_distances = {}
line = f.readline()
data = line[:-2].split(",");

for col in data:            
    key = col.replace("*", "_")            
    drb_distances[key] = {}    

lines = f.readlines()

for line in lines:
    
    data = line[:-2].split(",");
    idx = data[0].replace("*", "_")
    keyCount = 1
    
    for col in data[1:]:
        key = "DRB_" + "%02d" % keyCount
        
        if idx not in drb_distances[key]:
            drb_distances[key][idx] = {}
            
        drb_distances[key][idx] = col
        keyCount += 1

assert drb_distances["DRB_13"]["DRB_20"] == "16"
assert drb_distances["DRB_29"]["DRB_30"] == "23"

print("loaded drb successfully")

f.close()

#loading full dob distances
filename = "/mnt/c/Users/Natas/Desktop/full_dob_distances.csv"
f = open(filename, "r")


dob_full_distances = {}
line = f.readline()
data = line[:-2].split(",");

for col in data:
    key = col.replace("*", "_")            
    dob_full_distances[key] = {}    

lines = f.readlines()

for line in lines:
    
    data = line[:-2].split(",");
    idx = data[0].replace("*", "_")
    keyCount = 1
    
    for col in data[1:]:
        key = "DOB_" + "%02d" % keyCount
        
        if idx not in dob_full_distances[key]:
            dob_full_distances[key][idx] = {}
            
        dob_full_distances[key][idx] = col
        keyCount += 1            

assert dob_full_distances["DOB_04"]["DOB_07"] == "3"
assert dob_full_distances["DOB_10"]["DOB_11"] == "2"

f.close()

print("loaded full dob distances")


#loading exon dob distances
filename = "/mnt/c/Users/Natas/Desktop/dob_exon_distances.csv"
f = open(filename, "r")


dob_exon_distances = {}
line = f.readline()
data = line[:-2].split(",");

for col in data:
    key = col.replace("*", "_")[:-12]            
    dob_exon_distances[key] = {}    

lines = f.readlines()

for line in lines:
    
    data = line[:-2].split(",");
    idx = data[0].replace("*", "_")[:-12]    
    keyCount = 1
    
    for col in data[1:]:
        key = "DOB_" + "%02d" % keyCount
        
        if idx not in dob_exon_distances[key]:
            dob_exon_distances[key][idx] = {}
            
        dob_exon_distances[key][idx] = col
        keyCount += 1            

assert dob_exon_distances["DOB_04"]["DOB_07"] == "1"
assert dob_exon_distances["DOB_06"]["DOB_07"] == "2"

f.close()

print("loaded exon dob distances")

#load drb allele info from file
filename = "/mnt/c/Users/Natas/Desktop/chap4.csv"
f = open(filename, "r")
w.write(f.readline()[:-2] + "mmDrbNuc, mmDrbAa, ffDrbNuc, ffDrbAa, m1f1DrbNuc, m1f1DrbAa, m1f2DrbNuc, m1f2DrbAa, m2f1DrbNuc, m2f1DrbAa, m2f2DrbNuc, m2f2DrbAa, mmDobNuc, mmDobAa, ffDobNuc, ffDobAa, m1f1DobNuc, m1f1DobAa, m1f2DobNuc, m1f2DobAa, m2f1DobNuc, m2f1DobAa, m2f2DobNuc, m2f2DobAa \n")
lines = f.readlines()
count = 0

for line in lines:
    
    count += 1
    
    data = line[:-2].split(",")
   
    mDrb_a1 = data[13]
    mDrb_a2 = data[14]

    fDrb_a1 = data[15]
    fDrb_a2 = data[16]

    mDob_fa1 = data[17]
    mDob_fa2 = data[18]

    fDob_fa1 = data[19]
    fDob_fa2 = data[20]

    mDob_xa1 = data[21]
    mDob_xa2 = data[22]

    fDob_xa1 = data[23]
    fDob_xa2 = data[24]

    mmDrbNuc,mmDrbAa, ffDrbNuc,ffDrbAa, m1f1DrbNuc, m1f1DrbAa, m1f2DrbNuc, m1f2DrbAa, m2f1DrbNuc, m2f1DrbAa, m2f2DrbNuc, m2f2DrbAa = "", "", "", "", "", "", "", "", "", "", "", ""
    mmDobNuc,mmDobAa, ffDobNuc,ffDobAa, m1f1DobNuc, m1f1DobAa, m1f2DobNuc, m1f2DobAa, m2f1DobNuc, m2f1DobAa, m2f2DobNuc, m2f2DobAa = "", "", "", "", "", "", "", "", "", "", "", ""

        
    ################################################################
    #                        DRB
    ################################################################
    
    #find female-female distances for drb
    if int(fDrb_a1[-2:]) < int(fDrb_a2[-2:]):
        ffDrbNuc = drb_distances[fDrb_a1][fDrb_a2]
        ffDrbAa = drb_distances[fDrb_a2][fDrb_a1]
    else:
        ffDrbNuc = drb_distances[fDrb_a2][fDrb_a1]
        ffDrbAa = drb_distances[fDrb_a1][fDrb_a2]

    if "-" in ffDrbNuc:
        ffDrbNuc = "0"
    if "-" in ffDrbAa:
        ffDrbAa = "0"

    assert int(ffDrbNuc) >= int(ffDrbAa)

    #find male-male distances for drb
    if int(mDrb_a1[-2:]) < int(mDrb_a2[-2:]):
        mmDrbNuc = drb_distances[mDrb_a1][mDrb_a2]
        mmDrbAa = drb_distances[mDrb_a2][mDrb_a1]
    else:
        mmDrbNuc = drb_distances[mDrb_a2][mDrb_a1]
        mmDrbAa = drb_distances[mDrb_a1][mDrb_a2]

    if "-" in mmDrbNuc:
        mmDrbNuc = "0"
    if "-" in mmDrbAa:
        mmDrbAa = "0"

    assert int(mmDrbNuc) >= int(mmDrbAa)

    #find m1-f1 distances for drb
    if int(mDrb_a1[-2:]) > int(fDrb_a1[-2:]):
        m1f1DrbNuc = drb_distances[fDrb_a1][mDrb_a1]
        m1f1DrbAa = drb_distances[mDrb_a1][fDrb_a1]
    else:
        m1f1DrbNuc = drb_distances[mDrb_a1][fDrb_a1]
        m1f1DrbAa = drb_distances[fDrb_a1][mDrb_a1]

    if "-" in m1f1DrbNuc:
        m1f1DrbNuc = "0"
    if "-" in m1f1DrbAa:
        m1f1DrbAa = "0"

    assert int(m1f1DrbNuc) >= int(m1f1DrbAa)

    #find m1-f2 distances for drb
    if int(mDrb_a1[-2:]) > int(fDrb_a2[-2:]):
        m1f2DrbNuc = drb_distances[fDrb_a2][mDrb_a1]
        m1f2DrbAa = drb_distances[mDrb_a1][fDrb_a2]
    else:
        m1f2DrbNuc = drb_distances[mDrb_a1][fDrb_a2]
        m1f2DrbAa = drb_distances[fDrb_a2][mDrb_a1]

    if "-" in m1f2DrbNuc:
        m1f2DrbNuc = "0"
    if "-" in m1f2DrbAa:
        m1f2DrbAa = "0"

    assert int(m1f2DrbNuc) >= int(m1f2DrbAa)

    #find m2-f1 distances for drb
    if int(mDrb_a2[-2:]) > int(fDrb_a1[-2:]):
        m2f1DrbNuc = drb_distances[fDrb_a1][mDrb_a2]
        m2f1DrbAa = drb_distances[mDrb_a2][fDrb_a1]
    else:
        m2f1DrbNuc = drb_distances[mDrb_a2][fDrb_a1]
        m2f1DrbAa = drb_distances[fDrb_a1][mDrb_a2]

    if "-" in m2f1DrbNuc:
        m2f1DrbNuc = "0"
    if "-" in m2f1DrbAa:
        m2f1DrbAa = "0"

    assert int(m2f1DrbNuc) >= int(m2f1DrbAa)

    #find m2-f2 distances for drb
    if int(mDrb_a2[-2:]) > int(fDrb_a2[-2:]):
        m2f2DrbNuc = drb_distances[fDrb_a2][mDrb_a2]
        m2f2DrbAa = drb_distances[mDrb_a2][fDrb_a2]
    else:
        m2f2DrbNuc = drb_distances[mDrb_a2][fDrb_a2]
        m2f2DrbAa = drb_distances[fDrb_a2][mDrb_a2]

    if "-" in m2f2DrbNuc:
        m2f2DrbNuc = "0"
    if "-" in m2f2DrbAa:
        m2f2DrbAa = "0"

    assert int(m2f2DrbNuc) >= int(m2f2DrbAa)

    #print the last 20 lines

    ########################################################
    #                   DOB
    ########################################################
     
     #find female-female distances for Dob
    if int(fDob_fa1[-2:]) < int(fDob_fa2[-2:]):
        ffDobNuc = dob_full_distances[fDob_fa1][fDob_fa2]        
    else:
        ffDobNuc = dob_full_distances[fDob_fa2][fDob_fa1]        

    if int(fDob_xa1[-2:]) < int(fDob_xa2[-2:]):
        ffDobAa = dob_exon_distances[fDob_xa1][fDob_xa2]
    else:
        ffDobAa = dob_exon_distances[fDob_xa2][fDob_xa1]

    if "-" in ffDobNuc:
        ffDobNuc = "0"
    if "-" in ffDobAa:
        ffDobAa = "0"

    if not int(ffDobNuc) >= int(ffDobAa):
        print(ffDobNuc, ffDobAa)
        
    assert int(ffDobNuc) >= int(ffDobAa)

    #find male-male distances for Dob
    if int(mDob_fa1[-2:]) < int(mDob_fa2[-2:]):
        mmDobNuc = dob_full_distances[mDob_fa1][mDob_fa2]        
    else:
        mmDobNuc = dob_full_distances[mDob_fa2][mDob_fa1]        

    if int(mDob_xa1[-2:]) < int(mDob_xa2[-2:]):
        mmDobAa = dob_exon_distances[mDob_xa1][mDob_xa2]
    else:
        mmDobAa = dob_exon_distances[mDob_xa2][mDob_xa1]
    

    if "-" in mmDobNuc:
        mmDobNuc = "0"
    if "-" in mmDobAa:
        mmDobAa = "0"

    assert int(mmDobNuc) >= int(mmDobAa)

    #find m1-f1 distances for Dob
    if int(mDob_fa1[-2:]) > int(fDob_fa1[-2:]):
        m1f1DobNuc = dob_full_distances[fDob_fa1][mDob_fa1]        
    else:
        m1f1DobNuc = dob_full_distances[mDob_fa1][fDob_fa1]        

    if int(mDob_xa1[-2:]) < int(fDob_xa1[-2:]):
        m1f1DobAa = dob_exon_distances[mDob_xa1][fDob_xa1]
    else:
        m1f1DobAa = dob_exon_distances[fDob_xa1][mDob_xa1]

    if "-" in m1f1DobNuc:
        m1f1DobNuc = "0"
    if "-" in m1f1DobAa:
        m1f1DobAa = "0"

    #assert int(m1f1DobNuc) >= int(m1f1DobAa)

    #find m1-f2 distances for Dob
    if int(mDob_fa1[-2:]) > int(fDob_fa2[-2:]):
        m1f2DobNuc = dob_full_distances[fDob_fa2][mDob_fa1]
    
    else:
        m1f2DobNuc = dob_full_distances[mDob_fa1][fDob_fa2]
    

    if int(mDob_xa1[-2:]) > int(fDob_xa2[-2:]):
        m1f2DobAa = dob_exon_distances[mDob_xa1][fDob_xa2]
    else:
        m1f2DobAa = dob_exon_distances[fDob_xa2][mDob_xa1]

    if "-" in m1f2DobNuc:
        m1f2DobNuc = "0"
    if "-" in m1f2DobAa:
        m1f2DobAa = "0"

    #assert int(m1f2DobNuc) >= int(m1f2DobAa)

    #find m2-f1 distances for Dob
    if int(mDob_fa2[-2:]) > int(fDob_fa1[-2:]):
        m2f1DobNuc = dob_full_distances[fDob_fa1][mDob_fa2]
        
    else:
        m2f1DobNuc = dob_full_distances[mDob_fa2][fDob_fa1]
        

    if int(mDob_xa2[-2:]) > int(fDob_xa1[-2:]):
        m2f1DobAa = dob_exon_distances[mDob_xa2][fDob_xa1]
    else:
        m2f1DobAa = dob_exon_distances[fDob_xa1][mDob_xa2]

    if "-" in m2f1DobNuc:
        m2f1DobNuc = "0"
    if "-" in m2f1DobAa:
        m2f1DobAa = "0"

    #assert int(m2f1DobNuc) >= int(m2f1DobAa)

    #find m2-f2 distances for Dob
    if int(mDob_fa2[-2:]) > int(fDob_fa2[-2:]):
        m2f2DobNuc = dob_full_distances[fDob_fa2][mDob_fa2]
        
    else:
        m2f2DobNuc = dob_full_distances[mDob_fa2][fDob_fa2]
        

    if int(mDob_xa2[-2:]) > int(fDob_xa2[-2:]):
        m2f2DobAa = dob_exon_distances[mDob_xa2][fDob_xa2]
    else:
        m2f2DobAa = dob_exon_distances[fDob_xa2][mDob_xa2]

    if "-" in m2f2DobNuc:
        m2f2DobNuc = "0"
    if "-" in m2f2DobAa:
        m2f2DobAa = "0"

    #assert int(m2f2DobNuc) >= int(m2f2DobAa)


    #####################################################################

    if count >= len(lines) - 20:
        print("DRB", mmDrbNuc, mmDrbAa, ffDrbNuc, ffDrbAa, m1f1DrbNuc, m1f1DrbAa, m1f2DrbNuc, m1f2DrbAa, m2f1DrbNuc, m2f1DrbAa, m2f2DrbNuc, m2f2DrbAa, "::::DOB", mmDobNuc, mmDobAa, ffDobNuc, ffDobAa, m1f1DobNuc, m1f1DobAa, m1f2DobNuc, m1f2DobAa, m2f1DobNuc, m2f1DobAa, m2f2DobNuc, m2f2DobAa)

    ALL = mmDrbNuc, mmDrbAa, ffDrbNuc, ffDrbAa, m1f1DrbNuc, m1f1DrbAa, m1f2DrbNuc, m1f2DrbAa, m2f1DrbNuc, m2f1DrbAa, m2f2DrbNuc, m2f2DrbAa, mmDobNuc, mmDobAa, ffDobNuc, ffDobAa, m1f1DobNuc, m1f1DobAa, m1f2DobNuc, m1f2DobAa, m2f1DobNuc, m2f1DobAa, m2f2DobNuc, m2f2DobAa

    Append = separator.join(ALL)
        
    w.write(line[:-2] + Append + "\n")
    

print("done")
f.close()
w.close()
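
The DRB lookups in the script above all follow one pattern, which can be sketched as a single helper. This is a minimal sketch, not the code used in the dissertation: the function names `allele_number` and `pair_distances` are hypothetical, the toy matrix is made up, and it assumes (as the DRB section does) that one triangle of the matrix holds nucleotide differences and the other holds amino acid differences.

```python
def allele_number(name):
    """Return the two-digit numeric suffix of an allele name such as 'DRB_13'."""
    return int(name[-2:])


def pair_distances(matrix, allele_a, allele_b):
    """Return (nucleotide, amino_acid) distances for one pair of alleles.

    Assumption: matrix[x][y] stores the nucleotide distance when x has the
    lower allele number, and the amino acid distance when x has the higher one.
    """
    low, high = sorted((allele_a, allele_b), key=allele_number)
    nuc = matrix[low][high]
    aa = matrix[high][low]
    # A "-" marks the diagonal, i.e. an allele compared with itself.
    nuc = "0" if "-" in nuc else nuc
    aa = "0" if "-" in aa else aa
    return int(nuc), int(aa)


# Made-up two-allele matrix, used only to show the call.
toy = {
    "DRB_01": {"DRB_01": "-", "DRB_02": "5"},
    "DRB_02": {"DRB_01": "2", "DRB_02": "-"},
}

print(pair_distances(toy, "DRB_02", "DRB_01"))  # argument order does not matter
print(pair_distances(toy, "DRB_01", "DRB_01"))  # same allele twice
```

Calling the helper once per pair (male-male, female-female, and the four male-female combinations) would replace the repeated if/else blocks, and the nucleotide >= amino acid check could be applied to each returned tuple.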
                                            
                                        

Batch Processing FASTQ Genetics Files


The following scripts were written to process, align and normalize the massive amounts of data stored in the popular genetics FASTQ file format. As part of my doctoral thesis, I extracted DNA from white-tailed deer and sent it to a lab for genetic sequencing. The lab did 1000 reads per individual and sent me the raw data files. I had to make sense of this data to determine if reproductive success is linked to the Major Histocompatibility Complex (MHC). A few simple Bash scripts using grep and regular expressions make short work of this otherwise daunting and cumbersome task. To download this file, click the icon.

                                                
### ORGANIZING DATA NEEDED FOR ANALYSES ###

#create a directory for each gene
mkdir /home/natascha/drb_combined
mkdir /home/natascha/dob

#change to the directory that contains natascha's DRB and DOB fastq files
cd /home/natascha/fastq_ni

#create a list of deer id's for which natascha has DRB sequenced fastq files
ls | grep "DeerMHC_R1.fastq" | cut -d "-" -f 2 > natascha_deermhc

#create a directory to filter out duplicates from caroline's fastq files and move there
mkdir /home/natascha/fastq_cm_unique
cd /home/natascha/fastq_cm_unique

#copy ALL of caroline's files to this directory, and the list of deer IDs for which Natascha has DRB sequence data
ls /home/natascha/fastq_cm/*.fastq | time parallel -j+0 --eta 'cp {} .'
cp /home/natascha/fastq_ni/natascha_deermhc .

#for each deer id in the list
while read i; #create a variable called i
do

rm 4792-"$i"-DeerMHC_R*.fastq #delete caroline's version of the duplicated deer id file

done < natascha_deermhc

#move to the directory that will house the list of unique drb files
cd /home/natascha/drb_combined

#copy all files from caroline's filtered list to this directory
ls /home/natascha/fastq_cm_unique/*.fastq | time parallel -j+0 --eta 'cp {} .' 

#copy all natascha's DRB files to this directory
ls /home/natascha/fastq_ni/*DeerMCH*.fastq | time parallel -j+0 --eta 'cp {} .'

#move to the dob directory and copy all the dob files from nataschas data
cd /home/natascha/dob
ls /home/natascha/fastq_ni/*DOB*.fastq | time parallel -j+0 --eta 'cp {} .'

### FASTQC AND TRIMMOMATIC ###

##  Run fastqc on All the raw DOB files in parallel
ls *R1.fastq | time parallel -j+0 --eta 'fastqc {}'
ls *R2.fastq | time parallel -j+0 --eta 'fastqc {}'

##  Run fastqc on all the raw DRB files in parallel
cd /home/natascha/drb_combined
ls *R1.fastq | time parallel -j+0 --eta 'fastqc {}'
ls *R2.fastq | time parallel -j+0 --eta 'fastqc {}'

#get a list of unique batch_id+deer_id for drb
ls | grep "R1.fastq" | sed 's/DeerMCH.*//' > deer_list   # cut only accepts a single-character delimiter, so sed strips the suffix instead

#for each file in the list
while read i;
do
#run trim, supplying the R1 and R2 files, and creating the R1 paired/unpaired and R2 paired/unpaired files
java -jar /opt/asn/apps/trimmomatic_0.35/Trimmomatic-0.35/trimmomatic-0.35.jar PE -threads 6 -phred33 "$i"DeerMCH_R1.fastq "$i"DeerMCH_R2.fastq "$i"DRB_R1_paired_threads.fastq "$i"DRB_R1_unpaired_threads.fastq "$i"DRB_R2_paired_threads.fastq "$i"DRB_R2_unpaired_threads.fastq LEADING:20 TRAILING:20 SLIDINGWINDOW:6:20 MINLEN:20 

#/mnt/c/Users/Natas/AppData/Local/Packages/Canonical

done < deer_list

############### Now assess Quality again
#fastqc on the cleaned paired fastq files in parallel for DRB
ls *_R1_paired_threads.fastq | parallel -j+0  --eta 'fastqc {}'
ls *_R2_paired_threads.fastq | parallel -j+0  --eta 'fastqc {}'

##### Make a directory for my results in my home folder
#cd ..
#mkdir /home/natascha/drb_trimmed
#cd drb_trimmed
#ls /home/natascha/drb_combined/*paired_threads.fastq | time parallel -j+0 --eta 'cp {} .'

#change to the DOB directory and create a list of files to trim
cd /home/natascha/dob
ls | grep "R1.fastq" | sed 's/DOB.*//' > deer_list   # cut only accepts a single-character delimiter, so sed strips the suffix instead

#for each file in the list
while read i;
do

#run trim, supplying the R1 and R2 files, and creating the R1 paired/unpaired and R2 paired/unpaired files
java -jar /opt/asn/apps/trimmomatic_0.35/Trimmomatic-0.35/trimmomatic-0.35.jar PE -threads 6 -phred33 "$i"DOB_R1.fastq "$i"DOB_R2.fastq "$i"DOB_R1_paired_threads.fastq "$i"DOB_R1_unpaired_threads.fastq "$i"DOB_R2_paired_threads.fastq "$i"DOB_R2_unpaired_threads.fastq ILLUMINACLIP:AdaptersToTrim.fa:2:30:10 HEADCROP:10 LEADING:30 TRAILING:30 SLIDINGWINDOW:6:30 MINLEN:36

done < deer_list

############### Now assess Quality again
#fastqc on the cleaned paired fastq files in parallel for DOB
ls *_R1_paired_threads.fastq | parallel -j+0  --eta 'fastqc {}'
ls *_R2_paired_threads.fastq | parallel -j+0  --eta 'fastqc {}'

##### Make a directory for my results in my home folder
#cd ..
#mkdir /home/natascha/dob_trimmed
#cd dob_trimmed
#ls /home/natascha/dob/*paired_threads.fastq | time parallel -j+0 --eta 'cp {} .'



### ORGANIZING DATA FOR BWA & GATK AND PREPARING REQUIRED INDICES ###

#Make new directory for mapped DRB files and move into that directory
mkdir /home/natascha/DRB_Mapping
cd /home/natascha/DRB_Mapping

# Copy the cleaned paired reads to your working directory for mapping
cp  /home/natascha/drb_trimmed/*_paired*.fastq .

# Copy the reference gene exon 2 to your working directory
cp /home/natascha/Reference/DRB_ref_sequence.fasta .  

#Indexing reference library for BWA mapping:
        # -p is the prefix
        # -a is the algorithm (is), followed by the input file
bwa index -p WTD_DRB  -a is DRB_ref_sequence.fasta

# Required for gatk compatibility 
samtools faidx DRB_ref_sequence.fasta
picard CreateSequenceDictionary R=DRB_ref_sequence.fasta O=DRB_ref_sequence.dict

### PERFORMING ALIGNMENT WITH BWA AND GATK HAPLOTYPECALLER ###

####  Map paired files with BWA to WTD_DRB_allele01.Reference.fa using 6 threads
#### Example
	#bwa mem ref.fa read1.fq read2.fq > aln-pe.sam
	# -t is the number of threads

# Make list of unique batch and deer ID numbers for all paired.fastq files
# Example: 5495-200934-DRB_R1_paired_threads.fastq --> 5495-200934-DRB
ls | grep "R1_paired_threads.fastq" | cut -d "_" -f 1 > list

while read i; # for each file in the list
do
	pear -f ${i}_R1_paired_threads.fastq -r ${i}_R2_paired_threads.fastq -o ./PEAR/${i}_pear
	z=$(echo $i | tr -d "-")
	bwa mem -t 4 -R "@RG\tID:$z\tLB:$z\tPL:ILLUMINA\tPM:MISEQ\tSM:$z" WTD_DRB  ./PEAR/${i}_pear.assembled.fastq > ./PEAR/${i}_aligned_reads.sam
	## convert .sam to .bam and sort the alignments
	samtools view -@ 4 -bS ./PEAR/${i}_aligned_reads.sam  | samtools sort -@ 4 -o ./PEAR/${i}_aligned_reads_sorted.bam   # Example Input: batch-deerID-DRB_All.sam; Output: batch-deerID-DRB_sorted.bam
	## index the sorted .bam
	samtools index 	./PEAR/${i}_aligned_reads_sorted.bam
	## Tally counts of reads mapped to reference gene and calculate the stats.
	samtools idxstats   ./PEAR/${i}_aligned_reads_sorted.bam > ./PEAR/${i}_new_Counts.txt # count info
	samtools flagstat 	./PEAR/${i}_aligned_reads_sorted.bam > ./PEAR/${i}_new_Stats.txt # alignment summary statistics
	gatk HaplotypeCaller -R DRB_ref_sequence.fasta -I ./PEAR/${i}_aligned_reads_sorted.bam -ERC GVCF -O ./PEAR/${i}_aligned_reads_sorted_GVCF.vcf # generate vcf file for each individual, using the reference indexed above
	echo "-V" ${i}_aligned_reads_sorted_GVCF.vcf "\\" >> ./PEAR/vcommands
done < list	
	
#repeat all this for DOB files

### GATK VARIANT CALLING ###

# combine individual vcf files
#copy vcommands made in loop above and paste into command line for CombineGVCFs
gatk CombineGVCFs -R DRB_ref_sequence.fasta -V 4792-200825-DRB_aligned_reads_sorted_GVCF.vcf -V 4792-201209-DRB_aligned_reads_sorted_GVCF.vcf -V 5495-201709-DRB_aligned_reads_sorted_GVCF.vcf -O DRB_mergedGVCF.vcf

# run GenotypeGVCFs on combined vcf file
gatk GenotypeGVCFs -R DRB_ref_sequence.fasta -V DRB_mergedGVCF.vcf -O DRB_GVCF_finaloutput.vcf

### GATK VARIANT FILTERING ###

# separate SNPs and Indels into different vcf files using SelectVariants:
gatk SelectVariants -R DRB_ref_sequence.fasta -V DRB_GVCF_finaloutput.vcf -select-type-to-include SNP -O DRB_raw_SNPs.vcf
gatk SelectVariants -R DRB_ref_sequence.fasta -V DRB_GVCF_finaloutput.vcf -select-type-to-include INDEL -O DRB_raw_INDELs.vcf

# make SNP tables for SNPs and INDELS - first variant variables then GQ separately - used to assess how to filter variants & for making graphs in R studio
gatk VariantsToTable -R DRB_ref_sequence.fasta -V DRB_raw_SNPs.vcf -O DRB_raw_SNPs.table -F CHROM -F POS -F QUAL -F REF -F ALT -F QD -F SOR -F AN -F MQ -F MQRankSum -F ReadPosRankSum
gatk VariantsToTable -R DRB_ref_sequence.fasta -V DRB_raw_SNPs.vcf -O DRB_raw_SNPs_GQonly.table -GF GQ
# repeat for INDELs
# for graphs, paste GQ table in Excel - get average & median for each variant across all individuals

# generate statistics for unfiltered variants
vcftools --vcf DRB_raw_SNPs.vcf --TsTv-summary --out raw_SNPs_TsTv
vcftools --vcf DRB_raw_SNPs.vcf --het --out raw_SNPs_heterozygosity
vcftools --vcf DRB_raw_SNPs.vcf --hardy --out raw_SNPs_hardy
vcftools --vcf DRB_raw_SNPs.vcf --missing-site --out raw_SNPs_missing
vcftools --vcf DRB_raw_SNPs.vcf --singletons --out raw_SNPs_singletons
vcftools --vcf DRB_raw_SNPs.vcf --site-depth --out raw_SNPs_depth
# helpful link: https://vcftools.github.io/man_latest.html

# filter variants
gatk VariantFiltration -R DRB_ref_sequence.fasta -O DRB_SNPs_filtered.vcf -V DRB_raw_SNPs.vcf -filter-name "Monomorphic" -filter-expression "AF == 1.0" -filter-name "MissingRate" -filter-expression "AN < 700.0" -genotype-filter-name "LowGQ" -genotype-filter-expression "GQ < 90.0" -filter-name "SOR" -filter-expression "SOR > 6.0" --set-filtered-genotype-to-no-call true
#repeat for INDELs

# redo calculating missing rate to see how many individuals within a variant had lowGQ
vcftools --vcf DRB_SNPs_filtered.vcf --missing-site --out DRB_SNPs_filtered_missing
#repeat for INDELs

# filter out variants with LowGQ rate > 50% via position
gatk VariantFiltration -R DRB_ref_sequence.fasta -O DRB_SNPs_filtered2.vcf -V DRB_SNPs_filtered.vcf -filter-name "LowGQ" -filter-expression "POS == 11027"
#repeat for INDELs

# create vcf file that only contains variants that passed filtering
gatk SelectVariants -V DRB_TOTAL_filtered.vcf -O DRB_PASSED_variants.vcf --exclude-filtered true

# merge filtered SNP and indel vcf files
picard MergeVcfs I=DRB_filtered2_SNPs.vcf I=DRB_filtered_INDELs.vcf O=DRB_total_filtered_variants.vcf

# phasing genotypes with beagle
beagle gt=DRB_total_filtered_variants.vcf out=DRB_total_filtered_variants_phased
# then unzip .vcf.gz file
gunzip DRB_total_filtered_variants_phased.vcf.gz

# generate 2 haplotypes for each individual
vcfx fasta input=DRB_total_filtered_variants_phased.vcf output=DRB_haplotypes reference=DRB_ref_sequence.fasta

#repeat all this for DOB files
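
Two small steps in the pipeline above can also be sketched in Python for quick checks outside the shell: deriving the batch and deer ID prefix from a file name (the `ls | grep | cut -d "_" -f 1` step) and counting reads in a FASTQ file. This is an illustrative sketch only. The function names `sample_prefix` and `count_fastq_reads` are hypothetical, the toy reads are made up, and the counter assumes uncompressed FASTQ with exactly four lines per record.

```python
import io


def sample_prefix(filename):
    """Return the text before the first underscore (batch-deerID-locus)."""
    return filename.split("_", 1)[0]


def count_fastq_reads(handle):
    """Count records in a FASTQ stream, assuming exactly 4 lines per record."""
    n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; file may be truncated")
    return n_lines // 4


# Made-up two-read FASTQ content, used only to show the call.
toy_fastq = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n")
n_reads = count_fastq_reads(toy_fastq)

# File name taken from the example in the alignment section above.
prefix = sample_prefix("5495-200934-DRB_R1_paired_threads.fastq")

print(prefix, n_reads)  # prints: 5495-200934-DRB 2
```

Comparing the read count before and after trimming gives a quick sense of how many reads each individual lost to quality filtering.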


                                        
                                    

Maximum Lynx and Coyote Kill Rates With Varying Prey Density



I have experience using R to answer statistical questions and to create visualizations of data. Here is one example that uses bootstrap analysis in R to reveal insights about animal behavior and the predator/prey relationship.

Question

The purpose of this assignment was to analyze the difference in maximum lynx and coyote kill rates under varying prey (snowshoe hare) densities.

                                            
### import the boot package
utils:::menuInstallPkgs()
--- Please select a CRAN mirror for use in this session ---
Warning in install.packages(NULL, .libPaths()[1L], dependencies = NA, type = type) :
  'lib = "C:/Program Files/R/R-3.3.2/library"' is not writable
trying URL 'https://cran.cnr.berkeley.edu/bin/windows/contrib/3.3/boot_1.3-18.zip'
Content type 'application/zip' length 592353 bytes (578 KB)
downloaded 578 KB

package ‘boot’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\Natascha\AppData\Local\Temp\RtmpeoEYTi\downloaded_packages
> library(boot)

### import the data
> datum=read.csv(file.choose())
> head(datum)
   PreyDens CoyoteKill  LynxKill
1 0.0581923   0.522305 0.2232275
2 5.7209845   5.407257 6.1909726
3 8.1400914   2.334566 6.1543207
4 2.3738542   3.747407 4.5115388
5 9.2656310   4.770664 7.2036282
6 8.1604638   4.983919 5.9859710

### analyze the data

> parInit=list(a=1,b=1)
> CoyoteResults=nls(CoyoteKill~a*PreyDens/(b+PreyDens),data=datum,start=parInit)
> LynxResults=nls(LynxKill~a*PreyDens/(b+PreyDens),data=datum,start=parInit)

### a is the asymptotic value of the predator kill rate
### b is the prey density at which half the maximum prey kill rate is achieved

> summary(CoyoteResults)
Formula: CoyoteKill ~ a * PreyDens/(b + PreyDens)

Parameters:
  Estimate Std. Error t value Pr(>|t|)
a  2.02241    0.19528   10.36   <2e-16 ***
b -0.67892    0.02647  -25.65   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.893 on 97 degrees of freedom

Number of iterations to convergence: 18
Achieved convergence tolerance: 5.271e-06

> summary(LynxResults)
Formula: LynxKill ~ a * PreyDens/(b + PreyDens)

Parameters:
  Estimate Std. Error t value Pr(>|t|)
a   7.9740     0.1869   42.66   <2e-16 ***
b   2.0009     0.1674   11.96   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4496 on 97 degrees of freedom

Number of iterations to convergence: 7
Achieved convergence tolerance: 1.222e-07

### calculate the estimated difference in the asymptotes
> CoyoteA=summary(CoyoteResults)$parameters[1] #extracts the value of ‘a’ from the coyote regression
> LynxA=summary(LynxResults)$parameters[1] #extracts the value of ‘a’ from the lynx regression
> Difference=LynxA-CoyoteA #calculates the difference in asymptotes
> Difference ### reports the difference in asymptotes
[1] 5.951564

### write a function to resample and reanalyze the data

> bootFunc=function(bootData,repeats){
+ parInit=list(a=1,b=1)
+ CoyoteResults=nls(CoyoteKill~a*PreyDens/(b+PreyDens),start=parInit,data=bootData[repeats,])
+ LynxResults=nls(LynxKill~a*PreyDens/(b+PreyDens),start=parInit,data=bootData[repeats,])
+ CoyoteA=summary(CoyoteResults)$parameters[1]
+ LynxA=summary(LynxResults)$parameters[1]
+ results=LynxA-CoyoteA
+ return(results)
+ }

### create a bootstrapped object that runs the function 1000 times

> boot.results=boot(datum,bootFunc,R=1000)
> boot.results

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = datum, statistic = bootFunc, R = 1000)


Bootstrap Statistics :
    original   bias    std. error
t1* 5.951564 -2.36019    1.005657

### calculate the confidence interval on the data

> boot.ci(boot.results)

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL :
boot.ci(boot.out = boot.results)

Intervals :
Level      Normal              Basic
95%   ( 6.341, 10.283 )   ( 5.915,  9.225 )

Level     Percentile            BCa
95%   ( 2.678,  5.988 )   ( 5.932,  6.392 )
Calculations and Intervals on Original Scale
Warning : BCa Intervals used Extreme Quantiles
Some BCa intervals may be unstable
Warning messages:
1: In boot.ci(boot.results) :
  bootstrap variances needed for studentized intervals
2: In norm.inter(t, adj.alpha) :
  extreme order statistics used as endpoints


### Add comments here to interpret results
#are maximum lynx and coyote kill rates significantly different?

                                        
                                    

Answer

There is no value of 0 or below when you run sort(boot.results$t), so the two-sided p-value is bounded by p < (1/1000)*2 ⇒ p < 0.002. This means that the maximum lynx and coyote kill rates are significantly different from one another. The percentile confidence interval (2.678 – 5.988) includes the original difference estimate (5.952) and excludes 0. Since the values are positive, the asymptotic kill rate for lynx is greater than that for coyotes (the difference was calculated as lynx minus coyote). This means that specialist predators, such as lynx, reach greater values of prey killed/predator/day with increasing prey density than generalist predators, such as coyotes. In other words, lynx will kill more snowshoe hares than coyotes will when hare density increases. The plots are shown below using the code plot(boot.results).
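A minimal sketch of that p-value reasoning in R. The replicate values below are hypothetical stand-ins for boot.results$t, and the helper name boot_p_value is made up for illustration; it is not part of the boot package.

```r
# Two-sided bootstrap p-value (or its upper bound) for H0: difference = null.
boot_p_value <- function(t, null = 0) {
  n_rep <- length(t)
  # Count replicates on each side of the null value and keep the smaller tail.
  tail_count <- min(sum(t <= null), sum(t >= null))
  if (tail_count == 0) {
    # No replicate reaches the null value, so the p-value can only be bounded.
    list(p = 2 / n_rep, bound = TRUE)
  } else {
    list(p = min(1, 2 * tail_count / n_rep), bound = FALSE)
  }
}

# Hypothetical replicates (1000 positive values), NOT the real bootstrap output.
fake_t <- seq(2.5, 6, length.out = 1000)
res <- boot_p_value(fake_t)
res$p      # read as "p < res$p" whenever res$bound is TRUE
```

With the real analysis, boot.results$t would be passed in place of fake_t.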

Multivariate Statistical Analysis with SAS


Using the correlation matrix of job characteristics provided on the ISQS 6348 website, I fit parallel, tau-equivalent, and congeneric models in PROC CALIS in SAS. I then assessed the fit and true reliability of these models. Lastly, I performed factor analysis and principal components analysis on just the police applicant data provided on the ISQS 6348 website.

Below is the SAS code used and a few excerpts from the full paper. Please feel free to download the entire problem and solution.

                                            

Title1 "Parallel model";
*parallel;
proc calis residual;
var c1-c5;
lineqs
        c1 = 1 f1 + e1,
        c2 = 1 f1 + e2,
        c3 = 1 f1 + e3,
        c4 = 1 f1 + e4,
        c5 = 1 f1 + e5;
std
        e1-e5 = the1-the5,
        f1 = varf1;
        the5 = the1;
        the4 = the1;
        the3 = the1;
        the2 = the1;
run;
title1;


title2 "Tau-equivalent Model";
*tau-equivalent;
proc calis residual;
var c1-c5;
lineqs
        c1 = 1 f1 + e1,
        c2 = 1 f1 + e2,
        c3 = 1 f1 + e3,
        c4 = 1 f1 + e4,
        c5 = 1 f1 + e5;
std
        e1-e5 = the1-the5,
        f1 = varf1;
run;
title2;


title3 "Congeneric model";
*congeneric;
proc calis residual;
var c1-c5;
lineqs
        c1 = beta1 f1 + e1,
        c2 = beta2 f1 + e2,
        c3 = beta3 f1 + e3,
        c4 = beta4 f1 + e4,
        c5 = beta5 f1 + e5;
std
        e1-e5 = the1-the5,
        f1 = varf1;
run;
title3;

                                  
                                    

The above yields the following:

Fit statistic          Parallel   Tau-equivalent   Congeneric
Chi-squared             35.7672          28.1596      21.2661
Chi-squared df               13                9            4
Chi-squared p-value      0.0006           0.0009       0.0003
RMSR                     0.0314           0.0379       0.0204
RMSEA                    0.0473           0.0521       0.0742

Overall, which model is “best” depends on the fit statistics chosen, as some models perform well on some statistics and poorly on others. In this case, the parallel model appears to be the “best” because it meets the “good” fit criteria for the chi-squared value, RMSR, and RMSEA, whereas the tau-equivalent and congeneric models meet the “good” fit criteria for RMSR only.
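The chi-squared p-values in the table can be cross-checked from the reported statistics and degrees of freedom. The R sketch below does that with pchisq and also reports the chi-squared/df ratio. The ratio is an added illustration of one common rule-of-thumb fit index and is an assumption here, not necessarily a criterion used in the full paper.

```r
# Fit statistics copied from the table above.
fit <- data.frame(
  model = c("Parallel", "Tau-equivalent", "Congeneric"),
  chisq = c(35.7672, 28.1596, 21.2661),
  df    = c(13, 9, 4)
)

# Upper-tail probability of the chi-squared distribution gives the model p-value.
fit$p_value <- pchisq(fit$chisq, df = fit$df, lower.tail = FALSE)

# Chi-squared / df ratio (illustrative addition, not taken from the paper).
fit$ratio <- fit$chisq / fit$df

print(fit, digits = 4)
```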

There are four resulting reliability estimates: (1) the true reliability of the parallel model (which is equal to its Cronbach’s alpha); (2) the true reliability estimate of the tau-equivalent model (which is also equal to its Cronbach’s alpha); (3) the true reliability estimate of the congeneric model; (4) the Cronbach’s alpha for the congeneric model (which was slightly lower than its true reliability estimate). When comparing these four reliability values, it seems that the tau-equivalent model has the best reliability estimate, as both its true reliability estimate and Cronbach’s alpha are the highest among the three models.
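The relationship between Cronbach's alpha and the model-based ("true") reliability can be illustrated with a short R sketch. The loadings and error variances below are hypothetical, not the PROC CALIS estimates, and the model-based reliability uses the standard composite-reliability formula for a one-factor model, which is assumed here to correspond to the "true reliability" discussed above. All function names are made up for illustration.

```r
# Cronbach's alpha from a covariance (or correlation) matrix S.
cronbach_alpha <- function(S) {
  k <- ncol(S)
  (k / (k - 1)) * (1 - sum(diag(S)) / sum(S))
}

# Model-based reliability of the sum score under a one-factor model.
composite_reliability <- function(loadings, error_var, factor_var = 1) {
  true_var <- factor_var * sum(loadings)^2
  true_var / (true_var + sum(error_var))
}

# Covariance matrix implied by a one-factor model.
implied_cov <- function(loadings, error_var, factor_var = 1) {
  factor_var * tcrossprod(loadings) + diag(error_var)
}

# Hypothetical tau-equivalent model: equal loadings, free error variances.
tau_load <- rep(1, 5)
tau_err  <- c(0.5, 0.6, 0.7, 0.8, 0.9)
alpha_tau <- cronbach_alpha(implied_cov(tau_load, tau_err))
omega_tau <- composite_reliability(tau_load, tau_err)

# Hypothetical congeneric model: loadings differ across items.
con_load <- c(0.9, 0.8, 0.7, 0.6, 0.5)
con_err  <- 1 - con_load^2
alpha_con <- cronbach_alpha(implied_cov(con_load, con_err))
omega_con <- composite_reliability(con_load, con_err)

c(alpha_tau = alpha_tau, omega_tau = omega_tau,
  alpha_con = alpha_con, omega_con = omega_con)
```

Under equal loadings the two quantities coincide, while under unequal loadings alpha falls below the model-based value, which is the pattern described for the congeneric model.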


Key Skills Expertise

Technology

I have strong computer skills and can quickly learn any software program or system. I have some experience with computer programming and would like to gain more. I am also an experienced academic internet researcher. Below are just a few programs and technologies I am familiar with.

  • Python
  • PAST, R, SAS
  • BASH scripting
  • Geneious
  • MEGA (Molecular Evolutionary Genetics Analysis)
  • OpenMEE
  • IQ-TREE, FigTree, etc.
  • ArcMap/GIS
  • Internet Research
  • Microsoft Office Suite

Math

I enjoy math and I have a natural ability to understand numbers and calculations. I've had several advanced math courses and I'm very strong in statistical analysis and stochastic reasoning. I'm mentioned in the acknowledgments of a statistics textbook by Dr. Peter Westfall entitled Understanding Advanced Statistical Methods (ISBN-10: 1466512105). Below are a few subjects I'm particularly strong in.

  • Multivariate Statistics
  • Advanced Calculus
  • Algebra
  • Physics
  • Etc.

Communication

I am a gifted communicator. I speak several languages with a high degree of proficiency. I know how to listen and when to talk. I am also an experienced technical and academic writer. I speak both English and Dutch fluently. I can also understand German, Flemish, French and Afrikaans although I don't speak those languages fluently.

Tutoring

I have experience leading both individual and group tutoring sessions for pay and at the request of my classmates. I am blessed with a knack for explaining complicated topics in a way that is concise, intuitive, and easy to understand. Following are a few of the subjects I've tutored others in:

  • Statistics
  • Biochemistry
  • Anatomy
  • Physiology


Work Experience Employment History

Graduate Research Assistant

6/2018 - 8/2019
Auburn University
Auburn, AL
  • Performing bioinformatics on sequence data
  • Statistical analyses for dissertation
  • Responsible for leading deer darting
  • Organize undergraduate volunteers, handle tranquilizer drugs, collect blood and tissue samples, etc.
  • Extracting DNA from deer tissue samples

Graduate Teaching Assistant

1/2016 - 6/2018
Auburn University
Auburn, AL
  • Writing consultant for School of Forestry and Wildlife Science as part of Miller Writing Center of Auburn University
  • Writing consultant and editor at Solon Dixon Forestry Education Center during summers
  • Responsible for leading deer darting, which involved organizing undergraduate volunteers, handling tranquilizer drugs, collecting blood and tissue samples, etc.
  • Extracting DNA from deer tissue samples

Graduate Research Assistant

1/2012 - 12/2013
Texas Tech University
Lubbock, TX
  • Collect samples along Pecos River
  • Run laboratory analyses on water samples
  • Identify algae present in water samples (both for the project and concerned private landowners)
  • Work together with NMDGF, TPWD, and private landowners for access to sample collection locations along Pecos River
  • Statistical analyses for thesis

Assistant Manager

9/2008 - 8/2010
Hawknest, LLC.
Mahwah, NJ
  • 'A' circuit hunter/jumper equestrian facility operations management
  • Manage horse care, feeding and turnout schedules for 20+ horses
  • Design and implement a daily horse training program for 5+ horses
  • Assist in daily riding lessons as well as instructor development training
  • Logistics coordination with suppliers, farrier, vets, deliveries, etc.

Vet Tech/Assistant

12/2010 - 7/2011
Pompton Lakes Animal Hospital
Pompton Lakes, NJ
  • Prepare animals for surgeries and assist during surgery
  • Intake processing of animals including health assessments and questionnaires
  • Feeding, grooming, walking, etc. of animals in the clinic
  • Administer medications and injections
  • Manage and update computer records

Internships

  • Dr. George Cattiny DVM, Pompton Lakes, NJ
  • Dr. William J Howard DVM, Watertown, SD
  • Dr. Hans Coster DVM, Zwaanshoek, Netherlands
  • Dr. Stephen Oswald DC, Spring Valley, NY

Interests Hobbies & Likes

  • Horses
  • Dogs
  • Hiking
  • Rock Climbing
  • Violin
  • Art
  • Volleyball
  • Dancing
  • Boxing
  • Yoga
  • Reading
  • Traveling

Blog Thoughts and Papers

Please choose an article from the list below

Genetics 4

  • Applying Molecular Analyses using Microsatellite and Mitochondrial Sequence Data
  • The Major Histocompatibility Complex
  • Are Fish Hatchery Programs Really Helping Conserve Our Native Steelhead Trout Populations?
  • Effective Population Size as a Tool for Wildlife Management

Ecology 5

  • Animal Behavior and Fitness
  • Signaling theory: Reliability and Honesty
  • Importance of Functional Metrics in Bioassessment Surveys
  • Roads and Gene Flow: A Meta-analysis of the response of Invertebrates and Vertebrates to an Anthropogenic Barrier
  • FreeForAll Ranch: Deer Herd Assessment

Health 7

  • The Adaptive and Innate Immune Systems
  • Lactation: Sustaining a Costly Activity in a Nutritionally Limited Environment
  • The Effects of Prymnesium parvum (Golden Alga) on both the Natural Ecosystem and the Human Species
  • Importance of Chiropractic in Today's Society
  • Lyme Disease: A Comprehensive Review
  • Review of the blood type diet described in Eat Right for Your Type by Dr. Peter D'Adamo
  • The Effect of Chiropractic Care on the Health of the Gastrointestinal Tract

Statistics 3

  • Advanced Statistical Methods: Discrete Distributions
  • Statistical Interactions
  • Multivariate Statistical Analysis with SAS

Lyme Disease 12

Other 2

  • Diversity
  • Art and Religion

Download Extras

For convenience, you can download my resume, research and transcripts here.