Final exam - Speeding up Complex Genetic Mutations Detection in Large Human Genome Data

May 21, 2019 - 1:00pm
2520C UCC

PhD Candidate: Thamer Alsulaiman


     All cellular forms of life contain deoxyribonucleic acid (DNA). DNA is a molecule that carries all the information necessary to perform both basic and complex cellular functions. DNA is replicated to form new tissues and organs and to pass genetic information to future generations. Replication ideally yields an exact copy of the original DNA. While replication generally occurs without error, mistakes made during the process can introduce accidental changes into the DNA. Those changes are called mutations. Mutations range in magnitude, and mutations of any magnitude range in consequences, from no effect on the organism to disease initiation (e.g., cancer) or even death.

     In this thesis, we limit our focus to mutations in human DNA, and in particular to MMBIR mutations. Recent literature in human genomics has found microhomology-mediated break-induced replication (MMBIR) to be a common mechanism producing complex mutations in DNA. MMBIRFinder is a tool for detecting MMBIR regions in yeast DNA. Although MMBIRFinder is successful on yeast DNA, it is not capable of detecting MMBIR mutations in human DNA. Among several reasons, one major reason is the amount of computation required to process large human genome data. Our contribution in this regard is twofold: 1) we utilize parallel computation to significantly reduce the processing time consumed by the original MMBIRFinder and address several performance-degrading issues inherent in the original design; 2) we introduce a new heuristic to detect MMBIR mutations that were not detected by the original MMBIRFinder, even for smaller genomes such as yeast DNA.

Advisor: Suely Oliveira