Now that I’m pretty sure that I have my DNA at every possible location available to collect new matches, I’m coming across interesting things. I currently have two mysteries related to my DNA matches that I’m working my way through.
On a side note, I found the amazing free DNA Painter site that lets you “paint” your chromosome matches on both sides (paternal and maternal) which makes it very easy to see how people may connect, especially if you’re a visual person. Blaine Bettinger does a great walk-through video about it, if you’re interested.
Recently, I saw someone post about visualizing their DNA match network. They were doing this as a service. You would order a visualization and they would build one for you and send it to you for a nominal fee. It sounded and looked awesome. I noticed they were using an open source program, so I thought to myself, if they can do it, so can I. So, that’s what I did…for the most part.
The open source program is called Gephi and it’s described as the leading visualization and exploration software for all kinds of graphs and networks. At first glance it can seem scary and overwhelming, and it is in some respects. In my job and on my own time, I’ve worked a lot with sets of data: organizing them, analyzing them, morphing them to work in another way, etc. This seemed like something I could do.
Finding the Data
The first issue was figuring out how to get my match data into a format that the software needed. I first tried to get all of my match data exported from Genome Mate Pro, which I was able to do. I just don’t know how to massage it into what I need, yet, at least not without a lot of manual work. So, then I looked at some of the files that are created when I run DNAGedcom to get my match info for GMP. The Ancestry DNA files looked good. They had mostly what I needed. I had to do some minor changes to the files, but overall it worked.
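For anyone curious what that kind of file massaging amounts to, here is a minimal Python sketch. It is not the exact set of changes described above. Gephi's spreadsheet import reads a nodes table with Id and Label columns and an edges table with Source and Target columns. The input field names (id, name, cm) and the build_gephi_tables function are made up for illustration; the real DNAGedcom column names will differ.

```python
import csv
import io


def build_gephi_tables(matches, shared_pairs):
    """Return (nodes_csv, edges_csv) as strings for Gephi's spreadsheet import.

    matches:      list of dicts with hypothetical keys "id", "name", "cm"
    shared_pairs: (id_a, id_b) pairs meaning "these two matches also match each other"
    """
    known = {m["id"] for m in matches}

    # Nodes table: one row per match.
    nodes = io.StringIO()
    writer = csv.writer(nodes, lineterminator="\n")
    writer.writerow(["Id", "Label", "cM"])
    for m in matches:
        writer.writerow([m["id"], m["name"], m["cm"]])

    # Edges table: one undirected row per pair, skipping duplicates,
    # self-links, and pairs that mention someone not in the match list.
    edges = io.StringIO()
    writer = csv.writer(edges, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type"])
    seen = set()
    for a, b in shared_pairs:
        pair = tuple(sorted((a, b)))
        if a == b or pair in seen or a not in known or b not in known:
            continue
        seen.add(pair)
        writer.writerow([pair[0], pair[1], "Undirected"])

    return nodes.getvalue(), edges.getvalue()


# Made-up sample data, just to show the shape of the input.
sample_matches = [
    {"id": "A", "name": "Alice", "cm": "45.2"},
    {"id": "B", "name": "Bob", "cm": "30"},
]
nodes_csv, edges_csv = build_gephi_tables(sample_matches, [("A", "B"), ("B", "A")])
```

Saving the two strings as nodes.csv and edges.csv gives a pair of files that can be loaded through Gephi's Import Spreadsheet dialog.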
After a bit of a learning curve and some Googling, I was able to get a pretty decent looking network visualization of (most of) my Ancestry DNA matches. I say most since I’m not completely sure if I have all of the connections included. Here is the final visualization, without the names.
Here is a quick overview. The size of each circle is based on how many centimorgans (cMs) I share with that match, which also shows how closely related we are. This graph only includes matches with more than 20 cMs, which works out to about 4th cousins or so. I color-coded a few of the major lines that I knew based on the match. I am the large white circle in the center. My mother is the large yellow circle at the bottom.
The purple-ish group at the top is from my Corrigan line as the larger one is my father’s cousin. The red group at the left, I think, is a collection of Polish matches. The small teal group under the red is my Thielke line as the larger teal circle is my mom’s cousin. The pinkish group under that one is my Van Price/Van Parijs Dutch side. The green and orange on the bottom right is mainly my mother’s French-Canadian matches. There are a lot of descendants from those original French immigrants as you can see by all of the inter-matching between them. The single pink circle above me is my one and only Zalewski match. You can see why that line is difficult to research. The rest of the randomly colored and white circles are either one-off matches or matches I have yet to organize.
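Gephi can rank node size by an attribute on its own, but the sizing idea from the overview is simple enough to sketch in Python: drop anything at or below the 20 cM cutoff, then scale the rest between a smallest and largest circle size. The node_sizes function, the size limits, and the sample cM values are all made up for illustration.

```python
def node_sizes(shared_cm, min_cm=20.0, min_size=10.0, max_size=60.0):
    """Map each match's shared cM to a circle size, linearly scaled.

    shared_cm: dict of match name -> shared centimorgans
    Matches at or below min_cm are left out entirely.
    """
    kept = {name: cm for name, cm in shared_cm.items() if cm > min_cm}
    if not kept:
        return {}
    lo, hi = min(kept.values()), max(kept.values())
    span = hi - lo
    sizes = {}
    for name, cm in kept.items():
        # If every kept match shares the same amount, give them all the largest size.
        frac = (cm - lo) / span if span else 1.0
        sizes[name] = min_size + frac * (max_size - min_size)
    return sizes


# Made-up values: a parent, a cousin, and a match below the cutoff.
sizes = node_sizes({"mother": 3400.0, "cousin": 220.0, "distant": 12.0})
```

A linear scale lets one very close relative dwarf everyone else, so a square-root or log scale may be a better fit for a real match list.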
My next steps are to not only analyze this graph to see if any odd connections pop out, but also to try to do this with my other data including 23andMe and/or Family Tree DNA. I am also going to try to do it with my Genome Mate Pro data as that has everything in one place, including GedMatch matches. Watching the software take all of these matches, which start out in one big blob, and organize them into the graph above is cool, as it moves around like it’s alive until it settles.
I ran across a helpful site recently called draw.io that allows you to build flow charts and other diagrams pretty easily. It also ties in nicely with Google Drive and Dropbox so you can get your designs anywhere. I ended up using the site to visualize some of my DNA matches, specifically matches on certain lines in my family tree. It worked nicely and allowed me to see how exactly we’re connected and what information may be gleaned from those matches (e.g., Y-DNA lines.)
Here are my designs, in the following order. I visualized my Zalewski cousin tests, my Corrigan cousin tests, my Thielke cousin tests, and my Last cousin tests. The last two are on my maternal side and sort of overlap. I have some other lines to do, yet. Click the images for a larger version.
If a lot of that sounds like gibberish to you, this book will definitely help. It has a good introduction to DNA and the different DNA tests. Even though I’m fairly well-versed in a lot of the DNA stuff, I still found a lot of helpful information.
The book is broken down into three main sections. “Getting Started” is the first section and it goes over the genetic genealogy basics, misconceptions and ethical considerations. It is good info for anyone getting into genetic genealogy.
The second section is “Selecting A Test” which goes over each of the main types of DNA and the tests related to them: Mitochondrial-DNA, Y-DNA, Autosomal-DNA, and X-DNA. It’s a great read-through, especially if you’re trying to figure out a specific genealogical mystery in your tree since it will help you decide which test is best for solving it.
The third section is “Analyzing and Applying Test Results” which gets into more advanced tools for analyzing your DNA. This includes an overview of the most popular third-party tools, like GedMatch, and things like the often-marketed Ethnicity Estimates. This section also delves into using DNA testing for adoptees, which isn’t something I’m personally familiar with, but I imagine is a very powerful tool.
I personally like having all of the information in one place rather than bookmarking multiple websites and keeping random notes. If I’m looking for where my X-Chromosome may come from in my ancestry, I can just pull up the X-DNA chart. How is a third-cousin twice-removed related to me? I can check the handy reference chart.
The author, Blaine T. Bettinger, has long been known as one of the best genetic genealogy resources in the community. His blog, The Genetic Genealogist, has always been one that I read often. He’s very knowledgeable in the subject and his writing is very easy to follow. I was able to get through the book in only a few days and it still sits actively on my desk as a constant reference. I’d recommend it to everyone involved in DNA testing, from those new to DNA testing to those who have tested but want to learn more.
Yesterday, the big news across the Genetic Genealogy community was the release of Ancestry DNA’s Genetic Communities. According to Ancestry, these communities are built like this:
We find Genetic Communities™ by looking at a network of DNA connections we build using millions of AncestryDNA members in our database. When we build a network like this using millions of AncestryDNA members with billions of DNA relationships between them, we find groups of people in the network that have more DNA matches to each other than to people in other parts of the network. We call these groups Genetic Communities. We use a popular network analysis method called community detection to discover them.
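Ancestry does not say which community detection method it uses, so what follows is only an illustration of the idea in that description, not their algorithm. One common way to score whether groups have more connections inside than outside is modularity. The Python sketch below computes it for a made-up six-person match network: two families of three, joined by a single match. The modularity function and the sample names are hypothetical.

```python
from collections import defaultdict


def modularity(edges, group_of):
    """Score a grouping of an undirected, unweighted match network.

    edges:    list of (person_a, person_b) match pairs, each listed once
    group_of: dict of person -> group label
    Higher means more matches fall inside groups than chance would predict.
    """
    m = len(edges)
    internal = defaultdict(int)    # matches with both people in the same group
    degree_sum = defaultdict(int)  # total match endpoints landing in each group
    for a, b in edges:
        degree_sum[group_of[a]] += 1
        degree_sum[group_of[b]] += 1
        if group_of[a] == group_of[b]:
            internal[group_of[a]] += 1
    return sum(
        internal[g] / m - (degree_sum[g] / (2 * m)) ** 2
        for g in degree_sum
    )


# Two tight families (A, B, C and D, E, F) joined by one C-D match.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]
split = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
lumped = {person: 1 for person in split}

q_split = modularity(edges, split)
q_lumped = modularity(edges, lumped)
```

Here the two-family split scores about 0.36 while lumping everyone into one group scores 0, so the split is the better description of this little network. Community detection methods search for the grouping that makes a score like this as high as possible.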
So, it’s sort of a mix of DNA matches along with information from the millions of family trees built on the site. Together they can find a community in the more recent past. Previously, we only had ethnicity estimates to work with, but those were usually more broad and much deeper in the past. For example, here are my ethnicity results.
That Scandinavia one still confuses me a bit, but who knows where my deep ancestry came from. Those Scandinavians were known to travel.
I have two active Genetic Communities, as do most people it seems. My first one is Germans in Brandenburg & Mecklenburg-Vorpommern (very likely >95%) which matches up very well with my known ancestry. The other one is Poles in Pomerania, which also matches up very well though their confidence is only at 20% for this one at the moment.
The German community points to this area, which is the original location of a lot of my German ancestry. The Pomeranian community points to a majority of northern Poland, which also has a lot of my ancestry. As always, click the images for a larger view.
You can also break down the communities into time periods to find out more information about what happened in that area during those years. If I open up the time period when most of my ancestors migrated, it talks about that exact thing and also talks about how they came to the Wisconsin area.
So far these communities have been helpful and surprisingly specific and on the right track. Based on a lot of the messy, incorrect trees I see on the site I’d expect some skew, but I imagine those are not the majority. If you’re looking for much more insight on these communities, check out the great post over at The Genetic Genealogist. Ancestry has also put together a short video introducing the feature.
One of those days I was waiting for finally happened. A DNA match from the Jacob Zalewski line contacted me. I had always assumed that Jacob was the brother of my great-great grandfather, Frank Zalewski. This proves that Jacob and Frank are definitely related. They are probably brothers (as all other evidence points to), but that is not proven 100%.
Unfortunately, the match comes to me from AncestryDNA. While AncestryDNA is one of the most popular, it also gives the least amount of advanced tools. I cannot see where we match on our DNA as there is no Chromosome Browser like every other site has. I have contacted my match and asked if they would upload their data to GEDMatch so we can do the more advanced matching. I’d really love to see which part of my chromosome comes from my Zalewski line. That could point me towards more Zalewski relations and possibly finally breaking down more of that monstrous Zalewski line brick wall.
The possible Jacob-Frank connection all started back in July 2009 when I noticed a Jacob Zalewski family living with and quite near Frank and his family in Milwaukee in multiple city directories. After many years and finding more and more cross-family connections, I just assumed they were brothers as the pile of evidence was getting quite large. Though, I was always waiting and hoping for a DNA connection. I was planning on trying to convince a few distant cousins from that line that I had found to do a DNA test (I would probably even have paid for it.)
It was exciting to see another cousin listed on my 23andMe DNA Relatives list yesterday. While going through my matches, I noticed a familiar name, my paternal grandmother’s cousin (so, my first cousin, once removed.) I now have 4 confirmed cousin matches on that list (excluding my father.)
Also, earlier this week I confirmed the most recent common ancestor (MRCA) speculation on another one of my matches. I did some digging on who we thought was our common ancestor and was able to prove (with about 95% certainty) that we share 3rd-great-grandparents. I found a lucky obituary via a Google search that confirmed her connection to the TROKA surname. Once there, it just took a little source triangulation to confirm dates and connections back up to Thomas Troka to prove he is the brother of my great-great-grandfather, Joseph Troka.
3 of the 4 confirmed cousins on my list are paternal (1 first cousin; 1 third cousin, twice removed; 1 first cousin, once removed.) The connection on my maternal side is a third cousin through my maternal grandfather. I can now fill in the shared genomes of our MRCAs and see exactly which ancestors I received which chromosomes from. Obviously the goal in that is to go back as far as possible to make it as granular as possible.
Below is my updated Chromosome Map, courtesy of the Chromosome Mapping Tool by Kitty Cooper. Added are the new mapping points for my paternal great-great-grandparents, Thomas & Emma Jane (Firmenich) CORRIGAN and also my paternal grandparents, Richard & Mary Jane (Corrigan) ZALEWSKI.
Funny tidbit, I scheduled this post to go up at π (Pi) today: 3/14/15 9:26
One of the first steps in my 2015 Year of the DNA project is to look at new avenues of research and get my DNA info out there to other possible cousins. In the last few days I did a few things.
I finally transferred my 23andMe DNA over to the Family Tree DNA Family Finder. You can transfer it over for free right now to see a bunch of your matches, but you can’t do much analyzing and meeting until you pay the $39 transfer price. It’s actually a good deal to get into FTDNA’s database as they have a lot of users in it already who seem more interested in genealogy than a lot of the 23andMe members. I saw a few new matches and also someone with the surname CORRIGAN, which is my paternal grandmother’s surname. We matched on a location that both my father and my paternal cousin match on, so that’s good news.
I also finally donated to GEDMatch.com. I’ve been using it for a long time and even though it’s mostly flaky when using it due to its popularity, it’s still an invaluable tool to be able to match people from multiple testing companies. With a $10 donation, you also get access to their “Tier 1” tools like Triangulation, which are pretty helpful.
And I also updated my DNA information over at WikiTree. Once you add that, it will add your information to anyone that you may share DNA with including Y-DNA, mtDNA, and Autosomal. This way when someone finds one of their ancestors, they will also see that you share DNA with this ancestor. If they’ve also taken a test (or have a GEDMatch ID) you can see the match info. It’s just another way to find more people. You can see how it looks here on my great-great-grandfather’s wiki page.
Hopefully, some of these updates will help bring more matches and cousins to my door (well, not physically to my front door, that’d be weird.)
As some of you may know, genetic genealogy exploded in 2014. Hundreds of thousands of people have now tested their DNA with the big three testing companies (23andMe, Family Tree DNA, or Ancestry.) I have been interested in tracing my ancestry using DNA since back in 2006 with the first version of National Geographic’s Genographic Project when I swabbed my cheek for the first time (and last, actually, since the other tests were taken differently.)
I’m extremely interested in digging deeper into my DNA origins and my DNA matches, whether it’s using Autosomal DNA or Y-DNA. This year I’m planning to dig deeper and do more than ever before. Advanced analysis is a somewhat difficult thing to get into. There is a lot of information to learn and process along with the requirement of lots of DNA data to work with. I hope to use this new goal as a way to post about my journey and hopefully teach you along the way. People related to me may find it even more interesting.
Unbeknownst to me, one of my paternal cousins took a 23andMe test last year. I learned about this on Christmas Eve and have since connected with him on the site. What’s cool about that is that I can now mostly confirm which parts of my DNA come from my paternal grandparents. Though, not all of it, only the sections that we match on specifically, since my father and his father may have received different parts of DNA from my grandparents, which in turn may also be different than what he finally got from his father (my uncle.) Hopefully, other closer cousins start to test.
I’m not sure what my first post will be about, but we’ll see once I start digging. I’ve been recently reading a lot of posts from both Roberta Estes at DNAeXplained and Kitty Cooper. They do some great posts on the inner workings and complexities of our DNA and matching it with other people. Some of the posts get quite technical, and even if I don’t completely understand it, I love it. I guess that’s the data geek in me.
Here are some of my general goals, in no particular order:
Do more advanced analysis on some of my largest matches. Try to find MRCA (Most Recent Common Ancestor.)
Try to prod more cousins (close and distant) to test with one of the companies (preferably not Ancestry, or if they do, to upload their data to GEDMatch.)
Try to determine which parts of my DNA come from which ancestors (Chromosome Mapping.) I have a bit of it already. Works together with the last two goals.
Possibly get more Y-DNA upgrades with my data on Family Tree DNA to help determine my deeper R1a1a subclade using the Family Tree DNA project. Currently it’s estimated to be R1a1a1b1a2b* or YP340-45 (in the Carpathian area of Section 6 on that linked graphic), but I need more of my Y-DNA analyzed to get more information. This one will cost something.
Post somewhat consistently about my journey and what I’m learning, even if it’s confusing to me.
After getting my DNA tests completed and spending the past few years poring over that data using tools like GEDMatch, and most recently, Genome Mate, I’ve started to accumulate Most Recent Common Ancestors (MRCA) with some of my DNA matches. How to figure those out is another post entirely.
Granted, I don’t have a lot of confirmed MRCAs, yet, but I do have a few. You can use this data to make a chromosome mapping. Genome Mate does this for you in the software, but there is also a web version (seen below) that will do it for you. This will paint all of the segments on your chromosome that match those ancestors. Once you get a lot of confirmed MRCAs, the mapping looks really cool. Mine is getting started.
As you can see, I only have 2 MRCAs confirmed, one on each side. My paternal 3rd-great-grandparents, Michael Troka and Josylna Grabowska and my maternal great-great-grandparents, Carl Last & Augusta Luedtke.
The Troka connection is not yet fully confirmed, but the information we have is pretty solid. The Last connection is confirmed as I’ve matched up family trees with a 3rd cousin I found via a 23andMe match. I have a few more matches in progress that are close to finding information on our MRCA. It can be tough work sometimes, but there is hope of finding all new ancestors.
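The painting step described above, where each confirmed MRCA colors in the segments you share with that match, can be sketched in a few lines of Python. This is not how Genome Mate or the web-based mapping tool is implemented. It only shows the core idea of merging overlapping matched segments per chromosome and ancestor. The paint_segments function and the segment positions are made up for illustration.

```python
from collections import defaultdict


def paint_segments(segments):
    """Merge overlapping matched segments into painted ranges.

    segments: (chromosome, start, end, ancestor_label) tuples
    Returns a dict of (chromosome, ancestor_label) -> list of (start, end) ranges.
    """
    by_key = defaultdict(list)
    for chrom, start, end, label in segments:
        by_key[(chrom, label)].append((start, end))

    painted = {}
    for key, ranges in by_key.items():
        ranges.sort()
        merged = [list(ranges[0])]
        for start, end in ranges[1:]:
            if start <= merged[-1][1]:
                # Overlaps or touches the previous range: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        painted[key] = [tuple(r) for r in merged]
    return painted


# Made-up positions: two overlapping Troka segments and one separate one
# on chromosome 7, plus a single Last segment on chromosome 12.
painted = paint_segments([
    (7, 10_000_000, 25_000_000, "Troka"),
    (7, 20_000_000, 40_000_000, "Troka"),
    (7, 60_000_000, 70_000_000, "Troka"),
    (12, 5_000_000, 9_000_000, "Last"),
])
```

With more confirmed MRCAs, each new match adds or extends a painted range, which is why the map fills in as more cousins test.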