The wonderful DNAGedcom software recently added an option to build a chart using the Collins Leeds Method (CLM). Here is a good write-up by Kitty Cooper about the nuts and bolts of the CLM.

Basically, it takes your list of Ancestry DNA matches and the ICW ("in common with") file, which lists the matches that share other matches with you, and builds a chart showing all of this in a nice visual format.

Here is my CLM chart, though I may need to update it since I’m not sure when I last pulled data from Ancestry DNA:

Brian Zalewski Collins Leeds Method 3D chart – as of 17 Feb 2019

As you can see, the chart puts the shared matches into groups. I’ve labeled them manually based on how I know we’re connected. The small grey boxes show that some matches also match into some of the other groups, which makes sense.

I was pretty surprised that my Van Price/Van Parijs line was the largest collection, especially since I usually have a ton of French Canadian matches, but maybe those cousins are more likely to test at Ancestry.

Sadly, not a lot on my paternal grandfather’s side, though there are some, especially in the Kashubian region of Poland. The other large sections are paternal, but on my grandmother’s Irish and German side.

I am waiting for DNAGedcom to finish a very large import and then I will run it for my mom’s Ancestry matches, but I did do a manual version of this and saw similar groupings.

If you don’t have access to DNAGedcom, since it does cost a small amount every month, you can do this yourself manually using either Excel or the free Google Sheets using the original Leeds Method by Dana Leeds.
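For the spreadsheet-averse, the manual grouping can also be roughed out in a few lines of Python. This is only a simplified sketch of the general idea as I understand it (give the first ungrouped match a new group, put all of its shared matches in that group too, then move on to the next ungrouped match), not DNAGedcom's actual algorithm. The function name, the names in the sample data, and the input shapes are all made up for illustration.

```python
def leeds_groups(matches, shared):
    """Assign Leeds-style group numbers to DNA matches.

    matches: list of match names, strongest match first (hypothetical input).
    shared:  dict mapping a match name to the set of names that are
             shared matches with that person (hypothetical input).
    Returns a dict mapping each name to a sorted list of group numbers.
    """
    groups = {name: set() for name in matches}
    next_group = 0
    for name in matches:
        if groups[name]:
            continue  # already placed in a group by an earlier match
        next_group += 1
        groups[name].add(next_group)
        # Everyone who shares this match joins the same group (the same "color").
        for other in shared.get(name, ()):
            if other in groups:
                groups[other].add(next_group)
    return {name: sorted(g) for name, g in groups.items()}


# Made-up example data, strongest match first.
matches = ["Ann", "Ben", "Cal", "Dee", "Eve"]
shared = {
    "Ann": {"Ben"},
    "Ben": {"Ann"},
    "Cal": {"Dee", "Ben"},
    "Dee": {"Cal"},
    "Eve": set(),
}
for name, group_ids in leeds_groups(matches, shared).items():
    print(name, group_ids)
```

A match that lands in more than one group (Ben in this example) is the equivalent of the small grey boxes in the chart: someone who matches into more than one cluster.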

I’d love to see if using this method opened up any doors for you. For me, it did make me wonder about some possible adoptions or non-parental events on my paternal grandmother’s side (not on my direct line, but off to the side), since some matches share certain surnames but not others, along with a few other oddities. So, I’m doing some more in-depth research on those lines.

Recently, I saw someone post about visualizing their DNA match network. They were doing this as a service. You would order a visualization and they would build one for you and send it to you for a nominal fee. It sounded and looked awesome. I noticed they were using an open source program, so I thought to myself, if they can do it, so can I. So, that’s what I did…for the most part.

The open source program is called Gephi and it’s described as the leading visualization and exploration software for all kinds of graphs and networks. At first glance it can seem scary and overwhelming, and it is in some respects. In my job and on my own time, I’ve worked a lot with sets of data: organizing them, analyzing them, morphing them to work in another way, etc. This seemed like something I could do.

Finding the Data

The first issue was figuring out how to get my match data into a format that the software needed. I first tried to get all of my match data exported from Genome Mate Pro, which I was able to do. I just don’t know how to massage it into what I need, yet, at least not without a lot of manual work. So, then I looked at some of the files that are created when I run DNAGedcom to get my match info for GMP. The Ancestry DNA files looked good. They had mostly what I needed. I had to do some minor changes to the files, but overall it worked.
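To give a rough idea of the kind of massaging involved, here is a small Python sketch that turns a list of shared-match pairs into an edges table that Gephi's spreadsheet import can read (it looks for Source and Target columns, with an optional Type). The input column names match_name and icw_name are placeholders I made up; the real DNAGedcom files use their own headers, so adjust them to whatever your export contains.

```python
import csv
import io


def icw_to_edges(rows, a_col="match_name", b_col="icw_name"):
    """Collapse shared-match rows into unique undirected pairs.

    rows: iterable of dicts, e.g. from csv.DictReader.
    a_col / b_col: hypothetical column names for the two matches in a row.
    """
    seen = set()
    edges = []
    for row in rows:
        a, b = row[a_col].strip(), row[b_col].strip()
        if not a or not b or a == b:
            continue  # skip blank rows and self-links
        key = tuple(sorted((a, b)))  # A-B and B-A are the same connection
        if key not in seen:
            seen.add(key)
            edges.append(key)
    return edges


def write_edges_csv(edges, out):
    """Write the pairs with the headers Gephi expects for an edges table."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Type"])
    for source, target in edges:
        writer.writerow([source, target, "Undirected"])


# Made-up sample standing in for an ICW export.
sample = io.StringIO("match_name,icw_name\nAlice,Bob\nBob,Alice\nAlice,Cara\n")
edges = icw_to_edges(csv.DictReader(sample))
out = io.StringIO()
write_edges_csv(edges, out)
print(out.getvalue())
```

Using names as identifiers is the simplest approach, but two different matches with the same name would be merged into one circle, so a unique ID column is safer if your export has one.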

After a bit of a learning curve and some Googling, I was able to get a pretty decent looking network visualization of (most of) my Ancestry DNA matches. I say most since I’m not completely sure if I have all of the connections included. Here is the final visualization, without the names.


Here is a quick overview. The size of each circle is based on how many centimorgans (cM) I share with that match, so it also shows how closely related we are. This graph only includes matches who share more than 20 cM with me, so roughly 4th cousins or closer. I color-coded a few of the major lines that I knew based on the match. I am the large white circle in the center. My mother is the large yellow circle at the bottom.
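The cutoff and the circle sizes can be prepared before the data ever reaches Gephi. The sketch below, with made-up names and numbers, keeps only matches above 20 cM and writes a nodes table with Id and Label columns plus the shared cM as an extra column. SharedCM is a column name I picked for illustration; the idea is that Gephi can then size the circles by ranking on that column.

```python
import csv
import io


def build_nodes(matches, min_cm=20.0):
    """Keep matches sharing more than min_cm and shape them as node rows.

    matches: iterable of (name, shared_cm) tuples (hypothetical input shape).
    """
    nodes = []
    for name, shared_cm in matches:
        if shared_cm > min_cm:  # "more than 20 cM", so exactly 20 is dropped
            nodes.append({"Id": name, "Label": name, "SharedCM": shared_cm})
    return nodes


def write_nodes_csv(nodes, out):
    """Write a nodes table; SharedCM is an illustrative attribute column."""
    writer = csv.DictWriter(out, fieldnames=["Id", "Label", "SharedCM"])
    writer.writeheader()
    writer.writerows(nodes)


# Made-up sample matches.
sample = [("Alice", 250.0), ("Bob", 20.0), ("Cara", 34.5)]
out = io.StringIO()
write_nodes_csv(build_nodes(sample), out)
print(out.getvalue())
```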

The purple-ish group at the top is from my Corrigan line as the larger one is my father’s cousin. The red group at the left, I think, is a collection of Polish matches. The small teal group under the red is my Thielke line as the larger teal circle is my mom’s cousin. The pinkish group under that one is my Van Price/Van Parijs Dutch side. The green and orange on the bottom right is mainly my mother’s French-Canadian matches. There are a lot of descendants from those original French immigrants as you can see by all of the inter-matching between them. The single pink circle above me is my one and only Zalewski match. You can see why that line is difficult to research. The rest of the randomly colored and white circles are either one-off matches or matches I have yet to organize.

Now What?

My next steps are not only to analyze this graph to see if any odd connections pop out, but also to try this with my other data, including 23andMe and/or Family Tree DNA. I am also going to try it with my Genome Mate Pro data, as that has everything in one place, including GEDmatch matches. Watching the software take all of these matches, which start out as one big blob, and organize them into the graph above is cool to see, as it moves around like it’s alive until it settles.