|
I frequently mention the acronym "GEDCOM" in this newsletter.
This week a reader wrote to me with an excellent question: "What
is GEDCOM?" I realized that I haven't explained this buzzword
in a long, long time. So, here is a brief, non-technical explanation
of the term for the newer subscribers to this publication.
GEDCOM is an abbreviation that stands for GEnealogical Data COMmunication.
In short, GEDCOM is the language by which different genealogy
software programs talk to one another. The purpose is to exchange
data between dissimilar programs without having to manually re-enter
all the data on a keyboard.
To illustrate the importance of GEDCOM, step back in time with
me for a moment. Back before the invention of GEDCOM and before
the invention of the home computer, I used 80-column punch cards
to record the names and limited information about 200 or so of
my ancestors. I did this after work hours in my employer's data
center. I then used the employer's mainframe computer that cost
hundreds of thousands of dollars to sort the data and to print
a few crude reports. Luckily for me, my employer allowed me to
use all the mainframe time I wanted during the evening, after
the company finished its daily work.
Around 1980, I built my own
home computer. I decided to put my genealogy database onto the
new system, but it would not read the 80-column punch cards I
had used earlier. I manually re-typed every bit of data into a
dBASE-II program that I wrote. My database had grown, so I had
to enter data about 400 or so individuals. I stored the information
on 8-inch floppy disks attached to my homemade 8-bit CP/M computer,
which had 64 KB (kilobytes) of memory.
Some time later I discovered
a CP/M genealogy program that would operate on my home computer.
(CP/M was an operating system that was popular before MS-DOS which,
in turn, was popular before Windows.) Unlike my crude, homemade
dBASE-II program, this new genealogy program printed pedigree
charts, family group sheets, and other reports. I decided to convert
to the new, more powerful program (although I must say that it
was rather elementary when compared to today's powerful programs).
At this point my database had grown to about 600 individuals,
and I could not find any method of easily copying that data into
the new program. I first printed out the information from the
dBASE-II database. Then I sat at my computer for several evenings,
reading the information on paper and re-typing every bit of it
into my new program.
I bet you can guess the next step: I purchased
an IBM clone in 1984 and decided to move my data to this new powerhouse.
After all, it had 640 kilobytes of memory and a 20-megabyte hard
drive, which I was certain that I could never fill. Having been
rather active in my genealogy research, I now had information
about 1,200 people to re-enter. I printed out the entire database
from the old system onto paper and then manually re-typed it into
the new PC powerhouse. That effort took weeks, and I promised
myself, "Never again!" Newer genealogy programs appeared in the
following years, each with new features that I found enticing.
However, I continued to use the same program simply because I
didn't want to go through the keyboard effort again.
Roughly fifteen
years ago, the Church of Jesus Christ of Latter-day Saints announced
something new: a file format called GEDCOM. This new proposed
standard file format was designed to allow different genealogy
programs to exchange data. There was only one problem at the time:
the only program that could read and write GEDCOM data was the
one written by the Church of Jesus Christ of Latter-day Saints.
GEDCOM is a standard, not a program. As such, genealogy programs
that are going to share data must each be written by their
programmers to handle GEDCOM files. If you are trying to transfer
data from one program to another, only to discover that one of
the programs does not support GEDCOM, you are out of luck. To
complete the exchange of data, both programs have to support GEDCOM.
Slowly, over a period of several years, other genealogy programs
began to add the ability to read and write GEDCOM files. It became
possible to move data from one genealogy program to another without
manually re-typing everything. Now you can just export your file
from one genealogy program in GEDCOM format and then import that
GEDCOM file into another genealogy program. You can use GEDCOM
files to exchange genealogy data with your distant cousin in Poughkeepsie
as well as to upload data to GenCircles, RootsWeb, Ancestry.com,
FamilySearch.org, OneGreatFamily.com and many other online databases.
The author of the genealogy program that I used never did add
GEDCOM capability. Luckily for me, someone else eventually wrote
a small routine that would export data from this program in GEDCOM
format, and I was then able to move my data to increasingly powerful
new programs. By 1990, I was writing articles on CompuServe, advising
everyone to never use a genealogy program that lacked GEDCOM capabilities.
Luckily, that is no longer an issue. All of today's major genealogy
programs will import and export GEDCOM data. Data transfer may
still be a problem for those using older genealogy programs without
GEDCOM capability; many people still find their data trapped in
these "islands." For them, there is no easy solution. Unlike the
"dark ages" of the 1980s, it is now common for people to use two
or three or even more genealogy programs. You may find one program
that you prefer to use for storing all the bits of information
that you encounter in your research efforts. However, you might
prefer the printed reports or multimedia scrapbook features of
a different program. Thanks to GEDCOM, you can easily move your
data from one program to another. You can also share information
with distant cousins using yet other genealogy programs by sending
GEDCOM files to each other by e-mail. The instructions for creating
or reading GEDCOM files will vary from one program to another.
You need to consult the program's HELP files to find the exact
sequence of instructions your genealogy program requires.
GEDCOM files can be read by a human, although it would be tedious to do so.
Here is an extract from the beginning of a typical GEDCOM file:
0 HEAD
1 SOUR Legacy
2 VERS 4.0
2 NAME Legacy (R)
2 CORP Millennia Corp.
3 ADDR PO Box 66
4 CONT El Mirage, AZ 85335
1 DEST Gedcom55
1 DATE 16 Oct 2004
1 SUBM @S0@
1 FILE Kennedy.ged
1 GEDC
2 VERS
2 FORM LINEAGE_LINKED
1 CHAR ANSI
0 @S0@ SUBM
1 NAME Not Given
1 ADDR Not Supplied
2 CONT
0 @I1@ INDI
1 NAME Joseph Patrick /Kennedy/
2 GIVN Joseph Patrick
2 SURN Kennedy
1 SEX M
1 BIRT
2 DATE 6 Sep 1888
2 PLAC Boston, MA
2 SOUR @S2@
3 PAGE pg 56
3 QUAY 3
1 DEAT
2 DATE 18 Nov 1969
2 PLAC Hyannis Port, MA
(rest of file omitted)
The file contains genealogy data in a structured format. It utilizes
numbers to indicate the hierarchy and tags to indicate individual
pieces of information within the file. A number of zero indicates
the first line within a single record, and the letters, or tag,
after the zero indicate the type of record. The top line in any
GEDCOM file is the HEADER record, indicating that it is the beginning
of the file. Words that are more than four letters long are typically
abbreviated. In this case, the word HEADER is written as HEAD.
A number "1" shows that the line in question is one level below
the "zero" line. This indicates that this line is one level subservient
to the zero line and contains additional information. In the case
of the second line in the above file, the entry of "1 SOUR Legacy"
indicates that this file was created by (SOURCE) Legacy, a popular
genealogy program for Windows. The number "2" on the next line
shows that it is subservient to the preceding line, the one with a number
1 in it. In this case, the line "2 VERS 4.0" indicates that the
file was written with version 4.0 of Legacy. Below that you see
a line labeled ADDR (address) and another labeled CONT (the previous
line is CONTinued here).
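The level-and-tag layout described above lends itself to a very small parser. What follows is only a rough Python sketch, not code taken from any genealogy program. The function names parse_line and outline are made up for illustration, and the sketch assumes every line is simply a level number, a tag, and an optional value, so it does not handle record lines that carry an "@...@" record number. It indents each line by its level so the hierarchy becomes visible.

```python
# A few header lines copied from the extract above.
SAMPLE = """\
0 HEAD
1 SOUR Legacy
2 VERS 4.0
2 CORP Millennia Corp.
3 ADDR PO Box 66
4 CONT El Mirage, AZ 85335
"""


def parse_line(line):
    """Split one GEDCOM line into (level, tag, value)."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])                       # 0 starts a record, 1 is one level below, ...
    tag = parts[1]                              # HEAD, SOUR, VERS, ...
    value = parts[2] if len(parts) > 2 else ""  # some lines have no value
    return level, tag, value


def outline(text):
    """Return the lines indented by level, four spaces per level."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        level, tag, value = parse_line(line)
        rows.append(("    " * level + tag + " " + value).rstrip())
    return rows


if __name__ == "__main__":
    print("\n".join(outline(SAMPLE)))
```

Running it prints HEAD at the left margin, SOUR one step in, VERS and CORP two steps in, and so on, which is the same "subservient" relationship the level numbers express.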
Scanning a bit further down the file, you will see the following:
0 @I1@ INDI
Again, the zero indicates this is the beginning of a new record.
The "at" signs bracket the record number. In this case, the record
is of an INDIvidual, and it is individual #1 (I1) in the database.
Succeeding lines show events, such as birth, marriage, and death,
along with subsequent data listing dates and places. You will
also note an entry of "2 SOUR @S2@", which indicates that a source
citation for the event can be found in SOURce entry S2,
later in this file. INDI, NAME, BIRT, DEAT, SEX, SOUR and the
other record types are called GEDCOM "tags." There are many available
tags within the GEDCOM standard and even a capability to create
user-defined tags for those situations not covered by the standard.
Of course, user-defined tags are usually not understood by the
receiving program, so they seem to be somewhat useless. They may
help define data within the program in which they were created,
but they will not translate to a new program via the GEDCOM format.
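For readers who like to see such things spelled out, here is a hedged Python sketch of how a receiving program might group lines into records, follow a "@S2@" pointer, and notice user-defined tags. Several things in it are assumptions of mine: the function names are invented, the "@S2@" source record and its title are hypothetical (the extract above omits that part of the file), the "_NICK" tag is a made-up example, and the test for user-defined tags relies on the GEDCOM 5.5 convention that such tags begin with an underscore.

```python
# Sample data: the INDI lines follow the extract above; the _NICK line and
# the whole @S2@ record are hypothetical additions for illustration only.
SAMPLE = """\
0 @I1@ INDI
1 NAME Joseph Patrick /Kennedy/
1 _NICK Joe
1 BIRT
2 DATE 6 Sep 1888
2 SOUR @S2@
0 @S2@ SOUR
1 TITL Hypothetical birth register
"""


def parse_line(line):
    """Split a line into (level, record_id, tag, value)."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if parts[1].startswith("@"):
        # "0 @I1@ INDI": the record number sits between the level and the tag
        rest = parts[2].split(" ", 1)
        return level, parts[1], rest[0], rest[1] if len(rest) > 1 else ""
    return level, None, parts[1], parts[2] if len(parts) > 2 else ""


def load_records(text):
    """Group lines into records keyed by record number, e.g. "@I1@"."""
    records = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        level, record_id, tag, value = parse_line(line)
        if level == 0:
            # A level-zero line starts a new record. Records without a
            # record number (such as HEAD) are skipped in this sketch.
            current = None
            if record_id is not None:
                current = records.setdefault(record_id, {"type": tag, "lines": []})
        elif current is not None:
            current["lines"].append((level, tag, value))
    return records


def source_pointers(record):
    """Record numbers that this record's SOUR lines point to."""
    return [value for _, tag, value in record["lines"]
            if tag == "SOUR" and value.startswith("@")]


def user_defined_tags(record):
    """Tags starting with an underscore, which other programs may ignore."""
    return [tag for _, tag, _ in record["lines"] if tag.startswith("_")]


if __name__ == "__main__":
    records = load_records(SAMPLE)
    person = records["@I1@"]
    for pointer in source_pointers(person):
        print(pointer, "->", records[pointer]["lines"])
    print("user-defined:", user_defined_tags(person))
```

A receiving program that has never heard of "_NICK" can still read the line, but it has no way to know what the value means, which is why such tags rarely survive a transfer.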
This is a very abbreviated explanation of the internals of a GEDCOM
file. You can find a detailed explanation at http://homepages.rootsweb.com/~pmcbride/gedcom/55gctoc.htm.
You need to be aware that the creation of the GEDCOM standard
was not a perfect implementation. For one thing, not all the data
fields are specified precisely in the GEDCOM specifications. Next,
not all the programmers of the various genealogy programs interpreted
the specifications in exactly the same manner. For instance, your
present genealogy program might be perfectly happy with a birth
date listed as, "after 1847 but before 1852." However, once that
information is exported in a GEDCOM file and then imported into
a different program, the birth date may say something else. The
receiving program may expect exact dates and not be able to handle
anything that says "after" or "before," especially not both in
the same statement. Typically, the receiving program simply leaves
the line blank. Sadly, one or two genealogy programs will accept
the first date found on the line and then will disregard any further
information.
Another problem is that not all genealogy programs
have the same ideas about databases. One program may have only
one field for "occupation," assuming that every person on the
face of the earth never, ever changed careers. Another genealogy
program may have the ability to record multiple occupations during
the person's lifetime. When transferring data via GEDCOM from
the more powerful program to the simpler one, some of these occupations
will be lost. These are a couple of simple examples; you can find
numerous other inconsistencies when moving data between dissimilar
programs.
Another limitation is the fact that the present GEDCOM
standard was created before the popularity of multimedia. You
can transfer textual data, such as names, dates, and locations
rather well in GEDCOM. However, transferring scanned images, sound
clips, and movies from one genealogy program to another is almost
impossible to accomplish via GEDCOM files. The present GEDCOM
implementation can point to the location of multimedia files on
a hard drive. In theory, this should suffice. However, in my experience
of moving data around in many genealogy programs, I have rarely
seen multimedia files handled properly.
There is another problem
with translating from one program to another: that of data integrity.
Translating from one program's database to GEDCOM is sort of the
same as translating from one spoken language to another. The basics
work, but subtleties and details sometimes do not translate well.
Then, when translating to the third language (the receiving genealogy
program's database), more translation losses creep in. I well
remember reading a technical manual some years ago that had been
written in Japanese and then translated into Chinese. At a later
date, the Chinese version was translated into English. The resultant
English manual was barely readable. The same may happen with translating
a database from Program A into GEDCOM and then from GEDCOM into
Program B.
A new method of transferring data between different
genealogy programs was announced some time ago by Wholly Genes
Software. Their GenBridge technology reads data from one program
directly into a second program without requiring a "double translation"
via GEDCOM. The result is a much more accurate transfer process.
However, very few genealogy developers have adopted GenBridge.
To date, the only programs I can think of that offer this technology are
The Master Genealogist and Family Tree Super Tools (both produced
by Wholly Genes), The Pocket Genealogist, and GedStar Pro.
Despite all the shortcomings, GEDCOM
is still a simple and somewhat effective method of transferring
genealogy data from one program to another. Most of the data will
transfer properly, and then there are easy ways of reviewing the
data to look for errors. The names, dates, and locations normally
transfer correctly. Text, events, notes, and source citations
may not always work perfectly. The exact problems encountered
will depend upon the two genealogy programs involved. Most modern
genealogy programs will create an error log of GEDCOM data imported
but not understood by the receiving program. You can read that
log file to see what the program detected as inconsistent, then
manually go in and fix the errors. While tedious, this is still
a lot better than re-keying everything!
Two and a half years ago, a new GEDCOM standard was proposed that is to be based upon XML,
a markup language that is popular on the World Wide Web.
This new standard should greatly improve data transfer accuracy.
See http://www.familysearch.org/GEDCOM/GedXML60.pdf for details.
However, don't look for this new GEDCOM 6.0 any time soon. It
has been a proposal for more than two and a half years, and nothing
has happened in that time. Older versions of GEDCOM have been
around for more than fifteen years, and only minor improvements
have been made in that time. I expect that GEDCOM 6.0 will not
appear in genealogy programs for several more years, if ever.
As an interesting side note: a program called "gedify" claims
to be able to convert GEDCOM 6.0 files to the older GEDCOM 5.5
standard so that older genealogy programs can read data created
with the new format. There doesn't seem to be much need for this
program as no one is yet creating GEDCOM 6.0 files! You can read
more about gedify at http://savannah.nongnu.org/projects/gedify.
I noticed that the pages there have not been updated in a long
time.
I offer this article as a non-technical explanation of GEDCOM
plus some commentary on its use. For more details and for technical
explanations of the inner workings of GEDCOM, I would suggest
that you read the following:
The GEDCOM Standard Release 5.5: http://homepages.rootsweb.com/~pmcbride/gedcom/55gctoc.htm
GEDCOM 6.0 XML proposal: http://www.familysearch.org/GEDCOM/GedXML60.pdf
Introduction to GEDCOM: http://web.ukonline.co.uk/nigel.battysmith/gedinfo.html
GEDCOM 101 by Jan McClintock: http://www.leisterpro.com/doc/Articles/GEDCOM101.html
GEDCOM Usage Guide: http://www.cmis.csiro.au/Graham.Williams/personal/gedcom.html
Is GEDCOM Dead? by Beau Sharbrough: http://www.rootsworks.com/genart13.htm
|