I frequently mention the acronym "GEDCOM" in this newsletter. This week a reader
wrote to me with an excellent question: "What is GEDCOM?" I realized that I haven't
explained this buzzword in a long, long time. So, here is a brief, non-technical
explanation of the term for the newer subscribers to this publication. GEDCOM is
an abbreviation that stands for GEnealogical Data COMmunication. In short, GEDCOM
is the language by which different genealogy software programs talk to one another.
The purpose is to exchange data between dissimilar programs without having to manually
re-enter all the data on a keyboard.
To illustrate the importance of GEDCOM, step back in time with me for a moment.
Back before the invention of GEDCOM and before the invention of the home computer,
I used 80-column punch cards to record the names and limited information about 200
or so of my ancestors. I did this after work hours in my employer's data center.
I then used the employer's mainframe computer that cost hundreds of thousands of
dollars to sort the data and to print a few crude reports. Luckily for me, my employer
allowed me to use all the mainframe time I wanted during the evening, after the
company finished its daily work.
Around 1980, I built my own home computer. I decided
to put my genealogy database onto the new system, but it would not read the 80-column
punch cards I had used earlier. I manually re-typed every bit of data into a dBASE-II
program that I wrote. My database had grown, so I had to enter data about 400 or
so individuals. I stored the information on 8-inch floppy disks attached to my homemade
8-bit CP/M computer, which had 64 KB (kilobytes) of memory.
Some time later I discovered
a CP/M genealogy program that would operate on my home computer. (CP/M was an operating
system that was popular before MS-DOS which, in turn, was popular before Windows.)
Unlike my crude, homemade dBASE-II program, this new genealogy program printed pedigree
charts, family group sheets, and other reports. I decided to convert to the new,
more powerful program (although I must say that it was rather elementary when compared
to today's powerful programs). At this point my database had grown to about 600
individuals, and I could not find any method of easily copying that data into the
new program. I first printed out the information from the dBASE-II database. Then
I sat at my computer for several evenings, reading the information on paper and
re-typing every bit of it into my new program.
I bet you can guess the next step:
I purchased an IBM clone in 1984 and decided to move my data to this new powerhouse.
After all, it had 640 kilobytes of memory and a 20-megabyte hard drive, which I
was certain that I could never fill. Having been rather active in my genealogy research,
I now had information about 1,200 people to re-enter. I printed out the entire database
from the old system onto paper and then manually re-typed it into the new PC powerhouse.
That effort took weeks, and I promised myself, "Never again!"
Newer genealogy programs
appeared in the following years, each with new features that I found enticing. However,
I continued to use the same program simply because I didn't want to go through the
keyboard effort again.
Roughly fifteen years ago, the Church of Jesus Christ of
Latter-day Saints announced something new: a file format called GEDCOM. This new
proposed standard file format was designed to allow different genealogy programs
to exchange data. There was only one problem at the time: the only program that
could read and write GEDCOM data was the one written by the Church of Jesus Christ
of Latter-day Saints.
GEDCOM is a standard, not a program. As such, each genealogy program
that is going to share data has to be written by its programmers to handle
GEDCOM files. If you are trying to transfer data from one program to another, only
to discover that one of the programs does not support GEDCOM, you are out of luck.
To complete the exchange of data, both programs have to support GEDCOM.
Slowly,
over a period of several years, other genealogy programs began to add the ability
to read and write GEDCOM files. It became possible to move data from one genealogy
program to another without manually re-typing everything. Now you can just export
your file from one genealogy program in GEDCOM format and then import that GEDCOM
file into another genealogy program. You can use GEDCOM files to exchange genealogy
data with your distant cousin in Poughkeepsie as well as to upload data to GenCircles,
RootsWeb, Ancestry.com, FamilySearch.org, OneGreatFamily.com and many other online
databases.
The author of the genealogy program that I used never did add GEDCOM
capability. Luckily for me, someone else eventually wrote a small routine that would
export data from this program in GEDCOM format, and I was then able to move my data
to increasingly powerful new programs. By 1990, I was writing articles on CompuServe,
advising everyone to never use a genealogy program that lacked GEDCOM capabilities.
Luckily, that is no longer an issue. All of today's major genealogy programs will
import and export GEDCOM data. Data transfer may still be a problem for those using
older genealogy programs without GEDCOM capability; many people still find their
data trapped in these "islands." For them, there is no easy solution.
Unlike the
"dark ages" of the 1980s, it is now common for people to use two or three or even
more genealogy programs. You may find one program that you prefer to use for storing
all the bits of information that you encounter in your research efforts. However,
you might prefer the printed reports or multimedia scrapbook features of a different
program. Thanks to GEDCOM, you can easily move your data from one program to another.
You can also share information with distant cousins using yet other genealogy programs
by sending GEDCOM files to each other by e-mail.
The instructions for creating or
reading GEDCOM files will vary from one program to another. You need to consult
the program's HELP files to find the exact sequence of instructions your genealogy
program requires.
GEDCOM files can be read by a human, although it would be tedious
to do so.
Here is an extract from the beginning of a typical GEDCOM file:
0 HEAD
1 SOUR Legacy
2 VERS 4.0
2 NAME Legacy (R)
2 CORP Millennia Corp.
3 ADDR PO Box 66
4 CONT El Mirage, AZ 85335
1 DEST Gedcom55
1 DATE 16 Oct 2004
1 SUBM @S0@
1 FILE Kennedy.ged
1 GEDC
2 VERS
2 FORM LINEAGE_LINKED
1 CHAR ANSI
0 @S0@ SUBM
1 NAME Not Given
1 ADDR Not Supplied
2 CONT
0 @I1@ INDI
1 NAME Joseph Patrick /Kennedy/
2 GIVN Joseph Patrick
2 SURN Kennedy
1 SEX M
1 BIRT
2 DATE 6 Sep 1888
2 PLAC Boston, MA
2 SOUR @S2@
3 PAGE pg 56
3 QUAY 3
1 DEAT
2 DATE 18 Nov 1969
2 PLAC Hyannis Port, MA
(rest of file omitted)
The file contains genealogy data in a structured format. It utilizes numbers to
indicate the hierarchy and tags to indicate individual pieces of information within
the file. A number of zero indicates the first line within a single record, and
the letters, or tag, after the zero indicate the type of record. The top line in
any GEDCOM file is the HEADER record, indicating that it is the beginning of the
file. Words that are more than four letters long are typically abbreviated. In this
case, the word HEADER is written as HEAD. A number "1" shows that the line in question
is one level below the "zero" line: it is subordinate to that line and contains
additional information about it. In the case of the second
line in the above file, the entry of "1 SOUR Legacy" indicates that this file was
created by (SOURCE) Legacy, a popular genealogy program for Windows. The number
"2" on the next line shows that it is subordinate to the preceding line with a number
1 in it. In this case, the line "2 VERS 4.0" indicates that the file was written
with version 4 of Legacy. Below that you see a line labeled ADDR (address) and another
labeled CONT (the previous line is CONTinued here).
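For readers who like to tinker, the level-and-tag layout described above can be sketched in a few lines of Python. This is only an illustration, not code from any genealogy program: the names LINE_RE, GedcomLine, and parse_line are made up for this example, and it assumes the file holds one GEDCOM line per text line.

```python
import re
from typing import NamedTuple, Optional

# level number, optional @ID@ record number, tag, optional value
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: str


def parse_line(text: str) -> GedcomLine:
    """Split one GEDCOM line into its level, record number, tag, and value."""
    match = LINE_RE.match(text.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value or "")


SAMPLE = """\
0 HEAD
1 SOUR Legacy
2 VERS 4.0
2 CORP Millennia Corp.
3 ADDR PO Box 66
4 CONT El Mirage, AZ 85335
0 @I1@ INDI
1 NAME Joseph Patrick /Kennedy/
"""

# Indent each line by its level number to make the hierarchy visible.
for raw in SAMPLE.splitlines():
    line = parse_line(raw)
    print("  " * line.level + f"{line.tag}: {line.value}")
```

Indenting by the level number makes it easy to see that VERS and CORP belong to SOUR, and that CONT continues the ADDR line above it.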
Scanning a bit further down the file, you will see the following:
0 @I1@ INDI
Again, the zero indicates this is the beginning of a new record. The "at" signs
bracket the record number. In this case, the record is of an INDIvidual, and it
is individual #1 (I1) in the database. Succeeding lines show events, such as birth,
marriage, and death, along with subsequent data listing dates and places. You will
also note an entry of "2 SOUR @S2@," which indicates that the source citation for
the event is in SOURce record S2, which appears later in this file.
INDI, NAME, BIRT, DEAT, SEX, SOUR and the other labels are called GEDCOM "tags."
There are many available tags within the GEDCOM standard and even a capability to
create user-defined tags for those situations not covered by the standard. Of course,
user-defined tags are usually not understood by the receiving program, so they seem
to be somewhat useless. They may help define data within the program in which they
were created, but they will not translate to a new program via the GEDCOM format.
This is a very abbreviated explanation of the internals of a GEDCOM file. You can
find a detailed explanation at http://homepages.rootsweb.com/~pmcbride/gedcom/55gctoc.htm.
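As a rough sketch of how a receiving program might use those record numbers and tags, the Python below groups lines into records, follows a SOUR pointer, and picks out user-defined tags. Several things here are assumptions for illustration only: the function name split_records is invented, the "_NICK" line and the contents of record S2 are made up (the extract above omits that part of the file), and the code relies on the GEDCOM 5.5 convention that user-defined tags begin with an underscore.

```python
SAMPLE = """\
0 @I1@ INDI
1 NAME Joseph Patrick /Kennedy/
1 SEX M
1 BIRT
2 DATE 6 Sep 1888
2 PLAC Boston, MA
2 SOUR @S2@
1 _NICK Joe
0 @S2@ SOUR
1 TITL Example source title
"""


def split_records(text):
    """Group lines into records; every level-0 line starts a new record."""
    records = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        parts = raw.split(" ", 2)
        level = int(parts[0])
        if level == 0:
            if parts[1].startswith("@"):
                # "0 @I1@ INDI": record number first, then record type
                rec_id, rec_type = parts[1], parts[2]
            else:
                # "0 HEAD": no record number, so use the tag as the key
                rec_id, rec_type = parts[1], parts[1]
            current = {"type": rec_type, "lines": []}
            records[rec_id] = current
        else:
            value = parts[2] if len(parts) > 2 else ""
            current["lines"].append((level, parts[1], value))
    return records


records = split_records(SAMPLE)
person = records["@I1@"]

# User-defined tags start with an underscore; a receiving program
# that does not know them will typically skip these lines.
custom = [tag for _, tag, _ in person["lines"] if tag.startswith("_")]

# A SOUR line whose value is a record number points at another record.
cited = [value for _, tag, value in person["lines"]
         if tag == "SOUR" and value in records]

print(person["type"], custom, cited)
```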
You need to be aware that the GEDCOM standard is not perfect.
For one thing, not all the data fields are specified precisely in
the GEDCOM specifications. Next, not all the programmers of the various genealogy
programs interpreted the specifications in exactly the same manner. For instance,
your present genealogy program might be perfectly happy with a birth date listed
as, "after 1847 but before 1852." However, once that information is exported in
a GEDCOM file and then imported into a different program, the birth date may say
something else. The receiving program may expect exact dates and not be able to
handle anything that says "after" or "before," especially not both in the same statement.
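To make the mismatch concrete, here is a small Python sketch of three imaginary importers reading the same date line. GEDCOM 5.5 writes a range such as "after 1847 but before 1852" as "BET 1847 AND 1852". The three functions are hypothetical and do not reproduce the behavior of any particular genealogy program; they only illustrate how differently a receiving program might treat the same value.

```python
import re

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def strict_import(value):
    """Accept only an exact 'day month year' date; anything else stays blank."""
    m = re.fullmatch(r"(\d{1,2}) ([A-Za-z]{3}) (\d{4})", value.strip())
    if m is None or m.group(2).upper() not in MONTHS:
        return None
    month = MONTHS.index(m.group(2).upper()) + 1
    return (int(m.group(3)), month, int(m.group(1)))


def first_date_import(value):
    """Keep the first year found and silently drop everything else."""
    m = re.search(r"\d{4}", value)
    return int(m.group()) if m else None


def range_aware_import(value):
    """Understand year-only ranges written as BET..AND, AFT, or BEF."""
    text = value.strip()
    m = re.fullmatch(r"BET (\d{4}) AND (\d{4})", text)
    if m:
        return ("between", int(m.group(1)), int(m.group(2)))
    m = re.fullmatch(r"(AFT|BEF) (\d{4})", text)
    if m:
        return ("after" if m.group(1) == "AFT" else "before", int(m.group(2)))
    return None


for importer in (strict_import, first_date_import, range_aware_import):
    print(importer.__name__, "->", importer("BET 1847 AND 1852"))
```

Only the last importer keeps both ends of the range; the first loses the date entirely, and the second quietly turns a range into a single year.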
Typically, the receiving program simply leaves the line blank. Sadly, one or two
genealogy programs will accept the first date found on the line and then will disregard
any further information.
Another problem is that not all genealogy programs have
the same ideas about databases. One program may have only one field for "occupation,"
assuming that every person on the face of the earth never, ever changed careers.
Another genealogy program may have the ability to record multiple occupations during
the person's lifetime. When transferring data via GEDCOM from the more powerful
program to the simpler one, some of these occupations will be lost. These are a
couple of simple examples; you can find numerous other inconsistencies when moving
data between dissimilar programs.
Another limitation is the fact that the present
GEDCOM standard was created before the popularity of multimedia. You can transfer
textual data, such as names, dates, and locations rather well in GEDCOM. However,
transferring scanned images, sound clips, and movies from one genealogy program
to another is almost impossible to accomplish via GEDCOM files. The present GEDCOM
implementation can point to the location of multimedia files on a hard drive. In
theory, this should suffice. However, in my experience of moving data around in
many genealogy programs, I have rarely seen multimedia files handled properly.
There
is another problem with translating from one program to another: that of data integrity.
Translating from one program's database to GEDCOM is sort of the same as translating
from one spoken language to another. The basics work, but subtleties and details
sometimes do not translate well. Then, when translating to the third language (the
receiving genealogy program's database), more translation losses creep in. I well
remember reading a technical manual some years ago that had been written in Japanese
and then translated into Chinese. At a later date, the Chinese version was translated
into English. The resultant English manual was barely readable. The same may happen
with translating a database from Program A into GEDCOM and then from GEDCOM into
Program B.
A new method of transferring data between different genealogy programs
was announced some time ago by Wholly Genes Software. Their GenBridge technology reads
data from one program directly into a second program without requiring a "double
translation" via GEDCOM. The result is a much more accurate transfer process. However,
very few genealogy developers have adopted GenBridge. To date, the only programs I
can think of that offer it are The Master Genealogist and Family Tree Super Tools
(both produced by Wholly Genes), The Pocket Genealogist, and GedStar Pro.
Despite all the shortcomings, GEDCOM is still a simple
and somewhat effective method of transferring genealogy data from one program to
another. Most of the data will transfer properly, and then there are easy ways of
reviewing the data to look for errors. The names, dates, and locations normally
transfer correctly. Text, events, notes, and source citations may not always work
perfectly. The exact problems encountered will depend upon the two genealogy programs
involved. Most modern genealogy programs will create an error log of GEDCOM data
imported but not understood by the receiving program. You can read that log file
to see what the program detected as inconsistent, then manually go in and fix the
errors. While tedious, this is still a lot better than re-keying everything!
Two and a half years ago a new GEDCOM standard was proposed that is to be based upon
XML, a markup language that is popular on the World Wide Web. This new standard
should greatly improve data transfer accuracy. See http://www.familysearch.org/GEDCOM/GedXML60.pdf
for details.
However, don't look for this new GEDCOM 6.0 any time soon. It has been a proposal
for more than two and a half years, and nothing has happened in that time. Older
versions of GEDCOM have been around for more than fifteen years, and only minor
improvements have been made in that time. I expect that GEDCOM 6.0 will not appear
in genealogy programs for several more years, if ever.
As an interesting side note:
a program called "gedify" claims to be able to convert GEDCOM 6.0 files to the older
GEDCOM 5.5 standard so that older genealogy programs can read data created with
the new format. There doesn't seem to be much need for this program as no one is
yet creating GEDCOM 6.0 files! You can read more about gedify at http://savannah.nongnu.org/projects/gedify.
I noticed that the pages there have not been updated in a long time.
I offer this article as a non-technical explanation of GEDCOM plus some commentary on its use.
For more details and for technical explanations of the inner workings of GEDCOM,
I would suggest that you read the following:
The GEDCOM Standard Release 5.5: http://homepages.rootsweb.com/~pmcbride/gedcom/55gctoc.htm
GEDCOM 6.0 XML proposal: http://www.familysearch.org/GEDCOM/GedXML60.pdf
Introduction to GEDCOM: http://web.ukonline.co.uk/nigel.battysmith/gedinfo.html
GEDCOM 101 by Jan McClintock: http://www.leisterpro.com/doc/Articles/GEDCOM101.html
GEDCOM Usage Guide: http://www.cmis.csiro.au/Graham.Williams/personal/gedcom.html
Is GEDCOM Dead? by Beau Sharbrough: http://www.rootsworks.com/genart13.htm