Matching Discussion

Matching and false identifications

Version of November 21, 2012

The basic matching scheme

MAST has a simple procedure for matching the same objects on the sky across two catalogs ("crossmatching"). Typically, the catalogs are the KIC and some Catalog A. The approach is to take each object in the KIC and compare its J2000 coordinates with those of all objects in Catalog A via an automated cone search. Next we take all matches within a prescribed matching radius (given in the table on the Explanations & Caveats page) and rank them by angular separation from the KIC object's coordinates. Any secondary, tertiary, etc. ranked objects are rejected at this step of the matching. The procedure is then repeated from Catalog A against entries in the KIC, and again only primary matches (identifications) within the search radius are retained. In our present implementation we demand that the matches be mutually 1:1; that is, if the closest match for KIC1234 is CatA_5678, then the converse must be true as well. All matches retrieved from our Target Search form rely on this 1:1 matching standard. Note that while there is no guarantee that all the matches are "correct", the chances of false or duplicate matches are greatly reduced by the matching and reverse-matching steps.
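The mutual 1:1 rule above can be illustrated with a short Python sketch. It is a brute-force toy built on our own assumptions, not MAST's pipeline: each catalog is a plain dictionary mapping a hypothetical object id to (RA, Dec) in degrees, separations are computed with the haversine formula, and the matching radius is given in arcseconds. A production crossmatch would use a cone search or a spatial index rather than comparing every pair.

```python
import math

ARCSEC_PER_RADIAN = math.degrees(1.0) * 3600.0


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec between two (RA, Dec) positions in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return 2.0 * math.asin(math.sqrt(min(1.0, h))) * ARCSEC_PER_RADIAN


def nearest_within(src, targets, radius_arcsec):
    """Map each source id to its closest target id inside the radius, or None if there is none."""
    best = {}
    for sid, (ra, dec) in src.items():
        # (separation, target id) for every target, then keep those inside the radius
        hits = [(angular_sep_arcsec(ra, dec, tra, tdec), tid)
                for tid, (tra, tdec) in targets.items()]
        hits = [h for h in hits if h[0] <= radius_arcsec]
        # keep only the primary (closest) candidate; secondary, tertiary, ... are dropped
        best[sid] = min(hits)[1] if hits else None
    return best


def mutual_matches(kic, cat_a, radius_arcsec):
    """Keep a pair only if each object is the other's closest candidate (the 1:1 rule)."""
    forward = nearest_within(kic, cat_a, radius_arcsec)   # KIC -> Catalog A
    backward = nearest_within(cat_a, kic, radius_arcsec)  # Catalog A -> KIC
    return {k: a for k, a in forward.items()
            if a is not None and backward.get(a) == k}


# Toy example: K1 and K2 both see X as their closest candidate,
# but X's closest KIC object is K2, so only (K2, X) survives.
kic = {"K1": (10.0, 0.0), "K2": (10.0 + 0.6 / 3600, 0.0)}
cat_a = {"X": (10.0 + 0.5 / 3600, 0.0)}
print(mutual_matches(kic, cat_a, radius_arcsec=1.0))  # {'K2': 'X'}
```

In the toy example K1 lies 0.5" from X and K2 only 0.1", so the reverse step picks K2 and K1 is left unmatched even though X is inside its search radius.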

When these steps have been carried out for all non-KIC catalogs, the results are stored in a database table we call the Kepler Colors Table. The Target Search tool allows users to retrieve all results from this table. In the CasJobs implementation the Colors Table is called keplerObjectSearchWithColors.

Alert users of the CasJobs tool will notice that there are two CasJobs tables for matching between the KIC and GALEX catalogs. One is KGGoldStandard, to which the 1:1 matching criterion applies. The second, KGMatch, gives secondary, tertiary, etc. matches in both directions, KIC to GALEX and GALEX to KIC. KGMatch also includes columns giving the numbers of secondary matches and reverse matches. The intent is to give CasJobs users alternative choices that are particularly relevant for comparisons to GALEX, a survey that is shallower (does not reach as faint) than the others. MAST decided not to provide such secondary tables for crossmatches to other catalogs. We made this choice, first, because the other catalogs extend to fainter magnitudes and therefore have more reliable identifications with respect to a common catalog (the KIC). Second, such tables would add a bewildering array of additional columns to the Target Search retrieval pages that most users do not need and that would likely interfere with their work.
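For comparison, a table in the spirit of KGMatch keeps every candidate within the radius instead of only the primary one. The sketch below is our own illustration, not the CasJobs implementation; the column names (src, match, rank, sep_arcsec, n_matches) are hypothetical and are not the actual KGMatch schema. Running it once in each direction (KIC to GALEX and GALEX to KIC) gives the forward and reverse candidate lists.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec between two (RA, Dec) positions in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0


def ranked_matches(src, targets, radius_arcsec):
    """List every target within the radius of each source, closest first (rank 1 = primary)."""
    rows = []
    for sid, (ra, dec) in src.items():
        # sort all (separation, target id) pairs, then keep those inside the radius
        hits = sorted((angular_sep_arcsec(ra, dec, tra, tdec), tid)
                      for tid, (tra, tdec) in targets.items())
        hits = [h for h in hits if h[0] <= radius_arcsec]
        for rank, (sep, tid) in enumerate(hits, start=1):
            rows.append({"src": sid, "match": tid, "rank": rank,
                         "sep_arcsec": sep, "n_matches": len(hits)})
    return rows


# Toy example, GALEX -> KIC direction: X has two KIC candidates inside 1".
kic = {"K1": (10.0, 0.0), "K2": (10.0 + 0.6 / 3600, 0.0)}
galex = {"X": (10.0 + 0.5 / 3600, 0.0)}
for row in ranked_matches(galex, kic, radius_arcsec=1.0):
    print(row)
```

Here X gets two rows: K2 at rank 1 (about 0.1") and K1 at rank 2 (about 0.5"), each carrying n_matches = 2. A strict 1:1 table would keep only the first of these.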

As with the GALEX survey, we adopted a conservative matching radius of 1" for objects in our imported Sloan (SDSS/DR9) catalog. However, we occasionally find objects out to 2" or so that should be physical matches but are not in our Colors Table. Even so, we have not provided a second table for SDSS in CasJobs: first, because unlike GALEX, the SDSS goes fainter than the KIC (by more than 2 magnitudes), and second, because only a small portion of the Kepler FOV is covered by this survey.

Notes on false identifications

The matching of objects between two different astronomical catalogs cannot be 100% reliable, for a variety of reasons. On the one hand, a comparatively small number of matches to the correct counterpart objects can be inadvertently missed (for example, the secondary identifications referred to above may be the correct ones). On the other hand, apparent dual associations can be made between two KIC objects and an object in another catalog, even though neither KIC object has a formal match. These are subtle points, so we will elaborate.

False and missed identifications can still occur with a 1:1 matching criterion. Consider, first, that any catalog may contain two entries with a very small angular separation between them. One of these may or may not be an image artifact. Where artifacts are listed, crossmatching them to a secondary catalog will obviously lead to questionable results. Second, one catalog may not go as deep as another, so an object that is faint in the shallower survey is hard to discern and the project pipeline may decide not to extract it. It may then not show up in that catalog at all, or if it does, its coordinates may be inaccurate. Third, the matched catalogs are constructed from a montage of individual observations of overlapping areas of the sky. Under these conditions the same object can occasionally be assigned inaccurate coordinates. As discussed below, if the positional errors are larger than the catalog's criterion for rejecting duplicate matches, then duplicate associations can be found between objects listed in the KIC and another catalog.

Here are two examples of how matches can go wrong:
Consider first the example given in Part 2 of the CasJobs GOhelp page. It describes two apparent GALEX objects that match a particular KIC object, KIC7434250, if the 1:1 match condition is ignored. In this case the FUV and NUV GALEX detections have each been assigned coordinates near the same KIC object, but these coordinates are inaccurate because in both observations the object lies near the edge of the field, where coordinates are often inaccurate. In fact, in this particular case the coordinates of one or both detections are far enough off that the GALEX catalog falsely lists them as two separate GALEX objects. Because our matching requires 1:1 matches, no match is made for KIC7434250, and our Colors Table gives no GALEX match for it. This is an example of a missed match, and it will be lost to researchers searching for it with the Target Search form. Note that this form does not give the distances of objects from their secondary matches, so the investigator has no clue that a match has been missed. (The solution is to use the more liberal KGMatch table in CasJobs. This table does not require 1:1 matches, so the apparent dual matches can be discovered.)

The second example is a close double association that leads to a possible ambiguity but not to a missed object. If one searches on the coordinates (292.2301245, 37.589047) with the search radius set to at least 0.03 arcminutes, the retrieval page shows two KIC objects (rows 1 and 3) and one UKIRT object (row 2). This anomaly again arises from the requirement for 1:1 matching. The UKIRT object is very close to both KIC objects, so it cannot be matched uniquely to either of them. Our procedure therefore lists the UKIRT object as an independent object, probably incorrectly, and the association with the KIC objects is ambiguous. The alert user can then make the appropriate decisions. Note that it is also possible that one of the two KIC objects is false. However, this cannot be demonstrated because the two KIC entries have different KIC r magnitudes. For now, the correct association for this UKIRT object remains in limbo.
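Both examples share one signature: an object with more than one candidate within the matching radius in the other catalog. A user holding positions from two catalogs could flag such cases with a sketch like the following. The function name flag_crowded and the toy coordinates are hypothetical; they do not reproduce the actual KIC, GALEX, or UKIRT entries discussed above, and the radius is in arcseconds.

```python
import math
from collections import defaultdict


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec between two (RA, Dec) positions in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0


def flag_crowded(cat1, cat2, radius_arcsec):
    """Return (ids in cat1, ids in cat2) with more than one candidate within the radius in the other catalog."""
    n1 = defaultdict(int)  # candidate counts for cat1 objects
    n2 = defaultdict(int)  # candidate counts for cat2 objects
    for id1, (ra1, dec1) in cat1.items():
        for id2, (ra2, dec2) in cat2.items():
            if angular_sep_arcsec(ra1, dec1, ra2, dec2) <= radius_arcsec:
                n1[id1] += 1
                n2[id2] += 1
    return ({i for i, n in n1.items() if n > 1},
            {j for j, n in n2.items() if n > 1})


# GALEX-like case: one KIC object with two nearby detections (hypothetical positions).
kic_a = {"KIC_A": (292.0, 37.0)}
galex = {"G1": (292.0, 37.0 + 0.4 / 3600), "G2": (292.0, 37.0 - 0.5 / 3600)}
print(flag_crowded(kic_a, galex, radius_arcsec=1.0))  # ({'KIC_A'}, set())

# UKIRT-like case: one UKIRT object midway between two KIC objects.
kic_b = {"K1": (292.0, 37.0), "K2": (292.0, 37.0 + 1.0 / 3600)}
ukirt = {"U": (292.0, 37.0 + 0.5 / 3600)}
print(flag_crowded(kic_b, ukirt, radius_arcsec=1.0))  # (set(), {'U'})
```

The first call mimics the GALEX case (one KIC object, two nearby detections) and the second mimics the UKIRT case (one UKIRT object between two KIC objects). Flagged objects are the ones worth checking against additional catalogs or a KGMatch-style table.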

These and other examples require judgment calls by the user, usually settled by more information. One way of resolving these difficult cases is to see whether matches are found in additional catalogs. The take-away is that matches between catalogs may require simple follow-up: they are not written in stone. Users should investigate such cases and be aware that their judgments carry an element of risk.