The HST Pipeline: What Processing Was Done to My Data?

The calibration of raw HST observations involves a number of data processing steps. Here is an insider’s look at creating calibrated data products for the MAST archive.

Matthew Burger

The HST Calibration Pipeline

Before data are archived in MAST, they pass through the data calibration pipeline. The pipeline itself is implemented as workflows in HTCondor, with the in-house Open Workflow Layer (OWL) managing the jobs that are submitted for processing. A typical workflow for a new science association is shown in Figure 1. The process names and descriptions are detailed in Table 1.

Figure 1: Workflow for WFC3 dataset id8f23030. The steps proceed from bottom to top.

Process Name   Description
dan_receipt    Science data receipt
INGEST         Archive raw exposure
2FITS          Convert raw data format to FITS
RF             Determine best reference files
BC             Before-Calibration checks
CA             Calibration step
MD             MultiDrizzle/AstroDrizzle image combination
AC             After-Calibration checks
INGEST_SCI     Archive calibrated science data
PVW            Create preview files
INGEST_PVW     Archive preview files
CL             Cleanup

Table 1: New science calibration pipeline steps

Record of Pipeline Steps Taken

The steps taken by the calibration pipeline are recorded in several of the FITS files delivered by MAST. As an example, take the ACS association J9CV58020, made from the exposures J9CV58TZQ and J9CV58U1Q. In the primary header of the raw data files (j9cv58tzq_raw.fits and j9cv58u1q_raw.fits), there is a section of keywords under the header "/ CALIBRATION SWITCHES: PERFORM, OMIT, COMPLETE". A series of keyword/value pairs follows:

DQICORR = 'PERFORM '           / data quality initialization                    
ATODCORR= 'OMIT    '           / correct for A to D conversion errors           
BLEVCORR= 'PERFORM '           / subtract bias level computed from overscan img 
BIASCORR= 'PERFORM '           / Subtract bias image                            
FLSHCORR= 'OMIT    '           / post flash correction                          
CRCORR  = 'OMIT    '           / combine observations to reject cosmic rays     
EXPSCORR= 'PERFORM '           / process individual observations after cr-reject
SHADCORR= 'OMIT    '           / apply shutter shading correction               
PCTECORR= 'PERFORM '           / cte correction                                 
DARKCORR= 'PERFORM '           / Subtract dark image                            
FLATCORR= 'PERFORM '           / flat field data                                
PHOTCORR= 'PERFORM '           / populate photometric header keywords           
RPTCORR = 'OMIT    '           / add individual repeat observations             
DRIZCORR= 'PERFORM '           / drizzle processing
                         

These keywords mark the calibration steps that are to be performed, are omitted, or have been completed (none have yet been completed in the raw file). The final associated products (j9cv58020_drz.fits and j9cv58020_drc.fits) carry the following values for the same series of keywords:

DQICORR = 'COMPLETE'           / data quality initialization                    
ATODCORR= 'OMIT    '           / correct for A to D conversion errors           
BLEVCORR= 'COMPLETE'           / subtract bias level computed from overscan img 
BIASCORR= 'COMPLETE'           / Subtract bias image                            
FLSHCORR= 'OMIT    '           / post flash correction                          
CRCORR  = 'OMIT    '           / combine observations to reject cosmic rays     
EXPSCORR= 'COMPLETE'           / process individual observations after cr-reject
SHADCORR= 'OMIT    '           / apply shutter shading correction               
DARKCORR= 'COMPLETE'           / Subtract dark image                            
FLATCORR= 'COMPLETE'           / flat field data                                
PHOTCORR= 'COMPLETE'           / populate photometric header keywords           
DRIZCORR= 'COMPLETE'           / drizzle processing

Now the steps that had been marked "PERFORM" are "COMPLETE." Note that the lists of pipeline steps in the raw and drz/drc files are not identical, because the calibration steps for associations and for their member exposures may differ.
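
These switches are easy to check programmatically. The sketch below, which assumes astropy is installed and that the raw file from the example above has been downloaded from MAST, prints each switch with its value and header comment:

# A minimal sketch, assuming astropy and a local copy of the raw file.
# The switch list follows the example header above; not every switch
# appears in every instrument's files, so missing keys are skipped.
from astropy.io import fits

SWITCHES = ["DQICORR", "ATODCORR", "BLEVCORR", "BIASCORR", "FLSHCORR",
            "CRCORR", "EXPSCORR", "SHADCORR", "PCTECORR", "DARKCORR",
            "FLATCORR", "PHOTCORR", "RPTCORR", "DRIZCORR"]

with fits.open("j9cv58tzq_raw.fits") as hdul:
    header = hdul[0].header   # the switches live in the primary header
    for key in SWITCHES:
        if key in header:
            print(f"{key:8s}= {header[key]:9s}/ {header.comments[key]}")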

The full processing logs for the data files are found in the trailer (trl) files (e.g., j9cv58020_trl.fits, j9cv58tzq_trl.fits, and j9cv58u1q_trl.fits). Each log is stored as a binary table in the first data extension of the FITS file. These log files are somewhat harder to read, as the excerpt below shows:

('>>>>>>>>>>>>>>>>>>>> /ifs/archive/ops/hst/store/HSTDP-2015_3-160126//bin/exposure_times.py j9cv58020  <<<<<<<<<<<<<<<<<<<<')
('2016074184159-I-INFO-Start ------ Exposure Times Updater for j9cv58020 ------')
('2016074184159-I-INFO-exposure_times-UPDATE_EXPOSURE_TIMES is False. No update necessary.')
('2016074184159-I-INFO-  End ------ Exposure Times Updater Nothing to do for j9cv58020 ------')
('FYI: exit( 0 )')
('2016074183916-I-INFO DP_open_newobs: -------------- Data Partitioning Started: j9cv58tzq ------------ (46927151842912)')
('2016074183916-I-INFO DP_open_newobs: Partitioning from POD file: lz_bdc5_067_0000102262_j9cv58tzq (46927151842912)')
('2016074183916-I-INFO Search osf is ????????-p???????????????????????.j9cv58tzq_______________________________-acs-???-???? (46927151842912)')
('2016074183916-I-INFO Search osf is ????????-????????????????????????.j9cv58tzq_______________________________-acs-???-????')

They do, however, provide more detailed information, including timestamps of the form "YYYYDDDHHMMSS", where YYYY is the year, DDD is the day of year, HH the hour, MM the minute, and SS the second.
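
With astropy, the log table can be dumped in a few lines; the sketch below also decodes one of the timestamps using Python's day-of-year (%j) format code. The name of the log column is not given here, so the sketch simply takes the first column of the binary table (an assumption):

# Sketch: print the first few trailer-file log lines and decode a timestamp.
from astropy.io import fits
from datetime import datetime

with fits.open("j9cv58020_trl.fits") as hdul:
    log = hdul[1].data               # log lines are in the first data extension
    colname = log.columns.names[0]   # assume the first column holds the text
    for row in log[colname][:5]:
        print(row)

# "2016074184159" -> year 2016, day 074 (March 14), 18:41:59 UT
when = datetime.strptime("2016074184159", "%Y%j%H%M%S")
print(when.isoformat())              # 2016-03-14T18:41:59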

The version of the pipeline software used to process the data is given in keywords in the primary header of the FITS files. The section of keywords with the header "/ DIAGNOSTIC KEYWORDS" contains the following keywords and values:

OPUS_VER= 'COMMON 2017_2     ' / data processing software system version        
CAL_VER = '3.4.1 (20-April-2017)' / CALSTIS code version                        
PROCTIME=   5.793008144676E+04 / Pipeline processing time (MJD)                 
CSYS_VER= 'hstdp-2017.2'       / Calibration software system version id

"OPUS_VER" and "CSYS_VER" give the current Data Management System (DMS) build version, and "CAL_VER" gives the version of CALSTIS used to process this dataset (STIS association od0m070b0). The keyword “PROCTIME” gives the modified Julian date of the last processing (26 June 2017, 01:57:17 UT for this dataset). Information regarding the current DMS build can be found at https://archive.stsci.edu/hst/processing_status/. Note that these keywords are standard for the current DMS build, but may be different for older archival data.

Are you interested in more details or do you have additional questions? Please send an email to the MAST Helpdesk at archive@stsci.edu.

Monitoring the Processing of HST Data: A New Pipeline Processing Status Page

The HST Instrument and Data Management Teams at STScI are continually working to improve the quality of the data products we provide at MAST. Read on to learn how you can stay up to date on the status of our data reprocessing efforts.

Anton Koekemoer

The release of the HST Online Cache has dramatically improved retrieval times for HST data compared to the older On-The-Fly Reprocessing (OTFR) system, since the most up-to-date versions of the data are now generally available on disk, having been processed with the latest software deliveries and calibration reference files. When new calibration reference files or software versions are delivered to the HST pipeline, the relevant datasets are reprocessed to take advantage of these improvements. The current status of any such reprocessing, along with the software versions delivered to the pipeline and related information, is presented on a new HST Pipeline Processing page (https://archive.stsci.edu/hst/processing_status/). The page also lists previous software deliveries to the pipeline, which can be useful for determining what improvements in data processing are included in the latest pipeline software.

Have questions about our data reprocessing efforts or the HST Pipeline Processing status page? Please contact us at archive@stsci.edu for help.

The TESS Input Catalog and Candidate Target List at MAST

The Transiting Exoplanet Survey Satellite (TESS) mission team has released two catalogs of astronomical sources for the community to explore in preparation for the Guest Investigator program.

The Transiting Exoplanet Survey Satellite (TESS) (https://tess.gsfc.nasa.gov/) will be conducting a nearly all-sky photometric survey over two years, with a core mission goal to discover small transiting exoplanets orbiting nearby bright stars. It will obtain 30-minute cadence observations of all objects in the TESS fields of view, along with 2-minute cadence observations of 200,000 to 400,000 selected stars. The choice of which stars to observe at the 2-min cadence is driven by the need to detect small transiting planets, which leads to the selection of primarily bright, cool dwarfs.

Essential to the success of the TESS mission is the TESS Input Catalog (TIC), a catalog of luminous sources on the sky. The purpose of the TIC is to enable the selection of optimal targets for the planet transit search, to enable calculation of flux contamination in the TESS aperture for each target, and to provide the most accurate stellar and planetary radii estimates, which determine which targets receive mission-funded photometric and spectroscopic follow-up. From the TIC we construct a prioritized Candidate Target List (CTL), from which the final 2-min cadence target stars are selected. Priorities are established via a scheme that emphasizes detectability of small planets. The TIC and CTL are essential for the community to select potential additional targets through the Guest Investigator program.

The first publicly available versions of the TIC and CTL have been staged on MAST (https://archive.stsci.edu/tess/tic_search.html), and a value-added version of the CTL has also been staged on Filtergraph (https://filtergraph.com/tess_ctl). The full TIC includes some 596 million objects (470 million point sources, 125 million extended sources, and about 1 million objects from specially curated lists, such as an M-dwarf list and a bright-star list). The TIC includes positions and TESS magnitudes for all objects, and the CTL includes a prioritized ranking of putative dwarfs in the TIC. The value-added CTL also provides best-estimate stellar properties, such as effective temperatures, radii, masses, and many other quantities.
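
In addition to the web forms linked above, the TIC can be queried from a script. The sketch below assumes the astroquery package is installed; the target name and search radius are arbitrary examples, and the column names follow the TIC schema:

# Sketch: cone search of the TIC through astroquery's MAST interface.
from astroquery.mast import Catalogs

# Radius is in degrees; "Pi Mensae" is just an example target.
results = Catalogs.query_object("Pi Mensae", radius=0.02, catalog="TIC")
print(results[["ID", "ra", "dec", "Tmag"]][:5])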

A paper providing comprehensive documentation of the manner in which the TIC and CTL were constructed, and providing complete documentation of the various algorithms employed to estimate the stellar properties, is now posted on arXiv (https://arxiv.org/abs/1706.00495).

Questions about how to access the TIC and CTL can be directed to MAST at archive@stsci.edu or posted on the MAST TESS Forum.

New and Updated High-Level Science Products available from MAST

The MAST team has recently released a number of new high-level science products on topics ranging from solar system to galaxy cluster science. Read more about these programs and how to access their data products from MAST.

Jonathan Hargis

The High-Level Science Products Team at MAST has recently released a number of new High-Level Science Products (HLSPs) to the astronomical community. These include:

Questions about HLSP data products can be sent to MAST at archive@stsci.edu or posted on the MAST Forum.