Classes

class src.exomercat.catalogs.Catalog#

Bases: object

A base class for managing exoplanet catalogs.

This class provides a foundation for handling various exoplanet catalogs. It includes methods for downloading, reading, and processing catalog data, as well as standardizing and validating the information.

download_catalog(url, filename, local_date, timeout=None)#

Download a catalog from a given URL and save it to a file.

This method attempts to download the catalog for the specified date. If the file already exists locally, it uses that file. If downloading fails, it attempts to use the most recent local copy.

Parameters:
  • self (Catalog) – An instance of class Catalog

  • url (str) – The URL from which to download the catalog.

  • filename (str) – The name of the file to save the catalog to.

  • local_date (str) – The date of the catalog to download.

  • timeout (float) – The maximum amount of time to wait for the download to complete. Default is None.

Returns:

The path to the downloaded file.

Return type:

Path
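
The fallback order described above can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper name `resolve_catalog_file`, the `fetch` callable, and the `<filename><date>.csv` naming scheme are assumptions made for the example, and dates are assumed to be ISO-formatted so that they sort chronologically.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_catalog_file(
    directory: Path,
    filename: str,
    local_date: str,
    fetch: Callable[[Path], None],
) -> Optional[Path]:
    """Return the file for `local_date`, downloading it if needed,
    otherwise fall back to the newest local copy (hypothetical helper)."""
    target = directory / f"{filename}{local_date}.csv"
    if target.exists():
        return target  # a copy for this date is already on disk
    try:
        fetch(target)  # hypothetical downloader, expected to write `target`
        return target
    except OSError:
        # Download failed: use the most recent dated copy, if any.
        # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
        candidates = sorted(directory.glob(f"{filename}*.csv"))
        return candidates[-1] if candidates else None
```

Passing the downloader in as a callable keeps the sketch free of network code and makes the fallback easy to exercise offline.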

check_input_columns()#

Check if the loaded data contains all required columns.

Parameters:

self – An instance of class Catalog

Returns:

A comma-separated string of missing column names, if any.

Return type:

str
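
As a rough illustration of the return value, the check amounts to comparing the loaded column names with the required ones. The function name and arguments below are hypothetical, and the real method works on the catalog's dataframe rather than on plain lists.

```python
def missing_columns(present, required):
    """Return the required column names that are absent, joined by commas."""
    present_set = set(present)
    return ",".join(name for name in required if name not in present_set)
```

An empty string means that every required column was found.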

check_column_dtypes()#

Check if the data types of columns match the expected types.

Parameters:

self – An instance of class Catalog

Returns:

A comma-separated string of columns with mismatched data types, if any.

Return type:

str

find_non_ascii()#

Identify non-ASCII characters in string columns of the dataset.

Parameters:

self – An instance of class Catalog

Returns:

A dictionary where keys are column names and values are lists of row indices containing non-ASCII characters.

Return type:

dict
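
The shape of the returned dictionary can be illustrated with a small stand-alone sketch. It assumes rows stored as plain dictionaries instead of a dataframe, and it takes the list of string columns as an explicit argument, which the real method does not.

```python
def find_non_ascii(rows, string_columns):
    """Map each column name to the row indices holding non-ASCII text."""
    result = {}
    for column in string_columns:
        hits = [
            index
            for index, row in enumerate(rows)
            if isinstance(row.get(column), str) and not row[column].isascii()
        ]
        if hits:  # only report columns that actually contain such characters
            result[column] = hits
    return result
```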

read_csv_catalog(file_path_str)#

Read a CSV file into the catalog’s data attribute.

Parameters:
  • self (Catalog) – An instance of class Catalog

  • file_path_str (Union[Path, str]) – The path to the CSV file to read.

Returns:

None

Return type:

None

Raises:

ValueError – If reading of the .csv file fails.

keep_columns()#

Retain only specified columns in the dataframe.

Parameters:

self (Catalog) – An instance of class Catalog

Returns:

None

Return type:

None

identify_brown_dwarfs()#

Identify possible brown dwarfs in the dataframe based on naming conventions.

This method marks potential brown dwarfs by setting the ‘letter’ column to ‘BD’ and handles special cases for certain objects.

Parameters:

self (Catalog) – An instance of class Catalog

Returns:

None

Return type:

None

replace_known_mistakes()#

Replace known errors in the dataframe based on predefined rules.

This method applies corrections specified in the ‘replacements.ini’ file, including dropping rows, replacing values, and standardizing names and coordinates.

Parameters:

self (Catalog) – An instance of class Catalog

Returns:

None

Return type:

None

remove_theoretical_masses()#

Remove theoretical masses from the dataframe.

This process is catalog-dependent, so it is not implemented in this base class. Subclasses should override this method to provide the specific implementation for their respective catalog.

Parameters:

self – An instance of class Catalog

Returns:

None

Return type:

None

Raises:

NotImplementedError – This method is not implemented in the base class.
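
The override pattern used by this and the other catalog-dependent methods can be sketched as follows. The class names are hypothetical stand-ins, not the library's classes.

```python
class BaseCatalog:
    """Hypothetical stand-in for the base class."""

    def remove_theoretical_masses(self) -> None:
        # Catalog-specific step: the base class refuses to guess.
        raise NotImplementedError("override in a catalog-specific subclass")


class NoTheoreticalMasses(BaseCatalog):
    """Hypothetical subclass for a catalog without theoretical masses."""

    def remove_theoretical_masses(self) -> None:
        return None  # deliberate no-op, like the placeholder overrides
```

Calling the method on the base class fails loudly, so a subclass that forgets to override a required step is noticed immediately.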

handle_reference_format()#

Standardize the reference format for various parameters.

This process is catalog-dependent, so it is not implemented in this base class. Subclasses should override this method to provide the specific implementation for their respective catalog.

Parameters:

self (Catalog) – An instance of class Catalog

Returns:

None

Return type:

None

Raises:

NotImplementedError – This method is not implemented in the base class.

standardize_catalog()#

Standardize the dataframe columns and values.

This process is catalog-dependent, so it is not implemented in this base class. Subclasses should override this method to provide the specific implementation for their respective catalog.

Parameters:

self (Catalog) – An instance of class Catalog

Returns:

None

Return type:

None

Raises:

NotImplementedError – This method is not implemented in the base class.

make_errors_absolute()#

Make all error columns absolute values.

This method converts every error column of the dataframe to its absolute value in place; nothing is returned. The columns that are modified are: p_max, a_max, e_max, i_max, r_max, msini_max, mass_max, p_min, a_min, e_min, i_min, r_min, msini_min, mass_min.

Parameters:

self (Catalog) – An instance of class Catalog

Returns:

None

Return type:

None
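
A minimal sketch of the operation on a single row stored as a dictionary; the real method acts on whole dataframe columns, and the helper below is only illustrative.

```python
PARAMETERS = ("p", "a", "e", "i", "r", "msini", "mass")
# The fourteen error columns listed above: <parameter>_max and <parameter>_min.
ERROR_COLUMNS = [f"{name}_{suffix}" for suffix in ("max", "min") for name in PARAMETERS]


def make_errors_absolute(row):
    """Replace every numeric error value in `row` by its absolute value."""
    for column in ERROR_COLUMNS:
        value = row.get(column)
        if isinstance(value, (int, float)):
            row[column] = abs(value)  # NaN stays NaN
```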

remove_impossible_values()#

Remove impossible or nonsensical values from various parameters in the dataset.

This method performs the following operations:

  1. Sets negative values of ‘p’, ‘a’, ‘e’, ‘i’, ‘r’, ‘msini’, and ‘mass’ to NaN.

  2. Sets the corresponding ‘_min’ and ‘_max’ values to NaN when the main value is negative.

  3. Sets eccentricity (‘e’) values greater than 1 to NaN, along with their ‘_min’ and ‘_max’ values.

Parameters:

self (Catalog) – An instance of class Catalog

Returns:

None

Return type:

None
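
The three rules above can be sketched for a single row held in a dictionary. This is an illustration under that simplifying assumption, not the dataframe-based implementation.

```python
import math

PARAMETERS = ("p", "a", "e", "i", "r", "msini", "mass")


def remove_impossible_values(row):
    """Blank out negative values, and eccentricities above 1, with their errors."""
    for name in PARAMETERS:
        value = row.get(name)
        if value is None or math.isnan(value):
            continue  # nothing to validate
        if value < 0 or (name == "e" and value > 1):
            # The main value and both error bounds are discarded together.
            for column in (name, f"{name}_min", f"{name}_max"):
                row[column] = math.nan
```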

standardize_name_host_letter()#

Standardize the ‘name’, ‘host’, and ‘letter’ columns in the data.

This function performs the following operations:

  1. Standardizes the ‘name’ column using the Utils.standardize_string function.

  2. Fills empty ‘host’ values with the corresponding ‘name’ value.

  3. Cleans and standardizes the ‘host’ column, removing specific suffixes and applying Utils.standardize_string.

  4. Refines the ‘alias’ column by removing specific characters and standardizing the values.

  5. Assigns the ‘letter’ column based on specific conditions from the ‘name’ column.

Parameters:

self (Catalog) – An instance of class Catalog

Returns:

None

Return type:

None

assign_status()#

Assign a status to each planet based on the status column.

This process is catalog-dependent, so it is not implemented in this base class. Subclasses should override this method to provide the specific implementation for their respective catalog.

Parameters:

self (Catalog) – An instance of class Catalog

Returns:

None

Return type:

None

Raises:

NotImplementedError – If the function is called directly from the base class.

check_mission_tables(table_path_str)#

Check and update the dataframe against mission tables.

This function compares the objects in the dataframe with entries in the specified mission table. If a match is found, it updates the object’s status, discovery method, and aliases. It also handles cases where the object name or host name matches entries in the mission table.

Parameters:
  • self (Catalog) – An instance of class Catalog

  • table_path_str (str) – The file path to the mission table (CSV format)

Returns:

None

Return type:

None

Raises:
  • FileNotFoundError – If the specified table file does not exist

  • pd.errors.EmptyDataError – If the table file is empty

fill_binary_column()#

Fills the binary column of the dataframe with appropriate values.

This function performs the following operations:

  1. Initializes the ‘binary’ column with empty strings.

  2. Cleans up the ‘host’ column by removing planet names if present.

  3. Identifies and marks circumbinary systems (AB).

  4. Identifies and marks simple binary systems (A, B, C, N, S).

  5. Cleans the ‘host’ column by removing extra spaces and characters.

  6. Handles NASA-specific cases using the ‘cb_flag’ column if present.

  7. Handles OEC-specific cases using the ‘binaryflag’ column if present.

The function uses regular expressions to identify different binary system patterns in the ‘name’ and ‘host’ columns. It updates the ‘binary’ column accordingly and adjusts the ‘host’ column as needed.

Parameters:

self (Catalog) – An instance of class Catalog

Returns:

None

Return type:

None

create_catalogstatus_string(string)#

Creates a new column in the dataframe which is a concatenation of the ‘catalog’ and ‘status’ columns.

This function generates a new column in the dataframe by combining the ‘catalog’ and ‘status’ columns. The new column’s name is specified by the ‘string’ parameter. This method can be used to create either an “original” status column (as provided by the catalog) or a “checked” status column (as determined by EMC after cross-checking with KOI/K2 catalogs).

Parameters:
  • self (Catalog) – An instance of class Catalog

  • string (str) – The name of the new column to be created

Returns:

None

Return type:

None

make_standardized_alias_list()#

Standardize and consolidate alias lists for each host in the catalog.

The method groups the data by host, combines all aliases for each host, removes invalid values, standardizes the aliases, and removes duplicates. The resulting ‘alias’ column will contain a comma-separated string of unique, standardized aliases for each host.

Parameters:

self (Catalog) – An instance of class Catalog

Returns:

None

Return type:

None
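
The grouping and deduplication can be sketched with plain dictionaries. Several details here are assumptions made for the example: the set of invalid placeholders, the whitespace normalization standing in for Utils.standardize_string, and the first-seen ordering of the output.

```python
from collections import defaultdict

INVALID = {"", "nan", "None"}  # assumed placeholders for missing aliases


def consolidate_aliases(rows):
    """Give every row of a host the same deduplicated, comma-separated alias list."""
    per_host = defaultdict(list)
    for row in rows:
        for alias in row["alias"].split(","):
            alias = " ".join(alias.split())  # stand-in for Utils.standardize_string
            if alias not in INVALID and alias not in per_host[row["host"]]:
                per_host[row["host"]].append(alias)
    for row in rows:
        row["alias"] = ",".join(per_host[row["host"]])
```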

convert_coordinates()#

Convert the ra and dec columns of the dataframe to decimal degrees.

This process is catalog-dependent, so it is not implemented in this base class. Subclasses should override this method to provide the specific implementation for their respective catalog.

Parameters:

self (Catalog) – An instance of class Catalog

Returns:

None

Return type:

None

Raises:

NotImplementedError – This method is not implemented in the base class.

fill_nan_on_coordinates()#

Fill missing values in the ‘ra’ and ‘dec’ columns of the dataframe with NaN.

Parameters:

self (Catalog) – An instance of class Catalog

Returns:

None

Return type:

None

print_catalog(filename)#

Print the catalog data to a CSV file.

This method saves the catalog’s dataframe to a CSV file at the specified location.

Parameters:

filename (Union[str, Path]) – The path where the CSV file will be saved.

Returns:

None

Return type:

None

class src.exomercat.epic.Epic#

Bases: Catalog

A class representing the EPIC (Ecliptic Plane Input Catalog) catalog.

This class inherits from the Catalog base class and provides specific functionality for handling and processing data from the EPIC catalog. It includes methods for standardizing the catalog data, converting coordinates, handling reference formats, and assigning status to entries.

standardize_catalog()#

Standardize the EPIC catalog data.

This method performs the following operations:

  1. Sets the catalog name and catalog-specific names.

  2. Renames columns to standard names used across all catalogs.

  3. Filters data based on the default_flag.

  4. Adds and modifies columns such as Kepler_host, letter, and alias.

  5. Converts discovery methods to standard format.

Parameters:

self (Epic) – The instance of the Epic class.

Returns:

None

Return type:

None

convert_coordinates()#

Convert coordinates to decimal degrees.

This method is a placeholder and does not perform any operations, as the EPIC catalog already has coordinates in decimal degrees.

Parameters:

self (Epic) – An instance of class Epic

Returns:

None

Return type:

None

Note:

This function is not necessary for the EPIC catalog, as the coordinates are already in decimal degrees.

remove_theoretical_masses()#

Remove theoretical masses from the dataframe.

This method is a placeholder and does not perform any operations, as it’s not necessary for the EPIC catalog.

Parameters:

self (Epic) – The instance of the Epic class.

Returns:

None

Return type:

None

Note:

This function is not necessary for the Epic catalog.

handle_reference_format()#

Standardize reference format and create URL columns for parameters.

This method performs the following operations:

  1. Adds URL columns for each parameter (e, mass, msini, i, a, p, r).

  2. Extracts and standardizes URLs from the reference column.

  3. Replaces null values with empty strings in URL columns.

Parameters:

self (Epic) – The instance of the Epic class.

Returns:

None

Return type:

None

assign_status()#

Assign status to each entry based on the ‘disposition’ column.

This method maps the values in the ‘disposition’ column to standard status values:

  • ‘CONFIRMED’ for confirmed planets

  • ‘CANDIDATE’ for candidate planets

  • ‘FALSE POSITIVE’ for false positives and refuted planets

The method also logs the updated status counts.

Parameters:

self (Epic) – The instance of the Epic class.

Returns:

None

Return type:

None

class src.exomercat.koi.Koi#

Bases: Catalog

A class representing the Kepler Objects of Interest (KOI) catalog.

This class inherits from the Catalog class and provides specific functionality for handling and processing data from the KOI catalog. It includes methods for standardizing the catalog data, converting coordinates, and managing KOI-specific attributes.

standardize_catalog()#

Standardize the Kepler Objects of Interest catalog data.

This method performs the following operations:

  1. Selects relevant columns from the raw data.

  2. Creates new columns: KOI, KOI_host, Kepler_host, KIC_host.

  3. Generates a ‘letter’ column for planet designation.

  4. Creates ‘alias’ and ‘aliasplanet’ columns with various identifiers.

  5. Renames and creates standard columns like ‘name’, ‘disposition’, and ‘discoverymethod’.

  6. Retains only the standardized columns in the final dataset.

Parameters:

self (Koi) – The instance of the Koi class.

Returns:

None

Return type:

None

convert_coordinates()#

Convert right ascension (RA) and declination (Dec) from string format to decimal degrees.

This method performs the following operations:

  1. Replaces missing values in RA and Dec columns with empty strings.

  2. Converts RA and Dec from string format (HH:MM:SS) to decimal degrees using astropy’s SkyCoord.

  3. Assigns NaN to entries where conversion is not possible (empty strings).

Parameters:

self (Koi) – An instance of class Koi

Returns:

None

Return type:

None
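
The arithmetic behind the conversion can be shown without astropy. The sketch below replaces SkyCoord with manual sexagesimal parsing, and it assumes that RA is given in hours and Dec in degrees; the function name is hypothetical.

```python
import math


def sexagesimal_to_degrees(text, hours=False):
    """Convert 'HH:MM:SS' (hours=True) or 'DD:MM:SS' to decimal degrees."""
    text = text.strip()
    if not text:
        return math.nan  # empty strings cannot be converted
    sign = -1.0 if text.startswith("-") else 1.0
    first, minutes, seconds = (float(part) for part in text.lstrip("+-").split(":"))
    value = first + minutes / 60.0 + seconds / 3600.0
    # One hour of right ascension corresponds to 15 degrees.
    return sign * value * (15.0 if hours else 1.0)
```

The sign is read from the string rather than from the first number, so that a declination such as -00:30:00 keeps its sign.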

class src.exomercat.eu.Eu#

Bases: Catalog

A class representing the Exoplanet Encyclopaedia (EU) catalog.

This class inherits from the Catalog class and provides specific functionality for handling and processing data from the Exoplanet Encyclopaedia. It includes methods for standardizing the catalog, removing theoretical masses, assigning status to planets, handling reference formats, and converting coordinates.

standardize_catalog()#

Standardize the Exoplanet Encyclopaedia catalog data.

This method performs the following operations:

  1. Sets the catalog name.

  2. Replaces “None” and “nan” values.

  3. Renames columns to standard names used across all catalogs.

  4. Adds new columns such as catalog_name, catalog_host, and reference.

  5. Processes and standardizes the alias information.

  6. Converts discovery methods to a standard format.

Parameters:

self (Eu) – An instance of class Eu

Returns:

None

Return type:

None

remove_theoretical_masses()#

Remove theoretical masses and radii from the dataframe.

This method sets mass, msini, and radius values (including their error ranges) to NaN where the MASSPROV or RADPROV columns indicate theoretical values.

Parameters:

self (Eu) – An instance of the Eu class.

Returns:

None

Return type:

None

assign_status()#

Assign status to each entry based on the ‘planet_status’ column.

This method maps the values in the ‘planet_status’ column to standard status values:

  • ‘CONFIRMED’ for confirmed planets

  • ‘CANDIDATE’ for candidate, unconfirmed, or controversial planets

  • ‘FALSE POSITIVE’ for retracted planets

The method also logs the updated status counts.

Parameters:

self (Eu) – An instance of the Eu class.

Returns:

None

Return type:

None
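
The mapping can be pictured as a lookup table followed by a count. The raw labels used as keys below are assumptions about how the ‘planet_status’ values are spelled, and returning an empty string for unknown labels is a choice made for this sketch only.

```python
from collections import Counter

# Assumed raw labels (lower-cased) mapped to the standard status values.
STATUS_MAP = {
    "confirmed": "CONFIRMED",
    "candidate": "CANDIDATE",
    "unconfirmed": "CANDIDATE",
    "controversial": "CANDIDATE",
    "retracted": "FALSE POSITIVE",
}


def assign_status(planet_status):
    """Return the standard status for one raw 'planet_status' value."""
    return STATUS_MAP.get(planet_status.strip().lower(), "")


def status_counts(values):
    """Count the standard statuses, e.g. for a log message."""
    return Counter(assign_status(value) for value in values)
```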

handle_reference_format()#

Create placeholder URL references for each parameter.

Since the Exoplanet Encyclopaedia does not provide specific references, this method creates a placeholder ‘eu’ URL for each non-null, finite parameter value.

Parameters handled: e, mass, msini, i, a, p, r

Parameters:

self (Eu) – An instance of class Eu

Returns:

None

Return type:

None

convert_coordinates()#

Convert coordinates to decimal degrees.

This method is a placeholder and does not perform any operations, as the Exoplanet Encyclopaedia catalog already has coordinates in decimal degrees.

Parameters:

self (Eu) – An instance of class Eu

Returns:

None

Return type:

None

Note:

It is not necessary for Eu, as the coordinates are already in decimal degrees.

class src.exomercat.oec.Oec#

Bases: Catalog

The Oec class represents the Open Exoplanet Catalogue.

This class inherits from the Catalog class and provides specific functionality for handling and processing data from the Open Exoplanet Catalogue. It includes methods for downloading, standardizing, and manipulating the catalog data.

download_catalog(url, filename, local_date, timeout=None)#

Download the Open Exoplanet Catalogue from a given URL and save it to a file.

This method performs the following operations:

  1. Checks if a local file for the given date already exists.

  2. If not, attempts to download the catalog from the URL.

  3. Converts the downloaded XML file to CSV format.

  4. If download fails, attempts to use the most recent local copy.

  5. Handles various error scenarios and provides appropriate logging.

Parameters:
  • self (Oec) – An instance of class Oec

  • url (str) – The URL from which to download the catalog.

  • filename (str) – The name of the file to save the catalog to.

  • local_date (str) – The date of the catalog to download (format: YYYY-MM-DD).

  • timeout (float) – The maximum amount of time to wait for the download to complete. Default is None.

Returns:

The path to the downloaded file.

Return type:

Path

standardize_catalog()#

Standardize the Open Exoplanet Catalogue data.

This method performs the following operations:

  1. Sets the catalog name.

  2. Renames columns to standard names used across all catalogs.

  3. Adds new columns such as host, catalog_name, and catalog_host.

  4. Separates mass into msini and mass based on the masstype column.

  5. Cleans host names.

  6. Converts discovery methods to a standard format.

Parameters:

self (Oec) – An instance of class Oec

Returns:

None

Return type:

None

remove_theoretical_masses()#

Remove theoretical masses from the dataframe.

This method is a placeholder and does not perform any operations, as the Open Exoplanet Catalogue does not include theoretical masses.

Parameters:

self (Oec) – The instance of the Oec class.

Returns:

None

Return type:

None

Note:

This function is not necessary for the Oec catalog, since it does not have theoretical masses.

assign_status()#

Assign status to each entry based on the ‘list’ column.

This method assigns status as follows:

  • “CONFIRMED” if ‘Confirmed’ is in the list.

  • “CANDIDATE” if ‘Controversial’ is in the list or for Kepler Objects of Interest.

  • “FALSE POSITIVE” if ‘Retracted’ is in the list.

The method also logs the updated status counts.

Parameters:

self (Oec) – An instance of class Oec

Returns:

None

Return type:

None

handle_reference_format()#

Standardize the reference format for various parameters.

This method creates a ‘_url’ column for each parameter (e, mass, msini, i, a, p, r), setting the value to ‘oec’ for non-null, finite values and an empty string otherwise.

Parameters:

self (Oec) – An instance of class Oec

Returns:

None

Return type:

None
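
For a single row held in a dictionary, the rule reads as follows. The helper name and the `label` argument are hypothetical; the real method fills dataframe columns.

```python
import math

PARAMETERS = ("e", "mass", "msini", "i", "a", "p", "r")


def add_url_columns(row, label="oec"):
    """Set '<parameter>_url' to `label` for finite values, '' otherwise."""
    for name in PARAMETERS:
        value = row.get(name)
        usable = isinstance(value, (int, float)) and math.isfinite(value)
        row[f"{name}_url"] = label if usable else ""
```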

convert_coordinates()#

Convert right ascension (RA) and declination (Dec) from string format to decimal degrees.

This method performs the following operations:

  1. Replaces missing values in RA and Dec columns with empty strings.

  2. Converts RA and Dec from string format (HH:MM:SS) to decimal degrees using astropy’s SkyCoord.

  3. Assigns NaN to entries where conversion is not possible (empty strings).

Parameters:

self (Oec) – An instance of class Oec

Returns:

None

Return type:

None

class src.exomercat.nasa.Nasa#

Bases: Catalog

A class representing the NASA Exoplanet Archive catalog.

This class inherits from the Catalog base class and provides specific functionality for processing and standardizing data from the NASA Exoplanet Archive. It includes methods for initializing the catalog, standardizing the data format, handling references, assigning planet status, and removing theoretical masses and radii.

standardize_catalog()#

Standardize the NASA Exoplanet Archive catalog data.

This method performs the following operations:

  1. Sets the catalog name.

  2. Renames columns to standard names used across all catalogs.

  3. Adds new columns such as catalog_name and catalog_host.

  4. Splits best mass into mass and msini.

  5. Processes and standardizes the alias information.

  6. Converts discovery methods to a standard format.

Parameters:

self (Nasa) – An instance of class Nasa

Returns:

None

Return type:

None

sort_bestmass_to_mass_or_msini()#

Sort the ‘bestmass’ values into either ‘mass’ or ‘msini’ columns.

This method categorizes the ‘bestmass’ values based on the ‘bestmass_provenance’:

  • If ‘Mass’, values are placed in the ‘mass’ columns.

  • If ‘Msini’, values are placed in the ‘msini’ columns.

  • If ‘M-R relationship’ or ‘Msin(i)/sin(i)’, both ‘mass’ and ‘msini’ are set to NaN.

  • For any other provenance, a RuntimeError is raised.

Parameters:

self (Nasa) – An instance of the Nasa class

Raises:

RuntimeError – If ‘bestmass’ is not a mass or an ‘msini’

Returns:

None

Return type:

None
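
The branching on the provenance can be sketched for a single value. Returning a `(mass, msini)` pair and leaving the unused slot as NaN are assumptions of this example; the real method writes to the dataframe columns, including the error columns.

```python
import math


def sort_bestmass(bestmass, provenance):
    """Return a (mass, msini) pair according to the provenance label."""
    if provenance == "Mass":
        return bestmass, math.nan
    if provenance == "Msini":
        return math.nan, bestmass
    if provenance in ("M-R relationship", "Msin(i)/sin(i)"):
        return math.nan, math.nan  # derived value: keep neither
    raise RuntimeError(f"unexpected bestmass provenance: {provenance!r}")
```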

handle_reference_format()#

Standardize the reference format for various parameters.

This method performs the following for each parameter (e, mass, msini, i, a, p, r):

  1. Ensures a ‘_url’ column exists for each parameter.

  2. Extracts the bibcode from the reference URL.

  3. Standardizes the URL format.

  4. Sets empty strings for null values.

Parameters:

self (Nasa) – The instance of the Nasa class.

Returns:

None

Return type:

None

assign_status()#

Assign status to each entry in the catalog.

For the NASA Exoplanet Archive, all entries are set to “CONFIRMED” by default.

Parameters:

self (Nasa) – An instance of the Nasa class.

Returns:

None

Return type:

None

convert_coordinates()#

Convert coordinates to decimal degrees.

This method is a placeholder and does not perform any operations, as the NASA Exoplanet Archive catalog already has coordinates in decimal degrees.

Parameters:

self (Nasa) – An instance of class Nasa

Returns:

None

Return type:

None

Note:

It is not necessary for Nasa, as the coordinates are already in decimal degrees.

remove_theoretical_masses()#

Remove theoretical masses and radii from the catalog.

This method removes values for mass, msini, and radius (including their error ranges and URLs) where the corresponding URL contains “Calculated”, indicating a theoretical value.

Parameters:

self (Nasa) – An instance of the Nasa class.

Returns:

None

Return type:

None
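
A sketch of the rule for one row stored as a dictionary, assuming the radius columns use the ‘r’ prefix as elsewhere in this reference. The helper name is hypothetical.

```python
import math


def remove_calculated(row):
    """Drop mass, msini and radius values whose reference marks them as calculated."""
    for name in ("mass", "msini", "r"):
        if "Calculated" in row.get(f"{name}_url", ""):
            for column in (name, f"{name}_min", f"{name}_max"):
                row[column] = math.nan
            row[f"{name}_url"] = ""  # the reference is removed as well
```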

class src.exomercat.emc.Emc#

Bases: Catalog

The Emc (Exo-MerCat) class is a subclass of Catalog that represents the Exo-MerCat catalog. It contains methods for processing, merging, and managing exoplanet data from various sources. This class provides functionality for data cleaning, standardization, and analysis of exoplanet information.

convert_coordinates()#

Convert the right ascension (RA) and declination (Dec) columns of the dataframe to decimal degrees.

This method is a placeholder and does not perform any operations, as the EMC already has coordinates in decimal degrees.

Parameters:

self (Emc) – The instance of the Emc class.

Returns:

None

Return type:

None

Note:

This function is not necessary as the EMC already has coordinates in decimal degrees.

alias_as_host()#

Check if any aliases are labeled as hosts in some other entry and standardize the host name.

This function takes the alias column of a dataframe and checks if any of the aliases are labeled as hosts in some other entry. If an alias is labeled as a host, it changes the host to be that of the original host. It then adds all aliases of both hosts into one list for each row. It logs results into “Logs/alias_as_host.txt”.

Parameters:

self (Emc) – The instance of the Emc class.

Returns:

None

Return type:

None

check_binary_mismatch(keyword, tolerance=0.0002777777777777778)#

Check for binary mismatches in the dataframe.

This function checks if there are multiple values of binary for a given system (identified by name and letter). It attempts to standardize the binary values and flags complex systems or those with coordinate disagreements.

If there are multiple values of binary for a given system (identified by name and letter), it tries to replace the null or S-type binaries with the value of another non-null entry in that system. If all entries have null or S-type binaries, it replaces them with ‘S-type’. If there are multiple non-null values of binary, it flags this as a complex system and does not try to correct anything. It also flags systems where coordinates do not agree within a tolerance (by default 1 arcsecond).

Parameters:
  • self (Emc) – The instance of the Emc class.

  • keyword (str) – The keyword to search for in the dataframe.

  • tolerance (float) – The tolerance to use for coordinate comparisons, in degrees (default is 1/3600, i.e. 1 arcsecond)

Returns:

None

Return type:

None

Prepare columns for the search of the main identifier.

This function prepares various columns including “hostbinary”, “aliasbinary”, “main_id”, “list_id”, “main_id_ra”, “main_id_dec”, “angular_separation”, and “main_id_provenance” for the search of the main identifier.

Parameters:

self (Emc) – The instance of the Emc class.

Returns:

None

Return type:

None

fill_mainid_provenance_column(keyword)#

Fills the ‘main_id_provenance’ column with the provided keyword if ‘main_id_provenance’ is empty and ‘main_id’ is not empty for each relevant index.

This function is used to track the source of the main identifier for each entry in the catalog. It only updates entries where the main_id_provenance is currently empty but a main_id exists, avoiding overwriting any existing provenance information.

Parameters:
  • self (Emc) – The instance of the Emc class.

  • keyword (str) – The keyword to fill the ‘main_id_provenance’ column with.

Returns:

None

Return type:

None
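
The condition can be illustrated on rows stored as dictionaries, with empty strings marking missing values. This is a sketch of the rule, not the dataframe code.

```python
def fill_mainid_provenance(rows, keyword):
    """Tag rows that have a main_id but no provenance yet."""
    for row in rows:
        if row["main_id"] != "" and row["main_id_provenance"] == "":
            row["main_id_provenance"] = keyword  # existing tags are never overwritten
```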

Searches for host stars in SIMBAD using the specified column.

This function takes a column name as an argument and searches for the host star in that column in SIMBAD. It then fills in the main_id, list_id, main_id_ra, and main_id_dec columns with information from SIMBAD if it finds a match.

Parameters:
  • self (Emc) – The instance of the Emc class.

  • typed_id (str) – The name of the column that contains the host star to search for (host or hostbinary)

Returns:

None

Return type:

None

Searches for the main ID of each object in the specified column using SIMBAD.

This function performs the following steps:

  1. Creates a DataFrame of aliases from the specified column

  2. Queries SIMBAD for each alias

  3. Updates the main dataframe with the SIMBAD information

Parameters:
  • self (Emc) – The instance of the Emc class.

  • column (str) – The name of the column that contains the host star aliases to search for (e.g., ‘alias’ or ‘aliasbinary’)

Returns:

None

Return type:

None

get_host_info_from_simbad()#

Queries SIMBAD for the main identifier based on the host star name.

Parameters:

self (Emc) – The instance of the Emc class.

Returns:

None

Return type:

None

get_coordinates_from_simbad(tolerance=0.0002777777777777778)#

Prepares a query for SIMBAD, executes the query and then merges the results with the original dataframe.

Parameters:
  • self (Emc) – The instance of the Emc class.

  • tolerance (float) – The tolerance for the query in degrees (default is 1 arcsecond)

Returns:

None

Return type:

None

get_host_info_from_tic()#

Retrieves host information from the TIC (TESS Input Catalog) for hosts with TIC identifiers.

This function performs the following steps:

  1. Extracts unique host star names that are TIC identifiers.

  2. Queries the TIC for each of these names.

  3. Merges the obtained information with the original dataframe.

Parameters:

self (Emc) – The instance of the Emc class.

Returns:

None

Return type:

None

get_coordinates_from_tic(tolerance=0.0002777777777777778)#

Retrieves coordinates from the TESS Input Catalog (TIC) for objects without main IDs.

This function performs the following steps:

  1. Prepares a query for the TIC using objects without main IDs.

  2. Executes the query to retrieve matching TIC entries.

  3. Merges the results with the original dataframe.

  4. Updates the main_id, coordinates, and other relevant fields for matched objects.

Parameters:
  • self (Emc) – The instance of the Emc class.

  • tolerance (float) – The tolerance for the query in degrees (default is 1 arcsecond)

Returns:

None

Return type:

None

check_coordinates(tolerance=0.0002777777777777778)#

Checks for mismatches in the RA and DEC coordinates of a given host.

This function is used for targets that cannot rely on SIMBAD or TIC MAIN_ID because the query was unsuccessful. It groups all entries with the same host name, then checks if any of those entries have a RA or DEC that is more than a given tolerance away from the mode value for that group. If so, it logs information about those mismatched values to a file called “check_coordinates.txt”.

Parameters:
  • self (Emc) – The instance of the Emc class.

  • tolerance (float) – The tolerance for coordinate mismatch, in degrees. Default is 1 arcsecond (1/3600 degrees).

Returns:

None

Return type:

None
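
The comparison against the mode can be sketched with the standard library. The helper below is hypothetical: it returns the mismatching names per host instead of writing a log file, and it compares RA and Dec independently, as the description above suggests.

```python
from collections import defaultdict
from statistics import mode


def find_coordinate_mismatches(rows, tolerance=1 / 3600):
    """Return {host: [names]} for rows far from their host's modal coordinates."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["host"]].append(row)
    mismatches = defaultdict(list)
    for host, members in groups.items():
        ra_mode = mode([row["ra"] for row in members])
        dec_mode = mode([row["dec"] for row in members])
        for row in members:
            if abs(row["ra"] - ra_mode) > tolerance or abs(row["dec"] - dec_mode) > tolerance:
                mismatches[host].append(row["name"])
    return dict(mismatches)
```

The default tolerance of 1/3600 degrees corresponds to 1 arcsecond.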

replace_old_new_identifier(identifier, new_identifier, binary=None)#

Replaces the old identifier with the new identifier in the dataframe.

Parameters:
  • self (Emc) – The instance of the Emc class.

  • identifier (str) – The old identifier

  • new_identifier (str) – The new identifier

  • binary (str) – The binary string

Returns:

The explanation string for logging purposes

Return type:

str

polish_main_id()#

Polish the main_id column in the data by removing planet/binary letters.

This function iterates over the unique values in the main_id column of the data and performs the following operations:

  1. Check for planet letters in the main_id column. If a planet letter is found, it tries to look for the corresponding star in SIMBAD and replaces the main_id with the star’s main_id.

  2. Check for binary letters in the main_id column. If a binary letter is found, it checks if the binary value is already in the binary column. It then replaces the main_id with the modified identifier.

  3. All of the above operations are logged in a text file named “Logs/polish_main_id.txt”.

Parameters:

self (Emc) – The instance of the Emc class.

Returns:

None

Return type:

None

fill_missing_main_id()#

Fill missing values in main_id related columns with data from other columns.

This function performs the following operations:

  1. Fills empty ‘main_id_provenance’ with values from ‘catalog’.

  2. Fills empty ‘main_id’ with values from ‘host’.

  3. Fills empty ‘main_id_ra’ with values from ‘ra’, converting to float.

  4. Fills empty ‘main_id_dec’ with values from ‘dec’, converting to float.

  5. Creates ‘angular_separation’ by concatenating ‘catalog’ and ‘angsep’.

Parameters:

self (Emc) – The instance of the Emc class.

Returns:

None

Return type:

None

check_same_host_different_id()#

Checks if there are instances where the same host has multiple SIMBAD main IDs.

Parameters:

self (Emc) – The instance of the Emc class.

Returns:

None

Return type:

None

check_same_coords_different_id(tolerance=0.0002777777777777778)#

Checks if there are any instances where the same host has multiple SIMBAD main IDs.

This might happen in case of very close stars or binary stars. The user should check in Logs/post_main_id_query_checks.txt that the two main ids and coordinates are indeed different stars. Otherwise, the user can force a replacement.
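
The default tolerance, 0.0002777777777777778 degrees, is 1/3600 of a degree (1 arcsecond). The following sketch shows one way such a coordinate check could work, using the haversine formula on plain tuples. The function names and the `(main_id, ra, dec)` row layout are assumptions made for illustration; the real method operates on the catalog data and writes to the log file.

```python
import math
from itertools import combinations


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def same_coords_different_id(rows, tolerance=1 / 3600):
    """Return pairs of distinct main IDs closer than `tolerance` degrees.

    `rows` is an iterable of (main_id, ra_deg, dec_deg) tuples (hypothetical layout).
    """
    conflicts = []
    for (id1, ra1, dec1), (id2, ra2, dec2) in combinations(rows, 2):
        if id1 != id2 and angular_separation_deg(ra1, dec1, ra2, dec2) <= tolerance:
            conflicts.append((id1, id2))
    return conflicts
```

The pairwise loop is quadratic in the number of rows, which is acceptable for a sketch but would likely need a spatial index or a vectorized match for a full catalog.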

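Parameters:
  • self (Emc) – The instance of the Emc class.

  • tolerance (float) – The angular separation tolerance in degrees for considering coordinates as the same. Default is 1 arcsecond (1/3600 degrees).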

Returns:

None

Return type:

None

group_by_list_id_check_main_id()#

Groups the data by ‘list_id’ and checks for inconsistencies in ‘main_id’.

This function performs the following steps:

  1. Groups the data by the ‘list_id’ column.

  2. For each group, checks if the ‘list_id’ is not empty and if there are multiple unique ‘main_id’ values.

  3. If inconsistencies are found, it sets all ‘main_id’ values in the group to the first unique ‘main_id’.

  4. Logs details of any inconsistencies found.
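
The steps above map onto a pandas `groupby`. The sketch below is illustrative only: the function name is hypothetical, an empty `list_id` is assumed to be an empty string, and the log is returned as a list of strings rather than written to a file.

```python
import pandas as pd


def unify_main_id_by_list_id(data: pd.DataFrame):
    """Set every main_id in a list_id group to the first unique main_id."""
    data = data.copy()
    log = []
    for list_id, group in data.groupby("list_id"):
        if list_id == "":
            continue  # rows without a list_id are left untouched
        unique_ids = group["main_id"].unique()
        if len(unique_ids) > 1:
            # Inconsistency: keep the first main_id seen in the group.
            data.loc[group.index, "main_id"] = unique_ids[0]
            log.append(f"{list_id}: {list(unique_ids)} replaced with {unique_ids[0]}")
    return data, log
```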

Parameters:

self (Emc) – The instance of the Emc class.

Returns:

None

Return type:

None

post_main_id_query_checks(tolerance=0.0002777777777777778)#

Performs a series of checks after querying SIMBAD for main IDs.

This function executes three main checks:

  1. Checks for instances where the same host has different main IDs.

  2. Checks for cases where the same coordinates (within a specified tolerance) have different main IDs.

  3. Checks for situations where the same list ID has different main IDs.

The results of these checks are logged in the file ‘Logs/post_main_id_query_checks.txt’ for further analysis and review.

Parameters:
  • self (Emc) – The instance of the Emc class.

  • tolerance (float) – The angular separation tolerance in degrees for considering coordinates as the same. Default is 1 arcsecond (1/3600 degrees).

Returns:

None

Return type:

None

group_by_main_id_set_main_id_aliases()#

Groups the dataframe by main_id and combines alias and list_id columns into a single main_id_aliases column. This function consolidates all identifiers for each unique main_id.

Parameters:

self (Emc) – The instance of the Emc class.

Returns:

None

Return type:

None

cleanup_catalog()#

Cleans up the catalog by replacing 0 and infinity values with NaN for specific columns.

This function iterates through a set of columns (‘i’, ‘mass’, ‘msini’, ‘a’, ‘p’, ‘e’) and their corresponding ‘_min’ and ‘_max’ errors. It replaces any values that are exactly 0 or infinity with NaN (Not a Number). This helps to ensure that these extreme values don’t skew analyses or cause issues in later processing steps.
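
A minimal sketch of this replacement, assuming the error columns are named by appending `_min` and `_max` to the parameter name (for example `mass_min`) and that negative infinity is treated like positive infinity. The function name is hypothetical.

```python
import numpy as np
import pandas as pd

PARAMETERS = ["i", "mass", "msini", "a", "p", "e"]


def cleanup_sketch(data: pd.DataFrame) -> pd.DataFrame:
    """Replace exact 0 and +/-infinity with NaN in the listed columns."""
    data = data.copy()
    for param in PARAMETERS:
        for column in (param, f"{param}_min", f"{param}_max"):
            if column in data.columns:
                data[column] = data[column].replace([0, np.inf, -np.inf], np.nan)
    return data
```

Columns outside the list, such as a radius column, are not touched by this sketch.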

Parameters:

self (Emc) – The instance of the Emc class.

Returns:

None

Return type:

None

group_by_period_check_letter()#

Check for inconsistencies in the letter column and attempt to fix them.

This function performs the following steps:

  1. Groups the data by main_id and binary.

  2. For each group with multiple planets, it calculates an estimate of period (p) and semi-major axis (a). For each unique period (or semi-major axis if period is not available), it checks for inconsistencies in the letter column and attempts to fix them by standardizing the letter.

  3. Logs any inconsistencies and fixes to a file.

Parameters:

self (Emc) – An instance of the Emc class

Returns:

None

Return type:

None

static merge_into_single_entry(group, mainid, binary, letter, period_mismatch_flag=0, fallback_merge_flag=0)#

Merges multiple entries with the same main_id and letter into a single entry.

This function combines information from different catalogs for a specific exoplanet, selecting the best available data and resolving conflicts. It performs the following tasks:

  1. Creates a new entry with the given main_id, binary, and letter.

  2. Selects the most common host name.

  3. Saves catalog-specific names (NASA, TOI, EPIC, EU, OEC).

  4. Selects the best measurement for various parameters (i, mass, msini, r, a, p, e) based on the smallest relative error.

  5. Determines the status of the exoplanet.

  6. Selects the earliest discovery year and combines discovery methods.

  7. Combines aliases.

  8. Sets various flags for mismatches and duplicates.

  9. Selects the best source for coordinates (main_id_ra, main_id_dec).

  10. Logs warnings for multiple main_id_provenance and duplicate entries.

The final entry contains: the official SIMBAD ID and coordinates; the measurements that have the smallest relative error with the corresponding reference; the preferred name, the preferred status, the preferred binary letter (chosen as the most common in the group); year of discovery, method of discovery, and final list of aliases. The function then concatenates all of these entries together into a final catalog.

Parameters:
  • group (pd.DataFrame) – A pandas DataFrame containing the duplicate occurrences.

  • mainid (str) – The main identifier of the group

  • binary (str) – The binary identifier of the group

  • letter (str) – The letter identifier of the group

  • period_mismatch_flag (int) – Flag for period mismatch. Defaults to 0

  • fallback_merge_flag (int) – Flag for fallback merge. Defaults to 0

Returns:

A pandas Series corresponding to the merged single entry.

Return type:

pd.DataFrame

group_by_letter_check_period(verbose)#

Group the catalog by main_id, binary, and letter, then merge entries based on period or semi-major axis agreement.

This function processes the entire catalog to consolidate multiple entries for the same exoplanet. It performs the following steps:

  1. Groups the catalog by main_id, binary, and letter.

  2. For each group it calculates working period and semi-major axis values.

  3. Checks for agreement in period values. If periods agree, merges entries into a single entry. If periods disagree, keeps separate entries and logs the disagreement.

  4. If no period data, checks for agreement in semi-major axis values. If semi-major axes agree, merges entries into a single entry. If semi-major axes disagree, keeps separate entries and logs the disagreement.

  5. If neither period nor semi-major axis data available, merges all entries.

  6. Assigns merging_mismatch_flags: 0: Successful merge (period or semi-major axis agreement); 1: Disagreement in period or semi-major axis; 2: Fallback merge (no period or semi-major axis data)

  7. Creates a new catalog with the merged entries.

  8. Logs the merging process and any disagreements.
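
The flag assignment in steps 3 to 6 can be sketched as a small decision function. How "agreement" is measured is not stated here, so the sketch assumes a relative spread test with a hypothetical `tolerance` of 10%; the function names are also hypothetical.

```python
import math


def _finite(values):
    """Drop missing values (None or NaN)."""
    return [v for v in values if v is not None and not math.isnan(v)]


def _agree(values, tolerance):
    """Assumed agreement test: spread no larger than tolerance * largest value."""
    return max(values) - min(values) <= tolerance * max(values)


def merging_mismatch_flag(periods, axes, tolerance=0.1):
    """Return 0 (agreement), 1 (disagreement) or 2 (fallback merge)."""
    # Periods take precedence; semi-major axes are used only without period data.
    for values in (_finite(periods), _finite(axes)):
        if values:
            return 0 if _agree(values, tolerance) else 1
    return 2
```

With flag 0 or 2 the group would be merged into a single entry; with flag 1 the entries would stay separate and the disagreement would be logged.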

Parameters:
  • self (Emc) – An instance of the Emc class

  • verbose (bool) – If True, displays a progress bar during processing

Returns:

None

Return type:

None

select_best_mass()#

Selects the best mass estimate for each planet in the catalog.

This function determines whether to use the mass or minimum mass (msini) as the best mass estimate for each planet based on their relative errors. It performs the following steps:

  1. If MASSREL (relative error of mass) is greater than or equal to MSINIREL (relative error of msini), it uses msini as the best mass estimate.

  2. If MASSREL is less than MSINIREL, it uses mass as the best mass estimate.

  3. If both mass and msini are missing (NaN), it sets all best mass related fields to NaN or empty string.
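
For a single planet, the selection rule can be sketched as follows. The function name and the returned provenance label are hypothetical, and the two guards for a single missing value are an assumption added so that the sketch never prefers a missing measurement.

```python
import math


def select_best_mass_sketch(mass, mass_rel, msini, msini_rel):
    """Return (best_mass, provenance) for one planet."""
    if math.isnan(mass) and math.isnan(msini):
        return math.nan, ""  # step 3: nothing to choose from
    # Assumption: if only one of the two values exists, use it.
    if math.isnan(msini):
        return mass, "mass"
    if math.isnan(mass):
        return msini, "msini"
    # Steps 1 and 2: ties go to msini, as in the rule above.
    if mass_rel >= msini_rel:
        return msini, "msini"
    return mass, "mass"
```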

Parameters:

self (Emc) – An instance of the Emc class

Returns:

None

Return type:

None

set_exomercat_name()#

Creates the ‘exo-mercat_name’ column by joining the main_id, binary (if any), and letter.

Parameters:

self (Emc) – An instance of the class Emc

Returns:

None

Return type:

None

identify_misnamed_duplicates()#

Identifies potential misnamed duplicate planets in systems with multiple planets based on discrepancies in their estimated periods (p) or semi-major axes (a). Flags records where planets within the same system (main_id) have different periods or semi-major axes, which might indicate misnaming.

Parameters:

self (Emc) – An instance of the class Emc

Returns:

None

Return type:

None

keep_columns()#

Retain only specified columns in the dataframe and remove all others.

This function performs the following operations:

  1. Defines a list of columns to keep, including various exoplanet properties and metadata.

  2. Attempts to filter the dataframe to retain only these specified columns.

  3. If any specified column is missing from the dataframe, it raises a KeyError.

Parameters:

self (Emc) – An instance of Emc class

Returns:

None

Return type:

None

Raises:

KeyError – If any of the specified columns to keep are not present in the dataframe

remove_known_brown_dwarfs(local_date, print_flag)#

Remove objects with masses greater than 20 Jupiter masses (considered brown dwarfs) from the dataset.

This function performs the following operations:

  1. Identifies objects with mass or minimum mass (msini) greater than 20 Jupiter masses.

  2. Optionally saves these identified objects to CSV files.

  3. Removes the identified objects from the main dataset.

The mass threshold is applied as follows:

  • Uses ‘mass’ if available, otherwise uses ‘msini’.

  • If both are unavailable, treats the object as having zero mass (thus not removed).

  • Empty strings are treated as zero mass.
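
The threshold rules above can be sketched with pandas. This illustration returns the kept and removed rows instead of modifying the catalog or writing CSV files; the function name is hypothetical.

```python
import pandas as pd


def split_brown_dwarfs(data: pd.DataFrame, threshold: float = 20.0):
    """Split rows into (kept, removed) using the mass-then-msini rule."""
    # Empty strings and other non-numeric values become NaN.
    mass = pd.to_numeric(data["mass"], errors="coerce")
    msini = pd.to_numeric(data["msini"], errors="coerce")
    # Use mass if available, otherwise msini, otherwise zero.
    effective = mass.fillna(msini).fillna(0.0)
    is_brown_dwarf = effective > threshold
    return data[~is_brown_dwarf], data[is_brown_dwarf]
```

Note that under this reading a row with a measured mass below the threshold is kept even if its msini exceeds it, because mass takes precedence.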

Parameters:
  • self (Emc) – An instance of the Emc class

  • local_date (str) – A string representation of the current date, used for naming output files

  • print_flag (bool) – If True, saves the removed objects to CSV files

Returns:

None

Return type:

None

fill_row_update(local_date)#

Update the ‘row_update’ column in the DataFrame based on changes from the previous version.

This function performs the following operations:

  1. Uses the provided local_date as the update date.

  2. Checks for previous versions of the catalog.

  3. If previous versions exist, compares the current DataFrame with the most recent previous version.

  4. Updates the ‘row_update’ column for rows that have changed or are new.

  5. Retains the previous ‘row_update’ value for unchanged rows.

The function handles the following scenarios:

  • If no previous versions exist, all rows are considered new and updated with the current date.

  • If previous versions exist, only changed or new rows are updated with the current date.
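
A simplified sketch of the comparison, assuming rows are matched on a unique `exo-mercat_name` key and that both versions share the same columns. The function name, the `key` parameter, and passing the previous version as a DataFrame are assumptions; the real method locates previous catalog files itself.

```python
import pandas as pd


def fill_row_update_sketch(current, previous, local_date, key="exo-mercat_name"):
    """Stamp new or changed rows with local_date, keep old dates otherwise."""
    current = current.copy()
    current["row_update"] = local_date  # default: treat every row as new
    if previous is None:
        return current

    compare_cols = [c for c in current.columns if c not in (key, "row_update")]
    prev = previous.set_index(key)  # assumes the key is unique
    for idx, row in current.iterrows():
        name = row[key]
        if name not in prev.index:
            continue  # new row keeps local_date
        old = prev.loc[name]
        unchanged = all(
            (pd.isna(row[c]) and pd.isna(old[c])) or row[c] == old[c]
            for c in compare_cols
        )
        if unchanged:
            current.at[idx, "row_update"] = old["row_update"]
    return current
```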

Parameters:
  • self (Emc) – An instance of the Emc class

  • local_date (str) – The date to use for updating the ‘row_update’ column

Returns:

None

Return type:

None

save_catalog(local_date, postfix='')#

Saves the catalog to CSV files. It is saved to the ‘Exo-MerCat’ folder both as an exo-mercat.csv file and as an exo-mercatYYYY-MM-DD.csv file.

Parameters:
  • self (Emc) – An instance of the class Emc

  • local_date (str) – The date to save the catalog

  • postfix (str) – The postfix to add to the filename

Returns:

None

Return type:

None

class src.exomercat.utility_functions.UtilityFunctions#

Bases: object

A class that contains utility functions that can be used in other modules.

static folder_initialization()#

Initialize the directory structure for the exoplanet catalog processing.

Creates the following directories if they don’t exist: ‘Exo-MerCat/’; ‘InputSources/’; ‘StandardizedSources/’; ‘Logs/’.
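
A minimal sketch of idempotent directory creation with `pathlib`. The `base` parameter and the function name are hypothetical additions for illustration; the documented method takes no arguments.

```python
from pathlib import Path

FOLDERS = ("Exo-MerCat", "InputSources", "StandardizedSources", "Logs")


def folder_initialization_sketch(base="."):
    """Create the four working directories under `base` if they are missing."""
    created = []
    for name in FOLDERS:
        path = Path(base) / name
        # exist_ok=True makes repeated calls safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```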

Returns:

None

Return type:

None

static ping_simbad_vizier()#

Test the connection to SIMBAD and VizieR services.

Attempts to perform a simple query on both SIMBAD and VizieR services to check if they are accessible and responding.

Returns:

A string containing the status of both connection attempts.

Return type:

str

static get_common_nomenclature()#

Provide a mapping of astronomical constants and abbreviations.

Returns:

A dictionary mapping full names to abbreviated forms for various astronomical terms, constellations, and catalog prefixes.

Return type:

dict

static read_config()#

Read and parse the ‘input_sources.ini’ configuration file.

Returns:

A dictionary of the configuration parameters.

Return type:

dict

static read_config_replacements(section)#

Read and parse a specific section of the ‘replacements.ini’ configuration file.

Parameters:

section (str) – Specify which section of the replacements

Returns:

A dictionary containing the custom replacements

Return type:

dict

static standardize_string(name)#

Standardize the format of exoplanet and star names.

Applies various rules to correct common formatting inconsistencies in exoplanet and star naming conventions.

Parameters:

name (str) – Specify the string to standardize

Returns:

The standardized string

Return type:

str

static calculate_working_p_sma(group, tolerance)#

Calculate working period and semi-major axis values for a group of exoplanets.

Parameters:
  • group (pd.DataFrame) – The input DataFrame containing columns ‘p’ and ‘a’.

  • tolerance (float) – The tolerance factor used in calculations.

Returns:

The DataFrame with ‘working_p’ and ‘working_a’ values calculated.

Return type:

pd.DataFrame

static get_parameter(treeobject, parameter)#

Extract a parameter value from an XML ElementTree object.

Parameters:
  • treeobject (ElementTree.Element) – An ElementTree object.

  • parameter (str) – A string representing the name of an element in the XML file.

Returns:

A string containing the parameter value.

Return type:

str

static get_attribute(treeobject, parameter, attrib)#

Extract an attribute value from a specific element in an XML ElementTree object.

Parameters:
  • treeobject (ElementTree.Element) – An ElementTree object, which is the root of a parsed XML file.

  • parameter (str) – A string representing the name of an element in the XML file.

  • attrib (str) – A string representing one of that element’s attributes.

Returns:

A string containing the value of the attribute.

Return type:

str

static get_parameter_all(treeobject, parameter)#

Extract all occurrences of a parameter from an XML ElementTree object.

Parameters:
  • treeobject (ElementTree.Element) – An ElementTree object, which is the root of a parsed XML file.

  • parameter (str) – A string representing the name of an element in the XML file.

Returns:

A string containing all values in treeobject for the supplied parameter.

Return type:

str
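
The three XML helpers above can be approximated with the standard library's `xml.etree.ElementTree`. This is a sketch under assumptions: the sample XML is invented for illustration, missing elements yield an empty string, and multiple values are joined with a comma. The `_sketch` function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample data, only for demonstration.
SAMPLE = (
    "<planet>"
    "<name>Kepler-1 b</name><name>TrES-2 b</name>"
    '<mass errorminus="0.05" errorplus="0.06">1.2</mass>'
    "</planet>"
)


def get_parameter_sketch(treeobject, parameter):
    """Text of the first child element named `parameter`, or ""."""
    return (treeobject.findtext("./" + parameter) or "").strip()


def get_attribute_sketch(treeobject, parameter, attrib):
    """Value of attribute `attrib` on the first matching element, or ""."""
    element = treeobject.find("./" + parameter)
    return element.attrib.get(attrib, "") if element is not None else ""


def get_parameter_all_sketch(treeobject, parameter):
    """Comma-joined text of every descendant element named `parameter`."""
    elements = treeobject.findall(".//" + parameter)
    return ",".join(e.text.strip() for e in elements if e.text)


planet = ET.fromstring(SAMPLE)
```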

static convert_xmlfile_to_csvfile(file_path, output_file)#

Convert an XML file containing exoplanet data to a CSV file.

Parameters:
  • file_path (Union[Path, str]) – The file path of the XML file to be converted.

  • output_file – The file path of the CSV file to write.

Returns:

None

Return type:

None

static convert_discovery_methods(data)#

Convert the discovery methods in the DataFrame to standardized values.

Parameters:

data (pd.DataFrame) – The DataFrame containing the discovery methods.

Returns:

The DataFrame with the discovery methods converted.

Return type:

pd.DataFrame

static perform_query(service, query, uploads_dict=None)#

Perform a query using the given service and query.

Parameters:
  • service (object) – The service object used to perform the query.

  • query (str) – The query string.

  • uploads_dict (dict, optional) – A dictionary of uploads. Defaults to None.

Returns:

The result of the query as a DataFrame.

Return type:

pd.DataFrame

static calculate_angsep(table)#

Calculate angular separations between coordinates in a DataFrame.

Parameters:

table (pd.DataFrame) – A pandas DataFrame containing columns ‘ra’, ‘dec’, ‘ra_2’, ‘dec_2’.

Returns:

A modified DataFrame with the angular separation calculated and selected rows.

Return type:

pd.DataFrame

load_standardized_catalog(local_date)#

Load a standardized catalog file for a given date. If not found, it falls back to the most recent available version.

Parameters:
  • filename (str) – The base filename of the catalog.

  • local_date (str) – The date for which to load the catalog (format: YYYY-MM-DD).

Returns:

A DataFrame containing the loaded catalog data.

Return type:

pd.DataFrame

Raises:

ValueError – If no suitable catalog file can be found.

static print_progress_bar(iteration, total, prefix='', suffix='', length=50, fill='█')#

Print a progress bar to the console.
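
A sketch of how such a bar could be rendered, using the same argument names as the signature. It returns the string instead of printing it, and the exact layout (pipes, dashes, one decimal place) is an assumption.

```python
def render_progress_bar(iteration, total, prefix="", suffix="", length=50, fill="█"):
    """Build a one-line progress bar string."""
    fraction = iteration / total if total else 1.0
    filled = int(length * fraction)
    bar = fill * filled + "-" * (length - filled)
    # The leading carriage return lets successive prints overwrite the line.
    return f"\r{prefix} |{bar}| {100 * fraction:.1f}% {suffix}"
```

A caller would typically pass the result to `print(..., end="")` inside a loop and print a newline once `iteration == total`.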

Parameters:
  • iteration (int) – Current iteration.

  • total (int) – Total iterations.

  • prefix (str) – Prefix string.

  • suffix (str) – Suffix string.

  • length (int) – Character length of the bar.

  • fill (str) – Bar fill character.

Functions#

src.exomercat.cli.main()#

Main entry point for the Exo-MerCat command-line interface.

This function parses command-line arguments and executes the appropriate function based on the user’s input. It supports the following operations:

  • maintenance: Perform sanity checks on the catalog data

  • input: Download and standardize catalog files

  • run: Process and merge catalog data to create the Exo-MerCat catalog

  • check: Perform validation checks on the final Exo-MerCat catalog

  • all: Execute all of the above operations in sequence

Command-line arguments:

  • function: The operation to perform (maintenance, input, run, check, or all)

  • --verbose (-v): Increase output verbosity (use -v, -vv, or -vvv for more detail)

  • --date (-d): Specify a date for catalog data (format: YYYY-MM-DD)

This function is not intended to be imported and used directly in other modules.
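
The documented arguments could be declared with `argparse` roughly as follows. This is a sketch, not the project's parser: the program name `exomercat` and the default of today's date are assumptions.

```python
import argparse
from datetime import date


def build_parser():
    """Parser mirroring the documented positional argument and options."""
    parser = argparse.ArgumentParser(prog="exomercat")  # prog name is assumed
    parser.add_argument(
        "function",
        choices=["maintenance", "input", "run", "check", "all"],
        help="operation to perform",
    )
    # action="count" turns -v, -vv, -vvv into 1, 2, 3.
    parser.add_argument("-v", "--verbose", action="count", default=0)
    # Defaulting to today's date is an assumption.
    parser.add_argument("-d", "--date", default=date.today().isoformat())
    return parser
```

For example, `build_parser().parse_args(["run", "-vv", "-d", "2024-05-01"])` yields `function="run"`, `verbose=2`, and `date="2024-05-01"`.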

src.exomercat.cli.ping(local_date)#

Perform sanity checks on the input catalog data.

This function performs the following checks for each catalog:

  1. Attempts to download the catalog

  2. Tries to read the downloaded catalog

  3. Checks if all required columns are present

  4. Verifies the data types of the columns

  5. Checks for non-ASCII characters in the data

It also checks the connection to SIMBAD and VizieR services.

Parameters:

local_date (str) – The date for which to perform the checks (format: YYYY-MM-DD)

Raises:

ValueError – If one or more sanity checks fail

src.exomercat.cli.input(local_date)#

Download and standardize catalog files.

This function performs the following operations for each catalog:

  1. Downloads the catalog data

  2. Reads the downloaded data

  3. Standardizes the catalog format

  4. Converts coordinates to a standard format

  5. Performs various data cleaning and standardization operations

  6. Saves the standardized catalog

Parameters:

local_date (str) – The date for which to download and process catalogs (format: YYYY-MM-DD)

src.exomercat.cli.run(local_date, verbose)#

Process and merge catalog data to create the Exo-MerCat catalog.

This function performs the following operations:

  1. Loads standardized catalog files

  2. Merges data from different catalogs

  3. Matches with stellar catalogs

  4. Performs various data processing and standardization steps

  5. Merges entries for the same exoplanet from different catalogs

  6. Saves the final Exo-MerCat catalog

Parameters:
  • local_date (str) – The date for which to process the catalogs (format: YYYY-MM-DD)

  • verbose (int) – The verbosity level for output

src.exomercat.cli.check(local_date)#

Perform validation checks on the final Exo-MerCat catalog.

This function loads the final Exo-MerCat catalog and performs various consistency checks, including:

  • Verifying the presence and correctness of catalog-specific name columns

  • Checking the consistency of parameter values and their associated metadata (e.g., errors, URLs)

  • Verifying the completeness of critical fields like discovery method and year

Parameters:

local_date (str) – The date of the catalog to check (format: YYYY-MM-DD)