midrc_react.core package

Submodules

midrc_react.core.aggregate_jsd_calc module

This module contains functions for calculating Jensen-Shannon Distance (JSD) between datasets.

midrc_react.core.aggregate_jsd_calc.calc_aggregate_jsd_at_date(df1: DataFrame, df2: DataFrame, cols_to_use: list[str], date: object) float

Calculate Jensen-Shannon Distance (JSD) based on features between two datasets at a specific date.

Parameters:
  • df1 – pandas DataFrame containing the first dataset

  • df2 – pandas DataFrame containing the second dataset

  • cols_to_use – list of columns to use for the JSD calculation

  • date – date at which to calculate the JSD

Returns:

JSD value between the two datasets at the specified date

Return type:

float

midrc_react.core.aggregate_jsd_calc.calc_jsd_by_features(df_list: list[DataFrame], cols_to_use: list[str]) dict[str, float]

Calculate Jensen-Shannon Distance (JSD) based on features for each combination of the input datasets.

Parameters:
  • df_list – list of pandas DataFrames containing the datasets

  • cols_to_use – list of columns to use for the JSD calculation

Returns:

dictionary of JSD values for each dataset combination

Return type:

dict

midrc_react.core.aggregate_jsd_calc.calc_jsd_by_features_2df(df1: DataFrame, df2: DataFrame, cols_to_use: list[str]) float

Calculate Jensen-Shannon Distance (JSD) based on features between two datasets.

Parameters:
  • df1 – pandas DataFrame containing the first dataset

  • df2 – pandas DataFrame containing the second dataset

  • cols_to_use – list of columns to use for the JSD calculation

Returns:

JSD value between the two datasets

Return type:

float

midrc_react.core.aggregate_jsd_calc.calc_jsd_from_counts_dict(counts_dict, dataset_names)

Calculates the Jensen-Shannon Distance (JSD) between each pair of datasets in a dictionary.

Parameters:
  • counts_dict – dictionary of counts for each dataset

  • dataset_names – list of dataset names to compare

Returns:

dictionary of JSD values for each dataset combination

Return type:

dict
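
The pairwise comparison described above can be sketched in plain Python. This is a minimal illustration, not the package's implementation. It assumes that counts_dict maps each dataset name to a dictionary of category counts and that the logarithm is taken in base 2, so the distance lies in [0, 1]. The helper names js_distance and pairwise_js_distance are hypothetical.

```python
import math
from itertools import combinations


def js_distance(p_counts, q_counts):
    """Jensen-Shannon distance (square root of the divergence, base 2)."""
    categories = sorted(set(p_counts) | set(q_counts))
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    divergence = 0.0
    for cat in categories:
        p = p_counts.get(cat, 0) / p_total
        q = q_counts.get(cat, 0) / q_total
        m = (p + q) / 2
        # Terms with zero probability contribute nothing (0 * log 0 := 0).
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return math.sqrt(max(divergence, 0.0))


def pairwise_js_distance(counts_dict, dataset_names):
    """Return {(name_a, name_b): distance} for every pair of datasets."""
    return {
        (a, b): js_distance(counts_dict[a], counts_dict[b])
        for a, b in combinations(dataset_names, 2)
    }


# Assumed input layout: dataset name -> {category: count}.
counts = {
    "A": {"F": 50, "M": 50},
    "B": {"F": 50, "M": 50},
    "C": {"Other": 10},
}
result = pairwise_js_distance(counts, ["A", "B", "C"])
```

Identical distributions give a distance of 0, and distributions with no shared categories give the maximum of 1 under the base-2 assumption.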

midrc_react.core.cucconi module

This module contains functions for calculating the Cucconi test and distribution.

class midrc_react.core.cucconi.CucconiMultisampleResult(statistic, pvalue)

Bases: tuple

pvalue

Alias for field number 1

statistic

Alias for field number 0

class midrc_react.core.cucconi.CucconiResult(statistic, pvalue)

Bases: tuple

pvalue

Alias for field number 1

statistic

Alias for field number 0

midrc_react.core.cucconi.cucconi_multisample_test(samples: list[ndarray[Any, dtype[_ScalarType_co]]], method: str = 'bootstrap', replications: int = 1000, ties: str = 'average', n_jobs: int = 1) CucconiMultisampleResult

Method to perform a multisample Cucconi scale-location test.

Parameters:
  • samples (List[numpy.ndarray]) – list of observation vectors

  • method (str) – method for determining p-value, possible values are ‘bootstrap’ and ‘permutation’

  • replications (int) – number of bootstrap replications

  • ties (str) – string specifying a method to deal with ties in data, possible values as for scipy.stats.rankdata

  • n_jobs (int) – the maximum number of concurrently running jobs. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. None is a marker for ‘unset’ that will be interpreted as n_jobs=1 (sequential execution)

Returns:

namedtuple with test statistic value and the p-value

Return type:

tuple

Raises:

ValueError – if the ‘method’ parameter is neither ‘bootstrap’ nor ‘permutation’

Examples

>>> np.random.seed(987654321) # set random seed to get the same result
>>> sample_a = sample_b = np.random.normal(loc=0, scale=1, size=100)
>>> cucconi_multisample_test([sample_a, sample_b], replications=100000)
CucconiMultisampleResult(statistic=6.996968353551774e-07, pvalue=1.0)
>>> np.random.seed(987654321)
>>> sample_a = np.random.normal(loc=0, scale=1, size=100)
>>> sample_b = np.random.normal(loc=10, scale=10, size=100)
>>> cucconi_multisample_test([sample_a, sample_a, sample_b], method='permutation')
CucconiMultisampleResult(statistic=45.3891929069273, pvalue=0.000999000999000999)

midrc_react.core.cucconi.cucconi_test(a: ndarray[Any, dtype[_ScalarType_co]], b: ndarray[Any, dtype[_ScalarType_co]], method: str = 'bootstrap', replications: int = 1000, ties: str = 'average', n_jobs: int = 1) CucconiResult

Method to perform a Cucconi scale-location test.

Parameters:
  • a (np.ndarray) – vector of observations

  • b (np.ndarray) – vector of observations

  • method (str) – method for determining p-value, possible values are ‘bootstrap’ and ‘permutation’

  • replications (int) – number of bootstrap replications

  • ties (str) – string specifying a method to deal with ties in data, possible values as for scipy.stats.rankdata

  • n_jobs (int) – the maximum number of concurrently running jobs. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. None is a marker for ‘unset’ that will be interpreted as n_jobs=1 (sequential execution)

Returns:

namedtuple with test statistic value and the p-value

Return type:

tuple

Raises:

ValueError – if the ‘method’ parameter is neither ‘bootstrap’ nor ‘permutation’

Examples

>>> np.random.seed(987654321) # set random seed to get the same result
>>> sample_a = sample_b = np.random.normal(loc=0, scale=1, size=100)
>>> cucconi_test(sample_a, sample_b, replications=10000)
CucconiResult(statistic=3.7763314663244195e-08, pvalue=1.0)
>>> np.random.seed(987654321)
>>> sample_a = np.random.normal(loc=0, scale=1, size=100)
>>> sample_b = np.random.normal(loc=10, scale=10, size=100)
>>> cucconi_test(sample_a, sample_b, method='permutation')
CucconiResult(statistic=2.62372293956099, pvalue=0.000999000999000999)
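
For readers who want to see what the test computes, the following is a minimal pure-Python sketch of the two-sample Cucconi statistic, following the rank-based formula commonly given in the literature, with a permutation p-value. It is an illustration under stated assumptions, not the package's code. The function names are hypothetical, ties receive average ranks, and the p-value uses the (hits + 1) / (replications + 1) convention, which is consistent with the value 0.000999... shown in the examples above for 1000 replications.

```python
import math
import random


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def cucconi_statistic(a, b):
    """Cucconi statistic computed from the pooled ranks of sample a."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    ranks_a = average_ranks(list(a) + list(b))[:n1]
    scale = math.sqrt(n1 * n2 * (n + 1) * (2 * n + 1) * (8 * n + 11) / 5)
    centre = n1 * (n + 1) * (2 * n + 1)
    u = (6 * sum(r ** 2 for r in ranks_a) - centre) / scale
    v = (6 * sum((n + 1 - r) ** 2 for r in ranks_a) - centre) / scale
    rho = 2 * (n ** 2 - 4) / ((2 * n + 1) * (8 * n + 11)) - 1
    return (u ** 2 + v ** 2 - 2 * rho * u * v) / (2 * (1 - rho ** 2))


def permutation_pvalue(a, b, replications=200, seed=0):
    """Share of random relabelings at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = cucconi_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(replications):
        rng.shuffle(pooled)
        if cucconi_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (replications + 1)


separated = cucconi_statistic(range(1, 11), range(101, 111))
interleaved = cucconi_statistic(range(1, 20, 2), range(2, 21, 2))
p_value = permutation_pvalue(list(range(1, 11)), list(range(101, 111)))
```

Well-separated samples produce a much larger statistic than interleaved ones, and the permutation p-value for the separated pair is small.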

midrc_react.core.data_preprocessing module

This module contains functions for data preprocessing and combining datasets.

midrc_react.core.data_preprocessing.bin_dataframe_column(df_to_bin: DataFrame, column_name: str, cut_column_name: str = 'CUT', bins: list[int] | list[float] = None, labels: list[str] = None, *, right: bool = False)

Cuts the specified column into bins and adds a column with the bin labels.

Parameters:
  • df_to_bin – pandas DataFrame containing the data

  • column_name – name of the column to be binned

  • cut_column_name – name of the column to be added with the bin labels

  • bins – list of bins to be used for the binning

  • labels – list of labels for the bins

  • right – whether to use right-inclusive intervals

Returns:

pandas DataFrame with the binned column and the labels

Return type:

pd.DataFrame
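
The effect of the right flag can be illustrated without pandas. The sketch below is a hypothetical helper, not the package's implementation. The parameter names mirror those of pandas.cut, but whether bin_dataframe_column calls pandas.cut is an assumption. Values outside every bin are mapped to None here.

```python
from bisect import bisect_left, bisect_right


def bin_values(values, bins, labels, right=False):
    """Assign each value the label of the bin it falls into, else None."""
    if len(labels) != len(bins) - 1:
        raise ValueError("need exactly one label per bin")
    binned = []
    for value in values:
        if right:
            # Right-inclusive intervals: (edge[i], edge[i + 1]]
            idx = bisect_left(bins, value) - 1
        else:
            # Left-inclusive intervals: [edge[i], edge[i + 1])
            idx = bisect_right(bins, value) - 1
        binned.append(labels[idx] if 0 <= idx < len(labels) else None)
    return binned


ages = [0, 17, 18, 64, 65, 120]
edges = [0, 18, 65, 120]
names = ["0-17", "18-64", "65+"]
left_closed = bin_values(ages, edges, names)
right_closed = bin_values(ages, edges, names, right=True)
```

With the default right=False, an age of 18 falls into the "18-64" bin and 120 falls outside the last bin. With right=True, 18 stays in "0-17" and 0 falls outside the first bin.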

midrc_react.core.data_preprocessing.combine_datasets_from_list(df_list: list[DataFrame], dataset_column: str = '_dataset_')

Combines a list of dataframes into a single dataframe with a new column for the dataset name.

Parameters:
  • df_list (list[pd.DataFrame]) – A list of dataframes to be combined.

  • dataset_column (str, optional) – The name of the column to be used for the dataset name. Defaults to ‘_dataset_’.

Returns:

A combined dataframe with a new column for the dataset name.

Return type:

pd.DataFrame

midrc_react.core.datetimetools module

This module contains functions for converting between different date and time formats.

midrc_react.core.datetimetools.convert_date_to_milliseconds(date: QDate)

Converts a date to milliseconds since epoch.

Parameters:

date (QDate) – PySide6 QDate object.

Returns:

Milliseconds since epoch.

Return type:

int
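
The conversion can be sketched with the standard library alone. The documented function takes a QDate. The time of day and time zone it uses are not stated, so midnight UTC is an assumption in this sketch, and date_to_milliseconds is a hypothetical name.

```python
from datetime import date, datetime, timezone


def date_to_milliseconds(day: date) -> int:
    """Milliseconds from the Unix epoch to midnight UTC on the given day."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return int(midnight.timestamp() * 1000)


one_day = date_to_milliseconds(date(1970, 1, 2))
```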

midrc_react.core.datetimetools.get_date_parts(date_val)

Extracts the year, month, and day from a date value.

Parameters:

date_val – A date value (could be a string, datetime, or other formats).

Returns:

A tuple containing the year, month, and day.

Return type:

tuple

midrc_react.core.datetimetools.numpy_datetime64_to_qdate(numpy_datetime: datetime64)

Convert a NumPy datetime64 object to a PySide6 QDate object.

Parameters:

numpy_datetime (numpy.datetime64) – NumPy datetime64 object.

Returns:

PySide6 QDate object representing the same date.

Return type:

QDate

midrc_react.core.datetimetools.pandas_date_to_qdate(pandas_date)

Convert a pandas Timestamp or datetime object to a PySide6 QDate object.

Parameters:

pandas_date (pd.Timestamp or datetime64) – Pandas Timestamp or datetime object.

Returns:

PySide6 QDate object representing the same date.

Return type:

QDate

midrc_react.core.excel_layout module

This module contains classes and functions for building and processing Excel and CSV files.

class midrc_react.core.excel_layout.DataSheet(sheet_name, data_source, custom_age_ranges, is_excel=False, file: ExcelFile = None, df: DataFrame = None)

Bases: object

Class representing a data sheet.

Variables:
  • name (str) – The name of the data sheet.

  • _columns (dict) – A dictionary containing the columns of the data sheet.

__init__(sheet_name, data_source, custom_age_ranges, is_excel=False, file: ExcelFile = None, df: DataFrame = None)

Initialize the DataSheet object.

Parameters:
  • sheet_name (str) – The name of the sheet in the Excel file to parse.

  • data_source (dict) – The data source object.

  • is_excel (bool, optional) – Flag indicating whether the data source is an Excel file. Defaults to False.

  • file (pd.ExcelFile, optional) – The Excel file to read the sheet from

Returns:

None

property columns

Return the columns.

create_custom_age_columns(age_ranges: list[tuple])

Creates custom age columns by summing values from columns that match each age range.

Parameters:

age_ranges (list of tuple) – Each tuple is (min_age, max_age).

Notes

  • Drops any previously created custom columns.

  • Considers only columns that start with a digit and do not contain ‘(%)’ or ‘(CUSUM)’.

  • A column is included for an age range if:
    • Its lower bound (the first number in the header) is within the range.

    • If an upper bound exists (a second number), the age range’s max is not less than it.

    • If no upper bound exists, the column is only included if max_age is infinite.

  • Warns if any eligible column is unused.
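
The inclusion rules in the notes above can be sketched as a small filter. This is an illustration only. The header formats ("18-29", "65+") and the function name are assumptions, and the real method also sums the matching columns and warns about unused ones, which this sketch omits.

```python
import math
import re


def columns_for_age_range(headers, min_age, max_age):
    """Select headers whose age span fits inside [min_age, max_age]."""
    selected = []
    for header in headers:
        # Only raw count columns: start with a digit, no derived suffixes.
        if not header[:1].isdigit() or "(%)" in header or "(CUSUM)" in header:
            continue
        numbers = [int(n) for n in re.findall(r"\d+", header)]
        lower = numbers[0]
        upper = numbers[1] if len(numbers) > 1 else None
        if not min_age <= lower <= max_age:
            continue  # lower bound must lie within the range
        if upper is not None and max_age < upper:
            continue  # the range must reach the column's upper bound
        if upper is None and not math.isinf(max_age):
            continue  # open-ended columns need an infinite max_age
        selected.append(header)
    return selected


# Hypothetical sheet headers.
headers = ["Date", "0-17", "18-29", "30-49", "50-64", "65+", "18-29 (%)"]
adults = columns_for_age_range(headers, 18, 49)
seniors = columns_for_age_range(headers, 50, math.inf)
```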

property data_columns

Return the data columns. This skips the first column, which is the date column.

property df

Return the dataframe.

class midrc_react.core.excel_layout.DataSource(data_source, custom_age_ranges=None)

Bases: object

Class representing a data source with optional plugin-based preprocessing and numeric column adjustments.

__init__(data_source, custom_age_ranges=None)

Initializes the DataSource class.

Parameters:
  • data_source (dict) – The data source configuration.

  • custom_age_ranges (dict, optional) – A dictionary of custom age ranges.

apply_numeric_column_adjustments(df: DataFrame)

Applies numeric column adjustments to a DataFrame using binning.

Parameters:

df (pd.DataFrame) – The input DataFrame.

Returns:

The DataFrame with numeric column adjustments.

Return type:

pd.DataFrame

build_data_frames_from_content(content: BytesIO)

Loads data from an in-memory content stream.

Parameters:

content (io.BytesIO) – Binary Excel file data.

Returns:

None

build_data_frames_from_csv(filename: str)

Loads and preprocesses a CSV or TSV file.

Parameters:

filename (str) – The file path.

Returns:

None

build_data_frames_from_file(filename: str)

Loads an Excel file.

Parameters:

filename (str) – The file path.

Returns:

None

calculate_cumulative_sums(df: DataFrame, col: str)

Calculates cumulative sums for a given column.

Parameters:
  • df (pd.DataFrame) – The input DataFrame.

  • col (str) – The column to calculate cumulative sums for.

Returns:

A DataFrame with cumulative sums.

Return type:

pd.DataFrame

create_sheets(file: ExcelFile)

Creates sheets from a given file.

Parameters:

file (pd.ExcelFile) – The Excel file object.

Returns:

None

create_sheets_from_df(df: DataFrame)

Creates data sheets from a DataFrame.

Parameters:

df (pd.DataFrame) – The processed DataFrame.

Returns:

None

static load_plugin(plugin_path)

Dynamically loads a preprocessing plugin from the given path.

Parameters:

plugin_path (str) – Path to the plugin Python file.

Returns:

A reference to the plugin’s preprocess_data function if found, else None.
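
Loading a function from a file path is typically done with importlib. The sketch below shows one way to do this and is not necessarily how load_plugin is implemented. The module name "user_plugin", the helper name, and the demonstration plugin are all hypothetical.

```python
import importlib.util
import os
import tempfile


def load_preprocess_function(plugin_path):
    """Import a Python file by path and return its preprocess_data, or None."""
    spec = importlib.util.spec_from_file_location("user_plugin", plugin_path)
    if spec is None or spec.loader is None:
        return None
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    func = getattr(module, "preprocess_data", None)
    return func if callable(func) else None


# Demonstration with a throwaway plugin file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_plugin.py")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("def preprocess_data(rows):\n    return [r for r in rows if r]\n")
    preprocess = load_preprocess_function(path)
    cleaned = preprocess(["a", "", "b"])
```

Returning None when the function is missing lets the caller fall back to loading the data without preprocessing.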

property numeric_cols

Returns a dictionary of numeric columns to use for the analysis.

Returns:

A dictionary of numeric columns to use for the analysis.

Return type:

dict

raw_columns_to_use()

Returns a list of the raw columns to use for the analysis.

Returns:

A list of the raw columns to use for the analysis.

Return type:

list

midrc_react.core.excelparse module

This module contains a function to parse Excel files and return a pandas DataFrame.

CURRENTLY UNUSED IN midrc_react PROJECT

midrc_react.core.excelparse.excelparse(filename, sheet_name)

Parse a spreadsheet using the specified filename and sheet name and return a pandas DataFrame.

Parameters:
  • filename (string) – filename to open

  • sheet_name (string) – sheet name to parse

Returns:

pandas dataframe

midrc_react.core.famd_calc module

This module contains functions for calculating Factor Analysis of Mixed Data (FAMD) and related distances.

midrc_react.core.famd_calc.adjust_bin_widths(bins, hist, multiple=2)

Adjust the width of bins by merging a specified number of adjacent bins.

Parameters:
  • bins (array) – The original bin edges.

  • hist (array) – The original histogram values.

  • multiple (int) – The factor by which to adjust the bin widths (e.g., 2 to double, 3 to triple).

Returns:

  • new_bins (array) – The new bin edges with adjusted widths.

  • new_hist (array) – The new histogram values corresponding to the new bins.

Return type:

tuple
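
Merging adjacent bins amounts to keeping every multiple-th edge and summing the counts in between. The sketch below uses plain lists and a hypothetical function name. How the package treats a leftover group when the number of bins is not divisible by multiple is not documented, so this sketch assumes the leftover bins form a final, narrower bin.

```python
def merge_adjacent_bins(bins, hist, multiple=2):
    """Merge every `multiple` adjacent bins; a leftover group forms the last bin."""
    if len(bins) != len(hist) + 1:
        raise ValueError("bins must have exactly one more entry than hist")
    new_bins = list(bins[::multiple])
    if new_bins[-1] != bins[-1]:
        new_bins.append(bins[-1])  # keep the outermost right edge
    new_hist = [sum(hist[i:i + multiple]) for i in range(0, len(hist), multiple)]
    return new_bins, new_hist


new_bins, new_hist = merge_adjacent_bins([0, 1, 2, 3, 4], [1, 2, 3, 4], multiple=2)
```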

midrc_react.core.famd_calc.calc_famd_df(raw_df, cols_to_use, numeric_cols, dataset_column='_dataset_', print_outliers=False, famd_column='famd_x_coordinates')

Calculate the FAMD coordinates for the input DataFrame and return a new DataFrame with the coordinates added.

Parameters:
  • raw_df (DataFrame) – The raw data to be preprocessed.

  • cols_to_use (list) – List of columns to use for the calculation.

  • numeric_cols (list) – List of numeric columns to use for the calculation.

  • dataset_column (str, optional) – The name of the column to be used for the dataset name. Defaults to ‘_dataset_’.

  • print_outliers (bool, optional) – Whether to print outliers. Defaults to False.

  • famd_column (str, optional) – The name of the column to be used for the FAMD coordinates. Defaults to ‘famd_x_coordinates’.

Returns:

A DataFrame with the FAMD coordinates added.

Return type:

DataFrame

midrc_react.core.famd_calc.calc_famd_distances(df, cols_to_use, numeric_cols, dataset_column='_dataset_', distance_metrics='all', jsd_scaled_bin_width=0.01, print_outliers=False)

Calculate various distance metrics based on FAMD coordinates calculated from the input DataFrame using the specified feature columns.

This function computes Jensen-Shannon Divergence (JSD), Wasserstein distance, Kolmogorov-Smirnov (KS) statistics, and Cucconi distance, with optional scaling methods for the given distances. The results are returned as a dictionary where keys represent the metric names.

Parameters:
  • df (DataFrame) – The DataFrame containing the data.

  • cols_to_use (list) – List of columns to use for the calculation.

  • numeric_cols (list) – List of numeric columns to use for the calculation.

  • dataset_column (str) – The name of the column to be used for the dataset name.

  • distance_metrics (tuple) – A tuple of strings specifying which distance metrics to compute. Use ‘all’ to compute all available metrics or specify individual metrics (e.g., ‘jsd’, ‘wass’, ‘ks2’, ‘cuc’) along with optional scaling options (e.g., ‘wass(std)’, ‘ks2(rob)’, etc.).

  • jsd_scaled_bin_width (float) – Width of each histogram bin for scaled JSD (default is 0.01).

  • print_outliers (bool) – Whether to print outliers (default is False).

Returns:

Dictionary of distance values specified in distance_metrics for each dataset combination.

Return type:

dict

midrc_react.core.famd_calc.calc_famd_ks2_at_date(df1, df2, cols_to_use, numeric_cols, calc_date)

Calculate the KS2 distance between two datasets at a specific date.

Parameters:
  • df1 – first DataFrame

  • df2 – second DataFrame

  • cols_to_use – columns to use for the calculation

  • numeric_cols – list of numeric columns

  • calc_date – date to calculate the KS2 distance

Returns:

KS2 distance at specified date

Return type:

float

midrc_react.core.famd_calc.calc_famd_ks2_at_dates(df1, df2, cols_to_use, numeric_cols, calc_date_list)

Calculate the KS2 distance between two datasets at multiple dates.

Parameters:
  • df1 – first DataFrame

  • df2 – second DataFrame

  • cols_to_use – columns to use for the calculation

  • numeric_cols – list of numeric columns

  • calc_date_list – list of dates to calculate the KS2 distance

Returns:

list of KS2 distances at each date

Return type:

list(float)
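
The name "KS2" suggests the two-sample Kolmogorov-Smirnov statistic, that is, the largest vertical gap between two empirical cumulative distribution functions. That reading is an assumption. The sketch below computes that statistic for two plain lists of coordinates and does not reproduce the FAMD step or the date filtering. The function name is hypothetical.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Largest gap between the two empirical cumulative distribution functions."""
    sorted_a = sorted(sample_a)
    sorted_b = sorted(sample_b)
    largest_gap = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in sorted_a + sorted_b:
        cdf_a = bisect_right(sorted_a, x) / len(sorted_a)
        cdf_b = bisect_right(sorted_b, x) / len(sorted_b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


same = ks_two_sample_statistic([1, 2, 3], [1, 2, 3])
disjoint = ks_two_sample_statistic([1, 2, 3], [4, 5, 6])
overlap = ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```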

midrc_react.core.famd_calc.fit_famd(data)

Fits a Factor Analysis of Mixed Data (FAMD) model to the input data.

Parameters:

data (pandas.DataFrame) – The input data to fit the FAMD model.

Returns:

A tuple containing the fitted FAMD model and the row coordinates.

Return type:

tuple

Example

famd_model, coordinates = fit_famd(data)

midrc_react.core.famd_calc.preprocess_data_for_famd(raw_df, features, numeric_features, scaling_method='standard')

Preprocesses the raw data for Factor Analysis of Mixed Data (FAMD).

Parameters:
  • raw_df (DataFrame) – The raw data to be preprocessed.

  • features (List) – List of features to be included in the preprocessing.

  • numeric_features (List) – List of numeric features to be scaled.

  • scaling_method (str) – The scaling method to use for numeric features.

Returns:

  • c_data (DataFrame) – Preprocessed data with selected features.

  • df (DataFrame) – Concatenated DataFrame with preprocessed data and ‘dataset’ column.

Return type:

tuple

midrc_react.core.jsdconfig module

This module contains the JSDConfig class, which loads and stores data from a YAML file.

class midrc_react.core.jsdconfig.JSDConfig(filename: str = 'jsdconfig.yaml')

Bases: object

The JSDConfig class loads and stores data from a YAML file.

Variables:
  • filename (str) – The name of the YAML file to load. Default is ‘jsdconfig.yaml’.

  • data (dict) – The loaded data from the YAML file.

__init__(self, filename='jsdconfig.yaml')

Initializes a new instance of JSDConfig.

__post_init__(self)

Loads the YAML data from the current filename.

data: dict

filename: str = 'jsdconfig.yaml'

set_filename(new_filename: str)

Set a new filename and reload the data.

Parameters:

new_filename (str) – The new filename to load.

midrc_react.core.jsdcontroller module

This module contains the JSDController class, which manages the JSD view and model.

class midrc_react.core.jsdcontroller.JSDController(jsd_view, jsd_model, config)

Bases: QObject

Class JSDController

This class represents a JSD Controller. It emits a signal when the model changes.

Variables:
  • modelChanged – A Signal that is emitted when the model changes.

  • fileChangedSignal – A Signal that is emitted when the file changes.

  • NOT_REPORTED_COLUMN_NAME (str) – A constant string representing the ‘Not Reported’ column name.

NOT_REPORTED_COLUMN_NAME = 'Not Reported'

__init__(jsd_view, jsd_model, config)

Initialize the JSDController.

Parameters:
  • jsd_view (object) – The JSD view object.

  • jsd_model (object) – The JSD model object.

  • config (JSDConfig) – A dictionary containing configuration data.

Returns:

None

category_changed()

Parses the dates from all files for the current category and updates the data in the model appropriately.

Returns:

None

connect_signals()

Connects signals for file and category comboboxes.

fileChangedSignal

file_changed(_, new_category_index=None)

Parses the categories from the files selected in the comboboxes and updates the category box appropriately. Emits the fileChangedSignal signal upon completion.

Parameters:

new_category_index (optional) – The index of the category to set. If None, the previous index is used.

get_categories()

Get the list of categories from the data sources.

Returns:

A list of categories.

Return type:

list

get_cols_to_use_for_jsd_calc(source_id, category)

Generates a list of columns from a sheet that should be used in the JSD calculation.

This also handles custom categories, e.g., custom age ranges.

Parameters:
  • source_id (str) – The combobox used to get the data file from.

  • category (str) – The sheet category to get the columns from.

Returns:

List of columns in the current sheet category

get_file_sheets_from_index(index=0)

Get the sheets from the selected file combobox.

Parameters:

index (int) – The index of the file combobox. Default is 0.

Returns:

A dictionary containing the sheets from the selected file.

Return type:

dict

get_spider_plot_values(calc_date=None)

Compiles a dictionary of categories and JSD values for a given date.

Parameters:

calc_date (Optional[datetime.date]) – The date to use for JSD calculation. Default is None.

Returns:

A dictionary of categories and JSD values for a given date.

Return type:

dict

get_timeline_data(category: str)

Get the timeline data for the specified category.

Parameters:

category (str) – The category for which to get the timeline data.

Returns:

A DataFrame containing the timeline data.

Return type:

pd.DataFrame

initialize()

Initialize the JSDController.

Returns:

None

property jsd_model: JSDTableModel

Get the JSD model object.

Returns:

The JSD model object.

Return type:

object

property jsd_view: JsdViewBase

Get the JSD view object.

Returns:

The JSD view object.

Return type:

object

modelChanged

property num_categories: int

Get the number of categories.

Returns:

The number of categories.

Return type:

int

update_category_plots()

Update the category plots.

This method updates the JSD timeline plot and the area chart.

Returns:

True if the update was successful, False otherwise

update_file_based_charts()

Update the file-based charts.

This method updates the pie chart dock and the spider chart.

Returns:

True if the update was successful, False otherwise

Raises:

None

midrc_react.core.jsdcontroller.calculate_jsd(df1, df2, cols_to_use, calc_date)

Calculate the Jensen-Shannon distance between two dataframes for a given date.

This assumes that the date column of each dataframe is sorted from smallest to largest.

Note: The Jensen-Shannon distance returned is the square root of the Jensen-Shannon divergence.

Parameters:
  • df1 (pd.DataFrame) – First dataframe.

  • df2 (pd.DataFrame) – Second dataframe.

  • cols_to_use (list) – List of columns to use for the calculation.

  • calc_date (pd.Timestamp) – Date for which the calculation is performed.

Returns:

Jensen-Shannon distance between the two dataframes.

Return type:

float

midrc_react.core.jsdcontroller.remove_elements_less_than_from_sorted_list(sorted_list, value)

Remove elements less than the given value from a sorted list.

Parameters:
  • sorted_list (list) – A sorted list of elements.

  • value – The value to compare against.

Returns:

A new list containing only elements greater than or equal to the value.

Return type:

list
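
Because the input is sorted, the cut point can be found with a binary search instead of a full scan. The following is a minimal sketch with a hypothetical function name; the package may implement this differently.

```python
from bisect import bisect_left


def drop_smaller_than(sorted_list, value):
    """Return a new list with every element below `value` removed."""
    # bisect_left finds the first position whose element is >= value.
    return sorted_list[bisect_left(sorted_list, value):]


kept = drop_smaller_than([1, 2, 3, 3, 4], 3)
```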

midrc_react.core.jsdmodel module

This module contains the JSDTableModel class, which is a subclass of QAbstractTableModel.

class midrc_react.core.jsdmodel.JSDTableModel(data_source_list=None, custom_age_ranges=None)

Bases: QAbstractTableModel

A class representing a table model for JSD data.

This class inherits from the QAbstractTableModel class and provides the necessary methods to interact with the JSD data in a table format. It handles the display of data, editing of data, and mapping of colors to specific areas of the table.

Variables:
  • HEADER_MAPPING (List[str]) – A list of header labels for the table columns.

  • data_source_added (Signal) – A signal emitted when a data source is added to the model.

HEADER_MAPPING = ['Date', 'JSD']

__init__(data_source_list=None, custom_age_ranges=None)

Initialize the JSDTableModel.

This method initializes the JSDTableModel by setting up the input data, mapping, and raw data sources.

Parameters:
  • data_source_list (List[dict], optional) –

    A list of data sources. Each data source is a dictionary with the following keys:

    • ‘name’ (str): The name of the data source.

    • ‘data type’ (str): The type of the data source.

    • ‘filename’ (str): The filename of the data source.

  • custom_age_ranges (Any, optional) – Custom age ranges for the data sources.

Returns:

None

add_color_mapping(color: str, mapping_area: Any)

Add a color mapping to the JSDTableModel.

Parameters:
  • color (str) – The color to be mapped.

  • mapping_area (Any) – The area to be mapped to the color.

Returns:

None

add_data_source(data_source_dict)

Add a data source to the JSDTableModel.

This method adds a data source to the JSDTableModel by creating a new instance of the DataSource class and storing it in the data_sources dictionary. The data source is identified by its name, which is obtained from the ‘name’ key in the data_source_dict parameter. The DataSource object is initialized with the data_source_dict and the custom_age_ranges, if provided.

Parameters:

data_source_dict (dict) –

A dictionary containing the information about the data source.

The dictionary should have the following keys:

  • ‘name’ (str): The name of the data source.

  • ‘data type’ (str): The type of the data source.

  • ‘filename’ (str): The filename of the data source.

Returns:

None

clear_color_mapping()

Clear the color mapping in the JSDTableModel.

This method clears the color mapping, removing all color mappings from the JSDTableModel.

Returns:

None

columnCount(_parent: QModelIndex = None) int

Returns the number of columns in the model.

Parameters:

_parent (QModelIndex) – The parent index. Defaults to QModelIndex(). This is unused.

Returns:

The number of columns in the model.

Return type:

int

property column_infos

Returns the column information of the JSDTableModel.

This method returns the column information of the JSDTableModel, which is a list of dictionaries representing the metadata for each set of two columns in the model (one column for date, one column for the JSD value).

Returns:

The column information of the JSDTableModel.

Each dictionary contains the following keys:

  • ‘index1’ (int): The index of the first file used.

  • ‘file1’ (str): The file name of the first file used.

  • ‘index2’ (int): The index of the second file used.

  • ‘file2’ (str): The file name of the second file used.

Return type:

List[dict]

data(index: QModelIndex, role: int = ItemDataRole.DisplayRole) Any | None

Returns the data for the given index and role.

Parameters:
  • index (QModelIndex) – The index of the data.

  • role (int) – The role of the data. Defaults to Qt.DisplayRole.

Returns:

The data for the given index and role. If the role is Qt.DisplayRole or Qt.EditRole, it returns the corresponding data from the input_data list. If the role is Qt.BackgroundRole, it checks if the index is within any of the mapping areas defined in the _color_mapping dictionary. If it is, it returns the corresponding color from the _color_cache dictionary. If it is not within any mapping area, it returns the color white from the _color_cache dictionary. If the role is not any of the above, it returns None.

Return type:

Optional[Any]

data_source_added

flags(index: QModelIndex) ItemFlag

Get the flags for the given index.

Parameters:

index (QModelIndex) – The index of the item.

Returns:

The flags for the item.

Return type:

int

headerData(section: int, orientation: Orientation, role: int = ItemDataRole.DisplayRole) Any

Returns the header data for the specified section, orientation, and role.

Parameters:
  • section (int) – The section index.

  • orientation (int) – The orientation of the header (Qt.Horizontal or Qt.Vertical).

  • role (int) – The role of the header data.

Returns:

The header data for the specified section, orientation, and role.

Return type:

Any

property input_data

Returns the input data of the JSDTableModel.

This method returns the input data of the JSDTableModel, which is a list of lists representing the data for each column in the model. Each inner list represents a column and contains the data for that column.

Returns:

The input data of the JSDTableModel.

Return type:

List[List[Any]]

rowCount(parent: QModelIndex = None) int

Get the number of rows in the model.

This method returns the number of rows in the model based on the length of the input data. If the parent index is invalid, this method returns the largest number of rows in a column.

Parameters:

parent (QModelIndex) – The parent index.

Returns:

The number of rows in the model.

Return type:

int

setData(index: QModelIndex, value: Any, role: int = ItemDataRole.EditRole) bool

Set the data for the given index.

Parameters:
  • index (QModelIndex) – The index of the data to be set.

  • value (Any) – The new value to be set.

  • role (int) – The role of the data. Defaults to Qt.EditRole.

Returns:

True if the data was successfully set, False otherwise.

Return type:

bool

update_input_data(new_input_data, new_column_infos)

Update the input data and column information in the JSDTableModel.

This method updates the input data and column information in the JSDTableModel. It clears the existing input data and column information, and then sets them to the new values provided as arguments. It also updates the maximum row count based on the new input data.

Parameters:
  • new_input_data (List[List[Any]]) – The new input data to be set in the model. It should be a list of lists, where each inner list represents a column and contains the data for that column.

  • new_column_infos (List[dict]) – The new column information to be set in the model. It should be a list of dictionaries, where each dictionary represents the metadata for a column.

Returns:

None

midrc_react.core.numeric_distances module

This module contains functions for calculating various numerical distance metrics between datasets.

midrc_react.core.numeric_distances.build_histogram_dict(df, dataset_column, datasets, feature_column, bin_width, scaling_method=None)

Build a dictionary of histogram data for a specific dataset.

Parameters:
  • df – DataFrame containing the dataset.

  • dataset_column – Name of the dataset column within the DataFrame.

  • datasets – List of dataset names within the DataFrame.

  • feature_column – Name of the feature column within the DataFrame.

  • bin_width – Width of each histogram bin.

  • scaling_method – Method to use for scaling the feature column (default is None).

See also

get_supported_scaling_methods() :

Returns the list of scaling methods that can be used.

Returns:

Dictionary containing histogram data for the specified dataset.

Return type:

hist_dict

midrc_react.core.numeric_distances.calc_cucconi_by_feature(df, feature: str, dataset_column: str = '_dataset_', scaling: str = None)

Calculate the Cucconi test for a specific feature.

Parameters:
  • df (pd.DataFrame) – The DataFrame containing the data.

  • feature (str) – The name of the feature to calculate the metric for.

  • dataset_column (str, optional) – The name of the column containing the dataset information. Defaults to ‘_dataset_’.

  • scaling (str, optional) – The scaling method to use for the feature. Defaults to None. (e.g., ‘standard’, ‘minmax’, ‘maxabs’, or ‘robust’)

See also

calc_ks2_samp_by_feature() :

Calculates the Kolmogorov-Smirnov test for a feature.

Returns:

A dictionary containing the metric results for each dataset combination.

Return type:

dict

midrc_react.core.numeric_distances.calc_distances_via_df(famd_df: DataFrame, feature_column: str, dataset_column: str = '_dataset_', *, distance_metrics: tuple[str] = 'all', jsd_scaled_bin_width=0.01)

Calculate various distance metrics based on histogram data.

This function computes the Jensen-Shannon Distance (JSD), the Wasserstein distance, the Kolmogorov-Smirnov (KS) statistic, and the Cucconi statistic, with optional scaling methods for the Wasserstein and KS distances. The results are returned as a dictionary keyed by metric name.

Parameters:
  • famd_df (pd.DataFrame) – A DataFrame containing FAMD (Factorial Analysis of Mixed Data) results.

  • feature_column (str) – The name of the column containing the feature data.

  • dataset_column (str) – The name of the column containing the dataset information. Defaults to ‘_dataset_’.

  • distance_metrics (tuple) – A tuple of strings specifying which distance metrics to compute. Use ‘all’ to compute all available metrics or specify individual metrics (e.g., ‘jsd’, ‘wass’, ‘ks2’, ‘cuc’) along with optional scaling options (e.g., ‘wass(std)’, ‘ks2(rob)’, etc.).

  • jsd_scaled_bin_width (float) – The bin width to use for the JSD calculation. Defaults to 0.01.

Returns:

A dictionary with keys as distance metric names and values as the computed metrics.

For example, output could include keys like ‘jsd’, ‘wass’, ‘ks2’, ‘cuc’, etc. If specific scaling options are computed, keys would include these as well, such as ‘wass(std)’, ‘jsd(rob)’, etc.

Return type:

dict
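The metric strings described above combine a base metric name with an optional scaling suffix in parentheses. As a rough illustration of how such strings can be split into their two parts, here is a minimal sketch. The helper parse_metric_spec and its regular expression are hypothetical and are not part of midrc_react; the library's own parsing may differ.

```python
import re
from typing import Optional

# Hypothetical helper, not part of midrc_react: splits a metric spec such as
# "wass(std)" into its base metric name and optional scaling suffix.
_SPEC_PATTERN = re.compile(r"^(?P<metric>\w+)(?:\((?P<scaling>\w+)\))?$")


def parse_metric_spec(spec: str) -> tuple[str, Optional[str]]:
    """Return (metric, scaling); scaling is None when no suffix is given."""
    match = _SPEC_PATTERN.match(spec.strip())
    if match is None:
        raise ValueError(f"Unrecognized metric spec: {spec!r}")
    return match.group("metric"), match.group("scaling")


# Spec strings taken from the parameter description above.
specs = ("jsd", "wass(std)", "ks2(rob)", "cuc")
parsed = {spec: parse_metric_spec(spec) for spec in specs}
```

Here parsed maps each spec string to a (metric, scaling) pair, so a caller could dispatch on the metric name and pass the scaling suffix through separately.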

midrc_react.core.numeric_distances.calc_ks2_samp_by_feature(df, feature: str, dataset_column: str = '_dataset_', scaling: str = None)

Calculate the Kolmogorov-Smirnov test for a specific feature.

Parameters:
  • df (pd.DataFrame) – The DataFrame containing the data.

  • feature (str) – The name of the feature to calculate the metric for.

  • dataset_column (str, optional) – The name of the column containing the dataset information. Defaults to ‘_dataset_’.

  • scaling (str, optional) – The scaling method to use for the feature (e.g., ‘standard’, ‘minmax’, ‘maxabs’, or ‘robust’). Defaults to None.

See also

calc_wasserstein_by_feature() :

Calculates the Wasserstein distance for a feature.

Returns:

A dictionary containing the metric results for each dataset combination.

Return type:

dict
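For readers unfamiliar with the metric, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical cumulative distribution functions of two samples. The sketch below computes only that statistic (no p-value) using the standard library. It is an illustration, not the library's implementation, which presumably delegates to an existing routine such as the ks_2samp function named in the calc_numerical_metric_by_feature description. The name ks_statistic is an assumption.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Return the two-sample KS statistic: the largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("Both samples must be non-empty")
    gap = 0.0
    # The largest gap is always attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


identical = ks_statistic([1, 2, 3], [1, 2, 3])        # same sample twice
disjoint = ks_statistic([1, 2, 3], [10, 11, 12])      # no overlap at all
partial = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])    # half of each sample overlaps
```

The statistic ranges from 0 for identical samples to 1 for samples whose ranges do not overlap.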

midrc_react.core.numeric_distances.calc_numerical_metric_by_feature(df, feature: str, dataset_column: str, metric_function)

Calculate a specified metric based on a single feature for input datasets.

Parameters:
  • df – pandas DataFrame containing the data

  • feature – a string representing the feature to calculate the metric for

  • dataset_column – a string representing the column containing the dataset information

  • metric_function – a function to calculate the desired metric (e.g., cucconi, ks_2samp)

Returns:

A dictionary containing metric results for each dataset combination.

Return type:

dict
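The pattern described here (group one feature by dataset, then apply a metric function to every pair of datasets) can be sketched as follows. This is a dependency-free illustration that uses a list of row dictionaries in place of a pandas DataFrame. The function name metric_by_dataset_pair, the toy metric mean_gap, and the "A vs B" key format are all assumptions made for the sketch, not the library's behavior.

```python
from itertools import combinations


def metric_by_dataset_pair(rows, feature, dataset_column, metric_function):
    """Apply metric_function to the feature values of every pair of datasets."""
    # Group the feature values by dataset name, preserving first-seen order.
    groups = {}
    for row in rows:
        groups.setdefault(row[dataset_column], []).append(row[feature])
    # The "A vs B" key format is an assumption made for this sketch.
    return {
        f"{first} vs {second}": metric_function(groups[first], groups[second])
        for first, second in combinations(groups, 2)
    }


def mean_gap(x, y):
    """Toy metric: absolute difference between the two sample means."""
    return abs(sum(x) / len(x) - sum(y) / len(y))


rows = [
    {"_dataset_": "A", "age": 30}, {"_dataset_": "A", "age": 50},
    {"_dataset_": "B", "age": 60}, {"_dataset_": "B", "age": 80},
    {"_dataset_": "C", "age": 40},
]
results = metric_by_dataset_pair(rows, "age", "_dataset_", mean_gap)
```

With three datasets the result holds three entries, one per unordered pair. Any callable that accepts two sequences of values could be passed in place of mean_gap.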

midrc_react.core.numeric_distances.calc_wasserstein_by_feature(df, feature: str, dataset_column: str = '_dataset_', scaling: str = None)

Calculate the Wasserstein distance for a specific feature.

Parameters:
  • df (pd.DataFrame) – The DataFrame containing the data.

  • feature (str) – The name of the feature to calculate the metric for.

  • dataset_column (str, optional) – The name of the column containing the dataset information. Defaults to ‘_dataset_’.

  • scaling (str, optional) – The scaling method to use for the feature (e.g., ‘standard’, ‘minmax’, ‘maxabs’, or ‘robust’). Defaults to None.

See also

calc_ks2_samp_by_feature() :

Calculates the Kolmogorov-Smirnov test for a feature.

get_supported_scaling_methods() :

Returns the list of scaling methods that can be used.

Returns:

A dictionary containing the metric results for each dataset combination.

Return type:

dict
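As background on the metric, the first-order Wasserstein distance between two one-dimensional samples equals the area between their empirical cumulative distribution functions. The following is a minimal standard-library sketch of that definition. The name wasserstein_1d is an assumption, and the library may compute the distance differently (for example through an existing statistics package).

```python
from bisect import bisect_right


def wasserstein_1d(sample_a, sample_b):
    """First-order Wasserstein distance between two 1-D empirical distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("Both samples must be non-empty")
    points = sorted(set(a + b))
    total = 0.0
    # Integrate |CDF_a - CDF_b| piecewise between consecutive observed values;
    # both CDFs are constant on each such interval.
    for left, right in zip(points, points[1:]):
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


same = wasserstein_1d([1, 2, 3], [1, 2, 3])      # identical samples
shifted = wasserstein_1d([0, 1, 2], [5, 6, 7])   # second sample is the first plus 5
```

Unlike the KS statistic, this distance is expressed in the units of the feature, which is one reason scaling the feature first can make values comparable across features. For a pure shift, the distance is the size of the shift (up to floating-point rounding).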

midrc_react.core.numeric_distances.generate_histogram(df, dataset_column, dataset_name, feature_column, bin_width=0.01)

Generate a histogram for a specific dataset within a DataFrame.

Parameters:
  • df – DataFrame containing the dataset.

  • dataset_column – Name of the dataset column within the DataFrame.

  • dataset_name – Name of the dataset within the DataFrame.

  • feature_column – Name of the feature column within the DataFrame.

  • bin_width – Width of each histogram bin (default is 0.01).

Returns:

hist: Array of histogram values. bins: Array of bin edges.

Return type:

tuple

The histogram is computed from the feature_column values of the rows belonging to dataset_name, using the given bin width.
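To make the role of bin_width concrete, here is a small sketch of fixed-width binning that returns counts together with bin edges. It assumes the values lie in a known range (0 to 1 by default, which is an assumption suggested by the small default bin width, not something the source states). The function name fixed_width_histogram and the lower and upper parameters are hypothetical.

```python
import math


def fixed_width_histogram(values, bin_width=0.01, lower=0.0, upper=1.0):
    """Count values into equal-width bins spanning [lower, upper]."""
    # Round before ceil so that e.g. 1.0 / 0.01 does not become 101 bins.
    n_bins = max(1, math.ceil(round((upper - lower) / bin_width, 9)))
    edges = [lower + i * bin_width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for value in values:
        if not lower <= value <= upper:
            continue  # out-of-range values are ignored in this sketch
        # Clamp so that value == upper lands in the last bin.
        index = min(int((value - lower) / bin_width), n_bins - 1)
        counts[index] += 1
    return counts, edges


hist, bins = fixed_width_histogram([0.05, 0.15, 0.17, 0.95, 1.0], bin_width=0.25)
```

A smaller bin width gives a finer histogram but leaves more bins empty, which matters when two histograms are later compared bin by bin.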

midrc_react.core.numeric_distances.get_supported_scaling_methods()

Get a list of supported scaling methods.

Returns:

A list of supported scaling methods.

Return type:

list

midrc_react.core.numeric_distances.scale_feature(df, feature: str, method: str = 'standard')

Scale a feature using the specified method. The default, ‘standard’, normalizes the feature to mean 0 and standard deviation 1.

Parameters:
  • df – pandas DataFrame containing the data

  • feature – a string representing the feature to normalize

  • method – a string representing the normalization method to use

See also

get_supported_scaling_methods() :

Returns the list of scaling methods that can be used.

Returns:

A copy of the DataFrame with the feature normalized.

midrc_react.core.numeric_distances.scale_values(values, method: str = 'standard')

Scale a list of values using the specified method. The default, ‘standard’, normalizes the values to mean 0 and standard deviation 1.

Parameters:
  • values – a list of values to normalize

  • method – a string representing the normalization method to use

See also

get_supported_scaling_methods() :

Returns the list of scaling methods that can be used.

Returns:

The normalized values.
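The method names ‘standard’, ‘minmax’, ‘maxabs’, and ‘robust’ appear in the scaling parameter descriptions of this module. The sketch below shows the conventional formula usually associated with each name, using only the standard library. Whether midrc_react applies exactly these formulas is an assumption, and the function name scale is hypothetical.

```python
import statistics


def scale(values, method="standard"):
    """Return a scaled copy of values using conventional scaler formulas."""
    values = [float(v) for v in values]
    if method == "standard":      # subtract the mean, divide by the standard deviation
        center, spread = statistics.fmean(values), statistics.pstdev(values)
    elif method == "minmax":      # map the observed range onto [0, 1]
        center, spread = min(values), max(values) - min(values)
    elif method == "maxabs":      # divide by the largest absolute value
        center, spread = 0.0, max(abs(v) for v in values)
    elif method == "robust":      # subtract the median, divide by the interquartile range
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        center, spread = statistics.median(values), q3 - q1
    else:
        raise ValueError(f"Unsupported scaling method: {method!r}")
    spread = spread or 1.0  # avoid dividing by zero for constant input
    return [(v - center) / spread for v in values]


data = [1, 2, 3, 4, 5]
standard = scale(data, "standard")
minmax = scale(data, "minmax")
maxabs = scale(data, "maxabs")
robust = scale(data, "robust")
```

Only the ‘standard’ method yields mean 0 and standard deviation 1. The ‘robust’ variant is less sensitive to outliers because it relies on the median and quartiles instead of the mean and standard deviation.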

Module contents