midrc_react.core package
Submodules
midrc_react.core.aggregate_jsd_calc module
This module contains functions for calculating Jensen-Shannon Distance (JSD) between datasets.
- midrc_react.core.aggregate_jsd_calc.calc_aggregate_jsd_at_date(df1: DataFrame, df2: DataFrame, cols_to_use: list[str], date: object) float
Calculate Jensen-Shannon Distance (JSD) based on features between two datasets at a specific date.
- Parameters:
df1 – pandas DataFrame containing the first dataset
df2 – pandas DataFrame containing the second dataset
cols_to_use – list of columns to use for the JSD calculation
date – date at which to calculate the JSD
- Returns:
JSD value between the two datasets at the given date
- Return type:
float
- midrc_react.core.aggregate_jsd_calc.calc_jsd_by_features(df_list: list[DataFrame], cols_to_use: list[str]) dict[str, float]
Calculate the Jensen-Shannon Distance (JSD) over the selected features for each combination of input datasets.
- Parameters:
df_list – list of pandas DataFrames containing the datasets
cols_to_use – list of columns to use for the JSD calculation
- Returns:
dictionary of JSD values for each dataset combination
- Return type:
dict
- midrc_react.core.aggregate_jsd_calc.calc_jsd_by_features_2df(df1: DataFrame, df2: DataFrame, cols_to_use: list[str]) float
Calculate Jensen-Shannon Distance (JSD) based on features between two datasets.
- Parameters:
df1 – pandas DataFrame containing the first dataset
df2 – pandas DataFrame containing the second dataset
cols_to_use – list of columns to use for the JSD calculation
- Returns:
JSD value between the two datasets
- Return type:
float
- midrc_react.core.aggregate_jsd_calc.calc_jsd_from_counts_dict(counts_dict, dataset_names)
Calculates the Jensen-Shannon Distance (JSD) between each pair of datasets in a dictionary.
- Parameters:
counts_dict – dictionary of counts for each dataset
dataset_names – list of dataset names to compare
- Returns:
dictionary of JSD values for each dataset combination
- Return type:
dict
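The pairwise calculation described above can be sketched in pure Python. This is a minimal illustration, not the package's implementation: the helper names js_distance and pairwise_js_distance, the base-2 logarithm, and the (name_a, name_b) tuple keys are all assumptions made for the example.

```python
from itertools import combinations
from math import log2, sqrt


def js_distance(counts_p, counts_q):
    """Jensen-Shannon distance (base 2) between two aligned count vectors."""
    total_p, total_q = sum(counts_p), sum(counts_q)
    p = [c / total_p for c in counts_p]
    q = [c / total_q for c in counts_q]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL divergence.
        return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    # The distance is the square root of the divergence; clamp tiny negative rounding.
    return sqrt(max(divergence, 0.0))


def pairwise_js_distance(counts_dict, dataset_names):
    """Hypothetical helper: one distance per unordered pair of dataset names."""
    return {
        (a, b): js_distance(counts_dict[a], counts_dict[b])
        for a, b in combinations(dataset_names, 2)
    }
```

With base-2 logarithms the distance lies between 0 (identical distributions) and 1 (distributions with no overlap).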
midrc_react.core.cucconi module
This module contains functions for calculating the Cucconi test and distribution.
- class midrc_react.core.cucconi.CucconiMultisampleResult(statistic, pvalue)
Bases:
tuple
- pvalue
Alias for field number 1
- statistic
Alias for field number 0
- class midrc_react.core.cucconi.CucconiResult(statistic, pvalue)
Bases:
tuple
- pvalue
Alias for field number 1
- statistic
Alias for field number 0
- midrc_react.core.cucconi.cucconi_multisample_test(samples: list[ndarray[Any, dtype[_ScalarType_co]]], method: str = 'bootstrap', replications: int = 1000, ties: str = 'average', n_jobs: int = 1) CucconiMultisampleResult
Method to perform a multisample Cucconi scale-location test.
- Parameters:
samples (List[numpy.ndarray]) – list of observation vectors
method (str) – method for determining p-value, possible values are ‘bootstrap’ and ‘permutation’
replications (int) – number of bootstrap replications
ties (str) – string specifying a method to deal with ties in data, possible values as for scipy.stats.rankdata
n_jobs (int) – the maximum number of concurrently running jobs. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. None is a marker for ‘unset’ that will be interpreted as n_jobs=1 (sequential execution)
- Returns:
namedtuple with test statistic value and the p-value
- Return type:
tuple
- Raises:
ValueError – if the ‘method’ parameter is neither ‘bootstrap’ nor ‘permutation’
Examples
>>> np.random.seed(987654321)  # set random seed to get the same result
>>> sample_a = sample_b = np.random.normal(loc=0, scale=1, size=100)
>>> cucconi_multisample_test([sample_a, sample_b], replications=100000)
CucconiMultisampleResult(statistic=6.996968353551774e-07, pvalue=1.0)

>>> np.random.seed(987654321)
>>> sample_a = np.random.normal(loc=0, scale=1, size=100)
>>> sample_b = np.random.normal(loc=10, scale=10, size=100)
>>> cucconi_multisample_test([sample_a, sample_a, sample_b], method='permutation')
CucconiMultisampleResult(statistic=45.3891929069273, pvalue=0.000999000999000999)
- midrc_react.core.cucconi.cucconi_test(a: ndarray[Any, dtype[_ScalarType_co]], b: ndarray[Any, dtype[_ScalarType_co]], method: str = 'bootstrap', replications: int = 1000, ties: str = 'average', n_jobs: int = 1) CucconiResult
Method to perform a Cucconi scale-location test.
- Parameters:
a (np.ndarray) – vector of observations
b (np.ndarray) – vector of observations
method (str) – method for determining p-value, possible values are ‘bootstrap’ and ‘permutation’
replications (int) – number of bootstrap replications
ties (str) – string specifying a method to deal with ties in data, possible values as for scipy.stats.rankdata
n_jobs (int) – the maximum number of concurrently running jobs. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. None is a marker for ‘unset’ that will be interpreted as n_jobs=1 (sequential execution)
- Returns:
namedtuple with test statistic value and the p-value
- Return type:
tuple
- Raises:
ValueError – if the ‘method’ parameter is neither ‘bootstrap’ nor ‘permutation’
Examples
>>> np.random.seed(987654321)  # set random seed to get the same result
>>> sample_a = sample_b = np.random.normal(loc=0, scale=1, size=100)
>>> cucconi_test(sample_a, sample_b, replications=10000)
CucconiResult(statistic=3.7763314663244195e-08, pvalue=1.0)

>>> np.random.seed(987654321)
>>> sample_a = np.random.normal(loc=0, scale=1, size=100)
>>> sample_b = np.random.normal(loc=10, scale=10, size=100)
>>> cucconi_test(sample_a, sample_b, method='permutation')
CucconiResult(statistic=2.62372293956099, pvalue=0.000999000999000999)
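The p-value 0.000999000999000999 in the permutation examples equals 1/1001. That is what the common (hits + 1) / (replications + 1) convention yields with the default replications=1000 when no permuted statistic reaches the observed one. Whether the package uses exactly this convention is an assumption. The sketch below shows the general permutation scheme with a stand-in statistic (absolute difference of means), not the Cucconi statistic. The names permutation_pvalue and abs_mean_difference are hypothetical.

```python
import random


def permutation_pvalue(a, b, statistic, replications=1000, seed=None):
    """Permutation p-value for a two-sample statistic (larger means more extreme)."""
    rng = random.Random(seed)
    observed = statistic(a, b)
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(replications):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations,
    # so the smallest attainable p-value is 1 / (replications + 1).
    return (hits + 1) / (replications + 1)


def abs_mean_difference(a, b):
    # Stand-in statistic for illustration only; this is NOT the Cucconi statistic.
    return abs(sum(a) / len(a) - sum(b) / len(b))
```

Under this convention the p-value can never be exactly zero, which is why a strongly significant result with 1000 replications bottoms out at 1/1001.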
midrc_react.core.data_preprocessing module
This module contains functions for data preprocessing and combining datasets.
- midrc_react.core.data_preprocessing.bin_dataframe_column(df_to_bin: DataFrame, column_name: str, cut_column_name: str = 'CUT', bins: list[int], list[float] = None, labels: list[str] = None, *, right: bool = False)
Cuts the specified column into bins and adds a column with the bin labels.
- Parameters:
df_to_bin – pandas DataFrame containing the data
column_name – name of the column to be binned
cut_column_name – name of the column to be added with the bin labels
bins – list of bins to be used for the binning
labels – list of labels for the bins
right – whether to use right-inclusive intervals
- Returns:
pandas DataFrame with the binned column and the labels
- Return type:
pd.DataFrame
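The effect of the right flag can be illustrated without pandas. The sketch below assigns a single value to a labelled bin using the same interval conventions as the right parameter of pandas.cut. That bin_dataframe_column delegates to pandas.cut is an assumption, and bin_label is a hypothetical helper.

```python
from bisect import bisect_left, bisect_right


def bin_label(value, edges, labels, right=False):
    """Return the label of the bin containing value, or None if it lies outside all bins."""
    if right:
        # Right-inclusive intervals: (edges[i], edges[i + 1]]
        index = bisect_left(edges, value) - 1
    else:
        # Left-inclusive intervals: [edges[i], edges[i + 1])
        index = bisect_right(edges, value) - 1
    if 0 <= index < len(labels):
        return labels[index]
    return None
```

With edges [0, 18, 50, 65] and right=False, an age of 18 falls into the second bin, and 65 falls outside the last bin because its upper edge is excluded.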
- midrc_react.core.data_preprocessing.combine_datasets_from_list(df_list: list[DataFrame], dataset_column: str = '_dataset_')
Combines a list of dataframes into a single dataframe with a new column for the dataset name.
- Parameters:
df_list (list[pd.DataFrame]) – A list of dataframes to be combined.
dataset_column (str, optional) – The name of the column to be used for the dataset name. Defaults to ‘_dataset_’.
- Returns:
A combined dataframe with a new column for the dataset name.
- Return type:
pd.DataFrame
midrc_react.core.datetimetools module
This module contains functions for converting between different date and time formats.
- midrc_react.core.datetimetools.convert_date_to_milliseconds(date: QDate)
Converts a date to milliseconds since epoch.
- Parameters:
date (QDate) – PySide6 QDate object.
- Returns:
Milliseconds since epoch.
- Return type:
int
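For readers without a Qt environment, the same conversion can be sketched with the standard library. This example assumes the date is interpreted as midnight UTC; the time zone used by the package is not stated here. The name date_to_milliseconds is hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_milliseconds(d: date) -> int:
    """Milliseconds from the Unix epoch to midnight UTC on date d."""
    midnight = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    # Integer division of two timedeltas avoids floating-point rounding.
    return (midnight - EPOCH) // timedelta(milliseconds=1)
```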
- midrc_react.core.datetimetools.get_date_parts(date_val)
Extracts the year, month, and day from a date value.
- Parameters:
date_val – A date value (could be a string, datetime, or other formats).
- Returns:
A tuple containing the year, month, and day.
- Return type:
tuple
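A minimal sketch of this kind of extraction, assuming the input is an ISO-formatted string, a date, or a datetime. The real function may accept further formats, and date_parts is a hypothetical name.

```python
from datetime import date, datetime
from typing import Union


def date_parts(date_val: Union[str, date, datetime]):
    """Return (year, month, day) for an ISO date string, date, or datetime."""
    if isinstance(date_val, str):
        # Only the leading YYYY-MM-DD portion is parsed in this sketch.
        date_val = date.fromisoformat(date_val[:10])
    if isinstance(date_val, date):  # datetime is a subclass of date
        return date_val.year, date_val.month, date_val.day
    raise TypeError(f"unsupported date value: {date_val!r}")
```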
- midrc_react.core.datetimetools.numpy_datetime64_to_qdate(numpy_datetime: datetime64)
Convert a NumPy datetime64 object to a PySide6 QDate object.
- Parameters:
numpy_datetime (numpy.datetime64) – NumPy datetime64 object.
- Returns:
PySide6 QDate object representing the same date.
- Return type:
QDate
- midrc_react.core.datetimetools.pandas_date_to_qdate(pandas_date)
Convert a pandas Timestamp or datetime object to a PySide6 QDate object.
- Parameters:
pandas_date (pd.Timestamp or datetime64) – Pandas Timestamp or datetime object.
- Returns:
PySide6 QDate object representing the same date.
- Return type:
QDate
midrc_react.core.excel_layout module
This module contains classes and functions for building and processing Excel and CSV files.
- class midrc_react.core.excel_layout.DataSheet(sheet_name, data_source, custom_age_ranges, is_excel=False, file: ExcelFile = None, df: DataFrame = None)
Bases:
object
Class representing a data sheet.
- Variables:
name (str) – The name of the data sheet.
_columns (dict) – A dictionary containing the columns of the data sheet.
- __init__(sheet_name, data_source, custom_age_ranges, is_excel=False, file: ExcelFile = None, df: DataFrame = None)
Initialize the DataSheet object.
- Parameters:
sheet_name (str) – The name of the sheet in the Excel file to parse.
data_source (dict) – The data source object.
is_excel (bool, optional) – Flag indicating whether the data source is an Excel file. Defaults to False.
file (pd.ExcelFile, optional) – The Excel file to read the sheet from
- Returns:
None
- property columns
Return the columns.
- create_custom_age_columns(age_ranges: list[tuple])
Creates custom age columns by summing values from columns that match each age range.
- Parameters:
age_ranges (list of tuple) – Each tuple is (min_age, max_age).
Notes
Drops any previously created custom columns.
Considers only columns that start with a digit and do not contain ‘(%)’ or ‘(CUSUM)’.
- A column is included for an age range if:
Its lower bound (the first number in the header) is within the range.
If an upper bound exists (a second number), the age range’s max is not less than it.
If no upper bound exists, the column is only included if max_age is infinite.
Warns if any eligible column is unused.
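The inclusion rules in the notes above can be written as a small predicate. This is a sketch under assumptions: headers are taken to look like "18-29" or "85+", and column_in_age_range is a hypothetical helper, not part of the package.

```python
import math
import re


def column_in_age_range(header: str, min_age: float, max_age: float) -> bool:
    """Apply the inclusion rules to one column header for one (min_age, max_age) range."""
    # Eligible columns start with a digit and are not percentage or cumulative columns.
    if re.match(r"\d", header) is None or "(%)" in header or "(CUSUM)" in header:
        return False
    numbers = [int(n) for n in re.findall(r"\d+", header)]
    lower = numbers[0]
    if not min_age <= lower <= max_age:
        return False
    if len(numbers) > 1:
        # Bounded column: the range's maximum must not be less than the upper bound.
        return max_age >= numbers[1]
    # Open-ended column (for example "85+"): only an unbounded range can include it.
    return math.isinf(max_age)
```

For example, a column headed "18-29" belongs to the range (18, 49) but not to (18, 25), and "85+" belongs only to a range whose max_age is infinite.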
- property data_columns
Return the data columns. This skips the first column, which is the date column.
- property df
Return the dataframe.
- class midrc_react.core.excel_layout.DataSource(data_source, custom_age_ranges=None)
Bases:
object
Class representing a data source with optional plugin-based preprocessing and numeric column adjustments.
- __init__(data_source, custom_age_ranges=None)
Initializes the DataSource class.
- Parameters:
data_source (dict) – The data source configuration.
custom_age_ranges (dict, optional) – A dictionary of custom age ranges.
- apply_numeric_column_adjustments(df: DataFrame)
Applies numeric column adjustments to a DataFrame using binning.
- Parameters:
df (pd.DataFrame) – The input DataFrame.
- Returns:
The DataFrame with numeric column adjustments.
- Return type:
pd.DataFrame
- build_data_frames_from_content(content: BytesIO)
Loads data from an in-memory content stream.
- Parameters:
content (io.BytesIO) – Binary Excel file data.
- Returns:
None
- build_data_frames_from_csv(filename: str)
Loads and preprocesses a CSV or TSV file.
- Parameters:
filename (str) – The file path.
- Returns:
None
- build_data_frames_from_file(filename: str)
Loads an Excel file.
- Parameters:
filename (str) – The file path.
- Returns:
None
- calculate_cumulative_sums(df: DataFrame, col: str)
Calculates cumulative sums for a given column.
- Parameters:
df (pd.DataFrame) – The input DataFrame.
col (str) – The column to calculate cumulative sums for.
- Returns:
A DataFrame with cumulative sums.
- Return type:
pd.DataFrame
- create_sheets(file: ExcelFile)
Creates sheets from a given file.
- Parameters:
file (pd.ExcelFile) – The Excel file object.
- Returns:
None
- create_sheets_from_df(df: DataFrame)
Creates data sheets from a DataFrame.
- Parameters:
df (pd.DataFrame) – The processed DataFrame.
- Returns:
None
- static load_plugin(plugin_path)
Dynamically loads a preprocessing plugin from the given path.
- Parameters:
plugin_path (str) – Path to the plugin Python file.
- Returns:
A reference to the plugin’s preprocess_data function if found, else None.
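Loading a function from a Python file by path is commonly done with importlib. The sketch below shows that pattern; it is an assumption that load_plugin works this way, and load_preprocess_plugin is a hypothetical name.

```python
import importlib.util
from pathlib import Path


def load_preprocess_plugin(plugin_path: str):
    """Return the plugin's preprocess_data callable, or None if it is unavailable."""
    path = Path(plugin_path)
    if not path.is_file():
        return None
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        return None
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the plugin file's top-level code
    func = getattr(module, "preprocess_data", None)
    return func if callable(func) else None
```

Because exec_module runs arbitrary code from the file, a plugin path should only point to trusted sources.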
- property numeric_cols
Returns a dictionary of numeric columns to use for the analysis.
- Returns:
A dictionary of numeric columns to use for the analysis.
- Return type:
dict
- raw_columns_to_use()
Returns a list of the raw columns to use for the analysis.
- Returns:
A list of the raw columns to use for the analysis.
- Return type:
list
midrc_react.core.excelparse module
This module contains a function to parse Excel files and return a pandas DataFrame.
CURRENTLY UNUSED IN midrc_react PROJECT
- midrc_react.core.excelparse.excelparse(filename, sheet_name)
Parse a spreadsheet using the filename and sheet name specified and return a pandas dataframe
- Parameters:
filename (string) – filename to open
sheet_name (string) – sheet name to parse
- Returns:
pandas dataframe
midrc_react.core.famd_calc module
This module contains functions for calculating Factor Analysis of Mixed Data (FAMD) and related distances.
- midrc_react.core.famd_calc.adjust_bin_widths(bins, hist, multiple=2)
Adjust the width of bins by merging a specified number of adjacent bins.
- Parameters:
bins (array) – The original bin edges.
hist (array) – The original histogram values.
multiple (int) – The factor by which to adjust the bin widths (e.g., 2 to double, 3 to triple).
- Returns:
new_bins (array) – The new bin edges with adjusted widths.
new_hist (array) – The new histogram values corresponding to the new bins.
- Return type:
tuple
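Merging adjacent bins can be sketched as follows. How the package treats a trailing group with fewer than multiple bins is not documented here; this sketch keeps it as a narrower final bin, which is an assumption, and merge_bins is a hypothetical name.

```python
def merge_bins(bins, hist, multiple=2):
    """Merge every `multiple` adjacent bins; returns (new_bins, new_hist)."""
    # Sum the counts of each group of adjacent bins.
    new_hist = [sum(hist[i:i + multiple]) for i in range(0, len(hist), multiple)]
    # Keep every `multiple`-th edge, then make sure the final edge survives.
    new_bins = list(bins[::multiple])
    if new_bins[-1] != bins[-1]:
        new_bins.append(bins[-1])
    return new_bins, new_hist
```

The total count is preserved, and the result still satisfies len(new_bins) == len(new_hist) + 1.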
- midrc_react.core.famd_calc.calc_famd_df(raw_df, cols_to_use, numeric_cols, dataset_column='_dataset_', print_outliers=False, famd_column='famd_x_coordinates')
Calculate the FAMD coordinates for the input DataFrame and return a new DataFrame with the coordinates added.
- Parameters:
raw_df (DataFrame) – The raw data to be preprocessed.
cols_to_use (list) – List of columns to use for the calculation.
numeric_cols (list) – List of numeric columns to use for the calculation.
dataset_column (str, optional) – The name of the column to be used for the dataset name. Defaults to ‘_dataset_’.
print_outliers (bool, optional) – Whether to print outliers. Defaults to False.
famd_column (str, optional) – The name of the column to be used for the FAMD coordinates. Defaults to ‘famd_x_coordinates’.
- Returns:
A DataFrame with the FAMD coordinates added.
- Return type:
DataFrame
- midrc_react.core.famd_calc.calc_famd_distances(df, cols_to_use, numeric_cols, dataset_column='_dataset_', distance_metrics='all', jsd_scaled_bin_width=0.01, print_outliers=False)
Calculate various distance metrics based on FAMD coordinates calculated from the input DataFrame using the feature columns specified in cols_to_use.
This function computes Jensen-Shannon Divergence (JSD), Wasserstein distance, Kolmogorov-Smirnov (KS) statistics, and Cucconi distance, with optional scaling methods for the given distances. The results are returned as a dictionary where keys represent the metric names.
- Parameters:
df (DataFrame) – The DataFrame containing the data.
cols_to_use (list) – List of columns to use for the calculation.
numeric_cols (list) – List of numeric columns to use for the calculation.
dataset_column (str) – The name of the column to be used for the dataset name.
distance_metrics (tuple) – A tuple of strings specifying which distance metrics to compute. Use ‘all’ to compute all available metrics or specify individual metrics (e.g., ‘jsd’, ‘wass’, ‘ks2’, ‘cuc’) along with optional scaling options (e.g., ‘wass(std)’, ‘ks2(rob)’, etc.).
jsd_scaled_bin_width (float) – Width of each histogram bin for scaled JSD (default is 0.01).
print_outliers (bool) – Whether to print outliers (default is False).
- Returns:
Dictionary of distance values specified in distance_metrics for each dataset combination.
- Return type:
dict
- midrc_react.core.famd_calc.calc_famd_ks2_at_date(df1, df2, cols_to_use, numeric_cols, calc_date)
Calculate the KS2 distance between two datasets at a specific date.
- Parameters:
df1 – first DataFrame
df2 – second DataFrame
cols_to_use – columns to use for the calculation
numeric_cols – list of numeric columns
calc_date – date to calculate the KS2 distance
- Returns:
KS2 distance at specified date
- Return type:
float
- midrc_react.core.famd_calc.calc_famd_ks2_at_dates(df1, df2, cols_to_use, numeric_cols, calc_date_list)
Calculate the KS2 distance between two datasets at multiple dates.
- Parameters:
df1 – first DataFrame
df2 – second DataFrame
cols_to_use – columns to use for the calculation
numeric_cols – list of numeric columns
calc_date_list – list of dates to calculate the KS2 distance
- Returns:
list of KS2 distances at each date
- Return type:
list(float)
- midrc_react.core.famd_calc.fit_famd(data)
Fits a Factor Analysis of Mixed Data (FAMD) model to the input data.
- Parameters:
data (pandas.DataFrame) – The input data to fit the FAMD model.
- Returns:
A tuple containing the fitted FAMD model and the row coordinates.
- Return type:
tuple
Example
famd_model, coordinates = fit_famd(data)
- midrc_react.core.famd_calc.preprocess_data_for_famd(raw_df, features, numeric_features, scaling_method='standard')
Preprocesses the raw data for Factor Analysis of Mixed Data (FAMD).
- Parameters:
raw_df (DataFrame) – The raw data to be preprocessed.
features (List) – List of features to be included in the preprocessing.
numeric_features (List) – List of numeric features to be scaled.
scaling_method (str) – The scaling method to use for numeric features.
- Returns:
c_data (DataFrame) – Preprocessed data with selected features.
df (DataFrame) – Concatenated DataFrame with preprocessed data and ‘dataset’ column.
- Return type:
tuple
midrc_react.core.jsdconfig module
This module contains the JSDConfig class, which loads and stores data from a YAML file.
- class midrc_react.core.jsdconfig.JSDConfig(filename: str = 'jsdconfig.yaml')
Bases:
object
The JSDConfig class loads and stores data from a YAML file.
- Variables:
filename (str) – The name of the YAML file to load. Default is ‘jsdconfig.yaml’.
data (dict) – The loaded data from the YAML file.
- __init__(self, filename='jsdconfig.yaml')
Initializes a new instance of JSDConfig.
- __post_init__(self)
Loads the YAML data from the current filename.
- data: dict
- filename: str = 'jsdconfig.yaml'
- set_filename(new_filename: str)
Set a new filename and reload the data.
- Parameters:
new_filename (str) – The new filename to load.
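The load-on-construction and reload-on-rename behaviour described above follows a common dataclass pattern. The sketch below uses the standard-library json module as a stand-in for a YAML parser so that it has no third-party dependencies. The class name ConfigSketch, its default filename, and the _load helper are hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ConfigSketch:
    filename: str = "config.json"  # hypothetical default; the real class reads a YAML file
    data: dict = field(init=False, default_factory=dict)

    def __post_init__(self):
        # Runs automatically after the generated __init__ has set the filename.
        self._load()

    def _load(self):
        # json stands in for a YAML parser here so the sketch needs only the stdlib.
        with open(self.filename, encoding="utf-8") as handle:
            self.data = json.load(handle)

    def set_filename(self, new_filename: str):
        """Point at a different file and reload its contents."""
        self.filename = new_filename
        self._load()
```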
midrc_react.core.jsdcontroller module
This module contains the JSDController class, which manages the JSD view and model.
- class midrc_react.core.jsdcontroller.JSDController(jsd_view, jsd_model, config)
Bases:
QObject
Class JSDController
This class represents a JSD Controller. It emits a signal when the model changes.
- Variables:
modelChanged – A Signal that is emitted when the model changes.
fileChangedSignal – A Signal that is emitted when the file changes.
NOT_REPORTED_COLUMN_NAME (str) – A constant string representing the ‘Not Reported’ column name.
- NOT_REPORTED_COLUMN_NAME = 'Not Reported'
- __init__(jsd_view, jsd_model, config)
Initialize the JSDController.
- Parameters:
jsd_view (object) – The JSD view object.
jsd_model (object) – The JSD model object.
config (JSDConfig) – The JSDConfig object holding the loaded configuration data.
- Returns:
None
- category_changed()
Parses the dates from all files for the current category and updates the data in the model appropriately.
- Returns:
None
- connect_signals()
Connects signals for file and category comboboxes.
- fileChangedSignal
- file_changed(_, new_category_index=None)
Parses the categories from the files selected in the comboboxes and updates the category box appropriately. Emits the fileChangedSignal signal upon completion.
- Parameters:
new_category_index (optional) – The index of the category to set; if None, the previous index is used.
- get_categories()
Get the list of categories from the data sources.
- Returns:
A list of categories.
- Return type:
list
- get_cols_to_use_for_jsd_calc(source_id, category)
Generates a list of columns from a sheet that should be used in the JSD calculation.
This also handles custom categories, such as custom age ranges.
- Parameters:
source_id (str) – The combobox used to get the data file from.
category (str) – The sheet category to get the columns from.
- Returns:
List of columns in the current sheet category
- get_file_sheets_from_index(index=0)
Get the sheets from the selected file combobox.
- Parameters:
index (int) – The index of the file combobox. Default is 0.
- Returns:
A dictionary containing the sheets from the selected file.
- Return type:
dict
- get_spider_plot_values(calc_date=None)
Compiles a dictionary of categories and JSD values for a given date.
- Parameters:
calc_date (Optional[datetime.date]) – The date to use for JSD calculation. Default is None.
- Returns:
A dictionary of categories and JSD values for a given date.
- Return type:
dict
- get_timeline_data(category: str)
Get the timeline data for the specified category.
- Parameters:
category (str) – The category for which to get the timeline data.
- Returns:
A DataFrame containing the timeline data.
- Return type:
pd.DataFrame
- initialize()
Initialize the JSDController.
- Returns:
None
- property jsd_model: JSDTableModel
Get the JSD model object.
- Returns:
The JSD model object.
- Return type:
object
- property jsd_view: JsdViewBase
Get the JSD view object.
- Returns:
The JSD view object.
- Return type:
object
- modelChanged
- property num_categories: int
Get the number of categories.
- Returns:
The number of categories.
- Return type:
int
- update_category_plots()
Update the category plots.
This method updates the JSD timeline plot and the area chart.
- Returns:
True if the update was successful, False otherwise
- update_file_based_charts()
Update the file-based charts.
This method updates the pie chart dock and the spider chart.
- Returns:
True if the update was successful, False otherwise
- midrc_react.core.jsdcontroller.calculate_jsd(df1, df2, cols_to_use, calc_date)
Calculate the Jensen-Shannon distance between two dataframes for a given date.
The date column of each dataframe is assumed to be sorted in ascending order.
Note: The Jensen-Shannon distance returned is the square root of the Jensen-Shannon divergence.
- Parameters:
df1 (pd.DataFrame) – First dataframe.
df2 (pd.DataFrame) – Second dataframe.
cols_to_use (list) – List of columns to use for the calculation.
calc_date (pd.Timestamp) – Date for which the calculation is performed.
- Returns:
Jensen-Shannon distance between the two dataframes.
- Return type:
float
- midrc_react.core.jsdcontroller.remove_elements_less_than_from_sorted_list(sorted_list, value)
Remove elements less than the given value from a sorted list.
- Parameters:
sorted_list (list) – A sorted list of elements.
value – The value to compare against.
- Returns:
A new list containing only elements greater than or equal to the value.
- Return type:
list
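Because the input is sorted, the cut point can be found by binary search rather than by scanning. A minimal sketch, assuming this is the intended behaviour; drop_smaller is a hypothetical name and the package may implement the function differently.

```python
from bisect import bisect_left


def drop_smaller(sorted_list, value):
    """Return a new list with the elements of sorted_list that are >= value."""
    # bisect_left gives the index of the first element not less than value.
    return sorted_list[bisect_left(sorted_list, value):]
```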
midrc_react.core.jsdmodel module
This module contains the JSDTableModel class, which is a subclass of QAbstractTableModel.
- class midrc_react.core.jsdmodel.JSDTableModel(data_source_list=None, custom_age_ranges=None)
Bases:
QAbstractTableModel
A class representing a table model for JSD data.
This class inherits from the QAbstractTableModel class and provides the necessary methods to interact with the JSD data in a table format. It handles the display of data, editing of data, and mapping of colors to specific areas of the table.
- Variables:
HEADER_MAPPING (List[str]) – A list of header labels for the table columns.
data_source_added (Signal) – A signal emitted when a data source is added to the model.
- HEADER_MAPPING = ['Date', 'JSD']
- __init__(data_source_list=None, custom_age_ranges=None)
Initialize the JSDTableModel.
This method initializes the JSDTableModel by setting up the input data, mapping, and raw data sources.
- Parameters:
data_source_list (List[dict], optional) –
A list of data sources. Each data source is a dictionary with the following keys:
’name’ (str): The name of the data source.
’data type’ (str): The type of the data source.
’filename’ (str): The filename of the data source.
custom_age_ranges (Any, optional) – Custom age ranges for the data sources.
- Returns:
None
- add_color_mapping(color: str, mapping_area: Any)
Add a color mapping to the JSDTableModel.
- Parameters:
color (str) – The color to be mapped.
mapping_area (Any) – The area to be mapped to the color.
- Returns:
None
- add_data_source(data_source_dict)
Add a data source to the JSDTableModel.
This method adds a data source to the JSDTableModel by creating a new instance of the DataSource class and storing it in the data_sources dictionary. The data source is identified by its name, which is obtained from the ‘name’ key in the data_source_dict parameter. The DataSource object is initialized with the data_source_dict and the custom_age_ranges, if provided.
- Parameters:
data_source_dict (dict) –
A dictionary containing the information about the data source.
The dictionary should have the following keys:
’name’ (str): The name of the data source.
’data type’ (str): The type of the data source.
’filename’ (str): The filename of the data source.
- Returns:
None
- clear_color_mapping()
Clear the color mapping in the JSDTableModel.
This method clears the color mapping, removing all color mappings from the JSDTableModel.
- Returns:
None
- columnCount(_parent: QModelIndex = None) int
Returns the number of columns in the model.
- Parameters:
_parent (QModelIndex) – The parent index. Defaults to QModelIndex(). This is unused.
- Returns:
The number of columns in the model.
- Return type:
int
- property column_infos
Returns the column information of the JSDTableModel.
This method returns the column information of the JSDTableModel, which is a list of dictionaries representing the metadata for each set of two columns in the model (one column for date, one column for the JSD value).
- Returns:
The column information of the JSDTableModel.
Each dictionary contains the following keys:
’index1’ (int): The index of the first file used
’file1’ (str): The file name of the first file used.
’index2’ (int): The index of the second file used.
’file2’ (str): The file name of the second file used.
- Return type:
List[dict]
- data(index: QModelIndex, role: int = ItemDataRole.DisplayRole) Any | None
Returns the data for the given index and role.
- Parameters:
index (QModelIndex) – The index of the data.
role (int) – The role of the data. Defaults to Qt.DisplayRole.
- Returns:
The data for the given index and role. If the role is Qt.DisplayRole or Qt.EditRole, it returns the corresponding data from the input_data list. If the role is Qt.BackgroundRole, it checks if the index is within any of the mapping areas defined in the _color_mapping dictionary. If it is, it returns the corresponding color from the _color_cache dictionary. If it is not within any mapping area, it returns the color white from the _color_cache dictionary. If the role is not any of the above, it returns None.
- Return type:
Optional[Any]
- data_source_added
- flags(index: QModelIndex) ItemFlag
Get the flags for the given index.
- Parameters:
index (QModelIndex) – The index of the item.
- Returns:
The flags for the item.
- Return type:
ItemFlag
- headerData(section: int, orientation: Orientation, role: int = ItemDataRole.DisplayRole) Any
Returns the header data for the specified section, orientation, and role.
- Parameters:
section (int) – The section index.
orientation (int) – The orientation of the header (Qt.Horizontal or Qt.Vertical).
role (int) – The role of the header data.
- Returns:
The header data for the specified section, orientation, and role.
- Return type:
Any
- property input_data
Returns the input data of the JSDTableModel.
This method returns the input data of the JSDTableModel, which is a list of lists representing the data for each column in the model. Each inner list represents a column and contains the data for that column.
- Returns:
The input data of the JSDTableModel.
- Return type:
List[List[Any]]
- rowCount(parent: QModelIndex = None) int
Get the number of rows in the model.
This method returns the number of rows in the model based on the length of the input data. If the parent index is invalid, this method returns the largest number of rows in a column.
- Parameters:
parent (QModelIndex) – The parent index.
- Returns:
The number of rows in the model.
- Return type:
int
- setData(index: QModelIndex, value: Any, role: int = ItemDataRole.EditRole) bool
Set the data for the given index.
- Parameters:
index (QModelIndex) – The index of the data to be set.
value (Any) – The new value to be set.
role (int) – The role of the data. Defaults to Qt.EditRole.
- Returns:
True if the data was successfully set, False otherwise.
- Return type:
bool
- update_input_data(new_input_data, new_column_infos)
Update the input data and column information in the JSDTableModel.
This method updates the input data and column information in the JSDTableModel. It clears the existing input data and column information, and then sets them to the new values provided as arguments. It also updates the maximum row count based on the new input data.
- Parameters:
new_input_data (List[List[Any]]) – The new input data to be set in the model. It should be a list of lists, where each inner list represents a column and contains the data for that column.
new_column_infos (List[dict]) – The new column information to be set in the model. It should be a list of dictionaries, where each dictionary represents the metadata for a column.
- Returns:
None
midrc_react.core.numeric_distances module
This module contains functions for calculating various numerical distance metrics between datasets.
- midrc_react.core.numeric_distances.build_histogram_dict(df, dataset_column, datasets, feature_column, bin_width, scaling_method=None)
Build a dictionary of histogram data for a specific dataset.
- Parameters:
df – DataFrame containing the dataset.
dataset_column – Name of the dataset column within the DataFrame.
datasets – List of dataset names within the DataFrame.
feature_column – Name of the feature column within the DataFrame.
bin_width – Width of each histogram bin.
scaling_method – Method to use for scaling the feature column (default is None).
See also
get_supported_scaling_methods() – Returns the list of scaling methods that can be used.
- Returns:
hist_dict – Dictionary containing histogram data for the specified dataset.
- Return type:
dict
- midrc_react.core.numeric_distances.calc_cucconi_by_feature(df, feature: str, dataset_column: str = '_dataset_', scaling: str = None)
Calculate the Cucconi test for a specific feature.
- Parameters:
df (pd.DataFrame) – The DataFrame containing the data.
feature (str) – The name of the feature to calculate the metric for.
dataset_column (str, optional) – The name of the column containing the dataset information. Defaults to ‘_dataset_’.
scaling (str, optional) – The scaling method to use for the feature. Defaults to None. (e.g., ‘standard’, ‘minmax’, ‘maxabs’, or ‘robust’)
See also
calc_ks2_samp_by_feature() – Calculates the Kolmogorov-Smirnov test for a feature.
- Returns:
A dictionary containing the metric results for each dataset combination.
- Return type:
dict
- midrc_react.core.numeric_distances.calc_distances_via_df(famd_df: DataFrame, feature_column: str, dataset_column: str = '_dataset_', *, distance_metrics: tuple[str] = 'all', jsd_scaled_bin_width=0.01)
Calculate various distance metrics based on histogram data.
This function computes Jensen-Shannon Divergence (JSD), Wasserstein distance, Kolmogorov-Smirnov (KS) statistics, and Cucconi distance, with optional scaling methods for Wasserstein and KS distances. The results are returned as a dictionary where keys represent the metric names.
- Parameters:
famd_df (pd.DataFrame) – A DataFrame containing FAMD (Factor Analysis of Mixed Data) results.
feature_column (str) – The name of the column containing the feature data.
dataset_column (str) – The name of the column containing the dataset information. Defaults to ‘_dataset_’.
distance_metrics (tuple) – A tuple of strings specifying which distance metrics to compute. Use ‘all’ to compute all available metrics or specify individual metrics (e.g., ‘jsd’, ‘wass’, ‘ks2’, ‘cuc’) along with optional scaling options (e.g., ‘wass(std)’, ‘ks2(rob)’, etc.).
jsd_scaled_bin_width (float) – The bin width to use for the JSD calculation. Defaults to 0.01.
- Returns:
A dictionary with distance metric names as keys and the computed metrics as values. For example, the output could include keys like ‘jsd’, ‘wass’, ‘ks2’, ‘cuc’, etc. If specific scaling options are computed, the keys include these as well, such as ‘wass(std)’, ‘jsd(rob)’, etc.
- Return type:
dict
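The metric strings described above combine a metric name with an optional scaling suffix in parentheses. A minimal sketch of how such a string could be split into its two parts is shown below; parse_metric_spec is a hypothetical helper written for illustration, and the way the package itself interprets these strings is not documented here.

```python
import re

# Metric name, optionally followed by a scaling option in parentheses.
_SPEC_PATTERN = re.compile(r"^(?P<metric>[a-z0-9]+)(?:\((?P<scaling>[a-z]+)\))?$")


def parse_metric_spec(spec):
    """Split e.g. 'wass(std)' into ('wass', 'std'); hypothetical helper."""
    match = _SPEC_PATTERN.match(spec.strip().lower())
    if match is None:
        raise ValueError(f"Unrecognized metric spec: {spec!r}")
    return match.group("metric"), match.group("scaling")


for spec in ("jsd", "wass(std)", "ks2(rob)", "cuc"):
    print(spec, "->", parse_metric_spec(spec))
```

A spec without a suffix yields None for the scaling part, which mirrors the "no scaling" default of the per-feature functions.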
- midrc_react.core.numeric_distances.calc_ks2_samp_by_feature(df, feature: str, dataset_column: str = '_dataset_', scaling: str = None)
Calculate the Kolmogorov-Smirnov test for a specific feature.
- Parameters:
df (pd.DataFrame) – The DataFrame containing the data.
feature (str) – The name of the feature to calculate the metric for.
dataset_column (str, optional) – The name of the column containing the dataset information. Defaults to ‘_dataset_’.
scaling (str, optional) – The scaling method to use for the feature (e.g., ‘standard’, ‘minmax’, ‘maxabs’, or ‘robust’). Defaults to None (no scaling).
See also
calc_wasserstein_by_feature(): Calculates the Wasserstein distance for a feature.
- Returns:
A dictionary containing the metric results for each dataset combination.
- Return type:
dict
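For readers unfamiliar with the two-sample Kolmogorov-Smirnov statistic, the following plain-Python sketch computes it as the largest gap between the two empirical cumulative distribution functions. The function name ks_statistic is hypothetical; the package presumably relies on an existing implementation such as ks_2samp, which also reports a p-value.

```python
def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs (illustrative only)."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

The statistic ranges from 0 (identical empirical distributions) to 1 (no overlap between the samples).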
- midrc_react.core.numeric_distances.calc_numerical_metric_by_feature(df, feature: str, dataset_column: str, metric_function)
Calculate a specified metric based on a single feature for input datasets.
- Parameters:
df – pandas DataFrame containing the data
feature – a string representing the feature to calculate the metric for
dataset_column – a string representing the column containing the dataset information
metric_function – a function to calculate the desired metric (e.g., cucconi, ks_2samp)
- Returns:
A dictionary containing metric results for each dataset combination.
- Return type:
dict
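The "each dataset combination" pattern can be sketched without pandas as follows. This is only an assumption about the general shape of the computation: metric_by_dataset_pair, mean_gap, and the "A vs B" key format are hypothetical and may not match the keys the package actually produces.

```python
from itertools import combinations


def metric_by_dataset_pair(samples_by_dataset, metric_function):
    """Apply metric_function to every unordered pair of datasets."""
    return {
        f"{name_a} vs {name_b}": metric_function(
            samples_by_dataset[name_a], samples_by_dataset[name_b]
        )
        for name_a, name_b in combinations(samples_by_dataset, 2)
    }


def mean_gap(sample_a, sample_b):
    """Toy metric used only for this example: absolute difference of means."""
    return abs(sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b))


samples = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [5, 5, 5]}
print(metric_by_dataset_pair(samples, mean_gap))
```

Passing the metric as a function keeps the pairing logic independent of the statistic, which matches the role of the metric_function parameter described above.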
- midrc_react.core.numeric_distances.calc_wasserstein_by_feature(df, feature: str, dataset_column: str = '_dataset_', scaling: str = None)
Calculate the Wasserstein distance for a specific feature.
- Parameters:
df (pd.DataFrame) – The DataFrame containing the data.
feature (str) – The name of the feature to calculate the metric for.
dataset_column (str, optional) – The name of the column containing the dataset information. Defaults to ‘_dataset_’.
scaling (str, optional) – The scaling method to use for the feature (e.g., ‘standard’, ‘minmax’, ‘maxabs’, or ‘robust’). Defaults to None (no scaling).
See also
calc_ks2_samp_by_feature(): Calculates the Kolmogorov-Smirnov test for a feature.
get_supported_scaling_methods(): Returns the list of scaling methods that can be used.
- Returns:
A dictionary containing the metric results for each dataset combination.
- Return type:
dict
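As background, the one-dimensional Wasserstein distance between two samples equals the area between their empirical cumulative distribution functions. The sketch below computes it directly in plain Python; wasserstein_1d is a hypothetical name, and the package may delegate to a library routine instead.

```python
def wasserstein_1d(sample_a, sample_b):
    """Area between the two empirical CDFs (illustrative only)."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    i = j = 0
    distance = 0.0
    for left, right in zip(points, points[1:]):
        # Empirical CDF values on the interval [left, right).
        while i < len(a) and a[i] <= left:
            i += 1
        while j < len(b) and b[j] <= left:
            j += 1
        distance += abs(i / len(a) - j / len(b)) * (right - left)
    return distance


print(wasserstein_1d([0, 1, 3], [5, 6, 8]))
```

Unlike the KS statistic, this distance is expressed in the units of the feature, which is one reason scaling the feature first can make values comparable across features.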
- midrc_react.core.numeric_distances.generate_histogram(df, dataset_column, dataset_name, feature_column, bin_width=0.01)
Generate a histogram for a specific dataset within a DataFrame.
- Parameters:
df – DataFrame containing the dataset.
dataset_column – Name of the dataset column within the DataFrame.
dataset_name – Name of the dataset within the DataFrame.
feature_column – Name of the feature column within the DataFrame.
bin_width – Width of each histogram bin (default is 0.01).
- Returns:
hist: Array of histogram values. bins: Array of bin edges.
- Return type:
tuple
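A minimal sketch of fixed-width binning for one dataset is given below, using plain Python lists instead of a DataFrame. The helper fixed_width_histogram and its binning rule (bins start at the minimum value) are assumptions made for illustration; the package's actual bin placement is not specified here.

```python
from math import floor


def fixed_width_histogram(values, bin_width=0.01):
    """Count values in consecutive bins of equal width; returns (counts, edges)."""
    lowest = min(values)
    n_bins = max(1, floor((max(values) - lowest) / bin_width) + 1)
    counts = [0] * n_bins
    for value in values:
        index = min(int((value - lowest) / bin_width), n_bins - 1)
        counts[index] += 1
    edges = [lowest + k * bin_width for k in range(n_bins + 1)]
    return counts, edges


# Rows of (dataset name, feature value); keep only the rows for dataset "A".
rows = [("A", 0.0), ("A", 0.5), ("B", 2.0), ("A", 1.0), ("A", 1.5), ("A", 3.0)]
values_a = [value for name, value in rows if name == "A"]
hist, bins = fixed_width_histogram(values_a, bin_width=1.0)
print(hist, bins)
```

The returned pair mirrors the hist and bins values described above: one count per bin and one more edge than there are bins.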
- midrc_react.core.numeric_distances.get_supported_scaling_methods()
Get a list of supported scaling methods.
- Returns:
A list of supported scaling methods.
- Return type:
list
- midrc_react.core.numeric_distances.scale_feature(df, feature: str, method: str = 'standard')
Scale a feature using the specified method. With the default ‘standard’ method, the feature is normalized to mean 0 and standard deviation 1.
- Parameters:
df – pandas DataFrame containing the data
feature – a string representing the feature to normalize
method – a string representing the normalization method to use
See also
get_supported_scaling_methods(): Returns the list of scaling methods that can be used.
- Returns:
A copy of the DataFrame with the feature normalized.
- midrc_react.core.numeric_distances.scale_values(values, method: str = 'standard')
Scale a list of values using the specified method. With the default ‘standard’ method, the values are normalized to mean 0 and standard deviation 1.
- Parameters:
values – a list of values to normalize
method – a string representing the normalization method to use
See also
get_supported_scaling_methods(): Returns the list of scaling methods that can be used.
- Returns:
The normalized values.
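To make the scaling options concrete, the sketch below implements the four method names mentioned in this module (‘standard’, ‘minmax’, ‘maxabs’, ‘robust’) in plain Python. It assumes these names carry their conventional meanings (as in the similarly named scikit-learn scalers); scale_values_sketch is a hypothetical function and not the package's implementation.

```python
from statistics import mean, median, pstdev, quantiles


def scale_values_sketch(values, method="standard"):
    """Return a scaled copy of values; method meanings are assumed, not confirmed."""
    values = [float(v) for v in values]
    if method == "standard":      # mean 0, standard deviation 1
        centre, spread = mean(values), pstdev(values)
    elif method == "minmax":      # map onto [0, 1]
        centre, spread = min(values), max(values) - min(values)
    elif method == "maxabs":      # divide by the largest absolute value
        centre, spread = 0.0, max(abs(v) for v in values)
    elif method == "robust":      # centre on the median, divide by the IQR
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        centre, spread = median(values), q3 - q1
    else:
        raise ValueError(f"Unsupported scaling method: {method!r}")
    if spread == 0:               # constant input: avoid division by zero
        spread = 1.0
    return [(v - centre) / spread for v in values]


print(scale_values_sketch([2, 4, 6], method="minmax"))
print(scale_values_sketch([1, 2, 3, 4, 5], method="robust"))
```

The robust variant is less sensitive to outliers than the standard one because the median and interquartile range change little when a few extreme values are added.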