ourtils.diffs
Diff-related functions.
- class ourtils.diffs.DataFrameDiffer(left: ~pandas.core.frame.DataFrame, right: ~pandas.core.frame.DataFrame, join_on: list[str], column_cleaner: ~typing.Callable = <function DataFrameDiffer.<lambda>>)
Bases: object

A class to help compare two DataFrames. It handles:
- Matching columns between the two DataFrames
- Filtering to the rows that differ
- Example:
    In [1]: import pandas as pd

    In [2]: from ourtils import diffs

    In [3]: df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

    In [4]: df1
    Out[4]:
       a  b
    0  1  4
    1  2  5
    2  3  6

    In [5]: df2 = pd.DataFrame({"a": [1, 2, 4], "B": [4, 5, 1], "c": [5, 6, 1]})

    In [6]: df2
    Out[6]:
       a  B  c
    0  1  4  5
    1  2  5  6
    2  4  1  1

    # Create a DataFrameDiffer object
    In [7]: diffy = diffs.DataFrameDiffer(df1, df2, "a")

    In [8]: print(diffy.create_report())
    --------------- ourtils.DifferenceReport ---------------
    Column summary:
    Removed:
    Added: {'c'}
    Matching: {'b', 'a'}
    ---------------
    Row summary (2 total changes):
    Inserted: 1
    Updated: 0
    Deleted: 1
    Same: 2

    In [9]: diffy.combined
    Out[9]:
       a  b__left  b__right    c               diff    action
    0  1      4.0       4.0  5.0                 {}      same
    1  2      5.0       5.0  6.0                 {}      same
    2  3      6.0       NaN  NaN  {'b': (6.0, nan)}   deleted
    3  4      NaN       1.0  1.0  {'b': (nan, 1.0)}  inserted
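A minimal sketch of one way a combined table like the one in the example could be built, assuming an outer `pandas.merge` on the join keys with the `__left`/`__right` suffixes and `indicator=True`. The lower-casing of column names is a guess based on `B` matching `b` in the example (the actual `column_cleaner` default is not documented here), and `sketch_combine` is a hypothetical helper, not the ourtils implementation.

```python
import numpy as np
import pandas as pd


def sketch_combine(left: pd.DataFrame, right: pd.DataFrame, join_on: list[str]) -> pd.DataFrame:
    """Hypothetical outer-merge diff; not the ourtils implementation."""
    # Assumed column cleaning: lower-case every name so "B" matches "b".
    left = left.rename(columns=lambda c: str(c).lower())
    right = right.rename(columns=lambda c: str(c).lower())

    # Outer merge keeps rows from both sides; `_merge` records each row's origin.
    merged = left.merge(
        right,
        on=join_on,
        how="outer",
        suffixes=("__left", "__right"),
        indicator=True,
    )

    # Columns present on both sides (excluding the join keys) are compared.
    shared = sorted((set(left.columns) & set(right.columns)) - set(join_on))
    changed = pd.Series(False, index=merged.index)
    for col in shared:
        lval, rval = merged[f"{col}__left"], merged[f"{col}__right"]
        # Two missing values are treated as equal.
        changed |= ~((lval == rval) | (lval.isna() & rval.isna()))

    # Earlier conditions win: left-only rows are "deleted" even though they also differ.
    status = merged.pop("_merge").astype(str)
    merged["action"] = np.select(
        [status == "left_only", status == "right_only", changed],
        ["deleted", "inserted", "updated"],
        default="same",
    )
    return merged
```

Note that pandas only applies the suffixes to non-key columns that exist on both sides, which is why `c` keeps its plain name in the combined table.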
- property changed_data: DataFrame
- property comparable: DataFrame
- classmethod create_diff(row: Series, cols: list[str]) → dict
Creates a dictionary mapping each column in cols whose left and right values differ to a (left, right) tuple of those values.
- Parameters:
row – A row from a DataFrame
cols – A list of column names to compare
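The behavior of `create_diff` can be sketched as follows, assuming each compared column appears in the row as a `<col>__left` / `<col>__right` pair and that two missing values count as equal (consistent with the `{}` diffs for unchanged rows and `{'b': (6.0, nan)}` for the deleted row in the example). `create_diff_sketch` is a hypothetical name.

```python
import pandas as pd

# Assumed suffixes, matching DataFrameDiffer.lsuffix / rsuffix.
LSUFFIX, RSUFFIX = "__left", "__right"


def create_diff_sketch(row: pd.Series, cols: list[str]) -> dict:
    """Hypothetical: map each differing column to its (left, right) values."""
    diff = {}
    for col in cols:
        left_val = row[col + LSUFFIX]
        right_val = row[col + RSUFFIX]
        # Two missing values count as "no change".
        if pd.isna(left_val) and pd.isna(right_val):
            continue
        # NaN != value is True, so inserted/deleted rows are recorded.
        if left_val != right_val:
            diff[col] = (left_val, right_val)
    return diff
```

On a combined frame it could be applied row by row, e.g. `combined.apply(create_diff_sketch, axis=1, cols=["b"])`.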
- create_report() → str
Returns a string report of the differences.
- property left_columns: set
- lsuffix = '__left'
- property matching_columns: set
- property missing_columns
- property n_changed_rows
- property n_deleted
- property n_inserted
- property n_same
- property n_updated
- property new_columns
- property right_columns: set
- rsuffix = '__right'
- property summary_dict: dict
- property summary_msg: str
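The column and row summaries printed by `create_report` can be approximated with set operations and a count over the `action` column. This sketch assumes `missing_columns` corresponds to "Removed", `new_columns` to "Added", and `n_changed_rows` to the sum of inserted, updated, and deleted rows (which matches "2 total changes" in the example). Both helper functions are hypothetical.

```python
import pandas as pd


def column_summary_sketch(left: pd.DataFrame, right: pd.DataFrame) -> dict:
    """Hypothetical: classify columns, lower-casing names as the example suggests."""
    left_cols = {str(c).lower() for c in left.columns}
    right_cols = {str(c).lower() for c in right.columns}
    return {
        "removed": left_cols - right_cols,   # assumed to mirror missing_columns
        "added": right_cols - left_cols,     # assumed to mirror new_columns
        "matching": left_cols & right_cols,  # assumed to mirror matching_columns
    }


def row_summary_sketch(action: pd.Series) -> dict:
    """Hypothetical: count rows per action label and total the changes."""
    counts = action.value_counts()
    summary = {name: int(counts.get(name, 0)) for name in ("inserted", "updated", "deleted", "same")}
    # "same" rows are not changes; everything else is.
    summary["changed"] = summary["inserted"] + summary["updated"] + summary["deleted"]
    return summary
```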
- class ourtils.diffs.SetComparison(only_a: set, only_b: set, in_both: set)
Bases: object
- in_both: set
- only_a: set
- only_b: set
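- ourtils.diffs.compare_sets(a: set, b: set, report=True) → SetComparison
Compares two sets and returns a SetComparison holding the elements only in a, only in b, and in both. When report is True (the default), it also prints a summary.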
- Example:
    In [1]: a = {1, 2, 3}

    In [2]: b = {2, 3, 5}

    In [3]: diffs.compare_sets(a, b)
    Only A: {1}
    Only B: {5}
    In both: {2, 3}
    Out[3]: SetComparison(only_a={1}, only_b={5}, in_both={2, 3})
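`compare_sets` and `SetComparison` behave like plain set arithmetic wrapped in a dataclass. The following is a minimal sketch under that assumption; the `*Sketch` / `*_sketch` names are hypothetical, and the printed lines only mirror the example output.

```python
from dataclasses import dataclass


@dataclass
class SetComparisonSketch:
    """Hypothetical stand-in for ourtils.diffs.SetComparison."""
    only_a: set
    only_b: set
    in_both: set


def compare_sets_sketch(a: set, b: set, report: bool = True) -> SetComparisonSketch:
    """Hypothetical: split two sets into a-only, b-only, and shared elements."""
    result = SetComparisonSketch(only_a=a - b, only_b=b - a, in_both=a & b)
    if report:
        # Mirrors the printed summary in the documented example.
        print(f"Only A: {result.only_a}")
        print(f"Only B: {result.only_b}")
        print(f"In both: {result.in_both}")
    return result
```

Using a dataclass rather than a bare tuple keeps the three results addressable by name (`result.only_a`) while still giving a readable repr.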