ourtils.diffs

Diff related functions.

class ourtils.diffs.DataFrameDiffer(left: pandas.core.frame.DataFrame, right: pandas.core.frame.DataFrame, join_on: list[str], column_cleaner: typing.Callable = <function DataFrameDiffer.<lambda>>)

Bases: object

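Helps compare two dataframes that share the key columns given in join_on. It handles:

  • Matching columns between the two frames (in the example below, B on the right is matched with b on the left)

  • Filtering to the rows that differ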

Example:
In [1]: import pandas as pd

In [2]: from ourtils import diffs

In [3]: df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

In [4]: df1
Out[4]: 
   a  b
0  1  4
1  2  5
2  3  6

In [5]: df2 = pd.DataFrame({"a": [1, 2, 4], "B": [4, 5, 1], "c": [5, 6, 1]})

In [6]: df2
Out[6]: 
   a  B  c
0  1  4  5
1  2  5  6
2  4  1  1

# Create a DataFrameDiffer object
In [7]: diffy = diffs.DataFrameDiffer(df1, df2, "a")

In [8]: print(diffy.create_report())
---------------
ourtils.DifferenceReport
---------------
Column summary:
Removed:
Added: {'c'}
Matching: {'b', 'a'}
---------------
Row summary (2 total changes):
Inserted: 1
Updated: 0
Deleted: 1
Same: 2

In [9]: diffy.combined
Out[9]: 
   a  b__left  b__right    c               diff    action
0  1      4.0       4.0  5.0                 {}      same
1  2      5.0       5.0  6.0                 {}      same
2  3      6.0       NaN  NaN  {'b': (6.0, nan)}   deleted
3  4      NaN       1.0  1.0  {'b': (nan, 1.0)}  inserted
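
The shape of the combined frame above can be approximated with a plain pandas outer merge. The following is only a sketch of that idea, not the DataFrameDiffer implementation: the function name sketch_combine is hypothetical, lower-casing column names stands in for whatever column_cleaner does by default, and the diff column is left out.

```python
import pandas as pd

LSUFFIX, RSUFFIX = "__left", "__right"


def sketch_combine(left, right, join_on):
    """Outer-join two frames and label each row (hypothetical helper)."""
    # Assumption: column names are normalised by lower-casing them.
    left = left.rename(columns=str.lower)
    right = right.rename(columns=str.lower)

    # indicator=True adds a "_merge" column: left_only, right_only or both.
    merged = left.merge(
        right,
        on=join_on,
        how="outer",
        suffixes=(LSUFFIX, RSUFFIX),
        indicator=True,
    )

    # Non-key columns present on both sides are the ones that get suffixed.
    shared = [c for c in left.columns if c in right.columns and c not in join_on]

    def classify(row):
        if row["_merge"] == "left_only":
            return "deleted"
        if row["_merge"] == "right_only":
            return "inserted"
        # Note: NaN != NaN, so a key present on both sides with missing
        # values in a shared column would be labelled "updated" here.
        changed = any(row[c + LSUFFIX] != row[c + RSUFFIX] for c in shared)
        return "updated" if changed else "same"

    merged["action"] = merged.apply(classify, axis=1)
    return merged.drop(columns="_merge")


df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 2, 4], "B": [4, 5, 1], "c": [5, 6, 1]})
print(sketch_combine(df1, df2, ["a"]))
```

The merge indicator is what separates deleted rows (key only on the left) from inserted rows (key only on the right); rows found on both sides are then compared column by column.
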
property changed_data: DataFrame
property comparable: DataFrame
classmethod create_diff(row: Series, cols: list[str]) → dict

Creates a dictionary that stores changes to the columns in a row.

Parameters:
  • row – A row from a dataframe

  • cols – A list of column names to check for changes
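
One way such a per-row diff could be built is sketched below. This is an illustration under assumptions, not the library's code: the name sketch_create_diff is hypothetical, a plain dict stands in for the pandas Series so the snippet runs without pandas, the __left and __right suffixes are taken from the class attributes listed on this page, and treating two missing values as equal is a guess.

```python
import math

LSUFFIX, RSUFFIX = "__left", "__right"


def _same(x, y):
    # Assumption: two missing (NaN) values count as "no change".
    both_nan = (
        isinstance(x, float) and isinstance(y, float)
        and math.isnan(x) and math.isnan(y)
    )
    return both_nan or x == y


def sketch_create_diff(row, cols):
    """Map each changed column to a (left value, right value) tuple."""
    diff = {}
    for col in cols:
        old, new = row[col + LSUFFIX], row[col + RSUFFIX]
        if not _same(old, new):
            diff[col] = (old, new)
    return diff


# A dict stands in for a dataframe row here.
row = {"a": 3, "b__left": 6.0, "b__right": float("nan")}
print(sketch_create_diff(row, ["b"]))  # {'b': (6.0, nan)}
```

An unchanged row yields an empty dict, which matches the {} entries shown in the diff column of the example output.
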

create_report() → str

Returns a string report of the differences.

property left_columns: set
lsuffix = '__left'
property matching_columns: set
property missing_columns
property n_changed_rows
property n_deleted
property n_inserted
property n_same
property n_updated
property new_columns
property right_columns: set
rsuffix = '__right'
property summary_dict: dict
property summary_msg: str
class ourtils.diffs.SetComparison(only_a: set, only_b: set, in_both: set)

Bases: object

in_both: set
only_a: set
only_b: set
ourtils.diffs.compare_sets(a: set, b: set, report=True) → SetComparison

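Compares two sets and returns a SetComparison holding the elements found only in a, only in b, and in both. The example below, run with the default report=True, also prints a summary.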

Example:
In [1]: a = {1, 2, 3}

In [2]: b = {2, 3, 5}

In [3]: diffs.compare_sets(a, b)
Only A: {1}
Only B: {5}
In both: {2, 3}
Out[3]: SetComparison(only_a={1}, only_b={5}, in_both={2, 3})
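
The same comparison can be expressed with Python's built-in set operators. The sketch below is an assumption about how such a helper might look, not the ourtils source: SetComparisonSketch and compare_sets_sketch are hypothetical names that mirror the fields and printed summary shown above.

```python
from dataclasses import dataclass


@dataclass
class SetComparisonSketch:
    """Hypothetical stand-in with the same three fields as SetComparison."""
    only_a: set
    only_b: set
    in_both: set


def compare_sets_sketch(a, b, report=True):
    result = SetComparisonSketch(
        only_a=a - b,    # elements in a but not in b
        only_b=b - a,    # elements in b but not in a
        in_both=a & b,   # elements present in both sets
    )
    if report:
        # Assumption: report=True controls the printed summary.
        print(f"Only A: {result.only_a}")
        print(f"Only B: {result.only_b}")
        print(f"In both: {result.in_both}")
    return result


result = compare_sets_sketch({1, 2, 3}, {2, 3, 5}, report=False)
```

Returning a small dataclass rather than a bare tuple lets callers read result.only_a by name instead of remembering positions.
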