Quickstart

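ourtils can help with tasks such as combining similar columns. Here, `squish` gathers the values of `col_a_1` and `col_b_2` into one row per `index_col` and group: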

In [1]: import pandas as pd

In [2]: from ourtils.wrangling import squish

In [3]: mydf = pd.DataFrame({
   ...:    'index_col': ['a', 'a', 'b', 'b'],
   ...:    'col_a_1': [1, 2, 30, 40],
   ...:    'col_b_2': [3, 4, 50, 60]
   ...: })
   ...: 

In [4]: mydf
Out[4]: 
  index_col  col_a_1  col_b_2
0         a        1        3
1         a        2        4
2         b       30       50
3         b       40       60

In [5]: squish(mydf, index_var='index_col', col_sep='_', agg_func=list)
Out[5]: 
  index_col group     value
0         a     a    [1, 2]
1         a     b    [3, 4]
2         b     a  [30, 40]
3         b     b  [50, 60]
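For comparison, here is a rough plain-pandas sketch that produces the same table. It assumes (a guess from the output, not documented behaviour of `squish`) that the group label is the second piece of each column name when split on `col_sep`, so `col_a_1` belongs to group `a`. The names `long_df` and `result` are illustrative only.

```python
import pandas as pd

mydf = pd.DataFrame({
    'index_col': ['a', 'a', 'b', 'b'],
    'col_a_1': [1, 2, 30, 40],
    'col_b_2': [3, 4, 50, 60],
})

# Reshape to long format: one row per (index_col, original column, value)
long_df = mydf.melt(id_vars='index_col', var_name='column', value_name='value')

# Assumption: the group label is the second piece of the column name split on '_'
long_df['group'] = long_df['column'].str.split('_').str[1]

# Collect the values of each (index_col, group) pair into a list,
# mirroring agg_func=list in the squish call
result = (
    long_df.groupby(['index_col', 'group'])['value']
    .agg(list)
    .reset_index()
)
print(result)
```

Replacing `.agg(list)` with another reducer such as `.agg('sum')` would collapse each group to a single number instead of a list.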

It can also create new columns in a DataFrame:

In [6]: import pandas as pd

In [7]: from ourtils.wrangling import create_column

In [8]: mydf = pd.DataFrame({
   ...:    'group': ['a', 'a', 'b', 'b', 'c'],
   ...:    'score': [60, 61, 15, 16, 99]
   ...: })
   ...: 

In [9]: mydf
Out[9]: 
  group  score
0     a     60
1     a     61
2     b     15
3     b     16
4     c     99

In [10]: def create_score_message(group_name: str, total_score: int) -> str:
   ....:    return f"Group {group_name} had a score of {total_score}"
   ....: 

In [11]: (
   ....:    mydf
   ....:    .pipe(create_column, 'newcolumn', create_score_message, 'group', 'score')
   ....: )
   ....: 
Out[11]: 
  group  score                  newcolumn
0     a     60  Group a had a score of 60
1     a     61  Group a had a score of 61
2     b     15  Group b had a score of 15
3     b     16  Group b had a score of 16
4     c     99  Group c had a score of 99
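`create_column` appears to call the function once per row, passing the values of the named columns (`'group'` and `'score'` here). Assuming that is the behaviour, a plain-pandas sketch of the same result is a list comprehension passed to `DataFrame.assign`. This is an illustration of the idea, not the library's implementation.

```python
import pandas as pd

mydf = pd.DataFrame({
    'group': ['a', 'a', 'b', 'b', 'c'],
    'score': [60, 61, 15, 16, 99],
})

def create_score_message(group_name: str, total_score: int) -> str:
    return f"Group {group_name} had a score of {total_score}"

# Call the function once per row with the 'group' and 'score' values
# and store the results in a new column on a copy of mydf
result = mydf.assign(
    newcolumn=[
        create_score_message(g, s)
        for g, s in zip(mydf['group'], mydf['score'])
    ]
)
print(result)
```

`assign` returns a new DataFrame, so `mydf` itself is left unchanged.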

ourtils can also compare two DataFrames that share the same columns, for example a table before and after an operation:

In [12]: import pandas as pd

In [13]: from ourtils.diffs import DataFrameDiffer

In [14]: database_before = pd.DataFrame({'a': [1, 2, 3]})

In [15]: database_after = pd.DataFrame({'a': [1, 2]})

In [16]: diffy = DataFrameDiffer(database_before, database_after, "a")

In [17]: print(diffy.create_report())
---------------
ourtils.DifferenceReport
---------------
Column summary:
Removed:
Added:
Matching: {'a'}
---------------
Row summary (1 total changes):
Inserted: 0
Updated: 0
Deleted: 1
Same: 2
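The column summary lines read like set comparisons of the two column lists. A minimal sketch, assuming that is what `Removed`, `Added` and `Matching` mean (inferred from the labels, not from the `DataFrameDiffer` source):

```python
import pandas as pd

before = pd.DataFrame({'a': [1, 2, 3]})
after = pd.DataFrame({'a': [1, 2]})

# Columns present only before, only after, and in both versions
removed = set(before.columns) - set(after.columns)
added = set(after.columns) - set(before.columns)
matching = set(before.columns) & set(after.columns)
print(removed, added, matching)  # set() set() {'a'}
```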

In [18]: diffy.combined
Out[18]: 
   a diff   action
0  1   {}     same
1  2   {}     same
2  3   {}  deleted
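The `action` column resembles what an outer join with `indicator=True` reports. Assuming `"a"` in the `DataFrameDiffer` call is the key column used to match rows (the quickstart does not say so explicitly), this sketch labels each key as `same`, `deleted` or `inserted` and tallies them. It only checks whether a key is present on each side, so unlike the report's `Updated` count it cannot detect changed values in other columns.

```python
import pandas as pd

before = pd.DataFrame({'a': [1, 2, 3]})
after = pd.DataFrame({'a': [1, 2]})

# Outer join on the key column; indicator=True adds a '_merge' column
# holding 'left_only', 'right_only' or 'both' for each key
merged = before.merge(after, on='a', how='outer', indicator=True)

# Translate merge status into row actions (key presence only)
action_map = {'left_only': 'deleted', 'right_only': 'inserted', 'both': 'same'}
merged['action'] = merged['_merge'].astype(str).map(action_map)
merged = merged.drop(columns='_merge')

# Tally the actions, similar to the row summary in the report
counts = merged['action'].value_counts()
print(merged)
print(counts.to_dict())
```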