Quickstart
ourtils can help with things like combining similar columns:
In [1]: import pandas as pd
In [2]: from ourtils.wrangling import squish
In [3]: mydf = pd.DataFrame({
   ...:     'index_col': ['a', 'a', 'b', 'b'],
   ...:     'col_a_1': [1, 2, 30, 40],
   ...:     'col_b_2': [3, 4, 50, 60]
   ...: })
   ...:
In [4]: mydf
Out[4]:
   index_col  col_a_1  col_b_2
0          a        1        3
1          a        2        4
2          b       30       50
3          b       40       60
In [5]: squish(mydf, index_var='index_col', col_sep='_', agg_func=list)
Out[5]:
   index_col  group     value
0          a      a    [1, 2]
1          a      b    [3, 4]
2          b      a  [30, 40]
3          b      b  [50, 60]
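For readers who want to see the reshaping spelled out, the sketch below reproduces the output above with plain pandas. It is not the implementation of squish. It assumes the group label is the second token of each column name once split on '_' (so 'col_a_1' becomes 'a'), which is inferred from the example output only. The names long_df and result are illustrative.

```python
import pandas as pd

mydf = pd.DataFrame({
    'index_col': ['a', 'a', 'b', 'b'],
    'col_a_1': [1, 2, 30, 40],
    'col_b_2': [3, 4, 50, 60],
})

# Reshape to long format: one row per (index_col, original column name, value).
long_df = mydf.melt(id_vars='index_col', var_name='column', value_name='value')

# Assumption: the group is the second '_'-separated token of the column name,
# e.g. 'col_a_1' -> 'a'. This is inferred from the example, not from ourtils.
long_df['group'] = long_df['column'].str.split('_').str[1]

# Collect the values of each (index_col, group) pair into a list.
result = (
    long_df
    .groupby(['index_col', 'group'])['value']
    .agg(list)
    .reset_index()
)

print(result)
```

Passing a different aggregation (for example sum instead of list) to .agg would give one scalar per group rather than a list.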
Or creating new columns in a dataframe:
In [6]: import pandas as pd
In [7]: from ourtils.wrangling import create_column
In [8]: mydf = pd.DataFrame({
   ...:     'group': ['a', 'a', 'b', 'b', 'c'],
   ...:     'score': [60, 61, 15, 16, 99]
   ...: })
   ...:
In [9]: mydf
Out[9]:
   group  score
0      a     60
1      a     61
2      b     15
3      b     16
4      c     99
In [10]: def create_score_message(group_name: str, total_score: int) -> str:
   ....:     return f"Group {group_name} had a score of {total_score}"
   ....:

In [11]: (
   ....:     mydf
   ....:     .pipe(create_column, 'newcolumn', create_score_message, 'group', 'score')
   ....: )
   ....:
Out[11]:
   group  score                  newcolumn
0      a     60  Group a had a score of 60
1      a     61  Group a had a score of 61
2      b     15  Group b had a score of 15
3      b     16  Group b had a score of 16
4      c     99  Group c had a score of 99
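For comparison, the same column can be built with plain pandas by calling the function once per row. This is a minimal sketch of the row-wise pattern shown above, not a description of how create_column works internally. The name result is illustrative.

```python
import pandas as pd


def create_score_message(group_name: str, total_score: int) -> str:
    return f"Group {group_name} had a score of {total_score}"


mydf = pd.DataFrame({
    'group': ['a', 'a', 'b', 'b', 'c'],
    'score': [60, 61, 15, 16, 99],
})

# Call the function once per row, passing the 'group' and 'score' values
# positionally, and attach the results as a new column on a copy of mydf.
result = mydf.assign(
    newcolumn=[
        create_score_message(group, score)
        for group, score in zip(mydf['group'], mydf['score'])
    ]
)

print(result)
```

The helper form keeps the call inside a .pipe chain and names the input columns explicitly, which reads more cleanly when several derived columns are added in sequence.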
Or comparing dataframes that share the same columns before and after an operation:
In [12]: import pandas as pd
In [13]: from ourtils.diffs import DataFrameDiffer
In [14]: database_before = pd.DataFrame({'a': [1, 2, 3]})
In [15]: database_after = pd.DataFrame({'a': [1, 2]})
In [16]: diffy = DataFrameDiffer(database_before, database_after, "a")
In [17]: print(diffy.create_report())
---------------
ourtils.DifferenceReport
---------------
Column summary:
Removed:
Added:
Matching: {'a'}
---------------
Row summary (1 total changes):
Inserted: 0
Updated: 0
Deleted: 1
Same: 2
In [18]: diffy.combined
Out[18]:
   a  diff   action
0  1    {}     same
1  2    {}     same
2  3    {}  deleted
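The row classification above can be approximated in plain pandas with an outer merge on the key column. This sketch is not how DataFrameDiffer is implemented. It covers only the same / deleted / inserted cases for frames that consist of the key column alone, and detecting updated rows would additionally require comparing the non-key columns. The names merged and labels are illustrative.

```python
import pandas as pd

database_before = pd.DataFrame({'a': [1, 2, 3]})
database_after = pd.DataFrame({'a': [1, 2]})

# An outer merge keeps every key from both frames; indicator=True adds a
# '_merge' column recording which side(s) each key was found in.
merged = database_before.merge(
    database_after, on='a', how='outer', indicator=True
)

# Translate pandas' merge indicator into the action labels used in the report.
labels = {'both': 'same', 'left_only': 'deleted', 'right_only': 'inserted'}
merged['action'] = merged['_merge'].astype(str).map(labels)

print(merged[['a', 'action']])
print(merged['action'].value_counts().to_dict())
```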