12.3.10.4.5. Operations

import pandas as pd
import numpy as np


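# Six consecutive days starting 2022-05-01, used as the row index
dates = pd.date_range("20220501", periods=6)
# 6x4 frame of standard-normal samples; no seed is set, so values differ on every run
dataFrame = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))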

Statistics

Mean value of each column (missing values are skipped by default):

dataFrame.mean()
A   -0.076181
B   -0.154871
C    0.021672
D    0.170158
dtype: float64
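
Because the tutorial frame is random, its means change on every run. The following is a minimal sketch on a small fixed frame (the name small and its values are made up for illustration, not taken from the tutorial) that shows the column-wise and row-wise means and how a NaN is skipped:

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data so the results are reproducible.
small = pd.DataFrame({"A": [1.0, 2.0, np.nan], "B": [4.0, 5.0, 6.0]})

# Default axis=0: one mean per column; the NaN in column A is ignored.
col_means = small.mean()

# axis=1: one mean per row; the last row averages only the non-missing value.
row_means = small.mean(axis=1)

print(col_means)
print(row_means)
```

Passing the axis by keyword, as in mean(axis=1), keeps the call readable and avoids relying on positional arguments.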

Mean value of each row (axis 1):

dataFrame.mean(axis=1)
2022-05-01   -0.539588
2022-05-02    0.889715
2022-05-03    0.513966
2022-05-04    0.142509
2022-05-05   -0.654542
2022-05-06   -0.410892
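Freq: D, dtype: float64

Operating on objects of different dimensionality requires alignment; pandas broadcasts along the specified axis. Here the series is shifted down by two positions, so its first two entries become NaN and the original NaN moves to 2022-05-06. Subtracting it row by row leaves NaN wherever the series has no value:

series = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
dataFrame.sub(series, axis="index")
                   A         B         C         D
2022-05-01       NaN       NaN       NaN       NaN
2022-05-02       NaN       NaN       NaN       NaN
2022-05-03 -1.640960 -1.456855  1.206000 -0.052322
2022-05-04 -3.468697 -4.656828 -2.609051 -0.695389
2022-05-05 -5.352329 -4.932975 -5.455950 -6.876913
2022-05-06       NaN       NaN       NaN       NaN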
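
The same subtraction can be traced by hand on fixed numbers. This is only an illustrative sketch; idx, frame, and shifted are hypothetical names and values, not part of the tutorial script:

```python
import pandas as pd

idx = pd.date_range("20220501", periods=4)
frame = pd.DataFrame(
    {"A": [10.0, 20.0, 30.0, 40.0], "B": [1.0, 2.0, 3.0, 4.0]},
    index=idx,
)

# shift(2) moves every value two rows down: NaN, NaN, 1.0, 2.0
shifted = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx).shift(2)

# axis="index" matches the series against the row labels,
# so the same series value is subtracted from every column of a row.
result = frame.sub(shifted, axis="index")
print(result)
```

The first two rows come out as NaN because any arithmetic with NaN yields NaN; the remaining rows are 30 - 1 and 40 - 2 in column A, and 3 - 1 and 4 - 2 in column B.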


Apply

Apply a function to each column, here the cumulative sum:

dataFrame.apply(np.cumsum)
                   A         B         C         D
2022-05-01  0.631104 -0.756255 -2.131839  0.098637
2022-05-02  1.017295  0.137240 -1.091693  1.337665
2022-05-03  0.376335 -0.319615  1.114307  2.285343
2022-05-04 -0.092362 -1.976443  1.505256  4.589954
2022-05-05 -0.444691 -1.909418  1.049306  2.713041
2022-05-06 -0.457084 -0.929227  0.130034  1.020947


Range (maximum minus minimum) of each column:

dataFrame.apply(lambda x: x.max() - x.min())
A    1.272064
B    2.637019
C    4.337839
D    4.181524
dtype: float64
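
Both apply calls can be checked on a small fixed frame. The sketch below uses made-up integer data (frame, running, and spread are illustrative names) and also compares the result with the built-in DataFrame.cumsum, which gives the same values here:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"A": [1, 2, 3], "B": [10, 0, 5]})

# A function that returns a Series per column yields a DataFrame.
running = frame.apply(np.cumsum)

# A function that returns a scalar per column yields a Series.
spread = frame.apply(lambda col: col.max() - col.min())

print(running)
print(spread)

# For this particular function, the dedicated method is equivalent.
print((running == frame.cumsum()).all().all())
```

In general, apply is most useful for functions that have no dedicated pandas method; where one exists, such as cumsum, calling it directly is usually simpler.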

Histogramming

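Count how often each value occurs; the result is sorted by count, most frequent first:

# Ten random integers drawn from 0 to 6 (the upper bound 7 is exclusive)
series = pd.Series(np.random.randint(0, 7, size=10))
series.value_counts()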
0    3
4    2
6    2
1    2
3    1
dtype: int64
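
Since the series above is random, a fixed example makes the counting easier to follow. This is a small sketch with hypothetical values; the normalize=True variant is included only to show relative frequencies:

```python
import pandas as pd

values = pd.Series([2, 0, 2, 5, 2, 0])

# Index holds the distinct values, data holds how often each occurs.
counts = values.value_counts()

# normalize=True returns shares of the total instead of raw counts.
shares = values.value_counts(normalize=True)

print(counts)
print(shares)
```

Here the value 2 appears three times out of six, so it comes first in counts and has a share of 0.5.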

String methods

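A Series exposes string processing methods through its str attribute. They operate on each element and leave missing values as NaN:

series = pd.Series(["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"])
series.str.lower()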
0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object
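
Other methods on the str accessor follow the same element-wise pattern. The sketch below uses a shortened, hypothetical series (words) to show a few of them and how each treats the missing entry:

```python
import numpy as np
import pandas as pd

words = pd.Series(["Aaba", np.nan, "dog"])

# Lowercase each string; the missing entry stays missing.
lowered = words.str.lower()

# Length of each string; the missing entry stays missing.
lengths = words.str.len()

# Substring test; na=False reports missing entries as False instead of NaN.
has_a = words.str.contains("a", na=False)

print(lowered)
print(lengths)
print(has_a)
```

Setting na=False is a convenient choice when the result is used as a boolean mask, because a mask containing NaN cannot be used for filtering directly.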
