12.3.10.4.9. Selection of data

import pandas as pd
import numpy as np


dates = pd.date_range("20220501", periods=6)
dataFrame = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

Getting data

dataFrame["A"]
2022-05-01   -0.251770
2022-05-02    0.918941
2022-05-03   -0.871973
2022-05-04    0.881984
2022-05-05   -0.848954
2022-05-06   -0.293716
Freq: D, Name: A, dtype: float64
dataFrame[0:3]
                   A         B         C         D
2022-05-01 -0.251770  0.636441 -1.130886  0.712172
2022-05-02  0.918941 -0.245160 -0.261813  0.368618
2022-05-03 -0.871973 -0.620147  0.692768  0.838307


dataFrame["20220501":"20220502"]
                   A         B         C         D
2022-05-01 -0.251770  0.636441 -1.130886  0.712172
2022-05-02  0.918941 -0.245160 -0.261813  0.368618
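
The two slices above follow different endpoint rules: a positional slice such as dataFrame[0:3] excludes its end position, while a label slice such as dataFrame["20220501":"20220502"] includes its end label. A minimal sketch of the difference, assuming a seeded generator (rng) and the shorter name df, neither of which appears in the original script:

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed, so the sketch is reproducible
# (the values in the original script are random).
rng = np.random.default_rng(0)
dates = pd.date_range("20220501", periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

by_position = df[0:3]                  # rows 0, 1, 2: the end position is excluded
by_label = df["20220501":"20220503"]   # May 1 to May 3: the end label is included

print(len(by_position), len(by_label))
print(by_position.equals(by_label))
```

Both slices select the same three rows here, even though the positional slice stops at 3 and the label slice stops at the third date.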


Selection by label

dataFrame.loc[dates[0]]
A   -0.251770
B    0.636441
C   -1.130886
D    0.712172
Name: 2022-05-01 00:00:00, dtype: float64
dataFrame.loc[:, ["A", "B"]]
                   A         B
2022-05-01 -0.251770  0.636441
2022-05-02  0.918941 -0.245160
2022-05-03 -0.871973 -0.620147
2022-05-04  0.881984 -1.001859
2022-05-05 -0.848954 -1.514564
2022-05-06 -0.293716  0.525157


dataFrame.loc["20220501":"20220502", ["A", "B"]]
                   A         B
2022-05-01 -0.251770  0.636441
2022-05-02  0.918941 -0.245160


dataFrame.loc["20220501", ["A", "B"]]
A   -0.251770
B    0.636441
Name: 2022-05-01 00:00:00, dtype: float64
dataFrame.loc[dates[0], "A"]
-0.25176959648489
dataFrame.at[dates[0], "A"]
-0.25176959648489
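
The shape of what .loc returns depends on the selectors: one row label gives a Series, a list of column labels gives a DataFrame, and a row label paired with a column label gives a scalar. The .at accessor covers only that last, scalar case. The sketch below illustrates this, assuming a seeded generator and the illustrative names df, row, block, scalar_loc and scalar_at, which are not part of the original script:

```python
import numpy as np
import pandas as pd

# Assumption: seeded data standing in for the random frame used above.
rng = np.random.default_rng(0)
dates = pd.date_range("20220501", periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

row = df.loc[dates[0]]               # one row label -> Series indexed by column name
block = df.loc[:, ["A", "B"]]        # list of column labels -> DataFrame
scalar_loc = df.loc[dates[0], "A"]   # row label + column label -> scalar
scalar_at = df.at[dates[0], "A"]     # the same scalar through the scalar-only accessor

print(type(row).__name__, type(block).__name__)
print(scalar_loc == scalar_at)
```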

Selection by position

dataFrame.iloc[3]
A    0.881984
B   -1.001859
C    0.047767
D    0.622211
Name: 2022-05-04 00:00:00, dtype: float64
dataFrame.iloc[3:5, 0:2]
                   A         B
2022-05-04  0.881984 -1.001859
2022-05-05 -0.848954 -1.514564


dataFrame.iloc[[1, 2, 4], [0, 2]]
                   A         C
2022-05-02  0.918941 -0.261813
2022-05-03 -0.871973  0.692768
2022-05-05 -0.848954 -0.449648


dataFrame.iloc[1:3, :]
                   A         B         C         D
2022-05-02  0.918941 -0.245160 -0.261813  0.368618
2022-05-03 -0.871973 -0.620147  0.692768  0.838307


dataFrame.iloc[:, 1:3]
                   B         C
2022-05-01  0.636441 -1.130886
2022-05-02 -0.245160 -0.261813
2022-05-03 -0.620147  0.692768
2022-05-04 -1.001859  0.047767
2022-05-05 -1.514564 -0.449648
2022-05-06  0.525157  0.403480


dataFrame.iloc[1, 1]
-0.24515994350472906
dataFrame.iat[1, 1]
-0.24515994350472906
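
Positional slices in .iloc exclude the end position, so the same block expressed with .loc needs the label of the last row that is actually wanted. A short sketch of that correspondence, assuming seeded data and the illustrative names df, sub_pos and sub_lab (none of them from the original script):

```python
import numpy as np
import pandas as pd

# Assumption: seeded data standing in for the random frame used above.
rng = np.random.default_rng(0)
dates = pd.date_range("20220501", periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

sub_pos = df.iloc[3:5, 0:2]                       # rows 3-4, columns 0-1 (ends excluded)
sub_lab = df.loc[dates[3]:dates[4], ["A", "B"]]   # same block by label (end included)

print(sub_pos.equals(sub_lab))
print(df.iloc[1, 1] == df.iat[1, 1])              # .iat is the scalar-only counterpart
```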

Boolean indexing

dataFrame[dataFrame["A"] > 0]
                   A         B         C         D
2022-05-02  0.918941 -0.245160 -0.261813  0.368618
2022-05-04  0.881984 -1.001859  0.047767  0.622211


dataFrame[dataFrame > 0]
                   A         B         C         D
2022-05-01       NaN  0.636441       NaN  0.712172
2022-05-02  0.918941       NaN       NaN  0.368618
2022-05-03       NaN       NaN  0.692768  0.838307
2022-05-04  0.881984       NaN  0.047767  0.622211
2022-05-05       NaN       NaN       NaN  1.018739
2022-05-06       NaN  0.525157  0.403480       NaN


dataFrame2 = dataFrame.copy()
dataFrame2["E"] = ["one", "one", "two", "three", "four", "three"]
dataFrame2[dataFrame2["E"].isin(["two", "four"])]
                   A         B         C         D     E
2022-05-03 -0.871973 -0.620147  0.692768  0.838307   two
2022-05-05 -0.848954 -1.514564 -0.449648  1.018739  four
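
The three masks above behave differently. A boolean Series drops the rows where it is False, a boolean DataFrame keeps the original shape and puts NaN wherever the condition fails, and isin builds a row mask from membership in a list. The sketch below checks each behavior on seeded data; the names df, rows, masked, df2 and picked are illustrative and not part of the original script:

```python
import numpy as np
import pandas as pd

# Assumption: seeded data standing in for the random frame used above.
rng = np.random.default_rng(0)
dates = pd.date_range("20220501", periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

rows = df[df["A"] > 0]   # boolean Series: keeps only the rows where A > 0
masked = df[df > 0]      # boolean DataFrame: same shape, NaN where the test fails

df2 = df.copy()
df2["E"] = ["one", "one", "two", "three", "four", "three"]
picked = df2[df2["E"].isin(["two", "four"])]   # rows whose E value is in the list

print(rows.shape, masked.shape)
print(masked.equals(df.where(df > 0)))         # elementwise masking matches DataFrame.where
print(picked["E"].tolist())
```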


Setting data

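# The index starts one day after `dates`, so 2022-05-01 gets NaN after alignment.
series = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20220502", periods=6))
dataFrame["F"] = series  # new column, aligned on the index
dataFrame.at[dates[0], "A"] = 0  # set a value by label
dataFrame.iat[0, 1] = 0  # set a value by position
dataFrame.loc[:, "D"] = np.array([5] * len(dataFrame))  # set a whole column from a NumPy array
dataFrame2 = dataFrame.copy()
dataFrame2[dataFrame2 > 0] = -dataFrame2  # negate every positive value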
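
Assigning a Series as a new column aligns it on the index rather than copying by position, so any row whose label is missing from the Series receives NaN. The following sketch shows that alignment together with the boolean-mask assignment, assuming seeded data and a hypothetical series named shifted whose index starts one day after the frame's index:

```python
import numpy as np
import pandas as pd

# Assumption: seeded data standing in for the random frame used above.
rng = np.random.default_rng(0)
dates = pd.date_range("20220501", periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

# Hypothetical series: its index starts on 2022-05-02, one day after `dates`.
shifted = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20220502", periods=6))
df["F"] = shifted          # aligned on the index: 2022-05-01 has no match -> NaN

df.at[dates[0], "A"] = 0   # set by label
df.iat[0, 1] = 0           # set by position (row 0, column "B")

flipped = df.copy()
flipped[flipped > 0] = -flipped   # negate only the positive entries

print(df["F"].tolist())
print((flipped[list("ABCD")] <= 0).all().all())
```

The last value of shifted (6, labelled 2022-05-07) is dropped because the frame has no such row, and the NaN in F is left untouched by the mask because NaN > 0 evaluates to False.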
