Python Pandas: Get index of rows whose column matches a certain value


kingscode 2022. 10. 1. 20:47

Given a DataFrame with a column "BoolCol", I want to find the indexes of the DataFrame where "BoolCol" == True.

I currently have an iterative way of doing it, which works perfectly:

for i in range(100, 3000):
    if df.iloc[i]['BoolCol'] == True:
        print(i, df.iloc[i]['BoolCol'])

But this is not the correct pandas way to do it. After some research, I am currently using this code:

df[df['BoolCol'] == True].index.tolist()

This gives me a list of indexes, but they don't match when I check them by doing:

df.iloc[i]['BoolCol']

the result is actually False!!

What would be the correct pandas way to do this?
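A minimal reproduction of that mismatch, assuming a DataFrame whose index labels are not 0..n-1 (the frame below is made up for illustration): the labels returned by the boolean filter are fed back into iloc, which reads them as positions, while loc reads them as labels.

```python
import pandas as pd

# Made-up frame whose index labels differ from the row positions
df = pd.DataFrame({"BoolCol": [False, True, False, True]}, index=[3, 2, 1, 0])

labels = df[df["BoolCol"] == True].index.tolist()
print(labels)  # [2, 0]

for i in labels:
    # iloc treats i as a 0-based position, loc treats it as an index label
    print(i, df.iloc[i]["BoolCol"], df.loc[i, "BoolCol"])
```

With this frame, iloc returns False for both labels while loc returns True, which is the same symptom described in the question.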

df.iloc[i] returns the i-th row of df. Here i does not refer to the index label; it is a 0-based position.

In contrast, the index attribute returns actual index labels, not numeric row positions:

df.index[df['BoolCol'] == True].tolist()

or equivalently,

df.index[df['BoolCol']].tolist()
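The same pattern covers the general case in the title, matching a column against a specific value rather than reading a boolean column. A small sketch with a hypothetical string column named color and made-up index labels: the comparison produces a boolean Series, which masks df.index, so the result is index labels rather than positions.

```python
import pandas as pd

# Hypothetical data with non-default index labels
df = pd.DataFrame({"color": ["red", "blue", "red", "green"]},
                  index=["a", "b", "c", "d"])

# Boolean mask from the comparison, applied to the index
labels = df.index[df["color"] == "red"].tolist()
print(labels)  # ['a', 'c']
```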

You can see the difference clearly by experimenting with a DataFrame that has a non-default index, i.e. one whose labels do not equal the rows' numerical positions:

df = pd.DataFrame({'BoolCol': [True, False, False, True, True]},
       index=[10,20,30,40,50])

In [53]: df
Out[53]: 
   BoolCol
10    True
20   False
30   False
40    True
50    True

[5 rows x 1 columns]

In [54]: df.index[df['BoolCol']].tolist()
Out[54]: [10, 40, 50]

If you want to use the index,

In [56]: idx = df.index[df['BoolCol']]

In [57]: idx
Out[57]: Int64Index([10, 40, 50], dtype='int64')

then you can select the rows using loc instead of iloc:

In [58]: df.loc[idx]
Out[58]: 
   BoolCol
10    True
40    True
50    True

[3 rows x 1 columns]

Note that loc can also accept boolean arrays:

In [55]: df.loc[df['BoolCol']]
Out[55]: 
   BoolCol
10    True
40    True
50    True

[3 rows x 1 columns]

If you have a boolean array and need the ordinal index values (positions), you can compute them using np.flatnonzero:

In [110]: np.flatnonzero(df['BoolCol'])
Out[110]: array([0, 3, 4])
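Positions and labels can also be converted into each other. As a sketch with the same example frame, assuming the index labels are unique: df.index[positions] turns ordinal positions into labels, and Index.get_indexer turns labels back into positions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'BoolCol': [True, False, False, True, True]},
                  index=[10, 20, 30, 40, 50])

positions = np.flatnonzero(df['BoolCol'].to_numpy())  # 0-based positions of True rows
labels = df.index[positions]                          # positions -> index labels
back = df.index.get_indexer(labels)                   # labels -> positions (unique labels assumed)

print(labels.tolist())  # [10, 40, 50]
print(back.tolist())    # [0, 3, 4]
```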

Use df.iloc to select rows by ordinal index:

In [113]: df.iloc[np.flatnonzero(df['BoolCol'])]
Out[113]: 
   BoolCol
10    True
40    True
50    True
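The loop in the question only visited positions 100 through 2999. A vectorized sketch of that restricted scan, assuming made-up data where every seventh row is True, slices by position with iloc first and then shifts the np.flatnonzero result by the start of the slice:

```python
import numpy as np
import pandas as pd

# Made-up test data: 3000 rows, every seventh row is True
df = pd.DataFrame({"BoolCol": [i % 7 == 0 for i in range(3000)]})

start, stop = 100, 3000  # the range(100, 3000) window from the question's loop
window = df["BoolCol"].iloc[start:stop]
# flatnonzero gives positions inside the window; adding start maps them back into df
positions = np.flatnonzero(window.to_numpy()) + start

print(positions[:3])  # [105 112 119]
```

Because the slice is positional, the result is positions; pass it through df.index[positions] if labels are needed.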

It can also be done with the NumPy where() function:

import pandas as pd
import numpy as np

In [716]: df = pd.DataFrame({"gene_name": ['SLC45A1', 'NECAP2', 'CLIC4', 'ADC', 'AGBL4'] , "BoolCol": [False, True, False, True, True] },
       index=list("abcde"))

In [717]: df
Out[717]: 
  BoolCol gene_name
a   False   SLC45A1
b    True    NECAP2
c   False     CLIC4
d    True       ADC
e    True     AGBL4

In [718]: np.where(df["BoolCol"] == True)
Out[718]: (array([1, 3, 4]),)

In [719]: select_indices = list(np.where(df["BoolCol"] == True)[0])

In [720]: df.iloc[select_indices]
Out[720]: 
  BoolCol gene_name
b    True    NECAP2
d    True       ADC
e    True     AGBL4

Though you don't always need the index for a match, in case you do:

In [796]: df.iloc[select_indices].index
Out[796]: Index([u'b', u'd', u'e'], dtype='object')

In [797]: df.iloc[select_indices].index.tolist()
Out[797]: ['b', 'd', 'e']
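np.where is not limited to boolean columns. A sketch reusing the gene_name example, assuming we want the rows whose gene is one of a hypothetical list of names: Series.isin builds the mask, np.where turns it into positions, and df.index maps those to labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"gene_name": ['SLC45A1', 'NECAP2', 'CLIC4', 'ADC', 'AGBL4'],
                   "BoolCol": [False, True, False, True, True]},
                  index=list("abcde"))

wanted = ["CLIC4", "ADC"]  # hypothetical gene names to look up
positions = np.where(df["gene_name"].isin(wanted))[0]  # 0-based positions

print(positions.tolist())            # [2, 3]
print(df.index[positions].tolist())  # ['c', 'd']
```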

If you want to use your DataFrame object only once, use:

df['BoolCol'].loc[lambda x: x==True].index
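The callable form is handy at the end of a method chain, because the lambda receives the intermediate object, so it never has to be bound to a name. A sketch with made-up data and a hypothetical numeric column named score:

```python
import pandas as pd

labels = (
    pd.DataFrame({"score": [0.2, 0.7, 0.9, 0.1]}, index=["w", "x", "y", "z"])["score"]
    .loc[lambda s: s > 0.5]  # s is the Series produced by the previous step
    .index
    .tolist()
)
print(labels)  # ['x', 'y']
```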

A simple way is to reset the index of the DataFrame before filtering:

df_reset = df.reset_index()
df_reset[df_reset['BoolCol']].index.tolist()

It's a bit hacky, but it's quick!
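One detail worth knowing: after reset_index(), the filtered index values are row positions, and the original labels move into a column, which pandas names index when the original index has no name. A sketch with the same example frame that recovers both:

```python
import pandas as pd

df = pd.DataFrame({'BoolCol': [True, False, False, True, True]},
                  index=[10, 20, 30, 40, 50])

df_reset = df.reset_index()  # old labels become a column named 'index'
matches = df_reset[df_reset['BoolCol']]

print(matches.index.tolist())     # [0, 3, 4]    (positions)
print(matches['index'].tolist())  # [10, 40, 50] (original labels)
```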

First, you may want to check query when the target column is of type bool (PS: see the pandas DataFrame.query documentation for how to use it):

df.query('BoolCol')
Out[123]: 
    BoolCol
10     True
40     True
50     True

After filtering the original df by the Boolean column, we can pick the index:

df=df.query('BoolCol')
df.index
Out[125]: Int64Index([10, 40, 50], dtype='int64')
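query is not restricted to boolean columns: it also accepts comparison expressions, and local Python variables can be referenced with the @ prefix. A sketch with a hypothetical numeric column named value and a made-up threshold:

```python
import pandas as pd

df = pd.DataFrame({"value": [0.1, 0.8, 0.4, 0.9]}, index=[10, 20, 30, 40])
threshold = 0.5  # hypothetical cutoff

# @threshold refers to the local Python variable above
labels = df.query("value > @threshold").index.tolist()
print(labels)  # [20, 40]
```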

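Pandas also has nonzero: we just select the positions of the True rows and use them to slice the DataFrame or the index. Note that Series.nonzero() was deprecated in pandas 0.24 and removed in 1.0, so on current versions call nonzero on the underlying NumPy array instead:

df.index[df.BoolCol.to_numpy().nonzero()[0]]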
Out[128]: Int64Index([10, 40, 50], dtype='int64')

I extended this question: how do you get the row, column and value of all matching values?

Here is the solution:

import pandas as pd
import numpy as np


def search_coordinate(df_data: pd.DataFrame, search_set: set) -> list:
    # Work on the underlying NumPy array
    nda_values = df_data.values
    # Mask of cells whose value is in search_set, then its 0-based (row, col) positions
    row_idx, col_idx = np.where(np.isin(nda_values, list(search_set)))
    return [(row, col, nda_values[row, col]) for row, col in zip(row_idx, col_idx)]


if __name__ == '__main__':
    test_datas = [['cat', 'dog', ''],
                  ['goldfish', '', 'kitten'],
                  ['Puppy', 'hamster', 'mouse']]
    df_data = pd.DataFrame(test_datas)
    print(df_data)
    result_list = search_coordinate(df_data, {'dog', 'Puppy'})
    print(f"\n\n{'row':<4} {'col':<4} {'name':>10}")
    # One line per match
    for row, col, name in result_list:
        print(f"{row:<4} {col:<4} {name:>10}")

Output:

          0        1       2
0       cat      dog        
1  goldfish           kitten
2     Puppy  hamster   mouse


row  col        name
0    1           dog
2    0         Puppy
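The function above reports positions, which coincide with labels only for the default RangeIndex and default column names. A label-aware sketch, assuming the same search but on a frame with made-up row and column labels; the function name search_labels is hypothetical. DataFrame.isin builds the mask and df.index and df.columns map the positions to labels.

```python
import numpy as np
import pandas as pd


def search_labels(df: pd.DataFrame, values) -> list:
    """Return (row_label, col_label, value) for every cell whose value is in values."""
    mask = df.isin(list(values)).to_numpy()  # same-shaped boolean mask
    rows, cols = np.nonzero(mask)            # positions, in row-major order
    return [(df.index[r], df.columns[c], df.iat[r, c]) for r, c in zip(rows, cols)]


df = pd.DataFrame([['cat', 'dog', ''],
                   ['goldfish', '', 'kitten'],
                   ['Puppy', 'hamster', 'mouse']],
                  index=['x', 'y', 'z'], columns=['a', 'b', 'c'])

print(search_labels(df, {'dog', 'Puppy'}))  # [('x', 'b', 'dog'), ('z', 'a', 'Puppy')]
```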

If you already know the candidate index positions you are interested in, there is a faster way that avoids checking the whole column:

np.array(index_slice)[np.where(df.loc[index_slice]['column_name'] >= threshold)[0]]

Full comparison:

import pandas as pd
import numpy as np

index_slice = list(range(50,150)) # known index locations of interest
data = np.zeros(10000)
data[index_slice] = np.random.random(len(index_slice))

df = pd.DataFrame(
    {'column_name': data},
)

threshold = 0.5

%%timeit
np.array(index_slice)[np.where(df.loc[index_slice]['column_name'] >= threshold)[0]]
# 600 µs ± 1.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
[i for i in index_slice if i in df.index[df['column_name'] >= threshold].tolist()]
# 22.5 ms ± 29.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Here is how it works:

# Boolean mask of the condition, computed only for the sliced rows
df.loc[index_slice]['column_name'] >= threshold

# convert the mask to positions; these start from 0 and increment by 1
np.where(...)[0]

# map those positions back onto the list of candidate indexes
np.array(index_slice)[...]

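Note: np.array(index_slice) can't be replaced by df.index, because the indexes returned by np.where(...)[0] start from 0 and increment by 1 (they are positions within the slice). You could build something like df.index[index_slice] instead, and if you do it only once on a small number of rows, I don't think it is much hassle.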
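A sketch of the label-aware variant the note hints at, assuming index_slice holds positions (as in the comparison) and that the frame carries hypothetical non-default labels 0, 10, 20, ...: restrict the column by position with iloc, then read the labels straight off the restricted Series. A seeded generator replaces np.random.random only to make the sketch reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)      # seeded for reproducibility
index_slice = list(range(50, 150))  # candidate positions, as in the comparison
data = np.zeros(10000)
data[index_slice] = rng.random(len(index_slice))
# Hypothetical non-default labels: 0, 10, 20, ...
df = pd.DataFrame({'column_name': data}, index=np.arange(10000) * 10)

threshold = 0.5

sub = df['column_name'].iloc[index_slice]        # only the candidate rows, by position
labels = sub.index[sub.to_numpy() >= threshold]  # labels of rows meeting the condition
print(labels[:5].tolist())
```

Since the mask is applied to sub.index rather than df.index, no offset bookkeeping is needed.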

Source URL: https://stackoverflow.com/questions/21800169/python-pandas-get-index-of-rows-which-column-matches-certain-value
