NLTK and stopwords fail #lookuperror

kingscode 2021. 1. 17. 10:58

I am starting a sentiment analysis project and I want to use the stopwords method. I did some research and found that nltk has stopwords, but when I run the command I get an error.

To see which words nltk uses, I do the following (as in section 4.1 of http://www.nltk.org/book/ch02.html):

from nltk.corpus import stopwords
stopwords.words('english')

But when I press Enter, I get:

---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
<ipython-input-6-ff9cd17f22b2> in <module>()
----> 1 stopwords.words('english')

C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __getattr__(self, attr)
 66
 67     def __getattr__(self, attr):
---> 68         self.__load()
 69         # This looks circular, but its not, since __load() changes our
 70         # __class__ to something new:

C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __load(self)
 54             except LookupError, e:
 55                 try: root = nltk.data.find('corpora/%s' % zip_name)
---> 56                 except LookupError: raise e
 57
 58         # Load the corpus.

LookupError:
**********************************************************************
  Resource 'corpora/stopwords' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
- 'C:\\Users\\Meru/nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
- 'C:\\Users\\Meru\\Anaconda\\nltk_data'
- 'C:\\Users\\Meru\\Anaconda\\lib\\nltk_data'
- 'C:\\Users\\Meru\\AppData\\Roaming\\nltk_data'
**********************************************************************

Because of this problem, code like the following cannot run either (it raises the same error):

>>> from nltk.corpus import stopwords
>>> stop = stopwords.words('english')
>>> sentence = "this is a foo bar sentence"
>>> print([i for i in sentence.split() if i not in stop])
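The filter above can be wrapped in a small function. This is a minimal sketch, assuming whitespace tokenization as in the question; `remove_stopwords` is a hypothetical name. Converting the stopword list to a `set` makes each membership test constant time on average, and lowercasing lets "This" match "this".

```python
def remove_stopwords(sentence, stop_words):
    """Return the tokens of `sentence` that are not stopwords.

    Hypothetical helper: tokenizes with a plain whitespace split, as in
    the question, and compares tokens case-insensitively.
    """
    stop = {w.lower() for w in stop_words}  # set gives fast lookups
    return [tok for tok in sentence.split() if tok.lower() not in stop]


if __name__ == "__main__":
    # A tiny hand-written list so the demo runs without the NLTK corpus.
    demo_stop = ["this", "is", "a"]
    print(remove_stopwords("This is a foo bar sentence", demo_stop))
```

Once the corpus is installed, you can pass `stopwords.words('english')` as `stop_words`.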

Do you know what the problem might be? I need to use words in Spanish. Would you recommend another method? I also considered using the Goslate package with English datasets.

Thanks for reading!

P.S.: I use Anaconda.

It looks like the stopwords corpus is not installed on your computer.

You need to start the NLTK Downloader and download the data you need.

Open a Python console and do the following:

>>> import nltk
>>> nltk.download()
showing info http://nltk.github.com/nltk_data/

In the GUI window that opens, either press the 'Download' button to download all corpora, or go to the 'Corpora' tab and download only the ones you need.
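The same steps can also be run from a script instead of the GUI. This is a minimal sketch, assuming NLTK is installed and the machine can reach the download server; `load_stopwords` is a hypothetical helper name. It checks for the corpus with `nltk.data.find()`, which raises the same `LookupError` shown in the traceback when the resource is missing. It then downloads only the `stopwords` package if needed and returns the list as a set. The NLTK stopwords corpus also includes a Spanish list, which covers the question's need for Spanish words.

```python
def load_stopwords(language="english"):
    """Return NLTK's stopword list for `language` as a set.

    Hypothetical helper: downloads the 'stopwords' corpus on first use
    if it is not found in any directory listed in nltk.data.path.
    """
    import nltk  # imported lazily, so defining the helper needs no NLTK
    from nltk.corpus import stopwords

    try:
        # Raises LookupError when the corpus is missing, as in the question.
        nltk.data.find("corpora/stopwords")
    except LookupError:
        # Same effect as picking 'stopwords' in the downloader GUI.
        nltk.download("stopwords", quiet=True)
    return set(stopwords.words(language))
```

For example, `spanish = load_stopwords("spanish")` returns the Spanish list. After the download, `stopwords.fileids()` lists the other available languages.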

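I tried this from an Ubuntu terminal. Following tttthomasssss's answer, the GUI did not show up, and I don't know why. So I followed KLDavenport's comment and it worked. Here is the summary:

Open a terminal/command line, type python, and then: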

>>> import nltk
>>> nltk.download("stopwords")

This stores the stopwords corpus under nltk_data. In my case, it was /home/myusername/nltk_data/corpora/stopwords.

If you need another corpus, visit nltk data and look up the corpus by its ID. Then download it with that ID, the same way as stopwords.
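Inside that folder, the corpus is plain text. There is one file per language (for example `english` or `spanish`), with one word per line. The sketch below uses only the standard library and lets you check what actually landed on disk. It assumes that layout and skips a README file if one is present. `available_languages` and `read_stopwords` are hypothetical helpers, and the default directory is just the example path from the answer.

```python
from pathlib import Path

# Example location from the answer above; adjust it for your machine.
CORPUS_DIR = Path("/home/myusername/nltk_data/corpora/stopwords")


def available_languages(corpus_dir=CORPUS_DIR):
    """List the language files in the stopwords folder (README excluded)."""
    return sorted(
        p.name
        for p in Path(corpus_dir).iterdir()
        if p.is_file() and p.name != "README"
    )


def read_stopwords(language, corpus_dir=CORPUS_DIR):
    """Read one language file: one stopword per line, blank lines skipped."""
    path = Path(corpus_dir) / language
    with path.open(encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

If `available_languages()` raises `FileNotFoundError`, the corpus was saved somewhere else. In that case, check the directories listed in `nltk.data.path`.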

If you want to install an NLTK corpus manually:

1) Go to http://www.nltk.org/nltk_data/ and download your desired NLTK Corpus file.

2) In a Python shell, check the value of nltk.data.path.

3) Choose one of the paths that exists on your machine, and unzip the data files into the corpora subdirectory inside it.

4) Now you can import the data: from nltk.corpus import stopwords
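Steps 3 and 4 can be scripted. This is a minimal sketch with a few assumptions. The archive is assumed to be saved locally, for example as `stopwords.zip`. Like the packages on the nltk_data page, it is assumed to contain a top-level `stopwords/` folder. `install_corpus_zip` and `register_data_dir` are hypothetical helpers. The unzip step uses only the standard library. `register_data_dir` appends a directory to `nltk.data.path`, which is only needed if you extract to a directory that is not already listed there.

```python
import zipfile
from pathlib import Path


def install_corpus_zip(zip_path, data_dir):
    """Unzip a manually downloaded corpus archive into <data_dir>/corpora.

    Assumes the archive holds a top-level folder named after the corpus
    (e.g. stopwords/), so the result is <data_dir>/corpora/stopwords/...
    """
    corpora_dir = Path(data_dir) / "corpora"
    corpora_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(corpora_dir)
    return corpora_dir


def register_data_dir(data_dir):
    """Add `data_dir` to the list NLTK searches, if it is not there yet."""
    import nltk  # lazy import: the unzip step above needs only the stdlib

    data_dir = str(data_dir)
    if data_dir not in nltk.data.path:
        nltk.data.path.append(data_dir)
```

For example, `install_corpus_zip("stopwords.zip", "/home/myusername/nltk_data")` places the files where the download answer above stored them.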

Reference: https://medium.com/@satorulogic/how-to-manually-download-a-nltk-corpus-f01569861da9

import nltk
nltk.download()

Click the Download button when the GUI appears. This worked for me (nltk.download('stopwords') did not).

ReferenceURL : https://stackoverflow.com/questions/26693736/nltk-and-stopwords-fail-lookuperror