Python 라이브러리 공부 기록 05. 함수형 프로그래밍 모듈

Date: 2022.11.27 Updated: 2022.11.27

카테고리: Python

태그: python

💡 파이썬에서 자주 쓰이는 라이브러리, 통계, 데이터 분석에 유용한 라이브러리들을 학습하고 정리한 페이지입니다.
관심분야인 야구에 관한 응용문제를 직접 만들어 풀어보며 학습했습니다.

05. 함수형 프로그래밍 모듈

05-01. itertools.cycle

itertools.cycle(iterable)함수는 iterable을 순서대로 무한히 반복시키는 이터레이터를 생성하는 함수이다.

import itertools
pool = itertools.cycle(['양현종', '놀린', '로니', '이의리','윤중현'])
next(pool) #출력할 때마다 순서대로 바뀜.

위 코드의 next함수는 파이썬 자체 내장 함수로, 이터레이터의 다음 요소를 리턴하는 함수이다.

❓ 문제.

기아 타이거즈의 선발 로테이션은 ['양현종', '놀린', '로니', '이의리','윤중현']이다. 
프로야구 개막일(2022년 4월 2일)부터 6월 2일까지 로테이션이 예외없이 돌아간다고 가정할 때, 
각 날짜에 맞는 선발투수를 나열하라. 
이 때 `itertools`와 `datetime` 모듈을 사용하고, 월요일은 야구가 없는 날이니 로테이션을 건너뛴다.
예시) 
2022-04-04 야구가 없는 월요일입니다.
2022-04-05 선발투수는 로니.

💡 정답.

import itertools
pool = itertools.cycle(['양현종', '놀린', '로니', '이의리','윤중현'])

import datetime
open = datetime.date(2022,4,2)
close = datetime.date(2022,6,2)
days = close - open

for n in range(days.days):
  matchday = open+datetime.timedelta(days=n)
  if matchday.weekday()==0:
    print(matchday,"야구가 없는 월요일입니다.")
  else:
    print(matchday, "선발투수는 {}".format(next(pool)))

#결과
2022-04-02 선발투수는 양현종
2022-04-03 선발투수는 놀린
2022-04-04 야구가 없는 월요일입니다.
2022-04-05 선발투수는 로니
2022-04-06 선발투수는 이의리
2022-04-07 선발투수는 윤중현
2022-04-08 선발투수는 양현종
...중략...
2022-05-27 선발투수는 로니
2022-05-28 선발투수는 이의리
2022-05-29 선발투수는 윤중현
2022-05-30 야구가 없는 월요일입니다.
2022-05-31 선발투수는 양현종
2022-06-01 선발투수는 놀린

05-02. itertools.accumulate - 누적합 계산

itertools.accumulate(iterable) 함수는 iterable의 누적합을 계산하여 이터레이터로 리턴하는 함수이다.

한 회사의 월별 매출이 다음과 같다고 하자. 이 때 월별 누적합을 계산하려면 itertools.accumulate()함수를 사용하면 된다.

monthly_income = [1161,1814,1270,2256,1413,1842,2221,2207,2450,2823,2540,2134]

import itertools
result = list(itertools.accumulate(monthly_income))
print(result)

❓ 문제.

KIA 최원준의 2021시즌 월별 안타개수는 다음 딕셔너리와 같다
`choi_hit = {'4월':29, '5월':35, '6월':23, '7월':5, '8월':12, '9월':35, '10월':35}`
이 때 최원준의 월별 누적안타를 딕셔너리 형태로 출력하라.

💡 정답.

choi_hit = {'4월':29, '5월':35, '6월':23, '7월':5, '8월':12, '9월':35, '10월':35}
hit_list = choi_hit.values()
result = list(itertools.accumulate(hit_list))
choi_hit_acc = dict()
i=0
for n in range(4,11):
  choi_hit_acc["{}월까지 누적 안타".format(n)]=result[i]
  i+=1
print(choi_hit_acc)

#결과
{'4월까지 누적 안타': 29, '5월까지 누적 안타': 64, '6월까지 누적 안타': 87, '7월까지 누적 안타': 92, '8월까지 누적 안타': 104, '9월까지 누적 안타': 139, '10월까지 누적 안타': 174}

05-03. itertools.groupby - 키값으로 분류

itertools.groupby(iterable, key=None) 함수는 iterable을 key값으로 분류한 결과를 리턴하는 함수이다.

다음과 같이 선수이름과 포지션이 key로 주어진 딕셔너리의 묶음을 생각해보자. 이 때 포지션별로 분류해보자.

data = [
        {'name':'양현종', 'pos': 'p'},
        {'name':'김선빈', 'pos': '2b'},
        {'name':'김도영', 'pos': 'ss'},
        {'name':'박찬호', 'pos': 'ss'},
        {'name':'이승재', 'pos': 'p'},
        {'name':'나성범', 'pos': 'of'},
        {'name':'소크라테스', 'pos': 'of'},
        {'name':'윤중현', 'pos': 'p'},
]

우선 data를 포지션별로 sort해야 한다.

import operator
data = sorted(data, key=operator.itemgetter('pos'))

이제 itertools.groupby로 포지션별 그룹을 나눈다. (분류기준, 분류기준으로 묶인 데이터) 튜플을 반환하기 때문에 아래 반복문으로 딕셔너리를 만들어줄 수 있다.

grouped_data = itertools.groupby(data, key=operator.itemgetter('pos'))

result = {}
for key, group_data in grouped_data:
  result[key] = list(group_data)
import pprint
pprint.pprint(result)

#결과
{'2b': [{'name': '김선빈', 'pos': '2b'}],
 'of': [{'name': '나성범', 'pos': 'of'}, {'name': '소크라테스', 'pos': 'of'}],
 'p': [{'name': '양현종', 'pos': 'p'},
       {'name': '이승재', 'pos': 'p'},
       {'name': '윤중현', 'pos': 'p'}],
 'ss': [{'name': '김도영', 'pos': 'ss'}, {'name': '박찬호', 'pos': 'ss'}]}

❓ 문제.

2021시즌 LAA의 선수들을 포지션별로 그룹화하여 나타내려고 한다. 
데이터는 MLB API를 이용하고, itertools 모듈을 활용하여 그룹화하라

💡 정답.

#데이터 가져오기
import pprint
player={}
for n in statsapi.get('sports_players',{'season':2021, 'gameType':'R'})['people']:
  if n['currentTeam']['id'] == 108: #팀id가 108(LAA)인 팀일 경우
    player['name'] = n['fullName'] #'name' 키에 선수 이름을 value로
    player['pos'] = n['primaryPosition']['abbreviation'] #'pos' 키에 선수 포지션 약어를 value로
    pprint.pprint(player)

#그룹화
import operator
import itertools
data = sorted(data, key=operator.itemgetter('pos'))
grouped_data = itertools.groupby(data, key=operator.itemgetter('pos'))

result = {}
for key, group_data in grouped_data:
  result[key] = list(group_data)
pprint.pprint(result)

#결과
{'1B': [{'name': 'Jared Walsh', 'pos': '1B'}],
 '2B': [{'name': 'David Fletcher', 'pos': '2B'},
        {'name': 'Kean Wong', 'pos': '2B'}],
 '3B': [{'name': 'Phil Gosselin', 'pos': '3B'},
        {'name': 'Anthony Rendon', 'pos': '3B'},
        {'name': 'Jose Rojas', 'pos': '3B'}],
 'C': [{'name': 'Anthony Bemboom', 'pos': 'C'},
       {'name': 'Drew Butera', 'pos': 'C'},
       {'name': 'Jack Kruger', 'pos': 'C'},
       {'name': 'Max Stassi', 'pos': 'C'},
       {'name': 'Kurt Suzuki', 'pos': 'C'},
       {'name': 'Matt Thaiss', 'pos': 'C'}],
 'CF': [{'name': 'Juan Lagares', 'pos': 'CF'},
        {'name': 'Brandon Marsh', 'pos': 'CF'},
        {'name': 'Mike Trout', 'pos': 'CF'}],
 'LF': [{'name': 'Jo Adell', 'pos': 'LF'},
        {'name': 'Justin Upton', 'pos': 'LF'}],
 'OF': [{'name': 'Jon Jay', 'pos': 'OF'},
        {'name': 'Scott Schebler', 'pos': 'OF'}],
 'P': [{'name': 'Jaime Barria', 'pos': 'P'},
       {'name': 'Dylan Bundy', 'pos': 'P'},
       {'name': 'Griffin Canning', 'pos': 'P'},
       {'name': 'Steve Cishek', 'pos': 'P'},
       {'name': 'Alex Claudio', 'pos': 'P'},
       {'name': 'Alex Cobb', 'pos': 'P'},
...생략...

05-04. itertools.zip_longest - 사이즈가 큰 것을 기준으로 묶기

만약 다음에 주어진 두 리스트를 zip한다면 아래처럼 ‘이광수’, ‘김승민’은 짝을 찾지 못한다.

students = ['한민서', '황지민', '이영철', '이광수', '김승민']
rewards = ['사탕', '초컬릿', '젤리']
result = zip(students, rewards)
print(list(result))
#결과
[('한민서', '사탕'), ('황지민', '초컬릿'), ('이영철', '젤리')]

이런 경우 itertools.zip_longest(iterables, fillvalue)함수를 사용하면 짝을 찾지 못한 것을 fillvalue에 입력한 값으로 채워준다.

import itertools
students = ['한민서', '황지민', '이영철', '이광수', '김승민']
rewards = ['사탕', '초컬릿', '젤리']
result = itertools.zip_longest(students, rewards, fillvalue='새우깡')
print(list(result))

#결과
[('한민서', '사탕'), ('황지민', '초컬릿'), ('이영철', '젤리'), ('이광수', '새우깡'), ('김승민', '새우깡')]

05-05. itertools.permutations - 순열

[’1’, ‘2’, ‘3’]이라는 세 장의 카드 중 두 장을 뽑아 만들 수 있는 두 자리 수를 구해보자. 이는 순열 3P2와 같다.

순열을 구하는 함수는 itertools.permutations()이다.

import itertools
for a,b in list(itertools.permutations(['1','2','3'],2)):
  print(a+b)

#결과
12
13
21
23
31
32

순열이 아닌 조합을 알고 싶으면 itertools.combinations()를 사용하면 된다.

중복을 허용하는 중복조합은 itertools.combinations_with_replacement를 사용하면 된다.

05-06. functools.partial - 인수를 지정하여 함수 재정의

functools.partial함수는 하나 이상의 인수가 이미 채워진 함수의 새 버전을 만들 때 사용하는 함수이다.

다음 코드는 choice 매개변수에 따라 *args 인수를 합하거나 곱하는 함수이다.

def add_mul(choice, *args):
    if choice == "add":
        result = 0
        for i in args:
            result = result + i
    elif choice == "mul":
        result = 1
        for i in args:
            result = result * i
    return result

print(add_mul('add', 1, 2, 3, 4, 5))
print(add_mul('mul', 1, 2, 3, 4, 5))

#결과
15
120

만약 여기서 choice 매개변수를 미리 지정해놓고 add(1,2,3,4,5)만으로 15를 출력하는 함수를 만들려면 partial을 사용하면 된다.

add = partial(add_mul, 'add')
mul = partial(add_mul, 'mul')

print(add(1,2,3,4,5))  # 15 출력
print(mul(1,2,3,4,5))  # 120 출력

05-10. funtools.reduce

functools.reduce(function, iterable)함수는 function을 iterable의 요소에 차례로(왼쪽에서 오른쪽으로) 누적 적용하여 iterable을 단일 값으로 줄여나가는 함수이다.

다음 함수는 data의 모든 요소를 더하는 함수이다.

def add(data):
    result = 0
    for i in data:
        result += i
    return result

functools.reduce를 사용하면 이와 동일한 기능을 하는 함수를 만들 수 있다.

reduce는 lambda함수를 data의 가장왼쪽부터 차례로 수행하는 기능을한다. 아래 예시에선 ((((1+2)+3)+4)+5)처럼 기능한다.

import functools
data = [1,2,3,4,5]
result = functools.reduce(lambda x, y: x+y, data)
print(result)

같은 원리로 아래 코드는 최댓값을 구한다.

num_list = [3, 2, 8, 1, 6, 7]
max_num = functools.reduce(lambda x, y: x if x > y else y, num_list)
print(max_num)

05-11. operator.itemgetter - 다중 수준 정렬

선수 명단이 다음과 같이 튜플로 주어질 때 등번호순으로 정렬을 해보자.

player = [
          ("양현종", 54, 'P'),
          ("김도영", 5, 'SS'),
          ("최형우", 34, 'DH'),
          ("소크라테스", 30, "CF")
]

이 문제는 sorted 함수의 key 매개변수에 itemgetter(1)을 지정하면 된다. 여기서 1은 튜플의 두번째 요소를 의미한다.

from operator import item
sorted_player = sorted(player, key=itemgetter(1))
print(sorted_player)

#결과
[('김도영', 5, 'SS'), ('소크라테스', 30, 'CF'), ('최형우', 34, 'DH'), ('양현종', 54, 'P')]

만약에 정렬대상이 튜플이 아니라 딕셔너리라면 itemgetter()의 인수에 인덱싱 번호가 아니라 key 이름을 입력해야 한다.

from operator import itemgetter
player = [
          {"name":"양현종", "num":54, "pos":'P'},
          {"name":"김도영", "num":5, "pos":'SS'},
          {"name":"최형우", "num":34, "pos":'DH'},
          {"name":"소크라테스", "num":30, "pos":"CF"}
]

sorted_player = sorted(player, key=itemgetter('num'))
print(sorted_player)

Twitter Facebook LinkedIn

Hee Jun Kim🍀

Python 라이브러리 공부 기록 05. 함수형 프로그래밍 모듈

05. 함수형 프로그래밍 모듈

05-01. itertools.cycle

05-02. itertools.accumulate - 누적합 계산

05-03. itertools.groupby - 키값으로 분류

05-04. itertools.zip_longest - 사이즈가 큰 것을 기준으로 묶기

05-05. itertools.permutations - 순열

05-06. functools.partial - 인수를 지정하여 함수 재정의

05-10. funtools.reduce

05-11. operator.itemgetter - 다중 수준 정렬

공유하기

참고

에브리타임 원하는 게시판 본문과 댓글 등 크롤링

깃허브 블로그와 깃허브 커밋 기록 크롤링하기

[NLP] 위키피디아 문서 요약 시스템

[NLP] huggingface 나무위키 데이터셋 전처리