Analysis Techniques and R Packages

ADsP (Advanced Data Analytics Semi-Professional) / ADsP Certification (nationally accredited) 2018. 11. 17. 00:02

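(1) reshape package: melt(), cast()

- aqm <- melt(airquality, id=c("month", "day"), na.rm=TRUE)   (assumes the column names were lowercased first, e.g. names(airquality) <- tolower(names(airquality)))

- a <- cast(aqm, day ~ month ~ variable)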

(2) sqldf package

- sqldf("select * from iris where Species like 'se%'")

(3) plyr package

- ddply(d, "year", summarise, mean.count=mean(count))

(4) data.table package

- DF <- data.table(x=c('b', 'b', 'b', 'a', 'a'), v=rnorm(5))

(5) Handling missing values (NA)

- is.na(y) to detect missing values, the Amelia package for imputation, complete.cases() to drop rows with missing values
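
The base-R part of this item can be sketched as follows, using the built-in airquality data set. The variable names (na_counts, complete_rows, mean_ozone) are illustrative assumptions, not from the original notes.

```r
# airquality ships with base R and has NAs in Ozone and Solar.R
data(airquality)

# Count missing values per column with is.na()
na_counts <- colSums(is.na(airquality))
print(na_counts)

# Drop every row that has at least one missing value
complete_rows <- airquality[complete.cases(airquality), ]
print(nrow(complete_rows))

# Alternative: keep the rows and skip NAs inside the summary function
mean_ozone <- mean(airquality$Ozone, na.rm = TRUE)
print(mean_ozone)
```

Deleting rows is the simplest option but discards information; imputation (for example with Amelia) is the usual alternative when many rows are affected.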

(6) Detecting outliers

- boxplot(x); outwidth <- boxplot(x) (the flagged points are stored in outwidth$out); outlier(y) from the {outliers} package
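
A minimal sketch of the boxplot approach, assuming a small made-up vector x. It also recomputes the usual 1.5 x IQR fences by hand for comparison; the names bp, lower, and upper are hypothetical.

```r
# Toy data: ten ordinary values plus one extreme value
x <- c(1:10, 50)

# plot = FALSE returns the boxplot statistics without drawing
bp <- boxplot(x, plot = FALSE)
outliers_found <- bp$out
print(outliers_found)

# The same idea by hand: flag points beyond 1.5 * IQR from the quartiles
q <- quantile(x, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr
manual_outliers <- x[x < lower | x > upper]
print(manual_outliers)
```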

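(7) Random number generation functions by probability distribution

- Normal distribution: rnorm()

- t distribution: rt()

- F distribution: rf()

- Continuous uniform distribution: runif()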
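
As a quick illustration of these four generators, the sketch below draws five values from each. The seed and the parameter values (degrees of freedom, range) are arbitrary assumptions chosen for the example.

```r
# Fix the seed so the draws are reproducible
set.seed(42)

z_vals <- rnorm(5, mean = 0, sd = 1)    # normal distribution
t_vals <- rt(5, df = 10)                # t distribution, 10 degrees of freedom
f_vals <- rf(5, df1 = 3, df2 = 12)      # F distribution
u_vals <- runif(5, min = 0, max = 1)    # continuous uniform on [0, 1]

print(z_vals)
print(t_vals)
print(f_vals)
print(u_vals)
```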

(8) Regression analysis: lm()

- Forward selection: step(lm(dependent_variable ~ ., ...), direction="forward")

- Backward elimination: step(lm(dependent_variable ~ ., ...), direction="backward")

- Stepwise selection: step(lm(dependent_variable ~ ., ...), direction="both")
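
The three directions can be sketched on the built-in mtcars data. One assumption worth flagging: forward selection is usually started from an intercept-only model with an explicit scope, because a model that already contains every predictor has nothing left to add. The names null_model, full_model, and search_scope are illustrative.

```r
# Intercept-only and full models on mtcars (mpg is the response)
null_model <- lm(mpg ~ 1, data = mtcars)
full_model <- lm(mpg ~ ., data = mtcars)

# Search range: from the intercept alone up to all predictors
search_scope <- list(lower = ~1, upper = formula(full_model))

# Forward selection: start small and add terms while AIC improves
fwd <- step(null_model, scope = search_scope,
            direction = "forward", trace = 0)

# Backward elimination: start with everything and drop terms
bwd <- step(full_model, direction = "backward", trace = 0)

# Stepwise: terms may be added or dropped at each step
both <- step(null_model, scope = search_scope,
             direction = "both", trace = 0)

print(formula(fwd))
print(formula(bwd))
print(formula(both))
```

step() compares models by AIC by default, so the three searches may end at different formulas.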

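(9) Correlation coefficient: cor(), covariance: cov()

- Pearson correlation coefficient: rcorr(as.matrix(mtcars), type="pearson")   (rcorr() is in the {Hmisc} package)

- Spearman correlation coefficient: rcorr(as.matrix(test), type="spearman")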
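
For readers without {Hmisc} installed, a base-R sketch of the same quantities may help. It uses cor(), cov(), and cor.test() on two mtcars columns; the choice of mpg and wt is an assumption made for illustration.

```r
# Pearson and Spearman correlation between fuel economy and weight
r_pearson  <- cor(mtcars$mpg, mtcars$wt, method = "pearson")
r_spearman <- cor(mtcars$mpg, mtcars$wt, method = "spearman")

# Covariance of the same pair
cv <- cov(mtcars$mpg, mtcars$wt)

# Correlation matrix for several columns at once
cor_mat <- cor(mtcars[, c("mpg", "wt", "hp")])

# cor.test() adds a significance test for a single pair
ct <- cor.test(mtcars$mpg, mtcars$wt)

print(r_pearson)
print(r_spearman)
print(cv)
print(cor_mat)
print(ct$p.value)
```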

(10) Multidimensional scaling (MDS): cmdscale()
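
A minimal sketch of cmdscale(), assuming the built-in eurodist distance object (road distances between European cities) as input. The variable name loc is illustrative.

```r
# eurodist is a "dist" object, which is what cmdscale() expects
loc <- cmdscale(eurodist, k = 2)   # k = number of output dimensions

# Each row is one city placed in the 2-D configuration
print(dim(loc))
print(head(loc))
```

The two columns can be passed to plot() and text() to draw the map-like configuration.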

(11) Principal component analysis (PCA): princomp(USArrests, cor=TRUE)
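
The call above can be expanded into a short sketch that also reports how much variance each component explains. The name var_explained is an assumption for the example.

```r
# cor = TRUE runs PCA on the correlation matrix (variables standardized)
fit <- princomp(USArrests, cor = TRUE)

# Proportion of total variance explained by each component
var_explained <- fit$sdev^2 / sum(fit$sdev^2)
print(round(var_explained, 3))

# Loadings (variable weights) and the first few component scores
print(loadings(fit))
print(head(fit$scores))
```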

(12) Time series forecasting

- ARIMA model: arima(Nile, order=c(1,1,1))

- Time series decomposition: decompose(ldeaths)
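
Both calls run on data sets that ship with R. The sketch below adds a five-step forecast from the fitted ARIMA model; the horizon n.ahead = 5 and the variable names are assumptions.

```r
# ARIMA(1,1,1) on the annual Nile flow series
fit <- arima(Nile, order = c(1, 1, 1))
print(fit)

# Forecast the next 5 years; $pred holds point forecasts, $se standard errors
fc <- predict(fit, n.ahead = 5)
print(fc$pred)

# Split the monthly ldeaths series into trend, seasonal and random parts
parts <- decompose(ldeaths)
print(names(parts))
```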

(13) Logistic regression model

- a <- glm(Species~Sepal.Length, data=a, family=binomial)
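
A sketch of how the data frame a might be prepared, assuming it is iris restricted to two species so that the response is binary. The subset choice and the name fit are assumptions; the original line reuses the name a for the fitted model.

```r
# Keep two species so the response has exactly two levels
a <- droplevels(subset(iris, Species != "setosa"))

# Models the probability of the second level (virginica)
fit <- glm(Species ~ Sepal.Length, data = a, family = binomial)
print(summary(fit)$coefficients)

# Odds ratio for a one-unit increase in Sepal.Length
odds_ratio <- exp(coef(fit)[["Sepal.Length"]])
print(odds_ratio)
```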

(14) Predicting new data

- predict()
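
predict() takes a fitted model and a data frame of new observations. A minimal sketch with a linear model on the built-in cars data; the new speed values are made up for illustration.

```r
# Fit stopping distance as a linear function of speed
fit <- lm(dist ~ speed, data = cars)

# New observations must use the same column names as the predictors
new_obs <- data.frame(speed = c(10, 20))

# Point predictions
pred <- predict(fit, newdata = new_obs)
print(pred)

# Point predictions with 95% prediction intervals (fit, lwr, upr)
pred_int <- predict(fit, newdata = new_obs, interval = "prediction")
print(pred_int)
```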

(15) Neural network models

- {nnet} package

nn.iris <- nnet(Species~., data=iris, size=2, rang=0.1, ...)

- {neuralnet} package

net.infert <- neuralnet(case~., data=infert, hidden=2, err.fct="ce", linear.output=FALSE, likelihood=TRUE)
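
A runnable sketch of the {nnet} call, assuming the package is installed (it is one of R's recommended packages). The decay, maxit, and trace values fill in for the elided arguments and are assumptions, not the original settings.

```r
library(nnet)

# Initial weights are random, so fix the seed for reproducibility
set.seed(1)

# One hidden layer with 2 units; decay and maxit are assumed values
nn.iris <- nnet(Species ~ ., data = iris, size = 2, rang = 0.1,
                decay = 5e-4, maxit = 200, trace = FALSE)

# Class predictions on the training data
nn_pred <- predict(nn.iris, newdata = iris, type = "class")
print(table(Predicted = nn_pred, Actual = iris$Species))
```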

(16) Decision trees

- rpart() function in the {rpart} package

c <- rpart(Species~., data=iris)

- Prediction

predict(c, newdata=iris, type="class")

- ctree() function in the {party} package

tree <- ctree(ploidy~., data=trainData)

- Prediction

predict(tree, newdata=testData)
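
The rpart workflow can be sketched end to end with a train/test split, assuming the {rpart} package is installed (it is one of R's recommended packages). The 105/45 split, the seed, and the names train and test are assumptions for the example.

```r
library(rpart)

# Reproducible split of iris: 105 rows for training, 45 for testing
set.seed(123)
idx <- sample(nrow(iris), size = 105)
train <- iris[idx, ]
test  <- iris[-idx, ]

# Fit a classification tree on the training rows
fit <- rpart(Species ~ ., data = train)

# type = "class" returns predicted labels instead of class probabilities
pred <- predict(fit, newdata = test, type = "class")

# Share of test rows classified correctly
acc <- mean(pred == test$Species)
print(acc)
```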

(17) Ensemble models

① Bagging: bagging() function in the {adabag} package

- iris.bagging <- bagging(Species~., data=iris, mfinal=10)

② Boosting: boosting() function in the {adabag} package

- boo.adabag <- boosting(Species~., data=iris, boos=TRUE, mfinal=10)

③ Random forest: randomForest() function in the {randomForest} package

- rf <- randomForest(ploidy~., data=trainData, ntree=100, proximity=TRUE)

(18) Confusion matrix: confusionMatrix() function in the {caret} package

- nn_con <- confusionMatrix(nn_pred, testData$Species)
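
What confusionMatrix() tabulates can be illustrated with base R's table(). This is a substitute sketch on made-up labels, not the {caret} function itself; the vectors actual and predicted are hypothetical.

```r
# Hypothetical true and predicted labels for six observations
lv <- c("a", "b", "c")
actual    <- factor(c("a", "a", "b", "b", "b", "c"), levels = lv)
predicted <- factor(c("a", "b", "b", "b", "c", "c"), levels = lv)

# Rows are predictions, columns are the true classes
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Accuracy: correctly classified cases (the diagonal) over all cases
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

confusionMatrix() reports the same table plus derived statistics such as sensitivity and specificity.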

(19) ROC curve: ROC() function in the {Epi} package

- nn_ROC <- ROC(form=case~net_pred, data=testData, plot="ROC")

(20) Lift chart: performance() function in the {ROCR} package

- n_lift <- performance(n_r, "lift", "rpp")

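(21) Cluster analysis

- Agglomerative methods: hclust(), agnes() from the {cluster} package, Mclust() from the {mclust} package

-- Distance measure: dist() function, method="option"

-- Linkage method: hclust() function, method="option"

d <- dist(USArrests, method="euclidean")

fit <- hclust(d, method="average")

- Divisive methods: diana() and mona() from the {cluster} package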
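
The agglomerative example above can be extended with cutree() to turn the dendrogram into cluster labels. The choice of k = 3 groups is an assumption made for illustration.

```r
# Pairwise Euclidean distances between the 50 states
d <- dist(USArrests, method = "euclidean")

# Average-linkage agglomerative clustering
fit <- hclust(d, method = "average")

# Cut the dendrogram into 3 groups (k is an assumed value)
groups <- cutree(fit, k = 3)
print(table(groups))
print(head(groups))
```

plot(fit) draws the dendrogram, which is the usual way to choose where to cut.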

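(22) k-means clustering: kmeans() function

- Standardize the variables with scale() before clustering

- Run wssplot() to choose an appropriate number of clusters (wssplot() is not part of base R, so it must be defined first)

- fit.km <- kmeans(df, 3, nstart=25)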
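
Since wssplot() has to be defined by the user, the sketch below computes the same quantity directly: the total within-cluster sum of squares for k = 1 to 8. It assumes df is the standardized USArrests data; the range of k and the seed are arbitrary.

```r
# Standardize so every variable contributes on the same scale
df <- scale(USArrests)

set.seed(1234)

# Total within-cluster sum of squares for each candidate k
wss <- sapply(1:8, function(k) {
  kmeans(df, centers = k, nstart = 25)$tot.withinss
})
print(round(wss, 2))

# Fit the final model with the chosen number of clusters
fit.km <- kmeans(df, 3, nstart = 25)
print(fit.km$size)
```

Plotting wss against k and looking for the bend (the "elbow") is the usual way to pick the number of clusters.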

(25) SOM: som() function in the {kohonen} package

(26) Association analysis: apriori() function in the {arules} package

- adult.rules <- apriori(Adult,
    parameter=list(support=0.1, confidence=0.6),
    appearance=list(rhs=c('income=small', 'income=large'),
                    default='lhs'),
    control=list(verbose=F))

- adult.rules.sorted <- sort(adult.rules, by='lift')

License: Attribution-NonCommercial-NoDerivs

posted by swgooddream