????????????? Hadoop ? Mahout - PowerPoint PPT Presentation

Title: ????????????? Hadoop ? Mahout


1
????????????? Hadoop ? Mahout ? ???????? ????????
??????? ?????? ???????????? ?.?.?. ???.????????
?????????????????????? ???????????? ??????????
2
Hadoop ? Mahout ??????? ?.?.
Big Data
  • Big Data ?????? ????????? ??????? ???????
    ??????
  • ????????? ? ?????????
  • ??????? ???????? ????????? ?????????? ???????
    ????????????? ?????????
  • ?????? ?????????? Gartner ? IDC
  • Big Data ?????? ? ??? 10 ???????? ??????
    ????????? ???????? ?????????????? ??????????
  • ????? Big Data ???? ?? ????? ??????????????
  • MapReduce ???? ?? ???????? ?????????? ???????
    ????????? ?????? ? Big Data

3
????
  • ?????? MapReduce ? Apache Hadoop
  • ?????????? Hadoop
  • ???????? ???????? ? Apache Mahout

4
??????? Hadoop ? MapReduce
  • ?????????? MapReduce ????????? ? Google ???
    ??????? ?????? ? ????????
  • ???? ??????? ? ???????????? ??????? ??????
    ?????? ?? ??????? ???????????, ???????????? ?????
  • Google ?? ?????????????? ???? ??????????
    MapReduce
  • Jeffrey Dean, Sanjay Ghemawat. MapReduce:
    Simplified Data Processing on Large Clusters
  • Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung.
    The Google File System
  • Apache Hadoop ???????? ?????????? MapReduce
  • ?????????? ?? ?????? ???? Google
  • ??????? ?? Java
  • http://hadoop.apache.org/

5
??? ?????????? Hadoop
  • ??? ?????????? Hadoop
  • ????? ??????? ??????? Hadoop ? Yahoo!
  • 4500 ????????
  • ???????????? ??? ????????? ??????? ? ???????
    ????????? ??????????

6
???????? ?????????? Hadoop
  • HDFS (Hadoop Distributed File System) ????????
    ??????
  • MapReduce ????????? ??????

7
HDFS
????
8
HDFS
????
64??
64??
64??
9
HDFS
????
64??
64??
64??
10
HDFS
Name Node
Data Node 1
Data Node 2
Data Node 3
1, 4, 6
1, 3, 5
1, 2, 5
Data Node 4
Data Node 5
Data Node 6
11
?????? ? HDFS
  • ????? ?????? ? HDFS ???????????? ?? ??????
    ????????
  • ?????? ???????????? HDFS
  • ?? ???????? ??????????? ??????? ls, cp, mv ? ?.?.
  • ?????????? ???????????? ??????????? ???????
  • hadoop dfs cmd
  • ???????
  • hadoop dfs -ls
  • Found 3 items
  • -rw-r--r-- 1 hadoop supergroup 0 2011-06-22 13:58 /user/hadoop/file1
  • -rw-r--r-- 1 hadoop supergroup 0 2011-06-22 13:58 /user/hadoop/file2
  • -rw-r--r-- 1 hadoop supergroup 0 2011-06-22 13:58 /user/hadoop/file3
  • hadoop dfs -put /tmp/file4
  • hadoop dfs -cat file4
  • Hello, world!

12
??????????? HDFS
  • HDFS ?????????????????? ???????? ???????,
    ???????????????? ??? ???????????? ?????????
    ?????? ? ???????? ???????
  • ???????? ?? ??? ???? ?????!
  • ?????? Write Once Read Many
  • ?????? ???????? ????, ????? ?????? ????????? ?
    ?????
  • ??????? ?????? ?????
  • ??-???????? 64 ?? (????? 128 ??? 256 ??)
  • ?? ?????????? ???????????? ?????? (???? ?????? ?
    ?.?.)
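
The block sizes listed on this slide (64 by default, with 128 and 256 as alternatives) can be turned into a small arithmetic sketch. It assumes the garbled unit is megabytes, which matches the HDFS defaults of that era; the class and method names are hypothetical and are not part of any Hadoop API.

```java
public class HdfsBlockSketch {
    static final long MB = 1024L * 1024L;

    /** Number of blocks needed to store a file (ceiling division). */
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes < 0 || blockSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    /** Bytes held by the last block; it holds only the remaining data. */
    static long lastBlockSize(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes == 0) {
            return 0;
        }
        long remainder = fileSizeBytes % blockSizeBytes;
        return remainder == 0 ? blockSizeBytes : remainder;
    }

    public static void main(String[] args) {
        long fileSize = 200 * MB; // hypothetical 200 MB file
        for (long blockMb : new long[] {64, 128, 256}) {
            long blockSize = blockMb * MB;
            System.out.println(blockMb + " MB blocks: "
                    + blockCount(fileSize, blockSize) + " block(s), last block "
                    + lastBlockSize(fileSize, blockSize) / MB + " MB");
        }
    }
}
```

With these assumptions a 200 MB file occupies four 64 MB blocks, three full ones and one holding the remaining 8 MB.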

13
MapReduce
  • MapReduce ?????????? ?????????????? ??????????
  • ???? MapReduce ????????? ?????? ?????????? ?
    ??????????? ??????????????? ??????????????
  • ??????????? ????????? ?????? ?????? ??????????
  • ?????????????? ?????? ? ???????? ??????????????
    ?????????????
  • MapReduce ???????? ? ??????? ??? ? ??????
    ????????????
  • ???????? ? ????? ?????
  • ????????????? ???????????? ???????
  • ???????????? ?????? ??????
  • ????????? ????? ??????? ? ???????

???????? http://www.youtube.com/watch?v=SS27F-hYWfU
14
??????? Map ? Reduce
???????? http://developer.yahoo.com/hadoop/tutorial/module4.html
15
?????? MapReduce WordCount
  • ?????? ?????????, ??????? ??? ????? ???????????
    ? ?????
  • ?????????? ????????? ? Web-?????????
  • ?????????? ????????? ????? ??? ?????????????
    ?????
  • ???????? ??????
  • ????????? ?????
  • ?????? ???? ??????? ?? ???? ????????????
  • ??????
  • ???? MapReduce ????????? ?????? ?????????? ?
    ??????????? ??????????????? ??????????????.
    ??????????? ????????? ?????? ?????? ??????????

16
WordCount ??????? Map
  • ???????? ??????
  • ???? MapReduce ????????? ?????? ?????????? ?
    ??????????? ??????????????? ??????????????.
    ??????????? ????????? ?????? ?????? ??????????
  • ?????????? ?????????
  • <????, 1>, <mapreduce, 1>, <?????????, 1>,
    <??????, 1>, <??????????, 1>, <?, 1>,
    <???????????, 1>, <???????????????, 1>,
    <??????????????, 1>, <???????????, 1>,
    <?????????, 1>, <??????, 1>, <??????, 1>,
    <??????????, 1>
  • ?????????? ? ??????????? ?? ?????
  • <mapreduce, 1>, <??????????????, 1>, <?, 1>,
    <??????, 1>, <??????, 1>, <???????????, 1>,
    <??????????, 1>, <??????????, 1>, <???????????, 1>,
    <?????????, 1>, <???????????????, 1>,
    <?????????, 1>, <??????, 1>, <????, 1>.

17
WordCount ??????? Reduce
  • ???? ? ??????????? ??????? ?????????? ? ????
    ??????? Reduce
  • <mapreduce, 1> → <mapreduce, 1>
  • <??????????????, 1> → <??????????????, 1>
  • <?, 1> → <?, 1>
  • <??????, 1>, <??????, 1> → <??????, 2>
  • <???????????, 1> → <???????????, 1>
  • <??????????, 1>, <??????????, 1> → <??????????, 2>
  • <???????????, 1> → <???????????, 1>
  • <?????????, 1> → <?????????, 1>
  • <???????????????, 1> → <???????????????, 1>
  • <?????????, 1> → <?????????, 1>
  • <??????, 1> → <??????, 1>
  • <????, 1> → <????, 1>
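
The three WordCount steps on these slides (Map emits <word, 1> pairs, the pairs are sorted and grouped by key, Reduce sums each group) can be imitated in plain Java. This is only an in-memory sketch under stated assumptions: it does not use the Hadoop API, the class and method names are hypothetical, and the input sentences are made up for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class WordCountSketch {
    /** Map step: emit a <word, 1> pair for every word in the line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    /** Shuffle step: sort by key and group the values of equal keys. */
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce step: sum all counts collected for one word. */
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    /** Runs map, shuffle and reduce sequentially over all lines. */
    static SortedMap<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("to be or not to be", "To be is to do")));
    }
}
```

In a real Hadoop job the map and reduce calls would run in parallel on different nodes and the framework would perform the shuffle; here everything runs sequentially in one process.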

18
?????? MapReduce
  • MapReduce ???????? ?????? ? ??????? ??????
    WordCount
  • ???? ????? ??????? ? ?????????? ???????? ???????
  • ??????????? MapReduce
  • ??????????? ??????????????? ?????????????????
    ??????? Map ? Reduce ????? ???????????? ????????
    ?????? ??????????? ?? ???????? ???? ?? ?????
  • ???????????????? ?????? ????? ??????????? ??
    ?????? ???????? (? HDFS) ? ?????????????? ?????
    ?? ?????? ????????
  • ?????????????????? ??? ?????? ?? ????? ???????
    ??????? Map ??? Reduce ??????????? ?? ??????
    ???????
  • ?????????? MapReduce
  • ????????????? ???????? ????????? ??????
  • ??????? ????????? ??????? ?? ?????????????????

19
??????????? ?????????? ? ??????
20
?????? ??????? ?????? Hadoop
  • hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
  • hadoop-examples-*.jar ??? ?????? ? ????????? ??
    ???????????? Hadoop
  • grep ??? ??????? ? ?????? ? ?????????
  • input ??????? ??????? ?????? (? HDFS)
  • output ??????? ???????? ?????? (? HDFS)
  • 'dfs[a-z.]+' ?????? ??? ??????
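
To see what the search pattern of this example selects, the same regular expression can be tried locally. The sketch below assumes the pattern is `dfs[a-z.]+`, as in the stock Hadoop grep example (the brackets and the plus sign appear to have been lost on the slide). It counts every distinct match, roughly as the example job reports its results; the class name and sample lines are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrepPatternSketch {
    /** Counts how often each distinct match of the regex occurs in the lines. */
    static Map<String, Integer> countMatches(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher matcher = pattern.matcher(line);
            while (matcher.find()) {
                counts.merge(matcher.group(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up input lines resembling Hadoop configuration text.
        List<String> lines = Arrays.asList(
                "dfs.replication=3",
                "dfs.name.dir and dfs.replication");
        System.out.println(countMatches(lines, "dfs[a-z.]+"));
    }
}
```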

21
?????????? Hadoop
  • MapReduce ?????? ?????? ????????????????, ??
    ??????????????
  • ?????????? ??????????? ???????? ??????????
    ??????? ??????? ???????????
  • Hadoop ?????? ? ????????? ? ?????????????????
  • ?? ?????? Hadoop ????????? ??????????
  • ??????????? ???????? ??? ??????? ?????????
    ?????????? ?????, ???????????? Hadoop ???
    ???????????????
  • ???????????? Hadoop
  • ???????? ??????? ??? Hadoop

22
?????????? Hadoop
  • Pig ????????????? ???? ??????? ??????
  • Hive ?????? ?????? ? ?????????????? ?????,
    ???????? ? SQL
  • Oozie ????? ????? ? Hadoop
  • HBase ???? ?????? (?????????????), ??????
    Google Bigtable
  • Mahout ???????? ????????
  • Sqoop ??????? ?????? ?? ????? ? Hadoop ?
    ????????
  • Flume ??????? ????? ? HDFS
  • Zookeeper, MRUnit, Avro, Giraph, Ambari,
    Cassandra, HCatalog, Fuse-DFS ? ?.?.

23
???????????? Hadoop
  • Apache
  • hadoop.apache.org
  • ???????????? ???????????, ?????? Hadoop
  • ?????????????? ????????????
  • ????????? Hadoop, HBase, Pig, Hive, Mahout,
    Sqoop, Zookeeper ? ??.
  • ???????? ????????????? ????????? ?
    ?????????????????, ??????????, ????????????
  • ?????????? ?????????????? ?????????????
  • Cloudera
  • MapR
  • Hortonworks
  • Intel

24
???????? ??????? Hadoop
  • Amazon Elastic MapReduce (Amazon EMR)
  • http://aws.amazon.com/elasticmapreduce/
  • ??????????? ? MapR
  • Apache Hadoop on Rackspace
  • http://www.rackspace.com/knowledge_center/article/apache-hadoop-on-rackspace-private-cloud
  • ??????????? ? Hortonworks
  • Microsoft Windows Azure
  • http://www.windowsazure.com/en-us/home/scenarios/big-data/
  • Qubole Data Service
  • http://www.qubole.com/qubole-data-service
  • Web-????????? ??? ??????? ?????? ? Hadoop, Hive,
    Pig ? ??. ?? Amazon EMR

25
Apache Mahout
  • ?????????????? ?????????? ????????? ????????
    (machine learning)
  • ?????? ??????
  • ? ???????? Hadoop
  • ???????? ?? ????? ??????????
  • Mahout ????? ?? ?????????? ?????, ????????
    ???????? ??????
  • ???????? ???????? ????
  • ??????? ?? Java
  • ???????? Apache 2.0
  • ???????? ???????
  • http://mahout.apache.org/

26
???????? ???????? ? Mahout
  • ??????????????? (??????????) ??????????
  • ????????????
  • ?????????????
  • ??????????? ???????? ? ?????? (????????, ???????
    ?? ?????????)
  • ??????? Google News ?????????? ??????? ?? ????
    ????
  • ????????? ? Mahout K-Means, Fuzzy K-Means, Mean
    Shift, Dirichlet, Canopy ? ??.
  • ?????????????
  • ??????????? ?????????????? ??????? ? ?????????
    ?????? (?????? ???????? ???????)
  • ??????? ??????????? ?????, ??????????? ????????
    ?????? (????? ? ????????, ?????? ? ?.?.)
  • ????????? ? Mahout Logistic Regression, Naive
    Bayes, Support Vector Machines, Online Passive
    Aggressive ? ??.

27
????????????
28
???????????? ????????????
  • ??????? ??????? ???????????? ????? ???????????
    ??????? ????? ?? ?????? ? ???????
  • $1M Netflix Prize
  • ???????? Netflix ???????? ???????????? ??
    ????????? ????????? ???????????? DVD
  • ?????? ????? 1 ??????? ????????
  • ??????? ????????? ????? ???????? ????????
    ???????????? ?? 10
  • ???? ???????? ??????? BellKor's Pragmatic Chaos
    ? 2009 ?.
  • ???????????? ????????? ? 2006 ?? 2009 ?.
  • ?????? ??? ???????????? ???? ?? ???????? 50 000
  • http://www.netflixprize.com/

29
??????? ????????????
  • ?? ?????? ????????
  • ?????? ???????????? ?????? ????? ???????, ??????
    ????? ????????????? ??? ?????? ????? ??????? ???
    ???????????? ??????????
  • ?????????? ??????? ???????????? ?????? ???????
    ?? ?????? ? ????????????
  • ?? ?????? ????????????
  • ???????????? ?? ?????? ?????? ?????????????
  • ??????? ???????????? ????? ???? ??????
  • ????? ??????????? ????? ???????, ?? ????????? ??
    ??????
  • ?????????? ? Mahout

30
????????????
  • ???????????? ? Mahout ???????? ?? ??????
    ???????????? ?????????????
  • ???????????? ? Mahout
  • ???????????? (????? ?????)
  • ?????? (????? ?????)
  • ???????????? (????? ??????? ????????)
  • ?????? ?????? ? ????????????? ??? Mahout ??
    ??????? GroupLens (??????????? ????????) ??????
    ?????????????? ???????
    user id   item id   rating   timestamp
    196       242       3        881250949
    186       302       3        891717742
    22        377       1        878887116
    244       51        2        880606923

(?? ???????????? ? Mahout)
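
Each line of the sample data above carries four whitespace-separated fields: user id, item id, rating and timestamp. A minimal parsing sketch is shown below. Mahout reads such files itself, so the `Rating` class and `parse` method are hypothetical helpers that only illustrate the record layout.

```java
public class RatingLineSketch {
    /** One preference record: who rated what, how, and when. */
    static final class Rating {
        final long userId;
        final long itemId;
        final int rating;
        final long timestamp;

        Rating(long userId, long itemId, int rating, long timestamp) {
            this.userId = userId;
            this.itemId = itemId;
            this.rating = rating;
            this.timestamp = timestamp;
        }
    }

    /** Parses "user item rating timestamp" separated by tabs or spaces. */
    static Rating parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 4) {
            throw new IllegalArgumentException("expected 4 fields, got " + fields.length);
        }
        return new Rating(
                Long.parseLong(fields[0]),
                Long.parseLong(fields[1]),
                Integer.parseInt(fields[2]),
                Long.parseLong(fields[3]));
    }

    public static void main(String[] args) {
        Rating r = parse("196\t242\t3\t881250949");
        System.out.println("user " + r.userId + " rated item " + r.itemId + " with " + r.rating);
    }
}
```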
31
??????? ? ????????????
  • ?? ?????? ?????????????
  • ????? ????????????? ? ???????? ???????
  • ??????????, ??? ???????? ???? ?????????????
  • ????????????? ??????? ? ???????????? ?
    ?????????????? ??????? ?????????????
  • ?????????? ?????? ????? ??????????????,
    ???????????? ?????? ????????
  • ?? ?????? ????????
  • ????? ???????, ??????? ?? ??, ??????? ???????????
    ????????????
  • ????????????? ???????? ?????????? ?? ???
  • ???????????? ?????? ??????????????, ??????
    ???????? ???????? ?????. ???????????? ?????
    ???????????? ? ?????????? ?????? (?
    ?????????????? Hadoop)

32
???????????? ?? ?????? ?????????????
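import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
// The class name is illustrative; the slide shows only the main method.
public class UserBasedRecommenderExample {
    public static void main(String[] args) throws Exception {
        // Load the preferences from the ratings file.
        DataModel model = new FileDataModel(new File("u.data"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Neighborhood of the 2 users most similar to the target user.
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        // Ask for 1 recommendation for the user with id 1.
        List<RecommendedItem> recommendations = recommender.recommend(1, 1);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}
RecommendedItem[item:643, value:4.27682]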
33
???????????? ?? ?????? ?????????????
???????? Sean Owen, Robin Anil, Ted Dunning, and
Ellen Friedman. Mahout in Action
34
????? ??????? ?????????????
  • ??? ??????????, ??? ????? ????????????? ???????
  • ???? ????????? - ????? ?? -1 ?? 1.
  • 1 ????? ????????????? ?????????
  • 0 ? ????????????? ??? ????? ??????
  • -1 ????? ????????????? ??????????????
  • Mahout ?????????? ????????? ?????????? ???????
    ?????????
  • ??????????? ???????
  • ????????? ??????????
  • ?????????? ????????
  • ??????????? ????????
  • ??????????????? ?????????????
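
The slide describes a similarity measure that ranges from -1 to 1, and the user-based example in this deck uses `PearsonCorrelationSimilarity`. As a rough standalone illustration of such a measure, the sketch below computes the Pearson correlation of two users' ratings for the same items. It is not Mahout's implementation, and the class name and sample ratings are hypothetical.

```java
public class PearsonSketch {
    /**
     * Pearson correlation of two equally long rating vectors.
     * Returns a value in [-1, 1], or NaN if either vector is constant.
     */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two vectors of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double covariance = 0;
        double varianceX = 0;
        double varianceY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            covariance += dx * dy;
            varianceX += dx * dx;
            varianceY += dy * dy;
        }
        double denominator = Math.sqrt(varianceX * varianceY);
        return denominator == 0 ? Double.NaN : covariance / denominator;
    }

    public static void main(String[] args) {
        double[] alice = {1, 2, 3};
        System.out.println(pearson(alice, new double[] {2, 4, 6})); // same taste
        System.out.println(pearson(alice, new double[] {3, 2, 1})); // opposite taste
    }
}
```

Ratings that rise and fall together give a value close to 1, opposite ratings give a value close to -1, and unrelated ratings give a value near 0.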

35
???????? ????????????
????????????? ????? ??????? (NearestNUserNeighborhood)
?????? ? ???????? ??????? (ThresholdUserNeighborhood)
???????? Sean Owen, Robin Anil, Ted Dunning, and
Ellen Friedman. Mahout in Action
36
????? ??????????
  • ????? ??? ????????? ????????????? ??????
  • ????? ??? ????????? ??????
  • ???????? ??????
  • ???????????? ?????? ???
  • ?????????? ?????? ??? ?????? ??????
  • ????????? ???????????? ? ??????? ???????????!

???????? Sean Owen, Robin Anil, Ted Dunning, and
Ellen Friedman. Mahout in Action
37
???????????? ?? ?????? ????????
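import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
// The class name is illustrative; the slide shows only the main method.
public class ItemBasedRecommenderExample {
    public static void main(String[] args) throws Exception {
        DataModel dataModel = new FileDataModel(new File("u.data"));
        ItemSimilarity itemSimilarity = new LogLikelihoodSimilarity(dataModel);
        ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, itemSimilarity);
        // Ask for 1 recommendation for the user with id 1.
        List<RecommendedItem> recommendations = recommender.recommend(1, 1);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}
RecommendedItem[item:271, value:4.27682]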
38
Mahout ? Hadoop
???????? Sean Owen, Robin Anil, Ted Dunning, and
Ellen Friedman. Mahout in Action
39
Mahout ? Hadoop
  • Mahout ????? ???????? ??? ????????, ??? ? ?
    ???????? Hadoop
  • ?????? ???????????? Mahout ? Hadoop ???????????
    ? ??????? ?????? RecommenderJob
  • ?????? ? ????????????? ?????? ???? ???????? ?
    HDFS
  • ?????????? ???????????? ???????????? ? HDFS
  • ???????????? ????? ????????? ? ???? ?????? ?
    ??????? sqoop

40
?????? ??????? Mahout ? Hadoop
hadoop jar mahout-core-0.7-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=input \
  -Dmapred.output.dir=output \
  --usersFile users_list.txt
  • ????????? ?????????
  • -Dmapred.input.dir ??????? ? ??????? ?
    ????????????? (? HDFS, ????? ???? ?????????
    ??????)
  • -Dmapred.output.dir ???????, ???? ????????????
    ??????????????? ???????????? (? HDFS)
  • --usersFile ???? ? ????????????????
    ?????????????, ??? ??????? ????? ?????????????
    ????????????
  • --similarityClassname ??? ??????, ???????
    ????????? ?????? ?????????
  • --numRecommendations ?????????? ???????????? ??
    ?????? ????????????

41
?????
  • MapReduce ??????????? ?????? ??? ?????????
    ??????? ??????? ?????? (Big Data)
  • Hadoop ???????? ?????????? MapReduce
  • ?????????? Hadoop
  • Mahout ???????? ???????? ? Hadoop
  • ????????????, ?????????????, ?????????????
  • ???????????? ? Mahout
  • ???????????? ????????????, ??????, ??????
  • ???????????? ?? ?????? ????????????? ? ?? ??????
    ????????
  • ????????? ????????????? ? ????????
  • ????????? ?????????????
  • ?????? Mahout RecommenderJob ? Hadoop

42
???????? ???????? ?????? ??????? avs@imm.uran.ru www.asozykin.ru