Genreanalyysimenetelm - PowerPoint PPT Presentation

Title: Genreanalyysimenetelm. Author: tukilpe. Last modified by: Äyrämö, Sami Kari-Pekka. Created Date: 4/1/2003 11:17:32 AM. Document presentation format: PowerPoint PPT presentation

Slides: 39
Provided by: tuki

1
  • Data mining
  • Demo 1
  • 28.11.2007

2
Introduction
  • This slide set contains
  • Very easy and supervised introductory material
    for MATLAB
  • Homework (Task 1 and Task 2 in the last two
    slides)
  • If you are already an expert in MATLAB, you can
    skip the introduction and start working on the
    homework
  • By returning homework (a short report about the
    accomplished tasks) one can get credits for the
    final exam (total 3 x 2 points; exam max is
    4 x 6 = 24 points)
  • The reports for the tasks of Demo 1 must be
    returned no later than 10.12.2007
  • The reports can be returned by e-mail
    (sami.ayramo_at_jyu.fi), to the office room (Ag
    C416.2), or to the mailbox (which can be found two
    meters away from the office door)
  • Some additional code that may be useful for the
    homework can be found at
    http://users.jyu.fi/~samiayr/DM/demot

3
Requirements
  • Basic computer skills (e.g., starting
    applications, opening, closing and saving files,
    cutting and pasting text, directory structures,
    etc.)
  • Know how to use a text editor, such as Windows
    Notepad, that you can use to write MATLAB
    programs (MATLAB also has its own built-in text
    editor which you can use)
  • Basic algebra and trigonometry
  • Knowledge of basic linear algebra (i.e., concepts
    such as matrix, vector, inverse etc.) would also
    be very helpful
  • While you are following this introduction, have
    MATLAB running in a separate window and perform
    and experiment with the examples
  • This introduction is extracted, modified and
    compressed from the one available at the
    MathWorks Student Center
  • http://www.mathworks.com/academia/student_center/tutorials/

4
Facts about MATLAB
  • MATLAB is a computer program for solving the
    sorts of mathematical problems frequently
    encountered in, for example, data mining, data
    analysis, statistics, simulation, engineering,
    and mathematical modelling
  • Built-in features of MATLAB enable effortless
    solving of a wide variety of numerical problems
  • from the very basic, such as a system of 2
    equations with 2 unknowns
  • X + 2Y = 24
  • 12X - 5Y = 10
  • to the more complex, such as factoring
    polynomials, fitting curves to data points,
    making calculations using matrices, performing
    signal processing operations such as Fourier
    transforms, and building and training neural
    networks.
  • MATLAB can be used to plot many different kinds
    of graphs, enabling the visualization of complex
    mathematical functions and laboratory data
  • The three images below have been created using
    MATLAB plotting functions

Images are taken from www.mathworks.com
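
The 2-by-2 system mentioned on this slide can be solved in a few lines. The following is only a minimal sketch: the names A, b and solution are chosen here for illustration, and the backslash operator is one of several ways to solve such a system in MATLAB.

```matlab
% Coefficient matrix and right-hand side for
%    X + 2Y = 24
%  12X - 5Y = 10
A = [1 2; 12 -5];
b = [24; 10];

% Backslash solves the linear system A * [X; Y] = b
solution = A \ b;
X = solution(1);
Y = solution(2);

disp([X Y])
```

X works out to 140/29 (about 4.83) and Y to 278/29 (about 9.59); substituting both back into the two equations gives 24 and 10.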
5
Starting MATLAB
  • You can start MATLAB by double-clicking on the
    MATLAB icon
  • The MATLAB Desktop will then pop up

6
Entering commands in MATLAB
  • >> is the command prompt
  • Type a command at the command prompt and MATLAB
    executes the command you typed in, and then
    prints out the result
  • Ex1: Enter a simple MATLAB command, date, to see
    how it works
  • Ex2: Try also the clc command (clear command
    window)
  • Ex3: To exit MATLAB, you can just enter quit at
    the MATLAB command prompt
  • To give you a good feel for the kinds of things
    you can use MATLAB for, many different demos are
    also provided, all accessible from a demo window
    that pops up when you type demo at the command
    prompt

7
Getting help
  • MATLAB has an extensive help system built into
    it, containing detailed documentation and help
    information on all of the commands and functions
    of MATLAB
  • To obtain help on a given function there are
    three main functions: help, helpwin (short for
    help window) and doc (short for documentation).
  • help and helpwin give you the same information,
    but in a different window; the doc command
    returns an HTML page with a lot more information
  • Ex4: Find help on the date function using the
    different functions
  • Another source of help is the MATLAB help browser
  • You can invoke the MATLAB help browser by
  • typing helpbrowser at the MATLAB command prompt
  • clicking on the help button (?)
  • selecting Start->MATLAB->Help from the MATLAB
    desktop
  • Many tutorials and documents can also be found
    at www.mathworks.com

8
Working with variables
  • Variables are a fundamental concept in MATLAB and
    are used all the time
  • In its simplest mode of use, MATLAB can be used
    just like a pocket calculator
  • MATLAB supports all the basic arithmetic
    operations (+, -, *, /, ^, etc.) and you can
    group and order operations by enclosing them in
    parentheses
  • Ex5: Try the following calculator-like operations
    with MATLAB by typing
  • 4 + 10
  • 5 * 10 + 6
  • (6 + 6) / 3
  • 9^2
  • What is ans? ans is short for "answer",
    and is used in MATLAB as the default variable
    name when none is specified
  • Ex6: Check the value of ans by typing ans
  • Ex7: Try to change the value of ans by typing
    ans = 6
  • You can also define and use your own variables
  • Ex8: Create three variables, check the value of
    the first one, and calculate the average. For
    instance, enter the following commands
  • a = 10
  • b = 20
  • c = 30
  • a
  • the_average = (a + b + c) / 3
9
Working with variables
  • If you have defined a lot of different variables,
    you probably can't remember all the variable
    names you have defined. Therefore, it is nice to
    get a list of all the variables currently
    defined. Simply typing whos at the command prompt
    will return to you the names of all variables
    that are currently defined.
  • Ex9: Try the sequence of the following commands
  • clear
  • a = 5
  • b = 6
  • whos
  • Typing clear at the command prompt will remove
    all variables and values that were stored up to
    that point.
  • Ex10: For example, continue from the above
    example
  • whos
  • clear
  • whos

10
Working with variables
  • If a command is followed by a semicolon (;) then
    MATLAB evaluates the expression and stores the
    result internally, but will not print out the
    result
  • You are usually concerned only with some final
    result in your MATLAB sessions, which is
    calculated by combining many temporary,
    intermediate variables. Appending a semicolon to
    the expressions that assign values to these
    intermediate variables keeps their results from
    being printed
  • Ex11: Compare the following expressions
  • a = 4 + 5
  • b = 5 + 6;

11
Working with variables
  • In MATLAB, there are some specific rules for what
    you can name your variables
  • Only use primary alphabetic characters (i.e.,
    "A-Z"), numbers, and the underscore character
    (i.e., "_") in your variable names.
  • You cannot have any spaces in your variable
    names
  • For example, using "this is a variable" as a
    variable name is not allowed, but
    "this_is_a_variable" is fine
  • MATLAB is case sensitive.
  • For example, "A_VaRIAbLe", "a_variable",
    "A_VARIABLE", and "A_variablE" would all be
    considered distinct variables in MATLAB
  • Using single quotes, one can also assign pieces
    of text to variables
  • Ex12: For example, try
  • some_text = 'This is some text assigned to a
    variable!'
  • some_text
  • Be careful not to mix up variables that have text
    values with variables that have numeric values in
    equations
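
The naming rules and the warning about mixing text with numbers can be checked directly at the prompt. This is a small sketch; the variable names are made up for illustration, and isvarname and str2double are standard MATLAB functions.

```matlab
% isvarname reports whether a piece of text is a valid variable name
name_with_spaces_ok      = isvarname('this is a variable');   % false: contains spaces
name_with_underscores_ok = isvarname('this_is_a_variable');   % true

% Mixing text and numbers: MATLAB uses the character code of '5' (53),
% not the number 5, when text appears in arithmetic
number_as_text = '5';
mixed_result   = number_as_text + 1;                % 53 + 1 = 54

% Convert the text to a number first to get the intended result
converted_result = str2double(number_as_text) + 1;  % 5 + 1 = 6
```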

12
MATLAB: the matrix laboratory
  • Three fundamental concepts in MATLAB, and in
    linear algebra, are scalars, vectors and matrices
  • A scalar is simply a fancy word for a
    number (a single value)
  • A vector is an ordered list of numbers
    (one-dimensional)
  • In MATLAB they can be represented as a row-vector
    or a column-vector
  • A matrix is a rectangular array of numbers
    (multi-dimensional)
  • In MATLAB, a two-dimensional matrix is defined by
    its number of rows and columns
  • Both scalars and vectors can be considered
    special types of matrix.
  • A scalar is a matrix with a row and column
    dimension of one (1-by-1 matrix)
  • A vector is a one-dimensional matrix: one row
    and n columns, or n rows and one column
  • All calculations in MATLAB are done with
    "matrices". Hence the name MATrix LABoratory.

13
MATLAB: the matrix laboratory
  • In MATLAB, matrices are defined inside a pair of
    square brackets ([])
  • A comma (,) separates the elements within a row,
    and a semicolon (;) separates the rows
  • Note: you can also use a space instead of a
    comma, and a carriage return (the enter key)
    instead of a semicolon
  • Ex13: Try the examples to see how a scalar, and
    row and column vectors, can be created
  • my_scalar = 3.1415
  • my_vector1 = [1, 5, 7]
  • my_vector2 = [1; 5; 7]
14
MATLAB: the matrix laboratory
  • What about a two-dimensional matrix?
  • Ex14: Create a 4-by-3 matrix called my_matrix
    with the numbers 8, 12, and 19 in the first row,
    7, 3, 2 in the second row, 12, 4, 23 in the third
    row, and 8, 1, 1 in the fourth row by typing the
    following command
  • my_matrix = [8, 12, 19; 7, 3, 2; 12, 4, 23; 8, 1, 1]
  • You can also combine different vectors and
    matrices together to define a new matrix
  • Remember that the output needs to be a valid
    rectangular matrix
  • Ex15: Construct a matrix from row vectors by
    typing the following lines
  • row_vector1 = [1 2 3]
  • row_vector2 = [3 2 1]
  • matrix_from_row_vec = [row_vector1; row_vector2]
  • Ex16: Construct a matrix from column vectors by
    typing the following lines
  • column_vector1 = [1; 3]
  • column_vector2 = [2; 8]
  • matrix_from_col_vec = [column_vector1, column_vector2]
  • Ex17: Construct a matrix from a 4x3 matrix by
    typing the following lines
  • my_matrix = [8, 12, 19; 7, 3, 2; 12, 4, 23; 8, 1, 1]
  • combined_matrix = [my_matrix, my_matrix]
15
Indexing vectors and matrices
  • Once a vector or a matrix is created you might
    need to extract only a subset of the data, and
    this is done through indexing.
  • In a row vector the leftmost element has the
    index of one.
  • In a column vector the topmost element has the
    index of one.
  • Ex17: Create vectors my_vector1 and
    my_vector2 and try to index into their values
  • my_vector1 = [1 5 7]
  • my_vector2 = [1; 5; 7]
  • my_vector1(1)
  • my_vector1(2)
  • my_vector1(3)
  • my_vector2(1)
  • my_vector2(2)
  • my_vector2(3)
  • The process is much the same for a
    two-dimensional matrix. The only difference is
    that you have to specify both the row and column
    indices.
  • Ex18: Access the value of 4 in my_matrix
  • my_matrix = [8, 12, 19; 7, 3, 2; 12, 4, 23; 8, 1, 1]
  • my_matrix(3,2)
  • Note: The row number is first, followed by the
    column number.
16
Indexing vectors and matrices
  • You can also extract any contiguous subset of a
    matrix, by referring to the row range and column
    range you want.
  • Ex19: Try the following examples
  • mat = [1 3 2 3 5 6 5; 7 4 8 1 2 3 4; 3 2 8 4 7 3 2;
    3 2 3 4 1 4 2]
  • mat(2:4,4:7)
  • mat(1:2,[1:3 5:6])
  • You can change a number in a matrix by assigning
    to it
  • Ex20: Try to change the value of an element by
    the following commands
  • mat = [1 3 2; 2 3 4; 7 3 2; 1 4 2]
  • mat(2,2) = 999
17
Element-by-element operations
  • Element-by-element operations are performed on
    two vectors or matrices of the same size and give
    a result of the same size
  • For example, "element-by-element multiplication"
    of two vectors [1 2 3] and [4 5 6] would give you
    [4 10 18].
  • The element-by-element operators in MATLAB are as
    follows
  • element-by-element multiplication ".*"
  • element-by-element division "./"
  • element-by-element addition "+"
  • element-by-element subtraction "-"
  • element-by-element exponentiation ".^"
  • Ex21: Try the following operations (which of
    these work?)
  • a = [1 2 3]
  • b = [4 5 6]
  • c = [6 7 8]
  • d = [6 7 8]
  • a .* b
  • a .* c
  • c .* d
  • c .^ d
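
Whether an element-by-element operation works depends on the shapes of its operands. The sketch below combines a row vector with a column vector; c_column is a hypothetical vector introduced here. MATLAB releases from R2016b onward expand such operands to a full matrix, while older releases report a dimension mismatch, so the attempt is wrapped in try/catch.

```matlab
a = [1 2 3];
b = [4 5 6];
same_shape_product = a .* b;     % both 1-by-3, giving [4 10 18]

c_column = [6; 7; 8];            % hypothetical 3-by-1 column vector
try
    mixed = a .* c_column;       % 1-by-3 combined with 3-by-1
    disp(size(mixed))            % reached on releases with implicit expansion
catch err
    disp(err.message)            % reached on releases without it
end
```

Checking operands with size before combining them avoids surprises in either case.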

18
Element-by-element operations
  • An additional note about element-by-element
    operators is that you can use them with scalars
    and vectors together
  • Ex22: Try the following operation
  • a = [1 2 3 4 5 6]
  • b = a .* 2
  • You can similarly use ".^", "+", and "-" with a
    vector and scalar.
  • Ex23: Try some examples
  • c = a .^ 2
  • d = a + 2
  • e = a - 2
  • The reason that the element-by-element
    multiplication, division, and exponentiation
    operators have "." appended to the front of them,
    while the element-by-element addition and
    subtraction operators do not, is that there are
    other kinds of multiplication, division, and
    exponentiation operators (denoted by "*", "/"
    and "^") for matrices, which are not
    element-by-element
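
The difference between the dotted and undotted operators shows up as soon as both are applied to the same square matrix. A minimal sketch, with an illustrative matrix A that is not taken from the slides:

```matlab
A = [1 2; 3 4];

elementwise_product = A .* A;   % each element times itself: [1 4; 9 16]
matrix_product      = A * A;    % rows times columns:        [7 10; 15 22]

elementwise_power = A .^ 2;     % same as A .* A
matrix_power      = A ^ 2;      % same as A * A
```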

19
Matrix operations
  • Element-by-element operations allow us to compute
    things on an element-by-element basis, but matrix
    operations allow us to perform matrix-based
    computation.
  • For example, matrix multiplication, represented
    by "*", of a row vector and a column vector
    computes the dot product of the two vectors. The
    dot product first multiplies the corresponding
    elements (i.e., same position elements) of the
    two vectors, similar to what element-by-element
    multiplication does, and then adds up all the
    results of these multiplications to get a single,
    final number as the answer.
  • Ex24: Try the following matrix multiplication
  • a = [1 2 3]
  • b = [4; 5; 6]
  • a * b
  • To get the answer "32", MATLAB first performs
    the multiplications of the corresponding
    elements of the two vectors: "1*4 = 4",
    "2*5 = 10", and "3*6 = 18". Then, to get the
    final answer of "32", MATLAB adds all these
    products together: "4 + 10 + 18 = 32".
  • The length of vectors and the size of matrices
    can be found with the length and size functions
  • Ex25: Try the following examples
  • a = [1 2 3]
  • length(a)
  • mat = [1 3 2; 2 3 4; 7 3 2; 1 4 2]
  • size(mat)
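
The dot-product view extends to full matrices: each entry of a matrix product is the dot product of a row of the first matrix with a column of the second. The sketch below checks this; M and N are illustrative matrices chosen here, not part of the slides.

```matlab
a = [1 2 3];
b = [4; 5; 6];
row_times_column = a * b;            % 1-by-3 times 3-by-1 gives a scalar
same_as_dot      = dot(a, b);        % built-in dot product
same_as_sum      = sum(a .* b');     % element-wise product, then sum

% Illustrative matrices
M = [1 2 3; 4 5 6];                  % 2-by-3
N = [1 0; 0 1; 2 2];                 % 3-by-2
P = M * N;                           % 2-by-2 result

% Entry (1,2) of P is row 1 of M dotted with column 2 of N
entry_check = dot(M(1, :), N(:, 2));
```

The inner dimensions must agree (here 3 and 3), which is why multiplying two 1-by-3 row vectors with "*" fails.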

20
Plotting
  • The most basic plotting command in MATLAB is the
    plot command. The plot command, when called with
    two same-sized vectors X and Y, makes a
    two-dimensional line plot for each point in X and
    its corresponding point in Y. In other words, it
    will draw points at (X(1),Y(1)), (X(2),Y(2)),
    (X(3),Y(3)), etc., and then connect all these
    points together with lines.
  • Ex26: Try a very simple example to illustrate
    what the plot command does
  • simple_x_points = [1 2 3 4 5]
  • simple_y_points = [25 0 20 5 15]
  • plot(simple_x_points, simple_y_points)
  • The ordering of the vectors in the plot command
    is important
  • Ex27: Try the reversed order for the previous
    simple example
  • plot(simple_y_points, simple_x_points)

21
Plotting
  • To add text to a plot, you need to keep the
    figure window open (i.e., type the commands in
    the MATLAB command window while the figure window
    is still open).
  • The xlabel/ylabel command prints out a text
    string describing the x-axis/y-axis. The title
    command prints out a title for your plot. Typing
    "grid on" at the command prompt adds grid lines
    to the open figure window (typing "grid off"
    will get rid of the grid lines).
  • Ex28: Try to use these commands on the previous
    plot
  • simple_x_points = [1 2 3 4 5]
  • simple_y_points = [25 0 20 5 15]
  • plot(simple_x_points, simple_y_points)
  • xlabel('this is text describing the x-axis')
  • ylabel('this is text describing the y-axis')
  • title('this is text giving a title for the graph')
  • grid on

22
Plotting a parabola
  • Ex29: Let's look at a more practical example of
    plotting. First you need to create a vector of
    regularly spaced points and a vector of function
    values at those points for some function. Do this
    for the function "y = x^2" (i.e., a parabola) for
    x values between -5 and 5 and with regular
    spacing of .1
  • x_points = -5 : .1 : 5;
  • y_points = x_points .^ 2;
  • Then plot the x_points against the y_points,
    and get the familiar plot of a parabola
  • plot(x_points,y_points)
  • xlabel('x-axis'); ylabel('y-axis'); title('A
    Parabola')
  • grid on
  • Note: The result is very smooth; you can't really
    see any of the individual line segments like you
    could for the simple example previously. That is
    because the points are so close together (at
    regular spacings of .1). MATLAB is still
    drawing line segments between the points, but
    your eye just can't see them because they are so
    small, and so the result seems to be a smooth
    curve.

23
Multiple plots
  • Using the hold command, you can add multiple
    plots to the same figure window, for example to
    compare them. (Normally, when you type a plot
    command, the previous plot in the figure window
    is simply erased and replaced by the results of
    the new plot.)
  • If you type "hold on" at the command prompt, all
    line plots created after that will be
    superimposed in the same figure window and axes.
    Likewise, the command "hold off" will stop this
    behavior and revert to the default (i.e., a new
    plot will replace the previous plot).
  • Ex30: Try the following example of how to plot
    several different exponential functions in the
    same axes (you need to define the points on
    x-axis only once)
  • x_points = -10 : .05 : 10;
  • plot(x_points, exp(x_points))
  • grid on
  • hold on
  • plot(x_points, exp(.95 .* x_points))
  • plot(x_points, exp(.85 .* x_points))
  • plot(x_points, exp(.75 .* x_points))
  • xlabel('x-axis'); ylabel('y-axis')
  • title('Comparing Exponential Functions')

24
Subplots
  • In order to have multiple plots in the same
    window, but each in a separate part of the window
    (i.e., each with their own axes), you use the
    subplot command. If you type subplot(M,N,P) at
    the command prompt, MATLAB will divide the plot
    window into a bunch of rectangles --- there will
    be M rows and N columns of rectangles --- and
    MATLAB will place the result of the next "plot"
    command in the Pth rectangle (where the first
    rectangle is in the upper left).
  • Ex31: Try this example of plotting a line, a
    parabola, an exponential, and the absolute value
    function into four rectangles in the same figure
    window
  • x_points = -10 : .05 : 10;
  • line = 5 .* x_points;
  • parabola = x_points .^ 2;
  • exponential = exp(x_points);
  • absolute_value = abs(x_points);
  • subplot(2,2,1); plot(x_points,line)
  • title('Here is the line')
  • subplot(2,2,2); plot(x_points,parabola)
  • title('Here is the parabola')
  • subplot(2,2,3); plot(x_points,exponential)
  • title('Here is the exponential')
  • subplot(2,2,4); plot(x_points,absolute_value)
  • title('Here is the absolute value')

25
Line Plots in Three Dimensions
  • This introduction covers two different kinds of
    three-dimensional plots you can do in MATLAB: 1)
    three-dimensional line plots and 2) surface mesh
    plots.
  • The three-dimensional line plots are analogous to
    the two-dimensional line plots created with the
    plot command. The only difference is that the
    command has a "3" added to it, plot3, and that it
    requires an extra input, Z, for the third
    dimension.
  • Ex32: A simple example of using the plot3
    command, and the resulting output figure window
    (notice that you can use hold and subplot here
    in the same way too)
  • X = [10 20 30 40]
  • Y = [10 20 30 40]
  • Z = [0 210 70 500]
  • plot3(X,Y,Z); grid on
  • xlabel('x-axis'); ylabel('y-axis');
    zlabel('z-axis')
  • title('Pretty simple')

26
Three-Dimensional Surface Mesh Plots
  • The mesh and meshgrid commands can be used to
    create surface mesh plots, which show the surface
    of three-dimensional functions, such as
    "z = x^2 + y^2"
  • The way it works is that you
  • Generate a grid of points in the xy-plane using
    the meshgrid command
  • Evaluate the three-dimensional function at these
    points
  • Create the surface plot with the mesh command
  • Ex33: Try to generate the meshgrid and generate
    the surface mesh plot
  • x_points = -10 : 1 : 10;
  • y_points = -10 : 4 : 10;
  • [X, Y] = meshgrid(x_points,y_points);
  • Z = X.^2 + Y.^2;
  • mesh(X,Y,Z)
  • xlabel('x-axis')
  • ylabel('y-axis')
  • zlabel('z-axis')
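
It can help to look at what meshgrid actually returns before plotting. The sketch below inspects the grid matrices for the point spacings used in the example; grid_size is a name introduced here for illustration.

```matlab
x_points = -10 : 1 : 10;    % 21 points along x
y_points = -10 : 4 : 10;    % 6 points along y

[X, Y] = meshgrid(x_points, y_points);

% X and Y have one row per y value and one column per x value
grid_size = size(X);        % [6 21]

% Every row of X repeats x_points; every column of Y repeats y_points
first_row_of_X    = X(1, :);
first_column_of_Y = Y(:, 1);

% Evaluating the function on the grid gives one height per grid point
Z = X.^2 + Y.^2;
corner_height = Z(1, 1);    % at x = -10, y = -10: 100 + 100 = 200
```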

27
MATLAB scripts
  • A MATLAB script is an ASCII text file that
    contains a sequence of MATLAB commands
  • the commands contained in a script file can be
    run, in order, in the MATLAB command window
    simply by typing the name of the file at the
    command prompt
  • Any text editor, such as Microsoft Windows
    Notepad, or word processor, such as Microsoft
    Word, can be used to create scripts, but the
    scripts must always be saved as simple text
    documents (i.e., in the "Save As" dialogue box,
    choose "Text Document" or its equivalent for
    "Save as type").
  • It is easiest to create scripts using MATLAB's
    built-in text editor, which automatically
    saves files as ASCII text files for you.
  • When naming script files, you need to append the
    suffix ".m" to the filename, for example
    "my_script.m".
  • Scripts in MATLAB are also called "M-files"
    because of this, and the ".m" suffix tells MATLAB
    that the file is associated with MATLAB.

28
Creating MATLAB script
  • Ex34: Create a simple script that calculates the
    average of five numbers that are stored in
    variables. Start by typing edit
    average_script.m at the command prompt. Then
    add the following contents to the script file
    "average_script.m" in MATLAB's built-in text
    editor
  • % a simple MATLAB m-file to calculate the average
    % of 5 numbers.
  • % first define variables for the 5 numbers
  • a = 5;
  • b = 10;
  • c = 15;
  • d = 20;
  • e = 25;
  • % now calculate the average of these and print it
    % out
  • five_number_average = (a + b + c + d + e) / 5;
  • five_number_average
  • NOTE! Save the above script for later use!
  • The lines starting with % are comments (all
    comment lines must start with %); the MATLAB
    editor shows them in green.
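
As an alternative to five separate variables, the same average can be computed from a single vector. This is only a sketch of one possible variation; the name numbers is introduced here and is not part of the script above.

```matlab
% store the five numbers in one row vector
numbers = [5 10 15 20 25];

% average by hand: sum of the elements divided by how many there are
five_number_average = sum(numbers) / numel(numbers);

% the built-in mean function gives the same result
built_in_average = mean(numbers);

disp(five_number_average)
```

With a vector, the script keeps working unchanged if more numbers are added.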

29
Running MATLAB script
  • If you saved the above script "average_script.m"
    into the present working directory, then it can
    be run simply by typing average_script at the
    MATLAB command prompt.
  • Ex35: Try to run it using the following sequence
    of commands at the command prompt
  • clear
  • whos
  • pwd
  • dir
  • average_script
  • whos

30
Saving variables 1
  • The save command can be used to save all or only
    some of your variables into a MATLAB data file,
    called a MAT-file
  • If you want to choose the name of the file
    yourself, you can type save followed by the
    filename you want to use.
  • MATLAB will then save all currently defined
    variables in a file named with the name you chose
    followed by the suffix ".mat" (for example, if
    you chose the name my_variables MATLAB would save
    as "my_variables.mat" in your present working
    directory).
  • Before saving you should change your present
    working directory to one of your own directories
    (such as some directory on your floppy diskette),
    or specify the complete path to where you want
    MATLAB to save your variables (for example
    "a:\my_variables\my_vars").
  • Ex36: Try this example of using save
  • clear
  • who
  • cd c:\my_variables (replace this with your own
    folder)
  • pwd (present working directory)
  • a = 10
  • b = 20
  • c = 30
  • d = sqrt((a + b + c)/pi)
  • who
  • save my_chosen_filename (replace this with your
    own filename)
  • dir
  • clear
  • who

31
Saving variables 2
  • The above use of the save command saved all the
    variables defined in the MATLAB workspace. If you
    just want to save some of your variables, you
    simply list the variables you want to save after
    typing save and the filename.
  • Ex37: Try to save only the variables a and c
  • clear
  • who
  • a = 10
  • b = 20
  • c = 30
  • who
  • pwd
  • save some_of_my_variables a c (replace this with
    your own filename)
  • dir
  • clear
  • who

32
Loading variables 1
  • The load command is used for loading variables
    back in later to use them again. Typing load
    followed by a filename (without the ".mat"
    suffix) will search the MATLAB path (refer to the
    next lesson regarding the MATLAB path) for the
    file, "filename.mat", and load all the variables
    saved in that file (for example, typing load
    my_vars would cause MATLAB to search for
    "my_vars.mat" and load the variables saved in
    it).
  • Ex38: Try this example of loading variables back
    into MATLAB
  • clear
  • who
  • cd c:\my_variables (replace this with your own
    folder)
  • dir
  • load my_chosen_filename (replace this with your
    own filename)
  • who
  • a
  • clear
  • who
  • load some_of_my_variables (replace this with
    your own filename)
  • who
  • c

33
Loading variables 2
  • You can also choose to load in only some of the
    variables that are saved in a MATLAB data file
    (MAT-file). To load only some of the variables
    saved in a file back into MATLAB, just type the
    names of the variables you want loaded back in
    after typing load and the filename (without
    ".mat") at the command prompt.
  • Ex39: Assuming that variables "a", "ans", "b",
    "c", and "d" are all saved in a file, you can use
    the load command to load only "a" and "c" back
    in
  • who
  • dir
  • whos -file my_chosen_filename (replace this with
    your own filename)
  • load my_chosen_filename a c (replace this with
    your own filename)
  • who
  • a
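
The save and load steps can also be written in function form, which is convenient inside scripts. The sketch below does a full round trip through a temporary file; the file location from tempname and the names mat_file and loaded are choices made for this illustration.

```matlab
a = 10;
b = 20;
c = 30;

% temporary file name, so no existing file is overwritten
mat_file = [tempname '.mat'];

% save three variables, then remove them from the workspace
save(mat_file, 'a', 'b', 'c');
clear a b c

% load only a and c; with an output argument, load returns a struct
loaded = load(mat_file, 'a', 'c');
loaded_names = fieldnames(loaded);

% without an output argument, the variable goes back into the workspace
load(mat_file, 'b');

delete(mat_file);   % clean up the temporary file
```

Loading into a struct avoids silently overwriting workspace variables that happen to share a name with something in the file.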

34
Working with Files, Directories and Paths
  • In general, files are managed, organized, and
    accessed in MATLAB in the same way as in
    Microsoft Windows, that is, in a hierarchical
    file system.
  • How does MATLAB find files?
  • MATLAB always looks inside your present working
    directory (type pwd at the MATLAB command prompt
    to see your present working directory)
  • If the file is not located in the present
    working directory, MATLAB will also search in
    the other directories that are stored in the
    MATLAB path (the present working directory can
    also be thought of as part of the MATLAB path)
  • Ex40: To print out the current MATLAB path, type
    matlabpath or path at the command prompt
  • If you want to store your MATLAB files in some
    directory that is not on the MATLAB path,
    add the complete path to your directory to the
    MATLAB path.
  • Ex41: There are two ways you can append your own
    paths to the MATLAB path
  • use the addpath command: type addpath followed
    by the complete path to your directory
  • use the path tool of MATLAB: type pathtool at
    the command prompt, or select File->Set Path
  • addpath a:\my_stuff\letters
  • matlabpath
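
The effect of addpath can be verified from code as well. This is a rough sketch under stated assumptions: the folder name my_matlab_files is hypothetical, it is created under the system temporary directory, and the path is restored afterwards with rmpath.

```matlab
% hypothetical folder for your own MATLAB files
my_folder = fullfile(tempdir, 'my_matlab_files');
if ~exist(my_folder, 'dir')
    mkdir(my_folder);
end

% add the folder to the MATLAB search path
addpath(my_folder);

% the path is one long piece of text; split it into separate folders
path_entries = strsplit(path, pathsep);
is_on_path = any(strcmp(path_entries, my_folder));
disp(is_on_path)

% remove the folder again so the path is left as it was
rmpath(my_folder);
```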

35
Useful functions
  • pwd - present working directory
  • dir, or ls - List directory
  • what - List MATLAB-specific files in directory
  • cd - Change current working directory
  • path, or matlabpath - List the MATLAB search path
  • addpath - Add directory to search path
  • pathtool - Invoke the path tool interface
  • help general - List of general MATLAB commands

36
Exercises
  1. Download the well-known Iris data to your working
    directory from
    http://www.ics.uci.edu/~mlearn/MLSummary.html
  2. Import the data into MATLAB by choosing from
    menu File->Import data->
  3. Perform some explorative DM for the Iris data
    set.
  4. Make a global summarization for the Iris data
    (for example, compute the mean, median, variance
    and range of the variables)
  5. Explore data by plotting 2-dimensional scatter
    plots for each pair of variables (e.g.,
    plotmatrix)
  6. Find the two most correlating variables in the
    data (corrcoef)
  7. Plot histogram (using, for example, 10 bins) for
    each variable (hist)
  8. Compute attribute means and medians for each
    class
  9. Compute the variance of all the variables (var)
  10. Compute the covariance matrix of the whole data
    (cov)
  11. Construct histograms of 10 bins for each Iris
    variable
  12. Make 2-dimensional scatter plots for each pair of
    variables on Iris data. Use different markers
    with different colors for different classes

Use the help commands and the documentation at
http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.html
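
The summary steps in the exercise list can be sketched as follows. The data here is a random placeholder with the same shape as the Iris measurements (150 rows, 4 columns), not the real data set; columns 3 and 4 are deliberately made to correlate so that the search for the most correlated pair has a clear answer. All variable names are illustrative.

```matlab
% Placeholder data standing in for the imported Iris measurements
rng(1);
data = rand(150, 4);
data(:, 4) = 2 * data(:, 3) + 0.01 * rand(150, 1);   % columns 3 and 4 correlate

% Global summaries, one value per variable (column)
column_means   = mean(data, 1);
column_medians = median(data, 1);
column_vars    = var(data, 0, 1);
column_ranges  = max(data, [], 1) - min(data, [], 1);
covariance     = cov(data);

% Find the most strongly correlated pair of variables
R = corrcoef(data);
R(logical(eye(size(R)))) = 0;            % ignore the diagonal (always 1)
[~, linear_idx] = max(abs(R(:)));
[row_idx, col_idx] = ind2sub(size(R), linear_idx);
most_correlated_pair = sort([row_idx col_idx]);
disp(most_correlated_pair)
```

With the real data, replace the placeholder matrix by the numeric columns imported from the Iris file; plotmatrix(data) and hist(data(:, 1), 10) then cover the plotting items in the list.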
37
Task 1
  • Load the Iris data set from the UCI repository
    http://www.ics.uci.edu/~mlearn/MLSummary.html
  • A short description of the data can be found in
    the lecture slides, Tan et al. Chapter 3 Exploring
    data, slide number 4.
  • Assume that you do not know the class names, the
    number of classes etc. of the Iris data set. What
    you know is that you have some data about flowers
    and the attribute names. Then, without prior
    assumptions and knowledge, analyse the data set
    using the available (or self-implemented)
    explorative and summarizing MATLAB tools.
    Document and explain all you can learn from the
    data by exploring. The documentation should
    contain figures and interpretation of useful and
    interesting visual views (different plots,
    colors, histograms, etc.; see the techniques in
    the lecture slides Tan et al. Chapter 3 Exploring
    data). For example
  • Can you determine the number of classes by
    exploring (assumed to be unknown)? How?
  • Behavior of attributes (correlations, scatters
    (variance/MAD/covariance), ranges, etc.). What
    kind of preprocessing might be needed? Are there
    redundant attributes? Outliers? And so on.
  • Other findings?
  • Explain what you can learn about data (that
    represents the three flower types). Describe
    carefully your findings and compare your results
    with respect to the known class labels of the
    flowers. Did you find the classes from the data?

38
Task 2
  1. Load the synthetic cluster data set from the file
    clusterdata1.data. The data set contains a set of
    generated 7-dimensional clusters. Try to find the
    best possible prototypes for the clusters using
    the MATLAB implementation dckmeans.m.
    Before this, by exploring and preprocessing the
    data set (see Task 1), try to find all possible
    information for the clustering step (for example,
    the data may contain errors, noise and
    redundancies; moreover, you must determine the
    number of clusters, and so on). You may also
    modify the dckmeans code (e.g., replace the sample
    mean estimate with a more robust one such as the
    median). Remember that K-means is a local search
    method (results depend on the initial prototypes,
    so try to find good ones). You can also utilize
    the PCA code in exploration and/or clustering.
  2. Document and explain all the steps and all the
    significant facts you can learn from the data by
    exploring, summarization, visualization,
    clustering etc. The documentation should contain
    plots, histograms, etc. with interpretations. If
    you do some preprocessing, transformations, or
    scaling of the data, report and explain them
    carefully. The most important thing is to
    document the final clustering results (prototypes
    and cluster labels), that is, your refinement of
    the data set.
  3. Remember to report not only the findings, but
    also how you proceeded (your mining process)!

Make frequent use of the help commands and the
documentation at
http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.html
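
The suggestion to replace the sample mean with the median can be illustrated with a small stand-alone loop. This is not the dckmeans.m code, whose interface is not shown in these slides; it is a minimal sketch on placeholder data (two well-separated 2-dimensional groups), with hand-picked initial prototypes and a fixed number of iterations. All names are illustrative.

```matlab
% Placeholder data: two groups of 50 points, far apart
rng(2);
data = [randn(50, 2); randn(50, 2) + 20];
k = 2;
n = size(data, 1);

% Initial prototypes picked by hand: one point from each group
prototypes = data([1 51], :);

for iter = 1:20
    % Assignment step: squared Euclidean distance to every prototype
    dist = zeros(n, k);
    for j = 1:k
        delta = data - repmat(prototypes(j, :), n, 1);
        dist(:, j) = sum(delta .^ 2, 2);
    end
    [~, labels] = min(dist, [], 2);

    % Update step: coordinate-wise median instead of the sample mean
    for j = 1:k
        if any(labels == j)
            prototypes(j, :) = median(data(labels == j, :), 1);
        end
    end
end

disp(prototypes)
```

The median update is less sensitive to outliers than the mean, but the result still depends on the initial prototypes, so several starting points are worth trying on real data.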