Title: Towards Successful Ph.D. Research in Database Systems and Data Mining
2. Towards Successful Ph.D. Research in Database Systems and Data Mining
- Jiawei Han
- Department of Computer Science
- University of Illinois at Urbana-Champaign
- www.cs.uiuc.edu/hanj
- November 27, 2020
3. Outline
- Database and data mining: highly promising themes
  - Long history of strong and successful research
  - Lots of new challenges
  - Lots of research themes
- Selection of promising directions and promising topics
  - Giving your research a bigger impact
  - Discussing, debating, and active brainstorming
  - Capturing and harvesting the sparks of thought
- Towards highly productive research
  - Learning from others: reviews and judgment
  - Collaboration and teamwork
4. DB and DM: Long History of Strong and Successful Research
- Necessity is the mother of invention
  - Coming from real application demand
  - Constantly seeking new and extended applications
  - Developing core technologies for information systems
- A long history of success
  - Real systems, numerous applications, and a big industry
  - Relational database systems → application-oriented DBMS (spatiotemporal, CRM, banking, health info, ...) → data warehouses → data mining → Web search (Google)
- Depth and thoroughness in research
  - Constant search for new, innovative methodologies and algorithms
  - In-depth study of implementation, optimization, and user needs
  - Scalability, uncertainty, approximation, streaming, ranking, aggregation, privacy, and security
5. Still Challenging and Promising
- Huge amounts of data are mounting up rapidly
  - Gigabytes → terabytes → petabytes at a very fast pace
  - Data collection and dissemination: sensors, digital cameras, Web
- Database and data mining: various new applications
  - Data streams, RFID, sensor networks, video/audio data, text and Web, computer/software systems, social networks, biological data, and science/engineering data
  - Searching, ranking, mining, uncertainty, noise, privacy, security
- Database and data mining are still flourishing
  - Scalable statistical and machine learning methods
  - Pattern analysis methods
  - Integrated with database systems, data warehouses, and the Web as a natural, hidden process
- Still many open research problems and multiple research frontiers
6. Research Frontiers in Data Mining
- Information network analysis
- Stream data warehousing and data mining
- Pattern mining, pattern usage, and pattern understanding
- Warehousing and mining of moving object data, RFID data, and data from sensor networks
- Spatiotemporal and multimedia data mining
- Biological data mining
- Text and Web mining
- Data mining for software engineering and system analysis
- Data cube-oriented multidimensional online analysis
- Classification and ranking everywhere: databases, Web, documents, and knowledge
7. A Multidimensional View of Research Themes
- Data view
  - Relational data, transactional data, information network data, stream data, spatial, temporal, multimedia (video/audio), moving object data, RFID data, sensor data, biological data, text and Web data, software engineering and system data
- Issue view
  - Modeling, management, indexing, retrieval (query), update, integration, warehousing, mining, data cube computation, multidimensional online analysis, security, privacy, ...
- Methodology view: incremental, parallel, distributed
  - For mining: statistical, machine learning, decision-tree, MDL, HMM, Naïve-Bayes, ...
- Application view: different industries, governments, science and engineering
- Adding dimensions: time, space, ...
- Relaxing assumptions: approximation, uncertainty, ...
9. Selection of Promising Directions
- Read survey papers, proceedings, etc., discuss with your friends and professors, and use your own reasoning
  - Is the direction likely to be much needed and have a bright future?
  - Do I have sufficient background to work on it?
  - Am I truly interested in it?
  - Does the direction attract long-term investigation?
- Is it OK to change or adjust it?
  - You may need to constantly adjust your research directions
  - Ex.: myself, from deductive DBs (recursive query processing) to data mining
10. Giving Your Research a Bigger Impact
- Necessity is the mother of invention
  - What is most needed in the next several years?
  - Will it have long-term impact or fade out soon?
- Innovative and thorough research
  - Is your approach fresh, innovative, somewhat ground-breaking?
  - Have you examined it systematically? Have you considered alternative or previously studied methods?
  - Can it be further improved?
- Two kinds of research topics: creative vs. improvement
  - Find new themes (new patterns, new methodologies, new directions)
  - Improve the existing solutions
- Never be tied to the existing solutions
  - First think about the problem independently, and work it out independently
  - Believe that you can always find new ways to improve it!
11. Discussions, Sparks, and Technical Meat
- Look before you leap
  - Careful and thorough thinking should come before implementing and testing
- Form small groups instead of working alone
  - Slides, emails, and weekly theme-based meetings or teleconferences
  - Raise questions on slides, related work, new designs, and proposed algorithms, and try to find ways to improve them
- Capture and harvest the sparks of thought
  - Many good ideas may come from a weak spark of thinking
  - Capture the sparks promptly and do not let them slip away
12. Case 1: ICDE'07 Best Student Paper Award
- Feida Zhu, Xifeng Yan, Jiawei Han, Philip S. Yu, and Hong Cheng, "Mining Colossal Frequent Patterns by Core Pattern Fusion", in Proc. 2007 Int. Conf. on Data Engineering (ICDE'07), Istanbul, Turkey, April 2007 (Best Student Paper Award)
- Identifying a problem that the current technology cannot solve, and its applications
  - Colossal patterns, bio-applications
- How was the paper generated? Progressive refinement
  - Slides → discussions → algorithms → discussions → experiments → new slides
- Smart ideas and technical innovation
13. Case 2: ICDE'06 Best Student Paper Award
- Hector Gonzalez, Jiawei Han, Xiaolei Li, and Diego Klabjan, "Warehousing and Analysis of Massive RFID Data Sets", in Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006
- Necessity is the mother of invention
  - Working on a key problem: RFID data warehousing
- The key solution: deep compression
  - How deep is deep? Maximal sharing of bulky movements
- Multiple designs, refinements, testing, and refinement again
  - Slides → discussions → algorithms → discussions → experiments → new slides
- Constant brainstorming
15. Learning from Others: Reviews and Judgment
- A very important part of Ph.D. training is judgment: judging others' work as well as your own
- A good researcher should first be a good judge of research
- Reading a good research paper: first read the problem and try to solve it yourself
- Be active in serving as a reviewer: see how others evaluate the work and learn from the good judges
- Read survey papers and write your own simple surveys on the problems you intend to work on
16. Putting All the Eggs in One Basket?
- Working on several research problems or only on one?
  - Initially, working on more than one theme may help you test the water and settle on a promising theme that suits you
  - Even after you have focused on one theme, it is good to try slightly different problems
  - Productivity, alternative thoughts, adjustable solutions, and research collaborations
- Working with your friends and colleagues
  - Complement each other in strengths and expertise
17. Seminar Course: Continuous Training/Education
- Advanced seminars for DAIS and DM group
  - Running constantly, every semester
- Presenting your own work and getting feedback from the group
  - Mostly recently accepted conference papers
  - Requiring only a one-page summary/abstract
- Presenting good papers from recent top conferences: selecting only SIGKDD, SIGMOD, VLDB, ICDE, ICDM, SDM, WWW, ... conference papers published in the last 12 months
18. Conference and Journal Reviews
- Volunteering for conference and journal coordination
  - For each conference on which we serve as a PC member, one Ph.D. student volunteers as conference coordinator
  - S/he communicates with the group members to select papers and collect reviews, and I hold one or more rounds of thorough discussion with the coordinator to make sure the reviews are unbiased, comprehensive, and of high quality
  - The reviews are also ranked relative to one another and balanced
- A good exercise for all the participants
- Similar exercises for journal and proposal reviews
19. Semester Summary and Awards
- Award summary as a way to promote excellence in research
- Summary meeting each semester
  - Summary on each student's Web page and presentation
- Award voting with multiple grades: gold, silver, bronze, and honorable mention
  - Vote after the major conference evaluation results are out
  - Publish the award voting summary
  - Presents and Web publicity
- Award competition also promotes collaboration
20. Questions
21. Thanks and Questions
22. Create a Productive Research Group
- Selection of promising students
  - Training and selection of students from classes
  - Test run with research problems
  - Watch for sparks and working attitude
  - Written qualifications vs. oral ones
- Team organization
  - CS591 vs. meetings (start ending meetings)
  - Use students' expertise, strengths, and interests
  - Division of group work: everyone is in charge
  - Theme-based dynamic small research groups
- Encouraging students on their progress: papers, etc.
  - Semester summary, Web pages
  - Award competition
23. Group Administration/Public Relations Work (Sept. '06-Aug. '07)
- Group Webmaster (news, group Web page, pictures, etc.): Tianyi Wu
- Web-based research reference collections: Hong Cheng
- Hardware, equipment, and software master: Sang Kim
- TKDD Information Director: Xiaoxin Yin
- DAIS seminar coordinator: Deng Cai
- DAISY system administrator: Hector Gonzalez
- IlliMine project coordinator: Xiaolei Li
- Industry/visitor coordinator: Chao Liu
- Conference and journal review coordinators (3): Dong Xin, Jing Gao, and Chen Chen
- Research proposal coordinators (2): Feida Zhu and Jianlin Feng
- Social activity coordinators: Jaegil Lee, Ok-ran Jeong
24. Work on Promising Research Topics
- Selection of promising research topics
  - Select topics based on your strengths and interests
  - Putting all the eggs in one basket? You may work on 2-3 topics at the same time
- Discussion, debate, and active brainstorming
  - Capture and harvest the sparks of thought
- Two kinds of research topics: creative vs. improvement
  - Find completely new themes (new patterns, new methodologies, new directions)
  - Improve the existing solutions
- Never be tied to the existing solutions
  - First think about the problem independently, and work it out independently
  - Believe that you can always find new ways to improve it!