UNIT - I: Data Warehousing and Online Analytical Processing
2 Marks Questions:
- Define a data warehouse and explain its basic concepts.
- What is OLAP and how does it differ from OLTP?
- Explain data objects and attribute types in data mining.
5 Marks Questions:
- Explain data warehouse modeling concepts, including the data cube and OLAP operations.
- Describe data warehouse design principles and implementation strategies.
- Discuss cloud data warehouse architecture and its advantages over traditional data warehouse systems.
- Explain data mining technologies, applications and major issues in pattern mining.
- Describe basic statistical descriptions of data including measures of central tendency and dispersion.
- Explain data visualization techniques and methods for measuring data similarity and dissimilarity.
UNIT - II: Data Preprocessing
2 Marks Questions:
- What are the main steps involved in data preprocessing?
- Define data cleaning and state its importance in data mining.
- What is data discretization and when is it used?
5 Marks Questions:
- Explain data cleaning techniques for handling missing values, noisy data and inconsistent data.
- Describe the data integration process and the challenges of combining data from multiple sources.
- Explain various data reduction techniques including dimensionality reduction and numerosity reduction.
- Describe data transformation methods including normalization, aggregation and discretization.
- Compare different data preprocessing techniques and their applications in data mining.
- Explain the complete data preprocessing pipeline with practical examples.
UNIT - III: Classification
2 Marks Questions:
- What are the basic concepts of classification in data mining?
- Define entropy and information gain in decision tree induction.
- What is Bayes' theorem, and how is it applied in classification?
5 Marks Questions:
- Explain the general approach to solving classification problems, including the evaluation metrics used.
- Describe the decision tree induction algorithm and its attribute selection measures.
- Explain tree pruning techniques and scalability issues in decision tree induction.
- Describe Bayesian classification methods, including Naïve Bayes classification, with examples.
- Explain rule-based classification techniques and their advantages over other methods.
- Discuss model evaluation and selection techniques including cross-validation and performance metrics.
UNIT - IV: Association Analysis
2 Marks Questions:
- Define frequent itemsets and association rules.
- What is the difference between support and confidence in association rules?
- Explain the Apriori property in frequent itemset generation.
5 Marks Questions:
- Explain the problem definition of association analysis using a market basket analysis example.
- Describe frequent itemset generation techniques and the Apriori algorithm.
- Explain the rule generation process, including confidence-based pruning methods.
- Describe the complete Apriori algorithm for association rule mining with numerical examples.
- Explain compact representation techniques for frequent itemsets.
- Describe the FP-Growth algorithm and compare it with the Apriori algorithm.
UNIT - V: Cluster Analysis
2 Marks Questions:
- What is cluster analysis and why is it important in data mining?
- Define the different types of clusters identified by clustering techniques.
- What are the strengths and weaknesses of the DBSCAN algorithm?
5 Marks Questions:
- Give an overview of cluster analysis and explain its basics, covering the different clustering techniques.
- Describe the basic K-means algorithm with numerical examples and convergence criteria.
- Explain additional issues in K-means clustering and the bisecting K-means algorithm.
- Describe the agglomerative hierarchical clustering algorithm, including dendrogram construction.
- Explain the DBSCAN algorithm, including its density-based approach and parameter selection.
- Compare the K-means, hierarchical clustering and DBSCAN algorithms and their applications.