Analysis of data clustering techniques for football news
Abstract
Data clustering is an unsupervised learning technique that searches for hidden patterns in a data set. To do so, it partitions the set into subgroups whose members are similar to one another and different from the members of other subgroups. This work investigates four clustering techniques (K-Means, hierarchical clustering, DBSCAN and Gaussian mixture models) applied to news about the Brazilian Football Championship. The goal is to analyze how these techniques work and to explore ways of identifying patterns in the data. In the first stage, the data were pre-processed, including tokenization and stop-word removal. The news articles were then represented with TF-IDF, and their dimensionality was reduced with Latent Semantic Analysis (LSA). The news was first clustered with the number of groups fixed at 21, corresponding to the number of teams participating in the championship. In this setting, K-Means and the Gaussian Mixture Model both reached an accuracy of 75%, outperforming the other techniques. Additional experiments were carried out without fixing the number of clusters in advance, using grid search to find the configuration with the best silhouette coefficient. In these experiments, the algorithms produced between 25 and 32 groups, suggesting that this range is appropriate for dividing the news collection.
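The pipeline summarized in the abstract (pre-processing, TF-IDF representation, clustering with a fixed number of groups, and a silhouette-guided grid search over the number of groups) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation and rests on several assumptions: the stop-word list and the headlines are hypothetical, only K-Means is shown (not hierarchical clustering, DBSCAN or Gaussian mixtures), the LSA step is omitted for brevity (in practice it is commonly a truncated SVD of the TF-IDF matrix, e.g. scikit-learn's `TruncatedSVD`), and K-Means uses a deterministic farthest-point initialization instead of the more common k-means++.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny Portuguese stop-word list (illustration only).
STOP_WORDS = {"o", "a", "os", "as", "de", "do", "da", "e", "em",
              "no", "na", "um", "uma", "com", "para"}


def tokenize(text):
    """Lowercase, keep runs of letters (accents included) and drop stop words."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs):
    """Return L2-normalized TF-IDF vectors (raw counts times smoothed idf)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def dist(u, v):
    """Euclidean distance; on unit vectors it grows monotonically with cosine distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kmeans(X, k, iters=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization (assumption)."""
    centroids = [X[0]]
    while len(centroids) < k:
        centroids.append(max(X, key=lambda x: min(dist(x, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(x, centroids[j])) for x in X]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(X, labels):
    """Mean silhouette coefficient (needs >= 2 clusters); singleton points score 0."""
    n = len(X)
    total = 0.0
    for i in range(n):
        same = [dist(X[i], X[j]) for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            continue
        a = sum(same) / len(same)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(X[i], X[j]) for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def best_k(X, k_values):
    """Grid search: return the k with the highest silhouette, plus all scores."""
    scores = {}
    for k in k_values:
        labels = kmeans(X, k)
        if len(set(labels)) > 1:  # the silhouette needs at least two clusters
            scores[k] = silhouette(X, labels)
    return max(scores, key=scores.get), scores


# Hypothetical headlines about two teams (not the study's data).
NEWS = [
    "Flamengo vence o clássico",
    "Flamengo vence no Maracanã",
    "Flamengo empata no Maracanã",
    "Palmeiras contrata atacante",
    "Palmeiras contrata um zagueiro",
    "Palmeiras vende o zagueiro",
]

if __name__ == "__main__":
    X = tfidf(NEWS)
    print(kmeans(X, 2))            # fixed number of groups: [0, 0, 0, 1, 1, 1]
    print(best_k(X, range(2, 5)))  # silhouette-guided search over k
```

On a real corpus, the fixed-k run would correspond to `kmeans(X, 21)` and the search range would be widened to cover values such as 25 to 32. The silhouette computed here costs O(n²) distance evaluations per candidate k, so a library implementation is preferable for large news collections.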
