An overview of distance and similarity functions for structured data

S Ontañón - Artificial Intelligence Review, 2020 - Springer
The notions of distance and similarity play a key role in many machine learning approaches,
and artificial intelligence in general, since they can serve as an organizing principle by …

Eleven quick tips for data cleaning and feature engineering

D Chicco, L Oneto, E Tavazzi - PLOS Computational Biology, 2022 - journals.plos.org
Applying computational statistics or machine learning methods to data is a key component of
many scientific studies, in any field, but alone might not be sufficient to generate robust and …

[PDF] Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis

M Sugiyama - Journal of Machine Learning Research, 2007 - jmlr.org
Reducing the dimensionality of data without losing intrinsic information is an important
preprocessing step in high-dimensional data analysis. Fisher discriminant analysis (FDA) is …

[PDF] Marginalized kernels between labeled graphs

H Kashima, K Tsuda, A Inokuchi - Proceedings of the 20th international …, 2003 - cdn.aaai.org
A new kernel function between two labeled graphs is presented. Feature vectors are defined
as the counts of label paths produced by random walks on graphs. The kernel computation …

[BOOK] Kernel methods in computational biology

B Schölkopf, K Tsuda, JP Vert - 2004 - books.google.com
A detailed overview of current research in kernel methods and their application to
computational biology. Modern machine learning techniques are proving to be extremely …

A survey of kernels for structured data

T Gärtner - ACM SIGKDD Explorations Newsletter, 2003 - dl.acm.org
Kernel methods in general and support vector machines in particular have been successful
in various learning tasks on data represented in a single table. Much 'real-world' data …

Semi-supervised local Fisher discriminant analysis for dimensionality reduction

M Sugiyama, T Idé, S Nakajima, J Sese - Machine Learning, 2010 - Springer
When only a small number of labeled samples are available, supervised dimensionality
reduction methods tend to perform poorly because of overfitting. In such cases, unlabeled …

[PDF] Fast methods for kernel-based text analysis

T Kudo, Y Matsumoto - Proceedings of the 41st Annual Meeting of …, 2003 - aclanthology.org
Kernel-based learning (e.g., Support Vector Machines) has been successfully applied to
many hard problems in Natural Language Processing (NLP). In NLP, although feature …

An application of boosting to graph classification

T Kudo, E Maeda, Y Matsumoto - Advances in neural …, 2004 - proceedings.neurips.cc
This paper presents an application of Boosting for classifying labeled graphs, general
structures for modeling a number of real-world data, such as chemical compounds, natural …

[BOOK] Adaptive stream mining: Pattern learning and mining from evolving data streams

A Bifet - 2010 - books.google.com
This book is a significant contribution to the subject of mining time-changing data streams
and addresses the design of learning algorithms for this purpose. It introduces new …