Note that some papers could fall under different headings
Y. Altun and A. Smola, Unifying Divergence Minimization and Statistical Inference via Convex Duality link
(ML, MAP, GP classification and regression, graphical models, conditional random fields, sparse estimation methods are all mentioned)
M. Dudik and R. E. Schapire, Maximum entropy distribution estimation with generalized regularization link
For PAC-Bayesian bounds
Arindam Banerjee, On Bayesian Bounds, link
H. Snoussi, Geometry of Prior Selection link
(mixture of Gaussians, blind source separation)
Several papers by Zhu/Rohwer link, especially 1995-7
Huaiyu Zhu and Richard Rohwer, Information Geometry, Bayesian Inference, Ideal Estimates, and Error Decomposition link
Stephane Canu & Alex Smola, Kernel methods and the exponential family link
Kenji Fukumizu, Infinite dimensional exponential families by reproducing kernel Hilbert spaces, abstract
Jun Zhang, Nonparametric Information Geometry: Referential Duality and Representational Duality on Statistical Manifolds, link
Sumio Watanabe, link
(Layered models, such as neural nets, Bayesian networks)
Alex Smola, Summer School Taiwan 2006, lecture 2 link. (and conditional random fields)
J.-F. Cardoso, Dependence, correlation and Gaussianity in independent component analysis, Journal of Machine Learning Research, Vol. 4, pages 1177-1203, Dec. 2003. link
A. Paiva et al., Kernel Principal Components Are Maximum Entropy Projections link
M. Collins et al., A Generalization of Principal Component Analysis to the Exponential Family link
I. Csiszar and G. Tusnady, Information geometry and alternating minimization procedures. Statistics and Decisions, Supplement Issue, 1: 205-237, 1984.
A. Gunawardana and W. Byrne. Convergence theorems for generalized alternating minimization procedures. Journal of Machine Learning Research, (6):2049-2073, December 2005. link
T. Jebara et al., Maximum Entropy Discrimination link (and graphical models)
S. Canu and A. Smola, Kernel methods and the exponential family link.
Koji Tsuda, Information Geometry of Diffusion Kernels link and link
Justin Dauwels, On information-geometric aspects of graphical models and kernel machines link
Noboru Murata et al., Information Geometry of U-Boost and Bregman Divergence link
Shinto Eguchi, Information Geometry and Statistical Pattern Recognition link
Guy Lebanon, An Extended Cencov-Campbell Characterization of Conditional Information Geometry link (AdaBoost and logistic regression)
See also his thesis:
Guy Lebanon, Riemannian Geometry and Statistical Machine Learning link, includes a regularized AdaBoost (section 5.3)
Ikeda, S., Tanaka, T., and Amari, S. (2004). Stochastic reasoning, free energy, and information geometry. Neural Computation, 16. link
Shinto Eguchi and John Copas, Recent Developments in Discriminant Analysis from an Information Geometric Point of View link
E. Lauría, "Learning the structure of a Bayesian network," in Maximum Entropy and Bayesian Methods, AIP Conf. Proc., 2005, link
A Simple Approach for Finding Globally Optimal Bayesian Network Structure link (not MaxEnt)
C.-H. Yeang, An information geometric perspective on active learning link
Shun-ichi Amari, Hierarchy of Probability distributions link
Franz Josef Och, Hermann Ney, Discriminative Training and Maximum Entropy Models for Statistical Machine Translation link
V. Balasubramanian, MDL, Bayesian Inference and the Geometry of the Space of Probability Distributions, in Advances in Minimum Description Length: Theory and Applications, P.J. Grunwald et al. eds, pp. 81-99. MIT Press, 2005. link
C. Rodriguez, The ABC of Model Selection: AIC, BIC, and the New CIC. link
Joshua Goodman, Sequential Conditional Generalized Iterative Scaling, link, also link
Miscellaneous
Topsoe manuscripts, link
Shalizi, link
See also Fuchun Peng's Maximum Entropy Models list, link
Discussion about information geometry, link
Maximum entropy and central limit theorem, law of small numbers link