H. Almuallim and T. G. Dietterich, "Learning with many irrelevant features," in Proc. Ninth National Conf. Artificial Intelligence (AAAI-91), 1991, pp. 547-552.
Y. Asahiro, K. Iwama, H. Tamaki, and T. Tokuyama, "Greedily finding a dense subgraph," J. Algorithms, vol. 34, no. 1, pp. 203-221, 2000.
R. Battiti, "Using mutual information for selecting features in supervised neural net learning," IEEE Trans. Neural Netw., vol. 5, no. 4, pp. 537-550, 1994.
D.A. Bell and H. Wang, "A formalism for relevance and its application in feature subset selection," Mach. Learn., vol. 41, no. 2, pp. 175-195, 2000.
Y. Benjamini and Y. Hochberg, "Controlling the false discovery rate: A practical and powerful approach to multiple testing," J. Royal Statist. Soc. Ser. B, vol. 57, no. 1, pp. 289-300, 1995.
A. Billionnet and F. Calmels, "Linear programming for the 0-1 quadratic knapsack problem," Eur. J. Oper. Res., vol. 92, pp. 310-325, 1996.
A. Blum and P. Langley, "Selection of relevant features and examples in machine learning," Artif. Intell., vol. 97, pp. 245-271, 1997.
A. Blum and R. L. Rivest, "Training a 3-node neural network is NP-complete," in Machine Learning: From Theory to Applications, ser. Lecture Notes in Computer Science, vol. 661, 1993.
U. M. Braga-Neto, "Fads and fallacies in the name of small-sample microarray classification," IEEE Signal Process. Mag., Special Issue on Signal Processing Methods in Genomics and Proteomics, vol. 24, no. 1, pp. 91-99, 2007.
T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
M. Dash and H. Liu, "Feature selection for classification," Intell. Data Anal., vol. 1, no. 1-4, pp. 131-156, 1997.
S. Davies and S. Russell, "NP-completeness of searches for smallest possible feature sets," in Proc. AAAI Fall Symp. Relevance, 1994.
T. Van den Bulcke, K. Van Leemput, B. Naudts, P. van Remortel, H. Ma, A. Verschoren, B. De Moor, and K. Marchal, "SynTReN: A generator of synthetic gene expression data for design and analysis of structure learning algorithms," BMC Bioinformatics, vol. 7, no. 1, p. 43, 2006.
C. Ding and H. Peng, "Minimum redundancy feature selection from microarray gene expression data," J. Bioinform. Comput. Biol., vol. 3, no. 2, pp. 185-205, 2005.
J. Dougherty, R. Kohavi, and M. Sahami, "Supervised and unsupervised discretization of continuous features," in Int. Conf. Machine Learning, 1995, pp. 194-202.
W. Duch, T. Winiarski, J. Biesiada, and A. Kachel, "Feature selection and ranking filters," in Int. Conf. Artificial Neural Networks (ICANN) and Int. Conf. Neural Information Processing (ICONIP), Jun. 2003, pp. 251-254.
F. Fleuret, "Fast binary feature selection with conditional mutual information," J. Mach. Learn. Res., vol. 5, pp. 1531-1555, 2004.
I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," J. Mach. Learn. Res., vol. 3, pp. 1157-1182, 2003.
A. Jain and D. Zongker, "Feature selection: Evaluation, application, and small sample performance," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, 1997.
A. Jakulin and I. Bratko, "Quantifying and visualizing attribute interactions," 2003.
R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artif. Intell., vol. 97, no. 1-2, pp. 273-324, 1997.
I. Kojadinovic, "Relevance measures for subset variable selection in regression problems based on k-additive mutual information," Comput. Statist. Data Anal., vol. 49, 2005.
D. Koller and M. Sahami, "Toward optimal feature selection," in Int. Conf. Machine Learning, 1996, pp. 284-292.
I. Kononenko, "Estimating attributes: Analysis and extensions of RELIEF," in Eur. Conf. Machine Learning, 1994, pp. 171-182.
A. A. Margolin, I. Nemenman, K. Basso, C. Wiggins, G. Stolovitzky, R. Dalla Favera, and A. Califano, "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context," BMC Bioinformatics, vol. 7, 2006.
W. J. McGill, "Multivariate information transmission," Psychometrika, vol. 19, no. 2, pp. 97-116, 1954.
P. Merz and B. Freisleben, "Greedy and local search heuristics for unconstrained binary quadratic programming," J. Heuristics, vol. 8, no. 2, pp. 197-213, 2002.
P. E. Meyer and G. Bontempi, "On the use of variable complementarity for feature selection in cancer classification," in Applications of Evolutionary Computing: EvoWorkshops, F. Rothlauf et al., Eds., 2006, vol. 3907, Lecture Notes in Computer Science, pp. 91-102.
P. E. Meyer, O. Caelen, and G. Bontempi, "Speeding up feature selection by using an information theoretic bound," in Proc. 17th Belgian-Dutch Conf. Artificial Intelligence (BNAIC'05), KVAB, 2005.
P. E. Meyer, K. Kontos, F. Lafitte, and G. Bontempi, "Information-theoretic inference of large transcriptional regulatory networks," EURASIP J. Bioinform. Syst. Biol., 2007.
L. Paninski, "Estimation of entropy and mutual information," Neural Comput., vol. 15, no. 6, pp. 1191-1253, 2003.
H. Peng and F. Long, "An efficient max-dependency algorithm for gene selection," in 36th Symp. Interface: Computational Biology and Bioinformatics, May 2004.
H. Peng, F. Long, and C. Ding, "Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1226-1238, 2005.
D. Pisinger, "Upper bounds and exact algorithms for dispersion problems," Comput. Oper. Res., vol. 33, pp. 1380-1398, 2006.
G. Provan and M. Singh, "Learning Bayesian networks using feature selection," in Fifth Int. Workshop on Artificial Intelligence and Statistics, 1995, pp. 450-456.
D. W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization. New York: Wiley, 1992.
M. Studený and J. Vejnarová, "The multiinformation function as a tool for measuring stochastic dependence," in Proc. NATO Advanced Study Institute on Learning in Graphical Models, 1998, pp. 261-297.
N. Tishby, F. Pereira, and W. Bialek, "The information bottleneck method," in Proc. 37th Annu. Allerton Conf. Communication, Control and Computing, 1999.
G. D. Tourassi, E. D. Frederick, M. K. Markey, and C. E. Floyd, Jr., "Application of the mutual information criterion for feature selection in computer-aided diagnosis," Med. Phys., vol. 28, no. 12, pp. 2394-2402, 2001.
G. Trunk, "A problem of dimensionality: A simple example," IEEE Trans. Pattern Anal. Mach. Intell., vol. 1, no. 3, pp. 306-307, 1979.
I. Tsamardinos and C. Aliferis, "Towards principled feature selection: Relevancy, filters, and wrappers," in Proc. Ninth Int. Workshop on Artificial Intelligence and Statistics (AISTATS), 2003.
M. J. van de Vijver, Y. D. He, L. J. van't Veer, H. Dai, A. A. Hart, D. W. Voskuil, G. J. Schreiber, J. L. Peterse, C. Roberts, M. J. Marton, M. Parrish, D. Atsma, A. Witteveen, A. Glas, L. Delahaye, T. van der Velde, H. Bartelink, S. Rodenhuis, E. T. Rutgers, S. H. Friend, and R. Bernards, "A gene-expression signature as a predictor of survival in breast cancer," New England J. Med., vol. 347, pp. 1999-2009, 2002.
L. J. van 't Veer, H. Dai, M. J. van de Vijver, Y. D. He, A. A. Hart, M. Mao, H. L. Peterse, K. van der Kooy, M. J. Marton, A. T. Witteveen, G. J. Schreiber, R. M. Kerkhoven, C. Roberts, P. S. Linsley, R. Bernards, and S. H. Friend, "Gene expression profiling predicts clinical outcome of breast cancer," Nature, vol. 415, pp. 530-536, 2002.
W. Wienholt and B. Sendhoff, "How to determine the redundancy of noisy chaotic time series," Int. J. Bifurc. Chaos, vol. 6, no. 1, pp. 101-117, 1996.
Y. Y. Yao, S. K. M. Wong, and C. J. Butz, "On information-theoretic measures of attribute importance," in Proc. Pacific-Asia Conf. Knowledge Discovery and Data Mining, 1999.
L. Yu and H. Liu, "Efficient feature selection via analysis of relevance and redundancy," J. Mach. Learn. Res., vol. 5, pp. 1205-1224, 2004.
Z. Zhao and H. Liu, "Searching for interacting features," in Proc. 20th Int. Joint Conf. Artificial Intelligence (IJCAI-07), 2007.