About Me

I am currently a fifth-year Ph.D. candidate in Biostatistics at the University of Michigan. I work in the Song Lab under the supervision of Professor Peter X.K. Song. I am especially interested in developing new methods for big data, such as scalable and numerically stable methods for regression and machine learning. Besides methodological research, I have worked since 2013 as a data manager and analyst in the Data Management and Modeling Core of the Children's Environmental Health and Disease Prevention Center at the University of Michigan.
Method Interests: Data integration and harmonization, high-dimensional statistical inference, regression parameter clustering, statistical computing, optimization.
Application Interests: Metabolomics, environmental health, epigenetics, bioinformatics, children’s health, statistical quality control.

Doctoral Dissertation Research:

  • Develop and implement statistical methods for regression coefficient clustering in data integration. Applications include quantification of data heterogeneity, clustering of longitudinal patient trajectories, and detection of outlying studies.
  • An R package, metafuse, has been developed for the above purpose.
  • Efficient and robust divide-and-conquer methods for generalized linear regression with variable selection and inference.



  • Fusion Learning Algorithm to Combine Partially Heterogeneous Cox Models
    (Invited minor revision) -- Tang, L., Zhou, L., and Song, P.X.K.
    2017+ -- Computational Statistics
  • Method of Divide-and-Combine in Regularized Generalized Linear Models for Big Data
    [Link] -- Tang, L., Zhou, L., and Song, P.X.K.
    2016 -- arXiv
  • Fused Lasso Approach in Regression Coefficients Clustering -- Learning Parameter Heterogeneity in Data Integration
    [Link] -- Tang, L., and Song, P.X.K.
    2016 -- Journal of Machine Learning Research
  • Lipid Metabolism is a Key Mediator of Developmental Epigenetic Programming
    [Link] -- Marchlewicz, E.H., Dolinoy, D.C., Tang, L., Milewski, S., Jones, T.R., Goodrich, J.M., Soni, T., Domino, S.E., Song, P.X.K., Burant, C. and Padmanabhan, V.
    2016 -- Scientific Reports
  • A LASSO Method to Identify Protein Signature Predicting Post-transplant Renal Graft Survival
    [Link] -- Zhou, L., Tang, L., Song, A.T., Cibrik, D., and Song, P.X.K.
    2016 -- Statistics in Biosciences
  • Automatic Quality Control of Transportation Reports Using Statistical Language Processing
    [Link] -- Gerber, M.S., and Tang, L.
    2013 -- IEEE Transactions on Intelligent Transportation Systems


  • Map-Reduce functions for 'MODAC'
    Map-Reduce functions for fitting generalized linear models (GLMs) on a Hadoop cluster. When a dataset is extremely large (in terabytes), storing it and fitting GLMs on a local machine becomes impossible. The provided functions fit GLMs and provide inference for big data on a distributed file system, using mapper and reducer functions in a parallel framework. The method is numerically robust. [Link]
  • R package 'metafuse'
    Used for regression coefficient clustering in data integration. If the data are heterogeneous, it can visualize the clustering pattern of the subsets with a hierarchical dendrogram and output the coefficient estimates. [Link]
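
To give a rough feel for the mapper and reducer idea described above, here is a minimal sketch in plain Python. It is not the MODAC code and does not run on Hadoop. It assumes a logistic model with an intercept and one covariate, and it combines the chunk estimates by information weighting, a standard divide-and-conquer rule that may differ from what the linked functions do. All names (`score_and_info`, `fit_logistic_chunk`, `combine`, `simulate`) and the simulated data are hypothetical.

```python
import math
import random


def score_and_info(beta, xs, ys):
    """Score vector and 2x2 Fisher information of logit P(y=1) = b0 + b1*x."""
    b0, b1 = beta
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)


def solve2(h, v):
    """Solve H z = v for a symmetric 2x2 matrix H stored as (h00, h01, h11)."""
    h00, h01, h11 = h
    det = h00 * h11 - h01 * h01
    return ((h11 * v[0] - h01 * v[1]) / det, (h00 * v[1] - h01 * v[0]) / det)


def fit_logistic_chunk(xs, ys, iters=25):
    """Mapper (hypothetical): Newton-Raphson fit on one chunk of the data.

    Returns the local estimate and the information evaluated at that estimate.
    """
    beta = (0.0, 0.0)
    for _ in range(iters):
        g, h = score_and_info(beta, xs, ys)
        step = solve2(h, g)
        beta = (beta[0] + step[0], beta[1] + step[1])
    _, info = score_and_info(beta, xs, ys)
    return beta, info


def combine(results):
    """Reducer (hypothetical): information-weighted average of chunk estimates,
    beta = (sum_k I_k)^{-1} sum_k I_k beta_k."""
    s00 = s01 = s11 = t0 = t1 = 0.0
    for (b0, b1), (h00, h01, h11) in results:
        s00 += h00
        s01 += h01
        s11 += h11
        t0 += h00 * b0 + h01 * b1
        t1 += h01 * b0 + h11 * b1
    return solve2((s00, s01, s11), (t0, t1))


def simulate(n, beta, seed=0):
    """Simulated data for illustration only (assumed model, not real data)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [
        1 if rng.random() < 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x))) else 0
        for x in xs
    ]
    return xs, ys


xs, ys = simulate(4000, (-0.5, 1.0))
chunks = [(xs[i:i + 500], ys[i:i + 500]) for i in range(0, 4000, 500)]
beta_dc = combine([fit_logistic_chunk(cx, cy) for cx, cy in chunks])
beta_full, _ = fit_logistic_chunk(xs, ys)
print("divide-and-combine:", beta_dc)
print("full-data fit:     ", beta_full)
```

In this sketch each chunk only has to send back two numbers for the estimate and three for the information matrix, so the reducer never touches the raw data. The two printed estimates should be close to each other when every chunk is large enough for its local fit to be stable.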

Random Stuff

This page was last modified on: 12/18/2017