The magnetoencephalography (MEG) response to continuous auditory stimuli, such as speech, is commonly described using a linear filter, the auditory temporal response function (TRF). Although the components of sensor-level TRFs have been well characterized, the underlying neural sources responsible for these components are not well understood. In this work, we provide a unified framework for determining the TRFs of neural sources directly from the MEG data, by integrating the TRF and distributed forward source models into one and casting the joint estimation task as a Bayesian optimization problem. Although the resulting problem is non-convex, we propose efficient solutions that leverage recent advances in evidence maximization. We demonstrate the effectiveness of the resulting algorithm on both simulated and experimentally recorded MEG data from humans.
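As a minimal illustration of the TRF concept (not the source-space Bayesian method proposed here), a sensor-level TRF can be estimated by regularized least squares on a lag-expanded stimulus; the function name, lag count, and regularization value below are illustrative assumptions:

```python
import numpy as np

def estimate_trf(stimulus, response, n_lags, lam=1.0):
    """Estimate a linear temporal response function (TRF) via ridge regression.

    The TRF is the filter h such that response[t] ~ sum_k h[k] * stimulus[t-k].
    """
    T = len(stimulus)
    # Lagged design matrix: column k holds the stimulus delayed by k samples.
    X = np.zeros((T, n_lags))
    for k in range(n_lags):
        X[k:, k] = stimulus[:T - k]
    # Regularized least-squares solution: (X'X + lam*I)^{-1} X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ response)

# Simulated check: recover a known filter from its noisy convolved output.
rng = np.random.default_rng(0)
true_h = np.array([0.0, 1.0, 0.5, -0.3])
stim = rng.standard_normal(5000)
resp = np.convolve(stim, true_h)[:5000] + 0.01 * rng.standard_normal(5000)
est_h = estimate_trf(stim, resp, n_lags=4, lam=1e-3)
```

In practice the regularization weight is selected by cross-validation; the framework in this work instead estimates source-level TRFs jointly with the distributed forward model.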