We consider a periodic-review dual-sourcing inventory system in which the expedited supplier is faster and more costly, while the regular supplier is slower and cheaper. Under full demand distributional information, it is well known that the optimal policy is extremely complex, but the celebrated Tailored Base-Surge (TBS) policy performs near-optimally. Under such a policy, a constant order is placed at the regular source in each period, while the order placed at the expedited source follows a simple order-up-to rule. In this paper, we assume that the firm does not know the demand distribution a priori and makes adaptive inventory ordering decisions in each period based only on past sales (i.e., censored demand) data. The standard performance measure is regret, which is the cost difference between a feasible learning algorithm and the clairvoyant (full-information) benchmark. When the benchmark is chosen to be the (full-information) optimal Tailored Base-Surge policy, we develop the
first nonparametric learning algorithm that admits a regret bound of O(T^(1/2) (log T)^3 log log T), which is provably tight up to a logarithmic factor. Leveraging the structure of this problem, our approach combines the power of bisection search and stochastic gradient descent, and also involves a delicate high-probability coupling argument between the dynamics of our system and those of the clairvoyant optimal system. We also develop several technical results that are of independent interest.
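
To make the TBS policy described above concrete, the following is a minimal simulation sketch, not the learning algorithm developed in the paper. It rests on assumptions that the abstract does not state: the expedited order arrives immediately, the regular order arrives after `lead_time` periods, unmet demand is lost, and the cost parameters `c_r`, `c_e`, `h`, and `p` are illustrative. The function names `simulate_tbs` and `best_tbs_on_grid` are hypothetical. The grid search only stands in for the clairvoyant benchmark on a single demand path; it is not the bisection and stochastic gradient descent procedure.

```python
from collections import deque


def simulate_tbs(demands, r, S, lead_time=2, c_r=1.0, c_e=2.0, h=0.5, p=4.0):
    """Average per-period cost of a TBS policy (r, S) on one demand path.

    Assumptions (illustrative only): expedited lead time is zero, regular
    lead time is `lead_time`, and unmet demand is lost at penalty `p`.
    """
    # Regular orders in transit; seeded as if r had been ordered before t = 0.
    pipeline = deque([r] * lead_time)
    on_hand = 0.0
    total_cost = 0.0
    for d in demands:
        # Place the constant regular order, then receive the oldest one.
        pipeline.append(r)
        on_hand += pipeline.popleft()
        # Expedited order raises on-hand inventory up to the level S.
        q_e = max(0.0, S - on_hand)
        on_hand += q_e
        # Only sales are observed; demand above on-hand stock is lost.
        sales = min(d, on_hand)
        lost = d - sales
        on_hand -= sales
        total_cost += c_r * r + c_e * q_e + h * on_hand + p * lost
    return total_cost / len(demands)


def best_tbs_on_grid(demands, r_grid, S_grid, **kwargs):
    """Brute-force the lowest-cost (r, S) pair on a grid for a given path."""
    return min(
        (simulate_tbs(demands, r, S, **kwargs), r, S)
        for r in r_grid
        for S in S_grid
    )
```

For example, `best_tbs_on_grid(demands, range(0, 11), range(0, 21))` returns a tuple `(average_cost, r, S)` for a given list `demands`. The gap between the cumulative cost of any other ordering rule and this benchmark on the same path gives an empirical analogue of the regret defined above.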