Conference paper: Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language