The SEND Data Format

The Standard for Exchange of Nonclinical Data (SEND) format is currently gaining traction within the pharmaceutical industry, largely thanks to its adoption by the US Food and Drug Administration (FDA) as a format for submitting data. In many respects this is a very good thing. There is a distinct lack of standardisation in organisations dealing with pre-clinical and safety data, and the further back into the R&D pipeline you go, the worse it gets. Many in vivo groups are still wedded to spreadsheets for collecting and distributing their data, and at worst you will find ad hoc templates springing up on a user’s whim.

I was heavily involved from the beginning in prototyping, designing and implementing the PredICT pre-clinical data platform at AstraZeneca, working exclusively on this project for four years. During that time I tried to assimilate as much of the workflow, the pain points and the irregularities of the in vivo oncology group as I could, to understand what was required to fix the issue of data standardisation. I’ll probably go a little deeper into my findings in a future post. At the time, the SEND format was still in deep discussion and not mature enough for our purposes, so we created (with the help of Tessella) our own standard, the in vivo data contract (IVDC). The IVDC is a very elegant specification of this kind of data, and in many respects totally intuitive.

When I encountered SEND for the first time I was surprised by how clumsy it is in comparison. The worst of its faults seems to be the specification of data domains. I could write an essay on how this is bad data design, how it seems to me to specify data in the schema itself, and how a little abstraction would help it expand seamlessly in future. For these and other reasons, I’m a little surprised it has gained any traction at all in the industry. I think the PredICT team were hoping that the IVDC would become the industry standard in this space; it’s certainly much easier to work with. However, the IVDC hasn’t been published as an open data standard and is still part of AZ’s IP, which rules out its wider adoption for now.
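To make the point about data in the schema a little more concrete, here is a minimal Python sketch. It rests on several assumptions: the two rows are invented for illustration, the column names follow the domain-prefixed pattern used by SEND findings domains as I understand it (BWORRES for body weights, LBORRES for lab results, and so on), and the Observation class is a hypothetical generic record of my own. It is not the IVDC, which is unpublished, and it is not part of SEND.

```python
from dataclasses import dataclass

# Two invented rows shaped like SEND findings domains. The two-letter
# domain code is baked into the column names (BWORRES, LBORRES, ...),
# so every new domain brings its own set of column names.
bw_row = {"DOMAIN": "BW", "USUBJID": "STUDY1-001", "BWTESTCD": "BW",
          "BWORRES": "23.4", "BWORRESU": "g", "BWDY": 8}
lb_row = {"DOMAIN": "LB", "USUBJID": "STUDY1-001", "LBTESTCD": "ALT",
          "LBORRES": "41", "LBORRESU": "U/L", "LBDY": 8}


@dataclass(frozen=True)
class Observation:
    """Hypothetical generic record: the domain is a value, not part of the schema."""
    domain: str
    subject_id: str
    test_code: str
    result: str
    unit: str
    study_day: int


def to_observation(row: dict) -> Observation:
    """Strip the domain prefix from the column names to get one common shape."""
    prefix = row["DOMAIN"]
    return Observation(
        domain=prefix,
        subject_id=row["USUBJID"],
        test_code=row[f"{prefix}TESTCD"],
        result=row[f"{prefix}ORRES"],
        unit=row[f"{prefix}ORRESU"],
        study_day=row[f"{prefix}DY"],
    )


observations = [to_observation(r) for r in (bw_row, lb_row)]
for obs in observations:
    print(obs)
```

With the domain held as an ordinary value, a new kind of measurement would only need new rows rather than a new set of column names, which is roughly the kind of abstraction I have in mind.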

I left AZ in November and set up sci-telligent partly to explore these ideas and to produce tools that let users work with this kind of data easily. A data format should never be the blocker for doing smart analytics.

I’m going to blog about my journey investigating SEND, and my findings from working with it, using only tools available in the public domain. I should add that these investigations are largely academic for now, and I’m in no way trying to compete with any commercial products.