Open data has seen great progress in recent years, but new opportunities and challenges continue to emerge. Future progress requires greater attention to equity and security, along with a deeper understanding of context and culture.
"Collaboration between traditional civil society and civic technologies or between journalists and private-sector application providers is driving new uses of data that could highlight corruption, promote public integrity, or shape public policy debate," summarized Silvana Fumega, PhD, Researcher and Policy Director of ILDA, at a recent event hosted by the Wilson Center in collaboration with Google. The event, “Open Data: What’s Next in Policy and Practice?”, brought together representatives from academia, government, and industry to explore past progress and future opportunities to drive open data forward. The speakers agreed: open data involves more than just putting datasets online. Truly accessible and equitable open data requires tools, resources, collaboration, and standardization. Recommendations emerged from the panelists’ discussions of their current work and their analysis of gaps and opportunities in the open data movement.
1. Open data exists on a spectrum
Making datasets public is the first step toward leveling the scales between those who are using data and those who could. However, data access is not a binary of closed versus open. Like other open movements, open data exists on a spectrum, shaped by differing databases and differing standards. Realizing the potential promised by open data poses many challenges in both policy and practice.
Making open data not only FAIR (Findable, Accessible, Interoperable, and Reusable) but also equitable is the next frontier that industry, academia, and government must tackle, and this cannot happen in silos. Equity doesn’t stop at access; it also concerns who is asking the questions and which questions are prioritized. Stefaan G. Verhulst, Co-Founder, Chief of R&D, and Director of the Data Program of the Governance Laboratory (The GovLab), pointed out that “not everyone is part of formulating the questions and as a result, we already have an equity issue from the first part of the scientific enterprise.” Chris Marcum, Assistant Director for Open Science and Data Policy, White House Office of Science & Technology Policy, responded by saying “we need to ensure community is engaged in providing data, engaged in using this data, [and] are also asking the questions that are interesting to them.”
2. Open data needs to be reusable and interoperable
Analyzing data can be time-consuming and costly. Data is shared in different formats, with fields expressing the same thing in different ways. Something as simple as two datasets recording a field differently (e.g., “State” represented as Texas in one and TX in another) requires additional cleaning before the data can be used. Panelists called for standards, from how data is stored (CSV, PDF, etc.) to how it is normalized. Guha V. Ramanathan, PhD, gave an example, stating “the Bureau of Labor Statistics and the Labor and Economic Analysis Division definitions of employment are different, so today it is up to the user of the data to dissolve all these differences." Reusability creates transparency by allowing research results to be verified more broadly. However, even when data is coded and formatted uniformly, there can be underlying disagreements about definitions and categorization.
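The cleaning burden the panelists describe can be made concrete with a small sketch. The example below is illustrative only: the field names, state mapping, and figures are hypothetical, not drawn from any agency dataset, and it simply shows why two datasets that encode “State” differently cannot be joined until the field is normalized to one canonical form.

```python
# Illustrative only: a tiny normalization step that lets two datasets
# with differently encoded "state" fields be merged on a shared key.

STATE_ABBREVIATIONS = {
    "texas": "TX",
    "california": "CA",
    "new york": "NY",
}

def normalize_state(value: str) -> str:
    """Map full state names and abbreviations to one canonical form."""
    cleaned = value.strip().lower()
    return STATE_ABBREVIATIONS.get(cleaned, cleaned.upper())

# Two hypothetical records describing the same state, encoded differently:
dataset_a = [{"state": "Texas", "employment": 13_500_000}]
dataset_b = [{"state": "TX", "labor_force": 14_200_000}]

# After normalization, the rows can be combined under a single key.
merged = {}
for row in dataset_a + dataset_b:
    key = normalize_state(row.pop("state"))
    merged.setdefault(key, {}).update(row)

print(merged)  # {'TX': {'employment': 13500000, 'labor_force': 14200000}}
```

In practice this mapping table would itself be a maintained standard (or a published vocabulary) rather than something each data user rebuilds, which is precisely the standardization the panelists call for.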
3. Context and culture need to be taken into account when assessing open data, especially internationally, creating the need for shared definitions
The open data movement is not US-centric, and an international lens shows the need for more collaboration. Challenges arise around different legal and social contexts, and definitions may differ drastically from country to country. Although open data principles remain largely the same, the context can differ. “We can't really say that the infrastructure is the same in a developed country as in a developing one. The barriers and the gaps and the infrastructure are clearly not the same, so we need to learn a little bit about the context to also understand what we are measuring, despite that the survey is exactly the same,” said Dr. Fumega. For example, different Latin American countries have different definitions of femicide, which affects how data is collected and categorized. Context adds value when analyzing these datasets.
4. Incentive structures should encourage open data
Dr. Ramanathan stated, “You’re not going to get usable effective open data without the right incentive structures.” For example, the Department of Energy (DOE) is promoting Digital Object Identifiers (DOIs) for datasets and offering affiliated researchers support in using them. Awards and other rewards can incentivize usable datasets that bolster research. New incentives could mean overhauling promotion and tenure systems to acknowledge open datasets alongside publications; naming and promoting datasets in the same way as research studies could foster a culture that more explicitly values this work.
5. Technical and sociocultural barriers limit the use of open data
Technical barriers include the need for massive computing power, tedious data cleaning, and navigating discipline-specific databases. Many people, groups, and researchers lack the skills or funding required to process this data. Although this is a barrier, current initiatives are funding and providing resources to alleviate these well-identified issues. However, the sociocultural barriers are harder to pinpoint and overcome. Elena Steponaitis, PhD, Program Executive in NASA's Chief Science Data Office, described the Transform to Open Science (TOPS) mission at NASA, saying “TOPS aims to increase understanding and adoption of open science and accelerate major scientific discoveries.” This initiative, and its certificate program, shows what capacity sharing could look like. Building capacity for an open ecosystem can bridge silos through common language and goals.
In recent years, open data has seen great progress, and there is still tremendous potential for greater impact. A huge amount of rich data is now publicly available, there is a foundation of government support at the highest levels, and open data is the default for many research programs. However, it is often still treated as a side project. Additionally, the momentum of many early leaders is slowing. According to OpenGov, we are in the “Third Wave” of this movement; this wave “takes a much more purpose-directed approach than prior waves; it seeks not simply to open data, but to do so in a way that focuses on impactful reuse, especially through inter-sectoral collaborations and partnerships.” It is time to discuss next steps and to recognize that open data for its own sake is not enough to drive the change that is possible. The Wilson Center has previously explored the importance of data quality from open tools, including where data fits in the open science community.
Many of the needs in creating more open and FAIR data point to one thing: the need for cultural change. Collaboration among expert communities in government, academia, and private industry will build bridges between them. Additionally, creating incentives to collect and publish open data that is reusable and interoperable will help make open data more equitable. Keeping these needs in mind can foster greater equity and accessibility in the open data movement.
The Science and Technology Innovation Program (STIP) serves as the bridge between technologists, policymakers, industry, and global stakeholders.