{"id":331451,"date":"2010-02-17T10:30:35","date_gmt":"2010-02-17T15:30:35","guid":{"rendered":"http:\/\/blog.epa.gov\/blog\/?p=1330"},"modified":"2010-02-17T10:30:35","modified_gmt":"2010-02-17T15:30:35","slug":"science-wednesday-onair-huge-datasets-pose-challenges-but-hold-promise","status":"publish","type":"post","link":"https:\/\/mereja.media\/index\/331451","title":{"rendered":"Science Wednesday: OnAir &#8211; Huge Datasets Pose Challenges but Hold Promise"},"content":{"rendered":"<p>During a recent visit to Harvard, I sat down with Francesca Dominici, a biostatistician and former director of the <a href=\"http:\/\/www.jhsph.edu\/particulate_matter\/\">Johns Hopkins Particulate Matter Research Center<\/a>.<\/p>\n<p>Dominici confessed that she has spent much of her time at Harvard thus far figuring out how to transfer, store and manage all of the data that has accumulated over years of research.<\/p>\n<p>How hard could it be to move data, I wondered?<\/p>\n<p>Her projects at Hopkins included a national study showing hospital admissions and mortality associated with exposure to air pollution particles.<\/p>\n<p>\u201cWe\u2019re using all data on particulate matter and particulate matter composition for every single monitoring station in the United States from the first date it has been available up until 2007.\u201d<\/p>\n<p>This includes years\u2019 worth of ambient air data from every zip code in the country.<\/p>\n<p>To get information on human health effects, Dominici uses Medicare data, including \u201cevery hospitalization for every person older than 65,\u201d amounting to over 48 million subjects.<\/p>\n<p>In all, the data (which continue to grow) add up to seven terabytes, Dominici said.<\/p>\n<p>How much is a terabyte? It would take 1,000 1-gigabyte flash drives to hold a terabyte. 
Now, imagine 7,000 of those flash drives\u2014and you can wrap your mind around how much data Dominici has on her hands.<\/p>\n<p>As a way to cope with the mass of information, Dominici explained that it helps to pick and choose what data to work with at any given time. She compared the process to using a storage closet\u2014where you can put away winter clothes during the summer months and take them out again when it gets cold.<\/p>\n<p>\u201cThe good news\u2026 is that you don\u2019t need to manage it dynamically, all at once,\u201d she said.<\/p>\n<p>Despite the challenges of handling and analyzing such a vast amount of information, Dominici thinks the efforts will be fruitful.<\/p>\n<p>\u201cI have high confidence in the national study because I can see real improvements in getting sharper results as more data becomes available,\u201d she said.<\/p>\n<p>One study using the data, published in the Journal of the American Medical Association (JAMA), showed that causes of death and hospitalization related to air pollution varied across different parts of the country. \u201cCardiovascular risks tended to be higher in counties located in the Eastern region of the United States,\u201d the study reported.<\/p>\n<p>As analysis continues, other questions about air pollution risks will be answered. For now though, Dominici is neck deep in data, and it seems she likes it that way.<\/p>\n<p>\u201cAs a statistician, I really like to do this because I can have an impact,\u201d she said.<\/p>\n<p>\u201cGoing from seven terabytes of data to estimates that have an impact on policy\u2026 it\u2019s very, very satisfying.\u201d<\/p>\n<p><em>About the Author: A student contractor with EPA\u2019s Office of Research and Development, Becky Fried is a regular \u201cScience Wednesday\u201d contributor. 
<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>During a recent visit to Harvard, I sat down with Francesca Dominici, a biostatistician and former director of the Johns Hopkins Particulate Matter Research Center. Dominici confessed that she has spent much of her time at Harvard thus far figuring out how to transfer, store and manage all of the data that has accumulated over [&hellip;]<\/p>\n","protected":false},"author":6469,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-331451","post","type-post","status-publish","format-standard","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts\/331451","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/users\/6469"}],"replies":[{"embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/comments?post=331451"}],"version-history":[{"count":0,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts\/331451\/revisions"}],"wp:attachment":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/media?parent=331451"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/categories?post=331451"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/tags?post=331451"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}