Big Spatial Data seminar Part 1: Analytics in the Age of Big Data

Video in TIB AV-Portal: Big Spatial Data seminar Part 1: Analytics in the Age of Big Data

Formal Metadata

Title
Big Spatial Data seminar Part 1: Analytics in the Age of Big Data
Title of Series
Part Number
1
Number of Parts
2
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2013
Language
English

Content Metadata

Subject Area
Abstract
Big Spatial Data seminar - University of Alabama - July 25th 2013. Rahul discusses their NASA ACCESS funded online tool: Curated Data Albums, an online tool for gathering and presenting relevant data and information from distributed sources. These Data Albums are compiled collections of information related to events, with links to relevant data files (granules) from different instruments.
Dr. Rahul Ramachandran is deputy editor for Earth Science Informatics and a principal research scientist at the Information Technology and Systems Center at the University of Alabama in Huntsville. Rahul has pioneered the concept of data prospecting, which sits between data mining and data discovery. He is also chair of the newly formed Research Data Alliance working group on big data analytics. He graduated with a PhD in 2002 from the Atmospheric Science Department at the University of Alabama in Huntsville and was selected for the 2010 Presidential Early Career Award for Scientists and Engineers. In this presentation he speaks about his work on data discovery and big, complex data in the domain of atmospheric science.

Thank you for inviting me to give this talk. Here is the outline of my slides.
I'll introduce who I am and what I do, then talk a little bit about my research lab's work. My lab actually runs a data archive, I think very similar to yours, so there may be things of interest there. Then I'll talk about two ongoing data projects that might be of interest to people here: the first one looks at data and information aggregation, and the other one is big data, looking at mining and analytics.

I have a rather eclectic educational background; it's kind of all over the place. I started in engineering, that's what I did as an undergrad, but I also have degrees in atmospheric science and computer science.

A little bit about my center: we are a research lab. In terms of research focus, we have done a lot of work in data mining and knowledge discovery, and obviously work in the area of informatics. We do a fair amount of ontology work, and we build games that are used by the US military for training, so some of our funding comes from the three-letter agencies in the US.

We also run one of the twelve NASA data archives. It is a fully operational data archive: operational ingest, custom processing, and data distribution. Our primary data sets are lightning data; we also hold some of the ground validation field campaign data that NASA gathers for instrument validation, and we have the microwave data products.

If you look at the data life cycle, what has changed in recent years is that data is no longer treated as a second-class citizen; it is now treated as a first-class object. The importance of data is slowly being realized; as the quote goes, data is now a currency in science. What we try to do is figure out the process inefficiencies in the scientific life cycle: how can we make the scientific process faster and more productive? As technology evolves, some of the process components may become obsolete, so we have to start looking at new solutions. The other thing that is happening, at least in the US, and it may be happening here too, is new policy requirements: NSF now requires a data management plan, and NASA has a data preservation requirement that says you have to capture all the metadata details so that you have enough contextual understanding years down the road to still use the data. And the other thing coming over the horizon is the notion of reproducibility, with the executable paper as the gold standard; especially for the aspects of science that have major policy implications, you have to make sure that everyone can reproduce the results.

The area I work in is Earth science informatics, which is basically looking at how you apply systematic technology approaches to all aspects of the scientific life cycle: not just knowledge extraction and decision support, but also data acquisition and processing, and how you gather information from multiple sources. The important thing is providing usable solutions to the stakeholders, not tools that they cannot utilize.
Now I'm going to transition from this background to the two ongoing projects. In this presentation these projects are described at a very high level.

This is the slide that is now required to give a definition of big data. The first is Gartner's definition, which everyone knows: the notion of volume; velocity, in the sense that you may have real-time aspects to the data; and variety, meaning different kinds of data with different kinds of quality information and format types that you have to handle. I actually prefer a different definition, which I believe is from Jim Frew at the University of California, Santa Barbara; it takes more of a data center perspective. Big data is like a pipe organ: if you want to play it, you go to the organ, because the organ does not come to you. Big data is data you cannot move; if you want to use it, you have to go where it is. The implication for data centers throughout the world is that they now have to start looking at systems that can do analysis close to where the data is stored.

Of the two projects I'm presenting here, the first focuses on this whole notion of variety: there is so much information and distributed data that you can get on the web, at different locations and from different sources, so how can you automate aggregation around events of interest? The second case looks more at the analytics side.

Here is the first project. It is a NASA-funded project called Curated Data Albums for Science Case Studies. The concept is that a data album is a compiled collection of information around a big event of interest. This compiled collection includes not just the data files that you want to use for studying that event, but also links to services, tools, news reports, and videos: anything that gives you the full contextual picture that is useful for studying the event. The curation allows an end user to customize the album for their particular study, since each user may have their own view of how they want to study that event.

The motivation behind building a tool like this is that in atmospheric science one of the most common kinds of research is case study analysis: studies focused on a significant event. If there is major flooding rain, or a major hurricane comes through, a lot of research is done on understanding how that event occurred. To do that, we need a wide variety of data and information from many distributed locations. For example, NASA has all these different archive centers, and each center holds one kind of data set. If you are an individual researcher, you can figure out where to go and get the data based on the metadata that is provided. The other thing is that science is becoming very interdisciplinary: you may have users who may or may not be experts in a particular data set, or who may not know the exact vocabulary or metadata term to use in a search. How do you support users like that? The whole gathering of data and information around these events is very tedious and time-consuming.

The challenge is to build a tool that can do this gathering of information in an automated manner. The gathering part is actually the easy part; the hard part is figuring out what is relevant among everything that is out there. That is the challenge: once you are gathering material, figuring out what to filter and what to keep. The other part is that metadata tends to be fairly boring. How do you present the collected data and information in a manner that is actually more useful and intuitive? We have been presenting metadata in a really dry way, so can we do something really different here? That is the second challenge in building this tool.

The science driver here is hurricane science. A hurricane is probably the easiest event to start with because it is a major event with lots of information about it: information about the track, how the hurricane progressed, and so on. The goal was to use this as the first science driver and build catalogs for all the hurricane events of the last several years. The focus is not just on the data but also on the information that is not easily acquired: the background information, what damage was caused, how the storm progressed. All of these things require parsing through web pages or PDF files.
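A data album as described here, an event bundled with heterogeneous distributed resources, could be sketched as a simple structure. All class names and fields below are illustrative, not the project's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    """One curated item in an album: a data granule, news report, video, etc."""
    url: str
    kind: str  # e.g. "granule", "report", "video"

@dataclass
class DataAlbum:
    """A compiled collection of information around one event of interest."""
    event_name: str
    resources: list = field(default_factory=list)

    def add(self, url: str, kind: str) -> None:
        self.resources.append(Resource(url, kind))

    def by_kind(self, kind: str) -> list:
        """Grouped view: all resources of one type, e.g. just the data granules."""
        return [r for r in self.resources if r.kind == kind]

album = DataAlbum("Hurricane Sandy")
album.add("https://example.org/data/granule-001", "granule")
album.add("https://example.org/news/sandy-report", "report")
print(len(album.by_kind("granule")))  # 1
```

The point of the structure is that data files and contextual material (reports, videos) live side by side under one event, which is what distinguishes an album from a plain search-result list.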
This is the conceptual architecture. You have these different resources on the left-hand side, coming from all these agencies in the US, and you may also have crowdsourced things like videos and pictures on YouTube. The goal is to aggregate all of this, figure out what is relevant, and then put it in a structured form and present it to an end user so that they can actually utilize the information.

This is the system architecture for the tool. There is an engine that drives the different brokers, which talk to the different external sources and put everything together into an internal database, because some of the data gets so large that querying it directly is an issue. There is a service layer, and at the top we have a presentation layer where the end user gets interactive analytics: interactive visualization and a faceted visual search. The piece I'm going to talk about, the piece that is new, is the ontology-based relevancy ranking capability, and then I'll show you a little bit of the tool itself.
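The engine-and-brokers design described above can be sketched minimally: one broker per external source behind a common interface, with the engine fanning a query out and merging what comes back. The broker classes and stubbed results are illustrative, not the project's actual components:

```python
class Broker:
    """Interface each source-specific broker implements."""
    def fetch(self, event: str) -> list:
        raise NotImplementedError

class WikipediaBroker(Broker):
    def fetch(self, event):
        # A real broker would call the source's API; stubbed for illustration.
        return [f"wikipedia:{event}"]

class YouTubeBroker(Broker):
    def fetch(self, event):
        return [f"youtube:{event}"]

class Engine:
    """Drives all registered brokers and merges their results."""
    def __init__(self, brokers):
        self.brokers = brokers

    def aggregate(self, event):
        results = []
        for broker in self.brokers:
            results.extend(broker.fetch(event))
        return results

engine = Engine([WikipediaBroker(), YouTubeBroker()])
print(engine.aggregate("Hurricane Sandy"))
```

Adding a new source then only means writing one more broker class; the engine and the layers above it stay unchanged.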
The ontology-based ranking service is designed as a general service that can be customized to many different applications. It combines an ontology-based score with a traditional statistical score, and it builds on two published scoring approaches.

I won't go deep into the algorithm, but it has two components. The first is an ontology component: we have an application ontology, in this case an ontology for hurricanes. For all the concepts in the ontology we calculate weights based on the linkages between the concepts; the more connected a concept is, the higher its weight. We then calculate an activation value, because not all the concepts in the ontology are equally important: certain concepts have much higher activation values than the others, and those key concepts are where the search starts. The second component uses the very standard TF-IDF model for the statistical calculation: you do a term frequency calculation for a word and then an inverse document frequency calculation. For the relevancy score, we take a document and its metadata, match it against the concepts in the ontology, calculate one score from the ontology and one from the TF-IDF model, and combine them into a relevancy score. That is how we do relevancy filtering on all the information we are gathering.
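A minimal sketch of that two-part score, assuming a simple linear combination: an ontology score summed from concept activation weights, plus a standard TF-IDF score. The activation values below are made up for illustration; the project derives its weights from concept connectivity in its hurricane ontology:

```python
import math

# Hypothetical activation weights for a few hurricane-ontology concepts.
ACTIVATION = {"hurricane": 1.0, "surge": 0.8, "landfall": 0.6, "wind": 0.4}

def ontology_score(doc_terms):
    """Sum the activation weights of ontology concepts present in the document."""
    return sum(w for concept, w in ACTIVATION.items() if concept in doc_terms)

def tfidf(term, doc, corpus):
    """Term frequency times inverse document frequency for one term."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / (1 + df))

def relevancy(doc, corpus, alpha=0.5):
    """Combine the ontology score and the TF-IDF score with weight alpha."""
    onto = ontology_score(doc)
    stat = sum(tfidf(t, doc, corpus) for t in set(doc) if t in ACTIVATION)
    return alpha * onto + (1 - alpha) * stat

corpus = [
    ["hurricane", "landfall", "damage"],  # relevant document
    ["sports", "news", "update"],         # irrelevant document
    ["hurricane", "wind", "surge"],       # relevant document
]
print(relevancy(corpus[0], corpus) > relevancy(corpus[1], corpus))  # True
```

Documents mentioning highly activated concepts score well even when those terms are common across the corpus (where TF-IDF alone would discount them), which is the motivation for keeping both components.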
Audience question: I can see how that works when the document contains the words, but what about some of the remote document resources you showed in the architecture, like YouTube? Answer: For YouTube we do a query expansion, and the ontology drives that expansion. If you just search for, say, Hurricane Sandy, you also get people called Sandy that are not relevant. So we use the ontology to automate the query expansion: we add more detailed terms so that the relevancy filtering performs well.

We also evaluated how well this algorithm works. Clearly the algorithm can be improved; this is early work, and there are known things we can do better. We compared the algorithm against ground truth, which is our data center's collections for hurricanes: we manually selected 35 data collections and compared them against the top 35 returned by the algorithm. We get an accuracy of about eighty-two percent, and the precision and recall are each about sixty percent.
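The ontology-driven query expansion mentioned above can be sketched as a walk over related concepts: starting from a seed term, pull in neighboring ontology terms so a generic query becomes specific enough to filter out irrelevant hits. The toy adjacency map below is illustrative, not the project's real ontology:

```python
# Toy ontology: each concept maps to related, more specific concepts.
ONTOLOGY = {
    "hurricane": ["tropical cyclone", "storm surge", "landfall"],
    "tropical cyclone": ["saffir-simpson"],
}

def expand_query(seed, ontology, depth=1):
    """Return the seed term plus related ontology terms up to the given depth."""
    terms = {seed}
    frontier = {seed}
    for _ in range(depth):
        nxt = set()
        for term in frontier:
            nxt.update(ontology.get(term, []))
        terms |= nxt
        frontier = nxt
    return sorted(terms)

print(expand_query("hurricane", ONTOLOGY))
# ['hurricane', 'landfall', 'storm surge', 'tropical cyclone']
```

The expanded term set is then sent to the source (e.g. as a YouTube search), so results about a person named Sandy, which match none of the added terms, rank below genuinely relevant hits.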
Ideally you want both high precision and high recall, but for this search purpose we would rather have high recall, even at somewhat lower precision, because the goal here is to make sure that everything that is important is picked up as part of the search process.
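The evaluation protocol (top-k returned vs. hand-picked relevant items) and the stated preference for recall can be made concrete with standard metrics; a recall-weighted F-beta with beta > 1 captures "recall matters more". The lists below are toy data, not the actual 35 collections:

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Precision and recall computed over the top-k retrieved items."""
    top_k = set(retrieved[:k])
    hits = len(top_k & set(relevant))
    return hits / k, hits / len(relevant)

def f_beta(precision, recall, beta=2.0):
    """F-measure that weights recall beta times as heavily as precision."""
    if precision == 0 and recall == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)

retrieved = ["a", "b", "c", "d", "e"]  # ranker output, best first (toy data)
relevant = ["a", "c", "f", "g"]        # ground-truth relevant items (toy data)
p, r = precision_recall_at_k(retrieved, relevant, k=4)
print(p, r, f_beta(p, r))  # 0.5 0.5 0.5
```

With beta = 2, improving recall moves the F-score four times as much as an equal improvement in precision, which matches the tuning goal described here.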
Let me demo the tool. This page shows the information that we aggregated for the different hurricanes. There are three different views. We still have people who like things in tabular form and insisted on having a list, so there is a table view. There is another view, a bubble chart, where you can see the different storms by year: the color is the category of the storm and the size reflects the amount of information we collected for that particular storm. The sunburst view shows the same thing, but as visual facets: you can drill down and see the storms grouped by different categories, and here the angle encodes the amount of information.

If you select a particular storm, you see all the aggregated information. This part is from Wikipedia; this tabular information comes from parsing PDF reports, things that are important for users, such as how well the forecasters did for this particular storm, which matters to insurance companies, for instance. The PDF parsing is based on a rule-based extraction process. Then, for the particular storm itself, you have the actual data sets: these are the different data collections, and you have a list of all the granules so that you can download them for studying the storm. You can search based on keywords or instruments, and the user can change the relevancy threshold: if they think it is too high or too low and they are not getting enough results, they can adjust it. You can also select individual granules, or select a whole collection and get all of its granules. So this is a kind of different way of gathering and presenting distributed information.
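The user-adjustable threshold shown in the demo amounts to a simple cutoff over scored results: granules carry relevancy scores from the ranking service, and the user loosens or tightens the cutoff until enough results come back. The granule names and scores below are illustrative:

```python
def filter_by_threshold(scored_granules, threshold):
    """Keep granule names whose relevancy score meets the user-chosen threshold."""
    return [name for name, score in scored_granules if score >= threshold]

# Hypothetical (granule, relevancy-score) pairs from the ranking service.
granules = [("granule-001", 0.92), ("granule-002", 0.71), ("granule-003", 0.40)]

print(filter_by_threshold(granules, 0.8))  # strict cutoff: one granule
print(filter_by_threshold(granules, 0.3))  # loose cutoff: all three granules
```

Exposing the cutoff to the user, rather than hard-coding it, is what lets the tool favor recall by default while still letting an expert tighten the result set.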