Development of the GRASS/R interface - GIS and statistical data analysis

Transcript
Chair: We proceed with the next talk, presented by Roger Bivand from the Norwegian School of Economics and Business Administration in Norway. The title is the GRASS/R interface. OK.
Good morning. This is the sequel to the demonstration that I gave yesterday. What I'd like to do now is to reflect a little, having listened to the very interesting presentations in this session so far. I found myself on the one hand being convinced, and on the other hand worrying, and those worries are one of the reasons why I think an interface to a data analysis and statistical environment is important. One of the worries is whether the models we are getting are overfitted to the data we have, so that we don't really know how we can generalize: the question of how you generalize to a different city, and use the same tools to extract features from a different scene than the one on which the models were based, is important. The other worry was about errors in data. In both of these cases, access to a statistical environment should allow us to prototype things in a sensible way.
What was the motivation for developing the interface? It was not designed to replace the production of new modules within GRASS itself, because for many purposes, particularly computational efficiency, it is not sensible to have two systems running at once. But when you are prototyping something, having access to a system which gives you an interactive, interpreted environment, rather than having to compile things, makes you a little faster: you can try different techniques in order to work out how they perform in relation to each other. In addition, the R interface was initially thought of as a way to look at data, and at the quality of the data. In the dataset presented yesterday, it was pointed out that damage was found in areas where there were no pine trees, or where the data did not reveal the presence of pine trees although they were perhaps there, because the pine trees were at a finer scale than the resolution of the dataset being used. And in the case of the Maas pollution datasets, the registered flood frequency did not seem to accord with where the points were on the map. Both point up the fact that geographical data are very often expensive to get hold of, and that once we have got hold of them we should attempt to use them in a way which respects the fact that they may not be recorded accurately, either in attribute or in position. That was the reason for working on an interface.
Let me now run through a few points about where the interface lies in relation to a GIS. A GIS is principally for manipulation, analysis, display, and generating products, and traditionally this has very often involved moving the data out to statistical and modeling packages; that is why we are where we are. Two approaches I have found helpful: firstly looking at data analysis as such; then going from data analysis to perhaps drawing statistical conclusions; and finally moving toward the spatial statistics of the area itself. All three of these components are present. What I am going to talk about, very briefly, is spatial data analysis in a small capsule; then GIS data models and their mapping into R objects; then running R within GRASS, which was the initial approach;
and then building the R interface on the GRASS library, which I demonstrated yesterday. Spatial data analysis has two components, both of which we have touched on yesterday and today. Partly it is exploratory: what is going on here, what is around? The other is: do we have hypotheses from which we would like to generalize, which we can test out on the data we have, with the obvious underlying assumption that if we can test them out, they should also be applicable to data which we haven't seen and which we haven't used to fit the models. This, I think, is one of the problems which is very present in our current GIS practice: we fit to the data we have, and we never really take the step further and ask what would happen beyond our edge. This is a problem because in time series you can assume that the future will be something like the present, but with a scene you cannot make the same assumption: you can go over the border between Texas and Mexico and things will be the same, maybe they will, maybe they won't, but you are making assumptions which are not necessarily justified. This is one of the areas where spatial statistics are difficult, but a challenging and interesting area.
Also, because there are R packages devoted to relatively robust and well-supported spatial statistical methods, this seemed interesting to look at; most of these have nothing to do with me, but my other love is working with data of this kind, and it takes quite a lot of my time. What needs to be explained to statisticians, when you make presentations to people working in data analysis, is that spatial reference systems are very important. These tend to be new to them, because they think of observations as indexed observations, without seeing them in their context; we live in a world which is structured in this kind of way, and it takes time to explain to them how things work in space. But when you get to it, they come back and say: OK, so you have this vector representation of these areas, that's great; and you've got these rasters with a given resolution, and you go around believing, or treating, the data as homogeneous within your block. This is another area where the interaction with statisticians is very interesting, because they can say: we have one patient, and one patient is quite obviously a unit, give or take some microbes moving backwards and forwards; but you are already dealing with aggregated data, and it is not obvious how that should be handled. Talking to statisticians is helpful because they remind us that the kinds of assumptions we make in analyzing data are not necessarily justifiable. And it is not just R the statistical system; it is the community of working statisticians, many of whom are doing work on data which is also spatial, but who have a different sensitivity in the way they use data: we tend to be happy when we have got some data, while they tend to be unhappy when they have too much data and cannot control where the variation is coming from. There is an enormously interesting and, I think, very instructive paper in the June number of the Journal of the American Statistical Association on the support problem, what happens when you change from point support to block support; I can't remember the exact title, but it is called something like working with incompatible spatial data, and it is certainly worth looking at to see how a statistician would approach spatial data. So the GRASS/R interface is not just a technical fix: it is based on the assumption that there are things which need to be said to the statistical community about spatial data, and things which we can hopefully learn from their experience of working with our kinds of data.

This is the way in which the interface began. It only supports raster data, not vector, and this is something which needs to be addressed. The way the interface works, as those of you who were here could see yesterday, is that I was running an interactive R session within the GRASS environment and location, which was in turn running on top of, in this case, the Linux operating system; the examples could have been running on Windows as well.
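As a concrete illustration of such a session, here is a minimal sketch. It is started from the GRASS shell so that R inherits the location, mapset and current region; the function names follow the GRASS package for R of the period, but the exact signatures, return structures and layer names here are assumptions, not a definitive record:

```r
## Hypothetical sketch of the period GRASS/R interface; signatures assumed.
library(GRASS)

G <- gmeta()                          # metadata of the current region:
                                      # bounds, resolution, rows, columns
elev <- rast.get(G, "elevation.dem")  # raster layer read at current resolution
hist(elev[[1]])                       # quick exploratory look at cell values

centred <- elev[[1]] - mean(elev[[1]], na.rm = TRUE)
rast.put(G, "elev.centred", centred)  # result written back as a GRASS raster
```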
The interface is structured using dynamically loaded modules which run through all of the layers present, so that all of the layers can be used when the interface is seen from the GRASS side. In principle, as I stressed yesterday, I am assuming that the analysis should take place in the current window, using the resolution of the raster data recorded in the current window. There may be other attributes which need to accompany the raster data across the interface, and it seems important that when the data come back from the interface they should look as if they have the same kinds of attributes as the GRASS rasters had when they left, despite the fact that this is not necessarily the way they were being looked at in R. So that is the view from the GRASS side. These criteria matter, of course, when using the interface in batch mode, which is one of the things I would like to look at myself in more detail; Markus has done this, and I think other people have as well. Batch mode means that you make the R interface disappear: you do not use R interactively, you use it simply as a compute engine to which you send instructions in a batch file, and then retrieve the data, which are put back into GRASS.

When the interface is seen from the other side (I have moved to the other side of the podium to indicate that I am looking at it from the R side) it is not necessarily the same. Is it sensible to treat a raster as a rectangular array of cells, or should each cell be treated as an observation? At the moment each cell is treated as an observation, in a vector in R, which is not necessarily an array; there are a number of questions here which I have had to face and which are not necessarily solved sensibly. It may be better to handle some layers as things R already supports: arrays with bounds, arrays with RGB colors, or something like that, and this is something to be looked at in the future. One disadvantage of displaying data with R is that R graphics are vector-based, so that when you display a raster in R it spends quite a lot of time drawing lots and lots of little boxes and filling in the colors, which is obviously not optimal, even though the analysis runs very fast; so there are other things to do there.

Now back to the GRASS side, though this is not just GRASS but actually Open GIS: one of the prime needs that R has is for a mapping package. I think there are synergies between the 5.1 vector model and providing a mapping package for R: making sure that the 5.1 vector model does not belong to GRASS alone, but is also usable by other programs, is important, and this would be very useful for statisticians who work with spatial data, who at the moment have difficulties in visualizing it. The way in which the interface worked when it was designed (this is now a two-year-old diagram) reflects the fact that R holds all of its objects in memory in the workspace, and continues to do so: if you want to read a 10,000 by 10,000 scene into R, it will read it, as a 10,000 by 10,000 dataset in one very, very long vector.
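A back-of-envelope check makes the in-memory cost of that design concrete; R stores a raster layer as one long numeric vector of double-precision values:

```r
## Memory footprint of holding a whole scene in the R workspace.
rows  <- 10000
cols  <- 10000
bytes <- rows * cols * 8          # 8 bytes per double-precision cell
bytes / 2^20                      # ~763 MB for a single layer
object.size(numeric(1e6))         # R's own accounting, for comparison
```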
If your computer is large enough this is not a problem, and R is used for medical imaging, but it is an issue which needs to be faced if the interface is to remain usable as data sizes grow. A further question is whether it is sensible to analyze all of the data in the scene at all: when fitting a model, should you fit the model on the whole scene, or fit it on some subsample of it?
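A minimal sketch of the subsampling alternative follows; the layer names `elev` and `slope` are hypothetical stand-ins for layers already read across the interface:

```r
## Fitting on a subsample instead of the whole scene: draw a random set
## of cell indices, fit on those cells, then apply the fit everywhere.
set.seed(1)
idx <- sample(length(elev), size = 5000)     # a few thousand of millions of cells

fit  <- lm(elev[idx] ~ slope[idx])           # model picked on the subsample only
pred <- coef(fit)[1] + coef(fit)[2] * slope  # prediction over the full layer
```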
In discussions about data mining in the R community it has been pointed out that if you are trying to make a parsimonious representation of a model, it is not necessary to fit the model on all of the data. This is something which is foreign to us, because we are used to using all of the data, but it may not necessarily be the only way to think.

One consideration which deserves to be mentioned by people who want to use the interface practically is that R objects have specific rules for the way in which names are constructed, because the R parser works in much the same way as r.mapcalc: if in r.mapcalc you used a map layer name containing a hyphen, r.mapcalc would interpret the hyphen as a minus and try to subtract one non-existing layer from another non-existing layer. In the same way, R interprets the names of objects, and you have to be careful what you use; in particular the underscore is a symbol you should not use, and the interface intercepts it and changes it to a dot. I talked a little yesterday about classes, and Markus also talked about classes. This is an area which is changing: as R and S evolve, the classes are also evolving, and this is an issue which will have to be dealt with in development.

A further issue, associated with the possible idea of having some kind of GRASS daemon, is what we do about metadata when data are being moved between GRASS and some other system. How do we ensure that the metadata as first recorded can be transmitted and not corrupted by the other system, so that when the data come back they still fit, so that if you are working in a location, the data come back with essentially the same metadata that they had when they left? This is done at the moment in this interface by saying: you are not allowed to change the window, and you are not allowed to change the resolution. This is perhaps not a good way of doing it. It is also possible that the metadata should not be on a per-session basis but on a per-object basis, in the same way that GRASS rasters can have their own windows; different windows could be looked at, and these could then be transferred to new objects created from the existing objects, if they have the right class attributes. But it is not immediately clear how this should be done. The typical practice that you see in the spatial statistics packages which have been contributed to R is, to put it mildly, eclectic: the geo-data, the spatial data, have no positional attributes apart from being numeric vectors; nothing to tell you about the projection, nothing to tell you about anything, essentially, and the analyst is left to their own devices. Provided the same analyst remembers, and has noted down, what the projection was, you are probably still OK if you are trying to reconstruct the result some little time afterward; if the person has lost the notes, the results essentially cannot be reconstructed. The metadata really are very important.
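Two small illustrations of the points above. First, naming: the R parser, like r.mapcalc, reads a hyphen as subtraction, and in the R of that period the underscore was itself an assignment operator, so neither can appear in an object name; `make.names()` shows the kind of mangling to dots that the interface performs. Second, per-object metadata: R attributes can carry the window and projection with the object itself. The attribute names here are hypothetical, not the interface's actual convention:

```r
## Name mangling: invalid characters are mapped to dots.
make.names("soils-group")                  # -> "soils.group"

## Per-object rather than per-session metadata via attributes
## (attribute names hypothetical).
x <- runif(100)
attr(x, "window")     <- c(n = 4928000, s = 4914000, e = 609000, w = 590000)
attr(x, "projection") <- "UTM"
attributes(x)[c("window", "projection")]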
This is also connected with a point I think it is worth making about a shared philosophy of GRASS and R, which came up in the earlier presentation in a very clear way: both GRASS and R are concerned with the replication of research results. The research community, and the user communities doing environmental or other work, tend to feel that it is a good idea if other people can check our results, using different techniques and different methods, to see whether the prediction made is robust to changes of methods and/or changes of data. And unless you have recorded the metadata, of course, it is very difficult for somebody else to come back and say: no, I got a different result; or: yes, my analysis supports your conclusions.

With respect to the interface, this is the way we think things are likely to go, though at the moment it is a little unclear. Two movements are taking place. One is the pixmap package, which I showed very briefly yesterday, as a possible route for display. The second is an early version of wrappers for the GDAL library by Tim Keitt, which also provides an interface and which may be a useful way to go. The interesting thing about Keitt's work is that he has provided wrappers for a fairly complicated library, and the wrappers seem to work, with some considerations attached.
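A sketch of displaying a raster with the pixmap package mentioned above; each cell ends up as a little filled box, which is why display is slow even when the analysis itself is fast. Treat the constructor arguments as assumptions:

```r
## Wrapping a matrix as a greyscale image object and plotting it.
library(pixmap)

m <- matrix(runif(100 * 100), nrow = 100)  # stand-in for a GRASS layer
p <- pixmapGrey(m)                         # matrix -> image object
plot(p)                                    # draws cell-by-cell boxes
```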
There are also database access mechanisms, which are maturing very fast. Connections provide one kind of interface; this is the one used in the pixmap library. Connections provide a very flexible, interpreted environment for reading and writing text and binary files: whatever you like to throw at it, R can read. They were written in order to provide wrappers for reading medical images: different kinds of medical hardware produce images in various barely-standardized formats, and rather than writing a great deal of code for each, you write a little wrapper using connections to read each different kind of data. Data can also be read from the Web, so it would be possible to read GRASS data from a server, or through a connection to a pipe.
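A connections-based route for moving a layer, instead of writing an ASCII file by hand and reading it back, might look like the following sketch: open a pipe to a GRASS module and scan its output directly. The GRASS 5 module r.out.ascii prints a six-line region header before the cell values; the layer name and exact module arguments are assumptions:

```r
## Reading a GRASS layer through a pipe connection (names hypothetical).
con  <- pipe("r.out.ascii map=elevation.dem", open = "r")
hdr  <- readLines(con, n = 6)   # north/south/east/west, rows, cols
vals <- scan(con)               # cell values as one numeric vector
close(con)

## The same mechanism reads URLs, so data could equally come from the web:
## con <- url("http://example.org/layer.ascii", open = "r")
```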
GDAL is an important advance, giving us a considerable amount of flexibility. The version available at the moment from Keitt's website has a number of interesting features, one of which is that it also allows you to delete your GRASS layer even if you have opened it read-only, which I have experienced; so if anybody investigates his library, please keep it away from your datasets for the moment. Both the DBI packages and the GDAL wrappers use what are called new-style classes, and new-style classes are not easy things for users to read and understand, so there is a consideration as to whether we should go in that direction or stay with the old-style classes; at the moment the interface is written using old-style classes.

The things that I am planning to do, and I would welcome indications as to what is important, are these. Move the interpreted-mode interface, the one where R does not know that you are running inside GRASS, to use the connections mechanisms: at the moment it uses r.out.ascii and some other variants to write things out and then reads them back in using an ASCII read mechanism, and this could be done in binary, which would be faster. Maybe it should migrate to new-style classes: new-style classes are much stronger in terms of what they require of things and what they can expect objects to have. With old-style classes, an object can have a class, but it does not have to, and the class can have certain attributes, but it does not have to. New-style classes say: every object has to have a class, no object can have two classes, and each object of a particular class must have the full set of attributes from a specified list. There are good reasons for doing it that way: if you write functions for old-style classes, you always have to check whether the attributes are present, and if you can assume that for this class all of these attributes have to be present, it makes writing functions very much easier; the same goes for object structures. So maybe this would be a sensible thing to do, although probably it will be done by leaving the present interface to continue developing the way it is, and developing a new-generation interface which uses the new-style classes; a small sketch of such a class follows below. The interface also needs to be extended to vector data. And there are very interesting possibilities for the Open GIS and GRASS communities to make a contribution to R itself by helping to provide a mapping package: they help us, we can help them, and we can show them how things can be done in free software.

Finally, as I have been asked to comment on what to do about more demanding data structures, 3D with time support and time-varying data: these are handled nicely for medical research, of course, where they do medical imaging, which poses quite similar questions. Is it sensible to do model fitting on that volume of data? And if we are trying to do exploratory work on that volume of data, can we do it by segmenting it, moving chunks of data into R and working on those exploratively, rather than trying to move everything in, given that, at least until 2004 or so, we will be assuming that the whole object is in memory? That is an open question, and it is very interesting.
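Picking up the class discussion from a moment ago, here is a minimal new-style (S4) class in the sense described: every object of the class is guaranteed to carry the declared slots, so methods need not test for missing attributes. The class and slot names are hypothetical:

```r
## A minimal S4 class; instances must carry all declared slots.
library(methods)

setClass("grassRaster",
         representation(cells      = "numeric",
                        window     = "numeric",
                        projection = "character"))

r <- new("grassRaster",
         cells      = runif(25),
         window     = c(n = 100, s = 0, e = 100, w = 0),
         projection = "XY")

slotNames("grassRaster")   # the attributes every instance must have
```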
It is interesting not just for us, but for other people in the computer science community, who are also working on clusters of computers and on very large datasets; medical images typically have the same kinds of sizes. That is where things are at the moment. I would be grateful for comments and contributions, and you can also contact me by e-mail if you, or somebody else, think of a question or query that could help improve the interface. And now I would be happy to take questions and comments.
Question: I have a question regarding the management of large datasets in R. You mentioned that it is possible to merge them into R in two ways: one of them is to load the whole dataset into memory, and the other is to load it from a URL, which is also not memory-efficient. Do they take the same amount of memory? And I assume it is possible to use SQL directly when loading the data?

Answer: This is a case where the field is moving fast, and memory is getting cheaper. Tim Keitt wrote an interface to PostgreSQL which uses proxy data frames: the data look as though they are in R, but in fact you are looking at a window onto the database, and you can then move the window, so you do not pull the whole data object into R. That was a very good interface; it is no longer maintained on its own, and some of the same techniques are being migrated into the DBI package mechanism, which hopes to have some of the same features, though it does not necessarily have them yet. One of the reasons is that, when such questions come up on the discussion lists, the answer is often: do you have time to do the development yourself and contribute the code that is needed to handle data objects piece by piece? It comes down to the time and budget of the people involved; someone poor in memory but rich in time could contribute code. We used PostgreSQL, and I think on balance it is going to go both ways: for many analytical purposes it is helpful to have the object in memory; on the other hand, selection techniques that can take blocks of the data object and look at those are also very likely to be included, but that is not completely in place yet.

Chair: OK, any more questions? OK, thank you for your presentation.
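To make the block-wise selection idea from the answer concrete, here is a sketch using the DBI mechanism: send a query and fetch rows in chunks rather than pulling the whole table into memory. The driver, database, table and column names are hypothetical; RPgSQL-style proxy data frames worked on a similar idea:

```r
## Chunked fetching through DBI (all names hypothetical).
library(DBI)

con <- dbConnect(drv, dbname = "gisdb")   # 'drv' from a DBI driver package
res <- dbSendQuery(con, "SELECT x, y, flood FROM sites")

while (!dbHasCompleted(res)) {
  chunk <- fetch(res, n = 10000)          # one block of rows at a time
  ## ... exploratory work on 'chunk' only ...
}

dbClearResult(res)
dbDisconnect(con)
```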

Metadata

Formal Metadata

Title Development of the GRASS/R interface - GIS and statistical data analysis
Series Title Open source GIS - GRASS user conference 2002
Number of Parts 45
Author Bivand, Roger
License CC Attribution - NoDerivatives 3.0 Germany:
You may use, copy, distribute and make this work publicly available in unchanged form for any legal purpose, provided you credit the author/rights holder in the manner specified by them.
DOI 10.5446/21766
Publisher University of Trento
Publication Year 2002
Language English

Technical Metadata

Duration 28:02

Content Metadata

Subject Area Computer Science
