key: cord-0009670-yw3nlips authors: Petřík, Petr; Bruelheide, Helge title: Species groups can be transferred across different scales date: 2006-06-12 journal: J Biogeogr DOI: 10.1111/j.1365-2699.2006.01514.x sha: c450b6a5839eb31e8ea23ea7ffa759e72a7fd666 doc_id: 9670 cord_uid: yw3nlips Aim To test whether species groups (i.e. assemblages of species co‐occurring in nature) that are statistically derived at one scale (broad, medium, or fine scale) can be transferred to another scale, and to identify the driving forces that determine species groups at the various scales. Location Northern Bohemia (Czech Republic, central Europe) in the Ještědský hřbet mountain range and its neighbourhood. Methods Three data sets were sampled: a floristic data set at the broad scale, another floristic data set at the intermediate scale, and a vegetation data set at the habitat scale. First, in each data set, species groups were produced by the COCKTAIL algorithm, which ensures maximized joint occurrence in the data set using a fidelity coefficient. Corresponding species groups were produced in the individual data sets by employing the same species for starting the algorithm. Second, the species groups formed in one data set, i.e. at a particular scale, were applied crosswise to the other data sets, i.e. to the other scales. Correspondence of a species group formed at a particular scale with a species group at another scale was determined. Third, to highlight the driving factors for the distribution of the plant species groups at each scale, canonical correspondence analysis was carried out. Results Twelve species groups were used to analyse the transferability of the groups across the three scales, but only six of them were found to be common to all scales. Correspondence of species groups derived from the finest scale with those derived at the broadest scale was, on average, higher than in the opposite direction. 
Forest (tree layer) cover, altitude and bedrock type explained most of the variability in canonical correspondence analysis across all scales. Main conclusions Transferability of species groups distinguished at a fine scale to broader scales is better than it is in the opposite direction. Therefore, a possible application of the results is to use species groups to predict the potential occurrence of missing species in broad‐scale floristic surveys from fine‐scale vegetation‐plot data. We know that the distribution patterns of species are shaped by various factors that might differ across scales (e.g. Pearson & Dawson, 2003) . In addition, with varying principal scale entities (grain, extent, and focus) the results of studies will differ (see Gurevitch et al., 2002) . In studies of macroecological processes, extent and focus are the most important components (Blackburn & Gaston, 2002) . Species relationships are features that can be compared in samples across different scales. When the objective of a study moves from the analysis of single-species patterns to the analysis of species assemblages across scales, a further difficulty is data inconsistency. In most cases, the available data sets at the different scales will differ in species composition. This is mainly because at broader scales more species are involved than at finer scales, but also because the effort of data sampling at smaller scales often forces the researcher to focus on a restricted number of species. One way to overcome this difficulty is to use the smallest subset of species for which information is consistent across all scales. However, this type of data-set truncation has the disadvantage that changing species relationships with scale are ignored. For example, if at the finest scale only woodland species have been sampled and the data set is restricted to this set, no change of species co-occurrences between forest and grassland species will be detectable. 
We have employed both the approach of a fixed subset of species and the use of species groups of variable species composition across scales. It has already been demonstrated that inter-specific associations defined by species groups can be compared across scales (Bruelheide & Chytrý, 2000; Kuželová & Chytrý, 2004) . In general, species of the same species group share similar ecological and spatial requirements at a given scale. Species groups that emerge at different scales (here termed spatially stable species groups) might be particularly useful for scaling data. The spatially stable species groups that display the same pattern at different scales are likely to depend on the same environmental factors at each scale, including climate, topography, land use, soil conditions, and biotic interactions. If the co-occurrence of species is incidental at a particular scale (e.g. one species might depend on particular chemical soil conditions while another might depend on a favourable microclimate), this co-occurrence will disappear at a different scale, as environmental factors do not co-vary consistently across scales (Pearson & Dawson, 2003) . Conversely, all species of a spatially stable species group can be expected to depend on the same principal environmental factor at a particular scale. In other words, the relative importance of environmental factors at a particular scale should be similar for all species in a spatially stable species group. In contrast, the different species composition of a species group at a different scale should reflect differences in the relative importance of environmental variables between these scales, resulting in different scaledomains (see Wiens, 1989) . There are many data sets in which information on the occurrence of plant species or vegetation samples are stored digitally. Floristic data sets have the advantage that they rely on systematic sampling with a defined extent and grain. 
However, they are available mainly at coarse scales, have only rarely been produced at finer scales (e.g. Scheller, 1989), and are increasingly less complete at broader scales (e.g. 20% completeness of the Atlas Florae Europaeae mapping, see Kurtto et al., 2004). There is a need for methods that can provide estimates of the completeness of floristic surveys, for example for detecting overlooked species or for reducing sampling effort. Such methods might use information on species relationships from a different source, that is, from vegetation data. At present, around one million vegetation records exist (Ewald, 2001). These data have the disadvantage that they were collected using different sampling designs and for different purposes (e.g. Mucina et al., 2001). Their main purpose is to describe vegetation at the regional scale, for example for assessing conservation status, risk assessment, or preparation of vegetation maps. Another goal is to produce a broad-scale classification from vegetation plots (relevés) of different regions, i.e. to follow a bottom-up approach. The Ecoinformatics Working Group of the International Association of Vegetation Science (IAVS), formed in 2003, aims to facilitate access to these large vegetation data sets (http://www.bio.unc.edu/faculty/peet/vegdata/). In addition, the on-going project SYNBIOSYS deals with the integration of differently scaled data sets at the species, vegetation and landscape level scattered throughout Europe. A convenient method to extract species groups from independent data sets is the COCKTAIL algorithm (Bruelheide, 1995, 2000). So far, this method has only been applied to vegetation data sets; our study is the first example of its use with floristic data as well.
For vegetation data, the COCKTAIL method has successfully produced species groups across various scales (Kuželová & Chytrý, 2004) , and across various plant syntaxa at a regional scale (montane grasslands by Bruelheide, 1995; limestone grasslands by Jandt, 1995 and Jandt, 1999 ; deciduous forests by Pflume, 1999 ) and a national scale [wet grasslands by Bruelheide & Chytrý, 2000; dwarf spike-rush (Eleocharis) communities by Täuber, 2000; rock-outcrop dry grasslands by Chytrý et al., 2002b] . In this study, three data sets gathered in the same area have been investigated at three different scales. First, a floristic broad-scale data set (BD) has a coarse grain and an extent of a whole mountain range. Second, a floristic meso-scale data set (MD) consists of a grain four times finer than of the BD. Third, a vegetation fine-scale data set (FD) is compiled from phytosociological relevés. All data sets differed greatly in grain but only slightly in extent. Our main questions are (1) whether the vegetation data set (i.e. the fine scale) can be used to predict the results of floristic mapping (at the broad scale) or vice versa, and (2) which environmental factors characterize the species groups at the various scales. This is very important to know if we want to compare different vegetation and floristic data sets, i.e. data sets of different grain, across scales. To our knowledge, this is the first attempt to use species groups for such a purpose. The field survey was carried out in the Ještědský hřbet Range (northern Bohemia) and its neighbourhood (Fig. 1 ). 
The range extends in a NW-SE direction (between 14.87° and 15.06° E, and 50.68° and 50.82° N) along a geological fault that separates mineral-poor sediments (sandstone) in the west from granite with Quaternary sediments in the east. The geologically complicated centre of the mountain range consists of different types of metamorphic bedrock (schist, quartzite, and dolomitic limestone), plutonic rocks (basalt), and sediments (loam and loess, chalk and marl). With altitudes from 270 to 1012 m a.s.l., the terrain has a substantial influence on the micro- and meso-climatic conditions. The north-west and central parts of the study area are colder and drier and floristically characterized by forest species with a suboceanic distribution range, while thermophilous continental elements are typical of the warmer south-east part. The actual vegetation consists of a mosaic of managed meadows and spruce forest plantations with remnants of beech forests, which are the potential natural vegetation. The floristic data sets were obtained by systematic mapping, recording species presence within defined grid cells from 1998 to 2004. Each basic grid cell (1/256 of the Central European Basic Area, CEBA, i.e. c. 0.52 km²) was visited at least twice to cover the early spring and the summer periods. The time spent on a single grid cell depended on the environmental heterogeneity of the given cell and varied between one and two days. Some grid cells, mainly marginal ones, which were sampled with less intensity, were excluded from the analysis. The basic database (not analysed here) contained information on the presence/absence distribution of 1082 vascular plant taxa (including hybrids) within 213 basic grid cells. The BD was derived from this data set after data unification (see below). The MD was obtained by additional field mapping of 852 grid cells (1/1024 CEBA, i.e. c. 0.13 km²) for 547 selected species.
These species were either diagnostic species of phytosociologic units (mostly at the level of alliances; Chytrý & Tichý, 2003) or anthropophytes, conserved, endangered or rare species (< three localities within the surveyed area). In the analysis, 47 species had to be excluded from the BD and MD because it was not certain whether they had escaped from cultivation (planted species were not recorded in the field). In addition, all 52 hybrids in the BD and MD, and 173 cryptogam taxa in the FD were excluded, as they were not recorded systematically. In sum, 37 species aggregates were used to unify inconsistent field recordings and to minimize determination bias. After this data adjustment, 807 species in the BD, 513 species in the MD, and 722 species in the FD were used for analysis. In total, 1141 phytosociological relevés (cryptogams excluded) sampled according to the Braun-Blanquet approach (Braun-Blanquet, 1921) were gathered from five authors (see Acknowledgements) and combined into the FD. The relevés covered mainly forests, clearings and grassland vegetation, and were located in the study area (within 80% of all 213 grids of the BD) or in the nearest surroundings (60 relevés from 22 adjacent grid cells). The relevés in the FD represented 19 vegetation types (according to Chytrý et al., 2001): beech, oak-hornbeam and spruce forest plantations (41%), meadows and pastures, submontane and montane Nardus grasslands (19%), clearings (15%), herbaceous ruderal and anthropogenic vegetation (9%), dry grasslands and forest fringe vegetation (6%), vegetation of vernal therophytes and chasmophytic vegetation on cliffs (4%), springs and acidic moss-rich fens (3%), reed and tall sedge beds and alder carrs (2%), mesic scrub (1%), and others (subalpine tall-herb vegetation, macrophyte vegetation of naturally eutrophic and mesotrophic still waters, vegetation of river banks, riverine willow scrub, and the unclassified relevés).
Fig. 1 Study area covered by the mapping grid used in the broad-scale data set (BD). The finer resolution used in the meso-scale data set (MD) is indicated by the arrow.
The grain of the FD was much finer than that in the BD and MD (sample areas from 1 to more than 900 m²). The most frequent sample areas were between 20 and 25 (or 50 and 100) m² for grasslands, and 300 and 500 m² for forests. The FD had 89% of its species in common with the BD and 60% with the MD. In order to analyse the effect of the different species numbers of the three data sets, truncated versions of the FD and BD were produced by selecting 513 species, corresponding to the species number of the smallest data set (the MD). Species nomenclature follows Kubát et al. (2002), and vegetation terminology follows Chytrý & Tichý (2003). All data sets were stored in the TURBOVEG program. To produce plots (groups of samples) in each data set, the COCKTAIL species group method (Bruelheide, 1995, 2000) was used. The method produces a group of species whose joint occurrences are more frequent than would be expected in a random species distribution in the data sets. The COCKTAIL method works with presence/absence data and is therefore appropriate for data sets with varying species abundances. First, we started the algorithm with a single arbitrarily chosen species, which gave the name to a species group. The first species was always chosen on the basis of locally known conditions as it reflects the real situation better than choices based on nationally defined groups (Chytrý & Tichý, 2003). Furthermore, the species contained in the smallest database (MD) were selected with priority for starting a species group in order to have as many common species as possible in the species groups among all data sets. Second, further species were added to the species group if their association with one or more species in the group exceeded a certain fidelity threshold.
As a fidelity measure we used the U-coefficient, recommended by Chytrý et al. (2002b) if data sets of unequal sizes are to be compared. The U-coefficient of association describes the correlation between two categorical factors in a 2 by 2 contingency table (Sokal & Rohlf, 1995). A positive value of U means that there is a positive correlation between a species and an existing species group. Based on preliminary runs we chose an arbitrary threshold value of U = 0.5 for including new species into a group. The value of this threshold determines the size of the species group, and the lower the threshold, the more species are included in a group. A value of U = 0.5 was low enough to yield groups of more than two species but high enough to prevent species groups from becoming too large and, thus, from becoming too environmentally unspecific. However, a choice of U = 0.4 or 0.6 resulted in the same overall pattern. A problem of the species group method is the selection of the initial species with which a group-forming algorithm is started. However, we overcame this problem by choosing the same initial species at all scales, and using identical threshold values. In this way, we were able to focus on comparable sections of data sets with very different species compositions. A similar approach is also recommended for vegetation classification (Chytrý et al., 2002a). In most cases, however, the same species group is obtained irrespective of which initial species of the group is chosen to start the algorithm (Bruelheide, 1995). The COCKTAIL criterion for allocating a plot to a species group is that a certain minimum number of species from the species group must be present in this plot.
This minimum number is also defined statistically, using cumulative distribution functions, which ensure that fewer plots always belong to a species group than would be expected if the species in the group were distributed randomly among plots (Bruelheide & Jandt, 1995; Bruelheide, 1995, 2000). Only the species groups that had at least three plots assigned to them in a data set were used for further analysis. The final number of species groups reflected the main environmental gradients both in the study area and in the set of the relevés included in the FD. In a few cases, we obtained several species groups with slightly different species composition but with essentially the same plots assigned to them. To avoid this problem a new species group was recognized as different from another one if it had at most one or two species in common. No species group typical of water habitats was formed, as these habitats are very rare and did not provide enough diagnostic species in the data sets. The final results of the COCKTAIL species group method were clear assignment criteria of how many species of a certain species group must be present in a plot or grid cell for it to be allocated to this group. The crucial advantage over alternative approaches is that these assignment criteria can be transferred from one data set to another. Third, correspondence between the groups formed in one data set and the groups resulting from application from other data sets to the data set under consideration was also calculated using the U-coefficient, as described in Bruelheide & Chytrý (2000). As the BD and the MD were not independent (the species found only in the MD were added to the BD), the correspondence between them was not investigated. In contrast, the FD was fully independent from the floristic data sets (BD and MD). The above-described steps were performed for all three data sets and also for the truncated BD and FD data sets.
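The group-forming step described above can be illustrated with a minimal sketch. It assumes that the U-coefficient corresponds to the phi coefficient of a 2 by 2 contingency table (the table notes in this paper describe U as the phi-coefficient multiplied by 100), and it replaces the statistically derived minimum species number with a fixed, hypothetical `min_species` parameter. All function names and the toy data are illustrative assumptions, not the implementation used in the study.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient of a 2x2 table.

    a: plots with both the species and the group
    b: plots with the species only
    c: plots with the group only
    d: plots with neither
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return 0.0  # degenerate table: no association can be computed
    return (a * d - b * c) / denom


def fidelity(species_plots, group_plots, n_plots):
    """Association between one species and the plots assigned to a group."""
    a = len(species_plots & group_plots)
    b = len(species_plots - group_plots)
    c = len(group_plots - species_plots)
    d = n_plots - a - b - c
    return phi_coefficient(a, b, c, d)


def plots_of_group(occurrences, group, min_species):
    """Plots that contain at least `min_species` members of the group."""
    counts = {}
    for species in group:
        for plot in occurrences[species]:
            counts[plot] = counts.get(plot, 0) + 1
    return {plot for plot, k in counts.items() if k >= min_species}


def grow_group(occurrences, n_plots, start, threshold=0.5, min_species=1):
    """Greedy sketch: repeatedly add the species with the highest fidelity
    to the current group, as long as that fidelity exceeds the threshold.

    Simplification (assumption): `min_species` is fixed here, whereas the
    method described in the text derives it from cumulative distributions.
    """
    group = [start]
    while True:
        needed = min(min_species, len(group))
        group_plots = plots_of_group(occurrences, group, needed)
        best, best_u = None, threshold
        for species in sorted(occurrences):
            if species in group:
                continue
            u = fidelity(occurrences[species], group_plots, n_plots)
            if u > best_u:
                best, best_u = species, u
        if best is None:
            return group
        group.append(best)


# Toy presence data: species -> set of plot identifiers (hypothetical).
toy = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {7, 8}, "D": {1, 2, 3, 4, 5}}
group = grow_group(toy, n_plots=10, start="A")
```

In this toy example, species B and D join the group started with A, while C, which never co-occurs with the others, is left out. Lowering `threshold` would admit more species, which mirrors the dependence of group size on the chosen threshold.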
All these operations were performed using the software JUICE (Tichý, 2002). For all BD and MD cells, two principal variables were derived from a digital elevation model (DEM): mean altitude, and mean potential direct solar irradiation (PDSI). The PDSI describes how much incident radiation a grid cell receives during a certain period and is calculated from slope and aspect of the terrain, taking into account shading of the grid cell by the horizon [see Conrad (2002) for further details]. Furthermore, eight bedrock types according to their influence on vegetation were used: granite and acid volcanic rock; metaconglomerate; quartzite and silicates; metamorphosed pelite (i.e. phyllite and mica schist); basic volcanic rock and metamorphic rock (i.e. green schist and amphibolite); Quaternary sediments (i.e. alluvium and colluvium); chalk and marl; loam and loess. In addition, a particular unit (anthropogenic substrates) was used for the FD. Land-cover information was only included through distinguishing forested areas from open land. In the FD, the cover of the tree layer in relevés was used as land-cover information. The PDSI in the BD and MD was calculated using the program DiGeM (Conrad, 2002); in the FD it was obtained using the program pot_rad (downloadable from http://botany.natur.cuni.cz/cz/studium/pot_radiace.php; for a discussion of the method see Herben, 1987). All GIS operations were carried out in ArcView 3.2 (ESRI, 1999) with the Spatial Analyst extension. For an overview of all environmental variables see Appendix S1 in the Supplementary Material. The relationship between environmental factors and species composition was explored using canonical correspondence analysis (CCA) with downweighting of rare species, forward selection and unrestricted Monte Carlo permutation tests (499 permutations, P < 0.05) in CANOCO (ter Braak & Šmilauer, 2002).
Supplementary (passive) variables for each sample without any influence on the analysis were used to include some common species-related features: species number; Shannon-Wiener diversity index; and Ellenberg indicator values (see Ellenberg et al., 2001) for light (L), temperature (T), soil moisture (F), continentality (K), soil reaction (R), and soil nitrogen or productivity (N). To account for autocorrelation, grid cells in the BD and MD were coded as x and y in a Cartesian coordinate system (see e.g. Titeux et al., 2004). The statistical significance of the coordinates in single (x, y), in double (x², y², xy) and in triple (x³, y³, x²y, xy²) combinations was tested using forward selection. As all these combinations were statistically significant, they were all used as covariables in CCA. In the FD, two data sets were chosen for CCA: one with weighted species abundances (FD cv ) according to seven classes in the degree of species cover (1%, 2%, 3%, 13%, 38%, 68%, and 88%, square-root transformation), and a second one (FD pa ) with presence/absence data only.
Species groups formed at various scales
All species groups produced with their complete species composition and characteristics are listed in Table 1. Originally, 25 well-defined species groups were formed at the fine scale (not shown), but for various reasons it was not possible to form corresponding species groups at the other scales. Only six species groups were found that occurred in all data sets (i.e. at all scales). A general trend was that the species groups formed in the MD were poorer in species than those formed in the BD and FD. In addition, the broader the scale, the more plots were assigned to a species group. In the BD, 12 species groups were detected. Species number varied between 2 and 48 species, and, at maximum, 95% of the BD cells were assigned to a species group. In the MD, six species groups were formed, with species number varying between 2 and 21.
The largest species group comprised 50% of all MD cells. In the FD, 12 species groups were recognized, with species numbers ranging between 2 and 17. At maximum, 33% of plots were assigned to a species group. In the case of the analysis of the truncated data sets, only five species groups across all scales could be formed. In one case (the GaleLute group) in the BD, the U-value had to be reduced to 0.48 instead of 0.5. Table 2a shows the comparisons of fidelity coefficients between species groups derived from the BD and applied to the FD (downwards in scale), and Table 2b those for species groups derived from the FD and applied to the BD (upwards in scale). Scaling in a top-down direction generally resulted in lower correspondence (mean U-coefficient in the diagonal of the U-correlation matrix was 0.57) than scaling bottom-up (mean U = 0.74). In addition, the proportion of plots assigned by species groups in BD that were correctly assigned in FD was 0.05 compared with 0.43 in the opposite direction. This means that transferability of species groups was better in the bottom-up than in the top-down direction. The comparison between the MD and the FD gave similar results (Table 3). Nevertheless, there were fewer species groups and smaller proportions of plots assigned by species groups as a result of the smaller species number in the MD compared with the BD. The comparison of truncated data sets resulted in a similar general pattern, i.e. a better bottom-up than top-down transferability of the species groups (Table 4). The difference in the transferability of species groups (D in Table 2) between bottom-up and top-down directions was negatively related to the size of species groups. The more species in a group, the more different downward scaling was from upward scaling (R² = 0.80, n = 12, P < 0.01).
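The quantities compared in this section can be sketched as follows. The sketch assumes that correspondence is the phi coefficient between the plots assigned by a group derived in the target data set and the plots assigned by the transferred group, and that Prop uses the plots assigned by the transferred group as its denominator; the text does not state the denominator explicitly, so this is an assumption. Function names and plot identifiers are hypothetical.

```python
import math


def correspondence(own_plots, transferred_plots, n_plots):
    """Return (U, Prop) for one species group in one direction of transfer.

    own_plots: plots assigned by the group derived in the target data set
    transferred_plots: plots assigned by the group derived at another scale
    """
    a = len(own_plots & transferred_plots)   # assigned by both
    b = len(transferred_plots - own_plots)   # assigned by the transfer only
    c = len(own_plots - transferred_plots)   # assigned by own group only
    d = n_plots - a - b - c                  # assigned by neither
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    u = (a * d - b * c) / denom if denom else 0.0
    # Assumed definition: share of transferred assignments that are confirmed.
    prop = a / len(transferred_plots) if transferred_plots else 0.0
    return u, prop


def transfer_difference(u_bottom_up, u_top_down):
    """D: absolute difference in U between the two transfer directions."""
    return abs(u_bottom_up - u_top_down)


# Hypothetical example with ten plots.
u, prop = correspondence({1, 2, 3, 4}, {1, 2, 3, 9}, n_plots=10)
d_mean = transfer_difference(0.74, 0.57)  # mean U values reported above
```

With the mean values reported in the text, the difference between the two directions is 0.17 on the U scale; per-group values of D would be obtained by applying `transfer_difference` to each group's pair of U-values.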
The groups in which this difference was the highest were the groups richest in species: the PlanLanc group (difference in U = 0.78), and the CentScab group (difference in U = 0.36). All studied variables had a significant influence on explaining floristic variation at all scales. In all data sets, the most important variables in CCA were size of forest area (tree cover), mean altitude, and mean PDSI (Table 5). The effects of bedrock types did not differ substantially across scales: the most important substrates were Quaternary sediments, chalk and marl, and silicates. The anthropogenic substrate was very important at the fine scale, but this category was not distinguished at the other scales. Mean PDSI was negatively correlated with forest (tree) cover at all scales. Altitude revealed no relationship with mean PDSI at the fine and the meso-scales, but showed a negative correlation at the broad scale (for ordination diagrams, see Fig. 2). Spatial autocorrelation increased with grain: 18% of the total variation in the BD was explained by the spatial covariables, in comparison with only 3% in the MD. Species number and Shannon-Wiener diversity index in grid cells were negatively correlated with forest (tree) cover, as were the Ellenberg values for light, temperature, and continentality. The Ellenberg values for soil reaction were positively correlated with the occurrence of chalk and marl and Quaternary sediments at fine and broad scales, but not at the meso-scale. The Ellenberg values for soil nutrients and soil moisture did not show any stable pattern.
Table abbreviations: Abbrev, abbreviation of the group in the text; Size, number of species in a species group; Min no, minimum species number required for a plot to be assigned to a species group; Rel fr, relative frequency of plots assigned to the species group.
In the FD cv (Fig. 2a), most variability in the data was explained by tree cover, mean altitude, and presence of Quaternary sediments, chalk or marl, and anthropogenic substrates. The cumulative percentage of variance in species composition explained by all variables was 5.6%. In the FD pa (Fig. 2b), the cumulative percentage of variance explained by all variables was almost twice as high (10.7%), indicating that the environmental variables were more suited to explain the presence of species than their abundance. The percentages of explained variation of the variables did not differ substantially between FD cv and FD pa , with two exceptions: in the FD pa , the importance of PDSI was more than doubled and the importance of silicates was almost halved compared with the FD cv (Table 5). In addition, the effect of altitude was decomposed into more axes. In the MD (Fig. 2c), forest cover, altitude, presence of chalk, marl and Quaternary sediments explained most of the floristic variation. The cumulative percentage of variance explained by all environmental variables was the lowest (2.4%) of all the data sets owing to the lowest species number. In the BD (Fig. 2d), the same environmental variables as in the MD contributed to explaining floristic variation, but the mean PDSI had a much lower influence. The cumulative percentage of floristic variation explained by all the environmental variables was highest (15.0%) in the BD. A common problem of vegetation data sets is a preferential sampling design, which results in biases of various types. In general, some vegetation units (mainly species-poor, transient, rare) are under-represented in data sets compared with what they would have been in systematic or randomized sampling design (Chytrý & Rafajová, 2003). In our study, this is the case with coniferous forest plantations, which cover nearly half of the study area, and with degraded meadows either transformed from desiccated wetlands or from abandoned fields.
However, the inclusion of additional species-poor relevés from these habitats would not significantly influence the criterion of species groups as this method is not sensitive to data-set enlargement by relevés that contain species that are otherwise absent or rare in the data set (Bruelheide & Chytrý, 2000). A further consequence of a preferential sampling bias is the existence of over-sampled, clumped areas embedded in an under-sampled matrix. Thus, 46% of all relevés in the FD are included in only 17% of all grid cells in the BD. Nevertheless, these cores are distributed relatively regularly throughout the region and they also cover all main vegetation types (see Materials and methods). The bias mentioned might have influenced the species composition of the two most frequent groups (the VaccMyrt and PlanLanc groups), as these were formed by ubiquitous species occurring either in forests or on open land. In most groups, the frequency of relevés assigned to species group in the FD tends to agree with what is observed in the field and corresponds to frequencies that we would expect to be encountered in a stratified sampling design. An additional problem is the difference in the extent of data sets, which might have affected species composition and size of species groups. Under the assumption of a random occurrence of species and a given relative frequency of species in the data set, the number of co-occurrences will increase with the number of species. Therefore, larger species groups are expected to be formed in data sets with a higher total species number (the BD and FD in this study).
Table abbreviations: Sp group, species group; U, phi-coefficient multiplied by 100; Prop, proportion of plots assigned by species groups in one data set that are correctly assigned in the other data set; D, difference in transferability of the groups across the scales (absolute difference in U-values).
In total, 65 species (45%) chosen as diagnostic in the species groups are not present in the MD. However, this is a problem mainly in the PlanLanc group (over 90% of its species are not present in the MD) and in the VaccMyrt group (all species are absent in the MD). For this reason it is not possible to form these groups in the MD. On the other hand, there are nine other groups in which all species (e.g. the ArunDioi group) or almost all species (e.g. the CallVulg group) are present in all three data sets. In addition, a different size of a species group does not necessarily result in a different selection of plots in which the species groups are present, as our assignment criteria involve a minimum number of species that depends on the total species number of the species group (Bruelheide & Jandt, 1995; Bruelheide, 2000). Without this feature (i.e. with a fixed minimum number of species), the correspondence of plot assignment would have been much lower between data sets (i.e. between scales). Even with similar numbers of species, the species groups vary in species composition. On the one hand, this is a result of the bias in data sets mentioned above; on the other hand, this difference is inevitable even in balanced data sets, as relative frequency of species varies with scale (Kunin, 1998). It is an advantage of the COCKTAIL method that different species compositions of groups can be handled by the optimization procedure. The different sizes of the relevés included in the FD might also influence the classification results (Chytrý, 2001), but no analysis was carried out on how relevé size influences the formation of species groups by the COCKTAIL method.
However, the intra-dataset variation in grain in the FD (± 200 m²) is negligible compared with the 650-times greater grain used in the MD and the 2500-times greater grain in the BD.
Our results show that not only species patterns (e.g. Kunin, 1998) but also species co-occurrences (species groups) are scale-dependent. Ecologically specialized groups such as the ArunDioi group, the EpipMicr group, the GaleLute group and the CareDava group are less scale-dependent (i.e. spatially stable) than groups with wide ecological amplitudes. The ecologically specialized groups contain mostly species occurring in specialized (extra-zonal) habitats in the study area (such as wet meadows, calcareous beech forests on rendzinas, ravine forests; see Table 1). Such species are also characterized by more extreme indicator values according to Ellenberg et al. (2001). Specialized species occur in the same habitats at the fine scale, and as these habitats are also rare at the broader scale, the species tend to co-occur at all scales. This explains why the species groups of extra-zonal habitats are spatially stable compared with those of common habitats, such as fringe vegetation, ruderal habitats, and mesophilous meadows. In contrast, species with wide ecological amplitudes (e.g. the PlanLanc group) can form well-defined species groups at broad scales by co-occurring in the same grid cells, as it is probable that at least one of a larger number of habitats occurs in this grid cell, but can co-occur at fine scales (i.e. in relevés) only to a limited degree, as a certain area in one habitat can only contain a limited species number as a result of dispersal limitation and local exclusion mechanisms (Gause, 1934).
Fig. 2 Ordination diagrams depicting the first two axes of canonical correspondence analysis of (a) the fine-scale data set FD cv with species coverage, (b) the fine-scale data set FD pa with presence/absence occurrence, (c) the meso-scale data set MD, and (d) the broad-scale data set BD. Explained variability: (a) 1st axis = 1.8%, 2nd axis = 1.1%, total variance = 5.6%; (b) 1st axis = 4.3%, 2nd axis = 1.8%, total variance = 10.7%; (c) 1st axis = 0.6%, 2nd axis = 0.4%, total variance = 2.4%; (d) 1st axis = 5.1%, 2nd axis = 2%, total variance = 15%. The passive variables are in grey; for their abbreviations, see Materials and methods.
Kuželová & Chytrý (2004) stated that the transferability of species groups formed at a regional scale is best in those areas with high habitat heterogeneity, which is congruent with our observation of better transferability of specialized groups. They also found better transferability of species groups that had a central position along the major environmental gradients existing at the regional scale. However, this is not the case in our study. For example, the PlanLanc group has low transferability although it occupies a central ecological position in our study. In contrast, Kuželová & Chytrý's (2004) result seems to apply to the GaleLute group, with a central position in forests in our study area.
Considering all the factors that influence the transferability of species groups across scales, the degree of correspondence across scales is remarkable, in particular in the bottom-up direction. With an average phi-coefficient of 0.74, about three-quarters of the discriminant capability of a species group at the broad scale can be predicted by fine-scale co-occurrences. This result was also obtained in the truncated data sets.
A comparison of the driving factors with those of other studies is difficult as different studies use different extents (e.g.
regional scale, see Heikkinen & Neuvonen, 1997; Chytrý et al., 1999; and national scale, see Hill, 1991), varying resolution, and different species data-set structures with various environmental variables according to the character of the study area. In addition, often only selected species are analysed in such studies (Pedersen, 1990; Myklestad & Birks, 1993; Myklestad, 1993). Moreover, the studies listed above ignore the spatial structure of grid data, which might be unimportant (Heikkinen & Birks, 1996; Heikkinen et al., 1998; Korvenpää et al., 2003; bird distribution data in Titeux et al., 2004), but might also be the main explanatory factor in the data set (bird and butterfly distribution data in Storch et al., 2003). In our case, spatial autocorrelation explains a large proportion of the floristic variation in the BD. This could be connected with the rather fine resolution, which was not used in similar studies. The main correlating variables, altitude, geology and forest (tree) cover, are also considered to have a causal influence on plant distribution in meso-scale studies (Gaston, 2003; Pearson & Dawson, 2003). Potential direct solar irradiation is used only occasionally (e.g. Hennenberg & Bruelheide, 2003), as it is not easy to obtain precise data. The biases listed above are more severe for comparisons of ordination results across scales than for comparisons of species groups, as outliers have a strong influence. Thus, generalizations should be made only with caution. Nevertheless, the results obtained make sense: environmental factors that have a more direct (proximal) impact on plants, i.e. are closer to the effect on plants in the causal chain, are more important at the fine scale (e.g. potential direct solar irradiation, PDSI), while more indirect (distal) factors have a greater importance at the broad scale (e.g. altitude). The statistical approach to defining species groups across scales might have wide applications in assessing biodiversity.
If spatially stable species groups have been identified in areas with completed floristic or vegetation surveys, these groups can be used to complement corresponding vegetation or floristic data sets. Another application might be temporal prediction, which has not yet been tested. As vegetation surveys are repeated with less expenditure of time and money (e.g. monitoring in the Natura 2000 network) than complete floristic mappings, bottom-up predictions would be valuable. Wilson et al. (2004) stated that biodiversity change (rate of decline or increase) can be revealed from current species distribution data alone, using spatial patterns of occupancy in combination with the size of the occupied area. Accordingly, spatially stable groups from vegetation data sets might be monitored for changes in time and then applied across various scales. Thus, we may detect a declining species group at the regional or national scale by using monitoring results from local scales. As there are already abundant data on the sociology of endangered species, and vegetation-plot data are increasingly collected for monitoring purposes, a scaling-up approach would offer a new way of utilizing such data. Using species groups for risk assessment is another possible application. For this purpose, the predictive power of species groups for particular risks (e.g. land abandonment) would have to be assessed first. Then, various scales would have to be compared, as it has been shown that rates of decline are scale-dependent (Hartley & Kunin, 2003). The next step will be to repeat our approach in other areas. The most suitable areas will be those in which both floristic and vegetation surveys have already been undertaken, preferably those with high biodiversity and a high number of different species groups.
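The use of spatially stable species groups to complement a floristic data set, as proposed above, can be sketched as follows. This is a hypothetical illustration rather than a tested procedure: the function name, the group and species labels, and the threshold `min_fraction=0.5` are assumptions, and the output is only a list of candidate species to look for in a grid cell, not a confirmed record.

```python
from math import ceil


def predict_missing_species(cell_species, species_groups, min_fraction=0.5):
    """List group species not yet recorded in a grid cell.

    A group counts as present when the cell holds at least
    ceil(min_fraction * group size) of its species (hypothetical threshold).
    Returns {group name: sorted candidate species} for present groups only.
    """
    cell = set(cell_species)
    candidates = {}
    for name, members in species_groups.items():
        members = set(members)
        if not members:
            continue
        needed = ceil(min_fraction * len(members))
        if len(cell & members) >= needed:
            missing = sorted(members - cell)
            if missing:
                candidates[name] = missing
    return candidates


# Placeholder groups derived at the fine scale (not data from this study).
groups = {
    "group_1": {"sp_a", "sp_b", "sp_c", "sp_d"},
    "group_2": {"sp_x", "sp_y"},
}

# Species recorded so far in one hypothetical broad-scale grid cell.
cell_records = {"sp_a", "sp_b", "sp_c", "sp_x"}

suggestions = predict_missing_species(cell_records, groups)
```

In this toy case both groups pass the threshold, so the sketch suggests checking the cell for the one unrecorded member of each group. In practice, the predictive power of each group would first have to be assessed, as noted above for risk assessment.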
An example at the regional scale is the Křivoklátsko Biosphere Reserve (Czech Republic), where floristic mapping in 1/100 CEBA grids and a vegetation survey with more than 4000 relevés have already been completed (see Kolbek et al., 1999). Similarly, in Mecklenburg-Western Pomerania (northern Germany), vegetation and floristic data bases have been produced at the level of a federal state (Berg et al., 2004). Unfortunately, this case is the exception rather than the rule in central Europe. At the national level there are many floristic distribution atlases using different resolutions (e.g. Universal Transverse Mercator UTM 10 × 10 km grid in Preston et al., 2002; Distribution Atlas of Vascular Plants in Poland ATPOL 10 × 10 km grid in Zając & Zając, 2001; CEBA 10′ E × 6′ N grid, c. 133 km², in Haeupler & Schönfelder, 1989). At this national scale, digital vegetation data bases are still lacking but are under preparation. If we apply the method at a continental scale, other available floristic data sets could be used (Meusel et al., 1965-92; Jalas & Suominen, 1972-94; Jalas et al., 1996, 1999; Kurtto et al., 2004), even reaching the worldwide level (e.g. project WORLDMAP; Gaston, 2003). The final aim would be to use vegetation or floristic data sets from all available scales to make predictions of species distribution at the global scale, but this goal is still far out of reach.
Transferability of species groups across scales. Journal of Biogeography 33, 1628-1642
Die Pflanzengesellschaften Mecklenburg-Vorpommerns und ihre Gefährdung - Textband
Scale in macroecology
CANOCO reference manual and CanoDraw for Windows. User's guide. Software for canonical community ordination
Die Grünlandgesellschaften des Harzes und ihre Standortsbedingungen. Mit einem Beitrag zum Gliederungssystem auf der Basis von statistisch ermittelten Artengruppen
A new measure of fidelity and its application to defining species groups
Towards unification of national vegetation classifications: a comparison of two methods for analysis of large datasets
Survey of limestone grasslands by statistically formed groups of differential species
Phytosociological data give biased estimates of species richness
Czech national phytosociological database: basic statistics of the available vegetation-plot data
Diagnostic, constant and dominant species of vegetation classes and alliances of the Czech Republic: a statistical revision
Phytogeographical boundary between the Pannonicum and Hercynicum: a multivariate analysis of landscape in the Podyjí/Thayatal National Park
Katalog biotopů České republiky. Interpretační příručka k evropským programům Natura 2000 a Smaragd
Context-dependence of diagnostic species: a case study of the central European spruce forests
Determination of diagnostic species with statistical fidelity measures
DiGeM 2.0
Zeigerwerte von Pflanzen in Mitteleuropa
Der Beitrag pflanzensoziologischer Datenbanken zur vegetationsökologischen Forschung
Biodiversity. A biology of numbers and difference
The structure and dynamics of geographic ranges
The struggle for existence
Atlas der Farn- und Blütenpflanzen der Bundesrepublik Deutschland
Scale dependency of rarity, extinction risk, and conservation priority
Spatial and environmental components of variation in the distribution patterns of subarctic plant species at Kevo, N Finland - a case study at the meso-scale level
Species richness of vascular plants in the subarctic landscape of northern Finland: modelling relationships to environment
A numerical analysis of the mesoscale distribution patterns of vascular plants in the subarctic Kevo Nature Reserve, northern Finland
TURBOVEG, a comprehensive data base management system for vegetation data
Ecological investigations on the northern distribution range of Hippocrepis comosa L. in Germany
Dynamika invaze Orthodontium lineare Schwaegr. v Čechách
Patterns of species distribution in Britain elucidated by canonical correspondence analysis
(1972-94) Atlas Florae Europaeae, Vols 1-10. The Committee for Mapping the Flora of Europe & Societas Biologica Fennica Vanamo, Helsinki.
Jandt, U. (1999) Kalkmagerrasen am Südharzrand und im Kyffhäuser
Květena Chráněné krajinné oblasti a Biosférické rezervace Křivoklátsko
The relationship between local and regional species richness and spatial turnover
A mesoscale analysis of floristic patterns in the south-west Finnish Archipelago
Interspecific associations in phytosociological data sets: how do they change between local and regional scale? Plant Ecology
Klíč ke květeně České republiky
Extrapolating species abundance across spatial scales
The problem of pattern and scale in ecology
Common data standards for recording relevés in field survey for vegetation classification
The distribution of Salix species in Fennoscandia - a numerical analysis
A numerical analysis of the distribution patterns of Salix species in Europe
Predicting the impacts of climate change on the distribution of species: are bioclimate envelope models useful?
Distributional patterns of vascular plants in Fennoscandia: a numerical approach
Laubwaldgesellschaften im Harz. Gliederung, Ökologie, Verbreitung. Archiv naturwissenschaftlicher Dissertationen
New atlas of the British and Irish Flora. An atlas of the vascular plants of Britain, Ireland, the Isle of Man and the Channel Islands
neue Entwicklungen in der Pflanzensoziologie. Berichte der Reinhold Tüxen Gesellschaft
Flora von Coburg. Die Farn- und Blütenpflanzen des Coburger Landes
Distribution patterns in butterflies and birds of the Czech Republic: separating effects of habitat and geographical position
Zwergbinsen-Gesellschaften (Isoëto-Nanojuncetea) in Niedersachsen - Verbreitung, Gliederung, Dynamik
JUICE, software for vegetation classification
Multivariate analysis of a fine-scale breeding bird atlas using a geographical information system and partial
Spatial scaling in ecology
Spatial patterns in species distributions reveal biodiversity change
Atlas rozmieszczenia roślin naczyniowych w Polsce. Instytut Botaniki Uniwersytetu Jagiellońskiego
Petr Petřík is a PhD student currently working on a biogeographical evaluation of grid mapping distribution data.
His main research interests are numerical phytogeography, phytosociology (formalized classification) and field floristic surveys. His most recent interests are ecoinformatics, in particular the development of methods that allow ecological data sets of different scopes, grains and extents to be combined.
We thank John Birks, Zuzana Münzbergová, David Storch and two anonymous referees for their valuable comments on this paper. Luboš Tichý and Jan Wild are acknowledged for their help with the data analyses. Tim C. G. Rich kindly improved our English. We are indebted to the late Tomáš Sýkora, and to Květa Morávková, Richard Višňák and Jarmila Sýkorová for providing their vegetation samples. This research was partly supported by Biodiversity Research Center (grant no. LC06073 funded by Ministry of Education, Youth and Sports of the CR) and grant nos AV0Z60050516 and 206/03/H137 (GACR). P.P. is indebted to the Deutsche Bundesstiftung Umwelt (http://www.dbu.de) and to BioHab (http://www.biohab.alterra.nl) for financial support for his study visits.
The following supplementary material is available for this article online from http://www.blackwell-synergy.com.
Appendix S1. Environmental variables with brief descriptive statistics.