Fauna and flora respond today, and responded in the past, to a more or less complex combination of environmental changes. A good knowledge of their modern ecology, coupled with appropriate mathematical methods, makes it possible to reconstruct these changes quantitatively from biological proxy data. Such reconstruction methods are often called "transfer functions", but the name should be understood only figuratively, because they are no longer based on the calibration of a function. These transfer functions may convert two types of proxies into environmental information: either assemblages (i.e. the relative abundances of a large number of species, or more simply their presence/absence), or parameters related to the growth of selected individuals of a given species (e.g. tree-ring width, density, isotopic content). This type of approach requires a modern reference dataset containing the same type of proxies as the fossil data, associated with the environmental variables assumed to be the most explanatory. I would like to review briefly, for data providers and data users (e.g. climate modellers) alike, the problems met in quantification, and to suggest possible solutions.
2. When the reference dataset is too limited, the relationship between environment and proxy may be misinterpreted. For example, if the reference dataset contains no samples that are both dry and cold, glacial periods will systematically be reconstructed as dry and warm. More generally, if two climatic parameters are too strongly correlated in the reference dataset, the same correlation will reappear in the reconstructions.
3. The reference dataset is often compiled from various sources by different authors and is therefore heterogeneous. This is the price to be paid for a sufficiently diversified modern dataset. The heterogeneity can stem from the chemical preparation of the samples, from taxonomic identification (some taxa are not recognized, others are misidentified), from the number of individuals counted, and so on.
4. In long-settled regions, the distribution of some proxies has been modified by humans. This may alter the assemblages, the growth of individuals, or both. Consequently, the statistical relationship between proxies and environment is disturbed, resulting in spurious reconstructions.
5. Biological data are often influenced by a complex combination of numerous biotic and abiotic variables, which makes it extremely difficult to deconvolve the different signals of interest. Taking climatic factors as an example, the presence of a species or the efficient growth of individuals depends quantitatively on the combined effects of temperature, water availability and sunlight, but it may also depend on the occurrence of extremes (frost, drought, etc.).
6. Some species are ubiquitous or have a broad environmental tolerance. Frequently, the analyst does not distinguish between species, so that a single taxon covers several species with very different ecological behaviour. Both situations limit the potential of reconstructions. Nevertheless, the assemblage as a whole sometimes gives indirect access to the species present.
7. Low-resolution reconstructions are based on environmental "normals", calculated for example over a 30-year period. The variability around these normals is sometimes high, which illustrates the ability of the organisms studied to adapt to a more or less wide range of conditions. This variability is certainly a lower limit to the error bar associated with these reconstructions.
8. All proxies have some degree of inertia: they usually integrate the environmental conditions of a preceding period, which shows up as statistical autocorrelation in the proxy time series. Sometimes this inertia is too great to permit adequate reconstruction during phases of rapid transition.
9. As in all paleodata studies, the time scale is crucial. Dating techniques are now evolving rapidly. Results presented on a particular time scale (sometimes with no precise indication of the age model used) may become nearly worthless when that time scale is revised.
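Some of these problems can be screened for before any reconstruction is attempted. The sketch below computes three simple diagnostics with NumPy: the correlation between climate variables in the reference set (problem 2), a climatic normal with its interannual spread (problem 7), and the lag-1 autocorrelation of a proxy series as a measure of inertia (problem 8). It assumes reference climate values are stored as a samples-by-variables array and the proxy series as a 1-D array; the function names are hypothetical, and what counts as "too strongly correlated" or "too much inertia" is left to the analyst.

```python
import numpy as np


def climate_correlation(env: np.ndarray) -> np.ndarray:
    """Pearson correlation matrix between climate variables (columns of env).

    Rows of env are modern reference samples; columns are variables such as
    temperature and precipitation. Off-diagonal values close to +/-1 warn that
    the reference set cannot separate these variables (problem 2).
    """
    return np.corrcoef(env, rowvar=False)


def normal_and_spread(annual_values: np.ndarray) -> tuple[float, float]:
    """Climatic normal (mean over, e.g., 30 years) and its interannual standard
    deviation, a rough lower bound on the reconstruction error (problem 7)."""
    return float(np.mean(annual_values)), float(np.std(annual_values, ddof=1))


def lag1_autocorrelation(series: np.ndarray) -> float:
    """Lag-1 autocorrelation of a proxy time series; values near 1 indicate
    strong inertia (problem 8)."""
    x = series - series.mean()
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))
```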
Qualitative methods may solve some of these problems, because they are more intuitive and are not constrained to a predefined shape for the proxy-environment relationship. Such methods can handle threshold problems qualitatively. The counterpart is that unconstrained intuition does not always set reasonable limits, and its results often fail to convince the user (e.g. the climate modeller). Quantitative information, even with a large error bar, is much more useful, because the large error bar reflects the limits of what can be extracted from the data. In fact, the same limits also affect qualitative results, but there they often remain hidden for some time.
Problems 1 and 2 are linked to the diversity of the modern dataset. They can be solved by gathering all available modern samples, which requires collaboration between scientists. This leaves the heterogeneity of the dataset (problem 3), which can be addressed by carefully checking the data. Some authors prefer to use only their own data, or only data analysed with the same protocol, but the consequence can be a loss of diversity. Statistical methods exist to smooth the samples according to the environmental variables to be reconstructed (e.g. the response surface method). Perhaps another solution to the heterogeneity problem is that the effort of creating databases leads to better collaboration between scientists, which in turn provides opportunities for technical and taxonomic discussion.
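The response surface idea can be sketched as follows. This minimal version assumes a quadratic polynomial in two climate variables fitted by ordinary least squares. It is only one possible form of surface, since published response surfaces have used other polynomial orders or local weighting, and the function names are hypothetical. Fitting one surface per taxon and replacing observed percentages with fitted ones smooths out part of the between-author noise in a heterogeneous reference set.

```python
import numpy as np


def design_matrix(t: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Quadratic terms in two climate variables, e.g. temperature t and precipitation p."""
    return np.column_stack([np.ones_like(t), t, p, t * t, p * p, t * p])


def fit_response_surface(t: np.ndarray, p: np.ndarray, abundance: np.ndarray) -> np.ndarray:
    """Least-squares fit of one taxon's abundance as a quadratic function of climate."""
    coef, *_ = np.linalg.lstsq(design_matrix(t, p), abundance, rcond=None)
    return coef


def smoothed_abundance(coef: np.ndarray, t: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Abundance predicted by the fitted surface, used in place of noisy observed values."""
    # Percentages cannot be negative, so the polynomial is clipped at zero.
    return np.clip(design_matrix(t, p) @ coef, 0.0, None)
```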
Some mathematical methods are more affected than others by the biases present in the modern dataset: these are regression-type, or extrapolative, methods. Their coefficients are calibrated on the modern data and can be strongly influenced by those biases. Such methods are generally practicable for reconstructions extending back over recent times (the last millennia), particularly for tree-ring data. When they are applied to older fossil data, possibly in no-analogue situations, their predictions can be completely unrealistic. We prefer interpolative methods, based on the search for analogues (neural networks are another example), which cannot produce environmental values that do not exist in the reference data, but which as a consequence may produce underestimated predictions. Extrapolative methods, by amplifying the defects of the reference data, will often be unable to deconvolve signals that are too strongly correlated (problem 2), whereas interpolative methods can take advantage of the few samples that depart from the general trend.
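The analogue (interpolative) approach can be sketched in a few lines. This minimal version assumes the widely used squared-chord dissimilarity and an unweighted mean of the k closest modern samples. Real implementations often weight analogues by inverse distance and apply a no-analogue threshold. The names and the default k are hypothetical. Because the estimate is an average of observed modern values, it cannot leave the modern range, which is both the strength and the underestimation risk noted above.

```python
import numpy as np


def squared_chord(fossil: np.ndarray, modern: np.ndarray) -> np.ndarray:
    """Squared-chord distance between one fossil assemblage and each modern one.

    Assemblages are proportions; modern has shape (n_sites, n_taxa).
    """
    return np.sum((np.sqrt(fossil) - np.sqrt(modern)) ** 2, axis=1)


def mat_estimate(fossil: np.ndarray, modern: np.ndarray, modern_env: np.ndarray, k: int = 5):
    """Mean environment of the k closest modern analogues.

    Returns the estimate, the indices of the analogues and their distances.
    The estimate is an average of modern values, so it never falls outside
    the modern range: the method interpolates, it does not extrapolate.
    """
    d = squared_chord(fossil, modern)
    nearest = np.argsort(d)[:k]
    return modern_env[nearest].mean(axis=0), nearest, d[nearest]
```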
Surface samples strongly disturbed by humans (problem 4) are easily recognized and discarded. When the disturbance is weaker, statistical analysis can be used to remove the human signal from the modern samples. Any calibration on modern data can nevertheless be biased by this disturbance; one way to avoid this is an inverse step, in which the environmental information present is first extracted from the paleodata and then interpreted by comparison with the modern data.
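Discarding strongly disturbed surface samples can be automated in a simple way, assuming the disturbance is flagged by the summed percentage of human-indicator taxa. The taxon list and the 5 % threshold below are illustrative assumptions, not values taken from this text, and the function name is hypothetical. Weaker disturbance would call for the statistical treatment mentioned above rather than outright removal.

```python
import numpy as np

# Hypothetical list of human-indicator pollen types; adapt to the region studied.
ANTHROPOGENIC_TAXA = ["Cerealia", "Plantago lanceolata", "Secale"]


def discard_disturbed(percent: np.ndarray, taxa: list[str], max_indicator: float = 5.0):
    """Drop modern samples whose summed human-indicator percentage exceeds max_indicator.

    percent has shape (n_samples, n_taxa), values in %, columns ordered as in taxa.
    Returns the kept rows and the boolean mask of kept samples.
    """
    cols = [taxa.index(name) for name in ANTHROPOGENIC_TAXA if name in taxa]
    indicator_sum = percent[:, cols].sum(axis=1)  # zeros if no indicator taxon is present
    keep = indicator_sum <= max_indicator
    return percent[keep], keep
```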
A major point in any reconstruction is to give a clear idea of the errors associated with the results. For high-resolution data such as tree rings, it is often possible to set aside a test sample for independent verification. Jackknife or bootstrap methods are appropriate tools for evaluating the error bars. More recent methods based on fuzzy-logic techniques take advantage of the uncertainty associated with the data. Nevertheless, this verification is not sufficient for low-resolution data, where analogues are frequently taken from distant geographical regions; in this case the environmental history cannot be taken into account. Only a large number of diversified reconstructions, using several types of proxies, will definitively validate the results. We therefore need large databases, and particularly multi-proxy databases.
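For the analogue approach, a jackknife-style (leave-one-out) error estimate can be sketched as below. Each modern sample is treated in turn as if it were fossil and reconstructed from the remaining ones. This assumes squared-chord distance and an unweighted mean of k analogues, and the names are hypothetical. As the paragraph above notes, such an internal check does not capture errors arising when fossil analogues come from geographically distant regions.

```python
import numpy as np


def mat_predict(fossil: np.ndarray, modern: np.ndarray, modern_env: np.ndarray, k: int = 3) -> float:
    """Mean environment of the k modern samples closest in squared-chord distance."""
    d = np.sum((np.sqrt(fossil) - np.sqrt(modern)) ** 2, axis=1)
    return float(modern_env[np.argsort(d)[:k]].mean())


def leave_one_out_rmsep(modern: np.ndarray, modern_env: np.ndarray, k: int = 3) -> float:
    """Jackknife-style error: predict each modern sample from all the others.

    Returns the root-mean-square error of prediction (RMSEP), in the units of modern_env.
    """
    n = len(modern_env)
    errors = np.empty(n)
    for i in range(n):
        others = np.arange(n) != i  # boolean mask excluding sample i
        pred = mat_predict(modern[i], modern[others], modern_env[others], k)
        errors[i] = pred - modern_env[i]
    return float(np.sqrt(np.mean(errors ** 2)))
```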
In recent decades, proxy data have been converted into conventional climatic variables such as temperature or precipitation, to facilitate comparison with model output. Ecological models are now available, at the global, regional or even local scale, that take this output and produce more explicitly biological entities. These entities result from a complex combination of environmental variables and can be compared more directly with the proxy data. This fairly natural "forward approach" solves the problem of signal deconvolution (problem 5). Examples are the comparison of pollen data with the biomes produced by the biome model of Prentice, the comparison of well-dated pollen diagrams with the vegetation succession predicted by a Shugart-type gap model, and the comparison of tree-ring data with the tree growth predicted by a Vaganov-Fritts-type cambial model. A major effort is now needed to find the most appropriate way to produce proxy data compatible with the output of biological models, e.g. to convert pollen data into biomes.
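One published route for converting pollen data into biomes is the "biomization" procedure of Prentice and co-workers. The sketch below follows its scoring idea under a simplifying assumption: each taxon is assigned directly to biomes instead of going through plant functional types. Each taxon above a small threshold (0.5 % here) adds the square root of its excess percentage to every biome it belongs to, and the biome with the highest score is assigned. The taxon-biome table used in the test is purely hypothetical. The published procedure also adds further rules, for example for ties, which are omitted here.

```python
import numpy as np


def biome_affinity(percent: np.ndarray, membership: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Affinity score of one pollen spectrum for each biome.

    percent: pollen percentages, shape (n_taxa,).
    membership: 0/1 matrix, shape (n_biomes, n_taxa); 1 if the taxon belongs to the biome.
    Each taxon contributes sqrt(max(0, p - threshold)) to every biome it belongs to.
    """
    contribution = np.sqrt(np.clip(percent - threshold, 0.0, None))
    return membership @ contribution


def assign_biome(percent, membership, biome_names, threshold=0.5):
    """Biome with the highest affinity (np.argmax keeps the first one listed on ties)."""
    scores = biome_affinity(percent, membership, threshold)
    return biome_names[int(np.argmax(scores))], scores
```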
The recent use of the forward approach is not an abrupt rejection of the former "inverse approach". On the contrary, we now have new tools to enhance the environmental signal present in the proxy data. The key to the method is a constraint provided by an independent qualitative or quantitative variable. The advantage of proxies such as pollen, tree rings or foraminifera is that widespread measurements are available, but it is sometimes difficult to isolate the desired environmental signal. The constraint acts, in effect, as a decoder. The method is as follows: given a fossil assemblage, all modern analogues of that assemblage are determined up to a certain degree of similarity, but only those satisfying a particular condition are finally kept. The condition can be established from an indicator taken from the same proxy dataset (e.g. rare taxa) or from other proxies (multi-proxy approach). Examples are the following: (1) modern pollen analogues for glacial periods often come from tundra, cool steppe or even warm steppe; if some rare taxon proves that steppe is incompatible, only tundra analogues are kept. (2) If absolute frequencies show low pollen productivity, only analogues with an extreme climate can be accepted. (3) If modern analogues come from a wide variety of locations, the choice may be restricted to those with a similar value of a given geochemical parameter (carbon content, δ18O, etc.). (4) Lake-level data can be used to select modern pollen analogues indicating precipitation values compatible with those of nearby lakes. (5) In periods of rapid climatic transition, organisms with a rapid response (insects) are useful for correcting results obtained from slower-responding organisms (vegetation). (6) Discontinuous historical documents can also be used to add precision to the climatic signal driving tree-ring growth.
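The constrained selection described above amounts to a two-stage filter. It is sketched here under several assumptions: dissimilarity is measured with the squared-chord distance, the "certain degree of similarity" is a fixed cut-off (the value 0.15 is arbitrary), and the independent constraint is supplied as one boolean flag per modern sample. That flag would be computed beforehand from rare taxa, geochemistry, lake levels or any other proxy. Function and parameter names are hypothetical.

```python
import numpy as np


def constrained_analogues(fossil: np.ndarray, modern: np.ndarray, modern_env: np.ndarray,
                          keep: np.ndarray, max_dist: float = 0.15):
    """Constrained modern analogue selection.

    1. Find every modern sample within max_dist (squared-chord) of the fossil one.
    2. Retain only those whose independent constraint flag in `keep` is True
       (e.g. non-steppe vegetation, a close geochemical value, a compatible lake level).
    Returns the mean environment of the retained analogues and their indices,
    or (None, empty array) if the constraint rejects them all.
    """
    d = np.sum((np.sqrt(fossil) - np.sqrt(modern)) ** 2, axis=1)
    candidates = np.flatnonzero(d <= max_dist)
    retained = candidates[keep[candidates]]
    if retained.size == 0:
        return None, retained
    return modern_env[retained].mean(axis=0), retained
```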
The rapid evolution of dating systems calls for some caution. Environmental reconstruction techniques based on time series do not require a precise time scale, so reconstructions can be obtained as a function of depth. The conventional 14C time scale can then be attached at the end of the process, allowing later recalibration based on the latest improvements to the age scale. The problem is different for time-slice reconstructions, which require not only a 14C age model but also stratigraphic correlation between nearby sites. Here there is a danger of introducing errors through reckless assumptions that some minor events were synchronous over large geographical regions.
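The depth-first workflow can be sketched as follows, assuming a simple linear age-depth model between dated levels; real age models may use calibrated ages and smoother curves. The reconstruction is computed once against depth. The age model is applied only at the end, so a revised chronology just means calling the hypothetical `attach_ages` helper again with new control points. All numbers in the example are placeholders, not data.

```python
import numpy as np


def attach_ages(sample_depths, control_depths, control_ages):
    """Assign ages to samples by linear interpolation between dated levels.

    control_depths must be increasing. np.interp holds the end values constant
    outside the dated range, so samples beyond it should be treated with caution.
    """
    return np.interp(sample_depths, control_depths, control_ages)


# Reconstruction done once, indexed by depth (cm); placeholder values only.
depth_cm = np.array([100.0, 150.0, 200.0, 250.0])
july_temp = np.array([12.1, 13.4, 15.0, 15.2])

# The age model is attached last and can be swapped when dating improves.
ages_v1 = attach_ages(depth_cm, [90.0, 260.0], [8000.0, 12000.0])
ages_v2 = attach_ages(depth_cm, [90.0, 180.0, 260.0], [8200.0, 10100.0, 12300.0])
```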
Multi-proxy reconstructions can provide solutions to most of the problems listed above. In particular, qualitative indicators are useful for improving the precision of ambiguous reconstructions obtained automatically from large datasets. The multi-proxy approach is also certainly necessary for solving the problems of human disturbance and proxy inertia. The databases must also be used in a forward way, for comparison with the biological entities produced by ecological models coupled to climate models. For example, we have to develop tools to deduce biomes from pollen.